Streaming

Both endpoints stream over Server-Sent Events (SSE). Set stream: true in the request body and read the text/event-stream response.

Two formats, one transport

Nimbus emits the wire format native to each endpoint:

  • /v1/chat/completions: OpenAI chunk format. Every event is a single data: line carrying a JSON chunk. The terminal event is the literal data: [DONE].
  • /v1/messages: Anthropic typed events. Each event has an event: line and a data: line. There is no [DONE] sentinel; message_stop terminates the stream.
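Both formats share the same SSE framing, so one frame parser can serve both endpoints. The following is a minimal sketch, not Nimbus client code: SSEEvent and parse_sse are hypothetical names, and the input is assumed to be an iterable of already-decoded text lines.

```python
from typing import Iterable, Iterator, List, NamedTuple, Optional


class SSEEvent(NamedTuple):
    event: Optional[str]  # None when the frame has no "event:" line (OpenAI format)
    data: str             # raw payload: JSON text, or the literal "[DONE]"


def parse_sse(lines: Iterable[str]) -> Iterator[SSEEvent]:
    """Group decoded text lines into SSE frames separated by blank lines."""
    event: Optional[str] = None
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line ends the current frame.
            if data:
                yield SSEEvent(event, "\n".join(data))
            event, data = None, []
        elif line.startswith(":"):
            continue  # SSE comment line, carries no payload
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # Drop the single optional space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)
        # Other fields (id:, retry:) are ignored in this sketch.
    if data:
        # Flush a final frame that was not followed by a blank line.
        yield SSEEvent(event, "\n".join(data))
```

A frame with event set to None comes from the OpenAI format; a frame with a named event comes from the Anthropic format.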

OpenAI: /v1/chat/completions stream

data: {"id":"chatcmpl_01H8...","object":"chat.completion.chunk","created":1735689600,"model":"openai/gpt-5.1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl_01H8...","object":"chat.completion.chunk","created":1735689600,"model":"openai/gpt-5.1","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl_01H8...","object":"chat.completion.chunk","created":1735689600,"model":"openai/gpt-5.1","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}

data: {"id":"chatcmpl_01H8...","object":"chat.completion.chunk","created":1735689600,"model":"openai/gpt-5.1","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":2,"total_tokens":14}}

data: [DONE]

Event catalog

| line | when | payload |
| --- | --- | --- |
| data: { ... chunk ... } | Every delta (0+ per response). | OpenAI chat.completion.chunk shape. First chunk carries role: 'assistant', later chunks carry content or tool_calls deltas, terminal chunk carries finish_reason and usage. |
| data: [DONE] | Exactly once, at end. | Literal string. Signals end-of-stream. Not JSON; do not try to parse. |
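The catalog above can be folded into a small consumer. This is a sketch under stated assumptions: collect_openai_text is a hypothetical helper, and it receives the text after each data: prefix as a string. It stops at the [DONE] sentinel without parsing it and keeps the usage object from the terminal chunk.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_openai_text(
    data_payloads: Iterable[str],
) -> Tuple[str, Optional[str], Optional[dict]]:
    """Fold OpenAI-format payloads into (text, finish_reason, usage)."""
    parts = []
    finish_reason = None
    usage = None
    for payload in data_payloads:
        if payload.strip() == "[DONE]":
            break  # sentinel, not JSON: never pass it to json.loads
        chunk = json.loads(payload)
        usage = chunk.get("usage") or usage
        for choice in chunk.get("choices", []):
            if choice.get("index", 0) != 0:
                continue  # this sketch only follows the first choice
            content = choice.get("delta", {}).get("content")
            if content:
                parts.append(content)
            finish_reason = choice.get("finish_reason") or finish_reason
    return "".join(parts), finish_reason, usage
```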

Tool-call streaming

Tool calls stream as partial arguments strings. Concatenate the string fragments across chunks; the resulting JSON is only guaranteed well-formed after finish_reason: tool_calls.

data: {..."delta":{"role":"assistant","content":null,"tool_calls":[{"index":0,"id":"call_abc","type":"function","function":{"name":"get_weather","arguments":""}}]}}

data: {..."delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"ci"}}]}}

data: {..."delta":{"tool_calls":[{"index":0,"function":{"arguments":"ty\":\"Reykjavik\"}"}}]}}

data: {..."delta":{},"finish_reason":"tool_calls"}

data: [DONE]
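A sketch of the concatenation rule, assuming each chunk has already been parsed from JSON and that the elided parts of the chunks above follow the usual choices[0].delta layout. collect_tool_calls is a hypothetical helper name. It parses the arguments only once finish_reason: tool_calls has been seen.

```python
import json
from typing import Dict, Iterable, List


def collect_tool_calls(chunks: Iterable[dict]) -> List[dict]:
    """Concatenate streamed argument fragments per tool-call index."""
    calls: Dict[int, dict] = {}
    finished = False
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            for tc in choice.get("delta", {}).get("tool_calls") or []:
                slot = calls.setdefault(
                    tc["index"], {"id": None, "name": None, "arguments": ""}
                )
                if tc.get("id"):
                    slot["id"] = tc["id"]
                fn = tc.get("function", {})
                if fn.get("name"):
                    slot["name"] = fn["name"]
                slot["arguments"] += fn.get("arguments") or ""
            if choice.get("finish_reason") == "tool_calls":
                finished = True
    if not finished:
        # The fragments are not guaranteed to be well-formed JSON yet.
        raise ValueError("stream ended before finish_reason: tool_calls")
    return [
        {"id": c["id"], "name": c["name"], "arguments": json.loads(c["arguments"])}
        for _, c in sorted(calls.items())
    ]
```

Keying the accumulator on the tool call's index keeps fragments of parallel tool calls apart.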

Anthropic: /v1/messages stream

event: message_start
data: {"type":"message_start","message":{"id":"msg_01H8...","type":"message","role":"assistant","model":"anthropic/claude-opus-4.5","content":[],"stop_reason":null,"stop_sequence":null,"usage":{"input_tokens":41,"output_tokens":0}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" world"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"output_tokens":2}}

event: message_stop
data: {"type":"message_stop"}

Event catalog

| event | when | payload |
| --- | --- | --- |
| message_start | Once, at start. | Full message envelope with empty content and initial input_tokens. |
| content_block_start | Per block. | Emitted before each content block. content_block.type is text or tool_use. |
| content_block_delta | Many per block. | delta.type is text_delta (text field) or input_json_delta (partial_json field for tool inputs). |
| content_block_stop | Per block. | Emitted when a block finishes. |
| message_delta | Once, near end. | delta carries the final stop_reason and stop_sequence; usage carries output_tokens. |
| message_stop | Exactly once, at end. | Terminal event. No further data will arrive. |
| ping | Heartbeat, ~every 15s. | Keeps proxies from closing the connection. Safe to ignore. |
| error | On upstream failure mid-stream. | Same JSON shape as a non-streaming error body. HTTP status remains 200 because the headers were already flushed. |
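One way to consume these events is a dispatcher keyed on the event name. This is a minimal sketch, assuming the caller supplies (event, data) string pairs already split out of the SSE frames. collect_anthropic_text and the keys of its result dict are hypothetical.

```python
import json
from typing import Iterable, Tuple


def collect_anthropic_text(events: Iterable[Tuple[str, str]]) -> dict:
    """Fold (event, data) pairs into text, stop_reason, and token counts."""
    result = {
        "text": "",
        "stop_reason": None,
        "input_tokens": None,
        "output_tokens": None,
        "stopped": False,  # True only if message_stop was seen
    }
    for name, data in events:
        if name == "ping":
            continue  # heartbeat, safe to ignore
        payload = json.loads(data)
        if name == "message_start":
            result["input_tokens"] = payload["message"]["usage"]["input_tokens"]
        elif name == "content_block_delta":
            delta = payload["delta"]
            if delta["type"] == "text_delta":
                result["text"] += delta["text"]
        elif name == "message_delta":
            result["stop_reason"] = payload["delta"].get("stop_reason")
            result["output_tokens"] = payload.get("usage", {}).get("output_tokens")
        elif name == "message_stop":
            result["stopped"] = True
            break
        elif name == "error":
            raise RuntimeError(payload["error"]["message"])
    return result
```

A result with stopped still False means the connection closed before message_stop, so the text should be treated as truncated.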

Tool-use streaming

Tool inputs stream as input_json_delta events carrying partial_json string fragments. Concatenate them across the block; parse only after content_block_stop.

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"tool_use","id":"toolu_01A","name":"get_weather","input":{}}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"input_json_delta","partial_json":"{\"ci"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"input_json_delta","partial_json":"ty\":\"Reykjavik\"}"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"tool_use"}}

event: message_stop
data: {"type":"message_stop"}
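The same rule as a sketch: buffer partial_json per block index and parse only when content_block_stop arrives. collect_tool_uses is a hypothetical helper, and it assumes each event's data has already been parsed into a dict. Treating an empty buffer as {} is an assumption of this sketch, not documented behavior.

```python
import json
from typing import Dict, Iterable, List, Tuple


def collect_tool_uses(events: Iterable[Tuple[str, dict]]) -> List[dict]:
    """Buffer input_json_delta fragments and parse at content_block_stop."""
    open_blocks: Dict[int, dict] = {}
    done: List[dict] = []
    for name, payload in events:
        if name == "content_block_start":
            block = payload["content_block"]
            if block["type"] == "tool_use":
                open_blocks[payload["index"]] = {
                    "id": block["id"],
                    "name": block["name"],
                    "json": "",
                }
        elif name == "content_block_delta":
            delta = payload["delta"]
            if delta["type"] == "input_json_delta":
                open_blocks[payload["index"]]["json"] += delta["partial_json"]
        elif name == "content_block_stop" and payload["index"] in open_blocks:
            block = open_blocks.pop(payload["index"])
            # Assumption: a block with no fragments means an empty input object.
            parsed = json.loads(block["json"] or "{}")
            done.append({"id": block["id"], "name": block["name"], "input": parsed})
    return done
```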

SDK examples

from openai import OpenAI

# Point the official OpenAI SDK at the Nimbus base URL.
client = OpenAI(base_url="https://llm.nimbusapi.net/v1", api_key="sk-nim-YOUR_KEY")

stream = client.chat.completions.create(
    model="openai/gpt-5.1",
    messages=[{"role": "user", "content": "Count to 5."}],
    max_tokens=64,
    stream=True,
)

for chunk in stream:
    if not chunk.choices:
        continue  # defensive: skip any chunk that carries no choices
    choice = chunk.choices[0]
    if choice.delta.content:
        print(choice.delta.content, end="", flush=True)
    if choice.finish_reason:
        print()  # final newline
        break

Reconnection

SSE supports resumption via the Last-Event-ID header. In the Anthropic format, Nimbus emits an id: line before every data: line. If your client disconnects mid-stream, reconnect to the same URL and send the last observed ID as Last-Event-ID. Nimbus replays from that offset when the upstream supports it.

Caveat. OpenAI-family upstreams do not support mid-stream resume. Nimbus will reject the reconnect with 400 resume_not_supported for those models. Treat resume as a best-effort optimization; always be prepared to re-issue the full request.
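The reconnect policy can be sketched independently of any HTTP library. Everything named here is hypothetical: open_stream stands in for your transport and is assumed to return (HTTP status, error code or None, iterable of (event id, data) pairs), and a dropped connection is assumed to surface as ConnectionError while iterating.

```python
from typing import Callable, Dict, Iterable, Iterator, Optional, Tuple

# Hypothetical transport signature: extra request headers in,
# (status, error_code, events) out.
OpenStream = Callable[
    [Dict[str, str]], Tuple[int, Optional[str], Iterable[Tuple[str, str]]]
]


class ResumeNotSupported(Exception):
    """The upstream cannot resume; re-issue the full request instead."""


def stream_with_resume(open_stream: OpenStream, max_reconnects: int = 3) -> Iterator[str]:
    """Yield data payloads, resuming with Last-Event-ID after a drop."""
    last_id: Optional[str] = None
    reconnects = 0
    while True:
        headers = {"Last-Event-ID": last_id} if last_id is not None else {}
        status, code, events = open_stream(headers)
        if status == 400 and code == "resume_not_supported":
            raise ResumeNotSupported("discard partial output and re-issue the request")
        if status != 200:
            raise RuntimeError(f"unexpected HTTP status {status}")
        try:
            for event_id, data in events:
                last_id = event_id  # remember the resume offset
                yield data
            return  # stream ended normally
        except ConnectionError:
            reconnects += 1
            if reconnects > max_reconnects:
                raise
```

Callers that catch ResumeNotSupported should throw away whatever they buffered and send the original request again, since resume is best-effort.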

Errors mid-stream

If the upstream fails after the first byte, the HTTP status is already 200 and cannot change. Nimbus emits a terminal event: error with the same JSON body shape as a non-streaming error, then closes the connection. Handle it in your stream loop — do not assume every stream ends with [DONE] or message_stop.

event: error
data: {"type":"error","error":{"type":"upstream_error","message":"Upstream anthropic disconnected after 812 output tokens.","code":"upstream_disconnect","request_id":"req_01H9..."}}
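A stream loop can enforce both rules with a thin wrapper: raise on event: error, and raise if the connection closes without a terminal event. This is a sketch only. StreamError and guard_stream are hypothetical names, events are assumed to be (event name or None, data) pairs, and incomplete_stream is a client-side label invented here, not a Nimbus error type.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple

Event = Tuple[Optional[str], str]


class StreamError(Exception):
    """Carries the fields of the error object from an event: error frame."""

    def __init__(self, error: dict):
        super().__init__(error.get("message", "stream error"))
        self.type = error.get("type")
        self.code = error.get("code")
        self.request_id = error.get("request_id")


def guard_stream(events: Iterable[Event]) -> Iterator[Event]:
    """Pass events through; raise on error or on a missing terminal event."""
    for name, data in events:
        if name == "error":
            raise StreamError(json.loads(data)["error"])
        yield name, data
        if name == "message_stop" or data.strip() == "[DONE]":
            return  # clean end for either wire format
    # The connection closed without [DONE] or message_stop.
    raise StreamError(
        {"type": "incomplete_stream", "message": "stream closed without a terminal event"}
    )
```

Logging request_id from the exception gives you something concrete to quote when reporting the failure.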

Edge cases

  • Disable buffering. If you run behind Nginx, set proxy_buffering off for the endpoint or the client will see the stream in giant delayed bursts.
  • curl requires -N. Without -N (no-buffer), curl buffers the full response before printing.
  • Idle timeouts. Nimbus emits ping events every ~15s on long generations. Do not set an inactivity timeout below 30s on your reader.
  • Usage in streaming. Final token counts arrive on the terminal chunk (OpenAI, in usage) or the message_delta (Anthropic, in usage.output_tokens). If you need billing precision, read that final event before closing.