Streaming

Both endpoints stream over Server-Sent Events (SSE). Set stream: true in the request body and read the text/event-stream response.

Two formats, one transport

Nimbus emits the wire format native to each endpoint:

/v1/chat/completions: OpenAI chunk format. Every event is a single data: line carrying a JSON chunk. The terminal event is the literal data: [DONE].
/v1/messages: Anthropic typed events. Each event has an event: line and a data: line. There is no [DONE] sentinel; message_stop terminates the stream.
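Because both formats share the same transport, one small reader can feed either. The following is a minimal sketch, not Nimbus client code: iter_sse_events is a hypothetical helper name, and it assumes the response has already been decoded into text lines.

```python
from typing import Iterable, Iterator, Optional, Tuple


def _field_value(line: str, prefix: str) -> str:
    """Return the text after `prefix`, dropping one optional leading space."""
    value = line[len(prefix):]
    return value[1:] if value.startswith(" ") else value


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[Optional[str], str]]:
    """Yield (event_name, data) pairs from decoded SSE lines.

    event_name is None when an event has no `event:` line, which is the
    case for the OpenAI chunk format. Other fields (such as `id:`) are
    ignored in this sketch.
    """
    event_name = None
    data_parts = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates the current event.
            if data_parts:
                yield event_name, "\n".join(data_parts)
            event_name, data_parts = None, []
        elif line.startswith(":"):
            continue  # SSE comment line
        elif line.startswith("event:"):
            event_name = _field_value(line, "event:")
        elif line.startswith("data:"):
            data_parts.append(_field_value(line, "data:"))
    if data_parts:
        # Flush a final event that was not followed by a blank line.
        yield event_name, "\n".join(data_parts)
```

A caller would treat a data value of [DONE] (OpenAI format) or an event named message_stop (Anthropic format) as the end of the stream.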

OpenAI: /v1/chat/completions stream

data: {"id":"chatcmpl_01H8...","object":"chat.completion.chunk","created":1735689600,"model":"openai/gpt-5.1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}

data: {"id":"chatcmpl_01H8...","object":"chat.completion.chunk","created":1735689600,"model":"openai/gpt-5.1","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl_01H8...","object":"chat.completion.chunk","created":1735689600,"model":"openai/gpt-5.1","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}

data: {"id":"chatcmpl_01H8...","object":"chat.completion.chunk","created":1735689600,"model":"openai/gpt-5.1","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":2,"total_tokens":14}}

data: [DONE]

Event catalog

line	when	payload
data: { ... chunk ... }	Every delta (0+ per response).	OpenAI chat.completion.chunk shape. First chunk carries role: 'assistant', later chunks carry content or tool_calls deltas, terminal chunk carries finish_reason and usage.
data: [DONE]	Exactly once, at end.	Literal string. Signals end-of-stream. Not JSON — do not try to parse.
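The two rows above translate into a short consumer loop. This is a sketch under assumptions: collect_openai_stream is a hypothetical name, and it takes the already-extracted data: payload strings rather than a live HTTP response.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_openai_stream(
    payloads: Iterable[str],
) -> Tuple[str, Optional[str], Optional[dict]]:
    """Return (text, finish_reason, usage) from OpenAI-format data payloads."""
    text_parts = []
    finish_reason = None
    usage = None
    for payload in payloads:
        if payload.strip() == "[DONE]":
            break  # literal sentinel, not JSON
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]  # arrives on the terminal chunk
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("content"):
                text_parts.append(delta["content"])
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return "".join(text_parts), finish_reason, usage
```

Checking for the sentinel before calling json.loads is the important ordering: the [DONE] payload would otherwise raise a decode error.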

Tool-call streaming

Tool calls stream as partial arguments strings. Concatenate the string fragments across chunks; the resulting JSON is only guaranteed well-formed after finish_reason: tool_calls.

data: {..."delta":{"role":"assistant","content":null,"tool_calls":[{"index":0,"id":"call_abc","type":"function","function":{"name":"get_weather","arguments":""}}]}}

data: {..."delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"ci"}}]}}

data: {..."delta":{"tool_calls":[{"index":0,"function":{"arguments":"ty\":\"Reykjavik\"}"}}]}}

data: {..."delta":{},"finish_reason":"tool_calls"}

data: [DONE]
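The concatenation rule can be sketched as follows. The helper name collect_tool_calls is hypothetical, and the sketch assumes each chunk has the full chat.completion.chunk shape (with a choices array) that the example above abbreviates. Treating an empty arguments string as an empty object is also an assumption, not documented behavior.

```python
import json
from typing import Dict, Iterable, List


def collect_tool_calls(chunks: Iterable[dict]) -> List[dict]:
    """Assemble streamed tool calls, keyed by their `index` field."""
    calls: Dict[int, dict] = {}
    finished = False
    for chunk in chunks:
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            for frag in delta.get("tool_calls") or []:
                call = calls.setdefault(
                    frag["index"], {"id": None, "name": None, "arguments": ""}
                )
                if frag.get("id"):
                    call["id"] = frag["id"]
                fn = frag.get("function") or {}
                if fn.get("name"):
                    call["name"] = fn["name"]
                # Fragments are plain string pieces; append, do not parse yet.
                call["arguments"] += fn.get("arguments") or ""
            if choice.get("finish_reason") == "tool_calls":
                finished = True
    if not finished:
        raise ValueError("stream ended before finish_reason 'tool_calls'")
    # Only now is the concatenated JSON guaranteed to be well-formed.
    return [
        {
            "id": call["id"],
            "name": call["name"],
            # Assumption: an empty string means "no arguments".
            "arguments": json.loads(call["arguments"] or "{}"),
        }
        for _, call in sorted(calls.items())
    ]
```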

Anthropic: /v1/messages stream

event: message_start
data: {"type":"message_start","message":{"id":"msg_01H8...","type":"message","role":"assistant","model":"anthropic/claude-opus-4.5","content":[],"stop_reason":null,"stop_sequence":null,"usage":{"input_tokens":41,"output_tokens":0}}}

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" world"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"output_tokens":2}}

event: message_stop
data: {"type":"message_stop"}

Event catalog

event	when	payload
message_start	Once, at start.	Full message envelope with empty content and initial input_tokens.
content_block_start	Per block.	Emitted before each content block. content_block.type is text or tool_use.
content_block_delta	Many per block.	delta.type is text_delta (text field) or input_json_delta (partial_json field for tool inputs).
content_block_stop	Per block.	Emitted when a block finishes.
message_delta	Once, near end.	delta carries final stop_reason + stop_sequence; usage carries output_tokens.
message_stop	Exactly once, at end.	Terminal event. No further data will arrive.
ping	Heartbeat, ~every 15s.	Keeps proxies from closing the connection. Safe to ignore.
error	On upstream failure mid-stream.	Same JSON shape as a non-streaming error body. HTTP status remains 200 because headers already flushed.
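One way to read the catalog is as a dispatch table keyed on the event name. The sketch below assumes events arrive as (event_name, data_string) pairs; collect_anthropic_text is a hypothetical helper, and raising RuntimeError is an illustrative choice rather than anything Nimbus prescribes.

```python
import json
from typing import Iterable, Tuple


def collect_anthropic_text(events: Iterable[Tuple[str, str]]) -> dict:
    """Fold a typed-event stream into text, stop_reason, and token counts."""
    result = {
        "text": "",
        "stop_reason": None,
        "input_tokens": None,
        "output_tokens": None,
    }
    for name, data in events:
        if name == "ping":
            continue  # heartbeat, safe to ignore
        payload = json.loads(data)
        if name == "message_start":
            result["input_tokens"] = payload["message"]["usage"]["input_tokens"]
        elif name == "content_block_delta":
            delta = payload["delta"]
            if delta["type"] == "text_delta":
                result["text"] += delta["text"]
        elif name == "message_delta":
            result["stop_reason"] = payload["delta"].get("stop_reason")
            result["output_tokens"] = payload.get("usage", {}).get("output_tokens")
        elif name == "error":
            raise RuntimeError(payload["error"]["message"])
        elif name == "message_stop":
            return result  # terminal event
    raise RuntimeError("stream closed before message_stop")
```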

Tool-use streaming

Tool inputs stream as input_json_delta events carrying partial_json string fragments. Concatenate them across the block; parse only after content_block_stop.

event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"tool_use","id":"toolu_01A","name":"get_weather","input":{}}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"input_json_delta","partial_json":"{\"ci"}}

event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"input_json_delta","partial_json":"ty\":\"Reykjavik\"}"}}

event: content_block_stop
data: {"type":"content_block_stop","index":0}

event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"tool_use"}}

event: message_stop
data: {"type":"message_stop"}
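A possible way to buffer the fragments per block and parse only at content_block_stop is sketched here. collect_tool_uses is a hypothetical helper that takes already-parsed data payloads; treating an empty buffer as an empty input object is an assumption.

```python
import json
from typing import Dict, Iterable, List


def collect_tool_uses(payloads: Iterable[dict]) -> List[dict]:
    """Assemble tool_use blocks from parsed Anthropic-format event payloads."""
    open_blocks: Dict[int, dict] = {}
    finished: List[dict] = []
    for payload in payloads:
        kind = payload.get("type")
        if kind == "content_block_start":
            block = payload["content_block"]
            if block["type"] == "tool_use":
                open_blocks[payload["index"]] = {
                    "id": block["id"],
                    "name": block["name"],
                    "json": "",
                }
        elif kind == "content_block_delta":
            delta = payload["delta"]
            if delta["type"] == "input_json_delta":
                # Append the raw fragment; it is not valid JSON on its own.
                open_blocks[payload["index"]]["json"] += delta["partial_json"]
        elif kind == "content_block_stop" and payload["index"] in open_blocks:
            block = open_blocks.pop(payload["index"])
            finished.append(
                {
                    "id": block["id"],
                    "name": block["name"],
                    # Assumption: an empty buffer means an empty input object.
                    "input": json.loads(block["json"] or "{}"),
                }
            )
    return finished
```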

SDK examples

from openai import OpenAI

client = OpenAI(base_url="https://llm.nimbusapi.net/v1", api_key="sk-nim-YOUR_KEY")

stream = client.chat.completions.create(
    model="openai/gpt-5.1",
    messages=[{"role": "user", "content": "Count to 5."}],
    max_tokens=64,
    stream=True,
)

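for chunk in stream:
    if not chunk.choices:
        continue  # defensive: skip any chunk that carries no choices
    choice = chunk.choices[0]
    if choice.delta.content:
        # Print each text fragment as soon as it arrives.
        print(choice.delta.content, end="", flush=True)
    if choice.finish_reason:
        print()  # final newline
        break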

Reconnection

SSE supports resumption via the Last-Event-ID header. Nimbus emits an id: line before every data: for the Anthropic format. If your client disconnects mid-stream, reconnect to the same URL and send the last observed ID as Last-Event-ID — Nimbus replays from that offset when the upstream supports it.

Caveat. OpenAI-family upstreams do not support mid-stream resume. Nimbus will reject the reconnect with 400 resume_not_supported for those models. Treat resume as a best-effort optimization; always be prepared to re-issue the full request.
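The resume-then-fallback policy might look like the sketch below. Everything named here is hypothetical: open_stream stands in for whatever transport function issues the request (sending Last-Event-ID when given an ID) and yields (event_id, event_name, data) tuples, and ResumeNotSupported stands in for however that transport surfaces the 400 resume_not_supported response.

```python
from typing import Callable, Iterator, List, Optional, Tuple

Event = Tuple[Optional[str], str, str]  # (event_id, event_name, data)


class ResumeNotSupported(Exception):
    """Hypothetical: raised by the transport on 400 resume_not_supported."""


def read_with_resume(
    open_stream: Callable[[Optional[str]], Iterator[Event]],
    max_attempts: int = 3,
) -> List[Tuple[str, str]]:
    """Collect (event_name, data) pairs, resuming when possible."""
    events: List[Tuple[str, str]] = []
    last_id: Optional[str] = None
    for _ in range(max_attempts):
        try:
            for event_id, name, data in open_stream(last_id):
                events.append((name, data))
                if event_id is not None:
                    last_id = event_id
                if name == "message_stop":
                    return events
        except ResumeNotSupported:
            # Resume was rejected: discard partial output and re-issue
            # the full request on the next attempt.
            events.clear()
            last_id = None
        except ConnectionError:
            continue  # retry, sending the last observed ID
    raise RuntimeError("stream did not complete after retries")
```

Clearing the collected events on fallback matters: a full re-issue produces a fresh generation, so partial output from the dropped stream should not be stitched onto it.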

Errors mid-stream

If the upstream fails after the first byte, the HTTP status is already 200 and cannot change. Nimbus emits a terminal event: error with the same JSON body shape as a non-streaming error, then closes the connection. Handle it in your stream loop — do not assume every stream ends with [DONE] or message_stop.

event: error
data: {"type":"error","error":{"type":"upstream_error","message":"Upstream anthropic disconnected after 812 output tokens.","code":"upstream_disconnect","request_id":"req_01H9..."}}
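A stream loop can surface this event as an exception instead of silently ending. The following is a sketch only: StreamError and guard_stream are hypothetical names, events are assumed to arrive as (event_name, data_string) pairs, and the incomplete_stream label is a local invention for a connection that closes with no terminal event at all.

```python
import json
from typing import Iterable, Iterator, Optional, Tuple


class StreamError(Exception):
    """Hypothetical exception carrying the fields of a mid-stream error body."""

    def __init__(self, error: dict):
        super().__init__(error.get("message", "stream error"))
        self.type = error.get("type")
        self.code = error.get("code")
        self.request_id = error.get("request_id")


def guard_stream(
    events: Iterable[Tuple[Optional[str], str]],
) -> Iterator[Tuple[Optional[str], str]]:
    """Pass events through, raising on `error` or on a missing terminal event."""
    for name, data in events:
        if name == "error":
            raise StreamError(json.loads(data)["error"])
        yield name, data
        if name == "message_stop" or data == "[DONE]":
            return  # clean end of stream in either format
    # Local label, not a Nimbus error type.
    raise StreamError(
        {"type": "incomplete_stream", "message": "closed without a terminal event"}
    )
```

Keeping request_id on the exception makes it available for logging or a support ticket.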

Edge cases

Disable buffering. If you run behind Nginx, set proxy_buffering off for the endpoint or the client will see the stream in giant delayed bursts.
curl requires -N. Without -N (--no-buffer), curl buffers its output, so events print in delayed bursts instead of as they arrive.
Idle timeouts. Nimbus emits ping events every ~15s on long generations. Do not set an inactivity timeout below 30s on your reader.
Usage in streaming. Final token counts arrive on the terminal chunk (OpenAI, in usage) or the message_delta (Anthropic, in usage.output_tokens). If you need billing precision, read that final event before closing.