Streaming
Both endpoints stream responses as Server-Sent Events (SSE). Set stream: true in the request body and read the text/event-stream response.
Two formats, one transport
Nimbus emits the wire format the endpoint owns:
- /v1/chat/completions: OpenAI chunk format. Every event line is a data: line. The terminal event is the literal data: [DONE].
- /v1/messages: Anthropic typed events. Each event has an event: line and a data: line. There is no [DONE] sentinel; message_stop terminates the stream.
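Because both formats share the SSE transport, one framing parser can serve both endpoints. The following is a minimal sketch, not Nimbus client code: `SSEEvent` and `iter_sse_events` are hypothetical names, and it assumes standard SSE framing (events separated by a blank line, comment lines starting with a colon).

```python
from typing import Iterable, Iterator, List, NamedTuple, Optional


class SSEEvent(NamedTuple):
    event: str          # "message" when the server sent no event: line
    data: str           # payload after "data: ", multiple data lines joined by "\n"
    id: Optional[str]   # last id: value seen so far, if any


def iter_sse_events(lines: Iterable[str]) -> Iterator[SSEEvent]:
    """Group raw SSE lines into events; a blank line ends an event."""
    event = "message"
    data: List[str] = []
    last_id: Optional[str] = None
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            if data:
                yield SSEEvent(event, "\n".join(data), last_id)
            event, data = "message", []
            continue
        if line.startswith(":"):
            continue  # SSE comment line
        field, _, value = line.partition(":")
        if value.startswith(" "):
            value = value[1:]  # strip the single optional space after the colon
        if field == "event":
            event = value
        elif field == "data":
            data.append(value)
        elif field == "id":
            last_id = value
    if data:
        # Lenient choice for this sketch: flush an event that was not
        # followed by a blank line before the stream ended.
        yield SSEEvent(event, "\n".join(data), last_id)
```

OpenAI-format events come out with `event == "message"` and the chunk JSON (or `[DONE]`) in `data`; Anthropic-format events carry the typed name in `event`.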
OpenAI: /v1/chat/completions stream
data: {"id":"chatcmpl_01H8...","object":"chat.completion.chunk","created":1735689600,"model":"openai/gpt-5.1","choices":[{"index":0,"delta":{"role":"assistant","content":""},"finish_reason":null}]}
data: {"id":"chatcmpl_01H8...","object":"chat.completion.chunk","created":1735689600,"model":"openai/gpt-5.1","choices":[{"index":0,"delta":{"content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl_01H8...","object":"chat.completion.chunk","created":1735689600,"model":"openai/gpt-5.1","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}
data: {"id":"chatcmpl_01H8...","object":"chat.completion.chunk","created":1735689600,"model":"openai/gpt-5.1","choices":[{"index":0,"delta":{},"finish_reason":"stop"}],"usage":{"prompt_tokens":12,"completion_tokens":2,"total_tokens":14}}
data: [DONE]
Event catalog
| line | when | payload |
|---|---|---|
| data: { ... chunk ... } | Every delta (0+ per response). | OpenAI chat.completion.chunk shape. First chunk carries role: 'assistant', later chunks carry content or tool_calls deltas, terminal chunk carries finish_reason and usage. |
| data: [DONE] | Exactly once, at end. | Literal string. Signals end-of-stream. Not JSON — do not try to parse. |
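As a rough illustration of the catalog above, the sketch below folds a sequence of data: payloads into the final text, finish_reason, and usage. `collect_openai_stream` is a hypothetical helper, and it assumes the payloads have already been stripped of the data: prefix.

```python
import json
from typing import Any, Dict, Iterable, List, Optional


def collect_openai_stream(payloads: Iterable[str]) -> Dict[str, Any]:
    """Fold OpenAI-format chunk payloads into one result dict."""
    text: List[str] = []
    finish_reason: Optional[str] = None
    usage: Optional[dict] = None
    done = False
    for payload in payloads:
        if payload == "[DONE]":
            done = True  # literal sentinel, not JSON: never pass it to json.loads
            break
        chunk = json.loads(payload)
        if chunk.get("usage"):
            usage = chunk["usage"]  # arrives on the terminal chunk
        for choice in chunk.get("choices", []):
            delta = choice.get("delta") or {}
            if delta.get("content"):
                text.append(delta["content"])
            if choice.get("finish_reason"):
                finish_reason = choice["finish_reason"]
    return {
        "text": "".join(text),
        "finish_reason": finish_reason,
        "usage": usage,
        "done": done,
    }
```

A result with `done` still False means the stream ended without the sentinel and should be treated as incomplete.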
Tool-call streaming
Tool calls stream as partial arguments strings. Concatenate the string fragments across chunks; the resulting JSON is only guaranteed well-formed after finish_reason: tool_calls.
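The concatenation rule can be sketched as follows. `accumulate_tool_calls` is a hypothetical helper that takes already-parsed chunk objects; it assumes finish_reason sits on choices[0], as in the chunk samples earlier on this page, and it refuses to parse the arguments until finish_reason is tool_calls.

```python
import json
from typing import Any, Dict, Iterable, List


def accumulate_tool_calls(chunks: Iterable[dict]) -> List[Dict[str, Any]]:
    """Join partial `arguments` strings per tool-call index, then parse."""
    calls: Dict[int, Dict[str, Any]] = {}
    finish_reason = None
    for chunk in chunks:
        choice = chunk["choices"][0]
        delta = choice.get("delta") or {}
        for tc in delta.get("tool_calls") or []:
            slot = calls.setdefault(
                tc["index"], {"id": None, "name": None, "arguments": ""}
            )
            if tc.get("id"):
                slot["id"] = tc["id"]
            fn = tc.get("function") or {}
            if fn.get("name"):
                slot["name"] = fn["name"]
            slot["arguments"] += fn.get("arguments") or ""
        finish_reason = choice.get("finish_reason") or finish_reason
    if finish_reason != "tool_calls":
        # The joined string is not guaranteed to be valid JSON yet.
        raise ValueError("stream ended before finish_reason 'tool_calls'")
    return [
        {"id": c["id"], "name": c["name"], "arguments": json.loads(c["arguments"])}
        for _, c in sorted(calls.items())
    ]
```

Keying the accumulator by `index` keeps fragments separate when a response contains more than one tool call.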
data: {..."delta":{"role":"assistant","content":null,"tool_calls":[{"index":0,"id":"call_abc","type":"function","function":{"name":"get_weather","arguments":""}}]}}
data: {..."delta":{"tool_calls":[{"index":0,"function":{"arguments":"{\"ci"}}]}}
data: {..."delta":{"tool_calls":[{"index":0,"function":{"arguments":"ty\":\"Reykjavik\"}"}}]}}
data: {..."delta":{},"finish_reason":"tool_calls"}
data: [DONE]
Anthropic: /v1/messages stream
event: message_start
data: {"type":"message_start","message":{"id":"msg_01H8...","type":"message","role":"assistant","model":"anthropic/claude-opus-4.5","content":[],"stop_reason":null,"stop_sequence":null,"usage":{"input_tokens":41,"output_tokens":0}}}
event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"text","text":""}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":"Hello"}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"text_delta","text":" world"}}
event: content_block_stop
data: {"type":"content_block_stop","index":0}
event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"end_turn","stop_sequence":null},"usage":{"output_tokens":2}}
event: message_stop
data: {"type":"message_stop"}
Event catalog
| event | when | payload |
|---|---|---|
| message_start | Once, at start. | Full message envelope with empty content and initial input_tokens. |
| content_block_start | Per block. | Emitted before each content block. content_block.type is text or tool_use. |
| content_block_delta | Many per block. | delta.type is text_delta (text field) or input_json_delta (partial_json field for tool inputs). |
| content_block_stop | Per block. | Emitted when a block finishes. |
| message_delta | Once, near end. | delta carries final stop_reason + stop_sequence; usage carries output_tokens. |
| message_stop | Exactly once, at end. | Terminal event. No further data will arrive. |
| ping | Heartbeat, ~every 15s. | Keeps proxies from closing the connection. Safe to ignore. |
| error | On upstream failure mid-stream. | Same JSON shape as a non-streaming error body. HTTP status remains 200 because headers already flushed. |
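One way to act on this catalog is a small dispatcher keyed on the event name. The sketch below is illustrative only: `consume_anthropic_events` is a hypothetical helper that takes (event name, data string) pairs and tracks text, stop_reason, and usage.

```python
import json
from typing import Any, Dict, Iterable, Tuple


def consume_anthropic_events(events: Iterable[Tuple[str, str]]) -> Dict[str, Any]:
    """Fold Anthropic-format typed events into one state dict."""
    state: Dict[str, Any] = {
        "text": "", "stop_reason": None, "usage": {}, "stopped": False, "error": None,
    }
    for name, data in events:
        if name == "ping":
            continue  # heartbeat, safe to ignore
        body = json.loads(data)
        if name == "message_start":
            # Initial usage carries input_tokens.
            state["usage"].update(body["message"].get("usage", {}))
        elif name == "content_block_delta":
            delta = body["delta"]
            if delta["type"] == "text_delta":
                state["text"] += delta["text"]
        elif name == "message_delta":
            state["stop_reason"] = body["delta"].get("stop_reason")
            # Final usage carries output_tokens.
            state["usage"].update(body.get("usage", {}))
        elif name == "message_stop":
            state["stopped"] = True
            break
        elif name == "error":
            state["error"] = body["error"]
            break
    return state
```

Merging the two usage payloads yields both input_tokens (from message_start) and the final output_tokens (from message_delta) in one dict.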
Tool-use streaming
Tool inputs stream as input_json_delta events carrying partial_json string fragments. Concatenate them across the block; parse only after content_block_stop.
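A minimal sketch of that rule: buffer partial_json per block index and parse only when content_block_stop arrives for that index. `collect_tool_inputs` is a hypothetical helper operating on already-parsed data objects.

```python
import json
from typing import Any, Dict, Iterable, List


def collect_tool_inputs(events: Iterable[dict]) -> List[Dict[str, Any]]:
    """Return finished tool_use blocks with their parsed input."""
    open_blocks: Dict[int, Dict[str, str]] = {}
    finished: List[Dict[str, Any]] = []
    for body in events:
        kind = body["type"]
        if kind == "content_block_start":
            block = body["content_block"]
            if block["type"] == "tool_use":
                open_blocks[body["index"]] = {
                    "id": block["id"], "name": block["name"], "json": "",
                }
        elif kind == "content_block_delta":
            delta = body["delta"]
            if delta["type"] == "input_json_delta" and body["index"] in open_blocks:
                open_blocks[body["index"]]["json"] += delta["partial_json"]
        elif kind == "content_block_stop" and body["index"] in open_blocks:
            block = open_blocks.pop(body["index"])
            # Only now is the concatenated string expected to be complete JSON.
            # An empty buffer is treated as an empty input object (assumption).
            parsed = json.loads(block["json"] or "{}")
            finished.append({"id": block["id"], "name": block["name"], "input": parsed})
    return finished
```

Blocks that never receive content_block_stop stay in the buffer and are not returned, so a truncated stream does not produce half-parsed inputs.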
event: content_block_start
data: {"type":"content_block_start","index":0,"content_block":{"type":"tool_use","id":"toolu_01A","name":"get_weather","input":{}}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"input_json_delta","partial_json":"{\"ci"}}
event: content_block_delta
data: {"type":"content_block_delta","index":0,"delta":{"type":"input_json_delta","partial_json":"ty\":\"Reykjavik\"}"}}
event: content_block_stop
data: {"type":"content_block_stop","index":0}
event: message_delta
data: {"type":"message_delta","delta":{"stop_reason":"tool_use"}}
event: message_stop
data: {"type":"message_stop"}
SDK examples
from openai import OpenAI

# Point the OpenAI SDK at the Nimbus base URL.
client = OpenAI(base_url="https://llm.nimbusapi.net/v1", api_key="sk-nim-YOUR_KEY")

stream = client.chat.completions.create(
    model="openai/gpt-5.1",
    messages=[{"role": "user", "content": "Count to 5."}],
    max_tokens=64,
    stream=True,
)

for chunk in stream:
    delta = chunk.choices[0].delta
    if delta.content:
        print(delta.content, end="", flush=True)  # print text as it arrives
    if chunk.choices[0].finish_reason:
        print()  # final newline
        break
Reconnection
SSE supports resumption via the Last-Event-ID header. Nimbus emits an id: line before every data: for the Anthropic format. If your client disconnects mid-stream, reconnect to the same URL and send the last observed ID as Last-Event-ID — Nimbus replays from that offset when the upstream supports it.
Caveat. OpenAI-family upstreams do not support mid-stream resume. Nimbus will reject the reconnect with 400 resume_not_supported for those models. Treat resume as a best-effort optimization; always be prepared to re-issue the full request.
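The resume-then-fallback policy might look like the sketch below. `reconnect` and the `send` callable are hypothetical: `send(headers)` stands in for your HTTP layer and is assumed to return the status code plus the error code extracted from the response body, if any.

```python
from typing import Callable, Dict, Optional, Tuple

# send(headers) -> (http_status, error_code_or_None); hypothetical transport hook
Send = Callable[[Dict[str, str]], Tuple[int, Optional[str]]]


def reconnect(last_event_id: Optional[str], send: Send) -> str:
    """Try to resume with Last-Event-ID; fall back to a full re-issue."""
    if last_event_id:
        status, code = send({"Last-Event-ID": last_event_id})
        if status == 200:
            return "resumed"
        if not (status == 400 and code == "resume_not_supported"):
            raise RuntimeError(f"resume failed: {status} {code}")
    # No ID to resume from, or the upstream cannot resume:
    # send the original request again without the header.
    status, code = send({})
    if status != 200:
        raise RuntimeError(f"request failed: {status} {code}")
    return "restarted"
```

On "restarted", discard any partial output collected before the disconnect, since the new stream starts from the beginning.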
Errors mid-stream
If the upstream fails after the first byte, the HTTP status is already 200 and cannot change. Nimbus emits a terminal event: error with the same JSON body shape as a non-streaming error, then closes the connection. Handle it in your stream loop — do not assume every stream ends with [DONE] or message_stop.
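A stream loop therefore has three possible endings: a terminal marker, an error event, or a silent cut. The sketch below distinguishes them; `StreamError` and `stream_outcome` are hypothetical names, and the input is assumed to be (event name, data string) pairs where OpenAI-format events use the default name "message".

```python
import json
from typing import Iterable, Tuple


class StreamError(Exception):
    """Raised when the stream carries a terminal error event."""

    def __init__(self, error: dict):
        super().__init__(error.get("message", "stream error"))
        self.code = error.get("code")
        self.request_id = error.get("request_id")


def stream_outcome(events: Iterable[Tuple[str, str]]) -> str:
    """Return 'complete' or 'truncated'; raise StreamError on an error event."""
    for name, data in events:
        if name == "error":
            raise StreamError(json.loads(data)["error"])
        if name == "message_stop" or data == "[DONE]":
            return "complete"
    # Connection closed with neither a terminal marker nor an error event.
    return "truncated"
```

Logging `request_id` from the raised exception gives support a handle on the failed request even though the HTTP status was 200.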
event: error
data: {"type":"error","error":{"type":"upstream_error","message":"Upstream anthropic disconnected after 812 output tokens.","code":"upstream_disconnect","request_id":"req_01H9..."}}
Edge cases
- Disable buffering. If you run behind Nginx, set proxy_buffering off for the endpoint, or the client will see the stream in giant delayed bursts.
- curl requires -N. Without -N (--no-buffer), curl buffers its output, so events can show up late and in batches.
- Idle timeouts. Nimbus emits ping events every ~15s on long generations. Do not set an inactivity timeout below 30s on your reader.
- Usage in streaming. Final token counts arrive on the terminal chunk (OpenAI, in usage) or the message_delta event (Anthropic, in usage.output_tokens). If you need billing precision, read that final event before closing.
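For a reader built on the standard library, the idle-timeout advice could be applied as in the sketch below. `build_stream_request`, `iter_stream_lines`, and `READ_TIMEOUT_S` are hypothetical names, and the Bearer authorization scheme is an assumption based on how the OpenAI SDK sends its api_key.

```python
import json
import urllib.request
from typing import Iterator

# Comfortably above the 30s floor; pings are described as arriving every ~15s.
READ_TIMEOUT_S = 60


def build_stream_request(url: str, api_key: str, body: dict) -> urllib.request.Request:
    """Build a POST request that asks for a streamed response."""
    payload = dict(body, stream=True)
    return urllib.request.Request(
        url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",  # assumed auth scheme
            "Content-Type": "application/json",
            "Accept": "text/event-stream",
        },
        method="POST",
    )


def iter_stream_lines(req: urllib.request.Request) -> Iterator[str]:
    """Yield decoded lines; raises if the socket is idle past the timeout."""
    # The timeout applies to each blocking socket operation, so it acts as
    # an inactivity limit rather than a cap on total stream duration.
    with urllib.request.urlopen(req, timeout=READ_TIMEOUT_S) as resp:
        for raw in resp:
            yield raw.decode("utf-8").rstrip("\r\n")
```

Because heartbeats reset the idle clock, a timeout here most likely signals a dead connection rather than a slow generation.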