Chat Completions
POST /v1/chat/completions — the OpenAI-compatible entrypoint. Works with the OpenAI SDK, LangChain, LiteLLM, and anything that speaks the OpenAI wire format.
Endpoint
POST https://llm.nimbusapi.net/v1/chat/completions
Parameters
| name | type | required | description |
|---|---|---|---|
| model | string | yes | Model ID (e.g. openai/gpt-5.1, anthropic/claude-opus-4.5, google/gemini-3-pro). |
| messages | array | yes | Ordered conversation. Each item has role (system, user, assistant, or tool) and content. |
| max_tokens | integer | no | Upper bound on generated tokens. Nimbus caps at the model's context limit if unset. |
| temperature | number | no | 0.0 to 2.0. Lower = more deterministic. Default varies by model. |
| top_p | number | no | Nucleus sampling threshold, 0.0 to 1.0. Prefer temperature OR top_p, not both. |
| stop | string or array | no | Up to 4 stop sequences. Generation halts before emitting any of them. |
| tools | array | no | Function definitions the model may call. See the Function Calling reference. |
| tool_choice | string or object | no | One of auto, none, required, or { type: 'function', function: { name } }. |
| response_format | object | no | Object with type set to 'text', 'json_object', or 'json_schema', plus an optional json_schema object. |
| stream | boolean | no | When true, response is a Server-Sent Events stream of chunks. See Streaming. |
| user | string | no | Stable per-end-user string. Aids Nimbus abuse tracking and appears in usage exports. |
| seed | integer | no | Best-effort determinism. Not all upstreams honor it — Nimbus passes it through. |
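The ranges in the table above can be checked client-side before a request is sent. The following is a minimal sketch, not part of any Nimbus SDK: `build_chat_request` is a hypothetical helper, and treating an empty `messages` list as missing is an assumption.

```python
from typing import Any


def build_chat_request(
    model: str,
    messages: list[dict[str, Any]],
    **options: Any,
) -> dict[str, Any]:
    """Return a request body, rejecting values outside the documented ranges."""
    if not model:
        raise ValueError("model is required")
    # Assumption: an empty conversation is treated the same as a missing one.
    if not messages:
        raise ValueError("messages is required")

    temperature = options.get("temperature")
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        raise ValueError("temperature must be between 0.0 and 2.0")

    top_p = options.get("top_p")
    if top_p is not None and not 0.0 <= top_p <= 1.0:
        raise ValueError("top_p must be between 0.0 and 1.0")

    stop = options.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        raise ValueError("stop accepts at most 4 sequences")

    # Drop unset options so the server applies its own defaults.
    body: dict[str, Any] = {"model": model, "messages": messages}
    body.update({key: value for key, value in options.items() if value is not None})
    return body
```

Calling `build_chat_request("openai/gpt-5.1", messages, temperature=0.2)` returns a dictionary that can be serialized with `json.dumps` and sent as the POST body.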
Request body
{
"model": "openai/gpt-5.1",
"messages": [
{ "role": "system", "content": "You are a terse ops assistant." },
{ "role": "user", "content": "Summarize the last deploy log in one sentence." }
],
"max_tokens": 200,
"temperature": 0.2,
"top_p": 1,
"stop": ["\n\nEND"],
"response_format": { "type": "text" },
"seed": 42,
"user": "internal-user-4711"
}
Response body
{
"id": "chatcmpl_01H8VXQZ3P4E5N6Y7K8B9M0F1G",
"object": "chat.completion",
"created": 1735689600,
"model": "openai/gpt-5.1",
"choices": [
{
"index": 0,
"message": {
"role": "assistant",
"content": "Deploy 8f2a1 succeeded in 42s; 0 warnings, 0 rollbacks."
},
"finish_reason": "stop"
}
],
"usage": {
"prompt_tokens": 38,
"completion_tokens": 17,
"total_tokens": 55
}
}
- finish_reason: one of stop, length, tool_calls, content_filter.
- usage: token accounting. Billing is prompt_tokens plus completion_tokens at the model's per-token rate.
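To make the two fields concrete, here is a small sketch that reads the first choice, flags truncation, and estimates cost from `usage`. The function name and the rate parameters are hypothetical; passing separate prompt and completion rates is an assumption, and the same value can be used for both if the model has a single rate.

```python
from typing import Any


def summarize_completion(
    response: dict[str, Any],
    prompt_rate: float,
    completion_rate: float,
) -> dict[str, Any]:
    """Extract the reply text, a truncation flag, and an estimated cost."""
    choice = response["choices"][0]
    usage = response["usage"]
    # Rates are per token and supplied by the caller (hypothetical values).
    cost = (
        usage["prompt_tokens"] * prompt_rate
        + usage["completion_tokens"] * completion_rate
    )
    return {
        "text": choice["message"]["content"],
        # "length" indicates generation stopped at the token limit.
        "truncated": choice["finish_reason"] == "length",
        "cost": cost,
    }
```

A caller would typically retry with a larger `max_tokens` when `truncated` is true.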
Basic call
curl -sS https://llm.nimbusapi.net/v1/chat/completions \
-H "Authorization: Bearer $NIMBUS_API_KEY" \
-H "Content-Type: application/json" \
-d '{
"model": "openai/gpt-5.1",
"messages": [{"role":"user","content":"Say hi in 3 words."}],
"max_tokens": 32
}'
Tool calling
Attach a tools array. If the model decides to call one, the assistant message returns a tool_calls array with the function name and JSON-encoded arguments. Feed the result back as a role: "tool" message on the next turn.
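The round trip described above can be sketched in Python as follows. This is an illustration under assumptions: `get_weather` is a stand-in with a made-up return value, the `TOOLS` registry is hypothetical, and the `tool_call_id` field on the tool message follows the OpenAI wire format rather than anything stated on this page.

```python
import json
from typing import Any, Callable


def get_weather(city: str, unit: str = "c") -> dict[str, Any]:
    # Hypothetical stand-in; a real tool would query a weather service.
    return {"city": city, "unit": unit, "temperature": 4}


# Hypothetical registry mapping function names to local callables.
TOOLS: dict[str, Callable[..., Any]] = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict[str, Any]) -> list[dict[str, Any]]:
    """Execute each requested tool and return role: "tool" messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        function = TOOLS[call["function"]["name"]]
        # Arguments arrive as a JSON-encoded string, not as an object.
        arguments = json.loads(call["function"]["arguments"])
        results.append(
            {
                "role": "tool",
                # Assumption: tool results are matched to calls by id.
                "tool_call_id": call["id"],
                "content": json.dumps(function(**arguments)),
            }
        )
    return results
```

The next request would then carry the original messages, the assistant message, and the returned tool messages, in that order.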
{
"model": "openai/gpt-5.1",
"messages": [
{ "role": "user", "content": "What is the weather in Reykjavik?" }
],
"tools": [
{
"type": "function",
"function": {
"name": "get_weather",
"description": "Get the current weather for a city.",
"parameters": {
"type": "object",
"properties": {
"city": { "type": "string" },
"unit": { "type": "string", "enum": ["c", "f"] }
},
"required": ["city"]
}
}
}
],
"tool_choice": "auto"
}
{
"id": "chatcmpl_01H8VXQZ3P4E5N6Y7K8B9M0F1G",
"object": "chat.completion",
"created": 1735689600,
"model": "openai/gpt-5.1",
"choices": [{
"index": 0,
"message": {
"role": "assistant",
"content": null,
"tool_calls": [{
"id": "call_01H8ABCXYZ",
"type": "function",
"function": {
"name": "get_weather",
"arguments": "{\"city\":\"Reykjavik\",\"unit\":\"c\"}"
}
}]
},
"finish_reason": "tool_calls"
}],
"usage": { "prompt_tokens": 62, "completion_tokens": 24, "total_tokens": 86 }
}
JSON mode
Set response_format to constrain output to valid JSON. With json_schema and strict: true, the response is guaranteed to parse against the schema.
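On the client side, the message content still arrives as a string and has to be decoded. A minimal sketch, assuming the content holds an object with a string `name` and a non-negative integer `age`; `parse_person` is a hypothetical helper, and the extra checks are only a safety net for cases where strict schema enforcement is not in effect.

```python
import json
from typing import Any


def parse_person(response: dict[str, Any]) -> dict[str, Any]:
    """Decode the first choice's content and check the expected fields."""
    content = response["choices"][0]["message"]["content"]
    person = json.loads(content)  # raises json.JSONDecodeError on invalid JSON
    if not isinstance(person, dict):
        raise ValueError("expected a JSON object")
    if not isinstance(person.get("name"), str):
        raise ValueError("name must be a string")
    age = person.get("age")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(age, bool) or not isinstance(age, int) or age < 0:
        raise ValueError("age must be a non-negative integer")
    return person
```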
{
"model": "openai/gpt-5.1",
"messages": [
{ "role": "system", "content": "Return only JSON matching the requested schema." },
{ "role": "user", "content": "Give me a person with name and age." }
],
"response_format": {
"type": "json_schema",
"json_schema": {
"name": "person",
"strict": true,
"schema": {
"type": "object",
"properties": {
"name": { "type": "string" },
"age": { "type": "integer", "minimum": 0 }
},
"required": ["name", "age"],
"additionalProperties": false
}
}
}
}
Streaming
Set stream: true. Response is text/event-stream. Each chunk is a JSON delta; the terminal event is a literal data: [DONE]. Full event catalog in the Streaming reference.
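Consuming the stream comes down to reading `data:` lines, stopping at `[DONE]`, and concatenating each delta's content. The sketch below works on any iterable of decoded lines, so it is independent of the HTTP client; `iter_content` is a hypothetical name, and it handles only the content deltas, not the full event catalog.

```python
import json
from typing import Iterable, Iterator


def iter_content(lines: Iterable[str]) -> Iterator[str]:
    """Yield content fragments from Server-Sent Events lines until [DONE]."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separators and non-data fields
        payload = line[len("data:"):].strip()
        if payload == "[DONE]":
            break  # terminal event, no JSON to parse
        chunk = json.loads(payload)
        for choice in chunk.get("choices", []):
            text = choice.get("delta", {}).get("content")
            if text:
                yield text
```

Joining the fragments with `"".join(iter_content(lines))` reassembles the full assistant message.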
data: {"id":"chatcmpl_01H8...","object":"chat.completion.chunk","created":1735689600,"model":"openai/gpt-5.1","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":null}]}
data: {"id":"chatcmpl_01H8...","object":"chat.completion.chunk","created":1735689600,"model":"openai/gpt-5.1","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}
data: {"id":"chatcmpl_01H8...","object":"chat.completion.chunk","created":1735689600,"model":"openai/gpt-5.1","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}
data: [DONE]
Edge cases
- System message placement. Only the first system message counts on Anthropic-family models — later system messages are prepended to the closest user message. Consolidate up front for portable behavior.
- temperature vs top_p. Setting both is legal, but both are applied and their effects compound. Pick one and leave the other at its default.
- response_format on models that don't support JSON schema. Nimbus falls back to json_object and injects the schema into the system prompt. Set strict: false to opt out of fallback and get a 400 unsupported_parameter instead.
- max_tokens = 0. Legal. Returns a completion with finish_reason: length and an empty content string. Useful for cost estimation via usage.prompt_tokens.
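The advice on system message placement in the list above can be applied before sending. A minimal sketch, assuming every system message has plain string content; `consolidate_system_messages` is a hypothetical helper, and joining with a blank line is an arbitrary choice.

```python
from typing import Any


def consolidate_system_messages(messages: list[dict[str, Any]]) -> list[dict[str, Any]]:
    """Merge all system messages into one placed at the start of the conversation."""
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    others = [m for m in messages if m["role"] != "system"]
    if not system_parts:
        return others
    # Keep the original order of the system instructions when joining.
    merged = {"role": "system", "content": "\n\n".join(system_parts)}
    return [merged] + others
```

The relative order of user, assistant, and tool messages is left unchanged.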
Error codes
See the Errors reference for the full catalog. Endpoint-specific:
- 400 invalid_request (code: context_length_exceeded) when prompt plus max_tokens exceeds the model window.
- 400 invalid_request (code: unsupported_parameter) when a parameter is not supported by the selected model.
- 400 invalid_request (code: tool_call_parse_error) when the model emits invalid JSON in a tool call. See Errors → Edge cases.
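A client can branch on the `code` value to decide how to recover. The sketch below is illustrative only: the shape of the error body (an `error` object containing `code`) is an assumption not confirmed on this page, and the suggested actions are examples rather than documented Nimbus guidance.

```python
from typing import Any

# Illustrative recovery hints keyed by the endpoint-specific codes.
RECOVERY_HINTS = {
    "context_length_exceeded": "shorten the prompt or lower max_tokens",
    "unsupported_parameter": "remove the parameter or choose another model",
    "tool_call_parse_error": "retry the request",
}


def suggest_recovery(status: int, body: dict[str, Any]) -> str:
    """Map an error response to a human-readable recovery hint."""
    # Assumption: error details are wrapped in a top-level "error" object.
    error = body.get("error") or {}
    code = error.get("code")
    if status == 400 and code in RECOVERY_HINTS:
        return RECOVERY_HINTS[code]
    return "see the Errors reference"
```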