Chat Completions

POST /v1/chat/completions — the OpenAI-compatible entrypoint. Works with the OpenAI SDK, LangChain, LiteLLM, and anything that speaks the OpenAI wire format.

Endpoint

http
POST https://llm.nimbusapi.net/v1/chat/completions

Parameters

name             type             required  description
model            string           yes       Model ID (e.g. openai/gpt-5.1, anthropic/claude-opus-4.5, google/gemini-3-pro).
messages         array            yes       Ordered conversation. Each item has role (system|user|assistant|tool) and content.
max_tokens       integer          no        Upper bound on generated tokens. Nimbus caps at the model's context limit if unset.
temperature      number           no        0.0 to 2.0. Lower = more deterministic. Default varies by model.
top_p            number           no        Nucleus sampling threshold, 0.0 to 1.0. Prefer temperature OR top_p, not both.
stop             string | array   no        Up to 4 stop sequences. Generation halts before emitting any of them.
tools            array            no        Function definitions the model may call. See the Function Calling reference.
tool_choice      string | object  no        auto | none | required | { type: 'function', function: { name } }.
response_format  object           no        { type: 'text' | 'json_object' | 'json_schema', json_schema?: {...} }.
stream           boolean          no        When true, response is a Server-Sent Events stream of chunks. See Streaming.
user             string           no        Stable per-end-user string. Aids Nimbus abuse tracking and appears in usage exports.
seed             integer          no        Best-effort determinism. Nimbus passes it through, but not all upstreams honor it.
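
The documented ranges can be checked on the client before a request is sent. The following is a minimal sketch, not part of any Nimbus SDK: `validate_params` is a hypothetical helper name, and it covers only the constraints stated in the table above (required fields, the temperature and top_p ranges, and the limit of 4 stop sequences).

```python
def validate_params(payload: dict) -> list[str]:
    """Return a list of problems found; an empty list means nothing was flagged."""
    problems = []

    # model and messages are the only required fields in the table.
    for required in ("model", "messages"):
        if required not in payload:
            problems.append(f"missing required field: {required}")

    temperature = payload.get("temperature")
    if temperature is not None and not 0.0 <= temperature <= 2.0:
        problems.append("temperature must be between 0.0 and 2.0")

    top_p = payload.get("top_p")
    if top_p is not None and not 0.0 <= top_p <= 1.0:
        problems.append("top_p must be between 0.0 and 1.0")

    # stop may be a single string or an array of up to 4 sequences.
    stop = payload.get("stop")
    if isinstance(stop, list) and len(stop) > 4:
        problems.append("stop accepts at most 4 sequences")

    return problems
```

A client-side check like this only catches obvious mistakes early; the server remains the authority on what a given model accepts.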

Request body

json
{
  "model": "openai/gpt-5.1",
  "messages": [
    { "role": "system", "content": "You are a terse ops assistant." },
    { "role": "user", "content": "Summarize the last deploy log in one sentence." }
  ],
  "max_tokens": 200,
  "temperature": 0.2,
  "top_p": 1,
  "stop": ["\n\nEND"],
  "response_format": { "type": "text" },
  "seed": 42,
  "user": "internal-user-4711"
}

Response body

json
{
  "id": "chatcmpl_01H8VXQZ3P4E5N6Y7K8B9M0F1G",
  "object": "chat.completion",
  "created": 1735689600,
  "model": "openai/gpt-5.1",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": "Deploy 8f2a1 succeeded in 42s; 0 warnings, 0 rollbacks."
      },
      "finish_reason": "stop"
    }
  ],
  "usage": {
    "prompt_tokens": 38,
    "completion_tokens": 17,
    "total_tokens": 55
  }
}

  • finish_reason: one of stop, length, tool_calls, content_filter.
  • usage: token accounting. Billing is prompt_tokens plus completion_tokens at the model's per-token rate.
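
As an illustration of reading these fields, the sketch below pulls the text, checks whether generation was cut off, and estimates cost from usage. `read_completion` and `rate_per_token` are hypothetical names; the single flat rate is an assumption made for simplicity, so use the real per-token rate for the model being called.

```python
def read_completion(response: dict, rate_per_token: float) -> dict:
    """Summarize a chat.completion response body.

    rate_per_token is an assumed flat price per token (hypothetical value
    supplied by the caller), not something the API returns.
    """
    choice = response["choices"][0]
    usage = response["usage"]

    # Billing counts prompt tokens plus completion tokens.
    billed_tokens = usage["prompt_tokens"] + usage["completion_tokens"]

    return {
        "text": choice["message"].get("content"),
        # finish_reason "length" means max_tokens (or the context limit) cut the output short.
        "truncated": choice["finish_reason"] == "length",
        "billed_tokens": billed_tokens,
        "estimated_cost": billed_tokens * rate_per_token,
    }
```

Applied to the response body above, `billed_tokens` is 38 + 17 = 55 and `truncated` is false.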

Basic call

curl -sS https://llm.nimbusapi.net/v1/chat/completions \
  -H "Authorization: Bearer $NIMBUS_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "openai/gpt-5.1",
    "messages": [{"role":"user","content":"Say hi in 3 words."}],
    "max_tokens": 32
  }'
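
The same call can be made from Python with only the standard library. This is a rough equivalent of the curl command above, not an official client: `build_request` and `chat` are hypothetical helper names, and the 30-second timeout is an arbitrary choice.

```python
import json
import urllib.request

API_URL = "https://llm.nimbusapi.net/v1/chat/completions"


def build_request(payload: dict, api_key: str) -> urllib.request.Request:
    """Build the POST request with the same headers the curl example sends."""
    return urllib.request.Request(
        API_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def chat(payload: dict, api_key: str, timeout: float = 30.0) -> dict:
    """Send the request and decode the JSON response body."""
    with urllib.request.urlopen(build_request(payload, api_key), timeout=timeout) as resp:
        return json.load(resp)


# Example usage (performs a real network call, so it is left commented out):
# import os
# reply = chat(
#     {"model": "openai/gpt-5.1",
#      "messages": [{"role": "user", "content": "Say hi in 3 words."}],
#      "max_tokens": 32},
#     os.environ["NIMBUS_API_KEY"],
# )
# print(reply["choices"][0]["message"]["content"])
```

Because the endpoint is OpenAI-compatible, an OpenAI SDK client pointed at this base URL is likely the more convenient option in practice; the sketch only makes the wire format explicit.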

Tool calling

Attach a tools array. If the model decides to call one, the assistant message returns a tool_calls array with the function name and JSON-encoded arguments. Feed the result back as a role: "tool" message on the next turn.

json
{
  "model": "openai/gpt-5.1",
  "messages": [
    { "role": "user", "content": "What is the weather in Reykjavik?" }
  ],
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "get_weather",
        "description": "Get the current weather for a city.",
        "parameters": {
          "type": "object",
          "properties": {
            "city": { "type": "string" },
            "unit": { "type": "string", "enum": ["c", "f"] }
          },
          "required": ["city"]
        }
      }
    }
  ],
  "tool_choice": "auto"
}
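
When the model decides to call the function, the response carries a tool_calls array instead of content:

json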
{
  "id": "chatcmpl_01H8VXQZ3P4E5N6Y7K8B9M0F1G",
  "object": "chat.completion",
  "created": 1735689600,
  "model": "openai/gpt-5.1",
  "choices": [{
    "index": 0,
    "message": {
      "role": "assistant",
      "content": null,
      "tool_calls": [{
        "id": "call_01H8ABCXYZ",
        "type": "function",
        "function": {
          "name": "get_weather",
          "arguments": "{\"city\":\"Reykjavik\",\"unit\":\"c\"}"
        }
      }]
    },
    "finish_reason": "tool_calls"
  }],
  "usage": { "prompt_tokens": 62, "completion_tokens": 24, "total_tokens": 86 }
}
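
The round trip described above can be sketched as follows. This is an illustration under assumptions: `get_weather` here is a hypothetical local stand-in that returns placeholder data, `TOOLS` and `run_tool_calls` are made-up names, and the `tool_call_id` field on the tool message follows the OpenAI wire format, which this page does not spell out.

```python
import json


def get_weather(city: str, unit: str = "c") -> dict:
    # Hypothetical stand-in: a real implementation would query a weather service.
    return {"city": city, "unit": unit, "temperature": 4}


# Registry mapping function names from the tools array to local callables.
TOOLS = {"get_weather": get_weather}


def run_tool_calls(assistant_message: dict) -> list[dict]:
    """Execute each requested tool call and return role: "tool" messages."""
    results = []
    for call in assistant_message.get("tool_calls") or []:
        fn = TOOLS[call["function"]["name"]]
        # arguments arrives as a JSON-encoded string, not an object.
        args = json.loads(call["function"]["arguments"])
        results.append({
            "role": "tool",
            "tool_call_id": call["id"],  # assumed field, per the OpenAI wire format
            "content": json.dumps(fn(**args)),
        })
    return results
```

For the next turn, the messages array would be the original messages, followed by the assistant message containing tool_calls, followed by the messages returned here.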

JSON mode

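Set response_format to constrain output to valid JSON. With type json_schema and strict: true, the response is guaranteed to parse against the schema on models that support JSON schema natively; for other models, see the fallback behavior under Edge cases.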

json
{
  "model": "openai/gpt-5.1",
  "messages": [
    { "role": "system", "content": "Return only JSON matching the requested schema." },
    { "role": "user", "content": "Give me a person with name and age." }
  ],
  "response_format": {
    "type": "json_schema",
    "json_schema": {
      "name": "person",
      "strict": true,
      "schema": {
        "type": "object",
        "properties": {
          "name": { "type": "string" },
          "age": { "type": "integer", "minimum": 0 }
        },
        "required": ["name", "age"],
        "additionalProperties": false
      }
    }
  }
}
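
On the client side, the JSON arrives as a string in message.content and still has to be decoded. The sketch below decodes it and re-checks the person schema by hand, which may be worthwhile when a fallback path is possible. `parse_person` is a hypothetical helper written for this one schema; a general-purpose validator library would be the usual choice for anything larger.

```python
import json


def parse_person(response: dict) -> dict:
    """Decode message.content and check it against the person schema above."""
    content = response["choices"][0]["message"]["content"]
    person = json.loads(content)  # raises ValueError (JSONDecodeError) on invalid JSON

    # required: name and age; additionalProperties: false.
    if not isinstance(person, dict) or set(person) != {"name", "age"}:
        raise ValueError("expected an object with exactly the keys name and age")
    if not isinstance(person["name"], str):
        raise ValueError("name must be a string")

    age = person["age"]
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(age, bool) or not isinstance(age, int) or age < 0:
        raise ValueError("age must be a non-negative integer")

    return person
```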

Streaming

Set stream: true to receive the response as text/event-stream. Each data: event carries a JSON chunk whose delta holds the next increment of the message; the stream ends with a literal data: [DONE] event. The full event catalog is in the Streaming reference.

text
data: {"id":"chatcmpl_01H8...","object":"chat.completion.chunk","created":1735689600,"model":"openai/gpt-5.1","choices":[{"index":0,"delta":{"role":"assistant","content":"Hello"},"finish_reason":null}]}

data: {"id":"chatcmpl_01H8...","object":"chat.completion.chunk","created":1735689600,"model":"openai/gpt-5.1","choices":[{"index":0,"delta":{"content":" world"},"finish_reason":null}]}

data: {"id":"chatcmpl_01H8...","object":"chat.completion.chunk","created":1735689600,"model":"openai/gpt-5.1","choices":[{"index":0,"delta":{},"finish_reason":"stop"}]}

data: [DONE]
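
A consumer of this stream reads line by line, decodes each data: payload, and concatenates the delta.content pieces. The sketch below shows that loop over an iterable of already-decoded text lines. `collect_stream` is a hypothetical name, and it handles only the event shapes shown above, not the full catalog in the Streaming reference.

```python
import json
from typing import Iterable, Optional, Tuple


def collect_stream(lines: Iterable[str]) -> Tuple[str, Optional[str]]:
    """Accumulate streamed content; return (full_text, finish_reason)."""
    parts = []
    finish_reason = None

    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # skip blank separator lines and anything that is not a data field

        data = line[len("data:"):].strip()
        if data == "[DONE]":
            break  # literal terminal event, not JSON

        chunk = json.loads(data)
        if not chunk.get("choices"):
            continue
        choice = chunk["choices"][0]

        # The final chunk has an empty delta, so content may be absent.
        parts.append(choice["delta"].get("content") or "")
        if choice.get("finish_reason") is not None:
            finish_reason = choice["finish_reason"]

    return "".join(parts), finish_reason
```

Fed the four events above, this returns the text "Hello world" with finish_reason "stop".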

Edge cases

  • System message placement. On Anthropic-family models only the first system message is treated as the system prompt; later system messages are prepended to the closest user message. Consolidate system instructions into a single leading message for portable behavior.
  • temperature vs top_p. Setting both is legal, but their effects compound and the result is hard to predict. Pick one and leave the other at its default.
  • response_format on models that don't support JSON schema. Nimbus falls back to json_object and injects the schema into the system prompt. Set strict: false to opt out of fallback and get a 400 unsupported_parameter instead.
  • max_tokens = 0. Legal. Returns a completion with finish_reason: length and an empty content string. Useful for cost estimation via usage.prompt_tokens.
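
One way to follow the advice on system message placement is to merge all system messages into a single leading one before sending. This is a minimal sketch that assumes every system message has plain string content. `consolidate_system_messages` is a hypothetical helper, and joining with a blank line is an arbitrary choice.

```python
def consolidate_system_messages(messages: list[dict]) -> list[dict]:
    """Merge all system messages into one leading system message.

    Non-system messages keep their relative order. Assumes string content.
    """
    system_parts = [m["content"] for m in messages if m["role"] == "system"]
    rest = [m for m in messages if m["role"] != "system"]

    if not system_parts:
        return rest

    merged = {"role": "system", "content": "\n\n".join(system_parts)}
    return [merged] + rest
```

Note that this changes where a mid-conversation instruction takes effect: it now applies from the start of the conversation, which is the price of portable behavior across model families.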

Error codes

See the Errors reference for the full catalog. Endpoint-specific:

  • 400 invalid_request, code context_length_exceeded: the prompt plus max_tokens exceeds the model window.
  • 400 invalid_request, code unsupported_parameter: a parameter is not supported by the selected model.
  • 400 invalid_request, code tool_call_parse_error: the model emitted invalid JSON in a tool call. See Errors → Edge cases.
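
A client can branch on these codes to decide what to do next. The sketch below is built on an assumption this page does not confirm: that the error body has the OpenAI-style shape {"error": {"type": ..., "code": ...}}. Check the Errors reference for the actual layout. `recovery_hint` and the hint texts are hypothetical suggestions, not documented Nimbus behavior.

```python
from typing import Optional

# Suggested next steps per endpoint-specific code (illustrative only).
RECOVERY_HINTS = {
    "context_length_exceeded": "shorten the prompt or lower max_tokens",
    "unsupported_parameter": "remove the parameter or choose a model that supports it",
    "tool_call_parse_error": "see Errors -> Edge cases for handling",
}


def recovery_hint(status: int, body: dict) -> Optional[str]:
    """Map an endpoint-specific 400 to a suggested next step, or None if unrecognized.

    Assumes an error body shaped like {"error": {"type": ..., "code": ...}}.
    """
    if status != 400:
        return None
    error = body.get("error") or {}
    if error.get("type") != "invalid_request":
        return None
    return RECOVERY_HINTS.get(error.get("code"))
```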