Messages

POST /v1/messages — the Anthropic-compatible entrypoint. Works with the Anthropic SDK, Claude Code, and anything that speaks the Messages API wire format.

Endpoint

http
POST https://llm.nimbusapi.net/v1/messages

Required headers: x-api-key and anthropic-version: 2023-06-01. The Bearer token form (Authorization: Bearer sk-nim-...) is also accepted for cross-SDK convenience.
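
The two authentication forms above can be sketched as a small header builder. This is a minimal illustration, not part of any Nimbus SDK: the function name build_headers and the use_bearer flag are hypothetical, and it assumes anthropic-version is still sent when the Bearer form is used.

```python
def build_headers(api_key, use_bearer=False):
    """Return HTTP headers for POST /v1/messages (hypothetical helper)."""
    headers = {
        "anthropic-version": "2023-06-01",
        "Content-Type": "application/json",
    }
    if use_bearer:
        # Cross-SDK form: Authorization: Bearer sk-nim-...
        headers["Authorization"] = f"Bearer {api_key}"
    else:
        # Native Anthropic-style form.
        headers["x-api-key"] = api_key
    return headers
```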

Parameters

| name | type | required | description |
| --- | --- | --- | --- |
| model | string | yes | Model ID (e.g. anthropic/claude-opus-4.5, anthropic/claude-sonnet-4.5, anthropic/claude-haiku-4). |
| max_tokens | integer | yes | REQUIRED on /v1/messages. Upper bound on generated tokens. Nimbus caps at the model's output limit. |
| messages | array | yes | Ordered conversation. Alternating user/assistant. Each content is a string or an array of typed blocks (text, image, tool_use, tool_result). |
| system | string or array | no | System prompt. Prefer this over a role: system message; it does not consume the messages array. |
| temperature | number | no | 0.0 to 1.0 on Anthropic models (narrower than OpenAI). Lower = more deterministic. |
| top_p | number | no | Nucleus sampling threshold, 0.0 to 1.0. |
| top_k | integer | no | Top-K sampling. Anthropic-specific. Recommended: leave unset unless tuning for creativity. |
| stop_sequences | array | no | Up to 4 stop sequences. Generation halts before emitting any of them; stop_reason returns stop_sequence. |
| tools | array | no | Tool definitions. See Function Calling for the block-based tool_use / tool_result cycle. |
| tool_choice | object | no | Object with type ('auto', 'any', or 'tool'), optional name (string), and optional disable_parallel_tool_use (boolean). |
| stream | boolean | no | When true, response is a Server-Sent Events stream of typed events. See Streaming. |
| metadata | object | no | { user_id?: string }. The user_id is a stable per-end-user string used for abuse tracking. |
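
The constraints in the parameter list can be checked client-side before a request is sent. The sketch below is one possible pre-check, not Nimbus's server-side validation: check_request is a hypothetical name, and it covers only the required fields, the 0.0 to 1.0 ranges, and the limit of 4 stop sequences stated above.

```python
def check_request(body):
    """Return a list of problems; an empty list means no issue was found."""
    problems = []
    # model, max_tokens, and messages are the three required fields.
    for field in ("model", "max_tokens", "messages"):
        if field not in body:
            problems.append(f"missing required field: {field}")
    max_tokens = body.get("max_tokens")
    if max_tokens is not None and (
        isinstance(max_tokens, bool) or not isinstance(max_tokens, int)
    ):
        problems.append("max_tokens must be an integer")
    # temperature and top_p are both documented as 0.0 to 1.0.
    for name in ("temperature", "top_p"):
        value = body.get(name)
        if value is not None and not 0.0 <= value <= 1.0:
            problems.append(f"{name} must be between 0.0 and 1.0")
    if len(body.get("stop_sequences", [])) > 4:
        problems.append("stop_sequences accepts at most 4 entries")
    return problems
```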

Request body

json
{
  "model": "anthropic/claude-opus-4.5",
  "max_tokens": 1024,
  "system": "You are a terse ops assistant. Answer in <=2 sentences.",
  "messages": [
    { "role": "user", "content": "Summarize the last deploy log in one sentence." }
  ],
  "temperature": 0.2,
  "top_p": 1,
  "top_k": 40,
  "stop_sequences": ["\n\nEND"],
  "metadata": { "user_id": "internal-user-4711" }
}

Response body

json
{
  "id": "msg_01H8VXQZ3P4E5N6Y7K8B9M0F1G",
  "type": "message",
  "role": "assistant",
  "model": "anthropic/claude-opus-4.5",
  "content": [
    {
      "type": "text",
      "text": "Deploy 8f2a1 succeeded in 42s with zero warnings and zero rollbacks."
    }
  ],
  "stop_reason": "end_turn",
  "stop_sequence": null,
  "usage": {
    "input_tokens": 41,
    "output_tokens": 19
  }
}

  • content: an array of typed blocks. Block types: text, tool_use. A single assistant turn can emit multiple blocks (e.g. a short text block followed by a tool_use block).
  • stop_reason: one of end_turn, max_tokens, stop_sequence, tool_use.
  • usage: token accounting. Billing is input_tokens plus output_tokens at the model's per-token rate.
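
Reading these fields from a parsed response might look like the following sketch. The helper names (text_of, tool_calls, total_tokens) are illustrative and not part of any SDK; the functions assume the response has already been decoded from JSON into a dict.

```python
def text_of(response):
    """Concatenate the text of every text block in the assistant turn."""
    return "".join(
        block["text"] for block in response["content"] if block["type"] == "text"
    )

def tool_calls(response):
    """Return the tool_use blocks, if any, in emission order."""
    return [block for block in response["content"] if block["type"] == "tool_use"]

def total_tokens(response):
    """Sum input and output tokens, the quantity billing is based on."""
    usage = response["usage"]
    return usage["input_tokens"] + usage["output_tokens"]
```

For the response body shown above, total_tokens returns 60 (41 input plus 19 output) and tool_calls returns an empty list.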

Basic call

curl -sS https://llm.nimbusapi.net/v1/messages \
  -H "x-api-key: $NIMBUS_API_KEY" \
  -H "anthropic-version: 2023-06-01" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "anthropic/claude-opus-4.5",
    "max_tokens": 128,
    "messages": [{"role":"user","content":"Say hi in 3 words."}]
  }'
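
The same call can be made from Python with only the standard library. This is a rough equivalent of the curl command, offered as a sketch: build_request and send are hypothetical helper names, and the 30-second timeout is an arbitrary choice.

```python
import json
import urllib.request

API_URL = "https://llm.nimbusapi.net/v1/messages"

def build_request(api_key, prompt, max_tokens=128):
    """Build (but do not send) a POST /v1/messages request."""
    body = {
        "model": "anthropic/claude-opus-4.5",
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }
    return urllib.request.Request(
        API_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "x-api-key": api_key,
            "anthropic-version": "2023-06-01",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def send(request):
    """Send the request and return the decoded JSON response body."""
    with urllib.request.urlopen(request, timeout=30) as resp:
        return json.load(resp)

# Usage (performs a network call; expects NIMBUS_API_KEY in the environment):
#   import os
#   reply = send(build_request(os.environ["NIMBUS_API_KEY"], "Say hi in 3 words."))
#   print(reply["content"][0]["text"])
```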

Tool use blocks

Anthropic tool calls arrive as tool_use content blocks (not in a separate tool_calls field, as with OpenAI). Every tool_use block has an id. You must echo it back as tool_use_id on the corresponding tool_result block in the next user turn.

json
{
  "model": "anthropic/claude-opus-4.5",
  "max_tokens": 1024,
  "tools": [
    {
      "name": "get_weather",
      "description": "Get the current weather for a city.",
      "input_schema": {
        "type": "object",
        "properties": {
          "city": { "type": "string" },
          "unit": { "type": "string", "enum": ["c", "f"] }
        },
        "required": ["city"]
      }
    }
  ],
  "tool_choice": { "type": "auto" },
  "messages": [
    { "role": "user", "content": "What is the weather in Reykjavik?" }
  ]
}

The response carries a tool_use block, and stop_reason is tool_use:

json
{
  "id": "msg_01H8VXQZ3P4E5N6Y7K8B9M0F1G",
  "type": "message",
  "role": "assistant",
  "model": "anthropic/claude-opus-4.5",
  "content": [
    {
      "type": "tool_use",
      "id": "toolu_01A2B3C4",
      "name": "get_weather",
      "input": { "city": "Reykjavik", "unit": "c" }
    }
  ],
  "stop_reason": "tool_use",
  "usage": { "input_tokens": 74, "output_tokens": 42 }
}

Second turn — return the tool result and let the model reason over it:

json
{
  "model": "anthropic/claude-opus-4.5",
  "max_tokens": 1024,
  "tools": [ /* ... same tools array ... */ ],
  "messages": [
    { "role": "user", "content": "What is the weather in Reykjavik?" },
    {
      "role": "assistant",
      "content": [
        {
          "type": "tool_use",
          "id": "toolu_01A2B3C4",
          "name": "get_weather",
          "input": { "city": "Reykjavik", "unit": "c" }
        }
      ]
    },
    {
      "role": "user",
      "content": [
        {
          "type": "tool_result",
          "tool_use_id": "toolu_01A2B3C4",
          "content": "3 degrees C, overcast, wind 22 km/h from NE."
        }
      ]
    }
  ]
}
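
Building the second-turn messages array from a first-turn response can be sketched as below. The get_weather function is a stand-in that returns a canned string rather than calling a real weather service, and HANDLERS and append_tool_results are hypothetical names; the point is only to show the id being echoed back as tool_use_id.

```python
def get_weather(city, unit="c"):
    # Hypothetical stand-in: a real handler would query a weather service.
    return f"3 degrees {unit.upper()}, overcast, wind 22 km/h from NE."

# Maps tool names from the tools array to local Python callables.
HANDLERS = {"get_weather": get_weather}

def append_tool_results(messages, response):
    """Return a new messages list with the assistant turn and tool results added."""
    results = []
    for block in response["content"]:
        if block["type"] != "tool_use":
            continue
        output = HANDLERS[block["name"]](**block["input"])
        results.append({
            "type": "tool_result",
            "tool_use_id": block["id"],  # must match the id of the tool_use block
            "content": output,
        })
    return messages + [
        {"role": "assistant", "content": response["content"]},
        {"role": "user", "content": results},
    ]
```

The returned list can be sent as messages in the next request, alongside the same tools array.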

Edge cases

  • max_tokens is REQUIRED. Unlike /v1/chat/completions, /v1/messages rejects requests without it. This is the Anthropic contract; Nimbus preserves it.
  • messages must alternate user/assistant. Two consecutive user messages return 400 invalid_request. Merge the two into one before sending.
  • Prefill the assistant turn. End your messages with a role: "assistant" entry to prefill the model's next output. Useful for forcing a JSON opening brace.
  • Cross-family models. You can call an OpenAI-family model through /v1/messages and Nimbus translates the request. The reverse also works: Anthropic-family models can be called through /v1/chat/completions. Wire format is a client preference, not a model constraint.
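
Two of the edge cases above, merging consecutive same-role messages and prefilling the assistant turn, can be handled with small helpers. This is one possible approach under stated assumptions: the names as_blocks, merge_consecutive, and with_prefill are hypothetical, and merging is done by converting string content into text blocks and concatenating the block arrays.

```python
def as_blocks(content):
    """Normalize content to a list of typed blocks."""
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return list(content)

def merge_consecutive(messages):
    """Merge adjacent messages that share a role so the roles alternate."""
    merged = []
    for msg in messages:
        blocks = as_blocks(msg["content"])
        if merged and merged[-1]["role"] == msg["role"]:
            merged[-1]["content"].extend(blocks)
        else:
            merged.append({"role": msg["role"], "content": blocks})
    return merged

def with_prefill(messages, prefix):
    """Append an assistant entry whose text the model continues from."""
    return messages + [{"role": "assistant", "content": prefix}]
```

For example, with_prefill(messages, "{") ends the conversation with an assistant entry containing an opening brace, which nudges the model toward JSON output.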

Error codes

See the Errors reference for the full catalog. Endpoint-specific:

  • 400 invalid_request with param: max_tokens when max_tokens is omitted.
  • 400 invalid_request with code: message_role_sequence when messages do not alternate user/assistant.
  • 400 invalid_request with code: tool_use_id_mismatch when a tool_result block references an ID that was not emitted in the prior turn.
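
The three conditions above can be caught locally before a request is sent. The following is a client-side approximation only, not the server's actual validation logic: preflight and block_values are hypothetical names, and the returned strings simply mirror the labels used in the list above.

```python
def block_values(message, block_type, key):
    """Collect `key` from every block of `block_type` in a message."""
    content = message["content"]
    if isinstance(content, str):
        return set()
    return {b[key] for b in content if b.get("type") == block_type}

def preflight(body):
    """Return the label of the first likely 400 error, or None if none is found."""
    if "max_tokens" not in body:
        return "param: max_tokens"
    messages = body.get("messages", [])
    # Adjacent messages must not share a role.
    for prev, cur in zip(messages, messages[1:]):
        if prev["role"] == cur["role"]:
            return "code: message_role_sequence"
    # Every tool_result must reference an id emitted in the preceding turn.
    for i, msg in enumerate(messages):
        referenced = block_values(msg, "tool_result", "tool_use_id")
        if not referenced:
            continue
        emitted = block_values(messages[i - 1], "tool_use", "id") if i > 0 else set()
        if not referenced <= emitted:
            return "code: tool_use_id_mismatch"
    return None
```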