Messages
POST /v1/messages — the Anthropic-compatible entrypoint. Works with the Anthropic SDK, Claude Code, and anything that speaks the Messages API wire format.
Endpoint
http
POST https://llm.nimbusapi.net/v1/messages
Required headers: x-api-key and anthropic-version: 2023-06-01. The Bearer token form (Authorization: Bearer sk-nim-...) is also accepted for cross-SDK convenience.
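The header requirements can be sketched in Python with the standard library alone. The helper name build_request is hypothetical, not part of any official client. The sketch also assumes the Bearer form replaces x-api-key while anthropic-version is still sent. The request is only constructed here, not sent.

```python
import json
import urllib.request

NIMBUS_URL = "https://llm.nimbusapi.net/v1/messages"


def build_request(payload: dict, api_key: str, use_bearer: bool = False) -> urllib.request.Request:
    """Build (but do not send) a POST request for /v1/messages.

    Hypothetical helper for illustration only.
    """
    headers = {
        "Content-Type": "application/json",
        "anthropic-version": "2023-06-01",
    }
    if use_bearer:
        # Assumption: the Bearer form is used instead of x-api-key.
        headers["Authorization"] = f"Bearer {api_key}"
    else:
        headers["x-api-key"] = api_key
    return urllib.request.Request(
        NIMBUS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )
```

Passing the returned object to urllib.request.urlopen would send it; any HTTP client that lets you set these headers works the same way.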
Parameters
| name | type | required | description |
|---|---|---|---|
| model | string | yes | Model ID (e.g. anthropic/claude-opus-4.5, anthropic/claude-sonnet-4.5, anthropic/claude-haiku-4). |
| max_tokens | integer | yes | REQUIRED on /v1/messages. Upper bound on generated tokens. Nimbus caps at the model's output limit. |
| messages | array | yes | Ordered conversation. Alternating user/assistant. Each content is a string or an array of typed blocks (text, image, tool_use, tool_result). |
| system | string or array | no | System prompt. Prefer this over a role: system message; it does not consume the messages array. |
| temperature | number | no | 0.0 to 1.0 on Anthropic models (narrower than OpenAI). Lower = more deterministic. |
| top_p | number | no | Nucleus sampling threshold, 0.0 to 1.0. |
| top_k | integer | no | Top-K sampling. Anthropic-specific. Recommended: leave unset unless tuning for creativity. |
| stop_sequences | array | no | Up to 4 stop sequences. Generation halts before emitting any of them; stop_reason returns stop_sequence. |
| tools | array | no | Tool definitions. See Function Calling for the block-based tool_use / tool_result cycle. |
| tool_choice | object | no | { type: 'auto' \| 'any' \| 'tool', name?: string, disable_parallel_tool_use?: boolean }. |
| stream | boolean | no | When true, response is a Server-Sent Events stream of typed events. See Streaming. |
| metadata | object | no | { user_id?: string }. The user_id is a stable per-end-user string used for abuse tracking. |
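Several constraints in the table can be checked client-side before a request goes out. The following is a minimal sketch, not Nimbus's actual validation logic. The function name validate_request and the message strings are hypothetical, and treating max_tokens as a positive integer is an assumption (the table only says integer).

```python
def validate_request(body: dict) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []

    model = body.get("model")
    if not isinstance(model, str) or not model:
        problems.append("model is required")

    max_tokens = body.get("max_tokens")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(max_tokens, int) or isinstance(max_tokens, bool) or max_tokens < 1:
        problems.append("max_tokens is required and must be a positive integer")

    if not body.get("messages"):
        problems.append("messages is required")

    # temperature and top_p are both documented as 0.0 to 1.0.
    for name in ("temperature", "top_p"):
        value = body.get(name)
        if value is not None and not 0.0 <= value <= 1.0:
            problems.append(f"{name} must be between 0.0 and 1.0")

    if len(body.get("stop_sequences", [])) > 4:
        problems.append("stop_sequences allows at most 4 entries")

    return problems
```

A caller might refuse to send any body for which this returns a non-empty list; the server remains the source of truth.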
Request body
json
{
"model": "anthropic/claude-opus-4.5",
"max_tokens": 1024,
"system": "You are a terse ops assistant. Answer in <=2 sentences.",
"messages": [
{ "role": "user", "content": "Summarize the last deploy log in one sentence." }
],
"temperature": 0.2,
"top_p": 1,
"top_k": 40,
"stop_sequences": ["\n\nEND"],
"metadata": { "user_id": "internal-user-4711" }
}
Response body
json
{
"id": "msg_01H8VXQZ3P4E5N6Y7K8B9M0F1G",
"type": "message",
"role": "assistant",
"model": "anthropic/claude-opus-4.5",
"content": [
{
"type": "text",
"text": "Deploy 8f2a1 succeeded in 42s with zero warnings and zero rollbacks."
}
],
"stop_reason": "end_turn",
"stop_sequence": null,
"usage": {
"input_tokens": 41,
"output_tokens": 19
}
}
- content: an array of typed blocks. Block types: text, tool_use. A single assistant turn can emit multiple blocks (e.g. a text block with the model's reasoning followed by a tool_use).
- stop_reason: one of end_turn, max_tokens, stop_sequence, tool_use.
- usage: token accounting. Billing is input_tokens plus output_tokens at the model's per-token rate.
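Reading these fields can be sketched with three small helpers. The function names are hypothetical, and the sketch assumes only the response shape shown on this page.

```python
def extract_text(response: dict) -> str:
    """Concatenate the text of every text block in the response."""
    return "".join(
        block["text"]
        for block in response.get("content", [])
        if block.get("type") == "text"
    )


def extract_tool_uses(response: dict) -> list[dict]:
    """Return the tool_use blocks, in the order the model emitted them."""
    return [
        block
        for block in response.get("content", [])
        if block.get("type") == "tool_use"
    ]


def billable_tokens(response: dict) -> int:
    """Sum input_tokens and output_tokens from the usage object."""
    usage = response.get("usage", {})
    return usage.get("input_tokens", 0) + usage.get("output_tokens", 0)
```

Checking stop_reason before trusting the text is usually worthwhile: a value of max_tokens means the output was cut off, and tool_use means the model is waiting for a tool result.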
Basic call
curl -sS https://llm.nimbusapi.net/v1/messages \
-H "x-api-key: $NIMBUS_API_KEY" \
-H "anthropic-version: 2023-06-01" \
-H "Content-Type: application/json" \
-d '{
"model": "anthropic/claude-opus-4.5",
"max_tokens": 128,
"messages": [{"role":"user","content":"Say hi in 3 words."}]
}'
Tool use blocks
Anthropic tool calls arrive as tool_use content blocks (not a separate tool_calls field like OpenAI). Every tool_use has an id; you must echo it back as tool_use_id on the corresponding tool_result block in the next user turn.
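The id round trip can be sketched as a helper that builds the follow-up messages array. The names append_tool_results and run_tool are hypothetical; run_tool stands in for whatever code executes your tools, and is assumed to return a string.

```python
from typing import Callable


def append_tool_results(
    messages: list[dict],
    assistant_response: dict,
    run_tool: Callable[[str, dict], str],
) -> list[dict]:
    """Return a new messages list extended with the assistant turn and a
    user turn carrying one tool_result per tool_use block."""
    tool_uses = [
        block
        for block in assistant_response["content"]
        if block.get("type") == "tool_use"
    ]
    results = [
        {
            "type": "tool_result",
            # Echo the id of the tool_use block back as tool_use_id.
            "tool_use_id": block["id"],
            "content": run_tool(block["name"], block["input"]),
        }
        for block in tool_uses
    ]
    return messages + [
        {"role": "assistant", "content": assistant_response["content"]},
        {"role": "user", "content": results},
    ]
```

The returned list becomes the messages field of the next request, sent with the same tools array. The JSON samples that follow show one such round trip on the wire.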
json
{
"model": "anthropic/claude-opus-4.5",
"max_tokens": 1024,
"tools": [
{
"name": "get_weather",
"description": "Get the current weather for a city.",
"input_schema": {
"type": "object",
"properties": {
"city": { "type": "string" },
"unit": { "type": "string", "enum": ["c", "f"] }
},
"required": ["city"]
}
}
],
"tool_choice": { "type": "auto" },
"messages": [
{ "role": "user", "content": "What is the weather in Reykjavik?" }
]
}
json
{
"id": "msg_01H8VXQZ3P4E5N6Y7K8B9M0F1G",
"type": "message",
"role": "assistant",
"model": "anthropic/claude-opus-4.5",
"content": [
{
"type": "tool_use",
"id": "toolu_01A2B3C4",
"name": "get_weather",
"input": { "city": "Reykjavik", "unit": "c" }
}
],
"stop_reason": "tool_use",
"usage": { "input_tokens": 74, "output_tokens": 42 }
}
Second turn (return the tool result and let the model reason over it):
json
{
"model": "anthropic/claude-opus-4.5",
"max_tokens": 1024,
"tools": [ /* ... same tools array ... */ ],
"messages": [
{ "role": "user", "content": "What is the weather in Reykjavik?" },
{
"role": "assistant",
"content": [
{
"type": "tool_use",
"id": "toolu_01A2B3C4",
"name": "get_weather",
"input": { "city": "Reykjavik", "unit": "c" }
}
]
},
{
"role": "user",
"content": [
{
"type": "tool_result",
"tool_use_id": "toolu_01A2B3C4",
"content": "3 degrees C, overcast, wind 22 km/h from NE."
}
]
}
]
}
Edge cases
- max_tokens is REQUIRED. Unlike /v1/chat/completions, /v1/messages rejects requests without it. This is the Anthropic contract; Nimbus preserves it.
- messages must alternate user/assistant. Two consecutive user messages return 400 invalid_request. Merge the two into one before sending.
- Prefill the assistant turn. End your messages with a role: "assistant" entry to prefill the model's next output. Useful for forcing a JSON opening brace.
- Cross-family models. You can call an OpenAI-family model through /v1/messages; Nimbus translates. The same works in the other direction. Wire format is a client preference, not a model constraint.
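Two of these edge cases, merging same-role messages and prefilling the assistant turn, can be sketched as small helpers. The function names are hypothetical. Merging by concatenating content blocks is one reasonable approach among several, not a documented Nimbus behavior.

```python
def to_blocks(content) -> list[dict]:
    """Normalize string content into a list of typed blocks."""
    if isinstance(content, str):
        return [{"type": "text", "text": content}]
    return list(content)


def merge_consecutive(messages: list[dict]) -> list[dict]:
    """Collapse adjacent messages that share a role into one message."""
    merged: list[dict] = []
    for msg in messages:
        blocks = to_blocks(msg["content"])
        if merged and merged[-1]["role"] == msg["role"]:
            merged[-1]["content"].extend(blocks)
        else:
            merged.append({"role": msg["role"], "content": blocks})
    return merged


def with_prefill(messages: list[dict], prefix: str) -> list[dict]:
    """Append an assistant entry so the model continues from prefix."""
    return messages + [{"role": "assistant", "content": prefix}]
```

For example, with_prefill(messages, "{") ends the conversation with an assistant turn containing an opening brace, which nudges the model toward emitting JSON.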
Error codes
See the Errors reference for the full catalog. Endpoint-specific:
- 400 invalid_request (param: max_tokens) when max_tokens is omitted.
- 400 invalid_request (code: message_role_sequence) when messages do not alternate user/assistant.
- 400 invalid_request (code: tool_use_id_mismatch) when a tool_result block references an ID that was not emitted in the prior turn.
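The last two conditions can be caught before sending. The following is a minimal sketch with hypothetical function names; it mirrors the rules as described on this page and is not the server's actual implementation.

```python
def roles_alternate(messages: list[dict]) -> bool:
    """True when no two adjacent messages share a role."""
    return all(a["role"] != b["role"] for a, b in zip(messages, messages[1:]))


def unmatched_tool_results(messages: list[dict]) -> list[str]:
    """Return tool_use_id values that do not match a tool_use block
    in the immediately preceding assistant turn."""
    unmatched = []
    for i, msg in enumerate(messages):
        if msg["role"] != "user" or isinstance(msg["content"], str):
            continue
        # Collect ids emitted by the prior assistant turn, if there is one.
        emitted = set()
        if i > 0:
            prev = messages[i - 1]
            if prev["role"] == "assistant" and not isinstance(prev["content"], str):
                emitted = {
                    block["id"]
                    for block in prev["content"]
                    if block.get("type") == "tool_use"
                }
        for block in msg["content"]:
            if block.get("type") == "tool_result" and block["tool_use_id"] not in emitted:
                unmatched.append(block["tool_use_id"])
    return unmatched
```

A conversation where roles_alternate is False, or unmatched_tool_results is non-empty, would be expected to trigger the corresponding 400 listed above.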