Errors
Every non-2xx response carries a machine-readable error object and a request_id you can quote to support.
Error body shape
Errors are always JSON with a top-level error object. Four fields are stable across every endpoint and every status code:
- type: high-level category (e.g. rate_limited). Safe to switch on.
- message: human-readable, includes numeric context (balance, RPM, retry-after seconds).
- code: narrow programmatic code. More specific than type.
- param: the offending parameter name for 400 errors, otherwise null.
The error object also carries request_id, the same value as the x-request-id response header.
    {
      "error": {
        "type": "insufficient_balance",
        "message": "Key balance is $0.02; request estimated at $0.14. Top up at nimbusapi.net/dashboard/billing.",
        "code": "insufficient_balance",
        "param": null,
        "request_id": "req_01H9K7Z2Q4T5N6Y7B8M9F0G1H2"
      }
    }

Status code catalog
| status | type | code | meaning | retry? |
|---|---|---|---|---|
| 400 | invalid_request | invalid_request, missing_required, unsupported_parameter, context_length_exceeded | Request body is malformed, a required field is missing, or a parameter is not supported by the selected model. | no (fix the request) |
| 401 | authentication_error | invalid_api_key, missing_api_key, revoked_api_key | The Authorization or x-api-key header is missing, malformed, or the key was revoked. | no (fix credentials) |
| 402 | insufficient_balance | insufficient_balance, spend_cap_reached | Key balance is exhausted or the per-key USD cap was reached. Top up or raise the cap. | no (top up first) |
| 403 | forbidden | model_not_allowed, region_blocked, policy_violation | The key's allow-list does not include this model, the region is blocked, or content policy rejected the prompt. | no (change model or prompt) |
| 404 | model_not_found | model_not_found | The model ID does not exist. Check for a typo or a retired model; see /docs/models. | no (fix model ID) |
| 429 | rate_limited | rate_limited, concurrent_limit | You exceeded the per-key RPM or the account-wide concurrency ceiling. | yes (honor Retry-After) |
| 500 | internal_error | internal_error | Nimbus itself failed. These are rare and always logged with the request_id. | yes (backoff up to 30s) |
| 502 | upstream_error | upstream_error, upstream_timeout, upstream_overloaded | The upstream provider (OpenAI, Anthropic, Google, etc.) returned a non-2xx after Nimbus retried its failover chain. | yes (backoff, or select a different model) |
| 503 | model_unavailable | model_unavailable, all_upstreams_down | Every upstream mirror for the model is currently unhealthy. Fall back to a sibling model. | yes (try a different model) |
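
One way to use the catalog in client code is to parse the error object once and look up what to do next from the HTTP status. The sketch below is illustrative only: ApiError, parse_error, ACTION_BY_STATUS, and the three action labels ("retry", "fallback", "fail") are hypothetical names, not part of the Nimbus API. Treating unknown statuses as non-retryable is an assumption.

```python
from dataclasses import dataclass
from typing import Optional

# Condensed from the "retry?" column of the catalog, keyed by HTTP status.
# "retry" = back off and resend, "fallback" = prefer a sibling model,
# "fail" = fix the request, credentials, or balance first.
ACTION_BY_STATUS = {
    400: "fail", 401: "fail", 402: "fail", 403: "fail", 404: "fail",
    429: "retry", 500: "retry", 502: "retry", 503: "fallback",
}


@dataclass(frozen=True)
class ApiError:
    status: int
    type: str
    code: str
    message: str
    param: Optional[str]
    request_id: Optional[str]

    @property
    def action(self) -> str:
        # Assumption: statuses missing from the catalog are not retried.
        return ACTION_BY_STATUS.get(self.status, "fail")


def parse_error(status: int, body: dict) -> ApiError:
    """Build an ApiError from an HTTP status and a decoded JSON error body."""
    err = body["error"]
    return ApiError(
        status=status,
        type=err["type"],
        code=err["code"],
        message=err["message"],
        param=err.get("param"),
        request_id=err.get("request_id"),
    )
```

Calling parse_error(r.status_code, r.json()) on a failed response gives a single object whose type and code you can switch on, and whose action tells the caller whether to retry, fall back, or give up.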
429 rate_limited example
Every 429 response carries a Retry-After header in seconds. Honor it — do not retry earlier.
    {
      "error": {
        "type": "rate_limited",
        "message": "You exceeded 60 requests per minute on key sk-nim-****abcd. Retry after 12s.",
        "code": "rate_limited",
        "param": null,
        "request_id": "req_01H9K7Z2Q4T5N6Y7B8M9F0G1H3"
      }
    }

502 upstream_error example
Nimbus already ran its failover chain (multiple mirrors per model). A 502 means every mirror the router tried failed. The upstream block names the provider and the number of attempts, so you can decide whether to retry the same model or fall back to a sibling.
    {
      "error": {
        "type": "upstream_error",
        "message": "Upstream provider anthropic returned 529 overloaded. Nimbus already tried 2 failovers.",
        "code": "upstream_overloaded",
        "param": null,
        "request_id": "req_01H9K7Z2Q4T5N6Y7B8M9F0G1H4",
        "upstream": {
          "provider": "anthropic",
          "status": 529,
          "attempts": 3
        }
      }
    }

Retry guidance
- Retry only 429, 500, 502, 503. Never retry 4xx in the 400–404 range; the request itself is wrong.
- Exponential backoff with jitter: start at 1s, double each attempt, cap at 30s, with ±25% jitter to avoid thundering herds.
- When the response has a Retry-After header, use it as a floor. Do not retry earlier.
- Cap total attempts at 5 for interactive traffic, 10 for background jobs. Give up cleanly and surface the error to your caller.
- On 503 model_unavailable, switch to a sibling model rather than pounding the same one. Every model has a documented fallback tier; see /docs/models.
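    import random
    import time

    import httpx

    RETRYABLE = {429, 500, 502, 503}

    def call_with_retry(payload, key, max_attempts=5):
        backoff = 1.0  # seconds; doubles each attempt, capped at 30s
        for attempt in range(max_attempts):
            r = httpx.post(
                "https://llm.nimbusapi.net/v1/chat/completions",
                headers={"Authorization": f"Bearer {key}"},
                json=payload,
                timeout=60,
            )
            if r.status_code < 400:
                return r.json()
            if r.status_code not in RETRYABLE:
                # 400, 401, 402, 403, 404: do not retry
                raise RuntimeError(r.json()["error"])
            if attempt == max_attempts - 1:
                break  # out of attempts; do not sleep before giving up
            # Apply ±25% jitter, then treat Retry-After (seconds) as a floor.
            wait = backoff * random.uniform(0.75, 1.25)
            retry_after = r.headers.get("retry-after")
            if retry_after is not None:
                wait = max(wait, float(retry_after))
            time.sleep(wait)
            backoff = min(backoff * 2, 30)
        raise RuntimeError("exhausted retries")

Logging request_id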
Every response — success or failure — carries an x-request-id response header and, on errors, the same value inside error.request_id. Log it on every failed call. When you file a ticket, quote it — Nimbus support can pull the full upstream trace within seconds using that ID.
Tip. Attach x-request-id to your application logs on every outbound call, not just failures. If a customer reports a bad completion two hours later, you can still map it back to the exact Nimbus request.
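
A minimal sketch of that logging habit, using only the standard library. The function names, the logger name, and the log line format are hypothetical. The only details taken from this page are the x-request-id header and the error.request_id field. The header lookup is lowercased so it also works with a plain dict of headers.

```python
import logging
from typing import Mapping, Optional

logger = logging.getLogger("nimbus_client")  # hypothetical logger name


def extract_request_id(headers: Mapping[str, str], body: Optional[dict] = None) -> Optional[str]:
    """Return the ID from the x-request-id header, else from error.request_id."""
    lowered = {k.lower(): v for k, v in headers.items()}
    request_id = lowered.get("x-request-id")
    if request_id is None and isinstance(body, dict):
        error = body.get("error")
        if isinstance(error, dict):
            request_id = error.get("request_id")
    return request_id


def log_call(status: int, headers: Mapping[str, str], body: Optional[dict] = None) -> Optional[str]:
    """Log one outbound call with its request ID; failures log at WARNING."""
    request_id = extract_request_id(headers, body)
    level = logging.WARNING if status >= 400 else logging.INFO
    logger.log(level, "nimbus call status=%s request_id=%s", status, request_id)
    return request_id
```

With httpx, for example, you could call log_call(r.status_code, r.headers) after every response, passing r.json() as the third argument only for error responses.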
Edge cases
- Streaming errors mid-stream. If the upstream disconnects after the first token, Nimbus emits a terminal SSE event of shape event: error with the same JSON body. The HTTP status remains 200 because the headers were already flushed. See /docs/streaming.
- Tool-call parse errors. If the model returns malformed JSON in a tool_call.arguments field, Nimbus surfaces it as 400 invalid_request with code: tool_call_parse_error and returns the raw string in message.
- Idempotency. Pass a stable Idempotency-Key header on retryable POSTs. Nimbus deduplicates within a 24-hour window and returns the cached response body, so retrying a completed request is safe.
- Client-side timeouts. Never set your HTTP client timeout below 60s for non-streaming completions. Long generations on large models legitimately take 30–45s.
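
Because a mid-stream failure arrives with HTTP 200, a streaming client has to watch the event stream itself. The sketch below shows one way to do that over an iterable of SSE lines. It assumes the error event carries the usual JSON error body in its data: field, which this page implies but does not spell out; see /docs/streaming for the actual wire format. StreamError and iter_sse_data are hypothetical names.

```python
import json
from typing import Iterable, Iterator, List


class StreamError(RuntimeError):
    """Raised when the stream delivers an `event: error` frame."""

    def __init__(self, error: dict):
        super().__init__(error.get("message", "stream error"))
        self.error = error


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield each event's data payload; raise StreamError on `event: error`."""
    event_name = "message"  # SSE default event type
    data_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates one event.
            if data_parts:
                payload = "\n".join(data_parts)
                if event_name == "error":
                    # Assumption: data holds the same {"error": {...}} JSON body.
                    raise StreamError(json.loads(payload)["error"])
                yield payload
            event_name, data_parts = "message", []
        elif line.startswith("event:"):
            event_name = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE strips at most one leading space after the colon.
            data_parts.append(value[1:] if value.startswith(" ") else value)
        # Comment lines (":" prefix) and other fields are ignored.
    # An event not followed by a blank line is dropped, as in the SSE spec.
```

A caller can feed this any line iterator, such as the lines of a streaming HTTP response, and catch StreamError to read error.type, error.code, and error.request_id exactly as it would for a non-streaming failure.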