Errors

Every non-2xx response carries a machine-readable error object and a request_id you can quote to support.

Error body shape

Errors are always JSON with a top-level error object. The four fields below, plus request_id (see Logging request_id), are stable across every endpoint and every status code:

type — high-level category (e.g. rate_limited). Safe to switch on.
message — human-readable, includes numeric context (balance, RPM, retry-after seconds).
code — narrow programmatic code. More specific than type.
param — the offending parameter name for 400 errors, otherwise null.

json

{
  "error": {
    "type": "insufficient_balance",
    "message": "Key balance is $0.02; request estimated at $0.14. Top up at nimbusapi.net/dashboard/billing.",
    "code": "insufficient_balance",
    "param": null,
    "request_id": "req_01H9K7Z2Q4T5N6Y7B8M9F0G1H2"
  }
}
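A minimal sketch of reading this shape on the client, assuming the response body has already been decoded into a dict. The ErrorInfo class and parse_error helper are hypothetical names for illustration, not part of any Nimbus SDK.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class ErrorInfo:
    """Hypothetical client-side view of the documented error object."""
    type: str
    message: str
    code: str
    param: Optional[str]
    request_id: Optional[str]


def parse_error(body: dict) -> ErrorInfo:
    """Build an ErrorInfo from a decoded error response body."""
    err = body["error"]
    return ErrorInfo(
        type=err["type"],
        message=err["message"],
        code=err["code"],
        # param is null unless a 400 names an offending parameter.
        param=err.get("param"),
        request_id=err.get("request_id"),
    )
```

Branching on type first and code second keeps client logic aligned with the catalog: type picks the broad handling path, code refines it.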

Status code catalog

status	type	code	meaning	retry?
400	invalid_request	invalid_request | missing_required | unsupported_parameter | context_length_exceeded | tool_call_parse_error	Request body is malformed, a required field is missing, a parameter is not supported by the selected model, the prompt exceeds the model's context length, or a tool call's arguments could not be parsed.	no, fix the request
401	authentication_error	invalid_api_key | missing_api_key | revoked_api_key	The Authorization or x-api-key header is missing, malformed, or the key was revoked.	no, fix credentials
402	insufficient_balance	insufficient_balance | spend_cap_reached	Key balance is exhausted or the per-key USD cap was reached. Top up or raise the cap.	no, top up first
403	forbidden	model_not_allowed | region_blocked | policy_violation	The key's allow-list does not include this model, the region is blocked, or content policy rejected the prompt.	no, change model or prompt
404	model_not_found	model_not_found	The model ID does not exist. Check for a typo or a retired model; see /docs/models.	no, fix model ID
429	rate_limited	rate_limited | concurrent_limit	You exceeded the per-key RPM or the account-wide concurrency ceiling.	yes, honor Retry-After
500	internal_error	internal_error	Nimbus itself failed. These are rare and always logged with the request_id.	yes, backoff up to 30s
502	upstream_error	upstream_error | upstream_timeout | upstream_overloaded	The upstream provider (OpenAI, Anthropic, Google, etc.) returned a non-2xx after Nimbus retried its failover chain.	yes, backoff, or select a different model
503	model_unavailable	model_unavailable | all_upstreams_down	Every upstream mirror for the model is currently unhealthy. Fall back to a sibling model.	yes, try a different model

429 rate_limited example

Every 429 response carries a Retry-After header in seconds. Honor it — do not retry earlier.

json

{
  "error": {
    "type": "rate_limited",
    "message": "You exceeded 60 requests per minute on key sk-nim-****abcd. Retry after 12s.",
    "code": "rate_limited",
    "param": null,
    "request_id": "req_01H9K7Z2Q4T5N6Y7B8M9F0G1H3"
  }
}
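As a sketch of honoring the header, the helper below reads Retry-After from a headers mapping. It assumes the value is a plain number of seconds, as stated above; the function name retry_after_seconds and its default argument are illustrative, not part of the API.

```python
from typing import Mapping


def retry_after_seconds(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Return the Retry-After delay in seconds, or `default` if absent or unparsable."""
    # Header names are case-insensitive, so normalise before looking up.
    lowered = {k.lower(): v for k, v in headers.items()}
    raw = lowered.get("retry-after")
    if raw is None:
        return default
    try:
        return max(0.0, float(raw))
    except ValueError:
        # Not a plain number of seconds; fall back to the caller's default.
        return default
```

Sleep for at least this long before the next attempt; a client-side backoff may wait longer but should not wait less.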

502 upstream_error example

By the time you see a 502, Nimbus has already run its failover chain (multiple mirrors per model), so every mirror the router tried failed. The upstream block names the provider, the status the provider returned, and the number of attempts, so you can decide whether to retry the same model or fall back to a sibling.

json

{
  "error": {
    "type": "upstream_error",
    "message": "Upstream provider anthropic returned 529 overloaded. Nimbus already tried 2 failovers.",
    "code": "upstream_overloaded",
    "param": null,
    "request_id": "req_01H9K7Z2Q4T5N6Y7B8M9F0G1H4",
    "upstream": {
      "provider": "anthropic",
      "status": 529,
      "attempts": 3
    }
  }
}
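One way to act on the upstream block is sketched below. The decision rule is an assumption made for illustration (treat upstream_overloaded, or three or more attempts, as a signal to switch models); the function name next_action and the threshold are hypothetical, so tune them to your own traffic.

```python
from typing import Optional

# Assumed threshold: at this many router attempts, stop retrying the same model.
FALLBACK_AFTER_ATTEMPTS = 3


def next_action(error: dict) -> Optional[str]:
    """Return "fallback" or "retry_same" for an upstream_error, else None."""
    if error.get("type") != "upstream_error":
        return None  # not a 502-style error; handled elsewhere
    upstream = error.get("upstream") or {}
    overloaded = error.get("code") == "upstream_overloaded"
    many_attempts = upstream.get("attempts", 0) >= FALLBACK_AFTER_ATTEMPTS
    if overloaded or many_attempts:
        return "fallback"
    return "retry_same"
```

Applied to the example body above, this returns "fallback", since the code is upstream_overloaded and attempts is 3.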

Retry guidance

Retry only 429, 500, 502, 503. Never retry 4xx in the 400–404 range — the request itself is wrong.
Exponential backoff with jitter: start at 1s, double each attempt, cap at 30s, with ±25% jitter to avoid thundering herds.
When the response has a Retry-After header, use it as a floor. Do not retry earlier.
Cap total attempts at 5 for interactive traffic, 10 for background jobs. Give up cleanly and surface the error to your caller.
On 503 model_unavailable, switch to a sibling model rather than pounding the same one. Every model has a documented fallback tier — see /docs/models.

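import random
import time

import httpx

# Status codes worth retrying, per the guidance above.
RETRYABLE = {429, 500, 502, 503}

def call_with_retry(payload, key, max_attempts=5):
    backoff = 1.0
    for attempt in range(max_attempts):
        r = httpx.post(
            "https://llm.nimbusapi.net/v1/chat/completions",
            headers={"Authorization": f"Bearer {key}"},
            json=payload,
            timeout=60,
        )
        if r.status_code < 400:
            return r.json()
        if r.status_code not in RETRYABLE:
            # 400, 401, 402, 403, 404: the request itself is wrong, do not retry
            raise RuntimeError(r.json()["error"])
        if attempt == max_attempts - 1:
            break  # out of attempts; do not sleep before giving up
        # Exponential backoff with ±25% jitter, using Retry-After as a floor.
        wait = backoff * random.uniform(0.75, 1.25)
        try:
            wait = max(wait, float(r.headers.get("retry-after", 0)))
        except ValueError:
            pass  # non-numeric Retry-After: keep the computed backoff
        time.sleep(wait)
        backoff = min(backoff * 2, 30)
    raise RuntimeError("exhausted retries")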
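The sibling-model advice for 503 can be sketched separately from the retry loop. In the example below, send is a hypothetical callable you supply that performs one request for a given model and returns (status, body); the model names in the usage note are placeholders, not real model IDs.

```python
from typing import Callable, Sequence, Tuple


def complete_with_fallback(
    send: Callable[[str], Tuple[int, dict]],
    models: Sequence[str],
) -> Tuple[str, int, dict]:
    """Try each model in order, moving to the next one only on a 503."""
    if not models:
        raise ValueError("models must not be empty")
    for model in models:
        status, body = send(model)
        if status != 503:
            # Success, or an error that switching models will not fix.
            return model, status, body
    # Every model returned 503; report the last attempt.
    return model, status, body
```

Calling complete_with_fallback(send, ["primary-model", "sibling-model"]) tries the primary once and moves to the sibling only if the primary is unavailable. Take the real ordering from the fallback tiers in /docs/models.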

Logging request_id

Every response — success or failure — carries an x-request-id response header and, on errors, the same value inside error.request_id. Log it on every failed call. When you file a ticket, quote it — Nimbus support can pull the full upstream trace within seconds using that ID.

Tip. Attach x-request-id to your application logs on every outbound call, not just failures. If a customer reports a bad completion two hours later, you can still map it back to the exact Nimbus request.
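A small sketch of the logging habit described above, using only the standard library. The logger name nimbus.client and the helper record_request_id are illustrative choices; the only thing taken from the API is the x-request-id header name.

```python
import logging
from typing import Mapping, Optional

logger = logging.getLogger("nimbus.client")  # hypothetical logger name


def record_request_id(headers: Mapping[str, str], status: int) -> Optional[str]:
    """Log the x-request-id of a response and return it (None if missing)."""
    # Header names are case-insensitive, so normalise before looking up.
    lowered = {k.lower(): v for k, v in headers.items()}
    request_id = lowered.get("x-request-id")
    # Failures are logged louder, but successes are recorded too.
    level = logging.WARNING if status >= 400 else logging.INFO
    logger.log(level, "nimbus call status=%s request_id=%s", status, request_id)
    return request_id
```

Storing the returned ID next to your own correlation ID lets you trace a customer report back to the exact Nimbus request.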

Edge cases

Streaming errors mid-stream. If the upstream disconnects after the first token, Nimbus emits a terminal SSE event of shape event: error with the same JSON body. The HTTP status remains 200 because headers already flushed. See /docs/streaming.
Tool-call parse errors. If the model returns malformed JSON in a tool_call.arguments field, Nimbus surfaces it as 400 invalid_request with code: tool_call_parse_error and returns the raw string in message.
Idempotency. Pass a stable Idempotency-Key header on retryable POSTs. Nimbus deduplicates within a 24-hour window and returns the cached response body, so retrying a completed request is safe.
Client-side timeouts. Never set your HTTP client timeout below 60s for non-streaming completions. Long generations on large models legitimately take 30–45s.
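Because a mid-stream failure arrives with HTTP 200, a streaming client has to inspect the events themselves. The sketch below parses already-decoded SSE lines and raises when it sees an error event. It assumes the error frame carries the usual {"error": {...}} body in its data field; StreamError and iter_sse_data are hypothetical names, and /docs/streaming remains the authority on the wire format.

```python
import json
from typing import Iterable, Iterator, List, Optional


class StreamError(Exception):
    """Raised when the stream ends with an `event: error` frame."""

    def __init__(self, error: dict):
        super().__init__(error.get("message", "stream error"))
        self.error = error


def iter_sse_data(lines: Iterable[str]) -> Iterator[str]:
    """Yield data payloads from SSE lines; raise StreamError on an error event."""
    event: Optional[str] = None
    data_parts: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if line == "":
            # A blank line terminates one SSE event.
            if data_parts:
                payload = "\n".join(data_parts)
                if event == "error":
                    # Assumed shape: same JSON body as a non-streaming error.
                    raise StreamError(json.loads(payload)["error"])
                yield payload
            event, data_parts = None, []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # SSE allows one optional space after the colon.
            data_parts.append(value[1:] if value.startswith(" ") else value)
```

Catch StreamError where you consume the generator and apply the same type and code handling you use for non-streaming errors.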