Errors

Every non-2xx response carries a machine-readable error object and a request_id you can quote to support.

Error body shape

Errors are always JSON with a top-level error object. The four fields below are stable across every endpoint and every status code, and the object also carries the request_id:

  • type — high-level category (e.g. rate_limited). Safe to switch on.
  • message — human-readable, includes numeric context (balance, RPM, retry-after seconds).
  • code — narrow programmatic code. More specific than type.
  • param — the offending parameter name for 400 errors, otherwise null.

json
{
  "error": {
    "type": "insufficient_balance",
    "message": "Key balance is $0.02; request estimated at $0.14. Top up at nimbusapi.net/dashboard/billing.",
    "code": "insufficient_balance",
    "param": null,
    "request_id": "req_01H9K7Z2Q4T5N6Y7B8M9F0G1H2"
  }
}
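
As a minimal sketch of consuming this shape, the snippet below parses an error body into a small typed record so calling code can switch on type and code. The NimbusError class and parse_error function are hypothetical names introduced for illustration; they are not part of any Nimbus SDK.

```python
import json
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class NimbusError:
    """Hypothetical container for the fields of the error object."""
    type: str
    message: str
    code: str
    param: Optional[str]
    request_id: Optional[str]


def parse_error(body: str) -> NimbusError:
    """Parse a JSON error body into a NimbusError."""
    err = json.loads(body)["error"]
    return NimbusError(
        type=err["type"],
        message=err["message"],
        code=err["code"],
        param=err.get("param"),
        request_id=err.get("request_id"),
    )


sample = """
{
  "error": {
    "type": "insufficient_balance",
    "message": "Key balance is $0.02; request estimated at $0.14.",
    "code": "insufficient_balance",
    "param": null,
    "request_id": "req_01H9K7Z2Q4T5N6Y7B8M9F0G1H2"
  }
}
"""

err = parse_error(sample)
if err.type == "insufficient_balance":
    print(f"Top up needed (request {err.request_id})")
```

Switching on type keeps the branch count small; code is there when a narrower distinction matters.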

Status code catalog

status | type | code | meaning | retry?
--- | --- | --- | --- | ---
400 | invalid_request | invalid_request, missing_required, unsupported_parameter, context_length_exceeded | Request body is malformed, a required field is missing, or a parameter is not supported by the selected model. | no: fix the request
401 | authentication_error | invalid_api_key, missing_api_key, revoked_api_key | The Authorization or x-api-key header is missing, malformed, or the key was revoked. | no: fix credentials
402 | insufficient_balance | insufficient_balance, spend_cap_reached | Key balance is exhausted or the per-key USD cap was reached. Top up or raise the cap. | no: top up first
403 | forbidden | model_not_allowed, region_blocked, policy_violation | The key's allow-list does not include this model, the region is blocked, or content policy rejected the prompt. | no: change model or prompt
404 | model_not_found | model_not_found | The model ID does not exist. Check for a typo or a retired model; see /docs/models. | no: fix model ID
429 | rate_limited | rate_limited, concurrent_limit | You exceeded the per-key RPM or the account-wide concurrency ceiling. | yes: honor Retry-After
500 | internal_error | internal_error | Nimbus itself failed. These are rare and always logged with the request_id. | yes: backoff up to 30s
502 | upstream_error | upstream_error, upstream_timeout, upstream_overloaded | The upstream provider (OpenAI, Anthropic, Google, etc.) returned a non-2xx after Nimbus retried its failover chain. | yes: backoff, or select a different model
503 | model_unavailable | model_unavailable, all_upstreams_down | Every upstream mirror for the model is currently unhealthy. Fall back to a sibling model. | yes: try a different model
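
The retry column of the catalog can be encoded as a small lookup. The sketch below is one possible encoding: the classify function and its return labels are hypothetical, and treating any unlisted status of 400 or above as non-retryable is an assumption, not documented behavior.

```python
# Statuses the catalog marks as retryable.
RETRYABLE = {429, 500, 502, 503}
# Status for which the catalog recommends a different model outright.
SWITCH_MODEL = {503}


def classify(status: int) -> str:
    """Map an HTTP status to "ok", "retry", "switch_model", or "fail"."""
    if status < 400:
        return "ok"
    if status in SWITCH_MODEL:
        return "switch_model"
    if status in RETRYABLE:
        return "retry"
    # Assumption: anything else (400-404 and unlisted codes) is not retried.
    return "fail"


for status in (200, 402, 429, 503):
    print(status, classify(status))
```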

429 rate_limited example

Every 429 response carries a Retry-After header in seconds. Honor it — do not retry earlier.

json
{
  "error": {
    "type": "rate_limited",
    "message": "You exceeded 60 requests per minute on key sk-nim-****abcd. Retry after 12s.",
    "code": "rate_limited",
    "param": null,
    "request_id": "req_01H9K7Z2Q4T5N6Y7B8M9F0G1H3"
  }
}
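
A minimal sketch of reading the header follows. The retry_after_seconds helper is a hypothetical name, and falling back to a default when the header is absent or not a plain number of seconds is an assumption made for robustness.

```python
from typing import Mapping


def retry_after_seconds(headers: Mapping[str, str], default: float = 1.0) -> float:
    """Return the Retry-After delay in seconds, or default if unusable."""
    # Header names are case-insensitive, so normalize before the lookup.
    lowered = {name.lower(): value for name, value in headers.items()}
    raw = lowered.get("retry-after")
    if raw is None:
        return default
    try:
        return max(0.0, float(raw))
    except ValueError:
        # Not a number of seconds; fall back rather than crash.
        return default


print(retry_after_seconds({"Retry-After": "12"}))  # 12.0
```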

502 upstream_error example

Nimbus has already run its failover chain (multiple mirrors per model) by the time you see a 502, so every mirror the router tried has failed. The upstream block names the provider, the status it returned, and the number of attempts, so you can decide whether to retry the same model or fall back to a sibling.

json
{
  "error": {
    "type": "upstream_error",
    "message": "Upstream provider anthropic returned 529 overloaded. Nimbus already tried 2 failovers.",
    "code": "upstream_overloaded",
    "param": null,
    "request_id": "req_01H9K7Z2Q4T5N6Y7B8M9F0G1H4",
    "upstream": {
      "provider": "anthropic",
      "status": 529,
      "attempts": 3
    }
  }
}
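
One way to act on the upstream block is sketched below. The decision rule is purely illustrative: the next_step function, its return labels, and the threshold of 3 attempts are assumptions, not Nimbus recommendations.

```python
def next_step(error: dict, max_upstream_attempts: int = 3) -> str:
    """Return "retry_same_model" or "fallback_model" for a 502 error object."""
    upstream = error.get("upstream") or {}
    # Assumed heuristic: an overloaded provider, or a router that already
    # made several attempts, is unlikely to recover within one backoff window.
    overloaded = error.get("code") == "upstream_overloaded"
    exhausted = upstream.get("attempts", 0) >= max_upstream_attempts
    return "fallback_model" if overloaded or exhausted else "retry_same_model"


error = {
    "type": "upstream_error",
    "code": "upstream_overloaded",
    "upstream": {"provider": "anthropic", "status": 529, "attempts": 3},
}
print(next_step(error))  # fallback_model
```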

Retry guidance

  • Retry only 429, 500, 502, and 503. Never retry statuses in the 400–404 range: the request itself is wrong.
  • Exponential backoff with jitter: start at 1s, double each attempt, cap at 30s, with ±25% jitter to avoid thundering herds.
  • When the response has a Retry-After header, use it as a floor. Do not retry earlier.
  • Cap total attempts at 5 for interactive traffic, 10 for background jobs. Give up cleanly and surface the error to your caller.
  • On 503 model_unavailable, switch to a sibling model rather than pounding the same one. Every model has a documented fallback tier — see /docs/models.
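
python
import random
import time

import httpx

RETRYABLE = (429, 500, 502, 503)  # the statuses the catalog marks as retryable


def call_with_retry(payload, key, max_attempts=5):
    """POST a chat completion, retrying retryable statuses with backoff."""
    backoff = 1.0
    for attempt in range(max_attempts):
        r = httpx.post(
            "https://llm.nimbusapi.net/v1/chat/completions",
            headers={"Authorization": f"Bearer {key}"},
            json=payload,
            timeout=60,
        )
        if r.status_code < 400:
            return r.json()
        if r.status_code not in RETRYABLE:
            # 400, 401, 402, 403, 404: the request itself is wrong, do not retry
            raise RuntimeError(r.json()["error"])
        if attempt == max_attempts - 1:
            break  # no attempts left, so skip the final sleep
        # Exponential backoff with ±25% jitter; Retry-After acts as a floor.
        wait = backoff * random.uniform(0.75, 1.25)
        retry_after = r.headers.get("retry-after")
        if retry_after is not None:
            wait = max(wait, float(retry_after))
        time.sleep(wait)
        backoff = min(backoff * 2, 30)
    raise RuntimeError("exhausted retries")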
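
The sibling-model guidance for 503 can be sketched separately from the retry loop. In the example below, call_with_fallback, the Send callable, and the model IDs are all hypothetical; real fallback tiers come from /docs/models. The transport is injected so the logic can be exercised without network access.

```python
from typing import Callable, Dict, Sequence, Tuple

# Hypothetical transport: takes a model ID, returns (HTTP status, parsed JSON body).
Send = Callable[[str], Tuple[int, Dict]]


def call_with_fallback(models: Sequence[str], send: Send) -> Dict:
    """Try each model in order, moving on only when one answers 503."""
    last_error: Dict = {}
    for model in models:
        status, body = send(model)
        if status < 400:
            return body
        if status != 503:
            # Other errors are not a reason to switch models here.
            raise RuntimeError(body.get("error", body))
        last_error = body.get("error", body)
    raise RuntimeError(f"all fallback models unavailable: {last_error}")


def fake_send(model: str) -> Tuple[int, Dict]:
    """Stand-in transport for the example: the primary model is down."""
    if model == "primary-model":
        return 503, {"error": {"type": "model_unavailable", "code": "all_upstreams_down"}}
    return 200, {"model": model, "ok": True}


result = call_with_fallback(["primary-model", "sibling-model"], fake_send)
print(result["model"])  # sibling-model
```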

Logging request_id

Every response — success or failure — carries an x-request-id response header and, on errors, the same value inside error.request_id. Log it on every failed call. When you file a ticket, quote it — Nimbus support can pull the full upstream trace within seconds using that ID.

Tip. Attach x-request-id to your application logs on every outbound call, not just failures. If a customer reports a bad completion two hours later, you can still map it back to the exact Nimbus request.
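
A minimal sketch of that logging habit, using only the standard library, is shown below. The log_request_id helper and the logger name are hypothetical; the header and body field names come from the text above.

```python
import logging
from typing import Mapping, Optional

logger = logging.getLogger("nimbus.client")  # hypothetical logger name


def log_request_id(status: int, headers: Mapping[str, str],
                   body: Optional[dict] = None) -> Optional[str]:
    """Log the request ID for a response and return it."""
    lowered = {name.lower(): value for name, value in headers.items()}
    request_id = lowered.get("x-request-id")
    if request_id is None and body:
        # On errors the same value is also available in error.request_id.
        request_id = (body.get("error") or {}).get("request_id")
    level = logging.WARNING if status >= 400 else logging.INFO
    logger.log(level, "nimbus call status=%s request_id=%s", status, request_id)
    return request_id


logging.basicConfig(level=logging.INFO)
log_request_id(200, {"x-request-id": "req_01H9K7Z2Q4T5N6Y7B8M9F0G1H2"})
```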

Edge cases

  • Streaming errors mid-stream. If the upstream disconnects after the first token, Nimbus emits a terminal SSE event of shape event: error with the same JSON body. The HTTP status remains 200 because headers already flushed. See /docs/streaming.
  • Tool-call parse errors. If the model returns malformed JSON in a tool_call.arguments field, Nimbus surfaces it as 400 invalid_request with code: tool_call_parse_error and returns the raw string in message.
  • Idempotency. Pass a stable Idempotency-Key header on retryable POSTs. Nimbus deduplicates within a 24-hour window and returns the cached response body, so retrying a completed request is safe.
  • Client-side timeouts. Never set your HTTP client timeout below 60s for non-streaming completions. Long generations on large models legitimately take 30–45s.
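
Because a mid-stream failure arrives with HTTP 200, a streaming client has to watch for the terminal error event itself. The sketch below parses SSE lines and raises when it sees event: error. It assumes the JSON body travels in the event's data: field and that the lines have already been decoded to text; iter_sse_events, consume, and StreamError are hypothetical names.

```python
import json
from typing import Iterable, Iterator, List, Tuple


class StreamError(RuntimeError):
    """Raised when the stream ends with an `event: error` frame."""


def iter_sse_events(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event_name, data) pairs from decoded SSE lines."""
    event, data = "message", []
    for line in lines:
        line = line.rstrip("\r\n")
        if line == "":
            # A blank line dispatches the event collected so far.
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith("event:"):
            event = line[len("event:"):].strip()
        elif line.startswith("data:"):
            value = line[len("data:"):]
            data.append(value[1:] if value.startswith(" ") else value)


def consume(lines: Iterable[str]) -> List[str]:
    """Collect data payloads, raising StreamError on a terminal error event."""
    chunks = []
    for event, data in iter_sse_events(lines):
        if event == "error":
            raise StreamError(json.loads(data)["error"])
        chunks.append(data)
    return chunks


stream = [
    'data: {"delta": "Hel"}',
    "",
    "event: error",
    'data: {"error": {"type": "upstream_error", "code": "upstream_timeout"}}',
    "",
]
try:
    consume(stream)
    caught = None
except StreamError as exc:
    caught = exc.args[0]
    print("stream failed:", caught["code"])
```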