Rate limits
No hard per-key limits by default. Best-effort fair-use throttling, plus honest pass-through of upstream 429s with a real Retry-After.
The design philosophy: your money is the rate limit. If your balance is positive and your key is under its cost cap, we try to serve every request. When we can't — because Anthropic, OpenAI, Google, or the underlying vendor is throttling us — we return 429 with a real Retry-After header from the upstream, not a made-up number.
Default per-key limits
- No hard RPS. New keys are not capped at some arbitrary requests-per-second. Sustained traffic just works.
- No token-per-minute cap. If you want to burn through a million tokens per minute on Opus, that's fine as long as the balance covers it and upstream capacity is available.
- Fair-use ceiling ≈ 500 RPS burst. The gateway begins to shed load if a single key exceeds ~500 requests/sec sustained for more than a minute. This is a soft ceiling — email sales@nimbusapi.net to lift it in advance of a spike.
- Concurrency: up to 1,024 in-flight requests per key. Above that you'll see 429 with
code=concurrency_exceeded.
Bursting rules
The gateway uses a leaky-bucket per key with generous burst allowance:
- Burst bucket: 2,000 requests. If the bucket has capacity, the request goes through immediately.
- Refill rate: 500 requests/sec.
- Sustained throughput: 500 RPS indefinitely.
- Burst duration: ~4 seconds of free bursting at 1,000 RPS from a full bucket before throttling kicks in.
When you'll see 429
- Upstream throttled us. Anthropic or OpenAI returned 429 because our aggregate is over their per-org limit. This is the common case during viral moments. We pass through their Retry-After.
- Upstream 5xx cascade. The vendor is having an outage. We retry once internally with jitter, then fall through to a configured fallback model in the same class if you registered one, otherwise 429.
- Fair-use burst exceeded. code=
rate_limit_exceeded. Slow down or contact us to lift the soft cap. - Concurrency exceeded. Too many in-flight requests for the same key.
Retry-After header format
Every 429 response includes an HTTP-compliant Retry-After header. It is always a decimal number of seconds (never an HTTP-date). We also mirror the upstream's rate-limit headers for observability:
HTTP/1.1 429 Too Many Requests
retry-after: 3
x-ratelimit-remaining-requests: 0
x-ratelimit-reset-requests: 2026-07-01T14:32:21Z
content-type: application/json
{
"error": {
"type": "rate_limit_error",
"code": "upstream_throttled",
"message": "Upstream provider (anthropic) returned 429. Retry in 3s.",
"upstream_provider": "anthropic",
"upstream_status": 429,
"retry_after_seconds": 3
}
}Recommended retry pattern
Respect Retry-After if it's present, fall back to exponential backoff with jitter otherwise, and cap total attempts:
async function callWithRetry(
endpoint: string,
body: unknown,
maxAttempts = 5,
): Promise<Response> {
let attempt = 0;
while (true) {
const res = await fetch(endpoint, {
method: "POST",
headers: {
"content-type": "application/json",
authorization: `Bearer ${process.env.NIMBUS_API_KEY}`,
},
body: JSON.stringify(body),
});
if (res.status !== 429 || attempt >= maxAttempts) return res;
const retryAfter = Number(res.headers.get("retry-after") ?? "1");
const backoff = Math.min(retryAfter * 1000, 2 ** attempt * 500);
await new Promise((r) => setTimeout(r, backoff + Math.random() * 250));
attempt++;
}
}Multi-provider fallback
If you register a fallback chain on your key (in /dashboard/keys), a 429 or 5xx from the primary model auto-falls through to the next model in the chain and only returns 429 to you when the whole chain is exhausted. Popular chains:
- Opus fallback:
claude-opus-4.8 → claude-opus-4.7 → gpt-5.5 - Coding fallback:
gpt-5.3-codex → deepseek-v4-pro → qwen3-coder - Cheap fallback:
gemini-3-flash-preview → gpt-5.4-mini
Higher limits
If you plan to sustain more than 500 RPS or need pre-allocated capacity on a specific model during a launch, email sales@nimbusapi.net with your expected RPS, TPM, and the model. We can pre-reserve upstream capacity with 48 hours notice for accounts over $5k/mo.
See also Cost caps for the per-key budget ceiling (separate from RPS limits).