Pricing
28 models across 7 providers. Every rate is exactly 50% off the official published rate from Anthropic, OpenAI, Google, DeepSeek, Qwen, Z.AI, and Moonshot.
Nimbus is prepaid: you add funds to your account balance, and every request debits that balance at the rates below. There is no seat license, no monthly commit, and no throttle. Prices are in USD per 1,000,000 tokens. In each table row, the Retail column is the vendor's public list rate, and the Input and Output columns are what Nimbus charges. For a quick primer, jump to Top up or Models.
Text & chat models
| Model ID | Provider | Context | Input / 1M | Output / 1M | Retail (input / output) |
|---|---|---|---|---|---|
| anthropic/claude-opus-4.8 | Anthropic | 1M | $2.50 | $12.50 | $5.00 / $25.00 |
| anthropic/claude-opus-4.7 | Anthropic | 1M | $2.50 | $12.50 | $5.00 / $25.00 |
| anthropic/claude-opus-4.6 | Anthropic | 1M | $2.50 | $12.50 | $5.00 / $25.00 |
| anthropic/claude-sonnet-4.6 | Anthropic | 1M | $1.50 | $7.50 | $3.00 / $15.00 |
| anthropic/claude-haiku-4.5 | Anthropic | 200K | $0.50 | $2.50 | $1.00 / $5.00 |
| openai/gpt-5.5 | OpenAI | 1M | $2.50 | $15.00 | $5.00 / $30.00 |
| openai/gpt-5.4 | OpenAI | 1M | $1.25 | $7.50 | $2.50 / $15.00 |
| openai/gpt-5.4-mini | OpenAI | 400K | $0.375 | $2.25 | $0.75 / $4.50 |
| openai/gpt-5.3-codex | OpenAI | 400K | $0.875 | $7.00 | $1.75 / $14.00 |
| openai/gpt-5.1-codex-max | OpenAI | 400K | $0.625 | $5.00 | $1.25 / $10.00 |
| openai/gpt-5.1-codex-mini | OpenAI | 400K | $0.125 | $1.00 | $0.25 / $2.00 |
| google/gemini-3.1-pro-preview | Google | 1M | $1.00 | $6.00 | $2.00 / $12.00 |
| google/gemini-3.5-flash | Google | 1M | $0.75 | $4.50 | $1.50 / $9.00 |
| google/gemini-3-flash-preview | Google | 1M | $0.25 | $1.50 | $0.50 / $3.00 |
| deepseek/deepseek-v4-pro | DeepSeek | 1M | $0.215 | $0.435 | $0.43 / $0.87 |
| qwen/qwen3-coder | Qwen | 1M | $0.11 | $0.90 | $0.22 / $1.80 |
| qwen/qwen3.7-max | Qwen | 1M | $0.625 | $1.875 | $1.25 / $3.75 |
| z-ai/glm-5 | Z.AI | 200K | $0.30 | $0.96 | $0.60 / $1.92 |
| z-ai/glm-5.1 | Z.AI | 200K | $0.49 | $1.54 | $0.98 / $3.08 |
| z-ai/glm-5.2 | Z.AI | 1M | $0.70 | $2.20 | $1.40 / $4.40 |
| moonshotai/kimi-k2.6 | Moonshot | 256K | $0.34 | $1.705 | $0.68 / $3.41 |
| moonshotai/kimi-k2.7-code | Moonshot | 256K | $0.375 | $1.75 | $0.75 / $3.50 |
Image models
Image generation is billed per image based on the underlying vendor rate. All image models are priced at 50% off official rates. Contact billing@nimbusapi.net for the per-image cost sheet.
| Model ID | Provider | Type |
|---|---|---|
| google/gemini-2.5-flash-image | Google | Image generation |
| google/gemini-3-pro-image | Google | Image generation |
| google/gemini-3.1-flash-image | Google | Image generation |
| openai/gpt-5-image | OpenAI | Image generation |
| openai/gpt-5-image-mini | OpenAI | Image generation |
| openai/gpt-5.4-image-2 | OpenAI | Image generation |
How the 50% discount works
Nimbus buys tokens at wholesale under enterprise contracts with each provider (Anthropic direct, OpenAI direct, Google AI Studio + Vertex, and reseller arrangements for the China labs). Wholesale rates are typically 40–60% below published list, and we pass the entire spread to customers by pricing every model at exactly retail ÷ 2. When a vendor lowers its public rate, we lower ours the same day. When a vendor raises its public rate, ours rises with it so that the 50% discount still holds.
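The retail ÷ 2 rule can be checked against any row of the pricing table. The following is a minimal sketch, not Nimbus code: the helper name `nimbus_rate` is hypothetical, and the retail figures are copied from the table above. It uses `Decimal` so that halves such as $0.375 are represented exactly.

```python
from decimal import Decimal

def nimbus_rate(retail_per_million: str) -> Decimal:
    """Hypothetical helper: halve a vendor list rate (USD per 1M tokens)."""
    return Decimal(retail_per_million) / 2

# Retail (input, output) rates copied from the pricing table above.
retail = {
    "anthropic/claude-sonnet-4.6": ("3.00", "15.00"),
    "openai/gpt-5.4-mini": ("0.75", "4.50"),
}

for model, (retail_in, retail_out) in retail.items():
    # Print the derived Nimbus input and output rates for each model.
    print(model, nimbus_rate(retail_in), nimbus_rate(retail_out))
```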
Volume discount rules
- Under $5,000/month: published rate above. No extra discount.
- $5,000 – $25,000/month: an extra 10% off any single model on request. Email sales@nimbusapi.net with your model mix and expected volume.
- $25,000 – $100,000/month: an extra 15% off, with monthly invoicing on net-15 terms available in place of prepaid.
- Over $100,000/month: up to an extra 25% off, plus a dedicated capacity guarantee at Anthropic / OpenAI / Google, a private Slack channel, and a named account engineer.
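The tiers above can be expressed as a simple lookup. This is an illustrative sketch only; the function names are hypothetical. It makes two assumptions that the rules above do not state: a spend of exactly $25,000 falls into the higher tier, and the extra discount is applied multiplicatively on top of the already-halved Nimbus rate. The returned figure is the maximum extra discount for the tier, since the 10% tier is granted on request for a single model and the top tier is "up to" 25%.

```python
from decimal import Decimal

def max_extra_discount(monthly_spend_usd) -> Decimal:
    """Hypothetical helper: maximum extra discount for a monthly spend in USD."""
    spend = Decimal(str(monthly_spend_usd))
    if spend > Decimal("100000"):      # "Over $100,000/month"
        return Decimal("0.25")
    if spend >= Decimal("25000"):      # assumption: exactly $25,000 lands here
        return Decimal("0.15")
    if spend >= Decimal("5000"):       # "Under $5,000" gets no extra discount
        return Decimal("0.10")
    return Decimal("0")

def discounted_rate(nimbus_rate: Decimal, extra: Decimal) -> Decimal:
    """Assumption: the extra discount multiplies the published Nimbus rate."""
    return nimbus_rate * (Decimal("1") - extra)

# Example: a $30,000/month account on a model priced at $1.50 per 1M input tokens.
print(discounted_rate(Decimal("1.50"), max_extra_discount(30000)))
```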
Billing granularity
Every request debits the account balance immediately after the response completes. Input and output tokens are metered separately, using the exact token counts returned by the upstream vendor. Cached input (Anthropic / OpenAI cache-control blocks) is billed at the provider's cached rate divided by two, so the Nimbus discount stacks on top of the vendor's cache discount. Streamed responses are charged only for tokens actually delivered. Failed responses (5xx from the upstream) are never billed.
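To estimate what a single request debits, the rules above can be combined into one calculation. This is a rough sketch under stated assumptions, not the actual billing code: `request_cost` and its parameters are hypothetical, `input_tokens` is assumed to exclude cached tokens, and the vendor cached rate in the example is a placeholder because this page does not list cached rates.

```python
from decimal import Decimal

MILLION = Decimal(1_000_000)

def request_cost(input_tokens: int, output_tokens: int,
                 input_rate: Decimal, output_rate: Decimal,
                 cached_tokens: int = 0,
                 vendor_cached_rate: Decimal = Decimal("0"),
                 upstream_status: int = 200) -> Decimal:
    """Estimate the USD debit for one request; rates are USD per 1M tokens."""
    # Failed upstream responses (5xx) are never billed.
    if 500 <= upstream_status <= 599:
        return Decimal("0")
    # Cached input is billed at the vendor's cached rate divided by two.
    cached_rate = vendor_cached_rate / 2
    total = (Decimal(input_tokens) * input_rate
             + Decimal(cached_tokens) * cached_rate
             + Decimal(output_tokens) * output_rate)
    return total / MILLION

# Example with the anthropic/claude-sonnet-4.6 rates from the table:
# 12,000 uncached input tokens and 800 output tokens.
print(request_cost(12_000, 800, Decimal("1.50"), Decimal("7.50")))
```

For a streamed response that is cut short, `output_tokens` would be the count of tokens actually delivered.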