Models

28 models on Nimbus, grouped by provider. Every model streams, most call tools, most see images. Every price is exactly 50% off the vendor's list rate.

Full price table is on the Pricing page. This page is for choosing a model — what each one is good at, and what capabilities it supports. All models share the same OpenAI-compatible endpoint at https://llm.nimbusapi.net/v1. Anthropic-format is also available at /anthropic/v1 for Claude models.

Anthropic

Claude family. Strongest at long-context reasoning, agent loops, and tool use. Best default for coding assistants and multi-turn agents.

anthropic/claude-opus-4.8

Claude Opus 4.8

Frontier model in the Nimbus catalog. See the pricing page for full rate details.

Context

Best for

Frontier reasoning, autonomous agents, long-horizon planning

Input / 1M

$2.50

Output / 1M

$12.50

StreamingTool callingVision

anthropic/claude-opus-4.7

Claude Opus 4.7

Top-tier reasoning. Best for complex agents and long-context work.

Context

Best for

Complex agents, deep research, long-context code review

Input / 1M

$2.50

Output / 1M

$12.50

StreamingTool callingVision

anthropic/claude-opus-4.6

Claude Opus 4.6

Previous-gen Opus. Proven reasoning at the same price.

Context

Best for

Same reasoning at proven stability. Good for pinned production

Input / 1M

$2.50

Output / 1M

$12.50

StreamingTool callingVision

anthropic/claude-sonnet-4.6

Claude Sonnet 4.6

Balanced speed and intelligence. Great default for chat and tools.

Context

Best for

Balanced default: chat, tool use, moderate reasoning

Input / 1M

$1.50

Output / 1M

$7.50

StreamingTool callingVision

anthropic/claude-haiku-4.5

Claude Haiku 4.5

Lowest latency. Built for high-throughput, real-time tasks.

Context

200K

Best for

High-throughput classification, real-time chat, cheap fallback

Input / 1M

$0.5

Output / 1M

$2.50

StreamingTool callingVision

OpenAI

GPT-5 family. Broadest general skills, best-in-class function calling, and the Codex line for code-specific workloads.

openai/gpt-5.5

GPT-5.5

Most-used flagship. Strong general intelligence with 1M context.

Context

Best for

General flagship: assistants, agents, structured output

Input / 1M

$2.50

Output / 1M

$15.00

StreamingTool callingVision

openai/gpt-5.4

GPT-5.4

Best price-to-quality. Workhorse for high-volume pipelines.

Context

Best for

Best $/quality workhorse. Pipelines and tool-calling backends

Input / 1M

$1.25

Output / 1M

$7.50

StreamingTool callingVision

openai/gpt-5.4-mini

GPT-5.4 mini

Compact GPT-5.4. Cheaper variant with the same skills, lower latency.

Context

400K

Best for

Cheap GPT-5.4 for high-volume tasks with the same skills

Input / 1M

$0.375

Output / 1M

$2.25

StreamingTool callingVision

openai/gpt-5.3-codex

GPT-5.3 Codex

Code-tuned GPT-5.3. Best for agents, refactors and toolcalls.

Context

400K

Best for

Coding agents, refactors, function calls, IDE integrations

Input / 1M

$0.875

Output / 1M

$7.00

StreamingTool calling

openai/gpt-5.1-codex-max

GPT-5.1 Codex Max

Long-context code model. Built for repo-scale refactors.

Context

400K

Best for

Long-context code work — whole-repo refactors, PR reviews

Input / 1M

$0.625

Output / 1M

$5.00

StreamingTool calling

openai/gpt-5.1-codex-mini

GPT-5.1 Codex mini

Tiny code model. Cheap autocomplete and quick agent loops.

Context

400K

Best for

Inline autocomplete, cheap agent loops, quick refactors

Input / 1M

$0.125

Output / 1M

$1.00

StreamingTool calling

Google

Gemini family. Cheapest 1M-context model on the market (Flash 3 Preview) and strong multimodal grounding on Pro.

google/gemini-3.1-pro-preview

Gemini 3.1 Pro Preview

Frontier model in the Nimbus catalog. See the pricing page for full rate details.

Context

Best for

1M-context multimodal reasoning, PDF + image + video input

Input / 1M

$1.00

Output / 1M

$6.00

StreamingTool callingVision

google/gemini-3.5-flash

Gemini Flash 3.5

Fast, cheap Gemini. Great default for high-volume Google workloads.

Context

Best for

Cheap fast Gemini, high-volume Google Workspace integrations

Input / 1M

$0.75

Output / 1M

$4.50

StreamingTool callingVision

google/gemini-3-flash-preview

Gemini 3 Flash Preview

Frontier model in the Nimbus catalog. See the pricing page for full rate details.

Context

Best for

Cheapest 1M-context model on Nimbus. Bulk classification

Input / 1M

$0.25

Output / 1M

$1.50

StreamingTool callingVision

China labs

DeepSeek, Qwen, Z.AI, Moonshot. Frontier quality at 5–15% of Western prices. Best value for high-volume pipelines that don't need Anthropic-tier reasoning.

deepseek/deepseek-v4-pro

DeepSeek V4 Pro

Flagship DeepSeek. 1M context, strong reasoning at a fraction of the price.

Context

Best for

Frontier reasoning at $0.22 / $0.44 per 1M. Bulk pipelines

Input / 1M

$0.215

Output / 1M

$0.435

StreamingTool calling

qwen/qwen3-coder

Qwen3 Coder

Code-specialized Qwen. 1M context, built for agentic coding workflows.

Context

Best for

Agentic coding at 1M context. Best cheap coding model

Input / 1M

$0.11

Output / 1M

$0.9

StreamingTool calling

qwen/qwen3.7-max

Qwen3.7 Max

Frontier model in the Nimbus catalog. See the pricing page for full rate details.

Context

Best for

Top-tier Qwen flagship. Long-horizon reasoning tasks

Input / 1M

$0.625

Output / 1M

$1.88

StreamingTool calling

z-ai/glm-5

GLM-5

Latest Z.AI flagship. Strong reasoning and tool use for agents.

Context

200K

Best for

Z.AI flagship. Strong tool use for agentic workloads

Input / 1M

$0.3

Output / 1M

$0.96

StreamingTool calling

z-ai/glm-5.1

GLM 5.1

Frontier model in the Nimbus catalog. See the pricing page for full rate details.

Context

200K

Best for

Incremental Z.AI update. Better tool routing than GLM-5

Input / 1M

$0.49

Output / 1M

$1.54

StreamingTool calling

z-ai/glm-5.2

GLM 5.2

Frontier model in the Nimbus catalog. See the pricing page for full rate details.

Context

Best for

Latest GLM with 1M context. Long-doc summarization

Input / 1M

$0.7

Output / 1M

$2.20

StreamingTool calling

moonshotai/kimi-k2.6

Kimi K2.6

Moonshot agentic model with vision. Long context, great for tools.

Context

256K

Best for

Vision + agentic tools, 256K context. Great for OCR + workflow

Input / 1M

$0.34

Output / 1M

$1.71

StreamingTool callingVision

moonshotai/kimi-k2.7-code

Kimi K2.7 Code

Frontier model in the Nimbus catalog. See the pricing page for full rate details.

Context

256K

Best for

Latest Kimi coding variant. Strong at Python + JS agents

Input / 1M

$0.375

Output / 1M

$1.75

StreamingTool calling

Image generation

Gemini Flash Image, Gemini 3 Pro Image, GPT-5 Image variants. Billed per image at the underlying vendor rate ÷ 2.

google/gemini-2.5-flash-image

Gemini 2.5 Flash Image

Frontier model in the Nimbus catalog. See the pricing page for full rate details.

Context

—

Best for

Fast image edits, product mockups, iteration

Vision

google/gemini-3-pro-image

Gemini 3 Pro Image

Frontier model in the Nimbus catalog. See the pricing page for full rate details.

Context

—

Best for

Highest-fidelity Gemini image gen. Marketing hero shots

Vision

google/gemini-3.1-flash-image

Gemini 3.1 Flash Image

Frontier model in the Nimbus catalog. See the pricing page for full rate details.

Context

—

Best for

Cheap fast image gen. Bulk asset production

Vision

openai/gpt-5-image

GPT-5 Image

Frontier model in the Nimbus catalog. See the pricing page for full rate details.

Context

—

Best for

OpenAI photorealism. Best for advertising and product renders

Vision

openai/gpt-5-image-mini

GPT-5 Image Mini

Frontier model in the Nimbus catalog. See the pricing page for full rate details.

Context

—

Best for

Cheaper GPT-5 image tier. High-volume creative iteration

Vision

openai/gpt-5.4-image-2

GPT-5.4 Image 2

Frontier model in the Nimbus catalog. See the pricing page for full rate details.

Context

—

Best for

Latest GPT-5.4 image, best prompt adherence in the OpenAI line

Vision

Choosing a model

Highest quality, budget flexible: Claude Opus 4.8 or GPT-5.5.
Best value coding: GPT-5.3 Codex for hosted, DeepSeek V4 Pro or Qwen3 Coder for bulk.
Cheapest 1M context: Gemini 3 Flash Preview or DeepSeek V4 Pro.
Real-time chat: Claude Haiku 4.5 or GPT-5.4 mini.
Vision: Claude Opus / Sonnet / Haiku 4.x, GPT-5.5, Gemini 3.1 Pro, Kimi K2.6.