Models

28 models on Nimbus, grouped by provider. Every model streams, most call tools, most see images. Every price is exactly 50% off the vendor's list rate.

Full price table is on the Pricing page. This page is for choosing a model — what each one is good at, and what capabilities it supports. All models share the same OpenAI-compatible endpoint at https://llm.nimbusapi.net/v1. Anthropic-format is also available at /anthropic/v1 for Claude models.

Anthropic

Claude family. Strongest at long-context reasoning, agent loops, and tool use. Best default for coding assistants and multi-turn agents.

anthropic/claude-opus-4.8

Claude Opus 4.8

Frontier model in the Nimbus catalog. See the pricing page for full rate details.

Context

1M

Best for

Frontier reasoning, autonomous agents, long-horizon planning

Input / 1M

$2.50

Output / 1M

$12.50

StreamingTool callingVision

anthropic/claude-opus-4.7

Claude Opus 4.7

Top-tier reasoning. Best for complex agents and long-context work.

Context

1M

Best for

Complex agents, deep research, long-context code review

Input / 1M

$2.50

Output / 1M

$12.50

StreamingTool callingVision

anthropic/claude-opus-4.6

Claude Opus 4.6

Previous-gen Opus. Proven reasoning at the same price.

Context

1M

Best for

Same reasoning at proven stability. Good for pinned production

Input / 1M

$2.50

Output / 1M

$12.50

StreamingTool callingVision

anthropic/claude-sonnet-4.6

Claude Sonnet 4.6

Balanced speed and intelligence. Great default for chat and tools.

Context

1M

Best for

Balanced default: chat, tool use, moderate reasoning

Input / 1M

$1.50

Output / 1M

$7.50

StreamingTool callingVision

anthropic/claude-haiku-4.5

Claude Haiku 4.5

Lowest latency. Built for high-throughput, real-time tasks.

Context

200K

Best for

High-throughput classification, real-time chat, cheap fallback

Input / 1M

$0.5

Output / 1M

$2.50

StreamingTool callingVision

OpenAI

GPT-5 family. Broadest general skills, best-in-class function calling, and the Codex line for code-specific workloads.

openai/gpt-5.5

GPT-5.5

Most-used flagship. Strong general intelligence with 1M context.

Context

1M

Best for

General flagship: assistants, agents, structured output

Input / 1M

$2.50

Output / 1M

$15.00

StreamingTool callingVision

openai/gpt-5.4

GPT-5.4

Best price-to-quality. Workhorse for high-volume pipelines.

Context

1M

Best for

Best $/quality workhorse. Pipelines and tool-calling backends

Input / 1M

$1.25

Output / 1M

$7.50

StreamingTool callingVision

openai/gpt-5.4-mini

GPT-5.4 mini

Compact GPT-5.4. Cheaper variant with the same skills, lower latency.

Context

400K

Best for

Cheap GPT-5.4 for high-volume tasks with the same skills

Input / 1M

$0.375

Output / 1M

$2.25

StreamingTool callingVision

openai/gpt-5.3-codex

GPT-5.3 Codex

Code-tuned GPT-5.3. Best for agents, refactors and toolcalls.

Context

400K

Best for

Coding agents, refactors, function calls, IDE integrations

Input / 1M

$0.875

Output / 1M

$7.00

StreamingTool calling

openai/gpt-5.1-codex-max

GPT-5.1 Codex Max

Long-context code model. Built for repo-scale refactors.

Context

400K

Best for

Long-context code work — whole-repo refactors, PR reviews

Input / 1M

$0.625

Output / 1M

$5.00

StreamingTool calling

openai/gpt-5.1-codex-mini

GPT-5.1 Codex mini

Tiny code model. Cheap autocomplete and quick agent loops.

Context

400K

Best for

Inline autocomplete, cheap agent loops, quick refactors

Input / 1M

$0.125

Output / 1M

$1.00

StreamingTool calling

Google

Gemini family. Cheapest 1M-context model on the market (Flash 3 Preview) and strong multimodal grounding on Pro.

google/gemini-3.1-pro-preview

Gemini 3.1 Pro Preview

Frontier model in the Nimbus catalog. See the pricing page for full rate details.

Context

1M

Best for

1M-context multimodal reasoning, PDF + image + video input

Input / 1M

$1.00

Output / 1M

$6.00

StreamingTool callingVision

google/gemini-3.5-flash

Gemini Flash 3.5

Fast, cheap Gemini. Great default for high-volume Google workloads.

Context

1M

Best for

Cheap fast Gemini, high-volume Google Workspace integrations

Input / 1M

$0.75

Output / 1M

$4.50

StreamingTool callingVision

google/gemini-3-flash-preview

Gemini 3 Flash Preview

Frontier model in the Nimbus catalog. See the pricing page for full rate details.

Context

1M

Best for

Cheapest 1M-context model on Nimbus. Bulk classification

Input / 1M

$0.25

Output / 1M

$1.50

StreamingTool callingVision

China labs

DeepSeek, Qwen, Z.AI, Moonshot. Frontier quality at 5–15% of Western prices. Best value for high-volume pipelines that don't need Anthropic-tier reasoning.

deepseek/deepseek-v4-pro

DeepSeek V4 Pro

Flagship DeepSeek. 1M context, strong reasoning at a fraction of the price.

Context

1M

Best for

Frontier reasoning at $0.22 / $0.44 per 1M. Bulk pipelines

Input / 1M

$0.215

Output / 1M

$0.435

StreamingTool calling

qwen/qwen3-coder

Qwen3 Coder

Code-specialized Qwen. 1M context, built for agentic coding workflows.

Context

1M

Best for

Agentic coding at 1M context. Best cheap coding model

Input / 1M

$0.11

Output / 1M

$0.9

StreamingTool calling

qwen/qwen3.7-max

Qwen3.7 Max

Frontier model in the Nimbus catalog. See the pricing page for full rate details.

Context

1M

Best for

Top-tier Qwen flagship. Long-horizon reasoning tasks

Input / 1M

$0.625

Output / 1M

$1.88

StreamingTool calling

z-ai/glm-5

GLM-5

Latest Z.AI flagship. Strong reasoning and tool use for agents.

Context

200K

Best for

Z.AI flagship. Strong tool use for agentic workloads

Input / 1M

$0.3

Output / 1M

$0.96

StreamingTool calling

z-ai/glm-5.1

GLM 5.1

Frontier model in the Nimbus catalog. See the pricing page for full rate details.

Context

200K

Best for

Incremental Z.AI update. Better tool routing than GLM-5

Input / 1M

$0.49

Output / 1M

$1.54

StreamingTool calling

z-ai/glm-5.2

GLM 5.2

Frontier model in the Nimbus catalog. See the pricing page for full rate details.

Context

1M

Best for

Latest GLM with 1M context. Long-doc summarization

Input / 1M

$0.7

Output / 1M

$2.20

StreamingTool calling

moonshotai/kimi-k2.6

Kimi K2.6

Moonshot agentic model with vision. Long context, great for tools.

Context

256K

Best for

Vision + agentic tools, 256K context. Great for OCR + workflow

Input / 1M

$0.34

Output / 1M

$1.71

StreamingTool callingVision

moonshotai/kimi-k2.7-code

Kimi K2.7 Code

Frontier model in the Nimbus catalog. See the pricing page for full rate details.

Context

256K

Best for

Latest Kimi coding variant. Strong at Python + JS agents

Input / 1M

$0.375

Output / 1M

$1.75

StreamingTool calling

Image generation

Gemini Flash Image, Gemini 3 Pro Image, GPT-5 Image variants. Billed per image at the underlying vendor rate ÷ 2.

google/gemini-2.5-flash-image

Gemini 2.5 Flash Image

Frontier model in the Nimbus catalog. See the pricing page for full rate details.

Context

Best for

Fast image edits, product mockups, iteration

Vision

google/gemini-3-pro-image

Gemini 3 Pro Image

Frontier model in the Nimbus catalog. See the pricing page for full rate details.

Context

Best for

Highest-fidelity Gemini image gen. Marketing hero shots

Vision

google/gemini-3.1-flash-image

Gemini 3.1 Flash Image

Frontier model in the Nimbus catalog. See the pricing page for full rate details.

Context

Best for

Cheap fast image gen. Bulk asset production

Vision

openai/gpt-5-image

GPT-5 Image

Frontier model in the Nimbus catalog. See the pricing page for full rate details.

Context

Best for

OpenAI photorealism. Best for advertising and product renders

Vision

openai/gpt-5-image-mini

GPT-5 Image Mini

Frontier model in the Nimbus catalog. See the pricing page for full rate details.

Context

Best for

Cheaper GPT-5 image tier. High-volume creative iteration

Vision

openai/gpt-5.4-image-2

GPT-5.4 Image 2

Frontier model in the Nimbus catalog. See the pricing page for full rate details.

Context

Best for

Latest GPT-5.4 image, best prompt adherence in the OpenAI line

Vision

Choosing a model

  • Highest quality, budget flexible: Claude Opus 4.8 or GPT-5.5.
  • Best value coding: GPT-5.3 Codex for hosted, DeepSeek V4 Pro or Qwen3 Coder for bulk.
  • Cheapest 1M context: Gemini 3 Flash Preview or DeepSeek V4 Pro.
  • Real-time chat: Claude Haiku 4.5 or GPT-5.4 mini.
  • Vision: Claude Opus / Sonnet / Haiku 4.x, GPT-5.5, Gemini 3.1 Pro, Kimi K2.6.