# AI Model Comparison

> https://ai-driven-office.github.io/model-providers-comparison

AI model throughput and pricing comparison dashboard by AI Driven Office (CyberAgent, Inc.). Compare speed (tokens/sec), pricing ($/M tokens), and capability scores across frontier AI models.

Last updated: 2026-03-16

## Models tracked

Prices are listed as input/output cost per million tokens.

- Claude Haiku 4.5 (Anthropic): 88 tps, $1/$5 per M tokens
- Claude Opus 4.6 (Anthropic): 41 tps, $5/$25 per M tokens
- Claude Sonnet 4.6 (Anthropic): 43 tps, $3/$15 per M tokens
- Gemini 3.1 Flash-Lite (Google AI Studio): 318 tps, $0.25/$1.5 per M tokens
- Gemini 3.1 Pro (Google Vertex): 103 tps, $2/$12 per M tokens
- Gemini 3 Flash (Google AI Studio): 132 tps, $0.5/$3 per M tokens
- GLM 4.7 (Cerebras (Direct)): 538 tps, $2.25/$2.75 per M tokens
- GLM 5 (SiliconFlow): 36 tps, $0.3/$2.55 per M tokens
- GPT-5.3-Codex (OpenAI): 62 tps, $1.75/$14 per M tokens
- GPT-5.3-Codex-Spark (OpenAI): 965 tps, $1.75/$14 per M tokens
- GPT-5.4 (OpenAI): 78 tps, $2.5/$15 per M tokens
- GPT-5.4 Fast (OpenAI): 116 tps, $5/$30 per M tokens
- GPT-5.4 Pro (OpenAI): 31 tps, $30/$180 per M tokens
- Grok Code Fast 1 (xAI): 173 tps, $0.2/$1.5 per M tokens
- Kimi K2.5 (Moonshot AI): 44 tps, $0.6/$3 per M tokens
- Llama 3.1 8B (Taalas): 17000 tps, pricing not published
- Mercury 2 (Inception (Mercury)): 655 tps, $0.25/$0.75 per M tokens
- MiniMax M2.5 (MiniMax): 183 tps, $0.3/$1.2 per M tokens
- Opus 4.6 Fast (Anthropic): 103 tps, $30/$150 per M tokens
- Qwen 3.5 27B (Alibaba Cloud (Qwen)): 88 tps, $0.3/$2.4 per M tokens
- Qwen 3.5 397B (Alibaba Cloud (Qwen)): 55 tps, $0.6/$3.6 per M tokens

## Pages

- Dashboard: https://ai-driven-office.github.io/model-providers-comparison/
- Full data (Markdown): https://ai-driven-office.github.io/model-providers-comparison/data.md
- GLM x Cerebras Guide: https://ai-driven-office.github.io/model-providers-comparison/glm-cerebras/

## Data format

For machine-readable data, fetch the Markdown version:
https://ai-driven-office.github.io/model-providers-comparison/data.md

The Markdown version contains full model data tables, ability scores, and provider information.

## Recent News

- 2026-03-05: GPT-5.4 — OpenAI's 1M Context Unified Model Replaces Codex Line
- 2026-02-24: Mercury 2 Brings Diffusion LLMs Back Into the Latency Race — 2.06x faster than Gemini 3.1 Flash-Lite
- 2026-02-20: Taalas HC1 Pushes Silicon Llama to ~17K tokens/sec — 415x faster than Claude Opus 4.6
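The throughput and pricing figures above lend themselves to quick back-of-envelope estimates. A minimal sketch (prices and speeds copied from the list above; the `MODELS` dict and helper functions are illustrative, not part of the dashboard or its data format):

```python
# Back-of-envelope cost and latency estimates from the dashboard figures.
# Each entry: (input $/M tokens, output $/M tokens, output tokens/sec),
# taken from the "Models tracked" list above.
MODELS = {
    "Claude Opus 4.6": (5.00, 25.00, 41),
    "Gemini 3.1 Flash-Lite": (0.25, 1.50, 318),
    "GPT-5.4": (2.50, 15.00, 78),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated dollar cost of one request."""
    in_price, out_price, _ = MODELS[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

def generation_time(model: str, output_tokens: int) -> float:
    """Estimated seconds to stream the output at the listed throughput."""
    _, _, tps = MODELS[model]
    return output_tokens / tps

# 10k input + 2k output tokens on GPT-5.4:
print(f"${request_cost('GPT-5.4', 10_000, 2_000):.4f}")  # -> $0.0550
print(f"{generation_time('GPT-5.4', 2_000):.1f} s")      # -> 25.6 s
```

Note that the listed tps covers output generation only; real end-to-end latency also includes prompt processing and queueing, which the dashboard does not break out.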