# AI Model Comparison

> https://ai-driven-office.github.io/model-providers-comparison

AI model throughput and pricing comparison dashboard by AI Driven Office (CyberAgent, Inc.). Compare speed (tokens/sec), pricing ($/M tokens), and capability scores across frontier AI models.

Last updated: 2026-03-16

## Models tracked

Prices are listed as input/output cost per million tokens.

- Claude Haiku 4.5 (Anthropic): 88 tps, $1/$5 per M tokens
- Claude Opus 4.6 (Anthropic): 41 tps, $5/$25 per M tokens
- Claude Sonnet 4.6 (Anthropic): 43 tps, $3/$15 per M tokens
- Gemini 3.1 Flash-Lite (Google AI Studio): 318 tps, $0.25/$1.5 per M tokens
- Gemini 3.1 Pro (Google Vertex): 103 tps, $2/$12 per M tokens
- Gemini 3 Flash (Google AI Studio): 132 tps, $0.5/$3 per M tokens
- GLM 4.7 (Cerebras (Direct)): 538 tps, $2.25/$2.75 per M tokens
- GLM 5 (SiliconFlow): 36 tps, $0.3/$2.55 per M tokens
- GPT-5.3-Codex (OpenAI): 62 tps, $1.75/$14 per M tokens
- GPT-5.3-Codex-Spark (OpenAI): 965 tps, $1.75/$14 per M tokens
- GPT-5.4 (OpenAI): 78 tps, $2.5/$15 per M tokens
- GPT-5.4 Fast (OpenAI): 116 tps, $5/$30 per M tokens
- GPT-5.4 Pro (OpenAI): 31 tps, $30/$180 per M tokens
- Grok Code Fast 1 (xAI): 173 tps, $0.2/$1.5 per M tokens
- Kimi K2.5 (Moonshot AI): 44 tps, $0.6/$3 per M tokens
- Llama 3.1 8B (Taalas): 17000 tps, pricing not published
- Mercury 2 (Inception (Mercury)): 655 tps, $0.25/$0.75 per M tokens
- MiniMax M2.5 (MiniMax): 183 tps, $0.3/$1.2 per M tokens
- Opus 4.6 Fast (Anthropic): 103 tps, $30/$150 per M tokens
- Qwen 3.5 27B (Alibaba Cloud (Qwen)): 88 tps, $0.3/$2.4 per M tokens
- Qwen 3.5 397B (Alibaba Cloud (Qwen)): 55 tps, $0.6/$3.6 per M tokens

## Pages

- Dashboard: https://ai-driven-office.github.io/model-providers-comparison/
- Full data (Markdown): https://ai-driven-office.github.io/model-providers-comparison/data.md
- GLM x Cerebras Guide: https://ai-driven-office.github.io/model-providers-comparison/glm-cerebras/

## Data format

For machine-readable data, fetch the Markdown version:
https://ai-driven-office.github.io/model-providers-comparison/data.md

The Markdown version contains full model data tables, ability scores, and provider information.

## Recent News

- 2026-03-05: GPT-5.4 — OpenAI's 1M Context Unified Model Replaces Codex Line
- 2026-02-24: Mercury 2 Brings Diffusion LLMs Back Into the Latency Race — 2.06x faster than Gemini 3.1 Flash-Lite
- 2026-02-20: Taalas HC1 Pushes Silicon Llama to ~17K tokens/sec — 415x faster than Claude Opus 4.6
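The throughput and pricing figures above lend themselves to quick back-of-envelope estimates. A minimal sketch (prices and speeds copied from the list above; the `MODELS` dict and helper functions are illustrative, not part of the dashboard or its data format):

```python
# Back-of-envelope cost and latency estimates from the dashboard figures.
# Each entry: (input $/M tokens, output $/M tokens, output tokens/sec),
# taken from the "Models tracked" list above.
MODELS = {
    "Claude Opus 4.6": (5.00, 25.00, 41),
    "Gemini 3.1 Flash-Lite": (0.25, 1.50, 318),
    "GPT-5.4": (2.50, 15.00, 78),
}

def request_cost(model: str, input_tokens: int, output_tokens: int) -> float:
    """Estimated dollar cost of one request."""
    in_price, out_price, _ = MODELS[model]
    return (input_tokens * in_price + output_tokens * out_price) / 1_000_000

def generation_time(model: str, output_tokens: int) -> float:
    """Estimated seconds to stream the output at the listed throughput."""
    _, _, tps = MODELS[model]
    return output_tokens / tps

# 10k input + 2k output tokens on GPT-5.4:
print(f"${request_cost('GPT-5.4', 10_000, 2_000):.4f}")  # -> $0.0550
print(f"{generation_time('GPT-5.4', 2_000):.1f} s")      # -> 25.6 s
```

Note that the listed tps covers output generation only; real end-to-end latency also includes prompt processing and queueing, which the dashboard does not break out.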