AI Driven · Live Data · Mar 2026

AI Model Comparison

Speed, pricing, capabilities, and recommendations across AI model providers

GPT-5.4 — OpenAI's 1M Context Unified Model Replaces Codex Line
ACB (AutoCodeBench) language rankings — pass rates across 21 languages
Speed Champion: Llama 3.1 8B on Taalas at 17,000 tokens/sec

Throughput (tokens per second)

Providers tracked: Alibaba Cloud (Qwen), Anthropic, Cerebras (Direct), Google AI Studio, Google Vertex, Inception (Mercury), MiniMax, Moonshot AI, OpenAI, SiliconFlow, Taalas, xAI
GLM 4.7 × Cerebras Guide
AI coding at 1,000 tps — OpenCode setup & recommended workflow
Taalas HC1 — New Speed Record
17,000 tps — custom silicon with extreme throughput even after heavy quantization
| Model | Provider | TPS | Input $/M | Output $/M | >200K Input $/M | >200K Output $/M |
|---|---|---|---|---|---|---|
| Llama 3.1 8B | Taalas | 17,000 | | | | |
| GPT-5.3-Codex-Spark | OpenAI | 965 | $1.75 | $14 | | |
| Mercury 2 | Inception (Mercury) | 655 | $0.25 | $0.75 | | |
| GLM 4.7 | Cerebras (Direct) | 538 | $2.25 | $2.75 | | |
| Gemini 3.1 Flash-Lite | Google AI Studio | 318 | $0.25 | $1.50 | | |
| MiniMax M2.5 | MiniMax | 183 | $0.30 | $1.20 | | |
| Grok Code Fast 1 | xAI | 173 | $0.20 | $1.50 | | |
| Gemini 3 Flash | Google AI Studio | 132 | $0.50 | $3 | | |
| GPT-5.4 Fast | OpenAI | 116 | $5 | $30 | $10 | $45 |
| Gemini 3.1 Pro | Google Vertex | 103 | $2 | $12 | $4 | $18 |
| Opus 4.6 Fast | Anthropic | 103 | $30 | $150 | $60 | $225 |
| Claude Haiku 4.5 | Anthropic | 88 | $1 | $5 | | |
| Qwen 3.5 27B | Alibaba Cloud (Qwen) | 88 | $0.30 | $2.40 | | |
| GPT-5.4 | OpenAI | 78 | $2.50 | $15 | $5 | $22.50 |
| GPT-5.3-Codex | OpenAI | 62 | $1.75 | $14 | | |
| Qwen 3.5 397B | Alibaba Cloud (Qwen) | 55 | $0.60 | $3.60 | | |
| Kimi K2.5 | Moonshot AI | 44 | $0.60 | $3 | | |
| Claude Sonnet 4.6 | Anthropic | 43 | $3 | $15 | | |
| Claude Opus 4.6 | Anthropic | 41 | $5 | $25 | | |
| GLM 5 | SiliconFlow | 36 | $0.30 | $2.55 | | |
| GPT-5.4 Pro | OpenAI | 31 | $30 | $180 | | |
News & Updates

Latest developments in AI model performance and infrastructure

MAR 5, 2026 · release · openai · computer use · long context · codex replacement

GPT-5.4 — OpenAI's 1M Context Unified Model Replaces Codex Line

OpenAI has released GPT-5.4, their most ambitious model consolidation yet. It merges the previously separate Codex coding line, reasoning capabilities, and general knowledge into a single model — and adds native computer use as a first for OpenAI's mainline models.

The headline feature is a 1,050,000-token context window, but there's a catch: input beyond 272K tokens costs 2x ($5/M input, $22.50/M output vs. the standard $2.50/$15), and long-context performance degrades significantly. While GPT-4.1 scored 100% on needle-in-haystack at 1M tokens, real-world agentic tasks show diminishing returns as context grows — models lose track of earlier instructions, hallucinate references, and exhibit attention drift. OpenAI acknowledges this by training GPT-5.4 with "compaction" to compress trajectories, but independent evaluations are still pending. For most use cases, the sweet spot remains under 256K tokens.

The biggest wins are in agentic benchmarks: OSWorld jumps to 75% (surpassing the 72.4% human baseline), GDPval hits 83% across 44 professions, and ARC-AGI-2 reaches 73.3%. On coding, it matches GPT-5.3-Codex on SWE-Bench Pro (57.7% vs. 56.8%) while adding much stronger general knowledge.

GPT-5.2 Thinking is scheduled for deprecation on June 5, 2026, with GPT-5.4 positioned as the successor. Codex continues to run on the GPT-5.4 family, and OpenAI's priority tier keeps the higher-throughput path at premium pricing, but the public baseline remains the standard GPT-5.4 API at roughly 78 tok/s and $2.50/$15. That leaves it materially faster than Claude Opus 4.6 while staying close to Gemini 3.1 Pro on price.

Speed comparisons (relative to GPT-5.4): GPT-5.4 1x (baseline) · GPT-5.3-Codex 1.26x slower · Claude Opus 4.6 1.9x slower
Context: 1.05M tokens
Max output: 128K tokens
Input price: $2.50/M
Output price: $15/M
OSWorld: 75% (>human)
SWE-Bench Pro: 57.7%
FEB 24, 2026 · release · inference · latency · diffusion

Mercury 2 Brings Diffusion LLMs Back Into the Latency Race

Inception launched Mercury 2 as a faster and cheaper follow-up to the original Mercury line. Officially, the company says Mercury 2 reaches 1,009 tokens per second on Blackwell GPUs with 128K context and $0.25/$0.75 per million token pricing. Independent tracking is more conservative but still unusually fast: Artificial Analysis' latest public snapshot puts Mercury 2 around 655 tok/s, which keeps it far ahead of mainstream frontier APIs on direct latency. The tradeoff is capability ceiling rather than raw speed. Mercury 2 is best read as a throughput-first model for short-loop agentic work, low-latency chat, and interactive coding assistance where response speed matters more than absolute benchmark leadership.

Speed comparisons (Mercury 2 advantage): Gemini 3.1 Flash-Lite 2.06x · Grok Code Fast 1 3.79x · GPT-5.4 8.4x
Context: 128K tokens
Input price: $0.25/M
Output price: $0.75/M
Official speed: 1,009 tok/s
AA measured speed: 655 tok/s
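Throughput figures translate directly into wall-clock generation time. A quick sketch, using the measured speeds from the table above (the `seconds_for` helper is ours, and it deliberately ignores time-to-first-token and network overhead):

```python
def seconds_for(tokens: int, tps: float) -> float:
    """Wall-clock seconds to generate `tokens` at a steady rate of
    `tps` tokens/sec (ignores time-to-first-token and network latency)."""
    return tokens / tps

# A 1,000-token response at the measured speeds above:
for name, tps in [("Mercury 2", 655),
                  ("Gemini 3.1 Flash-Lite", 318),
                  ("GPT-5.4", 78)]:
    print(f"{name}: {seconds_for(1_000, tps):.2f}s")
# Mercury 2: 1.53s · Flash-Lite: 3.14s · GPT-5.4: 12.82s
```

The gap is why throughput-first models matter for interactive loops: at 655 tok/s a full answer lands in under two seconds, while the same answer at 78 tok/s keeps the user waiting most of a quarter minute.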
FEB 20, 2026 · speed record · inference · custom silicon

Taalas HC1 Pushes Silicon Llama to ~17K tokens/sec

Taalas says its HC1 chip can run Silicon Llama 3.1 8B at about 17K tokens per second per user, which is still far beyond GPU-class direct inference. The design hardwires the model into custom silicon instead of relying on HBM-heavy accelerator stacks, and Taalas claims roughly 10x lower power than conventional hardware. The public hardware details remain striking: TSMC 6nm, 815mm² die size, 53B transistors, a 24-person team, and about $169M raised. The key caveat is quality: Taalas explicitly says the first-generation Silicon Llama is aggressively quantized with mixed 3-bit and 6-bit weights, so it does not match full-precision GPU baselines on quality. Even with that caveat, the speed headroom is unusual enough to make near-zero-latency chat, instant summarization, and multi-step agent loops practical in ways general cloud inference still struggles to match.

Speed comparisons (HC1 advantage): Claude Opus 4.6 415x · GPT-5.3-Codex 274x · GLM 4.7 (Cerebras) 31.6x
Process: TSMC 6nm
Die size: 815mm²
Transistors: 53B
Quantization: Mixed 3-bit + 6-bit
Team: 24
Funding: $169M
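The multiples in the card above follow directly from the throughput table: each is HC1's 17,000 tok/s divided by the comparison model's rate. A minimal sketch (the `speed_multiple` helper is ours; note the card rounds 17,000/41 up to 415x, while the raw ratio is closer to 414.6x):

```python
def speed_multiple(fast_tps: float, slow_tps: float) -> float:
    """How many times faster the first rate is than the second."""
    return fast_tps / slow_tps

TAALAS_HC1_TPS = 17_000
for name, tps in [("Claude Opus 4.6", 41),
                  ("GPT-5.3-Codex", 62),
                  ("GLM 4.7 (Cerebras)", 538)]:
    print(f"{name}: {speed_multiple(TAALAS_HC1_TPS, tps):.1f}x")
# Opus 4.6: 414.6x · GPT-5.3-Codex: 274.2x · GLM 4.7: 31.6x
```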

Spread the word

Know someone evaluating AI models? Share this comparison — the more eyes, the better the data.

Help us keep this accurate

Found a wrong price, missing model, or outdated benchmark? Open an issue or send a pull request — every fix helps the community.

Open an Issue or PR