updated 24 Feb 2026
LLM Leaderboard
This LLM leaderboard displays the latest public benchmark performance for SOTA model versions released after April 2024. The data comes from model providers as well as independent evaluations run by Vellum or the open-source community. We feature results only from benchmarks that are not yet saturated, excluding outdated ones (e.g. MMLU). If you want to use these models in your agents, try Vellum.
Top models per task
- Best in Reasoning (GPQA Diamond)
- Best in High School Math (AIME 2025)
- Best in Agentic Coding (SWE Bench)
- Best Overall (Humanity's Last Exam)
- Best in Visual Reasoning (ARC-AGI 2)
- Best in Multilingual Reasoning (MMMLU)
Fastest and most affordable models
Fastest Models (Tokens/sec)
| Rank | Model | Speed |
|---|---|---|
| 1 | Llama 4 Scout | 2600 t/s |
| 2 | Llama 3.3 70b | 2500 t/s |
| 3 | Llama 3.1 70b | 2100 t/s |
| 4 | Llama 3.1 8b | 1800 t/s |
| 5 | Llama 3.1 405b | 969 t/s |
Lowest Latency (TTFT)
| Rank | Model | TTFT |
|---|---|---|
| 1 | GPT-5.3 Codex | 0.003s |
| 2 | Nova Micro | 0.3s |
| 3 | Llama 3.1 8b | 0.32s |
| 4 | Llama 4 Scout | 0.33s |
| 5 | Gemini 2.0 Flash | 0.34s |
Cheapest Models (per 1M tokens)
| Rank | Model | Input / Output |
|---|---|---|
| 1 | Nova Micro | $0.04 / $0.14 |
| 2 | Gemma 3 27b | $0.07 / $0.07 |
| 3 | Gemini 1.5 Flash | $0.075 / $0.3 |
| 4 | GPT oss 20b | $0.08 / $0.35 |
| 5 | Gemini 2.0 Flash | $0.1 / $0.4 |
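Prices throughout this page are quoted as input / output cost per one million tokens, so what a single call costs depends on how many tokens flow each way. A minimal sketch of that arithmetic (the Nova Micro prices come from the list above; the token counts are made-up examples):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 input_price: float, output_price: float) -> float:
    """Prices are USD per 1M tokens; returns USD for one request."""
    return (input_tokens * input_price + output_tokens * output_price) / 1_000_000

# Nova Micro at $0.04 in / $0.14 out, for a hypothetical
# 2,000-token prompt and 500-token reply:
cost = request_cost(2_000, 500, 0.04, 0.14)
print(f"${cost:.6f}")  # $0.000150
```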
Compare models
| | Claude Opus 4.6 | Claude Sonnet 4.6 |
|---|---|---|
| Context size | 200,000 | 200,000 |
| Cutoff date | May 2025 | Aug 2025 |
| I/O cost | $5 / $25 | $3 / $15 |
| Max output | 128,000 | 64,000 |
| Latency | 1.6s | 0.73s |
| Speed | 67 t/s | 55 t/s |
Model Comparison
| Model | Context size | Cutoff date | I/O cost | Max output | Latency | Speed |
|---|---|---|---|---|---|---|
| Claude Opus 4.6 | 200,000 | May 2025 | $5 / $25 | 128,000 | 1.6s | 67 t/s |
| Claude Sonnet 4.6 | 200,000 | Aug 2025 | $3 / $15 | 64,000 | 0.73s | 55 t/s |
| OpenAI o3-mini | 200,000 | Dec 2024 | $1.1 / $4.4 | 8,000 | 14s | 214 t/s |
| | 128,000 | Dec 2024 | $0.55 / $2.19 | 8,000 | 4s | 24 t/s |
| | 200,000 | Nov 2024 | $3 / $15 | 64,000 | 0.95s | 78 t/s |
| | 1,000,000 | Nov 2024 | $1.25 / $10 | 65,000 | 30s | 191 t/s |
| GPT-5 | 400,000 | Apr 2025 | $1.25 / $10 | 128,000 | - | - |
| Kimi K2 Thinking | 256,000 | Apr 2025 | $0.6 / $2.5 | 16,400 | 25.3s | 79 t/s |
| | 10,000,000 | Apr 2025 | $2 / $12 | 650,000 | 30.3s | 128 t/s |
| | 200,000 | Mar 2025 | $3 / $15 | 64,000 | 1.9s | - |
| | 200,000 | Mar 2025 | $15 / $75 | 32,000 | 1.95s | - |
| GPT oss 120b | 131,072 | Apr 2025 | $0.15 / $0.6 | 131,072 | 8.1s | 260 t/s |
| GPT oss 20b | 131,072 | Apr 2025 | $0.08 / $0.35 | 131,072 | 4s | 564 t/s |
| | 200,000 | Apr 2025 | $15 / $75 | 32,000 | - | - |
| GPT 5.1 | 200,000 | Apr 2025 | $1.25 / $10 | 128,000 | - | - |
| | 200,000 | Apr 2025 | $3 / $15 | 160,000 | 31s | 69 t/s |
| GPT 5.2 | 400,000 | Aug 2025 | $1.5 / $14 | 16,000 | 0.6s | 92 t/s |
| | 128,000 | Dec 2024 | $0.27 / $1.1 | 8,000 | 4s | 33 t/s |
| Qwen2.5-VL-32B | 131,000 | Dec 2024 | - | 8,000 | - | - |
| GPT-4.5 | 128,000 | Nov 2024 | $75 / $150 | 16,384 | 1.25s | 48 t/s |
| | 200,000 | Nov 2024 | $3 / $15 | 128,000 | 0.91s | 78 t/s |
| Gemma 3 27b | 128,000 | Nov 2024 | $0.07 / $0.07 | 8,192 | 0.72s | 59 t/s |
| GPT-4.1 | 1,000,000 | Dec 2024 | $2 / $8 | 16,000 | - | - |
| GPT-4.1 mini | 1,000,000 | Dec 2024 | $0.4 / $1.6 | 16,000 | - | - |
| | 200,000 | Apr 2025 | $5 / $25 | 64,000 | - | - |
| OpenAI o1-mini | 128,000 | Dec 2024 | $3 / $12 | 8,000 | 11.43s | 220 t/s |
| | 10,000,000 | Nov 2024 | $0.2 / $0.6 | 8,000 | 0.45s | 126 t/s |
| Llama 4 Scout | 10,000,000 | Nov 2024 | $0.11 / $0.34 | 8,000 | 0.33s | 2600 t/s |
| GPT-4.1 nano | 1,000,000 | Dec 2024 | $0.1 / $0.4 | 32,000 | - | - |
| GPT-5.3 Codex | 400,000 | Aug 2025 | $1.75 / $14 | 128,000 | 0.003s | 50 t/s |
Context window, cost and speed comparison
| Model | Context Window | Input Cost / 1M tokens | Output Cost / 1M tokens | Speed (tokens/second) | Latency |
|---|---|---|---|---|---|
| Claude Opus 4.6 | 200,000 | $5 | $25 | 67 t/s | 1.6 seconds |
| Claude Sonnet 4.6 | 200,000 | $3 | $15 | 55 t/s | 0.73 seconds |
| GPT-5.3 Codex | 400,000 | $1.75 | $14 | 50 t/s | 0.003 seconds |
| | 128,000 | $0.27 | $1.1 | 33 t/s | 4 seconds |
| Qwen2.5-VL-32B | 131,000 | n/a | n/a | n/a | n/a |
| OpenAI o1-mini | 128,000 | $3 | $12 | 220 t/s | 11.43 seconds |
| OpenAI o3-mini | 200,000 | $1.1 | $4.4 | 214 t/s | 14 seconds |
| | 128,000 | $0.55 | $2.19 | 24 t/s | 4 seconds |
| | 200,000 | $3 | $15 | 78 t/s | 0.95 seconds |
| GPT-4.5 | 128,000 | $75 | $150 | 48 t/s | 1.25 seconds |
| | 200,000 | $3 | $15 | 78 t/s | 0.91 seconds |
| | 1,000,000 | $1.25 | $10 | 191 t/s | 30 seconds |
| Gemma 3 27b | 128,000 | $0.07 | $0.07 | 59 t/s | 0.72 seconds |
| | 10,000,000 | $0.2 | $0.6 | 126 t/s | 0.45 seconds |
| Llama 4 Scout | 10,000,000 | $0.11 | $0.34 | 2600 t/s | 0.33 seconds |
| GPT-4.1 | 1,000,000 | $2 | $8 | n/a | n/a |
| GPT-4.1 mini | 1,000,000 | $0.4 | $1.6 | n/a | n/a |
| GPT-4.1 nano | 1,000,000 | $0.1 | $0.4 | n/a | n/a |
| | 200,000 | $3 | $15 | n/a | 1.9 seconds |
| | 200,000 | $15 | $75 | n/a | 1.95 seconds |
| GPT oss 120b | 131,072 | $0.15 | $0.6 | 260 t/s | 8.1 seconds |
| GPT oss 20b | 131,072 | $0.08 | $0.35 | 564 t/s | 4 seconds |
| | 200,000 | $15 | $75 | n/a | n/a |
| GPT-5 | 400,000 | $1.25 | $10 | n/a | n/a |
| GPT 5.1 | 200,000 | $1.25 | $10 | n/a | n/a |
| Kimi K2 Thinking | 256,000 | $0.6 | $2.5 | 79 t/s | 25.3 seconds |
| | 10,000,000 | $2 | $12 | 128 t/s | 30.3 seconds |
| | 200,000 | $3 | $15 | 69 t/s | 31 seconds |
| | 200,000 | $5 | $25 | n/a | n/a |
| GPT 5.2 | 400,000 | $1.5 | $14 | 92 t/s | 0.6 seconds |
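Latency (time to first token) and speed (tokens per second) combine into a rough end-to-end estimate: total time ≈ TTFT + output tokens ÷ speed. A quick sketch using two rows from the table above and an assumed 1,000-token response; real generation time also depends on server load and prompt size, which this ignores:

```python
def response_time(ttft_s: float, speed_tps: float, output_tokens: int) -> float:
    """Rough end-to-end generation time: time to first token
    plus time to stream the output tokens."""
    return ttft_s + output_tokens / speed_tps

# Assumed 1,000-token response; TTFT and t/s figures from the table above.
opus = response_time(1.6, 67, 1000)     # Claude Opus 4.6: ~16.5 s
o3_mini = response_time(14, 214, 1000)  # OpenAI o3-mini: ~18.7 s
print(f"{opus:.1f}s vs {o3_mini:.1f}s")
```

Note how a high TTFT can outweigh a fast streaming rate: o3-mini streams three times faster here but still finishes later.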