Open LLM Leaderboard
This LLM leaderboard tracks the latest public benchmark results for state-of-the-art open-source model versions released after April 2024. The data comes from model providers as well as independent evaluations run by Vellum or the AI community. We feature results only from non-saturated benchmarks and exclude outdated ones (e.g. MMLU). If you want to evaluate these models on your own use cases, try Vellum Evals.
Best open source models per task
Best in Reasoning (GPQA Diamond)
Scores (%):
Nemotron Ultra 253B: 76
Llama 4 Behemoth: 73.7
DeepSeek-R1: 71.5
Llama 4 Maverick: 69.8
DeepSeek V3 0324: 64.8
Best in High School Math (AIME 2024)
Scores (%):
Nemotron Ultra 253B: 80.08
DeepSeek-R1: 79.8
DeepSeek V3 0324: 59.4
Llama 3.1 405b: 23.3
Best in Agentic Coding (SWE Bench)
Scores (%):
DeepSeek-R1: 49.2
DeepSeek V3 0324: 38.8
Qwen2.5-VL-32B: 18.8
Gemma 3 27b: 10.2
Best in Tool Use (BFCL)
Scores (%):
Llama 3.1 405b: 81.1
Llama 3.3 70b: 77.3
Qwen2.5-VL-32B: 62.79
Gemma 3 27b: 59.11
DeepSeek V3 0324: 58.55
Best in Adaptive Reasoning (GRIND)
Scores (%):
Nemotron Ultra 253B: 57.1
Llama 4 Maverick: 53.6
DeepSeek-R1: 53.6
Qwen2.5-VL-32B: 42.9
Best Coding (LiveCodeBench)
Scores (%):
DeepSeek-R1: 64.3
Nemotron Ultra 253B: 64
Llama 4 Behemoth: 49.4
Llama 4 Maverick: 41
DeepSeek V3 0324: 41
Fastest and most affordable models
Fastest Models
Throughput (tokens/second):
Llama 4 Scout: 2600
Llama 3.3 70b: 2500
Llama 3.1 70b: 2100
Llama 3.1 405b: 969
Llama 4 Maverick: 126
Lowest Latency (TTFT)
Seconds to first token:
Llama 4 Scout: 0.33
Llama 4 Maverick: 0.45
Llama 3.3 70b: 0.52
Gemma 3 27b: 0.72
Llama 3.1 405b: 0.73
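Taken together, time to first token and throughput determine how long a full response takes: roughly TTFT plus output tokens divided by tokens per second. A minimal sketch of that arithmetic, using the Llama 4 Scout figures above (the 1,000-token answer length is an assumed example, not a measured value):

```python
def response_time(ttft_s: float, output_tokens: int, tokens_per_s: float) -> float:
    """Rough end-to-end latency: time to first token plus streaming time."""
    return ttft_s + output_tokens / tokens_per_s

# Llama 4 Scout from the charts above: 0.33 s TTFT, 2600 t/s throughput.
# A hypothetical 1,000-token answer:
print(round(response_time(0.33, 1000, 2600), 2))  # 0.71 seconds
```

This is why a high-TTFT model can still feel fast for long outputs if its throughput is high, and vice versa for short outputs.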
Cheapest Models
USD per 1M tokens (input / output):
Gemma 3 27b: $0.07 / $0.07
Llama 4 Scout: $0.11 / $0.34
Llama 4 Maverick: $0.20 / $0.60
DeepSeek V3 0324: $0.27 / $1.10
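Per-1M-token prices translate to a per-request cost as input tokens times the input price plus output tokens times the output price, each divided by one million. A minimal sketch using the DeepSeek V3 0324 prices charted above (the token counts are assumed examples):

```python
def request_cost(input_tokens: int, output_tokens: int,
                 usd_in_per_m: float, usd_out_per_m: float) -> float:
    """Cost in USD for a single request, given per-1M-token prices."""
    return (input_tokens * usd_in_per_m + output_tokens * usd_out_per_m) / 1_000_000

# DeepSeek V3 0324: $0.27 input / $1.10 output per 1M tokens.
# A hypothetical request with a 2,000-token prompt and 500-token answer:
print(f"${request_cost(2_000, 500, 0.27, 1.10):.6f}")  # about $0.001090
```

Note that output tokens usually cost several times more than input tokens, so answer length often dominates the bill.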
Compare open models
Model | Context size | Cutoff date | I/O cost (USD per 1M tokens, input / output) | Max output | Latency | Speed
Gemini 2.5 Flash | 1,000,000 | May 2024 | 0.15 / 0.60 | 30,000 | 0.35 s | 200 t/s
OpenAI o3 | 200,000 | May 2024 | 10 / 40 | 100,000 | 28 s | 94 t/s
OpenAI o4-mini | 200,000 | May 2024 | 1.10 / 4.40 | 100,000 | 35.3 s | 135 t/s
Nemotron Ultra 253B | n/a | n/a | n/a | n/a | n/a | n/a
GPT-4.1 nano | 1,000,000 | December 2024 | 0.10 / 0.40 | 32,000 | n/a | n/a
GPT-4.1 mini | 1,000,000 | December 2024 | 0.40 / 1.60 | 16,000 | n/a | n/a
GPT-4.1 | 1,000,000 | December 2024 | 2 / 8 | 16,000 | n/a | n/a
Llama 4 Behemoth | n/a | November 2024 | n/a | n/a | n/a | n/a
Llama 4 Scout | 10,000,000 | November 2024 | 0.11 / 0.34 | 8,000 | 0.33 s | 2600 t/s
Llama 4 Maverick | 10,000,000 | November 2024 | 0.20 / 0.60 | 8,000 | 0.45 s | 126 t/s
Gemma 3 27b | 128,000 | Nov 2024 | 0.07 / 0.07 | 8,192 | 0.72 s | 59 t/s
Grok 3 mini [Beta] | n/a | Nov 2024 | n/a | n/a | n/a | n/a
Grok 3 [Beta] | n/a | Nov 2024 | n/a | n/a | n/a | n/a
Gemini 2.5 Pro | 1,000,000 | Nov 2024 | 1.25 / 10 | 65,000 | 30 s | 191 t/s
Claude 3.7 Sonnet | 200,000 | Nov 2024 | 3 / 15 | 128,000 | 0.91 s | 78 t/s
GPT-4.5 | 128,000 | Nov 2024 | 75 / 150 | 16,384 | 1.25 s | 48 t/s
Claude 3.7 Sonnet [R] | 200,000 | Nov 2024 | 3 / 15 | 64,000 | 0.95 s | 78 t/s
DeepSeek-R1 | 128,000 | Dec 2024 | 0.55 / 2.19 | 8,000 | 4 s | 24 t/s
OpenAI o3-mini | 200,000 | Dec 2024 | 1.10 / 4.40 | 8,000 | 14 s | 214 t/s
OpenAI o1-mini | 128,000 | Dec 2024 | 3 / 12 | 8,000 | 11.43 s | 220 t/s
Qwen2.5-VL-32B | 131,000 | Dec 2024 | n/a | 8,000 | n/a | n/a
DeepSeek V3 0324 | 128,000 | Dec 2024 | 0.27 / 1.10 | 8,000 | 4 s | 33 t/s
OpenAI o1 | 200,000 | Oct 2023 | 15 / 60 | 100,000 | 30 s | 100 t/s
Gemini 2.0 Flash | 1,000,000 | Aug 2024 | 0.10 / 0.40 | 8,192 | 0.34 s | 257 t/s
Llama 3.3 70b | 128,000 | July 2024 | 0.59 / 0.70 | 32,768 | 0.52 s | 2500 t/s
Nova Micro | 128,000 | July 2024 | 0.04 / 0.14 | 4,096 | 0.3 s | n/a
Nova Lite | 300,000 | July 2024 | n/a | 4,096 | 0.4 s | n/a
Nova Pro | 300,000 | July 2024 | 1 / 4 | 4,096 | 0.64 s | 128 t/s
Claude 3.5 Haiku | 200,000 | July 2024 | 0.80 / 4 | 4,096 | 0.88 s | 66 t/s
Llama 3.1 405b | 128,000 | Dec 2023 | 3.50 / 3.50 | 4,096 | 0.73 s | 969 t/s
Llama 3.1 70b | 128,000 | Dec 2023 | n/a | 4,096 | n/a | 2100 t/s
Llama 3.1 8b | 128,000 | Dec 2023 | n/a | 4,096 | 0.32 s | 1800 t/s
Gemini 1.5 Flash | 1,000,000 | May 2024 | 0.075 / 0.30 | 4,096 | 1.06 s | 166 t/s
Gemini 1.5 Pro | 2,000,000 | May 2024 | n/a | 4,096 | n/a | 61 t/s
GPT-3.5 Turbo | 16,400 | Sept 2023 | n/a | 4,096 | n/a | n/a
GPT-4o mini | 128,000 | Oct 2023 | 0.15 / 0.60 | 4,096 | 0.35 s | 65 t/s
GPT-Turbo | 128,000 | Dec 2023 | n/a | 4,096 | n/a | n/a
GPT-4o | 128,000 | Oct 2023 | 2.50 / 10 | 4,096 | 0.51 s | 143 t/s
Claude 3 Haiku | 200,000 | Apr 2024 | n/a | 4,096 | n/a | n/a
Claude 3.5 Sonnet | 200,000 | Apr 2024 | 3 / 15 | 4,096 | 1.22 s | 78 t/s
Claude 3 Opus | 200,000 | Aug 2023 | n/a | 4,096 | n/a | n/a
GPT-4 | 8,192 | Dec 2023 | n/a | 4,096 | n/a | n/a
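When comparing models from the table, separate input and output prices are easiest to rank once blended into a single number per model. A minimal sketch, assuming a hypothetical 3:1 input-to-output token mix (the mix ratio is an assumption for illustration, not a figure from the table):

```python
def blended_price(usd_in: float, usd_out: float, input_share: float = 0.75) -> float:
    """Blended USD per 1M tokens, weighting input vs. output prices.

    input_share is the assumed fraction of tokens that are input (here 3:1).
    """
    return usd_in * input_share + usd_out * (1 - input_share)

# Per-1M-token prices from the comparison table above.
models = {
    "Gemma 3 27b": (0.07, 0.07),
    "DeepSeek V3 0324": (0.27, 1.10),
    "GPT-4o": (2.50, 10.0),
}
for name, (p_in, p_out) in sorted(models.items(),
                                  key=lambda kv: blended_price(*kv[1])):
    print(f"{name}: ${blended_price(p_in, p_out):.4f} per 1M tokens")
```

Chat-style workloads tend to be input-heavy while generation-heavy workloads skew toward output, so adjust `input_share` to match your own traffic before comparing.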
Standard Benchmarks