Llama 4 Maverick, a 17-billion-active-parameter model with 128 experts, is the best multimodal model in its class, beating GPT-4o and Gemini 2.0 Flash across a broad range of widely reported benchmarks, while achieving results comparable to the new DeepSeek v3 on reasoning and coding with less than half the active parameters. Llama 4 Maverick offers a best-in-class performance-to-cost ratio, with an experimental chat version scoring an Elo of 1417 on LMArena.
Added Sep 5, 2025
Context Window: 1.0M tokens
Max Output: 65.5K tokens
Input Price (Auto): $0.16/1M tokens
Output Price (Auto): $0.63/1M tokens
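As a quick sketch of what the listed Auto rates mean in practice, the per-request cost is just a token-weighted sum of the two prices. The rates below are copied from this page; the function name and example token counts are illustrative, not part of any API.

```python
# Auto routing rates listed above (USD per 1M tokens)
INPUT_PRICE_PER_M = 0.16
OUTPUT_PRICE_PER_M = 0.63

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimated USD cost of a single request at the listed rates."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# e.g. a 50k-token prompt producing a 2k-token reply costs under a cent
cost = request_cost(50_000, 2_000)  # ≈ $0.00926
```

Note that output tokens are roughly 4x the price of input tokens, so long generations dominate the bill even for large prompts.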
Capabilities
Performance metrics and benchmarks, sourced from Artificial Analysis.
Intelligence Index: 18.4
Coding Index: 15.6
GPQA Diamond (graduate-level scientific reasoning): 67.1%, better than 57% of models compared
HLE (Humanity's Last Exam): 4.8%, better than 36% of models compared
IFBench (instruction-following benchmark): 43.0%, better than 55% of models compared
τ²-Bench Telecom (conversational AI agents in dual-control scenarios): 17.8%, better than 19% of models compared
AA-LCR (long-context reasoning evaluation): 46.0%, better than 64% of models compared
SciCode (Python programming for scientific computing): 33.1%, better than 59% of models compared
Terminal-Bench Hard (agentic coding and terminal use): 6.8%, better than 43% of models compared
AIME 2025 (American Invitational Mathematics Examination 2025): 19.3%, better than 22% of models compared
AIME (American Invitational Mathematics Examination): 39.0%, better than 63% of models compared
MMLU-Pro (professional and academic subject knowledge): 80.9%, better than 72% of models compared
LiveCodeBench (contamination-free coding benchmark): 39.7%, better than 46% of models compared
Math-500 (diverse mathematical problem-solving benchmark): 88.9%, better than 61% of models compared
Last updated May 15, 2026