A 104B-parameter model that performs conversational language tasks with higher quality, greater reliability, and a longer context than previous models. It can be used for complex workflows such as code generation, retrieval-augmented generation (RAG), tool use, and agents.
Context Window: 128.0K tokens
Max Output: 4.1K tokens
Input Price (Auto): $2.86 per 1M tokens
Output Price (Auto): $14.25 per 1M tokens
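As a rough illustration of how the listed per-million-token prices translate into a per-request cost, here is a minimal sketch. The function name `estimate_cost` and the example token counts are hypothetical, and the sketch assumes billing is strictly linear in token count at the Auto-routing prices shown on this page; actual billing may differ.

```python
from decimal import Decimal

# Prices copied from the listing above (USD per 1M tokens, Auto routing).
INPUT_PRICE_PER_M = Decimal("2.86")
OUTPUT_PRICE_PER_M = Decimal("14.25")
TOKENS_PER_M = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumes cost scales linearly with token counts, with no minimum
    charge, caching discount, or rounding applied by the provider.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = (
        Decimal(input_tokens) * INPUT_PRICE_PER_M
        + Decimal(output_tokens) * OUTPUT_PRICE_PER_M
    )
    return total / TOKENS_PER_M


if __name__ == "__main__":
    # Example: a long prompt of 100,000 tokens with a 4,000-token reply.
    cost = estimate_cost(100_000, 4_000)
    print(f"Estimated cost: ${cost:.4f}")
```

`Decimal` is used instead of `float` so that small per-request amounts add up without binary rounding drift when many requests are summed.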
Performance metrics and benchmarks
Sourced from Artificial Analysis.
Intelligence Index: 8.3
Auto routing is available for this model. Explicit provider selection is not available.
GPQA Diamond (graduate-level scientific reasoning): 32.3%, better than 12% of models compared
HLE (Humanity's Last Exam): 4.5%, better than 29% of models compared
SciCode (Python programming for scientific computing): 11.8%, better than 14% of models compared
LiveCodeBench (contamination-free coding benchmark): 12.2%, better than 11% of models compared
AIME (American Invitational Mathematics Examination): 0.7%, better than 9% of models compared
Math-500 (diverse mathematical problem-solving benchmark): 27.9%, better than 4% of models compared
MMLU-Pro (professional and academic subject knowledge): 43.2%, better than 10% of models compared
Last updated May 15, 2026