Experimental release of Qwen's reasoning model. Strong at coding and math, but still in development, so it may exhibit unexpected bugs. Not production-ready.
Added Feb 27, 2025
Context Window
32.8K
Max Output
32.8K
Input Price (Auto)
$0.20/1M
Output Price (Auto)
$0.20/1M
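As a rough illustration of the per-token pricing listed above, the following minimal sketch estimates the cost of a single request. The function name `estimate_cost_usd` and the constant names are hypothetical, and the sketch assumes the listed $0.20 per 1M tokens applies linearly to both input and output with no minimum charge or provider-specific surcharge.

```python
# Prices taken from the listing: $0.20 per 1M tokens for input and for output.
# Assumption: billing is strictly linear in token count.
INPUT_PRICE_PER_M = 0.20
OUTPUT_PRICE_PER_M = 0.20


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated request cost in US dollars (hypothetical helper)."""
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000
```

Under these assumptions, a request with 10,000 input tokens and 2,000 output tokens comes to 12,000 × $0.20 / 1,000,000 = $0.0024.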
Performance metrics and benchmarks
Sourced from Artificial Analysis.
Intelligence Index
19.7
Auto routing is available for this model. Explicit provider selection is not available.
GPQA Diamond
Graduate-level scientific reasoning
59.3%
Better than 46% of models compared
HLE
Humanity's Last Exam
8.2%
Better than 66% of models compared
IFBench
Instruction-following benchmark
38.8%
Better than 42% of models compared
AA-LCR
Long context reasoning evaluation
25.0%
Better than 44% of models compared
SciCode
Python programming for scientific computing
35.8%
Better than 66% of models compared
LiveCodeBench
Contamination-free coding benchmark
63.1%
Better than 69% of models compared
AIME 2025
American Invitational Mathematics Examination 2025
29.0%
Better than 30% of models compared
AIME
American Invitational Mathematics Examination
78.0%
Better than 87% of models compared
MMLU-Pro
Professional and academic subject knowledge
76.4%
Better than 54% of models compared
Last updated May 15, 2026
Math-500
Diverse mathematical problem solving benchmark
95.7%
Better than 81% of models compared