OpenHands + Devstral is 100% local and 100% open, and is SOTA for its category on SWE-Bench Verified: 46.8% accuracy.
Added Aug 2, 2025
Context Window
32.8K
Max Output
8.2K
Input Price (Auto)
$0.060/1M
Output Price (Auto)
$0.060/1M
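Because input and output are billed at the same per-token rate, per-request cost is easy to estimate from token counts. A minimal sketch, using the listed Auto-route prices; the token counts are hypothetical and the function name is illustrative, not part of any API:

```python
# Estimate the dollar cost of one request at the listed Auto-route prices.
INPUT_PRICE_PER_M = 0.060   # $/1M input tokens (from the listing above)
OUTPUT_PRICE_PER_M = 0.060  # $/1M output tokens (from the listing above)

def request_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the cost in dollars of a single request."""
    return (input_tokens * INPUT_PRICE_PER_M
            + output_tokens * OUTPUT_PRICE_PER_M) / 1_000_000

# Worst case: a request filling the 32.8K context with the 8.2K max output.
print(f"${request_cost(32_800, 8_200):.6f}")  # → $0.002460
```

Even a maximal request costs well under a cent at these rates.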
Performance metrics and benchmarks
Sourced from Artificial Analysis.
Intelligence Index
18.0
Auto routing is available for this model. Explicit provider selection is not available.
Coding Index
12.2
Agentic Index
25.1
GPQA Diamond
Graduate-level scientific reasoning
43.4%
Better than 27% of models compared
HLE
Humanity's Last Exam
4.0%
Better than 15% of models compared
IFBench
Instruction-following benchmark
31.6%
Better than 22% of models compared
τ²-Bench Telecom
Conversational AI agents in dual-control scenarios
38.0%
Better than 54% of models compared
AA-LCR
Long context reasoning evaluation
26.7%
Better than 46% of models compared
GDPval-AA
Economically valuable tasks
16.7%
Better than 72% of models compared
CritPt
Research-level physics reasoning
0.0%
Better than 36% of models compared
SciCode
Python programming for scientific computing
24.5%
Better than 36% of models compared
Terminal-Bench Hard
Agentic coding and terminal use
6.1%
Better than 38% of models compared
AIME
American Invitational Mathematics Examination
6.7%
Better than 24% of models compared
Math-500
Diverse mathematical problem solving benchmark
68.4%
Better than 25% of models compared
MMLU-Pro
Professional and academic subject knowledge
63.2%
Better than 24% of models compared
AA-Omniscience Accuracy
Proportion of correctly answered questions
15.9%
Better than 35% of models compared
Last updated May 15, 2026
LiveCodeBench
Contamination-free coding benchmark
25.8%
Better than 27% of models compared
AA-Omniscience Hallucination Rate
Share of incorrect answers among all non-correct responses (i.e., wrong answers rather than abstentions)
86.6%
Better than 33% of models compared
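Assuming the hallucination rate is defined as incorrect answers divided by all non-correct responses (accuracy and hallucination rate taken from the listing above), the implied split between wrong answers and abstentions can be derived:

```python
# Derive the implied response breakdown from the two AA-Omniscience figures.
accuracy = 0.159            # AA-Omniscience Accuracy (correct answers)
hallucination_rate = 0.866  # incorrect / (incorrect + abstained)

non_correct = 1 - accuracy                     # wrong or abstained
incorrect = hallucination_rate * non_correct   # confidently wrong answers
abstained = non_correct - incorrect            # declined to answer

print(f"incorrect: {incorrect:.1%}, abstained: {abstained:.1%}")
# → incorrect: 72.8%, abstained: 11.3%
```

Under this reading, the model answers wrongly far more often than it abstains, which is what the high hallucination rate captures despite the modest accuracy gap.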