Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities and better roleplaying, reasoning, multi-turn conversation, and long-context coherence. This 70B model is a competitive finetune of Llama-3.1-70B, focused on aligning the model to the user and giving the user powerful steering capabilities.
Added Jan 7, 2026
Context Window
65.5K
Max Output
8.2K
Input Price (Auto)
$0.31/1M
Output Price (Auto)
$0.31/1M
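As a rough illustration of the per-token pricing above, the sketch below estimates the cost of a single request. The function name `estimate_cost` and the example token counts are assumptions made for illustration; only the $0.31 per 1M token rates come from this listing, and actual billing may differ by provider.

```python
# Listed "Auto" prices in USD per 1 million tokens (from this page).
INPUT_PRICE_PER_MILLION = 0.31
OUTPUT_PRICE_PER_MILLION = 0.31


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Return the estimated cost in USD for one request.

    Hypothetical helper: it simply scales each token count by the
    listed per-million rate and ignores any provider-specific fees.
    """
    if input_tokens < 0 or output_tokens < 0:
        raise ValueError("token counts must be non-negative")
    input_cost = input_tokens * INPUT_PRICE_PER_MILLION / 1_000_000
    output_cost = output_tokens * OUTPUT_PRICE_PER_MILLION / 1_000_000
    return input_cost + output_cost


# Example with made-up token counts: a 50,000-token prompt and a
# 2,000-token reply.
example_cost = estimate_cost(50_000, 2_000)
```

Because the input and output rates are the same here, the cost depends only on the total token count; the two rates are kept separate so the sketch still works if they diverge.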
Performance metrics and benchmarks
Sourced from Artificial Analysis.
Intelligence Index
10.6
Agentic Index
10.0
GPQA Diamond
Graduate-level scientific reasoning
40.1%
Better than 21% of models compared
HLE
Humanity's Last Exam
4.1%
Better than 18% of models compared
GDPval-AA
Economically valuable tasks
1.1%
Better than 42% of models compared
CritPt
Research-level physics reasoning
0.0%
Better than 36% of models compared
SciCode
Python programming for scientific computing
23.1%
Better than 32% of models compared
LiveCodeBench
Contamination-free coding benchmark
18.8%
Better than 20% of models compared
AIME
American Invitational Mathematics Examination
2.3%
Better than 13% of models compared
Math-500
Diverse mathematical problem solving benchmark
53.8%
Better than 17% of models compared
MMLU-Pro
Professional and academic subject knowledge
57.1%
Better than 19% of models compared
AA-Omniscience Accuracy
Proportion of correctly answered questions
18.2%
Better than 52% of models compared
Last updated May 15, 2026
AA-Omniscience Hallucination Rate
Rate of incorrect answers among non-correct responses
80.1%
Better than 59% of models compared
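To make the hallucination-rate definition above concrete, here is a minimal sketch that computes "incorrect answers among non-correct responses". It assumes every non-correct response is either an incorrect answer or an abstention; the function name and the counts are hypothetical and are not the data behind the figures on this page.

```python
def hallucination_rate(incorrect: int, abstained: int) -> float:
    """Share of non-correct responses that were wrong answers.

    Assumption: a non-correct response is either an incorrect answer
    or an abstention (the model declined to answer).
    """
    non_correct = incorrect + abstained
    if non_correct == 0:
        raise ValueError("no non-correct responses to evaluate")
    return incorrect / non_correct


# Hypothetical counts, for illustration only.
rate = hallucination_rate(incorrect=3, abstained=1)  # 3 / 4 = 0.75
```

Under this definition a lower rate is better: a model that abstains when unsure scores lower than one that answers incorrectly.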