Private AI
Mistral Medium 3 delivers frontier performance while being an order of magnitude less expensive. For instance, the model scores at or above 90% of Claude Sonnet 3.7's results on benchmarks across the board, at a significantly lower cost. On performance, Mistral Medium 3 also surpasses leading open models such as Llama 4 Maverick and enterprise models such as Cohere Command A. On pricing, the model beats cost leaders such as DeepSeek v3 in both API and self-deployed settings.
Added Sep 25, 2025
Context Window
131.1K
Max Output
32.8K
Input Price (Auto)
$0.40/1M
Output Price (Auto)
$2.00/1M
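The per-token prices above translate into a per-request cost with simple arithmetic. The following is a minimal sketch in Python, assuming the listed rates apply uniformly to every token; the function name estimate_cost and the token counts are hypothetical and chosen only for illustration, and actual billing may differ.

```python
# Prices copied from the listing above, in USD per 1M tokens.
INPUT_PRICE_PER_M = 0.40
OUTPUT_PRICE_PER_M = 2.00


def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    """Estimate the USD cost of one request (hypothetical helper)."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Illustrative request: 10,000 input tokens and 2,000 output tokens.
cost = estimate_cost(10_000, 2_000)
print(f"${cost:.4f}")  # prints "$0.0080"
```

In this example the input and output sides each contribute $0.004, which shows how the 5x higher output rate makes a short completion cost as much as a much longer prompt.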
Performance metrics and benchmarks
Sourced from Artificial Analysis.
Intelligence Index
12.5
Auto routing is available for this model. Explicit provider selection is not available.
GPQA Diamond
Graduate-level scientific reasoning
57.8%
Better than 38% of models compared
HLE
Humanity's Last Exam
4.3%
Better than 20% of models compared
IFBench
Instruction-following benchmark
39.3%
Better than 38% of models compared
τ²-Bench Telecom
Conversational AI agents in dual-control scenarios
24.3%
Better than 28% of models compared
AA-LCR
Long context reasoning evaluation
28.0%
Better than 42% of models compared
SciCode
Python programming for scientific computing
33.1%
Better than 52% of models compared
Terminal-Bench Hard
Agentic coding and terminal use
3.8%
AIME 2025
American Invitational Mathematics Examination 2025
30.3%
Better than 32% of models compared
AIME
American Invitational Mathematics Examination
44.0%
Better than 66% of models compared
MMLU-Pro
Professional and academic subject knowledge
76.0%
Better than 52% of models compared
Last updated Jun 25, 2026
Artificial Analysis
Better than 26% of models compared
LiveCodeBench
Contamination-free coding benchmark
40.0%
Better than 47% of models compared
Math-500
Diverse mathematical problem solving benchmark
90.7%
Better than 65% of models compared