Grok 4 Fast is xAI’s latest advancement in cost-efficient reasoning. Built on learnings from Grok 4, it blends reasoning and non-reasoning modes in a single model with a 2M-token context window and state-of-the-art cost efficiency. Content-policy rejections can still be charged: xAI may pass through a $0.05 moderation-failure fee or a $0.055 usage-guidelines-violation fee, depending on which rejection the upstream provider returns.
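For context on how a listing like this is typically consumed, here is a minimal sketch of a chat-completions call through an OpenAI-compatible client; the base URL, API-key environment variable, and model slug are illustrative assumptions, not values confirmed by this page.

# Minimal sketch of calling Grok 4 Fast via an OpenAI-compatible endpoint.
# Assumptions (not confirmed by this page): the base_url, the API-key
# environment variable, and the "x-ai/grok-4-fast" model slug.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",   # assumed routing endpoint
    api_key=os.environ["OPENROUTER_API_KEY"],  # assumed env var name
)

response = client.chat.completions.create(
    model="x-ai/grok-4-fast",  # assumed slug for this listing
    messages=[{"role": "user", "content": "Give a one-paragraph summary of context windows."}],
)
print(response.choices[0].message.content)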
Added Sep 20, 2025
Context Window
2.0M
Max Output
131.1K
Input Price (Auto)
$0.20/1M
Output Price (Auto)
$0.50/1M
Auto routing is available for this model; explicit provider selection is not available.
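To make the listed Auto pricing concrete, a small worked example of the cost arithmetic (the request sizes are hypothetical):

# Worked cost estimate at the listed Auto rates:
# $0.20 per 1M input tokens, $0.50 per 1M output tokens.
INPUT_PRICE_PER_M = 0.20
OUTPUT_PRICE_PER_M = 0.50

def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    return (input_tokens / 1_000_000) * INPUT_PRICE_PER_M \
         + (output_tokens / 1_000_000) * OUTPUT_PRICE_PER_M

# Hypothetical request: 50,000 prompt tokens and 2,000 completion tokens.
# 0.05 * $0.20 + 0.002 * $0.50 = $0.010 + $0.001 = $0.011
print(f"${estimate_cost_usd(50_000, 2_000):.3f}")  # -> $0.011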
Capabilities
Performance metrics and benchmarks
Sourced from Artificial Analysis.
Intelligence Index
23.1
Coding Index
19.0
Agentic Index
32.0
GPQA Diamond
Graduate-level scientific reasoning
60.6%
Better than 49% of models compared
HLE
Humanity's Last Exam
5.0%
Better than 41% of models compared
IFBench
Instruction-following benchmark
37.7%
Better than 38% of models compared
τ²-Bench Telecom
Conversational AI agents in dual-control scenarios
63.7%
Better than 66% of models compared
AA-LCR
Long context reasoning evaluation
20.0%
Better than 37% of models compared
GDPval-AA
Economically valuable tasks
13.9%
Better than 66% of models compared
CritPt
Research-level physics reasoning
0.0%
Better than 36% of models compared
SciCode
Python programming for scientific computing
32.9%
Better than 58% of models compared
Terminal-Bench Hard
Agentic coding and terminal use
12.1%
Better than 54% of models compared
AIME 2025
American Invitational Mathematics Examination 2025
41.3%
Better than 43% of models compared
MMLU-Pro
Professional and academic subject knowledge
73.0%
Better than 43% of models compared
AA-Omniscience Accuracy
Proportion of correctly answered questions
15.9%
Better than 35% of models compared
Last updated May 15, 2026
LiveCodeBench
Contamination-free coding benchmark
40.1%
Better than 47% of models compared
AA-Omniscience Hallucination Rate
Rate of incorrect answers among non-correct responses
78.2%
Better than 63% of models compared
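As a reading aid for the hallucination-rate metric above, a small sketch assuming "non-correct responses" means incorrect answers plus abstentions (my interpretation of the description, not an official formula; the counts are hypothetical):

# Sketch of a hallucination-rate calculation: the share of non-correct
# responses that are wrong answers rather than abstentions. The split of
# non-correct responses into "incorrect" and "abstained" is an assumption.
def hallucination_rate(incorrect: int, abstained: int) -> float:
    non_correct = incorrect + abstained
    return incorrect / non_correct if non_correct else 0.0

# Hypothetical counts: 782 wrong answers, 218 abstentions -> 0.782 (78.2%)
print(hallucination_rate(782, 218))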