Z.ai's native multimodal agent model for vision-based coding and agent workflows. This is the standard non-thinking variant for image, video, and text inputs, tuned for perceive-plan-execute loops, complex coding, and tool-driven task execution. Not included in the subscription.
Added Apr 1, 2026
Context Window: 202.8K tokens
Max Output: 131.1K tokens
Input Price (Auto): $1.20 per 1M tokens
Output Price (Auto): $4.00 per 1M tokens
Cache Read (Auto): $0.24 per 1M tokens
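As a rough illustration of how the rates listed above combine, the sketch below estimates the cost of a single request. The function name `estimate_cost` and its parameters are hypothetical, and it assumes that cached tokens are a subset of the input tokens and are billed at the cache-read rate instead of the input rate; the listing does not spell out the billing rules, so treat this as an approximation rather than the provider's actual formula.

```python
from decimal import Decimal

# Rates in USD per 1M tokens, copied from the listing above.
INPUT_PER_M = Decimal("1.20")
OUTPUT_PER_M = Decimal("4.00")
CACHE_READ_PER_M = Decimal("0.24")
MILLION = Decimal(1_000_000)


def estimate_cost(input_tokens: int, output_tokens: int, cached_tokens: int = 0) -> Decimal:
    """Estimate the USD cost of one request (hypothetical helper).

    Assumption: cached_tokens is the part of input_tokens served from
    cache, billed at the cache-read rate instead of the input rate.
    """
    if min(input_tokens, output_tokens, cached_tokens) < 0:
        raise ValueError("token counts must be non-negative")
    if cached_tokens > input_tokens:
        raise ValueError("cached_tokens cannot exceed input_tokens")

    fresh_tokens = input_tokens - cached_tokens
    total = (
        fresh_tokens * INPUT_PER_M
        + cached_tokens * CACHE_READ_PER_M
        + output_tokens * OUTPUT_PER_M
    )
    return total / MILLION


if __name__ == "__main__":
    # 100K input tokens (half of them cached) and 10K output tokens.
    print(estimate_cost(100_000, 10_000, cached_tokens=50_000))
```

Under these assumptions, a request with 100K uncached input tokens and 10K output tokens comes to $0.16, and serving half of that input from cache lowers it to $0.112. `Decimal` is used so that the per-token arithmetic does not pick up binary floating-point rounding.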
Performance metrics and benchmarks
Sourced from Artificial Analysis.
Intelligence Index: 46.8
Auto routing is available for this model. Explicit provider selection is not available.
Coding Index: 36.8
GPQA Diamond (graduate-level scientific reasoning): 84.7%, better than 91% of models compared
HLE (Humanity's Last Exam): 25.4%, better than 92% of models compared
IFBench (instruction-following benchmark): 73.2%, better than 93% of models compared
T²-Bench Telecom (conversational AI agents in dual-control scenarios): 98.5%, better than 99% of models compared
AA-LCR (long context reasoning evaluation): 60.7%, better than 82% of models compared
SciCode (Python programming for scientific computing): 43.6%, better than 91% of models compared
Terminal-Bench Hard (agentic coding and terminal use): 33.3%, better than 85% of models compared
Last updated May 15, 2026