Private AI
Llama-3.3-Nemotron-Super-49B-v1 offers a strong tradeoff between model accuracy and efficiency. Efficiency (throughput) translates directly into cost savings. A novel Neural Architecture Search (NAS) approach greatly reduces the model's memory footprint, which enables larger workloads and lets the model fit on a single GPU (H200) even at high workloads. The NAS approach also makes it possible to select a desired point in the accuracy-efficiency tradeoff. For more information on the NAS approach, please refer to this paper.
Added Aug 8, 2025
Context Window
128.0K
Max Output
16.4K
Input Price (Auto)
$0.15/1M
Output Price (Auto)
$0.15/1M
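As a rough illustration of how the per-token prices above translate into request cost, here is a minimal sketch. The function name `estimate_cost_usd` and the example token counts are hypothetical. Only the two $0.15/1M rates come from the listing, and actual billing may differ.

```python
# Rates copied from the listing above: USD per 1M tokens (auto routing).
INPUT_PRICE_PER_M = 0.15
OUTPUT_PRICE_PER_M = 0.15


def estimate_cost_usd(input_tokens: int, output_tokens: int) -> float:
    """Hypothetical helper: estimate the cost of one request in USD."""
    total = input_tokens * INPUT_PRICE_PER_M + output_tokens * OUTPUT_PRICE_PER_M
    return total / 1_000_000


# Example with assumed token counts: a 100,000-token prompt and a 4,000-token reply.
cost = estimate_cost_usd(100_000, 4_000)
print(f"Estimated cost: ${cost:.4f}")
```

Because the input and output rates are identical, the estimate depends only on the total number of tokens processed.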
Performance metrics and benchmarks
Sourced from Artificial Analysis.
Intelligence Index
8.5
Auto routing is available for this model. Explicit provider selection is not available.
GPQA Diamond
Graduate-level scientific reasoning
51.7%
Better than 31% of models compared
HLE
Humanity's Last Exam
3.5%
Better than 4% of models compared
IFBench
Instruction-following benchmark
39.5%
Better than 38% of models compared
AA-LCR
Long context reasoning evaluation
11.3%
Better than 24% of models compared
SciCode
Python programming for scientific computing
22.9%
Better than 27% of models compared
Terminal-Bench Hard
Agentic coding and terminal use
0.0%
AIME 2025
American Invitational Mathematics Examination 2025
7.7%
Better than 12% of models compared
AIME
American Invitational Mathematics Examination
19.3%
Better than 46% of models compared
MMLU-Pro
Professional and academic subject knowledge
69.8%
Better than 36% of models compared
Last updated Jun 24, 2026
Artificial Analysis
Better than 6% of models compared
LiveCodeBench
Contamination-free coding benchmark
28.0%
Better than 31% of models compared
Math-500
Diverse mathematical problem solving benchmark
77.5%
Better than 42% of models compared