Benchmarks

Top models on a combined benchmark, plus benchmark categories from Artificial Analysis, LMArena, LiveBench, FrontierCode, Epoch AI, ARC Prize, EQ-Bench, Design Arena, and NanoGPT.

Combined

Equal-weight blend of four sources: Artificial Analysis Intelligence Index, LMArena Overall, LiveBench Overall, and NanoGPT Usage Share. Each source is min-max normalized to 0-100 across its current leaderboard and weighted at 25%. A model that is missing or unavailable in a source scores 0 for that source.
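The blend described above can be sketched roughly as follows. This is a minimal illustration, not the site's actual implementation: the function names, the source keys, and the handling of a leaderboard where every score is identical are all assumptions, and the numbers in the usage note are made up.

```python
from typing import Dict


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one leaderboard so its lowest score maps to 0 and its highest to 100."""
    if not scores:
        return {}
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # Assumption: the source does not say what happens when all scores tie.
        return {model: 100.0 for model in scores}
    return {model: 100.0 * (value - lo) / (hi - lo) for model, value in scores.items()}


def combined_scores(leaderboards: Dict[str, Dict[str, float]]) -> Dict[str, float]:
    """Equal-weight blend of normalized leaderboards; a missing entry contributes 0."""
    if not leaderboards:
        return {}
    weight = 1.0 / len(leaderboards)  # 25% each when there are four sources
    normalized = {src: min_max_normalize(board) for src, board in leaderboards.items()}
    models = set().union(*(board.keys() for board in leaderboards.values()))
    return {
        model: sum(weight * normalized[src].get(model, 0.0) for src in normalized)
        for model in models
    }
```

For example, with four hypothetical sources where model "m2" scores mid-range on the first, tops the second, and is absent from one of the remaining two where it ranks last, `combined_scores` gives it 0.25 * 50 + 0.25 * 100 + 0 + 0 = 37.5. One consequence of treating missing entries as 0 is that a model absent from a source is penalized exactly as if it ranked last there.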

Top 20 price vs performance

X-axis: price in $ per million blended tokens

Best value frontier: models for which no cheaper model has a better benchmark result.
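The frontier rule can be read literally as a pairwise check. The sketch below is one way to express it, assuming each model has a single price and a single score; the `Model` type and `best_value_frontier` name are hypothetical, and the sample data in the usage note is invented rather than taken from the chart.

```python
from typing import List, NamedTuple


class Model(NamedTuple):
    name: str
    price: float  # $ per million blended tokens
    score: float  # benchmark result, higher is better


def best_value_frontier(models: List[Model]) -> List[Model]:
    """Keep each model for which no strictly cheaper model has a strictly better score."""
    return [
        m
        for m in models
        if not any(o.price < m.price and o.score > m.score for o in models)
    ]
```

Given hypothetical models A ($1, 30), B ($2, 25), C ($5, 60), and D ($10, 50), the function keeps A and C: B is beaten by the cheaper A, and D is beaten by the cheaper C. The pairwise check is quadratic in the number of models, which is fine for a top-20 chart; sorting by price and tracking the running best score would be the usual choice for larger lists.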

1. GPT 5.5 (OpenAI): 64.0%, best value
2. Claude Opus 4.8 (Anthropic): 59.8%, best value
3. [model name missing]: 56.2%, best value
4. Claude 4.7 Opus (Anthropic): 52.4%
5. GPT 5.4 (OpenAI): 48.7%
6. Claude 4.6 Opus (Anthropic): 46.5%
7. [model name missing]: 29.6%, best value
8. Qwen3.7 (Qwen): 28.4%
10. Claude Fable 5 Xhigh Effort: 17.4%

Weighted blend of latest source snapshots

NanoGPT Composite