Anthropic just released Claude Opus 4.6, their most capable model to date. It's already live on NanoGPT — both in the app and via API — with full support for thinking modes, adaptive effort levels, and prompt caching.
Here's what you need to know.
## What's new in Opus 4.6
Opus 4.6 is a meaningful jump from Opus 4.5. The headline improvements:
- 1 million token context window (beta) — the first time an Opus-class model supports this. Ideal for working with large codebases, long documents, or extended conversations.
- 128k token output — more than enough for long-form generation, detailed analysis, or multi-file code changes.
- Better coding — higher scores on Terminal-Bench 2.0 (65.4% vs 59.8% for Opus 4.5), improved planning, better code review, and stronger debugging in large codebases.
- Stronger reasoning — leads all frontier models on Humanity's Last Exam and scores 68.8% on ARC AGI 2 (up from 37.6% for Opus 4.5).
- Better long-context retrieval — 76% on the 8-needle 1M variant of MRCR v2, compared to 18.5% for Sonnet 4.5. When you put a lot of context in, it actually uses it.
## Available models on NanoGPT
We support Opus 4.6 in several configurations:
| Model ID | Description |
|---|---|
| `claude-opus-4-6` | Standard (non-thinking) |
| `claude-opus-4-6-thinking` | Thinking enabled, high effort (default) |
| `claude-opus-4-6-thinking:low` | Thinking with low effort |
| `claude-opus-4-6-thinking:medium` | Thinking with medium effort |
| `claude-opus-4-6-thinking:max` | Thinking with maximum effort |
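Assuming NanoGPT exposes an OpenAI-compatible chat completions endpoint (the request shape below is a sketch, not a guaranteed contract — check the NanoGPT API docs for the exact base URL and fields), a request using one of these model IDs might look like:

```python
# Sketch: building a chat-completions-style request for Opus 4.6.
# The payload shape is an assumption based on OpenAI-compatible APIs;
# the model IDs come from the table above.

def build_request(model: str, prompt: str, max_tokens: int = 1024) -> dict:
    """Return a chat-completions-style payload for the given model ID."""
    return {
        "model": model,
        "max_tokens": max_tokens,
        "messages": [{"role": "user", "content": prompt}],
    }

payload = build_request(
    "claude-opus-4-6-thinking",
    "Review this function for bugs: ...",
)
# POST this payload, with your API key in the Authorization header,
# to the NanoGPT chat completions endpoint.
```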
## Thinking and adaptive effort
Opus 4.6 introduces adaptive thinking — the model decides on its own how much reasoning effort a task needs. Instead of setting a fixed thinking token budget like previous Claude thinking models, you select an effort level and the model allocates thinking depth accordingly.
- Low: Fast responses for simple tasks. The model skips deep reasoning when it isn't needed.
- Medium: A balanced mode for everyday questions and moderate complexity.
- High (default): Solid reasoning for most tasks — a good default for anything non-trivial.
- Max: Full reasoning depth for the hardest problems. Use this for math, complex code architecture, or multi-step analysis where you want the model to really think things through.
The non-thinking variant (claude-opus-4-6) skips extended thinking entirely, giving you faster responses at lower cost for straightforward tasks.
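Since effort is selected through the model ID suffix rather than a token budget parameter, a tiny helper (hypothetical, but matching the table in this post) makes the choice explicit in code:

```python
# Map a desired reasoning effort to the corresponding NanoGPT model ID.
# "high" is the default for the thinking model, so it takes no suffix;
# "none" selects the non-thinking variant.

EFFORT_TO_MODEL = {
    "none": "claude-opus-4-6",
    "low": "claude-opus-4-6-thinking:low",
    "medium": "claude-opus-4-6-thinking:medium",
    "high": "claude-opus-4-6-thinking",
    "max": "claude-opus-4-6-thinking:max",
}

def model_for_effort(effort: str) -> str:
    """Return the model ID for an effort level; raises KeyError if unknown."""
    return EFFORT_TO_MODEL[effort]
```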
## Prompt caching is supported
Claude prompt caching works with Opus 4.6 on NanoGPT. If you're sending repeated context (system prompts, long documents, reference code), caching lets you avoid re-processing those tokens on every request. This significantly reduces cost and latency for workflows with shared context.
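In Anthropic's native Messages API, cacheable prefixes are marked with `cache_control` breakpoints on content blocks. Assuming NanoGPT forwards this field as-is (an assumption — verify against the NanoGPT docs), a cached system prompt might be structured like this sketch:

```python
# Sketch of Anthropic-style prompt caching: the cache_control breakpoint
# marks everything up to and including that block as a cacheable prefix.
# Whether NanoGPT forwards this field verbatim is an assumption.

LONG_REFERENCE_DOC = "..."  # e.g. a large document or codebase you reuse

payload = {
    "model": "claude-opus-4-6-thinking",
    "max_tokens": 2048,
    "system": [
        {
            "type": "text",
            "text": LONG_REFERENCE_DOC,
            # Cache the shared prefix so repeat requests skip re-processing it.
            "cache_control": {"type": "ephemeral"},
        }
    ],
    "messages": [{"role": "user", "content": "Summarize section 3."}],
}
```

On subsequent requests that share the same prefix, only the tokens after the breakpoint are processed at full price.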
## What Opus 4.6 is great for
- Complex coding tasks — multi-file refactors, debugging tricky issues, code review across large repos, writing and fixing tests. This is where the model shines brightest.
- Long-context work — analyzing lengthy documents, working through entire codebases, or maintaining context across long conversations. The 1M token context window makes this viable for the first time on an Opus-class model.
- Research and analysis — financial analysis, legal reasoning (90.2% on BigLaw Bench), multi-step information synthesis.
- Agentic workflows — tasks that require sustained autonomous work over many steps. The model stays productive over longer sessions and handles ambiguous problems with better judgment.
- Hard reasoning problems — math, logic, science questions, anything where extended thinking pays off.
## What it's less ideal for
- Quick, simple questions — if you're asking something straightforward, Opus 4.6 with thinking enabled can overthink it, adding cost and latency for no real benefit. Use the non-thinking variant or a lighter model like Sonnet 4.5 for these.
- High-volume, low-complexity workloads — for tasks like classification, extraction, or simple formatting where speed and cost matter more than reasoning depth, smaller models are more efficient.
- When you need the fastest response time — Opus is a large model. If latency is your top priority and the task doesn't need deep reasoning, Sonnet or Haiku will get you an answer faster.
## Pricing
Same as previous Opus models: $5 per million input tokens, $25 per million output tokens. For prompts exceeding 200k tokens, extended context pricing applies: $10 per million input tokens and $37.50 per million output tokens. Thinking tokens are billed as output tokens.
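To make the tiers concrete, here's a small cost sketch. It assumes the extended-context rates apply to the entire request once the prompt crosses 200k tokens (that threshold behavior is an assumption — confirm the exact billing rules in the docs):

```python
# Estimate request cost in USD from the rates above. Thinking tokens
# count as output tokens. Assumption: the extended-context rates apply
# to the whole request once input exceeds 200k tokens.

STANDARD = (5.00, 25.00)   # $ per million (input, output) tokens
EXTENDED = (10.00, 37.50)  # rates for prompts over 200k input tokens

def estimate_cost(input_tokens: int, output_tokens: int) -> float:
    in_rate, out_rate = EXTENDED if input_tokens > 200_000 else STANDARD
    return input_tokens / 1e6 * in_rate + output_tokens / 1e6 * out_rate

print(estimate_cost(100_000, 10_000))  # standard rates: 0.75
print(estimate_cost(300_000, 10_000))  # extended rates: 3.375
```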
## Try it now
Opus 4.6 is available right now on NanoGPT. Use it in the app by selecting it from the model dropdown, or hit the API with any of the model IDs listed above.