Private AI
Browse and discover the best AI language models for conversations, coding, and creative writing.
Nvidia Nemotron 3 Ultra 550B
Nvidia's Nemotron 3 Ultra 550B A55B model from the Nemotron 3 family. It uses a hybrid Mamba-Transformer MoE architecture. Provider-specific context limits vary, with the longest current route supporting up to 1M context.
Features
Context
1.0M
Max Output
65.5K
Date Added
Jun 4, 2026
Pricing
Input:
$0.50/1M
Output:
$2.50/1M
Est./msg:
$0.0018
Subscription
Included in subscription
Nvidia Nemotron 3 Ultra 550B Thinking
Nvidia's Nemotron 3 Ultra 550B A55B model from the Nemotron 3 family. It uses a hybrid Mamba-Transformer MoE architecture. Provider-specific context limits vary, with the longest current route supporting up to 1M context. Thinking is enabled.
Features
Context
1.0M
Max Output
65.5K
Date Added
Jun 4, 2026
Pricing
Input:
$0.50/1M
Output:
$2.50/1M
Est./msg:
$0.0018
Subscription
Included in subscription
Qwen3.6 27B TEE
Qwen3.6 27B is a dense language model from Alibaba's Qwen team with text and image input, configurable thinking/reasoning behavior, and a native 262K context window. It runs inside a TEE (Trusted Execution Environment) with provider attestation support.
Benchmarks (Artificial Analysis)
Intelligence
51.8
Coding
44.9
Speed
40.3
Features
Context
262.1K
Max Output
65.5K
Date Added
Jun 4, 2026
Pricing
Input:
$0.32/1M
Output:
$2.70/1M
Est./msg:
$0.0017
Subscription
Not included in subscription
Nex N2 Pro
Nex AGI's open-source agentic reasoning model, post-trained on Qwen3.5-397B-A17B. It is built for agentic coding, software engineering, deep research, tool use, and long-horizon tasks with a 256K context window.
Features
Context
262.1K
Max Output
262.1K
Date Added
Jun 4, 2026
Performance
TPS
77.2
TTFT
6.9s
Pricing
Input:
$0.50/1M
Output:
$1.50/1M
Est./msg:
$0.0013
Subscription
Included in subscription
MiMo V2.5 Thinking
Xiaomi's MiMo V2.5 with thinking enabled. It supports deep reasoning, tool calling, structured outputs, and web search with up to 1M context.
Benchmarks (Artificial Analysis)
Intelligence
49.0
Coding
42.1
Speed
77.8
Features
Context
1.0M
Max Output
131.1K
Date Added
Jun 3, 2026
Performance
TPS
55.4
TTFT
1.9s
Pricing
Input:
$0.14/1M
Output:
$0.28/1M
Cache:
Read $0.03/1M
Est./msg:
$0.0003
Subscription
Included in subscription
MiMo V2.5 Pro Thinking
Xiaomi's MiMo V2.5 Pro with thinking enabled for coding, long-context reasoning, and agentic orchestration.
Benchmarks (Artificial Analysis)
Intelligence
53.8
Coding
45.5
Speed
44.3
Features
Context
1.0M
Max Output
131.1K
Date Added
Jun 3, 2026
Pricing
Input:
$0.44/1M
Output:
$0.87/1M
Cache:
Read $0.04/1M
Est./msg:
$0.0009
Subscription
Included in subscription
Mistral Code Agent Latest
Mistral Code Agent Latest is Mistral's direct API alias for devstral-2512, an agentic coding model built for autonomous software engineering, tool use, and long-running code tasks.
Features
Context
262.1K
Max Output
32.8K
Date Added
Jun 2, 2026
Pricing
Input:
$0.40/1M
Output:
$2.00/1M
Est./msg:
$0.0014
Subscription
Included in subscription
Mistral Code Latest
Mistral Code Latest is Mistral's direct API alias for codestral-2508, a low-latency coding model for code generation, completion, fill-in-the-middle workflows, function calling, and structured output.
Features
Context
256.0K
Max Output
32.8K
Date Added
Jun 2, 2026
Pricing
Input:
$0.30/1M
Output:
$0.90/1M
Est./msg:
$0.0007
Subscription
Not included in subscription
MiniMax M3
MiniMax M3 is the non-thinking route for MiniMax's open-weights frontier model, built for coding, agent workflows, tool use, and multimodal understanding from step zero. It keeps native thinking disabled for faster direct answers. MiniMax reports 59.0% on SWE-Bench Pro and 66.0% on Terminal Bench 2.1, with Sparse Attention designed to scale context to 1M. On NanoGPT, context is currently capped at 512K.
Benchmarks (Artificial Analysis)
Intelligence
54.7
Coding
43.4
Speed
44.5
Features
Context
512.0K
Max Output
80.0K
Date Added
Jun 1, 2026
Pricing
Input:
$0.30/1M
Output:
$1.20/1M
Est./msg:
$0.0009
Subscription
Included in subscription
MiniMax M3 Thinking
MiniMax M3 Thinking is the adaptive-thinking version of MiniMax's open-weights frontier model for coding, agent workflows, tool use, long-context tasks, and native multimodal understanding. MiniMax reports 59.0% on SWE-Bench Pro and 66.0% on Terminal Bench 2.1, with Sparse Attention designed to scale context to 1M. On NanoGPT, context is currently capped at 512K.
Features
Context
512.0K
Max Output
80.0K
Date Added
Jun 1, 2026
Pricing
Input:
$0.30/1M
Output:
$1.20/1M
Est./msg:
$0.0009
Subscription
Included in subscription
Qwen3.7 Plus
Qwen3.7 Plus is Alibaba's cost-effective Qwen 3.7 multimodal agent model for coding, tool use, productivity workflows, visual understanding, screen reading, and GUI interaction.
Benchmarks (Artificial Analysis)
Intelligence
53.3
Coding
46.5
Speed
53.3
Features
Context
991.8K
Max Output
65.5K
Date Added
Jun 1, 2026
Pricing
Input:
$0.40/1M
Output:
$1.60/1M
Cache:
Read $0.04/1M · Write $0.50/1M (5m) / $0.80/1M (1h)
Est./msg:
$0.0012
Subscription
Not included in subscription
Qwen3.7 Plus Thinking
Qwen3.7 Plus with thinking mode enabled for deeper multimodal reasoning, coding, tool use, screen reading, and productivity workflows.
Features
Context
983.6K
Max Output
65.5K
Date Added
Jun 1, 2026
Pricing
Input:
$0.40/1M
Output:
$1.60/1M
Cache:
Read $0.04/1M · Write $0.50/1M (5m) / $0.80/1M (1h)
Est./msg:
$0.0012
Subscription
Not included in subscription
Step 3.7 Flash Thinking
Step 3.7 Flash Thinking is StepFun's high-efficiency multimodal MoE model with visible reasoning enabled for deeper agentic coding, long-context reasoning, tool use, and native image/video understanding. ⚠️ Note: This model routes through StepFun, so privacy and logging guarantees may be limited.
Features
Context
256.0K
Max Output
256.0K
Date Added
May 29, 2026
Performance
TPS
171.3
TTFT
3.2s
Pricing
Input:
$0.20/1M
Output:
$1.15/1M
Est./msg:
$0.0008
Subscription
Included in subscription
M-Prometheus 14B
M-Prometheus 14B is an open multilingual LLM judge from Unbabel, fine-tuned from Qwen2.5 14B to evaluate model outputs with direct assessment, pairwise comparison, and long-form feedback.
Context
32.8K
Max Output
8.2K
Date Added
May 29, 2026
Pricing
Input:
$0.20/1M
Output:
$0.20/1M
Est./msg:
$0.0003
Subscription
Included in subscription
Claude Opus 4.8
Anthropic Claude Opus 4.8 with text, image, and file inputs and a 1M-token context window.
Features
Context
1.0M
Max Output
128.0K
Date Added
May 28, 2026
Performance
TPS
91.6
TTFT
3s
Pricing
Input:
$5.00/1M
Output:
$25.01/1M
Cache:
Read $0.50/1M · Write $6.25/1M (5m) / $10.00/1M (1h)
Est./msg:
$0.0175
Subscription
Not included in subscription
Claude Opus 4.8 Thinking
Anthropic Claude Opus 4.8 with thinking enabled.
Features
Context
1.0M
Max Output
128.0K
Date Added
May 28, 2026
Performance
TPS
128.7
TTFT
18.1s
Pricing
Input:
$5.00/1M
Output:
$25.01/1M
Cache:
Read $0.50/1M · Write $6.25/1M (5m) / $10.00/1M (1h)
Est./msg:
$0.0175
Subscription
Not included in subscription
Qwen3.5 122B A10B TEE
Qwen3.5 122B A10B is a large MoE model with 122B total parameters, 10B active parameters per token, strong coding and tool-use behavior, and a 262K token context window. It runs inside a TEE (Trusted Execution Environment) with provider attestation support.
Benchmarks (Artificial Analysis)
Intelligence
35.9
Coding
31.6
Speed
150.1
Features
Context
262.1K
Max Output
262.1K
Date Added
May 26, 2026
Pricing
Input:
$0.46/1M
Output:
$3.68/1M
Est./msg:
$0.0023
Subscription
Not included in subscription
Gemma 4 31B IT TEE
Gemma 4 31B Instruct is a 30.7B dense model with a long context window, multilingual performance, function calling, and configurable reasoning. It runs inside a TEE (Trusted Execution Environment) with provider attestation support.
Benchmarks (Artificial Analysis)
Intelligence
32.3
Coding
33.9
Speed
59.9
Features
Context
262.1K
Max Output
262.1K
Date Added
May 26, 2026
Performance
TPS
210.8
TTFT
8.2s
Pricing
Input:
$0.15/1M
Output:
$0.46/1M
Est./msg:
$0.0004
Subscription
Not included in subscription
Qwen3.6 35B A3B Uncensored TEE
Qwen3.6 35B A3B Uncensored Aggressive is a native vision-language MoE model tuned to reduce refusal behavior while preserving Qwen3.6 coding, math, and multimodal capabilities. It runs inside a TEE (Trusted Execution Environment) with provider attestation support.
Benchmarks (Artificial Analysis)
Intelligence
51.8
Coding
44.9
Speed
40.3
Features
Context
131.1K
Max Output
131.1K
Date Added
May 23, 2026
Performance
TPS
189.2
TTFT
1.8s
Pricing
Input:
$0.30/1M
Output:
$1.50/1M
Est./msg:
$0.0010
Subscription
Not included in subscription
Gemma 4 26B A4B Uncensored TEE
Gemma 4 26B A4B Uncensored Heretic is a multimodal MoE model tuned to reduce refusal behavior while preserving coding, reasoning, and function-calling strengths. It runs inside a TEE (Trusted Execution Environment) with provider attestation support.
Benchmarks (Artificial Analysis)
Intelligence
31.2
Coding
22.4
Features
Context
65.5K
Max Output
65.5K
Date Added
May 23, 2026
Performance
TPS
295.6
TTFT
5s
Pricing
Input:
$0.15/1M
Output:
$0.70/1M
Est./msg:
$0.0005
Subscription
Not included in subscription
Cohere Command A+ (05/2026)
Cohere's open-weight Command A+ model for complex agentic, multimodal, multilingual, and reasoning tasks. It is a 218B total parameter sparse MoE model with 25B active parameters, supports text and image inputs, structured outputs, and 48 languages.
Features
Context
128.0K
Max Output
64.0K
Date Added
May 22, 2026
Pricing
Input:
$2.50/1M
Output:
$10.00/1M
Est./msg:
$0.0075
Subscription
Not included in subscription
Qwen3.7 Max
Qwen3.7 Max is Alibaba's latest flagship Qwen model for agentic coding, office automation, and long-running tool workflows in non-thinking mode.
Benchmarks (Artificial Analysis)
Intelligence
56.6
Coding
50.1
Speed
140.9
Context
1.0M
Max Output
65.5K
Date Added
May 21, 2026
Pricing
Input:
$2.50/1M
Output:
$7.50/1M
Cache:
Read $0.25/1M · Write $3.13/1M (5m) / $5.00/1M (1h)
Est./msg:
$0.0063
Subscription
Not included in subscription
Qwen3.7 Max Thinking
Qwen3.7 Max with thinking mode enabled for deeper reasoning, coding, and long-running tool workflows.
Benchmarks (Artificial Analysis)
Intelligence
56.6
Coding
50.1
Speed
140.9
Features
Context
1.0M
Max Output
65.5K
Date Added
May 21, 2026
Pricing
Input:
$2.50/1M
Output:
$7.50/1M
Cache:
Read $0.25/1M · Write $3.13/1M (5m) / $5.00/1M (1h)
Est./msg:
$0.0063
Subscription
Not included in subscription
Grok Build 0.1
Grok Build 0.1 is xAI's fast coding model trained specifically for agentic software engineering workflows. It supports text and image inputs with text output, and is optimized for interactive coding agents, tool use, and multi-step development tasks. Currently in early access. Content policy rejections can still be charged: xAI may pass through a $0.05 moderation-failure fee, or a $0.055 usage-guidelines violation fee, depending on which rejection upstream returns.
Features
Context
256.0K
Max Output
256.0K
Date Added
May 20, 2026
Pricing
Input:
$1.00/1M
Output:
$2.00/1M
Est./msg:
$0.0020
Subscription
Not included in subscription
Gemini 3.5 Flash
Google's speed-focused Gemini Flash model for frontier multimodal intelligence across text, images, audio, video, PDFs, and code. Built for agentic coding, reliable tool use, structured outputs, and long-context workflows.
Features
Context
1.0M
Max Output
65.5K
Date Added
May 19, 2026
Performance
TPS
190
TTFT
1.4s
Pricing
Input:
$1.50/1M
Output:
$9.00/1M
Cache:
Read $0.15/1M
Est./msg:
$0.0060
Subscription
Not included in subscription
Gemini 3.5 Flash Thinking
Gemini 3.5 Flash with higher thinking enabled for harder reasoning, agentic coding, tool use, multimodal analysis, and long-context workflows.
Features
Context
1.0M
Max Output
65.5K
Date Added
May 19, 2026
Performance
TPS
220.4
TTFT
3.5s
Pricing
Input:
$1.50/1M
Output:
$9.00/1M
Cache:
Read $0.15/1M
Est./msg:
$0.0060
Subscription
Not included in subscription
Perceptron Mk1
Perceptron Mk1 is Perceptron's vision-language model for image and video understanding, OCR, document parsing, object detection, counting, spatial localization, and embodied visual reasoning.
Features
Context
32.8K
Max Output
8.2K
Date Added
May 12, 2026
Pricing
Input:
$0.15/1M
Output:
$1.50/1M
Est./msg:
$0.0009
Subscription
Not included in subscription
Sarvam 30B
Sarvam 30B is a 30B parameter chat completion model from Sarvam AI with multilingual support, streaming, tool calling, reasoning controls, and a 64K context window.
Benchmarks (Artificial Analysis)
Intelligence
12.3
Coding
7.9
Speed
136.7
Features
Context
65.5K
Max Output
4.1K
Date Added
May 12, 2026
Pricing
Input:
$0.03/1M
Output:
$0.11/1M
Est./msg:
$0.0001
Subscription
Not included in subscription
Sarvam 105B
Sarvam 105B is a 105B parameter chat completion model from Sarvam AI with multilingual support, streaming, tool calling, reasoning controls, and a 128K context window.
Benchmarks (Artificial Analysis)
Intelligence
18.2
Coding
9.8
Speed
98.8
Features
Context
131.1K
Max Output
4.1K
Date Added
May 12, 2026
Pricing
Input:
$0.05/1M
Output:
$0.18/1M
Est./msg:
$0.0001
Subscription
Not included in subscription
MiroThinker 1.7 Deep Research
MiroMind's flagship 235B deep research agent for multi-step research with built-in reasoning and web/tool execution.
Features
Context
262.1K
Max Output
16.4K
Date Added
May 11, 2026
Pricing
Input:
$4.00/1M
Output:
$25.00/1M
Est./msg:
$0.0165
Subscription
Not included in subscription
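The Est./msg figures in the listings above can be approximated from the per-million-token input and output prices. The page does not state how many tokens it assumes per message; the defaults below (1,000 input tokens and 500 output tokens) are an assumption inferred from the listed numbers, and the function name is hypothetical. This is a minimal sketch that ignores cache pricing and any provider-specific fees.

```python
def estimate_message_cost(input_per_million: float,
                          output_per_million: float,
                          input_tokens: int = 1000,
                          output_tokens: int = 500) -> float:
    """Approximate the cost in USD of one message.

    Prices are in USD per 1M tokens, as shown in the listings.
    The default token counts are an assumption, not a documented value.
    """
    input_cost = input_tokens * input_per_million / 1_000_000
    output_cost = output_tokens * output_per_million / 1_000_000
    return input_cost + output_cost


# Prices copied from the listings above.
cohere = estimate_message_cost(2.50, 10.00)       # Cohere Command A+
mirothinker = estimate_message_cost(4.00, 25.00)  # MiroThinker 1.7 Deep Research

print(f"{cohere:.4f}")       # 0.0075
print(f"{mirothinker:.4f}")  # 0.0165
```

Both results match the Est./msg values listed for those models. For a workload with a different message shape, pass explicit input_tokens and output_tokens rather than relying on the assumed defaults.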