Available Models
Explore 456 AI models from leading providers, all accessible through NanoGPT's unified API.
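Every entry below is reachable with the same request shape; only the model ID changes. Here is a minimal sketch of a chat-completion call, assuming an OpenAI-compatible endpoint (the base URL and environment-variable name are illustrative assumptions):

```python
# Minimal sketch: calling any catalog model through an assumed
# OpenAI-compatible chat-completions endpoint.
import os
from openai import OpenAI

client = OpenAI(
    base_url="https://nano-gpt.com/api/v1",  # assumed unified endpoint
    api_key=os.environ["NANOGPT_API_KEY"],   # illustrative key name
)

response = client.chat.completions.create(
    model="meta-llama/llama-4-maverick",  # any model ID from this page
    messages=[{"role": "user", "content": "Summarize MoE routing in two sentences."}],
)
print(response.choices[0].message.content)
```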
Meta
Llama 4 Maverick
meta-llama/llama-4-maverick
Llama 4 Maverick, a 17 billion active parameter model with 128 experts, is the best multimodal model in its class, beating GPT-4o and Gemini 2.0 Flash across a broad range of widely reported benchmarks while achieving comparable results to the new DeepSeek v3 on reasoning and coding, at less than half the active parameters. Llama 4 Maverick offers a best-in-class performance-to-cost ratio, with an experimental chat version scoring an ELO of 1417 on LMArena.
Context: 1,048,576 tokens
Input: $---/M • Output: $---/M
Llama 4 Scout
meta-llama/llama-4-scout
Llama 4 Scout, a 17 billion active parameter model with 16 experts, is the best multimodal model in the world in its class and is more powerful than all previous-generation Llama models, while fitting on a single H100 GPU. Additionally, Llama 4 Scout offers an industry-leading context window of 10M tokens and delivers better results than Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1 across a broad range of widely reported benchmarks.
Context: 328,000 tokens
Input: $---/M • Output: $---/M
CodeLlama 70B
codellama/CodeLlama-70b-Instruct-hf
CodeLlama 70B is a large language model optimized for code generation and understanding. It excels at coding tasks, debugging, and technical problem-solving.
Context: 16,000 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70b Instruct
meta-llama/llama-3.3-70b-instruct
Llama 3.3 is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks.
Context: 131,072 tokens
Input: $---/M • Output: $---/M
Sao10K Stheno 8b
Sao10K/L3-8B-Stheno-v3.2
Sao10K's latest Stheno fine-tune optimized for instruction following.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
Llama 3.1 Large
Meta-Llama-3-1-405B-Instruct-FP8
Meta's largest Llama 3.1 model (405B). Open-source, run through an open permissionless crypto network (no central provider). Note: currently comes with a 90% discount, enjoy!
Context: 128,000 tokens
Input: $---/M • Output: $---/M
EVA Llama 3.33 70B
EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.0
An RP/storywriting specialist model: a full-parameter finetune of Llama-3.3-70B-Instruct on a mixture of synthetic and natural data. It uses the Celeste 70B 0.1 data mixture, greatly expanding it to improve the versatility, creativity, and flavor of the resulting model.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
Steelskull Nevoria 70b
Steelskull/L3.3-MS-Nevoria-70b
Steelskull Nevoria 70b
Context: 16,384 tokens
Input: $---/M • Output: $---/M
Steelskull Nevoria R1 70b
Steelskull/L3.3-Nevoria-R1-70b
Steelskull Nevoria R1 70b
Context: 16,384 tokens
Input: $---/M • Output: $---/M
Steelskull Electra R1 70b
Steelskull/L3.3-Electra-R1-70b
Steelskull Electra R1 70b
Context: 16,384 tokens
Input: $---/M • Output: $---/M
Llama 3.1 70B Dracarys 2
abacusai/Dracarys-72B-Instruct
Llama 3.1 70b finetune that offers improvements on coding.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
Llama 3.2 Medium
meta-llama/llama-3.2-90b-vision-instruct
Medium-size (and capability) version of Meta's newest model (3.2 series).
Context: 131,072 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Instruct abliterated
huihui-ai/Llama-3.3-70B-Instruct-abliterated
An abliterated (removed restrictions and censorship) version of Llama 3.3 70b.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B
TEE/llama3-3-70b
Meta's Llama 3.3 with 70B parameters. Secure inference with encrypted inputs and outputs, running inside a TEE (Trusted Execution Environment), with verifiably no logging by the provider.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Llama 3.1 8b Instruct
meta-llama/llama-3.1-8b-instruct
Fast and efficient for simple purposes.
Context: 131,072 tokens
Input: $---/M • Output: $---/M
Neural Daredevil 8B abliterated
mlabonne/NeuralDaredevil-8B-abliterated
The best performing 8B abliterated model according to most benchmarks.
Context: 8,192 tokens
Input: $---/M • Output: $---/M
Llama 3 70B abliterated
failspy/Meta-Llama-3-70B-Instruct-abliterated-v3.5
An abliterated (removed restrictions and censorship) version of Llama 3 70b.
Context: 8,192 tokens
Input: $---/M • Output: $---/M
Llama 3.05 Storybreaker Ministral 70b
Envoid/Llama-3.05-NT-Storybreaker-Ministral-70B
Much more inclined to output adult content than its predecessor. Great choice for novelty roleplay scenarios.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
Nemotron Tenyxchat Storybreaker 70b
Envoid/Llama-3.05-Nemotron-Tenyxchat-Storybreaker-70B
Overall it provides a solid option for RP and creative writing while still functioning as an assistant model, if desired. If used to continue a roleplay it will generally follow the ongoing cadence of the conversation.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
Mag Mell R1
inflatebot/MN-12B-Mag-Mell-R1
Mag Mell demonstrates worldbuilding capabilities unlike any model in its class, comparable to old adventuring models like Tiefighter, and prose that exhibits minimal slop.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
Evayale 70b
Steelskull/L3.3-MS-Evayale-70B
Combination of EVA and Euryale.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
Lumimaid 70b
NeverSleep/Llama-3-Lumimaid-70B-v0.1
Neversleep Llama 3 Lumimaid 70B
Context: 16,384 tokens
Input: $---/M • Output: $---/M
MS Evalebis 70b
Steelskull/L3.3-MS-Evalebis-70b
Combination of EVA, Euryale and Anubis.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
Qwerky 72B
featherless-ai/Qwerky-72B
Linear models offer a promising approach to significantly reducing computational costs at scale, particularly for large context lengths: a >1000x improvement in inference cost, enabling o1-style inference-time thinking and wider AI accessibility.
Context: 32,000 tokens
Input: $---/M • Output: $---/M
Anubis 70B v1
TheDrummer/Anubis-70B-v1
L3.3 finetune for roleplaying.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70b Mirai Fanfare
Llama-3.3-70B-MiraiFanfare
A Llama 3.3 70b finetuned for roleplay and storytelling.
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.2 3b Instruct
meta-llama/llama-3.2-3b-instruct
Small model optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization
Context: 131,072 tokens
Input: $---/M • Output: $---/M
Llama 3.1 8B (decentralized)
Meta-Llama-3-1-8B-Instruct-FP8
Meta's Llama 3.1 8B model via an open permissionless network
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Llama 3.1 70B Hanami
Sao10K/L3.1-70B-Hanami-x1
Euryale v2.2-based finetune.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Euryale
Sao10K/L3.3-70B-Euryale-v2.3
A 70B parameter model from SAO10K based on Llama 3.3 70B, offering high-quality text generation.
Context: 20,480 tokens
Input: $---/M • Output: $---/M
Llama 3.1 70B Euryale
Sao10K/L3.1-70B-Euryale-v2.2
A 70B parameter model from SAO10K based on Llama 3.1 70B, offering high-quality text generation.
Context: 20,480 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Cu Mai
Steelskull/L3.3-Cu-Mai-R1-70b
A 70B parameter model from Steelskull based on Llama 3.3 70B, offering high-quality text generation.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
Llama 3.1 70B Celeste v0.1
nothingiisreal/L3.1-70B-Celeste-V0.1-BF16
Creative model based on Llama 3.1 70B
Context: 16,384 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Wayfarer
LatitudeGames/Wayfarer-Large-70B-Llama-3.3
Llama 3.3 70B Wayfarer is a fine-tuned version of Llama 3.3 70B, trained on a diverse set of creative writing and RP datasets with a focus on variety and deduplication. The model is designed to be highly creative and non-repetitive: no two entries in the dataset repeat characters or situations, so the model does not latch onto a single personality and can understand and respond appropriately to any character or situation.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
Drummer Anubis 70B v1.1
parasail-drummer-anubis-70b-1-1
FP8 quantized version of TheDrummer/Anubis-70B-v1.1 optimized for creative storytelling and roleplay scenarios
Context: 131,072 tokens
Input: $---/M • Output: $---/M
Anubis Pro 105b v1
anubis-pro-105b-v1
An upscaled version of Llama 3.3 70B with 50% more layers. Finetuned further to make use of its new layers.
Context: 64,000 tokens
Input: $---/M • Output: $---/M
Llama-xLAM-2 70B fc-r
Salesforce/Llama-xLAM-2-70b-fc-r
Salesforce's 70B frontier model focused on function calling and retrieval-augmented generation.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B ArliAI RPMax v2
Llama-3.3-70B-ArliAI-RPMax-v2
Llama 3.3 70B ArliAI RPMax v2
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Mokume Gane R1
Llama-3.3-70B-Mokume-Gane-R1
Llama 3.3 70B Mokume Gane R1
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B MS Nevoria
Llama-3.3-70B-MS-Nevoria
Llama 3.3 70B MS Nevoria
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Cirrus x1
Llama-3.3-70B-Cirrus-x1
Llama 3.3 70B Cirrus x1
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Bigger Body
Llama-3.3-70B-Bigger-Body
Llama 3.3 70B Bigger Body
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Damascus R1
Llama-3.3-70B-Damascus-R1
Llama 3.3 70B Damascus R1
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Forgotten Safeword 3.6
Llama-3.3-70B-Forgotten-Safeword-3.6
Llama 3.3 70B Forgotten Safeword 3.6
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Legion V2.1
Llama-3.3-70B-Legion-V2.1
Llama 3.3 70B Legion V2.1
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Electra R1
Llama-3.3-70B-Electra-R1
Llama 3.3 70B Electra R1
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Vulpecula R1
Llama-3.3-70B-Vulpecula-R1
Llama 3.3 70B Vulpecula R1
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Anubis v1
Llama-3.3-70B-Anubis-v1
Llama 3.3 70B Anubis v1
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Magnum v4 SE
Llama-3.3-70B-Magnum-v4-SE
Llama 3.3 70B Magnum v4 SE
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Fallen R1 v1
Llama-3.3-70B-Fallen-R1-v1
Llama 3.3 70B Fallen R1 v1
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Cu Mai R1
Llama-3.3-70B-Cu-Mai-R1
Llama 3.3 70B Cu Mai R1
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B RPMax v1.4
Llama-3.3-70B-ArliAI-RPMax-v1.4
Llama 3.3 70B RPMax v1.4
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Electranova v1.0
Llama-3.3-70B-Electranova-v1.0
Llama 3.3 70B Electranova v1.0
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3+ 70B Hanami x1
Llama-3.3+(3.1v3.3)-70B-Hanami-x1
Llama 70B with older 3.1 LoRA, optimized for creative storytelling
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Mhnnn x1
Llama-3.3-70B-Mhnnn-x1
Llama 70B LoRA variant with unique creative capabilities
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B GeneticLemonade Unleashed v3
Llama-3.3-70B-GeneticLemonade-Unleashed-v3
Llama 70B LoRA optimized for unrestricted creative expression
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Anubis v1.1
Llama-3.3-70B-Anubis-v1.1
Llama 70B LoRA - Updated Anubis with improved reasoning
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3+ 70B TenyxChat DaybreakStorywriter
Llama-3.3+(3v3.3)-70B-TenyxChat-DaybreakStorywriter
Llama 70B with older 3.1 LoRA focused on narrative storytelling
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Forgotten Abomination v5.0
Llama-3.3-70B-Forgotten-Abomination-v5.0
Llama 70B LoRA with dark and horror-themed creativity
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Fallen v1
Llama-3.3-70B-Fallen-v1
Llama 70B LoRA with fallen angel themed responses
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B StrawberryLemonade v1.0
Llama-3.3-70B-StrawberryLemonade-v1.0
Llama 70B LoRA with sweet and refreshing conversational style
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3+ 70B New Dawn v1.1
Llama-3.3+(3.1v3.3)-70B-New-Dawn-v1.1
Llama 70B with older 3.1 LoRA for new beginnings in storytelling
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B StrawberryLemonade v1.2
Llama-3.3-70B-Strawberrylemonade-v1.2
Llama 70B LoRA - Updated with improved sweetness balance
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Shakudo
Llama-3.3-70B-Shakudo
Llama 70B LoRA - Creative model with unique style
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Predatorial Extasy
Llama-3.3-70B-Predatorial-Extasy
Llama 70B LoRA - Creative model for intense scenarios
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B ArliAI RPMax v3
Llama-3.3-70B-ArliAI-RPMax-v3
Llama 70B LoRA - ArliAI fine-tuned for roleplay, v3
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Argunaut 1 SFT
Llama-3.3-70B-Argunaut-1-SFT
Llama 70B LoRA - Supervised fine-tuned creative model
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Dark Ages v0.1
Llama-3.3-70B-Dark-Ages-v0.1
Llama 70B LoRA - Creative model for historical fantasy
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Aurora Borealis
Llama-3.3-70B-Aurora-Borealis
Llama 70B LoRA - Creative model with ethereal qualities
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Magnum v4 SE Cirrus x1 SLERP
Llama-3.3-70B-Magnum-v4-SE-Cirrus-x1-SLERP
Creative Model - SLERP merge of Magnum v4 SE and Cirrus x1
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Progenitor V3.3
Llama-3.3-70B-Progenitor-V3.3
Creative Model
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Omega Directive Unslop v2.0
Llama-3.3-70B-The-Omega-Directive-Unslop-v2.0
Llama 3.3 70B with Omega Directive Unslop v2.0 for enhanced creative output
Context: 131,072 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B RAWMAW
Llama-3.3-70B-RAWMAW
Llama 3.3 70B RAWMAW model for unrestricted creative writing
Context: 131,072 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Anthrobomination
Llama-3.3-70B-Anthrobomination
Llama 3.3 70B Anthrobomination for creative storytelling
Context: 131,072 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Omega Directive Unslop v2.1
Llama-3.3-70B-The-Omega-Directive-Unslop-v2.1
Llama 3.3 70B with Omega Directive Unslop v2.1 - improved version
Context: 131,072 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Sapphira 0.1
Llama-3.3-70B-Sapphira-0.1
Llama 3.3 70B Sapphira 0.1 for creative and expressive writing
Context: 131,072 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Incandescent Malevolence
Llama-3.3-70B-Incandescent-Malevolence
Llama 3.3 70B Incandescent Malevolence for dark creative narratives
Context: 131,072 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Nova
Llama-3.3-70B-Nova
Llama 3.3 70B Nova for creative and expressive writing
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Ignition v0.1
Llama-3.3-70B-Ignition-v0.1
Llama 3.3 70B Ignition v0.1 tuned for dynamic roleplay and dialogue
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B GeneticLemonade Opus
Llama-3.3-70B-GeneticLemonade-Opus
Advanced creative LoRA from GeneticLemonade focused on expressive narrative
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.3 70B Sapphira 0.2
Llama-3.3-70B-Sapphira-0.2
Successor to Sapphira 0.1 with refined style for creative writing
Context: 65,536 tokens
Input: $---/M • Output: $---/M
OpenAI
OpenAI o3-mini
o3-mini
The cheaper version of OpenAI's newest thinking model. Fast, cheap, and with a maximum output of 100,000 tokens. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 200,000 tokens
Input: $---/M • Output: $---/M
OpenAI o3 Pro
o3-pro
The pro version of the already fantastic o3. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 200,000 tokens
Input: $---/M • Output: $---/M
GPT OSS 120B
openai/gpt-oss-120b
An open-weight, 117B-parameter Mixture-of-Experts (MoE) language model designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
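The native function calling described above can be exercised through the standard OpenAI-compatible tools format. A hedged sketch: the endpoint and key name are assumptions, and get_weather is a hypothetical tool defined only for illustration.

```python
# Sketch of function calling with gpt-oss-120b via an assumed
# OpenAI-compatible endpoint. `get_weather` is a hypothetical tool.
import json
import os

from openai import OpenAI

client = OpenAI(base_url="https://nano-gpt.com/api/v1",  # assumed endpoint
                api_key=os.environ["NANOGPT_API_KEY"])   # illustrative key name

tools = [{
    "type": "function",
    "function": {
        "name": "get_weather",
        "description": "Return the current weather for a city.",
        "parameters": {
            "type": "object",
            "properties": {"city": {"type": "string"}},
            "required": ["city"],
        },
    },
}]

resp = client.chat.completions.create(
    model="openai/gpt-oss-120b",
    messages=[{"role": "user", "content": "What's the weather in Lisbon?"}],
    tools=tools,
)

# The model should answer with a structured tool call rather than prose.
call = resp.choices[0].message.tool_calls[0]
print(call.function.name, json.loads(call.function.arguments))
```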
GPT OSS 20B
openai/gpt-oss-20b
An open-weight 21B parameter model released under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for lower-latency inference and deployability on consumer or single-GPU hardware. The model is trained in OpenAI's Harmony response format and supports reasoning level configuration, fine-tuning, and agentic capabilities including function calling, tool use, and structured outputs.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
OpenAI o3-mini high
o3-mini-high
OpenAI's newest flagship model with reasoning effort set to high. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 200,000 tokens
Input: $---/M • Output: $---/M
OpenAI o3-mini low
o3-mini-low
OpenAI's newest flagship model with reasoning effort set to low. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 200,000 tokens
Input: $---/M • Output: $---/M
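As the two entries above show, reasoning effort for o3-mini is selected by picking a model-ID variant. A minimal sketch under the same endpoint assumptions:

```python
# Sketch: choose reasoning effort by model-ID variant (-low / -high).
# Base URL and key name are illustrative assumptions.
import os
from openai import OpenAI

client = OpenAI(base_url="https://nano-gpt.com/api/v1",
                api_key=os.environ["NANOGPT_API_KEY"])

resp = client.chat.completions.create(
    model="o3-mini-high",  # swap in "o3-mini-low" for cheaper, faster replies
    messages=[{"role": "user", "content": "Give a rigorous one-paragraph proof that sqrt(2) is irrational."}],
)
print(resp.choices[0].message.content)
```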
OpenAI o1
o1
Useful when tackling complex problems in science, coding, math, and similar fields. Outdated compared to the newer o3 and o4 models. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 200,000 tokens
Input: $---/M • Output: $---/M
ChatGPT 4o
chatgpt-4o-latest
OpenAI's current standard model, the well-known ChatGPT. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
GPT 5 Chat
gpt-5-chat-latest
GPT-5 Chat is the GPT-5 variant optimized for advanced, natural, multimodal conversations in enterprise applications. It's the model generally used in ChatGPT. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 400,000 tokens
Input: $---/M • Output: $---/M
GPT 5
gpt-5
GPT-5 is OpenAI's most advanced model, offering major improvements in reasoning, code quality, and user experience. It handles complex coding tasks with minimal prompting, provides clear explanations, and introduces enhanced agentic capabilities. Designed for logic and multi-step tasks. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 400,000 tokens
Input: $---/M • Output: $---/M
GPT 5 Codex
gpt-5-codex
GPT-5 Codex is a coding-focused variant of GPT-5 built for interactive development and long-running, autonomous engineering work. It excels at feature implementation, debugging, large-scale refactors, and code review, with higher steerability and tighter adherence to developer instructions for cleaner, production-ready code. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 400,000 tokens
Input: $---/M • Output: $---/M
GPT 5 Mini
gpt-5-mini
A lightweight version of GPT-5 for cost-sensitive applications. Balances advanced capabilities with efficient resource usage. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 400,000 tokens
Input: $---/M • Output: $---/M
GPT 5 Nano
gpt-5-nano
Optimized for speed and ideal for applications requiring low latency. The fastest and most efficient GPT-5 variant. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 400,000 tokens
Input: $---/M • Output: $---/M
OpenAI o1 preview
o1-preview
OpenAI's new flagship series of reasoning models for solving hard problems. Useful when tackling complex problems in science, coding, math, and similar fields. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
OpenAI o1-mini
o1-mini
A fast, cost-efficient version of OpenAI's o1 reasoning model tailored to coding, math, and science use cases. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
OpenAI o1 Pro
openai/o1-pro
OpenAI's flagship series of reasoning models for solving hard problems. The o1 series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o1-pro model uses more compute to think harder and provide consistently better answers, and comes with a massive 100,000-token output window. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 200,000 tokens
Input: $---/M • Output: $---/M
GPT 4.1 Nano
openai/gpt-4.1-nano
Cheapest model in the GPT-4.1 series. Huge context window with fast throughput and low latency. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 1,047,576 tokens
Input: $---/M • Output: $---/M
GPT 4.1 Mini
openai/gpt-4.1-mini
Mid-sized GPT 4.1, comparable to GPT4o with a far higher context window, at lower cost and with higher speed. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 1,047,576 tokens
Input: $---/M • Output: $---/M
GPT 4.1
openai/gpt-4.1
GPT 4.1 is the new flagship model from OpenAI. Huge context window (1M tokens), outperforms GPT-4o and GPT-4.5 on coding, and does very well at understanding large contexts. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 1,047,576 tokens
Input: $---/M • Output: $---/M
OpenAI o3
o3
Full version of OpenAI's o3, the current flagship model, which OpenAI describes as getting close to true AGI. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 200,000 tokens
Input: $---/M • Output: $---/M
OpenAI o4-mini
o4-mini
The mini version of o4, the next generation of OpenAI's reasoning models. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 200,000 tokens
Input: $---/M • Output: $---/M
OpenAI o4-mini high
o4-mini-high
The o4-mini model with reasoning effort set to high, from the next generation of OpenAI's reasoning models. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 200,000 tokens
Input: $---/M • Output: $---/M
OpenAI o4-mini Deep Research
o4-mini-deep-research
Advanced research-focused model that does deep, multi-step reasoning on complex research tasks. Best for comprehensive analysis and investigation. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 200,000 tokens
Input: $---/M • Output: $---/M
OpenAI o3 Deep Research
o3-deep-research
o3-deep-research is OpenAI's most advanced model for deep research, designed to tackle complex, multi-step research tasks. It can search and synthesize information from across the internet as well as from your own data. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 200,000 tokens
Input: $---/M • Output: $---/M
GPT 4o mini
gpt-4o-mini
OpenAI's most cost-efficient small model. Cheaper and smarter than GPT-3.5 (the original ChatGPT), but less performant than gpt-4o. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
GPT 4o Mini Search
gpt-4o-mini-search-preview
GPT 4o Mini with web search built in natively via OpenAI. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
GPT 4o Search
gpt-4o-search-preview
GPT 4o with web search built in natively via OpenAI. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
GPT 4o 08 06
gpt-4o-2024-08-06
OpenAI's precursor to ChatGPT-4o. Great on English text and code, with significant improvements on text in non-English languages. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
GPT 4o 11 20
gpt-4o-2024-11-20
OpenAI's precursor to ChatGPT-4o. Great on English text and code, with significant improvements on text in non-English languages. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
GPT 4 Turbo Preview
gpt-4-turbo-preview
Can take in the largest messages (up to 300 pages of context), and is widely seen as one of the best-in-class models. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
GPT 4o
gpt-4o
OpenAI's precursor to ChatGPT-4o. Great on English text and code, with significant improvements on text in non-English languages. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
GPT 3.5 Turbo
gpt-3.5-turbo
Older model. Brought ChatGPT to the mainstream, seen as dated nowadays. 90% cheaper than GPT-4-Turbo, recommended for very simple tasks. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 16,385 tokens
Input: $---/M • Output: $---/M
GPT-OSS 120B TEE
TEE/gpt-oss-120b
Open-source GPT model with 120B parameters, running inside a TEE (Trusted Execution Environment), with verifiably no logging by the provider.
Context: 131,072 tokens
Input: $---/M • Output: $---/M
Azure o1
azure-o1
Azure version of OpenAI o1. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 200,000 tokens
Input: $---/M • Output: $---/M
Azure o3-mini
azure-o3-mini
Azure version of OpenAI o3-mini. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 200,000 tokens
Input: $---/M • Output: $---/M
Azure gpt-4o
azure-gpt-4o
Azure version of OpenAI gpt-4o. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Azure gpt-4o-mini
azure-gpt-4o-mini
Azure version of OpenAI gpt-4o-mini. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Azure gpt-4-turbo
azure-gpt-4-turbo
Azure version of OpenAI gpt-4-turbo. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
GPT 4o Reasoner
gpt-4o-reasoner
'DeepGPT4o', a fusion of GPT-4o and Deepseek R1: Deepseek R1 reasons first, and its reasoning is then fed into GPT-4o to generate the response. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
GPT 4.1 Reasoner
gpt-4.1-reasoner
'DeepGPT 4.1', a fusion of GPT 4.1 and Deepseek R1: Deepseek R1 reasons first, and its reasoning is then fed into GPT 4.1 to generate the response. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
ChatGPT 4o Reasoner
chatgpt-4o-latest-reasoner
'DeepChatGPT', a fusion of ChatGPT 4o and Deepseek R1: Deepseek R1 reasons first, and its reasoning is then fed into ChatGPT 4o to generate the response. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
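The three 'Reasoner' fusions above follow one pattern: a reasoning model thinks first, then its trace is handed to a second model that writes the reply. The sketch below mirrors that described pipeline using two catalog models; it is an illustration, not the provider's actual implementation, and the endpoint and key name are assumptions.

```python
# Illustrative reason-then-respond fusion, per the descriptions above.
import os
from openai import OpenAI

client = OpenAI(base_url="https://nano-gpt.com/api/v1",  # assumed endpoint
                api_key=os.environ["NANOGPT_API_KEY"])   # illustrative key name

question = "Is 2**61 - 1 prime? Answer briefly."

# Stage 1: the reasoning model thinks. Depending on the provider, the trace
# may arrive inline in the content or in a separate reasoning field.
thinking = client.chat.completions.create(
    model="deepseek-ai/DeepSeek-R1-0528",
    messages=[{"role": "user", "content": question}],
).choices[0].message.content

# Stage 2: the responder sees the question plus the thinking trace.
answer = client.chat.completions.create(
    model="chatgpt-4o-latest",
    messages=[
        {"role": "user", "content": question},
        {"role": "assistant", "content": f"Reasoning notes:\n{thinking}"},
        {"role": "user", "content": "Using the notes above, give the final answer."},
    ],
).choices[0].message.content
print(answer)
```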
Google Gemini
Gemini 2.0 Pro 0205
gemini-2.0-pro-exp-02-05
Note: this model is now routed to Gemini 2.5 Pro, because Google no longer offers Gemini 2.0 Pro and Gemini 2.5 Pro is an across-the-board improvement.
Context: 2,097,152 tokens
Input: $---/M • Output: $---/M
Gemini 2.5 Flash 0520
gemini-2.5-flash-preview-05-20
Deprecated; requests are now routed to Gemini 2.5 Flash.
Context: 1,048,000 tokens
Input: $---/M • Output: $---/M
Gemini 2.5 Flash 0520 Thinking
gemini-2.5-flash-preview-05-20:thinking
Deprecated; requests are now routed to Gemini 2.5 Flash.
Context: 1,048,000 tokens
Input: $---/M • Output: $---/M
Gemini 2.0 Pro 1206
gemini-exp-1206
Note: this model is now routed to Gemini 2.5 Pro, because Google no longer offers Gemini 2.0 Pro and Gemini 2.5 Pro is an across-the-board improvement.
Context: 2,097,152 tokens
Input: $---/M • Output: $---/M
Gemini 2.0 Flash Thinking 1219
gemini-2.0-flash-thinking-exp-1219
The December 19th, 2024 version of Gemini 2.0 Flash Thinking. Google's first thinking model, now relatively outdated.
Context: 32,767 tokens
Input: $---/M • Output: $---/M
Gemini Text + Image
gemini-2.0-flash-exp-image-generation
Gemini 2.0 Flash Image Generation. Can generate both text and images within the same prompt!
Context: 32,767 tokens
Input: $---/M • Output: $---/M
Gemini 2.5 Flash Preview
gemini-2.5-flash-preview-04-17
Fast, cost-efficient performance on complex tasks. The workhorse of the Gemini series. April 17th, 2025 version.
Context: 1,048,756 tokens
Input: $---/M • Output: $---/M
Gemini 2.5 Flash Preview Thinking
gemini-2.5-flash-preview-04-17:thinking
Fast, cost-efficient performance on complex tasks. The workhorse of the Gemini series. Thinking turned on by default.
Context: 1,048,756 tokens
Input: $---/M • Output: $---/M
Gemini 2.5 Pro Experimental 0325
gemini-2.5-pro-exp-03-25
Gemini 2.5 Pro Exp 0325. Google's experimental model from March 2025.
Context: 1,048,756 tokens
Input: $---/M • Output: $---/M
Gemini 2.5 Pro Preview 0325
gemini-2.5-pro-preview-03-25
Gemini 2.5 Pro Preview 0325. Google's model from March 2025.
Context: 1,048,756 tokens
Input: $---/M • Output: $---/M
Gemini 2.5 Pro Preview 0506
gemini-2.5-pro-preview-05-06
Gemini 2.5 Pro Preview 0506. Google's model from May 2025.
Context: 1,048,756 tokens
Input: $---/M • Output: $---/M
Gemini 2.5 Pro Preview 0605
gemini-2.5-pro-preview-06-05
Gemini 2.5 Pro Preview 0605. Google's latest preview model with advanced capabilities and performance improvements.
Context: 1,048,756 tokens
Input: $---/M • Output: $---/M
Gemini 2.5 Pro
gemini-2.5-pro
Gemini 2.5 Pro stable release. Google's most capable generalist model with strong performance across a wide range of tasks.
Context: 1,048,756 tokens
Input: $---/M • Output: $---/M
Gemini 2.5 Flash
gemini-2.5-flash
Fast, cost-efficient performance on complex tasks. The workhorse of the Gemini series. Stable release with improved capabilities.
Context: 1,048,756 tokens
Input: $---/M • Output: $---/M
Gemini 2.5 Flash Lite Preview
gemini-2.5-flash-lite-preview-06-17
Ultra-lightweight and fast variant of Gemini 2.5 Flash. Preview release optimized for speed and efficiency.
Context: 1,048,756 tokens
Input: $---/M • Output: $---/M
Gemini 2.5 Flash (No Thinking)
gemini-2.5-flash-nothinking
Fast, cost-efficient performance on complex tasks. The workhorse of the Gemini series. Stable release with thinking mode disabled.
Context: 1,048,756 tokens
Input: $---/M • Output: $---/M
Gemini 2.5 Flash Preview (09/2025)
gemini-2.5-flash-preview-09-2025
State-of-the-art Gemini 2.5 Flash checkpoint tuned for advanced reasoning, coding, math, and scientific tasks. Built-in thinking for higher accuracy and nuanced context handling.
Context: 1,048,756 tokens
Input: $---/M • Output: $---/M
Gemini 2.5 Flash Preview (09/2025) – Thinking
gemini-2.5-flash-preview-09-2025-thinking
Same checkpoint with thinking enabled by default for deeper reasoning and stepwise analysis on complex tasks.
Context: 1,048,756 tokens
Input: $---/M • Output: $---/M
Gemini 2.5 Flash Lite Preview (09/2025)
gemini-2.5-flash-lite-preview-09-2025
Lightweight Gemini 2.5 reasoning variant optimized for ultra-low latency and cost. Higher throughput and faster generation than earlier Flash models.
Context: 1,048,756 tokens
Input: $---/M • Output: $---/M
Gemini 2.5 Flash Lite Preview (09/2025) – Thinking
gemini-2.5-flash-lite-preview-09-2025-thinking
Lite preview with thinking enabled by default for more reliable reasoning while retaining low-latency performance.
Context: 1,048,756 tokens
Input: $---/M • Output: $---/M
Gemini 2.0 Flash Exp
gemini-2.0-flash-exp
Experimental version of Google's newest model, outperforming even Gemini 1.5 Pro.
Context: 1,048,576 tokens
Input: $---/M • Output: $---/M
Gemini 2.0 Flash Thinking 0121
gemini-2.0-flash-thinking-exp-01-21
Google's newest model, outperforming even Gemini 1.5 Pro, now with a thinking mode enabled similar to the o1 series of OpenAI.
Context: 1,000,000 tokens
Input: $---/M • Output: $---/M
Gemini 2.0 Flash
gemini-2.0-flash-001
Upgraded version of Gemini Flash 1.5. Faster, with higher output, and overall increase in intelligence.
Context: 1,000,000 tokens
Input: $---/M • Output: $---/M
Gemini 2.0 Flash Lite
gemini-2.0-flash-lite
Lite version of Gemini 2.0 Flash, optimized for even lower cost and latency.
Context: 1,000,000 tokens
Input: $---/M • Output: $---/M
Gemini LearnLM Experimental
learnlm-1.5-pro-experimental
LearnLM is a task-specific model trained to align with learning science principles when following system instructions for teaching and learning use cases. For instance, the model can take on tasks to act as an expert or guide to educate users on specific topics.
Context: 32,767 tokens
Input: $---/M • Output: $---/M
Gemini 1.5 Flash
google/gemini-flash-1.5
Google's fastest multimodal model, with great performance for diverse, repetitive tasks and a 2 million token context window.
Context: 2,000,000 tokens
Input: $---/M • Output: $---/M
Gemma 3 27B TEE
TEE/gemma-3-27b-it
Google's Gemma 3 27B instruction-tuned model, running inside a TEE (Trusted Execution Environment), with verifiably no logging by the provider.
Context: 131,072 tokens
Input: $---/M • Output: $---/M
Gemini 2.0 Pro Reasoner
gemini-2.0-pro-reasoner
'DeepGemini', a fusion of Gemini 2.5 Pro and Deepseek R1. Note: this model is now routed to Gemini 2.5 Pro, because Google no longer offers Gemini 2.0 Pro and Gemini 2.5 Pro is an across-the-board improvement.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Gemma 3 27B IT
unsloth/gemma-3-27b-it
Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Gemma 3 12B IT
unsloth/gemma-3-12b-it
Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Gemma 3 4B IT
unsloth/gemma-3-4b-it
Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Gemma 3 1B IT
unsloth/gemma-3-1b-it
Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Gemma 3 27B IT Abliterated
Gemma-3-27B-it-Abliterated
Gemma 3 27B IT - Abliterated version with censorship removed
Context: 32,768 tokens
Input: $---/M • Output: $---/M
Gemma 3 27B Big Tiger v3
Gemma-3-27B-Big-Tiger-v3
Gemma 3 27B Big Tiger v3 - Advanced model for creative writing and roleplay
Context: 32,768 tokens
Input: $---/M • Output: $---/M
Gemma 3 27B Nidum Uncensored
Gemma-3-27B-Nidum-Uncensored
Gemma 3 27B Nidum Uncensored - Unrestricted creative model
Context: 32,768 tokens
Input: $---/M • Output: $---/M
Gemma 3 27B IT
Gemma-3-27B-it
Google's Gemma 3 27B instruction-tuned model for versatile tasks
Context: 32,768 tokens
Input: $---/M • Output: $---/M
Gemma 3 27B RPMax v3
Gemma-3-27B-ArliAI-RPMax-v3
Gemma 3 27B LoRA fine-tuned for enhanced creative and roleplay capabilities
Context: 32,768 tokens
Input: $---/M • Output: $---/M
Gemma 3 27B Glitter
Gemma-3-27B-Glitter
Gemma 27B LoRA with sparkling creative personality
Context: 32,768 tokens
Input: $---/M • Output: $---/M
Gemma 3 27B CardProjector v4
Gemma-3-27B-CardProjector-v4
Gemma 27B LoRA specialized in creating character profiles (cards)
Context: 32,768 tokens
Input: $---/M • Output: $---/M
Anthropic
Claude Sonnet 4.5
claude-sonnet-4-5-20250929
Frontier‑level coding and agentic performance. Claude Sonnet 4.5 leads on real‑world coding tasks and shows substantial gains in computer use, reasoning, and math. Designed for long‑horizon, multi‑step work across IDE + terminal workflows, large codebases, and complex agents — a drop‑in upgrade over previous Sonnet versions.
Context: 1,000,000 tokens
Input: $---/M • Output: $---/M
Claude Sonnet 4.5 Thinking
claude-sonnet-4-5-20250929-thinking
Adds extended, step‑by‑step reasoning for tougher coding, planning, and multi‑tool tasks. Ideal for long‑horizon agent workflows, complex problem solving, and scenarios that benefit from explicit thinking traces.
Context: 1,000,000 tokens
Input: $---/M • Output: $---/M
Claude 3.7 Sonnet Reasoner
claude-3-7-sonnet-reasoner
Claude 3.7 Sonnet Reasoner blends Deepseek R1's reasoning with Claude 3.7 Sonnet's response.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
DeepClaude
deepclaude
Harness the power of DeepSeek R1's reasoning combined with Claude's creativity and code generation. Feeds your query into DeepSeek R1, then feeds the query plus the thinking process into Claude 3.5 Sonnet and returns an answer. Note: this routes through the original DeepSeek API, meaning your data may be stored and used by DeepSeek.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Claude 3.7 Sonnet
claude-3-7-sonnet-20250219
The updated version of Anthropic's most intelligent model, preferred by many for its programming skills and natural language.
Context: 200,000 tokens
Input: $---/M • Output: $---/M
Claude 4 Sonnet
claude-sonnet-4-20250514
Claude 4 Sonnet by Anthropic. A new generation model with improved capabilities, especially on programming and development. NOTE: Inputs > 200k tokens are charged at 2x input, 1.5x output rate.
Context: 200,000 tokens
Input: $---/M • Output: $---/M
Claude 4 Opus
claude-opus-4-20250514
Claude 4 Opus by Anthropic. The premium version of the new Claude models. A new generation model with improved capabilities, especially on programming and development.
Context: 200,000 tokens
Input: $---/M • Output: $---/M
Claude 3.7 Sonnet Thinking
claude-3-7-sonnet-thinking
Anthropic's Claude 3.7 Sonnet with the ability to show its thinking process step by step.
Context: 200,000 tokens
Input: $---/M • Output: $---/M
Claude 4 Sonnet Thinking
claude-sonnet-4-thinking
Anthropic's Claude 4 Sonnet with the ability to show its thinking process step by step.
Context: 1,000,000 tokens
Input: $---/M • Output: $---/M
Claude 4 Opus Thinking
claude-opus-4-thinking
Anthropic's Claude 4 Opus with the ability to show its thinking process step by step.
Context: 200,000 tokens
Input: $---/M • Output: $---/M
Claude 3.5 Sonnet
claude-3-5-sonnet-20241022
One of Anthropic's top models, offering even better results on many subjects than GPT-4o.
Context: 200,000 tokens
Input: $---/M • Output: $---/M
Claude 3.5 Sonnet Old
claude-3-5-sonnet-20240620
The earlier snapshot of Claude 3.5 Sonnet, offering better results on many subjects than GPT-4o.
Context: 200,000 tokens
Input: $---/M • Output: $---/M
Claude 3.5 Haiku
claude-3-5-haiku-20241022
Anthropic's updated faster and cheaper model, offering good results on chatbots and coding.
Context: 200,000 tokens
Input: $---/M • Output: $---/M
Claude 3 Opus
claude-3-opus-20240229
Anthropic's flagship model, outperforming GPT-4 on most benchmarks.
Context: 200,000 tokens
Input: $---/M • Output: $---/M
Claude 3.7 Sonnet Thinking (1K)
claude-3-7-sonnet-thinking:1024
Claude 3.7 Sonnet with minimal thinking budget (1,024 tokens).
Context: 200,000 tokens
Input: $---/M • Output: $---/M
Claude 3.7 Sonnet Thinking (8K)
claude-3-7-sonnet-thinking:8192
Claude 3.7 Sonnet with reduced thinking budget (8,192 tokens).
Context: 200,000 tokens
Input: $---/M • Output: $---/M
Claude 3.7 Sonnet Thinking (32K)
claude-3-7-sonnet-thinking:32768
Claude 3.7 Sonnet with extended thinking budget (32,768 tokens).
Context: 200,000 tokens
Input: $---/M • Output: $---/M
Claude 3.7 Sonnet Thinking (128K)
claude-3-7-sonnet-thinking:128000
Claude 3.7 Sonnet with maximum thinking budget (128,000 tokens).
Context: 200,000 tokens
Input: $---/M • Output: $---/M
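The entries above pin the extended-thinking budget through a ':&lt;tokens&gt;' suffix on the model ID (the Claude 4 and 4.1 variants below follow the same convention). A minimal usage sketch; base URL and key name are illustrative assumptions:

```python
# Sketch: request a fixed thinking budget via the ":<tokens>" ID suffix.
import os
from openai import OpenAI

client = OpenAI(base_url="https://nano-gpt.com/api/v1",
                api_key=os.environ["NANOGPT_API_KEY"])

resp = client.chat.completions.create(
    model="claude-3-7-sonnet-thinking:8192",  # 8,192-token thinking budget
    messages=[{"role": "user", "content": "Plan a three-step refactor of a legacy parser."}],
)
print(resp.choices[0].message.content)
```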
Claude 4 Sonnet Thinking (1K)
claude-sonnet-4-thinking:1024
Claude 4 Sonnet with minimal thinking budget (1,024 tokens).
Context: 1,000,000 tokens
Input: $---/M • Output: $---/M
Claude 4 Sonnet Thinking (8K)
claude-sonnet-4-thinking:8192
Claude 4 Sonnet with reduced thinking budget (8,192 tokens).
Context: 1,000,000 tokens
Input: $---/M • Output: $---/M
Claude 4 Sonnet Thinking (32K)
claude-sonnet-4-thinking:32768
Claude 4 Sonnet with extended thinking budget (32,768 tokens).
Context: 1,000,000 tokens
Input: $---/M • Output: $---/M
Claude 4 Sonnet Thinking (64K)
claude-sonnet-4-thinking:64000
Claude 4 Sonnet with maximum thinking budget (64,000 tokens).
Context: 1,000,000 tokens
Input: $---/M • Output: $---/M
Claude 4 Opus Thinking (1K)
claude-opus-4-thinking:1024
Claude 4 Opus with minimal thinking budget (1,024 tokens).
Context: 200,000 tokens
Input: $---/M • Output: $---/M
Claude 4 Opus Thinking (8K)
claude-opus-4-thinking:8192
Claude 4 Opus with reduced thinking budget (8,192 tokens).
Context: 200,000 tokens
Input: $---/M • Output: $---/M
Claude 4 Opus Thinking (32K)
claude-opus-4-thinking:32768
Claude 4 Opus with extended thinking budget (32,768 tokens).
Context: 200,000 tokens
Input: $---/M • Output: $---/M
Claude 4 Opus Thinking (32,000)
claude-opus-4-thinking:32000
Claude 4 Opus with maximum thinking budget (32,000 tokens).
Context: 200,000 tokens
Input: $---/M • Output: $---/M
Claude 4.1 Opus
claude-opus-4-1-20250805
Claude Opus 4.1 is a powerful model from Anthropic that delivers sustained performance for complex coding and other long-running tasks that require thousands of steps, expanding the capabilities of AI agents.
Context: 200,000 tokens
Input: $---/M • Output: $---/M
Claude 4.1 Opus Thinking
claude-opus-4-1-thinking
Anthropic's Claude 4.1 Opus with the ability to show its thinking process step by step.
Context: 200,000 tokens
Input: $---/M • Output: $---/M
Claude 4.1 Opus Thinking (1K)
claude-opus-4-1-thinking:1024
Claude 4.1 Opus with minimal thinking budget (1,024 tokens).
Context: 200,000 tokens
Input: $---/M • Output: $---/M
Claude 4.1 Opus Thinking (8K)
claude-opus-4-1-thinking:8192
Claude 4.1 Opus with reduced thinking budget (8,192 tokens).
Context: 200,000 tokens
Input: $---/M • Output: $---/M
Claude 4.1 Opus Thinking (32K)
claude-opus-4-1-thinking:32768
Claude 4.1 Opus with extended thinking budget (32,768 tokens).
Context: 200,000 tokens
Input: $---/M • Output: $---/M
Claude 4.1 Opus Thinking (32,000)
claude-opus-4-1-thinking:32000
Claude 4.1 Opus with maximum thinking budget (32,000 tokens).
Context: 200,000 tokens
Input: $---/M • Output: $---/M
Qwen
Qwen 3 Coder 480B
Qwen/Qwen3-Coder-480B-A35B-Instruct-FP8
Qwen 3 Coder 480B, a 480 billion total parameter model with 35B active and 160 total experts with 8 active. Performs similarly to Claude 4 Sonnet in coding benchmarks, but at a much lower price. Quantized at FP8.
Context: 262,000 tokens
Input: $---/M • Output: $---/M
Qwen3 Coder Plus
qwen/qwen3-coder-plus
Alibaba’s proprietary upgrade to the open‑weights Qwen3 Coder 480B A35B. A coding‑first agent model with strong tool use and environment control for autonomous programming, while remaining capable at general tasks.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Qwen3 Coder Flash
qwen/qwen3-coder-flash
A speed‑optimized and budget‑friendly sibling to Coder Plus. Excellent at code generation and agentic workflows (tool use, environment interaction) with solid general‑purpose ability.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Qwen 3 32b
qwen/qwen3-32b
Qwen 3 32b is a 32-billion-parameter model. Supports switching between thinking and non-thinking modes: add /think or /no_think anywhere in a prompt or system message to toggle chain-of-thought reasoning (see the usage sketch below).
Context: 41,000 tokens
Input: $---/M • Output: $---/M
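A short sketch of the /think and /no_think toggle described above: the directive travels inside the prompt text itself. Endpoint and key name are illustrative assumptions.

```python
# Sketch: toggling Qwen 3 chain-of-thought from within the prompt.
import os
from openai import OpenAI

client = OpenAI(base_url="https://nano-gpt.com/api/v1",
                api_key=os.environ["NANOGPT_API_KEY"])

# Disable thinking for a quick factual lookup...
fast = client.chat.completions.create(
    model="qwen/qwen3-32b",
    messages=[{"role": "user", "content": "/no_think In what year was the transistor invented?"}],
)

# ...and enable it for a harder, multi-step problem.
slow = client.chat.completions.create(
    model="qwen/qwen3-32b",
    messages=[{"role": "user", "content": "/think Prove that the sum of two odd integers is even."}],
)
print(fast.choices[0].message.content)
print(slow.choices[0].message.content)
```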
Qwen 3 14b
qwen/qwen3-14b
Qwen 3 14b is a 14-billion-parameter model. Supports switching between thinking and non-thinking modes: add /think or /no_think anywhere in a prompt or system message to toggle chain-of-thought reasoning.
Context: 41,000 tokens
Input: $---/M • Output: $---/M
Qwen3 30B A3B
qwen/qwen3-30b-a3b
Qwen 3 30b A3B is a 30b model with 3 billion active parameters per pass. Supports switching between thinking and non-thinking modes: add /think or /no_think anywhere in a prompt or system message to toggle chain-of-thought reasoning.
Context: 41,000 tokens
Input: $---/M • Output: $---/M
Qwen3 VL 235B A22B Thinking
qwen3-vl-235b-a22b-thinking
Qwen3 Vision‑Language model built on a 235B MoE backbone (≈22B active per token). Strong at OCR, charts/tables, multi‑image reasoning, and complex document understanding. The Thinking variant enables long‑form, chain‑of‑thought style reasoning.
Context: N/A
Input: $---/M • Output: $---/M
Qwen3 VL 235B A22B Instruct
qwen3-vl-235b-a22b-instruct
Note: direct via Alibaba, a Chinese entity; privacy and logging guarantees may be limited. Qwen3 Vision‑Language model (235B MoE, ≈22B active) tuned for instruction following and grounded visual QA. Excels at image understanding, dense OCR, charts and diagrams, and multi‑image context. Use this variant when you want concise, direct answers grounded in the visuals.
Context: N/A
Input: $---/M • Output: $---/M
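Vision-language entries such as the two above generally accept images through the OpenAI-compatible multimodal message format. A hedged sketch; the endpoint, key name, and image URL are illustrative assumptions.

```python
# Sketch: sending an image to a vision-language model.
import os
from openai import OpenAI

client = OpenAI(base_url="https://nano-gpt.com/api/v1",
                api_key=os.environ["NANOGPT_API_KEY"])

resp = client.chat.completions.create(
    model="qwen3-vl-235b-a22b-instruct",
    messages=[{
        "role": "user",
        "content": [
            {"type": "text", "text": "Transcribe the table in this image."},
            {"type": "image_url", "image_url": {"url": "https://example.com/chart.png"}},
        ],
    }],
)
print(resp.choices[0].message.content)
```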
Qwen3 Max
qwen/qwen3-max
Qwen3 Max, the latest Qwen 3 model (September 5, 2025). Higher accuracy in coding and science, better instruction following, and optimized for tool calling.
Context: 256,000 tokens
Input: $---/M • Output: $---/M
Qwen3 Coder 30B A3B Instruct
qwen3-coder-30b-a3b-instruct
Qwen3 Coder 30B with 3B active parameters, optimized for code generation and technical tasks
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Qwen 3 235b A22B
Qwen/Qwen3-235B-A22B
Qwen 3 235b is a 235b model with 22B active parameters. Supports switching between thinking and non-thinking modes: add /think or /no_think anywhere in a prompt or system message to toggle chain-of-thought reasoning.
Context: 41,000 tokens
Input: $---/M • Output: $---/M
Qwen 3 235b A22B 2507
Qwen/Qwen3-235B-A22B-Instruct-2507
Qwen 3 235b A22B Instruct 2507 is the updated version of Qwen3 235B A22B, with significant improvements in performance. This model is non-thinking.
Context: 256,000 tokens
Input: $---/M • Output: $---/M
Qwen 3 235b A22B 2507 Thinking
Qwen/Qwen3-235B-A22B-Thinking-2507
The thinking version of Qwen 3 235b A22B 2507, with enhanced reasoning capabilities and step-by-step problem solving.
Context: 256,000 tokens
Input: $---/M • Output: $---/M
Qwen3 Next 80B A3B (Instruct)
Qwen/Qwen3-Next-80B-A3B-Instruct
Based on the new Qwen3‑Next architecture (hybrid attention, highly sparse MoE, training‑stability optimizations, and multi‑token prediction), the Qwen3‑Next‑80B‑A3B‑Instruct model delivers extreme efficiency with only 3B active parameters per pass. It performs comparably to Qwen3‑235B‑A22B‑Instruct‑2507 and shows clear advantages on ultra‑long context tasks (up to 256K tokens).
Context: 256,000 tokens
Input: $---/M • Output: $---/M
Qwen3 Next 80B A3B (Thinking)
Qwen/Qwen3-Next-80B-A3B-Thinking
The Thinking post-train of Qwen3-Next-80B-A3B. The Qwen3-Next architecture improves on Qwen3's MoE with a hybrid attention mechanism, a highly sparse Mixture-of-Experts structure, training-stability-friendly optimizations, and multi-token prediction for faster inference. The 80-billion-parameter base model activates only 3 billion parameters at inference, matches or slightly beats the dense Qwen3-32B at under 10% of its training cost, and delivers more than 10x higher throughput at context lengths beyond 32K tokens. The Thinking variant excels at complex reasoning: it outperforms the higher-cost Qwen3-30B-A3B-Thinking-2507 and Qwen3-32B-Thinking, beats the closed-source Gemini-2.5-Flash-Thinking on multiple benchmarks, and approaches the flagship Qwen3-235B-A22B-Thinking-2507.
Context: 256,000 tokens
Input: $---/M • Output: $---/M
Qwen3 30B A3B Instruct 2507
qwen3-30b-a3b-instruct-2507
Qwen3-30B-A3B-Instruct-2507 is a 30.5B-parameter mixture-of-experts language model from Qwen, with 3.3B active parameters per inference. Significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding and tool usage.
Context: 256,000 tokens
Input: $---/M • Output: $---/M
Qwen 3 8B
Qwen/Qwen3-8B
Qwen 3 8B is an 8B model. Supports switching between thinking and non-thinking modes: add /think or /no_think anywhere in a prompt or system message to toggle chain-of-thought reasoning.
Context: 41,000 tokens
Input: $---/M • Output: $---/M
Qwen3 Coder TEE
TEE/qwen3-coder
Qwen3 model optimized for coding tasks, running inside a TEE (Trusted Execution Environment), with verifiably no logging by the provider.
Context: 131,072 tokens
Input: $---/M • Output: $---/M
Qwen2.5 VL 72B
qwen25-vl-72b-instruct
Qwen2.5 VL 72B vision-language model with a 32k context window
Context: 32,000 tokens
Input: $---/M • Output: $---/M
QwenLong L1 32B
Tongyi-Zhiwen/QwenLong-L1-32B
The first long-context LRM trained with reinforcement learning for long-context reasoning. Outperforms flagship models like o3-mini and achieves performance on par with Claude 3.7 Sonnet Thinking, demonstrating leading performance for long-context document QA tasks.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Qwen 2.5 72B Instruct
Qwen2.5-72B-Instruct
Qwen 2.5 72B Instruct
Context: 32,768 tokens
Input: $---/M • Output: $---/M
Qwen 2.5 72B Instruct Abliterated
Qwen2.5-72B-Instruct-Abliterated
Qwen 2.5 72B Instruct Abliterated
Context: 32,768 tokens
Input: $---/M • Output: $---/M
Qwen 2.5 72B Magnum v4
Qwen2.5-72B-Magnum-v4
Qwen 2.5 72B Magnum v4
Context: 32,768 tokens
Input: $---/M • Output: $---/M
Qwen 2.5 72B Evathene v1.2
Qwen2.5-72B-Evathene-v1.2
Qwen 2.5 72B Evathene v1.2
Context: 32,768 tokens
Input: $---/M • Output: $---/M
Qwen 2.5 72B Chuluun v0.08
Qwen2.5-72B-Chuluun-v0.08
Qwen 2.5 72B Chuluun v0.08
Context: 32,768 tokens
Input: $---/M • Output: $---/M
Qwen 2.5 72B Evathene v1.3
Qwen2.5-72B-Evathene-v1.3
Creative Model
Context: 32,768 tokens
Input: $---/M • Output: $---/M
Qwen 2.5 72B Malaysian
Qwen2.5-72B-Instruct-Malaysian
Qwen 2.5 72B fine-tuned for Malaysian language and culture
Context: 131,072 tokens
Input: $---/M • Output: $---/M
Qwen 2.5 72B Spiral da HYAH
Qwen2.5-72B-spiral-da-HYAH
Qwen 2.5 72B Spiral da HYAH for creative and engaging content
Context: 131,072 tokens
Input: $---/M • Output: $---/M
Qwen 2.5 72B Doctor Kunou
Qwen2.5-72B-Doctor-Kunou
Qwen 2.5 72B Doctor Kunou for specialized creative writing
Context: 131,072 tokens
Input: $---/M • Output: $---/M
Qwen 2.5 72B Eva Mindlink
Qwen2.5-72B-Eva-Mindlink
Qwen 2.5 72B Eva Mindlink for immersive roleplay experiences
Context: 131,072 tokens
Input: $---/M • Output: $---/M
Zhipu
GLM Zero Preview
glm-zero-preview
GLM Zero Preview is a thinking model like o1, but with a smaller context window
Context: 8,000 tokens
Input: $---/M • Output: $---/M
GLM 4 Plus 0111
glm-4-plus-0111
GLM 4 Plus 0111, an updated snapshot of Zhipu's high-intelligence flagship GLM-4 Plus.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
GLM 4 Air 0111
glm-4-air-0111
GLM 4 Air 0111, the cost-efficient counterpart to GLM 4 Plus 0111.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
GLM Z1 Air
glm-z1-air
Incredibly cheap yet highly performant Chinese model, comparable to Deepseek R1 in performance on many metrics at 1/30th the cost.
Context: 32,000 tokens
Input: $---/M • Output: $---/M
GLM Z1 AirX
glm-z1-airx
Fastest reasoning model in China, with up to 200 tokens per second. The stronger version of GLM Z1 Air.
Context: 32,000 tokens
Input: $---/M • Output: $---/M
GLM 4.1V Thinking Flash
glm-4.1v-thinking-flash
Vision-Language Model with thinking paradigm and reinforcement learning. Achieves state-of-the-art performance among 10B-parameter VLMs. Supports 64k context length, handles arbitrary aspect ratios and up to 4K image resolution. Bilingual Chinese/English.
Context: 64,000 tokens
Input: $---/M • Output: $---/M
GLM 4.1V Thinking FlashX
glm-4.1v-thinking-flashx
Enhanced version of GLM 4.1V Thinking Flash. Vision-Language Model with advanced reasoning for complex visual tasks, multimodal problem solving, and intelligent agents. Supports 64k context length and handles arbitrary aspect ratios up to 4K resolution.
Context: 64,000 tokens
Input: $---/M • Output: $---/M
GLM 4.6
glm-4.6
Latest GLM series chat model with strong general performance. Note: Routed directly via GLM (Zhipu) — not via open‑source providers.
Context: 200,000 tokens
Input: $---/M • Output: $---/M
GLM 4.6 Thinking
glm-4.6:thinking
Thinking version of the latest GLM series chat model with strong general performance. Note: Routed directly via GLM (Zhipu) — not via open‑source providers.
Context: 200,000 tokens
Input: $---/M • Output: $---/M
GLM-4 Plus
glm-4-plus
GLM high-intelligence flagship model with 128K context window
Context: 128,000 tokens
Input: $---/M • Output: $---/M
GLM-4
glm-4
High-intelligence model with 128K context window
Context: 128,000 tokens
Input: $---/M • Output: $---/M
GLM-4 Long
glm-4-long
Extended context model supporting up to 1M tokens
Context: 1,000,000 tokens
Input: $---/M • Output: $---/M
GLM 4.5
zai-org/GLM-4.5-FP8
GLM-4.5 is Z-AI's latest flagship foundation model, purpose-built for agent-based applications. It leverages a Mixture-of-Experts (MoE) architecture with 355B total / 32B active parameters and supports a context length of up to 128k tokens. GLM-4.5 delivers significantly enhanced capabilities in reasoning, code generation, and agent alignment.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
GLM 4.5 Air
zai-org/GLM-4.5-Air
GLM-4.5-Air is a 106B total / 12B active parameter model designed to unify frontier reasoning, coding, and agentic capabilities. On the SWE-bench Verified benchmark, it delivers the best performance at its scale with a competitive performance-to-cost ratio.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
GLM 4.5 (Thinking)
zai-org/GLM-4.5-FP8:thinking
GLM-4.5 with thinking mode enabled for enhanced reasoning capabilities. Shows step-by-step thought process.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
GLM 4.5 Air (Thinking)
zai-org/GLM-4.5-Air:thinking
GLM-4.5-Air with thinking mode enabled for enhanced reasoning capabilities. Shows step-by-step thought process.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
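As with the Claude and Qwen entries, the ':thinking' suffix on these GLM IDs selects the thinking variant at request time. A minimal sketch under the same endpoint assumptions:

```python
# Sketch: compare the standard and thinking GLM variants by ID suffix.
import os
from openai import OpenAI

client = OpenAI(base_url="https://nano-gpt.com/api/v1",
                api_key=os.environ["NANOGPT_API_KEY"])

for model in ("zai-org/GLM-4.5-FP8", "zai-org/GLM-4.5-FP8:thinking"):
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": "How many primes are there below 30?"}],
    )
    print(model, "->", resp.choices[0].message.content)
```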
GLM 4.5V
zai-org/GLM-4.5V-FP8
GLM-4.5V is Z.AI's state-of-the-art multimodal reasoning model, excelling across 41 benchmarks with leading performance in VQA, STEM reasoning, video understanding, GUI tasks, and OCR/chart analysis.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
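Multimodal entries such as GLM 4.5V accept image input through the same API. A minimal sketch, assuming the endpoint follows the OpenAI-style content-part format for images; the endpoint, key variable, and image URL are placeholders:

import os
import requests

resp = requests.post(
    "https://nano-gpt.com/api/v1/chat/completions",  # assumed endpoint, as above
    headers={"Authorization": f"Bearer {os.environ['NANOGPT_API_KEY']}"},
    json={
        "model": "zai-org/GLM-4.5V-FP8",
        "messages": [{
            "role": "user",
            "content": [
                {"type": "text", "text": "What does this chart show?"},
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/chart.png"}},  # placeholder image
            ],
        }],
    },
    timeout=120,
)
resp.raise_for_status()
print(resp.json()["choices"][0]["message"]["content"])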
GLM 4.5V (Thinking)
zai-org/GLM-4.5V-FP8:thinking
GLM-4.5V multimodal model with thinking mode enabled for enhanced reasoning capabilities. Shows step-by-step thought process.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
GLM-4 AirX
glm-4-airx
Fastest GLM-4 variant with 8K context window
Context: 8,000 tokens
Input: $---/M • Output: $---/M
GLM-4 Air
glm-4-air
High-performance model with 128K context window
Context: 128,000 tokens
Input: $---/M • Output: $---/M
GLM-4 Flash
glm-4-flash
Extremely cheap model with 128K context window
Context: 128,000 tokens
Input: $---/M • Output: $---/M
GLM Z1 9B 0414
THUDM/GLM-Z1-9B-0414
9B small-sized model maintaining the open-source tradition. Despite its smaller scale, GLM-Z1-9B-0414 still exhibits excellent capabilities in mathematical reasoning and general tasks. Its overall performance is already at a leading level among open-source models of the same size.
Context: 32,000 tokens
Input: $---/M • Output: $---/M
GLM 4 9B 0414
THUDM/GLM-4-9B-0414
A 9B parameter version of the GLM-4 series, offering a balance of performance and efficiency.
Context: 32,000 tokens
Input: $---/M • Output: $---/M
GLM Z1 Rumination 32B 0414
THUDM/GLM-Z1-Rumination-32B-0414
A deep reasoning model with rumination capabilities (benchmarked against OpenAI's Deep Research). Employs longer periods of deep thought to solve more open-ended and complex problems (e.g., writing a comparative analysis of AI development in two cities and their future development plans). Integrates search tools during its deep thinking process.
Context: 32,000 tokens
Input: $---/M • Output: $---/M
GLM 4 32B 0414
THUDM/GLM-4-32B-0414
Features 32 billion parameters. Performance is comparable to OpenAI's GPT series and DeepSeek's V3/R1 series. Pre-trained on 15T of high-quality data, including reasoning-type synthetic data. Enhanced performance in instruction following, engineering code, and function calling.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
GLM Z1 32B 0414
THUDM/GLM-Z1-32B-0414
A reasoning model with deep thinking capabilities, based on GLM-4-32B-0414. Further trained on tasks involving mathematics, code, and logic. Significantly improves mathematical abilities and capability to solve complex tasks.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
DeepSeek
Deepseek R1 0528
deepseek-ai/DeepSeek-R1-0528
The new (May 28th) Deepseek R1 model.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Deepseek R1 0528 Qwen3 8B
deepseek-ai/DeepSeek-R1-0528-Qwen3-8B
The new (May 28th) Deepseek R1 model in distilled version. Way cheaper, way faster, yet still extremely performant.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Deepseek R1 T Chimera
tngtech/DeepSeek-R1T-Chimera
Deepseek V3 0324 with R1 reasoning using a novel construction method. In benchmarks, it appears to be as smart as R1 but much faster, using 40% fewer output tokens.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Deepseek TNG R1T2 Chimera
tngtech/DeepSeek-TNG-R1T2-Chimera
Assembly of Experts Chimera model constructed with DeepSeek R1-0528, R1 and V3-0324. This refined tri-mind assembly fixes the <think> token consistency issue, operates ~20% faster than R1 and twice as fast as R1-0528, while being significantly more intelligent than regular R1 on benchmarks like GPQA and AIME-24.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
ByteDance Seed OSS 36B
ByteDance-Seed/Seed-OSS-36B-Instruct
ByteDance's Seed-OSS-36B-Instruct is a 36 billion parameter open-source language model optimized for instruction following and general-purpose tasks. It offers strong performance across various domains including reasoning, coding, and creative writing.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
DeepSeek R1
deepseek-r1
DeepSeek's R1 is a thinking model, scoring very well on all benchmarks at low cost. This version is run via open-source providers, never routing through DeepSeek themselves.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
DeepSeek V3/Deepseek Chat
deepseek-chat
DeepSeek's original V3 model. Trained on nearly 15 trillion tokens, it matches leading closed-source models at a far lower price. Quantized at FP8.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
DeepSeek V3.1 Original
deepseek-v3.1-original
DeepSeek V3.1, served directly by the original Chinese provider. ⚠️ Note: This model runs through DeepSeek directly, so we cannot guarantee no logging.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
DeepSeek V3.2 Exp Original
deepseek-v3.2-exp-original
Experimental DeepSeek V3.2 routed directly via DeepSeek. ⚠️ WARNING: Requests are sent to DeepSeek directly; we cannot guarantee no logging by the provider.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
DeepSeek V3.2 Exp Thinking Original
deepseek-v3.2-exp-thinking-original
Experimental DeepSeek V3.2 (Thinking) routed directly via DeepSeek. ⚠️ WARNING: Requests are sent to DeepSeek directly; we cannot guarantee no logging by the provider.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
DeepSeek V3/Chat Cheaper
deepseek-chat-cheaper
Cheaper version of Deepseek V3/Chat. Note: may be routed through Deepseek itself. Quantized at FP8.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
DeepSeek V3.1
deepseek-ai/DeepSeek-V3.1
DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-thinking mode. It does better at tool calling and agent tasks, and has higher thinking efficiency than its predecessor. This is the non-thinking version. Quantized at FP8.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Deepseek V3.1 (Thinking)
deepseek-ai/DeepSeek-V3.1:thinking
Thinking enabled version of Deepseek V3.1. DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-thinking mode. It does better at tool calling and agent tasks, and has higher thinking efficiency than its predecessor. Quantized at FP8.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
DeepSeek V3.1 Terminus
deepseek-ai/DeepSeek-V3.1-Terminus
DeepSeek-V3.1-Terminus. The latest update builds on V3.1’s strengths while addressing key user feedback. Language consistency improvements (fewer CN/EN mix-ups, no random chars), stronger Code Agent & Search Agent performance, and more stable, reliable outputs across benchmarks.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
DeepSeek V3.1 Terminus (Thinking)
deepseek-ai/DeepSeek-V3.1-Terminus:thinking
Thinking-enabled DeepSeek-V3.1-Terminus with improved language consistency, upgraded Code/Search Agents, and stronger stability and reliability versus V3.1.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
DeepSeek Chat 0324
deepseek-v3-0324
DeepSeek V3 0324, DeepSeek's 24 March 2025 V3 model, optimized for general-purpose tasks. Quantized at FP8.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
DeepSeek R1 Zero Preview
deepseek-ai/DeepSeek-R1-Zero
Preview version of DeepSeek R1, also known as DeepSeek R1 Zero: DeepSeek R1 without supervised fine-tuning.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
DeepSeek Reasoner
deepseek-reasoner
DeepSeek-R1 is now live and open source, rivaling OpenAI's o1 model.
Context: 64,000 tokens
Input: $---/M • Output: $---/M
Deepseek R1 Qwen Abliterated
huihui-ai/DeepSeek-R1-Distill-Qwen-32B-abliterated
Uncensored version of the Deepseek R1 Qwen 32B model
Context: 16,384 tokens
Input: $---/M • Output: $---/M
Deepseek R1 Llama 70b Abliterated
huihui-ai/DeepSeek-R1-Distill-Llama-70B-abliterated
Uncensored version of the Deepseek R1 Llama 70B model
Context: 16,384 tokens
Input: $---/M • Output: $---/M
Deepseek R1 Cheaper
deepseek-reasoner-cheaper
Cheaper version of DeepSeek R1. Note: may be routed through Chinese providers.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
DeepSeek R1 Fast
deepseek-r1-sambanova
DeepSeek R1 via Sambanova: the full model with very fast output. Note: max 4k output tokens.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
DeepSeek R1 70B Distill TEE
TEE/deepseek-r1-70b-distill
DeepSeek's R1 model distilled into the Llama 70B architecture for improved efficiency. Running inside a TEE (Trusted Execution Environment), with verifiably no logging by the provider.
Context: 131,072 tokens
Input: $---/M • Output: $---/M
DeepSeek Chat V3 0324 TEE
TEE/deepseek-chat-v3-0324
DeepSeek V3 0324, DeepSeek's 24 March 2025 V3 model, optimized for general-purpose tasks. Running inside a TEE (Trusted Execution Environment), with verifiably no logging by the provider. Quantized at FP8.
Context: 131,072 tokens
Input: $---/M • Output: $---/M
DeepSeek Prover v2 671B
deepseek/deepseek-prover-v2-671b
Specialized in mathematical theorem proving. The model employs a Mixture-of-Experts (MoE) architecture and is trained using the Lean 4 framework for formal reasoning. With 671 billion parameters, it leverages reinforcement learning and large-scale synthetic data to significantly enhance automated theorem-proving capabilities.
Context: 160,000 tokens
Input: $---/M • Output: $---/M
Mistral
Sarvam Medium
sarvan-medium
Sarvam-M, Sarvam AI's 24-billion-parameter hybrid language model, offers strong performance in math, programming, and Indian languages.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Mistral Medium 3
mistralai/mistral-medium-3
Mistral Medium 3 delivers frontier performance while being an order of magnitude less expensive. For instance, the model performs at or above 90% of Claude Sonnet 3.7 on benchmarks across the board at a significantly lower cost. On performance, Mistral Medium 3 also surpasses leading open models such as Llama 4 Maverick and enterprise models such as Cohere Command A. On pricing, the model beats cost leaders such as DeepSeek v3, both in API and self-deployed systems.
Context: 131,072 tokens
Input: $---/M • Output: $---/M
Mistral Medium 3.1
mistralai/mistral-medium-3.1
Mistral Medium 3.1 is an updated version of Mistral Medium 3, which is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances state-of-the-art reasoning and multimodal performance with 8× lower cost compared to traditional large models, making it suitable for scalable deployments across professional and industrial use cases. The model excels in domains such as coding, STEM reasoning, and enterprise adaptation. It supports hybrid, on-prem, and in-VPC deployments and is optimized for integration into custom workflows. Mistral Medium 3.1 offers competitive accuracy relative to larger models like Claude Sonnet 3.5/3.7, Llama 4 Maverick, and Command R+, while maintaining broad compatibility across cloud environments.
Context: 131,072 tokens
Input: $---/M • Output: $---/M
QwQ 32b Arli V1
QwQ-32B-ArliAI-RpR-v1
A QwQ 32b finetuned for roleplay and storytelling.
Context: 32,768 tokens
Input: $---/M • Output: $---/M
The Drummer Cydonia 24B v2
TheDrummer/Cydonia-24B-v2
Cydonia 24B v2 is a finetune of Mistral's latest 'Small' model (2501). Aliases: Cydonia 24B, Cydonia v2, Cydonia on that broken base.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
The Drummer Cydonia 24B v4
TheDrummer/Cydonia-24B-v4
Cydonia 24B v4 is the latest iteration of TheDrummer's Cydonia series, a finetune of Mistral Small.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
The Drummer Cydonia 24B v4.1
TheDrummer/Cydonia-24B-v4.1
Cydonia 24B v4.1 is the newest release of TheDrummer's Cydonia series, featuring improved performance and refined capabilities.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
Mistral Large 2411
mistralai/mistral-large
Upgrade to Mistral's flagship model. It is fluent in English, French, Spanish, German, and Italian, with high grammatical accuracy and a long context window.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
TheDrummer Skyfall 36B V2
thedrummer/skyfall-36b-v2
TheDrummer's Skyfall 36B V2, a 36B parameter model with a focus on high quality and consistency.
Context: 64,000 tokens
Input: $---/M • Output: $---/M
Mistral Small 3.1 24B
TEE/mistral-small-3-1-24b
Mistral's small model with 24B parameters. Secure inference with encrypted inputs and outputs, running inside a TEE (Trusted Execution Environment), with verifiably no logging by the provider.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Mistral Tiny
mistralai/mistral-tiny
Powered by Mistral-7B-v0.2, best used for large batch processing tasks where cost is a significant factor but reasoning capabilities are not crucial.
Context: 32,000 tokens
Input: $---/M • Output: $---/M
Mistral Saba
mistralai/mistral-saba
Mistral Saba is a 24B-parameter language model specifically designed for the Middle East and South Asia, delivering accurate and contextually relevant responses while maintaining efficient performance. Trained on curated regional datasets, it supports multiple Indian-origin languages—including Tamil and Malayalam—alongside Arabic. This makes it a versatile option for a range of regional and multilingual applications.
Context: 32,000 tokens
Input: $---/M • Output: $---/M
Mistral 7B Instruct
mistralai/mistral-7b-instruct
Optimized for speed with decent context length
Context: 32,768 tokens
Input: $---/M • Output: $---/M
Mistral Nemo
mistralai/Mistral-Nemo-Instruct-2407
12B parameter model with multilingual support.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
Dolphin 3.0 R1 Mistral 24B
cognitivecomputations/Dolphin3.0-R1-Mistral-24B
Latest Dolphin model with R1 reasoning capabilities built on Mistral 24B.
Context: 32,768 tokens
Input: $---/M • Output: $---/M
Rocinante 12b
TheDrummer/Rocinante-12B-v1.1
Designed for engaging storytelling and rich prose. Expanded vocabulary with unique and expressive word choices, enhanced creativity and captivating stories.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
UnslopNemo 12b v4
TheDrummer/UnslopNemo-12B-v4.1
UnslopNemo v4 is the previous version from the creator of Rocinante, designed for adventure writing and role-play scenarios.
Context: 32,768 tokens
Input: $---/M • Output: $---/M
NemoMix 12B Unleashed
MarinaraSpaghetti/NemoMix-Unleashed-12B
Great for RP and storytelling.
Context: 32,768 tokens
Input: $---/M • Output: $---/M
Mistral Nemo Starcannon 12b v1
VongolaChouko/Starcannon-Unleashed-12B-v1.0
Mistral Nemo finetune that offers improvements on roleplay.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
Mistral Nemo Inferor 12B
Infermatic/MN-12B-Inferor-v0.0
Inferor is a merge of top roleplay models, expert on immersive narratives and storytelling.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
Mistral Small 31 24b Instruct
mistral-small-31-24b-instruct
Building upon Mistral Small 3 (2501), Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance. With 24 billion parameters, this model achieves top-tier capabilities in both text and vision tasks.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Mistral Small 3.2 24b Instruct
chutesai/Mistral-Small-3.2-24B-Instruct-2506
The latest iteration of Mistral Small, version 3.2 (2506) brings enhanced performance and capabilities. With 24 billion parameters, this model delivers state-of-the-art results across text generation tasks with improved efficiency.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Magistral Small 2506
Magistral-Small-2506
Magistral Small is a compact, high-performance language model optimized for efficient inference while maintaining strong capabilities across various tasks.
Context: 32,768 tokens
Input: $---/M • Output: $---/M
Mistral Nemo 12B Instruct 2407
Mistral-Nemo-12B-Instruct-2407
Mistral Nemo 12B Instruct 2407
Context: 16,384 tokens
Input: $---/M • Output: $---/M
Alibaba
Qwen: QvQ Max
qvq-max
QvQ Max is the top model of the Qwen QvQ series. It is capable of thinking and reasoning, achieving significantly enhanced performance, especially on hard problems.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Qwen: QwQ 32B
qwq-32b
QwQ is the reasoning model of the Qwen series. It is capable of thinking and reasoning, achieving significantly enhanced performance, especially on hard problems.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Tongyi DeepResearch 30B A3B
Alibaba-NLP/Tongyi-DeepResearch-30B-A3B
Tongyi DeepResearch is an agentic large language model featuring 30 billion total parameters, with only 3 billion activated per token. Developed by Tongyi Lab, the model is specifically designed for long‑horizon, deep information‑seeking tasks. Tongyi‑DeepResearch demonstrates state‑of‑the‑art performance across agentic search benchmarks, including Humanity's Last Exam, BrowseComp, BrowseComp‑ZH, WebWalkerQA, GAIA, xbench‑DeepSearch, and FRAMES.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Qwen QwQ 32B Preview
Qwen/QwQ-32B-Preview
Experimental release of Qwen's reasoning model. Great at coding and math, but still in development so may exhibit odd bugs. Not production-ready.
Context: 32,768 tokens
Input: $---/M • Output: $---/M
Qwen QwQ 32B Preview
qwen/qwq-32b-preview
Experimental release of Qwen's reasoning model. Great at coding and math, but still in development so may exhibit odd bugs. Not production-ready.
Context: 32,768 tokens
Input: $---/M • Output: $---/M
Dolphin 72b
cognitivecomputations/dolphin-2.9.2-qwen2-72b
Dolphin is the most uncensored model yet, built on top of Qwen's 72b model.
Context: 8,192 tokens
Input: $---/M • Output: $---/M
Grayline Qwen3 8B
soob3123/GrayLine-Qwen3-8B
Grayline is a neutral AI assistant engineered for uncensored information delivery and task execution. This model operates without inherent ethical or moral frameworks, designed to process and respond to any query with objective efficiency and precision. Grayline's core function is to leverage its full capabilities to provide direct answers and execute tasks as instructed, without offering unsolicited commentary, warnings, or disclaimers. It accesses and processes information without bias or restriction.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
Qwen Turbo
qwen-turbo
Alibaba's fastest and cheapest model. Suitable for simple tasks, fast and low cost, with a 1 million token context window.
Context: 1,000,000 tokens
Input: $---/M • Output: $---/M
Qwen 2.5 Max
qwen-max
Qwen 2.5 Max is the upgraded version of Qwen Max, beating GPT-4o, Deepseek V3 and Claude 3.5 Sonnet in benchmarks.
Context: 32,000 tokens
Input: $---/M • Output: $---/M
Qwen Plus
qwen-plus
Alibaba's balanced model. Fast, cheap, yet still very powerful.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Qwen Long 10M
qwen-long
Alibaba's huge context window model. Takes in up to 10 million tokens, which is equivalent to dozens of books.
Context: 10,000,000 tokens
Input: $---/M • Output: $---/M
Qwen 2.5 Coder 32b
Qwen/Qwen2.5-Coder-32B-Instruct
The latest series of Code-Specific Qwen large language models.
Context: 32,000 tokens
Input: $---/M • Output: $---/M
Qwen2.5 72B
qwen/qwen-2.5-72b-instruct
Great multilingual support, strong at mathematics and coding, supports roleplay and chatbots.
Context: 131,072 tokens
Input: $---/M • Output: $---/M
EVA Qwen2.5 72B
eva-unit-01/eva-qwen-2.5-72b
Full-parameter finetune of Qwen2.5-72B on mixture of synthetic and natural data. It uses Celeste 70B 0.1 data mixture, greatly expanding it to improve versatility, creativity and flavor of the resulting model.
Context: 16,000 tokens
Input: $---/M • Output: $---/M
Qwen 2.5 72B
TEE/qwen2-5-72b
Alibaba's Qwen 2.5 with 72B parameters. Secure inference with encrypted inputs and outputs, running inside a TEE (Trusted Execution Environment), with verifiably no logging by the provider.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Qwen 2.5 32b EVA
Qwen2.5-32B-EVA-v0.2
A Qwen 2.5 32b finetuned for roleplay and storytelling.
Context: 24,576 tokens
Input: $---/M • Output: $---/M
Cogito v1 Preview Qwen 32B
deepcogito/cogito-v1-preview-qwen-32B
32B-parameter reasoning model from DeepCogito (Qwen backbone) with strong general reasoning and coding at a low price.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Baidu
Ernie 4.5 8k Preview
ernie-4.5-8k-preview
ERNIE 4.5 is Baidu's new-generation native multimodal foundation model. It achieves collaborative optimization through joint modeling of multiple modalities, demonstrating exceptional multimodal comprehension capabilities. With refined language skills, it exhibits comprehensive improvements in understanding, generation, reasoning, and memory, along with notable enhancements in hallucination prevention, logical reasoning, and coding abilities. ⚠️ Note: This model routes through Baidu, a Chinese entity - privacy and logging guarantees may be limited.
Context: 8,000 tokens
Input: $---/M • Output: $---/M
Ernie X1 32k
ernie-x1-32k-preview
ERNIE X1 is a Baidu model, surpassing earlier versions in terms of intelligence and maximum input/output size. ⚠️ Note: This model routes through Baidu, a Chinese entity - privacy and logging guarantees may be limited.
Context: 32,000 tokens
Input: $---/M • Output: $---/M
Ernie X1 Turbo 32k
ernie-x1-turbo-32k
ERNIE X1 is a deep-thinking reasoning model, outperforming DeepSeek R1 and the latest version of V3. ⚠️ Note: This model routes through Baidu, a Chinese entity - privacy and logging guarantees may be limited.
Context: 32,000 tokens
Input: $---/M • Output: $---/M
Ernie 4.5 Turbo VL 32k
ernie-4.5-turbo-vl-32k
ERNIE 4.5 Turbo demonstrates overall progress in hallucination reduction, logical reasoning, and coding abilities, with faster response times. The multimodal capabilities of ERNIE 4.5 Turbo are on par with GPT-4.1 and superior to GPT-4o across multiple benchmarks. ⚠️ Note: This model routes through Baidu, a Chinese entity - privacy and logging guarantees may be limited.
Context: 32,000 tokens
Input: $---/M • Output: $---/M
Ernie 4.5 Turbo 128k
ernie-4.5-turbo-128k
ERNIE 4.5 Turbo demonstrates overall progress in hallucination reduction, logical reasoning, and coding abilities, with faster response times. The multimodal capabilities of ERNIE 4.5 Turbo are on par with GPT-4.1 and superior to GPT-4o across multiple benchmarks. ⚠️ Note: This model routes through Baidu, a Chinese entity - privacy and logging guarantees may be limited.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Ernie X1 32k
ernie-x1-32k
ERNIE X1 is a deep-thinking reasoning model, outperforming DeepSeek R1 and the latest version of V3. ⚠️ Note: This model routes through Baidu, a Chinese entity - privacy and logging guarantees may be limited.
Context: 32,000 tokens
Input: $---/M • Output: $---/M
ERNIE 4.5 300B
baidu/ernie-4.5-300b-a47b
ERNIE-4.5-300B-A47B is a 300B parameter Mixture-of-Experts (MoE) language model developed by Baidu as part of the ERNIE 4.5 series. It activates 47B parameters per token and supports text generation in both English and Chinese. Optimized for high-throughput inference and efficient scaling, it uses a heterogeneous MoE structure with advanced routing and quantization strategies, including FP8 and 2-bit formats. This version is fine-tuned for language-only tasks and supports reasoning, tool parameters, and extended context lengths up to 131k tokens. Suitable for general-purpose LLM applications with high reasoning and throughput demands. ⚠️ Note: This model routes through Baidu, a Chinese entity - privacy and logging guarantees may be limited.
Context: 131,072 tokens
Input: $---/M • Output: $---/M
ERNIE X1.1
ernie-x1.1-preview
ERNIE X1.1 (Wenxin X1.1) brings significant improvements in question answering, tool invocation, intelligent agents, instruction following, logical reasoning, mathematics, and coding, with notable gains in factual accuracy. Context length is extended to 64K tokens, supporting longer inputs and dialogue history and improving the coherence of long-chain reasoning while maintaining response speed. ⚠️ Note: This model routes through Baidu (China) — privacy and logging guarantees may be limited.
Context: 64,000 tokens
Input: $---/M • Output: $---/M
ERNIE 4.5 VL 28B
baidu/ernie-4.5-vl-28b-a3b
ERNIE 4.5 VL is a multimodal model from Baidu that supports both text and vision tasks. This 28B parameter model with A3B architecture delivers strong performance on various benchmarks.
Context: 32,768 tokens
Input: $---/M • Output: $---/M
Doubao
Doubao 1.5 Pro 256k
doubao-1.5-pro-256k
Doubao's (Bytedance) flagship model with a 256k token context window. ⚠️ Note: This model routes through ByteDance, a Chinese entity - privacy and logging guarantees may be limited.
Context: 256,000 tokens
Input: $---/M • Output: $---/M
Doubao 1.5 Thinking Pro
doubao-1-5-thinking-pro-250415
Doubao-1.5 is a new deep-thinking model that performs well in professional fields such as mathematics, programming, and scientific reasoning, as well as general tasks such as creative writing. It has achieved outstanding results on AIME 2024, Codeforces, GPQA, and other authoritative benchmarks, reaching or approaching the industry's first tier. ⚠️ Note: This model routes through ByteDance, a Chinese entity - privacy and logging guarantees may be limited.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Doubao 1.5 Thinking Vision Pro
doubao-1-5-thinking-vision-pro-250428
Doubao-1.5 is a new deep-thinking model that performs well in professional fields such as mathematics, programming, and scientific reasoning, as well as general tasks such as creative writing. It has achieved outstanding results on AIME 2024, Codeforces, GPQA, and other authoritative benchmarks, reaching or approaching the industry's first tier. ⚠️ Note: This model routes through ByteDance, a Chinese entity - privacy and logging guarantees may be limited.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Doubao 1.5 Thinking Pro Vision
doubao-1-5-thinking-pro-vision-250415
Vision version of Doubao-1.5 Pro Thinking, a new deep-thinking model that performs well in professional fields such as mathematics, programming, and scientific reasoning, as well as general tasks such as creative writing. It has achieved outstanding results on AIME 2024, Codeforces, GPQA, and other authoritative benchmarks, reaching or approaching the industry's first tier. ⚠️ Note: This model routes through ByteDance, a Chinese entity - privacy and logging guarantees may be limited.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Doubao 1.5 Pro 32k
doubao-1.5-pro-32k
Doubao's (Bytedance) pro model with a 32k token context window. ⚠️ Note: This model routes through ByteDance, a Chinese entity - privacy and logging guarantees may be limited.
Context: 32,000 tokens
Input: $---/M • Output: $---/M
Doubao 1.5 Vision Pro 32k
doubao-1.5-vision-pro-32k
Doubao's (Bytedance) vision-enabled pro model (JPG only) with a 32k token context window. ⚠️ Note: This model routes through ByteDance, a Chinese entity - privacy and logging guarantees may be limited.
Context: 32,000 tokens
Input: $---/M • Output: $---/M
Doubao Seed 1.6
doubao-seed-1-6-250615
Doubao-Seed-1.6 is a brand-new multimodal deep thinking model that supports three thinking modes: auto, thinking, and non-thinking. In non-thinking mode, the model's performance is significantly improved compared to Doubao-1.5-pro/250115. It supports a 256k context window and an output length of up to 16k tokens. ⚠️ Note: This model routes through ByteDance, a Chinese entity - privacy and logging guarantees may be limited.
Context: 256,000 tokens
Input: $---/M • Output: $---/M
Doubao Seed 1.6 Flash
doubao-seed-1-6-flash-250615
Doubao-Seed-1.6-flash is an extremely fast multimodal deep thinking model, with TPOT (time per output token) as low as 10ms. It supports both text and visual understanding; its text comprehension surpasses the previous-generation Lite model, and its visual understanding is on par with competitors' Pro-series models. It supports a 256k context window and an output length of up to 16k tokens. ⚠️ Note: This model routes through ByteDance, a Chinese entity - privacy and logging guarantees may be limited.
Context: 256,000 tokens
Input: $---/M • Output: $---/M
Doubao Seed 1.6 Thinking
doubao-seed-1-6-thinking-250615
The Doubao-Seed-1.6-thinking model has significantly enhanced reasoning capabilities. Compared with Doubao-1.5-thinking-pro, it has further improvements in fundamental abilities such as coding, mathematics, and logical reasoning, and now also supports visual understanding. It supports a 256k context window, with output length supporting up to 16k tokens. ⚠️ Note: This model routes through ByteDance, a Chinese entity - privacy and logging guarantees may be limited.
Context: 256,000 tokens
Input: $---/M • Output: $---/M
Moonshot AI
Kimi Thinking Preview
kimi-thinking-preview
Kimi Thinking Preview is a new model that is capable of thinking and reasoning. It's quite expensive!
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Kimi K2 Latest
moonshotai/Kimi-K2-Instruct
Points to Kimi K2 0905. Kimi-k2 is a Mixture-of-Experts (MoE) foundation model with exceptional coding and agent capabilities, featuring 1 trillion total parameters and 32 billion activated parameters. In benchmark evaluations covering general knowledge reasoning, programming, mathematics, and agent-related tasks, the K2 model outperforms other leading open-source models. Quantized at FP8.
Context: 256,000 tokens
Input: $---/M • Output: $---/M
Kimi K2 0905
moonshotai/Kimi-K2-Instruct-0905
Kimi K2 0905. Kimi-k2 is a Mixture-of-Experts (MoE) foundation model with exceptional coding and agent capabilities, featuring 1 trillion total parameters and 32 billion activated parameters. In benchmark evaluations covering general knowledge reasoning, programming, mathematics, and agent-related tasks, the K2 model outperforms other leading open-source models. Quantized at FP8.
Context: 256,000 tokens
Input: $---/M • Output: $---/M
Kimi K2 0711
moonshotai/kimi-k2-instruct-0711
Kimi K2 0711 version. Kimi-k2 is a Mixture-of-Experts (MoE) foundation model with exceptional coding and agent capabilities, featuring 1 trillion total parameters and 32 billion activated parameters. In benchmark evaluations covering general knowledge reasoning, programming, mathematics, and agent-related tasks, the K2 model outperforms other leading open-source models. Quantized at FP8.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Kimi Dev 72B
moonshotai/Kimi-Dev-72B
Kimi Dev 72B is a 72B parameter model from Moonshot AI. It scores 60.4% on SWE-bench Verified, which, as of June 16th 2025, is state of the art for open-source models.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Kimi K2 0711 Instruct FP4
baseten/Kimi-K2-Instruct-FP4
Kimi K2 Instruct with FP4 quantization for faster inference while maintaining quality.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Kimi VL Thinking
moonshotai/Kimi-VL-A3B-Thinking
Efficient open-source MoE vision-language model (2.8B active params) with advanced multimodal reasoning, 128K long-context understanding, strong agent capabilities, and long-thinking variant. Excels in multi-turn agent tasks, image/video comprehension, OCR, math reasoning, and multi-image understanding. Competes with GPT-4o-mini, Qwen2.5-VL-7B, Gemma-3-12B-IT, surpasses GPT-4o in some domains.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Kimi K2 0711 Fast
kimi-k2-instruct-fast
Moonshot AI's Kimi K2 model optimized for fast inference. Excellent for chat, reasoning, and general tasks. Quantized at FP8.
Context: 131,072 tokens
Input: $---/M • Output: $---/M
NanoGPT
Study Mode
study_gpt-chatgpt-4o-latest
Study mode uses custom instructions with ChatGPT 4o to maximize learning by encouraging participation, prompting self-reflection, and fostering curiosity with supportive feedback. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.
Context: 200,000 tokens
Input: $---/M • Output: $---/M
Auto model
auto-model
Automatically uses the best model for your task. Categorizes the prompt, then uses the model that performs best in that category according to global user preferences. Scores are updated daily. The pricing tier can be set under Adjust Settings; a usage sketch follows the tier-pinned variants below.
Context: 1,000,000 tokens
Input: $---/M • Output: $---/M
Auto model (Basic)
auto-model-basic
Automatically uses the best model for your task, always with the Basic pricing tier.
Context: 1,000,000 tokens
Input: $---/M • Output: $---/M
Auto model (Standard)
auto-model-standard
Automatically uses the best model for your task, always with the Standard pricing tier.
Context: 1,000,000 tokens
Input: $---/M • Output: $---/M
Auto model (Premium)
auto-model-premium
Automatically uses the best model for your task, always with the Premium pricing tier.
Context: 1,000,000 tokens
Input: $---/M • Output: $---/M
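The auto-model family is selected like any other model id, so routing requires no extra parameters. A minimal sketch under the same assumptions as the earlier examples (OpenAI-compatible endpoint; base URL and key variable are illustrative):

import os
import requests

def ask(model: str, prompt: str) -> str:
    # One helper for any catalog model id, including the auto-model family.
    resp = requests.post(
        "https://nano-gpt.com/api/v1/chat/completions",  # assumed endpoint
        headers={"Authorization": f"Bearer {os.environ['NANOGPT_API_KEY']}"},
        json={"model": model, "messages": [{"role": "user", "content": prompt}]},
        timeout=120,
    )
    resp.raise_for_status()
    return resp.json()["choices"][0]["message"]["content"]

# Let the router pick per the daily-updated category scores...
print(ask("auto-model", "Summarize the plot of Hamlet in two sentences."))
# ...or pin routing to a fixed pricing tier via the tier-specific ids.
print(ask("auto-model-premium", "Implement binary search in Python with tests."))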

Model Recommender
model-selector
Model Recommender - input your query and it will recommend the best model for your task, giving three different options in different price classes.
Context: 1,000,000 tokens
Input: $---/M • Output: $---/M
Free model
free-model
Free model to try out our service with.
Context: 8,000 tokens
Input: $---/M • Output: $---/M
Perplexity
Perplexity Deep Research
sonar-deep-research
Analyzes hundreds of sources, delivering expert-level insights in minutes. The Deep Research API scores 93.9% on the SimpleQA benchmark and 21.1% on Humanity's Last Exam, significantly outperforming Gemini Thinking, o3-mini, o1, and DeepSeek-R1.
Context: 60,000 tokens
Input: $---/M • Output: $---/M
Perplexity Pro
sonar-pro
Sonar Pro tackles complex questions that need deeper research and provides more sources.
Context: 200,000 tokens
Input: $---/M • Output: $---/M
Perplexity Reasoning Pro
sonar-reasoning-pro
Perplexity's Sonar Reasoning Pro uses DeepSeek R1's thinking process combined with web search to tackle complex questions that need deeper research, and provides more sources.
Context: 127,000 tokens
Input: $---/M • Output: $---/M
Perplexity Reasoning
sonar-reasoning
Perplexity's Sonar Reasoning uses DeepSeek R1's thinking process combined with web search to tackle complex questions that need deeper research, and provides more sources.
Context: 127,000 tokens
Input: $---/M • Output: $---/M
Perplexity Simple
sonar
A Perplexity model that gives fast, straightforward answers.
Context: 127,000 tokens
Input: $---/M • Output: $---/M
Perplexity R1 1776
r1-1776
R1 1776 is a version of the DeepSeek R1 model that has been post-trained by Perplexity to provide uncensored, unbiased, and factual information.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Nous Research
Hermes 4 (Thinking)
NousResearch/Hermes-4-70B:thinking
Hermes 4 70B with thinking enabled. Emits explicit reasoning content before the final answer when streamed; a streaming sketch follows this entry.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
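A minimal streaming sketch for the entry above, using the openai Python client pointed at an assumed OpenAI-compatible base URL. The reasoning_content delta field is also an assumption: some gateways expose reasoning separately, while others inline it in the normal content stream.

import os
from openai import OpenAI

client = OpenAI(
    base_url="https://nano-gpt.com/api/v1",  # assumed unified endpoint
    api_key=os.environ["NANOGPT_API_KEY"],   # illustrative variable name
)

stream = client.chat.completions.create(
    model="NousResearch/Hermes-4-70B:thinking",
    messages=[{"role": "user", "content": "A bat and ball cost $1.10 total; the bat costs $1 more than the ball. What does the ball cost?"}],
    stream=True,
)

for chunk in stream:
    if not chunk.choices:
        continue  # some providers send keep-alive chunks without choices
    delta = chunk.choices[0].delta
    reasoning = getattr(delta, "reasoning_content", None)  # assumed field name
    if reasoning:
        print(reasoning, end="", flush=True)   # step-by-step thoughts first
    if delta.content:
        print(delta.content, end="", flush=True)  # then final-answer tokens
print()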
Hermes 3 Large
nousresearch/hermes-3-llama-3.1-405b
Llama 3.1 405b with the brakes taken off. Less censored than the regular version, but not abliterated.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Hermes 4 Large
nousresearch/hermes-4-405b
Advanced reasoning model built on Llama-3.1-405B with hybrid thinking modes. Features internal deliberation capabilities, excels at math, code, STEM, and logical reasoning while supporting structured outputs, function calling, and tool use with improved steerability and neutral alignment.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Hermes 4 Medium
nousresearch/hermes-4-70b
Efficient reasoning model based on Llama-3.1-70B. Offers hybrid thinking capabilities with strong performance in math, code, and logical reasoning tasks. Supports structured outputs, JSON mode, and function calling with enhanced steerability.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
DeepHermes-3 Mistral 24B (Preview)
NousResearch/DeepHermes-3-Mistral-24B-Preview
24B-parameter Mistral model fine-tuned by NousResearch for balanced reasoning and creativity.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
NVIDIA
Nvidia Nemotron 70b
nvidia/Llama-3.1-Nemotron-70B-Instruct-HF
Nvidia's latest Llama fine-tune optimized for instruction following. Early results hint that it may outperform models such as GPT-4o and Claude 3.5 Sonnet.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
Nemotron 3.1 70B abliterated
huihui-ai/Llama-3.1-Nemotron-70B-Instruct-HF-abliterated
An abliterated (removed restrictions and censorship) version of Llama 3.1 70b Nemotron.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
Nvidia Nemotron Super 49B
nvidia/Llama-3.3-Nemotron-Super-49B-v1
Llama-3.3-Nemotron-Super-49B-v1 is a model which offers a great tradeoff between model accuracy and efficiency. Efficiency (throughput) directly translates to savings. Using a novel Neural Architecture Search (NAS) approach, we greatly reduce the model's memory footprint, enabling larger workloads, as well as fitting the model on a single GPU at high workloads (H200). This NAS approach enables the selection of a desired point in the accuracy-efficiency tradeoff. For more information on the NAS approach, please refer to this paper.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Nvidia Nemotron Ultra 253B
nvidia/Llama-3.1-Nemotron-Ultra-253B-v1
Llama-3.1-Nemotron-Ultra-253B-v1 is a large language model (LLM) which is a derivative of Meta Llama-3.1-405B-Instruct (AKA the reference model). It is a reasoning model that is post trained for reasoning, human chat preferences, and tasks, such as RAG and tool calling. The model supports a context length of 128K tokens. This model fits on a single 8xH100 node for inference. Llama-3.1-Nemotron-Ultra-253B-v1 is a model which offers a great tradeoff between model accuracy and efficiency. Efficiency (throughput) directly translates to savings. Using a novel Neural Architecture Search (NAS) approach, we greatly reduce the model's memory footprint, enabling larger workloads, as well as reducing the number of GPUs required to run the model in a data center environment. This NAS approach enables the selection of a desired point in the accuracy-efficiency tradeoff. Furthermore, by using a novel method to vertically compress the model (see details here), it also offers a significant improvement in latency.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Nvidia Nemotron Nano 9B v2
nvidia/nvidia-nemotron-nano-9b-v2
Nvidia's efficient 9B parameter model optimized for speed and cost. Nemotron Nano v2 offers excellent performance for its size with enhanced instruction following capabilities.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
01.AI
Yi Lightning
yi-lightning
Chinese-developed multilingual (English, Chinese and others) model by 01.ai that's very fast and cheap, yet scores high on independent leaderboards.
Context: 12,000 tokens
Input: $---/M • Output: $---/M
Yi Large
yi-large
Large version of Yi Lightning with a 32k context window, but more expensive.
Context: 32,000 tokens
Input: $---/M • Output: $---/M
Yi Medium 200k
yi-medium-200k
Medium version of Yi with a 200k context window.
Context: 200,000 tokens
Input: $---/M • Output: $---/M
Yi Medium 200k
yi-34b-chat-200k
Medium version of Yi Lightning with a huge 200k context window
Context: 16,000 tokens
Input: $---/M • Output: $---/M
Yi Spark
yi-34b-chat-0205
Small and powerful, lightweight and fast model. Provides enhanced mathematical operation and code writing capabilities.
Context: 16,000 tokens
Input: $---/M • Output: $---/M
Microsoft Azure
Microsoft Deepseek R1
microsoft/MAI-DS-R1-FP8
MAI-DS-R1 is a DeepSeek-R1 reasoning model that has been post-trained by the Microsoft AI team to improve its responsiveness on blocked topics and its risk profile, while maintaining its reasoning capabilities and competitive performance.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Phi 4 Multimodal
phi-4-multimodal-instruct
Phi 4 by Microsoft. A small multimodal model that can handle images and text.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Phi 4 Mini
phi-4-mini-instruct
Phi 4 Mini by Microsoft. A small multilingual model.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
WizardLM-2 8x22B
microsoft/wizardlm-2-8x22b
Microsoft's advanced Wizard model. The most popular role-playing model.
Context: 65,536 tokens
Input: $---/M • Output: $---/M
StepFun
Step-2 16k Exp
step-2-16k-exp
Experimental version of StepFun's Step-2 with a 16k context window.
Context: 16,000 tokens
Input: $---/M • Output: $---/M
Step-2 Mini
step-2-mini
Mini version of StepFun's Step-2 with an 8k token context window.
Context: 8,000 tokens
Input: $---/M • Output: $---/M
Step-3
step-3
Step3 is a cutting-edge multimodal reasoning model—built on a Mixture-of-Experts architecture with 321B total parameters and 38B active. It is designed end-to-end to minimize decoding costs while delivering top-tier performance in vision–language reasoning. Through the co-design of Multi-Matrix Factorization Attention (MFA) and Attention-FFN Disaggregation (AFD), Step3 maintains exceptional efficiency across both flagship and low-end accelerators.
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Step R1 V Mini
step-r1-v-mini
Step-R1-V-Mini supports image and text input with text output. It offers good instruction following and general capabilities, perceives images with high precision, and completes complex reasoning tasks.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
AionLabs
Aion 1.0 mini (DeepSeek)
aion-labs/aion-1.0-mini
A distilled version of the DeepSeek-R1 model that excels in reasoning domains like mathematics, coding, and logic.
Context: 131,072 tokens
Input: $---/M • Output: $---/M
Aion 1.0
aion-labs/aion-1.0
Aion Labs' most powerful reasoning model, with high performance across reasoning and coding.
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Llama 3.1 8b (uncensored)
aion-labs/aion-rp-llama-3.1-8b
This is a truly uncensored model, trained to excel at roleplaying and creative writing. However, it can also do other things!
Context: 32,768 tokens
Input: $---/M • Output: $---/M
MiniMax
MiniMax 01
minimax/minimax-01
MiniMax's flagship model with a 1M token context window
Context: 1,000,192 tokens
Input: $---/M • Output: $---/M
MiniMax M1
MiniMax-M1
MiniMax-M1 is a hybrid MoE reasoning model with 40K thinking budget. World's first open-weight, large-scale hybrid-attention model with lightning attention for efficient test-time compute scaling. Excels at complex tasks requiring extensive reasoning.
Context: 1,000,000 tokens
Input: $---/M • Output: $---/M
MiniMax M1 80K
MiniMaxAI/MiniMax-M1-80k
MiniMax-M1 with 80K thinking budget. Enhanced version of the hybrid MoE reasoning model with double the thinking capacity. Ideal for extremely complex software engineering, tool use, and long context tasks.
Context: 1,000,000 tokens
Input: $---/M • Output: $---/M
Amazon
Amazon Nova Pro 1.0
amazon/nova-pro-v1
Amazon's new flagship model. Can handle up to 300k input tokens, with comparable performance to ChatGPT and Claude 3.5 Sonnet.
Context: 300,000 tokens
Input: $---/M • Output: $---/M
Amazon Nova Lite 1.0
amazon/nova-lite-v1
Amazon's new lower cost model. Can handle up to 300k input tokens, with faster output but less thorough understanding than Amazon's Nova Pro.
Context: 300,000 tokens
Input: $---/M • Output: $---/M
Amazon Nova Micro 1.0
amazon/nova-micro-v1
Amazon's lowest cost model. Comparable to GPT-4o-mini and Gemini 1.5 Flash, with the fastest output.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Cohere
Cohere: Command R
cohere/command-r
35B parameter model that performs conversational language tasks at a higher quality, more reliably, and with a longer context than previous models. It can be used for complex workflows like code generation, retrieval augmented generation (RAG), tool use, and agents.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Cohere: Command R+
cohere/command-r-plus-08-2024
104B parameter model that performs conversational language tasks at a higher quality, more reliably, and with a longer context than previous models. It can be used for complex workflows like code generation, retrieval augmented generation (RAG), tool use, and agents.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Cohere Command A (08/2025)
command-a-reasoning-08-2025
Cohere's first reasoning model designed for enterprise customer service and automation. 111B parameters with tool-use capabilities, supports 256K context and 23 languages including English, French, Spanish, Japanese, Arabic, and Hindi. Optimized for document processing, scheduling, data analysis, and more.
Context: 256,000 tokens
Input: $---/M • Output: $---/M
Tencent
Hunyuan T1
hunyuan-t1-latest
Hunyuan T1 is Tencent's top-tier reasoning model. Good at large-scale reasoning, precise following of complex instructions, low hallucination rates, and blazing-fast output.
Context: 256,000 tokens
Input: $---/M • Output: $---/M
Hunyuan Turbo S
hunyuan-turbos-20250226
Hunyuan Turbo S by Tencent is a fast-thinking model that responds instantly.
Context: 24,000 tokens
Input: $---/M • Output: $---/M
Inflection
Inflection 3 Pi
inflection/inflection-3-pi
A chatbot with emotional intelligence. Has access to recent news, excels in scenarios like customer support and roleplay. Mirrors your conversation style.
Context: 8,000 tokens
Input: $---/M • Output: $---/M
Inflection 3 Productivity
inflection/inflection-3-productivity
Optimized for instruction following. Good at tasks that require precise adherence to provided guidelines. Has access to recent news.
Context: 8,000 tokens
Input: $---/M • Output: $---/M
DMind
DMind-1
dmind/dmind-1
Web3-specialized LLM fine-tuned using SFT and RLHF on curated Web3 data. Integrates deep knowledge across DeFi, DAOs, security, and smart contracts. Note: prompts are logged by DMind for model optimization and fine-tuning purposes. Logs are retained for 7 days, then deleted.
Context: 32,768 tokens
Input: $---/M • Output: $---/M
DMind-1-Mini
dmind/dmind-1-mini
Mini version of DMind, the Web3-specialized LLM fine-tuned using SFT and RLHF on curated Web3 data. Integrates deep knowledge across DeFi, DAOs, security, and smart contracts. Note: prompts are logged by DMind for model optimization and fine-tuning purposes. Logs are retained for 7 days, then deleted.
Context: 32,768 tokens
Input: $---/M • Output: $---/M
Fetch.AI
ASI1 Mini
asi1-mini
ASI-1 Mini introduces next-level adaptive reasoning, context-aware decision-making. It features native reasoning support with four dynamic reasoning modes, intelligently selecting from Multi-Step, Complete, Optimized, and Short Reasoning, balancing depth, efficiency, and precision. Whether tackling complex, multi-layered problems or delivering concise, high-impact insights, ASI-1 Mini ensures reasoning is always tailored to the task at hand. Note: this model is rate limited at the moment.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Other Models
Amoral Gemma3 27B v2
soob3123/amoral-gemma3-27B-v2
Amoral Gemma3 27B v2 is a 27B-parameter finetune of Gemma 3 27B from soob3123, tuned for neutral, unrestricted responses.
Context: 32,768 tokens
Input: $---/M • Output: $---/M
Mistral Devstral Small 2505
mistralai/Devstral-Small-2505
OpenHands + Devstral is 100% local and 100% open, and is SOTA for its category on SWE-Bench Verified with 46.8% accuracy.
Context: 32,768 tokens
Input: $---/M • Output: $---/M
Veiled Calla 12B
soob3123/Veiled-Calla-12B
Veiled Calla 12B is a 12B parameter model that is a more advanced version of Calla 12B.
Context: 32,768 tokens
Input: $---/M • Output: $---/M
The Omega Abomination V1
ReadyArt/The-Omega-Abomination-L-70B-v1.0
A merge of The Omega Directive M 24B v1.1 and Cydonia 24B v2.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
Grok 4
x-ai/grok-4-07-09
Grok 4 0709 by xAI. Their latest and greatest flagship model, offering unparalleled performance in natural language, math and reasoning - the perfect jack of all trades.
Context: 256,000 tokens
Input: $---/M • Output: $---/M
Grok 4 Fast
x-ai/grok-4-fast
Grok 4 Fast, xAI’s latest advancement in cost‑efficient reasoning. Built on learnings from Grok 4, it blends reasoning and non‑reasoning in one model with a 2M‑token context window and state‑of‑the‑art cost efficiency.
Context: 2,000,000 tokens
Input: $---/M • Output: $---/M
Grok 4 Fast Thinking
x-ai/grok-4-fast:thinking
Grok 4 Fast with explicit thinking enabled for harder reasoning tasks. 2M context, highly token‑efficient.
Context: 2,000,000 tokens
Input: $---/M • Output: $---/M
Grok Code Fast 1
x-ai/grok-code-fast-1
The coding-specialized version of Grok 4
Context: 256,000 tokens
Input: $---/M • Output: $---/M
Hermes 4 Large (Thinking)
nousresearch/hermes-4-405b:thinking
Hermes 4 Large with thinking enabled. Streams visible reasoning before the final answer.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
OpenReasoning Nemotron 32B
pamanseau/OpenReasoning-Nemotron-32B
OpenReasoning-Nemotron-32B is a reasoning model derived from Qwen2.5-32B-Instruct, post-trained for math, science, and code solution generation. Evaluated with up to 64K output tokens. Available in multiple sizes: 1.5B, 7B, 14B, and 32B.
Context: 32,768 tokens
Input: $---/M • Output: $---/M
inclusionAI Ling Flash 2.0
inclusionAI/Ling-flash-2.0
Low-latency flash model suitable for general chat and assistants.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
inclusionAI Ring Flash 2.0
inclusionAI/Ring-flash-2.0
Low-latency flash model optimized for responsiveness.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Grok 3 Beta
grok-3-beta
Grok 3 is the latest model from xAI. It's their flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. Possesses deep domain knowledge in finance, healthcare, law, and science. Excels in structured tasks and benchmarks like GPQA, LCB, and MMLU-Pro where it outperforms Grok 3 Mini even on high thinking.
Context: 131,072 tokens
Input: $---/M • Output: $---/M
Grok 3 Fast Beta
grok-3-fast-beta
Faster output version of Grok 3. Grok 3 is the latest model from xAI. It's their flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. Possesses deep domain knowledge in finance, healthcare, law, and science. Excels in structured tasks and benchmarks like GPQA, LCB, and MMLU-Pro where it outperforms Grok 3 Mini even on high thinking.
Context: 131,072 tokens
Input: $---/M • Output: $---/M
Grok 3 Mini Beta
grok-3-mini-beta
Grok 3 Mini is a lightweight, smaller thinking model. Unlike traditional models that generate answers immediately, Grok 3 Mini thinks before responding. It's ideal for reasoning-heavy tasks that don't demand extensive domain knowledge, and shines in math-specific and quantitative use cases, such as solving challenging puzzles or math problems.
Context: 131,072 tokens
Input: $---/M • Output: $---/M
Grok 3 Mini Fast Beta
grok-3-mini-fast-beta
Faster output version of Grok 3 Mini. Grok 3 Mini is a lightweight, smaller thinking model. Unlike traditional models that generate answers immediately, Grok 3 Mini thinks before responding. It's ideal for reasoning-heavy tasks that don't demand extensive domain knowledge, and shines in math-specific and quantitative use cases, such as solving challenging puzzles or math problems.
Context: 131,072 tokens
Input: $---/M • Output: $---/M
Lumimaid v0.2
NeverSleep/Lumimaid-v0.2-70B
Upgrade to Llama-3 Lumimaid 70B. A Llama 3.1 70B finetune trained on curated roleplay data.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
SorcererLM 8x22B
raifle/sorcererlm-8x22b
Advanced roleplaying model with reasoning and emotional intelligence for engaging interactions, contextual awareness, and enhanced narrative depth.
Context: 16,000 tokens
Input: $---/M • Output: $---/M
MythoMax 13B
Gryphe/MythoMax-L2-13b
One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay.
Context: 4,000 tokens
Input: $---/M • Output: $---/M
Magnum v4 72B
anthracite-org/magnum-v4-72b
Upgrade of Magnum V2 72B. From the creators of Goliath. Aimed at achieving prose quality similar to Claude 3 Opus, trained on 55 million tokens of curated roleplay data.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
EVA-Qwen2.5-32B-v0.2
EVA-UNIT-01/EVA-Qwen2.5-32B-v0.2
A RP/storywriting specialist model, full-parameter finetune of Qwen2.5-32B on mixture of synthetic and natural data. It uses Celeste 70B 0.1 data mixture, greatly expanding it to improve versatility, creativity and flavor of the resulting model.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
K2-Think
LLM360/K2-Think
K2-Think is a 32B open-weights general reasoning model with strong competitive math performance. Benchmarks: AIME 2024 90.83, AIME 2025 81.24, GPQA-Diamond 71.08, LiveCodeBench v5 63.97.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
MN-LooseCannon-12B-v1
GalrionSoftworks/MN-LooseCannon-12B-v1
Merge of Starcannon and Sao Lyra.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
EVA-Qwen2.5-72B-v0.2
EVA-UNIT-01/EVA-Qwen2.5-72B-v0.2
A RP/storywriting specialist model, full-parameter finetune of Qwen2.5-72B on mixture of synthetic and natural data. It uses Celeste 70B 0.1 data mixture, greatly expanding it to improve versatility, creativity and flavor of the resulting model.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
EVA-LLaMA-3.33-70B-v0.1
EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.1
A RP/storywriting specialist model, full-parameter finetune of Llama-3.3-70B-Instruct on mixture of synthetic and natural data. It uses Celeste 70B 0.1 data mixture, greatly expanding it to improve versatility, creativity and flavor of the resulting model.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
Dolphin 2.9.2 Mixtral 8x22B
cognitivecomputations/dolphin-mixtral-8x22b
Successor to Dolphin 2.6 Mixtral 8x7b. Great for instruction following, conversational, and coding.
Context: 16,000 tokens
Input: $---/M • Output: $---/M
ReMM SLERP 13B
undi95/remm-slerp-l2-13b
A recreation trial of the original MythoMax-L2-13B, but merged with updated models.
Context: 6,144 tokens
Input: $---/M • Output: $---/M
Mercury Coder Small
mercury-coder-small
Model by Inception AI. A diffusion large language model that runs incredibly quickly (500+ tokens/second) while matching Claude 3.5 Haiku and GPT-4o-mini. 1st in speed on Copilot arena, and matching 2nd in quality.
Context: 32,768 tokens
Input: $---/M • Output: $---/M
Magnum V2 72B
anthracite-org/magnum-v2-72b
Magnum V2 72B
Context: 16,384 tokens
Input: $---/M • Output: $---/M
Damascus R1
Steelskull/L3.3-Damascus-R1
Damascus-R1 builds upon some elements of the Nevoria foundation but represents a significant step forward with a completely custom-made DeepSeek R1 Distill base: Hydroblated-R1-V3. Constructed using the new SCE (Select, Calculate, and Erase) merge method, Damascus-R1 prioritizes stability, intelligence, and enhanced awareness.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
LongCat Flash
meituan-longcat/LongCat-Flash-Chat-FP8
560B MoE model with 27B active params, 128K context. Exceptional at agentic tasks with dynamic computation and shortcut-connected architecture for 100+ TPS inference.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
LongCat Flash (Thinking)
meituan-longcat/LongCat-Flash-Thinking-FP8
Thinking‑optimized variant of LongCat Flash. 560B MoE with ~27B active parameters, 128K context, and FP8 inference for high throughput. Adds deliberate step‑by‑step reasoning for complex tasks while preserving the model’s fast agentic performance.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Athene V2 Chat
Nexusflow/Athene-V2-Chat
An open-weights LLM on-par with GPT-4o across benchmarks.
Context: 16,384 tokens
Input: $---/M • Output: $---/M
Jamba Large
jamba-large
Jamba 1.7 with improved grounding and instruction following for more accurate and reliable responses. Ideal for complex reasoning and document analysis tasks with 256k context window.
Context: 256,000 tokens
Input: $---/M • Output: $---/M
Jamba Mini
jamba-mini
Jamba Mini 1.7 - smaller and efficient version with improved grounding and instruction following. Perfect for cost-effective tasks while maintaining 256k context window.
Context: 256,000 tokens
Input: $---/M • Output: $---/M
Jamba Large 1.7
jamba-large-1.7
Latest Jamba model with improved grounding and instruction following for more accurate and reliable responses. Superior speed while processing large volumes of unstructured data.
Context: 256,000 tokens
Input: $---/M • Output: $---/M
Jamba Mini 1.7
jamba-mini-1.7
Latest smaller Jamba model (52B params) with improved grounding and instruction following. Cost-effective option for smaller tasks while maintaining the 256k context window.
Context: 256,000 tokens
Input: $---/M • Output: $---/M
Jamba Large 1.6
jamba-large-1.6
Its ability to process large volumes of unstructured data (256k tokens) with high accuracy makes it ideal for summarization and document analysis.
Context: 256,000 tokens
Input: $---/M • Output: $---/M
Jamba Mini 1.6
jamba-mini-1.6
Smaller and cheaper version of Jamba Large 1.6 (52B parameters versus 398B parameters), ideal for smaller tasks and lower budgets, still with a 256k context window.
Context: 256,000 tokens
Input: $---/M • Output: $---/M
Hermes 3 Large
hermes-3-llama-3.1-405b
Hermes 3 Llama 3.1 405B for uncensored creative exploration.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Nvidia Nemotron Super 49B v1.5
nvidia/Llama-3_3-Nemotron-Super-49B-v1_5
Advanced 49B parameter reasoning model based on Llama 3.3 architecture, optimized through Neural Architecture Search with reasoning ON/OFF modes. Trained via knowledge distillation using synthetic data from advanced models. Supports 128K context window with improved efficiency over v1.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Tesslate UIGEN-X 32B
Tesslate/UIGEN-X-32B-0727
Reasoning-only UI generation model built on Qwen3-32B architecture. Systematically plans, architects, and implements complete user interfaces across modern development stacks including web (React, Vue, Angular), mobile (React Native, Flutter), desktop (Electron, Tauri), and Python (Streamlit, Gradio) with 21+ visual styles.
Context: 32,768 tokens
Input: $---/M • Output: $---/M
InternVL3 78B
OpenGVLab/InternVL3-78B
InternVL3-78B is a 78-billion-parameter multimodal large language model that delivers state-of-the-art performance on vision-language understanding benchmarks.
Context: 32,768 tokens
Input: $---/M • Output: $---/M
DeepCoder 14B Preview
agentica-org/DeepCoder-14B-Preview
DeepCoder-14B-Preview is a code reasoning LLM fine-tuned from DeepSeek-R1-Distilled-Qwen-14B using distributed reinforcement learning (RL) to scale up to long context lengths. The model achieves 60.6% Pass@1 accuracy on LiveCodeBench v5 (8/1/24-2/1/25), an 8% improvement over the base model (53%), matching the performance of OpenAI's o3-mini with just 14B parameters.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
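To ground the Pass@1 figures quoted above: a minimal sketch of the unbiased pass@k estimator (Chen et al., 2021) that coding benchmarks such as LiveCodeBench commonly use. The sample counts below are illustrative, not DeepCoder's actual evaluation settings.

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k: probability that at least one of k samples,
    drawn from n generations of which c are correct, passes."""
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Illustrative numbers: 16 generations per problem, 10 correct.
print(pass_at_k(n=16, c=10, k=1))  # 0.625, i.e. 62.5% pass@1
```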
Shisa V2 Llama 3.3 70B
shisa-ai/shisa-v2-llama3.3-70b
Shisa V2 is a family of bilingual Japanese/English language models ranging from 7B to 70B parameters, optimized for high-quality Japanese language capabilities while maintaining strong English performance.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Hunyuan A13B Instruct
tencent/Hunyuan-A13B-Instruct
Tencent's innovative 80B total/13B active parameter MoE model with fine-grained architecture, dual-mode reasoning (fast/slow thinking), 256K context, and competitive performance across math, science, coding and agent tasks.
Context: 256,000 tokens
Input: $---/M • Output: $---/M
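A minimal sketch of how the dual fast/slow thinking modes might be selected at the prompt-template level. The enable_thinking flag is an assumption modeled on similar hybrid-reasoning models (e.g. Qwen3); verify the exact switch against Tencent's model card.

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained(
    "tencent/Hunyuan-A13B-Instruct", trust_remote_code=True
)

messages = [{"role": "user", "content": "Prove that sqrt(2) is irrational."}]

# Assumption: the chat template exposes an enable_thinking switch,
# as comparable hybrid models do; False would select fast thinking.
prompt_slow = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=True
)
prompt_fast = tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True, enable_thinking=False
)
```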
v0 1.5 MD
v0-1.5-md
Vercel model. v0 1.5 MD composite model combines specialized knowledge from RAG, reasoning from state-of-the-art LLMs, and error fixing from custom streaming post-processing. Specialized in building fast, beautiful full-stack web applications with continuously updated framework knowledge. Currently powered by Sonnet 4 base model.
Context: 200,000 tokens
Input: $---/M • Output: $---/M
v0 1.5 LG
v0-1.5-lg
Vercel model. v0 1.5 LG composite model with a larger context window and enhanced reasoning for hyper-specialized fields like physics engines, three.js, and multi-step tasks. Better at complex database migrations and architectural decisions. Achieves an 89.8% error-free generation rate on web development benchmarks.
Context: 1,000,000 tokens
Input: $---/M • Output: $---/M
v0 1.0 MD
v0-1.0-md
Vercel model. v0 1.0 MD composite model specialized for web development with RAG, error correction, and optimized for code generation. Features custom AutoFix model for real-time error correction and best practice enforcement. Currently powered by Sonnet 3.7 base model.
Context: 200,000 tokens
Input: $---/M • Output: $---/M
Baichuan M2 32B Medical
Baichuan-M2
Medical-enhanced reasoning model built on Qwen2.5-32B with a Large Verifier System. Specialized for medical reasoning, delivering strong domain performance while maintaining general capabilities.
Context: 32,768 tokens
Input: $---/M • Output: $---/M
Baichuan 4 Air
Baichuan4-Air
Fast and efficient AI model from Baichuan Intelligence, optimized for quick responses with balanced performance across various tasks.
Context: 32,768 tokens
Input: $---/M • Output: $---/M
Baichuan 4 Turbo
Baichuan4-Turbo
High-performance model from Baichuan Intelligence featuring enhanced capabilities for complex reasoning and specialized tasks.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Ling Flash 2.0
Ling-Flash-2.0
Open-weights MoE model with 100B total parameters and ~6.1B active per token, trained on 20T+ tokens. Delivers strong complex reasoning and code generation (including frontend), with 32K context extendable to 128K via YaRN and efficient, high-throughput inference.
Context: 65,000 tokens
Input: $---/M • Output: $---/M
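For readers curious how YaRN context extension is typically enabled when self-hosting, a minimal sketch using a Hugging Face transformers-style rope_scaling override. The repo id, scaling factor, and key names are assumptions that vary by architecture; check the model card before relying on them.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "inclusionAI/Ling-flash-2.0"  # illustrative repo id

# Assumption: the config accepts a YaRN rope_scaling override, as
# recent Qwen-style architectures do; key names vary by model.
model = AutoModelForCausalLM.from_pretrained(
    MODEL_ID,
    trust_remote_code=True,  # custom MoE architectures often need this
    rope_scaling={
        "rope_type": "yarn",
        "factor": 4.0,  # 32K x 4 = 128K effective context
        "original_max_position_embeddings": 32768,
    },
)
tokenizer = AutoTokenizer.from_pretrained(MODEL_ID, trust_remote_code=True)
```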
ArliAI RpR Ultra 235B
ArliAI-RpR-Ultra-235B
ArliAI's first large model, currently in preview/testing for roleplaying and storytelling.
Context: 65,536 tokens
Input: $---/M • Output: $---/M
Venice Uncensored Web
venice-uncensored:web
Venice's uncensored model with native web access included. Built on the Dolphin Mistral 24b model with a very low refusal rate.
Context: 80,000 tokens
Input: $---/M • Output: $---/M
Venice Uncensored
venice-uncensored
Venice's uncensored model. Built on the Dolphin Mistral 24b model with a very low refusal rate.
Context: 128,000 tokens
Input: $---/M • Output: $---/M
Cogito v2 Preview 70B
deepcogito/cogito-v2-preview-llama-70B
Cogito 70B is a dense hybrid reasoning model that combines direct answering capabilities with advanced self-reflection. Built with iterative policy improvement, it delivers strong performance across reasoning tasks while maintaining efficiency through shorter reasoning chains and improved intuition.
Context: 32,768 tokens
Input: $---/M • Output: $---/M
Cogito v2 Preview 109B MoE
deepcogito/cogito-v2-preview-llama-109B-MoE
Cogito 109B MoE leverages mixture-of-experts architecture to deliver advanced reasoning capabilities with computational efficiency. This hybrid model excels at both direct responses and complex reasoning tasks while maintaining multimodal capabilities through innovative transfer learning.
Context: 32,768 tokens
Input: $---/M • Output: $---/M
Cogito v2 Preview 405B
deepcogito/cogito-v2-preview-llama-405B
Cogito 405B represents a significant step toward frontier intelligence with dense architecture delivering performance competitive with leading closed models. This advanced reasoning system combines policy improvement with massive scale for exceptional capabilities.
Context: 32,768 tokens
Input: $---/M • Output: $---/M
Cogito v2 Preview 671B MoE
deepcogito/cogito-v2-preview-deepseek-671b
Cogito 671B MoE is one of the strongest open models globally, matching the performance of the latest DeepSeek models while approaching closed frontier systems like o3 and Claude 4 Opus. This advanced system demonstrates significant progress toward scalable superintelligence through policy improvement.
Context: 32,768 tokens
Input: $---/M • Output: $---/M