Available Models

Explore 438 AI models from leading providers, all accessible through NanoGPT's unified API.

Meta

76 models

Llama 4 Maverick

meta-llama/llama-4-maverick

Llama 4 Maverick, a 17 billion active parameter model with 128 experts, is the best multimodal model in its class, beating GPT-4o and Gemini 2.0 Flash across a broad range of widely reported benchmarks, while achieving comparable results to the new DeepSeek v3 on reasoning and coding—at less than half the active parameters. Llama 4 Maverick offers a best-in-class performance to cost ratio with an experimental chat version scoring ELO of 1417 on LMArena.

Context: 1,048,576 tokens

Input: $---/M • Output: $---/M

Llama 4 Scout

meta-llama/llama-4-scout

Llama 4 Scout, a 17 billion active parameter model with 16 experts, is the best multimodal model in the world in its class and is more powerful than all previous generation Llama models, while fitting in a single H100 GPU. Additionally, Llama 4 Scout offers an industry-leading context window of 10M and delivers better results than Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1 across a broad range of widely reported benchmarks.

Context: 328,000 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70b Instruct

meta-llama/llama-3.3-70b-instruct

Llama 3.3 is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks.

Context: 131,072 tokens

Input: $---/M • Output: $---/M

Sao10K Stheno 8b

Sao10K/L3-8B-Stheno-v3.2

Sao10K's latest Stheno fine-tune optimized for instruction following.

Context: 16,384 tokens

Input: $---/M • Output: $---/M

Llama 3.1 Large

Meta-Llama-3-1-405B-Instruct-FP8

Note: comes with a 90% discount currently, enjoy! Meta's largest Llama 3.1 405B model. Open-source, run through an open permissionless crypto network (no central provider).

Context: 128,000 tokens

Input: $---/M • Output: $---/M

EVA Llama 3.33 70B

EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.0

A RP/storywriting specialist model, full-parameter finetune of Llama-3.3-70B-Instruct on mixture of synthetic and natural data. It uses Celeste 70B 0.1 data mixture, greatly expanding it to improve versatility, creativity and flavor of the resulting model.

Context: 16,384 tokens

Input: $---/M • Output: $---/M

Steelskull Nevoria 70b

Steelskull/L3.3-MS-Nevoria-70b

Steelskull Nevoria 70b

Context: 16,384 tokens

Input: $---/M • Output: $---/M

Steelskull Nevoria R1 70b

Steelskull/L3.3-Nevoria-R1-70b

Steelskull Nevoria R1 70b

Context: 16,384 tokens

Input: $---/M • Output: $---/M

Steelskull Electra R1 70b

Steelskull/L3.3-Electra-R1-70b

Steelskull Electra R1 70b

Context: 16,384 tokens

Input: $---/M • Output: $---/M

Llama 3.1 70B Dracarys 2

abacusai/Dracarys-72B-Instruct

Llama 3.1 70b finetune that offers improvements on coding.

Context: 16,384 tokens

Input: $---/M • Output: $---/M

Llama 3.2 Medium

meta-llama/llama-3.2-90b-vision-instruct

Medium-size (and capability) version of Meta's newest model (3.2 series).

Context: 131,072 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Instruct abliterated

huihui-ai/Llama-3.3-70B-Instruct-abliterated

An abliterated (removed restrictions and censorship) version of Llama 3.3 70b.

Context: 16,384 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B

TEE/llama3-3-70b

Meta's Llama 3.3 with 70B parameters. Secure inference with encrypted inputs and outputs, running inside a TEE (Trusted Execution Environment), with verifiably no logging by the provider.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Llama 3.1 8b Instruct

meta-llama/llama-3.1-8b-instruct

Fast and efficient for simple purposes.

Context: 131,072 tokens

Input: $---/M • Output: $---/M

Neural Daredevil 8B abliterated

mlabonne/NeuralDaredevil-8B-abliterated

The best performing 8B abliterated model according to most benchmarks.

Context: 8,192 tokens

Input: $---/M • Output: $---/M

Llama 3 70B abliterated

failspy/Meta-Llama-3-70B-Instruct-abliterated-v3.5

An abliterated (removed restrictions and censorship) version of Llama 3.1 70b.

Context: 8,192 tokens

Input: $---/M • Output: $---/M

Llama 3.05 Storybreaker Ministral 70b

Envoid/Llama-3.05-NT-Storybreaker-Ministral-70B

Much more inclined to output adult content than its predecessor. Great choice for novelty roleplay scenarios.

Context: 16,384 tokens

Input: $---/M • Output: $---/M

Nemotron Tenyxchat Storybreaker 70b

Envoid/Llama-3.05-Nemotron-Tenyxchat-Storybreaker-70B

Overall it provides a solid option for RP and creative writing while still functioning as an assistant model, if desired. If used to continue a roleplay it will generally follow the ongoing cadence of the conversation.

Context: 16,384 tokens

Input: $---/M • Output: $---/M

Mag Mell R1

inflatebot/MN-12B-Mag-Mell-R1

Mag Mell demonstrates worldbuilding capabilities unlike any model in its class, comparable to old adventuring models like Tiefighter, and prose that exhibits minimal slop.

Context: 16,384 tokens

Input: $---/M • Output: $---/M

Evayale 70b

Steelskull/L3.3-MS-Evayale-70B

Combination of EVA and Euryale.

Context: 16,384 tokens

Input: $---/M • Output: $---/M

Lumimaid 70b

NeverSleep/Llama-3-Lumimaid-70B-v0.1

Neversleep Llama 3 Lumimaid 70B

Context: 16,384 tokens

Input: $---/M • Output: $---/M

MS Evalebis 70b

Steelskull/L3.3-MS-Evalebis-70b

Combination of EVA, Euryale and Anubis.

Context: 16,384 tokens

Input: $---/M • Output: $---/M

Qwerky 72B

featherless-ai/Qwerky-72B

Linear models offer a promising approach to significantly reduce computational costs at scale, particularly for large context lengths. Enabling a >1000x improvement in inference costs, enabling o1 inference time thinking and wider AI accessibility.

Context: 32,000 tokens

Input: $---/M • Output: $---/M

Anubis 70B v1

TheDrummer/Anubis-70B-v1

L3.3 finetune for roleplaying.

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Anubis 70B v1.1

TheDrummer/Anubis-70B-v1.1

L3.3 finetune for roleplaying – updated v1.1 with improved reasoning.

Context: 131,072 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70b Mirai Fanfare

Llama-3.3-70B-MiraiFanfare

A Llama 3.3 70b finetuned for roleplay and storytelling.

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.2 3b Instruct

meta-llama/llama-3.2-3b-instruct

Small model optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization

Context: 131,072 tokens

Input: $---/M • Output: $---/M

Llama 3.1 8B (decentralized)

Meta-Llama-3-1-8B-Instruct-FP8

Meta's Llama 3.1 8B model via an open permissionless network

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Llama 3.1 70B Hanami

Sao10K/L3.1-70B-Hanami-x1

Euryale v2.2-based finetune.

Context: 16,384 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Euryale

Sao10K/L3.3-70B-Euryale-v2.3

A 70B parameter model from SAO10K based on Llama 3.3 70B, offering high-quality text generation.

Context: 20,480 tokens

Input: $---/M • Output: $---/M

Llama 3.1 70B Euryale

Sao10K/L3.1-70B-Euryale-v2.2

A 70B parameter model from SAO10K based on Llama 3.1 70B, offering high-quality text generation.

Context: 20,480 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Cu Mai

Steelskull/L3.3-Cu-Mai-R1-70b

A 70B parameter model from Steelskull based on Llama 3.3 70B, offering high-quality text generation.

Context: 16,384 tokens

Input: $---/M • Output: $---/M

Llama 3.1 70B Celeste v0.1

nothingiisreal/L3.1-70B-Celeste-V0.1-BF16

Creative model based on Llama 3.1 70B

Context: 16,384 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Wayfarer

LatitudeGames/Wayfarer-Large-70B-Llama-3.3

Llama 3.3 70B Wayfarer is a fine-tuned version of Llama 3.3 70B, trained on a diverse set of creative writing and RP datasets with a focus on variety and deduplication. This model is designed to be highly creative and non-repetitive by making sure no two entries in the dataset have repeated characters or situations, which makes sure the model does not latch on to a certain personality and be capable of understanding and acting appropriately to any characters or situations.

Context: 16,384 tokens

Input: $---/M • Output: $---/M

Llama-xLAM-2 70B fc-r

Salesforce/Llama-xLAM-2-70b-fc-r

Salesforce’s 70-B frontier model focused on function-calling & retrieval-augmented generation.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B ArliAI RPMax v2

Llama-3.3-70B-ArliAI-RPMax-v2

Llama 3.3 70B ArliAI RPMax v2

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Mokume Gane R1

Llama-3.3-70B-Mokume-Gane-R1

Llama 3.3 70B Mokume Gane R1

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B MS Nevoria

Llama-3.3-70B-MS-Nevoria

Llama 3.3 70B MS Nevoria

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Cirrus x1

Llama-3.3-70B-Cirrus-x1

Llama 3.3 70B Cirrus x1

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Bigger Body

Llama-3.3-70B-Bigger-Body

Llama 3.3 70B Bigger Body

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Forgotten Safeword 3.6

Llama-3.3-70B-Forgotten-Safeword-3.6

Llama 3.3 70B Forgotten Safeword 3.6

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Legion V2.1

Llama-3.3-70B-Legion-V2.1

Llama 3.3 70B Legion V2.1

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Electra R1

Llama-3.3-70B-Electra-R1

Llama 3.3 70B Electra R1

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Vulpecula R1

Llama-3.3-70B-Vulpecula-R1

Llama 3.3 70B Vulpecula R1

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Magnum v4 SE

Llama-3.3-70B-Magnum-v4-SE

Llama 3.3 70B Magnum v4 SE

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Fallen R1 v1

Llama-3.3-70B-Fallen-R1-v1

Llama 3.3 70B Fallen R1 v1

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Cu Mai R1

Llama-3.3-70B-Cu-Mai-R1

Llama 3.3 70B Cu Mai R1

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B RPMax v1.4

Llama-3.3-70B-ArliAI-RPMax-v1.4

Llama 3.3 70B RPMax v1.4

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Electranova v1.0

Llama-3.3-70B-Electranova-v1.0

Llama 3.3 70B Electranova v1.0

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3+ 70B Hanami x1

Llama-3.3+(3.1v3.3)-70B-Hanami-x1

Llama 70B with older 3.1 LoRA, optimized for creative storytelling

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Mhnnn x1

Llama-3.3-70B-Mhnnn-x1

Llama 70B LoRA variant with unique creative capabilities

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B GeneticLemonade Unleashed v3

Llama-3.3-70B-GeneticLemonade-Unleashed-v3

Llama 70B LoRA optimized for unrestricted creative expression

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3+ 70B TenyxChat DaybreakStorywriter

Llama-3.3+(3v3.3)-70B-TenyxChat-DaybreakStorywriter

Llama 70B with older 3.1 LoRA focused on narrative storytelling

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Forgotten Abomination v5.0

Llama-3.3-70B-Forgotten-Abomination-v5.0

Llama 70B LoRA with dark and horror-themed creativity

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Fallen v1

Llama-3.3-70B-Fallen-v1

Llama 70B LoRA with fallen angel themed responses

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B StrawberryLemonade v1.0

Llama-3.3-70B-StrawberryLemonade-v1.0

Llama 70B LoRA with sweet and refreshing conversational style

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3+ 70B New Dawn v1.1

Llama-3.3+(3.1v3.3)-70B-New-Dawn-v1.1

Llama 70B with older 3.1 LoRA for new beginnings in storytelling

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B StrawberryLemonade v1.2

Llama-3.3-70B-Strawberrylemonade-v1.2

Llama 70B LoRA - Updated with improved sweetness balance

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Shakudo

Llama-3.3-70B-Shakudo

Llama 70B LoRA - Creative model with unique style

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Predatorial Extasy

Llama-3.3-70B-Predatorial-Extasy

Llama 70B LoRA - Creative model for intense scenarios

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B ArliAI RPMax v3

Llama-3.3-70B-ArliAI-RPMax-v3

Llama 70B LoRA - ArliAI fine-tuned for roleplay, v3

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Argunaut 1 SFT

Llama-3.3-70B-Argunaut-1-SFT

Llama 70B LoRA - Supervised fine-tuned creative model

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Dark Ages v0.1

Llama-3.3-70B-Dark-Ages-v0.1

Llama 70B LoRA - Creative model for historical fantasy

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Aurora Borealis

Llama-3.3-70B-Aurora-Borealis

Llama 70B LoRA - Creative model with ethereal qualities

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Magnum v4 SE Cirrus x1 SLERP

Llama-3.3-70B-Magnum-v4-SE-Cirrus-x1-SLERP

Creative Model - SLERP merge of Magnum v4 SE and Cirrus x1

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Progenitor V3.3

Llama-3.3-70B-Progenitor-V3.3

Creative Model

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Omega Directive Unslop v2.0

Llama-3.3-70B-The-Omega-Directive-Unslop-v2.0

Llama 3.3 70B with Omega Directive Unslop v2.0 for enhanced creative output

Context: 131,072 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B RAWMAW

Llama-3.3-70B-RAWMAW

Llama 3.3 70B RAWMAW model for unrestricted creative writing

Context: 131,072 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Anthrobomination

Llama-3.3-70B-Anthrobomination

Llama 3.3 70B Anthrobomination for creative storytelling

Context: 131,072 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Omega Directive Unslop v2.1

Llama-3.3-70B-The-Omega-Directive-Unslop-v2.1

Llama 3.3 70B with Omega Directive Unslop v2.1 - improved version

Context: 131,072 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Sapphira 0.1

Llama-3.3-70B-Sapphira-0.1

Llama 3.3 70B Sapphira 0.1 for creative and expressive writing

Context: 131,072 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Incandescent Malevolence

Llama-3.3-70B-Incandescent-Malevolence

Llama 3.3 70B Incandescent Malevolence for dark creative narratives

Context: 131,072 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Nova

Llama-3.3-70B-Nova

Llama 3.3 70B Nova for creative and expressive writing

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Ignition v0.1

Llama-3.3-70B-Ignition-v0.1

Llama 3.3 70B Ignition v0.1 tuned for dynamic roleplay and dialogue

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B GeneticLemonade Opus

Llama-3.3-70B-GeneticLemonade-Opus

Advanced creative LoRA from GeneticLemonade focused on expressive narrative

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.3 70B Sapphira 0.2

Llama-3.3-70B-Sapphira-0.2

Successor to Sapphira 0.1 with refined style for creative writing

Context: 65,536 tokens

Input: $---/M • Output: $---/M

OpenAI

42 models

OpenAI o3-mini

o3-mini

The cheaper version of OpenAI's newest thinking model. Fast, cheap, and with a maximum output of 100,000 words. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

OpenAI o3 Pro

o3-pro

The pro version of the already fantastic o3. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

GPT OSS 120B

openai/gpt-oss-120b

An open-weight, 117B-parameter Mixture-of-Experts (MoE) language model designed for high-reasoning, agentic, and general-purpose production use cases. It activates 5.1B parameters per forward pass and is optimized to run on a single H100 GPU with native MXFP4 quantization. The model supports configurable reasoning depth, full chain-of-thought access, and native tool use, including function calling, browsing, and structured output generation.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

GPT OSS 20B

openai/gpt-oss-20b

An open-weight 21B parameter model released under the Apache 2.0 license. It uses a Mixture-of-Experts (MoE) architecture with 3.6B active parameters per forward pass, optimized for lower-latency inference and deployability on consumer or single-GPU hardware. The model is trained in OpenAI's Harmony response format and supports reasoning level configuration, fine-tuning, and agentic capabilities including function calling, tool use, and structured outputs.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

OpenAI o3-mini high

o3-mini-high

OpenAI's newest flagship model with reasoning effort set to high. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

OpenAI o3-mini low

o3-mini-low

OpenAI's newest flagship model with reasoning effort set to low. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

OpenAI o1

Useful when tackling complex problems in science, coding, math, and similar fields. Outdated compared to the newer o3 and o4 models. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

ChatGPT 4o

chatgpt-4o-latest

OpenAI's current standard model, the well-known ChatGPT. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

GPT 5 Chat

gpt-5-chat-latest

GPT-5 is the GPT 5 version optimized for advanced, natural and multimodal conversations for enterprise applications. It's the model generally used in ChatGPT. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 400,000 tokens

Input: $---/M • Output: $---/M

GPT 5

gpt-5

GPT-5 is OpenAI's most advanced model, offering major improvements in reasoning, code quality, and user experience. It handles complex coding tasks with minimal prompting, provides clear explanations, and introduces enhanced agentic capabilities. Designed for logic and multi-step tasks. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 400,000 tokens

Input: $---/M • Output: $---/M

GPT 5 Pro

gpt-5-pro

The highest performing version of the flagship model GPT 5 by OpenAI.

Context: 400,000 tokens

Input: $---/M • Output: $---/M

GPT 5 Codex

gpt-5-codex

GPT-5 Codex is a coding-focused variant of GPT-5 built for interactive development and long-running, autonomous engineering work. It excels at feature implementation, debugging, large-scale refactors, and code review, with higher steerability and tighter adherence to developer instructions for cleaner, production-ready code. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 400,000 tokens

Input: $---/M • Output: $---/M

GPT 5 Mini

gpt-5-mini

A lightweight version of GPT-5 for cost-sensitive applications. Balances advanced capabilities with efficient resource usage. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 400,000 tokens

Input: $---/M • Output: $---/M

GPT 5 Nano

gpt-5-nano

Optimized for speed and ideal for applications requiring low latency. The fastest and most efficient GPT-5 variant. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 400,000 tokens

Input: $---/M • Output: $---/M

OpenAI o1 preview

o1-preview

OpenAI's new flagship series of reasoning models for solving hard problems. Useful when tackling complex problems in science, coding, math, and similar fields. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

OpenAI o1-mini

o1-mini

A fast, cost-efficient version of OpenAI's o1 reasoning model tailored to coding, math, and science use cases. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

OpenAI o1 Pro

openai/o1-pro

OpenAI's flagship series of reasoning models for solving hard problems. The o1 series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o1-pro model uses more compute to think harder and provide consistently better answers. o1 pro comes with a massive 100,000 word output window. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

GPT 4.1 Nano

openai/gpt-4.1-nano

Cheapest model in the GPT-4.1 series. Huge context window with fast throughput and low latency. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 1,047,576 tokens

Input: $---/M • Output: $---/M

GPT 4.1 Mini

openai/gpt-4.1-mini

Mid-sized GPT 4.1, comparable to GPT4o with a far higher context window, at lower cost and with higher speed. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 1,047,576 tokens

Input: $---/M • Output: $---/M

GPT 4.1

openai/gpt-4.1

GPT 4.1 is the new flagship model from OpenAI. Huge context window (1 mln tokens), outperforms GPT-4o and GPT 4.5 across coding and does very well at understanding large contexts. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 1,047,576 tokens

Input: $---/M • Output: $---/M

OpenAI o3

Full version of OpenAI's o3. The current flagship model by OpenAI which OpenAI sees as getting close to true AGI. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

OpenAI o4-mini

o4-mini

o4 mini is the mini version of what will be the next version of OpenAI models. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

OpenAI o4-mini high

o4-mini-high

The maximum/high version of the o4-mini model. The next generation of OpenAI models. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

OpenAI o4-mini Deep Research

o4-mini-deep-research

Advanced research-focused model that does deep, multi-step reasoning on complex research tasks. Best for comprehensive analysis and investigation. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

OpenAI o3 Deep Research

o3-deep-research

o3-deep-research is OpenAI's most advanced model for deep research, designed to tackle complex, multi-step research tasks. It can search and synthesize information from across the internet as well as from your own data. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

GPT 4o mini

gpt-4o-mini

OpenAI's most cost-efficient small model. Cheaper and smarter than GPT-3.5 (the original ChatGPT), but less performant than gpt-4o. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

GPT 4o Mini Search

gpt-4o-mini-search-preview

GPT 4o Mini with web search built in natively via OpenAI. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

GPT 4o Search

gpt-4o-search-preview

GPT 4o with web search built in natively via OpenAI. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

GPT 4o 08 06

gpt-4o-2024-08-06

OpenAI's precusor to ChatGPT-4o. Great on English text and code, with significant improvements on text in non-English languages. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

GPT 4o 11 20

gpt-4o-2024-11-20

Context: 128,000 tokens

Input: $---/M • Output: $---/M

GPT 4 Turbo Preview

gpt-4-turbo-preview

Can take in the largest messages (up to 300 pages of context), and all round seen as one of the best in class models. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

GPT 4o

gpt-4o

Context: 128,000 tokens

Input: $---/M • Output: $---/M

GPT 3.5 Turbo

gpt-3.5-turbo

Older model. Brought ChatGPT to the mainstream, seen as dated nowadays. 90% cheaper than GPT-4-Turbo, recommended for very simple tasks. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 16,385 tokens

Input: $---/M • Output: $---/M

GPT-OSS 120B TEE

TEE/gpt-oss-120b

Open-source GPT model with 120B parameters. running inside a TEE (Trusted Execution Environment), with verifiably no logging by the provider.

Context: 131,072 tokens

Input: $---/M • Output: $---/M

Azure o1

azure-o1

Azure version of OpenAI o1. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

Azure o3-mini

azure-o3-mini

Azure version of OpenAI o3-mini. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

Azure gpt-4o

azure-gpt-4o

Azure version of OpenAI gpt-4o. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Azure gpt-4o-mini

azure-gpt-4o-mini

Azure version of OpenAI gpt-4o-mini. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Azure gpt-4-turbo

azure-gpt-4-turbo

Azure version of OpenAI gpt-4-turbo. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

GPT 4o Reasoner

gpt-4o-reasoner

'DeepGPT4o', fusion of GPT 4o and Deepseek R1. Deepseek R1 reasons, then feeds it into GPT 4o to generate a response. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

GPT 4.1 Reasoner

gpt-4.1-reasoner

'DeepGPT 4.1', fusion of GPT 4.1 and Deepseek R1. Deepseek R1 reasons, then feeds it into GPT 4.1 to generate a response. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

ChatGPT 4o Reasoner

chatgpt-4o-latest-reasoner

'DeepChatGPT', fusion of ChatGPT 4o and Deepseek R1. Deepseek R1 reasons, then feeds it into ChatGPT 4o to generate a response. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Google Gemini

40 models

Amoral Gemma3 27B v2

soob3123/amoral-gemma3-27B-v2

Amoral Gemma3 27B v2 is a 27B parameter model that is a more advanced version of Gemma3 27B.

Context: 32,768 tokens

Input: $---/M • Output: $---/M

Gemini 2.0 Pro 0205

gemini-2.0-pro-exp-02-05

Note: This model is now being routed to Gemini 2.5 Pro because Google no longer has the Gemini 2.0 Pro model available and Gemini 2.5 Pro is an across the board improvement.

Context: 2,097,152 tokens

Input: $---/M • Output: $---/M

Gemini 2.5 Flash 0520

gemini-2.5-flash-preview-05-20

Deprecated. Mapped to Gemini 2.5 Flash because this model is deprecated

Context: 1,048,000 tokens

Input: $---/M • Output: $---/M

Gemini 2.5 Flash 0520 Thinking

gemini-2.5-flash-preview-05-20:thinking

Deprecated. Mapped to Gemini 2.5 Flash because this model is deprecated

Context: 1,048,000 tokens

Input: $---/M • Output: $---/M

Gemini 2.0 Pro 1206

gemini-exp-1206

Note: This model is now being routed to Gemini 2.5 Pro because Google no longer has the Gemini 2.0 Pro model available and Gemini 2.5 Pro is an across the board improvement.

Context: 2,097,152 tokens

Input: $---/M • Output: $---/M

Gemini 2.0 Flash Thinking 1219

gemini-2.0-flash-thinking-exp-1219

The December 19t 2024 version of the Gemini 2.0 Flash model. Google's first thinking model, now relatively outdated.

Context: 32,767 tokens

Input: $---/M • Output: $---/M

Gemini Text + Image

gemini-2.0-flash-exp-image-generation

Gemini 2.0 Flash Image Generation. Can generate both text and images within the same prompt!

Context: 32,767 tokens

Input: $---/M • Output: $---/M

Gemini 2.5 Flash Preview

gemini-2.5-flash-preview-04-17

Fast, cost efficient performance on complex tasks. The workhorse of the Gemini series. April 17th 2025 version.

Context: 1,048,756 tokens

Input: $---/M • Output: $---/M

Gemini 2.5 Flash Preview Thinking

gemini-2.5-flash-preview-04-17:thinking

Fast, cost efficient performance on complex tasks. The workhorse of the Gemini series. Thinking turned on by default

Context: 1,048,756 tokens

Input: $---/M • Output: $---/M

Gemini 2.5 Pro Experimental 0325

gemini-2.5-pro-exp-03-25

Gemini 2.5 Pro Exp 0325. Google's experimental model from March 2025.

Context: 1,048,756 tokens

Input: $---/M • Output: $---/M

Gemini 2.5 Pro Preview 0325

gemini-2.5-pro-preview-03-25

Gemini 2.5 Pro Preview 0325. Google's model from March 2025.

Context: 1,048,756 tokens

Input: $---/M • Output: $---/M

Gemini 2.5 Pro Preview 0506

gemini-2.5-pro-preview-05-06

Gemini 2.5 Pro Preview 0506. Google's model from May 2025.

Context: 1,048,756 tokens

Input: $---/M • Output: $---/M

Gemini 2.5 Pro Preview 0605

gemini-2.5-pro-preview-06-05

Note: This model is routed via Gemini 2.5 Pro (stable).

Context: 1,048,756 tokens

Input: $---/M • Output: $---/M

Gemini 2.5 Pro

gemini-2.5-pro

Gemini 2.5 Pro stable release. Google's most capable generalist model with strong performance across a wide range of tasks.

Context: 1,048,756 tokens

Input: $---/M • Output: $---/M

Gemini 2.5 Flash

gemini-2.5-flash

Fast, cost efficient performance on complex tasks. The workhorse of the Gemini series. Stable release with improved capabilities.

Context: 1,048,756 tokens

Input: $---/M • Output: $---/M

Gemini 2.5 Flash Lite Preview

gemini-2.5-flash-lite-preview-06-17

Ultra-lightweight and fast variant of Gemini 2.5 Flash. Preview release optimized for speed and efficiency.

Context: 1,048,756 tokens

Input: $---/M • Output: $---/M

Gemini 2.5 Flash (No Thinking)

gemini-2.5-flash-nothinking

Fast, cost efficient performance on complex tasks. The workhorse of the Gemini series. Stable release with thinking mode disabled.

Context: 1,048,756 tokens

Input: $---/M • Output: $---/M

Gemini 2.5 Flash Preview (09/2025)

gemini-2.5-flash-preview-09-2025

State-of-the-art Gemini 2.5 Flash checkpoint tuned for advanced reasoning, coding, math, and scientific tasks. Built-in thinking for higher accuracy and nuanced context handling.

Context: 1,048,756 tokens

Input: $---/M • Output: $---/M

Gemini 2.5 Flash Preview (09/2025) – Thinking

gemini-2.5-flash-preview-09-2025-thinking

Same checkpoint with thinking enabled by default for deeper reasoning and stepwise analysis on complex tasks.

Context: 1,048,756 tokens

Input: $---/M • Output: $---/M

Gemini 2.5 Flash Lite Preview (09/2025)

gemini-2.5-flash-lite-preview-09-2025

Lightweight Gemini 2.5 reasoning variant optimized for ultra-low latency and cost. Higher throughput and faster generation than earlier Flash models.

Context: 1,048,756 tokens

Input: $---/M • Output: $---/M

Gemini 2.5 Flash Lite Preview (09/2025) – Thinking

gemini-2.5-flash-lite-preview-09-2025-thinking

Lite preview with thinking enabled by default for more reliable reasoning while retaining low-latency performance.

Context: 1,048,756 tokens

Input: $---/M • Output: $---/M

Gemini 2.0 Flash Exp

gemini-2.0-flash-exp

Experimental version of Google's newest model, outperforming even Gemini 1.5 Pro.

Context: 1,048,576 tokens

Input: $---/M • Output: $---/M

Gemini 2.0 Flash Thinking 0121

gemini-2.0-flash-thinking-exp-01-21

Google's newest model, outperforming even Gemini 1.5 Pro, now with a thinking mode enabled similar to the o1 series of OpenAI.

Context: 1,000,000 tokens

Input: $---/M • Output: $---/M

Gemini 2.0 Flash

gemini-2.0-flash-001

Upgraded version of Gemini Flash 1.5. Faster, with higher output, and overall increase in intelligence.

Context: 1,000,000 tokens

Input: $---/M • Output: $---/M

Gemini 2.0 Flash Lite

gemini-2.0-flash-lite

Upgraded version of Gemini Flash 1.5. Faster, with higher output, and overall increase in intelligence.

Context: 1,000,000 tokens

Input: $---/M • Output: $---/M

Gemini LearnLM Experimental

learnlm-1.5-pro-experimental

LearnLM is a task-specific model trained to align with learning science principles when following system instructions for teaching and learning use cases. For instance, the model can take on tasks to act as an expert or guide to educate users on specific topics.

Context: 32,767 tokens

Input: $---/M • Output: $---/M

Gemini 1.5 Flash

google/gemini-flash-1.5

Google's fastest multimodal model with great performance for diverse, repetitive tasks and a 2 million words context window.

Context: 2,000,000 tokens

Input: $---/M • Output: $---/M

Gemma 3 27B TEE

TEE/gemma-3-27b-it

Google's Gemma 3 27B instruction-tuned model. running inside a TEE (Trusted Execution Environment), with verifiably no logging by the provider.

Context: 131,072 tokens

Input: $---/M • Output: $---/M

Gemini 2.0 Pro Reasoner

gemini-2.0-pro-reasoner

Note: This model is now being routed to Gemini 2.5 Pro because Google no longer has the Gemini 2.0 Pro model available and Gemini 2.5 Pro is an across the board improvement. 'DeepGemini', fusion of Gemini 2.5 Pro and Deepseek R1.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Gemma 3 27B IT

unsloth/gemma-3-27b-it

Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Gemma 3 12B IT

unsloth/gemma-3-12b-it

Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Gemma 3 4B IT

unsloth/gemma-3-4b-it

Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Gemma 3 1B IT

unsloth/gemma-3-1b-it

Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Gemma 3 27B IT Abliterated

Gemma-3-27B-it-Abliterated

Gemma 3 27B IT - Abliterated model for censor removal

Context: 32,768 tokens

Input: $---/M • Output: $---/M

Gemma 3 27B Big Tiger v3

Gemma-3-27B-Big-Tiger-v3

Gemma 3 27B Big Tiger v3 - Advanced model for creative writing and roleplay

Context: 32,768 tokens

Input: $---/M • Output: $---/M

Gemma 3 27B Nidum Uncensored

Gemma-3-27B-Nidum-Uncensored

Gemma 3 27B Nidum Uncensored - Unrestricted creative model

Context: 32,768 tokens

Input: $---/M • Output: $---/M

Gemma 3 27B IT

Gemma-3-27B-it

Google's Gemma 3 27B instruction-tuned model for versatile tasks

Context: 32,768 tokens

Input: $---/M • Output: $---/M

Gemma 3 27B RPMax v3

Gemma-3-27B-ArliAI-RPMax-v3

Gemma 3 27B LoRA fine-tuned for enhanced creative and roleplay capabilities

Context: 32,768 tokens

Input: $---/M • Output: $---/M

Gemma 3 27B Glitter

Gemma-3-27B-Glitter

Gemma 27B LoRA with sparkling creative personality

Context: 32,768 tokens

Input: $---/M • Output: $---/M

Gemma 3 27B CardProjector v4

Gemma-3-27B-CardProjector-v4

Gemma 27B LoRA specialized in creating character profiles (cards)

Context: 32,768 tokens

Input: $---/M • Output: $---/M

Anthropic

33 models

Claude Sonnet 4.5

claude-sonnet-4-5-20250929

Frontier‑level coding and agentic performance. Claude Sonnet 4.5 leads on real‑world coding tasks and shows substantial gains in computer use, reasoning, and math. Designed for long‑horizon, multi‑step work across IDE + terminal workflows, large codebases, and complex agents — a drop‑in upgrade over previous Sonnet versions.

Context: 1,000,000 tokens

Input: $---/M • Output: $---/M

Claude Sonnet 4.5 Thinking

claude-sonnet-4-5-20250929-thinking

Adds extended, step‑by‑step reasoning for tougher coding, planning, and multi‑tool tasks. Ideal for long‑horizon agent workflows, complex problem solving, and scenarios that benefit from explicit thinking traces.

Context: 1,000,000 tokens

Input: $---/M • Output: $---/M

Claude 3.7 Sonnet Reasoner

claude-3-7-sonnet-reasoner

Claude 3.7 Sonnet Reasoner blends Deepseek R1's reasoning with Claude 3.7 Sonnet's response.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

DeepClaude

deepclaude

Harness the power of DeepSeek R1's reasoning combined with Claude's creativity and code generation. Feeds your query into DeepSeek R1, then feeds the query + thinking process into Claude 3.5 Sonnet and returns an answer. Note: this routes through original DeepSeek meaning your data may be stored and used by DeepSeek.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Claude 3.7 Sonnet

claude-3-7-sonnet-20250219

Anthropic's updated most intelligent model. Preferred by many for its programming skills and its natural language.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

Claude 4 Sonnet

claude-sonnet-4-20250514

Claude 4 Sonnet by Anthropic. A new generation model with improved capabilities, especially on programming and development. NOTE: Inputs > 200k tokens are charged at 2x input, 1.5x output rate.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

Claude 4 Opus

claude-opus-4-20250514

Claude 4 Opus by Anthropic. The premium version of the new Claude models. A new generation model with improved capabilities, especially on programming and development.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

Claude 3.7 Sonnet Thinking

claude-3-7-sonnet-thinking

Anthropic's Claude 3.7 Sonnet with the ability to show its thinking process step by step.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

Claude 4 Sonnet Thinking

claude-sonnet-4-thinking

Anthropic's Claude 4 Sonnet with the ability to show its thinking process step by step.

Context: 1,000,000 tokens

Input: $---/M • Output: $---/M

Claude 4 Opus Thinking

claude-opus-4-thinking

Anthropic's Claude 4 Opus with the ability to show its thinking process step by step.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

Claude 3.5 Sonnet

claude-3-5-sonnet-20241022

One of Anthropic's top models, offering even better results on many subjects than GPT-4o.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

Claude 3.5 Sonnet Old

claude-3-5-sonnet-20240620

Anthropic's most intelligent model, offering even better results on many subjects than GPT-4o.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

Claude Haiku 4.5

claude-haiku-4-5-20251001

Claude Haiku 4.5 matches Sonnet 4's coding quality while running about twice as fast at roughly one-third the price, making it ideal for low-latency chat assistants, support bots, and pair programming. Pair it with Sonnet 4.5 to plan complex work and fan out execution across parallel Haiku workers.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

Claude 3.5 Haiku

claude-3-5-haiku-20241022

Anthropic's updated faster and cheaper model, offering good results on chatbots and coding.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

Claude 3 Opus

claude-3-opus-20240229

Anthropic's flagship model, outperforming GPT-4 on most benchmarks.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

Claude 3.7 Sonnet Thinking (1K)

claude-3-7-sonnet-thinking:1024

Claude 3.7 Sonnet with minimal thinking budget (1,024 tokens).

Context: 200,000 tokens

Input: $---/M • Output: $---/M

Claude 3.7 Sonnet Thinking (8K)

claude-3-7-sonnet-thinking:8192

Claude 3.7 Sonnet with reduced thinking budget (8,192 tokens).

Context: 200,000 tokens

Input: $---/M • Output: $---/M

Claude 3.7 Sonnet Thinking (32K)

claude-3-7-sonnet-thinking:32768

Claude 3.7 Sonnet with extended thinking budget (32,768 tokens).

Context: 200,000 tokens

Input: $---/M • Output: $---/M

Claude 3.7 Sonnet Thinking (128K)

claude-3-7-sonnet-thinking:128000

Claude 3.7 Sonnet with maximum thinking budget (128,000 tokens).

Context: 200,000 tokens

Input: $---/M • Output: $---/M

Claude 4 Sonnet Thinking (1K)

claude-sonnet-4-thinking:1024

Claude 4 Sonnet with minimal thinking budget (1,024 tokens).

Context: 1,000,000 tokens

Input: $---/M • Output: $---/M

Claude 4 Sonnet Thinking (8K)

claude-sonnet-4-thinking:8192

Claude 4 Sonnet with reduced thinking budget (8,192 tokens).

Context: 1,000,000 tokens

Input: $---/M • Output: $---/M

Claude 4 Sonnet Thinking (32K)

claude-sonnet-4-thinking:32768

Claude 4 Sonnet with extended thinking budget (32,768 tokens).

Context: 1,000,000 tokens

Input: $---/M • Output: $---/M

Claude 4 Sonnet Thinking (64K)

claude-sonnet-4-thinking:64000

Claude 4 Sonnet with maximum thinking budget (64,000 tokens).

Context: 1,000,000 tokens

Input: $---/M • Output: $---/M

Claude 4 Opus Thinking (1K)

claude-opus-4-thinking:1024

Claude 4 Opus with minimal thinking budget (1,024 tokens).

Context: 200,000 tokens

Input: $---/M • Output: $---/M

Claude 4 Opus Thinking (8K)

claude-opus-4-thinking:8192

Claude 4 Opus with reduced thinking budget (8,192 tokens).

Context: 200,000 tokens

Input: $---/M • Output: $---/M

Claude 4 Opus Thinking (32K)

claude-opus-4-thinking:32768

Claude 4 Opus with extended thinking budget (32,768 tokens).

Context: 200,000 tokens

Input: $---/M • Output: $---/M

Claude 4 Opus Thinking (32K)

claude-opus-4-thinking:32000

Claude 4 Opus with maximum thinking budget (32,000 tokens).

Context: 200,000 tokens

Input: $---/M • Output: $---/M

Claude 4.1 Opus

claude-opus-4-1-20250805

Claude Opus 4.1 is a powerful model from Anthropic that delivers sustained performance for complex coding and other long-running tasks that require thousands of steps, expanding the capabilities of AI agents.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

Claude 4.1 Opus Thinking

claude-opus-4-1-thinking

Anthropic's Claude 4.1 Opus with the ability to show its thinking process step by step.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

Claude 4.1 Opus Thinking (1K)

claude-opus-4-1-thinking:1024

Claude 4.1 Opus with minimal thinking budget (1,024 tokens).

Context: 200,000 tokens

Input: $---/M • Output: $---/M

Claude 4.1 Opus Thinking (8K)

claude-opus-4-1-thinking:8192

Claude 4.1 Opus with reduced thinking budget (8,192 tokens).

Context: 200,000 tokens

Input: $---/M • Output: $---/M

Claude 4.1 Opus Thinking (32K)

claude-opus-4-1-thinking:32768

Claude 4.1 Opus with extended thinking budget (32,768 tokens).

Context: 200,000 tokens

Input: $---/M • Output: $---/M

Claude 4.1 Opus Thinking (32K)

claude-opus-4-1-thinking:32000

Claude 4.1 Opus with maximum thinking budget (32,000 tokens).

Context: 200,000 tokens

Input: $---/M • Output: $---/M

Zhipu

28 models

GLM Zero Preview

glm-zero-preview

GLM Zero Preview is a thinking model like o1, but with a smaller context window

Context: 8,000 tokens

Input: $---/M • Output: $---/M

GLM 4 Plus 0111

glm-4-plus-0111

GLM 4 Plus 0111 is a 1M token context window model

Context: 128,000 tokens

Input: $---/M • Output: $---/M

GLM 4 Air 0111

glm-4-air-0111

MiniMax's flagship model with a 1M token context window

Context: 128,000 tokens

Input: $---/M • Output: $---/M

GLM Z1 Air

glm-z1-air

Incredibly cheap yet highly performant Chinese model, comparable to Deepseek R1 in performance on many metrics at 1/30th the cost.

Context: 32,000 tokens

Input: $---/M • Output: $---/M

GLM Z1 AirX

glm-z1-airx

Fastest reasoning model in China, with up to 200 tokens per second. The stronger version of GLM Z1 Air.

Context: 32,000 tokens

Input: $---/M • Output: $---/M

GLM 4.1V Thinking Flash

glm-4.1v-thinking-flash

Vision-Language Model with thinking paradigm and reinforcement learning. Achieves state-of-the-art performance among 10B-parameter VLMs. Supports 64k context length, handles arbitrary aspect ratios and up to 4K image resolution. Bilingual Chinese/English.

Context: 64,000 tokens

Input: $---/M • Output: $---/M

GLM 4.1V Thinking FlashX

glm-4.1v-thinking-flashx

Enhanced version of GLM 4.1V Thinking Flash. Vision-Language Model with advanced reasoning for complex visual tasks, multimodal problem solving, and intelligent agents. Supports 64k context length and handles arbitrary aspect ratios up to 4K resolution.

Context: 64,000 tokens

Input: $---/M • Output: $---/M

GLM 4.6

z-ai/glm-4.6

Latest GLM series chat model with strong general performance. Quantized at FP8

Context: 200,000 tokens

Input: $---/M • Output: $---/M

GLM 4.6 Thinking

z-ai/glm-4.6:thinking

Thinking version of the latest GLM series chat model with strong general performance. Quantized at FP8

Context: 200,000 tokens

Input: $---/M • Output: $---/M

GLM-4 Plus

glm-4-plus

GLM high-intelligence flagship model with 128K context window

Context: 128,000 tokens

Input: $---/M • Output: $---/M

GLM-4

glm-4

High-intelligence model with 128K context window

Context: 128,000 tokens

Input: $---/M • Output: $---/M

GLM-4 Long

glm-4-long

Extended context model supporting up to 1M tokens

Context: 1,000,000 tokens

Input: $---/M • Output: $---/M

GLM 4.5

zai-org/GLM-4.5-FP8

GLM-4.5 is Z-AI's latest flagship foundation model, purpose-built for agent-based applications. It leverages a Mixture-of-Experts (MoE) architecture with 355B total / 32B active parameters and supports a context length of up to 128k tokens. GLM-4.5 delivers significantly enhanced capabilities in reasoning, code generation, and agent alignment.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

GLM 4.5 Air

zai-org/GLM-4.5-Air

GLM-4.5-Air is a 106B total / 12B active parameter model designed to unify frontier reasoning, coding, and agentic capabilities. On the SWE-bench Verified benchmark, it delivers the best performance at its scale with a competitive performance-to-cost ratio.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

GLM 4.5 (Thinking)

zai-org/GLM-4.5-FP8:thinking

GLM-4.5 with thinking mode enabled for enhanced reasoning capabilities. Shows step-by-step thought process.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

GLM 4.6 Turbo

zai-org/GLM-4.6-turbo

Fast variant of GLM 4.6 for general chat, coding, and analysis with improved latency and strong reasoning.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

GLM 4.6 Turbo (Thinking)

zai-org/GLM-4.6-turbo:thinking

GLM 4.6 Turbo with thinking mode enabled for enhanced reasoning; shows internal reasoning and supports long context.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

GLM 4.5 Air (Thinking)

zai-org/GLM-4.5-Air:thinking

GLM-4.5-Air with thinking mode enabled for enhanced reasoning capabilities. Shows step-by-step thought process.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

GLM-4 AirX

glm-4-airx

Fastest GLM-4 variant with 8K context window

Context: 8,000 tokens

Input: $---/M • Output: $---/M

GLM-4 Air

glm-4-air

High-performance model with 128K context window

Context: 128,000 tokens

Input: $---/M • Output: $---/M

GLM-4 Flash

glm-4-flash

Extremely cheap model with 128K context window

Context: 128,000 tokens

Input: $---/M • Output: $---/M

GLM Z1 9B 0414

THUDM/GLM-Z1-9B-0414

9B small-sized model maintaining the open-source tradition. Despite its smaller scale, GLM-Z1-9B-0414 still exhibits excellent capabilities in mathematical reasoning and general tasks. Its overall performance is already at a leading level among open-source models of the same size.

Context: 32,000 tokens

Input: $---/M • Output: $---/M

GLM 4 9B 0414

THUDM/GLM-4-9B-0414

A 9B parameter version of the GLM-4 series, offering a balance of performance and efficiency.

Context: 32,000 tokens

Input: $---/M • Output: $---/M

GLM Z1 Rumination 32B 0414

THUDM/GLM-Z1-Rumination-32B-0414

A deep reasoning model with rumination capabilities (benchmarked against OpenAI's Deep Research). Employs longer periods of deep thought to solve more open-ended and complex problems (e.g., writing a comparative analysis of AI development in two cities and their future development plans). Integrates search tools during its deep thinking process.

Context: 32,000 tokens

Input: $---/M • Output: $---/M

GLM 4 32B 0414

THUDM/GLM-4-32B-0414

Features 32 billion parameters. Performance is comparable to OpenAI's GPT series and DeepSeek's V3/R1 series. Pre-trained on 15T of high-quality data, including reasoning-type synthetic data. Enhanced performance in instruction following, engineering code, and function calling.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

GLM Z1 32B 0414

THUDM/GLM-Z1-32B-0414

A reasoning model with deep thinking capabilities, based on GLM-4-32B-0414. Further trained on tasks involving mathematics, code, and logic. Significantly improves mathematical abilities and capability to solve complex tasks.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

GLM 4.5 Air Steam v1

GLM-4.5-Air-Steam-v1

Creative writing & RP finetune of GLM 4.5 Air with vivid character voice and more enticing prose. Community feedback: “Steam v1 has got the juice”.

Context: 131,072 tokens

Input: $---/M • Output: $---/M

GLM 4.5 Air Iceblink

GLM-4.5-Air-Iceblink

Experimental GLM 4.5 Air finetune for creative writing & RP; verbose, keeps the model’s original behavior with improved writing, dialogue, and creativity.

Context: 131,072 tokens

Input: $---/M • Output: $---/M

DeepSeek

26 models

Deepseek R1 0528

deepseek-ai/DeepSeek-R1-0528

The new (May 28th) Deepseek R1 model.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Deepseek R1 0528 Qwen3 8B

deepseek-ai/DeepSeek-R1-0528-Qwen3-8B

The new (May 28th) Deepseek R1 model in distilled version. Way cheaper, way faster, yet still extremely performant.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Deepseek R1 T Chimera

tngtech/DeepSeek-R1T-Chimera

Deepseek V3 0324 with R1 reasoning using a novel construction method. In benchmarks, it appears to be as smart as R1 but much faster, using 40% fewer output tokens.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Deepseek TNG R1T2 Chimera

tngtech/DeepSeek-TNG-R1T2-Chimera

Assembly of Experts Chimera model constructed with DeepSeek R1-0528, R1 and V3-0324. This refined tri-mind assembly fixes the <think> token consistency issue, operates ~20% faster than R1 and twice as fast as R1-0528, while being significantly more intelligent than regular R1 on benchmarks like GPQA and AIME-24.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

DeepSeek R1

deepseek-r1

DeepSeek's R1 is a thinking model, scoring very well on all benchmarks at low cost. This version is run via open-source providers, never routing through DeepSeek themselves.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

DeepSeek V3/Deepseek Chat

deepseek-chat

DeepSeek original V3 model, trained on nearly 15 trillion tokens, matches leading closed-source models at a far lower price. Quantized at FP8.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

DeepSeek V3.2 Exp

deepseek-ai/deepseek-v3.2-exp

Deepseek V3.2 Exp, Deepseek's latest model offering far better performance especially on longer contexts than its predecessors. Current flagship model by Deepseek. FP8.

Context: 163,840 tokens

Input: $---/M • Output: $---/M

DeepSeek V3.2 Exp Thinking

deepseek-ai/deepseek-v3.2-exp-thinking

Deepseek V3.2 Exp Thinking, Deepseek's latest model offering far better performance especially on longer contexts than its predecessors. Current flagship model by Deepseek. FP8.

Context: 163,840 tokens

Input: $---/M • Output: $---/M

DeepSeek V3.1 Original

deepseek-v3.1-original

Deepseek V3.1 through the direct Chinese providers! ⚠️ Note: This model runs through DeepSeek directly, so we cannot guarantee no logging.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

DeepSeek V3.2 Exp Original

deepseek-v3.2-exp-original

Experimental DeepSeek V3.2 routed directly via DeepSeek. ⚠️ WARNING: Requests are sent to DeepSeek directly; we cannot guarantee no logging by the provider. FP8.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

DeepSeek V3.2 Exp Thinking Original

deepseek-v3.2-exp-thinking-original

Experimental DeepSeek V3.2 (Thinking) routed directly via DeepSeek. ⚠️ WARNING: Requests are sent to DeepSeek directly; we cannot guarantee no logging by the provider. FP8.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

DeepSeek V3/Chat Cheaper

deepseek-chat-cheaper

Cheaper version of Deepseek V3/Chat. Note: may be routed through Deepseek itself. Quantized at FP8.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

DeepSeek V3.1

deepseek-ai/DeepSeek-V3.1

DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-thinking mode. It does better at tool calling and agent tasks, and has higher thinking efficiency than its predecessor. This is the non-thinking version. Quantized at FP8.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Deepseek V3.1 (Thinking)

deepseek-ai/DeepSeek-V3.1:thinking

Thinking enabled version of Deepseek V3.1. DeepSeek-V3.1 is a hybrid model that supports both thinking mode and non-thinking mode. It does better at tool calling and agent tasks, and has higher thinking efficiency than its predecessor. Quantized at FP8.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

DeepSeek V3.1 Terminus

deepseek-ai/DeepSeek-V3.1-Terminus

DeepSeek-V3.1-Terminus. The latest update builds on V3.1’s strengths while addressing key user feedback. Language consistency improvements (fewer CN/EN mix-ups, no random chars), stronger Code Agent & Search Agent performance, and more stable, reliable outputs across benchmarks. FP8.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

DeepSeek V3.1 Terminus (Thinking)

deepseek-ai/DeepSeek-V3.1-Terminus:thinking

Thinking-enabled DeepSeek-V3.1-Terminus with improved language consistency, upgraded Code/Search Agents, and stronger stability and reliability versus V3.1. FP8.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

DeepSeek Chat 0324

deepseek-v3-0324

DeepSeek V3 0324, DeepSeek's 03 March 2025 V3 model, optimized for general-purpose tasks. Quantized at FP8.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

DeepSeek R1 Zero Preview

deepseek-ai/DeepSeek-R1-Zero

Preview version of Deepseek R1, also known as DeepSeek R1 Zero. Deepseek R1 without the supervised finetuning.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

DeepSeek Reasoner

deepseek-reasoner

DeepSeek-R1 is now live and open source, rivaling OpenAI's Model o1.

Context: 64,000 tokens

Input: $---/M • Output: $---/M

Deepseek R1 Qwen Abliterated

huihui-ai/DeepSeek-R1-Distill-Qwen-32B-abliterated

Uncensored version of the Deepseek R1 Qwen 32B model

Context: 16,384 tokens

Input: $---/M • Output: $---/M

Deepseek R1 Llama 70b Abliterated

huihui-ai/DeepSeek-R1-Distill-Llama-70B-abliterated

Uncensored version of the Deepseek R1 Llama 70B model

Context: 16,384 tokens

Input: $---/M • Output: $---/M

Deepseek R1 Cheaper

deepseek-reasoner-cheaper

Cheaper version of DeepSeek R1. Note: may be routed through Chinese providers.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

DeepSeek R1 Fast

deepseek-r1-sambanova

DeepSeek R1 via Sambanova: the full model with very fast output. Note: max 4k output tokens.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

DeepSeek R1 70B Distill TEE

TEE/deepseek-r1-70b-distill

DeepSeek's R1 model distilled into Llama 70B architecture for improved efficiency. running inside a TEE (Trusted Execution Environment), with verifiably no logging by the provider.

Context: 131,072 tokens

Input: $---/M • Output: $---/M

DeepSeek Chat V3 0324 TEE

TEE/deepseek-chat-v3-0324

DeepSeek V3 0324, DeepSeek's 03 March 2025 V3 model, optimized for general-purpose tasks. Running inside a TEE (Trusted Execution Environment), with verifiably no logging by the provider. Quantized at FP8.

Context: 131,072 tokens

Input: $---/M • Output: $---/M

DeepSeek Prover v2 671B

deepseek/deepseek-prover-v2-671b

Specializing in Mathematical Theorem Proving The new model employs a Mixture of Experts (MoE) architecture and is trained using the Lean 4 framework for formal reasoning. With 671 billion parameters, it leverages reinforcement learning and large-scale synthetic data to significantly enhance automated theorem-proving capabilities.

Context: 160,000 tokens

Input: $---/M • Output: $---/M

Mistral

23 models

Mistral Devstral Small 2505

mistralai/Devstral-Small-2505

OpenHands+Devstral is 100% local 100% open, and is SOTA for the category on SWE-Bench Verified: 46.8% accuracy.

Context: 32,768 tokens

Input: $---/M • Output: $---/M

Mistral Medium 3

mistralai/mistral-medium-3

Mistral Medium 3 delivers frontier performance while being an order of magnitude less expensive. For instance, the model performs at or above 90% of Claude Sonnet 3.7 on benchmarks across the board at a significantly lower cost. On performance, Mistral Medium 3 also surpasses leading open models such as Llama 4 Maverick and enterprise models such as Cohere Command A. On pricing, the model beats cost leaders such as DeepSeek v3, both in API and self-deployed systems.

Context: 131,072 tokens

Input: $---/M • Output: $---/M

Mistral Medium 3.1

mistralai/mistral-medium-3.1

Mistral Medium 3.1 is an updated version of Mistral Medium 3, which is a high-performance enterprise-grade language model designed to deliver frontier-level capabilities at significantly reduced operational cost. It balances state-of-the-art reasoning and multimodal performance with 8× lower cost compared to traditional large models, making it suitable for scalable deployments across professional and industrial use cases. The model excels in domains such as coding, STEM reasoning, and enterprise adaptation. It supports hybrid, on-prem, and in-VPC deployments and is optimized for integration into custom workflows. Mistral Medium 3.1 offers competitive accuracy relative to larger models like Claude Sonnet 3.5/3.7, Llama 4 Maverick, and Command R+, while maintaining broad compatibility across cloud environments.

Context: 131,072 tokens

Input: $---/M • Output: $---/M

QwQ 32b Arli V1

QwQ-32B-ArliAI-RpR-v1

A QwQ 32b finetuned for roleplay and storytelling.

Context: 32,768 tokens

Input: $---/M • Output: $---/M

The Drummer Cydonia 24B v2

TheDrummer/Cydonia-24B-v2

Cydonia 24B v2 is a finetune of Mistral's latest 'Small' model (2501). Aliases: Cydonia 24B, Cydonia v2, Cydonia on that broken base.

Context: 16,384 tokens

Input: $---/M • Output: $---/M

The Drummer Cydonia 24B v4

TheDrummer/Cydonia-24B-v4

Cydonia 24B v4 is the latest iteration of TheDrummer's Cydonia series, a finetune of Mistral Small.

Context: 16,384 tokens

Input: $---/M • Output: $---/M

The Drummer Cydonia 24B v4.1

TheDrummer/Cydonia-24B-v4.1

Cydonia 24B v4.1 is the newest release of TheDrummer's Cydonia series, featuring improved performance and refined capabilities.

Context: 16,384 tokens

Input: $---/M • Output: $---/M

Mistral Large 2411

mistralai/mistral-large

Upgrade to Mistral's flagship model. It is fluent in English, French, Spanish, German, and Italian, with high grammatical accuracy, with a long context window.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

TheDrummer Skyfall 36B V2

thedrummer/skyfall-36b-v2

TheDrummer's Skyfall 36B V2, a 36B parameter model with a focus on high quality and consistency.

Context: 64,000 tokens

Input: $---/M • Output: $---/M

Mistral Small 3.1 24B

TEE/mistral-small-3-1-24b

Mistral's small model with 24B parameters. Secure inference with encrypted inputs and outputs, running inside a TEE (Trusted Execution Environment), with verifiably no logging by the provider.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Mistral Tiny

mistralai/mistral-tiny

Powered by Mistral-7B-v0.2, best used for large batch processing tasks where cost is a significant factor but reasoning capabilities are not crucial.

Context: 32,000 tokens

Input: $---/M • Output: $---/M

Mistral Saba

mistralai/mistral-saba

Mistral Saba is a 24B-parameter language model specifically designed for the Middle East and South Asia, delivering accurate and contextually relevant responses while maintaining efficient performance. Trained on curated regional datasets, it supports multiple Indian-origin languages—including Tamil and Malayalam—alongside Arabic. This makes it a versatile option for a range of regional and multilingual applications.

Context: 32,000 tokens

Input: $---/M • Output: $---/M

Mistral 7B Instruct

mistralai/mistral-7b-instruct

Optimized for speed with decent context length

Context: 32,768 tokens

Input: $---/M • Output: $---/M

Mistral Nemo

mistralai/Mistral-Nemo-Instruct-2407

12B parameter model with multilingual support.

Context: 16,384 tokens

Input: $---/M • Output: $---/M

Rocinante 12b

TheDrummer/Rocinante-12B-v1.1

Designed for engaging storytelling and rich prose. Expanded vocabulary with unique and expressive word choices, enhanced creativity and captivating stories.

Context: 16,384 tokens

Input: $---/M • Output: $---/M

UnslopNemo 12b v4

TheDrummer/UnslopNemo-12B-v4.1

UnslopNemo v4 is the previous version from the creator of Rocinante, designed for adventure writing and role-play scenarios.

Context: 32,768 tokens

Input: $---/M • Output: $---/M

NemoMix 12B Unleashed

MarinaraSpaghetti/NemoMix-Unleashed-12B

Great for RP and storytelling.

Context: 32,768 tokens

Input: $---/M • Output: $---/M

Mistral Nemo Starcannon 12b v1

VongolaChouko/Starcannon-Unleashed-12B-v1.0

Mistral Nemo finetine that offers improvements on roleplay.

Context: 16,384 tokens

Input: $---/M • Output: $---/M

Mistral Nemo Inferor 12B

Infermatic/MN-12B-Inferor-v0.0

Inferor is a merge of top roleplay models, expert on immersive narratives and storytelling.

Context: 16,384 tokens

Input: $---/M • Output: $---/M

Mistral Small 31 24b Instruct

mistral-small-31-24b-instruct

Building upon Mistral Small 3 (2501), Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance. With 24 billion parameters, this model achieves top-tier capabilities in both text and vision tasks.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Mistral Small 3.2 24b Instruct

chutesai/Mistral-Small-3.2-24B-Instruct-2506

The latest iteration of Mistral Small, version 3.2 (2506) brings enhanced performance and capabilities. With 24 billion parameters, this model delivers state-of-the-art results across text generation tasks with improved efficiency.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Magistral Small 2506

Magistral-Small-2506

Magistral Small is a compact, high-performance language model optimized for efficient inference while maintaining strong capabilities across various tasks.

Context: 32,768 tokens

Input: $---/M • Output: $---/M

Mistral Nemo 12B Instruct 2407

Mistral-Nemo-12B-Instruct-2407

Mistral Nemo 12B Instruct 2407

Context: 16,384 tokens

Input: $---/M • Output: $---/M

Qwen

19 models

Qwen 3 Coder 480B

qwen/qwen3-coder

Qwen 3 Coder 480B, a 480 billion total parameter model with 35B active, and 160 total experts with 8 active. Performs similar to Claude 4 Sonnet in coding benchmarks, but does so at a much lower price.

Context: 262,000 tokens

Input: $---/M • Output: $---/M

Qwen3 Coder Plus

qwen/qwen3-coder-plus

Alibaba’s proprietary upgrade to the open‑weights Qwen3 Coder 480B A35B. A coding‑first agent model with strong tool use and environment control for autonomous programming, while remaining capable at general tasks.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Qwen3 Coder Flash

qwen/qwen3-coder-flash

A speed‑optimized and budget‑friendly sibling to Coder Plus. Excellent at code generation and agentic workflows (tool use, environment interaction) with solid general‑purpose ability.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Qwen 3 32b

qwen/qwen3-32b

Qwen 3 32b is a 32b model. Supports switching between thinking and non thinking: trigger thinking with /think and /no_think anywhere in a prompt or system message to toggle chain-of-thought reasoning.

Context: 41,000 tokens

Input: $---/M • Output: $---/M

Qwen 3 14b

qwen/qwen3-14b

Qwen 3 14b is a 14b model. Supports switching between thinking and non thinking: trigger thinking with /think and /no_think anywhere in a prompt or system message to toggle chain-of-thought reasoning.

Context: 41,000 tokens

Input: $---/M • Output: $---/M

Qwen3 30B A3B

qwen/qwen3-30b-a3b

Qwen 3 30b A3B is a 30b model with 3 billion active parameters per pass. Supports switching between thinking and non thinking: trigger thinking with /think and /no_think anywhere in a prompt or system message to toggle chain-of-thought reasoning.

Context: 41,000 tokens

Input: $---/M • Output: $---/M

Qwen3 VL 235B A22B Thinking

qwen3-vl-235b-a22b-thinking

Qwen3 Vision‑Language model built on a 235B MoE backbone (≈22B active per token). Strong at OCR, charts/tables, multi‑image reasoning, and complex document understanding. The Thinking variant enables long‑form, chain‑of‑thought style reasoning.

Context: N/A tokens

Input: $---/M • Output: $---/M

Qwen3 VL 235B A22B Instruct Original

qwen3-vl-235b-a22b-instruct-original

Note: direct via Alibaba, a Chinese entity - privacy and logging guarantees may be limited. Qwen3 Vision‑Language model (235B MoE, ≈22B active) tuned for instruction following and grounded visual QA. Excels at image understanding, dense OCR, charts and diagrams, and multi‑image context. Use this variant when you want concise, direct answers grounded in the visuals.

Context: N/A tokens

Input: $---/M • Output: $---/M

Qwen3 Max

qwen/qwen3-max

Qwen3 Max. The latest Qwen 3 model (5 september 2025). Higher accuracy in coding and science, better instruction following, and optimized for tool calling.

Context: 256,000 tokens

Input: $---/M • Output: $---/M

Qwen3 Coder 30B A3B Instruct

qwen3-coder-30b-a3b-instruct

Qwen3 Coder 30B with 3B active parameters, optimized for code generation and technical tasks

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Qwen 3 235b A22B

Qwen/Qwen3-235B-A22B

Qwen 3 235b is a 235b model with 22B active parameters. Supports switching between thinking and non thinking: trigger thinking with /think and /no_think anywhere in a prompt or system message to toggle chain-of-thought reasoning.

Context: 41,000 tokens

Input: $---/M • Output: $---/M

Qwen 3 235b A22B 2507

Qwen/Qwen3-235B-A22B-Instruct-2507

Qwen 3 235b A22B Instruct 2507 the updated version of Qwen3 235B A22B, with significant improvements in performance. This model is non-thinking.

Context: 256,000 tokens

Input: $---/M • Output: $---/M

Qwen 3 235b A22B 2507 Thinking

Qwen/Qwen3-235B-A22B-Thinking-2507

The thinking version of Qwen 3 235b A22B 2507, with enhanced reasoning capabilities and step-by-step problem solving.

Context: 256,000 tokens

Input: $---/M • Output: $---/M

Qwen3 Next 80B A3B (Instruct)

Qwen/Qwen3-Next-80B-A3B-Instruct

Based on the new Qwen3‑Next architecture (hybrid attention, highly sparse MoE, training‑stability optimizations, and multi‑token prediction), the Qwen3‑Next‑80B‑A3B‑Instruct model delivers extreme efficiency with only 3B active parameters per pass. It performs comparably to Qwen3‑235B‑A22B‑Instruct‑2507 and shows clear advantages on ultra‑long context tasks (up to 256K tokens).

Context: 256,000 tokens

Input: $---/M • Output: $---/M

Qwen3 30B A3B Instruct 2507

qwen3-30b-a3b-instruct-2507

Qwen3-30B-A3B-Instruct-2507 is a 30.5B-parameter mixture-of-experts language model from Qwen, with 3.3B active parameters per inference. Significant improvements in general capabilities, including instruction following, logical reasoning, text comprehension, mathematics, science, coding and tool usage.

Context: 256,000 tokens

Input: $---/M • Output: $---/M

Qwen 3 8B

Qwen/Qwen3-8B

Qwen 3 8B is a 8B model. Supports switching between thinking and non thinking: trigger thinking with /think and /no_think anywhere in a prompt or system message to toggle chain-of-thought reasoning.

Context: 41,000 tokens

Input: $---/M • Output: $---/M

Qwen25 VL 72b

qwen25-vl-72b-instruct

Qwen25 VL 72b model with 32k context window

Context: 32,000 tokens

Input: $---/M • Output: $---/M

Qwen3 VL 235B A22B Instruct

Qwen/Qwen3-VL-235B-A22B-Instruct

Qwen3 Vision‑Language model (235B MoE, ≈22B active) tuned for instruction following and grounded visual QA. Excels at image understanding, dense OCR, charts/diagrams, and multi‑image context.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

QwenLong L1 32B

Tongyi-Zhiwen/QwenLong-L1-32B

The first long-context LRM trained with reinforcement learning for long-context reasoning. Outperforms flagship models like o3-mini and achieves performance on par with Claude 3.7 Sonnet Thinking, demonstrating leading performance for long-context document QA tasks.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Alibaba

16 models

Qwen: QvQ Max

qvq-max

QvQ Max is the top model of the Qwen series. QvQ Max is capable of thinking and reasoning, can achieve significantly enhanced performance especially on hard problems.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Qwen: QwQ 32B

qwq-32b

QwQ is the reasoning model of the Qwen series. QwQ is capable of thinking and reasoning, can achieve significantly enhanced performance especially on hard problems.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Tongyi DeepResearch 30B A3B

Alibaba-NLP/Tongyi-DeepResearch-30B-A3B

Tongyi DeepResearch, an agentic large language model featuring 30 billion total parameters, with only 3 billion activated per token. Developed by Tongyi Lab, the model is specifically designed for long‑horizon, deep information‑seeking tasks. Tongyi‑DeepResearch demonstrates state‑of‑the‑art performance across agentic search benchmarks, including Humanity's Last Exam, BrowserComp, BrowserComp‑ZH, WebWalkerQA, GAIA, xbench‑DeepSearch and FRAMES.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Qwen QwQ 32B Preview

qwen/qwq-32b-preview

Experimental release of Qwen's reasoning model. Great at coding and math, but still in development so may exhibit odd bugs. Not production-ready.

Context: 32,768 tokens

Input: $---/M • Output: $---/M

Dolphin 72b

cognitivecomputations/dolphin-2.9.2-qwen2-72b

Dolphin is the most uncensored model yet, built on top of Qwen's 72b model.

Context: 8,192 tokens

Input: $---/M • Output: $---/M

Grayline Qwen3 8B

soob3123/GrayLine-Qwen3-8B

Grayline is an neutral AI assistant engineered for uncensored information delivery and task execution. This model operates without inherent ethical or moral frameworks, designed to process and respond to any query with objective efficiency and precision. Grayline's core function is to leverage its full capabilities to provide direct answers and execute tasks as instructed, without offering unsolicited commentary, warnings, or disclaimers. It accesses and processes information without bias or restriction.

Context: 16,384 tokens

Input: $---/M • Output: $---/M

Qwen Turbo

qwen-turbo

Alibaba's fastest and cheapest model. Suitable for simple tasks, fast and low cost, with a 1 million token context window.

Context: 1,000,000 tokens

Input: $---/M • Output: $---/M

Qwen 2.5 Max

qwen-max

Qwen 2.5 Max is the upgraded version of Qwen Max, beating GPT-4o, Deepseek V3 and Claude 3.5 Sonnet in benchmarks.

Context: 32,000 tokens

Input: $---/M • Output: $---/M

Qwen Plus

qwen-plus

Alibaba's balanced model. Fast, cheap, yet still very powerful.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Qwen Long 10M

qwen-long

Alibaba's huge context window model. Takes in up to 10 million tokens, which is equivalent to dozens of books.

Context: 10,000,000 tokens

Input: $---/M • Output: $---/M

Qwen 2.5 Coder 32b

Qwen/Qwen2.5-Coder-32B-Instruct

The latest series of Code-Specific Qwen large language models.

Context: 32,000 tokens

Input: $---/M • Output: $---/M

Qwen2.5 72B

qwen/qwen-2.5-72b-instruct

Great multilingual support, strong at mathematics and coding, supports roleplay and chatbots.

Context: 131,072 tokens

Input: $---/M • Output: $---/M

Qwen 2.5 72B

TEE/qwen2-5-72b

Alibaba's Qwen 2.5 with 72B parameters. Secure inference with encrypted inputs and outputs, running inside a TEE (Trusted Execution Environment), with verifiably no logging by the provider.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Qwen3 Coder 480B TEE

TEE/qwen3-coder

Qwen3 Coder 480B is specialized for coding with a large MoE architecture. Secure inference with encrypted inputs/outputs inside a TEE (Trusted Execution Environment), with verifiably no logging by the provider.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Qwen 2.5 32b EVA

Qwen2.5-32B-EVA-v0.2

A Qwen 2.5 32b finetuned for roleplay and storytelling.

Context: 24,576 tokens

Input: $---/M • Output: $---/M

Cogito v1 Preview Qwen 32B

deepcogito/cogito-v1-preview-qwen-32B

32-B parameter reasoning model from DeepCogito (Qwen backbone) – strong general reasoning & coding at low price.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Baidu

9 models

Ernie 4.5 8k Preview

ernie-4.5-8k-preview

ERNIE 4.5 is Baidu's new generation native multimodal foundation model independently developed by the company. It achieves collaborative optimization through joint modeling of multiple modalities, demonstrating exceptional multimodal comprehension capabilities. With refined language skills, it exhibits comprehensive improvements in understanding, generation, reasoning and memory, along with notable enhancements in hallucination prevention, logical reasoning, and coding abilities. ⚠️ Note: This model routes through Baidu, a Chinese entity - privacy and logging guarantees may be limited.

Context: 8,000 tokens

Input: $---/M • Output: $---/M

Ernie X1 32k

ernie-x1-32k-preview

ERNIE X1 is a Baidu model, surpassing earlier versions in terms of intelligence and maximum input/output size. ⚠️ Note: This model routes through Baidu, a Chinese entity - privacy and logging guarantees may be limited.

Context: 32,000 tokens

Input: $---/M • Output: $---/M

Ernie X1 Turbo 32k

ernie-x1-turbo-32k

ERNIE X1 is a deep-thinking reasoning model, outperforming DeepSeek R1 and the latest version of V3. ⚠️ Note: This model routes through Baidu, a Chinese entity - privacy and logging guarantees may be limited.

Context: 32,000 tokens

Input: $---/M • Output: $---/M

Ernie 4.5 Turbo VL 32k

ernie-4.5-turbo-vl-32k

ERNIE 4.5 Turbo demonstrates overall progress in hallucination reduction, logical reasoning, and coding abilities, with faster response. The multimodal capabilities of ERNIE 4.5 Turbo are on par with GPT-4.1 and superior to GPT-4o across multiple benchmarks. ⚠️ Note: This model routes through Baidu, a Chinese entity - privacy and logging guarantees may be limited.

Context: 32,000 tokens

Input: $---/M • Output: $---/M

Ernie 4.5 Turbo 128k

ernie-4.5-turbo-128k

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Ernie X1 32k

ernie-x1-32k

Context: 32,000 tokens

Input: $---/M • Output: $---/M

ERNIE 4.5 300B

baidu/ernie-4.5-300b-a47b

ERNIE-4.5-300B-A47B is a 300B parameter Mixture-of-Experts (MoE) language model developed by Baidu as part of the ERNIE 4.5 series. It activates 47B parameters per token and supports text generation in both English and Chinese. Optimized for high-throughput inference and efficient scaling, it uses a heterogeneous MoE structure with advanced routing and quantization strategies, including FP8 and 2-bit formats. This version is fine-tuned for language-only tasks and supports reasoning, tool parameters, and extended context lengths up to 131k tokens. Suitable for general-purpose LLM applications with high reasoning and throughput demands. ⚠️ Note: This model routes through Baidu, a Chinese entity - privacy and logging guarantees may be limited.

Context: 131,072 tokens

Input: $---/M • Output: $---/M

ERNIE X1.1

ernie-x1.1-preview

The Wenxin large model X1.1 has made significant improvements in question answering, tool invocation, intelligent agents, instruction following, logical reasoning, mathematics, and coding tasks, with notable enhancements in factual accuracy. The context length has been extended to 64K tokens, supporting longer inputs and dialogue history, which improves the coherence of long-chain reasoning while maintaining response speed. ⚠️ Note: This model routes through Baidu (China) — privacy and logging guarantees may be limited.

Context: 64,000 tokens

Input: $---/M • Output: $---/M

ERNIE 4.5 VL 28B

baidu/ernie-4.5-vl-28b-a3b

ERNIE 4.5 VL is a multimodal model from Baidu that supports both text and vision tasks. This 28B parameter model with A3B architecture delivers strong performance on various benchmarks.

Context: 32,768 tokens

Input: $---/M • Output: $---/M

Doubao

9 models

Doubao 1.5 Pro 256k

doubao-1.5-pro-256k

Doubao's (Bytedance) flagship model with a 256k token context window. ⚠️ Note: This model routes through ByteDance, a Chinese entity - privacy and logging guarantees may be limited.

Context: 256,000 tokens

Input: $---/M • Output: $---/M

Doubao 1.5 Thinking Pro

doubao-1-5-thinking-pro-250415

Doubao-1.5 is a new deep thinking model that performs well in professional fields such as mathematics, programming, scientific reasoning, and general tasks such as creative writing. It has achieved outstanding results in AIME 2024, Codeforces, GPQA and other authoritative benchmarks have reached or are close to the first-tier level in the industry. ⚠️ Note: This model routes through ByteDance, a Chinese entity - privacy and logging guarantees may be limited.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Doubao 1.5 Thinking Vision Pro

doubao-1-5-thinking-vision-pro-250428

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Doubao 1.5 Thinking Pro Vision

doubao-1-5-thinking-pro-vision-250415

Vision version of Doubao-1.5 pro thinking which is a new deep thinking model that performs well in professional fields such as mathematics, programming, scientific reasoning, and general tasks such as creative writing. It has achieved outstanding results in AIME 2024, Codeforces, GPQA and other authoritative benchmarks have reached or are close to the first-tier level in the industry. ⚠️ Note: This model routes through ByteDance, a Chinese entity - privacy and logging guarantees may be limited.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Doubao 1.5 Pro 32k

doubao-1.5-pro-32k

Doubao's (Bytedance) pro model with a 32k token context window. ⚠️ Note: This model routes through ByteDance, a Chinese entity - privacy and logging guarantees may be limited.

Context: 32,000 tokens

Input: $---/M • Output: $---/M

Doubao 1.5 Vision Pro 32k

doubao-1.5-vision-pro-32k

Doubao's (Bytedance) vision-enabled pro model (JPG only) with a 32k token context window. ⚠️ Note: This model routes through ByteDance, a Chinese entity - privacy and logging guarantees may be limited.

Context: 32,000 tokens

Input: $---/M • Output: $---/M

Doubao Seed 1.6

doubao-seed-1-6-250615

Doubao-Seed-1.6 is a brand-new multimodal deep thinking model that supports three thinking modes: auto, thinking, and non-thinking. In non-thinking mode, the model's performance is significantly improved compared to Doubao-1.5-pro/250115. It supports a 256k context window and an output length of up to 16k tokens. ⚠️ Note: This model routes through ByteDance, a Chinese entity - privacy and logging guarantees may be limited.

Context: 256,000 tokens

Input: $---/M • Output: $---/M

Doubao Seed 1.6 Flash

doubao-seed-1-6-flash-250615

Doubao-Seed-1.6-flash is an extremely fast multimodal deep thinking model, with TPOT requiring only 10ms. It supports both text and visual understanding, with its text comprehension skills surpassing the previous generation lite model and its visual understanding on par with competitor's pro series models. It supports a 256k context window and an output length of up to 16k tokens. ⚠️ Note: This model routes through ByteDance, a Chinese entity - privacy and logging guarantees may be limited.

Context: 256,000 tokens

Input: $---/M • Output: $---/M

Doubao Seed 1.6 Thinking

doubao-seed-1-6-thinking-250615

The Doubao-Seed-1.6-thinking model has significantly enhanced reasoning capabilities. Compared with Doubao-1.5-thinking-pro, it has further improvements in fundamental abilities such as coding, mathematics, and logical reasoning, and now also supports visual understanding. It supports a 256k context window, with output length supporting up to 16k tokens. ⚠️ Note: This model routes through ByteDance, a Chinese entity - privacy and logging guarantees may be limited.

Context: 256,000 tokens

Input: $---/M • Output: $---/M

NanoGPT

7 models

Study Mode

study_gpt-chatgpt-4o-latest

Study mode uses custom instructions with ChatGPT 4o to maximize learning by encouraging participation, using self reflection and fostering curiosity with supportive feedback. ⚠️ WARNING: OpenAI may retain and use data sent to this model for training purposes.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

Auto model

auto-model

Automatically uses the best model for your task. Categorizes the prompt, then uses the model that performs best in that particular category according to global user preferences. Scores updated daily. Ability to set pricing tier in Adjust Settings.

Context: 1,000,000 tokens

Input: $---/M • Output: $---/M

Auto model (Basic)

auto-model-basic

Automatically uses the best model for your task, always with the Basic pricing tier.

Context: 1,000,000 tokens

Input: $---/M • Output: $---/M

Auto model (Standard)

auto-model-standard

Automatically uses the best model for your task, always with the Standard pricing tier.

Context: 1,000,000 tokens

Input: $---/M • Output: $---/M

Auto model (Premium)

auto-model-premium

Automatically uses the best model for your task, always with the Premium pricing tier.

Context: 1,000,000 tokens

Input: $---/M • Output: $---/M

Model Recommender

model-selector

Model Recommender - input your query and it will recommend the best model for your task, giving three different options in different price classes.

Context: 1,000,000 tokens

Input: $---/M • Output: $---/M

Free model

free-model

Free model to try out our service with.

Context: 8,000 tokens

Input: $---/M • Output: $---/M

Perplexity

6 models

Perplexity Deep Research

sonar-deep-research

Analyzes hundreds of sources, delivering expert-level insights in minutes. Deep Research API has a 93.9% accuracy on SimpleQA benchmark and attains a score of 21.1% accuracy on Humanity's Last Exam, significantly outperforming Gemini Thinking, o3-mini, o1, and DeepSeek-R1.

Context: 60,000 tokens

Input: $---/M • Output: $---/M

Perplexity Pro

sonar-pro

Sonar Pro tackles complex questions that need deeper research and provides more sources.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

Perplexity Reasoning Pro

sonar-reasoning-pro

Perplexity's Sonar Reasoning Pro uses DeepSeek R1's thinking process combined with looking up on the web to tackle complex questions that need deeper research and provides more sources.

Context: 127,000 tokens

Input: $---/M • Output: $---/M

Perplexity Reasoning

sonar-reasoning

Perplexity's Sonar Reasoning uses DeepSeek R1's thinking process combined with looking up on the web to tackle complex questions that need deeper research and provides more sources.

Context: 127,000 tokens

Input: $---/M • Output: $---/M

Perplexity Simple

sonar

A Perplexity model that gives fast, straightforward answers.

Context: 127,000 tokens

Input: $---/M • Output: $---/M

Perplexity R1 1776

r1-1776

R1 1776 is a version of the DeepSeek R1 model that has been post-trained by Perplexity to provide uncensored, unbiased, and factual information.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Moonshot AI

6 models

Kimi Thinking Preview

kimi-thinking-preview

Kimi Thinking Preview is a new model that is capable of thinking and reasoning. It's quite expensive!

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Kimi K2 0905

moonshotai/Kimi-K2-Instruct-0905

Kimi K2 0905. Kimi-k2 is a Mixture-of-Experts (MoE) foundation model with exceptional coding and agent capabilities, featuring 1 trillion total parameters and 32 billion activated parameters. In benchmark evaluations covering general knowledge reasoning, programming, mathematics, and agent-related tasks, the K2 model outperforms other leading open-source models. Quantized at FP8.

Context: 256,000 tokens

Input: $---/M • Output: $---/M

Kimi K2 0711

moonshotai/kimi-k2-instruct-0711

Kimi K2 0711 version. Kimi-k2 is a Mixture-of-Experts (MoE) foundation model with exceptional coding and agent capabilities, featuring 1 trillion total parameters and 32 billion activated parameters. In benchmark evaluations covering general knowledge reasoning, programming, mathematics, and agent-related tasks, the K2 model outperforms other leading open-source models. Quantized at FP8.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Kimi Dev 72B

moonshotai/Kimi-Dev-72B

Kimi Dev 72B is a 72B parameter model from Moonshot AI. 60.4% performance on SWE bench Verified, which is as of June 16th 2025 state of the art for open source models.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Kimi K2 0711 Instruct FP4

baseten/Kimi-K2-Instruct-FP4

Kimi K2 Instruct with FP4 quantization for faster inference while maintaining quality.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Kimi K2 0711 Fast

kimi-k2-instruct-fast

Moonshot AI's Kimi K2 model optimized for fast inference. Excellent for chat, reasoning, and general tasks. Quantized at FP8.

Context: 131,072 tokens

Input: $---/M • Output: $---/M

Nous Research

6 models

Hermes 4 (Thinking)

NousResearch/Hermes-4-70B:thinking

Hermes 4 70B with thinking enabled. Emits explicit reasoning content before final answer when streamed.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Hermes 3 Large

nousresearch/hermes-3-llama-3.1-405b

Llama 3.1 405b with the brakes taken off. Less censored than the regular version, but not abliterated

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Hermes 4 Large

nousresearch/hermes-4-405b

Advanced reasoning model built on Llama-3.1-405B with hybrid thinking modes. Features internal deliberation capabilities, excels at math, code, STEM, and logical reasoning while supporting structured outputs, function calling, and tool use with improved steerability and neutral alignment.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Hermes 4 Medium

nousresearch/hermes-4-70b

Efficient reasoning model based on Llama-3.1-70B. Offers hybrid thinking capabilities with strong performance in math, code, and logical reasoning tasks. Supports structured outputs, JSON mode, and function calling with enhanced steerability.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Hermes 3 Large

hermes-3-llama-3.1-405b

Hermes 3 Llama 3.1 405B for uncensored creative exploration.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

DeepHermes-3 Mistral 24B (Preview)

NousResearch/DeepHermes-3-Mistral-24B-Preview

24-B parameter Mistral model fine-tuned by NousResearch for balanced reasoning & creativity.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

NVIDIA

6 models

Nvidia Nemotron 70b

nvidia/Llama-3.1-Nemotron-70B-Instruct-HF

Nvidia's latest Llama fine-tune optimized for instruction following. Early results hints that it might outperform models such as GPT-4o and Claude 3.5 Sonnet.

Context: 16,384 tokens

Input: $---/M • Output: $---/M

Nemotron 3.1 70B abliterated

huihui-ai/Llama-3.1-Nemotron-70B-Instruct-HF-abliterated

An abliterated (removed restrictions and censorship) version of Llama 3.1 70b Nemotron.

Context: 16,384 tokens

Input: $---/M • Output: $---/M

Nvidia Nemotron Super 49B

nvidia/Llama-3.3-Nemotron-Super-49B-v1

Llama-3.3-Nemotron-Super-49B-v1 is a model which offers a great tradeoff between model accuracy and efficiency. Efficiency (throughput) directly translates to savings. Using a novel Neural Architecture Search (NAS) approach, we greatly reduce the model's memory footprint, enabling larger workloads, as well as fitting the model on a single GPU at high workloads (H200). This NAS approach enables the selection of a desired point in the accuracy-efficiency tradeoff. For more information on the NAS approach, please refer to this paper.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Nvidia Nemotron Super 49B v1.5

nvidia/Llama-3_3-Nemotron-Super-49B-v1_5

Advanced 49B parameter reasoning model based on Llama 3.3 architecture, optimized through Neural Architecture Search with reasoning ON/OFF modes. Trained via knowledge distillation using synthetic data from advanced models. Supports 128K context window with improved efficiency over v1.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Nvidia Nemotron Ultra 253B

nvidia/Llama-3.1-Nemotron-Ultra-253B-v1

Llama-3.1-Nemotron-Ultra-253B-v1 is a large language model (LLM) which is a derivative of Meta Llama-3.1-405B-Instruct (AKA the reference model). It is a reasoning model that is post trained for reasoning, human chat preferences, and tasks, such as RAG and tool calling. The model supports a context length of 128K tokens. This model fits on a single 8xH100 node for inference. Llama-3.1-Nemotron-Ultra-253B-v1 is a model which offers a great tradeoff between model accuracy and efficiency. Efficiency (throughput) directly translates to savings. Using a novel Neural Architecture Search (NAS) approach, we greatly reduce the model's memory footprint, enabling larger workloads, as well as reducing the number of GPUs required to run the model in a data center environment. This NAS approach enables the selection of a desired point in the accuracy-efficiency tradeoff. Furthermore, by using a novel method to vertically compress the model (see details here), it also offers a significant improvement in latency.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Nvidia Nemotron Nano 9B v2

nvidia/nvidia-nemotron-nano-9b-v2

Nvidia's efficient 9B parameter model optimized for speed and cost. Nemotron Nano v2 offers excellent performance for its size with enhanced instruction following capabilities.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

01.AI

5 models

Yi Lightning

yi-lightning

Chinese-developed multilingual (English, Chinese and others) model by 01.ai that's very fast and cheap, yet scores high on independent leaderboards.

Context: 12,000 tokens

Input: $---/M • Output: $---/M

Yi Large

yi-large

Large version of Yi Lightning with a 32k context window, but more expensive.

Context: 32,000 tokens

Input: $---/M • Output: $---/M

Yi Medium 200k

yi-medium-200k

Medium version of Yi with a 200k context window.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

Yi Medium 200k

yi-34b-chat-200k

Medium version of Yi Lightning with a huge 200k context window

Context: 16,000 tokens

Input: $---/M • Output: $---/M

Yi Spark

yi-34b-chat-0205

Small and powerful, lightweight and fast model. Provides enhanced mathematical operation and code writing capabilities.

Context: 16,000 tokens

Input: $---/M • Output: $---/M

Microsoft Azure

4 models

Microsoft Deepseek R1

microsoft/MAI-DS-R1-FP8

MAI-DS-R1 is a DeepSeek-R1 reasoning model that has been post-trained by the Microsoft AI team to improve its responsiveness on blocked topics and its risk profile, while maintaining its reasoning capabilities and competitive performance.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Phi 4 Multimodal

phi-4-multimodal-instruct

Phi 4 by Microsoft. A small multimodal model that can handle images and text.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Phi 4 Mini

phi-4-mini-instruct

Phi 4 Mini by Microsoft. A small multilingual model.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

WizardLM-2 8x22B

microsoft/wizardlm-2-8x22b

Microsoft's advanced Wizard model. The most popular role-playing model.

Context: 65,536 tokens

Input: $---/M • Output: $---/M

MiniMax

4 models

MiniMax 01

minimax/minimax-01

MiniMax's flagship model with a 1M token context window

Context: 1,000,192 tokens

Input: $---/M • Output: $---/M

MiniMax M1

MiniMax-M1

MiniMax-M1 is a hybrid MoE reasoning model with 40K thinking budget. World's first open-weight, large-scale hybrid-attention model with lightning attention for efficient test-time compute scaling. Excels at complex tasks requiring extensive reasoning.

Context: 1,000,000 tokens

Input: $---/M • Output: $---/M

MiniMax M1 80K

MiniMaxAI/MiniMax-M1-80k

MiniMax-M1 with 80K thinking budget. Enhanced version of the hybrid MoE reasoning model with double the thinking capacity. Ideal for extremely complex software engineering, tool use, and long context tasks.

Context: 1,000,000 tokens

Input: $---/M • Output: $---/M

MiniMax M2

MiniMax-M2

MiniMax M2 offers enhanced reasoning and strong general performance. Optimized for coding and agentic workflows.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

StepFun

4 models

Step-2 16k Exp

step-2-16k-exp

Step-2 16k Exp is a 16k context window model

Context: 16,000 tokens

Input: $---/M • Output: $---/M

Step-2 Mini

step-2-mini

MiniMax's flagship model with a 1M token context window

Context: 8,000 tokens

Input: $---/M • Output: $---/M

Step-3

step-3

Step3 is a cutting-edge multimodal reasoning model—built on a Mixture-of-Experts architecture with 321B total parameters and 38B active. It is designed end-to-end to minimize decoding costs while delivering top-tier performance in vision–language reasoning. Through the co-design of Multi-Matrix Factorization Attention (MFA) and Attention-FFN Disaggregation (AFD), Step3 maintains exceptional efficiency across both flagship and low-end accelerators.

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Step R1 V Mini

step-r1-v-mini

Step-R1-V-Mini, which supports image and text input, text output, has good instruction following and general capabilities, can perceive images with high precision and complete complex reasoning tasks.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

AionLabs

3 models

Aion 1.0 mini (DeepSeek)

aion-labs/aion-1.0-mini

A distilled version of the DeepSeek-R1 model that excels in reasoning domains like mathematics, coding, and logic.

Context: 131,072 tokens

Input: $---/M • Output: $---/M

Aion 1.0

aion-labs/aion-1.0

Aion Labs most powerful reasoning model with high performance across reasoning and coding.

Context: 65,536 tokens

Input: $---/M • Output: $---/M

Llama 3.1 8b (uncensored)

aion-labs/aion-rp-llama-3.1-8b

This is a truly uncensored model, trained to excel at roleplaying and creative writing. However, it can also do other things!

Context: 32,768 tokens

Input: $---/M • Output: $---/M

Amazon

3 models

Amazon Nova Pro 1.0

amazon/nova-pro-v1

Amazon's new flagship model. Can handle up to 300k input tokens, with comparable performance to ChatGPT and Claude 3.5 Sonnet.

Context: 300,000 tokens

Input: $---/M • Output: $---/M

Amazon Nova Lite 1.0

amazon/nova-lite-v1

Amazon's new lower cost model. Can handle up to 300k input tokens, with faster output but less thorough understanding than Amazon's Nova Pro.

Context: 300,000 tokens

Input: $---/M • Output: $---/M

Amazon Nova Micro 1.0

amazon/nova-micro-v1

Amazon's lowest cost model. Comparable to GPT-4o-mini and Gemini 1.5 Flash, with the fastest output.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Cohere

3 models

Cohere: Command R

cohere/command-r

35B parameter model that performs conversational language tasks at a higher quality, more reliably, and with a longer context than previous models. It can be used for complex workflows like code generation, retrieval augmented generation (RAG), tool use, and agents

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Cohere: Command R+

cohere/command-r-plus-08-2024

104B parameter model that performs conversational language tasks at a higher quality, more reliably, and with a longer context than previous models. It can be used for complex workflows like code generation, retrieval augmented generation (RAG), tool use, and agents

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Cohere Command A (08/2025)

command-a-reasoning-08-2025

Cohere's first reasoning model designed for enterprise customer service and automation. 111B parameters with tool-use capabilities, supports 256K context and 23 languages including English, French, Spanish, Japanese, Arabic, and Hindi. Optimized for document processing, scheduling, data analysis, and more.

Context: 256,000 tokens

Input: $---/M • Output: $---/M

Tencent

2 models

Hunyuan T1

hunyuan-t1-latest

Hunyuan T1 is Tencent's top tier reasoning model. Good at large scale reasoning, precise following of complex instructions, low hallucations and blazing fast outputs.

Context: 256,000 tokens

Input: $---/M • Output: $---/M

Hunyuan Turbo S

hunyuan-turbos-20250226

Hunyuan Turbo S by Tencent is a thinking model that responds instantly.

Context: 24,000 tokens

Input: $---/M • Output: $---/M

Inflection

2 models

Inflection 3 Pi

inflection/inflection-3-pi

A chatbot with emotional intelligence. Has access to recent news, excels in scenarios like customer support and roleplay. Mirrors your conversation style.

Context: 8,000 tokens

Input: $---/M • Output: $---/M

Inflection 3 Productivity

inflection/inflection-3-productivity

Optimized for instruction following. Good at tasks that require precise adherence to provided guidelines. Has access to recent news.

Context: 8,000 tokens

Input: $---/M • Output: $---/M

DMind

2 models

DMind-1

dmind/dmind-1

Web3-specialized LLM fine-tuned using SFT and RLHF on curated Web3 data. Integrates deep knowledge across DeFi, DAOs, security, and smart contracts. Note: prompts are logged by DMind for model optimization and fine-tuning purposes. Logs are retained for 7 days, then deleted

Context: 32,768 tokens

Input: $---/M • Output: $---/M

DMind-1-Mini

dmind/dmind-1-mini

Mini version of DMind, the Web3-specialized LLM fine-tuned using SFT and RLHF on curated Web3 data. Integrates deep knowledge across DeFi, DAOs, security, and smart contracts. Note: prompts are logged by DMind for model optimization and fine-tuning purposes. Logs are retained for 7 days, then deleted

Context: 32,768 tokens

Input: $---/M • Output: $---/M

Fetch.AI

1 models

ASI1 Mini

asi1-mini

ASI-1 Mini introduces next-level adaptive reasoning, context-aware decision-making. It features native reasoning support with four dynamic reasoning modes, intelligently selecting from Multi-Step, Complete, Optimized, and Short Reasoning, balancing depth, efficiency, and precision. Whether tackling complex, multi-layered problems or delivering concise, high-impact insights, ASI-1 Mini ensures reasoning is always tailored to the task at hand. Note: this model is rate limited at the moment.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Other Models

53 models

Veiled Calla 12B

soob3123/Veiled-Calla-12B

Veiled Calla 12B is a 12B parameter model that is a more advanced version of Calla 12B.

Context: 32,768 tokens

Input: $---/M • Output: $---/M

Sarvan Medium

sarvan-medium

Sarvam AI has launched Sarvam-M, a 24-billion-parameter hybrid language model boasting strong performance in math, programming, and Indian languages.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

The Omega Abomination V1

ReadyArt/The-Omega-Abomination-L-70B-v1.0

The merger of the Omega Directive M 24b v1.1 and Cydonia 24b v2.

Context: 16,384 tokens

Input: $---/M • Output: $---/M

Grok 4

x-ai/grok-4-07-09

Grok 4 0709 by xAI. Their latest and greatest flagship model, offering unparalleled performance in natural language, math and reasoning - the perfect jack of all trades.

Context: 256,000 tokens

Input: $---/M • Output: $---/M

Grok 4 Fast

x-ai/grok-4-fast

Grok 4 Fast, xAI’s latest advancement in cost‑efficient reasoning. Built on learnings from Grok 4, it blends reasoning and non‑reasoning in one model with a 2M‑token context window and state‑of‑the‑art cost efficiency.

Context: 2,000,000 tokens

Input: $---/M • Output: $---/M

Grok 4 Fast Thinking

x-ai/grok-4-fast:thinking

Grok 4 Fast with explicit thinking enabled for harder reasoning tasks. 2M context, highly token‑efficient.

Context: 2,000,000 tokens

Input: $---/M • Output: $---/M

Grok Code Fast 1

x-ai/grok-code-fast-1

The coding-specialized version of Grok 4

Context: 256,000 tokens

Input: $---/M • Output: $---/M

Hermes 4 Large (Thinking)

nousresearch/hermes-4-405b:thinking

Hermes 4 Large with thinking enabled. Streams visible reasoning before the final answer.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

OpenReasoning Nemotron 32B

pamanseau/OpenReasoning-Nemotron-32B

OpenReasoning-Nemotron-32B is a reasoning model derived from Qwen2.5-32B-Instruct, post-trained for math, science, and code solution generation. Evaluated with up to 64K output tokens. Available in multiple sizes: 1.5B, 7B, 14B, and 32B.

Context: 32,768 tokens

Input: $---/M • Output: $---/M

inclusionAI Ling Flash 2.0

inclusionAI/Ling-flash-2.0

Low-latency flash model suitable for general chat and assistants.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

inclusionAI Ring Flash 2.0

inclusionAI/Ring-flash-2.0

Low-latency flash model optimized for responsiveness.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Grok 3 Beta

grok-3-beta

Grok 3 is the latest model from xAI. It's their flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. Possesses deep domain knowledge in finance, healthcare, law, and science. Excels in structured tasks and benchmarks like GPQA, LCB, and MMLU-Pro where it outperforms Grok 3 Mini even on high thinking.

Context: 131,072 tokens

Input: $---/M • Output: $---/M

Grok 3 Fast Beta

grok-3-fast-beta

Faster output version of Grok 3. Grok 3 is the latest model from xAI. It's their flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. Possesses deep domain knowledge in finance, healthcare, law, and science. Excels in structured tasks and benchmarks like GPQA, LCB, and MMLU-Pro where it outperforms Grok 3 Mini even on high thinking.

Context: 131,072 tokens

Input: $---/M • Output: $---/M

Grok 3 Mini Beta

grok-3-mini-beta

Grok 3 Mini is a lightweight, smaller thinking model. Unlike traditional models that generate answers immediately, Grok 3 Mini thinks before responding. It's ideal for reasoning-heavy tasks that don't demand extensive domain knowledge, and shines in math-specific and quantitative use cases, such as solving challenging puzzles or math problems.

Context: 131,072 tokens

Input: $---/M • Output: $---/M

Grok 3 Mini Fast Beta

grok-3-mini-fast-beta

Faster output version of Grok 3 Mini. Grok 3 Mini is a lightweight, smaller thinking model. Unlike traditional models that generate answers immediately, Grok 3 Mini thinks before responding. It's ideal for reasoning-heavy tasks that don't demand extensive domain knowledge, and shines in math-specific and quantitative use cases, such as solving challenging puzzles or math problems.

Context: 131,072 tokens

Input: $---/M • Output: $---/M

Lumimaid v0.2

NeverSleep/Lumimaid-v0.2-70B

Upgrade to Llama-3 Lumimaid 70B. A Llama 3.1 70B finetune trained on curated roleplay data.

Context: 16,384 tokens

Input: $---/M • Output: $---/M

SorcererLM 8x22B

raifle/sorcererlm-8x22b

Advanced roleplaying model with reasoning and emotional intelligence for engaging interactions, contextual awareness and enhanced narrative depth

Context: 16,000 tokens

Input: $---/M • Output: $---/M

MythoMax 13B

Gryphe/MythoMax-L2-13b

One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay.

Context: 4,000 tokens

Input: $---/M • Output: $---/M

Magnum v4 72B

anthracite-org/magnum-v4-72b

Upgraded model of Magnum V2 72B. From the creators of Goliath. Aimed at achieving prose quality similar to Claude Opus 3, trained on 55 million tokens of curated Roleplay data.

Context: 16,384 tokens

Input: $---/M • Output: $---/M

EVA-Qwen2.5-32B-v0.2

EVA-UNIT-01/EVA-Qwen2.5-32B-v0.2

A RP/storywriting specialist model, full-parameter finetune of Qwen2.5-32B on mixture of synthetic and natural data. It uses Celeste 70B 0.1 data mixture, greatly expanding it to improve versatility, creativity and flavor of the resulting model.

Context: 16,384 tokens

Input: $---/M • Output: $---/M

K2-Think

LLM360/K2-Think

K2-Think is a 32B open-weights general reasoning model with strong competitive math performance. Benchmarks: AIME 2024 90.83, AIME 2025 81.24, GPQA-Diamond 71.08, LiveCodeBench v5 63.97.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

InclusionAI Ring 1T

inclusionai/ring-1t

1T‑scale model focused on deep thinking without the wait. Extensive RL post‑training unlocks strong natural‑language reasoning with competitive results across AIME 2025, HMMT 2025, LiveCodeBench v6, Codeforces, and ARC‑AGI‑1. Ideal for hard reasoning, olympiad‑style problem solving, code generation, and abstraction tasks.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

InclusionAI Ling 1T

inclusionai/ling-1t

Ling‑1T is the first flagship non‑thinking model in the Ling 2.0 series, featuring ~1T total parameters with ~50B active parameters per token. Built on the Ling 2.0 architecture and pre‑trained on 20T+ high‑quality, reasoning‑dense tokens, it supports up to 128K context and adopts an evolutionary chain‑of‑thought (Evo‑CoT) curriculum across mid‑ and post‑training. This design improves efficiency and reasoning depth, enabling state‑of‑the-art performance on complex reasoning benchmarks in math and code while balancing accuracy and latency.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

MN-LooseCannon-12B-v1

GalrionSoftworks/MN-LooseCannon-12B-v1

Merge of Starcannon and Sao Lyra.

Context: 16,384 tokens

Input: $---/M • Output: $---/M

EVA-Qwen2.5-72B-v0.2

EVA-UNIT-01/EVA-Qwen2.5-72B-v0.2

A RP/storywriting specialist model, full-parameter finetune of Qwen2.5-72B on mixture of synthetic and natural data. It uses Celeste 70B 0.1 data mixture, greatly expanding it to improve versatility, creativity and flavor of the resulting model.

Context: 16,384 tokens

Input: $---/M • Output: $---/M

EVA-LLaMA-3.33-70B-v0.1

EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.1

Context: 16,384 tokens

Input: $---/M • Output: $---/M

ReMM SLERP 13B

undi95/remm-slerp-l2-13b

A recreation trial of the original MythoMax-L2-B13 but merged with updated models.

Context: 6,144 tokens

Input: $---/M • Output: $---/M

Mercury Coder Small

mercury-coder-small

Model by Inception AI. A diffusion large language model that runs incredibly quickly (500+ tokens/second) while matching Claude 3.5 Haiku and GPT-4o-mini. 1st in speed on Copilot arena, and matching 2nd in quality.

Context: 32,768 tokens

Input: $---/M • Output: $---/M

Magnum V2 72B

anthracite-org/magnum-v2-72b

Magnum V2 72B

Context: 16,384 tokens

Input: $---/M • Output: $---/M

Damascus R1.

Steelskull/L3.3-Damascus-R1

Damascus-R1 builds upon some elements of the Nevoria foundation but represents a significant step forward with a completely custom-made DeepSeek R1 Distill base: Hydroblated-R1-V3. Constructed using the new SCE (Select, Calculate, and Erase) merge method, Damascus-R1 prioritizes stability, intelligence, and enhanced awareness.

Context: 65,500 tokens

Input: $---/M • Output: $---/M

LongCat Flash

meituan-longcat/LongCat-Flash-Chat-FP8

560B MoE model with 27B active params, 128K context. Exceptional at agentic tasks with dynamic computation and shortcut-connected architecture for 100+ TPS inference.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Jamba Large

jamba-large

Jamba 1.7 with improved grounding and instruction following for more accurate and reliable responses. Ideal for complex reasoning and document analysis tasks with 256k context window.

Context: 256,000 tokens

Input: $---/M • Output: $---/M

Jamba Mini

jamba-mini

Jamba Mini 1.7 - smaller and efficient version with improved grounding and instruction following. Perfect for cost-effective tasks while maintaining 256k context window.

Context: 256,000 tokens

Input: $---/M • Output: $---/M

Jamba Large 1.7

jamba-large-1.7

Latest Jamba model with improved grounding and instruction following for more accurate and reliable responses. Superior speed while processing large volumes of unstructured data.

Context: 256,000 tokens

Input: $---/M • Output: $---/M

Jamba Mini 1.7

jamba-mini-1.7

Latest smaller Jamba model (52B params) with improved grounding and instruction following. Cost-effective option for smaller tasks while maintaining the 256k context window.

Context: 256,000 tokens

Input: $---/M • Output: $---/M

Jamba Large 1.6

jamba-large-1.6

Its ability to process large volumes of unstructured data (256k tokens) with high accuracy makes it ideal for summarization and document analysis.

Context: 256,000 tokens

Input: $---/M • Output: $---/M

Jamba Mini 1.6

jamba-mini-1.6

Smaller and cheaper version of Jamba Large 1.6 (52B parameters versus 398B parameters), ideal for smaller tasks and lower budgets, still with a 256k context window.

Context: 256,000 tokens

Input: $---/M • Output: $---/M

InternVL3 78B

OpenGVLab/InternVL3-78B

InternVL3-78B is a large-scale multimodal large language model with 78 billion parameters, providing state-of-the-art performance on vision-language understanding benchmarks.

Context: 32,768 tokens

Input: $---/M • Output: $---/M

DeepCoder 14B Preview

agentica-org/DeepCoder-14B-Preview

DeepCoder-14B-Preview is a code reasoning LLM fine-tuned from DeepSeek-R1-Distilled-Qwen-14B using distributed reinforcement learning (RL) to scale up to long context lengths. The model achieves 60.6% Pass@1 accuracy on LiveCodeBench v5 (8/1/24-2/1/25), representing a 8% improvement over the base model (53%) and achieving similar performance to OpenAI's o3-mini with just 14B parameters.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Shisa V2 Llama 3.3 70B

shisa-ai/shisa-v2-llama3.3-70b

Shisa V2 is a family of bilingual Japanese/English language models ranging from 7B to 70B parameters, optimized for high-quality Japanese language capabilities while maintaining strong English performance.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

v0 1.5 MD

v0-1.5-md

Vercel model. v0 1.5 MD composite model combines specialized knowledge from RAG, reasoning from state-of-the-art LLMs, and error fixing from custom streaming post-processing. Specialized in building fast, beautiful full-stack web applications with continuously updated framework knowledge. Currently powered by Sonnet 4 base model.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

v0 1.5 LG

v0-1.5-lg

Vercel model. v0 1.5 LG composite model with larger context window and enhanced reasoning for hyper-specialized fields like physics engines, three.js, and multi-step tasks. Better at complex database migrations and architectural decisions. Achieves 89.8% error-free generation rate on web development benchmarks.

Context: 1,000,000 tokens

Input: $---/M • Output: $---/M

v0 1.0 MD

v0-1.0-md

Vercel model. v0 1.0 MD composite model specialized for web development with RAG, error correction, and optimized for code generation. Features custom AutoFix model for real-time error correction and best practice enforcement. Currently powered by Sonnet 3.7 base model.

Context: 200,000 tokens

Input: $---/M • Output: $---/M

Baichuan M2 32B Medical

Baichuan-M2

Medical-enhanced reasoning model built on Qwen2.5-32B with Large Verifier System. Specialized for medical reasoning with breakthrough performance while maintaining general capabilities.

Context: 32,768 tokens

Input: $---/M • Output: $---/M

Baichuan 4 Air

Baichuan4-Air

Fast and efficient AI model from Baichuan Intelligence, optimized for quick responses with balanced performance across various tasks.

Context: 32,768 tokens

Input: $---/M • Output: $---/M

Baichuan 4 Turbo

Baichuan4-Turbo

High-performance model from Baichuan Intelligence featuring enhanced capabilities for complex reasoning and specialized tasks.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Ling Flash 2.0

Ling-Flash-2.0

Open‑weights MoE model with 100B total parameters and ~6.1B active per token, trained on 20T+ tokens. Delivers strong complex reasoning and code generation (including frontend), with 32K context extendable to 128K via YaRN and efficient, high‑throughput inference.

Context: 65,000 tokens

Input: $---/M • Output: $---/M

Venice Uncensored Web

venice-uncensored:web

Venice's uncensored model with native web access included. Built on the Dolphin Mistral 24b model with a very low refusal rate.

Context: 80,000 tokens

Input: $---/M • Output: $---/M

Venice Uncensored

venice-uncensored

Venice's uncensored model. Built on the Dolphin Mistral 24b model with a very low refusal rate.

Context: 128,000 tokens

Input: $---/M • Output: $---/M

Cogito v2 Preview 70B

deepcogito/cogito-v2-preview-llama-70B

Cogito 70B is a dense hybrid reasoning model that combines direct answering capabilities with advanced self-reflection. Built with iterative policy improvement, it delivers strong performance across reasoning tasks while maintaining efficiency through shorter reasoning chains and improved intuition.

Context: 32,768 tokens

Input: $---/M • Output: $---/M

Cogito v2 Preview 109B MoE

deepcogito/cogito-v2-preview-llama-109B-MoE

Cogito 109B MoE leverages mixture-of-experts architecture to deliver advanced reasoning capabilities with computational efficiency. This hybrid model excels at both direct responses and complex reasoning tasks while maintaining multimodal capabilities through innovative transfer learning.

Context: 32,768 tokens

Input: $---/M • Output: $---/M

Cogito v2 Preview 405B

deepcogito/cogito-v2-preview-llama-405B

Cogito 405B represents a significant step toward frontier intelligence with dense architecture delivering performance competitive with leading closed models. This advanced reasoning system combines policy improvement with massive scale for exceptional capabilities.

Context: 32,768 tokens

Input: $---/M • Output: $---/M

Cogito v2 Preview 671B MoE

deepcogito/cogito-v2-preview-deepseek-671b

Cogito 671B MoE represents one of the strongest open models globally, matching performance of latest Deepseek models while approaching closed frontier systems like o3 and Claude 4 Opus. This advanced system demonstrates significant progress toward scalable superintelligence through policy improvement.

Context: 32,768 tokens

Input: $---/M • Output: $---/M