API
For our documentation, head to docs.nano-gpt.com.
API keys
Generate up to 5 API keys to use NanoGPT in other applications. If you require more keys, please contact us at support@nano-gpt.com and we will help you out.
Authenticate by including your API key as an HTTP header, either "Authorization": "Bearer API_KEY" or "api-key": "API_KEY", depending on the endpoint.
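For example, a minimal Python sketch of the two header styles (check docs.nano-gpt.com for which style a given endpoint expects):

```python
# Minimal sketch of the two supported header styles; see docs.nano-gpt.com
# for which style a given endpoint expects.
API_KEY = "YOUR_API_KEY"

bearer_headers = {"Authorization": f"Bearer {API_KEY}"}  # Authorization: Bearer <key>
apikey_headers = {"api-key": API_KEY}                    # plain api-key header
```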
If you are a (potentially) large user of our website or our API, we are glad to have you. Reach out to us at support@nano-gpt.com or join our Discord for a discount.
API Reference
For our documentation, head to docs.nano-gpt.com.
The example code below can be used in Python; for JavaScript users, NanoGPTjs is a great starting point.
If you encounter issues or need further information, please contact support@nano-gpt.com.
Text models
POST https://nano-gpt.com/api/talk-to-gpt
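As a rough illustration, here is a minimal Python sketch of a request to this endpoint. The JSON field names ("prompt", "model") are assumptions for illustration only; consult docs.nano-gpt.com for the exact request body and response format.

```python
import requests

API_KEY = "YOUR_API_KEY"

# Minimal sketch of a text-generation request. The body fields below
# ("prompt", "model") are illustrative assumptions; see docs.nano-gpt.com
# for the exact schema of this endpoint.
response = requests.post(
    "https://nano-gpt.com/api/talk-to-gpt",
    headers={
        "api-key": API_KEY,  # or {"Authorization": f"Bearer {API_KEY}"}
        "Content-Type": "application/json",
    },
    json={
        "prompt": "Explain what a context window is in two sentences.",
        "model": "chatgpt-4o-latest",  # any model ID from the table below
    },
)
response.raise_for_status()
print(response.text)  # the response shape is documented at docs.nano-gpt.com
```

The model field takes any model ID from the table below (for example deepseek-chat or claude-sonnet-4-20250514).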
Model | Input price | Output price | Model ID | Description |
ASI1 Mini | $1.70 | $1.70 | asi1-mini | ASI-1 Mini introduces next-level adaptive reasoning and context-aware decision-making. It features native reasoning support with four dynamic reasoning modes, intelligently selecting from Multi-Step, Complete, Optimized, and Short Reasoning to balance depth, efficiency, and precision. Whether tackling complex, multi-layered problems or delivering concise, high-impact insights, ASI-1 Mini ensures reasoning is always tailored to the task at hand. Note: this model is rate limited at the moment. |
Aion 1.0 | $6.80 | $13.60 | aion-labs/aion-1.0 | Aion Labs' most powerful reasoning model, with high performance across reasoning and coding. |
Aion 1.0 mini (DeepSeek) | $1.19 | $2.38 | aion-labs/aion-1.0-mini | A distilled version of the DeepSeek-R1 model that excels in reasoning domains like mathematics, coding, and logic. |
Amazon Nova Lite 1.0 | $0.10 | $0.41 | amazon/nova-lite-v1 | Amazon's new lower cost model. Can handle up to 300k input tokens, with faster output but less thorough understanding than Amazon's Nova Pro. |
Amazon Nova Micro 1.0 | $0.06 | $0.24 | amazon/nova-micro-v1 | Amazon's lowest cost model. Comparable to GPT-4o-mini and Gemini 1.5 Flash, with the fastest output. |
Amazon Nova Pro 1.0 | $1.36 | $5.44 | amazon/nova-pro-v1 | Amazon's new flagship model. Can handle up to 300k input tokens, with comparable performance to ChatGPT and Claude 3.5 Sonnet. |
Amoral Gemma3 27B v2 | $0.51 | $0.51 | soob3123/amoral-gemma3-27B-v2 | Amoral Gemma3 27B v2 is a 27B parameter model that is a more advanced version of Gemma3 27B. |
Anubis 70B v1 | $0.85 | $0.85 | TheDrummer/Anubis-70B-v1 | L3.3 finetune for roleplaying. |
Anubis Pro 105b v1 | $1.36 | $1.70 | anubis-pro-105b-v1 | An upscaled version of Llama 3.3 70B with 50% more layers. Finetuned further to make use of its new layers. |
Athene V2 Chat | $0.85 | $0.85 | Nexusflow/Athene-V2-Chat | An open-weights LLM on-par with GPT-4o across benchmarks. |
Azure gpt-4-turbo | $17.00 | $51.00 | azure-gpt-4-turbo | Azure version of OpenAI gpt-4-turbo |
Azure gpt-4o | $4.25 | $17.00 | azure-gpt-4o | Azure version of OpenAI gpt-4o |
Azure gpt-4o-mini | $0.25 | $1.02 | azure-gpt-4o-mini | Azure version of OpenAI gpt-4o-mini |
Azure o1 | $25.50 | $102.00 | azure-o1 | Azure version of OpenAI o1 |
Azure o3-mini | $1.87 | $7.48 | azure-o3-mini | Azure version of OpenAI o3-mini |
ChatGPT 4o | $8.50 | $25.50 | chatgpt-4o-latest | OpenAI's current recommended model, the well-known ChatGPT. |
ChatGPT 4o Reasoner | $8.50 | $25.50 | chatgpt-4o-latest-reasoner | 'DeepChatGPT', a fusion of ChatGPT 4o and DeepSeek R1. DeepSeek R1 reasons first, then its reasoning is fed into ChatGPT 4o to generate the response. |
Claude 3 Opus | $25.50 | $127.50 | claude-3-opus-20240229 | Anthropic's flagship model, outperforming GPT-4 on most benchmarks. |
Claude 3.5 Haiku | $13.60 | $6.80 | claude-3-5-haiku-20241022 | Anthropic's updated faster and cheaper model, offering good results on chatbots and coding. |
Claude 3.5 Sonnet | $5.10 | $25.50 | claude-3-5-sonnet-20241022 | One of Anthropic's top models, offering even better results on many subjects than GPT-4o. |
Claude 3.5 Sonnet Old | $5.10 | $25.50 | claude-3-5-sonnet-20240620 | Anthropic's most intelligent model, offering even better results on many subjects than GPT-4o. |
Claude 3.7 Sonnet | $5.10 | $25.50 | claude-3-7-sonnet-20250219 | Anthropic's updated most intelligent model. Preferred by many for its programming skills and its natural language. |
Claude 3.7 Sonnet Reasoner | $5.10 | $25.50 | claude-3-7-sonnet-reasoner | Claude 3.7 Sonnet Reasoner blends Deepseek R1's reasoning with Claude 3.7 Sonnet's response. |
Claude 3.7 Sonnet Thinking | $5.10 | $25.50 | claude-3-7-sonnet-thinking | Anthropic's Claude 3.7 Sonnet with the ability to show its thinking process step by step. |
Claude 3.7 Sonnet Thinking (128K) | $5.10 | $25.50 | claude-3-7-sonnet-thinking:128000 | Claude 3.7 Sonnet with maximum thinking budget (128,000 tokens). |
Claude 3.7 Sonnet Thinking (1K) | $5.10 | $25.50 | claude-3-7-sonnet-thinking:1024 | Claude 3.7 Sonnet with minimal thinking budget (1,024 tokens). |
Claude 3.7 Sonnet Thinking (32K) | $5.10 | $25.50 | claude-3-7-sonnet-thinking:32768 | Claude 3.7 Sonnet with extended thinking budget (32,768 tokens). |
Claude 3.7 Sonnet Thinking (8K) | $5.10 | $25.50 | claude-3-7-sonnet-thinking:8192 | Claude 3.7 Sonnet with reduced thinking budget (8,192 tokens). |
Claude 4 Opus | $25.50 | $127.50 | claude-opus-4-20250514 | Claude 4 Opus by Anthropic. The premium version of the new Claude models. A new generation model with improved capabilities, especially on programming and development. |
Claude 4 Opus Thinking | $25.50 | $127.50 | claude-opus-4-thinking | Anthropic's Claude 4 Opus with the ability to show its thinking process step by step. |
Claude 4 Opus Thinking (128K) | $25.50 | $127.50 | claude-opus-4-thinking:128000 | Claude 4 Opus with maximum thinking budget (128,000 tokens). |
Claude 4 Opus Thinking (1K) | $25.50 | $127.50 | claude-opus-4-thinking:1024 | Claude 4 Opus with minimal thinking budget (1,024 tokens). |
Claude 4 Opus Thinking (32K) | $25.50 | $127.50 | claude-opus-4-thinking:32768 | Claude 4 Opus with extended thinking budget (32,768 tokens). |
Claude 4 Opus Thinking (8K) | $25.50 | $127.50 | claude-opus-4-thinking:8192 | Claude 4 Opus with reduced thinking budget (8,192 tokens). |
Claude 4 Sonnet | $5.10 | $25.50 | claude-sonnet-4-20250514 | Claude 4 Sonnet by Anthropic. A new generation model with improved capabilities, especially on programming and development. |
Claude 4 Sonnet Thinking | $5.10 | $25.50 | claude-sonnet-4-thinking | Anthropic's Claude 4 Sonnet with the ability to show its thinking process step by step. |
Claude 4 Sonnet Thinking (128K) | $5.10 | $25.50 | claude-sonnet-4-thinking:128000 | Claude 4 Sonnet with maximum thinking budget (128,000 tokens). |
Claude 4 Sonnet Thinking (1K) | $5.10 | $25.50 | claude-sonnet-4-thinking:1024 | Claude 4 Sonnet with minimal thinking budget (1,024 tokens). |
Claude 4 Sonnet Thinking (32K) | $5.10 | $25.50 | claude-sonnet-4-thinking:32768 | Claude 4 Sonnet with extended thinking budget (32,768 tokens). |
Claude 4 Sonnet Thinking (8K) | $5.10 | $25.50 | claude-sonnet-4-thinking:8192 | Claude 4 Sonnet with reduced thinking budget (8,192 tokens). |
Cogito v1 Preview Qwen 32B | $3.06 | $2.75 | deepcogito/cogito-v1-preview-qwen-32B | 32B-parameter reasoning model from DeepCogito (Qwen backbone) with strong general reasoning and coding at a low price. |
Cohere: Command R | $0.81 | $2.42 | cohere/command-r | 35B parameter model that performs conversational language tasks at a higher quality, more reliably, and with a longer context than previous models. It can be used for complex workflows like code generation, retrieval augmented generation (RAG), tool use, and agents |
Cohere: Command R+ | $4.85 | $24.23 | cohere/command-r-plus-08-2024 | 104B parameter model that performs conversational language tasks at a higher quality, more reliably, and with a longer context than previous models. It can be used for complex workflows like code generation, retrieval augmented generation (RAG), tool use, and agents |
Damascus R1 | $0.85 | $0.85 | Steelskull/L3.3-Damascus-R1 | Damascus-R1 builds upon some elements of the Nevoria foundation but represents a significant step forward with a completely custom-made DeepSeek R1 Distill base: Hydroblated-R1-V3. Constructed using the new SCE (Select, Calculate, and Erase) merge method, Damascus-R1 prioritizes stability, intelligence, and enhanced awareness. |
Dazzling Star Aurora 32b | $0.85 | $0.85 | Qwen2.5-32B-Dazzling-Star-Aurora-32b-v0.0 | A Qwen 2.5 32b finetuned for roleplay and storytelling. |
DeepClaude | $5.10 | $25.50 | deepclaude | Harness the power of DeepSeek R1's reasoning combined with Claude's creativity and code generation. Feeds your query into DeepSeek R1, then feeds the query + thinking process into Claude 3.5 Sonnet and returns an answer. Note: this routes through original DeepSeek meaning your data may be stored and used by DeepSeek. |
DeepCoder 14B Preview | $0.25 | $0.25 | agentica-org/DeepCoder-14B-Preview | DeepCoder-14B-Preview is a code reasoning LLM fine-tuned from DeepSeek-R1-Distilled-Qwen-14B using distributed reinforcement learning (RL) to scale up to long context lengths. The model achieves 60.6% Pass@1 accuracy on LiveCodeBench v5 (8/1/24-2/1/25), representing an 8% improvement over the base model (53%) and achieving similar performance to OpenAI's o3-mini with just 14B parameters. |
DeepHermes-3 Mistral 24B (Preview) | $1.02 | $0.90 | NousResearch/DeepHermes-3-Mistral-24B-Preview | 24B-parameter Mistral model fine-tuned by NousResearch for balanced reasoning and creativity. |
DeepSeek Chat 0324 | $0.24 | $0.48 | deepseek-v3-0324 | DeepSeek V3 0324, DeepSeek's 24 March 2025 update to V3, optimized for general-purpose tasks. |
DeepSeek Prover V2 671B | $0.09 | $0.46 | deepseek-ai/DeepSeek-Prover-V2-671B | Deepseek Prover V2 is a specialized model primarily used for proving math equations. Not a general model. |
DeepSeek R1 | $0.46 | $1.95 | deepseek-r1-nano | DeepSeek's R1 is a thinking model, scoring very well on all benchmarks at low cost. This version is run via open-source providers and Azure, never routing through DeepSeek themselves. |
DeepSeek R1 70B TEE | $0.30 | $1.05 | TEE/deepseek-r1-70b | DeepSeek's R1 model distilled into Llama 70B architecture for improved efficiency. Runs inside a GPU TEE for full, provable privacy. |
DeepSeek R1 Fast | $8.50 | $11.90 | deepseek-r1-sambanova | DeepSeek R1 via Sambanova: the full model with very fast output. Note: max 4k output tokens. |
DeepSeek R1 Llama 70b | $0.25 | $0.25 | deepseek-r1-llama-70b | DeepSeek R1 Llama 70b is a fine-tuned version of DeepSeek R1 on Llama 70B. |
DeepSeek R1 Zero Preview | $3.74 | $3.74 | deepseek-ai/DeepSeek-R1-Zero | Preview version of Deepseek R1, also known as DeepSeek R1 Zero. Deepseek R1 without the supervised finetuning. |
DeepSeek Reasoner | $0.51 | $2.04 | deepseek-reasoner | DeepSeek-R1 is now live and open source, rivaling OpenAI's Model o1. |
DeepSeek V3/Chat Cheaper | $0.20 | $0.41 | deepseek-chat-cheaper | Cheaper version of Deepseek V3/Chat. Note: may be routed through Deepseek itself. |
DeepSeek V3/Deepseek Chat | $0.24 | $0.48 | deepseek-chat | DeepSeek original V3 model, trained on nearly 15 trillion tokens, matches leading closed-source models at a far lower price. |
Deepseek R1 Cheaper | $0.42 | $1.70 | deepseek-reasoner-cheaper | Cheaper version of DeepSeek R1. Note: may be routed through Chinese providers. |
Deepseek R1 Llama 70b Abliterated | $0.99 | $0.99 | huihui-ai/DeepSeek-R1-Distill-Llama-70B-abliterated | Uncensored version of the Deepseek R1 Llama 70B model |
Deepseek R1 Qwen Abliterated | $0.99 | $0.99 | huihui-ai/DeepSeek-R1-Distill-Qwen-32B-abliterated | Uncensored version of the Deepseek R1 Qwen 32B model |
Deepseek R1 T Chimera | $0.46 | $1.95 | tngtech/DeepSeek-R1T-Chimera | Deepseek V3 0324 with R1 reasoning using a novel construction method. In benchmarks, it appears to be as smart as R1 but much faster, using 40% fewer output tokens. |
Dolphin 2.9.2 Mixtral 8x22B | $1.53 | $1.53 | cognitivecomputations/dolphin-mixtral-8x22b | Successor to Dolphin 2.6 Mixtral 8x7b. Great for instruction following, conversational, and coding. |
Dolphin 72b | $0.51 | $0.51 | dolphin-2.9.2-qwen2-72b | Dolphin is the most uncensored model yet, built on top of Qwen's 72b model. |
Doubao 1.5 Pro 256k | $1.02 | $1.70 | doubao-1.5-pro-256k | Doubao's (Bytedance) flagship model with a 256k token context window |
Doubao 1.5 Pro 32k | $0.17 | $0.42 | doubao-1.5-pro-32k | Doubao's (Bytedance) pro model with a 32k token context window |
Doubao 1.5 Thinking Pro | $1.02 | $4.08 | doubao-1-5-thinking-pro-250415 | Doubao-1.5 is a new deep thinking model that performs well in professional fields such as mathematics, programming, and scientific reasoning, as well as general tasks such as creative writing. Its results on AIME 2024, Codeforces, GPQA, and other authoritative benchmarks reach or are close to the industry's first tier. |
Doubao 1.5 Thinking Pro Vision | $1.02 | $4.08 | doubao-1-5-thinking-pro-vision-250415 | Vision version of Doubao 1.5 Thinking Pro, a new deep thinking model that performs well in professional fields such as mathematics, programming, and scientific reasoning, as well as general tasks such as creative writing. Its results on AIME 2024, Codeforces, GPQA, and other authoritative benchmarks reach or are close to the industry's first tier. |
Doubao 1.5 Thinking Vision Pro | $0.94 | $2.43 | doubao-1-5-thinking-vision-pro-250428 | Doubao-1.5 is a new deep thinking model that performs well in professional fields such as mathematics, programming, and scientific reasoning, as well as general tasks such as creative writing. Its results on AIME 2024, Codeforces, GPQA, and other authoritative benchmarks reach or are close to the industry's first tier. |
Doubao 1.5 Vision Pro 32k | $0.68 | $1.70 | doubao-1.5-vision-pro-32k | Doubao's (Bytedance) vision-enabled pro model (JPG only) with a 32k token context window |
EVA Llama 3.33 70B | $3.40 | $3.40 | EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.0 | An RP/storywriting specialist model, a full-parameter finetune of Llama-3.3-70B-Instruct on a mixture of synthetic and natural data. It uses the Celeste 70B 0.1 data mixture, greatly expanded to improve the versatility, creativity, and flavor of the resulting model. |
EVA Qwen2.5 72B | $8.50 | $8.50 | eva-unit-01/eva-qwen-2.5-72b | A full-parameter finetune of Qwen2.5-72B on a mixture of synthetic and natural data. It uses the Celeste 70B 0.1 data mixture, greatly expanded to improve the versatility, creativity, and flavor of the resulting model. |
EVA-LLaMA-3.33-70B-v0.1 | $3.40 | $3.40 | EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.1 | An RP/storywriting specialist model, a full-parameter finetune of Llama-3.3-70B-Instruct on a mixture of synthetic and natural data. It uses the Celeste 70B 0.1 data mixture, greatly expanded to improve the versatility, creativity, and flavor of the resulting model. |
EVA-Qwen2.5-32B-v0.2 | $1.36 | $1.36 | EVA-UNIT-01/EVA-Qwen2.5-32B-v0.2 | An RP/storywriting specialist model, a full-parameter finetune of Qwen2.5-32B on a mixture of synthetic and natural data. It uses the Celeste 70B 0.1 data mixture, greatly expanded to improve the versatility, creativity, and flavor of the resulting model. |
EVA-Qwen2.5-72B-v0.2 | $1.19 | $1.19 | EVA-UNIT-01/EVA-Qwen2.5-72B-v0.2 | An RP/storywriting specialist model, a full-parameter finetune of Qwen2.5-72B on a mixture of synthetic and natural data. It uses the Celeste 70B 0.1 data mixture, greatly expanded to improve the versatility, creativity, and flavor of the resulting model. |
Ernie 4.5 8k Preview | $1.12 | $4.42 | ernie-4.5-8k-preview | ERNIE 4.5 is Baidu's new generation native multimodal foundation model independently developed by the company. It achieves collaborative optimization through joint modeling of multiple modalities, demonstrating exceptional multimodal comprehension capabilities. With refined language skills, it exhibits comprehensive improvements in understanding, generation, reasoning and memory, along with notable enhancements in hallucination prevention, logical reasoning, and coding abilities. |
Ernie 4.5 Turbo 128k | $0.22 | $0.94 | ernie-4.5-turbo-128k | ERNIE 4.5 Turbo demonstrates overall progress in hallucination reduction, logical reasoning, and coding abilities, with faster response. The multimodal capabilities of ERNIE 4.5 Turbo are on par with GPT-4.1 and superior to GPT-4o across multiple benchmarks. |
Ernie 4.5 Turbo VL 32k | $0.84 | $2.43 | ernie-4.5-turbo-vl-32k | ERNIE 4.5 Turbo demonstrates overall progress in hallucination reduction, logical reasoning, and coding abilities, with faster response. The multimodal capabilities of ERNIE 4.5 Turbo are on par with GPT-4.1 and superior to GPT-4o across multiple benchmarks. |
Ernie X1 32k | $0.56 | $2.24 | ernie-x1-32k-preview | ERNIE X1 is a Baidu model, surpassing earlier versions in terms of intelligence and maximum input/output size. |
Ernie X1 32k | $0.56 | $2.24 | ernie-x1-32k | ERNIE X1 is a deep-thinking reasoning model, outperforming DeepSeek R1 and the latest version of V3. |
Ernie X1 Turbo 32k | $0.28 | $1.12 | ernie-x1-turbo-32k | ERNIE X1 is a deep-thinking reasoning model, outperforming DeepSeek R1 and the latest version of V3. |
Evayale 70b | $0.85 | $0.85 | Steelskull/L3.3-MS-Evayale-70B | Combination of EVA and Euryale. |
GLM 4 32B 0414 | $0.34 | $0.34 | THUDM/GLM-4-32B-0414 | Features 32 billion parameters. Performance is comparable to OpenAI's GPT series and DeepSeek's V3/R1 series. Pre-trained on 15T of high-quality data, including reasoning-type synthetic data. Enhanced performance in instruction following, engineering code, and function calling. |
GLM 4 9B 0414 | $0.34 | $0.34 | THUDM/GLM-4-9B-0414 | A 9B parameter version of the GLM-4 series, offering a balance of performance and efficiency. |
GLM 4 Air 0111 | $0.24 | $0.24 | glm-4-air-0111 | The 0111 release of Zhipu's GLM-4 Air, a high-performance model with a 128K context window. |
GLM 4 Plus 0111 | $17.00 | $17.00 | glm-4-plus-0111 | GLM 4 Plus 0111 is a 1M token context window model |
GLM Z1 32B 0414 | $0.34 | $0.34 | THUDM/GLM-Z1-32B-0414 | A reasoning model with deep thinking capabilities, based on GLM-4-32B-0414. Further trained on tasks involving mathematics, code, and logic. Significantly improves mathematical abilities and capability to solve complex tasks. |
GLM Z1 9B 0414 | $0.34 | $0.34 | THUDM/GLM-Z1-9B-0414 | 9B small-sized model maintaining the open-source tradition. Despite its smaller scale, GLM-Z1-9B-0414 still exhibits excellent capabilities in mathematical reasoning and general tasks. Its overall performance is already at a leading level among open-source models of the same size. |
GLM Z1 Air | $0.12 | $0.12 | glm-z1-air | Incredibly cheap yet highly performant Chinese model, comparable to DeepSeek R1 on many metrics at 1/30th the cost. |
GLM Z1 AirX | $1.19 | $1.19 | glm-z1-airx | Fastest reasoning model in China, with up to 200 tokens per second. The stronger version of GLM Z1 Air. |
GLM Z1 Rumination 32B 0414 | $0.34 | $0.34 | THUDM/GLM-Z1-Rumination-32B-0414 | A deep reasoning model with rumination capabilities (benchmarked against OpenAI's Deep Research). Employs longer periods of deep thought to solve more open-ended and complex problems (e.g., writing a comparative analysis of AI development in two cities and their future development plans). Integrates search tools during its deep thinking process. |
GLM Zero Preview | $3.06 | $3.06 | glm-zero-preview | GLM Zero Preview is a thinking model like o1, but with a smaller context window |
GLM-4 | $25.50 | $25.50 | glm-4 | High-intelligence model with 128K context window |
GLM-4 Air | $0.34 | $0.34 | glm-4-air | High-performance model with 128K context window |
GLM-4 AirX | $3.40 | $3.40 | glm-4-airx | Fastest GLM-4 variant with 8K context window |
GLM-4 Flash | $0.02 | $0.02 | glm-4-flash | Extremely cheap model with 128K context window |
GLM-4 Long | $0.34 | $0.34 | glm-4-long | Extended context model supporting up to 1M tokens |
GLM-4 Plus | $12.75 | $12.75 | glm-4-plus | GLM high-intelligence flagship model with 128K context window |
GPT 3.5 Turbo | $0.85 | $2.55 | gpt-3.5-turbo | Older model. Brought ChatGPT to the mainstream, seen as dated nowadays. 90% cheaper than GPT-4-Turbo, recommended for very simple tasks. |
GPT 4 Turbo Preview | $17.00 | $51.00 | gpt-4-turbo-preview | Can take in the largest messages (up to 300 pages of context), and all round seen as one of the best in class models. |
GPT 4.1 | $2.00 | $8.00 | openai/gpt-4.1 | GPT 4.1 is the new flagship model from OpenAI. Huge context window (1 mln tokens), outperforms GPT-4o and GPT 4.5 across coding and does very well at understanding large contexts. |
GPT 4.1 Mini | $0.40 | $1.60 | openai/gpt-4.1-mini | Mid-sized GPT 4.1, comparable to GPT4o with a far higher context window, at lower cost and with higher speed. |
GPT 4.1 Nano | $0.10 | $0.40 | openai/gpt-4.1-nano | Cheapest model in the GPT-4.1 series. Huge context window with fast throughput and low latency. |
GPT 4.1 Reasoner | $3.40 | $13.60 | gpt-4.1-reasoner | 'DeepGPT 4.1', a fusion of GPT 4.1 and DeepSeek R1. DeepSeek R1 reasons first, then its reasoning is fed into GPT 4.1 to generate the response. |
GPT 4.5 | $102.00 | $204.00 | gpt-4.5-preview | GPT 4.5 Preview. Largest GPT model designed for creative tasks and agentic planning, currently available in a research preview. |
GPT 4.5 Preview Reasoner | $127.50 | $255.00 | gpt-4.5-preview-2025-02-27-reasoner | 'DeepGPT 4.5', a fusion of GPT 4.5 Preview and DeepSeek R1. DeepSeek R1 reasons first, then its reasoning is fed into GPT 4.5 Preview to generate the response. |
GPT 4o | $2.26 | $9.01 | gpt-4o | OpenAI's precursor to ChatGPT-4o. Great on English text and code, with significant improvements on text in non-English languages. |
GPT 4o 08 06 | $4.25 | $17.00 | gpt-4o-2024-08-06 | OpenAI's precursor to ChatGPT-4o. Great on English text and code, with significant improvements on text in non-English languages. |
GPT 4o 11 20 | $4.25 | $17.00 | gpt-4o-2024-11-20 | OpenAI's precursor to ChatGPT-4o. Great on English text and code, with significant improvements on text in non-English languages. |
GPT 4o Mini Search | $0.25 | $1.02 | gpt-4o-mini-search-preview | GPT 4o Mini with web search built in natively via OpenAI. |
GPT 4o Reasoner | $4.25 | $17.00 | gpt-4o-reasoner | 'DeepGPT 4o', a fusion of GPT 4o and DeepSeek R1. DeepSeek R1 reasons first, then its reasoning is fed into GPT 4o to generate the response. |
GPT 4o Search | $4.25 | $17.00 | gpt-4o-search-preview | GPT 4o with web search built in natively via OpenAI. |
GPT 4o mini | $0.25 | $1.02 | gpt-4o-mini | OpenAI's most cost-efficient small model. Cheaper and smarter than GPT-3.5 (the original ChatGPT), but less performant than gpt-4o |
Gemini 1.5 Flash | $0.13 | $0.51 | google/gemini-flash-1.5 | Google's fastest multimodal model with great performance for diverse, repetitive tasks and a 2 million words context window. |
Gemini 2.0 Flash | $0.17 | $0.68 | gemini-2.0-flash-001 | Upgraded version of Gemini Flash 1.5. Faster, with higher output, and overall increase in intelligence. |
Gemini 2.0 Flash Exp | $0.34 | $0.85 | gemini-2.0-flash-exp | Experimental version of Google's newest model, outperforming even Gemini 1.5 Pro. |
Gemini 2.0 Flash Exp Search | $0.34 | $1.02 | gemini-2.0-flash-exp-search | Google's newest model, outperforming even Gemini 1.5 Pro. Now with web access. |
Gemini 2.0 Flash Lite | $0.13 | $0.51 | gemini-2.0-flash-lite | Upgraded version of Gemini Flash 1.5. Faster, with higher output, and overall increase in intelligence. |
Gemini 2.0 Flash Thinking 0121 | $0.34 | $0.85 | gemini-2.0-flash-thinking-exp-01-21 | Google's newest model, outperforming even Gemini 1.5 Pro, now with a thinking mode enabled similar to the o1 series of OpenAI. |
Gemini 2.0 Flash Thinking 1219 | $0.34 | $0.85 | gemini-2.0-flash-thinking-exp-1219 | Google's newest model, outperforming even Gemini 1.5 Pro, now with a thinking mode enabled similar to the o1 series of OpenAI. |
Gemini 2.0 Pro 0205 | $3.40 | $13.60 | gemini-2.0-pro-exp-02-05 | Gemini 2.0 Pro Exp 0205, the latest version of the Gemini 2.0 Pro model. |
Gemini 2.0 Pro 1206 | $4.25 | $17.00 | gemini-exp-1206 | Gemini 2.0 Pro 1206, the previous version of the Gemini 2.0 Pro model. |
Gemini 2.0 Pro Reasoner | $2.21 | $8.50 | gemini-2.0-pro-reasoner | 'DeepGemini', a fusion of Gemini 2.0 Pro and DeepSeek R1. DeepSeek R1 reasons first, then its reasoning is fed into Gemini 2.0 Pro to generate the response. |
Gemini 2.5 Flash 0520 | $0.15 | $0.60 | gemini-2.5-flash-preview-05-20 | Gemini 2.5 Flash 0520, the latest version of the Gemini 2.5 Flash model. |
Gemini 2.5 Flash 0520 Thinking | $0.15 | $3.50 | gemini-2.5-flash-preview-05-20:thinking | Gemini 2.5 Flash 0520, the latest version of the Gemini 2.5 Flash model, with thinking enabled. |
Gemini 2.5 Flash Preview | $0.25 | $1.02 | gemini-2.5-flash-preview-04-17 | Fast, cost efficient performance on complex tasks. The workhorse of the Gemini series. |
Gemini 2.5 Flash Preview Thinking | $0.25 | $5.95 | gemini-2.5-flash-preview-04-17:thinking | Fast, cost efficient performance on complex tasks. The workhorse of the Gemini series. Thinking is turned on by default. |
Gemini 2.5 Pro | $4.25 | $17.00 | gemini-2.5-pro-preview-03-25 | Gemini 2.5 Pro Preview 0325. Google's newest model. Topping all the leaderboards as of March 28th 2025. |
Gemini 2.5 Pro | $4.25 | $17.00 | gemini-2.5-pro-preview-05-06 | Gemini 2.5 Pro Preview 0506. Google's newest model. |
Gemini 2.5 Pro Experimental | $4.25 | $17.00 | gemini-2.5-pro-exp-03-25 | Gemini 2.5 Pro Exp 0325. Google's newest model. Topping all the leaderboards as of March 28th 2025. |
Gemini LearnLM Experimental | $5.95 | $17.85 | learnlm-1.5-pro-experimental | LearnLM is a task-specific model trained to align with learning science principles when following system instructions for teaching and learning use cases. For instance, the model can take on tasks to act as an expert or guide to educate users on specific topics. |
Gemini Text + Image | $0.34 | $1.36 | gemini-2.0-flash-exp-image-generation | Gemini 2.0 Flash Image Generation. Can generate both text and images within the same prompt! |
Gemma 3 12B IT | $0.42 | $0.42 | unsloth/gemma-3-12b-it | Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions. |
Gemma 3 1B IT | $0.17 | $0.17 | unsloth/gemma-3-1b-it | Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions. |
Gemma 3 27B IT | $0.51 | $0.51 | unsloth/gemma-3-27b-it | Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions. |
Gemma 3 4B IT | $0.25 | $0.25 | unsloth/gemma-3-4b-it | Gemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions. |
Grayline Qwen3 8B | $0.51 | $0.51 | soob3123/GrayLine-Qwen3-8B | Grayline is a neutral AI assistant engineered for uncensored information delivery and task execution. This model operates without inherent ethical or moral frameworks, designed to process and respond to any query with objective efficiency and precision. Grayline's core function is to leverage its full capabilities to provide direct answers and execute tasks as instructed, without offering unsolicited commentary, warnings, or disclaimers. It accesses and processes information without bias or restriction. |
Grok 3 Beta | $5.10 | $25.50 | grok-3-beta | Grok 3 is the latest model from xAI. It's their flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. Possesses deep domain knowledge in finance, healthcare, law, and science. Excels in structured tasks and benchmarks like GPQA, LCB, and MMLU-Pro where it outperforms Grok 3 Mini even on high thinking. |
Grok 3 Fast Beta | $8.50 | $42.50 | grok-3-fast-beta | Faster output version of Grok 3. Grok 3 is the latest model from xAI. It's their flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. Possesses deep domain knowledge in finance, healthcare, law, and science. Excels in structured tasks and benchmarks like GPQA, LCB, and MMLU-Pro where it outperforms Grok 3 Mini even on high thinking. |
Grok 3 Mini Beta | $0.51 | $0.85 | grok-3-mini-beta | Grok 3 Mini is a lightweight, smaller thinking model. Unlike traditional models that generate answers immediately, Grok 3 Mini thinks before responding. It's ideal for reasoning-heavy tasks that don't demand extensive domain knowledge, and shines in math-specific and quantitative use cases, such as solving challenging puzzles or math problems. |
Grok 3 Mini Fast Beta | $1.02 | $6.80 | grok-3-mini-fast-beta | Faster output version of Grok 3 Mini. Grok 3 Mini is a lightweight, smaller thinking model. Unlike traditional models that generate answers immediately, Grok 3 Mini thinks before responding. It's ideal for reasoning-heavy tasks that don't demand extensive domain knowledge, and shines in math-specific and quantitative use cases, such as solving challenging puzzles or math problems. |
Hermes 3 70B TEE | $0.75 | $0.75 | TEE/hermes-3-llama-3.1-70b | Hermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board. Runs inside a GPU TEE for full, provable privacy. |
Hermes 3 Large | $3.40 | $5.10 | nousresearch/hermes-3-llama-3.1-405b | Llama 3.1 405b with the brakes taken off. Less censored than the regular version, but not abliterated |
Hunyuan T1 | $0.29 | $1.12 | hunyuan-t1-latest | Hunyuan T1 is Tencent's top-tier reasoning model. Good at large-scale reasoning and precise following of complex instructions, with low hallucinations and blazing-fast outputs. |
Hunyuan Turbo S | $0.24 | $0.56 | hunyuan-turbos-20250226 | Hunyuan Turbo S by Tencent is a thinking model that responds instantly. |
Inflection 3 Pi | $4.25 | $17.00 | inflection/inflection-3-pi | A chatbot with emotional intelligence. Has access to recent news, excels in scenarios like customer support and roleplay. Mirrors your conversation style. |
Inflection 3 Productivity | $4.25 | $17.00 | inflection/inflection-3-productivity | Optimized for instruction following. Good at tasks that require precise adherence to provided guidelines. Has access to recent news. |
Jamba Large 1.6 | $3.40 | $13.60 | jamba-large-1.6 | Its ability to process large volumes of unstructured data (256k tokens) with high accuracy makes it ideal for summarization and document analysis. |
Jamba Mini 1.6 | $0.34 | $0.68 | jamba-mini-1.6 | Smaller and cheaper version of Jamba Large 1.6 (52B parameters versus 398B parameters), ideal for smaller tasks and lower budgets, still with a 256k context window. |
Kimi Latest | $8.50 | $8.50 | kimi-latest | Always points to the latest stable Kimi model. |
Kimi Thinking Preview | $53.48 | $53.48 | kimi-thinking-preview | Kimi Thinking Preview is a new model that is capable of thinking and reasoning. It's quite expensive! |
Kimi VL Thinking | $1.02 | $1.02 | moonshotai/Kimi-VL-A3B-Thinking | Efficient open-source MoE vision-language model (2.8B active params) with advanced multimodal reasoning, 128K long-context understanding, strong agent capabilities, and long-thinking variant. Excels in multi-turn agent tasks, image/video comprehension, OCR, math reasoning, and multi-image understanding. Competes with GPT-4o-mini, Qwen2.5-VL-7B, Gemma-3-12B-IT, surpasses GPT-4o in some domains. |
LatitudeGames WayFarer 12B | $0.34 | $0.34 | Mistral-Nemo-12B-Wayfarer | Latitude Games Wayfarer 12B |
Llama 3 70B abliterated | $0.99 | $0.99 | failspy/Meta-Llama-3-70B-Instruct-abliterated-v3.5 | An abliterated (removed restrictions and censorship) version of Llama 3 70b. |
Llama 3.05 Storybreaker Ministral 70b | $0.85 | $0.85 | Envoid/Llama-3.05-NT-Storybreaker-Ministral-70B | Much more inclined to output adult content than its predecessor. Great choice for novelty roleplay scenarios. |
Llama 3.1 70B ArliAI RPMax v1.3 | $0.51 | $0.51 | Llama-3.3+3.1-70B-ArliAI-RPMax-v1.3 | RPMax is a series of models trained on a diverse set of curated creative writing and RP datasets with a focus on variety and deduplication. This model is designed to be highly creative and non-repetitive: no two entries in the dataset share repeated characters or situations, which ensures the model does not latch onto a certain personality and remains capable of understanding and acting appropriately for any characters or situations. |
Llama 3.1 70B Celeste v0.1 | $0.85 | $0.85 | nothingiisreal/L3.1-70B-Celeste-V0.1-BF16 | Creative model based on Llama 3.1 70B |
Llama 3.1 70B Dracarys 2 | $0.85 | $0.85 | abacusai/Dracarys-72B-Instruct | Llama 3.1 70b finetune that offers improvements on coding. |
Llama 3.1 70B Euryale | $0.51 | $0.59 | Sao10K/L3.1-70B-Euryale-v2.2 | A 70B parameter model from SAO10K based on Llama 3.1 70B, offering high-quality text generation. |
Llama 3.1 70B Hanami | $0.85 | $0.85 | Sao10K/L3.1-70B-Hanami-x1 | Euryale v2.2-based finetune. |
Llama 3.1 8B (decentralized) | $0.02 | $0.03 | Meta-Llama-3-1-8B-Instruct-FP8 | Meta's Llama 3.1 8B model via an open permissionless network |
Llama 3.1 8b (uncensored) | $0.34 | $0.34 | aion-labs/aion-rp-llama-3.1-8b | This is a truly uncensored model, trained to excel at roleplaying and creative writing. However, it can also do other things! |
Llama 3.1 8b Instruct | $0.09 | $0.09 | meta-llama/llama-3.1-8b-instruct | Fast and efficient for simple purposes. |
Llama 3.1 Large | $0.34 | $0.34 | Meta-Llama-3-1-405B-Instruct-FP8 | Note: comes with a 90% discount currently, enjoy! Meta's largest Llama 3.1 405B model. Open-source, run through an open permissionless crypto network (no central provider). |
Llama 3.2 3b Instruct | $0.05 | $0.09 | meta-llama/llama-3.2-3b-instruct | Small model optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization |
Llama 3.2 Medium | $1.53 | $1.53 | meta-llama/llama-3.2-90b-vision-instruct | Medium-size (and capability) version of Meta's newest model (3.2 series). |
Llama 3.3 70B Anubis v1 | $0.85 | $0.85 | Llama-3.3-70B-Anubis-v1 | Llama 3.3 70B Anubis v1 |
Llama 3.3 70B Cu Mai | $0.85 | $0.85 | Steelskull/L3.3-Cu-Mai-R1-70b | A 70B parameter model from Steelskull based on Llama 3.3 70B, offering high-quality text generation. |
Llama 3.3 70B Cu Mai R1 | $0.51 | $0.51 | Llama-3.3-70B-Cu-Mai-R1 | Llama 3.3 70B Cu Mai R1 |
Llama 3.3 70B Electra R1 | $0.51 | $0.51 | Llama-3.3-70B-Electra-R1 | Llama 3.3 70B Electra R1 |
Llama 3.3 70B Electranova v1.0 | $0.51 | $0.51 | Llama-3.3-70B-Electranova-v1.0 | Llama 3.3 70B Electranova v1.0 |
Llama 3.3 70B Euryale | $0.85 | $0.85 | Sao10K/L3.3-70B-Euryale-v2.3 | A 70B parameter model from SAO10K based on Llama 3.3 70B, offering high-quality text generation. |
Llama 3.3 70B Fallen R1 v1 | $0.51 | $0.51 | Llama-3.3-70B-Fallen-R1-v1 | Llama 3.3 70B Fallen R1 v1 |
Llama 3.3 70B Instruct abliterated | $0.99 | $0.99 | huihui-ai/Llama-3.3-70B-Instruct-abliterated | An abliterated (removed restrictions and censorship) version of Llama 3.3 70b. |
Llama 3.3 70B Legion V2.1 | $0.51 | $0.51 | Llama-3.3-70B-Legion-V2.1 | Llama 3.3 70B Legion V2.1 |
Llama 3.3 70B Magnum v4 SE | $0.51 | $0.51 | Llama-3.3-70B-Magnum-v4-SE | Llama 3.3 70B Magnum v4 SE |
Llama 3.3 70B RPMax v1.4 | $0.51 | $0.51 | Llama-3.3-70B-ArliAI-RPMax-v1.4 | Llama 3.3 70B RPMax v1.4 |
Llama 3.3 70B TEE | $0.18 | $0.52 | TEE/llama-3.3-70b-instruct | The Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks. Runs inside a GPU TEE for full, provable privacy. |
Llama 3.3 70B Vulpecula R1 | $0.51 | $0.51 | Llama-3.3-70B-Vulpecula-R1 | Llama 3.3 70B Vulpecula R1 |
Llama 3.3 70B Wayfarer | $1.19 | $1.19 | LatitudeGames/Wayfarer-Large-70B-Llama-3.3 | Llama 3.3 70B Wayfarer is a fine-tuned version of Llama 3.3 70B, trained on a diverse set of creative writing and RP datasets with a focus on variety and deduplication. The model is designed to be highly creative and non-repetitive: no two entries in the dataset share repeated characters or situations, which ensures the model does not latch onto a certain personality and remains capable of understanding and acting appropriately for any characters or situations. |
Llama 3.3 70b Instruct | $0.10 | $0.25 | meta-llama/llama-3.3-70b-instruct | Llama 3.3 is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks. |
Llama 3.3 70b Mirai Fanfare | $0.85 | $0.85 | Llama-3.3-70B-MiraiFanfare | A Llama 3.3 70b finetuned for roleplay and storytelling. |
Llama 3.3+ 70B WhiteRabbitNeo-2 | $0.51 | $0.51 | Llama-3.3+(3.1v3.3)-70B-WhiteRabbitNeo-2 | WhiteRabbitNeo is a model series that can be used for offensive and defensive cybersecurity. |
Llama 3.3+(3.1v3.3) 70B Hanami x1 | $0.51 | $0.51 | Llama-3.3+(3.1v3.3)-70B-Hanami-x1 | Llama 3.3+(3.1v3.3) 70B Hanami x1 |
Llama 3.3+(3v3.3) 70B TenyxChat DaybreakStorywriter | $0.51 | $0.51 | Llama-3.3+(3v3.3)-70B-TenyxChat-DaybreakStorywriter | Llama 3.3+(3v3.3) 70B TenyxChat DaybreakStorywriter |
Llama 4 Maverick | $0.18 | $0.80 | meta-llama/llama-4-maverick | Llama 4 Maverick, a 17 billion active parameter model with 128 experts, is the best multimodal model in its class, beating GPT-4o and Gemini 2.0 Flash across a broad range of widely reported benchmarks, while achieving comparable results to the new DeepSeek v3 on reasoning and coding at less than half the active parameters. Llama 4 Maverick offers a best-in-class performance-to-cost ratio, with an experimental chat version scoring an ELO of 1417 on LMArena. |
Llama 4 Scout | $0.09 | $0.46 | meta-llama/llama-4-scout | Llama 4 Scout, a 17 billion active parameter model with 16 experts, is the best multimodal model in the world in its class and is more powerful than all previous generation Llama models, while fitting in a single H100 GPU. Additionally, Llama 4 Scout offers an industry-leading context window of 10M and delivers better results than Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1 across a broad range of widely reported benchmarks. |
Llama-xLAM-2 70B fc-r | $4.25 | $3.88 | Salesforce/Llama-xLAM-2-70b-fc-r | Salesforce's 70B frontier model focused on function calling and retrieval-augmented generation. |
Lumimaid 70b | $3.40 | $3.40 | NeverSleep/Llama-3-Lumimaid-70B-v0.1 | Neversleep Llama 3 Lumimaid 70B |
Lumimaid v0.2 | $2.01 | $2.01 | NeverSleep/Lumimaid-v0.2-70B | Upgrade to Llama-3 Lumimaid 70B. A Llama 3.1 70B finetune trained on curated roleplay data. |
MN-LooseCannon-12B-v1 | $0.85 | $0.85 | GalrionSoftworks/MN-LooseCannon-12B-v1 | Merge of Starcannon and Sao Lyra. |
MS Evalebis 70b | $0.85 | $0.85 | Steelskull/L3.3-MS-Evalebis-70b | Combination of EVA, Euryale and Anubis. |
Mag Mell R1 | $0.85 | $0.85 | inflatebot/MN-12B-Mag-Mell-R1 | Mag Mell demonstrates worldbuilding capabilities unlike any model in its class, comparable to old adventuring models like Tiefighter, and prose that exhibits minimal slop. |
Magnum V2 72B | $3.40 | $5.10 | anthracite-org/magnum-v2-72b | Magnum V2 72B |
Magnum v4 72B | $3.40 | $5.10 | anthracite-org/magnum-v4-72b | Upgraded model of Magnum V2 72B. From the creators of Goliath. Aimed at achieving prose quality similar to Claude Opus 3, trained on 55 million tokens of curated Roleplay data. |
Mercury Coder Small | $0.42 | $1.70 | mercury-coder-small | Model by Inception AI. A diffusion large language model that runs incredibly quickly (500+ tokens/second) while matching Claude 3.5 Haiku and GPT-4o-mini. 1st in speed on Copilot arena, and matching 2nd in quality. |
Microsoft Deepseek R1 | $0.17 | $0.17 | microsoft/MAI-DS-R1-FP8 | MAI-DS-R1 is a DeepSeek-R1 reasoning model that has been post-trained by the Microsoft AI team to improve its responsiveness on blocked topics and its risk profile, while maintaining its reasoning capabilities and competitive performance. |
Microsoft Phi 4 Reasoning | $0.10 | $0.10 | microsoft/Phi-4-reasoning | A 14-billion parameter open-weight reasoning model that rivals much larger models on complex reasoning tasks. Trained via supervised fine-tuning of Phi-4 on carefully curated reasoning demonstrations from OpenAI o3-mini, Phi-4-reasoning generates detailed reasoning chains that effectively leverage additional inference-time compute. |
Microsoft Phi 4 Reasoning Plus | $0.10 | $0.10 | microsoft/Phi-4-reasoning-plus | A 14-billion parameter open-weight reasoning model that rivals much larger models on complex reasoning tasks. Trained via supervised fine-tuning of Phi-4 on carefully curated reasoning demonstrations from OpenAI o3-mini, Phi-4-reasoning generates detailed reasoning chains that effectively leverage additional inference-time compute. Phi-4-reasoning-plus builds upon Phi-4-reasoning capabilities, further trained with reinforcement learning to utilize more inference-time compute, using 1.5x more tokens than Phi-4-reasoning, to deliver higher accuracy. |
MiniMax 01 | $0.34 | $1.87 | minimax/minimax-01 | MiniMax's flagship model with a 1M token context window |
Mistral 7B Instruct | $0.09 | $0.09 | mistralai/mistral-7b-instruct | Optimized for speed with decent context length |
Mistral Devstral Small 2505 | $17.00 | $34.00 | mistralai/Devstral-Small-2505 | OpenHands + Devstral is 100% local, 100% open, and is SOTA for the category on SWE-Bench Verified: 46.8% accuracy. |
Mistral Large 2411 | $3.40 | $10.20 | mistralai/mistral-large | Upgrade to Mistral's flagship model. It is fluent in English, French, Spanish, German, and Italian, with high grammatical accuracy, with a long context window. |
Mistral Medium 3 | $0.68 | $3.40 | mistralai/mistral-medium-3 | Mistral Medium 3 delivers frontier performance while being an order of magnitude less expensive. For instance, the model performs at or above 90% of Claude Sonnet 3.7 on benchmarks across the board at a significantly lower cost. On performance, Mistral Medium 3 also surpasses leading open models such as Llama 4 Maverick and enterprise models such as Cohere Command A. On pricing, the model beats cost leaders such as DeepSeek v3, both in API and self-deployed systems. |
Mistral Nemo | $0.17 | $0.20 | mistralai/Mistral-Nemo-Instruct-2407 | 12B parameter model with multilingual support. |
Mistral Nemo 12B Instruct 2407 | $0.17 | $0.20 | Mistral-Nemo-12B-Instruct-2407 | Mistral Nemo 12B Instruct 2407 |
Mistral Nemo 12B NemoMix Unleashed | $0.34 | $0.34 | Mistral-Nemo-12B-NemoMix-Unleashed | Mistral Nemo 12B NemoMix Unleashed |
Mistral Nemo 12B RPMax v1.1 | $0.17 | $0.25 | Mistral-Nemo-12B-ArliAI-RPMax-v1.1 | Mistral Nemo 12B RPMax v1.1 |
Mistral Nemo 12B RPMax v1.3 | $0.17 | $0.25 | Mistral-Nemo-12B-ArliAI-RPMax-v1.3 | Mistral Nemo 12B RPMax v1.3 |
Mistral Nemo 12B SauerkrautLM | $0.17 | $0.25 | Mistral-Nemo-12B-SauerkrautLM | Mistral Nemo 12B SauerkrautLM |
Mistral Nemo Inferor 12B | $0.42 | $0.85 | Infermatic/MN-12B-Inferor-v0.0 | Inferor is a merge of top roleplay models, expert on immersive narratives and storytelling. |
Mistral Nemo Starcannon 12b v1 | $0.85 | $0.85 | VongolaChouko/Starcannon-Unleashed-12B-v1.0 | Mistral Nemo finetune that offers improvements on roleplay. |
Mistral Saba | $0.34 | $1.02 | mistralai/mistral-saba | Mistral Saba is a 24B-parameter language model specifically designed for the Middle East and South Asia, delivering accurate and contextually relevant responses while maintaining efficient performance. Trained on curated regional datasets, it supports multiple Indian-origin languages, including Tamil and Malayalam, alongside Arabic. This makes it a versatile option for a range of regional and multilingual applications. |
Mistral Small 31 24b Instruct | $0.17 | $0.51 | mistral-small-31-24b-instruct | Building upon Mistral Small 3 (2501), Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance. With 24 billion parameters, this model achieves top-tier capabilities in both text and vision tasks. |
Mistral Tiny | $0.42 | $0.42 | mistralai/mistral-tiny | Powered by Mistral-7B-v0.2, best used for large batch processing tasks where cost is a significant factor but reasoning capabilities are not crucial. |
MythoMax 13B | $0.17 | $0.17 | Gryphe/MythoMax-L2-13b | One of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay. |
Nemo Arli 12b RPMax V1.2 | $0.17 | $0.25 | Mistral-Nemo-12B-ArliAI-RPMax-v1.2 | A Mistral Nemo 12b finetuned for roleplay and storytelling. |
NemoMix 12B Unleashed | $0.85 | $0.85 | MarinaraSpaghetti/NemoMix-Unleashed-12B | Great for RP and storytelling. |
Nemotron 3.1 70B abliterated | $0.99 | $0.99 | huihui-ai/Llama-3.1-Nemotron-70B-Instruct-HF-abliterated | An abliterated (removed restrictions and censorship) version of Llama 3.1 70b Nemotron. |
Nemotron Tenyxchat Storybreaker 70b | $0.85 | $0.85 | Envoid/Llama-3.05-Nemotron-Tenyxchat-Storybreaker-70B | Overall it provides a solid option for RP and creative writing while still functioning as an assistant model, if desired. If used to continue a roleplay it will generally follow the ongoing cadence of the conversation. |
Neural Daredevil 8B abliterated | $0.61 | $0.61 | mlabonne/NeuralDaredevil-8B-abliterated | The best performing 8B abliterated model according to most benchmarks. |
Nvidia Nemotron 70b | $0.59 | $0.68 | nvidia/Llama-3.1-Nemotron-70B-Instruct-HF | Nvidia's latest Llama fine-tune optimized for instruction following. Early results hint that it might outperform models such as GPT-4o and Claude 3.5 Sonnet. |
Nvidia Nemotron Super 49B | $2.55 | $2.55 | nvidia/Llama-3.3-Nemotron-Super-49B-v1 | Llama-3.3-Nemotron-Super-49B-v1 is a model which offers a great tradeoff between model accuracy and efficiency. Efficiency (throughput) directly translates to savings. Using a novel Neural Architecture Search (NAS) approach, we greatly reduce the model's memory footprint, enabling larger workloads, as well as fitting the model on a single GPU at high workloads (H200). This NAS approach enables the selection of a desired point in the accuracy-efficiency tradeoff. For more information on the NAS approach, please refer to this paper. |
Nvidia Nemotron Ultra 253B | $0.68 | $1.36 | nvidia/Llama-3.1-Nemotron-Ultra-253B-v1 | Llama-3.1-Nemotron-Ultra-253B-v1 is a large language model (LLM) which is a derivative of Meta Llama-3.1-405B-Instruct (AKA the reference model). It is a reasoning model that is post trained for reasoning, human chat preferences, and tasks, such as RAG and tool calling. The model supports a context length of 128K tokens. This model fits on a single 8xH100 node for inference. Llama-3.1-Nemotron-Ultra-253B-v1 is a model which offers a great tradeoff between model accuracy and efficiency. Efficiency (throughput) directly translates to savings. Using a novel Neural Architecture Search (NAS) approach, we greatly reduce the model's memory footprint, enabling larger workloads, as well as reducing the number of GPUs required to run the model in a data center environment. This NAS approach enables the selection of a desired point in the accuracy-efficiency tradeoff. Furthermore, by using a novel method to vertically compress the model (see details here), it also offers a significant improvement in latency. |
OlympicCoder 32B | $0.85 | $0.85 | open-r1/OlympicCoder-32B | A code model that achieves very strong performance on competitive coding benchmarks such as LiveCodeBench and the 2024 International Olympiad in Informatics. |
OlympicCoder 7B | $0.34 | $0.34 | open-r1/OlympicCoder-7b | A lightweight code model that performs well on competitive coding benchmarks such as LiveCodeBench and the 2024 International Olympiad in Informatics. |
OpenAI o1 | $25.50 | $102.00 | o1 | OpenAI's flagship reasoning model for solving hard problems. Useful when tackling complex problems in science, coding, math, and similar fields. |
OpenAI o1 Pro | $255.00 | $1020.00 | openai/o1-pro | OpenAI's flagship series of reasoning models for solving hard problems. The o1 series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o1-pro model uses more compute to think harder and provide consistently better answers. o1 pro comes with a massive 100,000 word output window. |
OpenAI o1 preview | $25.50 | $102.00 | o1-preview | OpenAI's new flagship series of reasoning models for solving hard problems. Useful when tackling complex problems in science, coding, math, and similar fields |
OpenAI o1-mini | $5.10 | $20.40 | o1-mini | A fast, cost-efficient version of OpenAI's o1 reasoning model tailored to coding, math, and science use cases. |
OpenAI o3 | $10.00 | $40.00 | o3 | Full version of OpenAI's o3. The current flagship model by OpenAI which OpenAI sees as getting close to true AGI. |
OpenAI o3-mini | $1.87 | $7.48 | o3-mini | The cheaper version of OpenAI's newest thinking model. Fast, cheap, and with a maximum output of 100,000 words. |
OpenAI o3-mini high | $1.87 | $7.48 | o3-mini-high | OpenAI's newest flagship model with reasoning effort set to high. |
OpenAI o3-mini low | $1.87 | $7.48 | o3-mini-low | OpenAI's newest flagship model with reasoning effort set to low. |
OpenAI o4-mini | $1.10 | $4.40 | o4-mini | o4-mini is the smaller version of what will be the next generation of OpenAI models. |
OpenAI o4-mini high | $1.10 | $4.40 | o4-mini-high | The maximum/high version of the o4-mini model. The next generation of OpenAI models. |
Perplexity Deep Research | $3.40 | $13.60 | sonar-deep-research | Currently unstable. Analyzes hundreds of sources, delivering expert-level insights in minutes. Deep Research API has a 93.9% accuracy on SimpleQA benchmark and attains a score of 21.1% accuracy on Humanity's Last Exam, significantly outperforming Gemini Thinking, o3-mini, o1, and DeepSeek-R1. |
Perplexity Pro | $5.10 | $25.50 | sonar-pro | Sonar Pro tackles complex questions that need deeper research and provides more sources. |
Perplexity R1 1776 | $3.40 | $13.60 | r1-1776 | R1 1776 is a version of the DeepSeek R1 model that has been post-trained by Perplexity to provide uncensored, unbiased, and factual information. |
Perplexity Reasoning | $1.70 | $8.50 | sonar-reasoning | Perplexity's Sonar Reasoning uses DeepSeek R1's thinking process combined with looking up on the web to tackle complex questions that need deeper research and provides more sources. |
Perplexity Reasoning Pro | $3.40 | $13.60 | sonar-reasoning-pro | Perplexity's Sonar Reasoning Pro uses DeepSeek R1's thinking process combined with looking up on the web to tackle complex questions that need deeper research and provides more sources. |
Perplexity Simple | $1.70 | $1.70 | sonar | A Perplexity model that gives fast, straightforward answers. |
Phi 4 Mini | $0.20 | $0.82 | phi-4-mini-instruct | Phi 4 Mini by Microsoft. A small multilingual model. |
Phi 4 Multimodal | $0.12 | $0.19 | phi-4-multimodal-instruct | Phi 4 by Microsoft. A small multimodal model that can handle images and text. |
QwQ 32B Snowdrop v0 | $0.34 | $0.34 | QwQ-32B-Snowdrop-v0 | QwQ 32B Snowdrop v0 |
QwQ 32B Snowdrop v0 nothink | $0.34 | $0.34 | QwQ-32B-Snowdrop-v0-nothink | QwQ 32B Snowdrop v0 nothink |
QwQ 32b Arli V1 | $0.34 | $0.34 | QwQ-32B-ArliAI-RpR-v1 | A QwQ 32b finetuned for roleplay and storytelling. |
QwQ 32b Arli V2 | $0.34 | $0.34 | QwQ-32B-ArliAI-RpR-v2 | A QwQ 32b finetuned for roleplay and storytelling. |
QwQ 32b Arli V3 | $0.34 | $0.34 | QwQ-32B-ArliAI-RpR-v3 | A QwQ 32b finetuned for roleplay and storytelling. |
Qwen 2.5 32b EVA | $0.85 | $0.85 | Qwen2.5-32B-EVA-v0.2 | A Qwen 2.5 32b finetuned for roleplay and storytelling. |
Qwen 2.5 72B Instruct | $0.59 | $0.68 | Qwen2.5-72B-Instruct | Qwen 2.5 72B Instruct |
Qwen 2.5 Coder 32b | $0.27 | $0.27 | Qwen/Qwen2.5-Coder-32B-Instruct | The latest series of Code-Specific Qwen large language models. |
Qwen 2.5 Max | $2.72 | $10.88 | qwen-max | Qwen 2.5 Max is the upgraded version of Qwen Max, beating GPT-4o, Deepseek V3 and Claude 3.5 Sonnet in benchmarks. |
Qwen 3 14b | $0.06 | $0.18 | qwen/qwen3-14b | Qwen 3 14b is a 14b model. Supports switching between thinking and non-thinking modes: add /think or /no_think anywhere in a prompt or system message to toggle chain-of-thought reasoning (a usage sketch follows the model list below). |
Qwen 3 235b A22B | $0.13 | $0.50 | qwen/qwen3-235b-a22b | Qwen 3 235b is a 235b model with 22B active parameters. Supports switching between thinking and non-thinking modes: add /think or /no_think anywhere in a prompt or system message to toggle chain-of-thought reasoning. |
Qwen 3 30b A3B | $0.08 | $0.24 | qwen/qwen3-30b-a3b | Qwen 3 30b A3B is a 30b model with 3 billion active parameters per pass. Supports switching between thinking and non-thinking modes: add /think or /no_think anywhere in a prompt or system message to toggle chain-of-thought reasoning. |
Qwen 3 32b | $0.08 | $0.24 | qwen/qwen3-32b | Qwen 3 32b is a 32b model. Supports switching between thinking and non-thinking modes: add /think or /no_think anywhere in a prompt or system message to toggle chain-of-thought reasoning. |
Qwen 3 8B | $0.06 | $0.18 | Qwen/Qwen3-8B | Qwen 3 8B is an 8B model. Supports switching between thinking and non-thinking modes: add /think or /no_think anywhere in a prompt or system message to toggle chain-of-thought reasoning. |
Qwen Long 10M | $0.17 | $0.68 | qwen-long | Alibaba's huge context window model. Takes in up to 10 million tokens, which is equivalent to dozens of books. |
Qwen Plus | $0.68 | $2.04 | qwen-plus | Alibaba's balanced model. Fast, cheap, yet still very powerful. |
Qwen QwQ 32B Preview | $0.68 | $0.68 | Qwen/QwQ-32B-Preview | Experimental release of Qwen's reasoning model. Great at coding and math, but still in development so may exhibit odd bugs. Not production-ready. |
Qwen Turbo | $0.09 | $0.34 | qwen-turbo | Alibaba's fastest and cheapest model. Suitable for simple tasks, fast and low cost, with a 1 million token context window. |
Qwen2.5 72B | $0.59 | $0.68 | qwen/qwen-2.5-72b-instruct | Great multilingual support, strong at mathematics and coding, supports roleplay and chatbots. |
Qwen2.5 7B TEE | $0.60 | $0.60 | TEE/qwen-2.5-7b-instruct | Great multilingual support, strong at mathematics and coding, supports roleplay and chatbots. Runs inside a GPU TEE for full, provable privacy. |
Qwen 2.5 VL 72b | $1.19 | $1.19 | qwen25-vl-72b-instruct | Qwen 2.5 VL 72b vision-language model with a 32k context window |
Qwen3 32B | $0.34 | $0.34 | Qwen3-32B | Qwen3 32B |
Qwen: QvQ Max | $2.38 | $9.01 | qvq-max | QvQ Max is the top model of the Qwen series. It is capable of thinking and reasoning, and achieves significantly enhanced performance, especially on hard problems. |
Qwen: QwQ 32B | $0.34 | $0.34 | qwq-32b | QwQ is the reasoning model of the Qwen series. It is capable of thinking and reasoning, and achieves significantly enhanced performance, especially on hard problems. |
Qwerky 72B | $0.85 | $0.85 | featherless-ai/Qwerky-72B | Linear models offer a promising approach to significantly reducing computational costs at scale, particularly for large context lengths, enabling a >1000x improvement in inference costs, o1-style inference-time thinking, and wider AI accessibility. |
ReMM SLERP 13B | $1.36 | $2.04 | undi95/remm-slerp-l2-13b | A recreation trial of the original MythoMax-L2-13B, but merged with updated models. |
Rocinante 12b | $0.68 | $1.02 | TheDrummer/Rocinante-12B-v1.1 | Designed for engaging storytelling and rich prose. Expanded vocabulary with unique and expressive word choices, enhanced creativity and captivating stories. |
Sao10K Stheno 8b | $0.85 | $0.85 | Sao10K/L3-8B-Stheno-v3.2 | Sao10K's latest Stheno fine-tune optimized for instruction following. |
Sarvan Medium | $0.42 | $1.28 | sarvan-medium | Sarvam AI has launched Sarvam-M, a 24-billion-parameter hybrid language model boasting strong performance in math, programming, and Indian languages. |
Shisa V2 Llama 3.3 70B | $0.85 | $0.85 | shisa-ai/shisa-v2-llama3.3-70b | Shisa V2 is a family of bilingual Japanese/English language models ranging from 7B to 70B parameters, optimized for high-quality Japanese language capabilities while maintaining strong English performance. |
SorcererLM 8x22B | $7.65 | $7.65 | raifle/sorcererlm-8x22b | Advanced roleplaying model with reasoning and emotional intelligence for engaging interactions, contextual awareness, and enhanced narrative depth. |
Steelskull Electra R1 70b | $1.19 | $1.19 | Steelskull/L3.3-Electra-R1-70b | Steelskull's Electra R1, an L3.3-based 70B finetune. |
Steelskull Nevoria 70b | $0.85 | $0.85 | Steelskull/L3.3-MS-Nevoria-70b | Steelskull's Nevoria, an L3.3-based 70B finetune. |
Steelskull Nevoria R1 70b | $0.85 | $0.85 | Steelskull/L3.3-Nevoria-R1-70b | Steelskull's Nevoria R1, an L3.3-based 70B finetune. |
Step R1 V Mini | $4.25 | $18.70 | step-r1-v-mini | Step-R1-V-Mini supports image and text input with text output. It has good instruction following and general capabilities, perceives images with high precision, and can complete complex reasoning tasks. |
Step-2 16k Exp | $11.90 | $34.00 | step-2-16k-exp | Experimental version of Step-2 with a 16k context window. |
Step-2 Mini | $0.34 | $0.68 | step-2-mini | Lightweight, fast version of StepFun's Step-2 model. |
The Drummer Cydonia 24B | $0.17 | $0.20 | TheDrummer/Cydonia-24B-v2 | Cydonia 24B v2 is a finetune of Mistral's latest 'Small' model (2501). Aliases: Cydonia 24B, Cydonia v2, Cydonia on that broken base. |
The Omega Abomination V1 | $0.70 | $0.95 | ReadyArt/The-Omega-Abomination-L-70B-v1.0 | The merger of the Omega Directive M 24b v1.1 and Cydonia 24b v2. |
The Omega Abomination V5 | $0.70 | $0.95 | ReadyArt/The-Omega-Abomination-L-70B-v5.0 | The merger of the Omega Directive M 24b v1.1 and Cydonia 24b v2. |
TheDrummer Skyfall 36B V2 | $0.85 | $0.85 | thedrummer/skyfall-36b-v2 | TheDrummer's Skyfall 36B V2, a 36B parameter model with a focus on high quality and consistency. |
TheDrummer: Valkyrie 49B V1 | $1.02 | $1.36 | parasail-valkyrie-49b-v1 | Built on top of NVIDIA's Llama 3 Nemotron Super 49B. |
UnslopNemo 12b v4 | $0.85 | $0.85 | TheDrummer/UnslopNemo-12B-v4.1 | UnslopNemo v4 is the previous version from the creator of Rocinante, designed for adventure writing and role-play scenarios. |
Veiled Calla 12B | $0.51 | $0.51 | soob3123/Veiled-Calla-12B | Veiled Calla 12B is a 12B parameter model that is a more advanced version of Calla 12B. |
WizardLM-2 8x22B | $0.85 | $0.85 | microsoft/wizardlm-2-8x22b | Microsoft's advanced Wizard model. The most popular role-playing model. |
Yi Large | $5.44 | $5.44 | yi-large | Large version of Yi Lightning with a 32k context window, but more expensive. |
Yi Lightning | $0.34 | $0.34 | yi-lightning | Chinese-developed multilingual (English, Chinese and others) model by 01.ai that's very fast and cheap, yet scores high on independent leaderboards. |
Yi Medium 200k | $4.25 | $4.25 | yi-medium-200k | Medium version of Yi with a 200k context window. |
Yi Medium 200k | $0.02 | $0.02 | yi-34b-chat-200k | Medium version of Yi Lightning with a huge 200k context window |
Yi Spark | $0.02 | $0.02 | yi-34b-chat-0205 | Small and powerful, lightweight and fast model. Provides enhanced mathematical operation and code writing capabilities. |
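As a rough illustration of the thinking toggle mentioned for the Qwen 3 models above, the Python sketch below sends the same question once with /think and once with /no_think. It is a minimal sketch, not a definitive implementation: the JSON field names (model, prompt), the Bearer-token header, and the response handling are assumptions for illustration only; consult docs.nano-gpt.com for the exact request schema.

```python
import os
import requests

# Hypothetical sketch: toggling Qwen 3 chain-of-thought via the prompt.
# The JSON fields below are assumptions; see docs.nano-gpt.com for the real schema.
API_KEY = os.environ["NANOGPT_API_KEY"]
ENDPOINT = "https://nano-gpt.com/api/talk-to-gpt"  # text model endpoint

def ask(prompt: str, thinking: bool) -> str:
    # /think and /no_think may appear anywhere in the prompt or system message
    # to switch chain-of-thought reasoning on or off for Qwen 3 models.
    toggle = "/think" if thinking else "/no_think"
    resp = requests.post(
        ENDPOINT,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={
            "model": "qwen/qwen3-30b-a3b",  # any Qwen 3 id from the table above
            "prompt": f"{toggle} {prompt}",
        },
        timeout=120,
    )
    resp.raise_for_status()
    return resp.text

# Same question, with and without reasoning.
print(ask("How many primes are there below 50?", thinking=True))
print(ask("How many primes are there below 50?", thinking=False))
```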
Image models
POST https://nano-gpt.com/api/create-image
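A minimal Python sketch of calling this endpoint follows. The JSON field names (model, prompt) and the Bearer-token header are assumptions for illustration; the exact request schema and response format are documented at docs.nano-gpt.com.

```python
import os
import requests

# Hypothetical sketch of an image generation request.
# The JSON fields below are assumptions; see docs.nano-gpt.com for the real schema.
API_KEY = os.environ["NANOGPT_API_KEY"]

resp = requests.post(
    "https://nano-gpt.com/api/create-image",
    headers={"Authorization": f"Bearer {API_KEY}"},
    json={
        "model": "recraft-v3",  # any image model id from the table below
        "prompt": "A watercolor lighthouse at dawn, soft pastel palette",
    },
    timeout=300,
)
resp.raise_for_status()
print(resp.status_code, resp.headers.get("content-type"))
# Inspect resp.json() or resp.content depending on how the chosen model
# returns its output (URL vs. inline image data).
```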
BAGEL | bagel | BAGEL is a high-quality text-to-image model with excellent prompt adherence and creative capabilities. Supports both text-to-image and image-to-image generation. Supports thought tokens for enhanced generation quality. |
DALL-E-3 | dall-e-3 | OpenAI's most well-known image model. |
DALL-E-3 HD | dall-e-3-hd | OpenAI's most well-known image model, now in HD quality. |
Dreamshaper XL | dreamshaper_8_93211.safetensors | Dreamshaper generates realistic and anime/illustration-style images, and is best suited to sci-fi and fantasy scenes. |
Flux Dev | flux-dev-image-to-image | Flux Dev: Next-gen image-to-image model for advanced creative edits and improvements. |
Flux Lightning | flux-lightning | Juggernaut Lightning by FAL, for the fastest text-to-image generation with high-quality results. |
Flux Lora | flux-lora | FLUX.1 [dev] with LoRA support, fast and high-quality image generation with the option to use LoRAs for specific styles. |
Flux Pro V1 | flux-pro | Older version of Flux V1.1. Exceptional quality and prompt adherence. |
Flux Pro V1.1 | flux-pro/v1.1 | Excellent image quality, prompt adherence, and output diversity. |
Flux Pro V1.1 Ultra | flux-pro/v1.1-ultra | 4K version of Flux Pro V1.1. Excellent image quality, prompt adherence, and output diversity. |
Flux Realism | flux-realism | Incredibly photorealistic image generation. Generate people, animals, landscapes that are hard to distinguish from reality. |
Flux Schnell | flux/schnell | Fast and high-quality image generation - the cheaper version of the Flux range of models. |
GPT-4o Image | gpt-4o-image-vip | OpenAI's GPT-4o image generation model. Currently in preview mode. Supports both text-to-image and image-to-image generation. |
Gemini Image Edit | gemini-flash-edit | Edit an existing image using Gemini based on a text prompt. Google-based, so quite censored. |
Ghiblify | ghiblify | Transforms an input image into a Ghibli-inspired style. |
Hidream | hidream | Hidream I1 Full, the latest and greatest image generation model from Hidream. Supports specific parameters like shift. |
Hidream Edit | hidream-edit | Edit an existing image using Hidream I1 based on a text prompt. |
Ideogram V2 | ideogram-ai/ideogram-v2 | An excellent image model with state-of-the-art inpainting, prompt comprehension, and especially text rendering. |
Ideogram V2 Turbo | ideogram-ai/ideogram-v2-turbo | A fast image model with state-of-the-art inpainting, prompt comprehension, and especially text rendering. |
Ideogram V3 | ideogram-v3-default | Ideogram V3 (Default) provides a good balance between speed and quality for text-to-image generation. |
Ideogram V3 Quality | ideogram-v3-quality | Ideogram V3 (Quality) offers the highest fidelity images and poster-grade text rendering capabilities. |
Ideogram V3 Turbo | ideogram-v3-turbo | Ideogram V3 (Turbo) generates images with lightning-fast speed and high text rendering accuracy. |
Image Model Recommender | auto-image-selection | Categorizes your prompt and recommends the best image model for your task. Does not immediately generate an image itself. |
Imagen V3 | imagen-3.0-generate-002 | Google's highest quality text-to-image model with fine detail, rich lighting, and excellent text rendering capabilities. |
Imagen V4 Preview | imagen-4.0-generate-preview-05-20 | Google's highest quality image generation model. Excels at fine details, diverse art styles, and understanding prompts. Works best when the prompt is limited to 500 words. |
Imagen V4 Ultra | imagen-4.0-ultra-generate-exp-05-20 | Ultra version of Google's highest quality image generation model. Excels at fine details, diverse art styles, and understanding prompts. Works best when the prompt is limited to 500 words. |
Midjourney | midjourney | Midjourney creates stunningly detailed and imaginative images from simple text prompts. Note: generates 4 images at once, price shown is for 4 images. |
Movie Generator | longstories | Uses LongStories AI to generate high-quality content from text prompts. Generates engaging short stories, similar to YouTube Shorts or TikTok clips, on any subject you want. Offers many customization options. Note: generation can take from 30 seconds to a few minutes. |
Movie Generator for Kids | longstories-kids | Generates engaging short stories for kids on any subject you want. Offers many customization options. Note: generation can take from 30 seconds to a few minutes. |
Playground V2.5 | playground-v25 | Playground V2.5 outperforms SDXL in many user tests. Suitable for a broad range of images. |
Promptchan | promptchan | High-quality image generation with lots of customization options. |
Proteus | proteus-v0.2 | A versatile image generation model with high-quality outputs. |
ReV Animated | revAnimated_v122.safetensors | ReV Animated specialized in fantasy, anime and semi-realistic landscapes. |
Recraft V3 | recraft-v3 | Recraft V3 is a state-of-the-art image generation model that is known for its high quality and prompt adherence. |
SD 3.5 Large | stable-diffusion-v35-large | Stable Diffusion's newest model. Generates a wide variety of images reflecting different styles without complex prompting. |
SD 3.5 Large Turbo | stable-diffusion-v35-large/turbo | Turbo version of Stable Diffusion's newest model. Faster and cheaper performance while still maintaining great prompt adherence and quality. |
SDXL ArliMix V1 | SDXL-ArliMix-v1 | Image generation using SDXL ArliMix V1 via Arli AI, with a cartoon-like generation style. |
Seedream 3.0 | general_v3.0 | Supports native 2K resolution output, offers faster response speeds, generates more accurate small text, improves text layout effects, enhances aesthetics and structural quality, and demonstrates excellent fidelity and detail performance. It has achieved leading rankings in multiple evaluations. |
Stable Diffusion 3 Medium | sd3_base_medium.safetensors | Excels at photorealism, typography, and prompt following. Works best in 1024x1024. |
Stable Diffusion XL | fast-sdxl | Cheap and powerful text-to-image model that generates pictures rapidly. |