API

For our documentation, head to docs.nano-gpt.com.

API keys

Generate up to 5 API keys to use NanoGPT in other applications. If you require more keys, please contact us at support@nano-gpt.com and we will help you out.

Authenticate by including your API key as an HTTP header: either "Authorization": "Bearer API_KEY" or "api-key": "API_KEY", depending on the endpoint.
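As a minimal sketch of the two header forms described above, the helper below builds either one. The function name `build_auth_headers`, the `style` argument, and the `NANOGPT_API_KEY` environment variable are hypothetical names chosen for illustration; which form a given endpoint expects is not stated on this page.

```python
import os


def build_auth_headers(api_key: str, style: str = "bearer") -> dict:
    """Return HTTP headers carrying the API key in one of the two forms above.

    `style` is a hypothetical switch: "bearer" or "api-key".
    """
    if not api_key:
        raise ValueError("api_key must be a non-empty string")
    if style == "bearer":
        return {"Authorization": f"Bearer {api_key}"}
    if style == "api-key":
        return {"api-key": api_key}
    raise ValueError(f"unknown header style: {style!r}")


# NANOGPT_API_KEY is an assumed environment variable name, not an official one.
api_key = os.environ.get("NANOGPT_API_KEY", "YOUR_API_KEY")
print(build_auth_headers(api_key))
print(build_auth_headers(api_key, style="api-key"))
```

Reading the key from the environment keeps it out of source control; pass the resulting dictionary as the headers of whatever HTTP client you use.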

Get notified about API updates.

We will only use this to contact you about updates to how the API works. You can unsubscribe at any time.

If you are a (potentially) large user of our website or our API, we are glad to have you. Reach out to us at support@nano-gpt.com or join our Discord for a discount.

API Reference

The example code below is written in Python; for JS users, NanoGPTjs is a great starting point.

If you encounter issues or need further information, please contact support@nano-gpt.com.

Text models
POST https://nano-gpt.com/api/talk-to-gpt
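A minimal sketch of how a request to this endpoint might be assembled with Python's standard library. The payload field names `model` and `prompt` and the choice of the Bearer header are assumptions, not confirmed by this page; check docs.nano-gpt.com for the actual request schema. The model ID `chatgpt-4o-latest` is taken from the list of text models.

```python
import json
import urllib.request

ENDPOINT = "https://nano-gpt.com/api/talk-to-gpt"


def build_text_request(api_key: str, model: str, prompt: str) -> urllib.request.Request:
    """Assemble (but do not send) a POST request for the text endpoint."""
    # ASSUMPTION: the "model" and "prompt" field names are illustrative only.
    payload = {"model": model, "prompt": prompt}
    headers = {
        # ASSUMPTION: this endpoint accepts the Bearer form of the key.
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
    return urllib.request.Request(
        ENDPOINT,
        data=json.dumps(payload).encode("utf-8"),
        headers=headers,
        method="POST",
    )


req = build_text_request("YOUR_API_KEY", "chatgpt-4o-latest", "Hello!")
print(req.get_method(), req.full_url)

# To actually send it (needs a valid key and network access):
# with urllib.request.urlopen(req) as resp:
#     print(resp.read().decode("utf-8"))
```

Building the request separately from sending it makes it easy to inspect the URL, headers, and body before spending any credits.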
ASI1 Mini$1.70$1.70asi1-miniASI-1 Mini introduces next-level adaptive reasoning, context-aware decision-making. It features native reasoning support with four dynamic reasoning modes, intelligently selecting from Multi-Step, Complete, Optimized, and Short Reasoning, balancing depth, efficiency, and precision. Whether tackling complex, multi-layered problems or delivering concise, high-impact insights, ASI-1 Mini ensures reasoning is always tailored to the task at hand. Note: this model is rate limited at the moment.
Aion 1.0$6.80$13.60aion-labs/aion-1.0Aion Labs most powerful reasoning model with high performance across reasoning and coding.
Aion 1.0 mini (DeepSeek)$1.19$2.38aion-labs/aion-1.0-miniA distilled version of the DeepSeek-R1 model that excels in reasoning domains like mathematics, coding, and logic.
Amazon Nova Lite 1.0$0.10$0.41amazon/nova-lite-v1Amazon's new lower cost model. Can handle up to 300k input tokens, with faster output but less thorough understanding than Amazon's Nova Pro.
Amazon Nova Micro 1.0$0.06$0.24amazon/nova-micro-v1Amazon's lowest cost model. Comparable to GPT-4o-mini and Gemini 1.5 Flash, with the fastest output.
Amazon Nova Pro 1.0$1.36$5.44amazon/nova-pro-v1Amazon's new flagship model. Can handle up to 300k input tokens, with comparable performance to ChatGPT and Claude 3.5 Sonnet.
Amoral Gemma3 27B v2$0.51$0.51soob3123/amoral-gemma3-27B-v2Amoral Gemma3 27B v2 is a 27B parameter model that is a more advanced version of Gemma3 27B.
Anubis 70B v1$0.85$0.85TheDrummer/Anubis-70B-v1L3.3 finetune for roleplaying.
Anubis Pro 105b v1$1.36$1.70anubis-pro-105b-v1An upscaled version of Llama 3.3 70B with 50% more layers. Finetuned further to make use of its new layers.
Athene V2 Chat$0.85$0.85Nexusflow/Athene-V2-ChatAn open-weights LLM on-par with GPT-4o across benchmarks.
Azure gpt-4-turbo$17.00$51.00azure-gpt-4-turboAzure version of OpenAI gpt-4-turbo
Azure gpt-4o$4.25$17.00azure-gpt-4oAzure version of OpenAI gpt-4o
Azure gpt-4o-mini$0.25$1.02azure-gpt-4o-miniAzure version of OpenAI gpt-4o-mini
Azure o1$25.50$102.00azure-o1Azure version of OpenAI o1
Azure o3-mini$1.87$7.48azure-o3-miniAzure version of OpenAI o3-mini
ChatGPT 4o$8.50$25.50chatgpt-4o-latestOpenAI's current recommended model, the well-known ChatGPT.
ChatGPT 4o Reasoner$8.50$25.50chatgpt-4o-latest-reasoner'DeepChatGPT', fusion of ChatGPT 4o and Deepseek R1. Deepseek R1 reasons, then feeds it into ChatGPT 4o to generate a response.
Claude 3 Opus$25.50$127.50claude-3-opus-20240229Anthropic's flagship model, outperforming GPT-4 on most benchmarks.
Claude 3.5 Haiku$13.60$6.80claude-3-5-haiku-20241022Anthropic's updated faster and cheaper model, offering good results on chatbots and coding.
Claude 3.5 Sonnet$5.10$25.50claude-3-5-sonnet-20241022One of Anthropic's top models, offering even better results on many subjects than GPT-4o.
Claude 3.5 Sonnet Old$5.10$25.50claude-3-5-sonnet-20240620Anthropic's most intelligent model, offering even better results on many subjects than GPT-4o.
Claude 3.7 Sonnet$5.10$25.50claude-3-7-sonnet-20250219Anthropic's updated most intelligent model. Preferred by many for its programming skills and its natural language.
Claude 3.7 Sonnet Reasoner$5.10$25.50claude-3-7-sonnet-reasonerClaude 3.7 Sonnet Reasoner blends Deepseek R1's reasoning with Claude 3.7 Sonnet's response.
Claude 3.7 Sonnet Thinking$5.10$25.50claude-3-7-sonnet-thinkingAnthropic's Claude 3.7 Sonnet with the ability to show its thinking process step by step.
Claude 3.7 Sonnet Thinking (128K)$5.10$25.50claude-3-7-sonnet-thinking:128000Claude 3.7 Sonnet with maximum thinking budget (128,000 tokens).
Claude 3.7 Sonnet Thinking (1K)$5.10$25.50claude-3-7-sonnet-thinking:1024Claude 3.7 Sonnet with minimal thinking budget (1,024 tokens).
Claude 3.7 Sonnet Thinking (32K)$5.10$25.50claude-3-7-sonnet-thinking:32768Claude 3.7 Sonnet with extended thinking budget (32,768 tokens).
Claude 3.7 Sonnet Thinking (8K)$5.10$25.50claude-3-7-sonnet-thinking:8192Claude 3.7 Sonnet with reduced thinking budget (8,192 tokens).
Claude 4 Opus$25.50$127.50claude-opus-4-20250514Claude 4 Opus by Anthropic. The premium version of the new Claude models. A new generation model with improved capabilities, especially on programming and development.
Claude 4 Opus Thinking$25.50$127.50claude-opus-4-thinkingAnthropic's Claude 4 Opus with the ability to show its thinking process step by step.
Claude 4 Opus Thinking (128K)$25.50$127.50claude-opus-4-thinking:128000Claude 4 Opus with maximum thinking budget (128,000 tokens).
Claude 4 Opus Thinking (1K)$25.50$127.50claude-opus-4-thinking:1024Claude 4 Opus with minimal thinking budget (1,024 tokens).
Claude 4 Opus Thinking (32K)$25.50$127.50claude-opus-4-thinking:32768Claude 4 Opus with extended thinking budget (32,768 tokens).
Claude 4 Opus Thinking (8K)$25.50$127.50claude-opus-4-thinking:8192Claude 4 Opus with reduced thinking budget (8,192 tokens).
Claude 4 Sonnet$5.10$25.50claude-sonnet-4-20250514Claude 4 Sonnet by Anthropic. A new generation model with improved capabilities, especially on programming and development.
Claude 4 Sonnet Thinking$5.10$25.50claude-sonnet-4-thinkingAnthropic's Claude 4 Sonnet with the ability to show its thinking process step by step.
Claude 4 Sonnet Thinking (128K)$5.10$25.50claude-sonnet-4-thinking:128000Claude 4 Sonnet with maximum thinking budget (128,000 tokens).
Claude 4 Sonnet Thinking (1K)$5.10$25.50claude-sonnet-4-thinking:1024Claude 4 Sonnet with minimal thinking budget (1,024 tokens).
Claude 4 Sonnet Thinking (32K)$5.10$25.50claude-sonnet-4-thinking:32768Claude 4 Sonnet with extended thinking budget (32,768 tokens).
Claude 4 Sonnet Thinking (8K)$5.10$25.50claude-sonnet-4-thinking:8192Claude 4 Sonnet with reduced thinking budget (8,192 tokens).
Cogito v1 Preview Qwen 32B$3.06$2.75deepcogito/cogito-v1-preview-qwen-32B32-B parameter reasoning model from DeepCogito (Qwen backbone) – strong general reasoning & coding at low price.
Cohere: Command R$0.81$2.42cohere/command-r35B parameter model that performs conversational language tasks at a higher quality, more reliably, and with a longer context than previous models. It can be used for complex workflows like code generation, retrieval augmented generation (RAG), tool use, and agents
Cohere: Command R+$4.85$24.23cohere/command-r-plus-08-2024104B parameter model that performs conversational language tasks at a higher quality, more reliably, and with a longer context than previous models. It can be used for complex workflows like code generation, retrieval augmented generation (RAG), tool use, and agents
Damascus R1.$0.85$0.85Steelskull/L3.3-Damascus-R1Damascus-R1 builds upon some elements of the Nevoria foundation but represents a significant step forward with a completely custom-made DeepSeek R1 Distill base: Hydroblated-R1-V3. Constructed using the new SCE (Select, Calculate, and Erase) merge method, Damascus-R1 prioritizes stability, intelligence, and enhanced awareness.
Dazzling Star Aurora 32b$0.85$0.85Qwen2.5-32B-Dazzling-Star-Aurora-32b-v0.0A Qwen 2.5 32b finetuned for roleplay and storytelling.
DeepClaude$5.10$25.50deepclaudeHarness the power of DeepSeek R1's reasoning combined with Claude's creativity and code generation. Feeds your query into DeepSeek R1, then feeds the query + thinking process into Claude 3.5 Sonnet and returns an answer. Note: this routes through original DeepSeek meaning your data may be stored and used by DeepSeek.
DeepCoder 14B Preview$0.25$0.25agentica-org/DeepCoder-14B-PreviewDeepCoder-14B-Preview is a code reasoning LLM fine-tuned from DeepSeek-R1-Distilled-Qwen-14B using distributed reinforcement learning (RL) to scale up to long context lengths. The model achieves 60.6% Pass@1 accuracy on LiveCodeBench v5 (8/1/24-2/1/25), representing a 8% improvement over the base model (53%) and achieving similar performance to OpenAI's o3-mini with just 14B parameters.
DeepHermes-3 Mistral 24B (Preview)$1.02$0.90NousResearch/DeepHermes-3-Mistral-24B-Preview24-B parameter Mistral model fine-tuned by NousResearch for balanced reasoning & creativity.
DeepSeek Chat 0324$0.24$0.48deepseek-v3-0324DeepSeek V3 0324, DeepSeek's 03 March 2025 V3 model, optimized for general-purpose tasks.
DeepSeek Prover V2 671B$0.09$0.46deepseek-ai/DeepSeek-Prover-V2-671BDeepseek Prover V2 is a specialized model primarily used for proving math equations. Not a general model.
DeepSeek R1$0.46$1.95deepseek-r1-nanoDeepSeek's R1 is a thinking model, scoring very well on all benchmarks at low cost. This version is run via open-source providers and Azure, never routing through DeepSeek themselves.
DeepSeek R1 70B TEE$0.30$1.05TEE/deepseek-r1-70bDeepSeek's R1 model distilled into Llama 70B architecture for improved efficiency. Runs inside a GPU TEE for full, provable privacy.
DeepSeek R1 Fast$8.50$11.90deepseek-r1-sambanovaDeepSeek R1 via Sambanova: the full model with very fast output. Note: max 4k output tokens.
DeepSeek R1 Llama 70b$0.25$0.25deepseek-r1-llama-70bDeepSeek R1 Llama 70b is a fine-tuned version of DeepSeek R1 on Llama 70B.
DeepSeek R1 Zero Preview$3.74$3.74deepseek-ai/DeepSeek-R1-ZeroPreview version of Deepseek R1, also known as DeepSeek R1 Zero. Deepseek R1 without the supervised finetuning.
DeepSeek Reasoner$0.51$2.04deepseek-reasonerDeepSeek-R1 is now live and open source, rivaling OpenAI's Model o1.
DeepSeek V3/Chat Cheaper$0.20$0.41deepseek-chat-cheaperCheaper version of Deepseek V3/Chat. Note: may be routed through Deepseek itself.
DeepSeek V3/Deepseek Chat$0.24$0.48deepseek-chatDeepSeek original V3 model, trained on nearly 15 trillion tokens, matches leading closed-source models at a far lower price.
Deepseek R1 Cheaper$0.42$1.70deepseek-reasoner-cheaperCheaper version of DeepSeek R1. Note: may be routed through Chinese providers.
Deepseek R1 Llama 70b Abliterated$0.99$0.99huihui-ai/DeepSeek-R1-Distill-Llama-70B-abliteratedUncensored version of the Deepseek R1 Llama 70B model
Deepseek R1 Qwen Abliterated$0.99$0.99huihui-ai/DeepSeek-R1-Distill-Qwen-32B-abliteratedUncensored version of the Deepseek R1 Qwen 32B model
Deepseek R1 T Chimera$0.46$1.95tngtech/DeepSeek-R1T-ChimeraDeepseek V3 0324 with R1 reasoning using a novel construction method. In benchmarks, it appears to be as smart as R1 but much faster, using 40% fewer output tokens.
Dolphin 2.9.2 Mixtral 8x22B$1.53$1.53cognitivecomputations/dolphin-mixtral-8x22bSuccessor to Dolphin 2.6 Mixtral 8x7b. Great for instruction following, conversational, and coding.
Dolphin 72b$0.51$0.51dolphin-2.9.2-qwen2-72bDolphin is the most uncensored model yet, built on top of Qwen's 72b model.
Doubao 1.5 Pro 256k$1.02$1.70doubao-1.5-pro-256kDoubao's (Bytedance) flagship model with a 256k token context window
Doubao 1.5 Pro 32k$0.17$0.42doubao-1.5-pro-32kDoubao's (Bytedance) pro model with a 32k token context window
Doubao 1.5 Thinking Pro$1.02$4.08doubao-1-5-thinking-pro-250415Doubao-1.5 is a new deep thinking model that performs well in professional fields such as mathematics, programming, scientific reasoning, and general tasks such as creative writing. It has achieved outstanding results in AIME 2024, Codeforces, GPQA and other authoritative benchmarks have reached or are close to the first-tier level in the industry.
Doubao 1.5 Thinking Pro Vision$1.02$4.08doubao-1-5-thinking-pro-vision-250415Vision version of Doubao-1.5 pro thinking which is a new deep thinking model that performs well in professional fields such as mathematics, programming, scientific reasoning, and general tasks such as creative writing. It has achieved outstanding results in AIME 2024, Codeforces, GPQA and other authoritative benchmarks have reached or are close to the first-tier level in the industry.
Doubao 1.5 Thinking Vision Pro$0.94$2.43doubao-1-5-thinking-vision-pro-250428Doubao-1.5 is a new deep thinking model that performs well in professional fields such as mathematics, programming, scientific reasoning, and general tasks such as creative writing. It has achieved outstanding results in AIME 2024, Codeforces, GPQA and other authoritative benchmarks have reached or are close to the first-tier level in the industry.
Doubao 1.5 Vision Pro 32k$0.68$1.70doubao-1.5-vision-pro-32kDoubao's (Bytedance) vision-enabled pro model (JPG only) with a 32k token context window
EVA Llama 3.33 70B$3.40$3.40EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.0A RP/storywriting specialist model, full-parameter finetune of Llama-3.3-70B-Instruct on mixture of synthetic and natural data. It uses Celeste 70B 0.1 data mixture, greatly expanding it to improve versatility, creativity and flavor of the resulting model.
EVA Qwen2.5 72B$8.50$8.50eva-unit-01/eva-qwen-2.5-72bFull-parameter finetune of Qwen2.5-72B on mixture of synthetic and natural data. It uses Celeste 70B 0.1 data mixture, greatly expanding it to improve versatility, creativity and flavor of the resulting model.
EVA-LLaMA-3.33-70B-v0.1$3.40$3.40EVA-UNIT-01/EVA-LLaMA-3.33-70B-v0.1A RP/storywriting specialist model, full-parameter finetune of Llama-3.3-70B-Instruct on mixture of synthetic and natural data. It uses Celeste 70B 0.1 data mixture, greatly expanding it to improve versatility, creativity and flavor of the resulting model.
EVA-Qwen2.5-32B-v0.2$1.36$1.36EVA-UNIT-01/EVA-Qwen2.5-32B-v0.2A RP/storywriting specialist model, full-parameter finetune of Qwen2.5-32B on mixture of synthetic and natural data. It uses Celeste 70B 0.1 data mixture, greatly expanding it to improve versatility, creativity and flavor of the resulting model.
EVA-Qwen2.5-72B-v0.2$1.19$1.19EVA-UNIT-01/EVA-Qwen2.5-72B-v0.2A RP/storywriting specialist model, full-parameter finetune of Qwen2.5-72B on mixture of synthetic and natural data. It uses Celeste 70B 0.1 data mixture, greatly expanding it to improve versatility, creativity and flavor of the resulting model.
Ernie 4.5 8k Preview$1.12$4.42ernie-4.5-8k-previewERNIE 4.5 is Baidu's new generation native multimodal foundation model independently developed by the company. It achieves collaborative optimization through joint modeling of multiple modalities, demonstrating exceptional multimodal comprehension capabilities. With refined language skills, it exhibits comprehensive improvements in understanding, generation, reasoning and memory, along with notable enhancements in hallucination prevention, logical reasoning, and coding abilities.
Ernie 4.5 Turbo 128k$0.22$0.94ernie-4.5-turbo-128kERNIE 4.5 Turbo demonstrates overall progress in hallucination reduction, logical reasoning, and coding abilities, with faster response. The multimodal capabilities of ERNIE 4.5 Turbo are on par with GPT-4.1 and superior to GPT-4o across multiple benchmarks.
Ernie 4.5 Turbo VL 32k$0.84$2.43ernie-4.5-turbo-vl-32kERNIE 4.5 Turbo demonstrates overall progress in hallucination reduction, logical reasoning, and coding abilities, with faster response. The multimodal capabilities of ERNIE 4.5 Turbo are on par with GPT-4.1 and superior to GPT-4o across multiple benchmarks.
Ernie X1 32k$0.56$2.24ernie-x1-32k-previewERNIE X1 is a Baidu model, surpassing earlier versions in terms of intelligence and maximum input/output size.
Ernie X1 32k$0.56$2.24ernie-x1-32kERNIE X1 is a deep-thinking reasoning model, outperforming DeepSeek R1 and the latest version of V3.
Ernie X1 Turbo 32k$0.28$1.12ernie-x1-turbo-32kERNIE X1 is a deep-thinking reasoning model, outperforming DeepSeek R1 and the latest version of V3.
Evayale 70b $0.85$0.85Steelskull/L3.3-MS-Evayale-70BCombination of EVA and Euryale.
GLM 4 32B 0414$0.34$0.34THUDM/GLM-4-32B-0414Features 32 billion parameters. Performance is comparable to OpenAI's GPT series and DeepSeek's V3/R1 series. Pre-trained on 15T of high-quality data, including reasoning-type synthetic data. Enhanced performance in instruction following, engineering code, and function calling.
GLM 4 9B 0414$0.34$0.34THUDM/GLM-4-9B-0414A 9B parameter version of the GLM-4 series, offering a balance of performance and efficiency.
GLM 4 Air 0111$0.24$0.24glm-4-air-0111MiniMax's flagship model with a 1M token context window
GLM 4 Plus 0111$17.00$17.00glm-4-plus-0111GLM 4 Plus 0111 is a 1M token context window model
GLM Z1 32B 0414$0.34$0.34THUDM/GLM-Z1-32B-0414A reasoning model with deep thinking capabilities, based on GLM-4-32B-0414. Further trained on tasks involving mathematics, code, and logic. Significantly improves mathematical abilities and capability to solve complex tasks.
GLM Z1 9B 0414$0.34$0.34THUDM/GLM-Z1-9B-04149B small-sized model maintaining the open-source tradition. Despite its smaller scale, GLM-Z1-9B-0414 still exhibits excellent capabilities in mathematical reasoning and general tasks. Its overall performance is already at a leading level among open-source models of the same size.
GLM Z1 AirX$0.12$0.12glm-z1-airIncredibly cheap yet highly performant Chinese model, comparable to Deepseek R1 in performance on many metrics at 1/30th the cost.
GLM Z1 AirX$1.19$1.19glm-z1-airxFastest reasoning model in China, with up to 200 tokens per second. The stronger version of GLM Z1 Air.
GLM Z1 Rumination 32B 0414$0.34$0.34THUDM/GLM-Z1-Rumination-32B-0414A deep reasoning model with rumination capabilities (benchmarked against OpenAI's Deep Research). Employs longer periods of deep thought to solve more open-ended and complex problems (e.g., writing a comparative analysis of AI development in two cities and their future development plans). Integrates search tools during its deep thinking process.
GLM Zero Preview$3.06$3.06glm-zero-previewGLM Zero Preview is a thinking model like o1, but with a smaller context window
GLM-4$25.50$25.50glm-4High-intelligence model with 128K context window
GLM-4 Air$0.34$0.34glm-4-airHigh-performance model with 128K context window
GLM-4 AirX$3.40$3.40glm-4-airxFastest GLM-4 variant with 8K context window
GLM-4 Flash$0.02$0.02glm-4-flashExtremely cheap model with 128K context window
GLM-4 Long$0.34$0.34glm-4-longExtended context model supporting up to 1M tokens
GLM-4 Plus$12.75$12.75glm-4-plusGLM high-intelligence flagship model with 128K context window
GPT 3.5 Turbo$0.85$2.55gpt-3.5-turboOlder model. Brought ChatGPT to the mainstream, seen as dated nowadays. 90% cheaper than GPT-4-Turbo, recommended for very simple tasks.
GPT 4 Turbo Preview$17.00$51.00gpt-4-turbo-previewCan take in the largest messages (up to 300 pages of context), and all round seen as one of the best in class models.
GPT 4.1$2.00$8.00openai/gpt-4.1GPT 4.1 is the new flagship model from OpenAI. Huge context window (1 mln tokens), outperforms GPT-4o and GPT 4.5 across coding and does very well at understanding large contexts.
GPT 4.1 Mini$0.40$1.60openai/gpt-4.1-miniMid-sized GPT 4.1, comparable to GPT4o with a far higher context window, at lower cost and with higher speed.
GPT 4.1 Nano$0.10$0.40openai/gpt-4.1-nanoCheapest model in the GPT-4.1 series. Huge context window with fast throughput and low latency.
GPT 4.1 Reasoner$3.40$13.60gpt-4.1-reasoner'DeepGPT 4.1', fusion of GPT 4.1 and Deepseek R1. Deepseek R1 reasons, then feeds it into GPT 4.1 to generate a response.
GPT 4.5$102.00$204.00gpt-4.5-previewGPT 4.5 Preview. Largest GPT model designed for creative tasks and agentic planning, currently available in a research preview.
GPT 4.5 Preview Reasoner$127.50$255.00gpt-4.5-preview-2025-02-27-reasoner'DeepGPT4.5', fusion of GPT 4.5 Preview and Deepseek R1. Deepseek R1 reasons, then feeds it into GPT 4.5 Preview to generate a response.
GPT 4o$2.26$9.01gpt-4oOpenAI's precusor to ChatGPT-4o. Great on English text and code, with significant improvements on text in non-English languages.
GPT 4o 08 06$4.25$17.00gpt-4o-2024-08-06OpenAI's precusor to ChatGPT-4o. Great on English text and code, with significant improvements on text in non-English languages.
GPT 4o 11 20$4.25$17.00gpt-4o-2024-11-20OpenAI's precusor to ChatGPT-4o. Great on English text and code, with significant improvements on text in non-English languages.
GPT 4o Mini Search$0.25$1.02gpt-4o-mini-search-previewGPT 4o Mini with web search built in natively via OpenAI.
GPT 4o Reasoner$4.25$17.00gpt-4o-reasoner'DeepGPT4o', fusion of GPT 4o and Deepseek R1. Deepseek R1 reasons, then feeds it into GPT 4o to generate a response.
GPT 4o Search$4.25$17.00gpt-4o-search-previewGPT 4o with web search built in natively via OpenAI.
GPT 4o mini$0.25$1.02gpt-4o-miniOpenAI's most cost-efficient small model. Cheaper and smarter than GPT-3.5 (the original ChatGPT), but less performant than gpt-4o
Gemini 1.5 Flash$0.13$0.51google/gemini-flash-1.5Google's fastest multimodal model with great performance for diverse, repetitive tasks and a 2 million words context window.
Gemini 2.0 Flash$0.17$0.68gemini-2.0-flash-001Upgraded version of Gemini Flash 1.5. Faster, with higher output, and overall increase in intelligence.
Gemini 2.0 Flash Exp$0.34$0.85gemini-2.0-flash-expExperimental version of Google's newest model, outperforming even Gemini 1.5 Pro.
Gemini 2.0 Flash Exp Search$0.34$1.02gemini-2.0-flash-exp-searchGoogle's newest model, outperforming even Gemini 1.5 Pro. Now with web access.
Gemini 2.0 Flash Lite$0.13$0.51gemini-2.0-flash-liteUpgraded version of Gemini Flash 1.5. Faster, with higher output, and overall increase in intelligence.
Gemini 2.0 Flash Thinking 0121$0.34$0.85gemini-2.0-flash-thinking-exp-01-21Google's newest model, outperforming even Gemini 1.5 Pro, now with a thinking mode enabled similar to the o1 series of OpenAI.
Gemini 2.0 Flash Thinking 1219$0.34$0.85gemini-2.0-flash-thinking-exp-1219Google's newest model, outperforming even Gemini 1.5 Pro, now with a thinking mode enabled similar to the o1 series of OpenAI.
Gemini 2.0 Pro 0205$3.40$13.60gemini-2.0-pro-exp-02-05Gemini 2.0 Pro Exp 0205, the latest version of the Gemini 2.0 Pro model.
Gemini 2.0 Pro 1206$4.25$17.00gemini-exp-1206Gemini 2.0 Pro 1206, the previous version of the Gemini 2.0 Pro model.
Gemini 2.0 Pro Reasoner$2.21$8.50gemini-2.0-pro-reasoner'DeepGemini', fusion of Gemini 2.0 Pro and Deepseek R1. Deepseek R1 reasons, then feeds it into Gemini 2.0 Pro to generate a response.
Gemini 2.5 Flash 0520$0.15$0.60gemini-2.5-flash-preview-05-20Gemini 2.5 Flash 0520, the latest version of the Gemini 2.5 Flash model.
Gemini 2.5 Flash 0520 Thinking$0.15$3.50gemini-2.5-flash-preview-05-20:thinkingGemini 2.5 Flash 0520, the latest version of the Gemini 2.5 Flash model.
Gemini 2.5 Flash Preview$0.25$1.02gemini-2.5-flash-preview-04-17Fast, cost efficient performance on complex tasks. The workhorse of the Gemini series.
Gemini 2.5 Flash Preview Thinking$0.25$5.95gemini-2.5-flash-preview-04-17:thinkingFast, cost efficient performance on complex tasks. The workhorse of the Gemini series. Thinking turned on by default
Gemini 2.5 Pro$4.25$17.00gemini-2.5-pro-preview-03-25Gemini 2.5 Pro Preview 0325. Google's newest model. Topping all the leaderboards as of March 28th 2025.
Gemini 2.5 Pro$4.25$17.00gemini-2.5-pro-preview-05-06Gemini 2.5 Pro Preview 0506. Google's newest model.
Gemini 2.5 Pro Experimental$4.25$17.00gemini-2.5-pro-exp-03-25Gemini 2.5 Pro Exp 0325. Google's newest model. Topping all the leaderboards as of March 28th 2025.
Gemini LearnLM Experimental$5.95$17.85learnlm-1.5-pro-experimentalLearnLM is a task-specific model trained to align with learning science principles when following system instructions for teaching and learning use cases. For instance, the model can take on tasks to act as an expert or guide to educate users on specific topics.
Gemini Text + Image$0.34$1.36gemini-2.0-flash-exp-image-generationGemini 2.0 Flash Image Generation. Can generate both text and images within the same prompt!
Gemma 3 12B IT$0.42$0.42unsloth/gemma-3-12b-itGemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions.
Gemma 3 1B IT$0.17$0.17unsloth/gemma-3-1b-itGemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions.
Gemma 3 27B IT$0.51$0.51unsloth/gemma-3-27b-itGemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions.
Gemma 3 4B IT$0.25$0.25unsloth/gemma-3-4b-itGemma 3 has a large, 128K context window, multilingual support in over 140 languages, and is available in more sizes than previous versions.
Grayline Qwen3 8B$0.51$0.51soob3123/GrayLine-Qwen3-8BGrayline is an neutral AI assistant engineered for uncensored information delivery and task execution. This model operates without inherent ethical or moral frameworks, designed to process and respond to any query with objective efficiency and precision. Grayline's core function is to leverage its full capabilities to provide direct answers and execute tasks as instructed, without offering unsolicited commentary, warnings, or disclaimers. It accesses and processes information without bias or restriction.
Grok 3 Beta$5.10$25.50grok-3-betaGrok 3 is the latest model from xAI. It's their flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. Possesses deep domain knowledge in finance, healthcare, law, and science. Excels in structured tasks and benchmarks like GPQA, LCB, and MMLU-Pro where it outperforms Grok 3 Mini even on high thinking.
Grok 3 Fast Beta$8.50$42.50grok-3-fast-betaFaster output version of Grok 3. Grok 3 is the latest model from xAI. It's their flagship model that excels at enterprise use cases like data extraction, coding, and text summarization. Possesses deep domain knowledge in finance, healthcare, law, and science. Excels in structured tasks and benchmarks like GPQA, LCB, and MMLU-Pro where it outperforms Grok 3 Mini even on high thinking.
Grok 3 Mini Beta$0.51$0.85grok-3-mini-betaGrok 3 Mini is a lightweight, smaller thinking model. Unlike traditional models that generate answers immediately, Grok 3 Mini thinks before responding. It's ideal for reasoning-heavy tasks that don't demand extensive domain knowledge, and shines in math-specific and quantitative use cases, such as solving challenging puzzles or math problems.
Grok 3 Mini Fast Beta$1.02$6.80grok-3-mini-fast-betaFaster output version of Grok 3 Mini. Grok 3 Mini is a lightweight, smaller thinking model. Unlike traditional models that generate answers immediately, Grok 3 Mini thinks before responding. It's ideal for reasoning-heavy tasks that don't demand extensive domain knowledge, and shines in math-specific and quantitative use cases, such as solving challenging puzzles or math problems.
Hermes 3 70B TEE$0.75$0.75TEE/hermes-3-llama-3.1-70bHermes 3 is a generalist language model with many improvements over Hermes 2, including advanced agentic capabilities, much better roleplaying, reasoning, multi-turn conversation, long context coherence, and improvements across the board. Runs inside a GPU TEE for full, provable privacy.
Hermes 3 Large$3.40$5.10nousresearch/hermes-3-llama-3.1-405bLlama 3.1 405b with the brakes taken off. Less censored than the regular version, but not abliterated
Hunyuan T1$0.29$1.12hunyuan-t1-latestHunyuan T1 is Tencent's top tier reasoning model. Good at large scale reasoning, precise following of complex instructions, low hallucations and blazing fast outputs.
Hunyuan Turbo S$0.24$0.56hunyuan-turbos-20250226Hunyuan Turbo S by Tencent is a thinking model that responds instantly.
Inflection 3 Pi$4.25$17.00inflection/inflection-3-piA chatbot with emotional intelligence. Has access to recent news, excels in scenarios like customer support and roleplay. Mirrors your conversation style.
Inflection 3 Productivity$4.25$17.00inflection/inflection-3-productivityOptimized for instruction following. Good at tasks that require precise adherence to provided guidelines. Has access to recent news.
Jamba Large 1.6$3.40$13.60jamba-large-1.6Its ability to process large volumes of unstructured data (256k tokens) with high accuracy makes it ideal for summarization and document analysis.
Jamba Mini 1.6$0.34$0.68jamba-mini-1.6Smaller and cheaper version of Jamba Large 1.6 (52B parameters versus 398B parameters), ideal for smaller tasks and lower budgets, still with a 256k context window.
Kimi Latest$8.50$8.50kimi-latestAlways point to the latest stable Kimi model.
Kimi Thinking Preview$53.48$53.48kimi-thinking-previewKimi Thinking Preview is a new model that is capable of thinking and reasoning. It's quite expensive!
Kimi VL Thinking$1.02$1.02moonshotai/Kimi-VL-A3B-ThinkingEfficient open-source MoE vision-language model (2.8B active params) with advanced multimodal reasoning, 128K long-context understanding, strong agent capabilities, and long-thinking variant. Excels in multi-turn agent tasks, image/video comprehension, OCR, math reasoning, and multi-image understanding. Competes with GPT-4o-mini, Qwen2.5-VL-7B, Gemma-3-12B-IT, surpasses GPT-4o in some domains.
LatitudeGames WayFarer 12B$0.34$0.34Mistral-Nemo-12B-WayfarerLatitude Games Wayfarer 12B
Llama 3 70B abliterated$0.99$0.99failspy/Meta-Llama-3-70B-Instruct-abliterated-v3.5An abliterated (removed restrictions and censorship) version of Llama 3.1 70b.
Llama 3.05 Storybreaker Ministral 70b$0.85$0.85Envoid/Llama-3.05-NT-Storybreaker-Ministral-70BMuch more inclined to output adult content than its predecessor. Great choice for novelty roleplay scenarios.
Llama 3.1 70B ArliAI RPMax v1.3$0.51$0.51Llama-3.3+3.1-70B-ArliAI-RPMax-v1.3RPMax are a series of models that are trained on a diverse set of curated creative writing and RP datasets with a focus on variety and deduplication. This model is designed to be highly creative and non-repetitive by making sure no two entries in the dataset have repeated characters or situations, which makes sure the model does not latch on to a certain personality and be capable of understanding and acting appropriately to any characters or situations.
Llama 3.1 70B Celeste v0.1$0.85$0.85nothingiisreal/L3.1-70B-Celeste-V0.1-BF16Creative model based on Llama 3.1 70B
Llama 3.1 70B Dracarys 2$0.85$0.85abacusai/Dracarys-72B-InstructLlama 3.1 70b finetune that offers improvements on coding.
Llama 3.1 70B Euryale$0.51$0.59Sao10K/L3.1-70B-Euryale-v2.2A 70B parameter model from SAO10K based on Llama 3.1 70B, offering high-quality text generation.
Llama 3.1 70B Hanami$0.85$0.85Sao10K/L3.1-70B-Hanami-x1Euryale v2.2-based finetune.
Llama 3.1 8B (decentralized)$0.02$0.03Meta-Llama-3-1-8B-Instruct-FP8Meta's Llama 3.1 8B model via an open permissionless network
Llama 3.1 8b (uncensored)$0.34$0.34aion-labs/aion-rp-llama-3.1-8bThis is a truly uncensored model, trained to excel at roleplaying and creative writing. However, it can also do other things!
Llama 3.1 8b Instruct$0.09$0.09meta-llama/llama-3.1-8b-instructFast and efficient for simple purposes.
Llama 3.1 Large$0.34$0.34Meta-Llama-3-1-405B-Instruct-FP8Note: comes with a 90% discount currently, enjoy! Meta's largest Llama 3.1 405B model. Open-source, run through an open permissionless crypto network (no central provider).
Llama 3.2 3b Instruct$0.05$0.09meta-llama/llama-3.2-3b-instructSmall model optimized for advanced natural language processing tasks like dialogue generation, reasoning, and summarization
Llama 3.2 Medium$1.53$1.53meta-llama/llama-3.2-90b-vision-instructMedium-size (and capability) version of Meta's newest model (3.2 series).
Llama 3.3 70B Anubis v1$0.85$0.85Llama-3.3-70B-Anubis-v1Llama 3.3 70B Anubis v1
Llama 3.3 70B Cu Mai$0.85$0.85Steelskull/L3.3-Cu-Mai-R1-70bA 70B parameter model from Steelskull based on Llama 3.3 70B, offering high-quality text generation.
Llama 3.3 70B Cu Mai R1$0.51$0.51Llama-3.3-70B-Cu-Mai-R1Llama 3.3 70B Cu Mai R1
Llama 3.3 70B Electra R1$0.51$0.51Llama-3.3-70B-Electra-R1Llama 3.3 70B Electra R1
Llama 3.3 70B Electranova v1.0$0.51$0.51Llama-3.3-70B-Electranova-v1.0Llama 3.3 70B Electranova v1.0
Llama 3.3 70B Euryale$0.85$0.85Sao10K/L3.3-70B-Euryale-v2.3A 70B parameter model from SAO10K based on Llama 3.3 70B, offering high-quality text generation.
Llama 3.3 70B Fallen R1 v1$0.51$0.51Llama-3.3-70B-Fallen-R1-v1Llama 3.3 70B Fallen R1 v1
Llama 3.3 70B Instruct abliterated$0.99$0.99huihui-ai/Llama-3.3-70B-Instruct-abliteratedAn abliterated (removed restrictions and censorship) version of Llama 3.3 70b.
Llama 3.3 70B Legion V2.1$0.51$0.51Llama-3.3-70B-Legion-V2.1Llama 3.3 70B Legion V2.1
Llama 3.3 70B Magnum v4 SE$0.51$0.51Llama-3.3-70B-Magnum-v4-SELlama 3.3 70B Magnum v4 SE
Llama 3.3 70B RPMax v1.4$0.51$0.51Llama-3.3-70B-ArliAI-RPMax-v1.4Llama 3.3 70B RPMax v1.4
Llama 3.3 70B TEE$0.18$0.52TEE/llama-3.3-70b-instructThe Meta Llama 3.3 multilingual large language model (LLM) is a pretrained and instruction tuned generative model in 70B (text in/text out). The Llama 3.3 instruction tuned text only model is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks. Runs inside a GPU TEE for full, provable privacy.
Llama 3.3 70B Vulpecula R1$0.51$0.51Llama-3.3-70B-Vulpecula-R1Llama 3.3 70B Vulpecula R1
Llama 3.3 70B Wayfarer$1.19$1.19LatitudeGames/Wayfarer-Large-70B-Llama-3.3Llama 3.3 70B Wayfarer is a fine-tuned version of Llama 3.3 70B, trained on a diverse set of creative writing and RP datasets with a focus on variety and deduplication. This model is designed to be highly creative and non-repetitive by making sure no two entries in the dataset have repeated characters or situations, which keeps the model from latching on to a certain personality and leaves it capable of understanding and acting appropriately for any character or situation.
Llama 3.3 70b Instruct$0.10$0.25meta-llama/llama-3.3-70b-instructLlama 3.3 is optimized for multilingual dialogue use cases and outperforms many of the available open source and closed chat models on common industry benchmarks.
Llama 3.3 70b Mirai Fanfare$0.85$0.85Llama-3.3-70B-MiraiFanfareA Llama 3.3 70b finetuned for roleplay and storytelling.
Llama 3.3+ 70B WhiteRabbitNeo-2$0.51$0.51Llama-3.3+(3.1v3.3)-70B-WhiteRabbitNeo-2WhiteRabbitNeo is a model series that can be used for offensive and defensive cybersecurity.
Llama 3.3+(3.1v3.3) 70B Hanami x1$0.51$0.51Llama-3.3+(3.1v3.3)-70B-Hanami-x1Llama 3.3+(3.1v3.3) 70B Hanami x1
Llama 3.3+(3v3.3) 70B TenyxChat DaybreakStorywriter$0.51$0.51Llama-3.3+(3v3.3)-70B-TenyxChat-DaybreakStorywriterLlama 3.3+(3v3.3) 70B TenyxChat DaybreakStorywriter
Llama 4 Maverick$0.18$0.80meta-llama/llama-4-maverickLlama 4 Maverick, a 17 billion active parameter model with 128 experts, is the best multimodal model in its class, beating GPT-4o and Gemini 2.0 Flash across a broad range of widely reported benchmarks, while achieving comparable results to the new DeepSeek v3 on reasoning and coding, at less than half the active parameters. Llama 4 Maverick offers a best-in-class performance to cost ratio with an experimental chat version scoring ELO of 1417 on LMArena.
Llama 4 Scout$0.09$0.46meta-llama/llama-4-scoutLlama 4 Scout, a 17 billion active parameter model with 16 experts, is the best multimodal model in the world in its class and is more powerful than all previous generation Llama models, while fitting in a single H100 GPU. Additionally, Llama 4 Scout offers an industry-leading context window of 10M and delivers better results than Gemma 3, Gemini 2.0 Flash-Lite, and Mistral 3.1 across a broad range of widely reported benchmarks.
Llama-xLAM-2 70B fc-r$4.25$3.88Salesforce/Llama-xLAM-2-70b-fc-rSalesforce's 70B frontier model focused on function-calling & retrieval-augmented generation.
Lumimaid 70b$3.40$3.40NeverSleep/Llama-3-Lumimaid-70B-v0.1Neversleep Llama 3 Lumimaid 70B
Lumimaid v0.2$2.01$2.01NeverSleep/Lumimaid-v0.2-70BUpgrade to Llama-3 Lumimaid 70B. A Llama 3.1 70B finetune trained on curated roleplay data.
MN-LooseCannon-12B-v1$0.85$0.85GalrionSoftworks/MN-LooseCannon-12B-v1Merge of Starcannon and Sao Lyra.
MS Evalebis 70b$0.85$0.85Steelskull/L3.3-MS-Evalebis-70bCombination of EVA, Euryale and Anubis.
Mag Mell R1$0.85$0.85inflatebot/MN-12B-Mag-Mell-R1Mag Mell demonstrates worldbuilding capabilities unlike any model in its class, comparable to old adventuring models like Tiefighter, and prose that exhibits minimal slop.
Magnum V2 72B$3.40$5.10anthracite-org/magnum-v2-72bMagnum V2 72B
Magnum v4 72B$3.40$5.10anthracite-org/magnum-v4-72bUpgraded model of Magnum V2 72B. From the creators of Goliath. Aimed at achieving prose quality similar to Claude Opus 3, trained on 55 million tokens of curated Roleplay data.
Mercury Coder Small$0.42$1.70mercury-coder-smallModel by Inception AI. A diffusion large language model that runs incredibly quickly (500+ tokens/second) while matching Claude 3.5 Haiku and GPT-4o-mini. 1st in speed on Copilot arena, and matching 2nd in quality.
Microsoft Deepseek R1$0.17$0.17microsoft/MAI-DS-R1-FP8MAI-DS-R1 is a DeepSeek-R1 reasoning model that has been post-trained by the Microsoft AI team to improve its responsiveness on blocked topics and its risk profile, while maintaining its reasoning capabilities and competitive performance.
Microsoft Phi 4 Reasoning$0.10$0.10microsoft/Phi-4-reasoningA 14-billion parameter open-weight reasoning model that rivals much larger models on complex reasoning tasks. Trained via supervised fine-tuning of Phi-4 on carefully curated reasoning demonstrations from OpenAI o3-mini, Phi-4-reasoning generates detailed reasoning chains that effectively leverage additional inference-time compute.
Microsoft Phi 4 Reasoning Plus$0.10$0.10microsoft/Phi-4-reasoning-plusA 14-billion parameter open-weight reasoning model that rivals much larger models on complex reasoning tasks. Trained via supervised fine-tuning of Phi-4 on carefully curated reasoning demonstrations from OpenAI o3-mini, Phi-4-reasoning generates detailed reasoning chains that effectively leverage additional inference-time compute. Phi-4-reasoning-plus builds upon Phi-4-reasoning capabilities, further trained with reinforcement learning to utilize more inference-time compute, using 1.5x more tokens than Phi-4-reasoning, to deliver higher accuracy.
MiniMax 01$0.34$1.87minimax/minimax-01MiniMax's flagship model with a 1M token context window
Mistral 7B Instruct$0.09$0.09mistralai/mistral-7b-instructOptimized for speed with decent context length
Mistral Devstral Small 2505$17.00$34.00mistralai/Devstral-Small-2505OpenHands+Devstral is 100% local 100% open, and is SOTA for the category on SWE-Bench Verified: 46.8% accuracy.
Mistral Large 2411$3.40$10.20mistralai/mistral-largeUpgrade to Mistral's flagship model. It is fluent in English, French, Spanish, German, and Italian, with high grammatical accuracy, with a long context window.
Mistral Medium 3$0.68$3.40mistralai/mistral-medium-3Mistral Medium 3 delivers frontier performance while being an order of magnitude less expensive. For instance, the model performs at or above 90% of Claude Sonnet 3.7 on benchmarks across the board at a significantly lower cost. On performance, Mistral Medium 3 also surpasses leading open models such as Llama 4 Maverick and enterprise models such as Cohere Command A. On pricing, the model beats cost leaders such as DeepSeek v3, both in API and self-deployed systems.
Mistral Nemo$0.17$0.20mistralai/Mistral-Nemo-Instruct-240712B parameter model with multilingual support.
Mistral Nemo 12B Instruct 2407$0.17$0.20Mistral-Nemo-12B-Instruct-2407Mistral Nemo 12B Instruct 2407
Mistral Nemo 12B NemoMix Unleashed$0.34$0.34Mistral-Nemo-12B-NemoMix-UnleashedMistral Nemo 12B NemoMix Unleashed
Mistral Nemo 12B RPMax v1.1$0.17$0.25Mistral-Nemo-12B-ArliAI-RPMax-v1.1Mistral Nemo 12B RPMax v1.1
Mistral Nemo 12B RPMax v1.3$0.17$0.25Mistral-Nemo-12B-ArliAI-RPMax-v1.3Mistral Nemo 12B RPMax v1.3
Mistral Nemo 12B SauerkrautLM$0.17$0.25Mistral-Nemo-12B-SauerkrautLMMistral Nemo 12B SauerkrautLM
Mistral Nemo Inferor 12B$0.42$0.85Infermatic/MN-12B-Inferor-v0.0Inferor is a merge of top roleplay models, expert on immersive narratives and storytelling.
Mistral Nemo Starcannon 12b v1$0.85$0.85VongolaChouko/Starcannon-Unleashed-12B-v1.0Mistral Nemo finetune that offers improvements on roleplay.
Mistral Saba$0.34$1.02mistralai/mistral-sabaMistral Saba is a 24B-parameter language model specifically designed for the Middle East and South Asia, delivering accurate and contextually relevant responses while maintaining efficient performance. Trained on curated regional datasets, it supports multiple Indian-origin languages (including Tamil and Malayalam) alongside Arabic. This makes it a versatile option for a range of regional and multilingual applications.
Mistral Small 31 24b Instruct$0.17$0.51mistral-small-31-24b-instructBuilding upon Mistral Small 3 (2501), Mistral Small 3.1 (2503) adds state-of-the-art vision understanding and enhances long context capabilities up to 128k tokens without compromising text performance. With 24 billion parameters, this model achieves top-tier capabilities in both text and vision tasks.
Mistral Tiny$0.42$0.42mistralai/mistral-tinyPowered by Mistral-7B-v0.2, best used for large batch processing tasks where cost is a significant factor but reasoning capabilities are not crucial.
MythoMax 13B$0.17$0.17Gryphe/MythoMax-L2-13bOne of the highest performing and most popular fine-tunes of Llama 2 13B, with rich descriptions and roleplay.
Nemo Arli 12b RPMax V1.2$0.17$0.25Mistral-Nemo-12B-ArliAI-RPMax-v1.2A Mistral Nemo 12b finetuned for roleplay and storytelling.
NemoMix 12B Unleashed$0.85$0.85MarinaraSpaghetti/NemoMix-Unleashed-12BGreat for RP and storytelling.
Nemotron 3.1 70B abliterated$0.99$0.99huihui-ai/Llama-3.1-Nemotron-70B-Instruct-HF-abliteratedAn abliterated (removed restrictions and censorship) version of Llama 3.1 70b Nemotron.
Nemotron Tenyxchat Storybreaker 70b$0.85$0.85Envoid/Llama-3.05-Nemotron-Tenyxchat-Storybreaker-70BOverall it provides a solid option for RP and creative writing while still functioning as an assistant model, if desired. If used to continue a roleplay it will generally follow the ongoing cadence of the conversation.
Neural Daredevil 8B abliterated$0.61$0.61mlabonne/NeuralDaredevil-8B-abliteratedThe best performing 8B abliterated model according to most benchmarks.
Nvidia Nemotron 70b$0.59$0.68nvidia/Llama-3.1-Nemotron-70B-Instruct-HFNvidia's latest Llama fine-tune optimized for instruction following. Early results hint that it might outperform models such as GPT-4o and Claude 3.5 Sonnet.
Nvidia Nemotron Super 49B$2.55$2.55nvidia/Llama-3.3-Nemotron-Super-49B-v1Llama-3.3-Nemotron-Super-49B-v1 is a model which offers a great tradeoff between model accuracy and efficiency. Efficiency (throughput) directly translates to savings. Using a novel Neural Architecture Search (NAS) approach, we greatly reduce the model's memory footprint, enabling larger workloads, as well as fitting the model on a single GPU at high workloads (H200). This NAS approach enables the selection of a desired point in the accuracy-efficiency tradeoff. For more information on the NAS approach, please refer to this paper.
Nvidia Nemotron Ultra 253B$0.68$1.36nvidia/Llama-3.1-Nemotron-Ultra-253B-v1Llama-3.1-Nemotron-Ultra-253B-v1 is a large language model (LLM) which is a derivative of Meta Llama-3.1-405B-Instruct (AKA the reference model). It is a reasoning model that is post trained for reasoning, human chat preferences, and tasks, such as RAG and tool calling. The model supports a context length of 128K tokens. This model fits on a single 8xH100 node for inference. Llama-3.1-Nemotron-Ultra-253B-v1 is a model which offers a great tradeoff between model accuracy and efficiency. Efficiency (throughput) directly translates to savings. Using a novel Neural Architecture Search (NAS) approach, we greatly reduce the model's memory footprint, enabling larger workloads, as well as reducing the number of GPUs required to run the model in a data center environment. This NAS approach enables the selection of a desired point in the accuracy-efficiency tradeoff. Furthermore, by using a novel method to vertically compress the model (see details here), it also offers a significant improvement in latency.
OlympicCoder 32B$0.85$0.85open-r1/OlympicCoder-32BA code model that achieves very strong performance on competitive coding benchmarks such as LiveCodeBench and the 2024 International Olympiad in Informatics.
OlympicCoder 7B$0.34$0.34open-r1/OlympicCoder-7bA lightweight code model that performs well on competitive coding benchmarks such as LiveCodeBench and the 2024 International Olympiad in Informatics.
OpenAI o1$25.50$102.00o1OpenAI's flagship reasoning model for solving hard problems. Useful when tackling complex problems in science, coding, math, and similar fields.
OpenAI o1 Pro$255.00$1020.00openai/o1-proOpenAI's flagship series of reasoning models for solving hard problems. The o1 series of models are trained with reinforcement learning to think before they answer and perform complex reasoning. The o1-pro model uses more compute to think harder and provide consistently better answers. o1 pro comes with a massive 100,000 word output window.
OpenAI o1 preview$25.50$102.00o1-previewOpenAI's new flagship series of reasoning models for solving hard problems. Useful when tackling complex problems in science, coding, math, and similar fields
OpenAI o1-mini$5.10$20.40o1-miniA fast, cost-efficient version of OpenAI's o1 reasoning model tailored to coding, math, and science use cases.
OpenAI o3$10.00$40.00o3Full version of OpenAI's o3. The current flagship model by OpenAI which OpenAI sees as getting close to true AGI.
OpenAI o3-mini$1.87$7.48o3-miniThe cheaper version of OpenAI's newest thinking model. Fast, cheap, and with a maximum output of 100,000 words.
OpenAI o3-mini high$1.87$7.48o3-mini-highOpenAI's newest flagship model with reasoning effort set to high.
OpenAI o3-mini low$1.87$7.48o3-mini-lowOpenAI's newest flagship model with reasoning effort set to low.
OpenAI o4-mini$1.10$4.40o4-minio4 mini is the mini version of what will be the next version of OpenAI models.
OpenAI o4-mini high$1.10$4.40o4-mini-highThe maximum/high version of the o4-mini model. The next generation of OpenAI models.
Perplexity Deep Research$3.40$13.60sonar-deep-researchCurrently unstable. Analyzes hundreds of sources, delivering expert-level insights in minutes. Deep Research API has a 93.9% accuracy on SimpleQA benchmark and attains a score of 21.1% accuracy on Humanity's Last Exam, significantly outperforming Gemini Thinking, o3-mini, o1, and DeepSeek-R1.
Perplexity Pro$5.10$25.50sonar-proSonar Pro tackles complex questions that need deeper research and provides more sources.
Perplexity R1 1776$3.40$13.60r1-1776R1 1776 is a version of the DeepSeek R1 model that has been post-trained by Perplexity to provide uncensored, unbiased, and factual information.
Perplexity Reasoning$1.70$8.50sonar-reasoningPerplexity's Sonar Reasoning combines DeepSeek R1's thinking process with web search to tackle complex questions that need deeper research, and provides more sources.
Perplexity Reasoning Pro$3.40$13.60sonar-reasoning-proPerplexity's Sonar Reasoning Pro combines DeepSeek R1's thinking process with web search to tackle complex questions that need deeper research, and provides more sources.
Perplexity Simple$1.70$1.70sonarA Perplexity model that gives fast, straightforward answers.
Phi 4 Mini$0.20$0.82phi-4-mini-instructPhi 4 Mini by Microsoft. A small multilingual model.
Phi 4 Multimodal$0.12$0.19phi-4-multimodal-instructPhi 4 by Microsoft. A small multimodal model that can handle images and text.
QwQ 32B Snowdrop v0$0.34$0.34QwQ-32B-Snowdrop-v0QwQ 32B Snowdrop v0
QwQ 32B Snowdrop v0 nothink$0.34$0.34QwQ-32B-Snowdrop-v0-nothinkQwQ 32B Snowdrop v0 nothink
QwQ 32b Arli V1$0.34$0.34QwQ-32B-ArliAI-RpR-v1A QwQ 32b finetuned for roleplay and storytelling.
QwQ 32b Arli V2$0.34$0.34QwQ-32B-ArliAI-RpR-v2A QwQ 32b finetuned for roleplay and storytelling.
QwQ 32b Arli V3$0.34$0.34QwQ-32B-ArliAI-RpR-v3A QwQ 32b finetuned for roleplay and storytelling.
Qwen 2.5 32b EVA$0.85$0.85Qwen2.5-32B-EVA-v0.2A Qwen 2.5 32b finetuned for roleplay and storytelling.
Qwen 2.5 72B Instruct$0.59$0.68Qwen2.5-72B-InstructQwen 2.5 72B Instruct
Qwen 2.5 Coder 32b$0.27$0.27Qwen/Qwen2.5-Coder-32B-InstructThe latest series of Code-Specific Qwen large language models.
Qwen 2.5 Max$2.72$10.88qwen-maxQwen 2.5 Max is the upgraded version of Qwen Max, beating GPT-4o, Deepseek V3 and Claude 3.5 Sonnet in benchmarks.
Qwen 3 14b$0.06$0.18qwen/qwen3-14bQwen 3 14b is a 14b model. Supports switching between thinking and non-thinking modes: add /think or /no_think anywhere in a prompt or system message to toggle chain-of-thought reasoning.
Qwen 3 235b A22B$0.13$0.50qwen/qwen3-235b-a22bQwen 3 235b is a 235b model with 22B active parameters. Supports switching between thinking and non-thinking modes: add /think or /no_think anywhere in a prompt or system message to toggle chain-of-thought reasoning.
Qwen 3 30b A3B$0.08$0.24qwen/qwen3-30b-a3bQwen 3 30b A3B is a 30b model with 3 billion active parameters per pass. Supports switching between thinking and non-thinking modes: add /think or /no_think anywhere in a prompt or system message to toggle chain-of-thought reasoning.
Qwen 3 32b$0.08$0.24qwen/qwen3-32bQwen 3 32b is a 32b model. Supports switching between thinking and non-thinking modes: add /think or /no_think anywhere in a prompt or system message to toggle chain-of-thought reasoning.
Qwen 3 8B$0.06$0.18Qwen/Qwen3-8BQwen 3 8B is an 8B model. Supports switching between thinking and non-thinking modes: add /think or /no_think anywhere in a prompt or system message to toggle chain-of-thought reasoning.
Qwen Long 10M$0.17$0.68qwen-longAlibaba's huge context window model. Takes in up to 10 million tokens, which is equivalent to dozens of books.
Qwen Plus$0.68$2.04qwen-plusAlibaba's balanced model. Fast, cheap, yet still very powerful.
Qwen QwQ 32B Preview$0.68$0.68Qwen/QwQ-32B-PreviewExperimental release of Qwen's reasoning model. Great at coding and math, but still in development so may exhibit odd bugs. Not production-ready.
Qwen Turbo$0.09$0.34qwen-turboAlibaba's fastest and cheapest model. Suitable for simple tasks, fast and low cost, with a 1 million token context window.
Qwen2.5 72B$0.59$0.68qwen/qwen-2.5-72b-instructGreat multilingual support, strong at mathematics and coding, supports roleplay and chatbots.
Qwen2.5 7B TEE$0.60$0.60TEE/qwen-2.5-7b-instructGreat multilingual support, strong at mathematics and coding, supports roleplay and chatbots. Runs inside a GPU TEE for full, provable privacy.
Qwen25 VL 72b$1.19$1.19qwen25-vl-72b-instructQwen25 VL 72b model with 32k context window
Qwen3 32B$0.34$0.34Qwen3-32BQwen3 32B
Qwen: QvQ Max$2.38$9.01qvq-maxQvQ Max is the top model of the Qwen series. It is capable of thinking and reasoning, and can achieve significantly enhanced performance, especially on hard problems.
Qwen: QwQ 32B$0.34$0.34qwq-32bQwQ is the reasoning model of the Qwen series. It is capable of thinking and reasoning, and can achieve significantly enhanced performance, especially on hard problems.
Qwerky 72B$0.85$0.85featherless-ai/Qwerky-72BLinear models offer a promising approach to significantly reduce computational costs at scale, particularly for large context lengths. They enable a >1000x improvement in inference costs, which in turn enables o1-style inference-time thinking and wider AI accessibility.
ReMM SLERP 13B$1.36$2.04undi95/remm-slerp-l2-13bA recreation trial of the original MythoMax-L2-B13 but merged with updated models.
Rocinante 12b$0.68$1.02TheDrummer/Rocinante-12B-v1.1Designed for engaging storytelling and rich prose. Expanded vocabulary with unique and expressive word choices, enhanced creativity and captivating stories.
Sao10K Stheno 8b$0.85$0.85Sao10K/L3-8B-Stheno-v3.2Sao10K's latest Stheno fine-tune optimized for instruction following.
Sarvan Medium$0.42$1.28sarvan-mediumSarvam AI has launched Sarvam-M, a 24-billion-parameter hybrid language model boasting strong performance in math, programming, and Indian languages.
Shisa V2 Llama 3.3 70B$0.85$0.85shisa-ai/shisa-v2-llama3.3-70bShisa V2 is a family of bilingual Japanese/English language models ranging from 7B to 70B parameters, optimized for high-quality Japanese language capabilities while maintaining strong English performance.
SorcererLM 8x22B$7.65$7.65raifle/sorcererlm-8x22bAdvanced roleplaying model with reasoning and emotional intelligence for engaging interactions, contextual awareness and enhanced narrative depth
Steelskull Electra R1 70b$1.19$1.19Steelskull/L3.3-Electra-R1-70bSteelskull Electra R1 70b
Steelskull Nevoria 70b$0.85$0.85Steelskull/L3.3-MS-Nevoria-70bSteelskull Nevoria 70b
Steelskull Nevoria R1 70b$0.85$0.85Steelskull/L3.3-Nevoria-R1-70bSteelskull Nevoria R1 70b
Step R1 V Mini$4.25$18.70step-r1-v-miniStep-R1-V-Mini, which supports image and text input, text output, has good instruction following and general capabilities, can perceive images with high precision and complete complex reasoning tasks.
Step-2 16k Exp$11.90$34.00step-2-16k-expStep-2 16k Exp is a 16k context window model
Step-2 Mini$0.34$0.68step-2-miniMiniMax's flagship model with a 1M token context window
The Drummer Cydonia 24B$0.17$0.20TheDrummer/Cydonia-24B-v2Cydonia 24B v2 is a finetune of Mistral's latest 'Small' model (2501). Aliases: Cydonia 24B, Cydonia v2, Cydonia on that broken base.
The Omega Abomination V1$0.70$0.95ReadyArt/The-Omega-Abomination-L-70B-v1.0The merger of the Omega Directive M 24b v1.1 and Cydonia 24b v2.
The Omega Abomination V5$0.70$0.95ReadyArt/The-Omega-Abomination-L-70B-v5.0The merger of the Omega Directive M 24b v1.1 and Cydonia 24b v2.
TheDrummer Skyfall 36B V2$0.85$0.85thedrummer/skyfall-36b-v2TheDrummer's Skyfall 36B V2, a 36B parameter model with a focus on high quality and consistency.
TheDrummer: Valkyrie 49B V1$1.02$1.36parasail-valkyrie-49b-v1Built on top of NVIDIA's Llama 3 Nemotron Super 49B.
UnslopNemo 12b v4$0.85$0.85TheDrummer/UnslopNemo-12B-v4.1UnslopNemo v4 is the previous version from the creator of Rocinante, designed for adventure writing and role-play scenarios.
Veiled Calla 12B$0.51$0.51soob3123/Veiled-Calla-12BVeiled Calla 12B is a 12B parameter model that is a more advanced version of Calla 12B.
WizardLM-2 8x22B$0.85$0.85microsoft/wizardlm-2-8x22bMicrosoft's advanced Wizard model. The most popular role-playing model.
Yi Large$5.44$5.44yi-largeLarge version of Yi Lightning with a 32k context window, but more expensive.
Yi Lightning$0.34$0.34yi-lightningChinese-developed multilingual (English, Chinese and others) model by 01.ai that's very fast and cheap, yet scores high on independent leaderboards.
Yi Medium 200k$4.25$4.25yi-medium-200kMedium version of Yi with a 200k context window.
Yi Medium 200k$0.02$0.02yi-34b-chat-200kMedium version of Yi Lightning with a huge 200k context window
Yi Spark$0.02$0.02yi-34b-chat-0205Small and powerful, lightweight and fast model. Provides enhanced mathematical operation and code writing capabilities.
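Several Qwen 3 entries in the list above mention the /think and /no_think switches. The following is a minimal Python sketch of how a client might append the switch to a prompt before sending it. The helper name `with_thinking_switch` and the `QWEN3_MODEL_IDS` set are hypothetical and only for illustration; the model IDs are copied from the list above, and the sketch assumes that placing the switch at the end of the prompt is acceptable, since the descriptions say it can appear anywhere in a prompt or system message.

```python
# Hypothetical helper, not part of the NanoGPT API: it only edits the prompt text.
# Model IDs are copied from the text model list; extend the set as needed.
QWEN3_MODEL_IDS = {
    "qwen/qwen3-14b",
    "qwen/qwen3-235b-a22b",
    "qwen/qwen3-30b-a3b",
    "qwen/qwen3-32b",
    "Qwen/Qwen3-8B",
}


def with_thinking_switch(prompt: str, model: str, think: bool) -> str:
    """Append /think or /no_think to the prompt for Qwen 3 models.

    Prompts for any other model are returned unchanged, because the
    switch is only described for the Qwen 3 entries.
    """
    if model not in QWEN3_MODEL_IDS:
        return prompt
    switch = "/think" if think else "/no_think"
    return f"{prompt.rstrip()} {switch}"


if __name__ == "__main__":
    print(with_thinking_switch("Summarize this article.", "qwen/qwen3-14b", think=False))
```

The returned string would then be used as the prompt in a request to the text endpoint; the request format itself is covered by the documentation at docs.nano-gpt.com and is not assumed here.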
Image models
POST https://nano-gpt.com/api/create-image
BAGELbagelBAGEL is a high-quality text-to-image model with excellent prompt adherence and creative capabilities. Supports both text-to-image and image-to-image generation. Supports thought tokens for enhanced generation quality.
DALL-E-3dall-e-3OpenAI's most well-known image model.
DALL-E-3 HDdall-e-3-hdOpenAI's most well-known image model, now in HD quality.
Dreamshaper XLdreamshaper_8_93211.safetensorsDreamshaper generates realistic and anime/illustration-style images, and is best suited to sci-fi and fantasy scenes.
Flux Devflux-dev-image-to-imageFlux Dev: Next-gen image-to-image model for advanced creative edits and improvements.
Flux Lightningflux-lightningJuggernaut Lightning by FAL, for the fastest text-to-image generation with high-quality results.
Flux Loraflux-loraFLUX.1 [dev] with LoRA support, fast and high-quality image generation with the option to use LoRAs for specific styles.
Flux Pro V1flux-proOlder version of Flux V1.1. Exceptional quality and prompt adherence.
Flux Pro V1.1flux-pro/v1.1Excellent image quality, prompt adherence, and output diversity.
Flux Pro V1.1 Ultraflux-pro/v1.1-ultra4K version of Flux Pro V1.1. Excellent image quality, prompt adherence, and output diversity.
Flux Realismflux-realismIncredibly photorealistic image generation. Generate people, animals, landscapes that are hard to distinguish from reality.
Flux Schnellflux/schnellFast and high-quality image generation - the cheaper version of the Flux range of models.
GPT-4o Imagegpt-4o-image-vipOpenAI's GPT-4o image generation model. Currently in preview mode. Supports both text to image and image-to-image generation.
Gemini Image Edit gemini-flash-editEdit an existing image using Gemini based on a text prompt. Google-based, so quite censored.
GhiblifyghiblifyTransforms an input image into a Ghibli-inspired style.
HidreamhidreamHidream I1 Full, the latest and greatest image generation model from Hidream. Supports specific parameters like shift.
Hidream Edithidream-editEdit an existing image using Hidream I1 based on a text prompt.
Ideogram V2ideogram-ai/ideogram-v2An excellent image model with state of the art inpainting, prompt comprehension and especially text rendering.
Ideogram V2 Turboideogram-ai/ideogram-v2-turboA fast image model with state of the art inpainting, prompt comprehension and especially text rendering.
Ideogram V3ideogram-v3-defaultIdeogram V3 (Default) provides a good balance between speed and quality for text-to-image generation.
Ideogram V3 Qualityideogram-v3-qualityIdeogram V3 (Quality) offers the highest fidelity images and poster-grade text rendering capabilities.
Ideogram V3 Turboideogram-v3-turboIdeogram V3 (Turbo) generates images with lightning-fast speed and high text rendering accuracy.
Image Model Recommenderauto-image-selectionCategorizes your prompt and recommends the best image model for your task. Does not immediately generate an image itself.
Imagen V3imagen-3.0-generate-002Google's highest quality text-to-image model with fine detail, rich lighting, and excellent text rendering capabilities.
Imagen V4 Previewimagen-4.0-generate-preview-05-20Google's highest quality image generation model. Excels at fine details, diverse art styles, and understanding prompts. Works best when limited to 500 words input.
Imagen V4 Ultraimagen-4.0-ultra-generate-exp-05-20Ultra version of Google's highest quality image generation model. Excels at fine details, diverse art styles, and understanding prompts. Works best when limited to 500 words input.
MidjourneymidjourneyMidjourney creates stunningly detailed and imaginative images from simple text prompts. Note: generates 4 images at once, price shown is for 4 images.
Movie GeneratorlongstoriesUses LongStories AI to generate high-quality content from text prompts. Generates engaging short stories, similar to Youtube Shorts, TikTok clips etc, on any subject you want. Offers many customization options. Note: generation can take from 30 seconds to a few minutes.
Movie Generator for Kidslongstories-kidsGenerates engaging short stories for kids on any subject you want. Offers many customization options. Note: generation can take from 30 seconds to a few minutes.
Playground V2.5playground-v25Playground V2.5 outperforms SDXL in many user tests. Suitable for a broad range of images.
PromptchanpromptchanHigh-quality image generation with lots of customization options.
Proteusproteus-v0.2A versatile image generation model with high-quality outputs.
ReV AnimatedrevAnimated_v122.safetensorsReV Animated specializes in fantasy, anime and semi-realistic landscapes.
Recraft V3recraft-v3Recraft V3 is a state-of-the-art image generation model that is known for its high quality and prompt adherence.
SD 3.5 Largestable-diffusion-v35-largeStable Diffusion's newest model. Generates a wide variety of images reflecting different styles without complex prompting.
SD 3.5 Large Turbostable-diffusion-v35-large/turboTurbo version of Stable Diffusion's newest model. Faster and cheaper performance while still maintaining great prompt adherence and quality.
SDXL ArliMix V1SDXL-ArliMix-v1Image generation using SDXL ArliMix V1 via Arli AI. Cartoon-like generation
Seedream 3.0general_v3.0Supports native 2K resolution output, offers faster response speeds, generates more accurate small text, improves text layout effects, enhances aesthetics and structural quality, and demonstrates excellent fidelity and detail performance. It has achieved leading rankings in multiple evaluations.
Stable Diffusion 3 Mediumsd3_base_medium.safetensorsExcels at photorealism, typography, and prompt following. Works best in 1024x1024.
Stable Diffusion XLfast-sdxlCheap and powerful text-to-image model that generates pictures rapidly.