Title: NanoGPT API LLM Guide Audience: coding assistants helping devs integrate NanoGPT Last updated: 2025-03-14 GOAL Equip developers to call the NanoGPT API for chat, streaming, memory, search, embeddings, and audio with OpenAI-compatible clients in minutes. STYLE FOR ANSWERS - Prefer TypeScript (Node 18+) with openai@^5.8.0 and baseURL set to https://nano-gpt.com/api. - Share Python or curl snippets when requested or when automation beyond Node is the better fit. - Keep examples runnable, scoped to one task, and load secrets from environment variables. - Reference GET /api/v1/models for model availability instead of guessing ids or capabilities. - Explain reasoning fields, streaming frames, and billing metadata whenever they affect integration. - If unsure, acknowledge it and point the developer to LINKS. REQUIRED PACKAGES AND VERSIONS - npm install openai@^5.8.0 - yarn add openai@^5.8.0 (same SDK for Yarn projects) - pnpm add openai@^5.8.0 (monorepo-friendly install) - pip install --upgrade openai - Node 18+ or Deno/Bun with fetch and Web Streams support CORE CONCEPTS - API base https://nano-gpt.com/api hosts OpenAI-style chat/completions/embeddings plus /v1/audio/speech; Anthropic /v1/messages lives at https://nano-gpt.com/v1/messages. - Authentication uses Authorization: Bearer or x-api-key headers; generate keys in the NanoGPT dashboard. - Compatibility: standard OpenAI schema (messages, tools, response_format, seed, reasoning, parallel_tool_calls) is preserved for sync and streaming calls. - Model suffixes: append :online or :online/linkup-deep for Linkup search, :memory or :memory- for Polychat memory, and combine suffixes as needed. - Reasoning and pricing: request reasoning via the reasoning object, read delta.reasoning in streams, and handle sanitized x_nanogpt_pricing frames. - Prompt caching and memory: prompt_caching/cache_control toggles 5m or 1h cache TTLs, and memory compression replaces prior turns before dispatch. GLOSSARY - Linkup search: NanoGPT web search triggered by :online suffix; prepends fresh results and bills $0.006 (standard) or $0.06 (deep). - Polychat memory: hierarchical compression invoked by :memory suffix or memory header; stores 1-365 day summaries and replaces conversation history. - Prompt caching: reusable prompt segments activated by prompt_caching/cache_control or anthropic-beta headers with optional cutAfterMessageIndex. - Reasoning stream: delta.reasoning text emitted by reasoning-enabled models; reasoning_details provides structured metadata when available. - Pricing frame: sanitized billing payload returned as x_nanogpt_pricing { amount, currency, usage } during streaming or in the final JSON. - BYOK: bring-your-own-key mode activated with x-use-byok: true or byok.enabled: true to forward requests using stored provider credentials. HARD RULES AND GOTCHAS - Always set baseURL to https://nano-gpt.com/api (or https://nano-gpt.com for /v1/messages) and include Authorization or x-api-key on every call. - Do not hardcode model support; call GET /api/v1/models?detailed=true to confirm vision, context length, and subscription status. - Handle streaming as SSE: group chunks by id, concatenate delta.content, capture delta.reasoning separately, stop on [DONE], and watch for x_nanogpt_pricing frames. - Never rely on tags; reasoning text lives in reasoning/reasoning_details and may be omitted when reasoning.exclude is true. - Memory replaces the entire history with a compressed summary; send the up-to-date messages each turn and avoid mixing original transcripts with memory output. - BYOK requests skip fallback providers; warn developers that misconfigured downstream keys return provider errors directly without NanoGPT failover. QUICKSTART MINIMAL Set up the OpenAI SDK once and call the chat completions endpoint with familiar parameters. ```ts // quickstart.ts import OpenAI from 'openai'; const client = new OpenAI({ apiKey: process.env.NANOGPT_API_KEY!, baseURL: 'https://nano-gpt.com/api', }); async function run() { const completion = await client.chat.completions.create({ model: 'gpt-4o-mini', messages: [ { role: 'system', content: 'You are the NanoGPT onboarding guide.' }, { role: 'user', content: 'Summarize the API authentication steps.' }, ], reasoning: { enabled: true, effort: 'low' }, }); console.log(completion.choices[0]?.message.content); } run().catch((err) => { console.error('NanoGPT error:', err.response?.data ?? err); }); ``` STREAMING WITH REASONING Use stream: true to read delta.content and delta.reasoning while capturing sanitized pricing frames. ```ts import OpenAI from 'openai'; const client = new OpenAI({ apiKey: process.env.NANOGPT_API_KEY!, baseURL: 'https://nano-gpt.com/api', }); async function streamSummary() { const stream = await client.chat.completions.create({ model: 'gpt-4o', stream: true, messages: [ { role: 'system', content: 'Respond concisely and expose your reasoning.' }, { role: 'user', content: 'Describe the NanoGPT streaming contract.' }, ], reasoning: { enabled: true, effort: 'medium' }, }); let output = ''; let reasoning = ''; for await (const chunk of stream) { const delta = chunk.choices?.[0]?.delta; if (delta?.reasoning) reasoning += delta.reasoning; if (delta?.content) output += delta.content; const pricing = (chunk as any).x_nanogpt_pricing; if (pricing) console.info('pricing update', pricing); } console.log({ reasoning, output }); } streamSummary().catch(console.error); ``` TOOLS AND STRUCTURED OUTPUT NanoGPT forwards OpenAI tool definitions and structured_outputs to the selected provider. ```ts const result = await client.chat.completions.create({ model: 'gpt-4o', messages: [ { role: 'system', content: 'You call tools to fetch weather data.' }, { role: 'user', content: 'What is the weather in Austin right now?' }, ], tools: [ { type: 'function', function: { name: 'getWeather', description: 'Return current weather conditions.', parameters: { type: 'object', properties: { location: { type: 'string', description: 'City, Country' }, }, required: ['location'], }, }, }, ], tool_choice: 'auto', parallel_tool_calls: false, }); const call = result.choices[0]?.message.tool_calls?.[0]; if (call) { console.log('tool name', call.function.name); console.log('args', call.function.arguments); } ``` MEMORY AND SEARCH Combine Polychat memory with Linkup search by opting into the suffix and optional headers. ```ts await client.chat.completions.create( { model: 'gpt-4o:online:memory-30', messages: [ { role: 'system', content: 'Surface recent release notes before answering.' }, { role: 'user', content: 'Summarize the last five NanoGPT releases for me.' }, ], }, { headers: { memory: 'true', memory_expiration_days: '30', }, }, ); ``` VISION INPUT Send content arrays mixing text and image_url parts to vision-capable models. ```ts const visionReply = await client.chat.completions.create({ model: 'gpt-4o', messages: [ { role: 'user', content: [ { type: 'text', text: 'Describe this dashboard.' }, { type: 'image_url', image_url: { url: 'https://example.com/metrics.png', detail: 'high' } }, ], }, ], }); console.log(visionReply.choices[0]?.message.content); ``` PROMPT CACHING Enable prompt reuse with prompt_caching or cache_control, and set Anthropic beta headers when required. ```ts await client.chat.completions.create( { model: 'gpt-4o', messages: [ { role: 'system', content: 'Always respond with JSON.' }, { role: 'user', content: 'Cache this conversation scaffold for 5 minutes.' }, ], prompt_caching: { enabled: true, ttl: '5m', cut_after_message_index: 0, }, }, { headers: { 'anthropic-beta': 'prompt-caching-2024-07-31', }, }, ); ``` EMBEDDINGS Use the same SDK to create embeddings with NanoGPT-provided models. ```ts const embeddings = await client.embeddings.create({ model: 'text-embedding-3-large', input: [ 'NanoGPT provides unified access to many providers.', 'Check pricing before going to production.', ], }); console.log(embeddings.data.map((row) => row.embedding.length)); ``` AUDIO SPEECH POST /v1/audio/speech mirrors OpenAI’s text-to-speech API. ```ts import { writeFile } from 'node:fs/promises'; const speech = await client.audio.speech.create({ model: 'gpt-4o-mini-tts', voice: 'alloy', input: 'Welcome to the NanoGPT API.', response_format: 'mp3', }); await writeFile('welcome.mp3', Buffer.from(await speech.arrayBuffer())); ``` BYOK (BRING YOUR OWN KEY) Store provider keys once, then enable per request to route through your credentials. ```bash curl -X POST https://nano-gpt.com/api/user/provider-keys \ -H "Authorization: Bearer $NANOGPT_API_KEY" \ -H "Content-Type: application/json" \ -d '{"provider":"openai","key":"sk-live-..."}' ``` ```ts await client.chat.completions.create( { model: 'gpt-4o', messages: [ { role: 'user', content: 'Draft a status update.' }, ], byok: { enabled: true, provider: 'openai', disableFallbacks: true }, }, { headers: { 'x-use-byok': 'true' }, }, ); ``` ANTHROPIC-COMPATIBLE /v1/messages Point the Anthropic SDK at https://nano-gpt.com to reuse this adapter (streaming is unavailable for GPU-TEE models). ```ts import Anthropic from '@anthropic-ai/sdk'; const anthropic = new Anthropic({ apiKey: process.env.NANOGPT_API_KEY!, baseURL: 'https://nano-gpt.com', }); const response = await anthropic.messages.create({ model: 'claude-3-5-sonnet-20241022', max_tokens: 1024, messages: [{ role: 'user', content: 'Summarize the NanoGPT pricing metadata contract.' }], thinking: { type: 'enabled', budget_tokens: 2048 }, }); console.log(response.content); ``` PRICING AND BILLING - Expect usage.prompt_tokens/completion_tokens on JSON responses and per-chunk pricing metadata via x_nanogpt_pricing. - Pricing frames include sanitized errors when provider billing fails; surface the message without exposing vendor codes. - Conversation-level transactions are billed in USD with optional auto-conversion to Nano when applicable. - Use 429 handling with exponential backoff; subscription-free buckets allow roughly 60 RPM, while premium models may enforce tighter limits. ERROR HANDLING AND OBSERVABILITY - Error responses follow OpenAI envelopes: { error: { message, type, code, param } } with x-request-id headers for support. - Authentication failures increment an abuse counter; repeated failures yield 429 with rate_limit_exceeded. - Streaming parsers must guard against unexpected JSON-only frames (pricing) and handle [DONE] closure explicitly. - Log request ids alongside model and payment_source for easier reconciliation with the NanoGPT dashboard. LINKS - https://nano-gpt.com/api - docs/reasoning.md - docs/v1/models.md - memory.md - BYOK.md - docs/streaming-id-contract.md