Context Memory: Unlimited AI Conversations

Aug 4, 2025

Large Language Models have a fundamental limitation: the context window. When conversations get long, models become slow, lose track, or error out entirely.

Context Memory solves this problem. It keeps conversations and coding sessions super snappy and allows them to continue indefinitely while maintaining full awareness of the entire conversation history.

The Problem

Current memory solutions like ChatGPT's memory store general facts but miss something critical: the ability to recall specific events at the right level of detail.

Without this, you get:

  • Important details forgotten or lost during summarization
  • Conversations cut short when context limits are reached
  • AI agents that lose track of their previous work

How Context Memory Works

Context Memory creates a hierarchical structure of your conversation:

  • High-level summaries for overall context
  • Mid-level details for important relationships
  • Specific details when relevant to recent messages

Here's an example from a coding session:

Token estimation function refactoring
|-- Initial user request
|-- Refactoring to support integer inputs
|-- Error: "exceeds the character limit"
|   +-- Fixed by changing test params from strings to integers
+-- Variable name refactoring

When you ask "What errors did we encounter?", Context Memory expands the relevant section while keeping other parts collapsed. The model you're using (GPT-4o, Claude, etc.) gets exactly the detail it needs without information overload.
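
For instance, that query might surface the error branch in full detail while the rest of the session stays summarized (an illustrative sketch, not actual output):

Token estimation function refactoring (summarized)
|-- Error: "exceeds the character limit"
|   +-- Fixed by changing test params from strings to integers
+-- (other steps kept collapsed)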

Benefits

For Developers:

  • Long coding sessions without losing context
  • Speed! Compresses long histories so your model responds quickly
  • AI agents that learn from past mistakes
  • Documentation that maintains context across entire codebases

For Agentic Use Cases:

  • Long-running agents keep everything in memory, from the very first step to the current one
  • Use any model for agents with effectively infinite memory — Context Memory stores all history and passes only the relevant bits so the model always stays aware
  • Reliable planning and backtracking with preserved goals, constraints, decisions, and outcomes
  • Tool use and multi-step workflows stay coherent across hours, days or weeks, including retries and branches
  • Resume after interruptions with full state awareness, without hitting context window limits

For Roleplay & Storytelling:

  • Build far bigger worlds with persistent lore, timelines, and locations that never get forgotten
  • Characters remember identities, relationships, and evolving backstories across long arcs
  • Branching plots stay coherent—past choices, clues, and foreshadowing remain available
  • Resume sessions after days or weeks with full awareness of what happened at the very start
  • Epic-length narratives without context limits—only the relevant pieces are passed to the model

For Conversations:

  • Extended discussions without forgetting details
  • Research projects that build knowledge over time
  • Complex problem-solving with full history awareness

Important: Enable Memory Early

Turn on Context Memory as early as possible in your conversation! Context Memory progressively indexes your conversation with each request, building up a comprehensive understanding. This means:

  • Starting memory early captures your entire conversation history
  • You can go over 1 million tokens without hitting limits
  • The system compresses intelligently, enabling it to return the most relevant information
  • Later messages benefit from the full context built up over time

The earlier you enable it, the more complete your memory will be.

Using Context Memory

Add :memory to any model name:

curl -X POST https://nano-gpt.com/api/v1/chat/completions \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini:memory",
    "messages": [
      { "role": "user", "content": "Hi" },
      { "role": "assistant", "content": "Hi there! How are you?" },
      { "role": "user", "content": "Please help me set up...." }
    ]
  }'

Or use the memory header:

curl -X POST https://nano-gpt.com/api/v1/chat/completions \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -H "memory: true" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      { "role": "user", "content": "Hi" },
      { "role": "assistant", "content": "Hi there! How are you?" },
      { "role": "user", "content": "Please help me set up...." }
    ]
  }'

Retention (memory_expiration_days)

By default, Context Memory retains your compressed chat state for 30 days. Retention is rolling and based on the conversation’s last update: each new message resets the timer, and the thread expires N days after its last activity. You can configure retention from 1 to 365 days:

  • Via model suffix: append :memory-<days>

curl -X POST https://nano-gpt.com/api/v1/chat/completions \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini:memory-90",
    "messages": [
      { "role": "user", "content": "Persist this conversation for 90 days" }
    ]
  }'

  • Via headers: set memory: true and memory_expiration_days: <1..365>

curl -X POST https://nano-gpt.com/api/v1/chat/completions \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -H "memory: true" \
  -H "memory_expiration_days: 45" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      { "role": "user", "content": "Retain for 45 days via header" }
    ]
  }'

Note: If both suffix and header are provided, the header value for memory_expiration_days takes precedence for retention. Combined features still follow suffix parsing (e.g., :online).
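
To illustrate that precedence rule (with hypothetical values): the suffix below requests 90 days, but the header wins, so the thread is retained for 45 days:

curl -X POST https://nano-gpt.com/api/v1/chat/completions \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -H "memory_expiration_days: 45" \
  -d '{
    "model": "gpt-4o-mini:memory-90",
    "messages": [
      { "role": "user", "content": "Suffix says 90 days, but the header wins at 45" }
    ]
  }'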

Combine with web search:

# Memory + Web Search
curl -X POST https://nano-gpt.com/api/v1/chat/completions \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o:online:memory",
    "messages": [
      { "role": "user", "content": "Hi" },
      { "role": "assistant", "content": "Hi there! How are you?" },
      { "role": "user", "content": "Please search for and help me set up...." }
    ]
  }'

How It Works Technically

Context Memory uses your conversation messages as the identifier. This means:

  • Natural conversation branching (see the sketch after this list)
  • Easy reverting to earlier states
  • No complex indexing required
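
Here's a minimal sketch of what message-based identity enables, reusing the hypothetical conversation from the earlier examples. Because the history itself is the key, you can branch simply by resending a shared prefix with a different continuation:

# Branch A: continue the thread one way
curl -X POST https://nano-gpt.com/api/v1/chat/completions \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini:memory",
    "messages": [
      { "role": "user", "content": "Hi" },
      { "role": "assistant", "content": "Hi there! How are you?" },
      { "role": "user", "content": "Let's try a recursive approach" }
    ]
  }'

# Branch B: rewind to the same prefix and take a different path;
# the shared prefix maps back to the same stored context
curl -X POST https://nano-gpt.com/api/v1/chat/completions \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini:memory",
    "messages": [
      { "role": "user", "content": "Hi" },
      { "role": "assistant", "content": "Hi there! How are you?" },
      { "role": "user", "content": "Let's try an iterative approach instead" }
    ]
  }'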

The system compresses long histories without losing information. Everything is preserved at the appropriate level of detail.

What happens behind the scenes:

  1. You send your full conversation history to our API
  2. Context Memory compresses this into a compact representation
  3. Only the compressed version is sent to the AI model (OpenAI, Anthropic, etc.)
  4. The model receives all the context it needs without hitting token limits

This means you can have conversations with millions of tokens of history, but the AI model only sees the intelligently compressed version that fits within its context window.

Provider: Polychat

Privacy with Polychat: When using Context Memory, your conversation data is processed by Polychat's API, which uses Google Gemini in the background with maximum privacy settings.

You can review Polychat's full privacy policy at https://polychat.co/legal/privacy.

Important privacy details:

  • Context Memory over the API does not send data to Google Analytics or use cookies
  • Only your conversation messages are sent to Polychat for compression
  • No email, IP address, or other metadata is shared; only the prompts are

Pricing

  • Uncached input: $5.00 per million tokens
  • Cached input: $2.50 per million tokens
  • Output generation: $10.00 per million tokens
  • Retention: 30 days by default; configurable 1–365 days via :memory-<days> or the memory_expiration_days header
  • Typical usage: 8k–20k tokens per session
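
As a rough worked example, assuming a session that sends 20k uncached input tokens and generates 2k output tokens: 20,000 × ($5.00 / 1M) + 2,000 × ($10.00 / 1M) = $0.10 + $0.02, or about $0.12 for the session.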

Getting Started

Context Memory is available now:

  • Add :memory to any model name
  • Or use the memory: true header
  • Combine with other features like :online

The Usage page shows your Memory usage, token counts, and costs. For step-by-step examples, see the code snippets above.

Whether you're coding, researching, or having long conversations, Context Memory ensures your AI remembers everything that matters.