Context Memory: Unlimited AI Conversations

Aug 4, 2025

Large Language Models have a fundamental limitation: the context window. When conversations get long, models become slow, lose track, or error out entirely.

Context Memory solves this problem. It keeps conversations and coding sessions super snappy and allows them to continue indefinitely while maintaining full awareness of the entire conversation history.

The Problem

Current memory solutions like ChatGPT's memory store general facts but miss something critical: the ability to recall specific events at the right level of detail.

Without this, you get:

  • Important details forgotten or lost during summarization
  • Conversations cut short when context limits are reached
  • AI agents that lose track of their previous work

How Context Memory Works

Context Memory creates a hierarchical structure of your conversation:

  • High-level summaries for overall context
  • Mid-level details for important relationships
  • Specific details when relevant to recent messages

Here's an example from a coding session:

Token estimation function refactoring
|-- Initial user request
|-- Refactoring to support integer inputs
|-- Error: "exceeds the character limit"
|   +-- Fixed by changing test params from strings to integers
+-- Variable name refactoring

When you ask "What errors did we encounter?", Context Memory expands the relevant section while keeping other parts collapsed. The model you're using (GPT-4o, Claude, etc.) gets exactly the detail it needs without information overload.
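
For instance, that query might surface the error branch in full detail while the rest of the session stays summarized (an illustrative sketch, not actual output):

Token estimation function refactoring (summarized)
|-- Error: "exceeds the character limit"
|   +-- Fixed by changing test params from strings to integers
+-- (other steps kept collapsed)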

Benefits

For Developers:

  • Long coding sessions without losing context
  • Speed! Compresses long histories so your model responds quickly
  • AI agents that learn from past mistakes
  • Documentation that maintains context across entire codebases

For Agentic Use Cases:

  • Long-running agents keep everything in memory, from the very first step to the current one
  • Use any model for agents with effectively infinite memory — Context Memory stores all history and passes only the relevant bits so the model always stays aware
  • Reliable planning and backtracking with preserved goals, constraints, decisions, and outcomes
  • Tool use and multi-step workflows stay coherent across hours, days or weeks, including retries and branches
  • Resume after interruptions with full state awareness, without hitting context window limits

For Roleplay & Storytelling:

  • Build far bigger worlds with persistent lore, timelines, and locations that never get forgotten
  • Characters remember identities, relationships, and evolving backstories across long arcs
  • Branching plots stay coherent—past choices, clues, and foreshadowing remain available
  • Resume sessions after days or weeks with full awareness of what happened at the very start
  • Epic-length narratives without context limits—only the relevant pieces are passed to the model

For Conversations:

  • Extended discussions without forgetting details
  • Research projects that build knowledge over time
  • Complex problem-solving with full history awareness

Important: Enable Memory Early

Turn on Context Memory as early as possible in your conversation! Context Memory progressively indexes your conversation with each request, building up a comprehensive understanding. This means:

  • Starting memory early captures your entire conversation history
  • You can go over 1 million tokens without hitting limits
  • The system compresses intelligently, enabling it to return the most relevant information
  • Later messages benefit from the full context built up over time

The earlier you enable it, the more complete your memory will be.

Using Context Memory

Add :memory to any model name:

curl -X POST https://nano-gpt.com/api/v1/chat/completions \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini:memory",
    "messages": [
      { "role": "user", "content": "Hi" },
      { "role": "assistant", "content": "Hi there! How are you?" },
      { "role": "user", "content": "Please help me set up...." }
    ]
  }'

Or use the memory header:

curl -X POST https://nano-gpt.com/api/v1/chat/completions \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -H "memory: true" \
  -d '{
    "model": "gpt-4o",
    "messages": [
      { "role": "user", "content": "Hi" },
      { "role": "assistant", "content": "Hi there! How are you?" },
      { "role": "user", "content": "Please help me set up...." }
    ]
  }'

Retention (memory_expiration_days)

By default, Context Memory retains your compressed chat state for 30 days. Retention is rolling and based on the conversation’s last update: each new message resets the timer, and the thread expires N days after its last activity. You can configure retention from 1 to 365 days:

  • Via model suffix: append :memory-<days>

curl -X POST https://nano-gpt.com/api/v1/chat/completions \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini:memory-90",
    "messages": [
      { "role": "user", "content": "Persist this conversation for 90 days" }
    ]
  }'

  • Via headers: set memory: true and memory_expiration_days: <1..365>

curl -X POST https://nano-gpt.com/api/v1/chat/completions \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -H "memory: true" \
  -H "memory_expiration_days: 45" \
  -d '{
    "model": "gpt-4o-mini",
    "messages": [
      { "role": "user", "content": "Retain for 45 days via header" }
    ]
  }'

Note: If both suffix and header are provided, the header value for memory_expiration_days takes precedence for retention. Combined features still follow suffix parsing (e.g., :online).
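
To illustrate that precedence rule (with hypothetical values): the suffix below requests 90 days, but the header wins, so the thread is retained for 45 days:

curl -X POST https://nano-gpt.com/api/v1/chat/completions \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -H "memory_expiration_days: 45" \
  -d '{
    "model": "gpt-4o-mini:memory-90",
    "messages": [
      { "role": "user", "content": "Suffix says 90 days, but the header wins at 45" }
    ]
  }'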

Combine with web search:

# Memory + Web Search
curl -X POST https://nano-gpt.com/api/v1/chat/completions \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o:online:memory",
    "messages": [
      { "role": "user", "content": "Hi" },
      { "role": "assistant", "content": "Hi there! How are you?" },
      { "role": "user", "content": "Please search for and help me set up...." }
    ]
  }'

How It Works Technically

Context Memory uses your conversation messages as the identifier. This means:

  • Natural conversation branching (see the sketch after this list)
  • Easy reverting to earlier states
  • No complex indexing required
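
Here's a minimal sketch of what message-based identity enables, reusing the hypothetical conversation from the earlier examples. Because the history itself is the key, you can branch simply by resending a shared prefix with a different continuation:

# Branch A: continue the thread one way
curl -X POST https://nano-gpt.com/api/v1/chat/completions \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini:memory",
    "messages": [
      { "role": "user", "content": "Hi" },
      { "role": "assistant", "content": "Hi there! How are you?" },
      { "role": "user", "content": "Let's try a recursive approach" }
    ]
  }'

# Branch B: rewind to the same prefix and take a different path;
# the shared prefix maps back to the same stored context
curl -X POST https://nano-gpt.com/api/v1/chat/completions \
  -H "Authorization: Bearer $NANOGPT_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "gpt-4o-mini:memory",
    "messages": [
      { "role": "user", "content": "Hi" },
      { "role": "assistant", "content": "Hi there! How are you?" },
      { "role": "user", "content": "Let's try an iterative approach instead" }
    ]
  }'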

The system compresses long histories without losing information. Everything is preserved at the appropriate level of detail.

What happens behind the scenes:

  1. You send your full conversation history to our API
  2. Context Memory compresses this into a compact representation
  3. Only the compressed version is sent to the AI model (OpenAI, Anthropic, etc.)
  4. The model receives all the context it needs without hitting token limits

This means you can have conversations with millions of tokens of history, but the AI model only sees the intelligently compressed version that fits within its context window.

Provider: Polychat

Privacy with Polychat: When using Context Memory, your conversation data is processed by Polychat's API, which uses Google Gemini in the background with maximum privacy settings.

You can review Polychat's full privacy policy at https://polychat.co/legal/privacy.

Important privacy details:

  • Context Memory over the API does not send data to Google Analytics or use cookies
  • Only your conversation messages are sent to Polychat for compression
  • No email, IP address, or other metadata is shared; only the prompts are

Pricing

  • Uncached input: $5.00 per million tokens
  • Cached input: $2.50 per million tokens
  • Output generation: $10.00 per million tokens
  • Retention: 30 days by default; configurable 1–365 days via :memory-<days> or the memory_expiration_days header
  • Typical usage: 8k–20k tokens per session
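
As a rough worked example, assuming a session that sends 20k uncached input tokens and generates 2k output tokens: 20,000 × ($5.00 / 1M) + 2,000 × ($10.00 / 1M) = $0.10 + $0.02, or about $0.12 for the session.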

Getting Started

Context Memory is available now:

  • Add :memory to any model name
  • Or use the memory: true header
  • Combine with other features like :online

The Usage page shows your Memory usage, token counts, and costs. For step-by-step examples, see the code snippets above.

Whether you're coding, researching, or having long conversations, Context Memory ensures your AI remembers everything that matters.