Browser-Local AI on NanoGPT

May 19, 2026

NanoGPT now supports a browser-local model that runs directly on your device instead of on NanoGPT servers.

The current local model is Qwen2.5 0.5B Instruct (Local). If your browser exposes WebGPU, you can install it once and use it for a set of privacy-sensitive helper tasks inside NanoGPT.

The browser-local model is now available by default on supported devices, with install and setup flows wired into NanoGPT.

What this means

Once installed, the local model can be used for:

Text-only local chat in the browser
Conversation title generation
Quick reply suggestions
Global Memory analysis
Auto model routing classification

Some of these features run fully locally. Others use the local model only for a helper step.

What runs locally and what does not

There are two broad categories here:

1. Fully local tasks

These can run on-device without sending the relevant text to NanoGPT for that step:

Local chat
Conversation titles
Quick replies
Global Memory analysis

For example, if you use the local model for Memory analysis, the chat text being analyzed for memory suggestions stays on your device for that analysis step.

2. Local decision-making for server features

The local model can also help choose what NanoGPT should do next, without being the final model that answers:

Auto model routing classification

In that setup, the browser-local model classifies the request locally, then NanoGPT routes the actual chat request to the selected normal model. So the classification can happen on-device, but the final selected chat model still runs through NanoGPT as usual.
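As a rough illustration of that split, the sketch below separates the on-device classification step from the server routing step. The category labels, model IDs, and function names are hypothetical and are not NanoGPT's actual routing table or API. The only point is that a small model's raw output is normalized locally and then mapped to a server model.

```typescript
// Hypothetical categories and server model IDs, for illustration only.
type RouteCategory = "simple" | "coding" | "reasoning";

const ROUTE_TABLE: Record<RouteCategory, string> = {
  simple: "server-small-model",
  coding: "server-coding-model",
  reasoning: "server-reasoning-model",
};

// Any function that runs the on-device model and returns its raw label text.
type LocalClassifier = (prompt: string) => Promise<string>;

// Small models often add whitespace or extra words, so normalize the output
// and fall back to a safe default when the label is not recognized.
function parseCategory(raw: string, fallback: RouteCategory = "simple"): RouteCategory {
  const cleaned = raw.trim().toLowerCase();
  const known = Object.keys(ROUTE_TABLE) as RouteCategory[];
  return known.find((c) => cleaned.startsWith(c)) ?? fallback;
}

// Classification happens locally; only the chosen model ID and the actual
// chat request would then go to the server.
async function routeRequest(
  prompt: string,
  classify: LocalClassifier,
): Promise<{ category: RouteCategory; model: string }> {
  const category = parseCategory(await classify(prompt));
  return { category, model: ROUTE_TABLE[category] };
}
```

The fallback matters in this design: a 0.5B classifier will sometimes return something unexpected, and routing to a default is safer than failing the request.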

The current local model

Right now the local model is:

Qwen2.5 0.5B Instruct (Local)

This is a deliberately small model. It is limited in capability, but we use it for simple tasks where keeping the work on-device and keeping latency low matter more than raw intelligence.

Current characteristics:

Runs with WebGPU
Downloads roughly 400 to 700 MB
Is stored in the browser cache
Can be evicted by the browser later
Has no NanoGPT billing for the local model itself

That last point only applies to the local model work. If you use local Auto model classification and it routes you into a paid server model, the server model still has its normal pricing.
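The WebGPU requirement listed above can be checked from a page in a few lines. This is a minimal sketch, not NanoGPT's actual detection code. The `GpuLike` and `NavigatorLike` interfaces and both function names are assumptions introduced here so the example compiles without WebGPU type definitions. `requestAdapter()` is the standard WebGPU entry point and resolves to `null` when no suitable adapter exists.

```typescript
// Minimal structural types so the sketch compiles without WebGPU typings.
interface GpuLike {
  requestAdapter(): Promise<unknown | null>;
}
interface NavigatorLike {
  gpu?: GpuLike;
}

// Cheap synchronous check: does the browser expose the WebGPU entry point?
function exposesWebGpu(nav: NavigatorLike): boolean {
  return typeof nav.gpu?.requestAdapter === "function";
}

// Stronger check: the API can be present while no usable adapter is available
// (for example on unsupported or blocklisted hardware).
async function hasUsableWebGpu(nav: NavigatorLike): Promise<boolean> {
  if (!exposesWebGpu(nav)) return false;
  try {
    const adapter = await nav.gpu!.requestAdapter();
    return adapter !== null;
  } catch {
    return false;
  }
}
```

In a browser this would be called as `hasUsableWebGpu(navigator as NavigatorLike)` before offering the install button.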

How to install it

The local model only downloads after you explicitly install it.

After installation, NanoGPT shows a setup flow where you can choose which local features should use it:

Global Memory
Conversation titles
Model routing for Auto model
Quick replies

This matters because different users want different tradeoffs. Some mainly want private Memory analysis. Others want local titles and quick replies. Others want local Auto model routing but still want normal server chat.
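One way to picture those per-feature choices is as a small preferences object consulted before each helper task. The feature keys and the `resolveBackend` helper below are hypothetical and are not NanoGPT's real settings schema. The sketch assumes a feature runs locally only when the model is installed and the user has opted that feature in.

```typescript
// Hypothetical keys mirroring the four options in the setup flow.
type LocalFeature =
  | "globalMemory"
  | "conversationTitles"
  | "autoModelRouting"
  | "quickReplies";

type LocalFeaturePrefs = Record<LocalFeature, boolean>;

// A feature runs on-device only if the model is installed AND the user
// enabled it for that feature; otherwise it uses the normal server path.
function resolveBackend(
  feature: LocalFeature,
  prefs: LocalFeaturePrefs,
  modelInstalled: boolean,
): "local" | "server" {
  return modelInstalled && prefs[feature] ? "local" : "server";
}

// Example: a user who only wants private Memory analysis.
const memoryOnlyPrefs: LocalFeaturePrefs = {
  globalMemory: true,
  conversationTitles: false,
  autoModelRouting: false,
  quickReplies: false,
};
```

Checking the install state at call time, rather than once at setup, also covers the case where the browser has evicted the cached model since installation.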

Where to use it

You can reach the local model from a few places:

The model picker
Settings for title generation
Settings for quick replies
Global Memory settings
Auto model selector settings

Once installed, the model appears as a local option on each of those surfaces where it applies.

Why local titles and quick replies matter

These are small features, but they are a good fit for local inference.

Conversation titles and quick replies are exactly the kind of short helper tasks where a small on-device model is often "good enough", while also being cheaper and more private than shipping those tasks to a remote model.

That makes the local model practical even if you never intend to use it as your primary chat model.

Why local Memory analysis matters

This is arguably the most interesting part of the rollout.

NanoGPT's Global Memory system can now use the browser-local model to analyze chats for memory-worthy facts on-device. In other words, the analysis step that proposes what should be remembered can stay local to your browser.

That is useful for users who like memory features but are especially sensitive about sending personal conversations to a server-side analyzer.

This does not mean the entire NanoGPT product becomes local-only. It means one important helper step in the memory workflow can now happen on-device.

Limits and tradeoffs

The local model is useful, but it has real constraints:

It requires a browser and device with WebGPU
The model download is still fairly large
Browser cache is not permanent, so the install may disappear later
A 0.5B model is much weaker than top hosted models for difficult reasoning or coding
Local chat currently supports only a narrow text-only v1 scope

So this feature is not a replacement for Claude, Gemini, GPT, or large open-weight server models. It is a new local layer that works best for helper tasks, lightweight chats, and privacy-sensitive client-side analysis.
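The download-size and cache-eviction constraints above can be probed with the standard `navigator.storage` API (`estimate()`, `persisted()`, `persist()`). The following is a sketch under stated assumptions: the 700 MB figure is simply the upper end of the download range quoted earlier, the interfaces and function names are made up for this example, and whether NanoGPT itself requests persistent storage is not stated in this post.

```typescript
// Structural stand-ins for the browser's StorageManager types.
interface StorageEstimateLike {
  quota?: number;
  usage?: number;
}
interface StorageManagerLike {
  estimate(): Promise<StorageEstimateLike>;
  persisted(): Promise<boolean>;
  persist(): Promise<boolean>;
}

const MB = 1024 * 1024;
// Assumption: budget for the upper end of the "400 to 700 MB" range.
const ASSUMED_DOWNLOAD_BYTES = 700 * MB;

// True only when the browser reports enough free quota for the download.
function hasRoomFor(
  estimate: StorageEstimateLike,
  requiredBytes: number = ASSUMED_DOWNLOAD_BYTES,
): boolean {
  if (estimate.quota === undefined || estimate.usage === undefined) return false;
  return estimate.quota - estimate.usage >= requiredBytes;
}

// Check free space, then ask the browser to mark this origin's storage as
// persistent. The browser may refuse, so eviction can still happen.
async function prepareStorage(
  storage: StorageManagerLike,
): Promise<{ roomy: boolean; persistent: boolean }> {
  const roomy = hasRoomFor(await storage.estimate());
  const persistent = (await storage.persisted()) || (await storage.persist());
  return { roomy, persistent };
}
```

In a browser, `prepareStorage(navigator.storage)` would be the call site. Even with persistence granted, an app should still be ready to detect a missing model and offer a reinstall.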

Why we shipped it this way

There are two bad ways to ship local AI in a product:

Ship a local feature that is too weak to be useful.
Pretend a small local model can replace the entire hosted product.

We are trying to avoid both.

The current rollout focuses on tasks where a small local model is actually a sensible fit:

short generations
classification
summarization-like helper tasks
privacy-sensitive analysis steps

Bottom line

NanoGPT now has a real browser-local AI path for supported devices.

If you want everything on the server, you can ignore it.

If you want more privacy for helper tasks like titles, quick replies, memory analysis, or Auto model classification, you can install the local model and keep those steps on-device.

And if you want to experiment with fully local text chat inside NanoGPT, that is now available too, with the current v1 limitations.

Milan de Reede

CEO & Co-Founder

milan@nano-gpt.com
