NanoGPT

Explore Audio Models

Discover AI audio generation models for speech and music

ElevenLabs Music v1

Music

Eleven Music is a studio-grade text-to-music model. Generate music in any style from natural-language prompts, for uses such as game soundtracks, podcast backgrounds, and marketing reels. Control genre, style, and structure, and choose between vocal and instrumental output. Outputs MP3 clips of 10 to 300 seconds with selectable sample rate and bitrate.

≈ $0.100 per audio

Inworld TTS 1.5 Mini

Text to Speech

Inworld's ultra-fast, cost-efficient TTS model, optimized for low latency, with enhanced alignment data.

≈ $0.010 per audio

google

Gemini 2.5 Flash Preview TTS

Text to Speech

Google's native Gemini TTS. Supports single-speaker and multi-speaker output, selected via the prompt.

≈ $0.051 per audio

xAI TTS

Text to Speech

Affordable Runware-hosted xAI text-to-speech with five voices, inline expressive controls, and multilingual auto-detection.

≈ $0.005 per audio

Inworld TTS 1.5 Max

Text to Speech

Inworld's flagship TTS model, with the best balance of quality and speed, plus enhanced alignment data.

≈ $0.020 per audio

Kokoro 82M

Text to Speech

High-quality multilingual text-to-speech model.

≈ $0.002 per audio

ACE-Step 1.5

Music

ACE-Step 1.5 composes complete songs from text descriptions. Guide genre, mood, and structure with style tags and optional custom lyrics. Generates up to 4 minutes of multi-track audio with vocals.

≈ $0.050 per audio

fal

ElevenLabs Turbo V2.5

Text to Speech

High quality with lowest latency, ideal for real-time applications. Supports 32 languages while maintaining natural voice quality.

≈ $0.102 per audio

fal

ElevenLabs v3

Text to Speech

High-quality text-to-speech with enhanced controls and natural voices.

≈ $0.170 per audio

ACE-Step v1.5 Turbo

Music

ACE-Step v1.5 Turbo is the faster, lower-cost Runware variant for full-song generation, with broad genre coverage, improved stylistic consistency, and text-guided music creation for creator workflows.

≈ $0.006 per audio

ACE-Step v1.5 Base

Music

ACE-Step v1.5 Base is a Runware-hosted music model built for creator workflows, with stronger fidelity, more reliable stylistic consistency, and prompt-driven genre control for full-song generation.

≈ $0.009 per audio

Wavespeed ACE-Step

Music

ACE-Step composes complete songs from text descriptions using Wavespeed’s music foundation model. Guide genre, mood, and structure with style tags and optional custom lyrics. Generates up to 4 minutes of multi-track audio with vocals.

≈ $0.050 per audio

Wavespeed HeartMuLa

Music

HeartMuLa is a state-of-the-art music generation model that creates high-quality songs from structured lyrics and optional style tags.

≈ $0.100 per audio

MiniMax Music 02

Music

MiniMax Music 02 is a compact MoE music generator (230B params, 10B active) tuned for speedy, cost-effective song creation. Provide a creative prompt plus formatted lyrics to render polished, full-length tracks with configurable bitrate and sample rate.

≈ $0.050 per audio

MiniMax Music 2.5

Music

MiniMax Music 2.5 generates complete songs from prompts and structured lyrics with high-fidelity audio and expressive vocals. Choose bitrate and sample rate to control output quality.

≈ $0.150 per audio

Google Lyria 3 Pro Music

Music

Google Lyria 3 Pro generates premium music clips from a text prompt, with optional image guidance, negative prompts, and seed-based repeatability.

≈ $0.080 per audio

MiniMax Speech 02 HD

Text to Speech

High-definition text-to-speech with natural pronunciation and multiple voices.

≈ $0.100 per audio

MiniMax Speech 2.6 HD

Text to Speech

Fast, natural-sounding TTS with under 250 ms latency, voice cloning, multilingual support across 40+ languages, and text normalization for expressive speech.

≈ $0.170 per audio

MiniMax Speech 2.6 Turbo

Text to Speech

High-definition text-to-speech with natural pronunciation and crisp articulation. Supports multiple built-in voices and custom cloned voices, adjustable speed, volume, and pitch, and 40+ languages for professional audio creation.

≈ $0.102 per audio

MiniMax Speech 2.8 HD

Text to Speech

Studio-quality HD text-to-speech with expressive delivery, emotion control, pronunciation customization, and fine-grained audio controls for production-ready speech.

≈ $0.170 per audio

MiniMax Speech 2.8 Turbo

Text to Speech

Fast, cost-effective MiniMax 2.8 text-to-speech with expressive voices, emotion control, pronunciation customization, and full audio output controls.

≈ $0.102 per audio

VibeVoice

Text to Speech

Long-form text-to-speech with multi-speaker dialogue support and 9 voice presets across English, Chinese, and Hindi.

≈ $0.150 per audio

google

Gemini 2.5 Pro Preview TTS

Text to Speech

Higher-quality Gemini TTS with controllable style and tone.

≈ $0.102 per audio

fal

Qwen 3 TTS 1.7B

Text to Speech

Converts text to speech using Qwen3-TTS, with pre-trained voices or cloned voice embeddings.

≈ $0.120 per audio

openai

OpenAI TTS

Text to Speech

Standard-quality text-to-speech model with low latency.

≈ $0.025 per audio

openai

OpenAI TTS HD

Text to Speech

High-definition text-to-speech model with higher quality than the standard OpenAI TTS model.

≈ $0.051 per audio

openai

GPT-4o Mini TTS

Text to Speech

Ultra-low-cost text-to-speech model with support for voice instructions.

≈ $0.013 per audio

openai

GPT-4o Mini TTS (2025-03-20)

Text to Speech

Original release snapshot of GPT-4o Mini TTS, with support for voice instructions.

≈ $0.013 per audio

openai

GPT-4o Mini TTS (2025-12-15)

Text to Speech

Latest snapshot of GPT-4o Mini TTS, with support for voice instructions.

≈ $0.013 per audio

You've seen all 29 models