Discover AI audio generation models for speech and music
ElevenLabs Turbo V2.5
High quality with lowest latency, ideal for real-time applications. Supports 32 languages while maintaining natural voice quality.
≈ $0.102 per audio
GPT-4o Mini TTS
Ultra-low-cost text-to-speech model that supports voice instructions
≈ $0.013 per audio
ElevenLabs Music v1
Eleven Music is a studio‑grade text‑to‑music model. Generate music with natural‑language prompts in any style — perfect for game soundtracks, podcast backgrounds, and marketing reels. Control genre, style, and structure, with optional vocals or instrumental. Supports 10–300s MP3 with selectable sample rate and bitrate.
≈ $0.100 per audio
ElevenLabs v3
High-quality text-to-speech with enhanced controls and natural voices.
≈ $0.170 per audio
MiniMax Music 2.5
MiniMax Music 2.5 generates complete songs from prompts and structured lyrics with high-fidelity audio and expressive vocals. Choose bitrate and sample rate to control output quality.
≈ $0.150 per audio
OpenAI TTS
Standard quality text-to-speech model with low latency
≈ $0.025 per audio
Qwen 3 TTS 1.7B
Convert text to speech with Qwen3-TTS, using pre-trained voices or cloned voice embeddings.
≈ $0.120 per audio
Kokoro 82M
High-quality multilingual text-to-speech model
≈ $0.002 per audio
Wavespeed HeartMuLa
HeartMuLa is a state-of-the-art music generation model that creates high-quality songs from structured lyrics and optional style tags.
≈ $0.100 per audio
ACE-Step 1.5
ACE-Step 1.5 composes complete songs from text descriptions. Guide genre, mood, and structure with style tags and optional custom lyrics. Generates up to 4 minutes of multi-track audio with vocals.
≈ $0.050 per audio
OpenAI TTS HD
High definition text-to-speech model with superior quality
≈ $0.051 per audio
Gemini 2.5 Flash Preview TTS
Google Gemini native TTS. Single and multi-speaker support via prompt.
≈ $0.051 per audio
Inworld TTS 1.5 Mini
Inworld ultra-fast, cost-efficient TTS model optimized for low latency with enhanced alignment data.
≈ $0.010 per audio
Wavespeed ACE-Step
ACE-Step composes complete songs from text descriptions using Wavespeed’s music foundation model. Guide genre, mood, and structure with style tags and optional custom lyrics. Generates up to 4 minutes of multi-track audio with vocals.
≈ $0.050 per audio
MiniMax Music 02
MiniMax Music 02 is a compact MoE music generator (230B params, 10B active) tuned for speedy, cost-effective song creation. Provide a creative prompt plus optional formatted lyrics to render polished, full-length tracks with configurable bitrate and sample rate.
≈ $0.050 per audio
MiniMax Speech 02 HD
High-definition text-to-speech with natural pronunciation and multiple voices.
≈ $0.100 per audio
MiniMax Speech 2.6 HD
Ultra-fast text-to-speech with under 250 ms latency, natural voice cloning, multilingual support across 40+ languages, and text normalization for expressive output.
≈ $0.170 per audio
MiniMax Speech 2.6 Turbo
High-definition Text-to-Speech with natural pronunciation and crisp articulation. Supports multiple built-in voices and custom cloned voices, adjustable speed, volume, and pitch, and coverage of 40+ languages for professional audio creation.
≈ $0.102 per audio
MiniMax Speech 2.8 HD
Studio-quality HD text-to-speech with expressive delivery, emotion control, pronunciation customization, and fine-grained audio controls for production-ready speech.
≈ $0.170 per audio
MiniMax Speech 2.8 Turbo
Fast, cost-effective MiniMax 2.8 text-to-speech with expressive voices, emotion control, pronunciation customization, and full audio output controls.
≈ $0.102 per audio
VibeVoice
Long-form text-to-speech with multi-speaker dialogue support and 9 voice presets across English, Chinese, and Hindi.
≈ $0.150 per audio
Inworld TTS 1.5 Max
Inworld flagship TTS model with the best balance of quality and speed, plus enhanced alignment data.
≈ $0.020 per audio
Gemini 2.5 Pro Preview TTS
Higher-quality Gemini TTS with controllable style and tone.
≈ $0.102 per audio
GPT-4o Mini TTS (2025-03-20)
Original release snapshot of GPT-4o Mini TTS with voice instructions support
≈ $0.013 per audio
GPT-4o Mini TTS (2025-12-15)
Latest snapshot of GPT-4o Mini TTS with voice instructions support
≈ $0.013 per audio
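To compare the listed prices for a given workload, a rough estimate can be sketched as below. This is only an illustration: the listing does not define what "per audio" covers, so the sketch assumes one flat price per generated clip, and actual billing may depend on text or audio length. The helper names `estimate_cost` and `cheapest` are hypothetical, and only a few of the listed models are included.

```python
# Approximate prices (USD) copied from the listing above.
# Assumption: "per audio" means one flat price per generated clip.
PRICE_PER_AUDIO = {
    "Kokoro 82M": 0.002,
    "Inworld TTS 1.5 Mini": 0.010,
    "GPT-4o Mini TTS": 0.013,
    "OpenAI TTS": 0.025,
    "ElevenLabs Turbo V2.5": 0.102,
    "ElevenLabs v3": 0.170,
}


def estimate_cost(model: str, generations: int) -> float:
    """Hypothetical helper: listed price times the number of generations."""
    if generations < 0:
        raise ValueError("generations must be non-negative")
    # Round to three decimals to match the precision of the listed prices.
    return round(PRICE_PER_AUDIO[model] * generations, 3)


def cheapest(models: list[str]) -> str:
    """Hypothetical helper: return the candidate with the lowest listed price."""
    return min(models, key=lambda name: PRICE_PER_AUDIO[name])


if __name__ == "__main__":
    for name in PRICE_PER_AUDIO:
        print(f"{name}: ~${estimate_cost(name, 1000):.2f} per 1,000 generations")
```

Because the listed figures are approximate (≈), the results are best read as order-of-magnitude comparisons rather than exact quotes.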