Discover AI audio generation models for speech and music
ElevenLabs Music v1
Eleven Music is a studio-grade text-to-music model. Generate music in any style from natural-language prompts, suited to game soundtracks, podcast backgrounds, and marketing reels. Control genre, style, and structure, with optional vocals or an instrumental-only track. Outputs 10–300 s MP3 files with a selectable sample rate and bitrate.
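To make the duration and output-quality constraints concrete, here is a minimal sketch of a request builder that enforces the 10–300 s range and encodes sample rate and bitrate into one format string. The field names (`prompt`, `music_length_ms`, `output_format`, `instrumental`), the `mp3_<sample_rate>_<bitrate>` string, and the function `build_music_request` are all hypothetical, not taken from this listing; check the provider's API reference before relying on any of them.

```python
# Hypothetical request builder for a text-to-music call.
# ASSUMPTION: every field name and the "mp3_<sample_rate>_<bitrate>" format
# string are illustrative only; they are not confirmed by the model listing.

MIN_SECONDS = 10   # lower duration bound stated in the listing
MAX_SECONDS = 300  # upper duration bound stated in the listing


def build_music_request(prompt: str, seconds: int, sample_rate: int = 44100,
                        bitrate_kbps: int = 128, instrumental: bool = False) -> dict:
    """Validate the 10-300 s duration range and assemble a request payload."""
    if not MIN_SECONDS <= seconds <= MAX_SECONDS:
        raise ValueError(f"duration must be {MIN_SECONDS}-{MAX_SECONDS} s, got {seconds}")
    return {
        "prompt": prompt,
        "music_length_ms": seconds * 1000,  # duration expressed in milliseconds
        "output_format": f"mp3_{sample_rate}_{bitrate_kbps}",  # sample rate + bitrate
        "instrumental": instrumental,  # True requests a track without vocals
    }


if __name__ == "__main__":
    req = build_music_request("upbeat chiptune for a platformer level", 90, instrumental=True)
    print(req["output_format"], req["music_length_ms"])
```

Validating locally before sending a request avoids spending a paid generation call on a duration the model would reject anyway.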
GPT-4o Mini TTS
Ultra-low-cost text-to-speech model with support for voice instructions.
MiniMax Speech 2.8 HD
Studio-quality HD text-to-speech with expressive delivery, emotion control, pronunciation customization, and fine-grained audio controls for production-ready speech.
Qwen 3 TTS 1.7B
Convert text to speech with Qwen3-TTS using pre-trained voices or cloned voice embeddings.
OpenAI TTS HD
High-definition text-to-speech model optimized for audio quality.
Gemini 2.5 Flash Preview TTS
Google's native Gemini text-to-speech. Supports single-speaker and multi-speaker output, controlled through the prompt.
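Because speaker assignment is driven by the prompt rather than by separate parameters, a multi-speaker script ends up as one block of text. The sketch below assembles such a prompt from `(speaker, line)` pairs. The function `format_dialogue_prompt` and the "Speaker: line" transcript layout are illustrative assumptions; the listing only states that multi-speaker output is requested via the prompt, so confirm the exact convention in Google's documentation.

```python
# Hypothetical helper that formats a multi-speaker script as a single prompt.
# ASSUMPTION: the "Speaker: line" layout and the header wording are an
# illustrative convention, not a documented requirement.


def format_dialogue_prompt(turns: list[tuple[str, str]], style: str = "") -> str:
    """Join (speaker, line) pairs into one prompt, optionally with a style note."""
    speakers: list[str] = []
    for speaker, _ in turns:
        if speaker not in speakers:  # keep speakers in first-appearance order
            speakers.append(speaker)
    header = f"TTS the following conversation between {' and '.join(speakers)}"
    if style:
        header += f" ({style})"  # free-text delivery hint, e.g. "calm, warm"
    body = "\n".join(f"{speaker}: {line}" for speaker, line in turns)
    return f"{header}:\n{body}"


if __name__ == "__main__":
    print(format_dialogue_prompt([
        ("Joe", "How's it going today?"),
        ("Jane", "Not too bad, how about you?"),
    ], style="relaxed"))
```

Keeping each speaker name identical across turns matters, since the model has only the text to tell the voices apart.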
MiniMax Music 2.5
MiniMax Music 2.5 generates complete songs from prompts and structured lyrics with high-fidelity audio and expressive vocals. Choose bitrate and sample rate to control output quality.
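"Structured lyrics" generally means lyrics split into labeled song sections. The sketch below renders sections as bracketed tags followed by their lines. The function `build_structured_lyrics` and the `[Verse]`/`[Chorus]` tag convention are illustrative assumptions, not something this listing specifies; confirm the tag set each music model accepts.

```python
# Hypothetical builder for structured song lyrics.
# ASSUMPTION: bracketed section tags such as [Verse] and [Chorus] are an
# illustrative convention; the accepted tag set depends on the model.


def build_structured_lyrics(sections: list[tuple[str, list[str]]]) -> str:
    """Render (section_name, lines) pairs as tagged blocks separated by blank lines.

    An empty line list yields a bare tag, e.g. for an instrumental intro.
    """
    blocks = []
    for name, lines in sections:
        blocks.append("\n".join([f"[{name}]", *lines]))
    return "\n\n".join(blocks)


if __name__ == "__main__":
    print(build_structured_lyrics([
        ("Intro", []),
        ("Verse", ["City lights are fading slow", "Footsteps echo down below"]),
        ("Chorus", ["Hold on, hold on", "We're almost home"]),
    ]))
```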
VibeVoice
Long-form text-to-speech with multi-speaker dialogue support and 9 voice presets across English, Chinese, and Hindi.
Gemini 2.5 Pro Preview TTS
Higher-quality Gemini TTS with controllable style and tone.
MiniMax Speech 02 HD
High-definition text-to-speech with natural pronunciation and multiple voices.
Kokoro 82M
High-quality multilingual text-to-speech model
ElevenLabs v3
High-quality text-to-speech with enhanced controls and natural voices.
MiniMax Speech 2.6 Turbo
High-definition text-to-speech with natural pronunciation and crisp articulation. Supports built-in and custom cloned voices, adjustable speed, volume, and pitch, and more than 40 languages for professional audio production.
ElevenLabs Turbo V2.5
High-quality output with the lowest latency, ideal for real-time applications. Supports 32 languages while maintaining natural voice quality.
OpenAI TTS
Standard-quality text-to-speech model with low latency.
GPT-4o Mini TTS (2025-12-15)
Latest snapshot of GPT-4o Mini TTS with support for voice instructions.
Wavespeed ACE-Step
ACE-Step composes complete songs from text descriptions using the ACE-Step music foundation model, served through Wavespeed. Guide genre, mood, and structure with style tags and optional custom lyrics. Generates up to 4 minutes of multi-track audio with vocals.
ACE-Step 1.5
ACE-Step 1.5 composes complete songs from text descriptions. Guide genre, mood, and structure with style tags and optional custom lyrics. Generates up to 4 minutes of multi-track audio with vocals.
Wavespeed HeartMuLa
HeartMuLa is a state-of-the-art music generation model that creates high-quality songs from structured lyrics and optional style tags.
MiniMax Music 02
MiniMax Music 02 is a compact mixture-of-experts (MoE) music generator (230B parameters, 10B active) tuned for fast, cost-effective song creation. Provide a creative prompt plus optional formatted lyrics to render polished, full-length tracks with configurable bitrate and sample rate.
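The parameter counts explain the "compact" label. Assuming the usual MoE convention that "active" means the parameters routed to for each token, only a small share of the weights does work on any given step:

$$\frac{N_\text{active}}{N_\text{total}} = \frac{10\,\text{B}}{230\,\text{B}} \approx 0.043 \approx 4.3\%$$

Per-token compute therefore scales roughly with the 10B active parameters, closer to a 10B dense model, while all 230B parameters still have to be held in memory for serving.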
MiniMax Speech 2.6 HD
Fast, human-sounding TTS with under 250 ms latency, natural voice cloning, multilingual support across 40+ languages, and advanced text normalization for accurate, expressive speech.
MiniMax Speech 2.8 Turbo
Fast, cost-effective MiniMax 2.8 text-to-speech with expressive voices, emotion control, pronunciation customization, and full audio output controls.
Inworld TTS 1.5 Max
Inworld's flagship TTS model, offering the best balance of quality and speed, with enhanced alignment data.
Inworld TTS 1.5 Mini
Inworld's ultra-fast, cost-efficient TTS model, optimized for low latency, with enhanced alignment data.
GPT-4o Mini TTS (2025-03-20)
Original release snapshot of GPT-4o Mini TTS with support for voice instructions.
25 models listed.