Explore Audio Models
Discover AI audio generation models for speech and music
Gemini 2.5 Flash Preview TTS
Google's native Gemini TTS. Supports single-speaker and multi-speaker output, controlled via the prompt.
GPT-4o Mini TTS
Ultra-low-cost text-to-speech model with support for voice instructions
Kokoro 82M
High-quality multilingual text-to-speech model
ElevenLabs Turbo V2.5
High quality with lowest latency, ideal for real-time applications. Supports 32 languages while maintaining natural voice quality.
OpenAI TTS
Standard quality text-to-speech model with low latency
ElevenLabs Music v1
Eleven Music is a studio-grade text-to-music model. Generate music in any style from natural-language prompts, for uses such as game soundtracks, podcast backgrounds, and marketing reels. Control genre, style, and structure, and choose vocal or instrumental output. Produces MP3 tracks of 10 to 300 seconds with selectable sample rate and bitrate.
Wavespeed ACE-Step
ACE-Step composes complete songs from text descriptions using Wavespeed’s music foundation model. Guide genre, mood, and structure with style tags and optional custom lyrics. Generates up to 4 minutes of multi-track audio with vocals.
MiniMax Music 02
MiniMax Music 02 is a compact MoE music generator (230B params, 10B active) tuned for speedy, cost-effective song creation. Provide a creative prompt plus optional formatted lyrics to render polished, full-length tracks with configurable bitrate and sample rate.
MiniMax Speech 02 HD
High-definition text-to-speech with natural pronunciation and multiple voices.
MiniMax Speech 2.6 HD
Ultra-fast, ultra-human, ultra-smart TTS with under 250 ms latency, natural voice cloning, multilingual support across 40+ languages, and industry-leading text normalization for expressive speech.
MiniMax Speech 2.6 Turbo
High-definition text-to-speech with natural pronunciation and crisp articulation. Supports multiple built-in voices and custom cloned voices, adjustable speed, volume, and pitch, and 40+ languages for professional audio creation.
Gemini 2.5 Pro Preview TTS
Higher-quality Gemini TTS with controllable style and tone.
ElevenLabs v3
High-quality text-to-speech with enhanced controls and natural voices.
OpenAI TTS HD
High-definition text-to-speech model with superior quality
GPT-4o Mini TTS (2025-03-20)
Original release snapshot of GPT-4o Mini TTS with voice instructions support
GPT-4o Mini TTS (2025-12-15)
Latest snapshot of GPT-4o Mini TTS with voice instructions support
