Private AI
Browse and discover the best AI audio models for text to speech, speech to text, and music.
Gemini 3.1 Flash TTS Preview
Google Gemini 3.1 Flash text-to-speech with inline audio tag and multi-speaker prompt support.
≈ $0.102 per audio
ElevenLabs v3
High-quality text-to-speech with enhanced controls and natural voices.
≈ $0.170 per audio
Inworld TTS 1.5 Max
Inworld flagship TTS model with the best balance of quality and speed, plus enhanced alignment data.
≈ $0.020 per audio
ElevenLabs Turbo V2.5
High-quality speech with the lowest latency, ideal for real-time applications. Supports 32 languages while maintaining natural voice quality.
≈ $0.102 per audio
GPT-4o Mini TTS (2025-12-15)
Latest snapshot of GPT-4o Mini TTS, with support for voice instructions.
≈ $0.013 per audio
MiniMax Speech 2.8 HD
Studio-quality HD text-to-speech with expressive delivery, emotion control, pronunciation customization, and fine-grained audio controls for production-ready speech.
≈ $0.170 per audio
Gemini 2.5 Flash Preview TTS
Google Gemini native TTS. Single and multi-speaker support via prompt.
≈ $0.051 per audio
GPT-4o Mini TTS
Ultra-low-cost text-to-speech model with support for voice instructions.
≈ $0.013 per audio
OpenAI TTS
Standard-quality text-to-speech model with low latency.
≈ $0.025 per audio
Kokoro 82M
High-quality multilingual text-to-speech model.
≈ $0.002 per audio
Qwen 3 TTS 1.7B
Converts text to speech using Qwen3-TTS, with pre-trained voices or cloned voice embeddings.
≈ $0.120 per audio
Google Lyria 3 Pro Music
Google Lyria 3 Pro generates premium music clips from a text prompt, with optional image guidance, negative prompts, and seed-based repeatability.
≈ $0.080 per audio
MiniMax Music Cover
MiniMax Music Cover transforms a reference song into a new style while preserving the core melody. Provide a style prompt plus an MP3 URL for the source track.
≈ $0.150 per audio
Mureka v9 Generate Song
Generate songs with Mureka v9 through WaveSpeed. Priced per generated song.
≈ $0.030 per audio
Whisper Large V3
OpenAI's state-of-the-art speech recognition model.
< $0.001 per minute
Wavespeed ACE-Step
ACE-Step composes complete songs from text descriptions using Wavespeed’s music foundation model. Guide genre, mood, and structure with style tags and optional custom lyrics. Generates up to 4 minutes of multi-track audio with vocals.
≈ $0.050 per audio
xAI TTS
Affordable Runware-hosted xAI text-to-speech with five voices, inline expressive controls, and multilingual auto-detection.
≈ $0.017 per audio
Stable Audio 3 Medium
Stable Audio 3 Medium generates high-quality stereo music up to 6 minutes from text prompts, trained on fully licensed data for commercial use.
≈ $0.041 per audio
MAI-Transcribe 1.5
Microsoft fast transcription model with automatic language detection, punctuation, and 100+ BCP-47 locales.
≈ $0.010 per minute
GPT-4o Mini Transcribe
OpenAI's efficient speech-to-text model with improved accuracy over Whisper.
≈ $0.005 per minute
MiniMax Speech 2.6 HD
Ultra-fast, ultra-human, ultra-smart TTS with under 250 ms latency, natural voice cloning, multilingual support across 40+ languages, and industry-leading text normalization for expressive speech.
≈ $0.170 per audio
OpenAI TTS HD
High-definition text-to-speech model with superior quality.
≈ $0.051 per audio
GPT-4o Mini Transcribe (2025-12-15)
Latest snapshot of GPT-4o Mini Transcribe with improved accuracy.
≈ $0.005 per minute
ACE-Step v1.5 Base
ACE-Step v1.5 Base is a Runware-hosted music model built for creator workflows, with stronger fidelity, more reliable stylistic consistency, and prompt-driven genre control for full-song generation.
≈ $0.009 per audio
Gemini 2.5 Pro Preview TTS
Higher-quality Gemini TTS with controllable style and tone.
≈ $0.102 per audio
MAI-Voice-2
Microsoft high-fidelity expressive text-to-speech with multilingual MAI-Voice-2 prebuilt voices.
≈ $0.037 per audio
Whisper-1
OpenAI's original Whisper model with full output format support, including SRT and VTT subtitles.
≈ $0.010 per minute
ElevenLabs Music v1
Eleven Music is a studio-grade text-to-music model. Generate music in any style from natural-language prompts, for uses such as game soundtracks, podcast backgrounds, and marketing reels. Control genre, style, and structure, with vocals or instrumental only. Outputs MP3 clips of 10–300 seconds with selectable sample rate and bitrate.
≈ $0.100 per audio
ACE-Step v1.5 Turbo
ACE-Step v1.5 Turbo is the faster, lower-cost Runware variant for full-song generation, with broad genre coverage, improved stylistic consistency, and text-guided music creation for creator workflows.
≈ $0.006 per audio
ACE-Step 1.5
ACE-Step 1.5 composes complete songs from text descriptions. Guide genre, mood, and structure with style tags and required custom lyrics. Generates up to 4 minutes of multi-track audio with vocals.
≈ $0.050 per audio