Private AI
Browse and discover the best AI audio models for text-to-speech, speech-to-text, music, and sound effects.
Mirelo SFX1.6 Text to Audio
Mirelo SFX1.6 creates sound effects and ambient audio directly from text prompts, with optional seamless ambience looping.
≈ $0.100 per audio
Inworld TTS 1.5 Mini
Inworld's ultra-fast, cost-efficient TTS model, optimized for low latency, with enhanced alignment data.
≈ $0.010 per audio
Grok Speech-to-Text
xAI speech-to-text on fal, with diarization, word-level timestamps, and multichannel audio support.
≈ $0.003 per minute
Inworld Realtime TTS 2
Inworld Realtime TTS 2.0 for expressive, low-latency text-to-speech with natural-language steering and multilingual support.
≈ $0.060 per audio
Google Lyria 3 Pro Music
Google Lyria 3 Pro generates premium music clips from a text prompt, with optional image guidance, negative prompts, and seed-based repeatability.
≈ $0.080 per audio
GPT-4o Mini TTS (2025-12-15)
Latest snapshot of GPT-4o Mini TTS, with support for voice instructions.
≈ $0.013 per audio
GPT-4o Mini TTS
Ultra-low-cost text-to-speech model with support for voice instructions.
≈ $0.013 per audio
Gemini 2.5 Flash Preview TTS
Google Gemini's native TTS, with single- and multi-speaker support via the prompt.
≈ $0.051 per audio
Gemini 3.1 Flash TTS Preview
Google Gemini 3.1 Flash text-to-speech, with support for inline audio tags and multi-speaker prompts.
≈ $0.102 per audio
Omnivoice
Massively multilingual zero-shot text-to-speech with auto voice mode and optional natural-language voice descriptions.
≈ $0.050 per audio
Mureka O2 Generate Song
Generate songs with Mureka O2. Priced per generated song.
≈ $0.150 per audio
VibeVoice
Long-form text-to-speech with multi-speaker dialogue support and 9 voice presets across English, Chinese, and Hindi.
≈ $0.150 per audio
Stable Audio 3 Medium
Stable Audio 3 Medium generates high-quality stereo music up to 6 minutes long from text prompts, trained on fully licensed data for commercial use.
≈ $0.041 per audio
ElevenLabs v3
High-quality text-to-speech with enhanced controls and natural voices.
≈ $0.170 per audio
Stable Audio 3 Small SFX
Stable Audio 3 Small SFX generates high-quality sound effects from text prompts, with controllable clip length, output format, negative prompt, and seed.
≈ $0.023 per audio
Whisper Large V3
OpenAI's state-of-the-art speech-recognition model.
≈ <$0.001 per minute
Inworld TTS 1.5 Max
Inworld's flagship TTS model, with the best balance of quality and speed, plus enhanced alignment data.
≈ $0.020 per audio
ElevenLabs Turbo V2.5
High quality with the lowest latency, ideal for real-time applications. Supports 32 languages while maintaining natural voice quality.
≈ $0.102 per audio