Image-and-audio avatar video generation for speech-driven talking-head clips, with 720p and 1080p output.
Added May 12, 2026
Approx. Price
$0.125 per video
Model Type
image-to-video
Preview Examples
3
Generation controls available for this model.
Resolution/Aspect Options
2
Default Duration
N/A
Tunable Settings
3
Resolution
Default
720p
Options (2)
720p, 1080p
Output video resolution
Seed
Default
-1
Controls reproducibility (-1 for a random seed)
Video Prompt
Default
The person is talking.
Controls body movement, framing, and atmosphere
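The three tunable settings above can be sketched as a small validated container. This is an illustrative sketch only: the class name `AvatarSettings`, the method names, and the payload keys are hypothetical and are not taken from any documented API for this model. Only the defaults and the two resolution options come from the listing above.

```python
from dataclasses import dataclass, asdict

# The two resolution options listed for this model.
ALLOWED_RESOLUTIONS = ("720p", "1080p")


@dataclass(frozen=True)
class AvatarSettings:
    """Hypothetical container for the three tunable settings."""

    resolution: str = "720p"                       # listed default
    seed: int = -1                                 # -1 means random, per the listing
    video_prompt: str = "The person is talking."   # listed default

    def __post_init__(self) -> None:
        # Reject any resolution not in the listed options.
        if self.resolution not in ALLOWED_RESOLUTIONS:
            raise ValueError(
                f"resolution must be one of {ALLOWED_RESOLUTIONS}, "
                f"got {self.resolution!r}"
            )

    @property
    def uses_random_seed(self) -> bool:
        # A seed of -1 requests a random seed; any other value is fixed.
        return self.seed == -1

    def to_payload(self) -> dict:
        # Key names are assumptions chosen for illustration.
        return asdict(self)


if __name__ == "__main__":
    settings = AvatarSettings(resolution="1080p", seed=1234)
    print(settings.to_payload())
```

Fixing the seed to a value other than -1 is the natural choice when comparing prompt variations, since the listing describes the seed as the reproducibility control.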
Human preference benchmarks sourced from Artificial Analysis.
No Artificial Analysis benchmark data is available yet for this model.
Still: one line describing age, outfit, and soft light. Script: short lines, easy for TTS. Voice: warm and calm pace. Video: steady shot with small gestures.
Avatar | 720p | Fast pass
Still: single subject with age, skin, hair, wardrobe, expression, light direction, set, lens, and aspect ratio. Script: full localized script. Voice: role, energy, and what to avoid. Video: fixed camera, repeatable gestures, static or softly blurred background, no pan or zoom.
Avatar | 720p | Locked-in prompt
A male support agent explains a ticket resolution in a reassuring tone, with direct eye contact, small hand gestures, and stable office-style framing.
Avatar | 720p | Support tutorial
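The first two examples above follow a four-part layout (Still, Script, Voice, Video). A minimal sketch of assembling such a brief from its parts is shown here. The function name `build_brief` is hypothetical, and the labeled layout is inferred from the examples rather than from a documented prompt format.

```python
def build_brief(still: str, script: str, voice: str, video: str) -> str:
    """Join the four labeled parts into one single-line brief.

    Hypothetical helper: the labels and their order mirror the example
    prompts in this listing, not a documented specification.
    """
    parts = {"Still": still, "Script": script, "Voice": voice, "Video": video}
    sentences = []
    for label, text in parts.items():
        # Normalize whitespace and trailing periods so each part ends with one period.
        cleaned = text.strip().rstrip(".").strip()
        if not cleaned:
            raise ValueError(f"{label} part must not be empty")
        sentences.append(f"{label}: {cleaned}.")
    return " ".join(sentences)


if __name__ == "__main__":
    print(
        build_brief(
            still="one line describing age, outfit, and soft light",
            script="short lines, easy for TTS",
            voice="warm and calm pace",
            video="steady shot with small gestures",
        )
    )
```

Keeping the parts separate until the final join makes it easier to vary one part, for example the Video direction, while holding the others constant.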