Expressive image-to-video avatar generation with natural facial performance, realistic body motion, accurate A/V sync, and optional driving audio for lip-sync mode.
Added Apr 3, 2026
Approx. Price
$0.250 per video
Model Type
image-to-video
Preview Examples
3
Generation controls available for this model.
Resolution/Aspect Options
4
Default Duration
5
30 duration options
Tunable Settings
7
Driving Audio URL
Default
N/A
Optional audio URL for lip-sync mode. If omitted, audio is generated from the prompt.
Duration
Default
5
Options (30)
1 second, 2 seconds, 3 seconds, 4 seconds, and 26 more
Length of the generated video in seconds.
Guidance Scale
Default
5
Optional classifier-free guidance scale (0-20).
Inference Steps
Default
8
Optional denoising steps (1-50).
Resolution
Default
256p
Options (4)
256p, 540p, 720p, 1080p
Output resolution.
Safety Checker
Default
Yes
Run prompt and image safety checks before generation.
Seed
Optional seed for reproducible output.
Human preference benchmarks sourced from Artificial Analysis.
No Artificial Analysis benchmark data is available yet for this model.
Preview Examples
Close-up talking-head avatar in a softly lit home studio, natural eye contact, subtle head nods, realistic lip sync, 6-second social clip.
Waist-up spokesperson in a modern office, clear enunciation, confident hand gestures, clean background bokeh, cinematic portrait framing.
Stylized character giving a short product announcement, expressive facial performance, synced speech pacing, smooth body motion, high-detail skin and hair.
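The settings listed on this page can be collected into a request payload and checked against the documented ranges before submission. The following is a minimal Python sketch, not the provider's API: the function name build_avatar_request and every field name (prompt, image_url, audio_url, duration, guidance_scale, num_inference_steps, resolution, enable_safety_checker, seed) are hypothetical and chosen for illustration. Only the defaults, the 0-20 and 1-50 ranges, and the four resolution options come from the listing. Because the full list of 30 duration options is not shown here, the sketch only checks that the duration is a positive integer.

```python
from typing import Any, Dict, Optional

# Resolution options as listed on this page.
RESOLUTIONS = ("256p", "540p", "720p", "1080p")


def build_avatar_request(
    prompt: str,
    image_url: str,
    *,
    audio_url: Optional[str] = None,      # omit to have audio generated from the prompt
    duration: int = 5,                    # seconds (listed default)
    guidance_scale: float = 5.0,          # listed range 0-20
    num_inference_steps: int = 8,         # listed range 1-50
    resolution: str = "256p",             # listed default
    enable_safety_checker: bool = True,   # listed default: Yes
    seed: Optional[int] = None,           # set for reproducible output
) -> Dict[str, Any]:
    """Validate settings and return a payload dict (field names are hypothetical)."""
    if not isinstance(duration, int) or duration < 1:
        raise ValueError("duration must be a positive whole number of seconds")
    if not 0 <= guidance_scale <= 20:
        raise ValueError("guidance_scale must be between 0 and 20")
    if not 1 <= num_inference_steps <= 50:
        raise ValueError("num_inference_steps must be between 1 and 50")
    if resolution not in RESOLUTIONS:
        raise ValueError(f"resolution must be one of {RESOLUTIONS}")

    payload: Dict[str, Any] = {
        "prompt": prompt,
        "image_url": image_url,
        "duration": duration,
        "guidance_scale": guidance_scale,
        "num_inference_steps": num_inference_steps,
        "resolution": resolution,
        "enable_safety_checker": enable_safety_checker,
    }
    # Optional fields are included only when the caller supplies them.
    if audio_url is not None:
        payload["audio_url"] = audio_url
    if seed is not None:
        payload["seed"] = seed
    return payload


if __name__ == "__main__":
    request = build_avatar_request(
        "Waist-up spokesperson in a modern office, clear enunciation.",
        "https://example.com/portrait.png",  # placeholder URL
        resolution="720p",
        seed=42,
    )
    print(request)
```

Leaving audio_url out of the payload mirrors the listed behavior that audio is generated from the prompt when no driving audio is given. Passing a fixed seed is the usual way to request repeatable output across runs.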