Vidu Q3 Pro generates video from a text prompt, from an input image, or from start and end frames, with high visual fidelity, 540p/720p/1080p output, durations of 1 to 16 seconds, and optional generated audio and background music.
Added May 13, 2026
Approx. Price
$0.250 per video
Model Type
Both (text-to-video and image-to-video)
Preview Examples
2
Generation controls available for this model.
Resolution/Aspect Options
3
Default Duration
5 seconds
16 duration options
Tunable Settings
6
Background Music
Default
Yes
Add background music
Duration
Default
5 seconds
Options (16)
1 second, 2 seconds, 3 seconds, 4 seconds +12 more
Video length in seconds (1-16)
Generate Audio
Default
Yes
Generate synchronized audio for text-to-video
Motion
Default
auto
Options (4)
Auto, Small, Medium, Large
Movement intensity
Resolution
Default
720p
Options (3)
540p, 720p, 1080p
Output resolution
Style (T2V only)
Default
general
Human preference benchmarks sourced from Artificial Analysis.
Text to Video
#7 / 81
ELO
1226.0
Appearances
6,643
95% CI
-8 / +8
Image to Video
#6 / 74
ELO
1286.0
Appearances
6,480
95% CI
-9 / +9
Release Date 2026-01 · Matched as Vidu Q3 Pro
Artificial Analysis API
Use the input image as the first frame. Preserve the same woman, face identity, suitcase, outfit, bus stop, lighting, rainy atmosphere, and composition. Light rain falls gently, reflections shimmer on the wet street, and distant bus headlights slowly approach. The woman slightly tightens her grip on the suitcase and looks toward the road. The camera slowly pushes in, creating a quiet emotional cinematic moment. Realistic motion, stable identity, no flicker, no distortion, no morphing.
Image-to-Video | 5s | 720p | Audio enabled
Use the first image as the starting frame and the last image as the ending frame. Preserve the same desert highway, phone booth, sunset lighting, dusty atmosphere, character identity, and cinematic composition. Dust moves slowly across the road, the phone receiver swings in the wind, and the character hesitates before stepping into the booth. The warm interior light flickers on as they pick up the receiver. The camera slowly pulls back, emphasizing the empty desert around them. Cinematic mystery, realistic motion, stable identity, no flicker, no morphing.
Start/End-to-Video | 5s | 720p | Audio enabled
Style options (2)
General, Anime
Visual style for text-to-video
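The generation controls listed on this page can be collected into a single settings object and checked before a request is sent. The sketch below is illustrative only: the class `ViduQ3ProSettings`, the function `build_payload`, the field names, the mode strings, and the lowercase option values are all assumptions made for this example, not a documented API. Only the defaults and allowed ranges come from the listing above.

```python
from dataclasses import dataclass, asdict

# Allowed values taken from the listing; the lowercase spellings are assumed.
RESOLUTIONS = ("540p", "720p", "1080p")
MOTIONS = ("auto", "small", "medium", "large")
STYLES = ("general", "anime")
# Hypothetical mode names for the three supported generation types.
MODES = ("text-to-video", "image-to-video", "start-end-to-video")


@dataclass(frozen=True)
class ViduQ3ProSettings:
    """Hypothetical container for the six tunable settings and their defaults."""
    duration: int = 5              # seconds, 1 to 16
    resolution: str = "720p"
    motion: str = "auto"
    style: str = "general"         # text-to-video only
    generate_audio: bool = True
    background_music: bool = True

    def __post_init__(self):
        # Reject bools explicitly, since bool is a subclass of int in Python.
        if isinstance(self.duration, bool) or not isinstance(self.duration, int):
            raise ValueError("duration must be an integer number of seconds")
        if not 1 <= self.duration <= 16:
            raise ValueError("duration must be between 1 and 16 seconds")
        if self.resolution not in RESOLUTIONS:
            raise ValueError(f"resolution must be one of {RESOLUTIONS}")
        if self.motion not in MOTIONS:
            raise ValueError(f"motion must be one of {MOTIONS}")
        if self.style not in STYLES:
            raise ValueError(f"style must be one of {STYLES}")


def build_payload(prompt, settings=ViduQ3ProSettings(), mode="text-to-video"):
    """Return a plain dict of request parameters (field names are hypothetical)."""
    if mode not in MODES:
        raise ValueError(f"mode must be one of {MODES}")
    payload = {"mode": mode, "prompt": prompt, **asdict(settings)}
    if mode != "text-to-video":
        # The listing marks Style as T2V only, so drop it for other modes.
        payload.pop("style")
    return payload
```

For example, `build_payload("rainy bus stop", ViduQ3ProSettings(resolution="1080p"), mode="image-to-video")` would yield a dict with the default 5-second duration and no `style` key. Validating locally like this is a design choice for the sketch; a real service may apply different names or additional constraints.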