Kuaishou's unified multi-modal video model (Standard tier), optimized for cost efficiency. It accepts text-only input for text-to-video, image input for image-to-video, reference images or video for reference-based generation, and video-only input for natural-language video editing.
Added Dec 17, 2025
Approx. Price: $0.420 per video
Model Type: both
Preview Examples: 3
Generation controls available for this model.
Resolution/Aspect Options: N/A
Default Duration: 5 seconds (2 duration options)
Tunable Settings: 3
Duration
Default: 5
Options (2): 5 seconds, 10 seconds
Video duration in seconds.

Keep Original Sound
Default: true
Options (2): Yes, No
Preserve original audio when using video input.

Mode
Default: auto
Options (3): Auto-detect, Edit Video, Reference to Video
How to process video input: Edit modifies the video; Reference uses it as style guidance for a new generation.
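The three settings above can be checked before a request is sent. The following is a minimal sketch, not the provider's API: the field names (`duration`, `keep_original_sound`, `mode`), the mode identifiers `edit` and `reference`, and the helper `build_payload` are all hypothetical names chosen for illustration. Only the defaults and allowed values come from the listing above.

```python
from dataclasses import dataclass

# Allowed values taken from the settings listed above.
ALLOWED_DURATIONS = (5, 10)  # seconds
# Hypothetical identifiers for "Auto-detect", "Edit Video", "Reference to Video".
ALLOWED_MODES = ("auto", "edit", "reference")


@dataclass(frozen=True)
class GenerationSettings:
    """Tunable settings with the defaults shown in the listing."""
    duration: int = 5                  # video duration in seconds
    keep_original_sound: bool = True   # only meaningful with video input
    mode: str = "auto"                 # only meaningful with video input

    def __post_init__(self):
        # Reject values outside the documented option lists.
        if self.duration not in ALLOWED_DURATIONS:
            raise ValueError(f"duration must be one of {ALLOWED_DURATIONS}")
        if self.mode not in ALLOWED_MODES:
            raise ValueError(f"mode must be one of {ALLOWED_MODES}")


def build_payload(prompt, settings=GenerationSettings(), has_video_input=False):
    """Assemble a request body (hypothetical shape, not a documented schema)."""
    payload = {"prompt": prompt, "duration": settings.duration}
    if has_video_input:
        # Assumption: the audio and mode settings are sent only when a video
        # is supplied, since the listing describes both in terms of video input.
        payload["keep_original_sound"] = settings.keep_original_sound
        payload["mode"] = settings.mode
    return payload
```

For example, `build_payload("A rainy street at night", GenerationSettings(duration=10))` yields a text-to-video body with a 10-second duration, while `GenerationSettings(duration=7)` raises `ValueError` because 7 is not one of the two listed options.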
Human preference benchmarks sourced from Artificial Analysis.
No Artificial Analysis benchmark data is available yet for this model.
Neon reflections shimmer on the glass door of a small convenience store at night. A lone figure in a raincoat steps inside, the bell chiming softly. Inside, the warm fluorescent lights contrast with the cold blue glow from outside. The camera slowly follows them down an aisle stocked with colorful snacks and magazines.
Soft afternoon light filters through tall trees as a young woman sits on a park bench, reading a book. A gentle breeze rustles the leaves above her, casting dancing shadows on the pages. She looks up briefly, smiling at something in the distance, before returning to her reading.
A young man sits motionless in the subway carriage. The fluorescent lights flicker overhead as the train sways gently. Through the window, the dark tunnel walls rush past, occasionally broken by the flash of passing lights. His reflection in the glass stares back at him contemplatively.