Explore Video Models

Discover AI video generation models for stunning animations

kling

Kling Video O1 Standard

Text & Image

Kuaishou's unified multi-modal video model (Standard tier) optimized for cost efficiency. Automatically routes based on your inputs: text-only for text-to-video, image for image-to-video, or reference images/video for reference-based generation.

wan

Wan 2.6

Text & Image

Alibaba WanXiang 2.6 - cinematic text-to-video, image-to-video, and reference-to-video generation with multi-shot storytelling support. 720p/1080p, 5-15s clips.

gemini

Veo 3.1 Extend

Video to Video

Extend Veo 3.1 videos by 7 seconds per call with smooth motion, preserved style, and strong scene coherence. The input must be a video generated by Veo 3.1. Supports up to 20 extensions, for a maximum of 148 seconds total. 16:9 or 9:16 aspect ratio, 720p or 1080p.
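
The 148-second cap is consistent with an 8-second starting clip plus 20 extensions of 7 seconds each. A minimal sketch of that arithmetic follows; the 8-second base is an assumption (it is the longest Veo 3.1 duration listed on this page), and `total_duration` is a hypothetical helper, not part of any API.

```python
# Figures from the listing; the 8-second base clip is an assumption.
SECONDS_PER_EXTENSION = 7
MAX_EXTENSIONS = 20
BASE_CLIP_SECONDS = 8  # assumed: longest Veo 3.1 clip length listed on this page


def total_duration(extensions: int, base_seconds: int = BASE_CLIP_SECONDS) -> int:
    """Return the clip length in seconds after a number of extend calls."""
    if not 0 <= extensions <= MAX_EXTENSIONS:
        raise ValueError(f"extensions must be between 0 and {MAX_EXTENSIONS}")
    return base_seconds + extensions * SECONDS_PER_EXTENSION


print(total_duration(20))  # 8 + 20 * 7 = 148
```

A shorter base clip (4 or 6 seconds) would reach a lower total under the same 20-call limit.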

gemini

Veo 3.1 Fast Extend

Video to Video

Fast video extension for Veo 3.1 clips. Adds 7 seconds per call with optimized speed for quick iteration. The input must be a video generated by Veo 3.1. Supports up to 20 extensions, for a maximum of 148 seconds total. 16:9 or 9:16, 720p or 1080p.

SeedVR2 Video

Video to Video

Upscale videos with SeedVR2 for crisp details, reduced artifacts, and strong frame-to-frame consistency. Supports 720p, 1080p, 2K, and 4K output for clips up to 10 minutes.

kling

Kling V2 Avatar (Standard)

Image to Video

Turns a single portrait and one audio track into a realistic talking avatar with accurate lip sync, expressive facial motion, and consistent identity. Optional prompt can guide mood or energy.

kling

Kling V2 Avatar (Pro)

Image to Video

Creates social-ready talking avatars from one portrait and your audio with sharper detail, stable motion, and strong identity consistency. Optional prompt to nudge camera feel, expression, or mood.

LatentSync

Video to Video

State-of-the-art audio-to-video lip synchronization using latent diffusion. Upload a talking-head video (480p+) and target audio to generate perfectly synchronized lip movements while preserving identity, pose, and background.

Pixverse v5.5 Effects

Image to Video

Pixverse v5.5 Effects via Wavespeed. Apply cinematic effect presets (Kiss Me AI, Venom, Holy Wings, Muscle Surge, etc.) to portraits. 360p-1080p resolution, 5/8/10s durations.

Pixverse v5.5

Text & Image

Pixverse v5.5 via Wavespeed. Text-to-video, image-to-video, and transition mode (first+last frame morphing). 360p-1080p resolution, 5/8/10s durations. Supports prompt optimization and audio generation.

kling

Kling 2.6 Pro

Text & Image

Latest Kling model with text-to-video and image-to-video capabilities. Supports native audio/voiceover generation. 5s and 10s durations with multiple aspect ratios.

kling

Kling Video O1

Text & Image

Kuaishou's unified multi-modal video model with MVL technology. Automatically routes based on your inputs: text-only for text-to-video, image for image-to-video, video for editing, or both for reference-based generation.
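
The routing rule described above can be sketched as a small decision function. This is only an illustration of the stated behavior; `pick_mode` and its parameters are hypothetical names, and the model's real routing logic is not documented here.

```python
from typing import Optional


def pick_mode(image: Optional[str] = None, video: Optional[str] = None) -> str:
    """Hypothetical routing: map the attached inputs to a generation mode."""
    if image and video:
        return "reference-based generation"  # both image and video attached
    if video:
        return "video editing"               # video only
    if image:
        return "image-to-video"              # image only
    return "text-to-video"                   # prompt only


print(pick_mode())                                     # text-to-video
print(pick_mode(image="portrait.png"))                 # image-to-video
print(pick_mode(image="ref.png", video="clip.mp4"))    # reference-based generation
```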

hunyuan

Hunyuan Video 1.5

Text & Image

Hunyuan Video 1.5 generates 5–10 second clips from text or an input image. Supports 480p/720p in landscape or portrait. Runs are routed automatically to text-to-video or image-to-video based on whether an image is attached.

Seedance Upscaler

Video to Video

Enhance existing videos with ByteDance’s Seedance super-resolution for cleaner 1080p, 2K, or 4K output with strong temporal consistency. Supports clips up to 10 minutes.

minimax

MiniMax Hailuo 2.3 Standard

Text & Image

MiniMax Hailuo 2.3 Standard produces 1080p cinematic clips with realistic motion and smooth scene transitions from text prompts or reference images. Choose 6 or 10 second runs for quick drafts or extended shots.

minimax

MiniMax Hailuo 2.3 Pro

Text & Image

MiniMax Hailuo 2.3 Pro delivers cinematic 1080p 5-second clips with advanced physics, prompt fidelity, and character consistency. Supports both pure text prompts and image-conditioned motion.

Avatar Omni Human 1.5

Image to Video

Animate a portrait using ByteDance's cognitive avatar model. Upload a static image and an audio track for expressive lip-sync and emotion.

wan

Video upscaler

Video to Video

Upscale existing videos with FlashVSR for sharper details, reduced compression artifacts, and improved temporal stability. Supports 720p, 1080p, 2K, and 4K output for clips up to 10 minutes.

wan

Wan 2.2 Spicy Extend

Video to Video

Extend existing videos by 5 or 8 seconds with smooth motion and vivid color. Supports 480p or 720p output and preserves temporal coherence.

doubao

Seedance V1 Pro Fast

Text & Image

Fast Seedance Pro variant. Generates cinematic clips from text or a single reference image with durations up to 12 seconds.

kling

Kling 2.5 Turbo Standard

Image to Video

Image-to-video only version of Kling 2.5 Turbo delivering cinematic motion at 720p with 5s and 10s clips. Optimized for fast, affordable production with 25% lower pricing than Kling 2.1 Standard.

Lightricks LTX-2 Fast

Text & Image

High-speed LTX-2 pipeline tuned for rapid iterations. Convert text or a single image into cinematic clips with synchronized audio in seconds.

Lightricks LTX-2 Pro

Text & Image

Flagship LTX-2 stack for production-ready motion. Generates synchronized audio and rich camera moves from text prompts or reference images.

gemini

Veo 3.1

Text & Image

Text-to-video and image-to-video with optional end frame control. Native audio generation, cinematic realism, and consistent subjects. Supports 4/6/8 seconds at 720p or 1080p.

kling

Kling 2.5 Turbo Pro

Text & Image

Text-to-video and image-to-video with ultra-smooth motion, cinematic visuals, and precise prompt control. Supports 5s and 10s outputs and multiple aspect ratios.

openai

Sora 2

Text & Image

Create highly realistic videos. Toggle Pro for higher quality. Supports text-to-video and image-to-video (the image becomes the first frame). Choose the orientation and the duration in seconds; the size must match the orientation.

wan

Wan 2.5

Text & Image

Text or image to video with one‑pass audio/voiceover sync. Supports optional custom audio input. 480p/720p/1080p, 5s or 10s.

VEED Fabric 1.0

Image to Video

Turn a static image + an audio track into a natural talking video. Supports 480p/720p output. Audio is required.

wan

Wan 2.2 Plus

Text & Image

Advanced text-to-video and image-to-video model. Supports 480p, 720p, and 1080p output with a fixed 5-second duration.

wan

Wan 2.2 (V2V)

Video to Video

Edit an existing video using a natural language prompt. Examples: "Change the color of the clothes to yellow", "Change the woman to a handsome boy". Supports 480p or 720p output, up to 120 seconds.

doubao

Bytedance Waver 1.0

Image to Video

Image-to-video. Requires an input image. Supports 5s duration only.

wan

Wan 2.2 S2V

Image to Video

Generate a video from a static image and an audio track with realistic lip/body sync.

Pixverse v5

Text & Image

Pixverse v5 video generation model via Runware. Supports text-to-video and image-to-video with customizable styles, effects, camera movements, and sound effects. Resolutions from 360p to 1080p and durations of 5 or 8 seconds.

wan

Wan 2.2 5b

Text & Image

The Wan 2.2 5b model produces up to 5 seconds of 720p video at 24 FPS with fluid motion and strong prompt understanding.

wan

Wan 2.2 Turbo

Text & Image

Wan 2.2 Turbo is a faster, simplified version with fewer settings for both text-to-video and image-to-video generation. Variable pricing based on resolution.

wan

Wan 2.2 14b

Text & Image

Wan 2.2 14b is the full version of the Wan 2.2 video model. It generates videos with high visual quality and motion diversity from text prompts or images.

vidu

Vidu Q1

Text & Image

Vidu Q1 video generation model. Creates high-quality 5-second videos. Supports both text-to-video and image-to-video generation with customizable visual styles (general or anime), movement amplitude control, and multiple aspect ratios.

Pixverse v4.5

Text & Image

Pixverse v4.5 video generation model. Creates high-quality videos with customizable styles, effects, camera movements, and sound effects. Supports multiple resolutions from 360p to 1080p with durations of 5 or 8 seconds.

gemini

Veo 3 Fast

Text & Image

Google's fast Veo 3 model. Creates high-quality 8-second videos from text or images. Supports audio generation ($1.60 with audio, $1.20 without). Supports 16:9 and 9:16 aspect ratios. For best results, prompts should be descriptive and clear.

midjourney

Midjourney Video

Image to Video

Midjourney Image-to-Video generator creates 4 videos of 5 seconds each from an input image with adjustable motion intensity.

minimax

MiniMax Hailuo 02 Pro

Text & Image

MiniMax Hailuo-02 Pro video generation model with 1080p resolution. Creates high-quality videos from text prompts or images. Supports both text-to-video and image-to-video generation.

minimax

MiniMax Hailuo 02

Text & Image

MiniMax Hailuo-02 Advanced video generation model with 768p resolution. Creates high-quality videos from text prompts or images. Supports both text-to-video and image-to-video generation.

doubao

Seedance 1.0 Pro

Text & Image

ByteDance's Seedance video generation model. Supports both text-to-video and image-to-video generation with 5 and 10 second durations. Supports multiple aspect ratios including 16:9, 1:1, 3:4, 9:16, 21:9.

doubao

Seedance 1.0 Lite

Text & Image

ByteDance's Seedance Lite video generation model. Fast and efficient model that supports both text-to-video and image-to-video generation with 5 and 10 second durations. Supports multiple aspect ratios including 16:9, 1:1, 4:3, and 9:21.

gemini

Veo 3

Text & Image

Google's latest Veo 3 model. Creates high-quality 8-second videos from text or images. Supports audio generation ($4.80 with audio, $3.20 without). For best results, prompts should be descriptive and clear. Include the subject, context, action, style, camera motion, composition, and ambiance details. Note: This model has strict content filters and may reject NSFW or sensitive content — we issue refunds for content policy rejections.
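
To compare the Veo 3 and Veo 3 Fast prices listed on this page, it can help to convert them to a per-second rate. The sketch below assumes each listed price covers one 8-second clip, which the listing implies but does not state outright; `cents_per_second` is a hypothetical helper.

```python
CLIP_SECONDS = 8  # both models are listed as producing 8-second videos

# Listed prices in cents, keyed by (model, with_audio).
# Assumption: each price covers one 8-second clip.
PRICES_CENTS = {
    ("Veo 3", True): 480,
    ("Veo 3", False): 320,
    ("Veo 3 Fast", True): 160,
    ("Veo 3 Fast", False): 120,
}


def cents_per_second(model: str, with_audio: bool) -> float:
    """Return the listed price divided by the clip length, in cents."""
    return PRICES_CENTS[(model, with_audio)] / CLIP_SECONDS


for (model, with_audio) in PRICES_CENTS:
    label = "with audio" if with_audio else "without audio"
    print(f"{model} ({label}): {cents_per_second(model, with_audio):.0f} cents/s")
```

Under that assumption, Veo 3 costs three times as much per second as Veo 3 Fast with audio enabled.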

kling

Kling 2.1 Master

Text & Image

Kling 2.1 Master text-to-video and image-to-video model. Premium quality video generation from text or images powered by Runware.

kling

Kling 2.1 Standard

Image to Video

Kling 2.1 Standard image-to-video model. Creates high-quality videos from images with text prompts. Requires an input image.

kling

Kling 2.1 Pro

Image to Video

Kling 2.1 Pro image-to-video model. Higher quality video generation from images with text prompts. Requires an input image.

wan

Wan 2.1

Image to Video

Generate a video from an image and prompt.

kling

Kling 2.0 Master

Text & Image

Kling 2.0 Master text-to-video and image-to-video model. Produces blockbuster-quality scenes, lifelike characters, and smooth motion from text prompts or images.

hunyuan

Hunyuan Video

Text to Video

Hunyuan Video text-to-video generator creates high-quality 720p videos with customizable resolution, aspect ratio, and frame count. Features pro mode for enhanced quality.

longstories

Longstories Movie

Text & Image

Generate AI mini-movies from 1 to 10 minutes. Bring any story to life with animated video and voice.

longstories

Longstories Pixel Art

Text & Image

Generate pixel‑art mini‑movies from 1 to 10 minutes. A second universe with a stylized pixel art aesthetic.

wan

Wan 2.5 Extend

Video to Video

Extend short clips to 3–10 seconds while preserving motion, lighting, and audio sync. Upload a base video, optional custom audio, and a prompt. Supports 480p/720p/1080p output.

kling

Kling Lipsync T2V

Video to Video

Text-to-video lipsync. Upload a 2–10 second focal video and provide a script. Kling synthesizes a matching voiceover and animates lips/micro-expressions to the dialogue.

kling

Kling Lipsync A2V

Video to Video

Audio-to-video lipsync. Upload a 2–10 second focal video and a clean vocal track (≤5 MB). Kling aligns mouth shapes and facial muscles to the audio while preserving the original footage.

Lucy Edit Dev

Video to Video

Ultra-fast text-guided video editor. Upload a source clip and describe the desired edit, and Lucy will transform the content while preserving timing, camera motion, and overall composition. Supports clips up to 120 seconds.

Lucy Edit Pro

Video to Video

High-fidelity text-guided video editor focused on cinematic quality, temporal stability, and 720p-ready output. Upload a source clip and a clear prompt to restyle outfits, props, and scenes while preserving motion, timing, and composition. Supports clips up to 120 seconds at 480p or 720p.

runway

Runway Gen-4 Aleph (V2V)

Video to Video

Video-to-video generation. Requires input video + prompt. Optional reference image. Only the first 5 seconds of the input are used by the model; we charge per-second based on your input video length.

wan

Wan 2.2 Animate

Video to Video

Animate or replace a character: provide a reference image and a driver video. Best results when composition, camera, and pose are consistent; keep image and video aspect ratios identical. Supports 480p/720p, up to 120s.

gemini

Veo 2

Text & Image

Google's Veo 2 text-to-video model. Heavily content-filtered: we see more than 50% of prompts refused, and we issue refunds for content policy rejections. Creates 720p videos (5-8 seconds) from detailed text descriptions; image-to-video is also supported. Supports both 16:9 (landscape) and 9:16 (portrait) aspect ratios. For best results, prompts should be descriptive and clear. Include the subject, the context, the action, and the style.

gemini

Veo 2 Image-to-Video

Image to Video

Google's Veo 2 image-to-video model. Animates an input image using detailed prompts, producing 720p videos.

minimax

MiniMax T2V-01

Text to Video

Hailuo T2V-01-Live text-to-video API: Transform static art into dynamic masterpieces. Creates vivid 6-second videos with enhanced smoothness and motion. Optimized for stability and subtle expression, supporting a wide range of artistic styles. Has a built-in prompt optimizer that makes it easy to use.

kling

Kling 1.5 Pro

Text to Video

Kling 1.5 Pro text-to-video model. Creates high-quality videos from detailed text descriptions. Supports 16:9 (landscape), 9:16 (portrait), and 1:1 (square) aspect ratios with durations of 5 or 10 seconds.

nanogpt

Video Face Swap

Video to Video

High-quality video face swapping. Swap faces between an image and a video with realistic results. Supports videos up to 4 minutes. Pricing varies by video resolution: 480p (half price), 720p/1080p (standard), 4K+ (1.5x price).
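
The resolution tiers above amount to a price multiplier. A minimal sketch follows, assuming tiers are decided by frame height in pixels, that anything at or below 480 pixels counts as 480p, and that resolutions between 1080p and 4K are charged at the standard rate. The listing does not specify those boundaries, and `face_swap_price` is a hypothetical helper.

```python
def face_swap_price(base_price: float, height_px: int) -> float:
    """Scale a standard price by the resolution tier of the input video.

    Tier boundaries are assumptions; the listing names only
    480p, 720p/1080p, and 4K+.
    """
    if height_px <= 480:
        multiplier = 0.5   # 480p: half price
    elif height_px >= 2160:
        multiplier = 1.5   # 4K and above: 1.5x price
    else:
        multiplier = 1.0   # 720p/1080p: standard price
    return base_price * multiplier


print(face_swap_price(2.0, 480))   # 1.0
print(face_swap_price(2.0, 1080))  # 2.0
print(face_swap_price(2.0, 2160))  # 3.0
```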

Explore video models | NanoGPT