image-to-video

Image-to-Video AI Generation — Skill Reference

Version: 1.0.0 | Updated: 2026-03-02 | Category: Content & Video

1. Tool Comparison Matrix

Tool	Best Model (Mar 2026)	Max Length	I2V	Free Tier	Max Resolution	Native Audio	Best For
Runway	Gen-4.5	10s	Yes	125 one-time credits (~25s Gen-4 Turbo)	4K (upscale)	No	Cinematic consistency, character ref
Kling	Kling 3.0 / 2.6 Pro	15s (3.0) / 10s (2.6)	Yes	66 daily credits (360-540p, watermark)	1080p (Master)	Yes (2.6+)	Motion control, product detail, fashion
Pika	Pika 2.5	10s	Yes	80 monthly credits (480p, watermark)	1080p+ (paid)	No	Creative effects (Pikaswaps, Pikadditions)
Luma	Ray3 / Ray3 Modify	20s (720p+)	Yes	30 gens/month (draft res, watermark)	1080p	No	Long clips, start+end frame, cinematic
Sora	Sora 2 / Sora 2 Pro	25s	Yes	None (Plus $20/mo minimum)	1080p (Pro: 1792x1024)	Yes	Narrative scenes, physics, dialogue
Vidu	Vidu Q3	16s	Yes	3 videos/month (720p, watermark)	4K (Q3 Pro)	Yes (native)	Multi-shot sequences, synced audio
Hailuo/MiniMax	Hailuo 2.3	10s	Yes	Daily bonus credits (720p, watermark)	1080p (paid)	Unconfirmed	Speed, social content, A/B testing
Google Veo	Veo 3.1	8s	Yes (Ingredients)	Limited (Gemini free: older model)	4K (3840x2160)	Yes	4K output, film language, camera control
Adobe Firefly	Firefly Video	5s	Yes	Limited credits (with CC sub)	2K native (up to 8K upscale)	No	Commercial-safe (IP indemnity), integration with CC
Seedance	Seedance 2.0 (ByteDance)	15s	Yes	Free credits on signup	1080p	Yes (native)	Multimodal input, fast generation
WAN	WAN 2.6 / 2.1	10s	Yes	Open source (run locally)	1080p	No	Open source, self-hosted, general-purpose
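
The matrix can be treated as data and filtered by requirement. The following is a minimal sketch, not part of any vendor API: the `Tool` class and `shortlist` function are hypothetical names, and the rows are a subset of the table above (maximum clip length and native audio only), so the figures go stale whenever the table does.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Tool:
    name: str
    max_seconds: int      # "Max Length" column
    native_audio: bool    # "Native Audio" column


# Subset of rows copied from the comparison matrix (assumed current as of the table).
TOOLS = [
    Tool("Runway", 10, False),
    Tool("Kling", 15, True),
    Tool("Luma", 20, False),
    Tool("Sora", 25, True),
    Tool("Vidu", 16, True),
    Tool("Google Veo", 8, True),
    Tool("Adobe Firefly", 5, False),
    Tool("WAN", 10, False),
]


def shortlist(min_seconds: int, need_audio: bool) -> list[str]:
    """Return names of tools that meet the clip-length and audio requirements."""
    return [
        t.name
        for t in TOOLS
        if t.max_seconds >= min_seconds and (t.native_audio or not need_audio)
    ]
```

For example, `shortlist(min_seconds=15, need_audio=True)` narrows the list to the tools whose rows show both a 15s+ clip length and native audio.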

2. Detailed Tool Profiles

Runway (Gen-4 / Gen-4.5)

Current Models:

Gen-4.5 (latest, Jan 2026): State-of-the-art motion quality, prompt adherence, visual fidelity. Variable durations 2-10s.
Gen-4 Turbo: Fast, economical (5 credits/sec vs Gen-4.5 at 25 credits/sec). Good for iteration.
Gen-4: Mid-tier (12 credits/sec). Balanced quality/cost.

Image-to-Video Specifics:

Upload a reference image + text prompt describing motion
Choose duration (5 or 10 seconds; Gen-4.5 also accepts variable lengths from 2 to 10 seconds) and aspect ratio
Enable "Fixed Seed" for reproducible motion
Reference images maintain character appearance, clothing, features across scenes
Strong spatial understanding — objects/backgrounds stay coherent during camera movement

Pricing:

Free: 125 one-time credits (~25s of Gen-4 Turbo, ~5s of Gen-4.5)
Standard: $12-15/mo (625 credits)
Pro: $28-35/mo (2,250 credits)
Unlimited: $76-95/mo (2,250 fast + unlimited relaxed)
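
The "~25s of Gen-4 Turbo, ~5s of Gen-4.5" figures follow directly from the per-second credit rates listed under Current Models. Here is a minimal sketch of that arithmetic, assuming those rates (5, 12, and 25 credits per second) still hold; the dictionary keys and function names are illustrative and do not correspond to any Runway API.

```python
# Credits consumed per second of generated video, as quoted in this section.
# Assumption: rates are unchanged; check the current pricing page before budgeting.
CREDITS_PER_SECOND = {
    "gen4_turbo": 5,
    "gen4": 12,
    "gen4.5": 25,
}


def seconds_for_credits(credits: int, model: str) -> float:
    """Seconds of video a credit balance buys on the given model."""
    return credits / CREDITS_PER_SECOND[model]


def credits_for_clip(seconds: int, model: str) -> int:
    """Credits needed for one clip of the given length."""
    return seconds * CREDITS_PER_SECOND[model]
```

With the 125 free credits, `seconds_for_credits(125, "gen4_turbo")` gives 25.0 and `seconds_for_credits(125, "gen4.5")` gives 5.0, which matches the free-tier line above.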

Prompt Best Practices (Runway-Specific):

Focus prompts EXCLUSIVELY on motion — do NOT re-describe what is in the image
Start simple, iterate by adding detail
Use camera terms: pan, tilt, dolly, orbit, zoom, truck, pedestal, crane, rack focus, crash zoom
Structure: "The camera [motion] as [subject action]"
Abstract/conceptual language causes unpredictable results — be specific and physical
Re-describing image elements in detail can reduce motion or cause artifacts

Sources: Runway Pricing | Gen-4 Research | Gen-4.5 Research

Kling AI (Kling 3.0 / 2.6 Pro)

Current Models:

Kling 3.0 (latest): Scene-aware generation, character/prop consistency, native audio, 3-15s clips
Kling 2.6 Pro: Built-in English and Chinese audio, stronger prompt control, cinematic realism
Kling 2.6 Motion Control: Upload a motion reference video to guide character movement
Variants: Turbo (fast), Pro (balanced), Master (highest quality)

Image-to-Video Specifics:

Upload image as subject + describe movement in prompt
Motion Control mode: image + reference video for precise motion transfer
Preserves edges, logos, and fabric details (great for product/fashion)

Pricing:

Free: 66 daily credits (resets every 24h, no rollover). 360-540p, watermarked, non-commercial
Paid plans: $6.99-$180/mo depending on tier

Prompt Best Practices (Kling-Specific):

For I2V: describe ONLY what should move/change + camera behavior. The image IS the scene.
Keep ONE main action ("hero action"). Hint at secondary motion only.
For Motion Control: do NOT describe motion in prompt (the reference video defines it). Use prompt for environment/look only.
Use terms like "slow push-in", "drone follow", "lateral track"
Describe pace with words like "glides smoothly" or "jerks to a halt"
Ensure character limbs are visible in source image (hidden limbs cause hallucination/extra fingers)
Leave "breathing room" around subject for movement
Match aspect ratios between image and motion reference

Sources: Kling AI | Kling 3.0 Guide | Kling 2.6 Motion Control

Pika Labs (Pika 2.5)

Current Models:

Pika 2.5 (latest): Sharper, smoother cinematic clips. Upgraded engine.
Pikaformance: Talking face model for lifelike voice-to-face performances
AI Selves: Personalized AI avatar creation

Key Features:

Pikaframes: Turn 2-5 images into smooth transition video with realistic movement
Pikaswaps: Replace objects in video (e.g., dog -> robot) with preserved lighting/motion
Pikadditions: Insert new characters/objects into footage
Scene Ingredients: Upload your own characters/objects for consistency

Pricing:

Free: 80 monthly credits. 480p only, watermarked, non-commercial
Paid: Unlocks all resolutions, removes watermark, commercial use

Prompt Best Practices (Pika-Specific):

Great for creative/stylized transformations rather than photorealistic
Use Pikaframes for multi-image storytelling
Specify lighting and physics behavior for realistic material interactions
Best for short creative social clips and effects-heavy content

Sources: Pika Pricing | Pika 2.5 Release

Luma Dream Machine (Ray3)

Current Models:

Ray3: Primary generation model. Supports 5-20s video depending on resolution.
Ray3 Modify: Modify existing footage with character reference images
Ray3.14: Draft resolution model (available on free tier)

Image-to-Video Specifics:

Upload still image, animate with natural motion and cinematic camera action
Start+End frame feature: provide first and last frame, AI generates the transition
Adds subtle camera pans, zooms, perspective shifts automatically

Pricing:

Free: $0/mo. 30 gens/month. Draft resolution (Ray3.14), 720p images, watermarked, personal only
Lite: $9.99/mo. 3,200 credits. 1080p images, watermarked, non-commercial
Plus: $29.99/mo. 10,000 credits. No watermark, commercial rights
Unlimited: $94.99/mo. 10,000 fast + unlimited relaxed

Video Duration by Resolution:

540p SDR: 5s (160 credits), 10s (320 credits)
720p SDR: 5-20s
1080p SDR: up to 20s

Sources: Luma Pricing | Dream Machine | Ray3 Info

OpenAI Sora (Sora 2)

Current Models:

Sora 2: Text-to-video and image-to-video with synchronized audio
Sora 2 Pro: Higher resolution (1792x1024) and better quality

Image-to-Video Specifics:

Start with a still image and expand it into motion
Physically accurate, realistic, controllable
Can insert people into any Sora-generated environment with accurate appearance and voice
Native dialogue and sound effects generation

Pricing:

NO free tier (as of Jan 10, 2026)
ChatGPT Plus ($20/mo): Unlimited 480p video generation
ChatGPT Pro ($200/mo): Higher quality, priority access
API: $0.10/sec (720p), $0.30/sec (720p Pro), $0.50/sec (1024p Pro)

Prompt Best Practices (Sora-Specific):

Rewards prompts describing INTENT and MOOD, not just motion
Use director-style framing and gradual motion introduction
Structure prompts in distinct sections: what happens, visual style, audio elements
Be explicit about sound (dialogue, foley, music, mood)
Specify character positioning, framing, emotional states, gestures
Describe physics: "gentle collision" vs "violent crash", "heavy object slides" vs "light feather floats"
Supports 15-25 second clips; describe how the pacing should progress over the clip.
Specify 24fps for cinematic feel

Sources: Sora 2 Guide | Sora Announcement

Vidu (Vidu Q3)

Current Models:

Vidu Q3 (latest): Native audio+video in one pass, up to 16s, 2K resolution, multi-shot "Smart Cuts"
Vidu Q2: Previous gen. Natural motion, film-like camera effects.
Reference-to-Video 2.0: Character/subject consistency across generations

Key Features:

Billed as the first AI model to generate multi-shot, edited-style sequences with synced audio from a single prompt
"Smart Cuts" for automatic multi-shot sequences
Audio: BGM + SFX synced to scene rhythm
Up to 4K in Q3 Pro via API

Pricing:

Free: 3 videos/month. 720p, watermarked
Paid plans available on vidu.com

Sources: Vidu | Vidu Q3 Guide | Vidu Q3 on WaveSpeed

Hailuo / MiniMax (Hailuo 2.3)

Current Models:

Hailuo 2.3 (latest): Improved physical actions, stylization, character micro-expressions, anime support
Hailuo 02: Standard and Fast variants. 768p and 1080p, up to 10s
Media Agent: Multi-modal creation with minimal manual editing

Pricing:

Free: $0/mo. Daily bonus credits. 720p, watermarked. Peak-hour wait times.
Standard: $9.99/mo. 1,000 credits, fast-track, no watermark, up to 5 tasks
Unlimited: $94.99/mo. Unlimited credits

Prompt Best Practices (Hailuo-Specific):

Works best with clean images and modest motion requests
Great for rapid A/B testing and short-form social content
Strong anime/stylized content support in 2.3

Sources: Hailuo AI | MiniMax Hailuo 2.3

Google Veo (Veo 3.1)

Current Models:

Veo 3.1: 4K output (3840x2160), vertical video (9:16), "Ingredients to Video" (up to 4 reference images)
Veo 3 Standard: Older model available to some free users
Veo 3 Fast: Lower-cost option

Key Features:

Presented as the first mainstream AI video model with native 4K output (as opposed to upscaled 4K)
"Ingredients to Video": Accept up to 4 reference images per generation
Character identity consistency across scene changes
Native vertical video for YouTube Shorts / TikTok / Reels
Built-in audio generation

Pricing:

Free (Gemini): 100 monthly AI credits for Flow/Whisk. May get Veo 3 Standard (not 3.1)
Pro ($19.99/mo): Limited Veo 3.1 access
Ultra ($124.99/3mo or ~$42/mo): 25,000 monthly credits, full Veo 3.1
API: Veo 2 at $0.35-0.50/sec

Prompt Best Practices (Veo-Specific):

Excels with film language — reference shot types and pacing
Separate subject stability from camera motion in prompts
Input images should be 720p+ with 16:9 or 9:16 aspect ratio
Prompts referencing specific shot types produce more controlled results

Sources: Veo 3.1 4K Update | Veo 3.1 Blog | Google DeepMind Veo

Adobe Firefly Video

Current Model: Firefly Video (Feb 2026)

Key Features:

5s clips per generation
Native 2K resolution (up to 8K with Upscale)
IP indemnity — commercially safe, trained on licensed content
QuickCut: Upload b-roll or generate footage, auto-create structured first cut
Deep integration with Premiere Pro, After Effects, Creative Cloud

Pricing:

Firefly Standard: $9.99/mo (2,000 premium credits). ~20 videos at 100 credits/5s clip
Firefly Pro: $19.99/mo (4,000 premium credits)
Firefly Premium: $199.99/mo (50,000 premium credits)
Jan-Mar 2026 promo: Unlimited generations on paid plans

Best For: Enterprise/agency use where IP indemnity matters. Integration with existing Adobe workflows.

Sources: Adobe Firefly Pricing | Firefly Blog

Seedance 2.0 (ByteDance)

Current Model: Seedance 2.0

Key Features:

Unified multimodal audio-video joint generation (text, image, audio, video inputs)
4-15s video length
1080p resolution
30% faster than Seedance 1.0
Native audio generation (BGM + SFX)

Pricing:

Free credits on signup (check-in daily for more)

Sources: Seedance 2.0 | Seedance on fal.ai

WAN 2.6 / 2.1 (Open Source)

Current Models:

WAN 2.6: Latest release
WAN 2.1: Widely available, open-source on Hugging Face

Key Features:

Open source — run locally, no credits needed
1.3B and 14B parameter variants
Text-to-video and image-to-video generation, with Chinese and English text rendering in video
Realistic physics simulation
Great general-purpose all-rounder

Pricing: Free (open source). Hardware costs only.

Best For: Self-hosted workflows, privacy-sensitive projects, unlimited generation without credits

Sources: WAN GitHub | WAN on HuggingFace

3. Universal Prompt Best Practices

The Golden Rules

Separate identity from motion. The image defines WHO/WHAT. The prompt defines HOW it MOVES.
Do NOT re-describe the image. This causes reduced motion or visual artifacts.
Start simple, iterate. Begin with one action, one camera move. Add complexity after testing.
Be physically specific, not conceptual. "Camera slowly pushes in" > "dramatic emphasis"
3-4 descriptive elements per component is the sweet spot. Adding more adjectives beyond that degrades quality.

Prompt Structure Formula

[Camera movement], [pace/speed], [subject action], [environmental motion/details]

Example:

Slow push-in, steady cinematic pace, the developer's fingers type on the glowing keyboard,
holographic UI panels float and pulse with soft blue light around the workspace
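
The four-slot formula above can be assembled programmatically so that every prompt keeps the same ordering. This is a minimal sketch; `build_motion_prompt` is a hypothetical helper, not a function of any of the tools discussed.

```python
def build_motion_prompt(camera: str, pace: str, subject_action: str,
                        environment: str = "") -> str:
    """Join the formula slots in order, skipping any slot left empty."""
    parts = [camera, pace, subject_action, environment]
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = build_motion_prompt(
    camera="Slow push-in",
    pace="steady cinematic pace",
    subject_action="the developer's fingers type on the glowing keyboard",
    environment="holographic UI panels float and pulse with soft blue light",
)
```

Leaving `environment` empty yields a three-part prompt, which fits the "start simple, iterate" rule: add the fourth slot only after the basic motion works.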

The 8-Point Shot Grammar (Advanced)

For consistent cinematic outputs, cover these 8 elements:

Element	What to Specify	Example
1. Subject	Who/what is the focus	"A developer at a desk"
2. Emotion/Mood	Tone of the scene	"focused, intense concentration"
3. Optics/Framing	Shot type and lens	"medium close-up, 35mm lens"
4. Motion	Camera + subject movement	"slow dolly in, subtle typing motion"
5. Lighting	Light source and quality	"cool monitor glow, purple ambient neon"
6. Style	Visual aesthetic	"cinematic, dark moody, tech noir"
7. Audio (if supported)	Sound design	"mechanical keyboard clicks, ambient hum"
8. Continuity	What stays constant	"face remains still, same expression"
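
One way to make sure all eight elements are considered for every shot is to hold them in a small record and render the prompt from it. The sketch below assumes a plain sentence-joined prompt is acceptable to the target tool; `ShotSpec` and `to_prompt` are hypothetical names, and the audio element is only emitted when the caller says the tool supports it.

```python
from dataclasses import dataclass


@dataclass
class ShotSpec:
    subject: str
    mood: str
    framing: str
    motion: str
    lighting: str
    style: str
    audio: str = ""        # element 7, used only where the tool supports audio
    continuity: str = ""   # element 8, what must stay constant

    def to_prompt(self, supports_audio: bool = False) -> str:
        """Render the elements in table order as period-separated phrases."""
        fields = [self.subject, self.mood, self.framing,
                  self.motion, self.lighting, self.style]
        if supports_audio and self.audio:
            fields.append(f"audio: {self.audio}")
        if self.continuity:
            fields.append(self.continuity)
        return ". ".join(fields) + "."


spec = ShotSpec(
    subject="A developer at a desk",
    mood="focused, intense concentration",
    framing="medium close-up, 35mm lens",
    motion="slow dolly in, subtle typing motion",
    lighting="cool monitor glow, purple ambient neon",
    style="cinematic, dark moody, tech noir",
    audio="mechanical keyboard clicks, ambient hum",
    continuity="face remains still, same expression",
)
```

Calling `spec.to_prompt(supports_audio=True)` includes the sound-design phrase, while the default omits it for tools without native audio.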

4. Camera Movement Reference

Movement Types with Prompt Keywords

Movement	Description	Prompt Keywords	Best For
Static/Locked	Camera stays still, subject moves	"static shot", "locked camera", "fixed frame"	Subtle expressions, product focus
Pan	Horizontal swivel from fixed point	"pan left", "pan right", "slow pan", "sweeping pan"	Revealing landscapes, following action
Tilt	Vertical angle up/down from fixed point	"tilt up", "tilt down", "slow tilt", "dramatic tilt"	Height emphasis, outfit reveal
Push-in/Dolly In	Camera moves toward subject	"push in", "dolly in", "slow push-in", "intimate push"	Building tension, product detail
Pull-back/Dolly Out	Camera moves away from subject	"pull back", "dolly out", "reveal pull-back"	Context reveal, scene endings
Truck	Camera moves parallel to subject	"truck left", "truck right", "lateral movement"	Walking scenes, shelf scanning
Pedestal	Camera moves vertically (elevator-like)	"pedestal up", "pedestal down", "rising reveal"	Revealing hidden elements
Tracking	Camera follows moving subject	"tracking shot", "follow shot", "match pace"	Action sequences, character walk
Orbit/Arc	Camera circles around subject	"orbit clockwise", "orbit counterclockwise", "arc around", "slow orbit"	Hero shots, product showcase, dramatic
Crane/Boom	Sweeping vertical + horizontal move	"crane up", "crane shot", "boom up", "sweeping crane"	Epic establishing shots, crowd reveals
Rack Focus	Focus shifts between planes	"rack focus", "shift focus", "focus pull"	Attention redirection
Crash Zoom	Very fast dramatic zoom	"crash zoom", "snap zoom", "whip zoom"	Action beats, comedy, emphasis
Zoom	Lens zooms in/out (not physical move)	"zoom in", "zoom out", "slow zoom"	Drawing attention, reveal
Handheld	Slight natural shake	"handheld", "shaky cam", "documentary style"	Realism, urgency, immediacy
FPV/First-Person	Camera IS the subject	"FPV", "first person view", "POV shot"	Immersive, gaming content

Combining Movements

You can combine camera movements for complex shots:

"Slow dolly in while panning slightly right, the camera rises gently"
"Crane up and orbit counterclockwise, revealing the full workspace"
"Tracking shot following the subject with a slight handheld shake"

Speed/Pace Modifiers

Modifier	Effect	Keywords
Very slow	Dreamy, contemplative	"very slow", "glacial pace", "barely moving"
Slow	Cinematic, elegant	"slow", "steady", "gentle", "smooth"
Medium	Natural, documentary	"natural pace", "moderate speed"
Fast	Energetic, dynamic	"fast", "dynamic", "brisk", "energetic"
Whip	Sudden, dramatic	"whip", "snap", "lightning fast", "sudden"
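
The pace table pairs naturally with the movement keywords: pick a pace level, then prefix it to a camera move. This is a minimal sketch using the keywords from the table above; the level names (`very_slow`, `slow`, and so on) and the `camera_phrase` helper are illustrative choices, not a convention of any tool.

```python
# Keywords copied from the Speed/Pace Modifiers table.
PACE_KEYWORDS = {
    "very_slow": ["very slow", "glacial pace", "barely moving"],
    "slow": ["slow", "steady", "gentle", "smooth"],
    "medium": ["natural pace", "moderate speed"],
    "fast": ["fast", "dynamic", "brisk", "energetic"],
    "whip": ["whip", "snap", "lightning fast", "sudden"],
}


def camera_phrase(movement: str, pace: str) -> str:
    """Prefix a camera movement with the first keyword for the chosen pace."""
    keyword = PACE_KEYWORDS[pace][0]
    return f"{keyword} {movement}"
```

For instance, `camera_phrase("orbit clockwise", "slow")` returns "slow orbit clockwise", and swapping only the pace level is a convenient way to test one variable at a time.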

5. Subject Animation Guide

Subtle vs. Dramatic Motion Spectrum

Level	Description	Keywords	Use Case
Minimal	Almost imperceptible	"barely perceptible movement", "very subtle", "still with micro-motion"	Thumbnails, portrait-style
Subtle	Natural idle motion	"gentle sway", "subtle breathing", "slight movement", "soft idle"	Professional headshots, calm scenes
Moderate	Clear but controlled	"natural movement", "smooth gesture", "controlled action"	Product demos, presentations
Dynamic	Active, energetic	"active movement", "energetic", "fluid motion"	Action scenes, sports
Dramatic	Maximum motion	"explosive motion", "dramatic action", "intense movement"	Music videos, trailers

Animating Specific Elements

Hair/Clothing:

"hair gently moves as if from a light breeze"
"coat fabric ripples softly"
"scarf billows in the wind"

Eyes/Face (CAREFUL — most distortion-prone):

"eyes blink naturally"
"subtle smile forms"
"gaze shifts to the right"

WARNING: Keep facial animation minimal to avoid distortion. "Natural blinking" and "subtle expression" are safest.

Hands/Typing:

"fingers move across keyboard with natural rhythm"
"hands gesture subtly while speaking"
"subtle finger movement on the trackpad"

Environment/Background:

"particles float gently in the air"
"screen content scrolls slowly"
"ambient light pulses softly"
"clouds drift across the sky"

6. Consistency & Stability

Keeping the Subject Stable

Identity locks in prompt: "same face, same outfit, same hairstyle, consistent proportions"
Fixed Seed (Runway): Enable for reproducible motion across iterations
Reference images (Runway Gen-4+, Veo 3.1): Upload character reference for cross-scene consistency
Minimize facial motion: Faces drift the most. Keep face expressions subtle.
Foreground priority: Place main character in foreground, blur secondary faces
One subject focus: Multiple moving subjects = more drift. Focus on ONE.

Maintaining Visual Coherence

Use the SAME image for multi-clip generation (don't switch source images)
Save and reuse exact style parameters across batches (colors, aesthetic, motion quality)
Keep lighting description consistent: "cool blue monitor glow" in every prompt
Specify what should NOT change: "the background remains static" or "the desk stays perfectly still"
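
A simple way to keep lighting and stability wording identical across a batch is to store it once and append it to every motion prompt. The following is a sketch under that assumption; `STYLE_LOCK` and `with_style_lock` are hypothetical names, and the locked phrases are the examples quoted above.

```python
# Wording reused verbatim in every clip of a batch (examples from this section).
STYLE_LOCK = "cool blue monitor glow, the background remains static"


def with_style_lock(motion_prompts, style_lock=STYLE_LOCK):
    """Append the same style/stability phrases to each motion prompt."""
    return [f"{p.rstrip('. ')}, {style_lock}" for p in motion_prompts]


batch = with_style_lock([
    "Slow push-in, fingers type on the keyboard.",
    "Static shot, screen content scrolls slowly",
])
```

Because the locked text lives in one constant, a change to the lighting description propagates to every clip instead of drifting between prompts.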

Cross-Scene Consistency

Runway Gen-4+: Upload character reference image for appearance matching
Veo 3.1 Ingredients: Up to 4 reference images per generation
Kling Motion Control: Character image + motion reference video
Pika Scene Ingredients: Upload characters/objects for consistency

7. Avoiding Distortion & Common Mistakes

Top 10 Distortion Causes and Fixes

Cause	Symptom	Fix
Re-describing image content in prompt	Reduced motion, visual artifacts	Prompt should ONLY describe motion, not the scene
Too many actions at once	Chaotic, incoherent motion	ONE hero action + hint secondary motion
Abstract/conceptual language	Unpredictable results	Use specific physical descriptions
Hidden limbs in source image	Extra fingers, hallucinated hands	Ensure all limbs visible in source
Wide-angle lens in source	Perspective distortion during motion	Use neutral focal length (35-85mm framing)
Too many adjectives	Quality degradation	3-4 descriptive elements per component max
Mismatched aspect ratios	Stretching, cropping artifacts	Match source image to output aspect ratio
Excessive facial animation	Face warping, identity drift	Keep face motion minimal ("subtle", "natural blink")
Low-resolution source image	Blurry, unstable output	Use 720p+ source images minimum
Contradictory instructions	Confused model output	Review prompt for conflicts
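
Two of the causes in the table, low-resolution sources and mismatched aspect ratios, can be caught before any credits are spent. The check below is a minimal sketch that takes pixel dimensions directly; it assumes "720p+" means the shorter side is at least 720 pixels and allows a 1% ratio tolerance, both of which are interpretations rather than documented thresholds. `preflight` is a hypothetical helper.

```python
def preflight(width: int, height: int, target_ratio: tuple[int, int],
              min_short_side: int = 720, tolerance: float = 0.01) -> list[str]:
    """Return a list of problems found in a source image's dimensions."""
    problems = []
    # Assumption: "720p+" is read as shorter side >= 720 px.
    if min(width, height) < min_short_side:
        problems.append("low resolution")
    target = target_ratio[0] / target_ratio[1]
    actual = width / height
    # Relative difference between the image ratio and the requested output ratio.
    if abs(actual - target) / target > tolerance:
        problems.append("aspect ratio mismatch")
    return problems
```

An empty list means the image passes both checks; a square 1024x1024 image aimed at 16:9 output would be flagged for its aspect ratio only.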

Negative Prompt Keywords (where supported)

Place critical exclusions first, since models tend to weight earlier terms more heavily:

"blurry, low resolution, distorted, warped face, extra fingers, glitchy text,
unnatural movements, chaotic cuts, morphing features, flickering"
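
To keep the most important exclusions at the front, the negative prompt can be built from two ordered lists. This is a minimal sketch; `build_negative_prompt` is a hypothetical helper, and which terms count as critical is the caller's judgment.

```python
def build_negative_prompt(critical, optional=()):
    """Join terms with critical ones first, dropping case-insensitive duplicates."""
    seen = set()
    ordered = []
    for term in list(critical) + list(optional):
        key = term.strip().lower()
        if key and key not in seen:
            seen.add(key)
            ordered.append(term.strip())
    return ", ".join(ordered)


negative = build_negative_prompt(
    critical=["warped face", "extra fingers"],
    optional=["blurry", "Warped Face", "flickering"],
)
```

Here the duplicate "Warped Face" is dropped, so the face and finger exclusions stay in the first two positions.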

Quality Safeguards

Source image quality matters most. The cleaner the keyframe, the less the model invents.
Generate the source image with a good image model first (FLUX, Seedream 4.5, Midjourney)
Iterate ONE variable at a time when fixing issues (motion strength, camera move, style complexity)
Use preview/draft resolution first, then upscale the winner

8. Text & Logo Preservation

The Core Problem

核心问题

AI video models struggle with text and logos. They frequently warp, blur, or morph text during motion. This is a fundamental limitation of current diffusion models.

AI视频模型处理文字和Logo时存在困难。运动过程中文字经常扭曲、模糊或变形。这是当前扩散模型的根本性局限。

Mitigation Strategies

缓解策略

Minimize motion near text areas:

"the text/logo remains perfectly stationary in the frame"
"camera movement avoids the text area"
"text stays sharp and readable throughout"

Use static camera for text-heavy areas: If text must be visible, keep the camera locked and animate only non-text elements.

Specify high contrast text:

"bold high-contrast text", "clear sans-serif text", "readable block letters"

Post-production approach (recommended for important text):
- Generate the video WITHOUT text
- Overlay text/logos in a video editor (Premiere, After Effects, CapCut)
- This guarantees readability, because the model never has to render the text

Kling 2.6 for logos: Best at preserving edges and logos in product shots.

Short duration helps: Text stays more stable in 3-5s clips than in clips of 10s or longer.

最小化文字区域的运动:

"the text/logo remains perfectly stationary in the frame"
"camera movement avoids the text area"
"text stays sharp and readable throughout"

文字密集区域使用静态相机: 如果文字必须可见，保持相机锁定，仅动画非文字元素。

指定高对比度文字:

"bold high-contrast text", "clear sans-serif text", "readable block letters"

后期制作方法（重要文字推荐使用）:
- 生成不含文字的视频
- 在视频编辑软件中叠加文字/Logo（Premiere、After Effects、CapCut）
- 这种方法可保证文字可读性
Kling 2.6处理Logo: 在产品拍摄中最擅长保留边缘和Logo。
短时长有助于稳定: 3-5秒片段中的文字比10秒+片段更稳定。
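For the post-production approach, the overlay step can also be scripted instead of done in an editor. The sketch below only builds an ffmpeg command using the `drawtext` filter and does not run it. It assumes an ffmpeg build with `drawtext` enabled and a usable default font; the function name, the position, and the choice to reject characters that need escaping are illustrative assumptions.

```python
def overlay_text_command(src, dst, text, font_size=48):
    """Build (but do not run) an ffmpeg command that burns in centered text.

    The text is placed horizontally centered, 40 px above the bottom edge.
    Characters that require drawtext escaping are rejected rather than
    escaped, to keep this sketch simple.
    """
    if any(ch in text for ch in "':\\%"):
        raise ValueError("text contains characters that need drawtext escaping")
    drawtext = (
        f"drawtext=text='{text}':fontcolor=white:fontsize={font_size}"
        ":x=(w-text_w)/2:y=h-text_h-40"
    )
    # "-c:a copy" keeps any existing audio stream untouched.
    return ["ffmpeg", "-i", src, "-vf", drawtext, "-c:a", "copy", dst]


command = overlay_text_command("clip_no_text.mp4", "clip_titled.mp4", "Launch Day")
```

The list can be passed to `subprocess.run(command, check=True)` once the file names point at real files. Logos would need an image overlay instead of `drawtext`.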

9. Thumbnail-to-Video Specific Guide

9. 缩略图转视频专属指南

For Tech/Coding Thumbnails (Dark bg, neon, workspace, developer)

技术/编码类缩略图（深色背景、霓虹、工作区、开发者）

This section targets thumbnails with dark backgrounds, purple/teal neon lighting, desk/workspace scenes, MacBooks/monitors, a developer character, floating UI elements, and holographic panels.

本节专门针对以下缩略图设计：深色背景、紫/青霓虹灯光、书桌/工作区场景、MacBook/显示器、开发者角色、悬浮UI元素、全息面板。

Best Camera Movements for Desk/Workspace Scenes

书桌/工作区场景推荐相机动作

Recommended (in order of effectiveness):

Slow Push-In (BEST for thumbnails):

"Slow push-in toward the developer's workspace, steady cinematic pace,
ambient particles float gently, monitor screens glow softly"

Why: Creates focus, minimal distortion, keeps face stable.

Subtle Orbit (dramatic, good for hero shots):

"Very slow orbit clockwise around the developer at the desk,
neon reflections shift on the monitor surface, floating UI elements rotate gently"

Why: Adds depth and drama without disrupting the subject.

Static + Ambient Motion (safest for face stability):

"Static locked camera, the developer sits motionless at the desk,
holographic panels pulse with soft light, particles drift upward,
screen content scrolls slowly"

Why: Zero face distortion. All motion is environmental.

Gentle Pedestal Up (reveal shot):

"Slow pedestal up from the keyboard level, rising to reveal the developer's face
and floating holographic displays, purple ambient light pulses"

Dolly Back + Reveal:

"Slow dolly backward revealing the full workspace setup,
multiple monitors glow, floating code snippets hover in the air"

推荐顺序（按效果排序）:

Slow Push-In（缩略图最佳选择）:

"Slow push-in toward the developer's workspace, steady cinematic pace,
ambient particles float gently, monitor screens glow softly"

原因：聚焦主体，失真最小，保持面部稳定。

Subtle Orbit（戏剧性，适合主角镜头）:

"Very slow orbit clockwise around the developer at the desk,
neon reflections shift on the monitor surface, floating UI elements rotate gently"

原因：增加深度和戏剧性，不干扰主体。

Static + Ambient Motion（面部稳定性最安全）:

"Static locked camera, the developer sits motionless at the desk,
holographic panels pulse with soft light, particles drift upward,
screen content scrolls slowly"

原因：面部零失真。所有运动都在环境中。

Gentle Pedestal Up（展示镜头）:

"Slow pedestal up from the keyboard level, rising to reveal the developer's face
and floating holographic displays, purple ambient light pulses"

Dolly Back + Reveal:

"Slow dolly backward revealing the full workspace setup,
multiple monitors glow, floating code snippets hover in the air"

Animating Floating UI Elements

悬浮UI元素动画

"Translucent holographic panels float around the workspace, slowly rotating and pulsing"
"Code snippets hover in mid-air with a soft cyan glow, gently bobbing up and down"
"Floating UI windows orbit the developer, each displaying different data visualizations"
"Semi-transparent screens drift slowly, reflecting purple and blue neon light"
"Holographic interface elements materialize one by one around the desk"

Key descriptors for floating elements:

"translucent", "semi-transparent", "glassmorphic"
"floating", "hovering", "drifting", "orbiting"
"pulsing", "glowing", "flickering softly"
"materializing", "fading in", "dissolving"

"Translucent holographic panels float around the workspace, slowly rotating and pulsing"
"Code snippets hover in mid-air with a soft cyan glow, gently bobbing up and down"
"Floating UI windows orbit the developer, each displaying different data visualizations"
"Semi-transparent screens drift slowly, reflecting purple and blue neon light"
"Holographic interface elements materialize one by one around the desk"

悬浮元素关键描述词:

"translucent"（半透明）, "semi-transparent"（半透明）, "glassmorphic"（玻璃态）
"floating"（悬浮）, "hovering"（悬停）, "drifting"（漂移）, "orbiting"（环绕）
"pulsing"（脉动）, "glowing"（发光）, "flickering softly"（轻微闪烁）
"materializing"（显现）, "fading in"（淡入）, "dissolving"（消散）

Making Holographic Panels Glow/Pulse

让全息面板发光/脉动

"Holographic panels emit a soft pulsating blue-purple glow"
"Screens pulse rhythmically with cyan light, intensity rising and falling"
"Neon edges of the floating panels flicker with electric energy"
"Warm glow radiates from the holographic displays, casting colored light on the developer's face"
"Panels glow brighter momentarily before dimming back, creating a breathing light effect"

Light behavior keywords:

"pulsating", "breathing light", "rhythmic glow"
"flickering", "shimmering", "radiating"
"casting colored light", "reflecting off surfaces"
"intensity rising and falling", "soft oscillation"

"Holographic panels emit a soft pulsating blue-purple glow"
"Screens pulse rhythmically with cyan light, intensity rising and falling"
"Neon edges of the floating panels flicker with electric energy"
"Warm glow radiates from the holographic displays, casting colored light on the developer's face"
"Panels glow brighter momentarily before dimming back, creating a breathing light effect"

光线行为关键词:

"pulsating"（脉动）, "breathing light"（呼吸灯效果）, "rhythmic glow"（有节奏的发光）
"flickering"（闪烁）, "shimmering"（微光）, "radiating"（辐射）
"casting colored light"（投射彩色光）, "reflecting off surfaces"（在表面反射）
"intensity rising and falling"（强度起伏）, "soft oscillation"（轻微波动）

Adding Subtle Typing/Screen Activity

添加细微打字/屏幕活动

"Fingers move naturally across the backlit keyboard, screen content scrolls upward"
"Subtle typing motion, code appearing on the main monitor line by line"
"The MacBook screen displays scrolling code with a soft green-on-black terminal"
"Cursor blinks on the screen, new lines of code appear gradually"
"Multiple monitors show different live data — one scrolling code, one showing metrics"

"Fingers move naturally across the backlit keyboard, screen content scrolls upward"
"Subtle typing motion, code appearing on the main monitor line by line"
"The MacBook screen displays scrolling code with a soft green-on-black terminal"
"Cursor blinks on the screen, new lines of code appear gradually"
"Multiple monitors show different live data — one scrolling code, one showing metrics"

Keeping the Person's Face Stable with Ambient Motion

通过环境运动保持人物面部稳定

This is the #1 challenge. Here is the priority order:

Face = STATIC, Everything else = MOVING:

"The developer sits perfectly still, face unchanged, steady gaze at the screen.
Around him, holographic panels pulse with light, particles float upward,
keyboard keys glow softly, ambient neon light shifts between purple and blue"

Minimal face motion only:

"The developer blinks naturally, otherwise perfectly still.
Ambient environment has floating particles, pulsing lights, and drifting UI elements"

Identity locks: Always include: "same face throughout, consistent facial features, no face morphing"

Camera choice matters:
- Static camera = most stable face
- Slow push-in = face stays stable (camera moves, not face)
- Orbit = face can drift (use with caution)
- Any motion toward/around face = highest risk

Tool selection for face stability:
- Best: Kling (Motion Control with locked face), Luma (start+end frame)
- Good: Runway Gen-4.5 (character reference), Veo 3.1 (identity consistency)
- Risky: Sora (longer clips = more drift), Pika (creative focus, less stability)

这是头号挑战。以下是优先级顺序:

面部完全静止，其他元素运动:

"The developer sits perfectly still, face unchanged, steady gaze at the screen.
Around him, holographic panels pulse with light, particles float upward,
keyboard keys glow softly, ambient neon light shifts between purple and blue"

仅最小化面部运动:

"The developer blinks naturally, otherwise perfectly still.
Ambient environment has floating particles, pulsing lights, and drifting UI elements"

身份锁定: 始终包含："same face throughout, consistent facial features, no face morphing"（全程保持相同面部，特征一致，无面部变形）
相机选择很重要:
- 静态相机 = 面部最稳定
- 缓慢推入 = 面部保持稳定（相机移动，而非面部）
- 环绕镜头 = 面部可能偏移（谨慎使用）
- 任何朝向/环绕面部的运动 = 风险最高
面部稳定性工具选择:
- 最佳: Kling（Motion Control锁定面部）、Luma（起止帧）
- 良好: Runway Gen-4.5（角色参考）、Veo 3.1（身份一致性）
- 风险: Sora（长片段=更高偏移）、Pika（创意聚焦，稳定性较低）
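The camera ranking and the identity-lock phrase above can be combined into one small prompt helper. This is a sketch under stated assumptions: the numeric risk levels are just an encoding of the order given in this section, and `face_safe_prompt` is a hypothetical name.

```python
# Lower number = safer for the face, following the camera list above.
FACE_RISK = {"static": 0, "slow push-in": 1, "orbit": 2}

IDENTITY_LOCK = "same face throughout, consistent facial features, no face morphing"


def face_safe_prompt(camera_move, motion_description):
    """Append the identity-lock phrase and flag risky camera moves.

    Returns (prompt, warning); warning is None for the safer moves.
    Raises KeyError for camera moves not covered by the list above.
    """
    risk = FACE_RISK[camera_move]
    prompt = f"{motion_description.rstrip('. ')}. {IDENTITY_LOCK}."
    warning = None
    if risk >= 2:
        warning = "orbit can cause face drift; consider static or slow push-in"
    return prompt, warning


prompt, warning = face_safe_prompt(
    "static",
    "Static locked camera, the developer blinks naturally, particles drift upward.",
)
```

The warning is advisory only: an orbit can still work on tools with character reference or motion control, as noted in the tool selection list.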

10. Prompt Templates

10. 提示词模板

Template 1: Thumbnail-to-Video (Tech Workspace)

模板1：缩略图转视频（技术工作区）

Slow push-in, smooth cinematic pace. A developer sits at a dark workspace,
face perfectly still with natural subtle blinking. Holographic UI panels float
around the desk, pulsing with soft [purple/blue/cyan] neon light. Particles
drift gently upward. The MacBook screen displays scrolling code. Ambient
lighting shifts subtly between [purple and teal]. The background remains
perfectly static while floating elements orbit slowly.

Slow push-in, smooth cinematic pace. A developer sits at a dark workspace,
face perfectly still with natural subtle blinking. Holographic UI panels float
around the desk, pulsing with soft [purple/blue/cyan] neon light. Particles
drift gently upward. The MacBook screen displays scrolling code. Ambient
lighting shifts subtly between [purple and teal]. The background remains
perfectly static while floating elements orbit slowly.

Template 2: Product Showcase (Orbit)

模板2：产品展示（环绕镜头）

Slow orbit clockwise around [product], smooth cinematic pace. The [product]
sits on a dark reflective surface. Ambient particles float in the air.
Soft studio lighting highlights edges and details. The background is
[dark/gradient]. Camera completes a quarter rotation over [5-10] seconds.

Slow orbit clockwise around [product], smooth cinematic pace. The [product]
sits on a dark reflective surface. Ambient particles float in the air.
Soft studio lighting highlights edges and details. The background is
[dark/gradient]. Camera completes a quarter rotation over [5-10] seconds.

Template 3: Hero Shot (Person)

模板3：主角镜头（人物）

Static locked camera, [duration] seconds. [Person description] stands/sits
in [environment]. Face remains perfectly still. Hair moves gently from a
subtle breeze. Atmospheric particles float in the background. Volumetric
light rays stream through [window/source]. The mood is [cinematic/dramatic/calm].

Static locked camera, [duration] seconds. [Person description] stands/sits
in [environment]. Face remains perfectly still. Hair moves gently from a
subtle breeze. Atmospheric particles float in the background. Volumetric
light rays stream through [window/source]. The mood is [cinematic/dramatic/calm].

Template 4: Code/Tech Demo

模板4：代码/技术演示

[Camera movement], steady pace. Close-up of a MacBook Pro screen showing
[code editor/terminal/dashboard]. Code scrolls upward naturally. The cursor
blinks. Ambient keyboard glow pulses softly. Shallow depth of field blurs
the background workspace. [Monitor reflections shift/bokeh lights drift].

[Camera movement], steady pace. Close-up of a MacBook Pro screen showing
[code editor/terminal/dashboard]. Code scrolls upward naturally. The cursor
blinks. Ambient keyboard glow pulses softly. Shallow depth of field blurs
the background workspace. [Monitor reflections shift/bokeh lights drift].

Template 5: Dramatic Reveal

模板5：戏剧性展示

Slow crane up from [starting point], sweeping reveal. Rising from [desk level/
ground level] to reveal [full scene/workspace/city]. Atmospheric fog drifts
through the scene. [Neon/ambient] lights illuminate the environment.
The scale of the scene becomes apparent as the camera rises.

Slow crane up from [starting point], sweeping reveal. Rising from [desk level/
ground level] to reveal [full scene/workspace/city]. Atmospheric fog drifts
through the scene. [Neon/ambient] lights illuminate the environment.
The scale of the scene becomes apparent as the camera rises.
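The bracketed slots in these templates map naturally onto named format fields. The sketch below rewrites Template 2 that way; the field names (`product`, `background`, `seconds`) are illustrative choices, not part of the original templates.

```python
# Template 2 with its [bracketed] slots turned into named fields.
PRODUCT_ORBIT = (
    "Slow orbit clockwise around {product}, smooth cinematic pace. The {product} "
    "sits on a dark reflective surface. Ambient particles float in the air. "
    "Soft studio lighting highlights edges and details. The background is "
    "{background}. Camera completes a quarter rotation over {seconds} seconds."
)


def fill_template(template, **values):
    """Fill named slots; str.format raises KeyError if a slot is left unfilled."""
    return template.format(**values)


prompt = fill_template(
    PRODUCT_ORBIT, product="a ceramic mug", background="dark", seconds=5
)
```

Because an unfilled slot raises `KeyError`, a prompt can never be submitted with a stray placeholder still in it.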

Schema

架构

Inputs

输入

Parameter	Type	Required	Description
source_image	Image (720p+ recommended)	Yes	The still image to animate
prompt	String	Yes	Motion description (camera + subject + environment)
tool	Enum	Yes	Which AI tool to use
duration	Integer (3-25s)	No	Target video length
aspect_ratio	Enum (16:9, 9:16, 1:1, 4:3)	No	Output aspect ratio
resolution	Enum (480p-4K)	No	Output resolution (tool-dependent)
motion_reference	Video	No	Motion reference video (Kling Motion Control only)

参数	类型	必填	描述
source_image	图片（推荐720p+）	是	要动画化的静态图片
prompt	字符串	是	动作描述（相机+主体+环境）
tool	枚举	是	使用的AI工具
duration	整数（3-25s）	否	目标视频时长
aspect_ratio	枚举（16:9, 9:16, 1:1, 4:3）	否	输出宽高比
resolution	枚举（480p-4K）	否	输出分辨率（工具依赖）
motion_reference	视频	否	运动参考视频（仅Kling Motion Control支持）
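The input table can be expressed as a typed record with a few checks. This is a minimal sketch: the class name, the use of file paths for media, and the substring test for Kling are assumptions made for illustration, and the limits are the ones stated in the table.

```python
from dataclasses import dataclass
from typing import Optional

ASPECT_RATIOS = {"16:9", "9:16", "1:1", "4:3"}


@dataclass
class I2VRequest:
    source_image: str                       # path to the still image (assumed representation)
    prompt: str                             # motion description
    tool: str                               # which AI tool to use
    duration: Optional[int] = None          # seconds, 3-25 per the table
    aspect_ratio: Optional[str] = None
    resolution: Optional[str] = None        # tool-dependent, not validated here
    motion_reference: Optional[str] = None  # path to a reference video

    def validate(self):
        """Return a list of human-readable problems; empty means valid."""
        errors = []
        if not self.prompt.strip():
            errors.append("prompt is required")
        if self.duration is not None and not 3 <= self.duration <= 25:
            errors.append("duration must be between 3 and 25 seconds")
        if self.aspect_ratio is not None and self.aspect_ratio not in ASPECT_RATIOS:
            errors.append(f"unsupported aspect ratio: {self.aspect_ratio}")
        if self.motion_reference is not None and "kling" not in self.tool.lower():
            errors.append("motion_reference is only supported by Kling Motion Control")
        return errors
```

A per-tool duration cap (for example 10s for Runway, per the comparison matrix) would be a natural extension but is left out to keep the sketch short.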

Outputs

输出

Parameter	Type	Description
video	MP4	Generated video file
duration	Integer	Actual video length in seconds
resolution	String	Output resolution
credits_used	Integer	Credits consumed

参数	类型	描述
video	MP4	生成的视频文件
duration	整数	实际视频时长（秒）
resolution	字符串	输出分辨率
credits_used	整数	消耗的额度

Composable With

可组合技能

Skill	How
`thumbnail-generator`	Generate thumbnail image -> animate with this skill
`nano-banana-images`	Generate source image -> animate
`video-edit`	Post-process: trim, add text overlay, music
`pan-3d-transition`	Combine I2V clips with 3D transitions
`title-variants`	Generate titles -> overlay on video
`recreate-thumbnails`	Face-swap source image -> animate

技能	组合方式
`thumbnail-generator`	生成缩略图图片 → 使用本技能动画化
`nano-banana-images`	生成源图片 → 动画化
`video-edit`	后期处理：修剪、添加文字叠加、音乐
`pan-3d-transition`	将图像转视频片段与3D过渡效果结合
`title-variants`	生成标题 → 叠加到视频上
`recreate-thumbnails`	源图片换脸 → 动画化

Tool Selection Decision Tree

工具选择决策树

Need 4K output?
  -> Google Veo 3.1 or Vidu Q3 Pro

Need face stability (thumbnail/portrait)?
  -> Kling Motion Control (best) or Static camera on any tool

Need native audio?
  -> Sora 2, Vidu Q3, Seedance 2.0, Kling 2.6+, or Veo 3.1

Need free / no budget?
  -> Kling free (66 daily credits) or Hailuo free (daily bonus)
  -> WAN 2.1 (open source, run locally)

Need commercial IP safety?
  -> Adobe Firefly (IP indemnity)

Need creative effects (swaps, additions)?
  -> Pika 2.5 (Pikaswaps, Pikadditions, Pikaframes)

Need longest output?
  -> Sora 2 (25s) or Luma Ray3 (20s) or Vidu Q3 (16s)

Need product/fashion detail?
  -> Kling 2.6 Pro (preserves edges, logos, fabric)

Need fast iteration?
  -> Hailuo 2.3 (speed) or Runway Gen-4 Turbo (cheap credits)

Need multi-shot sequences?
  -> Vidu Q3 (Smart Cuts) or Veo 3.1 (Ingredients)

Need open source / self-hosted?
  -> WAN 2.1/2.6 (GitHub, HuggingFace)

需要4K输出?
  -> Google Veo 3.1或Vidu Q3 Pro

需要面部稳定性（缩略图/肖像）?
  -> Kling Motion Control（最佳）或任意工具使用静态相机

需要原生音频?
  -> Sora 2, Vidu Q3, Seedance 2.0, Kling 2.6+, 或Veo 3.1

需要免费/无预算?
  -> Kling免费版（每日66个额度）或Hailuo免费版（每日奖励）
  -> WAN 2.1（开源，本地运行）

需要商用IP安全?
  -> Adobe Firefly（IP保障）

需要创意特效（替换、插入）?
  -> Pika 2.5（Pikaswaps、Pikadditions、Pikaframes）

需要最长输出时长?
  -> Sora 2（25s）或Luma Ray3（20s）或Vidu Q3（16s）

需要产品/时尚细节?
  -> Kling 2.6 Pro（保留边缘、Logo、面料）

需要快速迭代?
  -> Hailuo 2.3（速度）或Runway Gen-4 Turbo（低成本额度）

需要多镜头序列?
  -> Vidu Q3（Smart Cuts）或Veo 3.1（Ingredients）

需要开源/自托管?
  -> WAN 2.1/2.6（GitHub、HuggingFace）
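When several needs apply at once, the decision tree can be treated as a lookup table and intersected. The sketch below encodes the branches above with tools collapsed to their family name (so "Vidu Q3 Pro" and "Vidu Q3" both become "Vidu"), which is a simplification; the need keys and `suggest_tools` are hypothetical names.

```python
# One entry per branch of the decision tree, tools reduced to family names.
TOOL_CHOICES = {
    "4k": ["Veo", "Vidu"],
    "face_stability": ["Kling"],
    "native_audio": ["Sora", "Vidu", "Seedance", "Kling", "Veo"],
    "free": ["Kling", "Hailuo", "WAN"],
    "commercial_ip": ["Firefly"],
    "creative_effects": ["Pika"],
    "longest": ["Sora", "Luma", "Vidu"],
    "product_detail": ["Kling"],
    "fast_iteration": ["Hailuo", "Runway"],
    "multi_shot": ["Vidu", "Veo"],
    "open_source": ["WAN"],
}


def suggest_tools(needs):
    """Return tools recommended for every listed need, in first-need order."""
    if not needs:
        return []
    candidates = list(TOOL_CHOICES[needs[0]])
    for need in needs[1:]:
        allowed = set(TOOL_CHOICES[need])
        candidates = [tool for tool in candidates if tool in allowed]
    return candidates
```

An empty result means no single tool covers every need in this table, so the work would have to be split across tools or one requirement relaxed. Because versions and tiers are collapsed, a match is a starting point to verify against the comparison matrix, not a guarantee that one plan covers everything.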

Quick Reference Card

快速参考卡片

Top 5 Motion Keywords That Work Across All Tools

所有工具通用的5大动作关键词

"slow push-in" / "dolly in"
"gentle orbit" / "arc around"
"static shot with ambient motion"
"tracking shot following"
"slow pan left/right"

"slow push-in" / "dolly in"
"gentle orbit" / "arc around"
"static shot with ambient motion"
"tracking shot following"
"slow pan left/right"

Top 5 Stability Keywords

5大稳定性关键词

"face remains perfectly still"
"same face throughout, consistent features"
"background stays static"
"subtle natural motion only"
"locked camera, environmental motion only"

"face remains perfectly still"
"same face throughout, consistent features"
"background stays static"
"subtle natural motion only"
"locked camera, environmental motion only"

Top 5 Atmosphere Keywords (for tech thumbnails)

技术缩略图5大氛围关键词

"holographic panels pulse with [color] light"
"particles float gently upward"
"neon ambient glow shifts between [colors]"
"translucent UI elements drift slowly"
"volumetric light rays, atmospheric haze"

Last updated: 2026-03-02 | Research covers tools available as of early March 2026

"holographic panels pulse with [color] light"
"particles float gently upward"
"neon ambient glow shifts between [colors]"
"translucent UI elements drift slowly"
"volumetric light rays, atmospheric haze"

最后更新: 2026-03-02 | 研究涵盖2026年3月初可用的工具
