# Image-to-Video — Pro Pack on RunComfy
Image-to-video, intent-routed. This skill doesn't lock you to one model — it picks the right i2v model in the RunComfy catalog based on what the user actually wants: portrait animation, custom-voiceover lip-sync, or multi-modal composition.
```bash
npx skills add agentspace-so/runcomfy-skills --skill image-to-video -g
```

## Pick the right model for the user's intent
| User intent | Model | Why |
|---|---|---|
| Animate a portrait — keep identity stable | HappyHorse 1.0 I2V | #1 on Artificial Analysis Arena (Elo 1392); strong facial fidelity |
| Product reveal / 360 / macro motion | HappyHorse 1.0 I2V | Geometry preservation + smooth camera moves |
| Native synchronized ambient audio in one pass | HappyHorse 1.0 I2V | In-pass audio synthesis |
| Animate and lip-sync to a custom voiceover track | Wan 2.7 + `audio_url` | Accepts your own MP3/WAV (3–30s, ≤15MB) and drives lip-sync to it |
| Multi-language dub variants (same image, different audio per call) | Wan 2.7 + `audio_url` | Same shot, swap `audio_url` per call |
| Multi-modal — image + reference video + reference audio together | Seedance 2.0 Pro | Up to 9 image refs, 3 video refs (2–15s each), 3 audio refs |
| Brand-consistent narrative with character ref + scene ref + voice ref | Seedance 2.0 Pro | Image holds identity, video holds scene, audio holds voice |
| Default if unspecified | HappyHorse 1.0 I2V | Best all-round quality + native audio |
The agent reads this table, classifies the user's intent, and picks the matching subsection below.
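As a concrete illustration of that routing step, here is a minimal shell sketch of an intent classifier. The keyword patterns are illustrative assumptions, not the skill's actual logic:

```shell
#!/usr/bin/env bash
# Toy intent router: maps a free-text request to a model id from the table above.
# The keyword heuristics are assumptions for illustration only.
route_model() {  # $1 = user request text -> model id on stdout
  case "$1" in
    *voiceover*|*lip-sync*|*dub*)
      echo "wan-ai/wan-2-7/text-to-video" ;;
    *"reference video"*|*"reference audio"*|*multi-modal*)
      echo "bytedance/seedance-v2/pro" ;;
    *)
      # Default if unspecified, per the table
      echo "happyhorse/happyhorse-1-0/image-to-video" ;;
  esac
}
```

Anything that doesn't match a more specific cue falls through to the HappyHorse default, mirroring the last row of the table.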
## Prerequisites

- RunComfy CLI — `npm i -g @runcomfy/cli`
- RunComfy account — `runcomfy login` opens a browser device-code flow.
- CI / containers — set `RUNCOMFY_TOKEN=<token>`.
- A source image URL — JPEG/PNG/WebP, min 300px, ≤10MB; aspect 1:2.5 to 2.5:1 (HappyHorse); other models have similar specs.
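A preflight check before scripting against the CLI can be sketched as follows. The env-var name and token-file path follow this document's Security section; treat the exact paths as assumptions if your install differs:

```shell
#!/usr/bin/env bash
# Preflight sketch: verify some form of auth is available before running jobs.
set -euo pipefail

check_auth() {
  if [ -n "${RUNCOMFY_TOKEN:-}" ]; then
    echo "auth: env token"                 # CI / container path
  elif [ -f "$HOME/.config/runcomfy/token.json" ]; then
    echo "auth: token file"                # written by 'runcomfy login'
  else
    echo "auth: missing (run 'runcomfy login')" >&2
    return 77                              # mirrors the 'not signed in' exit code
  fi
}
```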
## Route 1: HappyHorse 1.0 I2V — default for portrait / product / general animation

Model: `happyhorse/happyhorse-1-0/image-to-video` · Arena rank: #1 (Elo 1392)

### Schema
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `image_url` | string | yes | — | JPEG/JPG/PNG/WEBP. Min 300px. Aspect 1:2.5–2.5:1. ≤10MB. |
| `prompt` | string | yes | — | ≤5000 non-CJK or 2500 CJK chars. Motion / camera / lighting description. |
| | enum | no | | |
| `duration` | int | no | 5 | 3–15 seconds. |
| `seed` | int | no | 0 | Reuse for variant comparisons. |
| `watermark` | bool | no | true | Provider watermark toggle. |

Output aspect = input aspect. No independent reframing.
### Invoke

```bash
runcomfy run happyhorse/happyhorse-1-0/image-to-video \
  --input '{
    "image_url": "https://.../portrait.jpg",
    "prompt": "Gentle camera drift around the subject'\''s face, subtle breathing motion, identity-stable features, soft natural light."
  }' \
  --output-dir <absolute/path>
```

### Prompting tips
- Lead with motion verbs: "drift", "dolly in", "orbit", "tilt up", "reveal", "blink", "breathe". Front-load what's MOVING.
- Don't restate the image — the model sees it. Focus tokens on what changes.
- Make preservation goals explicit: "identity-stable features", "packaging unchanged", "background geometry stable".
- Lighting evolution: "rim light intensifying", "shadows shortening as camera rises".
- One beat per clip — single primary motion (orbit OR dolly OR tilt OR character action).
## Route 2: Wan 2.7 + `audio_url` — when the user has a custom voiceover

Model: `wan-ai/wan-2-7/text-to-video` (NOT `/image-to-video` — Wan 2.7's t2v endpoint accepts an `audio_url` that drives lip-sync)

Note on i2v with Wan 2.7: Wan 2.7's primary i2v animation isn't on a dedicated endpoint here. For pure i2v (image animated by a motion prompt only), prefer HappyHorse i2v. Use Wan 2.7 specifically when the user has a custom audio track they want lip-synced to a generated talking-head clip.

### Schema (Wan 2.7 t2v with audio)
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `prompt` | string | yes | — | Up to ~5000 chars. Describe the talking-head shot: framing, lighting, motion. |
| `audio_url` | string | yes (for lip-sync) | — | WAV/MP3, 3–30s, ≤15MB. Drives lip-sync. |
| | enum | no | | |
| `aspect_ratio` | enum | no | | |
| `duration` | enum | no | | 2–15 (whole seconds). Match your audio length. |
| `negative_prompt` | string | no | — | Concrete issues to avoid (e.g. "no subtitles, no flicker"). |
| `seed` | int | no | — | Reproducibility. |
### Invoke

```bash
runcomfy run wan-ai/wan-2-7/text-to-video \
  --input '{
    "prompt": "Medium close-up of a confident spokesperson in a softly-lit recording booth, leaning slightly toward the camera, locked tripod, shallow DOF, warm key light from camera-left.",
    "audio_url": "https://.../voiceover-en.mp3",
    "duration": 12,
    "aspect_ratio": "9:16"
  }' \
  --output-dir <absolute/path>
```

### Prompting tips
- Describe the talking-head shot — framing, lighting, lens feel. The audio drives the lip-sync; the prompt builds the visual frame around it.
- Match `duration` to audio length — the clip will be silent past the audio if too long.
- Use `negative_prompt` for known issues: "no subtitles, no flicker, no distorted hands".
- For multi-language dubs — same prompt, swap `audio_url` per call. Lock `seed` for visual consistency across languages.
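The dub-variant tip can be sketched as a small loop. The URLs and prompt are placeholders, and the `seed` field is assumed from the tips above:

```shell
#!/usr/bin/env bash
# Sketch: one request body per language — same prompt and seed, different audio_url.
# All URLs are placeholder assumptions.
set -euo pipefail

PROMPT="Medium close-up of a confident spokesperson, locked tripod, warm key light."

dub_body() {  # $1 = language code -> JSON request body on stdout
  printf '{"prompt":"%s","audio_url":"https://example.com/voiceover-%s.mp3","duration":12,"aspect_ratio":"9:16","seed":42}' \
    "$PROMPT" "$1"
}

for lang in en de ja; do
  dub_body "$lang" > "body-$lang.json"
  # Then submit each body, e.g.:
  # runcomfy run wan-ai/wan-2-7/text-to-video --input "$(cat "body-$lang.json")" --output-dir "out/$lang"
done
```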
## Route 3: Seedance 2.0 Pro — multi-modal animation (image + ref video + ref audio)

Model: `bytedance/seedance-v2/pro`

Use when the user wants a single clip that combines: a subject image + scene from a reference video + voice tone from a reference audio.

### Schema (Seedance 2.0 Pro, i2v-relevant fields)
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `prompt` | string | yes | — | CN ≤500 chars OR EN ≤1000 words. |
| `image_url` | array | yes (for i2v) | | 0–9 images. First is the primary subject. |
| `video_url` | array | no | | 0–3 reference clips (MP4/MOV), 2–15s each. |
| `audio_url` | array | no | | 0–3 reference audio (WAV/MP3), 2–15s, <15MB each. |
| | enum | no | | |
| `duration` | int | no | 5 | 4–15 (whole seconds). |
| | enum | no | | |
| | bool | no | true | In-pass synchronized speech / SFX / music. |
| `seed` | int | no | — | Reproducibility. |
### Invoke

```bash
runcomfy run bytedance/seedance-v2/pro \
  --input '{
    "prompt": "Subject from image 1 walks through the café in video 1, voice tone matches audio 1. Medium close-up, slow push-in, warm light, gentle ambience.",
    "image_url": ["https://.../subject.jpg"],
    "video_url": ["https://.../cafe-locked-shot.mp4"],
    "audio_url": ["https://.../voice-tone.mp3"],
    "duration": 8
  }' \
  --output-dir <absolute/path>
```

### Prompting tips
- Image vs text division — use `image_url` for what must stay stable (face, costume, brand); use `prompt` for what should evolve (action, mood, lighting).
- Number the refs in the prompt: "subject from image 1, lighting from video 1, voice from audio 1". Seedance routes cues correctly.
- Reference media specs — videos / audio must be 2–15s; audio <15MB.
- Don't mix radically different aesthetics — if image 1 is a watercolor and video 1 is photoreal, output drifts.
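Seedance's ref-count limits can be checked client-side before submitting. This is a hedged sketch: the limits come from the schema above, and exit code 65 (schema mismatch) is borrowed from the Exit codes table:

```shell
#!/usr/bin/env bash
# Sketch: validate Seedance reference counts locally before a run.
set -uo pipefail

check_refs() {  # $1 = #image refs, $2 = #video refs, $3 = #audio refs
  if [ "$1" -lt 1 ] || [ "$1" -gt 9 ]; then
    echo "image_url needs 1-9 entries for i2v" >&2; return 65
  fi
  if [ "$2" -gt 3 ]; then echo "video_url allows at most 3 entries" >&2; return 65; fi
  if [ "$3" -gt 3 ]; then echo "audio_url allows at most 3 entries" >&2; return 65; fi
  return 0
}
```

Failing fast locally avoids burning a request on a body the API will reject anyway.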
## Limitations
- Each route inherits its model's limits. HappyHorse: 15s cap, output aspect = input aspect. Wan 2.7: 15s cap, audio 3–30s/15MB. Seedance: 720p ceiling on this template, 15s cap.
- No multi-route blending. This skill picks one model per call. If the user wants HappyHorse animation + Wan-style lip-sync in the same clip, that's two calls + a stitch (out of scope here).
- Brand-specific overrides — if the user named a specific model variant not listed (e.g. Wan 2.6, Seedance 1.5), route to the corresponding brand skill (`wan-2-7`, `seedance-v2`) instead of forcing it through here.
## Exit codes
| code | meaning |
|---|---|
| 0 | success |
| 64 | bad CLI args |
| 65 | bad input JSON / schema mismatch |
| 69 | upstream 5xx |
| 75 | retryable: timeout / 429 |
| 77 | not signed in or token rejected |
Full reference: docs.runcomfy.com/cli/troubleshooting.
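Code 75 invites a small retry wrapper. A minimal sketch, assuming you only want to retry timeouts / 429s and fail fast on everything else:

```shell
#!/usr/bin/env bash
# Sketch: retry a command only when it exits 75 (timeout / 429, per the table above).
set -uo pipefail

run_with_retry() {  # usage: run_with_retry <max_attempts> <cmd> [args...]
  local max="$1"; shift
  local attempt=1 rc
  while :; do
    "$@"; rc=$?
    if [ "$rc" -ne 75 ]; then return "$rc"; fi    # success or non-retryable: stop
    if [ "$attempt" -ge "$max" ]; then return "$rc"; fi
    sleep "$attempt"                               # simple linear backoff
    attempt=$((attempt + 1))
  done
}

# Example (placeholder args):
# run_with_retry 3 runcomfy run happyhorse/happyhorse-1-0/image-to-video \
#   --input "$BODY" --output-dir out/
```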
## How it works

The skill picks one of HappyHorse 1.0 I2V / Wan 2.7 t2v+audio / Seedance 2.0 Pro based on user intent and invokes `runcomfy run <model_id>` with the matching JSON body. The CLI POSTs to the Model API, polls the request, fetches the result, and downloads any `.runcomfy.net` / `.runcomfy.com` URL into `--output-dir`. `Ctrl-C` cancels the remote request before exit.

## Security & Privacy
- Token storage: `runcomfy login` writes the API token to `~/.config/runcomfy/token.json` with mode 0600 (owner-only read/write). Set the `RUNCOMFY_TOKEN` env var to bypass the file entirely in CI / containers.
- Input boundary: the user prompt is passed as a JSON string to the CLI via `--input`. The CLI does NOT shell-expand the prompt; it transmits the JSON body directly to the Model API over HTTPS. No shell injection surface from prompt content.
- Third-party content: image / mask / video URLs you pass are fetched by the RunComfy model server, not by the CLI on your machine. Treat external URLs as untrusted; image-based prompt injection is a known risk for any image-edit / video-edit model.
- Outbound endpoints: only `model-api.runcomfy.net` (request submission) and `*.runcomfy.net` / `*.runcomfy.com` (download whitelist for generated outputs). No telemetry, no callbacks.
- Generated-file size cap: the CLI aborts any single download > 2 GiB to prevent disk-fill from a malicious or runaway model output.
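The 0600 claim can be verified locally. A sketch (the `stat` format flag differs between GNU and BSD, handled below):

```shell
#!/usr/bin/env bash
# Sketch: warn if a token file is readable by group/others (expects mode 0600).
set -uo pipefail

check_mode_600() {  # $1 = path -> 0 if mode is exactly 600
  local perms
  perms="$(stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1")"  # GNU, then BSD fallback
  if [ "$perms" != "600" ]; then
    echo "warning: $1 has mode $perms, expected 600" >&2
    return 1
  fi
}
```

Usage: `check_mode_600 ~/.config/runcomfy/token.json` after a fresh `runcomfy login`.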