# Image-to-Video — Pro Pack on RunComfy
Image-to-video, intent-routed. This skill doesn't lock you to one model — it picks the right i2v model in the RunComfy catalog based on what the user actually wants: portrait animation, custom-voiceover lip-sync, or multi-modal composition.
```bash
npx skills add agentspace-so/runcomfy-skills --skill image-to-video -g
```

## Pick the right model for the user's intent
| User intent | Model | Why |
|---|---|---|
| Animate a portrait — keep identity stable | HappyHorse 1.0 I2V | #1 on Artificial Analysis Arena (Elo 1392); strong facial fidelity |
| Product reveal / 360 / macro motion | HappyHorse 1.0 I2V | Geometry preservation + smooth camera moves |
| Native synchronized ambient audio in one pass | HappyHorse 1.0 I2V | In-pass audio synthesis |
| Animate and lip-sync to a custom voiceover track | Wan 2.7 + `audio_url` | Accepts your own MP3/WAV (3–30s, ≤15MB) and drives lip-sync to it |
| Multi-language dub variants (same image, different audio per call) | Wan 2.7 + `audio_url` | Same shot, swap `audio_url` per call |
| Multi-modal — image + reference video + reference audio together | Seedance 2.0 Pro | Up to 9 image refs, 3 video refs (2–15s each), 3 audio refs |
| Brand-consistent narrative with character ref + scene ref + voice ref | Seedance 2.0 Pro | Image holds identity, video holds scene, audio holds voice |
| Default if unspecified | HappyHorse 1.0 I2V | Best all-round quality + native audio |
The agent reads this table, classifies the user's intent, and picks the matching subsection below.
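As a concrete illustration of that routing step, here is a minimal shell sketch of an intent classifier. The keyword patterns are illustrative assumptions, not the skill's actual logic:

```shell
#!/usr/bin/env bash
# Toy intent router: maps a free-text request to a model id from the table above.
# The keyword heuristics are assumptions for illustration only.
route_model() {  # $1 = user request text -> model id on stdout
  case "$1" in
    *voiceover*|*lip-sync*|*dub*)
      echo "wan-ai/wan-2-7/text-to-video" ;;
    *"reference video"*|*"reference audio"*|*multi-modal*)
      echo "bytedance/seedance-v2/pro" ;;
    *)
      # Default if unspecified, per the table
      echo "happyhorse/happyhorse-1-0/image-to-video" ;;
  esac
}
```

Anything that doesn't match a more specific cue falls through to the HappyHorse default, mirroring the last row of the table.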
## Prerequisites

- RunComfy CLI — `npm i -g @runcomfy/cli`
- RunComfy account — `runcomfy login` opens a browser device-code flow.
- CI / containers — set `RUNCOMFY_TOKEN=<token>`.
- A source image URL — JPEG/PNG/WebP, min 300px, ≤10MB; aspect 1:2.5 to 2.5:1 (HappyHorse); other models have similar specs.
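A preflight check before scripting against the CLI can be sketched as follows. The env-var name and token-file path follow this document's Security section; treat the exact paths as assumptions if your install differs:

```shell
#!/usr/bin/env bash
# Preflight sketch: verify some form of auth is available before running jobs.
set -euo pipefail

check_auth() {
  if [ -n "${RUNCOMFY_TOKEN:-}" ]; then
    echo "auth: env token"                 # CI / container path
  elif [ -f "$HOME/.config/runcomfy/token.json" ]; then
    echo "auth: token file"                # written by 'runcomfy login'
  else
    echo "auth: missing (run 'runcomfy login')" >&2
    return 77                              # mirrors the 'not signed in' exit code
  fi
}
```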
## Route 1: HappyHorse 1.0 I2V — default for portrait / product / general animation

Model: `happyhorse/happyhorse-1-0/image-to-video` · Arena rank: #1 (Elo 1392)

### Schema
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `image_url` | string | yes | — | JPEG/JPG/PNG/WEBP. Min 300px. Aspect 1:2.5–2.5:1. ≤10MB. |
| `prompt` | string | yes | — | ≤5000 non-CJK or 2500 CJK chars. Motion / camera / lighting description. |
| | enum | no | | |
| `duration` | int | no | 5 | 3–15 seconds. |
| `seed` | int | no | 0 | Reuse for variant comparisons. |
| `watermark` | bool | no | true | Provider watermark toggle. |

Output aspect = input aspect. No independent reframing.
### Invoke

```bash
runcomfy run happyhorse/happyhorse-1-0/image-to-video \
  --input '{
    "image_url": "https://.../portrait.jpg",
    "prompt": "Gentle camera drift around the subject'\''s face, subtle breathing motion, identity-stable features, soft natural light."
  }' \
  --output-dir <absolute/path>
```

### Prompting tips
- Lead with motion verbs: "drift", "dolly in", "orbit", "tilt up", "reveal", "blink", "breathe". Front-load what's MOVING.
- Don't restate the image — the model sees it. Focus tokens on what changes.
- Make preservation goals explicit: "identity-stable features", "packaging unchanged", "background geometry stable".
- Lighting evolution: "rim light intensifying", "shadows shortening as camera rises".
- One beat per clip — single primary motion (orbit OR dolly OR tilt OR character action).
## Route 2: Wan 2.7 + `audio_url` — when the user has a custom voiceover

Model: `wan-ai/wan-2-7/text-to-video` (NOT `/image-to-video` — Wan 2.7's t2v endpoint accepts an `audio_url` that drives lip-sync)

Note on i2v with Wan 2.7: Wan 2.7's primary i2v animation isn't on a dedicated endpoint here. For pure i2v (image animated by a motion prompt only), prefer HappyHorse i2v. Use Wan 2.7 specifically when the user has a custom audio track they want lip-synced to a generated talking-head clip.

### Schema (Wan 2.7 t2v with audio)
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `prompt` | string | yes | — | Up to ~5000 chars. Describe the talking-head shot: framing, lighting, motion. |
| `audio_url` | string | yes (for lip-sync) | — | WAV/MP3, 3–30s, ≤15MB. Drives lip-sync. |
| | enum | no | | |
| `aspect_ratio` | enum | no | | |
| `duration` | enum | no | | 2–15 (whole seconds). Match your audio length. |
| `negative_prompt` | string | no | — | Concrete issues to avoid (e.g. "no subtitles, no flicker"). |
| `seed` | int | no | — | Reproducibility. |
### Invoke

```bash
runcomfy run wan-ai/wan-2-7/text-to-video \
  --input '{
    "prompt": "Medium close-up of a confident spokesperson in a softly-lit recording booth, leaning slightly toward the camera, locked tripod, shallow DOF, warm key light from camera-left.",
    "audio_url": "https://.../voiceover-en.mp3",
    "duration": 12,
    "aspect_ratio": "9:16"
  }' \
  --output-dir <absolute/path>
```

### Prompting tips
- Describe the talking-head shot — framing, lighting, lens feel. The audio drives the lip-sync; the prompt builds the visual frame around it.
- Match `duration` to audio length — the clip will be silent past the audio if too long.
- Use `negative_prompt` for known issues: "no subtitles, no flicker, no distorted hands".
- For multi-language dubs — same prompt, swap `audio_url` per call. Lock `seed` for visual consistency across languages.
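The dub-variant tip can be sketched as a small loop. The URLs and prompt are placeholders, and the `seed` field is assumed from the tips above:

```shell
#!/usr/bin/env bash
# Sketch: one request body per language — same prompt and seed, different audio_url.
# All URLs are placeholder assumptions.
set -euo pipefail

PROMPT="Medium close-up of a confident spokesperson, locked tripod, warm key light."

dub_body() {  # $1 = language code -> JSON request body on stdout
  printf '{"prompt":"%s","audio_url":"https://example.com/voiceover-%s.mp3","duration":12,"aspect_ratio":"9:16","seed":42}' \
    "$PROMPT" "$1"
}

for lang in en de ja; do
  dub_body "$lang" > "body-$lang.json"
  # Then submit each body, e.g.:
  # runcomfy run wan-ai/wan-2-7/text-to-video --input "$(cat "body-$lang.json")" --output-dir "out/$lang"
done
```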
## Route 3: Seedance 2.0 Pro — multi-modal animation (image + ref video + ref audio)

Model: `bytedance/seedance-v2/pro`

Use when the user wants a single clip that combines: a subject image + scene from a reference video + voice tone from a reference audio.

### Schema (Seedance 2.0 Pro, i2v-relevant fields)
| Field | Type | Required | Default | Notes |
|---|---|---|---|---|
| `prompt` | string | yes | — | CN ≤500 chars OR EN ≤1000 words. |
| `image_url` | array | yes (for i2v) | | 0–9 images. First is the primary subject. |
| `video_url` | array | no | | 0–3 reference clips (MP4/MOV), 2–15s each. |
| `audio_url` | array | no | | 0–3 reference audio (WAV/MP3), 2–15s, <15MB each. |
| | enum | no | | |
| `duration` | int | no | 5 | 4–15 (whole seconds). |
| | enum | no | | |
| | bool | no | true | In-pass synchronized speech / SFX / music. |
| `seed` | int | no | — | Reproducibility. |
### Invoke

```bash
runcomfy run bytedance/seedance-v2/pro \
  --input '{
    "prompt": "Subject from image 1 walks through the café in video 1, voice tone matches audio 1. Medium close-up, slow push-in, warm light, gentle ambience.",
    "image_url": ["https://.../subject.jpg"],
    "video_url": ["https://.../cafe-locked-shot.mp4"],
    "audio_url": ["https://.../voice-tone.mp3"],
    "duration": 8
  }' \
  --output-dir <absolute/path>
```

### Prompting tips
- Image vs text division — use `image_url` for what must stay stable (face, costume, brand); use `prompt` for what should evolve (action, mood, lighting).
- Number the refs in the prompt: "subject from image 1, lighting from video 1, voice from audio 1". Seedance routes cues correctly.
- Reference media specs — videos / audio must be 2–15s; audio <15MB.
- Don't mix radically different aesthetics — if image 1 is a watercolor and video 1 is photoreal, output drifts.
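Seedance's ref-count limits can be checked client-side before submitting. This is a hedged sketch: the limits come from the schema above, and exit code 65 (schema mismatch) is borrowed from the Exit codes table:

```shell
#!/usr/bin/env bash
# Sketch: validate Seedance reference counts locally before a run.
set -uo pipefail

check_refs() {  # $1 = #image refs, $2 = #video refs, $3 = #audio refs
  if [ "$1" -lt 1 ] || [ "$1" -gt 9 ]; then
    echo "image_url needs 1-9 entries for i2v" >&2; return 65
  fi
  if [ "$2" -gt 3 ]; then echo "video_url allows at most 3 entries" >&2; return 65; fi
  if [ "$3" -gt 3 ]; then echo "audio_url allows at most 3 entries" >&2; return 65; fi
  return 0
}
```

Failing fast locally avoids burning a request on a body the API will reject anyway.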
## Limitations
- Each route inherits its model's limits. HappyHorse: 15s cap, output aspect = input aspect. Wan 2.7: 15s cap, audio 3–30s/15MB. Seedance: 720p ceiling on this template, 15s cap.
- No multi-route blending. This skill picks one model per call. If the user wants HappyHorse animation + Wan-style lip-sync in the same clip, that's two calls + a stitch (out of scope here).
- Brand-specific overrides — if the user named a specific model variant not listed (e.g. Wan 2.6, Seedance 1.5), route to the corresponding brand skill (`wan-2-7`, `seedance-v2`) instead of forcing it through here.
## Exit codes
| code | meaning |
|---|---|
| 0 | success |
| 64 | bad CLI args |
| 65 | bad input JSON / schema mismatch |
| 69 | upstream 5xx |
| 75 | retryable: timeout / 429 |
| 77 | not signed in or token rejected |
Full reference: docs.runcomfy.com/cli/troubleshooting.
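Code 75 invites a small retry wrapper. A minimal sketch, assuming you only want to retry timeouts / 429s and fail fast on everything else:

```shell
#!/usr/bin/env bash
# Sketch: retry a command only when it exits 75 (timeout / 429, per the table above).
set -uo pipefail

run_with_retry() {  # usage: run_with_retry <max_attempts> <cmd> [args...]
  local max="$1"; shift
  local attempt=1 rc
  while :; do
    "$@"; rc=$?
    if [ "$rc" -ne 75 ]; then return "$rc"; fi    # success or non-retryable: stop
    if [ "$attempt" -ge "$max" ]; then return "$rc"; fi
    sleep "$attempt"                               # simple linear backoff
    attempt=$((attempt + 1))
  done
}

# Example (placeholder args):
# run_with_retry 3 runcomfy run happyhorse/happyhorse-1-0/image-to-video \
#   --input "$BODY" --output-dir out/
```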
## How it works

The skill picks one of HappyHorse 1.0 I2V / Wan 2.7 t2v+audio / Seedance 2.0 Pro based on user intent and invokes `runcomfy run <model_id>` with the matching JSON body. The CLI POSTs to the Model API, polls the request, fetches the result, and downloads any `.runcomfy.net` / `.runcomfy.com` URL into `--output-dir`. `Ctrl-C` cancels the remote request before exit.

## Security & Privacy
- Token storage: `runcomfy login` writes the API token to `~/.config/runcomfy/token.json` with mode 0600 (owner-only read/write). Set the `RUNCOMFY_TOKEN` env var to bypass the file entirely in CI / containers.
- Input boundary: the user prompt is passed as a JSON string to the CLI via `--input`. The CLI does NOT shell-expand the prompt; it transmits the JSON body directly to the Model API over HTTPS. No shell injection surface from prompt content.
- Third-party content: image / mask / video URLs you pass are fetched by the RunComfy model server, not by the CLI on your machine. Treat external URLs as untrusted; image-based prompt injection is a known risk for any image-edit / video-edit model.
- Outbound endpoints: only `model-api.runcomfy.net` (request submission) and `*.runcomfy.net` / `*.runcomfy.com` (download whitelist for generated outputs). No telemetry, no callbacks.
- Generated-file size cap: the CLI aborts any single download > 2 GiB to prevent disk-fill from a malicious or runaway model output.
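The 0600 claim can be verified locally. A sketch (the `stat` format flag differs between GNU and BSD, handled below):

```shell
#!/usr/bin/env bash
# Sketch: warn if a token file is readable by group/others (expects mode 0600).
set -uo pipefail

check_mode_600() {  # $1 = path -> 0 if mode is exactly 600
  local perms
  perms="$(stat -c '%a' "$1" 2>/dev/null || stat -f '%Lp' "$1")"  # GNU, then BSD fallback
  if [ "$perms" != "600" ]; then
    echo "warning: $1 has mode $perms, expected 600" >&2
    return 1
  fi
}
```

Usage: `check_mode_600 ~/.config/runcomfy/token.json` after a fresh `runcomfy login`.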