Image-to-Video — Pro Pack on RunComfy

Image-to-video, intent-routed. This skill doesn't lock you to one model — it picks the right i2v model in the RunComfy catalog based on what the user actually wants: portrait animation, custom-voiceover lip-sync, or multi-modal composition.
```bash
npx skills add agentspace-so/runcomfy-skills --skill image-to-video -g
```

Pick the right model for the user's intent


| User intent | Model | Why |
| --- | --- | --- |
| Animate a portrait — keep identity stable | HappyHorse 1.0 I2V | #1 on Artificial Analysis Arena (Elo 1392); strong facial fidelity |
| Product reveal / 360 / macro motion | HappyHorse 1.0 I2V | Geometry preservation + smooth camera moves |
| Native synchronized ambient audio in one pass | HappyHorse 1.0 I2V | In-pass audio synthesis |
| Animate and lip-sync to a custom voiceover track | Wan 2.7 + `audio_url` | Accepts your own MP3/WAV (3–30s, ≤15MB) and drives lip-sync to it |
| Multi-language dub variants (same image, different audio per call) | Wan 2.7 + `audio_url` | Same shot, swap `audio_url` per language |
| Multi-modal — image + reference video + reference audio together | Seedance 2.0 Pro | Up to 9 image refs, 3 video refs (2–15s each), 3 audio refs |
| Brand-consistent narrative with character ref + scene ref + voice ref | Seedance 2.0 Pro | Image holds identity, video holds scene, audio holds voice |
| Default if unspecified | HappyHorse 1.0 I2V | Best all-round quality + native audio |
The agent reads this table, classifies the user's intent, and picks the matching subsection below.
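The routing step above can be sketched as a lookup. The intent labels here are illustrative shorthand — a real agent classifies free-form user text, not fixed keywords:

```bash
# Map a coarse intent label to a RunComfy model ID (labels are hypothetical).
route_i2v() {
  case "$1" in
    portrait|product|ambient-audio) echo "happyhorse/happyhorse-1-0/image-to-video" ;;
    voiceover|dub)                  echo "wan-ai/wan-2-7/text-to-video" ;;
    multimodal)                     echo "bytedance/seedance-v2/pro" ;;
    *)                              echo "happyhorse/happyhorse-1-0/image-to-video" ;;  # default route
  esac
}

route_i2v dub   # prints wan-ai/wan-2-7/text-to-video
```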

Prerequisites


  1. RunComfy CLI — `npm i -g @runcomfy/cli`
  2. RunComfy account — `runcomfy login` opens a browser device-code flow.
  3. CI / containers — set `RUNCOMFY_TOKEN=<token>`.
  4. A source image URL — JPEG/PNG/WebP, min 300px, ≤10MB; aspect 1:2.5 to 2.5:1 (HappyHorse); other models have similar specs.

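The CI prerequisite can be made into a fail-fast check. The helper name, message text, and error path below are illustrative; the exit value mirrors the CLI's "not signed in" code 77 listed later:

```bash
# Hypothetical guard: fail fast when neither RUNCOMFY_TOKEN nor a stored
# login (~/.config/runcomfy/token.json) is available.
require_token() {
  if [ -z "${RUNCOMFY_TOKEN:-}" ] && [ ! -f "${HOME}/.config/runcomfy/token.json" ]; then
    echo "error: set RUNCOMFY_TOKEN or run 'runcomfy login' first" >&2
    return 77
  fi
}
```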

Route 1: HappyHorse 1.0 I2V — default for portrait / product / general animation


Model: `happyhorse/happyhorse-1-0/image-to-video` · Arena rank: #1 (Elo 1392)

Schema


| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| `image_url` | string | yes | — | JPEG/JPG/PNG/WEBP. Min 300px. Aspect 1:2.5–2.5:1. ≤10MB. |
| `prompt` | string | yes | — | ≤5000 non-CJK or 2500 CJK chars. Motion / camera / lighting description. |
| `resolution` | enum | no | `1080P` | `720P` or `1080P`. |
| `duration` | int | no | `5` | 3–15 seconds. |
| `seed` | int | no | `0` | Reuse for variant comparisons. |
| `watermark` | bool | no | `true` | Provider watermark toggle. |

Output aspect = input aspect. No independent reframing.

Invoke


```bash
runcomfy run happyhorse/happyhorse-1-0/image-to-video \
  --input '{
    "image_url": "https://.../portrait.jpg",
    "prompt": "Gentle camera drift around the subject'\''s face, subtle breathing motion, identity-stable features, soft natural light."
  }' \
  --output-dir <absolute/path>
```

Prompting tips


  • Lead with motion verbs: "drift", "dolly in", "orbit", "tilt up", "reveal", "blink", "breathe". Front-load what's MOVING.
  • Don't restate the image — the model sees it. Focus tokens on what changes.
  • Preservation goals explicit: "identity-stable features", "packaging unchanged", "background geometry stable".
  • Lighting evolution: "rim light intensifying", "shadows shortening as camera rises".
  • One beat per clip — single primary motion (orbit OR dolly OR tilt OR character action).

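Since `seed` is documented for variant comparisons, one common pattern is a seed sweep over the same image and prompt. In this sketch `build_input` is a hypothetical helper, the URL is a placeholder, and the commands are echoed as a dry-run — drop `echo` to actually submit:

```bash
# Build the same input JSON with a different seed per call.
build_input() {  # $1 = seed
  printf '{"image_url":"https://example.com/portrait.jpg","prompt":"Gentle camera drift, subtle breathing motion, identity-stable features.","seed":%s}' "$1"
}

for seed in 1 2 3; do
  echo runcomfy run happyhorse/happyhorse-1-0/image-to-video \
    --input "$(build_input "$seed")" \
    --output-dir "./variants/seed-${seed}"
done
```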

Route 2: Wan 2.7 + `audio_url` — when the user has a custom voiceover

Model: `wan-ai/wan-2-7/text-to-video` (NOT `/image-to-video` — Wan 2.7's t2v endpoint accepts an `audio_url` that drives lip-sync)
Note on i2v with Wan 2.7: Wan 2.7's primary i2v animation isn't on a dedicated endpoint here. For pure i2v (image animated by motion prompt only), prefer HappyHorse i2v. Use Wan 2.7 specifically when the user has a custom audio track they want lip-synced to a generated talking-head clip.

Schema (Wan 2.7 t2v with audio)


| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| `prompt` | string | yes | — | Up to ~5000 chars. Describe the talking-head shot: framing, lighting, motion. |
| `audio_url` | string | yes (for lip-sync) | — | WAV/MP3, 3–30s, ≤15MB. Drives lip-sync. |
| `aspect_ratio` | enum | no | `16:9` | `16:9`, `9:16`, `1:1`, `4:3`, `3:4`. |
| `resolution` | enum | no | `1080p` | `720p` or `1080p`. |
| `duration` | enum | no | `5` | 2–15 (whole seconds). Match your audio length. |
| `negative_prompt` | string | no | — | Concrete issues to avoid (e.g. "no subtitles, no flicker"). |
| `seed` | int | no | — | Reproducibility. |

Invoke


```bash
runcomfy run wan-ai/wan-2-7/text-to-video \
  --input '{
    "prompt": "Medium close-up of a confident spokesperson in a softly-lit recording booth, leaning slightly toward the camera, locked tripod, shallow DOF, warm key light from camera-left.",
    "audio_url": "https://.../voiceover-en.mp3",
    "duration": 12,
    "aspect_ratio": "9:16"
  }' \
  --output-dir <absolute/path>
```

Prompting tips


  • Describe the talking-head shot — framing, lighting, lens feel. The audio drives the lip-sync; the prompt builds the visual frame around it.
  • Match `duration` to the audio length — the clip will be silent past the audio if too long.
  • Use `negative_prompt` for concrete issues: "no subtitles, no flicker, no distorted hands".
  • For multi-language dubs — same prompt, swap `audio_url` per call. Lock the seed for visual consistency across languages.

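The multi-language dub pattern can be sketched as a loop: identical prompt and locked seed, a different `audio_url` per call. Here `dub_input` is a hypothetical helper, the voiceover URLs are placeholders, and seed 42 is an arbitrary fixed value; commands are echoed as a dry-run:

```bash
# Same shot per language — only the audio_url changes between calls.
dub_input() {  # $1 = language code
  printf '{"prompt":"Medium close-up of a confident spokesperson in a softly-lit recording booth.","audio_url":"https://example.com/voiceover-%s.mp3","duration":12,"aspect_ratio":"9:16","seed":42}' "$1"
}

for lang in en fr ja; do
  echo runcomfy run wan-ai/wan-2-7/text-to-video \
    --input "$(dub_input "$lang")" \
    --output-dir "./dubs/${lang}"
done
```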

Route 3: Seedance 2.0 Pro — multi-modal animation (image + ref video + ref audio)


Model: `bytedance/seedance-v2/pro`
Use when the user wants a single clip that combines: a subject image + scene from a reference video + voice tone from a reference audio.

Schema (Seedance 2.0 Pro, i2v-relevant fields)


| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| `prompt` | string | yes | — | CN ≤500 chars OR EN ≤1000 words. |
| `image_url` | array | yes (for i2v) | `[]` | 0–9 images. First is the primary subject. |
| `video_url` | array | no | `[]` | 0–3 reference clips (MP4/MOV), 2–15s each. |
| `audio_url` | array | no | `[]` | 0–3 reference audio (WAV/MP3), 2–15s, <15MB each. |
| `aspect_ratio` | enum | no | `adaptive` | `adaptive`, `16:9`, `9:16`, `4:3`, `3:4`, `1:1`, `21:9`. |
| `duration` | int | no | `5` | 4–15 (whole seconds). |
| `resolution` | enum | no | `720p` | `480p` or `720p`. |
| `generate_audio` | bool | no | `true` | In-pass synchronized speech / SFX / music. |
| `seed` | int | no | — | Reproducibility. |

Invoke


```bash
runcomfy run bytedance/seedance-v2/pro \
  --input '{
    "prompt": "Subject from image 1 walks through the café in video 1, voice tone matches audio 1. Medium close-up, slow push-in, warm light, gentle ambience.",
    "image_url": ["https://.../subject.jpg"],
    "video_url": ["https://.../cafe-locked-shot.mp4"],
    "audio_url": ["https://.../voice-tone.mp3"],
    "duration": 8
  }' \
  --output-dir <absolute/path>
```

Prompting tips


  • Image vs text division — use `image_url` for what must stay stable (face, costume, brand); use `prompt` for what should evolve (action, mood, lighting).
  • Number the refs in the prompt: "subject from image 1, lighting from video 1, voice from audio 1". Seedance routes cues correctly.
  • Reference media specs — videos / audio must be 2–15s; audio <15MB.
  • Don't mix radically different aesthetics — if image 1 is a watercolor and video 1 is photoreal, the output drifts.


Limitations


  • Each route inherits its model's limits. HappyHorse: 15s cap, output aspect = input aspect. Wan 2.7: 15s cap, audio 3–30s / ≤15MB. Seedance: 720p ceiling on this template, 15s cap.
  • No multi-route blending. This skill picks one model per call. If the user wants HappyHorse animation + Wan-style lip-sync in the same clip, that's two calls + a stitch (out of scope here).
  • Brand-specific overrides — if the user named a specific model variant not listed (e.g. Wan 2.6, Seedance 1.5), route to the corresponding brand skill (`wan-2-7`, `seedance-v2`) instead of forcing it through here.

Exit codes


| Code | Meaning |
| --- | --- |
| 0 | success |
| 64 | bad CLI args |
| 65 | bad input JSON / schema mismatch |
| 69 | upstream 5xx |
| 75 | retryable: timeout / 429 |
| 77 | not signed in or token rejected |
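Since code 75 is the only exit marked retryable, a wrapper can retry on it alone and fail immediately on everything else. `run_with_retry` is a hypothetical helper with an illustrative three-attempt linear backoff:

```bash
# Retry a command only when it exits 75 (timeout / 429), up to three attempts.
run_with_retry() {
  local attempt rc
  for attempt in 1 2 3; do
    "$@"
    rc=$?
    [ "$rc" -ne 75 ] && return "$rc"   # success or a non-retryable failure
    sleep "$attempt"                   # simple linear backoff
  done
  return "$rc"
}

# Usage: run_with_retry runcomfy run <model_id> --input '…' --output-dir ./out
```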

How it works


The skill picks one of HappyHorse 1.0 I2V / Wan 2.7 t2v+audio / Seedance 2.0 Pro based on user intent and invokes `runcomfy run <model_id>` with the matching JSON body. The CLI POSTs to the Model API, polls the request, fetches the result, and downloads any `.runcomfy.net` / `.runcomfy.com` URL into `--output-dir`. Ctrl-C cancels the remote request before exit.

Security & Privacy


  • Token storage: `runcomfy login` writes the API token to `~/.config/runcomfy/token.json` with mode 0600 (owner-only read/write). Set the `RUNCOMFY_TOKEN` env var to bypass the file entirely in CI / containers.
  • Input boundary: the user prompt is passed as a JSON string to the CLI via `--input`. The CLI does NOT shell-expand the prompt; it transmits the JSON body directly to the Model API over HTTPS. No shell-injection surface from prompt content.
  • Third-party content: image / mask / video URLs you pass are fetched by the RunComfy model server, not by the CLI on your machine. Treat external URLs as untrusted; image-based prompt injection is a known risk for any image-edit / video-edit model.
  • Outbound endpoints: only `model-api.runcomfy.net` (request submission) and `*.runcomfy.net` / `*.runcomfy.com` (download whitelist for generated outputs). No telemetry, no callbacks.
  • Generated-file size cap: the CLI aborts any single download >2 GiB to prevent disk-fill from a malicious or runaway model output.
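On the input-boundary point: since the CLI takes the whole body via `--input`, building that JSON with a JSON-aware tool avoids escaping mistakes when prompts contain quotes. This sketch assumes `jq` is installed; the URL is a placeholder and the command is echoed as a dry-run:

```bash
# Let jq escape quotes/newlines in the prompt instead of hand-writing JSON.
PROMPT='Slow push-in on the mug labeled "Nova", steam rising, packaging unchanged.'
INPUT=$(jq -n --arg p "$PROMPT" --arg img "https://example.com/mug.jpg" \
  '{image_url: $img, prompt: $p}')

echo runcomfy run happyhorse/happyhorse-1-0/image-to-video \
  --input "$INPUT" \
  --output-dir ./out
```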