seedance-v2


Seedance 2.0 Pro — Pro Pack on RunComfy

ByteDance Seedance 2.0 Pro — multimodal cinematic video generator with native lip-synced audio — hosted on the RunComfy Model API.

```bash
npx skills add agentspace-so/runcomfy-skills --skill seedance-v2 -g
```

When to pick this model (vs siblings)

Seedance 2.0 Pro's distinct strength is multi-modal cinematic short-form video: combine character images, scene videos, and reference audio into one coherent shot. Pick it when fidelity to a reference identity or scene matters and you want native lip-sync.
| You want | Use |
| --- | --- |
| Lip-synced spokesperson / dialogue ad | Seedance 2.0 Pro |
| Multi-modal references (image + video + audio) | Seedance 2.0 Pro |
| Brand-consistent multi-language narrative | Seedance 2.0 Pro |
| Currently-#1 blind-vote video quality | HappyHorse 1.0 |
| Audio-driven lip-sync from your own track | Wan 2.7 (`audio_url`) |
| Motion editing on existing footage | Kling Video O1 |
| Ultra-fast iteration | LTX 2 |
If the user said "Seedance" / "Seedance 2" / "ByteDance video" explicitly, route here regardless.
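
That routing rule can be sketched as a simple keyword check. This is a hypothetical helper, not part of the skill; the non-Seedance model IDs (e.g. `ltx-2`) are illustrative placeholders.

```python
# Hypothetical routing sketch for the table above. Only the Seedance
# endpoint name is real; other model IDs are illustrative placeholders.
def route_model(request_text: str) -> str:
    """Pick a model endpoint from a free-text user request."""
    text = request_text.lower()
    # Explicit mention always routes here, per the rule above.
    if "seedance" in text or "bytedance video" in text:
        return "bytedance/seedance-v2/pro"
    # Otherwise fall back to crude keyword heuristics (illustrative only).
    if "lip-sync" in text or "spokesperson" in text:
        return "bytedance/seedance-v2/pro"
    if "fast" in text or "iterate" in text:
        return "ltx-2"  # placeholder ID for LTX 2
    return "bytedance/seedance-v2/pro"
```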

Prerequisites

  1. RunComfy CLI — `npm i -g @runcomfy/cli`
  2. RunComfy account — `runcomfy login` opens a browser device-code flow.
  3. CI / containers — set `RUNCOMFY_TOKEN=<token>` instead of running `runcomfy login`.
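
The env-var-over-login precedence can be sketched like this. This is an illustrative lookup, not the CLI's implementation; in particular the `"token"` JSON key inside `token.json` is an assumption.

```python
import json
import os
from pathlib import Path
from typing import Optional

# Illustrative token lookup mirroring the precedence above: the
# RUNCOMFY_TOKEN env var wins, else fall back to the login file.
# The "token" key in token.json is an assumed, undocumented format.
def resolve_token(config_dir: str = "~/.config/runcomfy") -> Optional[str]:
    env = os.environ.get("RUNCOMFY_TOKEN")
    if env:
        return env
    token_file = Path(config_dir).expanduser() / "token.json"
    if token_file.exists():
        return json.loads(token_file.read_text()).get("token")
    return None  # not logged in
```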

Endpoints + input schema

`bytedance/seedance-v2/pro`

| Field | Type | Required | Default | Notes |
| --- | --- | --- | --- | --- |
| `prompt` | string | yes | — | CN ≤ 500 chars or EN ≤ 1000 words. |
| `image_url` | array | no | `[]` | 0–9 references (JPEG/PNG/WebP/BMP/TIFF/GIF). |
| `video_url` | array | no | `[]` | 0–3 clips (MP4/MOV), 2–15s each. |
| `audio_url` | array | no | `[]` | 0–3 audio refs (WAV/MP3), 2–15s, < 15MB each. |
| `aspect_ratio` | enum | no | `adaptive` | `adaptive`, `16:9`, `9:16`, `4:3`, `3:4`, `1:1`, `21:9`. |
| `duration` | int | no | 5 | 4–15 (whole seconds). |
| `resolution` | enum | no | `720p` | `480p` or `720p`. |
| `generate_audio` | bool | no | true | In-pass synchronized speech / SFX / music. |
| `seed` | int | no | — | Reproducibility. |
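
Catching a schema violation locally avoids a round-trip that would end in exit code 65. The following is a minimal pre-flight sketch of the constraints in the table above, not the Model API's actual validation logic (the CN 500-char rule is omitted; only the EN word limit is checked).

```python
# Minimal pre-flight validator for the schema table above — a sketch of
# client-side checks, not the Model API's actual validation.
ASPECT_RATIOS = {"adaptive", "16:9", "9:16", "4:3", "3:4", "1:1", "21:9"}
RESOLUTIONS = {"480p", "720p"}

def validate_input(body: dict) -> list:
    """Return a list of schema violations (empty list == valid)."""
    errors = []
    prompt = body.get("prompt")
    if not prompt:
        errors.append("prompt is required")
    elif len(prompt.split()) > 1000:
        # CN <= 500 chars check omitted for brevity.
        errors.append("EN prompt exceeds 1000 words")
    if len(body.get("image_url", [])) > 9:
        errors.append("at most 9 image references")
    if len(body.get("video_url", [])) > 3:
        errors.append("at most 3 video references")
    if len(body.get("audio_url", [])) > 3:
        errors.append("at most 3 audio references")
    if body.get("aspect_ratio", "adaptive") not in ASPECT_RATIOS:
        errors.append("invalid aspect_ratio")
    duration = body.get("duration", 5)
    if not (isinstance(duration, int) and 4 <= duration <= 15):
        errors.append("duration must be a whole number of seconds, 4-15")
    if body.get("resolution", "720p") not in RESOLUTIONS:
        errors.append("resolution must be 480p or 720p")
    return errors
```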

How to invoke

Default (text only, 5s, 720p with audio):

```bash
runcomfy run bytedance/seedance-v2/pro \
  --input '{"prompt": "<user prompt>"}' \
  --output-dir <absolute/path>
```

Lip-synced ad with a character reference (the image keeps identity stable while the text evolves the narrative):

```bash
runcomfy run bytedance/seedance-v2/pro \
  --input '{
    "prompt": "Medium close-up. The woman explains today'\''s special in a warm friendly tone, slow push-in, soft window light, gentle cafe ambience.",
    "image_url": ["https://.../barista-headshot.jpg"],
    "duration": 8,
    "aspect_ratio": "9:16"
  }' \
  --output-dir <absolute/path>
```

Multi-modal (image + video + audio refs):

```bash
runcomfy run bytedance/seedance-v2/pro \
  --input '{
    "prompt": "Subject from image 1 walks through the café from video 1, voice tone matches audio 1.",
    "image_url": ["https://.../subject.jpg"],
    "video_url": ["https://.../cafe-locked-shot.mp4"],
    "audio_url": ["https://.../voice-ref.mp3"]
  }' \
  --output-dir <absolute/path>
```

The CLI submits the request, polls it, fetches the result, and downloads `*.runcomfy.net` / `*.runcomfy.com` URLs into `--output-dir`.
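
These invocations can also be scripted. The helper below is a hypothetical convenience that only assembles the argv list for `subprocess.run`; it is not part of the CLI or the skill.

```python
import json

# Hypothetical helper that assembles the CLI invocation shown above as an
# argv list suitable for subprocess.run; it does not execute anything.
def build_command(prompt: str, output_dir: str, **options) -> list:
    body = {"prompt": prompt, **options}
    return [
        "runcomfy", "run", "bytedance/seedance-v2/pro",
        "--input", json.dumps(body),
        "--output-dir", output_dir,
    ]
```

Passing the JSON body as a single argv element (rather than interpolating it into a shell string) sidesteps quoting problems with apostrophes in prompts.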

Prompting — what actually works

Image vs text division. This is the single most important rule. Stable identity (face, costume, brand mark, logo) goes in `image_url`; evolving narrative (action, mood, lighting, camera) goes in `prompt`. Trying to describe a face verbally in detail wastes tokens and produces drift.
Camera + motion in plain language. "Medium close-up", "slow push-in", "handheld follow", and "locked-off wide" all work as directives. Combine them: "Medium close-up. Slow push-in over 3 seconds. Handheld, slight breathing motion."
Audio direction with `generate_audio: true` — state the tone: "warm friendly conversational", "calm instructional", "crisp newsroom delivery". For ambience: "gentle cafe chatter, distant traffic, no foreground music".
Reference media specs — videos must be 2–15s; audio must be ≤ 15MB and 2–15s. Out-of-range files are rejected. Match the aspect ratio of refs to your output to avoid crops.
Anti-patterns:
  • Mixing radically different aesthetic refs (watercolor + photoreal) → confuses the model.
  • Conflicting style cues in the prompt → simplify by removing contradictions.
  • Describing stable identity verbally → use `image_url` instead.
  • Asking for >15s clips → 422; segment into multiple calls.
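
The last anti-pattern suggests segmenting long content into several calls. A greedy split into valid clip lengths can be sketched as below, under the assumption that any whole-second 4–15s segmentation is acceptable for your edit.

```python
# Sketch of the segmentation suggested above: split a target runtime
# into clip lengths the endpoint accepts (whole seconds, 4-15 each).
def split_duration(total_seconds: int, max_len: int = 15, min_len: int = 4) -> list:
    if total_seconds < min_len:
        raise ValueError(f"cannot make a clip shorter than {min_len}s")
    segments = []
    remaining = total_seconds
    while remaining > max_len:
        # Take a full segment, but leave at least min_len for the tail.
        take = max_len if remaining - max_len >= min_len else remaining - min_len
        segments.append(take)
        remaining -= take
    segments.append(remaining)
    return segments
```

Each returned length is a valid `duration` value, and the lengths sum to the requested total.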

Where it shines

| Use case | Why Seedance 2.0 Pro |
| --- | --- |
| Spokesperson / dialogue ads | Native in-pass lip-sync, no separate TTS step |
| Brand-consistent multi-language narratives | Image refs hold identity; text drives translation |
| Cinematic short-form film previs | Camera-shot grammar + multi-modal refs |
| Ad creatives with reference music / VO tone | Audio refs guide voice / mood without locking lip-sync |
| Reproducible variant testing | Seed control + fixed schema |

Sample prompts (verified to produce strong results)

Default playground example:

```text
Golden hour on a quiet cafe terrace: a barista wipes the counter, then
looks up and explains today's special in a friendly tone, natural
lip-sync. Medium close-up, slow push-in; warm side light, soft bokeh
through glass, gentle cafe ambience and subtle film grain.
```

Multi-modal lip-sync (text + image):

```text
Same person as image 1 in a softly-lit recording booth, leaning into
the mic, says: "We just shipped the biggest update of the year."
Calm conversational tone. Medium close-up, locked tripod, shallow DOF,
warm key light from camera-left.
```

Limitations

  • Duration 4–15s — no longer clips on this endpoint.
  • Resolution ceiling 720p on the playground variant.
  • Reference media specs — videos / audio must be 2–15s; audio < 15MB.
  • Lip-sync quality depends on prompt clarity; not guaranteed perfect under all conditions.
  • No `@`-syntax for character binding — relies on image refs + prompt alignment.

Exit codes

| code | meaning |
| --- | --- |
| 0 | success |
| 64 | bad CLI args |
| 65 | bad input JSON / schema mismatch |
| 69 | upstream 5xx |
| 75 | retryable: timeout / 429 |
| 77 | not signed in or token rejected |
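
A wrapper script might react to these codes as sketched below: retry only code 75, surface a re-auth hint on 77, and fail fast on everything else. The backoff policy (doubling delay, three attempts) is an assumption, not CLI behavior.

```python
import time

RETRYABLE = {75}        # timeout / 429
AUTH_FAILURE = {77}     # not signed in or token rejected

# Sketch of exit-code handling for the table above; the backoff policy
# is an assumption, not something the CLI does itself.
def run_with_retry(invoke, max_attempts: int = 3, base_delay: float = 1.0) -> int:
    """invoke() runs the CLI once and returns its exit code."""
    delay = base_delay
    code = None
    for attempt in range(1, max_attempts + 1):
        code = invoke()
        if code not in RETRYABLE or attempt == max_attempts:
            if code in AUTH_FAILURE:
                print("run `runcomfy login` or set RUNCOMFY_TOKEN")
            return code
        time.sleep(delay)
        delay *= 2
    return code
```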

How it works

The skill invokes `runcomfy run bytedance/seedance-v2/pro` with a JSON body matching the schema. The CLI POSTs to `https://model-api.runcomfy.net/v1/models/bytedance/seedance-v2/pro`, polls the request, fetches the result, and downloads any `.runcomfy.net` / `.runcomfy.com` URL into `--output-dir`. Ctrl-C cancels the remote request before exit.
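
The submit → poll → fetch flow reduces to a generic polling loop, sketched below. The status strings ("done", "failed") are illustrative assumptions, not the Model API's actual values.

```python
import time

# Generic sketch of the poll step in the flow described above. The
# terminal status names are illustrative assumptions.
def poll_until_done(check_status, interval: float = 2.0, timeout: float = 600.0) -> str:
    """check_status() returns the current job status string."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        status = check_status()
        if status in ("done", "failed"):
            return status
        time.sleep(interval)
    raise TimeoutError("job did not finish before the polling deadline")
```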

Security & Privacy

  • Token storage: `runcomfy login` writes the API token to `~/.config/runcomfy/token.json` with mode 0600 (owner-only read/write). Set the `RUNCOMFY_TOKEN` env var to bypass the file entirely in CI / containers.
  • Input boundary: the user prompt is passed as a JSON string to the CLI via `--input`. The CLI does NOT shell-expand the prompt; it transmits the JSON body directly to the Model API over HTTPS. No shell-injection surface from prompt content.
  • Third-party content: image / mask / video URLs you pass are fetched by the RunComfy model server, not by the CLI on your machine. Treat external URLs as untrusted; image-based prompt injection is a known risk for any image-edit / video-edit model.
  • Outbound endpoints: only `model-api.runcomfy.net` (request submission) and `*.runcomfy.net` / `*.runcomfy.com` (download whitelist for generated outputs). No telemetry, no callbacks.
  • Generated-file size cap: the CLI aborts any single download > 2 GiB to prevent disk-fill from a malicious or runaway model output.
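
The download whitelist and size cap amount to a check like the sketch below (the CLI's exact implementation may differ). Note it matches the parsed hostname's suffix, so a look-alike host such as `runcomfy.net.evil.com` is rejected.

```python
from urllib.parse import urlparse

ALLOWED_SUFFIXES = (".runcomfy.net", ".runcomfy.com")
MAX_DOWNLOAD_BYTES = 2 * 1024**3  # the 2 GiB cap from the last bullet

# Sketch of the whitelist + size-cap checks described above. Parsing the
# hostname (instead of substring-matching the URL) defeats look-alikes.
def download_allowed(url: str, content_length: int = 0) -> bool:
    host = urlparse(url).hostname or ""
    on_whitelist = (
        host.endswith(ALLOWED_SUFFIXES)
        or host in ("runcomfy.net", "runcomfy.com")
    )
    return on_whitelist and content_length <= MAX_DOWNLOAD_BYTES
```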