language-swap

/pika:language-swap

Translate and dub a video into another language while preserving the original speaker's voice. Pipeline: dub (one worker call) → lipsync (default ON) → burn target-language captions or bilingual captions.

The dubbing worker does the heavy lifting in a single call: it transcribes, translates, preserves each speaker's voice server-side (no separate clone step), and returns a fully A/V-synced video — so there is no manual transcribe/clone/TTS/replace chain to manage and no duration-drift handling to do by hand.

Segmented / multi-language dub (per-range languages)

Use this when the user wants different languages on different parts of one video (e.g. first half Spanish, second half Japanese), or wants to translate only some sections and keep the rest in the original voice. Both are the same thing: a timeline of segments, each tagged with a language; any uncovered range keeps the original audio.

`mcp__plugin_pika_pika__dub_video` takes a `segments` plan instead of `target_language` (pass exactly one; they are mutually exclusive):

```
mcp__plugin_pika_pika__dub_video(source_video_url=<video_url>, segments=[
  {start_s: 0,  end_s: 30, target_language: "es"},
  {start_s: 30, end_s: 60, target_language: "ja"}
])
```

How to build the plan: the user needs to know where the content is before picking ranges, so transcribe first. Extract the audio with `mcp__claude_ai_pika__extract_audio_from_video`, then call `mcp__claude_ai_pika__transcribe_audio(audio=<audio_url>, timestamps=true)`, show the user the timestamped segments, and let them say which time range goes to which language. Then assemble `segments[]` (seconds, ordered, non-overlapping) and make ONE `dub_video` call. There is no separate "video understanding" tool; the timestamped transcript is the understanding step.
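
A minimal Python sketch of checking an assembled plan before the single `dub_video` call. It enforces the constraints stated above (times in seconds, ordered, non-overlapping) plus two assumptions: each range must have positive length, and, when the source duration is known, no range may run past it. `validate_segments` and `Segment` are hypothetical helpers, not part of the Pika tools.

```python
from typing import List, Optional, TypedDict


class Segment(TypedDict):
    """One entry of the dub_video `segments` plan (times in seconds)."""
    start_s: float
    end_s: float
    target_language: str


def validate_segments(segments: List[Segment],
                      duration_s: Optional[float] = None) -> List[str]:
    """Return human-readable problems; an empty list means the plan looks valid."""
    problems: List[str] = []
    prev_end: Optional[float] = None
    for i, seg in enumerate(segments):
        start, end = seg["start_s"], seg["end_s"]
        # Assumption: every range must have positive length.
        if start < 0 or end <= start:
            problems.append(f"segment {i}: needs 0 <= start_s < end_s")
        if not seg["target_language"]:
            problems.append(f"segment {i}: missing target_language")
        # Ordered + non-overlapping: each range starts at or after the previous end.
        if prev_end is not None and start < prev_end:
            problems.append(f"segment {i}: starts before segment {i - 1} ends")
        # Assumption: ranges must stay inside the source when its length is known.
        if duration_s is not None and end > duration_s:
            problems.append(f"segment {i}: ends after the video ({duration_s} s)")
        prev_end = end
    return problems


plan: List[Segment] = [
    {"start_s": 0, "end_s": 30, "target_language": "es"},
    {"start_s": 30, "end_s": 60, "target_language": "ja"},
]
print(validate_segments(plan, duration_s=60))  # → []
```

An empty list means the plan can be sent as-is; otherwise show the problems to the user and adjust the ranges before calling `dub_video`.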

Behavior of the segmented path:

Shared voice across all segments. The source speaker is cloned once, every segment in every language is spoken in that same cloned voice, and the clone is then recycled, all inside the one `dub_video` call. You never clone or delete a voice yourself.
Keep-original. Any time range NOT covered by a segment plays the original audio (voice + background) untouched. To translate only parts of a video, list only the parts you want translated.
Length-locked. Output stays exactly the source length (each dubbed range is speed-fit to its window), so boundaries line up with the original timeline.
Provider. Mixed-language-per-range always uses the voice-cloning route automatically; the single-call whole-video dubbing route can't mix languages per range, so don't force a single-call provider for a segmented plan. Every covered language must be supported on the voice-cloning route. If one isn't, surface the error and consult `references/language-coverage.md`.
Result. The result has the same shape as a normal dub, except that `target_language` echoes the covered languages comma-joined (e.g. `"spa,jpn"`) and no single `transcript_language` is returned (the track is multi-language). Lipsync (Step 2, default ON, ≤5 min) still runs on the whole dubbed video. For captions (Step 3), pass the returned multi-language `subtitles[]` with `caption_mode="manual"`; auto re-transcription can't pick a single language for a mixed track.

If `mcp__plugin_pika_pika__dub_video` rejects `segments` (an older deployment without segmented support), fall back to dubbing each range as a single-language dub and concatenating the results. Prefer the one-call segmented path whenever it is available.
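
For the older-deployment fallback, the timeline has to be cut into consecutive pieces so the per-range outputs can be concatenated back in order. The sketch below only computes that cut list. It assumes you have some way to cut clips and concatenate them, which this skill does not name, and `fallback_pieces` / `Piece` are hypothetical. Pieces with `target_language=None` are the uncovered ranges that keep the original audio.

```python
from typing import Dict, List, NamedTuple, Optional


class Piece(NamedTuple):
    start_s: float
    end_s: float
    target_language: Optional[str]  # None = keep the original audio


def fallback_pieces(segments: List[Dict], duration_s: float) -> List[Piece]:
    """Cut the timeline into contiguous pieces for per-range dubbing.

    Assumes `segments` is non-overlapping. Each piece with a language would be
    dubbed on its own with a single-language dub_video call; None pieces keep
    the source audio. Concatenating the outputs in order rebuilds the timeline.
    """
    pieces: List[Piece] = []
    cursor = 0.0
    for seg in sorted(segments, key=lambda s: s["start_s"]):
        start, end = float(seg["start_s"]), float(seg["end_s"])
        if start > cursor:  # uncovered gap before this segment
            pieces.append(Piece(cursor, start, None))
        pieces.append(Piece(start, end, seg["target_language"]))
        cursor = end
    if cursor < duration_s:  # uncovered tail
        pieces.append(Piece(cursor, float(duration_s), None))
    return pieces


for piece in fallback_pieces([{"start_s": 10, "end_s": 20, "target_language": "es"}], 30):
    print(piece)
```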

Behavior defaults

Target language: required via `--to <language>`. Prefer language codes: `es`, `fr`, `ja`, `de`, `pt-BR`, `zh-Hans`. The dubbing worker accepts ISO/BCP-47-like tags and normalizes script/region subtags before calling ElevenLabs (for example, `zh-Hans` → `zh` and `zh-Hant-TW` → `zh`).
Lipsync: ON by default. It re-matches the speaker's mouth to the translated audio (fal sync-lipsync: the full-video lip-matcher, distinct from the portrait-image animator). Pass `--no-lipsync` to skip it when the source has no on-camera face or to avoid its significant cost (~$4/min on the sync-2-pro tier). Lipsync applies only to videos ≤5 min: `edit_lipsync` hard-caps at 300 s upstream, so longer sources skip lipsync automatically (see Step 2). The dub itself has no length limit.
BGM / background music: kept by default; the dub lays the translated voice over the original music/SFX bed. Pass `--no-bgm` for a translate-only output: the worker drops the original music and keeps only the translated speech (`drop_background_audio=true`).
Captions: target-language captions are burned in by default. When the user asks for bilingual / dual subtitles, burn the target-language (translated) row on top and the source-language (original) row below it. After dubbing, the translated speech is what is actually being said, so it is the primary row; the original is the secondary reference.
Bilingual captions: enable when the user passes `--bilingual-subtitles` or asks for "bilingual subtitles", "dual subtitles", "two-language captions", "original + translated subtitles", "双语字幕" ("bilingual subtitles"), or "原文+译文字幕" ("original + translated subtitles").
Language coverage: if language support is questioned or a language-related upstream error occurs, consult `references/language-coverage.md`. Do not proactively surface provider-specific language-list details in normal user replies.
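
The tag normalization described under Target language can be illustrated with a sketch that keeps only the primary language subtag. Only the `zh-Hans` → `zh` and `zh-Hant-TW` → `zh` mappings come from this document. Applying the same rule to every tag (so `pt-BR` → `pt`) and accepting `_` as a separator are assumptions, and the worker's real rules may keep some region subtags.

```python
def normalize_language_tag(tag: str) -> str:
    """Reduce an ISO/BCP-47-like tag to its primary language subtag.

    Sketch only: the doc confirms zh-Hans -> zh and zh-Hant-TW -> zh;
    dropping region subtags in general (pt-BR -> pt) is an assumption.
    """
    cleaned = tag.strip().replace("_", "-")  # assumption: tolerate pt_BR style
    return cleaned.split("-", 1)[0].lower()


for example in ("zh-Hans", "zh-Hant-TW", "es", "pt-BR"):
    print(example, "->", normalize_language_tag(example))
```

Free-text targets (for example "Japanese") are outside this sketch; it only covers tag-shaped input.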

State variables produced and consumed

`video_url`: input, from the positional arg
`source_input_url`: the original positional URL, preserved for diagnostics if `video_url` is rehosted
`target_language`: text, from `--to <language>`
`with_lipsync`: boolean; defaults to true, false only when `--no-lipsync` is passed
`no_bgm`: boolean; true when `--no-bgm` is passed (maps to `drop_background_audio=true`)
`bilingual_subtitles`: boolean; true when the user asks for bilingual / dual subtitles
`dubbed_video_url`: dubbed, A/V-synced video, produced by Step 1
`duration_seconds`: length of the dubbed video, returned by Step 1 and consumed by Step 2 (5-minute cap check and cost estimate)
`dub_subtitles`: optional target-language timed subtitles from the dub result, consumed by Step 3
`source_subtitles`: optional source-language timed subtitles from the dub result, consumed by Step 3 for bilingual captions
`dub_transcript_srt`: optional target-language SRT from the dub result, returned for review/debugging
`source_transcript_srt`: optional source-language SRT from the dub result, returned for review/debugging
`source_transcript_language`: optional source-language code from the dub result
`lipsynced_video_url`: dubbed video with the mouth re-matched, produced by Step 2 (when lipsync runs)
`caption_target_video_url`: final visual video URL before captions are burned (`lipsynced_video_url` when lipsync ran, otherwise `dubbed_video_url`, or the composed Double video in a comparison flow; see Step 3)
`final_video_url`: video with target-language (or bilingual) captions burned in, produced by Step 3
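
A hypothetical way to hold a subset of these variables in code. The skill prescribes no data structure, so `SwapState` and its fields are illustrative. The `caption_target_video_url` property applies the Step 3 rule (the lip-synced video if Step 2 produced one, else the dub). It does not model the Double video case, where the composed video URL is used instead.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SwapState:
    """Hypothetical container for the language-swap state variables."""
    video_url: str
    target_language: str
    source_input_url: Optional[str] = None
    with_lipsync: bool = True
    no_bgm: bool = False
    bilingual_subtitles: bool = False
    dubbed_video_url: Optional[str] = None
    duration_seconds: Optional[float] = None
    dub_subtitles: List[dict] = field(default_factory=list)
    source_subtitles: List[dict] = field(default_factory=list)
    lipsynced_video_url: Optional[str] = None
    final_video_url: Optional[str] = None

    @property
    def caption_target_video_url(self) -> str:
        """Video Step 3 captions: the lip-synced cut if Step 2 ran, else the dub."""
        if self.lipsynced_video_url:
            return self.lipsynced_video_url
        if self.dubbed_video_url:
            return self.dubbed_video_url
        raise ValueError("Step 1 has not produced dubbed_video_url yet")


state = SwapState(video_url="https://example.com/talk.mp4", target_language="es")
state.dubbed_video_url = "https://cdn.example.com/dubbed.mp4"
print(state.caption_target_video_url)  # → https://cdn.example.com/dubbed.mp4
```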

Step 0 — Parse input

Required:

Positional `video_url`: MUST be `https://...`
`--to <language>`: target language (free text or BCP-47 code)

Optional:

`--no-lipsync`: skip the default mouth-matching step.
`--no-bgm`: translate-only output; drop the original music/SFX bed.
`--bilingual-subtitles`: burn source-language + target-language subtitle rows.

Infer `bilingual_subtitles=true` from the user's wording even if the explicit flag is absent.

If `--to` is missing, STOP and prompt the user, UNLESS the user wants different languages on different parts or wants to translate only some sections. That is the per-range segmented path (see "Segmented / multi-language dub" above), which uses a `segments` plan instead of `--to`.

For the segmented path, first build the time-range plan: extract the audio with `mcp__claude_ai_pika__extract_audio_from_video`, transcribe it with timestamps via `mcp__claude_ai_pika__transcribe_audio(audio=<audio_url>, timestamps=true)`, show the user the timestamped segments, and record which time range maps to which language in `segments[]`.

Outputs: `video_url`, `target_language`, `with_lipsync` (default true), `no_bgm` (default false), `bilingual_subtitles` (default false).
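
Step 0 can be sketched with Python's `argparse`, assuming the slash-command arguments arrive as an argv-style list. The flag names come from this section, and the phrase list is copied from Behavior defaults. Plain substring matching is a simplification of "infer from user wording", and `segmented` is a hypothetical switch for the per-range path, where `--to` may be absent.

```python
import argparse
from typing import List

# Trigger phrases listed under "Behavior defaults" (Chinese ones kept verbatim).
BILINGUAL_PHRASES = (
    "bilingual subtitles", "dual subtitles", "two-language captions",
    "original + translated subtitles", "双语字幕", "原文+译文字幕",
)


def parse_language_swap(argv: List[str], user_text: str = "",
                        segmented: bool = False) -> argparse.Namespace:
    """Parse /pika:language-swap arguments (sketch; `segmented` is hypothetical)."""
    parser = argparse.ArgumentParser(prog="language-swap")
    parser.add_argument("video_url")
    parser.add_argument("--to", dest="target_language")
    # store_false: with_lipsync defaults to True, --no-lipsync turns it off.
    parser.add_argument("--no-lipsync", dest="with_lipsync", action="store_false")
    parser.add_argument("--no-bgm", dest="no_bgm", action="store_true")
    parser.add_argument("--bilingual-subtitles", dest="bilingual_subtitles",
                        action="store_true")
    args = parser.parse_args(argv)

    if not args.video_url.startswith("https://"):
        raise ValueError("video_url must be an https:// URL")
    # Without --to, stop and ask, unless this is the per-range segmented path.
    if args.target_language is None and not segmented:
        raise ValueError("missing --to <language>; ask the user for a target language")
    # Infer bilingual captions from wording even without the explicit flag.
    if any(phrase in user_text.lower() for phrase in BILINGUAL_PHRASES):
        args.bilingual_subtitles = True
    return args


args = parse_language_swap(["https://example.com/talk.mp4", "--to", "ja", "--no-bgm"])
print(args.target_language, args.with_lipsync, args.no_bgm)  # → ja True True
```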

Step 1 — Dub the video (state: `dubbed_video_url`)

Call `mcp__plugin_pika_pika__dub_video` with:

`source_video_url`: `<video_url>`
`target_language`: `<target_language>` (ISO/BCP-47-like tag, e.g. `es`, `pt-BR`, `zh-Hans`)
`source_language`: `"auto"`
`drop_background_audio`: `true` only when `no_bgm` is set; otherwise omit it (keeps the original music bed)

In Claude plugin installs the tool is exposed as `mcp__plugin_pika_pika__dub_video`. If your host exposes the same Pika server under a different local namespace, call that fully qualified local tool with the same arguments. The Claude.ai connector surface may lag behind this plugin-only tool, so do not assume the connector prefix has it.

`mcp__plugin_pika_pika__dub_video` is worker-backed: if the response comes back as `{task_id, status}`, poll `mcp__plugin_pika_pika__task_status` until `completed`, then read the dubbed video from the result (`video_url` for a video source; `audio_url` for an audio source). Also capture the optional `subtitles[]`, `transcript_srt`, and `transcript_language` (kept as `dub_subtitles` and `dub_transcript_srt`): target-language transcript metadata produced by the dub worker and consumed in Step 3. Keep the result's `duration_seconds` too; Step 2 uses it.

For bilingual captions, also capture the optional `source_subtitles[]`, `source_transcript_srt`, and `source_transcript_language`. These source-language transcript fields are best-effort; the dubbed media is still valid when they are absent.
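
The call-and-poll flow can be sketched as follows. `call_tool(name, args)` stands in for however your host invokes MCP tools and is hypothetical. So are the `task_id` argument name for `task_status` and the assumption that a completed task nests its payload under `result` (the sketch falls back to the top level if it does not). The `failed` status check mirrors the `status: failed` case in Failure modes.

```python
import time
from typing import Any, Callable, Dict

# Hypothetical host binding: call_tool(tool_name, arguments) -> response dict.
ToolCaller = Callable[[str, Dict[str, Any]], Dict[str, Any]]

DUB_TOOL = "mcp__plugin_pika_pika__dub_video"
STATUS_TOOL = "mcp__plugin_pika_pika__task_status"


def run_dub(call_tool: ToolCaller, video_url: str, target_language: str,
            no_bgm: bool = False, poll_s: float = 5.0,
            max_polls: int = 360) -> Dict[str, Any]:
    """Step 1 sketch: call dub_video, poll if worker-backed, collect outputs."""
    args: Dict[str, Any] = {
        "source_video_url": video_url,
        "target_language": target_language,
        "source_language": "auto",
    }
    if no_bgm:
        args["drop_background_audio"] = True  # omitted otherwise: keeps the music bed

    resp = call_tool(DUB_TOOL, args)
    polls = 0
    # A {task_id, status} reply means the work runs in the background.
    while "task_id" in resp and resp.get("status") != "completed":
        if resp.get("status") == "failed":
            raise RuntimeError(f"dub failed: {resp}")
        if polls >= max_polls:
            raise TimeoutError("dub task did not complete in time")
        time.sleep(poll_s)
        polls += 1
        resp = call_tool(STATUS_TOOL, {"task_id": resp["task_id"]})  # arg name assumed

    payload = resp.get("result", resp)  # assumed nesting of the finished result
    return {
        # video_url for a video source, audio_url for an audio source
        "dubbed_video_url": payload.get("video_url") or payload.get("audio_url"),
        "duration_seconds": payload.get("duration_seconds"),
        "dub_subtitles": payload.get("subtitles") or [],
        "dub_transcript_srt": payload.get("transcript_srt"),
        "source_subtitles": payload.get("source_subtitles") or [],
        "source_transcript_srt": payload.get("source_transcript_srt"),
        "source_transcript_language": payload.get("source_transcript_language"),
    }
```

Injecting `call_tool` keeps the sketch host-agnostic: a real host would wrap its MCP client, and a test can pass a stub that returns canned responses.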

Source not worker-fetchable: if `mcp__plugin_pika_pika__dub_video` fails because the source URL cannot be fetched (especially HTTP `4xx`, hotlink protection, UA-gated hosts such as Wikimedia/news CDNs, or "Access Denied" errors), do not keep retrying the same call. Rehost first:

Download the source bytes in the client/host environment using a normal browser/download path or an HTTP client with a real user-agent.
Call `mcp__claude_ai_pika__upload_asset` with the downloaded filename, MIME type, and exact byte size, then upload the bytes to the returned presigned URL.
Set `source_input_url = <original URL>` and replace `video_url` with the returned Pika CDN `public_url`. Do not construct CDN URLs manually.
Retry Step 1 once against the Pika CDN URL. All later steps must use the updated `video_url`.

If the client/host also cannot download the source bytes, stop and tell the user the host blocks direct fetch; ask them to upload the file or provide a different URL.
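
The rehost decision above can be expressed as a small heuristic. The patterns are assumptions derived from the error kinds this section lists (HTTP 4xx, hotlink protection, "Access Denied"). UA-gating has no fixed message and is not detected here, so treat a miss as "surface the error" rather than as proof that the source is fetchable. Returning `surface_error` after the single rehosted retry also fails is likewise an assumption. The local-download failure case (ask the user to upload) happens before this check and is not modeled.

```python
import re

# Heuristic patterns for the "source not worker-fetchable" failures listed above.
FETCH_FAILURE_PATTERNS = (
    re.compile(r"\b4\d\d\b"),                      # HTTP 4xx such as 403
    re.compile(r"access denied", re.IGNORECASE),
    re.compile(r"hotlink", re.IGNORECASE),
)


def is_fetch_failure(error_message: str) -> bool:
    return any(p.search(error_message) for p in FETCH_FAILURE_PATTERNS)


def next_action(error_message: str, already_rehosted: bool) -> str:
    """'rehost' = download locally, upload_asset, retry Step 1 once."""
    if is_fetch_failure(error_message) and not already_rehosted:
        return "rehost"
    return "surface_error"  # other dub errors, or the single retry also failed


print(next_action("HTTP 403 Forbidden", already_rehosted=False))  # → rehost
```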

Outputs: `dubbed_video_url`, `duration_seconds`, `dub_subtitles`, `source_subtitles`, `dub_transcript_srt`, `source_transcript_srt`, `source_transcript_language`.

Step 2 — Lipsync (state: `lipsynced_video_url`)

Default ON. Skip entirely when `--no-lipsync` is passed (Step 3 then captions `dubbed_video_url` directly).

Hard 5-minute cap: check the duration before calling. `mcp__claude_ai_pika__edit_lipsync` enforces a 300-second (5-minute) audio limit upstream (sync.so) and rejects anything longer with `invalid_input` before billing. Every `variant` tier shares the same cap, so falling back through tiers does NOT help. If the dubbed video's `duration_seconds` (returned by Step 1) is > 300, skip lipsync entirely, go straight to Step 3 captioning `dubbed_video_url`, and tell the user lipsync isn't available past 5 minutes (the dub itself works at any length). Only run the lipsync call below when `duration_seconds ≤ 300`.

Cost heads-up first. Lipsync is the dominant cost (~$4/min on the v2-pro tier). Before calling it, estimate the cost from the dubbed video's `duration_seconds` (returned by Step 1) as `ceil(duration_seconds / 60) × $4`, and send the user a one-line heads-up, e.g. "Lipsync on: ~2 min video, est. ~$8 (pass `--no-lipsync` to skip). Starting now." Then proceed straight into the call; this is a heads-up, not an approval gate.
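
The cap check and the estimate above fit in one small function. This is a sketch: `lipsync_heads_up` is a hypothetical helper, and the $4 rate is the approximate v2-pro figure quoted in this step, not a live price. `None` means "skip lipsync". For the over-cap case, remember to tell the user why.

```python
import math
from typing import Optional

LIPSYNC_CAP_S = 300            # sync.so limit shared by every variant tier
USD_PER_STARTED_MINUTE = 4     # ~$4/min on the default v2-pro tier


def lipsync_heads_up(duration_seconds: float, with_lipsync: bool = True) -> Optional[str]:
    """Return the one-line cost heads-up, or None when lipsync should be skipped."""
    if not with_lipsync or duration_seconds > LIPSYNC_CAP_S:
        return None  # --no-lipsync, or past the 5-minute cap: caption the dub directly
    minutes = math.ceil(duration_seconds / 60)  # round partial minutes up
    cost = minutes * USD_PER_STARTED_MINUTE
    return (f"Lipsync on: ~{minutes} min video, est. ~${cost} "
            "(pass --no-lipsync to skip). Starting now.")


print(lipsync_heads_up(95))
# → Lipsync on: ~2 min video, est. ~$8 (pass --no-lipsync to skip). Starting now.
```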

Call `mcp__claude_ai_pika__edit_lipsync(video_url=<dubbed_video_url>)` with no `audio_url`: the worker syncs to the dubbed video's own embedded translated audio. Do not extract the audio just to feed it back in. (`variant` defaults to `v2-pro`, with `sync-3` and `v2` as fallbacks.)
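
The tier fallback (v2-pro → sync-3 → v2, as listed in Failure modes) can be sketched with the same kind of hypothetical `call_tool(name, args)`, assumed here to raise on tool errors. Treating any `invalid_input` as the 300 s cap and stopping immediately is an assumption based on this step's note that all tiers share that cap. `audio_url` is deliberately not passed.

```python
from typing import Any, Callable, Dict, Optional

LIPSYNC_TOOL = "mcp__claude_ai_pika__edit_lipsync"
VARIANTS = ("v2-pro", "sync-3", "v2")  # default tier first, then the fallbacks


def run_lipsync(call_tool: Callable[[str, Dict[str, Any]], Dict[str, Any]],
                dubbed_video_url: str) -> Optional[str]:
    """Return lipsynced_video_url, or None to keep the un-lip-synced dub.

    Assumes call_tool raises an exception whose text includes the error code.
    """
    for variant in VARIANTS:
        try:
            # No audio_url: the worker syncs to the dub's embedded audio track.
            resp = call_tool(LIPSYNC_TOOL, {"video_url": dubbed_video_url,
                                            "variant": variant})
        except Exception as err:
            if "invalid_input" in str(err):
                return None  # 300 s cap applies to every tier; don't retry
            continue  # e.g. no clear face track or provider 4xx: try next tier
        return resp["url"]
    return None  # all tiers failed: return the dub and tell the user
```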

Outputs: `lipsynced_video_url` (read from the response's `url`). When this step runs, Step 3 captions this video, not `dubbed_video_url`; otherwise the lip-matching is lost.

Step 3 — Burn target-language captions (state: `final_video_url`)

Caption the final video so the output carries readable subtitles (matching the common "translate + subtitle" expectation). Set `caption_target_video_url` to `lipsynced_video_url` when lipsync ran (the default), or to `dubbed_video_url` when lipsync was skipped (`--no-lipsync` or the 5-minute cap).

If this request is part of a Double video / split-screen comparison flow, build the Double video first and set `caption_target_video_url` to the final composed video URL. Do not burn captions onto only one panel before the Double video is composed; the bilingual caption burn should happen once, on the final visual output.

Call `mcp__claude_ai_pika__add_captions` once on `caption_target_video_url`.

When `bilingual_subtitles=true`, use manual bilingual mode if both tracks are available: call `mcp__claude_ai_pika__add_captions(video_url=<caption_target_video_url>, caption_mode="manual", subtitles=<dub_subtitles>, secondary_subtitles=<source_subtitles>, language=<target_language>, secondary_language=<source_transcript_language if available>, secondary_subtitles_position="below", style="branded-space-mono", position="bottom")`. The target-language (translated) row is the primary `subtitles` and renders on top; the source-language (original) row is the secondary reference and renders below it (`secondary_subtitles_position="below"`). After dubbing, the translated speech is what is actually spoken, so it leads. This works for every dub worker provider branch as long as `mcp__plugin_pika_pika__dub_video` returns both subtitle tracks.

bilingual_subtitles=true

but

source_subtitles

is missing, fall back to target-language captions only and tell the user the source transcript was unavailable from the dubbing provider. Do not invent a source-language row by retranscribing the final dubbed audio; that audio is already in the target language.

For target-language-only captions, prefer the target-language subtitles the dub worker already returned: if `dub_subtitles` is non-empty, call `mcp__claude_ai_pika__add_captions(video_url=<caption_target_video_url>, caption_mode="manual", subtitles=<dub_subtitles>, language=<target_language>, style="branded-space-mono", position="bottom")`. Manual mode skips a duplicate transcription pass and preserves the dubbing provider's target-language text.

If `dub_subtitles` is missing, empty, or rejected by `mcp__claude_ai_pika__add_captions`, fall back to auto: call `mcp__claude_ai_pika__add_captions(video_url=<caption_target_video_url>, caption_mode="auto", language=<target_language>, style="branded-space-mono", position="bottom")`. Auto mode re-transcribes the dubbed audio, so use it only as the fallback: it costs extra time and can introduce drift in CJK text and proper nouns.

Use `style="branded-space-mono"` unless the user asks for a punchier style (`tiktok`, `hormozi`, `karaoke`). Skip this step only if the user explicitly asked for audio-only dubbing with no captions.
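
The caption-mode choice in this step can be summarized as an ordered list of `add_captions` argument sets, tried until one is accepted. This is a sketch. The parameter names are the ones shown above, but `caption_attempts` is hypothetical, and retrying target-only manual captions after a rejected bilingual call is an assumption (this step only specifies the fallback for a missing source track). When the bilingual set is dropped because `source_subtitles` is empty, still tell the user, as described above.

```python
from typing import Any, Dict, List, Optional


def caption_attempts(video_url: str, target_language: str,
                     dub_subtitles: List[dict], source_subtitles: List[dict],
                     bilingual: bool, source_language: Optional[str] = None,
                     style: str = "branded-space-mono") -> List[Dict[str, Any]]:
    """Ordered add_captions argument sets; stop at the first one that succeeds."""
    base: Dict[str, Any] = {"video_url": video_url, "language": target_language,
                            "style": style, "position": "bottom"}
    attempts: List[Dict[str, Any]] = []
    if dub_subtitles:
        manual = dict(base, caption_mode="manual", subtitles=dub_subtitles)
        if bilingual and source_subtitles:
            # Translated row on top (primary), original row below (secondary).
            both = dict(manual, secondary_subtitles=source_subtitles,
                        secondary_subtitles_position="below")
            if source_language:
                both["secondary_language"] = source_language
            attempts.append(both)
        attempts.append(manual)  # target-language-only manual captions
    # Last resort: re-transcribe the dubbed audio in the target language.
    attempts.append(dict(base, caption_mode="auto"))
    return attempts
```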

Outputs: `final_video_url` (read from the response's `url`).

Step 4 — Return

Reply with `final_video_url` plus the translated transcript (from `dub_transcript_srt` / the dub result) for user review.

Offer a bilingual-subtitle version. When this run burned target-language-only captions (`bilingual_subtitles=false`) and a source transcript is available (`source_subtitles` is non-empty), close the reply by asking whether the user also wants a dual-subtitle version, e.g. "Want a bilingual version with the original + translated subtitles stacked? I can add it." If they say yes, re-run Step 3 in bilingual manual mode on the same `caption_target_video_url` (the pre-caption visual video) and return the new `final_video_url`. No re-dub or re-lipsync is needed; only the caption burn changes. Skip the offer when bilingual captions were already burned (`bilingual_subtitles=true`) or when `source_subtitles` is missing. Without a source transcript a bilingual version can't be produced (the dubbed audio is already in the target language), so do not offer what can't be delivered.
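
A hypothetical helper that assembles the Step 4 reply. The offer wording is the example from this step, and the condition mirrors the skip rules above. The exact reply layout is an assumption.

```python
from typing import List, Optional


def build_reply(final_video_url: str, dub_transcript_srt: Optional[str],
                bilingual_subtitles: bool, source_subtitles: List[dict]) -> str:
    """Step 4 reply: final video, translated transcript, optional bilingual offer."""
    parts = [f"Final video: {final_video_url}"]
    if dub_transcript_srt:
        parts.append("Translated transcript for review:\n" + dub_transcript_srt.strip())
    # Offer only when it is deliverable: target-only captions were burned
    # and a source transcript exists to build the second row from.
    if not bilingual_subtitles and source_subtitles:
        parts.append("Want a bilingual version with the original + translated "
                     "subtitles stacked? I can add it.")
    return "\n\n".join(parts)
```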

Failure modes

| Class | Trigger | Mitigation | Fallback |
| --- | --- | --- | --- |
| Source URL not worker-fetchable | `mcp__plugin_pika_pika__dub_video` returns 403 / 4xx, a hotlink / UA-gated fetch failure, or "Access Denied" for a public HTTPS URL | Download the source bytes in the client/host environment, upload them to Pika with `mcp__claude_ai_pika__upload_asset`, replace `video_url` with the Pika CDN URL, then retry Step 1 once | If the local download also fails, ask the user to upload the file or provide a different URL |
| Extra target language | Target is Cantonese (`yue` / `cantonese` / `zh-HK`), Thai, Hebrew, Persian, Slovenian, Catalan, Norwegian Nynorsk, or Afrikaans | Supported: call `mcp__plugin_pika_pika__dub_video` with the target as usual; the original speaker's voice is kept | Background music isn't preserved for these languages (dubbed speech only) |
| Dub call fails (not fetchability) | `mcp__plugin_pika_pika__dub_video` errors for another reason: unsupported target language, provider/worker 5xx, or `status: failed` from `mcp__plugin_pika_pika__task_status` | Surface the error to the user. If the message points at the language, check `references/language-coverage.md` and suggest a supported tag; otherwise suggest a retry. There is no manual chain to fall back to; dub is the single path | None: return the error, and do not silently produce a non-dubbed video |
| Dub returns no speech | Silent video, so there is nothing to translate | Tell the user: "no detectable speech in video; nothing to translate" | None |
| Original voice can't be kept | For the extra target languages above, the source is too short or noisy to keep the original speaker's voice | Surface the error and ask the user for a cleaner / longer source clip | None: the dub fails rather than using a different voice |
| Lipsync source too long | Dubbed video >5 min: `mcp__claude_ai_pika__edit_lipsync` rejects it with `invalid_input` (sync.so 300 s cap); all variant tiers share the cap, so retrying won't help | Check `duration_seconds` from Step 1 first and skip lipsync when it is >300; caption `dubbed_video_url` directly and tell the user lipsync caps at 5 min | Dubbed video, no lip-match |
| Lipsync step fails | `mcp__claude_ai_pika__edit_lipsync` errors (no clear face track, provider 4xx) | Fall back through `variant` tiers (v2-pro → sync-3 → v2); if all fail, return the dubbed video without lip-matching and tell the user | Audio-replaced video, no lip-match |
| Captions wrong language | Step 3 auto-transcription mis-detects the language | Pass an explicit `language` tag; if `dub_subtitles` exists, use `caption_mode="manual"` with it instead of auto | Manual `subtitles[]` |
| Bilingual source row unavailable | User asked for bilingual subtitles but `source_subtitles` is absent | Use target-language captions and explain that the source transcript was unavailable | Target-language captions only |

Compatibility

Primary target: Claude Code. Uses standard MCP tools only. Works on Codex / Cursor / Claude Desktop.
