podcast


/pika:podcast

4 acts × 15s each = 60s. Host A always LEFT, Host B always RIGHT. Accepts a URL or a free-form topic / brief.

Parameters

| Param | Default | Notes |
| --- | --- | --- |
| `input` | required | URL to review or free-form topic / brief (e.g. "I and Elon Musk talk about Mars") |
| `bg_img` | auto-generated | Podcast studio background |
| `host_a_img` | auto-generated | Host A portrait (see Real-person handling below) |
| `host_b_img` | auto-generated | Host B portrait (see Real-person handling below) |
| `voice_a` | `876341503281471517` | Kling preset or cloned voice ID for Host A |
| `voice_b` | `829837252279803904` | Kling preset or cloned voice ID for Host B |
| `use_avatar` | off | Clone user's identity voice as Host A via `clone_voice` |
| `aspect_ratio` | `16:9` | Output aspect ratio |
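As a rough illustration, the parameter table could be modeled as a small record type. This is only a sketch: the class name `PodcastParams` and the use of `None` to mean "auto-generate" are assumptions made for the example, not part of the skill itself.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class PodcastParams:
    """Hypothetical container; fields and defaults mirror the parameter table."""
    input: str                           # required: URL or free-form topic / brief
    bg_img: Optional[str] = None         # None is used here to mean "auto-generate"
    host_a_img: Optional[str] = None
    host_b_img: Optional[str] = None
    voice_a: str = "876341503281471517"  # Kling preset for Host A
    voice_b: str = "829837252279803904"  # Kling preset for Host B
    use_avatar: bool = False             # "off" in the table
    aspect_ratio: str = "16:9"


params = PodcastParams(input="https://pika.art", voice_b="123")
```

Only explicitly passed fields differ from the table defaults, which matches the rule that defaults apply silently unless overridden.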

Defaults — fire fast, no mid-flow confirmation

  • Use the param-table defaults silently for voices. `voice_a` defaults to the Kling preset `876341503281471517` and `voice_b` to `829837252279803904`. Do not ask "which voice?" or "should I clone yours?" before firing; only honor explicit overrides (`voice_a=`, `voice_b=`, `use_avatar`).
  • Auto-generate any missing host portraits silently (Step 1's archetype prompts). Do not ask "should I generate a host image?"; just generate.
  • No "type yes to proceed" gates. Submit → render the 4 acts → return URL. Account credit balance + provider failover are the canonical guardrails. The `--yes` flag is accepted as a no-op for backward compatibility.
  • Topic-mode personas (Step 3): when the user names a real public figure, follow Step 4 (Real-person handling) silently: archetype portrait by default, no auto-generated photographic likeness, no question to the user about likeness rights.

Local images on Claude Desktop

Claude Desktop can't pass inline-pasted images to MCP tools yet (Anthropic-side limitation). If the user pastes a photo inline, or mentions a local file they want as `host_a_img` / `host_b_img`, pause Step 1 and kindly send them something like this:
Heads up: pasted images don't reach MCP tools on Claude Desktop yet (Anthropic limitation). Two easy options for your photo:
  • Paste a URL if it's already hosted (Imgur, S3, your site). This is the fastest option.
  • Attach the image file so I can upload it before generation.
When a local file arrives, convert it to a public URL with `upload_asset` and use the returned `public_url` as the parameter before Step 1. Already-hosted `https://...` URLs work as-is and skip this entirely.
If the user names a real public figure without attaching anything, do NOT auto-generate their likeness; Step 4 (Real-person handling) uses an archetype portrait instead.
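The hosted-versus-local decision above can be sketched as a small helper. The `upload` callable is a hypothetical stand-in for the `upload_asset` tool, and it is assumed here to return the `public_url` string directly; the real tool's return shape may differ.

```python
from typing import Callable


def resolve_image_ref(ref: str, upload: Callable[[str], str]) -> str:
    """Return a public URL for an image reference.

    `upload` is a hypothetical wrapper around `upload_asset`, assumed to
    return the `public_url` for a local file path.
    """
    if ref.startswith("https://"):
        return ref          # already hosted: use as-is, no upload needed
    return upload(ref)      # local file: upload first, then use the URL


# Usage with a fake uploader, for illustration only.
uploaded = []


def fake_upload(path: str) -> str:
    uploaded.append(path)
    return "https://cdn.example.com/" + path.split("/")[-1]


hosted = resolve_image_ref("https://i.imgur.com/x.png", fake_upload)
local = resolve_image_ref("/tmp/me.png", fake_upload)
```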

Steps

0. Resolve input (empty-args menu)

Strip flags (`--yes`, `--no-captions`, etc.) and `key=value` parameters from `$ARGUMENTS`. If what remains is empty or whitespace-only, print this menu verbatim as your full response, then stop and wait for the user's next message: do NOT call any tool, do NOT proceed to Step 1, do NOT invent a topic or URL. If the stripped input is non-empty (a URL or any prose), skip this step silently and proceed to Step 1.
What would you like a podcast about? I can take any of:
  • A website URL (product page, docs site, launch page), e.g. `https://pika.art`
  • A GitHub repo, e.g. `https://github.com/anthropics/claude-code`
  • A blog post / article URL, e.g. a recent piece you'd like discussed
  • A free-form topic or brief, e.g. "I and Elon Musk talk about Mars" or "two scientists debate AGI"
Reply with your choice and I'll generate a 1-minute two-host podcast video (4 acts × ~15s).
Tip: you don't need to type `/pika:podcast`. Just say things like "make a podcast about <topic>", "podcast review of <url>", or "I and <persona> talk about <topic>" and I'll fire this skill automatically.
When the user replies, treat their reply as the resolved input (URL or topic) and proceed to Step 1. Do not re-prompt.
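The stripping rule in this step might look like the following. It is a minimal sketch: the two regular expressions are assumptions about what counts as a flag or a `key=value` token, bare keywords such as `use_avatar` are not handled, and whitespace in the remaining text is collapsed.

```python
import re

# Assumed token shapes; the skill text does not define them precisely.
FLAG = re.compile(r"^--[A-Za-z][\w-]*$")
KEY_VALUE = re.compile(r"^[A-Za-z_]\w*=\S*$")


def strip_arguments(arguments: str) -> str:
    """Drop `--flag` and `key=value` tokens and return the remaining text."""
    kept = [
        token for token in arguments.split()
        if not FLAG.match(token) and not KEY_VALUE.match(token)
    ]
    return " ".join(kept)


def should_show_menu(arguments: str) -> bool:
    """True when nothing but flags, parameters, or whitespace was given."""
    return strip_arguments(arguments) == ""
```

A URL containing `=` in its query string is not mistaken for a parameter, because the `key` part must be a plain identifier before the `=` sign.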

1. Generate missing assets (parallel)

Generate only what's not provided. Default archetype prompts:
  • `bg_img`: modern podcast studio, two chairs, warm lighting, no people, 16:9
  • `host_a_img`: enthusiastic host, studio portrait, left-side framing, 1:1
  • `host_b_img`: pragmatic skeptic host, studio portrait, right-side framing, 1:1
If the input mentions specific personas (Step 3), tune the archetype to match the persona vibe; see Real-person handling below.
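One way to express "generate only what's not provided" is to filter the default prompts by what the caller supplied. The function name and the dictionary layout are illustrative assumptions; the prompt strings are the defaults listed above.

```python
ARCHETYPE_PROMPTS = {
    "bg_img": "modern podcast studio, two chairs, warm lighting, no people, 16:9",
    "host_a_img": "enthusiastic host, studio portrait, left-side framing, 1:1",
    "host_b_img": "pragmatic skeptic host, studio portrait, right-side framing, 1:1",
}


def prompts_for_missing(provided: dict) -> dict:
    """Return prompts only for the assets the caller did not supply."""
    return {
        name: prompt
        for name, prompt in ARCHETYPE_PROMPTS.items()
        if not provided.get(name)
    }


todo = prompts_for_missing({"bg_img": "https://example.com/bg.png"})
```

The resulting entries are independent of each other, which is what makes it reasonable to generate them in parallel.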

2. Resolve voice IDs (only if `use_avatar` is set)

  1. Call `identity_voice_info` → returns `{ voice_id, platform, sample_url }`
  2. If `sample_url` is present: call `clone_voice(voice_url=sample_url, voice_name="host_a_voice")` → set `voice_a` to the returned Kling voice ID
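The two-step flow can be sketched with the tools passed in as callables. Both callables are hypothetical stand-ins for the real tools, `clone_voice` is assumed to return the new voice ID as a string, and the fallback when no `sample_url` exists is an assumption, since the text does not say what happens in that case.

```python
from typing import Callable

DEFAULT_VOICE_A = "876341503281471517"


def resolve_voice_a(
    use_avatar: bool,
    identity_voice_info: Callable[[], dict],
    clone_voice: Callable[..., str],
    voice_a: str = DEFAULT_VOICE_A,
) -> str:
    """Return the voice ID to use for Host A."""
    if not use_avatar:
        return voice_a                      # step is skipped entirely
    info = identity_voice_info()            # {voice_id, platform, sample_url}
    sample_url = info.get("sample_url")
    if sample_url:
        return clone_voice(voice_url=sample_url, voice_name="host_a_voice")
    return voice_a                          # assumption: keep the current value
```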

3. Parse input mode — URL vs topic

Strip flags (`--yes`, `--no-captions`, etc.) and `key=value` parameters from `$ARGUMENTS`. Inspect what remains.
URL mode (input contains a `https?://` URL):
  • Call `capture_website` on the URL.
  • Extract: product name, value prop, 2–3 specific features or facts, pricing, one jokeable detail.
  • Use these as the script's factual anchors.
Topic mode (input is free-form prose, no URL):
  • Treat the whole input as the brief. Parse for:
    • Subject: what the conversation is about
    • Hosts: explicit if mentioned ("I and Elon Musk", "two scientists", "Joe and Sarah"); otherwise use defaults (enthusiastic host + skeptic host)
    • Angle: debate / interview / explainer / casual
    • Concrete facts: any specific claims, numbers, dates, quotes the user gave
  • If no concrete facts are given, use 2–3 clearly framed observations or hypotheses to anchor jokes and the "wait, actually..." pivot. Do not present invented claims as facts; if factual accuracy matters for the topic, ask for a source or URL.
  • If the user says "I and X" or "me and X", Host A = the user (use the `use_avatar` flow if it is not already set, or the default avatar) and Host B = X.
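The URL-versus-topic decision can be sketched as a single regular-expression search over the stripped input. The function name and return shape are assumptions for illustration, and the pattern may capture trailing punctuation attached to a URL.

```python
import re

URL_PATTERN = re.compile(r"https?://\S+")


def detect_mode(stripped_input: str) -> tuple:
    """Return ("url", first_url) if a URL is present, else ("topic", brief)."""
    match = URL_PATTERN.search(stripped_input)
    if match:
        return ("url", match.group(0))
    return ("topic", stripped_input.strip())


mixed = detect_mode("podcast about https://pika.art with skeptical investor energy")
```

Because the search looks anywhere in the text, a URL embedded in a longer prompt still selects URL mode.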

4. Real-person handling (topic mode only)

If the parsed input names a specific real public figure as a host (e.g. "Elon Musk", "Taylor Swift", "Joe Rogan"):
  • Default behavior: do NOT auto-generate that person's photographic likeness. Generate an archetype portrait matching the persona vibe, e.g. "tech-billionaire-energy CEO at a podcast desk" for an Elon-style host, "pop-star aesthetic" for a Taylor-style host. Clearly inspired-by, not impersonation.
  • Override: if the user explicitly provides `host_a_img=<url>` or `host_b_img=<url>`, use the provided image as-is. The user takes responsibility for likeness rights.
  • Voices: same logic. Default to a generic Kling preset; only use a cloned voice when the user provides one (`voice_a=` / `voice_b=`) or invokes `use_avatar` (which clones the user's own voice for Host A).
  • Script tone: the dialogue can riff on the named persona's known public positions or vibe (e.g. Mars enthusiasm for Elon-style); public-record opinions are fair game. Do NOT put specific defamatory, off-character, or fabricated-private-life statements in their mouth.
This guardrail keeps the skill creative ("I want a podcast where I argue with a tech CEO about Mars") without auto-generating deepfakes of named real people.

5. Write script

Write 4 acts × 2 lines (HOST_A / HOST_B). Each line ~10–12s of spoken dialogue.
Required (Matan rules — apply to both URL and topic modes):
  • One specific joke tied to a concrete detail (scraped fact in URL mode; topic-derived claim in topic mode)
  • One "wait, actually..." skeptic-flip moment
  • At least one mid-sentence interruption
  • Natural filler: "okay so", "wait", "right?", "i mean", "honestly"
  • Real reactions, not generic praise
  • Reference at least one actual feature name, price, claim, or quote
  • Natural ending — no forced "bye!"
Acts: Hook → Feature deep-dive → The Turn → Verdict. (In topic mode, the analogue is Hook → Substance → The Pivot → Verdict.)
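The structural part of these requirements (4 acts, one HOST_A and one HOST_B line each) is easy to check mechanically; the stylistic rules are not. The sketch below assumes a script is a list of dictionaries keyed by `HOST_A` and `HOST_B`, and the speaking rate used to estimate line length is an assumed figure, not something the skill specifies.

```python
WORDS_PER_SECOND = 2.5  # assumption: roughly 150 words per minute


def estimate_seconds(line: str) -> float:
    """Rough spoken duration of a line, based on word count alone."""
    return len(line.split()) / WORDS_PER_SECOND


def check_script(script: list) -> list:
    """Return a list of structural problems; an empty list means OK."""
    problems = []
    if len(script) != 4:
        problems.append(f"expected 4 acts, got {len(script)}")
    for number, act in enumerate(script, start=1):
        for host in ("HOST_A", "HOST_B"):
            if not act.get(host):
                problems.append(f"act {number}: missing {host} line")
    return problems


good = [{"HOST_A": "okay so this is wild", "HOST_B": "wait, actually, is it?"}] * 4
```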

6. Generate video acts (subagent, sequential)

Delegate to a subagent with all resolved assets and the script. The subagent runs acts 1→2→3→4 sequentially; do NOT parallelize.
Each act: one `generate_reference_video` call (`kling-v3-omni`, `duration=15`, `sound=true`). Pass `reference_images=[bg_img, host_a_img, host_b_img]` and `voice_ids=[voice_a, voice_b]`. Optional knobs (added by `pika-mcp-server` BACK-339, 2026-05-10): `quality_mode: "pro"` for higher-fidelity Kling output (longer wall-clock; reserve for high-stakes renders), and `kling_model` to pin a specific Kling family member if you need reproducibility across runs. Three shots:
  • Wide 5s: both hosts, no voice token
  • MCU-A 5s: `<<<voice_1>>> '<HOST_A line>'`
  • MCU-B 5s: `<<<voice_2>>> '<HOST_B line>'`
Emotional beats per act:
  • Act 1: A excited, B skeptical
  • Act 2: A gesturing/explaining, B questioning
  • Act 3: A firm, B surprised and reconsidering
  • Act 4: A satisfied, B conceding
After act 4, the subagent calls `edit_concat([act1, act2, act3, act4])` and returns the final video URL.
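The per-act call and the sequential loop could be assembled roughly as follows. This is a sketch under stated assumptions: `generate` and `concat` are hypothetical stand-ins for `generate_reference_video` and `edit_concat`, and the `model` and `prompt` keys and the prompt layout are invented for illustration. Only `duration`, `sound`, `reference_images`, and `voice_ids` come from the text.

```python
def build_act_request(act, bg_img, host_a_img, host_b_img, voice_a, voice_b):
    """Assemble arguments for one per-act video call (shape is assumed)."""
    shots = [
        "Wide 5s: both hosts",                        # no voice token
        f"MCU-A 5s: <<<voice_1>>> '{act['HOST_A']}'",
        f"MCU-B 5s: <<<voice_2>>> '{act['HOST_B']}'",
    ]
    return {
        "model": "kling-v3-omni",                     # assumed key name
        "duration": 15,
        "sound": True,
        "reference_images": [bg_img, host_a_img, host_b_img],
        "voice_ids": [voice_a, voice_b],
        "prompt": "\n".join(shots),                   # assumed key name
    }


def render_episode(script, assets, generate, concat):
    """Run the acts strictly in order, then concatenate the clips."""
    clips = []
    for act in script:                                # sequential, never parallel
        clips.append(generate(**build_act_request(act, **assets)))
    return concat(clips)
```

The image order matters: with the background first, Host A is the second reference image and Host B the third, which lines up with the `<<<image_2>>>` and `<<<image_3>>>` tokens used in the rules.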

7. Output

Return the final video URL and a one-sentence verdict. Do not call `add_captions`: Whisper auto-transcription is unreliable on the domain-specific terms typical of podcast dialogue (product names, persona names, technical jargon). Native Kling Omni audio is the deliverable.

Rules:
  • `voice_ids` must be valid Kling voice IDs; never use name-style strings like `Calm_Man`
  • Host A always LEFT (`<<<image_2>>>`), Host B always RIGHT (`<<<image_3>>>`), never swapped
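A cheap pre-flight check for the first rule might reject name-style strings before any render is submitted. The check below assumes that Kling voice IDs are all-digit strings, as the two defaults in the parameter table are; that format is an inference from the defaults, not a documented guarantee.

```python
def looks_like_kling_voice_id(value: str) -> bool:
    """Heuristic: ASCII digits only (assumed ID format, see note above)."""
    return value.isascii() and value.isdigit()


def check_voice_ids(voice_ids: list) -> None:
    """Raise ValueError on the first value that does not look like an ID."""
    for value in voice_ids:
        if not looks_like_kling_voice_id(value):
            raise ValueError(f"not a Kling voice ID: {value!r}")
```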

Load-bearing phrases

These anchors keep the podcast output coherent across URL and topic modes:

| Phrase | Where | Why load-bearing |
| --- | --- | --- |
| Host A always LEFT, Host B always RIGHT | Layout and shot prompts | Prevents host identity swapping across the four separate act renders. |
| 4 acts × 15s each | Overall structure | Keeps the concat predictable and avoids uneven act pacing. |
| Hook → Feature deep-dive → The Turn → Verdict | Script structure | Gives the episode a conversational arc instead of four disconnected reactions. |
| wait, actually... skeptic-flip moment | Script requirements | Creates the pivot that makes the podcast feel like a real exchange. |
| Do not call add_captions | Output rule | Avoids low-quality burned captions on fast two-host dialogue with names and jargon. |

Engine choice: Kling v3-omni for native two-host dialogue

Use Kling v3-omni for the four acts because it supports native dialogue with two reference hosts and voice tokens in a single shot plan. The tradeoff is that acts run sequentially for consistency and can take longer than pure edit/composite flows. Do not add a separate caption or music layer by default; the value of this skill is the native spoken exchange.

Runtime expectations

Typical wall-clock is 8-18 minutes:

| Step | Wall clock | Notes |
| --- | --- | --- |
| Missing asset generation | 30-90s | Skipped for provided background/host refs |
| URL/topic parse + script | 1-3 min | URL mode depends on page fetch quality |
| Four Kling acts | 6-14 min | Runs sequentially to reduce host/voice drift |
| Concat + return | 30-90s | Final URL only; captions skipped by default |
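The per-step ranges can be summed to sanity-check the headline figure. The snippet below only restates the table in seconds; the dictionary and function names are illustrative.

```python
# (low, high) wall-clock per step, in seconds, copied from the table.
STEPS_SECONDS = {
    "Missing asset generation": (30, 90),
    "URL/topic parse + script": (60, 180),
    "Four Kling acts": (360, 840),
    "Concat + return": (30, 90),
}


def total_range_minutes(steps: dict) -> tuple:
    """Sum the low ends and the high ends separately, in minutes."""
    low = sum(lo for lo, _ in steps.values()) / 60
    high = sum(hi for _, hi in steps.values()) / 60
    return low, high


low_minutes, high_minutes = total_range_minutes(STEPS_SECONDS)
```

The low ends add up to 8 minutes, matching the stated floor. The high ends add up to 20 minutes, slightly above the stated 18, which is plausible for a "typical" figure if the steps rarely all hit their maximum in the same run.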

Examples

URL mode (review a website / repo / blog):
/pika:podcast https://pika.art
/pika:podcast https://github.com/anthropics/claude-code
/pika:podcast https://cursor.com use_avatar
Topic mode (free-form brief):
/pika:podcast Two AI researchers debate whether AGI arrives before 2030
/pika:podcast I and a Mars-obsessed tech CEO talk about colonization timelines
/pika:podcast interview with a seed-stage VC about what kills most startups
/pika:podcast podcast about quantum computing breakthroughs in 2026
Mixed (URL inside a topic prompt — agent prefers URL mode if a valid URL is found):
/pika:podcast podcast about https://pika.art with skeptical investor energy