podcast
/pika:podcast
4 acts × 15s each = 60s. Host A always LEFT, Host B always RIGHT. Accepts a URL or a free-form topic / brief.
Parameters
| Param | Default | Notes |
|---|---|---|
| | required | URL to review or free-form topic / brief (e.g. "I and Elon Musk talk about Mars") |
| `bg_img` | auto-generated | Podcast studio background |
| `host_a_img` | auto-generated | Host A portrait — see Real-person handling below |
| `host_b_img` | auto-generated | Host B portrait — see Real-person handling below |
| `voice_a` | `876341503281471517` | Kling preset or cloned voice ID for Host A |
| `voice_b` | `829837252279803904` | Kling preset or cloned voice ID for Host B |
| `use_avatar` | off | Clone user's identity voice as Host A via `identity_voice_info` and `clone_voice` (Step 2) |
| | | Output aspect ratio |
Defaults — fire fast, no mid-flow confirmation
- Use the param-table defaults silently for voices. `voice_a` defaults to the Kling preset `876341503281471517` and `voice_b` to `829837252279803904`. Do not ask "which voice?" or "should I clone yours?" before firing — only honor explicit overrides (`voice_a=`, `voice_b=`, `use_avatar`).
- Auto-generate any missing host portraits silently (Step 1's archetype prompts). Do not ask "should I generate a host image?" — just generate.
- No "type yes to proceed" gates. Submit → render the 4 acts → return URL. Account credit balance + provider failover are the canonical guardrails. The `--yes` flag is accepted as a no-op for backward compatibility.
- Topic-mode personas (Step 3) — when the user names a real public figure, follow Step 4 (Real-person handling) silently: archetype portrait by default, no auto-generated photographic likeness, no question to the user about likeness rights.
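The voice defaults above amount to a small lookup with explicit overrides. A minimal sketch, assuming overrides arrive as a plain dictionary of already-parsed `key=value` pairs (the function name and that input shape are illustrative, not part of the skill):

```python
# Default Kling preset voice IDs, as stated in the defaults above.
DEFAULT_VOICES = {
    "voice_a": "876341503281471517",
    "voice_b": "829837252279803904",
}


def resolve_voices(overrides: dict[str, str]) -> dict[str, str]:
    """Return voice IDs, applying only explicit voice_a= / voice_b= overrides."""
    resolved = dict(DEFAULT_VOICES)
    for key in DEFAULT_VOICES:
        value = overrides.get(key)
        if value:  # missing or empty overrides leave the default in place
            resolved[key] = value
    return resolved
```

No prompt is issued anywhere in this path, which mirrors the "fire fast" rule: anything the user did not override resolves silently.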
Local images on Claude Desktop
Claude Desktop can't pass inline-pasted images to MCP tools yet (Anthropic-side limitation). If the user pastes a photo inline, or mentions a local file they want as `host_a_img` / `host_b_img`, pause Step 1 and kindly send them this — something like:

> Heads up — pasted images don't reach MCP tools on Claude Desktop yet (Anthropic limitation). Two easy options for your photo:
> - Paste a URL if it's already hosted (Imgur, S3, your site) — fastest
> - Attach the image file so I can upload it before generation.

When a local file arrives, convert it to a public URL with `upload_asset` and use the returned `public_url` as the `https://...` parameter before Step 1. Already-hosted URLs work as-is and skip this entirely.

If the user names a real public figure without attaching anything, do NOT auto-generate their likeness — Step 4 (Real-person handling) uses an archetype portrait instead.
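The URL-versus-local-file branch can be sketched as follows. The `upload_asset` callable is injected as a stand-in for the real tool; its call shape (path in, dictionary with a `public_url` key out) is an assumption based only on the wording above:

```python
from typing import Callable


def resolve_image_param(value: str, upload_asset: Callable[[str], dict]) -> str:
    """Return a public URL for a host image given a URL or a local file path."""
    if value.lower().startswith(("http://", "https://")):
        return value  # already hosted: use as-is and skip the upload
    uploaded = upload_asset(value)  # hypothetical call shape, see lead-in
    return uploaded["public_url"]
```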
Steps
0. Resolve input (empty-args menu)
Strip flags (`--yes`, `--no-captions`, etc.) and `key=value` parameters from `$ARGUMENTS`. If what remains is empty or whitespace-only, print this menu verbatim as your full response, then stop and wait for the user's next message — do NOT call any tool, do NOT proceed to Step 1, do NOT invent a topic or URL. If the stripped input is non-empty (a URL or any prose), skip this step silently and proceed to Step 1.

> **What would you like a podcast about?** I can take any of:
> - A website URL (product page, docs site, launch page) — e.g. `https://pika.art`
> - A GitHub repo — e.g. `https://github.com/anthropics/claude-code`
> - A blog post / article URL — e.g. a recent piece you'd like discussed
> - A free-form topic or brief — e.g. "I and Elon Musk talk about Mars" or "two scientists debate AGI"
>
> Reply with your choice and I'll generate a 1-minute two-host podcast video (4 acts × ~15s).
>
> Tip: you don't need to type `/pika:podcast` — just say things like "make a podcast about <topic>", "podcast review of <url>", or "I and <persona> talk about <topic>" and I'll fire this skill automatically.

When the user replies, treat their reply as the resolved input (URL or topic) and proceed to Step 1. Do not re-prompt.
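The empty-input check described in this step can be sketched as a token filter. The two regular expressions are assumptions about what counts as a flag or a `key=value` parameter; quoted values containing spaces are not handled in this sketch:

```python
import re

# Assumed token shapes: "--flag" switches and "key=value" parameters.
FLAG_RE = re.compile(r"^--[A-Za-z][\w-]*$")
PARAM_RE = re.compile(r"^[A-Za-z_]\w*=\S*$")


def strip_arguments(arguments: str) -> str:
    """Drop flags and key=value tokens, keeping the URL or prose that remains."""
    kept = [
        token
        for token in arguments.split()
        if not FLAG_RE.match(token) and not PARAM_RE.match(token)
    ]
    return " ".join(kept)


def needs_menu(arguments: str) -> bool:
    """True when nothing but flags, parameters, or whitespace was supplied."""
    return strip_arguments(arguments) == ""
```

A URL with a query string such as `https://example.com/?a=b` survives the filter because the parameter pattern requires the token to start with an identifier immediately followed by `=`.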
1. Generate missing assets (parallel)
Generate only what's not provided. Default archetype prompts:
- `bg_img` — modern podcast studio, two chairs, warm lighting, no people, 16:9
- `host_a_img` — enthusiastic host, studio portrait, left-side framing, 1:1
- `host_b_img` — pragmatic skeptic host, studio portrait, right-side framing, 1:1

If the input mentions specific personas (Step 3), tune the archetype to match the persona vibe — see Real-person handling below.
2. Resolve voice IDs (only if `use_avatar` is set)
- Call `identity_voice_info` → `{ voice_id, platform, sample_url }`
- If `sample_url` is present: call `clone_voice(voice_url=sample_url, voice_name="host_a_voice")` → set `voice_a` to the returned Kling voice ID
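The two bullets above can be read as the control flow below. Both tools are passed in as callables so the sketch stays self-contained. Their exact signatures and return types are assumptions, and so is the fallback to the default voice when no `sample_url` is present, which the source does not spell out:

```python
from typing import Callable


def resolve_host_a_voice(
    use_avatar: bool,
    default_voice_a: str,
    identity_voice_info: Callable[[], dict],
    clone_voice: Callable[..., str],
) -> str:
    """Return the voice ID for Host A, cloning the user's voice when requested."""
    if not use_avatar:
        return default_voice_a  # step is skipped entirely unless use_avatar is set
    info = identity_voice_info()
    sample_url = info.get("sample_url")
    if not sample_url:
        return default_voice_a  # assumed fallback, not stated in the source
    # Assumed: clone_voice returns the new Kling voice ID as a string.
    return clone_voice(voice_url=sample_url, voice_name="host_a_voice")
```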
3. Parse input mode — URL vs topic
Strip flags (`--yes`, `--no-captions`, etc.) and key=value parameters from `$ARGUMENTS`. Inspect what remains.

URL mode — input contains a `https?://` URL:
- Call `capture_website` on the URL.
- Extract: product name, value prop, 2–3 specific features or facts, pricing, one jokeable detail.
- Use these as the script's factual anchors.

Topic mode — input is free-form prose (no URL):
- Treat the whole input as the brief. Parse for:
  - Subject — what the conversation is about
  - Hosts — explicit if mentioned ("I and Elon Musk", "two scientists", "Joe and Sarah"); otherwise use defaults (enthusiastic host + skeptic host)
  - Angle — debate / interview / explainer / casual
  - Concrete facts — any specific claims, numbers, dates, quotes the user gave
- If no concrete facts are given, use 2–3 clearly framed observations or hypotheses to anchor jokes and the "wait, actually..." pivot. Do not present invented claims as facts; if factual accuracy matters for the topic, ask for a source or URL.
- If the user says "I and X" or "me and X", Host A = the user (use `use_avatar` flow if not already, or default avatar) and Host B = X.
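The URL-versus-topic decision can be sketched as a single regular-expression check on the stripped input. The `ParsedInput` type and the "I and X" pattern are illustrative assumptions; extracting the subject, angle, and host names from free-form prose is left to the model and not attempted here:

```python
import re
from dataclasses import dataclass
from typing import Optional

URL_RE = re.compile(r"https?://\S+")
# Assumed pattern for briefs that open with "I and X" or "me and X".
SELF_HOST_RE = re.compile(r"\s*(?:I|me) and \S", re.IGNORECASE)


@dataclass
class ParsedInput:
    mode: str               # "url" or "topic"
    url: Optional[str]      # first URL found, if any
    brief: str              # the full stripped input
    user_is_host_a: bool    # True when the brief opens with "I and X" / "me and X"


def parse_input(stripped: str) -> ParsedInput:
    """Prefer URL mode whenever a URL is present, otherwise treat input as a brief."""
    match = URL_RE.search(stripped)
    if match:
        url = match.group(0).rstrip(".,)")  # drop trailing sentence punctuation
        return ParsedInput("url", url, stripped, False)
    return ParsedInput("topic", None, stripped, bool(SELF_HOST_RE.match(stripped)))
```

For a mixed prompt such as `podcast about https://pika.art with skeptical investor energy`, this returns URL mode with the surrounding prose preserved in `brief`, so the extra wording can still steer tone.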
4. Real-person handling (topic mode only)
If the parsed input names a specific real public figure as a host (e.g. "Elon Musk", "Taylor Swift", "Joe Rogan"):
- Default behavior: do NOT auto-generate that person's photographic likeness. Generate an archetype portrait matching the persona vibe — e.g. "tech-billionaire-energy CEO at a podcast desk" for an Elon-style host, "pop-star aesthetic" for a Taylor-style host. Clearly inspired-by, not impersonation.
- Override: if the user explicitly provides `host_a_img=<url>` or `host_b_img=<url>`, use the provided image as-is. The user takes responsibility for likeness rights.
- Voices: same logic — default to a generic Kling preset; only use a cloned voice when the user provides one (`voice_a=` / `voice_b=`) or invokes `use_avatar` (which clones the user's own voice for Host A).
- Script tone: the dialogue can riff on the named persona's known public positions or vibe (e.g. Mars enthusiasm for Elon-style) — public-record opinions are fair game. Do NOT put specific defamatory, off-character, or fabricated-private-life statements in their mouth.
This guardrail keeps the skill creative ("I want a podcast where I argue with a tech CEO about Mars") without auto-generating deepfakes of named real people.
5. Write script
Write 4 acts × 2 lines (HOST_A / HOST_B). Each line ~10–12s of spoken dialogue.
Required (Matan rules — apply to both URL and topic modes):
- One specific joke tied to a concrete detail (scraped fact in URL mode; topic-derived claim in topic mode)
- One "wait, actually..." skeptic-flip moment
- At least one mid-sentence interruption
- Natural filler: "okay so", "wait", "right?", "i mean", "honestly"
- Real reactions, not generic praise
- Reference at least one actual feature name, price, claim, or quote
- Natural ending — no forced "bye!"
Acts: Hook → Feature deep-dive → The Turn → Verdict
(In topic mode the analogue: Hook → Substance → The Pivot → Verdict.)
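Some of the script requirements are mechanical enough to check before rendering. A rough sketch, assuming the script is held as a list of `(HOST_A line, HOST_B line)` pairs. It checks only shape, the "wait, actually" flip, and filler words; the joke, interruption, and tone requirements need human or model judgment:

```python
FILLERS = ("okay so", "wait", "right?", "i mean", "honestly")


def check_script(script: list[tuple[str, str]]) -> list[str]:
    """Return a list of problems; an empty list means the basic checks passed."""
    problems = []
    if len(script) != 4:
        problems.append("expected 4 acts")
    if any(len(act) != 2 for act in script):
        problems.append("each act needs one HOST_A and one HOST_B line")
    text = " ".join(line for act in script for line in act).lower()
    if "wait, actually" not in text:
        problems.append("missing the 'wait, actually' flip")
    if not any(filler in text for filler in FILLERS):
        problems.append("no natural filler found")
    return problems
```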
6. Generate video acts (subagent, sequential)
Delegate to a subagent with all resolved assets and the script. The subagent runs acts 1→2→3→4 sequentially — do NOT parallelize.
Each act: one `generate_reference_video` call (`kling-v3-omni`, `duration=15`, `sound=true`). Pass `reference_images=[bg_img, host_a_img, host_b_img]` and `voice_ids=[voice_a, voice_b]`. Optional `pika-mcp-server` knobs (added by BACK-339, 2026-05-10): `quality_mode: "pro"` for higher-fidelity kling output (longer wall-clock; reserve for high-stakes renders), and `kling_model` to pin a specific kling family member if you need reproducibility across runs. Three shots:
- Wide 5s: both hosts, no voice token
- MCU-A 5s: `<<<voice_1>>> '<HOST_A line>'`
- MCU-B 5s: `<<<voice_2>>> '<HOST_B line>'`
Emotional beats per act:
- Act 1: A excited, B skeptical
- Act 2: A gesturing/explaining, B questioning
- Act 3: A firm, B surprised and reconsidering
- Act 4: A satisfied, B conceding
After act 4, subagent calls `edit_concat([act1, act2, act3, act4])` and returns the final video URL.
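The three-shot plan and the strictly sequential loop can be sketched as below. The prompt layout (one shot per line) is an assumption about formatting, and `generate_act` / `concat` are injected stand-ins for `generate_reference_video` and `edit_concat`, whose real signatures are not shown in this document:

```python
from typing import Callable

# Emotional beats per act for (Host A, Host B), taken from the list above.
EMOTIONAL_BEATS = [
    ("excited", "skeptical"),
    ("gesturing and explaining", "questioning"),
    ("firm", "surprised and reconsidering"),
    ("satisfied", "conceding"),
]


def build_act_prompt(act_index: int, host_a_line: str, host_b_line: str) -> str:
    """Build one act's shot plan: a silent wide shot, then one close-up per host."""
    beat_a, beat_b = EMOTIONAL_BEATS[act_index]
    shots = [
        f"Wide 5s: both hosts, Host A {beat_a}, Host B {beat_b}",  # no voice token
        f"MCU-A 5s: <<<voice_1>>> '{host_a_line}'",
        f"MCU-B 5s: <<<voice_2>>> '{host_b_line}'",
    ]
    return "\n".join(shots)


def render_episode(
    script: list[tuple[str, str]],
    generate_act: Callable[[str], str],
    concat: Callable[[list[str]], str],
) -> str:
    """Render acts 1 to 4 one at a time, then concatenate them in order."""
    if len(script) != 4:
        raise ValueError("expected exactly 4 acts")
    act_urls = []
    for index, (line_a, line_b) in enumerate(script):
        # Each call finishes before the next starts; nothing runs in parallel.
        act_urls.append(generate_act(build_act_prompt(index, line_a, line_b)))
    return concat(act_urls)
```

Dialogue lines that contain a single quote would need escaping before being placed inside the quoted voice segment; the sketch does not handle that.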
7. Output
Return the final video URL and a one-sentence verdict. Do not call `add_captions` — Whisper auto-transcription is unreliable on the domain-specific terms typical of podcast dialogue (product names, persona names, technical jargon). Native Kling Omni audio is the deliverable.

Rules:
- `voice_ids` must be valid Kling voice IDs — never use name-style strings like `Calm_Man`
- Host A always LEFT (`<<<image_2>>>`), Host B always RIGHT (`<<<image_3>>>`) — never swapped
Load-bearing phrases
These anchors keep the podcast output coherent across URL and topic modes:
| Phrase | Where | Why load-bearing |
|---|---|---|
| `Host A always LEFT, Host B always RIGHT` | Layout and shot prompts | Prevents host identity swapping across the four separate act renders. |
| `4 acts × 15s each = 60s` | Overall structure | Keeps the concat predictable and avoids uneven act pacing. |
| `Hook → Feature deep-dive → The Turn → Verdict` | Script structure | Gives the episode a conversational arc instead of four disconnected reactions. |
| `wait, actually...` | Script requirements | Creates the pivot that makes the podcast feel like a real exchange. |
| `Do not call add_captions` | Output rule | Avoids low-quality burned captions on fast two-host dialogue with names and jargon. |
Engine choice: Kling v3-omni for native two-host dialogue
Use Kling v3-omni for the four acts because it supports native dialogue with two reference hosts and voice tokens in a single shot plan. The tradeoff is that acts run sequentially for consistency and can take longer than pure edit/composite flows. Do not add a separate caption or music layer by default; the value of this skill is the native spoken exchange.
Runtime expectations
Typical wall-clock is 8-18 minutes:
| Step | Wall clock | Notes |
|---|---|---|
| Missing asset generation | 30-90s | Skipped for provided background/host refs |
| URL/topic parse + script | 1-3 min | URL mode depends on page fetch quality |
| Four Kling acts | 6-14 min | Runs sequentially to reduce host/voice drift |
| Concat + return | 30-90s | Final URL only; captions skipped by default |
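As a quick sanity check, the per-step ranges in the table can be summed. A minimal sketch with the table's values converted to seconds:

```python
# (low, high) wall-clock per step in seconds, copied from the table above.
STEPS_SECONDS = {
    "Missing asset generation": (30, 90),
    "URL/topic parse + script": (60, 180),
    "Four Kling acts": (360, 840),
    "Concat + return": (30, 90),
}


def total_minutes(steps: dict[str, tuple[int, int]]) -> tuple[float, float]:
    """Sum the lower and upper bounds and convert both to minutes."""
    low = sum(bounds[0] for bounds in steps.values()) / 60
    high = sum(bounds[1] for bounds in steps.values()) / 60
    return low, high


print(total_minutes(STEPS_SECONDS))  # prints (8.0, 20.0)
```

The lower bounds add up to 8 minutes, matching the stated typical minimum. The upper bounds add up to 20 minutes, slightly above the 18-minute figure; one plausible reading (an assumption, not stated in the source) is that the steps rarely all hit their worst case in the same run.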
Examples
URL mode (review a website / repo / blog):
/pika:podcast https://pika.art
/pika:podcast https://github.com/anthropics/claude-code
/pika:podcast https://cursor.com use_avatar

Topic mode (free-form brief):
/pika:podcast Two AI researchers debate whether AGI arrives before 2030
/pika:podcast I and a Mars-obsessed tech CEO talk about colonization timelines
/pika:podcast interview with a seed-stage VC about what kills most startups
/pika:podcast podcast about quantum computing breakthroughs in 2026

Mixed (URL inside a topic prompt — agent prefers URL mode if a valid URL is found):
/pika:podcast podcast about https://pika.art with skeptical investor energy