/pika:podcast
4 acts × 15s each = 60s. Host A always LEFT, Host B always RIGHT. Accepts a URL or a free-form topic / brief.
Parameters
| Param | Default | Notes |
|---|---|---|
| | required | URL to review or free-form topic / brief (e.g. "I and Elon Musk talk about Mars") |
| | auto-generated | Podcast studio background |
| | auto-generated | Host A portrait; see Real-person handling below |
| | auto-generated | Host B portrait; see Real-person handling below |
| | | Kling preset or cloned voice ID for Host A |
| | | Kling preset or cloned voice ID for Host B |
| | off | Clone user's identity voice as Host A via |
| | | Output aspect ratio |
Defaults — fire fast, no mid-flow confirmation
- Use the param-table defaults silently for voices. defaults to the Kling preset and to . Do not ask "which voice?" or "should I clone yours?" before firing — only honor explicit overrides (, , ).
- Auto-generate any missing host portraits silently (Step 1's archetype prompts). Do not ask "should I generate a host image?" — just generate.
- No "type yes to proceed" gates. Submit → render the 4 acts → return URL. Account credit balance + provider failover are the canonical guardrails. The flag is accepted as a no-op for backward compatibility.
- Topic-mode personas (Step 3) — when the user names a real public figure, follow Step 4 (Real-person handling) silently: archetype portrait by default, no auto-generated photographic likeness, no question to the user about likeness rights.
Local images on Claude Desktop
Claude Desktop can't pass inline-pasted images to MCP tools yet (Anthropic-side limitation). If the user pastes a photo inline, or mentions a local file they want as
/
, pause Step 1 and kindly send them this — something like:
Heads up — pasted images don't reach MCP tools on Claude Desktop yet (Anthropic limitation). Two easy options for your photo:
- Paste a URL if it's already hosted (Imgur, S3, your site) — fastest
- Attach the image file so I can upload it before generation.
When a local file arrives, convert it to a public URL with
and use the returned
as the parameter before Step 1. Already-hosted
URLs work as-is and skip this entirely.
If the user names a real public figure without attaching anything, do NOT auto-generate their likeness — Step 4 (Real-person handling) uses an archetype portrait instead.
Steps
0. Resolve input (empty-args menu)
Strip flags (
,
, etc.) and
parameters from
.
If what remains is empty or whitespace-only, print this menu
verbatim as your full response, then
stop and wait for the user's next message — do NOT call any tool, do NOT proceed to Step 1, do NOT invent a topic or URL. If the stripped input is non-empty (a URL or any prose), skip this step silently and proceed to Step 1.
What would you like a podcast about? I can take any of:
- A website URL (product page, docs site, launch page) — e.g.
- A GitHub repo — e.g.
https://github.com/anthropics/claude-code
- A blog post / article URL — e.g. a recent piece you'd like discussed
- A free-form topic or brief — e.g. "I and Elon Musk talk about Mars" or "two scientists debate AGI"
Reply with your choice and I'll generate a 1-minute two-host podcast video (4 acts × ~15s).
Tip: you don't need to type — just say things like "make a podcast about <topic>", "podcast review of <url>", or "I and <persona> talk about <topic>" and I'll fire this skill automatically.
When the user replies, treat their reply as the resolved input (URL or topic) and proceed to Step 1. Do not re-prompt.
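The empty-input check in Step 0 can be sketched in Python as follows. This is a minimal illustration, not the skill's actual parser: the function names and the flag set are hypothetical (use_avatar is the only flag that appears in this document's examples), and it assumes parameters are single whitespace-free key=value tokens.

```python
import re

# Hypothetical flag set; only `use_avatar` appears in this document's examples.
KNOWN_FLAGS = {"use_avatar"}

# Assumed parameter shape: one token of the form key=value with no spaces.
PARAM_RE = re.compile(r"\w+=\S+")


def strip_input(raw: str) -> str:
    """Drop known flags and key=value tokens, returning the remaining text."""
    kept = [
        token
        for token in raw.split()
        if token not in KNOWN_FLAGS and not PARAM_RE.fullmatch(token)
    ]
    return " ".join(kept)


def needs_menu(raw: str) -> bool:
    """True when only flags, parameters, or whitespace were supplied."""
    return strip_input(raw) == ""
```

Matching whole tokens rather than substrings keeps a query string such as ?a=b inside a URL from being mistaken for a parameter.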
1. Generate missing assets (parallel)
Generate only what's not provided. Default archetype prompts:
- — modern podcast studio, two chairs, warm lighting, no people, 16:9
- — enthusiastic host, studio portrait, left-side framing, 1:1
- — pragmatic skeptic host, studio portrait, right-side framing, 1:1
If the input mentions specific personas (Step 3), tune the archetype to match the persona vibe — see Real-person handling below.
2. Resolve voice IDs (only if is set)
- Call →
{ voice_id, platform, sample_url }
- If is present: call
clone_voice(voice_url=sample_url, voice_name="host_a_voice")
→ set to the returned Kling voice ID
3. Parse input mode — URL vs topic
Strip flags (
,
, etc.) and key=value parameters from
. Inspect what remains.
URL mode — input contains a
URL:
- Call on the URL.
- Extract: product name, value prop, 2–3 specific features or facts, pricing, one jokeable detail.
- Use these as the script's factual anchors.
Topic mode — input is free-form prose (no URL):
- Treat the whole input as the brief. Parse for:
- Subject — what the conversation is about
- Hosts — explicit if mentioned ("I and Elon Musk", "two scientists", "Joe and Sarah"); otherwise use defaults (enthusiastic host + skeptic host)
- Angle — debate / interview / explainer / casual
- Concrete facts — any specific claims, numbers, dates, quotes the user gave
- If no concrete facts are given, use 2–3 clearly framed observations or hypotheses to anchor jokes and the "wait, actually..." pivot. Do not present invented claims as facts; if factual accuracy matters for the topic, ask for a source or URL.
- If the user says "I and X" or "me and X", Host A = the user (use flow if not already, or default avatar) and Host B = X.
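The URL-versus-topic decision above can be sketched in Python. The sketch assumes a URL means an http:// or https:// link and that the input has already had flags and parameters stripped; parse_mode and user_is_host_a are hypothetical names used only for illustration.

```python
import re

# Assumption: only http:// and https:// links count as URLs.
URL_RE = re.compile(r"https?://\S+")

# "I and X" / "me and X" anywhere in the brief marks the user as Host A.
SELF_HOST_RE = re.compile(r"\b(?:I|me)\s+and\b", re.IGNORECASE)


def parse_mode(stripped: str) -> tuple[str, str]:
    """Return ("url", first_url) if a URL is present, else ("topic", brief)."""
    match = URL_RE.search(stripped)
    if match:
        # Trim punctuation that commonly trails a URL inside prose.
        return ("url", match.group(0).rstrip(".,)"))
    return ("topic", stripped)


def user_is_host_a(brief: str) -> bool:
    """True when the brief uses the "I and X" or "me and X" pattern."""
    return SELF_HOST_RE.search(brief) is not None
```

Searching for a URL first means a mixed prompt, with a URL embedded in prose, resolves to URL mode.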
4. Real-person handling (topic mode only)
If the parsed input names a specific real public figure as a host (e.g. "Elon Musk", "Taylor Swift", "Joe Rogan"):
- Default behavior: do NOT auto-generate that person's photographic likeness. Generate an archetype portrait matching the persona vibe — e.g. "tech-billionaire-energy CEO at a podcast desk" for an Elon-style host, "pop-star aesthetic" for a Taylor-style host. Clearly inspired-by, not impersonation.
- Override: if the user explicitly provides or , use the provided image as-is. The user takes responsibility for likeness rights.
- Voices: same logic — default to a generic Kling preset; only use a cloned voice when the user provides one ( / ) or invokes (which clones the user's own voice for Host A).
- Script tone: the dialogue can riff on the named persona's known public positions or vibe (e.g. Mars enthusiasm for Elon-style) — public-record opinions are fair game. Do NOT put specific defamatory, off-character, or fabricated-private-life statements in their mouth.
This guardrail keeps the skill creative ("I want a podcast where I argue with a tech CEO about Mars") without auto-generating deepfakes of named real people.
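The portrait rule above reduces to a two-way choice, sketched here in Python. The function name, argument names, and returned dictionary layout are all hypothetical; the sketch only encodes the rule that a user-supplied image is used as-is and everything else is generated from an archetype prompt rather than a named person's likeness.

```python
from typing import Optional


def portrait_plan(provided_image_url: Optional[str], archetype_prompt: str) -> dict:
    """Pick the portrait source for one host.

    A user-supplied image always wins. Otherwise the portrait is generated
    from an archetype prompt that describes a vibe, not a named person.
    """
    if provided_image_url:
        return {"source": "provided", "image_url": provided_image_url}
    return {"source": "generated", "prompt": archetype_prompt}


# Usage: an Elon-style Host B with no image supplied falls back to the archetype.
plan = portrait_plan(None, "tech-billionaire-energy CEO at a podcast desk")
```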
5. Write script
Write 4 acts × 2 lines (HOST_A / HOST_B). Each line ~10–12s of spoken dialogue.
Required (Matan rules — apply to both URL and topic modes):
- One specific joke tied to a concrete detail (scraped fact in URL mode; topic-derived claim in topic mode)
- One "wait, actually..." skeptic-flip moment
- At least one mid-sentence interruption
- Natural filler: "okay so", "wait", "right?", "i mean", "honestly"
- Real reactions, not generic praise
- Reference at least one actual feature name, price, claim, or quote
- Natural ending — no forced "bye!"
Acts: Hook → Feature deep-dive → The Turn → Verdict
(In topic mode the analogue: Hook → Substance → The Pivot → Verdict.)
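A script can be held in a small structure and given a mechanical pre-flight check, as in the Python sketch below. The Act class and check_script are hypothetical helpers, and they cover only the rules that can be tested by string matching (act order, the "wait, actually" pivot, filler phrases). Joke quality, interruptions, and natural endings still need a human or model read.

```python
from dataclasses import dataclass

URL_ACTS = ["Hook", "Feature deep-dive", "The Turn", "Verdict"]
TOPIC_ACTS = ["Hook", "Substance", "The Pivot", "Verdict"]
FILLERS = ["okay so", "wait", "right?", "i mean", "honestly"]


@dataclass
class Act:
    name: str
    host_a: str  # HOST_A line
    host_b: str  # HOST_B line


def check_script(acts: list[Act], expected_names: list[str]) -> list[str]:
    """Return a list of problems; an empty list means these checks pass."""
    problems = []
    if [act.name for act in acts] != expected_names:
        problems.append("acts must be " + ", ".join(expected_names) + " in order")
    text = " ".join(f"{act.host_a} {act.host_b}" for act in acts).lower()
    if "wait, actually" not in text:
        problems.append('missing "wait, actually..." skeptic-flip')
    if not any(filler in text for filler in FILLERS):
        problems.append("no natural filler phrases")
    return problems
```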
6. Generate video acts (subagent, sequential)
Delegate to a subagent with all resolved assets and the script. The subagent runs acts 1→2→3→4 sequentially — do NOT parallelize.
Each act: one
call (
,
,
). Pass
reference_images=[bg_img, host_a_img, host_b_img]
and
voice_ids=[voice_a, voice_b]
. Optional knobs (added by
BACK-339, 2026-05-10):
for higher-fidelity kling output (longer wall-clock; reserve for high-stakes renders), and
to pin a specific kling family member if you need reproducibility across runs. Three shots:
- Wide 5s: both hosts, no voice token
- MCU-A 5s: <<<voice_1>>> '<HOST_A line>'
- MCU-B 5s: <<<voice_2>>> '<HOST_B line>'
Emotional beats per act:
- Act 1: A excited, B skeptical
- Act 2: A gesturing/explaining, B questioning
- Act 3: A firm, B surprised and reconsidering
- Act 4: A satisfied, B conceding
After act 4, the subagent calls edit_concat([act1, act2, act3, act4]) and returns the final video URL.
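The three-shot plan per act can be sketched in Python as plain data. This is not the generation tool's real request schema: the dictionary layout, the function names, and the wording of the wide-shot prompt are assumptions made for illustration. Only the shot names, the 5s durations, the voice tokens, and the emotional beats come from the text above.

```python
# Emotional beats per act, copied from the list above.
BEATS = [
    "A excited, B skeptical",
    "A gesturing/explaining, B questioning",
    "A firm, B surprised and reconsidering",
    "A satisfied, B conceding",
]


def act_shots(act_number: int, host_a_line: str, host_b_line: str) -> list[dict]:
    """Build the Wide / MCU-A / MCU-B plan for one act (acts are numbered 1 to 4)."""
    beat = BEATS[act_number - 1]
    return [
        # Wide shot: both hosts in frame, no voice token.
        {"shot": "Wide", "seconds": 5,
         "prompt": f"both hosts, Host A LEFT, Host B RIGHT, {beat}"},
        {"shot": "MCU-A", "seconds": 5, "prompt": f"<<<voice_1>>> '{host_a_line}'"},
        {"shot": "MCU-B", "seconds": 5, "prompt": f"<<<voice_2>>> '{host_b_line}'"},
    ]


def total_seconds(script: list[tuple[str, str]]) -> int:
    """Sum shot durations across all acts; script is a list of (A line, B line)."""
    return sum(
        shot["seconds"]
        for number, (line_a, line_b) in enumerate(script, start=1)
        for shot in act_shots(number, line_a, line_b)
    )
```

With four acts of three 5s shots each, total_seconds returns 60, matching the 4 acts × 15s structure.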
7. Output
Return the final video URL and a one-sentence verdict.
Do not call — Whisper auto-transcription is unreliable on the domain-specific terms typical of podcast dialogue (product names, persona names, technical jargon). Native Kling Omni audio is the deliverable.
Rules:
- must be valid Kling voice IDs — never use name-style strings like
- Host A always LEFT (), Host B always RIGHT () — never swapped
Load-bearing phrases
These anchors keep the podcast output coherent across URL and topic modes:
| Phrase | Where | Why load-bearing |
|---|---|---|
| Host A always LEFT, Host B always RIGHT | Layout and shot prompts | Prevents host identity swapping across the four separate act renders. |
| | Overall structure | Keeps the concat predictable and avoids uneven act pacing. |
| Hook → Feature deep-dive → The Turn → Verdict | Script structure | Gives the episode a conversational arc instead of four disconnected reactions. |
| skeptic-flip moment | Script requirements | Creates the pivot that makes the podcast feel like a real exchange. |
| | Output rule | Avoids low-quality burned captions on fast two-host dialogue with names and jargon. |
Engine choice: Kling v3-omni for native two-host dialogue
Use Kling v3-omni for the four acts because it supports native dialogue with two reference hosts and voice tokens in a single shot plan. The tradeoff is that acts run sequentially for consistency and can take longer than pure edit/composite flows. Do not add a separate caption or music layer by default; the value of this skill is the native spoken exchange.
Runtime expectations
Typical wall-clock is 8-18 minutes:
| Step | Wall clock | Notes |
|---|---|---|
| Missing asset generation | 30-90s | Skipped for provided background/host refs |
| URL/topic parse + script | 1-3 min | URL mode depends on page fetch quality |
| Four Kling acts | 6-14 min | Runs sequentially to reduce host/voice drift |
| Concat + return | 30-90s | Final URL only; captions skipped by default |
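Summing the per-step ranges gives a quick sanity check on the headline figure. In the Python sketch below, the lower bounds add up to 8 minutes and the upper bounds to 20. The quoted 8-18 minute typical range presumably reflects that steps rarely all hit their worst case in the same run; that reading is an assumption, not something the table states.

```python
# Per-step wall-clock ranges from the table, in seconds (low, high).
STEPS = {
    "Missing asset generation": (30, 90),
    "URL/topic parse + script": (60, 180),
    "Four Kling acts": (360, 840),
    "Concat + return": (30, 90),
}

low_seconds = sum(low for low, _ in STEPS.values())
high_seconds = sum(high for _, high in STEPS.values())

print(low_seconds / 60, high_seconds / 60)  # prints: 8.0 20.0
```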
Examples
URL mode (review a website / repo / blog):
/pika:podcast https://pika.art
/pika:podcast https://github.com/anthropics/claude-code
/pika:podcast https://cursor.com use_avatar
Topic mode (free-form brief):
/pika:podcast Two AI researchers debate whether AGI arrives before 2030
/pika:podcast I and a Mars-obsessed tech CEO talk about colonization timelines
/pika:podcast interview with a seed-stage VC about what kills most startups
/pika:podcast podcast about quantum computing breakthroughs in 2026
Mixed (URL inside a topic prompt — agent prefers URL mode if a valid URL is found):
/pika:podcast podcast about https://pika.art with skeptical investor energy