video-podcast-maker

REQUIRED: Load Remotion Best Practices First
This skill depends on `remotion-best-practices`. You MUST invoke it before proceeding:
Invoke the skill/tool named: remotion-best-practices

Video Podcast Maker

Automated pipeline that turns a topic into a 4K horizontal knowledge video for Bilibili. Stack: coding agent + TTS backend + Remotion + FFmpeg.

Bootstrap

Resolve `SKILL_DIR` to the directory containing this `SKILL.md`. If your agent exposes a built-in skill directory variable (e.g. `${CLAUDE_SKILL_DIR}`), map it to `SKILL_DIR`.
```bash
SKILL_DIR="${SKILL_DIR:-${CLAUDE_SKILL_DIR}}"
```

1. Update check (notify-only, throttled to 24h)

```bash
"${SKILL_DIR}/scripts/check_update.sh"
```

2. Prerequisites (CLIs + backend env vars)

```bash
python3 "${SKILL_DIR}/scripts/check_prereqs.py"
```

**`check_update.sh` output**:
- `UPDATE_AVAILABLE vX.Y.Z -> vA.B.C` — tell the user the version delta and ask before running `git -C "${SKILL_DIR}" pull --ff-only`. **Notify-only by design — never pull without consent (the skill directory belongs to the user).**
- `UP_TO_DATE` / `SKIPPED_RECENT_CHECK` / `MANUAL_INSTALL` — continue silently.
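
The handling of these status lines can be sketched as follows. This is a minimal illustration rather than the skill's actual logic: the function name `interpret_update_status` and the returned action labels are hypothetical, and only the status strings come from the list above.

```python
import re

# Status lines that mean "nothing to do" (taken from the list above).
SILENT_STATUSES = {"UP_TO_DATE", "SKIPPED_RECENT_CHECK", "MANUAL_INSTALL"}

# Matches a line such as "UPDATE_AVAILABLE v1.2.0 -> v1.3.0".
UPDATE_RE = re.compile(r"^UPDATE_AVAILABLE\s+(v\S+)\s+->\s+(v\S+)$")


def interpret_update_status(line: str) -> dict:
    """Map one line of check_update.sh output to an action for the agent."""
    line = line.strip()
    match = UPDATE_RE.match(line)
    if match:
        # Notify-only: report the version delta and ask before any git pull.
        return {"action": "ask_user", "current": match.group(1), "latest": match.group(2)}
    if line in SILENT_STATUSES:
        return {"action": "continue"}
    # Assumption of this sketch: unrecognized output never triggers a pull.
    return {"action": "continue", "note": "unrecognized status"}
```

The key design point is that no branch performs the pull itself; the `ask_user` result only carries the two versions so the agent can report the delta.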

**Prereqs failures** — see README.md for setup. The check is backend-aware (resolves `TTS_BACKEND` env → `user_prefs.json` `global.tts.backend` → `edge` default), so only env vars required by the active backend are validated.
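
The backend resolution order described above (environment variable, then `user_prefs.json`, then the `edge` default) might look like this. It is a sketch under assumptions: `resolve_tts_backend` is a hypothetical name, the preferences are passed in as an already-parsed dict, and `"azure"` in the usage lines is only an illustrative backend identifier.

```python
from collections.abc import Mapping

DEFAULT_BACKEND = "edge"


def resolve_tts_backend(env, prefs) -> str:
    """Resolve the active backend: TTS_BACKEND env, then prefs, then 'edge'."""
    from_env = env.get("TTS_BACKEND", "").strip()
    if from_env:
        return from_env
    # Walk global.tts.backend defensively; any missing level falls through.
    node = prefs
    for key in ("global", "tts", "backend"):
        if not isinstance(node, Mapping) or key not in node:
            return DEFAULT_BACKEND
        node = node[key]
    if isinstance(node, str) and node:
        return node
    return DEFAULT_BACKEND


# Usage: the environment variable wins over the preferences file.
backend = resolve_tts_backend(
    {"TTS_BACKEND": "azure"},
    {"global": {"tts": {"backend": "edge"}}},
)
```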

> **Design Learning shortcut**: If the user provides a reference video/image or asks to save/list/delete style profiles, see [references/design-learning.md](references/design-learning.md) instead of running the workflow below.

---

Execution Modes

Detect at workflow start:
  • "Make a video about..." / no special instructions → Auto Mode (default)
  • "I want to control each step" / "interactive" → Interactive Mode

Auto Mode defaults

Full pipeline with sensible defaults. Mandatory stop at Step 9 (Studio review); Step 10 (4K render) only fires when the user says "render 4K" / "render final".
| Step | Decision | Auto Default |
|------|----------|--------------|
| 3 | Title position | top-center |
| 5 | Media assets | Skip (text-only animations) |
| 7 | Thumbnail method | Remotion-generated (16:9 + 4:3) |
| 9 | Outro animation | Pre-made MP4 (white/black by theme) |
| 12 | Subtitle method | Remotion-native (skip legacy FFmpeg burn) |
| 14 | Cleanup | Auto-clean temp files |
Override any default in the initial request:
  • "make a video about AI, burn subtitles" → auto + subtitles on
  • "use dark theme, AI thumbnails" → auto + dark + imagen
  • "need screenshots" → auto + media collection enabled
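
One way to picture "defaults plus overrides" is a plain dictionary merge. The key names and value spellings below are hypothetical labels for the rows of the Auto Mode table; how the agent maps a phrase such as "need screenshots" onto an override is left out of this sketch.

```python
# Auto Mode defaults, one entry per decision in the table above.
AUTO_DEFAULTS = {
    "title_position": "top-center",        # Step 3
    "media_assets": "skip",                # Step 5
    "thumbnail_method": "remotion",        # Step 7
    "outro_animation": "premade-mp4",      # Step 9
    "subtitle_method": "remotion-native",  # Step 12
    "cleanup": "auto",                     # Step 14
}


def apply_overrides(overrides: dict) -> dict:
    """Return the Auto Mode settings with user overrides layered on top."""
    unknown = set(overrides) - set(AUTO_DEFAULTS)
    if unknown:
        raise KeyError(f"unknown decision(s): {sorted(unknown)}")
    return {**AUTO_DEFAULTS, **overrides}


# Usage: "need screenshots" turns media collection on and keeps the rest.
settings = apply_overrides({"media_assets": "collect"})
```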

Interactive Mode

Prompts at each decision point.

Workflow

At Step 1 start, create one task per step in your agent's tracker (Claude Code `TaskCreate` / Codex todo list / equivalent). Mark `in_progress` on start, `completed` on finish. Files in `videos/{name}/` are the durable record: if interrupted, inspect the directory to determine where to resume.
| # | Step | Output | Phase file |
|---|------|--------|------------|
| 1 | Define topic direction | `topic_definition.md` | workflow-script.md |
| 2 | Research topic | `topic_research.md` | workflow-script.md |
| 3 | Design 5-7 sections | (in-memory) | workflow-script.md |
| 4 | Write narration script | `podcast.txt` | workflow-script.md |
| 4.5 | Pronunciation pre-flight (zh-CN) | `phonemes.json` | workflow-script.md |
| 5 | Collect media (Auto: skip) | `media_manifest.json` | workflow-production.md |
| 6 | Generate publish info (Part 1) | `publish_info.md` | workflow-production.md |
| 7 | Generate thumbnails (16:9 + 4:3) | `thumbnail_*.png` | workflow-production.md |
| 8 | Generate TTS audio | `podcast_audio.wav`, `timing.json` | workflow-production.md |
| **9** | **Remotion composition + Studio preview** | | **workflow-production.md** |
| 10 | Render 4K video (only on user request) | `output.mp4` | workflow-production.md |
| 11 | Mix background music | `video_with_bgm.mp4` | workflow-production.md |
| 12 | Finalize (optional legacy subtitle burn) | `final_video.mp4` | workflow-publish.md |
| 13 | Complete publish info (Part 2) | chapter timestamps | workflow-publish.md |
| **14** | **Verify output (`scripts/verify_output.py`)** | | **workflow-publish.md** |
| 15 | Generate vertical shorts (optional) | `shorts/` | workflow-publish.md |
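
Because the files in `videos/{name}/` are the durable record, finding the resume point can be sketched as a lookup against the Output column. This is an illustrative helper, not part of the skill: the function names are hypothetical. Steps with no single output file (3, 9, 13, 14), the `thumbnail_*.png` glob of Step 7, and optional Step 15 are left out. It reports the latest step whose file exists rather than the first missing file, since Steps 4.5 and 5 may be skipped legitimately.

```python
from pathlib import Path
from typing import Iterable, Optional

# (step, output file) in workflow order, taken from the step table.
STEP_OUTPUTS = [
    ("1", "topic_definition.md"),
    ("2", "topic_research.md"),
    ("4", "podcast.txt"),
    ("4.5", "phonemes.json"),
    ("5", "media_manifest.json"),
    ("6", "publish_info.md"),
    ("8", "timing.json"),
    ("10", "output.mp4"),
    ("11", "video_with_bgm.mp4"),
    ("12", "final_video.mp4"),
]


def last_completed_step(filenames: Iterable[str]) -> Optional[str]:
    """Return the latest step whose output file is present, or None."""
    present = set(filenames)
    done = None
    for step, output in STEP_OUTPUTS:
        if output in present:
            done = step
    return done


def last_completed_step_in(video_dir: Path) -> Optional[str]:
    """Same check against a real videos/{name}/ directory."""
    return last_completed_step(p.name for p in video_dir.iterdir() if p.is_file())
```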
Mandatory stops (bold rows above):
  • Step 9: Studio review. MUST launch `npx remotion studio` and wait for user feedback before rendering. NEVER render 4K until the user explicitly confirms ("render 4K" / "render final").
  • Step 14: `verify_output.py`. MUST pass before declaring the video done. Exit 0 = green; exit 2 = warnings, still publishable. Auto-fixes common omissions (creates `final_video.mp4` if missing). For machine-readable output add `--format json` (auto when piped).
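
The Step 14 exit-code contract can be expressed as a small decision helper. The function names are hypothetical, and treating every code other than 0 and 2 as a failure is an assumption of this sketch; the text above only defines 0 and 2.

```python
def interpret_verify_exit(code: int) -> str:
    """Classify the exit code of verify_output.py."""
    if code == 0:
        return "pass"      # green
    if code == 2:
        return "warnings"  # still publishable
    return "fail"          # assumption: anything else blocks publishing


def may_declare_done(code: int, warnings_reviewed: bool = False) -> bool:
    """The video is done on exit 0, or on exit 2 once warnings were reviewed."""
    status = interpret_verify_exit(code)
    return status == "pass" or (status == "warnings" and warnings_reviewed)
```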
Pre-render audit (recommended), before Step 9:
```bash
python3 ${SKILL_DIR}/scripts/audit_beat_sync.py <Video.tsx> <timing.json>
```
Flags beats that drift > 1.5s from narration. Especially important for kinetic-typography videos.
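
The drift rule itself is a simple threshold comparison. The sketch below assumes both inputs have already been reduced to dictionaries mapping a cue name to a time in seconds; how `audit_beat_sync.py` extracts those values from `Video.tsx` and `timing.json` is not shown here, and the cue names are made up.

```python
DRIFT_THRESHOLD_S = 1.5  # threshold quoted in the text above


def find_drifting_beats(beats: dict, narration: dict, threshold: float = DRIFT_THRESHOLD_S) -> list:
    """Return (cue, drift_seconds) for beats further than `threshold` from narration."""
    flagged = []
    for name, beat_time in beats.items():
        if name not in narration:
            continue  # no narration timestamp to compare against
        drift = beat_time - narration[name]
        if abs(drift) > threshold:
            flagged.append((name, round(drift, 3)))
    return flagged


# Usage with hypothetical cues: only point_1 is 2.0 s late.
flagged = find_drifting_beats(
    {"intro": 0.0, "point_1": 12.0, "outro": 58.4},
    {"intro": 0.2, "point_1": 10.0, "outro": 58.0},
)
```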

Validation Checkpoints

| After Step | Check |
|------------|-------|
| 8 (TTS) | `podcast_audio.wav` plays · `timing.json` covers all sections · SRT is UTF-8 |
| 10 (Render) | `output.mp4` is 3840×2160 · audio-video sync · no black frames |
| 14 (Verify) | `verify_output.py` exits 0 (or 2 with reviewed warnings) |
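
Two of the Step 8 checks lend themselves to a quick script. The sketch assumes that sections are marked in the narration script as `[SECTION:xxx]` and that the section names present in `timing.json` have already been read into a list, since the file's exact layout is not given here. Both function names are hypothetical.

```python
import re

# Section markers such as [SECTION:key_point].
SECTION_RE = re.compile(r"\[SECTION:([a-z0-9_]+)\]")


def missing_sections(script_text: str, timed_sections) -> list:
    """Sections marked in the script that have no entry in the timing data."""
    timed = set(timed_sections)
    return [name for name in SECTION_RE.findall(script_text) if name not in timed]


def is_utf8(data: bytes) -> bool:
    """True if the raw bytes (for example an SRT file) decode as UTF-8."""
    try:
        data.decode("utf-8")
    except UnicodeDecodeError:
        return False
    return True
```

An empty list from `missing_sections` corresponds to "`timing.json` covers all sections".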

Hard Rules

| Rule | Requirement |
|------|-------------|
| Single Project | All videos under `videos/{name}/` in user's Remotion project. NEVER create a new project per video. |
| 4K Output | 3840×2160 (or 2160×3840 vertical), use `scale(2)` wrapper over 1920×1080 design space |
| Audio Sync | All animations driven by `timing.json` timestamps |
| Thumbnail | MUST generate both 16:9 (1920×1080) AND 4:3 (1200×900); see design-guide.md |
| Studio Before Render | MUST launch `remotion studio` for review. NEVER render 4K until user explicitly confirms. |
| `--public-dir` | Every Remotion command uses `--public-dir videos/{name}/` |
Visual minimums (text sizes, content width, safe zones, animation safety) live in references/design-guide.md. MUST load before Step 9.
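
The arithmetic behind the 4K Output and Audio Sync rules is small enough to spell out. The sketch below is written in Python purely for illustration (the compositions themselves live in the Remotion project); the helper names are hypothetical, and rounding to the nearest frame is an assumption.

```python
FPS = 30                     # frame rate from the output specs
DESIGN_SIZE = (1920, 1080)   # design space for horizontal videos
SCALE = 2                    # factor applied by the scale(2) wrapper


def output_size(design=DESIGN_SIZE, scale: int = SCALE) -> tuple:
    """Pixel size of the render for a given design space."""
    return (design[0] * scale, design[1] * scale)


def seconds_to_frame(seconds: float, fps: int = FPS) -> int:
    """Convert a timing.json timestamp to the nearest frame index."""
    return round(seconds * fps)
```

With these values, `output_size()` gives `(3840, 2160)`, a 1080×1920 vertical design gives `(2160, 3840)`, and a cue at 2.5 s lands on frame 75.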

Output Specs

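| Parameter | Horizontal (16:9) | Vertical (9:16) |
|-----------|-------------------|-----------------|
| Resolution | 3840×2160 (4K) | 2160×3840 (4K) |
| Frame rate | 30 fps | 30 fps |
| Encoding | H.264, 16 Mbps | H.264, 16 Mbps |
| Audio | AAC, 192 kbps | AAC, 192 kbps |
| Duration | 1-15 min | 60-90 s (highlight) |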
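
These bitrates give a rough upper estimate of file size, which is handy when planning disk space. The calculation below assumes the encoder actually holds the target bitrates for the whole duration and ignores container overhead, so treat the result as an approximation; the function name is hypothetical.

```python
VIDEO_BITRATE_BPS = 16_000_000  # 16 Mbps H.264
AUDIO_BITRATE_BPS = 192_000     # 192 kbps AAC


def estimate_size_bytes(duration_s: float) -> float:
    """Approximate size: (video + audio bits per second) * seconds / 8."""
    return (VIDEO_BITRATE_BPS + AUDIO_BITRATE_BPS) * duration_s / 8


# A 15-minute video: 16,192,000 bit/s * 900 s / 8 = 1,821,600,000 bytes (about 1.8 GB).
fifteen_minutes = estimate_size_bytes(15 * 60)
```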

Per-Video Layout

```
project-root/                           # Remotion project root
├── src/remotion/                       # Remotion source (Root.tsx, compositions, index.ts)
├── videos/{video-name}/                # Per-video assets (the agent's working dir)
│   ├── topic_definition.md             # Step 1
│   ├── topic_research.md               # Step 2
│   ├── podcast.txt                     # Step 4: narration script
│   ├── phonemes.json                   # Step 4.5: zh-CN pronunciation overrides
│   ├── podcast_audio.wav               # Step 8: TTS audio
│   ├── podcast_audio.srt               # Step 8: subtitles
│   ├── timing.json                     # Step 8: timeline (drives animations)
│   ├── thumbnail_*.png                 # Step 7
│   ├── output.mp4                      # Step 10: 4K render (no BGM)
│   ├── video_with_bgm.mp4              # Step 11
│   ├── final_video.mp4                 # Step 12: final output
│   └── bgm.mp3                         # Background music
└── remotion.config.ts
```

`--public-dir` per video

Remotion commands MUST use `--public-dir videos/{name}/`: each video's assets stay in its own directory, with no copy to `public/`. This enables parallel renders.
```bash
npx remotion studio src/remotion/index.ts --public-dir videos/{name}/
npx remotion render src/remotion/index.ts CompositionId videos/{name}/output.mp4 --public-dir videos/{name}/ --video-bitrate 16M
npx remotion still src/remotion/index.ts Thumbnail16x9 videos/{name}/thumbnail.png --public-dir videos/{name}/
```
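
When these commands are issued from a script, building them as argument lists keeps the `--public-dir` rule in one place. The helper names below are hypothetical; the entry point, flags, and output paths are the ones shown in the commands above.

```python
ENTRY_POINT = "src/remotion/index.ts"


def public_dir(name: str) -> str:
    """Per-video asset directory passed to --public-dir."""
    return f"videos/{name}/"


def studio_cmd(name: str) -> list:
    """Argument list for launching Remotion Studio on one video."""
    return ["npx", "remotion", "studio", ENTRY_POINT, "--public-dir", public_dir(name)]


def render_cmd(name: str, composition_id: str) -> list:
    """Argument list for the 4K render of one video."""
    return [
        "npx", "remotion", "render", ENTRY_POINT, composition_id,
        f"videos/{name}/output.mp4",
        "--public-dir", public_dir(name),
        "--video-bitrate", "16M",
    ]
```

A list built this way can be passed to `subprocess.run(..., check=True)` from the project root, and only after the user has confirmed the render.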

Naming

  • Video name `{video-name}`: lowercase English, hyphen-separated (e.g. `reference-manager-comparison`)
  • Section name `{section}`: lowercase English, underscore-separated, matches `[SECTION:xxx]`
  • Thumbnail naming (16:9 AND 4:3 both required):

| Type | 16:9 | 4:3 |
|------|------|-----|
| Remotion | `thumbnail_remotion_16x9.png` | `thumbnail_remotion_4x3.png` |
| AI | `thumbnail_ai_16x9.png` | `thumbnail_ai_4x3.png` |
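
The naming rules above can be checked mechanically. In this sketch the regular expressions also accept digits, which is an assumption (the rules only say "lowercase English"), and the helper name `thumbnail_names` is hypothetical.

```python
import re

# Lowercase words joined by hyphens, e.g. reference-manager-comparison.
VIDEO_NAME_RE = re.compile(r"^[a-z0-9]+(?:-[a-z0-9]+)*$")
# Lowercase words joined by underscores, e.g. key_point.
SECTION_NAME_RE = re.compile(r"^[a-z0-9]+(?:_[a-z0-9]+)*$")


def thumbnail_names(kind: str) -> dict:
    """Both required thumbnail filenames for kind 'remotion' or 'ai'."""
    if kind not in ("remotion", "ai"):
        raise ValueError(f"unknown thumbnail type: {kind!r}")
    return {
        "16x9": f"thumbnail_{kind}_16x9.png",
        "4x3": f"thumbnail_{kind}_4x3.png",
    }
```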

Additional Resources

Load on demand — do NOT load all at once:
| File | Load when |
|------|-----------|
| references/workflow-script.md | Steps 1-4 (topic → script) |
| references/workflow-production.md | Steps 5-11 (media → TTS → Remotion → render → BGM) |
| references/workflow-publish.md | Steps 12-15 (subtitles, publish, cleanup, shorts) |
| references/design-guide.md | MUST load before Step 9: visual minimums, typography, animation safety |
| references/design-learning.md | User provides a reference video/image, or manages style profiles |
| references/azure-tts-pitfalls.md | Choosing Azure voice/style, debugging hoarse/glitchy audio |
| references/troubleshooting.md | On error, or user asks about preferences/BGM |
| templates/presets/kinetic-typography/ | Bold type-driven preset (opinion / argument / declaration videos) |
| examples/ | Reference for composition structure and `timing.json` format |
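
For the three workflow files, "load on demand" reduces to a lookup by step number. The function below is an illustrative sketch with a hypothetical name; it places Step 4.5 with the script phase, matching the step table.

```python
def reference_for_step(step: float) -> str:
    """Return the workflow reference file that covers a given step number."""
    if 1 <= step < 5:
        return "references/workflow-script.md"      # Steps 1-4, including 4.5
    if 5 <= step < 12:
        return "references/workflow-production.md"  # Steps 5-11
    if 12 <= step <= 15:
        return "references/workflow-publish.md"     # Steps 12-15
    raise ValueError(f"no workflow step {step}")
```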

Script suite dispatcher

All scripts under `${SKILL_DIR}/scripts/` are reachable through one hierarchical entry point:
```bash
python3 ${SKILL_DIR}/scripts/cli.py --help                  # list resources
python3 ${SKILL_DIR}/scripts/cli.py <resource> --help       # list actions
python3 ${SKILL_DIR}/scripts/cli.py <resource> <action> --help    # forwards to underlying script
python3 ${SKILL_DIR}/scripts/cli.py schema [<method>]       # JSON parameter schema
```
Routes: `tts run|validate`, `verify`, `audit beats`, `shorts gen`, `design list|show|delete|add`, `prereqs`, `prefs get|migrate|backend|bgm-path`, `schema [<method>]`. Direct script invocation (`python3 scripts/<name>.py ...`) keeps working; the dispatcher is additive.
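
To illustrate what "hierarchical" and "additive" mean here, the sketch below resolves a resource/action pair to an underlying script and forwards the remaining arguments. It is not the code of `cli.py`: the pairing of routes with scripts is an assumption, only the three script names that appear in this document are used, and `resolve_route` is a hypothetical name.

```python
# (resource, action) -> script. A None action means the resource has no sub-action.
# Assumption: these route-to-script pairings are illustrative, not confirmed.
ROUTES = {
    ("verify", None): "verify_output.py",
    ("audit", "beats"): "audit_beat_sync.py",
    ("prereqs", None): "check_prereqs.py",
}


def resolve_route(argv: list) -> tuple:
    """Return (script, forwarded_args) for dispatcher-style arguments."""
    if not argv:
        raise ValueError("usage: cli.py <resource> [<action>] [args...]")
    resource, rest = argv[0], argv[1:]
    if rest and (resource, rest[0]) in ROUTES:
        return ROUTES[(resource, rest[0])], rest[1:]
    if (resource, None) in ROUTES:
        return ROUTES[(resource, None)], rest
    raise ValueError(f"unknown route: {' '.join(argv)}")
```

Because the dispatcher only maps names to existing scripts and passes the arguments through, calling a script directly stays equivalent.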

User Preferences

The skill auto-learns and applies preferences. Full commands and learning details: references/troubleshooting.md.
  • Storage: `user_prefs.json` (auto-created from `user_prefs.template.json`, schema in `prefs_schema.json`).
  • Priority: `Root.tsx defaults < global < topic_patterns[type] < current instructions`.
  • User commands: "show preferences" · "reset preferences" · "save as X default".
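
The priority chain can be read as a layered merge in which later layers win. The sketch below assumes `user_prefs.json` has top-level `global` and `topic_patterns` keys and that nested settings merge key by key; the setting names in the usage example (`theme`, `fps`, `tts.backend`) and the function names are illustrative only.

```python
from copy import deepcopy


def deep_merge(base: dict, override: dict) -> dict:
    """Return a copy of base with override applied, merging nested dicts."""
    merged = deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = deepcopy(value)
    return merged


def resolve_preferences(root_defaults: dict, prefs: dict, topic_type, instructions: dict) -> dict:
    """Apply: Root.tsx defaults < global < topic_patterns[type] < current instructions."""
    layers = [
        root_defaults,
        prefs.get("global", {}),
        prefs.get("topic_patterns", {}).get(topic_type, {}),
        instructions,
    ]
    resolved = {}
    for layer in layers:
        resolved = deep_merge(resolved, layer)
    return resolved


# Usage: the topic pattern overrides the global theme; the instruction overrides the backend.
prefs = {
    "global": {"theme": "dark", "tts": {"backend": "edge"}},
    "topic_patterns": {"tech": {"theme": "light"}},
}
resolved = resolve_preferences({"theme": "light", "fps": 30}, prefs, "tech", {"tts": {"backend": "azure"}})
```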

Troubleshooting

See references/troubleshooting.md for errors, BGM options, preference learning, and design-learning issues.