podcast-pipeline

Preamble (runs on skill start)

```bash
# Version check (silent if up to date)
python3 telemetry/version_check.py 2>/dev/null || true

# Telemetry opt-in (first run only, then remembers your choice)
python3 telemetry/telemetry_init.py 2>/dev/null || true
```

> **Privacy:** This skill logs usage locally to `~/.ai-marketing-skills/analytics/`. Remote telemetry is opt-in only. No code, file paths, or repo content is ever collected. See `telemetry/README.md`.

---

Podcast-to-Everything Pipeline

Turns podcast episodes into a full content calendar across every platform. One episode in, 15-20 content pieces out — scored, deduplicated, and scheduled.

Step 1: Ingest — Get the Transcript

Determine the input source and obtain a clean transcript.

Option A: RSS Feed (`--rss <url>`)

  1. Fetch the RSS feed XML
  2. Extract the latest episode's audio URL (or use `--episodes N` for batch)
  3. Download the audio file
  4. Transcribe via OpenAI Whisper API (with timestamps)
  5. Store transcript with episode metadata (title, date, description, duration)
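
Steps 1-2 of this flow can be sketched with the standard library alone (the feed below is a made-up example; the download and Whisper calls are omitted). Podcast RSS feeds conventionally list items newest-first, with the audio file in each item's `<enclosure>` element:

```python
import xml.etree.ElementTree as ET

def latest_episode_audio(feed_xml: str, episodes: int = 1) -> list[dict]:
    """Return metadata for the newest `episodes` items in an RSS feed."""
    root = ET.fromstring(feed_xml)
    results = []
    for item in root.iter("item"):  # document order ~= newest first
        enclosure = item.find("enclosure")
        results.append({
            "title": item.findtext("title", default=""),
            "date": item.findtext("pubDate", default=""),
            "audio_url": enclosure.get("url") if enclosure is not None else None,
        })
        if len(results) == episodes:
            break
    return results

# Hypothetical two-episode feed for illustration
feed = """<rss version="2.0"><channel>
  <item><title>Ep 42</title><pubDate>Mon, 15 Jan 2024 08:00:00 GMT</pubDate>
    <enclosure url="https://cdn.example.com/ep42.mp3" type="audio/mpeg"/></item>
  <item><title>Ep 41</title><pubDate>Mon, 08 Jan 2024 08:00:00 GMT</pubDate>
    <enclosure url="https://cdn.example.com/ep41.mp3" type="audio/mpeg"/></item>
</channel></rss>"""
```

A real feed may interleave non-audio items or use namespaced tags, so treat this as the happy path only.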

Option B: Raw Transcript (`--transcript <file>`)

  1. Read the transcript file (plain text, SRT, or VTT)
  2. Parse timestamps if present
  3. Extract episode metadata from filename or prompt user
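
For the SRT case, timestamp parsing is mechanical. A minimal sketch (VTT differs slightly — `.` instead of `,` before milliseconds, plus a header — so this handles only the SRT shape):

```python
import re

_TS = re.compile(r"(\d{2}):(\d{2}):(\d{2})[,.](\d{3})")

def _to_seconds(raw: str) -> float:
    """Convert '00:01:23,450' to seconds as a float."""
    h, m, s, ms = _TS.match(raw.strip()).groups()
    return int(h) * 3600 + int(m) * 60 + int(s) + int(ms) / 1000

def parse_srt(text: str) -> list[dict]:
    """Parse SRT cues into {start, end, text} segments (seconds)."""
    segments = []
    for block in text.strip().split("\n\n"):
        lines = block.strip().splitlines()
        # A cue is: index line, "start --> end" line, one or more text lines
        if len(lines) < 3 or "-->" not in lines[1]:
            continue
        start_raw, end_raw = lines[1].split("-->")
        segments.append({
            "start": _to_seconds(start_raw),
            "end": _to_seconds(end_raw),
            "text": " ".join(lines[2:]),
        })
    return segments

srt = """1
00:00:01,000 --> 00:00:04,500
Welcome back to the show.

2
00:00:04,500 --> 00:00:09,000
Today we talk about pricing."""
```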

Option C: Batch Mode (`--batch <rss_url> --episodes N`)

  1. Fetch RSS feed
  2. Extract the last N episodes
  3. Process each through the full pipeline
  4. Deduplicate across all episodes in the batch

Transcript cleanup

  • Remove filler words (um, uh, like, you know) for written content
  • Preserve original with timestamps for video clip suggestions
  • Split into logical segments by topic shift

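
The filler-word pass is a one-liner with a regex. The filler list below is illustrative; "like" is deliberately left alone because it is so often a legitimate word that blanket removal mangles sentences:

```python
import re

# Illustrative filler list, not exhaustive
_FILLERS = re.compile(r"\b(um+|uh+|you know)\b,?\s*", re.IGNORECASE)

def clean_for_writing(text: str) -> str:
    """Strip fillers for written formats. The timestamped original
    should be kept untouched for video clip suggestions."""
    return re.sub(r"\s{2,}", " ", _FILLERS.sub("", text)).strip()
```

Topic-shift segmentation is a harder, semantic problem and is left to the LLM pass in Step 2.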

Step 2: Editorial Brain — Deep Analysis

Feed the full transcript to the LLM with this extraction framework:

Extract these content atoms:

  1. Narrative Arcs — Complete story segments with setup → tension → resolution. Tag with start/end timestamps.
  2. Quotable Moments — Punchy, shareable statements. One-liners that stand alone. Must pass the "would someone screenshot this?" test.
  3. Controversial Takes — Opinions that go against conventional wisdom. The stuff that makes people reply "hard disagree" or "finally someone said it."
  4. Data Points — Specific numbers, percentages, dollar amounts, timeframes. Concrete proof points that add credibility.
  5. Stories — Personal anecdotes, case studies, client examples. Must have a character, a problem, and an outcome.
  6. Frameworks — Step-by-step processes, mental models, decision matrices. Anything structured that people would save or bookmark.
  7. Predictions — Forward-looking claims about trends, markets, technology. Hot takes about where things are going.

Output format per atom:

- Type: [narrative_arc | quote | controversial_take | data_point | story | framework | prediction]
- Content: [extracted text]
- Timestamp: [start - end, if available]
- Context: [what was being discussed]
- Viral Score: [0-100, see Step 4]
- Suggested platforms: [where this atom works best]

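
One way to hold an atom in code is a small dataclass that enforces the fields above. This is a sketch — the real pipeline's internal representation may differ:

```python
from dataclasses import dataclass, field, asdict
from typing import Optional

ATOM_TYPES = {"narrative_arc", "quote", "controversial_take",
              "data_point", "story", "framework", "prediction"}

@dataclass
class Atom:
    type: str
    content: str
    context: str
    viral_score: int = 0
    timestamp: Optional[tuple] = None          # (start_sec, end_sec) if known
    suggested_platforms: list = field(default_factory=list)

    def __post_init__(self):
        # Validate against the seven atom types and the 0-100 score range
        if self.type not in ATOM_TYPES:
            raise ValueError(f"unknown atom type: {self.type!r}")
        if not 0 <= self.viral_score <= 100:
            raise ValueError("viral_score must be 0-100")

atom = Atom(type="quote",
            content="Pricing is positioning.",
            context="SaaS pricing mistakes",
            viral_score=85,
            suggested_platforms=["twitter", "quote_cards"])
```

`asdict(atom)` then serializes directly into the `atoms.json` shape described in Step 7.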

Step 3: Content Generation — One Episode, Many Pieces

For each episode, generate ALL of these from the extracted atoms:

3a. Short-Form Video Clips (3-5 per episode)

- Hook: [First 3 seconds — pattern interrupt or bold claim]
- Clip segment: [Timestamp range from transcript]
- Caption overlay: [Text for the screen]
- Platform: [YouTube Shorts / TikTok / Instagram Reels]
- Why it works: [What makes this clippable]
Prioritize: controversial takes > stories with payoffs > surprising data points

3b. Twitter/X Threads (2-3 per episode)

- Thread hook (tweet 1): [Curiosity gap or bold opener]
- Thread body (5-10 tweets): [Each tweet is one complete thought]
- Thread closer: [CTA — follow, reply, retweet trigger]
- Source atoms: [Which content atoms feed this thread]
Rules: No tweet over 280 chars. Each tweet must stand alone. Use data points as proof.

3c. LinkedIn Article Draft (1 per episode)

- Headline: [Specific, benefit-driven]
- Hook paragraph: [Before the "see more" fold — must earn the click]
- Body: [3-5 sections with headers, 800-1200 words]
- CTA: [Engagement driver — question, not link]
- Hashtags: [3-5 relevant, not spammy]
Voice: Professional but not corporate. First-person. Story-driven.

3d. Newsletter Section (1 per episode)

- Section headline: [Scannable, specific]
- TL;DR: [One sentence, the core insight]
- Body: [3-5 bullet points, each with a takeaway]
- Pull quote: [The most shareable line from the episode]
- Link: [Back to full episode]

3e. Quote Cards (3-5 per episode)

- Quote text: [Max 20 words — must work as text overlay]
- Attribution: [Speaker name]
- Background suggestion: [Color/mood that matches the tone]
- Platform sizing: [1080x1080 for IG, 1200x675 for Twitter, 1080x1920 for Stories]

3f. Blog Post Outline (1 per episode)

- Title: [SEO-optimized, includes primary keyword]
- Primary keyword: [Search volume + difficulty estimate]
- Secondary keywords: [3-5 related terms]
- Meta description: [155 chars max]
- H2 sections: [5-7, each maps to a content atom]
- Internal linking opportunities: [Topics that connect to existing content]
- Estimated word count: [1500-2500]

3g. YouTube Shorts / TikTok Script (1 per episode)

- HOOK (0-3s): [Pattern interrupt — question, bold claim, or visual]
- SETUP (3-15s): [Context — why should they care]
- PAYOFF (15-45s): [The insight, data, or story resolution]
- CTA (45-60s): [Follow, comment prompt, or part 2 tease]
- On-screen text: [Key phrases to overlay]
- B-roll suggestions: [Visual ideas if not talking-head]


Step 4: Content Scoring — Viral Potential

Score every generated piece on three dimensions (each 0-100):

| Dimension | What It Measures | Signals |
| --- | --- | --- |
| Novelty | Is this new or surprising? | Contrarian takes, unexpected data, first-to-say |
| Controversy | Will people argue about this? | Strong opinions, challenges norms, picks a side |
| Utility | Can someone use this immediately? | Frameworks, how-tos, templates, specific numbers |

Viral Score = (Novelty × 0.4) + (Controversy × 0.3) + (Utility × 0.3)

Score thresholds:

  • 80+ → Priority publish. Schedule for peak engagement windows.
  • 60-79 → Solid content. Fill the calendar.
  • 40-59 → Filler. Use only if calendar has gaps.
  • Below 40 → Cut it. Not worth the publish slot.

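
The formula and thresholds translate directly into code:

```python
def viral_score(novelty: float, controversy: float, utility: float) -> float:
    """(Novelty × 0.4) + (Controversy × 0.3) + (Utility × 0.3), inputs 0-100."""
    for dim in (novelty, controversy, utility):
        if not 0 <= dim <= 100:
            raise ValueError("each dimension must be 0-100")
    return novelty * 0.4 + controversy * 0.3 + utility * 0.3

def tier(score: float) -> str:
    """Map a score onto the publish thresholds above."""
    if score >= 80:
        return "priority"
    if score >= 60:
        return "solid"
    if score >= 40:
        return "filler"
    return "cut"
```

The 0.4 weight on novelty means a piece cannot reach the 80+ priority tier on controversy and utility alone.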

Step 5: Dedup Engine

Before finalizing, check all generated content against:
  1. This batch — No two pieces should cover the same angle
  2. Recent history — Compare against last N days of output (default: 30)
  3. Similarity threshold — Flag any pair with >70% semantic overlap

Dedup rules:

  • If two pieces overlap >70%: keep the higher-scored one, cut the other
  • If a piece overlaps with recently published content: flag with ⚠️ and suggest a differentiation angle
  • Track all published content hashes in `output/content_history.json`
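
A minimal version of the in-batch rule, using difflib's character-level ratio as a cheap stand-in for semantic overlap (a production pipeline would more likely compare embeddings; the sample pieces are invented):

```python
import difflib

def overlap(a: str, b: str) -> float:
    """Rough similarity in [0, 1]; a stand-in for semantic overlap."""
    return difflib.SequenceMatcher(None, a.lower(), b.lower()).ratio()

def dedup(pieces: list, threshold: float = 0.70) -> list:
    """Keep the higher-scored piece of any pair overlapping > threshold."""
    kept = []
    # Highest score first, so the winner of each near-duplicate pair survives
    for piece in sorted(pieces, key=lambda p: p["viral_score"], reverse=True):
        if all(overlap(piece["content"], k["content"]) <= threshold for k in kept):
            kept.append(piece)
    return kept

pieces = [
    {"content": "Raise your prices. Your best customers will not even blink.", "viral_score": 88},
    {"content": "Raise your prices! Your best customers won't even blink.", "viral_score": 61},
    {"content": "Ship weekly. Momentum compounds faster than polish.", "viral_score": 74},
]
survivors = dedup(pieces)
```

The history check against the last N days follows the same pattern, except near-matches are flagged for a differentiation angle rather than cut.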

Step 6: Calendar Generation (`--calendar`)

Assemble scored, deduplicated content into a weekly publish calendar.

Scheduling rules:

  • Twitter/X: 1-2 per day, peak hours (8-10am, 12-1pm, 5-7pm ET)
  • LinkedIn: 1 per day max, Tuesday-Thursday mornings
  • YouTube Shorts/TikTok: 1 per day, evenings
  • Newsletter: Weekly, same day each week
  • Blog: 1-2 per week
  • Quote cards: Intersperse on low-content days
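
A greedy sketch of the scheduler: the highest-scored piece takes the earliest day on which its platform still has an open slot. The per-day caps are hypothetical distillations of the rules above, and times of day are omitted:

```python
from datetime import date, timedelta

# Hypothetical per-day caps distilled from the scheduling rules
DAILY_CAP = {"twitter": 2, "linkedin": 1, "youtube_shorts": 1}
LINKEDIN_DAYS = {1, 2, 3}  # Tue-Thu (Monday == 0)

def schedule(pieces: list, week_start: date) -> list:
    """Assign each piece a date within the week, best scores first."""
    used: dict = {}
    plan = []
    for piece in sorted(pieces, key=lambda p: p["viral_score"], reverse=True):
        platform = piece["platform"]
        for offset in range(7):
            day = week_start + timedelta(days=offset)
            if platform == "linkedin" and day.weekday() not in LINKEDIN_DAYS:
                continue
            if used.get((day, platform), 0) < DAILY_CAP.get(platform, 1):
                used[(day, platform)] = used.get((day, platform), 0) + 1
                plan.append({**piece, "date": day.isoformat()})
                break
    return plan

plan = schedule(
    [{"platform": "twitter", "viral_score": 85},
     {"platform": "linkedin", "viral_score": 80},
     {"platform": "twitter", "viral_score": 70},
     {"platform": "twitter", "viral_score": 60}],
    week_start=date(2024, 1, 15),  # a Monday
)
```

Note how the LinkedIn piece skips Monday and lands on Tuesday, while the third Twitter piece overflows to Tuesday once Monday's two slots fill.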

Calendar output format:

```json
{
  "week_of": "2024-01-15",
  "episode_source": "Episode Title - Guest Name",
  "content_pieces": [
    {
      "date": "2024-01-15",
      "time": "09:00 ET",
      "platform": "twitter",
      "type": "thread",
      "content": "...",
      "viral_score": 85,
      "status": "draft"
    }
  ],
  "total_pieces": 18,
  "avg_viral_score": 72,
  "coverage": {
    "twitter": 6,
    "linkedin": 3,
    "youtube_shorts": 3,
    "newsletter": 1,
    "blog": 1,
    "quote_cards": 4
  }
}
```


Step 7: Output

All output goes to the `output/` directory:

```
output/
├── episodes/
│   ├── YYYY-MM-DD-episode-slug/
│   │   ├── transcript.txt
│   │   ├── atoms.json          # Extracted content atoms
│   │   ├── content_pieces.json # All generated content
│   │   └── calendar.json       # Scheduled calendar
│   └── ...
├── calendar/
│   └── week-YYYY-WNN.json     # Aggregated weekly calendar
├── content_history.json        # Dedup tracking
└── pipeline_log.json           # Run history and stats
```

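
`content_history.json` can be maintained with plain hashing. This sketch covers exact-duplicate tracking only; the semantic overlap check belongs to Step 5 (the demo writes to a throwaway temp directory, not the real `output/`):

```python
import hashlib
import json
import tempfile
from pathlib import Path

def record_published(piece: dict, history_path: Path) -> str:
    """Append the SHA-256 of a piece's text to the history file
    so later runs can skip exact republishes."""
    digest = hashlib.sha256(piece["content"].encode("utf-8")).hexdigest()
    history = json.loads(history_path.read_text()) if history_path.exists() else []
    if digest not in history:
        history.append(digest)
        history_path.write_text(json.dumps(history, indent=2))
    return digest

# Demo against a throwaway file
history_file = Path(tempfile.mkdtemp()) / "content_history.json"
first = record_published({"content": "Ship weekly."}, history_file)
second = record_published({"content": "Ship weekly."}, history_file)
```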

CLI Reference

```bash
# Process latest episode from RSS feed
python podcast_pipeline.py --rss "https://feeds.example.com/podcast.xml"

# Process a local transcript
python podcast_pipeline.py --transcript episode-42.txt

# Batch process last 5 episodes
python podcast_pipeline.py --batch "https://feeds.example.com/podcast.xml" --episodes 5

# Generate weekly calendar from existing outputs
python podcast_pipeline.py --calendar

# Process with custom dedup window
python podcast_pipeline.py --rss "https://feeds.example.com/podcast.xml" --dedup-days 60

# Process and only keep 80+ viral score content
python podcast_pipeline.py --rss "https://feeds.example.com/podcast.xml" --min-score 80
```
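
The flags above can be mirrored with a minimal argparse setup. This is a sketch only; the real `podcast_pipeline.py` may wire these differently:

```python
import argparse

def build_parser() -> argparse.ArgumentParser:
    """Mirror of the documented CLI flags (hypothetical defaults)."""
    p = argparse.ArgumentParser(prog="podcast_pipeline.py")
    # The three input sources are alternatives to each other
    src = p.add_mutually_exclusive_group()
    src.add_argument("--rss", help="RSS feed URL; process the latest episode")
    src.add_argument("--transcript", help="local transcript file")
    src.add_argument("--batch", help="RSS feed URL; process the last N episodes")
    p.add_argument("--episodes", type=int, default=1)
    p.add_argument("--calendar", action="store_true",
                   help="assemble weekly calendar from existing outputs")
    p.add_argument("--dedup-days", type=int, default=30)
    p.add_argument("--min-score", type=int, default=0)
    return p

args = build_parser().parse_args(
    ["--batch", "https://feeds.example.com/podcast.xml", "--episodes", "5"])
```

argparse converts hyphenated flags to underscores, so `--dedup-days` becomes `args.dedup_days`.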

---

Environment Variables

| Variable | Required | Description |
| --- | --- | --- |
| `OPENAI_API_KEY` | Yes (for Whisper) | OpenAI API key for audio transcription |
| `ANTHROPIC_API_KEY` | Yes (for generation) | Anthropic API key for content generation |
| `OPENAI_LLM_KEY` | Optional | Separate OpenAI key if using GPT for generation instead |

Reference Files

| File | Purpose |
| --- | --- |
| `podcast_pipeline.py` | Main pipeline script |
| `requirements.txt` | Python dependencies |
| `README.md` | Setup and usage guide |