# podcast-pipeline
## Preamble (runs on skill start)

```bash
# Version check (silent if up to date)
python3 telemetry/version_check.py 2>/dev/null || true

# Telemetry opt-in (first run only, then remembers your choice)
python3 telemetry/telemetry_init.py 2>/dev/null || true
```

> **Privacy:** This skill logs usage locally to `~/.ai-marketing-skills/analytics/`. Remote telemetry is opt-in only. No code, file paths, or repo content is ever collected. See `telemetry/README.md`.

---
## Podcast-to-Everything Pipeline

Turns podcast episodes into a full content calendar across every platform. One episode in, 15-20 content pieces out — scored, deduplicated, and scheduled.
## Step 1: Ingest — Get the Transcript

Determine the input source and obtain a clean transcript.
### Option A: RSS Feed (`--rss <url>`)

- Fetch the RSS feed XML
- Extract the latest episode's audio URL (or use `--episodes N` for batch)
- Download the audio file
- Transcribe via OpenAI Whisper API (with timestamps)
- Store the transcript with episode metadata (title, date, description, duration)
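The fetch-and-extract part of Option A can be sketched with the standard library alone. This is an illustration, not the skill's actual implementation, and it assumes a conventional RSS 2.0 feed where each `<item>` carries its audio in an `<enclosure url="...">` tag and items are listed newest-first:

```python
import urllib.request
import xml.etree.ElementTree as ET

def parse_latest_episode(feed_xml) -> dict:
    """Return metadata and audio URL for the first (newest) <item> in the feed."""
    root = ET.fromstring(feed_xml)
    item = root.find("./channel/item")  # most feeds list the newest episode first
    enclosure = item.find("enclosure")
    return {
        "title": item.findtext("title"),
        "date": item.findtext("pubDate"),
        "description": item.findtext("description"),
        "audio_url": enclosure.get("url") if enclosure is not None else None,
    }

def latest_episode(rss_url: str) -> dict:
    """Fetch the feed over HTTP and parse out the latest episode."""
    with urllib.request.urlopen(rss_url) as resp:
        return parse_latest_episode(resp.read())
```

Missing fields come back as `None` rather than raising, so the caller can decide whether to prompt for metadata.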
### Option B: Raw Transcript (`--transcript <file>`)

- Read the transcript file (plain text, SRT, or VTT)
- Parse timestamps if present
- Extract episode metadata from the filename, or prompt the user
### Option C: Batch Mode (`--batch <rss_url> --episodes N`)

- Fetch the RSS feed
- Extract the last N episodes
- Process each through the full pipeline
- Deduplicate across all episodes in the batch
### Transcript cleanup

- Remove filler words (um, uh, like, you know) for written content
- Preserve the original with timestamps for video clip suggestions
- Split into logical segments by topic shift
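A deliberately naive sketch of the filler-word pass. It strips only the unambiguous fillers; "like" is left alone because it doubles as a verb and preposition, and a production pass would need context awareness to remove it safely:

```python
import re

# "um", "uh", and "you know" are safe to strip blindly; "like" is skipped
# here because removing it needs context ("she likes the plan" must survive).
FILLERS = re.compile(r"\b(?:um+|uh+|you know)\b,?\s*", re.IGNORECASE)

def clean_for_writing(text: str) -> str:
    """Strip filler words for written content; keep the raw transcript for clips."""
    cleaned = FILLERS.sub("", text)
    return re.sub(r"\s{2,}", " ", cleaned).strip()
```

The word boundaries (`\b`) keep words like "umbrella" and "album" intact.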
## Step 2: Editorial Brain — Deep Analysis

Feed the full transcript to the LLM with this extraction framework.

Extract these content atoms:
- **Narrative Arcs** — Complete story segments with setup → tension → resolution. Tag with start/end timestamps.
- **Quotable Moments** — Punchy, shareable statements. One-liners that stand alone. Must pass the "would someone screenshot this?" test.
- **Controversial Takes** — Opinions that go against conventional wisdom. The stuff that makes people reply "hard disagree" or "finally someone said it."
- **Data Points** — Specific numbers, percentages, dollar amounts, timeframes. Concrete proof points that add credibility.
- **Stories** — Personal anecdotes, case studies, client examples. Must have a character, a problem, and an outcome.
- **Frameworks** — Step-by-step processes, mental models, decision matrices. Anything structured that people would save or bookmark.
- **Predictions** — Forward-looking claims about trends, markets, technology. Hot takes about where things are going.
Output format per atom:

- Type: [narrative_arc | quote | controversial_take | data_point | story | framework | prediction]
- Content: [extracted text]
- Timestamp: [start - end, if available]
- Context: [what was being discussed]
- Viral Score: [0-100, see Step 4]
- Suggested platforms: [where this atom works best]
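For concreteness, here is what one extracted atom might look like, with a light validity check before it enters the pipeline. The quote, score, and platforms are invented for illustration:

```python
ATOM_TYPES = {"narrative_arc", "quote", "controversial_take",
              "data_point", "story", "framework", "prediction"}

# A hypothetical atom -- every field maps to the output format above.
atom = {
    "type": "data_point",
    "content": "We cut onboarding from 14 days to 3 by deleting two approval steps.",
    "timestamp": "00:12:40 - 00:13:05",
    "context": "Discussing customer onboarding experiments",
    "viral_score": 78,
    "suggested_platforms": ["twitter", "linkedin"],
}

def is_valid_atom(a: dict) -> bool:
    """Reject atoms with an unknown type, empty content, or an out-of-range score."""
    return (a.get("type") in ATOM_TYPES
            and bool(a.get("content"))
            and 0 <= a.get("viral_score", -1) <= 100)
```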
## Step 3: Content Generation — One Episode, Many Pieces

For each episode, generate ALL of these from the extracted atoms:
### 3a. Short-Form Video Clips (3-5 per episode)

- Hook: [First 3 seconds — pattern interrupt or bold claim]
- Clip segment: [Timestamp range from transcript]
- Caption overlay: [Text for the screen]
- Platform: [YouTube Shorts / TikTok / Instagram Reels]
- Why it works: [What makes this clippable]

Prioritize: controversial takes > stories with payoffs > surprising data points
### 3b. Twitter/X Threads (2-3 per episode)

- Thread hook (tweet 1): [Curiosity gap or bold opener]
- Thread body (5-10 tweets): [Each tweet is one complete thought]
- Thread closer: [CTA — follow, reply, retweet trigger]
- Source atoms: [Which content atoms feed this thread]

Rules: No tweet over 280 chars. Each tweet must stand alone. Use data points as proof.
### 3c. LinkedIn Article Draft (1 per episode)

- Headline: [Specific, benefit-driven]
- Hook paragraph: [Before the "see more" fold — must earn the click]
- Body: [3-5 sections with headers, 800-1200 words]
- CTA: [Engagement driver — question, not link]
- Hashtags: [3-5 relevant, not spammy]

Voice: Professional but not corporate. First-person. Story-driven.
### 3d. Newsletter Section (1 per episode)

- Section headline: [Scannable, specific]
- TL;DR: [One sentence, the core insight]
- Body: [3-5 bullet points, each with a takeaway]
- Pull quote: [The most shareable line from the episode]
- Link: [Back to full episode]

### 3e. Quote Cards (3-5 per episode)
- Quote text: [Max 20 words — must work as a text overlay]
- Attribution: [Speaker name]
- Background suggestion: [Color/mood that matches the tone]
- Platform sizing: [1080x1080 for IG, 1200x675 for Twitter, 1080x1920 for Stories]

### 3f. Blog Post Outline (1 per episode)
- Title: [SEO-optimized, includes primary keyword]
- Primary keyword: [Search volume + difficulty estimate]
- Secondary keywords: [3-5 related terms]
- Meta description: [155 chars max]
- H2 sections: [5-7, each maps to a content atom]
- Internal linking opportunities: [Topics that connect to existing content]
- Estimated word count: [1500-2500]

### 3g. YouTube Shorts / TikTok Script (1 per episode)
- HOOK (0-3s): [Pattern interrupt — question, bold claim, or visual]
- SETUP (3-15s): [Context — why should they care]
- PAYOFF (15-45s): [The insight, data, or story resolution]
- CTA (45-60s): [Follow, comment prompt, or part 2 tease]
- On-screen text: [Key phrases to overlay]
- B-roll suggestions: [Visual ideas if not talking-head]

## Step 4: Content Scoring — Viral Potential
Score every generated piece on three dimensions (each 0-100):

| Dimension | What It Measures | Signals |
|---|---|---|
| Novelty | Is this new or surprising? | Contrarian takes, unexpected data, first-to-say |
| Controversy | Will people argue about this? | Strong opinions, challenges norms, picks a side |
| Utility | Can someone use this immediately? | Frameworks, how-tos, templates, specific numbers |

Viral Score = (Novelty × 0.4) + (Controversy × 0.3) + (Utility × 0.3)
Score thresholds:

- 80+ → Priority publish. Schedule for peak engagement windows.
- 60-79 → Solid content. Fill the calendar.
- 40-59 → Filler. Use only if the calendar has gaps.
- Below 40 → Cut it. Not worth the publish slot.
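The formula and thresholds translate directly to code. A sketch (the skill's actual scorer may differ):

```python
def viral_score(novelty: float, controversy: float, utility: float) -> float:
    """Weighted blend from Step 4; each dimension is scored 0-100."""
    return novelty * 0.4 + controversy * 0.3 + utility * 0.3

def publish_tier(score: float) -> str:
    """Map a viral score onto the publish-priority thresholds above."""
    if score >= 80:
        return "priority"   # schedule for peak engagement windows
    if score >= 60:
        return "solid"      # fill the calendar
    if score >= 40:
        return "filler"     # use only if the calendar has gaps
    return "cut"            # not worth the publish slot
```

For example, a piece scoring 90 novelty, 80 controversy, and 70 utility lands at 81 and gets priority scheduling.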
## Step 5: Dedup Engine

Before finalizing, check all generated content against:

- This batch — No two pieces should cover the same angle
- Recent history — Compare against the last N days of output (default: 30)
- Similarity threshold — Flag any pair with >70% semantic overlap
Dedup rules:

- If two pieces overlap >70%: keep the higher-scored one, cut the other
- If a piece overlaps with recently published content: flag it with ⚠️ and suggest a differentiation angle
- Track all published content hashes in `output/content_history.json`
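A minimal sketch of the pairwise check. `difflib` is a cheap character-level stand-in for the ">70% semantic overlap" test; a real implementation would compare embeddings rather than characters:

```python
from difflib import SequenceMatcher

def overlap(a: str, b: str) -> float:
    """Character-level similarity in [0, 1] -- a crude proxy for semantic overlap."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def dedup(pieces: list[dict], threshold: float = 0.70) -> list[dict]:
    """Keep the higher-scored piece from any pair above the overlap threshold."""
    kept: list[dict] = []
    # Visit highest-scoring pieces first so ties resolve in their favor.
    for piece in sorted(pieces, key=lambda p: p["viral_score"], reverse=True):
        if all(overlap(piece["content"], k["content"]) <= threshold for k in kept):
            kept.append(piece)
    return kept
```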
## Step 6: Calendar Generation (`--calendar`)

Assemble scored, deduplicated content into a weekly publish calendar.
Scheduling rules:

- Twitter/X: 1-2 per day, peak hours (8-10am, 12-1pm, 5-7pm ET)
- LinkedIn: 1 per day max, Tuesday-Thursday mornings
- YouTube Shorts/TikTok: 1 per day, evenings
- Newsletter: Weekly, same day each week
- Blog: 1-2 per week
- Quote cards: Intersperse on low-content days
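The rules above can be encoded as a small table the calendar builder consults before filling a slot. The structure and the specific slot times below are illustrative assumptions, not the skill's actual scheduler:

```python
# Per-platform caps and preferred windows (ET), mirroring the rules above.
# Exact slot times within each window are illustrative choices.
SCHEDULE_RULES = {
    "twitter":        {"max_per_day": 2, "slots": ["09:00", "17:30"]},
    "linkedin":       {"max_per_day": 1, "slots": ["09:00"],
                       "days": {"Tue", "Wed", "Thu"}},
    "youtube_shorts": {"max_per_day": 1, "slots": ["19:00"]},
    "quote_cards":    {"max_per_day": 1, "slots": ["12:30"]},
}

def can_schedule(platform: str, day: str, already_posted: int) -> bool:
    """Check a candidate slot against the per-day cap and allowed weekdays."""
    rule = SCHEDULE_RULES.get(platform)
    if rule is None:
        return False
    if "days" in rule and day not in rule["days"]:
        return False
    return already_posted < rule["max_per_day"]
```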
Calendar output format:

```json
{
  "week_of": "2024-01-15",
  "episode_source": "Episode Title - Guest Name",
  "content_pieces": [
    {
      "date": "2024-01-15",
      "time": "09:00 ET",
      "platform": "twitter",
      "type": "thread",
      "content": "...",
      "viral_score": 85,
      "status": "draft"
    }
  ],
  "total_pieces": 18,
  "avg_viral_score": 72,
  "coverage": {
    "twitter": 6,
    "linkedin": 3,
    "youtube_shorts": 3,
    "newsletter": 1,
    "blog": 1,
    "quote_cards": 4
  }
}
```

## Step 7: Output
All output goes to the `output/` directory:

```
output/
├── episodes/
│   ├── YYYY-MM-DD-episode-slug/
│   │   ├── transcript.txt
│   │   ├── atoms.json            # Extracted content atoms
│   │   ├── content_pieces.json   # All generated content
│   │   └── calendar.json         # Scheduled calendar
│   └── ...
├── calendar/
│   └── week-YYYY-WNN.json        # Aggregated weekly calendar
├── content_history.json          # Dedup tracking
└── pipeline_log.json             # Run history and stats
```

## CLI Reference
```bash
# Process the latest episode from an RSS feed
python podcast_pipeline.py --rss "https://feeds.example.com/podcast.xml"

# Process a local transcript
python podcast_pipeline.py --transcript episode-42.txt

# Batch process the last 5 episodes
python podcast_pipeline.py --batch "https://feeds.example.com/podcast.xml" --episodes 5

# Generate a weekly calendar from existing outputs
python podcast_pipeline.py --calendar

# Process with a custom dedup window
python podcast_pipeline.py --rss "https://feeds.example.com/podcast.xml" --dedup-days 60

# Process and keep only content with a viral score of 80+
python podcast_pipeline.py --rss "https://feeds.example.com/podcast.xml" --min-score 80
```

---
## Environment Variables

The variable names were lost in extraction; only the remaining columns are reproduced here.

| Variable | Required | Description |
|---|---|---|
|  | Yes (for Whisper) | OpenAI API key for audio transcription |
|  | Yes (for generation) | Anthropic API key for content generation |
|  | Optional | Separate OpenAI key, if using GPT for generation instead |
## Reference Files

The first two entries below lost their file names in extraction; `podcast_pipeline.py` is recoverable from the CLI examples above.

| File | Purpose |
|---|---|
| `podcast_pipeline.py` | Main pipeline script |
|  | Python dependencies |
|  | Setup and usage guide |