podcast-generation

Podcast Generation Skill

Overview

This skill generates high-quality podcast audio from text content. The workflow includes creating a structured JSON script (conversational dialogue) and executing audio generation through text-to-speech synthesis.

Core Capabilities

  • Convert any text content (articles, reports, documentation) into podcast scripts
  • Generate natural two-host conversational dialogue (male and female hosts)
  • Synthesize speech audio using text-to-speech
  • Mix audio chunks into a final podcast MP3 file
  • Support both English and Chinese content

Workflow

Step 1: Understand Requirements

When a user requests podcast generation, identify:
  • Source content: The text/article/report to convert into a podcast
  • Language: English or Chinese (based on content)
  • Output location: Where to save the generated podcast
  • You don't need to check the folder under /mnt/user-data

Step 2: Create Structured Script JSON

Generate a structured JSON script file in /mnt/user-data/workspace/ with the naming pattern {descriptive-name}-script.json.
The JSON structure:
json
{
  "locale": "en",
  "lines": [
    {"speaker": "male", "paragraph": "dialogue text"},
    {"speaker": "female", "paragraph": "dialogue text"}
  ]
}
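
As a minimal sketch, the script file could be written from Python as shown here. The helper name write_script and its workspace parameter are hypothetical; only the directory, the naming pattern, and the JSON fields come from the description above.

```python
import json
from pathlib import Path


def write_script(lines, locale, name, workspace="/mnt/user-data/workspace"):
    """Write a podcast script as {name}-script.json and return its path.

    write_script is a hypothetical helper, not part of the skill itself.
    """
    script = {"locale": locale, "lines": lines}
    path = Path(workspace) / f"{name}-script.json"
    path.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps Chinese dialogue readable in the file.
    path.write_text(
        json.dumps(script, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return path
```

Calling write_script(lines, "en", "ai-history") would produce ai-history-script.json in the workspace folder.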

Step 3: Execute Generation

Call the Python script:
bash
python /mnt/skills/public/podcast-generation/scripts/generate.py \
  --script-file /mnt/user-data/workspace/script-file.json \
  --output-file /mnt/user-data/outputs/generated-podcast.mp3 \
  --transcript-file /mnt/user-data/outputs/generated-podcast-transcript.md
Parameters:
  • --script-file: Absolute path to JSON script file (required)
  • --output-file: Absolute path to output MP3 file (required)
  • --transcript-file: Absolute path to output transcript markdown file (optional, but recommended)
[!IMPORTANT]
  • Execute the script in one complete call. Do NOT split the workflow into separate steps.
  • The script handles all TTS API calls and audio generation internally.
  • Do NOT read the Python file, just call it with the parameters.
  • Always include --transcript-file to generate a readable transcript for the user.
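
If the call is made from Python rather than a shell, the same single invocation might look like the sketch below. The function names build_command and run_generation are hypothetical, and the sketch assumes generate.py signals failure with a non-zero exit code, which the text above does not state.

```python
import subprocess

GENERATE_SCRIPT = "/mnt/skills/public/podcast-generation/scripts/generate.py"


def build_command(script_file, output_file, transcript_file):
    """Assemble the single generate.py invocation with all three flags."""
    return [
        "python", GENERATE_SCRIPT,
        "--script-file", script_file,
        "--output-file", output_file,
        "--transcript-file", transcript_file,
    ]


def run_generation(script_file, output_file, transcript_file):
    """Run the whole pipeline in one call.

    check=True raises CalledProcessError on a non-zero exit code
    (assumed here to be how generate.py reports failure).
    """
    subprocess.run(
        build_command(script_file, output_file, transcript_file), check=True
    )
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces.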

Script JSON Format

The script JSON file must follow this structure:
json
{
  "title": "The History of Artificial Intelligence",
  "locale": "en",
  "lines": [
    {"speaker": "male", "paragraph": "Hello Deer! Welcome back to another episode."},
    {"speaker": "female", "paragraph": "Hey everyone! Today we have an exciting topic to discuss."},
    {"speaker": "male", "paragraph": "That's right! We're going to talk about..."}
  ]
}
Fields:
  • title: Title of the podcast episode (optional, used as heading in transcript)
  • locale: Language code - "en" for English or "zh" for Chinese
  • lines: Array of dialogue lines
    • speaker: Either "male" or "female"
    • paragraph: The dialogue text for this speaker
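
The field rules above can be checked before generation with a small validator. This is a sketch only: validate_script is a hypothetical helper, and treating an empty lines array or an empty paragraph as an error is an assumption, not a documented rule.

```python
VALID_LOCALES = ("en", "zh")
VALID_SPEAKERS = ("male", "female")


def validate_script(script):
    """Return a list of problems found in a parsed script dict (empty if none)."""
    problems = []
    if script.get("locale") not in VALID_LOCALES:
        problems.append('locale must be "en" or "zh"')
    lines = script.get("lines")
    if not isinstance(lines, list) or not lines:
        problems.append("lines must be a non-empty array")
        return problems
    for i, line in enumerate(lines):
        if not isinstance(line, dict):
            problems.append(f"line {i}: must be an object")
            continue
        if line.get("speaker") not in VALID_SPEAKERS:
            problems.append(f'line {i}: speaker must be "male" or "female"')
        text = line.get("paragraph")
        if not isinstance(text, str) or not text.strip():
            problems.append(f"line {i}: paragraph must be non-empty text")
    return problems
```

The title field is not checked because it is optional.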

Script Writing Guidelines

When creating the script JSON, follow these guidelines:

Format Requirements

  • Only two hosts: male and female, alternating naturally
  • Target runtime: approximately 10 minutes of dialogue (around 40-60 lines)
  • Start with the male host saying a greeting that includes "Hello Deer"
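
These format requirements lend themselves to a quick automated check. The sketch below uses a hypothetical helper, check_format, and assumes each line is a dict with the speaker and paragraph fields; the 40-60 range is the approximate target quoted above, so an out-of-range count is reported as a warning, not an error.

```python
def check_format(lines, min_lines=40, max_lines=60):
    """Return warnings for a list of {"speaker", "paragraph"} dicts."""
    if not lines:
        return ["script has no lines"]
    warnings = []
    first = lines[0]
    # The opening line should be the male host greeting with "Hello Deer".
    if first["speaker"] != "male" or "Hello Deer" not in first["paragraph"]:
        warnings.append(
            'first line should be the male host greeting with "Hello Deer"'
        )
    # Line count is only a rough proxy for the 10-minute target.
    if not min_lines <= len(lines) <= max_lines:
        warnings.append(
            f"{len(lines)} lines is outside the {min_lines}-{max_lines} target"
        )
    return warnings
```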

Tone & Style

  • Natural, conversational dialogue - like two friends chatting
  • Use casual expressions and conversational transitions
  • Avoid overly formal language or academic tone
  • Include reactions, follow-up questions, and natural interjections

Content Guidelines

  • Frequent back-and-forth between hosts
  • Keep sentences short and easy to follow when spoken
  • Plain text only - no markdown formatting in the output
  • Translate technical concepts into accessible language
  • No mathematical formulas, code, or complex notation
  • Make content engaging and accessible for audio-only listeners
  • Exclude meta information like dates, author names, or document structure
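
The plain-text and no-formula rules can be partly checked with a heuristic scan. The sketch below is an illustration only: find_markup and its pattern list are hypothetical, and a regular expression like this can miss some markup or flag ordinary text.

```python
import re

# Heuristic patterns only: bold markers, backticks, headings,
# markdown links, and $...$ math delimiters.
MARKUP = re.compile(
    r"\*\*|__|`|^#{1,6}\s|\[[^\]]+\]\([^)]+\)|\$[^$]+\$", re.MULTILINE
)


def find_markup(lines):
    """Return indexes of lines whose paragraph looks like it contains markup."""
    return [i for i, line in enumerate(lines) if MARKUP.search(line["paragraph"])]
```

Flagged lines are candidates for rewording into spoken language; the scan says nothing about tone or sentence length.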

Podcast Generation Example

User request: "Generate a podcast about the history of artificial intelligence"
Step 1: Create script file /mnt/user-data/workspace/ai-history-script.json:
json
{
  "title": "The History of Artificial Intelligence",
  "locale": "en",
  "lines": [
    {"speaker": "male", "paragraph": "Hello Deer! Welcome back to another fascinating episode. Today we're diving into something that's literally shaping our future - the history of artificial intelligence."},
    {"speaker": "female", "paragraph": "Oh, I love this topic! You know, AI feels so modern, but it actually has roots going back over seventy years."},
    {"speaker": "male", "paragraph": "Exactly! It all started back in the 1950s. The term artificial intelligence was actually coined by John McCarthy in 1956 at a famous conference at Dartmouth."},
    {"speaker": "female", "paragraph": "Wait, so they were already thinking about machines that could think back then? That's incredible!"},
    {"speaker": "male", "paragraph": "Right? The early pioneers were so optimistic. They thought we'd have human-level AI within a generation."},
    {"speaker": "female", "paragraph": "But things didn't quite work out that way, did they?"},
    {"speaker": "male", "paragraph": "No, not at all. The 1970s brought what's called the first AI winter..."}
  ]
}
Step 2: Execute generation:
bash
python /mnt/skills/public/podcast-generation/scripts/generate.py \
  --script-file /mnt/user-data/workspace/ai-history-script.json \
  --output-file /mnt/user-data/outputs/ai-history-podcast.mp3 \
  --transcript-file /mnt/user-data/outputs/ai-history-transcript.md
This will generate:
  • ai-history-podcast.mp3: The audio podcast file
  • ai-history-transcript.md: A readable markdown transcript of the podcast

Specific Templates

Read the following template file only when it matches the user request.
  • Tech Explainer - For converting technical documentation and tutorials

Output Format

The generated podcast follows the "Hello Deer" format:
  • Two hosts: one male, one female
  • Natural conversational dialogue
  • Starts with "Hello Deer" greeting
  • Target duration: approximately 10 minutes
  • Alternating speakers for engaging flow

Output Handling

After generation:
  • Podcasts and transcripts are saved in /mnt/user-data/outputs/
  • Share both the podcast MP3 and the transcript MD with the user using the present_files tool
  • Provide a brief description of the generation result (topic, duration, hosts)
  • Offer to regenerate if adjustments are needed

Requirements

The following environment variables must be set:
  • VOLCENGINE_TTS_APPID: Volcengine TTS application ID
  • VOLCENGINE_TTS_ACCESS_TOKEN: Volcengine TTS access token
  • VOLCENGINE_TTS_CLUSTER: Volcengine TTS cluster (optional, defaults to "volcano_tts")
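
A pre-flight check for these variables could look like the sketch below. The helper tts_settings and the keys of the dict it returns are hypothetical; how generate.py itself reads the variables is not described here.

```python
import os

REQUIRED_VARS = ("VOLCENGINE_TTS_APPID", "VOLCENGINE_TTS_ACCESS_TOKEN")


def tts_settings(env=None):
    """Collect the TTS settings, raising if a required variable is missing."""
    env = os.environ if env is None else env
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {
        "appid": env["VOLCENGINE_TTS_APPID"],
        "access_token": env["VOLCENGINE_TTS_ACCESS_TOKEN"],
        # The cluster is optional and falls back to the documented default.
        "cluster": env.get("VOLCENGINE_TTS_CLUSTER", "volcano_tts"),
    }
```

Running this before generation surfaces a missing credential early instead of partway through the TTS calls.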

Notes

  • Always execute the full pipeline in one call - no need to test individual steps or worry about timeouts
  • The script JSON should match the content language (en or zh)
  • Technical content should be simplified for audio accessibility in the script
  • Complex notations (formulas, code) should be translated to plain language in the script
  • Long content may result in longer podcasts