podcast-generation

Podcast Generation Skill

Overview

This skill generates high-quality podcast audio from text content. The workflow has two stages: writing a structured JSON script of conversational dialogue, then generating audio from that script through text-to-speech synthesis.

Core Capabilities

Convert any text content (articles, reports, documentation) into podcast scripts
Generate natural two-host conversational dialogue (male and female hosts)
Synthesize speech audio using text-to-speech
Mix audio chunks into a final podcast MP3 file
Support both English and Chinese content

Workflow

Step 1: Understand Requirements

When a user requests podcast generation, identify:

Source content: The text/article/report to convert into a podcast
Language: English or Chinese (based on content)
Output location: Where to save the generated podcast
You don't need to check the folder under `/mnt/user-data`.

Step 2: Create Structured Script JSON

Generate a structured JSON script file in `/mnt/user-data/workspace/` with the naming pattern `{descriptive-name}-script.json`.

The JSON structure:

```json
{
  "locale": "en",
  "lines": [
    {"speaker": "male", "paragraph": "dialogue text"},
    {"speaker": "female", "paragraph": "dialogue text"}
  ]
}
```
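
As a minimal sketch of this step, the script file could be written from Python as shown here. The helper names `script_path` and `write_script` are hypothetical and not part of the skill; only the workspace directory and the naming pattern come from the text above.

```python
import json
from pathlib import Path

# Workspace directory named in this step.
WORKSPACE = Path("/mnt/user-data/workspace")


def script_path(descriptive_name: str, workspace: Path = WORKSPACE) -> Path:
    """Return the path following the {descriptive-name}-script.json pattern."""
    return workspace / f"{descriptive_name}-script.json"


def write_script(descriptive_name: str, locale: str, lines: list[dict],
                 workspace: Path = WORKSPACE) -> Path:
    """Serialize the dialogue to JSON and return the path that was written."""
    script = {"locale": locale, "lines": lines}
    path = script_path(descriptive_name, workspace)
    path.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps Chinese dialogue readable in the file.
    path.write_text(json.dumps(script, ensure_ascii=False, indent=2),
                    encoding="utf-8")
    return path
```

Writing with `encoding="utf-8"` and `ensure_ascii=False` is a design choice made here so that scripts with `"locale": "zh"` stay human-readable; the skill itself does not state an encoding requirement.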

Step 3: Execute Generation

Call the Python script:

```bash
python /mnt/skills/public/podcast-generation/scripts/generate.py \
  --script-file /mnt/user-data/workspace/script-file.json \
  --output-file /mnt/user-data/outputs/generated-podcast.mp3 \
  --transcript-file /mnt/user-data/outputs/generated-podcast-transcript.md
```

Parameters:

`--script-file`: Absolute path to JSON script file (required)
`--output-file`: Absolute path to output MP3 file (required)
`--transcript-file`: Absolute path to output transcript Markdown file (optional, but recommended)

[!IMPORTANT]
Execute the script in one complete call. Do NOT split the workflow into separate steps.

The script handles all TTS API calls and audio generation internally.

Do NOT read the Python file; just call it with the parameters.
Always include `--transcript-file` to generate a readable transcript for the user.
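
If the call is issued from Python rather than a shell, it might look like the sketch below. `build_command` and `run_generation` are hypothetical helpers. The script path and the three flags are taken from the command above, while the idea that `generate.py` reports failure through a non-zero exit status is an assumption.

```python
import subprocess

# Script location taken from the command shown in this step.
GENERATE_SCRIPT = "/mnt/skills/public/podcast-generation/scripts/generate.py"


def build_command(script_file: str, output_file: str,
                  transcript_file: str) -> list[str]:
    """Assemble one generate.py call with all three documented flags."""
    return [
        "python", GENERATE_SCRIPT,
        "--script-file", script_file,
        "--output-file", output_file,
        "--transcript-file", transcript_file,
    ]


def run_generation(script_file: str, output_file: str,
                   transcript_file: str) -> None:
    """Run the whole workflow in a single call.

    check=True raises CalledProcessError on a non-zero exit status
    (assumed here to be how generate.py signals failure).
    """
    subprocess.run(build_command(script_file, output_file, transcript_file),
                   check=True)
```

Passing the arguments as a list avoids shell quoting problems when a path contains spaces, and keeping everything in one `subprocess.run` call matches the instruction not to split the workflow.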

Script JSON Format

The script JSON file must follow this structure:

```json
{
  "title": "The History of Artificial Intelligence",
  "locale": "en",
  "lines": [
    {"speaker": "male", "paragraph": "Hello Deer! Welcome back to another episode."},
    {"speaker": "female", "paragraph": "Hey everyone! Today we have an exciting topic to discuss."},
    {"speaker": "male", "paragraph": "That's right! We're going to talk about..."}
  ]
}
```

Fields:

`title`: Title of the podcast episode (optional, used as heading in transcript)
`locale`: Language code, "en" for English or "zh" for Chinese
`lines`: Array of dialogue lines
- `speaker`: Either "male" or "female"
- `paragraph`: The dialogue text for this speaker
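
The field rules above can be checked before generation with a small validator such as the following sketch. `validate_script` is a hypothetical helper, and the requirements that `lines` and each `paragraph` be non-empty are assumptions. The checks that `generate.py` itself performs are not documented here.

```python
ALLOWED_LOCALES = {"en", "zh"}
ALLOWED_SPEAKERS = {"male", "female"}


def validate_script(script: dict) -> list[str]:
    """Return a list of problems; an empty list means the structure looks valid."""
    problems = []
    # title is optional, but must be text when present.
    if "title" in script and not isinstance(script["title"], str):
        problems.append("title must be a string when present")
    locale = script.get("locale")
    if not isinstance(locale, str) or locale not in ALLOWED_LOCALES:
        problems.append('locale must be "en" or "zh"')
    lines = script.get("lines")
    # Assumption: an empty lines array is treated as an error.
    if not isinstance(lines, list) or not lines:
        problems.append("lines must be a non-empty array")
        return problems
    for i, line in enumerate(lines):
        if not isinstance(line, dict):
            problems.append(f"lines[{i}] must be an object")
            continue
        speaker = line.get("speaker")
        if not isinstance(speaker, str) or speaker not in ALLOWED_SPEAKERS:
            problems.append(f'lines[{i}].speaker must be "male" or "female"')
        paragraph = line.get("paragraph")
        # Assumption: blank dialogue text is treated as an error.
        if not isinstance(paragraph, str) or not paragraph.strip():
            problems.append(f"lines[{i}].paragraph must be non-empty text")
    return problems
```

Returning a list of messages instead of raising on the first error lets all problems in a draft script be fixed in one pass.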

Script Writing Guidelines

When creating the script JSON, follow these guidelines:

Format Requirements

Only two hosts: male and female, alternating naturally
Target runtime: approximately 10 minutes of dialogue (around 40-60 lines)
Start with the male host saying a greeting that includes "Hello Deer"
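
These format requirements lend themselves to a quick automated check. The sketch below uses a hypothetical helper, `check_format`, and reads the 40-60 line range as a soft target, so it returns warnings instead of raising errors. Treating "Hello Deer" as a literal substring of the first paragraph is also an assumption.

```python
def check_format(lines: list[dict], min_lines: int = 40,
                 max_lines: int = 60) -> list[str]:
    """Return warnings for a script's dialogue lines; empty means no issues."""
    if not lines:
        return ["script has no lines"]
    warnings = []
    first = lines[0]
    # The male host is expected to open the episode.
    if first.get("speaker") != "male":
        warnings.append("first line should be spoken by the male host")
    # Assumption: the greeting is matched as a literal, case-sensitive substring.
    if "Hello Deer" not in first.get("paragraph", ""):
        warnings.append('opening greeting should include "Hello Deer"')
    # Roughly 40-60 lines corresponds to the ~10 minute target runtime.
    if not min_lines <= len(lines) <= max_lines:
        warnings.append(
            f"expected about {min_lines}-{max_lines} lines, found {len(lines)}"
        )
    return warnings
```

Strict male/female alternation is deliberately not enforced, since the guideline asks for hosts to alternate naturally, not on every line.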

Tone & Style

Natural, conversational dialogue, like two friends chatting
Use casual expressions and conversational transitions
Avoid overly formal language or academic tone
Include reactions, follow-up questions, and natural interjections

Content Guidelines

Frequent back-and-forth between hosts
Keep sentences short and easy to follow when spoken
Plain text only: no Markdown formatting in the output
Translate technical concepts into accessible language
No mathematical formulas, code, or complex notation
Make content engaging and accessible for audio-only listeners
Exclude meta information like dates, author names, or document structure
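
The plain-text rule can be spot-checked with a few regular expressions. The following is a heuristic sketch, not a full Markdown parser. `find_markdown` and the pattern set are hypothetical, and they may miss some constructs or flag innocent text.

```python
import re

# Heuristic patterns (assumptions, not an exhaustive Markdown grammar).
MARKDOWN_PATTERNS = {
    "heading": re.compile(r"^\s{0,3}#{1,6}\s", re.MULTILINE),
    "emphasis": re.compile(r"(\*\*|__)\S"),
    "inline code": re.compile(r"`"),
    "link": re.compile(r"\[[^\]]+\]\([^)]+\)"),
    "list marker": re.compile(r"^\s*([-*+]|\d+\.)\s", re.MULTILINE),
}


def find_markdown(paragraph: str) -> list[str]:
    """Return the names of Markdown-like constructs found in a paragraph."""
    return [name for name, pattern in MARKDOWN_PATTERNS.items()
            if pattern.search(paragraph)]
```

Running `find_markdown` over every `paragraph` before generation gives an early signal that formatting leaked into the dialogue, which matters because a text-to-speech engine may read stray symbols aloud.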
