blog-audio
Blog Audio -- Gemini TTS Narration for Blog Posts
Generate professional audio narration of blog content using Google's Gemini TTS.
Three modes: summary (200-300 word spoken overview), full article read-aloud,
or two-speaker podcast dialogue. 30 voices, 80+ languages, HTML5 embed output.
Quick Reference
| Command | What it does |
|---|---|
| `/blog audio generate <file>` | Generate audio narration of a blog post |
| `/blog audio voices` | Show available voices with characteristics |
| `/blog audio setup` | Check/configure API key for Gemini TTS |
Prerequisites
- Python 3.11+ (venv managed automatically by `run.py`)
- `GOOGLE_AI_API_KEY` environment variable (same key used by blog-image)
- FFmpeg (for WAV-to-MP3 conversion; falls back to WAV if missing)
Always Use run.py Wrapper
```bash
# CORRECT:
python3 scripts/run.py generate_audio.py --text "..." --voice Charon --json

# WRONG:
python3 scripts/generate_audio.py --text "..."  # Fails without venv
```
API Key Check (Gate Pattern)
Before generating audio, check for the API key:
```bash
echo $GOOGLE_AI_API_KEY
```
- If set: proceed with generation
- If not set: guide the user:
  "Audio generation requires a Google AI API key. Get one free at https://aistudio.google.com/apikey
  Then set it: `export GOOGLE_AI_API_KEY=your-key`
  This is the same key used by `/blog image` -- if image generation works, audio works too."
- When called internally (from blog-write): return silently if key is missing. Never block the writing workflow.
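The gate above can be sketched in Python as a small decision function. This is only an illustration: the function name `api_key_gate`, the `internal` flag, and the three action labels are hypothetical and not part of the skill's scripts; only the environment variable name and the guidance text come from this document.

```python
import os
from typing import Mapping, Optional

KEY_NAME = "GOOGLE_AI_API_KEY"

# Guidance text taken from the gate description above.
GUIDE_MESSAGE = (
    "Audio generation requires a Google AI API key. "
    "Get one free at https://aistudio.google.com/apikey\n"
    "Then set it: export GOOGLE_AI_API_KEY=your-key"
)


def api_key_gate(
    internal: bool = False,
    env: Optional[Mapping[str, str]] = None,
) -> tuple[str, Optional[str]]:
    """Return (action, message), where action is 'proceed', 'guide', or 'skip'.

    'skip' models the silent return for internal calls from blog-write.
    The action labels are hypothetical names chosen for this sketch.
    """
    env = os.environ if env is None else env
    if env.get(KEY_NAME, "").strip():
        return ("proceed", None)
    if internal:
        # Never block the writing workflow: no message, no error.
        return ("skip", None)
    return ("guide", GUIDE_MESSAGE)
```

Passing `env` explicitly keeps the check testable without touching the real environment.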
Setup
For `/blog audio setup`:
- Check if `GOOGLE_AI_API_KEY` is set in environment
- If blog-image is configured (check `.mcp.json`), the key is already available
- If not, guide user to https://aistudio.google.com/apikey
- Verify with a dry run: `python3 scripts/run.py generate_audio.py --text "Test" --dry-run --json`
Voice Selection
For `/blog audio voices`:
Load `references/voices.md` and present the voice catalog to the user.
Ask the user which voice they prefer, or recommend based on content type:
- Article narration: Charon (Informative) or Sadaltager (Knowledgeable)
- Tutorial/how-to: Achird (Friendly) or Sulafat (Warm)
- News/analysis: Rasalgethi (Informative) or Schedar (Even)
- Lifestyle/wellness: Aoede (Breezy) or Vindemiatrix (Gentle)
- Dialogue host: Puck (Upbeat) or Laomedeia (Upbeat)
- Dialogue expert: Kore (Firm) or Charon (Informative)
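The recommendations above can be captured as a lookup table. The sketch below is illustrative: the dictionary keys and the `recommend_voice` helper are hypothetical names, and falling back to the article pair for an unknown content type is an assumption of this example. The voice names and descriptors are the ones listed above.

```python
from typing import Optional

# Content type -> (voice, descriptor) pairs, in the order listed above.
# The key names are hypothetical labels chosen for this sketch.
RECOMMENDED_VOICES = {
    "article": [("Charon", "Informative"), ("Sadaltager", "Knowledgeable")],
    "tutorial": [("Achird", "Friendly"), ("Sulafat", "Warm")],
    "news": [("Rasalgethi", "Informative"), ("Schedar", "Even")],
    "lifestyle": [("Aoede", "Breezy"), ("Vindemiatrix", "Gentle")],
    "dialogue_host": [("Puck", "Upbeat"), ("Laomedeia", "Upbeat")],
    "dialogue_expert": [("Kore", "Firm"), ("Charon", "Informative")],
}


def recommend_voice(content_type: str, user_choice: Optional[str] = None) -> str:
    """Prefer the user's choice; otherwise return the first recommendation."""
    if user_choice:
        return user_choice
    # Assumption: unknown content types fall back to the article pair.
    options = RECOMMENDED_VOICES.get(content_type, RECOMMENDED_VOICES["article"])
    return options[0][0]
```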
Generation Workflow
For `/blog audio generate <file>`:
Step 1: Read the Blog Post
Read the file and extract:
- Title (from H1 or frontmatter)
- Full content (markdown body)
- Approximate word count
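A minimal sketch of this extraction step follows. It assumes YAML-style frontmatter delimited by `---` lines with a `title:` key; the function name `read_post` and the returned dictionary shape are hypothetical, and the word count is only approximate because markup is counted along with the prose.

```python
import re


def read_post(markdown: str) -> dict:
    """Extract title, body, and an approximate word count from a markdown post."""
    title = None
    body = markdown

    # Assumed frontmatter format: a leading block delimited by '---' lines.
    fm = re.match(r"\A---\s*\n(.*?)\n---\s*\n", markdown, flags=re.DOTALL)
    if fm:
        body = markdown[fm.end():]
        t = re.search(r"^title:\s*(.+)$", fm.group(1), flags=re.MULTILINE)
        if t:
            title = t.group(1).strip().strip("\"'")

    # Fall back to the first H1 when frontmatter has no title.
    if title is None:
        h1 = re.search(r"^#[ \t]+(.+)$", body, flags=re.MULTILINE)
        if h1:
            title = h1.group(1).strip()

    word_count = len(re.findall(r"\b\w+\b", body))
    return {"title": title, "body": body, "word_count": word_count}
```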
Step 2: Choose Mode
Ask the user (or auto-select if they specified `--mode`):
| Mode | When to use | Output |
|---|---|---|
| Summary | Quick audio overview (1-2 min) | 200-300 word spoken summary |
| Full | Complete read-aloud (5-15 min) | Full article as natural speech |
| Dialogue | Podcast-style (3-8 min) | Two-person conversation about the article |
Step 3: Prepare Text
CRITICAL: Claude prepares the text. The script does TTS only.
Summary mode:
Write a 200-300 word spoken summary of the article. Rules:
- Write as natural speech, not written text
- Open with the article's key finding or answer
- Cover 3-5 main takeaways
- Close with actionable advice
- No markdown, no "In this article...", no meta-commentary
- Use conversational transitions ("Here's what matters...", "The key finding is...")
Full mode:
Strip the markdown content to clean spoken text:
- Headings become natural transitions ("Next, let's look at...")
- Links become plain text (remove URLs, keep anchor text)
- Images and charts: omit or briefly describe ("As the data shows...")
- Code blocks: describe verbally ("The code uses a for-loop to...")
- Lists: convert to natural sentences
- Remove frontmatter, schema markup, HTML tags
- Add brief intro: "This is [title], published on [date]."
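The purely mechanical parts of the full-mode rules (dropping frontmatter, URLs, images, HTML tags, and list or heading markers) could be pre-processed along these lines. This is a rough sketch, not the skill's implementation: the function name is hypothetical, and the rules that need judgment (turning headings into transitions, describing code, rewriting lists as sentences) are deliberately left out, so the output still needs a rewriting pass.

```python
import re


def strip_markdown_for_speech(body: str) -> str:
    """Remove markup that should never be read aloud. Regex-based, approximate."""
    text = re.sub(r"\A---\s*\n.*?\n---\s*\n", "", body, flags=re.DOTALL)  # frontmatter
    text = re.sub(r"```.*?```", " ", text, flags=re.DOTALL)  # fenced code, to be described verbally
    text = re.sub(r"!\[[^\]]*\]\([^)]*\)", "", text)  # images: omitted
    text = re.sub(r"\[([^\]]+)\]\([^)]*\)", r"\1", text)  # links: keep anchor text only
    text = re.sub(r"<[^>]+>", "", text)  # HTML tags
    text = re.sub(r"^#{1,6}[ \t]+", "", text, flags=re.MULTILINE)  # heading markers
    text = re.sub(r"^[ \t]*(?:[-*+]|\d+\.)[ \t]+", "", text, flags=re.MULTILINE)  # list markers
    text = re.sub(r"[*`]", "", text)  # emphasis and inline-code markers
    text = re.sub(r"\n{3,}", "\n\n", text)  # collapse runs of blank lines
    return text.strip()
```

For example, `See [the docs](https://example.com/docs) for **details**.` becomes `See the docs for details.`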
Dialogue mode:
Write a 2-person conversation script about the article:
- Speaker1 = Host (curious, asks good questions)
- Speaker2 = Expert (knowledgeable, gives clear answers)
- Format each line as: `[Speaker1] What's the key takeaway here?`
- Cover the article's main points conversationally
- 15-25 exchanges (produces ~3-8 minutes)
- Natural, not stilted ("That's a great point" over "Indeed, as the research indicates")
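Before sending a dialogue script to TTS, it may help to check that every line follows the `[Speaker1]` / `[Speaker2]` format. The sketch below is one way to do that; the helper names are hypothetical, and counting each spoken line as one exchange is an assumption of this example, since the rules above do not define the term precisely.

```python
import re

LINE_RE = re.compile(r"^\[(Speaker1|Speaker2)\]\s+(.+)$")


def parse_dialogue(script: str) -> list[tuple[str, str]]:
    """Return (speaker, text) pairs; raise ValueError on a malformed line."""
    turns = []
    for lineno, raw in enumerate(script.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # blank lines between turns are tolerated
        m = LINE_RE.match(line)
        if m is None:
            raise ValueError(
                f"line {lineno}: expected '[Speaker1] ...' or '[Speaker2] ...'"
            )
        turns.append((m.group(1), m.group(2)))
    return turns


def exchange_count_ok(turns: list, low: int = 15, high: int = 25) -> bool:
    """Assumption: one spoken line counts as one exchange."""
    return low <= len(turns) <= high
```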
Step 4: Select Voice
If the user chose a voice, use it. Otherwise, recommend based on mode:
- Summary/Full: default to Charon (Informative)
- Dialogue: default to Puck (Host) + Kore (Expert)
Step 5: Generate Audio
Write the prepared text to a temp file, then call:
```bash
# Single voice (summary or full mode)
python3 scripts/run.py generate_audio.py \
  --text-file /tmp/blog_audio_prepared.txt \
  --voice Charon \
  --model flash \
  --output /path/to/audio/post-slug.mp3 \
  --json

# Two voices (dialogue mode)
python3 scripts/run.py generate_audio.py \
  --text-file /tmp/blog_audio_dialogue.txt \
  --voice Puck \
  --voice2 Kore \
  --model pro \
  --output /path/to/audio/post-slug-dialogue.mp3 \
  --json
```
**Model selection:**
- `flash` (default): Fast, cheap. Good for summaries and standard narration.
- `pro`: Higher quality. Use for dialogue mode or premium content.
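If the call is driven from Python rather than typed by hand, the argument list can be assembled as below. The flags are the ones used in this step; the helper names `build_command` and `run_generation` are hypothetical, and the shape of the JSON printed by `--json` is not specified here, so the sketch only assumes it is a single JSON document on stdout.

```python
import json
import subprocess
from typing import Optional


def build_command(
    text_file: str,
    output: str,
    voice: str = "Charon",
    voice2: Optional[str] = None,
    model: str = "flash",
) -> list[str]:
    """Build the run.py invocation; voice2 switches to two-voice dialogue."""
    if model not in ("flash", "pro"):
        raise ValueError(f"model must be 'flash' or 'pro', got {model!r}")
    cmd = [
        "python3", "scripts/run.py", "generate_audio.py",
        "--text-file", text_file,
        "--voice", voice,
    ]
    if voice2:
        cmd += ["--voice2", voice2]
    cmd += ["--model", model, "--output", output, "--json"]
    return cmd


def run_generation(cmd: list[str]) -> dict:
    """Run the command and parse stdout as JSON (assumed output format)."""
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return json.loads(result.stdout)
```

Passing the arguments as a list avoids shell quoting problems with paths that contain spaces.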
Step 6: Deliver
Present the result to the user:
- File path -- where the audio was saved
- Duration -- human-readable (e.g., "3:42")
- Embed code -- ready-to-paste HTML5 audio tag
- Cost -- estimated API cost
- Placement suggestion -- where to insert the embed in the blog post
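Two of these items are simple string formatting. The sketch below shows one way to produce the human-readable duration and the embed tag; both helper names are hypothetical, and it assumes the duration is available as a number of seconds.

```python
from html import escape


def format_duration(seconds: float) -> str:
    """Format a duration in seconds as m:ss, e.g. 222 -> '3:42'."""
    total = int(round(seconds))
    minutes, secs = divmod(total, 60)
    return f"{minutes}:{secs:02d}"


def embed_tag(src: str) -> str:
    """Build a single-line HTML5 audio tag for an MP3 file."""
    return (
        '<audio controls preload="metadata">'
        f'<source src="{escape(src, quote=True)}" type="audio/mpeg">'
        "</audio>"
    )
```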
Embedding Guide
Standard HTML (Hugo, Jekyll, static sites)
```html
<audio controls preload="metadata">
  <source src="audio/post-slug.mp3" type="audio/mpeg">
  Your browser does not support the audio element.
</audio>
```
MDX (Next.js, Gatsby)
```jsx
<audio controls preload="metadata">
  <source src="/audio/post-slug.mp3" type="audio/mpeg" />
</audio>
```
WordPress
```
[audio src="audio/post-slug.mp3"]
```
Placement
Insert the audio player after the introduction (below the first H2) or at the
very top of the article with a label: "Listen to this article" or "Audio version".
Internal API (for blog-write)
When invoked internally from blog-write:
Input:
- `text`: Prepared text (already cleaned by Claude)
- `voice`: Voice name (default: Charon)
- `voice2`: Second voice for dialogue (optional)
- `model`: flash or pro
- `output_path`: Where to save the file
Output:
```markdown
### Audio Narration
- Path: /path/to/audio/post-slug.mp3
- Duration: 3:42
- Voice: Charon
- Embed:
<audio controls preload="metadata"><source src="audio/post-slug.mp3" type="audio/mpeg"></audio>
```
**Graceful fallback:** If `GOOGLE_AI_API_KEY` is not set, return immediately
with no error. The writing workflow continues without audio. Never block
blog-write because audio generation is unavailable.
Error Handling
| Error | Resolution |
|---|---|
| GOOGLE_AI_API_KEY not set | Get key at https://aistudio.google.com/apikey |
| FFmpeg not found | Install FFmpeg; until then, output falls back to WAV |
| Rate limited | Wait and retry. Check limits at https://aistudio.google.com/rate-limit |
| Text too long (>32k tokens) | Split into sections, generate separately |
| Unknown voice name | Run `/blog audio voices` to see the available voices |
| API error | Check key validity, model availability (preview models) |
| API key missing (internal call) | Return silently -- writing workflow continues |
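For the "Text too long" case, splitting can be done at paragraph boundaries so that no sentence is cut mid-way. The sketch below packs paragraphs greedily into chunks; it is an illustration only. The function name is hypothetical, the word count is used as a rough stand-in for tokens, and the default `max_words` value is an arbitrary placeholder rather than a figure derived from the 32k-token limit.

```python
def split_into_sections(text: str, max_words: int = 4000) -> list[str]:
    """Greedily pack paragraphs into chunks of at most max_words words.

    A single paragraph longer than max_words is kept whole in its own chunk.
    """
    chunks: list[str] = []
    current: list[str] = []
    count = 0
    for para in text.split("\n\n"):
        para = para.strip()
        if not para:
            continue
        n = len(para.split())
        # Start a new chunk when adding this paragraph would exceed the budget.
        if current and count + n > max_words:
            chunks.append("\n\n".join(current))
            current, count = [], 0
        current.append(para)
        count += n
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Each chunk can then be generated separately with its own output path.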
Reference Documentation
Load on-demand -- do NOT load all at startup:
- `references/voices.md` -- Full 30-voice catalog, recommendations by content type, dialogue pairings