audio-producer-agent
Audio Producer
Create single-speaker audio content: audiobooks, voiceovers, narrations, jingles, and more.
This is an orchestrator skill that combines:
- Text-to-speech / narration (Gemini TTS, ElevenLabs, or OpenAI TTS)
- Background music / ambient audio (Lyria)
- Audio assembly (FFmpeg via media-utils)
For dialogues and conversations, use `podcast-producer` instead.
What You Can Create
| Type | Example |
|---|---|
| Audiobook | Long-form narration of text/chapters |
| Voiceover | Narration for video, presentation, or slideshow |
| Audio ad | Radio or podcast advertisement |
| Jingle | Short brand music with optional tagline |
| Sonic logo | Audio brand identifier (few seconds) |
| Audio guide | Museum/tour style narration |
| Meditation | Guided relaxation with ambient audio |
| Soundscape | Ambient audio environment |
Prerequisites
- `GOOGLE_API_KEY` - For Gemini TTS (voice) and Lyria (music)
- FFmpeg installed: `brew install ffmpeg`
Workflow
Step 1: Gather Requirements (REQUIRED)
⚠️ DO NOT skip this step. Use interactive questioning — ask ONE question at a time.
Question Flow
⚠️ Use the `AskUserQuestion` tool for each question below. Do not just print questions in your response; use the tool to create interactive prompts with the options shown.
Q1: Type
"I'll create that audio for you! First — what type of audio?
- Audiobook / narration
- Voiceover (for video/presentation)
- Audio ad / radio ad
- Jingle / sonic logo
- Meditation / guided audio
- Or describe your own"
Wait for response.
Q2: Content
"What's the text/content to speak?
- Paste the text here
- Or describe what you need and I'll write it"
Wait for response.
Q3: Voice
"What voice style?
- Professional
- Warm/friendly
- Energetic
- Calm/soothing
- Dramatic
- Or describe your own"
Wait for response.
Q4: Music
"Do you want background music?
- Yes — describe the style (ambient, upbeat, cinematic, etc.)
- No — voice only"
Wait for response.
Q5: Duration
"What's the target duration?
- Let it be natural length
- Or specify (e.g., 30 seconds, 2 minutes)"
Wait for response.
Quick Reference
| Question | Determines |
|---|---|
| Type | Processing approach and output format |
| Content | TTS input text |
| Voice | Voice selection and style parameters |
| Music | Whether to generate and mix music |
| Duration | Pacing and content length |
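The five answers can be gathered into one record before any generation starts. The following is a minimal sketch, assuming a hypothetical `AudioBrief` dataclass; the class name and its fields are illustrative and are not part of the skill.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioBrief:
    """Answers to Q1-Q5 (hypothetical container, not part of the skill)."""

    audio_type: str                     # Q1: processing approach and output format
    content: str                        # Q2: TTS input text
    voice_style: str                    # Q3: voice selection and style parameters
    music_style: Optional[str] = None   # Q4: None means voice only
    duration_s: Optional[float] = None  # Q5: None means natural length

    @property
    def wants_music(self) -> bool:
        # Music is generated and mixed only when a style was given.
        return bool(self.music_style)


brief = AudioBrief(
    audio_type="voiceover",
    content="Welcome to the quarterly update.",
    voice_style="Professional",
)
```

Keeping the optional answers as `None` makes the "voice only" and "natural length" choices explicit rather than implied by a missing value.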
Step 2: Prepare the Content
For narration/voiceover:
- Optimize text for speech (spell out numbers if needed)
- Add natural pause points (commas, periods)
- Break long content into chunks if > 32k tokens
For jingles/audio ads:
- Write the tagline/copy
- Determine music style
- Plan structure: music intro → voice → music outro
For audiobooks:
- Split into chapters
- Consider different voice styles for different sections
- Plan ambient music (subtle, low volume)
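Chunking long content for the 32k-token limit could look like the sketch below. It splits at sentence boundaries and approximates tokens as four characters each; that ratio and the `split_for_tts` name are assumptions, so use a real tokenizer if the exact limit matters.

```python
import re
from typing import List


def split_for_tts(text: str, max_tokens: int = 32_000, chars_per_token: int = 4) -> List[str]:
    """Split text into chunks at sentence boundaries.

    Token counts are approximated as characters / chars_per_token (an assumption).
    A single sentence longer than the budget is kept whole in its own chunk.
    """
    budget = max_tokens * chars_per_token
    # Split after sentence-ending punctuation that is followed by whitespace.
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        candidate = f"{current} {sentence}".strip()
        if current and len(candidate) > budget:
            # Adding this sentence would exceed the budget: close the chunk.
            chunks.append(current)
            current = sentence
        else:
            current = candidate
    if current:
        chunks.append(current)
    return chunks
```

Splitting on sentence ends rather than at a fixed character offset keeps each chunk ending on a natural pause, which matters when the chunks are later concatenated.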
Step 3: Generate Assets
Type: Voiceover / Narration
Generate narration (Gemini TTS):
bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
  --text "Your narration text here..." \
  --voice Charon \
  --style "Professional, measured pace, warm and authoritative"
Generate background music if needed (Lyria):
bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/music-generation/scripts/lyria.py \
  --prompt "subtle ambient, corporate, unobtrusive, background" \
  --duration 120 \
  --density 0.2 \
  --brightness 0.4
Mix voice with music:
bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/audio_mix.py \
  --voice narration.wav \
  --music background.wav \
  --music-volume 0.15 \
  --fade-in 2 \
  --fade-out 3 \
  -o final_voiceover.mp3
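The three voiceover commands can also be assembled programmatically. This is a sketch only: it uses the flags shown above and nothing else, the `voiceover_commands` helper is hypothetical, and it assumes the scripts write `narration.wav` and `background.wav`, which the commands above imply but do not state.

```python
import os
import shlex
from pathlib import Path
from typing import List, Optional

# Assumption: fall back to the current directory when CLAUDE_PLUGIN_ROOT is unset.
SKILLS = Path(os.environ.get("CLAUDE_PLUGIN_ROOT", ".")) / "skills"


def voiceover_commands(
    text: str,
    voice: str = "Charon",
    style: str = "Professional, measured pace, warm and authoritative",
    music_prompt: Optional[str] = None,
    music_seconds: int = 120,
) -> List[List[str]]:
    """Return argv lists for narration, optional music, and the final mix."""
    tts = str(SKILLS / "voice-generation" / "scripts" / "gemini_tts.py")
    lyria = str(SKILLS / "music-generation" / "scripts" / "lyria.py")
    mix = str(SKILLS / "media-utils" / "scripts" / "audio_mix.py")

    commands = [["python3", tts, "--text", text, "--voice", voice, "--style", style]]
    if music_prompt:
        # Music and mixing are only needed when the user asked for a music bed.
        commands.append([
            "python3", lyria, "--prompt", music_prompt,
            "--duration", str(music_seconds),
            "--density", "0.2", "--brightness", "0.4",
        ])
        commands.append([
            "python3", mix,
            "--voice", "narration.wav",    # assumed TTS output name
            "--music", "background.wav",   # assumed Lyria output name
            "--music-volume", "0.15",
            "--fade-in", "2", "--fade-out", "3",
            "-o", "final_voiceover.mp3",
        ])
    return commands


if __name__ == "__main__":
    for cmd in voiceover_commands(
        "Your narration text here...",
        music_prompt="subtle ambient, corporate, unobtrusive, background",
    ):
        print(shlex.join(cmd))
```

Each list can be passed to `subprocess.run(cmd, check=True)` to execute it; building lists rather than shell strings avoids quoting problems in the narration text.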
Type: Audio Ad / Radio Spot
Structure: 30-second radio ad
0-3s: Music hook (attention grabber)
3-25s: Voice with music bed underneath
25-30s: Music + tagline + CTA
Generate energetic music:
bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/music-generation/scripts/lyria.py \
  --prompt "upbeat, energetic, advertising, catchy, radio jingle" \
  --duration 35 \
  --bpm 120
Generate voice with style:
bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
  --text "Tired of ordinary coffee? Wake up to extraordinary! Premium beans, perfect roast, delivered fresh. Visit BestCoffee.com today and get 20% off your first order!" \
  --voice Puck \
  --style "Energetic, radio announcer style, enthusiastic, clear call to action"
Mix and assemble:
bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/audio_mix.py \
  --voice ad_voice.wav \
  --music ad_music.wav \
  --music-volume 0.35 \
  --fade-in 1 \
  --fade-out 2 \
  -o radio_ad.mp3
Type: Jingle / Sonic Logo
For jingle with tagline:
Generate catchy music:
bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/music-generation/scripts/lyria.py \
  --prompt "catchy jingle, memorable, brand audio, upbeat, major key" \
  --duration 10 \
  --bpm 110 \
  --scale C
Generate tagline:
bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
  --text "TechCorp. Innovation for tomorrow." \
  --voice Kore \
  --style "Confident, aspirational, slight pause between company name and tagline"
Mix tagline over music:
bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/audio_mix.py \
  --voice tagline.wav \
  --music jingle.wav \
  --music-volume 0.5 \
  -o brand_jingle.mp3
For sonic logo (music only):
bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/music-generation/scripts/lyria.py \
  --prompt "sonic logo, 3 seconds, memorable, brand identifier, simple, distinctive" \
  --duration 5 \
  --bpm 100
Type: Audiobook
Process chapters:
bash
# Chapter 1
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
  --text-file chapter1.txt \
  --voice Algieba \
  --style "Audiobook narrator, measured pace, engaging storytelling" \
  -o chapter1.wav

# Chapter 2 (repeat for each remaining chapter)
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
  --text-file chapter2.txt \
  --voice Algieba \
  -o chapter2.wav
Optional: Add subtle ambient music:
bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/music-generation/scripts/lyria.py \
  --prompt "ambient, subtle, reading music, calm, unobtrusive, soft piano" \
  --duration 600 \
  --density 0.1 \
  --brightness 0.3
Concatenate chapters:
bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/audio_concat.py \
  -i chapter1.wav chapter2.wav chapter3.wav \
  --crossfade 0.5 \
  -o audiobook.mp3
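For books with many chapters, the per-chapter and concatenation commands can be derived from the chapter file names. The sketch below is an assumption-laden illustration: `audiobook_commands` and `natural_key` are hypothetical helpers, and passing the same `--style` to every chapter is a choice made here for consistency, not something the skill requires.

```python
import os
import re
from pathlib import Path
from typing import Iterable, List

# Assumption: fall back to the current directory when CLAUDE_PLUGIN_ROOT is unset.
SKILLS = Path(os.environ.get("CLAUDE_PLUGIN_ROOT", ".")) / "skills"


def natural_key(path: Path) -> list:
    # Compare digit runs as integers so "chapter10" sorts after "chapter2".
    return [int(part) if part.isdigit() else part for part in re.split(r"(\d+)", path.stem)]


def audiobook_commands(
    chapters: Iterable[str],
    voice: str = "Algieba",
    style: str = "Audiobook narrator, measured pace, engaging storytelling",
    crossfade: float = 0.5,
    output: str = "audiobook.mp3",
) -> List[List[str]]:
    """Return one TTS argv list per chapter, followed by the concat argv list."""
    tts = str(SKILLS / "voice-generation" / "scripts" / "gemini_tts.py")
    concat = str(SKILLS / "media-utils" / "scripts" / "audio_concat.py")

    ordered = sorted((Path(c) for c in chapters), key=natural_key)
    wavs = [str(p.with_suffix(".wav")) for p in ordered]

    commands = [
        ["python3", tts, "--text-file", str(p), "--voice", voice, "--style", style, "-o", wav]
        for p, wav in zip(ordered, wavs)
    ]
    commands.append(["python3", concat, "-i", *wavs, "--crossfade", str(crossfade), "-o", output])
    return commands
```

Natural sorting matters once a book passes nine chapters, because plain string sorting would place `chapter10.wav` before `chapter2.wav` in the final file.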
Type: Meditation / Relaxation Audio
Generate calming narration:
bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
  --text "Close your eyes. Take a deep breath in... and slowly release..." \
  --voice Achernar \
  --style "Calm, soothing, slow pace, relaxing, gentle, meditation guide"
Generate ambient soundscape:
bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/music-generation/scripts/lyria.py \
  --prompt "ambient, meditation, peaceful, nature sounds, gentle, calming" \
  --duration 300 \
  --density 0.1 \
  --brightness 0.6
Mix with high ambient volume:
bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/audio_mix.py \
  --voice meditation_guide.wav \
  --music ambient.wav \
  --music-volume 0.5 \
  -o meditation_session.mp3
Step 4: Deliver the Result
Example delivery:
"✅ Your audio ad is ready!
File: `coffee_radio_ad.mp3` (30s)
What I created:
- Energetic voiceover (Puck voice, radio announcer style)
- Upbeat background music (120 BPM)
- Music ducks under voice, fades out at end
Structure:
- 0-3s: Music hook
- 3-25s: Voice + music bed
- 25-30s: Music swell + tagline
Want me to:
- Try a different voice?
- Change the music energy?
- Adjust timing?"
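Before reporting a structure like the one in this delivery message, it can help to check that the segments really cover the target duration and that the copy fits the voice window. The sketch below assumes a speaking rate of 150 words per minute, which is a rough rule of thumb rather than a property of any TTS voice; `check_timeline` and `estimate_speech_seconds` are hypothetical helpers.

```python
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start_s, end_s, label)

AD_TIMELINE: List[Segment] = [
    (0, 3, "Music hook"),
    (3, 25, "Voice + music bed"),
    (25, 30, "Music swell + tagline"),
]


def check_timeline(segments: List[Segment], total_s: float) -> None:
    """Raise ValueError unless segments are contiguous and span 0..total_s."""
    cursor = 0.0
    for start, end, label in segments:
        if start != cursor or end <= start:
            raise ValueError(f"gap, overlap, or empty segment at {label!r}")
        cursor = end
    if cursor != total_s:
        raise ValueError(f"timeline ends at {cursor}s, expected {total_s}s")


def estimate_speech_seconds(text: str, words_per_minute: float = 150.0) -> float:
    # Assumed average rate; actual TTS pacing depends on voice and style.
    return len(text.split()) / words_per_minute * 60


AD_COPY = (
    "Tired of ordinary coffee? Wake up to extraordinary! Premium beans, "
    "perfect roast, delivered fresh. Visit BestCoffee.com today and get "
    "20% off your first order!"
)

check_timeline(AD_TIMELINE, 30)
voice_window = AD_TIMELINE[1][1] - AD_TIMELINE[1][0]
fits = estimate_speech_seconds(AD_COPY) <= voice_window
```

If `fits` is false, either the copy needs trimming or the target duration from Q5 needs revisiting before the audio is regenerated.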
Voice Recommendations by Type
| Audio Type | Recommended Voices | Style Direction |
|---|---|---|
| Corporate voiceover | Charon, Orus | Professional, measured |
| Audiobook | Algieba, Despina | Smooth, engaging |
| Radio ad | Puck, Laomedeia | Energetic, upbeat |
| Meditation | Achernar, Sulafat | Calm, soothing |
| Jingle tagline | Kore, Alnilam | Confident, memorable |
| Documentary | Gacrux, Rasalgethi | Mature, authoritative |
| Tutorial | Achird, Charon | Friendly, clear |
Music Recommendations by Type
| Audio Type | Lyria Prompt | Settings |
|---|---|---|
| Corporate VO | "subtle, professional, ambient" | density: 0.2, brightness: 0.4 |
| Radio ad | "upbeat, energetic, catchy" | bpm: 120, density: 0.6 |
| Audiobook | "soft, ambient, unobtrusive" | density: 0.1, brightness: 0.3 |
| Meditation | "peaceful, ambient, nature" | density: 0.1, brightness: 0.6 |
| Jingle | "catchy, memorable, brand" | bpm: 110, density: 0.5 |
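The table above maps naturally onto a preset lookup that produces `lyria.py` arguments. This is a sketch under stated assumptions: the preset keys (`corporate_vo`, `radio_ad`, and so on) and the `lyria_args` helper are hypothetical, and only flags that appear elsewhere in this document are emitted.

```python
from typing import Dict, List

# Values copied from the table; the dictionary keys are illustrative names.
MUSIC_PRESETS: Dict[str, Dict[str, object]] = {
    "corporate_vo": {"prompt": "subtle, professional, ambient", "density": 0.2, "brightness": 0.4},
    "radio_ad": {"prompt": "upbeat, energetic, catchy", "bpm": 120, "density": 0.6},
    "audiobook": {"prompt": "soft, ambient, unobtrusive", "density": 0.1, "brightness": 0.3},
    "meditation": {"prompt": "peaceful, ambient, nature", "density": 0.1, "brightness": 0.6},
    "jingle": {"prompt": "catchy, memorable, brand", "bpm": 110, "density": 0.5},
}


def lyria_args(audio_type: str, duration_s: int) -> List[str]:
    """Build the argument list for lyria.py from a preset (raises KeyError if unknown)."""
    preset = MUSIC_PRESETS[audio_type]
    args = ["--prompt", str(preset["prompt"]), "--duration", str(duration_s)]
    # Emit only the settings the preset defines, in a fixed order.
    for flag in ("bpm", "density", "brightness"):
        if flag in preset:
            args += [f"--{flag}", str(preset[flag])]
    return args
```

Keeping the presets as data means a user's custom style from Q4 can override just the prompt while the density and brightness defaults for the audio type stay in place.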
Limitations
- Gemini TTS max: 32k tokens per request (split longer content)
- Lyria instrumental only: No vocals in background music
- Processing time: Long audiobooks take time to generate
Example Prompts
Voiceover:
"Create a professional voiceover for this script: '...' Add subtle corporate background music."
Audio ad:
"Create a 30-second radio ad for our coffee brand. Energetic, memorable, with catchy music. End with 'Visit BestCoffee.com'"
Jingle:
"Create a 5-second jingle for TechCorp. Modern, memorable, with the tagline 'Innovation for tomorrow'"
Audiobook:
"Convert this text into an audiobook chapter. Use a warm, engaging narrator voice. Add subtle ambient music."
Meditation:
"Create a 5-minute guided meditation. Calm, soothing voice with peaceful ambient background."