audio-producer-agent

Audio Producer

Create single-speaker audio content: audiobooks, voiceovers, narrations, jingles, and more.
This is an orchestrator skill that combines:
  • Text-to-speech / narration (Gemini TTS, ElevenLabs, or OpenAI TTS)
  • Background music / ambient audio (Lyria)
  • Audio assembly (FFmpeg via media-utils)
For dialogues and conversations, use podcast-producer instead.

What You Can Create

| Type | Example |
|------|---------|
| Audiobook | Long-form narration of text/chapters |
| Voiceover | Narration for video, presentation, or slideshow |
| Audio ad | Radio or podcast advertisement |
| Jingle | Short brand music with optional tagline |
| Sonic logo | Audio brand identifier (few seconds) |
| Audio guide | Museum/tour style narration |
| Meditation | Guided relaxation with ambient audio |
| Soundscape | Ambient audio environment |

Prerequisites

  • GOOGLE_API_KEY: used for Gemini TTS (voice) and Lyria (music)
  • FFmpeg installed (on macOS with Homebrew: brew install ffmpeg)
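
A quick preflight check can catch a missing key or a missing FFmpeg binary before any generation starts. The sketch below is one possible way to do that; `check_prerequisites` is a hypothetical helper and is not part of the skill's scripts.

```python
import os
import shutil


def check_prerequisites(env=None, which=shutil.which):
    """Return a list of human-readable problems; an empty list means ready.

    Hypothetical helper: `env` and `which` are injectable so the check can
    be exercised without touching the real environment.
    """
    env = os.environ if env is None else env
    problems = []
    # The key is shared by Gemini TTS (voice) and Lyria (music).
    if not env.get("GOOGLE_API_KEY"):
        problems.append("GOOGLE_API_KEY is not set")
    # FFmpeg is needed for the assembly step.
    if which("ffmpeg") is None:
        problems.append("ffmpeg was not found on PATH")
    return problems
```

Calling `check_prerequisites()` with no arguments inspects the current environment; any returned message can be shown to the user before Step 1.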

Workflow

Step 1: Gather Requirements (REQUIRED)

⚠️ DO NOT skip this step. Use interactive questioning — ask ONE question at a time.

Question Flow

⚠️ Use the AskUserQuestion tool for each question below. Do not just print questions in your response; use the tool to create interactive prompts with the options shown.
Q1: Type
"I'll create that audio for you! First — what type of audio?
  • Audiobook / narration
  • Voiceover (for video/presentation)
  • Audio ad / radio ad
  • Jingle / sonic logo
  • Meditation / guided audio
  • Or describe your own"
Wait for response.
Q2: Content
"What's the text/content to speak?
  • Paste the text here
  • Or describe what you need and I'll write it"
Wait for response.
Q3: Voice
"What voice style?
  • Professional
  • Warm/friendly
  • Energetic
  • Calm/soothing
  • Dramatic
  • Or describe your own"
Wait for response.
Q4: Music
"Do you want background music?
  • Yes — describe the style (ambient, upbeat, cinematic, etc.)
  • No — voice only"
Wait for response.
Q5: Duration
"What's the target duration?
  • Let it be natural length
  • Or specify (e.g., 30 seconds, 2 minutes)"
Wait for response.

Quick Reference

| Question | Determines |
|----------|------------|
| Type | Processing approach and output format |
| Content | TTS input text |
| Voice | Voice selection and style parameters |
| Music | Whether to generate and mix music |
| Duration | Pacing and content length |
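
One way to carry the five answers into the later steps is a small record type. This is only a sketch: the `AudioRequirements` class, its field names, and the step labels are hypothetical and are not defined by the skill.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AudioRequirements:
    """Answers collected in Step 1 (hypothetical structure)."""
    audio_type: str                          # Q1: e.g. "voiceover", "audio ad"
    content: str                             # Q2: text to speak
    voice_style: str                         # Q3: e.g. "Calm/soothing"
    music_style: Optional[str] = None        # Q4: None means voice only
    target_seconds: Optional[float] = None   # Q5: None means natural length

    @property
    def needs_music(self) -> bool:
        """True when a music track must be generated and mixed."""
        return self.music_style is not None

    def pipeline_steps(self) -> list:
        """List the Step 3 stages implied by the answers, in order."""
        steps = ["generate_voice"]
        if self.needs_music:
            steps += ["generate_music", "mix"]
        return steps
```

A voice-only request then yields a single stage, while a request with a music style yields voice, music, and mix.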

Step 2: Prepare the Content

For narration/voiceover:
  • Optimize text for speech (spell out numbers if needed)
  • Add natural pause points (commas, periods)
  • Break long content into chunks if > 32k tokens
For jingles/audio ads:
  • Write the tagline/copy
  • Determine music style
  • Plan structure: music intro → voice → music outro
For audiobooks:
  • Split into chapters
  • Consider different voice styles for different sections
  • Plan ambient music (subtle, low volume)
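
The chunking step for long narration can be sketched as a paragraph-level splitter. Token counts here are approximated as roughly four characters per token, which is an assumption; the real tokenizer will count differently, so leave headroom below the limit. `estimate_tokens` and `chunk_text` are hypothetical helper names.

```python
def estimate_tokens(text: str) -> int:
    """Rough token estimate: about 4 characters per token (assumption)."""
    return max(1, len(text) // 4)


def chunk_text(text: str, max_tokens: int = 32_000) -> list:
    """Group paragraphs into chunks whose estimated size fits max_tokens.

    A single paragraph larger than the limit is kept whole in its own
    chunk; splitting inside a paragraph is left out of this sketch.
    """
    chunks, current, current_tokens = [], [], 0
    for paragraph in text.split("\n\n"):
        paragraph = paragraph.strip()
        if not paragraph:
            continue
        tokens = estimate_tokens(paragraph)
        # Start a new chunk when adding this paragraph would exceed the limit.
        if current and current_tokens + tokens > max_tokens:
            chunks.append("\n\n".join(current))
            current, current_tokens = [], 0
        current.append(paragraph)
        current_tokens += tokens
    if current:
        chunks.append("\n\n".join(current))
    return chunks
```

Splitting on paragraph boundaries keeps each chunk ending at a natural pause, which makes the later concatenation less audible.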

Step 3: Generate Assets

Type: Voiceover / Narration

Generate narration (Gemini TTS):
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
  --text "Your narration text here..." \
  --voice Charon \
  --style "Professional, measured pace, warm and authoritative" \
  -o narration.wav
```
Generate background music if needed (Lyria):
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/music-generation/scripts/lyria.py \
  --prompt "subtle ambient, corporate, unobtrusive, background" \
  --duration 120 \
  --density 0.2 \
  --brightness 0.4
```
Mix voice with music (this assumes the Lyria output was saved as background.wav):
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/audio_mix.py \
  --voice narration.wav \
  --music background.wav \
  --music-volume 0.15 \
  --fade-in 2 \
  --fade-out 3 \
  -o final_voiceover.mp3
```
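
For orientation, a mix of this kind can also be expressed directly as an FFmpeg filter graph using the `volume`, `afade`, and `amix` filters. The sketch below only builds the command line; it does not claim to match what audio_mix.py does internally. `build_mix_command` is a hypothetical helper, and the narration length has to be supplied by the caller so the fade-out can be placed.

```python
def build_mix_command(voice, music, output, music_volume=0.15,
                      fade_in=2.0, fade_out=3.0, voice_seconds=60.0):
    """Build an ffmpeg argv list that lays a quiet music bed under a voice.

    Hypothetical helper. `voice_seconds` is the narration length, used to
    time the fade-out; the mix ends when the voice track ends.
    """
    fade_out_start = max(0.0, voice_seconds - fade_out)
    filter_graph = (
        # Lower the music, then fade it in at the start and out at the end.
        f"[1:a]volume={music_volume},"
        f"afade=t=in:st=0:d={fade_in},"
        f"afade=t=out:st={fade_out_start}:d={fade_out}[bed];"
        # Combine voice (input 0) with the music bed, stopping with the voice.
        "[0:a][bed]amix=inputs=2:duration=first[out]"
    )
    return [
        "ffmpeg", "-y",
        "-i", voice,
        "-i", music,
        "-filter_complex", filter_graph,
        "-map", "[out]",
        output,
    ]
```

The returned list can be passed to `subprocess.run`. Depending on the FFmpeg version, `amix` may rescale its inputs, so the resulting balance is worth checking by ear.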

Type: Audio Ad / Radio Spot

Structure: 30-second radio ad
0-3s:   Music hook (attention grabber)
3-25s:  Voice with music bed underneath
25-30s: Music + tagline + CTA
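
The timing plan implies a word budget for the copy: the voice has roughly 22 seconds between the 3s and 25s marks. The sketch below estimates read time from a word count. The 150 words-per-minute pace is an assumed typical speaking rate, not a figure from Gemini TTS, and both function names are hypothetical.

```python
def estimate_speech_seconds(text: str, words_per_minute: float = 150.0) -> float:
    """Estimate spoken duration from a whitespace word count (assumed pace)."""
    return len(text.split()) * 60.0 / words_per_minute


def fits_slot(text: str, slot_start: float, slot_end: float,
              words_per_minute: float = 150.0) -> bool:
    """True when the estimated read time fits between the two timestamps."""
    return estimate_speech_seconds(text, words_per_minute) <= slot_end - slot_start


# The sample ad copy used in this section (24 words).
ad_copy = ("Tired of ordinary coffee? Wake up to extraordinary! Premium beans, "
           "perfect roast, delivered fresh. Visit BestCoffee.com today and get "
           "20% off your first order!")
```

At the assumed pace the sample copy takes about 9.6 seconds, so `fits_slot(ad_copy, 3, 25)` holds with room for pauses and emphasis.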
Generate energetic music:
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/music-generation/scripts/lyria.py \
  --prompt "upbeat, energetic, advertising, catchy, radio jingle" \
  --duration 35 \
  --bpm 120
```
Generate voice with style:
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
  --text "Tired of ordinary coffee? Wake up to extraordinary! Premium beans, perfect roast, delivered fresh. Visit BestCoffee.com today and get 20% off your first order!" \
  --voice Puck \
  --style "Energetic, radio announcer style, enthusiastic, clear call to action" \
  -o ad_voice.wav
```
Mix and assemble (this assumes the Lyria output was saved as ad_music.wav):
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/audio_mix.py \
  --voice ad_voice.wav \
  --music ad_music.wav \
  --music-volume 0.35 \
  --fade-in 1 \
  --fade-out 2 \
  -o radio_ad.mp3
```

Type: Jingle / Sonic Logo

For jingle with tagline:
Generate catchy music:
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/music-generation/scripts/lyria.py \
  --prompt "catchy jingle, memorable, brand audio, upbeat, major key" \
  --duration 10 \
  --bpm 110 \
  --scale C
```
Generate tagline:
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
  --text "TechCorp. Innovation for tomorrow." \
  --voice Kore \
  --style "Confident, aspirational, slight pause between company name and tagline" \
  -o tagline.wav
```
Mix tagline over music (this assumes the Lyria output was saved as jingle.wav):
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/audio_mix.py \
  --voice tagline.wav \
  --music jingle.wav \
  --music-volume 0.5 \
  -o brand_jingle.mp3
```
For sonic logo (music only):
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/music-generation/scripts/lyria.py \
  --prompt "sonic logo, 3 seconds, memorable, brand identifier, simple, distinctive" \
  --duration 5 \
  --bpm 100
```

Type: Audiobook

Process chapters:
```bash
# Chapter 1
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
  --text-file chapter1.txt \
  --voice Algieba \
  --style "Audiobook narrator, measured pace, engaging storytelling" \
  -o chapter1.wav

# Chapter 2
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
  --text-file chapter2.txt \
  --voice Algieba \
  -o chapter2.wav

# Repeat for each remaining chapter (chapter3.txt, ...)
```
Optional: add subtle ambient music:
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/music-generation/scripts/lyria.py \
  --prompt "ambient, subtle, reading music, calm, unobtrusive, soft piano" \
  --duration 600 \
  --density 0.1 \
  --brightness 0.3
```
Concatenate chapters:
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/audio_concat.py \
  -i chapter1.wav chapter2.wav chapter3.wav \
  --crossfade 0.5 \
  -o audiobook.mp3
```
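
For books with many chapters, the per-chapter commands can be generated rather than typed by hand. The sketch below builds one argument list per chapter using only the gemini_tts.py flags shown in this section (`--text-file`, `--voice`, `--style`, `-o`); `chapter_tts_commands` is a hypothetical helper, and falling back to the current directory when CLAUDE_PLUGIN_ROOT is unset is an assumption.

```python
import os


def chapter_tts_commands(chapter_files, voice="Algieba",
                         style="Audiobook narrator, measured pace, engaging storytelling",
                         plugin_root=None):
    """Return one gemini_tts.py argv list per chapter text file.

    Output names mirror the inputs (chapter1.txt -> chapter1.wav).
    """
    if plugin_root is None:
        plugin_root = os.environ.get("CLAUDE_PLUGIN_ROOT", ".")
    script = f"{plugin_root}/skills/voice-generation/scripts/gemini_tts.py"
    commands = []
    for text_file in chapter_files:
        wav_file = os.path.splitext(text_file)[0] + ".wav"
        commands.append([
            "python3", script,
            "--text-file", text_file,
            "--voice", voice,
            "--style", style,
            "-o", wav_file,
        ])
    return commands
```

Each list can be passed to `subprocess.run(cmd, check=True)`; keeping the same voice and style across chapters helps the concatenated audiobook sound consistent.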

Type: Meditation / Relaxation Audio

Generate calming narration:
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
  --text "Close your eyes. Take a deep breath in... and slowly release..." \
  --voice Achernar \
  --style "Calm, soothing, slow pace, relaxing, gentle, meditation guide" \
  -o meditation_guide.wav
```
Generate ambient soundscape:
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/music-generation/scripts/lyria.py \
  --prompt "ambient, meditation, peaceful, nature sounds, gentle, calming" \
  --duration 300 \
  --density 0.1 \
  --brightness 0.6
```
Mix with high ambient volume (this assumes the Lyria output was saved as ambient.wav):
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/audio_mix.py \
  --voice meditation_guide.wav \
  --music ambient.wav \
  --music-volume 0.5 \
  -o meditation_session.mp3
```

Step 4: Deliver the Result

Example delivery:
"✅ Your audio ad is ready!
File: coffee_radio_ad.mp3 (30s)
What I created:
  • Energetic voiceover (Puck voice, radio announcer style)
  • Upbeat background music (120 BPM)
  • Music ducks under voice, fades out at end
Structure:
  • 0-3s: Music hook
  • 3-25s: Voice + music bed
  • 25-30s: Music swell + tagline
Want me to:
  • Try a different voice?
  • Change the music energy?
  • Adjust timing?"

Voice Recommendations by Type

| Audio Type | Recommended Voices | Style Direction |
|------------|--------------------|-----------------|
| Corporate voiceover | Charon, Orus | Professional, measured |
| Audiobook | Algieba, Despina | Smooth, engaging |
| Radio ad | Puck, Laomedeia | Energetic, upbeat |
| Meditation | Achernar, Sulafat | Calm, soothing |
| Jingle tagline | Kore, Alnilam | Confident, memorable |
| Documentary | Gacrux, Rasalgethi | Mature, authoritative |
| Tutorial | Achird, Charon | Friendly, clear |

Music Recommendations by Type

| Audio Type | Lyria Prompt | Settings |
|------------|--------------|----------|
| Corporate VO | "subtle, professional, ambient" | density: 0.2, brightness: 0.4 |
| Radio ad | "upbeat, energetic, catchy" | bpm: 120, density: 0.6 |
| Audiobook | "soft, ambient, unobtrusive" | density: 0.1, brightness: 0.3 |
| Meditation | "peaceful, ambient, nature" | density: 0.1, brightness: 0.6 |
| Jingle | "catchy, memorable, brand" | bpm: 110, density: 0.5 |
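
The recommendations above can be kept as data and turned into lyria.py arguments on demand. This is a sketch: the `MUSIC_PRESETS` mapping, its key names, and `lyria_args` are hypothetical, and only flags that appear in this document's commands (`--prompt`, `--duration`, `--bpm`, `--density`, `--brightness`) are emitted.

```python
# Values copied from the recommendations table; key names are illustrative.
MUSIC_PRESETS = {
    "corporate_vo": {"prompt": "subtle, professional, ambient", "density": 0.2, "brightness": 0.4},
    "radio_ad": {"prompt": "upbeat, energetic, catchy", "bpm": 120, "density": 0.6},
    "audiobook": {"prompt": "soft, ambient, unobtrusive", "density": 0.1, "brightness": 0.3},
    "meditation": {"prompt": "peaceful, ambient, nature", "density": 0.1, "brightness": 0.6},
    "jingle": {"prompt": "catchy, memorable, brand", "bpm": 110, "density": 0.5},
}


def lyria_args(audio_type: str, duration: int) -> list:
    """Turn a preset into lyria.py command-line arguments."""
    preset = MUSIC_PRESETS[audio_type]
    args = ["--prompt", preset["prompt"], "--duration", str(duration)]
    # Only pass the settings the preset actually defines.
    for key in ("bpm", "density", "brightness"):
        if key in preset:
            args += [f"--{key}", str(preset[key])]
    return args
```

For example, `lyria_args("radio_ad", 35)` produces the prompt, a 35-second duration, a tempo of 120 BPM, and a density of 0.6, ready to append to the lyria.py invocation.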

Limitations

  • Gemini TTS max: 32k tokens per request (split longer content)
  • Lyria instrumental only: No vocals in background music
  • Processing time: Long audiobooks take time to generate

Example Prompts

Voiceover:
"Create a professional voiceover for this script: '...' Add subtle corporate background music."
Audio ad:
"Create a 30-second radio ad for our coffee brand. Energetic, memorable, with catchy music. End with 'Visit BestCoffee.com'"
Jingle:
"Create a 5-second jingle for TechCorp. Modern, memorable, with the tagline 'Innovation for tomorrow'"
Audiobook:
"Convert this text into an audiobook chapter. Use a warm, engaging narrator voice. Add subtle ambient music."
Meditation:
"Create a 5-minute guided meditation. Calm, soothing voice with peaceful ambient background."