audio-producer-agent

Audio Producer

Create single-speaker audio content: audiobooks, voiceovers, narrations, jingles, and more.

This is an orchestrator skill that combines:

Text-to-speech / narration (Gemini TTS, ElevenLabs, or OpenAI TTS)
Background music / ambient audio (Lyria)
Audio assembly (FFmpeg via media-utils)

For dialogues and conversations, use `podcast-producer` instead.

What You Can Create

| Type | Example |
|------|---------|
| Audiobook | Long-form narration of text/chapters |
| Voiceover | Narration for video, presentation, or slideshow |
| Audio ad | Radio or podcast advertisement |
| Jingle | Short brand music with optional tagline |
| Sonic logo | Audio brand identifier (a few seconds) |
| Audio guide | Museum- or tour-style narration |
| Meditation | Guided relaxation with ambient audio |
| Soundscape | Ambient audio environment |

Prerequisites

- `GOOGLE_API_KEY` environment variable: required for Gemini TTS (voice) and Lyria (music)
- FFmpeg installed (on macOS with Homebrew: `brew install ffmpeg`)
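A quick preflight check can confirm both prerequisites before any generation step runs. This is a minimal sketch using only the Python standard library. The function name `missing_prerequisites` is hypothetical and is not part of the skill's scripts. It also assumes the key is read from the environment.

```python
import os
import shutil
from typing import Callable, List, Mapping, Optional


def missing_prerequisites(
    env: Mapping[str, str] = os.environ,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> List[str]:
    """Describe each missing prerequisite; an empty list means all are present.

    `env` and `which` are injectable so the check can be exercised without
    touching the real environment or PATH.
    """
    missing: List[str] = []
    # Assumption: Gemini TTS and Lyria read the key from this environment variable.
    if not env.get("GOOGLE_API_KEY"):
        missing.append("GOOGLE_API_KEY environment variable")
    # Audio assembly in media-utils relies on FFmpeg, so the binary must be on PATH.
    if which("ffmpeg") is None:
        missing.append("ffmpeg executable on PATH")
    return missing
```

Calling `missing_prerequisites()` with no arguments checks the real environment and PATH.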

Workflow

Step 1: Gather Requirements (REQUIRED)

⚠️ DO NOT skip this step. Use interactive questioning — ask ONE question at a time.

Question Flow

⚠️ Use the `AskUserQuestion` tool for each question below. Do not just print the questions in your response; use the tool to create interactive prompts with the options shown.

Q1: Type

"I'll create that audio for you! First — what type of audio?

Audiobook / narration

Voiceover (for video/presentation)

Audio ad / radio ad

Jingle / sonic logo

Meditation / guided audio

Or describe your own"

Wait for response.

Q2: Content

"What's the text/content to speak?

Paste the text here

Or describe what you need and I'll write it"

Wait for response.

Q3: Voice

"What voice style?

Professional

Warm/friendly

Energetic

Calm/soothing

Dramatic

Or describe your own"

Wait for response.

Q4: Music

"Do you want background music?

Yes — describe the style (ambient, upbeat, cinematic, etc.)

No — voice only"

Wait for response.

Q5: Duration

"What's the target duration?

Let it be natural length

Or specify (e.g., 30 seconds, 2 minutes)"

Wait for response.

Quick Reference

| Question | Determines |
|----------|------------|
| Type | Processing approach and output format |
| Content | TTS input text |
| Voice | Voice selection and style parameters |
| Music | Whether to generate and mix music |
| Duration | Pacing and content length |
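The table amounts to a mapping from the five answers to the generation steps in Step 3. This is a minimal sketch, assuming the answers have already been collected as plain strings. `AudioPlan` and `steps()` are hypothetical names, and the step labels refer to the scripts used later in this workflow.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class AudioPlan:
    """Hypothetical record of the five Step 1 answers."""

    audio_type: str                  # Q1: processing approach and output format
    text: str                        # Q2: TTS input text ("" for music-only output)
    voice_style: str                 # Q3: voice selection and style parameters
    music_prompt: Optional[str]      # Q4: None means voice only
    target_seconds: Optional[float]  # Q5: None means natural length

    @property
    def needs_music(self) -> bool:
        # Q4 alone decides whether a Lyria track is generated and mixed.
        return self.music_prompt is not None

    def steps(self) -> List[str]:
        """Ordered Step 3 actions implied by the answers."""
        steps: List[str] = []
        if self.text.strip():
            steps.append("tts")    # gemini_tts.py
        if self.needs_music:
            steps.append("music")  # lyria.py
        if len(steps) == 2:
            steps.append("mix")    # audio_mix.py lays the voice over the music
        return steps
```

A music-only sonic logo, for example, yields just `["music"]`, matching the sonic-logo recipe in Step 3.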

Step 2: Prepare the Content

For narration/voiceover:

Optimize text for speech (spell out numbers if needed)
Add natural pause points (commas, periods)
Break long content into chunks if > 32k tokens

For jingles/audio ads:

Write the tagline/copy
Determine music style
Plan structure: music intro → voice → music outro

For audiobooks:

Split into chapters
Consider different voice styles for different sections
Plan ambient music (subtle, low volume)
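The chunking rule can be sketched as a greedy packer that keeps sentences whole, so each TTS request ends at a natural pause. This sketch assumes roughly four characters per token, which is only a heuristic. Gemini's tokenizer will count differently, so leave headroom or substitute a real token count. `chunk_for_tts` is a hypothetical helper, not part of `gemini_tts.py`.

```python
import re
from typing import List

CHARS_PER_TOKEN = 4   # assumption: rough average for English text
MAX_TOKENS = 32_000   # Gemini TTS per-request limit stated under Limitations


def estimate_tokens(text: str) -> int:
    """Crude ceiling estimate of the token count from character length."""
    return (len(text) + CHARS_PER_TOKEN - 1) // CHARS_PER_TOKEN


def chunk_for_tts(text: str, max_tokens: int = MAX_TOKENS) -> List[str]:
    """Greedily pack whole sentences into chunks that stay within max_tokens."""
    max_chars = max_tokens * CHARS_PER_TOKEN
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks: List[str] = []
    current = ""
    for sentence in sentences:
        # Fallback: a single oversized sentence is cut into fixed-size slices.
        pieces = [sentence[i:i + max_chars] for i in range(0, len(sentence), max_chars)] or [""]
        for piece in pieces:
            candidate = f"{current} {piece}".strip()
            if estimate_tokens(candidate) <= max_tokens:
                current = candidate
            else:
                chunks.append(current)
                current = piece
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be saved to its own text file, passed to `gemini_tts.py --text-file`, and the resulting WAVs joined with `audio_concat.py`.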

Step 3: Generate Assets

Type: Voiceover / Narration

Generate narration (Gemini TTS):

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
  --text "Your narration text here..." \
  --voice Charon \
  --style "Professional, measured pace, warm and authoritative" \
  -o narration.wav
```

Generate background music if needed (Lyria):

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/music-generation/scripts/lyria.py \
  --prompt "subtle ambient, corporate, unobtrusive, background" \
  --duration 120 \
  --density 0.2 \
  --brightness 0.4
```

Mix voice with music:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/audio_mix.py \
  --voice narration.wav \
  --music background.wav \
  --music-volume 0.15 \
  --fade-in 2 \
  --fade-out 3 \
  -o final_voiceover.mp3
```
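How `audio_mix.py` works internally is not documented here, but its flags map naturally onto FFmpeg's audio filters. The sketch below is one assumed way to do it, not the script's actual implementation. It treats `--music-volume` as a linear gain on the music and applies both fades to the music bed only. It then mixes the two tracks with FFmpeg's `volume`, `afade`, and `amix` filters. `build_mix_command` is a hypothetical name.

```python
from typing import List


def build_mix_command(
    voice: str,
    music: str,
    output: str,
    voice_seconds: float,
    music_volume: float = 0.15,
    fade_in: float = 2.0,
    fade_out: float = 3.0,
) -> List[str]:
    """FFmpeg arguments that lay an attenuated, faded music bed under a voice track."""
    # Start the music fade-out so it finishes exactly when the voice track ends.
    fade_out_start = max(0.0, voice_seconds - fade_out)
    music_chain = (
        f"[1:a]volume={music_volume},"
        f"afade=t=in:st=0:d={fade_in},"
        f"afade=t=out:st={fade_out_start}:d={fade_out}[bed]"
    )
    # duration=first trims the (usually longer) music bed to the voice length.
    mix = "[0:a][bed]amix=inputs=2:duration=first[out]"
    return [
        "ffmpeg", "-y",
        "-i", voice,
        "-i", music,
        "-filter_complex", f"{music_chain};{mix}",
        "-map", "[out]",
        output,
    ]
```

Passing the list to `subprocess.run(cmd, check=True)` executes it. There are two caveats. First, `amix` scales its inputs down by default, so the result can be quieter than the voice alone. Second, a constant-gain bed does not duck under speech; FFmpeg's `sidechaincompress` filter is one option for that.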

Type: Audio Ad / Radio Spot

Structure: 30-second radio ad

0-3s:   Music hook (attention grabber)
3-25s:  Voice with music bed underneath
25-30s: Music + tagline + CTA
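Before generating anything, it helps to confirm that the segments cover the full 30 seconds without gaps and that the voice copy fits its window. This is a minimal sketch. The 160 words-per-minute pace is an assumed figure for an energetic read, not a measured property of any Gemini voice, and the helper names are hypothetical.

```python
from typing import List, Tuple

Segment = Tuple[float, float, str]  # (start seconds, end seconds, label)

AD_30S: List[Segment] = [
    (0.0, 3.0, "Music hook"),
    (3.0, 25.0, "Voice with music bed"),
    (25.0, 30.0, "Music + tagline + CTA"),
]

WORDS_PER_MINUTE = 160  # assumption: brisk announcer pace; time your chosen voice


def check_timeline(segments: List[Segment], total: float) -> None:
    """Raise ValueError if segments leave gaps, overlap, or miss the target length."""
    cursor = 0.0
    for start, end, label in segments:
        if start != cursor or end <= start:
            raise ValueError(f"segment {label!r} does not follow on from {cursor}s")
        cursor = end
    if cursor != total:
        raise ValueError(f"timeline ends at {cursor}s, expected {total}s")


def spoken_seconds(script: str, wpm: float = WORDS_PER_MINUTE) -> float:
    """Estimate read time from the word count."""
    return len(script.split()) * 60 / wpm
```

The coffee script in the TTS command below is 24 words long. At the assumed pace it needs about 9 seconds, well inside the 22-second voice window.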

Generate energetic music:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/music-generation/scripts/lyria.py \
  --prompt "upbeat, energetic, advertising, catchy, radio jingle" \
  --duration 35 \
  --bpm 120
```

Generate voice with style:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
  --text "Tired of ordinary coffee? Wake up to extraordinary! Premium beans, perfect roast, delivered fresh. Visit BestCoffee.com today and get 20% off your first order!" \
  --voice Puck \
  --style "Energetic, radio announcer style, enthusiastic, clear call to action" \
  -o ad_voice.wav
```

Mix and assemble:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/audio_mix.py \
  --voice ad_voice.wav \
  --music ad_music.wav \
  --music-volume 0.35 \
  --fade-in 1 \
  --fade-out 2 \
  -o radio_ad.mp3
```

Type: Jingle / Sonic Logo

For jingle with tagline:

Generate catchy music:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/music-generation/scripts/lyria.py \
  --prompt "catchy jingle, memorable, brand audio, upbeat, major key" \
  --duration 10 \
  --bpm 110 \
  --scale C
```

Generate tagline:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
  --text "TechCorp. Innovation for tomorrow." \
  --voice Kore \
  --style "Confident, aspirational, slight pause between company name and tagline" \
  -o tagline.wav
```

Mix tagline over music:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/audio_mix.py \
  --voice tagline.wav \
  --music jingle.wav \
  --music-volume 0.5 \
  -o brand_jingle.mp3
```

For sonic logo (music only):

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/music-generation/scripts/lyria.py \
  --prompt "sonic logo, 3 seconds, memorable, brand identifier, simple, distinctive" \
  --duration 5 \
  --bpm 100
```
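Short brand music can end more cleanly when its length is a whole number of bars at the chosen BPM. The sketch below shows that arithmetic, assuming 4/4 time. The helper names are hypothetical. This document does not say whether `lyria.py --duration` accepts fractional seconds, so round the result if it does not.

```python
import math


def seconds_per_bar(bpm: float, beats_per_bar: int = 4) -> float:
    """Length of one bar in seconds (assumes BPM counts the bar's beat unit)."""
    return beats_per_bar * 60.0 / bpm


def bar_aligned_duration(target_s: float, bpm: float, beats_per_bar: int = 4) -> float:
    """Round a target length up to a whole number of bars (at least one)."""
    bar = seconds_per_bar(bpm, beats_per_bar)
    bars = max(1, math.ceil(target_s / bar))
    return bars * bar
```

At the 110 BPM used for the jingle, one bar lasts about 2.18 s, so a 10-second target rounds up to 5 bars (about 10.9 s).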

Type: Audiobook

Process chapters:

```bash
# Chapter 1
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
  --text-file chapter1.txt \
  --voice Algieba \
  --style "Audiobook narrator, measured pace, engaging storytelling" \
  -o chapter1.wav

# Chapter 2
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
  --text-file chapter2.txt \
  --voice Algieba \
  -o chapter2.wav

# Repeat for chapter3.txt and any remaining chapters, keeping the same voice.
```

**Optional: Add subtle ambient music:**

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/music-generation/scripts/lyria.py \
  --prompt "ambient, subtle, reading music, calm, unobtrusive, soft piano" \
  --duration 600 \
  --density 0.1 \
  --brightness 0.3
```

Concatenate chapters:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/audio_concat.py \
  -i chapter1.wav chapter2.wav chapter3.wav \
  --crossfade 0.5 \
  -o audiobook.mp3
```
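With `--crossfade 0.5`, each chapter boundary overlaps by half a second, so the finished audiobook is slightly shorter than the sum of its chapters. The sketch below expresses the same join in FFmpeg by chaining its `acrossfade` filter pairwise. This is an assumption about one way to do it; `audio_concat.py` may work differently, and the helper names are hypothetical.

```python
from typing import List


def acrossfade_graph(n_inputs: int, crossfade: float = 0.5) -> str:
    """Chain FFmpeg acrossfade filters so input k blends into input k+1."""
    if n_inputs < 2:
        raise ValueError("need at least two inputs to crossfade")
    parts: List[str] = []
    prev = "[0:a]"
    for k in range(1, n_inputs):
        # Intermediate results get [x1], [x2], ...; the last one is [out].
        label = "[out]" if k == n_inputs - 1 else f"[x{k}]"
        parts.append(f"{prev}[{k}:a]acrossfade=d={crossfade}{label}")
        prev = label
    return ";".join(parts)


def concat_length(durations: List[float], crossfade: float = 0.5) -> float:
    """Each of the n-1 crossfades overlaps two neighbours, removing `crossfade` seconds."""
    return sum(durations) - crossfade * (len(durations) - 1)
```

Together with one `-i` per chapter and `-map "[out]"`, the graph string can be passed to `ffmpeg -filter_complex`.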

Type: Meditation / Relaxation Audio

Generate calming narration:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/voice-generation/scripts/gemini_tts.py \
  --text "Close your eyes. Take a deep breath in... and slowly release..." \
  --voice Achernar \
  --style "Calm, soothing, slow pace, relaxing, gentle, meditation guide" \
  -o meditation_guide.wav
```

Generate ambient soundscape:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/music-generation/scripts/lyria.py \
  --prompt "ambient, meditation, peaceful, nature sounds, gentle, calming" \
  --duration 300 \
  --density 0.1 \
  --brightness 0.6
```

Mix with high ambient volume:

```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/media-utils/scripts/audio_mix.py \
  --voice meditation_guide.wav \
  --music ambient.wav \
  --music-volume 0.5 \
  -o meditation_session.mp3
```
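Suppose `--music-volume` is a linear amplitude multiplier; this is an assumption, since the script's units are not documented here. It then converts to decibels as 20 * log10(gain), which makes the levels in this document easier to compare. The meditation mix's 0.5 attenuates the music by only about 6 dB, while the corporate voiceover's 0.15 attenuates it by about 16.5 dB. `gain_to_db` is a hypothetical helper.

```python
import math


def gain_to_db(gain: float) -> float:
    """Convert a linear amplitude multiplier to decibels (20 * log10)."""
    if gain <= 0:
        raise ValueError("gain must be positive")
    return 20 * math.log10(gain)


# --music-volume values used in this document's recipes
for label, gain in [("voiceover", 0.15), ("radio ad", 0.35), ("jingle/meditation", 0.5)]:
    print(f"{label}: {gain} -> {gain_to_db(gain):.1f} dB")
```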

Step 4: Deliver the Result

Example delivery:

"✅ Your audio ad is ready!

File: `coffee_radio_ad.mp3` (30s)

What I created:

Energetic voiceover (Puck voice, radio announcer style)
Upbeat background music (120 BPM)
Music ducks under voice, fades out at end

Structure:

0-3s: Music hook
3-25s: Voice + music bed
25-30s: Music swell + tagline

Want me to:

Try a different voice?
Change the music energy?
Adjust timing?"
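Before quoting a duration such as "(30s)" in the delivery message, it is safer to measure the rendered file than to trust the plan. The sketch below uses `ffprobe`, which ships with FFmpeg. The helper names and the "2m 05s" display format are illustrative choices, not part of the skill.

```python
import subprocess
from typing import List


def ffprobe_duration_cmd(path: str) -> List[str]:
    """ffprobe invocation that prints only the container duration in seconds."""
    return [
        "ffprobe", "-v", "error",
        "-show_entries", "format=duration",
        "-of", "default=noprint_wrappers=1:nokey=1",
        path,
    ]


def parse_duration(stdout: str) -> float:
    return float(stdout.strip())


def format_duration(seconds: float) -> str:
    """Render as '30s' or '2m 05s' for the delivery message."""
    minutes, secs = divmod(round(seconds), 60)
    return f"{minutes}m {secs:02d}s" if minutes else f"{secs}s"


def probe_duration(path: str) -> float:
    result = subprocess.run(
        ffprobe_duration_cmd(path), capture_output=True, text=True, check=True
    )
    return parse_duration(result.stdout)
```

For example, `format_duration(probe_duration("coffee_radio_ad.mp3"))` gives the label to use in the delivery message.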

Voice Recommendations by Type

| Audio Type | Recommended Voices | Style Direction |
|------------|--------------------|-----------------|
| Corporate voiceover | Charon, Orus | Professional, measured |
| Audiobook | Algieba, Despina | Smooth, engaging |
| Radio ad | Puck, Laomedeia | Energetic, upbeat |
| Meditation | Achernar, Sulafat | Calm, soothing |
| Jingle tagline | Kore, Alnilam | Confident, memorable |
| Documentary | Gacrux, Rasalgethi | Mature, authoritative |
| Tutorial | Achird, Charon | Friendly, clear |
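The table translates directly into a lookup that feeds `--voice` and `--style`. This is a minimal sketch. Two behaviors are assumptions rather than documented behavior: unlisted types fall back to the corporate voiceover entry, and the user's Q3 answer replaces the default style direction. `pick_voice` is a hypothetical helper.

```python
from typing import Dict, Tuple

# Transcribed from the table above: type -> (voices, style direction).
VOICE_RECOMMENDATIONS: Dict[str, Tuple[Tuple[str, ...], str]] = {
    "corporate voiceover": (("Charon", "Orus"), "Professional, measured"),
    "audiobook": (("Algieba", "Despina"), "Smooth, engaging"),
    "radio ad": (("Puck", "Laomedeia"), "Energetic, upbeat"),
    "meditation": (("Achernar", "Sulafat"), "Calm, soothing"),
    "jingle tagline": (("Kore", "Alnilam"), "Confident, memorable"),
    "documentary": (("Gacrux", "Rasalgethi"), "Mature, authoritative"),
    "tutorial": (("Achird", "Charon"), "Friendly, clear"),
}


def pick_voice(audio_type: str, user_style: str = "") -> Tuple[str, str]:
    """Return (voice, style) for gemini_tts.py, preferring the user's own style answer."""
    voices, default_style = VOICE_RECOMMENDATIONS.get(
        audio_type.strip().lower(), VOICE_RECOMMENDATIONS["corporate voiceover"]
    )
    return voices[0], user_style or default_style
```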

Music Recommendations by Type

| Audio Type | Lyria Prompt | Settings |
|------------|--------------|----------|
| Corporate VO | "subtle, professional, ambient" | density: 0.2, brightness: 0.4 |
| Radio ad | "upbeat, energetic, catchy" | bpm: 120, density: 0.6 |
| Audiobook | "soft, ambient, unobtrusive" | density: 0.1, brightness: 0.3 |
| Meditation | "peaceful, ambient, nature" | density: 0.1, brightness: 0.6 |
| Jingle | "catchy, memorable, brand" | bpm: 110, density: 0.5 |
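The presets can be turned into `lyria.py` arguments mechanically, using only flags that appear in this document's examples. This is a sketch; the preset keys and the `lyria_args` helper are hypothetical. `--duration` is supplied per job because the table does not fix it.

```python
from typing import Dict, List

# Transcribed from the table above.
MUSIC_PRESETS: Dict[str, Dict[str, object]] = {
    "corporate vo": {"prompt": "subtle, professional, ambient", "density": 0.2, "brightness": 0.4},
    "radio ad": {"prompt": "upbeat, energetic, catchy", "bpm": 120, "density": 0.6},
    "audiobook": {"prompt": "soft, ambient, unobtrusive", "density": 0.1, "brightness": 0.3},
    "meditation": {"prompt": "peaceful, ambient, nature", "density": 0.1, "brightness": 0.6},
    "jingle": {"prompt": "catchy, memorable, brand", "bpm": 110, "density": 0.5},
}

# Only flags that appear in this document's lyria.py examples, in a fixed order.
FLAG_ORDER = ["prompt", "duration", "bpm", "density", "brightness"]


def lyria_args(audio_type: str, duration: int, script: str = "lyria.py") -> List[str]:
    """Build a lyria.py command line from a preset plus a per-job duration."""
    settings = dict(MUSIC_PRESETS[audio_type.lower()], duration=duration)
    args = ["python3", script]
    for key in FLAG_ORDER:
        if key in settings:
            args += [f"--{key}", str(settings[key])]
    return args
```

Pass `script=` the expanded `${CLAUDE_PLUGIN_ROOT}/skills/music-generation/scripts/lyria.py` path and hand the list to `subprocess.run`.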

Limitations

Gemini TTS max: 32k tokens per request (split longer content)
Lyria instrumental only: No vocals in background music
Processing time: Long audiobooks take time to generate

Example Prompts

Voiceover:

"Create a professional voiceover for this script: '...' Add subtle corporate background music."

Audio ad:

"Create a 30-second radio ad for our coffee brand. Energetic, memorable, with catchy music. End with 'Visit BestCoffee.com'"

Jingle:

"Create a 5-second jingle for TechCorp. Modern, memorable, with the tagline 'Innovation for tomorrow'"

Audiobook:

"Convert this text into an audiobook chapter. Use a warm, engaging narrator voice. Add subtle ambient music."

Meditation:

"Create a 5-minute guided meditation. Calm, soothing voice with peaceful ambient background."
