happy-audio-gen
Turns text into speech across six providers through a single CLI. Every provider is called synchronously (TTS is fast, typically under 10 seconds) except Bailian's voice-design flow, which is supported but uses a longer polling window.
Quick usage
```bash
# Shortest path: OpenAI default voice
bun scripts/main.ts --text "Hello, world" --out ./hello.mp3

# Chinese, MiniMax
bun scripts/main.ts --provider minimax --text "大家好" --voice male-qn-qingse --out ./hello.mp3

# Long-form, Bailian (auto-splits by sentence)
bun scripts/main.ts --provider bailian --textfiles ./script.md --out ./narration.mp3
```
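When scripting these calls from TypeScript instead of a shell, one option is to build the argument vector directly. The sketch below assumes only the flags documented in this skill; `CliOptions` and `buildArgv` are hypothetical names, not part of happy-audio-gen.

```typescript
// Hypothetical option bag; the flag names are the ones documented in this skill.
interface CliOptions {
  provider?: string;
  model?: string;
  voice?: string;
  text?: string;
  textfiles?: string;
  rate?: number;
  instruction?: string;
  language?: string;
  out: string;
}

// Build an argv array for scripts/main.ts. Each value is its own element,
// so no shell quoting is involved when the array goes to a process spawner.
function buildArgv(opts: CliOptions): string[] {
  const argv = ["bun", "scripts/main.ts"];
  const flags: Array<[string, string | number | undefined]> = [
    ["--provider", opts.provider],
    ["--model", opts.model],
    ["--voice", opts.voice],
    ["--text", opts.text],
    ["--textfiles", opts.textfiles],
    ["--rate", opts.rate],
    ["--instruction", opts.instruction],
    ["--language", opts.language],
    ["--out", opts.out],
  ];
  for (const [flag, value] of flags) {
    // Omit flags the caller did not set so the CLI's own defaults apply.
    if (value !== undefined) argv.push(flag, String(value));
  }
  return argv;
}
```

The array can be handed to a spawner that takes a command as a string array, such as `Bun.spawnSync`; because each value is a separate element, `--text` needs no shell quoting.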
When to invoke this skill
- User asks to synthesize speech / TTS / read aloud / narrate / dub / make a voice-over.
- User asks to convert script / text / article into audio.
- User names a TTS voice or model.
Do not route here when the user wants to transcribe audio to text (that is STT, a different domain) or to edit or mix audio files (use a dedicated audio editor).
Step 0: Preflight (BLOCKING)
- Locate EXTEND.md in one of these locations:
  - `./.happy-skills/happy-audio-gen/EXTEND.md`
  - `$XDG_CONFIG_HOME/happy-skills/happy-audio-gen/EXTEND.md`
  - `~/.happy-skills/happy-audio-gen/EXTEND.md`

  If none is found, run `bun scripts/main.ts --setup` and walk the user through `references/config/first-time-setup.md`.
- Verify that at least one provider has credentials (an env var or a 1Password reference).
- Verify that Bun is available. Fallback: `npx -y bun`.
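The EXTEND.md lookup can be sketched as follows, assuming the first existing candidate wins and that an unset `XDG_CONFIG_HOME` or `HOME` simply skips that candidate (the real CLI may instead fall back to `~/.config`, as the XDG Base Directory spec suggests). `expandPath` and `findExtendMd` are hypothetical; file existence is injected as a callback so the sketch stays pure.

```typescript
// Candidate locations for EXTEND.md, as listed in Step 0.
const CANDIDATES = [
  "./.happy-skills/happy-audio-gen/EXTEND.md",
  "$XDG_CONFIG_HOME/happy-skills/happy-audio-gen/EXTEND.md",
  "~/.happy-skills/happy-audio-gen/EXTEND.md",
];

type Env = Record<string, string | undefined>;

// Expand $XDG_CONFIG_HOME or a leading ~; null means "variable unset, skip this one".
function expandPath(p: string, env: Env): string | null {
  const xdgVar = "$XDG_CONFIG_HOME";
  if (p.startsWith(xdgVar)) {
    const xdg = env.XDG_CONFIG_HOME;
    return xdg ? xdg + p.slice(xdgVar.length) : null;
  }
  if (p.startsWith("~/")) {
    const home = env.HOME;
    return home ? home + p.slice(1) : null;
  }
  return p;
}

// Return the first candidate that exists (assumed: first match wins), else null.
function findExtendMd(env: Env, exists: (path: string) => boolean): string | null {
  for (const candidate of CANDIDATES) {
    const path = expandPath(candidate, env);
    if (path !== null && exists(path)) return path;
  }
  return null;
}
```

Under Bun, a caller could pass `process.env` and `existsSync` from `node:fs`; a `null` result is the cue to run `--setup`.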
Step 1: Choose provider
Preference order:
- `--provider <id>`
- `default_provider` in EXTEND.md
- Auto-detect from env vars, in this order: `openai > elevenlabs > bailian > minimax > siliconflow > playht`
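The preference order reduces to a short fallback chain. This sketch assumes the credential check is supplied by the caller, since the per-provider env var names live in `references/providers.md`; `resolveProvider` and `hasCredentials` are hypothetical.

```typescript
// Auto-detect order from Step 1.
const AUTO_DETECT_ORDER = ["openai", "elevenlabs", "bailian", "minimax", "siliconflow", "playht"] as const;
type ProviderId = (typeof AUTO_DETECT_ORDER)[number];

interface ProviderInputs {
  flag?: string; // value passed to --provider, if any
  defaultProvider?: string; // default_provider from EXTEND.md, if set
  hasCredentials: (p: ProviderId) => boolean; // caller-supplied credential check
}

// Explicit flag first, then EXTEND.md, then the first provider with credentials.
function resolveProvider({ flag, defaultProvider, hasCredentials }: ProviderInputs): string | null {
  if (flag) return flag;
  if (defaultProvider) return defaultProvider;
  for (const p of AUTO_DETECT_ORDER) {
    if (hasCredentials(p)) return p;
  }
  return null; // nothing configured: Step 0 should have caught this
}
```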
Pick by language / voice intent:
- English, natural and fast → `openai` (`gpt-4o-mini-tts` / `tts-1`).
- Multilingual, voice cloning → `elevenlabs`.
- Chinese, long-form → `bailian` (`qwen-tts` auto-chunks long scripts) or `minimax`.
- Chinese dialect / voice design → `bailian` (voice design with `qwen3-tts-vd`) or `siliconflow` (CosyVoice2).
- Ultra-realistic, short-form → `playht` (2.0).
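The intent list above can be encoded as data so that the first suggested provider with credentials is chosen. The intent keys and `pickForIntent` are hypothetical labels for the bullets, not identifiers the CLI accepts.

```typescript
type Intent =
  | "english-natural-fast"
  | "multilingual-cloning"
  | "chinese-long-form"
  | "chinese-dialect-voice-design"
  | "ultra-realistic-short-form";

// Providers per intent, in the order the bullets list them.
const SUGGESTED: Record<Intent, readonly string[]> = {
  "english-natural-fast": ["openai"],
  "multilingual-cloning": ["elevenlabs"],
  "chinese-long-form": ["bailian", "minimax"],
  "chinese-dialect-voice-design": ["bailian", "siliconflow"],
  "ultra-realistic-short-form": ["playht"],
};

// First suggested provider that has credentials, or null if none do.
function pickForIntent(intent: Intent, hasCredentials: (p: string) => boolean): string | null {
  return SUGGESTED[intent].find((p) => hasCredentials(p)) ?? null;
}
```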
Step 2: Fill parameters
- `--text` or `--textfiles`: input. Always quote the value.
- `--out <path>`: REQUIRED. The file extension determines the format (`.mp3`/`.wav`/`.ogg`/`.flac`).
- `--voice <id>`: provider-specific. See `references/voices.md` for the short list of well-known voices.
- `--rate 0.5..2.0`: speaking rate.
- `--instruction "..."`: voice direction (only `openai` with `gpt-4o-mini-tts` and `siliconflow` honor this).
- `--language <code>`: e.g. `en`, `zh`, `ja`; only a few providers honor this explicitly.
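A minimal pre-flight check of these rules, assuming `--text` and `--textfiles` are alternatives (at least one required) and that the extension of `--out` is matched case-insensitively; `validateArgs` is a hypothetical helper, not the CLI's own parser.

```typescript
// Output formats accepted via the --out extension (Step 2).
const FORMATS = ["mp3", "wav", "ogg", "flac"] as const;
type Format = (typeof FORMATS)[number];

interface TtsArgs {
  text?: string;
  textfiles?: string;
  out?: string;
  rate?: number;
}

// Returns the output format derived from --out, or throws on invalid input.
function validateArgs(args: TtsArgs): Format {
  if (!args.text && !args.textfiles) throw new Error("provide --text or --textfiles");
  if (!args.out) throw new Error("--out <path> is required");
  const dot = args.out.lastIndexOf(".");
  const ext = dot >= 0 ? args.out.slice(dot + 1).toLowerCase() : "";
  if (!(FORMATS as readonly string[]).includes(ext)) {
    throw new Error(`unsupported output extension: ${args.out}`);
  }
  if (args.rate !== undefined && (args.rate < 0.5 || args.rate > 2.0)) {
    throw new Error("--rate must be within 0.5..2.0");
  }
  return ext as Format;
}
```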
Step 3: Run
```bash
bun scripts/main.ts \
  --provider openai \
  --model gpt-4o-mini-tts \
  --voice alloy \
  --text "..." \
  --out ./out.mp3
```

JSON mode output:

```json
{ "success": true, "provider": "openai", "model": "gpt-4o-mini-tts", "voice": "alloy", "output": "/abs/out.mp3", "size_bytes": 76032, "format": "mp3" }
```
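Callers consuming JSON mode can check the result before using `output`. The `TtsResult` shape below is inferred from the single example above; failure payloads are not documented here, so the sketch treats anything without `success: true` as an error. `parseTtsResult` is hypothetical.

```typescript
// Shape inferred from the JSON-mode example; not an official schema.
interface TtsResult {
  success: true;
  provider: string;
  model: string;
  voice: string;
  output: string; // absolute path in the example
  size_bytes: number;
  format: string;
}

function parseTtsResult(stdout: string): TtsResult {
  const data: unknown = JSON.parse(stdout);
  if (typeof data !== "object" || data === null) throw new Error("expected a JSON object");
  const r = data as Record<string, unknown>;
  if (r.success !== true) throw new Error(`synthesis failed: ${stdout}`);
  // Check every field the example shows before trusting the cast below.
  for (const key of ["provider", "model", "voice", "output", "format"]) {
    if (typeof r[key] !== "string") throw new Error(`missing string field: ${key}`);
  }
  if (typeof r.size_bytes !== "number") throw new Error("missing size_bytes");
  return r as unknown as TtsResult;
}
```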
Step 4: Long text handling
- `happy-audio-gen` automatically splits long input for providers that cap per-call length (Bailian accepts at most 200 Chinese characters per call). The chunks are concatenated byte-for-byte in the output file.
- For best fidelity with concatenated MP3s, stitch the segments with ffmpeg afterward rather than relying on byte concatenation.
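Both points can be sketched together. `chunkText` packs sentences (split after `。！？!?.` or a newline, an assumption about what "by sentence" means) into chunks of at most 200 code points, hard-splitting any single sentence that is longer; the tool's real splitter may differ. `concatListFile` and `ffmpegConcatArgs` assume each chunk was synthesized to its own file and build input for ffmpeg's concat demuxer. All three names are hypothetical.

```typescript
// A "sentence" is a run of non-terminators plus its terminators, or a trailing run.
const SENTENCE = /[^。！？!?.\n]*[。！？!?.\n]+|[^。！？!?.\n]+$/g;

function chunkText(text: string, maxChars = 200): string[] {
  const chunks: string[] = [];
  let current = "";
  const flush = (): void => {
    const trimmed = current.trim();
    if (trimmed) chunks.push(trimmed);
    current = "";
  };
  for (const sentence of text.match(SENTENCE) ?? []) {
    const chars = Array.from(sentence); // count code points, not UTF-16 units
    if (chars.length > maxChars) {
      // One over-long sentence: emit what we have, then hard-split it.
      flush();
      for (let i = 0; i < chars.length; i += maxChars) {
        current = chars.slice(i, i + maxChars).join("");
        flush();
      }
    } else if (Array.from(current).length + chars.length > maxChars) {
      flush(); // this sentence would overflow the current chunk
      current = sentence;
    } else {
      current += sentence;
    }
  }
  flush();
  return chunks;
}

// ffmpeg concat-demuxer list: one `file '<path>'` line per segment; a literal '
// is written as '\''. Relative paths resolve against the list file's directory.
function concatListFile(segments: string[]): string {
  return segments.map((p) => `file '${p.split("'").join("'\\''")}'`).join("\n") + "\n";
}

// -safe 0 permits absolute paths; -c copy joins without re-encoding, which
// assumes every segment shares the same codec and sample rate.
function ffmpegConcatArgs(listPath: string, outPath: string): string[] {
  return ["ffmpeg", "-f", "concat", "-safe", "0", "-i", listPath, "-c", "copy", outPath];
}
```

Write the list text to a file, then run the returned argument vector; dropping `-c copy` makes ffmpeg re-encode, which is slower.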
Step 5: Errors
- `[openai] OpenAI TTS 400` with `invalid voice` → the voice name is not supported by the model. Use one of `alloy`, `ash`, `coral`, `echo`, `fable`, `onyx`, `nova`, `sage`, `shimmer`.
- `[minimax] ... 2049 invalid api key` → try `MINIMAX_BASE_URL=https://api.minimaxi.com/v1` (different region).
- `[bailian] ... 400 DataInspectionFailed` → Aliyun content filter. Surface this to the user.
- `[elevenlabs] 401` → key invalid or subscription expired.
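The list above can be turned into a lookup that maps a CLI error message to its suggested fix. Only the quoted fragments come from this page; the matching rules (provider prefix plus substring) and `hintFor` are assumptions.

```typescript
// Voices listed in Step 5 for the openai invalid-voice error.
const OPENAI_VOICES = ["alloy", "ash", "coral", "echo", "fable", "onyx", "nova", "sage", "shimmer"];

interface ErrorRule {
  matches: (message: string) => boolean;
  hint: string;
}

const RULES: ErrorRule[] = [
  {
    matches: (m) => m.startsWith("[openai]") && m.includes("400") && m.includes("invalid voice"),
    hint: `Voice not supported by the model; use one of: ${OPENAI_VOICES.join(", ")}.`,
  },
  {
    matches: (m) => m.startsWith("[minimax]") && m.includes("2049"),
    hint: "Try MINIMAX_BASE_URL=https://api.minimaxi.com/v1 (different region).",
  },
  {
    matches: (m) => m.startsWith("[bailian]") && m.includes("DataInspectionFailed"),
    hint: "Aliyun content filter rejected the input; surface this to the user.",
  },
  {
    matches: (m) => m.startsWith("[elevenlabs]") && m.includes("401"),
    hint: "Key invalid or subscription expired.",
  },
];

// First matching hint, or null when the error is not in the table.
function hintFor(message: string): string | null {
  return RULES.find((r) => r.matches(message))?.hint ?? null;
}
```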
References
- `references/providers.md`: per-provider env vars, default models, voice lists.
- `references/voices.md`: curated voices for each provider.
- `references/error_codes.md`: common errors and fixes.
- `references/config/first-time-setup.md`
- `references/config/extend-schema.md`
- `assets/EXTEND.template.md`