happy-audio-gen

Turns text into speech across six providers through one CLI. Every provider call is synchronous, since TTS typically finishes in under 10 seconds. The one exception is Bailian's voice-design flow, which is still supported but uses a longer poll window.

Quick usage

```bash
# Shortest path: OpenAI default voice
bun scripts/main.ts --text "Hello, world" --out ./hello.mp3

# Chinese, MiniMax
bun scripts/main.ts --provider minimax --text "大家好" --voice male-qn-qingse --out ./hello.mp3

# Long-form, Bailian (auto-splits by sentence)
bun scripts/main.ts --provider bailian --textfiles ./script.md --out ./narration.mp3
```

When to invoke this skill

User asks to synthesize speech / TTS / read aloud / narrate / dub / make a voice-over.
User asks to convert script / text / article into audio.
User names a TTS voice or model.

Do not route here when the user wants to transcribe audio → text (that's STT, different domain), or edit / mix audio files (use a dedicated audio editor).

Step 0: Preflight (BLOCKING)

Locate EXTEND.md:
`./.happy-skills/happy-audio-gen/EXTEND.md`
`$XDG_CONFIG_HOME/happy-skills/happy-audio-gen/EXTEND.md`
`~/.happy-skills/happy-audio-gen/EXTEND.md`

If none is found, run `bun scripts/main.ts --setup` and walk the user through `references/config/first-time-setup.md`.
Verify at least one provider has credentials (env var or 1Password reference).
Verify Bun is available. Fallback: `npx -y bun`.
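
The EXTEND.md lookup can be sketched as a small pure function. This is an illustrative sketch, not the skill's actual code: `extendCandidates` and its parameters are hypothetical names, and it assumes the three locations are tried in the order listed and that the XDG entry is skipped when `$XDG_CONFIG_HOME` is unset.

```typescript
// Hypothetical helper: lists the EXTEND.md locations named in Step 0.
// Assumption: locations are tried in the order the document lists them.
const REL = "happy-skills/happy-audio-gen/EXTEND.md";

function extendCandidates(
  cwd: string,
  home: string,
  xdgConfigHome?: string,
): string[] {
  // Drop trailing slashes so joined paths never contain "//".
  const trim = (p: string): string => p.replace(/\/+$/, "");
  const paths = [`${trim(cwd)}/.${REL}`];
  // Skip the XDG entry when the variable is unset or empty.
  if (xdgConfigHome) paths.push(`${trim(xdgConfigHome)}/${REL}`);
  paths.push(`${trim(home)}/.${REL}`);
  return paths;
}

console.log(extendCandidates("/work/proj", "/home/me", "/home/me/.config"));
```

A caller would check each returned path for existence and fall back to `bun scripts/main.ts --setup` when none exists.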

Step 1: Choose provider

Preference order:

`--provider <id>`
EXTEND.md `default_provider`
Auto-detect env vars: `openai > elevenlabs > bailian > minimax > siliconflow > playht`

Pick by language / voice intent:

English, natural + fast → `openai` (gpt-4o-mini-tts / tts-1).
Multilingual, voice cloning → `elevenlabs`.
Chinese, long-form → `bailian` (qwen-tts auto-chunks long scripts) or `minimax`.
Chinese dialect / voice design → `bailian` (voice-design with qwen3-tts-vd) or `siliconflow` (CosyVoice2).
Ultra-realistic, short-form → `playht` (2.0).
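
The preference order can be read as a three-step fallback. The sketch below is one way to express it, under stated assumptions: `resolveProvider` is a hypothetical name, and the environment variable names in `KEY_VAR` are guesses for illustration only (`references/providers.md` is the authority on the real names).

```typescript
type Provider =
  | "openai" | "elevenlabs" | "bailian"
  | "minimax" | "siliconflow" | "playht";

// Auto-detect priority, copied from the order given in Step 1.
const AUTO_DETECT_ORDER: Provider[] = [
  "openai", "elevenlabs", "bailian", "minimax", "siliconflow", "playht",
];

// ASSUMED env var names; see references/providers.md for the real ones.
const KEY_VAR: Record<Provider, string> = {
  openai: "OPENAI_API_KEY",
  elevenlabs: "ELEVENLABS_API_KEY",
  bailian: "DASHSCOPE_API_KEY",
  minimax: "MINIMAX_API_KEY",
  siliconflow: "SILICONFLOW_API_KEY",
  playht: "PLAYHT_API_KEY",
};

function resolveProvider(
  flag: Provider | undefined,
  extendDefault: Provider | undefined,
  env: Record<string, string | undefined>,
): Provider | undefined {
  if (flag) return flag;                   // 1. --provider <id>
  if (extendDefault) return extendDefault; // 2. EXTEND.md default_provider
  // 3. First provider in priority order whose key variable is set.
  return AUTO_DETECT_ORDER.find((p) => Boolean(env[KEY_VAR[p]]));
}

console.log(resolveProvider(undefined, undefined, { MINIMAX_API_KEY: "x" }));
```

Returning `undefined` when nothing matches leaves room for the caller to apply the language / voice-intent guidance or to prompt the user.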

Step 2: Fill parameters

`--text` or `--textfiles`: input. Always quote.
`--out <path>`: REQUIRED. Extension determines format (`.mp3` / `.wav` / `.ogg` / `.flac`).
`--voice <id>`: provider-specific. See `references/voices.md` for the short list of well-known voices.
`--rate 0.5..2.0`: speaking rate.
`--instruction "..."`: voice direction (only `openai` gpt-4o-mini-tts and `siliconflow` honor this).
`--language <code>`: `en`, `zh`, `ja`. Only a few providers honor this explicitly.
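
Two of these parameters carry constraints that are easy to check before invoking the CLI. The following is a minimal sketch, not the CLI's own validation: `formatFromOut` and `checkRate` are hypothetical helpers, and treating the extension as case-insensitive is an assumption.

```typescript
// Formats listed in Step 2; the extension of --out selects one of them.
const FORMATS = ["mp3", "wav", "ogg", "flac"] as const;
type AudioFormat = (typeof FORMATS)[number];

function formatFromOut(outPath: string): AudioFormat {
  // Assumption: extension matching is case-insensitive.
  const ext = outPath.slice(outPath.lastIndexOf(".") + 1).toLowerCase();
  const match = FORMATS.find((f) => f === ext);
  if (!match) throw new Error(`unsupported output extension: ${outPath}`);
  return match;
}

function checkRate(rate: number): number {
  // Written as a negated range test so that NaN is rejected too.
  if (!(rate >= 0.5 && rate <= 2.0)) {
    throw new Error(`--rate must be within 0.5..2.0, got ${rate}`);
  }
  return rate;
}

console.log(formatFromOut("./hello.mp3"), checkRate(1.25));
```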

Step 3: Run

```bash
bun scripts/main.ts \
  --provider openai \
  --model gpt-4o-mini-tts \
  --voice alloy \
  --text "..." \
  --out ./out.mp3
```

JSON mode output:

```json
{ "success": true, "provider": "openai", "model": "gpt-4o-mini-tts", "voice": "alloy", "output": "/abs/out.mp3", "size_bytes": 76032, "format": "mp3" }
```
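
A caller consuming the JSON-mode result might validate it before using the output path. This sketch takes its field names from the sample above; `parseResult` is a hypothetical helper, and the shape of a failed run is not documented here, so the checks are assumptions.

```typescript
// Shape of the JSON-mode result, taken from the sample in Step 3.
interface TtsResult {
  success: boolean;
  provider: string;
  model: string;
  voice: string;
  output: string;
  size_bytes: number;
  format: string;
}

// Hypothetical helper: parse CLI stdout and reject failed or empty results.
function parseResult(stdout: string): TtsResult {
  const r = JSON.parse(stdout) as Partial<TtsResult>;
  // Assumption: a failed run reports success !== true or omits "output".
  if (r.success !== true || typeof r.output !== "string") {
    throw new Error("TTS run did not report success");
  }
  if (typeof r.size_bytes !== "number" || r.size_bytes <= 0) {
    throw new Error("TTS run produced an empty file");
  }
  return r as TtsResult;
}

const sample =
  '{ "success": true, "provider": "openai", "model": "gpt-4o-mini-tts", ' +
  '"voice": "alloy", "output": "/abs/out.mp3", "size_bytes": 76032, "format": "mp3" }';
console.log(parseResult(sample).output);
```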

Step 4: Long text handling

`happy-audio-gen` automatically splits long input for providers that cap per-call length (Bailian ≤ 200 Chinese chars per call). Chunks are concatenated byte-for-byte on output.
For best fidelity with concatenated MP3s, stitch the segments with ffmpeg afterward rather than relying on byte concat.
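
Sentence-based splitting under a per-call cap can be sketched as follows. This is not the tool's actual splitter: `splitBySentence` is a hypothetical name, the set of sentence terminators is an assumption, and a single sentence longer than the limit is simply hard-split.

```typescript
// Sketch: pack whole sentences into chunks of at most `limit` characters.
function splitBySentence(text: string, limit = 200): string[] {
  // A sentence is a run of text plus its trailing terminators, if any.
  const sentences = text.match(/[^。！？!?.\n]+[。！？!?.\n]*/g) ?? [];
  const chunks: string[] = [];
  let current = "";
  for (const s of sentences) {
    // Hard-split a sentence that exceeds the limit on its own.
    const chars = Array.from(s); // count code points, not UTF-16 units
    const pieces: string[] = [];
    for (let i = 0; i < chars.length; i += limit) {
      pieces.push(chars.slice(i, i + limit).join(""));
    }
    for (const piece of pieces) {
      // Start a new chunk when adding this piece would pass the limit.
      if (Array.from(current + piece).length > limit) {
        chunks.push(current);
        current = "";
      }
      current += piece;
    }
  }
  if (current.trim() !== "") chunks.push(current);
  return chunks;
}

const parts = splitBySentence("今天天气很好。".repeat(40));
console.log(parts.map((p) => p.length));
```

Each chunk would then be synthesized in its own call, with the resulting segments joined afterward as described above.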

Step 5: Errors

`[openai] OpenAI TTS 400` with `invalid voice` → the voice name is not supported by the model. Use one of `alloy`, `ash`, `coral`, `echo`, `fable`, `onyx`, `nova`, `sage`, `shimmer`.
`[minimax] ... 2049 invalid api key` → try `MINIMAX_BASE_URL=https://api.minimaxi.com/v1` (different region).
`[bailian] ... 400 DataInspectionFailed` → Aliyun content filter. Surface to the user.
`[elevenlabs] 401` → key invalid or subscription expired.
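
The error-to-fix mapping lends itself to a small lookup table. The sketch below is illustrative only: `hintFor` is a hypothetical helper, and the patterns assume each error arrives as a single line containing the tokens shown above, which may not match the CLI's exact wording.

```typescript
// Each rule pairs a pattern for a CLI error line with the fix from Step 5.
// Assumption: provider tag, status code, and message share one line.
const HINTS: Array<[RegExp, string]> = [
  [
    /\[openai\].*400.*invalid voice/i,
    "Use a supported voice: alloy, ash, coral, echo, fable, onyx, nova, sage, shimmer.",
  ],
  [
    /\[minimax\].*2049/,
    "Try MINIMAX_BASE_URL=https://api.minimaxi.com/v1 (different region).",
  ],
  [/\[bailian\].*DataInspectionFailed/, "Aliyun content filter; surface to the user."],
  [/\[elevenlabs\].*401/, "Key invalid or subscription expired."],
];

function hintFor(errorLine: string): string | undefined {
  const rule = HINTS.find(([pattern]) => pattern.test(errorLine));
  return rule ? rule[1] : undefined;
}

console.log(hintFor("[elevenlabs] 401"));
```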

References

`references/providers.md`: per-provider env vars, default models, voice lists.
`references/voices.md`: curated voices for each provider.
`references/error_codes.md`: common errors and fixes.
`references/config/first-time-setup.md`
`references/config/extend-schema.md`
`assets/EXTEND.template.md`
