happy-audio-gen

Turns text into speech across 6 providers through one CLI. All providers run synchronously (TTS is fast, typically under 10 seconds) except Bailian's voice-design flow, which is still supported but polls over a longer window.

Quick usage

bash
# Shortest path: OpenAI default voice
bun scripts/main.ts --text "Hello, world" --out ./hello.mp3

# Chinese, MiniMax
bun scripts/main.ts --provider minimax --text "大家好" --voice male-qn-qingse --out ./hello.mp3

# Long-form, Bailian (auto-splits by sentence)
bun scripts/main.ts --provider bailian --textfiles ./script.md --out ./narration.mp3

When to invoke this skill

  • User asks to synthesize speech / TTS / read aloud / narrate / dub / make a voice-over.
  • User asks to convert script / text / article into audio.
  • User names a TTS voice or model.
Do not route here when the user wants to transcribe audio → text (that's STT, different domain), or edit / mix audio files (use a dedicated audio editor).

Step 0: Preflight (BLOCKING)

  1. Locate EXTEND.md:
    • ./.happy-skills/happy-audio-gen/EXTEND.md
    • $XDG_CONFIG_HOME/happy-skills/happy-audio-gen/EXTEND.md
    • ~/.happy-skills/happy-audio-gen/EXTEND.md
    If none found, run bun scripts/main.ts --setup and walk the user through references/config/first-time-setup.md.
  2. Verify at least one provider has credentials (env var or 1Password reference).
  3. Verify Bun is available. Fallback: npx -y bun.
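
The EXTEND.md lookup in item 1 can be sketched as below. This is a minimal illustration, not the skill's actual code: the helper names `extendCandidates` and `findExtend` are hypothetical, and it assumes the three locations are checked in the order listed and that the `$XDG_CONFIG_HOME` entry is skipped when the variable is unset.

```typescript
import { existsSync } from "node:fs";
import { join } from "node:path";

// Shared tail of all three candidate paths.
const SKILL_FILE = join("happy-audio-gen", "EXTEND.md");

// Hypothetical helper: builds the candidate list in the order the step lists it.
export function extendCandidates(
  cwd: string,
  home: string,
  xdgConfigHome?: string,
): string[] {
  const paths = [join(cwd, ".happy-skills", SKILL_FILE)];
  // Assumption: no fallback to ~/.config when XDG_CONFIG_HOME is unset.
  if (xdgConfigHome) {
    paths.push(join(xdgConfigHome, "happy-skills", SKILL_FILE));
  }
  paths.push(join(home, ".happy-skills", SKILL_FILE));
  return paths;
}

// Returns the first existing candidate, or null when setup is still needed.
export function findExtend(
  paths: string[],
  exists: (p: string) => boolean = existsSync,
): string | null {
  return paths.find((p) => exists(p)) ?? null;
}
```

A caller might pass `process.cwd()`, `os.homedir()`, and `process.env.XDG_CONFIG_HOME`, then run the `--setup` flow when the result is `null`.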

Step 1: Choose provider

Preference order:
  1. --provider <id>
  2. EXTEND.md default_provider
  3. Auto-detect env vars: openai > elevenlabs > bailian > minimax > siliconflow > playht
Pick by language / voice intent:
  • English, natural + fast: openai (gpt-4o-mini-tts / tts-1).
  • Multilingual, voice cloning: elevenlabs.
  • Chinese, long-form: bailian (qwen-tts auto-chunks long scripts) or minimax.
  • Chinese dialect / voice design: bailian (voice-design with qwen3-tts-vd) or siliconflow (CosyVoice2).
  • Ultra-realistic, short-form: playht (2.0).
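
The preference order above amounts to a three-level fallback. The sketch below is one way to express it, assuming a caller-supplied `hasCredentials` probe; that probe, the `ResolveInput` shape, and `resolveProvider` are hypothetical names, since the per-provider env var names live in references/providers.md rather than here.

```typescript
type ProviderId =
  | "openai"
  | "elevenlabs"
  | "bailian"
  | "minimax"
  | "siliconflow"
  | "playht";

// Auto-detect order exactly as listed in the step.
const AUTO_DETECT_ORDER: readonly ProviderId[] = [
  "openai",
  "elevenlabs",
  "bailian",
  "minimax",
  "siliconflow",
  "playht",
];

interface ResolveInput {
  flag?: ProviderId; // value of --provider, if given
  defaultProvider?: ProviderId; // default_provider from EXTEND.md, if set
  hasCredentials: (id: ProviderId) => boolean; // hypothetical credential probe
}

// Returns the chosen provider, or null when nothing is configured.
export function resolveProvider(input: ResolveInput): ProviderId | null {
  if (input.flag) return input.flag;
  if (input.defaultProvider) return input.defaultProvider;
  return AUTO_DETECT_ORDER.find((id) => input.hasCredentials(id)) ?? null;
}
```

Keeping the probe injectable means the ordering logic can be checked without touching real environment variables.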

Step 2: Fill parameters

  • --text or --textfiles: input. Always quote.
  • --out <path>: REQUIRED. Extension determines format (.mp3 / .wav / .ogg / .flac).
  • --voice <id>: provider-specific. See references/voices.md for the short list of well-known voices.
  • --rate 0.5..2.0: speaking rate.
  • --instruction "...": voice direction (only openai gpt-4o-mini-tts and siliconflow honor this).
  • --language <code>: en, zh, ja; only a few providers honor this explicitly.
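
Two of the parameters above carry simple constraints that can be checked before any provider call. The following sketch shows one possible pre-check; `formatFromPath` and `isValidRate` are hypothetical helpers, and both the case-insensitive extension match and the inclusive rate bounds are assumptions.

```typescript
const FORMATS = ["mp3", "wav", "ogg", "flac"] as const;
type AudioFormat = (typeof FORMATS)[number];

// Derives the output format from the --out extension.
// Assumption: matching is case-insensitive (".MP3" is accepted).
export function formatFromPath(outPath: string): AudioFormat {
  const dot = outPath.lastIndexOf(".");
  const ext = dot >= 0 ? outPath.slice(dot + 1).toLowerCase() : "";
  if ((FORMATS as readonly string[]).includes(ext)) {
    return ext as AudioFormat;
  }
  throw new Error(`Unsupported output extension: "${ext}"`);
}

// Checks --rate against the documented 0.5..2.0 range.
// Assumption: both bounds are inclusive.
export function isValidRate(rate: number): boolean {
  return Number.isFinite(rate) && rate >= 0.5 && rate <= 2.0;
}
```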

Step 3: Run

bash
bun scripts/main.ts \
  --provider openai \
  --model gpt-4o-mini-tts \
  --voice alloy \
  --text "..." \
  --out ./out.mp3
JSON mode:
json
{ "success": true, "provider": "openai", "model": "gpt-4o-mini-tts", "voice": "alloy", "output": "/abs/out.mp3", "size_bytes": 76032, "format": "mp3" }
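
A caller that consumes the JSON output might type and validate it as follows. The field names are taken from the sample above; `TtsResult` and `parseResult` are hypothetical names, and the shape of a failed run is not documented here, so this sketch simply rejects anything that does not report `success: true`.

```typescript
// Field names mirror the sample JSON output.
interface TtsResult {
  success: boolean;
  provider: string;
  model: string;
  voice: string;
  output: string; // absolute path of the written file
  size_bytes: number;
  format: string;
}

// Parses stdout from a JSON-mode run and checks the fields callers rely on.
export function parseResult(stdout: string): TtsResult {
  const data = JSON.parse(stdout) as Partial<TtsResult>;
  if (data.success !== true) {
    throw new Error("TTS run did not report success");
  }
  if (typeof data.output !== "string" || typeof data.size_bytes !== "number") {
    throw new Error("Result is missing output or size_bytes");
  }
  return data as TtsResult;
}
```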

Step 4: Long text handling

  • happy-audio-gen automatically splits long input for providers that cap per-call length (Bailian ≤ 200 Chinese chars per call). Chunks are concatenated byte-for-byte on output.
  • For best fidelity with concatenated MP3s, stitch the segments with ffmpeg afterward rather than relying on byte concat.
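
To make the splitting behavior concrete, here is a rough sketch of sentence-based chunking under a per-call cap. It is not the skill's actual splitter: `splitBySentence` is a hypothetical name, the set of sentence terminators is an assumption, and length is counted in Unicode code points, which may differ from how Bailian counts characters.

```typescript
// Splits text at sentence boundaries and packs sentences into chunks of at
// most maxChars code points. A sentence longer than the cap is hard-split.
export function splitBySentence(text: string, maxChars = 200): string[] {
  // A sentence is a run of non-terminators plus its trailing punctuation.
  // Note: "." also splits decimals such as "3.14"; acceptable for a sketch.
  const sentences = text.match(/[^。！？!?.]+[。！？!?.]*|[。！？!?.]+/g) ?? [];
  const chunks: string[] = [];
  let current = "";
  for (const sentence of sentences) {
    const chars = Array.from(sentence); // iterate by code point
    for (let i = 0; i < chars.length; i += maxChars) {
      const piece = chars.slice(i, i + maxChars).join("");
      const combined = Array.from(current).length + Array.from(piece).length;
      if (combined > maxChars) {
        // Adding this piece would exceed the cap, so close the current chunk.
        chunks.push(current);
        current = "";
      }
      current += piece;
    }
  }
  if (current) chunks.push(current);
  return chunks;
}
```

Because no characters are dropped, joining the chunks reproduces the input, and each chunk can be sent as its own provider call. The resulting audio segments are what the ffmpeg advice above applies to.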

Step 5: Errors

  • [openai] OpenAI TTS 400 with invalid voice → the voice name is not supported by the model. Use one of alloy, ash, coral, echo, fable, onyx, nova, sage, shimmer.
  • [minimax] ... 2049 invalid api key → try MINIMAX_BASE_URL=https://api.minimaxi.com/v1 (different region).
  • [bailian] ... 400 DataInspectionFailed → Aliyun content filter. Surface to the user.
  • [elevenlabs] 401 → key invalid or subscription expired.
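
The error table above can be turned into a small lookup that maps a CLI error line to the suggested fix. This is an illustrative sketch only: `hintFor` and the `HINTS` table are hypothetical, and the regular expressions assume error lines start with a `[provider]` tag and contain the fragments shown above; the exact message format is not specified here.

```typescript
interface ErrorHint {
  provider: string;
  pattern: RegExp; // fragment expected somewhere in the error line
  hint: string;
}

// One entry per row of the error list in this step.
const HINTS: ErrorHint[] = [
  {
    provider: "openai",
    pattern: /400.*invalid voice/i,
    hint: "Use one of alloy, ash, coral, echo, fable, onyx, nova, sage, shimmer.",
  },
  {
    provider: "minimax",
    pattern: /2049 invalid api key/i,
    hint: "Try MINIMAX_BASE_URL=https://api.minimaxi.com/v1 (different region).",
  },
  {
    provider: "bailian",
    pattern: /400 DataInspectionFailed/,
    hint: "Aliyun content filter. Surface to the user.",
  },
  {
    provider: "elevenlabs",
    pattern: /401/,
    hint: "Key invalid or subscription expired.",
  },
];

// Returns the matching hint, or null for unrecognized errors.
export function hintFor(message: string): string | null {
  const tag = /^\[(\w+)\]/.exec(message);
  if (!tag) return null;
  const found = HINTS.find(
    (h) => h.provider === tag[1] && h.pattern.test(message),
  );
  return found ? found.hint : null;
}
```

Unrecognized errors return `null` so the caller can fall back to references/error_codes.md.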

References

  • references/providers.md: per-provider env vars, default models, voice lists.
  • references/voices.md: curated voices for each provider.
  • references/error_codes.md: common errors and fixes.
  • references/config/first-time-setup.md
  • references/config/extend-schema.md
  • assets/EXTEND.template.md