minimax-tts-pipeline

MiniMax TTS Pronunciation Control

Process a text file step by step to resolve pronunciation issues, then call the MiniMax TTS API to generate audio.

Input

| Parameter | Required | Description |
| --- | --- | --- |
| Text file path | Yes | Absolute path of the .txt file to process |
| Output directory | No | Defaults to a `tts-{YYYYMMDD-HHMMSS}/` directory created in the same directory as the input file |

User Pronunciation Rule Management

When the user asks to add, query, delete, or modify pronunciation rules (e.g., "Qwen is pronounced as Qianwen", "show me the existing rules", "delete the rule for Qwen"), read `<SKILL_DIR>/references/manage-user-rules.md` and `<SKILL_DIR>/references/pronunciation-rules.md`, then follow their guidance to update `<SKILL_DIR>/user-rules.json`.

Workflow

input file (.txt) → input.raw.txt → [Script] normalize_punctuation.py → input.txt
         → [Script] scan_terms.py → terms.json(draft)
         → [Subagent 1] Complete normalization → terms.json
         → [Script] validate + generate_normalized.py → normalized.txt
         → [Subagent 2] Complete readings + identify polyphonic characters → terms.json
         → [Script] validate
         → [Subagent 3] Review → terms.json(review.pass)
         → [Script] validate + call_tts.py → output.wav + output.title
         → [Script] title_to_srt.py → output.srt

`<SKILL_DIR>` denotes the absolute path of this skill's directory. `<run_dir>` denotes the absolute path of the current run's output directory, i.e. the full path of the `tts-{YYYYMMDD-HHMMSS}/` directory created in Step 0.

Step -1: Environment Pre-check

Before starting any processing, check the runtime environment and the MiniMax API key, in that order.

Python and dependency check:

- Run `python3 --version` and confirm Python >= 3.10. If Python is missing or the version is too low, ask the user to install it and retry, then stop the process.
- Run `python3 -c "import requests"` to confirm that the `requests` library is installed. If it is not, ask the user to run `pip3 install requests` (or `pip install requests`) and retry, then stop the process.

API key check:

- Check whether `<SKILL_DIR>/.env` (the `.env` file in the same directory as SKILL.md) exists. If it does not, create an empty `.env` file.
- Read the `.env` file and check whether `MINIMAX_API_KEY` is present with a non-empty value.
  - If it is configured, proceed to the next step.
  - If it is not configured, ask the user for the MiniMax API key. Once the user provides it, append `MINIMAX_API_KEY=<value provided by user>` to the `<SKILL_DIR>/.env` file, then proceed.
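
The checks above can be sketched in Python roughly as follows. This is a minimal illustration, not the skill's own code: the helper names (`python_ok`, `module_installed`, `read_env_key`, `append_env_key`) are hypothetical, and the `.env` parsing assumes plain `KEY=VALUE` lines without quoting.

```python
import importlib.util
import sys
from pathlib import Path


def python_ok(min_version=(3, 10)):
    """Return True if the running interpreter meets the minimum version."""
    return sys.version_info >= min_version


def module_installed(name):
    """Return True if `name` is importable, without actually importing it."""
    return importlib.util.find_spec(name) is not None


def read_env_key(env_path, key="MINIMAX_API_KEY"):
    """Return the non-empty value of `key` from a KEY=VALUE file, else None.

    Creates an empty file when `env_path` does not exist yet.
    Assumption: values are not quoted and contain no inline comments.
    """
    path = Path(env_path)
    if not path.exists():
        path.touch()
        return None
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key and value.strip():
            return value.strip()
    return None


def append_env_key(env_path, value, key="MINIMAX_API_KEY"):
    """Append KEY=VALUE on its own line, adding a newline first if needed."""
    path = Path(env_path)
    existing = path.read_text(encoding="utf-8") if path.exists() else ""
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    with path.open("a", encoding="utf-8") as fh:
        fh.write(f"{prefix}{key}={value}\n")
```

Using `importlib.util.find_spec` avoids importing `requests` just to test for it; the shell command in the step achieves the same result by attempting the import.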

Step 0: Initialize Running Directory

- Obtain the text file path from the user's input.
- Create the `<input_dir>/tts-{YYYYMMDD-HHMMSS}/` directory, where `<input_dir>` is the directory containing the input file. Unless the user explicitly specifies an output directory, do not fall back to the current working directory or the skill project directory.
  - If the input file's directory cannot be written to because of sandbox or permission restrictions, request the user's authorization first; use another directory only after the user explicitly agrees.
- Copy the input file to `<run_dir>/input.raw.txt`.
- Normalize punctuation: `python3 <SKILL_DIR>/scripts/normalize_punctuation.py <run_dir>/input.raw.txt <run_dir>/input.txt`
- Scan for candidate terms: `python3 <SKILL_DIR>/scripts/scan_terms.py <run_dir>/input.txt <run_dir>/terms.json`
- Proceed to Step 1.
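
As an illustration of the directory rules in this step, the sketch below creates the run directory next to the input file and copies the input into it. The function name `init_run_dir` and its parameters are hypothetical. It also assumes that a user-specified output directory is used directly as the run directory, which the text above does not spell out.

```python
import shutil
from datetime import datetime
from pathlib import Path


def init_run_dir(input_file, output_dir=None, now=None):
    """Create the run directory and copy the input into it as input.raw.txt.

    `output_dir` stands for a directory the user explicitly chose (assumed
    here to be used as-is). When it is None, the run directory is created
    beside the input file, never in the current working directory.
    """
    src = Path(input_file).resolve()
    if src.suffix.lower() != ".txt" or not src.is_file():
        raise ValueError(f"expected an existing .txt file, got {src}")
    if output_dir is not None:
        run_dir = Path(output_dir).resolve()
        run_dir.mkdir(parents=True, exist_ok=True)
    else:
        stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
        run_dir = src.parent / f"tts-{stamp}"
        # exist_ok=False: a name clash within the same second fails loudly.
        run_dir.mkdir(exist_ok=False)
    shutil.copyfile(src, run_dir / "input.raw.txt")
    return run_dir
```

A `PermissionError` from `mkdir` is deliberately not caught: per the rule above, the caller should ask the user before switching to another location rather than silently falling back.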

Step 1: Case Normalization Judgment

Replace `<SKILL_DIR>` and `<run_dir>` with actual absolute paths, then send the following prompt to the subagent:

Please read the following files first, then perform the task.

Required Files (Read in Order)

- Operation guide: `<SKILL_DIR>/references/step-1-normalize.md`
- Pronunciation rule reference: `<SKILL_DIR>/references/pronunciation-rules.md`
- User-defined rules: `<SKILL_DIR>/user-rules.json` (skip if the file does not exist)
- Original text: `<run_dir>/input.txt`
- Candidate terms: `<run_dir>/terms.json`

Task

Process the `normalized`, `category`, and `reason` fields of each term in terms.json according to the rules in the operation guide.

Output

Modify and save `<run_dir>/terms.json` in place (do not create a new file).

Validation

After making the changes, run `python3 <SKILL_DIR>/scripts/validate_terms.py <run_dir>/terms.json 1`. If validation fails, correct terms.json according to the errors list and re-validate until it passes.

Follow-up

After validation passes, run `python3 <SKILL_DIR>/scripts/generate_normalized.py <run_dir>/input.txt <run_dir>/terms.json <run_dir>/normalized.txt`.
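
The placeholder substitution required before sending the prompt could look like the sketch below. The function name `fill_placeholders` is hypothetical; only the two placeholder strings come from this document.

```python
from pathlib import Path


def fill_placeholders(template, skill_dir, run_dir):
    """Substitute <SKILL_DIR> and <run_dir> with absolute paths."""
    skill = Path(skill_dir)
    run = Path(run_dir)
    # The step asks for absolute paths, so reject relative ones early.
    if not (skill.is_absolute() and run.is_absolute()):
        raise ValueError("both paths must be absolute")
    return (
        template
        .replace("<SKILL_DIR>", str(skill))
        .replace("<run_dir>", str(run))
    )
```

The same helper would apply unchanged to the prompts in Steps 2 and 3, since they use the same two placeholders.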

Step 2: Pronunciation Judgment

Replace `<SKILL_DIR>` and `<run_dir>` with actual absolute paths, then send the following prompt to the subagent:

Please read the following files first, then perform the task.

Required Files (Read in Order)

- Operation guide: `<SKILL_DIR>/references/step-2-reading.md`
- Pronunciation rule reference: `<SKILL_DIR>/references/pronunciation-rules.md`
- User-defined rules: `<SKILL_DIR>/user-rules.json` (skip if the file does not exist)
- Original text: `<run_dir>/input.txt`
- Normalized text: `<run_dir>/normalized.txt`
- Candidate terms: `<run_dir>/terms.json`

Task

Process the `reading` and `category` fields of each term in terms.json according to the rules in the operation guide, and identify any polyphonic characters in the original text that were missed.

Output

Modify and save `<run_dir>/terms.json` in place (do not create a new file).

Validation

After making the changes, run `python3 <SKILL_DIR>/scripts/validate_terms.py <run_dir>/terms.json 2`. If validation fails, correct terms.json according to the errors list and re-validate until it passes.
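
The validate-and-fix cycle in the Validation paragraph can be pictured as a small loop. In this sketch `validate` and `fix` are hypothetical callables standing in for running `validate_terms.py` and for the subagent editing terms.json; the shape of the errors list (a list of strings) and the `max_rounds` bound are assumptions added for illustration, since the text itself only says to repeat until validation passes.

```python
def validate_until_pass(validate, fix, max_rounds=3):
    """Alternate validate/fix until validation passes or rounds run out.

    validate() -> list of error strings (an empty list means it passed)
    fix(errors) -> None, expected to edit terms.json in place
    Returns True on success, False if errors remain after max_rounds fixes.
    """
    for _ in range(max_rounds):
        errors = validate()
        if not errors:
            return True
        fix(errors)
    # One final check after the last fix attempt.
    return not validate()
```

Bounding the number of rounds is a design choice of this sketch: it gives the orchestrator a clear point at which to stop the pipeline instead of looping indefinitely.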

Step 3: Quality Review

Replace `<SKILL_DIR>` and `<run_dir>` with actual absolute paths, then send the following prompt to the subagent:

Please read the following files first, then perform the task.

Required Files (Read in Order)

- Operation guide: `<SKILL_DIR>/references/step-3-review.md`
- Pronunciation rule reference: `<SKILL_DIR>/references/pronunciation-rules.md`
- User-defined rules: `<SKILL_DIR>/user-rules.json` (skip if the file does not exist)
- Original text: `<run_dir>/input.txt`
- Normalized text: `<run_dir>/normalized.txt`
- Complete candidate terms: `<run_dir>/terms.json`

Task

Perform a final quality review of terms.json according to the check items in the operation guide.

Output

Modify and save `<run_dir>/terms.json` in place (do not create a new file).

Validation

After making the changes, run `python3 <SKILL_DIR>/scripts/validate_terms.py <run_dir>/terms.json 3`. If validation fails, correct terms.json according to the errors list and re-validate until it passes.
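
Steps 1 to 3 all invoke the same validator with a different stage number. A small helper such as the hypothetical `build_validate_cmd` below could assemble that command line; the argument order mirrors the commands shown in this document, and everything else is illustrative.

```python
from pathlib import Path

VALID_STAGES = (1, 2, 3)


def build_validate_cmd(skill_dir, run_dir, stage):
    """Return the argv list for validating terms.json at the given stage."""
    if stage not in VALID_STAGES:
        raise ValueError(f"stage must be one of {VALID_STAGES}, got {stage!r}")
    script = Path(skill_dir) / "scripts" / "validate_terms.py"
    terms = Path(run_dir) / "terms.json"
    return ["python3", str(script), str(terms), str(stage)]
```

The resulting list can be passed to `subprocess.run(cmd, capture_output=True, text=True)`. How `validate_terms.py` signals failure (exit code, printed errors list, or both) is not specified here, so check the script before relying on either.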

Step 4: Generate Audio and Subtitle JSON

Call the MiniMax TTS API:

`python3 <SKILL_DIR>/scripts/call_tts.py <run_dir>/normalized.txt <run_dir>/terms.json <run_dir>/output.wav <run_dir>/output.title`

This step will:

- Generate the WAV audio and save it to `<run_dir>/output.wav`.
- Download the subtitle JSON returned by MiniMax and save it to `<run_dir>/output.title`.

Step 5: Generate SRT Subtitles

Generate SRT subtitles from the MiniMax subtitle JSON and the WAV audio produced in Step 4:

`python3 <SKILL_DIR>/scripts/title_to_srt.py <run_dir>/output.title <run_dir>/output.wav <run_dir>/output.srt`

Report the results to the user:

- Audio file path
- MiniMax subtitle JSON file path
- SRT subtitle file path
- Number of tone rules used
- Number of text replacements made
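
For readers unfamiliar with the SRT format, the sketch below shows how timed cues map onto it. It does not reproduce `title_to_srt.py`: the structure of `output.title` is not described in this document, so the input here is assumed to be already reduced to `(start_ms, end_ms, text)` tuples, and both function names are hypothetical.

```python
def srt_timestamp(ms):
    """Format a millisecond offset as an SRT timestamp, HH:MM:SS,mmm."""
    if ms < 0:
        raise ValueError("timestamp must be non-negative")
    hours, rem = divmod(int(ms), 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def cues_to_srt(cues):
    """Render (start_ms, end_ms, text) tuples as the body of an SRT file."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        blocks.append(
            f"{index}\n{srt_timestamp(start)} --> {srt_timestamp(end)}\n{text}\n"
        )
    # Cues are separated by a single blank line.
    return "\n".join(blocks)
```

For example, `srt_timestamp(3_723_004)` returns `01:02:03,004`.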

Saved Files

tts-YYYYMMDD-HHMMSS/
  input.raw.txt    # Original input (read-only)
  input.txt        # Input after punctuation normalization (read-only)
  terms.json       # The single structured working file for the whole pipeline
  normalized.txt   # Normalized text
  output.wav       # Audio output from MiniMax TTS
  output.title     # Character-level timestamp subtitle JSON returned by MiniMax
  output.srt       # SRT subtitles generated from output.title + output.wav
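
A quick completeness check against the listing above could be written as follows. The file names are taken from the listing; the constant `EXPECTED_FILES` and the function `missing_outputs` are hypothetical names introduced for this sketch.

```python
from pathlib import Path

# File names as listed in the run directory layout above.
EXPECTED_FILES = (
    "input.raw.txt",
    "input.txt",
    "terms.json",
    "normalized.txt",
    "output.wav",
    "output.title",
    "output.srt",
)


def missing_outputs(run_dir):
    """Return the expected files that are absent from run_dir, in order."""
    run = Path(run_dir)
    return [name for name in EXPECTED_FILES if not (run / name).is_file()]
```

An empty return value means every stage left its file behind; a non-empty one shows where the pipeline stopped.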

Constraints

- Maintain a single terms.json throughout the pipeline; every subagent modifies this same file directly.
- The LLM modifies only terms.json; it never edits normalized.txt or input.txt directly.
- Text replacement, tone generation, and API calls are all performed by scripts.
- If validation fails at any stage, stop; do not continue to later stages.
- `MINIMAX_API_KEY` is read from the `<SKILL_DIR>/.env` file.

Resources

scripts/

- `normalize_punctuation.py <input> <output>`: Stage 0. Adds a period where a line break follows text with no sentence-final punctuation.
- `scan_terms.py`: Stage 0. Extracts candidate terms from the original text and generates a draft terms.json.
- `validate_terms.py <terms_json> <stage>`: Stages 1/2/3. Validates the terms.json schema.
- `generate_normalized.py <input> <terms> <output>`: After Stage 1. Generates the normalized text from terms.json.
- `call_tts.py <normalized> <terms> <output_wav> [output_title]`: Stage 4. Calls the MiniMax TTS API to generate WAV audio and download the subtitle JSON.
- `title_to_srt.py <input_title> <input_wav> [output_srt]`: Stage 5. Generates SRT subtitles from the MiniMax subtitle JSON and WAV audio.

references/

- `pronunciation-rules.md`: Quick reference for pronunciation rules (category enumeration, reading format, key constraints).
- `manage-user-rules.md`: Guide to managing user pronunciation rules (loaded on demand).
- `api-voice-settings.md`: Explains the `voice_id`, `speed`, `vol`, and `pitch` parameters in MiniMax API requests and where to change them.
- `step-1-normalize.md`: Operation guide for Step 1, case normalization judgment.
- `step-2-reading.md`: Operation guide for Step 2, pronunciation judgment and polyphonic character identification.
- `step-3-review.md`: Operation guide for Step 3, quality review.

Other Files

- `user-rules.json`: User-defined pronunciation rules (maintained by the agent through dialogue, consumed by each step).
- `.env`: Stores the MiniMax API key.

API Voice Parameter Modification

If the user asks about or wants to change the voice, speed, volume, or pitch parameters (`voice_id`, `speed`, `vol`, `pitch`) in the MiniMax TTS API request, read `<SKILL_DIR>/references/api-voice-settings.md` first. These parameters must be changed directly in the payload in `<SKILL_DIR>/scripts/call_tts.py`.