minimax-tts-pipeline
MiniMax TTS Pronunciation Control
Processes a text file step by step to resolve pronunciation issues, then calls the MiniMax TTS API to generate the audio.
Input
| Parameter | Required | Description |
|---|---|---|
| Text file path | Yes | Absolute path of the .txt file to process |
| Output directory | No | Defaults to a new directory created alongside the input file |
User Pronunciation Rule Management
When the user asks to add, query, delete, or modify pronunciation rules (e.g., "Qwen is pronounced 千问 (Qianwen)", "Show me the existing rules", "Delete the rule for Qwen"), read `<SKILL_DIR>/references/manage-user-rules.md` and `<SKILL_DIR>/references/pronunciation-rules.md`, then follow those guidelines to operate on `<SKILL_DIR>/user-rules.json`.
Workflow
```
input file (.txt) → input.raw.txt → [Script] normalize_punctuation.py → input.txt
  → [Script] scan_terms.py → terms.json (draft)
  → [Subagent 1] Fill in normalization → terms.json
  → [Script] validate + generate_normalized.py → normalized.txt
  → [Subagent 2] Fill in readings + detect polyphonic characters → terms.json
  → [Script] validate
  → [Subagent 3] Review → terms.json (review.pass)
  → [Script] validate + call_tts.py → output.wav + output.title
  → [Script] title_to_srt.py → output.srt
```
`<SKILL_DIR>` denotes the absolute path of this skill directory.
`<run_dir>` denotes the absolute path of the current run's output directory, i.e., the full path of the `tts-{YYYYMMDD-HHMMSS}/` directory created in Step 0.
Step -1: Environment Pre-check
Before starting any processing, check the runtime environment and then the MiniMax API key.
Python and Dependency Check:
- Run `python3 --version` to confirm Python >= 3.10. If Python is not installed or the version is too old, ask the user to install it and retry, then stop.
- Run `python3 -c "import requests"` to confirm that the `requests` library is installed. If it is not, ask the user to run `pip3 install requests` (or `pip install requests`) and retry, then stop.
API Key Check:
- Check whether `<SKILL_DIR>/.env` (the `.env` file in the same directory as SKILL.md) exists. If it does not, create an empty `.env` file.
- Read the `.env` file and check that `MINIMAX_API_KEY` is present and non-empty.
- If it is configured, proceed to the next step.
- If it is not, ask the user for their MiniMax API key. Once the user provides it, append `MINIMAX_API_KEY=<value provided by user>` to `<SKILL_DIR>/.env`, then proceed.
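The checks above are meant to be run as shell commands by the agent. As a rough equivalent, here is a minimal Python sketch of the same logic. It assumes a plain `KEY=value` line format for `.env` (no quoting or `export` prefixes), and the function names are hypothetical, not part of the skill's scripts.

```python
from __future__ import annotations

import importlib.util
import sys
from pathlib import Path


def check_environment() -> list[str]:
    """Return a list of problems; an empty list means the environment is usable."""
    problems = []
    if sys.version_info < (3, 10):
        problems.append("Python >= 3.10 is required")
    if importlib.util.find_spec("requests") is None:
        problems.append("requests is not installed (pip3 install requests)")
    return problems


def read_api_key(env_path: Path) -> str | None:
    """Return a non-empty MINIMAX_API_KEY from .env, creating an empty file if missing."""
    if not env_path.exists():
        env_path.touch()
    for line in env_path.read_text(encoding="utf-8").splitlines():
        key, sep, value = line.partition("=")
        if sep and key.strip() == "MINIMAX_API_KEY" and value.strip():
            return value.strip()
    return None


def save_api_key(env_path: Path, api_key: str) -> None:
    """Append the user-provided key, starting a new line if the file lacks one."""
    existing = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    with env_path.open("a", encoding="utf-8") as f:
        f.write(f"{prefix}MINIMAX_API_KEY={api_key}\n")
```

Appending rather than rewriting keeps any other variables already in `.env` untouched, matching the "append" wording of the step.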
Step 0: Initialize Running Directory
- Obtain the text file path from the user's input.
- Create the `<input_dir>/tts-{YYYYMMDD-HHMMSS}/` directory, where `<input_dir>` is the directory containing the input file. Unless the user explicitly specifies an output directory, do not fall back to the current working directory or the skill project directory.
- If sandbox or permission restrictions prevent writing to the input file's directory, you must ask the user for authorization first; use another directory only if the user explicitly agrees.
- Copy the input file to `<run_dir>/input.raw.txt`.
- Normalize punctuation:
  ```bash
  python3 <SKILL_DIR>/scripts/normalize_punctuation.py <run_dir>/input.raw.txt <run_dir>/input.txt
  ```
- Run:
  ```bash
  python3 <SKILL_DIR>/scripts/scan_terms.py <run_dir>/input.txt <run_dir>/terms.json
  ```
- Proceed to Step 1.
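A minimal sketch of the directory setup above, using only the standard library. `create_run_dir` is a hypothetical helper (in the skill this step is carried out by the agent, not by a script). A `PermissionError` from `mkdir` is the point at which the agent must ask the user before choosing another directory.

```python
from __future__ import annotations

import shutil
from datetime import datetime
from pathlib import Path


def create_run_dir(input_file: Path, output_dir: Path | None = None,
                   now: datetime | None = None) -> Path:
    """Create tts-{YYYYMMDD-HHMMSS}/ next to the input file (or in output_dir,
    if the user specified one) and copy the input there as input.raw.txt."""
    input_file = input_file.resolve()
    base = output_dir if output_dir is not None else input_file.parent
    stamp = (now or datetime.now()).strftime("%Y%m%d-%H%M%S")
    run_dir = base / f"tts-{stamp}"
    # exist_ok=False: fail instead of silently reusing an earlier run's directory.
    run_dir.mkdir(parents=True, exist_ok=False)
    shutil.copyfile(input_file, run_dir / "input.raw.txt")
    return run_dir
```

The optional `now` argument only exists to make the timestamp deterministic in tests.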
Step 1: Case Normalization Judgment
Replace `<SKILL_DIR>` and `<run_dir>` with the actual absolute paths, then send the following prompt to the subagent:

```
Please read the following files first, then perform the task.

Required Files (Read in Order)
- Operation guide: <SKILL_DIR>/references/step-1-normalize.md
- Pronunciation rule reference: <SKILL_DIR>/references/pronunciation-rules.md
- User-defined rules: <SKILL_DIR>/user-rules.json (skip if the file does not exist)
- Original text: <run_dir>/input.txt
- Candidate terms: <run_dir>/terms.json

Task
Following the rules in the operation guide, process the normalized, category, and reason fields of every term in terms.json.

Output
Modify and save <run_dir>/terms.json in place (do not create a new file).

Validation
After making changes, run:
python3 <SKILL_DIR>/scripts/validate_terms.py <run_dir>/terms.json 1
If validation fails, fix terms.json according to the errors list and re-run validation until it passes.

Follow-up
Once validation passes, run:
python3 <SKILL_DIR>/scripts/generate_normalized.py <run_dir>/input.txt <run_dir>/terms.json <run_dir>/normalized.txt
```
Step 2: Pronunciation Judgment
Replace `<SKILL_DIR>` and `<run_dir>` with the actual absolute paths, then send the following prompt to the subagent:

```
Please read the following files first, then perform the task.

Required Files (Read in Order)
- Operation guide: <SKILL_DIR>/references/step-2-reading.md
- Pronunciation rule reference: <SKILL_DIR>/references/pronunciation-rules.md
- User-defined rules: <SKILL_DIR>/user-rules.json (skip if the file does not exist)
- Original text: <run_dir>/input.txt
- Normalized text: <run_dir>/normalized.txt
- Candidate terms: <run_dir>/terms.json

Task
Following the rules in the operation guide, process the reading and category fields of every term in terms.json, and identify polyphonic characters in the original text that are missing from terms.json.

Output
Modify and save <run_dir>/terms.json in place (do not create a new file).

Validation
After making changes, run:
python3 <SKILL_DIR>/scripts/validate_terms.py <run_dir>/terms.json 2
If validation fails, fix terms.json according to the errors list and re-run validation until it passes.
```
Step 3: Quality Review
Replace `<SKILL_DIR>` and `<run_dir>` with the actual absolute paths, then send the following prompt to the subagent:

```
Please read the following files first, then perform the task.

Required Files (Read in Order)
- Operation guide: <SKILL_DIR>/references/step-3-review.md
- Pronunciation rule reference: <SKILL_DIR>/references/pronunciation-rules.md
- User-defined rules: <SKILL_DIR>/user-rules.json (skip if the file does not exist)
- Original text: <run_dir>/input.txt
- Normalized text: <run_dir>/normalized.txt
- Complete candidate terms: <run_dir>/terms.json

Task
Following the check items in the operation guide, perform a final quality review of terms.json.

Output
Modify and save <run_dir>/terms.json in place (do not create a new file).

Validation
After making changes, run:
python3 <SKILL_DIR>/scripts/validate_terms.py <run_dir>/terms.json 3
If validation fails, fix terms.json according to the errors list and re-run validation until it passes.
```
Step 4: Generate Audio and Subtitle JSON
Call the MiniMax TTS API:
```bash
python3 <SKILL_DIR>/scripts/call_tts.py <run_dir>/normalized.txt <run_dir>/terms.json <run_dir>/output.wav <run_dir>/output.title
```
This step:
- Generates the WAV audio and saves it to `<run_dir>/output.wav`.
- Downloads the subtitle JSON returned by MiniMax and saves it to `<run_dir>/output.title`.
Step 5: Generate SRT Subtitles
Generate SRT subtitles from the MiniMax subtitle JSON and the WAV audio produced in Step 4:
```bash
python3 <SKILL_DIR>/scripts/title_to_srt.py <run_dir>/output.title <run_dir>/output.wav <run_dir>/output.srt
```
Report the results to the user:
- Audio file path
- MiniMax subtitle JSON file path
- SRT subtitle file path
- Number of tone rules applied
- Number of text replacements made
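The internals of `title_to_srt.py` are not described here; the sketch below only illustrates the SRT cue format such a conversion produces. It assumes the MiniMax subtitle JSON has already been parsed into `(start_ms, end_ms, text)` tuples (a hypothetical intermediate form, since the `output.title` schema is not shown), that the WAV is uncompressed PCM readable by Python's `wave` module, and, as a guess at why the WAV is an input, that the audio length is used to clamp cue end times.

```python
from __future__ import annotations

import wave


def wav_duration_ms(path: str) -> int:
    """Duration of an uncompressed PCM WAV file in milliseconds."""
    with wave.open(path, "rb") as w:
        return int(w.getnframes() * 1000 / w.getframerate())


def ms_to_srt_time(ms: int) -> str:
    """Format milliseconds as an SRT timestamp, HH:MM:SS,mmm."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def build_srt(cues: list[tuple[int, int, str]], max_ms: int | None = None) -> str:
    """Render numbered SRT cues; optionally clamp end times to the audio length."""
    blocks = []
    for index, (start, end, text) in enumerate(cues, start=1):
        if max_ms is not None:
            end = min(end, max_ms)  # a cue should not outlast the audio
        blocks.append(f"{index}\n{ms_to_srt_time(start)} --> {ms_to_srt_time(end)}\n{text}\n")
    # Cues are separated by a blank line, as the SRT format expects.
    return "\n".join(blocks)
```

Usage would look like `build_srt(cues, max_ms=wav_duration_ms("output.wav"))`.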
Saved Files
```
tts-YYYYMMDD-HHMMSS/
  input.raw.txt    # Original input (read-only)
  input.txt        # Input after punctuation normalization (read-only)
  terms.json       # The single structured working file for the whole pipeline
  normalized.txt   # Normalized text
  output.wav       # Audio output from MiniMax TTS
  output.title     # Character-level timestamp subtitle JSON returned by MiniMax
  output.srt       # SRT subtitles generated from output.title + output.wav
```
Constraints
- Maintain a single terms.json for the whole pipeline; every subagent modifies that same file directly.
- The LLM modifies only terms.json; it never edits normalized.txt or input.txt directly.
- Text replacement, tone generation, and API calls are all performed by scripts.
- If validation fails at any stage, stop; do not proceed to later stages.
- MINIMAX_API_KEY is read from `<SKILL_DIR>/.env`.
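The stop-on-failure rule can be sketched for the final script stages (validation after Step 3, then Steps 4 and 5). This assumes, without confirmation from the scripts themselves, that `validate_terms.py` and the other scripts signal failure with a non-zero exit status. The helper names are hypothetical, the subagent steps are not covered, and `sys.executable` stands in for `python3` so the same interpreter is reused.

```python
from __future__ import annotations

import subprocess
import sys
from pathlib import Path


def script_cmd(skill_dir: Path, script: str, *args: Path | str | int) -> list[str]:
    """Build a command line that runs scripts/<script> with the current interpreter."""
    return [sys.executable, str(skill_dir / "scripts" / script), *map(str, args)]


def run_or_stop(cmd: list[str]) -> None:
    """Run one stage; raise on a non-zero exit so that no later stage runs."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"stage failed ({cmd[1]}):\n{result.stdout}{result.stderr}")


def final_stages(skill_dir: Path, run_dir: Path) -> None:
    """Validate terms.json (stage 3), then run call_tts.py and title_to_srt.py."""
    terms = run_dir / "terms.json"
    run_or_stop(script_cmd(skill_dir, "validate_terms.py", terms, 3))
    run_or_stop(script_cmd(skill_dir, "call_tts.py", run_dir / "normalized.txt", terms,
                           run_dir / "output.wav", run_dir / "output.title"))
    run_or_stop(script_cmd(skill_dir, "title_to_srt.py", run_dir / "output.title",
                           run_dir / "output.wav", run_dir / "output.srt"))
```

Raising an exception instead of returning a status makes it impossible for a caller to forget the check and continue into the TTS call after a failed validation.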
Resources
scripts/
- `normalize_punctuation.py <input> <output>`: Stage 0. Appends a period to lines that end without sentence-final punctuation.
- `scan_terms.py`: Stage 0. Extracts candidate terms from the original text and generates a draft terms.json.
- `validate_terms.py <terms_json> <stage>`: Stages 1/2/3. Validates the terms.json schema.
- `generate_normalized.py <input> <terms> <output>`: After Stage 1. Generates the normalized text from terms.json.
- `call_tts.py <normalized> <terms> <output_wav> [output_title]`: Stage 4. Calls the MiniMax TTS API to generate WAV audio and download the subtitle JSON.
- `title_to_srt.py <input_title> <input_wav> [output_srt]`: Stage 5. Generates SRT subtitles from the MiniMax subtitle JSON and WAV audio.
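A minimal sketch of the Stage 0 rule for `normalize_punctuation.py`, to make "append a period" concrete. The set of sentence-final marks, the handling of closing quotes, and the choice between `。` and `.` are all assumptions, not taken from the actual script.

```python
import re

# Assumed sentence-final marks and closing quotes/brackets; the real script's lists may differ.
SENTENCE_END = tuple("。！？.!?…；;")
CLOSERS = "”’」』）)\"'"
CJK = re.compile(r"[\u4e00-\u9fff]")


def normalize_line(line: str) -> str:
    """Append a period to a non-empty line that lacks sentence-final punctuation."""
    stripped = line.rstrip()
    core = stripped.rstrip(CLOSERS)  # look past trailing quotes, e.g. “好。”
    if not core or core.endswith(SENTENCE_END):
        return stripped
    # Hypothetical rule: Chinese full stop if the line contains CJK text, else ASCII period.
    return stripped + ("。" if CJK.search(stripped) else ".")


def normalize_text(text: str) -> str:
    """Apply normalize_line to every line, keeping a trailing newline if present."""
    result = "\n".join(normalize_line(line) for line in text.splitlines())
    return result + "\n" if text.endswith("\n") else result
```

Adding the period matters for TTS because a line break alone may not produce a sentence-level pause.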
references/
- `pronunciation-rules.md`: Quick reference for pronunciation rules (category enumeration, reading format, key constraints).
- `manage-user-rules.md`: Guide to managing user pronunciation rules (loaded on demand).
- `api-voice-settings.md`: Description of the `voice_id`, `speed`, `vol`, and `pitch` parameters in MiniMax API requests, and where to change them.
- `step-1-normalize.md`: Operation guide for Step 1 (case normalization judgment).
- `step-2-reading.md`: Operation guide for Step 2 (pronunciation judgment and polyphonic character detection).
- `step-3-review.md`: Operation guide for Step 3 (quality review).
Other Files
- `user-rules.json`: User-defined pronunciation rules (maintained by the agent through conversation and consumed by each step).
- `.env`: Stores the MiniMax API key.
API Voice Parameter Modification
If the user asks about or wants to change the voice, speech speed, volume, or pitch (`voice_id`, `speed`, `vol`, `pitch`) in the MiniMax TTS API request, first read `<SKILL_DIR>/references/api-voice-settings.md`. These parameters must be changed directly in the request payload in `<SKILL_DIR>/scripts/call_tts.py`.