MiniMax TTS Pronunciation Control

Process text files step by step to resolve pronunciation issues, and finally call the MiniMax TTS API to generate audio.

Input

| Parameter | Required | Description |
| --- | --- | --- |
| Text file path | Yes | Absolute path of the .txt file to be processed |
| Output directory | No | By default, a `tts-{YYYYMMDD-HHMMSS}/` directory is created in the same directory as the input file |

User Pronunciation Rule Management

When the user asks to add, query, delete, or modify pronunciation rules (e.g., "Qwen is pronounced as Qianwen", "Check what rules there are", "Delete the rule for Qwen"), read `<SKILL_DIR>/references/manage-user-rules.md` and `<SKILL_DIR>/references/pronunciation-rules.md`, then follow those guidelines to operate on `<SKILL_DIR>/user-rules.json`.

Workflow

input.txt → input.raw.txt → [Script] normalize_punctuation.py → input.txt
         → [Script] scan_terms.py → terms.json(draft)
         → [Subagent 1] Complete normalization → terms.json
         → [Script] validate + generate_normalized.py → normalized.txt
         → [Subagent 2] Complete pronunciation + polyphonic character recognition → terms.json
         → [Script] validate
         → [Subagent 3] Review → terms.json(review.pass)
         → [Script] validate + call_tts.py → output.wav + output.title
         → [Script] title_to_srt.py → output.srt

Use `<SKILL_DIR>` to represent the absolute path of this skill directory. Use `<run_dir>` to represent the absolute path of the current run's output directory (i.e., the full path of the `tts-{YYYYMMDD-HHMMSS}/` directory created in Step 0).
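
The placeholder convention above amounts to plain string substitution. A minimal sketch, assuming the placeholders appear literally in the command or prompt text; the helper name `fill_placeholders` is hypothetical and not part of the skill:

```python
from pathlib import Path


def fill_placeholders(template: str, skill_dir: Path, run_dir: Path) -> str:
    """Replace <SKILL_DIR> and <run_dir> with absolute paths (hypothetical helper)."""
    return (
        template
        .replace("<SKILL_DIR>", str(skill_dir.resolve()))
        .replace("<run_dir>", str(run_dir.resolve()))
    )


if __name__ == "__main__":
    command = "python3 <SKILL_DIR>/scripts/scan_terms.py <run_dir>/input.txt <run_dir>/terms.json"
    print(fill_placeholders(command, Path("."), Path("./tts-20240101-120000")))
```

Resolving both paths before substitution keeps the result absolute even when the caller passes relative paths.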

Step -1: Environment Pre-check

Before starting any processing, check the running environment and MiniMax API Key in sequence.

Python and Dependency Check:

1. Run `python3 --version` to confirm Python >= 3.10. If Python is not installed or the version is too low, ask the user to install it and try again, then stop the process.
2. Run `python3 -c "import requests"` to confirm that the `requests` library is installed. If it is not, ask the user to run `pip3 install requests` (or `pip install requests`) and try again, then stop the process.
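
The same two checks can also be expressed from inside Python. This is only a sketch of the idea, not a script shipped with the skill; the function name `check_environment` and the message wording are assumptions:

```python
import importlib.util
import sys

MIN_PYTHON = (3, 10)  # minimum version stated in this step


def check_environment(required_modules=("requests",)):
    """Return a list of problems; an empty list means the pre-check passed."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        found = ".".join(str(part) for part in sys.version_info[:3])
        problems.append(f"Python >= 3.10 is required, found {found}")
    for name in required_modules:
        # find_spec returns None when a top-level module cannot be located
        if importlib.util.find_spec(name) is None:
            problems.append(f"Missing library '{name}': run pip3 install {name}")
    return problems


if __name__ == "__main__":
    issues = check_environment()
    for issue in issues:
        print(issue)
    sys.exit(1 if issues else 0)
```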

API Key Check:

1. Check whether `<SKILL_DIR>/.env` (i.e., the `.env` file in the same directory as SKILL.md) exists. If not, create an empty `.env` file.
2. Read the `.env` file and check whether `MINIMAX_API_KEY` exists and its value is not empty.
   - If configured, proceed to the next step.
   - If not configured, ask the user for the MiniMax API Key. After the user provides it, append `MINIMAX_API_KEY=<value provided by user>` to the `<SKILL_DIR>/.env` file, then proceed.
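
The key check above can be sketched as follows. It assumes the `.env` file holds simple `NAME=value` lines without quoting or `export` prefixes, which this document does not specify; both function names are hypothetical:

```python
from pathlib import Path
from typing import Optional

KEY_NAME = "MINIMAX_API_KEY"


def read_api_key(env_path: Path) -> Optional[str]:
    """Return the key value, creating an empty .env file if none exists."""
    env_path.touch(exist_ok=True)
    for line in env_path.read_text(encoding="utf-8").splitlines():
        name, sep, value = line.partition("=")
        if sep and name.strip() == KEY_NAME and value.strip():
            return value.strip()
    return None  # key absent or empty: ask the user


def append_api_key(env_path: Path, value: str) -> None:
    """Append the key on its own line, adding a newline first if needed."""
    existing = env_path.read_text(encoding="utf-8") if env_path.exists() else ""
    prefix = "" if not existing or existing.endswith("\n") else "\n"
    with env_path.open("a", encoding="utf-8") as handle:
        handle.write(f"{prefix}{KEY_NAME}={value}\n")
```

A caller would invoke `read_api_key` first and call `append_api_key` only after the user has supplied a value.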

Step 0: Initialize Running Directory

1. Obtain the text file path from user input.
2. Create the `<input_dir>/tts-{YYYYMMDD-HHMMSS}/` directory, where `<input_dir>` is the directory of the input file. Unless the user explicitly specifies an output directory, do not use the current working directory or the skill project directory instead.
   - If writing to the same directory as the input file is not possible because of sandbox or permission restrictions, request user authorization first; use another directory only after the user explicitly agrees.
3. Copy the input file to `<run_dir>/input.raw.txt`.
4. Perform punctuation normalization:

   ```bash
   python3 <SKILL_DIR>/scripts/normalize_punctuation.py <run_dir>/input.raw.txt <run_dir>/input.txt
   ```

5. Extract candidate terms into a draft `terms.json`:

   ```bash
   python3 <SKILL_DIR>/scripts/scan_terms.py <run_dir>/input.txt <run_dir>/terms.json
   ```

6. Proceed to Step 1.
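
The directory setup in this step can be sketched in a few lines. This is an illustration under stated assumptions, not code from the skill: the function name `init_run_dir` is hypothetical, and the permission fallback (asking the user before using another directory) is left to the caller.

```python
import shutil
from datetime import datetime
from pathlib import Path
from typing import Optional


def init_run_dir(input_file: Path, output_dir: Optional[Path] = None) -> Path:
    """Create the run directory and copy the input to input.raw.txt."""
    input_file = input_file.resolve()
    if output_dir is None:
        # Default: tts-{YYYYMMDD-HHMMSS}/ next to the input file
        stamp = datetime.now().strftime("%Y%m%d-%H%M%S")
        output_dir = input_file.parent / f"tts-{stamp}"
    output_dir.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(input_file, output_dir / "input.raw.txt")
    return output_dir
```

A `PermissionError` raised by `mkdir` or `copyfile` is the point at which the authorization rule above would apply.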

Step 1: Case Normalization Judgment

Replace `<SKILL_DIR>` and `<run_dir>` with actual absolute paths, then send the following prompt to the subagent:

Please read the following files first, then perform the task.

## Required Files (Read in Order)

1. Operation Guide: <SKILL_DIR>/references/step-1-normalize.md
2. Pronunciation Rule Reference: <SKILL_DIR>/references/pronunciation-rules.md
3. User-defined Rules: <SKILL_DIR>/user-rules.json (skip if the file does not exist)
4. Original Text: <run_dir>/input.txt
5. Candidate Terms: <run_dir>/terms.json

## Task

Process the normalized, category, and reason fields of each term in terms.json according to the rules in the operation guide.

## Output

Directly modify and save <run_dir>/terms.json (do not create a new file).

## Validation

After modification, execute `python3 <SKILL_DIR>/scripts/validate_terms.py <run_dir>/terms.json 1`. If validation fails, correct terms.json according to the errors list and re-validate until it passes.

## Follow-up

After validation passes, execute `python3 <SKILL_DIR>/scripts/generate_normalized.py <run_dir>/input.txt <run_dir>/terms.json <run_dir>/normalized.txt`.

Step 2: Pronunciation Judgment

Replace `<SKILL_DIR>` and `<run_dir>` with actual absolute paths, then send the following prompt to the subagent:

Please read the following files first, then perform the task.

## Required Files (Read in Order)

1. Operation Guide: <SKILL_DIR>/references/step-2-reading.md
2. Pronunciation Rule Reference: <SKILL_DIR>/references/pronunciation-rules.md
3. User-defined Rules: <SKILL_DIR>/user-rules.json (skip if the file does not exist)
4. Original Text: <run_dir>/input.txt
5. Normalized Text: <run_dir>/normalized.txt
6. Candidate Terms: <run_dir>/terms.json

## Task

Process the reading and category fields of each term in terms.json according to the rules in the operation guide, and identify missing polyphonic characters in the original text.

## Output

Directly modify and save <run_dir>/terms.json (do not create a new file).

## Validation

After modification, execute `python3 <SKILL_DIR>/scripts/validate_terms.py <run_dir>/terms.json 2`. If validation fails, correct terms.json according to the errors list and re-validate until it passes.

Step 3: Quality Review

Replace `<SKILL_DIR>` and `<run_dir>` with actual absolute paths, then send the following prompt to the subagent:

Please read the following files first, then perform the task.

## Required Files (Read in Order)

1. Operation Guide: <SKILL_DIR>/references/step-3-review.md
2. Pronunciation Rule Reference: <SKILL_DIR>/references/pronunciation-rules.md
3. User-defined Rules: <SKILL_DIR>/user-rules.json (skip if the file does not exist)
4. Original Text: <run_dir>/input.txt
5. Normalized Text: <run_dir>/normalized.txt
6. Complete Candidate Terms: <run_dir>/terms.json

## Task

Perform a final quality review on terms.json according to the check items in the operation guide.

## Output

Directly modify and save <run_dir>/terms.json (do not create a new file).

## Validation

After modification, execute `python3 <SKILL_DIR>/scripts/validate_terms.py <run_dir>/terms.json 3`. If validation fails, correct terms.json according to the errors list and re-validate until it passes.

Step 4: Generate Audio and Subtitle JSON

Call the MiniMax TTS API:

```bash
python3 <SKILL_DIR>/scripts/call_tts.py <run_dir>/normalized.txt <run_dir>/terms.json <run_dir>/output.wav <run_dir>/output.title
```

This step will:

- Generate and save the WAV audio: `<run_dir>/output.wav`
- Download and save the subtitle JSON returned by MiniMax: `<run_dir>/output.title`

Step 5: Generate SRT Subtitles

Generate SRT subtitles based on the MiniMax subtitle JSON and WAV audio obtained in Step 4:

```bash
python3 <SKILL_DIR>/scripts/title_to_srt.py <run_dir>/output.title <run_dir>/output.wav <run_dir>/output.srt
```

Report the results to the user:

- Audio file path
- MiniMax subtitle JSON file path
- SRT subtitle file path
- Number of tone rules used
- Number of text replacements made
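
For orientation, an SRT file is a sequence of numbered cues, each with a `HH:MM:SS,mmm --> HH:MM:SS,mmm` time range followed by the cue text and a blank line. The sketch below shows only that output format. The input shape, a list of `(start_ms, end_ms, text)` tuples, is a hypothetical stand-in, since the structure of `output.title` and the way `title_to_srt.py` uses the WAV file are not described in this document.

```python
def srt_timestamp(ms: int) -> str:
    """Format a millisecond offset as an SRT timestamp (HH:MM:SS,mmm)."""
    hours, rem = divmod(ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    seconds, millis = divmod(rem, 1_000)
    return f"{hours:02d}:{minutes:02d}:{seconds:02d},{millis:03d}"


def to_srt(segments) -> str:
    """Render (start_ms, end_ms, text) tuples as SRT cues (hypothetical input shape)."""
    blocks = []
    for index, (start_ms, end_ms, text) in enumerate(segments, start=1):
        time_range = f"{srt_timestamp(start_ms)} --> {srt_timestamp(end_ms)}"
        blocks.append(f"{index}\n{time_range}\n{text}\n")
    # Cues are separated by a blank line
    return "\n".join(blocks)


if __name__ == "__main__":
    print(to_srt([(0, 1500, "Hello"), (1500, 3200, "world")]))
```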

Saved Files

tts-YYYYMMDD-HHMMSS/
  input.raw.txt    # Original input (read-only)
  input.txt        # Input after punctuation normalization (read-only)
  terms.json       # The only structured working file throughout the process
  normalized.txt   # Normalized text
  output.wav       # MiniMax TTS output audio
  output.title     # Word-level timestamp subtitle JSON returned by MiniMax
  output.srt       # SRT subtitles generated from output.title + output.wav

Constraints

- Maintain only one copy of terms.json throughout the process; all subagents modify this same file directly.
- The LLM modifies only terms.json; it never edits normalized.txt or input.txt directly.
- Text replacement, tone generation, and API calls are all executed by scripts.
- If validation fails at any stage, stop the process and do not proceed to subsequent stages.
- MINIMAX_API_KEY is read from the `<SKILL_DIR>/.env` file.
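
The stop-on-failure constraint can be sketched as a small wrapper around each script call. It assumes the scripts signal failure through a non-zero exit code, which this document does not state explicitly; the helper name `run_stage` is hypothetical.

```python
import subprocess
import sys


def run_stage(command: list[str]) -> None:
    """Run one pipeline script and stop the whole process if it fails."""
    result = subprocess.run(command)
    if result.returncode != 0:
        # Assumption: a non-zero exit code means validation or processing failed
        sys.exit(f"Stage failed (exit code {result.returncode}): {' '.join(command)}")
```

A caller would wrap every script invocation in `run_stage`, so that later stages never run after a failed validation.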

Resources

scripts/

- `normalize_punctuation.py <input> <output>`: Stage 0. Add periods to text that is missing end-of-sentence punctuation at line breaks
- `scan_terms.py`: Stage 0. Extract candidate terms from the original text and generate a draft of terms.json
- `validate_terms.py <terms_json> <stage>`: Stage 1/2/3. Validate the terms.json schema
- `generate_normalized.py <input> <terms> <output>`: After Stage 1. Generate normalized text based on terms.json
- `call_tts.py <normalized> <terms> <output_wav> [output_title]`: Stage 4. Call the MiniMax TTS API to generate WAV audio and download the subtitle JSON
- `title_to_srt.py <input_title> <input_wav> [output_srt]`: Stage 5. Generate SRT subtitles from the MiniMax subtitle JSON and WAV audio

references/

- `pronunciation-rules.md`: Quick reference for pronunciation rules (category enumeration, reading format, key constraints)
- `manage-user-rules.md`: Guide for user pronunciation rule management (loaded on demand)
- `api-voice-settings.md`: Description and modification location of parameters such as voice_id, speed, vol, and pitch in MiniMax API requests
- `step-1-normalize.md`: Operation guide for Step 1 (case normalization judgment)
- `step-2-reading.md`: Operation guide for Step 2 (pronunciation judgment and polyphonic character recognition)
- `step-3-review.md`: Operation guide for Step 3 (quality review)

Other Files

- `user-rules.json`: User-defined pronunciation rules (maintained by the agent through dialogue, consumed by each step)
- `.env`: MiniMax API Key storage

API Voice Parameter Modification

If the user asks about or wants to modify voice parameters in the MiniMax TTS API request, such as voice type, speech speed, volume, or pitch (`voice_id`, `speed`, `vol`, `pitch`), read `<SKILL_DIR>/references/api-voice-settings.md` first. These parameters must be modified directly in the request payload in `<SKILL_DIR>/scripts/call_tts.py`.
