listenhub-tts
ListenHub TTS: Text-to-Speech
Convert text to speech using the ListenHub OpenAPI. Three synthesis modes are supported, covering all scenarios from short text to long text.
API Information
- Base URL: `https://api.marswave.ai/openapi`
- Authentication: `Authorization: Bearer $LISTENHUB_API_KEY` (read from the environment variable)
- Prerequisite check: before calling any API, confirm that the `LISTENHUB_API_KEY` environment variable is set. If it is not set, prompt the user to configure it.
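The prerequisite check can be sketched in a few lines of Python. This is a minimal illustration, not an official client: `auth_headers` is a hypothetical helper name, and the `env` parameter exists only so the check can be exercised without touching the real environment.

```python
import os

BASE_URL = "https://api.marswave.ai/openapi"  # from the API Information list


def auth_headers(env=None):
    """Return request headers, or raise if LISTENHUB_API_KEY is missing.

    `auth_headers` is a hypothetical helper, not part of the ListenHub API.
    """
    env = os.environ if env is None else env
    api_key = env.get("LISTENHUB_API_KEY", "").strip()
    if not api_key:
        # Stop before any request is made so the user can configure the key.
        raise RuntimeError(
            "LISTENHUB_API_KEY is not set; ask the user to configure it first."
        )
    return {
        "Authorization": f"Bearer {api_key}",
        "Content-Type": "application/json",
    }
```

Calling `auth_headers()` once at the start of a session keeps the check in one place; every later request can reuse the returned dictionary.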
Voice Selection Process
User Has Explicitly Specified a Voice
Use the speakerId specified by the user directly and skip the selection process.
User Has Not Specified a Voice
- Call `GET /v1/speakers/list?language=zh` to retrieve the list of available voices
- Present the voice list with AskUserQuestion so the user can choose, in the following format:
  - `chat-girl-105-cn` (Xiaoman dxqqq) is selected by default
  - List entries: `{name} ({gender}, {speakerId})`
  - Include each voice's demoAudioUrl for reference
- Use the selected speakerId once the user confirms
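As a rough sketch of the list display, the function below turns speaker records into `{name} ({gender}, {speakerId})` lines with the default voice first. The function name, the `[Default]` marker placement, and the `demo:` suffix are illustrative assumptions; only the field names come from the speaker list response.

```python
DEFAULT_SPEAKER_ID = "chat-girl-105-cn"  # default voice named in this document


def format_voice_options(speakers, default_id=DEFAULT_SPEAKER_ID):
    """Return display lines, default voice first and marked [Default].

    `speakers` is a list of dicts with name, gender, speakerId, demoAudioUrl.
    """
    # sorted() is stable, so non-default voices keep their original order.
    ordered = sorted(speakers, key=lambda s: s["speakerId"] != default_id)
    lines = []
    for s in ordered:
        line = f'{s["name"]} ({s["gender"]}, {s["speakerId"]})'
        if s["speakerId"] == default_id:
            line += " [Default]"
        lines.append(f'{line} demo: {s["demoAudioUrl"]}')
    return lines
```

The returned lines can be passed to AskUserQuestion as the option labels.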
Default Voice
| Field | Value |
|---|---|
| speakerId | `chat-girl-105-cn` |
| Name | Xiaoman dxqqq |
Three Synthesis Modes
Mode 1: Quick Synthesis (Short Text, Single Voice)
Applicable scenarios: short text (< 1000 characters), single voice, low latency required
Endpoint: `POST /v1/tts`
Request body:
```json
{
  "text": "Text to synthesize",
  "speakerId": "chat-girl-105-cn",
  "format": "mp3",
  "sampleRate": 24000,
  "speed": 1.0
}
```
| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Text to synthesize |
| speakerId | string | Yes | Voice ID |
| format | string | No | Output format, default is `mp3` |
| sampleRate | int | No | Sample rate; the example request uses `24000` |
| speed | float | No | Speech speed, default is `1.0` |
Response: returns the MP3 binary stream directly (`Content-Type: audio/mpeg`)
Call example:
```bash
curl -X POST "https://api.marswave.ai/openapi/v1/tts" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "你好世界", "speakerId": "chat-girl-105-cn"}' \
  -o output.mp3
```
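The same call can be sketched in Python with only the standard library. The helper names `build_tts_request` and `synthesize_to_file` are hypothetical; the URL, headers, and field names follow the request shown for Mode 1, and the sketch assumes the response body is the raw audio as described.

```python
import json
import os
import urllib.request

TTS_URL = "https://api.marswave.ai/openapi/v1/tts"


def build_tts_request(text, speaker_id, api_key, **options):
    """Build (but do not send) a POST /v1/tts request.

    `options` may carry the optional fields: format, sampleRate, speed.
    """
    payload = {"text": text, "speakerId": speaker_id, **options}
    return urllib.request.Request(
        TTS_URL,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def synthesize_to_file(text, speaker_id, path="./output.mp3"):
    """Send the request and write the returned audio bytes to `path`."""
    request = build_tts_request(text, speaker_id, os.environ["LISTENHUB_API_KEY"])
    with urllib.request.urlopen(request) as response, open(path, "wb") as out:
        out.write(response.read())
    return path
```

Splitting request construction from sending keeps the payload easy to inspect before any network traffic happens.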
Mode 2: Multi-role Script Synthesis
Applicable scenarios: multi-role dialogues, podcasts, and audiobook clips where different voices take turns reading
Endpoint: `POST /v1/speech`
Request body:
```json
{
  "script": [
    {
      "text": "Hello, welcome to this episode.",
      "speakerId": "chat-girl-105-cn"
    },
    {
      "text": "Thank you, today we'll talk about AI.",
      "speakerId": "chat-boy-101-cn"
    }
  ],
  "format": "mp3",
  "sampleRate": 24000
}
```
| Parameter | Type | Required | Description |
|---|---|---|---|
| script | array | Yes | Script array; each item contains text and speakerId |
| script[].text | string | Yes | Text of this segment |
| script[].speakerId | string | Yes | Voice ID for this segment |
| format | string | No | Output format, default is `mp3` |
| sampleRate | int | No | Sample rate; the example request uses `24000` |
Response: JSON
```json
{
  "audioUrl": "https://cdn.example.com/output.mp3",
  "duration": 12.5
}
```
Call example:
```bash
curl -X POST "https://api.marswave.ai/openapi/v1/speech" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "script": [
      {"text": "你好,欢迎收听。", "speakerId": "chat-girl-105-cn"},
      {"text": "谢谢,我们开始吧。", "speakerId": "chat-boy-101-cn"}
    ]
  }'
```
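When a dialogue arrives as role-labelled lines, the `script` array has to be assembled before calling `POST /v1/speech`. The sketch below shows one way to do that; the role names and the `build_speech_payload` helper are hypothetical and exist only on the caller's side, while the output keys match the Mode 2 request body.

```python
def build_speech_payload(lines, voices, audio_format="mp3", sample_rate=24000):
    """Turn (role, text) pairs into a POST /v1/speech request body.

    `voices` maps a role name to a speakerId. Role names are caller-side
    labels (an assumption of this sketch) and are not sent to the API.
    """
    script = []
    for role, text in lines:
        if role not in voices:
            raise KeyError(f"no speakerId configured for role {role!r}")
        script.append({"text": text, "speakerId": voices[role]})
    return {"script": script, "format": audio_format, "sampleRate": sample_rate}
```

For example, `build_speech_payload([("host", "Hello")], {"host": "chat-girl-105-cn"})` yields a one-segment script that can be serialized with `json.dumps` and posted.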
Mode 3: Long Text Streaming Synthesis
Applicable scenarios: long text (> 1000 characters), article reading, content that needs AI polishing or segmented processing
Endpoint: `POST /v1/flow-speech/episodes`
Request body:
```json
{
  "title": "Article Title",
  "content": "Long text content...",
  "speakerId": "chat-girl-105-cn",
  "mode": "direct",
  "format": "mp3"
}
```
| Parameter | Type | Required | Description |
|---|---|---|---|
| title | string | Yes | Audio title |
| content | string | No | Text content (provide either content or contentUrl) |
| contentUrl | string | No | Content URL (provide either content or contentUrl) |
| speakerId | string | Yes | Voice ID |
| mode | string | No | `direct` or `aiPolish`, default is `direct` |
| format | string | No | Output format, default is `mp3` |
Response: JSON
```json
{
  "episodeId": "ep_abc123",
  "status": "processing"
}
```
Poll for results:
```
GET /v1/flow-speech/episodes/{episodeId}
```
Polling strategy:
- Wait 30 seconds after submission
- Then poll every 10 seconds
- Stop when status becomes `completed` or `failed`

Polling response:
```json
{
  "episodeId": "ep_abc123",
  "status": "completed",
  "audioUrl": "https://cdn.example.com/output.mp3",
  "duration": 180.5
}
```
| Status Value | Description |
|---|---|
| processing | Synthesis in progress, continue polling |
| completed | Synthesis completed, audioUrl is available |
| failed | Synthesis failed, check errorMessage |
Call example:
```bash
# Submit task
curl -X POST "https://api.marswave.ai/openapi/v1/flow-speech/episodes" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "title": "AI Technology Trends", "content": "Long text content...", "speakerId": "chat-girl-105-cn", "mode": "direct" }'

# Poll for results
curl "https://api.marswave.ai/openapi/v1/flow-speech/episodes/ep_abc123" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY"
```
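The polling strategy (30-second initial wait, then every 10 seconds until `completed` or `failed`) can be sketched as a small loop. This is an illustration under stated assumptions: `fetch_status` is a caller-supplied function that performs the GET request and returns the parsed JSON, and `max_polls` is an added safety limit that the source does not specify.

```python
import time


def poll_episode(fetch_status, episode_id, initial_wait=30, interval=10,
                 max_polls=60, sleep=time.sleep):
    """Poll until the episode is completed or failed, then return the response.

    `fetch_status(episode_id)` should call
    GET /v1/flow-speech/episodes/{episodeId} and return the parsed JSON dict.
    `max_polls` is a safety cap added by this sketch, not an API requirement.
    """
    sleep(initial_wait)  # wait 30 seconds after submission
    for _ in range(max_polls):
        episode = fetch_status(episode_id)
        if episode["status"] == "completed":
            return episode  # audioUrl is available now
        if episode["status"] == "failed":
            raise RuntimeError(episode.get("errorMessage", "synthesis failed"))
        sleep(interval)  # still processing: poll again in 10 seconds
    raise TimeoutError(
        f"episode {episode_id} still processing after {max_polls} polls"
    )
```

Passing `sleep` as a parameter lets the loop be exercised without real delays; in normal use the default `time.sleep` applies.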
Voice List Query
Endpoint: `GET /v1/speakers/list`
Query parameters:
| Parameter | Type | Required | Description |
|---|---|---|---|
| language | string | No | Filter by language, e.g., `zh` |
Response:
```json
{
  "speakers": [
    {
      "name": "Xiaoman dxqqq",
      "speakerId": "chat-girl-105-cn",
      "demoAudioUrl": "https://cdn.example.com/demo.mp3",
      "gender": "female",
      "language": "zh"
    }
  ]
}
```
Mode Selection Logic
Automatically select the most appropriate mode based on user input:
| Condition | Mode |
|---|---|
| Text ≤ 1000 characters, single voice | Mode 1: `/v1/tts` |
| Multi-role script requiring different voices | Mode 2: `/v1/speech` |
| Text > 1000 characters, or AI polishing required | Mode 3: `/v1/flow-speech/episodes` |
| User provides a URL as the content source | Mode 3: `/v1/flow-speech/episodes` |
If the user explicitly specifies a mode, that choice takes priority.
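The selection rules can be written as a single function. This is a sketch under assumptions: text length is measured with `len()` in characters, and the order in which the rules are checked (URL source, then multiple voices, then length or polishing) is a choice made here, since the source does not rank the conditions against each other.

```python
def select_mode(text="", speaker_count=1, content_url=None,
                ai_polish=False, user_mode=None):
    """Return the endpoint path to use for a synthesis request."""
    if user_mode is not None:          # an explicit user choice always wins
        return user_mode
    if content_url:                    # URL given as the content source
        return "/v1/flow-speech/episodes"
    if speaker_count > 1:              # multi-role script
        return "/v1/speech"
    if len(text) > 1000 or ai_polish:  # long text or AI polishing requested
        return "/v1/flow-speech/episodes"
    return "/v1/tts"                   # short text, single voice
```

A short single-voice sentence therefore resolves to `/v1/tts`, while the same sentence with `ai_polish=True` resolves to `/v1/flow-speech/episodes`.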
User Interaction
Voice Selection
When the user does not specify a voice, use AskUserQuestion to display the voice list:
```
Please select a voice (default: Xiaoman dxqqq):
A. Xiaoman dxqqq (Female, chat-girl-105-cn) [Default]
B. [Other Voice Name] ([Gender], [speakerId])
C. ...
```
Synthesis Parameters
Optionally ask about:
- Speech speed (`speed`, default 1.0)
- Output format (`format`, default mp3)
- Long text mode: `direct` or `aiPolish` (default `direct`)
- Output file path (default `./output.mp3`)
Output
- Save the audio to the specified path (default `./output.mp3`)
- Output a synthesis summary:
  - Mode used
  - Voice name and ID
  - Audio duration
  - File size
  - File path
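One possible shape for the synthesis summary is sketched below. The function name, the labels, and the KiB unit are illustrative choices, not a format the source prescribes; `size_bytes` can be passed explicitly or read from the saved file.

```python
import os


def synthesis_summary(mode, voice_name, speaker_id, duration_seconds, path,
                      size_bytes=None):
    """Return a five-line summary covering the items listed above."""
    if size_bytes is None:
        size_bytes = os.path.getsize(path)  # requires the file to exist
    return "\n".join([
        f"Mode: {mode}",
        f"Voice: {voice_name} ({speaker_id})",
        f"Duration: {duration_seconds:.1f} s",
        f"File size: {size_bytes / 1024:.1f} KiB",
        f"File path: {path}",
    ])
```

The duration can be taken from the `duration` field of the Mode 2 and Mode 3 responses; Mode 1 returns raw audio, so the duration would need to be measured from the file by other means.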
Error Handling
- 401 Unauthorized: prompt the user to check the `LISTENHUB_API_KEY` environment variable
- 400 Bad Request: check the request parameters and report the specific error to the user
- flow-speech failed: report errorMessage and suggest that the user retry or switch modes
- Network error: prompt the user to check the network connection and suggest retrying
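The four cases can be mapped to user-facing messages with a small function. This is a sketch: the function name, its parameters, and the exact message wording are assumptions, and the fallback branch for other status codes is an addition the source does not describe.

```python
def describe_error(http_status=None, episode_error=None, network_error=False):
    """Map a failure to the advice listed for each error case."""
    if network_error:
        return "Network error: check the network connection and retry."
    if episode_error is not None:
        # errorMessage reported by a failed flow-speech episode
        return f"Synthesis failed: {episode_error}. Retry or switch modes."
    if http_status == 401:
        return "Unauthorized: check the LISTENHUB_API_KEY environment variable."
    if http_status == 400:
        return "Bad request: check the request parameters."
    # Not covered by the list above; included so no failure goes unreported.
    return f"Unexpected error (HTTP {http_status})."
```

With `urllib`, the status code would come from `urllib.error.HTTPError.code`, and a `urllib.error.URLError` without a status code would be treated as a network error.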
Complete Example
User input: "Convert this text to speech: The weather is nice today, perfect for a walk outside."
Execution process:
- Check `LISTENHUB_API_KEY` ✓
- Text length < 1000 characters, single voice → select Mode 1 `/v1/tts`
- User did not specify a voice → default to `chat-girl-105-cn` (Xiaoman)
- Call the API for synthesis
- Save to `./output.mp3`
- Output the summary

User input: "Read this article in Xiaoman's voice: article.md"
Execution process:
- Read the content of `article.md`
- Text length > 1000 characters → select Mode 3 `/v1/flow-speech/episodes`
- Voice specified: `chat-girl-105-cn` (Xiaoman)
- Submit the synthesis task
- Poll until completion
- Download the audio and save it to `./article.mp3`
- Output the summary