listenhub-tts

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

ListenHub TTS: 文本转语音

ListenHub TTS: Text-to-Speech

使用 ListenHub OpenAPI 将文本转换为语音。支持三种合成模式，覆盖从短文本到长文本的全场景。

Convert text to speech using the ListenHub OpenAPI. Three synthesis modes are supported, covering all scenarios from short text to long text.

API 信息

API Information

Base URL:
```
https://api.marswave.ai/openapi
```

认证:

Authorization: Bearer $LISTENHUB_API_KEY

（从环境变量读取）

前置检查: 调用任何 API 前先确认
```
LISTENHUB_API_KEY
```
环境变量已设置，未设置则提示用户配置

Base URL:
```
https://api.marswave.ai/openapi
```
Authentication:
```
Authorization: Bearer $LISTENHUB_API_KEY
```
(read from environment variable)
Prerequisite Check: Before calling any API, confirm that the
```
LISTENHUB_API_KEY
```
environment variable is set. If not, prompt the user to configure it.

音色选择流程

Voice Selection Process

用户已明确指定音色

User Has Explicitly Specified a Voice

直接使用用户指定的 speakerId，跳过选择流程。

Directly use the speakerId specified by the user and skip the selection process.

用户未指定音色

User Has Not Specified a Voice

调用
```
GET /v1/speakers/list?language=zh
```
获取可用音色列表
按 AskUserQuestion 展示音色列表供用户选择，格式如下：
- 默认选中
```
chat-girl-105-cn
```
  （晓曼 dxqqq）
- 列表展示：
```
{name}（{gender}，{speakerId}）
```
- 附带每个音色的 demoAudioUrl 供参考
用户确认后使用选定的 speakerId

Call
```
GET /v1/speakers/list?language=zh
```
to retrieve the available voice list
Display the voice list for user selection using AskUserQuestion in the following format:
- ```
chat-girl-105-cn
```
  (Xiaoman dxqqq) is selected by default
- List display:
```
{name} ({gender}, {speakerId})
```
- Attach the demoAudioUrl of each voice for reference
Use the selected speakerId after user confirmation

默认音色

Default Voice

字段	值
speakerId	`chat-girl-105-cn`
名称	晓曼 dxqqq

Field	Value
speakerId	`chat-girl-105-cn`
Name	Xiaoman dxqqq

三种合成模式

Three Synthesis Modes

模式一：快速合成（短文本，单音色）

Mode 1: Quick Synthesis (Short Text, Single Voice)

适用场景： 短文本（< 1000 字），单音色，需要低延迟

接口：

POST /v1/tts

请求体：

json

{
  "text": "要合成的文本",
  "speakerId": "chat-girl-105-cn",
  "format": "mp3",
  "sampleRate": 24000,
  "speed": 1.0
}

参数	类型	必填	说明
text	string	是	要合成的文本
speakerId	string	是	音色 ID
format	string	否	输出格式，默认 `mp3`
sampleRate	int	否	采样率，默认 `24000`
speed	float	否	语速，默认 `1.0` ，范围 `0.5 ~ 2.0`

响应： 直接返回 MP3 二进制流（

Content-Type: audio/mpeg

）

调用示例：

bash

curl -X POST "https://api.marswave.ai/openapi/v1/tts" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "你好世界", "speakerId": "chat-girl-105-cn"}' \
  -o output.mp3

Applicable Scenarios: Short text (< 1000 words), single voice, low latency required

Endpoint:

POST /v1/tts

Request Body:

json

{
  "text": "Text to synthesize",
  "speakerId": "chat-girl-105-cn",
  "format": "mp3",
  "sampleRate": 24000,
  "speed": 1.0
}

Parameter	Type	Required	Description
text	string	Yes	Text to synthesize
speakerId	string	Yes	Voice ID
format	string	No	Output format, default is `mp3`
sampleRate	int	No	Sample rate, default is `24000`
speed	float	No	Speech speed, default is `1.0` , range `0.5 ~ 2.0`

Response: Directly returns MP3 binary stream (

Content-Type: audio/mpeg

)

Call Example:

bash

curl -X POST "https://api.marswave.ai/openapi/v1/tts" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "你好世界", "speakerId": "chat-girl-105-cn"}' \
  -o output.mp3

模式二：多角色脚本合成

Mode 2: Multi-role Script Synthesis

适用场景： 多角色对话、播客、有声书片段，需要不同音色交替朗读

接口：

POST /v1/speech

请求体：

json

{
  "script": [
    {
      "text": "你好，欢迎收听本期节目。",
      "speakerId": "chat-girl-105-cn"
    },
    {
      "text": "谢谢，今天我们来聊聊 AI。",
      "speakerId": "chat-boy-101-cn"
    }
  ],
  "format": "mp3",
  "sampleRate": 24000
}

参数	类型	必填	说明
script	array	是	脚本数组，每项包含 text 和 speakerId
script[].text	string	是	该段文本
script[].speakerId	string	是	该段的音色 ID
format	string	否	输出格式，默认 `mp3`
sampleRate	int	否	采样率，默认 `24000`

响应： JSON

json

{
  "audioUrl": "https://cdn.example.com/output.mp3",
  "duration": 12.5
}

调用示例：

bash

curl -X POST "https://api.marswave.ai/openapi/v1/speech" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "script": [
      {"text": "你好，欢迎收听。", "speakerId": "chat-girl-105-cn"},
      {"text": "谢谢，我们开始吧。", "speakerId": "chat-boy-101-cn"}
    ]
  }'

Applicable Scenarios: Multi-role dialogues, podcasts, audiobook clips, requiring alternating reading with different voices

Endpoint:

POST /v1/speech

Request Body:

json

{
  "script": [
    {
      "text": "Hello, welcome to this episode.",
      "speakerId": "chat-girl-105-cn"
    },
    {
      "text": "Thank you, today we'll talk about AI.",
      "speakerId": "chat-boy-101-cn"
    }
  ],
  "format": "mp3",
  "sampleRate": 24000
}

Parameter	Type	Required	Description
script	array	Yes	Script array, each item contains text and speakerId
script[].text	string	Yes	Text segment
script[].speakerId	string	Yes	Voice ID for this segment
format	string	No	Output format, default is `mp3`
sampleRate	int	No	Sample rate, default is `24000`

Response: JSON

json

{
  "audioUrl": "https://cdn.example.com/output.mp3",
  "duration": 12.5
}

Call Example:

bash

curl -X POST "https://api.marswave.ai/openapi/v1/speech" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "script": [
      {"text": "你好，欢迎收听。", "speakerId": "chat-girl-105-cn"},
      {"text": "谢谢，我们开始吧。", "speakerId": "chat-boy-101-cn"}
    ]
  }'

模式三：长文本流式合成

Mode 3: Long Text Streaming Synthesis

适用场景： 长文本（> 1000 字），文章朗读，需要 AI 润色或分段处理

接口：

POST /v1/flow-speech/episodes

请求体：

json

{
  "title": "文章标题",
  "content": "长文本内容...",
  "speakerId": "chat-girl-105-cn",
  "mode": "direct",
  "format": "mp3"
}

参数	类型	必填	说明
title	string	是	音频标题
content	string	否	文本内容（与 contentUrl 二选一）
contentUrl	string	否	内容 URL（与 content 二选一）
speakerId	string	是	音色 ID
mode	string	否	`direct` （直接合成）或 `aiPolish` （AI 润色），默认 `direct`
format	string	否	输出格式，默认 `mp3`

响应： JSON

json

{
  "episodeId": "ep_abc123",
  "status": "processing"
}

轮询获取结果：

bash

GET /v1/flow-speech/episodes/{episodeId}

轮询策略：

提交后等待 30 秒
之后每 10 秒轮询一次
直到 status 变为
```
completed
```
或
```
failed
```

轮询响应：

json

{
  "episodeId": "ep_abc123",
  "status": "completed",
  "audioUrl": "https://cdn.example.com/output.mp3",
  "duration": 180.5
}

status 值	说明
processing	合成中，继续轮询
completed	合成完成，audioUrl 可用
failed	合成失败，查看 errorMessage

调用示例：

bash

undefined

Applicable Scenarios: Long text (> 1000 words), article reading, requiring AI polishing or segment processing

Endpoint:

POST /v1/flow-speech/episodes

Request Body:

json

{
  "title": "Article Title",
  "content": "Long text content...",
  "speakerId": "chat-girl-105-cn",
  "mode": "direct",
  "format": "mp3"
}

Parameter	Type	Required	Description
title	string	Yes	Audio title
content	string	No	Text content (choose either content or contentUrl)
contentUrl	string	No	Content URL (choose either content or contentUrl)
speakerId	string	Yes	Voice ID
mode	string	No	`direct` (direct synthesis) or `aiPolish` (AI polishing), default is `direct`
format	string	No	Output format, default is `mp3`

Response: JSON

json

{
  "episodeId": "ep_abc123",
  "status": "processing"
}

Poll for Results:

bash

GET /v1/flow-speech/episodes/{episodeId}

Polling Strategy:

Wait 30 seconds after submission
Poll every 10 seconds afterwards
Until status becomes
```
completed
```
or
```
failed
```

Polling Response:

json

{
  "episodeId": "ep_abc123",
  "status": "completed",
  "audioUrl": "https://cdn.example.com/output.mp3",
  "duration": 180.5
}

Status Value	Description
processing	Synthesis in progress, continue polling
completed	Synthesis completed, audioUrl is available
failed	Synthesis failed, check errorMessage

Call Example:

bash

undefined

提交任务

Submit task

curl -X POST "https://api.marswave.ai/openapi/v1/flow-speech/episodes"
-H "Authorization: Bearer $LISTENHUB_API_KEY"
-H "Content-Type: application/json"
-d '{ "title": "AI 技术趋势", "content": "长文本内容...", "speakerId": "chat-girl-105-cn", "mode": "direct" }'

curl -X POST "https://api.marswave.ai/openapi/v1/flow-speech/episodes"
-H "Authorization: Bearer $LISTENHUB_API_KEY"
-H "Content-Type: application/json"
-d '{ "title": "AI Technology Trends", "content": "Long text content...", "speakerId": "chat-girl-105-cn", "mode": "direct" }'

轮询结果

Poll for results

curl "https://api.marswave.ai/openapi/v1/flow-speech/episodes/ep_abc123"
-H "Authorization: Bearer $LISTENHUB_API_KEY"

undefined

curl "https://api.marswave.ai/openapi/v1/flow-speech/episodes/ep_abc123"
-H "Authorization: Bearer $LISTENHUB_API_KEY"

undefined

音色列表查询

Voice List Query

接口：

GET /v1/speakers/list

查询参数：

参数	类型	必填	说明
language	string	否	筛选语言，如 `zh` （中文）、 `en` （英文）

响应：

json

{
  "speakers": [
    {
      "name": "晓曼 dxqqq",
      "speakerId": "chat-girl-105-cn",
      "demoAudioUrl": "https://cdn.example.com/demo.mp3",
      "gender": "female",
      "language": "zh"
    }
  ]
}

Endpoint:

GET /v1/speakers/list

Query Parameters:

Parameter	Type	Required	Description
language	string	No	Filter by language, e.g., `zh` (Chinese), `en` (English)

Response:

json

{
  "speakers": [
    {
      "name": "Xiaoman dxqqq",
      "speakerId": "chat-girl-105-cn",
      "demoAudioUrl": "https://cdn.example.com/demo.mp3",
      "gender": "female",
      "language": "zh"
    }
  ]
}

模式选择逻辑

Mode Selection Logic

根据用户输入自动选择最合适的模式：

条件	模式
文本 ≤ 1000 字，单音色	模式一： `/v1/tts`
多角色脚本，需要不同音色	模式二： `/v1/speech`
文本 > 1000 字，或需要 AI 润色	模式三： `/v1/flow-speech/episodes`
用户提供 URL 作为内容来源	模式三： `/v1/flow-speech/episodes`

如果用户明确指定模式，优先使用用户指定的模式。

Automatically select the most appropriate mode based on user input:

Condition	Mode
Text ≤ 1000 words, single voice	Mode 1: `/v1/tts`
Multi-role script, requiring different voices	Mode 2: `/v1/speech`
Text > 1000 words, or AI polishing required	Mode 3: `/v1/flow-speech/episodes`
User provides URL as content source	Mode 3: `/v1/flow-speech/episodes`

If the user explicitly specifies a mode, prioritize the user-specified mode.

用户交互

User Interaction

音色选择

Voice Selection

当用户未指定音色时，使用 AskUserQuestion 展示音色列表：

请选择音色（默认：晓曼 dxqqq）：
A. 晓曼 dxqqq（女，chat-girl-105-cn）[默认]
B. [其他音色名称]（[性别]，[speakerId]）
C. ...

When the user does not specify a voice, use AskUserQuestion to display the voice list:

Please select a voice (default: Xiaoman dxqqq):
A. Xiaoman dxqqq (Female, chat-girl-105-cn) [Default]
B. [Other Voice Name] ([Gender], [speakerId])
C. ...

合成参数

Synthesis Parameters

可选询问：

语速 speed（默认 1.0）
输出格式 format（默认 mp3）
长文本模式：direct 还是 aiPolish（默认 direct）
输出文件路径（默认
```
./output.mp3
```
）

Optional inquiries:

Speech speed (speed, default 1.0)
Output format (format, default mp3)
Long text mode: direct or aiPolish (default direct)
Output file path (default
```
./output.mp3
```
)

输出

Output

将音频保存到指定路径（默认
```
./output.mp3
```
）
输出合成摘要：
- 使用的模式
- 音色名称和 ID
- 音频时长
- 文件大小
- 文件路径

Save the audio to the specified path (default
```
./output.mp3
```
)
Output synthesis summary:
- Mode used
- Voice name and ID
- Audio duration
- File size
- File path

错误处理

Error Handling

401 Unauthorized: 提示用户检查
```
LISTENHUB_API_KEY
```
环境变量
400 Bad Request: 检查请求参数，向用户报告具体错误
flow-speech failed: 报告 errorMessage，建议用户重试或切换模式
网络错误: 提示检查网络连接，建议重试

401 Unauthorized: Prompt the user to check the
```
LISTENHUB_API_KEY
```
environment variable
400 Bad Request: Check request parameters and report specific errors to the user
flow-speech failed: Report errorMessage, suggest the user retry or switch modes
Network Error: Prompt to check network connection, suggest retrying

完整示例

Complete Example

用户输入： "把这段文字转成语音：今天天气真好，适合出去散步。"

执行流程：

检查
```
LISTENHUB_API_KEY
```
✓
文本长度 < 1000 字，单音色 → 选择模式一
```
/v1/tts
```
用户未指定音色 → 默认使用
```
chat-girl-105-cn
```
（晓曼）
调用 API 合成
保存到
```
./output.mp3
```
输出摘要

用户输入： "用晓曼的声音朗读这篇文章：article.md"

执行流程：

读取
```
article.md
```
内容
检查文本长度 > 1000 字 → 选择模式三
```
/v1/flow-speech/episodes
```
音色已指定：
```
chat-girl-105-cn
```
（晓曼）
提交合成任务
轮询直到完成
下载音频保存到
```
./article.mp3
```
输出摘要

User Input: "Convert this text to speech: The weather is nice today, perfect for a walk outside."

Execution Process:

Check
```
LISTENHUB_API_KEY
```
✓
Text length < 1000 words, single voice → Select Mode 1
```
/v1/tts
```
User did not specify a voice → Default to
```
chat-girl-105-cn
```
(Xiaoman)
Call API for synthesis
Save to
```
./output.mp3
```
Output summary

User Input: "Read this article with Xiaoman's voice: article.md"

Execution Process:

Read content of
```
article.md
```
Check text length > 1000 words → Select Mode 3
```
/v1/flow-speech/episodes
```
Voice specified:
```
chat-girl-105-cn
```
(Xiaoman)
Submit synthesis task
Poll until completion
Download audio and save to
```
./article.mp3
```
Output summary