listenhub-tts

ListenHub TTS: Text-to-Speech

Convert text to speech with the ListenHub OpenAPI. Three synthesis modes are supported, covering everything from short snippets to long-form text.

API Information

  • Base URL: https://api.marswave.ai/openapi
  • Authentication: Authorization: Bearer $LISTENHUB_API_KEY (read from the environment variable)
  • Prerequisite check: before calling any API, confirm that the LISTENHUB_API_KEY environment variable is set. If it is not, prompt the user to configure it.
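The prerequisite check can be sketched as a small shell guard. The function name `require_api_key` and the wording of the message are illustrative assumptions, not part of the ListenHub API.

```shell
# Hypothetical helper: succeeds when LISTENHUB_API_KEY is set and non-empty,
# otherwise prints a hint to stderr and returns 1.
require_api_key() {
  if [ -z "${LISTENHUB_API_KEY:-}" ]; then
    echo "LISTENHUB_API_KEY is not set. Export it before calling the API." >&2
    return 1
  fi
}
```

Calling `require_api_key || exit 1` at the top of a script stops it before any request is sent.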

Voice Selection Process

User Has Explicitly Specified a Voice

Use the speakerId the user specified and skip the selection process.

User Has Not Specified a Voice

  1. Call GET /v1/speakers/list?language=zh to retrieve the list of available voices.
  2. Present the list with AskUserQuestion so the user can choose, formatted as follows:
    • Preselect chat-girl-105-cn (Xiaoman dxqqq) as the default
    • Show each entry as {name} ({gender}, {speakerId})
    • Include each voice's demoAudioUrl for reference
  3. After the user confirms, use the selected speakerId.

Default Voice

Field      Value
speakerId  chat-girl-105-cn
Name       Xiaoman dxqqq

Three Synthesis Modes

Mode 1: Quick Synthesis (Short Text, Single Voice)

Use case: short text (< 1000 characters), single voice, low latency required
Endpoint:
POST /v1/tts
Request body:
json
{
  "text": "Text to synthesize",
  "speakerId": "chat-girl-105-cn",
  "format": "mp3",
  "sampleRate": 24000,
  "speed": 1.0
}
Parameter   Type    Required  Description
text        string  Yes       Text to synthesize
speakerId   string  Yes       Voice ID
format      string  No        Output format, default mp3
sampleRate  int     No        Sample rate, default 24000
speed       float   No        Speech speed, default 1.0, range 0.5 ~ 2.0
Response: returns the MP3 binary stream directly (Content-Type: audio/mpeg)
Call example:
bash
curl -X POST "https://api.marswave.ai/openapi/v1/tts" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"text": "你好世界", "speakerId": "chat-girl-105-cn"}' \
  -o output.mp3
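Because speed is documented with a range of 0.5 ~ 2.0, it may help to validate a user-supplied value before sending the request. The following is a minimal sketch; the function name `valid_speed` is hypothetical, and how the API itself reacts to out-of-range values is not stated in this document.

```shell
# Hypothetical helper: returns 0 when the argument is a plain decimal number
# within the documented 0.5 ~ 2.0 range, and 1 otherwise.
valid_speed() {
  awk -v s="$1" 'BEGIN {
    if (s !~ /^[0-9]+(\.[0-9]+)?$/) exit 1   # reject non-numeric input
    exit !(s + 0 >= 0.5 && s + 0 <= 2.0)     # exit status 0 means "in range"
  }'
}
```

For example, `valid_speed "$speed" || speed=1.0` falls back to the documented default when the input is unusable.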

Mode 2: Multi-role Script Synthesis

Use case: multi-role dialogues, podcasts, and audiobook clips in which different voices take turns reading
Endpoint:
POST /v1/speech
Request body:
json
{
  "script": [
    {
      "text": "Hello, welcome to this episode.",
      "speakerId": "chat-girl-105-cn"
    },
    {
      "text": "Thank you, today we'll talk about AI.",
      "speakerId": "chat-boy-101-cn"
    }
  ],
  "format": "mp3",
  "sampleRate": 24000
}
Parameter           Type    Required  Description
script              array   Yes       Script array; each item contains text and speakerId
script[].text       string  Yes       Text of this segment
script[].speakerId  string  Yes       Voice ID for this segment
format              string  No        Output format, default mp3
sampleRate          int     No        Sample rate, default 24000
Response: JSON
json
{
  "audioUrl": "https://cdn.example.com/output.mp3",
  "duration": 12.5
}
Call example:
bash
curl -X POST "https://api.marswave.ai/openapi/v1/speech" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "script": [
      {"text": "你好,欢迎收听。", "speakerId": "chat-girl-105-cn"},
      {"text": "谢谢,我们开始吧。", "speakerId": "chat-boy-101-cn"}
    ]
  }'
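Unlike Mode 1, this endpoint returns JSON rather than audio, so the file still has to be fetched from audioUrl. The sketch below shows one way to do that. The helper names `extract_audio_url` and `download_audio` are hypothetical, and the sed-based extraction is a simplification that assumes the URL contains no escaped characters; a real JSON parser such as jq would be more robust where available.

```shell
# Hypothetical helper: prints the first "audioUrl" value found in JSON on stdin.
extract_audio_url() {
  sed -n 's/.*"audioUrl"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: downloads the audio referenced by a JSON response body.
# $1 = response body, $2 = output path (defaults to ./output.mp3).
download_audio() {
  local response="$1" out="${2:-./output.mp3}" url
  url=$(printf '%s\n' "$response" | extract_audio_url)
  if [ -z "$url" ]; then
    echo "No audioUrl found in the response" >&2
    return 1
  fi
  curl -sS -L -o "$out" "$url"
}
```

A typical use is `download_audio "$(curl -sS ... /v1/speech ...)" ./dialogue.mp3`, where the first argument is the body returned by the call example above.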

Mode 3: Long Text Streaming Synthesis

Use case: long text (> 1000 characters), article reading, or content that needs AI polishing or segmented processing
Endpoint:
POST /v1/flow-speech/episodes
Request body:
json
{
  "title": "Article Title",
  "content": "Long text content...",
  "speakerId": "chat-girl-105-cn",
  "mode": "direct",
  "format": "mp3"
}
Parameter   Type    Required  Description
title       string  Yes       Audio title
content     string  No        Text content (provide either content or contentUrl)
contentUrl  string  No        Content URL (provide either content or contentUrl)
speakerId   string  Yes       Voice ID
mode        string  No        direct (direct synthesis) or aiPolish (AI polishing), default direct
format      string  No        Output format, default mp3
Response: JSON
json
{
  "episodeId": "ep_abc123",
  "status": "processing"
}
Poll for results:
bash
GET /v1/flow-speech/episodes/{episodeId}
Polling strategy:
  1. Wait 30 seconds after submission
  2. Then poll every 10 seconds
  3. Stop when status becomes completed or failed
Polling response:
json
{
  "episodeId": "ep_abc123",
  "status": "completed",
  "audioUrl": "https://cdn.example.com/output.mp3",
  "duration": 180.5
}
Status      Description
processing  Synthesis in progress; continue polling
completed   Synthesis completed; audioUrl is available
failed      Synthesis failed; check errorMessage
Call example:
bash
# Submit task
curl -X POST "https://api.marswave.ai/openapi/v1/flow-speech/episodes" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{ "title": "AI Technology Trends", "content": "Long text content...", "speakerId": "chat-girl-105-cn", "mode": "direct" }'

# Poll for results
curl "https://api.marswave.ai/openapi/v1/flow-speech/episodes/ep_abc123" \
  -H "Authorization: Bearer $LISTENHUB_API_KEY"
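The polling strategy for Mode 3 (wait 30 seconds, then poll every 10 seconds until status is completed or failed) can be sketched as a shell loop. The helper names and the INITIAL_WAIT, POLL_INTERVAL, and MAX_POLLS variables are illustrative assumptions; in particular, the upper bound on the number of polls is a safety net added here, not something the API documentation specifies. The sed-based status extraction is a simplification; jq would be more robust where available.

```shell
# Hypothetical helper: prints the first "status" value found in JSON on stdin.
extract_status() {
  sed -n 's/.*"status"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' | head -n 1
}

# Hypothetical helper: polls an episode until it reaches a terminal state.
# Prints the final JSON body on success; returns 1 on failure, 2 on timeout.
poll_episode() {
  local episode_id="$1" polls=0 body status
  sleep "${INITIAL_WAIT:-30}"                      # initial wait after submission
  while [ "$polls" -lt "${MAX_POLLS:-60}" ]; do    # assumed cap, not from the API docs
    body=$(curl -sS "https://api.marswave.ai/openapi/v1/flow-speech/episodes/$episode_id" \
      -H "Authorization: Bearer $LISTENHUB_API_KEY") || return 1
    status=$(printf '%s\n' "$body" | extract_status)
    case "$status" in
      completed) printf '%s\n' "$body"; return 0 ;;
      failed)    printf '%s\n' "$body" >&2; return 1 ;;
    esac
    polls=$((polls + 1))
    sleep "${POLL_INTERVAL:-10}"                   # interval between polls
  done
  echo "Timed out waiting for episode $episode_id" >&2
  return 2
}
```

On success the printed body contains audioUrl, which can then be downloaded to the output path.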

Voice List Query

Endpoint:
GET /v1/speakers/list
Query parameters:
Parameter  Type    Required  Description
language   string  No        Filter by language, e.g., zh (Chinese), en (English)
Response:
json
{
  "speakers": [
    {
      "name": "Xiaoman dxqqq",
      "speakerId": "chat-girl-105-cn",
      "demoAudioUrl": "https://cdn.example.com/demo.mp3",
      "gender": "female",
      "language": "zh"
    }
  ]
}
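A call to this endpoint might look like the sketch below, which follows the same curl conventions as the other examples in this document. The wrapper name `list_speakers` and its default of zh are assumptions made for illustration.

```shell
# Hypothetical helper: fetches the voice list for a language code (default: zh)
# and prints the raw JSON response.
list_speakers() {
  local language="${1:-zh}"
  curl -sS "https://api.marswave.ai/openapi/v1/speakers/list?language=$language" \
    -H "Authorization: Bearer $LISTENHUB_API_KEY"
}
```

Running `list_speakers en` would request only the English voices.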

Mode Selection Logic

Automatically select the most appropriate mode based on the user's input:
Condition                                          Mode
Text ≤ 1000 characters, single voice               Mode 1: /v1/tts
Multi-role script requiring different voices       Mode 2: /v1/speech
Text > 1000 characters, or AI polishing required   Mode 3: /v1/flow-speech/episodes
User provides a URL as the content source          Mode 3: /v1/flow-speech/episodes
If the user explicitly specifies a mode, that choice takes priority.
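The selection rules can be expressed as one small function. This is a sketch under stated assumptions: the function name, argument order, and the returned labels (tts, speech, flow-speech) are hypothetical, and checking for multiple voices before checking text length is a guess, since the document does not say which rule wins when a multi-role script is also long.

```shell
# Hypothetical helper.
# Usage: select_mode CHAR_COUNT VOICE_COUNT [NEEDS_POLISH] [CONTENT_IS_URL] [USER_MODE]
# NEEDS_POLISH and CONTENT_IS_URL are "yes" or "no"; USER_MODE is an explicit override.
select_mode() {
  local chars="$1" voices="$2" polish="${3:-no}" from_url="${4:-no}" user_mode="${5:-}"
  if [ -n "$user_mode" ]; then
    echo "$user_mode"        # an explicit user choice takes priority
  elif [ "$voices" -gt 1 ]; then
    echo speech              # Mode 2: /v1/speech (ordering is an assumption)
  elif [ "$from_url" = yes ] || [ "$polish" = yes ] || [ "$chars" -gt 1000 ]; then
    echo flow-speech         # Mode 3: /v1/flow-speech/episodes
  else
    echo tts                 # Mode 1: /v1/tts
  fi
}
```

For instance, `select_mode 1001 1` prints flow-speech, while `select_mode 1000 1` prints tts.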

User Interaction

Voice Selection

When the user has not specified a voice, use AskUserQuestion to display the voice list:
Please select a voice (default: Xiaoman dxqqq):
A. Xiaoman dxqqq (Female, chat-girl-105-cn) [Default]
B. [Other voice name] ([gender], [speakerId])
C. ...
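The option list above could be generated from the voice data along these lines. This is only a sketch: the function name and the pipe-separated input format (name|gender|speakerId, one voice per line) are assumptions, and lettering past Z is not handled.

```shell
# Hypothetical helper: reads "name|gender|speakerId" lines on stdin and prints
# a lettered menu, marking the voice whose speakerId equals $1 as the default.
format_voice_menu() {
  awk -F'|' -v def="$1" '{
    printf "%c. %s (%s, %s)%s\n", 64 + NR, $1, $2, $3, ($3 == def ? " [Default]" : "")
  }'
}
```

Piping `Xiaoman dxqqq|Female|chat-girl-105-cn` into `format_voice_menu chat-girl-105-cn` yields the first line of the menu shown above.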

Synthesis Parameters

Optionally ask the user about:
  • Speech speed (speed, default 1.0)
  • Output format (format, default mp3)
  • Long text mode: direct or aiPolish (default direct)
  • Output file path (default ./output.mp3)

Output

  1. Save the audio to the specified path (default ./output.mp3)
  2. Print a synthesis summary:
    • Mode used
    • Voice name and ID
    • Audio duration
    • File size
    • File path
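A summary with these five fields might be printed as follows. The function name, argument order, and label wording are illustrative assumptions. Since Mode 1 returns raw audio without a duration field, the caller would have to pass a placeholder such as "unknown" or measure the duration separately.

```shell
# Hypothetical helper.
# Usage: print_summary MODE VOICE_NAME SPEAKER_ID DURATION PATH
print_summary() {
  local mode="$1" voice_name="$2" speaker_id="$3" duration="$4" path="$5" size
  size=$(wc -c < "$path" | tr -d '[:space:]')   # file size in bytes
  printf 'Mode: %s\n' "$mode"
  printf 'Voice: %s (%s)\n' "$voice_name" "$speaker_id"
  printf 'Duration: %s\n' "$duration"
  printf 'File size: %s bytes\n' "$size"
  printf 'File path: %s\n' "$path"
}
```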

Error Handling

  • 401 Unauthorized: prompt the user to check the LISTENHUB_API_KEY environment variable
  • 400 Bad Request: check the request parameters and report the specific error to the user
  • flow-speech failed: report errorMessage and suggest that the user retry or switch modes
  • Network error: prompt the user to check the network connection and suggest retrying
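The HTTP-level cases above can be mapped to user-facing messages with a small dispatcher. This is a sketch: the function name and message wording are hypothetical, and treating 000 as a network error relies on curl printing 000 for `%{http_code}` when no HTTP response was received.

```shell
# Hypothetical helper: returns 0 for 2xx codes; otherwise prints a hint to
# stderr and returns 1.
explain_http_status() {
  case "$1" in
    2??) return 0 ;;
    401) echo "Unauthorized: check the LISTENHUB_API_KEY environment variable." >&2 ;;
    400) echo "Bad request: check the request parameters." >&2 ;;
    000) echo "Network error: check the network connection and retry." >&2 ;;
    *)   echo "Unexpected HTTP status: $1" >&2 ;;
  esac
  return 1
}
```

The status code can be captured by adding `-w '%{http_code}'` to a curl call that writes the body to a file with `-o`, then passing the captured value to `explain_http_status`.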

Complete Example

User input: "Convert this text to speech: The weather is nice today, perfect for a walk outside."
Execution flow:
  1. Check LISTENHUB_API_KEY
  2. Text length < 1000 characters, single voice → select Mode 1, /v1/tts
  3. User did not specify a voice → default to chat-girl-105-cn (Xiaoman)
  4. Call the API to synthesize
  5. Save to ./output.mp3
  6. Print the summary
User input: "Read this article with Xiaoman's voice: article.md"
Execution flow:
  1. Read the content of article.md
  2. Text length > 1000 characters → select Mode 3, /v1/flow-speech/episodes
  3. Voice specified: chat-girl-105-cn (Xiaoman)
  4. Submit the synthesis task
  5. Poll until completion
  6. Download the audio and save it to ./article.mp3
  7. Print the summary