text-to-speech

Text-to-Speech (HeyGen Starfish)

Generate speech audio files from text using HeyGen's in-house Starfish TTS model. This skill is for standalone audio generation — separate from video creation.

Authentication

All requests require the `X-Api-Key` header. Set the `HEYGEN_API_KEY` environment variable.

bash

curl -X GET "https://api.heygen.com/v1/audio/voices" \
  -H "X-Api-Key: $HEYGEN_API_KEY"

Tool Selection

If HeyGen MCP tools are available (`mcp__heygen__*`), prefer them over direct HTTP API calls.

Task	MCP Tool	Fallback (Direct API)
List TTS voices	`mcp__heygen__list_audio_voices`	`GET /v1/audio/voices`
Generate speech audio	`mcp__heygen__text_to_speech`	`POST /v1/audio/text_to_speech`

Default Workflow

1. List voices with `mcp__heygen__list_audio_voices` (or `GET /v1/audio/voices`).
2. Pick a voice matching the desired language, gender, and features.
3. Call `mcp__heygen__text_to_speech` (or `POST /v1/audio/text_to_speech`) with `text` and `voice_id`.
4. Use the returned `audio_url` to download or play the audio.
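
The voice-selection step in the workflow above can be sketched as a small pure function. This is only an illustration: `pick_voice` and the sample entries are hypothetical, and it assumes each voice is a dict with the `language` and `gender` fields shown in the voice list response.

```python
from typing import Optional


def pick_voice(
    voices: list[dict],
    language: str,
    gender: Optional[str] = None,
) -> Optional[dict]:
    """Return the first voice matching language (and gender, if given)."""
    wanted_language = language.lower()
    for voice in voices:
        # Case-insensitive substring match on the "language" field
        if wanted_language not in voice.get("language", "").lower():
            continue
        # Gender is only checked when the caller asked for one
        if gender is not None and voice.get("gender") != gender:
            continue
        return voice
    return None


# Sample entries shaped like the voice list response (IDs are placeholders)
sample_voices = [
    {"voice_id": "voice-a", "name": "A", "language": "English", "gender": "male"},
    {"voice_id": "voice-b", "name": "B", "language": "English", "gender": "female"},
    {"voice_id": "voice-c", "name": "C", "language": "Portuguese", "gender": "female"},
]

chosen = pick_voice(sample_voices, "english", gender="female")
print(chosen["voice_id"] if chosen else "no match")  # prints "voice-b"
```

Returning `None` instead of raising leaves it to the caller to decide whether to fall back to another language or stop.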

List TTS Voices

Retrieve voices compatible with the Starfish TTS model.

Note: This uses `GET /v1/audio/voices`, a different endpoint from the video voices API (`GET /v2/voices`). Not all video voices support Starfish TTS.

curl

bash

curl -X GET "https://api.heygen.com/v1/audio/voices" \
  -H "X-Api-Key: $HEYGEN_API_KEY"

TypeScript

typescript

interface TTSVoice {
  voice_id: string;
  language: string;
  gender: "female" | "male" | "unknown";
  name: string;
  preview_audio_url: string | null;
  support_pause: boolean;
  support_locale: boolean;
  type: string;
}

interface TTSVoicesResponse {
  error: null | string;
  data: {
    voices: TTSVoice[];
  };
}

async function listTTSVoices(): Promise<TTSVoice[]> {
  const response = await fetch("https://api.heygen.com/v1/audio/voices", {
    headers: { "X-Api-Key": process.env.HEYGEN_API_KEY! },
  });

  const json: TTSVoicesResponse = await response.json();

  if (json.error) {
    throw new Error(json.error);
  }

  return json.data.voices;
}

Python

python

import requests
import os

def list_tts_voices() -> list:
    response = requests.get(
        "https://api.heygen.com/v1/audio/voices",
        headers={"X-Api-Key": os.environ["HEYGEN_API_KEY"]}
    )

    data = response.json()
    if data.get("error"):
        raise Exception(data["error"])

    return data["data"]["voices"]

Response Format

json

{
  "error": null,
  "data": {
    "voices": [
      {
        "voice_id": "f38a635bee7a4d1f9b0a654a31d050d2",
        "name": "Chill Brian",
        "language": "English",
        "gender": "male",
        "preview_audio_url": "https://resource.heygen.ai/text_to_speech/WpSDQvmLGXEqXZVZQiVeg6.mp3",
        "support_pause": true,
        "support_locale": false,
        "type": "public"
      }
    ]
  }
}

Generate Speech Audio

Convert text to speech audio using a specified voice.

Endpoint

POST https://api.heygen.com/v1/audio/text_to_speech

Request Fields

Field	Type	Required	Description
`text`	string	Y	Text content to convert to speech
`voice_id`	string	Y	Voice ID from `GET /v1/audio/voices`
`speed`	number	N	Speech speed, 0.5-1.5 (default: 1)
`pitch`	integer	N	Voice pitch, -50 to 50 (default: 0)
`locale`	string	N	Accent/locale for multilingual voices (e.g., `en-US`, `pt-BR`)
`elevenlabs_settings`	object	N	Advanced settings for ElevenLabs voices

ElevenLabs Settings (optional)

Field	Type	Description
`model`	string	Model selection (`eleven_v3`, `eleven_turbo_v2_5`, etc.)
`similarity_boost`	number	Voice similarity, 0.0-1.0
`stability`	number	Output consistency, 0.0-1.0
`style`	number	Style intensity, 0.0-1.0
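
The ranges in the two tables above can be checked on the client before a request is sent. The sketch below is hypothetical: `build_tts_payload` is not part of any HeyGen SDK, the empty-string checks are an assumption, and how the API itself treats out-of-range values is not stated here.

```python
from typing import Optional


def build_tts_payload(
    text: str,
    voice_id: str,
    speed: float = 1.0,
    pitch: int = 0,
    elevenlabs_settings: Optional[dict] = None,
) -> dict:
    """Build a request body, rejecting values outside the documented ranges."""
    # Assumption: required fields must also be non-empty
    if not text:
        raise ValueError("text is required")
    if not voice_id:
        raise ValueError("voice_id is required")
    if not 0.5 <= speed <= 1.5:
        raise ValueError(f"speed must be within 0.5-1.5, got {speed}")
    if not -50 <= pitch <= 50:
        raise ValueError(f"pitch must be within -50 to 50, got {pitch}")

    payload = {"text": text, "voice_id": voice_id, "speed": speed, "pitch": pitch}

    if elevenlabs_settings:
        # The three numeric ElevenLabs settings share the 0.0-1.0 range
        for key in ("similarity_boost", "stability", "style"):
            value = elevenlabs_settings.get(key)
            if value is not None and not 0.0 <= value <= 1.0:
                raise ValueError(f"{key} must be within 0.0-1.0, got {value}")
        payload["elevenlabs_settings"] = elevenlabs_settings

    return payload


print(build_tts_payload("Hello!", "YOUR_VOICE_ID", speed=1.1))
```

Failing early with a `ValueError` keeps range mistakes out of the network round trip; clamping to the nearest valid value would be an equally reasonable choice.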

curl

bash

curl -X POST "https://api.heygen.com/v1/audio/text_to_speech" \
  -H "X-Api-Key: $HEYGEN_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Hello! Welcome to our product demo.",
    "voice_id": "YOUR_VOICE_ID",
    "speed": 1.0
  }'

TypeScript

typescript

interface TTSRequest {
  text: string;
  voice_id: string;
  speed?: number;
  pitch?: number;
  locale?: string;
  elevenlabs_settings?: {
    model?: string;
    similarity_boost?: number;
    stability?: number;
    style?: number;
  };
}

interface WordTimestamp {
  word: string;
  start: number;
  end: number;
}

interface TTSResponse {
  error: null | string;
  data: {
    audio_url: string;
    duration: number;
    request_id: string;
    word_timestamps: WordTimestamp[];
  };
}

async function textToSpeech(request: TTSRequest): Promise<TTSResponse["data"]> {
  const response = await fetch(
    "https://api.heygen.com/v1/audio/text_to_speech",
    {
      method: "POST",
      headers: {
        "X-Api-Key": process.env.HEYGEN_API_KEY!,
        "Content-Type": "application/json",
      },
      body: JSON.stringify(request),
    }
  );

  const json: TTSResponse = await response.json();

  if (json.error) {
    throw new Error(json.error);
  }

  return json.data;
}

Python

python

import requests
import os

def text_to_speech(
    text: str,
    voice_id: str,
    speed: float = 1.0,
    pitch: int = 0,
    locale: str | None = None,
) -> dict:
    payload = {
        "text": text,
        "voice_id": voice_id,
        "speed": speed,
        "pitch": pitch,
    }

    if locale:
        payload["locale"] = locale

    response = requests.post(
        "https://api.heygen.com/v1/audio/text_to_speech",
        headers={
            "X-Api-Key": os.environ["HEYGEN_API_KEY"],
            "Content-Type": "application/json",
        },
        json=payload,
    )

    data = response.json()
    if data.get("error"):
        raise Exception(data["error"])

    return data["data"]
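
Once `audio_url` comes back, saving the file locally might look like the sketch below. `download_audio` is a hypothetical helper, and it assumes the URL can be fetched directly without the `X-Api-Key` header, which is not confirmed here. It uses only the standard library.

```python
import shutil
import urllib.request
from pathlib import Path


def download_audio(audio_url: str, dest: str) -> Path:
    """Stream the file at audio_url to dest and return the written path."""
    target = Path(dest)
    # Create the output directory if it does not exist yet
    target.parent.mkdir(parents=True, exist_ok=True)
    with urllib.request.urlopen(audio_url, timeout=60) as response, target.open("wb") as out:
        # Copy in chunks so large files are not held in memory at once
        shutil.copyfileobj(response, out)
    return target
```

A call such as `download_audio(result["audio_url"], "out/welcome.wav")` would write the audio next to your script; the `.wav` extension follows the sample response URL and may differ for other outputs.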

Response Format

json

{
  "error": null,
  "data": {
    "audio_url": "https://resource2.heygen.ai/text_to_speech/.../id=365d46bb.wav",
    "duration": 5.526,
    "request_id": "p38QJ52hfgNlsYKZZmd9",
    "word_timestamps": [
      { "word": "<start>", "start": 0.0, "end": 0.0 },
      { "word": "Hey", "start": 0.079, "end": 0.219 },
      { "word": "there,", "start": 0.239, "end": 0.459 },
      { "word": "<end>", "start": 5.526, "end": 5.526 }
    ]
  }
}
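
As one way to use `word_timestamps`, the sketch below groups words into caption cues in SubRip (SRT) form. The SRT target and both function names are choices made for this illustration, not part of the API. It assumes `start` and `end` are in seconds (consistent with `duration` in the sample) and that the `<start>` and `<end>` entries are markers rather than spoken words.

```python
def format_srt_time(seconds: float) -> str:
    """Format seconds as an SRT timestamp, HH:MM:SS,mmm."""
    total_ms = round(seconds * 1000)
    hours, rem = divmod(total_ms, 3_600_000)
    minutes, rem = divmod(rem, 60_000)
    secs, ms = divmod(rem, 1000)
    return f"{hours:02d}:{minutes:02d}:{secs:02d},{ms:03d}"


def timestamps_to_srt(word_timestamps: list[dict], words_per_cue: int = 5) -> str:
    """Group words into fixed-size cues and render them as SRT text."""
    # Drop the "<start>" / "<end>" marker entries shown in the sample response
    words = [w for w in word_timestamps if w["word"] not in ("<start>", "<end>")]
    cues = []
    for i in range(0, len(words), words_per_cue):
        chunk = words[i : i + words_per_cue]
        text = " ".join(w["word"] for w in chunk)
        start = format_srt_time(chunk[0]["start"])
        end = format_srt_time(chunk[-1]["end"])
        cues.append(f"{len(cues) + 1}\n{start} --> {end}\n{text}\n")
    # Cues are separated by a blank line
    return "\n".join(cues)


# Entries copied from the sample response above
sample_timestamps = [
    {"word": "<start>", "start": 0.0, "end": 0.0},
    {"word": "Hey", "start": 0.079, "end": 0.219},
    {"word": "there,", "start": 0.239, "end": 0.459},
    {"word": "<end>", "start": 5.526, "end": 5.526},
]

print(timestamps_to_srt(sample_timestamps))
```

Fixed-size cues are the simplest grouping; splitting on punctuation or on gaps between words would likely read better for long passages.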

Usage Examples

Basic TTS

typescript

const result = await textToSpeech({
  text: "Welcome to our quarterly earnings call.",
  voice_id: "YOUR_VOICE_ID",
});

console.log(`Audio URL: ${result.audio_url}`);
console.log(`Duration: ${result.duration}s`);

With Speed Adjustment

typescript

const result = await textToSpeech({
  text: "We're thrilled to announce our newest feature!",
  voice_id: "YOUR_VOICE_ID",
  speed: 1.1,
});

With Locale for Multilingual Voices

typescript

const result = await textToSpeech({
  text: "Bem-vindo ao nosso produto.",
  voice_id: "MULTILINGUAL_VOICE_ID",
  locale: "pt-BR",
});

Find a Voice and Generate Audio

typescript

async function generateSpeech(text: string, language: string): Promise<string> {
  const voices = await listTTSVoices();
  const voice = voices.find(
    (v) => v.language.toLowerCase().includes(language.toLowerCase())
  );

  if (!voice) {
    throw new Error(`No TTS voice found for language: ${language}`);
  }

  const result = await textToSpeech({
    text,
    voice_id: voice.voice_id,
  });

  return result.audio_url;
}

const audioUrl = await generateSpeech("Hello and welcome!", "english");

Pauses with Break Tags

Use SSML-style break tags in your text for pauses:

word <break time="1s"/> word

Rules:

- Use seconds with an `s` suffix: `<break time="1.5s"/>`
- Put a space before and after the tag
- Use the self-closing tag format
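
The rules above can be applied mechanically when text is assembled from several segments. This is a minimal sketch with hypothetical helper names (`break_tag`, `join_with_pauses`); rejecting non-positive pause lengths is an assumption, since no limits are stated here.

```python
def break_tag(seconds: float) -> str:
    """Return a self-closing break tag with the duration in seconds."""
    if seconds <= 0:
        raise ValueError("pause length must be positive")
    # :g drops trailing zeros for typical pause lengths (1.0 -> "1", 1.5 -> "1.5")
    return f'<break time="{seconds:g}s"/>'


def join_with_pauses(segments: list[str], seconds: float) -> str:
    """Join text segments with a break tag, one space on each side of the tag."""
    tag = break_tag(seconds)
    # Strip each segment and skip empty ones so spacing around the tag stays exact
    return f" {tag} ".join(s.strip() for s in segments if s.strip())


print(join_with_pauses(["Hello.", " Welcome back. "], 1))
# prints: Hello. <break time="1s"/> Welcome back.
```

Voices report a `support_pause` flag in the voice list, so it is probably worth checking that flag before relying on break tags.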

Best Practices

- Use `GET /v1/audio/voices` to find compatible voices; not all voices from `GET /v2/voices` support Starfish TTS.
- Check `support_locale` before setting a `locale`; only multilingual voices support locale selection.
- Keep speed between 0.8 and 1.2 for natural-sounding output.
- Preview voices using the `preview_audio_url` before generating (it may be null for some voices).
- Use `word_timestamps` in the response for caption syncing or timed text overlays.
- Use SSML break tags in your text for pauses: `word <break time="1s"/> word`.
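
The `support_locale` check can be folded into the code that builds the request body. The sketch below uses a hypothetical `payload_for_voice` helper and assumes the voice dict carries the `support_locale` flag from the voice list response.

```python
from typing import Optional


def payload_for_voice(voice: dict, text: str, locale: Optional[str] = None) -> dict:
    """Build a request body, adding locale only for voices that support it."""
    payload = {"text": text, "voice_id": voice["voice_id"]}
    if locale is not None:
        # A missing flag is treated as "not supported"
        if not voice.get("support_locale", False):
            label = voice.get("name", voice["voice_id"])
            raise ValueError(f"voice {label!r} does not support locale selection")
        payload["locale"] = locale
    return payload


# Fields taken from the sample voice in the response format section
brian = {
    "voice_id": "f38a635bee7a4d1f9b0a654a31d050d2",
    "name": "Chill Brian",
    "support_locale": False,
}

print(payload_for_voice(brian, "Hello!"))
```

Raising here is a design choice; silently dropping the locale would also work, at the cost of hiding the mismatch from the caller.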
