streaming-tts-openai

OpenAI Streaming TTS

Use this skill when the agent needs to synthesize speech from LLM output with low latency using OpenAI's TTS API. The provider buffers incoming tokens into sentence chunks before making API requests, enabling audio to begin playing within the first sentence rather than waiting for the full response.
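
The sentence buffering described above can be sketched roughly as follows. This is a minimal illustration, not the provider's actual code: the `SentenceChunker` class, its method names, and the choice of `.`, `!`, and `?` as terminal punctuation are all assumptions made for the example.

```typescript
// Hypothetical helper: accumulates LLM tokens and emits complete sentences.
class SentenceChunker {
  private buffer = "";

  // Append a token and return any sentences that are now complete.
  push(token: string): string[] {
    this.buffer += token;
    const sentences: string[] = [];
    // Assumption: a sentence ends at ., ! or ? followed by whitespace.
    const boundary = /[.!?]+\s+/g;
    let start = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.buffer)) !== null) {
      const end = match.index + match[0].length;
      sentences.push(this.buffer.slice(start, end).trim());
      start = end;
    }
    // Keep the unfinished tail for the next token.
    this.buffer = this.buffer.slice(start);
    return sentences;
  }

  // Return whatever text remains when the LLM stream ends.
  flush(): string[] {
    const rest = this.buffer.trim();
    this.buffer = "";
    return rest.length > 0 ? [rest] : [];
  }
}

const chunker = new SentenceChunker();
const chunks: string[] = [];
for (const token of ["Hello", " world", ".", " How", " are", " you", "?", " Fine"]) {
  chunks.push(...chunker.push(token));
}
chunks.push(...chunker.flush());
```

Each entry in `chunks` would become one synthesis request, so the first request can go out as soon as the first sentence boundary arrives.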

Prefer this provider when a single `OPENAI_API_KEY` should cover both LLM and TTS, or when voice quality is important but ElevenLabs is not configured.

Setup

Set `OPENAI_API_KEY` in the environment or agent secrets store.
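
A startup check for the key might look like the sketch below. The `resolveApiKey` function is hypothetical, and the `env` parameter stands in for whatever the agent actually reads from (for example `process.env` or a secrets-store lookup).

```typescript
// Hypothetical helper: fail fast when the key is missing or blank.
function resolveApiKey(env: Record<string, string | undefined>): string {
  const key = env["OPENAI_API_KEY"]?.trim();
  if (!key) {
    throw new Error("OPENAI_API_KEY is not set in the environment or agent secrets store");
  }
  return key;
}
```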

Configuration

```json
{
  "voice": {
    "tts": "openai"
  }
}
```

With voice and model options:

```json
{
  "voice": {
    "tts": "openai",
    "providerOptions": {
      "model": "tts-1",
      "voice": "nova",
      "format": "opus",
      "maxBufferMs": 2000
    }
  }
}
```
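
For readers wiring this up in code, the options block could be typed and validated along these lines. This is a sketch under assumptions: the type names and `resolveProviderOptions` are hypothetical, and apart from the `nova` voice default, the fallback values simply reuse the sample values from the JSON above rather than confirmed defaults. The set of supported `format` values is not listed in this document, so the sketch accepts any string.

```typescript
type TtsModel = "tts-1" | "tts-1-hd";
type TtsVoice = "alloy" | "echo" | "fable" | "onyx" | "nova" | "shimmer";

interface ProviderOptions {
  model: TtsModel;
  voice: TtsVoice;
  format: string;
  maxBufferMs: number;
}

const MODELS: readonly TtsModel[] = ["tts-1", "tts-1-hd"];
const VOICES: readonly TtsVoice[] = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"];

// Assumed fallbacks: only the `nova` voice default is stated in this document.
const ASSUMED_DEFAULTS: ProviderOptions = {
  model: "tts-1",
  voice: "nova",
  format: "opus",
  maxBufferMs: 2000,
};

// Narrow an unknown value to one of the allowed string literals.
function isOneOf<T extends string>(value: unknown, allowed: readonly T[]): value is T {
  return typeof value === "string" && (allowed as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical validator: merge user-supplied options over the assumed fallbacks.
function resolveProviderOptions(
  raw: { [K in keyof ProviderOptions]?: unknown } = {},
): ProviderOptions {
  const options: ProviderOptions = { ...ASSUMED_DEFAULTS };
  const { model, voice, format, maxBufferMs } = raw;
  if (model !== undefined) {
    if (!isOneOf(model, MODELS)) throw new Error(`Unsupported model: ${String(model)}`);
    options.model = model;
  }
  if (voice !== undefined) {
    if (!isOneOf(voice, VOICES)) throw new Error(`Unsupported voice: ${String(voice)}`);
    options.voice = voice;
  }
  if (format !== undefined) {
    if (typeof format !== "string") throw new Error("format must be a string");
    options.format = format;
  }
  if (maxBufferMs !== undefined) {
    if (typeof maxBufferMs !== "number" || !(maxBufferMs > 0)) {
      throw new Error("maxBufferMs must be a positive number");
    }
    options.maxBufferMs = maxBufferMs;
  }
  return options;
}
```

Calling `resolveProviderOptions({ voice: "onyx" })` keeps the other fields at their assumed fallbacks, while an unknown voice name is rejected before any request is made.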

Provider Rules

Prefer `tts-1` for real-time interactions; use `tts-1-hd` when audio quality matters more than latency.
The default voice is `nova`. Available voices: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`.
The provider fetches the next sentence concurrently while the current one plays to minimize gaps.
All in-flight requests use `AbortController` for cancellation when the session is interrupted.
Use `maxBufferMs` to tune the fallback flush timer for fragments without terminal punctuation.
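
The prefetch and cancellation rules can be illustrated with the sketch below. It is one possible arrangement, not the provider's implementation: `speakWithPrefetch`, `Synthesize`, and `Play` are hypothetical names, and the real synthesis and playback calls are passed in as functions so the control flow stays visible.

```typescript
// Hypothetical signatures for the synthesis request and the audio player.
type Synthesize = (text: string, signal: AbortSignal) => Promise<Uint8Array>;
type Play = (audio: Uint8Array) => Promise<void>;

// Plays sentences in order while fetching the next one during playback.
// Returns how many sentences finished playing before completion or abort.
async function speakWithPrefetch(
  sentences: string[],
  synthesize: Synthesize,
  play: Play,
  signal: AbortSignal,
): Promise<number> {
  let played = 0;
  const first = sentences[0];
  if (first === undefined) return played;

  let pending: Promise<Uint8Array> | null = synthesize(first, signal);
  for (let i = 0; pending !== null; i++) {
    let audio: Uint8Array;
    try {
      audio = await pending;
    } catch (err) {
      if (signal.aborted) return played; // interrupted: stop quietly
      throw err; // genuine synthesis failure
    }
    if (signal.aborted) return played;

    // Start the next request before playback so the two overlap.
    const nextText = sentences[i + 1];
    pending = nextText === undefined ? null : synthesize(nextText, signal);
    // Mark a possible rejection as handled; the await above still observes it.
    pending?.catch(() => undefined);

    await play(audio);
    played++;
  }
  return played;
}
```

Passing the same `AbortSignal` to every request means a single `controller.abort()` call cancels whatever is in flight, which matches the interruption rule above.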

Events

| Event | Description |
| --- | --- |
| `utterance_start` | Sentence chunk dispatched for synthesis |
| `audio_chunk` | Synthesized audio buffer ready for playback |
| `utterance_complete` | Synthesis complete for a sentence chunk |
| `cancelled` | Session cancelled; remaining text not rendered |
| `error` | Synthesis request failed |
| `close` | Session fully terminated |
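
A consumer of these events might model them as a discriminated union, as in the sketch below. Only the event names come from the table; the payload fields (`text`, `audio`, `error`) and the `summarize` helper are assumptions made for illustration.

```typescript
// Event names are from the table above; payload shapes are assumed.
type TtsEvent =
  | { type: "utterance_start"; text: string }
  | { type: "audio_chunk"; audio: Uint8Array }
  | { type: "utterance_complete"; text: string }
  | { type: "cancelled" }
  | { type: "error"; error: Error }
  | { type: "close" };

interface SessionSummary {
  utterances: number;
  audioBytes: number;
  cancelled: boolean;
  errors: string[];
  closed: boolean;
}

// Hypothetical reducer: folds a stream of events into a session summary.
function summarize(events: TtsEvent[]): SessionSummary {
  const summary: SessionSummary = {
    utterances: 0,
    audioBytes: 0,
    cancelled: false,
    errors: [],
    closed: false,
  };
  for (const event of events) {
    switch (event.type) {
      case "utterance_start":
        break; // nothing to record until audio arrives
      case "audio_chunk":
        summary.audioBytes += event.audio.byteLength;
        break;
      case "utterance_complete":
        summary.utterances++;
        break;
      case "cancelled":
        summary.cancelled = true;
        break;
      case "error":
        summary.errors.push(event.error.message);
        break;
      case "close":
        summary.closed = true;
        break;
    }
  }
  return summary;
}
```

Switching on `type` lets the compiler narrow each branch, so a handler cannot read `audio` from an event that does not carry it.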

Examples

"Use OpenAI TTS with the nova voice for this conversation."
"Switch to the HD model for a podcast recording."
"Start a voice session where OpenAI handles both LLM and speech synthesis."

Constraints

Requires `OPENAI_API_KEY`.
Latency is bounded by the first sentence chunk: a long fragment before the first punctuation mark delays audio start.
`tts-1-hd` has higher latency than `tts-1`.
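
To show how the `maxBufferMs` fallback limits the delay caused by a long unpunctuated fragment, here is a small sketch. The `TimedFragmentBuffer` class is hypothetical, and the current time is passed in explicitly so the logic can be followed without real timers; an actual provider would more likely drive this from a timer.

```typescript
// Hypothetical buffer: releases text once it has waited maxBufferMs.
class TimedFragmentBuffer {
  private text = "";
  private startedAt: number | null = null;
  private readonly maxBufferMs: number;

  constructor(maxBufferMs: number) {
    this.maxBufferMs = maxBufferMs;
  }

  // Add a token; the clock starts when the first token of a fragment arrives.
  push(token: string, nowMs: number): void {
    if (this.text.length === 0) this.startedAt = nowMs;
    this.text += token;
  }

  // Poll the buffer; returns the fragment once the deadline has passed.
  tick(nowMs: number): string | null {
    if (this.startedAt === null || nowMs - this.startedAt < this.maxBufferMs) {
      return null;
    }
    const fragment = this.text.trim();
    this.text = "";
    this.startedAt = null;
    return fragment.length > 0 ? fragment : null;
  }
}
```

With `maxBufferMs` set to 2000, a fragment that never reaches punctuation is sent for synthesis about two seconds after its first token, so a lower value trades sentence-level prosody for an earlier audio start.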
