streaming-tts-openai
OpenAI Streaming TTS
Use this skill when the agent needs to synthesize speech from LLM output with low latency using OpenAI's TTS API. The provider buffers incoming tokens into sentence chunks before making API requests, so audio can begin playing as soon as the first sentence is synthesized rather than after the full response has arrived.

Prefer this provider when a single `OPENAI_API_KEY` should cover both LLM and TTS, or when voice quality matters but ElevenLabs is not configured.
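A minimal sketch of the sentence buffering described above, assuming a caller-supplied millisecond clock instead of a real timer so the logic stays deterministic. The class `SentenceBuffer`, its `push`/`poll`/`flush` methods, and the punctuation regex are hypothetical; `maxBufferMs` plays the role of the fallback flush option shown in the configuration, and the 2000 ms default simply mirrors that example rather than a documented default.

```typescript
// Sketch only: groups streamed LLM tokens into sentence chunks for TTS.
// A chunk is emitted when the buffer ends in terminal punctuation (optionally
// followed by a closing quote or bracket), or, via poll(), when buffered text
// has waited at least maxBufferMs. The caller supplies timestamps in ms.
const TERMINAL_PUNCTUATION = /[.!?]["')\]]*\s*$/;

class SentenceBuffer {
  private text = "";
  private startedAt: number | null = null;
  private readonly maxBufferMs: number;

  // 2000 mirrors the config example; it is an assumed default.
  constructor(maxBufferMs = 2000) {
    this.maxBufferMs = maxBufferMs;
  }

  // Append one token; returns a finished sentence chunk, or null if none is ready.
  push(token: string, nowMs: number): string | null {
    if (this.text === "") this.startedAt = nowMs;
    this.text += token;
    return TERMINAL_PUNCTUATION.test(this.text) ? this.flush() : null;
  }

  // Fallback flush for fragments with no terminal punctuation (call on a timer).
  poll(nowMs: number): string | null {
    if (this.startedAt === null || nowMs - this.startedAt < this.maxBufferMs) return null;
    return this.flush();
  }

  // Emit whatever is buffered, e.g. when the LLM stream ends.
  flush(): string | null {
    const chunk = this.text.trim();
    this.text = "";
    this.startedAt = null;
    return chunk === "" ? null : chunk;
  }
}
```

Ending a chunk on the last character is deliberately naive: an abbreviation such as `Dr.`, or a number streamed as `3.` followed by `5`, would be flushed early, so a real chunker needs extra heuristics.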
Setup
Set `OPENAI_API_KEY` in the environment or the agent secrets store.
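The key is sent as a bearer token on each synthesis request. Below is a minimal sketch of building one request for OpenAI's `POST /v1/audio/speech` endpoint from `providerOptions`-style fields. The function `buildSpeechRequest` is hypothetical, mapping `format` onto the API's `response_format` field is an assumption, and the `tts-1` and `opus` fallbacks are illustrative; only the `nova` default voice is stated in this document.

```typescript
// Sketch only: builds one request for OpenAI's POST /v1/audio/speech endpoint.
// Assumptions: `format` maps to the API's `response_format`, and the "tts-1" and
// "opus" fallbacks are illustrative; "nova" is the default voice stated in the docs.
interface ProviderOptions {
  model?: string;
  voice?: string;
  format?: string;
}

interface SpeechRequest {
  url: string;
  method: "POST";
  headers: Record<string, string>;
  body: string;
}

// Voice names as listed under Provider Rules.
const SUPPORTED_VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"];

function buildSpeechRequest(
  apiKey: string | undefined,
  text: string,
  opts: ProviderOptions = {},
): SpeechRequest {
  // Fail fast if the key was never set in the environment or secrets store.
  if (!apiKey) throw new Error("OPENAI_API_KEY is not set");
  const voice = opts.voice ?? "nova";
  if (!SUPPORTED_VOICES.includes(voice)) throw new Error(`Unsupported voice: ${voice}`);
  return {
    url: "https://api.openai.com/v1/audio/speech",
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: opts.model ?? "tts-1",
      input: text,
      voice,
      response_format: opts.format ?? "opus",
    }),
  };
}
```

The returned fields can be passed to `fetch(url, { method, headers, body, signal })`, with `signal` taken from the session's `AbortController` so an interruption cancels the request.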
Configuration
```json
{
  "voice": {
    "tts": "openai"
  }
}
```

With voice and model options:

```json
{
  "voice": {
    "tts": "openai",
    "providerOptions": {
      "model": "tts-1",
      "voice": "nova",
      "format": "opus",
      "maxBufferMs": 2000
    }
  }
}
```
Provider Rules
- Prefer `tts-1` for real-time interactions; use `tts-1-hd` when audio quality matters more than latency.
- Default voice is `nova`. Available voices: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`.
- The provider fetches the next sentence concurrently while the current one plays to minimize gaps.
- All in-flight requests use `AbortController` for cancellation when the session is interrupted.
- Use `maxBufferMs` to tune the fallback flush timer for fragments without terminal punctuation.
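The prefetch and cancellation rules above can be sketched as a loop that requests chunk i + 1 before playing chunk i and stops once the session's `AbortSignal` fires. This is an illustration under assumptions, not the provider's code: `speakChunks`, `synthesize`, and `play` are hypothetical names, where `synthesize` stands in for the HTTP call (passing the signal through so `fetch` can abort it) and `play` stands in for the audio sink.

```typescript
// Sketch only: plays sentence chunks in order while prefetching the next one.
// `synthesize` stands in for the TTS HTTP call and should pass `signal` to fetch
// so interruption aborts in-flight requests; `play` stands in for audio output.
type Synthesize = (text: string, signal: AbortSignal) => Promise<Uint8Array>;
type Play = (audio: Uint8Array) => Promise<void>;

// Returns how many chunks were played before completion or interruption.
async function speakChunks(
  chunks: string[],
  synthesize: Synthesize,
  play: Play,
  signal: AbortSignal,
): Promise<number> {
  if (chunks.length === 0 || signal.aborted) return 0;
  let pending = synthesize(chunks[0], signal);
  let played = 0;
  for (let i = 0; i < chunks.length; i++) {
    const audio = await pending;
    if (signal.aborted) break;
    const hasNext = i + 1 < chunks.length;
    // Prefetch: request chunk i + 1 before chunk i starts playing.
    if (hasNext) pending = synthesize(chunks[i + 1], signal);
    await play(audio);
    played++;
    if (signal.aborted) {
      // The in-flight prefetch is expected to reject once aborted; ignore it.
      if (hasNext) pending.catch(() => undefined);
      break;
    }
  }
  return played;
}
```

Because the prefetch starts before `play` is awaited, the network round trip for the next sentence overlaps playback of the current one; on interruption the loop stops and the pending prefetch's rejection is deliberately swallowed.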
Events
| Event | Description |
|---|---|
| | Sentence chunk dispatched for synthesis |
| | Synthesized audio buffer ready for playback |
| | Synthesis complete for a sentence chunk |
| | Session cancelled; remaining text not rendered |
| | Synthesis request failed |
| | Session fully terminated |
Examples
- "Use OpenAI TTS with the nova voice for this conversation."
- "Switch to the HD model for a podcast recording."
- "Start a voice session where OpenAI handles both LLM and speech synthesis."
Constraints
- Requires `OPENAI_API_KEY`.
- Latency is bounded by the first sentence chunk: a long fragment before the first punctuation mark delays the start of audio.
- `tts-1-hd` has higher latency than `tts-1`.