streaming-tts-openai

OpenAI Streaming TTS

Use this skill when the agent needs to synthesize speech from LLM output with low latency using OpenAI's TTS API. The provider buffers incoming tokens into sentence chunks before making API requests, so audio can begin playing as soon as the first sentence is complete rather than after the full response.

Prefer this provider when a single `OPENAI_API_KEY` should cover both LLM and TTS, or when voice quality is important but ElevenLabs is not configured.
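
The sentence buffering described above can be sketched as follows. This is a minimal illustration, not the provider's actual code: the `SentenceChunker` class, its `onChunk` callback, and the exact boundary rule (terminal punctuation followed by whitespace, or CJK terminal punctuation) are all assumptions. The timeout mirrors the `maxBufferMs` option from the configuration section, on the assumption that the timer restarts with each token.

```typescript
// Hypothetical sketch: class name, callback, and boundary rule are assumptions.
type ChunkHandler = (sentence: string) => void;

class SentenceChunker {
  private buffer = "";
  private timer: ReturnType<typeof setTimeout> | undefined;
  private onChunk: ChunkHandler;
  private maxBufferMs: number;

  constructor(onChunk: ChunkHandler, maxBufferMs = 2000) {
    this.onChunk = onChunk;
    this.maxBufferMs = maxBufferMs;
  }

  /** Append one streamed token and emit every sentence it completes. */
  push(token: string): void {
    this.buffer += token;
    // A sentence ends at Latin terminal punctuation followed by whitespace,
    // or at CJK terminal punctuation (which is not followed by a space).
    const boundary = /[.!?]+\s+|[。!?]+/g;
    let start = 0;
    let match: RegExpExecArray | null;
    while ((match = boundary.exec(this.buffer)) !== null) {
      const end = match.index + match[0].length;
      this.emit(this.buffer.slice(start, end));
      start = end;
    }
    this.buffer = this.buffer.slice(start);
    this.resetTimer();
  }

  /** Emit whatever is left, for example when the LLM stream ends. */
  flush(): void {
    this.clearTimer();
    const rest = this.buffer;
    this.buffer = "";
    this.emit(rest);
  }

  private emit(text: string): void {
    const sentence = text.trim();
    if (sentence.length > 0) this.onChunk(sentence);
  }

  // Fallback: flush an unpunctuated fragment after maxBufferMs of silence.
  private resetTimer(): void {
    this.clearTimer();
    if (this.buffer.trim().length > 0) {
      this.timer = setTimeout(() => this.flush(), this.maxBufferMs);
    }
  }

  private clearTimer(): void {
    if (this.timer !== undefined) {
      clearTimeout(this.timer);
      this.timer = undefined;
    }
  }
}
```

Requiring whitespace after Latin punctuation keeps a token such as `3.14` from being split mid-number, at the cost of holding the final sentence until the next token arrives or `flush()` is called.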

Setup

Set `OPENAI_API_KEY` in the environment or agent secrets store.

Configuration

```json
{
  "voice": {
    "tts": "openai"
  }
}
```

With voice and model options:

```json
{
  "voice": {
    "tts": "openai",
    "providerOptions": {
      "model": "tts-1",
      "voice": "nova",
      "format": "opus",
      "maxBufferMs": 2000
    }
  }
}
```
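
For readers wiring this configuration into code, the `providerOptions` object could be typed and merged with defaults roughly as shown here. The `ProviderOptions` interface and `resolveOptions` function are hypothetical names, and every default except the `nova` voice is an assumption taken from the example values above rather than from documented behavior.

```typescript
// Hypothetical types mirroring the JSON config; not the provider's real API.
type OpenAIModel = "tts-1" | "tts-1-hd";
type OpenAIVoice = "alloy" | "echo" | "fable" | "onyx" | "nova" | "shimmer";

interface ProviderOptions {
  model: OpenAIModel;
  voice: OpenAIVoice;
  format: string;
  maxBufferMs: number;
}

const KNOWN_VOICES: string[] = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"];

const DEFAULT_OPTIONS: ProviderOptions = {
  model: "tts-1",    // assumed default: the low-latency model
  voice: "nova",     // documented default voice
  format: "opus",    // assumed default, copied from the example config
  maxBufferMs: 2000, // assumed default, copied from the example config
};

/** Merge user overrides onto the defaults and validate at runtime,
 *  since values parsed from JSON are not checked by the type system. */
function resolveOptions(overrides: Partial<ProviderOptions> = {}): ProviderOptions {
  const merged: ProviderOptions = { ...DEFAULT_OPTIONS, ...overrides };
  if (KNOWN_VOICES.indexOf(merged.voice) === -1) {
    throw new Error("Unknown voice: " + merged.voice);
  }
  if (!(merged.maxBufferMs > 0)) {
    throw new Error("maxBufferMs must be a positive number");
  }
  return merged;
}
```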

Provider Rules

  • Prefer `tts-1` for real-time interactions; use `tts-1-hd` when audio quality matters more than latency.
  • Default voice is `nova`. Available voices: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`.
  • The provider fetches the next sentence concurrently while the current one plays to minimize gaps.
  • All in-flight requests use `AbortController` for cancellation when the session is interrupted.
  • Use `maxBufferMs` to tune the fallback flush timer for fragments without terminal punctuation.
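
The prefetch and cancellation rules above might look like the sketch below. The `speakSentences` function and the injected `Synthesize` and `Play` callbacks are hypothetical; only the `https://api.openai.com/v1/audio/speech` endpoint and its `model`, `voice`, `input`, and `response_format` fields come from OpenAI's public API. The real provider may be structured differently, for instance by using the official SDK.

```typescript
// Hypothetical sketch: function names and callback shapes are assumptions.
type Synthesize = (text: string, signal: AbortSignal) => Promise<Uint8Array>;
type Play = (audio: Uint8Array, signal: AbortSignal) => Promise<void>;

/** One TTS request; aborting the signal cancels the in-flight fetch. */
async function synthesizeWithOpenAI(
  text: string,
  signal: AbortSignal,
  apiKey: string,
  model = "tts-1",
  voice = "nova",
): Promise<Uint8Array> {
  const res = await fetch("https://api.openai.com/v1/audio/speech", {
    method: "POST",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({ model, voice, input: text, response_format: "opus" }),
    signal,
  });
  if (!res.ok) throw new Error(`TTS request failed: ${res.status}`);
  return new Uint8Array(await res.arrayBuffer());
}

/** Play sentences in order, requesting sentence i+1 while sentence i plays.
 *  Returns how many sentences played without being interrupted. */
async function speakSentences(
  sentences: string[],
  synthesize: Synthesize,
  play: Play,
  controller: AbortController,
): Promise<number> {
  const signal = controller.signal;
  let played = 0;
  if (sentences.length === 0) return played;
  try {
    let pending = synthesize(sentences[0], signal);
    for (let i = 0; i < sentences.length; i++) {
      const audio = await pending;
      if (signal.aborted) break;
      if (i + 1 < sentences.length) {
        // Start the next request before playback so the two overlap.
        pending = synthesize(sentences[i + 1], signal);
        pending.catch(() => {}); // avoid an unhandled rejection if we stop early
      }
      await play(audio, signal);
      if (signal.aborted) break;
      played++;
    }
  } catch (err) {
    // A rejection caused by our own abort is expected; anything else is a real failure.
    if (!signal.aborted) throw err;
  }
  return played;
}
```

Calling `controller.abort()` from anywhere (a barge-in handler, for example) stops both the prefetched request and the playback loop, because every call shares the same signal.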

Events

| Event | Description |
| --- | --- |
| `utterance_start` | Sentence chunk dispatched for synthesis |
| `audio_chunk` | Synthesized audio buffer ready for playback |
| `utterance_complete` | Synthesis complete for a sentence chunk |
| `cancelled` | Session cancelled; remaining text not rendered |
| `error` | Synthesis request failed |
| `close` | Session fully terminated |
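
One way a consumer might track a session from these events is a small reducer over a discriminated union. Only the event names come from the list above; the payload fields (`text`, `audio`, `error`), the `SessionState` shape, and the `applyEvent` function are assumptions made for illustration.

```typescript
// Hypothetical payload shapes; only the event names are taken from the document.
type TtsEvent =
  | { type: "utterance_start"; text: string }
  | { type: "audio_chunk"; audio: Uint8Array }
  | { type: "utterance_complete"; text: string }
  | { type: "cancelled" }
  | { type: "error"; error: Error }
  | { type: "close" };

interface SessionState {
  inFlight: number;   // sentence chunks dispatched but not yet complete
  audioBytes: number; // total synthesized bytes received so far
  status: "active" | "cancelled" | "closed";
  lastError?: string;
}

const initialState: SessionState = { inFlight: 0, audioBytes: 0, status: "active" };

/** Pure state transition: returns a new state for each incoming event. */
function applyEvent(state: SessionState, event: TtsEvent): SessionState {
  switch (event.type) {
    case "utterance_start":
      return { ...state, inFlight: state.inFlight + 1 };
    case "audio_chunk":
      return { ...state, audioBytes: state.audioBytes + event.audio.byteLength };
    case "utterance_complete":
      return { ...state, inFlight: Math.max(0, state.inFlight - 1) };
    case "cancelled":
      // Remaining text is not rendered, so nothing is left in flight.
      return { ...state, inFlight: 0, status: "cancelled" };
    case "error":
      return { ...state, lastError: event.error.message };
    case "close":
      return { ...state, status: "closed" };
  }
}
```

Keeping the transition pure makes it easy to replay a recorded event log when debugging gaps or dropped audio.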

Examples

  • "Use OpenAI TTS with the nova voice for this conversation."
  • "Switch to the HD model for a podcast recording."
  • "Start a voice session where OpenAI handles both LLM and speech synthesis."

Constraints

  • Requires `OPENAI_API_KEY`.
  • Time to first audio is bounded by the first sentence chunk: a long run of text before the first punctuation mark delays audio start.
  • `tts-1-hd` has higher latency than `tts-1`.