streaming-tts-openai
Low-latency streaming text-to-speech via OpenAI TTS API — adaptive sentence chunking, concurrent fetch pipelining, six voices.
Source: framersai/agentos-skills
Install: `npx skill4agent add framersai/agentos-skills streaming-tts-openai`
# OpenAI Streaming TTS
Use this skill when the agent needs to synthesize speech from LLM output with low latency using OpenAI's TTS API. The provider buffers incoming tokens into sentence chunks before making API requests. Audio can therefore start playing as soon as the first sentence is synthesized, instead of waiting for the full response.
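The sentence buffering described above can be sketched as follows. This is a minimal TypeScript sketch under stated assumptions, not the provider's implementation:

- The class name `SentenceChunker`, its `push`/`flush` methods, and the boundary rule (`.`, `!`, or `?` followed by whitespace) are hypothetical.
- The skill's "adaptive" heuristics are not documented here, so this version uses only punctuation plus a `maxBufferMs` fallback timer.

```typescript
// Hypothetical sketch (not the provider's source): buffer streamed LLM tokens
// and emit sentence-sized chunks for TTS. A chunk is emitted when terminal
// punctuation is followed by whitespace, or when the fallback timer fires.

type ChunkHandler = (chunk: string) => void;

// Terminal punctuation, optional closing quotes/brackets, then whitespace.
const SENTENCE_END = /[.!?]+["')\]]*\s/;

class SentenceChunker {
  private buffer = "";
  private timer: ReturnType<typeof setTimeout> | null = null;
  private readonly onChunk: ChunkHandler;
  private readonly maxBufferMs: number;

  constructor(onChunk: ChunkHandler, maxBufferMs = 2000) {
    this.onChunk = onChunk;
    this.maxBufferMs = maxBufferMs;
  }

  /** Append a streamed token and emit every complete sentence in the buffer. */
  push(token: string): void {
    this.buffer += token;
    let match = SENTENCE_END.exec(this.buffer);
    while (match !== null) {
      const end = match.index + match[0].length;
      this.clearTimer();
      this.emit(this.buffer.slice(0, end));
      this.buffer = this.buffer.slice(end);
      match = SENTENCE_END.exec(this.buffer);
    }
    this.armTimer();
  }

  /** Emit whatever is left, e.g. when the LLM stream ends. */
  flush(): void {
    this.clearTimer();
    this.emit(this.buffer);
    this.buffer = "";
  }

  private emit(text: string): void {
    const chunk = text.trim();
    if (chunk.length > 0) this.onChunk(chunk);
  }

  // Start the fallback timer once per pending fragment. Later tokens do not
  // reset it, so a fragment without punctuation is flushed after maxBufferMs.
  private armTimer(): void {
    if (this.buffer.trim().length === 0) {
      this.clearTimer();
      return;
    }
    if (this.timer === null) {
      this.timer = setTimeout(() => {
        this.timer = null;
        this.flush();
      }, this.maxBufferMs);
    }
  }

  private clearTimer(): void {
    if (this.timer !== null) {
      clearTimeout(this.timer);
      this.timer = null;
    }
  }
}
```

Requiring whitespace after the punctuation keeps decimals such as `3.14` intact. The cost is that the final sentence is held until more text arrives, `flush()` is called, or the timer fires. Abbreviations like "Dr." will still split early.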
Prefer this provider when a single `OPENAI_API_KEY` should cover both the LLM and TTS, or when voice quality matters but ElevenLabs is not configured.

## Setup
Set `OPENAI_API_KEY` in the environment or the agent secrets store.

## Configuration
```json
{
  "voice": {
    "tts": "openai"
  }
}
```

With voice and model options:
```json
{
  "voice": {
    "tts": "openai",
    "providerOptions": {
      "model": "tts-1",
      "voice": "nova",
      "format": "opus",
      "maxBufferMs": 2000
    }
  }
}
```

## Provider Rules
- Prefer `tts-1` for real-time interactions; use `tts-1-hd` when audio quality matters more than latency.
- Default voice is `nova`. Available voices: `alloy`, `echo`, `fable`, `onyx`, `nova`, `shimmer`.
- The provider fetches the next sentence concurrently while the current one plays, minimizing gaps between chunks.
- All in-flight requests use `AbortController` for cancellation when the session is interrupted.
- Use `maxBufferMs` to tune the fallback flush timer for fragments without terminal punctuation.
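The model and voice rules above map onto one HTTP request per sentence chunk. The sketch below assumes the provider calls OpenAI's `POST /v1/audio/speech` endpoint, which accepts `model`, `voice`, `input`, and `response_format`. Several details are assumptions, not documented behavior of this skill:

- the helper names `buildSpeechRequest` and `synthesize`;
- the mapping of `format` to `response_format`;
- the `tts-1` fallback when `model` is unset.

```typescript
// Hypothetical sketch: build and send one OpenAI speech request per chunk.
// ProviderOptions mirrors the JSON config above; the mapping is assumed.

const VOICES = ["alloy", "echo", "fable", "onyx", "nova", "shimmer"] as const;
type Voice = (typeof VOICES)[number];

interface ProviderOptions {
  model?: "tts-1" | "tts-1-hd";
  voice?: string; // validated at runtime, since config arrives as JSON
  format?: "mp3" | "opus" | "aac" | "flac" | "wav" | "pcm";
  maxBufferMs?: number; // used by the chunker, not by the request
}

interface SpeechRequest {
  url: string;
  headers: Record<string, string>;
  body: string;
}

function isVoice(v: string): v is Voice {
  return (VOICES as readonly string[]).includes(v);
}

function buildSpeechRequest(text: string, opts: ProviderOptions, apiKey: string): SpeechRequest {
  const voice = opts.voice ?? "nova"; // documented default voice
  if (!isVoice(voice)) {
    throw new Error(`Unknown voice "${voice}"; expected one of ${VOICES.join(", ")}`);
  }
  return {
    url: "https://api.openai.com/v1/audio/speech",
    headers: {
      Authorization: `Bearer ${apiKey}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      model: opts.model ?? "tts-1", // assumed fallback: the low-latency model
      voice,
      input: text,
      // Omit response_format when unset so the API applies its own default.
      ...(opts.format !== undefined ? { response_format: opts.format } : {}),
    }),
  };
}

// Fetch one chunk's audio; the AbortSignal lets an interruption cancel it.
async function synthesize(
  text: string,
  opts: ProviderOptions,
  apiKey: string,
  signal?: AbortSignal,
): Promise<Uint8Array> {
  const req = buildSpeechRequest(text, opts, apiKey);
  const res = await fetch(req.url, { method: "POST", headers: req.headers, body: req.body, signal });
  if (!res.ok) {
    throw new Error(`OpenAI TTS request failed: ${res.status} ${await res.text()}`);
  }
  return new Uint8Array(await res.arrayBuffer());
}
```

Leaving `response_format` out when `format` is unset lets the API apply its own default (`mp3`). Setting `"format": "opus"` as in the configuration example yields smaller audio payloads for streaming playback.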
## Events
| Event | Description |
|---|---|
|  | Sentence chunk dispatched for synthesis |
|  | Synthesized audio buffer ready for playback |
|  | Synthesis complete for a sentence chunk |
|  | Session cancelled; remaining text not rendered |
|  | Synthesis request failed |
|  | Session fully terminated |
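The concurrent-fetch and cancellation rules, together with the lifecycle in the table, can be sketched as a one-chunk-ahead pipeline. This is a hypothetical TypeScript sketch. `PipelinedSpeaker` and the injected `Synthesize` and `Play` hooks are illustrative names. Comments point to the matching table rows by description only, since the provider's actual event names and internals are assumed.

```typescript
// Hypothetical sketch (not the provider's source): synthesize chunk i+1 while
// chunk i plays, and cancel everything through a single AbortController.

type Synthesize = (text: string, signal: AbortSignal) => Promise<Uint8Array>;
type Play = (audio: Uint8Array, signal: AbortSignal) => Promise<void>;

class PipelinedSpeaker {
  private readonly controller = new AbortController();
  private readonly synthesize: Synthesize;
  private readonly play: Play;

  constructor(synthesize: Synthesize, play: Play) {
    this.synthesize = synthesize;
    this.play = play;
  }

  async speak(chunks: readonly string[]): Promise<void> {
    const signal = this.controller.signal;
    // "Sentence chunk dispatched for synthesis" for the first chunk.
    let pending: Promise<Uint8Array> | null =
      chunks.length > 0 ? this.start(chunks[0], signal) : null;
    let i = 0;
    try {
      while (pending !== null) {
        const audio: Uint8Array = await pending; // "audio buffer ready for playback"
        if (signal.aborted) return;
        // Dispatch the next chunk before playing this one, so they overlap.
        pending = i + 1 < chunks.length ? this.start(chunks[i + 1], signal) : null;
        await this.play(audio, signal);
        i++;
      }
    } catch (err) {
      // An abort is the expected "session cancelled" path; any other error
      // is a genuine "synthesis request failed" and propagates.
      if (signal.aborted) return;
      throw err;
    }
  }

  /** Abort in-flight synthesis and playback; remaining text is not rendered. */
  cancel(): void {
    this.controller.abort();
  }

  private start(text: string, signal: AbortSignal): Promise<Uint8Array> {
    const p = this.synthesize(text, signal);
    // Attach a no-op handler so a rejection during cancellation is not
    // reported as unhandled; awaiting `p` later still observes the error.
    p.catch(() => {});
    return p;
  }
}
```

In a real session, `Synthesize` would wrap an HTTP call that forwards the `AbortSignal` to `fetch`. `Play` would stop audio output when the signal aborts. Keeping only one chunk in flight ahead of playback bounds concurrent requests while still hiding synthesis latency between sentences.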
## Examples
- "Use OpenAI TTS with the nova voice for this conversation."
- "Switch to the HD model for a podcast recording."
- "Start a voice session where OpenAI handles both LLM and speech synthesis."
## Constraints
- Requires `OPENAI_API_KEY`.
- Time to first audio is bounded by the first sentence chunk. A long fragment before the first terminal punctuation mark delays audio start until the punctuation arrives or the `maxBufferMs` fallback flush fires.
- `tts-1-hd` has higher latency than `tts-1`.