Together Audio (TTS & STT)
Overview
Together AI provides text-to-speech and speech-to-text capabilities.
TTS — Generate speech from text via REST, streaming, or WebSocket:
- Endpoint: `/v1/audio/speech`
- WebSocket: `wss://api.together.xyz/v1/audio/speech/websocket`

STT — Transcribe audio to text:
- Endpoint: `/v1/audio/transcriptions`
TTS Quick Start
Basic Speech Generation
```python
from together import Together

client = Together()

response = client.audio.speech.create(
    model="canopylabs/orpheus-3b-0.1-ft",
    input="Today is a wonderful day to build something people love!",
    voice="tara",
    response_format="mp3",
)
response.stream_to_file("speech.mp3")
```

```typescript
import Together from "together-ai";
import { Readable } from "stream";
import { createWriteStream } from "fs";

const together = new Together();

async function generateAudio() {
  const res = await together.audio.create({
    input: "Today is a wonderful day to build something people love!",
    voice: "tara",
    response_format: "mp3",
    sample_rate: 44100,
    stream: false,
    model: "canopylabs/orpheus-3b-0.1-ft",
  });
  if (res.body) {
    const nodeStream = Readable.from(res.body as ReadableStream);
    const fileStream = createWriteStream("./speech.mp3");
    nodeStream.pipe(fileStream);
  }
}

generateAudio();
```

```shell
curl -X POST "https://api.together.xyz/v1/audio/speech" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"model":"canopylabs/orpheus-3b-0.1-ft","input":"Hello world","voice":"tara","response_format":"mp3"}' \
  --output speech.mp3
```
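TTS is billed per input character (see the pricing table below), and very long inputs are easier to manage if you synthesize them in pieces, one `audio.speech.create` call per chunk. A minimal greedy chunker sketch; the 500-character default is an arbitrary assumption, not a documented API limit:

```python
def chunk_text(text: str, max_chars: int = 500) -> list[str]:
    """Greedily pack whole words into chunks of at most max_chars characters."""
    chunks: list[str] = []
    current: list[str] = []
    length = 0
    for word in text.split():
        # +1 accounts for the joining space when the chunk is non-empty
        extra = len(word) + (1 if current else 0)
        if current and length + extra > max_chars:
            chunks.append(" ".join(current))
            current, length = [], 0
            extra = len(word)
        current.append(word)
        length += extra
    if current:
        chunks.append(" ".join(current))
    return chunks
```

Splitting on word boundaries keeps each request's text intelligible; for the smoothest prosody you may prefer to split on sentence boundaries instead.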
Streaming Audio (Low Latency)
```python
response = client.audio.speech.create(
    model="canopylabs/orpheus-3b-0.1-ft",
    input="The quick brown fox jumps over the lazy dog",
    voice="tara",
    stream=True,
    response_format="raw",
    response_encoding="pcm_s16le",
)
response.stream_to_file("speech.wav", response_format="wav")
```

```typescript
import Together from "together-ai";

const together = new Together();

async function streamAudio() {
  const response = await together.audio.speech.create({
    model: "canopylabs/orpheus-3b-0.1-ft",
    input: "The quick brown fox jumps over the lazy dog",
    voice: "tara",
    stream: true,
    response_format: "raw",
    response_encoding: "pcm_s16le",
  });
  const chunks = [];
  for await (const chunk of response) {
    chunks.push(chunk);
  }
  console.log("Streaming complete!");
}

streamAudio();
```
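If you consume the raw `pcm_s16le` chunks yourself rather than calling `stream_to_file`, the accumulated bytes have no container header; the standard library `wave` module can wrap them into a playable WAV file. A sketch assuming 16-bit mono audio at 44100 Hz — match these values to what you actually requested:

```python
import io
import wave

def pcm_to_wav(pcm_bytes: bytes, sample_rate: int = 44100, channels: int = 1) -> bytes:
    """Wrap raw pcm_s16le bytes in a WAV container (2 bytes per sample)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(channels)
        wav.setsampwidth(2)  # s16le = 16-bit little-endian samples
        wav.setframerate(sample_rate)
        wav.writeframes(pcm_bytes)
    return buf.getvalue()

# Example: 1000 frames of silence becomes a valid WAV payload
wav_bytes = pcm_to_wav(b"\x00\x00" * 1000)
```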
WebSocket (Lowest Latency)
```python
import asyncio, websockets, json, base64, os

api_key = os.environ["TOGETHER_API_KEY"]

async def generate_speech():
    url = "wss://api.together.ai/v1/audio/speech/websocket?model=hexgrad/Kokoro-82M&voice=af_alloy"
    headers = {"Authorization": f"Bearer {api_key}"}
    async with websockets.connect(url, additional_headers=headers) as ws:
        session = json.loads(await ws.recv())  # first message describes the session
        await ws.send(json.dumps({"type": "input_text_buffer.append", "text": "Hello!"}))
        await ws.send(json.dumps({"type": "input_text_buffer.commit"}))
        audio_data = bytearray()
        async for msg in ws:
            data = json.loads(msg)
            if data["type"] == "conversation.item.audio_output.delta":
                audio_data.extend(base64.b64decode(data["delta"]))
            elif data["type"] == "conversation.item.audio_output.done":
                break

asyncio.run(generate_speech())
```
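The delta-handling loop above can be factored into a pure function and unit-tested without a live connection; this mirrors the event types from the example (`conversation.item.audio_output.delta` / `.done`):

```python
import base64

def collect_audio(events) -> bytes:
    """Fold audio_output events into raw audio bytes, stopping at done."""
    audio = bytearray()
    for event in events:
        if event["type"] == "conversation.item.audio_output.delta":
            audio.extend(base64.b64decode(event["delta"]))
        elif event["type"] == "conversation.item.audio_output.done":
            break
    return bytes(audio)
```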
TTS Models
| Model | API String | Endpoints | Price |
|---|---|---|---|
| Orpheus 3B | `canopylabs/orpheus-3b-0.1-ft` | REST, Streaming, WebSocket | $15/1M chars |
| Kokoro | `hexgrad/Kokoro-82M` | REST, Streaming, WebSocket | $4/1M chars |
| Cartesia Sonic 2 | | REST | $65/1M chars |
| Cartesia Sonic | | REST | - |
| Rime Arcana v3 Turbo | | REST, Streaming, WebSocket | DE only |
| MiniMax Speech 2.6 | | REST, Streaming, WebSocket | DE only |
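Since per-character pricing makes cost a function of input length, a rough estimate is a one-liner. The sketch below assumes billing is purely per-character with no minimum, and only covers the two models whose API strings appear in the examples on this page:

```python
# USD per 1M input characters, from the pricing table above
PRICE_PER_MILLION_CHARS = {
    "canopylabs/orpheus-3b-0.1-ft": 15.0,  # Orpheus 3B
    "hexgrad/Kokoro-82M": 4.0,             # Kokoro
}

def estimate_tts_cost(text: str, model: str) -> float:
    """Estimated USD cost of synthesizing `text` with `model`."""
    return len(text) * PRICE_PER_MILLION_CHARS[model] / 1_000_000
```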
TTS Parameters
| Parameter | Type | Description | Default |
|---|---|---|---|
| `model` | string | TTS model (required) | - |
| `input` | string | Text to synthesize (required) | - |
| `voice` | string | Voice ID (required) | - |
| `response_format` | string | Output audio format (e.g., mp3, wav, raw) | - |
| `stream` | bool | Enable streaming (raw format only) | false |
| `response_encoding` | string | Encoding for raw output (e.g., pcm_s16le) | - |
| `language` | string | Language of input text: en, de, fr, es, hi, it, ja, ko, nl, pl, pt, ru, sv, tr, zh | "en" |
| `sample_rate` | int | Audio sample rate (e.g., 44100) | - |
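A small client-side guard for the language code catches typos before a request leaves your machine; the set below mirrors the codes listed in the parameters table (a convenience sketch, not part of the SDK):

```python
# Language codes supported for TTS input, per the parameters table
SUPPORTED_TTS_LANGUAGES = {
    "en", "de", "fr", "es", "hi", "it", "ja", "ko",
    "nl", "pl", "pt", "ru", "sv", "tr", "zh",
}

def validate_language(code: str) -> str:
    """Return code unchanged if supported, otherwise raise ValueError."""
    if code not in SUPPORTED_TTS_LANGUAGES:
        raise ValueError(f"unsupported language {code!r}")
    return code
```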
List Available Voices
```python
response = client.audio.voices.list()
for model_voices in response.data:
    print(f"Model: {model_voices.model}")
    for voice in model_voices.voices:
        print(f"  - {voice.name}")
```

Key voices — Orpheus: `tara`, `leah`, `leo`, `dan`, `mia`, `zac`. Kokoro: `af_alloy`, `af_bella`, `am_adam`, `am_echo`. See references/tts-models.md for complete voice lists.
STT Quick Start
Transcribe Audio
```python
response = client.audio.transcriptions.create(
    model="openai/whisper-large-v3",
    file=open("audio.mp3", "rb"),
)
print(response.text)
```

```typescript
import Together from "together-ai";

const together = new Together();

const transcription = await together.audio.transcriptions.create({
  file: "path/to/audio.mp3",
  model: "openai/whisper-large-v3",
  language: "en",
});
console.log(transcription.text);
```

```shell
curl -X POST "https://api.together.xyz/v1/audio/transcriptions" \
  -H "Authorization: Bearer $TOGETHER_API_KEY" \
  -F model="openai/whisper-large-v3" \
  -F file=@audio.mp3
```
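If you build the multipart upload yourself instead of using the SDK or curl, you may want a Content-Type for the file part; the standard library can guess one from the file extension. A sketch (the exact set of audio formats the endpoint accepts is not specified here):

```python
import mimetypes

def audio_content_type(path: str) -> str:
    """Guess a Content-Type for an audio upload, with a generic fallback."""
    ctype, _ = mimetypes.guess_type(path)
    return ctype or "application/octet-stream"
```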
STT Models
| Model | API String |
|---|---|
| Whisper Large v3 | `openai/whisper-large-v3` |
| Voxtral Mini 3B | |
Delivery Method Guide
- REST: Batch processing, complete audio files
- Streaming: Real-time apps where time to first byte (TTFB) matters
- WebSocket: Interactive/conversational apps, lowest latency
Resources
- Complete voice lists: See references/tts-models.md
- STT details: See references/stt-models.md
- TTS script: See scripts/tts_generate.py — REST, streaming, and WebSocket TTS (v2 SDK)
- STT script: See scripts/stt_transcribe.py — transcribe, translate, diarize with CLI flags (v2 SDK)
- Official docs: Text-to-Speech
- Official docs: Speech-to-Text
- API reference: TTS API
- API reference: STT API