Convert text to natural speech using Sarvam AI's Bulbul v3 model. Handles audio generation, voiceovers, and voice interfaces for 11 Indian languages with 30+ voices. Supports REST, HTTP streaming, WebSocket, and pronunciation dictionaries. Use when generating spoken audio from text.
```bash
npx skill4agent add sarvamai/skills text-to-speech
```

> [!IMPORTANT]
> Auth: `api-subscription-key` header, NOT `Authorization: Bearer`. Base URL: `https://api.sarvam.ai/v1`
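To make the auth note concrete, here is a minimal sketch that assembles the URL, headers, and JSON body for a raw HTTP call without sending anything. Only the header name and the base URL come from this page. The `/text-to-speech` path, the JSON field names (assumed to mirror the SDK parameters), the `SARVAM_API_KEY` environment variable, and the helper name `build_tts_request` are assumptions made for illustration.

```python
import json
import os

BASE_URL = "https://api.sarvam.ai/v1"  # base URL as stated on this page


def build_tts_request(text: str, api_key: str) -> tuple[str, dict[str, str], bytes]:
    """Return (url, headers, body) for a TTS call. Nothing is sent."""
    # Assumed endpoint path; this page does not confirm it.
    url = f"{BASE_URL}/text-to-speech"
    headers = {
        # The key goes in this header, not in "Authorization: Bearer ...".
        "api-subscription-key": api_key,
        "Content-Type": "application/json",
    }
    # Field names assumed to match the SDK keyword arguments.
    payload = {
        "text": text,
        "target_language_code": "en-IN",
        "model": "bulbul:v3",
        "speaker": "shubh",
    }
    return url, headers, json.dumps(payload).encode("utf-8")


# Hypothetical usage: read the key from an assumed environment variable.
url, headers, body = build_tts_request(
    "Hello from Sarvam AI",
    os.environ.get("SARVAM_API_KEY", "YOUR_SARVAM_API_KEY"),
)
```

The returned triple can be handed to any HTTP client. Keeping request construction separate from sending makes the header choice easy to check in isolation.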
Model: `bulbul:v3`, speaker: `shubh`

```python
from sarvamai import SarvamAI
from sarvamai.play import save

client = SarvamAI()

# REST
response = client.text_to_speech.convert(
    text="नमस्ते, आप कैसे हैं?",
    target_language_code="hi-IN",
    model="bulbul:v3",
    speaker="shubh",
)
save(response, "output.wav")

# HTTP Stream (lower latency, binary audio)
chunks = []
for chunk in client.text_to_speech.convert_stream(
    text="Hello from Sarvam AI",
    target_language_code="en-IN",
    speaker="shubh",
    model="bulbul:v3",
):
    chunks.append(chunk)
audio = b"".join(chunks)
```

```javascript
import { SarvamAIClient } from "sarvamai";
import { writeFile } from "fs/promises";

const client = new SarvamAIClient({ apiSubscriptionKey: "YOUR_SARVAM_API_KEY" });

// REST
const response = await client.textToSpeech.convert({
  text: "नमस्ते, आप कैसे हैं?",
  target_language_code: "hi-IN",
  model: "bulbul:v3",
  speaker: "shubh"
});

// HTTP Stream (lower latency, returns BinaryResponse)
const streamResponse = await client.textToSpeech.convertStream({
  text: "Hello from Sarvam AI",
  target_language_code: "en-IN",
  speaker: "shubh",
  model: "bulbul:v3"
});
const bytes = await streamResponse.bytes();
await writeFile("output.wav", bytes);
```

```python
import asyncio
from sarvamai import AsyncSarvamAI


async def tts_stream():
    client = AsyncSarvamAI()
    async with client.text_to_speech_streaming.connect(model="bulbul:v3") as ws:
        await ws.configure(target_language_code="hi-IN", speaker="shubh")
        await ws.convert("Your text here")
        await ws.flush()
        async for message in ws:
            pass  # base64 audio chunks


asyncio.run(tts_stream())
```

| Method | Max Text |
|---|---|
| REST | 2,500 chars |
| HTTP Stream | 3,500 chars |
| WebSocket | 2,500 chars/msg |
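Text longer than these limits has to be split before it is sent. The sketch below shows one possible approach: pack whole sentences greedily into chunks that stay under a chosen limit. The helper name `split_text` and the constants are hypothetical, and the code assumes the limit counts characters the way Python's `len()` does, which this page does not confirm.

```python
import re

REST_MAX_CHARS = 2500    # REST and WebSocket per-message limit from the table above
STREAM_MAX_CHARS = 3500  # HTTP Stream limit from the table above


def split_text(text: str, max_chars: int = REST_MAX_CHARS) -> list[str]:
    """Greedily pack sentences into chunks of at most max_chars characters."""
    # Split after sentence-ending punctuation (., !, ?, or the Devanagari danda).
    sentences = re.split(r"(?<=[.!?।])\s+", text.strip())
    chunks: list[str] = []
    current = ""
    for sentence in sentences:
        # Hard-wrap a single sentence that is itself over the limit.
        while len(sentence) > max_chars:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:max_chars])
            sentence = sentence[max_chars:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= max_chars:
            current = candidate
        else:
            # The sentence does not fit: close the current chunk and start a new one.
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each returned chunk can then be passed as the `text` argument of a separate request. Splitting at sentence boundaries is a design choice here, on the assumption that synthesis sounds more natural when a sentence is not cut in the middle.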
| Gotcha | Detail |
|---|---|
| JS method name | |
| | SDK accepts these but API returns 400 for v3. Only |
| v2 voices incompatible | |
| Sample rate >24kHz | 32kHz, 44.1kHz, 48kHz only via REST, not streaming. |
| REST response | Base64-encoded audio in |
| Pronunciation dictionary | |
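Two of the gotchas above lend themselves to small guard helpers. The sketch below decodes a Base64 audio string into raw bytes and flags the sample rates the table lists as REST-only. Both helper names are hypothetical. The code takes the Base64 string directly because this page does not show which response field holds it.

```python
import base64

# Sample rates (Hz) the table above lists as available via REST only.
REST_ONLY_RATES = {32000, 44100, 48000}


def is_rest_only(sample_rate_hz: int) -> bool:
    """True if the rate is one of those listed as REST-only."""
    return sample_rate_hz in REST_ONLY_RATES


def decode_audio(audio_b64: str) -> bytes:
    """Decode one Base64-encoded audio string into raw bytes."""
    # validate=True rejects non-Base64 characters instead of silently dropping them.
    return base64.b64decode(audio_b64, validate=True)
```

A caller might check `is_rest_only(rate)` before choosing between the REST and streaming methods, then write the result of `decode_audio(...)` to a file opened in binary mode.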