gemini-live-api-dev
Gemini Live API Development Skill
Overview
The Live API enables low-latency, real-time voice and video interactions with Gemini over WebSockets. It processes continuous streams of audio, video, or text to deliver immediate, human-like spoken responses.
Key capabilities:
- Bidirectional audio streaming — real-time mic-to-speaker conversations
- Video streaming — send camera/screen frames alongside audio
- Text input/output — send and receive text within a live session
- Audio transcriptions — get text transcripts of both input and output audio
- Voice Activity Detection (VAD) — automatic interruption handling
- Native audio — affective dialog, proactive audio, thinking
- Function calling — synchronous and asynchronous tool use
- Google Search grounding — ground responses in real-time search results
- Session management — context compression, session resumption, GoAway signals
- Ephemeral tokens — secure client-side authentication
[!NOTE] The Live API currently only supports WebSockets. For WebRTC support or simplified integration, use a partner integration.
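Since the Live API speaks plain WebSockets, the handshake can be sketched without any SDK. Below is a minimal sketch of the first `setup` frame the raw protocol expects; the endpoint URL and field names follow the WebSockets API reference as we understand it, so verify them against the current docs before relying on this.

```python
import json

# Raw WebSocket endpoint (v1beta); the API key is passed as a query parameter.
LIVE_WS_URL = (
    "wss://generativelanguage.googleapis.com/ws/"
    "google.ai.generativelanguage.v1beta.GenerativeService.BidiGenerateContent"
    "?key=YOUR_API_KEY"
)

def build_setup_message(model: str) -> str:
    """First frame sent after the socket opens: session configuration."""
    return json.dumps({
        "setup": {
            "model": f"models/{model}",
            "generationConfig": {"responseModalities": ["AUDIO"]},
        }
    })

msg = build_setup_message("gemini-2.5-flash-native-audio-preview-12-2025")
```

After sending this frame, stream `realtimeInput` messages and read server frames until the socket closes; the SDKs below wrap exactly this exchange.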
Models
- `gemini-2.5-flash-native-audio-preview-12-2025` — Native audio output, affective dialog, proactive audio, thinking. 128k context window. This is the recommended model for all Live API use cases.

[!WARNING] The following Live API models are deprecated and will be shut down. Migrate to `gemini-2.5-flash-native-audio-preview-12-2025`.
- `gemini-live-2.5-flash-preview` — Released June 17, 2025. Shutdown: December 9, 2025.
- `gemini-2.0-flash-live-001` — Released April 9, 2025. Shutdown: December 9, 2025.
SDKs
- Python: `google-genai` — `pip install google-genai`
- JavaScript/TypeScript: `@google/genai` — `npm install @google/genai`

[!WARNING] Legacy SDKs `google-generativeai` (Python) and `@google/generative-ai` (JS) are deprecated. Use the new SDKs above.
Partner Integrations
To streamline real-time audio/video app development, use a third-party integration supporting the Gemini Live API over WebRTC or WebSockets:
- LiveKit — Use the Gemini Live API with LiveKit Agents.
- Pipecat by Daily — Create a real-time AI chatbot using Gemini Live and Pipecat.
- Fishjam by Software Mansion — Create live video and audio streaming applications with Fishjam.
- Vision Agents by Stream — Build real-time voice and video AI applications with Vision Agents.
- Voximplant — Connect inbound and outbound calls to Live API with Voximplant.
- Firebase AI SDK — Get started with the Gemini Live API using Firebase AI Logic.
Audio Formats
- Input: Raw PCM, little-endian, 16-bit, mono. 16kHz native (other sample rates are resampled). MIME type: `audio/pcm;rate=16000`
- Output: Raw PCM, little-endian, 16-bit, mono. 24kHz sample rate.

[!IMPORTANT] Use `sendRealtimeInput`/`send_realtime_input` for all real-time user input (audio, video, and text). Use `sendClientContent`/`send_client_content` only for incremental conversation history updates (appending prior turns to context), not for sending new user messages.

[!WARNING] Do not use `media` in `sendRealtimeInput`. Use the specific keys: `audio` for audio data, `video` for images/video frames, and `text` for text input.
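Because the model's audio output is headerless PCM, most players need it wrapped in a container before playback. A minimal stdlib-only sketch (the `pcm_to_wav` helper is our own name, not part of any SDK):

```python
import math
import struct
import wave

def pcm_to_wav(pcm_bytes: bytes, path: str, rate: int = 24000) -> None:
    """Wrap raw little-endian 16-bit mono PCM in a WAV container for playback."""
    with wave.open(path, "wb") as wf:
        wf.setnchannels(1)      # mono
        wf.setsampwidth(2)      # 16-bit samples
        wf.setframerate(rate)   # 24kHz matches the model's output
        wf.writeframes(pcm_bytes)

# Stand-in for model output: 0.1s of a 440 Hz tone as 16-bit little-endian PCM
rate = 24000
samples = [int(32767 * 0.3 * math.sin(2 * math.pi * 440 * n / rate))
           for n in range(rate // 10)]
pcm = struct.pack("<%dh" % len(samples), *samples)
pcm_to_wav(pcm, "output.wav")
```

In a real client you would accumulate the `inline_data` chunks from the session and pass the concatenated bytes to the same helper.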
Quick Start
Authentication

Python

```python
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")
```

JavaScript

```js
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: 'YOUR_API_KEY' });
```

Connecting to the Live API
Python

```python
from google.genai import types

config = types.LiveConnectConfig(
    response_modalities=[types.Modality.AUDIO],
    system_instruction=types.Content(
        parts=[types.Part(text="You are a helpful assistant.")]
    )
)

async with client.aio.live.connect(
    model="gemini-2.5-flash-native-audio-preview-12-2025",
    config=config,
) as session:
    pass  # Session is now active
```

JavaScript

```js
const session = await ai.live.connect({
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  config: {
    responseModalities: ['audio'],
    systemInstruction: { parts: [{ text: 'You are a helpful assistant.' }] }
  },
  callbacks: {
    onopen: () => console.log('Connected'),
    onmessage: (response) => console.log('Message:', response),
    onerror: (error) => console.error('Error:', error),
    onclose: () => console.log('Closed')
  }
});
```

Sending Text
Python

```python
await session.send_realtime_input(text="Hello, how are you?")
```

JavaScript

```js
session.sendRealtimeInput({ text: 'Hello, how are you?' });
```

Sending Audio
Python

```python
await session.send_realtime_input(
    audio=types.Blob(data=chunk, mime_type="audio/pcm;rate=16000")
)
```

JavaScript

```js
session.sendRealtimeInput({
  audio: { data: chunk.toString('base64'), mimeType: 'audio/pcm;rate=16000' }
});
```

Sending Video
Python

```python
# frame: raw JPEG-encoded bytes
await session.send_realtime_input(
    video=types.Blob(data=frame, mime_type="image/jpeg")
)
```

JavaScript

```js
session.sendRealtimeInput({
  video: { data: frame.toString('base64'), mimeType: 'image/jpeg' }
});
```

Receiving Audio and Text
Python

```python
async for response in session.receive():
    content = response.server_content
    if content:
        # Audio
        if content.model_turn:
            for part in content.model_turn.parts:
                if part.inline_data:
                    audio_data = part.inline_data.data
        # Transcription
        if content.input_transcription:
            print(f"User: {content.input_transcription.text}")
        if content.output_transcription:
            print(f"Gemini: {content.output_transcription.text}")
        # Interruption
        if content.interrupted is True:
            pass  # Stop playback, clear audio queue
```

JavaScript

```js
// Inside the onmessage callback
const content = response.serverContent;
if (content?.modelTurn?.parts) {
  for (const part of content.modelTurn.parts) {
    if (part.inlineData) {
      const audioData = part.inlineData.data; // Base64 encoded
    }
  }
}
if (content?.inputTranscription) console.log('User:', content.inputTranscription.text);
if (content?.outputTranscription) console.log('Gemini:', content.outputTranscription.text);
if (content?.interrupted) { /* Stop playback, clear audio queue */ }
```

Limitations
- Response modality — Only `TEXT` or `AUDIO` per session, not both
- Audio-only session — 15 min without compression
- Audio+video session — 2 min without compression
- Connection lifetime — ~10 min (use session resumption)
- Context window — 128k tokens (native audio) / 32k tokens (standard)
- Code execution — Not supported
- URL context — Not supported
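The 15-minute session cap and ~10-minute connection lifetime above are exactly what compression and resumption exist to lift. A sketch of a session config enabling both with the Python SDK; the type names and token thresholds follow the session-management docs as we read them, so treat the exact shapes as assumptions to verify.

```python
from google.genai import types

config = types.LiveConnectConfig(
    response_modalities=[types.Modality.AUDIO],
    # Sliding-window compression lifts the 15 min audio-only session cap.
    context_window_compression=types.ContextWindowCompressionConfig(
        trigger_tokens=25600,
        sliding_window=types.SlidingWindow(target_tokens=12800),
    ),
    # Pass a handle received from a previous session to resume after a reset.
    session_resumption=types.SessionResumptionConfig(handle=None),
)
```

With resumption enabled, the server periodically sends updated handles during the session; store the latest one and supply it on reconnect.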
Best Practices
- Use headphones when testing mic audio to prevent echo/self-interruption
- Enable context window compression for sessions longer than 15 minutes
- Implement session resumption to handle connection resets gracefully
- Use ephemeral tokens for client-side deployments — never expose API keys in browsers
- Use `send_realtime_input` for all real-time user input (audio, video, text). Reserve `send_client_content` for injecting conversation history only
- Send `audioStreamEnd` when the mic is paused to flush cached audio
- Clear audio playback queues on interruption signals
How to use the Gemini API
For detailed API documentation, fetch from the official docs index:

llms.txt URL: https://ai.google.dev/gemini-api/docs/llms.txt

This index contains links to all documentation pages in `.md.txt` format. Use web fetch tools to:
- Fetch `llms.txt` to discover available documentation pages
- Fetch specific pages (e.g., https://ai.google.dev/gemini-api/docs/live-session.md.txt)
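Discovering pages from the index is just link extraction. A stdlib sketch that pulls the `.md.txt` URLs out of fetched `llms.txt` text (the sample string below is illustrative, not real index content):

```python
import re

def extract_doc_urls(llms_txt: str) -> list[str]:
    """Return every absolute .md.txt link found in the index text."""
    return re.findall(r"https://\S+?\.md\.txt", llms_txt)

# Illustrative sample only; fetch the real index from the URL above.
sample = (
    "- [Live session](https://ai.google.dev/gemini-api/docs/live-session.md.txt)\n"
    "- [Live guide](https://ai.google.dev/gemini-api/docs/live-guide.md.txt)\n"
)
urls = extract_doc_urls(sample)
```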
Key Documentation Pages
[!IMPORTANT] These are not all the documentation pages. Use the `llms.txt` index to discover available documentation pages.
- Live API Overview — getting started, raw WebSocket usage
- Live API Capabilities Guide — voice config, transcription config, native audio (affective dialog, proactive audio, thinking), VAD configuration, media resolution
- Live API Tool Use — function calling (sync and async), Google Search grounding
- Session Management — context window compression, session resumption, GoAway signals
- Ephemeral Tokens — secure client-side authentication for browser/mobile
- WebSockets API Reference — raw WebSocket protocol details
Supported Languages
The Live API supports 70 languages including: English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Hindi, Arabic, Russian, and many more. Native audio models automatically detect and switch languages.