gemini-live-api-dev


Gemini Live API Development Skill


Overview


The Live API enables low-latency, real-time voice and video interactions with Gemini over WebSockets. It processes continuous streams of audio, video, or text to deliver immediate, human-like spoken responses.
Key capabilities:
  • Bidirectional audio streaming — real-time mic-to-speaker conversations
  • Video streaming — send camera/screen frames alongside audio
  • Text input/output — send and receive text within a live session
  • Audio transcriptions — get text transcripts of both input and output audio
  • Voice Activity Detection (VAD) — automatic interruption handling
  • Native audio — affective dialog, proactive audio, thinking
  • Function calling — synchronous and asynchronous tool use
  • Google Search grounding — ground responses in real-time search results
  • Session management — context compression, session resumption, GoAway signals
  • Ephemeral tokens — secure client-side authentication
[!NOTE] The Live API currently only supports WebSockets. For WebRTC support or simplified integration, use a partner integration.
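Function calling is listed above but has no example in the Quick Start below. As a minimal sketch, a tool is declared as a function declaration and passed in the connect config. The `get_weather` tool here is hypothetical, and exact schema casing can differ between SDK versions, so treat this as a shape illustration rather than the definitive format:

```python
# Hypothetical tool, declared in the dict style the SDKs accept.
get_weather = {
    "name": "get_weather",
    "description": "Look up the current weather for a city.",
    "parameters": {
        "type": "object",
        "properties": {"city": {"type": "string"}},
        "required": ["city"],
    },
}

# Passed to the session via config, e.g. LiveConnectConfig(tools=tools).
tools = [{"function_declarations": [get_weather]}]
```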

Models


  • gemini-2.5-flash-native-audio-preview-12-2025
    — Native audio output, affective dialog, proactive audio, thinking. 128k context window. This is the recommended model for all Live API use cases.
[!WARNING] The following Live API models are deprecated and will be shut down. Migrate to
gemini-2.5-flash-native-audio-preview-12-2025
.
  • gemini-live-2.5-flash-preview
    — Released June 17, 2025. Shutdown: December 9, 2025.
  • gemini-2.0-flash-live-001
    — Released April 9, 2025. Shutdown: December 9, 2025.

SDKs


  • Python:
    google-genai
    pip install google-genai
  • JavaScript/TypeScript:
    @google/genai
    npm install @google/genai
[!WARNING] Legacy SDKs
google-generativeai
(Python) and
@google/generative-ai
(JS) are deprecated. Use the new SDKs above.

Partner Integrations


To streamline real-time audio/video app development, use a third-party integration supporting the Gemini Live API over WebRTC or WebSockets:
  • LiveKit — Use the Gemini Live API with LiveKit Agents.
  • Pipecat by Daily — Create a real-time AI chatbot using Gemini Live and Pipecat.
  • Fishjam by Software Mansion — Create live video and audio streaming applications with Fishjam.
  • Vision Agents by Stream — Build real-time voice and video AI applications with Vision Agents.
  • Voximplant — Connect inbound and outbound calls to Live API with Voximplant.
  • Firebase AI SDK — Get started with the Gemini Live API using Firebase AI Logic.

Audio Formats


  • Input: Raw PCM, little-endian, 16-bit, mono. 16 kHz native sample rate (other rates are resampled automatically). MIME type:
    audio/pcm;rate=16000
  • Output: Raw PCM, little-endian, 16-bit, mono. 24kHz sample rate.
[!IMPORTANT] Use
send_realtime_input
/
sendRealtimeInput
for all real-time user input (audio, video, and text). Use
send_client_content
/
sendClientContent
only for incremental conversation history updates (appending prior turns to context), not for sending new user messages.
[!WARNING] Do not use
media
in
sendRealtimeInput
. Use the specific keys:
audio
for audio data,
video
for images/video frames, and
text
for text input.
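The input format above can be produced with the standard library alone, which is handy for testing a pipeline without a microphone. A sketch that packs a short sine tone into the expected raw PCM layout (the function name and amplitude are illustrative):

```python
import math
import struct

SAMPLE_RATE = 16000  # the Live API's native input rate

def make_test_tone(freq_hz: float = 440.0, duration_s: float = 0.1) -> bytes:
    """Return a sine tone as raw 16-bit little-endian mono PCM."""
    n = int(SAMPLE_RATE * duration_s)
    samples = (
        int(32767 * 0.5 * math.sin(2 * math.pi * freq_hz * i / SAMPLE_RATE))
        for i in range(n)
    )
    # "<h" = little-endian signed 16-bit; one value per frame (mono)
    return b"".join(struct.pack("<h", s) for s in samples)

chunk = make_test_tone()  # 0.1 s * 16000 Hz * 2 bytes = 3200 bytes
```

Such a chunk can then be sent with the `audio/pcm;rate=16000` MIME type exactly as in the Sending Audio examples below.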


Quick Start


Authentication


Python


python
from google import genai

client = genai.Client(api_key="YOUR_API_KEY")

JavaScript


js
import { GoogleGenAI } from '@google/genai';

const ai = new GoogleGenAI({ apiKey: 'YOUR_API_KEY' });

Connecting to the Live API


Python


python
from google.genai import types

config = types.LiveConnectConfig(
    response_modalities=[types.Modality.AUDIO],
    system_instruction=types.Content(
        parts=[types.Part(text="You are a helpful assistant.")]
    )
)

async with client.aio.live.connect(model="gemini-2.5-flash-native-audio-preview-12-2025", config=config) as session:
    pass  # Session is now active

JavaScript


js
const session = await ai.live.connect({
  model: 'gemini-2.5-flash-native-audio-preview-12-2025',
  config: {
    responseModalities: ['audio'],
    systemInstruction: { parts: [{ text: 'You are a helpful assistant.' }] }
  },
  callbacks: {
    onopen: () => console.log('Connected'),
    onmessage: (response) => console.log('Message:', response),
    onerror: (error) => console.error('Error:', error),
    onclose: () => console.log('Closed')
  }
});

Sending Text


Python


python
await session.send_realtime_input(text="Hello, how are you?")

JavaScript


js
session.sendRealtimeInput({ text: 'Hello, how are you?' });

Sending Audio


Python


python
await session.send_realtime_input(
    audio=types.Blob(data=chunk, mime_type="audio/pcm;rate=16000")
)

JavaScript


js
session.sendRealtimeInput({
  audio: { data: chunk.toString('base64'), mimeType: 'audio/pcm;rate=16000' }
});

Sending Video


Python


python
# frame: raw JPEG-encoded bytes
await session.send_realtime_input(
    video=types.Blob(data=frame, mime_type="image/jpeg")
)

JavaScript


js
session.sendRealtimeInput({
  video: { data: frame.toString('base64'), mimeType: 'image/jpeg' }
});

Receiving Audio and Text


Python


python
async for response in session.receive():
    content = response.server_content
    if content:
        # Audio
        if content.model_turn:
            for part in content.model_turn.parts:
                if part.inline_data:
                    audio_data = part.inline_data.data
        # Transcription
        if content.input_transcription:
            print(f"User: {content.input_transcription.text}")
        if content.output_transcription:
            print(f"Gemini: {content.output_transcription.text}")
        # Interruption
        if content.interrupted is True:
            pass  # Stop playback, clear audio queue

JavaScript


js
// Inside the onmessage callback
const content = response.serverContent;
if (content?.modelTurn?.parts) {
  for (const part of content.modelTurn.parts) {
    if (part.inlineData) {
      const audioData = part.inlineData.data; // Base64 encoded
    }
  }
}
if (content?.inputTranscription) console.log('User:', content.inputTranscription.text);
if (content?.outputTranscription) console.log('Gemini:', content.outputTranscription.text);
if (content?.interrupted) { /* Stop playback, clear audio queue */ }
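When `interrupted` arrives, the server stops generating, but audio the client has already buffered will keep playing unless it is flushed. A minimal sketch of the flush pattern, using an `asyncio.Queue` as a stand-in playback buffer (the queue and function names are illustrative, not part of the SDK):

```python
import asyncio

def clear_playback_queue(queue: asyncio.Queue) -> None:
    """Drop any audio chunks that have not been played yet."""
    while not queue.empty():
        queue.get_nowait()

async def demo() -> int:
    playback_queue: asyncio.Queue = asyncio.Queue()
    await playback_queue.put(b"chunk-1")
    await playback_queue.put(b"chunk-2")
    # On content.interrupted: stop the audio device, then flush the buffer
    clear_playback_queue(playback_queue)
    return playback_queue.qsize()

# asyncio.run(demo()) returns 0: nothing stale is left to play
```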


Limitations


  • Response modality — Only
    TEXT
    or
    AUDIO
    per session, not both
  • Audio-only session — 15 min without compression
  • Audio+video session — 2 min without compression
  • Connection lifetime — ~10 min (use session resumption)
  • Context window — 128k tokens (native audio) / 32k tokens (standard)
  • Code execution — Not supported
  • URL context — Not supported

Best Practices


  1. Use headphones when testing mic audio to prevent echo/self-interruption
  2. Enable context window compression for sessions longer than 15 minutes
  3. Implement session resumption to handle connection resets gracefully
  4. Use ephemeral tokens for client-side deployments — never expose API keys in browsers
  5. Use
    send_realtime_input
    for all real-time user input (audio, video, text). Reserve
    send_client_content
    only for injecting conversation history
  6. Send
    audioStreamEnd
    when the mic is paused to flush cached audio
  7. Clear audio playback queues on interruption signals
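Practices 2 and 3 map to fields on the connect config. A sketch assuming the current google-genai type names (verify against the SDK reference before relying on them); the resumption handle comes from sessionResumptionUpdate messages received during a prior session:

```python
from google.genai import types

config = types.LiveConnectConfig(
    response_modalities=[types.Modality.AUDIO],
    # 2. Compress older turns so long sessions stay under the context limit
    context_window_compression=types.ContextWindowCompressionConfig(
        sliding_window=types.SlidingWindow()
    ),
    # 3. Pass a handle from a previous session to resume it; None starts fresh
    session_resumption=types.SessionResumptionConfig(handle=None),
)
```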

How to use the Gemini API


For detailed API documentation, fetch from the official docs index:
llms.txt URL:
https://ai.google.dev/gemini-api/docs/llms.txt
This index contains links to all documentation pages in
.md.txt
format. Use web fetch tools to:
  1. Fetch
    llms.txt
    to discover available documentation pages
  2. Fetch specific pages (e.g.,
    https://ai.google.dev/gemini-api/docs/live-session.md.txt
    )
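Once `llms.txt` is fetched, the page URLs can be pulled out of its markdown links with a few lines of stdlib. A sketch (the regex assumes the conventional markdown-link layout of `llms.txt` indexes):

```python
import re

def extract_doc_urls(llms_txt: str) -> list[str]:
    """Return the link targets from an llms.txt-style markdown index."""
    return re.findall(r"\((https://[^)\s]+)\)", llms_txt)

sample = "- [Live API](https://ai.google.dev/gemini-api/docs/live.md.txt)"
extract_doc_urls(sample)
# -> ["https://ai.google.dev/gemini-api/docs/live.md.txt"]
```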

Key Documentation Pages


[!IMPORTANT] These are not all the documentation pages. Use the
llms.txt
index to discover the full set of available pages.

Supported Languages


The Live API supports 70 languages including: English, Spanish, French, German, Italian, Portuguese, Chinese, Japanese, Korean, Hindi, Arabic, Russian, and many more. Native audio models automatically detect and switch languages.