speech-build

Speech Skill (TTS & STT)

Use this skill to implement audio generation and transcription workflows using the `google-genai` and `google-cloud-speech` SDKs.

Quick Start Setup

```python
from google import genai
from google.genai import types
# For STT: from google.cloud import speech_v2

# With no arguments, the client reads its settings from environment
# variables (e.g. GEMINI_API_KEY or GOOGLE_API_KEY).
client = genai.Client()
```

Reference Materials

- Text-to-Speech (TTS): Gemini-TTS, Chirp 3 HD, Instant Custom Voice.
- Speech-to-Text (STT): Chirp 3 Transcription, Diarization, Streaming.
- Voices & Locales: Available voices (`Aoede`, `Puck`, ...) and languages.
- Prompting Guide: How to control style, accent, and pacing in Gemini-TTS.
- Source Code: Deep inspection of SDK internals.

Common Workflows

1. Generate Speech (Gemini-TTS)

```python
response = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",
    contents="Hello, world!",
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name='Kore')
            )
        )
    )
)
```
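The TTS call returns raw PCM audio bytes rather than a playable file, so a common next step is to wrap them in a WAV header. The sketch below uses only the standard-library `wave` module and assumes 16-bit mono PCM at 24 kHz, with the sample rate also advertised in the part's MIME type (e.g. `audio/L16;codec=pcm;rate=24000`). Both the format and that MIME layout are assumptions to check against the actual response. The helper names `pcm_to_wav_bytes` and `rate_from_mime` are hypothetical, not part of `google-genai`.

```python
import io
import wave


def pcm_to_wav_bytes(
    pcm: bytes, rate: int = 24000, channels: int = 1, sample_width: int = 2
) -> bytes:
    """Wrap raw PCM samples in a WAV (RIFF) container and return the file bytes."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)  # bytes per sample; 2 means 16-bit audio
        wf.setframerate(rate)
        wf.writeframes(pcm)
    # wave does not close file objects it did not open, so buf is still readable.
    return buf.getvalue()


def rate_from_mime(mime_type, default: int = 24000) -> int:
    """Extract 'rate=NNNN' from a MIME type string; fall back to `default`."""
    for param in (mime_type or "").split(";")[1:]:
        key, _, value = param.strip().partition("=")
        if key == "rate" and value.isdigit():
            return int(value)
    return default


# Usage with the response from the Gemini-TTS call above (assumes the first
# part of the first candidate carries the audio):
# part = response.candidates[0].content.parts[0]
# wav = pcm_to_wav_bytes(part.inline_data.data,
#                        rate=rate_from_mime(part.inline_data.mime_type))
# with open("out.wav", "wb") as f:
#     f.write(wav)
```

Reading the rate from the MIME type, rather than hard-coding it, keeps the file playable at the right speed if a model returns a different sample rate.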

2. Transcribe Audio (Chirp 3)

```python
# Requires google-cloud-speech
from google.cloud import speech_v2

# ... (See stt.md for full setup)

response = speech_client.recognize(...)
```
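The call above leaves the setup to stt.md. As a rough starting point, here is a minimal sketch of a synchronous Chirp 3 request with the Speech-to-Text v2 client. It assumes the model ID `chirp_3`, the `us` location with its regional endpoint `us-speech.googleapis.com`, and the implicit default recognizer `_`; verify all three against stt.md and the Cloud documentation for your project. The helper names (`regional_endpoint`, `recognizer_path`, `transcribe_chirp3`) and the project ID in the usage comment are hypothetical.

```python
from typing import List, Sequence


def regional_endpoint(location: str) -> str:
    """Speech-to-Text v2 API host for a location; 'global' uses the bare host."""
    if location == "global":
        return "speech.googleapis.com"
    return f"{location}-speech.googleapis.com"


def recognizer_path(project_id: str, location: str, recognizer_id: str = "_") -> str:
    """Recognizer resource name; '_' selects the implicit default recognizer."""
    return f"projects/{project_id}/locations/{location}/recognizers/{recognizer_id}"


def transcribe_chirp3(
    audio: bytes,
    project_id: str,
    location: str = "us",  # assumption: a location where Chirp 3 is served
    language_codes: Sequence[str] = ("en-US",),
) -> List[str]:
    """Transcribe a short clip synchronously; returns one transcript per result."""
    # Imported here so the pure helpers above work without the SDK installed.
    from google.api_core.client_options import ClientOptions
    from google.cloud.speech_v2 import SpeechClient
    from google.cloud.speech_v2.types import cloud_speech

    # Regional models must be called through the matching regional endpoint.
    client = SpeechClient(
        client_options=ClientOptions(api_endpoint=regional_endpoint(location))
    )
    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=list(language_codes),
        model="chirp_3",  # assumption: model ID for Chirp 3
    )
    request = cloud_speech.RecognizeRequest(
        recognizer=recognizer_path(project_id, location),
        config=config,
        content=audio,
    )
    response = client.recognize(request=request)
    # Each result covers a segment of audio; take its top alternative.
    return [r.alternatives[0].transcript for r in response.results if r.alternatives]


# Usage (needs google-cloud-speech and Application Default Credentials):
# with open("clip.wav", "rb") as f:
#     print(transcribe_chirp3(f.read(), project_id="my-project"))
```

Synchronous `recognize` is intended for short clips; longer recordings are usually sent through `batch_recognize` instead.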