speech-build

Speech Skill (TTS & STT)

Use this skill to implement audio generation and transcription workflows using the google-genai and google-cloud-speech SDKs.

Quick Start Setup

```python
from google import genai
from google.genai import types
# For STT: from google.cloud import speech_v2

client = genai.Client()
```

Reference Materials

  • Text-to-Speech (TTS): Gemini-TTS, Chirp 3 HD, Instant Custom Voice.
  • Speech-to-Text (STT): Chirp 3 Transcription, Diarization, Streaming.
  • Voices & Locales: Available voices (Aoede, Puck, ...) and languages.
  • Prompting Guide: How to control style, accent, and pacing in Gemini-TTS.
  • Source Code: Deep inspection of SDK internals.

Common Workflows

1. Generate Speech (Gemini-TTS)

```python
response = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",
    contents="Hello, world!",
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name='Kore')
            )
        )
    )
)
```
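
The call above returns audio bytes rather than a playable file. A minimal sketch of one way to save them follows. It assumes the audio arrives as raw 16-bit mono PCM at 24 kHz in the first part of the first candidate; verify this against the current Gemini-TTS documentation. The helper name `save_pcm_as_wav` is illustrative and not part of either SDK.

```python
import wave


def save_pcm_as_wav(path, pcm_bytes, channels=1, sample_rate=24000, sample_width=2):
    """Wrap raw PCM bytes in a WAV container so standard players can open them.

    The defaults (mono, 24 kHz, 16-bit) are assumptions about the TTS output
    format; adjust them if the model you call documents different values.
    """
    with wave.open(path, "wb") as wav_file:
        wav_file.setnchannels(channels)
        wav_file.setsampwidth(sample_width)  # bytes per sample; 2 means 16-bit
        wav_file.setframerate(sample_rate)
        wav_file.writeframes(pcm_bytes)
    return path


# Usage with the response from the call above (assumed response layout):
# pcm = response.candidates[0].content.parts[0].inline_data.data
# save_pcm_as_wav("hello.wav", pcm)
```

Writing the header with the standard-library `wave` module keeps the example free of extra dependencies.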

2. Transcribe Audio (Chirp 3)

```python
# Requires google-cloud-speech
from google.cloud import speech_v2

# ... (See stt.md for full setup)
response = speech_client.recognize(...)
```
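
The snippet above leaves the client setup to stt.md, which is not reproduced here. For orientation only, here is a rough sketch of what a synchronous recognize call can look like with the public google-cloud-speech v2 API. It is not the content of stt.md. The helper names (`recognizer_path`, `transcribe_file`), the `us` location, the regional endpoint pattern, and the `chirp_3` model identifier are all assumptions to check against the Speech-to-Text documentation and your own project.

```python
def recognizer_path(project_id: str, location: str) -> str:
    """Build the resource name of the default ("_") recognizer."""
    return f"projects/{project_id}/locations/{location}/recognizers/_"


def transcribe_file(audio_path: str, project_id: str, location: str = "us",
                    language_code: str = "en-US") -> list[str]:
    """Transcribe a short local audio file; returns one transcript per result."""
    # Imported lazily so recognizer_path() works without the SDK installed.
    from google.api_core.client_options import ClientOptions
    from google.cloud import speech_v2
    from google.cloud.speech_v2.types import cloud_speech

    # Assumption: the model is served from a regional endpoint.
    speech_client = speech_v2.SpeechClient(
        client_options=ClientOptions(api_endpoint=f"{location}-speech.googleapis.com")
    )

    with open(audio_path, "rb") as audio_file:
        audio_bytes = audio_file.read()

    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=[language_code],
        model="chirp_3",  # assumed model identifier for Chirp 3
    )
    request = cloud_speech.RecognizeRequest(
        recognizer=recognizer_path(project_id, location),
        config=config,
        content=audio_bytes,
    )
    response = speech_client.recognize(request=request)
    return [r.alternatives[0].transcript for r in response.results if r.alternatives]


# Usage (needs google-cloud-speech installed and application credentials):
# print(transcribe_file("hello.wav", project_id="my-project"))
```

The project ID `my-project` in the usage line is a placeholder. Diarization and streaming, which the reference list assigns to the STT guide, need different request types and are not covered by this sketch.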