speech-build
Speech Skill (TTS & STT)
Use this skill to implement audio generation and transcription workflows using the google-genai and google-cloud-speech SDKs.

Quick Start Setup
python
from google import genai
from google.genai import types
# For STT: from google.cloud import speech_v2

client = genai.Client()

Reference Materials
- Text-to-Speech (TTS): Gemini-TTS, Chirp 3 HD, Instant Custom Voice.
- Speech-to-Text (STT): Chirp 3 Transcription, Diarization, Streaming.
- Voices & Locales: Available voices (Puck, Aoede...) and languages.
- Prompting Guide: How to control style, accent, and pacing in Gemini-TTS.
- Source Code: Deep inspection of SDK internals.
Common Workflows
1. Generate Speech (Gemini-TTS)
python
response = client.models.generate_content(
    model="gemini-2.5-flash-preview-tts",
    contents="Hello, world!",
    config=types.GenerateContentConfig(
        response_modalities=["AUDIO"],
        speech_config=types.SpeechConfig(
            voice_config=types.VoiceConfig(
                prebuilt_voice_config=types.PrebuiltVoiceConfig(voice_name='Kore')
            )
        )
    )
)
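The generate_content call returns audio bytes rather than a playable file, so a small helper for writing them to disk is often useful. The sketch below assumes the payload is raw 16-bit mono PCM at 24 kHz and that it sits in the first part of the first candidate; both are assumptions worth checking against the TTS reference. `write_wav` is a hypothetical helper name, not part of the SDK.

```python
import wave


def write_wav(path, pcm, channels=1, rate=24000, sample_width=2):
    """Wrap raw PCM bytes in a WAV container.

    The defaults (mono, 24 kHz, 16-bit) are assumptions about the
    Gemini-TTS output format; adjust them if the reference says otherwise.
    """
    with wave.open(path, "wb") as wf:
        wf.setnchannels(channels)
        wf.setsampwidth(sample_width)
        wf.setframerate(rate)
        wf.writeframes(pcm)


# Usage with a real `response` (assumed response layout):
# pcm = response.candidates[0].content.parts[0].inline_data.data
# write_wav("hello.wav", pcm)
```

Keeping the format parameters as keyword arguments makes it easy to correct the sample rate or width without touching the call sites.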
2. Transcribe Audio (Chirp 3)
python
# Requires google-cloud-speech
from google.cloud import speech_v2
# ... (See stt.md for full setup)
response = speech_client.recognize(...)
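For orientation, here is one way the elided setup might look. It is a minimal sketch and not a substitute for stt.md. The function names, the default location `us`, the `en-US` language code, and the `chirp_3` model identifier are all assumptions chosen for illustration. The SDK imports are deferred into `transcribe` so the path helpers can be used without google-cloud-speech installed.

```python
def recognizer_path(project_id, location="us"):
    """Resource name of the implicit default recognizer ("_")."""
    return f"projects/{project_id}/locations/{location}/recognizers/_"


def regional_endpoint(location="us"):
    """API endpoint for a location; "global" uses the unprefixed host."""
    if location == "global":
        return "speech.googleapis.com"
    return f"{location}-speech.googleapis.com"


def transcribe(audio_bytes, project_id, location="us", language="en-US"):
    """Send audio bytes to Speech-to-Text V2 and return the joined transcript."""
    # Imported lazily: requires google-cloud-speech and valid credentials.
    from google.api_core.client_options import ClientOptions
    from google.cloud import speech_v2
    from google.cloud.speech_v2.types import cloud_speech

    speech_client = speech_v2.SpeechClient(
        client_options=ClientOptions(api_endpoint=regional_endpoint(location))
    )
    config = cloud_speech.RecognitionConfig(
        auto_decoding_config=cloud_speech.AutoDetectDecodingConfig(),
        language_codes=[language],
        model="chirp_3",  # assumed model id for Chirp 3
    )
    request = cloud_speech.RecognizeRequest(
        recognizer=recognizer_path(project_id, location),
        config=config,
        content=audio_bytes,
    )
    response = speech_client.recognize(request=request)
    return " ".join(
        r.alternatives[0].transcript for r in response.results if r.alternatives
    )
```

The client endpoint and the recognizer path must name the same location, which is why both are derived from the single `location` argument.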