# Invoking Gemini
Delegate tasks to Google's Gemini models when they offer advantages over Claude.
## When to Use Gemini
**Structured outputs:**
- JSON Schema validation with property ordering guarantees
- Pydantic model compliance
- Strict schema adherence (enum values, required fields)

**Cost optimization:**
- Parallel batch processing (Gemini Flash is lightweight)
- High-volume simple tasks
- Budget-constrained operations

**Google ecosystem:**
- Integration with Google services
- Vertex AI workflows
- Google-specific APIs

**Multi-modal tasks:**
- Image analysis with JSON output
- Video processing
- Audio transcription with structure
## Available Models
**`gemini-2.0-flash-exp`** (recommended):
- Fast, cost-effective
- Native JSON Schema support
- Good for structured outputs

**`gemini-1.5-pro`:**
- More capable reasoning
- Better for complex tasks
- Higher cost

**`gemini-1.5-flash`:**
- Balanced speed/quality
- Good for most tasks

See `references/models.md` for full model details.
## Setup
Prerequisites:

1. Install google-generativeai:

   ```bash
   uv pip install google-generativeai pydantic
   ```

2. Configure the API key via a project knowledge file:

   **Option 1 (recommended): individual file**
   - Create document: `GOOGLE_API_KEY.txt`
   - Content: your API key (e.g., `AIzaSy...`)

   **Option 2: combined file**
   - Create document: `API_CREDENTIALS.json`
   - Content:

     ```json
     { "google_api_key": "AIzaSy..." }
     ```

Get your API key: https://console.cloud.google.com/apis/credentials
## Basic Usage
Import the client:

```python
import sys
sys.path.append('/mnt/skills/invoking-gemini/scripts')
from gemini_client import invoke_gemini
```

### Simple prompt
```python
response = invoke_gemini(
    prompt="Explain quantum computing in 3 bullet points",
    model="gemini-2.0-flash-exp"
)
print(response)
```

## Structured Output
Use Pydantic models for guaranteed JSON Schema compliance:

```python
from pydantic import BaseModel, Field
from gemini_client import invoke_with_structured_output

class BookAnalysis(BaseModel):
    title: str
    genre: str = Field(description="Primary genre")
    key_themes: list[str] = Field(max_length=5)
    rating: int = Field(ge=1, le=5)

result = invoke_with_structured_output(
    prompt="Analyze the book '1984' by George Orwell",
    pydantic_model=BookAnalysis
)

# result is a BookAnalysis instance
print(result.title)  # "1984"
print(result.genre)  # "Dystopian Fiction"
```
**Advantages over Claude:**
- Guaranteed property ordering in JSON
- Strict enum enforcement
- Native schema validation (no prompt engineering)
- Lower cost for simple extractions
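The enum enforcement can be seen directly in Pydantic: declare the allowed values as an `Enum` and the structured-output schema is constrained to them. A minimal local sketch (the `Genre`/`BookRecord` models are illustrative, not part of the skill):

```python
from enum import Enum
from pydantic import BaseModel, ValidationError

class Genre(str, Enum):
    DYSTOPIAN = "dystopian"
    ROMANCE = "romance"
    MYSTERY = "mystery"

class BookRecord(BaseModel):
    title: str
    genre: Genre

# A response matching the schema validates cleanly
ok = BookRecord.model_validate({"title": "1984", "genre": "dystopian"})

# A value outside the enum is rejected before it reaches your code
try:
    BookRecord.model_validate({"title": "1984", "genre": "sci-fi"})
    rejected = False
except ValidationError:
    rejected = True
```

The same model class works both as the `pydantic_model` argument and as a local validator for responses you receive.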
## Parallel Invocation
Process multiple prompts concurrently:

```python
from gemini_client import invoke_parallel

prompts = [
    "Summarize the plot of Hamlet",
    "Summarize the plot of Macbeth",
    "Summarize the plot of Othello"
]

results = invoke_parallel(
    prompts=prompts,
    model="gemini-2.0-flash-exp"
)

for prompt, result in zip(prompts, results):
    print(f"Q: {prompt[:30]}...")
    print(f"A: {result[:100]}...\n")
```

Use cases:
- Batch classification tasks
- Data labeling
- Multiple independent analyses
- A/B testing prompts
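Internally, a helper like `invoke_parallel` can be as simple as a thread pool over the blocking client call, since the work is network-bound. A sketch under that assumption (not the client's actual implementation), demonstrated with a stand-in for the real API call:

```python
from concurrent.futures import ThreadPoolExecutor

def run_parallel(invoke_fn, prompts, max_workers=8):
    """Run invoke_fn over each prompt concurrently, preserving input order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(invoke_fn, prompts))

# Demo with a stand-in for invoke_gemini:
results = run_parallel(lambda p: f"summary of {p}", ["Hamlet", "Macbeth"])
```

`pool.map` returns results in input order regardless of which call finishes first, which is what makes the `zip(prompts, results)` pattern above safe.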
## Error Handling
The client handles common errors:

```python
from gemini_client import invoke_gemini

response = invoke_gemini(
    prompt="Your prompt here",
    model="gemini-2.0-flash-exp"
)

if response is None:
    print("Error: API call failed")
    # Check the project knowledge file for a valid google_api_key
```

Common issues:
- Missing API key → Add `GOOGLE_API_KEY.txt` to project knowledge (see Setup above)
- Invalid model → Raises ValueError
- Rate limit → Automatically retries with backoff
- Network error → Returns None after retries
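The retry-with-backoff behavior can be reproduced around any client call. A sketch of the pattern (not the client's actual code; attempt counts and delays are illustrative):

```python
import time

def invoke_with_retry(invoke_fn, prompt, retries=3, base_delay=1.0):
    """Call invoke_fn(prompt), retrying with exponential backoff.

    Returns None after all attempts fail, matching the documented
    behavior for network errors."""
    for attempt in range(retries):
        try:
            return invoke_fn(prompt)
        except Exception:
            if attempt < retries - 1:
                time.sleep(base_delay * 2 ** attempt)  # 1s, 2s, 4s, ...
    return None
```

Doubling the delay each attempt gives rate limits time to reset without hammering the endpoint.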
## Advanced Features
### Custom Generation Config
```python
response = invoke_gemini(
    prompt="Write a haiku",
    model="gemini-2.0-flash-exp",
    temperature=0.9,
    max_output_tokens=100,
    top_p=0.95
)
```

### Multi-modal Input
```python
# Image analysis with structured output
from pydantic import BaseModel
from gemini_client import invoke_with_structured_output

class ImageDescription(BaseModel):
    objects: list[str]
    scene: str
    colors: list[str]

result = invoke_with_structured_output(
    prompt="Describe this image",
    pydantic_model=ImageDescription,
    image_path="/mnt/user-data/uploads/photo.jpg"
)
```

See [references/advanced.md](references/advanced.md) for more patterns.
## Comparison: Gemini vs Claude
**Use Gemini when:**
- Structured output is the primary goal
- Cost is a constraint
- Property ordering matters
- Batch processing many simple tasks

**Use Claude when:**
- Complex reasoning is required
- Long context is needed (200K tokens)
- Code generation quality matters
- Nuanced instruction following is important

**Use both:**
- Claude for planning/reasoning
- Gemini for structured extraction
- Parallel workflows with different strengths
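When scripting hybrid workflows, these criteria can be collapsed into a small routing function. The helper below is purely illustrative (neither the name `choose_model` nor its flags exist in `gemini_client`):

```python
def choose_model(needs_reasoning=False, structured=False, high_volume=False):
    """Toy router over the guidance above: reasoning wins, then cost/structure."""
    if needs_reasoning:
        return "claude"
    if structured or high_volume:
        return "gemini-2.0-flash-exp"
    return "claude"  # default to the stronger generalist
```

The ordering encodes the guidance: delegate to Gemini only when the task is structured or high-volume and does not need deep reasoning.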
## Token Efficiency Pattern
Gemini Flash is cost-effective for sub-tasks:

```python
from gemini_client import invoke_with_structured_output

# Claude (you) plans the approach;
# Gemini executes the structured extractions.
# ContactInfo is a Pydantic model you define for the fields you need.
data_points = []
for file in uploaded_files:
    # Gemini extracts structured data
    result = invoke_with_structured_output(
        prompt=f"Extract contact info from {file}",
        pydantic_model=ContactInfo
    )
    data_points.append(result)

# Claude synthesizes the results
# ... your analysis here ...
```

## Limitations
**Not suitable for:**
- Tasks requiring deep reasoning
- Long context (>1M tokens)
- Complex code generation
- Subjective creative writing

**Token limits:**
- gemini-2.0-flash-exp: ~1M input tokens
- gemini-1.5-pro: ~2M input tokens

**Rate limits:**
- Vary by API tier
- Client handles automatic retry
## Examples

See `references/examples.md` for:
- Data extraction from documents
- Batch classification
- Multi-modal analysis
- Hybrid Claude+Gemini workflows
## Troubleshooting
**"API key not configured":**
- Add a project knowledge file `GOOGLE_API_KEY.txt` with your API key
- Or add to `API_CREDENTIALS.json`: `{"google_api_key": "AIzaSy..."}`
- See the Setup section above for details

**Import errors:**

```bash
uv pip install google-generativeai pydantic
```

**Schema validation failures:**
- Check Pydantic model definitions
- Ensure the prompt is clear about the expected structure
- Add examples to the prompt if needed
## Cost Comparison
Approximate pricing (as of 2024):

**Gemini 2.0 Flash:**
- Input: $0.15 / 1M tokens
- Output: $0.60 / 1M tokens

**Claude Sonnet:**
- Input: $3.00 / 1M tokens
- Output: $15.00 / 1M tokens

For 1000 simple extraction tasks (100 tokens each):
- Gemini Flash: ~$0.10
- Claude Sonnet: ~$2.00

**Strategy:** Use Claude for complex reasoning, Gemini for high-volume simple tasks.
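The estimates follow directly from the per-token prices; assuming roughly 100 input and 100 output tokens per task, the arithmetic is:

```python
def batch_cost(n_tasks, in_tokens, out_tokens, in_price, out_price):
    """Total USD for a batch; prices are per 1M tokens."""
    return n_tasks * (in_tokens * in_price + out_tokens * out_price) / 1e6

gemini_flash = batch_cost(1000, 100, 100, 0.15, 0.60)   # ~$0.08
claude_sonnet = batch_cost(1000, 100, 100, 3.00, 15.00)  # ~$1.80
```

At this scale the ratio is roughly 24x, dominated by the output-token price.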