gemini-image-generator
Gemini Image Generator
Operator Context
This skill acts as an operator for CLI-based image generation: it configures Claude to run a deterministic Python script against the Google Gemini API. It follows an Execute-Verify pattern (validate the environment, generate the image, verify the output), with domain knowledge built into model selection and prompt engineering.
Hardcoded Behaviors (Always Apply)
- CLAUDE.md Compliance: Read and follow repository CLAUDE.md files
- Over-Engineering Prevention: Only generate what is directly requested
- Exact Model Names: Use only `gemini-2.5-flash-image` or `gemini-3-pro-image-preview`; no variations, no date suffixes
- API Key Validation: Always verify that `GEMINI_API_KEY` exists before any generation attempt
- Output Verification: Confirm the output file exists and is non-zero bytes after generation
- Absolute Paths: Always use absolute paths for output files
Default Behaviors (ON unless disabled)
- Show Complete Output: Display full script output, never summarize
- Rate Limit Handling: Wait between requests to avoid 429 errors
- Retry on Failure: Retry transient failures with exponential backoff (3 attempts)
- Status Reporting: Output structured status for Claude to parse
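The retry default can be pictured as a plain exponential-backoff loop. This is a minimal sketch, not the code inside `generate_image.py`. The `TransientError` class, the 2-second `base_delay`, and the injectable `sleep` argument are illustrative assumptions. Only the limit of three attempts comes from this skill.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures such as HTTP 429."""


def retry_with_backoff(func, max_attempts=3, base_delay=2.0, sleep=time.sleep):
    """Call func() up to max_attempts times, doubling the wait after each failure.

    Only TransientError triggers a retry; any other exception propagates at once.
    The last TransientError is re-raised when all attempts are used up.
    """
    for attempt in range(max_attempts):
        try:
            return func()
        except TransientError:
            if attempt == max_attempts - 1:
                raise  # out of attempts: let the caller report the failure
            sleep(base_delay * 2 ** attempt)  # waits 2s, then 4s, ...
```

Passing `sleep=lambda s: None` makes the loop testable without real waiting.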
Optional Behaviors (OFF unless enabled)
- Watermark Removal: Clean watermarks from corners with `--remove-watermark`
- Background Transparency: Make solid backgrounds transparent with `--transparent-bg`
- Batch Mode: Generate multiple images from a prompt file with `--batch`
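`--transparent-bg` only works when the background is a single flat color. A common way to implement it is color keying: every pixel close enough to the background color gets alpha 0. The sketch below is an assumption about how such post-processing can work, not the script's actual code. It uses the documented defaults (`#3a3a3a`, tolerance 30) and assumes a per-channel tolerance rule, which may differ from what `--bg-tolerance` actually measures.

```python
from PIL import Image


def hex_to_rgb(hex_color):
    """Convert '#3a3a3a' to (58, 58, 58)."""
    h = hex_color.lstrip("#")
    return tuple(int(h[i:i + 2], 16) for i in (0, 2, 4))


def make_background_transparent(img, bg_color="#3a3a3a", tolerance=30):
    """Return an RGBA copy in which pixels close to bg_color become fully transparent.

    Assumed matching rule: every RGB channel within `tolerance` of the target.
    """
    target = hex_to_rgb(bg_color)
    rgba = img.convert("RGBA")  # always a new image, so the input is untouched
    px = rgba.load()
    width, height = rgba.size
    for y in range(height):
        for x in range(width):
            r, g, b, _ = px[x, y]
            if all(abs(c - t) <= tolerance for c, t in zip((r, g, b), target)):
                px[x, y] = (r, g, b, 0)
    return rgba
```

The pure-Python pixel loop is slow on large images and is written for clarity. It also keys out matching pixels inside the subject, not only the background.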
What This Skill CAN Do
- Generate images from text prompts via CLI using Gemini APIs
- Select between fast (`gemini-2.5-flash-image`) and quality (`gemini-3-pro-image-preview`) models
- Save images to specified file paths with automatic directory creation
- Remove watermarks from generated images via post-processing
- Make solid-color backgrounds transparent for game sprites and assets
- Generate multiple images in batch mode from a prompt file
- Retry on transient failures with exponential backoff
What This Skill CANNOT Do
- Build web applications with image generation (use `nano-banana-builder` instead)
- Use non-Gemini models (DALL-E, Midjourney, Stable Diffusion)
- Fine-tune or train models
- Generate video or audio content
- Bypass content policy restrictions
- Edit or modify existing images (generation only)
Instructions
Phase 1: ENVIRONMENT
Goal: Verify all prerequisites before attempting generation.
Step 1: Validate API key

```bash
echo "GEMINI_API_KEY is ${GEMINI_API_KEY:+set}"
```

Expect: `GEMINI_API_KEY is set`. If the line ends after "is", the variable is unset or empty; instruct the user to configure it.
Step 2: Verify dependencies

```bash
python3 -c "from google import genai; from PIL import Image; print('OK')"
```

If either import fails, install the dependencies:

```bash
pip install google-genai Pillow
```

Step 3: Determine output path
Use an absolute path for the output file. Verify the parent directory exists or will be created.
Gate: API key is set, dependencies installed, output path is valid. Proceed only when gate passes.
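The Phase 1 checks can also run as a single Python function. This is a minimal sketch that assumes you want a list of problems rather than an exception. `preflight` and `missing_modules` are hypothetical helper names, and the module names mirror the import check in Step 2.

```python
import importlib.util
import os
from pathlib import Path


def missing_modules(names):
    """Return the module names that importlib cannot locate."""
    missing = []
    for name in names:
        try:
            found = importlib.util.find_spec(name) is not None
        except ModuleNotFoundError:
            # Raised for dotted names whose parent package (e.g. "google") is absent.
            found = False
        if not found:
            missing.append(name)
    return missing


def preflight(output_path, env=None, modules=("google.genai", "PIL")):
    """Return a list of problems; an empty list means the Phase 1 gate passes."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("GEMINI_API_KEY"):  # unset and empty both fail
        problems.append("GEMINI_API_KEY is not set or empty")
    for name in missing_modules(modules):
        problems.append(f"missing module {name}; run: pip install google-genai Pillow")
    if not Path(output_path).is_absolute():
        problems.append(f"output path is not absolute: {output_path}")
    return problems
```

The parent directory is not required to exist, because the script creates output directories automatically.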
Phase 2: CONFIGURE
Goal: Select the correct model and options for the request.
Step 1: Select model
| Scenario | Model | Why |
|---|---|---|
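| Iterating on prompt, drafts | `gemini-2.5-flash-image` | Fast feedback (2-5s) |
| Final quality asset | `gemini-3-pro-image-preview` | Best quality, 2K resolution |
| Game sprites, batch work | `gemini-2.5-flash-image` | Cost effective, consistent |
| Text in image, typography | `gemini-3-pro-image-preview` | Better text rendering |
| Product photography | `gemini-3-pro-image-preview` | Detail matters |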
CRITICAL: Use ONLY these exact model strings. Do not invent, guess, or add date suffixes.
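| Correct (use exactly) | WRONG (never use) |
|---|---|
| `gemini-2.5-flash-image` | `gemini-2.5-flash-image-preview-12-25` |
| `gemini-3-pro-image-preview` | `gemini-3-flash-image` |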
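Model names are exact strings, so the safest check is plain set membership, with no normalization or prefix matching. The sketch below also encodes the scenario table as a lookup. The scenario keys (`draft`, `batch`, ...) are hypothetical labels. The mapping assumes the fast/quality split this skill describes: drafts and batch work use `gemini-2.5-flash-image`, while final assets, in-image text, and product shots use `gemini-3-pro-image-preview`.

```python
FAST_MODEL = "gemini-2.5-flash-image"
QUALITY_MODEL = "gemini-3-pro-image-preview"
VALID_MODELS = frozenset({FAST_MODEL, QUALITY_MODEL})

# Hypothetical scenario labels mapped to the two allowed models.
SCENARIO_MODELS = {
    "draft": FAST_MODEL,       # iterating on a prompt
    "batch": FAST_MODEL,       # game sprites, bulk work
    "final": QUALITY_MODEL,    # final quality asset
    "text": QUALITY_MODEL,     # typography in the image
    "product": QUALITY_MODEL,  # product photography
}


def validate_model(name):
    """Return name if it is exactly one of the two valid strings, else raise."""
    if name not in VALID_MODELS:
        raise ValueError(
            f"invalid model {name!r}; use exactly one of {sorted(VALID_MODELS)}"
        )
    return name


def select_model(scenario):
    """Look up the model for a scenario label; unknown labels are an error."""
    if scenario not in SCENARIO_MODELS:
        raise ValueError(f"unknown scenario {scenario!r}")
    return validate_model(SCENARIO_MODELS[scenario])
```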
Step 2: Compose prompt
Follow this structure:
`[Subject] [Style] [Background] [Constraints]`

For transparent background post-processing, include:
- "solid dark gray background" or "solid uniform gray background (#3a3a3a)"
- "no background elements or scenery"
Always include negative constraints: "no text", "no labels", "character only"
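This structure can be assembled mechanically. The sketch below uses a hypothetical helper, `compose_prompt`. The comma-joined format is an assumption (the skill only fixes the order of the parts), and the default negatives are the ones listed above.

```python
DEFAULT_NEGATIVES = ("no text", "no labels", "character only")
# Background wording suggested above for --transparent-bg post-processing.
TRANSPARENCY_BACKGROUND = ("solid uniform gray background (#3a3a3a)",
                           "no background elements or scenery")


def compose_prompt(subject, style, background="", constraints=(),
                   transparent_bg=False, negatives=DEFAULT_NEGATIVES):
    """Build a prompt in the order [Subject] [Style] [Background] [Constraints]."""
    if transparent_bg:
        # Override the background with a flat color that is easy to key out.
        background_parts = list(TRANSPARENCY_BACKGROUND)
    else:
        background_parts = [background]
    parts = [subject, style, *background_parts, *constraints, *negatives]
    cleaned = [p.strip() for p in parts if p and p.strip()]
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return ", ".join(dict.fromkeys(cleaned))
```

For example, `compose_prompt("a knight with a sword", "pixel art", transparent_bg=True)` yields a prompt that ends with the gray-background wording and the three default negatives.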
Step 3: Determine post-processing flags
- Need watermark removal? Add `--remove-watermark`
- Need transparent background? Add `--transparent-bg`
- Custom background color? Add `--bg-color "#FFFFFF" --bg-tolerance 20`
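The flag decisions map directly onto the script's command line. The sketch below builds the argument list for `subprocess`. `build_command` is a hypothetical helper, and it assumes the toolkit is installed at the `$HOME/claude-code-toolkit/...` path used in Phase 3.

```python
from pathlib import Path

SCRIPT = (Path.home() / "claude-code-toolkit" / "skills" / "gemini-image-generator"
          / "scripts" / "generate_image.py")


def build_command(prompt, output, model="gemini-3-pro-image-preview",
                  remove_watermark=False, transparent_bg=False,
                  bg_color=None, bg_tolerance=None):
    """Return an argv list for generate_image.py; optional flags only when requested."""
    cmd = ["python3", str(SCRIPT), "--prompt", prompt,
           "--output", output, "--model", model]
    if remove_watermark:
        cmd.append("--remove-watermark")
    if transparent_bg:
        cmd.append("--transparent-bg")
    if bg_color is not None:  # custom background color to key out
        cmd += ["--bg-color", bg_color]
    if bg_tolerance is not None:
        cmd += ["--bg-tolerance", str(bg_tolerance)]
    return cmd
```

Passing a list rather than a shell string means the prompt needs no shell quoting. To print a copy-pasteable shell line, use `shlex.join(build_command(...))`.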
Gate: Model selected, prompt composed, flags determined. Proceed only when gate passes.
Phase 3: GENERATE
Goal: Execute the generation script and capture output.
Step 1: Run generation

```bash
python3 $HOME/claude-code-toolkit/skills/gemini-image-generator/scripts/generate_image.py \
  --prompt "YOUR_PROMPT_HERE" \
  --output /absolute/path/to/output.png \
  --model gemini-3-pro-image-preview
```

For batch mode:

```bash
python3 $HOME/claude-code-toolkit/skills/gemini-image-generator/scripts/generate_image.py \
  --batch /path/to/prompts.txt \
  --output-dir /absolute/path/to/output/ \
  --model gemini-2.5-flash-image
```

Step 2: Read script output
Check the output for `SUCCESS` or `ERROR`. If the request is rate limited (429), the script retries automatically.
Gate: The script exited with code 0 and printed `SUCCESS`. Proceed only when the gate passes.
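A small wrapper can enforce this gate together with the documented exit codes. The sketch below assumes the script writes its `SUCCESS`/`ERROR` marker to stdout or stderr. `run_and_check` is a hypothetical helper, and the exit-code meanings are those listed under Script Reference.

```python
import subprocess

# Exit codes documented for generate_image.py.
EXIT_CODES = {0: "success", 1: "missing API key",
              2: "generation failed", 3: "invalid arguments"}


def run_and_check(cmd):
    """Run cmd, print its complete output, and report whether the gate passed.

    The gate passes only when the exit code is 0 AND the output contains SUCCESS.
    """
    result = subprocess.run(cmd, capture_output=True, text=True)
    output = result.stdout + result.stderr
    print(output)  # show the full output; never summarize it
    meaning = EXIT_CODES.get(result.returncode, "unknown exit code")
    passed = result.returncode == 0 and "SUCCESS" in output
    return passed, result.returncode, meaning
```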
Phase 4: VERIFY
Goal: Confirm the output file exists and is valid.
Step 1: Verify file exists

```bash
ls -la /absolute/path/to/output.png
```

The file must exist and have non-zero size.
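The file checks and the dimension check can be combined into one function. The sketch below uses Pillow, which is already a dependency from Phase 1. `verify_output` is a hypothetical helper, and it does not replace the mandatory visual inspection: a structurally valid PNG can still have the wrong content.

```python
from pathlib import Path

from PIL import Image


def verify_output(path):
    """Check that path is absolute, exists, is non-empty, and opens as an image.

    Returns a small report dict; raises on the first failed check.
    """
    p = Path(path)
    if not p.is_absolute():
        raise ValueError(f"output path is not absolute: {p}")
    if not p.is_file():
        raise FileNotFoundError(f"output file not found: {p}")
    size_bytes = p.stat().st_size
    if size_bytes == 0:
        raise ValueError(f"output file is empty (0 bytes): {p}")
    with Image.open(p) as img:
        img.verify()  # integrity check; the image object is unusable afterwards
    with Image.open(p) as img:  # reopen to read dimensions and mode
        return {"path": str(p), "bytes": size_bytes,
                "size": img.size, "mode": img.mode}
```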
Step 2: Check dimensions (optional)

```bash
python3 -c "from PIL import Image; img = Image.open('/absolute/path/to/output.png'); print(f'Size: {img.size}, Mode: {img.mode}')"
```

Step 3: Visual inspection (MANDATORY)
Read the generated image file using the Read tool to visually inspect it:
Read the image at `/absolute/path/to/output.png`

Check for:
- Content matches the prompt intent (correct subject, layout, composition)
- No unwanted watermarks, logos, or artifacts
- Text renders correctly (if text was requested)
- Appropriate aspect ratio and framing
- No excessive empty space or dark padding that needs cropping
If the image fails visual inspection, regenerate with an adjusted prompt before reporting to the user. Do not commit or deliver images without visual verification.
Step 4: Report result
Provide the user with:
- Output file path
- Image dimensions
- Model used
- Visual verification status (what you checked and confirmed)
- Any post-processing applied (cropping, resizing)
Gate: Output file exists with non-zero size AND visual inspection passed. Generation is complete.
Script Reference
generate_image.py
Location: `$HOME/claude-code-toolkit/skills/gemini-image-generator/scripts/generate_image.py`

| Argument | Required | Description |
|---|---|---|
| `--prompt` | Yes* | Text prompt for image generation |
| `--output` | Yes* | Output file path (.png) |
| `--model` | No | Model name (default: `gemini-3-pro-image-preview`) |
| `--remove-watermark` | No | Remove watermarks from corners |
| `--transparent-bg` | No | Make background transparent |
| `--bg-color` | No | Background color hex (default: #3a3a3a) |
| `--bg-tolerance` | No | Color matching tolerance (default: 30) |
| `--batch` | No | File with prompts (one per line) |
| `--output-dir` | No | Directory for batch output |
|  | No | Max retry attempts (default: 3) |
| `--delay` | No | Delay between batch requests in seconds (default: 3) |

*Required unless using `--batch` + `--output-dir`

Exit Codes: 0 = success, 1 = missing API key, 2 = generation failed, 3 = invalid arguments
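For reference, the documented interface, including the rule that `--prompt`/`--output` are required unless `--batch` and `--output-dir` are given, can be expressed with `argparse`. This is a hypothetical reconstruction, not the script's source. The flag name `--max-retries` is an assumption, because the documentation does not give a name for the max-retry option. The custom `error` method only maps argparse's default exit status 2 onto the documented code 3 for invalid arguments.

```python
import argparse
import sys

MODELS = ("gemini-2.5-flash-image", "gemini-3-pro-image-preview")


class CliParser(argparse.ArgumentParser):
    def error(self, message):
        # argparse exits with status 2 by default, which this script reserves
        # for "generation failed"; exit with 3 ("invalid arguments") instead.
        self.print_usage(sys.stderr)
        self.exit(3, f"{self.prog}: error: {message}\n")


def parse_args(argv=None):
    p = CliParser(description="Sketch of the documented generate_image.py CLI")
    p.add_argument("--prompt")
    p.add_argument("--output")
    p.add_argument("--model", choices=MODELS, default="gemini-3-pro-image-preview")
    p.add_argument("--remove-watermark", action="store_true")
    p.add_argument("--transparent-bg", action="store_true")
    p.add_argument("--bg-color", default="#3a3a3a")
    p.add_argument("--bg-tolerance", type=int, default=30)
    p.add_argument("--batch", help="file with one prompt per line")
    p.add_argument("--output-dir")
    p.add_argument("--max-retries", type=int, default=3)  # hypothetical flag name
    p.add_argument("--delay", type=float, default=3.0)
    args = p.parse_args(argv)
    single = args.prompt and args.output
    batch = args.batch and args.output_dir
    if not (single or batch):
        p.error("give --prompt and --output, or --batch and --output-dir")
    return args
```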
Error Handling
Error: "GEMINI_API_KEY not set"
Cause: Environment variable missing or empty
Solution:
- Set the variable: `export GEMINI_API_KEY="your-key"`
- In a CI/CD environment, check the secrets configuration
- Verify the key is valid by testing with a simple prompt
Error: "Rate limit exceeded (429)"
Cause: Too many requests to the Gemini API in a short period
Solution:
- The script retries automatically with exponential backoff
- If the error persists after retries, wait 60 seconds and try again
- For batch operations, increase `--delay` to 5-10 seconds
- Consider switching to `gemini-2.5-flash-image` for higher throughput
Error: "No image in response"
Cause: API returned text-only response or generation was blocked
Solution:
- Add more detail to the prompt — vague prompts sometimes fail
- Try a different model
- Check that the prompt does not violate content policy
- Verify that the script sets `response_modalities=["IMAGE", "TEXT"]`
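When you check the script for this error, it helps to know where image data sits in a response. The sketch below assumes the `google-genai` response shape: `response.candidates[i].content.parts`, where an image part carries `inline_data` with `data` bytes and a `mime_type`. `extract_image_bytes` is a hypothetical helper for reviewing or debugging the script, not a replacement for it. It returns `None` exactly in the "No image in response" case.

```python
def extract_image_bytes(response):
    """Return the first image payload in a Gemini response, or None if there is none.

    Assumed shape (google-genai): response.candidates[i].content.parts, where an
    image part has part.inline_data.data (bytes) and part.inline_data.mime_type.
    The request is expected to ask for images explicitly, e.g.
        types.GenerateContentConfig(response_modalities=["IMAGE", "TEXT"])
    """
    for candidate in getattr(response, "candidates", None) or []:
        content = getattr(candidate, "content", None)
        for part in getattr(content, "parts", None) or []:
            blob = getattr(part, "inline_data", None)
            mime = getattr(blob, "mime_type", None) or ""
            if blob is not None and mime.startswith("image/") and blob.data:
                return blob.data
    return None  # text-only or blocked response: "No image in response"
```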
Error: "Content policy violation (400)"
Cause: Prompt contains restricted content or triggers safety filters
Solution:
- Remove potentially problematic terms from the prompt
- Rephrase the request using neutral language
- This is an API-side restriction and cannot be bypassed
Anti-Patterns
Anti-Pattern 1: Inventing Model Names
What it looks like: `model="gemini-2.5-flash-image-preview-12-25"` or `model="gemini-3-flash-image"`
Why wrong: These models do not exist. Date suffixes are for text models only, and the API returns cryptic errors for invalid names.
Do instead: Use exactly `gemini-2.5-flash-image` or `gemini-3-pro-image-preview`. No variations.
Anti-Pattern 2: Skipping Environment Validation
What it looks like: Running `generate_image.py` without checking the API key or dependencies first
Why wrong: Produces confusing error messages, and environment problems get debugged as if they were generation bugs.
Do instead: Complete Phase 1 (ENVIRONMENT) before any generation attempt. Always.
Anti-Pattern 3: Generating Without Visual Verification
What it looks like: Running the script, checking file size, and committing the image without reading it to visually inspect
Why wrong: The file may exist with correct dimensions but contain watermarks, wrong composition, excessive padding, or content that doesn't match the prompt. A 952KB PNG with a cat watermark and wrong aspect ratio passed file-exists checks but looked bad in the README.
Do instead: Complete Phase 4 (VERIFY) including Step 3 (visual inspection). Read the image file with the Read tool. Check composition, content, and artifacts before delivering or committing.
Anti-Pattern 4: Writing Custom Generation Code Instead of Using the Script
What it looks like: Writing inline Python to call the Gemini API directly instead of using `generate_image.py`
Why wrong: Misses the retry logic, rate limiting, post-processing, model validation, and error handling already built into the script.
Do instead: Always use `generate_image.py`; it already handles these cases.
Anti-Pattern 5: Storing Base64 in Memory Instead of Saving to File
What it looks like: Keeping image data in a variable without writing to disk
Why wrong: Data is lost on exit, cannot be used by other tools, wastes memory for large images.
Do instead: Save to file immediately. The script does this automatically.
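The script already writes images to disk as soon as they arrive. The sketch below only illustrates the same practice for any helper code that handles image data. `save_image` is a hypothetical name, and accepting either raw bytes or a base64 string is an assumption about what the caller holds.

```python
import base64
from pathlib import Path


def save_image(data, output_path):
    """Write image bytes (or a base64 string) to disk and return the absolute path.

    Parent directories are created as needed, so nothing is held only in memory.
    """
    path = Path(output_path).expanduser().resolve()
    path.parent.mkdir(parents=True, exist_ok=True)
    raw = base64.b64decode(data, validate=True) if isinstance(data, str) else data
    path.write_bytes(raw)
    return path
```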
References
This skill uses these shared patterns:
- Verification Checklist - Pre-completion checks
Reference Files
- `${CLAUDE_SKILL_DIR}/references/prompts.md`: Categorized example prompts by use case (game art, characters, product photography, pixel art, icons)
Prompt Engineering Quick Reference
Effective prompt structure:
`[Subject] [Style] [Background] [Constraints]`

For transparent background post-processing:
- Use "solid dark gray background" or "solid uniform gray background (#3a3a3a)"
- Include "no background elements or scenery" and "no ground shadows"
- Combine with the `--transparent-bg` flag
For clean edges: "clean edges", "sharp outlines", "heavy ink outlines"
Negative constraints: Always include "no text", "no labels", "no watermarks", "character only"
Domain-Specific Anti-Rationalization
| Rationalization | Why It's Wrong | Required Action |
|---|---|---|
| "I know the right model name" | Model names are exact strings, not patterns | Check the two valid names |
| "Output file was probably created" | "Probably" is not verified | Run `ls -la` on the output path |
| "API key is probably set" | Silent failures waste debugging time | Check explicitly in Phase 1 |
| "Custom code is faster than the script" | Script has retry, rate limiting, validation | Use `generate_image.py` |