gemini-image-generator

Gemini Image Generator

Operator Context

This skill acts as an operator for CLI-based image generation, configuring Claude's behavior for deterministic Python script execution against Google Gemini APIs. It implements an Execute-Verify pattern (validate environment, generate image, verify output), with Domain Intelligence embedded in model selection and prompt engineering.

Hardcoded Behaviors (Always Apply)

CLAUDE.md Compliance: Read and follow repository CLAUDE.md files
Over-Engineering Prevention: Only generate what is directly requested
Exact Model Names: Use only `gemini-2.5-flash-image` or `gemini-3-pro-image-preview`, with no variations and no date suffixes
API Key Validation: Always verify that `GEMINI_API_KEY` exists before any generation attempt
Output Verification: Confirm output file exists and is non-zero bytes after generation
Absolute Paths: Always use absolute paths for output files

Default Behaviors (ON unless disabled)

Show Complete Output: Display full script output, never summarize
Rate Limit Handling: Wait between requests to avoid 429 errors
Retry on Failure: Retry transient failures with exponential backoff (3 attempts)
Status Reporting: Output structured status for Claude to parse

Optional Behaviors (OFF unless enabled)

Watermark Removal: Clean watermarks from corners with `--remove-watermark`
Background Transparency: Make solid backgrounds transparent with `--transparent-bg`
Batch Mode: Generate multiple images from a prompt file with `--batch`

What This Skill CAN Do

Generate images from text prompts via CLI using Gemini APIs
Select between fast (`gemini-2.5-flash-image`) and quality (`gemini-3-pro-image-preview`) models
Save images to specified file paths with automatic directory creation
Remove watermarks from generated images via post-processing
Make solid-color backgrounds transparent for game sprites and assets
Generate multiple images in batch mode from a prompt file
Retry on transient failures with exponential backoff

What This Skill CANNOT Do

Build web applications with image generation (use `nano-banana-builder` instead)
Use non-Gemini models (DALL-E, Midjourney, Stable Diffusion)
Fine-tune or train models
Generate video or audio content
Bypass content policy restrictions
Edit or modify existing images (generation only)

Instructions

Phase 1: ENVIRONMENT

Goal: Verify all prerequisites before attempting generation.

Step 1: Validate API key

```bash
echo "GEMINI_API_KEY is ${GEMINI_API_KEY:+set}"
```

Expect `GEMINI_API_KEY is set`. If the variable is not set, instruct the user to configure it.

Step 2: Verify dependencies

```bash
python3 -c "from google import genai; from PIL import Image; print('OK')"
```

If missing, install:

```bash
pip install google-genai Pillow
```

Step 3: Determine output path

Use an absolute path for the output file. Verify the parent directory exists or will be created.

Gate: API key is set, dependencies installed, output path is valid. Proceed only when gate passes.
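
The three checks in this phase can also be run together from Python. The following is a minimal sketch, not part of the skill itself: the function name `preflight` and its parameters are hypothetical, and it only checks that the modules can be located, not that they import cleanly.

```python
import importlib.util
import os


def preflight(env, output_path, modules=("google.genai", "PIL")):
    """Return a list of problems; an empty list means the Phase 1 gate passes."""
    problems = []
    # An empty value counts as unset, matching the shell ${VAR:+set} check.
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    for name in modules:
        try:
            found = importlib.util.find_spec(name) is not None
        except ImportError:
            # Raised when a parent package (for example "google") is missing.
            found = False
        if not found:
            problems.append(f"missing module: {name}")
    if not os.path.isabs(output_path):
        problems.append(f"output path is not absolute: {output_path}")
    return problems


if __name__ == "__main__":
    for problem in preflight(os.environ, "/absolute/path/to/output.png"):
        print("FAIL:", problem)
```

Returning a list rather than raising on the first failure lets all missing prerequisites be reported in one pass.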

Phase 2: CONFIGURE

Goal: Select the correct model and options for the request.

Step 1: Select model

Scenario	Model	Why
Iterating on prompt, drafts	`gemini-2.5-flash-image`	Fast feedback (2-5s)
Final quality asset	`gemini-3-pro-image-preview`	Best quality, 2K resolution
Game sprites, batch work	`gemini-2.5-flash-image`	Cost effective, consistent
Text in image, typography	`gemini-3-pro-image-preview`	Better text rendering
Product photography	`gemini-3-pro-image-preview`	Detail matters

CRITICAL: Use ONLY these exact model strings. Do not invent, guess, or add date suffixes.

Correct (use exactly)	WRONG (never use)
`gemini-2.5-flash-image`	`gemini-2.5-flash-preview-05-20` (date suffix)
`gemini-3-pro-image-preview`	`gemini-2.5-pro-image` (doesn't exist)
	`gemini-3-flash-image` (doesn't exist)
	`gemini-pro-vision` (that's image input)
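
Because the rule is an exact-string match, it is easy to enforce mechanically. A small sketch, assuming a hypothetical helper named `validate_model` (the script's own validation logic is not shown in this document):

```python
VALID_MODELS = frozenset({"gemini-2.5-flash-image", "gemini-3-pro-image-preview"})


def validate_model(name):
    """Return the name unchanged if it is one of the two valid strings, else raise."""
    if name not in VALID_MODELS:
        raise ValueError(f"unknown model {name!r}; use one of {sorted(VALID_MODELS)}")
    return name
```

Calling `validate_model("gemini-3-flash-image")` raises `ValueError`, while either of the two valid names is returned as-is.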

Step 2: Compose prompt

Follow this structure:

[Subject] [Style] [Background] [Constraints]

For transparent background post-processing, include:

"solid dark gray background" or "solid uniform gray background (#3a3a3a)"
"no background elements or scenery"

Always include negative constraints: "no text", "no labels", "character only"
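
The structure above can be assembled programmatically. This is an illustrative sketch only: `compose_prompt` is a hypothetical helper, and joining the parts with commas is an assumption, since the skill does not prescribe a separator.

```python
def compose_prompt(subject, style, background,
                   constraints=("no text", "no labels", "character only")):
    """Join the four parts in order: subject, style, background, constraints."""
    parts = [subject, style, background, *constraints]
    # Drop empty parts so optional fields do not leave stray separators.
    return ", ".join(p.strip() for p in parts if p and p.strip())


if __name__ == "__main__":
    print(compose_prompt("pixel art knight", "16-bit style",
                         "solid dark gray background"))
```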

Step 3: Determine post-processing flags

Need watermark removal? Add `--remove-watermark`
Need transparent background? Add `--transparent-bg`
Custom background color? Add `--bg-color "#FFFFFF" --bg-tolerance 20`

Gate: Model selected, prompt composed, flags determined. Proceed only when gate passes.

Phase 3: GENERATE

Goal: Execute the generation script and capture output.

Step 1: Run generation

```bash
python3 $HOME/claude-code-toolkit/skills/gemini-image-generator/scripts/generate_image.py \
  --prompt "YOUR_PROMPT_HERE" \
  --output /absolute/path/to/output.png \
  --model gemini-3-pro-image-preview
```

For batch mode:

```bash
python3 $HOME/claude-code-toolkit/skills/gemini-image-generator/scripts/generate_image.py \
  --batch /path/to/prompts.txt \
  --output-dir /absolute/path/to/output/ \
  --model gemini-2.5-flash-image
```
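
When the command is assembled from Python rather than typed by hand, building it as an argument list avoids shell quoting problems in the prompt. A minimal sketch for the single-image case, assuming a hypothetical helper `build_command`; it only constructs arguments for the provided script and uses only the flags documented here.

```python
import os

# Script location as given in this document; $HOME is expanded at import time.
SCRIPT = os.path.expandvars(
    "$HOME/claude-code-toolkit/skills/gemini-image-generator/scripts/generate_image.py"
)


def build_command(prompt, output, model="gemini-3-pro-image-preview",
                  remove_watermark=False, transparent_bg=False):
    """Build the argv list for a single-image run of generate_image.py."""
    if not os.path.isabs(output):
        raise ValueError("output must be an absolute path")
    cmd = ["python3", SCRIPT, "--prompt", prompt, "--output", output, "--model", model]
    if remove_watermark:
        cmd.append("--remove-watermark")
    if transparent_bg:
        cmd.append("--transparent-bg")
    return cmd
```

The resulting list can be passed directly to `subprocess.run` without `shell=True`.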

Step 2: Read script output

Check for `SUCCESS` or `ERROR` in the output. If rate limited (429), the script handles retry automatically.

Gate: Script exited with code 0 and printed SUCCESS. Proceed only when gate passes.
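
The gate combines two conditions, exit code 0 and a `SUCCESS` marker in the output. A sketch of that check, assuming the marker appears as the literal string `SUCCESS` on stdout or stderr (the exact output format of the script is not shown here); `run_and_check` is a hypothetical name, and the demo uses a stand-in command rather than the real script.

```python
import subprocess
import sys


def run_and_check(cmd):
    """Run cmd and return (passed, output). Passes only on exit 0 plus SUCCESS."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    output = result.stdout + result.stderr
    passed = result.returncode == 0 and "SUCCESS" in output
    return passed, output


if __name__ == "__main__":
    # Stand-in command that mimics a successful run.
    passed, output = run_and_check([sys.executable, "-c", "print('SUCCESS')"])
    print(passed)
    print(output)  # Show the complete output rather than a summary.
```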

Phase 4: VERIFY

Goal: Confirm the output file exists and is valid.

Step 1: Verify file exists

```bash
ls -la /absolute/path/to/output.png
```

File must exist and have non-zero size.
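
The same existence and size check can be expressed in Python when the verification runs inside a larger script. A minimal sketch using only the standard library; `verify_output` is a hypothetical helper name.

```python
import os


def verify_output(path):
    """Return the file size in bytes, raising if the file is missing or empty."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"output file not found: {path}")
    size = os.path.getsize(path)
    if size == 0:
        raise ValueError(f"output file is empty: {path}")
    return size
```

This covers only the file-level check; it does not replace the visual inspection step.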

Step 2: Check dimensions (optional)

```bash
python3 -c "from PIL import Image; img = Image.open('/absolute/path/to/output.png'); print(f'Size: {img.size}, Mode: {img.mode}')"
```

Step 3: Visual inspection (MANDATORY)

Read the generated image file using the Read tool to visually inspect it:

Read the image at /absolute/path/to/output.png

Check for:

Content matches the prompt intent (correct subject, layout, composition)
No unwanted watermarks, logos, or artifacts
Text renders correctly (if text was requested)
Appropriate aspect ratio and framing
No excessive empty space or dark padding that needs cropping

If the image fails visual inspection, regenerate with an adjusted prompt before reporting to the user. Do not commit or deliver images without visual verification.

Step 4: Report result

Provide the user with:

Output file path
Image dimensions
Model used
Visual verification status (what you checked and confirmed)
Any post-processing applied (cropping, resizing)

Gate: Output file exists with non-zero size AND visual inspection passed. Generation is complete.

Script Reference

generate_image.py

Location: `$HOME/claude-code-toolkit/skills/gemini-image-generator/scripts/generate_image.py`

Argument	Required	Description
`--prompt`	Yes*	Text prompt for image generation
`--output`	Yes*	Output file path (.png)
`--model`	No	Model name (default: gemini-3-pro-image-preview)
`--remove-watermark`	No	Remove watermarks from corners
`--transparent-bg`	No	Make background transparent
`--bg-color`	No	Background color hex (default: #3a3a3a)
`--bg-tolerance`	No	Color matching tolerance (default: 30)
`--batch`	No	File with prompts (one per line)
`--output-dir`	No	Directory for batch output
`--retries`	No	Max retry attempts (default: 3)
`--delay`	No	Delay between batch requests in seconds (default: 3)

*Required unless using `--batch` with `--output-dir`

Exit Codes: 0 = success, 1 = missing API key, 2 = generation failed, 3 = invalid arguments
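
A caller that wraps the script can translate these exit codes into messages. A small sketch built from the codes listed above; `describe_exit` is a hypothetical helper name.

```python
EXIT_CODES = {
    0: "success",
    1: "missing API key",
    2: "generation failed",
    3: "invalid arguments",
}


def describe_exit(code):
    """Map a generate_image.py exit code to its documented meaning."""
    return EXIT_CODES.get(code, f"unexpected exit code {code}")
```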

Error Handling

Error: "GEMINI_API_KEY not set"

Cause: Environment variable missing or empty

Solution:

Set the variable: `export GEMINI_API_KEY="your-key"`
If in a CI/CD environment, check secrets configuration
Verify the key is valid by testing with a simple prompt

Error: "Rate limit exceeded (429)"

Cause: Too many requests to the Gemini API in a short period

Solution:

The script retries automatically with exponential backoff
If persistent after retries, wait 60 seconds and try again
For batch operations, increase `--delay` to 5-10 seconds
Consider switching to `gemini-2.5-flash-image` for higher throughput
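
For readers unfamiliar with the term, exponential backoff means doubling the wait after each failed attempt. The sketch below illustrates the general technique only: it is not the script's actual code, the 1-second base delay is an assumption, and `retry_with_backoff` is a hypothetical name. It also retries on any exception for brevity, whereas a real wrapper would retry only transient errors such as 429.

```python
import time


def retry_with_backoff(func, attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call func until it succeeds or attempts run out, doubling the wait each time."""
    for attempt in range(attempts):
        try:
            return func()
        except Exception:
            if attempt == attempts - 1:
                raise  # Out of attempts: surface the last error.
            sleep(base_delay * (2 ** attempt))  # Waits base, 2x base, 4x base, ...
```

The `sleep` parameter is injectable so the behavior can be exercised without real delays.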

Error: "No image in response"

Cause: API returned a text-only response or generation was blocked

Solution:

Add more detail to the prompt; vague prompts sometimes fail
Try a different model
Check that the prompt does not violate content policy
Verify the script sets `response_modalities=["IMAGE", "TEXT"]`

Error: "Content policy violation (400)"

Cause: Prompt contains restricted content or triggers safety filters

Solution:

Remove potentially problematic terms from the prompt
Rephrase the request using neutral language
This is an API-side restriction and cannot be bypassed

Anti-Patterns

Anti-Pattern 1: Inventing Model Names

What it looks like: `model="gemini-2.5-flash-image-preview-12-25"` or `model="gemini-3-flash-image"`

Why wrong: These models do not exist. Date suffixes are for text models only. The API returns cryptic errors.

Do instead: Use exactly `gemini-2.5-flash-image` or `gemini-3-pro-image-preview`. No variations.

Anti-Pattern 2: Skipping Environment Validation

What it looks like: Running `generate_image.py` without checking the API key or dependencies first

Why wrong: Produces confusing error messages. Wastes time debugging environment issues as generation bugs.

Do instead: Complete Phase 1 (ENVIRONMENT) before any generation attempt. Always.

Anti-Pattern 3: Generating Without Visual Verification

What it looks like: Running the script, checking file size, and committing the image without reading it to visually inspect it

Why wrong: The file may exist with correct dimensions but contain watermarks, wrong composition, excessive padding, or content that doesn't match the prompt. A 952KB PNG with a cat watermark and wrong aspect ratio passed file-exists checks but looked bad in the README.

Do instead: Complete Phase 4 (VERIFY) including Step 3 (visual inspection). Read the image file with the Read tool. Check composition, content, and artifacts before delivering or committing.

Anti-Pattern 4: Writing Custom Generation Code Instead of Using the Script

What it looks like: Writing inline Python to call the Gemini API directly instead of using `generate_image.py`

Why wrong: Misses retry logic, rate limiting, post-processing, model validation, and error handling already built into the script.

Do instead: Always use the provided `generate_image.py` script. It already covers these cases.

Anti-Pattern 5: Storing Base64 in Memory Instead of Saving to File

What it looks like: Keeping image data in a variable without writing to disk

Why wrong: Data is lost on exit, cannot be used by other tools, and wastes memory for large images.

Do instead: Save to file immediately. The script does this automatically.

References

This skill uses these shared patterns:

Verification Checklist - Pre-completion checks

Reference Files

`${CLAUDE_SKILL_DIR}/references/prompts.md`: Categorized example prompts by use case (game art, characters, product photography, pixel art, icons)

Prompt Engineering Quick Reference

Effective prompt structure:

[Subject] [Style] [Background] [Constraints]

For transparent background post-processing:

Use "solid dark gray background" or "solid uniform gray background (#3a3a3a)"
Include "no background elements or scenery" and "no ground shadows"
Combine with the `--transparent-bg` flag

For clean edges: "clean edges", "sharp outlines", "heavy ink outlines"

Negative constraints: Always include "no text", "no labels", "no watermarks", "character only"

Domain-Specific Anti-Rationalization

Rationalization	Why It's Wrong	Required Action
"I know the right model name"	Model names are exact strings, not patterns	Check the two valid names
"Output file was probably created"	Probably is not verified	Run `ls -la` on the output path
"API key is probably set"	Silent failures waste debugging time	Check explicitly in Phase 1
"Custom code is faster than the script"	Script has retry, rate limiting, validation	Use `generate_image.py`
