gemini-image-generator

Gemini Image Generator

Operator Context

This skill acts as an operator for CLI-based image generation: it configures Claude to run a deterministic Python script against the Google Gemini API. It follows an Execute-Verify pattern (validate the environment, generate the image, verify the output), with domain knowledge built into model selection and prompt engineering.

Hardcoded Behaviors (Always Apply)

  • CLAUDE.md Compliance: Read and follow repository CLAUDE.md files
  • Over-Engineering Prevention: Only generate what is directly requested
  • Exact Model Names: Use only `gemini-2.5-flash-image` or `gemini-3-pro-image-preview`, with no variations and no date suffixes
  • API Key Validation: Always verify that `GEMINI_API_KEY` exists before any generation attempt
  • Output Verification: Confirm output file exists and is non-zero bytes after generation
  • Absolute Paths: Always use absolute paths for output files

Default Behaviors (ON unless disabled)

  • Show Complete Output: Display full script output, never summarize
  • Rate Limit Handling: Wait between requests to avoid 429 errors
  • Retry on Failure: Retry transient failures with exponential backoff (3 attempts)
  • Status Reporting: Output structured status for Claude to parse

Optional Behaviors (OFF unless enabled)

  • Watermark Removal: Clean watermarks from corners with `--remove-watermark`
  • Background Transparency: Make solid backgrounds transparent with `--transparent-bg`
  • Batch Mode: Generate multiple images from a prompt file with `--batch`

What This Skill CAN Do

  • Generate images from text prompts via CLI using Gemini APIs
  • Select between fast (`gemini-2.5-flash-image`) and quality (`gemini-3-pro-image-preview`) models
  • Save images to specified file paths with automatic directory creation
  • Remove watermarks from generated images via post-processing
  • Make solid-color backgrounds transparent for game sprites and assets
  • Generate multiple images in batch mode from a prompt file
  • Retry on transient failures with exponential backoff

What This Skill CANNOT Do

  • Build web applications with image generation (use `nano-banana-builder` instead)
  • Use non-Gemini models (DALL-E, Midjourney, Stable Diffusion)
  • Fine-tune or train models
  • Generate video or audio content
  • Bypass content policy restrictions
  • Edit or modify existing images (generation only)

Instructions

Phase 1: ENVIRONMENT

Goal: Verify all prerequisites before attempting generation.
Step 1: Validate API key
```bash
echo "GEMINI_API_KEY is ${GEMINI_API_KEY:+set}"
```
Expect: `GEMINI_API_KEY is set`. If not set, instruct the user to configure it.
Step 2: Verify dependencies
```bash
python3 -c "from google import genai; from PIL import Image; print('OK')"
```
If missing, install:
```bash
pip install google-genai Pillow
```
Step 3: Determine output path
Use an absolute path for the output file. Verify the parent directory exists or will be created.
Gate: API key is set, dependencies installed, output path is valid. Proceed only when gate passes.
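The three Phase 1 checks can also be collected into a single pre-flight function. The following is a minimal sketch, not part of the skill's script: the function name `check_environment` and its parameters are hypothetical, and it only mirrors the gate described above (API key present, dependencies importable, output path absolute).

```python
import importlib
import os


def check_environment(env, output_path, required_modules=("google.genai", "PIL")):
    """Return a list of problems; an empty list means the Phase 1 gate passes.

    Hypothetical helper: `env` is any mapping of environment variables
    (for example os.environ), so the check can be exercised without
    touching the real environment.
    """
    problems = []

    # Check 1: the API key must be present and non-empty.
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")

    # Check 2: every dependency must be importable.
    for name in required_modules:
        try:
            importlib.import_module(name)
        except ImportError:
            problems.append(f"missing dependency: {name}")

    # Check 3: the output path must be absolute.
    if not os.path.isabs(output_path):
        problems.append(f"output path is not absolute: {output_path}")

    return problems
```

Calling `check_environment(os.environ, "/absolute/path/to/output.png")` and proceeding only when the returned list is empty corresponds to the gate above.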

Phase 2: CONFIGURE

Goal: Select the correct model and options for the request.
Step 1: Select model
| Scenario | Model | Why |
|---|---|---|
| Iterating on prompt, drafts | `gemini-2.5-flash-image` | Fast feedback (2-5s) |
| Final quality asset | `gemini-3-pro-image-preview` | Best quality, 2K resolution |
| Game sprites, batch work | `gemini-2.5-flash-image` | Cost effective, consistent |
| Text in image, typography | `gemini-3-pro-image-preview` | Better text rendering |
| Product photography | `gemini-3-pro-image-preview` | Detail matters |
CRITICAL: Use ONLY these exact model strings. Do not invent, guess, or add date suffixes.
| Correct (use exactly) | WRONG (never use) |
|---|---|
| `gemini-2.5-flash-image` | `gemini-2.5-flash-preview-05-20` (date suffix) |
| `gemini-3-pro-image-preview` | `gemini-2.5-pro-image` (doesn't exist) |
| | `gemini-3-flash-image` (doesn't exist) |
| | `gemini-pro-vision` (that's image input) |
Step 2: Compose prompt
Follow this structure:
[Subject] [Style] [Background] [Constraints]
For transparent background post-processing, include:
  • "solid dark gray background" or "solid uniform gray background (#3a3a3a)"
  • "no background elements or scenery"
Always include negative constraints: "no text", "no labels", "character only"
Step 3: Determine post-processing flags
  • Need watermark removal? Add `--remove-watermark`
  • Need transparent background? Add `--transparent-bg`
  • Custom background color? Add `--bg-color "#FFFFFF" --bg-tolerance 20`
Gate: Model selected, prompt composed, flags determined. Proceed only when gate passes.
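The model and flag decisions in this phase can be written down as data plus two small helpers. This is a sketch under stated assumptions: the scenario keys, `validate_model`, and `post_processing_flags` are hypothetical names chosen for illustration, while the model strings and flags are the ones listed above.

```python
# The only two model strings this skill allows.
VALID_MODELS = ("gemini-2.5-flash-image", "gemini-3-pro-image-preview")

# Illustrative scenario labels for the rows of the selection table above.
MODEL_BY_SCENARIO = {
    "draft": "gemini-2.5-flash-image",
    "final": "gemini-3-pro-image-preview",
    "sprites": "gemini-2.5-flash-image",
    "typography": "gemini-3-pro-image-preview",
    "product": "gemini-3-pro-image-preview",
}


def validate_model(name):
    """Return the name unchanged if it is an exact match, else raise."""
    if name not in VALID_MODELS:
        raise ValueError(f"unknown model {name!r}; use one of {VALID_MODELS}")
    return name


def post_processing_flags(remove_watermark=False, transparent_bg=False,
                          bg_color=None, bg_tolerance=None):
    """Build the optional CLI flags as a list of arguments."""
    flags = []
    if remove_watermark:
        flags.append("--remove-watermark")
    if transparent_bg:
        flags.append("--transparent-bg")
    if bg_color is not None:
        flags += ["--bg-color", bg_color]
    if bg_tolerance is not None:
        flags += ["--bg-tolerance", str(bg_tolerance)]
    return flags
```

Rejecting anything outside `VALID_MODELS` before the script runs catches invented names such as `gemini-3-flash-image` early, without waiting for an API error.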

Phase 3: GENERATE

Goal: Execute the generation script and capture output.
Step 1: Run generation
```bash
python3 $HOME/claude-code-toolkit/skills/gemini-image-generator/scripts/generate_image.py \
  --prompt "YOUR_PROMPT_HERE" \
  --output /absolute/path/to/output.png \
  --model gemini-3-pro-image-preview
```
For batch mode:
```bash
python3 $HOME/claude-code-toolkit/skills/gemini-image-generator/scripts/generate_image.py \
  --batch /path/to/prompts.txt \
  --output-dir /absolute/path/to/output/ \
  --model gemini-2.5-flash-image
```
Step 2: Read script output
Check for `SUCCESS` or `ERROR` in output. If rate limited (429), the script handles retry automatically.
Gate: Script exited with code 0 and printed SUCCESS. Proceed only when gate passes.
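If the command needs to be driven from Python rather than typed by hand, the invocation and the gate can be sketched as below. The helper names (`build_command`, `gate_passed`, `run_generation`) are hypothetical; the flags and the script path are the ones shown above, and the exact wording of the script's `SUCCESS` line is not specified here, so the check only looks for that substring.

```python
import os
import subprocess
import sys

# Script location as documented; $HOME is expanded from the environment.
SCRIPT = os.path.expandvars(
    "$HOME/claude-code-toolkit/skills/gemini-image-generator/scripts/generate_image.py"
)


def build_command(model, prompt=None, output=None, batch=None, output_dir=None):
    """Build the argument list for single-image or batch mode."""
    cmd = ["python3", SCRIPT]
    if batch is not None:
        cmd += ["--batch", batch, "--output-dir", output_dir]
    else:
        cmd += ["--prompt", prompt, "--output", output]
    cmd += ["--model", model]
    return cmd


def gate_passed(returncode, stdout):
    """Phase 3 gate: exit code 0 and SUCCESS present in the output."""
    return returncode == 0 and "SUCCESS" in stdout


def run_generation(cmd):
    """Run the script, show its complete output, and evaluate the gate."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    print(result.stdout, end="")
    print(result.stderr, end="", file=sys.stderr)
    return gate_passed(result.returncode, result.stdout)
```

Passing the arguments as a list avoids shell quoting problems when the prompt contains quotes or other special characters.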

Phase 4: VERIFY

Goal: Confirm the output file exists and is valid.
Step 1: Verify file exists
```bash
ls -la /absolute/path/to/output.png
```
File must exist and have non-zero size.
Step 2: Check dimensions (optional)
```bash
python3 -c "from PIL import Image; img = Image.open('/absolute/path/to/output.png'); print(f'Size: {img.size}, Mode: {img.mode}')"
```
Step 3: Visual inspection (MANDATORY)
Read the generated image file using the Read tool to visually inspect it:
Read the image at /absolute/path/to/output.png
Check for:
  • Content matches the prompt intent (correct subject, layout, composition)
  • No unwanted watermarks, logos, or artifacts
  • Text renders correctly (if text was requested)
  • Appropriate aspect ratio and framing
  • No excessive empty space or dark padding that needs cropping
If the image fails visual inspection, regenerate with an adjusted prompt before reporting to the user. Do not commit or deliver images without visual verification.
Step 4: Report result
Provide the user with:
  • Output file path
  • Image dimensions
  • Model used
  • Visual verification status (what you checked and confirmed)
  • Any post-processing applied (cropping, resizing)
Gate: Output file exists with non-zero size AND visual inspection passed. Generation is complete.
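The file-level half of this gate is easy to automate. The sketch below uses a hypothetical helper, `verify_output`, that covers only the existence and size checks from Step 1; it does not replace the mandatory visual inspection in Step 3.

```python
import os


def verify_output(path):
    """Return (ok, detail) for the file-level part of the Phase 4 gate."""
    if not os.path.isabs(path):
        return False, "path is not absolute"
    if not os.path.isfile(path):
        return False, "file does not exist"
    size = os.path.getsize(path)
    if size == 0:
        return False, "file is empty"
    return True, f"{size} bytes"
```

A passing result here means only that something was written to disk; whether the image matches the prompt still has to be confirmed by reading it.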

Script Reference

generate_image.py

Location: `$HOME/claude-code-toolkit/skills/gemini-image-generator/scripts/generate_image.py`

| Argument | Required | Description |
|---|---|---|
| `--prompt` | Yes* | Text prompt for image generation |
| `--output` | Yes* | Output file path (.png) |
| `--model` | No | Model name (default: gemini-3-pro-image-preview) |
| `--remove-watermark` | No | Remove watermarks from corners |
| `--transparent-bg` | No | Make background transparent |
| `--bg-color` | No | Background color hex (default: #3a3a3a) |
| `--bg-tolerance` | No | Color matching tolerance (default: 30) |
| `--batch` | No | File with prompts (one per line) |
| `--output-dir` | No | Directory for batch output |
| `--retries` | No | Max retry attempts (default: 3) |
| `--delay` | No | Delay between batch requests in seconds (default: 3) |

*Required unless using `--batch` + `--output-dir`

Exit Codes: 0 = success, 1 = missing API key, 2 = generation failed, 3 = invalid arguments
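When the script is called from another program, the exit codes can be turned into readable messages with a small lookup. This is an illustrative sketch: the mapping restates the codes listed above, and `describe_exit` is a hypothetical helper name.

```python
# Exit codes as documented for generate_image.py.
EXIT_CODES = {
    0: "success",
    1: "missing API key",
    2: "generation failed",
    3: "invalid arguments",
}


def describe_exit(code):
    """Map an exit code to its documented meaning."""
    return EXIT_CODES.get(code, f"unexpected exit code {code}")
```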

Error Handling

Error: "GEMINI_API_KEY not set"

Cause: Environment variable missing or empty
Solution:
  1. Set the variable: `export GEMINI_API_KEY="your-key"`
  2. If in a CI/CD environment, check secrets configuration
  3. Verify the key is valid by testing with a simple prompt

Error: "Rate limit exceeded (429)"

Cause: Too many requests to the Gemini API in a short period
Solution:
  1. The script retries automatically with exponential backoff
  2. If persistent after retries, wait 60 seconds and try again
  3. For batch operations, increase `--delay` to 5-10 seconds
  4. Consider switching to `gemini-2.5-flash-image` for higher throughput
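For readers unfamiliar with the pattern, exponential backoff with a fixed number of attempts can be sketched as follows. This is not the script's actual implementation: the base delay of 2 seconds, the doubling factor, and the names `backoff_delays` and `call_with_retry` are assumptions made for illustration; only the default of 3 attempts comes from this document.

```python
import time


def backoff_delays(retries=3, base=2.0):
    """Delays in seconds for successive retries: base, 2*base, 4*base, ..."""
    return [base * (2 ** attempt) for attempt in range(retries)]


def call_with_retry(fn, is_transient, retries=3, base=2.0, sleep=time.sleep):
    """Call fn() up to `retries` times (retries >= 1).

    After a transient error, wait with a doubling delay before the next
    attempt. Non-transient errors, and the error from the final attempt,
    are re-raised to the caller.
    """
    delays = backoff_delays(retries, base)
    for attempt in range(retries):
        try:
            return fn()
        except Exception as exc:
            is_last = attempt == retries - 1
            if is_last or not is_transient(exc):
                raise
            sleep(delays[attempt])
```

The `sleep` parameter is injectable so the behavior can be checked without actually waiting.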

Error: "No image in response"

Cause: API returned a text-only response or generation was blocked
Solution:
  1. Add more detail to the prompt; vague prompts sometimes fail
  2. Try a different model
  3. Check that the prompt does not violate content policy
  4. Verify the script sets `response_modalities=["IMAGE", "TEXT"]`

Error: "Content policy violation (400)"

Cause: Prompt contains restricted content or triggers safety filters
Solution:
  1. Remove potentially problematic terms from the prompt
  2. Rephrase the request using neutral language
  3. This is an API-side restriction and cannot be bypassed

Anti-Patterns

Anti-Pattern 1: Inventing Model Names

What it looks like: `model="gemini-2.5-flash-image-preview-12-25"` or `model="gemini-3-flash-image"`
Why wrong: These models do not exist. Date suffixes are for text models only. The API returns cryptic errors.
Do instead: Use exactly `gemini-2.5-flash-image` or `gemini-3-pro-image-preview`. No variations.

Anti-Pattern 2: Skipping Environment Validation

What it looks like: Running `generate_image.py` without checking the API key or dependencies first
Why wrong: Produces confusing error messages. Wastes time debugging environment issues as generation bugs.
Do instead: Complete Phase 1 (ENVIRONMENT) before any generation attempt. Always.

Anti-Pattern 3: Generating Without Visual Verification

What it looks like: Running the script, checking file size, and committing the image without reading it to visually inspect it
Why wrong: The file may exist with correct dimensions but contain watermarks, wrong composition, excessive padding, or content that doesn't match the prompt. A 952KB PNG with a cat watermark and wrong aspect ratio passed file-exists checks but looked bad in the README.
Do instead: Complete Phase 4 (VERIFY) including Step 3 (visual inspection). Read the image file with the Read tool. Check composition, content, and artifacts before delivering or committing.

Anti-Pattern 4: Writing Custom Generation Code Instead of Using the Script

What it looks like: Writing inline Python to call the Gemini API directly instead of using `generate_image.py`
Why wrong: Misses retry logic, rate limiting, post-processing, model validation, and error handling already built into the script.
Do instead: Always use the provided `generate_image.py` script. It handles all edge cases.

Anti-Pattern 5: Storing Base64 in Memory Instead of Saving to File

What it looks like: Keeping image data in a variable without writing to disk
Why wrong: Data is lost on exit, cannot be used by other tools, wastes memory for large images.
Do instead: Save to file immediately. The script does this automatically.

References

This skill uses these shared patterns:
  • Verification Checklist - Pre-completion checks

Reference Files

  • `${CLAUDE_SKILL_DIR}/references/prompts.md`: Categorized example prompts by use case (game art, characters, product photography, pixel art, icons)

Prompt Engineering Quick Reference

Effective prompt structure:
[Subject] [Style] [Background] [Constraints]
For transparent background post-processing:
  • Use "solid dark gray background" or "solid uniform gray background (#3a3a3a)"
  • Include "no background elements or scenery" and "no ground shadows"
  • Combine with the `--transparent-bg` flag
For clean edges: "clean edges", "sharp outlines", "heavy ink outlines"
Negative constraints: Always include "no text", "no labels", "no watermarks", "character only"
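The prompt structure above lends itself to a small builder. The sketch below is illustrative only: `compose_prompt` and `sprite_prompt` are hypothetical names, and joining the parts with commas is an assumption, since the structure above does not specify a separator.

```python
def compose_prompt(subject, style, background, constraints):
    """Join [Subject] [Style] [Background] [Constraints] into one prompt."""
    parts = [subject, style, background, *constraints]
    return ", ".join(p.strip() for p in parts if p and p.strip())


def sprite_prompt(subject, style):
    """Prompt intended for later use with --transparent-bg."""
    return compose_prompt(
        subject,
        style,
        "solid uniform gray background (#3a3a3a)",
        [
            "no background elements or scenery",
            "no ground shadows",
            "clean edges",
            "no text",
            "no labels",
            "no watermarks",
            "character only",
        ],
    )
```

Keeping the background phrase and the negative constraints in one place makes it less likely that a prompt meant for transparency post-processing is sent without them.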

Domain-Specific Anti-Rationalization

| Rationalization | Why It's Wrong | Required Action |
|---|---|---|
| "I know the right model name" | Model names are exact strings, not patterns | Check the two valid names |
| "Output file was probably created" | Probably is not verified | Run `ls -la` on the output path |
| "API key is probably set" | Silent failures waste debugging time | Check explicitly in Phase 1 |
| "Custom code is faster than the script" | Script has retry, rate limiting, validation | Use `generate_image.py` |