Gemini Image Generator
Operator Context
This skill acts as an operator for CLI-based image generation: it configures Claude to run a deterministic Python script against the Google Gemini APIs. It follows an Execute-Verify pattern (validate the environment, generate the image, verify the output), with Domain Intelligence embedded in model selection and prompt engineering.
Hardcoded Behaviors (Always Apply)
- CLAUDE.md Compliance: Read and follow repository CLAUDE.md files
- Over-Engineering Prevention: Only generate what is directly requested
- Exact Model Names: Use only `gemini-2.5-flash-image` or `gemini-3-pro-image-preview`; no variations, no date suffixes
- API Key Validation: Always verify `GEMINI_API_KEY` exists before any generation attempt
- Output Verification: Confirm output file exists and is non-zero bytes after generation
- Absolute Paths: Always use absolute paths for output files
Default Behaviors (ON unless disabled)
- Show Complete Output: Display full script output, never summarize
- Rate Limit Handling: Wait between requests to avoid 429 errors
- Retry on Failure: Retry transient failures with exponential backoff (3 attempts)
- Status Reporting: Output structured status for Claude to parse
Optional Behaviors (OFF unless enabled)
- Watermark Removal: Clean watermarks from corners with
- Background Transparency: Make solid backgrounds transparent with
- Batch Mode: Generate multiple images from a prompt file with `--batch`
What This Skill CAN Do
- Generate images from text prompts via CLI using Gemini APIs
- Select between fast (`gemini-2.5-flash-image`) and quality (`gemini-3-pro-image-preview`) models
- Save images to specified file paths with automatic directory creation
- Remove watermarks from generated images via post-processing
- Make solid-color backgrounds transparent for game sprites and assets
- Generate multiple images in batch mode from a prompt file
- Retry on transient failures with exponential backoff
What This Skill CANNOT Do
- Build web applications with image generation (use instead)
- Use non-Gemini models (DALL-E, Midjourney, Stable Diffusion)
- Fine-tune or train models
- Generate video or audio content
- Bypass content policy restrictions
- Edit or modify existing images (generation only)
Instructions
Phase 1: ENVIRONMENT
Goal: Verify all prerequisites before attempting generation.
Step 1: Validate API key
bash
echo "GEMINI_API_KEY is ${GEMINI_API_KEY:+set}"
Expect: `GEMINI_API_KEY is set`. If the output ends after `is`, the key is not set; instruct the user to configure it.
Step 2: Verify dependencies
bash
python3 -c "from google import genai; from PIL import Image; print('OK')"
If missing, install:
bash
pip install google-genai Pillow
Step 3: Determine output path
Use an absolute path for the output file. Verify the parent directory exists or will be created.
Gate: API key is set, dependencies installed, output path is valid. Proceed only when gate passes.
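The three checks behind this gate can also be collected in one place. The following is a minimal sketch, not part of the skill's script; `module_available` and `check_environment` are hypothetical helper names, and the module names simply mirror the import test in Step 2.

```python
import importlib.util
import os
from pathlib import Path


def module_available(name: str) -> bool:
    """Return True if `name` can be located without fully importing it."""
    try:
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # A missing parent package (e.g. `google`) raises instead of returning None.
        return False


def check_environment(output_path: str, env=None) -> list[str]:
    """Collect human-readable problems; an empty list means the gate passes."""
    env = os.environ if env is None else env
    problems = []
    if not env.get("GEMINI_API_KEY"):
        problems.append("GEMINI_API_KEY is not set")
    for module in ("google.genai", "PIL"):
        if not module_available(module):
            problems.append(f"missing dependency: {module}")
    if not Path(output_path).is_absolute():
        problems.append(f"output path is not absolute: {output_path}")
    return problems
```

An empty return value corresponds to a passing gate; any entry in the list should be resolved before moving to Phase 2.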
Phase 2: CONFIGURE
Goal: Select the correct model and options for the request.
Step 1: Select model
| Scenario | Model | Why |
|---|---|---|
| Iterating on prompt, drafts | `gemini-2.5-flash-image` | Fast feedback (2-5s) |
| Final quality asset | `gemini-3-pro-image-preview` | Best quality, 2K resolution |
| Game sprites, batch work | `gemini-2.5-flash-image` | Cost effective, consistent |
| Text in image, typography | `gemini-3-pro-image-preview` | Better text rendering |
| Product photography | `gemini-3-pro-image-preview` | Detail matters |
CRITICAL: Use ONLY these exact model strings. Do not invent, guess, or add date suffixes.
| Correct (use exactly) | WRONG (never use) |
|---|---|
| `gemini-2.5-flash-image` | `gemini-2.5-flash-preview-05-20` (date suffix) |
| `gemini-3-pro-image-preview` | (doesn't exist) |
| | (doesn't exist) |
| | (that's image input) |
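One way to enforce the exact-string rule is an allowlist check before the script is invoked. This is a hedged sketch: `VALID_MODELS` and `validate_model` are hypothetical names, and the two strings are the model names used in this document's own commands.

```python
# The only two model strings this document treats as valid.
VALID_MODELS = frozenset({"gemini-2.5-flash-image", "gemini-3-pro-image-preview"})


def validate_model(name: str) -> str:
    """Return `name` unchanged if it is an exact match, otherwise raise."""
    if name not in VALID_MODELS:
        allowed = ", ".join(sorted(VALID_MODELS))
        raise ValueError(f"unknown model {name!r}; use one of: {allowed}")
    return name
```

Because the comparison is an exact membership test, near misses such as a trailing date suffix are rejected rather than silently passed to the API.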
Step 2: Compose prompt
Follow this structure:
[Subject] [Style] [Background] [Constraints]
For transparent background post-processing, include:
- "solid dark gray background" or "solid uniform gray background (#3a3a3a)"
- "no background elements or scenery"
Always include negative constraints: "no text", "no labels", "character only"
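The prompt structure above can be assembled mechanically. The sketch below assumes the four parts are joined with commas, which the source does not specify; `compose_prompt` is a hypothetical helper, and the default constraints are the ones listed above.

```python
DEFAULT_CONSTRAINTS = ("no text", "no labels", "character only")


def compose_prompt(subject, style, background, constraints=DEFAULT_CONSTRAINTS):
    """Join the parts in [Subject] [Style] [Background] [Constraints] order."""
    parts = [subject, style, background, *constraints]
    # Skip empty parts so a missing style or background leaves no stray commas.
    return ", ".join(p.strip() for p in parts if p and p.strip())


prompt = compose_prompt(
    "pixel art knight",
    "16-bit style",
    "solid dark gray background",
)
print(prompt)
```

Keeping the negative constraints as a default argument makes it harder to forget them while still allowing a caller to override the list.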
Step 3: Determine post-processing flags
- Need watermark removal? Add
- Need transparent background? Add
- Custom background color? Add `--bg-color "#FFFFFF" --bg-tolerance 20`
Gate: Model selected, prompt composed, flags determined. Proceed only when gate passes.
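To build intuition for how a background color and tolerance interact, here is an illustrative sketch. It assumes a per-channel absolute-difference comparison, which may not be what `generate_image.py` actually does; `hex_to_rgb` and `is_background` are hypothetical names.

```python
def hex_to_rgb(value: str) -> tuple:
    """Convert '#rrggbb' (or 'rrggbb') to an (r, g, b) tuple of ints."""
    value = value.lstrip("#")
    if len(value) != 6:
        raise ValueError(f"expected a 6-digit hex color, got {value!r}")
    return tuple(int(value[i:i + 2], 16) for i in (0, 2, 4))


def is_background(pixel, bg_hex="#3a3a3a", tolerance=30):
    """Assumed rule: every channel is within `tolerance` of the background color."""
    bg = hex_to_rgb(bg_hex)
    return all(abs(c - b) <= tolerance for c, b in zip(pixel[:3], bg))
```

Under this assumed rule, a lower tolerance keeps more near-background pixels opaque, while a higher tolerance risks erasing parts of the subject that resemble the background.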
Phase 3: GENERATE
Goal: Execute the generation script and capture output.
Step 1: Run generation
bash
python3 $HOME/claude-code-toolkit/skills/gemini-image-generator/scripts/generate_image.py \
--prompt "YOUR_PROMPT_HERE" \
--output /absolute/path/to/output.png \
--model gemini-3-pro-image-preview
For batch mode:
bash
python3 $HOME/claude-code-toolkit/skills/gemini-image-generator/scripts/generate_image.py \
--batch /path/to/prompts.txt \
--output-dir /absolute/path/to/output/ \
--model gemini-2.5-flash-image
Step 2: Read script output
Check for
or
in output. If rate limited (429), the script handles retry automatically.
Gate: Script exited with code 0 and printed SUCCESS. Proceed only when gate passes.
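If the script is driven from Python rather than a shell, the same gate can be expressed as follows. This is a sketch under assumptions: `build_command`, `passes_gate`, and `run_generation` are hypothetical helpers, only the flags shown in Step 1 are used, and the check assumes `SUCCESS` may appear on either stdout or stderr.

```python
import os
import subprocess
import sys

# Script location as given in this document; $HOME is expanded at import time.
SCRIPT = os.path.expandvars(
    "$HOME/claude-code-toolkit/skills/gemini-image-generator/scripts/generate_image.py"
)


def build_command(prompt: str, output: str, model: str) -> list:
    """Build the argv list for a single-image run (no shell quoting needed)."""
    return ["python3", SCRIPT, "--prompt", prompt, "--output", output, "--model", model]


def passes_gate(returncode: int, output: str) -> bool:
    """Gate from Phase 3: exit code 0 and SUCCESS printed."""
    return returncode == 0 and "SUCCESS" in output


def run_generation(prompt: str, output: str, model: str) -> bool:
    """Run the script, echo its full output, and evaluate the gate."""
    result = subprocess.run(
        build_command(prompt, output, model), capture_output=True, text=True
    )
    # Show complete output rather than a summary, per the default behaviors.
    print(result.stdout, end="")
    print(result.stderr, end="", file=sys.stderr)
    return passes_gate(result.returncode, result.stdout + result.stderr)
```

Passing the arguments as a list avoids shell-quoting problems when the prompt contains quotes or special characters.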
Phase 4: VERIFY
Goal: Confirm the output file exists and is valid.
Step 1: Verify file exists
bash
ls -la /absolute/path/to/output.png
File must exist and have non-zero size.
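The same existence and size check can be scripted. A minimal sketch, with `output_is_valid` as a hypothetical helper name:

```python
from pathlib import Path


def output_is_valid(path: str) -> bool:
    """True only if `path` is a regular file with a non-zero size."""
    p = Path(path)
    return p.is_file() and p.stat().st_size > 0
```

This covers only the file-level part of the gate; the visual inspection in Step 3 is still required.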
Step 2: Check dimensions (optional)
bash
python3 -c "from PIL import Image; img = Image.open('/absolute/path/to/output.png'); print(f'Size: {img.size}, Mode: {img.mode}')"
Step 3: Visual inspection (MANDATORY)
Read the generated image file using the Read tool to visually inspect it:
Read the image at /absolute/path/to/output.png
Check for:
- Content matches the prompt intent (correct subject, layout, composition)
- No unwanted watermarks, logos, or artifacts
- Text renders correctly (if text was requested)
- Appropriate aspect ratio and framing
- No excessive empty space or dark padding that needs cropping
If the image fails visual inspection, regenerate with an adjusted prompt before reporting to the user. Do not commit or deliver images without visual verification.
Step 4: Report result
Provide the user with:
- Output file path
- Image dimensions
- Model used
- Visual verification status (what you checked and confirmed)
- Any post-processing applied (cropping, resizing)
Gate: Output file exists with non-zero size AND visual inspection passed. Generation is complete.
Script Reference
generate_image.py
Location: `$HOME/claude-code-toolkit/skills/gemini-image-generator/scripts/generate_image.py`
| Argument | Required | Description |
|---|---|---|
| `--prompt` | Yes* | Text prompt for image generation |
| `--output` | Yes* | Output file path (.png) |
| `--model` | No | Model name (default: `gemini-3-pro-image-preview`) |
| | No | Remove watermarks from corners |
| | No | Make background transparent |
| `--bg-color` | No | Background color hex (default: #3a3a3a) |
| `--bg-tolerance` | No | Color matching tolerance (default: 30) |
| `--batch` | No | File with prompts (one per line) |
| `--output-dir` | No | Directory for batch output |
| | No | Max retry attempts (default: 3) |
| | No | Delay between batch requests in seconds (default: 3) |
Exit Codes: 0 = success, 1 = missing API key, 2 = generation failed, 3 = invalid arguments
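When the script is called from other tooling, the exit codes above can be translated into messages. A small sketch; `EXIT_CODES` and `describe_exit` are hypothetical names, and the meanings are copied from the line above.

```python
# Exit code meanings as documented for generate_image.py.
EXIT_CODES = {
    0: "success",
    1: "missing API key",
    2: "generation failed",
    3: "invalid arguments",
}


def describe_exit(code: int) -> str:
    """Map a process exit code to its documented meaning."""
    return EXIT_CODES.get(code, f"unknown exit code {code}")
```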
Error Handling
Error: "GEMINI_API_KEY not set"
Cause: Environment variable missing or empty
Solution:
- Set the variable: `export GEMINI_API_KEY="your-key"`
- If in a CI/CD environment, check secrets configuration
- Verify the key is valid by testing with a simple prompt
Error: "Rate limit exceeded (429)"
Cause: Too many requests to Gemini API in short period
Solution:
- The script retries automatically with exponential backoff
- If persistent after retries, wait 60 seconds and try again
- For batch operations, increase the delay between requests to 5-10 seconds
- Consider switching to `gemini-2.5-flash-image` for higher throughput
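For readers unfamiliar with the retry pattern mentioned above, exponential backoff can be sketched as below. This is illustrative only and is not the script's actual implementation; `retry_with_backoff`, `TransientError`, and the one-second base delay are assumptions.

```python
import time


class TransientError(Exception):
    """Hypothetical marker for retryable failures such as HTTP 429."""


def retry_with_backoff(operation, max_attempts=3, base_delay=1.0, sleep=time.sleep):
    """Call `operation` until it succeeds, doubling the wait after each failure."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except TransientError:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Waits are base_delay, 2 * base_delay, 4 * base_delay, ...
            sleep(base_delay * 2 ** (attempt - 1))
```

With three attempts and a one-second base, the waits are one and two seconds before the final attempt. The `sleep` parameter is injectable so the schedule can be tested without real delays.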
Error: "No image in response"
Cause: API returned text-only response or generation was blocked
Solution:
- Add more detail to the prompt — vague prompts sometimes fail
- Try a different model
- Check that the prompt does not violate content policy
- Verify the script sets `response_modalities=["IMAGE", "TEXT"]`
Error: "Content policy violation (400)"
Cause: Prompt contains restricted content or triggers safety filters
Solution:
- Remove potentially problematic terms from the prompt
- Rephrase the request using neutral language
- This is an API-side restriction and cannot be bypassed
Anti-Patterns
Anti-Pattern 1: Inventing Model Names
What it looks like: `model="gemini-2.5-flash-image-preview-12-25"` or `model="gemini-3-flash-image"`
Why wrong: These models do not exist. Date suffixes are for text models only. The API returns cryptic errors.
Do instead: Use exactly `gemini-2.5-flash-image` or `gemini-3-pro-image-preview`. No variations.
Anti-Pattern 2: Skipping Environment Validation
What it looks like: Running `generate_image.py` without checking the API key or dependencies first
Why wrong: Produces confusing error messages. Wastes time debugging environment issues as generation bugs.
Do instead: Complete Phase 1 (ENVIRONMENT) before any generation attempt. Always.
Anti-Pattern 3: Generating Without Visual Verification
What it looks like: Running the script, checking file size, and committing the image without reading it to visually inspect
Why wrong: The file may exist with correct dimensions but contain watermarks, wrong composition, excessive padding, or content that doesn't match the prompt. A 952KB PNG with a cat watermark and wrong aspect ratio passed file-exists checks but looked bad in the README.
Do instead: Complete Phase 4 (VERIFY) including Step 3 (visual inspection). Read the image file with the Read tool. Check composition, content, and artifacts before delivering or committing.
Anti-Pattern 4: Writing Custom Generation Code Instead of Using the Script
What it looks like: Writing inline Python to call the Gemini API directly instead of using `generate_image.py`
Why wrong: Misses retry logic, rate limiting, post-processing, model validation, and error handling already built into the script.
Do instead: Always use the provided `generate_image.py` script. It already handles retries, rate limiting, and validation.
Anti-Pattern 5: Storing Base64 in Memory Instead of Saving to File
What it looks like: Keeping image data in a variable without writing to disk
Why wrong: Data is lost on exit, cannot be used by other tools, wastes memory for large images.
Do instead: Save to file immediately. The script does this automatically.
References
This skill uses these shared patterns:
- Verification Checklist - Pre-completion checks
Reference Files
- `${CLAUDE_SKILL_DIR}/references/prompts.md`: Categorized example prompts by use case (game art, characters, product photography, pixel art, icons)
Prompt Engineering Quick Reference
Effective prompt structure:
[Subject] [Style] [Background] [Constraints]
For transparent background post-processing:
- Use "solid dark gray background" or "solid uniform gray background (#3a3a3a)"
- Include "no background elements or scenery" and "no ground shadows"
- Combine with flag
For clean edges: "clean edges", "sharp outlines", "heavy ink outlines"
Negative constraints: Always include "no text", "no labels", "no watermarks", "character only"
Domain-Specific Anti-Rationalization
| Rationalization | Why It's Wrong | Required Action |
|---|---|---|
| "I know the right model name" | Model names are exact strings, not patterns | Check the two valid names |
| "Output file was probably created" | Probably is not verified | Run `ls -la` on the output path |
| "API key is probably set" | Silent failures waste debugging time | Check explicitly in Phase 1 |
| "Custom code is faster than the script" | Script has retry, rate limiting, validation | Use `generate_image.py` |