ai-image-generator

AI Image Generator

Generate images using AI APIs (Google Gemini and OpenAI GPT). This skill teaches the prompting patterns and API mechanics for producing professional images directly from Claude Code.

Managed alternative: If you don't want to manage API keys, ImageBot provides a managed image generation service with album templates and brand kit support.

Model Selection

Choose the right model for the job:

| Need | Model | Why |
|---|---|---|
| Scenes / stock photos | Gemini 3.1 Flash Image | Best depth, complexity, environmental context |
| Transparent icons / logos | GPT Image 1.5 | Native RGBA alpha channel (`background: "transparent"`) |
| Text on images | GPT Image 1.5 | 90% accurate text rendering |
| Drafts / iteration | Gemini 2.5 Flash Image | Free tier (~500/day) |
| Final client assets | Gemini 3 Pro Image | Higher detail, better style consistency |

Model IDs

| Model | API ID | Provider |
|---|---|---|
| Gemini 3.1 Flash Image | `gemini-3.1-flash-image-preview` | Google AI |
| Gemini 3 Pro Image | `gemini-3-pro-image-preview` | Google AI |
| Gemini 2.5 Flash Image | `gemini-2.5-flash-image` | Google AI |
| GPT Image 1.5 | `gpt-image-1.5` | OpenAI |
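
As a rough sketch, the selection table and the ID table can be folded into one lookup so scripts do not repeat the ID strings. The short keys (`scene`, `icon`, `text`, `draft`, `final`) and the helper names `choose_model` and `provider_for` are hypothetical labels invented for this example; the IDs are copied from the table and still need verifying before use.

```python
# Lookup combining the "Model Selection" and "Model IDs" tables.
# The keys are hypothetical labels for this sketch, not API values.
MODEL_FOR_NEED = {
    "scene": "gemini-3.1-flash-image-preview",  # scenes / stock photos
    "icon": "gpt-image-1.5",                    # transparent icons / logos
    "text": "gpt-image-1.5",                    # text on images
    "draft": "gemini-2.5-flash-image",          # drafts / iteration
    "final": "gemini-3-pro-image-preview",      # final client assets
}


def choose_model(need: str) -> str:
    """Return the API ID listed for a need, or raise on an unknown need."""
    key = need.strip().lower()
    if key not in MODEL_FOR_NEED:
        raise ValueError(
            f"Unknown need {need!r}; expected one of {sorted(MODEL_FOR_NEED)}"
        )
    return MODEL_FOR_NEED[key]


def provider_for(model_id: str) -> str:
    """Map an API ID from the table to its provider column."""
    return "OpenAI" if model_id.startswith("gpt-") else "Google AI"


if __name__ == "__main__":
    model = choose_model("icon")
    print(model, "->", provider_for(model))
```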

Verify model IDs before use, because they change frequently:

```bash
curl -s "https://generativelanguage.googleapis.com/v1beta/models?key=$GEMINI_API_KEY" | python3 -c "import sys,json; [print(m['name']) for m in json.load(sys.stdin)['models'] if 'image' in m['name'].lower()]"
```
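
The same filter can be written as a small function, which makes it easier to check a specific ID before a long batch run. This is a sketch that assumes the listing has the shape the one-liner relies on (a top-level `models` array whose entries carry a `name`); the helper names and the sample data are hypothetical.

```python
import json


def image_model_names(listing: dict) -> list[str]:
    """Return the names of listed models whose name contains 'image'."""
    return [
        m["name"]
        for m in listing.get("models", [])
        if "image" in m["name"].lower()
    ]


def is_listed(model_id: str, listing: dict) -> bool:
    """True if model_id matches the last path segment of a listed image model."""
    return any(
        name.split("/")[-1] == model_id for name in image_model_names(listing)
    )


if __name__ == "__main__":
    # Hypothetical sample, shaped like the JSON the curl command parses.
    sample = json.loads(
        '{"models": ['
        '{"name": "models/gemini-2.5-flash-image"},'
        '{"name": "models/gemini-2.5-flash"}]}'
    )
    print(image_model_names(sample))
    print(is_listed("gemini-2.5-flash-image", sample))
```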

The 5-Part Prompting Framework

Build prompts in this order for consistent results:

1. Image Type

Set the genre: "A photorealistic photograph", "An isometric illustration", "A flat vector icon"

2. Subject

Who or what, with specific details: "of a warm, approachable Australian woman in her early 30s, smiling naturally"

3. Environment

Setting and spatial relationships: "in a bright modern home with terracotta decor on wooden shelves behind her"

4. Technical Specs

Camera and lighting: "Shot at 85mm f/2.0, natural window light, head and shoulders framing"

5. Constraints

What to exclude: "Photorealistic, no text, no watermarks, no logos"

Example (Good vs Bad)

BAD — keyword soup:
"professional woman, spa, warm lighting, high quality, 4K"

GOOD — narrative direction:
"A professional skin treatment scene in a warm clinical setting.
A practitioner wearing blue medical gloves uses a microneedling pen
on the client's forehead. The client lies on a white treatment bed,
eyes closed, relaxed. Warm golden-hour light from a window to the
left. Terracotta-toned wall visible in the background. Shot at
85mm f/2.0, shallow depth of field. No text, no watermarks."
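
One way to keep the five parts in order is to assemble the prompt from named fields rather than free-typing it. The sketch below is only an illustration: `build_prompt` and its argument names are hypothetical, and joining the first three parts into one sentence is a choice made for this example rather than something the framework prescribes.

```python
def _sentence(text: str) -> str:
    """Strip whitespace and make sure the text ends with punctuation."""
    text = text.strip()
    return text if text.endswith((".", "!", "?")) else text + "."


def build_prompt(image_type: str, subject: str, environment: str,
                 technical: str, constraints: str) -> str:
    """Join the five framework parts, in order, into a narrative prompt."""
    parts = {
        "image_type": image_type,
        "subject": subject,
        "environment": environment,
        "technical": technical,
        "constraints": constraints,
    }
    missing = [name for name, value in parts.items() if not value.strip()]
    if missing:
        raise ValueError(f"Missing prompt parts: {', '.join(missing)}")
    # Parts 1-3 read as one sentence; specs and constraints follow.
    scene = " ".join(p.strip() for p in (image_type, subject, environment))
    return " ".join(_sentence(s) for s in (scene, technical, constraints))


if __name__ == "__main__":
    print(build_prompt(
        "A photorealistic photograph",
        "of a warm, approachable Australian woman in her early 30s, smiling naturally",
        "in a bright modern home with terracotta decor on wooden shelves behind her",
        "Shot at 85mm f/2.0, natural window light, head and shoulders framing",
        "Photorealistic, no text, no watermarks, no logos",
    ))
```

Raising on an empty part makes a skipped step (most often the constraints) fail loudly instead of silently producing a weaker prompt.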

Workflow

1. Determine Image Need

| Purpose | Aspect Ratio | Model |
|---|---|---|
| Hero banner | 16:9 or 21:9 | Gemini |
| Service card | 4:3 or 3:4 | Gemini |
| Profile / avatar | 1:1 | Gemini |
| Icon / badge | 1:1 | GPT (transparent) |
| OG / social share | 1.91:1 | Gemini |
| Instagram post | 1:1 or 4:5 | Gemini |
| Mobile hero | 9:16 | Gemini |
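
When an image later needs cropping or resizing to an exact slot, the ratios in the table translate into pixel dimensions with a little arithmetic. This is a minimal sketch: the `ASPECT_RATIOS` dictionary and `target_size` helper are hypothetical, and where the table offers two ratios the example assumes the first one.

```python
# Ratios copied from the table above as (width, height) pairs.
# Where the table lists two options, this sketch assumes the first.
ASPECT_RATIOS = {
    "hero banner": (16, 9),
    "service card": (4, 3),
    "profile / avatar": (1, 1),
    "icon / badge": (1, 1),
    "og / social share": (1.91, 1),
    "instagram post": (1, 1),
    "mobile hero": (9, 16),
}


def target_size(purpose: str, width: int) -> tuple[int, int]:
    """Return (width, height) in pixels for a purpose at the given width."""
    ratio_w, ratio_h = ASPECT_RATIOS[purpose.strip().lower()]
    return width, round(width * ratio_h / ratio_w)


if __name__ == "__main__":
    print(target_size("Hero banner", 1920))        # (1920, 1080)
    print(target_size("OG / social share", 1200))  # (1200, 628)
```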

2. Build the Prompt

Use the 5-part framework. Refer to `references/prompting-guide.md` for detailed photography parameters.

3. Generate via API

Gemini (Python — handles shell escaping correctly)

```bash
python3 << 'PYEOF'
import json, base64, urllib.request, os, sys

GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY")
if not GEMINI_API_KEY:
    print("Set GEMINI_API_KEY environment variable"); sys.exit(1)

model = "gemini-3.1-flash-image-preview"
url = f"https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent?key={GEMINI_API_KEY}"

prompt = """A professional photograph of a modern co-working space in
Newcastle, Australia. Natural light floods through floor-to-ceiling
windows. Three people collaborate at a standing desk, one pointing
at a laptop screen. Exposed brick wall, potted fiddle-leaf fig,
coffee cups on the desk. Shot at 35mm f/4.0, environmental portrait
style. No text, no watermarks, no logos."""

payload = json.dumps({
    "contents": [{"parts": [{"text": prompt}]}],
    "generationConfig": {
        "responseModalities": ["TEXT", "IMAGE"],
        "temperature": 0.8
    }
}).encode()

req = urllib.request.Request(url, data=payload, headers={
    "Content-Type": "application/json",
    "User-Agent": "ImageGen/1.0"
})

resp = urllib.request.urlopen(req, timeout=120)
result = json.loads(resp.read())

# Extract image from response
for part in result["candidates"][0]["content"]["parts"]:
    if "inlineData" in part:
        img_data = base64.b64decode(part["inlineData"]["data"])
        output_path = "hero-image.png"
        with open(output_path, "wb") as f:
            f.write(img_data)
        print(f"Saved: {output_path} ({len(img_data):,} bytes)")
        break
PYEOF
```
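
The extraction step at the end of the Gemini script silently does nothing if the response carries no image part. A slightly more defensive version might look like the sketch below; `extract_first_image` is a hypothetical helper, and the only response fields it assumes are the ones the script above already reads (`candidates`, `content`, `parts`, `inlineData`, `data`) plus an optional `text` part.

```python
import base64


def extract_first_image(result: dict) -> bytes:
    """Return the decoded bytes of the first inline image, or raise ValueError."""
    candidates = result.get("candidates") or []
    if not candidates:
        raise ValueError("Response contains no candidates")
    parts = candidates[0].get("content", {}).get("parts", [])
    for part in parts:
        inline = part.get("inlineData")
        if inline and "data" in inline:
            return base64.b64decode(inline["data"])
    # No image: surface any text the model returned to help debugging.
    text = " ".join(p.get("text", "") for p in parts).strip()
    raise ValueError(f"No image part in response; model text was: {text!r}")


if __name__ == "__main__":
    # Hypothetical response with one text part and one inline image part.
    fake = {"candidates": [{"content": {"parts": [
        {"text": "Here is your image."},
        {"inlineData": {"data": base64.b64encode(b"fake-bytes").decode()}},
    ]}}]}
    print(len(extract_first_image(fake)), "bytes")
```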

GPT (Transparent Icons)

```bash
python3 << 'PYEOF'
import json, base64, urllib.request, os, sys

OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
if not OPENAI_API_KEY:
    print("Set OPENAI_API_KEY environment variable"); sys.exit(1)

url = "https://api.openai.com/v1/images/generations"

payload = json.dumps({
    "model": "gpt-image-1.5",
    "prompt": "A minimal, clean plumbing wrench icon. Flat design, single consistent stroke weight, modern style. On a transparent background.",
    "n": 1,
    "size": "1024x1024",
    "background": "transparent",
    "output_format": "png"
}).encode()

req = urllib.request.Request(url, data=payload, headers={
    "Content-Type": "application/json",
    "Authorization": f"Bearer {OPENAI_API_KEY}"
})

resp = urllib.request.urlopen(req, timeout=120)
result = json.loads(resp.read())

img_data = base64.b64decode(result["data"][0]["b64_json"])
with open("icon-wrench.png", "wb") as f:
    f.write(img_data)
print(f"Saved: icon-wrench.png ({len(img_data):,} bytes)")
PYEOF
```
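
Since the point of this request is the alpha channel, it can be worth confirming that the saved file actually has one. The sketch below reads the colour type from the PNG header using only the standard library. `png_has_alpha` is a hypothetical helper, and it only recognises colour types 4 and 6; palette PNGs that carry transparency in a separate chunk would be reported as having no alpha.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def png_has_alpha(data: bytes) -> bool:
    """True if the PNG's IHDR colour type includes an alpha channel."""
    if len(data) < 26 or data[:8] != PNG_SIGNATURE or data[12:16] != b"IHDR":
        raise ValueError("Not a PNG file")
    # IHDR payload starts at byte 16: width, height, bit depth, colour type.
    _width, _height, _bit_depth, color_type = struct.unpack(">IIBB", data[16:26])
    return color_type in (4, 6)  # 4 = grey + alpha, 6 = RGB + alpha


if __name__ == "__main__":
    # Synthetic header for a 1024x1024, 8-bit RGBA image (not a full PNG).
    header = (PNG_SIGNATURE + struct.pack(">I", 13) + b"IHDR"
              + struct.pack(">IIBBBBB", 1024, 1024, 8, 6, 0, 0, 0))
    print(png_has_alpha(header))
```

In practice this would be called on the file written above, for example `png_has_alpha(open("icon-wrench.png", "rb").read())`.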

4. Save and Optimise

Save generated images to `.jez/artifacts/` or the user's specified path.

Post-processing (optional):

```bash
# Convert to WebP for web use
python3 -c "
from PIL import Image
img = Image.open('hero-image.png')
img.save('hero-image.webp', 'WEBP', quality=85)
print(f'WebP: {img.size[0]}x{img.size[1]}')
"

# Trim whitespace from transparent icons
python3 -c "
from PIL import Image
img = Image.open('icon.png')
trimmed = img.crop(img.getbbox())
trimmed.save('icon-trimmed.png')
"
```

5. Quality Check (Optional)

Send the generated image back to a vision model for QA:

```python
# Send to Gemini Flash for critique
critique_prompt = """Review this image for:

- AI artifacts (extra fingers, floating objects, text errors)
- Technical accuracy (wrong equipment, unsafe positioning)
- Composition issues (awkward cropping, cluttered background)
- Style consistency with a professional stock photo

List any issues found, or say 'PASS' if the image is production-ready."""
```

If issues are found, append them as negative guidance to the original prompt and regenerate.
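
The two halves of this QA loop, building the critique request and folding the critique back into the prompt, can be sketched as below. Both helper names are hypothetical. The request body assumes the image is sent as an `inlineData` part with a `mimeType` of `image/png`, mirroring the field the generation response uses. The "Avoid these issues" wording is just one possible way to phrase negative guidance.

```python
import base64
import re
from typing import Optional


def build_critique_payload(image_bytes: bytes, critique_prompt: str) -> dict:
    """Build a request body carrying the image and the review instructions."""
    return {
        "contents": [{
            "parts": [
                {"inlineData": {
                    "mimeType": "image/png",  # assumption: the image is a PNG
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
                {"text": critique_prompt},
            ]
        }]
    }


def revise_prompt(original_prompt: str, critique: str) -> Optional[str]:
    """Return None if the critique says PASS, else the prompt plus the issues."""
    verdict = critique.strip()
    if re.match(r"\W*PASS\b", verdict, re.IGNORECASE):
        return None
    # One issue per non-empty line, with list markers and final periods removed.
    issues = [line.strip(" -*\t").rstrip(".") for line in verdict.splitlines()]
    issues = [issue for issue in issues if issue]
    return f"{original_prompt.rstrip()} Avoid these issues: {'; '.join(issues)}."


if __name__ == "__main__":
    print(revise_prompt(
        "A photo of a desk.",
        "- Extra fingers on left hand.\n- Floating cup",
    ))
```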

Multi-Turn Editing

Gemini supports editing a generated image across conversation turns. The key requirement: preserve thought signatures from model responses.

```python
# Turn 1: Generate base image
contents = [{"role": "user", "parts": [{"text": "Scene prompt..."}]}]
# The response includes thoughtSignature on parts; preserve them ALL

# Turn 2: Edit the image
contents = [
    {"role": "user", "parts": [{"text": "Original prompt"}]},
    {"role": "model", "parts": response_parts_with_signatures},  # Keep intact
    {"role": "user", "parts": [{"text": "Edit: change the wall colour to blue. Keep everything else exactly the same."}]}
]
```

**Edit prompt pattern**: Always specify what to KEEP unchanged, not just what to change. The model treats unlisted elements as free to modify.

GOOD: "Edit this image: keep the people, desk, and window unchanged. Only change: wall colour from terracotta to ocean blue."

BAD: "Now make the wall blue." (Model may change everything else too)
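
Putting the two rules together (pass the model's parts back untouched, and state what to keep), a second-turn request could be assembled roughly as follows. `build_edit_turn` is a hypothetical helper and the `thoughtSignature` value in the demo is a placeholder. The function reuses the parts list from the first response as-is rather than rebuilding it, so nothing attached to those parts is dropped.

```python
def build_edit_turn(original_prompt: str, first_response: dict,
                    keep: list[str], change: str) -> list[dict]:
    """Build turn-2 contents, reusing the model's parts exactly as returned."""
    model_parts = first_response["candidates"][0]["content"]["parts"]
    edit_text = (
        f"Edit this image: keep {', '.join(keep)} unchanged. "
        f"Only change: {change}."
    )
    return [
        {"role": "user", "parts": [{"text": original_prompt}]},
        {"role": "model", "parts": model_parts},  # untouched, signatures included
        {"role": "user", "parts": [{"text": edit_text}]},
    ]


if __name__ == "__main__":
    # Hypothetical first-turn response with placeholder values.
    first = {"candidates": [{"content": {"parts": [
        {"inlineData": {"data": "..."}, "thoughtSignature": "placeholder"},
    ]}}]}
    turn2 = build_edit_turn(
        "Scene prompt...", first,
        keep=["the people", "desk", "window"],
        change="wall colour from terracotta to ocean blue",
    )
    print(turn2[2]["parts"][0]["text"])
```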

API Key Setup

| Provider | Get key at | Env variable |
|---|---|---|
| Google Gemini | aistudio.google.com | `GEMINI_API_KEY` |
| OpenAI | platform.openai.com | `OPENAI_API_KEY` |

```bash
export GEMINI_API_KEY="your-key-here"
export OPENAI_API_KEY="your-key-here"
```
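
A quick pre-flight check can report which keys are missing before any request is made. This is a small sketch using only the variable names from the table; `missing_keys` is a hypothetical helper, and it takes an optional mapping so it can be exercised without touching the real environment.

```python
import os
from typing import Mapping, Optional

# Provider -> environment variable, as listed in the table above.
REQUIRED_KEYS = {
    "Google Gemini": "GEMINI_API_KEY",
    "OpenAI": "OPENAI_API_KEY",
}


def missing_keys(env: Optional[Mapping[str, str]] = None) -> list[str]:
    """Return the required variable names that are unset or empty."""
    env = os.environ if env is None else env
    return [var for var in REQUIRED_KEYS.values() if not env.get(var)]


if __name__ == "__main__":
    missing = missing_keys()
    print("All keys set" if not missing else f"Missing: {', '.join(missing)}")
```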

Common Mistakes

| Mistake | Fix |
|---|---|
| Using curl for Gemini prompts | Use Python; shell escaping breaks on apostrophes |
| Vague adjectives such as "Beautiful, professional, high quality" | Use concrete specs: "85mm f/1.8, golden hour light" |
| Not specifying what to exclude | Always end with "No text, no watermarks, no logos" |
| Requesting transparent PNG from Gemini | Gemini cannot do transparency; use GPT with `background: "transparent"` |
| American defaults for AU businesses | Explicitly specify "Australian" + local architecture, vegetation |
| Using stale or guessed model IDs | Verify current model IDs; they change frequently |
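
The first row is easiest to see with a prompt that contains both an apostrophe and double quotes. In the sketch below, `json.dumps` produces a request body that round-trips the prompt exactly, while `shlex.quote` shows the extra quoting the same body would need as a shell argument. The prompt text and the `gemini_body` helper are hypothetical examples.

```python
import json
import shlex


def gemini_body(prompt: str) -> str:
    """Serialise a prompt into the request body shape used in this guide."""
    return json.dumps({"contents": [{"parts": [{"text": prompt}]}]})


if __name__ == "__main__":
    prompt = ("A plumber's van parked outside a client's weatherboard home, "
              'styled as "coastal modern". No text, no watermarks, no logos.')
    body = gemini_body(prompt)

    # JSON encoding preserves apostrophes and quotes without manual escaping.
    assert json.loads(body)["contents"][0]["parts"][0]["text"] == prompt

    # For comparison: the quoting a shell would need to pass the same body.
    print(shlex.quote(body))
```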