ai-image-generator
AI Image Generator
Generate images using AI APIs (Google Gemini and OpenAI GPT). This skill teaches the prompting patterns and API mechanics for producing professional images directly from Claude Code.
Managed alternative: If you don't want to manage API keys, ImageBot provides a managed image generation service with album templates and brand kit support.
Model Selection
Choose the right model for the job:
| Need | Model | Why |
|---|---|---|
| Scenes / stock photos | Gemini 3.1 Flash Image | Best depth, complexity, environmental context |
| Transparent icons / logos | GPT Image 1.5 | Native RGBA alpha channel ( |
| Text on images | GPT Image 1.5 | 90% accurate text rendering |
| Drafts / iteration | Gemini 2.5 Flash Image | Free tier (~500/day) |
| Final client assets | Gemini 3 Pro Image | Higher detail, better style consistency |
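The table above can be read as a simple lookup. A minimal sketch, assuming hypothetical shorthand keys such as `scene` and `draft` that are invented for this example; the values are the display names from the table, not API IDs:

```python
# Lookup copied from the model-selection table.
# The keys are hypothetical shorthand, not values any API understands.
MODEL_FOR_NEED = {
    "scene": "Gemini 3.1 Flash Image",
    "transparent_icon": "GPT Image 1.5",
    "text_on_image": "GPT Image 1.5",
    "draft": "Gemini 2.5 Flash Image",
    "final_asset": "Gemini 3 Pro Image",
}


def pick_model(need: str) -> str:
    """Return the display name of the recommended model for a need."""
    if need not in MODEL_FOR_NEED:
        raise ValueError(
            f"Unknown need {need!r}; expected one of {sorted(MODEL_FOR_NEED)}"
        )
    return MODEL_FOR_NEED[need]
```

Keeping the mapping in one place means a model swap is a one-line change rather than a search through prompts and scripts.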
Model IDs
| Model | API ID | Provider |
|---|---|---|
| Gemini 3.1 Flash Image | | Google AI |
| Gemini 3 Pro Image | | Google AI |
| Gemini 2.5 Flash Image | | Google AI |
| GPT Image 1.5 | | OpenAI |
Verify model IDs before use — they change frequently:
```bash
curl -s "https://generativelanguage.googleapis.com/v1beta/models?key=$GEMINI_API_KEY" | python3 -c "import sys,json; [print(m['name']) for m in json.load(sys.stdin)['models'] if 'image' in m['name'].lower()]"
```
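The same filter as the curl one-liner, written as a small Python function so it can be reused in a script. This is only a sketch: `image_model_names` is a hypothetical helper name, and the sample listing is made up to show the shape the one-liner relies on (a `models` list whose entries have a `name` field).

```python
import json


def image_model_names(listing: dict) -> list:
    """Return names of models whose name contains 'image' (case-insensitive)."""
    return [
        m["name"]
        for m in listing.get("models", [])
        if "image" in m["name"].lower()
    ]


# Made-up sample with the same shape as the models listing.
sample = json.loads(
    '{"models": [{"name": "models/example-text"},'
    ' {"name": "models/example-Image-preview"}]}'
)
print(image_model_names(sample))
```

Using `.get("models", [])` means an empty or error response yields an empty list instead of a `KeyError`.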
The 5-Part Prompting Framework
Build prompts in this order for consistent results:
1. Image Type
Set the genre: "A photorealistic photograph", "An isometric illustration", "A flat vector icon"
2. Subject
Who or what, with specific details: "of a warm, approachable Australian woman in her early 30s, smiling naturally"
3. Environment
Setting and spatial relationships: "in a bright modern home with terracotta decor on wooden shelves behind her"
4. Technical Specs
Camera and lighting: "Shot at 85mm f/2.0, natural window light, head and shoulders framing"
5. Constraints
What to exclude: "Photorealistic, no text, no watermarks, no logos"
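The five parts can be assembled mechanically so the order never drifts between prompts. A minimal sketch, assuming a hypothetical helper named `build_prompt`; the sample strings are the ones quoted in the steps above:

```python
def build_prompt(image_type, subject, environment, technical_specs, constraints):
    """Join the five parts in framework order into one narrative prompt."""
    # Parts 1-3 read as a single sentence: type + subject + environment.
    scene = " ".join(p.strip() for p in (image_type, subject, environment))
    sentences = [scene, technical_specs.strip(), constraints.strip()]
    # Normalise each sentence to end with exactly one full stop.
    return " ".join(s.rstrip(".") + "." for s in sentences if s)


prompt = build_prompt(
    "A photorealistic photograph",
    "of a warm, approachable Australian woman in her early 30s, smiling naturally",
    "in a bright modern home with terracotta decor on wooden shelves behind her",
    "Shot at 85mm f/2.0, natural window light, head and shoulders framing",
    "Photorealistic, no text, no watermarks, no logos",
)
print(prompt)
```

The helper only enforces ordering and punctuation; the quality of the result still depends on how specific each part is.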
Example (Good vs Bad)
BAD — keyword soup:
"professional woman, spa, warm lighting, high quality, 4K"
GOOD — narrative direction:
"A professional skin treatment scene in a warm clinical setting.
A practitioner wearing blue medical gloves uses a microneedling pen
on the client's forehead. The client lies on a white treatment bed,
eyes closed, relaxed. Warm golden-hour light from a window to the
left. Terracotta-toned wall visible in the background. Shot at
85mm f/2.0, shallow depth of field. No text, no watermarks."
Workflow
1. Determine Image Need
| Purpose | Aspect Ratio | Model |
|---|---|---|
| Hero banner | 16:9 or 21:9 | Gemini |
| Service card | 4:3 or 3:4 | Gemini |
| Profile / avatar | 1:1 | Gemini |
| Icon / badge | 1:1 | GPT (transparent) |
| OG / social share | 1.91:1 | Gemini |
| Instagram post | 1:1 or 4:5 | Gemini |
| Mobile hero | 9:16 | Gemini |
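When cropping or laying out a generated image, it helps to turn a ratio from the table into pixel dimensions. The sketch below is plain arithmetic under stated assumptions: `IMAGE_NEEDS` and `height_for` are hypothetical names, the purpose keys are invented shorthand, and nothing here claims that either API accepts these sizes as request parameters.

```python
# Purpose -> (aspect ratios, model family), copied from the table.
IMAGE_NEEDS = {
    "hero_banner": (["16:9", "21:9"], "Gemini"),
    "service_card": (["4:3", "3:4"], "Gemini"),
    "profile": (["1:1"], "Gemini"),
    "icon": (["1:1"], "GPT (transparent)"),
    "og_share": (["1.91:1"], "Gemini"),
    "instagram_post": (["1:1", "4:5"], "Gemini"),
    "mobile_hero": (["9:16"], "Gemini"),
}


def height_for(width: int, ratio: str) -> int:
    """Height in pixels for a given width and a 'W:H' aspect ratio string."""
    w, h = (float(x) for x in ratio.split(":"))
    return round(width * h / w)


ratios, family = IMAGE_NEEDS["og_share"]
print(family, ratios[0], 1200, height_for(1200, ratios[0]))
```

For example, a 1200-pixel-wide social share image at 1.91:1 works out to 628 pixels high.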
2. Build the Prompt
Use the 5-part framework. Refer to `references/prompting-guide.md` for detailed photography parameters.
3. Generate via API
Gemini (Python — handles shell escaping correctly)
```python
python3 << 'PYEOF'
import json, base64, urllib.request, os, sys

# Read the key from the environment rather than hard-coding it
GEMINI_API_KEY = os.environ.get("GEMINI_API_KEY")
if not GEMINI_API_KEY:
    print("Set GEMINI_API_KEY environment variable"); sys.exit(1)

model = "gemini-3.1-flash-image-preview"
url = f"https://generativelanguage.googleapis.com/v1beta/models/{model}:generateContent?key={GEMINI_API_KEY}"

prompt = """A professional photograph of a modern co-working space in
Newcastle, Australia. Natural light floods through floor-to-ceiling
windows. Three people collaborate at a standing desk, one pointing
at a laptop screen. Exposed brick wall, potted fiddle-leaf fig,
coffee cups on the desk. Shot at 35mm f/4.0, environmental portrait
style. No text, no watermarks, no logos."""

# Ask for both text and image parts in the response
payload = json.dumps({
    "contents": [{"parts": [{"text": prompt}]}],
    "generationConfig": {
        "responseModalities": ["TEXT", "IMAGE"],
        "temperature": 0.8
    }
}).encode()

req = urllib.request.Request(url, data=payload, headers={
    "Content-Type": "application/json",
    "User-Agent": "ImageGen/1.0"
})
resp = urllib.request.urlopen(req, timeout=120)
result = json.loads(resp.read())

# Extract image from response
for part in result["candidates"][0]["content"]["parts"]:
    if "inlineData" in part:
        img_data = base64.b64decode(part["inlineData"]["data"])
        output_path = "hero-image.png"
        with open(output_path, "wb") as f:
            f.write(img_data)
        print(f"Saved: {output_path} ({len(img_data):,} bytes)")
        break
PYEOF
```
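The extraction loop above saves nothing and prints nothing if no part carries `inlineData`. As a hedged sketch, the helper below makes that case explicit by returning `None`. The name `extract_image` is hypothetical, the response shape is taken from the loop above, and the idea that a reply can contain only text parts is an assumption rather than something the source states.

```python
import base64
from typing import Optional


def extract_image(result: dict) -> Optional[bytes]:
    """Return the first inline image as raw bytes, or None if there is none."""
    for candidate in result.get("candidates", []):
        parts = candidate.get("content", {}).get("parts", [])
        for part in parts:
            inline = part.get("inlineData")
            if inline and "data" in inline:
                return base64.b64decode(inline["data"])
    return None


# Fake response with the same shape as the one parsed above.
fake = {
    "candidates": [{
        "content": {"parts": [
            {"text": "Here is your image."},
            {"inlineData": {"data": base64.b64encode(b"fake-png-bytes").decode()}},
        ]}
    }]
}
print(extract_image(fake))
```

A caller can then branch on `None` and print the text parts for diagnosis instead of exiting silently.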
GPT (Transparent Icons)
```python
python3 << 'PYEOF'
import json, base64, urllib.request, os, sys

# Read the key from the environment rather than hard-coding it
OPENAI_API_KEY = os.environ.get("OPENAI_API_KEY")
if not OPENAI_API_KEY:
    print("Set OPENAI_API_KEY environment variable"); sys.exit(1)

url = "https://api.openai.com/v1/images/generations"

# "background": "transparent" with PNG output gives an alpha channel
payload = json.dumps({
    "model": "gpt-image-1.5",
    "prompt": "A minimal, clean plumbing wrench icon. Flat design, single consistent stroke weight, modern style. On a transparent background.",
    "n": 1,
    "size": "1024x1024",
    "background": "transparent",
    "output_format": "png"
}).encode()

req = urllib.request.Request(url, data=payload, headers={
    "Content-Type": "application/json",
    "Authorization": f"Bearer {OPENAI_API_KEY}"
})
resp = urllib.request.urlopen(req, timeout=120)
result = json.loads(resp.read())

# The image arrives base64-encoded in the first data entry
img_data = base64.b64decode(result["data"][0]["b64_json"])
with open("icon-wrench.png", "wb") as f:
    f.write(img_data)
print(f"Saved: icon-wrench.png ({len(img_data):,} bytes)")
PYEOF
```
4. Save and Optimise
Save generated images to `.jez/artifacts/` or the user's specified path.
Post-processing (optional):
```bash
# Convert to WebP for web use
python3 -c "
from PIL import Image
img = Image.open('hero-image.png')
img.save('hero-image.webp', 'WEBP', quality=85)
print(f'WebP: {img.size[0]}x{img.size[1]}')
"

# Trim whitespace from transparent icons
python3 -c "
from PIL import Image
img = Image.open('icon.png')
trimmed = img.crop(img.getbbox())
trimmed.save('icon-trimmed.png')
"
```
5. Quality Check (Optional)
Send the generated image back to a vision model for QA:
```python
# Send to Gemini Flash for critique
critique_prompt = """Review this image for:
- AI artifacts (extra fingers, floating objects, text errors)
- Technical accuracy (wrong equipment, unsafe positioning)
- Composition issues (awkward cropping, cluttered background)
- Style consistency with a professional stock photo
List any issues found, or say 'PASS' if the image is production-ready."""
```
If issues are found, append them as negative guidance to the original prompt and regenerate.
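One way the critique loop could be wired up is sketched below. All three function names are hypothetical. The request body mirrors the `inlineData` field seen in Gemini responses; the `mimeType` key and the idea of sending the image in the same `parts` list as the text are assumptions, so check them against the current API reference before relying on them.

```python
import base64
import json


def build_critique_payload(image_bytes, critique_prompt, mime_type="image/png"):
    """Build a request body that pairs the image with the critique prompt."""
    body = {
        "contents": [{
            "parts": [
                # Assumed request shape: image first, then the instruction.
                {"inlineData": {
                    "mimeType": mime_type,
                    "data": base64.b64encode(image_bytes).decode("ascii"),
                }},
                {"text": critique_prompt},
            ]
        }]
    }
    return json.dumps(body).encode()


def looks_like_pass(critique_text):
    """Heuristic: the reviewer was told to say 'PASS' when nothing is wrong."""
    return critique_text.strip().upper().startswith("PASS")


def add_negative_guidance(prompt, issues):
    """Append reported issues to the original prompt as things to avoid."""
    if not issues:
        return prompt
    return prompt.rstrip() + " Avoid: " + "; ".join(issues) + "."
```

A typical loop would send the payload, run `looks_like_pass` on the reply text, and regenerate with `add_negative_guidance` only when the check fails.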
Multi-Turn Editing
Gemini supports editing a generated image across conversation turns. The key requirement: preserve thought signatures from model responses.
```python
# Turn 1: Generate base image
contents = [{"role": "user", "parts": [{"text": "Scene prompt..."}]}]
# The response includes thoughtSignature on parts; preserve them ALL

# Turn 2: Edit the image
contents = [
    {"role": "user", "parts": [{"text": "Original prompt"}]},
    {"role": "model", "parts": response_parts_with_signatures},  # Keep intact
    {"role": "user", "parts": [{"text": "Edit: change the wall colour to blue. Keep everything else exactly the same."}]}
]
```
**Edit prompt pattern**: Always specify what to KEEP unchanged, not just what to change. The model treats unlisted elements as free to modify.
GOOD: "Edit this image: keep the people, desk, and window unchanged.
Only change: wall colour from terracotta to ocean blue."
BAD: "Now make the wall blue."
(Model may change everything else too)
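The two-turn structure and the keep/change edit pattern can be combined in one helper. This is a sketch under stated assumptions: `build_edit_contents` is a hypothetical name, and the `thoughtSignature` values in the usage example are placeholders rather than real signatures. The model parts are deep-copied so that nothing downstream can strip or alter them by accident.

```python
import copy


def build_edit_contents(original_prompt, model_parts, keep, change):
    """Build the turn-2 `contents` list, passing model parts through untouched."""
    edit_text = (
        f"Edit this image: keep {', '.join(keep)} unchanged. "
        f"Only change: {change}."
    )
    return [
        {"role": "user", "parts": [{"text": original_prompt}]},
        # Deep copy so signatures on every part survive exactly as received.
        {"role": "model", "parts": copy.deepcopy(model_parts)},
        {"role": "user", "parts": [{"text": edit_text}]},
    ]


# Placeholder parts standing in for a real turn-1 response.
turn1_parts = [
    {"text": "Here is the scene.", "thoughtSignature": "placeholder-1"},
    {"inlineData": {"data": "..."}, "thoughtSignature": "placeholder-2"},
]
contents = build_edit_contents(
    "Original prompt",
    turn1_parts,
    keep=["the people", "desk", "window"],
    change="wall colour from terracotta to ocean blue",
)
print(contents[2]["parts"][0]["text"])
```

Requiring a `keep` list in the signature makes it harder to fall back into the "Now make the wall blue" pattern.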
API Key Setup
| Provider | Get key at | Env variable |
|---|---|---|
| Google Gemini | aistudio.google.com | |
| OpenAI | platform.openai.com | |
```bash
export GEMINI_API_KEY="your-key-here"
export OPENAI_API_KEY="your-key-here"
```
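Before running either generation script, a quick check can report which keys are missing. A minimal sketch, assuming a hypothetical helper `missing_keys`; the variable names are the two exported above, and the optional `env` argument exists only so the function can be exercised without touching the real environment.

```python
import os

# Provider -> environment variable, as exported above.
REQUIRED_KEYS = {
    "Google Gemini": "GEMINI_API_KEY",
    "OpenAI": "OPENAI_API_KEY",
}


def missing_keys(env=None):
    """Return the names of required variables that are unset or empty."""
    env = os.environ if env is None else env
    return [var for var in REQUIRED_KEYS.values() if not env.get(var)]


for var in missing_keys():
    print(f"Set {var} environment variable")
```

An empty string counts as missing, which catches the case where the export line was run with no value.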
Common Mistakes
| Mistake | Fix |
|---|---|
| Using curl for Gemini prompts | Use Python — shell escaping breaks on apostrophes |
| "Beautiful, professional, high quality" | Use concrete specs: "85mm f/1.8, golden hour light" |
| Not specifying what to exclude | Always end with "No text, no watermarks, no logos" |
| Requesting transparent PNG from Gemini | Gemini cannot do transparency — use GPT with |
| American defaults for AU businesses | Explicitly specify "Australian" + local architecture, vegetation |
| Assuming a remembered model ID is still valid | Verify current model IDs; they change frequently |
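The first row is easy to demonstrate. Building the request body with `json.dumps` escapes quotes inside the prompt, so apostrophes and double quotes survive intact without any shell quoting. A small sketch, with `make_payload` as a hypothetical helper name and a made-up prompt:

```python
import json


def make_payload(prompt: str) -> bytes:
    """Serialise a Gemini request body; json.dumps handles the escaping."""
    return json.dumps({"contents": [{"parts": [{"text": prompt}]}]}).encode()


# Apostrophes and double quotes are exactly what breaks shell-quoted curl calls.
tricky = """The client's hands rest beside a sign reading "Open" in the shop's window."""
decoded = json.loads(make_payload(tricky))
print(decoded["contents"][0]["parts"][0]["text"] == tricky)
```

The round trip returns the prompt unchanged, which is the property a hand-quoted curl command cannot guarantee.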