image-generation
Image Generation & Editing Skill
Generate and edit images using AI (Google Gemini Nano Banana / Nano Banana Pro, OpenAI GPT Image).
Capabilities:
- 🎨 Generate: Create new images from text descriptions
- ✏️ Edit: Modify existing images (add/remove elements, change colors)
- 🛍️ Product Placement: Put products into scenes
- 🎭 Style Transfer: Apply artistic styles to photos
- 🖼️ Composite: Combine multiple images into one
Quick Examples
Users can specify what they want:
| User Says | Mode | What Happens |
|---|---|---|
| "Generate an image of a sunset" | Generate | Text-to-image, no reference needed |
| "Create a logo for my coffee shop" | Generate | Text-to-image with text rendering |
| "Edit this image: add a hat to the cat" | Edit | User provides image, AI modifies it |
| "Remove the background from this photo" | Edit | User provides image, AI edits it |
| "Put this product on a kitchen counter" | Product | User provides product + optional scene |
| "Make this photo look like Van Gogh painted it" | Style | User provides photo, AI applies style |
| "Combine these photos into a group shot" | Composite | User provides multiple images |
Prerequisites
Environment variables must be configured for the APIs to work. At least one API key is required:
- `OPENAI_API_KEY` - For OpenAI GPT Image generation
- `GOOGLE_API_KEY` - For Google Gemini (Nano Banana / Nano Banana Pro)
See the repository README for setup instructions.
Available APIs
OpenAI GPT Image (Recommended for pure generation)
- Models:
  - `gpt-image-1.5` (state of the art, best quality)
  - `gpt-image-1` (great quality, cost-effective)
  - `gpt-image-1-mini` (fastest, most affordable)
- Best for: High-quality generation, transparency, text rendering, image editing
- Sizes: 1024x1024 (square), 1536x1024 (landscape), 1024x1536 (portrait), or `auto`
- Quality: low (fast), medium (balanced), high (best), or `auto`
- Background: transparent, opaque, or `auto`
- Output formats: png (default), jpeg (faster), webp
- Compression: 0-100% (for jpeg/webp)
- Features:
  - Image editing with up to 16 input images
  - Transparent backgrounds
  - Streaming with partial images
  - High input fidelity for preserving faces/logos
  - Inpainting with masks
  - 32,000 character prompts
⚠️ Note: DALL-E 2 and DALL-E 3 are deprecated and will stop being supported on May 12, 2026.
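The option list above can be checked before a request is sent. The following is a minimal sketch, assuming only the values listed in this section; the function name `validate_gpt_image_options` and its parameter names are hypothetical and are not part of the skill's scripts or the OpenAI SDK.

```python
# Allowed values transcribed from the option list above.
SIZES = {"1024x1024", "1536x1024", "1024x1536", "auto"}
QUALITIES = {"low", "medium", "high", "auto"}
BACKGROUNDS = {"transparent", "opaque", "auto"}
FORMATS = {"png", "jpeg", "webp"}
MAX_INPUT_IMAGES = 16
MAX_PROMPT_CHARS = 32_000


def validate_gpt_image_options(prompt, size="auto", quality="auto",
                               background="auto", output_format="png",
                               compression=None, image_count=0):
    """Return a list of problems; an empty list means the options look valid."""
    problems = []
    if len(prompt) > MAX_PROMPT_CHARS:
        problems.append(f"prompt exceeds {MAX_PROMPT_CHARS} characters")
    if size not in SIZES:
        problems.append(f"unsupported size: {size}")
    if quality not in QUALITIES:
        problems.append(f"unsupported quality: {quality}")
    if background not in BACKGROUNDS:
        problems.append(f"unsupported background: {background}")
    if output_format not in FORMATS:
        problems.append(f"unsupported output format: {output_format}")
    if compression is not None:
        # Compression is listed for jpeg/webp only.
        if output_format not in ("jpeg", "webp"):
            problems.append("compression applies only to jpeg/webp output")
        elif not 0 <= compression <= 100:
            problems.append("compression must be between 0 and 100")
    if image_count > MAX_INPUT_IMAGES:
        problems.append(f"at most {MAX_INPUT_IMAGES} input images are supported")
    return problems


if __name__ == "__main__":
    print(validate_gpt_image_options("a cat", size="512x512"))
```

Returning a list of problems rather than raising on the first one lets the caller report every invalid option to the user in a single message.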
Google Gemini Native Image Generation (Recommended for editing)
- Nano Banana (`gemini-2.5-flash-image`): Fast, efficient, 1K resolution, up to 3 reference images
- Nano Banana Pro (`gemini-3-pro-image-preview`): Professional quality, up to 4K, thinking mode, up to 14 reference images (default)
- Aspect ratios: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
- Resolutions (Pro only): 1K, 2K, 4K
- Features:
  - Image editing (add/remove elements, color changes)
  - Product placement and composition
  - Style transfer
  - Advanced text rendering
  - Google Search grounding (Pro only)
  - Thinking mode for complex prompts (Pro only)
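The per-model limits above (resolution, aspect ratio, reference image count) can be expressed as data and checked up front. This is a sketch under the assumption that the limits are exactly as listed in this section; `GeminiImageModel` and `check_gemini_request` are hypothetical names, not part of `gemini.py` or the Gemini SDK.

```python
from dataclasses import dataclass

# Aspect ratios transcribed from the list above.
ASPECT_RATIOS = {"1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"}


@dataclass(frozen=True)
class GeminiImageModel:
    model_id: str
    resolutions: tuple
    max_reference_images: int


NANO_BANANA = GeminiImageModel("gemini-2.5-flash-image", ("1K",), 3)
NANO_BANANA_PRO = GeminiImageModel("gemini-3-pro-image-preview", ("1K", "2K", "4K"), 14)


def check_gemini_request(model, aspect_ratio="1:1", resolution="1K", reference_count=0):
    """Return a list of problems with the request; empty means it fits the model."""
    problems = []
    if aspect_ratio not in ASPECT_RATIOS:
        problems.append(f"unsupported aspect ratio: {aspect_ratio}")
    if resolution not in model.resolutions:
        problems.append(f"{model.model_id} does not support {resolution}")
    if reference_count > model.max_reference_images:
        problems.append(
            f"{model.model_id} accepts at most {model.max_reference_images} reference images"
        )
    return problems


if __name__ == "__main__":
    print(check_gemini_request(NANO_BANANA, "1:1", "4K", 0))
```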
Workflow
Step 1: Gather Requirements (REQUIRED)
⚠️ Use interactive questioning — ask ONE question at a time.
Question Flow
⚠️ Use the `AskUserQuestion` tool for each question below. Do not just print questions in your response; use the tool to create interactive prompts with the options shown.
Q0: Model Selection
"Which image generation model would you like to use?
- Google Gemini (Nano Banana Pro) - Up to 4K, 14 reference images, style transfer, thinking mode (Recommended)
- OpenAI GPT Image 1.5 - State of the art, transparency, streaming, up to 16 input images
- OpenAI GPT Image 1 - Great quality, transparency, image editing
- OpenAI GPT Image 1 Mini - Fastest, most affordable"
Wait for response. If user doesn't have a preference, recommend Gemini for editing/reference tasks or GPT Image 1.5 for pure generation.
Q1: Reference
"I'll generate that image for you! First — do you have any reference images?
- Product photos to include
- Style references
- Images to edit
- No, generate from scratch"
Wait for response.
Q2: Aspect Ratio
"What aspect ratio?
- 1:1 (square)
- 16:9 (landscape/widescreen)
- 9:16 (portrait/vertical)
- 4:3 / 3:4 (classic)
- Other (2:3, 3:2, 4:5, 5:4, 21:9)
- Or specify"
Wait for response.
Q3: Resolution
"What resolution?
- 1K (fast)
- 2K (balanced)
- 4K (highest quality)"
Wait for response.
Q4: Style
"Any style preferences?
- Photorealistic
- Artistic/painterly
- Cartoon/illustration
- 3D render
- Or describe your own"
Wait for response.
Quick Reference
| Question | Determines |
|---|---|
| Reference | Generation vs editing mode |
| Aspect Ratio | Image dimensions |
| Resolution | Quality level |
| Style | Prompt enhancement direction |
Parsing:
- If user provides reference images → use image editing mode
- If user doesn't answer all questions → use sensible defaults and note assumptions
- Parse: subject, style, mood, special requirements (colors, text, composition)
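The parsing rules above can be sketched as a small function that fills in defaults, records which values were assumed, and picks the mode from the presence of reference images. The default values in `DEFAULTS` are illustrative assumptions (the text only says "sensible defaults"), and `resolve_requirements` is a hypothetical helper name.

```python
# Hypothetical defaults; the workflow above only asks for "sensible defaults".
DEFAULTS = {"aspect_ratio": "1:1", "resolution": "2K", "style": "photorealistic"}


def resolve_requirements(answers, reference_images=()):
    """Merge user answers with defaults and record which values were assumed."""
    resolved = {}
    assumptions = []
    for key, default in DEFAULTS.items():
        value = answers.get(key)
        if value:
            resolved[key] = value
        else:
            resolved[key] = default
            assumptions.append(f"{key} defaulted to {default}")
    # Reference images switch the request from generation to editing.
    resolved["mode"] = "edit" if reference_images else "generate"
    resolved["reference_images"] = list(reference_images)
    return resolved, assumptions


if __name__ == "__main__":
    resolved, assumptions = resolve_requirements({"aspect_ratio": "16:9"}, ["cat.jpg"])
    print(resolved)
    print(assumptions)
```

The `assumptions` list is what gets reported back to the user, so that defaults are never applied silently.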
Step 2: Craft the Prompt
Transform the user request into an effective image generation prompt:
- Be specific: Add details the user might not have mentioned
- Describe style: "digital art", "oil painting", "photograph", "3D render"
- Include lighting: "soft lighting", "dramatic shadows", "golden hour"
- Specify quality: "highly detailed", "8k", "professional"
Example transformation:
- User: "a cat in space"
- Enhanced: "A majestic orange tabby cat floating in outer space, surrounded by colorful nebulae and distant stars, wearing a small astronaut helmet, digital art style, highly detailed, vibrant colors, cinematic lighting"
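One way to make the transformation above repeatable is to assemble the prompt from named components. The sketch below is an illustration only; `enhance_prompt` and its parameters are hypothetical, and the ordering of components is an assumption chosen to reproduce the example above.

```python
def enhance_prompt(subject, details=(), style=None, quality=(), lighting=None):
    """Join prompt components in a fixed order, skipping empty ones."""
    parts = [subject, *details, style, *quality, lighting]
    return ", ".join(p.strip() for p in parts if p and p.strip())


if __name__ == "__main__":
    print(enhance_prompt(
        "A majestic orange tabby cat floating in outer space",
        details=("surrounded by colorful nebulae and distant stars",
                 "wearing a small astronaut helmet"),
        style="digital art style",
        quality=("highly detailed", "vibrant colors"),
        lighting="cinematic lighting",
    ))
```

Keeping the components separate also makes it easy to hand the user the pieces they can change when iterating.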
Step 3: Select the API
Use the model selected by the user in Q0:
- Check which API keys are configured in the environment:
  - `OPENAI_API_KEY` → GPT Image models available
  - `GOOGLE_API_KEY` → Gemini (Nano Banana Pro) available
- If the user's selected model isn't available: Inform them and offer alternatives.
- Model mapping from Q0:
  - "Google Gemini (Nano Banana Pro)" → Use `gemini.py` with `gemini-3-pro-image-preview`
  - "OpenAI GPT Image 1.5" → Use `openai_image.py` with `gpt-image-1.5`
  - "OpenAI GPT Image 1" → Use `openai_image.py` with `gpt-image-1`
  - "OpenAI GPT Image 1 Mini" → Use `openai_image.py` with `gpt-image-1-mini`
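The selection logic above amounts to a lookup plus a key check. The following sketch assumes the mapping exactly as listed; `MODEL_MAP` and `select_model` are hypothetical names introduced for illustration.

```python
import os

# Q0 answer -> (script, model ID, required environment variable), per the mapping above.
MODEL_MAP = {
    "Google Gemini (Nano Banana Pro)": ("gemini.py", "gemini-3-pro-image-preview", "GOOGLE_API_KEY"),
    "OpenAI GPT Image 1.5": ("openai_image.py", "gpt-image-1.5", "OPENAI_API_KEY"),
    "OpenAI GPT Image 1": ("openai_image.py", "gpt-image-1", "OPENAI_API_KEY"),
    "OpenAI GPT Image 1 Mini": ("openai_image.py", "gpt-image-1-mini", "OPENAI_API_KEY"),
}


def select_model(choice, env=None):
    """Resolve the user's choice, or list usable alternatives if its key is missing."""
    env = os.environ if env is None else env
    script, model_id, key = MODEL_MAP[choice]
    if env.get(key):
        return {"script": script, "model": model_id, "alternatives": []}
    # The chosen model is unavailable: offer every option whose key is configured.
    alternatives = [name for name, (_, _, k) in MODEL_MAP.items() if env.get(k)]
    return {"script": None, "model": None, "alternatives": alternatives}


if __name__ == "__main__":
    print(select_model("OpenAI GPT Image 1"))
```

Passing `env` explicitly keeps the function easy to exercise without touching the real environment.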
Step 4: Generate the Image
Execute the appropriate script from `${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/`:

For OpenAI GPT Image - Text to Image:
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
  --prompt "your enhanced prompt" \
  --model "gpt-image-1" \
  --size "1024x1024" \
  --quality "high" \
  --output "/path/to/output.png"
```

For OpenAI GPT Image - With Transparent Background:
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
  --prompt "A product icon with no background" \
  --model "gpt-image-1" \
  --background "transparent" \
  --quality "high" \
  --output "/path/to/output.png"
```

For OpenAI GPT Image - Image Editing (with reference images):
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
  --prompt "Add a wizard hat to this cat" \
  --model "gpt-image-1" \
  --image "/path/to/cat.jpg" \
  --input-fidelity "high" \
  --output "/path/to/output.png"
```

For OpenAI GPT Image - Multiple Reference Images:
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
  --prompt "Create a gift basket containing these items" \
  --model "gpt-image-1" \
  --image "/path/to/item1.png" \
  --image "/path/to/item2.png" \
  --image "/path/to/item3.png" \
  --output "/path/to/output.png"
```

For OpenAI GPT Image - With Mask (Inpainting):
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
  --prompt "Replace the pool with a garden" \
  --model "gpt-image-1" \
  --image "/path/to/scene.jpg" \
  --mask "/path/to/mask.png" \
  --output "/path/to/output.png"
```

For OpenAI GPT Image - Streaming with Partial Images:
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/openai_image.py \
  --prompt "A beautiful sunset over mountains" \
  --model "gpt-image-1" \
  --stream \
  --partial-images 2 \
  --output "/path/to/output.png"
```

For Google Gemini (Nano Banana Pro) - Text to Image:
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/gemini.py \
  --prompt "your enhanced prompt" \
  --model "gemini-3-pro-image-preview" \
  --aspect-ratio "1:1" \
  --resolution "2K" \
  --output "/path/to/output.png"
```

For Google Gemini - With Reference Images (editing, product placement, etc.):
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/gemini.py \
  --prompt "Add a wizard hat to this cat" \
  --image "/path/to/cat.jpg" \
  --aspect-ratio "1:1" \
  --resolution "2K"
```

For Google Gemini - Multiple Reference Images (composition, style transfer):
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/gemini.py \
  --prompt "Place this product on the kitchen counter in this scene" \
  --image "/path/to/product.png" \
  --image "/path/to/kitchen.jpg" \
  --aspect-ratio "16:9" \
  --resolution "2K"
```

For Google Gemini (Nano Banana - faster, fewer features):
```bash
python3 ${CLAUDE_PLUGIN_ROOT}/skills/image-generation/scripts/gemini.py \
  --prompt "your enhanced prompt" \
  --model "gemini-2.5-flash-image" \
  --aspect-ratio "1:1"
```
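When these commands are assembled programmatically, building the argument list directly avoids shell quoting problems with prompts that contain quotes or spaces. This is a sketch only: `build_openai_command` is a hypothetical helper, and it uses just the flags that appear in the examples above (`--prompt`, `--model`, `--image`, `--size`, `--quality`, `--output`).

```python
import os
import shlex


def build_openai_command(prompt, output, model="gpt-image-1", images=(),
                         size=None, quality=None, plugin_root=None):
    """Build the argv list for openai_image.py using flags shown in the examples above."""
    if plugin_root is None:
        plugin_root = os.environ.get("CLAUDE_PLUGIN_ROOT", "")
    script = f"{plugin_root}/skills/image-generation/scripts/openai_image.py"
    cmd = ["python3", script, "--prompt", prompt, "--model", model]
    for path in images:
        # One --image flag per reference image, as in the multi-image example.
        cmd += ["--image", path]
    if size:
        cmd += ["--size", size]
    if quality:
        cmd += ["--quality", quality]
    cmd += ["--output", output]
    return cmd


if __name__ == "__main__":
    cmd = build_openai_command(
        'Add a "wizard" hat to this cat', "/path/to/output.png",
        images=["/path/to/cat.jpg"], quality="high",
    )
    # Print a copy-pasteable form; pass `cmd` to subprocess.run(cmd, check=True) to execute.
    print(shlex.join(cmd))
```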
Step 5: Deliver the Result
- Show the generated image to the user
- Provide the enhanced prompt used (so they can iterate)
- Offer to:
  - Generate variations
  - Try a different style
  - Use a different API/model
  - Refine the prompt
Error Handling
Missing API key: Inform the user which key is needed (`OPENAI_API_KEY` or `GOOGLE_API_KEY`) and how to set it up; see the repository README.
API rate limit: Suggest waiting or trying the other API.
Content policy violation: Rephrase the prompt to be more appropriate.
Generation failed: Retry with simplified prompt or different API.
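The rate-limit and generation-failure rules above can be combined into one fallback loop. The sketch below is illustrative: the exception classes, `generate_with_fallback`, and the default `simplify` strategy (keep the text before the first comma) are all assumptions, and each provider is modeled as a plain callable rather than a real API client.

```python
class RateLimitError(Exception):
    """Hypothetical error raised when a provider reports a rate limit."""


class GenerationError(Exception):
    """Hypothetical error raised when a provider fails to produce an image."""


def generate_with_fallback(prompt, providers, simplify=lambda p: p.split(",")[0]):
    """Try each (name, callable) provider; retry once with a simplified prompt."""
    errors = []
    for name, generate in providers:
        for attempt_prompt in (prompt, simplify(prompt)):
            try:
                return name, generate(attempt_prompt)
            except RateLimitError as exc:
                errors.append((name, f"rate limited: {exc}"))
                break  # do not retry a rate-limited API; move to the next provider
            except GenerationError as exc:
                errors.append((name, f"failed: {exc}"))
    raise GenerationError(f"all providers failed: {errors}")


if __name__ == "__main__":
    def busy(prompt):
        raise RateLimitError("slow down")

    def working(prompt):
        return f"image for {prompt}"

    print(generate_with_fallback("a cat, in space", [("openai", busy), ("gemini", working)]))
```

Content policy violations are deliberately left out of the loop, since the rule above calls for rephrasing the prompt rather than retrying it elsewhere.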
Reference Image Use Cases
Both OpenAI GPT Image and Google Gemini support reference images for advanced editing:
- OpenAI GPT Image: Up to 16 input images, with `input_fidelity: high` for preserving faces/logos
- Google Gemini: Nano Banana (up to 3), Nano Banana Pro (up to 14)
Image Editing
- "Add a santa hat to this person" + person.jpg
- "Remove the background and replace with a beach scene" + product.jpg
- "Change the sofa color to blue" + living_room.jpg
Product Placement
- "Place this product on a marble kitchen counter" + product.png + kitchen.jpg
- "Show this watch on a person's wrist" + watch.png + arm.jpg
Style Transfer
- "Transform this photo into Van Gogh's Starry Night style" + photo.jpg
- "Make this look like a watercolor painting" + landscape.jpg
Multi-Image Composition
- "Create a group photo of these people in an office" + person1.jpg + person2.jpg + person3.jpg
- "Combine these elements into a cohesive scene" + element1.png + element2.png + background.jpg
Character Consistency
- "Show this character from a different angle" + character.jpg
- "Put this person in a superhero costume" + person.jpg
Tip: For best results with reference images, be specific about what you want to preserve vs. change.
Prompt Engineering Tips
For Photorealism
- Include "photograph", "DSLR", "35mm film"
- Specify camera settings: "shallow depth of field", "bokeh"
- Add lighting: "natural light", "studio lighting"
For Artistic Styles
- Reference art movements: "impressionist", "art nouveau", "cyberpunk"
- Name artist styles: "in the style of Studio Ghibli", "Moebius style"
- Specify medium: "watercolor", "oil painting", "pencil sketch"
For Consistency
- Use seed values when available
- Save successful prompts for reference
- Note which API produced best results for similar requests
API Comparison
| Feature | GPT Image 1.5 | GPT Image 1 | GPT Image 1 Mini | Nano Banana | Nano Banana Pro |
|---|---|---|---|---|---|
| Provider | OpenAI | OpenAI | OpenAI | Google | Google |
| Model ID | gpt-image-1.5 | gpt-image-1 | gpt-image-1-mini | gemini-2.5-flash-image | gemini-3-pro-image-preview |
| Best for | State of the art | Quality + value | Speed + cost | Fast generation | Professional assets |
| Sizes | 1024², 1536x1024, 1024x1536, auto | Same | Same | 1K only | Up to 4K |
| Quality options | low, medium, high, auto | Same | Same | N/A | N/A |
| Aspect ratios | 3 + auto | Same | Same | 10 options | 10 options |
| Reference images | Up to 16 | Up to 16 | Up to 16 | Up to 3 | Up to 14 |
| Image editing | Yes | Yes | Yes | Yes | Yes |
| Inpainting (mask) | Yes | Yes | Yes | Yes | Yes |
| Transparent background | Yes | Yes | Yes | No | No |
| Streaming | Yes | Yes | Yes | No | No |
| Input fidelity | high/low | high/low | low only | N/A | N/A |
| Output formats | png, jpeg, webp | Same | Same | png | png |
| Compression | 0-100% | Same | Same | No | No |
| Text rendering | Excellent | Excellent | Good | Good | Excellent |
| Thinking mode | No | No | No | No | Yes |
| Max prompt length | 32,000 chars | 32,000 chars | 32,000 chars | N/A | N/A |
| Speed | ~30-60s | ~20-40s | ~10-20s | ~10-20s | ~30-60s |
⚠️ DALL-E 2 and DALL-E 3 are deprecated and will stop being supported on May 12, 2026. Use GPT Image models instead.
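The comparison table can also be used as a filter: given the features a request needs, list the models that qualify. The sketch below transcribes only four rows of the table (transparent background, streaming, thinking mode, reference images); `CAPABILITIES` and `matching_models` are hypothetical names.

```python
# Subset of the comparison table above, keyed by model ID.
_GPT_IMAGE = {"transparent": True, "streaming": True, "thinking": False, "max_refs": 16}
CAPABILITIES = {
    "gpt-image-1.5": _GPT_IMAGE,
    "gpt-image-1": _GPT_IMAGE,
    "gpt-image-1-mini": _GPT_IMAGE,
    "gemini-2.5-flash-image": {
        "transparent": False, "streaming": False, "thinking": False, "max_refs": 3,
    },
    "gemini-3-pro-image-preview": {
        "transparent": False, "streaming": False, "thinking": True, "max_refs": 14,
    },
}


def matching_models(transparent=False, streaming=False, thinking=False, reference_images=0):
    """Return model IDs, in table order, that satisfy every requested feature."""
    return [
        model_id
        for model_id, caps in CAPABILITIES.items()
        if (not transparent or caps["transparent"])
        and (not streaming or caps["streaming"])
        and (not thinking or caps["thinking"])
        and reference_images <= caps["max_refs"]
    ]


if __name__ == "__main__":
    print(matching_models(transparent=True))
    print(matching_models(thinking=True, reference_images=10))
```

An empty result means no single model covers the request, which is a cue to ask the user which requirement to relax.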