gpt-image-2

GPT Image 2 — Interactive Image Generation

Generate and edit images via OpenAI's GPT Image 2 API with an interactive, guided workflow.

Interactive Flow

When the user invokes this skill, guide them through these steps using AskUserQuestion. Do not skip steps — the interactive flow is the core experience.

Step 1: What are we making?

Ask the user what they want to create. Offer these options:
  • Single image — one image from a text prompt
  • Photo edit — transform an existing photo into a style
  • Carousel — 5-10 cohesive slides for LinkedIn/Instagram
  • Variants — multiple versions of the same concept
  • Quick generate — skip questions, just run the prompt
If the user already provided a clear prompt (e.g. "generate an editorial image of a rocket"), skip to Step 3.

Step 2: Style selection

Show the user the available presets grouped by category. Read presets.yaml and present them:
  • Visual styles (no text in image): editorial, blueprint, ink, risograph, wireframe, constellation, brutalist, grain
  • Text-heavy (leverages GPT Image 2 text rendering): infographic, slide, diagram, poster, menu, manga
  • Community favorites: trading-card, pixar, app-mockup, isometric, action-figure, cinematic, panorama
  • Custom: user describes their own style
Ask: "Which style? Or describe your own."
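
One way to hold the grouped presets in memory before presenting them is sketched below. The preset names are copied from the lists above, but the dictionary layout and both helper functions are assumptions for illustration; the actual structure of presets.yaml is not shown in this document.

```python
# Preset names come from the lists above. The dict layout is an assumption,
# not the real schema of presets.yaml.
PRESET_GROUPS = {
    "Visual styles (no text in image)": [
        "editorial", "blueprint", "ink", "risograph",
        "wireframe", "constellation", "brutalist", "grain",
    ],
    "Text-heavy": ["infographic", "slide", "diagram", "poster", "menu", "manga"],
    "Community favorites": [
        "trading-card", "pixar", "app-mockup", "isometric",
        "action-figure", "cinematic", "panorama",
    ],
}


def format_preset_menu(groups):
    """Return a plain-text menu with one line per category, plus the custom option."""
    lines = [f"{category}: {', '.join(names)}" for category, names in groups.items()]
    lines.append("Custom: describe your own style")
    return "\n".join(lines)


def find_category(groups, preset):
    """Return the category containing `preset`, or None (treated as a custom style)."""
    for category, names in groups.items():
        if preset in names:
            return category
    return None


print(format_preset_menu(PRESET_GROUPS))
```

Treating an unknown name as a custom style keeps the "describe your own" path open without a separate branch.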

Step 3: Platform & sizing

Ask where this will be used:
  • YouTube thumbnail (1280×720)
  • Instagram square (1080×1080)
  • Slides/presentation (1920×1080)
  • Blog hero (1200×630)
  • X/Twitter (1600×900)
  • Story (1080×1920)
  • Custom size
  • No resize (use API default)
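
The sizing options above can be sketched as a small lookup table. The pixel sizes are taken from the list; the key names are hypothetical except for "square", which appears in the CLI reference, and the real keys live in platforms.yaml.

```python
from math import gcd

# Sizes are from the list above. Key names other than "square" are hypothetical.
PLATFORM_SIZES = {
    "youtube": (1280, 720),
    "square": (1080, 1080),
    "slides": (1920, 1080),
    "blog": (1200, 630),
    "twitter": (1600, 900),
    "story": (1080, 1920),
}


def aspect_ratio(width, height):
    """Reduce a pixel size to its simplest width:height ratio."""
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"


def resolve_size(platform=None, custom=None):
    """Return (width, height), or None when the user picks 'No resize'."""
    if custom is not None:
        return custom
    if platform is None:
        return None
    return PLATFORM_SIZES[platform]


for name, (w, h) in PLATFORM_SIZES.items():
    print(f"{name}: {w}x{h} ({aspect_ratio(w, h)})")
```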

Step 4: Draft first, then final

Always generate a draft first unless the user says "skip draft" or passes --draft false.
  1. Generate with --draft (quality=low, ~$0.006/image)
  2. Show the image to the user using the Read tool
  3. Ask: "Like this direction? I can: (a) generate final quality, (b) adjust the prompt, (c) try a different style, (d) regenerate with a new seed"
  4. If approved, generate the final with --quality high (~$0.21/image)
  5. Reuse the draft's --seed to maintain composition when upgrading to final
At roughly $0.006 per draft versus $0.21 per final, this draft→final flow saves ~97% on each iteration.
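
The ~97% figure follows directly from the two per-image prices quoted in the steps. The short calculation below reproduces it; the function names are illustrative only and are not part of the script.

```python
DRAFT_COST = 0.006  # USD per image with --draft (quality=low), from the steps above
FINAL_COST = 0.21   # USD per image with --quality high, from the steps above


def iteration_savings(draft_cost=DRAFT_COST, final_cost=FINAL_COST):
    """Fraction saved per iteration by drafting instead of rendering at final quality."""
    return 1 - draft_cost / final_cost


def session_cost(draft_rounds, finals=1):
    """Total cost of a session with several draft rounds and a number of finals."""
    return draft_rounds * DRAFT_COST + finals * FINAL_COST


print(f"{iteration_savings():.1%}")  # 97.1%
print(f"${session_cost(5):.2f}")     # $0.24
```

Five draft rounds plus one final cost about $0.24, compared with $1.26 for six renders at final quality.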

Step 5: Show result and offer next actions

After generation, always:
  1. Show the image using the Read tool
  2. Open it with open <path> for a full-resolution preview
  3. Report the cost
  4. Offer: "Want to (a) generate variants, (b) edit this further, (c) use as reference for more images, (d) done?"

Carousel Workflow

When the user wants a carousel (5-10 slides):

1. Story arc

Ask: "What's the story? Give me the key message and I'll draft a 10-slide arc."
Then propose a slide-by-slide plan like:
Slide 1: [Cover] — hook headline + hero image
Slide 2: [Problem] — bold statement
Slide 3: [Context] — illustration + explanation
...
Slide 10: [CTA] — call to action with URL
Ask the user to approve or modify the plan.

2. Style consistency

Use the same preset + seed range across all slides. For carousels:
  • Pick one visual style for all slides
  • Use --seed to lock composition patterns
  • Include pagination dots in prompts (e.g., "10 small dots at bottom, third dot highlighted orange")
  • Maintain a consistent color palette and typography
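
As a rough illustration of the pagination-dot and palette points, the sketch below builds one prompt per slide with a shared palette phrase and a per-slide dot hint. Both helpers are hypothetical and only mirror the wording of the example above; they are not functions of the script.

```python
ORDINALS = ["first", "second", "third", "fourth", "fifth",
            "sixth", "seventh", "eighth", "ninth", "tenth"]


def pagination_hint(slide_number, total_slides, color="orange"):
    """Return the pagination-dot prompt fragment for one slide of a carousel."""
    if not 1 <= slide_number <= total_slides <= len(ORDINALS):
        raise ValueError("need 1 <= slide_number <= total_slides <= 10")
    return (f"{total_slides} small dots at bottom, "
            f"{ORDINALS[slide_number - 1]} dot highlighted {color}")


def carousel_prompts(slide_descriptions, palette):
    """Append the shared palette and a per-slide pagination hint to each description."""
    total = len(slide_descriptions)
    return [
        f"{description}, {palette}, {pagination_hint(index, total)}"
        for index, description in enumerate(slide_descriptions, start=1)
    ]


print(pagination_hint(3, 10))  # 10 small dots at bottom, third dot highlighted orange
```

Keeping the palette phrase identical across slides is what makes the batch read as one set; only the slide description and the highlighted dot change.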

3. Draft batch

Generate all slides as drafts first ($0.006 × 10 = $0.06 total). Show them all to the user as a contact sheet or one by one. Ask which ones to regenerate or adjust.

4. Final batch

Only generate finals for approved slides. Offer to generate all at once with the -y flag.

Photo Edit Workflow

When the user wants to transform a photo:
  1. Ask for the source image (file path or clipboard)
  2. For clipboard: save it to a temp file with osascript
  3. Show available styles and ask which to try
  4. Generate a draft edit first
  5. Show the result and ask if they want adjustments
  6. Generate the final when approved
Use --edit <path> for the API call.

Cost Awareness

Always communicate costs before generating:

| Quality | Per image | 10-slide carousel |
|---|---|---|
| --draft (low) | $0.006 | $0.06 |
| medium | $0.05 | $0.50 |
| high (default) | $0.21 | $2.10 |
| high + thinking | $0.25-0.42 | $2.50-4.20 |

Thinking mode adds 20-100% to the cost. Only suggest it for text-heavy or complex compositions.
The script auto-confirms when the estimated cost is under $0.50. Otherwise, it prompts the user.
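
The table and the confirmation rule can be expressed as a small estimator. This is a sketch under stated assumptions, not the script's actual logic: the function names are hypothetical, and treating a cost of exactly $0.50 as requiring confirmation is one reading of the "cost < $0.50" rule.

```python
from decimal import Decimal

# Per-image prices from the table above. Decimal avoids float rounding at the threshold.
PRICE_PER_IMAGE = {
    "low": Decimal("0.006"),
    "medium": Decimal("0.05"),
    "high": Decimal("0.21"),
}
AUTO_CONFIRM_BELOW = Decimal("0.50")


def estimate_cost(quality, n=1):
    """Estimated cost in USD for n images at the given quality."""
    return PRICE_PER_IMAGE[quality] * n


def thinking_range(quality, n=1):
    """Cost range with thinking mode, which adds 20-100% according to the text."""
    base = estimate_cost(quality, n)
    return base * Decimal("1.2"), base * 2


def needs_confirmation(quality, n=1):
    """True when the estimate is not below the auto-confirm threshold (assumed reading)."""
    return estimate_cost(quality, n) >= AUTO_CONFIRM_BELOW


print(estimate_cost("high", 10))        # 2.10
print(needs_confirmation("low", 10))    # False
print(needs_confirmation("medium", 10)) # True
```

Under this reading, a 10-slide draft batch runs without a prompt, while a 10-slide carousel at medium or high quality asks first.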

Prompt Engineering Tips

When helping users write prompts, apply these patterns:
  1. Structure: Scene → Subject → Detail → Lighting → Constraint
  2. Front-load the subject: put the main thing first
  3. For text in images: quote the exact text, e.g. 'with the headline "Hello World"' (single quotes around the phrase, double quotes around the literal text)
  4. Character consistency: keep the same 5-tuple for a character across images: age + appearance + hairstyle + distinctive features + clothing
  5. Style tags at end: append tags like editorial-magazine or studio-product so that a batch converges on one look
  6. Use --seed for iteration: lock composition, vary only the prompt details
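
The patterns above can be combined into a small prompt builder. The sketch below is illustrative only: `Character` and `build_prompt` are hypothetical names, and the segment order simply follows the structure listed in item 1.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Character:
    """The 5-tuple from item 4, reused verbatim in every prompt for the same character."""
    age: str
    appearance: str
    hairstyle: str
    distinctive_features: str
    clothing: str

    def describe(self):
        return ", ".join([self.age, self.appearance, self.hairstyle,
                          self.distinctive_features, self.clothing])


def build_prompt(scene, subject, detail, lighting, constraint,
                 headline=None, style_tags=()):
    """Join the segments in the listed order, then quoted text, then style tags at the end."""
    parts = [scene, subject, detail, lighting, constraint]
    if headline is not None:
        parts.append(f'with the headline "{headline}"')
    parts.extend(style_tags)
    return ", ".join(part for part in parts if part)


hero = Character("woman in her thirties", "tall and athletic", "short black hair",
                 "round glasses", "orange flight suit")
print(build_prompt("a launch pad at dawn", hero.describe(), "steam venting from the rocket",
                   "low golden light", "no other people",
                   headline="Hello World", style_tags=["editorial-magazine"]))
```

Because the character description is generated from one frozen object, every slide or variant receives identical wording for the character, which is the point of the 5-tuple.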

CLI Reference

```bash
# Basic generation
scripts/gpt_image_2.py "prompt" output.png

# With preset and platform
scripts/gpt_image_2.py --preset editorial --platform square "subject" out.png

# Draft mode (~$0.006/image)
scripts/gpt_image_2.py --draft "prompt" out.png

# With thinking for complex layouts
scripts/gpt_image_2.py --thinking medium --preset diagram "OAuth flow" out.png

# Seed for reproducibility
scripts/gpt_image_2.py --seed 42 "prompt" out.png

# Edit existing photo
scripts/gpt_image_2.py --edit photo.png "transform into constellation style" out.png

# Variants with contact sheet
scripts/gpt_image_2.py --n 4 --preset ink "mountain" out.png

# Cost estimate
scripts/gpt_image_2.py --estimate --n 10 --quality high "batch test"

# Skip confirmation
scripts/gpt_image_2.py -y --n 10 "batch" out.png

# Dry run (show prompt without API call)
scripts/gpt_image_2.py --dry-run --preset editorial "test" out.png
```
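
To tie the CLI flags back to the draft→final flow, the sketch below builds the two command lines with a shared seed. It only assembles argument lists and does not call the API. Every flag appears in the reference above, but combining --draft, --seed, and --preset in one invocation is an assumption, and the helper names are hypothetical.

```python
import shlex

SCRIPT = "scripts/gpt_image_2.py"


def draft_command(prompt, output, seed, preset=None):
    """Argument list for a low-cost draft with a fixed seed."""
    args = [SCRIPT, "--draft", "--seed", str(seed)]
    if preset:
        args += ["--preset", preset]
    return args + [prompt, output]


def final_command(prompt, output, seed, preset=None):
    """Argument list for the final render, reusing the draft's seed."""
    args = [SCRIPT, "--quality", "high", "--seed", str(seed)]
    if preset:
        args += ["--preset", preset]
    return args + [prompt, output]


draft = draft_command("mountain at dusk", "draft.png", seed=42, preset="ink")
final = final_command("mountain at dusk", "final.png", seed=42, preset="ink")
print(shlex.join(draft))
print(shlex.join(final))
```

Passing the list form to `subprocess.run` avoids shell quoting problems with prompts that contain quotes; `shlex.join` is used here only to display the equivalent shell line.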

Files

  • scripts/gpt_image_2.py: main CLI (Python, requires PyYAML)
  • presets.yaml: 21 style presets (visual + text-heavy + community)
  • platforms.yaml: 8 platform sizing presets
  • references/api_reference.md: full API documentation
  • ~/.config/gpt-image-2/config.yaml: user defaults
  • ~/.config/gpt-image-2/history.jsonl: generation log
  • ~/.config/gpt-image-2/last.json: last run (used by again)
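
Since history.jsonl is a JSON Lines log, it can be summarized with a few lines of standard-library Python. The sketch below parses sample lines rather than the real file, and the "prompt" and "cost" field names are hypothetical; the actual schema of the log is not described in this document.

```python
import json
from pathlib import Path

# Location from the file list above. The sketch does not read this file.
HISTORY_PATH = Path.home() / ".config" / "gpt-image-2" / "history.jsonl"


def read_history(lines):
    """Parse JSON Lines input: one JSON object per non-empty line."""
    return [json.loads(line) for line in lines if line.strip()]


def total_cost(entries, field="cost"):
    """Sum a numeric field across entries. "cost" is a hypothetical field name."""
    return sum(entry.get(field, 0) for entry in entries)


# Sample records with made-up fields, standing in for the real log.
sample = [
    '{"prompt": "mountain", "cost": 0.006}',
    "",
    '{"prompt": "mountain", "cost": 0.21}',
]
entries = read_history(sample)
print(len(entries), f"${total_cost(entries):.3f}")
```

To run it against the real log, `read_history(HISTORY_PATH.read_text().splitlines())` would be the equivalent call, with the field name adjusted to whatever the script actually writes.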