gpt-image-2


GPT Image 2 — Interactive Image Generation


Generate and edit images via OpenAI's GPT Image 2 API with an interactive, guided workflow.


Interactive Flow


When the user invokes this skill, guide them through these steps using AskUserQuestion. Do not skip steps — the interactive flow is the core experience.


Step 1: What are we making?


Ask the user what they want to create. Offer these options:

Single image — one image from a text prompt
Photo edit — transform an existing photo into a style
Carousel — 5-10 cohesive slides for LinkedIn/Instagram
Variants — multiple versions of the same concept
Quick generate — skip questions, just run the prompt

If the user already provided a clear prompt (e.g. "generate an editorial image of a rocket"), skip to Step 3.


Step 2: Style selection


Show the user available presets grouped by category. Read `presets.yaml` and present them:

Visual styles (no text in image): editorial, blueprint, ink, risograph, wireframe, constellation, brutalist, grain

Text-heavy (leverages GPT Image 2 text rendering): infographic, slide, diagram, poster, menu, manga

Community favorites: trading-card, pixar, app-mockup, isometric, action-figure, cinematic, panorama

Custom — user describes their own style

Ask: "Which style? Or describe your own."
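The grouping above can be sketched as a plain mapping. This is only an illustration: the real `presets.yaml` schema is not shown in this document, so the `PRESET_GROUPS` name, the helper functions, and the category labels used as keys are assumptions. The preset names themselves are copied from the lists above.

```python
# Hypothetical in-memory view of the preset categories listed above.
# The actual layout of presets.yaml may differ.
PRESET_GROUPS = {
    "Visual styles": ["editorial", "blueprint", "ink", "risograph",
                      "wireframe", "constellation", "brutalist", "grain"],
    "Text-heavy": ["infographic", "slide", "diagram", "poster", "menu", "manga"],
    "Community favorites": ["trading-card", "pixar", "app-mockup", "isometric",
                            "action-figure", "cinematic", "panorama"],
}


def format_preset_menu(groups):
    """Return one line per category, ready to show in the style question."""
    lines = [f"{category}: {', '.join(names)}" for category, names in groups.items()]
    lines.append("Custom: describe your own style")
    return "\n".join(lines)


def find_category(groups, preset):
    """Return the category a preset belongs to, or None for a custom style."""
    for category, names in groups.items():
        if preset in names:
            return category
    return None
```

Calling `format_preset_menu(PRESET_GROUPS)` yields four lines, one per category plus the custom option, which can be pasted into the style question.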


Step 3: Platform & sizing


Ask where this will be used:

YouTube thumbnail (1280×720)
Instagram square (1080×1080)
Slides/presentation (1920×1080)
Blog hero (1200×630)
X/Twitter (1600×900)
Story (1080×1920)
Custom size
No resize (use API default)
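A minimal sketch of how the sizes above might be looked up. The pixel dimensions come from the list; the dictionary keys are hypothetical, since `square` is the only `--platform` value that appears in this document, and the contents of `platforms.yaml` are not shown here.

```python
from math import gcd

# Sizes copied from the list above. Only "square" is a confirmed --platform
# value; the other keys are hypothetical names chosen for this sketch.
PLATFORM_SIZES = {
    "youtube": (1280, 720),
    "square": (1080, 1080),
    "slides": (1920, 1080),
    "blog": (1200, 630),
    "twitter": (1600, 900),
    "story": (1080, 1920),
}


def aspect_ratio(width, height):
    """Reduce a pixel size to its simplest width:height ratio."""
    divisor = gcd(width, height)
    return (width // divisor, height // divisor)


def resolve_size(platform, custom=None):
    """Return (width, height), or None for "No resize" (API default)."""
    if platform == "custom":
        return custom
    return PLATFORM_SIZES.get(platform)
```

The ratio helper is handy when explaining choices to the user: the YouTube, slides, and X/Twitter sizes all reduce to 16:9, while the story size is 9:16.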


Step 4: Draft first, then final


Always generate a draft first unless the user says "skip draft" or uses `--draft false`.

Generate with `--draft` (quality=low, ~$0.006/image)
Show the image to the user using the Read tool
Ask: "Like this direction? I can: (a) generate final quality, (b) adjust the prompt, (c) try a different style, (d) regenerate with a new seed"
If approved, generate final with `--quality high` (~$0.21/image)
Use `--seed` from the draft to maintain composition when upgrading to final

This draft→final flow saves ~97% on iteration costs.
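The ~97% figure follows from the two per-image prices quoted in this step. A short worked calculation, with function names that are purely illustrative:

```python
DRAFT_COST = 0.006   # USD per image at quality=low, as quoted above
FINAL_COST = 0.21    # USD per image at --quality high, as quoted above


def iteration_savings(draft_cost, final_cost):
    """Fraction saved by iterating on a draft instead of a final-quality image."""
    return 1 - draft_cost / final_cost


def session_cost(draft_rounds, draft_cost=DRAFT_COST, final_cost=FINAL_COST):
    """Total for several draft rounds followed by a single final render."""
    return draft_rounds * draft_cost + final_cost
```

With these prices, `iteration_savings(DRAFT_COST, FINAL_COST)` is about 0.971. Five draft rounds plus one final come to roughly $0.24, compared with $1.26 for six high-quality renders.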


Step 5: Show result and offer next actions


After generation, always:

Show the image using the Read tool
Open it with `open <path>` for full-resolution preview
Report the cost
Offer: "Want to (a) generate variants, (b) edit this further, (c) use as reference for more images, (d) done?"


Carousel Workflow


When the user wants a carousel (5-10 slides):


1. Story arc


Ask: "What's the story? Give me the key message and I'll draft a 10-slide arc."

Then propose a slide-by-slide plan like:

Slide 1: [Cover] — hook headline + hero image
Slide 2: [Problem] — bold statement
Slide 3: [Context] — illustration + explanation
...
Slide 10: [CTA] — call to action with URL

Ask the user to approve or modify the plan.


2. Style consistency


Use the same preset + seed range across all slides. For carousels:

Pick one visual style for all slides
Use `--seed` to lock composition patterns
Include pagination dots in prompts (e.g., "10 small dots at bottom, third dot highlighted orange")
Maintain consistent color palette and typography
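One way to keep the pagination dots and style wording identical across slides is to generate them rather than type them per slide. The sketch below is an assumption about how that could be done; `pagination_hint` and `carousel_prompts` are hypothetical helpers, not part of the script.

```python
ORDINALS = ["first", "second", "third", "fourth", "fifth",
            "sixth", "seventh", "eighth", "ninth", "tenth"]


def pagination_hint(slide_number, total_slides, color="orange"):
    """Describe the pagination dots for one slide (slide_number is 1-based)."""
    if not 1 <= slide_number <= total_slides <= len(ORDINALS):
        raise ValueError("slide_number must be within 1..total_slides (max 10)")
    return (f"{total_slides} small dots at bottom, "
            f"{ORDINALS[slide_number - 1]} dot highlighted {color}")


def carousel_prompts(slide_texts, style_note):
    """Append the same style note and a per-slide pagination hint to every prompt."""
    total = len(slide_texts)
    return [f"{text}. {style_note}. {pagination_hint(i, total)}"
            for i, text in enumerate(slide_texts, start=1)]
```

For slide 3 of 10, `pagination_hint(3, 10)` reproduces the example wording above. Each resulting prompt would then be run with the same preset and seed.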


3. Draft batch


Generate all slides as drafts first ($0.006 × 10 = $0.06 total). Show them all to the user as a contact sheet or one by one. Ask which ones to regenerate or adjust.


4. Final batch


Only generate finals for approved slides. Offer to generate all at once with the `-y` flag.


Photo Edit Workflow


When the user wants to transform a photo:

Ask for the source image (file path or clipboard)
For clipboard: save with `osascript` to a temp file
Show available styles and ask which to try
Generate a draft edit first
Show result, ask if they want adjustments
Generate final when approved

Use `--edit <path>` for the API call.
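The draft-then-final edit steps could be driven from Python by building the argument list for each run. This is a sketch under one stated assumption: that `--draft` and `--quality high` can be combined with `--edit`, which the workflow implies but this document does not show explicitly. `edit_command` is a hypothetical helper.

```python
SCRIPT = "scripts/gpt_image_2.py"


def edit_command(source_path, prompt, output_path, draft=True):
    """Build the argument list for one photo-edit run (draft or final)."""
    args = [SCRIPT]
    # Assumption: --draft / --quality high may be combined with --edit.
    args += ["--draft"] if draft else ["--quality", "high"]
    args += ["--edit", source_path, prompt, output_path]
    return args
```

The returned list can be passed to `subprocess.run(args, check=True)`. Building a list rather than a shell string avoids quoting problems when the prompt contains quotes.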


Cost Awareness


Always communicate costs before generating:

| Quality | Per image | 10-slide carousel |
|---|---|---|
| `--draft` (low) | $0.006 | $0.06 |
| medium | $0.05 | $0.50 |
| high (default) | $0.21 | $2.10 |
| high + thinking | $0.25-0.42 | $2.50-4.20 |

Thinking mode adds 20-100% cost. Only suggest it for text-heavy or complex compositions.

The script auto-confirms when cost < $0.50. Above that, it prompts the user.
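The table and the confirmation rule can be checked with a few lines of arithmetic. The sketch below uses the prices from the table; the function names are hypothetical, and treating a cost of exactly $0.50 as requiring confirmation is an assumption based on a literal reading of "cost < $0.50". The script's real behavior at that boundary is not stated here.

```python
from decimal import Decimal

# Per-image prices from the table above (USD).
PRICE_PER_IMAGE = {
    "low": Decimal("0.006"),
    "medium": Decimal("0.05"),
    "high": Decimal("0.21"),
}
AUTO_CONFIRM_BELOW = Decimal("0.50")


def estimate_cost(quality, count=1, thinking_surcharge=Decimal("0")):
    """Estimate a batch cost; thinking_surcharge is a fraction (0.20 to 1.00)."""
    return PRICE_PER_IMAGE[quality] * count * (1 + thinking_surcharge)


def needs_confirmation(cost):
    """True when the cost is not strictly below the auto-confirm threshold."""
    # Assumption: exactly $0.50 still prompts, since only cost < $0.50 auto-confirms.
    return cost >= AUTO_CONFIRM_BELOW
```

`Decimal` is used instead of floats so that sums such as ten drafts at $0.006 compare exactly against the threshold. A 20% and a 100% thinking surcharge on a high-quality image give $0.252 and $0.42, matching the range in the table.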


Prompt Engineering Tips


When helping users write prompts, apply these patterns:

Structure: Scene → Subject → Detail → Lighting → Constraint
Front-load the subject: put the main thing first
For text in images: quote exact text with single quotes: `'with the headline "Hello World"'`
Character consistency: maintain a 5-tuple: age + appearance + hairstyle + distinctive features + clothing
Style tags at end: append tags like `editorial-magazine`, `studio-product` to converge batches
Use `--seed` for iteration: lock composition, vary only the prompt details
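The patterns above can be applied mechanically. The following sketch assembles a prompt in the stated order and appends style tags last; `build_prompt` and `headline_clause` are hypothetical helpers, and joining the parts with commas is an assumption rather than a documented requirement.

```python
def build_prompt(scene, subject, detail, lighting, constraint, style_tags=()):
    """Join the parts in Scene, Subject, Detail, Lighting, Constraint order."""
    parts = [scene, subject, detail, lighting, constraint]
    prompt = ", ".join(part.strip() for part in parts if part and part.strip())
    if style_tags:
        prompt += ", " + ", ".join(style_tags)  # style tags go at the end
    return prompt


def headline_clause(text):
    """Quote the exact on-image text, as in the headline example above."""
    return f'with the headline "{text}"'
```

Empty parts are skipped, so a prompt with no lighting note still reads cleanly. Keeping the same `style_tags` tuple across a batch is one simple way to follow the "converge batches" tip.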


CLI Reference


Basic generation

scripts/gpt_image_2.py "prompt" output.png

With preset and platform

scripts/gpt_image_2.py --preset editorial --platform square "subject" out.png

Draft mode (~$0.006/image)

scripts/gpt_image_2.py --draft "prompt" out.png

With thinking for complex layouts

scripts/gpt_image_2.py --thinking medium --preset diagram "OAuth flow" out.png

Seed for reproducibility

scripts/gpt_image_2.py --seed 42 "prompt" out.png

Edit existing photo

scripts/gpt_image_2.py --edit photo.png "transform into constellation style" out.png

Variants with contact sheet

scripts/gpt_image_2.py --n 4 --preset ink "mountain" out.png

Cost estimate

scripts/gpt_image_2.py --estimate --n 10 --quality high "batch test"

Skip confirmation

scripts/gpt_image_2.py -y --n 10 "batch" out.png

Dry run (show prompt without API call)

scripts/gpt_image_2.py --dry-run --preset editorial "test" out.png

Files


`scripts/gpt_image_2.py`: main CLI (Python, requires PyYAML)
`presets.yaml`: 21 style presets (visual + text-heavy + community)
`platforms.yaml`: 8 platform sizing presets
`references/api_reference.md`: full API documentation
`~/.config/gpt-image-2/config.yaml`: user defaults
`~/.config/gpt-image-2/history.jsonl`: generation log
`~/.config/gpt-image-2/last.json`: last run (for `again`)
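Since the generation log is a `.jsonl` file, it can presumably be read one JSON object per line. The sketch below makes that assumption and nothing more: the fields stored in each entry are not documented here, so the reader returns plain dictionaries. `read_history` is a hypothetical helper.

```python
import json
from pathlib import Path

# Path taken from the file list above.
HISTORY_PATH = Path.home() / ".config" / "gpt-image-2" / "history.jsonl"


def read_history(path=HISTORY_PATH):
    """Return one dict per non-empty line of a JSON Lines log; [] if missing."""
    path = Path(path)
    if not path.exists():
        return []
    with path.open(encoding="utf-8") as handle:
        # Each non-blank line is assumed to hold one complete JSON object.
        return [json.loads(line) for line in handle if line.strip()]
```

For example, `len(read_history())` gives the number of logged runs, and returns 0 on a machine where the tool has never been used.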
