GPT Image 2 — Interactive Image Generation
Generate and edit images via OpenAI's GPT Image 2 API with an interactive, guided workflow.
Interactive Flow
When the user invokes this skill, guide them through these steps using AskUserQuestion. Do not skip steps except where a step below says to; the interactive flow is the core experience.
Step 1: What are we making?
Ask the user what they want to create. Offer these options:
- Single image — one image from a text prompt
- Photo edit — transform an existing photo into a style
- Carousel — 5-10 cohesive slides for LinkedIn/Instagram
- Variants — multiple versions of the same concept
- Quick generate — skip questions, just run the prompt
If the user already provided a clear prompt (e.g. "generate an editorial image of a rocket"), skip to Step 3.
Step 2: Style selection
Show the user the available presets, grouped by category. Read and present them:
Visual styles (no text in image):
editorial, blueprint, ink, risograph, wireframe, constellation, brutalist, grain
Text-heavy (leverages GPT Image 2 text rendering):
infographic, slide, diagram, poster, menu, manga
Community favorites:
trading-card, pixar, app-mockup, isometric, action-figure, cinematic, panorama
Custom — user describes their own style
Ask: "Which style? Or describe your own."
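As a rough sketch, the preset groups above could be held in a small lookup table so that any unrecognized answer falls through to the custom option. The preset names come from the lists above; the `PRESETS` layout and the `category_of` helper are hypothetical and are not taken from the skill's files.

```python
# Preset names copied from the lists above. The dict layout is an assumption.
PRESETS = {
    "visual": ["editorial", "blueprint", "ink", "risograph",
               "wireframe", "constellation", "brutalist", "grain"],
    "text-heavy": ["infographic", "slide", "diagram", "poster", "menu", "manga"],
    "community": ["trading-card", "pixar", "app-mockup", "isometric",
                  "action-figure", "cinematic", "panorama"],
}


def category_of(style: str) -> str:
    """Return the category of a known preset, or "custom" for anything else."""
    for category, names in PRESETS.items():
        if style in names:
            return category
    return "custom"
```

A free-text answer such as "watercolor" would be treated as a custom style description rather than rejected.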
Step 3: Platform & sizing
Ask where this will be used:
- YouTube thumbnail (1280×720)
- Instagram square (1080×1080)
- Slides/presentation (1920×1080)
- Blog hero (1200×630)
- X/Twitter (1600×900)
- Story (1080×1920)
- Custom size
- No resize (use API default)
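The sizing options above amount to a lookup from platform to pixel dimensions. The sketch below is illustrative only: the dimensions come from the list, but the key names (other than `square`, which appears in the CLI reference) and both helpers are assumptions.

```python
from math import gcd

# Dimensions from the list above. Key names other than "square" are hypothetical.
PLATFORM_SIZES = {
    "youtube": (1280, 720),
    "square": (1080, 1080),
    "slides": (1920, 1080),
    "blog": (1200, 630),
    "twitter": (1600, 900),
    "story": (1080, 1920),
}


def resolve_size(platform, custom=None):
    """Return (width, height), or None when no resize was requested."""
    if platform == "none":
        return None
    if platform == "custom":
        if custom is None:
            raise ValueError("custom size requires explicit (width, height)")
        return custom
    return PLATFORM_SIZES[platform]


def aspect_ratio(size):
    """Reduce (width, height) to its simplest ratio, e.g. "16:9"."""
    width, height = size
    divisor = gcd(width, height)
    return f"{width // divisor}:{height // divisor}"
```

Reporting the aspect ratio alongside the size can help the user confirm the choice, since several platforms share 16:9.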
Step 4: Draft first, then final
Always generate a draft first unless the user says "skip draft" or uses
.
- Generate with --draft (quality=low, ~$0.006/image)
- Show the image to the user using the Read tool
- Ask: "Like this direction? I can: (a) generate final quality, (b) adjust the prompt, (c) try a different style, (d) regenerate with a new seed"
- If approved, generate the final at high quality (~$0.21/image)
- Use from the draft to maintain composition when upgrading to final
This draft→final flow saves ~97% on iteration costs.
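The ~97% figure follows from the two per-image prices quoted above. A minimal check of the arithmetic, with hypothetical function names:

```python
DRAFT_COST = 0.006  # per draft image, from the step above
FINAL_COST = 0.21   # per final image, from the step above


def iteration_savings(draft=DRAFT_COST, final=FINAL_COST):
    """Fraction saved on each iteration by drafting instead of rendering a final."""
    return 1 - draft / final


def session_cost(iterations):
    """Total cost of `iterations` drafts followed by one final render."""
    return iterations * DRAFT_COST + FINAL_COST
```

Here `iteration_savings()` is about 0.971, and five draft iterations plus one final cost about $0.24, against $1.05 for five final-quality renders.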
Step 5: Show result and offer next actions
After generation, always:
- Show the image using the Read tool
- Open it with for full-resolution preview
- Report the cost
- Offer: "Want to (a) generate variants, (b) edit this further, (c) use as reference for more images, (d) done?"
Carousel Workflow
When the user wants a carousel (5-10 slides):
1. Story arc
Ask: "What's the story? Give me the key message and I'll draft a 10-slide arc."
Then propose a slide-by-slide plan like:
Slide 1: [Cover] — hook headline + hero image
Slide 2: [Problem] — bold statement
Slide 3: [Context] — illustration + explanation
...
Slide 10: [CTA] — call to action with URL
Ask the user to approve or modify the plan.
2. Style consistency
Use the same preset and seed range across all slides:
- Pick one visual style for all slides
- Use to lock composition patterns
- Include pagination dots in prompts (e.g., "10 small dots at bottom, third dot highlighted orange")
- Maintain consistent color palette and typography
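One way to keep slides consistent is to generate every prompt from the same template, so that only the slide content and the highlighted pagination dot change. This is a sketch under assumptions: the `slide_prompts` helper and its exact wording are hypothetical, modeled on the pagination example above.

```python
def slide_prompts(slides, style, accent="orange"):
    """Build one prompt per slide with a shared style and pagination dots."""
    total = len(slides)
    prompts = []
    for index, content in enumerate(slides, start=1):
        # Same dot phrasing on every slide; only the highlighted position moves.
        dots = f"{total} small dots at bottom, dot {index} highlighted {accent}"
        prompts.append(f"{content}, {style} style, {dots}")
    return prompts


plan = ["Cover: hook headline with hero image",
        "Problem: bold statement",
        "CTA: call to action with URL"]
prompts = slide_prompts(plan, "editorial")
```

Keeping the style and pagination text identical across prompts leaves the slide content as the only variable between generations.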
3. Draft batch
Generate all slides as drafts first ($0.006 × 10 = $0.06 total). Show them all to the user as a contact sheet or one by one. Ask which ones to regenerate or adjust.
4. Final batch
Only generate finals for approved slides. Offer to generate all at once with
flag.
Photo Edit Workflow
When the user wants to transform a photo:
- Ask for the source image (file path or clipboard)
- For clipboard: save with to a temp file
- Show available styles and ask which to try
- Generate a draft edit first
- Show result, ask if they want adjustments
- Generate final when approved
Cost Awareness
Always communicate costs before generating:
| Quality | Per image | 10-slide carousel |
|---|---|---|
| (low) | $0.006 | $0.06 |
| medium | $0.05 | $0.50 |
| high (default) | $0.21 | $2.10 |
| high + thinking | $0.25-0.42 | $2.50-4.20 |
Thinking mode adds 20-100% cost. Only suggest it for text-heavy or complex compositions.
The script auto-confirms when cost < $0.50. Above that, it prompts the user.
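The table and the confirmation threshold can be expressed as a small estimator. This is a sketch, not the script's actual logic: the prices come from the table, while the function names and the treatment of a cost of exactly $0.50 (here it asks for confirmation) are assumptions.

```python
# Per-image prices from the table above.
PRICE_PER_IMAGE = {"low": 0.006, "medium": 0.05, "high": 0.21}
AUTO_CONFIRM_BELOW = 0.50  # dollars


def estimate(quality, n=1):
    """Estimated cost in dollars for n images at the given quality."""
    return PRICE_PER_IMAGE[quality] * n


def thinking_range(n=1):
    """Cost range for high quality with thinking (adds 20-100%)."""
    base = estimate("high", n)
    return (base * 1.2, base * 2.0)


def needs_confirmation(cost):
    """Assumption: anything not strictly below the threshold prompts the user."""
    return not cost < AUTO_CONFIRM_BELOW
```

Under these numbers a 10-slide draft batch ($0.06) runs without a prompt, while a 10-slide high-quality batch ($2.10) asks first.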
Prompt Engineering Tips
When helping users write prompts, apply these patterns:
- Structure: Scene → Subject → Detail → Lighting → Constraint
- Front-load the subject: put the main thing first
- For text in images: put the exact text in double quotes and wrap the prompt in single quotes:
'with the headline "Hello World"'
- Character consistency: maintain a 5-tuple: age + appearance + hairstyle + distinctive features + clothing
- Style tags at end: append tags like , to converge batches
- Use for iteration: lock composition, vary only the prompt details
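The structure and character-consistency tips above can be captured in a small prompt builder. This is an illustrative sketch: the `Character` dataclass and `build_prompt` function are hypothetical helpers, and joining the parts with commas is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Character:
    """The 5-tuple from the tips above, reused verbatim in every prompt."""
    age: str
    appearance: str
    hairstyle: str
    distinctive_features: str
    clothing: str

    def describe(self):
        return ", ".join([self.age, self.appearance, self.hairstyle,
                          self.distinctive_features, self.clothing])


def build_prompt(scene, subject, detail, lighting, constraint, style_tags=()):
    """Assemble Scene, Subject, Detail, Lighting, Constraint, then style tags last."""
    parts = [scene, subject, detail, lighting, constraint, *style_tags]
    return ", ".join(part for part in parts if part)


hero = Character("mid-30s", "tall woman", "short silver hair",
                 "scar over left eyebrow", "green flight jacket")
prompt = build_prompt("rooftop at dusk", hero.describe(), "holding a paper map",
                      "warm rim light", "no text in image",
                      style_tags=("film grain",))
```

Reusing the same `Character` description across prompts keeps the five attributes identical from image to image, which is the point of the 5-tuple.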
CLI Reference
```bash
# Basic generation
scripts/gpt_image_2.py "prompt" output.png

# With preset and platform
scripts/gpt_image_2.py --preset editorial --platform square "subject" out.png

# Draft mode (~$0.006/image)
scripts/gpt_image_2.py --draft "prompt" out.png

# With thinking for complex layouts
scripts/gpt_image_2.py --thinking medium --preset diagram "OAuth flow" out.png

# Seed for reproducibility
scripts/gpt_image_2.py --seed 42 "prompt" out.png

# Edit existing photo
scripts/gpt_image_2.py --edit photo.png "transform into constellation style" out.png

# Variants with contact sheet
scripts/gpt_image_2.py --n 4 --preset ink "mountain" out.png

# Cost estimate
scripts/gpt_image_2.py --estimate --n 10 --quality high "batch test"

# Skip confirmation
scripts/gpt_image_2.py -y --n 10 "batch" out.png

# Dry run (show prompt without API call)
scripts/gpt_image_2.py --dry-run --preset editorial "test" out.png
```
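When driving the CLI from another script, building the argument list programmatically avoids shell-quoting mistakes in prompts. The sketch below uses only flags shown in the reference above; the `build_command` helper is hypothetical, and it assumes these flags can be combined freely, which the reference does not confirm.

```python
import shlex

SCRIPT = "scripts/gpt_image_2.py"


def build_command(prompt, output, preset=None, platform=None,
                  draft=False, seed=None):
    """Return an argv list for the CLI; nothing is executed here."""
    cmd = [SCRIPT]
    if preset:
        cmd += ["--preset", preset]
    if platform:
        cmd += ["--platform", platform]
    if draft:
        cmd.append("--draft")
    if seed is not None:
        cmd += ["--seed", str(seed)]
    cmd += [prompt, output]
    return cmd


# shlex.join renders the list as a copy-pasteable shell line.
preview = shlex.join(build_command("mountain lake", "out.png",
                                   preset="ink", draft=True))
```

The list form can be passed directly to `subprocess.run`, and the `preview` string is useful for showing the user what will run.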
Files
- — main CLI (Python, requires PyYAML)
- — 21 style presets (visual + text-heavy + community)
- — 8 platform sizing presets
- references/api_reference.md — full API documentation
- ~/.config/gpt-image-2/config.yaml — user defaults
- ~/.config/gpt-image-2/history.jsonl — generation log
- ~/.config/gpt-image-2/last.json — last run (for )
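The generation log is a JSONL file, so it can be read one JSON object per line. The sketch below assumes nothing about the record fields, which are not documented here; `read_history` is a hypothetical helper.

```python
import json
from pathlib import Path

HISTORY = Path.home() / ".config" / "gpt-image-2" / "history.jsonl"


def read_history(path=HISTORY):
    """Return all log records as dicts; an absent file yields an empty list."""
    if not path.exists():
        return []
    records = []
    with path.open(encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:  # tolerate blank lines between records
                records.append(json.loads(line))
    return records
```

A caller could then inspect `read_history()[-1]` to see the most recent entry, whatever fields the script records there.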