GPT Image 2
This is a focused Skill for GPT Image 2. It can run in 3 runtime environments with significant behavioral differences, so determine the current operating mode before doing anything else.
It only handles two types of image tasks:
- Image generation:
- Image editing:
This file retains: operating modes, Skill structure, environment variables, saving/naming rules, template index, and mode-aware workflows. Detailed templates are all placed in
, organized hierarchically:
- Level 1: Category directories
- Level 2: Individual template Markdown files
Operating Modes (Must Read, Confirm Before Any Operation)
This Skill comes with a lightweight detection script. Run it first, then decide how to proceed based on the results:
```bash
node skills/gpt-image-2/scripts/check-mode.js
# To get structured results for upper-level programs:
node skills/gpt-image-2/scripts/check-mode.js --json
```
The output will indicate
/
/
along with a
. The three modes are defined as follows:
Mode A · Garden Local Image Generation
Trigger Condition: Environment variable
is true (
/
/
/
)
and exists.
Behavior: Complete end-to-end workflow of "select template → write prompt → call script → generate and save image".
- Use the generation script for text-to-image generation and the editing script for editing existing images.
- Prompts are saved to garden-gpt-image-2/prompt/ by default, and images are saved to garden-gpt-image-2/image/.
- This is the most powerful mode: you are the "owner" of the image tool.
Mode B · Host-Native Delegated Image Generation
Trigger Condition: Garden is not enabled (
is not set / false), but the current host Agent has built-in image generation tools or an image MCP.
Typical Identification Signals (you should self-check):
- Tools like / / / / / or similar names appear in your toolset
- Users call this Skill in clients that support native image generation such as ChatGPT / Codex / Gemini / Cursor
- Users explicitly say "use your own tool to generate images"
Behavior: This Skill degrades to a prompt engineering guide:
- Still follow the workflow of "select template → fill in fields → render final prompt".
- Do not call the generation or editing scripts (there is no API key, so the call would fail).
- Directly call the host's built-in image tool, passing the rendered prompt as input.
- If users wish, you can save the prompt file to garden-gpt-image-2/prompt/, but the image storage location is determined by the host, not by this Skill.
Mode C · Advisor Pure Prompt Consultant
Trigger Condition: Garden is not enabled, and the host Agent has no image generation tools.
Behavior: This Skill degrades to a "high-quality prompt writing consultant":
- Follow the workflow of "select template → fill in fields → render final prompt", and ask users if information is missing.
- Print the final prompt directly to users and save a copy to garden-gpt-image-2/prompt/<task-slug>-<timestamp>.md.
- Attach a short "how to use" suggestion (e.g.: paste into ChatGPT / Midjourney / DALL·E / Sora / Nano Banana / your own backend / third-party GPT Image 2 gateway).
- Do not pretend image generation was successful. Clearly inform users: "A high-quality reusable prompt has been generated. Please execute it with your image tool."
Mode Decision Table
| Condition | Mode | Call Script? | Save Prompt? | Save Image? |
|---|---|---|---|---|
| + API key exists | A | ✅ / | ✅ Auto | ✅ Auto |
| but no API key | A? | ❌ (Ask for API key first) | — | — |
| Garden not enabled + host has image tools | B | ❌ (Use host tools) | Optional | Determined by host |
| Garden not enabled + host has no image tools | C | ❌ | ✅ Mandatory | ❌ (Impossible) |
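The decision table above can be sketched as a small pure function. This is only an illustration, not the actual logic of check-mode.js: the function name `decideMode` and the three boolean inputs are hypothetical, and how each boolean is detected (environment variable, API key, host toolset) is left to the caller.

```javascript
// Hypothetical helper mirroring the Mode Decision Table.
// Inputs are assumed to be detected elsewhere; names are illustrative only.
function decideMode({ gardenEnabled, hasApiKey, hostHasImageTools }) {
  if (gardenEnabled) {
    // Garden enabled: Mode A if an API key exists, otherwise ask for one first.
    return hasApiKey
      ? { mode: "A", callScript: true, savePrompt: "auto", saveImage: "auto" }
      : { mode: "A?", callScript: false, action: "ask for API key first" };
  }
  if (hostHasImageTools) {
    // Garden not enabled, but the host can generate images itself.
    return { mode: "B", callScript: false, savePrompt: "optional", saveImage: "host" };
  }
  // Neither Garden nor host tools: prompt consultant only.
  return { mode: "C", callScript: false, savePrompt: "mandatory", saveImage: "none" };
}
```

Keeping the decision separate from detection makes each row of the table easy to check in isolation.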
When Mode is Uncertain
- If you cannot determine whether you are in Mode B or C, directly ask users: "Shall I use the image tool in your environment to generate images, or just write the prompt for you?"
- If a Mode A script call fails (401 / network / quota), report the error and ask: "Switch to Mode B / C?"
User Input Tools
When this Skill needs to ask users questions, follow these rules:
- Prioritize using the user input tools provided by the current runtime.
- If no corresponding tool exists, ask with short plain text numbered questions.
- Combine questions as much as possible and ask them all at once.
Skill Structure
- : Run this first to detect the operating mode (A / B / C)
- : Text-to-image generation (only used in Mode A)
- : Image editing based on original image/mask (only used in Mode A)
- : Shared logic for requests, saving, and environment variable reading
- : Hierarchical structured prompt templates (used in all three modes A / B / C)
Environment Variables
Read configurations in the following order:
- CLI parameters
Core variables:
- — Mode switch. Enable Mode A when set to / / / ; enter Mode B / C if not set or set to other values.
- — Required for Mode A; not needed for B / C.
- — Defaults to https://api.openai.com/v1; can point to third-party compatible gateways.
- — Defaults to , can be replaced with models supported by the gateway (e.g., / ).
The default implementation works with OpenAI-compatible APIs and does not hardcode any third-party gateways.
Default Output Directories
If users do not explicitly specify output paths, uniformly use the following directories in the current workspace:
- Prompt directory: garden-gpt-image-2/prompt/ (recommended for all three modes A / B / C for easy reuse and version management)
- Image directory: garden-gpt-image-2/image/ (only used in Mode A; determined by the host in Mode B; no images are generated in Mode C)
If the directories do not exist, scripts (Mode A) must create them automatically; Mode B / C should manually run
before writing prompts.
Default Naming Rules
If users do not explicitly specify filenames, scripts should automatically generate filenames related to the current task and append the current timestamp to avoid duplicates.
Naming rules:
- Prompt: garden-gpt-image-2/prompt/<task-slug>-<timestamp>.md
- Image: garden-gpt-image-2/image/<task-slug>-<timestamp>.png
Where:
- : A relevant short name automatically extracted based on current user requirements
- : Current timestamp, e.g.,
Examples:
garden-gpt-image-2/prompt/live-commerce-ui-20260424-153045.md
garden-gpt-image-2/image/live-commerce-ui-20260424-153045.png
garden-gpt-image-2/prompt/vr-headset-exploded-view-20260424-153102.md
garden-gpt-image-2/image/vr-headset-exploded-view-20260424-153102.png
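As a rough sketch of the naming rule, the helpers below build a prompt path and an image path from a task description and a date. The function names and the slug rule (lowercase ASCII, hyphen-separated, capped at 48 characters) are assumptions made for illustration; only the directory layout and the timestamp shape seen in the examples above come from this document.

```javascript
// Hypothetical slug rule: lowercase, non-alphanumerics collapsed to hyphens.
function toTaskSlug(text) {
  const slug = text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .slice(0, 48)
    .replace(/^-+|-+$/g, "");
  return slug || "task"; // fall back when nothing usable remains
}

// Formats a Date as YYYYMMDD-HHMMSS in local time, matching the examples.
function formatTimestamp(date) {
  const pad = (n) => String(n).padStart(2, "0");
  return (
    `${date.getFullYear()}${pad(date.getMonth() + 1)}${pad(date.getDate())}` +
    `-${pad(date.getHours())}${pad(date.getMinutes())}${pad(date.getSeconds())}`
  );
}

// Builds the default prompt and image paths for one task.
function defaultOutputPaths(taskDescription, date = new Date()) {
  const base = `${toTaskSlug(taskDescription)}-${formatTimestamp(date)}`;
  return {
    prompt: `garden-gpt-image-2/prompt/${base}.md`,
    image: `garden-gpt-image-2/image/${base}.png`,
  };
}
```

Sharing one base name between the prompt and the image keeps the pair easy to match later.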
Prompt Saving Rules
| Mode | Mandatory to Save Prompt? | Description |
|---|---|---|
| Mode A | ✅ Mandatory | Must save prompt when entering actual generation/editing workflow |
| Mode B | Recommended | Default to save for easy reuse; skip if users say "no" |
| Mode C | ✅ Mandatory | Users run the prompt themselves, so an unsaved prompt is of no use |
General rules (applicable to all three modes):
- If users explicitly provide a prompt file path, use that file directly as input.
- If users directly provide a text prompt, save the final prompt to garden-gpt-image-2/prompt/ first.
- If users explicitly specify , respect the user-specified path.
- Otherwise, use the default naming rules to save automatically.
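One possible reading of the general rules above, expressed as a planning function. All names here (`planPromptSaving`, `promptFile`, `promptText`, `userPath`, `defaultPath`) are hypothetical and do not correspond to real script options; the sketch only decides where the prompt comes from and where it should be written.

```javascript
// Hypothetical planner for the prompt saving rules; performs no file I/O.
function planPromptSaving({ promptFile, promptText, userPath, defaultPath }) {
  if (promptFile) {
    // A user-supplied prompt file is used directly as input.
    return { input: "file", readFrom: promptFile, saveTo: null };
  }
  if (!promptText) {
    // Nothing to save yet; the caller should ask for a prompt.
    return { input: "none", readFrom: null, saveTo: null };
  }
  // A text prompt is saved first: the user path wins, else the default name.
  return { input: "text", readFrom: null, saveTo: userPath || defaultPath };
}
```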
Image Saving Rules (Only Mode A)
- If users explicitly specify or , respect the user-specified path.
- Otherwise, save to garden-gpt-image-2/image/ by default.
- Filenames should be semantically related to the current task and appended with a timestamp.
Mode B follows the saving method determined by the host image tool; Mode C does not generate images.
Quick Usage
0. Detect Operating Mode (First Step for Any Task)
```bash
node skills/gpt-image-2/scripts/check-mode.js
```
The output tells you whether you are in Mode A / B / C, which determines whether to call the generation or editing script next. Steps 1-4 below apply only to Mode A.
1. Text-to-Image Generation (Mode A)
```bash
node skills/gpt-image-2/scripts/generate.js \
  --prompt "A cute baby sea otter" \
  --size 1024x1024 \
  --quality high
```
2. Generate Image with Prompt File (Mode A)
```bash
node skills/gpt-image-2/scripts/generate.js \
  --promptfile garden-gpt-image-2/prompt/poster-20260424-153045.md
```
3. Edit Existing Image (Mode A)
```bash
node skills/gpt-image-2/scripts/edit.js \
  --image assets/source.png \
  --prompt "Replace the background with a clean studio scene"
```
4. Local Editing with Mask (Mode A)
```bash
node skills/gpt-image-2/scripts/edit.js \
  --image assets/source.png \
  --mask assets/mask.png \
  --prompt "Replace only the masked area with a glass vase"
```
5. "Usage" for Mode B / C
There is no command-line entry point; in these modes this Skill acts only as a prompt engineering guide:
- Mode B: Render the final prompt → call the host's built-in -type tool (pass prompt as parameter) → get the image.
- Mode C: Render the final prompt → save to garden-gpt-image-2/prompt/<task-slug>-<timestamp>.md → display the content directly to users → tell users which image tools can reuse it directly.
JSON Template Working Method
When JSON templates are provided in
, follow these rules:
- First find the closest category directory from .
- Then locate the specific template file.
- in the template indicates replaceable parameters.
- Values explicitly provided by users are filled in directly.
- If users do not provide values but the template marks , use the default value first.
- If missing information will significantly affect the result, actively ask users.
- If users explicitly say "generate randomly for me", keep the default value or randomize reasonably within the scope the template allows.
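The filling rules above might look like the following sketch. The field-spec shape (`default`, `critical`) and the function name `fillTemplate` are assumptions made for this example; the real templates mark replaceable parameters and defaults in their own syntax, which is not reproduced here.

```javascript
// Hypothetical field spec: { fieldName: { default?: any, critical?: boolean } }.
// Returns the filled values plus the critical fields that still need a question.
function fillTemplate(fields, userValues) {
  const filled = {};
  const missing = [];
  for (const [name, spec] of Object.entries(fields)) {
    if (userValues[name] !== undefined) {
      filled[name] = userValues[name]; // user-provided values win
    } else if (spec.default !== undefined) {
      filled[name] = spec.default; // otherwise fall back to the template default
    } else if (spec.critical) {
      missing.push(name); // no value, no default, and it matters: ask the user
    }
  }
  return { filled, missing };
}
```

A caller would ask about everything in `missing` in one combined question, then call the function again with the answers merged into `userValues`.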
Questioning Rules
When the template lacks key variables, do not ask generic questions such as "What style do you want?" Instead, ask precise questions based on the template fields.
For example, when the live commerce UI template lacks the main subject, prioritize asking:
- Who is the host?
- Use real photos, celebrity names, character descriptions, or generate completely randomly?
When missing product information, ask:
- What is the product name?
- Is the product price specified?
- Do you want me to automatically complete comments and gift content?
Template Index
Only read the closest specific template file by task type; do not read the entire
at once.
1. Methodology Master Document
Read first: references/prompt-writing.md
Applicable to:
- You haven't decided how to construct JSON templates
- You need to judge which fields to ask, which can be default, and which can be randomized
- You need to abstract cases into reusable templates
2. UI Mockups ()
Suitable for various "interface + content" mockup visuals. Currently implemented:
- — E-commerce live streaming screenshot mockup (host + chat area + gift area + product card)
- social-interface-mockup.md — Social platform dynamic detail page mockup (Twitter/X, Xiaohongshu, Weibo, Threads, etc.)
- — Landing page hero / detail page main image (character + product + selling points + price)
- — Chat / dialogue interface mockup (iMessage, WeChat, group chat, AI assistant)
- — Short video cover / live streaming thumbnail (YouTube, Douyin, Bilibili, VTuber stream)
- landing-page-case-study.md — Dark-mode SaaS / marketing case study long page UI mockup (multiple sections + scroll narrative + data cards + CTA)
3. Product Visuals (references/product-visuals/)
Suitable for visuals with "products as the visual center". Currently implemented:
- — Product exploded view poster (vertical stacked main body + callout + top logo + bottom brand area)
- white-background-product.md — E-commerce pure white background main image (single product / multi-angle / minimalist marketing overlay)
- premium-studio-product.md — High-end studio commercial product image (magazine advertisement-level atmosphere)
- — Gift box / packaging display image (outer box + content display)
- lifestyle-product-scene.md — Lifestyle product scene image (product appears in real scenarios)
- ecommerce-marketing-board.md — Chinese-style e-commerce super composite sales board (main image + detail page + selling points + usage steps + scenarios + TVC storyboard combination in one image)
4. Maps ()
Suitable for "map-style visuals" (infographics have moved to their own category, 17). Currently implemented:
- — Hand-drawn city food map (numbered spots + legend + central mascot)
- — Travel route map (multi-day itinerary / single-day city walk / outdoor route)
- — Illustrated city style map (landmarks + landscapes + cultural elements)
- store-distribution-map.md — Brand store / service coverage distribution map
- itinerary-day-trip-map.md — One-day trip split poster (left parchment itinerary card + right fantasy realistic map, 5-7 stations strictly aligned)
5. Slides & Visual Docs (references/slides-and-visual-docs/)
Suitable for visual documents that "explain one thing clearly on one page". Currently implemented:
- dense-explainer-slides.md — Irasutoya × Kasumigaseki hybrid high-density explanation Slide
- — Policy / government announcement / white paper style explanation Slide
- — Business report executive summary / investor briefing / annual report overview page
- educational-diagram-slide.md — Educational schematic (concept / mechanism / process decomposition)
6. Poster & Campaigns (references/poster-and-campaigns/)
Suitable for "brand key visuals + campaigns + banners + magazine covers". Currently implemented:
- — Brand main poster (product / character / pure text proposition)
- — Campaign Key Visual + derivative layout system
- — Web hero / landing page / app banner (horizontal composition + CTA)
- — Magazine / journal / publication cover
- biomimetic-concept-poster.md — Biomimetic industrial design concept poster (natural prototype → evolution bar → hero render → multi-view technical drawing)
- vintage-editorial-infographic.md — Vintage archive / 1940s editorial-style infographic poster (character + formula + timeline + model, Bell Labs style)
- character-catalog-poster.md — Multi-version infographic poster of the same character (constellation / element / dynasty / personality series cards)
- lineup-comparison-poster.md — Series product lineup comparison infographic poster (30+ SKUs in one image + legend + level key)
7. Portraits & Characters (references/portraits-and-characters/)
Suitable for "character visuals". Currently implemented:
- — Professional business portrait (LinkedIn / team page / media illustrations)
- — Founder media blockbuster portrait (dramatic lighting + title space reserved)
- — VTuber / virtual host profile card + live preview
- — Comprehensive character setting sheet (three views + expressions + clothing + color palette)
- — N×N pose / action dictionary reference sheet (multiple poses of the same character, dance / combat / fitness)
8. Scenes & Illustrations (references/scenes-and-illustrations/)
Suitable for illustration-style visuals focusing on "atmosphere + story + emotion". Currently implemented:
- — Healing daily / seasonal scene illustration
- — Cinematic concept large scene / IP key art
- — Children's book / picture book inner page / holiday card
- — Minimalist blank atmosphere image / literary wallpaper
9. Editing Workflows (references/editing-workflows/)
Suitable for image modification tasks based on existing images (corresponding to the editing script). Currently implemented:
- background-replacement.md — Background replacement (product / portrait / outdoor / studio scene)
- local-object-replacement.md — Local object replacement (with or without mask)
- — Removal of clutter / passers-by / wires / defects
- — Product retouching (gloss / label / shadow / defects)
- — Portrait local modification (hairstyle / clothing / makeup / accessories)
10. Avatars & Profile (references/avatars-and-profile/)
Suitable for "personal image" visuals such as stylized avatars / character settings / grids / stickers / series portraits. Currently implemented:
- — Convert reference image characters into any style such as cosplay / gothic / retro film / idol photo
- character-grid-portrait.md — N×N grid portrait of the same character (multiple professions / expressions / dynasties / styles)
- — Kawaii 3D / Minecraft / skeuomorphic 3D app icon-style avatar
- — Sticker set / emoji collection (independent elements + stroke + label)
- cultural-portrait-series.md — Dynasty / myth / literature / ethnic series portraits
11. Storyboards & Sequences (references/storyboards-and-sequences/)
Suitable for "narrative sequence" visuals such as multi-storyboard / comics / relationship diagrams / process steps. Currently implemented:
- — 4-panel comic / satire comic / joke comic (exposition → development → climax → resolution + dialogue bubbles)
- — Single-page / double-page manga storyboard (irregular grids + dialogue + inner thoughts)
- — Single-image anime KV / light novel cover / IP poster
- character-relationship-diagram.md — Character relationship diagram poster (cards + relationship lines + legend)
- recipe-process-flowchart.md — Recipe / tutorial / process step diagram (numbering + illustrations + descriptions)
- product-tvc-storyboard.md — Product TVC commercial advertisement storyboard (9-panel real-shot texture + shot description + duration)
- cinematic-storyboard-grid.md — Cinematic narrative storyboard contact sheet (3×4 / 4×4, continuous narrative + cinematic still)
- — Real-person cinematic process board (equipment wearing / makeup / training / operation decomposition, numbering + step progression)
12. Grids & Collages (references/grids-and-collages/)
Suitable for "multi-panel grid / collage / project board" visuals. Currently implemented:
- — 2×2 marketing banner set (generate 4 unified series designs at once)
- — 7-day lookbook / 9-grid self-care / TOP N list image
- mixed-style-multi-panel.md — Mixed-style collage (same subject interpreted in different styles)
- — Anime / game / film project pitch board (KV + characters + worldview + copy)
- — Multi-industry / multi-theme mixed advertisement banner grid (each grid has independent industry + style + copy)
13. Branding & Packaging (references/branding-and-packaging/)
Suitable for "brand identity system / mascot / packaging design" visuals. Currently implemented:
- — Brand identity system board (logo + color scheme + font + application mockup)
- — Mascot multi-panel brand identity set (main image + three views + expressions + applications)
- — Cosmetic / skincare single bottle / series / gift box packaging
- — Beverage / food / condiment label design (Chinese style / Japanese style / Western style)
- — 18+ module large-scale brand identity + mascot full-process document (DNA / moodboard / sketch / line drawing / 3D / color scheme / material / application overview in one image)
- — IP character + peripheral / packaging / poster / social profile multi-element comprehensive brand board
14. Typography & Text Layout (references/typography-and-text-layout/)
Suitable for types where "text is the main visual" such as "text-first / bilingual layout". Currently implemented:
- — Large-text proposition poster (Japanese high-energy / Swiss minimalist / retro printing)
- bilingual-layout-visual.md — Chinese-English / Chinese-Japanese bilingual layout visual (culture / academic / cross-cultural brand)
15. Assets & Props (references/assets-and-props/)
Suitable for "set of materials / game assets" visuals such as icon sets / game screenshots. Currently implemented:
- retro-skeuomorphic-icons.md — Skeuomorphic / Y2K / pixel icon set (unified style in a set)
- game-screenshot-mockup.md — In-game screenshot mockup (HUD + subtitles + task panel)
16. Academic Figures (references/academic-figures/)
Suitable for illustrations for papers / top conference submissions / academic posters / defense PPT / proposal defense / journal submission Graphical Abstract. Overall preference for white background + publication fonts + geometric precision + low-saturation engineering colors (mainly dark blue / gray-blue / black-gray, ≤3 main colors) + printable in monochrome. Strictly prohibit fictional quantitative data (values / contour lines / color scale ranges / formulas).
CS / CV / ML direction:
- method-pipeline-overview.md — Method overview diagram / pipeline figure (multi-stage blocks + data flow; variant 4 provides left/middle/right three-stage technical roadmap for engineering)
- neural-network-architecture.md — Neural network architecture diagram (layer blocks + tensor shape + skip connections)
- qualitative-comparison-grid.md — Multi-method qualitative comparison grid (rows = samples, columns = methods)
Engineering / natural sciences / general defense:
- — Concept / principle / experimental device schematic (high degree of freedom, natural language template)
- — Mechanism schematic / causal link / transformation path (central object + multi-stage transformation + result area; includes three variants: three-stage causal chain / cyclic self-excitation / multi-branch competition)
- multi-condition-comparison.md — Multi-condition / multi-scenario result comparison diagram (side-by-side results of the same object under different conditions, 2×2 / 1×N / M×N; emphasizes strict uniformity between panels)
- — Publication-ready data chart (bar / line / scatter / heatmap / box)
Overview / abstract / defense homepage:
- — Journal submission Graphical Abstract / graphical abstract (four variants: horizontal 4-stage / central expansion / square / vertical)
- research-overview-poster.md — Research overview diagram for proposal / defense / presentation homepage (three layers top-middle-bottom + five modules; includes three variants: central radiation / left-right double column / minimalist)
Selection strategy: For CS/CV/ML papers, prefer
+
qualitative-comparison-grid
; for engineering / energy / chemical engineering / materials directions, prefer
variant 4 +
+
multi-condition-comparison
; use
for journal submission abstract images; use
for defense PPT homepage.
17. Infographics ()
Suitable for "large-scale information visualization" visuals such as infographics / high-density popular science / hand-drawn infographics / KPI dashboards. Currently implemented:
- legend-heavy-infographic.md — High-legend-density popular science / causal chain / evolution / anatomical diagram (bilingual)
- hand-drawn-infographic.md — Hand-drawn style infographic (macaron / morandi / blackboard / kraft paper; natural language template)
- bento-grid-infographic.md — Bento grid modular infographic (high-density multi-module widget arrangement)
- comparison-infographic.md — Binary / multi-element comparison infographic (A vs B / package tiers / misconceptions vs correct answers)
- step-by-step-infographic.md — Step-by-step tutorial infographic (illustrative, warm; non-engineering flowchart)
- kpi-dashboard-infographic.md — KPI dashboard-style infographic (annual review / Wrapped / business dashboard)
18. Technical Diagrams (references/technical-diagrams/)
Suitable for engineering schematics such as system architecture / process / sequence / state machine / ER / mind map / network topology. All templates share a dark grid background + monospaced font + role-coded color scheme; each template also comes with a light variant.
⚠️ Note: This directory generates PNG bitmaps, not editable SVG; use mermaid / draw.io / excalidraw / Figma if editable versions are needed. Currently implemented:
- — System architecture diagram (frontend + backend + DB + cache + queue + external services)
- — Flowchart / decision diagram (BPMN shape semantics + Yes/No branches)
- — Sequence diagram (actor + lifeline + message arrow + activation bar)
- — State machine / lifecycle diagram (state + transition + guard / action)
- — ER diagram / data model diagram (entity + fields + PK/FK + crow's foot relationships)
- — Technical topic mind map (central + radial branches)
- — Network topology diagram (device glyphs + zone / VPC + bandwidth / protocol labels)
Prompt Workflow (Mode-Aware)
Regardless of A / B / C, the first 6 steps are shared; the difference lies in steps 7-8 for "image generation".
1. Run the detection script to determine the mode (A / B / C).
2. Judge whether the task is image generation or editing.
3. Identify which category directory it belongs to (refer to the "Template Index" above).
4. Read only the corresponding specific template file; do not read the entire references/ at once.
5. Strictly follow the template format: most templates use JSON main templates (preferred for structured tasks); a few templates (such as infographics/hand-drawn-infographic.md and academic-figures/scientific-schematic.md) use a hybrid form of "structured natural language + parameters", because forcing JSON would restrict creative freedom.
6. Map user input to template parameters; ask targeted clarification questions if key information is missing.
The prompt is now rendered. Branch by mode below:
7-A. Mode A: Save the final prompt to garden-gpt-image-2/prompt/, call the generation or editing script, and save images to garden-gpt-image-2/image/.
7-B. Mode B: Pass the final prompt directly to the host's image tool; save a copy of the prompt to garden-gpt-image-2/prompt/ as needed.
7-C. Mode C: Save the final prompt to garden-gpt-image-2/prompt/<task-slug>-<timestamp>.md, display the complete prompt to users in the conversation, and attach a short "how to use / recommended tools" suggestion.
8. After the task ends, tell users in one sentence: what the current mode is, where the prompt is saved, and where the image (if any) is saved.
Important Constraints
General:
- JSON in template files is a prompt structure template, not an API request body template.
- In all three modes, the final content passed to the image model is a "rendered prompt string": it can be flattened JSON or structured natural-language paragraphs, used exactly as the template specifies.
- Unless explicitly requested by users, do not copy the "mode description" from SKILL.md into the final prompt——that is meta-information for the Agent.
Only applicable to Mode A:
- Generation scripts use JSON body
- Editing scripts use multipart form data
- Responses are parsed preferentially by , and also compatible with
- Do not introduce additional special query parameters unless explicitly required by the upstream interface
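To illustrate the "JSON body" constraint for generation, here is a minimal sketch that only assembles a request description and sends nothing. It is not the Skill's actual generate.js. The function name `buildGenerationRequest` is hypothetical, and the `/images/generations` path is an assumption based on the OpenAI Images API that the default base URL points to; a compatible gateway may differ. The model name is passed in by the caller because the default is configured elsewhere.

```javascript
// Hypothetical builder: returns the pieces of a JSON generation request.
// Assumes an OpenAI-compatible endpoint at <baseUrl>/images/generations.
function buildGenerationRequest({
  baseUrl = "https://api.openai.com/v1",
  apiKey,
  model,
  prompt,
  size,
  quality,
}) {
  const body = { model, prompt };
  // Optional fields are included only when the caller provides them.
  if (size) body.size = size;
  if (quality) body.quality = quality;
  return {
    url: `${baseUrl.replace(/\/+$/, "")}/images/generations`,
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      Authorization: `Bearer ${apiKey}`,
    },
    body: JSON.stringify(body), // the rendered prompt string travels as one field
  };
}
```

Note that no extra query parameters are appended, in line with the last constraint above.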
When to Ask Questions
Ask questions only when one of the following holds and would significantly affect the result:
- No prompt target
- No original image for editing tasks
- Subject identity or visual type determines the result direction
- Product / price / copy / UI text is a core component of the image
- Users express multiple conflicting goals at the same time
Otherwise, prioritize making reasonable defaults and proceed.