gpt-image-2

GPT Image 2

This is a focused Skill for GPT Image 2. It works in three runtime environments, but its behavior differs significantly between them, so the first step is always to determine the current operating mode.
It handles only two kinds of image tasks:
  • Image generation: `POST /images/generations`
  • Image editing: `POST /images/edits`
This file covers operating modes, Skill structure, environment variables, saving and naming rules, the template index, and mode-aware workflows. Detailed templates all live under `references/`, organized in two levels:
  • Level 1: category directories
  • Level 2: individual template Markdown files

Operating Modes (Required Reading: Determine the Mode Before Doing Anything)

This Skill ships with a lightweight detection script. Run it once first, then decide how to proceed based on the result:

```bash
node skills/gpt-image-2/scripts/check-mode.js
```

To get structured output for a calling program:

```bash
node skills/gpt-image-2/scripts/check-mode.js --json
```

The output reports `mode = A` / `A?` / `B-or-C` along with a `recommendation`. The three modes are defined as follows:

Mode A · Garden Local Image Generation

Trigger condition: the environment variable `ENABLE_GARDEN_IMAGEGEN` is truthy (`1` / `true` / `yes` / `on`) and `OPENAI_API_KEY` is set.
Behavior: runs the complete end-to-end workflow "select template → write prompt → call script → generate and save image".
  • Use `scripts/generate.js` for text-to-image generation and `scripts/edit.js` for editing existing images.
  • By default, prompts are saved to `garden-gpt-image-2/prompt/` and images to `garden-gpt-image-2/image/`.
  • This is the most capable mode: you are the "owner" of the image tool.

Mode B · Host-Native Delegated Image Generation

Trigger condition: Garden is not enabled (`ENABLE_GARDEN_IMAGEGEN` is unset or falsy), but the current host Agent has its own image generation tool or image MCP.
Typical signals (check these yourself):
  • Your toolset contains `image_generation` / `imagegen` / `dalle` / `nano_banana` / `mcp__*image*` / `make_image` or a similarly named tool.
  • The user invokes this Skill from a client with native image generation, such as ChatGPT / Codex / Gemini / Cursor.
  • The user explicitly says "use your own tool to generate the image".
Behavior: this Skill falls back to a prompt engineering guide:
  1. Still follow the "select template → fill in fields → render final prompt" workflow.
  2. Do not call `node scripts/generate.js` (there is no API key, so it is bound to fail).
  3. Call the host's built-in image tool directly, passing the rendered prompt as input.
  4. If the user wants, also save the prompt file to `garden-gpt-image-2/prompt/`; where the image goes is up to the host and is not enforced.

Mode C · Advisor (Prompt-Only Consultant)

Trigger condition: Garden is not enabled, and the host Agent has no image generation tool at all.
Behavior: this Skill falls back to a "high-quality prompt writing consultant":
  1. Follow the "select template → fill in fields → render final prompt" workflow, asking the user whenever information is missing.
  2. Print the final prompt directly to the user and save a copy to `garden-gpt-image-2/prompt/<task-slug>-<timestamp>.md`.
  3. Add a short "how to use" suggestion (for example: paste it into ChatGPT / Midjourney / DALL·E / Sora / Nano Banana / your own backend / a third-party GPT Image 2 gateway).
  4. Do not pretend an image was generated. Tell the user explicitly: "A high-quality, reusable prompt has been generated. Please run it with your image tool."

Mode Decision Table

| Condition | Mode | Call Script? | Save Prompt? | Save Image? |
| --- | --- | --- | --- | --- |
| `ENABLE_GARDEN_IMAGEGEN=1` + API key present | A | `generate.js` / `edit.js` | ✅ Automatic | ✅ Automatic |
| `ENABLE_GARDEN_IMAGEGEN=1` but no API key | A? | ❌ (ask for the API key first) | | |
| Garden not enabled + host has an image tool | B | ❌ (use the host tool) | Optional | Decided by the host |
| Garden not enabled + host has no image tool | C | | ✅ Required | ❌ (not possible) |
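
The decision table can be sketched as a small pure function. This is an illustration only, not the actual `check-mode.js`: the names `isTruthy` and `detectMode`, the `hostHasImageTool` argument, and the case-insensitive matching of truthy values are all assumptions made for this example.

```javascript
// Hypothetical sketch of the decision table above; not the real check-mode.js.
const TRUTHY = new Set(["1", "true", "yes", "on"]);

// Assumption: values are compared case-insensitively after trimming.
function isTruthy(value) {
  return TRUTHY.has(String(value ?? "").trim().toLowerCase());
}

// env: an object shaped like process.env.
// hostHasImageTool: true / false, or undefined when it cannot be determined.
function detectMode(env, hostHasImageTool) {
  if (isTruthy(env.ENABLE_GARDEN_IMAGEGEN)) {
    // Garden is enabled: Mode A only when the API key is also present.
    return env.OPENAI_API_KEY ? "A" : "A?";
  }
  if (hostHasImageTool === true) return "B";
  if (hostHasImageTool === false) return "C";
  return "B-or-C"; // a script alone cannot see the host's toolset
}
```

A script cannot inspect the host Agent's tools, which would explain why the real detector reports `B-or-C` and leaves the final choice to the Agent.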

When the Mode Is Uncertain

  • If you cannot tell whether you are in Mode B or Mode C, ask the user directly: "Should I generate the image with the image tool in your environment, or do you only need me to write the prompt?"
  • If a Mode A script call fails (401 / network / quota), report the error and ask: "Switch to Mode B / C?"
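
As a rough sketch of how a caller might act on the detected mode, including the two fallback rules above: the function names and the shape of the returned objects are hypothetical, and only the mode strings come from this document.

```javascript
// Hypothetical routing helper; the return shape is an assumption for illustration.
function nextStep(mode) {
  switch (mode) {
    case "A":
      return { runScripts: true, action: "run generate.js or edit.js" };
    case "A?":
      return { runScripts: false, action: "ask the user for OPENAI_API_KEY" };
    case "B":
      return { runScripts: false, action: "render the prompt and call the host image tool" };
    case "C":
      return { runScripts: false, action: "render the prompt, print it and save it" };
    case "B-or-C":
      // Cannot tell B from C: ask instead of guessing.
      return { runScripts: false, action: "ask the user: host image tool, or prompt only?" };
    default:
      throw new Error(`Unknown mode: ${mode}`);
  }
}

// Mode A failure (401 / network / quota): report it and offer the fallback.
function onScriptFailure(reason) {
  return `Image script failed (${reason}). Switch to Mode B / C?`;
}
```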

User Input Tools

When this Skill needs to ask the user a question, follow these rules:
  1. Prefer the user input tool provided by the current runtime.
  2. If no such tool exists, ask short, numbered, plain-text questions.
  3. Combine questions where possible and ask them all at once.

Skill Structure

  • `scripts/check-mode.js`: run this first to detect the operating mode (A / B / C)
  • `scripts/generate.js`: text-to-image generation (Mode A only)
  • `scripts/edit.js`: image editing based on a source image / mask (Mode A only)
  • `scripts/shared.js`: shared logic for requests, saving, and reading environment variables
  • `references/`: hierarchical, structured prompt templates (used in all three modes)

Environment Variables

Configuration is read in the following order:
  1. CLI arguments
  2. `process.env`
  3. `<cwd>/.env`
  4. `<cwd>/.gateway.env`
  5. `~/.gateway.env`
Core variables:
  • `ENABLE_GARDEN_IMAGEGEN`: mode switch. Mode A is enabled when it is set to `1` / `true` / `yes` / `on`; if it is unset or has any other value, the Skill enters Mode B / C.
  • `OPENAI_API_KEY`: required for Mode A; not needed for B / C.
  • `OPENAI_BASE_URL`: defaults to `https://api.openai.com/v1`; can point to a third-party compatible gateway.
  • `OPENAI_IMAGE_MODEL`: defaults to `gpt-image-2`; can be changed to a model the gateway supports (e.g. `gpt-image-1` / `dall-e-3`).
The default implementation targets OpenAI-compatible APIs and does not hardcode any third-party gateway.
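
One plausible reading of the lookup order is "the first source that defines a value wins". The sketch below merges already-parsed sources under that assumption; `resolveConfig` and `DEFAULTS` are illustrative names, and the real `scripts/shared.js` may behave differently (for example in how it parses the `.env` files).

```javascript
// Defaults taken from the variable list above.
const DEFAULTS = {
  OPENAI_BASE_URL: "https://api.openai.com/v1",
  OPENAI_IMAGE_MODEL: "gpt-image-2",
};

// sources: plain objects ordered from highest to lowest priority, e.g.
// [cliArgs, process.env, cwdDotEnv, cwdGatewayEnv, homeGatewayEnv].
// Assumption: an earlier source overrides a later one.
function resolveConfig(sources) {
  const config = { ...DEFAULTS };
  // Apply the lowest priority first so higher-priority sources overwrite it.
  for (const source of [...sources].reverse()) {
    for (const [key, value] of Object.entries(source)) {
      if (value !== undefined && value !== "") config[key] = value;
    }
  }
  return config;
}
```

For example, `resolveConfig([{ OPENAI_IMAGE_MODEL: "gpt-image-1" }, process.env])` would let a CLI-supplied model override whatever the environment sets.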

Default Output Directories

If the user does not specify an output path, always use these directories under the current workspace:
  • Prompt directory: `garden-gpt-image-2/prompt/` (recommended in all three modes, for easy reuse and version management)
  • Image directory: `garden-gpt-image-2/image/` (Mode A only; in Mode B the host decides, and Mode C produces no images)
If a directory does not exist, the scripts (Mode A) must create it automatically; in Mode B / C, run `mkdir -p` manually before writing the prompt.
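
In Node.js, the equivalent of `mkdir -p` is `fs.mkdirSync` with `recursive: true`. A minimal sketch, assuming a hypothetical helper named `ensureOutputDirs` (the Skill's scripts may organize this differently):

```javascript
const fs = require("node:fs");
const path = require("node:path");

// Create both default output directories under the given workspace.
function ensureOutputDirs(workspace) {
  const dirs = {
    prompt: path.join(workspace, "garden-gpt-image-2", "prompt"),
    image: path.join(workspace, "garden-gpt-image-2", "image"),
  };
  for (const dir of Object.values(dirs)) {
    // recursive: true creates missing parents and does not fail if the directory exists.
    fs.mkdirSync(dir, { recursive: true });
  }
  return dirs;
}
```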

Default Naming Rules

If the user does not specify a filename, the scripts should generate one that relates to the current task and append the current timestamp to avoid collisions.
Naming rules:
  • Prompt: `garden-gpt-image-2/prompt/<task-slug>-<timestamp>.md`
  • Image: `garden-gpt-image-2/image/<task-slug>-<timestamp>.png`
Where:
  • `<task-slug>`: a short, relevant name derived automatically from the user's request
  • `<timestamp>`: the current timestamp, e.g. `20260424-153045`
Examples:
  • `garden-gpt-image-2/prompt/live-commerce-ui-20260424-153045.md`
  • `garden-gpt-image-2/image/live-commerce-ui-20260424-153045.png`
  • `garden-gpt-image-2/prompt/vr-headset-exploded-view-20260424-153102.md`
  • `garden-gpt-image-2/image/vr-headset-exploded-view-20260424-153102.png`
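
The naming scheme could be implemented along these lines. This is a sketch under stated assumptions: `slugify`, `formatTimestamp`, and `defaultPaths` are hypothetical names, the timestamp uses local time, and the slug keeps ASCII letters and digits only (a non-ASCII task name would need a different strategy).

```javascript
// Lowercase the task name and join ASCII alphanumeric runs with hyphens.
function slugify(text) {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-") // any other run of characters becomes one hyphen
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
}

// Format a Date as YYYYMMDD-HHMMSS in local time.
function formatTimestamp(date) {
  const pad = (n) => String(n).padStart(2, "0");
  return (
    `${date.getFullYear()}${pad(date.getMonth() + 1)}${pad(date.getDate())}` +
    `-${pad(date.getHours())}${pad(date.getMinutes())}${pad(date.getSeconds())}`
  );
}

// Build the default prompt and image paths for one task.
function defaultPaths(taskName, date = new Date()) {
  const base = `${slugify(taskName)}-${formatTimestamp(date)}`;
  return {
    prompt: `garden-gpt-image-2/prompt/${base}.md`,
    image: `garden-gpt-image-2/image/${base}.png`,
  };
}
```

Sharing one base name between the prompt and the image keeps each generated image traceable to the prompt that produced it.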

Prompt Saving Rules

| Mode | Must Save Prompt? | Notes |
| --- | --- | --- |
| Mode A | ✅ Required | Always saved once an actual generation / editing run starts |
| Mode B | Recommended | Save by default for easy reuse; skip if the user says "no need" |
| Mode C | ✅ Required | The user takes the prompt away to run it; without a saved file the work is wasted |

General rules (all three modes):
  1. If the user explicitly provides a prompt file path, use that file directly as input.
  2. If the user provides the prompt as plain text, still save the final prompt to `garden-gpt-image-2/prompt/` first.
  3. If the user explicitly specifies `--prompt-output`, respect that path.
  4. Otherwise, save automatically using the default naming rules.

Image Saving Rules (Mode A Only)

  1. If the user explicitly specifies `--image` or `--output`, respect that path.
  2. Otherwise, save to `garden-gpt-image-2/image/` by default.
  3. The filename should relate to the current task and carry a timestamp.
In Mode B, the host's image tool decides how images are saved; Mode C produces no images.

Quick Usage

0. Detect the Operating Mode (First Step of Any Task)

```bash
node skills/gpt-image-2/scripts/check-mode.js
```

The output tells you whether you are in Mode A / B / C, which determines whether to call `generate.js` / `edit.js` afterwards. Steps 1 to 4 below apply to Mode A only.

1. Text-to-Image Generation (Mode A)

```bash
node skills/gpt-image-2/scripts/generate.js \
  --prompt "A cute baby sea otter" \
  --size 1024x1024 \
  --quality high
```
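
Under the hood, a command like this presumably turns into a `POST /images/generations` call against `OPENAI_BASE_URL`. The sketch below only builds the request and does not send it; `buildGenerationRequest` is a hypothetical name, and the JSON fields mirror the OpenAI-compatible Images API rather than the actual contents of `generate.js`.

```javascript
// Build (but do not send) an OpenAI-compatible image generation request.
// config: { OPENAI_BASE_URL, OPENAI_API_KEY, OPENAI_IMAGE_MODEL }
function buildGenerationRequest(config, { prompt, size, quality }) {
  const baseUrl = config.OPENAI_BASE_URL.replace(/\/+$/, ""); // drop trailing slashes
  return {
    url: `${baseUrl}/images/generations`,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${config.OPENAI_API_KEY}`,
        "Content-Type": "application/json",
      },
      // JSON.stringify omits size / quality when they are undefined.
      body: JSON.stringify({ model: config.OPENAI_IMAGE_MODEL, prompt, size, quality }),
    },
  };
}
```

A caller would pass the result to `fetch(url, init)` and then decode the image from the response. The response layout depends on the gateway and model, so check it before writing the file to `garden-gpt-image-2/image/`.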

2. Generate an Image from a Prompt File (Mode A)

```bash
node skills/gpt-image-2/scripts/generate.js \
  --promptfile garden-gpt-image-2/prompt/poster-20260424-153045.md
```

3. Edit an Existing Image (Mode A)

```bash
node skills/gpt-image-2/scripts/edit.js \
  --image assets/source.png \
  --prompt "Replace the background with a clean studio scene"
```

4. Masked Local Edit (Mode A)

```bash
node skills/gpt-image-2/scripts/edit.js \
  --image assets/source.png \
  --mask assets/mask.png \
  --prompt "Replace only the masked area with a glass vase"
```
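
An edit request differs from generation mainly in that `POST /images/edits` takes multipart form data with the image and an optional mask. The following is a sketch of how such a form might be assembled with the `FormData` and `Blob` globals available in Node.js 18+; `buildEditForm` and the upload filenames are assumptions, not the actual `edit.js`.

```javascript
// Assemble multipart fields for an OpenAI-compatible image edit request.
// imageBytes / maskBytes: Uint8Array or Buffer holding PNG data; maskBytes is optional.
function buildEditForm({ imageBytes, maskBytes, prompt, model }) {
  const form = new FormData();
  form.append("model", model);
  form.append("prompt", prompt);
  form.append("image", new Blob([imageBytes], { type: "image/png" }), "source.png");
  if (maskBytes) {
    // Only attach a mask for local edits; without it the whole image may change.
    form.append("mask", new Blob([maskBytes], { type: "image/png" }), "mask.png");
  }
  return form;
}
```

The form would be sent with `fetch(url, { method: "POST", headers: { Authorization }, body: form })`, leaving `Content-Type` unset so that `fetch` can add the multipart boundary itself.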

5. "Usage" in Mode B / C

There is no command-line entry point here; in these modes the Skill is only a prompt engineering guide:
  • Mode B: render the final prompt → call the host's built-in `image_generation`-style tool (passing the prompt as a parameter) → receive the image.
  • Mode C: render the final prompt → save it to `garden-gpt-image-2/prompt/<task-slug>-<timestamp>.md` → show the content directly to the user → tell the user which image tools can reuse it as-is.

How JSON Templates Work

When `references/` provides a JSON template, use it as follows:
  1. Start from `SKILL.md` and find the closest category directory.
  2. Then locate the specific template file.
  3. `{argument ...}` in a template marks a replaceable parameter.
  4. Fill in values the user explicitly provides.
  5. If the user gives no value but the template marks a `default`, start with the default.
  6. If the missing information would significantly affect the result, ask the user proactively.
  7. The user may also say "generate it randomly for me"; in that case keep the default or randomize sensibly within what the template allows.
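
Rules 4 to 6 amount to a three-step lookup per placeholder. The exact `{argument ...}` syntax is not shown in this file, so the sketch below assumes a hypothetical form, `{argument name="host" default="..."}`, purely for illustration; the real templates in `references/` may use a different syntax.

```javascript
// Hypothetical placeholder syntax: {argument name="x"} or {argument name="x" default="y"}.
const PLACEHOLDER = /\{argument\s+name="([^"]+)"(?:\s+default="([^"]*)")?\s*\}/g;

// Returns the rendered prompt plus the names that still need a user answer.
function renderTemplate(template, userValues) {
  const missing = [];
  const prompt = template.replace(PLACEHOLDER, (match, name, fallback) => {
    if (userValues[name] !== undefined) return String(userValues[name]); // rule 4: user value wins
    if (fallback !== undefined) return fallback; // rule 5: fall back to the template default
    missing.push(name); // rule 6: nothing to use, so ask the user
    return match; // leave the placeholder visible until it is answered
  });
  return { prompt, missing };
}
```

A non-empty `missing` list is the cue to ask targeted questions before rendering the final prompt.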

Questioning Rules

When a template is missing a key variable, do not ask something vague like "What style do you want?". Ask precise questions based on the template's fields.
For example, when the live commerce UI template has no subject, ask first:
  • Who is the host?
  • Should I use a real photo, a celebrity name, a description of the person, or generate someone entirely at random?
When product information is missing, ask:
  • What is the product name?
  • Is there a specific price?
  • Do you want me to fill in the comments and gifts automatically?

Template Index

Read only the specific template file that best matches the task type; do not read the whole of `references/` at once.

1. Methodology Master Document

Read first:
  • `references/prompt-writing.md`
Use it when:
  • You have not yet decided how to construct a JSON template
  • You need to judge which fields to ask about, which can take defaults, and which can be randomized
  • You need to abstract a case into a reusable template

2. UI Mockups (`references/ui-mockups/`)

For all kinds of "interface + content" mockup visuals. Currently available:
  • `live-commerce-ui.md`: e-commerce live-stream screenshot mockup (host + chat area + gift area + product card)
  • `social-interface-mockup.md`: social platform post detail page mockup (Twitter/X, Xiaohongshu, Weibo, Threads, etc.)
  • `product-card-overlay.md`: landing page hero / product detail main image (person + product + selling points + price)
  • `chat-interface-scene.md`: chat / conversation interface mockup (iMessage, WeChat, group chat, AI assistant)
  • `short-video-cover-ui.md`: short video cover / live-stream thumbnail (YouTube, Douyin, Bilibili, VTuber stream)
  • `landing-page-case-study.md`: dark SaaS / marketing case study long-page UI mockup (multiple sections + scroll narrative + data cards + CTA)

3. Product Visuals (`references/product-visuals/`)

For images with the product as the visual center. Currently available:
  • `exploded-view-poster.md`: product exploded-view poster (vertically stacked subject + callouts + top logo + bottom brand area)
  • `white-background-product.md`: e-commerce pure white background main image (single product / multi-angle / minimalist marketing overlay)
  • `premium-studio-product.md`: premium studio commercial product shot (magazine-ad atmosphere)
  • `packaging-showcase.md`: gift box / packaging showcase (outer box + contents)
  • `lifestyle-product-scene.md`: lifestyle product scene (the product shown in a real setting)
  • `ecommerce-marketing-board.md`: Chinese-style e-commerce composite sales board (main image + detail page + selling points + usage steps + scenes + TVC storyboard combined in one image)

4. Maps (`references/maps/`)

For map-style visuals (infographics have been split out into their own category, 17). Currently available:
  • `food-map.md`: hand-drawn city food map (numbered spots + legend + central mascot)
  • `travel-route-map.md`: travel route map (multi-day itinerary / single-day city walk / outdoor route)
  • `illustrated-city-map.md`: illustrated cityscape map (landmarks + mountains and rivers + cultural elements)
  • `store-distribution-map.md`: brand store / service coverage distribution map
  • `itinerary-day-trip-map.md`: day-trip split poster (parchment itinerary card on the left + fantasy-realistic map on the right, with 5-7 stops strictly aligned)

5. Slides & Visual Docs (`references/slides-and-visual-docs/`)

For visual documents that explain one thing clearly on one page. Currently available:
  • `dense-explainer-slides.md`: Irasutoya × Kasumigaseki hybrid high-density explainer slide
  • `policy-style-slide.md`: policy / government announcement / white paper style explanatory slide
  • `visual-report-page.md`: business report executive summary / investor briefing / annual report overview page
  • `educational-diagram-slide.md`: teaching diagram (concept / mechanism / process breakdown)

6. Poster & Campaigns (`references/poster-and-campaigns/`)

For brand key visuals, campaigns, banners, and magazine covers. Currently available:
  • `brand-poster.md`: brand hero poster (product / person / text-only statement)
  • `campaign-kv.md`: campaign key visual + derived layout system
  • `banner-hero.md`: web hero / landing page / app banner (horizontal composition + CTA)
  • `editorial-cover.md`: magazine / journal / publication cover
  • `biomimetic-concept-poster.md`: biomimetic industrial design concept poster (natural prototype → evolution strip → hero render → multi-view technical drawings)
  • `vintage-editorial-infographic.md`: vintage archive / 1940s editorial-style infographic poster (person + formulas + timeline + model, Bell Labs style)
  • `character-catalog-poster.md`: multi-variant infographic poster of one character (zodiac / element / dynasty / personality series cards)
  • `lineup-comparison-poster.md`: product lineup comparison infographic poster (30+ SKUs in one image + legend + tier key)

7. Portraits & Characters (`references/portraits-and-characters/`)

For people-centered visuals. Currently available:
  • `professional-portrait.md`: professional business portrait (LinkedIn / team page / press image)
  • `founder-portrait.md`: editorial-style founder portrait for media (dramatic lighting + space reserved for a headline)
  • `virtual-host.md`: VTuber / virtual streamer profile card + stream preview
  • `character-sheet.md`: comprehensive character design sheet (three-view turnaround + expressions + outfits + color palette)
  • `pose-reference-sheet.md`: N×N pose / action reference sheet (one character in many poses: dance / combat / fitness)

8. Scenes & Illustrations (`references/scenes-and-illustrations/`)

For illustration-style visuals built on atmosphere, story, and emotion. Currently available:
  • `healing-scene.md`: cozy, healing everyday / seasonal scene illustration
  • `concept-scene.md`: cinematic large-scale concept scene / IP key art
  • `picture-book-scene.md`: children's book / picture book page / holiday card
  • `minimalist-mood-scene.md`: minimalist mood image with negative space / literary wallpaper

9. Editing Workflows (`references/editing-workflows/`)

For editing tasks that start from an existing image (handled by `scripts/edit.js`). Currently available:
  • `background-replacement.md`: background replacement (product / portrait / outdoor / studio)
  • `local-object-replacement.md`: local object replacement (with or without a mask)
  • `object-removal.md`: removal of clutter / passers-by / power lines / blemishes
  • `product-retouching.md`: product retouching (gloss / label / shadow / blemishes)
  • `portrait-local-edit.md`: local portrait edits (hairstyle / clothing / makeup / accessories)

10. Avatars & Profile (`references/avatars-and-profile/`)

For "personal image" visuals such as stylized avatars, personas, grids, stickers, and portrait series. Currently available:
  • `style-transfer-selfie.md`: turn the person in a reference image into any style, such as cosplay / gothic / retro film / idol photoshoot
  • `character-grid-portrait.md`: N×N grid portrait of one character (multiple professions / expressions / dynasties / styles)
  • `themed-3d-icon.md`: Kawaii 3D / Minecraft / skeuomorphic 3D app-icon-style avatar
  • `sticker-set.md`: sticker set / sticker pack collection (separate elements + outline + label)
  • `cultural-portrait-series.md`: dynasty / mythology / literature / ethnic portrait series

11. Storyboards & Sequences (`references/storyboards-and-sequences/`)

For "narrative sequence" visuals such as multi-panel storyboards, comics, relationship diagrams, and process steps. Currently available:
  • `four-panel-comic.md`: 4-panel comic / satirical comic / gag comic (setup → development → twist → conclusion + speech bubbles)
  • `manga-spread-page.md`: single-page / double-page manga layout (irregular panels + dialogue + inner monologue)
  • `anime-key-visual.md`: single-image anime KV / light novel cover / IP poster
  • `character-relationship-diagram.md`: character relationship diagram poster (cards + relationship lines + legend)
  • `recipe-process-flowchart.md`: recipe / tutorial / process step diagram (numbering + illustrations + captions)
  • `product-tvc-storyboard.md`: product TVC commercial storyboard (9 panels with a live-action look + shot descriptions + durations)
  • `cinematic-storyboard-grid.md`: cinematic narrative storyboard contact sheet (3×4 / 4×4, continuous narrative + cinematic stills)
  • `process-photo-board.md`: live-action cinematic process board (gearing up / makeup / training / step-by-step operation, numbered with progressive steps)

12. Grids & Collages (`references/grids-and-collages/`)

For multi-panel grid, collage, and project pitch board visuals. Currently available:
  • `banner-grid-2x2.md`: 2×2 marketing banner set (four designs from one unified series in a single pass)
  • `lookbook-grid.md`: 7-day lookbook / 9-grid self-care / TOP N list image
  • `mixed-style-multi-panel.md`: mixed-style collage (the same subject rendered in different art styles)
  • `anime-pitch-board.md`: anime / game / film project pitch board (KV + characters + world-building + copy)
  • `ad-banner-multi-grid.md`: mixed multi-industry / multi-theme ad banner grid (each cell has its own industry + style + copy)

13. Branding & Packaging (
references/branding-and-packaging/
)

13. Branding & Packaging (
references/branding-and-packaging/
)

适合“品牌识别系统 / 吉祥物 / 包装设计”类视觉。当前已落地:
  • brand-identity-board.md
    — 品牌识别系统板(logo + 配色 + 字体 + 应用 mockup)
  • mascot-brand-kit.md
    — 吉祥物多面板品牌识别套装(主形象 + 三视图 + 表情 + 应用)
  • cosmetic-packaging.md
    — 化妆品 / 护肤品 单瓶 / 系列 / 礼盒包装
  • beverage-label-design.md
    — 饮料 / 食品 / 调味品标签设计(国潮 / 日式 / 西式)
  • full-mascot-brand-doc.md
    18+ 模块大型品牌识别 + 吉祥物全流程文档(DNA / moodboard / 草图 / 线稿 / 3D / 配色 / 材质 / 应用一图概览)
  • character-merch-board.md
    — IP 角色 + 周边 / 包装 / 海报 / 社交 profile 多元素综合品牌板
Suitable for "brand identity system / mascot / packaging design" visuals. Currently implemented:
  • brand-identity-board.md
    — Brand identity system board (logo + color scheme + font + application mockup)
  • mascot-brand-kit.md
    — Multi-panel mascot brand identity kit (main character design + three-view turnaround + expressions + applications)
  • cosmetic-packaging.md
    — Cosmetic / skincare single bottle / series / gift box packaging
  • beverage-label-design.md
    — Beverage / food / condiment label design (guochao / Japanese / Western styles)
  • full-mascot-brand-doc.md
    18+ module large-scale brand identity + mascot full-process document (DNA / moodboard / sketch / line drawing / 3D / color scheme / material / application overview in one image)
  • character-merch-board.md
    — IP character + merchandise / packaging / poster / social profile, combined into one multi-element brand board

14. Typography & Text Layout (
references/typography-and-text-layout/
)

14. Typography & Text Layout (
references/typography-and-text-layout/
)

适合“字面优先 / 双语版式”等"以文字为主视觉"的类型。当前已落地:
  • title-safe-poster.md
    — 大字主张型海报(日式高能量 / 瑞士极简 / 复古印刷)
  • bilingual-layout-visual.md
    — 中英 / 中日双语版式视觉(文化 / 学术 / 跨文化品牌)
Suitable for visuals where text is the main element, such as text-first posters and bilingual layouts. Currently implemented:
  • title-safe-poster.md
    — Bold-statement poster with large type (Japanese high-energy / Swiss minimalist / retro print)
  • bilingual-layout-visual.md
    — Chinese-English / Chinese-Japanese bilingual layout visual (culture / academic / cross-cultural brand)

15. Assets & Props (
references/assets-and-props/
)

15. Assets & Props (
references/assets-and-props/
)

适合“图标集 / 游戏截图”等"成套素材 / 游戏资产"类视觉。当前已落地:
  • retro-skeuomorphic-icons.md
    — 拟物 / Y2K / 像素 图标集(成套统一风格)
  • game-screenshot-mockup.md
    — 游戏内截图 mockup(HUD + 字幕 + 任务面板)
Suitable for "asset set / game asset" visuals such as icon sets and game screenshots. Currently implemented:
  • retro-skeuomorphic-icons.md
    — Skeuomorphic / Y2K / pixel icon set (unified style in a set)
  • game-screenshot-mockup.md
    — In-game screenshot mockup (HUD + subtitles + quest panel)

16. Academic Figures (
references/academic-figures/
)

16. Academic Figures (
references/academic-figures/
)

适合“论文 / 顶会投稿 / 学术海报 / 答辩 PPT / 开题答辩 / 期刊投稿 Graphical Abstract”的配图。整体偏白底 + 出版物字体 + 几何精确 + 低饱和工程色(深蓝 / 灰蓝 / 黑灰为主,≤3 主色)+ 可单色印刷。严格禁止虚构定量数据(数值 / 等值线 / 色标范围 / 公式)。
CS / CV / ML 方向:
  • method-pipeline-overview.md
    — 方法总览图 / pipeline figure(多 stage 块 + 数据流;变体 4 提供工程类左/中/右 三段式技术路线图)
  • neural-network-architecture.md
    — 神经网络架构图(layer 块 + tensor shape + 跳连)
  • qualitative-comparison-grid.md
    — 多方法 qualitative 对比网格(行 = 样本,列 = 方法)
工程 / 自然科学 / 答辩通用:
  • scientific-schematic.md
    — 概念 / 原理 / 实验装置示意图(自由度高,自然语言模板)
  • mechanism-diagram.md
    — 机理示意图 / 因果链路 / 转化路径(中心对象 + 多阶段转化 + 结果区;含三段式因果链 / 循环自激发 / 多分支竞争 三种变体)
  • multi-condition-comparison.md
    多工况 / 多条件结果对比图(同一对象在不同 condition 下的并列结果,2×2 / 1×N / M×N;强调 panel 间严格统一)
  • publication-chart.md
    — publication-ready 数据图表(bar / line / scatter / heatmap / box)
总览 / 摘要 / 答辩首页:
  • graphical-abstract.md
    — 期刊投稿 Graphical Abstract / 图形摘要(横向 4 段式 / 中心展开 / 方形 / 竖版四种变体)
  • research-overview-poster.md
    — 开题 / 答辩 / 汇报首页研究总览图(上中下三层 + 五模块;含中心辐射 / 左右双栏 / 极简 三种变体)
选择策略:CS/CV/ML 论文首选
method-pipeline-overview
+
qualitative-comparison-grid
;工程 / 能源 / 化工 / 材料方向首选
method-pipeline-overview
变体 4 +
mechanism-diagram
+
multi-condition-comparison
;投稿期刊摘要图用
graphical-abstract
;答辩 PPT 首页用
research-overview-poster
Suitable for figures in papers, top-conference submissions, academic posters, defense slides, proposal defenses, and journal Graphical Abstracts. The overall look is white background + publication fonts + geometric precision + low-saturation engineering colors (mainly dark blue / gray-blue / black-gray, at most 3 main colors) + printable in monochrome. Fabricated quantitative data (values / contour lines / color scale ranges / formulas) is strictly prohibited.
CS / CV / ML direction:
  • method-pipeline-overview.md
    — Method overview diagram / pipeline figure (multi-stage blocks + data flow; variant 4 provides a three-part left / middle / right technical roadmap for engineering work)
  • neural-network-architecture.md
    — Neural network architecture diagram (layer blocks + tensor shape + skip connections)
  • qualitative-comparison-grid.md
    — Multi-method qualitative comparison grid (rows = samples, columns = methods)
Engineering / natural sciences / general defense:
  • scientific-schematic.md
    — Concept / principle / experimental setup schematic (free-form, natural language template)
  • mechanism-diagram.md
    — Mechanism schematic / causal chain / transformation path (central object + multi-stage transformation + result area; includes three variants: three-stage causal chain / self-reinforcing cycle / multi-branch competition)
  • multi-condition-comparison.md
    Multi-condition / multi-scenario result comparison diagram (side-by-side results of the same object under different conditions, 2×2 / 1×N / M×N; emphasizes strict uniformity between panels)
  • publication-chart.md
    — Publication-ready data chart (bar / line / scatter / heatmap / box)
Overview / abstract / defense cover page:
  • graphical-abstract.md
    — Graphical Abstract for journal submission (four variants: horizontal 4-stage / central expansion / square / vertical)
  • research-overview-poster.md
    — Research overview figure for the cover page of a proposal / defense / report (three layers top-middle-bottom + five modules; includes three variants: central radiation / left-right double column / minimalist)
Selection strategy: For CS/CV/ML papers, prefer
method-pipeline-overview
+
qualitative-comparison-grid
; for engineering / energy / chemical engineering / materials directions, prefer
method-pipeline-overview
variant 4 +
mechanism-diagram
+
multi-condition-comparison
; use
graphical-abstract
for journal submission abstract images; use
research-overview-poster
for the cover slide of a defense presentation.
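
The selection strategy above can be read as a small lookup table. The following is only a sketch: the use-case keys (`cs-paper`, `engineering-paper`, `journal-abstract`, `defense-cover`) and the function name are hypothetical and not part of the Skill; only the template names and the `references/academic-figures/` directory come from the index above.

```javascript
// Hypothetical lookup mirroring the selection strategy; the keys are illustrative.
const ACADEMIC_TEMPLATE_PICKS = {
  'cs-paper': [
    { template: 'method-pipeline-overview' },
    { template: 'qualitative-comparison-grid' },
  ],
  'engineering-paper': [
    { template: 'method-pipeline-overview', variant: 4 },
    { template: 'mechanism-diagram' },
    { template: 'multi-condition-comparison' },
  ],
  'journal-abstract': [{ template: 'graphical-abstract' }],
  'defense-cover': [{ template: 'research-overview-poster' }],
};

// Returns the template files to read for a use case (empty list if unknown),
// so that only the needed files are loaded rather than the whole directory.
function pickAcademicTemplates(useCase) {
  const picks = ACADEMIC_TEMPLATE_PICKS[useCase] ?? [];
  return picks.map((pick) => ({
    file: `references/academic-figures/${pick.template}.md`,
    variant: pick.variant ?? null,
  }));
}

console.log(pickAcademicTemplates('engineering-paper'));
```

Keeping the variant as a separate field reflects that "variant 4" is a choice inside `method-pipeline-overview.md`, not a separate file.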

17. Infographics (
references/infographics/
)

17. Infographics (
references/infographics/
)

适合“信息图 / 高密度科普 / 手绘信息图 / KPI 仪表盘”等"信息可视化大图"。当前已落地:
  • legend-heavy-infographic.md
    — 高图例密度科普 / 因果链 / 演化 / 解剖图(双语)
  • hand-drawn-infographic.md
    手绘风信息图(macaron / morandi / 黑板 / 牛皮纸;自然语言模板)
  • bento-grid-infographic.md
    — 便当格模块化信息图(高密度多模块 widget 排布)
  • comparison-infographic.md
    — 二元 / 多元对比信息图(A vs B / 套餐档位 / 误区 vs 正解)
  • step-by-step-infographic.md
    — 步骤教程信息图(插画感、温暖;非工程流程图)
  • kpi-dashboard-infographic.md
    — KPI 仪表盘式信息图(年度回顾 / Wrapped / 业务 dashboard)
Suitable for "large-scale information visualization" visuals such as infographics / high-density popular science / hand-drawn infographics / KPI dashboards. Currently implemented:
  • legend-heavy-infographic.md
    — High-legend-density popular science / causal chain / evolution / anatomical diagram (bilingual)
  • hand-drawn-infographic.md
    Hand-drawn style infographic (macaron / morandi / blackboard / kraft paper; natural language template)
  • bento-grid-infographic.md
    — Bento grid modular infographic (high-density multi-module widget arrangement)
  • comparison-infographic.md
    — Two-way / multi-way comparison infographic (A vs B / plan tiers / misconceptions vs correct answers)
  • step-by-step-infographic.md
    — Step-by-step tutorial infographic (illustrated, warm; not an engineering flowchart)
  • kpi-dashboard-infographic.md
    — KPI dashboard-style infographic (annual review / Wrapped / business dashboard)

18. Technical Diagrams (
references/technical-diagrams/
)

18. Technical Diagrams (
references/technical-diagrams/
)

适合“系统架构 / 流程 / 时序 / 状态机 / ER / 思维导图 / 网络拓扑”等工程示意图。统一暗色 grid 背景 + 等宽字体 + 角色编码配色,每个模板都附 light 变体。
⚠️ 注意:本目录生成的是 PNG 位图不是可编辑 SVG;需要可编辑请改用 mermaid / draw.io / excalidraw / Figma。当前已落地:
  • system-architecture.md
    — 系统架构图(前端 + 后端 + DB + 缓存 + 队列 + 外部)
  • flowchart-decision.md
    — 流程图 / 决策图(BPMN 形状语义 + Yes/No 分支)
  • sequence-diagram.md
    — 时序图(actor + lifeline + 消息箭头 + 激活条)
  • state-machine.md
    — 状态机 / 生命周期图(state + transition + guard / action)
  • er-diagram.md
    — ER 图 / 数据模型图(实体 + 字段 + PK/FK + crow's foot 关系)
  • mind-map-tech.md
    — 技术主题思维导图(中央 + 放射式分支)
  • network-topology.md
    — 网络拓扑图(设备 glyph + zone / VPC + 带宽 / 协议标)
Suitable for engineering schematics such as system architecture / flowchart / sequence / state machine / ER / mind map / network topology diagrams. All templates share a dark grid background + monospaced font + role-coded color scheme, and each one also comes with a light variant.
⚠️ Note: This directory generates PNG bitmaps, not editable SVG; use mermaid / draw.io / excalidraw / Figma if editable versions are needed. Currently implemented:
  • system-architecture.md
    — System architecture diagram (frontend + backend + DB + cache + queue + external services)
  • flowchart-decision.md
    — Flowchart / decision diagram (BPMN shape semantics + Yes/No branches)
  • sequence-diagram.md
    — Sequence diagram (actor + lifeline + message arrow + activation bar)
  • state-machine.md
    — State machine / lifecycle diagram (state + transition + guard / action)
  • er-diagram.md
    — ER diagram / data model diagram (entity + fields + PK/FK + crow's foot relationships)
  • mind-map-tech.md
    — Technical topic mind map (central + radial branches)
  • network-topology.md
    — Network topology diagram (device glyphs + zone / VPC + bandwidth / protocol labels)

提示词工作流(模式感知)

Prompt Workflow (Mode-Aware)

无论 A / B / C,前 6 步是共用的;区别只在第 7-8 步如何"出图"。
  1. check-mode.js
    确定模式
    (A / B / C)。
  2. 判断任务是生图还是改图。
  3. 识别它属于哪个分类目录(参考下方"模板索引")。
  4. 只读取对应的具体模板文件,不要一次读整个 references/
  5. 严格遵循模板格式:大部分模板用 JSON 主模板(结构化任务首选),少数模板(
    infographics/hand-drawn-infographic.md
    、
    academic-figures/scientific-schematic.md
    等)使用「结构化自然语言 + 参数」混合形式,因为强行 JSON 会限制创作自由。
  6. 把用户输入映射到模板参数;关键信息不足时主动发起有针对性的澄清问题。
到此 prompt 已渲染好。下面按模式分叉:
7-A. Mode A:把最终 prompt 保存到
garden-gpt-image-2/prompt/
,调用
scripts/generate.js
或
scripts/edit.js
,图片落到
garden-gpt-image-2/image/
。
7-B. Mode B:把最终 prompt 直接传给宿主的图像工具调用;按需保存 prompt 副本到
garden-gpt-image-2/prompt/
。
7-C. Mode C:把最终 prompt 保存到
garden-gpt-image-2/prompt/<task-slug>-<timestamp>.md
,并把完整 prompt 在对话中展示给用户,附一句简短的"如何使用 / 推荐工具"建议。
  8. 任务结束后用一句话告诉用户:当前模式是什么、prompt 落在哪、图(如有)落在哪。
Regardless of mode (A / B / C), the first 6 steps are shared; the modes differ only in how steps 7-8 produce the image.
  1. Run
    check-mode.js
    to determine the mode
    (A / B / C).
  2. Determine whether the task is image generation or image editing.
  3. Identify which category directory it belongs to (refer to the "Template Index").
  4. Read only the specific template file you need; do not load the entire references/ directory at once.
  5. Strictly follow the template format: most templates use a JSON main template (preferred for structured tasks), while a few (such as
    infographics/hand-drawn-infographic.md
    ,
    academic-figures/scientific-schematic.md
    ) use a hybrid "structured natural language + parameters" form, because forcing JSON would restrict creative freedom.
  6. Map the user's input to the template parameters; if key information is missing, ask targeted clarifying questions.
The prompt is now rendered. Branch by mode below:
7-A. Mode A: Save the final prompt to
garden-gpt-image-2/prompt/
, call
scripts/generate.js
or
scripts/edit.js
, and save images to
garden-gpt-image-2/image/
.
7-B. Mode B: Pass the final prompt directly to the host's image tool call; save a copy of the prompt to
garden-gpt-image-2/prompt/
as needed.
7-C. Mode C: Save the final prompt to
garden-gpt-image-2/prompt/<task-slug>-<timestamp>.md
, display the complete prompt to the user in the conversation, and attach a short "how to use / recommended tools" suggestion.
  8. After the task ends, tell the user in one sentence: what the current mode is, where the prompt is saved, and where the image (if any) is saved.
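
The branch in step 7 can be sketched as a small planning function. This is an illustration under stated assumptions, not the Skill's actual code: `planOutput`, `formatTimestamp`, and the returned field names are hypothetical, the timestamp format (`YYYYMMDD-HHmmss`, UTC) is an assumption because the Skill does not fix one here, and the mode is assumed to be already resolved to `A`, `B`, or `C`.

```javascript
// Directories taken from the workflow above.
const PROMPT_DIR = 'garden-gpt-image-2/prompt/';
const IMAGE_DIR = 'garden-gpt-image-2/image/';

// Assumed timestamp format: YYYYMMDD-HHmmss in UTC.
function formatTimestamp(date) {
  const iso = date.toISOString(); // e.g. 2025-01-02T03:04:05.000Z
  return iso.slice(0, 19).replace(/[-:]/g, '').replace('T', '-');
}

// Describes what step 7 should do for a resolved mode. All field names are hypothetical.
function planOutput(mode, taskType, taskSlug, now = new Date()) {
  switch (mode) {
    case 'A': // local generation: save prompt, run a script, images land on disk
      return {
        savePromptTo: PROMPT_DIR,
        run: taskType === 'edit' ? 'scripts/edit.js' : 'scripts/generate.js',
        imagesTo: IMAGE_DIR,
        showPromptInChat: false,
      };
    case 'B': // host tool generates the image; saving a prompt copy is optional
      return {
        savePromptTo: PROMPT_DIR,
        run: 'host image tool',
        imagesTo: null,
        showPromptInChat: false,
      };
    case 'C': // prompt only: save it and show it to the user
      return {
        savePromptTo: `${PROMPT_DIR}${taskSlug}-${formatTimestamp(now)}.md`,
        run: null,
        imagesTo: null,
        showPromptInChat: true,
      };
    default:
      throw new Error(`Unknown mode: ${mode}`);
  }
}

console.log(planOutput('C', 'generate', 'cat-poster'));
```

The returned object also contains everything needed for the one-sentence summary in step 8: the mode, the prompt location, and the image location (if any).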

重要约束

Important Constraints

通用:
  • 模板文件中的 JSON 是提示词结构模板,不是 API 请求体模板。
  • 三种模式下,最终交给图像模型的都是"渲染后的 prompt 字符串"——可以是拍平的 JSON、可以是结构化自然语言段落,按模板原样使用。
  • 除非用户明确要求,否则不要把 SKILL.md 里的"模式说明"复制到最终 prompt 里——那是给 Agent 看的元信息。
仅 Mode A 适用:
  • 生成脚本使用 JSON body
  • 编辑脚本使用 multipart form data
  • 响应优先按
    data[0].b64_json
    解析,也兼容
    data[0].url
  • 除非上游接口明确要求,不额外引入特殊 query 参数
General:
  • JSON in template files is a prompt structure template, not an API request body template.
  • In all three modes, what is finally handed to the image model is a "rendered prompt string": flattened JSON or structured natural-language paragraphs, used exactly as the template defines.
  • Unless the user explicitly asks, do not copy the "mode description" from SKILL.md into the final prompt; that is meta-information for the Agent.
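
To make the "prompt structure, not request body" distinction concrete, here is a minimal sketch. The template fields, `renderPrompt`, and the `prompt` key on the body object are illustrative assumptions; none of them are taken from a real template file or from the Skill's scripts.

```javascript
// A filled-in prompt template (illustrative fields only).
const filledTemplate = {
  type: 'banner-grid-2x2',
  subject: 'summer iced-tea launch',
  style: 'flat pastel illustration',
};

// Render to a single prompt string: JSON templates are flattened,
// natural-language templates pass through unchanged.
function renderPrompt(template) {
  return typeof template === 'string' ? template : JSON.stringify(template);
}

// The rendered string is the prompt *value*; the template never becomes the body itself.
const requestBody = { prompt: renderPrompt(filledTemplate) };

console.log(typeof requestBody.prompt);
```

The same `renderPrompt` result would be what gets passed to a host tool in Mode B or shown to the user in Mode C.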
Only applicable to Mode A:
  • Generation scripts use JSON body
  • Editing scripts use multipart form data
  • Responses are parsed from
    data[0].b64_json
    first, falling back to
    data[0].url
  • Do not introduce additional special query parameters unless explicitly required by the upstream interface
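
The response-parsing rule for Mode A can be sketched as follows. The function name and the shape of the returned object are hypothetical; only the `data[0].b64_json` then `data[0].url` order comes from the constraints above.

```javascript
// Extract the first image from a response object.
// Prefers data[0].b64_json and falls back to data[0].url.
function extractFirstImage(response) {
  const first = response?.data?.[0];
  if (!first) {
    throw new Error('Response contains no image data');
  }
  if (first.b64_json) {
    // Decode base64 into raw bytes that can be written to disk.
    return { kind: 'bytes', bytes: Buffer.from(first.b64_json, 'base64') };
  }
  if (first.url) {
    // The caller still has to download the image from this URL.
    return { kind: 'url', url: first.url };
  }
  throw new Error('Neither b64_json nor url is present in data[0]');
}

const sample = { data: [{ b64_json: Buffer.from('fake-png-bytes').toString('base64') }] };
console.log(extractFirstImage(sample).kind);
```

Returning a tagged result keeps the two cases explicit: bytes can be saved immediately, while a URL needs a separate download step.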

何时提问

When to Ask Questions

只在这些信息缺失且会显著影响结果时提问:
  • 没有 prompt 目标
  • 改图时没有原图
  • 主体身份或视觉类型决定结果走向
  • 商品 / 价格 / 文案 / UI 文本是画面核心组成部分
  • 用户同时表达了多个互相冲突的目标
除此之外,优先自己做合理默认并继续执行。
Ask questions only when one of the following applies and would significantly affect the result:
  • The prompt has no clear goal
  • No original image for editing tasks
  • Subject identity or visual type determines the result direction
  • Product / price / copy / UI text is a core component of the image
  • Users express multiple conflicting goals at the same time
Otherwise, choose reasonable defaults yourself and proceed.
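
The rule above amounts to a short checklist. The sketch below is one possible reading of it: the `task` fields (`goal`, `type`, `sourceImage`, `subjectIsDecisive`, `subject`, `textIsCore`, `text`, `conflictingGoals`) and the function name are hypothetical and only serve to illustrate the decision.

```javascript
// Returns the topics that need a clarifying question.
// An empty list means: choose reasonable defaults and proceed.
function clarificationsNeeded(task) {
  const topics = [];
  if (!task.goal) topics.push('prompt goal');
  if (task.type === 'edit' && !task.sourceImage) topics.push('original image');
  if (task.subjectIsDecisive && !task.subject) topics.push('subject identity or visual type');
  if (task.textIsCore && !task.text) topics.push('product / price / copy / UI text');
  if (task.conflictingGoals) topics.push('conflicting goals');
  return topics;
}

console.log(clarificationsNeeded({ goal: 'poster for a tea brand', type: 'generate' }));
```

Anything not covered by one of these checks is left to defaults, which matches the intent of asking as little as possible.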