generate-image

Requires the GEMINI_API_KEY environment variable and the uv package manager.

Workflow

  1. Understand — Determine mode (t2i, i2i, multi-reference), gather parameters (model, aspect ratio, resolution, output path). If the prompt requires precise execution (specific pose, asymmetric framing, exact crop), default to --batch 3 or --batch 4 and surface this to the user, because image generation is stochastic and precise directives hit ~50% per seed. Exit: mode, parameters, and batch size are clear.
  2. Craft prompt — Default to the minimal prompt that can carry the intent: t2i uses narrative prose; i2i/multi-reference uses a reference block plus the minimal directive. Apply the Core checklist (always, for the matching mode). Reach into the Escalation toolkit only on a known-hard signature — a detail/geometry-fidelity shot — or after a batch shows drift; then add only the specific lock for the attribute that is drifting, not the whole kit. Over-constraining a simple edit degrades it as surely as under-specifying a complex one. Exit: prompt written, Core items satisfied, escalation tools added only where a signature or observed drift justifies them.
  3. Confirm — Show the user the exact prompt, input images (if any), model, resolution, aspect ratio, and batch size. Ask for confirmation. Exit: user approves.
  4. Generate — Run the script with confirmed parameters. Exit: images are saved and displayed.
  5. Iterate — Present results and evaluate against intent before offering refinements. Evaluation order by mode: t2i — subject correctness, composition, style fidelity. i2i edit — the changed element looks right, nothing else changed. Multi-reference composition — the primary transferred attribute matches its source reference FIRST (for a detail shot, the construction geometry — width, edge shape, count, angle), secondary consistency (identity, environment) holds SECOND, staging (lighting, composition, framing) THIRD. Decide what's primary per task. Cherry-pick the winning frame from the batch rather than re-prompting for consistency past ~75%. Exit: user is satisfied or moves on.
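
The batch guidance in step 1 follows from simple arithmetic. The sketch below assumes every seed succeeds independently at the quoted ~50% rate, which is an idealization; the function name is illustrative and not part of the script.

```python
def batch_hit_probability(per_seed_rate: float, batch_size: int) -> float:
    """Chance that at least one image in the batch satisfies the directive,
    assuming every seed succeeds independently at the same rate."""
    if not 0.0 <= per_seed_rate <= 1.0:
        raise ValueError("per_seed_rate must be between 0 and 1")
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    return 1.0 - (1.0 - per_seed_rate) ** batch_size


for n in (1, 3, 4):
    print(f"--batch {n}: {batch_hit_probability(0.5, n):.2%}")
# --batch 1: 50.00%
# --batch 3: 87.50%
# --batch 4: 93.75%
```

Under that assumption a batch of 3 or 4 already clears the ~75% mark mentioned in step 5, which is why picking the best frame tends to be cheaper than another round of re-prompting.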

Default Output & Logging

When the user doesn't specify a location, save images to:
~/Documents/generated images/
Every generated image gets a companion .md file with the prompt and model used (e.g., logo.png gets logo.md).
When gathering parameters (aspect ratio, resolution), offer the option to specify a custom output location.
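
As a rough illustration of the naming rule above, the following sketch derives the default output location and the companion log path with pathlib. The helper names and the layout of the log text are assumptions; the real script may format its .md file differently.

```python
from pathlib import Path
from typing import Optional

# Default folder named in this section.
DEFAULT_OUTPUT_DIR = Path.home() / "Documents" / "generated images"


def resolve_output(filename: str, output_dir: Optional[Path] = None) -> Path:
    """Place the image in the user's chosen folder, or in the default one."""
    return (output_dir or DEFAULT_OUTPUT_DIR) / filename


def companion_log_path(image_path: Path) -> Path:
    """logo.png -> logo.md, in the same folder as the image."""
    return image_path.with_suffix(".md")


def companion_log_text(prompt: str, model: str) -> str:
    """Hypothetical log layout; the actual file format is not specified here."""
    return f"Model: {model}\n\nPrompt:\n{prompt}\n"


image = resolve_output("logo.png")
print(companion_log_path(image).name)  # logo.md
```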

Core Prompting Principle

Describe scenes narratively, not as keyword lists. Gemini's language model parses prose with full semantic understanding — narrative prompts encode spatial relationships, mood, and intent that comma-separated tags cannot express. Tag-style prompts lose compositional meaning and produce generic results.
Bad:  "cat, wizard hat, magical, fantasy, 4k, detailed"

Good: "A fluffy orange tabby sits regally on a velvet cushion, wearing an ornate
       purple wizard hat embroidered with silver stars. Soft candlelight illuminates
       the scene from the left. The mood is whimsical yet dignified."
Describe positively, never via negation. Every concept named in a prompt biases the output toward that concept — even when preceded by "not", "no", or "do not". Diffusion models condition on tokens regardless of polarity. To exclude X, either (a) name a positive alternative that fills the same role, or (b) scope the prompt so X has no place to land.
Bad:   "A clean studio backdrop. No warm tones, no cream, no beige, no tan."
Good:  "A clean cool-neutral gray studio backdrop with subtle blue undertones."

Bad:   "A headshot with no harsh shadows on the face, no distracting background."
Good:  "A headshot on a clean neutral gray backdrop, even soft frontal fill light
        that flatters the face."
This rule applies everywhere in the skill — t2i prompts, i2i directives, reference role descriptions, and framing instructions.
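
One way to catch negations before a prompt is sent is a simple word scan. This is a heuristic sketch, not part of the skill: the word list is an assumption and is not exhaustive, and it will also flag text that is meant to be rendered inside the image (a sign reading "No Parking", for instance), so a flagged prompt still needs a human look.

```python
import re

# Heuristic word list, assumed for illustration only.
NEGATION_PATTERN = re.compile(
    r"\b(?:do not|don't|no|not|never|without)\b", re.IGNORECASE
)


def find_negations(prompt: str) -> list[str]:
    """Return each negation word or phrase found, in order of appearance."""
    return [match.group(0).lower() for match in NEGATION_PATTERN.finditer(prompt)]


bad = "A clean studio backdrop. No warm tones, no cream, no beige, no tan."
good = "A clean cool-neutral gray studio backdrop with subtle blue undertones."
print(find_negations(bad))   # ['no', 'no', 'no', 'no']
print(find_negations(good))  # []
```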
Name sources explicitly — leave no ambiguity in references. Every element in the prompt should trace to a specific source: "the man from Image 2" not "this man"; "the shirt from Image 3" not "the shirt". Ambiguous references bind to whichever source the model weights most, which is never reliably the right one. This isn't about over-describing — don't re-describe what the reference already shows. It's about making each reference point to exactly one source.
A useful formula:
[Subject] doing [Action] in [Context]. [Camera/Composition]. [Lighting]. [Style]. [Constraint].
Not every prompt needs every element — match detail to intent. If the user has a specific vision, be prescriptive (exact descriptions); if exploring, be open (general direction, let the model decide details). Ask if unclear.
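
The formula can be read as an optional-slot template. A minimal sketch, assuming a hypothetical PromptParts container whose field names mirror the bracketed slots; empty slots are simply skipped, matching the advice that not every prompt needs every element.

```python
from dataclasses import dataclass


@dataclass
class PromptParts:
    subject: str
    action: str = ""
    context: str = ""
    camera: str = ""
    lighting: str = ""
    style: str = ""
    constraint: str = ""


def build_prompt(parts: PromptParts) -> str:
    """Join the supplied elements into prose sentences, skipping empty ones."""
    opening = " ".join(p for p in (parts.subject, parts.action, parts.context) if p)
    sentences = [opening, parts.camera, parts.lighting, parts.style, parts.constraint]
    return " ".join(s.strip().rstrip(".") + "." for s in sentences if s.strip())


parts = PromptParts(
    subject="A fluffy orange tabby",
    action="sits regally",
    context="on a velvet cushion",
    lighting="Soft candlelight illuminates the scene from the left",
    style="The mood is whimsical yet dignified",
)
print(build_prompt(parts))
# A fluffy orange tabby sits regally on a velvet cushion. Soft candlelight
# illuminates the scene from the left. The mood is whimsical yet dignified.
# (printed as a single line)
```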

Advanced Prompting Techniques

Hyper-specificity: Be precise about quantities, positions, and attributes. "Three red apples arranged in a triangle on a wooden table" outperforms "some apples on a table." Every vague word is a degree of freedom the model fills arbitrarily.
Context and intent: State the purpose. "A hero image for a coffee brand landing page" produces different results than "a photo of coffee" even if the visual subject is the same, because intent shapes composition, mood, and framing.
Step-by-step instructions: For complex scenes, break the prompt into sequential directives. "Start with a wide desert landscape. Place a lone figure walking left-to-right in the lower third. Behind them, a massive sandstorm approaches from the right."
Exclusion via positive constraint: When something must be absent from the output, do not name it under a negation. Either name a positive alternative ("clean unbranded surface" instead of "no logos") or scope the scene so the unwanted element has no place to land ("a closed laptop on the desk" makes a screen impossible to render). Naming X under "no X" makes X more likely, not less.
Camera control: Specify shot type (extreme close-up, medium shot, aerial), lens (fisheye, telephoto), and camera angle (low angle, bird's eye, Dutch angle) to control framing precisely.
Editing with reference images follows different principles — see references/editing-guide.md.

Key Editing Principles

Editing prompts direct changes rather than describing scenes. Point to what the model can see; describe only what it cannot. Specify intentionally — every adjective, color word, or preservation clause beyond the minimum competes with the reference image and degrades fidelity. The reliable shape is a reference block plus one Replace directive — the verb's implicit scope handles preservation, no stop clause needed. Details in editing-guide.md.
For multi-reference work (3+ images), use per-reference role assignment: one sentence that assigns each reference its specific contribution ("the facade from Image 2; the car from Image 3; the sky and lighting from Image 1") — see editing-guide.md "Per-Reference Role Assignment".
Base image goes first in --input, where it becomes Image 1 in the prompt. Gemini numbers images sequentially from input order. Reference block labels must match input order exactly.
Names invoke aesthetics directly — referencing "shot on Kodak Portra 400" produces its characteristic look more reliably than describing warm skin tones and pastel highlights.
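
Because labels and input order must agree, it can help to derive both from a single ordered list. The sketch below does that; the "Image N: role" label layout and the function name are assumptions made for illustration, and the authoritative reference block format lives in editing-guide.md.

```python
def build_reference_prompt(
    references: list[tuple[str, str]], directive: str
) -> tuple[str, list[str]]:
    """references is an ordered list of (file path, role); the first entry is
    the base image. Returns the prompt text and the matching --input
    arguments, both derived from the same list so they cannot diverge."""
    if not references:
        raise ValueError("at least one reference image is required")
    label_lines = [
        f"Image {index}: {role}"
        for index, (_, role) in enumerate(references, start=1)
    ]
    input_args: list[str] = []
    for path, _ in references:
        input_args += ["--input", path]
    prompt = "\n".join(label_lines) + "\n\n" + directive
    return prompt, input_args


prompt, input_args = build_reference_prompt(
    [
        ("street.png", "the sky and lighting"),
        ("house.png", "the facade"),
        ("car.png", "the car"),
    ],
    "Replace the parked car in Image 1 with the car from Image 3.",
)
print(prompt)
print(input_args)
```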

References

Load the relevant reference during prompt crafting (workflow step 2):
  • references/capability-patterns.md — mode-specific tips for photorealistic scenes, product photography, logos, stylized illustration, text rendering, and grounding
  • references/editing-guide.md — edit grammar, reference blocks, directive structure, image ordering, semantic masking, character consistency
  • references/style-reference.md — named aesthetics lexicon (film stocks, cameras, studios, artists, movements)

Configuration

Model Selection

| Feature | Nano Banana (default) | Nano Banana Pro |
| --- | --- | --- |
| Speed | Fast, high-volume | Slower, higher quality |
| Resolutions | 0.5K, 1K, 2K, 4K | 1K, 2K, 4K |
| Extra ratios | 1:4, 4:1, 1:8, 8:1 | None |
| Thinking mode | Yes (minimal/low/medium/high) | No |
| Image search grounding | Yes | No |
| Max references | 14 | 11 (6 objects + 5 characters) |
| Text rendering | Advanced | Standard |
Default to Nano Banana for most requests. Use Nano Banana Pro when the user explicitly asks for maximum quality or when Nano Banana results need refinement.
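
The per-model limits lend themselves to a small pre-flight check. The sketch below copies its values from this Configuration section and uses the model names accepted by --model; the ModelSpec structure and validate_request function are hypothetical helpers, not something the script is known to expose.

```python
from dataclasses import dataclass
from typing import Optional

COMMON_RATIOS = frozenset(
    {"1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"}
)
EXTRA_RATIOS = frozenset({"1:4", "4:1", "1:8", "8:1"})


@dataclass(frozen=True)
class ModelSpec:
    resolutions: frozenset
    aspect_ratios: frozenset
    max_references: int
    supports_thinking: bool


MODELS = {
    "nano-banana": ModelSpec(
        frozenset({"0.5K", "1K", "2K", "4K"}), COMMON_RATIOS | EXTRA_RATIOS, 14, True
    ),
    "pro": ModelSpec(frozenset({"1K", "2K", "4K"}), COMMON_RATIOS, 11, False),
}


def validate_request(
    model: str,
    resolution: str = "1K",
    aspect: Optional[str] = None,
    reference_count: int = 0,
    thinking: Optional[str] = None,
) -> list[str]:
    """Return human-readable problems; an empty list means the request fits."""
    spec = MODELS.get(model)
    if spec is None:
        return [f"unknown model: {model}"]
    problems = []
    if resolution not in spec.resolutions:
        problems.append(f"{model} does not offer {resolution}")
    if aspect is not None and aspect not in spec.aspect_ratios:
        problems.append(f"{model} does not offer aspect ratio {aspect}")
    if reference_count > spec.max_references:
        problems.append(f"{model} accepts at most {spec.max_references} references")
    if thinking is not None and not spec.supports_thinking:
        problems.append(f"{model} has no thinking mode")
    return problems


print(validate_request("pro", resolution="0.5K", aspect="1:8"))
# ['pro does not offer 0.5K', 'pro does not offer aspect ratio 1:8']
```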

Aspect Ratios

Both models: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
Nano Banana only: 1:4, 4:1, 1:8, 8:1

Resolutions

  • 0.5K (~512px) — fast preview (Nano Banana only)
  • 1K (~1024px) — default, fast
  • 2K (~2048px) — high quality
  • 4K (~4096px) — maximum detail
Defaults: 1K resolution, batch 1, aspect ratio auto-detected from base image (first input, or 1:1 if no images). Use 0.5K for quick previews and iteration (Nano Banana only). Use 2K for higher quality requests, 4K only when high detail is explicitly needed.
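
The text above says the aspect ratio is auto-detected from the base image but does not say how. One plausible approach, shown purely as an assumption, is to pick the supported ratio nearest to the image's width/height on a log scale, falling back to 1:1 when there is no input image.

```python
import math
from typing import Optional

# Ratios listed above as supported by both models.
SUPPORTED_RATIOS = ["1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"]


def nearest_aspect_ratio(width: int, height: int, ratios=SUPPORTED_RATIOS) -> str:
    """Pick the supported ratio closest to width/height on a log scale."""
    target = math.log(width / height)

    def distance(ratio: str) -> float:
        w, h = ratio.split(":")
        return abs(math.log(int(w) / int(h)) - target)

    return min(ratios, key=distance)


def default_aspect_ratio(base_image_size: Optional[tuple[int, int]] = None) -> str:
    """1:1 without an input image, otherwise the nearest supported ratio."""
    if base_image_size is None:
        return "1:1"
    return nearest_aspect_ratio(*base_image_size)


print(default_aspect_ratio((1920, 1080)))  # 16:9
print(default_aspect_ratio((1080, 1920)))  # 9:16
print(default_aspect_ratio())              # 1:1
```

Comparing on a log scale treats a ratio and its inverse symmetrically, so portrait and landscape inputs are matched with the same tolerance.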

Thinking Mode (Nano Banana only)

Nano Banana supports controllable thinking levels that improve complex prompt interpretation:
  • minimal (default) — fastest, suitable for straightforward prompts
  • low/medium — balanced reasoning for moderately complex scenes
  • high — maximum reasoning for complex multi-element compositions, precise text rendering, or intricate spatial layouts
Use --thinking high when the prompt involves precise spatial relationships, multiple text elements, or detailed composition requirements. For i2i editing, thinking mode also helps with multi-reference composition (3+ images), precise text/sign placement on existing scenes, and complex spatial edits where element positioning matters.
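
The triggers for --thinking high can be written down as a small decision helper. This is a simplified sketch: the parameter names are hypothetical, and it only distinguishes minimal from high, leaving the low and medium levels to judgment since the text gives no precise thresholds for them.

```python
from typing import Optional


def suggest_thinking_level(
    model: str,
    precise_spatial: bool = False,
    text_elements: int = 0,
    reference_count: int = 0,
) -> Optional[str]:
    """Return a value for --thinking, or None when the model has no thinking mode."""
    if model != "nano-banana":
        return None  # Nano Banana Pro does not support thinking levels
    if precise_spatial or text_elements > 1 or reference_count >= 3:
        return "high"
    return "minimal"


print(suggest_thinking_level("nano-banana", reference_count=3))  # high
print(suggest_thinking_level("nano-banana"))                     # minimal
print(suggest_thinking_level("pro", precise_spatial=True))       # None
```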

Script Usage

One unified script handles all modes: t2i, i2i, and multi-reference composition. Nano Banana is the default model.

Text-to-image (t2i) — uses Nano Banana by default

uv run ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.py --prompt "A serene mountain lake at dawn" --output landscape.png

Nano Banana Pro model

uv run ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.py --prompt "A serene mountain lake at dawn" --output landscape.png --model pro

Image-to-image editing (i2i)

uv run ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.py --prompt "Make it sunset colors" --input photo.png --output edited.png

Multi-reference composition

uv run ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.py --prompt "Combine the cat from image 1 with the background from image 2" --input cat.png --input background.png --output composite.png

With options (aspect ratio, resolution, thinking, batch, grounding, format)

uv run ${CLAUDE_PLUGIN_ROOT}/skills/generate-image/scripts/generate.py --prompt "Logo for 'Acme Corp'" --output logo.png --aspect 1:1 --resolution 2K --thinking high
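
When the script is driven from another program rather than typed by hand, the same invocations can be assembled as an argument list. A minimal sketch, assuming CLAUDE_PLUGIN_ROOT is set in the environment (it falls back to the current directory here) and using only flags that appear in this document; build_command is an illustrative helper, not part of the skill.

```python
import os
import shlex

# Same script path as in the commands above.
SCRIPT = os.path.join(
    os.environ.get("CLAUDE_PLUGIN_ROOT", "."),
    "skills", "generate-image", "scripts", "generate.py",
)


def build_command(prompt, output, inputs=(), model=None, aspect=None,
                  resolution=None, thinking=None, batch=None) -> list[str]:
    """Assemble the argument list; flags left as None are omitted."""
    command = ["uv", "run", SCRIPT, "--prompt", prompt, "--output", output]
    for path in inputs:  # order matters: the first input becomes Image 1
        command += ["--input", path]
    optional = {
        "--model": model,
        "--aspect": aspect,
        "--resolution": resolution,
        "--thinking": thinking,
        "--batch": batch,
    }
    for flag, value in optional.items():
        if value is not None:
            command += [flag, str(value)]
    return command


command = build_command(
    "Logo for 'Acme Corp'", "logo.png",
    aspect="1:1", resolution="2K", thinking="high",
)
print(shlex.join(command))
# To execute it: subprocess.run(command, check=True)
```

Passing a list to subprocess.run avoids shell quoting problems with prompts that contain apostrophes or quotation marks.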

Script Options

| Flag | Short | Description |
| --- | --- | --- |
| --prompt | -p | Image description or edit instruction (required) |
| --output | -o | Output file path (required) |
| --input | -i | Input image(s) for editing/composition (repeatable, up to 14) |
| --model | -m | Model: nano-banana (default) or pro |
| --aspect | -a | Aspect ratio (auto-detects from base image / first input, or 1:1) |
| --resolution | -r | Output resolution: 0.5K, 1K, 2K, or 4K (default: auto-detect or 1K) |
| --grounding | -g | Enable Google Search web grounding |
| --image-grounding | | Enable image search grounding (Nano Banana only, use with --grounding) |
| --thinking | -t | Thinking level: minimal, low, medium, high (Nano Banana only) |
| --quality | -q | Output compression quality 1-100 (JPEG only) |
| --format | -f | Output format: png (default) or jpeg |
| --batch | -b | Generate multiple variations: 1-4 (default: 1) |
| --json | | Output results as JSON for agent consumption |
| --quiet | | Suppress progress output (MEDIA lines still printed) |
The script auto-detects resolution and aspect ratio from input images when flags are omitted, and automatically resizes large inputs (>2048px) before sending to the API.
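
For a sense of what the automatic resize implies, here is a sketch of the dimension arithmetic. It assumes the 2048px threshold applies to the longer side and that the aspect ratio is preserved; the script's actual resizing rules are not documented in this section.

```python
def fit_within(width: int, height: int, max_side: int = 2048) -> tuple[int, int]:
    """Scale (width, height) down so the longer side is at most max_side.
    Images already within the limit are returned unchanged."""
    longest = max(width, height)
    if longest <= max_side:
        return width, height
    scale = max_side / longest
    return max(1, round(width * scale)), max(1, round(height * scale))


print(fit_within(4096, 3072))  # (2048, 1536)
print(fit_within(1024, 768))   # (1024, 768)
```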

Pre-Generation Checklist

Core items are the floor — apply them to every prompt of the matching mode. The Escalation toolkit is opt-in: skip it entirely for simple t2i and single-element edits. Reach in only on a known-hard signature (a detail/geometry-fidelity shot) or after a batch shows drift — and then add only the lock for the attribute that is actually drifting. Each added constraint costs fidelity on everything else, so escalation scales with how many independent things can drift, not with how ambitious the prompt is.

Core — t2i (always)

  • Narrative description (not keyword list)?
  • Positive framing throughout — no "no X" / "not X" / "do not X" clauses anywhere in the prompt?
  • Camera/lighting details for photorealism?
  • Text in quotes, font style described? (if the image has text)
  • Aspect ratio appropriate for use case?
  • Model choice appropriate? (Nano Banana default; Nano Banana Pro for max quality)
  • Thinking level set for complex prompts? (Nano Banana only)
  • Batch size matches precision needs? (--batch 3 or --batch 4 for precise pose / framing / asymmetric directives)

Core — i2i / multi-reference (always)

  • Reference block at start of prompt labeling each image's role?
  • Reference roles positive-only — lists what to USE from each ref, never what to ignore? (see editing-guide.md "Reference Block")
  • Minimal directive pattern? (Reference block + one Replace directive — no "do not change anything else" stop clause, no decorative preservation clauses)
  • Positive framing throughout the directive? (no negation anywhere, including locks and composition clauses)
  • Base image first in --input (Image 1), and prompt labels match input order? (mislabeled roles cause character drift)
  • Only one change per prompt? (split competing directives into sequential passes)
  • When extracting/transferring elements: explicitly named each element rather than generic "outfit/object from image X"?
  • One reference carrying an element you need to preserve while another reference could compete with it? → add CRITICAL — [element] proactively: assign the source reference explicitly ("Image 1 is the sole identity source. Image 2 is scoped to [attribute] only."). Do not wait for drift, because role competition defeats text directives silently.
  • No color labels competing with reference image? (color words override visual reference — see editing-guide)
  • Base image has minimal accessories that could contaminate? (bags, hats, sunglasses bleed into output)
  • Reference count within model limits? (Nano Banana: 14, Nano Banana Pro: 11)
  • For 3+ references: per-reference role-assignment sentence? (one sentence assigning each image its contribution — see editing-guide.md "Per-Reference Role Assignment")
  • For photorealistic human shots: skin & finish locked? (the one near-universal CRITICAL section — generative skin drifts to plastic/retouched)

Escalation toolkit — reach for only on a hard signature or observed drift

  • An attribute (identity, lighting/color, orientation, drape) drifting across the batch? Lock that specific attribute in its own ## CRITICAL — section, positively phrased, without over-constraining the stable ones. (see editing-guide.md "Constraint Locking with CRITICAL Sections")
  • Detail shot whose construction won't hold? Geometry lock via per-attribute enumeration, not generic "match exactly". (see capability-patterns.md "Geometry Lock for Detail Shots")
  • Nano Banana + --thinking high with 3+ references and weak adherence? Add the inventory preamble. ("Silently inventory the design-critical details: ...")
  • Follow-up shot from the same set? Continuity assertion. ("from the same set as Image N: same subject, same setting, same light" — see editing-guide.md "Continuity Assertion")
  • Campaign with a locked hero image? Collapse to two-input form — hero as bundle-source + the single new-attribute reference. (see editing-guide.md "Single-Reference Collapse")
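
The "lock only what drifts" rule can be made concrete by tallying a reviewer's per-frame judgments. In this sketch each frame of a batch is marked True or False per attribute (did it match the reference?), and only attributes that fall below a match-rate threshold are returned as lock candidates. The data layout, function name, and the 0.75 threshold are assumptions for illustration.

```python
def attributes_to_lock(
    batch_review: list[dict[str, bool]], min_match_rate: float = 0.75
) -> list[str]:
    """Return attributes whose match rate across the batch is below the
    threshold; stable attributes are left alone."""
    if not batch_review:
        return []
    candidates = []
    for attribute in batch_review[0]:
        matches = sum(1 for frame in batch_review if frame.get(attribute, False))
        if matches / len(batch_review) < min_match_rate:
            candidates.append(attribute)
    return candidates


review = [
    {"identity": True, "lighting": True, "orientation": True},
    {"identity": True, "lighting": False, "orientation": True},
    {"identity": True, "lighting": False, "orientation": False},
    {"identity": True, "lighting": True, "orientation": True},
]
print(attributes_to_lock(review))  # ['lighting']
```

Here identity holds in every frame and orientation holds in three of four, so only lighting would get its own CRITICAL section.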