nano-banana-pro

Nano Banana Pro Image Generation & Editing

Generate and edit images using Google's Gemini 3 Pro model with advanced transparency support.

Prerequisites

Dependencies:

```bash
pip install google-genai Pillow numpy python-dotenv
```

API Key: The script loads the key from `.env` automatically. Only ask the user if the script fails with "No API key found".
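
As a rough illustration of that failure mode, the sketch below looks a key up in an environment-style mapping and raises the same "No API key found" message when none is set. The variable names `GEMINI_API_KEY` and `GOOGLE_API_KEY` and the helper `resolve_api_key` are assumptions made for illustration; the actual lookup in `scripts/generate.py` may differ.

```python
import os

# Assumed variable names; the real script may read a different one.
KEY_NAMES = ("GEMINI_API_KEY", "GOOGLE_API_KEY")

def resolve_api_key(environ=None):
    """Return the first non-empty key found, or raise with the documented message."""
    env = os.environ if environ is None else environ
    for name in KEY_NAMES:
        value = env.get(name, "").strip()
        if value:
            return value
    raise RuntimeError("No API key found")
```

Passing a plain dictionary instead of `os.environ` makes the lookup easy to exercise without touching the real environment.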

CLI Usage (REQUIRED)

ALWAYS use the CLI script. Do NOT write Python code or create .py files.

Run `scripts/generate.py` directly:

```bash
# Basic generation
python scripts/generate.py "a cute banana sticker" -o banana.png

# With transparency (for game assets, stickers, icons)
python scripts/generate.py "pixel art sword" -o sword.png --transparent

# Custom size and aspect ratio
python scripts/generate.py "game logo" -o logo.png --size 4K --ratio 16:9
```


**Options:**
- `-o, --output` - Output filename (default: output.png)
- `--transparent` - Extract true alpha channel using difference matting
- `--size` - 1K, 2K, or 4K (default: 2K)
- `--ratio` - Aspect ratio: 1:1, 16:9, 9:16, etc. (default: 1:1)
- `--model` - Model override (default: gemini-3-pro-image-preview)

**Note:** The script loads the API key from `.env` automatically. Do not check for API keys manually or ask the user about them - just run the script and it will error with instructions if the key is missing.
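
To show how the documented options fit together, here is a minimal `argparse` sketch that mirrors the flags and defaults listed above. It is not the actual `scripts/generate.py`; the positional `prompt` argument and the `build_parser` helper are assumptions inferred from the usage examples.

```python
import argparse

def build_parser():
    """Hypothetical parser mirroring the documented options and defaults."""
    parser = argparse.ArgumentParser(prog="generate.py")
    parser.add_argument("prompt", help="Text description of the image")
    parser.add_argument("-o", "--output", default="output.png")
    parser.add_argument("--transparent", action="store_true")
    parser.add_argument("--size", choices=["1K", "2K", "4K"], default="2K")
    parser.add_argument("--ratio", default="1:1")
    parser.add_argument("--model", default="gemini-3-pro-image-preview")
    return parser

# Parse an explicit argument list instead of sys.argv so the sketch runs anywhere.
args = build_parser().parse_args(["pixel art sword", "-o", "sword.png", "--transparent"])
print(args.output, args.transparent, args.size, args.ratio)
```

Options that are not passed fall back to the defaults from the list, so the call above keeps the 2K size and 1:1 ratio.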

Intent Detection

Analyze the user's request to determine the intent:

| Intent | Triggers | Action |
| --- | --- | --- |
| Generate | "create", "generate", "make", "draw", "design" | Text-to-image |
| Edit | "edit", "change", "modify", "update", "fix" | Image-to-image |
| Transparency | "transparent", "remove background", "alpha", "cutout", "PNG with transparency" | Use difference matting |
| Text overlay | "add text", "write on", "label", "caption" | Use Gemini 3 Pro for accurate text |
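
The trigger table can be approximated with simple keyword matching. The sketch below is one possible reading, not a prescribed implementation; the `detect_intents` helper and the intent keys are hypothetical names. It uses plain substring checks, so a word like "prefix" would also match the "fix" trigger.

```python
# Trigger phrases copied from the intent table.
INTENT_TRIGGERS = {
    "generate": ("create", "generate", "make", "draw", "design"),
    "edit": ("edit", "change", "modify", "update", "fix"),
    "transparency": ("transparent", "remove background", "alpha", "cutout",
                     "png with transparency"),
    "text_overlay": ("add text", "write on", "label", "caption"),
}

def detect_intents(request):
    """Return every intent whose trigger phrase appears in the request."""
    text = request.lower()
    return [intent for intent, triggers in INTENT_TRIGGERS.items()
            if any(trigger in text for trigger in triggers)]

print(detect_intents("Draw a pixel art sword with a transparent background"))
```

A request can carry several intents at once, which is why the helper returns a list rather than a single label.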

Resolution Selection

Choose resolution based on use case:

| Resolution | Best For | Pixel Output |
| --- | --- | --- |
| 1K | Quick previews, thumbnails, web icons | ~1024px |
| 2K | Social media, standard web images | ~2048px |
| 4K | Print, professional assets, sprite sheets | ~4096px |

Heuristics:

- Sprite sheets, game assets, print materials → 4K
- Social media, blog images, presentations → 2K
- Quick tests, thumbnails, prototypes → 1K

When uncertain, ask the user or default to 2K.
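
These heuristics could be encoded as a small lookup with the 2K default as the fallback. The keyword list and the `choose_size` helper below are illustrative assumptions; substring matching is deliberately naive (for example, "print" would also match "sprint").

```python
# Keywords paraphrased from the heuristics; checked in insertion order.
SIZE_BY_USE_CASE = {
    "sprite sheet": "4K", "game asset": "4K", "print": "4K",
    "social media": "2K", "blog": "2K", "presentation": "2K",
    "test": "1K", "thumbnail": "1K", "prototype": "1K",
}

def choose_size(description, default="2K"):
    """Return the size for the first matching keyword, else the default."""
    text = description.lower()
    for keyword, size in SIZE_BY_USE_CASE.items():
        if keyword in text:
            return size
    return default

print(choose_size("Sprite sheet for a platformer"))
print(choose_size("a portrait of a cat"))
```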

Aspect Ratios

Available: `1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, `21:9`

Selection guide:

- Square content (icons, avatars, social posts) → `1:1`
- Portrait (mobile, vertical video) → `9:16` or `3:4`
- Landscape (desktop, presentations) → `16:9` or `3:2`
- Cinematic/ultrawide → `21:9`
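
When the target dimensions are known but do not match a listed ratio exactly, one option is to snap to the nearest supported value. The sketch below assumes a hypothetical `closest_ratio` helper and compares ratios on a log scale so that portrait and landscape deviations are treated symmetrically.

```python
import math

SUPPORTED_RATIOS = ("1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4",
                    "9:16", "16:9", "21:9")

def closest_ratio(width, height):
    """Return the supported ratio nearest to width/height on a log scale."""
    target = math.log(width / height)

    def distance(ratio):
        w, h = ratio.split(":")
        return abs(math.log(int(w) / int(h)) - target)

    return min(SUPPORTED_RATIOS, key=distance)

print(closest_ratio(1920, 1080))
print(closest_ratio(2560, 1080))
```

The returned string can be passed straight to `--ratio`.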

Core Implementation

Basic Generation

```python
from google import genai
from google.genai import types
from PIL import Image
import io

client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents="Your descriptive prompt here",
    config=types.GenerateContentConfig(
        response_modalities=['IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio="1:1",  # or other ratio
            image_size="2K"     # 1K, 2K, or 4K
        ),
    ),
)

# Extract image from response
for part in response.parts:
    if part.inline_data is not None:
        image = Image.open(io.BytesIO(part.inline_data.data))
        image.save("output.png")
        break
```
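
A response may contain no image part at all (for instance, only text), in which case the extraction loop silently saves nothing. The sketch below makes that case explicit. `first_image_bytes` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins that only mimic the `inline_data.data` shape used in the extraction code; they are not SDK types.

```python
from types import SimpleNamespace

def first_image_bytes(parts):
    """Return the raw bytes of the first part carrying inline data, else None."""
    for part in parts or []:
        inline = getattr(part, "inline_data", None)
        if inline is not None and inline.data:
            return inline.data
    return None

# Stand-in parts: one text-only, one carrying (truncated) image bytes.
text_part = SimpleNamespace(inline_data=None, text="Here is your image")
image_part = SimpleNamespace(inline_data=SimpleNamespace(data=b"\x89PNG..."))

data = first_image_bytes([text_part, image_part])
if data is None:
    print("No image returned; consider retrying or rephrasing the prompt")
else:
    print(f"Got {len(data)} bytes of image data")
```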

Image Editing

```python
# Load existing image
input_image = Image.open("input.png")

response = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[
        input_image,
        "Edit instruction: Change the background to sunset colors"
    ],
    config=types.GenerateContentConfig(
        response_modalities=['TEXT', 'IMAGE'],
        image_config=types.ImageConfig(
            aspect_ratio="1:1",
            image_size="2K"
        ),
    ),
)
```

Multi-Turn Editing

Preserve context across edits using thought signatures:

```python
# `image` is a PIL image and `config` a GenerateContentConfig,
# like the ones used in the earlier examples.

# First edit
response1 = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[image, "Add a red hat"],
    config=config,
)

# Continue editing (include previous response)
response2 = client.models.generate_content(
    model="gemini-3-pro-image-preview",
    contents=[
        image,
        "Add a red hat",
        # Pass back the model's turn (its Content, which carries the thought
        # signatures) rather than the whole response object
        response1.candidates[0].content,
        "Now make the hat blue instead"
    ],
    config=config,
)
```

Transparency Extraction

When the user needs transparent images, use difference matting. See `scripts/transparency.py`.

When to use:

- User explicitly asks for transparency
- Game sprites, icons, logos
- Assets that will be composited
- Cutouts and stickers

Process:

1. Generate image on pure white background (#FFFFFF)
2. Edit same image to pure black background (#000000)
3. Calculate alpha from pixel differences
4. Recover original colors

Key insight: Opaque pixels appear identical on both backgrounds (distance ≈ 0), transparent pixels show background color (max distance).

```python
from scripts.transparency import extract_alpha_difference_matting

# After generating white and black background versions
final_image = extract_alpha_difference_matting(img_on_white, img_on_black)
final_image.save("output.png")  # RGBA with true transparency
```
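
To make the alpha and color recovery steps concrete, here is a per-pixel sketch of difference matting in plain Python. It assumes standard alpha compositing, where the white render equals `a*C + (1 - a)*255` and the black render equals `a*C`, so their difference is `(1 - a)*255`. The `matte_pixel` helper is hypothetical; the implementation in `scripts/transparency.py` may differ and most likely works on whole arrays.

```python
def matte_pixel(on_white, on_black):
    """Recover (r, g, b, a) for one pixel from its renders on white and black."""
    # Average the per-channel difference to dampen noise between the two renders.
    diff = sum(w - b for w, b in zip(on_white, on_black)) / 3
    alpha = min(1.0, max(0.0, 1.0 - diff / 255))
    if alpha == 0.0:
        return (0, 0, 0, 0)  # fully transparent: the color cannot be recovered
    # The black render is the premultiplied color, so divide by alpha to undo it.
    rgb = tuple(min(255, max(0, round(b / alpha))) for b in on_black)
    return rgb + (round(alpha * 255),)

print(matte_pixel((255, 0, 0), (255, 0, 0)))      # opaque red
print(matte_pixel((255, 255, 255), (0, 0, 0)))    # fully transparent
print(matte_pixel((255, 102, 102), (153, 0, 0)))  # red at 60% opacity
```

In practice the two generated images are never pixel-identical in opaque regions, so some thresholding or smoothing of the alpha channel is usually needed on top of this.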

Prompt Engineering

Fundamental Principle

"Describe the scene, don't just list keywords."

Narrative paragraphs outperform disconnected word lists.

Effective Prompt Structure

```
[Style/Medium] of [Subject] in [Context/Setting], [Lighting], [Additional details]
```

Examples:

```
# Photorealistic
A professional studio photograph of a brass steampunk pocket watch, shot with a 50mm lens, soft diffused lighting from the left, shallow depth of field with bokeh background, 4K HDR quality.

# Illustration
A detailed digital illustration of a medieval blacksmith's forge, isometric perspective, warm orange glow from the furnace, dieselpunk aesthetic with exposed pipes and riveted metal plates.

# Product mockup
A product photography shot of a ceramic coffee mug on a marble surface, natural window lighting, minimalist Scandinavian style, clean white background.
```
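
The template can also be filled programmatically. The sketch below is a minimal illustration; `build_prompt` and its parameter names are hypothetical, and the example values are invented for demonstration.

```python
def build_prompt(style, subject, setting, lighting, details=""):
    """Fill the template: [Style] of [Subject] in [Setting], [Lighting], [Details]."""
    prompt = f"{style} of {subject} in {setting}, {lighting}"
    if details:
        prompt += f", {details}"
    return prompt + "."

print(build_prompt(
    "A watercolor illustration",
    "a lighthouse",
    "a stormy coastal bay",
    "cold moonlight breaking through clouds",
    "loose brush strokes and muted blues",
))
```

Keeping each slot a descriptive phrase rather than a single keyword stays in line with the "describe the scene" principle.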

Text in Images

For images containing text, use Gemini 3 Pro (not Imagen):

- Keep text to 25 characters or fewer per element
- Use 2-3 distinct text phrases maximum
- Specify font style generally (bold, elegant, handwritten)
- Indicate size (small, medium, large)
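
The two numeric limits are easy to check before a prompt is sent. The sketch below assumes a hypothetical `check_text_elements` helper and treats 3 as the hard upper bound of the "2-3 phrases" guidance.

```python
MAX_CHARS = 25    # per text element
MAX_PHRASES = 3   # upper end of the "2-3 phrases" guidance

def check_text_elements(phrases):
    """Return a list of human-readable problems; an empty list means the text fits."""
    problems = []
    if len(phrases) > MAX_PHRASES:
        problems.append(f"{len(phrases)} phrases exceeds the limit of {MAX_PHRASES}")
    for phrase in phrases:
        if len(phrase) > MAX_CHARS:
            problems.append(f"{phrase!r} is {len(phrase)} characters (max {MAX_CHARS})")
    return problems

print(check_text_elements(["SUMMER SALE", "50% OFF"]))
print(check_text_elements(["This headline is definitely far too long"]))
```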

Quality Modifiers

Add these for enhanced output:

- Photography: 4K, HDR, studio photo, professional lighting
- Art: detailed, by a professional, high-quality illustration
- General: high-fidelity, crisp details, polished finish

Error Handling

```python
import time

from google.genai import errors

def generate_with_retry(client, *, model, contents, config, max_attempts=5):
    """Call generate_content, retrying transient errors with capped exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return client.models.generate_content(
                model=model, contents=contents, config=config
            )
        except errors.APIError as e:
            code = getattr(e, "code", None) or getattr(e, "status", None)
            # Re-raise non-retryable errors, or give up after the final attempt
            if code not in (429, 500, 502, 503, 504) or attempt >= max_attempts:
                raise
            delay = min(30, 2 ** (attempt - 1))  # 1s, 2s, 4s, ... capped at 30s
            time.sleep(delay)
```
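
The backoff schedule produced by `min(30, 2 ** (attempt - 1))` can be inspected without calling the API. The `backoff_delays` helper below is a hypothetical stand-alone reproduction of that formula; it lists the delays slept after each failed attempt except the last, which re-raises instead of sleeping.

```python
def backoff_delays(max_attempts=5, cap=30):
    """Delays in seconds slept after each failed attempt except the final one."""
    return [min(cap, 2 ** (attempt - 1)) for attempt in range(1, max_attempts)]

print(backoff_delays())    # [1, 2, 4, 8]
print(backoff_delays(8))   # [1, 2, 4, 8, 16, 30, 30]
```

With the default of five attempts, the worst case adds 15 seconds of waiting on top of the request time.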

Model Selection

| Model | Use Case |
| --- | --- |
| `gemini-3-pro-image-preview` | Complex edits, text rendering, multi-turn, transparency workflows |
| `gemini-2.5-flash-image` | Quick generation, high volume, simple tasks |
| `imagen-4.0-generate-001` | Photorealistic images, no editing needed |

Default to `gemini-3-pro-image-preview` for most tasks.
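
One way to read the model table is as a priority order: anything that needs editing, text, multi-turn context, or transparency goes to the default model, and the other two are opt-in. The `choose_model` helper and its flag names below are hypothetical and only sketch that reading.

```python
def choose_model(*, needs_editing=False, needs_text=False,
                 needs_transparency=False, multi_turn=False,
                 high_volume=False, photorealistic=False):
    """Map task traits to a model ID, falling back to the documented default."""
    if needs_editing or needs_text or needs_transparency or multi_turn:
        return "gemini-3-pro-image-preview"
    if high_volume:
        return "gemini-2.5-flash-image"
    if photorealistic:
        return "imagen-4.0-generate-001"
    return "gemini-3-pro-image-preview"

print(choose_model(high_volume=True))
print(choose_model(photorealistic=True, needs_text=True))
```

The result can be passed to the CLI through `--model`.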

File References

- `scripts/generate.py` - CLI for image generation (use this instead of writing code)
- `scripts/transparency.py` - Difference matting implementation
- `references/prompts.md` - Extended prompt examples by category
