image-generation


Image Generation

Generate images using `generate_media` with `mode="image"`. The system auto-selects a backend based on which API keys are available, following the priority order listed under Backend Comparison.

Quick Start

```python
# Simple text-to-image (auto-selects backend)
generate_media(prompt="A cat in space", mode="image")

# Specify backend and quality
generate_media(prompt="A logo for a coffee shop", mode="image", backend_type="openai", quality="high")

# Batch generation (parallel)
generate_media(prompts=["sunset over ocean", "mountain landscape", "city at night"], mode="image", max_concurrent=3)
```
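One way to picture `max_concurrent` is as a cap on the number of requests in flight at once. The sketch below assumes that reading and uses a stand-in `fake_generate` function instead of a real backend call; `run_batch` is a hypothetical name, and how `generate_media` actually schedules a batch is not specified in this document.

```python
from concurrent.futures import ThreadPoolExecutor


def fake_generate(prompt):
    """Stand-in for a single image request (hypothetical, produces no image)."""
    return {"prompt": prompt, "status": "ok"}


def run_batch(prompts, max_concurrent=3):
    """Run one request per prompt with at most max_concurrent in flight."""
    with ThreadPoolExecutor(max_workers=max_concurrent) as pool:
        # pool.map returns results in the same order as the input prompts.
        return list(pool.map(fake_generate, prompts))


results = run_batch(["sunset over ocean", "mountain landscape", "city at night"])
```

With three prompts and `max_concurrent=3`, all requests can run at the same time; a lower cap would queue the remainder.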

Backend Comparison

| Backend | Default Model | Strengths | API Key |
|---|---|---|---|
| Google (priority 1) | `gemini-3.1-flash-image-preview` (Nano Banana 2) | Fast, flexible sizes, image editing, multi-turn | `GOOGLE_API_KEY` or `GEMINI_API_KEY` |
| OpenAI (priority 2) | `gpt-5.4` | High quality, transparent backgrounds, continuation via response ID | `OPENAI_API_KEY` |
| Grok (priority 3) | `grok-imagine-image` | 1k resolution, continuation via stored data URI | `XAI_API_KEY` |
| OpenRouter (priority 4) | `google/gemini-3.1-flash-image-preview` | Access to multiple models via single API | `OPENROUTER_API_KEY` |
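The priority column can be read as a first-match lookup over environment variables. The following is a minimal sketch of that idea, not the tool's actual selection logic: `BACKEND_KEYS` and `pick_backend` are hypothetical names, and only the priorities and key names are taken from the table.

```python
import os

# Priority-ordered backends and the API key variables that enable them,
# copied from the comparison table. BACKEND_KEYS and pick_backend are
# illustrative assumptions, not part of generate_media.
BACKEND_KEYS = [
    ("google", ("GOOGLE_API_KEY", "GEMINI_API_KEY")),
    ("openai", ("OPENAI_API_KEY",)),
    ("grok", ("XAI_API_KEY",)),
    ("openrouter", ("OPENROUTER_API_KEY",)),
]


def pick_backend(env=None):
    """Return the first backend whose API key is set, or None if none is."""
    env = os.environ if env is None else env
    for backend, key_names in BACKEND_KEYS:
        if any(env.get(name) for name in key_names):
            return backend
    return None
```

Passing a plain dict instead of `os.environ` makes the lookup easy to try out, for example `pick_backend({"XAI_API_KEY": "..."})`.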

Key Parameters

| Parameter | Description | Example |
|---|---|---|
| `prompt` | Text description of the image | `"A watercolor painting of mountains"` |
| `backend_type` | Force a specific backend | `"google"`, `"openai"`, `"grok"`, `"openrouter"` |
| `model` | Override default model | `"gemini-3-pro-image-preview"` for studio quality |
| `quality` | Image quality (OpenAI) | `"low"`, `"medium"`, `"high"`, `"auto"` |
| `size` | Image dimensions | See backends reference |
| `aspect_ratio` | Aspect ratio | `"16:9"`, `"1:1"`, `"4:5"` |
| `input_images` | Source images for image-to-image editing | `["photo.jpg"]` |
| `continue_from` | Continuation ID for multi-turn editing | `result["continuation_id"]` |
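Because `backend_type` and `quality` take a small set of string values, it can help to check them before making a call. The sketch below is a hypothetical pre-flight helper, not something `generate_media` is known to provide; it assumes the example values in the table are the complete set, which this document does not confirm.

```python
# Values copied from the parameter table. Treating them as exhaustive is an
# assumption; check_params is a hypothetical helper for illustration.
ALLOWED = {
    "backend_type": {"google", "openai", "grok", "openrouter"},
    "quality": {"low", "medium", "high", "auto"},
}


def check_params(**params):
    """Return a list of human-readable problems found in params."""
    problems = []
    for name, allowed in ALLOWED.items():
        value = params.get(name)
        if value is not None and value not in allowed:
            problems.append(f"{name}={value!r} is not one of {sorted(allowed)}")
    return problems
```

An empty list means nothing in the call contradicts the table; parameters the helper does not know about are ignored.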

Image-to-Image Editing

Transform existing images by providing `input_images`:

```python
generate_media(
    prompt="Make it look like a watercolor painting",
    mode="image",
    input_images=["photo.jpg"]
)
```

Supported backends for image-to-image: Google (Gemini), OpenAI, Grok. If your current backend doesn't support it, the system auto-selects one that does.
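The fallback described above can be sketched as a small lookup. This is an illustration under assumptions: `I2I_BACKENDS` and `backend_for_edit` are hypothetical names, and the fallback order (Google, then OpenAI, then Grok) is borrowed from the backend priorities rather than confirmed for image-to-image.

```python
# Backends listed as supporting image-to-image. The ordering is an assumed
# fallback priority, not documented behaviour.
I2I_BACKENDS = ["google", "openai", "grok"]


def backend_for_edit(current, available):
    """Keep the current backend if it can edit images, else fall back."""
    if current in I2I_BACKENDS:
        return current
    for backend in I2I_BACKENDS:
        if backend in available:
            return backend
    raise ValueError("no available backend supports image-to-image")
```

For example, a session pinned to `"openrouter"` with a Grok key also configured would be routed to `"grok"` for the edit.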

Multi-Turn Editing (Continuation)

Iteratively refine images using `continue_from`:

```python
# First generation
result = generate_media(prompt="A logo for a coffee shop", mode="image")

# Refine using the continuation ID
result2 = generate_media(
    prompt="Make the text larger and add a cup icon",
    mode="image",
    continue_from=result["continuation_id"]
)
```

Each backend uses a different continuation mechanism:
- **OpenAI**: Passes `previous_response_id` (stateless)
- **Google Gemini**: In-memory chat store (LRU, 50 items)
- **Grok**: In-memory data URI store (LRU, 50 items)

Continuation only works for single image generation (not batch).
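
To make "in-memory store (LRU, 50 items)" concrete, here is a minimal sketch of such a store built on `collections.OrderedDict`. The class name `ContinuationStore` and its methods are hypothetical; the real Gemini and Grok stores may be structured differently. The practical consequence is the same either way: an in-memory entry disappears once it is evicted or the process exits.

```python
from collections import OrderedDict


class ContinuationStore:
    """Tiny LRU map from continuation ID to backend state (illustrative only)."""

    def __init__(self, max_items=50):
        self.max_items = max_items
        self._items = OrderedDict()

    def put(self, continuation_id, state):
        self._items[continuation_id] = state
        self._items.move_to_end(continuation_id)
        if len(self._items) > self.max_items:
            # Evict the least recently used entry.
            self._items.popitem(last=False)

    def get(self, continuation_id):
        state = self._items.get(continuation_id)
        if state is not None:
            # Reading an entry marks it as recently used.
            self._items.move_to_end(continuation_id)
        return state
```

A `get` that returns `None` corresponds to a continuation ID that can no longer be resumed, in which case the edit would have to start from a fresh generation.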

Google: Gemini vs Imagen

Google supports two API paths. Gemini (Nano Banana 2) is the default and recommended for most use cases. Imagen is only needed for advanced reference-image editing features.
  • Gemini models (`gemini-*`) use `generate_content()`: text-to-image, image editing via `input_images`, multi-turn continuation
  • Imagen models (`imagen-*`) use `generate_images()` / `edit_image()`: text-to-image with `negative_prompt` / `seed` / `guidance_scale`, plus style transfer, control editing, and subject consistency via reference images

For studio-quality precision and text rendering, use `model="gemini-3-pro-image-preview"` (Pro-tier).
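
The split between the two API paths amounts to routing on the model-name prefix. The sketch below illustrates that routing only; `google_api_path` is a hypothetical helper, and the rule that `edit_image()` is chosen when reference images are supplied is an assumption, since the text does not say when each Imagen call is used.

```python
def google_api_path(model, has_reference_images=False):
    """Map a Google model name to the API call described above."""
    if model.startswith("gemini-"):
        return "generate_content"
    if model.startswith("imagen-"):
        # Assumption: edit_image is used when reference images are supplied.
        return "edit_image" if has_reference_images else "generate_images"
    raise ValueError(f"unrecognized Google image model: {model}")
```

Under these assumptions, `google_api_path("gemini-3-pro-image-preview")` resolves to the `generate_content` path, the same as the default model.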

Need More Control?

  • Per-backend sizes, quality options, and quirks: See references/backends.md
  • Complete `extra_params` reference: See references/extra_params.md
  • Advanced editing (inpainting, style transfer, control, subject): See references/editing.md