image-gen

AI Image Generation Workflow

Use this skill when the user wants to create, edit, upscale, style-transfer, or create variations of images. AgentOS provides five high-level APIs that route to any configured provider with automatic fallback when multiple providers have credentials set.

The Five High-Level APIs

- `generateImage()`: Create new images from text prompts. Supports `referenceImageUrl` for character consistency.
- `editImage()`: Transform existing images via img2img, inpainting, or outpainting.
- `upscaleImage()`: Increase resolution (2x or 4x super-resolution).
- `variateImage()`: Generate visual variations of an existing image.
- `transferStyle()`: Apply the visual aesthetic of a reference image to a source image via Flux Redux.

If the `generate_image` tool is not loaded, enable it with `extensions_enable image-generation`.

Provider Selection Guide

Choose the provider based on the user's priority:

| Priority | Provider | Env Var | Best For |
| --- | --- | --- | --- |
| Quality | OpenAI (GPT-Image-1, DALL-E 3) | `OPENAI_API_KEY` | Highest fidelity, prompt adherence, text-in-image |
| Control | Stability AI (SDXL, SD3, Ultra) | `STABILITY_API_KEY` | Negative prompts, style presets, cfg/steps tuning |
| Speed | BFL / Flux (Flux Pro 1.1) | `BFL_API_KEY` | Fast generation with strong quality |
| Speed | Fal (Flux Dev) | `FAL_API_KEY` | Serverless Flux inference, low latency |
| Variety | Replicate (Flux, SDXL, community models) | `REPLICATE_API_TOKEN` | Access to thousands of community models |
| Cost | OpenRouter (routes to cheapest) | `OPENROUTER_API_KEY` | Provider-agnostic routing, best price |
| Privacy | Local SD (A1111 / ComfyUI) | `STABLE_DIFFUSION_LOCAL_BASE_URL` | Fully offline, no data leaves the machine |

When multiple providers are configured, AgentOS wraps them in a FallbackImageProxy — if the primary provider fails (rate limit, outage, etc.), the request automatically retries on the next available provider in priority order.
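
The fallback behaviour can be pictured with a short sketch. This is not the actual `FallbackImageProxy` implementation: the `ImageProvider` interface, its `generate` method, and the `withFallback` helper are hypothetical names chosen for illustration. It only shows the idea of trying providers in priority order and moving on when one throws.

```typescript
// Hypothetical shapes for illustration; not the real AgentOS types.
interface ImageProvider {
  id: string;
  generate(prompt: string): Promise<string>; // resolves to an image URL
}

// Try each provider in priority order and return the first successful result.
async function withFallback(
  providers: ImageProvider[],
  prompt: string,
): Promise<{ providerId: string; url: string }> {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      const url = await provider.generate(prompt);
      return { providerId: provider.id, url };
    } catch (err) {
      // Rate limit, outage, etc.: record the failure and try the next provider.
      errors.push(`${provider.id}: ${String(err)}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join("; ")}`);
}
```

Only one provider's result is ever returned; later providers are contacted only after an earlier one has failed.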

Operation Decision Tree

Use this to pick the right API for the user's request:

- "Generate / create / draw / imagine" -> `generateImage()`
- "Edit / change / modify / transform" -> `editImage()` with `mode: 'img2img'`
- "Remove / fill in / fix this area" -> `editImage()` with `mode: 'inpaint'` + mask
- "Extend / expand the borders" -> `editImage()` with `mode: 'outpaint'`
- "Make it higher resolution / sharper" -> `upscaleImage()` with `scale: 2` or `4`
- "Show me variations / alternatives" -> `variateImage()` with `n: 3-4`
- "Make it look like this style" -> `transferStyle()` with source image + style reference
- "Same character but different expression/pose" -> `generateImage()` with `referenceImageUrl` and `consistencyMode: 'strict'`
- "Generate a character sheet / expression sheet" -> Use the `AvatarPipeline`, which handles multi-stage consistency automatically
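
As a rough illustration, the decision tree can be written as a lookup from a request category to an API name and starting options. The `Intent` labels, the `ApiCall` shape, and `routeIntent` are hypothetical; only the API names and option values come from the tree. The character-sheet case is left out because it goes through `AvatarPipeline` rather than one of the five APIs.

```typescript
// Hypothetical intent labels; the mapping itself follows the decision tree.
type Intent =
  | "generate" | "edit" | "remove" | "extend"
  | "sharpen" | "variations" | "style" | "same-character";

interface ApiCall {
  api: "generateImage" | "editImage" | "upscaleImage" | "variateImage" | "transferStyle";
  options: Record<string, string | number>;
}

function routeIntent(intent: Intent): ApiCall {
  switch (intent) {
    case "generate":
      return { api: "generateImage", options: {} };
    case "edit":
      return { api: "editImage", options: { mode: "img2img" } };
    case "remove":
      return { api: "editImage", options: { mode: "inpaint" } }; // also needs a mask
    case "extend":
      return { api: "editImage", options: { mode: "outpaint" } };
    case "sharpen":
      return { api: "upscaleImage", options: { scale: 2 } }; // or 4
    case "variations":
      return { api: "variateImage", options: { n: 3 } }; // 3 to 4 suggested
    case "style":
      return { api: "transferStyle", options: {} }; // needs source + style reference
    case "same-character":
      // The caller must also supply referenceImageUrl.
      return { api: "generateImage", options: { consistencyMode: "strict" } };
  }
}
```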

Character Consistency

When the user wants the same character across multiple images, use `referenceImageUrl` and `consistencyMode`:

- `'strict'`: Face must match exactly. Best for expression sheets. Auto-selects Pulid on Replicate.
- `'balanced'`: Recognizable but allows natural variation. Good for full-body shots and different angles.
- `'loose'`: Light influence from the reference. Good for "inspired by" mood pieces.

Supported providers: Replicate (Pulid, IP-Adapter), Fal (IP-Adapter), SD-Local (ControlNet). OpenAI/Stability ignore the field gracefully.
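
One way to picture "ignore the field gracefully" is a helper that returns the request a given provider would effectively act on. This is a sketch under assumptions: the `GenerateRequest` shape, the lowercase provider IDs, and `effectiveRequest` are illustrative and are not taken from the AgentOS source.

```typescript
type ConsistencyMode = "strict" | "balanced" | "loose";

// Hypothetical request shape; only the two consistency fields are named in the text.
interface GenerateRequest {
  prompt: string;
  referenceImageUrl?: string;
  consistencyMode?: ConsistencyMode;
}

// Illustrative provider IDs for the providers listed as supporting references.
const CONSISTENCY_PROVIDERS = new Set(["replicate", "fal", "sd-local"]);

// Return the request as the given provider would effectively see it.
function effectiveRequest(providerId: string, req: GenerateRequest): GenerateRequest {
  if (CONSISTENCY_PROVIDERS.has(providerId)) {
    return req; // reference and mode are honoured
  }
  // Providers without reference support drop the fields instead of failing.
  return { prompt: req.prompt };
}
```

A practical consequence: if the fallback chain lands on OpenAI or Stability, the image is still generated, but the character may drift from the reference.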

Prompt Engineering Tips

A strong image prompt has five components:

Subject — What is in the image. Be specific: "a red panda sitting on a mossy branch" not "an animal."
Style — Artistic approach: photorealistic, watercolor, pixel art, oil painting, vector illustration, cinematic, anime.
Composition — Camera angle and framing: close-up portrait, wide establishing shot, overhead flat lay, isometric.
Lighting and Color — Mood through light: golden hour, dramatic side-lighting, neon glow, muted earth tones, high contrast.
Atmosphere — Emotional tone: serene, ominous, whimsical, nostalgic, futuristic.

Additional tips:

Front-load the most important elements. Models weight earlier tokens more heavily.
Use negative prompts (Stability, Local SD) to exclude unwanted elements, for example "text, watermark, blurry." List the unwanted items themselves; the negative prompt already means "avoid these."
For text-in-image, OpenAI GPT-Image-1 is the most reliable. Other models struggle with legible text.
Request `quality: 'hd'` for DALL-E 3 when detail matters (doubles cost).
For consistent characters across multiple images, describe the character in detail each time or use img2img with a reference.
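
The five components and the front-loading tip can be combined in a small prompt builder. This is a minimal sketch: `PromptParts` and `buildPrompt` are hypothetical helpers, not part of AgentOS. The subject goes first because earlier tokens tend to carry more weight.

```typescript
// Hypothetical helper type: one field per prompt component.
interface PromptParts {
  subject: string; // required: what is in the image
  style?: string;
  composition?: string;
  lighting?: string;
  atmosphere?: string;
}

// Join the non-empty parts in a fixed order, subject first.
function buildPrompt(parts: PromptParts): string {
  return [parts.subject, parts.style, parts.composition, parts.lighting, parts.atmosphere]
    .filter((p): p is string => typeof p === "string" && p.trim().length > 0)
    .map((p) => p.trim())
    .join(", ");
}

const examplePrompt = buildPrompt({
  subject: "a red panda sitting on a mossy branch",
  style: "watercolor",
  lighting: "golden hour",
});
// examplePrompt is "a red panda sitting on a mossy branch, watercolor, golden hour"
```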

Sizes and Aspect Ratios

| Provider | Supported Sizes | Aspect Ratio Support |
| --- | --- | --- |
| OpenAI | 1024x1024, 1792x1024, 1024x1792 | Via size selection |
| Stability | Flexible | `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, etc. |
| Replicate/Flux | Flexible | `aspectRatio` parameter |
| Local SD | Any (multiples of 64) | Via `width` / `height` |
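
Because OpenAI expresses aspect ratio through a fixed size list and Local SD expects multiples of 64, a caller may need to translate a requested shape first. The helpers below are a sketch; `closestOpenAISize` and `snapTo64` are hypothetical names, and the size list is copied from the table.

```typescript
// Sizes copied from the OpenAI row of the table.
const OPENAI_SIZES = [
  { label: "1024x1024", width: 1024, height: 1024 },
  { label: "1792x1024", width: 1792, height: 1024 },
  { label: "1024x1792", width: 1024, height: 1792 },
];

// Pick the listed size whose aspect ratio is closest to the requested one.
// Ratios are compared on a log scale so 2:1 and 1:2 are treated symmetrically.
function closestOpenAISize(aspectW: number, aspectH: number): string {
  const target = Math.log(aspectW / aspectH);
  let best = "";
  let bestDiff = Infinity;
  for (const size of OPENAI_SIZES) {
    const diff = Math.abs(Math.log(size.width / size.height) - target);
    if (diff < bestDiff) {
      best = size.label;
      bestDiff = diff;
    }
  }
  return best;
}

// Round a requested dimension to the nearest multiple of 64 (minimum 64) for Local SD.
function snapTo64(pixels: number): number {
  return Math.max(64, Math.round(pixels / 64) * 64);
}
```

For example, a 16:9 request maps to `1792x1024` on OpenAI, and a requested width of 1000 becomes 1024 for Local SD.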

Examples

"Generate a photorealistic image of a cozy cabin in the mountains at sunset."
"Create a professional logo for a coffee shop called 'Bean There': vector illustration style, clean lines."
"Edit this photo: make the sky more dramatic with storm clouds." (img2img)
"Remove the person from the background of this product photo." (inpaint + mask)
"Upscale this thumbnail to 4x resolution for print."
"Show me 3 variations of this hero image with different color palettes."
"Generate a 16:9 cinematic landscape of a neon-lit Tokyo street at night in the rain."

Provider Preferences

You can override the default fallback chain on a per-request basis using the `providerPreferences` field from the agent config (see `providerPreferences.image` in `agent.config.json`). This lets users pin preferred providers, weight them for probabilistic routing, or block specific providers entirely.

| Key | Type | Purpose |
| --- | --- | --- |
| `preferred` | `string[]` | Ordered list of provider IDs to try first (e.g., `['stability', 'openai']`). |
| `weights` | `Record<string, number>` | Relative selection weights for probabilistic routing (e.g., `{ stability: 0.7, openai: 0.3 }`). |
| `blocked` | `string[]` | Provider IDs that must never be used (e.g., `['replicate']`). |
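
To make "relative selection weights" concrete, here is one common way weighted random selection is implemented. How AgentOS actually applies `weights` is not described here, so treat `pickWeighted` as a hypothetical sketch of the general technique.

```typescript
// Pick a provider ID with probability proportional to its weight.
// `random` is injectable so the choice can be made deterministic in tests.
function pickWeighted(
  weights: Record<string, number>,
  random: () => number = Math.random,
): string {
  const entries = Object.entries(weights).filter(([, w]) => w > 0);
  if (entries.length === 0) {
    throw new Error("No provider has a positive weight");
  }
  const total = entries.reduce((sum, [, w]) => sum + w, 0);
  let threshold = random() * total;
  let chosen = "";
  for (const [id, w] of entries) {
    chosen = id; // the last entry wins if rounding leaves threshold at 0 or above
    threshold -= w;
    if (threshold < 0) break;
  }
  return chosen;
}
```

With `{ stability: 0.7, openai: 0.3 }`, roughly 70% of calls would pick `stability`. The weights are relative, so `{ stability: 7, openai: 3 }` behaves the same way.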

Example — passing preferences inline:

```typescript
generateImage({
  prompt: 'A neon-lit Tokyo alley in the rain',
  providerPreferences: {
    preferred: ['stability', 'openai'],
    blocked: ['replicate'],
  },
});
```

Example (set in `agent.config.json` so all image calls inherit the preference):

```jsonc
{
  "providerPreferences": {
    "image": {
      "preferred": ["stability", "bfl"],
      "weights": { "stability": 0.6, "bfl": 0.4 },
      "blocked": ["replicate"]
    }
  }
}
```

When `providerPreferences.image` is set in the agent config, the runtime merges it with any per-request overrides (per-request wins). Blocked providers are removed from the fallback chain before any attempt is made.
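
The merge and filtering rules can be sketched as two small functions. This is an assumption-laden illustration: `mergePreferences` and `resolveChain` are hypothetical names, and the shallow key-by-key merge is a guess, since the text does not say whether the runtime merges shallowly or deeply.

```typescript
interface ProviderPreferences {
  preferred?: string[];
  weights?: Record<string, number>;
  blocked?: string[];
}

// Per-request keys replace config keys (shallow merge, assumed).
function mergePreferences(
  config: ProviderPreferences,
  request: ProviderPreferences,
): ProviderPreferences {
  return { ...config, ...request };
}

// Build the fallback chain: drop blocked providers first, then put
// preferred providers ahead of the remaining ones in their default order.
function resolveChain(available: string[], prefs: ProviderPreferences): string[] {
  const blocked = new Set(prefs.blocked ?? []);
  const allowed = available.filter((id) => !blocked.has(id));
  const preferred = (prefs.preferred ?? []).filter((id) => allowed.includes(id));
  const rest = allowed.filter((id) => !preferred.includes(id));
  return [...preferred, ...rest];
}
```

Under these assumptions, a config that prefers `stability` and `bfl` and blocks `replicate`, combined with a request that prefers `openai`, yields a chain that starts with `openai` and never contains `replicate`.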

Constraints

Image generation costs API credits per request; inform the user of approximate costs when possible.
Content policy restrictions apply per provider: no realistic faces of real people, no violent/explicit content.
DALL-E 3 does not support native inpainting — use GPT-Image-1 or Stability for mask-based editing.
Upscaling is not supported by OpenAI or OpenRouter — use Stability, Replicate, or Local SD.
Generated images may not perfectly match the prompt; iterative refinement is expected.
Maximum prompt length varies by model (DALL-E 3: 4,000 chars; Stability: 2,000 chars).
Local SD requires a running A1111 or ComfyUI instance with the API enabled.
The fallback chain only activates when the primary provider fails; it does not merge results from multiple providers.
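
Several of these constraints can be checked before a request is sent. The sketch below is hypothetical: `checkRequest`, the `Operation` labels, and the lowercase provider and model IDs are illustrative, while the limits and capability gaps are the ones listed above.

```typescript
type Operation = "generate" | "inpaint" | "upscale";

// Prompt limits from the list above, keyed by illustrative model/provider IDs.
const MAX_PROMPT_CHARS = new Map<string, number>([
  ["dall-e-3", 4000],
  ["stability", 2000],
]);
const NO_UPSCALE = new Set(["openai", "openrouter"]);

// Return a list of human-readable problems; an empty list means no known conflict.
function checkRequest(provider: string, model: string, op: Operation, prompt: string): string[] {
  const problems: string[] = [];
  if (op === "upscale" && NO_UPSCALE.has(provider)) {
    problems.push(`${provider} does not support upscaling; use Stability, Replicate, or Local SD`);
  }
  if (op === "inpaint" && model === "dall-e-3") {
    problems.push("DALL-E 3 has no native inpainting; use GPT-Image-1 or Stability");
  }
  const limit = MAX_PROMPT_CHARS.get(model) ?? MAX_PROMPT_CHARS.get(provider);
  // String length counts UTF-16 code units, which only approximates "characters".
  if (limit !== undefined && prompt.length > limit) {
    problems.push(`prompt is ${prompt.length} chars; limit is ${limit}`);
  }
  return problems;
}
```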
