image-gen

AI Image Generation Workflow

Use this skill when the user wants to create, edit, upscale, or restyle images, or generate variations of them. AgentOS provides five high-level APIs that route to any configured provider, with automatic fallback when multiple providers have credentials set.

The Five High-Level APIs

  1. `generateImage()`: Create new images from text prompts. Supports `referenceImageUrl` for character consistency.
  2. `editImage()`: Transform existing images via img2img, inpainting, or outpainting.
  3. `upscaleImage()`: Increase resolution (2x or 4x super-resolution).
  4. `variateImage()`: Generate visual variations of an existing image.
  5. `transferStyle()`: Apply the visual aesthetic of a reference image to a source image via Flux Redux.

If the `generate_image` tool is not loaded, enable it with `extensions_enable image-generation`.

Provider Selection Guide

Choose the provider based on the user's priority:

| Priority | Provider | Env Var | Best For |
| --- | --- | --- | --- |
| Quality | OpenAI (GPT-Image-1, DALL-E 3) | `OPENAI_API_KEY` | Highest fidelity, prompt adherence, text-in-image |
| Control | Stability AI (SDXL, SD3, Ultra) | `STABILITY_API_KEY` | Negative prompts, style presets, cfg/steps tuning |
| Speed | BFL / Flux (Flux Pro 1.1) | `BFL_API_KEY` | Fast generation with strong quality |
| Speed | Fal (Flux Dev) | `FAL_API_KEY` | Serverless Flux inference, low latency |
| Variety | Replicate (Flux, SDXL, community models) | `REPLICATE_API_TOKEN` | Access to thousands of community models |
| Cost | OpenRouter (routes to cheapest) | `OPENROUTER_API_KEY` | Provider-agnostic routing, best price |
| Privacy | Local SD (A1111 / ComfyUI) | `STABLE_DIFFUSION_LOCAL_BASE_URL` | Fully offline, no data leaves the machine |

When multiple providers are configured, AgentOS wraps them in a FallbackImageProxy — if the primary provider fails (rate limit, outage, etc.), the request automatically retries on the next available provider in priority order.
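
The retry-on-failure behavior described above can be pictured with a minimal sketch. This is not the actual FallbackImageProxy implementation: the `ImageProvider` interface, the `generateWithFallback` function, and the two stub providers are hypothetical names used only for illustration.

```typescript
// Hypothetical provider shape; the real AgentOS interface may differ.
interface ImageProvider {
  id: string;
  generate(prompt: string): Promise<string>; // resolves to an image URL
}

// Try each provider in priority order and return the first success.
async function generateWithFallback(
  providers: ImageProvider[],
  prompt: string,
): Promise<{ providerId: string; url: string }> {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      const url = await provider.generate(prompt);
      return { providerId: provider.id, url };
    } catch (err) {
      // Record the failure (rate limit, outage, ...) and move to the next provider.
      const message = err instanceof Error ? err.message : String(err);
      errors.push(`${provider.id}: ${message}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join('; ')}`);
}

// Stub providers for demonstration only.
const rateLimited: ImageProvider = {
  id: 'openai',
  generate: async () => {
    throw new Error('429 rate limit');
  },
};
const healthy: ImageProvider = {
  id: 'stability',
  generate: async (prompt) => `https://example.invalid/${encodeURIComponent(prompt)}.png`,
};
```

Calling `generateWithFallback([rateLimited, healthy], 'red panda')` resolves with the result from the second provider, because the first one throws. Results are never merged: only one provider's output is returned.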

Operation Decision Tree

Use this to pick the right API for the user's request:
  • "Generate / create / draw / imagine" -> `generateImage()`
  • "Edit / change / modify / transform" -> `editImage()` with `mode: 'img2img'`
  • "Remove / fill in / fix this area" -> `editImage()` with `mode: 'inpaint'` + mask
  • "Extend / expand the borders" -> `editImage()` with `mode: 'outpaint'`
  • "Make it higher resolution / sharper" -> `upscaleImage()` with `scale: 2` or `4`
  • "Show me variations / alternatives" -> `variateImage()` with `n` set to 3 or 4
  • "Make it look like this style" -> `transferStyle()` with source image + style reference
  • "Same character but different expression/pose" -> `generateImage()` with `referenceImageUrl` + `consistencyMode: 'strict'`
  • "Generate a character sheet / expression sheet" -> Use the `AvatarPipeline`, which handles multi-stage consistency automatically
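
The decision tree above can be read as an ordered list of rules. The sketch below is one hypothetical way to encode it: the `Route` type, the `RULES` table, and `routeRequest` are illustrative names, and the keyword patterns are a rough approximation of intent detection, not how AgentOS actually routes requests.

```typescript
// Hypothetical result shape: which API to call and with which options.
interface Route {
  api:
    | 'generateImage'
    | 'editImage'
    | 'upscaleImage'
    | 'variateImage'
    | 'transferStyle'
    | 'AvatarPipeline';
  options: Record<string, string | number>;
}

// Ordered rules: specific intents come before generic ones, so
// "character sheet" is not swallowed by the generic "generate" rule.
const RULES: Array<{ pattern: RegExp; route: Route }> = [
  { pattern: /character sheet|expression sheet/i, route: { api: 'AvatarPipeline', options: {} } },
  { pattern: /same character/i, route: { api: 'generateImage', options: { consistencyMode: 'strict' } } },
  { pattern: /\b(remove|fill in|fix this area)\b/i, route: { api: 'editImage', options: { mode: 'inpaint' } } },
  { pattern: /\b(extend|expand)\b/i, route: { api: 'editImage', options: { mode: 'outpaint' } } },
  // The scale defaults to 2 here; parsing "4x" from the request is left out.
  { pattern: /\b(upscale|higher resolution|sharper)\b/i, route: { api: 'upscaleImage', options: { scale: 2 } } },
  { pattern: /\b(variations?|alternatives?)\b/i, route: { api: 'variateImage', options: { n: 3 } } },
  { pattern: /look like this|style of this/i, route: { api: 'transferStyle', options: {} } },
  { pattern: /\b(edit|change|modify|transform)\b/i, route: { api: 'editImage', options: { mode: 'img2img' } } },
  { pattern: /\b(generate|create|draw|imagine)\b/i, route: { api: 'generateImage', options: {} } },
];

// Return the first matching route, or undefined when nothing matches.
function routeRequest(request: string): Route | undefined {
  return RULES.find((rule) => rule.pattern.test(request))?.route;
}
```

Rule order matters more than the patterns themselves: the first match wins, so the most specific phrases sit at the top.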

Character Consistency

When the user wants the same character across multiple images, use `referenceImageUrl` and `consistencyMode`:
  • `'strict'`: Face must match exactly. Best for expression sheets. Auto-selects Pulid on Replicate.
  • `'balanced'`: Recognizable but allows natural variation. Good for full-body shots and different angles.
  • `'loose'`: Light influence from the reference. Good for "inspired by" mood pieces.

Supported providers: Replicate (Pulid, IP-Adapter), Fal (IP-Adapter), SD-Local (ControlNet). OpenAI and Stability ignore the field gracefully.
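
As a rough illustration of the mode choices above, the sketch below maps a use case to a `consistencyMode` and builds the request fields. Only `referenceImageUrl` and `consistencyMode` come from this document. The `UseCase` labels, the helper names, and the provider IDs `'fal'` and `'sd-local'` are assumptions.

```typescript
type ConsistencyMode = 'strict' | 'balanced' | 'loose';
type UseCase = 'expression-sheet' | 'full-body' | 'mood-piece'; // hypothetical labels

interface CharacterRequest {
  prompt: string;
  referenceImageUrl: string;
  consistencyMode: ConsistencyMode;
}

// Map the use cases named in the list above to a mode.
function pickConsistencyMode(useCase: UseCase): ConsistencyMode {
  switch (useCase) {
    case 'expression-sheet':
      return 'strict';
    case 'full-body':
      return 'balanced';
    case 'mood-piece':
      return 'loose';
  }
}

function buildCharacterRequest(
  prompt: string,
  referenceImageUrl: string,
  useCase: UseCase,
): CharacterRequest {
  return { prompt, referenceImageUrl, consistencyMode: pickConsistencyMode(useCase) };
}

// Providers listed above as honoring the reference image (IDs are assumed).
const CONSISTENCY_PROVIDERS = new Set(['replicate', 'fal', 'sd-local']);

// True when the provider is expected to use the reference rather than ignore it.
function honorsReference(providerId: string): boolean {
  return CONSISTENCY_PROVIDERS.has(providerId);
}
```

A check like `honorsReference` is useful before promising the user a matching face: if the request may fall back to OpenAI or Stability, the reference is silently ignored.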

Prompt Engineering Tips

A strong image prompt has five components:
  1. Subject — What is in the image. Be specific: "a red panda sitting on a mossy branch" not "an animal."
  2. Style — Artistic approach: photorealistic, watercolor, pixel art, oil painting, vector illustration, cinematic, anime.
  3. Composition — Camera angle and framing: close-up portrait, wide establishing shot, overhead flat lay, isometric.
  4. Lighting and Color — Mood through light: golden hour, dramatic side-lighting, neon glow, muted earth tones, high contrast.
  5. Atmosphere — Emotional tone: serene, ominous, whimsical, nostalgic, futuristic.
Additional tips:
  • Front-load the most important elements. Many models tend to weight earlier tokens more heavily.
  • Use negative prompts (Stability, Local SD) to exclude unwanted elements. List the unwanted terms themselves, e.g. "text, watermark, blurry".
  • For text-in-image, OpenAI GPT-Image-1 is the most reliable. Other models struggle with legible text.
  • Request `quality: 'hd'` for DALL-E 3 when detail matters (doubles cost).
  • For consistent characters across multiple images, describe the character in detail each time, use img2img with a reference, or pass `referenceImageUrl` with a `consistencyMode` (see Character Consistency).
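
The five components can be assembled mechanically, most important first. The helper below is a small sketch of that idea. `PromptParts` and `buildPrompt` are hypothetical names, and joining with commas is just one common convention.

```typescript
interface PromptParts {
  subject: string; // required: what is in the image
  style?: string;
  composition?: string;
  lighting?: string;
  atmosphere?: string;
}

// Join the parts in priority order, skipping any that are missing or blank.
function buildPrompt(parts: PromptParts): string {
  return [parts.subject, parts.style, parts.composition, parts.lighting, parts.atmosphere]
    .filter((part): part is string => typeof part === 'string' && part.trim().length > 0)
    .map((part) => part.trim())
    .join(', ');
}

const examplePrompt = buildPrompt({
  subject: 'a red panda sitting on a mossy branch',
  style: 'watercolor',
  composition: 'close-up portrait',
  lighting: 'golden hour',
  atmosphere: 'serene',
});
```

Keeping the subject first follows the front-loading tip. The remaining fields are optional, so a sparse request still yields a valid prompt.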

Sizes and Aspect Ratios

| Provider | Supported Sizes | Aspect Ratio Support |
| --- | --- | --- |
| OpenAI | 1024x1024, 1792x1024, 1024x1792 | Via size selection |
| Stability | Flexible | `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, etc. |
| Replicate/Flux | Flexible | `aspectRatio` parameter |
| Local SD | Any (multiples of 64) | Via `width` / `height` |
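
For Local SD, an aspect ratio has to be turned into explicit `width` and `height` values that are multiples of 64. The sketch below shows one way to do that, plus a lookup for the three OpenAI sizes listed above. The function names and the default long edge of 1024 are assumptions, not AgentOS behavior.

```typescript
// Round a dimension to the nearest multiple of 64, never below 64.
function snapTo64(value: number): number {
  return Math.max(64, Math.round(value / 64) * 64);
}

// Derive Local SD width/height from a ratio string such as "16:9".
function localSdSize(aspectRatio: string, longEdge = 1024): { width: number; height: number } {
  const parts = aspectRatio.split(':').map(Number);
  const w = parts[0] ?? NaN;
  const h = parts[1] ?? NaN;
  if (parts.length !== 2 || !Number.isFinite(w) || !Number.isFinite(h) || w <= 0 || h <= 0) {
    throw new Error(`Invalid aspect ratio: ${aspectRatio}`);
  }
  const scale = longEdge / Math.max(w, h);
  return { width: snapTo64(w * scale), height: snapTo64(h * scale) };
}

// Pick one of the three OpenAI sizes from the table by orientation.
function openAiSize(orientation: 'square' | 'landscape' | 'portrait'): string {
  switch (orientation) {
    case 'square':
      return '1024x1024';
    case 'landscape':
      return '1792x1024';
    case 'portrait':
      return '1024x1792';
  }
}
```

Ratios that do not divide evenly are rounded to the nearest multiple of 64, so the resulting image may deviate slightly from the requested ratio.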

Examples

  • "Generate a photorealistic image of a cozy cabin in the mountains at sunset."
  • "Create a professional logo for a coffee shop called 'Bean There' — vector illustration style, clean lines."
  • "Edit this photo: make the sky more dramatic with storm clouds." (img2img)
  • "Remove the person from the background of this product photo." (inpaint + mask)
  • "Upscale this thumbnail to 4x resolution for print."
  • "Show me 3 variations of this hero image with different color palettes."
  • "Generate a 16:9 cinematic landscape of a neon-lit Tokyo street at night in the rain."

Provider Preferences

You can override the default fallback chain per request with the `providerPreferences` field, or set defaults for all image calls in the agent config (see `providerPreferences.image` in `agent.config.json`). This lets users pin preferred providers, weight them for probabilistic routing, or block specific providers entirely.

| Key | Type | Purpose |
| --- | --- | --- |
| `preferred` | `string[]` | Ordered list of provider IDs to try first (e.g., `['stability', 'openai']`). |
| `weights` | `Record<string, number>` | Relative selection weights for probabilistic routing (e.g., `{ stability: 0.7, openai: 0.3 }`). |
| `blocked` | `string[]` | Provider IDs that must never be used (e.g., `['replicate']`). |

Example of passing preferences inline:

```ts
// Try Stability first, then OpenAI; never route this request to Replicate.
generateImage({
  prompt: 'A neon-lit Tokyo alley in the rain',
  providerPreferences: {
    preferred: ['stability', 'openai'],
    blocked: ['replicate'],
  },
});
```

Example of setting it in `agent.config.json` so all image calls inherit the preference:

```jsonc
{
  "providerPreferences": {
    "image": {
      "preferred": ["stability", "bfl"],
      "weights": { "stability": 0.6, "bfl": 0.4 },
      "blocked": ["replicate"]
    }
  }
}
```

When `providerPreferences.image` is set in the agent config, the runtime merges it with any per-request overrides (per-request wins). Blocked providers are removed from the fallback chain before any attempt is made.
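
The merge and blocking rules can be sketched as two small functions. This is an illustration under assumptions, not the runtime's actual code: `mergePreferences` and `resolveChain` are hypothetical names, the merge is assumed to be field-by-field, and the probabilistic `weights` routing is deliberately left out.

```typescript
interface ProviderPreferences {
  preferred?: string[];
  weights?: Record<string, number>;
  blocked?: string[];
}

// Per-request fields replace config fields of the same name (assumed shallow merge).
function mergePreferences(
  config: ProviderPreferences,
  request: ProviderPreferences,
): ProviderPreferences {
  return { ...config, ...request };
}

// Build the fallback order: drop blocked providers first, then put
// preferred providers at the front and keep the rest in their default order.
function resolveChain(available: string[], prefs: ProviderPreferences): string[] {
  const blocked = new Set(prefs.blocked ?? []);
  const allowed = available.filter((id) => !blocked.has(id));
  const allowedSet = new Set(allowed);
  const preferred = (prefs.preferred ?? []).filter((id) => allowedSet.has(id));
  const preferredSet = new Set(preferred);
  const rest = allowed.filter((id) => !preferredSet.has(id));
  return [...preferred, ...rest];
}

const configPrefs: ProviderPreferences = { preferred: ['stability', 'bfl'], blocked: ['replicate'] };
const requestPrefs: ProviderPreferences = { preferred: ['openai'] };
const chain = resolveChain(
  ['openai', 'stability', 'bfl', 'replicate'],
  mergePreferences(configPrefs, requestPrefs),
);
```

In this example the request's `preferred` list replaces the config's, while the config's `blocked` list still applies because the request does not override it.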

Constraints

  • Image generation costs API credits per request; inform the user of approximate costs when possible.
  • Content policy restrictions vary by provider; common ones prohibit realistic faces of real people and violent or explicit content.
  • DALL-E 3 does not support native inpainting — use GPT-Image-1 or Stability for mask-based editing.
  • Upscaling is not supported by OpenAI or OpenRouter — use Stability, Replicate, or Local SD.
  • Generated images may not perfectly match the prompt; iterative refinement is expected.
  • Maximum prompt length varies by model (DALL-E 3: 4,000 chars; Stability: 2,000 chars).
  • Local SD requires a running A1111 or ComfyUI instance with the API enabled.
  • The fallback chain only activates when the primary provider fails; it does not merge results from multiple providers.
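
Several of these constraints can be checked before a request is sent. The sketch below encodes three of them as a pre-flight check. `PlannedRequest`, `validateRequest`, and the provider and model IDs are hypothetical, and the limits are copied from the list above rather than verified against each provider.

```typescript
interface PlannedRequest {
  operation: 'generate' | 'inpaint' | 'upscale';
  provider: string;
  model?: string;
  prompt?: string;
}

// Values taken from the constraints above; IDs are illustrative.
const UPSCALE_UNSUPPORTED = new Set(['openai', 'openrouter']);
const PROMPT_LIMITS = new Map<string, number>([
  ['dall-e-3', 4000],
  ['stability', 2000],
]);

// Return a list of human-readable problems; an empty list means no known conflict.
function validateRequest(req: PlannedRequest): string[] {
  const problems: string[] = [];
  if (req.operation === 'upscale' && UPSCALE_UNSUPPORTED.has(req.provider)) {
    problems.push(`${req.provider} does not support upscaling`);
  }
  if (req.operation === 'inpaint' && req.model === 'dall-e-3') {
    problems.push('dall-e-3 does not support native inpainting');
  }
  // Look up a limit by model first, then by provider.
  const limit = PROMPT_LIMITS.get(req.model ?? '') ?? PROMPT_LIMITS.get(req.provider);
  // Note: length counts UTF-16 code units, which approximates characters.
  if (limit !== undefined && req.prompt !== undefined && req.prompt.length > limit) {
    problems.push(`prompt exceeds ${limit} characters`);
  }
  return problems;
}
```

A check like this lets the agent choose a different provider up front instead of waiting for a failed request to trigger the fallback chain.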