image-gen
AI Image Generation Workflow
Use this skill when the user wants to create, edit, upscale, style-transfer, or create variations of images. AgentOS provides five high-level APIs that route to any configured provider with automatic fallback when multiple providers have credentials set.
The Five High-Level APIs
- `generateImage()`: Create new images from text prompts. Supports `referenceImageUrl` for character consistency.
- `editImage()`: Transform existing images via img2img, inpainting, or outpainting.
- `upscaleImage()`: Increase resolution (2x or 4x super-resolution).
- `variateImage()`: Generate visual variations of an existing image.
- `transferStyle()`: Apply the visual aesthetic of a reference image to a source image via Flux Redux.

If the `generate_image` tool is not loaded, enable it with `extensions_enable image-generation`.
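Before calling one of these APIs it can help to sanity-check the request against the constraints this document states (2x or 4x upscaling only, a mask for inpainting). The sketch below assumes requests look roughly like these objects; the `op` discriminator and the `image`, `mask`, and `styleReference` field names are hypothetical, not the AgentOS signatures. Only `referenceImageUrl`, `mode`, `scale`, and `n` come from this document.

```typescript
// Hypothetical request shapes: `op`, `image`, `mask` and `styleReference` are
// illustrative names, not the AgentOS signatures. `referenceImageUrl`, `mode`,
// `scale` and `n` are the parameters this document mentions.
type EditMode = 'img2img' | 'inpaint' | 'outpaint';

type ImageRequest =
  | { op: 'generate'; prompt: string; referenceImageUrl?: string }
  | { op: 'edit'; image: string; prompt: string; mode: EditMode; mask?: string }
  | { op: 'upscale'; image: string; scale: number }
  | { op: 'variate'; image: string; n: number }
  | { op: 'transferStyle'; image: string; styleReference: string };

// Returns human-readable problems; an empty array means the request looks well-formed.
function validateRequest(req: ImageRequest): string[] {
  const problems: string[] = [];
  switch (req.op) {
    case 'generate':
      if (req.prompt.trim() === '') problems.push('prompt is empty');
      break;
    case 'edit':
      // Inpainting needs a mask marking the region to replace.
      if (req.mode === 'inpaint' && !req.mask) problems.push('inpaint requires a mask');
      break;
    case 'upscale':
      // Only 2x and 4x super-resolution are offered.
      if (req.scale !== 2 && req.scale !== 4) problems.push('scale must be 2 or 4');
      break;
    case 'variate':
      if (!Number.isInteger(req.n) || req.n < 1) problems.push('n must be a positive integer');
      break;
    case 'transferStyle':
      if (!req.styleReference) problems.push('a style reference image is required');
      break;
  }
  return problems;
}
```

Returning a list of problems rather than throwing lets the agent report every issue to the user in one turn.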
Provider Selection Guide
Choose the provider based on the user's priority:
| Priority | Provider | Env Var | Best For |
|---|---|---|---|
| Quality | OpenAI (GPT-Image-1, DALL-E 3) | | Highest fidelity, prompt adherence, text-in-image |
| Control | Stability AI (SDXL, SD3, Ultra) | | Negative prompts, style presets, cfg/steps tuning |
| Speed | BFL / Flux (Flux Pro 1.1) | | Fast generation with strong quality |
| Speed | Fal (Flux Dev) | | Serverless Flux inference, low latency |
| Variety | Replicate (Flux, SDXL, community models) | | Access to thousands of community models |
| Cost | OpenRouter (routes to cheapest) | | Provider-agnostic routing, best price |
| Privacy | Local SD (A1111 / ComfyUI) | | Fully offline, no data leaves the machine |
When multiple providers are configured, AgentOS wraps them in a `FallbackImageProxy`: if the primary provider fails (rate limit, outage, etc.), the request is automatically retried on the next available provider in priority order.
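The fallback behavior can be modeled as "try each provider in order and return the first success". This is a minimal sketch of that idea, not the `FallbackImageProxy` implementation; the `ImageProvider` interface and its `generate(prompt)` method returning an image URL are assumptions for illustration.

```typescript
// Hypothetical provider interface; real AgentOS providers expose a richer API.
interface ImageProvider {
  id: string;
  generate(prompt: string): Promise<string>; // resolves to an image URL (assumed)
}

// Tries providers in priority order and returns the first successful result.
// Results are never merged: one success ends the loop.
async function generateWithFallback(
  providers: ImageProvider[],
  prompt: string,
): Promise<{ providerId: string; url: string }> {
  const errors: string[] = [];
  for (const provider of providers) {
    try {
      const url = await provider.generate(prompt);
      return { providerId: provider.id, url };
    } catch (err) {
      // Record why this provider failed (rate limit, outage, ...) and move on.
      errors.push(`${provider.id}: ${err instanceof Error ? err.message : String(err)}`);
    }
  }
  throw new Error(`All providers failed: ${errors.join('; ')}`);
}
```

Collecting every error message means that, when the whole chain fails, the user sees why each provider was skipped instead of only the last failure.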
Operation Decision Tree
Use this to pick the right API for the user's request:
- "Generate / create / draw / imagine" -> `generateImage()`
- "Edit / change / modify / transform" -> `editImage()` with `mode: 'img2img'`
- "Remove / fill in / fix this area" -> `editImage()` + mask with `mode: 'inpaint'`
- "Extend / expand the borders" -> `editImage()` with `mode: 'outpaint'`
- "Make it higher resolution / sharper" -> `upscaleImage()` with `scale: 2` or `4`
- "Show me variations / alternatives" -> `variateImage()` with `n: 3-4`
- "Make it look like this style" -> `transferStyle()` with source image + style reference
- "Same character but different expression/pose" -> `generateImage()` + `referenceImageUrl` with `consistencyMode: 'strict'`
- "Generate a character sheet / expression sheet" -> use the `AvatarPipeline`, which handles multi-stage consistency automatically
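The decision tree above can be approximated as an ordered list of keyword rules, with the most specific rules checked first. This is a hypothetical sketch: plain substring matching is a crude stand-in for the agent actually reasoning about intent, and the `Route` shape is invented for illustration.

```typescript
// Hypothetical routing result: which API to call and with which options.
type Route = { api: string; options?: Record<string, unknown> };

// Ordered from most to least specific so that, e.g., "character sheet"
// wins over the generic "generate" rule.
const RULES: Array<{ keywords: string[]; route: Route }> = [
  { keywords: ['character sheet', 'expression sheet'], route: { api: 'AvatarPipeline' } },
  { keywords: ['same character'], route: { api: 'generateImage', options: { consistencyMode: 'strict' } } },
  { keywords: ['remove', 'fill in', 'fix this area'], route: { api: 'editImage', options: { mode: 'inpaint' } } },
  { keywords: ['extend', 'expand'], route: { api: 'editImage', options: { mode: 'outpaint' } } },
  { keywords: ['higher resolution', 'sharper', 'upscale'], route: { api: 'upscaleImage', options: { scale: 2 } } },
  { keywords: ['variation', 'alternative'], route: { api: 'variateImage', options: { n: 3 } } },
  { keywords: ['look like'], route: { api: 'transferStyle' } },
  { keywords: ['edit', 'change', 'modify', 'transform'], route: { api: 'editImage', options: { mode: 'img2img' } } },
  { keywords: ['generate', 'create', 'draw', 'imagine'], route: { api: 'generateImage' } },
];

// Returns the first matching route, or null so the agent can ask a clarifying question.
function routeRequest(text: string): Route | null {
  const lower = text.toLowerCase();
  for (const { keywords, route } of RULES) {
    if (keywords.some((k) => lower.includes(k))) return route;
  }
  return null;
}
```

Rule order carries the logic here: "Generate a character sheet" contains both "generate" and "character sheet", so the `AvatarPipeline` rule must come first. The sketch always picks `scale: 2`; a real router would also read "4x" from the request.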
Character Consistency
When the user wants the same character across multiple images, use `referenceImageUrl` and `consistencyMode`:
- `'strict'`: The face must match exactly. Best for expression sheets. Auto-selects Pulid on Replicate.
- `'balanced'`: Recognizable but allows natural variation. Good for full-body shots and different angles.
- `'loose'`: Light influence from the reference. Good for "inspired by" mood pieces.

Supported providers: Replicate (Pulid, IP-Adapter), Fal (IP-Adapter), SD-Local (ControlNet). OpenAI and Stability ignore these fields without returning an error.
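A sketch of how a router might resolve the reference-image technique per provider, assuming lowercase provider IDs such as `'replicate'` and `'sd-local'` (hypothetical names). The text only states that `'strict'` selects Pulid on Replicate; using IP-Adapter for the other Replicate modes is an assumption, as is treating a missing `consistencyMode` as `'balanced'`.

```typescript
type ConsistencyMode = 'strict' | 'balanced' | 'loose';
// Hypothetical provider IDs for illustration.
type ProviderId = 'replicate' | 'fal' | 'sd-local' | 'openai' | 'stability';

interface GenerateParams {
  prompt: string;
  referenceImageUrl?: string;
  consistencyMode?: ConsistencyMode;
}

// Which conditioning technique a provider would use for the reference image.
// Only "strict -> Pulid on Replicate" is stated in the text; IP-Adapter for the
// other Replicate modes is an assumption.
function resolveConsistency(provider: ProviderId, mode: ConsistencyMode): string | null {
  switch (provider) {
    case 'replicate':
      return mode === 'strict' ? 'Pulid' : 'IP-Adapter';
    case 'fal':
      return 'IP-Adapter';
    case 'sd-local':
      return 'ControlNet';
    default:
      return null; // OpenAI and Stability: no reference-image conditioning
  }
}

// Drops the consistency fields for providers that ignore them, so the request
// still succeeds instead of failing. A missing mode is treated as 'balanced' (assumed).
function paramsForProvider(provider: ProviderId, params: GenerateParams): GenerateParams {
  const technique = resolveConsistency(provider, params.consistencyMode ?? 'balanced');
  return technique === null ? { prompt: params.prompt } : params;
}
```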
Prompt Engineering Tips
A strong image prompt has five components:
- Subject — What is in the image. Be specific: "a red panda sitting on a mossy branch" not "an animal."
- Style — Artistic approach: photorealistic, watercolor, pixel art, oil painting, vector illustration, cinematic, anime.
- Composition — Camera angle and framing: close-up portrait, wide establishing shot, overhead flat lay, isometric.
- Lighting and Color — Mood through light: golden hour, dramatic side-lighting, neon glow, muted earth tones, high contrast.
- Atmosphere — Emotional tone: serene, ominous, whimsical, nostalgic, futuristic.
Additional tips:
- Front-load the most important elements. Many models give earlier tokens more weight.
- Use negative prompts (Stability, Local SD) to exclude unwanted elements. List the elements themselves, e.g. "text, watermark, blurry", rather than phrasing them as "no text, no watermark".
- For text-in-image, OpenAI GPT-Image-1 is the most reliable. Other models struggle with legible text.
- Request `quality: 'hd'` for DALL-E 3 when detail matters (roughly doubles the cost).
- For consistent characters across multiple images, describe the character in detail each time or use img2img with a reference.
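The five components can be assembled mechanically, subject first, so the most important element leads the prompt. This is a hypothetical helper; the `PromptParts` field names are illustrative and not part of any AgentOS API.

```typescript
// Hypothetical structure mirroring the five prompt components above.
interface PromptParts {
  subject: string;
  style?: string;
  composition?: string;
  lighting?: string;
  atmosphere?: string;
}

// Joins the components in a fixed order with the subject first;
// empty or missing parts are skipped.
function buildPrompt(parts: PromptParts): string {
  return [parts.subject, parts.style, parts.composition, parts.lighting, parts.atmosphere]
    .map((p) => p?.trim())
    .filter((p): p is string => !!p)
    .join(', ');
}

// Negative prompts list what to exclude, without the word "no".
const negativePrompt = ['text', 'watermark', 'blurry'].join(', ');
```

For example, `buildPrompt({ subject: 'a red panda sitting on a mossy branch', style: 'watercolor', lighting: 'golden hour' })` yields `"a red panda sitting on a mossy branch, watercolor, golden hour"`.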
Sizes and Aspect Ratios
| Provider | Supported Sizes | Aspect Ratio Support |
|---|---|---|
| OpenAI | DALL-E 3: 1024x1024, 1792x1024, 1024x1792; GPT-Image-1: 1024x1024, 1536x1024, 1024x1536 | Via size selection |
| Stability | Flexible | |
| Replicate/Flux | Flexible | |
| Local SD | Any (multiples of 64) | Via |
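When a user asks for an aspect ratio such as 16:9, the request has to be mapped onto what each provider accepts. A minimal sketch, assuming the caller does this mapping itself (whether AgentOS does it internally is not stated here): snap to multiples of 64 for Local SD, and pick the closest fixed size for DALL-E 3.

```typescript
// Local SD: dimensions must be multiples of 64, so round each side to the nearest one.
function snapTo64(px: number): number {
  return Math.max(64, Math.round(px / 64) * 64);
}

function localSdSize(width: number, height: number): { width: number; height: number } {
  return { width: snapTo64(width), height: snapTo64(height) };
}

// DALL-E 3's fixed sizes, as listed in the table.
const OPENAI_SIZES = ['1024x1024', '1792x1024', '1024x1792'] as const;
type OpenAiSize = (typeof OPENAI_SIZES)[number];

// Picks the fixed size whose aspect ratio is closest to the requested one,
// comparing on a log scale so that 2:1 and 1:2 are treated symmetrically.
function closestOpenAiSize(aspect: number): OpenAiSize {
  let best: OpenAiSize = OPENAI_SIZES[0];
  let bestDiff = Infinity;
  for (const size of OPENAI_SIZES) {
    const [w, h] = size.split('x').map(Number);
    const diff = Math.abs(Math.log(w / h) - Math.log(aspect));
    if (diff < bestDiff) {
      best = size;
      bestDiff = diff;
    }
  }
  return best;
}
```

With this mapping, a 16:9 request becomes 1792x1024 on DALL-E 3 and 1024x576 on Local SD (from a 1000x570 target).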
Examples
- "Generate a photorealistic image of a cozy cabin in the mountains at sunset."
- "Create a professional logo for a coffee shop called 'Bean There' — vector illustration style, clean lines."
- "Edit this photo: make the sky more dramatic with storm clouds." (img2img)
- "Remove the person from the background of this product photo." (inpaint + mask)
- "Upscale this thumbnail to 4x resolution for print."
- "Show me 3 variations of this hero image with different color palettes."
- "Generate a 16:9 cinematic landscape of a neon-lit Tokyo street at night in the rain."
Provider Preferences
You can override the default fallback chain per request with the `providerPreferences` field, or set defaults for every image call under `providerPreferences.image` in `agent.config.json`. This lets users pin preferred providers, weight them for probabilistic routing, or block specific providers entirely.

| Key | Type | Purpose |
|---|---|---|
| `preferred` | `string[]` | Ordered list of provider IDs to try first (e.g., `['stability', 'openai']`) |
| `weights` | `Record<string, number>` | Relative selection weights for probabilistic routing (e.g., `{ stability: 0.6, bfl: 0.4 }`) |
| `blocked` | `string[]` | Provider IDs that must never be used (e.g., `['replicate']`) |
Example: passing preferences inline on a single request.

```ts
// Assumes generateImage (the AgentOS high-level API described above) is in scope.
generateImage({
  prompt: 'A neon-lit Tokyo alley in the rain',
  providerPreferences: {
    preferred: ['stability', 'openai'], // try Stability first, then OpenAI
    blocked: ['replicate'], // never route this request to Replicate
  },
});
```

Example: setting preferences in `agent.config.json` so all image calls inherit them.

```jsonc
{
  "providerPreferences": {
    "image": {
      "preferred": ["stability", "bfl"],
      "weights": { "stability": 0.6, "bfl": 0.4 },
      "blocked": ["replicate"]
    }
  }
}
```

When `providerPreferences.image` is set in the agent config, the runtime merges it with any per-request overrides (per-request values win). Blocked providers are removed from the fallback chain before any attempt is made.
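The merge-then-filter behavior described above can be sketched as three small steps. This is not the AgentOS runtime. Three choices are assumptions: merging key by key (so a per-request `blocked` list replaces the config list rather than extending it), placing preferred providers ahead of the default order, and applying `weights` as a separate weighted pick. How the runtime actually combines weights with the ordered list is not specified here.

```typescript
// Hypothetical shapes and helpers; not the AgentOS runtime.
interface ImagePreferences {
  preferred?: string[];
  weights?: Record<string, number>;
  blocked?: string[];
}

// Per-request values override config values key by key (assumed merge granularity).
function mergePreferences(config: ImagePreferences, request: ImagePreferences): ImagePreferences {
  return {
    preferred: request.preferred ?? config.preferred,
    weights: request.weights ?? config.weights,
    blocked: request.blocked ?? config.blocked,
  };
}

// Builds the attempt order: drop blocked providers first, then put preferred
// providers (in their listed order) ahead of the remaining default order.
function buildChain(defaultOrder: string[], prefs: ImagePreferences): string[] {
  const blocked = new Set(prefs.blocked ?? []);
  const allowed = defaultOrder.filter((id) => !blocked.has(id));
  const preferred = (prefs.preferred ?? []).filter((id) => allowed.includes(id));
  const rest = allowed.filter((id) => !preferred.includes(id));
  return [...preferred, ...rest];
}

// Picks one provider with probability proportional to its weight.
// `rand` is injectable so the choice can be tested deterministically.
function pickWeighted(
  candidates: string[],
  weights: Record<string, number>,
  rand: () => number = Math.random,
): string | undefined {
  const total = candidates.reduce((sum, id) => sum + (weights[id] ?? 0), 0);
  if (total <= 0) return candidates[0];
  let r = rand() * total;
  for (const id of candidates) {
    r -= weights[id] ?? 0;
    if (r < 0) return id;
  }
  return candidates[candidates.length - 1]; // guards against floating-point rounding
}
```

With the config example above, a request that passes only `preferred: ['openai']` yields the chain `openai, stability, bfl, fal` from a default order of `openai, stability, bfl, fal, replicate`. Replicate stays blocked because the request did not override `blocked`.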
Constraints
- Image generation costs API credits per request; inform the user of approximate costs when possible.
- Each provider enforces its own content policy. Do not request realistic faces of real people or violent/explicit content.
- DALL-E 3 does not support native inpainting — use GPT-Image-1 or Stability for mask-based editing.
- Upscaling is not supported by OpenAI or OpenRouter — use Stability, Replicate, or Local SD.
- Generated images may not perfectly match the prompt; iterative refinement is expected.
- Maximum prompt length varies by model (DALL-E 3: 4,000 chars; Stability: 2,000 chars).
- Local SD requires a running A1111 or ComfyUI instance with the API enabled.
- The fallback chain only activates when the primary provider fails; it does not merge results from multiple providers.
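Several of these constraints can be checked before any credits are spent. A minimal preflight sketch, assuming hypothetical lowercase provider and model IDs (`'openai'`, `'dall-e-3'`, `'stability'`); the limits and capability gaps are the ones listed above.

```typescript
type Operation = 'generate' | 'edit' | 'upscale' | 'variate' | 'transferStyle';

interface PreflightInput {
  provider: string; // hypothetical provider IDs, e.g. 'openai', 'stability'
  model?: string; // hypothetical model IDs, e.g. 'dall-e-3'
  op: Operation;
  mode?: 'img2img' | 'inpaint' | 'outpaint';
  prompt?: string;
}

// Prompt limits from the constraints list, keyed by model or provider ID.
const MAX_PROMPT_CHARS: Record<string, number> = {
  'dall-e-3': 4000,
  stability: 2000,
};

// Providers that cannot perform an operation at all.
const UNSUPPORTED_OPS: Record<string, Operation[]> = {
  openai: ['upscale'],
  openrouter: ['upscale'],
};

// Returns a list of issues; an empty list means the request can be sent.
function preflight(input: PreflightInput): string[] {
  const issues: string[] = [];
  if ((UNSUPPORTED_OPS[input.provider] ?? []).includes(input.op)) {
    issues.push(`${input.provider} does not support ${input.op}`);
  }
  // DALL-E 3 has no native inpainting; GPT-Image-1 or Stability are the alternatives.
  if (input.model === 'dall-e-3' && input.op === 'edit' && input.mode === 'inpaint') {
    issues.push('dall-e-3 does not support inpaint; use gpt-image-1 or stability');
  }
  // Prefer a model-specific limit, else fall back to a provider-wide one.
  const key = input.model && MAX_PROMPT_CHARS[input.model] !== undefined ? input.model : input.provider;
  const limit = MAX_PROMPT_CHARS[key];
  const length = input.prompt?.length ?? 0;
  if (limit !== undefined && length > limit) {
    issues.push(`prompt has ${length} chars; ${key} allows ${limit}`);
  }
  return issues;
}
```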