baoyu-image-gen


Image Generation (AI SDK)

Official API-based image generation. Supports OpenAI, Google, OpenRouter, DashScope (阿里通义万象), Jimeng (即梦), Seedream (豆包) and Replicate providers.

Script Directory

Agent Execution:
  1. {baseDir} = this SKILL.md file's directory
  2. Script path = {baseDir}/scripts/main.ts
  3. Resolve the ${BUN_X} runtime: if bun is installed → bun; if npx is available → npx -y bun; else suggest installing bun
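The runtime resolution in step 3 can be sketched as a small pure function. This is only an illustration: `resolveBunX` and its input shape are hypothetical names, and how bun or npx availability is detected is left to the caller.

```typescript
// Hypothetical helper, not part of the skill: maps tool availability to ${BUN_X}.
type Runtime = { command: string } | { command: null; hint: string };

function resolveBunX(available: { bun: boolean; npx: boolean }): Runtime {
  if (available.bun) return { command: "bun" };        // bun installed: use it directly
  if (available.npx) return { command: "npx -y bun" }; // fall back to npx
  return { command: null, hint: "Install bun to run scripts/main.ts" };
}
```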

Step 0: Load Preferences ⛔ BLOCKING

CRITICAL: This step MUST complete BEFORE any image generation. Do NOT skip or defer.
Check EXTEND.md existence from the current working directory:
bash
test -f .baoyu-skills/baoyu-image-gen/EXTEND.md && echo "found"
| Result | Action |
|---|---|
| Found | Load, parse, apply settings. If default_model.[provider] is null → ask model only (Flow 2 in first-time-setup.md) |
| Not found | STOP. Do NOT generate any images. Read references/config/first-time-setup.md and follow its Flow 1 checklist step by step. This is a multi-turn interactive setup that requires asking the user multiple questions. Resume image generation only after Step 5 (verify) passes. |

CRITICAL: The first-time setup is a multi-step interactive workflow, NOT a single action. You must ask the user questions and wait for answers at each step.

| Path | Location |
|---|---|
| .baoyu-skills/baoyu-image-gen/EXTEND.md | Relative to current working directory |

EXTEND.md Supports: Default provider | Default quality | Default aspect ratio | Default image size | Default models | Batch worker cap | Provider-specific batch limits
Schema: references/config/preferences-schema.md
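The Step 0 gate can be read as a three-way decision. The sketch below is an assumption-laden illustration: `Preferences`, `Step0Action`, and `decideStep0` are hypothetical names, and `prefs` is `null` when EXTEND.md was not found. Only an explicit `null` model triggers Flow 2, as the table states; treating a missing key as "proceed" is an assumption.

```typescript
// Hypothetical types for illustration; the real schema lives in preferences-schema.md.
type Preferences = { default_model?: Record<string, string | null> };

type Step0Action =
  | { kind: "first-time-setup"; flow: 1 }               // EXTEND.md not found: STOP, run Flow 1
  | { kind: "ask-model"; flow: 2; provider: string }    // found, but model is null: Flow 2
  | { kind: "proceed" };                                // found and usable

function decideStep0(prefs: Preferences | null, provider: string): Step0Action {
  if (prefs === null) return { kind: "first-time-setup", flow: 1 };
  const model = prefs.default_model?.[provider];
  if (model === null) return { kind: "ask-model", flow: 2, provider };
  return { kind: "proceed" };
}
```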

Usage

bash
# Basic
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image cat.png

# With aspect ratio
${BUN_X} {baseDir}/scripts/main.ts --prompt "A landscape" --image out.png --ar 16:9

# High quality
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --quality 2k

# From prompt files
${BUN_X} {baseDir}/scripts/main.ts --promptfiles system.md content.md --image out.png

# With reference images (Google, OpenAI, OpenRouter, Replicate, or Seedream 4.0/4.5/5.0)
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make blue" --image out.png --ref source.png

# With reference images (explicit provider/model)
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make blue" --image out.png --provider google --model gemini-3-pro-image-preview --ref source.png

# OpenRouter (recommended default model)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider openrouter

# OpenRouter with reference images
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make blue" --image out.png --provider openrouter --model google/gemini-3.1-flash-image-preview --ref source.png

# Specific provider
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider openai

# DashScope (阿里通义万象)
${BUN_X} {baseDir}/scripts/main.ts --prompt "一只可爱的猫" --image out.png --provider dashscope

# DashScope Qwen-Image 2.0 Pro (recommended for custom sizes and text rendering)
${BUN_X} {baseDir}/scripts/main.ts --prompt "为咖啡品牌设计一张 21:9 横幅海报,包含清晰中文标题" --image out.png --provider dashscope --model qwen-image-2.0-pro --size 2048x872

# DashScope legacy Qwen fixed-size model
${BUN_X} {baseDir}/scripts/main.ts --prompt "一张电影感海报" --image out.png --provider dashscope --model qwen-image-max --size 1664x928

# Replicate (google/nano-banana-pro)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate

# Replicate with specific model
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate --model google/nano-banana

# Batch mode with saved prompt files
${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json

# Batch mode with explicit worker count
${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json --jobs 4 --json

Batch File Format

json
{
  "jobs": 4,
  "tasks": [
    {
      "id": "hero",
      "promptFiles": ["prompts/hero.md"],
      "image": "out/hero.png",
      "provider": "replicate",
      "model": "google/nano-banana-pro",
      "ar": "16:9",
      "quality": "2k"
    },
    {
      "id": "diagram",
      "promptFiles": ["prompts/diagram.md"],
      "image": "out/diagram.png",
      "ref": ["references/original.png"]
    }
  ]
}
Paths in promptFiles, image, and ref are resolved relative to the batch file's directory. jobs is optional (overridden by CLI --jobs). A top-level array (without the jobs wrapper) is also accepted.
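The batch-file rules above (two accepted shapes, paths relative to the batch file, CLI `--jobs` overriding the file) can be sketched as a normalization step. `normalizeBatch` and `resolveFrom` are hypothetical helpers written for this example, and the path join is a simplified POSIX-style one, not the script's actual resolver.

```typescript
interface BatchTask {
  id?: string;
  promptFiles?: string[];
  image: string;
  ref?: string[];
  [key: string]: unknown; // provider, model, ar, quality, ...
}

type BatchFile = BatchTask[] | { jobs?: number; tasks: BatchTask[] };

// Join a relative path onto the batch file's directory; absolute paths pass through.
function resolveFrom(dir: string, p: string): string {
  return p.startsWith("/") ? p : `${dir.replace(/\/+$/, "")}/${p}`;
}

function normalizeBatch(raw: BatchFile, batchDir: string, cliJobs?: number) {
  // Accept both the { jobs, tasks } wrapper and a bare top-level array.
  const wrapper: { jobs?: number; tasks: BatchTask[] } = Array.isArray(raw) ? { tasks: raw } : raw;
  const tasks = wrapper.tasks.map((t) => ({
    ...t,
    promptFiles: t.promptFiles?.map((f) => resolveFrom(batchDir, f)),
    image: resolveFrom(batchDir, t.image),
    ref: t.ref?.map((f) => resolveFrom(batchDir, f)),
  }));
  // CLI --jobs overrides the optional "jobs" field in the file.
  return { jobs: cliJobs ?? wrapper.jobs, tasks };
}
```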

Options

| Option | Description |
|---|---|
| --prompt <text>, -p | Prompt text |
| --promptfiles <files...> | Read prompt from files (concatenated) |
| --image <path> | Output image path (required in single-image mode) |
| --batchfile <path> | JSON batch file for multi-image generation |
| --jobs <count> | Worker count for batch mode (default: auto, max from config, built-in default 10) |
| --provider google\|openai\|openrouter\|dashscope\|jimeng\|seedream\|replicate | Force provider (default: auto-detect) |
| --model <id>, -m | Model ID (Google: gemini-3-pro-image-preview; OpenAI: gpt-image-1.5; OpenRouter: google/gemini-3.1-flash-image-preview; DashScope: qwen-image-2.0-pro) |
| --ar <ratio> | Aspect ratio (e.g., 16:9, 1:1, 4:3) |
| --size <WxH> | Size (e.g., 1024x1024) |
| --quality normal\|2k | Quality preset (default: 2k) |
| --imageSize 1K\|2K\|4K | Image size for Google/OpenRouter (default: from quality) |
| --ref <files...> | Reference images. Supported by Google multimodal, OpenAI GPT Image edits, OpenRouter multimodal models, Replicate, and Seedream 5.0/4.5/4.0. Not supported by Jimeng, Seedream 3.0, or removed SeedEdit 3.0 |
| --n <count> | Number of images |
| --json | JSON output |

Environment Variables

| Variable | Description |
|---|---|
| OPENAI_API_KEY | OpenAI API key |
| OPENROUTER_API_KEY | OpenRouter API key |
| GOOGLE_API_KEY | Google API key |
| DASHSCOPE_API_KEY | DashScope API key (阿里云) |
| REPLICATE_API_TOKEN | Replicate API token |
| JIMENG_ACCESS_KEY_ID | Jimeng (即梦) Volcengine access key |
| JIMENG_SECRET_ACCESS_KEY | Jimeng (即梦) Volcengine secret key |
| ARK_API_KEY | Seedream (豆包) Volcengine ARK API key |
| OPENAI_IMAGE_MODEL | OpenAI model override |
| OPENROUTER_IMAGE_MODEL | OpenRouter model override (default: google/gemini-3.1-flash-image-preview) |
| GOOGLE_IMAGE_MODEL | Google model override |
| DASHSCOPE_IMAGE_MODEL | DashScope model override (default: qwen-image-2.0-pro) |
| REPLICATE_IMAGE_MODEL | Replicate model override (default: google/nano-banana-pro) |
| JIMENG_IMAGE_MODEL | Jimeng model override (default: jimeng_t2i_v40) |
| SEEDREAM_IMAGE_MODEL | Seedream model override (default: doubao-seedream-5-0-260128) |
| OPENAI_BASE_URL | Custom OpenAI endpoint |
| OPENROUTER_BASE_URL | Custom OpenRouter endpoint (default: https://openrouter.ai/api/v1) |
| OPENROUTER_HTTP_REFERER | Optional app/site URL for OpenRouter attribution |
| OPENROUTER_TITLE | Optional app name for OpenRouter attribution |
| GOOGLE_BASE_URL | Custom Google endpoint |
| DASHSCOPE_BASE_URL | Custom DashScope endpoint |
| REPLICATE_BASE_URL | Custom Replicate endpoint |
| JIMENG_BASE_URL | Custom Jimeng endpoint (default: https://visual.volcengineapi.com) |
| JIMENG_REGION | Jimeng region (default: cn-north-1) |
| SEEDREAM_BASE_URL | Custom Seedream endpoint (default: https://ark.cn-beijing.volces.com/api/v3) |
| BAOYU_IMAGE_GEN_MAX_WORKERS | Override batch worker cap |
| BAOYU_IMAGE_GEN_<PROVIDER>_CONCURRENCY | Override provider concurrency, e.g. BAOYU_IMAGE_GEN_REPLICATE_CONCURRENCY |
| BAOYU_IMAGE_GEN_<PROVIDER>_START_INTERVAL_MS | Override provider start gap, e.g. BAOYU_IMAGE_GEN_REPLICATE_START_INTERVAL_MS |

Load Priority: CLI args > EXTEND.md > env vars > <cwd>/.baoyu-skills/.env > ~/.baoyu-skills/.env
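One way to picture the load priority is as an ordered list of layers where the first layer that defines a setting wins. The sketch below assumes every source has already been parsed into a flat key/value map under a shared key name; `Layer` and `resolveSetting` are hypothetical names, and treating an empty string as "not set" is also an assumption.

```typescript
// Each layer is one configuration source; the array is ordered highest priority first.
type Layer = { name: string; values: Record<string, string | undefined> };

function resolveSetting(key: string, layers: Layer[]): { value: string; from: string } | undefined {
  for (const layer of layers) {
    const value = layer.values[key];
    if (value !== undefined && value !== "") return { value, from: layer.name };
  }
  return undefined; // no layer defines the key
}

// Usage: the layer order mirrors the Load Priority line.
const layers: Layer[] = [
  { name: "CLI args", values: {} },
  { name: "EXTEND.md", values: { provider: "google" } },
  { name: "env vars", values: { provider: "openai" } },
  { name: "<cwd>/.baoyu-skills/.env", values: {} },
  { name: "~/.baoyu-skills/.env", values: { provider: "replicate", quality: "normal" } },
];
```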

Model Resolution

Model priority (highest → lowest), applies to all providers:
  1. CLI flag: --model <id>
  2. EXTEND.md: default_model.[provider]
  3. Env var: <PROVIDER>_IMAGE_MODEL (e.g., GOOGLE_IMAGE_MODEL)
  4. Built-in default
EXTEND.md overrides env vars. If both EXTEND.md default_model.google: "gemini-3-pro-image-preview" and env var GOOGLE_IMAGE_MODEL=gemini-3.1-flash-image-preview exist, EXTEND.md wins.
Agent MUST display model info before each generation:
  • Show: Using [provider] / [model]
  • Show switch hint: Switch model: --model <id> | EXTEND.md default_model.[provider] | env <PROVIDER>_IMAGE_MODEL
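As a minimal sketch of the priority chain and the required display lines, assuming the sources are passed in as plain objects: `ModelSources`, `resolveModel`, and `modelBanner` are hypothetical names, and the built-in default is supplied by the caller rather than hard-coded here.

```typescript
interface ModelSources {
  cliModel?: string;                                   // --model <id>
  extendDefaultModel?: Record<string, string | null>;  // EXTEND.md default_model
  env?: Record<string, string | undefined>;            // process environment snapshot
  builtInDefault: string;                              // provider's built-in default
}

function resolveModel(provider: string, sources: ModelSources): string {
  const envKey = `${provider.toUpperCase()}_IMAGE_MODEL`;
  // ?? skips null and undefined, so a null EXTEND.md entry falls through to env.
  return (
    sources.cliModel ??
    sources.extendDefaultModel?.[provider] ??
    sources.env?.[envKey] ??
    sources.builtInDefault
  );
}

// The two lines the agent shows before each generation.
function modelBanner(provider: string, model: string): string[] {
  return [
    `Using ${provider} / ${model}`,
    "Switch model: --model <id> | EXTEND.md default_model.[provider] | env <PROVIDER>_IMAGE_MODEL",
  ];
}
```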

DashScope Models

Use --model qwen-image-2.0-pro or set default_model.dashscope / DASHSCOPE_IMAGE_MODEL when the user wants official Qwen-Image behavior.
Official DashScope model families:
  • qwen-image-2.0-pro, qwen-image-2.0-pro-2026-03-03, qwen-image-2.0, qwen-image-2.0-2026-03-03
    • Free-form size in width*height format
    • Total pixels must stay between 512*512 and 2048*2048
    • Default size is approximately 1024*1024
    • Best choice for custom ratios such as 21:9 and text-heavy Chinese/English layouts
  • qwen-image-max, qwen-image-max-2025-12-30, qwen-image-plus, qwen-image-plus-2026-01-09, qwen-image
    • Fixed sizes only: 1664*928, 1472*1104, 1328*1328, 1104*1472, 928*1664
    • Default size is 1664*928
    • qwen-image currently has the same capability as qwen-image-plus
  • Legacy DashScope models such as z-image-turbo, z-image-ultra, wanx-v1
    • Keep using them only when the user explicitly asks for legacy behavior or compatibility
When translating CLI args into DashScope behavior:
  • --size wins over --ar
  • For qwen-image-2.0*, prefer explicit --size; otherwise infer from --ar and use the official recommended resolutions below
  • For qwen-image-max/plus/image, only use the five official fixed sizes; if the requested ratio is not covered, switch to qwen-image-2.0-pro
  • --quality is a baoyu-image-gen compatibility preset, not a native DashScope API field. Mapping normal / 2k onto the qwen-image-2.0* table below is an implementation inference, not an official API guarantee
Recommended qwen-image-2.0* sizes for common aspect ratios:

| Ratio | normal | 2k |
|---|---|---|
| 1:1 | 1024*1024 | 1536*1536 |
| 2:3 | 768*1152 | 1024*1536 |
| 3:2 | 1152*768 | 1536*1024 |
| 3:4 | 960*1280 | 1080*1440 |
| 4:3 | 1280*960 | 1440*1080 |
| 9:16 | 720*1280 | 1080*1920 |
| 16:9 | 1280*720 | 1920*1080 |
| 21:9 | 1344*576 | 2048*872 |
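The size rules for the qwen-image-2.0* family can be sketched as follows. This is an illustration under stated assumptions, not the script's code: `pickQwen2Size` and `isFixedFamilySize` are hypothetical names, returning `undefined` when neither `--size` nor `--ar` is given (so the model default applies) is an assumption, and the 2k default mirrors the `--quality` default.

```typescript
type Quality = "normal" | "2k";

// Recommended qwen-image-2.0* sizes for common aspect ratios.
const QWEN2_SIZES: Record<string, Record<Quality, string>> = {
  "1:1": { normal: "1024*1024", "2k": "1536*1536" },
  "2:3": { normal: "768*1152", "2k": "1024*1536" },
  "3:2": { normal: "1152*768", "2k": "1536*1024" },
  "3:4": { normal: "960*1280", "2k": "1080*1440" },
  "4:3": { normal: "1280*960", "2k": "1440*1080" },
  "9:16": { normal: "720*1280", "2k": "1080*1920" },
  "16:9": { normal: "1280*720", "2k": "1920*1080" },
  "21:9": { normal: "1344*576", "2k": "2048*872" },
};

// The five fixed sizes accepted by qwen-image-max / plus / image.
const FIXED_SIZES = ["1664*928", "1472*1104", "1328*1328", "1104*1472", "928*1664"];

// Accepts CLI style "WxH" or DashScope style "W*H"; returns the "W*H" form.
function normalizeSize(size: string): { text: string; pixels: number } {
  const m = /^(\d+)[x*](\d+)$/i.exec(size.trim());
  if (!m) throw new Error(`Invalid size: ${size}`);
  const w = Number(m[1]);
  const h = Number(m[2]);
  return { text: `${w}*${h}`, pixels: w * h };
}

function pickQwen2Size(opts: { size?: string; ar?: string; quality?: Quality }): string | undefined {
  if (opts.size) {
    // --size wins over --ar; enforce the total-pixel window for the 2.0 family.
    const { text, pixels } = normalizeSize(opts.size);
    if (pixels < 512 * 512 || pixels > 2048 * 2048) {
      throw new Error(`Total pixels out of range for qwen-image-2.0*: ${text}`);
    }
    return text;
  }
  if (!opts.ar) return undefined; // assumption: leave size unset so the model default applies
  const row = QWEN2_SIZES[opts.ar];
  if (!row) throw new Error(`No recommended size for ratio ${opts.ar}; pass --size explicitly`);
  return row[opts.quality ?? "2k"];
}

function isFixedFamilySize(size: string): boolean {
  return FIXED_SIZES.includes(normalizeSize(size).text);
}
```

For example, `--ar 21:9` with the default quality maps to `2048*872`, which matches the `--size 2048x872` used in the Qwen-Image 2.0 Pro usage example.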
DashScope official APIs also expose negative_prompt, prompt_extend, and watermark, but baoyu-image-gen does not expose them as dedicated CLI flags today.
Official references:

OpenRouter Models

Use full OpenRouter model IDs, e.g.:
  • google/gemini-3.1-flash-image-preview (recommended, supports image output and reference-image workflows)
  • google/gemini-2.5-flash-image-preview
  • black-forest-labs/flux.2-pro
  • Other OpenRouter image-capable model IDs
Notes:
  • OpenRouter image generation uses /chat/completions, not the OpenAI /images endpoints
  • If --ref is used, choose a multimodal model that supports image input and image output
  • --imageSize maps to OpenRouter imageGenerationOptions.size; --size <WxH> is converted to the nearest OpenRouter size and inferred aspect ratio when possible

Replicate Models

Supported model formats:
  • owner/name (recommended for official models), e.g. google/nano-banana-pro
  • owner/name:version (community models by version), e.g. stability-ai/sdxl:<version>
Examples:
bash
# Use Replicate default model
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate

# Override model explicitly
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate --model google/nano-banana
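The two accepted model formats can be told apart with a small parser. `parseReplicateModel` is a hypothetical helper for illustration; the exact characters Replicate allows in owner, name, and version strings are not specified here, so the pattern below is deliberately permissive.

```typescript
interface ReplicateModelId {
  owner: string;
  name: string;
  version?: string; // present only for the owner/name:version form
}

function parseReplicateModel(id: string): ReplicateModelId {
  // owner/name, optionally followed by :version
  const m = /^([^/:\s]+)\/([^/:\s]+)(?::([^/:\s]+))?$/.exec(id);
  if (!m) throw new Error(`Expected owner/name or owner/name:version, got "${id}"`);
  return { owner: String(m[1]), name: String(m[2]), version: m[3] };
}
```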

Provider Selection

  1. --ref provided + no --provider → auto-select Google first, then OpenAI, then OpenRouter, then Replicate (Jimeng and Seedream do not support reference images)
  2. --provider specified → use it (if --ref, must be google, openai, openrouter, or replicate)
  3. Only one API key available → use that provider
  4. Multiple available → default to Google
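The four rules can be sketched as one function, following the list exactly as written. `selectProvider` is a hypothetical name. Two points are assumptions because the list does not settle them: auto-selection with `--ref` only considers providers that have an API key, and when several keys exist but none is Google, the first available provider is used.

```typescript
type Provider = "google" | "openai" | "openrouter" | "dashscope" | "jimeng" | "seedream" | "replicate";

// Auto-selection order when reference images are supplied.
const REF_ORDER: Provider[] = ["google", "openai", "openrouter", "replicate"];

function selectProvider(opts: { provider?: Provider; hasRef: boolean; available: Provider[] }): Provider {
  if (opts.provider) {
    // Rule 2: an explicit provider is used as-is, but must support --ref if refs are given.
    if (opts.hasRef && !REF_ORDER.includes(opts.provider)) {
      throw new Error(`${opts.provider} does not support reference images`);
    }
    return opts.provider;
  }
  if (opts.hasRef) {
    // Rule 1: first provider in the preferred order that has an API key (assumption).
    const pick = REF_ORDER.find((p) => opts.available.includes(p));
    if (!pick) throw new Error("No available provider supports reference images");
    return pick;
  }
  const first = opts.available[0];
  if (first === undefined) throw new Error("No API key configured");
  if (opts.available.length === 1) return first;             // Rule 3
  return opts.available.includes("google") ? "google" : first; // Rule 4 (fallback is an assumption)
}
```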

Quality Presets

| Preset | Google imageSize | OpenAI Size | OpenRouter size | Replicate resolution | Use Case |
|---|---|---|---|---|---|
| normal | 1K | 1024px | 1K | 1K | Quick previews |
| 2k (default) | 2K | 2048px | 2K | 2K | Covers, illustrations, infographics |

Google/OpenRouter imageSize: Can be overridden with --imageSize 1K|2K|4K

Aspect Ratios

Supported: 1:1, 16:9, 9:16, 4:3, 3:4, 2.35:1
  • Google multimodal: uses imageConfig.aspectRatio
  • OpenAI: maps to closest supported size
  • OpenRouter: sends imageGenerationOptions.aspect_ratio; if only --size <WxH> is given, aspect ratio is inferred automatically
  • Replicate: passes aspect_ratio to model; when --ref is provided without --ar, defaults to match_input_image
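Inferring an aspect ratio from `--size <WxH>` could work by picking the supported ratio closest to the requested width/height. The sketch below is one plausible approach, not the script's confirmed behavior: `inferAspectRatio` is a hypothetical name, and comparing ratios on a log scale (so that 2:1 and 1:2 are equally far from 1:1) is a design choice made for this example.

```typescript
const SUPPORTED_RATIOS = ["1:1", "16:9", "9:16", "4:3", "3:4", "2.35:1"];

function ratioToNumber(ratio: string): number {
  const parts = ratio.split(":");
  return Number(parts[0]) / Number(parts[1]);
}

function inferAspectRatio(size: string): string {
  const m = /^(\d+)x(\d+)$/i.exec(size.trim());
  if (!m || Number(m[1]) === 0 || Number(m[2]) === 0) throw new Error(`Invalid size: ${size}`);
  const target = Math.log(Number(m[1]) / Number(m[2]));
  let best = "1:1";
  let bestDistance = Infinity;
  for (const ratio of SUPPORTED_RATIOS) {
    // Distance between the requested shape and each supported ratio, on a log scale.
    const distance = Math.abs(Math.log(ratioToNumber(ratio)) - target);
    if (distance < bestDistance) {
      best = ratio;
      bestDistance = distance;
    }
  }
  return best;
}
```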

Generation Mode

Default: Sequential generation.
Batch Parallel Generation: When --batchfile contains 2 or more pending tasks, the script automatically enables parallel generation.

| Mode | When to Use |
|---|---|
| Sequential (default) | Normal usage, single images, small batches |
| Parallel batch | Batch mode with 2+ tasks |

Execution choice:

| Situation | Preferred approach | Why |
|---|---|---|
| One image, or 1-2 simple images | Sequential | Lower coordination overhead and easier debugging |
| Multiple images already have saved prompt files | Batch (--batchfile) | Reuses finalized prompts, applies shared throttling/retries, and gives predictable throughput |
| Each image still needs separate reasoning, prompt writing, or style exploration | Subagents | The work is still exploratory, so each image may need independent analysis before generation |

Rule of thumb:
  • Prefer batch over subagents once prompt files are already saved and the task is "generate all of these"
  • Use subagents only when generation is coupled with per-image thinking, rewriting, or divergent creative exploration
Parallel behavior:
  • Default worker count is automatic, capped by config, built-in default 10
  • Provider-specific throttling is applied only in batch mode, and the built-in defaults are tuned for faster throughput while still avoiding obvious RPM bursts
  • You can override worker count with --jobs <count>
  • Each image retries automatically up to 3 attempts
  • Final output includes success count, failure count, and per-image failure reasons
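The parallel behavior can be illustrated with three small helpers: choosing a worker count, retrying one image up to 3 attempts, and building the final summary. All names here are hypothetical. The sketch uses a synchronous `generate` callback to stay short (real generation would be asynchronous), and it assumes that `--jobs` is still limited by the configured cap, which the text does not state explicitly.

```typescript
interface Outcome {
  id: string;
  ok: boolean;
  attempts: number;
  reason?: string; // last error message when all attempts failed
}

// Assumption: "automatic" means one worker per pending task, limited by the cap (default 10).
function workerCount(pendingTasks: number, cliJobs?: number, configCap = 10): number {
  if (pendingTasks < 2) return 1; // fewer than 2 pending tasks: stay sequential
  return Math.max(1, Math.min(cliJobs ?? pendingTasks, configCap, pendingTasks));
}

// Run one image with automatic retry, up to maxAttempts attempts in total.
function withRetry(id: string, generate: () => void, maxAttempts = 3): Outcome {
  let reason = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      generate();
      return { id, ok: true, attempts: attempt };
    } catch (err) {
      reason = err instanceof Error ? err.message : String(err);
    }
  }
  return { id, ok: false, attempts: maxAttempts, reason };
}

// Final report: success count, failure count, and per-image failure reasons.
function summarize(outcomes: Outcome[]) {
  const failures = outcomes.filter((o) => !o.ok);
  return {
    succeeded: outcomes.length - failures.length,
    failed: failures.length,
    failureReasons: failures.map((f) => ({ id: f.id, reason: f.reason })),
  };
}
```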

Error Handling

  • Missing API key → error with setup instructions
  • Generation failure → auto-retry up to 3 attempts per image
  • Invalid aspect ratio → warning, proceed with default
  • Reference images with unsupported provider/model → error with fix hint

Extension Support

Custom configurations are supported via EXTEND.md. See Step 0: Load Preferences for the path and supported options.

Attribution

Based on baoyu-image-gen by JimLiu, licensed under MIT. Modified and adapted for the Buda.im platform.