baoyu-image-gen

Image Generation (AI SDK)

Official API-based image generation. Supports OpenAI, Google, OpenRouter, DashScope (阿里通义万象), Jimeng (即梦), Seedream (豆包) and Replicate providers.

Script Directory

Agent Execution:

- `{baseDir}` = this SKILL.md file's directory
- Script path = `{baseDir}/scripts/main.ts`
- Resolve `${BUN_X}` runtime: if `bun` installed → `bun`; if `npx` available → `npx -y bun`; else suggest installing bun
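
The runtime resolution above can be sketched as a small pure function. This is only an illustration: `resolveBunX` and `buildCommand` are hypothetical names, and the two booleans are assumed to come from a PATH lookup performed elsewhere.

```typescript
// Hypothetical helper: picks the command prefix used as ${BUN_X}.
// hasBun / hasNpx are assumed to be the results of a PATH check done by the caller.
function resolveBunX(hasBun: boolean, hasNpx: boolean): string | null {
  if (hasBun) return "bun"; // bun is installed: call it directly
  if (hasNpx) return "npx -y bun"; // otherwise run bun through npx
  return null; // neither found: the caller should suggest installing bun
}

// Hypothetical helper: joins the runtime, the script path, and the CLI arguments.
// Arguments are joined as-is; shell quoting is left to the caller.
function buildCommand(bunX: string, baseDir: string, args: string[]): string {
  return [bunX, baseDir + "/scripts/main.ts"].concat(args).join(" ");
}
```

A `null` result corresponds to the "suggest installing bun" branch, so no command is built in that case.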

Step 0: Load Preferences ⛔ BLOCKING

CRITICAL: This step MUST complete BEFORE any image generation. Do NOT skip or defer.

Check EXTEND.md existence from the current working directory:

```bash
test -f .baoyu-skills/baoyu-image-gen/EXTEND.md && echo "found"
```

| Result | Action |
|--------|--------|
| Found | Load, parse, apply settings. If `default_model.[provider]` is null → ask model only (Flow 2 in first-time-setup.md) |
| Not found | ⛔ STOP. Do NOT generate any images. Read references/config/first-time-setup.md and follow its Flow 1 checklist step by step. This is a multi-turn interactive setup that requires asking the user multiple questions. Resume image generation only after Step 5 (verify) passes. |

CRITICAL: The first-time setup is a multi-step interactive workflow, NOT a single action. You must ask the user questions and wait for answers at each step.

| Path | Location |
|------|----------|
| `.baoyu-skills/baoyu-image-gen/EXTEND.md` | Relative to current working directory |

Schema: `references/config/preferences-schema.md`

Usage

```bash
# Basic
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image cat.png

# With aspect ratio
${BUN_X} {baseDir}/scripts/main.ts --prompt "A landscape" --image out.png --ar 16:9

# High quality
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --quality 2k

# From prompt files
${BUN_X} {baseDir}/scripts/main.ts --promptfiles system.md content.md --image out.png

# With reference images (Google, OpenAI, OpenRouter, Replicate, or Seedream 4.0/4.5/5.0)
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make blue" --image out.png --ref source.png

# With reference images (explicit provider/model)
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make blue" --image out.png --provider google --model gemini-3-pro-image-preview --ref source.png

# OpenRouter (recommended default model)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider openrouter

# OpenRouter with reference images
${BUN_X} {baseDir}/scripts/main.ts --prompt "Make blue" --image out.png --provider openrouter --model google/gemini-3.1-flash-image-preview --ref source.png

# Specific provider
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider openai

# DashScope (阿里通义万象)
${BUN_X} {baseDir}/scripts/main.ts --prompt "一只可爱的猫" --image out.png --provider dashscope

# DashScope Qwen-Image 2.0 Pro (recommended for custom sizes and text rendering)
${BUN_X} {baseDir}/scripts/main.ts --prompt "为咖啡品牌设计一张 21:9 横幅海报，包含清晰中文标题" --image out.png --provider dashscope --model qwen-image-2.0-pro --size 2048x872

# DashScope legacy Qwen fixed-size model
${BUN_X} {baseDir}/scripts/main.ts --prompt "一张电影感海报" --image out.png --provider dashscope --model qwen-image-max --size 1664x928

# Replicate (google/nano-banana-pro)
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate

# Replicate with specific model
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate --model google/nano-banana

# Batch mode with saved prompt files
${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json

# Batch mode with explicit worker count
${BUN_X} {baseDir}/scripts/main.ts --batchfile batch.json --jobs 4 --json
```

Batch File Format

```json
{
  "jobs": 4,
  "tasks": [
    {
      "id": "hero",
      "promptFiles": ["prompts/hero.md"],
      "image": "out/hero.png",
      "provider": "replicate",
      "model": "google/nano-banana-pro",
      "ar": "16:9",
      "quality": "2k"
    },
    {
      "id": "diagram",
      "promptFiles": ["prompts/diagram.md"],
      "image": "out/diagram.png",
      "ref": ["references/original.png"]
    }
  ]
}
```

Paths in `promptFiles`, `image`, and `ref` are resolved relative to the batch file's directory. `jobs` is optional (overridden by CLI `--jobs`). Top-level array format (without `jobs` wrapper) is also accepted.
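
As a rough sketch of how the two accepted shapes could be normalized, the function below takes already-parsed JSON and returns one consistent structure. `normalizeBatch`, `BatchTask`, and `NormalizedBatch` are hypothetical names, and the actual script may validate more fields than this does.

```typescript
// Task fields mirror the batch example; only `image` is treated as required here (an assumption).
interface BatchTask {
  id?: string;
  promptFiles?: string[];
  image: string;
  provider?: string;
  model?: string;
  ar?: string;
  quality?: string;
  ref?: string[];
}

interface NormalizedBatch {
  jobs?: number;
  tasks: BatchTask[];
}

// Hypothetical normalizer: accepts either { jobs, tasks } or a bare task array.
// `cliJobs` stands in for the value of --jobs, which overrides the file's `jobs`.
function normalizeBatch(parsed: unknown, cliJobs?: number): NormalizedBatch {
  let fileJobs: number | undefined = undefined;
  let tasks: unknown = undefined;

  if (Array.isArray(parsed)) {
    tasks = parsed; // top-level array format, no wrapper object
  } else if (typeof parsed === "object" && parsed !== null) {
    const obj = parsed as { jobs?: unknown; tasks?: unknown };
    if (typeof obj.jobs === "number") fileJobs = obj.jobs;
    tasks = obj.tasks;
  }

  if (!Array.isArray(tasks)) {
    throw new Error("Batch file must be a task array or an object with a `tasks` array");
  }

  // The CLI value wins; otherwise fall back to the file's value (possibly undefined).
  return { jobs: cliJobs ?? fileJobs, tasks: tasks as BatchTask[] };
}
```

Resolving `promptFiles`, `image`, and `ref` against the batch file's directory would happen after this step and is not shown.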

Options

| Option | Description |
|--------|-------------|
| `--prompt <text>`, `-p` | Prompt text |
| `--promptfiles <files...>` | Read prompt from files (concatenated) |
| `--image <path>` | Output image path (required in single-image mode) |
| `--batchfile <path>` | JSON batch file for multi-image generation |
| `--jobs <count>` | Worker count for batch mode (default: auto, max from config, built-in default 10) |
| `--provider google\|openai\|openrouter\|dashscope\|jimeng\|seedream\|replicate` | Force provider (default: auto-detect) |
| `--model <id>`, `-m` | Model ID (Google: `gemini-3-pro-image-preview`; OpenAI: `gpt-image-1.5`; OpenRouter: `google/gemini-3.1-flash-image-preview`; DashScope: `qwen-image-2.0-pro`) |
| `--ar <ratio>` | Aspect ratio (e.g., `16:9`, `1:1`, `4:3`) |
| `--size <WxH>` | Size (e.g., `1024x1024`) |
| `--quality normal\|2k` | Quality preset (default: `2k`) |
| `--imageSize 1K\|2K\|4K` | Image size for Google/OpenRouter (default: from quality) |
| `--ref <files...>` | Reference images. Supported by Google multimodal, OpenAI GPT Image edits, OpenRouter multimodal models, Replicate, and Seedream 5.0/4.5/4.0. Not supported by Jimeng, Seedream 3.0, or removed SeedEdit 3.0 |
| `--n <count>` | Number of images |
| `--json` | JSON output |

Environment Variables

| Variable | Description |
|----------|-------------|
| `OPENAI_API_KEY` | OpenAI API key |
| `OPENROUTER_API_KEY` | OpenRouter API key |
| `GOOGLE_API_KEY` | Google API key |
| `DASHSCOPE_API_KEY` | DashScope API key (阿里云) |
| `REPLICATE_API_TOKEN` | Replicate API token |
| `JIMENG_ACCESS_KEY_ID` | Jimeng (即梦) Volcengine access key |
| `JIMENG_SECRET_ACCESS_KEY` | Jimeng (即梦) Volcengine secret key |
| `ARK_API_KEY` | Seedream (豆包) Volcengine ARK API key |
| `OPENAI_IMAGE_MODEL` | OpenAI model override |
| `OPENROUTER_IMAGE_MODEL` | OpenRouter model override (default: `google/gemini-3.1-flash-image-preview`) |
| `GOOGLE_IMAGE_MODEL` | Google model override |
| `DASHSCOPE_IMAGE_MODEL` | DashScope model override (default: `qwen-image-2.0-pro`) |
| `REPLICATE_IMAGE_MODEL` | Replicate model override (default: `google/nano-banana-pro`) |
| `JIMENG_IMAGE_MODEL` | Jimeng model override (default: `jimeng_t2i_v40`) |
| `SEEDREAM_IMAGE_MODEL` | Seedream model override (default: `doubao-seedream-5-0-260128`) |
| `OPENAI_BASE_URL` | Custom OpenAI endpoint |
| `OPENROUTER_BASE_URL` | Custom OpenRouter endpoint (default: `https://openrouter.ai/api/v1`) |
| `OPENROUTER_HTTP_REFERER` | Optional app/site URL for OpenRouter attribution |
| `OPENROUTER_TITLE` | Optional app name for OpenRouter attribution |
| `GOOGLE_BASE_URL` | Custom Google endpoint |
| `DASHSCOPE_BASE_URL` | Custom DashScope endpoint |
| `REPLICATE_BASE_URL` | Custom Replicate endpoint |
| `JIMENG_BASE_URL` | Custom Jimeng endpoint (default: `https://visual.volcengineapi.com`) |
| `JIMENG_REGION` | Jimeng region (default: `cn-north-1`) |
| `SEEDREAM_BASE_URL` | Custom Seedream endpoint (default: `https://ark.cn-beijing.volces.com/api/v3`) |
| `BAOYU_IMAGE_GEN_MAX_WORKERS` | Override batch worker cap |
| `BAOYU_IMAGE_GEN_<PROVIDER>_CONCURRENCY` | Override provider concurrency, e.g. `BAOYU_IMAGE_GEN_REPLICATE_CONCURRENCY` |
| `BAOYU_IMAGE_GEN_<PROVIDER>_START_INTERVAL_MS` | Override provider start gap, e.g. `BAOYU_IMAGE_GEN_REPLICATE_START_INTERVAL_MS` |

Load Priority: CLI args > EXTEND.md > env vars > `<cwd>/.baoyu-skills/.env` > `~/.baoyu-skills/.env`

Model Resolution

Model priority (highest → lowest), applies to all providers:

1. CLI flag: `--model <id>`
2. EXTEND.md: `default_model.[provider]`
3. Env var: `<PROVIDER>_IMAGE_MODEL` (e.g., `GOOGLE_IMAGE_MODEL`)
4. Built-in default

EXTEND.md overrides env vars. If both EXTEND.md `default_model.google: "gemini-3-pro-image-preview"` and env var `GOOGLE_IMAGE_MODEL=gemini-3.1-flash-image-preview` exist, EXTEND.md wins.

Agent MUST display model info before each generation:

- Show: `Using [provider] / [model]`
- Show switch hint: `Switch model: --model <id> | EXTEND.md default_model.[provider] | env <PROVIDER>_IMAGE_MODEL`
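
The priority chain can be illustrated with a short sketch. `resolveModel`, `modelBanner`, and the `ModelSources` shape are hypothetical and exist only to show the ordering; empty strings are treated as unset here, which is an assumption.

```typescript
// Hypothetical container for the four places a model ID can come from.
interface ModelSources {
  cliModel?: string; // value of --model
  extendModel?: string | null; // default_model.[provider] from EXTEND.md (may be null)
  envModel?: string; // <PROVIDER>_IMAGE_MODEL
  builtinDefault: string; // built-in default for the provider
}

// Returns the first source that is set, from highest to lowest priority.
function resolveModel(sources: ModelSources): string {
  return sources.cliModel || sources.extendModel || sources.envModel || sources.builtinDefault;
}

// Builds the two lines the agent is required to show before each generation.
function modelBanner(provider: string, model: string): string[] {
  return [
    `Using ${provider} / ${model}`,
    "Switch model: --model <id> | EXTEND.md default_model.[provider] | env <PROVIDER>_IMAGE_MODEL",
  ];
}
```

With `extendModel` set to `gemini-3-pro-image-preview` and `envModel` set to `gemini-3.1-flash-image-preview`, the function returns the EXTEND.md value, matching the rule that EXTEND.md wins over the environment.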

DashScope Models

Use `--model qwen-image-2.0-pro` or set `default_model.dashscope` / `DASHSCOPE_IMAGE_MODEL` when the user wants official Qwen-Image behavior.

Official DashScope model families:

- `qwen-image-2.0-pro`, `qwen-image-2.0-pro-2026-03-03`, `qwen-image-2.0`, `qwen-image-2.0-2026-03-03`
  - Free-form `size` in `width*height` format
  - Total pixels must stay between `512*512` and `2048*2048`
  - Default size is approximately `1024*1024`
  - Best choice for custom ratios such as `21:9` and text-heavy Chinese/English layouts
- `qwen-image-max`, `qwen-image-max-2025-12-30`, `qwen-image-plus`, `qwen-image-plus-2026-01-09`, `qwen-image`
  - Fixed sizes only: `1664*928`, `1472*1104`, `1328*1328`, `1104*1472`, `928*1664`
  - Default size is `1664*928`
  - `qwen-image` currently has the same capability as `qwen-image-plus`
- Legacy DashScope models such as `z-image-turbo`, `z-image-ultra`, `wanx-v1`
  - Keep using them only when the user explicitly asks for legacy behavior or compatibility

When translating CLI args into DashScope behavior:

- `--size` wins over `--ar`
- For `qwen-image-2.0*`, prefer explicit `--size`; otherwise infer from `--ar` and use the official recommended resolutions below
- For `qwen-image-max/plus/image`, only use the five official fixed sizes; if the requested ratio is not covered, switch to `qwen-image-2.0-pro`
- `--quality` is a baoyu-image-gen compatibility preset, not a native DashScope API field. Mapping `normal` / `2k` onto the `qwen-image-2.0*` table below is an implementation inference, not an official API guarantee

Recommended `qwen-image-2.0*` sizes for common aspect ratios:

| Ratio | `normal` | `2k` |
|-------|----------|------|
| `1:1` | `1024*1024` | `1536*1536` |
| `2:3` | `768*1152` | `1024*1536` |
| `3:2` | `1152*768` | `1536*1024` |
| `3:4` | `960*1280` | `1080*1440` |
| `4:3` | `1280*960` | `1440*1080` |
| `9:16` | `720*1280` | `1080*1920` |
| `16:9` | `1280*720` | `1920*1080` |
| `21:9` | `1344*576` | `2048*872` |
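
The DashScope size rules can be sketched as a lookup. This is a sketch under assumptions, not the script's actual code: `resolveDashScopeSize` is a hypothetical name, the ratio labels attached to the five fixed sizes are inferred from their dimensions, the fallback to `1:1` and `2k` when no ratio or quality is given is a guess, and the function only considers the Qwen-Image families (legacy models are out of scope).

```typescript
type Quality = "normal" | "2k";

// Recommended qwen-image-2.0* sizes, copied from the table.
const QWEN2_SIZES: Record<string, Record<Quality, string>> = {
  "1:1": { normal: "1024*1024", "2k": "1536*1536" },
  "2:3": { normal: "768*1152", "2k": "1024*1536" },
  "3:2": { normal: "1152*768", "2k": "1536*1024" },
  "3:4": { normal: "960*1280", "2k": "1080*1440" },
  "4:3": { normal: "1280*960", "2k": "1440*1080" },
  "9:16": { normal: "720*1280", "2k": "1080*1920" },
  "16:9": { normal: "1280*720", "2k": "1920*1080" },
  "21:9": { normal: "1344*576", "2k": "2048*872" },
};

// Fixed sizes for qwen-image-max / plus / qwen-image.
// The ratio keys are an assumption based on each size's proportions.
const FIXED_SIZES: Record<string, string> = {
  "16:9": "1664*928",
  "4:3": "1472*1104",
  "1:1": "1328*1328",
  "3:4": "1104*1472",
  "9:16": "928*1664",
};

function isFixedSize(size: string): boolean {
  for (const ratio in FIXED_SIZES) {
    if (FIXED_SIZES[ratio] === size) return true;
  }
  return false;
}

function resolveDashScopeSize(
  model: string,
  opts: { size?: string; ar?: string; quality?: Quality }
): string {
  const freeForm = model.indexOf("qwen-image-2.0") === 0;

  // --size wins over --ar. The CLI writes WxH; DashScope expects W*H.
  if (opts.size) {
    const size = opts.size.replace(/x/i, "*");
    if (!freeForm && !isFixedSize(size)) {
      throw new Error(`${model} only supports fixed sizes; consider qwen-image-2.0-pro`);
    }
    return size;
  }

  if (freeForm) {
    const row = QWEN2_SIZES[opts.ar ?? "1:1"];
    if (!row) throw new Error(`No recommended size for ratio ${opts.ar}`);
    return row[opts.quality ?? "2k"];
  }

  // Fixed-size families: use the documented default when no ratio is requested.
  if (!opts.ar) return "1664*928";
  const fixed = FIXED_SIZES[opts.ar];
  if (!fixed) {
    throw new Error(`Ratio ${opts.ar} is not covered by ${model}; switch to qwen-image-2.0-pro`);
  }
  return fixed;
}
```

Throwing on an uncovered ratio leaves the model switch to the caller, which keeps the lookup itself free of side effects.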

DashScope official APIs also expose `negative_prompt`, `prompt_extend`, and `watermark`, but `baoyu-image-gen` does not expose them as dedicated CLI flags today.

Official references:

OpenRouter Models

Use full OpenRouter model IDs, e.g.:

- `google/gemini-3.1-flash-image-preview` (recommended, supports image output and reference-image workflows)
- `google/gemini-2.5-flash-image-preview`
- `black-forest-labs/flux.2-pro`
- Other OpenRouter image-capable model IDs

Notes:

- OpenRouter image generation uses `/chat/completions`, not the OpenAI `/images` endpoints
- If `--ref` is used, choose a multimodal model that supports image input and image output
- `--imageSize` maps to OpenRouter `imageGenerationOptions.size`; `--size <WxH>` is converted to the nearest OpenRouter size and inferred aspect ratio when possible

Replicate Models

Supported model formats:

- `owner/name` (recommended for official models), e.g. `google/nano-banana-pro`
- `owner/name:version` (community models by version), e.g. `stability-ai/sdxl:<version>`

Examples:

```bash
# Use Replicate default model
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate

# Override model explicitly
${BUN_X} {baseDir}/scripts/main.ts --prompt "A cat" --image out.png --provider replicate --model google/nano-banana
```
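
To make the two Replicate model formats concrete, here is a minimal parsing sketch. `parseReplicateModel` and `ReplicateModelRef` are hypothetical names, and the character rules in the pattern are an assumption rather than Replicate's official grammar.

```typescript
interface ReplicateModelRef {
  owner: string;
  name: string;
  version?: string; // present only for the owner/name:version form
}

// Splits "owner/name" or "owner/name:version" into its parts.
function parseReplicateModel(id: string): ReplicateModelRef {
  const match = /^([^\/:\s]+)\/([^\/:\s]+)(?::([^\/:\s]+))?$/.exec(id);
  if (!match) {
    throw new Error(`Invalid Replicate model ID: ${id}`);
  }
  const ref: ReplicateModelRef = { owner: match[1], name: match[2] };
  if (match[3] !== undefined) {
    ref.version = match[3];
  }
  return ref;
}
```

Rejecting IDs without an owner early gives a clearer error than letting the API call fail later.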

Provider Selection

1. `--ref` provided + no `--provider` → auto-select Google first, then OpenAI, then OpenRouter, then Replicate (Jimeng and Seedream do not support reference images)
2. `--provider` specified → use it (if `--ref`, must be `google`, `openai`, `openrouter`, or `replicate`)
3. Only one API key available → use that provider
4. Multiple available → default to Google
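
The selection rules can be read as the following decision function. It is a sketch only: `selectProvider` is a hypothetical name, `available` is assumed to list providers whose API keys were found, and the fallback used when several keys exist but none belongs to Google is an assumption the rules do not cover.

```typescript
type Provider =
  | "google" | "openai" | "openrouter" | "dashscope"
  | "jimeng" | "seedream" | "replicate";

// Auto-selection order when reference images are supplied.
const REF_ORDER: Provider[] = ["google", "openai", "openrouter", "replicate"];

function selectProvider(opts: {
  provider?: Provider; // value of --provider, if any
  hasRef: boolean; // true when --ref was passed
  available: Provider[]; // providers with credentials configured
}): Provider {
  const { provider, hasRef, available } = opts;

  // An explicit provider is used as-is, but must be ref-capable when --ref is set.
  if (provider) {
    if (hasRef && REF_ORDER.indexOf(provider) === -1) {
      throw new Error(`${provider} cannot be used with --ref`);
    }
    return provider;
  }

  // With --ref and no provider, walk the preference order.
  if (hasRef) {
    for (const candidate of REF_ORDER) {
      if (available.indexOf(candidate) !== -1) return candidate;
    }
    throw new Error("No configured provider supports reference images");
  }

  if (available.length === 0) throw new Error("No API key found");
  if (available.length === 1) return available[0];

  // Several keys: default to Google; falling back to the first entry is an assumption.
  return available.indexOf("google") !== -1 ? "google" : available[0];
}
```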

Quality Presets

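| Preset | Google imageSize | OpenAI Size | OpenRouter size | Replicate resolution | Use Case |
|--------|------------------|-------------|-----------------|----------------------|----------|
| `normal` | 1K | 1024px | 1K | 1K | Quick previews |
| `2k` (default) | 2K | 2048px | 2K | 2K | Covers, illustrations, infographics |

Google/OpenRouter imageSize: Can be overridden with `--imageSize 1K|2K|4K`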

Aspect Ratios

Supported: `1:1`, `16:9`, `9:16`, `4:3`, `3:4`, `2.35:1`

- Google multimodal: uses `imageConfig.aspectRatio`
- OpenAI: maps to closest supported size
- OpenRouter: sends `imageGenerationOptions.aspect_ratio`; if only `--size <WxH>` is given, aspect ratio is inferred automatically
- Replicate: passes `aspect_ratio` to model; when `--ref` is provided without `--ar`, defaults to `match_input_image`
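
One plausible way to infer an aspect ratio from `--size <WxH>` is to pick the supported ratio closest to the requested proportions. The sketch below assumes that approach; `nearestSupportedRatio` is a hypothetical name, and comparing on a logarithmic scale is a design choice made here, not something the script is known to do.

```typescript
const SUPPORTED_RATIOS = ["1:1", "16:9", "9:16", "4:3", "3:4", "2.35:1"];

// Converts "16:9" or "2.35:1" into a width/height number.
function ratioValue(ratio: string): number {
  const parts = ratio.split(":");
  return parseFloat(parts[0]) / parseFloat(parts[1]);
}

// Returns the supported ratio whose proportions are closest to the given size.
function nearestSupportedRatio(size: string): string {
  const match = /^(\d+)[x*](\d+)$/i.exec(size);
  if (!match) {
    throw new Error(`Expected <W>x<H>, got: ${size}`);
  }
  const width = parseInt(match[1], 10);
  const height = parseInt(match[2], 10);
  if (width === 0 || height === 0) {
    throw new Error("Width and height must be positive");
  }

  // Log scale treats "twice as wide" and "twice as tall" as equally far from 1:1.
  const target = Math.log(width / height);
  let best = SUPPORTED_RATIOS[0];
  let bestDistance = Infinity;
  for (const ratio of SUPPORTED_RATIOS) {
    const distance = Math.abs(Math.log(ratioValue(ratio)) - target);
    if (distance < bestDistance) {
      best = ratio;
      bestDistance = distance;
    }
  }
  return best;
}
```

For example, `1920x1080` maps to `16:9` exactly, while `2048x872` has no exact match and lands on `2.35:1` as the nearest entry.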

Generation Mode

Default: Sequential generation.

Batch Parallel Generation: When `--batchfile` contains 2 or more pending tasks, the script automatically enables parallel generation.

| Mode | When to Use |
|------|-------------|
| Sequential (default) | Normal usage, single images, small batches |
| Parallel batch | Batch mode with 2+ tasks |

Execution choice:

| Situation | Preferred approach | Why |
|-----------|--------------------|-----|
| One image, or 1-2 simple images | Sequential | Lower coordination overhead and easier debugging |
| Multiple images already have saved prompt files | Batch (`--batchfile`) | Reuses finalized prompts, applies shared throttling/retries, and gives predictable throughput |
| Each image still needs separate reasoning, prompt writing, or style exploration | Subagents | The work is still exploratory, so each image may need independent analysis before generation |

Rule of thumb:

- Prefer batch over subagents once prompt files are already saved and the task is "generate all of these"
- Use subagents only when generation is coupled with per-image thinking, rewriting, or divergent creative exploration

Parallel behavior:

- Default worker count is automatic, capped by config, built-in default 10
- Provider-specific throttling is applied only in batch mode, and the built-in defaults are tuned for faster throughput while still avoiding obvious RPM bursts
- You can override worker count with `--jobs <count>`
- Each image retries automatically up to 3 attempts
- Final output includes success count, failure count, and per-image failure reasons
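
The worker-count and summary behavior might look roughly like this. Several points are assumptions: "automatic" is read as one worker per pending task up to the cap, an explicit `--jobs` value is also assumed to be capped, and `resolveWorkerCount`, `summarize`, and `TaskResult` are hypothetical names.

```typescript
const BUILT_IN_MAX_WORKERS = 10; // built-in default cap from the text

// Picks how many workers to run for a batch.
function resolveWorkerCount(
  pendingTasks: number,
  requestedJobs?: number, // --jobs or the batch file's `jobs`
  maxWorkers: number = BUILT_IN_MAX_WORKERS
): number {
  if (pendingTasks < 2) return 1; // fewer than 2 tasks: sequential
  const wanted = requestedJobs ?? pendingTasks; // "auto" = one worker per task (assumption)
  return Math.max(1, Math.min(wanted, maxWorkers, pendingTasks));
}

// Hypothetical per-image outcome after up to 3 attempts.
interface TaskResult {
  id: string;
  ok: boolean;
  attempts: number;
  error?: string;
}

// Produces the final report: success count, failure count, per-image reasons.
function summarize(results: TaskResult[]) {
  const failures = results
    .filter((result) => !result.ok)
    .map((result) => ({ id: result.id, reason: result.error ?? "unknown error" }));
  return {
    succeeded: results.length - failures.length,
    failed: failures.length,
    failures,
  };
}
```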

Error Handling

- Missing API key → error with setup instructions
- Generation failure → auto-retry up to 3 attempts per image
- Invalid aspect ratio → warning, proceed with default
- Reference images with unsupported provider/model → error with fix hint

Extension Support

Custom configurations via EXTEND.md. See Preferences section for paths and supported options.

Attribution

Based on baoyu-image-gen by JimLiu, licensed under MIT. Modified and adapted for the Buda.im platform.
