multi-agent-image


Multi-Agent Image


multi-agent-image is a standalone Hermes skill for image generation workflows.
It is designed for cases where a simple one-line prompt is not enough. Instead of sending raw user input directly to an image model, this skill:
  1. analyzes the request,
  2. compiles it into a design-aware prompt,
  3. generates through gpt-image-2,
  4. archives the result,
  5. and optionally reuses successful outputs as future style references.
This skill is independent at runtime. The design compiler is built into this repository and does not require an external skill.

When to Use


Use this skill when the user wants one or more of the following:
  • Design-oriented poster generation
  • Product images or ad visuals
  • PPT cover visuals or chapter art
  • Infographic-like or teaching/demo visuals
  • Style reference reuse from prior generations
  • Interactive “show examples first, then generate” flow
  • Batch generation for multiple directions or aspect ratios
  • Series generation where multiple images should share one visual language
Do not use this skill for:
  • pixel-accurate UI recreation
  • editable charts
  • exact typography output inside the image
  • tasks that require vector, HTML, or PPT-native assets rather than raster images

Architecture


text
User Request
  ↓
[Prompt Engineer]
  ↓
[Style Scout]
  ↓
[Internal Design Compiler]
  ↓
[GPT-Image-2 Generation]
  ↓
[QA + Archive]
  ↓
[Case Library]
Optional layers on top of the main path:
  • Interactive reference selection
  • Batch generation
  • Series generation

Setup


1. Deploy the skill


The skill source lives in:
bash
~/.hermes/skills/multi-agent-image/
Install runtime files into the working directory:
bash
python3 ~/.hermes/skills/multi-agent-image/scripts/install.py
This prepares:
  • ~/.hermes/agents/multi-agent-image/output/
  • ~/.hermes/agents/multi-agent-image/case_library/
  • agent role folders and memory files
  • local runtime scripts copied from the skill

2. Install Python dependencies


bash
pip install openai requests

3. Set API key


bash
export OPENAI_API_KEY="sk-..."
This key is used with the apimart-compatible GPT-Image-2 endpoints in this skill.
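A script that wraps the skill may want to fail early when the key is missing. The following is a minimal sketch of such a check; `require_api_key` is a hypothetical helper introduced here for illustration, and the skill's own scripts may read the variable differently.

```python
import os


def require_api_key(env=None, name="OPENAI_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    Hypothetical helper, not part of the skill: it only shows one way
    to validate the variable before any generation call is made.
    """
    env = os.environ if env is None else env
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(f"{name} is not set; export it before running the skill")
    return key
```

Passing a plain dict as `env` makes the check easy to exercise without touching the real environment.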

Core Components


scripts/design_compiler.py


Internal prompt compiler.
Responsibilities:
  • detect task type
  • choose defaults for aspect and quality
  • build design_reasoning
  • compress it into compiled_brief
  • produce the final generation prompt
This is the core logic that makes the skill independent.

scripts/design_image.py


CLI entrypoint for the internal compiler.
Use it when you want:
  • prompt-only output
  • a local design compilation test
  • direct generation without the full multi-agent workflow
Example:
bash
cd ~/.hermes/agents/multi-agent-image
python3 design_image.py \
  --task poster \
  --brief "AI训练营招生海报,强调速度、增长、实战" \
  --direction balanced \
  --aspect 3:4 \
  --prompt-only
It prints:
  • design_reasoning
  • compiled_brief
  • prompt
  • settings

scripts/orchestrator_v2.py


Main workflow entrypoint.
Responsibilities:
  • run prompt analysis
  • choose task and generation parameters
  • optionally select a reference from the case library
  • call the internal compiler
  • call GPT-Image-2
  • archive outputs
  • auto-save successful results into the case library

scripts/gpt_image2_generator.py


Low-level GPT-Image-2 client.
Responsibilities:
  • submit async generation tasks
  • poll task status
  • download image results
Use this when you want direct API access without the full workflow.
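The submit, poll, download sequence above follows the usual pattern for asynchronous jobs. The sketch below shows only the polling step, under stated assumptions: `get_status` is a hypothetical callable standing in for the real status request, and the status names `"completed"` and `"failed"` are illustrative rather than taken from the actual endpoint.

```python
import time


def poll_until_done(get_status, interval=2.0, timeout=300.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call get_status() until it reports a terminal state or the timeout expires.

    get_status is a hypothetical callable returning a dict with a "status" key.
    The terminal status names used here are assumptions, not the real API's.
    """
    deadline = clock() + timeout
    while True:
        state = get_status()
        if state.get("status") in ("completed", "failed"):
            return state
        if clock() >= deadline:
            raise TimeoutError("generation task did not finish in time")
        sleep(interval)
```

Injecting `sleep` and `clock` keeps the loop testable without real waiting; a production client would likely add backoff and retry on transient network errors as well.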

scripts/case_library.py


Persistent library of past generations.
Responsibilities:
  • save outputs by task type
  • store metadata and rating
  • search by brief, prompt, or tags
  • return image paths for reuse as references
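To make the search responsibility concrete, here is a minimal in-memory sketch. It assumes each case is a dict carrying the metadata fields named in this document (`brief`, `prompt`, `tags`, `rating`); the function name `search_cases` and the ranking by rating are illustrative choices, not the library's confirmed behavior.

```python
def search_cases(cases, query):
    """Return cases whose brief, prompt, or tags contain the query (case-insensitive)."""
    needle = query.lower()
    hits = []
    for case in cases:
        haystack = [case.get("brief", ""), case.get("prompt", "")]
        haystack += list(case.get("tags", []))
        if any(needle in text.lower() for text in haystack):
            hits.append(case)
    # Highest-rated matches first; cases without a rating sort last.
    return sorted(hits, key=lambda c: c.get("rating") or 0, reverse=True)
```

Substring matching is enough for a small local library; a larger one would probably want an index or tokenized search instead.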

scripts/case_selector.py


Interactive helper for Hermes dialogue flows.
Responsibilities:
  • render user-facing selection text
  • parse replies like 1, n, case_001, or 搜索蓝色 ("search blue")
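The reply formats listed above can be told apart with a few string checks. The sketch below is one possible classifier, not the skill's actual parser: the action names are invented for illustration, and the meanings assigned to `n` and `y` (skip the reference, accept the suggested one) are assumptions.

```python
import re


def parse_reply(reply):
    """Classify a user reply into an (action, value) pair.

    Action names and the interpretation of "y"/"n" are assumptions
    made for this sketch.
    """
    text = reply.strip()
    if text.lower() == "n":
        return ("skip_reference", None)
    if text.lower() == "y":
        return ("accept_default", None)
    if re.fullmatch(r"[0-9]+", text):
        return ("pick_index", int(text))  # e.g. "1" selects the first shown case
    if re.fullmatch(r"case_\d+", text):
        return ("pick_case", text)        # explicit case id
    if text.startswith("搜索"):
        return ("search", text[len("搜索"):].strip())  # keyword after "search"
    return ("unknown", text)
```

Returning an explicit `unknown` action lets the dialogue layer ask again instead of guessing.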

scripts/interactive_run.py


Two-phase dialogue wrapper.
Use it when the workflow needs to ask the user before generating.

scripts/batch_generator_v2.py


Batch generation entrypoint.
Supports:
  • same brief, multiple directions
  • same brief, multiple aspect ratios
  • multiple briefs in one run

scripts/series_generator.py


Style-consistent series generator.
Workflow:
  1. generate a master image
  2. extract style signals from its compiled brief
  3. generate child images that follow the same visual system

templates/linear_batch.py


Editable template for resumable sequential runs.
Useful when you want:
  • explicit scene lists
  • filesystem-based progress monitoring
  • style propagation from the first generated image
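A resumable sequential run can use the output files themselves as the progress record. The sketch below illustrates that idea and is not the template's actual code: `run_scenes`, the `scene_NNN.png` naming, and the `generate(brief, style_ref)` callable are all hypothetical.

```python
from pathlib import Path


def run_scenes(scenes, out_dir, generate):
    """Generate each scene in order, skipping ones whose output file already exists.

    generate(brief, style_ref) is a hypothetical callable returning image bytes.
    The first scene's image is passed to later scenes as the style reference.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    style_ref = None
    done = []
    for index, brief in enumerate(scenes, start=1):
        target = out_dir / f"scene_{index:03d}.png"
        if not target.exists():  # the file's presence is the progress marker
            target.write_bytes(generate(brief, style_ref))
            done.append(target.name)
        if style_ref is None:
            style_ref = target  # propagate style from the first image
    return done
```

Rerunning after an interruption regenerates only the missing scenes, and progress can be monitored by listing the output directory. Note that a crash in the middle of a write could leave a partial file, so a sturdier version would write to a temporary name and rename it afterwards.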

Internal Design Compiler


The internal compiler produces three layers:

1. design_reasoning


This captures design intent before generation.
Typical fields:
  • task
  • communication_goal
  • audience
  • channel
  • visual_system
  • hierarchy_strategy
  • safe_zone_strategy
  • lighting_strategy
  • palette_strategy
  • anti_filler_rules
  • anti_slop_rules

2. compiled_brief


This is a compressed design brief for generation.
It includes:
  • what the image is for
  • what should dominate visually
  • what space should remain available
  • what to avoid

3. prompt


Final model-facing prompt used for GPT-Image-2.
The prompt is generated from design logic, not just from a list of style keywords.
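The relationship between the three layers can be pictured with a toy example. This is a sketch under loud assumptions: `compile_design` is a hypothetical function, the field values are placeholders, and the keys of `compiled_brief` are invented to mirror the four bullet points above. The real compiler's rules are not described in this document.

```python
def compile_design(brief, task="poster", aspect="3:4"):
    """Toy three-layer compilation: reasoning -> compiled brief -> prompt.

    All field values are placeholders chosen for illustration only.
    """
    design_reasoning = {
        "task": task,
        "communication_goal": brief,
        "hierarchy_strategy": "one dominant subject, secondary details subdued",
        "safe_zone_strategy": "keep the upper third clear for later text overlay",
        "anti_slop_rules": ["no HUD overlays", "no fog", "no generic clutter"],
    }
    # Compress the reasoning into the four things the brief must state.
    compiled_brief = {
        "purpose": f"{task} for: {brief}",
        "dominant": design_reasoning["hierarchy_strategy"],
        "reserved_space": design_reasoning["safe_zone_strategy"],
        "avoid": design_reasoning["anti_slop_rules"],
    }
    # Render the model-facing prompt from the brief, not from raw keywords.
    prompt = (
        f"{compiled_brief['purpose']}. Composition: {compiled_brief['dominant']}; "
        f"{compiled_brief['reserved_space']}. Avoid: {', '.join(compiled_brief['avoid'])}. "
        f"Aspect ratio {aspect}."
    )
    return {
        "design_reasoning": design_reasoning,
        "compiled_brief": compiled_brief,
        "prompt": prompt,
    }
```

The point of the layering is that each stage only reads the one before it, so the prompt inherits the safe zone and avoidance rules instead of depending on the wording of the user's request.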

Supported Tasks


The built-in compiler understands these task classes:
  • poster
  • product
  • ppt
  • infographic
  • teaching
  • auto
Default aspect assumptions:
  • poster: 3:4
  • product: 1:1
  • ppt: 16:9
  • infographic: 4:3
  • teaching: 16:9
Direction modes:
  • conservative
  • balanced
  • bold
Quality modes:
  • draft
  • final
  • premium
Current generation channel:
  • gpt-image-2
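The defaults and modes above amount to a small lookup table. The sketch below encodes them; the aspect values come from this section, while `resolve_settings`, its fallback choices of `balanced` and `final`, and the decision to reject `auto` until it has been resolved to a concrete task are assumptions made for illustration.

```python
DEFAULT_ASPECT = {
    "poster": "3:4",
    "product": "1:1",
    "ppt": "16:9",
    "infographic": "4:3",
    "teaching": "16:9",
}
DIRECTIONS = ("conservative", "balanced", "bold")
QUALITIES = ("draft", "final", "premium")


def resolve_settings(task, aspect=None, direction="balanced", quality="final"):
    """Fill in the default aspect for a task and validate the mode names.

    "auto" is rejected here on the assumption that task detection
    has already replaced it with a concrete task.
    """
    if task not in DEFAULT_ASPECT:
        raise ValueError(f"unknown or unresolved task: {task!r}")
    if direction not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction!r}")
    if quality not in QUALITIES:
        raise ValueError(f"unknown quality: {quality!r}")
    return {
        "task": task,
        "aspect": aspect or DEFAULT_ASPECT[task],
        "direction": direction,
        "quality": quality,
    }
```

An explicit `aspect` always wins over the table, which matches how the usage examples in this document override the default per call.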

Usage


Quick start


bash
cd ~/.hermes/agents/multi-agent-image
python3 quick_start.py "AI训练营招生海报,强调速度、增长、实战"

Prompt-only compilation


bash
cd ~/.hermes/agents/multi-agent-image
python3 design_image.py \
  --task product \
  --brief "高端陶瓷咖啡杯电商首图,温暖晨光,突出釉面质感" \
  --prompt-only

Full orchestrated generation


python
from orchestrator_v2 import run

run("AI训练营招生海报,强调速度增长实战")

Force task and visual settings


python
from orchestrator_v2 import run

run(
    "高端咖啡杯商品图",
    task="product",
    direction="balanced",
    aspect="1:1",
    quality="final",
    use_reference=False,
)

Interactive Workflow


Use the two-phase pattern when Hermes should ask before generating.

Phase 1: prepare text for the user


python
from interactive_run import prepare

text = prepare("帮我做张 AI 训练营海报", task="poster")
print(text)

Phase 2: execute after the user chooses


python
from interactive_run import execute

result = execute("帮我做张 AI 训练营海报", user_choice="1", task="poster")
Supported reply patterns:
  • 1, 2, 3
  • n
  • y
  • case_001
  • 搜索蓝色

Batch Generation


Same brief, multiple directions


python
from batch_generator_v2 import batch_styles

batch_styles("AI训练营海报", task="poster")

Same brief, multiple aspect ratios


python
from batch_generator_v2 import batch_aspects

batch_aspects("AI训练营海报", task="poster", aspects=["1:1", "16:9", "9:16"])

Multiple briefs


python
from batch_generator_v2 import batch_briefs

batch_briefs(["海报A", "海报B", "海报C"], task="poster")

Series Generation


Use this when several outputs should feel like the same campaign or product family.
python
from series_generator import SeriesGenerator

sg = SeriesGenerator()
sg.create_series(
    master_brief="AI训练营系列视觉,科技蓝,专业商务感",
    items=[
        {"name": "主海报", "brief": "AI训练营招生主海报", "aspect": "3:4"},
        {"name": "Banner", "brief": "官网 Banner", "aspect": "16:9"},
        {"name": "朋友圈", "brief": "朋友圈推广方形图", "aspect": "1:1"},
    ],
    task="poster",
    direction="balanced",
)

Case Library


Case library directory:
text
~/.hermes/agents/multi-agent-image/case_library/
Output directory:
text
~/.hermes/agents/multi-agent-image/output/
Typical case structure:
text
case_library/
├── poster/
│   └── case_001_example/
│       ├── image.png
│       └── metadata.json
Typical metadata fields:
  • case_id
  • task
  • brief
  • prompt
  • params
  • tags
  • rating
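For readers who want to inspect or script against the library on disk, the layout above can be read and written with the standard library alone. The sketch below is an assumption-laden illustration: `save_case` and `load_cases` are hypothetical helpers, and using `case_id` directly as the folder name is a simplification (the example folder above carries an extra suffix).

```python
import json
from pathlib import Path


def save_case(root, metadata, image_bytes):
    """Write image.png and metadata.json under <root>/<task>/<case_id>/.

    Using case_id as the folder name is an assumption of this sketch.
    """
    case_dir = Path(root) / metadata["task"] / metadata["case_id"]
    case_dir.mkdir(parents=True, exist_ok=True)
    (case_dir / "image.png").write_bytes(image_bytes)
    (case_dir / "metadata.json").write_text(
        json.dumps(metadata, ensure_ascii=False, indent=2), encoding="utf-8"
    )
    return case_dir


def load_cases(root, task):
    """Read every metadata.json for a task, sorted by folder name."""
    return [
        json.loads(path.read_text(encoding="utf-8"))
        for path in sorted((Path(root) / task).glob("*/metadata.json"))
    ]
```

Writing with `ensure_ascii=False` and an explicit UTF-8 encoding keeps non-ASCII briefs readable in the JSON file.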

Validation Guidance


Before generating at scale, test prompt quality first:
bash
python3 design_image.py \
  --task poster \
  --brief "AI训练营招生海报,强调速度、增长、实战" \
  --direction balanced \
  --aspect 3:4 \
  --prompt-only
What to check:
  • Does design_reasoning state a clear communication goal?
  • Is there an explicit safe zone?
  • Is hierarchy obvious?
  • Do anti_slop_rules remove HUD overlays, fog, and generic clutter?
  • Does the prompt describe a single strong visual idea rather than a pile of elements?
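Some of these checks can be automated once the `design_reasoning` output has been loaded into a dict. The sketch below assumes that has already happened and that `anti_slop_rules` is a string or a list of strings; the exact output format of `design_image.py` is not specified here, and `check_reasoning` is a hypothetical helper. The last question, whether the prompt expresses one strong idea, still needs a human eye.

```python
def check_reasoning(design_reasoning):
    """Return a list of human-readable problems; an empty list means the checks passed."""
    problems = []
    # Goal, safe zone, and hierarchy must each be stated explicitly.
    for field in ("communication_goal", "safe_zone_strategy", "hierarchy_strategy"):
        if not str(design_reasoning.get(field, "")).strip():
            problems.append(f"missing {field}")
    rules = design_reasoning.get("anti_slop_rules", [])
    if isinstance(rules, str):
        rules = [rules]
    rules_text = " ".join(rules).lower()
    # The rules should name the usual offenders called out in the checklist.
    for term in ("hud", "fog", "clutter"):
        if term not in rules_text:
            problems.append(f"anti_slop_rules does not mention {term}")
    return problems
```

Keyword matching is a blunt instrument; it flags obviously incomplete reasoning but cannot judge whether a stated goal is actually clear.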

Current Limits


  • Image generation currently relies on a single provider, gpt-image-2
  • QA scoring is intentionally lightweight
  • Series generation is heavier than one-off generation
  • The skill is optimized for raster outputs, not editable assets
  • Some reference documents remain longer than necessary, but the main runtime path is consistent

Version History


  • v1.0.0: Initial multi-agent workflow for GPT-Image-2 generation
  • v2.0.0: Added case library, interactive reference selection, and image-to-image style reuse
  • v2.1.0: Added stronger download retry logic, batch workflows, and series generation
  • v2.2.0: Packaged as a reusable Hermes skill with install script and runtime layout
  • v3.0.0: Internalized the design compiler and removed external runtime dependency