multi-agent-image
Multi-Agent Image
`multi-agent-image` is designed for cases where a simple one-line prompt is not enough. Instead of sending raw user input directly to an image model, this skill:
- analyzes the request,
- compiles it into a design-aware prompt,
- generates through `gpt-image-2`,
- archives the result,
- and optionally reuses successful outputs as future style references.
This skill is independent at runtime. The design compiler is built into this repository and does not require an external skill.
When to Use
Use this skill when the user wants one or more of the following:
- Design-oriented poster generation
- Product images or ad visuals
- PPT cover visuals or chapter art
- Infographic-like or teaching/demo visuals
- Style reference reuse from prior generations
- Interactive “show examples first, then generate” flow
- Batch generation for multiple directions or aspect ratios
- Series generation where multiple images should share one visual language
Do not use this skill for:
- pixel-accurate UI recreation
- editable charts
- exact typography output inside the image
- tasks that require vector, HTML, or PPT-native assets rather than raster images
Architecture
```text
User Request
↓
[Prompt Engineer]
↓
[Style Scout]
↓
[Internal Design Compiler]
↓
[GPT-Image-2 Generation]
↓
[QA + Archive]
↓
[Case Library]
```
Optional layers on top of the main path:
- Interactive reference selection
- Batch generation
- Series generation
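The main path can be read as a chain of stages that each take the current state and return an enriched one. The sketch below illustrates only that shape: the stage functions, the `state` dictionary, and `run_pipeline` are hypothetical names, not the skill's actual code.

```python
from typing import Callable, Dict, List

Stage = Callable[[Dict], Dict]


def prompt_engineer(state: Dict) -> Dict:
    # Hypothetical stage: normalize the raw request into a brief.
    return {**state, "brief": state["request"].strip()}


def style_scout(state: Dict) -> Dict:
    # Hypothetical stage: this sketch never selects a reference.
    return {**state, "reference": None}


def design_compiler(state: Dict) -> Dict:
    # Hypothetical stage: stand-in for the real compiler output.
    return {**state, "prompt": f"Design brief: {state['brief']}"}


def run_pipeline(request: str, stages: List[Stage]) -> Dict:
    """Run the stages in order and record which ones executed."""
    state: Dict = {"request": request, "trace": []}
    for stage in stages:
        state = stage(state)
        state["trace"].append(stage.__name__)
    return state


result = run_pipeline(
    "  AI bootcamp poster  ",
    [prompt_engineer, style_scout, design_compiler],
)
print(result["trace"])
print(result["prompt"])
```

Keeping each stage a plain function makes it straightforward to wrap the same chain with optional layers such as batch or series runs.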
Setup
1. Deploy the skill
The skill source lives in:
```bash
~/.hermes/skills/multi-agent-image/
```
Install runtime files into the working directory:
```bash
python3 ~/.hermes/skills/multi-agent-image/scripts/install.py
```
This prepares:
- `~/.hermes/agents/multi-agent-image/output/`
- `~/.hermes/agents/multi-agent-image/case_library/`
- agent role folders and memory files
- local runtime scripts copied from the skill
2. Install Python dependencies
```bash
pip install openai requests
```
3. Set API key
```bash
export OPENAI_API_KEY="sk-..."
```
This key is used with the apimart-compatible GPT-Image-2 endpoints in this skill.
Core Components
scripts/design_compiler.py
Internal prompt compiler.
Responsibilities:
- detect task type
- choose defaults for aspect and quality
- build `design_reasoning`
- compress it into `compiled_brief`
- produce the final generation prompt
This is the core logic that makes the skill independent.
scripts/design_image.py
CLI entrypoint for the internal compiler.
Use it when you want:
- prompt-only output
- a local design compilation test
- direct generation without the full multi-agent workflow
Example:
```bash
cd ~/.hermes/agents/multi-agent-image
# Brief: "AI bootcamp enrollment poster emphasizing speed, growth, and hands-on practice"
python3 design_image.py \
  --task poster \
  --brief "AI训练营招生海报,强调速度、增长、实战" \
  --direction balanced \
  --aspect 3:4 \
  --prompt-only
```
It prints:
- `design_reasoning`
- `compiled_brief`
- `prompt`
- `settings`
scripts/orchestrator_v2.py
Main workflow entrypoint.
Responsibilities:
- run prompt analysis
- choose task and generation parameters
- optionally select a reference from the case library
- call the internal compiler
- call GPT-Image-2
- archive outputs
- auto-save successful results into the case library
scripts/gpt_image2_generator.py
Low-level GPT-Image-2 client.
Responsibilities:
- submit async generation tasks
- poll task status
- download image results
Use this when you want direct API access without the full workflow.
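The submit, poll, and download steps follow a common async-task pattern. Below is a minimal sketch of the polling step only, under stated assumptions: the `state` values (`pending`, `running`, `completed`, `failed`), the `image_url` key, and the `wait_for_task` helper are hypothetical and do not describe the real endpoint's response format. The status source is injected as a callable so the loop can be exercised without any network call.

```python
import time
from typing import Callable, Dict


def wait_for_task(
    get_status: Callable[[], Dict],
    interval_s: float = 2.0,
    timeout_s: float = 120.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict:
    """Poll get_status() until the task finishes, fails, or times out."""
    waited = 0.0
    while True:
        status = get_status()
        if status.get("state") == "completed":
            return status
        if status.get("state") == "failed":
            raise RuntimeError(status.get("error", "generation failed"))
        if waited >= timeout_s:
            raise TimeoutError(f"task not finished after {timeout_s}s")
        sleep(interval_s)
        waited += interval_s


# Demo with canned responses instead of a real HTTP call.
_responses = iter([
    {"state": "pending"},
    {"state": "running"},
    {"state": "completed", "image_url": "https://example.invalid/image.png"},
])
final = wait_for_task(lambda: next(_responses), sleep=lambda seconds: None)
print(final["state"])
```

In a real client, `get_status` would wrap the HTTP request for the task, and the download step would run once the loop returns.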
scripts/case_library.py
Persistent library of past generations.
Responsibilities:
- save outputs by task type
- store metadata and rating
- search by brief, prompt, or tags
- return image paths for reuse as references
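To make these responsibilities concrete, here is a small sketch of saving a case under its task type and searching it by text. It reuses the metadata field names this document lists (`case_id`, `task`, `brief`, `prompt`, `params`, `tags`, `rating`), but `save_case`, `search_cases`, and the substring matching rule are assumptions for illustration, not the interface of `case_library.py`.

```python
import json
import tempfile
from pathlib import Path
from typing import Dict, List


def save_case(root: Path, case_id: str, task: str, brief: str,
              prompt: str, tags: List[str], rating: int) -> Path:
    """Write metadata.json under <root>/<task>/<case_id>/ and return the folder."""
    case_dir = root / task / case_id
    case_dir.mkdir(parents=True, exist_ok=True)
    meta = {"case_id": case_id, "task": task, "brief": brief,
            "prompt": prompt, "params": {}, "tags": tags, "rating": rating}
    (case_dir / "metadata.json").write_text(
        json.dumps(meta, ensure_ascii=False, indent=2), encoding="utf-8")
    return case_dir


def search_cases(root: Path, query: str) -> List[Dict]:
    """Return cases whose brief, prompt, or tags contain the query, best rated first."""
    needle = query.lower()
    hits = []
    for meta_path in sorted(root.glob("*/*/metadata.json")):
        meta = json.loads(meta_path.read_text(encoding="utf-8"))
        haystack = " ".join([meta["brief"], meta["prompt"], *meta["tags"]]).lower()
        if needle in haystack:
            hits.append(meta)
    return sorted(hits, key=lambda m: m["rating"], reverse=True)


# Demo in a temporary directory, so no real library is touched.
library = Path(tempfile.mkdtemp())
save_case(library, "case_001_example", "poster", "AI bootcamp poster",
          "bold blue poster", ["blue", "tech"], rating=4)
save_case(library, "case_002_example", "product", "coffee cup hero image",
          "warm morning light", ["warm"], rating=5)
print([m["case_id"] for m in search_cases(library, "blue")])
```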
scripts/case_selector.py
Interactive helper for Hermes dialogue flows.
Responsibilities:
- render user-facing selection text
- parse replies like `1`, `n`, `case_001`, or `搜索蓝色` ("search blue")
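Reply parsing of this kind can be sketched as a small classifier. The version below covers the reply forms this document mentions (a number, `y`/`n`, a `case_NNN` id, and a `搜索…` search request), but the action names and the meaning attached to each reply are assumptions for illustration; `parse_reply` is a hypothetical helper, not the function exposed by `case_selector.py`.

```python
import re
from typing import Dict


def parse_reply(reply: str) -> Dict[str, str]:
    """Classify a user reply; the action names are illustrative only."""
    text = reply.strip()
    if re.fullmatch(r"[0-9]+", text):
        return {"action": "pick_index", "value": text}
    if text.lower() in ("y", "n"):
        action = "confirm" if text.lower() == "y" else "skip"
        return {"action": action, "value": ""}
    if re.fullmatch(r"case_[0-9]+", text):
        return {"action": "pick_case", "value": text}
    if text.startswith("搜索"):  # "搜索" means "search"
        return {"action": "search", "value": text[len("搜索"):].strip()}
    return {"action": "unknown", "value": text}


for sample in ["1", "n", "case_001", "搜索蓝色"]:
    print(sample, parse_reply(sample))
```

Returning an explicit `unknown` action lets the dialogue layer ask again instead of guessing.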
scripts/interactive_run.py
Two-phase dialogue wrapper.
Use it when the workflow needs to ask the user before generating.
scripts/batch_generator_v2.py
Batch generation entrypoint.
Supports:
- same brief, multiple directions
- same brief, multiple aspect ratios
- multiple briefs in one run
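All three modes amount to expanding a few lists into individual generation jobs. A minimal sketch of that expansion follows; `expand_jobs` and the job dictionary layout are hypothetical and only illustrate the idea.

```python
from itertools import product
from typing import Dict, List


def expand_jobs(briefs: List[str], directions: List[str],
                aspects: List[str]) -> List[Dict[str, str]]:
    """Build one job per (brief, direction, aspect) combination."""
    return [
        {"brief": brief, "direction": direction, "aspect": aspect}
        for brief, direction, aspect in product(briefs, directions, aspects)
    ]


# Same brief, three directions, one aspect ratio.
jobs = expand_jobs(
    ["AI bootcamp poster"],
    ["conservative", "balanced", "bold"],
    ["3:4"],
)
print(len(jobs))
```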
scripts/series_generator.py
Style-consistent series generator.
Workflow:
- generate a master image
- extract style signals from its compiled brief
- generate child images that follow the same visual system
templates/linear_batch.py
Editable template for resumable sequential runs.
Useful when you want:
- explicit scene lists
- filesystem-based progress monitoring
- style propagation from the first generated image
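A resumable sequential run with filesystem-based progress can be as simple as skipping scenes whose output file already exists. The sketch below shows that idea under stated assumptions: `run_resumable`, the `.part` temporary suffix, and the fake generator are hypothetical and are not taken from `linear_batch.py`.

```python
import tempfile
from pathlib import Path
from typing import Callable, List, Tuple


def run_resumable(scenes: List[Tuple[str, str]], out_dir: Path,
                  generate: Callable[[str], bytes]) -> List[str]:
    """Generate each (name, brief) scene in order, skipping finished files."""
    out_dir.mkdir(parents=True, exist_ok=True)
    generated = []
    for name, brief in scenes:
        target = out_dir / f"{name}.png"
        if target.exists():
            continue  # finished in an earlier run
        partial = target.with_name(target.name + ".part")
        partial.write_bytes(generate(brief))
        partial.replace(target)  # publish only complete files
        generated.append(name)
    return generated


def fake_generate(brief: str) -> bytes:
    # Stand-in for a real image generation call.
    return brief.encode("utf-8")


scenes = [("01_cover", "cover scene"), ("02_detail", "detail scene")]
out = Path(tempfile.mkdtemp())
first = run_resumable(scenes, out, fake_generate)
second = run_resumable(scenes, out, fake_generate)
print(first, second)
```

Because progress lives in the output folder, listing that folder is enough to see how far a run has got, and rerunning the script continues from the first missing file.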
Internal Design Compiler
The internal compiler produces three layers:
1. design_reasoning
This captures design intent before generation.
Typical fields:
- `task`
- `communication_goal`
- `audience`
- `channel`
- `visual_system`
- `hierarchy_strategy`
- `safe_zone_strategy`
- `lighting_strategy`
- `palette_strategy`
- `anti_filler_rules`
- `anti_slop_rules`
2. compiled_brief
This is a compressed design brief for generation.
It includes:
- what the image is for
- what should dominate visually
- what space should remain available
- what to avoid
3. prompt
Final model-facing prompt used for GPT-Image-2.
The prompt is generated from design logic, not just from a list of style keywords.
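One way to picture the three layers is as a structured record that is narrowed step by step. The sketch below is an assumption-laden illustration: `CompileResult`, `compile_design`, and every field value are hypothetical, and the real compiler's wording and rules are not reproduced here.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass
class CompileResult:
    design_reasoning: Dict[str, str]
    compiled_brief: str
    prompt: str


def compile_design(task: str, brief: str) -> CompileResult:
    # Layer 1: structured design intent (placeholder values).
    reasoning = {
        "task": task,
        "communication_goal": brief,
        "hierarchy_strategy": "one dominant subject, supporting elements secondary",
        "safe_zone_strategy": "keep the upper third clear for text added later",
        "anti_slop_rules": "no HUD overlays, no fog, no generic clutter",
    }
    # Layer 2: compress the reasoning into a short brief.
    compiled_brief = (
        f"Purpose: {reasoning['communication_goal']}. "
        f"Dominant: {reasoning['hierarchy_strategy']}. "
        f"Reserve: {reasoning['safe_zone_strategy']}. "
        f"Avoid: {reasoning['anti_slop_rules']}."
    )
    # Layer 3: the model-facing prompt built from the brief.
    prompt = f"A {task} image. {compiled_brief}"
    return CompileResult(reasoning, compiled_brief, prompt)


result = compile_design("poster", "AI bootcamp enrollment")
print(result.prompt)
```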
Supported Tasks
The built-in compiler understands these task classes:
- `poster`
- `product`
- `ppt`
- `infographic`
- `teaching`
- `auto`
Default aspect assumptions:
- `poster` → `3:4`
- `product` → `1:1`
- `ppt` → `16:9`
- `infographic` → `4:3`
- `teaching` → `16:9`
Direction modes:
- `conservative`
- `balanced`
- `bold`
Quality modes:
- `draft`
- `final`
- `premium`
Current generation channel:
- `gpt-image-2`
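The defaults above map naturally onto a lookup table with validation. The sketch below is illustrative: `resolve_settings` is a hypothetical helper, the `balanced` and `final` fallbacks are assumptions, and the `auto` task is left out because this document does not describe how it is resolved.

```python
from typing import Dict, Optional

DEFAULT_ASPECT = {
    "poster": "3:4",
    "product": "1:1",
    "ppt": "16:9",
    "infographic": "4:3",
    "teaching": "16:9",
}
DIRECTIONS = ("conservative", "balanced", "bold")
QUALITIES = ("draft", "final", "premium")


def resolve_settings(task: str, aspect: Optional[str] = None,
                     direction: str = "balanced",
                     quality: str = "final") -> Dict[str, str]:
    """Fill in the default aspect ratio and reject unknown modes."""
    if task not in DEFAULT_ASPECT:
        raise ValueError(f"unsupported task in this sketch: {task}")
    if direction not in DIRECTIONS:
        raise ValueError(f"unknown direction: {direction}")
    if quality not in QUALITIES:
        raise ValueError(f"unknown quality: {quality}")
    return {
        "task": task,
        "aspect": aspect or DEFAULT_ASPECT[task],
        "direction": direction,
        "quality": quality,
    }


print(resolve_settings("poster"))
print(resolve_settings("ppt", aspect="4:3", direction="bold"))
```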
Usage
Quick start
```bash
cd ~/.hermes/agents/multi-agent-image
# Brief: "AI bootcamp enrollment poster emphasizing speed, growth, and hands-on practice"
python3 quick_start.py "AI训练营招生海报,强调速度、增长、实战"
```
Prompt-only compilation
```bash
cd ~/.hermes/agents/multi-agent-image
# Brief: "E-commerce hero image of a premium ceramic coffee cup, warm morning light, highlighting the glaze texture"
python3 design_image.py \
  --task product \
  --brief "高端陶瓷咖啡杯电商首图,温暖晨光,突出釉面质感" \
  --prompt-only
```
Full orchestrated generation
```python
from orchestrator_v2 import run

# Brief: "AI bootcamp enrollment poster emphasizing speed, growth, and hands-on practice"
run("AI训练营招生海报,强调速度增长实战")
```
Force task and visual settings
```python
from orchestrator_v2 import run

run(
    "高端咖啡杯商品图",  # "premium coffee cup product image"
    task="product",
    direction="balanced",
    aspect="1:1",
    quality="final",
    use_reference=False,
)
```
Interactive Workflow
Use the two-phase pattern when Hermes should ask before generating.
Phase 1: prepare text for the user
```python
from interactive_run import prepare

# Request: "Make me an AI bootcamp poster"
text = prepare("帮我做张 AI 训练营海报", task="poster")
print(text)
```
Phase 2: execute after the user chooses
```python
from interactive_run import execute

# Request: "Make me an AI bootcamp poster"
result = execute("帮我做张 AI 训练营海报", user_choice="1", task="poster")
```
Supported reply patterns:
- `1`, `2`, `3`
- `n`
- `y`
- `case_001`
- `搜索蓝色` ("search blue")
Batch Generation
Same brief, multiple directions
```python
from batch_generator_v2 import batch_styles

# Brief: "AI bootcamp poster"
batch_styles("AI训练营海报", task="poster")
```
Same brief, multiple aspect ratios
```python
from batch_generator_v2 import batch_aspects

# Brief: "AI bootcamp poster"
batch_aspects("AI训练营海报", task="poster", aspects=["1:1", "16:9", "9:16"])
```
Multiple briefs
```python
from batch_generator_v2 import batch_briefs

# Briefs: "Poster A", "Poster B", "Poster C"
batch_briefs(["海报A", "海报B", "海报C"], task="poster")
```
Series Generation
Use this when several outputs should feel like the same campaign or product family.
```python
from series_generator import SeriesGenerator

sg = SeriesGenerator()
sg.create_series(
    # "AI bootcamp series visuals, tech blue, professional business feel"
    master_brief="AI训练营系列视觉,科技蓝,专业商务感",
    items=[
        # Main poster: "main AI bootcamp enrollment poster"
        {"name": "主海报", "brief": "AI训练营招生主海报", "aspect": "3:4"},
        # Banner: "official website banner"
        {"name": "Banner", "brief": "官网 Banner", "aspect": "16:9"},
        # WeChat Moments: "square promo image for WeChat Moments"
        {"name": "朋友圈", "brief": "朋友圈推广方形图", "aspect": "1:1"},
    ],
    task="poster",
    direction="balanced",
)
```
Case Library
Case library directory:
```text
~/.hermes/agents/multi-agent-image/case_library/
```
Output directory:
```text
~/.hermes/agents/multi-agent-image/output/
```
Typical case structure:
```text
case_library/
├── poster/
│   └── case_001_example/
│       ├── image.png
│       └── metadata.json
```
Typical metadata fields:
- `case_id`
- `task`
- `brief`
- `prompt`
- `params`
- `tags`
- `rating`
Validation Guidance
Before generating at scale, test prompt quality first:
```bash
# Brief: "AI bootcamp enrollment poster emphasizing speed, growth, and hands-on practice"
python3 design_image.py \
  --task poster \
  --brief "AI训练营招生海报,强调速度、增长、实战" \
  --direction balanced \
  --aspect 3:4 \
  --prompt-only
```
What to check:
- Does `design_reasoning` state a clear communication goal?
- Is there an explicit safe zone?
- Is hierarchy obvious?
- Do `anti_slop_rules` remove HUD overlays, fog, and generic clutter?
- Does the prompt describe a single strong visual idea rather than a pile of elements?
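Part of this checklist can be automated as a quick pre-flight check. The sketch below assumes the `design_reasoning` output has already been loaded into a Python dictionary; how the CLI serializes it is not specified here, and `review_reasoning` is a hypothetical helper. It only flags missing or empty fields, so judging whether the visual idea is strong still needs a human eye.

```python
from typing import Dict, List

REQUIRED_FIELDS = (
    "communication_goal",
    "safe_zone_strategy",
    "hierarchy_strategy",
    "anti_slop_rules",
)


def review_reasoning(design_reasoning: Dict[str, str]) -> List[str]:
    """Return the checklist fields that are missing or empty."""
    return [
        field for field in REQUIRED_FIELDS
        if not str(design_reasoning.get(field, "")).strip()
    ]


sample = {
    "communication_goal": "drive bootcamp sign-ups",
    "safe_zone_strategy": "upper third left clear",
    "hierarchy_strategy": "",
    "anti_slop_rules": "no HUD overlays, no fog",
}
print(review_reasoning(sample))
```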
Current Limits
- Current image provider is centered on `gpt-image-2`
- QA scoring is intentionally lightweight
- Series generation is heavier than one-off generation
- The skill is optimized for raster outputs, not editable assets
- Some reference documents remain longer than necessary, but the main runtime path is consistent
Version History
- `v1.0.0`: Initial multi-agent workflow for GPT-Image-2 generation
- `v2.0.0`: Added case library, interactive reference selection, and image-to-image style reuse
- `v2.1.0`: Added stronger download retry logic, batch workflows, and series generation
- `v2.2.0`: Packaged as a reusable Hermes skill with install script and runtime layout
- `v3.0.0`: Internalized the design compiler and removed external runtime dependency