sn-image-imitate

A tier-1 scene skill for image style imitation. It relies on the sn-image-recognize, sn-text-optimize, and sn-image-generate tools provided by sn-image-base (tier 0).
Features:
  • Extracts a high-fidelity long caption from a reference image
  • Rewrites the caption according to the user's requested content change while preserving style and layout
  • Enforces layout-lock constraints during caption rewrite
  • Performs post-generation layout consistency review and bounded retries
  • Returns structured process artifacts for debugging and reproducibility

Non-goals

  • Pure neural style transfer without content change (use dedicated style-transfer tools instead)
  • Local editing / inpainting of specific regions within the reference image
  • Processing video or animation input (only single static images are supported)
  • Batch generation from multiple reference images in one invocation
  • Guaranteeing pixel-level fidelity to the reference; the skill targets layout and style consistency, not exact reproduction

Input Specification

  • reference_image (string, required): local path or URL of the style reference image
  • target_content (string, required): new content the user wants in the generated image
  • output_mode (string, default friendly): output mode, either friendly or verbose
  • aspect_ratio (string, default 16:9): output aspect ratio for generation
  • image_size (string, default 2k): output image size preset
  • max_attempts (int, default 3): maximum number of generation attempts for meeting layout consistency
  • layout_threshold (float, default 0.75): minimum layout similarity score required to accept a result
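The parameter list above can be sketched as a small normalization step. This is a minimal illustration, not the skill's actual code: the names ImitateParams and normalize_params are hypothetical, and the bounds on max_attempts and layout_threshold are assumptions, since the specification only gives types and defaults.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImitateParams:
    reference_image: str
    target_content: str
    output_mode: str = "friendly"
    aspect_ratio: str = "16:9"
    image_size: str = "2k"
    max_attempts: int = 3
    layout_threshold: float = 0.75


def normalize_params(raw: dict) -> ImitateParams:
    """Apply the documented defaults, then validate the result."""
    # Treat an explicit None the same as a missing key.
    given = {k: v for k, v in raw.items() if v is not None}
    params = ImitateParams(
        reference_image=str(given.get("reference_image", "")).strip(),
        target_content=str(given.get("target_content", "")).strip(),
        output_mode=str(given.get("output_mode", "friendly")),
        aspect_ratio=str(given.get("aspect_ratio", "16:9")),
        image_size=str(given.get("image_size", "2k")),
        max_attempts=int(given.get("max_attempts", 3)),
        layout_threshold=float(given.get("layout_threshold", 0.75)),
    )
    if not params.reference_image:
        raise ValueError("reference_image is required")
    if not params.target_content:
        raise ValueError("target_content must be non-empty")
    if params.output_mode not in ("friendly", "verbose"):
        raise ValueError("output_mode must be 'friendly' or 'verbose'")
    # Assumed bounds: the specification does not state them explicitly.
    if params.max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    if not 0.0 <= params.layout_threshold <= 1.0:
        raise ValueError("layout_threshold must be within [0, 1]")
    return params
```

Calling normalize_params with only the two required keys yields an object carrying every documented default, which can then be handed to the Worker Agent as a single unit.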

Environment Variable

Dependency installation and API key configuration are covered by the sn-image-base skill.
The minimum environment variables needed to run the sn-image-base skill with the SenseNova Token Plan:
ini
SN_BASE_URL="https://token.sensenova.cn/v1"
SN_API_KEY="your-api-key"
Fallback priority is dedicated variable > domain shared variable > global variable:
  • Text calls use SN_TEXT_API_KEY -> SN_CHAT_API_KEY -> SN_API_KEY
  • Vision calls use SN_VISION_API_KEY -> SN_CHAT_API_KEY -> SN_API_KEY
  • Image generation uses SN_IMAGE_GEN_API_KEY -> SN_API_KEY
For more configuration options, refer to the Python dependencies and API keys section in sn-image-generate_en.md.
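The fallback order described above can be illustrated with a short lookup. This is only a sketch of the documented priority, not how sn-image-base resolves keys internally; the function name resolve_api_key and the call-type labels "text", "vision", and "image_gen" are hypothetical.

```python
import os

# Documented fallback chains, most specific variable first.
KEY_CHAINS = {
    "text": ("SN_TEXT_API_KEY", "SN_CHAT_API_KEY", "SN_API_KEY"),
    "vision": ("SN_VISION_API_KEY", "SN_CHAT_API_KEY", "SN_API_KEY"),
    "image_gen": ("SN_IMAGE_GEN_API_KEY", "SN_API_KEY"),
}


def resolve_api_key(call_type, env=None):
    """Return (variable_name, value) for the first non-empty variable, or None."""
    env = os.environ if env is None else env
    for name in KEY_CHAINS[call_type]:
        value = env.get(name)
        if value:
            return name, value
    return None
```

Returning the variable name alongside the value makes it easy to log which level of the chain was actually used without printing the secret itself.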

API Configuration

All API calls in this skill are executed through sn_agent_runner.py of the sn-image-base skill. Refer to the sn-image-base skill (README.md) for more details.
  • VLM call: sn-image-recognize (Steps 1 and 3)
  • LLM call: sn-text-optimize (Step 2)
  • Image generation call: sn-image-generate (Step 3)
When encountering MissingApiKeyError or needing explicit model control: pass model and auth parameters explicitly via CLI arguments. See $SN_IMAGE_BASE/references/api_spec.md.
$SN_IMAGE_BASE path explanation: $SN_IMAGE_BASE is the installation directory of the sn-image-base skill (the directory where SKILL.md exists). The agent can locate this path by the skill name sn-image-base.

Architecture: Main Agent + Worker Agent

This skill uses a two-tier agent architecture:
  • Main Agent: receives the user request, normalizes parameters, sends the preflight message, invokes the Worker Agent, and sends the final text/image to the user
  • Worker Agent: executes a fixed 3-step pipeline and returns structured JSON
Responsibility Boundaries:
  • Worker Agent does not send any user-visible message directly
  • Main Agent sends all user-facing responses
  • The Worker Agent's last message must be the JSON string defined in Return Contract, and nothing else
  • Worker Agent executes VLM/LLM/image calls directly; no nested subagent for these low-level calls

Workflow

Main Agent Workflow

  1. Extract reference_image, target_content, output_mode (default friendly), aspect_ratio (default 16:9), image_size (default 2k), max_attempts (default 3), and layout_threshold (default 0.75)
  2. Validate required inputs:
    • reference_image is provided and resolvable
    • target_content is non-empty
  3. Send preflight message:
    "Using sn-image-imitate skill to generate a style-consistent image, please wait..."
  4. Start Worker Agent with full normalized parameters and working directory
  5. On Worker result:
    • status=ok: send final summary and generated image
    • status=error: report the actual error
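Step 5 of the workflow above can be sketched as a small dispatcher over the Worker Agent's final message. The function name handle_worker_result and the "action" labels are hypothetical; how the Main Agent actually sends text and images depends on the host agent framework and is not shown.

```python
import json


def handle_worker_result(last_message: str) -> dict:
    """Map the Worker Agent's final message to a user-facing action."""
    try:
        payload = json.loads(last_message)
    except json.JSONDecodeError as exc:
        # The contract requires bare JSON; anything else is reported as-is.
        return {"action": "report_error", "error": f"worker returned non-JSON output: {exc}"}
    if not isinstance(payload, dict):
        return {"action": "report_error", "error": "worker output is not a JSON object"}

    if payload.get("status") == "ok":
        image = (payload.get("result") or {}).get("image")
        if not image:
            return {"action": "report_error", "error": "status=ok but result.image is missing"}
        return {
            "action": "send_image",
            "image": image,
            "output_mode": payload.get("output_mode", "friendly"),
        }
    # status=error (or anything unexpected): surface the real error text.
    return {"action": "report_error", "error": payload.get("error", "unknown worker error")}
```

Keeping this step free of side effects means the decision logic can be checked without a running agent; the caller performs the actual send.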

Worker Agent Workflow

Worker Agent receives reference_image, target_content, output_mode, aspect_ratio, image_size, max_attempts, layout_threshold, and the working directory of this skill ($SKILL_DIR).
Error Handling Strategy:
All sn_agent_runner.py calls share the same error handling rules:
  • If the subprocess exits with a non-zero code, crashes, or times out: do not fall back; return status=error with the actual error message from stderr or the system error string
  • If the subprocess returns invalid JSON, or the JSON lacks the expected result field: return status=error; do not silently continue with empty or default values
  • If the VLM review call fails during Step 3, treat the attempt as incomplete: do not record a score, and either retry the review once or skip to the next attempt, depending on the remaining budget

Step 0 — Initialization

  1. Generate task_id with format YYYYMMDD_HHMMSS
  2. Create temp directory /tmp/openclaw/sn-image-imitate/<task_id>/ as TEMP_DIR
  3. Resolve and normalize REFERENCE_IMAGE
  4. Persist user request:
bash
echo "$TARGET_CONTENT" > "$TEMP_DIR/target-content.txt"
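For agents that drive the pipeline from Python rather than the shell, the initialization steps above might look like the following sketch. The function name init_task and its base_dir and now parameters are hypothetical, added so the behavior can be exercised outside /tmp.

```python
from datetime import datetime
from pathlib import Path


def init_task(target_content, base_dir="/tmp/openclaw/sn-image-imitate", now=None):
    """Create the per-task temp directory and persist the user request."""
    now = now or datetime.now()
    task_id = now.strftime("%Y%m%d_%H%M%S")  # YYYYMMDD_HHMMSS
    temp_dir = Path(base_dir) / task_id
    temp_dir.mkdir(parents=True, exist_ok=True)
    # Trailing newline mirrors what `echo` writes in the shell version.
    (temp_dir / "target-content.txt").write_text(target_content + "\n", encoding="utf-8")
    return task_id, temp_dir
```

Because task_id has one-second resolution, two tasks started within the same second would share a directory; callers that run tasks concurrently may want an extra suffix.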

Step 1 — Image Annotation (long caption + layout blueprint)

Use prompts/image_annotate.md as the system prompt and call sn-image-recognize on the reference image.
bash
python "$SN_IMAGE_BASE/scripts/sn_agent_runner.py" sn-image-recognize \
  --system-prompt-path "$SKILL_DIR/prompts/image_annotate.md" \
  --user-prompt "Please annotate this reference image and follow the required output format." \
  --images "$REFERENCE_IMAGE" \
  --output-format json
Parse the JSON result field, then parse its three blocks:
  • SHORT_CAPTION: ...
  • LONG_CAPTION: ...
  • LAYOUT_BLUEPRINT_JSON: { ... }
If parsing fails, LONG_CAPTION is empty, or LAYOUT_BLUEPRINT_JSON is invalid JSON, return status=error.
Persist outputs:
bash
echo "$SHORT_CAPTION" > "$TEMP_DIR/reference-short-caption.txt"
echo "$LONG_CAPTION" > "$TEMP_DIR/reference-long-caption.txt"
echo "$LAYOUT_BLUEPRINT_JSON" > "$TEMP_DIR/layout-blueprint.json"
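The three-block parse described in this step can be sketched as follows. The exact output layout is defined by prompts/image_annotate.md, which is not reproduced here, so this sketch assumes the three labels appear once each, in the order listed, with the blueprint as the final JSON object. The name parse_annotation is hypothetical.

```python
import json
import re

# Assumed layout: the three labels appear once each, in this order.
_BLOCKS = re.compile(
    r"SHORT_CAPTION:\s*(?P<short>.*?)\s*"
    r"LONG_CAPTION:\s*(?P<long>.*?)\s*"
    r"LAYOUT_BLUEPRINT_JSON:\s*(?P<blueprint>\{.*\})",
    re.DOTALL,
)


def parse_annotation(result_text):
    """Return (short_caption, long_caption, blueprint_dict) or raise ValueError."""
    match = _BLOCKS.search(result_text)
    if match is None:
        raise ValueError("annotation output is missing one of the three blocks")
    long_caption = match["long"].strip()
    if not long_caption:
        raise ValueError("LONG_CAPTION is empty")
    try:
        blueprint = json.loads(match["blueprint"])
    except json.JSONDecodeError as exc:
        raise ValueError(f"LAYOUT_BLUEPRINT_JSON is invalid JSON: {exc}") from exc
    return match["short"].strip(), long_caption, blueprint
```

Each ValueError corresponds to one of the conditions under which the step returns status=error, so the caller can pass the message straight into the error response.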

Step 2 — New long caption generation (content rewrite with layout lock)

Goal: preserve the style, layout, and visual language of the reference long caption while replacing the core content with target_content.
Hard constraints to preserve (guided by layout-blueprint.json):
  • visual hierarchy (title/subtitle/body emphasis order)
  • region topology (number of major blocks and their relative positions)
  • reading flow (left-to-right / top-to-bottom / radial / timeline direction)
  • chart type and data encoding form (if present)
  • spacing rhythm and alignment pattern
  • major region bounding boxes and topological relations from blueprint
Preferred system prompt: prompts/caption_rewrite.md (recommended to add). If it is missing, use this inline fallback system prompt:
Rewrite the long caption by preserving style and layout constraints while replacing semantic content according to user target. Do not change block topology, reading order, or visual hierarchy. Keep the caption detailed and directly usable for image generation.
Call sn-text-optimize:
bash
python "$SN_IMAGE_BASE/scripts/sn_agent_runner.py" sn-text-optimize \
  --system-prompt-path "$SKILL_DIR/prompts/caption_rewrite.md" \
  --user-prompt "Reference long caption:\n$LONG_CAPTION\n\nLayout blueprint JSON:\n$LAYOUT_BLUEPRINT_JSON\n\nTarget content:\n$TARGET_CONTENT\n\nReturn only the rewritten long caption." \
  --output-format json
Parse the JSON result field as NEW_LONG_CAPTION. If it is empty, return status=error.
Persist output:
bash
echo "$NEW_LONG_CAPTION" > "$TEMP_DIR/new-long-caption.txt"
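The prompt handling in this step can also be sketched in Python. Building the user prompt this way yields real newlines, whereas "\n" inside a double-quoted bash string stays a literal backslash followed by n unless the receiving tool unescapes it. The function names below are hypothetical, and how an inline fallback prompt is passed to sn_agent_runner.py is not covered here.

```python
import json
from pathlib import Path

# Inline fallback text quoted from this step.
FALLBACK_REWRITE_PROMPT = (
    "Rewrite the long caption by preserving style and layout constraints while "
    "replacing semantic content according to user target. Do not change block "
    "topology, reading order, or visual hierarchy. Keep the caption detailed and "
    "directly usable for image generation."
)


def load_rewrite_system_prompt(skill_dir):
    """Prefer prompts/caption_rewrite.md; fall back to the inline prompt."""
    path = Path(skill_dir) / "prompts" / "caption_rewrite.md"
    if path.is_file():
        return path.read_text(encoding="utf-8")
    return FALLBACK_REWRITE_PROMPT


def build_rewrite_user_prompt(long_caption, blueprint, target_content):
    """Assemble the user prompt with the same sections as the shell command."""
    return (
        f"Reference long caption:\n{long_caption}\n\n"
        f"Layout blueprint JSON:\n{json.dumps(blueprint, ensure_ascii=False)}\n\n"
        f"Target content:\n{target_content}\n\n"
        "Return only the rewritten long caption."
    )
```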

Step 3 — Image Generation and Layout Review Loop

Run attempts 1 through max_attempts sequentially:
Generate Image (using the sn-image-generate tool from sn-image-base):
bash
python "$SN_IMAGE_BASE/scripts/sn_agent_runner.py" sn-image-generate \
  --prompt "$CURRENT_PROMPT" \
  --aspect-ratio "$ASPECT_RATIO" \
  --image-size "$IMAGE_SIZE" \
  --save-path "$TEMP_DIR/attempt_<N>.png" \
  --output-format json
VLM configuration requirements:
  • When max_attempts > 1, VLM review is required for each attempt
  • Select the VLM model from the OpenClaw configuration and pass it as the image recognition parameter
  • If no suitable VLM model exists in the OpenClaw configuration:
    • Notify the user that the current parameter combination cannot be executed
    • Suggest adding a VLM configuration or setting max_attempts to 1 to skip review
  • If the VLM call times out or fails: do not fall back; report the real error directly
Layout Consistency Review (only executed when max_attempts > 1):
Review the candidate against the reference using prompts/layout_review.md (with the blueprint as the structural oracle):
bash
python "$SN_IMAGE_BASE/scripts/sn_agent_runner.py" sn-image-recognize \
  --system-prompt-path "$SKILL_DIR/prompts/layout_review.md" \
  --user-prompt "Reference is image[0], candidate is image[1]. Layout blueprint JSON:\n$LAYOUT_BLUEPRINT_JSON\n\nEvaluate layout similarity and return JSON only." \
  --images "$REFERENCE_IMAGE" "$TEMP_DIR/attempt_<N>.png" \
  --output-format json
Expected review JSON (inside result):
json
{
  "layout_similarity_score": 0.0,
  "style_similarity_score": 0.0,
  "pass": false,
  "major_deviations": [],
  "fix_hints": []
}
Save Attempt Result:
json
{
  "attempt": 1,
  "image": "$TEMP_DIR/attempt_1.png",
  "layout_similarity_score": 0.0,
  "style_similarity_score": 0.0,
  "pass": false,
  "major_deviations": [],
  "timing": {
    "image_generation": { "elapsed_seconds": 12.34, "model": "sn_image_model" },
    "vlm_review": { "elapsed_seconds": 5.67, "model": "sensenova-122b" }
  }
}
Note: elapsed_seconds is read from the --output-format json return of each CLI call; image_generation.model is fixed to the hardcoded placeholder "sn_image_model" (sn-image-generate does not return the model field); vlm_review.model is read from the JSON return of sn-image-recognize. timing.vlm_review is omitted when max_attempts=1.
Early Termination Check (only executed when max_attempts > 1):
Pass criteria:
  • layout_similarity_score >= layout_threshold
  • pass = true
Actions:
  • If pass: immediately exit the loop, do not continue generating
  • If fail and attempts remain, append correction hints to the prompt:
text
Layout correction requirements:
- <fix_hint_1>
- <fix_hint_2>
...
  • If all attempts fail to pass the threshold, return the highest-score candidate and mark layout_passed=false
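The generate, review, and retry loop can be sketched with the two tool calls injected as plain callables. This is a minimal illustration covering only the reviewed path (max_attempts > 1). The function name run_attempts is hypothetical, the two pass criteria are assumed to be conjunctive, and hints are appended to the base prompt on each retry rather than accumulated, which is one possible reading of "append correction hints to prompt". Review-failure handling is left out.

```python
def run_attempts(base_prompt, max_attempts, layout_threshold, generate, review):
    """Generate up to max_attempts images, stopping at the first layout pass.

    generate(prompt, n) -> image path; review(image) -> review dict shaped
    like the "Expected review JSON" in this step.
    """
    attempts = []
    prompt = base_prompt
    for n in range(1, max_attempts + 1):
        image = generate(prompt, n)
        verdict = review(image)
        attempts.append({
            "attempt": n,
            "image": image,
            "layout_similarity_score": verdict["layout_similarity_score"],
            "style_similarity_score": verdict["style_similarity_score"],
            "pass": verdict["pass"],
            "major_deviations": verdict["major_deviations"],
        })
        # Assumed conjunctive: the reviewer says pass AND the score clears the bar.
        if verdict["pass"] and verdict["layout_similarity_score"] >= layout_threshold:
            return {"selected_attempt": n, "image": image,
                    "layout_passed": True, "attempts": attempts}
        hints = verdict.get("fix_hints") or []
        if hints and n < max_attempts:
            bullet_list = "\n".join(f"- {hint}" for hint in hints)
            prompt = f"{base_prompt}\n\nLayout correction requirements:\n{bullet_list}"

    # Nothing passed: fall back to the best layout score seen.
    best = max(attempts, key=lambda a: a["layout_similarity_score"])
    return {"selected_attempt": best["attempt"], "image": best["image"],
            "layout_passed": False, "attempts": attempts}
```

Injecting generate and review keeps the control flow separate from the CLI calls, so the early-exit and best-candidate logic can be checked with stub functions.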

Return Contract

Worker Agent final response must be bare JSON (no extra text, no code fence).

Normal Flow

json
{
  "status": "ok",
  "need_main_agent_send": true,
  "output_mode": "friendly|verbose",
  "result": {
    "image": "/tmp/openclaw/sn-image-imitate/<task_id>/attempt_2.png",
    "reference_image": "<resolved_reference_image>",
    "reference_short_caption": "<short caption from step 1>",
    "reference_long_caption": "<long caption from step 1>",
    "layout_blueprint": { "...": "..." },
    "new_long_caption": "<rewritten long caption from step 2>",
    "layout_passed": true,
    "selected_attempt": 2
  },
  "attempts": [
    {
      "attempt": 1,
      "image": "/tmp/openclaw/sn-image-imitate/<task_id>/attempt_1.png",
      "layout_similarity_score": 0.62,
      "style_similarity_score": 0.79,
      "pass": false,
      "major_deviations": ["center panel too narrow", "title block moved to top-right"]
    },
    {
      "attempt": 2,
      "image": "/tmp/openclaw/sn-image-imitate/<task_id>/attempt_2.png",
      "layout_similarity_score": 0.81,
      "style_similarity_score": 0.84,
      "pass": true,
      "major_deviations": []
    }
  ],
  "review": {
    "threshold": 0.75
  },
  "timing": {
    "total_elapsed_seconds": 24.56,
    "annotate": { "elapsed_seconds": 3.21, "model": "sensenova-122b" },
    "rewrite": { "elapsed_seconds": 2.45, "model": "sensenova-122b" },
    "generation_total": { "elapsed_seconds": 11.90, "model": "sn_image_model" },
    "review_total": { "elapsed_seconds": 7.00, "model": "sensenova-122b" }
  }
}

Error Flow

json
{
  "status": "error",
  "error": "<actual_error_message>"
}
Rules:
  • status=ok must include need_main_agent_send: true
  • result.image must be an existing generated image path
  • timing.total_elapsed_seconds covers full worker execution
  • If parsing of the Step 1 format fails (including invalid blueprint JSON), return status=error (do not silently continue)
  • attempts must record each generation + review attempt
  • If no attempt passes the threshold, return the highest-score candidate and set result.layout_passed=false
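Several of these rules can be checked mechanically before the Worker Agent emits its final message. The checker below is a hedged sketch: the name contract_violations is hypothetical, it covers only the rules that can be verified from the payload itself, and requiring a non-empty error string on status=error is an assumption drawn from the Error Flow example.

```python
import os


def contract_violations(payload):
    """Return a list of human-readable problems; an empty list means compliant."""
    status = payload.get("status")
    if status == "error":
        # Assumed from the Error Flow example: an error message must be present.
        return [] if payload.get("error") else ["status=error requires an error message"]
    if status != "ok":
        return [f"unknown status: {status!r}"]

    problems = []
    if payload.get("need_main_agent_send") is not True:
        problems.append("status=ok must include need_main_agent_send: true")
    image = (payload.get("result") or {}).get("image")
    if not image or not os.path.isfile(image):
        problems.append("result.image must be an existing generated image path")
    if not payload.get("attempts"):
        problems.append("attempts must record each generation + review attempt")
    if "total_elapsed_seconds" not in (payload.get("timing") or {}):
        problems.append("timing.total_elapsed_seconds is missing")
    return problems
```

Returning a list rather than raising on the first problem lets the worker report every violation at once in its error message.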

Output Format

friendly mode (default)

  • One concise sentence: the generated image follows the reference style and is updated to the requested content
  • Mention whether layout consistency passed the threshold, and the attempt count
  • Send a single image: result.image

verbose mode

Style imitation result
---
Reference short caption: <reference_short_caption>
---
Style/layout cues:
<brief extraction from reference_long_caption + layout_blueprint>
---
New long caption:
<new_long_caption>
---
#1 attempt=<n> layout_score=<0.00> style_score=<0.00> pass=<true|false> [selected]
  deviations: <major_deviations or none>
#2 attempt=<n> layout_score=<0.00> style_score=<0.00> pass=<true|false>
  deviations: <major_deviations or none>
...
---
Layout threshold: <0.75> | Passed: <true|false> | Selected: attempt <n>
Time statistics: Total <total>s | Annotation <t>s | Rewrite <t>s | Generation <t>s×<n> attempts | Review <t>s×<n> attempts
---
Images (selected image)
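The per-attempt lines and the threshold summary in the template above can be rendered from the Return Contract payload. This sketch covers only those two parts; the function name render_attempt_section is hypothetical, and joining deviations with "; " is an assumption, since the template does not specify a separator.

```python
def render_attempt_section(payload):
    """Render the attempt lines and the threshold summary of verbose mode."""
    selected = payload["result"]["selected_attempt"]
    lines = []
    for index, attempt in enumerate(payload["attempts"], start=1):
        mark = " [selected]" if attempt["attempt"] == selected else ""
        lines.append(
            f"#{index} attempt={attempt['attempt']} "
            f"layout_score={attempt['layout_similarity_score']:.2f} "
            f"style_score={attempt['style_similarity_score']:.2f} "
            f"pass={str(attempt['pass']).lower()}{mark}"
        )
        # Assumed separator: the template does not say how to join deviations.
        deviations = "; ".join(attempt["major_deviations"]) or "none"
        lines.append(f"  deviations: {deviations}")
    lines.append("---")
    lines.append(
        f"Layout threshold: {payload['review']['threshold']:.2f} | "
        f"Passed: {str(payload['result']['layout_passed']).lower()} | "
        f"Selected: attempt {selected}"
    )
    return "\n".join(lines)
```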

Call Relationship

  • Bottom-level dependency: sn-image-base (sn-image-recognize, sn-text-optimize, sn-image-generate)

References

  • prompts/image_annotate.md - Image annotation + layout blueprint system prompt (Step 1, required)
  • prompts/caption_rewrite.md - Caption rewrite system prompt with layout-lock constraints (Step 2, required)
  • prompts/layout_review.md - Candidate-vs-reference layout/style review prompt (Step 3, required)
  • ../sn-image-base/SKILL.md - Base tool behavior and parameter defaults