sn-image-imitate

Image style imitation scene skill (tier 1). It relies on the `sn-image-recognize`, `sn-text-optimize`, and `sn-image-generate` tools provided by `sn-image-base` (tier 0).

Features:

- Extracts a high-fidelity long caption from a reference image
- Rewrites the caption according to the user's requested content change while preserving style and layout
- Enforces layout-lock constraints during the caption rewrite
- Performs a post-generation layout consistency review with bounded retries
- Returns structured process artifacts for debugging and reproducibility

Non-goals

- Pure neural style transfer without content change (use dedicated style-transfer tools instead)
- Local editing / inpainting of specific regions within the reference image
- Processing video or animation input (only single static images are supported)
- Batch generation from multiple reference images in one invocation
- Guaranteeing pixel-level fidelity to the reference; the skill targets layout and style consistency, not exact reproduction

Input Specification

- `reference_image` (string, required): local path or URL of the style reference image
- `target_content` (string, required): the new content the user wants in the generated image
- `output_mode` (string, default `friendly`): output mode, `friendly` or `verbose`
- `aspect_ratio` (string, default `16:9`): output aspect ratio for generation
- `image_size` (string, default `2k`): output image size preset
- `max_attempts` (int, default `3`): maximum number of generation attempts for meeting layout consistency
- `layout_threshold` (float, default `0.75`): minimum layout similarity score required to accept a result
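
The parameter list above can be sketched as a small normalization helper. This is only an illustration: `ImitateParams` and `normalize_params` are hypothetical names, and the checks on `output_mode` and `max_attempts >= 1` are assumptions that the specification implies but does not state as validation rules.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImitateParams:
    """Normalized inputs, with the defaults listed in the Input Specification."""
    reference_image: str
    target_content: str
    output_mode: str = "friendly"
    aspect_ratio: str = "16:9"
    image_size: str = "2k"
    max_attempts: int = 3
    layout_threshold: float = 0.75


def normalize_params(raw: dict) -> ImitateParams:
    """Apply defaults and reject missing required inputs (hypothetical helper)."""
    for key in ("reference_image", "target_content"):
        if not str(raw.get(key) or "").strip():
            raise ValueError(f"missing required input: {key}")
    params = ImitateParams(
        reference_image=str(raw["reference_image"]).strip(),
        target_content=str(raw["target_content"]).strip(),
        output_mode=str(raw.get("output_mode", "friendly")),
        aspect_ratio=str(raw.get("aspect_ratio", "16:9")),
        image_size=str(raw.get("image_size", "2k")),
        max_attempts=int(raw.get("max_attempts", 3)),
        layout_threshold=float(raw.get("layout_threshold", 0.75)),
    )
    # Assumed sanity checks; the source only lists the allowed modes and defaults.
    if params.output_mode not in ("friendly", "verbose"):
        raise ValueError(f"unsupported output_mode: {params.output_mode}")
    if params.max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    return params
```

Keeping the defaults in one frozen dataclass makes it easy for the Main Agent to pass the same normalized values to the Worker Agent unchanged.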

Environment Variable

Dependency installation and API key configuration are handled by the `sn-image-base` skill.

The minimum environment variables needed to run the `sn-image-base` skill with the SenseNova Token Plan:

```ini
SN_BASE_URL="https://token.sensenova.cn/v1"
SN_API_KEY="your-api-key"
```

Fallback priority is dedicated variable > domain shared variable > global variable. Text calls use `SN_TEXT_API_KEY` > `SN_CHAT_API_KEY` > `SN_API_KEY`; vision calls use `SN_VISION_API_KEY` > `SN_CHAT_API_KEY` > `SN_API_KEY`; image generation uses `SN_IMAGE_GEN_API_KEY` > `SN_API_KEY`.

For more configuration options, see the Python dependencies and API keys section in `sn-image-generate_en.md`.
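
The fallback order can be illustrated with a short lookup. This is a sketch, not the actual logic inside `sn-image-base`: `KEY_CHAINS` and `resolve_api_key` are hypothetical names, and treating an empty value as unset is an assumption.

```python
import os
from typing import Mapping, Optional

# Fallback chains as listed above: dedicated > domain shared > global.
KEY_CHAINS = {
    "text": ("SN_TEXT_API_KEY", "SN_CHAT_API_KEY", "SN_API_KEY"),
    "vision": ("SN_VISION_API_KEY", "SN_CHAT_API_KEY", "SN_API_KEY"),
    "image": ("SN_IMAGE_GEN_API_KEY", "SN_API_KEY"),
}


def resolve_api_key(kind: str, env: Optional[Mapping[str, str]] = None) -> Optional[str]:
    """Return the first non-empty key in the chain for `kind`, or None."""
    env = os.environ if env is None else env
    for name in KEY_CHAINS[kind]:
        value = env.get(name, "").strip()
        if value:  # assumption: an empty string counts as "not configured"
            return value
    return None
```

With only `SN_API_KEY` set, all three call types resolve to the same key, which matches the minimal configuration shown earlier.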

API Configuration

All API calls in this skill are executed through the `sn_agent_runner.py` of the `sn-image-base` skill. See the `sn-image-base` skill (README.md) for more details.

- VLM call: `sn-image-recognize` (Step 1 & 3)
- LLM call: `sn-text-optimize` (Step 2)
- Image generation call: `sn-image-generate` (Step 3)

When encountering `MissingApiKeyError` or needing explicit model control, pass the model and auth parameters explicitly via CLI arguments. See `$SN_IMAGE_BASE/references/api_spec.md`.

`$SN_IMAGE_BASE` path explanation: `$SN_IMAGE_BASE` is the installation directory of the `sn-image-base` skill (the directory where its `SKILL.md` exists). The agent can locate this path by the skill name `sn-image-base`.
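
One way to locate the directory by skill name is to probe a few candidate roots for a folder that contains `SKILL.md`. The sketch below is an assumption about how an agent might do this: `find_skill_dir` is a hypothetical helper, and which roots to search depends on the host environment and is not specified by this document.

```python
from pathlib import Path
from typing import Iterable, Optional


def find_skill_dir(skill_name: str, roots: Iterable[Path]) -> Optional[Path]:
    """Return the first `<root>/<skill_name>` directory that contains SKILL.md."""
    for root in roots:
        candidate = Path(root) / skill_name
        if (candidate / "SKILL.md").is_file():
            return candidate.resolve()
    return None
```

The returned path can then be exported as `SN_IMAGE_BASE` before any `sn_agent_runner.py` call.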

Architecture: Main Agent + Worker Agent

This skill uses a two-tier agent architecture:

- Main Agent: receives the user request, normalizes parameters, sends the preflight message, invokes the Worker Agent, and sends the final text/image to the user
- Worker Agent: executes the fixed 3-step pipeline and returns structured JSON

Responsibility Boundaries:

- The Worker Agent does not send any user-visible message directly
- The Main Agent sends all user-facing responses
- The Worker Agent's last message must be exactly the JSON string defined in Return Contract, and nothing else
- The Worker Agent executes VLM/LLM/image calls directly; no nested subagent is used for these low-level calls

Workflow

Main Agent Workflow

- Extract `reference_image`, `target_content`, `output_mode` (default `friendly`), `aspect_ratio` (default `16:9`), `image_size` (default `2k`), `max_attempts` (default `3`), and `layout_threshold` (default `0.75`)
- Validate required inputs:
  - `reference_image` is provided and resolvable
  - `target_content` is non-empty
- Send the preflight message: "Using sn-image-imitate skill to generate a style-consistent image, please wait..."
- Start the Worker Agent with the full normalized parameters and the working directory
- On Worker result:
  - `status=ok`: send the final summary and the generated image
  - `status=error`: report the actual error

Worker Agent Workflow

Worker Agent receives `reference_image`, `target_content`, `output_mode`, `aspect_ratio`, `image_size`, `max_attempts`, `layout_threshold`, and the working directory of this skill (`$SKILL_DIR`).

Error Handling Strategy:

All `sn_agent_runner.py` calls share the same error handling rules:

- If the subprocess exits with a non-zero code, crashes, or times out: do not fall back; return `status=error` with the actual error message from stderr or the system error string
- If the subprocess returns invalid JSON or the JSON lacks the expected `result` field: return `status=error`; do not silently continue with empty or default values
- If the VLM review call fails during Step 3, treat the attempt as incomplete: do not record a score, and either retry the review once or skip to the next attempt, depending on the remaining budget
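
The first two rules can be sketched as a thin wrapper around each CLI call. This is a minimal illustration under stated assumptions: `run_tool` is a hypothetical helper, the runner is assumed to print its JSON on stdout, and the 300-second timeout is an arbitrary placeholder.

```python
import json
import subprocess
from typing import Sequence


def run_tool(cmd: Sequence[str], timeout_s: float = 300.0) -> dict:
    """Run one CLI call and map every failure mode to status=error (no fallback)."""
    try:
        proc = subprocess.run(list(cmd), capture_output=True, text=True, timeout=timeout_s)
    except subprocess.TimeoutExpired:
        return {"status": "error", "error": f"timed out after {timeout_s}s"}
    except OSError as exc:  # e.g. the interpreter or script cannot be started
        return {"status": "error", "error": str(exc)}
    if proc.returncode != 0:
        # Prefer the real stderr text; fall back to the exit code as the system error string.
        return {"status": "error", "error": proc.stderr.strip() or f"exit code {proc.returncode}"}
    try:
        payload = json.loads(proc.stdout)
    except json.JSONDecodeError as exc:
        return {"status": "error", "error": f"invalid JSON from tool: {exc}"}
    if not isinstance(payload, dict) or "result" not in payload:
        return {"status": "error", "error": "tool JSON lacks the 'result' field"}
    return {"status": "ok", "payload": payload}
```

Returning a dictionary instead of raising keeps the error text intact, so the Worker Agent can copy it straight into the `error` field of its final JSON.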

Step 0 — Initialization

- Generate `task_id` with format `YYYYMMDD_HHMMSS`
- Create the temp directory `/tmp/openclaw/sn-image-imitate/<task_id>/`, referred to as `TEMP_DIR`
- Resolve and normalize `REFERENCE_IMAGE`
- Persist the user request:

```bash
echo "$TARGET_CONTENT" > "$TEMP_DIR/target-content.txt"
```
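
For readers driving the pipeline from Python rather than the shell, the initialization can be sketched as follows. `init_task` is a hypothetical helper; the `base_dir` argument stands in for `/tmp/openclaw/sn-image-imitate`, and the `now` parameter exists only to make the example deterministic.

```python
from datetime import datetime
from pathlib import Path
from typing import Optional


def init_task(target_content: str, base_dir: Path, now: Optional[datetime] = None) -> Path:
    """Create <base_dir>/<task_id>/ and persist the user request; return the temp dir."""
    now = now or datetime.now()
    task_id = now.strftime("%Y%m%d_%H%M%S")  # YYYYMMDD_HHMMSS
    temp_dir = Path(base_dir) / task_id
    temp_dir.mkdir(parents=True, exist_ok=True)
    # Trailing newline mirrors what `echo` writes in the shell version.
    (temp_dir / "target-content.txt").write_text(target_content + "\n", encoding="utf-8")
    return temp_dir
```

Because the identifier has one-second resolution, two tasks started within the same second would share a directory in this sketch; a caller that needs isolation would have to add its own suffix.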

Step 1 — Image Annotation (long caption + layout blueprint)

Use `prompts/image_annotate.md` as the system prompt and call `sn-image-recognize` on the reference image.

```bash
python "$SN_IMAGE_BASE/scripts/sn_agent_runner.py" sn-image-recognize \
  --system-prompt-path "$SKILL_DIR/prompts/image_annotate.md" \
  --user-prompt "Please annotate this reference image and follow the required output format." \
  --images "$REFERENCE_IMAGE" \
  --output-format json
```

Parse the JSON `result` field, then parse three blocks from it:

- `SHORT_CAPTION: ...`
- `LONG_CAPTION: ...`
- `LAYOUT_BLUEPRINT_JSON: { ... }`

If parsing fails, `LONG_CAPTION` is empty, or `LAYOUT_BLUEPRINT_JSON` is invalid JSON, return `status=error`.

Persist outputs:

```bash
echo "$SHORT_CAPTION" > "$TEMP_DIR/reference-short-caption.txt"
echo "$LONG_CAPTION" > "$TEMP_DIR/reference-long-caption.txt"
echo "$LAYOUT_BLUEPRINT_JSON" > "$TEMP_DIR/layout-blueprint.json"
```
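
The three-block parse and its failure conditions can be sketched like this. `parse_annotation` is a hypothetical helper, and it assumes the blocks appear in the order shown, with the blueprint last and running to the end of the text; the real format is defined by `prompts/image_annotate.md`.

```python
import json
import re

# Assumption: SHORT_CAPTION, LONG_CAPTION, LAYOUT_BLUEPRINT_JSON appear once each, in this order.
_BLOCKS = re.compile(
    r"SHORT_CAPTION:(?P<short>.*?)"
    r"LONG_CAPTION:(?P<long>.*?)"
    r"LAYOUT_BLUEPRINT_JSON:(?P<blueprint>.*)",
    re.DOTALL,
)


def parse_annotation(result_text: str) -> dict:
    """Split the annotation text into its blocks; raise ValueError on any Step 1 failure."""
    match = _BLOCKS.search(result_text)
    if match is None:
        raise ValueError("annotation output is missing one of the three blocks")
    long_caption = match.group("long").strip()
    if not long_caption:
        raise ValueError("LONG_CAPTION is empty")
    try:
        blueprint = json.loads(match.group("blueprint").strip())
    except json.JSONDecodeError as exc:
        raise ValueError(f"LAYOUT_BLUEPRINT_JSON is invalid JSON: {exc}") from exc
    return {
        "short_caption": match.group("short").strip(),
        "long_caption": long_caption,
        "layout_blueprint": blueprint,
    }
```

Any `ValueError` raised here corresponds to the `status=error` outcome described above; the caller should not continue with partial values.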

Step 2 — New long caption generation (content rewrite with layout lock)

Goal: preserve the style, layout, and visual language of the reference long caption while replacing the core content with `target_content`.

Hard constraints to preserve (guided by `layout-blueprint.json`):

- visual hierarchy (title/subtitle/body emphasis order)
- region topology (number of major blocks and their relative positions)
- reading flow (left-to-right / top-to-bottom / radial / timeline direction)
- chart type and data encoding form (if present)
- spacing rhythm and alignment pattern
- major region bounding boxes and topological relations from the blueprint

Preferred system prompt: `prompts/caption_rewrite.md` (recommended to add). If it is missing, use this inline fallback system prompt:

Rewrite the long caption by preserving style and layout constraints while replacing semantic content according to user target. Do not change block topology, reading order, or visual hierarchy. Keep the caption detailed and directly usable for image generation.

Call `sn-text-optimize`:

```bash
# printf turns each \n into a real newline; plain double quotes would pass a literal backslash-n.
USER_PROMPT=$(printf 'Reference long caption:\n%s\n\nLayout blueprint JSON:\n%s\n\nTarget content:\n%s\n\nReturn only the rewritten long caption.' \
  "$LONG_CAPTION" "$LAYOUT_BLUEPRINT_JSON" "$TARGET_CONTENT")
python "$SN_IMAGE_BASE/scripts/sn_agent_runner.py" sn-text-optimize \
  --system-prompt-path "$SKILL_DIR/prompts/caption_rewrite.md" \
  --user-prompt "$USER_PROMPT" \
  --output-format json
```

Parse the JSON `result` field as `NEW_LONG_CAPTION`. If it is empty, return `status=error`.

Persist output:

```bash
echo "$NEW_LONG_CAPTION" > "$TEMP_DIR/new-long-caption.txt"
```
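
The choice between the prompt file and the inline fallback, plus the user prompt layout, can be sketched as below. The helper names are hypothetical, and the sketch only selects and builds the text; how an inline system prompt is handed to `sn_agent_runner.py` is not specified here and is left out on purpose.

```python
from pathlib import Path

# Inline fallback text quoted from Step 2.
FALLBACK_SYSTEM_PROMPT = (
    "Rewrite the long caption by preserving style and layout constraints while "
    "replacing semantic content according to user target. Do not change block "
    "topology, reading order, or visual hierarchy. Keep the caption detailed and "
    "directly usable for image generation."
)


def load_rewrite_system_prompt(skill_dir: Path) -> str:
    """Use prompts/caption_rewrite.md when present, otherwise the inline fallback."""
    path = Path(skill_dir) / "prompts" / "caption_rewrite.md"
    if path.is_file():
        return path.read_text(encoding="utf-8")
    return FALLBACK_SYSTEM_PROMPT


def build_rewrite_user_prompt(long_caption: str, blueprint_json: str, target_content: str) -> str:
    """Assemble the three inputs with real newlines between the sections."""
    return (
        f"Reference long caption:\n{long_caption}\n\n"
        f"Layout blueprint JSON:\n{blueprint_json}\n\n"
        f"Target content:\n{target_content}\n\n"
        "Return only the rewritten long caption."
    )
```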

Step 3 — Image Generation and Layout Review Loop

Execute `attempt` from `1` to `max_attempts` sequentially:

Generate Image (using the `sn-image-generate` tool of `sn-image-base`):

```bash
python "$SN_IMAGE_BASE/scripts/sn_agent_runner.py" sn-image-generate \
  --prompt "$CURRENT_PROMPT" \
  --aspect-ratio "$ASPECT_RATIO" \
  --image-size "$IMAGE_SIZE" \
  --save-path "$TEMP_DIR/attempt_<N>.png" \
  --output-format json
```

VLM configuration requirements:

- When `max_attempts > 1`, VLM review is required for each attempt
- Select a VLM model from the OpenClaw configuration as the parameter for image recognition
- If no suitable VLM model exists in the OpenClaw configuration:
  - Notify the user that the current parameter combination cannot be executed
  - Suggest adding a VLM configuration or setting `max_attempts` to `1` to skip review
- If the VLM call times out or fails: do not fall back, report the real error directly

Layout Consistency Review (only executed when `max_attempts > 1`):

Review the candidate against the reference using `prompts/layout_review.md` (with the blueprint as the structural oracle):

```bash
# printf turns each \n into a real newline; plain double quotes would pass a literal backslash-n.
REVIEW_PROMPT=$(printf 'Reference is image[0], candidate is image[1]. Layout blueprint JSON:\n%s\n\nEvaluate layout similarity and return JSON only.' "$LAYOUT_BLUEPRINT_JSON")
python "$SN_IMAGE_BASE/scripts/sn_agent_runner.py" sn-image-recognize \
  --system-prompt-path "$SKILL_DIR/prompts/layout_review.md" \
  --user-prompt "$REVIEW_PROMPT" \
  --images "$REFERENCE_IMAGE" "$TEMP_DIR/attempt_<N>.png" \
  --output-format json
```

Expected review JSON (inside `result`):

```json
{
  "layout_similarity_score": 0.0,
  "style_similarity_score": 0.0,
  "pass": false,
  "major_deviations": [],
  "fix_hints": []
}
```

Save Attempt Result:

```json
{
  "attempt": 1,
  "image": "$TEMP_DIR/attempt_1.png",
  "layout_similarity_score": 0.0,
  "style_similarity_score": 0.0,
  "pass": false,
  "major_deviations": [],
  "timing": {
    "image_generation": { "elapsed_seconds": 12.34, "model": "sn_image_model" },
    "vlm_review": { "elapsed_seconds": 5.67, "model": "sensenova-122b" }
  }
}
```

Note: `elapsed_seconds` is read from the `--output-format json` return of each CLI call; `image_generation.model` is fixed to the hardcoded placeholder `"sn_image_model"` (`sn-image-generate` does not return the model field); `vlm_review.model` is read from the JSON return of `sn-image-recognize`. `timing.vlm_review` is omitted when `max_attempts=1`.

Early Termination Check (only executed when `max_attempts > 1`):

- Pass criteria: `layout_similarity_score >= layout_threshold` → `pass = true`
- If pass: immediately exit the loop, do not continue generating
- If fail and attempts remain, append correction hints to the prompt:

```text
Layout correction requirements:
- <fix_hint_1>
- <fix_hint_2>
...
```

- If all attempts fail to pass the threshold, return the highest-score candidate and mark `layout_passed=false`
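
The generate, review, and early-exit logic can be sketched as one loop. This is an illustration under several assumptions: `run_attempts` and the `generate`/`review` callables are hypothetical stand-ins for the two CLI calls, the threshold comparison is treated as the sole pass criterion, correction hints from the latest review are appended to the original caption (not accumulated), `layout_passed` is reported as `None` when review is skipped, and review-call failures are not handled.

```python
from typing import Callable


def run_attempts(
    base_prompt: str,
    max_attempts: int,
    layout_threshold: float,
    generate: Callable[[str, int], str],  # (prompt, attempt number) -> image path
    review: Callable[[str], dict],        # image path -> review JSON as a dict
) -> dict:
    """Bounded generate/review loop with early exit and best-candidate fallback."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    prompt, attempts = base_prompt, []
    for n in range(1, max_attempts + 1):
        image = generate(prompt, n)
        record = {"attempt": n, "image": image}
        if max_attempts == 1:  # review is skipped entirely in this mode
            attempts.append(record)
            return {"selected_attempt": n, "image": image,
                    "layout_passed": None, "attempts": attempts}
        result = review(image)
        score = float(result["layout_similarity_score"])
        passed = score >= layout_threshold
        record.update({
            "layout_similarity_score": score,
            "style_similarity_score": result.get("style_similarity_score"),
            "pass": passed,
            "major_deviations": result.get("major_deviations", []),
        })
        attempts.append(record)
        if passed:  # early termination: stop generating immediately
            return {"selected_attempt": n, "image": image,
                    "layout_passed": True, "attempts": attempts}
        hints = result.get("fix_hints", [])
        if hints and n < max_attempts:
            prompt = (base_prompt + "\n\nLayout correction requirements:\n"
                      + "\n".join(f"- {hint}" for hint in hints))
    best = max(attempts, key=lambda item: item["layout_similarity_score"])
    return {"selected_attempt": best["attempt"], "image": best["image"],
            "layout_passed": False, "attempts": attempts}
```

Passing the two tool calls in as callables keeps the control flow testable without invoking the real image or VLM services.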

Return Contract

Worker Agent final response must be bare JSON (no extra text, no code fence).

Normal Flow

```json
{
  "status": "ok",
  "need_main_agent_send": true,
  "output_mode": "friendly|verbose",
  "result": {
    "image": "/tmp/openclaw/sn-image-imitate/<task_id>/attempt_2.png",
    "reference_image": "<resolved_reference_image>",
    "reference_short_caption": "<short caption from step 1>",
    "reference_long_caption": "<long caption from step 1>",
    "layout_blueprint": { "...": "..." },
    "new_long_caption": "<rewritten long caption from step 2>",
    "layout_passed": true,
    "selected_attempt": 2
  },
  "attempts": [
    {
      "attempt": 1,
      "image": "/tmp/openclaw/sn-image-imitate/<task_id>/attempt_1.png",
      "layout_similarity_score": 0.62,
      "style_similarity_score": 0.79,
      "pass": false,
      "major_deviations": ["center panel too narrow", "title block moved to top-right"]
    },
    {
      "attempt": 2,
      "image": "/tmp/openclaw/sn-image-imitate/<task_id>/attempt_2.png",
      "layout_similarity_score": 0.81,
      "style_similarity_score": 0.84,
      "pass": true,
      "major_deviations": []
    }
  ],
  "review": {
    "threshold": 0.75
  },
  "timing": {
    "total_elapsed_seconds": 24.56,
    "annotate": { "elapsed_seconds": 3.21, "model": "sensenova-122b" },
    "rewrite": { "elapsed_seconds": 2.45, "model": "sensenova-122b" },
    "generation_total": { "elapsed_seconds": 11.90, "model": "sn_image_model" },
    "review_total": { "elapsed_seconds": 7.00, "model": "sensenova-122b" }
  }
}
```

Error Flow

```json
{
  "status": "error",
  "error": "<actual_error_message>"
}
```

Rules:

- `status=ok` must include `need_main_agent_send: true`
- `result.image` must be an existing generated image path
- `timing.total_elapsed_seconds` covers the full worker execution
- If parsing of the Step 1 format fails (including invalid blueprint JSON), return `status=error` (do not silently continue)
- `attempts` must record each generation + review attempt
- If no attempt passes the threshold, return the highest-score candidate and set `result.layout_passed=false`
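
A Main Agent could check these rules before sending anything to the user. The sketch below is one possible check, not part of the skill: `check_worker_reply` is a hypothetical helper, and the `require_file` switch exists only so the example can run without a real image on disk.

```python
import json
import os


def check_worker_reply(raw: str, require_file: bool = True) -> list:
    """Return a list of contract violations; an empty list means the reply conforms."""
    try:
        reply = json.loads(raw)  # fails on code fences or any text around the JSON
    except json.JSONDecodeError as exc:
        return [f"not bare JSON: {exc}"]
    if not isinstance(reply, dict):
        return ["top-level value is not an object"]
    status = reply.get("status")
    if status == "error":
        return [] if reply.get("error") else ["status=error without an error message"]
    if status != "ok":
        return [f"unknown status: {status!r}"]
    problems = []
    if reply.get("need_main_agent_send") is not True:
        problems.append("need_main_agent_send must be true")
    image = (reply.get("result") or {}).get("image")
    if not image:
        problems.append("result.image is missing")
    elif require_file and not os.path.isfile(image):
        problems.append("result.image does not exist")
    if "total_elapsed_seconds" not in (reply.get("timing") or {}):
        problems.append("timing.total_elapsed_seconds is missing")
    if not reply.get("attempts"):
        problems.append("attempts must record each attempt")
    return problems
```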

Output Format

friendly mode (default)

- One concise sentence: the generated image follows the reference style and is updated to the requested content
- Mention whether layout consistency passed the threshold, and the attempt count
- Send a single image: `result.image`

verbose mode

Style imitation result
---
Reference short caption: <reference_short_caption>
---
Style/layout cues:
<brief extraction from reference_long_caption + layout_blueprint>
---
New long caption:
<new_long_caption>
---
#1 attempt=<n> layout_score=<0.00> style_score=<0.00> pass=<true|false> [selected]
  deviations: <major_deviations or none>
#2 attempt=<n> layout_score=<0.00> style_score=<0.00> pass=<true|false>
  deviations: <major_deviations or none>
...
---
Layout threshold: <0.75> | Passed: <true|false> | Selected: attempt <n>
Time statistics: Total <total>s | Annotation <t>s | Rewrite <t>s | Generation <t>s×<n> attempts | Review <t>s×<n> attempts
---
Images (selected image)
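
The per-attempt lines of the template above could be rendered from the `attempts` array roughly as follows. `format_attempt_lines` is a hypothetical helper, and joining multiple deviations with "; " is an assumption, since the template does not fix a separator.

```python
def format_attempt_lines(attempts: list, selected_attempt: int) -> list:
    """Render the '#k attempt=...' and 'deviations:' lines of the verbose template."""
    lines = []
    for index, item in enumerate(attempts, start=1):
        mark = " [selected]" if item["attempt"] == selected_attempt else ""
        lines.append(
            f"#{index} attempt={item['attempt']} "
            f"layout_score={item['layout_similarity_score']:.2f} "
            f"style_score={item['style_similarity_score']:.2f} "
            f"pass={'true' if item['pass'] else 'false'}{mark}"
        )
        deviations = "; ".join(item["major_deviations"]) or "none"
        lines.append(f"  deviations: {deviations}")
    return lines
```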

Call Relationship

Bottom-level dependency: `sn-image-base` → `sn-image-recognize`, `sn-text-optimize`, `sn-image-generate`

References

- `prompts/image_annotate.md` - Image annotation + layout blueprint system prompt (Step 1, required)
- `prompts/caption_rewrite.md` - Caption rewrite system prompt with layout-lock constraints (Step 2, required)
- `prompts/layout_review.md` - Candidate-vs-reference layout/style review prompt (Step 3, required)
- `../sn-image-base/SKILL.md` - Base tool behavior and parameter defaults
