image-to-svg

Image to SVG


Recreate raster images as high-quality SVGs by decomposing, studying, and rebuilding each visual element independently.

Core Principles


Never try to reproduce the whole image at once. The quality comes from isolating each feature, studying it closely against a cropped reference, and building it as a standalone SVG before compositing.
Correctness over speed. Every shortcut in this workflow compounds into visible quality loss in the final output. Batching crop verification, skipping programmatic checks, eyeballing coordinates instead of measuring, settling for "looks about right" instead of running the diff — each saves a minute but costs ten in rework or produces a visibly worse result. The value of this skill is in the output quality. Take the time to verify at every step.

When to Use


  • Converting an image (photo, illustration, AI art) to SVG
  • Creating vector versions of logos, mascots, icons, or artwork
  • Extracting specific elements from images as scalable graphics
  • The user provides a reference image and wants an SVG recreation

Instructions


You are converting a raster image into an SVG recreation. Follow the phases below in order.
This skill uses incremental discovery — reference files live in subdirectories adjacent to this skill (`analysis/`, `features/`, `styles/`, `workflow/`). Read them only when a specific phase or condition calls for them. Do not read all reference files upfront.

Phase 0: Environment Setup


Read `workflow/workflow-dependencies.md` and run the dependency check script. Ensure required tools (`magick`, `rsvg-convert`, `xmllint`) are available. For optional tools (`vtracer`, `svgo`), check availability and note which enhancements are possible.
If `vtracer` is not installed and Python is available, ask the user where they'd like the virtual environment before creating one. See the dependencies file for the three options (project-local, shared skill venv, user-specified).
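The availability check can be sketched as below. This is a hedged illustration, not the actual script from `workflow/workflow-dependencies.md` — the tool names come from this document, but the function name and report format are assumptions:

```python
import shutil

# Illustrative sketch of the Phase 0 dependency check. The real script lives
# alongside workflow/workflow-dependencies.md; this only shows the shape.
REQUIRED = ["magick", "rsvg-convert", "xmllint"]
OPTIONAL = ["vtracer", "svgo"]

def check_dependencies():
    """Return (missing_required, available_optional)."""
    missing = [t for t in REQUIRED if shutil.which(t) is None]
    optional = [t for t in OPTIONAL if shutil.which(t) is not None]
    return missing, optional

missing, optional = check_dependencies()
if missing:
    print("install before continuing:", ", ".join(missing))
if "vtracer" not in optional:
    print("vtracer unavailable: fall back to ImageMagick color extraction")
```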

Phase 1: Analyze the Image


Initial analysis — these are independent. Delegate to parallel subagents so each can focus fully on its concern:
  1. Identify the art style. Spawn a subagent that reads `styles/styles-identification.md`, studies the image, and reports back with the style classification and reasoning (line work, shape language, color approach, detail level). The identified style determines which techniques you'll use later.
  2. Build your observation framework. Spawn a subagent that reads `analysis/analysis-asking-questions.md`, studies the image, and reports back with answers to the key observation questions — especially the Construction and Structural questions for complex objects. This report informs the decomposition step.
  3. Handle transparency. Check programmatically: `magick identify -format "%[channels]" original.png` — if it reports `srgba` or a similar alpha channel, note whether the transparent background should be preserved in the final SVG (common for emoji/stickers) or filled with a solid color.
Decompose — depends on the observation framework above:
  4. Decompose into features. Read `analysis/analysis-identifying-concepts.md`. Break the image into independent visual elements and establish a z-order (layer stack).
Parallel preparation — these all depend on the feature list from step 4 but are independent of each other. Run them in parallel:
  5. Create and verify reference crops. Read `analysis/analysis-reference-crops.md`. Write `feature-locations.yml` with bounding boxes for every feature, then crop all features from it in one pass. Run the programmatic edge-margin check on every crop — this is the most common failure point. Fix any failing crops by adjusting the YAML and re-cropping (don't re-estimate from the image). Then visually verify each crop individually (one per Read call, not batched). Do not proceed to the build phase with any clipped crops.
  6. Measure and map coordinates programmatically. Read `workflow/workflow-verification.md` for the measurement pipeline. Do NOT eyeball feature coordinates — small estimation errors compound across features and ruin proportions.
    • Determine canvas size (512x512 standard for emoji/icons; use the original aspect ratio for other subjects)
    • Use ImageMagick to measure the original image dimensions and compute the scale factor to canvas
    • Identify proportion anchors: 3-5 key measured points (e.g., "head center at 35% of character height, chin at 52%, feet at 95%"). Express these as ratios, not absolute pixels — ratios survive the canvas remapping.
    • Compute each feature's bounding box by measuring from the original and scaling to canvas coordinates
    • Record inter-feature relationships — not just individual bounding boxes but how features relate: "mouth width = 55% of face width", "gap between boots = 15% of body width", "ears extend 20% past hat brim edge". These relative measurements are what make proportions look right when features are built independently.
    • Record the feature map with measured coordinates, proportion ratios, relationships, and z-ordering. This map travels with every swarmed agent.
  7. Write a subject brief. In 2-3 sentences, describe the personality, expression, and overall vibe of the subject ("a cheeky, confident goblin with a big happy grin and a proud crossed-arms stance"). This qualitative description travels with every agent alongside the measurements — it gives agents a target for the feeling of the subject, not just the geometry. Without it, agents produce features that are technically correct but lack the character's personality.
After crops are verified (step 5 must complete first):
  8. Extract trace metadata from crops. If `vtracer` is available, read `workflow/workflow-trace-metadata.md`. Auto-trace each feature crop in polygon mode and extract structured metadata: color palettes, sub-element positions/sizes, area percentages, and topology hints. This gives agents precise numeric data (~130 tokens per feature) instead of requiring them to eyeball colors and positions from the raster image. Add the trace metadata to each feature's entry in the feature map.
     If `vtracer` is not available, fall back to ImageMagick color extraction:

     ```bash
     magick refs/{feature}.png -resize 200x200 -kmeans 10 -unique-colors txt: | tail -n +2 | tr -s ' ' | cut -d' ' -f3
     ```

     This gives accurate hex values but no spatial sub-element data.
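The coordinate mapping in step 6 reduces to a small amount of arithmetic. The sketch below is illustrative — the helper names (`to_canvas`, `ratio`) and the example boxes are assumptions, not part of the skill's reference scripts; the 512x512 canvas and ratio-based relationships come from this document:

```python
# Minimal sketch of step 6: scale measured boxes from original-image pixels
# to composite canvas coordinates, and express relationships as ratios.
CANVAS = 512

def to_canvas(box, original_size, canvas=CANVAS):
    """box = (x, y, w, h) in original pixels; returns canvas coordinates."""
    ow, oh = original_size
    scale = canvas / max(ow, oh)  # fit the longer side to the canvas
    x, y, w, h = box
    return tuple(round(v * scale) for v in (x, y, w, h))

def ratio(part, whole):
    """Relationships like 'mouth width = 55% of face width' travel as ratios."""
    return round(part / whole, 2)

# Hypothetical measurements from a 1024x1024 original:
face = to_canvas((300, 250, 400, 380), (1024, 1024))
mouth = to_canvas((410, 480, 220, 90), (1024, 1024))
print(face, mouth, ratio(mouth[2], face[2]))
# → (150, 125, 200, 190) (205, 240, 110, 45) 0.55
```

Because the relationships are ratios, they stay valid after the remap — the 0.55 mouth-to-face width ratio holds in both coordinate systems.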

Phase 2: Build Each Feature (Agent Swarm)


Once the style is identified, reference crops verified, and the feature map established, agent swarm the feature builds. Each feature is independent — they can be built in parallel by separate agents, each working from its own reference crop.

Character/face images


For character or face images, read the relevant feature reference sheet from `features/` before building each element:

| Feature | Reference file |
| --- | --- |
| Eyes | `features/features-eyes.md` |
| Mouth | `features/features-mouth.md` |
| Nose | `features/features-nose.md` |
| Ears | `features/features-ears.md` |
| Face shape | `features/features-face-shape.md` |
| Hair | `features/features-hair.md` |
| Body | `features/features-body.md` |
| Accessories | `features/features-accessories.md` |
| Complex objects (held items, props) | `features/features-objects.md` |

Only read the reference sheets for features that exist in the image.

Non-character images (landscapes, logos, objects, abstract)


The `features/` reference sheets are character-specific. For other subjects, decompose by visual layer instead:
  • Logos/icons: background shape, primary symbol, text (as traced paths — do not use `<text>` elements, since fonts won't match), secondary elements, border/frame
  • Landscapes/scenes: sky/background, distant elements, midground, foreground, focal subject, atmospheric effects (fog, light rays)
  • Objects/products: Read `features/features-objects.md` for detailed guidance on structural decomposition. Objects have internal structure, multiple visible surfaces, and perspective complexity that goes far beyond silhouette + fill. Decompose into structural parts (panels, ribs, joints, handles), not just color regions.
  • Vehicles/machines: Read `features/features-vehicles.md`. Vehicles are panel assemblies — decompose by body panels, glass, wheels, lights, and trim. Panel lines and metallic gradients are critical.
  • Food/drinks: Read `features/features-food.md`. Shape-building approach with glossy highlights, layered construction, and steam/aroma effects.
  • Plants/flowers: Read `features/features-plants.md`. Radial petal symmetry with `<use>` + `rotate`, leaf construction with vein clipping.
  • Abstract/patterns: base layer, repeating motifs (use `<pattern>` or `<use>` where possible), accent elements, overlay effects
  • Hybrid images: For images that combine categories (a character holding an object in a landscape), use the focal subject's decomposition as primary and treat secondary elements more simply.
The same principles apply: one crop per element, one standalone SVG per layer, same composite viewBox. Read `analysis/analysis-asking-questions.md` for each element — the shape, color, and position questions are universal.

Expression-critical features


Some features are disproportionately important because they define the character's personality or the object's identity. These get extra comparison rigor — more iteration passes, programmatic diff verification, and side-by-side checks before moving to composition:
  • Mouth/smile — the single biggest driver of expression. Curvature, width, and upturn at corners must match closely.
  • Eye gaze — pupil position and highlight placement determine where the character is looking and how it "feels."
  • Overall proportions — head-to-body ratio, stance width, limb length. If these are off, no amount of detail fixes the result.
  • Signature features — whatever makes this specific subject recognizable (a distinctive hat, a specific logo, a unique silhouette).
For these features, always run the programmatic diff (see "Render-Compare Loop" below) and iterate until the diff score converges, even if it means exceeding 3 passes.

Building each feature

构建每个元素

For each feature:
  1. Study the cropped reference image in isolation and the full original image for proportion context
  2. Ask yourself the questions from `analysis/analysis-asking-questions.md`
  3. Consider what's hidden — if this feature is partially obscured by another (head under hat, face under hair), the layer should still extend under the obscuring element. See "Handling Obscured Content" below.
  4. Build as a standalone SVG in `parts/`
  5. Apply the appropriate art style techniques — read `styles/styles-line-and-brush.md` for illustrated/cartoon styles, `styles/styles-geometric.md` for flat/geometric styles, or `styles/styles-applying-to-lifelike.md` for photographic/realistic images. Read only the style file matching the style identified in Phase 1.
  6. Always read `styles/styles-curves-and-shapes.md` for curve construction techniques. This applies to all styles — it covers how to actually build shapes with the right SVG path commands, when to use filled shapes vs strokes, and how to construct organic curves. This is the bridge between "what should it look like" and "how do I build it in SVG."
  7. Render and compare — see "Render-Compare Loop" below
Prefer complex construction over simple geometry (except for images identified as geometric/flat style — defer to `styles/styles-geometric.md` for those). A filled shape built from cubic Beziers with proper width variation, per-panel lighting, and structural detail will always produce a more valuable result than a circle with a stroke. Only use SVG primitives (`<circle>`, `<rect>`, `<ellipse>`) when the reference image genuinely shows a perfect geometric shape or the style is explicitly flat/geometric. When in doubt, build the more complex version — the visual quality difference is substantial.

Agent Swarming

Agent集群协作

One feature = one SVG = one agent. No exceptions. Even trivially simple features (a nose that's just two dots, a sparkle, a small badge) get their own file and their own agent. The cost of an extra agent is low; the cost of coupled parts during compositing and future animation is high.
Paired features are ALWAYS separate. Left eye and right eye are separate agents writing separate SVGs. Same for left/right ears, left/right boots, left/right arms. They will be checked for consistency in the alignment phase (Phase 3) — that's where consistency is enforced, not by having one agent build both.
Body parts are independent. Arms are separate from the torso. Each leg is separate. The head is separate from the neck. Think of each part as something that might animate independently later — an arm could move while the body stays still, one ear could wiggle while the other doesn't.
If there are 5 features, spawn 5 agents. If there are 50 features, spawn 50 agents. The parallelism is the point. Each agent receives:
  • The reference crop for its feature(s)
  • The full original image — the crop is for detail; the full image is for proportion context. An agent building a mouth can't judge whether the grin is wide enough without seeing the full face.
  • The subject brief from Phase 1 step 7 — the personality/expression/vibe target
  • The identified art style description
  • The relevant feature reference sheet (from `features/`)
  • The relevant style technique file (`styles/styles-line-and-brush.md`, `styles/styles-geometric.md`, or `styles/styles-applying-to-lifelike.md`)
  • The curve construction reference (`styles/styles-curves-and-shapes.md`) — always included
  • The verification pipeline (`workflow/workflow-verification.md`) — always included
  • The feature map with measured coordinates, proportion anchors, and inter-feature relationships from Phase 1 step 6
  • The trace metadata for this feature from Phase 1 step 8 (color palette, sub-element positions/sizes, topology) — if available
  • Whether this feature is expression-critical (see above) — if so, the agent should run the full programmatic diff loop
  • Instructions to write its standalone SVG to `parts/{feature-name}.svg`
Describe features quantitatively, not qualitatively. When briefing agents, text descriptions lose visual nuance — "wide grin" doesn't convey the exact curvature, and "thick brim" is ambiguous. Instead use measurements: "mouth width = 55% of face width", "brim height = 5% of hat height, follows dome curvature". Adjectives fail; ratios survive.
All agents must use the same `viewBox` as the composite canvas (e.g., `viewBox="0 0 512 512"`). Each agent positions its feature within the full canvas coordinates using the bounding box from the feature map. This ensures parts align without rescaling during composition.
Features that interact (e.g., face + ears, hair + hat) should be noted but built independently — interactions are resolved in Phase 4. For tightly coupled features (ears + face contour, hair + hat brim), include the neighboring feature's bounding box so the agent knows where the boundary sits.
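The shared-viewBox rule is mechanically checkable before composition. A minimal sketch, assuming parts are flat SVG files with the viewBox on the root element; the function name `same_viewbox` is illustrative, not part of the skill's scripts:

```python
import xml.etree.ElementTree as ET

# Illustrative pre-composition check: every part must declare the composite
# canvas viewBox, or it will not align without rescaling.
def same_viewbox(svg_sources, expected="0 0 512 512"):
    """svg_sources: iterable of (name, svg_string). Returns offending parts."""
    bad = []
    for name, src in svg_sources:
        root = ET.fromstring(src)
        if root.get("viewBox") != expected:
            bad.append((name, root.get("viewBox")))
    return bad

parts = [
    ("left-eye", '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 512 512"/>'),
    ("mouth", '<svg xmlns="http://www.w3.org/2000/svg" viewBox="0 0 100 100"/>'),
]
print(same_viewbox(parts))
# → [('mouth', '0 0 100 100')] — this part needs re-authoring on the full canvas
```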

Render-Compare Loop


Read `workflow/workflow-verification.md` for the full verification pipeline. The key insight: don't rely on visual comparison alone — the LLM is good at spotting catastrophic errors but bad at catching subtle proportion and curvature differences. Use programmatic diff to find errors precisely, then use the LLM to interpret and fix them.
After every SVG change:
  1. Validate the SVG XML before rendering:

     ```bash
     xmllint --noout parts/{feature}.svg
     ```

     This catches unclosed tags, malformed attributes, and missing namespaces with clear error messages — far more helpful than `rsvg-convert`'s cryptic failures.
  2. Render the SVG to PNG:

     ```bash
     rsvg-convert -w 512 -h 512 parts/{feature}.svg -o parts/{feature}.png
     ```

     If `rsvg-convert` is not installed, install it (`brew install librsvg` on macOS, `apt install librsvg2-bin` on Linux).
  3. Programmatic diff — generate a visual diff image and numerical score comparing the rendered feature against the reference crop. See `workflow/workflow-verification.md` for ImageMagick commands. The diff image highlights exactly WHERE the SVG diverges — red areas show the biggest differences.
  4. Read the diff image — use the highlighted differences to direct your corrections. This is far more effective than comparing two similar-looking images: ImageMagick finds the errors precisely; you interpret them and know how to fix the SVG.
  5. Visual sanity check — also read both the rendered PNG and the reference crop for qualitative assessment (colors, overall feel, details).
  6. Iterate — fix the top issue highlighted by the diff, re-render, re-diff. Repeat.
When to stop iterating: limit to 3-5 refinement passes for normal features. For expression-critical features (mouth, eyes, overall proportions), continue up to 10 passes — these define the character and are worth the extra iteration.
Convergence targets (RMSE, normalized 0-1):
  • Expression-critical features: target RMSE < 0.15
  • Standard features: RMSE < 0.25 is acceptable
  • Background/simple fills: RMSE < 0.30 is acceptable
  • Stop when two consecutive iterations improve by less than 0.02 — diminishing returns
These are guidelines, not hard gates. A feature at RMSE 0.18 that looks right is done; a feature at 0.12 that looks wrong needs a different approach. Trust the diff image over the number.
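The stopping rule can be sketched as a small helper. The thresholds are the ones stated above; the function name, the history format, and the example scores are illustrative assumptions (in practice the scores would come from ImageMagick's normalized RMSE metric):

```python
# Illustrative stop rule for the render-compare loop. Thresholds come from
# this document; names and inputs are assumptions.
def should_stop(rmse_history, target, max_passes):
    """rmse_history: normalized RMSE per pass, newest last."""
    if not rmse_history:
        return False
    if rmse_history[-1] < target:
        return True                      # converged on the target
    if len(rmse_history) >= max_passes:
        return True                      # pass budget exhausted
    if len(rmse_history) >= 3:
        a, b, c = rmse_history[-3:]
        if a - b < 0.02 and b - c < 0.02:
            return True                  # two consecutive passes barely improved
    return False

# Expression-critical feature: target 0.15, up to 10 passes.
print(should_stop([0.40, 0.28, 0.21, 0.20, 0.195], target=0.15, max_passes=10))
# → True: the last two passes each improved by < 0.02, so stop and rethink
```

Per the guideline above, a `True` here ends the loop but does not by itself declare the feature done — read the diff image to decide whether to accept or change approach.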

Handling Obscured Content


When a feature is partially hidden by another layer:
  • Imagine what's underneath. A head wearing a hat still has a complete top — extend the head shape up under where the hat sits, even though it won't be visible in the final composite.
  • Simplify but don't omit. The hidden portion doesn't need full detail, but the shape should be continuous. This prevents hard edges or gaps if layers shift during compositing.
  • Think in complete shapes. A face path should be a complete closed curve, not one that stops where the hat brim sits.

Phase 3: Class Alignment (Agent Swarm)


After all features are built individually, check paired and repeated features for consistency. Agent swarm this — one agent per class of similar features.
A "class" is a group of features that should share the same construction style:
  • Eyes class — left eye + right eye
  • Ears class — left ear + right ear
  • Boots/shoes class — left boot + right boot
  • Arms class — left arm + right arm (if pose shows both)
  • Any other repeated elements — e.g., both wheels of a bike, multiple windows on a building
For images with multiple subjects, classes are per-subject: "character A eyes" and "character B eyes" are separate classes.
Each alignment agent receives both SVGs, both reference crops, and the full original image, and checks:
  1. Outline weight — are paired features using the same stroke width or offset technique? (Most likely to drift between independent agents)
  2. Absolute size — are they the same size, or intentionally different per the reference?
  3. Fill colors — do both use exactly the same hex values?
  4. Construction technique — did one use ellipses while the other used paths? This creates visual inconsistency even if dimensions match.
  5. Highlight count and position — highlights are the most "creative" part and most likely to vary between agents.
  6. Proportional placement — "both eyes should be equidistant from face center" type checks.
The alignment agent normalizes any unintentional inconsistencies — making paired features match while preserving intentional asymmetry from the reference (e.g., if the reference genuinely shows different-sized eyes, keep that).
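Checks 1 and 3 (outline weight, fill colors) are simple enough to probe programmatically before the alignment agent reads anything. A hedged sketch, assuming styling lives in `fill` / `stroke-width` attributes — real parts may use gradients or CSS, which this ignores; `style_summary` is an illustrative name:

```python
import xml.etree.ElementTree as ET

# Illustrative consistency probe for one class (e.g., left eye vs right eye).
def style_summary(svg_source):
    """Collect the fill colors and stroke widths used anywhere in a part."""
    root = ET.fromstring(svg_source)
    fills, widths = set(), set()
    for el in root.iter():
        if el.get("fill"):
            fills.add(el.get("fill").lower())  # hex case differences are not drift
        if el.get("stroke-width"):
            widths.add(el.get("stroke-width"))
    return fills, widths

left = '<svg xmlns="http://www.w3.org/2000/svg"><circle fill="#1A1A2E" stroke-width="6"/></svg>'
right = '<svg xmlns="http://www.w3.org/2000/svg"><circle fill="#1a1a2e" stroke-width="8"/></svg>'
lf, lw = style_summary(left)
rf, rw = style_summary(right)
print(lf == rf, lw == rw)
# → True False: fills match, but outline weight drifted between agents
```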

Phase 4: Composite and Iterate


Read `workflow/composition-bringing-layers-together.md`.
This phase is not optional. The first assembly is never the final output. Individual features built in isolation always have proportion and alignment issues that only become visible in context.
  1. Assemble — combine the standalone SVGs into the composite
  2. Apply effects — read `styles/styles-effects.md` for `<clipPath>`, `<mask>`, and `<filter>` where needed
  3. Diff the composite against the original — use the full-image programmatic diff from `workflow/workflow-verification.md`. This highlights exactly where the composite diverges from the original.
  4. Identify the top 3 discrepancies from the diff — usually proportion errors (head too small, features shifted), expression mismatches (mouth curvature, gaze direction), or interaction issues (hat sitting wrong, limbs overlapping incorrectly).
  5. Fix each discrepancy — either send targeted corrections back to the feature agent, or fix directly in the composite SVG. Re-render and re-diff after each fix.
  6. Repeat until the composite diff stabilizes — at least 2 composite iterations, more if expression-critical features are off.
  7. Final small-size check:

     ```bash
     rsvg-convert -w 64 -h 64 final.svg -o /tmp/small-check-64.png
     rsvg-convert -w 128 -h 128 final.svg -o /tmp/small-check-128.png
     ```

     Read both renders — does it still read clearly at icon size? Features that looked fine at 512px may merge or disappear.
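Because every part shares the composite viewBox, the assembly in step 1 can be as simple as concatenating each part's children in z-order. A minimal sketch under strong assumptions — parts are flat, with no conflicting `id`s or `<defs>`; the real procedure in `workflow/composition-bringing-layers-together.md` must handle those cases:

```python
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)

# Illustrative assembly: wrap each part in a named <g> so it stays editable
# (and animatable) in the composite. Function name is an assumption.
def composite(parts_in_z_order, viewbox="0 0 512 512"):
    root = ET.Element(f"{{{SVG_NS}}}svg", {"viewBox": viewbox})
    for name, src in parts_in_z_order:
        group = ET.SubElement(root, f"{{{SVG_NS}}}g", {"id": name})
        for child in ET.fromstring(src):
            group.append(child)
    return ET.tostring(root, encoding="unicode")

parts = [  # z-order: first entry renders at the bottom
    ("body", f'<svg xmlns="{SVG_NS}" viewBox="0 0 512 512"><ellipse cx="256" cy="330" rx="120" ry="150"/></svg>'),
    ("head", f'<svg xmlns="{SVG_NS}" viewBox="0 0 512 512"><circle cx="256" cy="140" r="90"/></svg>'),
]
svg = composite(parts)
print('id="head"' in svg)
# → True
```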

Phase 5: Deliver

阶段5:交付

Read `workflow/workflow-file-structure.md` for the expected project layout.
  1. Optimize the final SVG. If `svgo` is available, run it with `cleanupIds` disabled to preserve named groups:

     ```bash
     svgo final.svg -o final.svg \
       --config='{"plugins":[{"name":"preset-default","params":{"overrides":{"cleanupIds":false,"collapseGroups":false,"convertShapeToPath":false}}}]}'
     ```

     This typically reduces file size by 25-40% (numeric precision, default attributes, path command optimization) without changing the visual output. If `svgo` is not available, skip this step — the SVG is still valid.
  2. Keep the `parts/` directory with standalone SVGs for future edits
  3. Provide the final composite SVG
  4. Render a PNG at the target resolution for comparison