image-service

Image Processing Skill

Overview

Capability	Description	Script
Text-to-image	Generate an image from a Chinese text description	`scripts/text_to_image.py`
Image-to-image	Edit an existing image	`scripts/image_to_image.py`
Image-to-text	Analyze image content (description, OCR, charts, etc.)	`scripts/image_to_text.py`
Long-image stitching	Stitch multiple images vertically into a WeChat-style long image	`scripts/merge_long_image.py`
Research illustration	Generate research-report infographics in a preset hand-drawn style	`scripts/research_image.py`

Configuration

Configuration file: `config/settings.json`

Setting	Value
IMAGE_API_BASE_URL	`https://llm.api.zyuncs.com/v1`
IMAGE_MODEL	`lyra-flash-9`
VISION_MODEL	`qwen2.5-vl-72b-instruct`
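
As an illustration of how a script might read these values, here is a minimal sketch. It assumes `config/settings.json` is a flat JSON object keyed by the setting names in the table; that layout and the `load_settings` helper are hypothetical and not taken from the skill's source.

```python
import json
from pathlib import Path

# Keys listed in the configuration table.
REQUIRED_KEYS = ("IMAGE_API_BASE_URL", "IMAGE_MODEL", "VISION_MODEL")


def load_settings(path):
    """Load settings from a JSON file and check that the expected keys exist.

    Assumption: the file is a flat JSON object; the real layout may differ.
    """
    settings = json.loads(Path(path).read_text(encoding="utf-8"))
    missing = [key for key in REQUIRED_KEYS if key not in settings]
    if missing:
        raise KeyError(f"missing settings: {', '.join(missing)}")
    return settings
```

Failing early on a missing key keeps a misconfigured environment from surfacing later as an opaque API error.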

Execution Specifications

By default, images are saved to the working directory that is current when the command runs, so:

- Do NOT use `workdir` to switch to the skill directory before running commands
- Always run in the user's working directory and call the script by its absolute path
- Script path: `scripts/` under the skill directory

Correct example (replace PYTHON and SKILL_DIR with the actual paths in your environment):

$PYTHON $SKILL_DIR/scripts/text_to_image.py "描述" -r 3:4 -o output.png  # "描述" = prompt text ("description")
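
The rules above can be sketched as a small launcher. This is an illustrative assumption about how a caller might invoke the scripts, not part of the skill: `build_command` is a hypothetical helper, and the install path in the example is made up.

```python
import os


def build_command(python, skill_dir, script_name, *args):
    """Return an argument list that calls a skill script by absolute path."""
    script = os.path.abspath(os.path.join(skill_dir, "scripts", script_name))
    return [python, script, *args]


# "/opt/skills/image-service" is a hypothetical install path.
cmd = build_command("python3", "/opt/skills/image-service",
                    "text_to_image.py", "描述", "-r", "3:4", "-o", "output.png")

# The command is meant to be run from the user's working directory, for
# example with subprocess.run(cmd, check=True) and no cwd= override, so
# that output.png lands where the user is working.
```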

Quick Start

Text-to-image

$PYTHON $SKILL_DIR/scripts/text_to_image.py "信息图风格，标题：AI技术趋势" -r 16:9  # prompt: "Infographic style, title: AI Technology Trends"
$PYTHON $SKILL_DIR/scripts/text_to_image.py "竖版海报，产品展示" -r 3:4 -o poster.png  # prompt: "Vertical poster, product display"

Parameters: `-r` aspect ratio | `-s` size | `-o` output path

Supported ratios: `1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, `21:9`
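
A caller may want to reject an unsupported ratio before spending an API call. The sketch below shows one way to do that; `text_to_image_args` is a hypothetical helper, and it uses only the `-r` and `-o` flags listed above.

```python
SUPPORTED_RATIOS = {"1:1", "2:3", "3:2", "3:4", "4:3",
                    "4:5", "5:4", "9:16", "16:9", "21:9"}


def text_to_image_args(prompt, ratio="16:9", output=None):
    """Build the arguments that follow text_to_image.py on the command line."""
    if ratio not in SUPPORTED_RATIOS:
        raise ValueError(f"unsupported aspect ratio: {ratio}")
    args = [prompt, "-r", ratio]
    if output is not None:
        args += ["-o", output]
    return args
```

For example, `text_to_image_args("竖版海报，产品展示", "3:4", "poster.png")` yields the arguments of the second command above.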

Image-to-image

$PYTHON $SKILL_DIR/scripts/image_to_image.py input.png "编辑描述" -r 3:4  # "编辑描述" = edit prompt ("edit description")

Image-to-text

$PYTHON $SKILL_DIR/scripts/image_to_text.py image.jpg -m describe
$PYTHON $SKILL_DIR/scripts/image_to_text.py screenshot.png -m ocr

Modes: `describe`, `ocr`, `chart`, `fashion`, `product`, `scene`

Long-image stitching

$PYTHON $SKILL_DIR/scripts/merge_long_image.py img1.png img2.png -o output.png --blend 20
$PYTHON $SKILL_DIR/scripts/merge_long_image.py -p "*.png" -o long.png --sort name

Parameters: `-p` wildcard pattern | `-o` output | `-w` width | `-g` gap | `--blend` blend | `--sort` sort
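
The section does not say how `-w` and `-g` are applied, so the following is only a sketch of one plausible layout calculation. It assumes every image is scaled to a common width and separated by a fixed pixel gap; `--blend` is not modeled, and `plan_layout` is a hypothetical name rather than a function from `merge_long_image.py`.

```python
def plan_layout(sizes, width=None, gap=0):
    """Compute a vertical layout for images given as (width, height) pairs.

    Returns (canvas_width, canvas_height, placements), where each placement
    is (y_offset, scaled_height). Every image is scaled to the canvas width,
    which defaults to the widest input image.
    """
    if not sizes:
        raise ValueError("at least one image is required")
    canvas_width = width if width is not None else max(w for w, _ in sizes)
    placements = []
    y = 0
    for index, (w, h) in enumerate(sizes):
        if index > 0:
            y += gap  # gap only between images, not before the first
        scaled_height = round(h * canvas_width / w)
        placements.append((y, scaled_height))
        y += scaled_height
    return canvas_width, y, placements
```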

Research illustration

$PYTHON $SKILL_DIR/scripts/research_image.py -t arch -n "标题" -c "内容" -o output.png  # "标题" = title, "内容" = content

Types: `arch` architecture diagram | `flow` flowchart | `compare` comparison chart | `concept` concept diagram

Required Before Execution: Determine the Request Type (Iron Rule)

After receiving an image generation request, you must first determine its type and only then decide how to execute it:

Long Image Recognition Rules

A request is treated as a long-image request if the prompt shows any of the following features:

Feature Type	Keywords/Patterns
Explicit declaration	长图 (long image), 长图海报 (long-image poster), 垂直长图 (vertical long image), 微信长图 (WeChat long image), Infographic, Long Banner
Segmented structure	The prompt contains multiple sections (e.g., "第1部分" (Part 1), "顶部" (top), "中间" (middle), "底部" (bottom))
Numbered list	Sections are numbered with `### 1.`, `### 2.`, etc.
Multi-screen content	Describes 3 or more independent frames/modules
Top-to-bottom layout	Contains phrases such as "从上至下" or "从上到下" (both mean "from top to bottom")
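
Most of these rules are mechanical enough to sketch as a check. The version below is an illustration under stated assumptions, not the skill's own logic: matching is by substring, two or more position markers count as a segmented structure, and the "3 or more independent frames" rule is left out because it needs judgment rather than pattern matching. `is_long_image_request` is a hypothetical name.

```python
import re

# "长图" also covers 长图海报, 垂直长图 and 微信长图 as substrings.
# English keywords are compared in lower case.
LONG_IMAGE_KEYWORDS = ("长图", "infographic", "long banner")
TOP_TO_BOTTOM = ("从上至下", "从上到下")
SEGMENT_MARKERS = ("顶部", "中间", "底部")
NUMBERED_HEADING = re.compile(r"^###\s*\d+\.", re.MULTILINE)
PART_MARKER = re.compile(r"第\s*[0-9一二三四五六七八九十]+\s*部分")


def is_long_image_request(prompt):
    """Return True if the prompt matches one of the modeled long-image rules."""
    text = prompt.lower()
    if any(keyword in text for keyword in LONG_IMAGE_KEYWORDS):
        return True
    if any(phrase in prompt for phrase in TOP_TO_BOTTOM):
        return True
    if NUMBERED_HEADING.search(prompt) or PART_MARKER.search(prompt):
        return True
    # Assumption: at least two position markers imply a segmented layout.
    return sum(marker in prompt for marker in SEGMENT_MARKERS) >= 2
```

A `False` result only means that none of the modeled rules matched; a prompt describing three or more separate frames still has to be recognized by reading it.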

Execution Path After Judgment

Identified as a long image → must first read references/long-image-guide.md → follow the long-image process
Identified as a single image → generate directly with text_to_image.py

Iron Rule: Once a request is identified as a long image, direct generation is prohibited! You must first load the long-image guide and follow its process.

Detailed Guides (Load on Demand)

Scenario	Trigger Condition	Reference Document
Generating a multi-screen long image	Any long-image recognition rule above matches	`references/long-image-guide.md` (must load)
Image contains Chinese text	The prompt requires Chinese titles or text in the image	`references/text-rendering-guide.md`
Illustrating a PPT or document	The user provides color requirements or a reference document	`references/color-sync-guide.md`
API details	The underlying implementation needs to be understood	`docs/api-reference.md`
Prompt techniques	Prompt results need to be improved	`docs/prompt-guide.md`

Prompt Key Points

- Prompts must be written in Chinese
- Titles and labels in the image must be in Chinese
- The default aspect ratio is 16:9; adjust it with the `-r` parameter
- Recommended styles: infographic, data visualization, hand-drawn text, tech illustration

Marketing Material Generation (Product Images/Material Packs/Design Images/Element Disassembly)

When the user mentions keywords such as 产品图 (product image), 物料包 (material pack), 营销素材 (marketing material), 详情页 (detail page), 元素拆解 (element disassembly), 爆炸图 (exploded view), 套图 (image set), 多尺寸 (multi-size), or 电商图 (e-commerce image), follow the processes below.

Prompt template library: `references/marketing-templates.md` (must load; it contains the complete set of categorized templates)

Capability Matrix

Capability	Trigger Keywords	Process	Output
E-commerce detail long image	详情页 (detail page), 长图 (long image), 商品介绍 (product introduction)	Stacked serial generation (each screen builds on the previous one) → merge stitching	1 long image
Marketing material pack	物料包 (material pack), 营销素材 (marketing material), 多尺寸 (multi-size)	Disassemble elements → multiple angles and scenes → zip	10-15 images + zip
Product design image	产品图 (product image), 渲染图 (rendering), 效果图 (effect image)	Base image → multi-angle/color variants	3-8 images
Element disassembly diagram	拆解 (disassembly), 爆炸图 (exploded view), 分解 (decomposition), 特写 (close-up)	Overall image → local close-ups/functional disassembly	4-8 images
Social media image set	套图 (image set), 九宫格 (nine-grid), 朋友圈 (Moments)	Unified style → multi-size adaptation	9 images (1:1)
Multi-color/SKU images	配色 (color scheme), 多色 (multi-color), SKU	Base image → image-to-image recoloring	N images

Process 1: E-commerce Detail Long Image

Input: product name + selling points + style
  ↓
Step 1: Plan the screens (usually 5-8)
  - Screen 1: hero image (product + core selling point)
  - Screens 2-N: one selling point per screen (function/material/scene/parameters)
  - Last screen: specification table
  ↓
Step 2: Generate the screens serially, each building on the previous one (must read references/long-image-guide.md)
  - Screen 1: generate the base image with text_to_image
  - Screens 2-N: use image_to_image with the previous screen as reference to keep the style consistent
  ↓
Step 3: Stitch with merge_long_image (--blend 20 to blend the seams)
  ↓
Step 4: Output the long image plus the original image of each screen
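
The serial chain in Steps 2 and 3 can be sketched as a list of planned script invocations. Several things here are assumptions rather than documented behavior: that `image_to_image.py` accepts the same `-o` flag as `text_to_image.py`, that one ratio is used for every screen, and the `plan_detail_long_image` helper itself. The authoritative procedure is in `references/long-image-guide.md`.

```python
def plan_detail_long_image(screen_prompts, ratio="16:9", prefix="detail"):
    """Return one argument list per script call, script name first.

    Screen 1 uses text_to_image.py; each later screen uses image_to_image.py
    with the previous screen's file as its reference image.
    """
    if not screen_prompts:
        raise ValueError("at least one screen prompt is required")
    files = [f"{prefix}_{i:02d}.png" for i in range(1, len(screen_prompts) + 1)]
    steps = [["text_to_image.py", screen_prompts[0], "-r", ratio, "-o", files[0]]]
    for previous, output, prompt in zip(files, files[1:], screen_prompts[1:]):
        # Assumption: image_to_image.py supports -o like text_to_image.py.
        steps.append(["image_to_image.py", previous, prompt,
                      "-r", ratio, "-o", output])
    steps.append(["merge_long_image.py", *files,
                  "-o", f"{prefix}_long.png", "--blend", "20"])
    return steps
```

Because each screen depends on the one before it, these steps have to run in order; only the final merge sees all files at once.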

Process 2: Marketing Material Pack (Key!)

Iron Rule: Do not just resize the same image! Disassemble the elements, change angles, and change scenes!

Input: product name + list of selling points + style preference
  ↓
Step 1: Generate the base hero image (text_to_image, full product view, 16:9)
  ↓
Step 2: Element disassembly (image_to_image × 4-6 images)
  - Macro close-up of the core selling point (1:1)
  - Functional exploded view/disassembly diagram (3:4)
  - Material/craftsmanship details (4:3)
  - Full set of accessories (16:9)
  ↓
Step 3: Scene variants (text_to_image / image_to_image × 3-4 images)
  - Everyday use scene (16:9)
  - Work use scene (4:3)
  - Unboxing scene (1:1)
  - Artistic silhouette/atmosphere image (21:9)
  ↓
Step 4: Marketing creatives (text_to_image × 3-4 images)
  - Comparison/review chart (3:4)
  - Data visualization/sound-wave chart (16:9)
  - Multi-color SKU display (16:9)
  - Nine-grid social media image (1:1)
  ↓
Step 5: Package everything into a zip and send a preview of each image

Concurrency rule: run at most 8 images concurrently per batch and split larger jobs into batches. Retry failed images individually.
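
One way to honor the concurrency rule is a batch runner like the sketch below. It is an assumption about how a caller could schedule jobs, not the skill's implementation: `run_in_batches` and `worker` are hypothetical names, and retrying each failure exactly once is a choice made here because the rule does not state a retry count.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 8  # limit stated in the concurrency rule


def run_in_batches(jobs, worker, batch_size=MAX_CONCURRENT):
    """Run worker(job) for every job, at most batch_size at a time.

    Jobs that raise are retried once, one at a time, after all batches.
    Returns (results, failed): results maps job index to the return value,
    failed lists the indexes that still failed after the retry.
    """
    results, to_retry = {}, []
    for start in range(0, len(jobs), batch_size):
        batch = list(enumerate(jobs[start:start + batch_size], start))
        with ThreadPoolExecutor(max_workers=batch_size) as pool:
            futures = [(i, pool.submit(worker, job)) for i, job in batch]
            for i, future in futures:
                try:
                    results[i] = future.result()
                except Exception:
                    to_retry.append(i)
    failed = []
    for i in to_retry:  # individual, sequential retries
        try:
            results[i] = worker(jobs[i])
        except Exception:
            failed.append(i)
    return results, failed
```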

Process 3: Product Design Image

Input: product name + design requirements
  ↓
Step 1: Generate the base hero image with text_to_image (front view of the product, 16:9)
  ↓
Step 2: Generate variants with image_to_image (using the base image as reference)
  - 45-degree view
  - Side/back view
  - Top view
  - Different color versions
  - Different usage scenes

Process 4: Element Disassembly Diagram

Input: product image (existing) or product description
  ↓
Step 1: If a product image exists → disassemble it with image_to_image; if not → first generate the full view with text_to_image
  ↓
Step 2: Generate element by element (image_to_image)
  - Exploded view/decomposition perspective
  - Macro close-up of part 1 + function annotation
  - Macro close-up of part 2 + craftsmanship annotation
  - Macro close-up of part 3 + material annotation
  ↓
Step 3: Optionally stitch into a long image (merge_long_image)

Process 5: Social Media Image Set

Input: product/theme + platform (Xiaohongshu/WeChat Moments/Weibo)
  ↓
Step 1: Determine the quantity and ratio
  - Xiaohongshu: 6-9 images, 3:4
  - WeChat Moments nine-grid: 9 images, 1:1
  - Weibo: 4-9 images, 16:9 or 1:1
  ↓
Step 2: Plan the content of each image (see the nine-grid templates in marketing-templates.md)
  ↓
Step 3: Define a unified style prefix and generate concurrently
  ↓
Step 4: Output in numbered order
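
Steps 1, 3 and 4 can be sketched as a small planner. The counts and ratios come from Step 1; everything else is an assumption made for illustration: the platform keys, the `set` file-name prefix, joining the style prefix to each prompt with a full-width comma, and the `plan_image_set` helper.

```python
# (minimum count, maximum count, allowed ratios) per platform, from Step 1.
PLATFORM_PRESETS = {
    "xiaohongshu": (6, 9, ("3:4",)),
    "moments": (9, 9, ("1:1",)),
    "weibo": (4, 9, ("16:9", "1:1")),
}


def plan_image_set(platform, style_prefix, contents):
    """Return one job per image with a shared style prefix and numbered output."""
    low, high, ratios = PLATFORM_PRESETS[platform]
    if not low <= len(contents) <= high:
        raise ValueError(f"{platform} expects {low}-{high} images, got {len(contents)}")
    ratio = ratios[0]  # first allowed ratio; a caller could pick another
    return [
        {"prompt": f"{style_prefix}，{content}",
         "ratio": ratio,
         "output": f"set_{index:02d}.png"}
        for index, content in enumerate(contents, 1)
    ]
```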

General Specifications

- Prompts must be in Chinese; load `references/marketing-templates.md` for templates
- Keep one style per batch: define a style prefix and reuse it for every image
- Concurrency ≤ 8 images; retry failures individually
- Naming convention: `{type}_{sequence}.png` (e.g., `detail_01.png`, `scene_gaming.png`)
- On delivery: send a preview of each image, plus a zip package if there are multiple images
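
The naming and packaging rules can be sketched with the standard library. Both helpers are hypothetical, and the two-digit zero padding is inferred from the `detail_01.png` example rather than stated as a rule.

```python
import zipfile
from pathlib import Path


def output_name(kind, sequence):
    """Format a file name following the {type}_{sequence}.png convention."""
    return f"{kind}_{sequence:02d}.png"


def package_images(paths, archive_path):
    """Write the given image files into a zip archive, flattening directories."""
    with zipfile.ZipFile(archive_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in paths:
            archive.write(path, arcname=Path(path).name)
    return archive_path
```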

Trigger Keywords

- Generation: 生成图片 (generate image), 创建图片 (create image), 文生图 (text-to-image), 图生图 (image-to-image), 信息图 (infographic), 数据可视化 (data visualization)
- Analysis: 分析图片 (analyze image), OCR, 识别文字 (recognize text), 图生文 (image-to-text)
- Stitching: 长图 (long image), 微信长图 (WeChat long image), 拼接图片 (stitch images)
- Marketing: 产品图 (product image), 物料包 (material pack), 营销素材 (marketing material), 详情页 (detail page), 电商图 (e-commerce image), 设计图 (design drawing), 渲染图 (rendering), 效果图 (effect image)
- Disassembly: 拆解 (disassembly), 爆炸图 (exploded view), 分解 (decomposition), 特写 (close-up), 微距 (macro)
- Image sets: 套图 (image set), 九宫格 (nine-grid), 朋友圈 (Moments), 多尺寸 (multi-size), 多配色 (multi-color), SKU
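
As a rough illustration of how these keyword lists could drive routing, the sketch below matches a user message against each category by substring. Substring matching, the category names, and the `match_categories` helper are all assumptions; the document does not describe the actual matching mechanism.

```python
TRIGGER_KEYWORDS = {
    "generation": ("生成图片", "创建图片", "文生图", "图生图", "信息图", "数据可视化"),
    "analysis": ("分析图片", "OCR", "识别文字", "图生文"),
    "stitching": ("长图", "微信长图", "拼接图片"),
    "marketing": ("产品图", "物料包", "营销素材", "详情页",
                  "电商图", "设计图", "渲染图", "效果图"),
    "disassembly": ("拆解", "爆炸图", "分解", "特写", "微距"),
    "image_set": ("套图", "九宫格", "朋友圈", "多尺寸", "多配色", "SKU"),
}


def match_categories(message):
    """Return every category with at least one keyword found in the message."""
    text = message.upper()  # makes OCR and SKU case-insensitive
    return [
        category
        for category, keywords in TRIGGER_KEYWORDS.items()
        if any(keyword.upper() in text for keyword in keywords)
    ]
```

A message can match several categories at once, so the result is a list rather than a single label.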