image-service


Image Processing Skill

Overview

| Capability | Description | Script |
| --- | --- | --- |
| Text-to-image | Generate an image from a Chinese text description | scripts/text_to_image.py |
| Image-to-image | Edit an existing image | scripts/image_to_image.py |
| Image-to-text | Analyze image content (description, OCR, charts, etc.) | scripts/image_to_text.py |
| Long image stitching | Stitch multiple images vertically into a WeChat-style long image | scripts/merge_long_image.py |
| Research illustration | Generate research-report infographics in a preset hand-drawn style | scripts/research_image.py |

Configuration

Configuration file: config/settings.json

| Configuration item | Value |
| --- | --- |
| IMAGE_API_BASE_URL | https://llm.api.zyuncs.com/v1 |
| IMAGE_MODEL | lyra-flash-9 |
| VISION_MODEL | qwen2.5-vl-72b-instruct |
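
A minimal sketch of how a script might read these settings. It assumes config/settings.json is a flat JSON object keyed by the item names above; the `load_settings` helper and the fallback to the listed values are illustrative assumptions, not the skill's actual loader.

```python
import json
from pathlib import Path

# Fallback values copied from the configuration table above.
DEFAULTS = {
    "IMAGE_API_BASE_URL": "https://llm.api.zyuncs.com/v1",
    "IMAGE_MODEL": "lyra-flash-9",
    "VISION_MODEL": "qwen2.5-vl-72b-instruct",
}


def load_settings(path):
    """Return the settings dict, using DEFAULTS for any missing key.

    Assumption: the file holds a flat JSON object such as
    {"IMAGE_MODEL": "..."}; the real layout may differ.
    """
    settings = dict(DEFAULTS)
    file = Path(path)
    if file.is_file():
        with file.open(encoding="utf-8") as fh:
            settings.update(json.load(fh))
    return settings
```

Calling `load_settings("config/settings.json")` from the skill directory would then return the three values, with anything set in the file taking precedence over the fallbacks.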

Execution Specifications

By default, images are saved to the current working directory at the time the command is executed. Therefore:
  1. Do NOT use workdir to switch to the skill directory before running a command.
  2. Always run commands from the user's working directory, and refer to the scripts by absolute path.
  3. The scripts live in scripts/ under the skill directory.

Correct Example (replace PYTHON and SKILL_DIR with the actual paths in your environment)

bash
# "描述" stands for the prompt text ("description")
$PYTHON $SKILL_DIR/scripts/text_to_image.py "描述" -r 3:4 -o output.png
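
The same rule can be sketched in Python for callers that launch the scripts programmatically. The helper names `build_command` and `run_in_user_dir` are hypothetical; the point is only that the script is addressed by absolute path while the process runs in the user's directory.

```python
import os
import subprocess
import sys


def build_command(skill_dir, script_name, *args, python=sys.executable):
    """Build an argv list that calls a skill script by its absolute path."""
    script = os.path.join(os.path.abspath(skill_dir), "scripts", script_name)
    return [python, script, *args]


def run_in_user_dir(command, user_dir):
    """Run the command with the user's directory as cwd.

    The process never changes into the skill directory, so output files
    written to the current working directory end up next to the user's files.
    """
    return subprocess.run(command, cwd=user_dir, check=True)
```

For example, `run_in_user_dir(build_command(skill_dir, "text_to_image.py", "描述", "-r", "3:4", "-o", "output.png"), os.getcwd())` mirrors the shell command above.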

Quick Start

Text-to-image

bash
# Prompt: "Infographic style, title: AI Technology Trends"
$PYTHON $SKILL_DIR/scripts/text_to_image.py "信息图风格,标题:AI技术趋势" -r 16:9
# Prompt: "Vertical poster, product display"
$PYTHON $SKILL_DIR/scripts/text_to_image.py "竖版海报,产品展示" -r 3:4 -o poster.png
Parameters: -r aspect ratio | -s size | -o output path
Supported ratios: 1:1, 2:3, 3:2, 3:4, 4:3, 4:5, 5:4, 9:16, 16:9, 21:9
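
Since only these ten ratios are listed as supported, a caller may want to validate the -r value before invoking the script. The following is a small sketch; `check_ratio` and `orientation` are hypothetical helpers and are not part of the skill's scripts.

```python
SUPPORTED_RATIOS = (
    "1:1", "2:3", "3:2", "3:4", "4:3",
    "4:5", "5:4", "9:16", "16:9", "21:9",
)


def check_ratio(ratio):
    """Return the normalized ratio, or raise ValueError if it is unsupported."""
    # Accept the full-width colon that Chinese input methods often produce.
    normalized = ratio.strip().replace(":", ":")
    if normalized not in SUPPORTED_RATIOS:
        raise ValueError(f"unsupported ratio {ratio!r}; choose one of {SUPPORTED_RATIOS}")
    return normalized


def orientation(ratio):
    """Classify a supported ratio as 'landscape', 'portrait', or 'square'."""
    width, height = map(int, check_ratio(ratio).split(":"))
    if width == height:
        return "square"
    return "landscape" if width > height else "portrait"
```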

Image-to-image

bash
# "编辑描述" stands for the edit instruction ("edit description")
$PYTHON $SKILL_DIR/scripts/image_to_image.py input.png "编辑描述" -r 3:4

Image-to-text

bash
$PYTHON $SKILL_DIR/scripts/image_to_text.py image.jpg -m describe
$PYTHON $SKILL_DIR/scripts/image_to_text.py screenshot.png -m ocr
Modes: describe | ocr | chart | fashion | product | scene

Long image stitching

bash
$PYTHON $SKILL_DIR/scripts/merge_long_image.py img1.png img2.png -o output.png --blend 20
$PYTHON $SKILL_DIR/scripts/merge_long_image.py -p "*.png" -o long.png --sort name
Parameters: -p wildcard pattern | -o output | -w width | -g gap | --blend blend | --sort sort order

Research illustration

bash
# "标题" and "内容" stand for the title and content text
$PYTHON $SKILL_DIR/scripts/research_image.py -t arch -n "标题" -c "内容" -o output.png
Types: arch architecture diagram | flow flowchart | compare comparison chart | concept concept diagram

Required Before Execution: Determine the Request Type (Hard Rule)

After receiving an image generation request, first determine which type it is, then decide how to execute it.

Long Image Recognition Rules

A request is treated as a long image request if the prompt shows any of the following features:

| Feature type | Recognition keywords/patterns |
| --- | --- |
| Explicit declaration | 长图 (long image), 长图海报 (long image poster), 垂直长图 (vertical long image), 微信长图 (WeChat long image), Infographic, Long Banner |
| Segmented structure | The prompt contains multiple sections (e.g., "第1部分" (Part 1), "顶部" (top), "中间" (middle), "底部" (bottom)) |
| Numbered list | Uses numbered headings such as ### 1., ### 2. to segment the content |
| Multi-screen content | Describes 3 or more independent frames/modules |
| Top-to-bottom layout | Contains phrases such as "从上至下" or "从上到下" (both mean "from top to bottom") |
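
The keyword-based rules above can be approximated in code. The sketch below is a heuristic under stated assumptions: the thresholds (at least two position markers, at least two numbered headings) are guesses, the function names are hypothetical, and the "3 or more independent frames/modules" rule is not implemented because it needs semantic judgment rather than string matching.

```python
import re

EXPLICIT_KEYWORDS = ("长图", "长图海报", "垂直长图", "微信长图", "infographic", "long banner")
SEGMENT_MARKERS = ("顶部", "中间", "底部")
TOP_TO_BOTTOM = ("从上至下", "从上到下")
# Matches "第1部分", "第二部分", and similar section labels.
PART_PATTERN = re.compile(r"第\s*[0-9一二三四五六七八九十]+\s*部分")
# Matches heading lines such as "### 1." and "### 2.".
NUMBERED_HEADING = re.compile(r"^\s*###\s*\d+\.", re.MULTILINE)


def long_image_signals(prompt):
    """Return the names of the long-image features found in the prompt."""
    lowered = prompt.lower()
    signals = []
    if any(keyword in lowered for keyword in EXPLICIT_KEYWORDS):
        signals.append("explicit declaration")
    # Assumption: one section label, or two position markers, counts as segmented.
    if PART_PATTERN.search(prompt) or sum(m in prompt for m in SEGMENT_MARKERS) >= 2:
        signals.append("segmented structure")
    # Assumption: two or more numbered headings count as a numbered list.
    if len(NUMBERED_HEADING.findall(prompt)) >= 2:
        signals.append("numbered list")
    if any(phrase in prompt for phrase in TOP_TO_BOTTOM):
        signals.append("top-to-bottom layout")
    return signals


def is_long_image_request(prompt):
    """Any single signal is enough to classify the request as a long image."""
    return bool(long_image_signals(prompt))
```

Returning the list of matched signals, rather than only a boolean, makes it easier to explain why a request was routed to the long image process.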

Execution Path After Classification

Identified as long image → must first read references/long-image-guide.md → follow the long image process
Identified as single image → generate directly with text_to_image.py
Hard rule: once a request is identified as a long image, direct generation is prohibited. Load the long image guide first and follow its process.

Detailed Guides (Load On Demand)

| Scenario | Trigger condition | Reference document |
| --- | --- | --- |
| Generating a multi-screen long image | The request matches any long image recognition rule above | references/long-image-guide.md (must load) |
| Image contains Chinese text | The prompt requires Chinese titles/text in the image | references/text-rendering-guide.md |
| Illustrations for PPT/documents | The user provides color requirements or reference documents | references/color-sync-guide.md |
| API details | You need to understand the underlying implementation | docs/api-reference.md |
| Prompt techniques | You need to improve prompt results | docs/prompt-guide.md |

Prompt Key Points

  1. Prompts must be written in Chinese
  2. Titles and labels in the image must be in Chinese
  3. The default aspect ratio is 16:9; adjust it with the -r parameter
  4. Recommended styles: infographic, data visualization, hand-drawn text, tech illustration

Marketing Material Generation (Product Images/Material Packs/Design Images/Element Breakdowns)

When the user mentions keywords such as "产品图" (product image), "物料包" (material pack), "营销素材" (marketing material), "详情页" (detail page), "元素拆解" (element breakdown), "爆炸图" (exploded view), "套图" (image set), "多尺寸" (multi-size), or "电商图" (e-commerce image), follow the processes below.
Prompt template library: references/marketing-templates.md (must load; it contains the complete categorized templates)

Capability Matrix

| Capability | Trigger keywords | Process | Output |
| --- | --- | --- | --- |
| E-commerce detail long image | 详情页, 长图, 商品介绍 (detail page, long image, product introduction) | Chained sequential generation → merge stitching | 1 long image |
| Marketing material pack | 物料包, 营销素材, 多尺寸 (material pack, marketing material, multi-size) | Break down elements → multiple angles and scenes → zip | 10-15 images + zip |
| Product design image | 产品图, 渲染图, 效果图 (product image, rendering, mockup) | Base image → multi-angle/color variants | 3-8 images |
| Element breakdown image | 拆解, 爆炸图, 分解, 特写 (breakdown, exploded view, decomposition, close-up) | Overall image → local close-ups/functional breakdown | 4-8 images |
| Social media image set | 套图, 九宫格, 朋友圈 (image set, nine-grid, Moments) | Unified style → multi-size adaptation | 9 images (1:1) |
| Multi-color/SKU images | 配色, 多色, SKU (color scheme, multi-color, SKU) | Base image → image-to-image recoloring | N images |

Process 1: E-commerce Detail Long Image

Input: product name + selling points + style
Step 1: Plan the screens (usually 5-8)
  - Screen 1: hero image (product + core selling point)
  - Screens 2-N: one selling point per screen (function/material/scene/specs)
  - Last screen: specification table
Step 2: Chained sequential generation (must read references/long-image-guide.md)
  - Screen 1: generate the base image with text_to_image
  - Screens 2-N: use image_to_image with the previous screen as reference to keep the style consistent
Step 3: Stitch with merge_long_image (--blend 20 blends the seams)
Step 4: Output the long image + the original image of each screen
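
The chaining in Steps 2 and 3 can be sketched as a command plan. This is an illustration only and does not replace references/long-image-guide.md. The function name, the detail_NN.png file names, and the use of -o with image_to_image.py are assumptions (the quick-start examples show -o only for text_to_image.py), and the ratio is left to the caller.

```python
def plan_detail_long_image(screen_prompts, ratio, output="long.png"):
    """Return the ordered script invocations for a chained long image.

    Each entry is [script, *args]; prefix the script with the interpreter
    and the absolute scripts/ path before running it.
    """
    if not screen_prompts:
        raise ValueError("at least one screen prompt is required")
    commands = []
    screen_files = []
    for index, prompt in enumerate(screen_prompts, start=1):
        out = f"detail_{index:02d}.png"
        if index == 1:
            # Screen 1 is generated from text and becomes the style baseline.
            commands.append(["text_to_image.py", prompt, "-r", ratio, "-o", out])
        else:
            # Later screens take the previous screen as the reference image.
            # Assumption: image_to_image.py accepts -o like text_to_image.py.
            commands.append(
                ["image_to_image.py", screen_files[-1], prompt, "-r", ratio, "-o", out]
            )
        screen_files.append(out)
    # Stitch all screens in order and blend the seams.
    commands.append(["merge_long_image.py", *screen_files, "-o", output, "--blend", "20"])
    return commands
```

Because every screen depends on the one before it, these commands have to run strictly in order; only the final merge sees all of the files.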

Process 2: Marketing Material Pack (Key!)

Hard rule: do not resize one image into several sizes. Break the product down into elements, change the angle, and change the scene.
Input: product name + list of selling points + style preference
Step 1: Generate the base main image (text_to_image, full product view, 16:9)
Step 2: Element breakdown (image_to_image × 4-6 images)
  - Macro close-up of the core selling point (1:1)
  - Functional exploded view/breakdown (3:4)
  - Material/craftsmanship details (4:3)
  - Group shot of all accessories (16:9)
Step 3: Scene variants (text_to_image / image_to_image × 3-4 images)
  - Everyday usage scene (16:9)
  - Work usage scene (4:3)
  - Unboxing scene (1:1)
  - Artistic silhouette/atmosphere image (21:9)
Step 4: Marketing creatives (text_to_image × 3-4 images)
  - Comparison/review chart (3:4)
  - Data visualization/sound wave chart (16:9)
  - Multi-color SKU display (16:9)
  - Nine-grid social media image (1:1)
Step 5: Package everything into a zip + send previews one by one
Concurrency rule: at most 8 concurrent images per batch; split larger jobs into batches. Retry failed images individually.
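
One way to honor the concurrency rule is sketched below with the standard library's thread pool. The `run_in_batches` helper is hypothetical, and retrying each failed job exactly once is an assumption, since the rule does not say how many retries to attempt.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENCY = 8  # Limit stated by the concurrency rule.


def _attempt(job):
    """Run one job and return (value, error) instead of raising."""
    try:
        return job(), None
    except Exception as exc:
        return None, exc


def run_in_batches(jobs, batch_size=MAX_CONCURRENCY):
    """Run named zero-argument jobs in batches of at most batch_size.

    jobs maps a name to a callable. Returns (results, failures), where
    failures holds the error of every job that also failed its retry.
    """
    names = list(jobs)
    results, failures = {}, {}
    for start in range(0, len(names), batch_size):
        batch = names[start:start + batch_size]
        with ThreadPoolExecutor(max_workers=batch_size) as pool:
            outcomes = list(pool.map(_attempt, (jobs[name] for name in batch)))
        for name, (value, error) in zip(batch, outcomes):
            if error is not None:
                # Retry the failed job on its own, outside the batch.
                value, error = _attempt(jobs[name])
            if error is None:
                results[name] = value
            else:
                failures[name] = error
    return results, failures
```

In practice each job would wrap one script invocation, for example a `subprocess.run` call for a single image.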

Process 3: Product Design Image

Input: product name + design requirements
Step 1: Generate the base main image with text_to_image (front view of the product, 16:9)
Step 2: Generate variants with image_to_image (using the base image as reference)
  - 45-degree view
  - Side/back view
  - Top view
  - Different color versions
  - Different usage scenes

Process 4: Element Breakdown Image

Input: product image (existing) or product description
Step 1: If a product image exists → break it down with image_to_image; if not → first generate the full view with text_to_image
Step 2: Generate element by element (image_to_image)
  - Exploded view/decomposition perspective
  - Macro close-up of part 1 + function annotation
  - Macro close-up of part 2 + craftsmanship annotation
  - Macro close-up of part 3 + material annotation
Step 3: Optionally stitch into a long image (merge_long_image)

Process 5: Social Media Image Set

Input: product/theme + platform (Xiaohongshu/Moments/Weibo)
Step 1: Determine the quantity and ratio
  - Xiaohongshu: 6-9 images, 3:4
  - Moments nine-grid: 9 images, 1:1
  - Weibo: 4-9 images, 16:9 or 1:1
Step 2: Plan the content of each image (see the nine-grid templates in marketing-templates.md)
Step 3: Define a unified style prefix and generate concurrently
Step 4: Output in numbered order
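
The platform limits in Step 1 and the numbering in Step 4 can be captured in a small planning helper. This is a sketch under assumptions: the platform keys, the `plan_image_set` name, the set_NN.png file names, and joining the style prefix to each topic with a full-width comma are all illustrative choices.

```python
# Quantity and ratio limits per platform, taken from Step 1.
PLATFORM_PRESETS = {
    "xiaohongshu": {"min": 6, "max": 9, "ratios": ("3:4",)},
    "moments": {"min": 9, "max": 9, "ratios": ("1:1",)},
    "weibo": {"min": 4, "max": 9, "ratios": ("16:9", "1:1")},
}


def plan_image_set(platform, topics, style_prefix, ratio=None):
    """Return one numbered generation plan entry per topic."""
    preset = PLATFORM_PRESETS[platform]
    if not preset["min"] <= len(topics) <= preset["max"]:
        raise ValueError(
            f"{platform} expects {preset['min']}-{preset['max']} images, got {len(topics)}"
        )
    # Default to the first listed ratio when the caller does not pick one.
    ratio = ratio or preset["ratios"][0]
    if ratio not in preset["ratios"]:
        raise ValueError(f"{platform} supports ratios {preset['ratios']}, got {ratio!r}")
    return [
        {
            "index": index,
            "ratio": ratio,
            # Reuse the same style prefix in every prompt to keep the set consistent.
            "prompt": f"{style_prefix},{topic}",
            "output": f"set_{index:02d}.png",
        }
        for index, topic in enumerate(topics, start=1)
    ]
```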

General Specifications

  1. Prompts must be in Chinese; load references/marketing-templates.md for templates
  2. Keep one style per batch: define a style prefix and reuse it for every image
  3. At most 8 concurrent images; retry failures individually
  4. Naming convention: {type}_{sequence}.png (e.g., detail_01.png, scene_gaming.png)
  5. On delivery: send previews one by one + a zip package (if there are multiple images)
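
Items 4 and 5 can be sketched with the standard library. The helper names are hypothetical, and the two-digit zero padding is inferred from the detail_01.png example rather than stated as a rule.

```python
import zipfile
from pathlib import Path


def output_name(kind, sequence):
    """Build a file name following the {type}_{sequence}.png convention."""
    # Assumption: two-digit padding, as in the detail_01.png example.
    return f"{kind}_{sequence:02d}.png"


def package_images(image_paths, zip_path):
    """Write the images into a zip archive, stored flat by file name."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as archive:
        for path in image_paths:
            archive.write(path, arcname=Path(path).name)
    return zip_path
```

Storing the files flat (by name only) keeps the archive independent of where the images were generated on disk.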

Trigger Keywords

  • Generation: generate image, create image, text-to-image, image-to-image, infographic, data visualization (生成图片、创建图片、文生图、图生图、信息图、数据可视化)
  • Analysis: analyze image, OCR, recognize text, image-to-text (分析图片、OCR、识别文字、图生文)
  • Stitching: long image, WeChat long image, stitch images (长图、微信长图、拼接图片)
  • Marketing: product image, material pack, marketing material, detail page, e-commerce image, design image, rendering, mockup (产品图、物料包、营销素材、详情页、电商图、设计图、渲染图、效果图)
  • Breakdown: breakdown, exploded view, decomposition, close-up, macro (拆解、爆炸图、分解、特写、微距)
  • Image sets: image set, nine-grid, Moments, multi-size, multi-color, SKU (套图、九宫格、朋友圈、多尺寸、多配色、SKU)