image-service
Image Processing Skill
Overview
| Capability | Description | Script |
|---|---|---|
| Text-to-image | Generate images from Chinese text descriptions | `scripts/text_to_image.py` |
| Image-to-image | Edit an existing image | `scripts/image_to_image.py` |
| Image-to-text | Analyze image content (description, OCR, charts, etc.) | `scripts/image_to_text.py` |
| Long image stitching | Vertically stitch multiple images into a WeChat long image | `scripts/merge_long_image.py` |
| Research illustration | Preset hand-drawn style infographics for research reports | `scripts/research_image.py` |
Configuration
Configuration file: `config/settings.json`

| Configuration Item | Value |
|---|---|
| IMAGE_API_BASE_URL | |
| IMAGE_MODEL | |
| VISION_MODEL | |
Execution Specifications
Images are saved by default to the current working directory at the time the command runs:
- Do NOT use `workdir` to switch to the skill directory before running commands
- Always run in the user's working directory and call the script by its absolute path
- Script path: `scripts/` under the skill directory

```bash
# Correct example (replace PYTHON and SKILL_DIR with the actual paths in your environment)
# "描述" stands for the prompt text (description)
$PYTHON $SKILL_DIR/scripts/text_to_image.py "描述" -r 3:4 -o output.png
```
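The rule above (stay in the user's working directory, call the script by its absolute path) could be sketched in Python roughly as follows. `build_command` and `run_in_current_dir` are hypothetical helpers for illustration, not part of the skill, and the interpreter and skill paths are placeholders.

```python
import subprocess
from pathlib import Path


def build_command(python, skill_dir, script, *args):
    """Build an argv list that calls a skill script by its absolute path."""
    script_path = Path(skill_dir).expanduser().resolve() / "scripts" / script
    return [str(python), str(script_path), *(str(a) for a in args)]


def run_in_current_dir(cmd):
    """Run cmd without passing cwd=, so the child inherits the caller's working directory."""
    return subprocess.run(cmd, check=True)


# Placeholder paths (assumptions): substitute your own interpreter and skill directory.
cmd = build_command("python3", "/opt/skills/image-service", "text_to_image.py",
                    "描述", "-r", "3:4", "-o", "output.png")
```

Because `run_in_current_dir` never sets `cwd`, a relative output path such as `output.png` resolves against the directory the caller is already in, which is the behavior the specification asks for.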
Quick Start
Text-to-image
```bash
# Prompt: "Infographic style, title: AI Technology Trends"
$PYTHON $SKILL_DIR/scripts/text_to_image.py "信息图风格,标题:AI技术趋势" -r 16:9
# Prompt: "Vertical poster, product display"
$PYTHON $SKILL_DIR/scripts/text_to_image.py "竖版海报,产品展示" -r 3:4 -o poster.png
```

Parameters: `-r` aspect ratio | `-s` size | `-o` output path

Supported ratios: `1:1`, `2:3`, `3:2`, `3:4`, `4:3`, `4:5`, `5:4`, `9:16`, `16:9`, `21:9`
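As an illustration of working with the supported `-r` values, the sketch below checks a ratio against the list and picks the closest supported ratio for a given pixel size. Both helpers are hypothetical; whether `text_to_image.py` itself rejects other values is not stated in the source.

```python
SUPPORTED_RATIOS = ["1:1", "2:3", "3:2", "3:4", "4:3", "4:5", "5:4", "9:16", "16:9", "21:9"]


def ratio_value(ratio):
    """Convert a 'W:H' string into a width/height float."""
    width, height = ratio.split(":")
    return int(width) / int(height)


def check_ratio(ratio):
    """Return ratio unchanged if it is in the supported list, else raise ValueError."""
    if ratio not in SUPPORTED_RATIOS:
        raise ValueError(f"unsupported ratio {ratio!r}; choose one of {SUPPORTED_RATIOS}")
    return ratio


def closest_ratio(width, height):
    """Pick the supported ratio whose width/height value is nearest to the given size."""
    target = width / height
    return min(SUPPORTED_RATIOS, key=lambda r: abs(ratio_value(r) - target))
```

For example, a 1920×1080 target maps to `16:9`, which can then be passed as `-r 16:9`.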
Image-to-image
```bash
# "编辑描述" stands for the edit description
$PYTHON $SKILL_DIR/scripts/image_to_image.py input.png "编辑描述" -r 3:4
```
Image-to-text
```bash
$PYTHON $SKILL_DIR/scripts/image_to_text.py image.jpg -m describe
$PYTHON $SKILL_DIR/scripts/image_to_text.py screenshot.png -m ocr
```

Modes: `describe` | `ocr` | `chart` | `fashion` | `product` | `scene`
Long image stitching
```bash
$PYTHON $SKILL_DIR/scripts/merge_long_image.py img1.png img2.png -o output.png --blend 20
$PYTHON $SKILL_DIR/scripts/merge_long_image.py -p "*.png" -o long.png --sort name
```

Parameters: `-p` wildcard | `-o` output | `-w` width | `-g` gap | `--blend` blend | `--sort` sort
Research illustration
```bash
# "标题" = title, "内容" = content
$PYTHON $SKILL_DIR/scripts/research_image.py -t arch -n "标题" -c "内容" -o output.png
```

Types: `arch` architecture diagram | `flow` flowchart | `compare` comparison chart | `concept` concept diagram
Required Before Execution: Determine the Request Type (Iron Rule)
After receiving an image generation request, you must first determine the type before deciding on the execution method:
Long Image Recognition Rules
A request is treated as a long-image request if the prompt contains any of the following features:
| Feature Type | Recognition Keywords/Patterns |
|---|---|
| Explicit Declaration | 长图 (long image), 长图海报 (long image poster), 垂直长图 (vertical long image), 微信长图 (WeChat long image), Infographic, Long Banner |
| Segmented Structure | The prompt contains multiple sections (e.g., "第1部分" (Part 1), "顶部" (top), "中间" (middle), "底部" (bottom)) |
| Numbered List | Uses numbering like |
| Multi-screen Content | Describes 3 or more independent frames/modules |
| Top-to-bottom Layout | Contains phrases such as "从上至下" or "从上到下" (from top to bottom) |
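The recognition rules above could be approximated with a small keyword classifier. This is only a sketch under stated assumptions: the numbered-list pattern is elided in the source, so the `1.` / `1、` / `1)` form used here is a guess; requiring at least two segment markers or numbered items is also an assumption; and the "3 or more independent frames" rule needs human or model judgment and is not covered.

```python
import re

# Explicit keywords from the table (English ones are matched case-insensitively).
EXPLICIT = ("长图", "长图海报", "垂直长图", "微信长图", "infographic", "long banner")
# Segment markers from the table's examples.
SEGMENT = re.compile(r"第\s*[0-9一二三四五六七八九十]+\s*部分|顶部|中间|底部")
# ASSUMPTION: numbered items look like "1." / "1、" / "1)" at the start of a line.
NUMBERED = re.compile(r"^\s*\d+[.、)]", re.MULTILINE)
TOP_DOWN = ("从上至下", "从上到下")


def long_image_reasons(prompt):
    """Return the names of the recognition rules that the prompt matches."""
    reasons = []
    lowered = prompt.lower()
    if any(keyword in lowered for keyword in EXPLICIT):
        reasons.append("explicit declaration")
    if len(set(SEGMENT.findall(prompt))) >= 2:  # assumed threshold
        reasons.append("segmented structure")
    if len(NUMBERED.findall(prompt)) >= 2:  # assumed threshold
        reasons.append("numbered list")
    if any(keyword in prompt for keyword in TOP_DOWN):
        reasons.append("top-to-bottom layout")
    return reasons


def is_long_image(prompt):
    """A prompt counts as a long-image request if any rule matches."""
    return bool(long_image_reasons(prompt))
```

Returning the matched rule names rather than a bare boolean makes it easier to explain why a request was routed to the long-image process.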
Execution Path After Judgment
Identified as long image → must first read `references/long-image-guide.md` → follow the long-image process
Identified as single image → generate directly with `text_to_image.py`

Iron Rule: once a request is identified as a long image, direct generation is prohibited! You must first load the long-image guide and follow its process.
Detailed Guides (Load On Demand)
| Scenario | Trigger Condition | Reference Document |
|---|---|---|
| Generate multi-screen long image | Hits any of the above long image recognition rules | |
| Image contains Chinese text | The prompt requires the image to include Chinese titles/text | |
| Create illustrations for PPT/documents | The user provides color requirements or reference documents | |
| API interface details | Need to understand underlying implementation | |
| Prompt engineering tips | Need to optimize prompt effects | |
Prompt Key Points
- Prompts must be written in Chinese
- Titles and labels in the image must be in Chinese
- Default aspect ratio is 16:9, adjustable via the `-r` parameter
- Recommended styles: infographic, data visualization, hand-drawn text, tech illustration
Marketing Material Generation (Product Images/Material Packs/Design Images/Element Disassembly)
When the user mentions keywords such as 产品图 (product image), 物料包 (material pack), 营销素材 (marketing material), 详情页 (detail page), 元素拆解 (element disassembly), 爆炸图 (exploded view), 套图 (image set), 多尺寸 (multi-size), or 电商图 (e-commerce image), follow the processes below.
Prompt template library: `references/marketing-templates.md` (must be loaded; it contains the complete set of categorized templates)
Capability Matrix
| Capability | Trigger Keywords | Process | Output |
|---|---|---|---|
| E-commerce detail long image | 详情页 (detail page), 长图 (long image), 商品介绍 (product introduction) | Stacked serial image generation → merge stitching | 1 long image |
| Marketing material pack | 物料包 (material pack), 营销素材 (marketing material), 多尺寸 (multi-size) | Disassemble elements → multiple angles and scenes → zip | 10-15 images + zip package |
| Product design image | 产品图 (product image), 渲染图 (rendering), 效果图 (effect image) | Base image → multi-angle/color variants | 3-8 images |
| Element disassembly diagram | 拆解 (disassembly), 爆炸图 (exploded view), 分解 (decomposition), 特写 (close-up) | Overall image → local close-ups/functional disassembly | 4-8 images |
| Social media image set | 套图 (image set), 九宫格 (nine-grid), 朋友圈 (Moments) | Unified style → multi-size adaptation | 9 images (1:1) |
| Multi-color/SKU images | 配色 (color scheme), 多色 (multi-color), SKU | Base image → image-to-image recoloring | N images |
Process 1: E-commerce Detail Long Image
Input: Product name + selling points + style
↓
Step 1: Plan screens (usually 5-8 screens)
- Screen 1: Hero image (product + core selling points)
- Screen 2-N: Expand each selling point (function/material/scene/parameters)
- Last screen: Specification parameter table
↓
Step 2: Stacked serial image generation (must read references/long-image-guide.md)
- Screen 1: Generate base image via text_to_image
- Screen 2-N: Use image_to_image with the previous screen as reference to maintain consistent style
↓
Step 3: Stitch via merge_long_image (use --blend 20 to merge seams)
↓
Step 4: Output long image + original images of each screen
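The stacked serial flow in Process 1 can be sketched as a command planner: screen 1 comes from `text_to_image.py`, each later screen feeds the previous screen into `image_to_image.py`, and `merge_long_image.py` stitches the results with `--blend 20`. This is an illustrative sketch only. `plan_long_image` and the `screen_NN.png` names are hypothetical, the 16:9 default simply mirrors the documented default ratio, and it assumes `image_to_image.py` accepts `-o` like `text_to_image.py`, which the source does not confirm. The real flow must still follow `references/long-image-guide.md`.

```python
def plan_long_image(python, skill_dir, screen_prompts, ratio="16:9", output="long.png"):
    """Return the argv lists for a stacked serial long-image run, in execution order."""
    scripts = f"{skill_dir}/scripts"
    commands = []
    screens = []
    for index, prompt in enumerate(screen_prompts, start=1):
        out = f"screen_{index:02d}.png"  # hypothetical file naming
        if index == 1:
            # Screen 1: generate the base image from text.
            cmd = [python, f"{scripts}/text_to_image.py", prompt, "-r", ratio, "-o", out]
        else:
            # Later screens: use the previous screen as the reference image.
            # ASSUMPTION: image_to_image.py supports -o (not shown in the source).
            cmd = [python, f"{scripts}/image_to_image.py", screens[-1], prompt,
                   "-r", ratio, "-o", out]
        commands.append(cmd)
        screens.append(out)
    # Final step: stitch all screens, blending the seams.
    commands.append([python, f"{scripts}/merge_long_image.py", *screens,
                     "-o", output, "--blend", "20"])
    return commands
```

The commands must run one after another, since each screen depends on the file produced by the previous one.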
Process 2: Marketing Material Pack (Key!)
Iron Rule: Don't just resize the same image! Disassemble elements, change angles, change scenes!
Input: Product name + list of selling points + style preference
↓
Step 1: Generate base main image (text_to_image, full product view 16:9)
↓
Step 2: Element disassembly (image_to_image × 4-6 images)
- Macro close-up of core selling points (1:1)
- Function exploded view/disassembly diagram (3:4)
- Material/craft details (4:3)
- Full set of accessories (16:9)
↓
Step 3: Scene variants (text_to_image / image_to_image × 3-4 images)
- Daily usage scene (16:9)
- Work usage scene (4:3)
- Unboxing scene (1:1)
- Art silhouette/atmosphere image (21:9)
↓
Step 4: Marketing creativity (text_to_image × 3-4 images)
- Comparison review chart (3:4)
- Data visualization/sound wave chart (16:9)
- Multi-color SKU display (16:9)
- Nine-grid social media images (1:1)
↓
Step 5: Package all into zip + send previews one by one

Concurrency rule: generate at most 8 images concurrently per batch; split anything beyond that into further batches. Retry failed images individually.
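One way to honor the concurrency rule is to process jobs in slices of 8 with a thread pool and then retry the failures one at a time. The sketch below is a hypothetical helper, not the skill's implementation; the single retry per failed job is an assumption, since the source does not say how many retries to attempt.

```python
from concurrent.futures import ThreadPoolExecutor

MAX_CONCURRENT = 8  # limit taken from the concurrency rule


def _attempt(worker, job):
    """Run worker(job) and report (succeeded, result_or_exception)."""
    try:
        return True, worker(job)
    except Exception as exc:
        return False, exc


def run_in_batches(jobs, worker, retries=1):
    """Run jobs in batches of at most 8, then retry failed jobs individually.

    jobs must be hashable (for example, output file names).
    Returns (results_by_job, jobs_that_still_failed).
    """
    results, failed = {}, []
    for start in range(0, len(jobs), MAX_CONCURRENT):
        batch = jobs[start:start + MAX_CONCURRENT]
        with ThreadPoolExecutor(max_workers=MAX_CONCURRENT) as pool:
            outcomes = list(pool.map(lambda job: _attempt(worker, job), batch))
        for job, (ok, value) in zip(batch, outcomes):
            if ok:
                results[job] = value
            else:
                failed.append(job)
    still_failed = []
    for job in failed:  # sequential, one job at a time
        for _ in range(retries):
            ok, value = _attempt(worker, job)
            if ok:
                results[job] = value
                break
        else:
            still_failed.append(job)
    return results, still_failed
```

Here `worker` would be whatever function launches one image generation, for example a wrapper around a `subprocess.run` call.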
Process 3: Product Design Image
Input: Product name + design requirements
↓
Step 1: Generate base main image via text_to_image (front view of product, 16:9)
↓
Step 2: Generate variants via image_to_image (using the base image as reference)
- 45-degree view
- Side/back view
- Top view
- Different color versions
- Different usage scenes
Process 4: Element Disassembly Diagram
Input: Product image (existing) or product description
↓
Step 1: If there is a product image → disassemble via image_to_image; if no image → first generate the full view via text_to_image
↓
Step 2: Generate element by element (image_to_image)
- Exploded view/decomposition perspective
- Macro close-up of Part 1 + function annotation
- Macro close-up of Part 2 + craft annotation
- Macro close-up of Part 3 + material annotation
↓
Step 3: Optional long image stitching (merge_long_image)
Process 5: Social Media Image Set
Input: Product/theme + platform (Xiaohongshu/Moments/Weibo)
↓
Step 1: Determine quantity and ratio
- Xiaohongshu: 6-9 images, 3:4
- Moments nine-grid: 9 images, 1:1
- Weibo: 4-9 images, 16:9 or 1:1
↓
Step 2: Plan content for each image (refer to nine-grid templates in marketing-templates.md)
↓
Step 3: Define a unified style prefix and generate concurrently
↓
Step 4: Output in numbered order
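Step 1 of Process 5 amounts to a lookup table from platform to image count and ratio. A minimal sketch follows; the counts and ratios come from the list above, while the dictionary keys, the `set_NN.png` file names, and the choice to default to the maximum count are assumptions made for illustration.

```python
# Counts and ratios from Step 1; the key names are hypothetical.
PLATFORM_PRESETS = {
    "xiaohongshu": {"min": 6, "max": 9, "ratios": ["3:4"]},
    "moments": {"min": 9, "max": 9, "ratios": ["1:1"]},
    "weibo": {"min": 4, "max": 9, "ratios": ["16:9", "1:1"]},
}


def plan_image_set(platform, count=None, ratio=None):
    """Return a numbered list of {file, ratio} entries for one platform."""
    preset = PLATFORM_PRESETS[platform]
    if count is None:
        count = preset["max"]  # ASSUMPTION: default to the largest allowed set
    if not preset["min"] <= count <= preset["max"]:
        raise ValueError(
            f"{platform} expects {preset['min']}-{preset['max']} images, got {count}")
    if ratio is None:
        ratio = preset["ratios"][0]
    if ratio not in preset["ratios"]:
        raise ValueError(f"{platform} supports ratios {preset['ratios']}, got {ratio}")
    return [{"file": f"set_{i:02d}.png", "ratio": ratio} for i in range(1, count + 1)]
```

Each entry can then be paired with a prompt that starts with the shared style prefix before the images are generated.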
General Specifications
- Prompts must be in Chinese; load `references/marketing-templates.md` to get templates
- Unified style within a batch: define a style prefix and reuse it for all images
- Concurrency ≤ 8 images; retry failed ones individually
- Naming convention: `{type}_{sequence}.png` (e.g., `detail_01.png`, `scene_gaming.png`)
- Delivery: send previews one by one + a zip package (if there are multiple images)
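The naming convention can be captured in a one-line helper. `make_filename` is hypothetical, and the two-digit zero padding is inferred from the `detail_01.png` example rather than stated in the source.

```python
def make_filename(kind, sequence):
    """Build '{type}_{sequence}.png'; integers are zero-padded to two digits."""
    suffix = f"{sequence:02d}" if isinstance(sequence, int) else str(sequence)
    return f"{kind}_{suffix}.png"
```

Passing a string as `sequence` covers descriptive names such as `scene_gaming.png`.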
Trigger Keywords
- Generation: 生成图片 (generate image), 创建图片 (create image), 文生图 (text-to-image), 图生图 (image-to-image), 信息图 (infographic), 数据可视化 (data visualization)
- Analysis: 分析图片 (analyze image), OCR, 识别文字 (recognize text), 图生文 (image-to-text)
- Stitching: 长图 (long image), 微信长图 (WeChat long image), 拼接图片 (stitch images)
- Marketing: 产品图 (product image), 物料包 (material pack), 营销素材 (marketing material), 详情页 (detail page), 电商图 (e-commerce image), 设计图 (design drawing), 渲染图 (rendering), 效果图 (effect image)
- Disassembly: 拆解 (disassembly), 爆炸图 (exploded view), 分解 (decomposition), 特写 (close-up), 微距 (macro)
- Image sets: 套图 (image set), 九宫格 (nine-grid), 朋友圈 (Moments), 多尺寸 (multi-size), 多配色 (multi-color), SKU