image-generation

Image Generation Skill

Overview

This skill generates high-quality images using structured prompts and a Python script. The workflow includes creating JSON-formatted prompts and executing image generation with optional reference images.

本技能通过结构化提示词和Python脚本生成高质量图像。工作流程包括创建JSON格式的提示词，以及结合可选参考图像执行图像生成。

Core Capabilities

Create structured JSON prompts for AIGC image generation
Support multiple reference images for style/composition guidance
Generate images through automated Python script execution
Handle various image generation scenarios (character design, scenes, products, etc.)

Workflow

Step 1: Understand Requirements

When a user requests image generation, identify:

Subject/content: What should be in the image
Style preferences: Art style, mood, color palette
Technical specs: Aspect ratio, composition, lighting
Reference images: Any images to guide generation
You don't need to check the folder under `/mnt/user-data`.

Step 2: Create Structured Prompt

Generate a structured JSON file in `/mnt/user-data/workspace/` with the naming pattern `{descriptive-name}.json`.
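
A minimal sketch of this step in Python, assuming the prompt is built as a dictionary and serialized with the standard `json` module. The helper names `slugify` and `write_prompt_file` are illustrative and not part of the skill; only the workspace path and the naming pattern come from the text above.

```python
import json
import re
from pathlib import Path

WORKSPACE = Path("/mnt/user-data/workspace")  # directory named in Step 2


def slugify(name: str) -> str:
    """Turn a free-form description into a lowercase, hyphenated file stem."""
    stem = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    return stem or "prompt"  # fall back to a generic stem if nothing survives


def write_prompt_file(prompt: dict, name: str, workspace: Path = WORKSPACE) -> Path:
    """Write the prompt as {descriptive-name}.json and return its path."""
    workspace.mkdir(parents=True, exist_ok=True)
    path = workspace / f"{slugify(name)}.json"
    path.write_text(json.dumps(prompt, indent=2, ensure_ascii=False), encoding="utf-8")
    return path
```

The returned path can then be passed on as the `--prompt-file` argument in Step 3.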

Step 3: Execute Generation

Call the Python script:

bash

python /mnt/skills/public/image-generation/scripts/generate.py \
  --prompt-file /mnt/user-data/workspace/prompt-file.json \
  --reference-images /path/to/ref1.jpg /path/to/ref2.png \
  --output-file /mnt/user-data/outputs/generated-image.jpg \
  --aspect-ratio 16:9

Parameters:

`--prompt-file`: Absolute path to the JSON prompt file (required)
`--reference-images`: Absolute paths to reference images (optional, space-separated)
`--output-file`: Absolute path to the output image file (required)
`--aspect-ratio`: Aspect ratio of the generated image (optional, default: 16:9)

[!NOTE] Do NOT read the Python file; just call it with the parameters.
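
When the call is made from Python rather than a shell, the same command can be assembled as an argument list. The sketch below uses only the four flags documented above; the function names `build_generate_command` and `run_generation` are hypothetical, and it assumes the script signals failure with a non-zero exit code, which the text does not state.

```python
import subprocess
from pathlib import PurePosixPath

SCRIPT = "/mnt/skills/public/image-generation/scripts/generate.py"


def build_generate_command(prompt_file, output_file, reference_images=(), aspect_ratio="16:9"):
    """Assemble the argument list for generate.py from the documented flags."""
    for p in [prompt_file, output_file, *reference_images]:
        # The parameter list asks for absolute paths, so reject relative ones early.
        if not PurePosixPath(p).is_absolute():
            raise ValueError(f"expected an absolute path, got {p!r}")
    cmd = ["python", SCRIPT, "--prompt-file", str(prompt_file)]
    if reference_images:
        cmd += ["--reference-images", *map(str, reference_images)]
    cmd += ["--output-file", str(output_file), "--aspect-ratio", aspect_ratio]
    return cmd


def run_generation(**kwargs):
    """Run the script; raises CalledProcessError on a non-zero exit (assumed behavior)."""
    return subprocess.run(build_generate_command(**kwargs), check=True)
```

Passing a list instead of a single shell string avoids quoting problems when a path contains spaces.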

Character Generation Example

User request: "Create a Tokyo street style woman character in 1990s"

Create prompt file:

/mnt/user-data/workspace/asian-woman.json

json

{
  "characters": [{
    "gender": "female",
    "age": "mid-20s",
    "ethnicity": "Japanese",
    "body_type": "slender, elegant",
    "facial_features": "delicate features, expressive eyes, subtle makeup with emphasis on lips, long dark hair partially wet from rain",
    "clothing": "stylish trench coat, designer handbag, high heels, contemporary Tokyo street fashion",
    "accessories": "minimal jewelry, statement earrings, leather handbag",
    "era": "1990s"
  }],
  "negative_prompt": "blurry face, deformed, low quality, overly sharp digital look, oversaturated colors, artificial lighting, studio setting, posed, selfie angle",
  "style": "Leica M11 street photography aesthetic, film-like rendering, natural color palette with slight warmth, bokeh background blur, analog photography feel",
  "composition": "medium shot, rule of thirds, subject slightly off-center, environmental context of Tokyo street visible, shallow depth of field isolating subject",
  "lighting": "neon lights from signs and storefronts, wet pavement reflections, soft ambient city glow, natural street lighting, rim lighting from background neons",
  "color_palette": "muted naturalistic tones, warm skin tones, cool blue and magenta neon accents, desaturated compared to digital photography, film grain texture"
}

Execute generation:

bash

python /mnt/skills/public/image-generation/scripts/generate.py \
  --prompt-file /mnt/user-data/workspace/asian-woman.json \
  --output-file /mnt/user-data/outputs/asian-woman-01.jpg \
  --aspect-ratio 2:3

With reference images:

json

{
  "characters": [{
    "gender": "based on [Image 1]",
    "age": "based on [Image 1]",
    "ethnicity": "human from [Image 1] adapted to Star Wars universe",
    "body_type": "based on [Image 1]",
    "facial_features": "matching [Image 1] with slight weathered look from space travel",
    "clothing": "Star Wars style outfit - worn leather jacket with utility vest, cargo pants with tactical pouches, scuffed boots, belt with holster",
    "accessories": "blaster pistol on hip, comlink device on wrist, goggles pushed up on forehead, satchel with supplies, personal vehicle based on [Image 2]",
    "era": "Star Wars universe, post-Empire era"
  }],
  "prompt": "Character inspired by [Image 1] standing next to a vehicle inspired by [Image 2] on a bustling alien planet street in Star Wars universe aesthetic. Character wearing worn leather jacket with utility vest, cargo pants with tactical pouches, scuffed boots, belt with blaster holster. The vehicle adapted to Star Wars aesthetic with weathered metal panels, repulsor engines, desert dust covering, parked on the street. Exotic alien marketplace street with multi-level architecture, weathered metal structures, hanging market stalls with colorful awnings, alien species walking by as background characters. Twin suns casting warm golden light, atmospheric dust particles in air, moisture vaporators visible in distance. Gritty lived-in Star Wars aesthetic, practical effects look, film grain texture, cinematic composition.",
  "negative_prompt": "clean futuristic look, sterile environment, overly CGI appearance, fantasy medieval elements, Earth architecture, modern city",
  "style": "Star Wars original trilogy aesthetic, lived-in universe, practical effects inspired, cinematic film look, slightly desaturated with warm tones",
  "composition": "medium wide shot, character in foreground with alien street extending into background, environmental storytelling, rule of thirds",
  "lighting": "warm golden hour lighting from twin suns, rim lighting on character, atmospheric haze, practical light sources from market stalls",
  "color_palette": "warm sandy tones, ochre and sienna, dusty blues, weathered metals, muted earth colors with pops of alien market colors",
  "technical": {
    "aspect_ratio": "9:16",
    "quality": "high",
    "detail_level": "highly detailed with film-like texture"
  }
}

bash

python /mnt/skills/public/image-generation/scripts/generate.py \
  --prompt-file /mnt/user-data/workspace/star-wars-scene.json \
  --reference-images /mnt/user-data/uploads/character-ref.jpg /mnt/user-data/uploads/vehicle-ref.jpg \
  --output-file /mnt/user-data/outputs/star-wars-scene-01.jpg \
  --aspect-ratio 16:9
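
The example above refers to reference images as `[Image 1]` and `[Image 2]` and passes two files to `--reference-images`. A small pre-flight check can catch a mismatch between the two. This sketch assumes that `[Image N]` denotes the N-th path given to `--reference-images`, which the example suggests but the text does not state; the function names are illustrative.

```python
import json
import re

PLACEHOLDER = re.compile(r"\[Image (\d+)\]")


def referenced_image_numbers(prompt: dict) -> set[int]:
    """Collect every N that appears as "[Image N]" anywhere in the prompt."""
    text = json.dumps(prompt)  # flatten nested fields into one searchable string
    return {int(n) for n in PLACEHOLDER.findall(text)}


def check_references(prompt: dict, reference_images: list[str]) -> list[str]:
    """Return human-readable problems; an empty list means the counts line up."""
    used = referenced_image_numbers(prompt)
    available = set(range(1, len(reference_images) + 1))
    problems = [f"[Image {n}] has no matching reference image" for n in sorted(used - available)]
    problems += [f"reference image {n} is never mentioned" for n in sorted(available - used)]
    return problems
```

Running the check before calling the script keeps a missing or unused reference image from going unnoticed until after generation.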

Common Scenarios

Use different JSON schemas for different scenarios.

Character Design:

Physical attributes (gender, age, ethnicity, body type)
Facial features and expressions
Clothing and accessories
Historical era or setting
Pose and context

Scene Generation:

Environment description
Time of day, weather
Mood and atmosphere
Focal points and composition

Product Visualization:

Product details and materials
Lighting setup
Background and context
Presentation angle
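
One way to keep these per-scenario checklists handy is a small lookup that reports which fields a draft prompt still lacks. The key names below are hypothetical: they are derived from the bullet lists above and the character example, and they are not a schema the generation script is known to enforce.

```python
# Hypothetical field checklists, one per scenario described above.
SCENARIO_FIELDS = {
    "character": ["characters", "style", "composition", "lighting"],
    "scene": ["environment", "time_of_day", "weather", "mood", "composition"],
    "product": ["product", "materials", "lighting", "background", "angle"],
}


def missing_fields(scenario: str, prompt: dict) -> list[str]:
    """List checklist fields that are absent or empty in the draft prompt."""
    if scenario not in SCENARIO_FIELDS:
        raise ValueError(f"unknown scenario: {scenario!r}")
    return [field for field in SCENARIO_FIELDS[scenario] if not prompt.get(field)]
```

For instance, `missing_fields("scene", {"environment": "harbor at dawn"})` lists the remaining scene fields to fill in before writing the JSON file.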

Specific Templates

Read the following template file only when it matches the user request.

Doraemon Comic

Output Handling

After generation:

Images are typically saved in `/mnt/user-data/outputs/`
Share generated images with the user using the `present_files` tool
Provide a brief description of the generation result
Offer to iterate if adjustments are needed

Tips: Enhancing Generation with Reference Images

For scenarios where visual accuracy is critical, use the `image_search` tool first to find reference images before generation.

Recommended scenarios for using image_search tool:

Character/Portrait Generation: Search for similar poses, expressions, or styles to guide facial features and body proportions
Specific Objects or Products: Find reference images of real objects to ensure accurate representation
Architectural or Environmental Scenes: Search for location references to capture authentic details
Fashion and Clothing: Find style references to ensure accurate garment details and styling

Example workflow:

Call the `image_search` tool to find suitable reference images:

image_search(query="Japanese woman street photography 1990s", size="Large")

Download the returned image URLs to local files
Use the downloaded images as the `--reference-images` parameter in the generation script

This approach significantly improves generation quality by providing the model with concrete visual guidance rather than relying solely on text descriptions.
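
The download step in the workflow above could look like the following sketch, which uses only the Python standard library. It assumes the search results have already been reduced to a plain list of URL strings; the actual return format of `image_search` is not described here, and the helper names and the `ref-<n>` file naming are illustrative.

```python
import urllib.request
from pathlib import Path
from urllib.parse import urlparse


def local_name(url: str, index: int) -> str:
    """Name the file ref-<index>, keeping the URL's extension or defaulting to .jpg."""
    suffix = Path(urlparse(url).path).suffix.lower()
    return f"ref-{index}{suffix or '.jpg'}"


def download_references(urls, dest_dir):
    """Download each URL into dest_dir and return absolute local paths in order."""
    dest = Path(dest_dir).resolve()
    dest.mkdir(parents=True, exist_ok=True)
    paths = []
    for i, url in enumerate(urls, start=1):
        target = dest / local_name(url, i)
        with urllib.request.urlopen(url, timeout=30) as response:
            target.write_bytes(response.read())
        paths.append(str(target))
    return paths
```

The returned list preserves the input order, so it can be passed directly as the `--reference-images` values.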

Notes

Always use English for prompts regardless of user's language
JSON format ensures structured, parsable prompts
Reference images enhance generation quality significantly
Iterative refinement is normal for optimal results
For character generation, include the detailed character object plus a consolidated prompt field
