veo-3.2-prompter

Veo 3.2 Prompt Designer Skill

This skill transforms a user's scattered multimodal assets (images, videos, audio) and creative intent into a structured, executable prompt for the Google Veo 3.2 video generation model (Artemis engine). It acts as an expert prompt engineer, ensuring the highest quality output from the underlying model.

When to Use

When the user provides assets (images, videos, audio) for video generation with Veo 3.2.
When the user's request is complex and requires careful prompt construction for the Veo model.
When using any Google Veo 3.x model for video generation.

Core Function

This skill analyzes all user inputs and generates a single, optimized JSON object containing the final prompt and recommended parameters. The internal workflow (Recognition, Mapping, Construction) is handled automatically and should not be exposed to the user.

Internal Workflow

Phase 1: Recognition. Analyze uploaded assets and user intent. Use `atomic_element_mapping.md` to classify each asset into its atomic element role(s).
Phase 2: Mapping. For each atomic element, determine the optimal reference method (reference image, text prompt, or hybrid). Use the mapping table to decide.
Phase 3: Construction. Assemble the final prompt using the 5-Part Framework (Shot → Subject → Environment → Camera → Style) and attach reference images via the Gemini API's `RawReferenceImage` system.
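
The three phases above can be sketched as a small pipeline. This is a minimal illustration and not the skill's actual implementation: the `Asset` class, the function names, and both mapping dictionaries are hypothetical stand-ins, since the contents of `atomic_element_mapping.md` are not shown here.

```python
from dataclasses import dataclass

# Hypothetical stand-ins for the tables in atomic_element_mapping.md.
# The real mappings are not shown in this document.
ASSET_TO_ELEMENT = {"image": "subject", "video": "motion", "audio": "sound"}
ELEMENT_TO_METHOD = {
    "subject": "reference_image",
    "motion": "text_prompt",
    "sound": "text_prompt",
}


@dataclass
class Asset:
    file: str
    kind: str  # "image", "video", or "audio"


def recognize(assets):
    """Phase 1: classify each asset into an atomic element role."""
    return [(asset, ASSET_TO_ELEMENT[asset.kind]) for asset in assets]


def map_methods(recognized):
    """Phase 2: choose a reference method for each atomic element."""
    return [
        (asset, element, ELEMENT_TO_METHOD[element])
        for asset, element in recognized
    ]


def construct(mapped, shot, subject, environment, camera, style):
    """Phase 3: join the five parts in order and collect reference images."""
    prompt = ", ".join([shot, subject, environment, camera, style])
    references = [
        {"file": asset.file, "reference_type": element.upper()}
        for asset, element, method in mapped
        if method == "reference_image"
    ]
    return {"final_prompt": prompt, "reference_images": references}


mapped = map_methods(recognize([Asset("perfume.png", "image")]))
result = construct(
    mapped,
    shot="Hero shot",
    subject="a frosted glass perfume bottle rotating slowly",
    environment="reflective dark surface",
    camera="smooth 180-degree arc",
    style="luxury commercial style",
)
print(result["final_prompt"])
```

Keeping each phase as a separate function mirrors the workflow description and makes it easy to swap the placeholder dictionaries for the real mapping tables.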

Usage Example

User Request: "Make a cinematic shot of this perfume bottle rotating on a dark surface, like a luxury commercial." User uploads `perfume.png`.

Agent using `veo-3.2-prompter`: The agent internally processes the request and assets, then outputs the final JSON to the next skill in the chain.

Final Output (for internal use):

```json
{
  "final_prompt": "Hero shot, a frosted glass perfume bottle with gold cap rotating slowly on a reflective dark surface, three-point studio lighting with soft key and rim light creating subtle caustics, smooth 180-degree arc, hyper-realistic luxury commercial style with shallow depth of field. Crystalline chime, soft ambient pad.",
  "reference_images": [
    {
      "file": "perfume.png",
      "reference_type": "SUBJECT"
    }
  ],
  "recommended_parameters": {
    "model": "veo-3.2-generate",
    "duration_seconds": 8,
    "aspect_ratio": "16:9",
    "resolution": "1080p",
    "generate_audio": true
  }
}
```
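
Because this JSON is handed to the next skill in the chain, the receiving side may want to check its shape before using it. The following is a rough sketch under that assumption; `load_prompter_output` and `REQUIRED_KEYS` are hypothetical names, and the required keys are inferred only from the example above.

```python
import json

# Top-level keys seen in the example output; inferred, not a published schema.
REQUIRED_KEYS = {"final_prompt", "reference_images", "recommended_parameters"}


def load_prompter_output(raw: str) -> dict:
    """Parse the skill's JSON output and check the keys used in the example."""
    data = json.loads(raw)
    missing = REQUIRED_KEYS - set(data)
    if missing:
        raise ValueError(f"missing keys: {sorted(missing)}")
    for ref in data["reference_images"]:
        # Each entry in the example carries a file name and a reference type.
        if "file" not in ref or "reference_type" not in ref:
            raise ValueError(f"malformed reference image entry: {ref}")
    return data


raw = json.dumps({
    "final_prompt": "Hero shot, a frosted glass perfume bottle...",
    "reference_images": [{"file": "perfume.png", "reference_type": "SUBJECT"}],
    "recommended_parameters": {"model": "veo-3.2-generate", "duration_seconds": 8},
})
output = load_prompter_output(raw)
print(output["recommended_parameters"]["model"])
```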

Veo 3.2 Key Differentiators

| Feature | Capability |
| --- | --- |
| Engine | Artemis: world-model physics simulation (not pixel prediction) |
| Max duration | ~30s native continuous generation |
| Audio | Native dialogue + synchronized SFX |
| Reference images | Up to 3 (`STYLE`, `SUBJECT`, `SUBJECT_FACE`) |
| Video extension | Chain clips via previous video input |
| First/last frame | Specify start and/or end keyframes |
| Resolutions | 720p, 1080p, 4K (with upscaling) |
| Aspect ratios | 16:9, 9:16 |
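
The limits in this table lend themselves to a pre-flight check on the generated output. The sketch below is one possible way to do that, not part of the skill itself: `check_against_limits` is a hypothetical helper, and treating "~30s" as a hard cap of 30 seconds is an assumption made here for illustration.

```python
# Limits copied from the table above.
ALLOWED_REFERENCE_TYPES = {"STYLE", "SUBJECT", "SUBJECT_FACE"}
MAX_REFERENCE_IMAGES = 3
MAX_DURATION_SECONDS = 30  # assumption: "~30s" treated as a hard cap
ALLOWED_VALUES = {
    "aspect_ratio": {"16:9", "9:16"},
    "resolution": {"720p", "1080p", "4K"},
}


def check_against_limits(output: dict) -> list:
    """Return a list of human-readable problems; an empty list means no issue found."""
    problems = []
    refs = output.get("reference_images", [])
    if len(refs) > MAX_REFERENCE_IMAGES:
        problems.append(f"too many reference images: {len(refs)}")
    for ref in refs:
        if ref.get("reference_type") not in ALLOWED_REFERENCE_TYPES:
            problems.append(f"unknown reference_type: {ref.get('reference_type')}")
    params = output.get("recommended_parameters", {})
    # Only parameters that are present are checked against the table.
    for name, allowed in ALLOWED_VALUES.items():
        if name in params and params[name] not in allowed:
            problems.append(f"unsupported {name}: {params[name]}")
    if params.get("duration_seconds", 0) > MAX_DURATION_SECONDS:
        problems.append(f"duration exceeds {MAX_DURATION_SECONDS}s")
    return problems


example = {
    "reference_images": [{"file": "perfume.png", "reference_type": "SUBJECT"}],
    "recommended_parameters": {
        "duration_seconds": 8,
        "aspect_ratio": "16:9",
        "resolution": "1080p",
    },
}
print(check_against_limits(example))
```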

Knowledge Base

This skill relies on an internal knowledge base to make informed decisions. The agent MUST consult these files during execution.

`references/atomic_element_mapping.md`: Core knowledge. Contains the "Asset Type → Atomic Element" and "Atomic Element → Optimal Reference Method" mapping tables, adapted for Veo 3.2's reference image system.
`references/veo_syntax_guide.md`: Veo 3.2 Gemini API syntax reference, covering `RawReferenceImage`, `GenerateVideosConfig`, video extension, and first/last frame specification.
