veo-3.2-prompter

Veo 3.2 Prompt Designer Skill

This skill transforms a user's scattered multimodal assets (images, videos, audio) and creative intent into a structured, executable prompt for the Google Veo 3.2 video generation model (Artemis engine). It acts as an expert prompt engineer, aiming to get the highest-quality output the underlying model can produce.

When to Use

  • When the user provides assets (images, videos, audio) for video generation with Veo 3.2.
  • When the user's request is complex and requires careful prompt construction for the Veo model.
  • When using any Google Veo 3.x model for video generation.

Core Function

This skill analyzes all user inputs and generates a single, optimized JSON object containing the final prompt, the reference images to attach, and the recommended generation parameters. The internal workflow (Recognition, Mapping, Construction) is handled automatically and should not be exposed to the user.
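
A small schema lets downstream skills rely on a stable shape for this object. The sketch below models it with Python dataclasses. It assumes the three top-level keys and the parameter fields used in the Usage Example section. The class names are illustrative and are not part of any Google SDK.

```python
import json
from dataclasses import asdict, dataclass


@dataclass
class ReferenceImage:
    file: str            # uploaded asset, e.g. "perfume.png"
    reference_type: str  # "STYLE", "SUBJECT" or "SUBJECT_FACE"


@dataclass
class RecommendedParameters:
    model: str
    duration_seconds: int
    aspect_ratio: str
    resolution: str
    generate_audio: bool


@dataclass
class PrompterOutput:
    final_prompt: str
    reference_images: list[ReferenceImage]
    recommended_parameters: RecommendedParameters

    def to_json(self) -> str:
        # asdict() recurses into nested dataclasses (including ones inside lists)
        # and keeps field order, so keys appear in the documented order.
        return json.dumps(asdict(self), indent=2)


if __name__ == "__main__":
    output = PrompterOutput(
        final_prompt="Hero shot, a frosted glass perfume bottle",  # abbreviated
        reference_images=[ReferenceImage("perfume.png", "SUBJECT")],
        recommended_parameters=RecommendedParameters(
            model="veo-3.2-generate",
            duration_seconds=8,
            aspect_ratio="16:9",
            resolution="1080p",
            generate_audio=True,
        ),
    )
    print(output.to_json())
```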

Internal Workflow

  1. Phase 1: Recognition. Analyze the uploaded assets and the user's intent. Use `references/atomic_element_mapping.md` to classify each asset into its atomic element role(s).
  2. Phase 2: Mapping. For each atomic element, determine the optimal reference method (reference image, text prompt, or hybrid) using the "Atomic Element → Optimal Reference Method" table.
  3. Phase 3: Construction. Assemble the final prompt using the 5-Part Framework (Shot → Subject → Environment → Camera → Style), and attach reference images via the Gemini API's `RawReferenceImage` system.
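
Phases 1 and 2 reduce to two table lookups. The sketch below assumes placeholder tables. The real entries live in `references/atomic_element_mapping.md`, so every role name and method in `ASSET_TO_ELEMENTS` and `ELEMENT_TO_METHOD` is hypothetical. Classifying by file extension is also a stand-in for the content and intent analysis an agent would actually perform.

```python
from pathlib import Path

# Hypothetical stand-in for the "Asset Type -> Atomic Element" table.
ASSET_TO_ELEMENTS = {
    "image": ["subject"],
    "video": ["motion", "camera"],
    "audio": ["sound"],
}

# Hypothetical stand-in for the "Atomic Element -> Optimal Reference Method" table.
ELEMENT_TO_METHOD = {
    "subject": "reference_image",
    "motion": "text_prompt",
    "camera": "text_prompt",
    "sound": "text_prompt",
}

# Simplification: infer the asset type from the file extension.
EXTENSION_TO_TYPE = {
    ".png": "image", ".jpg": "image", ".jpeg": "image", ".webp": "image",
    ".mp4": "video", ".mov": "video",
    ".mp3": "audio", ".wav": "audio",
}


def recognize(asset: str) -> list[str]:
    """Phase 1: return the atomic element role(s) for one asset."""
    asset_type = EXTENSION_TO_TYPE.get(Path(asset).suffix.lower())
    if asset_type is None:
        raise ValueError(f"unsupported asset: {asset}")
    return ASSET_TO_ELEMENTS[asset_type]


def map_methods(assets: list[str]) -> dict[str, dict[str, str]]:
    """Phase 2: pick a reference method for every element of every asset."""
    return {
        asset: {element: ELEMENT_TO_METHOD[element] for element in recognize(asset)}
        for asset in assets
    }


if __name__ == "__main__":
    print(map_methods(["perfume.png"]))  # {'perfume.png': {'subject': 'reference_image'}}
```

Because both tables are plain data, the agent can refresh them from the knowledge-base file without changing the lookup logic.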

Usage Example

User request: "Make a cinematic shot of this perfume bottle rotating on a dark surface, like a luxury commercial." The user uploads `perfume.png`.

Agent using `veo-3.2-prompter`: the agent processes the request and assets internally, then outputs the final JSON to the next skill in the chain.

Final output (for internal use):
```json
{
  "final_prompt": "Hero shot, a frosted glass perfume bottle with gold cap rotating slowly on a reflective dark surface, three-point studio lighting with soft key and rim light creating subtle caustics, smooth 180-degree arc, hyper-realistic luxury commercial style with shallow depth of field. Crystalline chime, soft ambient pad.",
  "reference_images": [
    {
      "file": "perfume.png",
      "reference_type": "SUBJECT"
    }
  ],
  "recommended_parameters": {
    "model": "veo-3.2-generate",
    "duration_seconds": 8,
    "aspect_ratio": "16:9",
    "resolution": "1080p",
    "generate_audio": true
  }
}
```
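
The `final_prompt` above reads as the five framework parts joined in order, followed by an audio cue. The sketch below rebuilds it that way. Several details are assumptions reflecting one plausible reading of the example rather than a documented format: the `build_prompt` helper itself, the separate `audio` argument, and the exact split between Subject and Environment.

```python
def build_prompt(shot: str, subject: str, environment: str, camera: str,
                 style: str, audio: str = "") -> str:
    """Join the parts in Shot -> Subject -> Environment -> Camera -> Style order,
    end the visual description with a period, then append optional audio cues."""
    parts = [p.strip().rstrip(".") for p in (shot, subject, environment, camera, style)]
    # Skip empty parts so a missing element does not leave a dangling comma.
    prompt = ", ".join(p for p in parts if p) + "."
    if audio.strip():
        prompt += " " + audio.strip()
    return prompt


if __name__ == "__main__":
    print(build_prompt(
        shot="Hero shot",
        subject="a frosted glass perfume bottle with gold cap rotating slowly on a reflective dark surface",
        environment="three-point studio lighting with soft key and rim light creating subtle caustics",
        camera="smooth 180-degree arc",
        style="hyper-realistic luxury commercial style with shallow depth of field",
        audio="Crystalline chime, soft ambient pad.",
    ))
```

Run as a script, this prints the `final_prompt` string from the example verbatim.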

Veo 3.2 Key Differentiators

| Feature | Capability |
|---|---|
| Engine | Artemis: world-model physics simulation (not pixel prediction) |
| Max duration | ~30 s of native continuous generation |
| Audio | Native dialogue and synchronized sound effects |
| Reference images | Up to 3 (`STYLE`, `SUBJECT`, `SUBJECT_FACE`) |
| Video extension | Chain clips by passing the previous video as input |
| First/last frame | Specify start and/or end keyframes |
| Resolutions | 720p, 1080p, 4K (with upscaling) |
| Aspect ratios | 16:9, 9:16 |
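
The limits in this table can be checked before a request is sent. The sketch below assumes the output object has the shape shown in the Usage Example. It treats the approximate 30-second ceiling as a hard cap. `validate_output` is a hypothetical helper, and the allowed values are copied from the table rather than taken from the Gemini API.

```python
ALLOWED_REFERENCE_TYPES = {"STYLE", "SUBJECT", "SUBJECT_FACE"}
ALLOWED_RESOLUTIONS = {"720p", "1080p", "4K"}
ALLOWED_ASPECT_RATIOS = {"16:9", "9:16"}
MAX_REFERENCE_IMAGES = 3
MAX_DURATION_SECONDS = 30  # the table says "~30 s"; treated here as a hard cap


def validate_output(output: dict) -> list[str]:
    """Return human-readable problems; an empty list means every check passed."""
    problems: list[str] = []

    refs = output.get("reference_images", [])
    if len(refs) > MAX_REFERENCE_IMAGES:
        problems.append(f"{len(refs)} reference images (max {MAX_REFERENCE_IMAGES})")
    for ref in refs:
        if ref.get("reference_type") not in ALLOWED_REFERENCE_TYPES:
            problems.append(f"{ref.get('file')}: unknown reference_type {ref.get('reference_type')!r}")

    params = output.get("recommended_parameters", {})
    duration = params.get("duration_seconds", 0)
    if not 0 < duration <= MAX_DURATION_SECONDS:
        problems.append(f"duration_seconds={duration} not in (0, {MAX_DURATION_SECONDS}]")
    if params.get("resolution") not in ALLOWED_RESOLUTIONS:
        problems.append(f"unsupported resolution {params.get('resolution')!r}")
    if params.get("aspect_ratio") not in ALLOWED_ASPECT_RATIOS:
        problems.append(f"unsupported aspect_ratio {params.get('aspect_ratio')!r}")
    return problems
```

For the Usage Example's object, this returns an empty list. Returning every problem at once, rather than raising on the first, lets the agent fix all of them in a single revision.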

Knowledge Base

This skill relies on an internal knowledge base to make informed decisions. The agent MUST consult these files during execution:
  • `references/atomic_element_mapping.md`: Core knowledge. Contains the "Asset Type → Atomic Element" and "Atomic Element → Optimal Reference Method" mapping tables, adapted for Veo 3.2's reference image system.
  • `references/veo_syntax_guide.md`: Veo 3.2 Gemini API syntax reference, covering `RawReferenceImage`, `GenerateVideosConfig`, video extension, and first/last frame specification.
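
Consulting these files is mandatory, so a runtime can refuse to proceed when either one is missing. The sketch below assumes the `references/` directory sits under a skill root passed in by the caller. `load_knowledge_base` is a hypothetical helper.

```python
from __future__ import annotations

from pathlib import Path

REQUIRED_FILES = (
    "references/atomic_element_mapping.md",
    "references/veo_syntax_guide.md",
)


def load_knowledge_base(skill_root: str | Path) -> dict[str, str]:
    """Read every required reference file, raising if any is missing."""
    root = Path(skill_root)
    # Check all files first so one error lists everything that is missing.
    missing = [rel for rel in REQUIRED_FILES if not (root / rel).is_file()]
    if missing:
        raise FileNotFoundError("knowledge base incomplete; missing: " + ", ".join(missing))
    return {rel: (root / rel).read_text(encoding="utf-8") for rel in REQUIRED_FILES}
```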