vertex-ai-media-master

Vertex AI Media Master

Overview

Multimodal media operations on Google Cloud Vertex AI covering video understanding, audio generation, image creation, and marketing campaign automation. This skill orchestrates Gemini 2.5 Pro/Flash, Imagen 4, and Lyria models to process, analyze, and generate rich media assets.

Prerequisites

Google Cloud project with Vertex AI API enabled
`google-cloud-aiplatform` Python SDK installed (`pip install google-cloud-aiplatform[vision,audio]`)
`GOOGLE_CLOUD_PROJECT` and `GOOGLE_APPLICATION_CREDENTIALS` environment variables set
Service account with `roles/aiplatform.user` permission
Sufficient quota for target models (Gemini 2.5 Pro: 2M tokens/min; Imagen 4: 100 images/min)
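
The environment-variable prerequisites can be checked before any API call is made. The following is a minimal sketch using only the standard library; the function names `missing_prerequisites` and `credentials_file_exists` are hypothetical helpers, not part of any Google SDK, and the sketch does not verify IAM roles or quota.

```python
import os

# Variable names taken from the prerequisites list above.
REQUIRED_ENV_VARS = ("GOOGLE_CLOUD_PROJECT", "GOOGLE_APPLICATION_CREDENTIALS")


def missing_prerequisites(env=None):
    """Return the required environment variables that are unset or empty."""
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]


def credentials_file_exists(env=None):
    """Check that GOOGLE_APPLICATION_CREDENTIALS points at an existing file."""
    env = os.environ if env is None else env
    path = env.get("GOOGLE_APPLICATION_CREDENTIALS", "")
    return bool(path) and os.path.isfile(path)
```

Calling `missing_prerequisites()` with no arguments inspects the real process environment; passing a dict makes the check easy to exercise in isolation.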

Instructions

Initialize the Vertex AI client with the target project and region (`us-central1` recommended for model availability).
Select the appropriate model for the task:
- Video analysis: Gemini 2.5 Pro (up to 6 hours at low resolution, 2 hours at default).
- Image generation: Imagen 4 for highest quality stills; Gemini 2.5 Flash Image for interleaved text+image output.
- Audio generation: Lyria for music composition and background tracks.
- Campaign automation: Gemini 2.5 Pro for multi-asset generation from a single prompt.
Prepare input media: upload source files to Cloud Storage (`gs://` URIs) or provide local paths for smaller assets.
Construct the generation request with explicit parameters (aspect ratio, duration, number of outputs, style constraints).
Execute the request and capture response objects containing generated media bytes or analysis text.
Post-process outputs: save generated images/audio to the target directory, extract structured insights from video analysis, or compile campaign asset bundles.
Validate results against brand guidelines or schema expectations before delivery.
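
The first steps above (initialize, select a model, construct a request with explicit parameters) can be sketched as follows. This is an illustration under stated assumptions: `MediaRequest`, `build_request`, and `MODEL_FOR_TASK` are hypothetical names, and the mapping holds the model display names from the list above, not API model IDs, which should be looked up in the current Vertex AI documentation. Only `vertexai.init` is a real SDK call, and it is imported lazily so the rest of the sketch runs without the SDK installed.

```python
from dataclasses import dataclass, field
from typing import Optional

# Task -> model display name, copied from the selection list above.
# These are labels only, NOT API model IDs.
MODEL_FOR_TASK = {
    "video_analysis": "Gemini 2.5 Pro",
    "image_generation": "Imagen 4",
    "interleaved_text_image": "Gemini 2.5 Flash Image",
    "audio_generation": "Lyria",
    "campaign_automation": "Gemini 2.5 Pro",
}


@dataclass
class MediaRequest:
    """Hypothetical container for one generation or analysis request."""
    task: str
    prompt: str
    input_uri: Optional[str] = None
    parameters: dict = field(default_factory=dict)

    @property
    def model(self) -> str:
        return MODEL_FOR_TASK[self.task]


def init_vertex(project: str, location: str = "us-central1") -> None:
    """Initialize the Vertex AI SDK for the given project and region."""
    import vertexai  # lazy import: requires google-cloud-aiplatform
    vertexai.init(project=project, location=location)


def build_request(task, prompt, input_uri=None, **parameters) -> MediaRequest:
    """Validate the task name and collect explicit parameters in one place."""
    if task not in MODEL_FOR_TASK:
        raise ValueError(f"unknown task: {task!r}")
    return MediaRequest(task, prompt, input_uri, dict(parameters))
```

For example, `build_request("image_generation", "hero shot", aspect_ratio="16:9", number_of_images=4)` yields a request whose `model` is `"Imagen 4"`. The parameter names passed through `**parameters` are placeholders; the names each model actually accepts depend on its API.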

Output

Generated image files (PNG/JPEG) from Imagen 4 or Gemini Flash Image
Audio files (WAV/MP3) from the Lyria model for background music and instrumental tracks
Video analysis reports: scene breakdowns, key-moment timestamps, transcript text, marketing-insight summaries
Campaign asset packages: hero images, social media graphics, ad copy, email marketing text, and video scripts
Structured JSON metadata for each generated asset (model used, prompt, parameters, cost estimate)
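
One way to produce the per-asset JSON metadata is a sidecar file written next to each generated asset. The sketch below is an assumption about layout, not a format this skill prescribes: the function name `write_metadata`, the `.json` sidecar naming, and the `cost_estimate_usd` key are all hypothetical. The fields themselves follow the list above (model used, prompt, parameters, cost estimate).

```python
import json
from pathlib import Path


def write_metadata(asset_path, model, prompt, parameters, cost_estimate_usd=None):
    """Write <asset name>.json next to the asset and return the sidecar path."""
    asset_path = Path(asset_path)
    record = {
        "asset": asset_path.name,
        "model": model,
        "prompt": prompt,
        "parameters": parameters,
        "cost_estimate_usd": cost_estimate_usd,  # None when no estimate is available
    }
    sidecar = asset_path.parent / (asset_path.name + ".json")
    sidecar.write_text(
        json.dumps(record, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return sidecar
```

Keeping the metadata beside the asset means a campaign bundle can be copied or archived as a directory without losing provenance.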

Error Handling

| Error | Cause | Solution |
| --- | --- | --- |
| `PermissionDenied` on Vertex AI API | Service account lacks the `roles/aiplatform.user` role | Grant the required IAM role to the service account |
| `ResourceExhausted` / quota exceeded | Too many concurrent requests or token limit hit | Implement request batching; switch to Gemini 2.5 Flash for lower-cost operations |
| `InvalidArgument` on image generation | Prompt violates safety filters or unsupported aspect ratio | Revise the prompt to remove restricted content; use a supported aspect ratio (1:1, 16:9, 9:16) |
| Video processing timeout | Source video exceeds duration or resolution limits | Use low-resolution mode for videos over 2 hours; split videos over 6 hours into segments |
| Audio generation returns empty | Prompt too vague or duration parameter missing | Specify genre, tempo, mood, and an explicit duration in seconds |
| `NotFound` on model ID | Incorrect model name or model not available in region | Verify the model ID against current Vertex AI documentation; try `us-central1` |
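
Quota errors are often transient, so retrying with exponential backoff is a common complement to the batching suggested in the table. The sketch below shows that technique in isolation; it is not something the table itself prescribes. `QuotaExceeded` is a local stand-in so the example runs without the SDK; in real use the retryable exception would be `google.api_core.exceptions.ResourceExhausted`, passed through `retry_on`. The helper name and defaults are hypothetical.

```python
import random
import time


class QuotaExceeded(Exception):
    """Stand-in for google.api_core.exceptions.ResourceExhausted."""


def call_with_backoff(fn, retry_on=(QuotaExceeded,), max_attempts=5,
                      base_delay=1.0, sleep=time.sleep):
    """Call fn(); on a retryable error wait and try again.

    The wait before retry k (0-based) is base_delay * 2**k plus random
    jitter in [0, base_delay]. The last failure is re-raised unchanged.
    """
    for attempt in range(max_attempts):
        try:
            return fn()
        except retry_on:
            if attempt == max_attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt) + random.uniform(0, base_delay))
```

The `sleep` argument exists so callers (and tests) can substitute a fake; production code would leave it at `time.sleep`. Errors such as `PermissionDenied` or `InvalidArgument` should not be retried this way, since repeating the same request cannot fix them.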

Examples

Example 1: Analyze a competitor video ad

Input: A 60-second competitor video uploaded to `gs://bucket/competitor-ad.mp4`.
Action: Send to Gemini 2.5 Pro with the prompt "Extract messaging themes, calls to action, visual style, and production techniques."
Output: Structured analysis with timestamps for key scenes, identified CTAs, and a competitive positioning summary.
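
Example 1 might look roughly like this with the `vertexai.generative_models` module from the `google-cloud-aiplatform` SDK. Treat it as a sketch: the model ID `"gemini-2.5-pro"` is an assumption to verify against current documentation, `video_mime_type` and `analyze_video` are hypothetical helpers, and the extension-to-MIME mapping covers only a few common containers. The SDK import is lazy so the helper can be used without the SDK installed, and `vertexai.init(...)` is assumed to have been called beforehand.

```python
from pathlib import PurePosixPath

# Small, non-exhaustive mapping; extend as needed.
VIDEO_MIME_TYPES = {
    ".mp4": "video/mp4",
    ".mov": "video/quicktime",
    ".webm": "video/webm",
}

ANALYSIS_PROMPT = (
    "Extract messaging themes, calls to action, visual style, "
    "and production techniques."
)


def video_mime_type(gcs_uri: str) -> str:
    """Derive a MIME type from the file extension of a gs:// URI."""
    if not gcs_uri.startswith("gs://"):
        raise ValueError("expected a gs:// URI")
    suffix = PurePosixPath(gcs_uri).suffix.lower()
    if suffix not in VIDEO_MIME_TYPES:
        raise ValueError(f"unsupported video extension: {suffix!r}")
    return VIDEO_MIME_TYPES[suffix]


def analyze_video(gcs_uri, prompt=ANALYSIS_PROMPT, model_id="gemini-2.5-pro"):
    """Send the video and prompt to Gemini and return the response text."""
    # Lazy import: requires google-cloud-aiplatform and prior vertexai.init().
    from vertexai.generative_models import GenerativeModel, Part

    video = Part.from_uri(gcs_uri, mime_type=video_mime_type(gcs_uri))
    response = GenerativeModel(model_id).generate_content([video, prompt])
    return response.text
```

A call such as `analyze_video("gs://bucket/competitor-ad.mp4")` returns free-form text; asking for timestamps and a fixed set of headings in the prompt makes the result easier to turn into the structured analysis described above.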

Example 2: Generate campaign assets from a product brief

Input: Text brief describing a new product launch with target audience and brand guidelines.
Action: Use Imagen 4 to generate 4 hero image variations, Lyria for a 30-second background track, and Gemini 2.5 Pro for ad copy in 3 languages.
Output: Directory containing hero images, audio file, and a campaign-copy document organized by language.
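
The fan-out in Example 2 can be planned as plain data before any model is called, which makes the bundle layout explicit and easy to review. The sketch below only builds the job list; it performs no generation. `plan_campaign`, the directory layout, and the job keys are hypothetical, and the model values are the display names used in this document rather than API model IDs.

```python
def plan_campaign(brief, languages, out_dir="campaign",
                  hero_variations=4, track_seconds=30):
    """Return one job dict per asset: hero images, one track, copy per language."""
    jobs = []
    for i in range(1, hero_variations + 1):
        jobs.append({
            "kind": "hero_image",
            "model": "Imagen 4",
            "prompt": brief,
            "output": f"{out_dir}/images/hero_{i}.png",
        })
    jobs.append({
        "kind": "background_track",
        "model": "Lyria",
        "prompt": brief,
        "duration_seconds": track_seconds,
        "output": f"{out_dir}/audio/background.wav",
    })
    for lang in languages:
        jobs.append({
            "kind": "ad_copy",
            "model": "Gemini 2.5 Pro",
            "prompt": f"Write ad copy in {lang} for this brief: {brief}",
            "output": f"{out_dir}/copy/{lang}.md",
        })
    return jobs
```

With three languages (for instance `["en", "es", "fr"]`, chosen here purely for illustration) the plan contains eight jobs: four images, one track, and three copy documents.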

Example 3: Repurpose a long-form video into short-form clips

Input: A 10-minute product demo video.
Action: Gemini 2.5 Pro identifies the three most engaging 15-second segments with scene-boundary timestamps.
Output: Timestamp list with suggested captions for TikTok/Reels, plus a storyboard summary for each clip.
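
Timestamps returned by a model are worth checking before they are handed to an editor or a cutting tool. The following sketch assumes the segments have already been parsed into `(start, end)` pairs in seconds; `format_timestamp`, `validate_segments`, and the one-second tolerance are hypothetical choices, not part of any API.

```python
def format_timestamp(seconds: float) -> str:
    """Format a number of seconds as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def validate_segments(segments, video_seconds=600, clip_seconds=15, tolerance=1.0):
    """Check (start, end) pairs and return them sorted as MM:SS dicts.

    Raises ValueError if a segment falls outside the video, overlaps the
    previous one, or differs from clip_seconds by more than tolerance.
    """
    clips = []
    previous_end = 0.0
    for start, end in sorted(segments):
        if not 0 <= start < end <= video_seconds:
            raise ValueError(f"segment {start}-{end} is outside the video")
        if abs((end - start) - clip_seconds) > tolerance:
            raise ValueError(f"segment {start}-{end} is not ~{clip_seconds}s long")
        if start < previous_end:
            raise ValueError(f"segment {start}-{end} overlaps the previous clip")
        previous_end = end
        clips.append({"start": format_timestamp(start), "end": format_timestamp(end)})
    return clips
```

For the 10-minute demo in Example 3, `validate_segments([(125, 140), (30, 45), (400, 415)])` returns the three clips in chronological order, starting with `{"start": "00:30", "end": "00:45"}`.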

Resources

Detailed model capabilities and code patterns: ${CLAUDE_SKILL_DIR}/references/core-capabilities.md

Vertex AI Multimodal overview: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/overview
Imagen documentation: https://cloud.google.com/vertex-ai/generative-ai/docs/image/overview
Video understanding guide: https://cloud.google.com/vertex-ai/generative-ai/docs/multimodal/video-understanding
GenAI for Marketing reference repo: https://github.com/GoogleCloudPlatform/genai-for-marketing
