vertex-ai-media-master

Vertex AI Media Master

Overview

Multimodal media operations on Google Cloud Vertex AI covering video understanding, audio generation, image creation, and marketing campaign automation. This skill orchestrates Gemini 2.5 Pro/Flash, Imagen 4, and Lyria models to process, analyze, and generate rich media assets.

Prerequisites

  • Google Cloud project with the Vertex AI API enabled
  • google-cloud-aiplatform Python SDK installed (pip install google-cloud-aiplatform[vision,audio])
  • GOOGLE_CLOUD_PROJECT and GOOGLE_APPLICATION_CREDENTIALS environment variables set
  • Service account with the roles/aiplatform.user role
  • Sufficient quota for target models (Gemini 2.5 Pro: 2M tokens/min; Imagen 4: 100 images/min)
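
A quick pre-flight check for the two environment variables listed above could look like the following. This is a minimal sketch: the helper name `missing_env_vars` is hypothetical, and it only confirms that the variables are set, not that the credentials are valid.

```python
import os

# Variable names taken from the prerequisites list above.
REQUIRED_ENV_VARS = ("GOOGLE_CLOUD_PROJECT", "GOOGLE_APPLICATION_CREDENTIALS")


def missing_env_vars(env=None):
    """Return the required variable names that are unset or empty.

    `env` defaults to the process environment; passing a dict makes the
    check easy to exercise in isolation.
    """
    env = os.environ if env is None else env
    return [name for name in REQUIRED_ENV_VARS if not env.get(name)]
```

Calling `missing_env_vars()` before any Vertex AI request and stopping when the returned list is non-empty tends to surface configuration problems earlier than an authentication error from the API would.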

Instructions

  1. Initialize the Vertex AI client with the target project and region (us-central1 recommended for model availability).
  2. Select the appropriate model for the task:
    • Video analysis: Gemini 2.5 Pro (up to 6 hours at low resolution, 2 hours at default).
    • Image generation: Imagen 4 for highest quality stills; Gemini 2.5 Flash Image for interleaved text+image output.
    • Audio generation: Lyria for music composition and background tracks.
    • Campaign automation: Gemini 2.5 Pro for multi-asset generation from a single prompt.
  3. Prepare input media: upload source files to Cloud Storage (gs:// URIs) or provide local paths for smaller assets.
  4. Construct the generation request with explicit parameters (aspect ratio, duration, number of outputs, style constraints).
  5. Execute the request and capture response objects containing generated media bytes or analysis text.
  6. Post-process outputs: save generated images/audio to the target directory, extract structured insights from video analysis, or compile campaign asset bundles.
  7. Validate results against brand guidelines or schema expectations before delivery.
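
Steps 1, 2, and 4 above can be sketched as follows. The model IDs in `MODEL_FOR_TASK` are assumptions and should be verified against current Vertex AI documentation. The task names, `MediaRequest`, `build_request`, and `init_client` are hypothetical helpers introduced for illustration. Only `vertexai.init` is an SDK call, and it is imported lazily so the pure-Python helpers work without the SDK installed.

```python
from dataclasses import dataclass, field

# Assumed model IDs for each task in step 2; verify them against the
# current Vertex AI documentation before use.
MODEL_FOR_TASK = {
    "video_analysis": "gemini-2.5-pro",
    "image_generation": "imagen-4.0-generate-001",
    "interleaved_image": "gemini-2.5-flash-image",
    "audio_generation": "lyria-002",
    "campaign": "gemini-2.5-pro",
}


@dataclass
class MediaRequest:
    """A generation request with explicit parameters (step 4)."""
    task: str
    model: str
    prompt: str
    params: dict = field(default_factory=dict)


def build_request(task, prompt, **params):
    """Pick the model for `task` and bundle the prompt with its parameters."""
    if task not in MODEL_FOR_TASK:
        raise ValueError(f"Unknown task: {task!r}")
    if not prompt.strip():
        raise ValueError("Prompt must not be empty")
    return MediaRequest(task, MODEL_FOR_TASK[task], prompt, dict(params))


def init_client(project, location="us-central1"):
    """Step 1: initialize the SDK for the target project and region."""
    import vertexai  # lazy import: only needed when actually calling the API

    vertexai.init(project=project, location=location)
```

For instance, `build_request("image_generation", "A red bicycle", aspect_ratio="16:9")` returns a request bound to the assumed Imagen 4 model ID. Keeping request construction separate from execution makes the parameters easy to log alongside each generated asset.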

Output

  • Generated image files (PNG/JPEG) from Imagen 4 or Gemini Flash Image
  • Audio files (WAV/MP3) from the Lyria model for background music, voiceovers, or sound effects
  • Video analysis reports: scene breakdowns, key-moment timestamps, transcript text, marketing-insight summaries
  • Campaign asset packages: hero images, social media graphics, ad copy, email marketing text, and video scripts
  • Structured JSON metadata for each generated asset (model used, prompt, parameters, cost estimate)
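
The per-asset metadata described in the last item might be assembled as in this sketch. The field names (`asset_path`, `cost_estimate_usd`, `created_at`) and both helper functions are hypothetical; the source only names the categories of information to record.

```python
import json
from datetime import datetime, timezone


def asset_metadata(path, model, prompt, parameters, cost_estimate_usd=None):
    """Build a JSON-serializable record for one generated asset."""
    return {
        "asset_path": path,
        "model": model,
        "prompt": prompt,
        "parameters": parameters,
        "cost_estimate_usd": cost_estimate_usd,  # None when no estimate is available
        "created_at": datetime.now(timezone.utc).isoformat(),
    }


def write_metadata(record, metadata_path):
    """Write the record next to the asset as pretty-printed UTF-8 JSON."""
    with open(metadata_path, "w", encoding="utf-8") as fh:
        json.dump(record, fh, indent=2, ensure_ascii=False)
```

Writing one metadata file per asset, rather than a single manifest, keeps each output self-describing when assets are later moved between campaign bundles.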

Error Handling

| Error | Cause | Solution |
| --- | --- | --- |
| PermissionDenied on Vertex AI API | Service account lacks aiplatform.user role | Grant the required IAM role to the service account |
| ResourceExhausted / quota exceeded | Too many concurrent requests or token limit hit | Implement request batching; switch to Gemini 2.5 Flash for lower-cost operations |
| InvalidArgument on image generation | Prompt violates safety filters or unsupported aspect ratio | Revise the prompt to remove restricted content; use a supported aspect ratio (1:1, 16:9, 9:16) |
| Video processing timeout | Source video exceeds duration or resolution limits | Use low-resolution mode for videos over 2 hours; split longer videos into segments |
| Audio generation returns empty | Prompt too vague or duration parameter missing | Specify genre, tempo, mood, and an explicit duration in seconds |
| NotFound on model ID | Incorrect model name or model not available in region | Verify the model ID against current Vertex AI documentation; try the us-central1 region |
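
Two of the remedies above, using a supported aspect ratio and batching requests, can be checked or applied client-side before anything is sent. The sketch below assumes the three aspect ratios listed in the table are the full supported set; `validate_aspect_ratio` and `batches` are hypothetical helper names.

```python
# Aspect ratios listed as supported in the error-handling table.
SUPPORTED_ASPECT_RATIOS = {"1:1", "16:9", "9:16"}


def validate_aspect_ratio(ratio):
    """Fail fast locally instead of waiting for InvalidArgument from the API."""
    if ratio not in SUPPORTED_ASPECT_RATIOS:
        supported = ", ".join(sorted(SUPPORTED_ASPECT_RATIOS))
        raise ValueError(f"Unsupported aspect ratio {ratio!r}; use one of: {supported}")
    return ratio


def batches(items, size):
    """Yield consecutive slices of `items` with at most `size` elements each."""
    if size < 1:
        raise ValueError("Batch size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

Sending one batch at a time, with a pause between batches sized to the project's quota, is one way to stay under the concurrent-request limits that trigger ResourceExhausted.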

Examples

Example 1: Analyze a competitor video ad
  • Input: A 60-second competitor video uploaded to gs://bucket/competitor-ad.mp4.
  • Action: Send to Gemini 2.5 Pro with the prompt "Extract messaging themes, calls to action, visual style, and production techniques."
  • Output: Structured analysis with timestamps for key scenes, identified CTAs, and a competitive positioning summary.
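
Example 1 might be implemented roughly as below. This sketch assumes the `vertexai.generative_models` module that ships with google-cloud-aiplatform (newer projects may prefer the Google Gen AI SDK) and the model ID `gemini-2.5-pro`, which should be checked against current documentation. `require_gcs_uri` and `analyze_video` are hypothetical helper names, and the SDK is imported lazily inside the function.

```python
# Prompt text taken from Example 1.
ANALYSIS_PROMPT = (
    "Extract messaging themes, calls to action, visual style, "
    "and production techniques."
)


def require_gcs_uri(uri):
    """Return `uri` unchanged if it is a Cloud Storage URI, else raise."""
    if not uri.startswith("gs://"):
        raise ValueError(f"Expected a gs:// URI, got {uri!r}")
    return uri


def analyze_video(gcs_uri, project, prompt=ANALYSIS_PROMPT,
                  location="us-central1", model_id="gemini-2.5-pro"):
    """Send a video stored in Cloud Storage to Gemini and return the analysis text."""
    import vertexai  # lazy imports: require google-cloud-aiplatform
    from vertexai.generative_models import GenerativeModel, Part

    vertexai.init(project=project, location=location)
    model = GenerativeModel(model_id)  # model ID is an assumption
    video = Part.from_uri(require_gcs_uri(gcs_uri), mime_type="video/mp4")
    response = model.generate_content([video, prompt])
    return response.text
```

A call such as `analyze_video("gs://bucket/competitor-ad.mp4", project="my-project")` would return free-form text. Obtaining the structured, timestamped output described above would additionally require asking for a specific format in the prompt.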
Example 2: Generate campaign assets from a product brief
  • Input: Text brief describing a new product launch with target audience and brand guidelines.
  • Action: Use Imagen 4 to generate 4 hero image variations, Lyria for a 30-second background track, and Gemini 2.5 Pro for ad copy in 3 languages.
  • Output: Directory containing hero images, audio file, and a campaign-copy document organized by language.
Example 3: Repurpose a long-form video into short-form clips
  • Input: A 10-minute product demo video.
  • Action: Gemini 2.5 Pro identifies the three most engaging 15-second segments with scene-boundary timestamps.
  • Output: Timestamp list with suggested captions for TikTok/Reels, plus a storyboard summary for each clip.
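
The post-processing in Example 3 could be sketched as follows, assuming the model was prompted to return a JSON array of candidate segments. The schema (`start`, `end`, `score`, `caption`, with times in seconds) is hypothetical and not something the API guarantees, and `top_segments` is an illustrative helper.

```python
import json


def format_timestamp(seconds):
    """Render a second count as MM:SS."""
    minutes, secs = divmod(int(seconds), 60)
    return f"{minutes:02d}:{secs:02d}"


def top_segments(response_text, count=3, max_length=15):
    """Pick the highest-scoring segments no longer than `max_length` seconds.

    Returns the chosen segments in playback order with MM:SS timestamps.
    """
    segments = json.loads(response_text)
    eligible = [s for s in segments if 0 < s["end"] - s["start"] <= max_length]
    best = sorted(eligible, key=lambda s: s["score"], reverse=True)[:count]
    return [
        {
            "start": format_timestamp(s["start"]),
            "end": format_timestamp(s["end"]),
            "caption": s["caption"],
        }
        for s in sorted(best, key=lambda s: s["start"])
    ]
```

Filtering by length on the client side guards against the model returning segments longer than the requested 15 seconds.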

Resources
