gemini-video-understanding

Gemini Video Understanding Skill

This skill enables comprehensive video analysis using Google's Gemini API, including video summarization, question answering, transcription, timestamp references, and more.

Capabilities

  • Video Summarization: Create concise summaries of video content
  • Question Answering: Answer specific questions about video content
  • Transcription: Transcribe audio with visual descriptions and timestamps
  • Timestamp References: Query specific moments in videos (MM:SS format)
  • Video Clipping: Process specific segments using start/end offsets
  • Multiple Videos: Compare and analyze up to 10 videos (Gemini 2.5+)
  • YouTube Support: Analyze YouTube videos directly (preview feature)
  • Custom Frame Rate: Adjust FPS sampling for different video types

Supported Formats

  • MP4, MPEG, MOV, AVI, FLV, MPG, WebM, WMV, 3GPP

Models Available

Gemini 2.5 Series:
  • gemini-2.5-pro
    - Best quality, 1M context
  • gemini-2.5-flash
    - Balanced quality/speed, 1M context
  • gemini-2.5-flash-preview-09-2025
    - Preview features, 1M context
Gemini 2.0 Series:
  • gemini-2.0-flash
    - Fast processing
  • gemini-2.0-flash-lite
    - Lightweight option
Context Windows:
  • 2M token models: ~2 hours (default) or ~6 hours (low-res)
  • 1M token models: ~1 hour (default) or ~3 hours (low-res)

API Key Configuration

The skill supports both Google AI Studio and Vertex AI endpoints.

Option 1: Google AI Studio (Default)

The skill checks for GEMINI_API_KEY in this order:
  1. Process environment: process.env.GEMINI_API_KEY or $GEMINI_API_KEY
  2. Project root: .env
  3. .claude directory: .claude/.env
  4. .claude/skills directory: .claude/skills/.env
  5. Skill directory: .claude/skills/gemini-video-understanding/.env
To set up:
bash
# Environment variable (recommended)
export GEMINI_API_KEY="your-api-key-here"

# Or in .env file
echo "GEMINI_API_KEY=your-api-key-here" > .env
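
The lookup order described above can be sketched in Python as follows. This is a minimal illustration, not the skill's actual helper: the names `ENV_FILE_CANDIDATES`, `read_key_from_env_file`, and `find_api_key` are hypothetical, and the parser assumes plain `KEY=value` lines.

```python
import os
from pathlib import Path
from typing import Optional

# Candidate .env locations, in the order the skill documents them.
ENV_FILE_CANDIDATES = [
    ".env",
    ".claude/.env",
    ".claude/skills/.env",
    ".claude/skills/gemini-video-understanding/.env",
]


def read_key_from_env_file(path: Path, name: str) -> Optional[str]:
    """Return the value of `name` from a simple KEY=value file, or None."""
    if not path.is_file():
        return None
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        # Skip blanks, comments, and lines that are not assignments.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            # Drop optional surrounding quotes; treat an empty value as missing.
            return value.strip().strip("\"'") or None
    return None


def find_api_key(project_root: str = ".", name: str = "GEMINI_API_KEY") -> Optional[str]:
    """Check the process environment first, then each .env file in order."""
    value = os.environ.get(name)
    if value:
        return value
    for relative in ENV_FILE_CANDIDATES:
        value = read_key_from_env_file(Path(project_root) / relative, name)
        if value:
            return value
    return None
```

The first location that yields a non-empty value wins, so an exported variable always overrides any `.env` file.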

Option 2: Vertex AI

To use Vertex AI instead:
```bash
# Enable Vertex AI
export GEMINI_USE_VERTEX=true
export VERTEX_PROJECT_ID=your-gcp-project-id
export VERTEX_LOCATION=us-central1  # Optional, defaults to us-central1
```

Or in `.env` file:
```bash
GEMINI_USE_VERTEX=true
VERTEX_PROJECT_ID=your-gcp-project-id
VERTEX_LOCATION=us-central1
```
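
As a rough sketch of how these variables might be interpreted, the function below picks a backend from an environment mapping. The function name `resolve_backend` and the returned dictionary shape are hypothetical, and the case-insensitive match on `true` is an assumption; only the variable names and the `us-central1` default come from the text above.

```python
import os
from typing import Mapping, Optional


def resolve_backend(env: Optional[Mapping[str, str]] = None) -> dict:
    """Choose between AI Studio and Vertex AI based on environment variables."""
    env = os.environ if env is None else env
    # Assumption: any capitalization of "true" enables Vertex AI.
    use_vertex = env.get("GEMINI_USE_VERTEX", "").strip().lower() == "true"
    if not use_vertex:
        return {"backend": "ai_studio"}
    project = env.get("VERTEX_PROJECT_ID", "").strip()
    if not project:
        raise ValueError("VERTEX_PROJECT_ID must be set when GEMINI_USE_VERTEX=true")
    # VERTEX_LOCATION is optional and falls back to the documented default.
    location = env.get("VERTEX_LOCATION", "").strip() or "us-central1"
    return {"backend": "vertex", "project": project, "location": location}
```

Passing the mapping explicitly keeps the function easy to check without touching the real process environment.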

Usage Instructions

When to Use This Skill

Use this skill when the user asks to:
  • Analyze, summarize, or describe video content
  • Answer questions about videos
  • Transcribe video audio with visual context
  • Extract information from specific timestamps
  • Compare multiple videos
  • Process YouTube video content
  • Create quizzes or educational content from videos

Basic Video Analysis

For video files:
bash
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
  --video-path "/path/to/video.mp4" \
  --prompt "Summarize this video in 3 key points"
For YouTube URLs:
bash
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
  --youtube-url "https://www.youtube.com/watch?v=VIDEO_ID" \
  --prompt "What are the main topics discussed?"

Advanced Features

Video Clipping (specific time range):
bash
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
  --video-path "/path/to/video.mp4" \
  --prompt "Summarize this segment" \
  --start-offset "40s" \
  --end-offset "80s"
Custom Frame Rate:
bash
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
  --video-path "/path/to/video.mp4" \
  --prompt "Analyze the rapid movements" \
  --fps 5
Transcription with Timestamps:
bash
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
  --video-path "/path/to/video.mp4" \
  --prompt "Transcribe the audio with timestamps and visual descriptions"
Multiple Videos (Gemini 2.5+ only):
bash
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
  --video-paths "/path/video1.mp4" "/path/video2.mp4" \
  --prompt "Compare these two videos and highlight the differences"
Model Selection:
bash
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
  --video-path "/path/to/video.mp4" \
  --prompt "Detailed analysis" \
  --model "gemini-2.5-pro"
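
Clip offsets such as "40s" (and minute-second forms like "1m30s") need to be turned into seconds before they can be compared or validated. How the script does this is not shown here; the sketch below is one possible approach, and `offset_to_seconds` and `validate_clip` are hypothetical helper names.

```python
import re

# Accepts "40s", "2m", or "1m30s": optional minutes followed by optional seconds.
_OFFSET_RE = re.compile(r"^(?:(\d+)m)?(?:(\d+)s)?$")


def offset_to_seconds(text: str) -> int:
    """Convert an offset string such as "1m30s" into a number of seconds."""
    match = _OFFSET_RE.match(text.strip())
    # An empty string matches the pattern with no groups set, so reject it too.
    if not match or not any(match.groups()):
        raise ValueError(f"unrecognized offset: {text!r}")
    minutes, seconds = (int(group) if group else 0 for group in match.groups())
    return minutes * 60 + seconds


def validate_clip(start: str, end: str) -> tuple:
    """Return (start_seconds, end_seconds), requiring start to precede end."""
    start_s, end_s = offset_to_seconds(start), offset_to_seconds(end)
    if start_s >= end_s:
        raise ValueError("start offset must be earlier than end offset")
    return start_s, end_s
```

Checking the range before sending a request avoids spending an API call on a clip that can never be valid.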

Script Parameters

Required (one of):
  --video-path PATH           Path to local video file
  --youtube-url URL           YouTube video URL
  --video-paths PATH [PATH..] Multiple video paths (Gemini 2.5+)

Required:
  --prompt TEXT              Analysis prompt/question

Optional:
  --model NAME               Model to use (default: gemini-2.5-flash)
  --start-offset TIME        Video clip start (e.g., "40s", "1m30s")
  --end-offset TIME          Video clip end (e.g., "80s", "2m")
  --fps NUMBER               Frame sampling rate (default: 1)
  --output-file PATH         Save response to file
  --verbose                  Show detailed processing info
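
For readers who want to see how such an interface could be declared, here is a sketch using `argparse` that mirrors the parameter list above. It is not the script's actual source: `build_parser` is a hypothetical name, and treating the three video sources as mutually exclusive is an assumption based on the "one of" wording.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    """Declare a CLI matching the documented parameters."""
    parser = argparse.ArgumentParser(description="Analyze video with Gemini")

    # Exactly one video source must be given.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--video-path", metavar="PATH", help="Path to local video file")
    source.add_argument("--youtube-url", metavar="URL", help="YouTube video URL")
    source.add_argument("--video-paths", metavar="PATH", nargs="+",
                        help="Multiple video paths (Gemini 2.5+)")

    parser.add_argument("--prompt", required=True, metavar="TEXT",
                        help="Analysis prompt/question")
    parser.add_argument("--model", default="gemini-2.5-flash", metavar="NAME",
                        help="Model to use")
    parser.add_argument("--start-offset", metavar="TIME", help="Video clip start")
    parser.add_argument("--end-offset", metavar="TIME", help="Video clip end")
    parser.add_argument("--fps", type=float, default=1.0, metavar="NUMBER",
                        help="Frame sampling rate")
    parser.add_argument("--output-file", metavar="PATH", help="Save response to file")
    parser.add_argument("--verbose", action="store_true",
                        help="Show detailed processing info")
    return parser
```

With this layout, `build_parser().parse_args(["--video-path", "a.mp4", "--prompt", "hi"])` yields the documented defaults for `--model` and `--fps`.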

Common Use Cases

1. Video Summarization

Prompt: "Summarize this video in 3 key points with timestamps"

2. Educational Content

Prompt: "Create a quiz with 5 questions and answer key based on this video"

3. Timestamp-Specific Questions

Prompt: "What happens at 01:15 and how does it relate to the topic at 02:30?"

4. Transcription

Prompt: "Transcribe the audio from this video with timestamps for salient events and visual descriptions"

5. Content Comparison

Prompt: "Compare these two product demo videos. Which one explains the features more clearly?"

6. Action Detection

Prompt: "List all the actions performed in this tutorial video with timestamps"

Rate Limits & Quotas

Free Tier (per model):
  • 10-15 RPM (requests per minute)
  • 1M-4M TPM (tokens per minute)
  • 1,500 RPD (requests per day)
YouTube Limitations:
  • Free tier: 8 hours of YouTube video per day
  • Paid tier: No length-based limits
  • Public videos only (no private/unlisted)
Storage (Files API):
  • 20GB per project
  • 2GB per file
  • 48-hour retention period

Token Calculation

Video tokens depend on resolution:
  • Default resolution: ~300 tokens per second of video
  • Low resolution: ~100 tokens per second of video
Example: a 10-minute video at default resolution is 600 seconds × 300 tokens per second ≈ 180,000 tokens
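
The arithmetic above is easy to wrap in a small helper for budgeting before a request. The sketch below uses only the approximate per-second rates quoted in this section; the function names are hypothetical, and the estimate ignores prompt and response tokens.

```python
# Approximate video token rates quoted above, in tokens per second of video.
TOKENS_PER_SECOND = {"default": 300, "low": 100}


def estimate_video_tokens(duration_seconds: float, resolution: str = "default") -> int:
    """Estimate the tokens consumed by a video of the given length."""
    return round(duration_seconds * TOKENS_PER_SECOND[resolution])


def max_video_seconds(context_tokens: int, resolution: str = "default") -> float:
    """Estimate the longest video that fits in a context window."""
    return context_tokens / TOKENS_PER_SECOND[resolution]


ten_minutes = estimate_video_tokens(600)            # 180000
one_million = max_video_seconds(1_000_000) / 60     # about 55.6 minutes
```

The second figure lines up with the "~1 hour" listed for 1M-token models, and the low-resolution rate gives 10,000 seconds, close to the "~3 hours" figure.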

Error Handling

Common errors and solutions:

| Error | Cause | Solution |
| --- | --- | --- |
| 400 Bad Request | Invalid video format or corrupt file | Check file format and integrity |
| 403 Forbidden | Invalid/missing API key | Verify GEMINI_API_KEY configuration |
| 404 Not Found | File URI not found | Ensure file is uploaded and active |
| 429 Too Many Requests | Rate limit exceeded | Implement backoff, upgrade to paid tier |
| 500 Internal Error | Server-side issue | Retry with exponential backoff |
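
The backoff advice for 429 and 500 responses can be sketched as a small retry wrapper. This is an illustration under stated assumptions rather than the skill's implementation: `ApiError` and `call_with_backoff` are hypothetical names, and the wrapped callable is assumed to raise `ApiError` carrying the HTTP status.

```python
import random
import time

# Statuses from the list above that are worth retrying.
RETRYABLE_STATUS = {429, 500}


class ApiError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status: int, message: str = ""):
        super().__init__(f"{status} {message}".strip())
        self.status = status


def call_with_backoff(request, max_attempts: int = 5, base_delay: float = 1.0,
                      sleep=time.sleep):
    """Call request(), retrying retryable errors with exponential backoff and jitter."""
    for attempt in range(max_attempts):
        try:
            return request()
        except ApiError as err:
            last_attempt = attempt == max_attempts - 1
            # Client errors such as 400/403/404 will not succeed on retry.
            if err.status not in RETRYABLE_STATUS or last_attempt:
                raise
            # Delay doubles each attempt; jitter spreads out concurrent retries.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    raise ValueError("max_attempts must be at least 1")
```

The `sleep` parameter is injectable so the retry schedule can be checked without actually waiting.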

Best Practices

  1. Use Files API for videos >20MB - More reliable than inline data
  2. Wait for file processing - Poll until state is ACTIVE before analysis
  3. Optimize FPS - Use lower FPS for static content to save tokens
  4. Clip long videos - Process specific segments instead of entire video
  5. Cache context - Reuse uploaded files for multiple queries
  6. Batch processing - Process multiple short videos in one request (2.5+)
  7. Specific prompts - Be precise about what you want to extract

Implementation Notes

For Claude Code:

When a user requests video analysis:
  1. Check API key availability first using the helper script
  2. Determine video source: local file, YouTube URL, or multiple videos
  3. Select appropriate model based on requirements (default: gemini-2.5-flash)
  4. Run the analysis script with proper parameters
  5. Parse and present results to the user clearly
  6. Handle errors gracefully with helpful suggestions

Files API Workflow:

For videos >20MB or reusable content:
  1. Upload video using Files API (script handles this automatically)
  2. Wait for ACTIVE state (polling included in script)
  3. Use file URI for analysis
  4. Files auto-delete after 48 hours
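
The waiting step in this workflow can be sketched as a generic polling loop. The script's own polling code is not shown here, so `wait_until_active` is a hypothetical helper: it takes any callable that returns the file's current state as a string, and it assumes the Files API also reports `PROCESSING` and `FAILED` in addition to the `ACTIVE` state named above.

```python
import time
from typing import Callable


def wait_until_active(get_state: Callable[[], str], timeout: float = 300.0,
                      interval: float = 5.0, sleep=time.sleep,
                      clock=time.monotonic) -> None:
    """Poll get_state() until it returns "ACTIVE", or fail on error or timeout."""
    deadline = clock() + timeout
    while True:
        state = get_state()
        if state == "ACTIVE":
            return
        if state == "FAILED":
            raise RuntimeError("file processing failed")
        if clock() >= deadline:
            raise TimeoutError(f"file still {state} after {timeout} seconds")
        # Still processing: wait before asking again.
        sleep(interval)
```

In real use, `get_state` would wrap whatever call fetches the uploaded file's metadata; bounding the wait prevents a stuck upload from hanging the whole analysis.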

Inline Data Workflow:

For videos <20MB:
  1. Read video file as bytes
  2. Base64 encode for API
  3. Send in generateContent request
  4. Single-use, no upload needed
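
The inline steps above amount to building a JSON body for generateContent. The sketch below assembles such a body without sending it; `build_inline_request` and `INLINE_LIMIT_BYTES` are hypothetical names, the size check simply applies the 20MB threshold from this section to the raw file, and the fallback MIME type is an assumption.

```python
import base64
import mimetypes
from pathlib import Path

# 20MB threshold from the text above, applied here to the raw file size.
INLINE_LIMIT_BYTES = 20 * 1024 * 1024


def build_inline_request(video_path: str, prompt: str) -> dict:
    """Read a small video and embed it, Base64-encoded, in a request body."""
    path = Path(video_path)
    data = path.read_bytes()
    if len(data) >= INLINE_LIMIT_BYTES:
        raise ValueError("video too large for inline data; use the Files API instead")
    # Guess the MIME type from the extension; assume MP4 when it is unknown.
    mime_type = mimetypes.guess_type(path.name)[0] or "video/mp4"
    return {
        "contents": [{
            "parts": [
                {"inline_data": {
                    "mime_type": mime_type,
                    "data": base64.b64encode(data).decode("ascii"),
                }},
                {"text": prompt},
            ]
        }]
    }
```

Because Base64 inflates the payload by roughly a third, a file just under the threshold may still exceed a request size limit, which is one more reason to prefer the Files API for borderline cases.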

Example Workflows

Workflow 1: YouTube Video Summary

bash
# User: "Analyze this YouTube tutorial video"
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
  --youtube-url "https://www.youtube.com/watch?v=abc123" \
  --prompt "Create a structured summary with: 1) Main topics, 2) Key takeaways, 3) Recommended audience"

Workflow 2: Interview Transcription

bash
# User: "Transcribe this interview with timestamps"
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
  --video-path "interview.mp4" \
  --prompt "Transcribe this interview with speaker labels, timestamps, and visual descriptions of gestures or slides shown"

Workflow 3: Product Comparison

bash
# User: "Compare these two product demo videos"
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
  --video-paths "demo1.mp4" "demo2.mp4" \
  --model "gemini-2.5-pro" \
  --prompt "Compare these product demos on: features shown, presentation quality, clarity of explanation, and overall effectiveness"

Troubleshooting

API Key Not Found:
bash
# Check API key detection
python .claude/skills/gemini-video-understanding/scripts/check_api_key.py

Video Too Large:
Error: Request size exceeds 20MB
Solution: Script automatically uses Files API for large videos

Processing Timeout:
Error: File not reaching ACTIVE state
Solution: Check video integrity, try smaller file, or different format

Rate Limit Errors:
Error: 429 Too Many Requests
Solution: Wait before retry, or upgrade to paid tier

Version History

  • 1.0.0 (2025-10-26): Initial release with full video understanding capabilities