gemini-video-understanding

Gemini Video Understanding Skill

This skill enables comprehensive video analysis using Google's Gemini API, including video summarization, question answering, transcription, timestamp references, and more.

Capabilities

Video Summarization: Create concise summaries of video content
Question Answering: Answer specific questions about video content
Transcription: Transcribe audio with visual descriptions and timestamps
Timestamp References: Query specific moments in videos (MM:SS format)
Video Clipping: Process specific segments using start/end offsets
Multiple Videos: Compare and analyze up to 10 videos in a single request (Gemini 2.5+)
YouTube Support: Analyze YouTube videos directly (preview feature)
Custom Frame Rate: Adjust FPS sampling for different video types

Supported Formats

MP4, MPEG, MOV, AVI, FLV, MPG, WebM, WMV, 3GPP
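
When requests are built by hand rather than through the bundled script, each video part needs a MIME type. The sketch below maps the extensions above to `video/*` MIME types. The exact MIME strings are assumptions to check against Google's current video-understanding documentation, and `video_mime_type` is a hypothetical helper, not part of the skill.

```python
from pathlib import Path

# Assumed MIME types for the supported containers (verify against current Gemini docs).
VIDEO_MIME_TYPES = {
    ".mp4": "video/mp4",
    ".mpeg": "video/mpeg",
    ".mov": "video/mov",
    ".avi": "video/avi",
    ".flv": "video/x-flv",
    ".mpg": "video/mpg",
    ".webm": "video/webm",
    ".wmv": "video/wmv",
    ".3gp": "video/3gpp",
    ".3gpp": "video/3gpp",
}


def video_mime_type(path: str) -> str:
    """Return the MIME type for a supported video file, or raise ValueError."""
    suffix = Path(path).suffix.lower()
    try:
        return VIDEO_MIME_TYPES[suffix]
    except KeyError:
        raise ValueError(f"Unsupported video format: {suffix or '(no extension)'}") from None


print(video_mime_type("/path/to/video.MP4"))  # video/mp4
```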

Models Available

Gemini 2.5 Series:

- `gemini-2.5-pro`: Best quality, 1M-token context
- `gemini-2.5-flash`: Balanced quality and speed, 1M-token context
- `gemini-2.5-flash-preview-09-2025`: Preview features, 1M-token context

Gemini 2.0 Series:

- `gemini-2.0-flash`: Fast processing
- `gemini-2.0-flash-lite`: Lightweight option

Context Windows:

2M-token models: ~2 hours of video at default media resolution, or ~6 hours at low resolution
1M-token models: ~1 hour of video at default media resolution, or ~3 hours at low resolution

API Key Configuration

The skill supports both Google AI Studio and Vertex AI endpoints.

Option 1: Google AI Studio (Default)

The skill checks for `GEMINI_API_KEY` in this order:

1. Process environment: `process.env.GEMINI_API_KEY` or `$GEMINI_API_KEY`
2. Project root: `.env`
3. `.claude` directory: `.claude/.env`
4. `.claude/skills` directory: `.claude/skills/.env`
5. Skill directory: `.claude/skills/gemini-video-understanding/.env`
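
The lookup order above can be sketched in Python. This is an illustration, not the bundled script's code: the function names, the minimal `.env` parser (plain `KEY=VALUE` lines with optional quotes), and the assumption that `project_root` is the directory containing `.claude/` are all hypothetical.

```python
import os
from pathlib import Path
from typing import Dict, Mapping, Optional


def read_env_file(path: Path) -> Dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file, skipping blanks and comments."""
    values: Dict[str, str] = {}
    if not path.is_file():
        return values
    for line in path.read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def find_gemini_api_key(project_root: Path, environ: Mapping[str, str] = os.environ) -> Optional[str]:
    """Return GEMINI_API_KEY using the documented lookup order, or None if absent."""
    # 1. The process environment wins.
    if environ.get("GEMINI_API_KEY"):
        return environ["GEMINI_API_KEY"]
    # 2-5. .env files, from the project root down to the skill directory.
    candidates = [
        project_root / ".env",
        project_root / ".claude" / ".env",
        project_root / ".claude" / "skills" / ".env",
        project_root / ".claude" / "skills" / "gemini-video-understanding" / ".env",
    ]
    for env_file in candidates:
        key = read_env_file(env_file).get("GEMINI_API_KEY")
        if key:
            return key
    return None
```

Typical use would be `find_gemini_api_key(Path.cwd())` from the project root; returning `None` rather than raising leaves the caller free to print setup instructions.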

Get your API key: https://aistudio.google.com/apikey

To set up:

```bash
# Environment variable (recommended)
export GEMINI_API_KEY="your-api-key-here"

# Or in a .env file (>> appends, so an existing .env is not overwritten)
echo "GEMINI_API_KEY=your-api-key-here" >> .env
```

Option 2: Vertex AI

To use Vertex AI instead:

```bash
# Enable Vertex AI
export GEMINI_USE_VERTEX=true
export VERTEX_PROJECT_ID=your-gcp-project-id
export VERTEX_LOCATION=us-central1  # Optional, defaults to us-central1
```

Or in a `.env` file:

```bash
GEMINI_USE_VERTEX=true
VERTEX_PROJECT_ID=your-gcp-project-id
VERTEX_LOCATION=us-central1
```
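
How these variables might be resolved into a backend choice, as a minimal sketch: the dataclass and function names are hypothetical, and accepting `1`/`yes` in addition to the documented `true` is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class BackendConfig:
    use_vertex: bool
    project_id: Optional[str] = None
    location: Optional[str] = None


def resolve_backend(environ: Mapping[str, str] = os.environ) -> BackendConfig:
    """Choose Google AI Studio (default) or Vertex AI from environment variables."""
    # Assumption: "1" and "yes" are treated like the documented "true".
    use_vertex = environ.get("GEMINI_USE_VERTEX", "").strip().lower() in {"1", "true", "yes"}
    if not use_vertex:
        return BackendConfig(use_vertex=False)
    project_id = environ.get("VERTEX_PROJECT_ID")
    if not project_id:
        raise ValueError("GEMINI_USE_VERTEX is set but VERTEX_PROJECT_ID is missing")
    # VERTEX_LOCATION is optional and falls back to us-central1.
    location = environ.get("VERTEX_LOCATION") or "us-central1"
    return BackendConfig(use_vertex=True, project_id=project_id, location=location)
```

With the `google-genai` SDK, a Vertex result could then be passed as `genai.Client(vertexai=True, project=cfg.project_id, location=cfg.location)`; whether the bundled script does exactly this is not stated here.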

Usage Instructions

When to Use This Skill

Use this skill when the user asks to:

Analyze, summarize, or describe video content
Answer questions about videos
Transcribe video audio with visual context
Extract information from specific timestamps
Compare multiple videos
Process YouTube video content
Create quizzes or educational content from videos

Basic Video Analysis

For local video files:

```bash
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
  --video-path "/path/to/video.mp4" \
  --prompt "Summarize this video in 3 key points"
```

For YouTube URLs:

```bash
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
  --youtube-url "https://www.youtube.com/watch?v=VIDEO_ID" \
  --prompt "What are the main topics discussed?"
```
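
A YouTube URL is passed to the API as a file reference rather than uploaded. Below is a minimal sketch of the `generateContent` REST request body, assuming the snake_case field names (`file_data`, `file_uri`) used in Google's REST examples. The helper name and its host check, which only accepts common `youtube.com` and `youtu.be` forms, are hypothetical.

```python
import json
from urllib.parse import urlparse

YOUTUBE_HOSTS = {"www.youtube.com", "youtube.com", "m.youtube.com", "youtu.be"}


def youtube_request_body(url: str, prompt: str) -> dict:
    """Build a generateContent body that references a public YouTube video."""
    host = urlparse(url).netloc.lower()
    if host not in YOUTUBE_HOSTS:
        raise ValueError(f"Not a YouTube URL: {url}")
    return {
        "contents": [
            {
                "parts": [
                    {"file_data": {"file_uri": url}},  # the video, referenced by URL
                    {"text": prompt},  # the question about it
                ]
            }
        ]
    }


body = youtube_request_body(
    "https://www.youtube.com/watch?v=VIDEO_ID", "What are the main topics discussed?"
)
print(json.dumps(body, indent=2))
```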

Advanced Features

Video Clipping (specific time range):

```bash
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
  --video-path "/path/to/video.mp4" \
  --prompt "Summarize this segment" \
  --start-offset "40s" \
  --end-offset "80s"
```
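
The offsets accept shorthand such as `1m30s`, while Google's clipping examples give start and end offsets as seconds-only duration strings (for example `"1250s"`). A script therefore has to normalize the shorthand first. This is a minimal sketch of that normalization, assuming seconds-only output is what the API expects. The function names and the `start_offset`/`end_offset` dictionary keys are illustrative, not the bundled script's code.

```python
import re

# Accepts the formats shown in this document: "40s", "1m30s", "2m".
_OFFSET_RE = re.compile(r"^(?:(\d+)m)?(?:(\d+(?:\.\d+)?)s)?$")


def offset_to_seconds(text: str) -> float:
    """Convert an offset such as "40s", "1m30s" or "2m" to seconds."""
    match = _OFFSET_RE.match(text.strip())
    if not match or not any(match.groups()):
        raise ValueError(f"Unrecognized offset: {text!r}")
    minutes, seconds = match.groups()
    return int(minutes or 0) * 60 + float(seconds or 0)


def _as_duration(seconds: float) -> str:
    """Format seconds as a duration string like "90s" (or "12.5s")."""
    return f"{int(seconds)}s" if seconds == int(seconds) else f"{seconds}s"


def clip_metadata(start: str, end: str) -> dict:
    """Normalize --start-offset/--end-offset into seconds-only duration strings."""
    start_s, end_s = offset_to_seconds(start), offset_to_seconds(end)
    if end_s <= start_s:
        raise ValueError("end offset must be later than start offset")
    return {"start_offset": _as_duration(start_s), "end_offset": _as_duration(end_s)}


print(clip_metadata("1m30s", "2m"))  # {'start_offset': '90s', 'end_offset': '120s'}
```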

Custom Frame Rate:

```bash
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
  --video-path "/path/to/video.mp4" \
  --prompt "Analyze the rapid movements" \
  --fps 5
```

Transcription with Timestamps:

```bash
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
  --video-path "/path/to/video.mp4" \
  --prompt "Transcribe the audio with timestamps and visual descriptions"
```

Multiple Videos (Gemini 2.5+ only):

```bash
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
  --video-paths "/path/video1.mp4" "/path/video2.mp4" \
  --prompt "Compare these two videos and highlight the differences"
```
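
With several videos, each becomes its own part in a single request, followed by the prompt. The sketch below assumes the videos were already uploaded through the Files API, so each one is a `(file_uri, mime_type)` pair, and it uses the snake_case REST field names from Google's examples. The helper is hypothetical: it enforces the 10-video cap but does not check that the model is 2.5 or later.

```python
from typing import List, Sequence, Tuple

MAX_VIDEOS_PER_REQUEST = 10  # documented cap for Gemini 2.5+ models


def multi_video_parts(videos: Sequence[Tuple[str, str]], prompt: str) -> List[dict]:
    """Build request parts for several uploaded videos followed by one text prompt.

    `videos` holds (file_uri, mime_type) pairs from earlier Files API uploads.
    """
    if not videos:
        raise ValueError("at least one video is required")
    if len(videos) > MAX_VIDEOS_PER_REQUEST:
        raise ValueError(f"at most {MAX_VIDEOS_PER_REQUEST} videos per request, got {len(videos)}")
    parts = [{"file_data": {"file_uri": uri, "mime_type": mime}} for uri, mime in videos]
    parts.append({"text": prompt})  # prompt goes after all the videos
    return parts
```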

Model Selection:

```bash
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
  --video-path "/path/to/video.mp4" \
  --prompt "Detailed analysis" \
  --model "gemini-2.5-pro"
```

Script Parameters

```
Required (one of):
  --video-path PATH              Path to local video file
  --youtube-url URL              YouTube video URL
  --video-paths PATH [PATH ...]  Multiple video paths (Gemini 2.5+, up to 10)

Required:
  --prompt TEXT                  Analysis prompt/question

Optional:
  --model NAME                   Model to use (default: gemini-2.5-flash)
  --start-offset TIME            Video clip start (e.g., "40s", "1m30s")
  --end-offset TIME              Video clip end (e.g., "80s", "2m")
  --fps NUMBER                   Frame sampling rate (default: 1)
  --output-file PATH             Save response to file
  --verbose                      Show detailed processing info
```
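
The option list above maps naturally onto `argparse`. This is a hedged reconstruction of the documented interface, not the bundled script's source. `build_parser` and `parse_args` are hypothetical names, and the `float` type for `--fps` and the 10-video and positive-FPS checks are assumptions drawn from this document.

```python
import argparse
from typing import List, Optional


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="Analyze videos with Gemini")
    # Exactly one video source is required.
    source = parser.add_mutually_exclusive_group(required=True)
    source.add_argument("--video-path", help="Path to local video file")
    source.add_argument("--youtube-url", help="YouTube video URL")
    source.add_argument("--video-paths", nargs="+", help="Multiple video paths (Gemini 2.5+)")
    parser.add_argument("--prompt", required=True, help="Analysis prompt/question")
    parser.add_argument("--model", default="gemini-2.5-flash", help="Model to use")
    parser.add_argument("--start-offset", help='Clip start, e.g. "40s" or "1m30s"')
    parser.add_argument("--end-offset", help='Clip end, e.g. "80s" or "2m"')
    parser.add_argument("--fps", type=float, default=1.0, help="Frame sampling rate")
    parser.add_argument("--output-file", help="Save response to file")
    parser.add_argument("--verbose", action="store_true", help="Show detailed processing info")
    return parser


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    if args.video_paths and len(args.video_paths) > 10:
        parser.error("--video-paths accepts at most 10 videos")
    if args.fps <= 0:
        parser.error("--fps must be positive")
    return args
```

`parser.error` prints usage and exits with status 2, which matches how `argparse` reports its own errors.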

Common Use Cases

1. Video Summarization

Prompt: "Summarize this video in 3 key points with timestamps"

2. Educational Content

Prompt: "Create a quiz with 5 questions and answer key based on this video"

3. Timestamp-Specific Questions

Prompt: "What happens at 01:15 and how does it relate to the topic at 02:30?"
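
When prompts or answers use MM:SS references, two small helpers can convert between those stamps and seconds. One builds a prompt from known offsets; the other pulls the moments a model mentions out of its answer, for example to seek a player. Both names are hypothetical, and the parser recognizes only MM:SS, not HH:MM:SS.

```python
import re
from typing import List

# MM:SS with seconds 00-59; minutes may run past 59 for long videos.
_MMSS_RE = re.compile(r"\b(\d{1,3}):([0-5]\d)\b")


def seconds_to_mmss(total_seconds: int) -> str:
    """Format seconds as an MM:SS reference for use in a prompt."""
    minutes, seconds = divmod(int(total_seconds), 60)
    return f"{minutes:02d}:{seconds:02d}"


def find_timestamps(text: str) -> List[int]:
    """Extract MM:SS references from text (a prompt or a model answer) as seconds."""
    return [int(m) * 60 + int(s) for m, s in _MMSS_RE.findall(text)]


prompt = (
    f"What happens at {seconds_to_mmss(75)} and how does it relate "
    f"to the topic at {seconds_to_mmss(150)}?"
)
print(prompt)  # What happens at 01:15 and how does it relate to the topic at 02:30?
print(find_timestamps(prompt))  # [75, 150]
```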

4. Transcription

Prompt: "Transcribe the audio from this video with timestamps for salient events and visual descriptions"

5. Content Comparison

Prompt: "Compare these two product demo videos. Which one explains the features more clearly?"

6. Action Detection

Prompt: "List all the actions performed in this tutorial video with timestamps"

Rate Limits & Quotas

Free Tier (per model; exact limits vary by model and change over time, so confirm them on the Rate Limits page):

10-15 RPM (requests per minute)
1M-4M TPM (tokens per minute)
1,500 RPD (requests per day)

YouTube Limitations:

Free tier: 8 hours of YouTube video per day
Paid tier: No length-based limits
Public videos only (no private/unlisted)

Storage (Files API):

20GB per project
2GB per file
48-hour retention period

Token Calculation

Video tokens depend on resolution:

Default resolution: ~300 tokens per second of video (at the default 1 FPS sampling)
Low resolution: ~100 tokens per second of video

Example: a 10-minute video is 600 seconds × ~300 tokens/second ≈ 180,000 tokens at default resolution.
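
The same arithmetic works as a quick budgeting helper. It uses only the approximate per-second figures above, assumed to hold at the default 1 FPS sampling, so treat its results as estimates. The function names and the `reserve` parameter (tokens kept back for the prompt and answer) are hypothetical. For exact counts, the API's `countTokens` method is the authority.

```python
# Approximate rates from the figures above (assumed to hold at the default 1 FPS).
TOKENS_PER_SECOND = {"default": 300, "low": 100}


def estimate_video_tokens(duration_s: float, resolution: str = "default") -> int:
    """Rough token cost of `duration_s` seconds of video."""
    return round(duration_s * TOKENS_PER_SECOND[resolution])


def max_video_seconds(context_tokens: int, resolution: str = "default", reserve: int = 0) -> float:
    """Longest video that fits in `context_tokens`, keeping `reserve` tokens free."""
    return max(context_tokens - reserve, 0) / TOKENS_PER_SECOND[resolution]


print(estimate_video_tokens(600))                # 180000 (the 10-minute example)
print(estimate_video_tokens(80 - 40))            # 12000 (a 40s clip)
print(round(max_video_seconds(1_000_000) / 60))  # 56 (minutes, in line with "~1 hour")
```

Clipping changes the billed duration directly, which is why the 40-second clip costs a fraction of the full video.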

Error Handling

Common errors and solutions:

| Error | Cause | Solution |
|---|---|---|
| 400 Bad Request | Invalid video format or corrupt file | Check file format and integrity |
| 403 Forbidden | Invalid or missing API key | Verify the `GEMINI_API_KEY` configuration |
| 404 Not Found | File URI not found | Ensure the file is uploaded and `ACTIVE` (uploads expire after 48 hours) |
| 429 Too Many Requests | Rate limit exceeded | Back off before retrying, or upgrade to the paid tier |
| 500 Internal Error | Server-side issue | Retry with exponential backoff |
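
For the retryable rows (429 and 500), a generic exponential-backoff wrapper is sketched below. It is library-agnostic: `ApiError` stands in for whatever exception your client raises with an HTTP status. The delays, the attempt count, and the "full jitter" strategy are assumptions, not values from Google's documentation.

```python
import random
import time
from typing import Callable, TypeVar

T = TypeVar("T")
RETRYABLE_STATUSES = {429, 500}  # the table rows whose fix is "retry"


class ApiError(Exception):
    """Stand-in for a client exception that carries an HTTP status code."""

    def __init__(self, status: int, message: str = "") -> None:
        super().__init__(f"{status} {message}".strip())
        self.status = status


def with_backoff(
    call: Callable[[], T],
    max_attempts: int = 5,
    base_delay: float = 1.0,
    max_delay: float = 32.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run `call`, retrying 429/500 errors with capped exponential backoff and jitter."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return call()
        except ApiError as err:
            # 400/403/404 need a fix (file, key, upload), so re-raise them at once.
            if err.status not in RETRYABLE_STATUSES or attempt == max_attempts - 1:
                raise
            delay = min(max_delay, base_delay * 2 ** attempt)
            sleep(random.uniform(0, delay))  # full jitter: random wait up to the cap
    raise RuntimeError("unreachable")
```

Randomizing the wait spreads out retries from concurrent clients, and injecting `sleep` keeps the wrapper testable without real delays.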

Best Practices

Use Files API for videos >20MB - More reliable than inline data
Wait for file processing - Poll until state is ACTIVE before analysis
Optimize FPS - Use lower FPS for static content to save tokens
Clip long videos - Process specific segments instead of entire video
Cache context - Reuse uploaded files for multiple queries
Batch processing - Process multiple short videos in one request (2.5+)
Specific prompts - Be precise about what you want to extract
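
The first and fifth practices reduce to a small decision: send a video inline, or upload it through the Files API. In the sketch below, counting the base64-encoded size against the 20MB limit is a conservative assumption (that the limit covers the request as sent). The decimal byte units and the function names are also assumptions.

```python
# Decimal units are assumed, which errs on the safe side of the documented limits.
INLINE_LIMIT_BYTES = 20_000_000        # ~20MB per inline request
FILES_API_LIMIT_BYTES = 2_000_000_000  # ~2GB per uploaded file


def base64_size(n_bytes: int) -> int:
    """Size of n_bytes after base64 encoding (4 output bytes per 3 input bytes)."""
    return 4 * ((n_bytes + 2) // 3)


def choose_transport(size_bytes: int, reuse: bool = False, prompt_bytes: int = 0) -> str:
    """Return "inline" or "files_api" for a video of `size_bytes` bytes."""
    if size_bytes > FILES_API_LIMIT_BYTES:
        raise ValueError("video exceeds the 2GB Files API limit; clip or compress it first")
    if reuse:
        return "files_api"  # upload once, then query the same file many times
    if base64_size(size_bytes) + prompt_bytes >= INLINE_LIMIT_BYTES:
        return "files_api"
    return "inline"
```

A 16MB file lands on the Files API side here even though it is under 20MB, because base64 inflates it to about 21MB. Callers would typically pass `os.path.getsize(path)`.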

Implementation Notes

For Claude Code:

When a user requests video analysis:

Check API key availability first using the helper script (`scripts/check_api_key.py`)
Determine video source: local file, YouTube URL, or multiple videos
Select appropriate model based on requirements (default: gemini-2.5-flash)
Run the analysis script with proper parameters
Parse and present results to the user clearly
Handle errors gracefully with helpful suggestions
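
Steps 2 to 4 can be sketched as a small wrapper that turns the chosen video source into the documented command line and runs it. `build_command` and `run_analysis` are hypothetical helpers. Treating a non-zero exit code as failure, and reading the answer from standard output, are assumptions about the script's behavior.

```python
import subprocess
import sys
from typing import List, Optional, Sequence

SCRIPT = ".claude/skills/gemini-video-understanding/scripts/analyze_video.py"


def build_command(
    prompt: str,
    video_path: Optional[str] = None,
    youtube_url: Optional[str] = None,
    video_paths: Optional[Sequence[str]] = None,
    model: str = "gemini-2.5-flash",
) -> List[str]:
    """Assemble the analyze_video.py command line for exactly one video source."""
    sources = [s for s in (video_path, youtube_url, video_paths) if s]
    if len(sources) != 1:
        raise ValueError("provide exactly one of video_path, youtube_url, video_paths")
    cmd = [sys.executable, SCRIPT]
    if video_path:
        cmd += ["--video-path", video_path]
    elif youtube_url:
        cmd += ["--youtube-url", youtube_url]
    else:
        cmd += ["--video-paths", *video_paths]
    cmd += ["--prompt", prompt, "--model", model]
    return cmd


def run_analysis(cmd: List[str]) -> str:
    """Run the script and return its stdout, raising with stderr on failure."""
    result = subprocess.run(cmd, capture_output=True, text=True)
    if result.returncode != 0:
        raise RuntimeError(result.stderr.strip() or f"exit code {result.returncode}")
    return result.stdout
```

Passing arguments as a list rather than a shell string avoids quoting problems with prompts that contain quotes or parentheses.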

Files API Workflow:

For videos >20MB or reusable content:

Upload video using Files API (script handles this automatically)
Wait for ACTIVE state (polling included in script)
Use file URI for analysis
Files auto-delete after 48 hours
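
The "wait for ACTIVE" step is a polling loop. Below is a client-agnostic sketch: `get_state` is any callable that returns the file's current state name, and the `PROCESSING`/`ACTIVE`/`FAILED` state names follow the Files API. The timeout and interval defaults, and the function itself, are assumptions.

```python
import time
from typing import Callable


def wait_until_active(
    get_state: Callable[[], str],
    timeout_s: float = 600.0,
    interval_s: float = 5.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> None:
    """Poll an uploaded file's state until it is ACTIVE.

    Raises RuntimeError if processing fails and TimeoutError if it never finishes.
    """
    deadline = clock() + timeout_s
    while True:
        state = get_state()
        if state == "ACTIVE":
            return
        if state == "FAILED":
            raise RuntimeError("file processing failed; check the video's integrity and format")
        if clock() >= deadline:
            raise TimeoutError(f"file still {state} after {timeout_s:.0f}s")
        sleep(interval_s)
```

With the `google-genai` SDK, `get_state` could be something like `lambda: client.files.get(name=uploaded.name).state.name`; check this against the SDK version in use.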

Inline Data Workflow:

For videos under 20MB (the limit applies to the total request size, including the prompt):

Read video file as bytes
Base64 encode for API
Send in generateContent request
Single-use, no upload needed
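
The four inline steps fit in one function. This minimal sketch builds a `generateContent` REST body with the snake_case `inline_data`/`mime_type`/`data` fields used in Google's REST examples. The helper name and its `video/mp4` default are assumptions, and the function does no size check.

```python
import base64
from pathlib import Path


def inline_request_body(video_path: str, prompt: str, mime_type: str = "video/mp4") -> dict:
    """Read a small video, base64-encode it, and wrap it in a generateContent body."""
    data = Path(video_path).read_bytes()  # 1. read the file as bytes
    encoded = base64.b64encode(data).decode("ascii")  # 2. base64-encode for JSON
    return {  # 3. embed it in the request, prompt after the video
        "contents": [
            {
                "parts": [
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                    {"text": prompt},
                ]
            }
        ]
    }
```

The body would then be POSTed as JSON to `https://generativelanguage.googleapis.com/v1beta/models/gemini-2.5-flash:generateContent` with the key in an `x-goog-api-key` header, following Google's REST examples. Nothing is stored server-side, which is why inline data is single-use.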

Example Workflows

Workflow 1: YouTube Video Summary

```bash
# User: "Analyze this YouTube tutorial video"
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
  --youtube-url "https://www.youtube.com/watch?v=abc123" \
  --prompt "Create a structured summary with: 1) Main topics, 2) Key takeaways, 3) Recommended audience"
```

Workflow 2: Interview Transcription

```bash
# User: "Transcribe this interview with timestamps"
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
  --video-path "interview.mp4" \
  --prompt "Transcribe this interview with speaker labels, timestamps, and visual descriptions of gestures or slides shown"
```

Workflow 3: Product Comparison

```bash
# User: "Compare these two product demo videos"
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
  --video-paths "demo1.mp4" "demo2.mp4" \
  --model "gemini-2.5-pro" \
  --prompt "Compare these product demos on: features shown, presentation quality, clarity of explanation, and overall effectiveness"
```

Troubleshooting

API Key Not Found:

```bash
# Check API key detection
python .claude/skills/gemini-video-understanding/scripts/check_api_key.py
```

Video Too Large:

Error: Request size exceeds 20MB
Solution: The script automatically uses the Files API for large videos

Processing Timeout:

Error: File not reaching ACTIVE state
Solution: Check video integrity, try a smaller file, or try a different format

Rate Limit Errors:

Error: 429 Too Many Requests
Solution: Wait before retrying, or upgrade to the paid tier

Additional Resources

API Documentation: https://ai.google.dev/gemini-api/docs/video-understanding
Files API Guide: https://ai.google.dev/gemini-api/docs/vision#uploading-files
Rate Limits: https://ai.google.dev/gemini-api/docs/rate-limits
Pricing: https://ai.google.dev/pricing
Get API Key: https://aistudio.google.com/apikey

Version History

1.0.0 (2025-10-26): Initial release with full video understanding capabilities
