gemini-video-understanding
Gemini Video Understanding Skill
This skill enables comprehensive video analysis using Google's Gemini API, including video summarization, question answering, transcription, timestamp references, and more.
Capabilities
- Video Summarization: Create concise summaries of video content
- Question Answering: Answer specific questions about video content
- Transcription: Transcribe audio with visual descriptions and timestamps
- Timestamp References: Query specific moments in videos (MM:SS format)
- Video Clipping: Process specific segments using start/end offsets
- Multiple Videos: Compare and analyze up to 10 videos (Gemini 2.5+)
- YouTube Support: Analyze YouTube videos directly (preview feature)
- Custom Frame Rate: Adjust FPS sampling for different video types
Supported Formats
- MP4, MPEG, MOV, AVI, FLV, MPG, WebM, WMV, 3GPP
Models Available
Gemini 2.5 Series:
- `gemini-2.5-pro` - Best quality, 1M context
- `gemini-2.5-flash` - Balanced quality/speed, 1M context
- `gemini-2.5-flash-preview-09-2025` - Preview features, 1M context
Gemini 2.0 Series:
- `gemini-2.0-flash` - Fast processing
- `gemini-2.0-flash-lite` - Lightweight option
Context Windows:
- 2M token models: ~2 hours (default) or ~6 hours (low-res)
- 1M token models: ~1 hour (default) or ~3 hours (low-res)
API Key Configuration
The skill supports both Google AI Studio and Vertex AI endpoints.
Option 1: Google AI Studio (Default)
The skill checks for `GEMINI_API_KEY` in this order:
- Process environment: `process.env.GEMINI_API_KEY` or `$GEMINI_API_KEY`
- Project root: `.env`
- .claude directory: `.claude/.env`
- .claude/skills directory: `.claude/skills/.env`
- Skill directory: `.claude/skills/gemini-video-understanding/.env`
Get your API key: https://aistudio.google.com/apikey
To set up:
```bash
# Environment variable (recommended)
export GEMINI_API_KEY="your-api-key-here"
# Or in .env file
echo "GEMINI_API_KEY=your-api-key-here" > .env
```
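The lookup order described above could be implemented roughly as follows. This is a minimal sketch, not the skill's actual helper: the function names `find_api_key` and `read_env_value` are hypothetical, and the parser assumes simple `KEY=VALUE` lines (optionally prefixed with `export`).

```python
import os
from pathlib import Path

# .env locations in the documented lookup order, relative to the project root.
ENV_FILE_CANDIDATES = [
    ".env",
    ".claude/.env",
    ".claude/skills/.env",
    ".claude/skills/gemini-video-understanding/.env",
]


def read_env_value(env_path, name):
    """Return the value of `name` from a simple KEY=VALUE file, or None."""
    if not env_path.is_file():
        return None
    for raw_line in env_path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if line.startswith("export "):
            line = line[len("export "):]
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        if key.strip() == name:
            # Drop surrounding quotes; treat an empty value as missing.
            return value.strip().strip("'\"") or None
    return None


def find_api_key(project_root=".", environ=None, name="GEMINI_API_KEY"):
    """Check the process environment first, then each .env file in order."""
    environ = os.environ if environ is None else environ
    if environ.get(name):
        return environ[name]
    root = Path(project_root)
    for relative in ENV_FILE_CANDIDATES:
        value = read_env_value(root / relative, name)
        if value:
            return value
    return None
```

Passing `environ` explicitly keeps the function easy to test; in normal use `find_api_key()` with no arguments reads `os.environ` and the current directory.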
Option 2: Vertex AI
To use Vertex AI instead:
```bash
# Enable Vertex AI
export GEMINI_USE_VERTEX=true
export VERTEX_PROJECT_ID=your-gcp-project-id
export VERTEX_LOCATION=us-central1 # Optional, defaults to us-central1
```
Or in `.env` file:
```bash
GEMINI_USE_VERTEX=true
VERTEX_PROJECT_ID=your-gcp-project-id
VERTEX_LOCATION=us-central1
```
Usage Instructions
When to Use This Skill
Use this skill when the user asks to:
- Analyze, summarize, or describe video content
- Answer questions about videos
- Transcribe video audio with visual context
- Extract information from specific timestamps
- Compare multiple videos
- Process YouTube video content
- Create quizzes or educational content from videos
Basic Video Analysis
For video files:
```bash
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
  --video-path "/path/to/video.mp4" \
  --prompt "Summarize this video in 3 key points"
```
For YouTube URLs:
```bash
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
  --youtube-url "https://www.youtube.com/watch?v=VIDEO_ID" \
  --prompt "What are the main topics discussed?"
```
Advanced Features
Video Clipping (specific time range):
```bash
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
  --video-path "/path/to/video.mp4" \
  --prompt "Summarize this segment" \
  --start-offset "40s" \
  --end-offset "80s"
```
Custom Frame Rate:
```bash
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
  --video-path "/path/to/video.mp4" \
  --prompt "Analyze the rapid movements" \
  --fps 5
```
Transcription with Timestamps:
```bash
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
  --video-path "/path/to/video.mp4" \
  --prompt "Transcribe the audio with timestamps and visual descriptions"
```
Multiple Videos (Gemini 2.5+ only):
```bash
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
  --video-paths "/path/video1.mp4" "/path/video2.mp4" \
  --prompt "Compare these two videos and highlight the differences"
```
Model Selection:
```bash
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
  --video-path "/path/to/video.mp4" \
  --prompt "Detailed analysis" \
  --model "gemini-2.5-pro"
```
Script Parameters
```
Required (one of):
  --video-path PATH            Path to local video file
  --youtube-url URL            YouTube video URL
  --video-paths PATH [PATH..]  Multiple video paths (Gemini 2.5+)

Required:
  --prompt TEXT                Analysis prompt/question

Optional:
  --model NAME                 Model to use (default: gemini-2.5-flash)
  --start-offset TIME          Video clip start (e.g., "40s", "1m30s")
  --end-offset TIME            Video clip end (e.g., "80s", "2m")
  --fps NUMBER                 Frame sampling rate (default: 1)
  --output-file PATH           Save response to file
  --verbose                    Show detailed processing info
```
Common Use Cases
1. Video Summarization
Prompt: "Summarize this video in 3 key points with timestamps"
2. Educational Content
Prompt: "Create a quiz with 5 questions and answer key based on this video"
3. Timestamp-Specific Questions
Prompt: "What happens at 01:15 and how does it relate to the topic at 02:30?"
4. Transcription
Prompt: "Transcribe the audio from this video with timestamps for salient events and visual descriptions"
5. Content Comparison
Prompt: "Compare these two product demo videos. Which one explains the features more clearly?"
6. Action Detection
Prompt: "List all the actions performed in this tutorial video with timestamps"
Rate Limits & Quotas
Free Tier (per model):
- 10-15 RPM (requests per minute)
- 1M-4M TPM (tokens per minute)
- 1,500 RPD (requests per day)
YouTube Limitations:
- Free tier: 8 hours of YouTube video per day
- Paid tier: No length-based limits
- Public videos only (no private/unlisted)
Storage (Files API):
- 20GB per project
- 2GB per file
- 48-hour retention period
Token Calculation
Video tokens depend on resolution:
- Default resolution: ~300 tokens per second of video
- Low resolution: ~100 tokens per second of video
Example: A 10-minute video = 600 seconds × 300 tokens = ~180,000 tokens
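The arithmetic above can be captured in a small helper. This is only a sketch built on the approximate per-second rates quoted in this section; the function names are hypothetical and real token counts will vary.

```python
# Approximate token rates from this section (tokens per second of video).
TOKENS_PER_SECOND = {"default": 300, "low": 100}


def estimate_video_tokens(duration_seconds, resolution="default"):
    """Rough token estimate for a video of the given length."""
    return round(duration_seconds * TOKENS_PER_SECOND[resolution])


def max_video_seconds(context_tokens, resolution="default"):
    """Longest video (in seconds) that fits in a context window, ignoring prompt tokens."""
    return context_tokens // TOKENS_PER_SECOND[resolution]


print(estimate_video_tokens(600))    # 180000, matching the 10-minute example
print(max_video_seconds(1_000_000))  # 3333 seconds, i.e. roughly one hour
```

The second function shows where the "~1 hour" and "~3 hours" figures for 1M-token models come from: 1,000,000 / 300 is about 55 minutes, and 1,000,000 / 100 is about 2.8 hours.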
Error Handling
Common errors and solutions:
| Error | Cause | Solution |
|---|---|---|
| 400 Bad Request | Invalid video format or corrupt file | Check file format and integrity |
| 403 Forbidden | Invalid/missing API key | Verify GEMINI_API_KEY configuration |
| 404 Not Found | File URI not found | Ensure file is uploaded and active |
| 429 Too Many Requests | Rate limit exceeded | Implement backoff, upgrade to paid tier |
| 500 Internal Error | Server-side issue | Retry with exponential backoff |
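The retry advice for 429 and 500 responses could look something like the sketch below. `ApiError` is a hypothetical exception type standing in for whatever error your client raises, and the delay values are illustrative defaults rather than recommendations from the API documentation.

```python
import random
import time

# Status codes the table above suggests retrying.
RETRYABLE_STATUS_CODES = {429, 500}


class ApiError(Exception):
    """Hypothetical error type carrying an HTTP status code."""

    def __init__(self, status_code, message=""):
        super().__init__(message or f"HTTP {status_code}")
        self.status_code = status_code


def call_with_backoff(request, max_attempts=5, base_delay=1.0,
                      max_delay=30.0, sleep=time.sleep):
    """Call `request()` and retry retryable failures with exponential backoff."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return request()
        except ApiError as error:
            is_last_attempt = attempt == max_attempts - 1
            if error.status_code not in RETRYABLE_STATUS_CODES or is_last_attempt:
                raise  # 400/403/404 need a fix, not a retry
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Add jitter so parallel clients do not retry in lockstep.
            sleep(delay + random.uniform(0, delay / 2))
```

Errors such as 400, 403, and 404 are re-raised immediately because retrying cannot fix a bad file, a missing key, or a missing file URI.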
Best Practices
- Use Files API for videos >20MB - More reliable than inline data
- Wait for file processing - Poll until state is ACTIVE before analysis
- Optimize FPS - Use lower FPS for static content to save tokens
- Clip long videos - Process specific segments instead of entire video
- Cache context - Reuse uploaded files for multiple queries
- Batch processing - Process multiple short videos in one request (2.5+)
- Specific prompts - Be precise about what you want to extract
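As an illustration of the first practice, the choice between inline data and the Files API might be expressed as below. The function name is hypothetical, and treating "20MB" as 20 × 1024 × 1024 bytes is an assumption; the skill's script is described as making this choice automatically.

```python
# Assumed interpretation of the 20MB threshold (binary megabytes).
INLINE_LIMIT_BYTES = 20 * 1024 * 1024


def choose_upload_strategy(size_bytes, reuse=False):
    """Return "files_api" for large or reusable videos, otherwise "inline"."""
    if reuse or size_bytes >= INLINE_LIMIT_BYTES:
        return "files_api"
    return "inline"
```

A file exactly at the threshold is routed to the Files API here, which errs on the side of the more reliable path.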
Implementation Notes
For Claude Code:
When a user requests video analysis:
- Check API key availability first using the helper script
- Determine video source: local file, YouTube URL, or multiple videos
- Select appropriate model based on requirements (default: gemini-2.5-flash)
- Run the analysis script with proper parameters
- Parse and present results to the user clearly
- Handle errors gracefully with helpful suggestions
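One way to drive these steps from Python is to assemble the script's command line and run it as a subprocess. The sketch below uses only the flags listed under Script Parameters; `build_command` and `run_analysis` are hypothetical helpers, and it assumes the script prints its result to stdout and exits non-zero on failure.

```python
import subprocess
import sys

SCRIPT = ".claude/skills/gemini-video-understanding/scripts/analyze_video.py"


def build_command(prompt, video_path=None, youtube_url=None, video_paths=None,
                  model=None, start_offset=None, end_offset=None,
                  fps=None, output_file=None, verbose=False):
    """Build the argv list for analyze_video.py from the documented flags."""
    sources = [s for s in (video_path, youtube_url, video_paths) if s]
    if len(sources) != 1:
        raise ValueError("Provide exactly one of video_path, youtube_url, video_paths")
    command = [sys.executable, SCRIPT]
    if video_path:
        command += ["--video-path", video_path]
    elif youtube_url:
        command += ["--youtube-url", youtube_url]
    else:
        command += ["--video-paths", *video_paths]
    command += ["--prompt", prompt]
    if model:
        command += ["--model", model]
    if start_offset:
        command += ["--start-offset", start_offset]
    if end_offset:
        command += ["--end-offset", end_offset]
    if fps is not None:
        command += ["--fps", str(fps)]
    if output_file:
        command += ["--output-file", output_file]
    if verbose:
        command.append("--verbose")
    return command


def run_analysis(**options):
    """Run the script and return its stdout (assumed to hold the response)."""
    completed = subprocess.run(build_command(**options),
                               capture_output=True, text=True)
    if completed.returncode != 0:
        raise RuntimeError(completed.stderr.strip() or "analysis failed")
    return completed.stdout
```

Passing arguments as a list avoids shell quoting problems when prompts contain quotes or other special characters.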
Files API Workflow:
For videos >20MB or reusable content:
- Upload video using Files API (script handles this automatically)
- Wait for ACTIVE state (polling included in script)
- Use file URI for analysis
- Files auto-delete after 48 hours
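The "wait for ACTIVE state" step is a polling loop. The sketch below is deliberately SDK-agnostic: `get_state` is a hypothetical callable that returns the uploaded file's current state as a string, wrapping whichever SDK or REST call you use. The `FAILED` state name and the default intervals are assumptions.

```python
import time


def wait_until_active(get_state, poll_interval=5.0, timeout=600.0,
                      sleep=time.sleep, clock=time.monotonic):
    """Poll `get_state()` until it returns "ACTIVE", or raise on failure/timeout."""
    deadline = clock() + timeout
    while True:
        state = get_state()
        if state == "ACTIVE":
            return
        if state == "FAILED":  # assumed name for a terminal error state
            raise RuntimeError("File processing failed")
        if clock() >= deadline:
            raise TimeoutError(f"File still in state {state!r} after {timeout}s")
        sleep(poll_interval)
```

A timeout matters here because a corrupt or unsupported video may never reach ACTIVE; see the Processing Timeout entry under Troubleshooting.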
Inline Data Workflow:
For videos <20MB:
- Read video file as bytes
- Base64 encode for API
- Send in generateContent request
- Single-use, no upload needed
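The inline steps above amount to Base64-encoding the bytes and placing them in the request body. The sketch below builds such a body as a plain dictionary; the `inline_data`, `mime_type`, and `data` field names reflect my understanding of the REST request shape and should be checked against the current API reference. `build_inline_request` is a hypothetical helper, not part of the skill.

```python
import base64


def build_inline_request(video_bytes, prompt, mime_type="video/mp4"):
    """Build a generateContent-style request body with the video inlined."""
    encoded = base64.b64encode(video_bytes).decode("ascii")
    return {
        "contents": [
            {
                "parts": [
                    # Field names are an assumption; verify against the API docs.
                    {"inline_data": {"mime_type": mime_type, "data": encoded}},
                    {"text": prompt},
                ]
            }
        ]
    }
```

Base64 inflates the payload by about a third, which is one reason the inline path is reserved for small videos.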
Example Workflows
Workflow 1: YouTube Video Summary
```bash
# User: "Analyze this YouTube tutorial video"
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
  --youtube-url "https://www.youtube.com/watch?v=abc123" \
  --prompt "Create a structured summary with: 1) Main topics, 2) Key takeaways, 3) Recommended audience"
```
Workflow 2: Interview Transcription
```bash
# User: "Transcribe this interview with timestamps"
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
  --video-path "interview.mp4" \
  --prompt "Transcribe this interview with speaker labels, timestamps, and visual descriptions of gestures or slides shown"
```
Workflow 3: Product Comparison
```bash
# User: "Compare these two product demo videos"
python .claude/skills/gemini-video-understanding/scripts/analyze_video.py \
  --video-paths "demo1.mp4" "demo2.mp4" \
  --model "gemini-2.5-pro" \
  --prompt "Compare these product demos on: features shown, presentation quality, clarity of explanation, and overall effectiveness"
```
Troubleshooting
API Key Not Found:
```bash
# Check API key detection
python .claude/skills/gemini-video-understanding/scripts/check_api_key.py
```
Video Too Large:
- Error: Request size exceeds 20MB
- Solution: Script automatically uses Files API for large videos
Processing Timeout:
- Error: File not reaching ACTIVE state
- Solution: Check video integrity, try smaller file, or different format
Rate Limit Errors:
- Error: 429 Too Many Requests
- Solution: Wait before retry, or upgrade to paid tier
Additional Resources
- API Documentation: https://ai.google.dev/gemini-api/docs/video-understanding
- Files API Guide: https://ai.google.dev/gemini-api/docs/vision#uploading-files
- Rate Limits: https://ai.google.dev/gemini-api/docs/rate-limits
- Pricing: https://ai.google.dev/pricing
- Get API Key: https://aistudio.google.com/apikey
Version History
版本历史
- 1.0.0 (2025-10-26): Initial release with full video understanding capabilities