# Video Frames
Extract frames from video files using ffmpeg, producing JPEG images optimized for LLM vision analysis. Supports multiple frame-selection strategies (fixed FPS, scene detection, target frame count), quality presets, model-aware dimension optimization, and OCR enhancements.
## Prerequisites

ffmpeg and ffprobe must be installed and on PATH:

```bash
brew install ffmpeg  # macOS
```
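Before invoking the script, it can help to fail fast if either binary is missing; a minimal stdlib check (illustrative, not part of the script):

```python
import shutil

def missing_tools(names=("ffmpeg", "ffprobe")):
    """Return the required binaries that are not found on PATH."""
    return [n for n in names if shutil.which(n) is None]

# Example: build a clear message before attempting extraction.
problems = missing_tools()
message = "ok" if not problems else "missing: " + ", ".join(problems)
```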
## Workflow

- Receive a video file path from the user
- Run `scripts/extract_frames.py` to extract JPEG frames
- Parse the JSON output for frame paths, resolution, and token estimates
- Read the extracted frames as image attachments for analysis
- Answer the user's question about the video content
- Clean up temp directories when done
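The steps above can be sketched as a small driver. This is an illustrative wrapper, not part of the script; it assumes the documented behavior that results go to stdout as JSON and errors go to stderr with exit code 1:

```python
import json
import subprocess

def build_command(video_path, max_frames=30):
    """Assemble the extractor invocation for the common --max-frames path."""
    return ["python3", "scripts/extract_frames.py",
            video_path, "--max-frames", str(max_frames)]

def extract_frames(video_path, max_frames=30):
    """Run the extractor and return (frame_paths, full_result_dict)."""
    proc = subprocess.run(build_command(video_path, max_frames),
                          capture_output=True, text=True)
    if proc.returncode != 0:
        # On failure the script prints {"error": ...} to stderr and exits 1.
        raise RuntimeError(proc.stderr.strip())
    result = json.loads(proc.stdout)
    return result["frames"], result
```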
## Quick Start

The simplest invocation -- extracts 1 frame/second at balanced quality:

```bash
python3 scripts/extract_frames.py video.mp4
```

For most use cases, use `--max-frames` to let the script auto-calculate FPS:

```bash
python3 scripts/extract_frames.py video.mp4 --max-frames 30
```

This is the preferred approach over manually setting `--fps`, since it adapts to any video length and keeps the frame count predictable.
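Conceptually, `--max-frames` amounts to deriving a rate from the video duration and clamping it to the supported 0.05-30.0 FPS range; a sketch of that calculation:

```python
def fps_for_max_frames(duration_seconds, max_frames):
    """Derive an extraction rate that yields roughly max_frames frames."""
    fps = max_frames / duration_seconds
    # The script clamps the auto-calculated rate to 0.05-30.0 FPS.
    return min(max(fps, 0.05), 30.0)
```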
## Presets
Four quality presets control resolution, JPEG quality, and image processing:

| Preset | Max dim | Quality | Extras | Best for |
|---|---|---|---|---|
| `efficient` | 768px | 5 | -- | Bulk frames, long videos |
| `balanced` | 1024px | 3 | -- | General analysis (default) |
| | 1568px | 2 | -- | Fine detail, small objects |
| `ocr` | 1568px | 1 | grayscale + high contrast + sharpen | Text/document extraction |

```bash
# Long video, keep costs low
python3 scripts/extract_frames.py long_video.mp4 --max-frames 20 --preset efficient

# Need to read text in a screencast
python3 scripts/extract_frames.py screencast.mp4 --max-frames 40 --preset ocr
```
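The preset table maps naturally onto ffmpeg's MJPEG quality scale (`-q:v`, 1-31) and a `scale` filter capping the longest edge. A hypothetical sketch of that translation (the filter names are real ffmpeg features; the composition is an assumption about the script's internals):

```python
# Hypothetical preset table mirroring the doc; the "efficient", "balanced"
# and "ocr" names come from the usage examples above.
PRESETS = {
    "efficient": {"max_dim": 768, "quality": 5},
    "balanced": {"max_dim": 1024, "quality": 3},
    "ocr": {"max_dim": 1568, "quality": 1},
}

def ffmpeg_output_args(preset):
    """Translate a preset into ffmpeg output arguments for JPEG frames."""
    p = PRESETS[preset]
    # force_original_aspect_ratio=decrease caps the longest edge at max_dim
    # while preserving aspect ratio.
    scale = f"scale={p['max_dim']}:{p['max_dim']}:force_original_aspect_ratio=decrease"
    return ["-vf", scale, "-q:v", str(p["quality"])]
```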
Quality (1=best, 31=worst) and max dimension can be overridden independently:

```bash
python3 scripts/extract_frames.py video.mp4 --preset balanced --quality 1 --max-dimension 1568
```

## Scene-Change Detection
Instead of extracting at a fixed rate, detect visual scene changes and extract one frame per scene. This is ideal for videos with distinct segments (presentations, edited footage, tutorials).

```bash
python3 scripts/extract_frames.py video.mp4 --scene-threshold 0.3
```

- `--scene-threshold` (float, 0.0-1.0): Sensitivity. Lower = more sensitive, detects smaller changes. Start with 0.3 (the default when the flag is used).
- `--min-scene-interval` (float, default: 1.0): Minimum seconds between detected scenes. Prevents burst detections during rapid cuts.

Note: `--fps` and `--scene-threshold` are mutually exclusive. `--max-frames` can only be used with `--fps` mode, not scene detection.

```bash
# Presentation with clear slide transitions
python3 scripts/extract_frames.py presentation.mp4 --scene-threshold 0.2

# Action footage -- less sensitive, min 2s apart
python3 scripts/extract_frames.py action.mp4 --scene-threshold 0.5 --min-scene-interval 2.0
```
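ffmpeg's `select` filter exposes a per-frame `scene` score, which is the standard way to implement this kind of detection. A sketch of the filter expression plus the minimum-interval debounce described above (assumed internals, not necessarily the script's exact code):

```python
def scene_select_filter(threshold):
    """ffmpeg filter keeping only frames whose scene score exceeds threshold."""
    # Pair this with variable frame rate output (-vsync vfr) so ffmpeg does
    # not duplicate frames to fill a constant rate.
    return f"select='gt(scene,{threshold})'"

def debounce(timestamps, min_interval=1.0):
    """Drop scene timestamps within min_interval seconds of the last kept one."""
    kept = []
    for t in timestamps:
        if not kept or t - kept[-1] >= min_interval:
            kept.append(t)
    return kept
```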
## Model-Aware Optimization
Use `--target-model` to resize frames to dimensions that align with a specific model's tile boundaries, minimizing wasted tokens:

| Model | Max dim | Rationale |
|---|---|---|
| `claude` | 1568px | Max native resolution before auto-resize |
| `openai` | 768px | Aligned to 512px tile grid (shortest side 768) |
| `gemini` | 768px | Aligned to 768px tile boundaries |
| | 768px | Sweet spot across all models (default) |

```bash
# Optimized for Claude -- maximum detail
python3 scripts/extract_frames.py video.mp4 --max-frames 30 --target-model claude

# Optimized for GPT-4o -- efficient tile packing
python3 scripts/extract_frames.py video.mp4 --max-frames 30 --target-model openai
```

`--target-model` sets the max dimension unless `--max-dimension` is explicitly provided (CLI override takes priority).

See `references/llm-image-specs.md` for detailed token formulas, tile calculations, and optimal dimension tables for each model.
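Fitting a frame to a model's max dimension is a plain aspect-preserving downscale; a sketch:

```python
def fit_to_max_dim(width, height, max_dim):
    """Scale (width, height) down so the longest edge is at most max_dim."""
    longest = max(width, height)
    if longest <= max_dim:
        return width, height  # already small enough; never upscale
    scale = max_dim / longest
    return round(width * scale), round(height * scale)
```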
## OCR and Grayscale Mode
For videos containing text (screencasts, presentations, documents, terminal recordings):

```bash
# Full OCR pipeline via preset
python3 scripts/extract_frames.py screencast.mp4 --preset ocr --max-frames 40

# Manual OCR flags (can combine with any preset)
python3 scripts/extract_frames.py video.mp4 --grayscale --high-contrast
```

- `--grayscale`: Converts frames to grayscale. Reduces file size ~60% with no OCR accuracy loss.
- `--high-contrast`: Applies `contrast=1.3, brightness=0.05` to improve text/background separation.
- The `ocr` preset enables both flags **plus** unsharp-mask sharpening at 1568px, quality 1 (best JPEG).
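These flags map onto standard ffmpeg filters: `format=gray` for grayscale, `eq` for the documented contrast/brightness boost, and `unsharp` for sharpening. A sketch of how such a chain could be assembled (the filter names are real ffmpeg filters; the composition is an assumption, not the script's verified internals):

```python
def ocr_filter_chain(grayscale=True, high_contrast=True, sharpen=True):
    """Build an ffmpeg -vf chain approximating the ocr preset's processing."""
    filters = []
    if grayscale:
        filters.append("format=gray")
    if high_contrast:
        # Matches the documented contrast=1.3, brightness=0.05 boost.
        filters.append("eq=contrast=1.3:brightness=0.05")
    if sharpen:
        filters.append("unsharp")  # default 5x5 luma unsharp mask
    return ",".join(filters)
```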
## Advanced Options

### FPS Selection Guide
When using `--fps` directly instead of `--max-frames`:

| Video length | Recommended fps | Rationale |
|---|---|---|
| < 30s | 2-5 | Short clip, capture detail |
| 30s - 5min | 1 | Good balance of coverage vs frame count |
| 5min - 30min | 0.5 | Avoid excessive frames |
| > 30min | 0.1 - 0.2 | Sample key moments only |

Keep total frame count under ~50 for optimal LLM context usage. Formula: `duration_seconds * fps = frame_count`.

Prefer `--max-frames` over manual FPS -- it auto-calculates the right rate and clamps to 0.05-30.0 FPS.
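If you do pick `--fps` by hand, the guide above can be encoded directly; a sketch that picks the low end of each recommended range and reports the resulting frame count:

```python
def recommend_fps(duration_seconds):
    """Suggest an extraction rate per the guide above (low end of each range)."""
    if duration_seconds < 30:
        return 2.0
    if duration_seconds <= 300:   # 30s - 5min
        return 1.0
    if duration_seconds <= 1800:  # 5min - 30min
        return 0.5
    return 0.1                    # > 30min: sample key moments only

def expected_frames(duration_seconds, fps):
    """duration_seconds * fps = frame_count (rounded down)."""
    return int(duration_seconds * fps)
```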
### Timestamp Overlay
```bash
python3 scripts/extract_frames.py video.mp4 --timestamps --max-frames 30
```

Overlays the source filename and `hh:mm:ss` timestamp in the bottom-right corner of each frame (white text on a semi-transparent black box). Use when the user needs to reference specific moments in the video.
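ffmpeg's `drawtext` filter is the usual way to burn in such an overlay; `%{pts\:hms}` expands to the frame's timestamp. A hypothetical equivalent of the flag (positioning and styling are assumptions; the script's actual output may differ):

```python
def timestamp_overlay_filter(label):
    """drawtext overlay: white text on semi-transparent black, bottom-right."""
    return (
        f"drawtext=text='{label} %{{pts\\:hms}}'"
        ":x=w-tw-10:y=h-th-10"
        ":fontcolor=white:box=1:boxcolor=black@0.5"
    )
```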
## All CLI Options Reference
| Option | Type | Default | Description |
|---|---|---|---|
| `video` | pos. | (required) | Path to the video file |
| `--fps` | float | 1.0 | Frames per second (mutually exclusive with `--scene-threshold`) |
| `--scene-threshold` | float | -- | Scene-change sensitivity 0.0-1.0 (mutually exclusive with `--fps`) |
| `--min-scene-interval` | float | 1.0 | Min seconds between scene-change frames |
| `--max-frames` | int | -- | Auto-calculate FPS to produce ~N frames |
| `--preset` | choice | balanced | Quality preset (see Presets above) |
| `--max-dimension` | int | -- | Override max pixel dimension (longest edge) |
| `--quality` | int | -- | JPEG quality 1-31 (1=best, 31=worst) |
| `--target-model` | choice | -- | Optimize frame dimensions for a specific model |
| `--grayscale` | flag | off | Convert to grayscale |
| `--high-contrast` | flag | off | Boost contrast for text readability |
| `--timestamps` | flag | off | Overlay filename + timestamp on frames |
| | string | temp dir | Output directory for extracted frames |
## Output JSON Structure
The script prints JSON to stdout with the following structure:

```json
{
  "output_dir": "/tmp/video_frames_abc123/",
  "frames": ["/tmp/video_frames_abc123/frame_00001.jpg", "..."],
  "preset": "balanced",
  "resolution": { "width": 1024, "height": 576 },
  "token_estimate": {
    "frame_count": 30,
    "per_frame": {
      "claude": 787,
      "openai_high": 765,
      "openai_low": 85,
      "openai_patch": 934,
      "gemini": 258
    },
    "total": {
      "claude": 23610,
      "openai_high": 22950,
      "openai_low": 2550,
      "openai_patch": 28020,
      "gemini": 7740
    }
  },
  "summary": {
    "video_duration_seconds": 120.5,
    "extraction_method": "max_frames",
    "scene_changes_detected": null,
    "frames_extracted": 30,
    "estimated_total_tokens": {
      "claude": 23610,
      "openai_high": 22950,
      "openai_low": 2550,
      "openai_patch": 28020,
      "gemini": 7740
    }
  }
}
```

Use `token_estimate.total` to verify the frame set fits within model context limits before attaching frames to a prompt.

Note: `openai_high` and `openai_low` are for legacy models (GPT-4o, GPT-4.1). `openai_patch` is for newer models (gpt-5.4+, gpt-5-mini, o4-mini). See `references/llm-image-specs.md` for details.

On error, JSON with an `"error"` key is printed to stderr and the script exits with code 1.
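The per-frame numbers above are reproducible from the providers' published estimates: Claude charges roughly `(width * height) / 750` tokens per image, and OpenAI's legacy high-detail pricing is a base 85 tokens plus 170 per 512px tile (low detail is a flat 85). A sketch, assuming no pre-scaling is needed at this resolution:

```python
import math

def claude_tokens(width, height):
    """Anthropic's published estimate: (width * height) / 750, rounded up."""
    return math.ceil(width * height / 750)

def openai_high_tokens(width, height):
    """Legacy high-detail pricing: 85 base + 170 per 512px tile."""
    tiles = math.ceil(width / 512) * math.ceil(height / 512)
    return 85 + 170 * tiles

# The 1024x576 frames in the sample output work out to 787 (claude)
# and 765 (openai_high) tokens per frame.
```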
"error"脚本会向标准输出打印如下结构的JSON:
json
{
"output_dir": "/tmp/video_frames_abc123/",
"frames": ["/tmp/video_frames_abc123/frame_00001.jpg", "..."],
"preset": "balanced",
"resolution": { "width": 1024, "height": 576 },
"token_estimate": {
"frame_count": 30,
"per_frame": {
"claude": 787,
"openai_high": 765,
"openai_low": 85,
"openai_patch": 934,
"gemini": 258
},
"total": {
"claude": 23610,
"openai_high": 22950,
"openai_low": 2550,
"openai_patch": 28020,
"gemini": 7740
}
},
"summary": {
"video_duration_seconds": 120.5,
"extraction_method": "max_frames",
"scene_changes_detected": null,
"frames_extracted": 30,
"estimated_total_tokens": {
"claude": 23610,
"openai_high": 22950,
"openai_low": 2550,
"openai_patch": 28020,
"gemini": 7740
}
}
}在将帧附加到提示词之前,使用验证帧集是否符合模型上下文限制。
token_estimate.total注意:和openai_high适用于旧版模型(GPT-4o、GPT-4.1)。openai_low适用于新版模型(gpt-5.4+、gpt-5-mini、o4-mini)。详情请查看openai_patch。references/llm-image-specs.md
发生错误时,脚本会向标准错误输出包含键的JSON,并以代码1退出。
"error"After Extraction
- Parse the JSON output to get the list of frame paths from `frames`
- Check `token_estimate.total` to ensure the frames fit within context limits
- Read each frame image using the Read tool (they are JPEG files)
- Analyze the frames to answer the user's question
- Clean up: delete the output directory when done if it was a temp dir
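For the final cleanup step, it is worth guarding against deleting a directory the user chose themselves; a sketch that only removes paths living under the system temp directory:

```python
import os
import shutil
import tempfile

def cleanup_output_dir(output_dir):
    """Delete output_dir, but only if it lives under the system temp dir."""
    tmp_root = os.path.realpath(tempfile.gettempdir())
    target = os.path.realpath(output_dir)
    if target == tmp_root or os.path.commonpath([tmp_root, target]) != tmp_root:
        return False  # user-chosen output location: leave it alone
    shutil.rmtree(target, ignore_errors=True)
    return True
```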