Video Reader Skill

Extract key frames to "see" videos, and extract + transcribe audio to "hear" them.

Quick Start

Get video info (duration, resolution, codec)

ffprobe -v error -show_entries format=duration:stream=codec_name,width,height -of json "$VIDEO_PATH"
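
Every command in this document depends on external tools being installed. As a minimal sketch, a preflight check can report what is missing before any work starts. The helper name `require_tools` is hypothetical and not part of ffmpeg or Whisper; it relies only on the standard `command -v` builtin.

```shell
# Report every required tool that is not on PATH.
# Returns non-zero if at least one tool is missing.
# require_tools is a hypothetical helper name used for illustration.
require_tools() {
  rt_missing=0
  for rt_tool in "$@"; do
    if ! command -v "$rt_tool" >/dev/null 2>&1; then
      echo "missing required tool: $rt_tool" >&2
      rt_missing=1
    fi
  done
  return "$rt_missing"
}

# Typical call before running the commands in this document:
# require_tools ffmpeg ffprobe whisper || exit 1
```

Checking up front avoids a half-finished run that leaves frames behind with no transcript.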

Extract key frames (1 per second, max 12 frames)

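OUTDIR=/tmp/alma-frames-$(date +%s)
mkdir -p "$OUTDIR"
DURATION=$(ffprobe -v error -show_entries format=duration -of csv=p=0 "$VIDEO_PATH" | cut -d. -f1)
# Treat a missing, non-numeric, or sub-second duration as 1s to avoid dividing by zero
case "$DURATION" in ''|*[!0-9]*|0) DURATION=1 ;; esac
# scale=6 keeps the rate above zero for videos longer than 20 minutes
FPS_RATE=$(echo "scale=6; 12 / $DURATION" | bc 2>/dev/null || echo "1")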

Cap at 1fps for short videos

if (( $(echo "$FPS_RATE > 1" | bc -l 2>/dev/null || echo 0) )); then FPS_RATE=1; fi
ffmpeg -hide_banner -loglevel error -i "$VIDEO_PATH" -vf "fps=$FPS_RATE,scale=720:-1" -frames:v 12 "$OUTDIR/frame_%02d.jpg"
ls "$OUTDIR"

How to Use

Get video info first to know the duration
For short videos (<15s): extract 1 frame per second
For medium videos (15-60s): extract ~8-12 frames evenly spread
For long videos (>60s): extract 12 frames at key intervals
Look at the extracted frames (they're image files) to describe the video content
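
The duration tiers above can be sketched as two small helpers. The names `frame_count_for` and `fps_for` are hypothetical. Two choices here are assumptions rather than rules from this document: 10 frames for medium videos (the middle of the 8-12 range), and capping short videos at 12 frames to match the Quick Start maximum.

```shell
# Print the target frame count for a video of the given whole-second duration.
frame_count_for() {
  fc_duration=$1
  if [ "$fc_duration" -lt 1 ]; then
    echo 1                      # sub-second clip: a single frame
  elif [ "$fc_duration" -lt 15 ]; then
    # Short: one frame per second, capped at 12 frames
    if [ "$fc_duration" -gt 12 ]; then echo 12; else echo "$fc_duration"; fi
  elif [ "$fc_duration" -le 60 ]; then
    echo 10                     # assumption: midpoint of the 8-12 range
  else
    echo 12                     # long: fixed 12 frames
  fi
}

# Print the fps value that spreads that many frames across the whole video.
fps_for() {
  fp_duration=$1
  fp_count=$(frame_count_for "$fp_duration")
  if [ "$fp_duration" -lt 1 ]; then fp_duration=1; fi
  awk -v c="$fp_count" -v d="$fp_duration" 'BEGIN { printf "%.4f\n", c / d }'
}

# Example use with the Quick Start variables:
# ffmpeg -hide_banner -loglevel error -i "$VIDEO_PATH" \
#   -vf "fps=$(fps_for "$DURATION"),scale=720:-1" \
#   -frames:v "$(frame_count_for "$DURATION")" "$OUTDIR/frame_%02d.jpg"
```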

Frame Extraction Patterns

Even spread: N frames across entire video

ffmpeg -hide_banner -loglevel error -i "$VIDEO_PATH" \
  -vf "select='not(mod(n,$(ffprobe -v error -count_frames -select_streams v:0 -show_entries stream=nb_read_frames -of csv=p=0 "$VIDEO_PATH" | awk -v n=8 '{s=int($1/n); if (s<1) s=1; printf "%d", s}')))'" \
  -vsync vfr -frames:v 8 "$OUTDIR/frame_%02d.jpg"
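
The even-spread command packs its arithmetic into one line: it divides the total frame count by the number of frames wanted and keeps every frame whose index is a multiple of that step. A minimal sketch of the same calculation, pulled out so it can be checked on its own. The helper names `frame_step` and `select_every` are hypothetical.

```shell
# Print the frame step that yields about `want` frames out of `total` frames.
frame_step() {
  fs_total=$1
  fs_want=$2
  fs_step=$((fs_total / fs_want))
  # A step of 0 would make mod(n,0) undefined, so never go below 1
  if [ "$fs_step" -lt 1 ]; then fs_step=1; fi
  echo "$fs_step"
}

# Build the select expression to pass to ffmpeg's -vf option.
select_every() {
  printf "select='not(mod(n,%s))'\n" "$(frame_step "$1" "$2")"
}

# Example: a 240-frame clip sampled for 8 frames keeps every 30th frame.
# select_every 240 8
```

Because the step is rounded down, the filter can match slightly more frames than requested, which is why the command also passes `-frames:v 8` as a hard cap.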

Specific timestamp

ffmpeg -hide_banner -loglevel error -ss 5.0 -i "$VIDEO_PATH" -frames:v 1 "$OUTDIR/at_5s.jpg"
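
Seeking with `-ss` before `-i` can also be repeated to sample a video at several points without decoding the whole file. The sketch below only computes the timestamps, taking the midpoint of each equal slice of the duration. `even_timestamps` is a hypothetical helper, and the ffmpeg loop is left as a comment because it needs a real `$VIDEO_PATH`.

```shell
# Print `count` timestamps in seconds, one per line, at the midpoints of
# equal slices of `duration`.
even_timestamps() {
  awk -v d="$1" -v n="$2" \
    'BEGIN { for (i = 0; i < n; i++) printf "%.2f\n", (i + 0.5) * d / n }'
}

# Example use (needs ffmpeg, $VIDEO_PATH, $DURATION and $OUTDIR):
# i=0
# for ts in $(even_timestamps "$DURATION" 8); do
#   i=$((i + 1))
#   ffmpeg -hide_banner -loglevel error -ss "$ts" -i "$VIDEO_PATH" -frames:v 1 \
#     "$OUTDIR/$(printf 'frame_%02d.jpg' "$i")"
# done
```

Midpoints avoid the very first and very last frames, which are often black or a title card.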

Thumbnail grid (single image overview)

ffmpeg -hide_banner -loglevel error -i "$VIDEO_PATH" \
  -vf "fps=1/$(ffprobe -v error -show_entries format=duration -of csv=p=0 "$VIDEO_PATH" | awk '{printf "%.1f", $1/9}'),scale=320:-1,tile=3x3" \
  -frames:v 1 "$OUTDIR/grid.jpg"

Tips

Always use `-hide_banner -loglevel error` to keep output clean
Scale down to 720px width (`scale=720:-1`) to save tokens when sending to AI
Clean up frames after analysis: `rm -rf "$OUTDIR"`
The extracted frames are regular image files; include their paths in your reply and they'll be auto-sent to Telegram

Audio: "Hearing" Videos

Extract audio from video

# Write to the fixed path that the whisper commands read; -y overwrites a stale file
ffmpeg -hide_banner -loglevel error -y -i "$VIDEO_PATH" -vn -acodec pcm_s16le -ar 16000 -ac 1 "/tmp/alma-audio.wav"

Transcribe with Whisper (auto-detect language)

whisper "/tmp/alma-audio.wav" --model turbo --output_format txt --output_dir /tmp/alma-whisper

Transcribe with specific language

whisper "/tmp/alma-audio.wav" --model turbo --language zh --output_format txt --output_dir /tmp/alma-whisper

Read transcription

cat /tmp/alma-whisper/*.txt
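
A clip with no speech, such as music or ambient noise, may yield an empty or whitespace-only transcript. A minimal sketch of a check to run before relying on the text, assuming the transcripts are in `/tmp/alma-whisper` as above. `has_speech_text` is a hypothetical helper, not part of Whisper.

```shell
# Succeed only if the given transcript file exists and contains at least one
# non-whitespace character.
# has_speech_text is a hypothetical helper name used for illustration.
has_speech_text() {
  [ -f "$1" ] && grep -q '[^[:space:]]' "$1"
}

# Example use after the whisper step:
# for f in /tmp/alma-whisper/*.txt; do
#   if has_speech_text "$f"; then
#     cat "$f"
#   else
#     echo "no speech text in $f; describe the audio from context instead"
#   fi
# done
```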

When to See vs Hear

"这个视频里有啥" ("What's in this video?") → Extract frames (see) + transcribe audio (hear) for the full picture
"他说了什么" ("What did he say?") → Transcribe audio only
"这个视频好看吗" ("Does this video look good?") → Extract frames to see the visuals
"好听" ("Sounds nice") → The user is commenting on the audio; transcribe it to understand what they heard
Music/street performance → Describe what you see in the frames and note the audio content
When in doubt, do BOTH: extract a few frames AND transcribe the audio

Cleanup

Always clean up temp files after analysis:

rm -rf "$OUTDIR" /tmp/alma-whisper /tmp/alma-audio*.wav
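
Manual cleanup is easy to skip when a step fails midway. One alternative, sketched here as an assumption rather than this skill's prescribed method, is to run the work inside a scratch directory that an `EXIT` trap removes. The helper `with_workdir` and the `WORKDIR` variable are hypothetical names; `mktemp -d` and `trap` are standard.

```shell
# Run a command with a private scratch directory exported as $WORKDIR.
# The directory is removed when the command finishes, even if it fails.
# with_workdir and WORKDIR are hypothetical names used for illustration.
with_workdir() {
  (
    WORKDIR=$(mktemp -d "${TMPDIR:-/tmp}/alma-XXXXXX") || exit 1
    # The trap fires when this subshell exits, whatever the exit status
    trap 'rm -rf "$WORKDIR"' EXIT
    export WORKDIR
    "$@"
    wd_status=$?
    exit "$wd_status"
  )
}

# Example use: frames and audio go under $WORKDIR instead of fixed /tmp paths.
# with_workdir sh -c 'ffmpeg -hide_banner -loglevel error -i "$1" \
#   -vf "fps=1,scale=720:-1" -frames:v 12 "$WORKDIR/frame_%02d.jpg" \
#   && ls "$WORKDIR"' sh "$VIDEO_PATH"
```

The trade-off is that the files no longer exist once the command returns, so any analysis that reads the frames has to happen inside the wrapped command.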
