VideoDB Skill

Perception + memory + actions for video, live streams, and desktop sessions.

When to use

Desktop Perception

Start/stop a desktop session capturing screen, mic, and system audio
Stream live context and store episodic session memory
Run real-time alerts/triggers on what's spoken and what's happening on screen
Produce session summaries, a searchable timeline, and playable evidence links

Video ingest + stream

Ingest a file or URL and return a playable web stream link
Transcode/normalize: codec, bitrate, fps, resolution, aspect ratio

Index + search (timestamps + evidence)

Build visual, spoken, and keyword indexes
Search and return exact moments with timestamps and playable evidence
Auto-create clips from search results

Timeline editing + generation

Subtitles: generate, translate, burn-in
Overlays: text/image/branding, motion captions
Audio: background music, voiceover, dubbing
Programmatic composition and exports via timeline operations

Live streams (RTSP) + monitoring

Connect RTSP/live feeds
Run real-time visual and spoken understanding and emit events/alerts for monitoring workflows

How it works

Common inputs

Local file path, public URL, or RTSP URL
Desktop capture request: start / stop / summarize session
Desired operations: get context for understanding, transcode spec, index spec, search query, clip ranges, timeline edits, alert rules

Common outputs

Stream URL
Search results with timestamps and evidence links
Generated assets: subtitles, audio, images, clips
Event/alert payloads for live streams
Desktop session summaries and memory entries

Running Python code

Before running any VideoDB code, change to the project directory and load environment variables:

python

from dotenv import load_dotenv
load_dotenv(".env")

import videodb
conn = videodb.connect()

This reads `VIDEO_DB_API_KEY` from:

Environment (if already exported)
Project's `.env` file in current directory

If the key is missing, `videodb.connect()` raises `AuthenticationError` automatically.

Do NOT write a script file when a short inline command works.

When writing inline Python (`python -c "..."`), always use properly formatted code: use semicolons to separate statements and keep it readable. For anything longer than ~3 statements, use a heredoc instead:

bash

python << 'EOF'
from dotenv import load_dotenv
load_dotenv(".env")

import videodb
conn = videodb.connect()
coll = conn.get_collection()
print(f"Videos: {len(coll.get_videos())}")
EOF

Setup

When the user asks to "setup videodb" or similar:

1. Install SDK

bash

pip install "videodb[capture]" python-dotenv

If `videodb[capture]` fails on Linux, install without the capture extra:

bash

pip install videodb python-dotenv

2. Configure API key

The user must set `VIDEO_DB_API_KEY` using either method:

Export in terminal (before starting Claude): `export VIDEO_DB_API_KEY=your-key`
Project `.env` file: Save `VIDEO_DB_API_KEY=your-key` in the project's `.env` file

Get a free API key at https://console.videodb.io (50 free uploads, no credit card).

Do NOT read, write, or handle the API key yourself. Always let the user set it.

Quick Reference

Upload media

python

# URL
video = coll.upload(url="https://example.com/video.mp4")

# YouTube
video = coll.upload(url="https://www.youtube.com/watch?v=VIDEO_ID")

# Local file
video = coll.upload(file_path="/path/to/video.mp4")

Transcript + subtitle

python

# force=True skips the error if the video is already indexed
video.index_spoken_words(force=True)
text = video.get_transcript_text()
stream_url = video.add_subtitle()

Search inside videos

python

from videodb.exceptions import InvalidRequestError

video.index_spoken_words(force=True)

# search() raises InvalidRequestError when no results are found.
# Always wrap in try/except and treat "No results found" as empty.
try:
    results = video.search("product demo")
    shots = results.get_shots()
    stream_url = results.compile()
except InvalidRequestError as e:
    if "No results found" in str(e):
        shots = []
    else:
        raise

Scene search

python

import re
from videodb import SearchType, IndexType, SceneExtractionType
from videodb.exceptions import InvalidRequestError

# index_scenes() has no force parameter; it raises an error if a scene
# index already exists. Extract the existing index ID from the error.
try:
    scene_index_id = video.index_scenes(
        extraction_type=SceneExtractionType.shot_based,
        prompt="Describe the visual content in this scene.",
    )
except Exception as e:
    match = re.search(r"id\s+([a-f0-9]+)", str(e))
    if match:
        scene_index_id = match.group(1)
    else:
        raise

# Use score_threshold to filter low-relevance noise (recommended: 0.3+)
try:
    results = video.search(
        query="person writing on a whiteboard",
        search_type=SearchType.semantic,
        index_type=IndexType.scene,
        scene_index_id=scene_index_id,
        score_threshold=0.3,
    )
    shots = results.get_shots()
    stream_url = results.compile()
except InvalidRequestError as e:
    if "No results found" in str(e):
        shots = []
    else:
        raise
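The ID-extraction step can be tried in isolation. The sketch below reuses the same regular expression as the snippet above; `extract_scene_index_id` is a hypothetical helper name, and the sample message follows the wording listed under Common pitfalls with a made-up ID.

```python
import re
from typing import Optional

# Same pattern as the snippet above: "id", whitespace, then a lowercase hex run.
_SCENE_INDEX_ID = re.compile(r"id\s+([a-f0-9]+)")


def extract_scene_index_id(message: str) -> Optional[str]:
    """Return the index ID embedded in an 'already exists' error, or None."""
    match = _SCENE_INDEX_ID.search(message)
    return match.group(1) if match else None


# The ID value here is invented for illustration.
print(extract_scene_index_id("Scene index with id 3f9a1c already exists"))  # prints 3f9a1c
print(extract_scene_index_id("Some unrelated failure"))  # prints None
```

The pattern is loose: a message such as "invalid data" would also match (capturing "da"), so it may be worth checking that the message mentions an existing scene index before trusting the extracted value.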

Timeline editing

Important: Always validate timestamps before building a timeline:

`start` must be >= 0 (negative values are silently accepted but produce broken output)
`start` must be < `end`
`end` must be <= `video.length`

python

from videodb.timeline import Timeline
from videodb.asset import VideoAsset, TextAsset, TextStyle

timeline = Timeline(conn)
timeline.add_inline(VideoAsset(asset_id=video.id, start=10, end=30))
timeline.add_overlay(0, TextAsset(text="The End", duration=3, style=TextStyle(fontsize=36)))
stream_url = timeline.generate_stream()
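The three timestamp rules listed before the example can be checked up front, before any `VideoAsset` is created. The following is a minimal sketch; `validate_range` and `is_valid_range` are hypothetical helper names, and `video_length` is assumed to be the numeric value of `video.length`.

```python
def validate_range(start: float, end: float, video_length: float) -> None:
    """Raise ValueError unless 0 <= start < end <= video_length."""
    if start < 0:
        raise ValueError(f"start must be >= 0, got {start}")
    if start >= end:
        raise ValueError(f"start ({start}) must be < end ({end})")
    if end > video_length:
        raise ValueError(f"end ({end}) must be <= video length ({video_length})")


def is_valid_range(start: float, end: float, video_length: float) -> bool:
    """Boolean wrapper around validate_range for quick checks."""
    try:
        validate_range(start, end, video_length)
    except ValueError:
        return False
    return True


print(is_valid_range(10, 30, 60))   # prints True
print(is_valid_range(-5, 30, 60))   # prints False
```

Calling `validate_range(10, 30, video.length)` just before `VideoAsset(asset_id=video.id, start=10, end=30)` turns a silently broken stream into an explicit error.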

Transcode video (resolution / quality change)

python

from videodb import TranscodeMode, VideoConfig, AudioConfig

# Change resolution, quality, or aspect ratio server-side
job_id = conn.transcode(
    source="https://example.com/video.mp4",
    callback_url="https://example.com/webhook",
    mode=TranscodeMode.economy,
    video_config=VideoConfig(resolution=720, quality=23, aspect_ratio="16:9"),
    audio_config=AudioConfig(mute=False),
)

Reframe aspect ratio (for social platforms)

Warning: `reframe()` is a slow server-side operation. For long videos it can take several minutes and may time out. Best practices:

Always limit to a short segment using `start` / `end` when possible
For full-length videos, use `callback_url` for async processing
Trim the video on a `Timeline` first, then reframe the shorter result

python

from videodb import ReframeMode

# Always prefer reframing a short segment:
reframed = video.reframe(start=0, end=60, target="vertical", mode=ReframeMode.smart)

# Async reframe for full-length videos (returns None, result via webhook):
video.reframe(target="vertical", callback_url="https://example.com/webhook")

# Presets: "vertical" (9:16), "square" (1:1), "landscape" (16:9)
reframed = video.reframe(start=0, end=60, target="square")

# Custom dimensions
reframed = video.reframe(start=0, end=60, target={"width": 1280, "height": 720})
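One way to apply the reframe best practices consistently is to decide the call arguments from the video length. The sketch below is illustrative only: `plan_reframe` is a hypothetical helper, and the 60-second cut-off is an assumption borrowed from the examples above, not a documented limit. It only builds a dictionary of the `start`, `end`, `target`, and `callback_url` arguments shown earlier.

```python
from typing import Any, Dict, Optional


def plan_reframe(
    video_length: float,
    target: Any = "vertical",
    max_sync_seconds: float = 60.0,
    callback_url: Optional[str] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for video.reframe() based on video length.

    Hypothetical helper; max_sync_seconds is an assumed cut-off.
    """
    if video_length <= max_sync_seconds:
        # Short enough to reframe synchronously in one call.
        return {"start": 0, "end": video_length, "target": target}
    if callback_url:
        # Long video with a webhook available: process the whole video async.
        return {"target": target, "callback_url": callback_url}
    # Long video and no webhook: fall back to the first short segment only.
    return {"start": 0, "end": max_sync_seconds, "target": target}


print(plan_reframe(30))
print(plan_reframe(600, callback_url="https://example.com/webhook"))
```

The result could then be passed along as `video.reframe(**plan_reframe(video.length, callback_url=...))`, assuming `video.length` is numeric.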

Generative media

python

image = coll.generate_image(
    prompt="a sunset over mountains",
    aspect_ratio="16:9",
)

Error handling

python

import videodb
from videodb.exceptions import AuthenticationError, InvalidRequestError

try:
    conn = videodb.connect()
except AuthenticationError:
    print("Check your VIDEO_DB_API_KEY")

try:
    video = coll.upload(url="https://example.com/video.mp4")
except InvalidRequestError as e:
    print(f"Upload failed: {e}")

Common pitfalls

| Scenario | Error message | Solution |
| --- | --- | --- |
| Indexing an already-indexed video | `Spoken word index for video already exists` | Use `video.index_spoken_words(force=True)` to skip if already indexed |
| Scene index already exists | `Scene index with id XXXX already exists` | Extract the existing `scene_index_id` from the error with `re.search(r"id\s+([a-f0-9]+)", str(e))` |
| Search finds no matches | `InvalidRequestError: No results found` | Catch the exception and treat as empty results (`shots = []`) |
| Reframe times out | Blocks indefinitely on long videos | Use `start` / `end` to limit segment, or pass `callback_url` for async |
| Negative timestamps on Timeline | Silently produces broken stream | Always validate `start >= 0` before creating `VideoAsset` |
| `generate_video()` / `create_collection()` fails | `Operation not allowed` or `maximum limit` | Plan-gated features: inform the user about plan limits |
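The error messages in the table above can be matched by substring to pick the suggested fix. This is a rough sketch rather than an SDK feature: `PITFALL_HINTS` and `hint_for_error` are hypothetical names, and the substrings are copied from the "Error message" column.

```python
from typing import Optional

# Substrings from the "Error message" column, paired with the suggested fix.
PITFALL_HINTS = [
    ("Spoken word index for video already exists",
     "call video.index_spoken_words(force=True)"),
    ("Scene index with id",
     "reuse the scene_index_id embedded in the error message"),
    ("No results found",
     "treat as empty results"),
    ("Operation not allowed",
     "plan-gated feature: tell the user about plan limits"),
    ("maximum limit",
     "plan-gated feature: tell the user about plan limits"),
]


def hint_for_error(message: str) -> Optional[str]:
    """Return the suggested fix for a known error message, or None."""
    for needle, hint in PITFALL_HINTS:
        if needle in message:
            return hint
    return None


print(hint_for_error("InvalidRequestError: No results found"))  # prints treat as empty results
```

The two silent failures in the table (reframe blocking and negative timestamps) produce no message, so they cannot be detected this way and need the up-front checks described in their rows.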

Examples

Canonical prompts

"Start desktop capture and alert when a password field appears."
"Record my session and produce an actionable summary when it ends."
"Ingest this file and return a playable stream link."
"Index this folder and find every scene with people, return timestamps."
"Generate subtitles, burn them in, and add light background music."
"Connect this RTSP URL and alert when a person enters the zone."

Screen Recording (Desktop Capture)

Use `ws_listener.py` to capture WebSocket events during recording sessions. Desktop capture supports macOS only.

Quick Start

1. Choose state dir: `STATE_DIR="${VIDEODB_EVENTS_DIR:-$HOME/.local/state/videodb}"`
2. Start listener: `VIDEODB_EVENTS_DIR="$STATE_DIR" python scripts/ws_listener.py --clear "$STATE_DIR" &`
3. Get WebSocket ID: `cat "$STATE_DIR/videodb_ws_id"`
4. Run capture code (see reference/capture.md for the full workflow)
5. Events written to: `$STATE_DIR/videodb_events.jsonl`

Use `--clear` whenever you start a fresh capture run so stale transcript and visual events do not leak into the new session.

Query Events

python

import json
import os
import time
from pathlib import Path

events_dir = Path(os.environ.get("VIDEODB_EVENTS_DIR", Path.home() / ".local" / "state" / "videodb"))
events_file = events_dir / "videodb_events.jsonl"
events = []

if events_file.exists():
    with events_file.open(encoding="utf-8") as handle:
        for line in handle:
            try:
                events.append(json.loads(line))
            except json.JSONDecodeError:
                continue

transcripts = [e["data"]["text"] for e in events if e.get("channel") == "transcript"]
cutoff = time.time() - 300
recent_visual = [
    e for e in events
    if e.get("channel") == "visual_index" and e["unix_ts"] > cutoff
]

Additional docs

Reference documentation is in the `reference/` directory adjacent to this SKILL.md file. Use the Glob tool to locate it if needed.

reference/api-reference.md - Complete VideoDB Python SDK API reference
reference/search.md - In-depth guide to video search (spoken word and scene-based)
reference/editor.md - Timeline editing, assets, and composition
reference/streaming.md - HLS streaming and instant playback
reference/generative.md - AI-powered media generation (images, video, audio)
reference/rtstream.md - Live stream ingestion workflow (RTSP/RTMP)
reference/rtstream-reference.md - RTStream SDK methods and AI pipelines
reference/capture.md - Desktop capture workflow
reference/capture-reference.md - Capture SDK and WebSocket events
reference/use-cases.md - Common video processing patterns and examples

Do not use ffmpeg, moviepy, or local encoding tools when VideoDB supports the operation. The following are all handled server-side by VideoDB — trimming, combining clips, overlaying audio or music, adding subtitles, text/image overlays, transcoding, resolution changes, aspect-ratio conversion, resizing for platform requirements, transcription, and media generation. Only fall back to local tools for operations listed under Limitations in reference/editor.md (transitions, speed changes, crop/zoom, colour grading, volume mixing).

When to use what

| Problem | VideoDB solution |
| --- | --- |
| Platform rejects video aspect ratio or resolution | `video.reframe()` or `conn.transcode()` with `VideoConfig` |
| Need to resize video for Twitter/Instagram/TikTok | `video.reframe(target="vertical")` or `target="square"` |
| Need to change resolution (e.g. 1080p → 720p) | `conn.transcode()` with `VideoConfig(resolution=720)` |
| Need to overlay audio/music on video | `AudioAsset` on a `Timeline` |
| Need to add subtitles | `video.add_subtitle()` or `CaptionAsset` |
| Need to combine/trim clips | `VideoAsset` on a `Timeline` |
| Need to generate voiceover, music, or SFX | `coll.generate_voice()`, `generate_music()`, `generate_sound_effect()` |

Provenance

Reference material for this skill is vendored locally under `skills/videodb/reference/`. Use the local copies above instead of following external repository links at runtime.

Maintained By: VideoDB
