VideoDB Skill

Perception + memory + actions for video, live streams, and desktop sessions.

When to use

Desktop Perception

  • Start/stop a desktop session capturing screen, mic, and system audio
  • Stream live context and store episodic session memory
  • Run real-time alerts/triggers on what's spoken and what's happening on screen
  • Produce session summaries, a searchable timeline, and playable evidence links

Video ingest + stream

  • Ingest a file or URL and return a playable web stream link
  • Transcode/normalize: codec, bitrate, fps, resolution, aspect ratio

Index + search (timestamps + evidence)

  • Build visual, spoken, and keyword indexes
  • Search and return exact moments with timestamps and playable evidence
  • Auto-create clips from search results

Timeline editing + generation

  • Subtitles: generate, translate, burn-in
  • Overlays: text/image/branding, motion captions
  • Audio: background music, voiceover, dubbing
  • Programmatic composition and exports via timeline operations

Live streams (RTSP) + monitoring

  • Connect RTSP/live feeds
  • Run real-time visual and spoken understanding and emit events/alerts for monitoring workflows

How it works

Common inputs

  • Local file path, public URL, or RTSP URL
  • Desktop capture request: start / stop / summarize session
  • Desired operations: get context for understanding, transcode spec, index spec, search query, clip ranges, timeline edits, alert rules

Common outputs

  • Stream URL
  • Search results with timestamps and evidence links
  • Generated assets: subtitles, audio, images, clips
  • Event/alert payloads for live streams
  • Desktop session summaries and memory entries

Running Python code

Before running any VideoDB code, change to the project directory and load environment variables:

```python
from dotenv import load_dotenv
load_dotenv(".env")

import videodb
conn = videodb.connect()
```

This reads `VIDEO_DB_API_KEY` from:
  1. The environment (if already exported)
  2. The project's `.env` file in the current directory

If the key is missing, `videodb.connect()` raises `AuthenticationError` automatically.

Do NOT write a script file when a short inline command works.

When writing inline Python (`python -c "..."`), always use properly formatted code: use semicolons to separate statements and keep it readable. For anything longer than ~3 statements, use a heredoc instead:

```bash
python << 'EOF'
from dotenv import load_dotenv
load_dotenv(".env")

import videodb
conn = videodb.connect()
coll = conn.get_collection()
print(f"Videos: {len(coll.get_videos())}")
EOF
```

Setup

When the user asks to "setup videodb" or similar:

1. Install SDK

```bash
pip install "videodb[capture]" python-dotenv
```

If `videodb[capture]` fails on Linux, install without the capture extra:

```bash
pip install videodb python-dotenv
```

2. Configure API key

The user must set `VIDEO_DB_API_KEY` using either method:
  • Export in terminal (before starting Claude): `export VIDEO_DB_API_KEY=your-key`
  • Project `.env` file: save `VIDEO_DB_API_KEY=your-key` in the project's `.env` file

Get a free API key at https://console.videodb.io (50 free uploads, no credit card).

Do NOT read, write, or handle the API key yourself. Always let the user set it.

Quick Reference

Upload media

```python
# URL
video = coll.upload(url="https://example.com/video.mp4")

# YouTube
video = coll.upload(url="https://www.youtube.com/watch?v=VIDEO_ID")

# Local file
video = coll.upload(file_path="/path/to/video.mp4")
```

Transcript + subtitle

```python
# force=True skips the error if the video is already indexed
video.index_spoken_words(force=True)
text = video.get_transcript_text()
stream_url = video.add_subtitle()
```

Search inside videos

```python
from videodb.exceptions import InvalidRequestError

video.index_spoken_words(force=True)

# search() raises InvalidRequestError when no results are found.
# Always wrap in try/except and treat "No results found" as empty.
try:
    results = video.search("product demo")
    shots = results.get_shots()
    stream_url = results.compile()
except InvalidRequestError as e:
    if "No results found" in str(e):
        shots = []
    else:
        raise
```
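The try/except pattern above repeats for every search, so it can be factored into a small helper. The following is a minimal sketch, not part of the SDK: `shots_or_empty` is a hypothetical name, and the exception class is passed in as a parameter so that the snippet runs without VideoDB installed.

```python
from typing import Any, Callable, List, Type


def shots_or_empty(
    run_search: Callable[[], Any],
    no_results_error: Type[Exception],
    marker: str = "No results found",
) -> List[Any]:
    """Return the shots from a search, or [] when the search found nothing.

    Hypothetical helper. `run_search` is any zero-argument callable that
    returns an object with a get_shots() method. Errors of type
    `no_results_error` whose message lacks `marker` are re-raised.
    """
    try:
        results = run_search()
    except no_results_error as exc:
        if marker in str(exc):
            return []
        raise
    return results.get_shots()
```

With the SDK available, a call might look like `shots_or_empty(lambda: video.search("product demo"), InvalidRequestError)`.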

Scene search

```python
import re
from videodb import SearchType, IndexType, SceneExtractionType
from videodb.exceptions import InvalidRequestError

# index_scenes() has no force parameter: it raises an error if a scene
# index already exists. Extract the existing index ID from the error.
try:
    scene_index_id = video.index_scenes(
        extraction_type=SceneExtractionType.shot_based,
        prompt="Describe the visual content in this scene.",
    )
except Exception as e:
    match = re.search(r"id\s+([a-f0-9]+)", str(e))
    if match:
        scene_index_id = match.group(1)
    else:
        raise

# Use score_threshold to filter low-relevance noise (recommended: 0.3+)
try:
    results = video.search(
        query="person writing on a whiteboard",
        search_type=SearchType.semantic,
        index_type=IndexType.scene,
        scene_index_id=scene_index_id,
        score_threshold=0.3,
    )
    shots = results.get_shots()
    stream_url = results.compile()
except InvalidRequestError as e:
    if "No results found" in str(e):
        shots = []
    else:
        raise
```
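The ID-recovery step can be tried in isolation. This sketch applies the same regular expression to a sample message in the "Scene index with id XXXX already exists" shape; the ID `3fa9c2` is made up for illustration, and `extract_scene_index_id` is a hypothetical helper name.

```python
import re
from typing import Optional

# Same pattern as above: the word "id", whitespace, then a lowercase hex string.
_SCENE_INDEX_ID = re.compile(r"id\s+([a-f0-9]+)")


def extract_scene_index_id(error_message: str) -> Optional[str]:
    """Return the index ID embedded in an error message, or None if absent."""
    match = _SCENE_INDEX_ID.search(error_message)
    return match.group(1) if match else None


# Made-up sample message, following the wording quoted in this document.
print(extract_scene_index_id("Scene index with id 3fa9c2 already exists"))  # prints 3fa9c2
```

Returning `None` rather than raising lets the caller decide whether to re-raise the original exception, as the code above does.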

Timeline editing

Important: Always validate timestamps before building a timeline:
  • `start` must be >= 0 (negative values are silently accepted but produce broken output)
  • `start` must be < `end`
  • `end` must be <= `video.length`

```python
from videodb.timeline import Timeline
from videodb.asset import VideoAsset, TextAsset, TextStyle

timeline = Timeline(conn)
timeline.add_inline(VideoAsset(asset_id=video.id, start=10, end=30))
timeline.add_overlay(0, TextAsset(text="The End", duration=3, style=TextStyle(fontsize=36)))
stream_url = timeline.generate_stream()
```
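The three timestamp rules can be enforced with a short check before any `VideoAsset` is created. This is a minimal sketch, assuming times are in seconds and that the video's duration (for example `video.length`) is passed in as a number; `validate_clip_range` is a hypothetical helper, not an SDK function.

```python
def validate_clip_range(start: float, end: float, video_length: float) -> None:
    """Raise ValueError if (start, end) breaks any of the timeline rules.

    Hypothetical helper; all values are assumed to be seconds.
    """
    video_length = float(video_length)
    if start < 0:
        raise ValueError(f"start must be >= 0, got {start}")
    if start >= end:
        raise ValueError(f"start ({start}) must be < end ({end})")
    if end > video_length:
        raise ValueError(f"end ({end}) must be <= video length ({video_length})")


# Passes silently for a 10-30 s clip of a 60 s video.
validate_clip_range(10, 30, 60)
```

Raising early avoids the case where a negative `start` is accepted silently and only shows up later as a broken stream.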

Transcode video (resolution / quality change)

```python
from videodb import TranscodeMode, VideoConfig, AudioConfig

# Change resolution, quality, or aspect ratio server-side
job_id = conn.transcode(
    source="https://example.com/video.mp4",
    callback_url="https://example.com/webhook",
    mode=TranscodeMode.economy,
    video_config=VideoConfig(resolution=720, quality=23, aspect_ratio="16:9"),
    audio_config=AudioConfig(mute=False),
)
```

Reframe aspect ratio (for social platforms)

Warning: `reframe()` is a slow server-side operation. For long videos it can take several minutes and may time out. Best practices:
  • Always limit to a short segment using `start`/`end` when possible
  • For full-length videos, use `callback_url` for async processing
  • Trim the video on a `Timeline` first, then reframe the shorter result

```python
from videodb import ReframeMode

# Always prefer reframing a short segment:
reframed = video.reframe(start=0, end=60, target="vertical", mode=ReframeMode.smart)

# Async reframe for full-length videos (returns None, result via webhook):
video.reframe(target="vertical", callback_url="https://example.com/webhook")

# Presets: "vertical" (9:16), "square" (1:1), "landscape" (16:9)
reframed = video.reframe(start=0, end=60, target="square")

# Custom dimensions
reframed = video.reframe(start=0, end=60, target={"width": 1280, "height": 720})
```
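One way to follow the short-segment advice is to work out the `start`/`end` pairs before calling `reframe()`. The sketch below is illustrative only: `segment_ranges` is a hypothetical helper, and the 60-second default mirrors the examples above rather than any documented limit.

```python
from typing import List, Tuple


def segment_ranges(video_length: float, max_segment: float = 60.0) -> List[Tuple[float, float]]:
    """Split [0, video_length] into consecutive (start, end) pairs.

    Hypothetical helper; every pair is at most `max_segment` seconds long
    and the last pair ends exactly at `video_length`.
    """
    video_length = float(video_length)
    if video_length <= 0 or max_segment <= 0:
        raise ValueError("video_length and max_segment must be positive")
    ranges: List[Tuple[float, float]] = []
    start = 0.0
    while start < video_length:
        end = min(start + max_segment, video_length)
        ranges.append((start, end))
        start = end
    return ranges


print(segment_ranges(150))  # [(0.0, 60.0), (60.0, 120.0), (120.0, 150.0)]
```

Each pair could then be passed as `start` and `end` to a separate `video.reframe()` call. Whether several short calls finish sooner than one async call with `callback_url` is not something this document states, so treat it as something to measure.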

Generative media

```python
image = coll.generate_image(
    prompt="a sunset over mountains",
    aspect_ratio="16:9",
)
```

Error handling

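```python
import videodb
from videodb.exceptions import AuthenticationError, InvalidRequestError

# A missing API key raises AuthenticationError on connect
try:
    conn = videodb.connect()
except AuthenticationError:
    print("Check your VIDEO_DB_API_KEY")

# coll is assumed to come from conn.get_collection(), as in the earlier examples
try:
    video = coll.upload(url="https://example.com/video.mp4")
except InvalidRequestError as e:
    print(f"Upload failed: {e}")
```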

Common pitfalls

| Scenario | Error message | Solution |
|---|---|---|
| Indexing an already-indexed video | `Spoken word index for video already exists` | Use `video.index_spoken_words(force=True)` to skip if already indexed |
| Scene index already exists | `Scene index with id XXXX already exists` | Extract the existing `scene_index_id` from the error with `re.search(r"id\s+([a-f0-9]+)", str(e))` |
| Search finds no matches | `InvalidRequestError: No results found` | Catch the exception and treat as empty results (`shots = []`) |
| Reframe times out | Blocks indefinitely on long videos | Use `start`/`end` to limit segment, or pass `callback_url` for async |
| Negative timestamps on Timeline | Silently produces broken stream | Always validate `start >= 0` before creating `VideoAsset` |
| `generate_video()` / `create_collection()` fails | `Operation not allowed` or `maximum limit` | Plan-gated features; inform the user about plan limits |

Examples

Canonical prompts

  • "Start desktop capture and alert when a password field appears."
  • "Record my session and produce an actionable summary when it ends."
  • "Ingest this file and return a playable stream link."
  • "Index this folder and find every scene with people, return timestamps."
  • "Generate subtitles, burn them in, and add light background music."
  • "Connect this RTSP URL and alert when a person enters the zone."

Screen Recording (Desktop Capture)

Use `ws_listener.py` to capture WebSocket events during recording sessions. Desktop capture supports macOS only.

Quick Start

  1. Choose state dir: `STATE_DIR="${VIDEODB_EVENTS_DIR:-$HOME/.local/state/videodb}"`
  2. Start listener: `VIDEODB_EVENTS_DIR="$STATE_DIR" python scripts/ws_listener.py --clear "$STATE_DIR" &`
  3. Get WebSocket ID: `cat "$STATE_DIR/videodb_ws_id"`
  4. Run capture code (see reference/capture.md for the full workflow)
  5. Events are written to: `$STATE_DIR/videodb_events.jsonl`

Use `--clear` whenever you start a fresh capture run so stale transcript and visual events do not leak into the new session.
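Because the listener in step 2 runs in the background, the ID file read in step 3 may not exist yet when the next command runs. Below is a minimal sketch of a polling read, assuming the file holds the ID as plain text (which the `cat` in step 3 suggests); `read_ws_id` and its timeout values are hypothetical.

```python
import time
from pathlib import Path
from typing import Optional


def read_ws_id(state_dir: Path, timeout: float = 10.0, poll: float = 0.2) -> Optional[str]:
    """Wait up to `timeout` seconds for videodb_ws_id and return its contents.

    Hypothetical helper. Returns None if the file never appears or stays empty.
    """
    ws_id_file = state_dir / "videodb_ws_id"
    deadline = time.monotonic() + timeout
    while True:
        if ws_id_file.exists():
            ws_id = ws_id_file.read_text(encoding="utf-8").strip()
            if ws_id:
                return ws_id
        if time.monotonic() >= deadline:
            return None
        time.sleep(poll)
```

A caller would pass the same directory as `STATE_DIR`, for example `read_ws_id(Path.home() / ".local" / "state" / "videodb")`, and treat `None` as a sign that the listener did not start.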

Query Events

```python
import json
import os
import time
from pathlib import Path

events_dir = Path(os.environ.get("VIDEODB_EVENTS_DIR", Path.home() / ".local" / "state" / "videodb"))
events_file = events_dir / "videodb_events.jsonl"
events = []

if events_file.exists():
    with events_file.open(encoding="utf-8") as handle:
        for line in handle:
            try:
                events.append(json.loads(line))
            except json.JSONDecodeError:
                continue

transcripts = [e["data"]["text"] for e in events if e.get("channel") == "transcript"]
cutoff = time.time() - 300
recent_visual = [
    e for e in events
    if e.get("channel") == "visual_index" and e["unix_ts"] > cutoff
]
```

Additional docs

Reference documentation is in the `reference/` directory adjacent to this SKILL.md file. Use the Glob tool to locate it if needed.
  • reference/api-reference.md - Complete VideoDB Python SDK API reference
  • reference/search.md - In-depth guide to video search (spoken word and scene-based)
  • reference/editor.md - Timeline editing, assets, and composition
  • reference/streaming.md - HLS streaming and instant playback
  • reference/generative.md - AI-powered media generation (images, video, audio)
  • reference/rtstream.md - Live stream ingestion workflow (RTSP/RTMP)
  • reference/rtstream-reference.md - RTStream SDK methods and AI pipelines
  • reference/capture.md - Desktop capture workflow
  • reference/capture-reference.md - Capture SDK and WebSocket events
  • reference/use-cases.md - Common video processing patterns and examples
Do not use ffmpeg, moviepy, or local encoding tools when VideoDB supports the operation. The following are all handled server-side by VideoDB — trimming, combining clips, overlaying audio or music, adding subtitles, text/image overlays, transcoding, resolution changes, aspect-ratio conversion, resizing for platform requirements, transcription, and media generation. Only fall back to local tools for operations listed under Limitations in reference/editor.md (transitions, speed changes, crop/zoom, colour grading, volume mixing).

When to use what

| Problem | VideoDB solution |
|---|---|
| Platform rejects video aspect ratio or resolution | `video.reframe()` or `conn.transcode()` with `VideoConfig` |
| Need to resize video for Twitter/Instagram/TikTok | `video.reframe(target="vertical")` or `target="square"` |
| Need to change resolution (e.g. 1080p → 720p) | `conn.transcode()` with `VideoConfig(resolution=720)` |
| Need to overlay audio/music on video | `AudioAsset` on a `Timeline` |
| Need to add subtitles | `video.add_subtitle()` or `CaptionAsset` |
| Need to combine/trim clips | `VideoAsset` on a `Timeline` |
| Need to generate voiceover, music, or SFX | `coll.generate_voice()`, `generate_music()`, `generate_sound_effect()` |

Provenance

Reference material for this skill is vendored locally under `skills/videodb/reference/`. Use the local copies above instead of following external repository links at runtime.

Maintained By: VideoDB