video-understanding

Video QnA using VLM through VSS Agent

Use this skill when answering a question requires a VLM to inspect the video frames, for example when the agent has no usable prior answer and needs a fresh look at the pixels of a specific clip.

When to Use

- The user asks what happens in the video: which objects, people, or actions appear, or about colors, timing, safety, or other visual facts that require watching the clip.
- The user asks for details that cannot be answered from existing messages, summaries, Elasticsearch/MCP results, or filenames alone, so model inference on the video is required.
- The user asks follow-up questions about content details after a coarse summary or a generated report.

Do not use this skill when a database, MCP, or prior tool output already answers the question, unless the user explicitly asks for verification against the video.

Deployment prerequisite

This skill requires a VSS profile that serves the `video_understanding` tool, typically `base` (recommended) or `lvs`. Before any request:

Probe the VSS agent:

```bash
# -f: non-zero exit on HTTP errors; --max-time 5: bound the wait to 5 seconds
curl -sf --max-time 5 "http://${HOST_IP}:8000/docs" >/dev/null
```

If the probe fails, ask the user:
"No VSS profile is running on `$HOST_IP`. Shall I deploy `base` (recommended for per-clip VLM QnA) using the `/deploy` skill? If you prefer `lvs`, say so."
- If yes → hand off to `/deploy -p base` (or `-p lvs` if the user prefers). Return here once it succeeds.
- If no → stop.

(If your caller has granted explicit pre-authorization to deploy autonomously, for example the request says "pre-authorized to deploy prerequisites" or you are running in a non-interactive evaluation harness with that permission, skip the confirmation and invoke `/deploy -p base` directly. Prefer `base` unless the request names another profile.)

If the probe passes, proceed.
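
The probe-and-decide step can be sketched as a small pair of shell functions. This is a minimal sketch, assuming `curl` is installed, that the agent's docs page lives on port 8000 as in the probe command, and that `HOST_IP` falls back to `localhost` when unset. `vss_probe` and `vss_require_profile` are hypothetical helpers, not part of VSS or the `/deploy` skill. The agent still relays the question and acts on the user's answer.

```bash
# Minimal sketch: hypothetical helpers wrapping the VSS readiness probe.

# vss_probe [URL]: exit 0 if the VSS agent answers, non-zero otherwise.
# Defaults to the docs endpoint on $HOST_IP:8000 (assumed fallback: localhost).
vss_probe() {
  local url="${1:-http://${HOST_IP:-localhost}:8000/docs}"
  # -s: no progress output; -f: non-zero exit on HTTP status >= 400;
  # --max-time 5: give up after 5 seconds.
  curl -sf --max-time 5 "$url" >/dev/null
}

# vss_require_profile [URL]: probe, and on failure print the question the
# agent should relay to the user. Deploying is left to the /deploy skill.
vss_require_profile() {
  if vss_probe "$@"; then
    echo "VSS agent reachable; proceeding."
    return 0
  fi
  echo "No VSS profile is running on ${HOST_IP:-localhost}. Shall I deploy base (recommended for per-clip VLM QnA) using the /deploy skill? If you prefer lvs, say so."
  return 1
}
```

A non-zero return from `vss_require_profile` means "ask the user first". A zero return means the skill can go straight to the workflow.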

Agent workflow

Clip: identify the sensor ID, filename, or URL of a single video segment. If it is ambiguous, ask the user.
Call the VSS agent with the sensor ID and ask it to call the `video_understanding` tool to answer the user's question. The sensor ID or filename must be included in the input message sent to the agent.
Return the VSS agent's answer to the user.
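
The agent's input message must name the clip, so it helps to compose that message in one place. The sketch below does this and refuses to produce a message without a sensor ID or filename. It is a minimal sketch, assuming the prompt wording from this skill's `/generate` example. `vss_input_message` is a hypothetical helper, not part of VSS.

```bash
# Minimal sketch: hypothetical helper that composes the agent prompt.
# vss_input_message SENSOR QUESTION
# Returns 2 (and prints usage to stderr) if either argument is empty, so a
# request is never built without the sensor ID or filename the agent needs.
vss_input_message() {
  local sensor="$1" question="$2"
  if [[ -z "$sensor" || -z "$question" ]]; then
    echo "usage: vss_input_message SENSOR QUESTION" >&2
    return 2
  fi
  printf 'Call video_understanding tool to answer the following question about %s: %s\n' \
    "$sensor" "$question"
}
```

For example, `vss_input_message cam-01 "What color is the truck?"` (hypothetical sensor ID) prints the prompt with `cam-01` embedded, ready to be placed in the request body.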

Query VSS agent (`/generate`)

```bash
# Set from deployment (compose / .env / host where vss-agent listens)
export VSS_AGENT_BASE_URL="http://localhost:8000"

# Replace <sensor-id> and <user query>; the sensor ID or filename must appear in the message.
curl -s -X POST "${VSS_AGENT_BASE_URL}/generate" \
  -H "Content-Type: application/json" \
  -d '{"input_message": "Call video_understanding tool to answer the following question about <sensor-id>: <user query>"}' | jq .
```
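
The `-d` payload above is a hand-written JSON string, so a user question containing a double quote or a newline would produce invalid JSON. The sketch below guards against that, assuming only bash and `curl`. Its limits and assumptions:

- `json_escape`, `vss_generate_payload`, and `vss_generate` are hypothetical helpers.
- The escaping covers backslash, double quote, newline, carriage return, and tab only, not every control character.
- The response body is printed raw, because this skill does not document its shape.

If `jq` is available, `jq -n --arg m "$msg" '{input_message: $m}'` is a more complete way to build the same payload.

```bash
# Minimal sketch: hypothetical helpers for calling the VSS agent's /generate.

# json_escape STRING: escape characters that commonly break a JSON string.
json_escape() {
  local s=$1
  local bs='\' dq='"' nl=$'\n' cr=$'\r' tab=$'\t'
  s=${s//"$bs"/"$bs$bs"}   # backslash first, so later escapes are not doubled
  s=${s//"$dq"/"$bs$dq"}
  s=${s//"$nl"/"${bs}n"}
  s=${s//"$cr"/"${bs}r"}
  s=${s//"$tab"/"${bs}t"}
  printf '%s' "$s"
}

# vss_generate_payload MESSAGE: print {"input_message": "..."} as JSON.
vss_generate_payload() {
  printf '{"input_message": "%s"}' "$(json_escape "$1")"
}

# vss_generate MESSAGE: POST the message to ${VSS_AGENT_BASE_URL}/generate
# (assumed default: http://localhost:8000) and print the raw response.
# Exits non-zero on connection failures or HTTP errors (curl -f).
vss_generate() {
  local base="${VSS_AGENT_BASE_URL:-http://localhost:8000}"
  curl -sf -X POST "${base}/generate" \
    -H "Content-Type: application/json" \
    -d "$(vss_generate_payload "$1")"
}
```

Usage, with a hypothetical sensor ID: `vss_generate 'Call video_understanding tool to answer the following question about cam-01: Is anyone not wearing a "hard hat"?' | jq .`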

Cross-Reference

- vios: VST storage/replay URLs, so that `VIDEO_URL` is valid for the VLM.
- report: timestamped reports via the VSS agent (`/generate`). This skill, by contrast, is for ad-hoc, per-clip video Q&A through the `video_understanding` tool.
