video-understanding
Video QnA using VLM through VSS Agent
Use this skill when you need details about a video that can only be obtained by having the VLM look at the video frames, for example when the agent has no usable prior answer and needs a fresh look at the pixels of a specific clip.
When to Use
- The user asks what happens in the video, what objects / people / actions appear, colors, timing, safety, or other visual facts that require watching the clip.
- The user asks for details that cannot be answered from existing messages, summaries, Elasticsearch/MCP results, or filenames alone—you need model inference on the video.
- Follow-up questions about content details after a coarse summary or after report generation.
Do not use this skill when a database / MCP / prior tool output already answers the question, unless the user explicitly wants verification against the video.
Deployment prerequisite
This skill requires a VSS profile that serves the `video_understanding` tool, typically `base` (recommended) or `lvs`. Before any request:
- Probe the VSS agent:

      curl -sf --max-time 5 "http://${HOST_IP}:8000/docs" >/dev/null

- If the probe fails, ask the user: "No VSS profile is running on `$HOST_IP`. Shall I deploy `base` (recommended for per-clip VLM QnA) using the `/deploy` skill? If you prefer `lvs`, say so."
  - If yes → hand off to `/deploy -p base` (or `-p lvs` if the user prefers). Return here once it succeeds.
  - If no → stop.

  (If your caller has granted explicit pre-authorization to deploy autonomously, e.g. the request says "pre-authorized to deploy prerequisites", or you are running in a non-interactive evaluation harness with that permission, skip the confirmation and invoke `/deploy -p base` directly. Prefer `base` unless the request names another profile.)
- If the probe passes, proceed.
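The probe-then-decide logic above could be sketched in shell roughly as follows. This is only an illustration: `vss_probe` wraps the probe command from this section, while `next_step` and its `yes`/`no` pre-authorization flag are hypothetical names introduced here, not part of VSS or the `/deploy` skill.

```bash
# Probe the VSS agent docs endpoint; succeeds only if it answers within 5 seconds.
vss_probe() {
  host="$1"
  curl -sf --max-time 5 "http://${host}:8000/docs" >/dev/null
}

# Hypothetical helper: map the probe's exit status and a pre-authorization
# flag to the next action described in this section.
next_step() {
  probe_status="$1"   # exit status of vss_probe (0 = agent reachable)
  preauthorized="$2"  # "yes" if the caller pre-authorized deployment (assumed flag)
  if [ "$probe_status" -eq 0 ]; then
    echo "proceed"
  elif [ "$preauthorized" = "yes" ]; then
    echo "deploy"      # invoke /deploy -p base without asking
  else
    echo "ask-user"    # confirm with the user before deploying
  fi
}
```

A caller might run `vss_probe "$HOST_IP"; next_step "$?" "no"` and branch on the printed action.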
Agent workflow
- Clip — Identify sensor id, filename, or URL for one video segment. If ambiguous, ask the user.
- Call the VSS agent with the sensor id and ask it to call the `video_understanding` tool to answer the user's question. The sensor id or filename must be included in the input message to the agent.
- Return the VSS agent's answer to the user.
Query VSS agent (`/generate`)

    # Set from deployment (compose / .env / host where vss-agent listens)
    export VSS_AGENT_BASE_URL="http://localhost:8000"

    curl -s -X POST "${VSS_AGENT_BASE_URL}/generate" \
      -H "Content-Type: application/json" \
      -d '{"input_message": "Call video_understanding tool to answer the following question about <sensor-id>: <user query>"}' | jq .
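Because a user query can contain quotes or other characters that would break a hand-written JSON string, one option is to let `jq` build the request body. The sketch below assumes that approach; `compose_message` and `ask_vss` are hypothetical helper names, and the fallback to `http://localhost:8000` simply mirrors the export shown in this section.

```bash
# Compose the instruction text in the same wording as the request in this section.
compose_message() {
  printf 'Call video_understanding tool to answer the following question about %s: %s' "$1" "$2"
}

# Hypothetical wrapper: JSON-encode the message with jq and POST it to /generate.
ask_vss() {
  sensor_id="$1"
  question="$2"
  jq -cn --arg msg "$(compose_message "$sensor_id" "$question")" '{input_message: $msg}' |
    curl -s -X POST "${VSS_AGENT_BASE_URL:-http://localhost:8000}/generate" \
      -H "Content-Type: application/json" \
      -d @- | jq .
}
```

Usage would look like `ask_vss "<sensor-id>" "<user query>"`; the sensor id always ends up inside the input message, as the workflow requires.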
Cross-Reference
- vios: VST storage/replay URLs, so that `VIDEO_URL` is valid for the VLM.
- report: timestamped reports via the VSS agent (`/generate`); this skill is direct VLM for ad-hoc video Q&A.