vss-ask-video
Video QnA using VLM through VSS Agent
Use this skill when answering requires a VLM to look at the video frames themselves: for example, when the agent has no usable prior answer and needs a fresh look at the pixels of a specific clip.
When to Use
- The user asks what happens in the video, what objects / people / actions appear, colors, timing, safety, or other visual facts that require watching the clip.
- The user asks for details that cannot be answered from existing messages, summaries, Elasticsearch/MCP results, or filenames alone—you need model inference on the video.
- Follow-up questions about content details after a coarse summary or after report generation.
Do not use this skill when a database / MCP / prior tool output already answers the question, unless the user explicitly wants verification against the video.
Deployment prerequisite
This skill requires a VSS profile that serves the `video_understanding` tool, typically `base` (recommended) or `lvs`. Before any request:
- Probe the VSS agent:

```bash
curl -sf --max-time 5 "http://${HOST_IP}:8000/docs" >/dev/null
```

- If the probe fails, ask the user: "No VSS profile is running on `$HOST_IP`. Shall I deploy `base` (recommended for per-clip VLM QnA) using the `/vss-deploy-profile` skill? If you prefer `lvs`, say so."
  - If yes → hand off to `/vss-deploy-profile -p base` (or `-p lvs` if the user prefers). Return here once it succeeds.
  - If no → stop.
- If the probe passes, proceed.
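
The "return here once it succeeds" step can be sketched as a small polling loop around the same probe. This is a minimal sketch, not part of the skill itself: the function names `probe_vss_agent` and `wait_for_vss_agent`, and the default of 12 attempts spaced 5 seconds apart, are assumptions chosen for illustration.

```bash
# probe_vss_agent HOST [PORT]: the same reachability check as the probe above.
probe_vss_agent() {
  curl -sf --max-time 5 "http://$1:${2:-8000}/docs" >/dev/null
}

# wait_for_vss_agent HOST [ATTEMPTS] [DELAY_SECONDS]
# Re-run the probe until it passes or ATTEMPTS is exhausted.
# The default ATTEMPTS and DELAY_SECONDS are assumptions, not documented values.
wait_for_vss_agent() {
  wait_host=$1
  wait_attempts=${2:-12}
  wait_delay=${3:-5}
  wait_try=1
  while [ "$wait_try" -le "$wait_attempts" ]; do
    if probe_vss_agent "$wait_host"; then
      return 0
    fi
    # Sleep only between attempts, not after the last one.
    if [ "$wait_try" -lt "$wait_attempts" ]; then
      sleep "$wait_delay"
    fi
    wait_try=$((wait_try + 1))
  done
  return 1
}
```

After a deployment hand-off, something like `wait_for_vss_agent "$HOST_IP" || echo "VSS agent still unreachable" >&2` would decide whether to continue or report back to the user.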
Sensor prerequisite
You MUST list VST sensors before any `/generate` call. This is required even when the user names the sensor explicitly, even when the user asserts the video is already uploaded, and even when a previous turn appeared to use the same video. Do not skip this step.
- List sensors:

```bash
curl -sf --max-time 5 "http://${HOST_IP}:30888/vst/api/v1/sensor/list" | jq '.[].name'
```

- Compare the returned `name` values against the user-supplied `<sensor-id>` (or filename stem, e.g. `warehouse_safety_0001`).
- If a matching sensor is present → proceed to the Agent workflow below.
- If no matching sensor is present, upload the video first, then re-list to confirm the new sensor appears:

```bash
# filename: must not contain whitespace
# timestamp: ISO 8601 UTC; default 2025-01-01T00:00:00.000Z if the user did not specify
curl -s -X PUT "http://${HOST_IP}:30888/vst/api/v1/storage/file/<filename>?timestamp=<timestamp>" \
  -H "Content-Type: application/octet-stream" \
  -H "Content-Length: <file_size_in_bytes>" \
  --upload-file /path/to/<filename> | jq .
```

See `/vss-manage-video-io-storage` for full upload semantics (v1 vs v2, conflict handling, delete flow). In interactive runs, confirm with the user before uploading. Never issue an unconditional PUT without first running the sensor-list check above; that is exactly the failure mode this prerequisite exists to prevent.
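
The comparison and the pre-upload checks above can be sketched as a few shell helpers. This is an illustrative sketch under stated assumptions: the helper names (`file_stem`, `sensor_listed`, `filename_ok`, `file_size`, `vst_has_sensor`) are hypothetical, and matching is assumed to be an exact, whole-line comparison against the sensor `name`. Note the `-r` flag on `jq`, which prints raw strings instead of JSON-quoted ones so the names can be compared directly.

```bash
# file_stem PATH: strip the directory and the last extension.
file_stem() {
  stem_base=${1##*/}
  printf '%s\n' "${stem_base%.*}"
}

# sensor_listed NAME: read sensor names (one per line) on stdin and
# succeed only if one line equals NAME exactly.
sensor_listed() {
  grep -Fxq -- "$1"
}

# filename_ok NAME: fail if NAME contains any whitespace.
filename_ok() {
  case $1 in
    *[[:space:]]*) return 1 ;;
    *) return 0 ;;
  esac
}

# file_size PATH: size in bytes, usable for the Content-Length header.
file_size() {
  wc -c < "$1" | tr -d ' '
}

# vst_has_sensor NAME: run the sensor-list call and check for NAME.
# Assumes HOST_IP is set, as in the commands above.
vst_has_sensor() {
  curl -sf --max-time 5 "http://${HOST_IP}:30888/vst/api/v1/sensor/list" |
    jq -r '.[].name' |
    sensor_listed "$1"
}
```

For example, `vst_has_sensor "$(file_stem /path/to/warehouse_safety_0001.mp4)"` would gate the upload: only when it fails, and `filename_ok` passes, would the PUT be considered.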
Agent workflow
The Sensor prerequisite above must already have confirmed that the sensor exists on VST (or created it by uploading the video). Then:
- Clip: identify the sensor id, filename, or URL for one video segment. If ambiguous, ask the user.
- Call the VSS agent with the sensor id and ask it to call the `video_understanding` tool to answer the user's question.
- Return the VSS agent's answer to the user.
Query VSS agent (`/generate`)

```bash
# Set from deployment (compose / .env / host where vss-agent listens)
export VSS_AGENT_BASE_URL="http://localhost:8000"

curl -s -X POST "${VSS_AGENT_BASE_URL}/generate" \
  -H "Content-Type: application/json" \
  -d '{"input_message": "Call video_understanding tool to answer the following question about <sensor-id>: <user query>"}' | jq .
```
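
When the user's question contains quotes or other characters that are special in JSON, pasting it into the inline `-d` string can produce an invalid body. One possible approach, sketched here with the hypothetical helpers `build_prompt` and `ask_video`, is to let `jq` build the payload so the text is escaped automatically. The prompt wording is copied from the request above; the fallback URL is the same `http://localhost:8000` example value.

```bash
# build_prompt SENSOR_ID QUESTION: the instruction text sent as input_message.
build_prompt() {
  printf 'Call video_understanding tool to answer the following question about %s: %s' "$1" "$2"
}

# ask_video SENSOR_ID QUESTION: POST the prompt to the agent's /generate endpoint.
# jq -n --arg builds the JSON body and handles escaping of the question text.
ask_video() {
  jq -n --arg msg "$(build_prompt "$1" "$2")" '{input_message: $msg}' |
    curl -s -X POST "${VSS_AGENT_BASE_URL:-http://localhost:8000}/generate" \
      -H "Content-Type: application/json" \
      --data-binary @- | jq .
}
```

A call would then look like `ask_video warehouse_safety_0001 'Is anyone wearing a "high-vis" vest?'`, after the sensor check has passed.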
Cross-Reference
- vss-manage-video-io-storage: VST storage/replay URLs, so that `VIDEO_URL` is valid for the VLM.
- vss-generate-video-report: timestamped reports via the VSS agent (`/generate`); this skill is direct VLM for ad-hoc video Q&A.