video-understanding

Video QnA using VLM through VSS Agent

Use this skill when you need details about a video that require a VLM to look at the video frames, for example when the agent has no usable prior answer and needs a fresh look at the pixels of a specific clip.

When to Use

  • The user asks what happens in the video, what objects / people / actions appear, colors, timing, safety, or other visual facts that require watching the clip.
  • The user asks for details that cannot be answered from existing messages, summaries, Elasticsearch/MCP results, or filenames alone; you need model inference on the video.
  • Follow-up questions about content details after a coarse summary or after report generation.
Do not use this skill when a database / MCP / prior tool output already answers the question, unless the user explicitly wants verification against the video.

Deployment prerequisite

This skill requires a VSS profile that serves the `video_understanding` tool, typically `base` (recommended) or `lvs`. Before any request:
  1. Probe the VSS agent:
    ```bash
    curl -sf --max-time 5 "http://${HOST_IP}:8000/docs" >/dev/null
    ```
  2. If the probe fails, ask the user:
    "No VSS profile is running on `$HOST_IP`. Shall I deploy `base` (recommended for per-clip VLM QnA) using the `/deploy` skill? If you prefer `lvs`, say so."
    • If yes → hand off to `/deploy -p base` (or `-p lvs` if the user prefers). Return here once it succeeds.
    • If no → stop.
    (If your caller has granted explicit pre-authorization to deploy autonomously, e.g. the request says "pre-authorized to deploy prerequisites", or you are running in a non-interactive evaluation harness with that permission, skip the confirmation and invoke `/deploy -p base` directly. Prefer `base` unless the request names another profile.)
  3. If the probe passes, proceed.
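The probe-and-branch logic of steps 1 to 3 can be sketched as a small shell function. `vss_agent_up` is a hypothetical helper name, not part of VSS; it assumes `HOST_IP` is exported and that the agent listens on port 8000, as in the probe above. It only decides whether to proceed; asking the user and invoking `/deploy` remain agent actions.

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of VSS: wraps the step-1 probe so the
# agent can branch on its exit status. Assumes HOST_IP is exported.
vss_agent_up() {
  if [ -z "${HOST_IP:-}" ]; then
    echo "HOST_IP is not set; cannot probe the VSS agent" >&2
    return 2
  fi
  # -s: silent, -f: fail on HTTP errors, 5 s timeout; the body is discarded.
  curl -sf --max-time 5 "http://${HOST_IP}:8000/docs" >/dev/null
}

# Step 3 (proceed) vs. step 2 (ask the user before deploying).
if vss_agent_up; then
  echo "VSS agent is reachable; proceed with the request."
else
  echo "No VSS profile is running on ${HOST_IP:-<unset>}; ask the user whether to deploy base."
fi
```

A distinct exit status (2) for a missing `HOST_IP` keeps a configuration mistake separate from a genuinely unreachable agent.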

Agent workflow

  1. Clip: Identify the sensor ID, filename, or URL of one video segment. If it is ambiguous, ask the user.
  2. Call the VSS agent with the sensor ID and ask it to call the `video_understanding` tool to answer the user's question. The sensor ID or file name must be included in the input message to the agent.
  3. Return the VSS agent's answer to the user.
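Steps 1 and 2 amount to two rules: never guess the clip, and always name it in the message. A minimal sketch, assuming a hypothetical helper `compose_vss_message` (not part of VSS) whose message template is copied from the `/generate` example in this skill:

```bash
#!/usr/bin/env bash
# Hypothetical helper, not part of VSS: builds the input message for step 2.
# Fails when no clip is given so the agent asks the user (step 1) instead.
compose_vss_message() {
  local clip="$1" question="$2"
  if [ -z "$clip" ]; then
    echo "No sensor ID, filename, or URL given; ask the user which clip to use." >&2
    return 1
  fi
  if [ -z "$question" ]; then
    echo "No question given." >&2
    return 1
  fi
  # The sensor / file name must appear in the message sent to the agent.
  printf 'Call video_understanding tool to answer the following question about %s: %s\n' \
    "$clip" "$question"
}
```

The question is passed as a `printf` argument rather than as part of the format string, so a `%` in the user's text is printed literally.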

Query VSS agent (`/generate`)

```bash
# Set from deployment (compose / .env / host where vss-agent listens)
export VSS_AGENT_BASE_URL="http://localhost:8000"
curl -s -X POST "${VSS_AGENT_BASE_URL}/generate" \
  -H "Content-Type: application/json" \
  -d '{"input_message": "Call video_understanding tool to answer the following question about <sensor-id>: <user query>"}' | jq .
```
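The request above pastes the user's question straight into a JSON string, so a question containing a double quote, backslash, or newline produces invalid JSON, and a single quote ends the shell string early. A sketch that lets `jq` build the body instead; `vss_generate_body` and `vss_generate` are hypothetical names, and `jq` and `curl` are assumed to be installed:

```bash
#!/usr/bin/env bash
# Hypothetical helpers, not part of VSS. Assumes jq and curl are installed
# and VSS_AGENT_BASE_URL is set as in the example above.

# Wrap an input message in the /generate request body. jq --arg escapes
# quotes, backslashes, and newlines, so the JSON stays valid.
vss_generate_body() {
  jq -n -c --arg msg "$1" '{input_message: $msg}'
}

# POST the body to the agent; curl reads it from stdin via --data-binary @-.
vss_generate() {
  vss_generate_body "$1" |
    curl -s -X POST "${VSS_AGENT_BASE_URL:-http://localhost:8000}/generate" \
      -H "Content-Type: application/json" \
      --data-binary @-
}
```

Usage, with `cam-01` as a placeholder sensor ID:
`vss_generate 'Call video_understanding tool to answer the following question about cam-01: Is the "exit" door open?' | jq .`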

---

Cross-Reference

  • vios: VST storage/replay URLs so that `VIDEO_URL` is valid for the VLM.
  • report: timestamped reports via the VSS agent (`/generate`); this skill is for ad-hoc VLM Q&A on a single clip.