vss-generate-video-report

Report

Generate a video analysis report by routing to one of two backends, never via `POST /generate` on the VSS agent.

| Mode | Trigger | Backend |
|---|---|---|
| A. Video clip | "report on `<sensor>`", "report on this video", "analyze warehouse_01.mp4" | `/vss-manage-video-io-storage` → clip URL → VLM `chat/completions` |
| B. Incident range | "report on incidents from `<t1>` to `<t2>`", "report on alerts today", "what incidents happened on `<sensor>` last hour" | `/vss-query-analytics` → incident list → narrative report |

If the request is ambiguous (e.g. "report on `<sensor>`" with no time range and no incident wording), default to Mode A. Ask only if the user mentions both a sensor and a time range.

When to Use

  • "Generate a report for this video" / "for `<sensor-id>`": Mode A
  • "Create an analysis report on the uploaded video": Mode A
  • "Report on incidents from 12:31Z to 12:32Z": Mode B
  • "Summarize alerts on `<sensor>` between `<t1>` and `<t2>`": Mode B

Deployment prerequisite

Mode A needs the VSS base profile (VST + VLM NIM). Mode B needs the VSS alerts profile (VA-MCP + Elasticsearch).

Probe:
```bash
# Mode A: VST + VLM reachability
curl -sf --max-time 5 "http://${HOST_IP}:30888/vst/api/v1/sensor/version" >/dev/null

# Mode B: VA-MCP
curl -sf --max-time 5 "http://${HOST_IP}:9901/" >/dev/null
```

If the probe fails, hand off to `/vss-deploy-profile` with `-p base` (Mode A) or `-p alerts` (Mode B). **Always** confirm the deploy with the user first.
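
The probe-and-handoff decision can be sketched as a small helper. The function name `probe_mode` and its output strings are hypothetical. It only reports which profile to request and does not deploy anything, since the deploy has to be confirmed with the user first.

```bash
# Hypothetical helper: probe the backend for a mode and print what to do next.
# Assumes HOST_IP is set, as in the probe commands in this section.
probe_mode() {
  local mode="$1" url profile
  case "$mode" in
    A) url="http://${HOST_IP}:30888/vst/api/v1/sensor/version"; profile="base" ;;
    B) url="http://${HOST_IP}:9901/"; profile="alerts" ;;
    *) echo "unknown mode: $mode" >&2; return 2 ;;
  esac
  if curl -sf --max-time 5 "$url" >/dev/null; then
    echo "ready"
  else
    # Do not deploy from here: confirm with the user, then hand off to
    # /vss-deploy-profile with the printed flag.
    echo "needs-deploy -p ${profile}"
  fi
}
```

Calling `probe_mode A` prints either `ready` or `needs-deploy -p base`, which keeps the user confirmation as a separate, explicit step.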

---

Browser-playable clip URL (always do this before embedding any clip in the report)

VST returns clip URLs using the agent-internal `${HOST_IP}:30888` host:port. Those work in-cluster (VLM frame pulls, agent backend), but the user's browser cannot reach them. The deploy layer already exports the browser-facing host:port as `$VSS_PUBLIC_HOST` / `$VSS_PUBLIC_PORT` (and the scheme as `$VSS_PUBLIC_HTTP_PROTOCOL`) in every profile `.env`, Brev or bare-metal, so the rewrite is a one-liner:
```bash
BROWSER_CLIP_URL=$(echo "$RAW_URL" | sed -E "s|^https?://[^/]+|${VSS_PUBLIC_HTTP_PROTOCOL}://${VSS_PUBLIC_HOST}:${VSS_PUBLIC_PORT}|")
```
Apply it to every clip URL surfaced in the rendered report (Mode A Step 4 Clip URL row; Mode B per-incident clip sub-bullet). Leave the VLM `video_url` content block in Mode A Step 3 on the original internal URL, because the VLM is in-cluster.
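
As a worked example of the rewrite, the sketch below wraps the same `sed` expression in a function and runs it on sample values. The function name `to_browser_url`, the host `vss.example.com`, and the clip path are made up for illustration. Real values come from the profile `.env` and from VST.

```bash
# Hypothetical wrapper around the sed one-liner from this section.
to_browser_url() {
  printf '%s\n' "$1" |
    sed -E "s|^https?://[^/]+|${VSS_PUBLIC_HTTP_PROTOCOL}://${VSS_PUBLIC_HOST}:${VSS_PUBLIC_PORT}|"
}

# Sample values (assumptions, not real deployment settings).
VSS_PUBLIC_HTTP_PROTOCOL=https
VSS_PUBLIC_HOST=vss.example.com
VSS_PUBLIC_PORT=443
RAW_URL="http://10.0.0.5:30888/vst/storage/clip.mp4?startTime=1&endTime=2"

BROWSER_CLIP_URL=$(to_browser_url "$RAW_URL")
echo "$BROWSER_CLIP_URL"
# prints https://vss.example.com:443/vst/storage/clip.mp4?startTime=1&endTime=2
```

Only the scheme, host, and port change. The path and query string pass through untouched.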

Mode A — Report on a recorded video clip

If the VSS `lvs` profile is deployed (`curl -sf --max-time 5 "http://${HOST_IP}:38111/v1/ready"` returns HTTP 200), run `/vss-summarize-video` to produce the summary, paste its output into the report template in Step 4, and skip Steps 1–3 (the VLM-direct path). Run Steps 1–3 only when `/v1/ready` is non-200.
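
A minimal sketch of that branch, assuming `HOST_IP` is set. The function name `mode_a_path` and its two output strings are hypothetical. It compares the status code explicitly rather than relying on `curl -f`, which also treats other 2xx and 3xx responses as success.

```bash
# Hypothetical helper: choose the Mode A path from the lvs readiness probe.
# Prints "summarize" only on an exact HTTP 200, otherwise "vlm-direct".
mode_a_path() {
  local code
  code=$(curl -s -o /dev/null --max-time 5 -w '%{http_code}' \
    "http://${HOST_IP}:38111/v1/ready")
  if [ "$code" = "200" ]; then
    echo "summarize"    # run /vss-summarize-video, then fill the Step 4 template
  else
    echo "vlm-direct"   # run Steps 1-3
  fi
}
```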

Step 1 — Resolve the clip URL

Hand off to `/vss-manage-video-io-storage` to:
  1. List sensors and confirm the named `<sensor-id>` exists (upload first if not).
  2. Fetch `/storage/<streamId>/timelines` for the recorded range when the user did not supply `startTime` / `endTime`.
  3. Request a clip URL:
     ```bash
     curl -s "http://${HOST_IP}:30888/vst/api/v1/storage/file/<streamId>/url?startTime=<startTime>&endTime=<endTime>&container=mp4&disableAudio=true" | jq -r .videoUrl
     ```
     That gives a direct `mp4` URL that the VLM can pull frames from. Bind it to `VIDEO_URL` (used in-cluster by the VLM in Step 3) and rewrite it to `BROWSER_CLIP_URL` for the Step 4 report template using the one-liner from Browser-playable clip URL above, because the user's browser cannot reach `$VIDEO_URL` directly.

Mode A requires the selected VLM endpoint to be able to fetch `VIDEO_URL`. Local NIM/RT-VLM deployments normally can; remote endpoints generally cannot fetch `localhost`, private `HOST_IP`, or VST-internal URLs. If the live `VLM_ENDPOINT` is remote, surface that reachability requirement instead of making a chat request that will fail after `/v1/models` succeeds.
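
The reachability requirement can be checked before any chat request is sent. The sketch below is a heuristic, not part of the skill: the function names `is_internal_url` and `warn_if_unreachable` are hypothetical, and "internal" is assumed to mean `localhost`, loopback, the RFC 1918 private ranges, or the deployment's own `HOST_IP`.

```bash
# Hypothetical check: does a URL point at a host that a remote VLM cannot reach?
is_internal_url() {
  local host
  # Extract the host part between "scheme://" and the next "/" or ":".
  host=$(printf '%s\n' "$1" | sed -E 's|^[a-z]+://([^/:]+).*|\1|')
  case "$host" in
    localhost|127.*|10.*|192.168.*|"${HOST_IP:-}") return 0 ;;
    172.1[6-9].*|172.2[0-9].*|172.3[01].*) return 0 ;;
    *) return 1 ;;
  esac
}

# Args: VIDEO_URL VLM_ENDPOINT. Fails when the clip is internal but the VLM is not.
warn_if_unreachable() {
  if is_internal_url "$1" && ! is_internal_url "$2"; then
    echo "VLM endpoint looks remote but the clip URL is internal; it cannot fetch the clip" >&2
    return 1
  fi
}
```

Running `warn_if_unreachable "$VIDEO_URL" "$VLM_ENDPOINT"` after Step 2 gives a clear message up front instead of a late failure from `chat/completions`.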

Step 2 — Resolve VLM endpoint and model

The deploy may serve the VLM through either of two stacks. Both expose an OpenAI-compatible `chat/completions` API. Pick whichever is live:

| Backend | Env vars | Typical host endpoint | Picked when |
|---|---|---|---|
| NIM Cosmos | `VLM_BASE_URL`, `VLM_NAME` | `${VLM_BASE_URL}/v1` (no trailing `/v1` on the env var; the agent appends it) | `VLM_MODE` ∈ {`local`, `local_shared`, `remote`} and `VLM_BASE_URL` is non-empty |
| RT-VLM Cosmos | `RTVI_VLM_BASE_URL`, `RTVI_VLM_MODEL_TO_USE` (model identifier on the RT-VLM side, e.g. `cosmos-reason2`) | `${RTVI_VLM_BASE_URL}/v1`; alerts default `http://${HOST_IP}:8018/v1`, base default `http://${HOST_IP}:30082/v1` (`RTVI_VLM_ENDPOINT`) | `VLM_MODE=none` or `VLM_BASE_URL` empty; also the only path for `warehouse` |

Read the live values off the running agent container; do not guess:
```bash
docker exec vss-agent env | grep -E '^(VLM_BASE_URL|VLM_NAME|VLM_MODE|RTVI_VLM_BASE_URL|RTVI_VLM_ENDPOINT|RTVI_VLM_MODEL_TO_USE)='
```
Selection rule:
```bash
if [ -n "${VLM_BASE_URL}" ] && [ "${VLM_MODE}" != "none" ]; then
  VLM_ENDPOINT="${VLM_BASE_URL%/}/v1"
  VLM_MODEL="${VLM_NAME}"
else
  VLM_ENDPOINT="${RTVI_VLM_ENDPOINT:-${RTVI_VLM_BASE_URL%/}/v1}"
  VLM_MODEL="${RTVI_VLM_MODEL_TO_USE}"
fi
```
Probe `/v1/models` before sending a chat request to confirm the chosen endpoint is alive and the model is loaded:
```bash
curl -sf --max-time 5 "${VLM_ENDPOINT}/models" | jq -r '.data[].id'
```
If the probe fails or the listed ids don't include `${VLM_MODEL}`, fall back to the other backend, or surface the error. Never silently pick a model that isn't on the server.
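
The "listed ids include the model" check can be made explicit with an exact-line match, so that a model whose id merely starts with the wanted name does not pass. The helper name `model_listed` is hypothetical. It reads the id list from stdin, which is what the `jq -r '.data[].id'` probe prints.

```bash
# Hypothetical helper: succeed only if the wanted model id appears as a whole
# line in the id list read from stdin (-F fixed string, -x whole line, -q quiet).
model_listed() {
  grep -Fxq -- "$1"
}

# Usage against a live endpoint, reusing the probe from this step:
#   curl -sf --max-time 5 "${VLM_ENDPOINT}/models" | jq -r '.data[].id' \
#     | model_listed "${VLM_MODEL}" \
#     || echo "model ${VLM_MODEL} is not served at ${VLM_ENDPOINT}" >&2
```

A failed `curl` yields an empty list, so the same check covers both "endpoint down" and "model missing".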

Step 3 — Call the VLM directly

Use the OpenAI-compatible `chat/completions` endpoint with a `video_url` content block. This is the same payload shape that `video_understanding` builds in `src/vss_agents/tools/video_understanding.py` (`_build_vlm_messages`):
```bash
PROMPT='Describe in detail what happens in the video, with timestamps (start–end in seconds from clip start) for each segment or event. Cover scenes, objects, people, vehicles, and notable actions.'

# Cosmos Reason 2 reasoning prompt suffix: matches video_understanding.py for is_cosmos_reason2 + reasoning=true.
# Drop this suffix for non-cosmos-reason2 VLMs.
PROMPT="${PROMPT}
Answer the question using the following format:
<think> Your reasoning. </think>
Write your final answer immediately after the </think> tag."

# jq -Rs turns the prompt into a JSON string, so quotes and newlines are escaped.
curl -s -X POST "${VLM_ENDPOINT}/chat/completions" \
  -H "Content-Type: application/json" \
  -d @- <<EOF | jq -r '.choices[0].message.content'
{
  "model": "${VLM_MODEL}",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": $(jq -Rs . <<< "${PROMPT}")},
        {"type": "video_url", "video_url": {"url": "${VIDEO_URL}"}}
      ]
    }
  ],
  "max_tokens": 1024,
  "temperature": 0.0
}
EOF
```

If the VLM returns a `<think>…</think>` block (Cosmos Reason reasoning mode), keep only the text after `</think>` as the report body.
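
Dropping the reasoning block can be done with shell parameter expansion alone. The sketch below assumes the VLM output arrives on stdin. The function name `strip_think` is hypothetical.

```bash
# Hypothetical helper: keep only the text after the last </think> tag and trim
# leading whitespace. Output without a </think> tag passes through unchanged.
strip_think() {
  local s
  s=$(cat)
  case "$s" in
    *"</think>"*) s=${s##*"</think>"} ;;   # remove everything up to the tag
  esac
  s="${s#"${s%%[![:space:]]*}"}"           # trim leading spaces and newlines
  printf '%s\n' "$s"
}
```

Piping the `jq -r '.choices[0].message.content'` output through `strip_think` yields the report body for Step 4.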

Step 4 — Fill the Video Analysis Report template

```markdown
# Video Analysis Report

## Basic Information

| Field | Value |
|---|---|
| Report Identifier | vss_report_<YYYYMMDD_HHMMSS> |
| Date of Analysis | <YYYY-MM-DD> |
| Time of Analysis | <HH:MM:SS> |
| Video Source | <sensor_id or filename> |
| Clip Range | <startTime> – <endTime> |
| Clip URL | <BROWSER_CLIP_URL> (apply the `$VSS_PUBLIC_HOST:$VSS_PUBLIC_PORT` rewrite; NEVER paste the raw `HOST_IP:30888` URL here) |
| VLM | <VLM_MODEL (NIM or RT-VLM)> |
| Analysis Request | <user's request> |

## Analysis Results

<VLM output: timestamped caption / summary>
```

Return the rendered markdown to the user.
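
The identifier, date, and time fields can be filled from a single `date` call so that all three describe the same instant. This is a sketch under one assumption: the template does not say which time zone to use, so UTC is chosen here. The variable names are illustrative.

```bash
# One date call, split into three fields by whitespace.
# Assumption: UTC (the template does not specify a time zone).
read -r STAMP ANALYSIS_DATE ANALYSIS_TIME <<EOF
$(date -u '+%Y%m%d_%H%M%S %Y-%m-%d %H:%M:%S')
EOF
REPORT_ID="vss_report_${STAMP}"

echo "${REPORT_ID} | ${ANALYSIS_DATE} | ${ANALYSIS_TIME}"
```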

---

Mode B — Report on incidents in a time range

Step 1 — Resolve the time range and (optionally) sensor

  • `start_time` / `end_time` must be ISO 8601 UTC (`YYYY-MM-DDTHH:MM:SS.sssZ`). Resolve relative phrases ("last hour", "today") against the current host clock.
  • If the user names a sensor, capture it as `source` + `source_type=sensor`. Otherwise leave both unset for an all-sensors query.
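
Resolving relative phrases might look like the sketch below. It assumes GNU `date` (`-d` and `%3N` are GNU extensions), and it interprets "today" as starting at local midnight on the host clock, which is an assumption. The variable names are illustrative.

```bash
# Assumes GNU date. ISO 8601 UTC with millisecond precision.
ISO_FMT='+%Y-%m-%dT%H:%M:%S.%3NZ'

# Take one reading of the host clock so start and end are consistent.
NOW=$(date +%s)

# "last hour": from 3600 seconds ago until now.
start_time=$(date -u -d "@$((NOW - 3600))" "$ISO_FMT")
end_time=$(date -u -d "@${NOW}" "$ISO_FMT")

# "today": from local midnight (assumption) until now, expressed in UTC.
today_start=$(date -u -d "@$(date -d 'today 00:00' +%s)" "$ISO_FMT")

echo "last hour: ${start_time} to ${end_time}"
echo "today:     ${today_start} to ${end_time}"
```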

Step 2 — Fetch incidents via `/vss-query-analytics`

Hand off to `/vss-query-analytics` (initialize → `tools/call`) with:
```json
{
  "name": "video_analytics__get_incidents",
  "arguments": {
    "source": "<sensor-id-or-omit>",
    "source_type": "sensor",
    "start_time": "<ISO>",
    "end_time": "<ISO>",
    "max_count": 100,
    "includes": ["objectIds", "info"]
  }
}
```
For each incident keep: `id`, `sensorId`, `timestamp`, `end`, `category`, `place.name`, `info.verdict`, `info.reasoning`, `objectIds`, and the clip URL (commonly `info.clip_url`, `clip_url`, or whichever clip-pointer field the response carries). Apply the `$VSS_PUBLIC_HOST:$VSS_PUBLIC_PORT` rewrite (see Browser-playable clip URL above) to every clip URL before pasting it into the report: the raw value is a `HOST_IP:30888` URL the user's browser cannot reach.
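
To keep the all-sensors case consistent with Step 1 (both `source` and `source_type` left unset), the request body can be assembled conditionally. This is a minimal sketch: the function name `incident_request` is hypothetical, and it assumes the sensor id contains no characters that need JSON escaping.

```bash
# Hypothetical helper: print the tools/call params for get_incidents.
# Args: start_time end_time [sensor]. Omits source/source_type without a sensor.
incident_request() {
  local scope=""
  if [ -n "${3:-}" ]; then
    scope=$(printf '"source": "%s", "source_type": "sensor", ' "$3")
  fi
  printf '{"name": "video_analytics__get_incidents", "arguments": {%s"start_time": "%s", "end_time": "%s", "max_count": 100, "includes": ["objectIds", "info"]}}\n' \
    "$scope" "$1" "$2"
}
```

For example, `incident_request "$start_time" "$end_time"` produces an all-sensors query, and adding a third argument scopes it to one sensor.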

Step 3 — Fill the Incident Range Report template

Group by sensor (or by category if no sensor scope), tally verdicts, and list each incident as a bullet with timestamp / category / verdict / reasoning.

```markdown
# Incident Range Report

## Basic Information

| Field | Value |
|---|---|
| Report Identifier | vss_report_<YYYYMMDD_HHMMSS> |
| Range | <start_time> – <end_time> |
| Scope | <sensor_id> |
| Total Incidents | <N> |
| Confirmed / Rejected / Unverified | <c> / <r> / <u> |

## Incidents

### <sensor_id_or_category>

- <timestamp> — <category> — verdict: <confirmed|rejected|unverified>
  - <info.reasoning (1–2 lines)>
  - clip: <rewritten URL> (omit row when the incident carries no clip URL; never paste a raw `HOST_IP:30888` URL)
  - objects: <objectIds joined>

## Summary

<2–4 sentences synthesizing what dominates the range: top categories, sensors with the most confirmed incidents, any clusters in time.>
```

If `get_incidents` returns zero results, return a one-line report stating that the range and scope produced no incidents. Do not invent content and do not fall back to Mode A.
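
The verdict tally for the Basic Information table can be computed with a short `awk` filter. The sketch assumes one verdict per line on stdin and treats anything other than `confirmed` or `rejected`, including an empty line, as unverified. That mapping is an assumption, and the function name `tally_verdicts` is hypothetical.

```bash
# Hypothetical helper: read one verdict per line and print "<c> / <r> / <u>".
tally_verdicts() {
  awk '
    $0 == "confirmed" { c++; next }
    $0 == "rejected"  { r++; next }
                      { u++ }        # everything else counts as unverified
    END { printf "%d / %d / %d\n", c, r, u }
  '
}

printf 'confirmed\nrejected\nconfirmed\n\n' | tally_verdicts
# prints 2 / 1 / 1
```

An empty input prints `0 / 0 / 0`, which corresponds to the zero-result case where only the one-line report is returned.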

---

Cross-Reference

  • `/vss-manage-video-io-storage`: sensor list, timelines, and clip URL for Mode A Step 1.
  • `/vss-query-analytics`: incident retrieval (and verdict / reasoning enrichment) for Mode B Step 2.
  • `/vss-ask-video`: ad-hoc VLM Q&A on a single clip (not a structured report).
  • `/vss-summarize-video`: used by Mode A to produce the summary body when the `lvs` profile is deployed; the report template (Step 4) is still filled here.