vss-generate-video-report
Generate a video analysis report by routing to one of two backends, never via `POST /generate` on the VSS agent.

| Mode | Trigger | Backend |
|---|---|---|
| A. Video clip | "report on | |
| B. Incident range | "report on incidents from | |
If the request is ambiguous (e.g. "report on `<sensor>`" with no time range and no incident wording), default to Mode A. Ask only if the user mentions both a sensor and a time range.
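The routing rule can be sketched as a small shell function. This is only one reading of the rule: `pick_mode` and its three yes/no arguments are hypothetical, and deciding whether a request actually contains incident wording, a sensor, or a time range is left to the caller.

```bash
# Hypothetical helper: route a request to Mode A, Mode B, or a clarifying question.
# Arguments are "yes"/"no" flags: incident wording, sensor named, time range given.
pick_mode() {
  has_incident_wording=$1
  has_sensor=$2
  has_time_range=$3

  if [ "$has_incident_wording" = "yes" ]; then
    echo "B"      # incident / alert wording: incident range report
  elif [ "$has_sensor" = "yes" ] && [ "$has_time_range" = "yes" ]; then
    echo "ask"    # sensor + time range without incident wording: confirm with the user
  else
    echo "A"      # ambiguous requests default to the video clip report
  fi
}

pick_mode no yes no    # prints: A
pick_mode yes no yes   # prints: B
pick_mode no yes yes   # prints: ask
```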
When to Use
- "Generate a report for this video" / "for `<sensor-id>`" (Mode A)
- "Create an analysis report on the uploaded video" (Mode A)
- "Report on incidents from 12:31Z to 12:32Z" (Mode B)
- "Summarize alerts on `<sensor>` between `<t1>` and `<t2>`" (Mode B)
Deployment prerequisite
Mode A needs the VSS base profile (VST + VLM NIM).
Mode B needs the VSS alerts profile (VA-MCP + Elasticsearch).
Probe:
```bash
# Mode A: VST + VLM reachability
curl -sf --max-time 5 "http://${HOST_IP}:30888/vst/api/v1/sensor/version" >/dev/null
# Mode B: VA-MCP
curl -sf --max-time 5 "http://${HOST_IP}:9901/" >/dev/null
```
If the probe fails, hand off to `/vss-deploy-profile` with `-p base` (Mode A) or `-p alerts` (Mode B). **Always** confirm the deploy with the user first.
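The probe-then-hand-off flow can be sketched as follows. The function names (`probe_url_for_mode`, `profile_for_mode`, `check_mode`) are hypothetical; the URLs and the `-p base` / `-p alerts` profile names come from the text above. The sketch only reports what to propose and never deploys on its own.

```bash
# Hypothetical helpers around the two probes; HOST_IP must already be set.
probe_url_for_mode() {
  case "$1" in
    A) echo "http://${HOST_IP}:30888/vst/api/v1/sensor/version" ;;
    B) echo "http://${HOST_IP}:9901/" ;;
    *) return 1 ;;
  esac
}

profile_for_mode() {
  case "$1" in
    A) echo "base" ;;
    B) echo "alerts" ;;
    *) return 1 ;;
  esac
}

# Prints "ready" when the probe succeeds, otherwise the hand-off to propose.
check_mode() {
  url=$(probe_url_for_mode "$1") || return 2
  if curl -sf --max-time 5 "$url" >/dev/null; then
    echo "ready"
  else
    echo "not ready: ask the user before running /vss-deploy-profile -p $(profile_for_mode "$1")"
  fi
}
```

For example, `check_mode B` probes VA-MCP on port 9901 and, on failure, names the `alerts` profile in the message shown to the user.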
---
Browser-playable clip URL (always do this before embedding any clip in the report)
VST returns clip URLs using the agent-internal host:port (`${HOST_IP}:30888`). Those work in-cluster (VLM frame pulls, agent backend) but the user's browser cannot reach them. The deploy layer already exports the browser-facing host:port as `$VSS_PUBLIC_HOST` / `$VSS_PUBLIC_PORT` (and scheme as `$VSS_PUBLIC_HTTP_PROTOCOL`) in every profile `.env`, Brev or bare-metal, so the rewrite is a one-liner:
```bash
BROWSER_CLIP_URL=$(echo "$RAW_URL" | sed -E "s|^https?://[^/]+|${VSS_PUBLIC_HTTP_PROTOCOL}://${VSS_PUBLIC_HOST}:${VSS_PUBLIC_PORT}|")
```
Apply it to every clip URL surfaced in the rendered report (Mode A Step 4 Clip URL row; Mode B per-incident clip sub-bullet). Leave the VLM `video_url` content block in Mode A Step 3 on the original internal URL, because the VLM is in-cluster.
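As a worked example, the same substitution can be wrapped in a function and run on a sample URL. The function name `rewrite_clip_url`, the sample path, and the three variable values are illustrative assumptions; real deployments read the variables from the profile `.env`.

```bash
# Hypothetical wrapper around the sed one-liner above.
# Only scheme://host:port is replaced; path and query string are preserved.
rewrite_clip_url() {
  printf '%s\n' "$1" | sed -E "s|^https?://[^/]+|${VSS_PUBLIC_HTTP_PROTOCOL}://${VSS_PUBLIC_HOST}:${VSS_PUBLIC_PORT}|"
}

# Example values only.
VSS_PUBLIC_HTTP_PROTOCOL=https
VSS_PUBLIC_HOST=vss.example.com
VSS_PUBLIC_PORT=443

rewrite_clip_url "http://10.0.0.5:30888/vst/storage/clip.mp4?token=abc"
# prints: https://vss.example.com:443/vst/storage/clip.mp4?token=abc
```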
Mode A: Report on a recorded video clip
If the VSS `lvs` profile is deployed (`curl -sf --max-time 5 "http://${HOST_IP}:38111/v1/ready"` returns HTTP 200), run `/vss-summarize-video` to produce the summary, then paste its output into the report template in Step 4 and skip Steps 1–3 (the VLM-direct path). Run Steps 1–3 only when `/v1/ready` is non-200.
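A minimal sketch of that branch, assuming the HTTP status is captured with curl's `-w '%{http_code}'` option. The helper names `lvs_status` and `mode_a_path` are hypothetical.

```bash
# Hypothetical helper: map the /v1/ready HTTP status to the Mode A path to take.
mode_a_path() {
  if [ "$1" = "200" ]; then
    echo "summarize"    # lvs profile is up: run /vss-summarize-video, then go to Step 4
  else
    echo "vlm-direct"   # run Steps 1-3
  fi
}

# curl reports 000 when the connection fails, which also selects the VLM-direct path.
lvs_status() {
  curl -s -o /dev/null --max-time 5 -w '%{http_code}' "http://${HOST_IP}:38111/v1/ready"
}

# Usage: mode_a_path "$(lvs_status)"
```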
Step 1: Resolve the clip URL
Hand off to `/vss-manage-video-io-storage` to:
- List sensors and confirm the named `<sensor-id>` exists (upload first if not).
- Fetch `/storage/<streamId>/timelines` for the recorded range when the user did not supply `startTime` / `endTime`.
- Request a clip URL:
  ```bash
  curl -s "http://${HOST_IP}:30888/vst/api/v1/storage/file/<streamId>/url?startTime=<startTime>&endTime=<endTime>&container=mp4&disableAudio=true" | jq -r .videoUrl
  ```

That gives a direct `mp4` URL that the VLM can pull frames from. Bind it to `VIDEO_URL` (used in-cluster by the VLM in Step 3) and rewrite to `BROWSER_CLIP_URL` for the Step 4 report template using the one-liner from Browser-playable clip URL above, because the user's browser cannot reach `$VIDEO_URL` directly.

Mode A requires the selected VLM endpoint to be able to fetch `VIDEO_URL`. Local NIM/RT-VLM deployments normally can; remote endpoints generally cannot fetch `localhost`, private `HOST_IP`, or VST-internal URLs. If the live `VLM_ENDPOINT` is remote, surface that reachability requirement instead of making a chat request that will fail after `/v1/models` succeeds.
Step 2: Resolve VLM endpoint and model
The deploy may serve the VLM through either of two stacks. Both expose an OpenAI-compatible `chat/completions` API; pick whichever is live:

| Backend | Env vars | Typical host endpoint | Picked when |
|---|---|---|---|
| NIM Cosmos | | | |
| RT-VLM Cosmos | | | |
Read the live values off the running agent container; do not guess:
```bash
docker exec vss-agent env | grep -E '^(VLM_BASE_URL|VLM_NAME|VLM_MODE|RTVI_VLM_BASE_URL|RTVI_VLM_ENDPOINT|RTVI_VLM_MODEL_TO_USE)='
```
Selection rule:
```bash
if [ -n "${VLM_BASE_URL}" ] && [ "${VLM_MODE}" != "none" ]; then
  VLM_ENDPOINT="${VLM_BASE_URL%/}/v1"
  VLM_MODEL="${VLM_NAME}"
else
  VLM_ENDPOINT="${RTVI_VLM_ENDPOINT:-${RTVI_VLM_BASE_URL%/}/v1}"
  VLM_MODEL="${RTVI_VLM_MODEL_TO_USE}"
fi
```
Probe `/v1/models` before sending a chat request to confirm the chosen endpoint is alive and the model is loaded:
```bash
curl -sf --max-time 5 "${VLM_ENDPOINT}/models" | jq -r '.data[].id'
```
If the probe fails or the listed ids don't include `${VLM_MODEL}`, fall back to the other backend (or surface the error; never silently pick a model that isn't on the server).
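The "listed ids include the model" check can be made explicit with an exact-line match, so that a model id that is merely a prefix of a served id does not pass. The helper name `model_is_listed` is hypothetical; it expects one id per line on stdin, which is what the `jq -r '.data[].id'` probe prints.

```bash
# Hypothetical helper: succeed only if the exact model id appears on stdin (one id per line).
model_is_listed() {
  grep -Fxq -- "$1"
}

# Usage against the live endpoint (VLM_ENDPOINT and VLM_MODEL come from the selection rule):
#   curl -sf --max-time 5 "${VLM_ENDPOINT}/models" | jq -r '.data[].id' \
#     | model_is_listed "${VLM_MODEL}" \
#     || echo "model ${VLM_MODEL} is not served by ${VLM_ENDPOINT}" >&2
```

If the curl probe itself fails, nothing reaches stdin and the check fails too, so both failure cases from the text end up on the same fallback branch.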
Step 3: Call the VLM directly
Use the OpenAI-compatible `chat/completions` endpoint with a `video_url` content block, the same payload shape `video_understanding` builds in `src/vss_agents/tools/video_understanding.py` (`_build_vlm_messages`):
```bash
PROMPT='Describe in detail what happens in the video, with timestamps (start–end in seconds from clip start) for each segment or event. Cover scenes, objects, people, vehicles, and notable actions.'

# Cosmos Reason 2 reasoning prompt suffix; matches video_understanding.py for is_cosmos_reason2 + reasoning=true.
# Drop this suffix for non-cosmos-reason2 VLMs.
PROMPT="${PROMPT}
Answer the question using the following format:
<think>
Your reasoning.
</think>
Write your final answer immediately after the </think> tag."

# The heredoc is expanded by the shell, so jq -Rs JSON-escapes the prompt text in place.
curl -s -X POST "${VLM_ENDPOINT}/chat/completions" \
  -H "Content-Type: application/json" \
  -d @- <<EOF | jq -r '.choices[0].message.content'
{
  "model": "${VLM_MODEL}",
  "messages": [
    {
      "role": "user",
      "content": [
        {"type": "text", "text": $(jq -Rs . <<< "${PROMPT}")},
        {"type": "video_url", "video_url": {"url": "${VIDEO_URL}"}}
      ]
    }
  ],
  "max_tokens": 1024,
  "temperature": 0.0
}
EOF
```
If the VLM returns a `<think>…</think>` block (Cosmos Reason reasoning mode), keep only the text after `</think>` as the report body.
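One way to drop the reasoning block is plain shell parameter expansion. The function name `strip_think` is hypothetical; the sketch assumes the whole VLM reply is piped in on stdin, and it passes replies without a `</think>` tag through unchanged.

```bash
# Hypothetical helper: keep only the text after the last </think> tag.
strip_think() {
  text=$(cat)
  # Remove the longest prefix ending in </think>; a reply without the tag is left as-is.
  body=${text##*"</think>"}
  # Drop the blank lines left behind directly after the tag.
  printf '%s\n' "$body" | sed -e '/./,$!d'
}

printf '<think>\nThe clip shows a loading dock.\n</think>\nA forklift enters at 0-4 s.\n' | strip_think
# prints: A forklift enters at 0-4 s.
```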
Step 4: Fill the Video Analysis Report template
```markdown
# Video Analysis Report
## Basic Information
| Field | Value |
|---|---|
| Report Identifier | vss_report_<YYYYMMDD_HHMMSS> |
| Date of Analysis | <YYYY-MM-DD> |
| Time of Analysis | HH:MM:SS |
| Video Source | <sensor_id or filename> |
| Clip Range | <startTime> – <endTime> |
| Clip URL | |
| VLM | <VLM_MODEL (NIM or RT-VLM)> |
| Analysis Request | <user's request> |
## Analysis Results
<VLM output: timestamped caption / summary>
```
Return the rendered markdown to the user.
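The three time-derived rows of the template can be filled from a single clock reading. This is a sketch under an assumption: it uses UTC, which the template does not specify, and `report_time_rows` is a hypothetical helper name.

```bash
# Hypothetical helper: print the time-derived rows of the Basic Information table.
# Assumption: UTC is used; the template does not name a time zone.
report_time_rows() {
  # One date call, so all three fields describe the same instant.
  set -- $(date -u '+%Y%m%d %H%M%S %Y-%m-%d %H:%M:%S')
  printf '| Report Identifier | vss_report_%s_%s |\n' "$1" "$2"
  printf '| Date of Analysis | %s |\n' "$3"
  printf '| Time of Analysis | %s |\n' "$4"
}

report_time_rows
```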
---
Mode B: Report on incidents in a time range
Step 1: Resolve the time range and (optionally) sensor
- `start_time` / `end_time` must be ISO 8601 UTC (`YYYY-MM-DDTHH:MM:SS.sssZ`). Resolve relative phrases ("last hour", "today") against the current host clock.
- If the user names a sensor, capture it as `source` + `source_type=sensor`. Otherwise leave both unset for an all-sensors query.
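Resolving a relative phrase such as "last hour" can be sketched with epoch arithmetic. This assumes GNU `date` (coreutils), which accepts `-d "@<epoch>"`; the helper name `iso_utc` is hypothetical, and the millisecond field is fixed at `.000` because the sketch works at one-second resolution.

```bash
# Assumption: GNU date (coreutils), which accepts -d "@<epoch>".
# Hypothetical helper: format epoch seconds as YYYY-MM-DDTHH:MM:SS.sssZ.
iso_utc() {
  date -u -d "@$1" +%Y-%m-%dT%H:%M:%S.000Z
}

# "last hour" resolved against the current host clock.
now=$(date -u +%s)
start_time=$(iso_utc "$((now - 3600))")
end_time=$(iso_utc "$now")
echo "${start_time} -> ${end_time}"
```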
Step 2: Fetch incidents via `/vss-query-analytics`
Hand off to `/vss-query-analytics` (initialize → `tools/call`) with:
```json
{
  "name": "video_analytics__get_incidents",
  "arguments": {
    "source": "<sensor-id-or-omit>",
    "source_type": "sensor",
    "start_time": "<ISO>",
    "end_time": "<ISO>",
    "max_count": 100,
    "includes": ["objectIds", "info"]
  }
}
```
For each incident keep: `id`, `sensorId`, `timestamp`, `end`, `category`, `place.name`, `info.verdict`, `info.reasoning`, `objectIds`, and the clip URL (commonly `info.clip_url`, `clip_url`, or whichever clip-pointer field the response carries). Apply the `$VSS_PUBLIC_HOST:$VSS_PUBLIC_PORT` rewrite (see Browser-playable clip URL above) to every clip URL before pasting it into the report; the raw value is a `HOST_IP:30888` URL the user's browser cannot reach.
Step 3: Fill the Incident Range Report template
Group by sensor (or by category if no sensor scope), tally verdicts, list each incident as a bullet with timestamp / category / verdict / reasoning.
```markdown
# Incident Range Report
## Basic Information
| Field | Value |
|---|---|
| Report Identifier | vss_report_<YYYYMMDD_HHMMSS> |
| Range | <start_time> – <end_time> |
| Scope | <sensor_id> |
| Total Incidents | <N> |
| Confirmed / Rejected / Unverified | <c> / <r> / <u> |
## Incidents
### <sensor_id_or_category>
- <timestamp> — <category> — verdict: <confirmed|rejected|unverified>
  - <info.reasoning (1–2 lines)>
  - clip: <rewritten URL> (omit row when the incident carries no clip URL; never paste a raw `HOST_IP:30888` URL)
  - objects: <objectIds joined>
- …
## Summary
<2–4 sentences synthesizing what dominates the range: top categories, sensors with the most confirmed incidents, any clusters in time.>
```
If `get_incidents` returns zero results, return a one-line report stating the range and scope produced no incidents — do not invent content and do not fall back to Mode A.
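The verdict tally for the "Confirmed / Rejected / Unverified" row can be sketched with `awk`. The helper name `tally_verdicts` is hypothetical, and it assumes the `info.verdict` values have already been extracted one per line; counting any value other than `confirmed` or `rejected` as unverified is also an assumption.

```bash
# Hypothetical helper: read one verdict per line and print the
# "Confirmed / Rejected / Unverified" cell of the report.
tally_verdicts() {
  awk '
    NF == 0           { next }        # ignore blank lines
    $0 == "confirmed" { c++; next }
    $0 == "rejected"  { r++; next }
                      { u++ }         # assumption: everything else is unverified
    END { printf "%d / %d / %d\n", c, r, u }
  '
}

printf 'confirmed\nrejected\nconfirmed\nunverified\n' | tally_verdicts
# prints: 2 / 1 / 1
```

An empty input prints `0 / 0 / 0`, which corresponds to the zero-results case where only the one-line report is returned.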
---
Cross-Reference
- `/vss-manage-video-io-storage`: sensor list, timelines, and clip URL for Mode A Step 1.
- `/vss-query-analytics`: incident retrieval (and verdict / reasoning enrichment) for Mode B Step 2.
- `/vss-ask-video`: ad-hoc VLM Q&A on a single clip (not a structured report).
- `/vss-summarize-video`: used by Mode A to produce the summary body when the `lvs` profile is deployed; the report template (Step 4) is still filled here.