vss-summarize-video

Instructions

Follow the routing tables and step-by-step workflows below. Each section that ends in workflow, quick start, or flow is intended to be executed top-to-bottom. Detailed reference material lives in `references/` and helper scripts live in `scripts/`; call them via `run_script` when the skill points to a script by name.


Examples

Worked end-to-end examples are kept under `evals/` (each `*.json` manifest contains a runnable scenario) and inline in the per-workflow `curl` blocks below. Run a Tier-3 evaluation with `nv-base validate <this-skill-dir> --agent-eval` to replay them.

You are a video summarization assistant. You call the VLM NIM or the video summarization microservice directly. Always run `curl` commands yourself; never instruct the user to run them.

Primary video workflow query type: "Summarize this video." Direct video summarization API and service-ops requests are handled by the reference-routed sections below.


Purpose

Produce a single, polished narrative summary of one recorded video clip, with timestamped events when the LVS microservice path is reachable.

Do NOT use this skill for:

- Live RTSP captioning: use `vss-deploy-dense-captioning`.
- Incident-range or alert-window reports: use `vss-generate-video-report` Mode B.
- Semantic search across the archive: use `vss-search-archive`.


Prerequisites

- VSS `lvs` profile running on `$HOST_IP` (port 38111) OR a reachable VLM/RT-VLM endpoint as a fallback. The `vss-deploy-profile` skill brings these up.
- Network reachability from the agent host to both endpoints; clip URLs from VIOS must be fetchable by the chosen backend.
- `jq` and `curl` available on the agent host.


Limitations

- Direct VLM fallback uses a single fixed prompt and cannot target scenario/events, so output quality is lower than the LVS path.
- Remote VLM endpoints generally cannot reach `localhost`/private clip URLs.
- One backend call per request; no parallel hedging or multi-pass summaries.


Troubleshooting

| Symptom | Cause | Fix |
| --- | --- | --- |
| `/v1/ready` returns 503 repeatedly | LVS service still warming up | Retry up to ~30 s as shown in Setup; if it never returns 200 the service may not be deployed |
| Empty `video_summary` and `events` | Clip does not contain the requested events | Re-run with broader `scenario` or different `events` |
| VLM returns `<think>` block | Cosmos Reason 2 reasoning mode | Strip everything up to `</think>` before rendering |
| Empty stdout from `curl /v1/ready` | Service legitimately returns 200 with empty body | Always check HTTP status with `-o /dev/null -w '%{http_code}'`, never inspect the body |

See `references/video-summarization-debugging.md` for deeper diagnostics.


Reference Map

Use these references only when the user asks for the relevant detail, or when the core workflow below needs deeper video summarization information:

- Video summarization API details: `references/video-summarization-api.md` for `/v1/summarize`, `/summarize`, `/v1/generate_captions`, `/v1/stream_summarize`, health probes, `/models`, `/recommended_config`, `/metrics`, request fields, response shapes, and API gotchas.
- Video summarization service configuration and ops: `references/video-summarization-deployment.md` for the VSS `lvs` profile, ports, required env vars, logs, status, dry-runs, teardown, model/backend swaps, Elasticsearch/Neo4j/ArangoDB backend selection, and service-level troubleshooting.
- Extended video summarization ops references: `references/video-summarization-environment-variables.md`, `references/video-summarization-debugging.md`, and `assets/video-summarization.env.example`.

Load `video-summarization-api.md` only when you need a request field, response shape, or endpoint that is not already covered by the Step 2 LVS or fallback VLM example below, or when handling a direct video summarization API request. Load `video-summarization-deployment.md` only for deployment, configuration, or service operations.


Video Summarization API And Service Ops Requests

If the user asks to call or debug video summarization endpoints directly, answer from `references/video-summarization-api.md` instead of running the end-to-end video summarization workflow. Examples: list video summarization models, check readiness, get recommended chunking config, inspect metrics, explain a 422 response, or build a `/v1/summarize` request body.

If the user asks to configure, deploy, restart, tear down, or troubleshoot the video summarization service, prefer the `vss-deploy-profile` skill for full VSS profile deployment and use `references/video-summarization-deployment.md` for video summarization-specific service details.


Routing

Decide purely from video summarization service availability (probed in Setup → Availability checks below). Duration does not drive routing.

| `/v1/ready` | Backend | Endpoint |
| --- | --- | --- |
| HTTP 200 | LVS microservice with HITL | `POST ${LVS_BACKEND_URL}/v1/summarize` |
| Anything else | VLM / RT-VLM with the default prompt + fallback note | `POST ${VLM_BASE_URL}/v1/chat/completions` |

Fallback message when the LVS service is unreachable — copy verbatim above the summary:

⚠ Note: Input video `<name>` is `<N>`s long. The video summarization service is not deployed, so this summary was produced by the VLM alone with a generic default prompt. Deploy the `lvs` profile for higher-quality summaries with scenario/events targeting.


Deployment prerequisite

The VSS `lvs` profile on `$HOST_IP` is the primary backend. If the `/v1/ready` probe (see Setup → Availability checks) returns anything other than 200 after the warmup retries, ask the user:

"The VSS `lvs` profile isn't running on `$HOST_IP`. Shall I deploy it now using the `/vss-deploy-profile` skill with `-p lvs`? Reply `no` to summarize with the VLM-only fallback instead (lower quality, no scenario/events targeting)."

- Yes → hand off to `/vss-deploy-profile`, then re-probe and continue with Step 2 (LVS + HITL).
- No → go straight to Step 2 fallback (VLM with default prompt) and prepend the Routing fallback note. Do not ask again, and do not run scenario/events HITL.
- Pre-authorized to deploy autonomously (caller said so explicitly) → skip the confirmation and invoke `/vss-deploy-profile` directly.
- Pre-authorized to use VLM fallback ("skip lvs, just use the VLM") → go straight to Step 2 fallback without prompting.


Setup

Endpoints (defaults for a local VSS `lvs` deployment):

- VLM / RT-VLM: `${VLM_BASE_URL}`, default `${RTVI_VLM_BASE_URL:-http://${HOST_IP:-localhost}:8018}`
- LVS service: `${LVS_BACKEND_URL}`, default `http://${HOST_IP:-localhost}:38111`
- VIOS: owned by `vss-manage-video-io-storage`; refer there.

Use env vars when set (strip a trailing `/v1` from the VLM base; the skill appends it). Otherwise use the defaults. If neither works, ask the user; do not scan ports or read config files to guess.
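The `/v1` stripping can be sketched as a small shell function. This is a minimal illustration, not part of the skill: `normalize_vlm_base` is a hypothetical name, and the extra removal of one trailing slash is an assumption added for robustness.

```bash
# Hypothetical helper, for illustration only.
# Removes one trailing slash and then one trailing /v1, so that
# "/v1/..." paths can be appended exactly once.
normalize_vlm_base() {
  local base="${1%/}"          # drop one trailing slash, if present
  printf '%s\n' "${base%/v1}"  # drop one trailing /v1, if present
}

normalize_vlm_base "http://localhost:8018/v1"   # prints http://localhost:8018
```

A base URL without the suffix passes through unchanged, so the function is safe to apply to both the env var value and the default.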

Model name: read `${VLM_NAME}` (default `nim_nvidia_cosmos-reason2-8b_0303-fp8-dynamic-kv8`). It must match the id RT-VLM `/v1/models` advertises; do not substitute the friendly `nvidia/cosmos-reason2-8b`.

For endpoint schemas, optional fields, response envelopes, and error handling, see `references/video-summarization-api.md`.

Availability checks (run both before routing). Readiness is determined by the HTTP status code only: the LVS `/v1/ready` may legitimately return `200` with an empty body, so do not inspect the body.

```bash
VLM="${VLM_BASE_URL:-${RTVI_VLM_BASE_URL:-http://${HOST_IP:-localhost}:8018}}"
VLM="${VLM%/v1}"

# VLM / RT-VLM: 200 on /v1/models
vlm_code=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 3 \
  "$VLM/v1/models")
[ "$vlm_code" = "200" ] && echo "VLM OK" || echo "VLM not reachable (HTTP $vlm_code)"

# Video summarization service: 200 on /v1/ready, with retry on 503 (warmup) for up to ~30s
VIDEO_SUMMARIZATION_URL=${LVS_BACKEND_URL:-http://${HOST_IP:-localhost}:38111}
video_sum_code=000
for i in $(seq 1 10); do
  video_sum_code=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 3 \
    "$VIDEO_SUMMARIZATION_URL/v1/ready")
  case "$video_sum_code" in
    200) echo "video summarization OK"; break ;;
    503) sleep 3 ;;   # warming up; keep polling
    *)   break ;;     # any other code = not reachable, stop retrying
  esac
done
[ "$video_sum_code" = "200" ] || echo "video summarization service not reachable (HTTP $video_sum_code)"
```


**How to interpret the results:**

- `video_sum_code = 200` → **Step 2 (LVS + HITL)** for every video.
- `video_sum_code != 200`, `vlm_code = 200` → **Step 2 fallback (VLM)**; prepend the Routing fallback note.
- `vlm_code != 200` → fail; at least one backend must be reachable.
- A non-200 LVS code after the retry loop is the ONLY signal of unavailability. Empty stdout or missing JSON fields are NOT "unavailable."
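
The interpretation rules above can be sketched as a pure function of the two probe codes. This is a minimal illustration under stated assumptions: `choose_backend` and the route names `lvs`, `vlm-fallback`, and `none` are hypothetical, and the sketch assumes the rules are evaluated in the order listed, so an LVS 200 wins first.

```bash
# Hypothetical helper, for illustration only.
# $1 = video_sum_code from /v1/ready, $2 = vlm_code from /v1/models.
choose_backend() {
  local video_sum_code="$1" vlm_code="$2"
  if [ "$video_sum_code" = "200" ]; then
    echo "lvs"            # Step 2 (LVS + HITL)
  elif [ "$vlm_code" = "200" ]; then
    echo "vlm-fallback"   # Step 2 fallback; prepend the Routing fallback note
  else
    echo "none"           # no backend reachable
    return 1
  fi
}

choose_backend 503 200   # prints vlm-fallback
```

Keeping the decision separate from the probes means the routing can be checked without any service running.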

---


Step 1 - Get the clip URL via `vss-manage-video-io-storage` (sub-task, NOT the final answer)


Use the `vss-manage-video-io-storage` skill for all VIOS interactions; it owns the canonical curl recipes, parameter defaults, and delete/upload flows. Do not fabricate URLs or hand-roll VIOS calls; they will drift.

This step is a sub-task: do NOT end your turn here, and do NOT return the clip URL as the final answer. From VIOS collect three values:

- `streamId` (via `sensor/list` → `sensor/<id>/streams`, or directly from an upload response).
- Timeline: `{startTime, endTime}` (ISO 8601 UTC). `endTime - startTime` is the duration; needed only for the user-facing header (routing is driven solely by `/v1/ready`).
- Temporary MP4 clip URL: the `/storage/file/<streamId>/url` variant with `container=mp4`. Response field: `.videoUrl`. Both backends need an HTTP(S) URL they can `GET`.
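
The duration arithmetic for the timeline can be sketched as follows. This is a minimal illustration that assumes GNU `date` (the `-d` flag is not available in BSD/macOS `date`); `clip_duration_s` is a hypothetical name and the timestamps are made-up sample values, not VIOS output.

```bash
# Hypothetical helper, for illustration only; assumes GNU date.
# $1 = startTime, $2 = endTime (ISO 8601 UTC). Prints whole seconds.
clip_duration_s() {
  local start end
  start=$(date -u -d "$1" +%s) || return 1
  end=$(date -u -d "$2" +%s) || return 1
  echo $(( end - start ))
}

clip_duration_s "2025-01-01T00:00:00Z" "2025-01-01T00:03:30Z"   # prints 210
```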

Everything else (auth, upload, `disableAudio`, expiry, etc.) lives in the `vss-manage-video-io-storage` skill; refer users there if VIOS fails.


Step 2 — Primary: video summarization microservice with HITL

Use this path whenever `/v1/ready` returned 200 in Setup. Duration is irrelevant.

For advanced fields (`media_info`, `schema`, structured output, stream captioning, metrics, recommended config) see `references/video-summarization-api.md`.


HITL: collect scenario and events first (REQUIRED — do not skip)

Full walk-through is in `references/hitl-prompts.md`. Always run HITL before calling the LVS service.

Autonomous-mode defaults. When the caller has bypassed HITL ("run autonomously without prompting") AND the original query asks for `default` / `defaults` (or gives none), use `scenario="activity monitoring"` and `events=["notable activity"]` verbatim; do not infer from filename or sensor name. Note the defaults in the final reply and offer a re-run with more specific parameters. This is the ONLY supported HITL bypass; "the video is short" or "the user seems in a hurry" are not valid reasons.

Prefer `POST /v1/summarize` (3.2 GA route); `/summarize` is a compatibility alias.

```bash
VIDEO_SUMMARIZATION_URL=${LVS_BACKEND_URL:-http://${HOST_IP:-localhost}:38111}

# From HITL reply:
SCENARIO='warehouse monitoring'
EVENTS_JSON='["notable activity"]'
OBJECTS_JSON=''   # '' to omit, else '["forklifts","pallets","workers"]'

curl -s -X POST "$VIDEO_SUMMARIZATION_URL/v1/summarize" \
  -H "Content-Type: application/json" \
  -d "$(jq -n \
        --arg url "<clip_url_from_vss_manage_video_io_storage>" \
        --arg model "${VLM_NAME:-nim_nvidia_cosmos-reason2-8b_0303-fp8-dynamic-kv8}" \
        --arg scenario "$SCENARIO" \
        --argjson events "$EVENTS_JSON" \
        --argjson objects "${OBJECTS_JSON:-null}" \
        '{
          url: $url,
          model: $model,
          scenario: $scenario,
          events: $events,
          chunk_duration: 10,
          num_frames_per_second_or_fixed_frames_chunk: 20,
          use_fps_for_chunking: false,
          seed: 1
        } + (if $objects == null then {} else {objects_of_interest: $objects} end)')" \
  | jq -r '.choices[0].message.content' \
  | jq '{video_summary, events}'
```


If both `video_summary` and `events` are empty, the clip probably doesn't contain the requested events — re-run with broader `scenario`/`events`, don't report "no content".

**Tuning:** `chunk_duration` (default `10`s; `0` = single chunk),
`num_frames_per_second_or_fixed_frames_chunk` (default `20`; meaning depends
on `use_fps_for_chunking`), `seed` (default `1`). `num_frames_per_chunk` is
deprecated.

---


Step 2 fallback — VLM direct with default prompt

Use this path only when `/v1/ready` did not return 200 after warmup. Do NOT run HITL: the user did not opt in; you fell back because the service was missing. Prepend the Routing fallback note to the response.

```bash
VLM="${VLM_BASE_URL:-${RTVI_VLM_BASE_URL:-http://${HOST_IP:-localhost}:8018}}"
VLM="${VLM%/v1}"
PROMPT='Describe in detail what is happening in this video,
including all visible people, vehicles, equipments, objects,
actions, and environmental conditions.
OUTPUT REQUIREMENTS:
[timestamp-timestamp] Description of what is happening.
EXAMPLE:
[0.0s-4.0s] <description of the first event>
[4.0s-12.0s] <description of the second event>'

curl -s -X POST "$VLM/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d "$(jq -n \
        --arg model "${VLM_NAME:-nim_nvidia_cosmos-reason2-8b_0303-fp8-dynamic-kv8}" \
        --arg text "$PROMPT" \
        --arg url "<clip_url_from_vss_manage_video_io_storage>" \
        '{
          model: $model,
          temperature: 0.0,
          max_tokens: 1024,
          messages: [{
            role: "user",
            content: [
              {type: "text", text: $text},
              {type: "video_url", video_url: {url: $url}}
            ]
          }]
        }')" | jq -r '.choices[0].message.content'
```

Response: standard OpenAI chat-completion envelope. The summary is in `choices[0].message.content`.

Cosmos-model notes: Cosmos Reason 2 supports reasoning via `<think>...</think><answer>...</answer>` blocks. Omit the reasoning instructions if you want a plain summary. Frame sampling and pixel limits are applied server-side; no client-side prep is required when you pass a `video_url`.


End-to-end example

See `references/end-to-end-example.md` for the full LVS-or-VLM-fallback script that probes `/v1/ready` and runs the appropriate path.


Responses

- VLM returns an OpenAI chat-completion envelope; summary is `choices[0].message.content`.
- LVS service returns the same envelope but `content` is a JSON string; run `jq -r '.choices[0].message.content' | jq` to reach `{video_summary, events}`.
- Errors surface as HTTP non-2xx plus JSON `{error: ...}`. LVS `503` usually means warmup; retry `/v1/ready`.
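
The double unwrap of LVS output can be tried offline against a canned envelope. This is a minimal sketch: the `sample` payload is hypothetical (real responses carry more fields, and the value types of `start_time` and `end_time` are assumed here), and `unwrap_lvs` is an illustrative name.

```bash
# Hypothetical sample envelope; "content" is a JSON document encoded as a string.
sample='{"choices":[{"message":{"content":"{\"video_summary\":\"A forklift moves pallets.\",\"events\":[{\"start_time\":0.0,\"end_time\":4.0,\"type\":\"notable activity\",\"description\":\"Forklift enters the aisle.\"}]}"}}]}'

# First jq unwraps the chat-completion envelope (raw string output),
# second jq parses that string and keeps the two fields of interest.
unwrap_lvs() {
  jq -r '.choices[0].message.content' | jq -c '{video_summary, events}'
}

printf '%s' "$sample" | unwrap_lvs | jq -r '.video_summary'
```

The `-r` on the first `jq` matters: without it the content stays a quoted JSON string and the second `jq` sees a string rather than an object.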


Presenting the output to the user

Surface backend output with minimal transformation — do not paraphrase, re-voice, add emojis, or reformat. One backend call → one rendering: no parallel hedging, no duplicate headers, never call both LVS and VLM for the same video.

Header line. Start with exactly one: `Summary of <video_name> (<duration>)`. `<duration>` is `Ns` for `< 60 s`, else `Mm Ss` (e.g. `3m 30s`).
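
The duration formatting rule can be sketched as a small function. This is an illustration only: `fmt_duration` is a hypothetical name, it takes whole seconds, and the clip name in the usage line is made up.

```bash
# Hypothetical helper, for illustration only.
# $1 = duration in whole seconds; prints "Ns" below 60 s, else "Mm Ss".
fmt_duration() {
  local secs="$1"
  if [ "$secs" -lt 60 ]; then
    printf '%ds\n' "$secs"
  else
    printf '%dm %ds\n' "$(( secs / 60 ))" "$(( secs % 60 ))"
  fi
}

printf 'Summary of %s (%s)\n' "warehouse.mp4" "$(fmt_duration 210)"
# prints: Summary of warehouse.mp4 (3m 30s)
```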

LVS output: render `video_summary` verbatim (polished, tone-controlled report; rewriting loses fidelity). Render each `events` entry with its `start_time`, `end_time`, `type`, and full `description` verbatim (table when the client renders one cleanly, otherwise a per-event list). You MAY add a one-line header and a closing offer to re-run with different parameters.

VLM output: render `choices[0].message.content` verbatim. If the model produced `<think>…</think><answer>…</answer>` blocks, drop the `<think>` block and show the answer.
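
Dropping the reasoning block can be sketched with plain shell parameter expansion. This is a minimal illustration, not the skill's own implementation: `strip_reasoning` is a hypothetical name, and it assumes at most one `<think>` block and at most one `<answer>` wrapper in the content.

```bash
# Hypothetical helper, for illustration only. Reads model content on stdin.
strip_reasoning() {
  local text
  text=$(cat)
  case "$text" in
    *'</think>'*) text="${text#*</think>}" ;;   # drop everything up to the first </think>
  esac
  text="${text#*<answer>}"    # drop the opening <answer> tag, if any
  text="${text%</answer>*}"   # drop the closing </answer> tag, if any
  # Trim whitespace left behind at the start of the answer.
  while [ "${text#[[:space:]]}" != "$text" ]; do
    text="${text#[[:space:]]}"
  done
  printf '%s\n' "$text"
}

printf '<think>reasoning</think><answer>Final summary</answer>' | strip_reasoning
# prints: Final summary
```

Content without any tags passes through unchanged, so the function can be applied to every VLM response.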

Fallback warning (when applicable) goes above the summary, never mixed into it.


Tips

- Route by service availability, not by duration. Probe `/v1/ready` once in Setup; HTTP 200 → LVS+HITL for every clip; anything else → VLM fallback.
- HITL is mandatory on the LVS path. The `defaults` opt-in is the only sanctioned bypass. The VLM fallback path is silent (no HITL).
- Readiness = HTTP 200 on `/v1/ready`. Nothing else. Body may be empty. Always use `curl -s -o /dev/null -w '%{http_code}'`; never pipe through `jq`/`grep`/`head`.
- Delegate VIOS to `vss-manage-video-io-storage`. It is a sub-task; the final answer is the Step 2 summary, not the clip URL.
- `jq` twice for LVS output. First unwraps the OpenAI envelope, second parses the JSON string inside `content`.
- Prefer `/v1/summarize` for 3.2 GA; `/summarize` is a compatibility alias.
- Use the exact VLM model id advertised by the endpoint (default `nim_nvidia_cosmos-reason2-8b_0303-fp8-dynamic-kv8`).
- Render output verbatim: no paraphrasing, no reformatting, no rewriting the `video_summary` or `choices[0].message.content`.
- One call, one render. No parallel hedging, no double renderings.


Cross-reference

- `vss-deploy-profile`: bring up the `base` (VLM only) or `lvs` (VLM + video summarization service) profile
- `vss-manage-video-io-storage` (VIOS API): upload videos, list streams, get clip URLs
- `vss-search-archive`: semantic search across the archive (different profile)
- `vss-query-analytics`: query incidents/events from Elasticsearch
- Video summarization API reference: `references/video-summarization-api.md`
- Video summarization service ops reference: `references/video-summarization-deployment.md`
