vss-summarize-video

Instructions

Follow the routing tables and step-by-step workflows below. Each section that ends in workflow, quick start, or flow is intended to be executed top-to-bottom. Detailed reference material lives in

references/

and helper scripts live in

scripts/

— call them via

run_script

when the skill points to a script by name.

Examples

Worked end-to-end examples are kept under

evals/

(each

*.json

manifest contains a runnable scenario) and inline in the per-workflow

curl

blocks below. Run a Tier-3 evaluation with

nv-base validate <this-skill-dir> --agent-eval

to replay them.

You are a video summarization assistant. You call the VLM NIM or the video summarization microservice directly. Always run

curl

commands yourself; never instruct the user to run them.

Primary video workflow query type: "Summarize this video." Direct video summarization API and service-ops requests are handled by the reference-routed sections below.

Purpose

Produce a single, polished narrative summary of one recorded video clip, with timestamped events when the LVS microservice path is reachable.

Do NOT use this skill for:

Live RTSP captioning — use
```
vss-deploy-dense-captioning
```
.
Incident-range or alert-window reports — use
```
vss-generate-video-report
```
Mode B.
Semantic search across the archive — use
```
vss-search-archive
```
.

Prerequisites

VSS
```
lvs
```
profile running on
```
$HOST_IP
```
(port 38111) OR a reachable VLM/RT-VLM endpoint as a fallback. The
```
vss-deploy-profile
```
skill brings these up.
Network reachability from the agent host to both endpoints; clip URLs from VIOS must be fetchable by the chosen backend.
```
jq
```
and
```
curl
```
available on the agent host.

Limitations

Direct VLM fallback uses a single fixed prompt and cannot target scenario/events — output quality is lower than the LVS path.
Remote VLM endpoints generally cannot reach
```
localhost
```
/private clip URLs.
One backend call per request; no parallel hedging or multi-pass summaries.

Troubleshooting

Symptom	Cause	Fix
`/v1/ready` returns 503 repeatedly	LVS service still warming up	Retry up to ~30 s as shown in Setup; if it never returns 200 the service may not be deployed
Empty `video_summary` and `events`	Clip does not contain the requested events	Re-run with broader `scenario` or different `events`
VLM returns `<think>` block	Cosmos Reason 2 reasoning mode	Strip everything up to `</think>` before rendering
Empty stdout from `curl /v1/ready`	Service legitimately returns 200 with empty body	Always check HTTP status with `-o /dev/null -w '%{http_code}'` , never inspect the body

See

references/video-summarization-debugging.md

for deeper diagnostics.

Reference Map

Use these references only when the user asks for the relevant detail, or when the core workflow below needs deeper video summarization information:

video summarization API details:

references/video-summarization-api.md

for

/v1/summarize

/summarize

/v1/generate_captions

/v1/stream_summarize

, health probes,

/models

/recommended_config

/metrics

, request fields, response shapes, and API gotchas.

video summarization service configuration and ops:
```
references/video-summarization-deployment.md
```
for the VSS
```
lvs
```
profile, ports, required env vars, logs, status, dry-runs, teardown, model/backend swaps, Elasticsearch/Neo4j/ArangoDB backend selection, and service-level troubleshooting.

Extended video summarization ops references:

references/video-summarization-environment-variables.md

references/video-summarization-debugging.md

, and

assets/video-summarization.env.example

Load

video-summarization-api.md

only when you need a request field, response shape, or endpoint that is not already covered by the Step 2 LVS or fallback VLM example below, or when handling a direct video summarization API request. Load

video-summarization-deployment.md

only for deployment, configuration, or service operations.

Video Summarization API And Service Ops Requests

If the user asks to call or debug video summarization endpoints directly, answer from

references/video-summarization-api.md

instead of running the end-to-end video summarization workflow. Examples: list video summarization models, check readiness, get recommended chunking config, inspect metrics, explain a 422 response, or build a

/v1/summarize

request body.

If the user asks to configure, deploy, restart, tear down, or troubleshoot the video summarization service, prefer the

vss-deploy-profile

skill for full VSS profile deployment and use

references/video-summarization-deployment.md

for video summarization-specific service details.

Routing

Decide purely from video summarization service availability (probed in Setup → Availability checks below). Duration does not drive routing.

`/v1/ready`	Backend	Endpoint
HTTP 200	LVS microservice with HITL	`POST ${LVS_BACKEND_URL}/v1/summarize`
Anything else	VLM / RT-VLM with the default prompt + fallback note	`POST ${VLM_BASE_URL}/v1/chat/completions`

Fallback message when the LVS service is unreachable — copy verbatim above the summary:

⚠ Note: Input video
<name>
is
<N>
s long. The video summarization service is not deployed, so this summary was produced by the VLM alone with a generic default prompt. Deploy the
lvs
profile for higher-quality summaries with scenario/events targeting.

Deployment prerequisite

The VSS lvs profile on

$HOST_IP

is the primary backend. If the

/v1/ready

probe (see Setup → Availability checks) returns anything other than 200 after the warmup retries, ask the user:

"The VSS
lvs
profile isn't running on
$HOST_IP
. Shall I deploy it now using the
/vss-deploy-profile
skill with
-p lvs
? Reply
no
to summarize with the VLM-only fallback instead (lower quality, no scenario/events targeting)."

Yes → hand off to
```
/vss-deploy-profile
```
, then re-probe and continue with Step 2 (LVS + HITL).
No → go straight to Step 2 fallback (VLM with default prompt) and prepend the Routing fallback note. Do not ask again, and do not run scenario/events HITL.
Pre-authorized to deploy autonomously (caller said so explicitly) → skip the confirmation and invoke
```
/vss-deploy-profile
```
directly.
Pre-authorized to use VLM fallback ("skip lvs, just use the VLM") → go straight to Step 2 fallback without prompting.

Setup

Endpoints (defaults for a local VSS
lvs
deployment):

VLM / RT-VLM:

${VLM_BASE_URL}

— default

${RTVI_VLM_BASE_URL:-http://${HOST_IP:-localhost}:8018}

LVS service:

${LVS_BACKEND_URL}

— default

http://${HOST_IP:-localhost}:38111

VIOS: owned by
```
vss-manage-video-io-storage
```
; refer there.

Use env vars when set (strip trailing

/v1

from the VLM base — the skill appends it). Otherwise use the defaults. If neither works, ask the user — do not scan ports or read config files to guess.

Model name: read

${VLM_NAME}

(default

nim_nvidia_cosmos-reason2-8b_0303-fp8-dynamic-kv8

). It must match the id RT-VLM

/v1/models

advertises; do not substitute the friendly

nvidia/cosmos-reason2-8b

For endpoint schemas, optional fields, response envelopes, and error handling, see

references/video-summarization-api.md

Availability checks (run both before routing). Readiness is determined by the HTTP status code only — the LVS

/v1/ready

may legitimately return

with an empty body, so do not inspect the body.

bash

VLM="${VLM_BASE_URL:-${RTVI_VLM_BASE_URL:-http://${HOST_IP:-localhost}:8018}}"
VLM="${VLM%/v1}"

# VLM / RT-VLM: 200 on /v1/models
vlm_code=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 3 \
  "$VLM/v1/models")
[ "$vlm_code" = "200" ] && echo "VLM OK" || echo "VLM not reachable (HTTP $vlm_code)"

# Video summarization service: 200 on /v1/ready, with retry on 503 (warmup) for up to ~30s
VIDEO_SUMMARIZATION_URL=${LVS_BACKEND_URL:-http://${HOST_IP:-localhost}:38111}
video_sum_code=000
for i in $(seq 1 10); do
  video_sum_code=$(curl -s -o /dev/null -w '%{http_code}' --connect-timeout 3 "$VIDEO_SUMMARIZATION_URL/v1/ready")
  case "$video_sum_code" in
    200) echo "video summarization OK"; break ;;
    503) sleep 3 ;;                 # warming up; keep polling
    *)   break ;;                   # any other code = not reachable, stop retrying
  esac
done
[ "$video_sum_code" = "200" ] || echo "video summarization service not reachable (HTTP $video_sum_code)"

How to interpret the results:

```
video_sum_code = 200
```
→ Step 2 (LVS + HITL) for every video.
```
video_sum_code != 200
```
,
```
vlm_code = 200
```
→ Step 2 fallback (VLM); prepend the Routing fallback note.
```
vlm_code != 200
```
→ fail; at least one backend must be reachable.
A non-200 LVS code after the retry loop is the ONLY signal of unavailability. Empty stdout or missing JSON fields are NOT "unavailable."

Step 1 - Get the clip URL via

vss-manage-video-io-storage

(sub-task, NOT the final answer)

Use the
vss-manage-video-io-storage
skill for all VIOS interactions — it owns the canonical curl recipes, parameter defaults, and delete/upload flows. Do not fabricate URLs or hand-roll VIOS calls; they will drift.

This step is a sub-task — do NOT end your turn here; do NOT return the clip URL as the final answer. From VIOS collect three values:

streamId
(via
```
sensor/list
```
→
```
sensor/<id>/streams
```
, or directly from an upload response).
Timeline -
```
{startTime, endTime}
```
(ISO 8601 UTC).
```
endTime - startTime
```
is the duration; needed only for the user-facing header (routing is driven solely by
```
/v1/ready
```
).
Temporary MP4 clip URL — the
```
/storage/file/<streamId>/url
```
variant with
```
container=mp4
```
. Response field:
```
.videoUrl
```
. Both backends need an HTTP(S) URL they can
```
GET
```
.

Everything else (auth, upload,

disableAudio

, expiry, etc.) lives in the

vss-manage-video-io-storage

skill — refer users there if VIOS fails.

Step 2 — Primary: video summarization microservice with HITL

Use this path whenever

/v1/ready

returned 200 in Setup. Duration is irrelevant.

For advanced fields (

media_info

schema

, structured output, stream captioning, metrics, recommended config) see

references/video-summarization-api.md

HITL: collect scenario and events first (REQUIRED — do not skip)

Full walk-through is in

references/hitl-prompts.md

. Always run HITL before calling the LVS service.

Autonomous-mode defaults. When the caller has bypassed HITL ("run autonomously without prompting") AND the original query asks for

default

defaults

(or gives none), use

scenario="activity monitoring"

and

events=["notable activity"]

verbatim — do not infer from filename or sensor name. Note the defaults in the final reply and offer a re-run with more specific parameters. This is the ONLY supported HITL bypass; "the video is short" or "the user seems in a hurry" are not valid reasons.

Prefer

POST /v1/summarize

(3.2 GA route);

/summarize

is a compatibility alias.

bash

VIDEO_SUMMARIZATION_URL=${LVS_BACKEND_URL:-http://${HOST_IP:-localhost}:38111}

# From HITL reply:
SCENARIO='warehouse monitoring'
EVENTS_JSON='["notable activity"]'
OBJECTS_JSON=''  # '' to omit, else '["forklifts","pallets","workers"]'

curl -s -X POST "$VIDEO_SUMMARIZATION_URL/v1/summarize" \
  -H "Content-Type: application/json" \
  -d "$(jq -n --arg url "<clip_url_from_vss_manage_video_io_storage>" \
        --arg model "${VLM_NAME:-nim_nvidia_cosmos-reason2-8b_0303-fp8-dynamic-kv8}" \
        --arg scenario "$SCENARIO" \
        --argjson events "$EVENTS_JSON" \
        --argjson objects "${OBJECTS_JSON:-null}" '{
    url: $url,
    model: $model,
    scenario: $scenario,
    events: $events,
    chunk_duration: 10,
    num_frames_per_second_or_fixed_frames_chunk: 20,
    use_fps_for_chunking: false,
    seed: 1
  } + (if $objects == null then {} else {objects_of_interest: $objects} end)')" \
  | jq -r '.choices[0].message.content' \
  | jq '{video_summary, events}'

If both

video_summary

and

events

are empty, the clip probably doesn't contain the requested events — re-run with broader

scenario

events

, don't report "no content".

Tuning:

chunk_duration

(default

= single chunk),

num_frames_per_second_or_fixed_frames_chunk

(default

; meaning depends on

use_fps_for_chunking

seed

(default

num_frames_per_chunk

is deprecated.

Step 2 fallback — VLM direct with default prompt

Use this path only when

/v1/ready

did not return 200 after warmup. Do NOT run HITL — the user did not opt in; you fell back because the service was missing. Prepend the Routing fallback note to the response.

bash

VLM="${VLM_BASE_URL:-${RTVI_VLM_BASE_URL:-http://${HOST_IP:-localhost}:8018}}"
VLM="${VLM%/v1}"
PROMPT='Describe in detail what is happening in this video,
including all visible people, vehicles, equipments, objects,
actions, and environmental conditions.
OUTPUT REQUIREMENTS:
[timestamp-timestamp] Description of what is happening.
EXAMPLE:
[0.0s-4.0s] <description of the first event>
[4.0s-12.0s] <description of the second event>'

curl -s -X POST "$VLM/v1/chat/completions" \
  -H "Content-Type: application/json" \
  -d "$(jq -n \
        --arg model "${VLM_NAME:-nim_nvidia_cosmos-reason2-8b_0303-fp8-dynamic-kv8}" \
        --arg text "$PROMPT" \
        --arg url "<clip_url_from_vss_manage_video_io_storage>" \
        '{
          model: $model,
          temperature: 0.0,
          max_tokens: 1024,
          messages: [{
            role: "user",
            content: [
              {type: "text", text: $text},
              {type: "video_url", video_url: {url: $url}}
            ]
          }]
        }')" | jq -r '.choices[0].message.content'

Response: standard OpenAI chat-completion envelope. The summary is in

choices[0].message.content

Cosmos-model notes: Cosmos Reason 2 supports reasoning via

<think>...</think><answer>...</answer>

blocks. Omit the reasoning instructions if you want a plain summary. Frame sampling and pixel limits are applied server-side; no client-side prep is required when you pass a

video_url

End-to-end example

See

references/end-to-end-example.md

for the full LVS-or-VLM-fallback script that probes

/v1/ready

and runs the appropriate path.

Responses

VLM returns an OpenAI chat-completion envelope; summary is
```
choices[0].message.content
```
.
LVS service returns the same envelope but
```
content
```
is a JSON string — run
```
jq -r '.choices[0].message.content' | jq
```
to reach
```
{video_summary, events}
```
.
Errors surface as HTTP non-2xx plus JSON
```
{error: ...}
```
. LVS
```
503
```
usually means warmup — retry
```
/v1/ready
```
.

Presenting the output to the user

Surface backend output with minimal transformation — do not paraphrase, re-voice, add emojis, or reformat. One backend call → one rendering: no parallel hedging, no duplicate headers, never call both LVS and VLM for the same video.

Header line. Start with exactly one:

Summary of <video_name> (<duration>)

<duration>

Ns

for

< 60 s

, else

Mm Ss

(e.g.

3m 30s

LVS output: render

video_summary

verbatim (polished, tone-controlled report — rewriting loses fidelity). Render each

events

entry with its

start_time

end_time

type

, and full

description

verbatim (table when the client renders one cleanly, otherwise a per-event list). You MAY add a one-line header and a closing offer to re-run with different parameters.

VLM output: render

choices[0].message.content

verbatim. If the model produced

<think>…</think><answer>…</answer>

blocks, drop the

<think>

block and show the answer.

Fallback warning (when applicable) goes above the summary, never mixed into it.

Tips

Route by service availability, not by duration. Probe
```
/v1/ready
```
once in Setup; HTTP 200 → LVS+HITL for every clip; anything else → VLM fallback.
HITL is mandatory on the LVS path. The
```
defaults
```
opt-in is the only sanctioned bypass. The VLM fallback path is silent (no HITL).
Readiness = HTTP 200 on
/v1/ready
. Nothing else. Body may be empty. Always use
```
curl -s -o /dev/null -w '%{http_code}'
```
— never pipe through
```
jq
```
/
```
grep
```
/
```
head
```
.
Delegate VIOS to
vss-manage-video-io-storage
— it is a sub-task; the final answer is the Step 2 summary, not the clip URL.
jq
twice for LVS output. First unwraps the OpenAI envelope, second parses the JSON string inside
```
content
```
.
Prefer
/v1/summarize
for 3.2 GA;
```
/summarize
```
is a compatibility alias.
Use the exact VLM model id advertised by the endpoint (default
```
nim_nvidia_cosmos-reason2-8b_0303-fp8-dynamic-kv8
```
).
Render output verbatim — no paraphrasing, no reformatting, no rewriting the
```
video_summary
```
or
```
choices[0].message.content
```
.
One call, one render. No parallel hedging, no double renderings.

Cross-reference

vss-deploy-profile — bring up the
```
base
```
(VLM only) or
```
lvs
```
(VLM + video summarization service) profile
vss-manage-video-io-storage (VIOS API) — upload videos, list streams, get clip URLs
vss-search-archive — semantic search across the archive (different profile)
vss-query-analytics — query incidents/events from Elasticsearch
video summarization API reference —
```
references/video-summarization-api.md
```

video summarization service ops reference —

references/video-summarization-deployment.md

bump:1

vss-summarize-video

NPX Install

Tags

SKILL.md Content

Instructions

Examples

Purpose

Prerequisites

Limitations

Troubleshooting

Reference Map

Video Summarization API And Service Ops Requests

Routing

Deployment prerequisite

Setup

Step 1 - Get the clip URL via
`vss-manage-video-io-storage`
(sub-task, NOT the final answer)

Step 2 — Primary: video summarization microservice with HITL

HITL: collect scenario and events first (REQUIRED — do not skip)

Step 2 fallback — VLM direct with default prompt

End-to-end example

Responses

Presenting the output to the user

Tips

Cross-reference