vss-deploy-dense-captioning

Purpose

Stand up the RT-VLM dense-captioning microservice on its own and exercise every endpoint it exposes (file upload, generate_captions, stream add/delete, chat-completions, Kafka topics).

Prerequisites

For standalone RT-VLM deployment:

- Docker, Docker Compose, NVIDIA Container Toolkit, and a visible GPU.
- NGC registry credentials in `$NGC_CLI_API_KEY` for `docker login nvcr.io`, image pulls, and local NGC model/artifact downloads.
- `curl`, `jq`, and any writable working directory for the standalone compose copy.

For API calls against an existing service:

- Running RT-VLM service reachable at `$BASE_URL`.
- Bearer token in `$RTVI_VLM_API_KEY` or `$NGC_CLI_API_KEY`, depending on how the service was configured.

For full VSS profile deployment:

Use `../vss-deploy-profile/SKILL.md`; this skill does not deploy full VSS profiles.

Instructions

Follow the routing tables and step-by-step workflows below. Each section that ends in workflow, quick start, or flow is intended to be executed top-to-bottom. Detailed reference material lives in `references/` and helper scripts live in `scripts/`; call them via `run_script` when the skill points to a script by name.

Examples

Worked end-to-end examples are kept under `evals/` (each `*.json` manifest contains a runnable scenario) and inline in the per-workflow `curl` blocks below. Run a Tier-3 evaluation with `nv-base validate <this-skill-dir> --agent-eval` to replay them.

Limitations

- Requires either a standalone RT-VLM service deployed via this skill or an existing RT-VLM service reachable from the caller.
- NGC-hosted models and NIMs may be subject to rate limits, GPU memory requirements, and license restrictions.
- Concurrency, GPU memory, and storage limits depend on the host hardware and the profile's compose file.

Troubleshooting

- Error: REST call returns connection refused. Cause: target microservice not running. Solution: probe `/docs` or `/health`; redeploy via `vss-deploy-profile` or the matching `vss-deploy-*` skill.
- Error: HTTP 401/403 from NGC pulls. Cause: missing/expired `NGC_CLI_API_KEY`. Solution: `docker login nvcr.io` and re-export the key before retrying.
- Error: container OOM or model fails to load. Cause: insufficient GPU memory for the selected profile. Solution: switch to a smaller variant or free GPUs via `docker compose down`.

Deploy and Use RT-VLM Dense Captioning (VSS 3.2)

RT-VLM is NVIDIA's real-time vision-language microservice. It decodes video (file or RTSP), segments it into chunks, runs a VLM (`cosmos-reason1`, `cosmos-reason2`, or any OpenAI-compatible model), streams dense captions back over SSE/HTTP, and publishes captions, incident alerts, and errors to Kafka. Use this skill to deploy the standalone RT-VLM service when a full VSS profile is not already running, then call its `/v1/...` API for caption generation, file upload, live-stream management, health checks, NIM-compatible chat completions, or Prometheus metrics. API reference: https://docs.nvidia.com/vss/latest/real-time-vlm-api.html

Deployment Routing

If the user asks to deploy a full VSS profile, use `../vss-deploy-profile/SKILL.md`. That skill owns profile routing, `generated.env`, `resolved.yml`, multi-service sizing, and full-stack deploy/teardown.

If the user asks for standalone RT-VLM dense captioning, or no VSS profile is already running, use the standalone RT-VLM flow in `references/deploy-rt-vlm-service.md` before calling the API. This follows the same compose-centric pattern as `vss-deploy-profile`: gather context, run preflights, work from a local copy, dry-run with `docker compose config`, review, deploy, then wait for health.

Standalone Deployment Flow

Always follow this sequence. Never skip the dry-run.

1. Copy `deploy/docker/services/rtvi/rtvi-vlm/rtvi-vlm-docker-compose.yml` into any writable standalone working directory.
2. Derive `RTVI_VLM_IMAGE_TAG` from that compose copy.
3. Strip the standalone-only dangling `depends_on` block from the copy.
4. Create a gitignored `.env` with the required RT-VLM values.
5. Prepare host bind paths such as `$VSS_DATA_DIR/data_log/vst/clip_storage`.
6. `docker compose --env-file .env -f rtvi-vlm-docker-compose.yml config --quiet`
7. `docker pull` the exact RT-VLM image tag.
8. `docker compose ... up -d rtvi-vlm`, wait for ready, then smoke test.


Run preflights before any pull or `up`; stop and fix failures here before
debugging RT-VLM itself:

```bash
nvidia-smi --query-gpu=index,name --format=csv,noheader
nvidia-container-cli info
docker compose version
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
```

For standalone single-file deployments, do not run the raw `deploy/docker/services/rtvi/rtvi-vlm/rtvi-vlm-docker-compose.yml` directly: it contains `depends_on` references to sibling VLM/NIM services that are only defined in the full VSS/met-blueprints compose project. The standalone reference shows how to copy the compose file, derive the current image tag from it, strip the `depends_on` block, and validate the result before `up`.

If `docker pull` fails with a containerd snapshotter/unpack error on Docker 28+, apply the `/etc/docker/daemon.json` `containerd-snapshotter=false` fix in the standalone reference before retrying.

Minimum standalone `.env` values:

| Host env var | Required when | Purpose |
|---|---|---|
| `NGC_CLI_API_KEY` | Standalone deploy path | NGC registry image pull and NGC model/artifact download |
| `RTVI_VLM_API_KEY` or `NGC_CLI_API_KEY` | Authenticated API calls | RT-VLM bearer auth after the service is running |
| `RTVI_VLM_PORT` | Always | Host API port mapped to container `8000` |
| `HOST_IP` | Always | Kafka bootstrap host (`${HOST_IP}:9092`) |
| `VSS_DATA_DIR` | Always | Required clip-storage bind mount |
| `RTVI_VLM_MODEL_TO_USE` | Always for standalone | Backend selector; use `cosmos-reason2` for the default local model or `openai-compat` for a remote/sibling endpoint |
| `RTVI_VLM_MODEL_PATH` | Local self-hosted model | Source-backed Cosmos Reason 2 path: `ngc:nim/nvidia/cosmos-reason2-8b:0303-fp8-dynamic-kv8` |
| `RTVI_VLM_ENDPOINT` | `RTVI_VLM_MODEL_TO_USE=openai-compat` | Remote/sibling OpenAI-compatible VLM endpoint |
| `VLM_NAME` | `RTVI_VLM_MODEL_TO_USE=openai-compat` | Model/deployment name exposed by that endpoint |
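
A quick way to catch an incomplete `.env` before the dry-run is to check it for the variables the table marks as always required. The following is a minimal sketch, not part of the skill's `scripts/`: the `check_env_file` helper and the `required_env_keys` list are hypothetical names, and the list should be extended (for example with `NGC_CLI_API_KEY` or `RTVI_VLM_MODEL_PATH`) to match the backend in use.

```bash
# Hypothetical helper: report required keys that are missing or empty in a .env file.
# The key list mirrors the "Always" rows of the table above (an assumption; adjust as needed).
required_env_keys=(RTVI_VLM_PORT HOST_IP VSS_DATA_DIR RTVI_VLM_MODEL_TO_USE)

# check_env_file FILE -> prints "missing: KEY" per problem, returns 1 if any key is missing
check_env_file() {
  local file="$1" key missing=0
  for key in "${required_env_keys[@]}"; do
    # A key counts as present only when it has a non-empty value (KEY=value).
    if ! grep -Eq "^${key}=.+" "$file"; then
      echo "missing: ${key}"
      missing=1
    fi
  done
  return "$missing"
}
```

Running `check_env_file .env` before `docker compose ... config --quiet` gives a readable list of gaps instead of a compose interpolation warning.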

Setup

```bash
export BASE_URL="http://localhost:${RTVI_VLM_PORT:-8018}"  # host-side RT-VLM port
export API_KEY="${NGC_CLI_API_KEY:-${RTVI_VLM_API_KEY:-}}" # bearer token used by host-side curl commands
: "${API_KEY:?Set NGC_CLI_API_KEY or RTVI_VLM_API_KEY before calling authenticated endpoints}"
```

Every request below uses `Authorization: Bearer $API_KEY`. Health endpoints (`/v1/health/*`, `/v1/ready`, `/v1/live`, `/v1/startup`) typically work without auth.

Smoke test before use:

```bash
curl -fsS "$BASE_URL/v1/health/ready"
MODEL_ID="$(curl -fsS "$BASE_URL/v1/models" -H "Authorization: Bearer $API_KEY" | jq -r '.data[0].id // .id')"
curl -fsS "$BASE_URL/openapi.json" | jq -r '.paths | keys[]' | sort
```
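
Because the model may still be loading when the container starts, it can help to poll the readiness endpoint rather than run the smoke test once. This is a minimal sketch under stated assumptions: `wait_until_ready` is a hypothetical helper (not shipped by the service or this skill), and the one-second polling interval is arbitrary.

```bash
# Hypothetical helper: wait_until_ready TIMEOUT_SECONDS COMMAND [ARGS...]
# Re-runs COMMAND about once per second until it succeeds or the timeout expires.
wait_until_ready() {
  local timeout="$1"; shift
  local deadline=$(( SECONDS + timeout ))   # SECONDS is bash's built-in elapsed-time counter
  until "$@" >/dev/null 2>&1; do
    if (( SECONDS >= deadline )); then
      echo "not ready after ${timeout}s" >&2
      return 1
    fi
    sleep 1
  done
}
```

For example, `wait_until_ready 600 curl -fsS "$BASE_URL/v1/health/ready"` would poll for up to ten minutes; the 600-second budget is an assumption, not a documented startup time.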

Quick Start — dense captions from a local video

```bash
# 1. Upload the video, capture its file id
FILE_ID=$(curl -fsS -X POST "$BASE_URL/v1/files" \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@/path/to/warehouse.mp4" \
  -F "purpose=vision" \
  -F "media_type=video" | jq -r '.id')

# 2. Generate captions + alerts (SSE stream of chunked responses)
curl -N -X POST "$BASE_URL/v1/generate_captions" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d "{ \"id\": \"$FILE_ID\", \"prompt\": \"Write a concise dense caption for each 10-second segment of this warehouse video.\", \"model\": \"$MODEL_ID\", \"chunk_duration\": 10, \"stream\": true }"
```

Endpoints

Captions

Generate VLM captions and alerts for videos and live streams.

`POST /v1/generate_captions`: Generate VLM captions (and alerts) for video/stream

Required:

| Field | Type | Description |
|---|---|---|
| `id` | string \| array | UUID of a previously-uploaded file, or id of an active live stream. Accepts a list of ids for batch |
| `prompt` | string | User prompt to the VLM (e.g. dense-caption instruction) |
| `model` | string | Exact model id returned by `GET /v1/models`, for example `nim_nvidia_cosmos-reason2-8b_0303-fp8-dynamic-kv8`; backend selector aliases such as `cosmos-reason2` are not request model ids |

Key optional fields:

| Field | Type | Default | Description |
|---|---|---|---|
| `system_prompt` | string | - | System prompt; use `<think></think><answer></answer>` tags to enable reasoning on Cosmos Reason |
| `enable_reasoning` | boolean | false | Turn on reasoning for Cosmos Reason models |
| `enable_audio` | boolean | false | Transcribe audio (via Riva) and fold into captions |
| `chunk_duration` | integer | - | Segment video into N-second chunks (`0` = no chunking) |
| `chunk_overlap_duration` | integer | 0 | Overlap between consecutive chunks |
| `num_frames_per_second_or_fixed_frames_chunk` | number | - | FPS (if `use_fps_for_chunking=true`) or fixed frames per chunk |
| `use_fps_for_chunking` | boolean | false | Interpret above as FPS vs. fixed-frame count |
| `vlm_input_width` / `vlm_input_height` | int | - | Resize frames before inference (0 = native) |
| `media_info` | object | - | `{"type":"offset","start_offset":0,"end_offset":10}` to process a slice of a file (not live streams) |
| `stream` | boolean | false | SSE: emit per-chunk caption deltas as `data:` events (recommended for long videos) |
| `max_tokens` / `temperature` / `top_p` / `top_k` / `seed` / `ignore_eos` |  |  | Standard sampling controls |
| `response_format` | object | - | Query response format object |
| `mm_processor_kwargs` | object | - | Extra kwargs for the multimodal processor (e.g. size, shortest/longest edge) |

```bash
curl -N -X POST "$BASE_URL/v1/generate_captions" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "123e4567-e89b-12d3-a456-426614174000",
    "prompt": "Dense-caption this warehouse video, one sentence per 10s chunk.",
    "model": "nim_nvidia_cosmos-reason2-8b_0303-fp8-dynamic-kv8",
    "chunk_duration": 10,
    "stream": true
  }'
```

Response shape: live 26.05 responses use `chunk_responses` with `start_time` and `end_time`; SSE streams terminate with `data: [DONE]`. See `references/api-surface-26.05.md`.
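
When `stream` is `true`, a caller usually wants just the `data:` payloads and a clean stop at the terminator. The sketch below shows one way to filter the stream in plain bash. `sse_data_payloads` is a hypothetical helper, and it assumes each event is a single `data: ` line (with a space after the colon, as in the `data: [DONE]` terminator above); it makes no assumption about the JSON inside each payload.

```bash
# Hypothetical helper: read an SSE stream on stdin, print each data payload,
# and stop at the [DONE] sentinel.
sse_data_payloads() {
  local line
  while IFS= read -r line; do
    line="${line%$'\r'}"                 # tolerate CRLF line endings
    case "$line" in
      "data: [DONE]") break ;;           # end-of-stream marker
      "data: "*)      printf '%s\n' "${line#data: }" ;;
      *)              ;;                 # skip blank lines, comments, other fields
    esac
  done
}
```

It can be placed after a streaming request, for example `curl -N ... "$BASE_URL/v1/generate_captions" ... | sse_data_payloads | jq -c '.'`.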

`DELETE /v1/generate_captions/{stream_id}`: Stop caption generation for a live stream, if exposed

Some deployments expose this companion stop endpoint. Check the live OpenAPI (`curl -fsS "$BASE_URL/openapi.json" | jq '.paths | keys[]'`) before using it. Always pair live-stream cleanup with `DELETE /v1/streams/delete/{stream_id}` to un-register the RTSP source.

```bash
curl -X DELETE "$BASE_URL/v1/generate_captions/$STREAM_ID" -H "Authorization: Bearer $API_KEY"
```

Files

Upload and manage media files consumed by `/v1/generate_captions`.

`POST /v1/files`: Upload a media file (multipart)

```bash
curl -X POST "$BASE_URL/v1/files" -H "Authorization: Bearer $API_KEY" \
  -F "file=@./video.mp4" -F "purpose=vision" -F "media_type=video"
```

Response: `{ "id", "object": "file", "bytes", "created_at", "filename", "purpose" }`. Optional metadata such as `sensor_name` may be accepted by newer builds; check the live OpenAPI before sending it.

`GET /v1/files?purpose=vision`: List uploaded files

`GET /v1/files/{file_id}`: File metadata

`GET /v1/files/{file_id}/content`: Download original file content

`DELETE /v1/files/{file_id}`: Delete file (releases asset storage)

Live Stream

RTSP stream lifecycle.

`POST /v1/streams/add`: Register one or more RTSP streams

Required per stream: `liveStreamUrl` (must start with `rtsp://`) and `description`. Optional: `username`, `password`, `sensor_name`, and placement metadata (`place_name`, `place_type`, `place_lat`, `place_lon`, `place_alt`, `place_coordinate_x`, `place_coordinate_y`).
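
Since `liveStreamUrl` must start with `rtsp://`, a cheap client-side check can reject obviously wrong URLs before any request is sent. This is a minimal sketch: `is_rtsp_url` and `filter_rtsp_urls` are hypothetical helper names, and the check covers only the scheme prefix, not reachability or media type.

```bash
# Hypothetical helper: succeed only when the URL starts with rtsp:// and has something after it.
is_rtsp_url() {
  case "$1" in
    rtsp://?*) return 0 ;;
    *)         return 1 ;;
  esac
}

# Hypothetical helper: read candidate URLs on stdin, print the ones that pass,
# and report the rest on stderr.
filter_rtsp_urls() {
  local url
  while IFS= read -r url; do
    if is_rtsp_url "$url"; then
      printf '%s\n' "$url"
    else
      echo "skipping non-RTSP URL: $url" >&2
    fi
  done
}
```

A passing URL still needs a real media probe before registration; the scheme check only filters out typos and wrong protocols.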

Precheck public or external RTSP sources before registering them. A probe exit code alone is not enough; `gst-discoverer-1.0` can exit `0` while reporting an unknown media type. Treat the stream as usable only when a probe output identifies at least one video stream/caps entry. If one probe is inconclusive, cross-check with another tool such as `ffprobe` before failing or registering:

```bash
ffprobe -v error -select_streams v:0 \
  -show_entries stream=codec_type -of csv=p=0 "$RTSP_URL" | grep -qx video
```

```bash
STREAM_ID=$(curl -fsS -X POST "$BASE_URL/v1/streams/add" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d '{"streams":[{"liveStreamUrl":"rtsp://cam:8554/live","description":"warehouse cam 1"}]}' \
  | jq -r '.results[0].id')
```

`GET /v1/streams/get-stream-info`: List active streams

`DELETE /v1/streams/delete/{stream_id}`: Remove a single stream

`DELETE /v1/streams/delete-batch`: Remove many (`{"stream_ids":[...]}`)

CV-style singular stream endpoints

26.05 deployments also expose CV-style stream control paths: `POST /v1/stream/add`, `GET /v1/stream/get-stream-info`, and `POST /v1/stream/remove`. Use these when a workflow or release note explicitly uses the key/value envelope; otherwise prefer the plural RT-VLM stream endpoints above. See `references/api-surface-26.05.md` for examples and the `stream_count:0` compatibility caveat.

NIM Compatible

OpenAI-compatible endpoints for interop with OpenAI/NVIDIA-API clients.

`POST /v1/chat/completions`: OpenAI-compatible chat (text + multimodal)

Required: `messages`, `model`. Text-only requests work and omit `id`, `video_url`, and `image_url`. For uploaded-video, direct `video_url`, direct `image_url`, streaming, and RTSP-backed chat examples, see `references/api-surface-26.05.md`.

```bash
curl -X POST "$BASE_URL/v1/chat/completions" -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"model\":\"$MODEL_ID\",\"messages\":[{\"role\":\"user\",\"content\":\"Summarize this scene.\"}]}"
```

`POST /v1/completions`: OpenAI-compatible legacy completions

This endpoint exists for compatibility, but on current 26.05 builds text-only legacy completion requests return HTTP 400 by design. Use `/v1/chat/completions` for text-only and multimodal requests.

`GET /v1/version`: `{ "version": "3.2.0-..." }`

`GET /v1/manifest`: NIM manifest

`GET /v1/health/live`, `GET /v1/health/ready`: NIM-style probes

Do not assume `/v1/license` exists. The current 26.05 live OpenAPI does not expose it and the endpoint returns 404; only call it after checking `GET /openapi.json`.

Models · Metadata · Metrics · Health Check

`GET /v1/models`: List loaded VLMs, returned as `{ "data": [{ "id", "object": "model", "owned_by" }] }`

`GET /v1/metadata`: Service metadata (build, release, image tag)

`GET /v1/assets/stats`: Asset storage counts, TTL, and oldest-asset age

`GET /v1/metrics`: Prometheus metrics (plain text)

`GET /v1/ready`, `GET /v1/live`, `GET /v1/startup`: Kubernetes-style probes

Common Workflows

The four standard dense-captioning scenarios.

1. Dense captions from a stored video file

```bash
# Upload → capture file id → generate captions (SSE stream)
FILE_ID=$(curl -fsS -X POST "$BASE_URL/v1/files" \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@warehouse.mp4" -F "purpose=vision" -F "media_type=video" | jq -r '.id')

curl -N -X POST "$BASE_URL/v1/generate_captions" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d "{ \"id\": \"$FILE_ID\", \"prompt\": \"Describe warehouse events in 1 sentence per 10s chunk.\", \"model\": \"$MODEL_ID\", \"chunk_duration\": 10, \"stream\": true }"

# When done, free storage:
curl -X DELETE "$BASE_URL/v1/files/$FILE_ID" -H "Authorization: Bearer $API_KEY"
```

2. Dense captions from an RTSP live stream

```bash
# Register the stream
STREAM_ID=$(curl -fsS -X POST "$BASE_URL/v1/streams/add" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d '{"streams":[{"liveStreamUrl":"rtsp://10.0.0.5:8554/warehouse","description":"warehouse cam"}]}' \
  | jq -r '.results[0].id')

# Start continuous caption generation
curl -N -X POST "$BASE_URL/v1/generate_captions" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d "{ \"id\": \"$STREAM_ID\", \"prompt\": \"Describe each event; start each sentence with a timestamp.\", \"model\": \"$MODEL_ID\", \"chunk_duration\": 10, \"num_frames_per_second_or_fixed_frames_chunk\": 2, \"use_fps_for_chunking\": true, \"stream\": true }" &

# Tear down when finished. If the live OpenAPI exposes
# DELETE /v1/generate_captions/{stream_id}, call it before unregistering.
curl -X DELETE "$BASE_URL/v1/streams/delete/$STREAM_ID" -H "Authorization: Bearer $API_KEY"
```

3. Dense captions with alerts from an RTSP stream

```bash
# Pre-req: Kafka is enabled and topics match the deployment source.
# The checked-in rtvi-vlm/.env and VSS alerts profiles use:
#   RTVI_VLM_KAFKA_ENABLED=true
#   RTVI_VLM_KAFKA_TOPIC=mdx-vlm
#   RTVI_VLM_KAFKA_INCIDENT_TOPIC=mdx-vlm-incidents
#   RTVI_VLM_ERROR_MESSAGE_TOPIC=vision-llm-errors
#   HOST_IP=<kafka-host>
# A copied compose without those env overrides falls back to vision-llm-* topics.
# Confirm the live container before consuming:
#   docker exec vss-rtvi-vlm printenv KAFKA_TOPIC KAFKA_INCIDENT_TOPIC ERROR_MESSAGE_TOPIC

curl -N -X POST "$BASE_URL/v1/generate_captions" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d "{ \"id\": \"$STREAM_ID\", \"prompt\": \"You are a warehouse monitoring system. Describe the scene in one sentence, then on a new line output exactly:\nAnomaly Detected: Yes/No\nReason: <one sentence>\nFlag an anomaly if any worker is missing a hard hat or high-vis vest.\", \"system_prompt\": \"Answer the user's question correctly in yes or no.\", \"model\": \"$MODEL_ID\", \"chunk_duration\": 60, \"chunk_overlap_duration\": 10, \"stream\": true }"
```


**Consume alerts from Kafka**. Kafka values are NvSchema protobuf payloads, so
use `print.value=false` for a clean validation pass that shows timestamp, key,
and headers without dumping binary payload bytes. The VSS alerts/profile source
uses `mdx-vlm-incidents`; a bare copied compose may fall back to
`vision-llm-events-incidents` if no `RTVI_VLM_KAFKA_INCIDENT_TOPIC` override is
loaded. Prefer the live container environment over hard-coded topic names.
```bash
INCIDENT_TOPIC="${INCIDENT_TOPIC:-$(docker exec vss-rtvi-vlm printenv KAFKA_INCIDENT_TOPIC 2>/dev/null || true)}"
INCIDENT_TOPIC="${INCIDENT_TOPIC:-mdx-vlm-incidents}"

docker exec mdx-kafka kafka-console-consumer \
  --bootstrap-server 127.0.0.1:9092 \
  --topic "$INCIDENT_TOPIC" \
  --from-beginning \
  --timeout-ms 5000 \
  --max-messages 10 \
  --property print.timestamp=true \
  --property print.key=true \
  --property print.headers=true \
  --property print.value=false
```

If Kafka is not running in the VSS `mdx-kafka` container, use the Kafka CLI from the host or container running the broker:

```bash
INCIDENT_TOPIC="${INCIDENT_TOPIC:-mdx-vlm-incidents}"

kafka-console-consumer \
  --bootstrap-server "$HOST_IP:9092" \
  --topic "$INCIDENT_TOPIC" \
  --from-beginning \
  --timeout-ms 5000 \
  --max-messages 10 \
  --property print.timestamp=true \
  --property print.key=true \
  --property print.headers=true \
  --property print.value=false
```

For standalone validation, remember that the RT-VLM compose maps Kafka through `KAFKA_BOOTSTRAP_SERVERS=${HOST_IP}:9092`; setting `KAFKA_BOOTSTRAP_SERVERS` directly in `.env` is ignored unless the compose is changed. The broker must advertise a listener reachable from the `vss-rtvi-vlm` container. `localhost` inside the broker and service containers is not the host, and a broker alias such as `kafka:9092` only works when both containers share that Docker network.

For RT-VLM-only validation, prefer the self-contained broker in `references/kafka-workflows.md` over the full repo infra compose; the latter expects full-profile SDRC env/config. If Kafka is already running, ask the user whether to reuse it or launch a dedicated broker before stopping or replacing anything. Run CLI checks inside the actual broker container, but still configure the advertised listener so RT-VLM can connect from its container network.

Incident protobuf (`ext.proto :: Incident`) key fields: `sensorId`, `timestamp`, `end`, `objectIds`, `frameIds`, `place`, `analyticsModule`, `category`, `isAnomaly` (`true` for alerts), `llm` (nested VisionLLM), and an `info` map including `triggerPhrase`, `verdict`, `requestId`, `chunkIdx`, `streamId`, and `alertCategory` (if the deployment supports the `alert_category` query field, post-3.1).
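
The Gotchas section describes the alert trigger as a case-insensitive match on the tokens "yes" or "true" in the VLM response. When iterating on prompts, a rough local approximation of that rule can show whether a sample caption would likely raise an incident. The sketch below is an assumption-laden stand-in, not the service's matching code: `caption_would_alert` is a hypothetical helper, and whole-word matching is a guess at how tokens are compared.

```bash
# Hypothetical helper: succeed when the caption contains "yes" or "true"
# as a whole word, ignoring case. Whole-word matching is an assumption.
caption_would_alert() {
  printf '%s\n' "$1" | grep -Eiqw 'yes|true'
}
```

Note that a model which echoes the literal template `Anomaly Detected: Yes/No` would also match, which is one reason to constrain the answer format through `system_prompt`.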

4. Kafka workflows (alerts + message bus)

Dense captioning with alerts on an RTSP stream and the HTTP-vs-Kafka response model are documented in `references/kafka-workflows.md`.

Error Reference

Code	Meaning	Common Cause
400	Bad Request	Missing required field ( `id` , `prompt` , `model` ); unsupported `media_type` ; unknown `model` name
401	Unauthorized	Missing/invalid `Authorization: Bearer $API_KEY` — or wrong key format (expect `nvapi-...` )
404	Not Found	`file_id` deleted / stream_id not registered / wrong endpoint path (note: `{stream_id}` is required on `DELETE /v1/streams/delete/{stream_id}` )
413	Payload Too Large	Uploaded file exceeds server `MAX_FILE_SIZE` ; increase or pre-chunk the video
422	Unprocessable Entity	Pydantic schema violation — e.g. `use_fps_for_chunking=true` without `num_frames_per_second_or_fixed_frames_chunk` ; stream ids supplied to a file-only field like `media_info`
429	Rate Limited	Too many concurrent streams — raise `VLM_BATCH_SIZE` or spread across instances
500	Server Error	VLM inference exception (OOM, model unavailable) — check `docker logs vss-rtvi-vlm`
503	Service Busy	Startup not complete (model still downloading) or upstream NIM dependency unhealthy
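When scripting against the service, the table above can be folded into a small lookup so that failed calls print a first diagnostic hint. This is a rough sketch; `explain_status` is a hypothetical helper name, and the hint strings simply paraphrase the table rows.

```bash
# Hypothetical helper: map an HTTP status code to a short hint taken from
# the error table. Codes not listed in the table fall through to a generic line.
explain_status() {
  case "$1" in
    400) echo "bad request: check required fields (id, prompt, model), media_type, and model name" ;;
    401) echo "unauthorized: check the Authorization bearer token" ;;
    404) echo "not found: check file_id, stream_id, and the endpoint path" ;;
    413) echo "payload too large: file exceeds MAX_FILE_SIZE" ;;
    422) echo "schema violation: compare the request body with the live OpenAPI schema" ;;
    429) echo "rate limited: too many concurrent streams" ;;
    500) echo "server error: check docker logs vss-rtvi-vlm" ;;
    503) echo "service busy: startup incomplete or upstream dependency unhealthy" ;;
    2??) echo "ok" ;;
    *)   echo "unexpected status $1" ;;
  esac
}

# Example (assumes $BASE_URL is set as elsewhere in this skill):
#   code=$(curl -s -o /dev/null -w '%{http_code}' "$BASE_URL/v1/health/ready")
#   explain_status "$code"
```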


Gotchas


Use the live OpenAPI as the source of truth. For VSS 3.2, the caption-generation endpoint is `/v1/generate_captions`. Some older references and images used `/v1/generate_captions_alerts`; do not assume that path exists unless `GET /openapi.json` shows it.
URL-based input support depends on the deployed service version. If the live schema does not expose `url` / `media_type` / `creation_time`, upload via `POST /v1/files` first and pass the returned `id`.
Alert trigger = the tokens `"yes"` or `"true"` in the VLM response (case-insensitive). There is no per-request alert flag. Design prompts with an explicit `Anomaly Detected: Yes/No` line and set `system_prompt` to constrain the model to Yes/No answers (per the VSS docs). Every chunk is published to `KAFKA_TOPIC`; matched chunks additionally go to `KAFKA_INCIDENT_TOPIC` with `isAnomaly=true`, `info["triggerPhrase"]` set to the matched tokens, and `info["verdict"]="confirmed"`.
`alert_category` support depends on the deployed service version. If the live OpenAPI schema does not expose it, Kafka incidents default to `incident.category = "vlm-alert"`.
Kafka topics are server-side config, not per-request. The `KAFKA_*` env vars (via compose `RTVI_VLM_KAFKA_*` rewrites) are fixed at container start, so clients can't override topics on a per-request basis. Kafka publish is additive to the HTTP response, never a replacement.
Topic names differ by deployment source. The checked-in RT-VLM `.env` and VSS alerts/profile sources use `mdx-vlm` and `mdx-vlm-incidents`; a bare copied compose with no `RTVI_VLM_KAFKA_*` overrides falls back to `vision-llm-messages` and `vision-llm-events-incidents`. Always trust the live `vss-rtvi-vlm` environment before consuming.
Standalone Kafka must advertise `${HOST_IP}:9092`. The RT-VLM compose uses `KAFKA_BOOTSTRAP_SERVERS=${HOST_IP}:9092`; a broker that advertises `localhost:9094` or `kafka:9092` may pass producer/consumer tests inside the broker container while RT-VLM publish fails.
Start Kafka before RT-VLM when Kafka is enabled. For deterministic standalone validation, make the broker reachable at `${HOST_IP}:9092` first. If you start Kafka later or change its advertised listener, restart/recreate `rtvi-vlm` before expecting Kafka offsets to move.
`stream=true` returns Server-Sent Events, not chunked JSON. Use `curl -N` (no buffering). Each event is `data: {...}\n\n` with per-chunk fields such as `content`, `start_time`, and `end_time`, and the stream is terminated by `data: [DONE]`. Without `stream=true` the server buffers until the full video is processed, which is fine for short clips (<1 min) but should be avoided for live streams.
Trust live OpenAPI for optional NIM-compatible endpoints. `/v1/license` is not exposed by current 26.05 builds and returns 404, even though older generic NIM docs may mention it.
Prefer `/v1/chat/completions` over `/v1/completions`. Text-only legacy completions return HTTP 400 by design on current 26.05 builds; text-only chat completions work.
`chunk_duration=0` disables chunking: the entire video is sent to the VLM as one shot. Only meaningful for short clips; long videos will OOM or exceed `max_model_len`.
Default frame budget caps at `VLLM_MM_PROCESSOR_VIDEO_NUM_FRAMES` (256). Requesting FPS that implies >256 frames per chunk is silently capped; drop FPS or shorten `chunk_duration` to stay within budget.
`enable_reasoning` requires a Cosmos Reason model. Passing it with Qwen3-VL or other non-reasoning models is a no-op.
`/v1/metrics` is unauthenticated on current 26.05 standalone builds. A Bearer token is harmless if a deployment has stricter auth, but do not fail validation when `/v1/metrics` returns HTTP 200 without auth.
File upload is multipart, not JSON. Use `-F file=@path -F purpose=vision -F media_type=video`; a `-d` body returns 422.
Live-stream lifecycle cleanup must unregister the stream: `DELETE /v1/streams/delete/{stream_id}` removes the RTSP source. If the live schema also exposes `DELETE /v1/generate_captions/{stream_id}`, call it first to stop inference explicitly.
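The SSE framing noted for `stream=true` (one `data: {...}` event per chunk, ending with `data: [DONE]`) can be unwrapped with a few lines of shell. This is a minimal sketch under those stated framing assumptions; `sse_payloads` is a hypothetical helper, and `request.json` in the usage comment is a placeholder for your own request body.

```bash
# Hypothetical helper: read an SSE stream on stdin and print one JSON
# payload per line. Stops at the "data: [DONE]" sentinel and ignores blank
# separator lines and any non-data fields.
sse_payloads() {
  tr -d '\r' | while IFS= read -r line; do
    case "$line" in
      "data: [DONE]") break ;;                        # end-of-stream sentinel
      "data: "*) printf '%s\n' "${line#data: }" ;;    # strip the SSE field name
    esac
  done
}

# Example (request.json is a placeholder file holding the request body):
#   curl -N -s -X POST "$BASE_URL/v1/generate_captions" \
#     -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
#     -d @request.json | sse_payloads
```

Each printed line can then be passed to `jq` to pull out `content`, `start_time`, and `end_time`.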




`POST /v1/generate_captions`: Generate VLM captions (and alerts) for video/stream

`DELETE /v1/generate_captions/{stream_id}`: Stop caption generation for a live stream, if exposed

`POST /v1/files`: Upload a media file (multipart)

`GET /v1/files?purpose=vision`: List uploaded files

`GET /v1/files/{file_id}`: File metadata

`GET /v1/files/{file_id}/content`: Download original file content

`DELETE /v1/files/{file_id}`: Delete file (releases asset storage)

`POST /v1/streams/add`: Register one or more RTSP streams

`GET /v1/streams/get-stream-info`: List active streams

`DELETE /v1/streams/delete/{stream_id}`: Remove a single stream

`DELETE /v1/streams/delete-batch`: Remove many (`{"stream_ids":[...]}`)

`POST /v1/chat/completions`: OpenAI-compatible chat (text + multimodal)

`POST /v1/completions`: OpenAI-compatible legacy completions

`GET /v1/version`: `{ "version": "3.2.0-..." }`

`GET /v1/manifest`: NIM manifest

`GET /v1/health/live` · `GET /v1/health/ready`: NIM-style probes

`GET /v1/models`: List loaded VLMs: `{ "data": [{ "id", "object": "model", "owned_by" }] }`

`GET /v1/metadata`: Service metadata (build, release, image tag)

`GET /v1/assets/stats`: Asset storage counts, TTL, and oldest-asset age

`GET /v1/metrics`: Prometheus metrics (plain text)

`GET /v1/ready` · `GET /v1/live` · `GET /v1/startup`: Kubernetes-style probes
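The probe endpoints listed above are typically polled until the service reports ready. Below is a small retry loop as a sketch; `wait_until` is a hypothetical helper that takes any probe command, and the attempt count and delay in the usage comment are arbitrary illustrative values.

```bash
# Hypothetical helper: run a probe command until it succeeds or the
# attempt budget is exhausted. Usage: wait_until ATTEMPTS DELAY_SECONDS CMD...
wait_until() {
  attempts="$1"
  delay="$2"
  shift 2
  i=0
  while [ "$i" -lt "$attempts" ]; do
    if "$@"; then
      return 0          # probe succeeded
    fi
    i=$((i + 1))
    sleep "$delay"      # back off before the next attempt
  done
  return 1              # never became ready
}

# Example: poll readiness for up to 5 minutes (60 attempts, 5 s apart).
#   wait_until 60 5 curl -sf -o /dev/null "$BASE_URL/v1/health/ready"
```

Passing the probe as a command keeps the loop independent of which endpoint a given build exposes, so `/v1/ready` or `/v1/startup` can be substituted if the live OpenAPI lists them.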