vss-deploy-dense-captioning

Purpose

Stand up the RT-VLM dense-captioning microservice on its own and exercise every endpoint it exposes (file upload, generate_captions, stream add/delete, chat-completions, Kafka topics).

Prerequisites

For standalone RT-VLM deployment:
  • Docker, Docker Compose, NVIDIA Container Toolkit, and a visible GPU.
  • NGC registry credentials in `$NGC_CLI_API_KEY` for `docker login nvcr.io`, image pulls, and local NGC model/artifact downloads.
  • `curl`, `jq`, and any writable working directory for the standalone compose copy.
For API calls against an existing service:
  • Running RT-VLM service reachable at `$BASE_URL`.
  • Bearer token in `$RTVI_VLM_API_KEY` or `$NGC_CLI_API_KEY`, depending on how the service was configured.
For full VSS profile deployment:
  • Use `../vss-deploy-profile/SKILL.md`; this skill does not deploy full VSS profiles.

Instructions

Follow the routing tables and step-by-step workflows below. Each section whose title ends in workflow, quick start, or flow is intended to be executed top-to-bottom. Detailed reference material lives in `references/` and helper scripts live in `scripts/`; call them via `run_script` when the skill points to a script by name.

Examples

Worked end-to-end examples are kept under `evals/` (each `*.json` manifest contains a runnable scenario) and inline in the per-workflow `curl` blocks below. Run a Tier-3 evaluation with `nv-base validate <this-skill-dir> --agent-eval` to replay them.

Limitations

  • Requires either a standalone RT-VLM service deployed via this skill or an existing RT-VLM service reachable from the caller.
  • NGC-hosted models and NIMs may be subject to rate limits, GPU memory requirements, and license restrictions.
  • Concurrency, GPU memory, and storage limits depend on the host hardware and the profile's compose file.

Troubleshooting

  • Error: REST call returns connection refused. Cause: target microservice not running. Solution: probe `/docs` or `/health`; redeploy via `vss-deploy-profile` or the matching `vss-deploy-*` skill.
  • Error: HTTP 401/403 from NGC pulls. Cause: missing/expired `NGC_CLI_API_KEY`. Solution: `docker login nvcr.io` and re-export the key before retrying.
  • Error: container OOM or model fails to load. Cause: insufficient GPU memory for the selected profile. Solution: switch to a smaller variant or free GPUs via `docker compose down`.

Deploy and Use RT-VLM Dense Captioning (VSS 3.2)

RT-VLM is NVIDIA's real-time vision-language microservice: decode video (file or RTSP), segment it into chunks, run a VLM (`cosmos-reason1`, `cosmos-reason2`, or any OpenAI-compatible model), stream dense captions back over SSE/HTTP, and publish captions, incident alerts, and errors to Kafka. Use this skill to deploy the standalone RT-VLM service when a full VSS profile is not already running, then call its `/v1/...` API for caption generation, file upload, live-stream management, health checks, NIM-compatible chat completions, or Prometheus metrics. API reference: https://docs.nvidia.com/vss/latest/real-time-vlm-api.html.

Deployment Routing

If the user asks to deploy a full VSS profile, use `../vss-deploy-profile/SKILL.md`. That skill owns profile routing, `generated.env`, `resolved.yml`, multi-service sizing, and full-stack deploy/teardown.

If the user asks for standalone RT-VLM dense captioning, or no VSS profile is already running, use the standalone RT-VLM flow in `references/deploy-rt-vlm-service.md` before calling the API. This follows the same compose-centric pattern as `vss-deploy-profile`: gather context, run preflights, work from a local copy, dry-run with `docker compose config`, review, deploy, then wait for health.

Standalone Deployment Flow

Always follow this sequence. Never skip the dry-run.

```bash
# 1. Copy deploy/docker/services/rtvi/rtvi-vlm/rtvi-vlm-docker-compose.yml
#    into any writable standalone working directory.
# 2. Derive RTVI_VLM_IMAGE_TAG from that compose copy.
# 3. Strip the standalone-only dangling depends_on block from the copy.
# 4. Create a gitignored .env with the required RT-VLM values.
# 5. Prepare host bind paths such as $VSS_DATA_DIR/data_log/vst/clip_storage.
# 6. docker compose --env-file .env -f rtvi-vlm-docker-compose.yml config --quiet
# 7. docker pull the exact RT-VLM image tag.
# 8. docker compose ... up -d rtvi-vlm, wait for ready, then smoke test.
```

Run preflights before any pull or `up`; stop and fix failures here before
debugging RT-VLM itself:

```bash
nvidia-smi --query-gpu=index,name --format=csv,noheader
nvidia-container-cli info
docker compose version
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
```
For standalone single-file deployments, do not run the raw `deploy/docker/services/rtvi/rtvi-vlm/rtvi-vlm-docker-compose.yml` directly: it contains `depends_on` references to sibling VLM/NIM services that are only defined in the full VSS/met-blueprints compose project. The standalone reference shows how to copy the compose file, derive the current image tag from it, strip the `depends_on` block, and validate the result before `up`.

If `docker pull` fails with a containerd snapshotter/unpack error on Docker 28+, apply the `/etc/docker/daemon.json` `containerd-snapshotter=false` fix in the standalone reference before retrying.
Minimum standalone `.env` values:

| Host env var | Required when | Purpose |
|---|---|---|
| `NGC_CLI_API_KEY` | Standalone deploy path | NGC registry image pull and NGC model/artifact download |
| `RTVI_VLM_API_KEY` or `NGC_CLI_API_KEY` | Authenticated API calls | RT-VLM bearer auth after the service is running |
| `RTVI_VLM_PORT` | Always | Host API port mapped to container `8000` |
| `HOST_IP` | Always | Kafka bootstrap host (`${HOST_IP}:9092`) |
| `VSS_DATA_DIR` | Always | Required clip-storage bind mount |
| `RTVI_VLM_MODEL_TO_USE` | Always for standalone | Backend selector; use `cosmos-reason2` for the default local model or `openai-compat` for a remote/sibling endpoint |
| `RTVI_VLM_MODEL_PATH` | Local self-hosted model | Source-backed Cosmos Reason 2 path: `ngc:nim/nvidia/cosmos-reason2-8b:0303-fp8-dynamic-kv8` |
| `RTVI_VLM_ENDPOINT` | `RTVI_VLM_MODEL_TO_USE=openai-compat` | Remote/sibling OpenAI-compatible VLM endpoint |
| `VLM_NAME` | `RTVI_VLM_MODEL_TO_USE=openai-compat` | Model/deployment name exposed by that endpoint |
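
The conditional rows above can be checked before the dry-run. The following is a minimal sketch, not part of the skill's scripts: `missing_rtvlm_env` is a hypothetical helper name, and it assumes that any selector value other than `openai-compat` means a local self-hosted model that needs `RTVI_VLM_MODEL_PATH`.

```bash
# Sketch: report which standalone .env variables are still unset.
# Variable names come from the table above; the function name is hypothetical.
missing_rtvlm_env() {
  local required=(NGC_CLI_API_KEY RTVI_VLM_PORT HOST_IP VSS_DATA_DIR RTVI_VLM_MODEL_TO_USE)
  # Backend-specific additions, keyed on the selector value (assumption:
  # anything other than openai-compat is treated as a local model).
  if [ "${RTVI_VLM_MODEL_TO_USE:-}" = "openai-compat" ]; then
    required+=(RTVI_VLM_ENDPOINT VLM_NAME)
  else
    required+=(RTVI_VLM_MODEL_PATH)
  fi
  local name missing=()
  for name in "${required[@]}"; do
    # ${!name} is bash indirect expansion: the value of the variable named by $name.
    [ -n "${!name:-}" ] || missing+=("$name")
  done
  # Print the missing names space-separated; succeed only when none are missing.
  echo "${missing[*]}"
  [ "${#missing[@]}" -eq 0 ]
}
```

One way to use it is to load the file into the current shell first (`set -a; . ./.env; set +a`) and then run `missing_rtvlm_env || echo "fill in the variables listed above"`.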

Setup

```bash
export BASE_URL="http://localhost:${RTVI_VLM_PORT:-8018}"  # host-side RT-VLM port
export API_KEY="${NGC_CLI_API_KEY:-${RTVI_VLM_API_KEY:-}}" # bearer token used by host-side curl commands
: "${API_KEY:?Set NGC_CLI_API_KEY or RTVI_VLM_API_KEY before calling authenticated endpoints}"
```

Every request below uses `Authorization: Bearer $API_KEY`. Health endpoints (`/v1/health/*`, `/v1/ready`, `/v1/live`, `/v1/startup`) typically work without auth.

Smoke test before use:

```bash
curl -fsS "$BASE_URL/v1/health/ready"
MODEL_ID="$(curl -fsS "$BASE_URL/v1/models" -H "Authorization: Bearer $API_KEY" | jq -r '.data[0].id // .id')"
curl -fsS "$BASE_URL/openapi.json" | jq -r '.paths | keys[]' | sort
```
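
Right after `up -d` the readiness probe can fail for a while, since the model may still be downloading. A small retry loop avoids running the smoke test too early. This is a sketch only: `wait_until` is a hypothetical helper name, and the attempt count and delay in the usage comment are illustrative rather than recommended values.

```bash
# Sketch: retry a command until it succeeds or the attempts run out.
# wait_until is a hypothetical helper name, not part of RT-VLM.
wait_until() {
  local attempts=$1 delay=$2
  shift 2
  local i
  for ((i = 1; i <= attempts; i++)); do
    # Run the probe quietly; its exit status decides success.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    # Sleep between attempts, but not after the last one.
    if ((i < attempts)); then sleep "$delay"; fi
  done
  return 1
}

# Example (not run here): poll the readiness probe every 5 seconds.
# wait_until 120 5 curl -fsS "$BASE_URL/v1/health/ready"
```

The helper returns non-zero once the attempts are exhausted, so it can gate the smoke test with `wait_until ... && curl ...`.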

Quick Start — dense captions from a local video

```bash
# 1. Upload the video, capture its file id
FILE_ID=$(curl -fsS -X POST "$BASE_URL/v1/files" \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@/path/to/warehouse.mp4" \
  -F "purpose=vision" \
  -F "media_type=video" | jq -r '.id')

# 2. Generate captions + alerts (SSE stream of chunked responses)
curl -N -X POST "$BASE_URL/v1/generate_captions" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d "{
    \"id\": \"$FILE_ID\",
    \"prompt\": \"Write a concise dense caption for each 10-second segment of this warehouse video.\",
    \"model\": \"$MODEL_ID\",
    \"chunk_duration\": 10,
    \"stream\": true
  }"
```

Endpoints

Captions

Generate VLM captions and alerts for videos and live streams.

POST /v1/generate_captions
— Generate VLM captions (and alerts) for video/stream

Required:

| Field | Type | Description |
|---|---|---|
| `id` | string \| array | UUID of a previously-uploaded file, or id of an active live stream. Accepts a list of ids for batch |
| `prompt` | string | User prompt to the VLM (e.g. dense-caption instruction) |
| `model` | string | Exact model id returned by `GET /v1/models`, for example `nim_nvidia_cosmos-reason2-8b_0303-fp8-dynamic-kv8`; backend selector aliases such as `cosmos-reason2` are not request model ids |

Key optional fields:

| Field | Type | Default | Description |
|---|---|---|---|
| `system_prompt` | string | | System prompt; use `<think></think><answer></answer>` tags to enable reasoning on Cosmos Reason |
| `enable_reasoning` | boolean | false | Turn on reasoning for Cosmos Reason models |
| `enable_audio` | boolean | false | Transcribe audio (via Riva) and fold into captions |
| `chunk_duration` | integer | | Segment video into N-second chunks (`0` = no chunking) |
| `chunk_overlap_duration` | integer | 0 | Overlap between consecutive chunks |
| `num_frames_per_second_or_fixed_frames_chunk` | number | | FPS (if `use_fps_for_chunking=true`) or fixed frames per chunk |
| `use_fps_for_chunking` | boolean | false | Interpret above as FPS vs. fixed-frame count |
| `vlm_input_width` / `vlm_input_height` | int | | Resize frames before inference (0 = native) |
| `media_info` | object | | `{"type":"offset","start_offset":0,"end_offset":10}` to process a slice of a file (not live streams) |
| `stream` | boolean | false | SSE: emit per-chunk caption deltas as `data:` events (recommended for long videos) |
| `max_tokens` / `temperature` / `top_p` / `top_k` / `seed` / `ignore_eos` | | | Standard sampling controls |
| `response_format` | object | | Query response format object |
| `mm_processor_kwargs` | object | | Extra kwargs for the multimodal processor (e.g. size, shortest/longest edge) |
```bash
curl -N -X POST "$BASE_URL/v1/generate_captions" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "123e4567-e89b-12d3-a456-426614174000",
    "prompt": "Dense-caption this warehouse video, one sentence per 10s chunk.",
    "model": "nim_nvidia_cosmos-reason2-8b_0303-fp8-dynamic-kv8",
    "chunk_duration": 10,
    "stream": true
  }'
```

Response shape: live 26.05 responses use `chunk_responses` with `start_time`/`end_time`; SSE streams terminate with `data: [DONE]`. See `references/api-surface-26.05.md`.
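
When `stream` is true, the client has to strip the SSE framing and stop at the `[DONE]` sentinel described above. The following is a minimal sketch of that step; `sse_payloads` is a hypothetical helper name, and it assumes each event carries its payload on a single `data: ` line.

```bash
# Sketch: print each SSE data payload on its own line, stopping at [DONE].
# sse_payloads is a hypothetical helper; it assumes one payload per data: line.
sse_payloads() {
  local line
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line%$'\r'}                 # tolerate CRLF line endings
    case $line in
      "data: [DONE]") break ;;         # end-of-stream sentinel
      "data: "*) printf '%s\n' "${line#data: }" ;;
      *) ;;                            # skip blank lines, comments, other fields
    esac
  done
}
```

It reads from standard input, so it can sit between the streaming request and `jq`, for example `curl -N ... | sse_payloads | jq -c .`.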

DELETE /v1/generate_captions/{stream_id}
— Stop caption generation for a live stream, if exposed

Some deployments expose this companion stop endpoint. Check the live OpenAPI (`curl -fsS "$BASE_URL/openapi.json" | jq '.paths | keys[]'`) before using it. Always pair live-stream cleanup with `DELETE /v1/streams/delete/{stream_id}` to un-register the RTSP source.

```bash
curl -X DELETE "$BASE_URL/v1/generate_captions/$STREAM_ID" -H "Authorization: Bearer $API_KEY"
```

Files

Upload and manage media files consumed by `/v1/generate_captions`.

POST /v1/files
— Upload a media file (multipart)

```bash
curl -X POST "$BASE_URL/v1/files" -H "Authorization: Bearer $API_KEY" \
  -F "file=@./video.mp4" -F "purpose=vision" -F "media_type=video"
```

Response: `{ "id", "object": "file", "bytes", "created_at", "filename", "purpose" }`. Optional metadata such as `sensor_name` may be accepted by newer builds; check the live OpenAPI before sending it.

GET /v1/files?purpose=vision
— List uploaded files

GET /v1/files/{file_id}
— File metadata

GET /v1/files/{file_id}/content
— Download original file content

DELETE /v1/files/{file_id}
— Delete file (releases asset storage)

Live Stream

RTSP stream lifecycle.

POST /v1/streams/add
— Register one or more RTSP streams

Required per stream: `liveStreamUrl` (must start with `rtsp://`), `description`. Optional: `username`, `password`, `sensor_name`, and placement metadata (`place_name`, `place_type`, `place_lat`, `place_lon`, `place_alt`, `place_coordinate_x`, `place_coordinate_y`).

Precheck public or external RTSP sources before registering them. A probe exit code alone is not enough; `gst-discoverer-1.0` can exit `0` while reporting an unknown media type. Treat the stream as usable only when a probe output identifies at least one video stream/caps entry. If one probe is inconclusive, cross-check with another tool such as `ffprobe` before failing or registering:

```bash
ffprobe -v error -select_streams v:0 \
  -show_entries stream=codec_type -of csv=p=0 "$RTSP_URL" | grep -qx video
```

```bash
STREAM_ID=$(curl -fsS -X POST "$BASE_URL/v1/streams/add" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d '{"streams":[{"liveStreamUrl":"rtsp://cam:8554/live","description":"warehouse cam 1"}]}' \
  | jq -r '.results[0].id')
```
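
The `rtsp://` prefix rule for `liveStreamUrl` is cheap to check on the client before probing or registering anything. A minimal sketch follows; `is_rtsp_url` is a hypothetical helper name, and it only checks the prefix, not reachability or media type.

```bash
# Sketch: client-side guard for the liveStreamUrl rule (must start with rtsp://).
# is_rtsp_url is a hypothetical helper; it does not contact the stream.
is_rtsp_url() {
  case $1 in
    rtsp://?*) return 0 ;;   # scheme plus at least one more character
    *) return 1 ;;
  esac
}
```

A call such as `is_rtsp_url "$RTSP_URL" || { echo "not an rtsp:// URL" >&2; exit 1; }` can run ahead of the `ffprobe` precheck.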

GET /v1/streams/get-stream-info
— List active streams

DELETE /v1/streams/delete/{stream_id}
— Remove a single stream

DELETE /v1/streams/delete-batch
— Remove many (
{"stream_ids":[...]}
)

CV-style singular stream endpoints

26.05 deployments also expose CV-style stream control paths: `POST /v1/stream/add`, `GET /v1/stream/get-stream-info`, and `POST /v1/stream/remove`. Use these when a workflow or release note explicitly uses the key/value envelope; otherwise prefer the plural RT-VLM stream endpoints above. See `references/api-surface-26.05.md` for examples and the `stream_count:0` compatibility caveat.

NIM Compatible

OpenAI-compatible endpoints for interop with OpenAI/NVIDIA-API clients.

POST /v1/chat/completions
— OpenAI-compatible chat (text + multimodal)

Required: `messages`, `model`. Text-only requests work and omit `id`, `video_url`, and `image_url`. For uploaded-video, direct `video_url`, direct `image_url`, streaming, and RTSP-backed chat examples, see `references/api-surface-26.05.md`.

```bash
curl -X POST "$BASE_URL/v1/chat/completions" -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"model\":\"$MODEL_ID\",\"messages\":[{\"role\":\"user\",\"content\":\"Summarize this scene.\"}]}"
```

POST /v1/completions
— OpenAI-compatible legacy completions

This endpoint exists for compatibility, but on current 26.05 builds text-only legacy completion requests return HTTP 400 by design. Use `/v1/chat/completions` for text-only and multimodal requests.

GET /v1/version
{ "version": "3.2.0-..." }

GET /v1/manifest
— NIM manifest

GET /v1/health/live
·
GET /v1/health/ready
— NIM-style probes

Do not assume `/v1/license` exists. The current 26.05 live OpenAPI does not expose it and the endpoint returns 404; only call it after checking `GET /openapi.json`.

Models · Metadata · Metrics · Health Check

GET /v1/models
— List loaded VLMs:
{ "data": [{ "id", "object": "model", "owned_by" }] }

GET /v1/metadata
— Service metadata (build, release, image tag)

GET /v1/assets/stats
— Asset storage counts, TTL, and oldest-asset age

GET /v1/metrics
— Prometheus metrics (plain text)

GET /v1/ready
·
GET /v1/live
·
GET /v1/startup
— Kubernetes-style probes


Common Workflows

The four standard dense-captioning scenarios.

1. Dense captions from a stored video file

```bash
# Upload → capture file id → generate captions (SSE stream)
FILE_ID=$(curl -fsS -X POST "$BASE_URL/v1/files" \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@warehouse.mp4" -F "purpose=vision" -F "media_type=video" | jq -r '.id')

curl -N -X POST "$BASE_URL/v1/generate_captions" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d "{ \"id\": \"$FILE_ID\", \"prompt\": \"Describe warehouse events in 1 sentence per 10s chunk.\", \"model\": \"$MODEL_ID\", \"chunk_duration\": 10, \"stream\": true }"

# When done, free storage:
curl -X DELETE "$BASE_URL/v1/files/$FILE_ID" -H "Authorization: Bearer $API_KEY"
```
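
The workflow above interpolates shell variables straight into a JSON body, which breaks as soon as a prompt contains a double quote, backslash, or newline. The following is a minimal sketch of escaping a value first; `json_escape` is a hypothetical helper name, and it covers only backslash, double quote, newline, and tab, not every control character. Since `jq` is already a prerequisite, `jq -n --arg` is a more complete alternative.

```bash
# Sketch: escape a string for use inside a JSON string literal.
# json_escape is a hypothetical helper; it handles \ " newline and tab only.
json_escape() {
  local s=$1
  s=${s//\\/\\\\}      # backslash    -> \\   (must run first)
  s=${s//\"/\\\"}      # double quote -> \"
  s=${s//$'\n'/\\n}    # newline      -> \n
  s=${s//$'\t'/\\t}    # tab          -> \t
  printf '%s' "$s"
}

# Example (not run here): build the request body from an arbitrary prompt.
# PROMPT='Describe "unsafe" events, one sentence per chunk.'
# BODY="{ \"id\": \"$FILE_ID\", \"prompt\": \"$(json_escape "$PROMPT")\", \"model\": \"$MODEL_ID\", \"stream\": true }"
```

The backslash replacement has to come first; otherwise the backslashes introduced by the later replacements would be escaped a second time.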

2. Dense captions from an RTSP live stream

```bash
# Register the stream
STREAM_ID=$(curl -fsS -X POST "$BASE_URL/v1/streams/add" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d '{"streams":[{"liveStreamUrl":"rtsp://10.0.0.5:8554/warehouse","description":"warehouse cam"}]}' \
  | jq -r '.results[0].id')

# Start continuous caption generation
curl -N -X POST "$BASE_URL/v1/generate_captions" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d "{ \"id\": \"$STREAM_ID\", \"prompt\": \"Describe each event; start each sentence with a timestamp.\", \"model\": \"$MODEL_ID\", \"chunk_duration\": 10, \"num_frames_per_second_or_fixed_frames_chunk\": 2, \"use_fps_for_chunking\": true, \"stream\": true }" &

# Tear down when finished. If the live OpenAPI exposes
# DELETE /v1/generate_captions/{stream_id}, call it before unregistering.
curl -X DELETE "$BASE_URL/v1/streams/delete/$STREAM_ID" -H "Authorization: Bearer $API_KEY"
```

3. Dense captions with alerts from an RTSP stream

```bash
# Pre-req: Kafka is enabled and topics match the deployment source.
# The checked-in rtvi-vlm/.env and VSS alerts profiles use:
#   RTVI_VLM_KAFKA_ENABLED=true
#   RTVI_VLM_KAFKA_TOPIC=mdx-vlm
#   RTVI_VLM_KAFKA_INCIDENT_TOPIC=mdx-vlm-incidents
#   RTVI_VLM_ERROR_MESSAGE_TOPIC=vision-llm-errors
#   HOST_IP=<kafka-host>
# A copied compose without those env overrides falls back to vision-llm-* topics.
# Confirm the live container before consuming:
#   docker exec vss-rtvi-vlm printenv KAFKA_TOPIC KAFKA_INCIDENT_TOPIC ERROR_MESSAGE_TOPIC

STREAM_ID=$(curl -fsS -X POST "$BASE_URL/v1/streams/add" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d '{"streams":[{"liveStreamUrl":"rtsp://10.0.0.5:8554/warehouse","description":"warehouse cam"}]}' \
  | jq -r '.results[0].id')

curl -N -X POST "$BASE_URL/v1/generate_captions" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d "{ \"id\": \"$STREAM_ID\", \"prompt\": \"You are a warehouse monitoring system. Describe the scene in one sentence, then on a new line output exactly:\nAnomaly Detected: Yes/No\nReason: <one sentence>\nFlag an anomaly if any worker is missing a hard hat or high-vis vest.\", \"system_prompt\": \"Answer the user's question correctly in yes or no.\", \"model\": \"$MODEL_ID\", \"chunk_duration\": 60, \"chunk_overlap_duration\": 10, \"stream\": true }"
```

**Consume alerts from Kafka**. Kafka values are NvSchema protobuf payloads, so
use `print.value=false` for a clean validation pass that shows timestamp, key,
and headers without dumping binary payload bytes. The VSS alerts/profile source
uses `mdx-vlm-incidents`; a bare copied compose may fall back to
`vision-llm-events-incidents` if no `RTVI_VLM_KAFKA_INCIDENT_TOPIC` override is
loaded. Prefer the live container environment over hard-coded topic names.
```bash
INCIDENT_TOPIC="${INCIDENT_TOPIC:-$(docker exec vss-rtvi-vlm printenv KAFKA_INCIDENT_TOPIC 2>/dev/null || true)}"
INCIDENT_TOPIC="${INCIDENT_TOPIC:-mdx-vlm-incidents}"

docker exec mdx-kafka kafka-console-consumer \
  --bootstrap-server 127.0.0.1:9092 \
  --topic "$INCIDENT_TOPIC" \
  --from-beginning \
  --timeout-ms 5000 \
  --max-messages 10 \
  --property print.timestamp=true \
  --property print.key=true \
  --property print.headers=true \
  --property print.value=false
```
If Kafka is not running in the VSS `mdx-kafka` container, use the Kafka CLI from the host or container running the broker:

```bash
INCIDENT_TOPIC="${INCIDENT_TOPIC:-mdx-vlm-incidents}"

kafka-console-consumer \
  --bootstrap-server "$HOST_IP:9092" \
  --topic "$INCIDENT_TOPIC" \
  --from-beginning \
  --timeout-ms 5000 \
  --max-messages 10 \
  --property print.timestamp=true \
  --property print.key=true \
  --property print.headers=true \
  --property print.value=false
```
For standalone validation, remember that the RT-VLM compose maps Kafka through `KAFKA_BOOTSTRAP_SERVERS=${HOST_IP}:9092`; setting `KAFKA_BOOTSTRAP_SERVERS` directly in `.env` is ignored unless the compose is changed. The broker must advertise a listener reachable from the `vss-rtvi-vlm` container. `localhost` inside the broker and service containers is not the host, and a broker alias such as `kafka:9092` only works when both containers share that Docker network. For RT-VLM-only validation, prefer the self-contained broker in `references/kafka-workflows.md` over the full repo infra compose; the latter expects full-profile SDRC env/config. If Kafka is already running, ask the user whether to reuse it or launch a dedicated broker before stopping or replacing anything. Run CLI checks inside the actual broker container, but still configure the advertised listener so RT-VLM can connect from its container network.
Incident protobuf (`ext.proto :: Incident`) key fields: `sensorId`, `timestamp`, `end`, `objectIds`, `frameIds`, `place`, `analyticsModule`, `category`, `isAnomaly` (`true` for alerts), `llm` (nested VisionLLM), `info` map including `triggerPhrase`, `verdict`, `requestId`, `chunkIdx`, `streamId`, `alertCategory` (if the deployment supports the `alert_category` query field, post-3.1).

4. Kafka workflows (alerts + message bus)

Dense captioning with alerts on an RTSP stream and the HTTP-vs-Kafka response model are documented in `references/kafka-workflows.md`.

Error Reference

| Code | Meaning | Common Cause |
|------|---------|--------------|
| 400 | Bad Request | Missing required field (`id`, `prompt`, `model`); unsupported `media_type`; unknown `model` name |
| 401 | Unauthorized | Missing/invalid `Authorization: Bearer $API_KEY`, or wrong key format (expect `nvapi-...`) |
| 404 | Not Found | `file_id` deleted / stream_id not registered / wrong endpoint path (note: `{stream_id}` is required on `DELETE /v1/streams/delete/{stream_id}`) |
| 413 | Payload Too Large | Uploaded file exceeds server `MAX_FILE_SIZE`; increase or pre-chunk the video |
| 422 | Unprocessable Entity | Pydantic schema violation, e.g. `use_fps_for_chunking=true` without `num_frames_per_second_or_fixed_frames_chunk`; stream ids supplied to a file-only field like `media_info` |
| 429 | Rate Limited | Too many concurrent streams; raise `VLM_BATCH_SIZE` or spread across instances |
| 500 | Server Error | VLM inference exception (OOM, model unavailable); check `docker logs vss-rtvi-vlm` |
| 503 | Service Busy | Startup not complete (model still downloading) or upstream NIM dependency unhealthy |
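One way a client script might act on the table above is to treat 429 and 503 as transient (retry after a pause) and the remaining 4xx codes as request problems to fix at the source. This is a sketch of a possible client-side policy, not behavior defined by RT-VLM; `classify_status` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: map an HTTP status code from the table to a suggested action.
classify_status() {
  case "$1" in
    2??)                 echo "ok" ;;           # request succeeded
    429|503)             echo "retry" ;;        # transient: back off and retry
    400|401|404|413|422) echo "fix-request" ;;  # client-side problem
    500)                 echo "check-logs" ;;   # inspect docker logs vss-rtvi-vlm
    *)                   echo "unknown" ;;
  esac
}

# Example wiring (commented out so the sketch runs offline):
# code="$(curl -s -o /dev/null -w '%{http_code}' \
#   -H "Authorization: Bearer $API_KEY" "$BASE_URL/openapi.json")"
# classify_status "$code"

classify_status 503
```

The split between "retry" and "fix-request" is a judgment call; adjust it to the retry policy your deployment actually wants.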


Gotchas

Prerequisite: Kafka is enabled and the topics match the deployment source.

The checked-in `rtvi-vlm/.env` and the VSS alerts profile use:

RTVI_VLM_KAFKA_ENABLED=true

RTVI_VLM_KAFKA_TOPIC=mdx-vlm

RTVI_VLM_KAFKA_INCIDENT_TOPIC=mdx-vlm-incidents

RTVI_VLM_ERROR_MESSAGE_TOPIC=vision-llm-errors

HOST_IP=<kafka-host>

A copied compose file without these env overrides falls back to the `vision-llm-*` topics.

Confirm the container's live configuration before consuming:

docker exec vss-rtvi-vlm printenv KAFKA_TOPIC KAFKA_INCIDENT_TOPIC ERROR_MESSAGE_TOPIC

  • Use the live OpenAPI as the source of truth. For VSS 3.2, the caption-generation endpoint is `/v1/generate_captions`. Some older references and images used `/v1/generate_captions_alerts`; do not assume that path exists unless `GET /openapi.json` shows it.
  • URL-based input support depends on the deployed service version. If the live schema does not expose `url` / `media_type` / `creation_time`, upload via `POST /v1/files` first and pass the returned `id`.
  • Alert trigger = the tokens `"yes"` or `"true"` in the VLM response (case-insensitive). There is no per-request alert flag. Design prompts with an explicit `Anomaly Detected: Yes/No` line and set `system_prompt` to constrain the model to Yes/No answers (per the VSS docs). Every chunk is published to `KAFKA_TOPIC`; matched chunks additionally go to `KAFKA_INCIDENT_TOPIC` with `isAnomaly=true`, `info["triggerPhrase"]` set to the matched tokens, and `info["verdict"]="confirmed"`.
  • `alert_category` support depends on the deployed service version. If the live OpenAPI schema does not expose it, Kafka incidents default to `incident.category = "vlm-alert"`.
  • Kafka topics are server-side config, not per-request. The `KAFKA_*` env vars (via compose `RTVI_VLM_KAFKA_*` rewrites) are fixed at container start, so clients can't override topics on a per-request basis. Kafka publish is additive to the HTTP response, never a replacement.
  • Topic names differ by deployment source. The checked-in RT-VLM `.env` and VSS alerts/profile sources use `mdx-vlm` and `mdx-vlm-incidents`; a bare copied compose with no `RTVI_VLM_KAFKA_*` overrides falls back to `vision-llm-messages` and `vision-llm-events-incidents`. Always trust the live `vss-rtvi-vlm` environment before consuming.
  • Standalone Kafka must advertise `${HOST_IP}:9092`. The RT-VLM compose uses `KAFKA_BOOTSTRAP_SERVERS=${HOST_IP}:9092`; a broker that advertises `localhost:9094` or `kafka:9092` may pass producer/consumer tests inside the broker container while RT-VLM publish fails.
  • Start Kafka before RT-VLM when Kafka is enabled. For deterministic standalone validation, make the broker reachable at `${HOST_IP}:9092` first. If you start Kafka later or change its advertised listener, restart/recreate `rtvi-vlm` before expecting Kafka offsets to move.
  • `stream=true` returns Server-Sent Events, not chunked JSON. Use `curl -N` (no buffering). Each event is `data: {...}\n\n` with per-chunk fields such as `content`, `start_time`, and `end_time`, terminated by `data: [DONE]`. Without `stream=true` the server buffers until the full video is processed, which is fine for short clips (<1 min) but should be avoided for live streams.
  • Trust live OpenAPI for optional NIM-compatible endpoints. `/v1/license` is not exposed by current 26.05 builds and returns 404, even though older generic NIM docs may mention it.
  • Prefer `/v1/chat/completions` over `/v1/completions`. Text-only legacy completions return HTTP 400 by design on current 26.05 builds; text-only chat completions work.
  • `chunk_duration=0` disables chunking: the entire video is sent to the VLM as one shot. Only meaningful for short clips; long videos will OOM or exceed `max_model_len`.
  • Default frame budget caps at `VLLM_MM_PROCESSOR_VIDEO_NUM_FRAMES` (256). Requesting FPS that implies >256 frames per chunk is silently capped; drop FPS or shorten `chunk_duration` to stay within budget.
  • `enable_reasoning` requires a Cosmos Reason model. Passing it with Qwen3-VL or other non-reasoning models is a no-op.
  • `/v1/metrics` is unauthenticated on current 26.05 standalone builds. A Bearer token is harmless if a deployment has stricter auth, but do not fail validation when `/v1/metrics` returns HTTP 200 without auth.
  • File upload is multipart, not JSON. Use `-F file=@path -F purpose=vision -F media_type=video`; a `-d` body returns 422.
  • Live-stream lifecycle cleanup must unregister the stream: `DELETE /v1/streams/delete/{stream_id}` removes the RTSP source. If the live schema also exposes `DELETE /v1/generate_captions/{stream_id}`, call it first to stop inference explicitly.
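The Server-Sent Events framing described in the `stream=true` gotcha can be consumed with a small filter. The sketch below reads the stream on stdin, strips the `data: ` prefix, prints one JSON payload per line, and stops at `data: [DONE]`. The function name `sse_payloads` is hypothetical, and the canned payload is only an illustration of the documented field names, not captured server output.

```shell
#!/usr/bin/env bash
# Hypothetical helper: read Server-Sent Events on stdin and print one JSON
# payload per line, stopping at the terminating "data: [DONE]" event.
sse_payloads() {
  local line
  while IFS= read -r line; do
    line="${line%$'\r'}"                 # tolerate CRLF line endings
    case "$line" in
      "data: [DONE]") break ;;           # end-of-stream marker
      "data: "*) printf '%s\n' "${line#data: }" ;;
    esac                                 # blank separator lines are skipped
  done
}

# Offline demo with a canned stream (illustrative payload only).
printf 'data: {"content":"a forklift enters","start_time":0,"end_time":10}\n\ndata: [DONE]\n\n' \
  | sse_payloads
```

Against a live service this could be wired as `curl -N ... | sse_payloads | jq -r '.content'`, assuming each payload carries `content` at the top level; check the live schema before relying on that shape.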
STREAM_ID=$(curl -fsS -X POST "$BASE_URL/v1/streams/add" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d '{"streams":[{"liveStreamUrl":"rtsp://10.0.0.5:8554/warehouse","description":"warehouse cam"}]}' \
  | jq -r '.results[0].id')
curl -N -X POST "$BASE_URL/v1/generate_captions" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d "{ \"id\": \"$STREAM_ID\", \"prompt\": \"You are a warehouse monitoring system. Describe the scene in one sentence, then on the next lines output exactly:\nAnomaly Detected: Yes/No\nReason: <one sentence>\nFlag an anomaly if any worker is not wearing a hard hat or a high-visibility vest.\", \"system_prompt\": \"Answer the user's question correctly in yes or no.\", \"model\": \"$MODEL_ID\", \"chunk_duration\": 60, \"chunk_overlap_duration\": 10, \"stream\": true }"
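To preview whether a given model response would count as an alert, the sketch below applies the rule stated in Gotchas: the tokens "yes" or "true", case-insensitive. It assumes "token" means a whole-word match; how the server actually tokenizes the response is not specified here, so treat this as an approximation. `would_trigger_alert` is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeed (exit 0) when the text contains "yes" or "true"
# as a whole word, ignoring case. Whole-word matching is an assumption.
would_trigger_alert() {
  printf '%s\n' "$1" | grep -Eiqw 'yes|true'
}

if would_trigger_alert "Anomaly Detected: Yes"; then
  echo "alert expected on the incident topic"
fi
would_trigger_alert "Anomaly Detected: No" || echo "no alert expected"
```

A check like this is useful when iterating on prompts: a response that mentions "yes" in passing would also match, which is why the prompt above pins the answer to a single `Anomaly Detected:` line.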

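**Consume alerts from Kafka.** Kafka message values are NvSchema protobuf payloads, so `print.value=false` gives a clean validation run that shows timestamps, keys, and headers without dumping binary payload bytes. VSS alerts/profile sources use `mdx-vlm-incidents`; a bare copied compose may fall back to `vision-llm-events-incidents` if the `RTVI_VLM_KAFKA_INCIDENT_TOPIC` override is not loaded. Trust the container's live environment over hard-coded topic names.
```bash
INCIDENT_TOPIC="${INCIDENT_TOPIC:-$(docker exec vss-rtvi-vlm printenv KAFKA_INCIDENT_TOPIC 2>/dev/null || true)}"
INCIDENT_TOPIC="${INCIDENT_TOPIC:-mdx-vlm-incidents}"

docker exec mdx-kafka kafka-console-consumer \
  --bootstrap-server 127.0.0.1:9092 \
  --topic "$INCIDENT_TOPIC" \
  --from-beginning \
  --timeout-ms 5000 \
  --max-messages 10 \
  --property print.timestamp=true \
  --property print.key=true \
  --property print.headers=true \
  --property print.value=false
```
If Kafka is not running in the VSS `mdx-kafka` container, use the Kafka CLI from the host or container where the broker runs:
```bash
INCIDENT_TOPIC="${INCIDENT_TOPIC:-mdx-vlm-incidents}"

kafka-console-consumer \
  --bootstrap-server "$HOST_IP:9092" \
  --topic "$INCIDENT_TOPIC" \
  --from-beginning \
  --timeout-ms 5000 \
  --max-messages 10 \
  --property print.timestamp=true \
  --property print.key=true \
  --property print.headers=true \
  --property print.value=false
```
For standalone validation, remember that the RT-VLM compose maps Kafka through `KAFKA_BOOTSTRAP_SERVERS=${HOST_IP}:9092`; setting `KAFKA_BOOTSTRAP_SERVERS` directly in `.env` is ignored unless you modify the compose file. The broker must advertise a listener that the `vss-rtvi-vlm` container can reach. `localhost` inside the broker and service containers is not the host, and broker aliases such as `kafka:9092` work only when both containers share a Docker network. For RT-VLM-only validation, prefer the standalone broker in `references/kafka-workflows.md` over the full repo infra compose; the latter expects full-profile SDRC env/config. If Kafka is already running, ask the user whether to reuse it or launch a dedicated broker before stopping or replacing anything. Run CLI checks inside the actual broker container, but still configure the advertised listener so RT-VLM can connect from its container network.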
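The advertised-listener pitfall can be caught with a quick string check before starting RT-VLM. The sketch below classifies an advertised address against `HOST_IP`: `ok` when the host part matches, `loopback` for `localhost` or `127.*`, and `alias` for anything else (for example a Docker network name). `classify_listener` is a hypothetical helper; it inspects only the host portion and does not probe the network or verify the port.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify_listener <advertised> <host_ip>
# <advertised> may be "host:port" or "SCHEME://host:port".
classify_listener() {
  local advertised="${1#*://}"     # drop an optional PLAINTEXT:// style prefix
  local host_ip="$2"
  local host="${advertised%%:*}"   # keep the part before the first colon
  case "$host" in
    "$host_ip")      echo "ok" ;;        # reachable from the RT-VLM container
    localhost|127.*) echo "loopback" ;;  # only valid inside the broker itself
    *)               echo "alias" ;;     # needs a shared Docker network
  esac
}

classify_listener "localhost:9094" "10.0.0.5"
classify_listener "10.0.0.5:9092" "10.0.0.5"
```

An `alias` result is not necessarily wrong; it only flags that both containers must share a Docker network for that name to resolve.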

