vss-deploy-dense-captioning
Purpose
Stand up the RT-VLM dense-captioning microservice on its own and exercise every endpoint it exposes (file upload, generate_captions, stream add/delete, chat-completions, Kafka topics).
Prerequisites
For standalone RT-VLM deployment:
- Docker, Docker Compose, NVIDIA Container Toolkit, and a visible GPU.
- NGC registry credentials in `$NGC_CLI_API_KEY` for `docker login nvcr.io`, image pulls, and local NGC model/artifact downloads.
- `curl`, `jq`, and any writable working directory for the standalone compose copy.
For API calls against an existing service:
- Running RT-VLM service reachable at `$BASE_URL`.
- Bearer token in `$RTVI_VLM_API_KEY` or `$NGC_CLI_API_KEY`, depending on how the service was configured.
For full VSS profile deployment:
- Use `../vss-deploy-profile/SKILL.md`; this skill does not deploy full VSS profiles.
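The prerequisites above can be checked locally before any deployment step. The following is a minimal sketch, not part of the skill's own scripts: the helper names `missing_tools` and `missing_env` are hypothetical, and the tool and variable names passed to them are the ones listed in the prerequisites.

```shell
# Hypothetical helpers: report missing prerequisites without touching the network.

# Print each command named in the arguments that is not found on PATH.
missing_tools() (
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
)

# Print each named environment variable that is unset or empty.
missing_env() (
  for name in "$@"; do
    eval "value=\${$name:-}"
    [ -n "$value" ] || printf '%s\n' "$name"
  done
)
```

For the standalone path this could be called as `missing_tools docker curl jq nvidia-smi` and `missing_env NGC_CLI_API_KEY`; empty output means nothing is missing.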
Instructions
Follow the routing tables and step-by-step workflows below. Each section that ends in workflow, quick start, or flow is intended to be executed top-to-bottom. Detailed reference material lives in `references/`, and helper scripts live in `scripts/`; call them via `run_script` when the skill points to a script by name.
Examples
Worked end-to-end examples are kept under `evals/*.json` (each manifest contains a runnable scenario) and inline in the per-workflow `curl` blocks below. Run a Tier-3 evaluation with `nv-base validate <this-skill-dir> --agent-eval` to replay them.
Limitations
- Requires either a standalone RT-VLM service deployed via this skill or an existing RT-VLM service reachable from the caller.
- NGC-hosted models and NIMs may be subject to rate-limits, GPU memory requirements, and license restrictions.
- Concurrency, GPU memory, and storage limits depend on the host hardware and the profile's compose file.
Troubleshooting
- Error: REST call returns connection refused. Cause: target microservice not running. Solution: probe `/docs` or `/health`; redeploy via `vss-deploy-profile` or the matching `vss-deploy-*` skill.
- Error: HTTP 401/403 from NGC pulls. Cause: missing/expired `NGC_CLI_API_KEY`. Solution: `docker login nvcr.io` and re-export the key before retrying.
- Error: container OOM or model fails to load. Cause: insufficient GPU memory for the selected profile. Solution: switch to a smaller variant or free GPUs via `docker compose down`.
Deploy and Use RT-VLM Dense Captioning (VSS 3.2)
RT-VLM is NVIDIA's real-time vision-language microservice. It decodes video (file or
RTSP), segments it into chunks, runs a VLM (`cosmos-reason1`, `cosmos-reason2`, or any
OpenAI-compatible model), streams dense captions back over SSE/HTTP, and publishes
captions, incident alerts, and errors to Kafka. Use this skill to deploy the
standalone RT-VLM service when a full VSS profile is not already running, then call
its `/v1/...` API for caption generation, file upload, live-stream management, health
checks, NIM-compatible chat completions, or Prometheus metrics. API reference:
https://docs.nvidia.com/vss/latest/real-time-vlm-api.html.
Deployment Routing
If the user asks to deploy a full VSS profile, use
`../vss-deploy-profile/SKILL.md`. That skill
owns profile routing, `generated.env`, `resolved.yml`, multi-service sizing, and
full-stack deploy/teardown.
If the user asks for standalone RT-VLM dense captioning, or no VSS profile is
already running, use the standalone RT-VLM flow in
`references/deploy-rt-vlm-service.md` before calling the API. This follows the
same compose-centric pattern as `vss-deploy-profile`: gather context, run
preflights, work from a local copy, dry-run with `docker compose config`,
review, deploy, then wait for health.
Standalone Deployment Flow
Always follow this sequence. Never skip the dry-run.
1. Copy `deploy/docker/services/rtvi/rtvi-vlm/rtvi-vlm-docker-compose.yml` into any writable standalone working directory.
2. Derive `RTVI_VLM_IMAGE_TAG` from that compose copy.
3. Strip the standalone-only dangling `depends_on` block from the copy.
4. Create a gitignored `.env` with the required RT-VLM values.
5. Prepare host bind paths such as `$VSS_DATA_DIR/data_log/vst/clip_storage`.
6. Dry-run with `docker compose --env-file .env -f rtvi-vlm-docker-compose.yml config --quiet`.
7. `docker pull` the exact RT-VLM image tag.
8. `docker compose ... up -d rtvi-vlm`, wait for ready, then smoke test.
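The "wait for ready" part of the last step is a bounded retry loop. The sketch below is one way to write it; `retry_until` is a hypothetical helper, and the attempt count and delay are placeholders rather than values recommended by the skill.

```shell
# retry_until ATTEMPTS DELAY_SECONDS COMMAND [ARGS...]
# Re-run COMMAND until it succeeds; give up after ATTEMPTS tries.
# The body runs in a subshell, so `exit` only ends this function call.
retry_until() (
  attempts=$1
  delay=$2
  shift 2
  i=1
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      exit 0
    fi
    if [ "$i" -lt "$attempts" ]; then
      sleep "$delay"
    fi
    i=$((i + 1))
  done
  exit 1
)
```

Against a running service it could be used as `retry_until 60 5 curl -fsS "$BASE_URL/v1/health/ready"`, which polls the readiness probe every 5 seconds for up to 60 attempts.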
Run preflights before any pull or `up`; stop and fix failures here before
debugging RT-VLM itself:
```bash
nvidia-smi --query-gpu=index,name --format=csv,noheader
nvidia-container-cli info
docker compose version
docker run --rm --gpus all nvidia/cuda:12.4.0-base-ubuntu22.04 nvidia-smi
```
For standalone single-file deployments, do not run the raw
`deploy/docker/services/rtvi/rtvi-vlm/rtvi-vlm-docker-compose.yml` directly: it
contains `depends_on` references to sibling VLM/NIM services that are only
defined in the full VSS/met-blueprints compose project. The standalone reference
shows how to copy the compose file, derive the current image tag from it, strip
the `depends_on` block, and validate the result before `up`.
If `docker pull` fails with a containerd snapshotter/unpack error on Docker 28+,
apply the `/etc/docker/daemon.json` `containerd-snapshotter=false` fix in the
standalone reference before retrying.
Minimum standalone `.env` values:
| Host env var | Required when | Purpose |
|---|---|---|
| | Standalone deploy path | NGC registry image pull and NGC model/artifact download |
| | Authenticated API calls | RT-VLM bearer auth after the service is running |
| | Always | Host API port mapped to container |
| | Always | Kafka bootstrap host ( |
| | Always | Required clip-storage bind mount |
| | Always for standalone | Backend selector; use |
| | Local self-hosted model | Source-backed Cosmos Reason 2 path: |
| | | Remote/sibling OpenAI-compatible VLM endpoint |
| | | Model/deployment name exposed by that endpoint |
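Before the dry-run it can help to confirm that the `.env` file actually defines the values the compose file expects. The sketch below is an illustration only: `env_file_missing_keys` is a hypothetical helper, and the key names in the usage note are variables mentioned elsewhere in this document, not an authoritative list (the standalone reference holds that).

```shell
# env_file_missing_keys FILE KEY [KEY...]
# Print each KEY that has no non-empty KEY=value line in FILE.
# Commented-out lines and empty assignments count as missing.
env_file_missing_keys() (
  file=$1
  shift
  for key in "$@"; do
    grep -Eq "^[[:space:]]*(export[[:space:]]+)?${key}=.+" "$file" ||
      printf '%s\n' "$key"
  done
)
```

For example, `env_file_missing_keys .env NGC_CLI_API_KEY RTVI_VLM_PORT HOST_IP VSS_DATA_DIR` prints nothing when all four are set.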
Setup
```bash
export BASE_URL="http://localhost:${RTVI_VLM_PORT:-8018}"   # host-side RT-VLM port
export API_KEY="${NGC_CLI_API_KEY:-${RTVI_VLM_API_KEY:-}}"  # bearer token used by host-side curl commands
: "${API_KEY:?Set NGC_CLI_API_KEY or RTVI_VLM_API_KEY before calling authenticated endpoints}"
```
Every request below uses `Authorization: Bearer $API_KEY`. Health endpoints
(`/v1/health/*`, `/v1/ready`, `/v1/live`, `/v1/startup`) typically work without auth.
Smoke test before use:
```bash
curl -fsS "$BASE_URL/v1/health/ready"
MODEL_ID="$(curl -fsS "$BASE_URL/v1/models" -H "Authorization: Bearer $API_KEY" | jq -r '.data[0].id // .id')"
curl -fsS "$BASE_URL/openapi.json" | jq -r '.paths | keys[]' | sort
```
Quick Start: dense captions from a local video
```bash
# 1. Upload the video, capture its file id
FILE_ID=$(curl -fsS -X POST "$BASE_URL/v1/files" \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@/path/to/warehouse.mp4" \
  -F "purpose=vision" \
  -F "media_type=video" | jq -r '.id')

# 2. Generate captions + alerts (SSE stream of chunked responses)
curl -N -X POST "$BASE_URL/v1/generate_captions" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d "{ \"id\": \"$FILE_ID\", \"prompt\": \"Write a concise dense caption for each 10-second segment of this warehouse video.\", \"model\": \"$MODEL_ID\", \"chunk_duration\": 10, \"stream\": true }"
```
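Interpolating shell variables into a double-quoted JSON body breaks as soon as a prompt contains a quote or backslash. The sketch below shows one way to escape values first. `json_escape` and `caption_payload` are hypothetical helpers; the escaping covers only backslashes and double quotes (not newlines or control characters), so it is a simplification rather than a full JSON encoder.

```shell
# Escape backslashes and double quotes so a value can sit inside a JSON string.
# Simplification: newlines and other control characters are not handled.
json_escape() {
  printf '%s' "$1" | sed -e 's/\\/\\\\/g' -e 's/"/\\"/g'
}

# caption_payload ID PROMPT MODEL CHUNK_DURATION
# Build a generate_captions request body with the fields used in this document.
caption_payload() {
  printf '{"id":"%s","prompt":"%s","model":"%s","chunk_duration":%d,"stream":true}\n' \
    "$(json_escape "$1")" "$(json_escape "$2")" "$(json_escape "$3")" "$4"
}
```

The result can be passed to `curl` with `-d "$(caption_payload "$FILE_ID" "$PROMPT" "$MODEL_ID" 10)"`. Where `jq` is available, `jq -n --arg` is a more complete way to build the same body.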
Endpoints
Captions
Generate VLM captions and alerts for videos and live streams.
`POST /v1/generate_captions`: Generate VLM captions (and alerts) for video/stream
Required:
| Field | Type | Description |
|---|---|---|
| | string \| array | UUID of a previously-uploaded file, or id of an active live stream. Accepts a list of ids for batch |
| | string | User prompt to the VLM (e.g. dense-caption instruction) |
| | string | Exact model id returned by |
Key optional fields:
| Field | Type | Default | Description |
|---|---|---|---|
| | string | - | System prompt; use |
| | boolean | false | Turn on reasoning for Cosmos Reason models |
| | boolean | false | Transcribe audio (via Riva) and fold into captions |
| | integer | - | Segment video into N-second chunks ( |
| | integer | 0 | Overlap between consecutive chunks |
| | number | - | FPS (if |
| | boolean | false | Interpret above as FPS vs. fixed-frame count |
| | int | - | Resize frames before inference (0 = native) |
| | object | - | |
| | boolean | false | SSE: emit per-chunk caption deltas as |
| | Standard sampling controls | | |
| | object | - | Query response format object |
| | object | - | Extra kwargs for the multimodal processor (e.g. size, shortest/longest edge) |
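To get a feel for how `chunk_duration` and `chunk_overlap_duration` interact, the sketch below lists chunk boundaries for a clip of known length. It assumes the common sliding-window convention, where each chunk starts `chunk_duration - chunk_overlap_duration` seconds after the previous one. The service's real boundaries may differ, and `chunk_windows` is a hypothetical helper.

```shell
# chunk_windows TOTAL_SECONDS CHUNK_DURATION [OVERLAP]
# Print "start-end" for each chunk, assuming stride = duration - overlap.
# This is an illustrative convention, not confirmed RT-VLM behavior.
chunk_windows() (
  total=$1
  duration=$2
  overlap=${3:-0}
  stride=$((duration - overlap))
  if [ "$stride" -le 0 ]; then
    echo "overlap must be smaller than chunk duration" >&2
    exit 1
  fi
  start=0
  while [ "$start" -lt "$total" ]; do
    end=$((start + duration))
    if [ "$end" -gt "$total" ]; then
      end=$total
    fi
    printf '%d-%d\n' "$start" "$end"
    start=$((start + stride))
  done
)
```

Under that assumption, `chunk_windows 130 60 10` yields the windows 0-60, 50-110, and 100-130.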
```bash
curl -N -X POST "$BASE_URL/v1/generate_captions" \
  -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "id": "123e4567-e89b-12d3-a456-426614174000",
    "prompt": "Dense-caption this warehouse video, one sentence per 10s chunk.",
    "model": "nim_nvidia_cosmos-reason2-8b_0303-fp8-dynamic-kv8",
    "chunk_duration": 10,
    "stream": true
  }'
```
Response shape: live 26.05 responses use `chunk_responses` with
`start_time`/`end_time`; SSE streams terminate with `data: [DONE]`. See
`references/api-surface-26.05.md`.
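Because the stream ends with a `data: [DONE]` sentinel, a client can read it line by line and stop there. The sketch below is a minimal reader under simplifying assumptions: `sse_data` is a hypothetical helper, it only recognizes the `data: ` prefix with a single space, and it treats every data line as one complete payload.

```shell
# Read an SSE stream on stdin and print the payload of each "data: " line,
# stopping at the "data: [DONE]" sentinel. Comment and blank lines are skipped.
sse_data() (
  cr=$(printf '\r')
  while IFS= read -r line || [ -n "$line" ]; do
    line=${line%"$cr"}   # tolerate CRLF line endings
    case $line in
      'data: [DONE]') break ;;
      'data: '*) printf '%s\n' "${line#data: }" ;;
    esac
  done
)
```

It can be placed after the streaming call, as in `curl -N ... | sse_data`, with each printed line then handed to `jq` or another JSON tool.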
`DELETE /v1/generate_captions/{stream_id}`: Stop caption generation for a live stream, if exposed
Some deployments expose this companion stop endpoint. Check the live OpenAPI
(`curl -fsS "$BASE_URL/openapi.json" | jq '.paths | keys[]'`) before using it.
Always pair live-stream cleanup with `DELETE /v1/streams/delete/{stream_id}` to
un-register the RTSP source.
```bash
curl -X DELETE "$BASE_URL/v1/generate_captions/$STREAM_ID" -H "Authorization: Bearer $API_KEY"
```
Files
Upload and manage media files consumed by `/v1/generate_captions`.
`POST /v1/files`: Upload a media file (multipart)
```bash
curl -X POST "$BASE_URL/v1/files" -H "Authorization: Bearer $API_KEY" \
  -F "file=@./video.mp4" -F "purpose=vision" -F "media_type=video"
```
Response: `{ "id", "object": "file", "bytes", "created_at", "filename", "purpose" }`.
Optional metadata such as `sensor_name` may be accepted by newer builds; check
the live OpenAPI before sending it.
`GET /v1/files?purpose=vision`: List uploaded files
`GET /v1/files/{file_id}`: File metadata
`GET /v1/files/{file_id}/content`: Download original file content
`DELETE /v1/files/{file_id}`: Delete file (releases asset storage)
Live Stream
RTSP stream lifecycle.
`POST /v1/streams/add`: Register one or more RTSP streams
Required per stream: `liveStreamUrl` (must start with `rtsp://`), `description`.
Optional: `username`, `password`, `sensor_name`, and placement metadata
(`place_name`, `place_type`, `place_lat`, `place_lon`, `place_alt`,
`place_coordinate_x`, `place_coordinate_y`).
Precheck public or external RTSP sources before registering them. A probe exit
code alone is not enough; `gst-discoverer-1.0` can exit `0` while reporting an
unknown media type. Treat the stream as usable only when a probe output
identifies at least one video stream/caps entry. If one probe is inconclusive,
cross-check with another tool such as `ffprobe` before failing or registering:
```bash
ffprobe -v error -select_streams v:0 \
  -show_entries stream=codec_type -of csv=p=0 "$RTSP_URL" | grep -qx video
```
```bash
STREAM_ID=$(curl -fsS -X POST "$BASE_URL/v1/streams/add" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d '{"streams":[{"liveStreamUrl":"rtsp://cam:8554/live","description":"warehouse cam 1"}]}' \
  | jq -r '.results[0].id')
```
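Since `liveStreamUrl` must start with `rtsp://`, a cheap local check before calling `/v1/streams/add` can catch typos early. The sketch below is illustrative only: `is_rtsp_url` is a hypothetical helper that checks the scheme prefix and nothing else, so it does not replace the media-type probe described above.

```shell
# Succeed only when the argument starts with rtsp:// and has something after it.
# This checks the scheme prefix only, not reachability or media type.
is_rtsp_url() {
  case $1 in
    rtsp://?*) return 0 ;;
    *) return 1 ;;
  esac
}
```

A caller might guard registration with `is_rtsp_url "$RTSP_URL" || echo "not an RTSP URL" >&2`.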
`GET /v1/streams/get-stream-info`: List active streams
`DELETE /v1/streams/delete/{stream_id}`: Remove a single stream
`DELETE /v1/streams/delete-batch`: Remove many (`{"stream_ids":[...]}`)
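The batch endpoint takes a `{"stream_ids":[...]}` body. The sketch below builds that body from a list of ids. `stream_ids_json` is a hypothetical helper, and it assumes the ids are plain tokens such as UUIDs with no quotes or backslashes that would need escaping.

```shell
# stream_ids_json ID [ID...]
# Print a {"stream_ids":[...]} body; ids are assumed to need no JSON escaping.
stream_ids_json() (
  sep=''
  printf '{"stream_ids":['
  for id in "$@"; do
    printf '%s"%s"' "$sep" "$id"
    sep=','
  done
  printf ']}\n'
)
```

The output can be sent with `curl -X DELETE "$BASE_URL/v1/streams/delete-batch" -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" -d "$(stream_ids_json "$ID_A" "$ID_B")"`, where `ID_A` and `ID_B` are placeholder variables.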
CV-style singular stream endpoints
26.05 deployments also expose CV-style stream control paths:
`POST /v1/stream/add`, `GET /v1/stream/get-stream-info`, and
`POST /v1/stream/remove`. Use these when a workflow or release note explicitly uses
the key/value envelope; otherwise prefer the plural RT-VLM stream endpoints
above. See `references/api-surface-26.05.md`
for examples and the `stream_count:0` compatibility caveat.
NIM Compatible
OpenAI-compatible endpoints for interop with OpenAI/NVIDIA-API clients.
`POST /v1/chat/completions`: OpenAI-compatible chat (text + multimodal)
Required: `messages`, `model`. Text-only requests work and omit `id`,
`video_url`, and `image_url`. For uploaded-video, direct `video_url`,
direct `image_url`, streaming, and RTSP-backed chat examples, see
`references/api-surface-26.05.md`.
```bash
curl -X POST "$BASE_URL/v1/chat/completions" -H "Authorization: Bearer $API_KEY" \
  -H "Content-Type: application/json" \
  -d "{\"model\":\"$MODEL_ID\",\"messages\":[{\"role\":\"user\",\"content\":\"Summarize this scene.\"}]}"
```
`POST /v1/completions`: OpenAI-compatible legacy completions
This endpoint exists for compatibility, but on current 26.05 builds text-only legacy
completion requests return HTTP 400 by design. Use `/v1/chat/completions` for
text-only and multimodal requests.
`GET /v1/version`: `{ "version": "3.2.0-..." }`
`GET /v1/manifest`: NIM manifest
`GET /v1/health/live` · `GET /v1/health/ready`: NIM-style probes
Do not assume `/v1/license` exists. The current 26.05 live OpenAPI does not expose it
and the endpoint returns 404; only call it after checking `GET /openapi.json`.
Models · Metadata · Metrics · Health Check
`GET /v1/models`: List loaded VLMs, returned as `{ "data": [{ "id", "object": "model", "owned_by" }] }`
`GET /v1/metadata`: Service metadata (build, release, image tag)
`GET /v1/assets/stats`: Asset storage counts, TTL, and oldest-asset age
`GET /v1/metrics`: Prometheus metrics (plain text)
`GET /v1/ready` · `GET /v1/live` · `GET /v1/startup`: Kubernetes-style probes
Common Workflows
The four standard dense-captioning scenarios.
1. Dense captions from a stored video file
```bash
# Upload → capture file id → generate captions (SSE stream)
FILE_ID=$(curl -fsS -X POST "$BASE_URL/v1/files" \
  -H "Authorization: Bearer $API_KEY" \
  -F "file=@warehouse.mp4" -F "purpose=vision" -F "media_type=video" | jq -r '.id')

curl -N -X POST "$BASE_URL/v1/generate_captions" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d "{ \"id\": \"$FILE_ID\", \"prompt\": \"Describe warehouse events in 1 sentence per 10s chunk.\", \"model\": \"$MODEL_ID\", \"chunk_duration\": 10, \"stream\": true }"

# When done, free storage:
curl -X DELETE "$BASE_URL/v1/files/$FILE_ID" -H "Authorization: Bearer $API_KEY"
```
2. Dense captions from an RTSP live stream
```bash
# Register the stream
STREAM_ID=$(curl -fsS -X POST "$BASE_URL/v1/streams/add" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d '{"streams":[{"liveStreamUrl":"rtsp://10.0.0.5:8554/warehouse","description":"warehouse cam"}]}' \
  | jq -r '.results[0].id')

# Start continuous caption generation
curl -N -X POST "$BASE_URL/v1/generate_captions" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d "{ \"id\": \"$STREAM_ID\", \"prompt\": \"Describe each event; start each sentence with a timestamp.\", \"model\": \"$MODEL_ID\", \"chunk_duration\": 10, \"num_frames_per_second_or_fixed_frames_chunk\": 2, \"use_fps_for_chunking\": true, \"stream\": true }" &

# Tear down when finished. If the live OpenAPI exposes
# DELETE /v1/generate_captions/{stream_id}, call it before unregistering.
curl -X DELETE "$BASE_URL/v1/streams/delete/$STREAM_ID" -H "Authorization: Bearer $API_KEY"
```
3. Dense captions with alerts from an RTSP stream
```bash
# Pre-req: Kafka is enabled and topics match the deployment source.
# The checked-in rtvi-vlm/.env and VSS alerts profiles use:
#   RTVI_VLM_KAFKA_ENABLED=true
#   RTVI_VLM_KAFKA_TOPIC=mdx-vlm
#   RTVI_VLM_KAFKA_INCIDENT_TOPIC=mdx-vlm-incidents
#   RTVI_VLM_ERROR_MESSAGE_TOPIC=vision-llm-errors
#   HOST_IP=<kafka-host>
# A copied compose without those env overrides falls back to vision-llm-* topics.
# Confirm the live container before consuming:
#   docker exec vss-rtvi-vlm printenv KAFKA_TOPIC KAFKA_INCIDENT_TOPIC ERROR_MESSAGE_TOPIC

STREAM_ID=$(curl -fsS -X POST "$BASE_URL/v1/streams/add" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d '{"streams":[{"liveStreamUrl":"rtsp://10.0.0.5:8554/warehouse","description":"warehouse cam"}]}' \
  | jq -r '.results[0].id')

curl -N -X POST "$BASE_URL/v1/generate_captions" \
  -H "Authorization: Bearer $API_KEY" -H "Content-Type: application/json" \
  -d "{ \"id\": \"$STREAM_ID\", \"prompt\": \"You are a warehouse monitoring system. Describe the scene in one sentence, then on a new line output exactly:\nAnomaly Detected: Yes/No\nReason: <one sentence>\nFlag an anomaly if any worker is missing a hard hat or high-vis vest.\", \"system_prompt\": \"Answer the user's question correctly in yes or no.\", \"model\": \"$MODEL_ID\", \"chunk_duration\": 60, \"chunk_overlap_duration\": 10, \"stream\": true }"
```
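The prompt above asks the model to emit a fixed `Anomaly Detected: Yes/No` line, which makes the caption text easy to check on the client side. The sketch below pulls out that verdict. `anomaly_verdict` is a hypothetical helper, and it assumes the model followed the requested format, which a VLM will not always do.

```shell
# Read caption text on stdin and print the first "Anomaly Detected:" verdict
# in lowercase (for example "yes" or "no"). Prints nothing if no such line exists.
anomaly_verdict() {
  sed -n 's/^[[:space:]]*Anomaly Detected:[[:space:]]*\([A-Za-z]*\).*$/\1/p' |
    head -n 1 |
    tr '[:upper:]' '[:lower:]'
}
```

An empty result is a signal to treat the chunk as unparsed rather than as "no anomaly".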
**Consume alerts from Kafka**. Kafka values are NvSchema protobuf payloads, so
use `print.value=false` for a clean validation pass that shows timestamp, key,
and headers without dumping binary payload bytes. The VSS alerts/profile source
uses `mdx-vlm-incidents`; a bare copied compose may fall back to
`vision-llm-events-incidents` if no `RTVI_VLM_KAFKA_INCIDENT_TOPIC` override is
loaded. Prefer the live container environment over hard-coded topic names.
```bash
INCIDENT_TOPIC="${INCIDENT_TOPIC:-$(docker exec vss-rtvi-vlm printenv KAFKA_INCIDENT_TOPIC 2>/dev/null || true)}"
INCIDENT_TOPIC="${INCIDENT_TOPIC:-mdx-vlm-incidents}"
docker exec mdx-kafka kafka-console-consumer \
  --bootstrap-server 127.0.0.1:9092 \
  --topic "$INCIDENT_TOPIC" \
  --from-beginning \
  --timeout-ms 5000 \
  --max-messages 10 \
  --property print.timestamp=true \
  --property print.key=true \
  --property print.headers=true \
  --property print.value=false
```

If Kafka is not running in the VSS `mdx-kafka` container, use the Kafka CLI from
the host or container running the broker:

```bash
INCIDENT_TOPIC="${INCIDENT_TOPIC:-mdx-vlm-incidents}"
kafka-console-consumer \
  --bootstrap-server "$HOST_IP:9092" \
  --topic "$INCIDENT_TOPIC" \
  --from-beginning \
  --timeout-ms 5000 \
  --max-messages 10 \
  --property print.timestamp=true \
  --property print.key=true \
  --property print.headers=true \
  --property print.value=false
```

For standalone validation, remember that the RT-VLM compose maps Kafka through
`KAFKA_BOOTSTRAP_SERVERS=${HOST_IP}:9092`; setting `KAFKA_BOOTSTRAP_SERVERS`
directly in `.env` is ignored unless the compose is changed. The broker must
advertise a listener reachable from the `vss-rtvi-vlm` container. `localhost`
inside the broker and service containers is not the host, and a broker alias
such as `kafka:9092` only works when both containers share that Docker network.
For RT-VLM-only validation, prefer the self-contained broker in
`references/kafka-workflows.md` over the full repo infra compose; the latter
expects full-profile SDRC env/config. If Kafka is already running, ask the user
whether to reuse it or launch a dedicated broker before stopping or replacing
anything. Run CLI checks inside the actual broker container, but still configure
the advertised listener so RT-VLM can connect from its container network.

Incident protobuf (`ext.proto :: Incident`) key fields: `sensorId`, `timestamp`, `end`,
`objectIds`, `frameIds`, `place`, `analyticsModule`, `category`, `isAnomaly` (`true` for
alerts), `llm` (nested VisionLLM), `info` map including `triggerPhrase`, `verdict`,
`requestId`, `chunkIdx`, `streamId`, `alertCategory` (if the deployment supports the
`alert_category` query field, post-3.1).

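The listener rules above can be sketched as a small string check. This is a hypothetical helper, not one of the skill's scripts: the function name `classify_listener`, its labels, and the `192.0.2.10` address are illustrative assumptions. It only inspects the advertised `host:port` string and does not probe the network.

```bash
#!/usr/bin/env bash
# Hypothetical helper: classify a Kafka advertised listener against the
# ${HOST_IP}:9092 address that the RT-VLM compose expects.
# Usage: classify_listener <advertised host:port> <host ip>
classify_listener() {
  local advertised="$1" host_ip="$2"
  case "$advertised" in
    "${host_ip}:9092")  echo "ok" ;;          # matches KAFKA_BOOTSTRAP_SERVERS
    "${host_ip}":*)     echo "wrong-port" ;;  # right host, but not port 9092
    localhost:*|127.*)  echo "loopback" ;;    # loopback inside a container is not the host
    *)                  echo "alias" ;;       # needs a shared Docker network
  esac
}

# Illustrative values only.
classify_listener "192.0.2.10:9092" "192.0.2.10"
classify_listener "localhost:9094"  "192.0.2.10"
classify_listener "kafka:9092"      "192.0.2.10"
```

Anything other than `ok` suggests fixing the broker's advertised listener (or the Docker network) before expecting RT-VLM to publish.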
4. Kafka workflows (alerts + message bus)
Dense captioning with alerts on an RTSP stream and the HTTP-vs-Kafka response model are documented in `references/kafka-workflows.md`.

Error Reference

| Code | Meaning | Common Cause |
|---|---|---|
| 400 | Bad Request | Missing required field ( |
| 401 | Unauthorized | Missing/invalid |
| 404 | Not Found | |
| 413 | Payload Too Large | Uploaded file exceeds server |
| 422 | Unprocessable Entity | Pydantic schema violation — e.g. |
| 429 | Rate Limited | Too many concurrent streams — raise |
| 500 | Server Error | VLM inference exception (OOM, model unavailable) — check |
| 503 | Service Busy | Startup not complete (model still downloading) or upstream NIM dependency unhealthy |
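A validation script might translate status codes into the meanings from the table. The sketch below is an assumption-laden illustration: `describe_status` and `should_wait` are hypothetical names, the `2xx` branch is added for convenience, and treating only 503 as worth waiting on is inferred from the "startup not complete" cause rather than stated by the service.

```bash
#!/usr/bin/env bash
# Hypothetical helper: map an HTTP status code to the meaning in the table above.
describe_status() {
  case "$1" in
    2??) echo "Success" ;;               # not in the table; added for convenience
    400) echo "Bad Request" ;;
    401) echo "Unauthorized" ;;
    404) echo "Not Found" ;;
    413) echo "Payload Too Large" ;;
    422) echo "Unprocessable Entity" ;;
    429) echo "Rate Limited" ;;
    500) echo "Server Error" ;;
    503) echo "Service Busy" ;;
    *)   echo "Unlisted" ;;
  esac
}

# Assumption: only 503 (startup not complete) is worth waiting on and retrying.
should_wait() { [ "$1" = "503" ]; }

# Example against a live service (assumes $BASE_URL is set):
#   code=$(curl -s -o /dev/null -w '%{http_code}' "$BASE_URL/openapi.json")
#   echo "$code $(describe_status "$code")"
describe_status 503
```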
Gotchas

- Use the live OpenAPI as the source of truth. For VSS 3.2, the caption-generation endpoint is `/v1/generate_captions`. Some older references and images used `/v1/generate_captions_alerts`; do not assume that path exists unless `GET /openapi.json` shows it.
- URL-based input support depends on the deployed service version. If the live schema does not expose `url`/`media_type`/`creation_time`, upload via `POST /v1/files` first and pass the returned `id`.
- Alert trigger = the tokens `"yes"` or `"true"` in the VLM response (case-insensitive). There is no per-request alert flag. Design prompts with an explicit `Anomaly Detected: Yes/No` line and set `system_prompt` to constrain the model to Yes/No answers (per the VSS docs). Every chunk is published to `KAFKA_TOPIC`; matched chunks additionally go to `KAFKA_INCIDENT_TOPIC` with `isAnomaly=true`, `info["triggerPhrase"]` set to the matched tokens, and `info["verdict"]="confirmed"`.
- `alert_category` support depends on the deployed service version. If the live OpenAPI schema does not expose it, Kafka incidents default to `incident.category = "vlm-alert"`.
- Kafka topics are server-side config, not per-request. The `KAFKA_*` env vars (via compose `RTVI_VLM_KAFKA_*` rewrites) are fixed at container start; clients can't override topics on a per-request basis. Kafka publish is additive to the HTTP response, never a replacement.
- Topic names differ by deployment source. The checked-in RT-VLM `.env` and VSS alerts/profile sources use `mdx-vlm` and `mdx-vlm-incidents`; a bare copied compose with no `RTVI_VLM_KAFKA_*` overrides falls back to `vision-llm-messages` and `vision-llm-events-incidents`. Always trust the live `vss-rtvi-vlm` environment before consuming.
- Standalone Kafka must advertise `${HOST_IP}:9092`. The RT-VLM compose uses `KAFKA_BOOTSTRAP_SERVERS=${HOST_IP}:9092`; a broker that advertises `localhost:9094` or `kafka:9092` may pass producer/consumer tests inside the broker container while RT-VLM publish fails.
- Start Kafka before RT-VLM when Kafka is enabled. For deterministic standalone validation, make the broker reachable at `${HOST_IP}:9092` first. If you start Kafka later or change its advertised listener, restart/recreate `rtvi-vlm` before expecting Kafka offsets to move.
- `stream=true` returns Server-Sent Events, not chunked JSON. Use `curl -N` (no buffering). Each event is `data: {...}\n\n` with per-chunk fields such as `content`, `start_time`, and `end_time`, terminated by `data: [DONE]`. Without `stream=true` the server buffers until the full video is processed: fine for short clips (<1 min), avoid for live streams.
- Trust live OpenAPI for optional NIM-compatible endpoints. `/v1/license` is not exposed by current 26.05 builds and returns 404, even though older generic NIM docs may mention it.
- Prefer `/v1/chat/completions` over `/v1/completions`. Text-only legacy completions return HTTP 400 by design on current 26.05 builds; text-only chat completions work.
- `chunk_duration=0` disables chunking: the entire video is sent to the VLM as one shot. Only meaningful for short clips; long videos will OOM or exceed `max_model_len`.
- Default frame budget caps at `VLLM_MM_PROCESSOR_VIDEO_NUM_FRAMES` (256). Requesting FPS that implies >256 frames per chunk is silently capped; drop FPS or shorten `chunk_duration` to stay within budget.
- `enable_reasoning` requires a Cosmos Reason model. Passing it with Qwen3-VL or other non-reasoning models is a no-op.
- `/v1/metrics` is unauthenticated on current 26.05 standalone builds. A Bearer token is harmless if a deployment has stricter auth, but do not fail validation when `/v1/metrics` returns HTTP 200 without auth.
- File upload is multipart, not JSON. Use `-F file=@path -F purpose=vision -F media_type=video`; a `-d` body returns 422.
- Live-stream lifecycle cleanup must unregister the stream: `DELETE /v1/streams/delete/{stream_id}` removes the RTSP source. If the live schema also exposes `DELETE /v1/generate_captions/{stream_id}`, call it first to stop inference explicitly.
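
The frame-budget gotcha comes down to simple arithmetic, sketched below. It assumes frames per chunk is FPS multiplied by `chunk_duration` in seconds and that both are whole numbers (bash arithmetic is integer-only). The function names are hypothetical, and the 256 default is the figure quoted above.

```bash
#!/usr/bin/env bash
# Budget quoted in the gotcha; override if the deployment sets a different
# VLLM_MM_PROCESSOR_VIDEO_NUM_FRAMES.
FRAME_BUDGET="${VLLM_MM_PROCESSOR_VIDEO_NUM_FRAMES:-256}"

# frames_requested <fps> <chunk_duration_seconds>
# Assumption: frames per chunk = fps * chunk_duration (integers only).
frames_requested() { echo $(( $1 * $2 )); }

# within_budget <fps> <chunk_duration_seconds>: exit status 0 if not capped.
within_budget() { [ "$(frames_requested "$1" "$2")" -le "$FRAME_BUDGET" ]; }

for pair in "2 10" "5 60"; do
  read -r fps dur <<< "$pair"
  if within_budget "$fps" "$dur"; then
    echo "${fps} fps x ${dur}s = $(frames_requested "$fps" "$dur") frames: within budget"
  else
    echo "${fps} fps x ${dur}s = $(frames_requested "$fps" "$dur") frames: would be capped"
  fi
done
```

Running a check like this before sending a request makes the silent cap visible, so FPS or `chunk_duration` can be lowered deliberately.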
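
For the `stream=true` gotcha, a minimal sketch of consuming the Server-Sent Events stream in a shell pipeline follows. `read_sse` is a hypothetical helper that relies only on the `data: {...}` framing and the `data: [DONE]` terminator described in the gotchas; it does not validate the JSON payloads.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the payload of each `data:` event on stdin and
# stop at the `data: [DONE]` terminator.
read_sse() {
  local line
  while IFS= read -r line; do
    line="${line%$'\r'}"            # tolerate CRLF line endings
    case "$line" in
      "data: [DONE]") break ;;      # end of stream
      "data: "*)      printf '%s\n' "${line#data: }" ;;
    esac                            # blank separator lines are skipped
  done
}

# Offline demonstration with a canned stream.
printf 'data: {"content":"a"}\n\ndata: {"content":"b"}\n\ndata: [DONE]\n\n' | read_sse
```

Against a live service the same function could sit behind `curl -N`, for example `curl -N -sS -X POST "$BASE_URL/v1/generate_captions" ... | read_sse | jq -r '.content'`, where the elided request options are whatever the live OpenAPI schema requires.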