vss-deploy-profile


VSS Deploy

Purpose

Deploy any VSS profile (`base`, `search`, `lvs`, `warehouse`, `alerts`, `edge`) using a compose-centric workflow: build env overrides, generate the resolved compose file (dry-run), review, then deploy. This SKILL.md covers the cross-profile concerns (profile routing, prerequisites, NGC, GPU setup, and the deploy/teardown flow). Profile-specific service lists, sizing, env recipes, endpoints, and debugging live in per-profile reference docs; load the one that matches the user's intent.

Helper script: `run_script("scripts/normalize_resolved_yml.py", "<resolved.yml>")` normalizes a `docker compose config` dry-run dump for diff-friendly review during Step 3c. All other deployment work goes through `compose` / `dev-profile.sh`.

Available Scripts

| Script | Purpose | Arguments |
| --- | --- | --- |
| `scripts/normalize_resolved_yml.py` | Strip optional `depends_on` entries for services filtered out of `resolved.yml` before deploy. | Path to `resolved.yml` |

Profile Routing

Match the user's request to a profile, then load that profile's reference for sizing, services, env recipes, and debugging.

| User says | Profile | Reference |
| --- | --- | --- |
| "deploy vss" / "deploy base" | `base` | `references/base.md` |
| "deploy alerts" / "alert verification" / "real-time alerts" / "deploy for incident report" | `alerts` | `references/alerts.md` |
| "deploy lvs" / "video summarization" | `lvs` | `references/lvs-profile.md` |
| "deploy search" / "video search" | `search` | `references/search.md` |
| "deploy warehouse" / "warehouse blueprint" / "vss warehouse" | `warehouse` | `references/warehouse.md` |
| "debug warehouse" / "warehouse not working" / "warehouse FPS low" / "warehouse BEV out of sync" | `warehouse` (debug) | `references/warehouse-debug.md` |

Edge hardware routing (DGX Spark, AGX/IGX Thor): see `references/edge.md`. DGX Spark uses the Spark Nano 9B standalone local LLM on port `30081`; AGX/IGX Thor uses the Edge 4B standalone vLLM fallback.

Each profile's reference owns its sizing table. Don't pick a deployment shape from this file; open the profile reference and check the minimum GPU count for the host's hardware against the (mode × platform) matrix there.
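The routing table above can be read as a phrase lookup. The following is a minimal sketch in Python, assuming plain case-insensitive substring matching; `ROUTES` and `route_profile` are hypothetical names used for illustration and are not part of the skill. Debug phrases are listed first so that a warehouse debugging request does not fall through to the deploy route, and the fallback is `base`, matching the default stated in Step 1.

```python
# Hypothetical sketch: phrase-to-profile routing by substring match.
# Order matters: the more specific debug phrases come before generic ones.
ROUTES = [
    (("debug warehouse", "warehouse not working", "warehouse fps low",
      "warehouse bev out of sync"), "warehouse", "references/warehouse-debug.md"),
    (("deploy alerts", "alert verification", "real-time alerts",
      "deploy for incident report"), "alerts", "references/alerts.md"),
    (("deploy lvs", "video summarization"), "lvs", "references/lvs-profile.md"),
    (("deploy search", "video search"), "search", "references/search.md"),
    (("deploy warehouse", "warehouse blueprint", "vss warehouse"),
     "warehouse", "references/warehouse.md"),
    (("deploy vss", "deploy base"), "base", "references/base.md"),
]


def route_profile(request: str) -> tuple[str, str]:
    """Return (profile, reference) for a user request; default to base."""
    text = request.lower()
    for phrases, profile, reference in ROUTES:
        if any(phrase in text for phrase in phrases):
            return profile, reference
    return "base", "references/base.md"
```

Substring matching is only a stand-in for intent matching; edge hardware still routes through `references/edge.md` regardless of the phrase.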

Instructions

The deployment flow is always: copy `.env` to `generated.env`, apply overrides, dry-run compose into `resolved.yml`, review, normalize, deploy, then wait for readiness.

```bash
# 1. cp dev-profile-<profile>/.env dev-profile-<profile>/generated.env   (clean copy)
# 2. Apply env overrides to generated.env                                (source .env stays untouched)
# 3. docker compose --env-file generated.env config > resolved.yml       (dry-run)
# 4. Review resolved.yml
# 5. docker compose --env-file generated.env -f resolved.yml up -d
```

The source `.env` is treated as **read-only defaults** committed to the repo. The skill's per-deploy working copy is `generated.env`, the same pattern `dev-profile.sh` uses internally. This keeps the checked-in `.env` clean across iterations.

Prerequisites

  1. Repo path: find `video-search-and-summarization/` on disk. Check `TOOLS.md` if available.
  2. NGC CLI & API key: see `references/ngc.md`. Confirm `$NGC_CLI_API_KEY` is set.
  3. System prerequisites (GPU driver, Docker, NVIDIA Container Toolkit, kernel sysctls): full checks are in `references/prerequisites.md`. The canonical hardware/driver matrix is the VSS prerequisites page.

Pre-flight check

Run before every deploy. The full system checklist and remediation steps live in `references/prerequisites.md`. For DGX Spark / IGX Thor / AGX Thor, also run the cache-cleaner check in `references/edge.md`.

Detect sudo mode first. Several pre-flight remediations and the edge cache-cleaner installer call `sudo`. If the host requires a sudo password, those steps will silently no-op under `sudo -n` and leave the deploy in a half-prepared state.

```bash
if sudo -n true 2>/dev/null; then
  echo "passwordless sudo — pre-flight will auto-install missing pieces"
else
  echo "sudo requires password — pre-flight will NOT auto-install; hand commands to the user"
fi
```

When sudo needs a password, the skill must not run privileged installers itself. Surface the copy-pasteable command block from `references/prerequisites.md` to the user with a "run this once and confirm" handoff, then resume after the user replies.

Minimum smoke test (must succeed):

```bash
nvidia-smi --query-gpu=index,name --format=csv,noheader
docker info 2>/dev/null | grep -qi runtimes \
  && docker run --rm --gpus all ubuntu:22.04 nvidia-smi >/dev/null 2>&1 \
  && echo "nvidia runtime OK"
```

If the smoke test fails, do not proceed; open `references/prerequisites.md` for the remediation tree.

Model Selection

  • `$LLM_REMOTE_URL` / `$VLM_REMOTE_URL` if the user asks for remote
  • `$NGC_CLI_API_KEY` (local NIMs) or `$NVIDIA_API_KEY` (remote)

If no combination on this host satisfies the profile's sizing requirements, stop and report the blocker; don't silently pick another shape.

Edge shared mode is platform-specific. On DGX Spark, run `nvcr.io/nim/nvidia/nvidia-nemotron-nano-9b-v2-dgx-spark:1.0.0-variant` as a standalone local NIM on port `30081` and point the agent at it with `LLM_MODE=remote`. On AGX/IGX Thor, keep using the Edge 4B standalone vLLM fallback with `HF_TOKEN`. Full recipes are in `references/edge.md`.

Deployment Flow

Always follow this sequence. Never skip the dry-run.

Step 0 — Tear down any existing deployment + clear data volumes

If a deployment already exists, tear it down and clear stale data volumes before redeploying. The full procedure lives in `references/teardown.md`.

Step 0a — Credentials gate (run before any env mutation)

Validate every credential the chosen profile needs before Step 1c copies `.env` to `generated.env`. A 401 caught here costs about 30 seconds; the same 401 surfacing inside a NIM cold-start costs 10–20 minutes. Run the discovery and probe flow in `references/credentials.md`, then map the result against the chosen mode: missing or invalid required credentials are blockers, optional credentials are not.
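The blocker-versus-optional split can be sketched as a small presence check. This is a minimal illustration in Python, not the probe flow from `references/credentials.md`: it only checks that a value is set, not that the credential is valid (no 401 probe). The function names are hypothetical, and the mode-to-key mapping is an assumption drawn from this document (`NGC_CLI_API_KEY` for local NIMs, `NVIDIA_API_KEY` for remote); any mode other than `remote` is treated as local.

```python
def required_keys(llm_mode, vlm_mode):
    """Assumed mapping: local NIMs need NGC_CLI_API_KEY, remote needs NVIDIA_API_KEY."""
    modes = (llm_mode, vlm_mode)
    keys = []
    if any(mode != "remote" for mode in modes):
        keys.append("NGC_CLI_API_KEY")
    if any(mode == "remote" for mode in modes):
        keys.append("NVIDIA_API_KEY")
    return keys


def credentials_gate(env, required, optional=()):
    """Return (blockers, warnings): unset required keys and unset optional keys."""
    def is_unset(key):
        return not env.get(key, "").strip()

    blockers = [key for key in required if is_unset(key)]
    warnings = [key for key in optional if is_unset(key)]
    return blockers, warnings
```

In practice `env` would be `os.environ` or the parsed source `.env`; a non-empty `blockers` list means stopping before Step 1c.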

Step 1 — Gather context

Before building env overrides, confirm:

| Value | How to determine |
| --- | --- |
| Profile | Match user intent to the routing table above. Default: `base` |
| Repo path | Find `video-search-and-summarization/` on disk |
| Hardware | `nvidia-smi --query-gpu=name,memory.total --format=csv,noheader` |
| LLM/VLM placement | Cross-reference available GPUs against the chosen profile's Minimum GPU count table |
| API keys | `NGC_CLI_API_KEY` for local NIMs, `NVIDIA_API_KEY` for remote |
| `HOST_IP` | `hostname -I \| awk '{print $1}'` (the host's primary internal IP) |
| `EXTERNAL_IP` | Browser-reachable host/IP. On Brev, use the secure-link domain (see `references/brev.md`). |
| `HAPROXY_PORT` | Browser-facing ingress port. Default `7777`; ensure it is free. |

Before `docker compose up`, verify `EXTERNAL_IP`, `HAPROXY_PORT`, `VSS_PUBLIC_HOST`, and `VSS_PUBLIC_PORT` are populated with browser-reachable values. Otherwise the stack may appear healthy while UI/API/VST links 404 or loop through Cloudflare Access.
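The check on the four browser-facing values can be sketched as a scan of the env file. This is a minimal Python illustration under stated assumptions: `parse_env` and `missing_public_values` are hypothetical helpers, the parser handles only simple `KEY=VALUE` lines, and it inspects the env file alone. If the compose files derive any of these values from defaults, `resolved.yml` remains the source of truth.

```python
REQUIRED_PUBLIC_KEYS = ("EXTERNAL_IP", "HAPROXY_PORT", "VSS_PUBLIC_HOST", "VSS_PUBLIC_PORT")


def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and comments."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_public_values(env_text):
    """Return the required browser-facing keys that are absent or empty."""
    values = parse_env(env_text)
    return [key for key in REQUIRED_PUBLIC_KEYS if not values.get(key)]
```

An empty result only shows the keys are set; whether the values are actually browser-reachable still has to be checked from a browser or with `curl` from outside the host.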

Step 1b — Prepare the data directory

Layout (asset paths, ownership, mount points, profile-specific subdirs) is documented in `references/data-directory.md`. Read that file before deploying for the first time on a host or when changing profiles.

FORBIDDEN: `chown -R ubuntu:ubuntu $VSS_DATA_DIR` (or any recursive chown).

A recursive chown looks like good housekeeping to a shell admin, but in this stack it is the command that breaks the deploy. You will observe a "healthy" deploy (containers Up, endpoints 200) while the video pipeline is silently broken. Use `chmod -R 777` on the specific subdirs documented in `data-directory.md` and nothing else.

Step 1c — Initialize `generated.env`

The skill's per-deploy working copy. Always start from a fresh copy of the source `.env`; never mutate the source.

```bash
PROFILE=base
ENV_SRC=$REPO/deploy/docker/developer-profiles/dev-profile-$PROFILE/.env
ENV_GEN=$REPO/deploy/docker/developer-profiles/dev-profile-$PROFILE/generated.env

cp "$ENV_SRC" "$ENV_GEN"
```

All subsequent writes (Brev `EXTERNAL_IP`, the env_overrides dict from Step 2) go to `$ENV_GEN`. `$ENV_SRC` is read-only from here on.

Step 1d — If deploying on Brev, set `EXTERNAL_IP` to the secure-link domain

Read `BREV_ENV_ID` from `/etc/environment` and write `EXTERNAL_IP` into `generated.env` (NOT `.env`). Full secure-link behavior and troubleshooting are in `references/brev.md`.

```bash
brev_env_id=$(awk -F= '/^BREV_ENV_ID=/ {gsub(/"/, "", $2); print $2; exit}' /etc/environment)
sed -i "s|^EXTERNAL_IP=.*|EXTERNAL_IP=7777-${brev_env_id}.brevlab.com|" "$ENV_GEN"
```

Step 2 — Build env_overrides

Produce an `env_overrides` dict from the user request and the gathered context: choose remote/local LLM/VLM, set credentials, point at endpoints, set platform-specific flags. The full mapping (every override key, when it applies, defaults, profile-specific differences) lives in `references/env-overrides.md`. Each profile reference has worked examples for that profile's common scenarios.

Step 3 — Apply overrides + dry-run

Working env file: `<repo>/deploy/docker/developer-profiles/dev-profile-<profile>/generated.env` (created in Step 1c).

Two env files, distinct roles.

  • `.env`: read-only defaults, checked in. Don't mutate it from the skill.
  • `generated.env`: the skill's per-deploy working copy. All overrides (the dict from Step 2, plus the Brev `EXTERNAL_IP` from Step 1d) land here. `--env-file` always points at this file. Post-deploy verifiers should also read from `generated.env` for the actually-deployed values (see Debugging a Deployment).

`generated.env` matches the convention `dev-profile.sh` uses internally: it's a per-invocation scratchpad regenerated by `cp .env generated.env` each run.

```bash
# (Step 1c already ran: cp $ENV_SRC $ENV_GEN)

# Apply the env_overrides dict from Step 2 to generated.env
# (read lines, update matching keys, append new keys, write)
# Example:
sed -i "s|^LLM_MODE=.*|LLM_MODE=remote|" "$ENV_GEN"
sed -i "s|^LLM_BASE_URL=.*|LLM_BASE_URL=http://localhost:30081|" "$ENV_GEN"

# Resolve compose
cd "$REPO/deploy/docker"
docker compose --env-file "$ENV_GEN" config > resolved.yml
```

The resolved YAML is saved to `<repo>/deploy/docker/resolved.yml`.
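The "read lines, update matching keys, append new keys, write" procedure can be sketched in Python. This is a minimal illustration, assuming simple `KEY=VALUE` lines; `apply_overrides` is a hypothetical helper, not something shipped with the skill. Unlike the `sed` examples, which only rewrite keys already present, this version also appends keys that the file does not define yet.

```python
def apply_overrides(env_text, overrides):
    """Update matching KEY= lines in place and append keys not yet present."""
    seen = set()
    out = []
    for line in env_text.splitlines():
        key = line.split("=", 1)[0].strip()
        is_assignment = "=" in line and not line.lstrip().startswith("#")
        if is_assignment and key in overrides:
            # Rewrite every occurrence so a later duplicate cannot win.
            out.append(f"{key}={overrides[key]}")
            seen.add(key)
        else:
            out.append(line)  # comments, blanks, and untouched keys pass through
    out.extend(f"{k}={v}" for k, v in overrides.items() if k not in seen)
    return "\n".join(out) + "\n"
```

A caller would read `generated.env`, pass its text and the Step 2 dict through `apply_overrides`, and write the result back to `generated.env`, never to the source `.env`.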

Step 3b — Verify resolved.yml has no unexpanded ${...} tokens

Unexpanded `${VAR}` tokens in `resolved.yml` mean compose did not see those env values. The diagnostic procedure and common culprits live in `references/troubleshooting.md`.
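The scan for unexpanded tokens can be sketched with a regular expression. This is a minimal Python illustration, not the procedure from `references/troubleshooting.md`; `find_unexpanded` is a hypothetical helper. It assumes that a doubled dollar sign (`$${VAR}`) is an escaped literal and should not be reported.

```python
import re

# Matches ${VAR}, ${VAR:-default}, etc., but not the escaped form $${VAR}.
UNEXPANDED = re.compile(r"(?<!\$)\$\{[^}]*\}")


def find_unexpanded(resolved_text):
    """Return (line_number, token) pairs for every unexpanded ${...} token."""
    hits = []
    for lineno, line in enumerate(resolved_text.splitlines(), start=1):
        for match in UNEXPANDED.finditer(line):
            hits.append((lineno, match.group(0)))
    return hits
```

Any non-empty result is a reason to stop before Step 5 and trace the variable back to `generated.env`.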

Step 3c — Strip dangling optional `depends_on` from resolved.yml

MUST run after Step 3, before Step 5. Skipping this aborts the deploy. The normalizer drops optional dependencies on services that were filtered out of `resolved.yml`.

```bash
# From the repo root
uv run skills/vss-deploy-profile/scripts/normalize_resolved_yml.py "$REPO/deploy/docker/resolved.yml"
```

If `uv` isn't on the host, install it once with `curl -LsSf https://astral.sh/uv/install.sh | sh` (no root needed).

**Re-validate** before `up -d`:

```bash
docker compose -f "$REPO/deploy/docker/resolved.yml" config --quiet && echo "resolved.yml OK"
```

If validation still fails after the normalizer runs, capture the error and inspect it; that's a different bug (a dependency that's not optional, or another schema violation), not the dangling-`depends_on` case.
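To make the dangling-`depends_on` case concrete, here is a minimal Python sketch of the idea. It is not the contents of `normalize_resolved_yml.py`, whose implementation is not shown here; `strip_dangling_optional_deps` is a hypothetical function. It assumes the resolved file has already been parsed into a plain mapping and that optional dependencies use the Compose long syntax with `required: false`.

```python
def strip_dangling_optional_deps(compose):
    """Drop depends_on entries that are optional and point at absent services.

    Mutates `compose` in place and returns the (service, target) pairs removed.
    """
    services = compose.get("services", {})
    removed = []
    for name, svc in services.items():
        deps = svc.get("depends_on")
        if not isinstance(deps, dict):
            continue  # short (list) syntax carries no 'required' flag
        for target in list(deps):
            optional = (deps[target] or {}).get("required", True) is False
            if target not in services and optional:
                del deps[target]
                removed.append((name, target))
        if not deps:
            del svc["depends_on"]  # avoid leaving an empty mapping behind
    return removed
```

A required dependency on a missing service is deliberately left alone, so the re-validation step still reports it as a separate error.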

Step 4 — Review

Show the user a summary of what will be deployed:

  • Profile name and hardware
  • LLM/VLM models and mode (local/remote/local_shared)
  • Services that will start
  • GPU device assignment
  • Key endpoints (UI port, agent port)

Ask: "Looks good. Deploy now?" and wait for confirmation before Step 5.

Exception: autonomous mode. If the user's request already asks you to run autonomously (e.g. "deploy X autonomously", "run without confirmation", "non-interactive"), skip the confirmation prompt and proceed straight to Step 5. This path exists so automated eval / CI invocations don't hang waiting for a human reply they'll never get. In all other cases, a human must approve.

Step 5 — Deploy

```bash
cd $REPO/deploy/docker
docker compose --env-file $ENV_GEN -f resolved.yml up -d
```

`--env-file` is mandatory. Without the same `generated.env` used in Step 3, `COMPOSE_PROFILES` may be unset and `up -d` can exit 0 with zero selected services.

Do NOT use `--force-recreate` on retries. It destroys already-warm NIM containers, forcing another 3–5 min torch.compile + CUDA-graph capture per NIM. If the previous `up -d` partially failed, fix the root cause (usually perms or an env typo) and just re-run `up -d`; Docker will re-create only the containers whose config changed or that are down.

`docker compose up -d` only creates containers; it does not wait for internal services to finish warming. Never declare deploy success until the readiness gates pass.

Step 5b — Wait until the stack is actually healthy

Gate 0: container count must be > 0. Refuse to proceed past `up -d` until compose has started the expected services:

```bash
expected=$(docker compose --env-file $ENV_GEN -f resolved.yml config --services | wc -l)
actual=$(docker compose -f resolved.yml ps -q | wc -l)
[ "$actual" -gt 0 ] && [ "$actual" -ge "$expected" ] \
  || { echo "FAIL: expected $expected services, got $actual — re-check Step 5 --env-file"; exit 1; }
```

Cold deploys can take 10–20 min. The full readiness procedure lives in `references/readiness.md`, and each profile reference lists the required endpoints. Never declare the deploy done right after `up -d`; declare it only after every documented endpoint succeeds.
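Waiting on endpoints can be sketched as a poll loop with a deadline. This is a minimal Python illustration, not the procedure in `references/readiness.md`; `http_ok` and `wait_ready` are hypothetical helpers. The 20-minute default deadline is an assumption taken from the upper bound quoted above, and the URL list has to come from the chosen profile's reference.

```python
import http.client
import time
import urllib.request


def http_ok(url, timeout=5.0):
    """Return True when the URL answers with a 2xx status."""
    try:
        with urllib.request.urlopen(url, timeout=timeout) as resp:
            return 200 <= resp.status < 300
    except (OSError, http.client.HTTPException):
        return False  # refused, timed out, or non-2xx: not ready yet


def wait_ready(urls, deadline_s=1200, interval_s=10,
               probe=http_ok, sleep=time.sleep, clock=time.monotonic):
    """Poll until every URL passes the probe.

    Returns the URLs still failing when the deadline expires (empty on success).
    """
    pending = list(urls)
    end = clock() + deadline_s
    while pending:
        pending = [url for url in pending if not probe(url)]
        if not pending or clock() >= end:
            break
        sleep(interval_s)
    return pending
```

For example, `wait_ready(["http://localhost:8000/docs", "http://localhost:3000/"])` covers the agent and UI URLs used in the quick checks; a non-empty return value names the endpoints that never came up.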

Tear Down

```bash
cd $REPO/deploy/docker
docker compose -f resolved.yml down
```

For switching profiles or recovering from a partial deploy, follow the full procedure in `references/teardown.md`.

Debugging a Deployment

Use this workflow when the user asks to "debug the deploy", "verify it's working", "why is the agent not responding", or similar. The goal is to confirm the full video-ingestion-to-agent-answer path, not just that containers are "Up".

Each profile reference has a Debugging section listing the exact commands and failure-mode table for that profile.

Quick checks (all profiles)

```bash
# 1. All expected containers Up
docker ps --format 'table {{.Names}}\t{{.Status}}'

# 2. Agent API + UI responding
curl -sf http://localhost:8000/docs >/dev/null && echo "agent OK"
curl -sf http://localhost:3000/ >/dev/null && echo "ui OK"

# 3. VLM NIM responding (base/lvs profiles)
curl -sf http://localhost:30082/v1/models | python3 -m json.tool

# 4. LLM NIM responding
curl -sf http://localhost:30081/v1/models | python3 -m json.tool
```

End-to-end video sanity check

After the quick checks above pass, drive a real query through the agent: for example, ask it over the REST API or UI to describe a video you've uploaded to VST. If the agent returns a non-empty answer, the upload → ingest → inference → reply path is healthy. If it fails, `docker logs vss-agent` shows which stage tripped.

Examples

  • Base profile, remote models: route to `base`, copy `dev-profile-base/.env` to `generated.env`, set `LLM_MODE=remote` / `VLM_MODE=remote`, dry-run, normalize, deploy, then verify `/docs` and UI.
  • Search profile on RTX: route to `search`, follow `references/search.md` for sizing and endpoints, seed videos, then run the search-profile readiness checks.
  • Edge target: route through `references/edge.md`, then use the same `generated.env` → dry-run → normalize → deploy flow.

Limitations

  • This skill deploys compose-based VSS profiles only; standalone microservice deployment belongs to the matching `vss-deploy-*` skill.
  • Hardware sizing, model placement, and profile-specific readiness are owned by profile references; do not infer them from memory.
  • Privileged host remediation requires user approval when passwordless sudo is unavailable.

Troubleshooting

Start with `references/agent-failure-modes.md` for cross-profile failures such as NIM cold-start timeouts, OOM, remote endpoint 5xx responses, missing `NGC_CLI_API_KEY` / `HF_TOKEN`, and unexpanded values in `resolved.yml`.