dynamo-router-starter

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Dynamo Router Starter

Dynamo Router Starter

<!-- SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. SPDX-License-Identifier: CC-BY-4.0 -->
<!-- SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved. SPDX-License-Identifier: CC-BY-4.0 -->

Purpose

用途

Make Dynamo routing feel easy by getting a baseline router mode running, enabling KV-aware routing when appropriate, and proving the endpoint works. Keep the user focused on exact commands and success signals, not router internals.
通过运行基准路由模式、在合适时启用KV感知路由并验证端点可用,让Dynamo路由配置变得简单。让用户专注于具体命令和成功信号,而非路由内部机制。

Prerequisites

前提条件

  • Python 3.10+ with the
    dynamo
    package importable (
    python3 -m dynamo.frontend --help
    works).
  • For Kubernetes runs:
    kubectl
    configured with access to the target namespace and a deployed Dynamo recipe.
  • Network reachability to the frontend service (port-forward or direct).
  • A model already loaded into at least one worker (
    /v1/models
    returns at least one entry).
  • Python 3.10+,且可导入
    dynamo
    包(执行
    python3 -m dynamo.frontend --help
    可正常运行)。
  • 若在Kubernetes中运行:已配置
    kubectl
    ,可访问目标命名空间,且已部署Dynamo部署包(recipe)。
  • 可连通前端服务(通过端口转发或直接访问)。
  • 至少一个worker已加载模型(
    /v1/models
    返回至少一个条目)。

Required Inputs

所需输入

Collect or infer:
  • local Python/CLI or Kubernetes recipe path
  • desired mode:
    round-robin
    ,
    kv
    ,
    least-loaded
    ,
    device-aware-weighted
    ,
    direct
    , or
    random
  • frontend port or Kubernetes frontend service
  • whether workers publish KV events; if not, use approximate KV mode
  • model name for smoke requests, if
    /v1/models
    cannot discover it
收集或推断以下信息:
  • 本地Python/CLI或Kubernetes部署包路径
  • 期望的模式:
    round-robin
    kv
    least-loaded
    device-aware-weighted
    direct
    random
  • 前端端口或Kubernetes前端服务
  • worker是否发布KV事件;若未发布,则使用近似KV模式
  • 冒烟请求使用的模型名称(若
    /v1/models
    无法自动发现)

Instructions

操作步骤

1. Establish A Baseline

1. 建立基准配置

For local bring-up with already registered workers:
bash
python3 -m dynamo.frontend --router-mode round-robin --http-port 8000
For Kubernetes, inspect the selected recipe
deploy.yaml
and locate the frontend service. If the recipe is not already deployed, use
dynamo-recipe-runner
first.
对于已注册worker的本地启动:
bash
python3 -m dynamo.frontend --router-mode round-robin --http-port 8000
在Kubernetes中,检查选中的部署包
deploy.yaml
并找到前端服务。若部署包尚未部署,请先使用
dynamo-recipe-runner

2. Enable KV Routing

2. 启用KV路由

For local frontend:
bash
python3 -m dynamo.frontend --router-mode kv --http-port 8000
For Kubernetes, patch only the frontend service env:
yaml
envs:
  - name: DYN_ROUTER_MODE
    value: kv
If backend workers are not publishing KV cache events, set approximate mode instead of leaving the router waiting for events:
yaml
envs:
  - name: DYN_ROUTER_USE_KV_EVENTS
    value: "false"
对于本地前端:
bash
python3 -m dynamo.frontend --router-mode kv --http-port 8000
在Kubernetes中,仅修补前端服务的环境变量:
yaml
envs:
  - name: DYN_ROUTER_MODE
    value: kv
若后端worker未发布KV缓存事件,请设置近似模式,避免路由等待事件:
yaml
envs:
  - name: DYN_ROUTER_USE_KV_EVENTS
    value: "false"

3. Smoke Test

3. 冒烟测试

After port-forwarding the frontend service or starting local frontend, run:
bash
python3 scripts/check_router_health.py \
  --base-url http://127.0.0.1:8000
This must verify
/v1/models
and, when a model is discoverable, one
/v1/chat/completions
request.
端口转发前端服务或启动本地前端后,运行:
bash
python3 scripts/check_router_health.py \
  --base-url http://127.0.0.1:8000
此脚本需验证
/v1/models
,且当可发现模型时,发送一个
/v1/chat/completions
请求。

4. Compare Modes Carefully

4. 谨慎对比不同模式

When comparing round-robin vs KV routing:
  • use the same model, workers, prompt set, concurrency, and sampling settings
  • send repeated-prefix prompts if demonstrating KV reuse
  • label the result as a smoke comparison unless enough benchmark samples were collected
  • do not claim throughput improvement from a single chat request
If the endpoint is unhealthy or workers are missing, switch to
dynamo-troubleshoot
.
对比轮询与KV路由时:
  • 使用相同的模型、worker、提示集、并发数和采样设置
  • 若演示KV复用,发送重复前缀的提示
  • 除非收集了足够的基准样本,否则将结果标记为冒烟对比
  • 不要仅凭单次聊天请求宣称吞吐量提升
若端点不健康或worker缺失,请切换至
dynamo-troubleshoot

Available Scripts

可用脚本

ScriptPurposeArguments
scripts/check_router_health.py
Smoke-test
/v1/models
and one chat completion against a Dynamo frontend
--base-url
,
--retries
,
--timeout
Invoke via the agentskills.io
run_script()
protocol:
python
run_script("scripts/check_router_health.py", args=["--base-url", "http://127.0.0.1:8000"])
脚本用途参数
scripts/check_router_health.py
对Dynamo前端进行
/v1/models
和单次聊天完成的冒烟测试
--base-url
,
--retries
,
--timeout
通过agentskills.io的
run_script()
协议调用:
python
run_script("scripts/check_router_health.py", args=["--base-url", "http://127.0.0.1:8000"])

Examples

示例

Local KV-routed frontend on port 8000, then smoke-test it:
bash
python3 -m dynamo.frontend --router-mode kv --http-port 8000 &
python3 scripts/check_router_health.py --base-url http://127.0.0.1:8000
Kubernetes-deployed frontend reachable via port-forward:
bash
kubectl port-forward svc/qwen-vllm-disagg-frontend 8000:8000 -n dynamo-demo &
python3 scripts/check_router_health.py --base-url http://127.0.0.1:8000 --retries 3
Equivalent through the agent protocol:
python
run_script("scripts/check_router_health.py", args=["--base-url", "http://127.0.0.1:8000", "--retries", "3"])
本地端口8000上的KV路由前端,随后进行冒烟测试:
bash
python3 -m dynamo.frontend --router-mode kv --http-port 8000 &
python3 scripts/check_router_health.py --base-url http://127.0.0.1:8000
通过端口转发访问Kubernetes部署的前端:
bash
kubectl port-forward svc/qwen-vllm-disagg-frontend 8000:8000 -n dynamo-demo &
python3 scripts/check_router_health.py --base-url http://127.0.0.1:8000 --retries 3
通过agent协议的等效操作:
python
run_script("scripts/check_router_health.py", args=["--base-url", "http://127.0.0.1:8000", "--retries", "3"])

Output Contract

输出约定

Return:
  • mode selected and why
  • local command or Kubernetes env patch
  • frontend service or URL
  • smoke-test result
  • any limitation, such as approximate KV mode or missing worker KV events
  • next command to run for a fuller comparison
返回以下内容:
  • 所选模式及原因
  • 本地命令或Kubernetes环境变量补丁
  • 前端服务或URL
  • 冒烟测试结果
  • 任何限制,例如近似KV模式或缺失worker KV事件
  • 用于更全面对比的下一个命令

Limitations

限制

  • Smoke test is one chat completion; it is not a benchmark. Use
    dynamo-benchmark
    for throughput/latency numbers.
  • KV-aware mode without worker KV-event publication degrades to approximate mode; this skill flags but does not fix the underlying worker config.
  • Mode comparisons require matched workloads; cross-mode latency claims need separate benchmark runs.
  • 冒烟测试仅包含一次聊天完成,并非基准测试。如需吞吐量/延迟数据,请使用
    dynamo-benchmark
  • 若worker未发布KV事件,KV感知模式会降级为近似模式;本技能会标记此问题,但不会修复worker的底层配置。
  • 模式对比需要匹配工作负载;跨模式延迟声明需要单独的基准测试运行。

Troubleshooting

故障排查

SymptomLikely causeNext step
/v1/models
returns empty list
No worker registered with the frontendVerify worker pods are Ready; confirm they connect to the same etcd/NATS
Smoke chat request times outFrontend up, workers not servingSwitch to
dynamo-troubleshoot
; inspect worker logs
KV mode hangsWorkers do not publish KV cache eventsSet
DYN_ROUTER_USE_KV_EVENTS=false
(approximate mode)
Connection refused on port-forwardPort-forward dropped or wrong service nameRe-run port-forward; verify the frontend service name matches the recipe
症状可能原因下一步操作
/v1/models
返回空列表
没有worker注册到前端验证worker Pod是否处于Ready状态;确认它们连接到相同的etcd/NATS
冒烟聊天请求超时前端已启动,但worker未提供服务切换至
dynamo-troubleshoot
;检查worker日志
KV模式挂起worker未发布KV缓存事件设置
DYN_ROUTER_USE_KV_EVENTS=false
(近似模式)
端口转发时连接被拒绝端口转发断开或服务名称错误重新运行端口转发;验证前端服务名称与部署包匹配

Benchmark

基准测试

See
BENCHMARK.md
for the NVCARPS-EVAL performance report (auto-generated by the NVSkills CI pipeline). To refresh, re-run
/nvskills-ci
on an upstream PR touching this skill.
查看
BENCHMARK.md
获取NVCARPS-EVAL性能报告(由NVSkills CI管道自动生成)。如需刷新报告,请在触及本技能的上游PR上重新运行
/nvskills-ci

References

参考资料

  • Read
    references/router-modes.md
    for the compact mode/env map.
  • Use
    scripts/check_router_health.py
    for endpoint smoke tests.
  • 阅读
    references/router-modes.md
    获取简洁的模式/环境变量映射表。
  • 使用
    scripts/check_router_health.py
    进行端点冒烟测试。