dynamo-router-starter
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseDynamo Router Starter
Dynamo Router Starter
<!--
SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: CC-BY-4.0
-->
<!--
SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: CC-BY-4.0
-->
Purpose
用途
Make Dynamo routing feel easy by getting a baseline router mode running, enabling
KV-aware routing when appropriate, and proving the endpoint works. Keep the user
focused on exact commands and success signals, not router internals.
通过运行基准路由模式、在合适时启用KV感知路由并验证端点可用,让Dynamo路由配置变得简单。让用户专注于具体命令和成功信号,而非路由内部机制。
Prerequisites
前提条件
- Python 3.10+ with the package importable (
dynamoworks).python3 -m dynamo.frontend --help - For Kubernetes runs: configured with access to the target namespace and a deployed Dynamo recipe.
kubectl - Network reachability to the frontend service (port-forward or direct).
- A model already loaded into at least one worker (returns at least one entry).
/v1/models
- Python 3.10+,且可导入包(执行
dynamo可正常运行)。python3 -m dynamo.frontend --help - 若在Kubernetes中运行:已配置,可访问目标命名空间,且已部署Dynamo部署包(recipe)。
kubectl - 可连通前端服务(通过端口转发或直接访问)。
- 至少一个worker已加载模型(返回至少一个条目)。
/v1/models
Required Inputs
所需输入
Collect or infer:
- local Python/CLI or Kubernetes recipe path
- desired mode: ,
round-robin,kv,least-loaded,device-aware-weighted, ordirectrandom - frontend port or Kubernetes frontend service
- whether workers publish KV events; if not, use approximate KV mode
- model name for smoke requests, if cannot discover it
/v1/models
收集或推断以下信息:
- 本地Python/CLI或Kubernetes部署包路径
- 期望的模式:、
round-robin、kv、least-loaded、device-aware-weighted或directrandom - 前端端口或Kubernetes前端服务
- worker是否发布KV事件;若未发布,则使用近似KV模式
- 冒烟请求使用的模型名称(若无法自动发现)
/v1/models
Instructions
操作步骤
1. Establish A Baseline
1. 建立基准配置
For local bring-up with already registered workers:
bash
python3 -m dynamo.frontend --router-mode round-robin --http-port 8000For Kubernetes, inspect the selected recipe and locate the
frontend service. If the recipe is not already deployed, use
first.
deploy.yamldynamo-recipe-runner对于已注册worker的本地启动:
bash
python3 -m dynamo.frontend --router-mode round-robin --http-port 8000在Kubernetes中,检查选中的部署包并找到前端服务。若部署包尚未部署,请先使用。
deploy.yamldynamo-recipe-runner2. Enable KV Routing
2. 启用KV路由
For local frontend:
bash
python3 -m dynamo.frontend --router-mode kv --http-port 8000For Kubernetes, patch only the frontend service env:
yaml
envs:
- name: DYN_ROUTER_MODE
value: kvIf backend workers are not publishing KV cache events, set approximate mode
instead of leaving the router waiting for events:
yaml
envs:
- name: DYN_ROUTER_USE_KV_EVENTS
value: "false"对于本地前端:
bash
python3 -m dynamo.frontend --router-mode kv --http-port 8000在Kubernetes中,仅修补前端服务的环境变量:
yaml
envs:
- name: DYN_ROUTER_MODE
value: kv若后端worker未发布KV缓存事件,请设置近似模式,避免路由等待事件:
yaml
envs:
- name: DYN_ROUTER_USE_KV_EVENTS
value: "false"3. Smoke Test
3. 冒烟测试
After port-forwarding the frontend service or starting local frontend, run:
bash
python3 scripts/check_router_health.py \
--base-url http://127.0.0.1:8000This must verify and, when a model is discoverable, one
request.
/v1/models/v1/chat/completions端口转发前端服务或启动本地前端后,运行:
bash
python3 scripts/check_router_health.py \
--base-url http://127.0.0.1:8000此脚本需验证,且当可发现模型时,发送一个请求。
/v1/models/v1/chat/completions4. Compare Modes Carefully
4. 谨慎对比不同模式
When comparing round-robin vs KV routing:
- use the same model, workers, prompt set, concurrency, and sampling settings
- send repeated-prefix prompts if demonstrating KV reuse
- label the result as a smoke comparison unless enough benchmark samples were collected
- do not claim throughput improvement from a single chat request
If the endpoint is unhealthy or workers are missing, switch to
.
dynamo-troubleshoot对比轮询与KV路由时:
- 使用相同的模型、worker、提示集、并发数和采样设置
- 若演示KV复用,发送重复前缀的提示
- 除非收集了足够的基准样本,否则将结果标记为冒烟对比
- 不要仅凭单次聊天请求宣称吞吐量提升
若端点不健康或worker缺失,请切换至。
dynamo-troubleshootAvailable Scripts
可用脚本
| Script | Purpose | Arguments |
|---|---|---|
| Smoke-test | |
Invoke via the agentskills.io protocol:
run_script()python
run_script("scripts/check_router_health.py", args=["--base-url", "http://127.0.0.1:8000"])| 脚本 | 用途 | 参数 |
|---|---|---|
| 对Dynamo前端进行 | |
通过agentskills.io的协议调用:
run_script()python
run_script("scripts/check_router_health.py", args=["--base-url", "http://127.0.0.1:8000"])Examples
示例
Local KV-routed frontend on port 8000, then smoke-test it:
bash
python3 -m dynamo.frontend --router-mode kv --http-port 8000 &
python3 scripts/check_router_health.py --base-url http://127.0.0.1:8000Kubernetes-deployed frontend reachable via port-forward:
bash
kubectl port-forward svc/qwen-vllm-disagg-frontend 8000:8000 -n dynamo-demo &
python3 scripts/check_router_health.py --base-url http://127.0.0.1:8000 --retries 3Equivalent through the agent protocol:
python
run_script("scripts/check_router_health.py", args=["--base-url", "http://127.0.0.1:8000", "--retries", "3"])本地端口8000上的KV路由前端,随后进行冒烟测试:
bash
python3 -m dynamo.frontend --router-mode kv --http-port 8000 &
python3 scripts/check_router_health.py --base-url http://127.0.0.1:8000通过端口转发访问Kubernetes部署的前端:
bash
kubectl port-forward svc/qwen-vllm-disagg-frontend 8000:8000 -n dynamo-demo &
python3 scripts/check_router_health.py --base-url http://127.0.0.1:8000 --retries 3通过agent协议的等效操作:
python
run_script("scripts/check_router_health.py", args=["--base-url", "http://127.0.0.1:8000", "--retries", "3"])Output Contract
输出约定
Return:
- mode selected and why
- local command or Kubernetes env patch
- frontend service or URL
- smoke-test result
- any limitation, such as approximate KV mode or missing worker KV events
- next command to run for a fuller comparison
返回以下内容:
- 所选模式及原因
- 本地命令或Kubernetes环境变量补丁
- 前端服务或URL
- 冒烟测试结果
- 任何限制,例如近似KV模式或缺失worker KV事件
- 用于更全面对比的下一个命令
Limitations
限制
- Smoke test is one chat completion; it is not a benchmark. Use for throughput/latency numbers.
dynamo-benchmark - KV-aware mode without worker KV-event publication degrades to approximate mode; this skill flags but does not fix the underlying worker config.
- Mode comparisons require matched workloads; cross-mode latency claims need separate benchmark runs.
- 冒烟测试仅包含一次聊天完成,并非基准测试。如需吞吐量/延迟数据,请使用。
dynamo-benchmark - 若worker未发布KV事件,KV感知模式会降级为近似模式;本技能会标记此问题,但不会修复worker的底层配置。
- 模式对比需要匹配工作负载;跨模式延迟声明需要单独的基准测试运行。
Troubleshooting
故障排查
| Symptom | Likely cause | Next step |
|---|---|---|
| No worker registered with the frontend | Verify worker pods are Ready; confirm they connect to the same etcd/NATS |
| Smoke chat request times out | Frontend up, workers not serving | Switch to |
| KV mode hangs | Workers do not publish KV cache events | Set |
| Connection refused on port-forward | Port-forward dropped or wrong service name | Re-run port-forward; verify the frontend service name matches the recipe |
| 症状 | 可能原因 | 下一步操作 |
|---|---|---|
| 没有worker注册到前端 | 验证worker Pod是否处于Ready状态;确认它们连接到相同的etcd/NATS |
| 冒烟聊天请求超时 | 前端已启动,但worker未提供服务 | 切换至 |
| KV模式挂起 | worker未发布KV缓存事件 | 设置 |
| 端口转发时连接被拒绝 | 端口转发断开或服务名称错误 | 重新运行端口转发;验证前端服务名称与部署包匹配 |
Benchmark
基准测试
See for the NVCARPS-EVAL performance report (auto-generated by the NVSkills CI pipeline). To refresh, re-run on an upstream PR touching this skill.
BENCHMARK.md/nvskills-ci查看获取NVCARPS-EVAL性能报告(由NVSkills CI管道自动生成)。如需刷新报告,请在触及本技能的上游PR上重新运行。
BENCHMARK.md/nvskills-ciReferences
参考资料
- Read for the compact mode/env map.
references/router-modes.md - Use for endpoint smoke tests.
scripts/check_router_health.py
- 阅读获取简洁的模式/环境变量映射表。
references/router-modes.md - 使用进行端点冒烟测试。
scripts/check_router_health.py