dynamo-router-starter

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Dynamo Router Starter

Purpose

用途

Make Dynamo routing feel easy by getting a baseline router mode running, enabling KV-aware routing when appropriate, and proving the endpoint works. Keep the user focused on exact commands and success signals, not router internals.

通过运行基准路由模式、在合适时启用KV感知路由并验证端点可用，让Dynamo路由配置变得简单。让用户专注于具体命令和成功信号，而非路由内部机制。

Prerequisites

前提条件

Python 3.10+ with the
```
dynamo
```
package importable (
```
python3 -m dynamo.frontend --help
```
works).
For Kubernetes runs:
```
kubectl
```
configured with access to the target namespace and a deployed Dynamo recipe.
Network reachability to the frontend service (port-forward or direct).
A model already loaded into at least one worker (
```
/v1/models
```
returns at least one entry).

Python 3.10+，且可导入
```
dynamo
```
包（执行
```
python3 -m dynamo.frontend --help
```
可正常运行）。
若在Kubernetes中运行：已配置
```
kubectl
```
，可访问目标命名空间，且已部署Dynamo部署包（recipe）。
可连通前端服务（通过端口转发或直接访问）。
至少一个worker已加载模型（
```
/v1/models
```
返回至少一个条目）。

Required Inputs

所需输入

Collect or infer:

local Python/CLI or Kubernetes recipe path

desired mode:

round-robin

kv

least-loaded

device-aware-weighted

direct

, or

random

frontend port or Kubernetes frontend service
whether workers publish KV events; if not, use approximate KV mode
model name for smoke requests, if
```
/v1/models
```
cannot discover it

收集或推断以下信息：

本地Python/CLI或Kubernetes部署包路径

期望的模式：

round-robin

、

kv

、

least-loaded

、

device-aware-weighted

、

direct

或

random

前端端口或Kubernetes前端服务
worker是否发布KV事件；若未发布，则使用近似KV模式
冒烟请求使用的模型名称（若
```
/v1/models
```
无法自动发现）

Instructions

操作步骤

1. Establish A Baseline

1. 建立基准配置

For local bring-up with already registered workers:

bash

python3 -m dynamo.frontend --router-mode round-robin --http-port 8000

For Kubernetes, inspect the selected recipe

deploy.yaml

and locate the frontend service. If the recipe is not already deployed, use

dynamo-recipe-runner

first.

对于已注册worker的本地启动：

bash

python3 -m dynamo.frontend --router-mode round-robin --http-port 8000

在Kubernetes中，检查选中的部署包

deploy.yaml

并找到前端服务。若部署包尚未部署，请先使用

dynamo-recipe-runner

。

2. Enable KV Routing

2. 启用KV路由

For local frontend:

bash

python3 -m dynamo.frontend --router-mode kv --http-port 8000

For Kubernetes, patch only the frontend service env:

yaml

envs:
  - name: DYN_ROUTER_MODE
    value: kv

If backend workers are not publishing KV cache events, set approximate mode instead of leaving the router waiting for events:

yaml

envs:
  - name: DYN_ROUTER_USE_KV_EVENTS
    value: "false"

对于本地前端：

bash

python3 -m dynamo.frontend --router-mode kv --http-port 8000

在Kubernetes中，仅修补前端服务的环境变量：

yaml

envs:
  - name: DYN_ROUTER_MODE
    value: kv

若后端worker未发布KV缓存事件，请设置近似模式，避免路由等待事件：

yaml

envs:
  - name: DYN_ROUTER_USE_KV_EVENTS
    value: "false"

3. Smoke Test

3. 冒烟测试

After port-forwarding the frontend service or starting local frontend, run:

bash

python3 scripts/check_router_health.py \
  --base-url http://127.0.0.1:8000

This must verify

/v1/models

and, when a model is discoverable, one

/v1/chat/completions

request.

端口转发前端服务或启动本地前端后，运行：

bash

python3 scripts/check_router_health.py \
  --base-url http://127.0.0.1:8000

此脚本需验证

/v1/models

，且当可发现模型时，发送一个

/v1/chat/completions

请求。

4. Compare Modes Carefully

4. 谨慎对比不同模式

When comparing round-robin vs KV routing:

use the same model, workers, prompt set, concurrency, and sampling settings
send repeated-prefix prompts if demonstrating KV reuse
label the result as a smoke comparison unless enough benchmark samples were collected
do not claim throughput improvement from a single chat request

If the endpoint is unhealthy or workers are missing, switch to

dynamo-troubleshoot

对比轮询与KV路由时：

使用相同的模型、worker、提示集、并发数和采样设置
若演示KV复用，发送重复前缀的提示
除非收集了足够的基准样本，否则将结果标记为冒烟对比
不要仅凭单次聊天请求宣称吞吐量提升

若端点不健康或worker缺失，请切换至

dynamo-troubleshoot

。

Available Scripts

可用脚本

Script	Purpose	Arguments
`scripts/check_router_health.py`	Smoke-test `/v1/models` and one chat completion against a Dynamo frontend	`--base-url` , `--retries` , `--timeout`

Invoke via the agentskills.io

run_script()

protocol:

python

run_script("scripts/check_router_health.py", args=["--base-url", "http://127.0.0.1:8000"])

脚本	用途	参数
`scripts/check_router_health.py`	对Dynamo前端进行 `/v1/models` 和单次聊天完成的冒烟测试	`--base-url` , `--retries` , `--timeout`

通过agentskills.io的

run_script()

协议调用：

python

run_script("scripts/check_router_health.py", args=["--base-url", "http://127.0.0.1:8000"])

Examples

示例

Local KV-routed frontend on port 8000, then smoke-test it:

bash

python3 -m dynamo.frontend --router-mode kv --http-port 8000 &
python3 scripts/check_router_health.py --base-url http://127.0.0.1:8000

Kubernetes-deployed frontend reachable via port-forward:

bash

kubectl port-forward svc/qwen-vllm-disagg-frontend 8000:8000 -n dynamo-demo &
python3 scripts/check_router_health.py --base-url http://127.0.0.1:8000 --retries 3

Equivalent through the agent protocol:

python

run_script("scripts/check_router_health.py", args=["--base-url", "http://127.0.0.1:8000", "--retries", "3"])

本地端口8000上的KV路由前端，随后进行冒烟测试：

bash

python3 -m dynamo.frontend --router-mode kv --http-port 8000 &
python3 scripts/check_router_health.py --base-url http://127.0.0.1:8000

通过端口转发访问Kubernetes部署的前端：

bash

kubectl port-forward svc/qwen-vllm-disagg-frontend 8000:8000 -n dynamo-demo &
python3 scripts/check_router_health.py --base-url http://127.0.0.1:8000 --retries 3

通过agent协议的等效操作：

python

run_script("scripts/check_router_health.py", args=["--base-url", "http://127.0.0.1:8000", "--retries", "3"])

Output Contract

输出约定

Return:

mode selected and why
local command or Kubernetes env patch
frontend service or URL
smoke-test result
any limitation, such as approximate KV mode or missing worker KV events
next command to run for a fuller comparison

返回以下内容：

所选模式及原因
本地命令或Kubernetes环境变量补丁
前端服务或URL
冒烟测试结果
任何限制，例如近似KV模式或缺失worker KV事件
用于更全面对比的下一个命令

Limitations

限制

Smoke test is one chat completion; it is not a benchmark. Use
```
dynamo-benchmark
```
for throughput/latency numbers.
KV-aware mode without worker KV-event publication degrades to approximate mode; this skill flags but does not fix the underlying worker config.
Mode comparisons require matched workloads; cross-mode latency claims need separate benchmark runs.

冒烟测试仅包含一次聊天完成，并非基准测试。如需吞吐量/延迟数据，请使用
```
dynamo-benchmark
```
。
若worker未发布KV事件，KV感知模式会降级为近似模式；本技能会标记此问题，但不会修复worker的底层配置。
模式对比需要匹配工作负载；跨模式延迟声明需要单独的基准测试运行。

Troubleshooting

故障排查

Symptom	Likely cause	Next step
`/v1/models` returns empty list	No worker registered with the frontend	Verify worker pods are Ready; confirm they connect to the same etcd/NATS
Smoke chat request times out	Frontend up, workers not serving	Switch to `dynamo-troubleshoot` ; inspect worker logs
KV mode hangs	Workers do not publish KV cache events	Set `DYN_ROUTER_USE_KV_EVENTS=false` (approximate mode)
Connection refused on port-forward	Port-forward dropped or wrong service name	Re-run port-forward; verify the frontend service name matches the recipe

症状	可能原因	下一步操作
`/v1/models` 返回空列表	没有worker注册到前端	验证worker Pod是否处于Ready状态；确认它们连接到相同的etcd/NATS
冒烟聊天请求超时	前端已启动，但worker未提供服务	切换至 `dynamo-troubleshoot` ；检查worker日志
KV模式挂起	worker未发布KV缓存事件	设置 `DYN_ROUTER_USE_KV_EVENTS=false` （近似模式）
端口转发时连接被拒绝	端口转发断开或服务名称错误	重新运行端口转发；验证前端服务名称与部署包匹配

Benchmark

基准测试

See

BENCHMARK.md

for the NVCARPS-EVAL performance report (auto-generated by the NVSkills CI pipeline). To refresh, re-run

/nvskills-ci

on an upstream PR touching this skill.

查看

BENCHMARK.md

获取NVCARPS-EVAL性能报告（由NVSkills CI管道自动生成）。如需刷新报告，请在触及本技能的上游PR上重新运行

/nvskills-ci

。

References

参考资料

Read
```
references/router-modes.md
```
for the compact mode/env map.
Use
```
scripts/check_router_health.py
```
for endpoint smoke tests.

阅读
```
references/router-modes.md
```
获取简洁的模式/环境变量映射表。
使用
```
scripts/check_router_health.py
```
进行端点冒烟测试。