Dynamo Recipe Runner
<!--
SPDX-FileCopyrightText: Copyright (c) 2026 NVIDIA CORPORATION & AFFILIATES. All rights reserved.
SPDX-License-Identifier: CC-BY-4.0
-->
Purpose
Get from user intent to a working Dynamo recipe endpoint with minimal back and
forth. Do not create new guide content. Operate on the existing
tree, patch the smallest necessary set of manifests, deploy when the user has
cluster access, and prove success with an OpenAI-compatible smoke request.
Prerequisites
- Python 3.10+ on the operator machine.
- configured with a working cluster context.
- Cluster has a default storage class for model-cache PVCs.
- Hugging Face token stored in a Kubernetes secret named
(or equivalent) in the target namespace.
- Read access to the tree in the ai-dynamo/dynamo repository.
Required Inputs
Collect or infer these before changing manifests:
- recipe target: model, framework (, , , ), deployment mode, and GPU type/count
- Kubernetes context and namespace
- Hugging Face secret name, usually
- storage class for model cache PVCs
- runtime image tag if the recipe uses a placeholder or stale test image
- whether to run commands or only produce exact commands
If a required value is missing and cannot be inferred from the selected recipe,
ask for only that value.
Instructions
1. Preflight
Run read-only checks first:
bash
git status --short
python3 scripts/recipe_tool.py list --format table
kubectl config current-context
kubectl get storageclass
kubectl get nodes -o wide
kubectl get namespace "${NAMESPACE}"
kubectl get secret hf-token-secret -n "${NAMESPACE}"
If
is unavailable or the cluster is unreachable, continue by
selecting and validating the recipe, then return exact commands instead of
pretending the deployment ran.
2. Select The Recipe
Use the recipe matrix from
and the scanner:
bash
python3 scripts/recipe_tool.py list \
--query qwen --framework vllm --mode disagg --format table
Prefer an exact existing recipe. Do not invent new manifests unless the user
explicitly asks to author a new recipe.
3. Inspect And Validate
Read the selected recipe README, model-cache manifests,
, and
if present. Then run:
bash
python3 scripts/recipe_tool.py validate \
recipes/<model>/<framework>/<mode>
Resolve reported blockers before applying manifests: storage class, model cache
PVC, image tag, HF token secret, GPU count, frontend service name, and router
mode.
4. Patch Minimal Values
Patch only recipe-specific values needed for this run. Do not reformat whole
YAML files. Common patches:
- image repository/tag
- model path or model cache mount path
- GPU resource requests/limits
- frontend
- namespace only when a manifest hardcodes it
Never write Hugging Face tokens into files or logs. Use Kubernetes secrets.
5. Deploy
Follow the selected recipe README when it differs from the default sequence.
The default sequence is:
bash
kubectl apply -f recipes/<model>/model-cache/ -n "${NAMESPACE}"
kubectl wait --for=condition=Complete job/model-download -n "${NAMESPACE}" --timeout=6000s
kubectl apply -f recipes/<model>/<framework>/<mode>/deploy.yaml -n "${NAMESPACE}"
kubectl get dynamographdeployment -n "${NAMESPACE}"
kubectl get pods -n "${NAMESPACE}" -o wide
Wait for the frontend and workers to be ready before testing.
6. Smoke Test
Port-forward the frontend service, then verify
and one chat
completion:
bash
kubectl port-forward svc/<deployment-name>-frontend 8000:8000 -n "${NAMESPACE}"
curl http://127.0.0.1:8000/v1/models
If
is also installed, prefer its
scripts/check_router_health.py
for the full OpenAI-compatible smoke test. If this fails, switch to
.
Available Scripts
| Script | Purpose | Arguments |
|---|
scripts/recipe_tool.py list
| Enumerate available recipes, optionally filtered | , , , |
scripts/recipe_tool.py validate
| Validate a recipe directory before apply | positional recipe path |
Invoke via the agentskills.io
protocol:
python
run_script("scripts/recipe_tool.py", args=["list", "--framework", "sglang", "--format", "table"])
run_script("scripts/recipe_tool.py", args=["validate", "recipes/nemotron-3-super-fp8/sglang/agg"])
Examples
List sglang recipes that fit a single 8xB200 node:
bash
python3 scripts/recipe_tool.py list --framework sglang --format table
Validate a specific recipe and resolve blockers before applying:
bash
python3 scripts/recipe_tool.py validate recipes/nemotron-3-super-fp8/sglang/agg
Equivalent through the agent protocol:
python
run_script("scripts/recipe_tool.py", args=["validate", "recipes/nemotron-3-super-fp8/sglang/agg"])
Output Contract
Return:
- selected recipe path and why it was selected
- exact values patched
- commands run or commands to run
- endpoint and smoke-test result
- unresolved blockers, if any
- next troubleshooting step when deployment does not become healthy
Limitations
- Operates on the existing tree only. Does not author new manifests.
- Cluster-mutating apply steps require permission to the target namespace.
- Smoke-test depth is intentionally minimal; for full router/endpoint coverage use .
- Multi-node disagg transport correctness is out of scope; use
dynamo-interconnect-check
after deploy.
Troubleshooting
| Symptom | Likely cause | Next step |
|---|
| cluster unreachable | Context not set or VPN down | Return exact commands instead of running them; resume when cluster is reachable |
| reports missing storage class | Cluster has no default | Patch on the model-cache manifest before applying |
| Model-cache job stuck | PVC unbound or HF secret missing | Inspect PVC events; create or rename the HF secret to match the recipe |
| Worker pods | Stale image tag or missing pull secret | Patch the image tag; verify image pull secret in the namespace |
| 4xx/5xx after deploy | Frontend not ready or wrong service port | Wait for pods Ready; re-run port-forward; switch to if it persists |
Benchmark
See
for the NVCARPS-EVAL performance report (auto-generated by the NVSkills CI pipeline). To refresh, re-run
on an upstream PR touching this skill.
References
- Read
references/k8s-recipe-workflow.md
for command templates and readiness checks.
- Use for recipe discovery and lightweight validation.