Verify that AutoDeploy YAML configs were applied at runtime by cross-referencing with server logs and optionally graph dumps.
1. [Collect Inputs] Ask the user for the following inputs:
- TensorRT-LLM source directory (required) — path to the TensorRT-LLM repo root. Used to cross-check the default config and the source code for the latest log patterns.
- YAML config file path(s) (required) — one or more AutoDeploy configs used for the run. When multiple YAMLs are provided, they are deep-merged left-to-right: later files override earlier ones for overlapping keys. Tell the user: "If you have multiple configs (e.g., a default config and a user override), list them in priority order — lowest priority first, highest priority last."
- Server log file path (required) — the log output from the server
- Graph dump directory (optional but recommended) — the output directory containing per-transform graph snapshots. There is one file per transform, and each shows the graph AFTER that transform. When provided, graph analysis supplies additional evidence (e.g., verifying sharded weights, collective ops, fused ops). This is especially useful for resolving UNKNOWN results.
- Nsys trace file (optional) — Nsight Systems profile from the server run. Useful for verifying executor-level configs that produce no log output (e.g., multi-stream concurrency, CUDA graph capture/replay).
- TensorRT-LLM source reference paths:
- Example configs: `<trtllm_src>/examples/auto_deploy/model_registry/configs/*.yaml`
- Default transform config (all available transforms and their defaults): `<trtllm_src>/tensorrt_llm/_torch/auto_deploy/config/default.yaml`
2. [Update Reference Doc] Before checking configs, ensure the bundled reference doc is up-to-date with the TensorRT-LLM source.
- `<trtllm_src>` — the TensorRT-LLM source directory from step 1
- `<skill_dir>` — the directory containing this SKILL.md file
The agent compares `<trtllm_src>/tensorrt_llm/_torch/auto_deploy/config/default.yaml` and the AutoDeploy source code against `<skill_dir>/references/config_log_patterns.md`. If any configs were added, removed, or renamed, or if log patterns have changed, the agent updates the reference doc in-place and reports what changed.
After the agent completes:
- If the reference doc was updated, inform the user: "Updated references/config_log_patterns.md to match the latest TensorRT-LLM source — see the agent's change summary below." Then show the agent's summary.
- If no changes were needed, briefly note: "Reference doc is up-to-date with the TensorRT-LLM source."
3. [Parse Configs] Run the parser script to flatten the YAML configs (`<skill_dir>` is the directory containing this SKILL.md file):
Input: The TensorRT-LLM `default.yaml` as the base, followed by the user's YAML config path(s) from step 1. Always include `default.yaml` first so that user configs override the defaults.
```bash
python3 <skill_dir>/scripts/parse_config.py <trtllm_src>/tensorrt_llm/_torch/auto_deploy/config/default.yaml <yaml_path1> [<yaml_path2> ...]
```
This deep-merges the YAML files left-to-right (later files override earlier ones) and flattens nested keys into dotted notation (e.g., `kv_cache_config.enable_block_reuse`). By including `default.yaml` first, every known config key appears in the output even if the user only overrode a subset.
Output: Flat JSON with all config key/value pairs. Example (abbreviated):
```json
{
  "yaml_files": ["default.yaml", "user_override.yaml"],
  "total_configs": 15,
  "configs": [
    {"key": "compile_backend", "value": "torch-cudagraph"},
    {"key": "kv_cache_config.free_gpu_memory_fraction", "value": "0.85"},
    {"key": "transforms.compile_model.piecewise_enabled", "value": "True"}
  ]
}
```
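The merge-and-flatten behavior described in this step can be sketched in a few lines. This is an illustration only, not the contents of `scripts/parse_config.py`: the function names `deep_merge` and `flatten` are hypothetical, the inputs are plain dicts rather than loaded YAML, and the "default" values shown are made up for the example.

```python
def deep_merge(base: dict, override: dict) -> dict:
    """Return a new dict in which `override` wins for overlapping keys."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Both sides are mappings: merge recursively instead of replacing.
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def flatten(tree: dict, prefix: str = "") -> list:
    """Flatten nested keys into dotted notation, one entry per leaf value."""
    entries = []
    for key, value in tree.items():
        dotted = f"{prefix}.{key}" if prefix else str(key)
        if isinstance(value, dict) and value:
            entries.extend(flatten(value, dotted))
        else:
            # Values are stringified, mirroring the example output above.
            entries.append({"key": dotted, "value": str(value)})
    return entries


# Illustrative inputs only; these are not the real defaults.
defaults = {
    "compile_backend": "placeholder-backend",
    "kv_cache_config": {"enable_block_reuse": True, "free_gpu_memory_fraction": 0.9},
}
user = {
    "compile_backend": "torch-cudagraph",
    "kv_cache_config": {"free_gpu_memory_fraction": 0.85},
}

flat = flatten(deep_merge(defaults, user))
for entry in flat:
    print(entry)
```

Because the defaults are merged first, `kv_cache_config.enable_block_reuse` still appears in the flattened output even though the user file never mentions it.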
4. [Quick Scan] Check each config against the server log using parallel agents.
Input: Config list from step 3, server log path from step 1, and references/config_log_patterns.md.
Split the configs from step 3 into 3 groups by section and launch 3 agents in parallel, each checking its group:
| Agent | Config group | Keys starting with | Reference section |
|---|---|---|---|
| Agent 1 | Top-level configs | Any key outside the two prefixes below (e.g., `compile_backend`) | "Top-Level Config Parameters" |
| Agent 2 | KV cache configs | `kv_cache_config.` | "kv_cache_config Parameters" |
| Agent 3 | Transform configs | `transforms.` (or any key matching a transform name) | "Transform Parameters" |
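The grouping in the table above amounts to a prefix check on each flattened key. A minimal sketch, assuming the `kv_cache_config.` and `transforms.` prefixes seen in the step 3 example output; the function names and the optional `transform_names` argument are hypothetical:

```python
def assign_group(key: str, transform_names: frozenset = frozenset()) -> str:
    """Pick the agent group for one flattened config key."""
    if key.startswith("kv_cache_config."):
        return "kv_cache"
    first_segment = key.split(".", 1)[0]
    # Assumption: a key also counts as a transform config when its first
    # segment is a known transform name supplied by the caller.
    if key.startswith("transforms.") or first_segment in transform_names:
        return "transforms"
    return "top_level"


def split_configs(configs: list, transform_names: frozenset = frozenset()) -> dict:
    """Split the step 3 config list into the three agent groups."""
    groups = {"top_level": [], "kv_cache": [], "transforms": []}
    for entry in configs:
        groups[assign_group(entry["key"], transform_names)].append(entry)
    return groups


configs = [
    {"key": "compile_backend", "value": "torch-cudagraph"},
    {"key": "kv_cache_config.free_gpu_memory_fraction", "value": "0.85"},
    {"key": "transforms.compile_model.piecewise_enabled", "value": "True"},
]
groups = split_configs(configs)
print({name: len(entries) for name, entries in groups.items()})
```

Treating "top-level" as the fallback group means no key is ever dropped, even if a new config section appears that the table does not list.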
Each agent receives:
- Its subset of key/value pairs
- The server log file path
- The reference doc references/config_log_patterns.md (including its verification source tags)
- The nsys trace file path (if provided)
Each agent, for every config in its group:
- Reads the reference doc to find the relevant keywords and patterns for this config key.
- Greps the server log for those patterns. Key search strategies:
- For transform configs: grep for `[stage=..., transform=<name>]` and check the match count N reported on that line (APPLIED if N>0, SKIPPED if N=0).
- For configs with success/failure indicators: grep for those specific strings.
- For configs with no known log pattern: grep for the key name near the value.
- For configs that are explicitly disabled: mark as DISABLED without log search.
- Assigns a status based on what was found:
- APPLIED — log confirms the config took effect
- FAILED — log shows the config was attempted but fell back or errored
- SKIPPED — transform ran but found nothing to do (0 matches)
- DISABLED — config explicitly disabled
- UNKNOWN — no log evidence found (config may still be active but unlogged)
- Records the evidence (the matching log line or lack thereof).
Output: Each agent returns a list of `{config, value, status, evidence}` entries for its group. Merge all 3 lists into the combined result.
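The status rules for transform configs can be sketched as below. The exact log format is an assumption: only the `[stage=..., transform=<name>]` tag comes from this document, while the `matches=N` field, the stage name, and the transform names in the sample log are hypothetical placeholders. Check references/config_log_patterns.md for the real patterns. FAILED detection is omitted because it depends on per-config success/failure strings.

```python
import re


def transform_status(log_text: str, name: str, enabled: bool = True) -> dict:
    """Classify one transform as DISABLED, APPLIED, SKIPPED, or UNKNOWN."""
    if not enabled:
        return {"status": "DISABLED", "evidence": "explicitly disabled; log not searched"}

    tag = re.compile(r"\[stage=[^,\]]+,\s*transform=" + re.escape(name) + r"\]")
    count = re.compile(r"matches=(\d+)")  # hypothetical match-count field

    for line in log_text.splitlines():
        if not tag.search(line):
            continue
        found = count.search(line)
        if found:
            status = "APPLIED" if int(found.group(1)) > 0 else "SKIPPED"
            return {"status": status, "evidence": line.strip()}

    return {"status": "UNKNOWN", "evidence": "no match count logged for this transform"}


# Made-up log lines for illustration; not real server output.
sample_log = """\
[stage=demo_stage, transform=demo_fusion] matches=3
[stage=demo_stage, transform=demo_noop] matches=0
"""

for transform in ("demo_fusion", "demo_noop", "demo_missing"):
    print(transform, transform_status(sample_log, transform)["status"])
```

Returning the matching line as evidence keeps each result auditable: a reader can grep the server log for that exact line.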
5. [Double Check] For any UNKNOWN entries from step 4, investigate further before presenting results to the user (FAILED entries already have concrete log evidence and do not need double-checking):
Input: List of UNKNOWN config entries from step 4 output, the server log file, and references/config_log_patterns.md.
- Re-read references/config_log_patterns.md for alternative patterns
- Grep the log more broadly for the transform name: `[stage=..., transform=<name>]`
- Look for other prefixed lines logged for that transform
- Check for warnings, errors, or fallback messages near the transform logs
- If graph dump directory was provided:
- Each graph file contains the FX graph AFTER its transform. Compare before/after by reading consecutive files.
- Graph evidence can upgrade UNKNOWN to APPLIED (e.g., collective ops after lm_head confirm sharding, fused custom ops confirm fusion transforms).
- Graph analysis verifies: sharding (collective ops, weight shape changes), attention backend (op types), MoE fusion (fused op presence), GEMM fusion (linear op count changes), RMSNorm/SwiGLU/RoPE pattern matching (custom op presence).
- See references/graph_verification_patterns.md for the full list of graph-based checks.
- If nsys trace was provided, check the executor-level configs that the reference doc tags for trace verification (e.g., multi-stream concurrency, CUDA graph capture/replay)
Output: For each investigated UNKNOWN entry, either additional evidence found (with status upgrade) or confirmation that the config is genuinely unlogged.
6. [Report] Present the final results to the user.
ALWAYS show the full detailed table. Do NOT summarize or condense. Present one row per config with columns:
- Config — the config key and its value (e.g., `compile_backend = torch-cudagraph`)
- Result — one of: APPLIED, FAILED, SKIPPED, DISABLED, UNKNOWN
- Evidence — the log line or pattern that proves the result
After the table, show the summary line (e.g., `Total configs checked: 29 | APPLIED: 23 | ...`) and any FAILED/WARNING details. Include any additional findings from the Double Check step (step 5).
If the user requested output files, write:
- Table output — the human-friendly table as plain text
- JSON output — machine-friendly JSON with an array of per-config results and a summary object
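The report outputs can be assembled from the merged `{config, value, status, evidence}` entries roughly as follows. This is a sketch under assumptions: the helper names and the JSON field names `results` and `summary` are hypothetical, and the sample entries are illustrative rather than real findings.

```python
import json
from collections import Counter

STATUSES = ["APPLIED", "FAILED", "SKIPPED", "DISABLED", "UNKNOWN"]


def summary_line(results: list) -> str:
    """Build the one-line summary shown after the table."""
    counts = Counter(entry["status"] for entry in results)
    parts = [f"Total configs checked: {len(results)}"]
    parts += [f"{status}: {counts.get(status, 0)}" for status in STATUSES]
    return " | ".join(parts)


def render_table(results: list) -> str:
    """Render one row per config as a plain-text, column-aligned table."""
    header = ("Config", "Result", "Evidence")
    rows = [
        (f'{entry["config"]} = {entry["value"]}', entry["status"], entry["evidence"])
        for entry in results
    ]
    widths = [max(len(row[i]) for row in [header] + rows) for i in range(3)]
    return "\n".join(
        " | ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in [header] + rows
    )


def to_json(results: list) -> str:
    """Serialize results; the top-level field names are assumptions."""
    counts = Counter(entry["status"] for entry in results)
    summary = {status: counts.get(status, 0) for status in STATUSES}
    return json.dumps({"results": results, "summary": summary}, indent=2)


# Illustrative entries only.
results = [
    {"config": "compile_backend", "value": "torch-cudagraph",
     "status": "APPLIED", "evidence": "illustrative log line"},
    {"config": "kv_cache_config.free_gpu_memory_fraction", "value": "0.85",
     "status": "UNKNOWN", "evidence": "no log evidence found"},
]

print(render_table(results))
print(summary_line(results))
```

Listing every status in the summary, including those with a zero count, keeps the line's shape stable across runs and easier to compare.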