ad-conf-check


AutoDeploy Config Checker


Verify that AutoDeploy YAML configs were applied at runtime by cross-referencing with server logs and optionally graph dumps.


Input


- TensorRT-LLM source directory (required): path to the TensorRT-LLM repo root. Used to read the latest `default.yaml` and source code for up-to-date log patterns (the bundled reference doc may be stale).
- YAML config file path(s) (required): one or more AutoDeploy YAML configs. When multiple files are provided, they are deep-merged left-to-right (later files override earlier ones for overlapping keys).
- Server log file path (required): log output from the AutoDeploy server run.
- Graph dump directory (optional): the `AD_DUMP_GRAPHS_DIR` output directory containing per-transform graph snapshots (`NNN_stage_transform.txt`). Provides additional evidence for resolving UNKNOWN results.
- Nsys trace file (optional): Nsight Systems profile (`.nsys-rep` or `.sqlite`) from the server run. Useful for verifying executor-level configs that produce no log output (e.g., `enable_chunked_prefill`, multi-stream concurrency, CUDA graph capture/replay).
- Table output file path (optional): path to write human-friendly table results.
- JSON output file path (optional): path to write machine-friendly JSON results.


Output


Human-friendly table (always presented to user)


- Verification table: one row per config key with columns Config (key=value), Result (APPLIED / FAILED / SKIPPED / DISABLED / UNKNOWN), and Evidence (log line or graph analysis proving the result).
- Summary line: total counts per status (e.g., `Total configs checked: 29 | APPLIED: 23 | UNKNOWN: 4 | ...`).
- FAILED/WARNING details: expanded information for any configs that failed or had warnings.


Machine-friendly JSON (when JSON output path is given)


JSON file with two top-level keys:

- `results`: array of objects, each with `config`, `value`, `status`, `evidence`.
- `summary`: object with `total` (int) and `counts` (object mapping status to count, only non-zero statuses included).
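
The two-key layout above can be sketched in a few lines of Python. This is only an illustration of the described shape, not the skill's actual writer; the `build_report` helper and the sample rows (including the placeholder evidence strings) are hypothetical.

```python
import json
from collections import Counter


def build_report(results):
    """Wrap per-config rows in the {results, summary} layout described above."""
    # Counter only holds statuses that occur, so zero-count statuses are omitted.
    counts = Counter(row["status"] for row in results)
    return {
        "results": results,
        "summary": {"total": len(results), "counts": dict(counts)},
    }


# Hypothetical rows; evidence values are placeholders, not real log lines.
sample = [
    {"config": "compile_backend", "value": "torch-cudagraph",
     "status": "APPLIED", "evidence": "<matching log line>"},
    {"config": "kv_cache_config.free_gpu_memory_fraction", "value": "0.85",
     "status": "APPLIED", "evidence": "<matching log line>"},
    {"config": "enable_chunked_prefill", "value": "True",
     "status": "UNKNOWN", "evidence": "no log evidence found"},
]

report = build_report(sample)
print(json.dumps(report, indent=2))
```

Using a `Counter` keeps the "only non-zero statuses" rule implicit: a status that never occurs never becomes a key.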


Workflow


[Collect Inputs] Ask the user for the following inputs:
- TensorRT-LLM source directory (required): path to the TensorRT-LLM repo root. Used to cross-check `default.yaml` and source code for the latest log patterns.
- YAML config file path(s) (required): one or more AutoDeploy configs used for the run. When multiple YAMLs are provided, they are deep-merged left-to-right: later files override earlier ones for overlapping keys. Tell the user: "If you have multiple configs (e.g., a default config and a user override), list them in priority order: lowest priority first, highest priority last."
- Server log file path (required): the log output from the server.
- Graph dump directory (optional but recommended): the `AD_DUMP_GRAPHS_DIR` output directory containing per-transform graph snapshots. Files are named `NNN_stage_transform.txt` and show the graph AFTER each transform. When provided, graph analysis provides additional evidence (e.g., verifying sharded weights, collective ops, fused ops). This is especially useful for resolving UNKNOWN results.
- Nsys trace file (optional): Nsight Systems profile (`.nsys-rep` or `.sqlite`) from the server run. Useful for verifying executor-level configs that produce no log output (e.g., `enable_chunked_prefill`, multi-stream concurrency, CUDA graph capture/replay).
- TensorRT-LLM source reference paths:
  - Example configs: `<trtllm_src>/examples/auto_deploy/model_registry/configs/*.yaml`
  - Default transform config (all available transforms and their defaults): `<trtllm_src>/tensorrt_llm/_torch/auto_deploy/config/default.yaml`
[Update Reference Doc] Before checking configs, ensure the bundled reference doc is up-to-date with the TensorRT-LLM source.
Launch the `ad-conf-check-update` agent with:
- `<trtllm_src>`: the TensorRT-LLM source directory from step 1
- `<skill_dir>`: the directory containing this SKILL.md file
The agent compares `<trtllm_src>/tensorrt_llm/_torch/auto_deploy/config/default.yaml` and the AutoDeploy source code against `<skill_dir>/references/config_log_patterns.md`. If any configs were added, removed, renamed, or if log patterns have changed, the agent updates the reference doc in-place and reports what changed.
After the agent completes:
- If the reference doc was updated, inform the user: "Updated references/config_log_patterns.md to match the latest TensorRT-LLM source — see the agent's change summary below." Then show the agent's summary.
- If no changes were needed, briefly note: "Reference doc is up-to-date with the TensorRT-LLM source."

[Parse Configs] Run the parser script to flatten the YAML configs (`<skill_dir>` is the directory containing this SKILL.md file):

Input: The TensorRT-LLM `default.yaml` as the base, followed by the user's YAML config path(s) from step 1. Always include `default.yaml` first so that user configs override the defaults.

```bash
python3 <skill_dir>/scripts/parse_config.py <trtllm_src>/tensorrt_llm/_torch/auto_deploy/config/default.yaml <yaml_path1> [<yaml_path2> ...]
```

This deep-merges the YAML files left-to-right (later files override earlier ones) and flattens nested keys into dotted notation (e.g., `kv_cache_config.enable_block_reuse`). By including `default.yaml` first, every known config key appears in the output even if the user only overrode a subset.

Output: Flat JSON with all config `{key, value}` pairs. Example:

```json
{
  "yaml_files": ["default.yaml", "user_override.yaml"],
  "total_configs": 15,
  "configs": [
    {"key": "compile_backend", "value": "torch-cudagraph"},
    {"key": "kv_cache_config.free_gpu_memory_fraction", "value": "0.85"},
    {"key": "transforms.compile_model.piecewise_enabled", "value": "True"}
  ]
}
```
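
To make the merge-then-flatten behavior concrete, here is a minimal Python sketch of the two operations as described above. It is not the code of `parse_config.py`, whose internals are not shown here; the `deep_merge` and `flatten` helpers are hypothetical, and the dictionaries stand in for already-parsed YAML with invented default values.

```python
def deep_merge(base, override):
    """Return a new dict: nested dicts merge recursively, other values are overridden."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def flatten(tree, prefix=""):
    """Flatten nested dicts into (dotted_key, value) pairs."""
    pairs = []
    for key, value in tree.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict):
            pairs.extend(flatten(value, dotted))
        else:
            pairs.append((dotted, value))
    return pairs


# Stand-ins for parsed YAML; the default values here are invented for illustration.
defaults = {
    "compile_backend": "default-backend",
    "kv_cache_config": {"enable_block_reuse": True, "free_gpu_memory_fraction": 0.9},
}
user = {
    "compile_backend": "torch-cudagraph",
    "kv_cache_config": {"free_gpu_memory_fraction": 0.85},
}

flat = dict(flatten(deep_merge(defaults, user)))
for key, value in flat.items():
    print(f"{key} = {value}")
```

Note that `kv_cache_config.enable_block_reuse` survives from the defaults even though the user file never mentions it, which is why `default.yaml` goes first on the command line.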

[Quick Scan] Check each config against the server log using parallel agents.

Input: Config list from step 3, server log path from step 1, and references/config_log_patterns.md.

Split the configs from step 3 into 3 groups by section and launch 3 agents in parallel, each checking its group:

| Agent | Config group | Keys starting with | Reference section |
| --- | --- | --- | --- |
| Agent 1 | Top-level configs | `runtime`, `compile_backend`, `attn_backend`, `max_seq_len`, `max_num_tokens`, `max_batch_size`, `cuda_graph_batch_sizes`, `enable_chunked_prefill`, `model_factory`, `dtype`, etc. | "Top-Level Config Parameters" |
| Agent 2 | KV cache configs | `kv_cache_config.*` | "kv_cache_config Parameters" |
| Agent 3 | Transform configs | `transforms.` (or any key matching a transform name like `compile_model`, `detect_sharding`, `multi_stream_`, `fuse_`, `gather_logits_`, etc.) | "Transform Parameters" |

Each agent receives:

- Its subset of `{key, value}` pairs
- The server log file path
- The reference doc references/config_log_patterns.md (including verification source tags: `[log]`, `[graph]`, `[nsys]`)
- The nsys trace file path (if provided)

Each agent, for every config in its group:

- Reads the reference doc to find the relevant keywords and patterns for this config key.
- Greps the server log for those patterns. Key search strategies:
  - For transform configs: grep for `[stage=..., transform=<name>]` and check the `[SUMMARY]` line (`matches=N` → APPLIED if N>0, SKIPPED if N=0).
  - For configs with success/failure indicators: grep for those specific strings.
  - For configs with no known log pattern: grep for `key=value` or the key name near the value.
  - For configs with `enabled: false`: mark as DISABLED without log search.
- Assigns a status based on what was found:
  - APPLIED: log confirms the config took effect
  - FAILED: log shows the config was attempted but fell back or errored
  - SKIPPED: transform ran but found nothing to do (0 matches)
  - DISABLED: config explicitly set `enabled: false`
  - UNKNOWN: no log evidence found (config may still be active but unlogged)
- Records the evidence (the matching log line or lack thereof).

Output: Each agent returns a list of `{config, value, status, evidence}` entries for its group. Merge all 3 lists into the combined result.
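
The status rules for transform configs can be sketched as a small classifier. This is a rough illustration that assumes the `[SUMMARY]` line has exactly the shape quoted in this document; the `classify_transform` helper, the stage name, and the transform names in the sample log are hypothetical.

```python
import re

# Matches "[stage=<stage>, transform=<name>] [SUMMARY] matches=N" as quoted in this doc.
SUMMARY_RE = re.compile(
    r"\[stage=(?P<stage>[^,\]]+), transform=(?P<name>[^\]]+)\] "
    r"\[SUMMARY\] matches=(?P<n>\d+)"
)


def classify_transform(name, log_lines, enabled=True):
    """Return (status, evidence) for one transform config."""
    if not enabled:
        # enabled: false is reported without searching the log.
        return "DISABLED", "enabled: false in config"
    for line in log_lines:
        match = SUMMARY_RE.search(line)
        if match and match.group("name") == name:
            status = "APPLIED" if int(match.group("n")) > 0 else "SKIPPED"
            return status, line.strip()
    return "UNKNOWN", "no [SUMMARY] line found"


# Invented log lines; real stage and transform names will differ.
log = [
    "[stage=example_stage, transform=fuse_example] [SUMMARY] matches=4 | time: 0.1s",
    "[stage=example_stage, transform=match_example] [SUMMARY] matches=0 | time: 0.0s",
]

for transform in ("fuse_example", "match_example", "absent_example"):
    print(transform, classify_transform(transform, log))
```

The sketch stops at the first matching `[SUMMARY]` line and does not detect FAILED, which depends on config-specific failure strings from the reference doc.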

[Double Check] For any UNKNOWN entries from step 4, investigate further before presenting results to the user (FAILED entries already have concrete log evidence and do not need double-checking):

Input: List of UNKNOWN config entries from step 4 output, the server log file, and references/config_log_patterns.md.
- Re-read references/config_log_patterns.md for alternative patterns
- Grep the log more broadly for the transform name: `[stage=..., transform=<name>]`
- Look for `[APPLY]`-prefixed lines and `[SUMMARY]` lines for that transform
- Check for `"Falling back"`, `"Skipping"`, or `"failed"` near the transform logs
- If graph dump directory was provided:
  - Graph files are named `NNN_stage_transform.txt`; each contains the FX graph AFTER that transform. Compare before/after by reading consecutive files.
  - Graph evidence can upgrade UNKNOWN to APPLIED (e.g., collective ops after lm_head confirm sharding, fused custom ops confirm fusion transforms).
  - Graph analysis verifies: sharding (collective ops, weight shape changes), attention backend (op types), MoE fusion (fused op presence), GEMM fusion (linear op count changes), RMSNorm/SwiGLU/RoPE pattern matching (custom op presence).
  - See references/graph_verification_patterns.md for the full list of graph-based checks.
- If nsys trace was provided, check for executor-level configs tagged `[nsys]` in the reference doc (e.g., `enable_chunked_prefill`, `enable_block_reuse`, multi-stream concurrency, CUDA graph capture/replay)
Output: For each investigated UNKNOWN entry, either additional evidence found (with status upgrade) or confirmation that the config is genuinely unlogged.
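
Comparing consecutive graph dumps, as described in the graph-dump bullets above, might look like the following Python sketch. It assumes only that dump files begin with a numeric `NNN_` prefix; the helper names, file names, and graph text below are invented for illustration.

```python
import difflib
import re


def consecutive_pairs(filenames):
    """Order dump files by numeric NNN prefix and pair each with its successor."""
    numbered = []
    for name in filenames:
        match = re.match(r"(\d+)_", name)
        if match:  # ignore files without the NNN_ prefix
            numbered.append((int(match.group(1)), name))
    ordered = [name for _, name in sorted(numbered)]
    return list(zip(ordered, ordered[1:]))


def changed_lines(before_text, after_text):
    """Return only the removed ('- ') and added ('+ ') lines between two dumps."""
    diff = difflib.ndiff(before_text.splitlines(), after_text.splitlines())
    return [line for line in diff if line.startswith(("- ", "+ "))]


# Invented file names and graph contents.
files = ["002_stage_b_transform_z.txt", "000_stage_a_transform_x.txt",
         "001_stage_a_transform_y.txt", "notes.md"]
pairs = consecutive_pairs(files)
print(pairs)

before = "x = placeholder(a)\ny = linear(x)"
after = "x = placeholder(a)\ny = fused_op(x)"
delta = changed_lines(before, after)
print(delta)
```

In practice each pair's file contents would be read from the dump directory and passed to `changed_lines`; an empty result for a transform suggests the graph was left unchanged.
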
[Report] Present the final results to the user.

ALWAYS show the full detailed table. Do NOT summarize or condense. Present one row per config with columns:
- Config: the config key and its value (e.g., `compile_backend = torch-cudagraph`)
- Result: one of APPLIED, FAILED, SKIPPED, DISABLED, UNKNOWN
- Evidence: the log line or pattern that proves the result
After the table, show the summary line (e.g., `Total configs checked: 29 | APPLIED: 23 | ...`) and any FAILED/WARNING details. Include any additional findings from the Double Check step (step 5).
If the user requested output files, write:
- Table output: the human-friendly table as plain text
- JSON output: machine-friendly JSON with `results` array and `summary` object
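
A plain-text rendering of the table and summary line could be sketched as below. The column layout and the choice to omit zero-count statuses from the summary line are assumptions made for this example; `render_table`, `summary_line`, and the sample rows are hypothetical.

```python
STATUS_ORDER = ["APPLIED", "FAILED", "SKIPPED", "DISABLED", "UNKNOWN"]


def summary_line(results):
    """Build 'Total configs checked: N | STATUS: count | ...' (zero counts omitted)."""
    counts = {status: 0 for status in STATUS_ORDER}
    for row in results:
        counts[row["status"]] += 1
    parts = [f"Total configs checked: {len(results)}"]
    parts += [f"{s}: {counts[s]}" for s in STATUS_ORDER if counts[s]]
    return " | ".join(parts)


def render_table(results):
    """Render one row per config with Config, Result, and Evidence columns."""
    rows = [("Config", "Result", "Evidence")]
    rows += [(f"{r['config']} = {r['value']}", r["status"], r["evidence"])
             for r in results]
    widths = [max(len(row[i]) for row in rows) for i in range(3)]
    return "\n".join(
        " | ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    )


# Hypothetical rows; evidence values are placeholders, not real log lines.
rows = [
    {"config": "compile_backend", "value": "torch-cudagraph",
     "status": "APPLIED", "evidence": "<matching log line>"},
    {"config": "max_seq_len", "value": "262144",
     "status": "APPLIED", "evidence": "<matching log line>"},
    {"config": "enable_chunked_prefill", "value": "True",
     "status": "UNKNOWN", "evidence": "no log evidence found"},
]

table_text = render_table(rows)
print(table_text)
print(summary_line(rows))
```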


Key Patterns to Know


Every transform logs: `[stage=<stage>, transform=<name>] [SUMMARY] matches=N | time: ...`

Piecewise success chain: `dual-mode enabled` → `prepared with N submodules` → `captured graphs`
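
One way to check the piecewise success chain is to require the three markers to appear in order. The sketch below assumes the markers are logged in the sequence listed and that `N` is a decimal integer; the `chain_in_order` helper and the sample log text are invented.

```python
import re

# Marker patterns taken from the success chain; N is assumed to be an integer.
CHAIN = [r"dual-mode enabled", r"prepared with \d+ submodules", r"captured graphs"]


def chain_in_order(log_text, patterns=CHAIN):
    """True if every pattern occurs, each one after the previous match."""
    position = 0
    for pattern in patterns:
        match = re.compile(pattern).search(log_text, position)
        if match is None:
            return False
        position = match.end()
    return True


# Invented log excerpts.
good_log = "dual-mode enabled\nprepared with 12 submodules\ncaptured graphs\n"
partial_log = "dual-mode enabled\ncaptured graphs\n"

print(chain_in_order(good_log))
print(chain_in_order(partial_log))
```

A log that contains only part of the chain is worth a closer look for the fallback message before assigning a status.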

Piecewise failure: `"model is not a GraphModule...Falling back to eager execution"`

Sharding: `"Using allreduce strategy: SYMM_MEM"`, `"Applied N TP shards from config"`


Gotchas


- Every YAML key must appear in the output. Check all configs from the YAML, not just ones with known patterns. If a config key has no entry in the reference doc, grep the log for the key name and value. New/unknown configs should still be reported; never silently skip them.
- UNKNOWN does not mean the config was ignored. Some configs (e.g., `enable_chunked_prefill`, `enable_block_reuse`) are consumed at executor/runtime level and produce no log output. UNKNOWN means "no log evidence found", not "config was not applied".
- Deprecated config names may cause FAILED. For example, `torch_dtype` is deprecated in favor of `dtype`, and `cuda_graph_batch_sizes` (top-level) is replaced by `cuda_graph_config.batch_sizes`. Look for deprecation warning messages in the log. Old keys may be silently ignored.
- Runtime may adjust configured values. For example, `max_seq_len` may be configured as 262144 but adjusted down to 16384 at runtime due to memory constraints. Report this as APPLIED with a WARNING annotation.
- ANSI color codes in logs. AutoDeploy uses colored log output. Strip or ignore ANSI escape sequences when matching patterns.
- Reference doc is auto-updated. Step 2 runs the `ad-conf-check-update` agent to sync references/config_log_patterns.md with the latest TensorRT-LLM source before any config checking begins. If the agent reports changes, review its summary to understand what shifted.
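
For the ANSI gotcha, stripping escape sequences before matching can be done with one regular expression. This sketch covers CSI sequences (the family that includes color codes); the `strip_ansi` helper and the sample line are made up for illustration.

```python
import re

# CSI sequences: ESC [ parameter bytes, intermediate bytes, one final byte.
ANSI_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(line):
    """Remove ANSI CSI escape sequences (e.g., color codes) from a log line."""
    return ANSI_RE.sub("", line)


# Invented colored log line.
colored = "\x1b[32m[SUMMARY]\x1b[0m matches=3"
clean = strip_ansi(colored)
print(clean)
```

Running every log line through `strip_ansi` before pattern matching keeps the reference patterns free of escape-code handling.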
