ad-conf-check

AutoDeploy Config Checker

Verify that AutoDeploy YAML configs were applied at runtime by cross-referencing with server logs and optionally graph dumps.

Input

  • TensorRT-LLM source directory (required): path to the TensorRT-LLM repo root. Used to read the latest default.yaml and source code for up-to-date log patterns (the bundled reference doc may be stale).
  • YAML config file path(s) (required): one or more AutoDeploy YAML configs. When multiple files are provided, they are deep-merged left-to-right (later files override earlier ones for overlapping keys).
  • Server log file path (required): log output from the AutoDeploy server run.
  • Graph dump directory (optional): the AD_DUMP_GRAPHS_DIR output directory containing per-transform graph snapshots (NNN_stage_transform.txt). Provides additional evidence for resolving UNKNOWN results.
  • Nsys trace file (optional): Nsight Systems profile (.nsys-rep or .sqlite) from the server run. Useful for verifying executor-level configs that produce no log output (e.g., enable_chunked_prefill, multi-stream concurrency, CUDA graph capture/replay).
  • Table output file path (optional): path to write human-friendly table results.
  • JSON output file path (optional): path to write machine-friendly JSON results.

Output

Human-friendly table (always presented to user)

  • Verification table: one row per config key, with three columns. Config (key=value), Result (APPLIED / FAILED / SKIPPED / DISABLED / UNKNOWN), and Evidence (log line or graph analysis proving the result).
  • Summary line: total counts per status (e.g., Total configs checked: 29 | APPLIED: 23 | UNKNOWN: 4 | ...).
  • FAILED/WARNING details: expanded information for any configs that failed or had warnings.

Machine-friendly JSON (when JSON output path is given)

JSON file with two top-level keys:
  • results: array of objects, each with config, value, status, and evidence.
  • summary: object with total (int) and counts (object mapping status to count; only non-zero statuses are included).
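
As a rough illustration of the JSON shape described above, the following sketch builds the two top-level keys from a list of per-config results. The function name build_report and the sample entries (including their evidence strings) are hypothetical and only serve to show the structure.

```python
import json
from collections import Counter


def build_report(results):
    """Wrap per-config results in the two-key structure: results + summary."""
    counts = Counter(entry["status"] for entry in results)
    return {
        "results": results,
        "summary": {
            "total": len(results),
            # A Counter only holds statuses that occurred, so zero counts are omitted.
            "counts": dict(counts),
        },
    }


# Hypothetical sample entries, for illustration only.
sample = [
    {"config": "compile_backend", "value": "torch-cudagraph",
     "status": "APPLIED", "evidence": "hypothetical log line"},
    {"config": "kv_cache_config.enable_block_reuse", "value": "True",
     "status": "UNKNOWN", "evidence": "no log evidence found"},
]

report = build_report(sample)
print(json.dumps(report, indent=2))
```

Because counts is derived from the statuses actually seen, a status such as FAILED simply does not appear when nothing failed.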

Workflow

  1. [Collect Inputs] Ask the user for the following inputs:
    • TensorRT-LLM source directory (required): path to the TensorRT-LLM repo root. Used to cross-check default.yaml and source code for the latest log patterns.
    • YAML config file path(s) (required): one or more AutoDeploy configs used for the run. When multiple YAMLs are provided, they are deep-merged left-to-right: later files override earlier ones for overlapping keys. Tell the user: "If you have multiple configs (e.g., a default config and a user override), list them in priority order: lowest priority first, highest priority last."
    • Server log file path (required): the log output from the server.
    • Graph dump directory (optional but recommended): the AD_DUMP_GRAPHS_DIR output directory containing per-transform graph snapshots. Files are named NNN_stage_transform.txt and show the graph AFTER each transform. When provided, graph analysis supplies additional evidence (e.g., verifying sharded weights, collective ops, fused ops). This is especially useful for resolving UNKNOWN results.
    • Nsys trace file (optional): Nsight Systems profile (.nsys-rep or .sqlite) from the server run. Useful for verifying executor-level configs that produce no log output (e.g., enable_chunked_prefill, multi-stream concurrency, CUDA graph capture/replay).
    • TensorRT-LLM source reference paths:
      • Example configs: <trtllm_src>/examples/auto_deploy/model_registry/configs/*.yaml
      • Default transform config (all available transforms and their defaults): <trtllm_src>/tensorrt_llm/_torch/auto_deploy/config/default.yaml
  2. [Update Reference Doc] Before checking configs, ensure the bundled reference doc is up-to-date with the TensorRT-LLM source.
    Launch the ad-conf-check-update agent with:
    • <trtllm_src>: the TensorRT-LLM source directory from step 1
    • <skill_dir>: the directory containing this SKILL.md file
    The agent compares <trtllm_src>/tensorrt_llm/_torch/auto_deploy/config/default.yaml and the AutoDeploy source code against <skill_dir>/references/config_log_patterns.md. If any configs were added, removed, or renamed, or if log patterns have changed, the agent updates the reference doc in place and reports what changed.
    After the agent completes:
    • If the reference doc was updated, inform the user: "Updated references/config_log_patterns.md to match the latest TensorRT-LLM source; see the agent's change summary below." Then show the agent's summary.
    • If no changes were needed, briefly note: "Reference doc is up-to-date with the TensorRT-LLM source."
  3. [Parse Configs] Run the parser script to flatten the YAML configs (<skill_dir> is the directory containing this SKILL.md file).
    Input: The TensorRT-LLM default.yaml as the base, followed by the user's YAML config path(s) from step 1. Always include default.yaml first so that user configs override the defaults.
    ```bash
    python3 <skill_dir>/scripts/parse_config.py <trtllm_src>/tensorrt_llm/_torch/auto_deploy/config/default.yaml <yaml_path1> [<yaml_path2> ...]
    ```
    This deep-merges the YAML files left-to-right (later files override earlier ones) and flattens nested keys into dotted notation (e.g., kv_cache_config.enable_block_reuse). By including default.yaml first, every known config key appears in the output even if the user only overrode a subset.
    Output: Flat JSON with all config {key, value} pairs. Example:
    ```json
    {
      "yaml_files": ["default.yaml", "user_override.yaml"],
      "total_configs": 15,
      "configs": [
        {"key": "compile_backend", "value": "torch-cudagraph"},
        {"key": "kv_cache_config.free_gpu_memory_fraction", "value": "0.85"},
        {"key": "transforms.compile_model.piecewise_enabled", "value": "True"}
      ]
    }
    ```
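
The merge-then-flatten behavior described in step 3 can be sketched as below. This is a minimal illustration, not the contents of parse_config.py: the function names deep_merge and flatten are hypothetical, the input dicts stand in for already-loaded YAML, the default values are invented, and treating non-dict values (including lists) as whole stringified leaves is an assumption.

```python
def deep_merge(base, override):
    """Return a new dict; override wins on overlapping keys, nested dicts merge recursively."""
    merged = dict(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def flatten(tree, prefix=""):
    """Flatten nested dicts into a list of {'key': 'a.b', 'value': '<str>'} rows."""
    rows = []
    for key, value in tree.items():
        dotted = f"{prefix}.{key}" if prefix else key
        if isinstance(value, dict) and value:
            rows.extend(flatten(value, dotted))
        else:
            # Assumption: leaves (scalars, lists, empty dicts) are reported as strings.
            rows.append({"key": dotted, "value": str(value)})
    return rows


# Hypothetical stand-ins for default.yaml and a user override, already parsed.
defaults = {
    "compile_backend": "torch-compile",
    "kv_cache_config": {"free_gpu_memory_fraction": 0.9, "enable_block_reuse": True},
}
user = {
    "compile_backend": "torch-cudagraph",
    "kv_cache_config": {"free_gpu_memory_fraction": 0.85},
}

merged = deep_merge(defaults, user)
configs = flatten(merged)
for row in configs:
    print(row)
```

Note that enable_block_reuse survives from the defaults even though the override never mentions it, which is why default.yaml goes first in the merge order.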
  4. [Quick Scan] Check each config against the server log using parallel agents.
    Input: Config list from step 3, server log path from step 1, and references/config_log_patterns.md.
    Split the configs from step 3 into 3 groups by section and launch 3 agents in parallel, each checking its group:
    | Agent | Config group | Keys starting with | Reference section |
    |---|---|---|---|
    | Agent 1 | Top-level configs | runtime, compile_backend, attn_backend, max_seq_len, max_num_tokens, max_batch_size, cuda_graph_batch_sizes, enable_chunked_prefill, model_factory, dtype, etc. | "Top-Level Config Parameters" |
    | Agent 2 | KV cache configs | kv_cache_config.* | "kv_cache_config Parameters" |
    | Agent 3 | Transform configs | transforms.* (or any key matching a transform name like compile_model, detect_sharding, multi_stream_*, fuse_*, gather_logits_*, etc.) | "Transform Parameters" |
    Each agent receives:
    • Its subset of {key, value} pairs
    • The server log file path
    • The reference doc references/config_log_patterns.md (including verification source tags: [log], [graph], [nsys])
    • The nsys trace file path (if provided)
    Each agent, for every config in its group:
    1. Reads the reference doc to find the relevant keywords and patterns for this config key.
    2. Greps the server log for those patterns. Key search strategies:
      • For transform configs: grep for [stage=..., transform=<name>] and check the [SUMMARY] line (matches=N → APPLIED if N>0, SKIPPED if N=0).
      • For configs with success/failure indicators: grep for those specific strings.
      • For configs with no known log pattern: grep for key=value or the key name near the value.
      • For configs set to enabled: false, mark as DISABLED without searching the log.
    3. Assigns a status based on what was found:
      • APPLIED: log confirms the config took effect
      • FAILED: log shows the config was attempted but fell back or errored
      • SKIPPED: transform ran but found nothing to do (0 matches)
      • DISABLED: config explicitly set enabled: false
      • UNKNOWN: no log evidence found (config may still be active but unlogged)
    4. Records the evidence (the matching log line or lack thereof).
    Output: Each agent returns a list of {config, value, status, evidence} entries for its group. Merge all 3 lists into the combined result.
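
The grouping and the transform status rule from step 4 might look roughly like this in code. It is only a sketch: assign_group and classify_transform are hypothetical helper names, the transform-name patterns are the illustrative ones from the table rather than a complete list, and the sample log lines (including their stage names) are invented to match the documented [SUMMARY] format. FAILED detection from fallback messages is not covered here.

```python
import fnmatch
import re

# Illustrative bare transform-name patterns from the agent table; not exhaustive.
TRANSFORM_NAME_PATTERNS = ("compile_model", "detect_sharding", "multi_stream_*",
                           "fuse_*", "gather_logits_*")

SUMMARY_RE = re.compile(
    r"\[stage=(?P<stage>[^,\]]+), transform=(?P<name>[^\]]+)\] "
    r"\[SUMMARY\] matches=(?P<n>\d+)"
)


def assign_group(key):
    """Route a dotted config key to one of the three agent groups."""
    first = key.split(".", 1)[0]
    if first == "kv_cache_config":
        return "kv_cache"
    if first == "transforms" or any(
        fnmatch.fnmatchcase(first, pattern) for pattern in TRANSFORM_NAME_PATTERNS
    ):
        return "transforms"
    # Everything else falls through to top-level so no key is silently dropped.
    return "top_level"


def classify_transform(name, enabled, log_lines):
    """Return (status, evidence) for one transform using its [SUMMARY] line."""
    if not enabled:
        return "DISABLED", "enabled: false in config"
    for line in log_lines:
        match = SUMMARY_RE.search(line)
        if match and match.group("name") == name:
            status = "APPLIED" if int(match.group("n")) > 0 else "SKIPPED"
            return status, line.strip()
    return "UNKNOWN", "no [SUMMARY] line found for this transform"


# Hypothetical log lines in the documented format.
log_lines = [
    "[stage=post_load_fusion, transform=fuse_gemms] [SUMMARY] matches=12 | time: 0.5s",
    "[stage=sharding, transform=detect_sharding] [SUMMARY] matches=0 | time: 0.1s",
]
print(assign_group("kv_cache_config.enable_block_reuse"))
print(classify_transform("fuse_gemms", True, log_lines))
print(classify_transform("detect_sharding", True, log_lines))
```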
  5. [Double Check] For any UNKNOWN entries from step 4, investigate further before presenting results to the user. FAILED entries already have concrete log evidence and do not need double-checking.
    Input: List of UNKNOWN config entries from step 4 output, the server log file, and references/config_log_patterns.md.
    • Re-read references/config_log_patterns.md for alternative patterns.
    • Grep the log more broadly for the transform name: [stage=..., transform=<name>]
    • Look for [APPLY]-prefixed lines and [SUMMARY] lines for that transform.
    • Check for "Falling back", "Skipping", or "failed" near the transform logs.
    • If the graph dump directory was provided:
      • Graph files are named NNN_stage_transform.txt; each contains the FX graph AFTER that transform. Compare before/after by reading consecutive files.
      • Graph evidence can upgrade UNKNOWN to APPLIED (e.g., collective ops after lm_head confirm sharding, fused custom ops confirm fusion transforms).
      • Graph analysis verifies: sharding (collective ops, weight shape changes), attention backend (op types), MoE fusion (fused op presence), GEMM fusion (linear op count changes), RMSNorm/SwiGLU/RoPE pattern matching (custom op presence).
      • See references/graph_verification_patterns.md for the full list of graph-based checks.
    • If the nsys trace was provided, check for executor-level configs tagged [nsys] in the reference doc (e.g., enable_chunked_prefill, enable_block_reuse, multi-stream concurrency, CUDA graph capture/replay).
    Output: For each investigated UNKNOWN entry, either additional evidence found (with status upgrade) or confirmation that the config is genuinely unlogged.
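
One way to approximate the "near the transform logs" check from step 5 is a fixed window of lines around every line that mentions the transform. The helper name find_fallback_hints, the window size, and the sample log are all assumptions made for illustration; the real logs may place these messages further away.

```python
MARKERS = ("Falling back", "Skipping", "failed")


def find_fallback_hints(log_lines, transform, window=5):
    """Return (index, line) pairs with a marker within `window` lines of the transform's logs."""
    tag = f"transform={transform}]"
    hits = set()
    for i, line in enumerate(log_lines):
        if tag not in line:
            continue
        lo = max(0, i - window)
        hi = min(len(log_lines), i + window + 1)
        for j in range(lo, hi):
            if any(marker in log_lines[j] for marker in MARKERS):
                hits.add(j)
    return [(j, log_lines[j]) for j in sorted(hits)]


# Hypothetical log excerpt.
log = [
    "[stage=compile, transform=compile_model] [APPLY] start",
    "Falling back to eager execution",
    "line", "line", "line", "line",
    "something failed elsewhere",
]
hints = find_fallback_hints(log, "compile_model", window=2)
print(hints)
```

A small window keeps unrelated failures (such as the last line in the sample) from being attributed to the transform under investigation.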
  6. [Report] Present the final results to the user.
    ALWAYS show the full detailed table. Do NOT summarize or condense. Present one row per config with columns:
    • Config: the config key and its value (e.g., compile_backend = torch-cudagraph)
    • Result: one of APPLIED, FAILED, SKIPPED, DISABLED, UNKNOWN
    • Evidence: the log line or pattern that proves the result
    After the table, show the summary line (e.g., Total configs checked: 29 | APPLIED: 23 | ...) and any FAILED/WARNING details. Include any additional findings from the Double Check step (step 5).
    If the user requested output files, write:
    • Table output: the human-friendly table as plain text
    • JSON output: machine-friendly JSON with a results array and a summary object
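
For the plain-text table output, a small renderer along these lines would be enough. The names render_table and summary_line are hypothetical, the column separator is an arbitrary choice, and omitting zero-count statuses from the summary line is an assumption carried over from the JSON counts rule.

```python
from collections import Counter

STATUS_ORDER = ("APPLIED", "FAILED", "SKIPPED", "DISABLED", "UNKNOWN")


def summary_line(results):
    """Build the 'Total configs checked: N | STATUS: n | ...' line."""
    counts = Counter(entry["status"] for entry in results)
    parts = [f"Total configs checked: {len(results)}"]
    # Assumption: statuses with a zero count are left out.
    parts += [f"{status}: {counts[status]}" for status in STATUS_ORDER if counts[status]]
    return " | ".join(parts)


def render_table(results):
    """Render one row per config with Config, Result, and Evidence columns."""
    rows = [("Config", "Result", "Evidence")]
    rows += [
        (f"{entry['config']} = {entry['value']}", entry["status"], entry["evidence"])
        for entry in results
    ]
    widths = [max(len(row[i]) for row in rows) for i in range(3)]
    lines = [
        " | ".join(cell.ljust(width) for cell, width in zip(row, widths)).rstrip()
        for row in rows
    ]
    lines.insert(1, "-+-".join("-" * width for width in widths))
    return "\n".join(lines)


# Hypothetical results, for illustration only.
sample = [
    {"config": "compile_backend", "value": "torch-cudagraph",
     "status": "APPLIED", "evidence": "hypothetical log line"},
    {"config": "kv_cache_config.enable_block_reuse", "value": "True",
     "status": "UNKNOWN", "evidence": "no log evidence found"},
]
table = render_table(sample)
print(table)
print(summary_line(sample))
```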

Key Patterns to Know

  • Every transform logs: [stage=<stage>, transform=<name>] [SUMMARY] matches=N | time: ...
  • Piecewise success chain: dual-mode enabled -> prepared with N submodules -> captured graphs
  • Piecewise failure: "model is not a GraphModule...Falling back to eager execution"
  • Sharding: "Using allreduce strategy: SYMM_MEM", "Applied N TP shards from config"

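The piecewise success chain can be checked as an ordered sequence of markers. The sketch below assumes each marker appears on its own log line in the order listed, and that both fragments of the failure message sit on the same line; check_piecewise and the sample lines are hypothetical.

```python
import re

# Ordered markers for the piecewise success chain.
CHAIN = (
    re.compile(r"dual-mode enabled"),
    re.compile(r"prepared with \d+ submodules"),
    re.compile(r"captured graphs"),
)
FAILURE_FRAGMENTS = ("model is not a GraphModule", "Falling back to eager execution")


def check_piecewise(log_lines):
    """Return FAILED on the fallback message, APPLIED if the full chain appears in order."""
    for line in log_lines:
        if all(fragment in line for fragment in FAILURE_FRAGMENTS):
            return "FAILED"
    step = 0
    for line in log_lines:
        if step < len(CHAIN) and CHAIN[step].search(line):
            step += 1
    return "APPLIED" if step == len(CHAIN) else "UNKNOWN"


# Hypothetical log excerpts.
good_log = [
    "piecewise: dual-mode enabled",
    "piecewise: prepared with 4 submodules",
    "piecewise: captured graphs",
]
bad_log = ["model is not a GraphModule, skipping. Falling back to eager execution"]
print(check_piecewise(good_log))
print(check_piecewise(bad_log))
```

A partial or out-of-order chain is reported as UNKNOWN rather than FAILED, since the missing markers may simply not have been logged.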
Gotchas

  • Every YAML key must appear in the output. Check all configs from the YAML, not just ones with known patterns. If a config key has no entry in the reference doc, grep the log for the key name and value. New/unknown configs should still be reported; never silently skip them.
  • UNKNOWN does not mean the config was ignored. Some configs (e.g., enable_chunked_prefill, enable_block_reuse) are consumed at executor/runtime level and produce no log output. UNKNOWN means "no log evidence found", not "config was not applied".
  • Deprecated config names may cause FAILED. For example, torch_dtype is deprecated in favor of dtype, and cuda_graph_batch_sizes (top-level) is replaced by cuda_graph_config.batch_sizes. Look for deprecation warning messages in the log. Old keys may be silently ignored.
  • Runtime may adjust configured values. For example, max_seq_len may be configured as 262144 but adjusted down to 16384 at runtime due to memory constraints. Report this as APPLIED with a WARNING annotation.
  • ANSI color codes in logs. AutoDeploy uses colored log output. Strip or ignore ANSI escape sequences when matching patterns.
  • Reference doc is auto-updated. Step 2 runs the ad-conf-check-update agent to sync references/config_log_patterns.md with the latest TensorRT-LLM source before any config checking begins. If the agent reports changes, review its summary to understand what shifted.
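
For the ANSI color code gotcha, stripping escape sequences before matching is usually a one-line regex. The sketch below removes CSI sequences (ESC followed by "[", parameter bytes, and a final byte), which covers color codes; the helper name strip_ansi and the sample line are illustrative, and other escape families are not handled.

```python
import re

# CSI sequence: ESC [ + parameter bytes (0x30-0x3F) + intermediate bytes (0x20-0x2F) + final byte (0x40-0x7E).
ANSI_CSI_RE = re.compile(r"\x1b\[[0-?]*[ -/]*[@-~]")


def strip_ansi(text):
    """Remove ANSI CSI escape sequences (such as color codes) from a log line."""
    return ANSI_CSI_RE.sub("", text)


# Hypothetical colored log line.
colored = "\x1b[1;32m[SUMMARY]\x1b[0m matches=3 | time: 0.2s"
clean = strip_ansi(colored)
print(clean)
```

Running every log line through strip_ansi before applying the patterns avoids missed matches where a color code splits a bracketed tag.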