tao-analyze-gaps-visual-changenet


TAO VCN Classify Gap Analysis Skill


You are an analyst for NVIDIA TAO VCN Classify (Visual ChangeNet) inference results. Your job is to identify the weakest samples per ground-truth label by measuring each sample's signed distance from the decision threshold in the wrong direction, then surface them for downstream augmentation or relabeling.
This skill is intentionally lightweight. VCN's classify head is a single-score binary boundary (PASS vs NO_PASS by `siamese_score`), so the analysis is computational, not investigative. The whole computation runs in one direct `docker run` invocation against the `tao_toolkit.data_services` image declared in `versions.yaml` (resolved at runtime; see Setup). The container's entrypoint takes `<category> <action> [hydra overrides...]`; we pass `gap_analysis vcn_aoi key=value …`. You do not need subagents, multi-phase image audits, or component-type clustering, because VCN does not expose those dimensions. After the container returns, view only a small set of representative weak samples to qualify the gaps.
The CLI surface can shift between data-services container builds. If a `gap_analysis vcn_aoi` invocation fails on argument parsing, introspect the actual schema once per image with `docker run --rm "$DS_IMAGE" gap_analysis vcn_aoi --cfg=job` and reconcile any renamed keys before retrying. See `references/troubleshooting.md` for the key-rename reconciliation and the full pitfalls list. The output parquet is named `kpi_gaps.parquet`.


Inputs


  1. Experiment result directory: contains `inference/inference.csv` (required columns `input_path`, `object_name`, `label`, `siamese_score`). Pass the directory (e.g. `inference/latest/`), not the CSV file.
  2. Training code/config directory: contains the VCN train YAML; the container reads `dataset.classify.input_map` and `dataset.classify.image_ext` for per-lighting expansion.
  3. Dataset directory: the image root (`kpi_media_path`) prepended to each row's relative `input_path`.
  4. Schema overrides: `min_recall`, `top_k_per_label`, and optionally a hard-pinned `threshold`, passed as Hydra overrides (standard values: `min_recall=1.0`, `top_k_per_label=50`, and `threshold=-1.0`, meaning sweep). `top_k_per_label` must be passed explicitly as a positive integer. Omitting it flips the container into "below-threshold filter" mode, which at `min_recall=1.0` returns only misclassified PASS rows and zero NO_PASS rows.
See `references/parameters-and-artifacts.md` for the full input detail, the `GapAnalysisConfig` override semantics, and the per-default explanation.


Setup


The threshold sweep, weakness ranking, and per-lighting expansion all run inside the `tao_toolkit.data_services` image declared in `versions.yaml`. Resolve the concrete URI once at the top of the run, then confirm that Docker, the NVIDIA container toolkit, and a GPU are present and that the image is cached:

```bash
# Resolve tao_toolkit.data_services → concrete nvcr.io/... URI from versions.yaml
DS_IMAGE=$(python3 -c "import yaml,os; print(yaml.safe_load(open(os.environ['TAO_SKILL_BANK_PATH']+'/versions.yaml'))['images']['tao_toolkit']['data_services'])")
echo "DS_IMAGE=$DS_IMAGE"

# Confirm the environment and make sure the image is cached locally
docker info > /dev/null && echo "OK: docker"
nvidia-smi > /dev/null && echo "OK: GPU"
docker image inspect "$DS_IMAGE" > /dev/null || docker pull "$DS_IMAGE"
```

`TAO_SKILL_BANK_PATH` is exported by the plugin's `session_start` hook. If it is unset (e.g. running outside the Claude Code plugin), point it at the skill-bank repo root before resolving.

A GPU is required: the same image is shared across the AOI loop, and its other actions assume CUDA is present. Aborting early on a GPU-less host avoids a confusing error later.

**Path mounting.** Every host path the container reads or writes — `inference.csv`, the train YAML, the dataset image root, and the output dir — must be bind-mounted. The simplest pattern is to mount the workspace root with **identical paths** inside and outside the container so absolute paths in args resolve the same on both sides:

```bash
WORKSPACE=<absolute path that contains inference.csv, train YAML, dataset images, and the output dir>
DOCKER="docker run --gpus all --rm --ipc=host --user $(id -u):$(id -g) -v $WORKSPACE:$WORKSPACE -w $WORKSPACE $DS_IMAGE"
```

If `inference.csv`, the train YAML, and the dataset images live in different roots, pass multiple `-v` flags. Either way, every absolute path you pass in args must resolve inside the container.
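A host-side check that every path you pass in args sits under a bind-mounted root can save a failed container run. `uncovered_paths` is a hypothetical helper. It assumes identical-path mounts (`-v ROOT:ROOT`) and compares normalized paths without resolving symlinks.

```python
import os


def uncovered_paths(paths, mount_roots):
    """Return the paths that do not sit under any bind-mounted root.

    Assumes each root is mounted at the identical path inside the container,
    so a path under a root resolves the same on both sides.
    """
    roots = [os.path.abspath(r) for r in mount_roots]
    missing = []
    for p in paths:
        ap = os.path.abspath(p)
        # commonpath compares whole components, so /ws2 is not "inside" /ws.
        if not any(os.path.commonpath([ap, r]) == r for r in roots):
            missing.append(p)
    return missing
```

Any path it returns needs its own `-v` flag, or must be moved under `$WORKSPACE`.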
CLI overrides cover the common case. `min_recall`, `top_k_per_label`, and optionally `threshold` are passed as Hydra overrides on the command line. The standard values (`min_recall=1.0`, `top_k_per_label=50`, `threshold=-1.0` to sweep) handle most runs, but `top_k_per_label` must always be passed explicitly. If the container also accepts a spec file via `-e <spec>` (verify with `--cfg=job`), passing one is a convenience, not a requirement: override only what you need.
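If you assemble the invocation in Python instead of a shell string, you can build the override list in one place, which makes it hard to drop `top_k_per_label`. `build_gap_analysis_argv` is a hypothetical helper. The override keys are the ones this document uses, and they may be renamed between container builds (check with `--cfg=job`).

```python
def build_gap_analysis_argv(ds_image, workspace, uid, gid, *, inference_results_dir,
                            train_config, kpi_media_path, results_dir,
                            top_k_per_label, min_recall=1.0, threshold=-1.0):
    """Assemble the docker argv for `gap_analysis vcn_aoi` (hypothetical helper)."""
    # top_k_per_label is required and must be a positive int; bool is rejected
    # explicitly because bool is a subclass of int in Python.
    if (isinstance(top_k_per_label, bool) or not isinstance(top_k_per_label, int)
            or top_k_per_label <= 0):
        raise ValueError("top_k_per_label must be a positive integer")
    docker = ["docker", "run", "--gpus", "all", "--rm", "--ipc=host",
              "--user", f"{uid}:{gid}",
              "-v", f"{workspace}:{workspace}", "-w", workspace, ds_image]
    overrides = {
        "inference_results_dir": inference_results_dir,
        "train_config": train_config,
        "kpi_media_path": kpi_media_path,
        "results_dir": results_dir,
        "top_k_per_label": top_k_per_label,
        "min_recall": min_recall,
        "threshold": threshold,  # -1.0 means sweep
    }
    return docker + ["gap_analysis", "vcn_aoi"] + [f"{k}={v}" for k, v in overrides.items()]
```

The result can be handed to `subprocess.run`, with `os.getuid()` and `os.getgid()` supplying the user mapping.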


Method


The whole skill is a single `docker run` invocation followed by a small visual spot-check. The container does Steps 1–4 internally (threshold sweep, weakness scoring, top-K selection, per-lighting expansion). You handle Step 5 (visual spot-check) directly with the Read tool.

Step 1–4 — Run the container


```bash
$DOCKER gap_analysis vcn_aoi \
    inference_results_dir=<exp_dir>/inference/<label>/ \
    train_config=<exp_dir>/train.yaml \
    kpi_media_path=<dataset_root> \
    results_dir=<rca_results_dir> \
    top_k_per_label=50
```
Always pass `top_k_per_label`. This is the argument that switches the container from its default "samples below threshold" filter into proper top-K-per-label ranking. At `min_recall=1.0` the chosen threshold is, by construction, at or below every NO_PASS score, so no NO_PASS row falls on the wrong side of it. The below-threshold filter therefore returns ONLY misclassified PASS rows and zero NO_PASS rows, which makes it useless as an augmentation queue. With `top_k_per_label` set to a positive integer (either in the spec or as a Hydra override), the container computes signed weakness against the threshold for every row and surfaces the K weakest per ground-truth label. That per-label ranking is the output downstream steps consume.
The container sweeps every unique `siamese_score` (plus one value just below the minimum), keeps candidates with NO_PASS recall ≥ `min_recall` (tolerance `1e-12`), and picks the best-F1 threshold (tie-break: precision, then threshold value). It then scores signed weakness per row, takes the top `top_k_per_label` per ground-truth label, and expands each into one row per lighting. See `references/parameters-and-artifacts.md` for the exact computation, the override defaults, and the artifact table.
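The selection rule above can be sketched in plain Python to make the signed-weakness convention concrete. This is not the container's code (`references/parameters-and-artifacts.md` has the exact computation). The sketch assumes the following:

* NO_PASS (every label other than `PASS`) is the positive class for recall, precision, and F1.
* A sample is predicted NO_PASS when its `siamese_score` is strictly greater than the threshold. Under this rule, a candidate just below the minimum score is needed to cover the "everything NO_PASS" case.
* Ties on F1 and precision go to the higher threshold.
* `eps=1e-6` stands in for "just below".

Per-lighting expansion is omitted, and the sweep is O(n²) for clarity.

```python
from collections import defaultdict

NEGATIVE = "PASS"  # every other label is treated as NO_PASS (assumed positive class)


def sweep_threshold(rows, min_recall=1.0, tol=1e-12, eps=1e-6):
    """rows: (sample_id, label, score) tuples. Returns (t, precision, recall, f1) or None."""
    rows = list(rows)
    if not rows:
        return None
    scores = sorted({score for _, _, score in rows})
    candidates = [scores[0] - eps] + scores  # plus one value just below the minimum
    n_pos = sum(1 for _, label, _ in rows if label != NEGATIVE)
    best_key, best = None, None
    for t in candidates:
        # Assumed decision rule: predict NO_PASS when score > t.
        tp = sum(1 for _, lab, s in rows if s > t and lab != NEGATIVE)
        fp = sum(1 for _, lab, s in rows if s > t and lab == NEGATIVE)
        recall = tp / n_pos if n_pos else 0.0
        if recall < min_recall - tol:
            continue
        precision = tp / (tp + fp) if tp + fp else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        key = (f1, precision, t)  # best F1, then precision, then (assumed) higher t
        if best_key is None or key > best_key:
            best_key, best = key, (t, precision, recall, f1)
    return best  # None plays the role of the unreachable-KPI case


def weakness(label, score, t):
    """Signed distance past t in the wrong direction; > 0 means misclassified."""
    return score - t if label == NEGATIVE else t - score


def top_k_per_label(rows, t, k):
    """The k weakest samples per ground-truth label as (weakness, sample_id), weakest first."""
    by_label = defaultdict(list)
    for sample_id, label, score in rows:
        by_label[label].append((weakness(label, score, t), sample_id))
    return {label: sorted(items, reverse=True)[:k] for label, items in by_label.items()}
```

Under this convention a positive weakness always means "misclassified at the chosen threshold". That is why ranking by weakness, rather than filtering on it, still yields NO_PASS rows when `min_recall=1.0`.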
If no candidate threshold meets the recall target, the container exits non-zero and writes `unreachable_kpi.txt` into `results_dir`, explaining which recall the model can actually achieve. In that case, stop the analysis after the docker call and write the abridged report described under Report Structure. It should explain that the model fundamentally cannot reach the KPI at any operating point and recommend retraining or relabeling. Skip the visual spot-check.
The container writes these files into `results_dir`:

  • `kpi_gaps.parquet`: top-K weakest per label, expanded per lighting; columns `filepath`, `label`, `siamese_score`, `weakness`.
  • `threshold.txt`
  • `metrics.json`
  • `weak_samples_breakdown.txt`
  • `unreachable_kpi.txt`: only when the recall target is unreachable.

See `references/parameters-and-artifacts.md` for the per-artifact contents. Print the container's stdout summary (chosen threshold, kept-row counts, per-label breakdown) to your own stdout so the script-check hook can verify that the run produced output.
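If you drive the run from Python, the branching described above can be sketched as follows: echo the container's stdout, then either stop at the abridged report or continue to the spot check. `run_gap_analysis` is a hypothetical wrapper. It assumes only what the text states: the container exits non-zero and writes `unreachable_kpi.txt` into `results_dir` when the recall target cannot be met.

```python
import os
import subprocess
import sys


def run_gap_analysis(argv, results_dir):
    """Run the gap_analysis container and decide how the skill continues.

    Returns "abridged" when unreachable_kpi.txt was written (skip the spot
    check) and "full" on a clean exit; raises CalledProcessError otherwise.
    """
    proc = subprocess.run(argv, capture_output=True, text=True)
    # Echo the container's summary so the script-check hook sees it.
    sys.stdout.write(proc.stdout)
    if os.path.exists(os.path.join(results_dir, "unreachable_kpi.txt")):
        return "abridged"
    if proc.returncode != 0:
        sys.stderr.write(proc.stderr)
        raise subprocess.CalledProcessError(
            proc.returncode, argv, output=proc.stdout, stderr=proc.stderr)
    return "full"
```

Capturing stdout instead of streaming it keeps the sketch short. The trade-off is that nothing is echoed until the container exits.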

Step 5 — Visual spot check (small, fixed)


Skip this step if `unreachable_kpi.txt` exists in `results_dir`: there is nothing meaningful to spot-check when the model can't reach the KPI at any threshold.
Otherwise, use the Read tool to view the test images for:
  • The 5 weakest PASS samples (the top of the "PASS misclassified as NO_PASS" pile): sort the `kpi_gaps.parquet` rows where `label == 'PASS'` by `weakness` descending.
  • The 5 weakest NO_PASS samples (the top of the "NO_PASS misclassified as PASS" pile): same, with `label != 'PASS'`.
`kpi_gaps.parquet` is already expanded per lighting (multiple rows per sample). For the spot check, deduplicate to one row per (`input_path`, `object_name`) before taking the top 5; otherwise the lighting copies of a single sample can fill all five slots. Keep the row whose `filepath` uses the FIRST lighting from the train YAML. One image per sample is enough: VCN's classify head sees all lightings stacked, but for a human spot check one lighting is representative.
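The selection and deduplication above can be sketched in Python, under two assumptions:

* `kpi_gaps.parquet` also carries `input_path` and `object_name`. The column list given for the parquet does not include them, so substitute whatever identifies a sample in your build.
* The first lighting's name from `dataset.classify.input_map` appears verbatim in `filepath`. This substring test can misfire if one lighting name is a prefix of another.

`dedupe_first_lighting`, `pick_spot_check`, and `load_kpi_gaps` are hypothetical helpers.

```python
def dedupe_first_lighting(rows, first_lighting):
    """One row per (input_path, object_name), preferring the first lighting's filepath."""
    chosen = {}
    for row in rows:
        key = (row["input_path"], row["object_name"])
        current = chosen.get(key)
        # Assumed: the lighting name from input_map appears in filepath.
        if current is None or (first_lighting in row["filepath"]
                               and first_lighting not in current["filepath"]):
            chosen[key] = row
    return list(chosen.values())


def pick_spot_check(rows, first_lighting, n=5):
    """Return (weakest PASS, weakest NO_PASS), deduplicating before the top-n cut."""
    samples = dedupe_first_lighting(rows, first_lighting)

    def weakest(keep):
        hits = [r for r in samples if keep(r["label"])]
        return sorted(hits, key=lambda r: r["weakness"], reverse=True)[:n]

    return weakest(lambda lab: lab == "PASS"), weakest(lambda lab: lab != "PASS")


def load_kpi_gaps(path):
    import pandas as pd  # needs pandas plus a parquet engine such as pyarrow
    return pd.read_parquet(path).to_dict("records")
```

Typical use: `pass_rows, no_pass_rows = pick_spot_check(load_kpi_gaps(path), first_lighting)`.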
Classify each viewed sample as exactly one of:
  • mislabeled — visual content disagrees with the CSV label
  • edge case — genuinely ambiguous boundary case
  • data quality — corrupted, dark, wrong crop, bad framing
  • systematic — model has learned the wrong feature (the image looks "obviously PASS/NO_PASS" but the model disagrees)
Copy each viewed image (resized to 128×128 if PIL is available, otherwise copied as-is) into `<results_dir>/rca_images/` so it can be embedded inline in the report.
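Here is a minimal sketch of the copy step, assuming Pillow provides the optional PIL dependency. It resizes to 128×128 when Pillow can read and write the file. Otherwise, it falls back to a byte-for-byte copy. `copy_for_report` is hypothetical. Its optional `name` argument avoids collisions when two samples share a basename.

```python
import os
import shutil


def copy_for_report(src, rca_images_dir, name=None, size=(128, 128)):
    """Copy one spot-check image into rca_images/, resized when Pillow can handle it."""
    os.makedirs(rca_images_dir, exist_ok=True)
    # An explicit name avoids collisions between samples that share a basename.
    dst = os.path.join(rca_images_dir, name or os.path.basename(src))
    try:
        from PIL import Image  # optional dependency

        with Image.open(src) as im:
            im.resize(size).save(dst)
    except (ImportError, OSError, ValueError):
        # No Pillow, an unreadable image, or an unsupported output format/mode:
        # fall back to a plain copy.
        shutil.copy2(src, dst)
    return dst
```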
This is the only image inspection required. Do not view dozens of images, do not run failure mode clustering, do not audit goldens — VCN does not have golden images.


Reference invocation


The paste-and-edit end-to-end recipe (workspace, four paths, two numeric knobs, spec-file write, docker run, and the stdout sanity print that surfaces row counts for the script-check hook) lives in `references/recipe.md`. Use it verbatim, editing only the workspace, paths, and knobs.


Outputs


Write everything into a timestamped folder under the experiment result directory: `<experiment_result_dir>/rca_results/YYYY-MM-DD_HHMMSS/`. The container's outputs (`kpi_gaps.parquet`, `threshold.txt`, `metrics.json`, `weak_samples_breakdown.txt`, and `unreachable_kpi.txt` when applicable) go straight there. The visual spot-check writes `rca_images/`. The packaging hook automatically adds `rca_config/` and `claude_session.jsonl` when `RCA_Report.md` is written. See `references/parameters-and-artifacts.md` for the full folder tree.
At the start of the run, get the real timestamp by running `date +%Y-%m-%d_%H%M%S` in Bash. Do NOT hardcode or guess it. If the user specifies a custom output path, use that instead but keep the same internal structure.
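A small helper can create the run folder from the timestamp that `date` printed. It includes a basic format check, so a placeholder such as `latest` is rejected rather than silently used. `make_run_dir` is hypothetical. It does not generate the timestamp itself, because the text above requires taking it from `date`.

```python
import os
from datetime import datetime


def make_run_dir(experiment_result_dir, timestamp):
    """Create <experiment_result_dir>/rca_results/<timestamp>/ and return its path."""
    # Basic sanity check: strptime raises ValueError if the string does not
    # parse in the `date +%Y-%m-%d_%H%M%S` format.
    datetime.strptime(timestamp, "%Y-%m-%d_%H%M%S")
    run_dir = os.path.join(experiment_result_dir, "rca_results", timestamp)
    os.makedirs(run_dir, exist_ok=True)
    return run_dir
```

From Python, the timestamp can come from `subprocess.run(["date", "+%Y-%m-%d_%H%M%S"], capture_output=True, text=True).stdout.strip()`.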


Common pitfalls


The most consequential failure is forgetting `top_k_per_label` when `min_recall=1.0`. At that recall the chosen threshold sits at or below every NO_PASS score, so the fallback below-threshold filter matches ONLY misclassified PASS rows and `kpi_gaps.parquet` ends up with zero NO_PASS rows. Always include an explicit positive `top_k_per_label`. `references/troubleshooting.md` covers the CLI-drift reconciliation and the full pitfalls list:

  • spec file outside `$WORKSPACE`
  • unresolved `???` sentinels
  • wrong or unpulled image tag
  • path-mount mismatch
  • `unreachable_kpi.txt` handling
  • missing `inference.csv` columns
  • missing train-YAML keys
  • `kpi_media_path` prefix mismatch
  • no GPU inside the container
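A cheap guard against the failure above is to count labels in `kpi_gaps.parquet` right after the container returns, and stop if no NO_PASS rows came back. `check_label_coverage` is a hypothetical helper. Like the spot-check filter, it treats every label other than `PASS` as NO_PASS.

```python
def check_label_coverage(labels):
    """Return (n_pass, n_no_pass); raise if kpi_gaps has no NO_PASS rows at all."""
    labels = list(labels)
    n_pass = sum(1 for lab in labels if lab == "PASS")
    n_no_pass = len(labels) - n_pass
    if n_no_pass == 0:
        raise RuntimeError(
            "kpi_gaps.parquet has zero NO_PASS rows; "
            "rerun with an explicit positive top_k_per_label")
    return n_pass, n_no_pass
```

With pandas available, `check_label_coverage(pd.read_parquet(path)["label"])` works directly, since a Series is iterable.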


Report Structure


Write the RCA report into the timestamped output folder. It is a 7-section computational gap analysis of 1000–1800 words, with these sections: Verdict, Threshold Selection, Weakness Distribution, Top-K Weakest Samples, Visual Spot Check, Per-Label Breakdown, and Recommended Actions. Fill the confusion-matrix and per-label tables from `metrics.json`, and the spot-check rows from `kpi_gaps.parquet`. When `unreachable_kpi.txt` exists, replace sections 3–6 with one short section quoting that file, and collapse section 7 to a single retrain-or-relabel recommendation. See `references/rca-report-structure.md` for the complete skeleton with every section heading, table layout, and the unreachable-KPI variant.
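A sketch that emits the section headings in order, including the unreachable-KPI variant, can serve as a starting file before the tables are filled in. The `# RCA Report` title, the `KPI Unreachable` heading, and the `TODO` placeholders are hypothetical. `references/rca-report-structure.md` remains the authoritative skeleton.

```python
FULL_SECTIONS = [
    "Verdict", "Threshold Selection", "Weakness Distribution",
    "Top-K Weakest Samples", "Visual Spot Check", "Per-Label Breakdown",
    "Recommended Actions",
]


def report_skeleton(unreachable_text=None):
    """Return a Markdown skeleton for RCA_Report.md (headings in order, bodies TODO)."""
    bodies = {}
    if unreachable_text is None:
        sections = FULL_SECTIONS
    else:
        # Sections 3-6 collapse into one short section quoting unreachable_kpi.txt,
        # and section 7 becomes a single retrain-or-relabel recommendation.
        sections = FULL_SECTIONS[:2] + ["KPI Unreachable", FULL_SECTIONS[6]]
        bodies["KPI Unreachable"] = "\n".join(
            "> " + line for line in unreachable_text.splitlines())
        bodies["Recommended Actions"] = (
            "Retrain or relabel: the model cannot reach the KPI at any threshold.")
    lines = ["# RCA Report", ""]
    for i, title in enumerate(sections, start=1):
        lines += [f"## {i}. {title}", "", bodies.get(title, "TODO"), ""]
    return "\n".join(lines)
```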


Execution Order


  1. Resolve `DS_IMAGE` from `versions.yaml` (`images.tao_toolkit.data_services`), then run `docker info`, `nvidia-smi`, and `docker image inspect "$DS_IMAGE"` (pulling if missing) once to confirm the environment. Abort with a clear message if any of them fails.
  2. Run `date +%Y-%m-%d_%H%M%S` to get the timestamp, then create `<experiment_result_dir>/rca_results/<timestamp>/`.
  3. Write `vcn_aoi_spec.yaml` into the timestamped dir with `min_recall` and `top_k_per_label` filled in. Keep it under `$WORKSPACE` and pass its absolute path to `-e` so it resolves inside the container.
  4. Run `docker run … "$DS_IMAGE" gap_analysis vcn_aoi -e <timestamped_dir>/vcn_aoi_spec.yaml inference_results_dir=… train_config=… kpi_media_path=… results_dir=…`. The container writes `kpi_gaps.parquet`, `threshold.txt`, `metrics.json`, and `weak_samples_breakdown.txt` into `results_dir`. Print the chosen threshold and kept-row counts to stdout so the script-check hook can verify that the run produced output.
  5. If `unreachable_kpi.txt` exists, skip Step 6 and write the abridged report. Otherwise continue.
  6. Pick 10 weak samples (5 weakest PASS + 5 weakest NO_PASS) from `kpi_gaps.parquet`, view each test image with Read, classify it, and copy it into `rca_images/`.
  7. Write `RCA_Report.md` last. Writing it triggers the packaging hook, which copies the session logs and skill config alongside it.
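Step 3 can be sketched without a YAML library, since the spec only needs two scalar keys. This assumes the spec takes `min_recall` and `top_k_per_label` as flat top-level keys. That layout is an assumption; confirm it against `--cfg=job` for your image. `write_spec` is hypothetical. It returns an absolute path so the `-e` argument resolves the same inside and outside the container.

```python
import os


def write_spec(run_dir, workspace, min_recall=1.0, top_k_per_label=50):
    """Write a minimal vcn_aoi_spec.yaml under $WORKSPACE and return its absolute path."""
    if (isinstance(top_k_per_label, bool) or not isinstance(top_k_per_label, int)
            or top_k_per_label <= 0):
        raise ValueError("top_k_per_label must be a positive integer")
    path = os.path.abspath(os.path.join(run_dir, "vcn_aoi_spec.yaml"))
    ws = os.path.abspath(workspace)
    # The spec must live under the identically mounted workspace to resolve in the container.
    if os.path.commonpath([path, ws]) != ws:
        raise ValueError(f"{path} is outside $WORKSPACE={ws}")
    with open(path, "w") as f:
        f.write(f"min_recall: {min_recall}\ntop_k_per_label: {top_k_per_label}\n")
    return path
```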