Routes the weakest VCN samples (output of `tao-analyze-gaps-visual-changenet`) into per-augmentation-module subsets — one parquet for k-NN mining, one for AnomalyGen (Cosmos SDG) — based on each module's label eligibility. Use as the immediate next step after DEFT gap analysis in a VCN AOI SDA iteration.
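
As a minimal sketch of that routing rule, the snippet below filters a toy in-memory table instead of a real `gaps.parquet`. The file names, the mixed-case labels, and the pool label set `{"PASS", "SHIFT"}` are illustrative assumptions; only the AnomalyGen label set is taken from this document.

```python
import pandas as pd

# AnomalyGen label set as listed in this document.
ANOMALYGEN_SUPPORTED = {"PASS", "EXCESS_SOLDER", "MISSING", "BRIDGE"}

# Toy stand-in for the gaps parquet (hypothetical rows).
gaps = pd.DataFrame(
    {
        "filepath": ["a.png", "b.png", "c.png", "d.png"],
        "label": ["PASS", "shift", "MISSING", "pass"],
    }
)

# Hypothetical labels present in the mining source pool.
pool_labels = {"PASS", "SHIFT"}

# Compare on uppercased labels so the match is case-insensitive.
labels_upper = gaps["label"].astype(str).str.upper()

# Each module gets the rows whose label it can handle.
mining = gaps[labels_upper.isin(pool_labels)]
anomalygen = gaps[labels_upper.isin(ANOMALYGEN_SUPPORTED)]

print(mining["filepath"].tolist())
print(anomalygen["filepath"].tolist())
```

The two subsets are not disjoint: a row whose label both modules accept appears in both outputs, and the rows keep their original label casing.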
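
Install with:

    npx skill4agent add nvidia/skills tao-route-visual-changenet-samples

Eligibility differs per module. In the example summary further down, `SHIFT` is accepted by mining but not by AnomalyGen, which covers only `PASS`, `EXCESS_SOLDER`, `MISSING`, and `BRIDGE`. Each subset is a plain `.isin(...)` filter on the uppercased label.

Inputs:

- `gaps_parquet`: `<exp_dir>/rca_results/<timestamp>/gaps.parquet`, the output of `tao-analyze-gaps-visual-changenet`. Columns: `filepath`, `label`, `siamese_score`, `weakness`.
- `source_pool_csv`: optional CSV describing the mining source pool; it must have a `label` column.
- Output directory: `<rca_result_dir>/routing_results/<timestamp>/`.
- `anomalygen_supported_labels`: `{"PASS", "EXCESS_SOLDER", "MISSING", "BRIDGE"}`, corresponding to `ANOMALYGEN_SUPPORTED_LABELS` in `mdo-kratos-workflows/pipelines/sda/routing.py`.

Load the gaps and uppercase the labels, so that every eligibility check is a case-insensitive `.isin(...)`:

    df = pd.read_parquet(gaps_parquet)
    labels_upper = df["label"].astype(str).str.upper()

Mining subset: keep the rows whose `label` appears in the source pool. If the pool CSV is not provided or does not exist, the subset is empty:

    if source_pool_csv and os.path.isfile(source_pool_csv):
        pool_df = pd.read_csv(source_pool_csv)
        pool_labels = {str(l).upper() for l in pool_df["label"].unique()}
        mn_mask = labels_upper.isin(pool_labels)
        mn_df = df[mn_mask]
    else:
        pool_missing = True
        pool_labels = set()
        mn_df = df.iloc[0:0]  # empty, but with the same schema
    mn_df.to_parquet(mining_gaps_parquet, index=False)

AnomalyGen subset: keep the rows whose label is in the supported set and write them to `anomalygen_gaps.parquet`:

    ANOMALYGEN_SUPPORTED = {"PASS", "EXCESS_SOLDER", "MISSING", "BRIDGE"}
    ag_mask = labels_upper.isin(ANOMALYGEN_SUPPORTED)
    ag_df = df[ag_mask]
    ag_df.to_parquet(anomalygen_gaps_parquet, index=False)

Per-label breakdown: for each label, record its `count`, whether `mining` accepts it (the label is in `pool_labels`), and whether `anomalygen` accepts it (the label is in `ANOMALYGEN_SUPPORTED`). The result is written to `routing_summary.txt` in this form:

    Weak-sample routing summary
    Total weak samples: <N>
    Mining subset: <N_mn> -> <mining_gaps_parquet>
    AnomalyGen subset: <N_ag> -> <anomalygen_gaps_parquet>

    [If pool missing:]
    No source pool CSV at '<path>'; mining subset is empty.

    Per-label breakdown (count, mining, anomalygen):
     PASS: 50 (mining=yes, anomalygen=yes)
     MISSING: 32 (mining=no, anomalygen=yes)
     SHIFT: 14 (mining=yes, anomalygen=no)
     EXCESS_SOLDER: 9 (mining=yes, anomalygen=yes)
     ...

`len(df)` is the total number of weak samples. Call out `len(mn_df) == 0` or `len(ag_df) == 0` under Recommended Actions in the report.

Full routing logic (see `mdo-kratos-workflows/pipelines/sda/routing.py`):

    import os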
    import pandas as pd

    # Assumed to be defined by the caller: gaps_parquet, source_pool_csv,
    # mining_gaps_parquet, anomalygen_gaps_parquet, logs_dir.
    ANOMALYGEN_SUPPORTED = {"PASS", "EXCESS_SOLDER", "MISSING", "BRIDGE"}

    df = pd.read_parquet(gaps_parquet)
    labels_upper = df["label"].astype(str).str.upper()

    # Mining subset
    pool_missing = False
    if source_pool_csv and os.path.isfile(source_pool_csv):
        pool_df = pd.read_csv(source_pool_csv)
        pool_labels = {str(l).upper() for l in pool_df["label"].unique()}
        mn_mask = labels_upper.isin(pool_labels)
        mn_df = df[mn_mask]
    else:
        pool_missing = True
        pool_labels = set()
        mn_df = df.iloc[0:0]  # empty, but with the same schema
    os.makedirs(os.path.dirname(mining_gaps_parquet) or ".", exist_ok=True)
    mn_df.to_parquet(mining_gaps_parquet, index=False)

    # AnomalyGen subset
    ag_mask = labels_upper.isin(ANOMALYGEN_SUPPORTED)
    ag_df = df[ag_mask]
    os.makedirs(os.path.dirname(anomalygen_gaps_parquet) or ".", exist_ok=True)
    ag_df.to_parquet(anomalygen_gaps_parquet, index=False)

    # Per-label breakdown
    summary_lines = [
        "Weak-sample routing summary",
        f"Total weak samples: {len(df)}",
        f"Mining subset: {len(mn_df)} -> {mining_gaps_parquet}",
        f"AnomalyGen subset: {len(ag_df)} -> {anomalygen_gaps_parquet}",
        "",
    ]
    if pool_missing:
        summary_lines.append(f"No source pool CSV at {source_pool_csv!r}; mining subset is empty.")
        summary_lines.append("")
    summary_lines.append("Per-label breakdown (count, mining, anomalygen):")
    label_counts = labels_upper.value_counts()
    for label, count in label_counts.items():
        in_mn = (not pool_missing) and label in pool_labels
        in_ag = label in ANOMALYGEN_SUPPORTED
        summary_lines.append(
            f" {label}: {count} "
            f"(mining={'yes' if in_mn else 'no'}, "
            f"anomalygen={'yes' if in_ag else 'no'})"
        )
    summary_text = "\n".join(summary_lines) + "\n"

    # Write the summary to the logs directory and echo it
    os.makedirs(logs_dir, exist_ok=True)
    with open(os.path.join(logs_dir, "routing_summary.txt"), "w", encoding="utf-8") as f:
        f.write(summary_text)
    print(summary_text.strip())

Outputs go under a timestamped directory. `routing_config/` and `claude_session.jsonl` are auto-copied by a hook; `Routing_Report.md` is the full routing report:

    <output_dir>/routing_results/YYYY-MM-DD_HHMMSS/
    ├── Routing_Report.md          # Full routing report
    ├── mining_gaps.parquet        # Subset routed to k-NN Mining
    ├── anomalygen_gaps.parquet    # Subset routed to AnomalyGen (Cosmos SDG)
    ├── routing_summary.txt        # Plain-text per-label breakdown
    ├── routing_config/            # Auto-copied by hook
    └── claude_session.jsonl       # Auto-copied by hook

Generate the directory timestamp with `date +%Y-%m-%d_%H%M%S`. `Routing_Report.md` uses the following template:

# VCN Routing Report: <Iteration / Experiment Name>
## 1. Verdict
- Total weak samples in: <N>
- Mining subset: <N_mn> rows → `mining_gaps.parquet`
- AnomalyGen subset: <N_ag> rows → `anomalygen_gaps.parquet`
- Source pool present? <yes/no — and the path>
- One-line headline: "<X> labels routed, <Y> labels dropped (no module accepted)"
## 2. Inputs
| Input | Path | Notes |
|-------|------|-------|
| gaps_parquet | … | rows=<N>, columns=<col list> |
| source_pool_csv | … | rows=<M> or "not provided" / "missing" |
## 3. Per-Label Routing Decisions
| Label | Count in gaps | In source pool? | Mining? | AnomalyGen? | Routed To |
|-------|----------------|------------------|----------|--------------|-----------|
(One row per distinct label in `gaps_parquet`, uppercased. `Routed To` is one of:
`mining only`, `anomalygen only`, `mining+anomalygen`, `neither (DROPPED)`.
Use `neither (DROPPED)` whenever no module accepted the label. Sort by count descending.)
## 4. Module-Level Summaries
### 4.1 k-NN Mining
- Pool labels (from source_pool_csv): <list, or "pool missing">
- Labels accepted from input: <list>
- Total rows routed: <N_mn>
- Per-label row counts: <breakdown>
### 4.2 AnomalyGen (Cosmos SDG)
- Eligible labels (configured): PASS, EXCESS_SOLDER, MISSING, BRIDGE
- Labels accepted from input: <list>
- Total rows routed: <N_ag>
- Per-label row counts: <breakdown>
## 5. Dropped Labels (routed to NEITHER module)
| Label | Count | Why dropped | Suggested fix |
|-------|-------|-------------|----------------|
(Empty table is OK and means no labels were dropped. If non-empty, every row needs a
"why" — typically one of: "not in source pool AND not in AnomalyGen supported set",
"source pool missing entirely AND label not in AnomalyGen set", "label name doesn't
match any module's expected canonicalization".)
## 6. Recommended Actions
1. **If any labels are dropped**: seed the source pool with that label, OR extend
`ANOMALYGEN_SUPPORTED_LABELS` (and the AnomalyGen generator coverage).
2. **If source pool is missing**: provide `source_pool_csv` to enable the Mining branch.
Without it, half of the augmentation pipeline is dark.
3. **If AnomalyGen subset is empty**: gap analysis only surfaced labels AnomalyGen cannot
generate; rely on Mining for this iteration, or extend the AnomalyGen integration.
4. **If both subsets are empty**: stop the SDA iteration. Nothing downstream can run.

Write all outputs to `<output_dir>/routing_results/<timestamp>/`, with the timestamp taken from `date +%Y-%m-%d_%H%M%S`: `mining_gaps.parquet`, `anomalygen_gaps.parquet`, `routing_summary.txt`, and `Routing_Report.md`.
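
The per-label table in section 3 of the report and the dropped-label table in section 5 can be derived from the same two label sets used for routing. The sketch below is one possible way to do that, not the pipeline's own code: the helper `routing_decisions`, its column names, the toy label list, the pool label set, and the `TOMBSTONE` label are all hypothetical.

```python
import pandas as pd

# AnomalyGen label set as listed in this document.
ANOMALYGEN_SUPPORTED = {"PASS", "EXCESS_SOLDER", "MISSING", "BRIDGE"}


def routing_decisions(labels, pool_labels, ag_labels=ANOMALYGEN_SUPPORTED):
    """Hypothetical helper: one row per distinct uppercased label."""
    # value_counts() returns the labels sorted by count, descending.
    counts = pd.Series(list(labels)).astype(str).str.upper().value_counts()
    rows = []
    for label, count in counts.items():
        in_mn = label in pool_labels
        in_ag = label in ag_labels
        if in_mn and in_ag:
            routed = "mining+anomalygen"
        elif in_mn:
            routed = "mining only"
        elif in_ag:
            routed = "anomalygen only"
        else:
            routed = "neither (DROPPED)"
        rows.append(
            {
                "label": label,
                "count": int(count),
                "mining": in_mn,
                "anomalygen": in_ag,
                "routed_to": routed,
            }
        )
    columns = ["label", "count", "mining", "anomalygen", "routed_to"]
    return pd.DataFrame(rows, columns=columns)


# Toy input: TOMBSTONE is a made-up label that neither module accepts.
toy_labels = ["PASS"] * 4 + ["shift"] * 3 + ["MISSING"] * 2 + ["TOMBSTONE"]
toy_pool = {"PASS", "SHIFT"}  # assumed source pool labels

decisions = routing_decisions(toy_labels, toy_pool)
dropped = decisions[decisions["routed_to"] == "neither (DROPPED)"]

print(decisions.to_string(index=False))
print(dropped[["label", "count"]].to_string(index=False))
```

Passing an empty set for the pool labels models the missing-pool case: every label then lands in either `anomalygen only` or `neither (DROPPED)`.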