tao-analyze-gaps-vlm-bcq
VLM Binary Classification Gap Analysis
Reads a VLM predictions JSON, compares each model response against ground truth, and writes FP/FN failure cases to a JSONL file with a summary report.
Purpose
After running a VLM on a binary yes/no evaluation task, the predictions need to be compared against ground truth to identify failure cases. This skill produces a structured list of FP (false positive) and FN (false negative) samples that downstream RCCA stages (e.g., cosmos generation, root cause analysis) consume to drive a DEFT iteration.
Usage
Invoke the `vlm_bcq` action inside the TAO Toolkit data services container with Hydra-style key=value overrides:
```bash
gap_analysis vlm_bcq \
  predictions_json=/path/to/results.json \
  results_dir=/path/to/output/gaps
```
Include `videos_dir` when `video_id` values in the predictions are relative paths:
```bash
gap_analysis vlm_bcq \
  predictions_json=/path/to/results.json \
  results_dir=/path/to/output/gaps \
  videos_dir=/path/to/videos/root
```
After the run, surface the FP/FN counts from `kpi_gaps_report.txt` and point downstream stages at `kpi_gaps.jsonl`.
Inputs
- predictions_json: Path to predictions JSON file. Must be a JSON array where each item has `video_id`, `response`, and `gt` fields. `response` and `gt` are parsed with word-boundary matching: `'yes'` or `'no'` anywhere in the string is recognized. Samples where both or neither are present are skipped with a warning.
- videos_dir (optional): Base directory for resolving relative `video_id` paths. If omitted, `video_id` values are used as absolute paths.
Predictions JSON format:
```json
[
  {
    "video_id": "/path/to/video.mp4",
    "response": "Yes, there is a collision.",
    "gt": "B. No",
    "question": "Is there a collision?"
  }
]
```
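The word-boundary rule described for `response` and `gt` can be illustrated with a short Python sketch. This is not the tool's actual implementation. The function names `parse_yes_no` and `classify` are hypothetical, case-insensitive matching is an assumption (suggested by the `"B. No"` example), and treating "yes" as the positive class is also an assumption.

```python
import re
from typing import Optional

# Word-boundary patterns; case-insensitivity is an assumption.
_YES = re.compile(r"\byes\b", re.IGNORECASE)
_NO = re.compile(r"\bno\b", re.IGNORECASE)


def parse_yes_no(text: str) -> Optional[bool]:
    """Return True for yes, False for no, None when both or neither appear."""
    has_yes = bool(_YES.search(text))
    has_no = bool(_NO.search(text))
    if has_yes == has_no:  # both present or neither present: ambiguous
        return None
    return has_yes


def classify(response: str, gt: str) -> Optional[str]:
    """Return 'FP', 'FN', 'OK', or None when the sample would be skipped.

    Assumes 'yes' is the positive class (hypothetical convention).
    """
    pred, truth = parse_yes_no(response), parse_yes_no(gt)
    if pred is None or truth is None:
        return None
    if pred == truth:
        return "OK"
    return "FP" if pred else "FN"


sample = {"response": "Yes, there is a collision.", "gt": "B. No"}
print(classify(sample["response"], sample["gt"]))  # FP
```

Under these assumptions, the sample record above is a false positive: the model answers yes while the ground truth is no. A string such as "nothing here" is not matched, because `\bno\b` requires a word boundary on both sides.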
Outputs
- kpi_gaps.jsonl: One JSON object per line for each FP/FN case. Fields: `video_id` (absolute path), `error_type` (`FP` or `FN`), `question`, `ground_truth`, `response`.
- kpi_gaps_report.txt: Human-readable table with total FP/FN counts.
If no gaps are found, no files are written and a message is logged.
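A downstream stage that consumes `kpi_gaps.jsonl` might tally the cases per `error_type` as in the sketch below. The helper name `count_gaps` and the sample records are hypothetical. Only the field names come from the description above. A missing file is treated as zero gaps, since nothing is written when no gaps are found.

```python
import json
import tempfile
from collections import Counter
from pathlib import Path


def count_gaps(jsonl_path) -> Counter:
    """Count records per error_type in a kpi_gaps.jsonl file."""
    counts = Counter()
    path = Path(jsonl_path)
    if not path.exists():  # no gaps found means no file was written
        return counts
    with path.open(encoding="utf-8") as f:
        for line in f:
            line = line.strip()
            if line:  # tolerate blank lines
                counts[json.loads(line)["error_type"]] += 1
    return counts


# Illustrative records only; the values are made up for the demo.
records = [
    {"video_id": "/path/to/a.mp4", "error_type": "FP", "question": "Is there a collision?",
     "ground_truth": "B. No", "response": "Yes, there is a collision."},
    {"video_id": "/path/to/b.mp4", "error_type": "FN", "question": "Is there a collision?",
     "ground_truth": "A. Yes", "response": "No."},
    {"video_id": "/path/to/c.mp4", "error_type": "FP", "question": "Is there a collision?",
     "ground_truth": "B. No", "response": "Yes."},
]

with tempfile.TemporaryDirectory() as tmp:
    demo_path = Path(tmp) / "kpi_gaps.jsonl"
    demo_path.write_text("\n".join(json.dumps(r) for r in records) + "\n", encoding="utf-8")
    counts = count_gaps(demo_path)

print(f"FP={counts['FP']} FN={counts['FN']}")  # FP=2 FN=1
```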
Key Parameters
| Parameter | Required | Description |
|---|---|---|
| predictions_json | Yes | Path to predictions JSON file |
| results_dir | Yes | Output directory; created if it does not exist |
| videos_dir | No | Base directory for resolving relative `video_id` paths |
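How `videos_dir` interacts with `video_id` can be pictured with the sketch below. The helper `resolve_video_path` is hypothetical, and the behavior for an absolute `video_id` combined with a `videos_dir` is an assumption based on standard path-joining semantics, not confirmed tool behavior.

```python
from pathlib import PurePosixPath
from typing import Optional


def resolve_video_path(video_id: str, videos_dir: Optional[str] = None) -> str:
    """Join video_id onto videos_dir when a base directory is given."""
    if videos_dir is None:
        return video_id  # used as-is, so it should already be absolute
    # Joining an absolute video_id discards the base (pathlib semantics).
    return str(PurePosixPath(videos_dir) / video_id)


print(resolve_video_path("clips/a.mp4", "/path/to/videos/root"))
# /path/to/videos/root/clips/a.mp4
print(resolve_video_path("/path/to/video.mp4"))
# /path/to/video.mp4
```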
Error Patterns
| Error | Cause | Fix |
|---|---|---|
| | | Check the path |
| | Predictions file is not a list | Wrap predictions in a JSON array |
| | A prediction item is missing a required field | Inspect and fix the predictions JSON |
| Samples silently skipped | `response` or `gt` contains both or neither of `'yes'` and `'no'` | Check logs for warnings; inspect those samples |
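Several of these failures can be caught before invoking the action. The following is a minimal pre-flight check, assuming the three required fields listed under Inputs. The function name `validate_predictions` and its messages are hypothetical and do not reproduce the tool's own error text.

```python
import json

# Required per the Inputs section; "question" is not checked here.
REQUIRED_FIELDS = ("video_id", "response", "gt")


def validate_predictions(path):
    """Return a list of problems; an empty list means the file looks usable."""
    try:
        with open(path, encoding="utf-8") as f:
            data = json.load(f)
    except FileNotFoundError:
        return [f"file not found: {path}"]
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]

    if not isinstance(data, list):
        return ["top-level value is not a JSON array"]

    problems = []
    for i, item in enumerate(data):
        if not isinstance(item, dict):
            problems.append(f"item {i}: not an object")
            continue
        missing = [k for k in REQUIRED_FIELDS if k not in item]
        if missing:
            problems.append(f"item {i}: missing {', '.join(missing)}")
    return problems
```

Calling `validate_predictions("/path/to/results.json")` and printing the returned list gives a quick view of which items need fixing before the gap analysis is run.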