tao-analyze-gaps-vlm-bcq

VLM Binary Classification Gap Analysis

Reads a VLM predictions JSON, compares each model response against ground truth, and writes FP/FN failure cases to a JSONL file with a summary report.

Purpose

After running a VLM on a binary yes/no evaluation task, the predictions need to be compared against ground truth to identify failure cases. This skill produces a structured list of FP (false positive) and FN (false negative) samples that downstream RCCA stages (e.g., cosmos generation, root cause analysis) consume to drive a DEFT iteration.

Usage

Invoke the `vlm_bcq` action inside the TAO Toolkit data services container with Hydra-style key=value overrides:
bash
gap_analysis vlm_bcq \
  predictions_json=/path/to/results.json \
  results_dir=/path/to/output/gaps
Include `videos_dir` when `video_id` values in the predictions are relative paths:
bash
gap_analysis vlm_bcq \
  predictions_json=/path/to/results.json \
  results_dir=/path/to/output/gaps \
  videos_dir=/path/to/videos/root
After the run, surface the FP/FN counts from `kpi_gaps_report.txt` and point downstream stages at `kpi_gaps.jsonl`.
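
A downstream stage may want the FP/FN counts in code, not by reading the text report. The layout of `kpi_gaps_report.txt` is not specified here, so the sketch below counts records from `kpi_gaps.jsonl` using the documented `error_type` field. The helper name `count_gaps` is hypothetical and is not part of the TAO Toolkit.

```python
import json
from collections import Counter
from pathlib import Path


def count_gaps(jsonl_path):
    """Count gap records per error_type in a kpi_gaps.jsonl file.

    Hypothetical helper: assumes one JSON object per line, each with
    an "error_type" field set to "FP" or "FN".
    """
    counts = Counter()
    path = Path(jsonl_path)
    if not path.exists():
        # No file is written when no gaps are found, so a missing
        # file is treated as zero gaps here.
        return counts
    with path.open(encoding="utf-8") as fh:
        for line in fh:
            line = line.strip()
            if line:  # tolerate blank lines
                counts[json.loads(line)["error_type"]] += 1
    return counts
```

For example, `count_gaps("/path/to/output/gaps/kpi_gaps.jsonl")` returns a `Counter` keyed by `"FP"` and `"FN"`.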

Inputs

  • predictions_json: Path to the predictions JSON file. Must be a JSON array where each item has `video_id`, `response`, and `gt` fields. `response` and `gt` are parsed with word-boundary matching: `'yes'` or `'no'` anywhere in the string is recognized. Samples where both or neither are present are skipped with a warning.
  • videos_dir (optional): Base directory for resolving relative `video_id` paths. If omitted, `video_id` values are used as absolute paths.
Predictions JSON format:
json
[
  {
    "video_id": "/path/to/video.mp4",
    "response": "Yes, there is a collision.",
    "gt": "B. No",
    "question": "Is there a collision?"
  }
]
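
The word-boundary parsing described above can be approximated as follows. This is a minimal sketch, not the tool's actual implementation: the function name `parse_yes_no` is hypothetical, and case-insensitive matching is an assumption inferred from the example, where `"Yes, ..."` and `"B. No"` are expected to parse.

```python
import re

# Assumption: matching is case-insensitive and uses \b word boundaries.
_YES = re.compile(r"\byes\b", re.IGNORECASE)
_NO = re.compile(r"\bno\b", re.IGNORECASE)


def parse_yes_no(text):
    """Return "yes" or "no", or None when both or neither word appears."""
    has_yes = _YES.search(text) is not None
    has_no = _NO.search(text) is not None
    if has_yes == has_no:
        # Ambiguous (both) or unparseable (neither): caller should skip.
        return None
    return "yes" if has_yes else "no"
```

Under these assumptions, `"B. No"` parses as `"no"`, while `"nothing"` and `"yesterday"` do not match because the boundary requires a whole word.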

Outputs

  • kpi_gaps.jsonl: One JSON object per line for each FP/FN case. Fields: `video_id` (absolute path), `error_type` (`FP` or `FN`), `question`, `ground_truth`, `response`.
  • kpi_gaps_report.txt: Human-readable table with total FP/FN counts.
If no gaps are found, no files are written and a message is logged.
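
To make the output fields concrete, the sketch below builds gap records from a list of predictions. It is an illustration under several assumptions, not the tool's code: an FP is taken to be a "yes" response against a "no" ground truth and an FN the reverse, `ground_truth` is assumed to carry the raw `gt` string, a missing `question` is assumed to default to an empty string, and the names `find_gaps` and `to_jsonl` are hypothetical.

```python
import json
import os
import re


def _answer(text):
    """Return "yes", "no", or None when both or neither word is present."""
    has_yes = re.search(r"\byes\b", text, re.IGNORECASE) is not None
    has_no = re.search(r"\bno\b", text, re.IGNORECASE) is not None
    if has_yes == has_no:
        return None
    return "yes" if has_yes else "no"


def find_gaps(predictions, videos_dir=None):
    """Build one record per FP/FN case (hypothetical helper)."""
    gaps = []
    for item in predictions:
        pred, truth = _answer(item["response"]), _answer(item["gt"])
        if pred is None or truth is None or pred == truth:
            continue  # unparseable sample or correct prediction
        video_id = item["video_id"]
        if videos_dir is not None:
            # Resolve relative ids against the base directory.
            video_id = os.path.join(videos_dir, video_id)
        gaps.append({
            "video_id": video_id,
            # Assumed convention: "yes" is the positive class.
            "error_type": "FP" if pred == "yes" else "FN",
            "question": item.get("question", ""),
            "ground_truth": item["gt"],
            "response": item["response"],
        })
    return gaps


def to_jsonl(gaps):
    """Serialize records as JSON Lines text, one object per line."""
    return "".join(json.dumps(g) + "\n" for g in gaps)
```

With the sample item from the predictions format above, this yields a single record with `error_type` set to `FP`.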

Key Parameters

| Parameter | Required | Description |
| --- | --- | --- |
| predictions_json | Yes | Path to predictions JSON file |
| results_dir | Yes | Output directory; created if it does not exist |
| videos_dir | No | Base directory for resolving relative `video_id` paths |

Error Patterns

| Error | Cause | Fix |
| --- | --- | --- |
| `FileNotFoundError` | `predictions_json` does not exist | Check the path |
| `ValueError: must be a JSON array` | Predictions file is not a list | Wrap predictions in `[...]` |
| `ValueError: missing 'gt'/'response'/'video_id'` | A prediction item is missing a required field | Inspect and fix the predictions JSON |
| Samples silently skipped | `response` or `gt` contains both or neither 'yes'/'no' | Check logs for warnings; inspect those samples |
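
The two `ValueError` cases can be caught before launching the action with a quick structural check. The sketch below is one possible pre-flight check, assuming only the requirements stated in this document (a top-level array whose items carry `video_id`, `response`, and `gt`). The function `check_predictions` and its message wording are hypothetical and do not reproduce the tool's own error text.

```python
import json

REQUIRED_FIELDS = ("video_id", "response", "gt")


def check_predictions(raw_json):
    """Return a list of problems found in a predictions JSON string."""
    data = json.loads(raw_json)
    if not isinstance(data, list):
        return ["top-level value must be a JSON array"]
    problems = []
    for index, item in enumerate(data):
        if not isinstance(item, dict):
            problems.append(f"item {index}: not a JSON object")
            continue
        missing = [name for name in REQUIRED_FIELDS if name not in item]
        if missing:
            problems.append(f"item {index}: missing {', '.join(missing)}")
    return problems
```

An empty list means the file passes these structural checks; it does not guarantee that every `response` or `gt` string will parse as yes or no.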