ascendc-operator-precision-eval
AscendC Operator Precision Evaluation
Skill Type: Evaluation (Test Generation + Execution + Report Output)
This skill performs a systematic precision evaluation of AscendC operators that have already been compiled and installed. Test cases consist of two parts, "regular shape tests" and "boundary value tests". Every case iterates over all dtypes the operator supports, and a structured precision report is produced after the run.
Prerequisites
- The operator has been compiled and installed (the `ascend_kernel` package can be imported or the `.so` file exists)
- The operator has passed basic functional tests (`tests/test_<op_name>.py` exists and passes)
- The operator name, PyTorch calling method, input domain constraints, and all supported dtypes are known
- `csrc/ops/<op_name>/test/<op_name>-test-cases.md` has been generated by `ascendc-operator-testcase-gen` (including SUPPORTED_DTYPES, TEST_SHAPES, BOUNDARY_VALUES, and the operator benchmark)
Core Process
Phase 1: Load Case Document + Information Collection → Phase 2: Case Adaptation → Phase 3: Test Script Generation → Phase 4: Execution → Phase 5: Report Generation
Phase 1: Load Case Document + Information Collection
Step 1.1: Load testcase-gen Case Document (MANDATORY)
MUST first read `csrc/ops/<op_name>/test/<op_name>-test-cases.md` and extract the following from it:
| Extraction Item | Location in Case Document | Purpose |
|---|---|---|
| SUPPORTED_DTYPES | §Test Configuration | List of dtypes the precision tests iterate over |
| TEST_SHAPES | §Test Configuration | List of shapes for regular shape tests |
| BOUNDARY_VALUES | §Test Configuration | List of scalar values for boundary value tests |
| NPU Calling Method | §Operator Benchmark | NPU_CALL expression |
| CPU Reference Implementation | §Operator Benchmark | CPU_REF expression |
If `<op_name>-test-cases.md` does not exist: fall back to designing the cases yourself (following the original Phase 2 process), and note in the report that "cases are self-designed, not generated by testcase-gen".
Step 1.2: Supplement Information from Code
Extract the following additional information from the existing code:
| Information | Source | Example |
|---|---|---|
| Operator Name | User input / op_host file name | |
| NPU Calling Method | | |
| CPU Reference Implementation | PyTorch standard library (cross-check against the case document) | |
| Input Domain Constraints | Mathematical definition / design.md | |
| All Supported dtypes | TORCH_CHECK in | |
| Supported Dimension/Shape Constraints | design.md / logic in op_host | elementwise supports arbitrary dimensions |
| Precision Threshold | Open-source precision standard for ecosystem operators (see table below) | Look up Threshold by dtype |
Precision Standards (Open-Source Precision Standard for Ecosystem Operators)
Two metrics are used for the pass/fail decision: MERE (Mean Relative Error) and MARE (Maximum Relative Error):
Relative Error = abs(actual - golden) / (abs(golden) + 1e-7)
MERE = mean(Relative Error)
MARE = max(Relative Error)
The 1e-7 term in the denominator avoids division by zero when golden is zero.
Pass criterion: MERE < Threshold and MARE < 10 × Threshold.
| dtype | Threshold | MERE Upper Limit | MARE Upper Limit (10×) |
|---|---|---|---|
| float16 | 2⁻¹⁰ ≈ 9.77e-4 | 9.77e-4 | 9.77e-3 |
| bfloat16 | 2⁻⁷ ≈ 7.81e-3 | 7.81e-3 | 7.81e-2 |
| float32 | 2⁻¹³ ≈ 1.22e-4 | 1.22e-4 | 1.22e-3 |
For the complete dtype list (including HiFLOAT32, FLOAT8 E4M3/E5M2), see `references/OPS_PRECISION_STANDARDS.md`.
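The formulas and pass criterion above can be sketched in a few lines. This is a minimal pure-Python illustration that uses plain lists in place of tensors so it runs without NumPy or PyTorch; the names `THRESHOLDS`, `relative_errors`, and `judge` are hypothetical and do not come from the skill's templates.

```python
from statistics import fmean

EPS = 1e-7  # denominator offset from the relative-error formula

# Thresholds from the table above, keyed by dtype name (hypothetical dict).
THRESHOLDS = {
    "float16": 2.0 ** -10,
    "bfloat16": 2.0 ** -7,
    "float32": 2.0 ** -13,
}


def relative_errors(actual, golden):
    """Element-wise abs(actual - golden) / (abs(golden) + EPS)."""
    return [abs(a - g) / (abs(g) + EPS) for a, g in zip(actual, golden)]


def judge(actual, golden, dtype):
    """Return (mere, mare, passed) for one test case."""
    errs = relative_errors(actual, golden)
    mere = fmean(errs)  # MERE: mean relative error
    mare = max(errs)    # MARE: maximum relative error
    threshold = THRESHOLDS[dtype]
    passed = mere < threshold and mare < 10 * threshold
    return mere, mare, passed


golden = [1.0, 2.0, 4.0]
actual = [1.0, 2.001, 4.0]
mere, mare, passed = judge(actual, golden, "float16")
print(f"MERE={mere:.3e} MARE={mare:.3e} passed={passed}")
```

The same data passes under the float16 threshold but fails under the tighter float32 threshold, which is why the threshold has to be looked up per dtype rather than fixed once.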
Phase 2: Case Adaptation
Prefer reusing the testcase-gen output: if `<op_name>-test-cases.md` was loaded successfully in Phase 1, use the TEST_SHAPES and BOUNDARY_VALUES definitions from the case document directly and do not redesign them. Design cases yourself only when the case document does not exist.
Design Principles
- Full dtype coverage: every shape and every boundary value is run against all dtypes the operator supports
- Shapes are determined by the operator: choose shapes according to the dimensions the operator supports; do not hard-code a fixed dimensionality
- Keep shapes moderate: keep the element count of a single case within a reasonable range and avoid unnecessarily large tensors
- Total number of cases = (len(TEST_SHAPES) + len(BOUNDARY_VALUES)) × len(SUPPORTED_DTYPES) ≥ 30
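The case-count rule is simple arithmetic, and checking it before generating any script can catch an undersized test plan early. The sketch below uses illustrative lists only; in practice the values come from the case document, and `MIN_TOTAL_CASES` and `total_cases` are hypothetical names.

```python
# Illustrative lists; real values come from <op_name>-test-cases.md.
SUPPORTED_DTYPES = ["float16", "bfloat16", "float32"]
TEST_SHAPES = [("1D", f"{n} elements", (n,)) for n in (128, 1024, 4096, 8192)]
TEST_SHAPES += [("2D", "32x512", (32, 512)), ("3D", "8x16x64", (8, 16, 64))]
BOUNDARY_VALUES = [("x=0.0", 0.0), ("x=1.0", 1.0), ("x=-1.0", -1.0), ("x=100.0", 100.0)]

MIN_TOTAL_CASES = 30  # lower bound from the design principles


def total_cases(shapes, boundary_values, dtypes):
    """(len(TEST_SHAPES) + len(BOUNDARY_VALUES)) * len(SUPPORTED_DTYPES)."""
    return (len(shapes) + len(boundary_values)) * len(dtypes)


total = total_cases(TEST_SHAPES, BOUNDARY_VALUES, SUPPORTED_DTYPES)
if total < MIN_TOTAL_CASES:
    raise ValueError(f"only {total} cases, need at least {MIN_TOTAL_CASES}")
print(f"{total} cases planned")
```

With three dtypes, at least ten shape-plus-boundary entries are needed to reach the minimum; an operator that supports fewer dtypes needs proportionally more entries.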
Part A: Regular Shape Tests (TEST_SHAPES)
Based on the dimensions the operator supports, select suitable shapes from the reference pool below to form the `TEST_SHAPES` list.
MUST select according to the dimensions the operator actually supports; do not select unsupported dimensions.
Shape Selection Reference Pool
| Dimension | Recommended Shapes | Applicable Operator Types |
|---|---|---|
| 1D | (128,), (1024,), (4096,), (8192,) | elementwise, reduction |
| 2D | (32, 512), (64, 768), (128, 1024) | elementwise, matmul, linear |
| 3D | (8, 16, 64), (4, 128, 256) | elementwise, attention, conv1d |
| 4D | (4, 8, 32, 16), (2, 64, 32, 32) | conv2d, elementwise |
| 5D | (2, 3, 4, 5, 6) | conv3d, elementwise |
Keep shapes moderate: a single shape should preferably contain ≤ 200K elements.
TEST_SHAPES Format
python
TEST_SHAPES = [
    ("category_name", "description", (dim0, dim1, ...)),
    # ...
]
Category names can be customized; naming them by dimension or scenario is recommended. Example:
python
# elementwise operator (supports arbitrary dimensions)
TEST_SHAPES = [
    ("1D", "128 elements", (128,)),
    ("1D", "1024 elements", (1024,)),
    ("1D", "4096 elements", (4096,)),
    ("1D", "8192 elements", (8192,)),
    ("2D", "batch*hidden 32x512", (32, 512)),
    ("2D", "BERT-base 64x768", (64, 768)),
    ("2D", "BERT-large 128x1024", (128, 1024)),
    ("3D", "8x16x64", (8, 16, 64)),
    ("3D", "4x128x256", (4, 128, 256)),
    ("4D", "4x8x32x16", (4, 8, 32, 16)),
    ("Production", "ViT 8x197x768", (8, 197, 768)),
    ("Production", "single sample 1x512", (1, 512)),
]
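Because the ≤ 200K-element figure is a recommendation rather than a hard limit, it can help to list which shapes exceed it and decide deliberately whether to keep them. The following sketch does that with `math.prod`; `MAX_ELEMENTS` and `oversized` are hypothetical names, and only three of the example shapes are repeated here.

```python
from math import prod

MAX_ELEMENTS = 200_000  # recommended per-shape ceiling (a guideline, not a hard limit)

TEST_SHAPES = [
    ("2D", "BERT-large 128x1024", (128, 1024)),
    ("3D", "4x128x256", (4, 128, 256)),
    ("Production", "ViT 8x197x768", (8, 197, 768)),
]


def oversized(shapes, limit=MAX_ELEMENTS):
    """Return (description, element_count) for every shape above the limit."""
    return [(desc, prod(shape)) for _, desc, shape in shapes if prod(shape) > limit]


flagged = oversized(TEST_SHAPES)
for desc, count in flagged:
    print(f"{desc}: {count} elements exceeds the recommended {MAX_ELEMENTS}")
```

Among the example shapes, only the ViT production shape is flagged (1,210,368 elements), so keeping it is a conscious trade-off between realism and test time.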
Part B: Boundary Value Tests (BOUNDARY_VALUES)
Select key boundary points and typical values based on the operator's input domain. Use the fixed small shape `(1024,)` for testing.
MUST determine the boundary values from the operator's mathematical definition; they differ greatly from one operator to another.
Boundary Value Design Guidelines
| Operator Type | Recommended Boundary Values |
|---|---|
| acosh (x≥1) | x=1.0, x=1.001, x=10.0, x=1000.0 |
| log (x>0) | x=0.001, x=1.0, x=100.0, x=10000.0 |
| sigmoid (full domain) | x=0.0, x=-5.0, x=5.0, x=-20.0, x=20.0 |
| sqrt (x≥0) | x=0.0, x=0.001, x=1.0, x=10000.0 |
| No Domain Restrictions | x=0.0, x=1.0, x=-1.0, x=100.0 |
BOUNDARY_VALUES Format
python
BOUNDARY_VALUES = [
    ("description", scalar_value),
    # ...
]
Example (acosh):
python
BOUNDARY_VALUES = [
    ("domain lower bound x=1.0", 1.0),
    ("near boundary x=1.001", 1.001),
    ("moderate value x=10.0", 10.0),
    ("large value x=1000.0", 1000.0),
]
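A boundary list is only useful if every value lies inside the operator's domain. The sketch below checks the acosh example against x ≥ 1 and expands each scalar into an input of the fixed shape `(1024,)`. It assumes the boundary test fills the whole input with the scalar, which the text implies but does not state; a flat list stands in for the tensor, and all helper names are hypothetical.

```python
import math

BOUNDARY_SHAPE = (1024,)  # fixed small shape named in Part B

BOUNDARY_VALUES = [
    ("domain lower bound x=1.0", 1.0),
    ("near boundary x=1.001", 1.001),
    ("moderate value x=10.0", 10.0),
    ("large value x=1000.0", 1000.0),
]


def in_acosh_domain(x):
    """acosh is defined for x >= 1."""
    return x >= 1.0


def boundary_input(value, shape=BOUNDARY_SHAPE):
    """Flat list standing in for a tensor of `shape` filled with `value` (assumption)."""
    return [value] * math.prod(shape)


# Any value outside the domain should be removed before scripts are generated.
invalid = [desc for desc, value in BOUNDARY_VALUES if not in_acosh_domain(value)]
inputs = {desc: boundary_input(value) for desc, value in BOUNDARY_VALUES}
reference = {desc: math.acosh(value) for desc, value in BOUNDARY_VALUES}
print(f"invalid boundary values: {invalid}")
```

At the domain lower bound the reference output is exactly zero, which is the situation the 1e-7 term in the relative-error denominator is meant to handle.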
Phase 3: Test Script Generation
MUST first read the template files in the `templates/` directory, replace the placeholders, and write the generated files to the `test/` subdirectory of the operator directory.
Output Directory
csrc/ops/<op_name>/test/
├── test_<op_name>_precision.py ← pytest test
├── run_<op_name>_precision_report.py ← report generator
├── <op_name>_precision_report.json ← JSON report (generated after execution)
└── <op_name>_precision_report.md ← Markdown report (generated after execution)
Template Files
| Template | Path | Generation Target |
|---|---|---|
| pytest Test | | |
| Report Generator | | |
Placeholder Replacement Table
| Placeholder | Description | Example (acosh) |
|---|---|---|
| | Operator name | |
| | NPU calling expression (using variable | |
| | CPU reference baseline (using variables | |
| | List of all supported dtypes | |
| | Lower bound of the random input domain | |
| | Upper bound of the random input domain | |
| | List of regular shapes (output of Phase 2 Part A) | See example above |
| | List of boundary values (output of Phase 2 Part B) | See example above |
Required Precision Metrics
Judgment metrics (used for the pass/fail decision):
| Metric | Calculation Method | Pass Condition |
|---|---|---|
| MERE | | < Threshold |
| MARE | | < 10 × Threshold |
Auxiliary metrics (used for analysis, not for the pass/fail decision):
| Metric | Calculation Method | Meaning |
|---|---|---|
| MaxAbsErr | | Maximum absolute error |
| MeanAbsErr | | Mean absolute error |
| CosineSim | | Cosine similarity |
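The calculation column for the auxiliary metrics did not survive in this copy, so the sketch below assumes the standard textbook definitions of maximum absolute error, mean absolute error, and cosine similarity. It is pure Python over flat lists, the function names are hypothetical, and returning NaN for a zero-norm input is one possible convention.

```python
import math


def max_abs_err(actual, golden):
    """Largest element-wise absolute difference."""
    return max(abs(a - g) for a, g in zip(actual, golden))


def mean_abs_err(actual, golden):
    """Average element-wise absolute difference."""
    return sum(abs(a - g) for a, g in zip(actual, golden)) / len(golden)


def cosine_sim(actual, golden):
    """Cosine similarity; NaN when either vector has zero norm."""
    dot = sum(a * g for a, g in zip(actual, golden))
    norm_a = math.sqrt(sum(a * a for a in actual))
    norm_g = math.sqrt(sum(g * g for g in golden))
    if norm_a == 0.0 or norm_g == 0.0:
        return float("nan")  # undefined: annotate in the report, do not fail the case
    return dot / (norm_a * norm_g)


print(max_abs_err([1.0, 2.5], [1.0, 2.0]))   # absolute error of the worst element
print(cosine_sim([0.0, 0.0], [1.0, 2.0]))    # all-zero output gives NaN here
```

Since these metrics are for analysis only, an undefined cosine similarity should be recorded with an explanation rather than counted as a failure.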
Phase 4: Test Execution
4.1 Environment Preparation
bash
source /usr/local/Ascend/ascend-toolkit/set_env.sh
export PATH=/root/miniconda3/envs/py310/bin:$PATH
MUST source the environment before each shell invocation.
4.2 Execute pytest
bash
cd <project_root>
python3 -m pytest csrc/ops/<op_name>/test/test_<op_name>_precision.py -v --tb=short
4.3 Generate Report
bash
python3 csrc/ops/<op_name>/test/run_<op_name>_precision_report.py
4.4 Failure Handling
| Failure Type | Troubleshooting Direction |
|---|---|
| RuntimeError (NPU kernel) | Input data lies outside the operator's domain / the NPU does not support this dtype |
| AssertionError (precision) | Check whether MERE/MARE only slightly exceeds the Threshold, and analyze whether boundary values are the cause |
| Cases FAIL for an individual dtype | Confirm that the Threshold used matches this dtype, and check whether the MARE is driven by a few outlier points |
| A large number of FAILs | Check the operator's Compute logic for bugs |
4.5 In-Depth Troubleshooting of Precision Issues
When precision failures occur (allclose fails, the output deviation is excessive, or the output is all-zero/NaN) and a simple threshold adjustment cannot resolve them, MUST read the `ascendc-operator-precision-debug` skill and follow its process for systematic root-cause localization:
- Read the SKILL.md of `ascendc-operator-precision-debug`
- Follow its five-phase process: Error Analysis → Code Review → Experimental Isolation → Instrumentation-Based Localization → Fix Verification
- After the fix, re-run the complete precision test of this skill and confirm that all cases pass
Note: invoke it only when the precision issue cannot be resolved by adjusting the threshold. When an individual dtype slightly exceeds the threshold because of hardware precision characteristics, prefer relaxing the threshold (and explain this in the report).
Phase 5: Report Generation
5.1 Markdown Report
MUST generate `csrc/ops/<op_name>/test/<op_name>_precision_report.md`, using `templates/precision_report_template.md` as a reference.
The report includes:
- Overview table: total cases / passed / failed / pass rate
- Precision threshold standard table
- Regular shape test results table (grouped by category)
- Boundary value test results table
- Summary statistics by dtype
- Key findings (≥3 conclusions)
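The overview table and the per-dtype summary are both simple aggregations over per-case results. The sketch below shows one way to compute them; the record fields (`case`, `dtype`, `passed`) and all function names are hypothetical, since the actual structure is defined by the report script generated from the templates.

```python
from collections import defaultdict

# Hypothetical per-case records; the real fields depend on the report script.
results = [
    {"case": "1D-128", "dtype": "float16", "passed": True},
    {"case": "1D-128", "dtype": "float32", "passed": True},
    {"case": "x=1.0", "dtype": "float16", "passed": False},
    {"case": "x=1.0", "dtype": "float32", "passed": True},
]


def overview(records):
    """Total / passed / failed / pass rate (percent)."""
    total = len(records)
    passed = sum(1 for r in records if r["passed"])
    rate = 100.0 * passed / total if total else 0.0
    return {"total": total, "passed": passed, "failed": total - passed, "pass_rate": rate}


def by_dtype(records):
    """Per-dtype totals and pass counts."""
    stats = defaultdict(lambda: {"total": 0, "passed": 0})
    for r in records:
        stats[r["dtype"]]["total"] += 1
        stats[r["dtype"]]["passed"] += int(r["passed"])
    return dict(stats)


def overview_table(records):
    """Render the overview as a Markdown table."""
    o = overview(records)
    return "\n".join([
        "| Total | Passed | Failed | Pass Rate |",
        "|---|---|---|---|",
        f"| {o['total']} | {o['passed']} | {o['failed']} | {o['pass_rate']:.1f}% |",
    ])


print(overview_table(results))
```

The same aggregates can be reused for the in-conversation summary, so the numbers shown to the user always match the report file.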
5.2 Completion Prompt (File + Conversation)
- File: MUST generate `csrc/ops/<op_name>/test/<op_name>_precision_report.md` (and `*_precision_report.json` in the same directory, if the script outputs it), and give the user the full paths:
Precision verification report has been generated:
csrc/ops/<op_name>/test/<op_name>_precision_report.md
csrc/ops/<op_name>/test/<op_name>_precision_report.json # if it exists
- Current conversation: MUST also comply with the next section, "Display Results in Conversation"; outputting only the paths is not acceptable.
Display Results in Conversation (MANDATORY)
After pytest and the report script have finished and the Markdown/JSON reports have been generated, the assistant MUST do the following in its reply in the current conversation:
- Paste readable conclusions (so the user can grasp the results without opening any file):
  - Overview: total number of cases, number passed, number failed, pass rate (percentage).
  - If there are failures: list the failed case identifiers (case name / shape / dtype / category) together with the main error metrics (such as MaxAbsErr) or the pytest summary lines.
  - If all passed: state "all passed" explicitly, along with the total number of cases.
- Key findings (≥3): these may match the report's "Key Findings" or be distilled from the report (dtype differences, boundary value behavior, whether thresholds were tightened or relaxed, etc.).
- Optional: a pass-status table summarized by dtype (an excerpt; when there are many cases, list only the summary rows).
- Criteria: in one or two sentences, state the precision standard used (MERE/MARE, open-source precision standard for ecosystem operators) and the Threshold value for each dtype.
- Paths last: after presenting the above, append the full path of `<op_name>_precision_report.md` (and the JSON, if applicable).
NEVER reply with only "report has been generated" and the paths; NEVER substitute "please open the Markdown file yourself" for showing the pass rate and failure summary in the conversation.
Experience Summary
Input Generation
- Domain: always consult the operator's mathematical definition to make sure the inputs are valid
- fp16 range: the largest finite fp16 value is 65504; inputs should not exceed it
- Shape size: a single shape should preferably contain ≤ 200K elements, to keep test time reasonable
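The domain and fp16-range points can be combined when generating random inputs: sample uniformly inside the operator's domain and cap the upper bound at the fp16 maximum when the dtype is float16. The sketch below uses the standard `random` module and plain floats as a stand-in for tensor generation; `clamp_upper_bound` and `random_inputs` are hypothetical helpers, not part of the templates.

```python
import random

FP16_MAX = 65504.0  # largest finite float16 value


def clamp_upper_bound(high, dtype):
    """Cap the sampling upper bound for float16 so inputs stay representable."""
    return min(high, FP16_MAX) if dtype == "float16" else high


def random_inputs(low, high, count, dtype, seed=0):
    """Sample `count` values uniformly from [low, high], adjusted for dtype."""
    high = clamp_upper_bound(high, dtype)
    if low > high:
        raise ValueError(f"empty input range [{low}, {high}] for {dtype}")
    rng = random.Random(seed)  # seeded so that failures are reproducible
    return [rng.uniform(low, high) for _ in range(count)]


# acosh-style domain (x >= 1) with a deliberately large requested upper bound.
xs = random_inputs(1.0, 1e6, 1024, "float16")
print(min(xs), max(xs))
```

Seeding the generator keeps a failing case reproducible across runs, which matters when a precision failure has to be handed over for debugging.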
Precision Metrics
- MERE / MARE: judgment metrics; the denominator is `abs(golden) + 1e-7` (not a clamp), in line with the open-source precision standard for ecosystem operators
- MaxAbsErr / MeanAbsErr: auxiliary analysis that helps gauge the magnitude of the deviation
- CosineSim: evaluates to 0 or NaN when the output is all-zero; annotate and explain this rather than treating it as a failure
Threshold Description
- Threshold source: open-source precision standard for ecosystem operators (`references/OPS_PRECISION_STANDARDS.md`)
- Pass condition: MERE < Threshold and MARE < 10 × Threshold
- Relaxing the threshold arbitrarily is not recommended; if relaxation is truly necessary, MUST explain the reason in the report
Anti-Patterns (NEVER)
- NEVER generate only the report files without showing the overview and conclusions in the conversation
- NEVER conceal the number of failed cases by reporting only the paths
Checklist
- `csrc/ops/<op_name>/test/<op_name>-test-cases.md` has been read (if it exists)
- Information collection is complete (operator name, calling method, input domain, all supported dtypes, supported dimensions)
- TEST_SHAPES come preferentially from the testcase-gen case document, and the shapes are not too large
- BOUNDARY_VALUES are designed according to the operator's input domain
- Total number of cases = (shapes + boundary) × dtypes ≥ 30
- Every dtype supported by the operator has been tested
- All pytest tests pass
- The JSON and Markdown reports have been generated
- Key findings ≥ 3
- The user has been given the report and JSON paths (if generated)
- The overview (pass rate), failure summary (if any), and ≥3 key findings have been shown in the current conversation, not just the paths