ascendc-operator-precision-eval

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

AscendC 算子精度评估

AscendC Operator Precision Evaluation

Skill 类型:评估型(测试生成 + 执行 + 报告输出)
本 skill 对已编译安装的 AscendC 算子进行系统化精度评估。测试用例由「常规 shape 测试」和「边界值测试」两部分组成,每个用例遍历算子支持的全部 dtype,运行后输出结构化精度报告。
Skill Type: Evaluation (Test Generation + Execution + Report Output)
This skill conducts systematic precision evaluation on compiled and installed AscendC operators. Test cases consist of two parts: "regular shape tests" and "boundary value tests". Each case traverses all dtypes supported by the operator, and a structured precision report is output after execution.

前置条件

Prerequisites

  • 算子已编译安装(
    ascend_kernel
    包可 import 或
    .so
    文件存在)
  • 算子已通过基本功能测试(
    tests/test_<op_name>.py
    存在且通过)
  • 已知算子名称、PyTorch 调用方式、输入域约束、支持的全部 dtype
  • csrc/ops/<op_name>/test/<op_name>-test-cases.md
    已由
    ascendc-operator-testcase-gen
    生成(包含 SUPPORTED_DTYPES、TEST_SHAPES、BOUNDARY_VALUES、算子标杆)
  • The operator has been compiled and installed (the
    ascend_kernel
    package can be imported or the
    .so
    file exists)
  • The operator has passed basic functional tests (the
    tests/test_<op_name>.py
    file exists and passes)
  • The operator name, PyTorch calling method, input domain constraints, and all supported dtypes are known
  • csrc/ops/<op_name>/test/<op_name>-test-cases.md
    has been generated by
    ascendc-operator-testcase-gen
    (including SUPPORTED_DTYPES, TEST_SHAPES, BOUNDARY_VALUES, and operator benchmarks)

核心流程

Core Process

Phase 1: 加载用例文档 + 信息收集 → Phase 2: 用例适配 → Phase 3: 测试脚本生成 → Phase 4: 执行 → Phase 5: 报告生成

Phase 1: Load Case Document + Information Collection → Phase 2: Case Adaptation → Phase 3: Test Script Generation → Phase 4: Execution → Phase 5: Report Generation

Phase 1:加载用例文档 + 信息收集

Phase 1: Load Case Document + Information Collection

Step 1.1:加载 testcase-gen 用例文档(MANDATORY)

Step 1.1: Load testcase-gen Case Document (MANDATORY)

MUST 首先读取
csrc/ops/<op_name>/test/<op_name>-test-cases.md
,从中提取:
提取项在用例文档中的位置用途
SUPPORTED_DTYPES§测试配置精度测试遍历的 dtype 列表
TEST_SHAPES§测试配置常规 shape 测试的 shape 列表
BOUNDARY_VALUES§测试配置边界值测试的标量值列表
NPU 调用方式§算子标杆NPU_CALL 表达式
CPU 参考实现§算子标杆CPU_REF 表达式
<op_name>-test-cases.md
不存在
:回退为自行设计用例(按原 Phase 2 流程),但需在报告中注明"用例为自行设计,非 testcase-gen 产出"。
MUST first read
csrc/ops/<op_name>/test/<op_name>-test-cases.md
and extract the following from it:
Extraction ItemLocation in Case DocumentPurpose
SUPPORTED_DTYPES§Test ConfigurationList of dtypes for precision test traversal
TEST_SHAPES§Test ConfigurationList of shapes for regular shape tests
BOUNDARY_VALUES§Test ConfigurationList of scalar values for boundary value tests
NPU Calling Method§Operator BenchmarkNPU_CALL expression
CPU Reference Implementation§Operator BenchmarkCPU_REF expression
If
<op_name>-test-cases.md
does not exist
: Fall back to designing cases on your own (follow the original Phase 2 process), but you must note in the report that "cases are self-designed, not generated by testcase-gen".

Step 1.2:从代码补充信息

Step 1.2: Supplement Information from Code

从现有代码中补充提取以下信息:
信息来源示例
算子名称用户输入 / op_host 文件名
acosh
NPU 调用方式
register.cpp
中的
m.def
(与用例文档交叉验证)
torch.ops.npu.acosh(x)
CPU 参考实现PyTorch 标准库(与用例文档交叉验证)
torch.acosh(x.cpu().float()).to(dtype)
输入域约束数学定义 / design.md
x >= 1.0
支持的全部 dtype
op_host
中的 TORCH_CHECK(与用例文档交叉验证)
[torch.float16, torch.float32]
支持的维度/shape 约束design.md / op_host 中的逻辑elementwise 支持任意维度
精度阈值生态算子开源精度标准(见下表)按 dtype 查 Threshold
Supplement and extract the following information from existing code:
InformationSourceExample
Operator NameUser Input / op_host File Name
acosh
NPU Calling Method
m.def
in
register.cpp
(cross-verify with case document)
torch.ops.npu.acosh(x)
CPU Reference ImplementationPyTorch Standard Library (cross-verify with case document)
torch.acosh(x.cpu().float()).to(dtype)
Input Domain ConstraintsMathematical Definition / design.md
x >= 1.0
All Supported dtypesTORCH_CHECK in
op_host
(cross-verify with case document)
[torch.float16, torch.float32]
Supported Dimension/Shape Constraintsdesign.md / Logic in op_hostelementwise supports arbitrary dimensions
Precision ThresholdOpen Source Precision Standards for Ecological Operators (see table below)Check Threshold by dtype

精度标准(生态算子开源精度标准)

Precision Standards (Open Source Precision Standards for Ecological Operators)

采用 MERE(平均相对误差)和 MARE(最大相对误差)两个指标判定:
相对误差 = abs(actual - golden) / (abs(golden) + 1e-7)
MERE = mean(相对误差)
MARE = max(相对误差)
分母引入
1e-7
避免 golden 为零时除零。
通过标准:MERE < Threshold MARE < 10 × Threshold。
dtypeThresholdMERE 上限MARE 上限 (10×)
float162⁻¹⁰ ≈ 9.77e-49.77e-49.77e-3
bfloat162⁻⁷ ≈ 7.81e-37.81e-37.81e-2
float322⁻¹³ ≈ 1.22e-41.22e-41.22e-3
完整 dtype 列表(含 HiFLOAT32、FLOAT8 E4M3/E5M2)见
references/OPS_PRECISION_STANDARDS.md

Two metrics, MERE (Mean Relative Error) and MARE (Maximum Relative Error), are used for judgment:
Relative Error = abs(actual - golden) / (abs(golden) + 1e-7)
MERE = mean(Relative Error)
MARE = max(Relative Error)
1e-7
is introduced in the denominator to avoid division by zero when golden is zero.
Pass Standard: MERE < Threshold and MARE < 10 × Threshold.
dtypeThresholdMERE Upper LimitMARE Upper Limit (10×)
float162⁻¹⁰ ≈ 9.77e-49.77e-49.77e-3
bfloat162⁻⁷ ≈ 7.81e-37.81e-37.81e-2
float322⁻¹³ ≈ 1.22e-41.22e-41.22e-3
For the complete dtype list (including HiFLOAT32, FLOAT8 E4M3/E5M2), see
references/OPS_PRECISION_STANDARDS.md
.

Phase 2:用例适配

Phase 2: Case Adaptation

优先复用 testcase-gen 产出:若 Phase 1 成功加载了
<op_name>-test-cases.md
,则 TEST_SHAPES 和 BOUNDARY_VALUES 直接使用用例文档中的定义,无需重新设计。仅在用例文档不存在时才自行设计。
Prioritize Reusing testcase-gen Output: If
<op_name>-test-cases.md
is successfully loaded in Phase 1, directly use the definitions of TEST_SHAPES and BOUNDARY_VALUES from the case document, and do not redesign them. Only design cases on your own when the case document does not exist.

设计原则

Design Principles

  1. 全 dtype 覆盖:每个 shape / 每个边界值都遍历算子支持的全部 dtype
  2. shape 由算子决定:根据算子支持的维度选择合适的 shape,不要写固定维度
  3. shape 不要过大:单个用例元素数控制在合理范围,避免不必要的大 tensor
  4. 用例总数 = (len(TEST_SHAPES) + len(BOUNDARY_VALUES)) × len(SUPPORTED_DTYPES) ≥ 30
  1. Full dtype Coverage: Each shape / each boundary value traverses all dtypes supported by the operator
  2. Shapes Determined by Operator: Select appropriate shapes according to the dimensions supported by the operator, do not write fixed dimensions
  3. Shapes Not Too Large: Control the number of elements in a single case within a reasonable range to avoid unnecessary large tensors
  4. Total Number of Cases = (len(TEST_SHAPES) + len(BOUNDARY_VALUES)) × len(SUPPORTED_DTYPES) ≥ 30

Part A: 常规 Shape 测试(TEST_SHAPES)

Part A: Regular Shape Tests (TEST_SHAPES)

根据算子支持的维度,从以下维度池中选取适合的 shape,组成
TEST_SHAPES
列表。
MUST 根据算子实际支持的维度来选,不支持的维度不要选。
According to the dimensions supported by the operator, select suitable shapes from the following dimension pool to form the
TEST_SHAPES
list.
MUST select based on the actual supported dimensions of the operator, do not select unsupported dimensions.

shape 选择参考池

Shape Selection Reference Pool

维度推荐 shape适用算子类型
1D(128,), (1024,), (4096,), (8192,)elementwise, reduction
2D(32, 512), (64, 768), (128, 1024)elementwise, matmul, linear
3D(8, 16, 64), (4, 128, 256)elementwise, attention, conv1d
4D(4, 8, 32, 16), (2, 64, 32, 32)conv2d, elementwise
5D(2, 3, 4, 5, 6)conv3d, elementwise
shape 不要过大:推荐单个 shape 元素数 ≤ 200K。
DimensionRecommended ShapesApplicable Operator Types
1D(128,), (1024,), (4096,), (8192,)elementwise, reduction
2D(32, 512), (64, 768), (128, 1024)elementwise, matmul, linear
3D(8, 16, 64), (4, 128, 256)elementwise, attention, conv1d
4D(4, 8, 32, 16), (2, 64, 32, 32)conv2d, elementwise
5D(2, 3, 4, 5, 6)conv3d, elementwise
Shapes Not Too Large: It is recommended that the number of elements in a single shape ≤ 200K.

TEST_SHAPES 格式

TEST_SHAPES Format

python
TEST_SHAPES = [
    ("category_name", "description", (dim0, dim1, ...)),
    # ...
]
Category 名称自定义,建议按维度或场景命名。示例:
python
undefined
python
TEST_SHAPES = [
    ("category_name", "description", (dim0, dim1, ...)),
    # ...
]
Category names can be customized, and it is recommended to name them by dimension or scenario. Example:
python
undefined

elementwise 算子(支持任意维度)

elementwise operator (supports arbitrary dimensions)

TEST_SHAPES = [ ("1D", "128 elements", (128,)), ("1D", "1024 elements", (1024,)), ("1D", "4096 elements", (4096,)), ("1D", "8192 elements", (8192,)), ("2D", "batch*hidden 32x512", (32, 512)), ("2D", "BERT-base 64x768", (64, 768)), ("2D", "BERT-large 128x1024", (128, 1024)), ("3D", "8x16x64", (8, 16, 64)), ("3D", "4x128x256", (4, 128, 256)), ("4D", "4x8x32x16", (4, 8, 32, 16)), ("Production", "ViT 8x197x768", (8, 197, 768)), ("Production", "single sample 1x512", (1, 512)), ]
undefined
TEST_SHAPES = [ ("1D", "128 elements", (128,)), ("1D", "1024 elements", (1024,)), ("1D", "4096 elements", (4096,)), ("1D", "8192 elements", (8192,)), ("2D", "batch*hidden 32x512", (32, 512)), ("2D", "BERT-base 64x768", (64, 768)), ("2D", "BERT-large 128x1024", (128, 1024)), ("3D", "8x16x64", (8, 16, 64)), ("3D", "4x128x256", (4, 128, 256)), ("4D", "4x8x32x16", (4, 8, 32, 16)), ("Production", "ViT 8x197x768", (8, 197, 768)), ("Production", "single sample 1x512", (1, 512)), ]
undefined

Part B: 边界值测试(BOUNDARY_VALUES)

Part B: Boundary Value Tests (BOUNDARY_VALUES)

针对算子的输入域,选取关键边界点和典型值。使用固定小 shape
(1024,)
进行测试。
MUST 根据算子的数学定义确定边界值,不同算子差异很大。
Select key boundary points and typical values based on the input domain of the operator. Use the fixed small shape
(1024,)
for testing.
MUST determine boundary values according to the mathematical definition of the operator, which varies greatly for different operators.

边界值设计指导

Boundary Value Design Guidelines

算子类型推荐边界值
acosh (x≥1)x=1.0, x=1.001, x=10.0, x=1000.0
log (x>0)x=0.001, x=1.0, x=100.0, x=10000.0
sigmoid (全域)x=0.0, x=-5.0, x=5.0, x=-20.0, x=20.0
sqrt (x≥0)x=0.0, x=0.001, x=1.0, x=10000.0
无域限制x=0.0, x=1.0, x=-1.0, x=100.0
Operator TypeRecommended Boundary Values
acosh (x≥1)x=1.0, x=1.001, x=10.0, x=1000.0
log (x>0)x=0.001, x=1.0, x=100.0, x=10000.0
sigmoid (full domain)x=0.0, x=-5.0, x=5.0, x=-20.0, x=20.0
sqrt (x≥0)x=0.0, x=0.001, x=1.0, x=10000.0
No Domain Restrictionsx=0.0, x=1.0, x=-1.0, x=100.0

BOUNDARY_VALUES 格式

BOUNDARY_VALUES Format

python
BOUNDARY_VALUES = [
    ("description", scalar_value),
    # ...
]
示例(acosh):
python
BOUNDARY_VALUES = [
    ("domain lower bound x=1.0",  1.0),
    ("near boundary x=1.001",     1.001),
    ("moderate value x=10.0",     10.0),
    ("large value x=1000.0",      1000.0),
]

python
BOUNDARY_VALUES = [
    ("description", scalar_value),
    # ...
]
Example (acosh):
python
BOUNDARY_VALUES = [
    ("domain lower bound x=1.0",  1.0),
    ("near boundary x=1.001",     1.001),
    ("moderate value x=10.0",     10.0),
    ("large value x=1000.0",      1000.0),
]

Phase 3:测试脚本生成

Phase 3: Test Script Generation

MUST 先读取
templates/
目录下的模板文件,替换占位符后生成到算子目录的
test/
子目录。
MUST first read the template files in the
templates/
directory, replace the placeholders, and generate them to the
test/
subdirectory of the operator directory.

输出目录

Output Directory

csrc/ops/<op_name>/test/
├── test_<op_name>_precision.py           ← pytest 测试
├── run_<op_name>_precision_report.py     ← 报告生成器
├── <op_name>_precision_report.json       ← JSON 报告(执行后生成)
└── <op_name>_precision_report.md         ← Markdown 报告(执行后生成)
csrc/ops/<op_name>/test/
├── test_<op_name>_precision.py           ← pytest Test
├── run_<op_name>_precision_report.py     ← Report Generator
├── <op_name>_precision_report.json       ← JSON Report (generated after execution)
└── <op_name>_precision_report.md         ← Markdown Report (generated after execution)

模板文件

Template Files

模板路径生成目标
pytest 测试
templates/test_op_precision_template.py
csrc/ops/<op_name>/test/test_<op_name>_precision.py
报告生成器
templates/run_precision_report_template.py
csrc/ops/<op_name>/test/run_<op_name>_precision_report.py
TemplatePathGeneration Target
pytest Test
templates/test_op_precision_template.py
csrc/ops/<op_name>/test/test_<op_name>_precision.py
Report Generator
templates/run_precision_report_template.py
csrc/ops/<op_name>/test/run_<op_name>_precision_report.py

占位符替换表

Placeholder Replacement Table

占位符说明示例(acosh)
{{OP_NAME}}
算子名称
acosh
{{NPU_CALL}}
NPU 调用表达式(使用变量
x
torch.ops.npu.acosh(x)
{{CPU_REF}}
CPU 参考基线(使用变量
x
,
dtype
torch.acosh(x.cpu().float()).to(dtype)
{{SUPPORTED_DTYPES}}
支持的全部 dtype 列表
[torch.float16, torch.float32]
{{INPUT_LOW}}
随机输入的域下界
1.0
{{INPUT_HIGH}}
随机输入的域上界
11.0
{{TEST_SHAPES}}
常规 shape 列表(Phase 2 Part A 的输出)见上方示例
{{BOUNDARY_VALUES}}
边界值列表(Phase 2 Part B 的输出)见上方示例
PlaceholderDescriptionExample (acosh)
{{OP_NAME}}
Operator Name
acosh
{{NPU_CALL}}
NPU Calling Expression (using variable
x
)
torch.ops.npu.acosh(x)
{{CPU_REF}}
CPU Reference Baseline (using variables
x
,
dtype
)
torch.acosh(x.cpu().float()).to(dtype)
{{SUPPORTED_DTYPES}}
List of All Supported dtypes
[torch.float16, torch.float32]
{{INPUT_LOW}}
Lower Bound of Random Input Domain
1.0
{{INPUT_HIGH}}
Upper Bound of Random Input Domain
11.0
{{TEST_SHAPES}}
List of Regular Shapes (Output of Phase 2 Part A)See example above
{{BOUNDARY_VALUES}}
List of Boundary Values (Output of Phase 2 Part B)See example above

必须采集的精度指标

Must-Collect Precision Metrics

判定指标(用于通过/失败判定):
指标计算方式通过条件
MERE
((npu - ref).abs() / (ref.abs() + 1e-7)).mean()
< Threshold
MARE
((npu - ref).abs() / (ref.abs() + 1e-7)).max()
< 10 × Threshold
辅助指标(用于分析,不作为判定依据):
指标计算方式意义
MaxAbsErr
(npu - ref).abs().max()
最大绝对误差
MeanAbsErr
(npu - ref).abs().mean()
平均绝对误差
CosineSim
F.cosine_similarity(npu.flatten(), ref.flatten())
余弦相似度

Judgment Metrics (used for pass/fail judgment):
MetricCalculation MethodPass Condition
MERE
((npu - ref).abs() / (ref.abs() + 1e-7)).mean()
< Threshold
MARE
((npu - ref).abs() / (ref.abs() + 1e-7)).max()
< 10 × Threshold
Auxiliary Metrics (used for analysis, not as judgment basis):
MetricCalculation MethodMeaning
MaxAbsErr
(npu - ref).abs().max()
Maximum Absolute Error
MeanAbsErr
(npu - ref).abs().mean()
Mean Absolute Error
CosineSim
F.cosine_similarity(npu.flatten(), ref.flatten())
Cosine Similarity

Phase 4:执行测试

Phase 4: Test Execution

4.1 环境准备

4.1 Environment Preparation

bash
source /usr/local/Ascend/ascend-toolkit/set_env.sh
export PATH=/root/miniconda3/envs/py310/bin:$PATH
MUST 在每个 Shell 调用前 source 环境。
bash
source /usr/local/Ascend/ascend-toolkit/set_env.sh
export PATH=/root/miniconda3/envs/py310/bin:$PATH
MUST source the environment before each Shell call.

4.2 执行 pytest

4.2 Execute pytest

bash
cd <project_root>
python3 -m pytest csrc/ops/<op_name>/test/test_<op_name>_precision.py -v --tb=short
bash
cd <project_root>
python3 -m pytest csrc/ops/<op_name>/test/test_<op_name>_precision.py -v --tb=short

4.3 生成报告

4.3 Generate Report

bash
python3 csrc/ops/<op_name>/test/run_<op_name>_precision_report.py
bash
python3 csrc/ops/<op_name>/test/run_<op_name>_precision_report.py

4.4 失败处理

4.4 Failure Handling

失败类型排查方向
RuntimeError (NPU kernel)输入数据超出定义域 / NPU 不支持该 dtype
AssertionError (精度)检查 MERE/MARE 是否略超 Threshold,分析是否为边界值导致
个别 dtype 用例 FAIL确认该 dtype 的 Threshold 是否匹配,检查 MARE 是否集中在少数异常点
大量 FAIL检查算子 Compute 逻辑是否有 bug
Failure TypeTroubleshooting Direction
RuntimeError (NPU kernel)Input data exceeds the defined domain / NPU does not support this dtype
AssertionError (Precision)Check if MERE/MARE slightly exceeds the Threshold, analyze whether it is caused by boundary values
Individual dtype Cases FAILConfirm whether the Threshold of this dtype matches, check if MARE is concentrated in a few abnormal points
A Large Number of FAILCheck if there are bugs in the operator Compute logic

4.5 精度问题深度排查

4.5 In-Depth Troubleshooting of Precision Issues

当出现精度失败(allclose 不通过、输出偏差过大、输出全零/NaN)且简单阈值调整无法解决时MUST 读取并按
ascendc-operator-precision-debug
skill 流程进行系统化根因定位:
  1. 读取
    ascendc-operator-precision-debug
    SKILL.md
  2. 按其五阶段流程执行:误差分析 → 代码审查 → 实验隔离 → 插桩定位 → 修复验证
  3. 修复后重新运行本 skill 的完整精度测试,确认全部通过
注意:仅在精度问题无法通过调整阈值解决时才调用。个别 dtype 因硬件精度特性略超阈值的情况,优先通过放宽阈值(并在报告中说明)解决。

When precision failures occur (allclose not passed, excessive output deviation, all-zero/NaN output) and simple threshold adjustment cannot solve the problem, MUST read and follow the
ascendc-operator-precision-debug
skill process for systematic root cause location:
  1. Read the SKILL.md of
    ascendc-operator-precision-debug
  2. Execute according to its five-phase process: Error Analysis → Code Review → Experimental Isolation → Instrumentation Location → Fix Verification
  3. After fixing, re-run the complete precision test of this skill to confirm all tests pass
Note: Only call this when precision issues cannot be resolved by threshold adjustment. For cases where individual dtypes slightly exceed the threshold due to hardware precision characteristics, priority should be given to solving it by relaxing the threshold (and explain in the report).

Phase 5:报告生成

Phase 5: Report Generation

5.1 Markdown 报告

5.1 Markdown Report

MUST 生成
csrc/ops/<op_name>/test/<op_name>_precision_report.md
,参考
templates/precision_report_template.md
报告包含:
  1. 总览表:总用例/通过/失败/通过率
  2. 精度阈值标准表
  3. 常规 Shape 测试结果表(按 category 分组)
  4. 边界值测试结果表
  5. 按 dtype 汇总统计
  6. 关键发现(≥3 条结论)
MUST generate
csrc/ops/<op_name>/test/<op_name>_precision_report.md
, referring to
templates/precision_report_template.md
.
The report includes:
  1. Overview Table: Total cases, passed cases, failed cases, pass rate (percentage).
  2. Precision Threshold Standard Table
  3. Regular Shape Test Results Table (grouped by category)
  4. Boundary Value Test Results Table
  5. Summary Statistics by dtype
  6. Key Findings (≥3 conclusions)

5.2 完成提示(文件 + 对话)

5.2 Completion Prompt (File + Conversation)

  1. 文件MUST 生成
    csrc/ops/<op_name>/test/<op_name>_precision_report.md
    (及同目录
    *_precision_report.json
    若脚本输出),并向用户给出完整路径
精度验证报告已生成:
  csrc/ops/<op_name>/test/<op_name>_precision_report.md
  csrc/ops/<op_name>/test/<op_name>_precision_report.json   # 若存在
  1. 当前对话MUST 同时遵守下节「对话内展示结果」,不得仅输出路径。

  1. File: MUST generate
    csrc/ops/<op_name>/test/<op_name>_precision_report.md
    (and
    *_precision_report.json
    in the same directory if the script outputs it), and provide the complete path to the user:
Precision verification report has been generated:
  csrc/ops/<op_name>/test/<op_name>_precision_report.md
  csrc/ops/<op_name>/test/<op_name>_precision_report.json   # if exists
  1. Current Conversation: MUST also comply with the next section "Display Results in Conversation", and must not only output the path.

对话内展示结果(MANDATORY)

Display Results in Conversation (MANDATORY)

pytest 与报告脚本执行完毕且已生成 Markdown/JSON 后,助手在当前对话的回复中 MUST
  1. 粘贴可读结论(用户无需打开文件即可掌握结果):
    • 总览:总用例数、通过数、失败数、通过率(百分比)。
    • 若有失败:列出失败用例标识(case 名 / shape / dtype / 类别),以及主要误差指标(如 MaxAbsErr)或 pytest 摘要行。
    • 若全部通过:明确写出「全部通过」及总用例数。
    • 关键发现 ≥3 条:可与报告内「关键发现」一致或提炼自报告(dtype 差异、边界值表现、阈值是否收紧/放宽等)。
    • 可选:按 dtype 汇总的通过情况表(摘录,case 多时可只列汇总行)。
  2. 口径:一两句话说明使用的精度标准(MERE/MARE,生态算子开源精度标准)及各 dtype 的 Threshold 值。
  3. 路径殿后:在展示完上述内容后,再附
    <op_name>_precision_report.md
    (及 JSON)的完整路径。
NEVER:仅回复「报告已生成」和路径;NEVER 用「请自行打开 Markdown」替代在对话中展示通过率与失败摘要。

After pytest and the report script are executed and the Markdown/JSON report is generated, the assistant MUST in the reply of the current conversation:
  1. Paste Readable Conclusions (users can grasp the results without opening files):
    • Overview: Total number of cases, number of passed cases, number of failed cases, pass rate (percentage).
    • If there are failures: List the identifiers of failed cases (case name / shape / dtype / category), as well as main error metrics (such as MaxAbsErr) or pytest summary lines.
    • If all passed: Clearly state "all passed" and the total number of cases.
    • Key Findings ≥3: Can be consistent with the "Key Findings" in the report or extracted from the report (dtype differences, boundary value performance, whether the threshold is tightened/relaxed, etc.).
    • Optional: A table summarizing the pass status by dtype (excerpt, only list summary rows when there are many cases).
  2. Caliber: Briefly explain the precision standards used (MERE/MARE, open source precision standards for ecological operators) and the Threshold values of each dtype in one or two sentences.
  3. Path at the End: After displaying the above content, attach the complete path of
    <op_name>_precision_report.md
    (and JSON if applicable).
NEVER: Only reply "report has been generated" and the path; NEVER use "please open the Markdown file yourself" instead of displaying the pass rate and failure summary in the conversation.

经验总结

Experience Summary

输入生成

Input Generation

  • 定义域:务必查阅算子数学定义,确保输入合法
  • fp16 范围:fp16 最大约 65504,输入不要超过此值
  • shape 大小:推荐单个 shape 元素数 ≤ 200K,避免测试时间过长
  • Defined Domain: Be sure to consult the mathematical definition of the operator to ensure legal input
  • fp16 Range: The maximum value of fp16 is approximately 65504, and input should not exceed this value
  • Shape Size: It is recommended that the number of elements in a single shape ≤ 200K to avoid excessive test time

精度指标

Precision Metrics

  • MERE / MARE:判定指标,分母为
    abs(golden) + 1e-7
    (非 clamp),与生态算子开源精度标准对齐
  • MaxAbsErr / MeanAbsErr:辅助分析,帮助判断偏差量级
  • CosineSim:全零输出时为 0 或 NaN,需标注说明而非判定失败
  • MERE / MARE: Judgment metrics, with denominator
    abs(golden) + 1e-7
    (not clamp), aligned with open source precision standards for ecological operators
  • MaxAbsErr / MeanAbsErr: Auxiliary analysis, helping to judge the magnitude of deviation
  • CosineSim: 0 or NaN when output is all-zero, need to mark and explain instead of judging failure

阈值说明

Threshold Description

  • 阈值来源:生态算子开源精度标准(
    references/OPS_PRECISION_STANDARDS.md
  • 通过条件:MERE < Threshold 且 MARE < 10 × Threshold
  • 不建议随意放宽阈值;若确需放宽,MUST 在报告中说明原因

  • Threshold Source: Open Source Precision Standards for Ecological Operators (
    references/OPS_PRECISION_STANDARDS.md
    )
  • Pass Condition: MERE < Threshold and MARE < 10 × Threshold
  • It is not recommended to relax the threshold arbitrarily; if relaxation is indeed necessary, MUST explain the reason in the report

反模式(NEVER)

Anti-Patterns (NEVER)

  • NEVER 只生成报告文件而不在对话中展示总览与结论
  • NEVER 隐瞒失败用例数量,仅报告路径

  • NEVER only generate report files without displaying the overview and conclusions in the conversation
  • NEVER conceal the number of failed cases and only report the path

检查清单

Checklist

  • 已读取
    csrc/ops/<op_name>/test/<op_name>-test-cases.md
    (若存在)
  • 信息收集完成(算子名、调用方式、输入域、支持的全部 dtype、支持的维度)
  • TEST_SHAPES 优先来自 testcase-gen 用例文档,shape 不过大
  • BOUNDARY_VALUES 根据算子输入域设计
  • 用例总数 = (shapes + boundary) × dtypes ≥ 30
  • 算子支持的每种 dtype 都已测试
  • pytest 全部通过
  • JSON + Markdown 报告已生成
  • 关键发现 ≥ 3 条
  • 已向用户提示报告与 JSON 路径(若生成)
  • 已在当前对话中展示总览(通过率)、失败摘要(若有)及 ≥3 条关键发现,不仅附路径
  • Has read
    csrc/ops/<op_name>/test/<op_name>-test-cases.md
    (if exists)
  • Information collection is complete (operator name, calling method, input domain, all supported dtypes, supported dimensions)
  • TEST_SHAPES are preferentially from the testcase-gen case document, and shapes are not too large
  • BOUNDARY_VALUES are designed according to the operator's input domain
  • Total number of cases = (shapes + boundary) × dtypes ≥ 30
  • Each dtype supported by the operator has been tested
  • All pytest tests passed
  • JSON + Markdown report has been generated
  • Key Findings ≥ 3
  • Has prompted the user with the report and JSON path (if generated)
  • Has displayed the overview (pass rate), failure summary (if any) and ≥3 key findings in the current conversation, not only attached the path