# AscendC Operator Precision Evaluation

**Skill Type**: Evaluation (Test Generation + Execution + Report Output)
This skill conducts systematic precision evaluation on compiled and installed AscendC operators. Test cases consist of two parts: "regular shape tests" and "boundary value tests". Each case traverses all dtypes supported by the operator, and a structured precision report is output after execution.
## Prerequisites

- The operator has been compiled and installed (the package can be imported or the file exists)
- The operator has passed basic functional tests (the test file exists and passes)
- The operator name, PyTorch calling method, input domain constraints, and all supported dtypes are known
- `csrc/ops/<op_name>/test/<op_name>-test-cases.md` has been generated by `ascendc-operator-testcase-gen` (including SUPPORTED_DTYPES, TEST_SHAPES, BOUNDARY_VALUES, and operator benchmarks)
## Core Process

Phase 1: Load Case Document + Information Collection → Phase 2: Case Adaptation → Phase 3: Test Script Generation → Phase 4: Execution → Phase 5: Report Generation
## Phase 1: Load Case Document + Information Collection

### Step 1.1: Load testcase-gen Case Document (MANDATORY)

MUST first read `csrc/ops/<op_name>/test/<op_name>-test-cases.md` and extract the following from it:
| Extraction Item | Location in Case Document | Purpose |
|---|---|---|
| SUPPORTED_DTYPES | §Test Configuration | List of dtypes for precision test traversal |
| TEST_SHAPES | §Test Configuration | List of shapes for regular shape tests |
| BOUNDARY_VALUES | §Test Configuration | List of scalar values for boundary value tests |
| NPU Calling Method | §Operator Benchmark | NPU_CALL expression |
| CPU Reference Implementation | §Operator Benchmark | CPU_REF expression |
If the case document does not exist: fall back to designing cases yourself (follow the original Phase 2 process), but you MUST note in the report that "cases are self-designed, not generated by testcase-gen".
### Step 1.2: Supplement Information from Code

Supplement and extract the following information from the existing code:
| Information | Source | Example |
|---|---|---|
| Operator Name | User Input / op_host File Name | `acosh` |
| NPU Calling Method | Operator binding code (cross-verify with case document) | |
| CPU Reference Implementation | PyTorch Standard Library (cross-verify with case document) | `torch.acosh(x.cpu().float()).to(dtype)` |
| Input Domain Constraints | Mathematical Definition / design.md | acosh: x ≥ 1 |
| All Supported dtypes | TORCH_CHECK statements (cross-verify with case document) | `[torch.float16, torch.float32]` |
| Supported Dimension/Shape Constraints | design.md / Logic in op_host | elementwise supports arbitrary dimensions |
| Precision Threshold | Open Source Precision Standards for Ecological Operators (see table below) | Look up Threshold by dtype |
### Precision Standards (Open Source Precision Standards for Ecological Operators)

Two metrics, MERE (Mean Relative Error) and MARE (Maximum Relative Error), are used for judgment:

- `Relative Error = abs(actual - golden) / (abs(golden) + 1e-7)`
- `MERE = mean(Relative Error)`
- `MARE = max(Relative Error)`

The `1e-7` term is introduced in the denominator to avoid division by zero when golden is zero.

**Pass Standard**: MERE < Threshold and MARE < 10 × Threshold.
| dtype | Threshold | MERE Upper Limit | MARE Upper Limit (10×) |
|---|---|---|---|
| float16 | 2⁻¹⁰ ≈ 9.77e-4 | 9.77e-4 | 9.77e-3 |
| bfloat16 | 2⁻⁷ ≈ 7.81e-3 | 7.81e-3 | 7.81e-2 |
| float32 | 2⁻¹³ ≈ 1.22e-4 | 1.22e-4 | 1.22e-3 |

For the complete dtype list (including HiFLOAT32, FLOAT8 E4M3/E5M2), see `references/OPS_PRECISION_STANDARDS.md`.
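The pass criterion above can be sketched in plain Python. This is a framework-free illustration, not the template's actual code: in the generated pytest script the same computation runs on torch tensors, and `precision_pass` is a hypothetical helper name.

```python
# Sketch of the MERE/MARE pass check on plain Python lists (not the
# template's torch implementation; `precision_pass` is a made-up name).
THRESHOLDS = {            # MERE upper limits; the MARE limit is 10x each
    "float16":  2 ** -10,    # ~9.77e-4
    "bfloat16": 2 ** -7,     # ~7.81e-3
    "float32":  2 ** -13,    # ~1.22e-4
}

def relative_errors(npu, golden):
    # 1e-7 in the denominator avoids division by zero when golden == 0
    return [abs(a - g) / (abs(g) + 1e-7) for a, g in zip(npu, golden)]

def precision_pass(npu, golden, dtype_name):
    errs = relative_errors(npu, golden)
    mere = sum(errs) / len(errs)
    mare = max(errs)
    thr = THRESHOLDS[dtype_name]
    return mere < thr and mare < 10 * thr, mere, mare
```

For example, `precision_pass([1.0001, 2.0], [1.0, 2.0], "float16")` passes (relative error ~1e-4 is below the 9.77e-4 threshold), while a 10% deviation fails.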
## Phase 2: Case Adaptation

**Prioritize Reusing testcase-gen Output**: If the case document is successfully loaded in Phase 1, directly use its TEST_SHAPES and BOUNDARY_VALUES definitions and do not redesign them. Only design cases yourself when the case document does not exist.
### Design Principles
- Full dtype Coverage: Each shape / each boundary value traverses all dtypes supported by the operator
- Shapes Determined by Operator: Select appropriate shapes according to the dimensions supported by the operator, do not write fixed dimensions
- Shapes Not Too Large: Control the number of elements in a single case within a reasonable range to avoid unnecessary large tensors
- Total Number of Cases = (len(TEST_SHAPES) + len(BOUNDARY_VALUES)) × len(SUPPORTED_DTYPES) ≥ 30
### Part A: Regular Shape Tests (TEST_SHAPES)

According to the dimensions supported by the operator, select suitable shapes from the following dimension pool to form the `TEST_SHAPES` list. MUST select based on the dimensions the operator actually supports; do not select unsupported dimensions.
#### Shape Selection Reference Pool

| Dimension | Recommended Shapes | Applicable Operator Types |
|---|---|---|
| 1D | (128,), (1024,), (4096,), (8192,) | elementwise, reduction |
| 2D | (32, 512), (64, 768), (128, 1024) | elementwise, matmul, linear |
| 3D | (8, 16, 64), (4, 128, 256) | elementwise, attention, conv1d |
| 4D | (4, 8, 32, 16), (2, 64, 32, 32) | conv2d, elementwise |
| 5D | (2, 3, 4, 5, 6) | conv3d, elementwise |
Shapes Not Too Large: It is recommended that the number of elements in a single shape ≤ 200K.
#### TEST_SHAPES Format

```python
TEST_SHAPES = [
    ("category_name", "description", (dim0, dim1, ...)),
    # ...
]
```
Category names can be customized, and it is recommended to name them by dimension or scenario. Example:
```python
# elementwise operator (supports arbitrary dimensions)
TEST_SHAPES = [
    ("1D", "128 elements", (128,)),
    ("1D", "1024 elements", (1024,)),
    ("1D", "4096 elements", (4096,)),
    ("1D", "8192 elements", (8192,)),
    ("2D", "batch*hidden 32x512", (32, 512)),
    ("2D", "BERT-base 64x768", (64, 768)),
    ("2D", "BERT-large 128x1024", (128, 1024)),
    ("3D", "8x16x64", (8, 16, 64)),
    ("3D", "4x128x256", (4, 128, 256)),
    ("4D", "4x8x32x16", (4, 8, 32, 16)),
    ("Production", "ViT 8x197x768", (8, 197, 768)),
    ("Production", "single sample 1x512", (1, 512)),
]
```
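The full case matrix is the Cartesian product of shapes (plus boundary values) and dtypes: with the 12 shapes above, 4 acosh boundary values, and 2 supported dtypes, (12 + 4) × 2 = 32 ≥ 30. A minimal sketch of this expansion, using dtype names as plain strings rather than torch dtypes (the lists here are shortened stand-ins for illustration):

```python
from itertools import product

# Shortened stand-in configuration (plain-string dtypes, 3 shapes, 2 values)
SUPPORTED_DTYPES = ["float16", "float32"]
TEST_SHAPES = [
    ("1D", "128 elements", (128,)),
    ("2D", "batch*hidden 32x512", (32, 512)),
    ("3D", "8x16x64", (8, 16, 64)),
]
BOUNDARY_VALUES = [
    ("domain lower bound x=1.0", 1.0),
    ("large value x=1000.0", 1000.0),
]

# Each shape / each boundary value traverses every supported dtype
shape_cases = [(cat, desc, shape, dt)
               for (cat, desc, shape), dt in product(TEST_SHAPES, SUPPORTED_DTYPES)]
boundary_cases = [(desc, val, dt)
                  for (desc, val), dt in product(BOUNDARY_VALUES, SUPPORTED_DTYPES)]

# Total = (len(TEST_SHAPES) + len(BOUNDARY_VALUES)) * len(SUPPORTED_DTYPES)
total = len(shape_cases) + len(boundary_cases)
```

With this shortened configuration `total` is (3 + 2) × 2 = 10; the real configuration must reach at least 30.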
### Part B: Boundary Value Tests (BOUNDARY_VALUES)

Select key boundary points and typical values based on the input domain of the operator, and test each with a fixed small shape. MUST determine boundary values according to the mathematical definition of the operator; they vary greatly across operators.
#### Boundary Value Design Guidelines

| Operator Type | Recommended Boundary Values |
|---|---|
| acosh (x≥1) | x=1.0, x=1.001, x=10.0, x=1000.0 |
| log (x>0) | x=0.001, x=1.0, x=100.0, x=10000.0 |
| sigmoid (full domain) | x=0.0, x=-5.0, x=5.0, x=-20.0, x=20.0 |
| sqrt (x≥0) | x=0.0, x=0.001, x=1.0, x=10000.0 |
| No Domain Restrictions | x=0.0, x=1.0, x=-1.0, x=100.0 |
#### BOUNDARY_VALUES Format

```python
BOUNDARY_VALUES = [
    ("description", scalar_value),
    # ...
]
```
Example (acosh):

```python
BOUNDARY_VALUES = [
    ("domain lower bound x=1.0", 1.0),
    ("near boundary x=1.001", 1.001),
    ("moderate value x=10.0", 10.0),
    ("large value x=1000.0", 1000.0),
]
```
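Because boundary values must respect the operator's mathematical domain, it is cheap to assert this before launching any NPU case. A hedged sketch, not part of the templates: `check_boundary_domain` and the `domain_check` predicate are hypothetical helpers, instantiated here for acosh's x ≥ 1 domain.

```python
def check_boundary_domain(boundary_values, domain_check):
    """Return the (description, value) pairs that fall outside the domain.

    Hypothetical helper, not from the skill templates.
    """
    return [(desc, v) for desc, v in boundary_values if not domain_check(v)]

BOUNDARY_VALUES = [
    ("domain lower bound x=1.0", 1.0),
    ("near boundary x=1.001", 1.001),
    ("moderate value x=10.0", 10.0),
    ("large value x=1000.0", 1000.0),
]

# acosh is defined for x >= 1; all four values above satisfy the domain
violations = check_boundary_domain(BOUNDARY_VALUES, lambda v: v >= 1.0)
assert not violations, f"boundary values outside domain: {violations}"
```

Running the same check with an out-of-domain value such as x=0.5 would return it in the violations list, catching a bad case design before it turns into an NPU RuntimeError.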
## Phase 3: Test Script Generation

MUST first read the template files in the `templates/` directory, replace the placeholders, and generate the scripts into the `test/` subdirectory of the operator directory.
### Output Directory

```
csrc/ops/<op_name>/test/
├── test_<op_name>_precision.py          ← pytest Test
├── run_<op_name>_precision_report.py    ← Report Generator
├── <op_name>_precision_report.json      ← JSON Report (generated after execution)
└── <op_name>_precision_report.md        ← Markdown Report (generated after execution)
```
### Template Files

| Template | Path | Generation Target |
|---|---|---|
| pytest Test | `templates/test_op_precision_template.py` | `csrc/ops/<op_name>/test/test_<op_name>_precision.py` |
| Report Generator | `templates/run_precision_report_template.py` | `csrc/ops/<op_name>/test/run_<op_name>_precision_report.py` |
### Placeholder Replacement Table

| Placeholder (as named in the template files) | Example (acosh) |
|---|---|
| Operator Name | `acosh` |
| NPU Calling Expression (using variable `x`) | |
| CPU Reference Baseline (using variables `x`, `dtype`) | `torch.acosh(x.cpu().float()).to(dtype)` |
| List of All Supported dtypes | `[torch.float16, torch.float32]` |
| Lower Bound of Random Input Domain | |
| Upper Bound of Random Input Domain | |
| List of Regular Shapes (Output of Phase 2 Part A) | See the TEST_SHAPES example above |
| List of Boundary Values (Output of Phase 2 Part B) | See the BOUNDARY_VALUES example above |
### Must-Collect Precision Metrics

**Judgment Metrics** (used for pass/fail judgment):

| Metric | Calculation Method | Pass Condition |
|---|---|---|
| MERE | `((npu - ref).abs() / (ref.abs() + 1e-7)).mean()` | < Threshold |
| MARE | `((npu - ref).abs() / (ref.abs() + 1e-7)).max()` | < 10 × Threshold |
**Auxiliary Metrics** (used for analysis, not as judgment basis):

| Metric | Calculation Method | Meaning |
|---|---|---|
| MaxAbsErr | `(npu - ref).abs().max()` | Maximum Absolute Error |
| MeanAbsErr | `(npu - ref).abs().mean()` | Mean Absolute Error |
| CosineSim | `F.cosine_similarity(npu.flatten(), ref.flatten())` | Cosine Similarity |
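The judgment and auxiliary metrics together can be sketched framework-free. This is an illustration only (`precision_metrics` is a hypothetical name, the template computes the same values on torch tensors, and `F.cosine_similarity` is replaced here by an explicit dot product, returning `None` for an all-zero vector as flagged in the Experience Summary):

```python
import math

def precision_metrics(npu, golden):
    # Hypothetical sketch on plain Python lists, not the template's torch code
    abs_err = [abs(a - g) for a, g in zip(npu, golden)]
    rel_err = [e / (abs(g) + 1e-7) for e, g in zip(abs_err, golden)]
    dot = sum(a * g for a, g in zip(npu, golden))
    na = math.sqrt(sum(a * a for a in npu))
    ng = math.sqrt(sum(g * g for g in golden))
    return {
        "MERE": sum(rel_err) / len(rel_err),        # judgment: < Threshold
        "MARE": max(rel_err),                       # judgment: < 10 x Threshold
        "MaxAbsErr": max(abs_err),                  # auxiliary
        "MeanAbsErr": sum(abs_err) / len(abs_err),  # auxiliary
        # cosine similarity is undefined for an all-zero vector: mark, don't fail
        "CosineSim": dot / (na * ng) if na > 0 and ng > 0 else None,
    }
```

Identical tensors yield MERE = MARE = 0 and CosineSim = 1; an all-zero pair yields `CosineSim = None`, which should be marked and explained rather than judged a failure.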
## Phase 4: Test Execution

### 4.1 Environment Preparation

```bash
source /usr/local/Ascend/ascend-toolkit/set_env.sh
export PATH=/root/miniconda3/envs/py310/bin:$PATH
```
MUST source the environment before each Shell call.
### 4.2 Execute pytest

```bash
cd <project_root>
python3 -m pytest csrc/ops/<op_name>/test/test_<op_name>_precision.py -v --tb=short
```

### 4.3 Generate Report

```bash
python3 csrc/ops/<op_name>/test/run_<op_name>_precision_report.py
```
### 4.4 Failure Handling

| Failure Type | Troubleshooting Direction |
|---|---|
| RuntimeError (NPU kernel) | Input data exceeds the defined domain / NPU does not support this dtype |
| AssertionError (Precision) | Check whether MERE/MARE slightly exceeds the Threshold; analyze whether boundary values are the cause |
| Individual dtype cases FAIL | Confirm the Threshold for this dtype is correct; check whether MARE is concentrated in a few abnormal points |
| Large numbers of cases FAIL | Check for bugs in the operator Compute logic |
### 4.5 In-Depth Troubleshooting of Precision Issues

When precision failures occur (allclose not passed, excessive output deviation, all-zero/NaN output) and simple threshold adjustment cannot solve the problem, MUST read and follow the `ascendc-operator-precision-debug` skill process for systematic root cause location:

- Read the SKILL.md of `ascendc-operator-precision-debug`
- Execute its five-phase process: Error Analysis → Code Review → Experimental Isolation → Instrumentation Location → Fix Verification
- After fixing, re-run the complete precision test of this skill to confirm all tests pass
Note: Only invoke that skill when precision issues cannot be resolved by threshold adjustment. When individual dtypes slightly exceed the threshold due to hardware precision characteristics, prefer relaxing the threshold (and explain this in the report).
## Phase 5: Report Generation

### 5.1 Markdown Report

MUST generate `csrc/ops/<op_name>/test/<op_name>_precision_report.md`, referring to `templates/precision_report_template.md`.
The report includes:
- Overview Table: Total cases, passed cases, failed cases, pass rate (percentage).
- Precision Threshold Standard Table
- Regular Shape Test Results Table (grouped by category)
- Boundary Value Test Results Table
- Summary Statistics by dtype
- Key Findings (≥3 conclusions)
### 5.2 Completion Prompt (File + Conversation)

- **File**: MUST generate `csrc/ops/<op_name>/test/<op_name>_precision_report.md` (and the JSON report in the same directory if the script outputs it), and provide the complete paths to the user:

      Precision verification report has been generated:
      csrc/ops/<op_name>/test/<op_name>_precision_report.md
      csrc/ops/<op_name>/test/<op_name>_precision_report.json  # if it exists

- **Current Conversation**: MUST also comply with the next section, "Display Results in Conversation"; never output only the path.
## Display Results in Conversation (MANDATORY)

After pytest and the report script have run and the Markdown/JSON report is generated, the assistant MUST, in the reply of the current conversation:

- **Paste readable conclusions** (so users can grasp the results without opening files):
  - Overview: total number of cases, number passed, number failed, pass rate (percentage).
  - If there are failures: list the identifiers of failed cases (case name / shape / dtype / category) and the main error metrics (such as MaxAbsErr) or pytest summary lines.
  - If all passed: clearly state "all passed" and the total number of cases.
- **Key Findings (≥3)**: may match the "Key Findings" in the report or be extracted from it (dtype differences, boundary value behavior, whether the threshold was tightened/relaxed, etc.).
- **Optional**: a table summarizing pass status by dtype (an excerpt; list only summary rows when there are many cases).
- **Standards used**: briefly explain, in one or two sentences, the precision criteria applied (MERE/MARE, open source precision standards for ecological operators) and the Threshold value for each dtype.
- **Path at the end**: after displaying the above content, attach the complete path of `<op_name>_precision_report.md` (and the JSON report if applicable).

NEVER reply with only "report has been generated" and the path; NEVER tell the user to "please open the Markdown file yourself" instead of showing the pass rate and failure summary in the conversation.
## Experience Summary

### Input Generation
- Defined Domain: Be sure to consult the mathematical definition of the operator to ensure legal input
- fp16 Range: The maximum value of fp16 is approximately 65504, and input should not exceed this value
- Shape Size: It is recommended that the number of elements in a single shape ≤ 200K to avoid excessive test time
### Precision Metrics
- MERE / MARE: judgment metrics, with `+ 1e-7` in the denominator (not clamp), aligned with the open source precision standards for ecological operators
- MaxAbsErr / MeanAbsErr: auxiliary analysis, helping to judge the magnitude of deviation
- CosineSim: becomes 0 or NaN when the output is all zeros; mark and explain this rather than judging it a failure
### Threshold Description

- Threshold Source: Open Source Precision Standards for Ecological Operators (`references/OPS_PRECISION_STANDARDS.md`)
- Pass Condition: MERE < Threshold and MARE < 10 × Threshold
- It is not recommended to relax the threshold arbitrarily; if relaxation is indeed necessary, MUST explain the reason in the report
### Anti-Patterns (NEVER)
- NEVER only generate report files without displaying the overview and conclusions in the conversation
- NEVER conceal the number of failed cases and only report the path
## Checklist