# AscendC Operator Precision Evaluation

**Skill Type**: Evaluation (Test Generation + Execution + Report Output)
This skill conducts systematic precision evaluation on compiled and installed AscendC operators. Test cases consist of two parts: "regular shape tests" and "boundary value tests". Each case traverses all dtypes supported by the operator, and a structured precision report is output after execution.
## Prerequisites

- The operator has been compiled and installed (the package can be imported or the file exists)
- The operator has passed basic functional tests (the test file exists and passes)
- The operator name, PyTorch calling method, input domain constraints, and all supported dtypes are known
- `csrc/ops/<op_name>/test/<op_name>-test-cases.md` has been generated by `ascendc-operator-testcase-gen` (including SUPPORTED_DTYPES, TEST_SHAPES, BOUNDARY_VALUES, and operator benchmarks)
## Core Process

Phase 1: Load Case Document + Information Collection → Phase 2: Case Adaptation → Phase 3: Test Script Generation → Phase 4: Execution → Phase 5: Report Generation
## Phase 1: Load Case Document + Information Collection

### Step 1.1: Load testcase-gen Case Document (MANDATORY)

MUST first read `csrc/ops/<op_name>/test/<op_name>-test-cases.md` and extract the following from it:
| Extraction Item | Location in Case Document | Purpose |
|---|---|---|
| SUPPORTED_DTYPES | §Test Configuration | List of dtypes for precision test traversal |
| TEST_SHAPES | §Test Configuration | List of shapes for regular shape tests |
| BOUNDARY_VALUES | §Test Configuration | List of scalar values for boundary value tests |
| NPU Calling Method | §Operator Benchmark | NPU_CALL expression |
| CPU Reference Implementation | §Operator Benchmark | CPU_REF expression |
If the case document does not exist: fall back to designing cases yourself (follow the original Phase 2 process), but you MUST note in the report that "cases are self-designed, not generated by testcase-gen".
### Step 1.2: Supplement Information from Code

Supplement and extract the following information from the existing code:
| Information | Source | Example |
|---|---|---|
| Operator Name | User Input / op_host File Name | `acosh` |
| NPU Calling Method | Operator binding code (cross-verify with case document) | |
| CPU Reference Implementation | PyTorch Standard Library (cross-verify with case document) | `torch.acosh(x.cpu().float()).to(dtype)` |
| Input Domain Constraints | Mathematical Definition / design.md | acosh: x ≥ 1 |
| All Supported dtypes | TORCH_CHECK statements (cross-verify with case document) | `[torch.float16, torch.float32]` |
| Supported Dimension/Shape Constraints | design.md / Logic in op_host | elementwise supports arbitrary dimensions |
| Precision Threshold | Open Source Precision Standards for Ecological Operators (see table below) | Look up Threshold by dtype |
### Precision Standards (Open Source Precision Standards for Ecological Operators)

Two metrics, MERE (Mean Relative Error) and MARE (Maximum Relative Error), are used for judgment:

- `Relative Error = abs(actual - golden) / (abs(golden) + 1e-7)`
- `MERE = mean(Relative Error)`
- `MARE = max(Relative Error)`

The `1e-7` term is introduced in the denominator to avoid division by zero when golden is zero.

**Pass Standard**: MERE < Threshold and MARE < 10 × Threshold.
| dtype | Threshold | MERE Upper Limit | MARE Upper Limit (10×) |
|---|---|---|---|
| float16 | 2⁻¹⁰ ≈ 9.77e-4 | 9.77e-4 | 9.77e-3 |
| bfloat16 | 2⁻⁷ ≈ 7.81e-3 | 7.81e-3 | 7.81e-2 |
| float32 | 2⁻¹³ ≈ 1.22e-4 | 1.22e-4 | 1.22e-3 |

For the complete dtype list (including HiFLOAT32, FLOAT8 E4M3/E5M2), see `references/OPS_PRECISION_STANDARDS.md`.
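The pass criterion above can be sketched in plain Python. This is a framework-free illustration, not the template's actual code: in the generated pytest script the same computation runs on torch tensors, and `precision_pass` is a hypothetical helper name.

```python
# Sketch of the MERE/MARE pass check on plain Python lists (not the
# template's torch implementation; `precision_pass` is a made-up name).
THRESHOLDS = {            # MERE upper limits; the MARE limit is 10x each
    "float16":  2 ** -10,    # ~9.77e-4
    "bfloat16": 2 ** -7,     # ~7.81e-3
    "float32":  2 ** -13,    # ~1.22e-4
}

def relative_errors(npu, golden):
    # 1e-7 in the denominator avoids division by zero when golden == 0
    return [abs(a - g) / (abs(g) + 1e-7) for a, g in zip(npu, golden)]

def precision_pass(npu, golden, dtype_name):
    errs = relative_errors(npu, golden)
    mere = sum(errs) / len(errs)
    mare = max(errs)
    thr = THRESHOLDS[dtype_name]
    return mere < thr and mare < 10 * thr, mere, mare
```

For example, `precision_pass([1.0001, 2.0], [1.0, 2.0], "float16")` passes (relative error ~1e-4 is below the 9.77e-4 threshold), while a 10% deviation fails.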
## Phase 2: Case Adaptation

**Prioritize Reusing testcase-gen Output**: If the case document is successfully loaded in Phase 1, directly use its TEST_SHAPES and BOUNDARY_VALUES definitions and do not redesign them. Only design cases yourself when the case document does not exist.
### Design Principles
- Full dtype Coverage: Each shape / each boundary value traverses all dtypes supported by the operator
- Shapes Determined by Operator: Select appropriate shapes according to the dimensions supported by the operator, do not write fixed dimensions
- Shapes Not Too Large: Control the number of elements in a single case within a reasonable range to avoid unnecessary large tensors
- Total Number of Cases = (len(TEST_SHAPES) + len(BOUNDARY_VALUES)) × len(SUPPORTED_DTYPES) ≥ 30
### Part A: Regular Shape Tests (TEST_SHAPES)

According to the dimensions supported by the operator, select suitable shapes from the following dimension pool to form the `TEST_SHAPES` list. MUST select based on the dimensions the operator actually supports; do not select unsupported dimensions.
#### Shape Selection Reference Pool

| Dimension | Recommended Shapes | Applicable Operator Types |
|---|---|---|
| 1D | (128,), (1024,), (4096,), (8192,) | elementwise, reduction |
| 2D | (32, 512), (64, 768), (128, 1024) | elementwise, matmul, linear |
| 3D | (8, 16, 64), (4, 128, 256) | elementwise, attention, conv1d |
| 4D | (4, 8, 32, 16), (2, 64, 32, 32) | conv2d, elementwise |
| 5D | (2, 3, 4, 5, 6) | conv3d, elementwise |
Shapes Not Too Large: It is recommended that the number of elements in a single shape ≤ 200K.
#### TEST_SHAPES Format

```python
TEST_SHAPES = [
    ("category_name", "description", (dim0, dim1, ...)),
    # ...
]
```
Category names can be customized, and it is recommended to name them by dimension or scenario. Example:
```python
# elementwise operator (supports arbitrary dimensions)
TEST_SHAPES = [
    ("1D", "128 elements", (128,)),
    ("1D", "1024 elements", (1024,)),
    ("1D", "4096 elements", (4096,)),
    ("1D", "8192 elements", (8192,)),
    ("2D", "batch*hidden 32x512", (32, 512)),
    ("2D", "BERT-base 64x768", (64, 768)),
    ("2D", "BERT-large 128x1024", (128, 1024)),
    ("3D", "8x16x64", (8, 16, 64)),
    ("3D", "4x128x256", (4, 128, 256)),
    ("4D", "4x8x32x16", (4, 8, 32, 16)),
    ("Production", "ViT 8x197x768", (8, 197, 768)),
    ("Production", "single sample 1x512", (1, 512)),
]
```
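The full case matrix is the Cartesian product of shapes (plus boundary values) and dtypes: with the 12 shapes above, 4 acosh boundary values, and 2 supported dtypes, (12 + 4) × 2 = 32 ≥ 30. A minimal sketch of this expansion, using dtype names as plain strings rather than torch dtypes (the lists here are shortened stand-ins for illustration):

```python
from itertools import product

# Shortened stand-in configuration (plain-string dtypes, 3 shapes, 2 values)
SUPPORTED_DTYPES = ["float16", "float32"]
TEST_SHAPES = [
    ("1D", "128 elements", (128,)),
    ("2D", "batch*hidden 32x512", (32, 512)),
    ("3D", "8x16x64", (8, 16, 64)),
]
BOUNDARY_VALUES = [
    ("domain lower bound x=1.0", 1.0),
    ("large value x=1000.0", 1000.0),
]

# Each shape / each boundary value traverses every supported dtype
shape_cases = [(cat, desc, shape, dt)
               for (cat, desc, shape), dt in product(TEST_SHAPES, SUPPORTED_DTYPES)]
boundary_cases = [(desc, val, dt)
                  for (desc, val), dt in product(BOUNDARY_VALUES, SUPPORTED_DTYPES)]

# Total = (len(TEST_SHAPES) + len(BOUNDARY_VALUES)) * len(SUPPORTED_DTYPES)
total = len(shape_cases) + len(boundary_cases)
```

With this shortened configuration `total` is (3 + 2) × 2 = 10; the real configuration must reach at least 30.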
### Part B: Boundary Value Tests (BOUNDARY_VALUES)

Select key boundary points and typical values based on the input domain of the operator, and test each with a fixed small shape. MUST determine boundary values according to the mathematical definition of the operator; they vary greatly across operators.
#### Boundary Value Design Guidelines

| Operator Type | Recommended Boundary Values |
|---|---|
| acosh (x≥1) | x=1.0, x=1.001, x=10.0, x=1000.0 |
| log (x>0) | x=0.001, x=1.0, x=100.0, x=10000.0 |
| sigmoid (full domain) | x=0.0, x=-5.0, x=5.0, x=-20.0, x=20.0 |
| sqrt (x≥0) | x=0.0, x=0.001, x=1.0, x=10000.0 |
| No Domain Restrictions | x=0.0, x=1.0, x=-1.0, x=100.0 |
#### BOUNDARY_VALUES Format

```python
BOUNDARY_VALUES = [
    ("description", scalar_value),
    # ...
]
```
Example (acosh):

```python
BOUNDARY_VALUES = [
    ("domain lower bound x=1.0", 1.0),
    ("near boundary x=1.001", 1.001),
    ("moderate value x=10.0", 10.0),
    ("large value x=1000.0", 1000.0),
]
```
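Because boundary values must respect the operator's mathematical domain, it is cheap to assert this before launching any NPU case. A hedged sketch, not part of the templates: `check_boundary_domain` and the `domain_check` predicate are hypothetical helpers, instantiated here for acosh's x ≥ 1 domain.

```python
def check_boundary_domain(boundary_values, domain_check):
    """Return the (description, value) pairs that fall outside the domain.

    Hypothetical helper, not from the skill templates.
    """
    return [(desc, v) for desc, v in boundary_values if not domain_check(v)]

BOUNDARY_VALUES = [
    ("domain lower bound x=1.0", 1.0),
    ("near boundary x=1.001", 1.001),
    ("moderate value x=10.0", 10.0),
    ("large value x=1000.0", 1000.0),
]

# acosh is defined for x >= 1; all four values above satisfy the domain
violations = check_boundary_domain(BOUNDARY_VALUES, lambda v: v >= 1.0)
assert not violations, f"boundary values outside domain: {violations}"
```

Running the same check with an out-of-domain value such as x=0.5 would return it in the violations list, catching a bad case design before it turns into an NPU RuntimeError.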
## Phase 3: Test Script Generation

MUST first read the template files in the `templates/` directory, replace the placeholders, and generate the scripts into the `test/` subdirectory of the operator directory.
### Output Directory

```
csrc/ops/<op_name>/test/
├── test_<op_name>_precision.py          ← pytest Test
├── run_<op_name>_precision_report.py    ← Report Generator
├── <op_name>_precision_report.json      ← JSON Report (generated after execution)
└── <op_name>_precision_report.md        ← Markdown Report (generated after execution)
```
### Template Files

| Template | Path | Generation Target |
|---|---|---|
| pytest Test | `templates/test_op_precision_template.py` | `csrc/ops/<op_name>/test/test_<op_name>_precision.py` |
| Report Generator | `templates/run_precision_report_template.py` | `csrc/ops/<op_name>/test/run_<op_name>_precision_report.py` |
### Placeholder Replacement Table

| Placeholder (as named in the template files) | Example (acosh) |
|---|---|
| Operator Name | `acosh` |
| NPU Calling Expression (using variable `x`) | |
| CPU Reference Baseline (using variables `x`, `dtype`) | `torch.acosh(x.cpu().float()).to(dtype)` |
| List of All Supported dtypes | `[torch.float16, torch.float32]` |
| Lower Bound of Random Input Domain | |
| Upper Bound of Random Input Domain | |
| List of Regular Shapes (Output of Phase 2 Part A) | See the TEST_SHAPES example above |
| List of Boundary Values (Output of Phase 2 Part B) | See the BOUNDARY_VALUES example above |
### Must-Collect Precision Metrics

**Judgment Metrics** (used for pass/fail judgment):

| Metric | Calculation Method | Pass Condition |
|---|---|---|
| MERE | `((npu - ref).abs() / (ref.abs() + 1e-7)).mean()` | < Threshold |
| MARE | `((npu - ref).abs() / (ref.abs() + 1e-7)).max()` | < 10 × Threshold |
**Auxiliary Metrics** (used for analysis, not as judgment basis):

| Metric | Calculation Method | Meaning |
|---|---|---|
| MaxAbsErr | `(npu - ref).abs().max()` | Maximum Absolute Error |
| MeanAbsErr | `(npu - ref).abs().mean()` | Mean Absolute Error |
| CosineSim | `F.cosine_similarity(npu.flatten(), ref.flatten())` | Cosine Similarity |
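The judgment and auxiliary metrics together can be sketched framework-free. This is an illustration only (`precision_metrics` is a hypothetical name, the template computes the same values on torch tensors, and `F.cosine_similarity` is replaced here by an explicit dot product, returning `None` for an all-zero vector as flagged in the Experience Summary):

```python
import math

def precision_metrics(npu, golden):
    # Hypothetical sketch on plain Python lists, not the template's torch code
    abs_err = [abs(a - g) for a, g in zip(npu, golden)]
    rel_err = [e / (abs(g) + 1e-7) for e, g in zip(abs_err, golden)]
    dot = sum(a * g for a, g in zip(npu, golden))
    na = math.sqrt(sum(a * a for a in npu))
    ng = math.sqrt(sum(g * g for g in golden))
    return {
        "MERE": sum(rel_err) / len(rel_err),        # judgment: < Threshold
        "MARE": max(rel_err),                       # judgment: < 10 x Threshold
        "MaxAbsErr": max(abs_err),                  # auxiliary
        "MeanAbsErr": sum(abs_err) / len(abs_err),  # auxiliary
        # cosine similarity is undefined for an all-zero vector: mark, don't fail
        "CosineSim": dot / (na * ng) if na > 0 and ng > 0 else None,
    }
```

Identical tensors yield MERE = MARE = 0 and CosineSim = 1; an all-zero pair yields `CosineSim = None`, which should be marked and explained rather than judged a failure.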
## Phase 4: Test Execution

### 4.1 Environment Preparation

```bash
source /usr/local/Ascend/ascend-toolkit/set_env.sh
export PATH=/root/miniconda3/envs/py310/bin:$PATH
```
MUST source the environment before each Shell call.
### 4.2 Execute pytest

```bash
cd <project_root>
python3 -m pytest csrc/ops/<op_name>/test/test_<op_name>_precision.py -v --tb=short
```

### 4.3 Generate Report

```bash
python3 csrc/ops/<op_name>/test/run_<op_name>_precision_report.py
```
### 4.4 Failure Handling

| Failure Type | Troubleshooting Direction |
|---|---|
| RuntimeError (NPU kernel) | Input data exceeds the defined domain / NPU does not support this dtype |
| AssertionError (Precision) | Check whether MERE/MARE slightly exceeds the Threshold; analyze whether boundary values are the cause |
| Individual dtype cases FAIL | Confirm the Threshold for this dtype is correct; check whether MARE is concentrated in a few abnormal points |
| Large numbers of cases FAIL | Check for bugs in the operator Compute logic |
### 4.5 In-Depth Troubleshooting of Precision Issues

When precision failures occur (allclose not passed, excessive output deviation, all-zero/NaN output) and simple threshold adjustment cannot solve the problem, MUST read and follow the `ascendc-operator-precision-debug` skill process for systematic root cause location:

- Read the SKILL.md of `ascendc-operator-precision-debug`
- Execute its five-phase process: Error Analysis → Code Review → Experimental Isolation → Instrumentation Location → Fix Verification
- After fixing, re-run the complete precision test of this skill to confirm all tests pass
Note: Only invoke that skill when precision issues cannot be resolved by threshold adjustment. When individual dtypes slightly exceed the threshold due to hardware precision characteristics, prefer relaxing the threshold (and explain this in the report).
## Phase 5: Report Generation

### 5.1 Markdown Report

MUST generate `csrc/ops/<op_name>/test/<op_name>_precision_report.md`, referring to `templates/precision_report_template.md`.
The report includes:
- Overview Table: Total cases, passed cases, failed cases, pass rate (percentage).
- Precision Threshold Standard Table
- Regular Shape Test Results Table (grouped by category)
- Boundary Value Test Results Table
- Summary Statistics by dtype
- Key Findings (≥3 conclusions)
### 5.2 Completion Prompt (File + Conversation)

- **File**: MUST generate `csrc/ops/<op_name>/test/<op_name>_precision_report.md` (and the JSON report in the same directory if the script outputs it), and provide the complete paths to the user:

      Precision verification report has been generated:
      csrc/ops/<op_name>/test/<op_name>_precision_report.md
      csrc/ops/<op_name>/test/<op_name>_precision_report.json  # if it exists

- **Current Conversation**: MUST also comply with the next section, "Display Results in Conversation"; never output only the path.
## Display Results in Conversation (MANDATORY)

After pytest and the report script have run and the Markdown/JSON report is generated, the assistant MUST, in the reply of the current conversation:

- **Paste readable conclusions** (so users can grasp the results without opening files):
  - Overview: total number of cases, number passed, number failed, pass rate (percentage).
  - If there are failures: list the identifiers of failed cases (case name / shape / dtype / category) and the main error metrics (such as MaxAbsErr) or pytest summary lines.
  - If all passed: clearly state "all passed" and the total number of cases.
- **Key Findings (≥3)**: may match the "Key Findings" in the report or be extracted from it (dtype differences, boundary value behavior, whether the threshold was tightened/relaxed, etc.).
- **Optional**: a table summarizing pass status by dtype (an excerpt; list only summary rows when there are many cases).
- **Standards used**: briefly explain, in one or two sentences, the precision criteria applied (MERE/MARE, open source precision standards for ecological operators) and the Threshold value for each dtype.
- **Path at the end**: after displaying the above content, attach the complete path of `<op_name>_precision_report.md` (and the JSON report if applicable).

NEVER reply with only "report has been generated" and the path; NEVER tell the user to "please open the Markdown file yourself" instead of showing the pass rate and failure summary in the conversation.
## Experience Summary

### Input Generation
- Defined Domain: Be sure to consult the mathematical definition of the operator to ensure legal input
- fp16 Range: The maximum value of fp16 is approximately 65504, and input should not exceed this value
- Shape Size: It is recommended that the number of elements in a single shape ≤ 200K to avoid excessive test time
### Precision Metrics
- MERE / MARE: judgment metrics, with `+ 1e-7` in the denominator (not clamp), aligned with the open source precision standards for ecological operators
- MaxAbsErr / MeanAbsErr: auxiliary analysis, helping to judge the magnitude of deviation
- CosineSim: becomes 0 or NaN when the output is all zeros; mark and explain this rather than judging it a failure
### Threshold Description

- Threshold Source: Open Source Precision Standards for Ecological Operators (`references/OPS_PRECISION_STANDARDS.md`)
- Pass Condition: MERE < Threshold and MARE < 10 × Threshold
- It is not recommended to relax the threshold arbitrarily; if relaxation is indeed necessary, MUST explain the reason in the report
### Anti-Patterns (NEVER)
- NEVER only generate report files without displaying the overview and conclusions in the conversation
- NEVER conceal the number of failed cases and only report the path
## Checklist