AscendC Operator End-to-End Development Orchestration
Skill Type: Process-oriented (seven-stage workflow with serial orchestration of sub-skills)
This skill orchestrates seven sub-skills to take an ascend-kernel operator from scratch to production readiness.
Core Principles
- Seven-stage Serial Execution: Project Initialization → Design Documentation → Test Case Generation → Code Generation & Testing → Interface Documentation → Precision Evaluation → Performance Benchmarking, executed in strict order
- Sub-skill Execution: Each stage MUST call the corresponding sub-skill; never implement a stage yourself
- Stage Gating: Proceed to the next stage only after every checkpoint of the previous stage has passed
- Design-driven Coding: Code generation depends on the Tiling strategy and UB allocation table in the design document
- Automated Design: Users do not need to supply a design document; the design stage generates it automatically
- Unified Test Case Generation: Generate the test case document immediately after the design is complete so that precision evaluation and performance benchmarking can reuse it
- Documentation Closure: After compilation and testing pass, MUST generate a Chinese interface document in PyTorch style and display it in the chat interface
- Precision Closure: An operator is complete only after it passes ≥30 comprehensive precision evaluation cases
- Performance Closure: An operator must pass the msprof performance comparison and benchmarking and produce a performance report
- Result Visualization: Results of Phases 4/5/6/7 MUST be displayed directly in the chat interface in Markdown format; do not output file paths only
Available Sub-skill List
| Skill | Path | Responsibility |
|---|---|---|
| ascendc-operator-project-init | ascendc-operator-project-init/SKILL.md | Detect/create ascend-kernel project, generate operator skeleton directory |
| ascendc-operator-design | ascendc-operator-design/SKILL.md | Analyze operator requirements, generate design document (including Tiling strategy, UB allocation table) |
| ascendc-operator-testcase-gen | ascendc-operator-testcase-gen/SKILL.md | Generate unified test case document based on design document for reuse in precision evaluation and performance benchmarking |
| ascendc-operator-code-gen | ascendc-operator-code-gen/SKILL.md | Generate op_host/op_kernel code, framework adaptation, compilation testing |
| ascendc-operator-compile-debug | ascendc-operator-compile-debug/SKILL.md | Compile, install whl package, generate test files, run precision tests (called internally by code-gen) |
| ascendc-operator-doc-gen | ascendc-operator-doc-gen/SKILL.md | Extract interface information from source code, generate Chinese API documents in PyTorch style (mandatory stage) |
| ascendc-operator-precision-eval | ascendc-operator-precision-eval/SKILL.md | Generate ≥30 precision test cases, run them and output precision verification report (mandatory stage) |
| ascendc-operator-performance-eval | ascendc-operator-performance-eval/SKILL.md | Use msprof to compare performance between project operators and native operators, output performance benchmarking report (mandatory stage) |
Workflow Overview
| Phase | Stage | Sub-skill |
|---|---|---|
| Phase 1 | Project Init | project-init |
| Phase 2 | Design Doc | design |
| Phase 3 | Test Case Gen | testcase-gen |
| Phase 4 | Code Gen + Framework Adaptation + Compile Test | code-gen → compile-debug |
| Phase 5 | Interface Doc | doc-gen |
| Phase 6 | Precision Eval Report | precision-eval |
| Phase 7 | Performance Benchmark Report | performance-eval |

- Input: Operator Name + Function Description
- Output: Production-ready Operator + Test Case Doc + Interface Doc + Precision Report + Performance Report
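
The strict ordering and stage gating described above can be sketched as a small driver loop. This is only an illustration: `Phase`, `PHASES`, `run_pipeline`, and the `run_skill` callback are hypothetical names, and the callback stands in for "call the sub-skill and verify its checkpoints".

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class Phase:
    number: int
    name: str
    sub_skill: str  # sub-skill name as listed in the sub-skill table


# Phase order taken from the workflow overview.
PHASES: List[Phase] = [
    Phase(1, "Project Init", "ascendc-operator-project-init"),
    Phase(2, "Design Doc", "ascendc-operator-design"),
    Phase(3, "Test Case Gen", "ascendc-operator-testcase-gen"),
    Phase(4, "Code Gen + Compile Test", "ascendc-operator-code-gen"),
    Phase(5, "Interface Doc", "ascendc-operator-doc-gen"),
    Phase(6, "Precision Eval", "ascendc-operator-precision-eval"),
    Phase(7, "Performance Benchmark", "ascendc-operator-performance-eval"),
]


def run_pipeline(run_skill: Callable[[Phase], bool]) -> List[int]:
    """Run phases in order and stop at the first phase whose checkpoints fail.

    `run_skill` is a hypothetical callback that invokes the sub-skill and
    returns True only if every checkpoint of that phase passed.
    Returns the numbers of the phases that completed.
    """
    completed: List[int] = []
    for phase in PHASES:
        if not run_skill(phase):
            break  # stage gating: later phases never start
        completed.append(phase.number)
    return completed


# Example: a fake runner in which Phase 4 fails its checkpoints.
result = run_pipeline(lambda phase: phase.number != 4)
print(result)  # [1, 2, 3]
```

Because the loop breaks on the first failure, Phases 5 to 7 cannot run on top of an operator that has not passed Phase 4.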
Anti-pattern List (NEVER DO THESE)
- ❌ Do not skip the design stage and directly write code
- ❌ Do not skip the test case generation stage; Phase 3 (testcase-gen) must be executed after Phase 2 is passed
- ❌ Do not implement any operator code yourself; always call the sub-skills
- ❌ Do not modify framework files (ops.h / register.cpp / CMakeLists.txt) before code generation
- ❌ Do not run compilation and testing manually; always go through the compile-debug skill
- ❌ Do not reference non-existent skills
- ❌ Do not skip checkpoint verification
- ❌ Do not skip the interface documentation stage; Phase 5 must be executed after Phase 4 is passed
- ❌ Do not skip the precision evaluation stage; Phase 6 must be executed after Phase 5 is passed
- ❌ Do not skip the performance benchmarking stage; Phase 7 must be executed after Phase 6 is passed
- ❌ Do not base performance conclusions on any timing method other than msprof
- ❌ Do not design test cases for precision evaluation and performance benchmarking yourself; first read the test case document generated by testcase-gen
Phase 0: Requirements Collection
Goal: Confirm the minimum information set required for operator development, including development environment and operator requirements
Step 0.1: Environment Confirmation (MUST be completed before any development action)
The development environment is a prerequisite for all subsequent stages and must be confirmed first.
CANN Environment
Automatic Detection Process:
- Check whether the CANN environment variable is set
- If set: Use its value directly as the CANN path without asking the user
- If not set: MUST ask the user for the CANN installation path (e.g., /usr/local/Ascend/ascend-toolkit)
Activation Method:
```bash
source ${CANN_PATH}/*/set_env.sh
```
In every Shell session that requires compiling or running operators, this activation command must be executed first.
Conda Environment
Automatic Detection Process:
- Check if a conda environment is currently activated ()
- If activated (value is not and not empty): Use the current environment directly without asking the user
- If not activated or is : MUST ask the user for the name of the conda environment to use
Activation Method:
```bash
conda activate <env_name>
```
In every Shell session that requires compiling or running operators, the conda environment must be activated first.
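
Both detection flows above follow the same pattern: read a value from the environment, use it if it is usable, otherwise ask the user. A minimal sketch of that pattern follows. The variable names are passed in as parameters because they are not spelled out here, and `DEMO_CANN_VAR`, `DEMO_CONDA_VAR`, `resolve_setting`, and `activation_commands` are hypothetical names used only for illustration.

```python
import os
from typing import Callable, Collection, List, Mapping, Optional


def resolve_setting(
    var_name: str,
    ask_user: Callable[[], str],
    not_usable: Collection[str] = (),
    env: Optional[Mapping[str, str]] = None,
) -> str:
    """Return the value of `var_name`, or fall back to asking the user.

    `not_usable` lists values that count as "not activated" (an assumption
    that lets the caller exclude specific environment names).
    """
    env = os.environ if env is None else env
    value = env.get(var_name, "").strip()
    if value and value not in not_usable:
        return value      # detected: use it without asking
    return ask_user()     # missing or unusable: the user MUST be asked


def activation_commands(cann_path: str, conda_env: str) -> List[str]:
    """Build the two activation commands shown above."""
    return [
        f"source {cann_path}/*/set_env.sh",
        f"conda activate {conda_env}",
    ]


# Example with a fake environment: CANN is detected, conda is not.
fake_env = {
    "DEMO_CANN_VAR": "/usr/local/Ascend/ascend-toolkit",
    "DEMO_CONDA_VAR": "",
}
cann = resolve_setting("DEMO_CANN_VAR", lambda: "/asked/path", env=fake_env)
conda = resolve_setting("DEMO_CONDA_VAR", lambda: "asked-env", env=fake_env)
print(activation_commands(cann, conda))
```

Passing the environment as a mapping keeps the detection logic testable without touching the real shell session.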
Environment Confirmation Checkpoints
Step 0.2: Operator Requirements Collection
Mandatory Information to Confirm
| Information | Format Requirement | Mandatory | Description |
|---|---|---|---|
| CANN Environment Path | Absolute path | Yes | Auto-detected from the environment variable; ask the user if it is not set |
| Conda Environment Name | String | Yes | Auto-detected from the active conda environment; ask the user if none is activated |
| Operator Name | snake_case | Yes | e.g., , , |
| Function Description | Text/Mathematical Formula | Yes | e.g., "Inverse hyperbolic cosine acosh(x) = ln(x + sqrt(x²-1))" |
Optional Information (with default values):
| Information | Default Value | Description |
|---|---|---|
| Supported Data Types | float16, float32 | Can be extended to bfloat16 |
| SoC Platform | ascend910b | Auto-obtained via platform API |
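
The mandatory and optional fields above can be captured in one validated record. The sketch below is an assumption about how such a record might look: `OperatorRequest` and the snake_case regular expression are illustrative, and only the field formats and default values come from the tables.

```python
import posixpath
import re
from dataclasses import dataclass, field
from typing import List

# Lowercase words separated by single underscores, e.g. "acosh" or "rms_norm".
_SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


@dataclass
class OperatorRequest:
    cann_path: str
    conda_env: str
    op_name: str
    description: str
    # Defaults from the optional-information table.
    dtypes: List[str] = field(default_factory=lambda: ["float16", "float32"])
    soc: str = "ascend910b"

    def __post_init__(self) -> None:
        if not posixpath.isabs(self.cann_path):
            raise ValueError("CANN path must be an absolute path")
        if not self.conda_env.strip():
            raise ValueError("conda environment name must not be empty")
        if not _SNAKE_CASE.match(self.op_name):
            raise ValueError("operator name must be snake_case")
        if not self.description.strip():
            raise ValueError("function description must not be empty")


# Example: the acosh operator from the function-description row.
request = OperatorRequest(
    cann_path="/usr/local/Ascend/ascend-toolkit",
    conda_env="demo-env",  # hypothetical environment name
    op_name="acosh",
    description="acosh(x) = ln(x + sqrt(x^2 - 1))",
)
print(request.dtypes, request.soc)
```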
Decision Tree
| User Request | Handling Method |
|---|---|
| "Generate X operator" / "Develop X operator" | Complete environment confirmation (Step 0.1) first, then infer the function from the operator name, and execute the full process directly after confirmation |
| "Help me develop a new operator" (no specific name) | Complete environment confirmation (Step 0.1) first, then ask for the operator name and function description |
| "Continue operator development" | Complete environment confirmation (Step 0.1) first, then check existing files to determine the stage and resume from the interrupted point |
Acceptance Criteria
Phase 1: Project Initialization
Called Skill: ascendc-operator-project-init
Execution Content
MANDATORY: Execute according to the ascendc-operator-project-init skill process:
1. Detect if the ascend-kernel project exists
2. Copy from template if it does not exist
3. Create operator skeleton under csrc/ops/<op_name>/
4. Prompt three registration update points
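
As a rough picture of steps 3 and 4, the sketch below creates an operator skeleton and lists the three registration points. It is not the sub-skill's implementation: the file set is inferred from the paths named in Phase 4 and the data-flow section, the placeholder contents are invented, and `create_skeleton` is a hypothetical helper.

```python
import tempfile
from pathlib import Path
from typing import List

# The three framework files that Phase 4 updates later.
REGISTRATION_POINTS = ["csrc/ops.h", "csrc/register.cpp", "csrc/CMakeLists.txt"]


def create_skeleton(project_root: Path, op_name: str) -> List[Path]:
    """Create placeholder files under csrc/ops/<op_name>/ if they are missing.

    Returns the files that were newly created; existing files are left alone.
    """
    op_dir = project_root / "csrc" / "ops" / op_name
    wanted = [
        op_dir / "design.md",
        op_dir / "op_host" / f"{op_name}.cpp",
        op_dir / "op_kernel" / f"{op_name}.cpp",
    ]
    created: List[Path] = []
    for path in wanted:
        path.parent.mkdir(parents=True, exist_ok=True)
        if not path.exists():
            path.write_text("placeholder\n")  # invented placeholder content
            created.append(path)
    return created


# Example run inside a throwaway directory.
with tempfile.TemporaryDirectory() as tmp:
    new_files = create_skeleton(Path(tmp), "acosh")
    names = sorted(p.relative_to(tmp).as_posix() for p in new_files)
    print(names)
    print("Registration points to update later:", REGISTRATION_POINTS)
```

The function is idempotent, so re-running Phase 1 on an existing skeleton does not overwrite work from later phases.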
Checkpoints
All passed → Proceed to Phase 2
Phase 2: Design Document Generation
Called Skill: ascendc-operator-design
Execution Content
MANDATORY: Execute according to the ascendc-operator-design skill process:
1. Analyze operator requirements (name, function, data types)
2. Determine implementation path (AscendC Kernel / CATLASS / ACLNN)
3. Design Tiling strategy (Block-level + UB-level)
4. Fill in UB allocation table, derive bufferCoefficient
5. Generate complete design document to csrc/ops/<op_name>/design.md
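
Steps 3 and 4 can be made concrete with a small calculation. Everything in this sketch is an assumption made for illustration: bufferCoefficient is taken to mean the UB bytes consumed per tile element summed over all buffers, the UB size is assumed to be 192 KiB, and tiles are assumed to be aligned down to 32 bytes. The authoritative definitions are in the ascendc-operator-design skill.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UbBuffer:
    name: str
    count: int           # number of buffers of this kind (2 = double buffering)
    bytes_per_elem: int  # bytes each buffer needs per tile element


def buffer_coefficient(table: List[UbBuffer]) -> int:
    """Assumed definition: total UB bytes needed per tile element."""
    return sum(b.count * b.bytes_per_elem for b in table)


def ub_tile_length(ub_bytes: int, coeff: int, elem_bytes: int,
                   align_bytes: int = 32) -> int:
    """UB-level tiling: largest aligned element count that fits in UB."""
    max_elems = ub_bytes // coeff
    align_elems = align_bytes // elem_bytes
    return max_elems // align_elems * align_elems


def block_split(total_elems: int, num_cores: int) -> List[int]:
    """Block-level tiling: spread elements across cores as evenly as possible."""
    base, rem = divmod(total_elems, num_cores)
    return [base + 1 if i < rem else base for i in range(num_cores)]


# Hypothetical UB allocation table for a float16 elementwise operator.
table = [
    UbBuffer("input queue", count=2, bytes_per_elem=2),
    UbBuffer("output queue", count=2, bytes_per_elem=2),
    UbBuffer("float32 temp", count=1, bytes_per_elem=4),
]
coeff = buffer_coefficient(table)
tile = ub_tile_length(ub_bytes=192 * 1024, coeff=coeff, elem_bytes=2)
print(coeff, tile)            # 12 16384
print(block_split(1000, 3))   # [334, 333, 333]
```

The point of the table is that adding one more temporary buffer changes the coefficient and therefore the tile length, which is why code generation reads both from the design document instead of hard-coding them.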
Checkpoints
All passed → Proceed to Phase 3
Phase 3: Test Case Generation
Called Skill: ascendc-operator-testcase-gen
Execution Content
MANDATORY: Execute according to the ascendc-operator-testcase-gen skill process:
1. Read csrc/ops/<op_name>/design.md, extract parameter constraints, supported dtypes, typical shapes
2. Generate TEST_SHAPES (regular shapes), GENERAL_SHAPES (generalized shapes), BOUNDARY_VALUES (boundary values)
3. Generate operator benchmarks (CPU reference implementation, NPU calling method)
4. Output test case document to csrc/ops/<op_name>/test/<op_name>-test-cases.md
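
A minimal sketch of what steps 2 to 4 produce, using the acosh example from Phase 0. The shape lists and boundary values are invented for illustration, and `acosh_reference`, `doc_path_for`, and `render_case_doc` are hypothetical helpers; only the three section names and the output path come from the steps above.

```python
import math
from typing import List, Tuple

# Hypothetical case lists; the real ones are derived from design.md.
TEST_SHAPES: List[Tuple[int, ...]] = [(1024,), (32, 128)]
GENERAL_SHAPES: List[Tuple[int, ...]] = [(7,), (3, 5, 17)]
BOUNDARY_VALUES: List[float] = [1.0, 1.001, 1.0e4]  # acosh requires x >= 1


def acosh_reference(x: float) -> float:
    """CPU reference written directly from acosh(x) = ln(x + sqrt(x^2 - 1))."""
    return math.log(x + math.sqrt(x * x - 1.0))


def doc_path_for(op_name: str) -> str:
    """Output location named in step 4."""
    return f"csrc/ops/{op_name}/test/{op_name}-test-cases.md"


def render_case_doc(op_name: str) -> str:
    """Render the three case lists as a small Markdown document."""
    sections = {
        "TEST_SHAPES": TEST_SHAPES,
        "GENERAL_SHAPES": GENERAL_SHAPES,
        "BOUNDARY_VALUES": BOUNDARY_VALUES,
    }
    lines = [f"# {op_name} test cases", ""]
    for title, values in sections.items():
        lines.append(f"## {title}")
        lines += [f"- {value}" for value in values]
        lines.append("")
    return "\n".join(lines)


# The reference agrees with the standard library on every boundary value.
for value in BOUNDARY_VALUES:
    assert math.isclose(acosh_reference(value), math.acosh(value),
                        rel_tol=1e-9, abs_tol=1e-12)
print(doc_path_for("acosh"))
print(render_case_doc("acosh"))
```

Keeping the case lists in one document is what lets Phase 6 and Phase 7 reuse identical shapes instead of inventing their own.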
Checkpoints
All passed → Proceed to Phase 4
Phase 4: Code Generation + Framework Adaptation + Compile Test
Called Skill: ascendc-operator-code-gen (internally calls ascendc-operator-compile-debug automatically)
Execution Content
MANDATORY: Execute according to the ascendc-operator-code-gen skill process:
Stage 1: Load Reference Documents
- Read references/GUIDE.md
- Load corresponding reference according to operator type
Stage 2: Read Design Document
- Extract function signature, UB allocation table, calculation pseudocode
Stage 3: Select Template and Generate Code
- Select elementwise / row template
- Generate op_host/<op_name>.cpp (includes Tiling calculation logic)
- Generate op_kernel/<op_name>.cpp (includes Compute calculation logic)
Stage 4: Framework Adaptation
- Update csrc/ops.h (function declaration)
- Update csrc/register.cpp (m.def + m.impl)
- Update csrc/CMakeLists.txt (OP_SRCS + ascendc_library)
Stage 5: Compilation, Installation and Testing (call compile-debug skill)
- Compile via ./build.sh
- Install via pip install whl
- Generate tests/test_<op_name>.py
- Run functional tests and precision tests
- Debug up to 3 times if compilation/test fails
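
The bounded debug loop in Stage 5 can be sketched as follows. This is one reading of "debug up to 3 times": the initial build-and-test run is not counted as a debug attempt. `build_and_test`, `run_once`, and `fix` are hypothetical names, and the callbacks stand in for the real compile, test, and repair steps.

```python
from typing import Callable, List, Optional, Tuple

MAX_DEBUG_ATTEMPTS = 3


def build_and_test(
    run_once: Callable[[], Optional[str]],
    fix: Callable[[str], None],
) -> Tuple[bool, List[str]]:
    """Run build + tests, debugging at most MAX_DEBUG_ATTEMPTS times.

    `run_once` returns None on success or an error message on failure.
    `fix` receives the error message and attempts a repair.
    Returns (success, every error message seen, in order).
    """
    errors: List[str] = []
    error = run_once()
    attempts = 0
    while error is not None and attempts < MAX_DEBUG_ATTEMPTS:
        errors.append(error)
        fix(error)
        attempts += 1
        error = run_once()
    if error is not None:
        errors.append(error)  # still failing: report everything to the user
    return error is None, errors


# Example: fails twice (compile error, then test mismatch), then passes.
outcomes = iter(["compile error", "test mismatch", None])
ok, seen = build_and_test(lambda: next(outcomes), lambda message: None)
print(ok, seen)  # True ['compile error', 'test mismatch']
```

When the loop gives up, the collected messages are what should be reported to the user instead of retrying indefinitely.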
Checkpoints
All passed → Proceed to Phase 5
Phase 5: Interface Document Generation
Called Skill: ascendc-operator-doc-gen
Execution Content
MANDATORY: Execute according to the ascendc-operator-doc-gen skill process:
Stage 1: Information Extraction
- Extract Python calling signature (m.def schema) from register.cpp
- Extract C++ function declaration and return type from ops.h
- Extract algorithm description, parameter description, dtype support, constraint conditions from design.md
- Extract TORCH_CHECK constraints from op_host
- Extract usage examples from tests/test_<op_name>.py
Stage 2: Document Structure Assembly
- Assemble Chinese interface documents in PyTorch official documentation style
- Includes: Title Signature + Function Description + Parameter Description + Supported Data Types + Shape + Constraint Conditions + Usage Examples + Return Value
Stage 3: File Generation
- Generate csrc/ops/<op_name>/README.md
Stage 4: Display complete document content in the interactive interface
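
Stages 1 and 2 amount to pulling strings out of source files and assembling them in a fixed order. The sketch below shows the idea for the m.def schema only. The `register.cpp` fragment is a made-up sample, `extract_schemas` and `assemble_readme` are hypothetical helpers, only a subset of the sections is assembled, and the section titles are in English for readability even though the generated document is Chinese.

```python
import re
from typing import List

# Matches the schema string in calls such as m.def("acosh(Tensor x) -> Tensor");
M_DEF_PATTERN = re.compile(r'm\.def\(\s*"([^"]+)"')


def extract_schemas(register_cpp: str) -> List[str]:
    """Return every schema string passed to m.def in the given source text."""
    return M_DEF_PATTERN.findall(register_cpp)


def assemble_readme(schema: str, description: str, dtypes: List[str],
                    constraints: List[str], example: str) -> str:
    """Assemble a subset of the sections listed in Stage 2, in the same order."""
    lines = [
        f"# {schema}",
        "",
        "## Function Description",
        description,
        "",
        "## Supported Data Types",
        ", ".join(dtypes),
        "",
        "## Constraint Conditions",
    ]
    lines += [f"- {item}" for item in constraints]
    lines += ["", "## Usage Examples", example]
    return "\n".join(lines) + "\n"


# Made-up register.cpp fragment used only for this illustration.
SAMPLE_REGISTER_CPP = '''
TORCH_LIBRARY_FRAGMENT(npu, m) {
    m.def("acosh(Tensor x) -> Tensor");
}
'''

schemas = extract_schemas(SAMPLE_REGISTER_CPP)
readme = assemble_readme(
    schema=schemas[0],
    description="Inverse hyperbolic cosine: acosh(x) = ln(x + sqrt(x^2 - 1)).",
    dtypes=["float16", "float32"],
    constraints=["x >= 1"],
    example="y = acosh(x)  # placeholder call, not the real Python entry point",
)
print(readme)
```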
Checkpoints
All passed → Proceed to Phase 6
Phase 6: Precision Evaluation Report
Called Skill: ascendc-operator-precision-eval
Execution Content
MANDATORY: Execute according to the ascendc-operator-precision-eval skill process:
Stage 1: Load Test Case Document + Information Collection
- Read csrc/ops/<op_name>/test/<op_name>-test-cases.md (output from testcase-gen)
- Extract SUPPORTED_DTYPES, TEST_SHAPES, GENERAL_SHAPES, BOUNDARY_VALUES, operator benchmarks
- Supplement and extract information such as precision thresholds from existing code
Stage 2: Test Case Adaptation ((shapes + boundary) × dtypes ≥ 30 cases)
- Directly reuse TEST_SHAPES and BOUNDARY_VALUES from testcase-gen
- Traverse all dtypes supported by the operator for each shape / boundary value
Stage 3: Test Script Generation (output to operator directory csrc/ops/<op_name>/test/)
- Generate test_<op_name>_precision.py (pytest format) based on template
- Generate run_<op_name>_precision_report.py (report generator) based on template
Stage 4: Execution
- Run pytest; all tests must pass
- Run report generator to output JSON
Stage 5: Report Generation
- Generate <op_name>_precision_report.md (includes regular shape + boundary value table + summary + key findings)
- Tell the user where the report is located
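
The case-count rule in Stage 2, (shapes + boundary) × dtypes ≥ 30, can be checked mechanically. In this sketch the shapes and boundary values are invented, and `build_precision_cases` is a hypothetical helper; in the real flow the lists come from the test case document.

```python
from itertools import product
from typing import List, Sequence, Tuple

MIN_CASES = 30


def build_precision_cases(
    shapes: Sequence[Tuple[int, ...]],
    boundary_values: Sequence[float],
    dtypes: Sequence[str],
) -> List[tuple]:
    """Cross every shape and boundary value with every supported dtype."""
    inputs = [("shape", s) for s in shapes]
    inputs += [("boundary", b) for b in boundary_values]
    cases = [(kind, value, dtype)
             for (kind, value), dtype in product(inputs, dtypes)]
    if len(cases) < MIN_CASES:
        raise ValueError(
            f"only {len(cases)} cases; at least {MIN_CASES} are required")
    return cases


# Hypothetical inputs: 10 shapes + 5 boundary values, 2 dtypes.
shapes = [(2 ** k,) for k in range(4, 14)]
boundaries = [1.0, 1.001, 2.0, 1.0e2, 1.0e4]
cases = build_precision_cases(shapes, boundaries, ["float16", "float32"])
print(len(cases))  # 30
```

A list like `cases` maps naturally onto `pytest.mark.parametrize`, so each combination shows up as its own test result.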
Checkpoints
All passed → Proceed to Phase 7
Phase 7: Performance Benchmarking Report
Called Skill: ascendc-operator-performance-eval
Execution Content
MANDATORY: Execute according to the ascendc-operator-performance-eval skill process:
Stage 1: Load Test Case Document + Information Collection
- Read csrc/ops/<op_name>/test/<op_name>-test-cases.md (output from testcase-gen)
- Extract SUPPORTED_DTYPES, TEST_SHAPES, GENERAL_SHAPES, operator benchmarks
- Supplement and extract information such as OP Type keywords from existing code
Stage 2: Test Case Adaptation (JSONL format, ≥8 cases)
- Select representative shapes from TEST_SHAPES + GENERAL_SHAPES of testcase-gen
- Cover all dtypes supported by the operator
- Convert to JSONL format
Stage 3: Script Generation (output to operator directory csrc/ops/<op_name>/test/)
- Generate run_<op_name>_case.py (single case msprof executor) based on template
- Generate benchmark_<op_name>_msprof.py (master control script) based on template
- Generate <op_name>_cases.jsonl
Stage 4: Execution and Collection
- Run the master control script, 20 iterations per case (first 10 for warm-up)
- Extract Task Duration(us) and hardware metrics from op_summary_*.csv by OP Type
- Output JSON results
Stage 5: Report Generation
- Generate <op_name>_perf_report.md (includes result table + summary + brief analysis)
- Tell the user where the report is located
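
Stages 2 and 4 can be illustrated with the standard library alone. The sketch assumes that the summary CSV has one row per operator launch in execution order and that "first 10 for warm-up" means those rows are discarded; the column names `OP Type` and `Task Duration(us)` are the ones named above, while the `Acosh` OP Type value and the CSV contents are synthetic.

```python
import csv
import io
import json
from statistics import mean
from typing import Dict, List

WARMUP_ITERATIONS = 10  # first 10 of the 20 iterations per case


def cases_to_jsonl(cases: List[Dict]) -> str:
    """Serialize one case per line, as expected for <op_name>_cases.jsonl."""
    return "\n".join(json.dumps(case) for case in cases)


def mean_task_duration(summary_csv: str, op_type_keyword: str,
                       warmup: int = WARMUP_ITERATIONS) -> float:
    """Average Task Duration(us) of matching rows after dropping warm-up rows."""
    reader = csv.DictReader(io.StringIO(summary_csv))
    durations = [float(row["Task Duration(us)"])
                 for row in reader
                 if op_type_keyword in row["OP Type"]]
    measured = durations[warmup:]
    if not measured:
        raise ValueError("no measured iterations left after warm-up")
    return mean(measured)


# Hypothetical cases: 4 shapes x 2 dtypes = 8 cases.
perf_cases = [{"shape": [n], "dtype": dtype}
              for n in (1024, 4096, 16384, 65536)
              for dtype in ("float16", "float32")]
jsonl = cases_to_jsonl(perf_cases)

# Synthetic summary: 10 slow warm-up launches, then 10 steady launches.
rows = ["OP Type,Task Duration(us)"]
rows += ["Acosh,50.0"] * 10
rows += ["Acosh,10.0"] * 10
average = mean_task_duration("\n".join(rows), "Acosh")
print(len(jsonl.splitlines()), average)  # 8 10.0
```

Dropping the warm-up rows keeps one-time costs out of the reported number, which is the reason for the 20-iteration scheme.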
Checkpoints
All passed → Operator development is complete
Inter-stage Data Flow
| Transition | Output of the earlier phase | Input to the later phase |
|---|---|---|
| Phase 1 → Phase 2 | csrc/ops/<op_name>/, design.md (placeholder) | Operator name, directory structure |
| Phase 2 → Phase 3 | design.md (complete) | Parameter constraints, supported dtypes, typical shapes → generate unified test case document |
| Phase 3 → Phase 4 | <op_name>-test-cases.md (test case document for subsequent reuse) | design.md (complete): function signature, UB allocation table → bufferCoefficient; calculation pseudocode → Compute logic; Tiling strategy → Block/UB splitting parameters |
| Phase 4 → Phase 5 | Installed operator whl, tests/test_<op_name>.py | register.cpp / ops.h / design.md / op_host / test files → extract interface information to generate documents |
| Phase 5 → Phase 6 | csrc/ops/<op_name>/README.md (interface document completed) | <op_name>-test-cases.md (from Phase 3); operator name, calling method, input domain constraints; all supported dtypes, precision thresholds → output to csrc/ops/<op_name>/test/ |
| Phase 6 → Phase 7 | Precision report passed, csrc/ops/<op_name>/test/ | <op_name>-test-cases.md (from Phase 3); operator name, project/native calling method; all supported dtypes, OP Type keywords → output to csrc/ops/<op_name>/test/ |
Status Tracking Table
| Phase | Precondition | Called Skill | Key Deliverables |
|---|---|---|---|
| 0. Requirements Collection | None | — | CANN path + Conda environment + Operator name + Function description |
| 1. Project Initialization | Phase 0 | ascendc-operator-project-init | Operator skeleton directory |
| 2. Design Document | Phase 1 | ascendc-operator-design | design.md (includes Tiling + UB allocation table) |
| 3. Test Case Generation | Phase 2 | ascendc-operator-testcase-gen | Unified test case document |
| 4. Code & Testing | Phase 3 | ascendc-operator-code-gen → ascendc-operator-compile-debug | Runnable operator + basic tests passed |
| 5. Interface Document | Phase 4 | ascendc-operator-doc-gen | PyTorch-style Chinese API document (README.md) |
| 6. Precision Evaluation | Phase 5 | ascendc-operator-precision-eval | ≥30 precision test cases + precision report |
| 7. Performance Benchmarking | Phase 6 | ascendc-operator-performance-eval | msprof performance comparison + performance report |
Error Recovery
Resume from Interrupted Point
When the user says "Continue operator development":
| Detection Condition | Determined Stage | Recovery Action |
|---|---|---|
| does not exist | Phase 1 not completed | Start from Phase 1 |
| is placeholder or empty | Phase 2 not completed | Start from Phase 2 |
| csrc/ops/<op_name>/test/<op_name>-test-cases.md does not exist | Phase 3 not completed | Start from Phase 3 |
| still contains skeleton code | Phase 4 not completed | Start from Phase 4 |
| whl package not generated | Phase 4 compilation not completed | Resume from compilation step |
| Basic tests not passed | Phase 4 testing not completed | Resume from testing step |
| csrc/ops/<op_name>/README.md does not exist | Phase 5 not completed | Start from Phase 5 |
| No precision report in | Phase 6 not started | Start from Phase 6 |
| Precision report does not exist or precision tests not all passed | Phase 6 not completed | Resume from Phase 6 |
| Precision report exists but performance report does not | Phase 7 not started | Start from Phase 7 |
| does not exist or is incomplete | Phase 7 not completed | Resume from Phase 7 |
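
The resume logic above is a first-failing-check scan over the phases. The sketch below is an approximation under stated assumptions: the operator directory and design.md locations, the report locations under the test directory, and the `TODO` placeholder marker are guesses made for illustration, and Phase 4 is reduced to a caller-supplied flag because its checks depend on build state. `first_incomplete_phase` is a hypothetical helper.

```python
import tempfile
from pathlib import Path
from typing import Optional

PLACEHOLDER_TEXT = "TODO"  # invented marker for a placeholder design.md


def first_incomplete_phase(project_root: Path, op_name: str,
                           phase4_passed: bool) -> Optional[int]:
    """Return the first phase whose deliverable is missing, or None if done."""
    op_dir = project_root / "csrc" / "ops" / op_name
    design = op_dir / "design.md"
    test_dir = op_dir / "test"
    done = {
        1: op_dir.is_dir(),
        2: design.is_file()
           and design.read_text().strip() not in ("", PLACEHOLDER_TEXT),
        3: (test_dir / f"{op_name}-test-cases.md").is_file(),
        4: phase4_passed,  # skeleton replaced, whl built, basic tests passed
        5: (op_dir / "README.md").is_file(),
        6: (test_dir / f"{op_name}_precision_report.md").is_file(),
        7: (test_dir / f"{op_name}_perf_report.md").is_file(),
    }
    for phase in sorted(done):
        if not done[phase]:
            return phase
    return None


# Example: skeleton and a real design document exist, nothing else does.
with tempfile.TemporaryDirectory() as tmp:
    op_dir = Path(tmp) / "csrc" / "ops" / "acosh"
    op_dir.mkdir(parents=True)
    (op_dir / "design.md").write_text("# acosh design\nTiling strategy ...\n")
    resume_at = first_incomplete_phase(Path(tmp), "acosh", phase4_passed=False)
    print(resume_at)  # 3
```

Scanning in phase order means a missing early deliverable always wins, which matches the rule that later phases are never resumed on top of an incomplete earlier one.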
Compilation/Test Failure
Handled internally by the ascendc-operator-compile-debug skill, with up to 3 debugging attempts. If it still fails after 3 attempts, stop and report the detailed errors to the user.