Catlass Operator End-to-End Development Orchestration
Skill Type: Process-oriented (six-phase workflow; Catlass Source Code Preparation is incorporated into Phase 1, sub-skills are orchestrated serially)
This skill orchestrates the development of a Catlass operator on ascend-kernel, from scratch to production-ready. General capabilities (project skeleton, compilation debugging, interface documentation, precision, performance) reuse the ascendc-* sub-skills; Catlass-specific work (source code tree, design, Device/Host implementation) uses the catlass-* sub-skills.
Core Principles
- Six-phase serial execution: Project initialization (including Catlass source code) → Design documentation → Code generation & compilation testing → Interface documentation → Precision evaluation → Performance evaluation, executed in strict order
- Sub-skill execution: Each phase MUST open and follow the corresponding sub-skill; do not substitute your own implementation for it
- Phase gating: Enter the next phase only after all checkpoints of the previous phase are passed
- Design-driven coding: Code generation depends on the finalized design.md from catlass-operator-design and on the selected catlass/examples reference
- No need for users to pre-write design documents: The design phase generates and saves the design document via catlass-operator-design
- Documentation closed loop: After passing compilation testing, MUST generate PyTorch-style Chinese interface documentation (Phase 4) and display it in the chat interface
- Precision closed loop: The operator must pass ≥30 comprehensive precision evaluation cases (Phase 5) to be considered completed
- Performance closed loop: The operator must complete the comparative evaluation with torch_npu.profiler and output a performance report (Phase 6); the conclusion shall be based on ascendc-operator-performance-eval
- Result visualization: Key results of Phases 3/4/5/6 MUST be displayed directly in the chat interface as Markdown (or another readable format); do not output file paths only
- Operator naming: The operator name (snake_case) MUST contain the substring catlass, consistent with the convention of existing Catlass operators in ascend-kernel
- Honest shutdown: When unable to continue due to environment or dependencies, explain the specific reason and completed steps before stopping
Catlass Compilation and Operation (Error-Prone Summary)
- Build: ; CMake uses Python with torch_npu (such as / ); must match the chip (see catlass-operator-code-gen/references/compile-catlass.md); CANN can be the bundle root + .
- pytest / torch_npu: If is reported, run: export ASCEND_RUNTIME_PATH="${ASCEND_TOOLKIT_HOME}/runtime"
- Design/Code: Consistent with the compilable examples in and , details see .
Available Sub-Skill List
| Skill | Path | Responsibility |
|---|---|---|
| ascendc-operator-project-init | ascendc-operator-project-init/SKILL.md | Detect/create ascend-kernel; generate the operator skeleton in csrc/ops/<op_name>/ |
| (no standalone skill) | (Step within Phase 1) | Clone Catlass into ASCEND_KERNEL_ROOT (at the same level as csrc/) to make catlass/include and catlass/examples available |
| catlass-operator-design | catlass-operator-design/SKILL.md | Convert Catlass requirements into a finalized design document (recommended path: csrc/ops/<op_name>/design.md) |
| catlass-operator-code-gen | catlass-operator-code-gen/SKILL.md | Implement op_host / op_kernel and framework adaptation according to design.md and catlass/examples; internally calls the compilation testing skill |
| ascendc-operator-compile-debug | ascendc-operator-compile-debug/SKILL.md | Compile, install the whl package, generate/run tests/test_<op_name>.py (called from Phase 5 of catlass-operator-code-gen; do not bypass code-gen and claim completion) |
| ascendc-operator-doc-gen | ascendc-operator-doc-gen/SKILL.md | Generate the PyTorch-style Chinese API document (mandatory phase) |
| ascendc-operator-precision-eval | ascendc-operator-precision-eval/SKILL.md | ≥30 precision test cases and a precision verification report (mandatory phase) |
| ascendc-operator-performance-eval | ascendc-operator-performance-eval/SKILL.md | JSONL test cases + torch_npu.profiler (warmup/active=5) + summary; outputs a Markdown report comparing the custom operator with the benchmark operator (mandatory phase) |
| catlass-operator-performance-optim | catlass-operator-performance-optim/SKILL.md | Optional after delivery: iterate on tiling/performance according to the Catlass documentation; after any code change, re-run the closed loop starting from Phase 3 |
Project Directory Terminology (Aligned with AscendC)
| Term | Meaning |
|---|---|
| ASCEND_KERNEL_ROOT | Root directory of ascend-kernel: contains , , |
| Operator Directory | <ASCEND_KERNEL_ROOT>/csrc/ops/<op_name>/ |
| Catlass Source Code | <ASCEND_KERNEL_ROOT>/catlass/ (Prohibited to clone in ) |
Workflow Overview
┌────────────────────────────┐   ┌────────────────┐   ┌────────────────────────────────────────────────┐   ┌───────────────────┐   ┌───────────────────────┐   ┌─────────────────────────┐
│ Phase 1                    │   │ Phase 2        │   │ Phase 3                                        │   │ Phase 4           │   │ Phase 5               │   │ Phase 6                 │
│ Project Init + Catlass Src │──▶│ Catlass Design │──▶│ Code Gen + Framework Adaptation + Compile Test │──▶│ Interface Doc Gen │──▶│ Precision Eval Report │──▶│ Performance Eval Report │
│ project-init + clone       │   │ catlass-design │   │ catlass-code-gen →                             │   │ doc-gen           │   │ precision-eval        │   │ performance-eval        │
│ catlass                    │   │                │   │ compile-debug                                  │   │                   │   │                       │   │ (profiler)              │
└────────────────────────────┘   └────────────────┘   └────────────────────────────────────────────────┘   └───────────────────┘   └───────────────────────┘   └─────────────────────────┘
Input: Operator name (contains catlass) + Function description + Environment confirmation
Output: Deliverable operator + README + Precision report + Profiler performance report
Anti-Pattern List (NEVER DO THESE)
- ❌ Do not skip Catlass Source Code Preparation (do not proceed with design or code generation without catlass/include and catlass/examples)
- ❌ Do not clone Catlass in , must be in project root under
- ❌ Do not skip the design phase and directly write kernel/host code
- ❌ Do not implement the entire operator independently without following the catlass-operator-code-gen process
- ❌ Do not modify framework registration without permission before code generation (follow the conventions of project-init / code-gen)
- ❌ Do not manually replace the compile, install, and basic-test closed loop that compile-debug is responsible for (it should be triggered via code-gen Phase 5)
- ❌ Do not skip the interface documentation phase (Phase 4 must be executed after Phase 3 passes)
- ❌ Do not skip the precision evaluation phase (Phase 5 must be executed after Phase 4 passes)
- ❌ Do not skip the performance evaluation phase (Phase 6 must be executed after Phase 5 passes)
- ❌ Do not use collection methods inconsistent with ascendc-operator-performance-eval as the final performance conclusion
- ❌ Do not reference non-existent skills
Phase 0: Requirements Collection
Objective: Confirm the minimum information set and operating environment for Catlass operator development (aligned with ascendc-operator-dev Phase 0, plus Catlass naming constraints).
Step 0.1: Environment Confirmation (MUST be completed before any development actions)
CANN Environment
- Check (run )
- Already set: Use as , no need to ask repeatedly
- Not set: MUST ask the user for the CANN path (e.g., /usr/local/Ascend/ascend-toolkit)
```bash
# Load the CANN environment variables from the confirmed CANN path
source "${CANN_PATH}"/*/set_env.sh
```
Conda Environment
- Check
- Activated and not : Use directly
- Not activated or is : MUST ask the user for the conda environment name
```bash
# Activate the conda environment confirmed above
conda activate <env_name>
```
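The environment checks above can be sketched as a small helper that decides which questions still need to be put to the user. This is only an illustration: the variable names `ASCEND_TOOLKIT_HOME` and `CONDA_DEFAULT_ENV`, and treating `base` as the "not usable" conda environment, are assumptions made for the example, and `check_environment` is a hypothetical function, not part of any sub-skill.

```python
from typing import Mapping


def check_environment(env: Mapping[str, str],
                      cann_var: str = "ASCEND_TOOLKIT_HOME",   # assumed variable name
                      conda_var: str = "CONDA_DEFAULT_ENV",     # assumed variable name
                      default_env: str = "base") -> dict:       # assumed default env
    """Report detected settings and what must still be asked of the user."""
    cann_path = env.get(cann_var) or None
    conda_env = env.get(conda_var) or None
    ask_user = []
    if cann_path is None:
        # CANN variable not set: the path has to come from the user.
        ask_user.append("CANN path")
    if conda_env is None or conda_env == default_env:
        # No environment active, or only the default one: ask for a name.
        conda_env = None
        ask_user.append("conda environment name")
    return {"cann_path": cann_path, "conda_env": conda_env, "ask_user": ask_user}
```

Calling `check_environment(os.environ)` returns an empty `ask_user` list when both values were detected, in which case nothing needs to be asked again.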
Environment Confirmation Checkpoints
Step 0.2: Operator Requirements Collection
| Information | Format Requirement | Mandatory | Description |
|---|---|---|---|
| CANN Path | Absolute path | Yes | Aligned with ascendc; can be detected automatically |
| Conda Environment | String | Yes | Aligned with ascendc; can be detected automatically |
| Operator Name | snake_case, contains catlass | Yes | e.g., |
| Function Description | Text/Formula/Benchmark Example | Yes | Consistent with the Catlass capability scope |
Optional: supported dtypes and SoC; default values can follow catlass-operator-design / the platform APIs.
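The operator-name requirement (snake_case, containing catlass) is easy to check mechanically. A minimal sketch follows; the function name `validate_operator_name` and the exact snake_case pattern are assumptions for illustration, not something defined by the sub-skills.

```python
import re

# Lowercase alphanumeric words joined by single underscores, starting with a letter.
_SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(_[a-z0-9]+)*")


def validate_operator_name(name: str) -> list:
    """Return a list of problems; an empty list means the name is acceptable."""
    problems = []
    if not _SNAKE_CASE.fullmatch(name):
        problems.append("name is not snake_case")
    if "catlass" not in name:
        problems.append("name does not contain the substring 'catlass'")
    return problems
```

For a hypothetical name such as `catlass_matmul` the list is empty, while a name without the substring is reported so the user can be asked for a corrected one before Phase 1 starts.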
Decision Tree
| User Request | Handling Method |
|---|---|
| "Develop/generate a certain Catlass operator" | Complete Step 0.1 → Validate that the name contains catlass → Confirm the function → Execute the full workflow |
| "Continue Catlass operator development" | Complete Step 0.1 → Detect the current phase according to Error Recovery and resume |
Acceptance Criteria
Phase 1: Project Initialization + Catlass Source Code Preparation
Step 1.1: Project Skeleton
Call Skill: ascendc-operator-project-init
MANDATORY: Execute according to ascendc-operator-project-init:
1. Detect or create ascend-kernel
2. Create operator skeleton in csrc/ops/<op_name>/
3. Prompt registration update points (to be implemented by catlass-operator-code-gen later)
Checkpoints (Step 1.1)
Step 1.2: Catlass Source Code
This step does not correspond to an independent skill file, but must be executed according to the following requirements.
Prerequisite: Step 1.1 is completed
Execution Content
- Ensure catlass/ exists under ASCEND_KERNEL_ROOT, and contains catlass/include and catlass/examples
- If not exists: MUST execute in the project root (Prohibited to clone in )
git clone https://gitcode.com/cann/catlass.git catlass
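Whether the clone produced a usable tree can be verified before entering Phase 2. The sketch below checks only for the two directories this document relies on (catlass/include and catlass/examples); `missing_catlass_dirs` is a hypothetical helper name.

```python
from pathlib import Path

# Sub-directories of catlass/ that the later design and code-gen phases read.
REQUIRED_SUBDIRS = ("include", "examples")


def missing_catlass_dirs(kernel_root) -> list:
    """List required Catlass directories that are absent under kernel_root."""
    catlass = Path(kernel_root) / "catlass"
    return [f"catlass/{name}" for name in REQUIRED_SUBDIRS
            if not (catlass / name).is_dir()]
```

An empty result means the Step 1.2 prerequisite is met; a non-empty result names exactly what is still missing, which is useful for the "honest shutdown" explanation if cloning is not possible.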
Checkpoints (Step 1.2)
All Phase 1 checkpoints passed → Enter Phase 2
Phase 2: Catlass Design Document
Execution Content
MANDATORY: Execute according to catlass-operator-design:
1. Analyze requirements and Catlass component boundaries
2. Align with the implementable paths of catlass/examples and catlass/include
3. Finalize the design and save it to the recommended path: csrc/ops/<op_name>/design.md (the same path that doc-gen / precision-eval / performance-eval read from)
Checkpoints
All checkpoints passed → Enter Phase 3
Phase 3: Code Generation + Framework Adaptation + Compile Test
Call Skill: catlass-operator-code-gen (Phase 5 MUST call ascendc-operator-compile-debug)
Execution Content
MANDATORY: Execute according to catlass-operator-code-gen (aligned with the phase structure of ascendc-operator-code-gen):
Phase 1: Load GUIDE / references (including compile-catlass, chapters aligned with ascendc code-gen)
Phase 2: Read design.md, lock the catlass/examples path and type system
Phase 3: Generate op_kernel + op_host, register Catlass compilation options in CMake (BUILD_CATLASS_MODULE, CATLASS_ARCH, etc.; see compile-catlass.md)
Phase 4: Framework adaptation: ops.h, register.cpp, csrc/CMakeLists.txt
Phase 5: Compile, install and test: call ascendc-operator-compile-debug (build.sh, pip install, tests/test_<op_name>.py; error troubleshooting follows that skill)
Checkpoints
All checkpoints passed → Enter Phase 4
Phase 4: Interface Document Generation
Execution Content
MANDATORY: Execute according to ascendc-operator-doc-gen:
- Extract interface information from register.cpp, ops.h, design.md, op_host, tests
- Generate csrc/ops/<op_name>/README.md (PyTorch-style Chinese)
- Display document key points or full text in the chat interface
Checkpoints
All checkpoints passed → Enter Phase 5
Phase 5: Precision Evaluation Report
Call Skill: ascendc-operator-precision-eval
Execution Content
MANDATORY: Execute according to ascendc-operator-precision-eval:
- Number of test cases ≥30, covering shapes × dtypes × boundaries
- Output to csrc/ops/<op_name>/test/, generate Markdown precision report
- Display overview, failure summary and key findings in the chat interface (do not only provide paths)
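One way to reach the ≥30-case floor systematically is to take the Cartesian product of shapes, dtypes, and boundary conditions. The sketch below is illustrative only: the (M, N, K) shapes, dtype names, and boundary labels are made-up placeholders for a matmul-like operator, and the real case design is governed by ascendc-operator-precision-eval.

```python
from itertools import product

# Placeholder axes for illustration; replace with values from design.md.
SHAPES = [(1, 1, 1), (16, 16, 16), (33, 65, 17),
          (128, 256, 64), (512, 512, 512), (1024, 1024, 1024)]
DTYPES = ["float16", "bfloat16"]
BOUNDARIES = ["normal", "zeros", "large_magnitude"]

MIN_CASES = 30  # floor required by the precision closed loop


def build_cases() -> list:
    """Enumerate every shape x dtype x boundary combination as one test case."""
    return [
        {"id": index, "shape": shape, "dtype": dtype, "boundary": boundary}
        for index, (shape, dtype, boundary)
        in enumerate(product(SHAPES, DTYPES, BOUNDARIES))
    ]


cases = build_cases()
# Fail early if the matrix is too small to satisfy the phase requirement.
assert len(cases) >= MIN_CASES
```

With six shapes, two dtypes, and three boundary conditions the matrix yields 36 cases, so removing an axis value is immediately flagged if the total drops below 30.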
Checkpoints
FAIL Closed Loop: Root cause analysis → Revise design (Phase 2) or code (Phase 3) → Re-test via Phase 4, Phase 5
All checkpoints passed → Enter Phase 6
Phase 6: Performance Evaluation Report
Call Skill: ascendc-operator-performance-eval
Execution Content
MANDATORY: Treat the ascendc-operator-performance-eval SKILL.md as the sole authoritative specification:
- Maintain JSONL test cases in csrc/ops/<op_name>/test/; read design.md before generation
- Use torch_npu.profiler, warmup=5, active=5
- Summarize metrics from profiler outputs such as ASCEND_PROFILER_OUTPUT/op_statistic.csv, and output a Markdown report comparing the custom operator with the benchmark
- Display comparison table and brief conclusion in the chat interface
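The JSONL case file and the comparison table shown in chat can be handled with very little code. The following is a sketch under assumptions: the JSONL field names and the `(case, custom_us, baseline_us)` row layout are hypothetical, the timings must come from the real profiler run, and the actual collection procedure is defined only by ascendc-operator-performance-eval.

```python
import json


def parse_jsonl(text: str) -> list:
    """Parse JSONL content: one JSON object per non-empty line."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


def render_comparison(rows) -> str:
    """Render (case, custom_us, baseline_us) rows as a Markdown table.

    Speedup is baseline time divided by custom time, so values above 1
    mean the custom operator is faster than the benchmark.
    """
    lines = [
        "| Case | Custom (us) | Baseline (us) | Speedup |",
        "|---|---|---|---|",
    ]
    for case, custom_us, baseline_us in rows:
        speedup = baseline_us / custom_us
        lines.append(
            f"| {case} | {custom_us:.2f} | {baseline_us:.2f} | {speedup:.2f}x |"
        )
    return "\n".join(lines)
```

The rendered string can be pasted directly into the chat and into the Markdown report, which keeps the two views of the result identical.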
Checkpoints
All checkpoints passed → Catlass operator main workflow is completed
Post-Delivery Optional: Performance Optimization
Call Skill: catlass-operator-performance-optim
Must ask the user whether to enter tuning; do not skip the question by default.
- User agrees → Modify tiling/implementation according to catlass-operator-performance-optim; any code change → Re-run from Phase 3 (Phase 3→4→5→6) until it meets the standards again
- User refuses → End
Inter-Phase Data Flow
Phase 1 Output                                Phase 2 Input
ascend-kernel + ops/<op>/skeleton    ────▶    Operator name, catlass/ is referenceable
+ catlass/include, examples

Phase 2 Output                                Phase 3 Input
design.md (finalized)                ────▶    Example path, type and Host contract

Phase 3 Output                                Phase 4 Input
Installed whl + test_<op>.py         ────▶    register.cpp / ops.h / design.md / op_host

Phase 4 Output                                Phase 5 Input
README.md                            ────▶    Interface, dtype, constraints, calling method

Phase 5 Output                                Phase 6 Input
Precision passed + Report            ────▶    Operator name, benchmark API, JSONL and profiler workflow

Phase 6 Output
Performance Report (profiler)        ────▶    Optional: Enter catlass-operator-performance-optim after user confirmation
Status Tracking Table
| Phase | Prerequisite | Called Skill / Action | Key Deliverables |
|---|---|---|---|
| 0. Requirements Collection | None | (none) | CANN + Conda + operator name (contains catlass) + Function description |
| 1. Project + Catlass | Phase 0 | ascendc-operator-project-init + git clone in the root directory | Skeleton + Catlass source code tree |
| 2. Design | Phase 1 | catlass-operator-design | design.md |
| 3. Code & Test | Phase 2 | catlass-operator-code-gen → ascendc-operator-compile-debug | Runnable operator + Basic test passed |
| 4. Interface Doc | Phase 3 | ascendc-operator-doc-gen | README.md |
| 5. Precision Eval | Phase 4 | ascendc-operator-precision-eval | ≥30 cases + Precision report |
| 6. Performance Eval | Phase 5 | ascendc-operator-performance-eval | JSONL + Profiler report |
| (Optional) Tuning | Phase 6 + User Confirmation | catlass-operator-performance-optim | Iterated implementation and report |
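The prerequisite column amounts to strict serial gating: a phase may start only when every earlier phase has passed its checkpoints. A minimal sketch of that rule is shown here; the phase identifiers and the `next_phase` helper are hypothetical names chosen for the example.

```python
from typing import Optional, Set

# Mandatory phases in execution order (optional tuning is excluded).
PHASES = [
    "requirements",      # Phase 0
    "project_init",      # Phase 1
    "design",            # Phase 2
    "code_and_test",     # Phase 3
    "interface_doc",     # Phase 4
    "precision_eval",    # Phase 5
    "performance_eval",  # Phase 6
]


def next_phase(passed: Set[str]) -> Optional[str]:
    """Return the first phase whose checkpoints have not passed.

    Because execution is strictly serial, a later phase recorded as passed
    does not unlock anything while an earlier phase is still open.
    Returns None when the main workflow is complete.
    """
    for phase in PHASES:
        if phase not in passed:
            return phase
    return None
```

Note that `next_phase({"requirements", "design"})` still returns `project_init`: a design document alone does not let the workflow skip Phase 1.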
Error Recovery
Resume from Breakpoint
When the user says "Continue Catlass operator development":
| Detection Condition | Determined Phase | Recovery Action |
|---|---|---|
| Operator skeleton does not exist | Phase 1 not completed | Start from Phase 1 Step 1.1 |
| catlass/ does not exist | Phase 1 not completed | Complete Step 1.2 cloning |
| design.md is empty or a placeholder | Phase 2 not completed | Start from Phase 2 |
| op_host / op_kernel are still skeletons or inconsistent with the design | Phase 3 not completed | Start from Phase 3 |
| whl not installed or the basic test fails | Phase 3 not completed | Recover within the compile-debug workflow |
| No README.md | Phase 4 not completed | Start from Phase 4 |
| No precision report in csrc/ops/<op_name>/test/ or precision not fully passed | Phase 5 not completed | Recover from Phase 5 |
| No performance report or does not meet performance-eval requirements | Phase 6 not completed | Recover from Phase 6 |
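The file-existence rows of the table can be evaluated automatically, while the rows that need judgment (whether the code still is a skeleton, whether tests, precision, and performance passed) have to be supplied by the caller. The sketch below assumes the directory layout used elsewhere in this document (csrc/ops/<op_name>/design.md and README.md, catlass/ in the root); `detect_resume_phase` and its flag names are hypothetical.

```python
from pathlib import Path
from typing import Optional


def detect_resume_phase(kernel_root, op_name: str, *,
                        build_and_test_ok: bool = False,
                        precision_ok: bool = False,
                        performance_ok: bool = False) -> Optional[int]:
    """Return the first unfinished phase (1-6), or None if all are done."""
    root = Path(kernel_root)
    op_dir = root / "csrc" / "ops" / op_name
    # Phase 1: operator skeleton and Catlass source tree must both exist.
    if not op_dir.is_dir() or not (root / "catlass").is_dir():
        return 1
    # Phase 2: design.md must exist and be non-empty (placeholder detection
    # beyond emptiness needs human or model judgment and is not attempted).
    design = op_dir / "design.md"
    if not design.is_file() or not design.read_text(encoding="utf-8").strip():
        return 2
    # Phase 3: implementation, whl install, and basic test, as reported by caller.
    if not build_and_test_ok:
        return 3
    # Phase 4: interface documentation.
    if not (op_dir / "README.md").is_file():
        return 4
    # Phases 5 and 6: evaluation results, as reported by caller.
    if not precision_ok:
        return 5
    if not performance_ok:
        return 6
    return None
```

The conservative defaults mean that, without explicit confirmation from the caller, the helper never resumes later than Phase 3.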
Compile/Test Failure
Handled by ascendc-operator-compile-debug (triggered via catlass-operator-code-gen); retry and troubleshooting limits are governed by the compile-debug skill.