Catlass Operator End-to-End Development Orchestration
Skill Type: Process-oriented (six-phase workflow; Catlass Source Code Preparation is integrated into Phase 1, sub-skills are orchestrated serially)
This skill orchestrates the development of Catlass operators on ascend-kernel from scratch to production readiness. General capabilities (project skeleton, compilation debugging, interface documentation, precision, performance) reuse the ascendc-* sub-skills, while Catlass-specific work (source tree, design, Device/Host implementation) uses the catlass-* sub-skills.
Core Principles
- Six-phase Serial Execution: Project Initialization (including Catlass source code) → Design Documentation → Code Generation & Compilation Testing → Interface Documentation → Precision Evaluation → Performance Evaluation, executed in strict order
- Sub-skill Execution: Each phase MUST open and follow the corresponding sub-skill, and shall not implement alternatives on its own
- Phase Gating: Enter the next phase only after all checkpoints of the previous phase are passed
- Design-Driven Coding: Code generation depends on the finalized design document from catlass-operator-design and on the example selected from catlass/examples
- No Pre-written Design Documentation Required: The design phase generates and saves the document via catlass-operator-design
- Documentation Closed Loop: After passing compilation testing, MUST generate PyTorch-style Chinese interface documentation (Phase 4) and display it in the chat interface
- Precision Closed Loop: The operator must pass ≥30 comprehensive precision evaluation cases (Phase 5) to be considered complete
- Performance Closed Loop: The operator must complete torch_npu.profiler comparison evaluation and output a performance report (Phase 6); conclusions are subject to ascendc-operator-performance-eval
- Result Visualization: Key results of Phase 3/4/5/6 MUST be directly displayed in the chat interface in Markdown or other formats, do not only output paths
- Operator Naming: The operator name (snake_case) MUST contain the substring catlass, consistent with the convention of existing Catlass operators in ascend-kernel
- Honest Shutdown: When unable to continue due to environment or dependencies, stop after explaining the specific reason and completed steps
Catlass Compilation and Operation (Error-prone Summary)
- Build: ; CMake uses Python with torch_npu (e.g., / ); matches the chip (see
catlass-operator-code-gen/references/compile-catlass.md
); CANN can be bundle root + .
- pytest / torch_npu: If is reported:
export ASCEND_RUNTIME_PATH="${ASCEND_TOOLKIT_HOME}/runtime"
.
- Design/Code: Align with examples that can be compiled with and , details see .
Available Sub-skill List
| Skill | Path | Responsibilities |
|---|---|---|
| ascendc-operator-project-init | ascendc-operator-project-init/SKILL.md | Detect/create ascend-kernel, generate operator skeleton in csrc/ops/<op_name>/ |
| (no separate skill) | (Steps within Phase 1) | Clone catlass under ASCEND_KERNEL_ROOT (same level as csrc/) to make catlass/include and catlass/examples available |
| catlass-operator-design | catlass-operator-design/SKILL.md | Convert Catlass requirements into finalized design documentation (recommended: csrc/ops/<op_name>/design.md) |
| catlass-operator-code-gen | catlass-operator-code-gen/SKILL.md | Implement op_host / op_kernel and framework adaptation according to design.md and catlass/examples, and internally call the compilation testing skill |
| ascendc-operator-compile-debug | ascendc-operator-compile-debug/SKILL.md | Compile, install whl, generate/run tests/test_<op_name>.py (called in Phase 5 of catlass-operator-code-gen; do not bypass code-gen and claim completion) |
| ascendc-operator-doc-gen | ascendc-operator-doc-gen/SKILL.md | Generate PyTorch-style Chinese API documentation (mandatory phase) |
| ascendc-operator-precision-eval | ascendc-operator-precision-eval/SKILL.md | ≥30 precision test cases and precision verification report (mandatory phase) |
| ascendc-operator-performance-eval | ascendc-operator-performance-eval/SKILL.md | JSONL test cases + torch_npu.profiler (warmup/active=5) + summary, output custom vs benchmark Markdown report (mandatory phase) |
| catlass-operator-performance-optim | catlass-operator-performance-optim/SKILL.md | Optional after delivery: perform tiling/performance iteration according to Catlass documentation; after code changes, re-run the closed loop starting from Phase 3 |
Project Directory Terminology (Aligned with AscendC)
| Term | Meaning |
|---|---|
| ASCEND_KERNEL_ROOT | ascend-kernel root directory; the parent of csrc/ and, after Phase 1, catlass/ |
| Operator Directory | <ASCEND_KERNEL_ROOT>/csrc/ops/<op_name>/ |
| Catlass Source Code | <ASCEND_KERNEL_ROOT>/catlass/ (must not be cloned inside the operator directory) |
Workflow Overview
┌─────────────────────────────┐ ┌──────────────┐ ┌───────────────────────────┐ ┌──────────────────┐ ┌──────────────────┐ ┌──────────────────┐
│ Phase 1 │ │ Phase 2 │ │ Phase 3 │ │ Phase 4 │ │ Phase 5 │ │ Phase 6 │
│ Project Init + Catlass Src │──▶│ Catlass Design │──▶│ Code Gen+Framework Adapt+Compile Test │──▶│ Interface Doc Gen │──▶│ Precision Eval Report │──▶│ Performance Eval Report │
│ project-init + clone │ │ catlass- │ │ catlass-code-gen → │ │ doc-gen │ │ precision-eval │ │ performance-eval│
│ catlass │ │ design │ │ compile-debug │ │ │ │ │ │ (profiler) │
└─────────────────────────────┘ └──────────────┘ └───────────────────────────┘ └──────────────────┘ └──────────────────┘ └──────────────────┘
Input: Operator name (contains catlass) + Function description + Environment confirmation
Output: Deliverable operator + README + Precision report + Profiler performance report
Anti-pattern List (NEVER DO THESE)
- ❌ Do not skip Catlass Source Code Preparation (do not start design or code generation without catlass/include and catlass/examples)
- ❌ Do not clone Catlass inside the operator directory; it must be cloned to catlass/ under the project root
- ❌ Do not skip the design phase and directly write kernel/host code
- ❌ Do not implement the entire operator on your own without following the catlass-operator-code-gen process
- ❌ Do not modify framework registration without authorization before code generation (follow the conventions of project-init / code-gen)
- ❌ Do not manually replace the build, install, and basic-test closed loop that compile-debug is responsible for (it should be triggered via Phase 5 of code-gen)
- ❌ Do not skip the interface documentation phase (Phase 4 must be executed after Phase 3 passes)
- ❌ Do not skip the precision evaluation phase (Phase 5 must be executed after Phase 4 passes)
- ❌ Do not skip the performance evaluation phase (Phase 6 must be executed after Phase 5 passes)
- ❌ Do not use a collection method inconsistent with ascendc-operator-performance-eval as the final performance conclusion
- ❌ Do not reference non-existent skills
Phase 0: Requirements Collection
Goal: Confirm the minimum information set and operating environment for Catlass operator development (aligned with ascendc-operator-dev Phase 0, with additional Catlass naming constraints).
Step 0.1: Environment Confirmation (MUST be completed before any development actions)
CANN Environment
- Check ()
- Already set: Use as , no need to ask repeatedly
- Not set: MUST ask the user for the CANN path (e.g., /usr/local/Ascend/ascend-toolkit)

```bash
source ${CANN_PATH}/*/set_env.sh
```
Conda Environment
- Check
- Activated and not : Use directly
- Not activated or is : MUST ask the user for the conda environment name
```bash
conda activate <env_name>
```
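
The two checks above can be sketched as a small pre-flight helper. This is a minimal sketch, not part of any sub-skill. It assumes that a configured CANN install is visible through `ASCEND_TOOLKIT_HOME` (the variable used in the runtime-path export earlier in this document), that conda reports the active environment through `CONDA_DEFAULT_ENV`, and that the `base` environment counts as not usable. The function name `check_environment` is hypothetical.

```python
import os


def check_environment(env=None):
    """Return the questions that still have to be put to the user.

    Assumptions (not confirmed by the sub-skills): ASCEND_TOOLKIT_HOME marks
    a configured CANN install, and CONDA_DEFAULT_ENV names the active conda
    environment, where "base" is treated as not activated.
    """
    env = os.environ if env is None else env
    questions = []

    # CANN: reuse the path if it is already set, otherwise ask once.
    if not env.get("ASCEND_TOOLKIT_HOME"):
        questions.append("Please provide the CANN installation path.")

    # Conda: an unset variable and the base environment both require asking.
    conda_env = env.get("CONDA_DEFAULT_ENV", "")
    if conda_env in ("", "base"):
        questions.append("Please provide the conda environment name.")

    return questions
```

An empty list means Step 0.1 can proceed without asking the user anything; passing a plain dictionary instead of `os.environ` makes the helper easy to exercise in isolation.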
Environment Confirmation Checkpoints
Step 0.2: Operator Requirements Collection
| Information | Format Requirement | Mandatory | Description |
|---|---|---|---|
| CANN Path | Absolute path | Yes | Same as ascendc, can be detected automatically |
| Conda Environment | String | Yes | Same as ascendc, can be detected automatically |
| Operator Name | snake_case, contains catlass | Yes | e.g., |
| Function Description | Text/Formula/Benchmark Example | Yes | Consistent with Catlass capability scope |
Optional: supported dtypes and SoC; default values can follow catlass-operator-design / the platform API.
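
The operator-name constraint can be checked mechanically before any other work starts. The following is a minimal sketch under one assumption: "snake_case" is taken to mean lowercase letters and digits joined by single underscores, starting with a letter. The function name and the example names in the usage note are hypothetical.

```python
import re

# Assumed snake_case definition: lowercase words of letters/digits joined by
# single underscores, with a leading letter.
_SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def is_valid_catlass_op_name(name: str) -> bool:
    """Check both naming constraints: snake_case and contains "catlass"."""
    return bool(_SNAKE_CASE.fullmatch(name)) and "catlass" in name
```

With this definition, a hypothetical name such as `catlass_demo_op` is accepted, while `CatlassDemoOp` (not snake_case) and `demo_op` (no `catlass` substring) are rejected.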
Decision Tree
| User Request | Handling Method |
|---|---|
| "Develop/generate a Catlass operator" | Complete Step 0.1 → Verify name contains catlass → Confirm function → Execute full workflow |
| "Continue Catlass operator development" | Complete Step 0.1 → Detect current phase according to Error Recovery and resume execution |
Acceptance Criteria
Phase 1: Project Initialization + Catlass Source Code Preparation
Step 1.1: Project Skeleton
Call Skill: ascendc-operator-project-init
MANDATORY: Execute according to ascendc-operator-project-init:
1. Detect or create ascend-kernel
2. Create operator skeleton in csrc/ops/<op_name>/
3. Prompt registration update points (to be implemented by catlass-operator-code-gen later)
Checkpoints (Step 1.1)
Step 1.2: Catlass Source Code
This step does not correspond to an independent skill file, but must be executed in accordance with the following requirements.
Prerequisite: Step 1.1 is completed
Execution Content
- Ensure catlass/ exists under ASCEND_KERNEL_ROOT, containing include/ and examples/
- If it does not exist: MUST run the following command in the project root (do not clone inside the operator directory)
git clone https://gitcode.com/cann/catlass.git catlass
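
The Step 1.2 check can be sketched as follows. This is an illustrative helper, not part of any sub-skill: the function names are hypothetical, and it assumes that `include/` and `examples/` directly under `<ASCEND_KERNEL_ROOT>/catlass/` are the two subdirectories that design and code generation need. It only builds the clone command and leaves running it to the caller.

```python
from pathlib import Path


def catlass_source_status(kernel_root):
    """Report whether <kernel_root>/catlass is usable for design and code-gen.

    Returns "ready" when include/ and examples/ both exist, "incomplete" when
    catlass/ exists but one of them is missing, and "missing" otherwise.
    """
    catlass = Path(kernel_root) / "catlass"
    if not catlass.is_dir():
        return "missing"
    required = ("include", "examples")
    if all((catlass / sub).is_dir() for sub in required):
        return "ready"
    return "incomplete"


def clone_command(kernel_root):
    """Build the Step 1.2 clone command; cwd is the project root."""
    return {
        "cwd": str(Path(kernel_root)),
        "argv": ["git", "clone", "https://gitcode.com/cann/catlass.git", "catlass"],
    }
```

A caller could pass `argv` and `cwd` to `subprocess.run` when the status is "missing", then call `catlass_source_status` again to confirm the checkpoint.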
Checkpoints (Step 1.2)
All Phase 1 checkpoints passed → Enter Phase 2
Phase 2: Catlass Design Documentation
Execution Content
MANDATORY: Execute according to catlass-operator-design:
1. Analyze requirements and Catlass component boundaries
2. Align with implementable paths of catlass/examples and catlass/include
3. Finalize the design and save it to the recommended path: csrc/ops/<op_name>/design.md (the same path that doc-gen / precision-eval / performance-eval read from)
Checkpoints
All checkpoints passed → Enter Phase 3
Phase 3: Code Generation + Framework Adaptation + Compilation Testing
Call Skill: catlass-operator-code-gen (Phase 5 MUST call ascendc-operator-compile-debug)
Execution Content
MANDATORY: Execute according to catlass-operator-code-gen (aligned with ascendc-operator-code-gen phase structure):
Phase 1: Load GUIDE / references (including compile-catlass, chapters aligned with ascendc code-gen)
Phase 2: Read design.md, lock catlass/examples path and type system
Phase 3: Generate op_kernel + op_host, register Catlass compilation options in CMake (BUILD_CATLASS_MODULE, CATLASS_ARCH, etc. see compile-catlass.md)
Phase 4: Framework adaptation — ops.h, register.cpp, csrc/CMakeLists.txt
Phase 5: Compilation installation and testing — call ascendc-operator-compile-debug (build.sh, pip install, tests/test_<op_name>.py, failure troubleshooting subject to this skill)
Checkpoints
All checkpoints passed → Enter Phase 4
Phase 4: Interface Documentation Generation
Execution Content
MANDATORY: Execute according to ascendc-operator-doc-gen:
- Extract interface information from register.cpp, ops.h, design.md, op_host, tests
- Generate csrc/ops/<op_name>/README.md (PyTorch-style Chinese)
- Display document key points or full text in chat interface
Checkpoints
All checkpoints passed → Enter Phase 5
Phase 5: Precision Evaluation Report
Call Skill: ascendc-operator-precision-eval
Execution Content
MANDATORY: Execute according to ascendc-operator-precision-eval:
- Number of test cases ≥ 30, covering shapes × dtypes × boundaries
- Output to csrc/ops/<op_name>/test/, generate Markdown precision report
- Display overview, failure summary and key findings in chat interface (do not only provide paths)
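
One way to reach the required case count is to enumerate the shapes × dtypes × boundaries grid and check its size before running anything. The sketch below is only an illustration: the shape, dtype, and boundary values are hypothetical placeholders (real values would come from design.md), and the case layout is not the format that ascendc-operator-precision-eval prescribes.

```python
from itertools import product

# Hypothetical values for illustration; real ones come from design.md.
SHAPES = [(1, 1), (16, 16), (17, 31), (128, 256), (1024, 1024)]
DTYPES = ["float16", "bfloat16", "float32"]
BOUNDARIES = ["normal", "zeros", "large_magnitude"]


def build_cases(shapes=SHAPES, dtypes=DTYPES, boundaries=BOUNDARIES):
    """Enumerate the shapes x dtypes x boundaries grid as case dictionaries."""
    grid = product(shapes, dtypes, boundaries)
    return [
        {"id": i, "shape": shape, "dtype": dtype, "boundary": boundary}
        for i, (shape, dtype, boundary) in enumerate(grid)
    ]


def check_case_count(cases, minimum=30):
    """Enforce the >= 30 case requirement before the evaluation runs."""
    if len(cases) < minimum:
        raise ValueError(f"need at least {minimum} cases, got {len(cases)}")
    return len(cases)
```

With the placeholder values above the grid has 5 × 3 × 3 = 45 cases, which clears the threshold; a smaller grid raises an error instead of silently producing an incomplete report.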
Checkpoints
FAIL Closed Loop: Root cause analysis → Revise design (Phase 2) or code (Phase 3) → Retest via Phase 4, Phase 5
All checkpoints passed → Enter Phase 6
Phase 6: Performance Evaluation Report
Call Skill: ascendc-operator-performance-eval
Execution Content
MANDATORY: Follow ascendc-operator-performance-eval/SKILL.md exclusively for the details:
- Maintain JSONL test cases in csrc/ops/<op_name>/test/; read design.md before generation
- Use torch_npu.profiler, warmup=5, active=5
- Summarize indicators such as ASCEND_PROFILER_OUTPUT/op_statistic.csv, output custom operator vs benchmark Markdown report
- Display comparison table and brief conclusion in chat interface
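
JSONL stores one JSON object per line, which keeps the test-case file easy to append to and to diff. The helper below is a minimal sketch of writing and reading such a file. The field names in the usage note are hypothetical, and the actual schema is whatever ascendc-operator-performance-eval defines.

```python
import json
from pathlib import Path


def write_cases(path, cases):
    """Write one JSON object per line (JSONL)."""
    with Path(path).open("w", encoding="utf-8") as f:
        for case in cases:
            f.write(json.dumps(case) + "\n")


def read_cases(path):
    """Read a JSONL file back, skipping blank lines."""
    with Path(path).open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]
```

For example, `write_cases("cases.jsonl", [{"shape": [128, 256], "dtype": "float16"}])` followed by `read_cases("cases.jsonl")` returns the same list of dictionaries.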
Checkpoints
All checkpoints passed → Catlass operator main workflow completed
Optional Post-delivery: Performance Optimization
Call Skill: catlass-operator-performance-optim
Must ask user whether to enter tuning; do not skip the question by default.
- User agrees → Modify tiling/implementation according to catlass-operator-performance-optim; any code change → Re-run the workflow starting from Phase 3 (Phase 3→4→5→6) until it meets the standards again
- User refuses → End
Inter-phase Data Flow
Phase 1 Output Phase 2 Input
ascend-kernel + ops/<op>/skeleton Operator name, catlass/ available
+ catlass/include, examples ────▶
Phase 2 Output Phase 3 Input
design.md (finalized) ────▶ Example path, type and Host contract
Phase 3 Output Phase 4 Input
Installed whl + test_<op>.py ────▶ register.cpp / ops.h / design.md / op_host
Phase 4 Output Phase 5 Input
README.md ────▶ Interface, dtype, constraints, calling method
Phase 5 Output Phase 6 Input
Precision passed + Report ────▶ Operator name, benchmark API, JSONL and profiler process
Phase 6 Output
Performance report (profiler) ────▶ Optional: Enter catlass-operator-performance-optim after user confirmation
Status Tracking Table
| Phase | Prerequisites | Called Skill / Action | Key Deliverables |
|---|---|---|---|
| 0. Requirements Collection | None | None | CANN + Conda + operator name (contains catlass) + Function description |
| 1. Project + Catlass | Phase 0 | ascendc-operator-project-init + clone Catlass into the root directory | Skeleton + Catlass source tree |
| 2. Design | Phase 1 | catlass-operator-design | csrc/ops/<op_name>/design.md |
| 3. Code & Testing | Phase 2 | catlass-operator-code-gen → ascendc-operator-compile-debug | Runnable operator + basic test passed |
| 4. Interface Documentation | Phase 3 | ascendc-operator-doc-gen | csrc/ops/<op_name>/README.md |
| 5. Precision Evaluation | Phase 4 | ascendc-operator-precision-eval | ≥30 cases + Precision report |
| 6. Performance Evaluation | Phase 5 | ascendc-operator-performance-eval | JSONL + Profiler report |
| (Optional) Tuning | Phase 6 + User confirmation | catlass-operator-performance-optim | Iterated implementation and report |
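
The prerequisite column of this table amounts to a simple gate: a phase may start only once the phase before it has passed all of its checkpoints. A minimal sketch of that rule follows; the function names are hypothetical, and `passed` is assumed to be the set of phase numbers whose checkpoints have all passed.

```python
# Prerequisite of each phase, taken from the status tracking table.
PREREQUISITE = {0: None, 1: 0, 2: 1, 3: 2, 4: 3, 5: 4, 6: 5}


def can_enter(phase, passed):
    """A phase may start only when its prerequisite phase has fully passed."""
    prerequisite = PREREQUISITE[phase]
    return prerequisite is None or prerequisite in passed


def next_phase(passed):
    """Return the first phase that has not passed, or None when all are done."""
    for phase in sorted(PREREQUISITE):
        if phase not in passed:
            return phase
    return None


def phases_to_rerun_after_code_change():
    """Any code change during tuning re-runs the loop from Phase 3 onward."""
    return [3, 4, 5, 6]
```

For instance, with Phases 0 and 1 passed, `next_phase({0, 1})` returns 2 and `can_enter(3, {0, 1})` is false, so code generation cannot start before the design is finalized.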
Error Recovery
Resume from Breakpoint
When the user says "Continue Catlass operator development":
| Detection Condition | Determined Phase | Recovery Action |
|---|---|---|
| csrc/ops/<op_name>/ does not exist | Phase 1 incomplete | Start from Phase 1 Step 1.1 |
| catlass/ does not exist | Phase 1 incomplete | Complete Step 1.2 cloning |
| design.md is empty or a placeholder | Phase 2 incomplete | Start from Phase 2 |
| op_host / op_kernel are still skeleton or inconsistent with the design | Phase 3 incomplete | Start from Phase 3 |
| whl not installed or tests/test_<op_name>.py fails | Phase 3 incomplete | Recover within compile-debug workflow |
| No README.md | Phase 4 incomplete | Start from Phase 4 |
| No precision report in csrc/ops/<op_name>/test/ or precision not fully passed | Phase 5 incomplete | Recover from Phase 5 |
| No performance report or does not meet performance-eval requirements | Phase 6 incomplete | Recover from Phase 6 |
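
The file-based rows of this table can be approximated with a few existence checks. The sketch below is deliberately partial and rests on stated assumptions: it uses the paths named elsewhere in this document (`csrc/ops/<op_name>/`, `catlass/`, `design.md`, `README.md`), treats only an empty `design.md` as a placeholder, and takes the Phase 3 result as a caller-supplied flag because skeleton detection, whl installation, and test status need project-specific inspection. Report checks for Phases 5 and 6 are not attempted. The function name is hypothetical.

```python
from pathlib import Path


def detect_resume_phase(kernel_root, op_name, phase3_passed=False):
    """Return where to resume, based on which deliverables already exist."""
    root = Path(kernel_root)
    op_dir = root / "csrc" / "ops" / op_name

    if not op_dir.is_dir():
        return "Phase 1, Step 1.1"
    if not (root / "catlass").is_dir():
        return "Phase 1, Step 1.2"

    # Only an empty or whitespace-only design.md is treated as a placeholder.
    design = op_dir / "design.md"
    if not design.is_file() or not design.read_text(encoding="utf-8").strip():
        return "Phase 2"

    # Phase 3 status (real code, installed whl, passing test) is supplied by
    # the caller; this sketch does not inspect it.
    if not phase3_passed:
        return "Phase 3"

    if not (op_dir / "README.md").is_file():
        return "Phase 4"

    # Precision and performance reports must be inspected separately.
    return "Phase 5 or later"
```

The checks run in phase order, so the first missing deliverable determines the result, which mirrors the top-to-bottom reading of the table.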
Compilation/Testing Failure
Handled by ascendc-operator-compile-debug (triggered via catlass-operator-code-gen); retry and troubleshooting limits are subject to compile-debug skill.