Catlass Operator End-to-End Development Orchestration

Skill Type: Process-oriented (six-phase workflow; Catlass Source Code Preparation is incorporated into Phase 1, sub-skills are orchestrated serially)

This skill orchestrates the development of Catlass operators on ascend-kernel from scratch to production readiness. General capabilities (project skeleton, compilation debugging, interface documentation, precision, performance) reuse the ascendc-* sub-skills; Catlass-specific work (source code tree, design, Device/Host implementation) uses the catlass-* sub-skills.

Core Principles

- Six-phase serial execution: Project initialization (including Catlass source code) → Design documentation → Code generation & compilation testing → Interface documentation → Precision evaluation → Performance evaluation, executed in strict order
- Sub-skill execution: Each phase MUST open and follow the corresponding sub-skill; do not substitute your own implementation for it
- Phase gating: Enter the next phase only after all checkpoints of the previous phase have passed
- Design-driven coding: Code generation depends on the finalized `design.md` from catlass-operator-design and on the example selected from catlass/examples
- No need for users to pre-write design documents: The design phase generates and saves the design document via catlass-operator-design
- Documentation closed loop: After compilation testing passes, MUST generate PyTorch-style Chinese interface documentation (Phase 4) and display it in the chat interface
- Precision closed loop: The operator must pass ≥30 comprehensive precision evaluation cases (Phase 5) to be considered complete
- Performance closed loop: The operator must complete the comparative evaluation with torch_npu.profiler and output a performance report (Phase 6); the conclusion shall be based on ascendc-operator-performance-eval
- Result visualization: Key results of Phases 3/4/5/6 MUST be displayed directly in the chat interface as Markdown or another readable format; do not output only file paths
- Operator naming: `op_name` (snake_case) MUST contain the substring `catlass`, consistent with the convention of existing Catlass operators in ascend-kernel
- Honest shutdown: When unable to continue due to environment or dependency problems, explain the specific reason and the completed steps before stopping

Catlass Compilation and Execution (Common Pitfalls)

Build: Set `BUILD_CATLASS_MODULE=ON`; CMake must use a Python interpreter that has torch_npu (for example via `-DPYTHON_EXECUTABLE` / `ASCEND_BUILD_PYTHON`); `CATLASS_ARCH` must match the chip (see catlass-operator-code-gen/references/compile-catlass.md); CANN can be the bundle root + `cann-*/set_env.sh`.

pytest / torch_npu: If an error about `ASCEND_RUNTIME_PATH` is reported, set it as follows:

export ASCEND_RUNTIME_PATH="${ASCEND_TOOLKIT_HOME}/runtime"

Design/Code: Stay consistent with the compilable examples in `catlass/include` and `catlass/examples`; see compile-catlass.md for details.

Available Sub-Skill List

Skill	Path	Responsibility
`ascendc-operator-project-init`	`ascendc-operator-project-init/SKILL.md`	Detect/create ascend-kernel, generate operator skeleton in `csrc/ops/<op_name>/`
—	(Step within Phase 1)	Clone `catlass/` in ASCEND_KERNEL_ROOT (at the same level as `csrc/` ) to make `include/` and `examples/` available
`catlass-operator-design`	`catlass-operator-design/SKILL.md`	Convert Catlass requirements into finalized design documents (recommended path: `csrc/ops/<op_name>/design.md` )
`catlass-operator-code-gen`	`catlass-operator-code-gen/SKILL.md`	Implement op_host / op_kernel and framework adaptation according to `design.md` and catlass/examples, and internally call the compilation testing skill
`ascendc-operator-compile-debug`	`ascendc-operator-compile-debug/SKILL.md`	Compile, install the whl package, and generate/run `tests/test_<op_name>.py` (called in Phase 5 of catlass-operator-code-gen; do not bypass code-gen and claim completion)
`ascendc-operator-doc-gen`	`ascendc-operator-doc-gen/SKILL.md`	Generate PyTorch-style Chinese API document `README.md` (mandatory phase)
`ascendc-operator-precision-eval`	`ascendc-operator-precision-eval/SKILL.md`	≥30 precision test cases and precision verification report (mandatory phase)
`ascendc-operator-performance-eval`	`ascendc-operator-performance-eval/SKILL.md`	JSONL test cases + torch_npu.profiler (warmup/active=5) + `op_statistic.csv` summary, output Markdown report of custom vs benchmark operators (mandatory phase)
`catlass-operator-performance-optim`	`catlass-operator-performance-optim/SKILL.md`	Optional after delivery: Perform tiling/performance iteration according to Catlass documentation; after code changes, re-run the closed loop starting from Phase 3

Project Directory Terminology (Aligned with AscendC)

Term	Meaning
ASCEND_KERNEL_ROOT	Root directory of ascend-kernel; contains `build.sh`, `CMakeLists.txt`, `csrc/`
Operator Directory	`<ASCEND_KERNEL_ROOT>/csrc/ops/<op_name>/`
Catlass Source Code	`<ASCEND_KERNEL_ROOT>/catlass/` (cloning into `csrc/ops/<op>/` is prohibited)

Workflow Overview

┌────────────────┐   ┌────────────────┐   ┌───────────────────────────┐   ┌───────────────────┐   ┌────────────────┐   ┌──────────────────┐
│ Phase 1        │   │ Phase 2        │   │ Phase 3                   │   │ Phase 4           │   │ Phase 5        │   │ Phase 6          │
│ Project Init + │──▶│ Catlass Design │──▶│ Code Gen + Framework      │──▶│ Interface Doc Gen │──▶│ Precision Eval │──▶│ Performance Eval │
│ Catlass Source │   │                │   │ Adaptation + Compile Test │   │                   │   │ Report         │   │ Report           │
│ project-init + │   │ catlass-design │   │ catlass-code-gen →        │   │ doc-gen           │   │ precision-eval │   │ performance-eval │
│ clone catlass  │   │                │   │ compile-debug             │   │                   │   │                │   │ (profiler)       │
└────────────────┘   └────────────────┘   └───────────────────────────┘   └───────────────────┘   └────────────────┘   └──────────────────┘

Input: Operator name (contains catlass) + Function description + Environment confirmation
Output: Deliverable operator + README + Precision report + Profiler performance report

Anti-Pattern List (NEVER DO THESE)

❌ Do not skip Catlass Source Code Preparation (do not proceed with design or code generation without `catlass/include` and `catlass/examples`)
❌ Do not clone Catlass into `csrc/ops/<op_name>/`; it must be in `catlass/` under the project root
❌ Do not skip the design phase and directly write kernel/host code
❌ Do not implement the entire operator independently without following the catlass-operator-code-gen process
❌ Do not modify framework registration without permission before code generation (follow the conventions of project-init / code-gen)
❌ Do not manually replace the compile, install, and basic-test closed loop that compile-debug is responsible for (it should be triggered via code-gen Phase 5)
❌ Do not skip the interface documentation phase (Phase 4 must be executed after Phase 3 passes)
❌ Do not skip the precision evaluation phase (Phase 5 must be executed after Phase 4 passes)
❌ Do not skip the performance evaluation phase (Phase 6 must be executed after Phase 5 passes)
❌ Do not base the final performance conclusion on data collected by methods inconsistent with ascendc-operator-performance-eval
❌ Do not reference non-existent skills

Phase 0: Requirements Collection

Objective: Confirm the minimum information set and operating environment for Catlass operator development (aligned with ascendc-operator-dev Phase 0, plus Catlass naming constraints).

Step 0.1: Environment Confirmation (MUST be completed before any development actions)

CANN Environment

Check `ASCEND_HOME_PATH` (run `echo $ASCEND_HOME_PATH`):
- Already set: Use it as `CANN_PATH`; no need to ask again
- Not set: MUST ask the user for the CANN path (e.g., `/usr/local/Ascend/ascend-toolkit`)

bash

source ${CANN_PATH}/*/set_env.sh

Conda Environment

Check `CONDA_DEFAULT_ENV`:
- Activated and not `base`: Use it directly
- Not activated or is `base`: MUST ask the user for the conda environment name

bash

conda activate <env_name>

Environment Confirmation Checkpoints

- CANN path is confirmed and `set_env.sh` is executable
- Conda environment is confirmed and can be activated
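
The detection rules in Step 0.1 can be sketched as a small helper. This is an illustrative sketch only: the function name `detect_environment` and the shape of its return value are assumptions, not part of any sub-skill. It reads the two environment variables named above and reports what still has to be asked of the user.

```python
import os
from typing import Mapping, Optional


def detect_environment(env: Optional[Mapping[str, str]] = None) -> dict:
    """Decide which environment values can be reused and which must be asked for."""
    env = os.environ if env is None else env
    # An unset or empty variable is treated as "not set".
    cann_path = env.get("ASCEND_HOME_PATH") or None
    conda_env = env.get("CONDA_DEFAULT_ENV") or None
    if conda_env == "base":
        conda_env = None  # 'base' is handled the same way as "not activated"
    ask_user = []
    if cann_path is None:
        ask_user.append("CANN path (e.g., /usr/local/Ascend/ascend-toolkit)")
    if conda_env is None:
        ask_user.append("conda environment name")
    return {"cann_path": cann_path, "conda_env": conda_env, "ask_user": ask_user}
```

An empty `ask_user` list means both values were detected and no question is needed; sourcing `set_env.sh` and activating the conda environment remain separate shell steps.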

Step 0.2: Operator Requirements Collection

Information	Format Requirement	Mandatory	Description
CANN Path	Absolute path	Yes	Aligned with ascendc, can be detected automatically
Conda Environment	String	Yes	Aligned with ascendc, can be detected automatically
Operator Name	snake_case, contains `catlass`	Yes	e.g., `catlass_matmul_basic`
Function Description	Text/Formula/Benchmark Example	Yes	Consistent with Catlass capability scope

Optional: Supported dtypes and SoC; default values can follow catlass-operator-design / platform APIs.

Decision Tree

User Request	Handling Method
"Develop/generate a certain Catlass operator"	Complete Step 0.1 → Validate that the name contains `catlass` → Confirm function → Execute full workflow
"Continue Catlass operator development"	Complete Step 0.1 → Detect current phase according to Error Recovery and resume

Acceptance Criteria

- CANN + Conda are confirmed
- `op_name` is confirmed and contains `catlass`
- Function description is clear
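
The naming constraint can be checked mechanically before any files are created. The sketch below is one possible check, under the assumption that snake_case means lowercase letters and digits joined by single underscores; the function name `is_valid_op_name` is hypothetical.

```python
import re

# Assumption: snake_case = lowercase alphanumeric words joined by single underscores.
_SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def is_valid_op_name(op_name: str) -> bool:
    """Return True if op_name is snake_case and contains the substring 'catlass'."""
    return bool(_SNAKE_CASE.fullmatch(op_name)) and "catlass" in op_name
```

With this rule, `catlass_matmul_basic` is accepted, while `matmul_basic` (no `catlass`) and `CatlassMatmul` (not snake_case) are rejected.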

Phase 1: Project Initialization + Catlass Source Code Preparation

Step 1.1: Project Skeleton

Call Skill: `ascendc-operator-project-init`

MANDATORY: Execute according to ascendc-operator-project-init:
1. Detect or create ascend-kernel
2. Create operator skeleton in csrc/ops/<op_name>/
3. Prompt registration update points (to be implemented by catlass-operator-code-gen later)

Checkpoints (Step 1.1)

- ASCEND_KERNEL_ROOT contains `build.sh`, `CMakeLists.txt`, `csrc/`
- `csrc/ops/<op_name>/` is created, containing a placeholder `design.md`, `op_host/`, `op_kernel/`, `CMakeLists.txt`, etc. (the exact contents are defined by the sub-skill)

Step 1.2: Catlass Source Code

This step does not correspond to an independent skill file, but must be executed according to the following requirements.

Prerequisite: Step 1.1 is completed

Execution Content

Ensure `catlass/` exists under ASCEND_KERNEL_ROOT and contains `catlass/include` and `catlass/examples`.

If it does not exist: MUST run the following in the project root (cloning into `csrc/ops/<op_name>/` is prohibited):

git clone https://gitcode.com/cann/catlass.git catlass

Checkpoints (Step 1.2)

- `<ASCEND_KERNEL_ROOT>/catlass/include` exists
- `<ASCEND_KERNEL_ROOT>/catlass/examples` exists
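
The two Step 1.2 checkpoints are plain directory checks, so they can be verified with a few lines of code. The following is a minimal sketch; the helper name `missing_catlass_dirs` is an assumption made for illustration.

```python
from pathlib import Path


def missing_catlass_dirs(kernel_root: str) -> list:
    """Return the required Catlass directories that are absent under kernel_root."""
    root = Path(kernel_root)
    required = [root / "catlass" / "include", root / "catlass" / "examples"]
    # Paths are reported relative to the root, using forward slashes.
    return [p.relative_to(root).as_posix() for p in required if not p.is_dir()]
```

An empty result means both checkpoints pass; any entry in the list means the clone in the project root is missing or incomplete.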

All Phase 1 checkpoints passed → Enter Phase 2

Phase 2: Catlass Design Document

Call Skill: `catlass-operator-design`

Execution Content

MANDATORY: Execute according to catlass-operator-design:
1. Analyze requirements and Catlass component boundaries
2. Align with the implementable paths of catlass/examples and catlass/include
3. Finalize the design and save it to the recommended path: csrc/ops/<op_name>/design.md (consistent with the reading path of doc-gen / precision-eval / performance-eval)

Checkpoints

- `csrc/ops/<op_name>/design.md` is finalized (not an empty placeholder)
- It clearly states the reference example path, Kernel/Host contract, dtype/shape constraints, etc. (as defined by catlass-operator-design)

All checkpoints passed → Enter Phase 3

Phase 3: Code Generation + Framework Adaptation + Compile Test

Call Skill: `catlass-operator-code-gen` (its Phase 5 MUST call `ascendc-operator-compile-debug`)

Execution Content

MANDATORY: Execute according to catlass-operator-code-gen (aligned with the phase structure of ascendc-operator-code-gen):

Phase 1: Load GUIDE / references (including compile-catlass, chapters aligned with ascendc code-gen)
Phase 2: Read design.md, lock the catlass/examples path and type system
Phase 3: Generate op_kernel + op_host, register Catlass compilation options in CMake (BUILD_CATLASS_MODULE, CATLASS_ARCH, etc. see compile-catlass.md)
Phase 4: Framework adaptation — ops.h, register.cpp, csrc/CMakeLists.txt
Phase 5: Compile, install and test — call ascendc-operator-compile-debug (build.sh, pip install, tests/test_<op_name>.py, error troubleshooting is subject to this skill)

Checkpoints

- `op_host` and `op_kernel` are consistent with `design.md` and the selected example
- Framework registration is consistent with the repository template (e.g., `namespace ascend_kernel`)
- Compilation succeeds and the whl package can be installed
- `tests/test_<op_name>.py` exists and passes (exit code 0)
- Key compilation/test results are summarized and displayed in the chat

All checkpoints passed → Enter Phase 4

Phase 4: Interface Document Generation

Call Skill: `ascendc-operator-doc-gen`

Execution Content

MANDATORY: Execute according to ascendc-operator-doc-gen:
- Extract interface information from register.cpp, ops.h, design.md, op_host, tests
- Generate csrc/ops/<op_name>/README.md (PyTorch-style Chinese)
- Display document key points or full text in the chat interface

Checkpoints

- `README.md` is written to the operator directory
- It is consistent with `m.def` / actual Python calls
- It is displayed in the chat interface

All checkpoints passed → Enter Phase 5

Phase 5: Precision Evaluation Report

Call Skill: `ascendc-operator-precision-eval`

Execution Content

MANDATORY: Execute according to ascendc-operator-precision-eval:
- Number of test cases ≥30, covering shapes × dtypes × boundaries
- Output to csrc/ops/<op_name>/test/, generate Markdown precision report
- Display overview, failure summary and key findings in the chat interface (do not only provide paths)
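
The "shapes × dtypes × boundaries" coverage can be pictured as a Cartesian product with a minimum-count guard. The sketch below is illustrative only: the shapes, dtypes, and boundary labels are hypothetical placeholders for a matmul-style operator, and the real values come from `design.md` and ascendc-operator-precision-eval.

```python
from itertools import product

# Hypothetical values for illustration; real ones come from design.md.
SHAPES = [(1, 1, 1), (16, 16, 16), (128, 256, 64), (1024, 1024, 1024), (17, 33, 65)]
DTYPES = ["float16", "bfloat16"]
BOUNDARIES = ["random", "zeros", "large_magnitude"]
MIN_CASES = 30


def build_precision_cases() -> list:
    """Enumerate shapes x dtypes x boundaries and enforce the minimum case count."""
    cases = [
        {"shape": shape, "dtype": dtype, "boundary": boundary}
        for shape, dtype, boundary in product(SHAPES, DTYPES, BOUNDARIES)
    ]
    if len(cases) < MIN_CASES:
        raise ValueError(f"need at least {MIN_CASES} cases, got {len(cases)}")
    return cases
```

Five shapes, two dtypes, and three boundary kinds give exactly 30 cases, which is the minimum; adding a value to any axis increases the total.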

Checkpoints

- All pytest precision test cases pass
- `<op_name>_precision_report.md` (or the report name specified by this skill) is generated
- Precision result summary is displayed in the chat

FAIL Closed Loop: Root cause analysis → Revise design (Phase 2) or code (Phase 3) → Re-test via Phase 4, Phase 5

All checkpoints passed → Enter Phase 6

Phase 6: Performance Evaluation Report

Call Skill: `ascendc-operator-performance-eval`

Execution Content

MANDATORY: Treat the ascendc-operator-performance-eval SKILL.md as the sole source of detailed rules:
- Maintain JSONL test cases in csrc/ops/<op_name>/test/; read design.md before generating them
- Use torch_npu.profiler with warmup=5, active=5
- Summarize metrics from outputs such as ASCEND_PROFILER_OUTPUT/op_statistic.csv, and output a Markdown report comparing the custom operator with the benchmark
- Display the comparison table and a brief conclusion in the chat interface

Checkpoints

- Test cases and report format comply with this skill (including DType, dual-path comparison, etc.)
- Report file is saved in the operator's `test/` directory
- Performance summary is displayed in the chat

All checkpoints passed → Catlass operator main workflow is completed

Post-Delivery Optional: Performance Optimization

Call Skill: `catlass-operator-performance-optim`

Must ask the user whether to enter tuning; do not skip the question by default.

User agrees → Modify tiling/implementation according to catlass-operator-performance-optim; any code change → Re-run from Phase 3 (Phase 3→4→5→6) until it meets the standards again
User refuses → End

Inter-Phase Data Flow

Phase 1 Output                         Phase 2 Input
  ascend-kernel + ops/<op>/skeleton       Operator name, catlass/ is referenceable
  + catlass/include, examples   ────▶

Phase 2 Output                         Phase 3 Input
  design.md (finalized)            ────▶  Example path, type and Host contract

Phase 3 Output                         Phase 4 Input
  Installed whl + test_<op>.py     ────▶  register.cpp / ops.h / design.md / op_host

Phase 4 Output                         Phase 5 Input
  README.md                    ────▶  Interface, dtype, constraints, calling method

Phase 5 Output                         Phase 6 Input
  Precision passed + Report                ────▶  Operator name, benchmark API, JSONL and profiler workflow

Phase 6 Output
  Performance Report (profiler)           ────▶  Optional: Enter catlass-operator-performance-optim after user confirmation

Status Tracking Table

Phase	Prerequisite	Called Skill / Action	Key Deliverables
0. Requirements Collection	None	—	CANN + Conda + `op_name` (contains catlass) + Function description
1. Project + Catlass	Phase 0	`ascendc-operator-project-init` + root directory `catlass/`	Skeleton + Catlass source code tree
2. Design	Phase 1	`catlass-operator-design`	`design.md`
3. Code & Test	Phase 2	`catlass-operator-code-gen` → `compile-debug`	Runnable operator + Basic test passed
4. Interface Doc	Phase 3	`ascendc-operator-doc-gen`	`README.md`
5. Precision Eval	Phase 4	`ascendc-operator-precision-eval`	≥30 cases + Precision report
6. Performance Eval	Phase 5	`ascendc-operator-performance-eval`	JSONL + Profiler report
(Optional) Tuning	Phase 6 + User Confirmation	`catlass-operator-performance-optim`	Iterated implementation and report
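
The gating rule behind this table can be expressed as a small lookup. This is a sketch under stated assumptions: the function names and the tuple layout are hypothetical, while the phase numbers and sub-skill names are taken from the table above.

```python
# Phase order and sub-skills as listed in the status tracking table.
PHASES = [
    (0, "Requirements Collection", None),
    (1, "Project + Catlass", "ascendc-operator-project-init"),
    (2, "Design", "catlass-operator-design"),
    (3, "Code & Test", "catlass-operator-code-gen"),
    (4, "Interface Doc", "ascendc-operator-doc-gen"),
    (5, "Precision Eval", "ascendc-operator-precision-eval"),
    (6, "Performance Eval", "ascendc-operator-performance-eval"),
]


def next_phase(passed: set):
    """Return the first phase whose checkpoints have not passed, or None if all passed.

    A later phase is never returned while an earlier one is still open.
    """
    for number, name, skill in PHASES:
        if number not in passed:
            return number, name, skill
    return None


def phases_to_rerun_after_code_change() -> list:
    """A code change invalidates Phases 3 to 6, so they are run again in order."""
    return [phase for phase in PHASES if phase[0] >= 3]
```

For example, if Phases 0, 1, and 3 are marked as passed but Phase 2 is not, `next_phase` still returns Phase 2, because an earlier open phase always takes priority.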

Error Recovery

Resume from Breakpoint

When the user says "Continue Catlass operator development":

Detection Condition	Determined Phase	Recovery Action
`csrc/ops/<op_name>/` does not exist	Phase 1 not completed	Start from Phase 1 Step 1.1
`catlass/examples` does not exist	Phase 1 not completed	Complete Step 1.2 cloning
`design.md` is empty or placeholder	Phase 2 not completed	Start from Phase 2
`op_host` / `op_kernel` are still skeleton or inconsistent with design	Phase 3 not completed	Start from Phase 3
whl not installed or `tests/test_<op_name>.py` fails	Phase 3 not completed	Recover within compile-debug workflow
No `README.md`	Phase 4 not completed	Start from Phase 4
No precision report in `test/` or precision not fully passed	Phase 5 not completed	Recover from Phase 5
No performance report or does not meet performance-eval requirements	Phase 6 not completed	Recover from Phase 6
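
Part of the detection table above can be approximated with file-existence checks. The sketch below is a rough illustration with several assumptions: the function name `detect_resume_phase` is hypothetical, `tests/` is assumed to sit directly under ASCEND_KERNEL_ROOT, and the precision report is assumed to use the default name `<op_name>_precision_report.md`. It cannot tell whether code matches the design, whether the whl is installed, or whether tests pass; those checks stay with the sub-skills.

```python
from pathlib import Path


def detect_resume_phase(kernel_root: str, op_name: str) -> int:
    """Return the earliest phase (1-6) that file-level evidence shows as incomplete.

    A result of 6 only means Phases 1-5 left their expected files behind;
    the performance report name is not fixed here, so Phase 6 is checked by hand.
    """
    root = Path(kernel_root)
    op_dir = root / "csrc" / "ops" / op_name
    if not op_dir.is_dir() or not (root / "catlass" / "examples").is_dir():
        return 1
    design = op_dir / "design.md"
    # Only emptiness is detected; a non-empty placeholder needs human review.
    if not design.is_file() or not design.read_text(encoding="utf-8").strip():
        return 2
    if not (root / "tests" / f"test_{op_name}.py").is_file():
        return 3
    if not (op_dir / "README.md").is_file():
        return 4
    if not (op_dir / "test" / f"{op_name}_precision_report.md").is_file():
        return 5
    return 6
```

The returned number is a starting point for recovery, not a verdict: each phase's own checkpoints still have to pass before moving on.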

Compile/Test Failure

Handled by ascendc-operator-compile-debug (triggered via catlass-operator-code-gen); retry and troubleshooting limits are subject to compile-debug skill.
