external-gitcode-ascend-catlass-operator-dev

Catlass Operator End-to-End Development Orchestration

Skill Type: Process-oriented (six-phase workflow; Catlass source preparation is folded into Phase 1; sub-skills are orchestrated serially)
This skill orchestrates the development of a Catlass operator on ascend-kernel from scratch to production-ready. General capabilities (project skeleton, compilation and debugging, interface documentation, precision, performance) reuse the ascendc-* sub-skills; Catlass-specific work (source tree, design, Device/Host implementation) uses the catlass-* sub-skills.

Core Principles

  1. Six-phase serial execution: project initialization (including Catlass source) → design document → code generation and compile/test → interface documentation → precision evaluation → performance evaluation, executed in strict order
  2. Sub-skill execution: each phase MUST open and follow its corresponding sub-skill; do not substitute your own implementation
  3. Phase gating: enter the next phase only after every checkpoint of the previous phase has passed
  4. Design-driven coding: code generation depends on the design.md finalized by catlass-operator-design and on the catlass/examples selection
  5. No user-written design document required: the design phase generates and saves it via catlass-operator-design
  6. Documentation closed loop: after the compile/test step passes, MUST generate PyTorch-style Chinese interface documentation (Phase 4) and display it in the chat interface
  7. Precision closed loop: the operator counts as complete only after passing a comprehensive precision evaluation of ≥30 cases (Phase 5)
  8. Performance closed loop: the operator must complete a torch_npu.profiler comparative evaluation and output a performance report (Phase 6); conclusions follow ascendc-operator-performance-eval
  9. Result visualization: key results of Phases 3/4/5/6 MUST be shown directly in the chat interface (as Markdown or similar); do not output only file paths
  10. Operator naming: op_name (snake_case) MUST contain the substring catlass, consistent with the convention of existing Catlass operators in ascend-kernel
  11. Honest stop: when the environment or dependencies prevent further progress, state the specific reason and the steps already completed, then stop

Catlass Build and Run (Common Pitfalls)

  • Build: set BUILD_CATLASS_MODULE=ON; CMake must use a Python that has torch_npu installed (for example via -DPYTHON_EXECUTABLE / ASCEND_BUILD_PYTHON); CATLASS_ARCH must match the chip (see catlass-operator-code-gen/references/compile-catlass.md); CANN can be the bundle root plus cann-*/set_env.sh.
  • pytest / torch_npu: if an error about ASCEND_RUNTIME_PATH is reported, run export ASCEND_RUNTIME_PATH="${ASCEND_TOOLKIT_HOME}/runtime".
  • Design/code: stay consistent with the compilable examples in catlass/include and catlass/examples; see compile-catlass.md for details.
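
The ASCEND_RUNTIME_PATH workaround above can also be applied from a Python test harness before torch_npu is imported. The following is a minimal sketch under that assumption; the helper name ensure_runtime_path is hypothetical and not part of any skill or library.

```python
import os


def ensure_runtime_path(env):
    """Mirror `export ASCEND_RUNTIME_PATH="${ASCEND_TOOLKIT_HOME}/runtime"`.

    `env` is any dict-like environment (for example os.environ).
    An existing ASCEND_RUNTIME_PATH is left untouched.
    """
    if env.get("ASCEND_RUNTIME_PATH"):
        return env["ASCEND_RUNTIME_PATH"]
    toolkit_home = env.get("ASCEND_TOOLKIT_HOME")
    if not toolkit_home:
        # Honest stop: report the concrete reason instead of guessing a path.
        raise RuntimeError(
            "ASCEND_TOOLKIT_HOME is not set; source the CANN set_env.sh first"
        )
    env["ASCEND_RUNTIME_PATH"] = toolkit_home.rstrip("/") + "/runtime"
    return env["ASCEND_RUNTIME_PATH"]


# Typical use (hypothetical): call ensure_runtime_path(os.environ) at the top
# of a conftest.py, before anything imports torch_npu.
```

Passing the environment in as an argument keeps the helper easy to check without touching the real process environment.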

Available Sub-Skills

| Skill | Path | Responsibility |
| --- | --- | --- |
| ascendc-operator-project-init | ascendc-operator-project-init/SKILL.md | Detect or create ascend-kernel; generate the operator skeleton in csrc/ops/<op_name>/ |
| (step within Phase 1) | (no separate skill file) | Clone catlass/ under ASCEND_KERNEL_ROOT (at the same level as csrc/) so that include/ and examples/ are available |
| catlass-operator-design | catlass-operator-design/SKILL.md | Convert the Catlass requirements into a finalized design document (recommended path: csrc/ops/<op_name>/design.md) |
| catlass-operator-code-gen | catlass-operator-code-gen/SKILL.md | Implement op_host / op_kernel and framework adaptation according to design.md and catlass/examples; internally calls the compile/test skill |
| ascendc-operator-compile-debug | ascendc-operator-compile-debug/SKILL.md | Compile, install the whl, generate and run tests/test_<op_name>.py (called in stage 5 of catlass-operator-code-gen; do not skip code-gen and claim completion) |
| ascendc-operator-doc-gen | ascendc-operator-doc-gen/SKILL.md | Generate the PyTorch-style Chinese API document README.md (mandatory phase) |
| ascendc-operator-precision-eval | ascendc-operator-precision-eval/SKILL.md | ≥30 precision test cases and a precision verification report (mandatory phase) |
| ascendc-operator-performance-eval | ascendc-operator-performance-eval/SKILL.md | JSONL test cases + torch_npu.profiler (warmup=5, active=5) + op_statistic.csv summary; outputs a Markdown report comparing the custom operator with the benchmark (mandatory phase) |
| catlass-operator-performance-optim | catlass-operator-performance-optim/SKILL.md | Optional after delivery: tiling/performance iteration following the Catlass documentation; after any code change, re-run the closed loop from Phase 3 |

Project Directory Terminology (Aligned with AscendC)

| Term | Meaning |
| --- | --- |
| ASCEND_KERNEL_ROOT | Root directory of ascend-kernel; contains build.sh, CMakeLists.txt, csrc/ |
| Operator directory | <ASCEND_KERNEL_ROOT>/csrc/ops/<op_name>/ |
| Catlass source | <ASCEND_KERNEL_ROOT>/catlass/ (cloning inside csrc/ops/<op>/ is prohibited) |

Workflow Overview

┌─────────────────────────────┐   ┌──────────────────┐   ┌─────────────────────────────────────────────────┐   ┌─────────────────────┐   ┌────────────────────────┐   ┌──────────────────────────┐
│  Phase 1                    │   │  Phase 2         │   │  Phase 3                                        │   │  Phase 4            │   │  Phase 5               │   │  Phase 6                 │
│  Project Init + Catlass Src │──▶│  Catlass Design  │──▶│  Code Gen + Framework Adaptation + Compile Test │──▶│  Interface Doc Gen  │──▶│  Precision Eval Report │──▶│  Performance Eval Report │
│  project-init + clone       │   │  catlass-        │   │  catlass-code-gen →                             │   │  doc-gen            │   │  precision-eval        │   │  performance-eval        │
│  catlass                    │   │  design          │   │  compile-debug                                  │   │                     │   │                        │   │  (profiler)              │
└─────────────────────────────┘   └──────────────────┘   └─────────────────────────────────────────────────┘   └─────────────────────┘   └────────────────────────┘   └──────────────────────────┘

Input: Operator name (containing catlass) + function description + environment confirmation          Output: Deliverable operator + README + precision report + profiler performance report

Anti-Pattern List (NEVER DO THESE)

  • ❌ Do not skip Catlass source preparation (never start design or code generation without catlass/include and catlass/examples)
  • ❌ Do not clone Catlass inside csrc/ops/<op_name>/; it must live at the project root as catlass/
  • ❌ Do not skip the design phase and write kernel/host code directly
  • ❌ Do not implement the whole operator yourself outside the catlass-operator-code-gen process
  • ❌ Do not modify framework registration on your own before code generation (follow the project-init / code-gen conventions)
  • ❌ Do not manually replace the build, install, and basic-test loop that compile-debug owns (it should be triggered via code-gen stage 5)
  • ❌ Do not skip the interface documentation phase (Phase 4 is required once Phase 3 passes)
  • ❌ Do not skip the precision evaluation phase (Phase 5 is required once Phase 4 passes)
  • ❌ Do not skip the performance evaluation phase (Phase 6 is required once Phase 5 passes)
  • ❌ Do not base the final performance conclusion on a collection method that differs from ascendc-operator-performance-eval
  • ❌ Do not reference skills that do not exist

Phase 0: Requirements Collection

Objective: Confirm the minimum information set and the runtime environment for Catlass operator development (aligned with ascendc-operator-dev Phase 0, plus the Catlass naming constraint).

Step 0.1: Environment Confirmation (MUST be completed before any development action)

CANN Environment

  1. Check ASCEND_HOME_PATH (run echo $ASCEND_HOME_PATH)
  2. If set: use it as CANN_PATH; no need to ask the user again
  3. If not set: MUST ask the user for the CANN path (e.g., /usr/local/Ascend/ascend-toolkit)

    source ${CANN_PATH}/*/set_env.sh

Conda Environment

  1. Check CONDA_DEFAULT_ENV
  2. If activated and not base: use it directly
  3. If not activated, or it is base: MUST ask the user for the conda environment name

    conda activate <env_name>

Environment Confirmation Checkpoints

  • The CANN path is confirmed and set_env.sh can be executed
  • The Conda environment is confirmed and can be activated
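
The two environment checks in Step 0.1 can be sketched as a single function that reports what still has to be asked of the user. This is an illustrative sketch only; the function name check_environment and the shape of its return value are assumptions, not something the sub-skills define.

```python
import os


def check_environment(env):
    """Apply the Step 0.1 rules to a dict-like environment.

    Returns the detected CANN path, the usable conda environment, and the
    list of items that MUST be asked of the user because they are missing.
    """
    # Rule: ASCEND_HOME_PATH, when set, is used as CANN_PATH.
    cann_path = env.get("ASCEND_HOME_PATH") or None

    # Rule: an activated environment is usable only if it is not "base".
    conda_env = env.get("CONDA_DEFAULT_ENV") or None
    if conda_env == "base":
        conda_env = None

    ask_user = []
    if cann_path is None:
        ask_user.append("CANN path")
    if conda_env is None:
        ask_user.append("conda environment name")
    return {"cann_path": cann_path, "conda_env": conda_env, "ask_user": ask_user}


# Typical use (hypothetical): status = check_environment(os.environ)
```

An empty ask_user list means both values were detected and development can proceed without further questions.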

Step 0.2: Operator Requirements Collection

| Information | Format | Required | Notes |
| --- | --- | --- | --- |
| CANN path | Absolute path | Yes | Same as ascendc; can be auto-detected |
| Conda environment | String | Yes | Same as ascendc; can be auto-detected |
| Operator name | snake_case, contains catlass | Yes | e.g., catlass_matmul_basic |
| Function description | Text / formula / reference example | Yes | Must stay within what Catlass supports |

Optional: supported dtypes and SoC. Defaults consistent with catlass-operator-design / the platform API are sufficient.

Decision Tree

| User request | Handling |
| --- | --- |
| "Develop/generate a Catlass operator" | Complete Step 0.1 → validate that the name contains catlass → confirm the function → run the full workflow |
| "Continue Catlass operator development" | Complete Step 0.1 → detect the current phase per Error Recovery and resume |

Acceptance Criteria

  • CANN + Conda are confirmed
  • op_name is confirmed and contains catlass
  • The function description is clear
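
The naming rule (snake_case and containing the substring catlass) is easy to check mechanically. Below is a minimal sketch; the regular expression is one reasonable reading of "snake_case" and the helper name validate_op_name is hypothetical.

```python
import re

# Assumption: snake_case means lowercase letters and digits in
# underscore-separated words, starting with a letter.
_SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$")


def validate_op_name(op_name):
    """Return a list of problems; an empty list means the name is acceptable."""
    problems = []
    if not _SNAKE_CASE.match(op_name):
        problems.append("not snake_case")
    if "catlass" not in op_name:
        problems.append("missing substring 'catlass'")
    return problems
```

For example, catlass_matmul_basic from the table above passes both checks, while a name such as matmul_basic is rejected for lacking the catlass substring.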

Phase 1: Project Initialization + Catlass Source Preparation

Step 1.1: Project Skeleton

Call Skill: ascendc-operator-project-init
MANDATORY: Execute according to ascendc-operator-project-init:
1. Detect or create ascend-kernel
2. Create the operator skeleton in csrc/ops/<op_name>/
3. Point out the registration update points (implemented later by catlass-operator-code-gen)
Checkpoints (Step 1.1)
  • ASCEND_KERNEL_ROOT contains build.sh, CMakeLists.txt, csrc/
  • csrc/ops/<op_name>/ has been created and contains the placeholder design.md, op_host/, op_kernel/, CMakeLists.txt, etc. (as defined by that skill)

Step 1.2: Catlass Source

This step has no standalone skill file, but it must be carried out as follows.
Prerequisite: Step 1.1 is complete
Execution Content
  1. Ensure catlass/ exists under ASCEND_KERNEL_ROOT and contains catlass/include and catlass/examples
  2. If it does not exist: MUST run the following at the project root (cloning inside csrc/ops/<op_name>/ is prohibited)
    git clone https://gitcode.com/cann/catlass.git catlass
Checkpoints (Step 1.2)
  • <ASCEND_KERNEL_ROOT>/catlass/include exists
  • <ASCEND_KERNEL_ROOT>/catlass/examples exists
All Phase 1 checkpoints passed → Enter Phase 2
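
The Step 1.2 checkpoints amount to two directory checks plus, when needed, one clone at the project root. The sketch below only builds the command and does not run it; the function name catlass_source_status is hypothetical, and the clone command assumes catlass/ does not yet exist at all (git refuses to clone into a non-empty directory).

```python
from pathlib import Path

CATLASS_REPO = "https://gitcode.com/cann/catlass.git"
REQUIRED_SUBDIRS = ("catlass/include", "catlass/examples")


def catlass_source_status(kernel_root):
    """Check the Step 1.2 checkpoints under ASCEND_KERNEL_ROOT.

    Returns (missing, clone_cmd): the required sub-directories that are
    absent, and the clone command to run at the project root (None when
    nothing is missing).
    """
    root = Path(kernel_root)
    missing = [sub for sub in REQUIRED_SUBDIRS if not (root / sub).is_dir()]
    clone_cmd = None
    if missing:
        # `git -C <root>` runs the clone as if started in the project root,
        # so the tree lands in <root>/catlass and never in csrc/ops/<op_name>/.
        clone_cmd = ["git", "-C", str(root), "clone", CATLASS_REPO, "catlass"]
    return missing, clone_cmd
```

Returning the command instead of executing it leaves the decision to run it, and the handling of network failures, to the caller.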

Phase 2: Catlass Design Document

Call Skill: catlass-operator-design

Execution Content

MANDATORY: Execute according to catlass-operator-design:
1. Analyze the requirements and the Catlass component boundaries
2. Align with the implementable paths in catlass/examples and catlass/include
3. Finalize the design and save it at the recommended path csrc/ops/<op_name>/design.md (the same path that doc-gen / precision-eval / performance-eval read)

Checkpoints

  • csrc/ops/<op_name>/design.md is finalized (not an empty placeholder)
  • It clearly states the reference example path, the Kernel/Host contract, dtype/shape constraints, etc. (as defined by catlass-operator-design)
All checkpoints passed → Enter Phase 3

Phase 3: Code Generation + Framework Adaptation + Compile/Test

Call Skill: catlass-operator-code-gen (its stage 5 MUST call ascendc-operator-compile-debug)

Execution Content

MANDATORY: Execute according to catlass-operator-code-gen (its stage structure is aligned with ascendc-operator-code-gen):

Stage 1: Load GUIDE / references (including compile-catlass and the sections aligned with ascendc code-gen)
Stage 2: Read design.md; lock the catlass/examples path and the type system
Stage 3: Generate op_kernel + op_host; register the Catlass build options in CMake (BUILD_CATLASS_MODULE, CATLASS_ARCH, etc.; see compile-catlass.md)
Stage 4: Framework adaptation: ops.h, register.cpp, csrc/CMakeLists.txt
Stage 5: Compile, install, and test: call ascendc-operator-compile-debug (build.sh, pip install, tests/test_<op_name>.py; failure troubleshooting follows that skill)

Checkpoints

  • op_host and op_kernel are consistent with design.md and the selected example
  • Framework registration matches the repository template (e.g., namespace ascend_kernel)
  • Compilation succeeds and the whl can be installed
  • tests/test_<op_name>.py exists and passes (exit code 0)
  • Key compile/test results are summarized in the chat
All checkpoints passed → Enter Phase 4

Phase 4: Interface Document Generation

Call Skill: ascendc-operator-doc-gen

Execution Content

MANDATORY: Execute according to ascendc-operator-doc-gen:
- Extract interface information from register.cpp, ops.h, design.md, op_host, and tests
- Generate csrc/ops/<op_name>/README.md (PyTorch-style Chinese)
- Display the document's key points or full text in the chat interface

Checkpoints

  • README.md is written to the operator directory
  • It is consistent with m.def and the actual Python calls
  • It has been displayed in the chat interface
All checkpoints passed → Enter Phase 5

Phase 5: Precision Evaluation Report

Call Skill: ascendc-operator-precision-eval

Execution Content

MANDATORY: Execute according to ascendc-operator-precision-eval:
- At least 30 test cases, covering shapes × dtypes × boundary conditions
- Output to csrc/ops/<op_name>/test/ and generate a Markdown precision report
- Display the overview, failure summary, and key findings in the chat interface (do not provide only paths)

Checkpoints

  • All pytest precision test cases pass
  • <op_name>_precision_report.md (or the report name specified by that skill) has been generated
  • The precision result summary has been displayed in the chat
FAIL closed loop: root cause analysis → revise the design (Phase 2) or the code (Phase 3) → re-test through Phase 4 and Phase 5
All checkpoints passed → Enter Phase 6
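
The "≥30 cases covering shapes × dtypes × boundary conditions" requirement can be met by enumerating a small case matrix. The sketch below is illustrative only: the (M, N, K) shapes, the dtype names, and the boundary shapes are placeholder assumptions, and the real case list is whatever ascendc-operator-precision-eval and design.md call for.

```python
from itertools import product

# Hypothetical (M, N, K) shapes for a matmul-like operator.
SHAPES = [
    (16, 16, 16), (32, 64, 128), (64, 64, 64), (128, 128, 128),
    (128, 256, 512), (256, 256, 256), (512, 512, 512), (1024, 256, 64),
    (64, 1024, 256), (256, 64, 1024), (1024, 1024, 1024), (2048, 512, 128),
]
# Hypothetical boundary shapes: degenerate and non-aligned dimensions.
BOUNDARY_SHAPES = [(1, 1, 1), (1, 128, 128), (127, 129, 63), (1, 4096, 1)]
DTYPES = ["float16", "bfloat16"]


def build_precision_cases(shapes, dtypes, boundary_shapes, minimum=30):
    """Enumerate shapes x dtypes, then boundary shapes x dtypes."""
    cases = [
        {"shape": shape, "dtype": dtype, "kind": "regular"}
        for shape, dtype in product(shapes, dtypes)
    ]
    cases += [
        {"shape": shape, "dtype": dtype, "kind": "boundary"}
        for shape, dtype in product(boundary_shapes, dtypes)
    ]
    if len(cases) < minimum:
        raise ValueError(f"only {len(cases)} cases; at least {minimum} are required")
    return cases
```

With the placeholder lists above this yields 32 cases (12 × 2 regular plus 4 × 2 boundary), which could be fed to pytest.mark.parametrize in a precision test.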

Phase 6: Performance Evaluation Report

Call Skill: ascendc-operator-performance-eval

Execution Content

MANDATORY: Treat the ascendc-operator-performance-eval SKILL.md as the sole source of detailed rules:
- Maintain the JSONL test cases in csrc/ops/<op_name>/test/; read design.md before generating them
- Use torch_npu.profiler with warmup=5 and active=5
- Summarize metrics such as ASCEND_PROFILER_OUTPUT/op_statistic.csv and output a Markdown report comparing the custom operator with the benchmark
- Display the comparison table and a brief conclusion in the chat interface

Checkpoints

  • Test cases and report format comply with that skill (including DType, dual-path comparison, etc.)
  • The report file is saved in the operator's test/ directory
  • The performance summary has been displayed in the chat
All checkpoints passed → the main Catlass operator workflow is complete
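
Once per-case timings have been collected for both paths, the comparison table shown in the chat is a small formatting step. The sketch below assumes a hypothetical row layout (case, dtype, custom_us, benchmark_us); the actual columns and report format are defined by ascendc-operator-performance-eval, not by this example.

```python
def render_perf_report(rows):
    """Render a custom-vs-benchmark Markdown table.

    Each row is a dict with the hypothetical keys
    'case', 'dtype', 'custom_us', 'benchmark_us' (times in microseconds).
    """
    lines = [
        "| Case | DType | Custom (us) | Benchmark (us) | Custom / Benchmark |",
        "| --- | --- | --- | --- | --- |",
    ]
    for row in rows:
        # A ratio below 1.0 means the custom operator is faster.
        ratio = row["custom_us"] / row["benchmark_us"]
        lines.append(
            f"| {row['case']} | {row['dtype']} | {row['custom_us']:.1f} "
            f"| {row['benchmark_us']:.1f} | {ratio:.2f} |"
        )
    return "\n".join(lines)
```

The returned string can be written to the report file in test/ and pasted into the chat, which satisfies the requirement to show results rather than paths.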

Optional After Delivery: Performance Optimization

Call Skill: catlass-operator-performance-optim
You MUST ask the user whether to enter tuning; do not skip the question by default.
  • User agrees → modify tiling/implementation according to catlass-operator-performance-optim; any code change → re-run from Phase 3 (Phase 3→4→5→6) until the operator meets the standard again
  • User declines → end

Inter-Phase Data Flow

Phase 1 Output                                  Phase 2 Input
  ascend-kernel + ops/<op>/ skeleton            Operator name; catlass/ can be referenced
  + catlass/include, examples           ────▶

Phase 2 Output                                  Phase 3 Input
  design.md (finalized)                 ────▶   Example path, types, and Host contract

Phase 3 Output                                  Phase 4 Input
  Installed whl + test_<op>.py          ────▶   register.cpp / ops.h / design.md / op_host

Phase 4 Output                                  Phase 5 Input
  README.md                             ────▶   Interface, dtype, constraints, calling method

Phase 5 Output                                  Phase 6 Input
  Precision passed + report             ────▶   Operator name, benchmark API, JSONL and profiler workflow

Phase 6 Output
  Performance report (profiler)         ────▶   Optional: enter catlass-operator-performance-optim after user confirmation

Status Tracking Table

| Phase | Prerequisite | Skill / Action | Key Deliverables |
| --- | --- | --- | --- |
| 0. Requirements collection | None | (none) | CANN + Conda + op_name (contains catlass) + function description |
| 1. Project + Catlass | Phase 0 | ascendc-operator-project-init + catlass/ at the root directory | Skeleton + Catlass source tree |
| 2. Design | Phase 1 | catlass-operator-design | design.md |
| 3. Code and test | Phase 2 | catlass-operator-code-gen → compile-debug | Runnable operator + basic tests passing |
| 4. Interface doc | Phase 3 | ascendc-operator-doc-gen | README.md |
| 5. Precision eval | Phase 4 | ascendc-operator-precision-eval | ≥30 cases + precision report |
| 6. Performance eval | Phase 5 | ascendc-operator-performance-eval | JSONL + profiler report |
| (Optional) Tuning | Phase 6 + user confirmation | catlass-operator-performance-optim | Iterated implementation and report |
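
The phase-gating rule behind this table (run strictly in order, and advance only when the previous phase's checkpoints all pass) can be sketched as a small loop. The PHASES list simply restates the table; the check callback and the function name run_workflow are hypothetical stand-ins for the real checkpoint evaluation done by each sub-skill.

```python
# (phase label, skill that drives it) in strict execution order.
PHASES = [
    ("0. Requirements collection", None),
    ("1. Project + Catlass", "ascendc-operator-project-init"),
    ("2. Design", "catlass-operator-design"),
    ("3. Code and test", "catlass-operator-code-gen"),
    ("4. Interface doc", "ascendc-operator-doc-gen"),
    ("5. Precision eval", "ascendc-operator-precision-eval"),
    ("6. Performance eval", "ascendc-operator-performance-eval"),
]


def run_workflow(check, start=0):
    """Run phases in order from `start`, stopping at the first failed gate.

    `check(index)` is a caller-supplied callback that executes phase `index`
    and returns True only if all of its checkpoints passed.
    Returns (completed_indices, blocked_index); blocked_index is None when
    every phase from `start` onward passed.
    """
    completed = []
    for index in range(start, len(PHASES)):
        if not check(index):
            return completed, index
        completed.append(index)
    return completed, None
```

Calling run_workflow(check, start=3) models the re-run from Phase 3 that is required after any code change made during optional tuning.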

Error Recovery

Resume from Breakpoint

When the user says "Continue Catlass operator development":

| Detection condition | Phase determined | Recovery action |
| --- | --- | --- |
| csrc/ops/<op_name>/ does not exist | Phase 1 incomplete | Start from Phase 1 Step 1.1 |
| catlass/examples does not exist | Phase 1 incomplete | Complete the Step 1.2 clone |
| design.md is empty or a placeholder | Phase 2 incomplete | Start from Phase 2 |
| op_host / op_kernel are still skeletons or do not match the design | Phase 3 incomplete | Start from Phase 3 |
| whl not installed, or tests/test_<op_name>.py fails | Phase 3 incomplete | Recover within the compile-debug workflow |
| No README.md | Phase 4 incomplete | Start from Phase 4 |
| No precision report in test/, or precision not fully passing | Phase 5 incomplete | Recover from Phase 5 |
| No performance report, or it does not meet performance-eval requirements | Phase 6 incomplete | Recover from Phase 6 |
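
The recovery table reads top to bottom: the first condition that fails decides where to resume. A minimal table-driven sketch follows. The fact keys and both function names are hypothetical; only the three existence checks are derived from the file system here, while judgments such as "design.md is a placeholder" or "precision fully passing" are assumed to be supplied by the caller.

```python
from pathlib import Path

# (fact that must hold, phase that is incomplete otherwise, recovery action),
# in the same order as the recovery table.
RECOVERY_RULES = [
    ("op_dir_exists", "Phase 1", "Start from Phase 1 Step 1.1"),
    ("catlass_examples_exists", "Phase 1", "Complete the Step 1.2 clone"),
    ("design_finalized", "Phase 2", "Start from Phase 2"),
    ("code_matches_design", "Phase 3", "Start from Phase 3"),
    ("whl_installed_and_tests_pass", "Phase 3", "Recover within the compile-debug workflow"),
    ("readme_exists", "Phase 4", "Start from Phase 4"),
    ("precision_passed", "Phase 5", "Recover from Phase 5"),
    ("perf_report_ok", "Phase 6", "Recover from Phase 6"),
]


def path_facts(kernel_root, op_name):
    """Collect the facts that are plain existence checks."""
    root = Path(kernel_root)
    op_dir = root / "csrc" / "ops" / op_name
    return {
        "op_dir_exists": op_dir.is_dir(),
        "catlass_examples_exists": (root / "catlass" / "examples").is_dir(),
        "readme_exists": (op_dir / "README.md").is_file(),
    }


def resume_point(facts):
    """Return (phase, action) for the first unmet fact, or None if all hold."""
    for key, phase, action in RECOVERY_RULES:
        if not facts.get(key, False):
            return phase, action
    return None
```

Missing facts are treated as unmet, so an incomplete picture of the workspace resumes at the earlier, safer phase.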

Compile/Test Failure

Handled by ascendc-operator-compile-debug (triggered via catlass-operator-code-gen); the retry and troubleshooting limits are those defined by the compile-debug skill.