Build settings and environment variables referenced in this workflow (see catlass-operator-code-gen/references/compile-catlass.md): `BUILD_CATLASS_MODULE=ON`, `-DPYTHON_EXECUTABLE`, `ASCEND_BUILD_PYTHON`, `CATLASS_ARCH`, `cann-*/set_env.sh`, and `ASCEND_RUNTIME_PATH` (`export ASCEND_RUNTIME_PATH="${ASCEND_TOOLKIT_HOME}/runtime"`).
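| Skill | Path | Responsibilities |
|---|---|---|
| `ascendc-operator-project-init` | | Detect or create ascend-kernel; generate the operator skeleton in `csrc/ops/<op_name>/` |
| — | (step within Phase 1) | Clone `catlass` into `ASCEND_KERNEL_ROOT` |
| `catlass-operator-design` | | Convert the Catlass requirements into a finalized design document (recommended path: `csrc/ops/<op_name>/design.md`) |
| `catlass-operator-code-gen` | | Implement op_host / op_kernel and the framework adaptation according to `design.md` |
| `ascendc-operator-compile-debug` | | Compile, install the whl, generate and run `tests/test_<op_name>.py` |
| `ascendc-operator-doc-gen` | | Generate PyTorch-style API documentation in Chinese |
| `ascendc-operator-precision-eval` | | At least 30 precision test cases and a precision verification report (mandatory phase) |
| `ascendc-operator-performance-eval` | | JSONL test cases + torch_npu.profiler (warmup/active=5) performance report |
| `catlass-operator-performance-optim` | | Optional after delivery: iterate on tiling and performance following the Catlass documentation; after any code change, re-run the closed loop starting from Phase 3 |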
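| Term | Meaning |
|---|---|
| ASCEND_KERNEL_ROOT | ascend-kernel root directory; contains `build.sh`, `CMakeLists.txt`, and `csrc/` |
| Operator directory | `csrc/ops/<op_name>/` |
| Catlass source code | `<ASCEND_KERNEL_ROOT>/catlass/`, with `catlass/include` and `catlass/examples` |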
┌────────────────────────────┐   ┌────────────────┐   ┌────────────────────────────────────────┐   ┌───────────────────┐   ┌───────────────────────┐   ┌─────────────────────────┐
│ Phase 1                    │   │ Phase 2        │   │ Phase 3                                │   │ Phase 4           │   │ Phase 5               │   │ Phase 6                 │
│ Project init + Catlass src │──▶│ Catlass design │──▶│ Code gen+framework adapt+compile test  │──▶│ Interface doc gen │──▶│ Precision eval report │──▶│ Performance eval report │
│ project-init + clone       │   │ catlass-       │   │ catlass-code-gen →                     │   │ doc-gen           │   │ precision-eval        │   │ performance-eval        │
│ catlass                    │   │ design         │   │ compile-debug                          │   │                   │   │                       │   │ (profiler)              │
└────────────────────────────┘   └────────────────┘   └────────────────────────────────────────┘   └───────────────────┘   └───────────────────────┘   └─────────────────────────┘
Input: operator name (containing catlass) + function description + environment confirmation
Output: deliverable operator + README + precision report + profiler performance report

The environment confirmation in Step 0.1 involves `ASCEND_HOME_PATH` (`echo $ASCEND_HOME_PATH`), `CANN_PATH` (`/usr/local/Ascend/ascend-toolkit`), `source ${CANN_PATH}/*/set_env.sh`, `CONDA_DEFAULT_ENV` (`base`), and `conda activate <env_name>`.

| Information | Format Requirement | Mandatory | Description |
|---|---|---|---|
| CANN Path | Absolute path | Yes | Same as ascendc, can be detected automatically |
| Conda Environment | String | Yes | Same as ascendc, can be detected automatically |
| Operator Name | snake_case, contains `catlass` | Yes | e.g., |
| Function Description | Text/Formula/Benchmark Example | Yes | Consistent with Catlass capability scope |
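The operator-name rule in the table above (snake_case, containing `catlass`) can be checked mechanically before Phase 1 starts. The following is a minimal sketch, not part of the skill itself: the function name `validate_op_name`, the exact snake_case pattern, and the sample name `catlass_matmul` are all assumptions made for illustration.

```python
import re

# Assumption: "snake_case" means lowercase letters and digits separated by
# single underscores, starting with a letter. The source only says
# "snake_case, contains catlass".
_SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def validate_op_name(op_name: str) -> list[str]:
    """Return a list of problems; an empty list means the name is acceptable."""
    problems = []
    if not _SNAKE_CASE.fullmatch(op_name):
        problems.append("name is not snake_case")
    if "catlass" not in op_name:
        problems.append("name does not contain 'catlass'")
    return problems


if __name__ == "__main__":
    # "catlass_matmul" is a hypothetical operator name.
    for name in ("catlass_matmul", "CatlassMatmul", "basic_matmul"):
        print(name, validate_op_name(name))
```

Returning a list of problems rather than a boolean lets the caller report every violation to the user in one pass.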
| User Request | Handling Method |
|---|---|
| "Develop/generate a Catlass operator" | Complete Step 0.1 → verify that the name contains `catlass` |
| "Continue Catlass operator development" | Complete Step 0.1 → detect the current phase as described under Error Recovery and resume from there |
MANDATORY: Execute according to ascendc-operator-project-init:
1. Detect or create ascend-kernel
2. Create the operator skeleton in csrc/ops/<op_name>/
3. Flag the registration update points (implemented later by catlass-operator-code-gen)

After this step, ASCEND_KERNEL_ROOT contains build.sh, CMakeLists.txt, and csrc/, and the skeleton in csrc/ops/<op_name>/ covers design.md, op_host/, op_kernel/, and CMakeLists.txt.

Step 1.2: in ASCEND_KERNEL_ROOT, clone the Catlass source:

git clone https://gitcode.com/cann/catlass.git catlass

Afterwards <ASCEND_KERNEL_ROOT>/catlass/include and <ASCEND_KERNEL_ROOT>/catlass/examples should exist.
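The Phase 1 results can be verified with a simple path check. The sketch below is an illustration under stated assumptions: the helper name `missing_phase1_paths` is hypothetical, and the list of expected paths is taken from the paths named in this document rather than from the skill's own checks.

```python
from pathlib import Path


def missing_phase1_paths(kernel_root: Path, op_name: str) -> list[str]:
    """List the Phase 1 paths that are absent under kernel_root.

    Paths are returned relative to kernel_root in POSIX form.
    """
    expected = [
        Path("build.sh"),
        Path("CMakeLists.txt"),
        Path("csrc"),
        Path("csrc/ops") / op_name,   # operator skeleton from Step 1.1
        Path("catlass/include"),      # from the Step 1.2 clone
        Path("catlass/examples"),
    ]
    return [p.as_posix() for p in expected if not (kernel_root / p).exists()]


if __name__ == "__main__":
    # Hypothetical operator name; run from the ascend-kernel root directory.
    print(missing_phase1_paths(Path("."), "catlass_matmul"))
```

An empty result means both the skeleton and the Catlass source tree are in place; any listed path points at the step that still has to run.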
MANDATORY: Execute according to catlass-operator-design:
1. Analyze the requirements and the boundaries of the Catlass components
2. Match the requirements against implementable paths in catlass/examples and catlass/include
3. Finalize the design and save it to the recommended path csrc/ops/<op_name>/design.md (the path that doc-gen / precision-eval / performance-eval read from)
MANDATORY: Execute according to catlass-operator-code-gen (aligned with the stage structure of ascendc-operator-code-gen):
Stage 1: Load the GUIDE / references (including compile-catlass and the sections aligned with ascendc code-gen)
Stage 2: Read design.md; lock the catlass/examples path and the type system
Stage 3: Generate op_kernel + op_host; register the Catlass compile options in CMake (BUILD_CATLASS_MODULE, CATLASS_ARCH, etc.; see compile-catlass.md)
Stage 4: Framework adaptation: ops.h, register.cpp, csrc/CMakeLists.txt
Stage 5: Build, install, and test: call ascendc-operator-compile-debug (build.sh, pip install, tests/test_<op_name>.py; that skill is authoritative for troubleshooting failures)
MANDATORY: Execute according to ascendc-operator-doc-gen:
- Extract the interface information from register.cpp, ops.h, design.md, op_host, and tests
- Generate csrc/ops/<op_name>/README.md (PyTorch style, in Chinese)
- Show the key points or the full text of the document in the chat interface
MANDATORY: Execute according to ascendc-operator-precision-eval:
- At least 30 test cases, covering shapes × dtypes × boundary conditions
- Write the output to csrc/ops/<op_name>/test/ and generate a Markdown precision report (<op_name>_precision_report.md)
- Show the overview, failure summary, and key findings in the chat interface (a bare file path is not enough)
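One way to reach the 30-case minimum is to take the Cartesian product of shapes and dtypes and include boundary shapes explicitly. The sketch below assumes a matmul-like operator with (M, N, K) shapes; the shape lists, the dtype list, the `kind` field, and the function name `build_cases` are all hypothetical and would have to be aligned with what design.md actually specifies.

```python
from itertools import product

# Hypothetical (M, N, K) shapes for a matmul-like operator.
REGULAR_SHAPES = [
    (16, 16, 16), (64, 64, 64), (128, 128, 128),
    (256, 512, 128), (512, 256, 1024), (1024, 1024, 1024),
]
# Hypothetical boundary shapes: minimal sizes and sizes that are not
# multiples of common tile sizes.
BOUNDARY_SHAPES = [(1, 1, 1), (1, 4096, 1), (15, 17, 31), (255, 257, 129)]
# Assumed dtype list; the supported set depends on the operator design.
DTYPES = ["float16", "bfloat16", "float32"]


def build_cases(min_cases: int = 30) -> list[dict]:
    """Build the shapes x dtypes case matrix and enforce the minimum count."""
    cases = []
    for kind, shapes in (("regular", REGULAR_SHAPES), ("boundary", BOUNDARY_SHAPES)):
        for shape, dtype in product(shapes, DTYPES):
            cases.append({"kind": kind, "shape": shape, "dtype": dtype})
    if len(cases) < min_cases:
        raise ValueError(f"only {len(cases)} cases, need at least {min_cases}")
    return cases


if __name__ == "__main__":
    print(len(build_cases()))
```

Failing loudly when the matrix is too small keeps the 30-case requirement from being missed silently when a shape or dtype is removed.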
MANDATORY: The ascendc-operator-performance-eval SKILL.md is the sole authority on the details:
- Maintain the JSONL test cases in csrc/ops/<op_name>/test/; read design.md before generating them
- Use torch_npu.profiler with warmup=5 and active=5
- Aggregate metrics such as ASCEND_PROFILER_OUTPUT/op_statistic.csv and produce a Markdown report comparing the custom operator with the baseline
- Show the comparison table and a brief conclusion in the chat interface
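The JSONL case file and the comparison table can be handled with the standard library alone. The sketch below is illustrative only: the case fields, the table columns, and the function names `write_jsonl`, `read_jsonl`, and `render_comparison` are assumptions, and the timings are supplied by the caller rather than read from the profiler output, whose column layout is not described here.

```python
import json
from pathlib import Path


def write_jsonl(path: Path, cases: list[dict]) -> None:
    """Write one JSON object per line."""
    with path.open("w", encoding="utf-8") as f:
        for case in cases:
            f.write(json.dumps(case, ensure_ascii=False) + "\n")


def read_jsonl(path: Path) -> list[dict]:
    """Read a JSONL file, skipping blank lines."""
    with path.open(encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


def render_comparison(rows: list[tuple[str, float, float]]) -> str:
    """Render (case name, custom time, baseline time) rows as a Markdown table.

    Times are in microseconds; the last column is baseline / custom, so a
    value above 1 means the custom operator is faster.
    """
    lines = [
        "| Case | Custom (us) | Baseline (us) | Baseline/Custom |",
        "|---|---|---|---|",
    ]
    for name, custom_us, baseline_us in rows:
        ratio = baseline_us / custom_us
        lines.append(f"| {name} | {custom_us:.1f} | {baseline_us:.1f} | {ratio:.2f}x |")
    return "\n".join(lines)


if __name__ == "__main__":
    # Hypothetical timings for a single case.
    print(render_comparison([("m128_n128_k128_fp16", 10.0, 12.5)]))
```

Keeping the cases in JSONL means a single case can be added or removed with a one-line diff, which suits the re-run loop after code changes.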
Phase 1 output                           Phase 2 input
ascend-kernel + ops/<op>/ skeleton       operator name, catlass/ available for reference
+ catlass/include, examples    ────▶
Phase 2 output                           Phase 3 input
design.md (finalized)          ────▶     example path, types, and host contract
Phase 3 output                           Phase 4 input
installed whl + test_<op>.py   ────▶     register.cpp / ops.h / design.md / op_host
Phase 4 output                           Phase 5 input
README.md                      ────▶     interface, dtypes, constraints, calling convention
Phase 5 output                           Phase 6 input
precision passed + report      ────▶     operator name, baseline API, JSONL and profiler workflow
Phase 6 output
performance report (profiler)  ────▶     optional: enter catlass-operator-performance-optim after user confirmation

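| Phase | Prerequisites | Called Skill / Action | Key Deliverables |
|---|---|---|---|
| 0. Requirements collection | None | — | CANN + Conda + |
| 1. Project + Catlass | Phase 0 | `ascendc-operator-project-init` + clone `catlass` | Skeleton + Catlass source tree |
| 2. Design | Phase 1 | `catlass-operator-design` | `design.md` |
| 3. Code and testing | Phase 2 | `catlass-operator-code-gen` → `ascendc-operator-compile-debug` | Runnable operator + basic tests passing |
| 4. Interface documentation | Phase 3 | `ascendc-operator-doc-gen` | `README.md` |
| 5. Precision evaluation | Phase 4 | `ascendc-operator-precision-eval` | ≥30 cases + precision report |
| 6. Performance evaluation | Phase 5 | `ascendc-operator-performance-eval` | JSONL + profiler report |
| (Optional) Tuning | Phase 6 + user confirmation | `catlass-operator-performance-optim` | Iterated implementation and report |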
| Detection Condition | Determined Phase | Recovery Action |
|---|---|---|
| | Phase 1 incomplete | Start from Phase 1 Step 1.1 |
| | Phase 1 incomplete | Complete the Step 1.2 clone |
| | Phase 2 incomplete | Start from Phase 2 |
| | Phase 3 incomplete | Start from Phase 3 |
| whl not installed or | Phase 3 incomplete | Resume within the compile-debug workflow |
| No `README.md` | Phase 4 incomplete | Start from Phase 4 |
| No precision report in `csrc/ops/<op_name>/test/` | Phase 5 incomplete | Resume from Phase 5 |
| No performance report, or the report does not meet the performance-eval requirements | Phase 6 incomplete | Resume from Phase 6 |
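Resuming an interrupted run amounts to finding the first phase whose output is missing. The sketch below shows that idea with file-existence checks only. Every marker path is an assumption made for illustration (in particular the location of `tests/` under the kernel root and the name `<op_name>_performance_report.md`), and `detect_resume_phase` is a hypothetical helper, not the skill's own detection logic.

```python
from pathlib import Path
from typing import Optional


def detect_resume_phase(kernel_root: Path, op_name: str) -> Optional[int]:
    """Return the first phase (1-6) whose marker paths are missing.

    Returns None when every marker exists. All markers are assumptions.
    """
    op_dir = kernel_root / "csrc" / "ops" / op_name
    markers = [
        (1, [op_dir, kernel_root / "catlass" / "include", kernel_root / "catlass" / "examples"]),
        (2, [op_dir / "design.md"]),
        (3, [kernel_root / "tests" / f"test_{op_name}.py"]),
        (4, [op_dir / "README.md"]),
        (5, [op_dir / "test" / f"{op_name}_precision_report.md"]),
        # Hypothetical report name; the source does not name the performance report.
        (6, [op_dir / "test" / f"{op_name}_performance_report.md"]),
    ]
    for phase, paths in markers:
        if not all(p.exists() for p in paths):
            return phase
    return None


if __name__ == "__main__":
    # Hypothetical operator name; run from the ascend-kernel root directory.
    print(detect_resume_phase(Path("."), "catlass_matmul"))
```

File existence is only a coarse signal: it cannot tell whether design.md is finalized, whether the whl is installed, or whether a report meets the evaluation requirements, so those checks would still need the corresponding skill.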