external-gitcode-ascend-catlass-operator-dev
Catlass Operator End-to-End Development Orchestration
Skill type: process-oriented (six-phase workflow; Catlass source preparation is folded into Phase 1, and sub-skills are orchestrated serially)
This skill orchestrates the development of a Catlass operator on ascend-kernel from scratch to production-ready. General capabilities (project skeleton, compilation and debugging, interface documentation, precision, performance) reuse the ascendc-* sub-skills; Catlass-specific work (source tree, design, Device/Host implementation) uses the catlass-* sub-skills.
Core Principles
- Six-phase serial execution: project initialization (including Catlass source) → design document → code generation and compile test → interface documentation → precision evaluation → performance evaluation, executed in strict order
- Sub-skill execution: each phase MUST open and follow the corresponding sub-skill; do not substitute your own implementation
- Phase gating: enter the next phase only after all checkpoints of the previous phase pass
- Design-driven coding: code generation depends on the `design.md` finalized by catlass-operator-design and on the catlass/examples selection
- No hand-written design document is required from the user: the design phase generates and saves it via catlass-operator-design
- Documentation closed loop: after the compile test passes, MUST generate PyTorch-style Chinese interface documentation (Phase 4) and show it in the chat interface
- Precision closed loop: the operator is complete only after it passes a comprehensive precision evaluation of ≥30 cases (Phase 5)
- Performance closed loop: the operator must complete the torch_npu.profiler comparison and produce a performance report (Phase 6); conclusions follow ascendc-operator-performance-eval
- Result visualization: key results of Phases 3/4/5/6 MUST be shown directly in the chat interface (Markdown or similar); do not output only file paths
- Operator naming: `op_name` (snake_case) MUST contain the substring `catlass`, consistent with the convention of existing Catlass operators in ascend-kernel
- Honest stop: when the environment or a dependency blocks progress, state the specific reason and the steps already completed, then stop
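The naming rule above can be checked mechanically. The following is a minimal sketch, assuming "snake_case" means lowercase letters and digits separated by single underscores; the function name and the sample operator names are hypothetical and not part of any sub-skill.

```python
import re

# Assumption: snake_case = lowercase alphanumeric words joined by single underscores.
_SNAKE_CASE = re.compile(r"^[a-z][a-z0-9]*(?:_[a-z0-9]+)*$")


def is_valid_catlass_op_name(op_name: str) -> bool:
    """Return True if op_name is snake_case and contains the substring 'catlass'."""
    return bool(_SNAKE_CASE.match(op_name)) and "catlass" in op_name


if __name__ == "__main__":
    # Hypothetical names, for illustration only.
    for name in ["catlass_matmul_basic", "CatlassMatmul", "matmul_basic"]:
        print(name, is_valid_catlass_op_name(name))
```

Running the check in Phase 0 keeps a badly named operator from reaching project initialization.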
Catlass Build and Run (Common Pitfalls)
- Build: `BUILD_CATLASS_MODULE=ON`; CMake must use a Python that has torch_npu (e.g. `-DPYTHON_EXECUTABLE` / `ASCEND_BUILD_PYTHON`); `CATLASS_ARCH` must match the chip (see `catlass-operator-code-gen/references/compile-catlass.md`); CANN can be the bundle root + `cann-*/set_env.sh`.
- pytest / torch_npu: if an error about `ASCEND_RUNTIME_PATH` is reported, run `export ASCEND_RUNTIME_PATH="${ASCEND_TOOLKIT_HOME}/runtime"`.
- Design/code: stay consistent with `catlass/include` and with the examples in `catlass/examples` that compile against it; see `compile-catlass.md` for details.
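When tests are launched from a Python driver rather than an interactive shell, the `ASCEND_RUNTIME_PATH` workaround above can be applied to the child process environment. This is a minimal sketch under that assumption; the helper name `with_runtime_path` is hypothetical.

```python
def with_runtime_path(env: dict) -> dict:
    """Return a copy of env where ASCEND_RUNTIME_PATH defaults to
    "<ASCEND_TOOLKIT_HOME>/runtime", mirroring the export shown above.

    An existing ASCEND_RUNTIME_PATH is left untouched, and nothing is added
    when ASCEND_TOOLKIT_HOME is not set.
    """
    result = dict(env)
    toolkit_home = result.get("ASCEND_TOOLKIT_HOME")
    if "ASCEND_RUNTIME_PATH" not in result and toolkit_home:
        result["ASCEND_RUNTIME_PATH"] = f"{toolkit_home}/runtime"
    return result
```

A caller could pass `with_runtime_path(os.environ)` as the `env` argument of `subprocess.run` when invoking pytest, so the fix does not depend on the user's shell profile.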
Available Sub-Skill List
| Skill | Path | Responsibility |
|---|---|---|
| `ascendc-operator-project-init` | | Detect/create ascend-kernel; generate the operator skeleton in `csrc/ops/<op_name>/` |
| - | (step within Phase 1) | Clone Catlass under ASCEND_KERNEL_ROOT |
| `catlass-operator-design` | | Convert Catlass requirements into a finalized design document (recommended path: `csrc/ops/<op_name>/design.md`) |
| `catlass-operator-code-gen` | | Implement op_host / op_kernel and framework adaptation according to `design.md` |
| `ascendc-operator-compile-debug` | | Build, install the whl, generate/run `tests/test_<op_name>.py` |
| `ascendc-operator-doc-gen` | | Generate PyTorch-style Chinese API documentation |
| `ascendc-operator-precision-eval` | | ≥30 precision test cases and a precision verification report (mandatory phase) |
| `ascendc-operator-performance-eval` | | JSONL test cases + torch_npu.profiler (warmup/active=5) + performance report |
| `catlass-operator-performance-optim` | | Optional after delivery: iterate on tiling/performance following the Catlass documentation; after any code change, re-run the closed loop starting from Phase 3 |
Project Directory Terminology (Aligned with AscendC)
| Term | Meaning |
|---|---|
| ASCEND_KERNEL_ROOT | Root directory of ascend-kernel: contains `build.sh`, `CMakeLists.txt`, `csrc/` |
| Operator directory | `csrc/ops/<op_name>/` |
| Catlass source | `<ASCEND_KERNEL_ROOT>/catlass/` |
Workflow Overview
┌────────────────────────────┐   ┌────────────────┐   ┌────────────────────────────────────────────────┐   ┌───────────────────┐   ┌───────────────────────┐   ┌─────────────────────────┐
│ Phase 1                    │   │ Phase 2        │   │ Phase 3                                        │   │ Phase 4           │   │ Phase 5               │   │ Phase 6                 │
│ Project Init + Catlass Src │──▶│ Catlass Design │──▶│ Code Gen + Framework Adaptation + Compile Test │──▶│ Interface Doc Gen │──▶│ Precision Eval Report │──▶│ Performance Eval Report │
│ project-init + clone       │   │ catlass-       │   │ catlass-code-gen →                             │   │ doc-gen           │   │ precision-eval        │   │ performance-eval        │
│ catlass                    │   │ design         │   │ compile-debug                                  │   │                   │   │                       │   │ (profiler)              │
└────────────────────────────┘   └────────────────┘   └────────────────────────────────────────────────┘   └───────────────────┘   └───────────────────────┘   └─────────────────────────┘
Input: operator name (containing catlass) + function description + environment confirmation    Output: deliverable operator + README + precision report + profiler performance report
Anti-Pattern List (NEVER DO THESE)
- ❌ Do not skip Catlass source preparation (do not start design or code generation without `catlass/include` and `catlass/examples`)
- ❌ Do not clone Catlass inside `csrc/ops/<op_name>/`; it must live at `catlass/` under the project root
- ❌ Do not skip the design phase and write kernel/host code directly
- ❌ Do not implement the whole operator on your own without following the catlass-operator-code-gen flow
- ❌ Do not modify framework registration on your own before code generation (follow the project-init / code-gen conventions)
- ❌ Do not manually replace the build, install, and basic test loop that compile-debug owns (it should be triggered by the fifth stage of code-gen)
- ❌ Do not skip the interface documentation phase (Phase 4 MUST follow once Phase 3 passes)
- ❌ Do not skip the precision evaluation phase (Phase 5 MUST follow once Phase 4 passes)
- ❌ Do not skip the performance evaluation phase (Phase 6 MUST follow once Phase 5 passes)
- ❌ Do not base the final performance conclusion on a collection method that differs from ascendc-operator-performance-eval
- ❌ Do not reference skills that do not exist
Phase 0: Requirements Collection
Objective: confirm the minimum information set and the runtime environment for Catlass operator development (aligned with Phase 0 of ascendc-operator-dev, plus the Catlass naming constraint).
Step 0.1: Environment Confirmation (MUST be completed before any development action)
CANN Environment
- Check `ASCEND_HOME_PATH` (`echo $ASCEND_HOME_PATH`)
- Already set: use it as `CANN_PATH`; no need to ask again
- Not set: MUST ask the user for the CANN path (e.g. `/usr/local/Ascend/ascend-toolkit`)
bash
source ${CANN_PATH}/*/set_env.sh
Conda Environment
- Check `CONDA_DEFAULT_ENV`
- Activated and not `base`: use it directly
- Not activated, or `base`: MUST ask the user for the conda environment name
bash
conda activate <env_name>
Environment Confirmation Checkpoints
- The CANN path is confirmed and `set_env.sh` can be executed
- The Conda environment is confirmed and can be activated
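The two environment checks above reduce to a small decision function. The sketch below assumes the environment is passed in as a plain mapping (for example `os.environ`); the function name and the wording of the returned prompts are hypothetical.

```python
def environment_questions(env: dict) -> list:
    """Return the questions that MUST be put to the user before development starts.

    An empty list means both the CANN path and the conda environment can be
    taken from the current environment without asking.
    """
    questions = []
    # CANN: a set ASCEND_HOME_PATH is used as CANN_PATH; otherwise ask.
    if not env.get("ASCEND_HOME_PATH"):
        questions.append("CANN path (e.g. /usr/local/Ascend/ascend-toolkit)")
    # Conda: an active, non-base environment is used directly; otherwise ask.
    conda_env = env.get("CONDA_DEFAULT_ENV")
    if not conda_env or conda_env == "base":
        questions.append("conda environment name")
    return questions
```

Only the detection is sketched here; sourcing `set_env.sh` and activating the environment still happen in the shell as shown above.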
Step 0.2: Operator Requirements Collection
| Information | Format | Required | Notes |
|---|---|---|---|
| CANN path | Absolute path | Yes | Same as ascendc; can be detected automatically |
| Conda environment | String | Yes | Same as ascendc; can be detected automatically |
| Operator name | snake_case, contains `catlass` | Yes | |
| Function description | Text / formula / reference example | Yes | Must fall within the scope of Catlass capabilities |
Optional: supported dtypes and SoC. Defaults should follow catlass-operator-design and the platform APIs.
Decision Tree
| User Request | Handling |
|---|---|
| "Develop/generate a Catlass operator" | Complete Step 0.1 → validate that the name contains `catlass` |
| "Continue Catlass operator development" | Complete Step 0.1 → detect the current phase as described in Error Recovery and resume |
Acceptance Criteria
- CANN + Conda are confirmed
- `op_name` is confirmed and contains `catlass`
- The function description is clear
Phase 1: Project Initialization + Catlass Source Code Preparation
Step 1.1: Project Skeleton
Call Skill: `ascendc-operator-project-init`
MANDATORY: Execute according to ascendc-operator-project-init:
1. Detect or create ascend-kernel
2. Create the operator skeleton in csrc/ops/<op_name>/
3. Point out the registration update points (implemented later by catlass-operator-code-gen)
Checkpoints (Step 1.1)
- `ASCEND_KERNEL_ROOT` contains `build.sh`, `CMakeLists.txt`, `csrc/`
- `csrc/ops/<op_name>/` is created and contains placeholder `design.md`, `op_host/`, `op_kernel/`, `CMakeLists.txt`, etc. (as defined by that skill)
Step 1.2: Catlass Source Code
This step has no dedicated skill file, but it MUST be executed as described below.
Prerequisite: Step 1.1 is completed
Execution Content
- Ensure `catlass/` exists under `ASCEND_KERNEL_ROOT` and contains `catlass/include` and `catlass/examples`
- If it does not exist: MUST run the following in the project root (cloning inside `csrc/ops/<op_name>/` is prohibited)
git clone https://gitcode.com/cann/catlass.git catlass
Checkpoints (Step 1.2)
- `<ASCEND_KERNEL_ROOT>/catlass/include` exists
- `<ASCEND_KERNEL_ROOT>/catlass/examples` exists
All Phase 1 checkpoints passed → enter Phase 2
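The Step 1.2 checkpoints amount to two directory checks. A minimal sketch, assuming the caller already knows `ASCEND_KERNEL_ROOT`; the function name `missing_catlass_dirs` is hypothetical, and the clone command is the one given above.

```python
from pathlib import Path

# Command from Step 1.2; it must be run from the project root.
CLONE_COMMAND = "git clone https://gitcode.com/cann/catlass.git catlass"


def missing_catlass_dirs(ascend_kernel_root) -> list:
    """Return the required Catlass directories (relative to the root) that are absent."""
    root = Path(ascend_kernel_root)
    required = [root / "catlass" / "include", root / "catlass" / "examples"]
    return [p.relative_to(root).as_posix() for p in required if not p.is_dir()]


if __name__ == "__main__":
    missing = missing_catlass_dirs(".")
    if missing:
        print("missing:", missing, "-> run in project root:", CLONE_COMMAND)
    else:
        print("Step 1.2 checkpoints satisfied")
```

An empty result corresponds to both checkpoints passing; anything else means the clone has not happened yet or is incomplete.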
Phase 2: Catlass Design Document
Call Skill: `catlass-operator-design`
Execution Content
MANDATORY: Execute according to catlass-operator-design:
1. Analyze the requirements and the Catlass component boundaries
2. Align with an implementable path through catlass/examples and catlass/include
3. Finalize and save the document at the recommended path: csrc/ops/<op_name>/design.md (the same path that doc-gen / precision-eval / performance-eval read)
Checkpoints
- `csrc/ops/<op_name>/design.md` is finalized (not an empty placeholder)
- It clearly states the reference example path, the Kernel/Host contract, dtype/shape constraints, etc. (as defined by catlass-operator-design)
All checkpoints passed → enter Phase 3
Phase 3: Code Generation + Framework Adaptation + Compile Test
Call Skill: `catlass-operator-code-gen` (Stage 5 MUST call `ascendc-operator-compile-debug`)
Execution Content
MANDATORY: Execute according to catlass-operator-code-gen (aligned with the stage structure of ascendc-operator-code-gen):
Stage 1: Load GUIDE / references (including compile-catlass and the chapters aligned with ascendc code-gen)
Stage 2: Read design.md; lock the catlass/examples path and the type system
Stage 3: Generate op_kernel + op_host; register the Catlass build options in CMake (BUILD_CATLASS_MODULE, CATLASS_ARCH, etc.; see compile-catlass.md)
Stage 4: Framework adaptation: ops.h, register.cpp, csrc/CMakeLists.txt
Stage 5: Build, install, and test: call ascendc-operator-compile-debug (build.sh, pip install, tests/test_<op_name>.py; failure troubleshooting follows that skill)
Checkpoints
- `op_host` and `op_kernel` are consistent with `design.md` and the selected example
- Framework registration is consistent with the repository template (e.g. `namespace ascend_kernel`)
- The build succeeds and the whl can be installed
- `tests/test_<op_name>.py` exists and passes (exit code 0)
- Key build/test results have been summarized in the chat
All checkpoints passed → enter Phase 4
Phase 4: Interface Document Generation
Call Skill: `ascendc-operator-doc-gen`
Execution Content
MANDATORY: Execute according to ascendc-operator-doc-gen:
- Extract interface information from register.cpp, ops.h, design.md, op_host, and tests
- Generate csrc/ops/<op_name>/README.md (PyTorch-style, in Chinese)
- Show the key points or the full text of the document in the chat interface
Checkpoints
- `README.md` is written to the operator directory
- It is consistent with `m.def` and the actual Python calls
- It has been shown in the chat interface
All checkpoints passed → enter Phase 5
Phase 5: Precision Evaluation Report
Call Skill: `ascendc-operator-precision-eval`
Execution Content
MANDATORY: Execute according to ascendc-operator-precision-eval:
- Number of test cases ≥ 30, covering shapes × dtypes × boundary cases
- Write outputs to csrc/ops/<op_name>/test/ and generate a Markdown precision report
- Show the overview, failure summary, and key findings in the chat interface (do not give only paths)
Checkpoints
- All pytest precision cases pass
- `<op_name>_precision_report.md` (or the report name specified by that skill) is generated
- The precision result summary has been shown in the chat
FAIL loop: root-cause analysis → revise the design (Phase 2) or the code (Phase 3) → re-test through Phase 4 and Phase 5
All checkpoints passed → enter Phase 6
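One way to reach the ≥30 case floor is a Cartesian product over the three coverage axes. The sketch below is illustrative only: the shapes, dtypes, and data patterns are hypothetical values for a GEMM-like operator, and the real matrix should come from `design.md` and ascendc-operator-precision-eval.

```python
from itertools import product

# Hypothetical coverage axes; replace with the constraints recorded in design.md.
SHAPES = [(1, 1, 1), (16, 16, 16), (128, 256, 64), (1024, 1024, 1024), (7, 13, 29)]
DTYPES = ["float16", "bfloat16"]
DATA_PATTERNS = ["zeros", "random_normal", "large_magnitude"]  # boundary-style inputs


def build_cases() -> list:
    """Enumerate every shape x dtype x data-pattern combination as a case dict."""
    return [
        {"shape": shape, "dtype": dtype, "data": pattern}
        for shape, dtype, pattern in product(SHAPES, DTYPES, DATA_PATTERNS)
    ]


if __name__ == "__main__":
    cases = build_cases()
    # 5 shapes x 2 dtypes x 3 patterns = 30 cases, which meets the minimum.
    print(len(cases))
```

Growing any axis raises the count multiplicatively, so the floor is easy to exceed once odd shapes or extra dtypes are added.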
Phase 6: Performance Evaluation Report
Call Skill: `ascendc-operator-performance-eval`
Execution Content
MANDATORY: Treat the ascendc-operator-performance-eval SKILL.md as the only authoritative rule set:
- Maintain the JSONL test cases in csrc/ops/<op_name>/test/; read design.md before generating them
- Use torch_npu.profiler with warmup=5, active=5
- Aggregate metrics such as ASCEND_PROFILER_OUTPUT/op_statistic.csv and output a Markdown report comparing the custom operator against the benchmark
- Show the comparison table and a brief conclusion in the chat interface
Checkpoints
- The test cases and the report format comply with that skill (including DType, dual-path comparison, etc.)
- The report file is saved in the operator's `test/` directory
- The performance summary has been shown in the chat
All checkpoints passed → the main Catlass operator workflow is complete
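JSONL keeps one test case per line, which makes the case file easy to diff and extend. The sketch below shows only the file format round trip; the field names (`shape`, `dtype`) are a hypothetical schema, since the actual one is defined by ascendc-operator-performance-eval.

```python
import json


def dump_cases(cases: list) -> str:
    """Serialize cases as JSONL: one JSON object per line, newline-terminated."""
    return "".join(json.dumps(case, sort_keys=True) + "\n" for case in cases)


def load_cases(text: str) -> list:
    """Parse JSONL text, skipping blank lines."""
    return [json.loads(line) for line in text.splitlines() if line.strip()]


if __name__ == "__main__":
    # Hypothetical cases, for illustration only.
    sample = [
        {"shape": [128, 256, 64], "dtype": "float16"},
        {"shape": [1024, 1024, 1024], "dtype": "bfloat16"},
    ]
    text = dump_cases(sample)
    print(text, end="")
    print(load_cases(text) == sample)
```

Shapes are stored as lists rather than tuples because JSON has no tuple type, so a round trip preserves them unchanged.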
Post-Delivery Optional: Performance Optimization
Call Skill: `catlass-operator-performance-optim`
MUST ask the user whether to enter tuning; do not skip the question by default.
- User agrees → modify tiling/implementation according to catlass-operator-performance-optim; any code change → re-run from Phase 3 (Phase 3→4→5→6) until the operator meets the standard again
- User declines → end
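The re-run rule can be stated as "restart at a phase and run every later phase in order". A minimal sketch, with a hypothetical helper name; the phase numbers are the six main phases of this workflow.

```python
MAIN_PHASES = [1, 2, 3, 4, 5, 6]


def phases_to_rerun(restart_from: int) -> list:
    """Return the phases that must be repeated, in order, when restarting at restart_from."""
    if restart_from not in MAIN_PHASES:
        raise ValueError(f"unknown phase: {restart_from}")
    return [phase for phase in MAIN_PHASES if phase >= restart_from]


if __name__ == "__main__":
    # A tuning change to the code restarts the loop at Phase 3.
    print(phases_to_rerun(3))
```

The same helper covers the Phase 5 failure loop: a design fix restarts at Phase 2, a code fix at Phase 3.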
Inter-Phase Data Flow
Phase 1 Output                               Phase 2 Input
ascend-kernel + ops/<op>/ skeleton           Operator name, catlass/ is referenceable
+ catlass/include, examples          ────▶
Phase 2 Output                               Phase 3 Input
design.md (finalized)                ────▶   Example path, types, and Host contract
Phase 3 Output                               Phase 4 Input
Installed whl + test_<op>.py         ────▶   register.cpp / ops.h / design.md / op_host
Phase 4 Output                               Phase 5 Input
README.md                            ────▶   Interface, dtype, constraints, calling method
Phase 5 Output                               Phase 6 Input
Precision passed + report            ────▶   Operator name, benchmark API, JSONL and profiler workflow
Phase 6 Output
Performance report (profiler)        ────▶   Optional: enter catlass-operator-performance-optim after user confirmation
Status Tracking Table
| Phase | Prerequisite | Skill / Action | Key Deliverables |
|---|---|---|---|
| 0. Requirements Collection | None | - | CANN + Conda + `op_name` |
| 1. Project + Catlass | Phase 0 | `ascendc-operator-project-init` + clone Catlass | Skeleton + Catlass source tree |
| 2. Design | Phase 1 | `catlass-operator-design` | `design.md` |
| 3. Code & Test | Phase 2 | `catlass-operator-code-gen` → `ascendc-operator-compile-debug` | Runnable operator + basic tests passing |
| 4. Interface Doc | Phase 3 | `ascendc-operator-doc-gen` | `README.md` |
| 5. Precision Eval | Phase 4 | `ascendc-operator-precision-eval` | ≥30 cases + precision report |
| 6. Performance Eval | Phase 5 | `ascendc-operator-performance-eval` | JSONL + profiler report |
| (Optional) Tuning | Phase 6 + user confirmation | `catlass-operator-performance-optim` | Iterated implementation and report |
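The prerequisite column describes strict gating: a phase starts only when the previous one has passed. The sketch below models that with plain callables; the `run_gated` name and the callable-per-phase shape are assumptions made for illustration, not an interface of any sub-skill.

```python
from typing import Callable, List, Tuple

# Each phase is a (name, run_and_check) pair; run_and_check returns True
# only when every checkpoint of that phase has passed.
Phase = Tuple[str, Callable[[], bool]]


def run_gated(phases: List[Phase]) -> List[str]:
    """Run phases in order and stop at the first one whose checkpoints fail.

    Returns the names of the phases that completed successfully.
    """
    completed = []
    for name, run_and_check in phases:
        if not run_and_check():
            break  # later phases must not start
        completed.append(name)
    return completed
```

Because the loop breaks on the first failure, a later phase can never run against the output of a phase that did not pass its checkpoints.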
Error Recovery
Resume from Breakpoint
When the user says "Continue Catlass operator development":
| Detection Condition | Determined Phase | Recovery Action |
|---|---|---|
| | Phase 1 not completed | Start from Phase 1 Step 1.1 |
| | Phase 1 not completed | Complete the Step 1.2 clone |
| | Phase 2 not completed | Start from Phase 2 |
| | Phase 3 not completed | Start from Phase 3 |
| whl not installed or `tests/test_<op_name>.py` failing | Phase 3 not completed | Recover within the compile-debug flow |
| No `README.md` | Phase 4 not completed | Start from Phase 4 |
| No precision report in `csrc/ops/<op_name>/test/` | Phase 5 not completed | Recover from Phase 5 |
| No performance report, or it does not meet performance-eval requirements | Phase 6 not completed | Recover from Phase 6 |
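Resuming can be approximated by probing for each phase's deliverable in order. The sketch below is built on assumptions that the table does not confirm: the detection conditions are inferred from the phase checkpoints, `tests/` is taken to sit directly under `ASCEND_KERNEL_ROOT`, the precision report uses the default `<op_name>_precision_report.md` name, and neither whl installation nor test results are inspected. The function name is hypothetical.

```python
from pathlib import Path


def detect_resume_phase(ascend_kernel_root, op_name: str) -> int:
    """Return the first phase whose file-level deliverable is missing.

    A return value of 6 means Phases 1-5 left their files behind; the
    performance report itself still has to be verified against the
    performance-eval skill.
    """
    root = Path(ascend_kernel_root)
    op_dir = root / "csrc" / "ops" / op_name
    if not op_dir.is_dir():
        return 1  # no operator skeleton yet (Step 1.1)
    if not (root / "catlass" / "include").is_dir() or not (root / "catlass" / "examples").is_dir():
        return 1  # Catlass source missing (Step 1.2)
    design = op_dir / "design.md"
    if not design.is_file() or not design.read_text(encoding="utf-8").strip():
        return 2  # design.md absent or still an empty placeholder
    if not (root / "tests" / f"test_{op_name}.py").is_file():
        return 3  # assumed test location; test results are not checked here
    if not (op_dir / "README.md").is_file():
        return 4
    if not (op_dir / "test" / f"{op_name}_precision_report.md").is_file():
        return 5
    return 6
```

File presence is only a first filter: a phase whose files exist may still have failing checkpoints, so the corresponding sub-skill has the final say.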
Compile/Test Failure
Handled by ascendc-operator-compile-debug (triggered via catlass-operator-code-gen); the retry and troubleshooting limits follow the compile-debug skill.