ascendc-operator-dev

AscendC Operator End-to-End Development Orchestration

Skill type: process-oriented (a seven-stage workflow that runs its sub-skills serially)
This skill orchestrates seven sub-skills (plus ascendc-operator-compile-debug, which code-gen invokes internally) to take an ascend-kernel operator from scratch to production-ready.

Core Principles

  1. Seven-stage Serial Execution: Project Initialization → Design Documentation → Test Case Generation → Code Generation & Testing → Interface Documentation → Precision Evaluation → Performance Benchmarking, executed in strict order
  2. Sub-skill Execution: Each stage MUST invoke its corresponding sub-skill; never implement a stage yourself
  3. Stage Gating: Proceed to the next stage only after every checkpoint of the current stage has passed
  4. Design-driven Coding: Code generation depends on the Tiling strategy and UB allocation table in the design document
  5. Automated Design: Users do not need to supply a design document in advance; the design stage generates it automatically
  6. Unified Test Case Generation: Generate the test case document immediately after the design is complete, so that precision evaluation and performance benchmarking can reuse it
  7. Documentation Closure: Once compilation and testing pass, MUST generate a PyTorch-style Chinese interface document and display it in the chat interface
  8. Precision Closure: An operator is complete only after it passes a comprehensive precision evaluation of at least 30 cases
  9. Performance Closure: An operator must pass an msprof-based performance comparison, and a performance report must be produced
  10. Result Visualization: The results of Phases 4/5/6/7 MUST be displayed directly in the chat interface as Markdown; do not output only file paths

Available Sub-skill List

| Skill | Path | Responsibility |
|---|---|---|
| ascendc-operator-project-init | ascendc-operator-project-init/SKILL.md | Detect or create the ascend-kernel project and generate the operator skeleton directory |
| ascendc-operator-design | ascendc-operator-design/SKILL.md | Analyze the operator requirements and generate the design document (including the Tiling strategy and UB allocation table) |
| ascendc-operator-testcase-gen | ascendc-operator-testcase-gen/SKILL.md | Generate the unified test case document from the design document, for reuse in precision evaluation and performance benchmarking |
| ascendc-operator-code-gen | ascendc-operator-code-gen/SKILL.md | Generate the op_host/op_kernel code, adapt the framework, and compile and test |
| ascendc-operator-compile-debug | ascendc-operator-compile-debug/SKILL.md | Compile, install the whl package, generate test files, and run precision tests (called internally by code-gen) |
| ascendc-operator-doc-gen | ascendc-operator-doc-gen/SKILL.md | Extract interface information from the source code and generate a PyTorch-style Chinese API document (mandatory stage) |
| ascendc-operator-precision-eval | ascendc-operator-precision-eval/SKILL.md | Generate ≥ 30 precision test cases, run them, and output a precision verification report (mandatory stage) |
| ascendc-operator-performance-eval | ascendc-operator-performance-eval/SKILL.md | Use msprof to compare the project operator against the native operator and output a performance benchmarking report (mandatory stage) |

Workflow Overview

```
Input: operator name + function description
  │
  ▼
Phase 1  Project Initialization                          project-init
  ▼
Phase 2  Design Document                                 design
  ▼
Phase 3  Test Case Generation                            testcase-gen
  ▼
Phase 4  Code Gen + Framework Adaptation + Compile Test  code-gen → compile-debug
  ▼
Phase 5  Interface Document                              doc-gen
  ▼
Phase 6  Precision Evaluation Report                     precision-eval
  ▼
Phase 7  Performance Benchmark Report                    performance-eval
  │
  ▼
Output: production-ready operator + test case doc + interface doc
        + precision report + performance report
```
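The serial, gated flow above can be sketched as a small driver loop. This is a minimal illustration under stated assumptions, not the skill's actual implementation: `PHASES`, `run_pipeline`, and the two callbacks (one standing in for invoking a sub-skill, one for checking that phase's checkpoints) are hypothetical names.

```python
from typing import Callable, List

# Hypothetical identifiers for the seven phases, in execution order.
PHASES: List[str] = [
    "project-init", "design", "testcase-gen", "code-gen",
    "doc-gen", "precision-eval", "performance-eval",
]


def run_pipeline(run_phase: Callable[[str], None],
                 checkpoints_pass: Callable[[str], bool],
                 start: str = "project-init") -> List[str]:
    """Run phases strictly in order from `start`; stop at the first failed gate."""
    completed: List[str] = []
    for phase in PHASES[PHASES.index(start):]:
        run_phase(phase)                  # invoke the phase's sub-skill
        if not checkpoints_pass(phase):   # stage gate: never advance on failure
            raise RuntimeError(f"checkpoints failed in {phase}; not advancing")
        completed.append(phase)
    return completed
```

Passing a later `start` phase corresponds to resuming an interrupted development session.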

Anti-pattern List (NEVER DO THESE)

  • ❌ Do not skip the design stage and go straight to writing code
  • ❌ Do not skip the test case generation stage; Phase 3 (testcase-gen) must run after Phase 2 passes
  • ❌ Do not implement any operator code yourself; always call the sub-skills
  • ❌ Do not modify framework files (ops.h / register.cpp / CMakeLists.txt) before code generation
  • ❌ Do not compile or test manually; the compile-debug skill handles both
  • ❌ Do not reference skills that do not exist
  • ❌ Do not skip checkpoint verification
  • ❌ Do not skip the interface documentation stage; Phase 5 must run after Phase 4 passes
  • ❌ Do not skip the precision evaluation stage; Phase 6 must run after Phase 5 passes
  • ❌ Do not skip the performance benchmarking stage; Phase 7 must run after Phase 6 passes
  • ❌ Do not base performance conclusions on any timing method other than msprof
  • ❌ Do not design your own test cases for precision evaluation or performance benchmarking; first read the test case document generated by testcase-gen

Phase 0: Requirements Collection

Goal: Confirm the minimum information set required for operator development, including development environment and operator requirements

Step 0.1: Environment Confirmation (MUST be completed before any development action)

The development environment is a prerequisite for every later stage, so it must be confirmed first.

CANN Environment

Automatic detection process:
  1. Check whether the environment variable `ASCEND_HOME_PATH` is set (`echo $ASCEND_HOME_PATH`)
  2. If it is set: use it directly as `CANN_PATH` without asking the user
  3. If it is not set: MUST ask the user for the CANN installation path (e.g. `/usr/local/Ascend/ascend-toolkit`)

Activation method:

```bash
source ${CANN_PATH}/*/set_env.sh
```

Run this activation command first in every shell session that compiles or runs operators.
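The detection rule can be sketched in Python. A child process cannot change the environment of the shell that launched it, so this helper only decides which `CANN_PATH` to use; the `source` command above still has to run in each shell. `resolve_cann_path`, `set_env_scripts`, and the `ask_user` callback are hypothetical names used for illustration.

```python
import glob
import os
from typing import Callable, List, Mapping, Optional


def resolve_cann_path(ask_user: Callable[[str], str],
                      env: Optional[Mapping[str, str]] = None) -> str:
    """Return CANN_PATH: $ASCEND_HOME_PATH if set, otherwise ask the user."""
    env = os.environ if env is None else env
    path = env.get("ASCEND_HOME_PATH", "").strip()
    if path:
        return path  # already set: use it without asking
    return ask_user("CANN install path (e.g. /usr/local/Ascend/ascend-toolkit): ").strip()


def set_env_scripts(cann_path: str) -> List[str]:
    """List the files matched by the ${CANN_PATH}/*/set_env.sh pattern."""
    return sorted(glob.glob(os.path.join(cann_path, "*", "set_env.sh")))
```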

Conda Environment

Automatic detection process:
  1. Check whether a conda environment is currently active (`echo $CONDA_DEFAULT_ENV`)
  2. If one is active (the value is non-empty and not `base`): use the current environment directly without asking the user
  3. If none is active, or the active one is `base`: MUST ask the user which conda environment to use

Activation method:

```bash
conda activate <env_name>
```

Activate the conda environment first in every shell session that compiles or runs operators.
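The conda rule reduces to a single condition. A minimal sketch; `resolve_conda_env` and the `ask_user` callback are hypothetical, and as with CANN the actual `conda activate` still happens in the shell.

```python
import os
from typing import Callable, Mapping, Optional


def resolve_conda_env(ask_user: Callable[[str], str],
                      env: Optional[Mapping[str, str]] = None) -> str:
    """Use $CONDA_DEFAULT_ENV unless it is empty or 'base'; otherwise ask."""
    env = os.environ if env is None else env
    current = env.get("CONDA_DEFAULT_ENV", "").strip()
    if current and current != "base":
        return current  # a non-base environment is already active
    return ask_user("Conda environment to use: ").strip()
```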

Environment Confirmation Checkpoints

  • CANN path is confirmed (auto-detected or provided by the user)
  • `source ${CANN_PATH}/*/set_env.sh` runs successfully
  • Conda environment name is confirmed (auto-detected or provided by the user)
  • `conda activate <env_name>` runs successfully

Step 0.2: Operator Requirements Collection

Mandatory Information to Confirm

| Information | Format | Mandatory | Description |
|---|---|---|---|
| CANN environment path | Absolute path | Yes | Auto-detected from `$ASCEND_HOME_PATH`; ask the user if it is not set |
| Conda environment name | String | Yes | Auto-detected from `$CONDA_DEFAULT_ENV`; ask the user if no environment is activated |
| Operator name | snake_case | Yes | e.g. `acosh`, `rms_norm`, `flash_attn` |
| Function description | Text / mathematical formula | Yes | e.g. "Inverse hyperbolic cosine acosh(x) = ln(x + sqrt(x² - 1))" |

Optional information (with default values):

| Information | Default value | Description |
|---|---|---|
| Supported data types | float16, float32 | Can be extended to bfloat16 |
| SoC platform | ascend910b | Obtained automatically via the platform API |
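A quick check for the snake_case requirement on the operator name. This assumes snake_case here means lowercase letters and digits separated by single underscores, starting with a letter, which is consistent with `acosh`, `rms_norm`, and `flash_attn`; the exact rule and the helper name `is_valid_op_name` are assumptions.

```python
import re

# Assumed rule: starts with a lowercase letter; single underscores separate
# non-empty runs of lowercase letters/digits; no leading or trailing "_".
_SNAKE_CASE = re.compile(r"[a-z][a-z0-9]*(?:_[a-z0-9]+)*")


def is_valid_op_name(name: str) -> bool:
    """True if `name` matches the assumed snake_case rule exactly."""
    return _SNAKE_CASE.fullmatch(name) is not None
```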

Decision Tree

| User request | Handling |
|---|---|
| "Generate X operator" / "Develop X operator" | Complete environment confirmation (Step 0.1) first, then infer the function from the operator name and, once it is confirmed, run the full workflow directly |
| "Help me develop a new operator" (no specific name) | Complete environment confirmation (Step 0.1) first, then ask for the operator name and function description |
| "Continue operator development" | Complete environment confirmation (Step 0.1) first, then inspect the existing files to determine the current phase and resume from where it stopped |

Acceptance Criteria

  • CANN environment path is confirmed and can be activated
  • Conda environment name is confirmed and can be activated
  • Operator name is confirmed (snake_case format)
  • Function description is clear (including mathematical formula or calculation logic)

Phase 1: Project Initialization

Called skill: `ascendc-operator-project-init`

Execution Content

MANDATORY: Execute according to the ascendc-operator-project-init skill process:
1. Detect whether the ascend-kernel project exists
2. If it does not, copy it from the template
3. Create the operator skeleton under csrc/ops/<op_name>/
4. Point out the three registration points that must be updated (ops.h, register.cpp, csrc/CMakeLists.txt)

Checkpoints

  • The ascend-kernel project exists (build.sh, CMakeLists.txt, csrc/)
  • The `csrc/ops/<op_name>/` directory has been created
  • It contains `op_host/<op_name>.cpp`, `op_kernel/<op_name>.cpp`, `CMakeLists.txt`, and `design.md`
All passed → Proceed to Phase 2
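The Phase 1 checkpoints are pure existence checks, so they can be verified mechanically. A minimal sketch; `missing_skeleton_files` is a hypothetical helper, and it only checks that the paths exist, not what they contain.

```python
from pathlib import Path
from typing import List


def missing_skeleton_files(project_root: str, op_name: str) -> List[str]:
    """Return the Phase 1 checkpoint paths that do not exist yet (empty = pass)."""
    root = Path(project_root)
    op_dir = root / "csrc" / "ops" / op_name
    required = [
        root / "build.sh",
        root / "CMakeLists.txt",
        root / "csrc",
        op_dir / "op_host" / f"{op_name}.cpp",
        op_dir / "op_kernel" / f"{op_name}.cpp",
        op_dir / "CMakeLists.txt",
        op_dir / "design.md",
    ]
    return [str(p.relative_to(root)) for p in required if not p.exists()]
```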

Phase 2: Design Document Generation

Called skill: `ascendc-operator-design`

Execution Content

MANDATORY: Execute according to the ascendc-operator-design skill process:
1. Analyze the operator requirements (name, function, data types)
2. Choose the implementation path (AscendC Kernel / CATLASS / ACLNN)
3. Design the Tiling strategy (block level + UB level)
4. Fill in the UB allocation table and derive bufferCoefficient
5. Write the complete design document to csrc/ops/<op_name>/design.md

Checkpoints

  • `csrc/ops/<op_name>/design.md` is complete
  • It contains the function signature and the supported data types
  • It contains calculation-logic pseudocode (the AscendC API call sequence)
  • It contains the UB allocation table (all buffers and the total coefficient)
  • It contains bufferCoefficient (one value per dtype)
All passed → Proceed to Phase 3
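This document does not spell out how bufferCoefficient is derived. One plausible reading, assumed here purely for illustration, is "UB bytes consumed per element of a tile": the sum, over every row of the UB allocation table, of buffer copies × element size. The tile length would then follow from the UB capacity that op_host reads through the platform API. `UbBuffer`, `buffer_coefficient`, `max_tile_elems`, and the 32-byte alignment are all assumptions, not the design skill's definitions.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class UbBuffer:
    name: str
    count: int            # copies held in UB, e.g. 2 for double buffering
    bytes_per_elem: int   # element size of this buffer's dtype


def buffer_coefficient(buffers: List[UbBuffer]) -> int:
    """Assumed definition: total UB bytes consumed per tile element."""
    return sum(b.count * b.bytes_per_elem for b in buffers)


def max_tile_elems(ub_bytes: int, coeff: int, io_bytes_per_elem: int,
                   align_bytes: int = 32) -> int:
    """Largest element count that fits in UB, rounded down so one I/O tile
    is a whole number of `align_bytes` blocks (alignment is an assumption)."""
    if coeff <= 0:
        raise ValueError("bufferCoefficient must be positive")
    step = align_bytes // io_bytes_per_elem
    return (ub_bytes // coeff) // step * step
```

For example, a float16 elementwise op with double-buffered input and output plus two float32 scratch buffers gives 2×2 + 2×2 + 2×4 = 16 bytes per element; with an illustrative 192 KiB of UB that allows 12288 elements per tile.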

Phase 3: Test Case Generation

Called skill: `ascendc-operator-testcase-gen`

Execution Content

MANDATORY: Execute according to the ascendc-operator-testcase-gen skill process:
1. Read csrc/ops/<op_name>/design.md and extract the parameter constraints, supported dtypes, and typical shapes
2. Generate TEST_SHAPES (regular shapes), GENERAL_SHAPES (generalized shapes), and BOUNDARY_VALUES (boundary values)
3. Generate the operator baseline (CPU reference implementation and NPU calling method)
4. Write the test case document to csrc/ops/<op_name>/test/<op_name>-test-cases.md

Checkpoints

  • `csrc/ops/<op_name>/test/<op_name>-test-cases.md` has been generated
  • It contains SUPPORTED_DTYPES, TEST_SHAPES, GENERAL_SHAPES, and BOUNDARY_VALUES
  • It contains the operator baseline (NPU calling method + CPU reference implementation)
  • All shapes and parameter values fall within the constraints in design.md
All passed → Proceed to Phase 4
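To illustrate the last checkpoint, the sections of the test case document can be pictured as Python literals and checked mechanically against design.md limits. The concrete shapes, the acosh-style lower bound of 1.0, and the limits passed to the hypothetical `check_test_cases` helper are all made up for this sketch; the real limits come from design.md.

```python
from math import prod
from typing import List, Sequence, Tuple

# Hypothetical content of <op_name>-test-cases.md for an acosh-like operator.
SUPPORTED_DTYPES = ["float16", "float32"]
TEST_SHAPES = [(1024,), (32, 512), (8, 16, 128)]
GENERAL_SHAPES = [(1,), (7, 13), (3, 5, 1023)]
BOUNDARY_VALUES = [1.0, 1.0001, 65504.0]   # acosh is defined for x >= 1


def check_test_cases(shapes: Sequence[Tuple[int, ...]], values: Sequence[float],
                     max_rank: int, max_numel: int, min_value: float) -> List[str]:
    """Return checkpoint violations against (assumed) design.md limits."""
    errors: List[str] = []
    for s in shapes:
        if not 1 <= len(s) <= max_rank:
            errors.append(f"rank of {s} outside [1, {max_rank}]")
        if any(d <= 0 for d in s) or prod(s) > max_numel:
            errors.append(f"shape {s} violates size limits")
    errors += [f"value {v} < {min_value}" for v in values if v < min_value]
    return errors
```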

Phase 4: Code Generation + Framework Adaptation + Compile Test

Called skill: `ascendc-operator-code-gen` (automatically invokes `ascendc-operator-compile-debug` internally)

Execution Content

MANDATORY: Execute according to the ascendc-operator-code-gen skill process:

Stage 1: Load reference documents
  - Read references/GUIDE.md
  - Load the reference that matches the operator type

Stage 2: Read the design document
  - Extract the function signature, UB allocation table, and calculation pseudocode

Stage 3: Select a template and generate code
  - Select the elementwise or row template
  - Generate op_host/<op_name>.cpp (including the Tiling calculation logic)
  - Generate op_kernel/<op_name>.cpp (including the Compute logic)

Stage 4: Framework adaptation
  - Update csrc/ops.h (function declaration)
  - Update csrc/register.cpp (m.def + m.impl)
  - Update csrc/CMakeLists.txt (OP_SRCS + ascendc_library)

Stage 5: Compile, install, and test (calls the compile-debug skill)
  - Compile with ./build.sh
  - Install the whl with pip install
  - Generate tests/test_<op_name>.py
  - Run functional tests and precision tests
  - If compilation or testing fails, make at most 3 debugging attempts

Checkpoints

  • `op_host/<op_name>.cpp` uses the platform API to obtain hardware parameters
  • `op_kernel/<op_name>.cpp` contains the complete CopyIn → Compute → CopyOut pipeline
  • `ops.h` contains the function declaration
  • `register.cpp` contains the `m.def` and `m.impl` entries
  • `csrc/CMakeLists.txt` includes the host and kernel source files
  • Compilation succeeds (the whl package has been generated)
  • Functional tests pass (exit code 0)
  • All precision tests pass (pytest all green)
All passed → Proceed to Phase 5

Phase 5: Interface Document Generation

Called skill: `ascendc-operator-doc-gen`

Execution Content

MANDATORY: Execute according to the ascendc-operator-doc-gen skill process:

Stage 1: Information extraction
  - Extract the Python calling signature (m.def schema) from register.cpp
  - Extract the C++ function declaration and return type from ops.h
  - Extract the algorithm description, parameter descriptions, dtype support, and constraints from design.md
  - Extract TORCH_CHECK constraints from op_host
  - Extract usage examples from tests/test_<op_name>.py

Stage 2: Document structure assembly
  - Assemble a Chinese interface document in the style of the official PyTorch documentation
  - Sections: title signature + function description + parameter description + supported data types + shape + constraints + usage examples + return value

Stage 3: File generation
  - Generate csrc/ops/<op_name>/README.md

Stage 4: Display the complete document in the chat interface

Checkpoints

  • Complete interface information (signature, parameters, dtype, shape, constraints) has been extracted from the source code
  • README.md contains all 8 sections (title signature, function description, parameter description, supported data types, shape, constraints, usage examples, return value)
  • The Python calling signature matches `m.def` in `register.cpp`
  • Parameter descriptions follow the PyTorch documentation style and are written in Chinese
  • The code in the usage examples runs
  • README.md has been written to `csrc/ops/<op_name>/README.md`
  • The interface document has been displayed in full in the chat interface
All passed → Proceed to Phase 6
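The signature-consistency checkpoint can be partly automated by pulling the schema strings out of register.cpp and comparing them with the signature shown in README.md. This sketch assumes each schema is a single string literal passed directly to `m.def(...)`; schemas built from macros or concatenated literals would need a real C++ parser. `extract_schemas` and `schema_name` are hypothetical helpers.

```python
import re
from typing import List

# Assumes schemas are written as a single literal, e.g.
#   m.def("acosh(Tensor self) -> Tensor");
_MDEF = re.compile(r'm\.def\(\s*"([^"]+)"')


def extract_schemas(register_cpp: str) -> List[str]:
    """Return every schema string passed to m.def in register.cpp source text."""
    return _MDEF.findall(register_cpp)


def schema_name(schema: str) -> str:
    """Operator name = text before the first '(' of the schema."""
    return schema.split("(", 1)[0].strip()
```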

Phase 6: Precision Evaluation Report

Called skill: `ascendc-operator-precision-eval`

Execution Content

MANDATORY: Execute according to the ascendc-operator-precision-eval skill process:

Stage 1: Load the test case document + collect information
  - Read csrc/ops/<op_name>/test/<op_name>-test-cases.md (output of testcase-gen)
  - Extract SUPPORTED_DTYPES, TEST_SHAPES, GENERAL_SHAPES, BOUNDARY_VALUES, and the operator baseline
  - Supplement this with information such as precision thresholds taken from the existing code

Stage 2: Test case adaptation ((shapes + boundary values) × dtypes ≥ 30 cases)
  - Reuse TEST_SHAPES and BOUNDARY_VALUES from testcase-gen directly
  - Run every shape and boundary value under every dtype the operator supports

Stage 3: Test script generation (output to the operator directory csrc/ops/<op_name>/test/)
  - Generate test_<op_name>_precision.py (pytest format) from the template
  - Generate run_<op_name>_precision_report.py (report generator) from the template

Stage 4: Execution
  - Run pytest; all tests must pass
  - Run the report generator to output JSON

Stage 5: Report generation
  - Generate <op_name>_precision_report.md (regular-shape and boundary-value tables + summary + key findings)
  - Tell the user where the report is
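The Stage 2 case count is a plain cross product, so the ≥ 30 requirement can be checked before any script is generated. A minimal sketch; `precision_cases` and its `(kind, item, dtype)` tuple layout are hypothetical.

```python
from itertools import product
from typing import List, Sequence, Tuple


def precision_cases(shapes: Sequence[Tuple[int, ...]],
                    boundary_values: Sequence[float],
                    dtypes: Sequence[str]) -> List[Tuple[str, object, str]]:
    """(shapes + boundary values) × dtypes; raise if fewer than 30 cases."""
    inputs = [("shape", s) for s in shapes] + [("boundary", v) for v in boundary_values]
    cases = [(kind, item, dt) for (kind, item), dt in product(inputs, dtypes)]
    if len(cases) < 30:
        raise ValueError(f"only {len(cases)} precision cases; need >= 30")
    return cases
```

With 10 shapes and 5 boundary values across float16 and float32, the product is exactly (10 + 5) × 2 = 30 cases.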

Checkpoints

  • Number of test cases = (shapes + boundary values) × dtypes ≥ 30
  • Every dtype the operator supports has been tested
  • All pytest precision tests pass
  • The JSON report has been generated (with 5 precision metrics: MaxAbsErr / MeanAbsErr / MaxRelErr / MeanRelErr / CosineSim)
  • The Markdown report has been generated at `csrc/ops/<op_name>/test/<op_name>_precision_report.md`
  • The precision results have been displayed in the chat interface as a Markdown table
  • The user has been told the precision report path
All passed → Proceed to Phase 7
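This document names the five report metrics but does not define them. The sketch below uses the conventional definitions (absolute error |a - e|, relative error |a - e| / |e| with a small epsilon guard, and cosine similarity of the flattened outputs), which is an assumption about what the report generator computes. It works on plain Python sequences to stay dependency-free; a real generator would more likely operate on NumPy arrays or torch tensors.

```python
import math
from typing import Dict, Sequence


def precision_metrics(actual: Sequence[float], expected: Sequence[float],
                      eps: float = 1e-12) -> Dict[str, float]:
    """Compute the five metrics over flattened NPU output vs. CPU reference.
    The eps guard on the relative-error denominator is an assumption."""
    if len(actual) != len(expected) or not actual:
        raise ValueError("outputs must be non-empty and of equal length")
    abs_err = [abs(a - e) for a, e in zip(actual, expected)]
    rel_err = [d / max(abs(e), eps) for d, e in zip(abs_err, expected)]
    dot = sum(a * e for a, e in zip(actual, expected))
    norm = math.sqrt(sum(a * a for a in actual)) * math.sqrt(sum(e * e for e in expected))
    if norm > 0:
        cos = dot / norm
    else:  # at least one side is all zeros: identical only if both are
        cos = 1.0 if max(abs_err) == 0 else 0.0
    n = len(abs_err)
    return {
        "MaxAbsErr": max(abs_err),
        "MeanAbsErr": sum(abs_err) / n,
        "MaxRelErr": max(rel_err),
        "MeanRelErr": sum(rel_err) / n,
        "CosineSim": cos,
    }
```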

Phase 7: Performance Benchmarking Report

Called skill: `ascendc-operator-performance-eval`

Execution Content

MANDATORY: Execute according to the ascendc-operator-performance-eval skill process:

Stage 1: Load the test case document + collect information
  - Read csrc/ops/<op_name>/test/<op_name>-test-cases.md (output of testcase-gen)
  - Extract SUPPORTED_DTYPES, TEST_SHAPES, GENERAL_SHAPES, and the operator baseline
  - Supplement this with information such as the OP Type keyword taken from the existing code

Stage 2: Test case adaptation (JSONL format, ≥ 8 cases)
  - Select representative shapes from testcase-gen's TEST_SHAPES + GENERAL_SHAPES
  - Cover every dtype the operator supports
  - Convert the cases to JSONL

Stage 3: Script generation (output to the operator directory csrc/ops/<op_name>/test/)
  - Generate run_<op_name>_case.py (single-case msprof runner) from the template
  - Generate benchmark_<op_name>_msprof.py (driver script) from the template
  - Generate <op_name>_cases.jsonl

Stage 4: Execution and collection
  - Run the driver script with 20 iterations per case (the first 10 are warm-up)
  - Filter op_summary_*.csv by OP Type and extract Task Duration(us) and hardware metrics
  - Output the results as JSON

Stage 5: Report generation
  - Generate <op_name>_perf_report.md (results table + summary + brief analysis)
  - Tell the user where the report is
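The Stage 2 JSONL conversion amounts to one JSON object per line for each shape × dtype combination. A minimal sketch; the field names `case_id`, `shape`, and `dtype` and the helper `perf_cases_jsonl` are hypothetical, since the actual schema is whatever run_<op_name>_case.py expects.

```python
import json
from itertools import product
from typing import Sequence, Tuple


def perf_cases_jsonl(shapes: Sequence[Tuple[int, ...]], dtypes: Sequence[str],
                     min_cases: int = 8) -> str:
    """Serialize every shape × dtype pair as one JSON line; enforce >= min_cases."""
    lines = [json.dumps({"case_id": i, "shape": list(s), "dtype": dt})
             for i, (s, dt) in enumerate(product(shapes, dtypes))]
    if len(lines) < min_cases:
        raise ValueError(f"{len(lines)} cases < {min_cases}")
    return "\n".join(lines) + "\n"
```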

Checkpoints

  • The JSONL cases cover multiple shape × dtype combinations (≥ 8 cases)
  • Data is collected with `msprof`, not with any other timing method
  • The target operator is filtered by `OP Type` (not by Op Name)
  • The 20/10 strategy is used: 20 iterations per case, the first 10 as warm-up and the remaining 10 for statistics
  • The JSON report has been generated (with Task Duration and hardware metrics)
  • The Markdown report has been generated at `csrc/ops/<op_name>/test/<op_name>_perf_report.md`
  • The report contains a brief analysis (≥ 3 conclusions)
  • The performance results have been displayed in the chat interface as a Markdown table
  • The user has been told the performance report path
All passed → Operator development is complete
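The extraction step in Stage 4 can be sketched with the standard csv module: keep the rows whose OP Type contains the keyword, drop the 10 warm-up iterations, and average Task Duration(us) over the rest. The column names are the ones quoted in this document; the assumption that op_summary_*.csv holds exactly one matching row per iteration in launch order, and the helper name `avg_task_duration_us`, are illustrative.

```python
import csv
from statistics import mean
from typing import TextIO


def avg_task_duration_us(op_summary: TextIO, op_type_keyword: str,
                         iterations: int = 20, warmup: int = 10) -> float:
    """Mean Task Duration(us) of the target op over the non-warm-up iterations.
    Substring matching on OP Type is an assumption; a keyword such as "Acosh"
    would also match "AcoshGrad", so choose it carefully."""
    durations = [float(row["Task Duration(us)"])
                 for row in csv.DictReader(op_summary)
                 if op_type_keyword in row["OP Type"]]
    if len(durations) != iterations:
        raise ValueError(f"expected {iterations} rows, got {len(durations)}")
    return mean(durations[warmup:])
```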

Inter-stage Data Flow

```
Phase 1 output                            Phase 2 input
  csrc/ops/<op_name>/            ────▶    operator name, directory structure
  design.md (placeholder)

Phase 2 output                            Phase 3 input
  design.md (complete)           ────▶    parameter constraints, supported dtypes, typical shapes
                                          → generate the unified test case document

Phase 3 output                            Phase 4 input
  <op_name>-test-cases.md        ────▶    design.md (complete):
  (test case document, reused               function signature, UB allocation table → bufferCoefficient
   by later phases)                         calculation pseudocode → Compute logic
                                            Tiling strategy → Block/UB splitting parameters

Phase 4 output                            Phase 5 input
  installed operator whl         ────▶    register.cpp / ops.h / design.md /
  tests/test_<op_name>.py                 op_host / test files
                                          → extract interface information to generate the document

Phase 5 output                            Phase 6 input
  csrc/ops/<op_name>/README.md   ────▶    <op_name>-test-cases.md (from Phase 3)
  (interface document done)               operator name, calling method, input domain constraints
                                          all supported dtypes, precision thresholds
                                          → output to csrc/ops/<op_name>/test/

Phase 6 output                            Phase 7 input
  precision report passed        ────▶    <op_name>-test-cases.md (from Phase 3)
  csrc/ops/<op_name>/test/                operator name, project/native calling methods
                                          all supported dtypes, OP Type keywords
                                          → output to csrc/ops/<op_name>/test/
```

Status Tracking Table

| Phase | Precondition | Called skill | Key deliverables |
|---|---|---|---|
| 0. Requirements Collection | None | None | CANN path + Conda environment + operator name + function description |
| 1. Project Initialization | Phase 0 | ascendc-operator-project-init | Operator skeleton directory |
| 2. Design Document | Phase 1 | ascendc-operator-design | design.md (including Tiling + UB allocation table) |
| 3. Test Case Generation | Phase 2 | ascendc-operator-testcase-gen | <op_name>-test-cases.md (unified test case document) |
| 4. Code & Testing | Phase 3 | ascendc-operator-code-gen (invokes compile-debug) | Runnable operator + basic tests passed |
| 5. Interface Document | Phase 4 | ascendc-operator-doc-gen | PyTorch-style Chinese API document (README.md) |
| 6. Precision Evaluation | Phase 5 | ascendc-operator-precision-eval | ≥ 30 precision test cases + precision report |
| 7. Performance Benchmarking | Phase 6 | ascendc-operator-performance-eval | msprof performance comparison + performance report |

Error Recovery

Resume from Interrupted Point

When the user says "Continue operator development":

| Detection condition | Determined stage | Recovery action |
|---|---|---|
| `csrc/ops/<op_name>/` does not exist | Phase 1 not completed | Start from Phase 1 |
| `design.md` is a placeholder or empty | Phase 2 not completed | Start from Phase 2 |
| `csrc/ops/<op_name>/test/<op_name>-test-cases.md` does not exist | Phase 3 not completed | Start from Phase 3 |
| `op_host/` still contains skeleton code | Phase 4 not completed | Start from Phase 4 |
| whl package not generated | Phase 4 compilation not completed | Resume from the compilation step |
| Basic tests not passed | Phase 4 testing not completed | Resume from the testing step |
| `csrc/ops/<op_name>/README.md` does not exist | Phase 5 not completed | Start from Phase 5 |
| No precision report in `csrc/ops/<op_name>/test/` | Phase 6 not started | Start from Phase 6 |
| Precision report does not exist or precision tests have not all passed | Phase 6 not completed | Resume from Phase 6 |
| Precision report exists but performance report does not | Phase 7 not started | Start from Phase 7 |
| `<op_name>_perf_report.md` does not exist or is incomplete | Phase 7 not completed | Resume from Phase 7 |
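The file-based rows of the recovery table translate into a check that walks the phases in order and returns the first one whose artifact is missing. A sketch under stated assumptions: "placeholder" and "skeleton" are detected by a hypothetical marker string (here `TODO`), and the rows that depend on build output or test results (whl not generated, basic tests not passed, precision tests not all passed) are left out because they cannot be decided from files under csrc/ops/<op_name>/ alone. `resume_phase` is a hypothetical helper; it returns 8 when every report exists.

```python
from pathlib import Path


def resume_phase(project_root: str, op_name: str,
                 placeholder_marker: str = "TODO") -> int:
    """Return the phase to resume from (8 means all artifacts exist)."""
    op = Path(project_root) / "csrc" / "ops" / op_name
    test = op / "test"

    def needs_work(p: Path) -> bool:
        # Missing, empty, or still containing the (assumed) placeholder marker.
        if not p.is_file():
            return True
        text = p.read_text()
        return not text.strip() or placeholder_marker in text

    if not op.is_dir():
        return 1
    if needs_work(op / "design.md"):
        return 2
    if not (test / f"{op_name}-test-cases.md").is_file():
        return 3
    if needs_work(op / "op_host" / f"{op_name}.cpp"):
        return 4
    if not (op / "README.md").is_file():
        return 5
    if not (test / f"{op_name}_precision_report.md").is_file():
        return 6
    if not (test / f"{op_name}_perf_report.md").is_file():
        return 7
    return 8
```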

Compilation/Test Failure

Handled internally by the `ascendc-operator-compile-debug` skill, with at most 3 debugging attempts. If it still fails after 3 attempts, stop and report the detailed errors to the user.
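The bounded-retry rule can be sketched as one initial compile-and-test run followed by at most three rounds of "apply a fix, run again", after which the accumulated errors are reported. `build_with_retries`, `attempt`, and `fix` are hypothetical stand-ins for what compile-debug does internally.

```python
from typing import Callable, List, Tuple


def build_with_retries(attempt: Callable[[], Tuple[bool, str]],
                       fix: Callable[[str], None],
                       max_debug_attempts: int = 3) -> List[str]:
    """Run compile+test; on failure apply a fix and retry, at most
    `max_debug_attempts` times. Return the failure logs seen before success,
    or raise with all logs if it still fails."""
    errors: List[str] = []
    ok, log = attempt()
    for _ in range(max_debug_attempts):
        if ok:
            return errors
        errors.append(log)
        fix(log)              # one debugging attempt
        ok, log = attempt()
    if ok:
        return errors
    errors.append(log)
    raise RuntimeError(f"still failing after {max_debug_attempts} debugging attempts:\n"
                       + "\n---\n".join(errors))
```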