npu-adapter-reviewer
NPU Adapter Reviewer - GPU to Ascend NPU Adaptation Review Expert
This is an Agent Skill specifically designed for adapting GPU code to Huawei Ascend NPU. This skill covers the complete adaptation workflow: code analysis, bottleneck identification, adaptation script writing, verification plan design, and final report generation.
Core Workflow
Phase 1: Code Repository Acquisition and Analysis
Task 1.1: Obtain Source Code
Acquire the complete code repository based on user input (local path or GitHub link):

```bash
# If it's a GitHub link, clone first
git clone <repo_url> /tmp/gpu_code_base
cd /tmp/gpu_code_base

# If it's a local path, analyze directly
ls -la <local_path>
```
**Task 1.2: Comprehensive Code Scanning**
Analyze the code structure using parallel exploration:
1. **Exploration Agent 1 - Code Structure Analysis**
- Identify all Python files, CUDA files, and C++ files
- Recognize the project directory structure
- Locate main entry files and configuration files
2. **Exploration Agent 2 - GPU Dependency Identification**
- Search for CUDA API calls (`cudaMalloc`, `cudaMemcpy`, `kernel<<<...>>>`, `torch.cuda`, etc.)
- Search for PyTorch GPU-related code (`.cuda()`, `.to('cuda')`, `torch.device('cuda')`, etc.)
- Search for TensorRT-related code
- Search for deep learning framework-specific APIs (Transformer Engine, Flash Attention, etc.)
3. **Exploration Agent 3 - External Library Dependencies**
- Search for `import` and `from ... import` statements
- Identify all third-party library dependencies
- Check for libraries not supported by NPU
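The Exploration Agent 2 search above can be sketched as a small pattern scanner. This is a minimal sketch, assuming plain regex matching over source text is sufficient; the pattern set mirrors only the markers the document lists, and a real implementation might use `ripgrep` instead.

```python
# Hedged sketch of the GPU-dependency scan (Exploration Agent 2).
# Patterns cover only the markers named in the document.
import re
from pathlib import Path

GPU_PATTERNS = {
    "cuda_api": re.compile(r"\bcuda(Malloc|Memcpy|StreamCreate|Free)\b"),
    "kernel_launch": re.compile(r"<<<.*?>>>"),                      # kernel<<<...>>>
    "torch_cuda": re.compile(r"torch\.cuda|\.cuda\(\)|\.to\(['\"]cuda"),
    "tensorrt": re.compile(r"\btensorrt\b", re.IGNORECASE),
}

def scan_text(text):
    """Return {pattern_name: hit_count} for one file's contents."""
    return {name: len(pat.findall(text)) for name, pat in GPU_PATTERNS.items()}

def scan_repo(root):
    """Scan source files under root; return {path: hits} for files with any hit."""
    report = {}
    for path in Path(root).rglob("*"):
        if path.suffix in {".py", ".cu", ".cpp", ".cc", ".h"}:
            hits = scan_text(path.read_text(errors="ignore"))
            if any(hits.values()):
                report[str(path)] = hits
    return report
```

`scan_text` works on any string, so it can be unit-tested without a repository checkout.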
**Task 1.3: Generate Code Structure Report**
Output the following information:
- Total number of project files and lines of code
- File type distribution (Python/CUDA/C++/others)
- List of main dependent libraries
- Core modules and their function descriptions
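The Task 1.3 counts can be computed with a small pure function. A sketch, assuming the report only needs file and line counts bucketed by language (the suffix-to-language table is an assumption):

```python
# Hedged sketch of the Task 1.3 structure report: file/line counts by type.
from collections import Counter

EXT_KIND = {".py": "Python", ".cu": "CUDA", ".cpp": "C++", ".cc": "C++", ".h": "C++"}

def structure_report(entries):
    """entries: iterable of (file_suffix, file_text) pairs.
    Returns {"files": {...}, "lines": {...}} keyed by language."""
    files, lines = Counter(), Counter()
    for suffix, text in entries:
        kind = EXT_KIND.get(suffix, "Other")
        files[kind] += 1
        lines[kind] += len(text.splitlines())
    return {"files": dict(files), "lines": dict(lines)}
```

Keeping it a pure function over `(suffix, text)` pairs makes it testable without touching the filesystem.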
Phase 2: Identify Bottlenecks in GPU-to-NPU Migration
Task 2.1: Operator Compatibility Analysis
Identify the compatibility of GPU-specific operators on NPU by category:
| Bottleneck Category | Typical GPU Implementation | NPU Alternative | Migration Difficulty |
|---|---|---|---|
| CUDA Core Operators | | Ascend C Operator / ATB | High |
| Memory Operations | | | Medium |
| Streams and Events | | | Medium |
| cuBLAS/cuDNN | | | High |
| Flash Attention | | Ascend Flash Attention Operator | Medium |
| Custom Operators | PyTorch CUDA Extensions | ATC/ACL Operators | High |
| AMP/Mixed Precision | | | Low |
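The mapping in the table can also be carried as data for automated rewriting. A sketch — only `cudaStreamCreate → aclrtCreateStream` comes from this document; the other `aclrt*` names are the AscendCL runtime equivalents as I understand them and should be verified against the installed CANN version:

```python
# Hedged CUDA-to-AscendCL API name map; verify names against your CANN release.
CUDA_TO_ACL = {
    "cudaStreamCreate": "aclrtCreateStream",   # confirmed in this document
    "cudaStreamDestroy": "aclrtDestroyStream",
    "cudaMalloc": "aclrtMalloc",
    "cudaFree": "aclrtFree",
    "cudaMemcpy": "aclrtMemcpy",
}

def translate_call(api_name):
    """Return the NPU replacement, or None if the call needs manual porting."""
    return CUDA_TO_ACL.get(api_name)
```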
Task 2.2: Identify Specific Bottlenecks
Generate the following analysis for each GPU API call:
Bottleneck ID: #001
- File Location: src/attention/cuda_impl.cu:142
- GPU API: cudaStreamCreate(&stream)
- NPU Alternative: aclrtCreateStream(&stream)
- Migration Plan:
  - Replace the header file with aclrt.h
  - Replace the API calls
  - Handle error-code differences
- Estimated Workload: 0.5 person-days
- Impact Scope: Global stream management

**Task 2.3: Generate Bottleneck List**
Output the complete list of bottlenecks, sorted by impact scope and migration difficulty.
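The per-bottleneck record and the Task 2.3 sorting step can be sketched as follows; the field names mirror the template above, and the sort key (difficulty first, then estimated workload) is one reasonable reading of "sorted by impact scope and migration difficulty":

```python
# Hedged sketch: a bottleneck record plus the Task 2.3 sort.
from dataclasses import dataclass

DIFFICULTY_ORDER = {"high": 0, "medium": 1, "low": 2}

@dataclass
class Bottleneck:
    bottleneck_id: str      # e.g. "#001"
    location: str           # file:line
    gpu_api: str
    npu_alternative: str
    difficulty: str         # "high" / "medium" / "low"
    workload_days: float    # estimated person-days
    impact: str

def sorted_bottlenecks(items):
    """Hardest first; ties broken by larger estimated workload."""
    return sorted(items, key=lambda b: (DIFFICULTY_ORDER[b.difficulty], -b.workload_days))
```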
Phase 3: Write Adaptation Scripts
Task 3.1: Create NPU Adaptation Layer
Create adaptation scripts based on the identified bottlenecks:

- Create `npu_compat.py` - Python-layer compatibility adaptation:

```python
# Auto-detect the running device
def get_device():
    if is_npu_available():
        return "npu"
    elif is_cuda_available():
        return "cuda"
    else:
        return "cpu"

# Replace torch.cuda calls
def to_device(tensor):
    device = get_device()
    if device == "npu":
        return tensor.npu()
    elif device == "cuda":
        return tensor.cuda()
    return tensor
```

- Create `npu_ops.py` - NPU operator encapsulation:
  - Encapsulate all CUDA core operators as NPU versions
  - Retain the original interfaces, implementing the NPU adaptation internally
- Create `build_npu.sh` - compilation script:
  - Ascend C operator compilation commands
  - Dependency environment checks
  - Error diagnosis

Task 3.2: Modify Original Code
Generate the modified code files, retaining the originals and creating `.npu` versions:
- Replace all GPU-specific calls
- Add device detection logic
- Add a fallback mechanism
Phase 4: Design Verification Plan
Task 4.1: Create Verification Scripts
Generate the verification script `verify_npu.sh` based on the adaptation content:

```bash
#!/bin/bash
# NPU Adaptation Verification Script

echo "=== 1. Environment Check ==="
check_npu_env() {
    # Check the NPU driver
    ls -la /dev/npu 2>/dev/null || echo "Warning: NPU device not found"
    # Check CANN
    echo $ASCEND_TOOLKIT_HOME
    # Check Python packages
    python3 -c "import torch; print('PyTorch version:', torch.__version__)"
    python3 -c "import torch_npu; print('torch_npu installed')"
}

echo "=== 2. Module Import Test ==="
test_imports() {
    cd <project_path>
    python3 -c "import npu_compat; print('npu_compat OK')"
    python3 -c "import npu_ops; print('npu_ops OK')"
}

echo "=== 3. Function Verification ==="
test_functions() {
    # Run basic tests
    python3 -m pytest tests/test_npu_*.py -v
    # Verify operator precision
    python3 scripts/verify_precision.py
}

echo "=== 4. Performance Benchmark ==="
benchmark() {
    python3 scripts/benchmark.py --device npu --compare cuda
}
```
**Task 4.2: Precision Verification Script**
Generate `verify_precision.py`:
```python
import numpy as np

def verify_npu_precision(cuda_result, npu_result, rtol=1e-3, atol=1e-3):
    """Verify the precision difference between NPU and GPU outputs."""
    diff = np.abs(cuda_result - npu_result)
    max_diff = np.max(diff)
    mean_diff = np.mean(diff)
    passed = np.allclose(cuda_result, npu_result, rtol=rtol, atol=atol)
    return {
        "passed": passed,
        "max_diff": max_diff,
        "mean_diff": mean_diff,
        "rtol": rtol,
        "atol": atol,
    }
```
Phase 5: Generate Review Report
Task 5.1: Generate Markdown Report
Generate a complete review report based on the verification results:
GPU to Ascend NPU Adaptation Review Report
CodeReview_Results_YYYY-MM-DD.md
1. Executive Summary
| Item | Content |
|---|---|
| Original Code Repository | |
| Review Date | YYYY-MM-DD |
| Adaptation Status | ✅ Fully Adapted / ⚠️ Partially Adapted / ❌ Adaptation Failed |
| Total Identified Bottlenecks | XX |
| Adapted Bottlenecks | XX |
| Remaining Bottlenecks | XX |
2. Original Code Analysis
2.1 Code Structure Overview
- Total Files: XX
- Lines of Python Code: XX
- Lines of CUDA/C++ Code: XX
- Core Modules: ...
2.2 Dependency Analysis
| Library Name | Version | NPU Compatibility | Alternative |
|---|---|---|---|
| torch | 2.x | ✅ Compatible | torch_npu |
| flash-attn | 2.x | ⚠️ Partial | Ascend Flash Attention |
3. Detailed Analysis of Migration Bottlenecks
3.1 Operator Compatibility Issues
Issue #001: CUDA Stream Management
- 文件:
src/utils/stream_manager.py:45 - GPU API:
cudaStreamCreate - 问题描述: 使用CUDA流管理异步执行
- NPU替代:
aclrtCreateStream - 影响范围: 全局,影响所有异步操作
- 迁移建议:
python
# 修改前 import torch.cuda stream = torch.cuda.Stream() # 修改后 import torch_npu stream = torch.npu.Stream() - 状态: ✅ 已适配 / ⚠️ 待处理
- File:
src/utils/stream_manager.py:45 - GPU API:
cudaStreamCreate - Issue Description: Uses CUDA streams to manage asynchronous execution
- NPU Alternative:
aclrtCreateStream - Impact Scope: Global, affects all asynchronous operations
- Migration Suggestion:
python
# Before modification import torch.cuda stream = torch.cuda.Stream() # After modification import torch_npu stream = torch.npu.Stream() - Status: ✅ Adapted / ⚠️ Pending
Issue #002: Flash Attention Operator
- File: src/attention/flash_attn_impl.py:78
- GPU API: flash_attn_varlen_func
- Issue Description: Uses Flash Attention to accelerate attention computation
- NPU Alternative: Ascend flash_attn operator or MindSpore flash_attention
- Impact Scope: High, core inference performance
- Migration Suggestion:

```python
# Before
from flash_attn import flash_attn_func
output = flash_attn_func(q, k, v)

# After
# Option 1: use torch_npu operators
import torch_npu
output = torch_npu.npu_flash_attention(q, k, v)

# Option 2: use the ATB library
from ascend_toolkit import flash_attention
output = flash_attention(q, k, v)
```

- Status: ✅ Adapted / ⚠️ Pending
3.2 Model Loading and Weight Management Issues
Issue #003: GPU Weight Format
- File: src/model/loader.py:112
- Issue Description: Weights stored in CUDA format will fail to load directly
- Migration Suggestion:

```python
# Before
state_dict = torch.load(weights_path)
model.load_state_dict(state_dict)

# After
state_dict = torch.load(weights_path, map_location='cpu')
# Convert weights to NPU
for k, v in state_dict.items():
    if isinstance(v, torch.Tensor):
        state_dict[k] = v.npu()
model.load_state_dict(state_dict)
```

- Status: ✅ Adapted
3.3 Computing Performance Bottlenecks
Issue #004: Missing Operator Fusion
- File: src/model/inference.py:89
- Issue Description: Multiple independent operators cause performance degradation
- Migration Suggestion: Use ATC for operator fusion optimization
- Estimated Performance Improvement: 20-30%
- Status: ⚠️ Pending
3.4 NPU Memory and KV Cache Management
Issue #005: Dynamic Memory Allocation
- File: src/cache/kv_cache.py:56
- Issue Description: Uses CUDA dynamic memory allocation
- Migration Suggestion: Use a fixed memory pool
- Status: ⚠️ Pending
3.5 Python-C++ Boundary Issues
Issue #006: C++ Extension Compilation
- File: src/utils/gpu_ext.cpp:145
- Issue Description: CUDA C++ extensions need to be recompiled
- Migration Suggestion: Rewrite with Ascend C or use ATB
- Status: ⚠️ Pending
3.6 Concurrency and Asynchronous Issues
Issue #007: Multi-Stream Concurrency
- File: src/server/request_handler.py:78
- Issue Description: Uses CUDA streams to implement concurrency
- Migration Suggestion: Refactor to process-level concurrency
- Status: ⚠️ Pending
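Issue #007's suggestion (replacing CUDA-stream concurrency with process-level concurrency) can be sketched with the standard library; `handle_request` here is a hypothetical stand-in for per-request NPU inference:

```python
# Hedged sketch of process-level request concurrency via a process pool.
from concurrent.futures import ProcessPoolExecutor

def handle_request(payload):
    # In a real server, each worker process would initialize its own NPU
    # context (e.g. via torch_npu) before running inference.
    return payload * 2  # placeholder computation

def serve(requests, workers=2):
    """Fan requests out across worker processes; results keep request order."""
    with ProcessPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(handle_request, requests))

if __name__ == "__main__":
    print(serve([1, 2, 3]))
```

Process-level workers avoid sharing one device context across threads, at the cost of per-process model memory.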
3.7 Configuration and Maintainability Issues
Issue #008: Hard-Coded Device
- File: src/config.py:23
- Issue Description: `cuda:0` is hard-coded in the configuration
- Migration Suggestion: Change to runtime device detection
- Status: ✅ Adapted
4. Adaptation Code List
4.1 Newly Added Files
| File Name | Function | Status |
|---|---|---|
| `npu_compat.py` | Device detection and compatibility layer | ✅ |
| `npu_ops.py` | NPU operator encapsulation | ✅ |
| `build_npu.sh` | Compilation script | ✅ |
| `verify_npu.sh` | Verification script | ✅ |
4.2 Modified Files
| File Name | Modification Content | Status |
|---|---|---|
| `src/attention/flash_attn_impl.py` | Replaced with NPU operators | ✅ |
| `src/model/loader.py` | Added weight conversion | ✅ |
| `src/utils/stream_manager.py` | Stream adaptation | ✅ |
5. Verification Results
5.1 Environment Verification
- NPU driver installed
- CANN Toolkit configured
- torch_npu installed
- Python modules importable
5.2 Function Verification
- Basic module import tests passed
- Device detection works correctly
- Forward inference executed successfully
- Weight loading and conversion work correctly
5.3 Precision Verification
- Inference result difference from GPU < 0.1%
- Performance test pending (requires NPU hardware)
5.4 Issue Summary
| Issue Type | Quantity | Severity |
|---|---|---|
| Resolved | XX | - |
| Unresolved | XX | High/Medium/Low |
6. Adaptation Guide
6.1 Prerequisites
```bash
# 1. Install CANN Toolkit
# Download: https://www.hiascend.com/software/aiengine

# 2. Install torch_npu
pip install torch torch_npu

# 3. Verify the installation
python3 -c "import torch; import torch_npu; print('NPU available:', torch_npu.is_npu_available())"
```
6.2 Quick Adaptation Steps
Step 1: Clone and enter the project

```bash
git clone <repo_url>
cd <project_name>
```

Step 2: Install dependencies

```bash
pip install -r requirements-npu.txt
```

Step 3: Run verification

```bash
bash verify_npu.sh
```

Step 4: Execute inference

```bash
python3 run_npu.py --model <model_path> --input <input_data>
```
6.3 Common Troubleshooting
| Issue | Cause | Solution |
|---|---|---|
| Import failure | CANN not installed correctly | Reconfigure environment variables |
| Operator not supported | NPU does not support the operator | Use ATB alternative or self-developed operator |
| Out of memory | Batch size too large | Reduce batch_size |
| Precision not up to standard | Mixed precision configuration issue | Check AMP configuration |
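The "CANN not installed correctly" row above can be partially automated by checking the environment variables that CANN's setup scripts normally export. A sketch — the variable list is an assumption; adjust it to your installation:

```python
# Hedged helper: report missing CANN-related environment variables.
import os

def missing_cann_env(env=None):
    """Return the names of expected CANN environment variables that are unset."""
    env = os.environ if env is None else env
    expected = ["ASCEND_TOOLKIT_HOME", "LD_LIBRARY_PATH"]  # assumed list
    return [name for name in expected if not env.get(name)]
```

Passing `env` explicitly keeps the function testable without modifying the real environment.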
7. Follow-Up Work Suggestions
7.1 Short-Term (Within 1 Week)
- Complete adaptation of remaining bottlenecks
- Conduct performance tests on real NPU hardware
- Optimize operator fusion
7.2 Mid-Term (Within 1 Month)
- Improve error handling mechanism
- Add logging and monitoring
- Performance tuning
7.3 Long-Term
- Keep up with CANN updates
- Automate testing processes
- Improve documentation
Report Generation Time: YYYY-MM-DD HH:mm:ss
Adaptation Engineer: AI Agent (NPU Adapter Reviewer)
Report Version: v1.0
**Task 5.2: Output Report**
Save the report to the current working directory as `CodeReview_Results_YYYY-MM-DD.md`.
Output Requirements
- Report Format: Must be Markdown
- File Naming: `CodeReview_Results_YYYY-MM-DD.md` (YYYY-MM-DD is the current run date)
- Save Location: Current working directory
- Content Completeness: Must include all sections listed above
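The required filename can be built from the run date as follows; a minimal sketch using only the standard library:

```python
# Build the report filename required above (YYYY-MM-DD = run date).
from datetime import date

def report_filename(day=None):
    day = day or date.today()
    return f"CodeReview_Results_{day.isoformat()}.md"
```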
Special Handling Rules
If Verification Passes Completely
- Output "Adaptation Successful" status
- Provide complete adaptation guide
- Include end-to-end execution instructions
If Verification Does Not Pass Completely
- Detail each failed item
- Provide specific fix suggestions
- Provide the modified code
- Mark the parts that require manual intervention
Knowledge References
During execution, refer to the following materials (load on demand):
- references/ascend_npu_best_practices.md - Ascend NPU Best Practices
- references/cann_migration_guide.md - CANN Migration Guide
- references/npu_python_api.md - NPU Python API Reference
Please use this skill to complete the full adaptation review work for GPU to Ascend NPU.