Search Results: operator-optimization

Found 2 Skills

AI & Machine Learning | ascend-ai-coding/awesome-...

triton-ascend-migration

Migrates GPU/CUDA Triton operators to Triton-Ascend, or rewrites Python/PyTorch operators as Triton-Ascend implementations that run on Ascend NPU. When it identifies clear optimization opportunities, it directly outputs the optimized code, a minimal validation script, and troubleshooting instructions. Prioritize this skill when users mention 昇腾 (Ascend), NPU, triton-ascend, Triton operator migration, PyTorch operator rewriting, coreDim, UB overflow, 1D grid, physical core binding, block_ptr, stride, memory access alignment, mask performance, dtype degradation, or operator optimization, or when they ask questions such as "How to use this skill", "How to run it in the command line", or "How to perform migration/validation in a container", even if they do not explicitly say "write a skill" or "perform migration".

AI & Machine Learning | ascend-ai-coding/awesome-...

external-gitcode-ascend-simple-vector-triton-gpu-to-npu

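Migrates simple vector-type Triton operators from GPU to Ascend NPU. Use this skill when the user needs to migrate Triton code to NPU or mentions GPU-to-NPU migration, Triton migration, or Ascend adaptation. Note: operators with compilation problems cannot be migrated automatically.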

2 scripts | Checked