Search: cutile - AI Agent Skills

AI & Machine Learning | nvidia/skills

tilegym-improve-cutile-kernel-perf

Iteratively optimize cuTile kernel performance through systematic profiling, bottleneck analysis, IR comparison, and targeted tuning. Covers tile sizes, occupancy, autotune configs, TMA, latency hints, persistent scheduling, num_ctas, flush_to_zero, and IR-level debugging. Use when asked to "optimize cutile kernel", "improve kernel perf", "tune cutile performance", "make kernel faster", or iteratively benchmark and refine a cuTile GPU kernel in the TileGym project.

1

Backend Development | nvidia/skills

tilegym-converting-cutile-to-julia

Converts cuTile Python GPU kernels (@ct.kernel) to cuTile.jl Julia equivalents. Handles kernel syntax translation, 0-indexed to 1-indexed conversion, broadcasting differences, memory layout (row-major to column-major), type system mapping, and launch API differences. Use when converting, porting, or translating cuTile Python kernels to Julia cuTile.jl, or debugging/optimizing existing Julia cuTile translations.

1

4 scripts/Attention

Backend Development | pepperu96/hyper-mla

design-cute-dsl-kernel

CuTe Python DSL kernel workflow, CuteKernel runtime wrapper, suitability gate, tiling guidance, and CuTe-specific pitfalls. Use when: (1) planning or implementing a kernel in the CuTe Python DSL, (2) the optimization needs more explicit control than cuTile exposes but should remain in a Python-driven workflow, (3) defining package naming for cute-dsl kernels, (4) documenting CuTe Python DSL design choices, (5) recording language-specific knowledge for CuTe Python DSL.

4

AI & Machine Learning | promptingcompany/nv-skill...

tilegym-monkey-patch-kernels-to-transformers

Integrate TileGym kernels into Hugging Face `transformers` models by replacing the library's submodule(s) and certain class(es)' implementations, and patching certain class(es)' init/forward/load weight methods prior to instantiating models. Used when the user requires integrating TileGym kernels into `transformers` models.

7

AI & Machine Learning | nvidia/skills

monkey-patch-kernels-to-transformers

Integrate TileGym kernels into Hugging Face `transformers` models by replacing the library's submodule(s) and certain class(es)' implementations, and patching certain class(es)' init/forward/load weight methods prior to instantiating models. Used when the user requires integrating TileGym kernels into `transformers` models.

4

Search Results: cutile

tilegym-improve-cutile-kernel-perf

tilegym-converting-cutile-to-julia

design-cute-dsl-kernel

tilegym-monkey-patch-kernels-to-transformers

monkey-patch-kernels-to-transformers
