tilegym-converting-cutile-to-julia

cuTile Python → cuTile.jl (Julia) Conversion

Convert `@ct.kernel` Python kernels to cuTile.jl kernels written as Julia `function ... end` definitions.

Workflow Selection

  • Standard conversion → full workflow: `translations/workflow.md`
  • Errors (`MethodError`, `IRError`, numerical mismatch) → `references/debugging.md`
  • Quick reference → `references/api-mapping.md` + `references/critical-rules.md`
  • Test patterns → `references/testing.md`

Architecture

Julia kernels are standalone: no Python bridge, no pytest integration. The Julia sub-project lives in `julia/` at the repo root, with its own `Project.toml` for dependency management.
```
julia/                          # Self-contained Julia sub-project
├── Project.toml                # Dependencies: CUDA.jl, cuTile.jl, NNlib.jl, Test
├── kernels/                    # cuTile.jl kernel implementations
│   ├── add.jl                  # ← Ground-truth: 1D element-wise with alpha scaling (tensor+tensor, tensor+scalar)
│   ├── matmul.jl               # ← Ground-truth: 2D tiled MMA, standard Julia layout (M,K)×(K,N)→(M,N)
│   └── softmax.jl              # ← Ground-truth: 3 strategies (TMA, online, chunked) using ct.load/ct.store
└── test/                       # Julia-native tests (using Test stdlib)
    ├── runtests.jl             # Test runner entry point
    ├── test_add.jl
    ├── test_matmul.jl
    └── test_softmax.jl
```
Ground-truth reference: always consult `julia/kernels/*.jl` and `julia/test/*.jl` for patterns that compile and pass tests. These are the canonical examples of working cuTile.jl code.
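The tree above describes `matmul.jl` as a 2D tiled MMA in standard Julia layout, (M,K)×(K,N)→(M,N). The CPU-only sketch below illustrates that tiling and K-loop structure on ordinary Julia arrays. It is not cuTile.jl code: the function name `tiled_matmul` and its default tile sizes are hypothetical, chosen only to make the indexing and column-major layout concrete. `julia/kernels/matmul.jl` remains the reference for the real kernel.

```julia
using LinearAlgebra  # matrix `*` for the per-tile product

# CPU sketch (not cuTile.jl): 2D-tiled matmul in standard Julia column-major
# layout. A is (M, K), B is (K, N), and the result C is (M, N).
# `tiled_matmul` and its tile sizes are hypothetical, for illustration only.
function tiled_matmul(A::AbstractMatrix{T}, B::AbstractMatrix{T};
                      tm::Int = 32, tn::Int = 32, tk::Int = 32) where {T<:Number}
    M, K = size(A)
    K2, N = size(B)
    K == K2 || throw(DimensionMismatch("A is $M×$K but B is $K2×$N"))
    C = zeros(T, M, N)
    # One iteration per output tile, analogous to one kernel block.
    for j0 in 1:tn:N, i0 in 1:tm:M
        rows = i0:min(i0 + tm - 1, M)
        cols = j0:min(j0 + tn - 1, N)
        acc = zeros(T, length(rows), length(cols))  # per-tile accumulator
        # The K-loop: accumulate partial products one K-tile at a time.
        for k0 in 1:tk:K
            ks = k0:min(k0 + tk - 1, K)
            acc += A[rows, ks] * B[ks, cols]  # CPU stand-in for the tile-level multiply-add
        end
        C[rows, cols] = acc
    end
    return C
end

A = rand(Float32, 70, 45)
B = rand(Float32, 45, 33)
C = tiled_matmul(A, B)
```

Because Julia arrays are column-major, this layout needs no transposes: the (M,K)×(K,N) product maps directly onto Julia's own `*`.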

Instructions

  1. Analyze the Python kernel: identify patterns, shapes, dtypes, and operations.
  2. Write the Julia kernel: create `julia/kernels/<op>.jl` with the cuTile.jl kernel plus its bridge function(s).
  3. Convert the kernel signature (see `translations/workflow.md`, Phase 2).
  4. Convert the kernel body (apply `references/api-mapping.md` + `references/critical-rules.md`).
  5. Write the Julia test: create `julia/test/test_<op>.jl` using the `Test` stdlib, with `NNlib.jl` for reference results.
  6. Register the test: add `include(...)` in `julia/test/runtests.jl`.
  7. Validate: run the bundled validator with `python <skill-dir>/scripts/validate_cutile_jl.py <file.jl>`.
  8. Test: run `julia --project=julia/ julia/test/runtests.jl`.

Full conversion checklist with post-conversion verification → `translations/workflow.md`
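Steps 5 and 6 call for a `Test`-stdlib test file registered from `runtests.jl`. The following is a CPU-only sketch of that pattern under stated assumptions: `tiled_add!` and `tile_of` are hypothetical stand-ins that mimic a 1D tiled kernel (one loop iteration per block id) so the example runs without a GPU, and the alpha semantics `c = a + alpha * b` follow the PyTorch `add` convention, which is an assumption here. A real `test_add.jl` would call the bridge function from `julia/kernels/add.jl` instead; check `julia/test/*.jl` for the actual structure.

```julia
using Test

# Hypothetical CPU stand-in for a converted 1D add kernel (not cuTile.jl):
# each loop iteration plays the role of one block that loads, computes, and
# stores a tile of `tile` elements. Assumes c = a + alpha * b.
tile_of(x::AbstractVector, r) = view(x, r)  # tensor operand: take this tile
tile_of(x::Number, r) = x                   # scalar operand: broadcast as-is

function tiled_add!(c::AbstractVector, a::AbstractVector, b, alpha; tile::Int = 128)
    n = length(c)
    for bid in 1:cld(n, tile)                          # block ids are 1-based here
        r = ((bid - 1) * tile + 1):min(bid * tile, n)  # the last tile may be partial
        c[r] .= tile_of(a, r) .+ alpha .* tile_of(b, r)
    end
    return c
end

# In julia/test/runtests.jl, a file like this would be registered with
#     include("test_add.jl")
@testset "add (CPU sketch)" begin
    for T in (Float32, Float64)
        n = 1000                                       # not a multiple of the tile size
        a, b = rand(T, n), rand(T, n)
        alpha = T(0.5)
        rtol = T == Float32 ? 1e-5 : 1e-12
        # tensor + tensor
        @test isapprox(tiled_add!(similar(a), a, b, alpha), a .+ alpha .* b; rtol = rtol)
        # tensor + scalar
        @test isapprox(tiled_add!(similar(a), a, T(2), alpha), a .+ alpha * T(2); rtol = rtol)
    end
end
```

Choosing `n` so it is not a multiple of the tile size exercises the partial last tile, which is where converted kernels most often go wrong; the per-dtype `rtol` mirrors the idea of dtype-dependent tolerances.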

⚠️ Top Pitfalls

These are the most dangerous translation errors. The full set of 17 rules is in `references/critical-rules.md`.

| # | Pitfall | One-line fix |
|---|---------|--------------|
| 1 | `ct.full()` doesn't exist in Julia | Use `fill(val, shape)`, `zeros(T, dims...)`, or `ones(T, dims...)` |
| 2 | `max(a, b)` on tiles → `IRError` | Use `max.(a, b)` (broadcast dot) |
| 3 | `IRError` / `MethodError` mentioning `IRStructurizer` | Compiler bug: file upstream with a minimal reproducer |
| 4 | `ct.launch` argument order silently wrong | Arguments are positional: match the kernel signature exactly |
| 5 | `ct.load` with `order`: index positions wrong | `order` remaps BOTH shape AND index (Critical Rule 16) |
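The fixes for pitfalls 1, 2, and 4 rest on ordinary Julia semantics, which the CPU-only snippet below illustrates on plain arrays rather than cuTile tiles. It assumes, as the table states, that the Base constructors and dot-broadcast are the right spellings inside cuTile.jl kernels. `scale_shift` is a hypothetical function used only to show how swapped positional arguments of the same type go unnoticed.

```julia
# Plain Julia on CPU arrays (not cuTile tiles).

# Pitfall 1: build constant arrays with Base constructors instead of ct.full().
neg_inf = fill(-Inf32, (2, 4))   # fill(val, shape)
acc     = zeros(Float32, 2, 4)   # zeros(T, dims...)
unit    = ones(Float32, 4)       # ones(T, dims...)

# Pitfall 2: an elementwise maximum needs the broadcast dot.
a = Float32[1, 5, 3, 8]
b = Float32[4, 2, 6, 7]
m = max.(a, b)                   # compares element by element

# Pitfall 4: positional arguments of the same type swap silently.
# `scale_shift` is hypothetical; think of it as a kernel signature.
scale_shift(x, scale, shift) = x .* scale .+ shift
right = scale_shift(a, 2f0, 1f0)  # matches the signature order
wrong = scale_shift(a, 1f0, 2f0)  # no error is raised, but the result differs
```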

Worked Examples

Side-by-side Python → Julia conversions that correspond to the released Julia kernels in `julia/kernels/`. Each example directory contains `cutile_python.py` (before) and `cutile_julia.jl` (after).

| # | Example | Key Patterns | When to Reference |
|---|---------|--------------|-------------------|
| 01 | add | 1D `ct.load`/`ct.store`, alpha scaling, scalar broadcast, `fill`/`zeros`, keyword load/store | Starting point; basic TMA + element-wise patterns |
| 02 | matmul | `muladd`, TF32 conversion, K-loop with `for`, 2D swizzle, standard Julia layout, `ct.@compiler_options` | MMA / tensor core operations |
| 03 | softmax | Persistent scheduling, `for` loops, `gather`/`scatter`, `padding_mode`, multi-pass | Large-tensor reduction patterns |

These examples follow the released kernels in `julia/kernels/` (`add.jl`, `matmul.jl`, `softmax.jl`), but they are simplified teaching versions: always consult `julia/kernels/*.jl` for the canonical, tested implementations.
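Of the three softmax strategies listed for `softmax.jl` (TMA, online, chunked), the online one is the least obvious. The CPU-only sketch below shows the standard online-softmax recurrence on a single row: a running maximum and a running sum of exponentials are updated chunk by chunk, the sum is rescaled by exp(m_old - m_new) whenever the maximum grows, and a final pass normalizes. It illustrates the math only; `online_softmax` and its `chunk` parameter are hypothetical, and how `softmax.jl` maps rows and chunks onto tiles should be taken from the kernel itself.

```julia
# CPU sketch (not cuTile.jl) of online softmax over one row `x`,
# reading it in chunks of `chunk` elements.
function online_softmax(x::AbstractVector{T}; chunk::Int = 4) where {T<:AbstractFloat}
    m = typemin(T)   # running maximum, starts at -Inf
    s = zero(T)      # running sum of exp(x_i - m)
    n = length(x)
    for c0 in 1:chunk:n
        xs = view(x, c0:min(c0 + chunk - 1, n))
        m_new = max(m, maximum(xs))
        # Rescale the old sum to the new maximum, then add this chunk.
        s = s * exp(m - m_new) + sum(exp.(xs .- m_new))
        m = m_new
    end
    return exp.(x .- m) ./ s   # final pass: normalize with the final m and s
end

x = Float32[1, 2, 3, 4, 5, 6, 7]
y = online_softmax(x; chunk = 3)
```

Subtracting the running maximum keeps every `exp` argument non-positive, so the sum cannot overflow even for large inputs; the rescaling step is what lets the maximum be discovered in the same pass as the sum.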

Reference Documents

| Category | Document | Content |
|----------|----------|---------|
| Workflows | `translations/workflow.md` | Full conversion workflow with todo list, validation loop, checklist |
| Rules | `references/critical-rules.md` | 17 Critical Rules for cuTile Python → Julia conversion |
| API | `references/api-mapping.md` | Python ↔ Julia bidirectional API mapping + kernel patterns |
| Testing | `references/testing.md` | Julia-native test patterns, tolerances, failure diagnosis |
| Debugging | `references/debugging.md` | Julia-specific error diagnosis + IR debug commands |
| Scripts | `scripts/validate_cutile_jl.py` | Static validation for Julia anti-patterns (run it) |
| Ground Truth | `julia/kernels/*.jl` + `julia/test/*.jl` | Actual working implementations in the codebase |

Environment Setup

Prerequisite (Julia): this skill requires the Julia version declared in `julia/Project.toml` under `[compat] julia`. If `julia` is not installed, or `julia --version` reports an older version, install Julia from the official site at https://julialang.org/install/ using the verified installer instructions for your OS. Continue below once `julia --version` reports a compatible version.
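The compatibility check can also be scripted from Julia itself. The sketch below is a simplified, hypothetical helper (`min_julia_from_compat` and `julia_is_new_enough` are not part of this skill): it reads `[compat] julia` from a `Project.toml` with the `TOML` stdlib and treats the lowest listed version as the minimum. It only understands comma-separated plain, `^`, `~`, or `=` entries, not hyphen or inequality ranges, and it ignores upper bounds, so Pkg's own handling of `[compat]` remains authoritative.

```julia
using TOML

# Hypothetical helper: lowest Julia version listed under [compat] julia.
# Simplified: handles comma-separated entries such as "1.10" or "^1.10, 1.11";
# hyphen ranges and inequality specifiers are not supported.
function min_julia_from_compat(project_path::AbstractString)
    spec = TOML.parsefile(project_path)["compat"]["julia"]
    versions = [parse(VersionNumber, String(lstrip(strip(entry), ['^', '~', '='])))
                for entry in split(spec, ',')]
    return minimum(versions)
end

# True when the running Julia (`VERSION`) meets that minimum.
julia_is_new_enough(project_path) = VERSION >= min_julia_from_compat(project_path)

# Usage from the repo root:
#     julia_is_new_enough("julia/Project.toml")
```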
Then, from the repo root:
```bash
# Install Julia dependencies declared in julia/Project.toml
julia --project=julia/ -e 'using Pkg; Pkg.instantiate()'

# Run tests
julia --project=julia/ julia/test/runtests.jl
```

Requirements:
- Julia (minimum version declared in `julia/Project.toml` under `[compat] julia`)
- CUDA 13.1+ driver
- Blackwell GPU (compute capability 10+)
- Dependencies managed via `julia/Project.toml`: CUDA.jl, cuTile.jl, NNlib.jl, Test