# tinygrad
A minimal deep learning framework focused on beauty and minimalism. Every line must earn its keep.
## Quick Reference
```python
from tinygrad import Tensor, TinyJit, nn, dtypes, Device, GlobalCounters

# Tensor creation
x = Tensor([1, 2, 3])
x = Tensor.rand(2, 3)
x = Tensor.kaiming_uniform(128, 784)

# Operations are lazy until realized
y = (x + 1).relu().sum()
y.realize()  # or y.numpy()

# Training context
with Tensor.train():
  loss = model(x).sparse_categorical_crossentropy(labels).backward()
  optim.step()
```
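The lazy-until-realized behavior can be illustrated with a tiny thunk-based sketch in plain Python (a toy model of the idea, not tinygrad's implementation):

```python
# Toy sketch of lazy evaluation: ops build deferred thunks, realize() computes.
class Lazy:
  def __init__(self, fn):
    self.fn = fn        # thunk that computes the value
    self.value = None   # cached after realize()

  def __add__(self, k):
    return Lazy(lambda: [v + k for v in self.realize()])

  def relu(self):
    return Lazy(lambda: [max(v, 0) for v in self.realize()])

  def sum(self):
    return Lazy(lambda: [sum(self.realize())])

  def realize(self):
    if self.value is None:  # nothing runs until this is called
      self.value = self.fn()
    return self.value

x = Lazy(lambda: [1, -2, 3])
y = (x + 1).relu().sum()  # no computation has happened yet
print(y.realize())        # [6]  (relu([2, -1, 4]) = [2, 0, 4], sum = 6)
```

Like tinygrad, chaining ops only grows a graph; the work happens once at `realize()`, and the result is cached.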
## Architecture Pipeline
- **Tensor** (`tinygrad/tensor.py`) - User API, creates UOp graph
- **UOp** (`tinygrad/uop/ops.py`) - Unified IR for all operations
- **Schedule** (`tinygrad/engine/schedule.py`) - Converts tensor UOps to kernel UOps
- **Codegen** (`tinygrad/codegen/`) - Converts kernel UOps to device code
- **Runtime** (`tinygrad/runtime/`) - Device-specific execution
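Conceptually the stages form a linear lowering pipeline, each mapping one representation to the next. A hand-wavy sketch of that shape (illustrative only, not tinygrad code):

```python
# Toy sketch: each stage is a function from one representation to the next.
def tensor_to_uops(expr):  # Tensor: user API -> UOp graph
  return f"uops({expr})"

def schedule(uops):        # Schedule: tensor UOps -> kernel UOps
  return f"kernels({uops})"

def codegen(kernels):      # Codegen: kernel UOps -> device code
  return f"code({kernels})"

def lower(expr, stages=(tensor_to_uops, schedule, codegen)):
  for stage in stages:
    expr = stage(expr)
  return expr              # Runtime would then execute this on the device

print(lower("x+1"))  # code(kernels(uops(x+1)))
```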
## Training Loop Pattern
```python
from tinygrad import Tensor, TinyJit, nn
from tinygrad.nn.datasets import mnist

X_train, Y_train, X_test, Y_test = mnist()
model = Model()
optim = nn.optim.Adam(nn.state.get_parameters(model))

@TinyJit
@Tensor.train()
def train_step():
  optim.zero_grad()
  samples = Tensor.randint(512, high=X_train.shape[0])
  loss = model(X_train[samples]).sparse_categorical_crossentropy(Y_train[samples]).backward()
  return loss.realize(*optim.schedule_step())

for i in range(100):
  loss = train_step()
```
## Model Definition
Models are plain Python classes with `__call__`. No base class required.

```python
class Model:
  def __init__(self):
    self.l1 = nn.Linear(784, 128)
    self.l2 = nn.Linear(128, 10)
  def __call__(self, x):
    return self.l1(x).relu().sequential([self.l2])
```

Available nn modules: `Linear`, `Conv2d`, `BatchNorm`, `LayerNorm`, `RMSNorm`, `Embedding`, `GroupNorm`, `LSTMCell`

Optimizers: `SGD`, `Adam`, `AdamW`, `LARS`, `LAMB`, `Muon`
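`sequential` simply threads a value through a list of callables; a plain-Python equivalent of the idea (illustrative, not the library source):

```python
from functools import reduce

# Illustrative stand-in for Tensor.sequential: fold x through each layer.
def sequential(x, layers):
  return reduce(lambda acc, layer: layer(acc), layers, x)

double = lambda v: v * 2
inc = lambda v: v + 1
print(sequential(3, [double, inc]))  # 7
```

This is why any callable, a `nn.Linear`, a lambda, or a whole sub-model, works as a "layer".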
## State Dict / Weights
```python
from tinygrad.nn.state import safe_save, safe_load, get_state_dict, load_state_dict, get_parameters

# Save/load safetensors
safe_save(get_state_dict(model), "model.safetensors")
load_state_dict(model, safe_load("model.safetensors"))

# Get all trainable params
params = get_parameters(model)
```
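The core idea behind `get_state_dict` is a recursive walk over the model's attributes that flattens parameters into dotted names. A simplified toy sketch (using lists as stand-in "tensors", not tinygrad's actual traversal):

```python
# Toy sketch: recursively collect parameter-like attributes into a flat
# {"l1.weight": ..., "l1.bias": ...} dict, recursing into submodules.
def toy_state_dict(obj, prefix=""):
  state = {}
  for name, val in vars(obj).items():
    key = f"{prefix}{name}"
    if isinstance(val, list):       # treat lists of numbers as "tensors"
      state[key] = val
    elif hasattr(val, "__dict__"):  # recurse into submodule objects
      state.update(toy_state_dict(val, key + "."))
  return state

class Linear:
  def __init__(self): self.weight = [1.0]; self.bias = [0.0]

class Model:
  def __init__(self): self.l1 = Linear()

print(toy_state_dict(Model()))  # {'l1.weight': [1.0], 'l1.bias': [0.0]}
```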
## JIT Compilation
Decorate a function with `TinyJit`:

```python
@TinyJit
def forward(x):
  return model(x).realize()

# First call captures, subsequent calls replay
out = forward(batch)
```
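The capture-then-replay pattern can be sketched as a decorator that switches modes after the first call (a toy illustration; the real `TinyJit` captures GPU kernels and replays them with new input buffers, not Python results):

```python
# Toy sketch: first call "captures" (the expensive path), later calls "replay".
class ToyJit:
  def __init__(self, fn):
    self.fn = fn
    self.captured = False
    self.replays = 0

  def __call__(self, *args):
    if not self.captured:
      self.captured = True  # first call: trace/compile here
    else:
      self.replays += 1     # later calls: replay the captured program
    return self.fn(*args)   # toy: always just runs the Python body

jit = ToyJit(lambda x: x * 2)
print(jit(3), jit(4), jit.replays)  # 6 8 1
```

This also hints at the usual JIT caveat: the captured program is fixed after the first call, so the function should be pure with stable shapes.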
## Device Management
```python
from tinygrad import Device
print(Device.DEFAULT)  # Auto-detected: METAL, CUDA, AMD, CPU, etc.

# Force device
x = Tensor.rand(10, device="CPU")
x = x.to("CUDA")
```
## Environment Variables
| Variable | Values | Description |
|---|---|---|
| `DEBUG` | 1-7 | Increasing verbosity (4=code, 7=asm) |
| `VIZ` | 1 | Graph visualization |
| `BEAM` | # | Kernel beam search width |
| `NOOPT` | 1 | Disable optimizations |
| `SPEC` | 1-2 | UOp spec verification |
## Debugging
```bash
# Visualize computation graph
VIZ=1 python -c "from tinygrad import Tensor; Tensor.ones(10).sum().realize()"

# Show generated code
DEBUG=4 python script.py

# Run tests
python -m pytest test/test_tensor.py -xvs
```
## UOp and PatternMatcher (Internals)
UOps are immutable, cached graph nodes. Use PatternMatcher for transformations:
```python
from tinygrad.uop.ops import UOp, Ops
from tinygrad.uop.upat import UPat, PatternMatcher, graph_rewrite

pm = PatternMatcher([
  (UPat(Ops.ADD, src=(UPat.cvar("x"), UPat.cvar("x"))), lambda x: x * 2),
])
result = graph_rewrite(uop, pm)
```

Key UOp properties: `op`, `dtype`, `src`, `arg`, `tag`

Define PatternMatchers at module level - they're slow to construct.
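The rewrite above (x + x → x * 2) can be mimicked on a toy immutable node type to show what a pattern/rewrite pair does (illustrative only; tinygrad's UPat engine is far more general):

```python
from dataclasses import dataclass

# Toy immutable graph node, loosely mirroring UOp's (op, src, arg) shape.
@dataclass(frozen=True)
class Node:
  op: str
  src: tuple = ()
  arg: object = None

def rewrite(node):
  # bottom-up: rewrite children first, rebuilding an immutable node
  node = Node(node.op, tuple(rewrite(s) for s in node.src), node.arg)
  # pattern: ADD with two identical sources -> MUL by constant 2
  if node.op == "ADD" and len(node.src) == 2 and node.src[0] == node.src[1]:
    return Node("MUL", (node.src[0], Node("CONST", arg=2)))
  return node

x = Node("VAR", arg="x")
out = rewrite(Node("ADD", (x, x)))
print(out.op)  # MUL
```

Immutability is what makes the shared-source check (`node.src[0] == node.src[1]`) and caching safe: equal nodes are interchangeable.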
## Style Guide
- 2-space indentation, 150 char line limit
- Prefer readability over cleverness
- Never mix functionality changes with whitespace changes
- All functionality changes must be tested
- Run `pre-commit run --all-files` before commits
## Testing
```bash
python -m pytest test/test_tensor.py -xvs
python -m pytest test/unit/test_schedule_cache.py -x --timeout=60
SPEC=2 python -m pytest test/test_something.py  # With spec verification
```