tinygrad


A minimal deep learning framework focused on beauty and minimalism. Every line must earn its keep.

Quick Reference


```python
from tinygrad import Tensor, TinyJit, nn, dtypes, Device, GlobalCounters

# Tensor creation
x = Tensor([1, 2, 3])
x = Tensor.rand(2, 3)
x = Tensor.kaiming_uniform(128, 784)

# Operations are lazy until realized
y = (x + 1).relu().sum()
y.realize()  # or y.numpy()

# Training context
with Tensor.train():
  loss = model(x).sparse_categorical_crossentropy(labels).backward()
  optim.step()
```
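The lazy-until-realized behavior can be pictured with a tinygrad-free sketch: chaining operations only builds up deferred work, and nothing executes until `realize()` is called. This is a conceptual toy, not tinygrad's actual implementation:

```python
class Lazy:
  """Toy lazy value: chaining ops builds a closure chain; nothing runs yet."""
  def __init__(self, fn):
    self.fn = fn
  def map(self, f):
    # composing only wraps the thunk; no computation happens here
    return Lazy(lambda: f(self.fn()))
  def realize(self):
    # only now does the whole chain execute
    return self.fn()

x = Lazy(lambda: [1, -2, 3])
y = x.map(lambda v: [max(0, e + 1) for e in v]).map(sum)  # still unevaluated
print(y.realize())  # 6, computed on demand like (x + 1).relu().sum()
```

In real tinygrad the deferred work is a UOp graph that gets scheduled and compiled, not a Python closure, but the user-visible contract is the same: build cheaply, pay on `realize()`.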

Architecture Pipeline


  1. Tensor (`tinygrad/tensor.py`) - User API, creates UOp graph
  2. UOp (`tinygrad/uop/ops.py`) - Unified IR for all operations
  3. Schedule (`tinygrad/engine/schedule.py`) - Converts tensor UOps to kernel UOps
  4. Codegen (`tinygrad/codegen/`) - Converts kernel UOps to device code
  5. Runtime (`tinygrad/runtime/`) - Device-specific execution
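As a rough mental model of stages 1-3 (a hand-rolled sketch, not tinygrad's real data structures): tensor ops form a DAG, and scheduling linearizes it so every op runs after the ops it reads from.

```python
def schedule(node, seen=None, order=None):
  # depth-first toposort: emit each op only after all of its src ops
  seen, order = (set() if seen is None else seen), ([] if order is None else order)
  op, *src = node
  for s in src:
    if isinstance(s, tuple) and id(s) not in seen:
      seen.add(id(s))
      schedule(s, seen, order)
  order.append(op)
  return order

# y = (x + 1).relu().sum() as a tiny op graph of nested tuples
graph = ("SUM", ("RELU", ("ADD", ("LOAD",), ("CONST",))))
print(schedule(graph))  # ['LOAD', 'CONST', 'ADD', 'RELU', 'SUM']
```

The real scheduler also decides kernel boundaries and memory placement; this only shows the ordering constraint.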

Training Loop Pattern


```python
from tinygrad import Tensor, TinyJit, nn
from tinygrad.nn.datasets import mnist

X_train, Y_train, X_test, Y_test = mnist()
model = Model()
optim = nn.optim.Adam(nn.state.get_parameters(model))

@TinyJit
@Tensor.train()
def train_step():
  optim.zero_grad()
  samples = Tensor.randint(512, high=X_train.shape[0])
  loss = model(X_train[samples]).sparse_categorical_crossentropy(Y_train[samples]).backward()
  return loss.realize(*optim.schedule_step())

for i in range(100):
  loss = train_step()
```

Model Definition


Models are plain Python classes with `__call__`. No base class required.

```python
class Model:
  def __init__(self):
    self.l1 = nn.Linear(784, 128)
    self.l2 = nn.Linear(128, 10)
  def __call__(self, x):
    return self.l1(x).relu().sequential([self.l2])
```

Available nn modules: `Linear`, `Conv2d`, `BatchNorm`, `LayerNorm`, `RMSNorm`, `Embedding`, `GroupNorm`, `LSTMCell`

Optimizers: `SGD`, `Adam`, `AdamW`, `LARS`, `LAMB`, `Muon`
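The `sequential` call above folds a list of callables over the tensor, feeding each output into the next layer. Its semantics can be sketched in plain Python (a conceptual reimplementation, not the tinygrad source):

```python
from functools import reduce

def sequential(x, layers):
  # apply each callable in order, feeding each output into the next
  return reduce(lambda acc, layer: layer(acc), layers, x)

# e.g. two toy "layers": add a bias, then scale
out = sequential(2, [lambda v: v + 1, lambda v: v * 10])
print(out)  # 30
```

Because layers are just callables, any function of a tensor (including bare methods like `Tensor.relu`) can appear in the list.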

State Dict / Weights


```python
from tinygrad.nn.state import safe_save, safe_load, get_state_dict, load_state_dict, get_parameters

# Save/load safetensors
safe_save(get_state_dict(model), "model.safetensors")
load_state_dict(model, safe_load("model.safetensors"))

# Get all trainable params
params = get_parameters(model)
```
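`safe_save` writes the safetensors format, which is simple enough to sketch by hand: an 8-byte little-endian header length, a JSON header mapping tensor names to dtype/shape/byte offsets, then one flat data buffer. A minimal toy writer/reader for illustration (not tinygrad's implementation; real files carry typed, shaped tensors rather than raw byte strings):

```python
import json, os, struct, tempfile

def toy_safe_save(path, tensors):
  # tensors: name -> raw bytes; the header records each buffer's offsets
  header, data, off = {}, b"", 0
  for name, buf in tensors.items():
    header[name] = {"dtype": "U8", "shape": [len(buf)], "data_offsets": [off, off + len(buf)]}
    data += buf
    off += len(buf)
  hjson = json.dumps(header).encode()
  with open(path, "wb") as f:
    # 8-byte little-endian header size, JSON header, then the raw buffers
    f.write(struct.pack("<Q", len(hjson)) + hjson + data)

def toy_safe_load(path):
  with open(path, "rb") as f:
    n = struct.unpack("<Q", f.read(8))[0]
    header, data = json.loads(f.read(n)), f.read()
  return {k: data[v["data_offsets"][0]:v["data_offsets"][1]] for k, v in header.items()}

path = os.path.join(tempfile.gettempdir(), "toy.safetensors")
toy_safe_save(path, {"w": b"\x01\x02", "b": b"\x03"})
print(toy_safe_load(path))  # {'w': b'\x01\x02', 'b': b'\x03'}
```

The flat layout is why safetensors loads are fast and mmap-friendly: the header is read once, then each tensor is a contiguous slice.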

JIT Compilation


`TinyJit` captures and replays kernel graphs. Input shapes must be fixed.

```python
@TinyJit
def forward(x):
  return model(x).realize()

# First call captures, subsequent calls replay
out = forward(batch)
```
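Conceptually the JIT behaves like a cache keyed by input shape: the first call with a given shape records work, later calls with the same shape replay it. A toy decorator showing why shapes must stay fixed (illustrative only; TinyJit captures GPU kernel graphs, not Python calls):

```python
def toy_jit(fn):
  captured = {}
  def wrapper(x):
    key = len(x)  # stand-in for "input shape"
    if key not in captured:
      captured[key] = fn  # "capture" on first call with this shape
      wrapper.captures += 1
    return captured[key](x)  # replay on every later call
  wrapper.captures = 0
  return wrapper

@toy_jit
def forward(x):
  return [v * 2 for v in x]

forward([1, 2, 3]); forward([4, 5, 6])  # same shape: one capture, one replay
print(forward.captures)  # 1
forward([1, 2])  # a new shape would trigger a fresh capture
print(forward.captures)  # 2
```

This is also why a JIT-ed function should be fed fixed-size batches: every distinct shape is a new capture, and the speedup comes from replays.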

Device Management


```python
from tinygrad import Device
print(Device.DEFAULT)  # Auto-detected: METAL, CUDA, AMD, CPU, etc.

# Force device
x = Tensor.rand(10, device="CPU")
x = x.to("CUDA")
```

Environment Variables


| Variable | Values | Description |
|----------|--------|-------------|
| `DEBUG` | 1-7 | Increasing verbosity (4=code, 7=asm) |
| `VIZ` | 1 | Graph visualization |
| `BEAM` | # | Kernel beam search width |
| `NOOPT` | 1 | Disable optimizations |
| `SPEC` | 1-2 | UOp spec verification |
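These are plain environment variables, read once at startup (e.g. `DEBUG=4 python script.py`). A helper in the spirit of tinygrad's `getenv` utility, sketched here as a hypothetical reimplementation: the default's type decides how the string value is coerced.

```python
import os

def getenv(key, default=0):
  # read an env var, coerced to the type of the default (int for these flags)
  return type(default)(os.environ.get(key, default))

os.environ["DEBUG"] = "4"
if getenv("DEBUG") >= 4:
  print("would dump generated code")  # DEBUG=4 shows code, 7 shows asm
```

Unset variables fall back to the default, so flags like `NOOPT` cost nothing unless explicitly enabled.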

Debugging


```bash
# Visualize computation graph
VIZ=1 python -c "from tinygrad import Tensor; Tensor.ones(10).sum().realize()"

# Show generated code
DEBUG=4 python script.py

# Run tests
python -m pytest test/test_tensor.py -xvs
```

UOp and PatternMatcher (Internals)


UOps are immutable, cached graph nodes. Use PatternMatcher for transformations:

```python
from tinygrad.uop.ops import UOp, Ops
from tinygrad.uop.upat import UPat, PatternMatcher, graph_rewrite

pm = PatternMatcher([
  (UPat(Ops.ADD, src=(UPat.cvar("x"), UPat.cvar("x"))), lambda x: x * 2),
])
result = graph_rewrite(uop, pm)
```

Key UOp properties: `op`, `dtype`, `src`, `arg`, `tag`

Define PatternMatchers at module level - they're slow to construct.
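The rewrite rule above (x + x → x * 2) can be mimicked on plain tuples to show the idea of bottom-up graph rewriting (a toy, not tinygrad's PatternMatcher):

```python
def rewrite(node, rules):
  # rewrite children first, then try each rule on the node itself (bottom-up)
  if not isinstance(node, tuple):
    return node
  node = (node[0],) + tuple(rewrite(s, rules) for s in node[1:])
  for matches, rw in rules:
    if matches(node):
      return rw(node)
  return node

# rule: ADD with two identical sources becomes MUL by 2
rules = [(lambda n: n[0] == "ADD" and n[1] == n[2], lambda n: ("MUL", n[1], 2))]
print(rewrite(("ADD", "x", "x"), rules))           # ('MUL', 'x', 2)
print(rewrite(("NEG", ("ADD", "y", "y")), rules))  # ('NEG', ('MUL', 'y', 2))
```

The real `graph_rewrite` additionally deduplicates nodes and iterates to a fixed point, but the core loop is the same: match a pattern, bind variables, emit a replacement subgraph.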

Style Guide


  • 2-space indentation, 150 char line limit
  • Prefer readability over cleverness
  • Never mix functionality changes with whitespace changes
  • All functionality changes must be tested
  • Run `pre-commit run --all-files` before commits

Testing


```bash
python -m pytest test/test_tensor.py -xvs
python -m pytest test/unit/test_schedule_cache.py -x --timeout=60
SPEC=2 python -m pytest test/test_something.py  # With spec verification
```