tinygrad


A minimal deep learning framework focused on beauty and minimalism. Every line must earn its keep.

Quick Reference


```python
from tinygrad import Tensor, TinyJit, nn, dtypes, Device, GlobalCounters

# Tensor creation
x = Tensor([1, 2, 3])
x = Tensor.rand(2, 3)
x = Tensor.kaiming_uniform(128, 784)

# Operations are lazy until realized
y = (x + 1).relu().sum()
y.realize()  # or y.numpy()

# Training context
with Tensor.train():
  loss = model(x).sparse_categorical_crossentropy(labels).backward()
  optim.step()
```
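The lazy-until-realized behavior can be pictured with a tinygrad-free sketch: chaining operations only builds up deferred work, and nothing executes until `realize()` is called. This is a conceptual toy, not tinygrad's actual implementation:

```python
class Lazy:
  """Toy lazy value: chaining ops builds a closure chain; nothing runs yet."""
  def __init__(self, fn):
    self.fn = fn
  def map(self, f):
    # composing only wraps the thunk; no computation happens here
    return Lazy(lambda: f(self.fn()))
  def realize(self):
    # only now does the whole chain execute
    return self.fn()

x = Lazy(lambda: [1, -2, 3])
y = x.map(lambda v: [max(0, e + 1) for e in v]).map(sum)  # still unevaluated
print(y.realize())  # 6, computed on demand like (x + 1).relu().sum()
```

In real tinygrad the deferred work is a UOp graph that gets scheduled and compiled, not a Python closure, but the user-visible contract is the same: build cheaply, pay on `realize()`.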

Architecture Pipeline


  1. Tensor (`tinygrad/tensor.py`) - User API, creates UOp graph
  2. UOp (`tinygrad/uop/ops.py`) - Unified IR for all operations
  3. Schedule (`tinygrad/engine/schedule.py`) - Converts tensor UOps to kernel UOps
  4. Codegen (`tinygrad/codegen/`) - Converts kernel UOps to device code
  5. Runtime (`tinygrad/runtime/`) - Device-specific execution
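As a rough mental model of stages 1-3 (a hand-rolled sketch, not tinygrad's real data structures): tensor ops form a DAG, and scheduling linearizes it so every op runs after the ops it reads from.

```python
def schedule(node, seen=None, order=None):
  # depth-first toposort: emit each op only after all of its src ops
  seen, order = (set() if seen is None else seen), ([] if order is None else order)
  op, *src = node
  for s in src:
    if isinstance(s, tuple) and id(s) not in seen:
      seen.add(id(s))
      schedule(s, seen, order)
  order.append(op)
  return order

# y = (x + 1).relu().sum() as a tiny op graph of nested tuples
graph = ("SUM", ("RELU", ("ADD", ("LOAD",), ("CONST",))))
print(schedule(graph))  # ['LOAD', 'CONST', 'ADD', 'RELU', 'SUM']
```

The real scheduler also decides kernel boundaries and memory placement; this only shows the ordering constraint.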

Training Loop Pattern


```python
from tinygrad import Tensor, TinyJit, nn
from tinygrad.nn.datasets import mnist

X_train, Y_train, X_test, Y_test = mnist()
model = Model()
optim = nn.optim.Adam(nn.state.get_parameters(model))

@TinyJit
@Tensor.train()
def train_step():
  optim.zero_grad()
  samples = Tensor.randint(512, high=X_train.shape[0])
  loss = model(X_train[samples]).sparse_categorical_crossentropy(Y_train[samples]).backward()
  return loss.realize(*optim.schedule_step())

for i in range(100):
  loss = train_step()
```

Model Definition


Models are plain Python classes with `__call__`. No base class required.

```python
class Model:
  def __init__(self):
    self.l1 = nn.Linear(784, 128)
    self.l2 = nn.Linear(128, 10)
  def __call__(self, x):
    return self.l1(x).relu().sequential([self.l2])
```

Available nn modules: `Linear`, `Conv2d`, `BatchNorm`, `LayerNorm`, `RMSNorm`, `Embedding`, `GroupNorm`, `LSTMCell`

Optimizers: `SGD`, `Adam`, `AdamW`, `LARS`, `LAMB`, `Muon`
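The `sequential` call above folds a list of callables over the tensor, feeding each output into the next layer. Its semantics can be sketched in plain Python (a conceptual reimplementation, not the tinygrad source):

```python
from functools import reduce

def sequential(x, layers):
  # apply each callable in order, feeding each output into the next
  return reduce(lambda acc, layer: layer(acc), layers, x)

# e.g. two toy "layers": add a bias, then scale
out = sequential(2, [lambda v: v + 1, lambda v: v * 10])
print(out)  # 30
```

Because layers are just callables, any function of a tensor (including bare methods like `Tensor.relu`) can appear in the list.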

State Dict / Weights


```python
from tinygrad.nn.state import safe_save, safe_load, get_state_dict, load_state_dict, get_parameters

# Save/load safetensors
safe_save(get_state_dict(model), "model.safetensors")
load_state_dict(model, safe_load("model.safetensors"))

# Get all trainable params
params = get_parameters(model)
```
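`safe_save` writes the safetensors format, which is simple enough to sketch by hand: an 8-byte little-endian header length, a JSON header mapping tensor names to dtype/shape/byte offsets, then one flat data buffer. A minimal toy writer/reader for illustration (not tinygrad's implementation; real files carry typed, shaped tensors rather than raw byte strings):

```python
import json, os, struct, tempfile

def toy_safe_save(path, tensors):
  # tensors: name -> raw bytes; the header records each buffer's offsets
  header, data, off = {}, b"", 0
  for name, buf in tensors.items():
    header[name] = {"dtype": "U8", "shape": [len(buf)], "data_offsets": [off, off + len(buf)]}
    data += buf
    off += len(buf)
  hjson = json.dumps(header).encode()
  with open(path, "wb") as f:
    # 8-byte little-endian header size, JSON header, then the raw buffers
    f.write(struct.pack("<Q", len(hjson)) + hjson + data)

def toy_safe_load(path):
  with open(path, "rb") as f:
    n = struct.unpack("<Q", f.read(8))[0]
    header, data = json.loads(f.read(n)), f.read()
  return {k: data[v["data_offsets"][0]:v["data_offsets"][1]] for k, v in header.items()}

path = os.path.join(tempfile.gettempdir(), "toy.safetensors")
toy_safe_save(path, {"w": b"\x01\x02", "b": b"\x03"})
print(toy_safe_load(path))  # {'w': b'\x01\x02', 'b': b'\x03'}
```

The flat layout is why safetensors loads are fast and mmap-friendly: the header is read once, then each tensor is a contiguous slice.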

JIT Compilation


`TinyJit` captures and replays kernel graphs. Input shapes must be fixed.

```python
@TinyJit
def forward(x):
  return model(x).realize()

# First call captures, subsequent calls replay
out = forward(batch)
```
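Conceptually the JIT behaves like a cache keyed by input shape: the first call with a given shape records work, later calls with the same shape replay it. A toy decorator showing why shapes must stay fixed (illustrative only; TinyJit captures GPU kernel graphs, not Python calls):

```python
def toy_jit(fn):
  captured = {}
  def wrapper(x):
    key = len(x)  # stand-in for "input shape"
    if key not in captured:
      captured[key] = fn  # "capture" on first call with this shape
      wrapper.captures += 1
    return captured[key](x)  # replay on every later call
  wrapper.captures = 0
  return wrapper

@toy_jit
def forward(x):
  return [v * 2 for v in x]

forward([1, 2, 3]); forward([4, 5, 6])  # same shape: one capture, one replay
print(forward.captures)  # 1
forward([1, 2])  # a new shape would trigger a fresh capture
print(forward.captures)  # 2
```

This is also why a JIT-ed function should be fed fixed-size batches: every distinct shape is a new capture, and the speedup comes from replays.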

Device Management


```python
from tinygrad import Device
print(Device.DEFAULT)  # Auto-detected: METAL, CUDA, AMD, CPU, etc.

# Force device
x = Tensor.rand(10, device="CPU")
x = x.to("CUDA")
```

Environment Variables


| Variable | Values | Description |
|----------|--------|-------------|
| `DEBUG` | 1-7 | Increasing verbosity (4=code, 7=asm) |
| `VIZ` | 1 | Graph visualization |
| `BEAM` | # | Kernel beam search width |
| `NOOPT` | 1 | Disable optimizations |
| `SPEC` | 1-2 | UOp spec verification |
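These are plain environment variables, read once at startup (e.g. `DEBUG=4 python script.py`). A helper in the spirit of tinygrad's `getenv` utility, sketched here as a hypothetical reimplementation: the default's type decides how the string value is coerced.

```python
import os

def getenv(key, default=0):
  # read an env var, coerced to the type of the default (int for these flags)
  return type(default)(os.environ.get(key, default))

os.environ["DEBUG"] = "4"
if getenv("DEBUG") >= 4:
  print("would dump generated code")  # DEBUG=4 shows code, 7 shows asm
```

Unset variables fall back to the default, so flags like `NOOPT` cost nothing unless explicitly enabled.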

Debugging


```bash
# Visualize computation graph
VIZ=1 python -c "from tinygrad import Tensor; Tensor.ones(10).sum().realize()"

# Show generated code
DEBUG=4 python script.py

# Run tests
python -m pytest test/test_tensor.py -xvs
```

UOp and PatternMatcher (Internals)


UOps are immutable, cached graph nodes. Use PatternMatcher for transformations:

```python
from tinygrad.uop.ops import UOp, Ops
from tinygrad.uop.upat import UPat, PatternMatcher, graph_rewrite

pm = PatternMatcher([
  (UPat(Ops.ADD, src=(UPat.cvar("x"), UPat.cvar("x"))), lambda x: x * 2),
])
result = graph_rewrite(uop, pm)
```

Key UOp properties: `op`, `dtype`, `src`, `arg`, `tag`

Define PatternMatchers at module level - they're slow to construct.
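The rewrite rule above (x + x → x * 2) can be mimicked on plain tuples to show the idea of bottom-up graph rewriting (a toy, not tinygrad's PatternMatcher):

```python
def rewrite(node, rules):
  # rewrite children first, then try each rule on the node itself (bottom-up)
  if not isinstance(node, tuple):
    return node
  node = (node[0],) + tuple(rewrite(s, rules) for s in node[1:])
  for matches, rw in rules:
    if matches(node):
      return rw(node)
  return node

# rule: ADD with two identical sources becomes MUL by 2
rules = [(lambda n: n[0] == "ADD" and n[1] == n[2], lambda n: ("MUL", n[1], 2))]
print(rewrite(("ADD", "x", "x"), rules))           # ('MUL', 'x', 2)
print(rewrite(("NEG", ("ADD", "y", "y")), rules))  # ('NEG', ('MUL', 'y', 2))
```

The real `graph_rewrite` additionally deduplicates nodes and iterates to a fixed point, but the core loop is the same: match a pattern, bind variables, emit a replacement subgraph.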

Style Guide


  • 2-space indentation, 150 char line limit
  • Prefer readability over cleverness
  • Never mix functionality changes with whitespace changes
  • All functionality changes must be tested
  • Run `pre-commit run --all-files` before commits

Testing


```bash
python -m pytest test/test_tensor.py -xvs
python -m pytest test/unit/test_schedule_cache.py -x --timeout=60
SPEC=2 python -m pytest test/test_something.py  # With spec verification
```