TorchCode — PyTorch Interview Practice
Skill by ara.so — Daily 2026 Skills collection.
TorchCode is a Jupyter-based, self-hosted coding practice environment for ML engineers. It provides 40 curated problems covering PyTorch fundamentals and architectures (softmax, LayerNorm, MultiHeadAttention, GPT-2, etc.) with an automated judge that gives instant pass/fail feedback, gradient verification, and timing — like LeetCode but for tensors.
Installation & Setup
Option 1: Online (zero install)
- Hugging Face Spaces: https://huggingface.co/spaces/duoan/TorchCode
- Google Colab: Every notebook has an "Open in Colab" badge
Option 2: pip (for use inside Colab or existing environment)
```bash
pip install torch-judge
```

Option 3: Docker (pre-built image)
```bash
docker run -p 8888:8888 -e PORT=8888 ghcr.io/duoan/torchcode:latest
```

Option 4: Build locally
```bash
git clone https://github.com/duoan/TorchCode.git
cd TorchCode
make run
```
`make run` auto-detects Docker or Podman and falls back to local build if the registry image is unavailable (common on Apple Silicon/arm64).
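The detection logic can be sketched roughly like this (a hypothetical illustration — the real Makefile may differ):

```shell
# Hypothetical sketch of engine detection, similar in spirit to what
# `make run` does. Not the actual Makefile contents.
detect_engine() {
  if command -v docker >/dev/null 2>&1; then
    echo docker
  elif command -v podman >/dev/null 2>&1; then
    echo podman
  else
    echo none   # neither found: build and run locally instead
  fi
}
detect_engine
```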
---
Judge API
The `torch_judge` package provides the core API used in every notebook.

```python
from torch_judge import check, status, hint, reset_progress
```

List all 40 problems and your progress
```python
status()
```
Run tests for a specific problem
```python
check("relu")
check("softmax")
check("layernorm")
check("attention")
check("gpt2")
```
Get a hint without spoilers
```python
hint("softmax")
```
Reset progress for a problem
```python
reset_progress("relu")
```

`check()` return values

- Colored pass/fail per test case
- Correctness check against PyTorch reference implementation
- Gradient verification (autograd compatibility)
- Timing measurement
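The idea behind gradient verification can be sketched in plain Python with a finite-difference probe against an analytic gradient (illustrative names only — this is not the `torch_judge` internals, whose actual method isn't documented here):

```python
# Finite-difference gradient check — a minimal sketch of the idea behind
# "gradient verification". Helper names are illustrative, not torch_judge API.
def numerical_grad(f, x, eps=1e-6):
    # Central difference: (f(x+eps) - f(x-eps)) / (2*eps)
    return (f(x + eps) - f(x - eps)) / (2 * eps)

def relu(x):
    return x if x > 0 else 0.0

def relu_grad(x):
    # Analytic gradient of ReLU away from x == 0
    return 1.0 if x > 0 else 0.0

for x in (-2.0, -0.5, 0.7, 3.0):
    assert abs(numerical_grad(relu, x) - relu_grad(x)) < 1e-4
```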
Problem Set Overview
Difficulty levels: Easy → Medium → Hard
| # | Problem | Key Concepts |
|---|---|---|
| 1 | ReLU | Activation functions, element-wise ops |
| 2 | Softmax | Numerical stability, exp/log tricks |
| 3 | Linear Layer | |
| 4 | LayerNorm | Normalization, affine transform |
| 5 | Self-Attention | QKV projections, scaled dot-product |
| 6 | Multi-Head Attention | Head splitting, concatenation |
| 7 | BatchNorm | Batch vs layer statistics, train/eval |
| 8 | RMSNorm | LLaMA-style norm |
| 16 | Cross-Entropy Loss | Log-softmax, logsumexp trick |
| 17 | Dropout | Train/eval mode, inverted scaling |
| 18 | Embedding | Lookup table |
| 19 | GELU | |
| 20 | Kaiming Init | |
| 21 | Gradient Clipping | Norm-based clipping |
| 31 | Gradient Accumulation | Micro-batching, loss scaling |
| 40 | Linear Regression | Normal equation, GD from scratch |
Working Through a Problem
Each problem notebook has the same structure:
```
templates/
  01_relu.ipynb      # Blank template — your workspace
  02_softmax.ipynb
  ...
solutions/
  01_relu.ipynb      # Reference solution (study after attempt)
```

Typical notebook workflow
Cell 1: Import judge

```python
from torch_judge import check, hint
import torch
import torch.nn as nn
```
Cell 2: Your implementation
```python
def my_relu(x: torch.Tensor) -> torch.Tensor:
    # TODO: implement ReLU without using torch.relu or F.relu
    raise NotImplementedError
```
Cell 3: Run the judge
```python
check("relu")
```

---

Real Implementation Examples
ReLU (Problem 1 — Easy)
```python
def my_relu(x: torch.Tensor) -> torch.Tensor:
    return torch.clamp(x, min=0)
    # Alternative: return x * (x > 0)
    # Alternative: return torch.where(x > 0, x, torch.zeros_like(x))
```

Softmax (Problem 2 — Easy, numerically stable)
```python
def my_softmax(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # Subtract max for numerical stability (prevents overflow)
    x_max = x.max(dim=dim, keepdim=True).values
    x_shifted = x - x_max
    exp_x = torch.exp(x_shifted)
    return exp_x / exp_x.sum(dim=dim, keepdim=True)
```

LayerNorm (Problem 4 — Medium)
```python
def my_layer_norm(
    x: torch.Tensor,
    weight: torch.Tensor,  # gamma (scale)
    bias: torch.Tensor,    # beta (shift)
    eps: float = 1e-5
) -> torch.Tensor:
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=False)
    x_norm = (x - mean) / torch.sqrt(var + eps)
    return weight * x_norm + bias
```

RMSNorm (Problem 8 — Medium, LLaMA-style)
```python
def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    rms = torch.sqrt((x ** 2).mean(dim=-1, keepdim=True) + eps)
    return (x / rms) * weight
```

Scaled Dot-Product Self-Attention (Problem 5 — Medium)
```python
import torch.nn.functional as F
import math

def scaled_dot_product_attention(
    Q: torch.Tensor,  # (B, heads, T, head_dim)
    K: torch.Tensor,
    V: torch.Tensor,
    mask: torch.Tensor = None
) -> torch.Tensor:
    d_k = Q.size(-1)
    scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float('-inf'))
    attn_weights = F.softmax(scores, dim=-1)
    return torch.matmul(attn_weights, V)
```

Multi-Head Attention (Problem 6 — Medium)
```python
class MyMultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = d_model // num_heads
        self.d_model = d_model
        self.W_q = nn.Linear(d_model, d_model)
        self.W_k = nn.Linear(d_model, d_model)
        self.W_v = nn.Linear(d_model, d_model)
        self.W_o = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor, mask: torch.Tensor = None) -> torch.Tensor:
        B, T, C = x.shape

        def split_heads(t):
            return t.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)

        Q = split_heads(self.W_q(x))
        K = split_heads(self.W_k(x))
        V = split_heads(self.W_v(x))
        attn_out = scaled_dot_product_attention(Q, K, V, mask)
        # (B, heads, T, head_dim) -> (B, T, d_model)
        attn_out = attn_out.transpose(1, 2).contiguous().view(B, T, C)
        return self.W_o(attn_out)
```

Cross-Entropy Loss (Problem 16 — Easy)
```python
def cross_entropy_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # logits: (B, C), targets: (B,) with class indices
    # Use the logsumexp trick for numerical stability
    log_sum_exp = torch.logsumexp(logits, dim=-1)            # (B,)
    log_probs = logits[torch.arange(len(targets)), targets]  # (B,)
    return (log_sum_exp - log_probs).mean()
```

Dropout (Problem 17 — Easy)
```python
class MyDropout(nn.Module):
    def __init__(self, p: float = 0.5):
        super().__init__()
        self.p = p

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not self.training or self.p == 0:
            return x
        mask = torch.bernoulli(torch.ones_like(x) * (1 - self.p))
        return x * mask / (1 - self.p)  # inverted scaling
```

Kaiming Init (Problem 20 — Easy)
```python
def kaiming_init(weight: torch.Tensor) -> torch.Tensor:
    fan_in = weight.size(1)
    std = math.sqrt(2.0 / fan_in)
    with torch.no_grad():
        weight.normal_(0, std)
    return weight
```

Gradient Clipping (Problem 21 — Easy)
```python
def clip_grad_norm(parameters, max_norm: float) -> float:
    params = [p for p in parameters if p.grad is not None]
    total_norm = torch.sqrt(sum(p.grad.data.norm() ** 2 for p in params))
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1:
        for p in params:
            p.grad.data.mul_(clip_coef)
    return total_norm.item()
```

Gradient Accumulation (Problem 31 — Easy)
```python
def train_with_accumulation(model, optimizer, dataloader, criterion, accumulation_steps=4):
    optimizer.zero_grad()
    for i, (inputs, targets) in enumerate(dataloader):
        outputs = model(inputs)
        loss = criterion(outputs, targets) / accumulation_steps  # scale the loss
        loss.backward()
        if (i + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```
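Why divide by `accumulation_steps`? For equal-sized micro-batches with a mean-reduced loss, summing the scaled micro-batch gradients reproduces the full-batch gradient exactly. A plain-Python check with hand-derived gradients for a 1-D least-squares loss (illustrative numbers, no torch):

```python
# Gradient of L = mean_i (w*x_i - y_i)^2 w.r.t. w is mean_i 2*x_i*(w*x_i - y_i).
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.1, 5.9, 8.2]
w = 0.5

def grad(batch_x, batch_y):
    n = len(batch_x)
    return sum(2 * x * (w * x - y) for x, y in zip(batch_x, batch_y)) / n

full = grad(xs, ys)                 # gradient over the full batch of 4

# Two micro-batches of 2; each micro-batch gradient is scaled by 1/steps,
# mirroring loss / accumulation_steps in the training loop above
steps = 2
accum = grad(xs[:2], ys[:2]) / steps + grad(xs[2:], ys[2:]) / steps

assert abs(full - accum) < 1e-9     # identical up to float rounding
```

Note the equivalence holds only when micro-batches have the same size; a ragged final batch would need per-sample weighting.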
Common Patterns & Tips
Numerical stability pattern
Always subtract the max before `exp()`:

```python
# WRONG — can overflow for large values
exp_x = torch.exp(x)

# CORRECT — numerically stable
exp_x = torch.exp(x - x.max(dim=-1, keepdim=True).values)
```
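The failure mode is easy to reproduce with plain floats (a minimal sketch using Python's `math` module rather than tensors):

```python
import math

def naive_softmax(xs):
    # Overflows for large inputs: math.exp(1000) raises OverflowError
    exps = [math.exp(v) for v in xs]
    s = sum(exps)
    return [e / s for e in exps]

def stable_softmax(xs):
    # Shifting by the max keeps every exponent <= 0, so exp() stays in range
    m = max(xs)
    exps = [math.exp(v - m) for v in xs]
    s = sum(exps)
    return [e / s for e in exps]

try:
    naive_softmax([1000.0, 1001.0])
except OverflowError:
    print("naive softmax overflowed")

print(stable_softmax([1000.0, 1001.0]))  # finite values summing to 1
```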
Causal attention mask (for GPT-style models)
```python
def causal_mask(T: int, device) -> torch.Tensor:
    return torch.tril(torch.ones(T, T, device=device)).unsqueeze(0).unsqueeze(0)
```

nn.Module skeleton (used in many problems)
```python
class MyLayer(nn.Module):
    def __init__(self, ...):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(...))
        self.bias = nn.Parameter(torch.zeros(...))
        self._init_weights()

    def _init_weights(self):
        nn.init.kaiming_uniform_(self.weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        ...
```

Train vs eval mode pattern
```python
def forward(self, x):
    if self.training:
        # use batch statistics
        mean = x.mean(dim=0)
        var = x.var(dim=0, unbiased=False)
        # update running stats
        self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mean
        self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var
    else:
        # use running statistics
        mean = self.running_mean
        var = self.running_var
    return (x - mean) / torch.sqrt(var + self.eps) * self.weight + self.bias
```

Project Structure
```
TorchCode/
├── templates/        # Blank notebooks for each problem (your workspace)
│   ├── 01_relu.ipynb
│   ├── 02_softmax.ipynb
│   └── ...
├── solutions/        # Reference solutions (study after attempting)
│   └── ...
├── torch_judge/      # Auto-grading package
│   ├── __init__.py   # check(), status(), hint(), reset_progress()
│   └── tasks/        # Per-problem test cases
├── Dockerfile
├── Makefile
└── pyproject.toml    # torch-judge package definition
```

Troubleshooting
Docker image not available for Apple Silicon (arm64)
`make run` falls back to a local build automatically; to force it:

```bash
make build
make start
```
`check()` not found in Colab

```bash
!pip install torch-judge
```

Then restart the runtime.
Notebook reset to blank template
Use the toolbar "Reset" button in JupyterLab to reset any notebook to its original blank state — useful for re-practicing a problem.
Gradient check fails but output is correct
Ensure your implementation uses PyTorch operations (not NumPy) so autograd works:
```python
# WRONG — breaks autograd
import numpy as np
result = np.exp(x.numpy())

# CORRECT — autograd compatible
result = torch.exp(x)
```
Viewing reference solution
After attempting a problem, open the matching file in `solutions/`, e.g. `solutions/02_softmax.ipynb`.

Key Concepts Tested
| Concept | Problems |
|---|---|
| Numerical stability | Softmax, Cross-Entropy, LogSumExp |
| Autograd | Linear, LayerNorm, all nn.Module problems |
| Train vs eval behavior | BatchNorm, Dropout |
| Broadcasting | LayerNorm, RMSNorm, attention masking |
| Shape manipulation | Multi-Head Attention (view, transpose, contiguous) |
| Weight initialization | Kaiming Init, Linear Layer |
| Memory-efficient training | Gradient Accumulation, Gradient Clipping |