TorchCode — PyTorch Interview Practice

Skill by ara.so — Daily 2026 Skills collection.

TorchCode is a Jupyter-based, self-hosted coding practice environment for ML engineers. It provides 40 curated problems covering PyTorch fundamentals and architectures (softmax, LayerNorm, MultiHeadAttention, GPT-2, etc.) with an automated judge that gives instant pass/fail feedback, gradient verification, and timing — like LeetCode but for tensors.

Installation & Setup

Option 1: Online (zero install)

Hugging Face Spaces: https://huggingface.co/spaces/duoan/TorchCode
Google Colab: Every notebook has an "Open in Colab" badge

Option 2: pip (for use inside Colab or existing environment)

bash

pip install torch-judge

Option 3: Docker (pre-built image)

bash

docker run -p 8888:8888 -e PORT=8888 ghcr.io/duoan/torchcode:latest
# Open http://localhost:8888

Option 4: Build locally

bash

git clone https://github.com/duoan/TorchCode.git
cd TorchCode
make run
# Open http://localhost:8888

`make run` auto-detects Docker or Podman and falls back to a local build if the registry image is unavailable (common on Apple Silicon/arm64).

Judge API

The `torch_judge` package provides the core API used in every notebook.

python

from torch_judge import check, status, hint, reset_progress

# List all 40 problems and your progress
status()

# Run tests for a specific problem
check("relu")
check("softmax")
check("layernorm")
check("attention")
check("gpt2")

# Get a hint without spoilers
hint("softmax")

# Reset progress for a problem
reset_progress("relu")

`check()` return values

Colored pass/fail per test case
Correctness check against PyTorch reference implementation
Gradient verification (autograd compatibility)
Timing measurement
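
To make the three kinds of feedback concrete, here is a minimal sketch of how such a check could be written by hand. It is not the `torch_judge` implementation: `mini_check` and its return format are hypothetical names invented for illustration, and the real judge may test more cases and use different tolerances.

```python
import time
import torch

def mini_check(candidate, reference, x: torch.Tensor, atol: float = 1e-6) -> dict:
    """Hypothetical stand-in for a judge; not torch_judge's real code."""
    # Separate leaf tensors so each function gets its own gradient.
    x_c = x.clone().requires_grad_(True)
    x_r = x.clone().requires_grad_(True)

    # Timing: wall-clock time of one forward pass of the candidate.
    start = time.perf_counter()
    out_c = candidate(x_c)
    elapsed_ms = (time.perf_counter() - start) * 1e3

    # Correctness: compare against the PyTorch reference output.
    out_r = reference(x_r)
    forward_ok = torch.allclose(out_c, out_r, atol=atol)

    # Gradient verification: backprop the same scalar through both.
    out_c.sum().backward()
    out_r.sum().backward()
    grad_ok = x_c.grad is not None and torch.allclose(x_c.grad, x_r.grad, atol=atol)

    return {"forward_ok": forward_ok, "grad_ok": grad_ok, "elapsed_ms": elapsed_ms}

torch.manual_seed(0)
report = mini_check(lambda t: torch.clamp(t, min=0), torch.relu, torch.randn(4, 8))
print(report)
```

A candidate that leaves the autograd graph (for example by converting to NumPy) would fail at the `backward()` call, which is the kind of mistake the gradient check is meant to catch.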

Problem Set Overview

Difficulty levels: Easy → Medium → Hard

#	Problem	Key Concepts
1	ReLU	Activation functions, element-wise ops
2	Softmax	Numerical stability, exp/log tricks
3	Linear Layer	`y = xW^T + b` , Kaiming init, `nn.Parameter`
4	LayerNorm	Normalization, affine transform
5	Self-Attention	QKV projections, scaled dot-product
6	Multi-Head Attention	Head splitting, concatenation
7	BatchNorm	Batch vs layer statistics, train/eval
8	RMSNorm	LLaMA-style norm
16	Cross-Entropy Loss	Log-softmax, logsumexp trick
17	Dropout	Train/eval mode, inverted scaling
18	Embedding	Lookup table, `weight[indices]`
19	GELU	`torch.erf` , Gaussian error linear unit
20	Kaiming Init	`std = sqrt(2/fan_in)`
21	Gradient Clipping	Norm-based clipping
31	Gradient Accumulation	Micro-batching, loss scaling
40	Linear Regression	Normal equation, GD from scratch

Working Through a Problem

Each problem has a blank template notebook and a reference solution with the same file name:

templates/
  01_relu.ipynb       # Blank template — your workspace
  02_softmax.ipynb
  ...
solutions/
  01_relu.ipynb       # Reference solution (study after attempt)

Typical notebook workflow

python

# Cell 1: Import judge
from torch_judge import check, hint
import torch
import torch.nn as nn

# Cell 2: Your implementation
def my_relu(x: torch.Tensor) -> torch.Tensor:
    # TODO: implement ReLU without using torch.relu or F.relu
    raise NotImplementedError

# Cell 3: Run the judge
check("relu")

Real Implementation Examples

ReLU (Problem 1 — Easy)

python

def my_relu(x: torch.Tensor) -> torch.Tensor:
    return torch.clamp(x, min=0)
    # Alternative: return x * (x > 0)
    # Alternative: return torch.where(x > 0, x, torch.zeros_like(x))

Softmax (Problem 2 — Easy, numerically stable)

python

def my_softmax(x: torch.Tensor, dim: int = -1) -> torch.Tensor:
    # Subtract max for numerical stability (prevents overflow)
    x_max = x.max(dim=dim, keepdim=True).values
    x_shifted = x - x_max
    exp_x = torch.exp(x_shifted)
    return exp_x / exp_x.sum(dim=dim, keepdim=True)

LayerNorm (Problem 4 — Medium)

python

def my_layer_norm(
    x: torch.Tensor,
    weight: torch.Tensor,   # gamma (scale)
    bias: torch.Tensor,     # beta (shift)
    eps: float = 1e-5
) -> torch.Tensor:
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=False)
    x_norm = (x - mean) / torch.sqrt(var + eps)
    return weight * x_norm + bias
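
One way to sanity-check a LayerNorm solution outside the judge is to compare it with `torch.nn.functional.layer_norm`. The sketch below repeats the function so the cell runs on its own; the tensor shapes and the seed are arbitrary choices, not values used by the judge.

```python
import torch
import torch.nn.functional as F

def my_layer_norm(x, weight, bias, eps=1e-5):
    # Same computation as the solution, repeated so this cell is self-contained.
    mean = x.mean(dim=-1, keepdim=True)
    var = x.var(dim=-1, keepdim=True, unbiased=False)
    return weight * (x - mean) / torch.sqrt(var + eps) + bias

torch.manual_seed(0)
x = torch.randn(2, 5, 16)
weight, bias = torch.randn(16), torch.randn(16)

expected = F.layer_norm(x, normalized_shape=(16,), weight=weight, bias=bias, eps=1e-5)
max_err = (my_layer_norm(x, weight, bias) - expected).abs().max().item()
print(f"max abs error vs F.layer_norm: {max_err:.2e}")
```

The `unbiased=False` argument matters here: the built-in uses the population variance, so the default (Bessel-corrected) variance would produce a small but consistent mismatch.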

RMSNorm (Problem 8 — Medium, LLaMA-style)

python

def rms_norm(x: torch.Tensor, weight: torch.Tensor, eps: float = 1e-6) -> torch.Tensor:
    rms = torch.sqrt((x ** 2).mean(dim=-1, keepdim=True) + eps)
    return (x / rms) * weight
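
A quick property check can show how RMSNorm differs from LayerNorm: it rescales each row to unit root-mean-square but does not subtract the mean. The input offset and scale below are arbitrary illustrative values.

```python
import torch

def rms_norm(x, weight, eps=1e-6):
    rms = torch.sqrt((x ** 2).mean(dim=-1, keepdim=True) + eps)
    return (x / rms) * weight

torch.manual_seed(0)
x = torch.randn(4, 32) * 2.0 + 5.0       # deliberately not zero-mean
y = rms_norm(x, torch.ones(32))

row_rms = y.pow(2).mean(dim=-1).sqrt()   # close to 1 for every row
row_mean = y.mean(dim=-1)                # stays clearly away from 0
print(row_rms, row_mean)
```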

Scaled Dot-Product Self-Attention (Problem 5 — Medium)

python

import torch.nn.functional as F
import math

def scaled_dot_product_attention(
    Q: torch.Tensor,  # (B, heads, T, head_dim)
    K: torch.Tensor,
    V: torch.Tensor,
    mask: torch.Tensor = None
) -> torch.Tensor:
    d_k = Q.size(-1)
    scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(d_k)
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float('-inf'))
    attn_weights = F.softmax(scores, dim=-1)
    return torch.matmul(attn_weights, V)
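
As a rough cross-check, the function can be compared with PyTorch's built-in `F.scaled_dot_product_attention`, which assumes PyTorch 2.0 or newer. The sketch repeats the function in compact form; the sizes are arbitrary. Note the different mask conventions: the hand-written version masks where `mask == 0`, while the built-in takes a boolean `attn_mask` where `True` means "may attend".

```python
import math
import torch
import torch.nn.functional as F

def scaled_dot_product_attention(Q, K, V, mask=None):
    scores = torch.matmul(Q, K.transpose(-2, -1)) / math.sqrt(Q.size(-1))
    if mask is not None:
        scores = scores.masked_fill(mask == 0, float("-inf"))
    return torch.matmul(F.softmax(scores, dim=-1), V)

torch.manual_seed(0)
B, H, T, D = 2, 4, 6, 8
Q, K, V = (torch.randn(B, H, T, D) for _ in range(3))
causal = torch.tril(torch.ones(T, T))    # broadcasts over batch and heads

ours = scaled_dot_product_attention(Q, K, V, causal)
builtin = F.scaled_dot_product_attention(Q, K, V, attn_mask=causal.bool())
max_err = (ours - builtin).abs().max().item()
print(f"max abs error vs built-in: {max_err:.2e}")
```

With a causal mask, the first position can only attend to itself, so its output row equals the first row of `V`.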

Multi-Head Attention (Problem 6 — Medium)

python

class MyMultiHeadAttention(nn.Module):
    def __init__(self, d_model: int, num_heads: int):
        super().__init__()
        assert d_model % num_heads == 0
        self.num_heads = num_heads
        self.head_dim = d_model // num_heads
        self.d_model = d_model

        self.W_q = nn.Linear(d_model, d_model)
        self.W_k = nn.Linear(d_model, d_model)
        self.W_v = nn.Linear(d_model, d_model)
        self.W_o = nn.Linear(d_model, d_model)

    def forward(self, x: torch.Tensor, mask: torch.Tensor = None) -> torch.Tensor:
        B, T, C = x.shape

        def split_heads(t):
            return t.view(B, T, self.num_heads, self.head_dim).transpose(1, 2)

        Q = split_heads(self.W_q(x))
        K = split_heads(self.W_k(x))
        V = split_heads(self.W_v(x))

        attn_out = scaled_dot_product_attention(Q, K, V, mask)
        # (B, heads, T, head_dim) -> (B, T, d_model)
        attn_out = attn_out.transpose(1, 2).contiguous().view(B, T, C)
        return self.W_o(attn_out)
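
The `view` / `transpose` / `contiguous` sequence is the part of this problem that most often goes wrong, so it may help to trace the shapes in isolation. This sketch uses a plain tensor instead of real attention output; the sizes are illustrative.

```python
import torch

B, T, num_heads, head_dim = 2, 5, 4, 8
d_model = num_heads * head_dim
x = torch.arange(B * T * d_model, dtype=torch.float32).view(B, T, d_model)

# Split: (B, T, d_model) -> (B, heads, T, head_dim)
heads = x.view(B, T, num_heads, head_dim).transpose(1, 2)
print(heads.shape, heads.is_contiguous())

# Stand-in for the attention output, which is laid out as (B, heads, T, head_dim).
attn_out = heads.contiguous()
swapped = attn_out.transpose(1, 2)       # (B, T, heads, head_dim), non-contiguous

try:
    swapped.view(B, T, d_model)          # view cannot merge dims with this memory layout
    view_failed = False
except RuntimeError:
    view_failed = True

# Merge: copy into contiguous memory first, then flatten heads back into d_model.
merged = swapped.contiguous().view(B, T, d_model)
print(view_failed, torch.equal(merged, x))
```

`reshape` would also work in place of `.contiguous().view(...)`, since it copies only when needed.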

Cross-Entropy Loss (Problem 16 — Easy)

python

def cross_entropy_loss(logits: torch.Tensor, targets: torch.Tensor) -> torch.Tensor:
    # logits: (B, C), targets: (B,) with class indices
    # -log softmax(z)[y] = logsumexp(z) - z[y]; logsumexp is numerically stable
    log_sum_exp = torch.logsumexp(logits, dim=-1)  # (B,)
    # Raw logit of the correct class for each sample
    batch_idx = torch.arange(len(targets), device=logits.device)
    target_logits = logits[batch_idx, targets]  # (B,)
    return (log_sum_exp - target_logits).mean()
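
A short comparison against `F.cross_entropy` can confirm both the value and the stability of this formulation. The logits are scaled up on purpose so that a naive `log(softmax(...))` would be at risk of overflow; the scale factor and shapes are arbitrary.

```python
import torch
import torch.nn.functional as F

def cross_entropy_loss(logits, targets):
    log_sum_exp = torch.logsumexp(logits, dim=-1)
    target_logits = logits[torch.arange(len(targets)), targets]
    return (log_sum_exp - target_logits).mean()

torch.manual_seed(0)
logits = torch.randn(8, 5) * 50.0        # large logits stress numerical stability
targets = torch.randint(0, 5, (8,))

ours = cross_entropy_loss(logits, targets)
expected = F.cross_entropy(logits, targets)   # default reduction is "mean"
print(ours.item(), expected.item())
```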

Dropout (Problem 17 — Easy)

python

class MyDropout(nn.Module):
    def __init__(self, p: float = 0.5):
        super().__init__()
        self.p = p

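    def forward(self, x: torch.Tensor) -> torch.Tensor:
        if not self.training or self.p == 0:
            return x
        if self.p >= 1:
            return torch.zeros_like(x)  # everything dropped; avoids dividing by zero
        # mask is 1 with probability (1 - p) (keep) and 0 with probability p (drop)
        mask = torch.bernoulli(torch.ones_like(x) * (1 - self.p))
        return x * mask / (1 - self.p)  # inverted scaling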
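
The point of inverted scaling is that the expected value of each activation is unchanged during training, so nothing needs rescaling at eval time. A small numerical sketch, with an arbitrary drop probability and sample size:

```python
import torch

torch.manual_seed(0)
p = 0.3
x = torch.ones(200_000)

mask = torch.bernoulli(torch.ones_like(x) * (1 - p))   # 1 = keep, 0 = drop
y = x * mask / (1 - p)

kept_fraction = mask.mean().item()   # close to 1 - p
output_mean = y.mean().item()        # close to the input mean of 1.0
print(kept_fraction, output_mean)
```

Each surviving element is scaled to `1 / (1 - p)`, which compensates on average for the elements that were zeroed.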

Kaiming Init (Problem 20 — Easy)

python

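import math

def kaiming_init(weight: torch.Tensor) -> torch.Tensor:
    # For an nn.Linear-style weight of shape (fan_out, fan_in)
    fan_in = weight.size(1)
    std = math.sqrt(2.0 / fan_in)
    with torch.no_grad():  # initialization should not be recorded by autograd
        weight.normal_(0, std)
    return weight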
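
The factor of 2 in `std = sqrt(2/fan_in)` is chosen so that a ReLU layer roughly preserves the second moment of its input. The sketch below checks this empirically for one layer; the layer sizes and batch size are arbitrary.

```python
import math
import torch

torch.manual_seed(0)
weight = torch.empty(512, 256)            # (fan_out, fan_in)
fan_in = weight.size(1)
target_std = math.sqrt(2.0 / fan_in)
with torch.no_grad():
    weight.normal_(0, target_std)

x = torch.randn(1024, fan_in)             # unit-variance input, E[x^2] = 1
h = torch.relu(x @ weight.T)

measured_std = weight.std().item()
second_moment = h.pow(2).mean().item()    # stays near 1 after the ReLU
print(measured_std, target_std, second_moment)
```

The pre-activation variance is about 2, and ReLU zeroes roughly half of the mass, which brings the second moment back to about 1.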

Gradient Clipping (Problem 21 — Easy)

python

def clip_grad_norm(parameters, max_norm: float) -> float:
    params = [p for p in parameters if p.grad is not None]
    if not params:
        return 0.0  # nothing to clip
    # Global L2 norm over all gradients, treated as one flat vector
    total_norm = torch.sqrt(sum(p.grad.norm() ** 2 for p in params))
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1:
        for p in params:
            p.grad.mul_(clip_coef)  # rescale in place
    return total_norm.item()  # norm before clipping
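
The hand-written version can be compared with `torch.nn.utils.clip_grad_norm_` on two identical copies of a small model. The model, the inflated loss, and `max_norm=1.0` are illustrative choices made so that clipping actually triggers.

```python
import copy
import torch
import torch.nn as nn

def clip_grad_norm(parameters, max_norm):
    params = [p for p in parameters if p.grad is not None]
    total_norm = torch.sqrt(sum(p.grad.norm() ** 2 for p in params))
    clip_coef = max_norm / (total_norm + 1e-6)
    if clip_coef < 1:
        for p in params:
            p.grad.mul_(clip_coef)
    return total_norm.item()

torch.manual_seed(0)
model_a = nn.Linear(10, 3)
model_b = copy.deepcopy(model_a)
x = torch.randn(16, 10)
for m in (model_a, model_b):
    (m(x) * 100).pow(2).sum().backward()   # deliberately large gradients

norm_ours = clip_grad_norm(model_a.parameters(), max_norm=1.0)
norm_ref = torch.nn.utils.clip_grad_norm_(model_b.parameters(), max_norm=1.0).item()

# After clipping, the global gradient norm should be about max_norm.
clipped = torch.sqrt(sum(p.grad.norm() ** 2 for p in model_a.parameters())).item()
print(norm_ours, norm_ref, clipped)
```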

Gradient Accumulation (Problem 31 — Easy)

python

def train_with_accumulation(model, optimizer, criterion, dataloader, accumulation_steps=4):
    # criterion is passed in explicitly, e.g. nn.CrossEntropyLoss()
    optimizer.zero_grad()
    for i, (inputs, targets) in enumerate(dataloader):
        outputs = model(inputs)
        loss = criterion(outputs, targets) / accumulation_steps  # scale loss
        loss.backward()  # gradients add up in .grad across micro-batches

        if (i + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
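
Dividing the loss by `accumulation_steps` is what makes the accumulated gradient match the gradient of one large batch. The sketch below checks that equivalence on a toy model, assuming a mean-reduced loss and equally sized micro-batches; the model and data are illustrative.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
model = nn.Linear(4, 1)
criterion = nn.MSELoss()                  # mean reduction
inputs, targets = torch.randn(8, 4), torch.randn(8, 1)

# Gradient from one full batch of 8 samples.
model.zero_grad()
criterion(model(inputs), targets).backward()
full_grad = model.weight.grad.clone()

# Gradient accumulated over 4 micro-batches of 2 samples each.
accumulation_steps = 4
model.zero_grad()
for xb, yb in zip(inputs.chunk(accumulation_steps), targets.chunk(accumulation_steps)):
    (criterion(model(xb), yb) / accumulation_steps).backward()
accumulated_grad = model.weight.grad.clone()

print(torch.allclose(full_grad, accumulated_grad, atol=1e-6))
```

If the number of batches is not a multiple of `accumulation_steps`, the last few micro-batches never reach an `optimizer.step()`; a final step after the loop is one way to handle that.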

Common Patterns & Tips

Numerical stability pattern

Always subtract the max before `exp()`:

python

# WRONG — can overflow for large values
exp_x = torch.exp(x)

# CORRECT — numerically stable
exp_x = torch.exp(x - x.max(dim=-1, keepdim=True).values)
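
To see the failure concretely, the sketch below runs both variants on float32 logits around 1000, where `exp` overflows to `inf`. The input values are arbitrary; any sufficiently large logits show the same effect.

```python
import torch

x = torch.tensor([[1000.0, 1001.0, 1002.0]])

# Naive: exp(1000) overflows to inf in float32, and inf / inf is nan.
naive = torch.exp(x) / torch.exp(x).sum(dim=-1, keepdim=True)

# Stable: after the shift the largest exponent is exp(0) = 1.
shifted = x - x.max(dim=-1, keepdim=True).values
stable = torch.exp(shifted) / torch.exp(shifted).sum(dim=-1, keepdim=True)

print(naive)
print(stable)
```

The shift does not change the result mathematically, because the factor `exp(-max)` cancels between numerator and denominator.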

Causal attention mask (for GPT-style models)

python

def causal_mask(T: int, device) -> torch.Tensor:
    return torch.tril(torch.ones(T, T, device=device)).unsqueeze(0).unsqueeze(0)
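
A small usage sketch shows how the mask broadcasts over batch and heads and what it does to the attention weights. The all-zero scores are dummy values chosen so the result is easy to read.

```python
import torch

def causal_mask(T, device):
    return torch.tril(torch.ones(T, T, device=device)).unsqueeze(0).unsqueeze(0)

mask = causal_mask(4, device="cpu")       # shape (1, 1, 4, 4)
scores = torch.zeros(2, 3, 4, 4)          # dummy (B, heads, T, T) scores

# Future positions get -inf, so softmax assigns them zero weight.
weights = scores.masked_fill(mask == 0, float("-inf")).softmax(dim=-1)
print(weights[0, 0])
```

With equal scores, row `i` spreads its weight uniformly over positions `0..i` and gives nothing to later positions.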

nn.Module skeleton (used in many problems)

python

class MyLayer(nn.Module):
    def __init__(self, ...):
        super().__init__()
        self.weight = nn.Parameter(torch.empty(...))
        self.bias = nn.Parameter(torch.zeros(...))
        self._init_weights()

    def _init_weights(self):
        nn.init.kaiming_uniform_(self.weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        ...
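
As one possible way to fill in the skeleton, here is a linear layer. The class name `MyLinear` and the constructor arguments are hypothetical; each problem's template defines its own signature.

```python
import torch
import torch.nn as nn

class MyLinear(nn.Module):
    """Hypothetical concrete instance of the skeleton; shapes are illustrative."""

    def __init__(self, in_features: int, out_features: int):
        super().__init__()
        # nn.Parameter registers the tensors so optimizers and autograd see them.
        self.weight = nn.Parameter(torch.empty(out_features, in_features))
        self.bias = nn.Parameter(torch.zeros(out_features))
        self._init_weights()

    def _init_weights(self):
        nn.init.kaiming_uniform_(self.weight)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x @ self.weight.T + self.bias   # y = xW^T + b

layer = MyLinear(8, 3)
out = layer(torch.randn(5, 8))
param_names = [name for name, _ in layer.named_parameters()]
print(out.shape, param_names)
```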

Train vs eval mode pattern

python

def forward(self, x):
    if self.training:
        # use batch statistics
        mean = x.mean(dim=0)
        var = x.var(dim=0, unbiased=False)
        # update running stats outside the autograd graph
        # (nn.BatchNorm1d tracks the unbiased variance in running_var)
        with torch.no_grad():
            self.running_mean = (1 - self.momentum) * self.running_mean + self.momentum * mean
            self.running_var = (1 - self.momentum) * self.running_var + self.momentum * var
    else:
        # use running statistics
        mean = self.running_mean
        var = self.running_var
    return (x - mean) / torch.sqrt(var + self.eps) * self.weight + self.bias
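
For context, the pattern can be placed inside a full module and compared with `nn.BatchNorm1d` in both modes. This is a sketch, not the reference solution: `MyBatchNorm1d` is a hypothetical name, and the running variance uses the unbiased estimate so that the buffers match PyTorch's.

```python
import torch
import torch.nn as nn

class MyBatchNorm1d(nn.Module):
    def __init__(self, num_features, eps=1e-5, momentum=0.1):
        super().__init__()
        self.eps, self.momentum = eps, momentum
        self.weight = nn.Parameter(torch.ones(num_features))
        self.bias = nn.Parameter(torch.zeros(num_features))
        # Buffers are saved in state_dict but are not trained.
        self.register_buffer("running_mean", torch.zeros(num_features))
        self.register_buffer("running_var", torch.ones(num_features))

    def forward(self, x):
        if self.training:
            mean = x.mean(dim=0)
            var = x.var(dim=0, unbiased=False)      # biased variance normalizes the batch
            with torch.no_grad():
                # lerp_(end, w) computes (1 - w) * self + w * end in place.
                self.running_mean.lerp_(mean, self.momentum)
                self.running_var.lerp_(x.var(dim=0, unbiased=True), self.momentum)
        else:
            mean, var = self.running_mean, self.running_var
        return (x - mean) / torch.sqrt(var + self.eps) * self.weight + self.bias

torch.manual_seed(0)
mine, ref = MyBatchNorm1d(6), nn.BatchNorm1d(6)
x = torch.randn(32, 6) * 3 + 1

train_err = (mine(x) - ref(x)).abs().max().item()   # both in training mode
mine.eval(); ref.eval()
eval_err = (mine(x) - ref(x)).abs().max().item()    # both using running statistics
print(train_err, eval_err)
```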

Project Structure

TorchCode/
├── templates/          # Blank notebooks for each problem (your workspace)
│   ├── 01_relu.ipynb
│   ├── 02_softmax.ipynb
│   └── ...
├── solutions/          # Reference solutions (study after attempting)
│   └── ...
├── torch_judge/        # Auto-grading package
│   ├── __init__.py     # check(), status(), hint(), reset_progress()
│   └── tasks/          # Per-problem test cases
├── Dockerfile
├── Makefile
└── pyproject.toml      # torch-judge package definition

Troubleshooting

Docker image not available for Apple Silicon (arm64)

bash

# make run auto-falls back to local build, or force it:
make build
make start

`check()` not found in Colab

bash

!pip install torch-judge
# then restart runtime

Notebook reset to blank template

Use the toolbar "Reset" button in JupyterLab to reset any notebook to its original blank state — useful for re-practicing a problem.

Gradient check fails but output is correct

Ensure your implementation uses PyTorch operations (not NumPy) so autograd works:

python

# WRONG — breaks autograd
import numpy as np
result = np.exp(x.numpy())

# CORRECT — autograd compatible
result = torch.exp(x)
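
The difference is visible on the tensors themselves: a result that passed through NumPy has no `grad_fn`, so nothing can flow back to the input. A minimal sketch:

```python
import numpy as np
import torch

x = torch.randn(3, requires_grad=True)

# NumPy path: x.numpy() raises on a tensor that requires grad, so a detach is
# needed first, and the detach is exactly what cuts the autograd graph.
via_numpy = torch.from_numpy(np.exp(x.detach().numpy()))

# PyTorch path: the operation is recorded and gradients reach x.
via_torch = torch.exp(x)
via_torch.sum().backward()

print(via_numpy.grad_fn, via_torch.grad_fn)
print(x.grad)                             # d/dx exp(x) = exp(x)
```

Both paths produce the same forward values, which is why the output test can pass while the gradient test fails.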

Viewing reference solution

After attempting a problem, open the matching file in `solutions/`, for example `solutions/02_softmax.ipynb`.

Key Concepts Tested

Concept	Problems
Numerical stability	Softmax, Cross-Entropy, LogSumExp
Autograd / `nn.Parameter`	Linear, LayerNorm, all nn.Module problems
Train vs eval behavior	BatchNorm, Dropout
Broadcasting	LayerNorm, RMSNorm, attention masking
Shape manipulation	Multi-Head Attention (view, transpose, contiguous)
Weight initialization	Kaiming Init, Linear Layer
Memory-efficient training	Gradient Accumulation, Gradient Clipping
