algo-rec-mf


Matrix Factorization


Overview


Matrix factorization decomposes the user-item interaction matrix R (m×n) into two low-rank matrices: U (m×k) and V (n×k), where k << min(m,n). Predicted rating: r̂ᵢⱼ = uᵢ · vⱼ. Trains in O(k × nnz × iterations) where nnz = non-zero entries.
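The dimensions and the dot-product prediction above can be made concrete with a minimal NumPy sketch (the sizes and rank here are illustrative, not prescribed by this card):

```python
import numpy as np

m, n, k = 1000, 500, 32                  # users, items, latent rank (k << min(m, n))
rng = np.random.default_rng(0)
U = rng.normal(scale=0.1, size=(m, k))   # user factor matrix, m x k
V = rng.normal(scale=0.1, size=(n, k))   # item factor matrix, n x k

# Predicted rating for user i, item j is the dot product of their factor rows.
i, j = 42, 7
r_hat = float(U[i] @ V[j])

# Storing U and V needs (m + n) * k values instead of m * n.
print(U.shape, V.shape, r_hat)
```

Note the storage saving: (1000 + 500) × 32 = 48,000 values versus 500,000 for the full matrix.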

When to Use


Trigger conditions:
  • Scaling CF beyond pairwise similarity (millions of users/items)
  • Discovering latent factors that explain user-item interactions
  • Predicting ratings for unobserved user-item pairs
When NOT to use:
  • When interaction data is extremely sparse (< 0.1% fill) — insufficient for learning
  • When you need real-time updates (retraining is expensive)

Algorithm


IRON LAW: Rank k Controls Bias-Variance Trade-Off
- Too LOW k: underfits, misses nuanced preferences (high bias)
- Too HIGH k: overfits to noise, poor generalization (high variance)
- Typical k: 20-200. Select via cross-validation on held-out ratings.
- Always add regularization (λ) to prevent overfitting.

Phase 1: Input Validation


Load sparse interaction matrix. Split into train/validation/test. Check minimum density. Gate: Train matrix has sufficient entries per user and item.
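A minimal sketch of this gate, assuming the interactions arrive as (user, item, rating) triples; the function name and the threshold defaults are illustrative:

```python
import numpy as np

def validate_interactions(rows, cols, vals, n_users, n_items,
                          min_density=1e-3, min_per_user=3, min_per_item=3):
    # Gate: reject matrices too sparse to learn from (thresholds are examples,
    # not values prescribed by this card).
    density = len(vals) / (n_users * n_items)
    user_counts = np.bincount(rows, minlength=n_users)   # ratings per user
    item_counts = np.bincount(cols, minlength=n_items)   # ratings per item
    return (density >= min_density
            and int(user_counts.min()) >= min_per_user
            and int(item_counts.min()) >= min_per_item)
```

Run it on the training split only, after the train/validation/test split, so held-out ratings cannot mask a sparse training matrix.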

Phase 2: Core Algorithm


ALS (Alternating Least Squares):
  1. Initialize U, V randomly (or with SVD warm-start)
  2. Fix V, solve for U: minimize ||R - UV^T||² + λ(||U||² + ||V||²)
  3. Fix U, solve for V using same objective
  4. Alternate until convergence (RMSE change < ε)
SGD alternative: Update u_i, v_j incrementally for each observed rating using gradient descent.
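The ALS loop above can be sketched in NumPy. Each alternating step is a per-row ridge regression over the observed entries; the dense "0 = unobserved" convention and the function name are illustrative choices for a toy-sized matrix:

```python
import numpy as np

def als(R, k=2, lam=0.01, iters=50, seed=0):
    # ALS on a dense matrix where 0 marks an unobserved entry.
    m, n = R.shape
    mask = R != 0                       # fit observed entries only
    rng = np.random.default_rng(seed)
    U = rng.normal(scale=0.1, size=(m, k))
    V = rng.normal(scale=0.1, size=(n, k))
    reg = lam * np.eye(k)
    for _ in range(iters):
        for i in range(m):              # fix V, ridge-solve for each user row u_i
            obs = mask[i]
            Vo = V[obs]
            U[i] = np.linalg.solve(Vo.T @ Vo + reg, Vo.T @ R[i, obs])
        for j in range(n):              # fix U, ridge-solve for each item row v_j
            obs = mask[:, j]
            Uo = U[obs]
            V[j] = np.linalg.solve(Uo.T @ Uo + reg, Uo.T @ R[obs, j])
    return U, V
```

For production scale, the inner solves are embarrassingly parallel across users (and across items), which is the main practical advantage of ALS over SGD.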

Phase 3: Verification


Compute RMSE on held-out validation set. Compare against baseline (global mean, user mean). Gate: Validation RMSE significantly below baseline.
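A sketch of this verification step against the global-mean baseline, assuming validation ratings are held as parallel (user, item, rating) arrays; names are illustrative:

```python
import numpy as np

def rmse(pred, actual):
    return float(np.sqrt(np.mean((np.asarray(pred) - np.asarray(actual)) ** 2)))

def validation_rmse(U, V, val_rows, val_cols, val_vals, train_vals):
    # Model prediction per held-out (user, item) pair: dot product of factor rows.
    model = np.sum(U[val_rows] * V[val_cols], axis=1)
    # Global-mean baseline: predict the training-set mean rating everywhere.
    baseline = np.full(len(val_vals), np.mean(train_vals))
    return rmse(model, val_vals), rmse(baseline, val_vals)
```

The gate passes only if the model RMSE is clearly below the baseline RMSE; a per-user-mean baseline is a stricter comparison built the same way.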

Phase 4: Output


Return top-N predictions per user with predicted scores.
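The output step can be sketched as a vectorized top-N over the reconstructed score matrix; masking out already-rated items is an assumption about the desired behavior, not stated by the card:

```python
import numpy as np

def top_n(U, V, rated_mask, n=10):
    scores = U @ V.T                                  # predicted rating, every user-item pair
    scores = np.where(rated_mask, -np.inf, scores)    # exclude already-rated items
    top_items = np.argsort(-scores, axis=1)[:, :n]    # best n item indices per user
    top_scores = np.take_along_axis(scores, top_items, axis=1)
    return top_items, top_scores
```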

Output Format


```json
{
  "recommendations": [{"user_id": "u1", "items": [{"item_id": "i5", "predicted_rating": 4.3}]}],
  "metadata": {"rank_k": 50, "regularization": 0.01, "iterations": 20, "train_rmse": 0.82, "val_rmse": 0.91}
}
```

Examples


Sample I/O


Input: 3×3 rating matrix R (0 = unobserved), k=1
R = [[5, 3, 0],
     [4, 0, 2],
     [0, 1, 1]]
Expected: After ALS with k=1 (one latent factor, λ=0.01, 50 iterations), approximate factorization:
U ≈ [[2.24], [1.84], [0.53]]
V ≈ [[2.23], [1.06], [0.98]]
R_hat ≈ [[4.99, 2.37, 2.20],
         [4.10, 1.95, 1.80],
         [1.18, 0.56, 0.52]]
Verify: R_hat ≈ R on observed entries (the printed factors give RMSE ≈ 0.38 on observed entries — a rank-1 fit cannot resolve the conflicting item ratios exactly). U[0] >> U[2] correctly captures user 0's higher ratings.
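The check above can be reproduced numerically; this sketch reuses the printed U and V (rounded to two decimals, so the RMSE reflects those rounded values):

```python
import numpy as np

R = np.array([[5., 3., 0.],
              [4., 0., 2.],
              [0., 1., 1.]])
U = np.array([[2.24], [1.84], [0.53]])   # values from the worked example above
V = np.array([[2.23], [1.06], [0.98]])

R_hat = U @ V.T                          # rank-1 reconstruction
mask = R != 0                            # compare only on observed entries
rmse = float(np.sqrt(np.mean((R_hat[mask] - R[mask]) ** 2)))
print(np.round(R_hat, 2))
print(round(rmse, 2))                    # → 0.38
```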

Edge Cases


| Input | Expected | Why |
|---|---|---|
| User with 1 rating | Poor predictions for that user | Insufficient data to learn user factors |
| Highly popular item | Predicted near average | Dominant first latent factor captures popularity |
| All ratings = 5 | Trivial factorization | No variance to learn from |

Gotchas


  • Implicit data needs different loss: For clicks/views (no explicit ratings), use weighted matrix factorization (Hu et al. 2008) with confidence weighting, not RMSE.
  • Cold start remains: New users/items have no entries in R. MF can't factorize what doesn't exist. Use side features or hybrid approaches.
  • Negative sampling: For implicit feedback, you must sample negative examples (unobserved ≠ disliked). Random negative sampling works but biased sampling is better.
  • Initialization matters: Random initialization can converge to poor local optima. SVD-based warm-start often helps.
  • Bias terms: Add user bias bᵢ and item bias bⱼ: r̂ᵢⱼ = μ + bᵢ + bⱼ + uᵢ·vⱼ. This captures systematic rating tendencies.
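The bias-term formula in the last gotcha can be sketched as a single prediction function; the names and the sample values are illustrative:

```python
import numpy as np

def predict(mu, b_user, b_item, U, V, i, j):
    # r_hat_ij = mu + b_i + b_j + u_i . v_j
    return float(mu + b_user[i] + b_item[j] + U[i] @ V[j])

mu = 3.5                          # global mean rating
b_user = np.array([0.2])          # user 0 rates 0.2 above average
b_item = np.array([-0.1])         # item 0 is rated 0.1 below average
U = np.array([[1.0, 0.0]])
V = np.array([[0.5, 0.0]])
print(predict(mu, b_user, b_item, U, V, 0, 0))   # → 4.1
```

During training, the biases are learned jointly with U and V under the same regularized objective.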

References


  • For ALS vs SGD comparison, see
    references/optimization-comparison.md
  • For implicit feedback matrix factorization, see
    references/implicit-mf.md