algo-rec-session

Session-Based Recommendation

Overview

Session-based recommendation predicts the next item a user will interact with from the click/view sequence of their current session, without relying on a long-term user profile. Typical approaches are Markov chains, association rules, and neural models such as GRU4Rec. It operates in real time, with inference cost of O(sequence_length).

When to Use

Trigger conditions:

Anonymous users (no login, no long-term profile)
Short browsing sessions where recency matters most
Real-time "next item" prediction during active sessions

When NOT to use:

When rich user history is available (collaborative filtering or content-based methods personalize better)
When sessions are extremely short (1-2 clicks), because they carry insufficient signal

Algorithm

IRON LAW: First Few Clicks Are Disproportionately Important
Session-based methods operate WITHOUT long-term profiles. Intent must
be inferred from SHORT sequences. The first 2-3 clicks establish the
session's intent — misreading early signals derails the entire session.

Phase 1: Input Validation

Parse the clickstream into sessions, either by session ID or by timeout-based splitting (typically a new session starts after 30 minutes of inactivity). Filter out sessions shorter than the minimum length (3+ events). Gate: sessions are parsed and the minimum-length threshold is applied.
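
A minimal Python sketch of Phase 1. It assumes events arrive as hypothetical `(user_id, timestamp, item_id)` tuples, already sorted by user and then by time. The 30-minute timeout and the 3-event minimum from the text are exposed as parameters. `split_sessions` is an illustrative name, not an existing API.

```python
from datetime import datetime, timedelta
from itertools import groupby

# Hypothetical event format: (user_id, timestamp, item_id), sorted by user, then time.
Event = tuple[str, datetime, str]


def split_sessions(events: list[Event],
                   timeout: timedelta = timedelta(minutes=30),
                   min_length: int = 3) -> list[list[str]]:
    """Split each user's clickstream on inactivity gaps, then drop short sessions."""
    sessions: list[list[str]] = []
    for _, user_events in groupby(events, key=lambda e: e[0]):
        current: list[str] = []
        last_ts = None
        for _, ts, item in user_events:
            # A gap strictly longer than `timeout` closes the current session.
            if last_ts is not None and ts - last_ts > timeout:
                sessions.append(current)
                current = []
            current.append(item)
            last_ts = ts
        sessions.append(current)  # non-empty: groupby never yields empty groups
    # Gate: keep only sessions with at least `min_length` events.
    return [s for s in sessions if len(s) >= min_length]


t0 = datetime(2024, 1, 1, 9, 0)
events = [
    ("u1", t0, "a"),
    ("u1", t0 + timedelta(minutes=5), "b"),
    ("u1", t0 + timedelta(minutes=10), "c"),
    ("u1", t0 + timedelta(minutes=90), "d"),  # 80-minute gap starts a new session
    ("u2", t0, "x"),
    ("u2", t0 + timedelta(minutes=1), "y"),
    ("u2", t0 + timedelta(minutes=2), "z"),
]
print(split_sessions(events))  # [['a', 'b', 'c'], ['x', 'y', 'z']]
```

In this sketch, a gap exactly equal to the timeout does not split the session. Whether the boundary is inclusive is a convention worth fixing explicitly.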

Phase 2: Core Algorithm

Markov Chain approach:

Build a transition matrix from consecutive item-to-item transitions across all sessions
For the current session [A, B, C], predict the next item from the first-order P(next | C) or the higher-order P(next | B, C)
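
The Markov chain approach can be sketched as follows. The code counts transitions and estimates P(next | context) by maximum likelihood: the count of context → item divided by all transitions out of that context. It uses the second-order context (B, C) when that pair has been observed, and otherwise backs off to the first-order P(next | C). The back-off rule and the function names are illustrative assumptions, not a prescribed part of the method.

```python
from collections import Counter, defaultdict


def build_transitions(sessions):
    """Count first-order (C -> next) and second-order ((B, C) -> next) transitions."""
    first = defaultdict(Counter)
    second = defaultdict(Counter)
    for s in sessions:
        for i in range(1, len(s)):
            first[s[i - 1]][s[i]] += 1
            if i >= 2:
                second[(s[i - 2], s[i - 1])][s[i]] += 1
    return first, second


def predict_next(session, first, second, k=3):
    """Rank next items by P(next | B, C), backing off to P(next | C) if (B, C) is unseen."""
    counts = None
    if len(session) >= 2:
        counts = second.get((session[-2], session[-1]))
    if not counts:
        counts = first.get(session[-1])
    if not counts:
        return []  # unseen context: the caller should fall back, e.g. to popularity
    total = sum(counts.values())
    # Maximum-likelihood estimate: count(context -> item) / count(context -> anything).
    return [(item, n / total) for item, n in counts.most_common(k)]


sessions = [["A", "B", "C", "D"], ["A", "B", "C", "E"], ["X", "C", "F"]]
first, second = build_transitions(sessions)
print(predict_next(["A", "B", "C"], first, second))  # [('D', 0.5), ('E', 0.5)]
print(predict_next(["Q", "C"], first, second))       # backs off to P(next | C)
```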

Association Rules approach:

Mine frequent item sequences (sequential pattern mining)
Match current session suffix against known patterns
Recommend items that frequently follow the matched pattern
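
A simplified sketch of the association-rules approach, under one assumption: it mines only contiguous item sequences of length 1 to `max_len`. Full sequential pattern mining algorithms such as GSP or PrefixSpan also allow gaps between items.

The sketch counts which items follow each pattern and keeps patterns with at least `min_support` occurrences. At query time it matches the longest session suffix that is a known pattern. All names and thresholds are hypothetical.

```python
from collections import Counter, defaultdict


def mine_patterns(sessions, max_len=3, min_support=2):
    """Count which items follow each contiguous pattern of length 1..max_len."""
    follows = defaultdict(Counter)
    for s in sessions:
        for end in range(1, len(s)):  # s[end] is the item that follows the pattern
            for n in range(1, max_len + 1):
                if end - n < 0:
                    break
                follows[tuple(s[end - n:end])][s[end]] += 1
    # Keep only patterns observed (with a follower) at least `min_support` times.
    return {p: c for p, c in follows.items() if sum(c.values()) >= min_support}


def recommend_by_suffix(session, patterns, max_len=3, k=3):
    """Match the longest session suffix that is a known pattern; rank its followers."""
    for n in range(min(max_len, len(session)), 0, -1):
        followers = patterns.get(tuple(session[-n:]))
        if followers:
            return [item for item, _ in followers.most_common(k)]
    return []


sessions = [["a", "b", "c"], ["a", "b", "c"], ["x", "b", "d"]]
patterns = mine_patterns(sessions)
print(recommend_by_suffix(["z", "a", "b"], patterns))  # ['c'], via pattern (a, b)
print(recommend_by_suffix(["x", "b"], patterns))       # ['c', 'd']: (x, b) lacks support
```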

Phase 3: Verification

Evaluate with leave-one-out: hide the last item of each session, predict it from the preceding items, and measure hit rate (Hit@K) and MRR (Mean Reciprocal Rank). Gate: Hit@20 is significantly above a random baseline.
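
The leave-one-out protocol can be sketched as below. `recommend(prefix, k)` stands for any hypothetical recommender that returns a ranked list of item IDs. A miss (target outside the top k) contributes 0 to MRR.

For the gate, a ranker that picks k items uniformly at random from a catalog of N items has an expected Hit@k of k/N. Hit@20 can therefore be compared against 20/N.

```python
def evaluate_leave_one_out(sessions, recommend, k=20):
    """Hide each session's last item, predict it from the prefix, report Hit@k and MRR@k."""
    hits, rr_sum, n = 0, 0.0, 0
    for s in sessions:
        if len(s) < 2:
            continue  # no prefix to condition on
        prefix, target = s[:-1], s[-1]
        ranked = list(recommend(prefix, k))[:k]
        n += 1
        if target in ranked:
            hits += 1
            rr_sum += 1.0 / (ranked.index(target) + 1)  # ranks are 1-based
    if n == 0:
        return {"hit_at_k": 0.0, "mrr_at_k": 0.0}
    return {"hit_at_k": hits / n, "mrr_at_k": rr_sum / n}


def fixed_recommender(prefix, k):
    """Toy stand-in that ignores the prefix and always returns the same list."""
    return ["c", "x", "d"][:k]


sessions = [["a", "b", "c"], ["a", "d"], ["e", "f"]]
print(evaluate_leave_one_out(sessions, fixed_recommender, k=3))
```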

Phase 4: Output

Return ranked next-item predictions with confidence scores.

Output Format

```json
{
  "predictions": [{"item_id": "789", "score": 0.65, "based_on": "last_3_clicks"}],
  "session": {"length": 5, "items_viewed": ["a", "b", "c", "d", "e"]},
  "metadata": {"method": "markov_order2", "hit_rate_at_20": 0.35}
}
```
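
A small standard-library sketch of assembling the Phase 4 output in the JSON shape shown above. The function name and its parameters are hypothetical; `based_on` is passed through as a free-form description of the context the method used.

```python
import json


def format_output(session, predictions, method, hit_rate_at_20, based_on):
    """Package ranked (item_id, score) pairs into the Phase 4 output structure."""
    return {
        "predictions": [
            {"item_id": item_id, "score": score, "based_on": based_on}
            for item_id, score in predictions
        ],
        "session": {"length": len(session), "items_viewed": list(session)},
        "metadata": {"method": method, "hit_rate_at_20": hit_rate_at_20},
    }


out = format_output(["a", "b", "c", "d", "e"], [("789", 0.65)],
                    method="markov_order2", hit_rate_at_20=0.35, based_on="last_3_clicks")
print(json.dumps(out, indent=2))
```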

Examples

Sample I/O

Input: session [shoes_page, running_shoes, nike_air_max]
Expected: recommend nike_air_zoom (0.72), adidas_ultraboost (0.58), shoe_size_guide (0.41)

Edge Cases

Input	Expected	Why
Session length = 1	Popularity fallback	Single click insufficient for sequence pattern
Repeated item views	Weight recency, not count	User may be comparing, not broadening
Session intent shift	Adapt to latest clicks	User changed their goal mid-session
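
One way to handle these three edge cases, sketched under stated assumptions:

- `transitions` is a hypothetical precomputed map from an item to {next_item: P(next | item)}.
- `popularity` is a Counter of global item frequencies, used for the fallback.
- Sessions shorter than two clicks get the popularity fallback.
- Repeated views are collapsed to their most recent position.
- Each clicked item's contribution decays exponentially with its distance from the end of the session, so a mid-session intent shift is dominated by the latest clicks. The decay of 0.5 is an arbitrary illustration.

```python
from collections import Counter, defaultdict


def recommend_with_fallbacks(session, transitions, popularity, k=3, decay=0.5, min_length=2):
    """Recency-weighted next-item scores with a popularity fallback."""
    fallback = [item for item, _ in popularity.most_common(k)]
    if len(session) < min_length:
        return fallback  # a single click is too little signal for a sequence pattern
    # Keep only each item's LAST position: repeats boost recency, not count.
    last_pos = {item: pos for pos, item in enumerate(session)}
    scores = defaultdict(float)
    for item, pos in last_pos.items():
        weight = decay ** (len(session) - 1 - pos)  # latest click has weight 1
        for nxt, p in transitions.get(item, {}).items():
            scores[nxt] += weight * p
    ranked = sorted(scores, key=scores.get, reverse=True)[:k]
    return ranked or fallback


transitions = {
    "shoes": {"socks": 0.5, "insoles": 0.5},
    "running_shoes": {"nike_air_zoom": 0.7, "shoe_size_guide": 0.3},
    "laptop": {"laptop_bag": 0.9, "mouse": 0.1},
}
popularity = Counter({"laptop": 50, "shoes": 40, "socks": 10})

print(recommend_with_fallbacks(["shoes"], transitions, popularity))
# Intent shift from laptops to shoes: the latest clicks dominate.
print(recommend_with_fallbacks(["laptop", "shoes", "running_shoes"], transitions, popularity, k=2))
```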

Gotchas

Session definition matters: the 30-minute timeout is conventional but arbitrary. E-commerce may need a shorter timeout (15 min); research browsing may need a longer one (60 min).
Position bias: users click top-ranked results more often, so session data reflects UI position, not just preference. Correct for position bias, for example by inverse propensity weighting of clicks.
Repeat recommendations: Users often revisit items. Distinguish "recommend something new" from "remind of previously viewed."
Cold start for new items: Items with zero prior session appearances can't be predicted by transition matrices. Mix in feature-based candidates.
Computational efficiency: for real-time inference, precompute transition probabilities offline; recomputing them per request is too slow at scale.
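
A sketch combining three of these gotchas, under stated assumptions:

- Position bias: each logged transition is weighted offline by the inverse of its examination propensity. The propensity is modeled with the commonly used power-law form (1/rank)^eta. The `rank` field and `eta` are hypothetical inputs that real logs would need to supply.
- Computational efficiency: the weighted scores are normalized and truncated to a top-k list per item, so serving is a single dictionary lookup.
- Cold start: candidates from a separate, hypothetical feature-based source are appended.

```python
from collections import defaultdict


def precompute_topk(click_pairs, k=20, eta=1.0):
    """Offline: inverse-propensity-weighted transition scores, truncated to top-k per item.

    Assumed log format: (prev_item, next_item, rank), where `rank` is the 1-based UI
    position at which next_item was clicked; P(examined | rank) = (1 / rank) ** eta.
    """
    weighted = defaultdict(lambda: defaultdict(float))
    for prev, nxt, rank in click_pairs:
        propensity = (1.0 / rank) ** eta
        weighted[prev][nxt] += 1.0 / propensity  # clicks at rarely seen positions count more
    table = {}
    for prev, nxts in weighted.items():
        total = sum(nxts.values())
        ranked = sorted(nxts.items(), key=lambda kv: kv[1], reverse=True)[:k]
        table[prev] = [(nxt, w / total) for nxt, w in ranked]
    return table


def serve(last_item, table, cold_start_candidates=()):
    """Online: one dictionary lookup, plus feature-based candidates for cold-start items."""
    ranked = [item for item, _ in table.get(last_item, [])]
    return ranked + [c for c in cold_start_candidates if c not in ranked]


# "c" was clicked once at position 4, "b" twice at position 1.
table = precompute_topk([("a", "b", 1), ("a", "b", 1), ("a", "c", 4)])
print(table["a"])                            # "c" outranks "b" after debiasing
print(serve("a", table, ["new_item"]))       # ['c', 'b', 'new_item']
print(serve("unseen", table, ["new_item"]))  # ['new_item']
```

With `eta=0.0` every propensity is 1, and the table reduces to plain transition frequencies.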

References

For the GRU4Rec neural session model, see `references/gru4rec.md`.
For session splitting heuristics, see `references/session-splitting.md`.
