algo-rec-session
Session-Based Recommendation
Overview
Session-based recommendation predicts the next item a user will interact with from the click/view sequence of the current session, without relying on a long-term user profile. Common approaches are Markov chains, association rules, and neural models such as GRU4Rec. Inference runs in real time, at O(sequence_length) cost per request.
When to Use
Trigger conditions:
- Anonymous users (no login, no long-term profile)
- Short browsing sessions where recency matters most
- Real-time "next item" prediction during active sessions
When NOT to use:
- When rich user history is available (collaborative filtering (CF) or content-based methods give better personalization)
- When sessions are extremely short (1-2 clicks): too little signal to infer intent
Algorithm
IRON LAW: First Few Clicks Are Disproportionately Important
Session-based methods operate WITHOUT long-term profiles. Intent must
be inferred from SHORT sequences. The first 2-3 clicks establish the
session's intent; misreading early signals derails the entire session.
Phase 1: Input Validation
Parse the clickstream into sessions, either by session ID or by timeout-based splitting (typically 30 minutes of inactivity). Drop sessions shorter than the minimum length; keep only sessions with 3 or more events.
Gate: Sessions parsed, minimum length threshold applied.
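The timeout-based variant of this phase can be sketched as follows. This is a minimal illustration, not a reference implementation: the `(user_key, timestamp_s, item_id)` event layout, the function name `split_sessions`, and both constants are assumptions chosen to match the defaults mentioned above.

```python
from collections import defaultdict

SESSION_TIMEOUT_S = 30 * 60   # assumed inactivity gap, in seconds
MIN_SESSION_LENGTH = 3        # assumed minimum number of events per session


def split_sessions(events, timeout_s=SESSION_TIMEOUT_S, min_length=MIN_SESSION_LENGTH):
    """Group (user_key, timestamp_s, item_id) events into lists of item IDs."""
    by_user = defaultdict(list)
    for user_key, ts, item in events:
        by_user[user_key].append((ts, item))

    sessions = []
    for user_events in by_user.values():
        user_events.sort(key=lambda e: e[0])  # chronological order
        current, last_ts = [], None
        for ts, item in user_events:
            # Start a new session when the inactivity gap exceeds the timeout.
            if last_ts is not None and ts - last_ts > timeout_s:
                sessions.append(current)
                current = []
            current.append(item)
            last_ts = ts
        sessions.append(current)

    # Gate: keep only sessions that meet the minimum length.
    return [s for s in sessions if len(s) >= min_length]
```

When the log already carries a session ID, the same function can be reused by passing that ID as `user_key` and setting `timeout_s` to a very large value.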
Phase 2: Core Algorithm
Markov Chain approach:
- Build transition matrix from item-to-item sequences across all sessions
- For current session [A, B, C], predict next item from P(next | C) or higher-order P(next | B, C)
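A minimal sketch of the Markov chain approach, assuming sessions are lists of hashable item IDs. The names `build_transitions` and `predict_next` are hypothetical, and the back-off from order 2 to order 1 when a context was never observed is an added assumption rather than something the text prescribes. Orders are assumed to be 1 or higher.

```python
from collections import Counter, defaultdict


def build_transitions(sessions, order=1):
    """Count how often each item follows each context of `order` items."""
    counts = defaultdict(Counter)
    for session in sessions:
        for i in range(order, len(session)):
            context = tuple(session[i - order:i])
            counts[context][session[i]] += 1
    return counts


def predict_next(session, counts_by_order, top_k=3):
    """Rank next items, trying the highest order first and backing off."""
    for order in sorted(counts_by_order, reverse=True):
        if len(session) < order:
            continue
        followers = counts_by_order[order].get(tuple(session[-order:]))
        if followers:
            total = sum(followers.values())
            # Normalised counts serve as the estimate of P(next | context).
            return [(item, n / total) for item, n in followers.most_common(top_k)]
    return []  # unseen context: the caller should fall back, e.g. to popularity


sessions = [["A", "B", "C", "D"], ["B", "C", "D"], ["A", "C", "E"], ["X", "C", "D"]]
models = {1: build_transitions(sessions, 1), 2: build_transitions(sessions, 2)}
print(predict_next(["A", "B", "C"], models))  # uses P(next | B, C)
print(predict_next(["Z", "C"], models))       # backs off to P(next | C)
```

Raw counts are kept instead of a dense matrix because most item pairs never co-occur; a dictionary of counters only stores the observed transitions.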
Association Rules approach:
- Mine frequent item sequences (sequential pattern mining)
- Match current session suffix against known patterns
- Recommend items that frequently follow the matched pattern
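The three steps above can be sketched as follows. This is a simplification under stated assumptions: it mines contiguous subsequences only, whereas full sequential pattern mining also allows gaps between items. The names `mine_rules` and `recommend`, the `min_support` threshold, and the score (the share of the retained follow-up counts) are all illustrative choices, and item IDs are assumed to be strings so that ties can be broken alphabetically.

```python
from collections import Counter, defaultdict


def mine_rules(sessions, max_len=3, min_support=2):
    """Map each frequent contiguous pattern to the items that follow it."""
    followers = defaultdict(Counter)
    for session in sessions:
        for end in range(1, len(session)):
            for n in range(1, min(max_len, end) + 1):
                pattern = tuple(session[end - n:end])
                followers[pattern][session[end]] += 1
    # Keep only (pattern -> next item) rules seen at least min_support times.
    rules = {}
    for pattern, nxt in followers.items():
        kept = {item: c for item, c in nxt.items() if c >= min_support}
        if kept:
            rules[pattern] = kept
    return rules


def recommend(session, rules, max_len=3, top_k=3):
    """Match the longest known suffix of the session and rank its followers."""
    for n in range(min(max_len, len(session)), 0, -1):
        matched = rules.get(tuple(session[-n:]))
        if matched:
            total = sum(matched.values())
            ranked = sorted(matched.items(), key=lambda kv: (-kv[1], kv[0]))
            return [(item, c / total) for item, c in ranked[:top_k]]
    return []  # no pattern matched any suffix
```

Matching the longest suffix first prefers specific patterns over general ones; the support threshold keeps one-off sequences from becoming rules.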
Phase 3: Verification
Evaluate with leave-one-out: hide last item in each session, predict, check hit rate and MRR (Mean Reciprocal Rank).
Gate: Hit@20 significantly above random baseline.
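The leave-one-out protocol can be sketched like this. The function name `evaluate_leave_one_out` and the `predict` callback (assumed to take a session prefix and return a ranked list of item IDs) are hypothetical. For reference, a uniform random ranker over a catalog of N items (N at least 20) has a Hit@20 of 20/N, which gives the baseline for the gate. The evaluated sessions should be held out from the data used to build the model.

```python
def evaluate_leave_one_out(sessions, predict, k=20):
    """Return (hit_rate@k, mrr@k) with the last item of each session hidden."""
    hits, reciprocal_ranks, n = 0, 0.0, 0
    for session in sessions:
        if len(session) < 2:
            continue  # no prefix left to condition on
        prefix, target = session[:-1], session[-1]
        ranked = list(predict(prefix))[:k]
        n += 1
        if target in ranked:
            hits += 1
            # Rank is 1-based; a miss contributes 0 to the MRR sum.
            reciprocal_ranks += 1.0 / (ranked.index(target) + 1)
    if n == 0:
        return 0.0, 0.0
    return hits / n, reciprocal_ranks / n
```

Hit rate only asks whether the hidden item appears in the top k, while MRR also rewards placing it near the top, so the two can diverge for models that find the right item but rank it low.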
Phase 4: Output
Return ranked next-item predictions with confidence scores.
Output Format
```json
{
  "predictions": [{"item_id": "789", "score": 0.65, "based_on": "last_3_clicks"}],
  "session": {"length": 5, "items_viewed": ["a", "b", "c", "d", "e"]},
  "metadata": {"method": "markov_order2", "hit_rate_at_20": 0.35}
}
```
Examples
Sample I/O
Input: Session: [shoes_page, running_shoes, nike_air_max]
Expected: Recommend: nike_air_zoom (0.72), adidas_ultraboost (0.58), shoe_size_guide (0.41)
Edge Cases
| Input | Expected | Why |
|---|---|---|
| Session length = 1 | Popularity fallback | Single click insufficient for sequence pattern |
| Repeated item views | Weight recency, not count | User may be comparing, not broadening |
| Session intent shift | Adapt to latest clicks | User changed their goal mid-session |
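One possible way to handle the first two rows of the table is sketched below. All names are hypothetical. Collapsing repeats to the most recent view is only one reading of "weight recency, not count", and excluding already-viewed items from the fallback is an added assumption. The `sequence_model` argument stands for any callable that maps a session to a ranked list of item IDs.

```python
from collections import Counter


def popularity_ranking(sessions, top_k=20):
    """Most frequently seen items across all sessions, most popular first."""
    counts = Counter(item for session in sessions for item in session)
    return [item for item, _ in counts.most_common(top_k)]


def collapse_repeats(session):
    """Keep each item once, at the position of its most recent view."""
    seen, result = set(), []
    for item in reversed(session):
        if item not in seen:
            seen.add(item)
            result.append(item)
    return result[::-1]


def recommend_with_fallback(session, sequence_model, popular, min_length=2):
    """Use the sequence model only when the session carries enough signal."""
    cleaned = collapse_repeats(session)
    if len(cleaned) >= min_length:
        ranked = sequence_model(cleaned)
        if ranked:
            return ranked
    # Fallback: popular items not already viewed in this session.
    return [item for item in popular if item not in cleaned]
```

The intent-shift row needs no special handling in this sketch as long as the sequence model conditions on the session suffix, since the latest clicks then dominate the prediction.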
Gotchas
- Session definition matters: 30-minute timeout is conventional but arbitrary. E-commerce may need shorter (15min); research browsing may need longer (60min).
- Position bias: Users click top results more. Session data reflects UI position, not just preference. Correct for position bias.
- Repeat recommendations: Users often revisit items. Distinguish "recommend something new" from "remind of previously viewed."
- Cold start for new items: Items with zero prior session appearances can't be predicted by transition matrices. Mix in feature-based candidates.
- Computational efficiency: For real-time inference, pre-compute transition probabilities. Recomputing per-request at scale is too slow.
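The last two points can be combined in a small sketch: an offline step that precomputes a truncated first-order transition table, and an online step that does a single lookup and pads the result with feature-based candidates for new items. The names `precompute_top_k` and `serve` are hypothetical, and how `cold_start_candidates` are chosen is outside the scope of this example.

```python
from collections import Counter, defaultdict


def precompute_top_k(sessions, k=20):
    """Offline: first-order transition table, truncated to top-k per item."""
    counts = defaultdict(Counter)
    for session in sessions:
        for prev, nxt in zip(session, session[1:]):
            counts[prev][nxt] += 1
    table = {}
    for prev, followers in counts.items():
        total = sum(followers.values())
        table[prev] = [(item, n / total) for item, n in followers.most_common(k)]
    return table


def serve(session, table, cold_start_candidates=(), k=20):
    """Online: one dictionary lookup, padded with feature-based candidates."""
    ranked = [item for item, _ in table.get(session[-1], [])] if session else []
    for item in cold_start_candidates:  # e.g. new items picked by content features
        if len(ranked) >= k:
            break
        if item not in ranked:
            ranked.append(item)
    return ranked[:k]
```

Because the table is truncated to k entries per item, the per-request work is a hash lookup plus at most k list operations, independent of the catalog size.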
References
- For the GRU4Rec neural session model, see references/gru4rec.md
- For session splitting heuristics, see references/session-splitting.md