algo-rank-elo
Elo Rating System
Overview
Elo assigns each participant a numerical rating and updates it after every pairwise comparison. The winner gains points and the loser loses points; the amount exchanged depends on the gap between the expected and the actual outcome. Originally designed for chess, Elo is now used for sports, games, and A/B preference testing. Each update runs in O(1) per match.
When to Use
Trigger conditions:
- Ranking items from pairwise comparison data (A vs B outcomes)
- Building competitive rating systems for games or sports
- Crowdsourced quality evaluation through pairwise preferences
When NOT to use:
- When you have absolute scores, not pairwise comparisons (use direct ranking)
- When team dynamics matter more than individual skill (use TrueSkill)
Algorithm
IRON LAW: Elo Assumes Each Matchup Is Independent and Stationary
Rating changes are based on surprise: beating a higher-rated opponent
gains more points than beating a lower-rated one. K-factor controls
update speed: high K (32) = volatile, fast adaptation. Low K (16) =
stable, slow adaptation. Choose K based on how quickly skill changes.
Phase 1: Input Validation
Initialize all participants at base rating (typically 1500). Collect match results: winner, loser (or draw).
Gate: Valid match data, no self-matches.
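The Phase 1 gate could be sketched as follows. The `Match` record, the `validate_matches` and `init_ratings` helpers, and the encoding of results as a score for player A are assumptions made for illustration, not the interface of `scripts/elo.py`.

```python
from collections import namedtuple

# Hypothetical match record: score_a is 1.0 (A wins), 0.5 (draw), 0.0 (A loses).
Match = namedtuple("Match", ["player_a", "player_b", "score_a"])

BASE_RATING = 1500.0  # typical base rating named in the text


def validate_matches(matches):
    """Return the matches as a list; raise ValueError on the first bad record."""
    for i, m in enumerate(matches):
        if m.player_a == m.player_b:
            raise ValueError(f"match {i}: self-match for {m.player_a!r}")
        if m.score_a not in (0.0, 0.5, 1.0):
            raise ValueError(f"match {i}: invalid score {m.score_a!r}")
    return list(matches)


def init_ratings(matches, base=BASE_RATING):
    """Give every participant that appears in the match list the base rating."""
    ratings = {}
    for m in matches:
        ratings.setdefault(m.player_a, base)
        ratings.setdefault(m.player_b, base)
    return ratings
```

Failing fast on the first invalid record keeps bad data out of the sequential update, where a single corrupted match would shift every later rating.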
Phase 2: Core Algorithm
- Expected score: E_A = 1 / (1 + 10^((R_B - R_A)/400))
- Actual score: S_A = 1 (win), 0.5 (draw), 0 (loss)
- Update: R_A_new = R_A + K × (S_A - E_A)
- Process all matches sequentially (order matters for sequential Elo)
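The formulas above translate almost directly into code. This is a minimal sketch, assuming a single shared K for both players and matches given as `(a, b, s_a)` tuples; the function names are illustrative.

```python
def expected_score(r_a, r_b):
    """E_A = 1 / (1 + 10^((R_B - R_A) / 400))."""
    return 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))


def update(r_a, r_b, s_a, k=32.0):
    """Apply one match result; returns (R_A_new, R_B_new)."""
    delta = k * (s_a - expected_score(r_a, r_b))
    # With a shared K, B's change is exactly the negative of A's,
    # because S_B = 1 - S_A and E_B = 1 - E_A.
    return r_a + delta, r_b - delta


def run_sequential(ratings, matches, k=32.0):
    """Process matches in the given order; returns a new ratings dict."""
    ratings = dict(ratings)  # do not mutate the caller's dict
    for a, b, s_a in matches:
        ratings[a], ratings[b] = update(ratings[a], ratings[b], s_a, k)
    return ratings


print(update(1500.0, 1500.0, 1.0))  # (1516.0, 1484.0)
```

Because each update reads the ratings left by the previous match, reordering the input list generally changes the final ratings.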
Phase 3: Verification
Check: total rating points are conserved (zero-sum; this holds when both players in a match use the same K). The rating distribution is reasonable (no extreme values caused by data errors).
Gate: Ratings conserved, top-ranked items pass sanity check.
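One way to express this gate in code is sketched below. It assumes every participant started at the same initial rating and that all updates used a shared K; the `max_abs_dev` threshold for flagging extreme values is an arbitrary illustrative choice, not a value from the source.

```python
import math


def verify_ratings(ratings, initial_rating=1500.0, max_abs_dev=1500.0, tol=1e-6):
    """Check conservation of total points and flag implausible ratings."""
    total = sum(ratings.values())
    expected_total = initial_rating * len(ratings)
    conserved = math.isclose(total, expected_total, abs_tol=tol)
    # Hypothetical sanity bound: ratings this far from the base usually
    # point to duplicated or corrupted match records.
    outliers = [p for p, r in ratings.items()
                if abs(r - initial_rating) > max_abs_dev]
    return {"conserved": conserved, "outliers": outliers}


print(verify_ratings({"A": 1516.0, "B": 1484.0}))
# {'conserved': True, 'outliers': []}
```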
Phase 4: Output
Return sorted ratings with confidence indicators.
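A possible shape for this step is sketched below. The source does not define the confidence indicators, so this sketch assumes the per-player match count serves as a rough proxy; `build_output` and the `stats` layout are hypothetical names.

```python
import json


def build_output(ratings, stats, k_factor=32, initial_rating=1500):
    """Sort by rating (descending) and attach per-player match counts."""
    rows = [
        {
            "id": pid,
            "rating": round(rating),
            "matches": stats[pid]["matches"],  # assumed confidence proxy
            "wins": stats[pid]["wins"],
            "losses": stats[pid]["losses"],
        }
        for pid, rating in sorted(ratings.items(),
                                  key=lambda kv: kv[1], reverse=True)
    ]
    # Each match is counted once per participant, so halve the sum.
    total_matches = sum(s["matches"] for s in stats.values()) // 2
    return {
        "ratings": rows,
        "metadata": {
            "k_factor": k_factor,
            "initial_rating": initial_rating,
            "total_matches": total_matches,
        },
    }


stats = {"A": {"matches": 1, "wins": 0, "losses": 1},
         "B": {"matches": 1, "wins": 1, "losses": 0}}
print(json.dumps(build_output({"A": 1484.0, "B": 1516.0}, stats), indent=2))
```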
Output Format
```json
{
  "ratings": [{"id": "player_A", "rating": 1720, "matches": 50, "wins": 35, "losses": 15}],
  "metadata": {"k_factor": 32, "initial_rating": 1500, "total_matches": 500}
}
```
Examples
Sample I/O
Input: Player A (1500) beats Player B (1500), K=32
Expected: E_A = 0.5, S_A = 1. R_A_new = 1500 + 32×(1-0.5) = 1516. R_B_new = 1484.
Edge Cases
| Input | Expected | Why |
|---|---|---|
| 1500 beats 2000 | Large rating gain (~30 pts at K=32) | Huge upset, large surprise |
| 2000 beats 1500 | Small rating gain (~2 pts at K=32) | Expected outcome, minimal surprise |
| Draw between equals | No change | Expected outcome exactly matches actual |
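The winner's gain in each edge case can be recomputed from the Phase 2 formula. A small sketch, with `gain` as an illustrative helper name:

```python
def gain(r_a, r_b, s_a, k=32.0):
    """Rating change for player A after scoring s_a against player B."""
    e_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    return k * (s_a - e_a)


print(round(gain(1500, 2000, 1.0), 1))  # 30.3  (upset win)
print(round(gain(2000, 1500, 1.0), 1))  # 1.7   (expected win)
print(round(gain(1500, 1500, 0.5), 1))  # 0.0   (draw between equals)
```

For a 500-point gap the underdog's expected score is about 0.053, so an upset pays out nearly the full K while the favorite's win pays out only the remainder.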
Gotchas
- K-factor selection: Too high = ratings oscillate. Too low = slow to reflect actual skill changes. Use variable K: higher for new participants, lower for established ones.
- Order dependence: Sequential Elo ratings depend on match processing order. For batch processing, use iterative Elo or Bradley-Terry model.
- Inflation/deflation: In open systems where participants enter/leave, average rating can drift. Use rating floors or periodic calibration.
- Not designed for teams: Standard Elo is for 1v1. For teams, average team ratings or use TrueSkill which models individual contribution within teams.
- Rating ≠ win probability: A 200-point rating gap implies ~76% expected win rate, but actual outcomes depend on context, form, and luck.
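The variable-K suggestion in the first gotcha might look like the sketch below. The K values 32 and 16 come from the text above; the 30-match threshold separating new from established participants is an assumption chosen only for illustration.

```python
def k_factor(matches_played, k_new=32.0, k_established=16.0, threshold=30):
    """Higher K for newcomers, lower K once a participant is established."""
    return k_new if matches_played < threshold else k_established


def update_variable_k(r_a, r_b, s_a, n_a, n_b):
    """One match where each player uses a K based on their own history.

    n_a and n_b are the matches each player has completed so far.
    """
    e_a = 1.0 / (1.0 + 10.0 ** ((r_b - r_a) / 400.0))
    new_a = r_a + k_factor(n_a) * (s_a - e_a)
    new_b = r_b + k_factor(n_b) * ((1.0 - s_a) - (1.0 - e_a))
    return new_a, new_b


# A newcomer (0 matches) beats an established player (100 matches).
print(update_variable_k(1500.0, 1500.0, 1.0, 0, 100))  # (1516.0, 1492.0)
```

Note that the newcomer gains 16 points while the established player loses only 8, so per-player K values break the zero-sum property and are one source of the rating drift described in the inflation/deflation gotcha.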
Scripts
| Script | Description | Usage |
|---|---|---|
| scripts/elo.py | Update Elo ratings (single match or batch) with zero-sum verification | |
Run `python scripts/elo.py --verify` to execute built-in sanity tests.
References
- For Bradley-Terry model (batch Elo), see references/bradley-terry.md
- For variable K-factor strategies, see references/variable-k.md