algo-net-community
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseLouvain Community Detection
Louvain社区检测
Overview
概述
Louvain algorithm detects communities by optimizing modularity — the fraction of edges within communities minus expected fraction if edges were random. A greedy, hierarchical algorithm that runs in O(n log n) for sparse graphs. Produces a hierarchy of communities at multiple resolutions.
Louvain算法通过优化模块化度(modularity)来检测社区——模块化度指社区内部边的比例减去边随机分布时的预期比例。这是一种贪心层次算法,在稀疏图上的时间复杂度为O(n log n),可生成多分辨率的社区层次结构。
When to Use
使用场景
Trigger conditions:
- Discovering natural groupings in social, organizational, or interaction networks
- Segmenting users/customers by behavioral similarity
- Analyzing modular structure of complex networks
When NOT to use:
- For overlapping communities (use DEMON or BigCLAM)
- When communities are pre-defined and you're classifying nodes (use label propagation)
触发条件:
- 在社交、组织或交互网络中发现自然分组
- 根据行为相似度细分用户/客户
- 分析复杂网络的模块化结构
不适用场景:
- 检测重叠社区(请使用DEMON或BigCLAM算法)
- 社区已预定义且需对节点进行分类(请使用标签传播算法)
Algorithm
算法
IRON LAW: Modularity Has a RESOLUTION LIMIT
Louvain optimizes modularity, which has a known resolution limit
(Fortunato & Barthélemy, 2007): it cannot detect communities smaller
than √(2E) where E = total edges. In large networks, small but real
communities may be merged. Use multi-resolution methods or Leiden
algorithm (improved Louvain) for better results.IRON LAW: Modularity Has a RESOLUTION LIMIT
Louvain optimizes modularity, which has a known resolution limit
(Fortunato & Barthélemy, 2007): it cannot detect communities smaller
than √(2E) where E = total edges. In large networks, small but real
communities may be merged. Use multi-resolution methods or Leiden
algorithm (improved Louvain) for better results.Phase 1: Input Validation
阶段1:输入验证
Build undirected weighted graph from interaction data. Edge weights represent interaction strength (frequency, duration, volume).
Gate: Graph loaded, no isolated nodes (or decide how to handle them).
从交互数据构建无向加权图。边权重代表交互强度(频率、时长、数量)。
检查点: 图已加载,无孤立节点(或已确定孤立节点的处理方式)。
Phase 2: Core Algorithm
阶段2:核心算法
Phase 1 — Local moves:
- Assign each node to its own community
- For each node, compute modularity gain of moving to each neighbor's community
- Move node to community with maximum positive gain
- Repeat until no beneficial moves remain
Phase 2 — Aggregation:
5. Build new graph where nodes = communities, edges = sum of inter-community edges
6. Repeat Phase 1 on the aggregated graph
7. Continue until modularity stops improving
阶段1——局部移动:
- 将每个节点分配到自身所在的社区
- 对每个节点,计算其移动到每个邻居社区后的模块化度增益
- 将节点移动到模块化度增益最大的社区(增益需为正)
- 重复上述步骤,直到无有益移动为止
阶段2——聚合:
5. 构建新图,其中节点代表原社区,边代表社区间边的权重总和
6. 在聚合后的图上重复阶段1的操作
7. 持续迭代,直到模块化度不再提升
Phase 3: Verification
阶段3:验证
Check: modularity Q > 0 (non-trivial partitioning), community sizes are reasonable (not one giant + many singletons), manual inspection of sample communities.
Gate: Modularity positive, community sizes follow power-law-like distribution.
检查:模块化度Q>0(非平凡划分)、社区大小合理(不存在一个巨型社区加大量单节点社区的情况)、对样本社区进行人工检查。
检查点: 模块化度为正,社区大小符合类似幂律分布。
Phase 4: Output
阶段4:输出
Return community assignments with modularity score.
返回带有模块化度分数的社区分配结果。
Output Format
输出格式
json
{
"communities": [{"id": 0, "size": 45, "top_members": ["Alice", "Bob"], "internal_density": 0.35}],
"summary": {"num_communities": 12, "modularity": 0.65, "largest": 120, "smallest": 5},
"metadata": {"algorithm": "louvain", "nodes": 500, "edges": 2000}
}json
{
"communities": [{"id": 0, "size": 45, "top_members": ["Alice", "Bob"], "internal_density": 0.35}],
"summary": {"num_communities": 12, "modularity": 0.65, "largest": 120, "smallest": 5},
"metadata": {"algorithm": "louvain", "nodes": 500, "edges": 2000}
}Examples
示例
Sample I/O
示例输入输出
Input: Email network of 200 employees, weighted by email frequency
Expected: Communities roughly corresponding to departments/teams, modularity ~0.5-0.7.
输入: 包含200名员工的邮件网络,按邮件频率加权
预期结果: 社区大致对应部门/团队,模块化度约为0.5-0.7。
Edge Cases
边缘情况
| Input | Expected | Why |
|---|---|---|
| Complete graph | One community or random split | No modular structure |
| Disconnected components | Each component = community | Natural separation |
| Weighted vs unweighted | Different communities | Weights change modularity calculation |
| 输入 | 预期结果 | 原因 |
|---|---|---|
| 完全图 | 单个社区或随机划分 | 无模块化结构 |
| 不连通的组件 | 每个组件为一个社区 | 自然分隔 |
| 加权图 vs 无权图 | 不同的社区划分 | 权重会改变模块化度的计算 |
Gotchas
注意事项
- Non-deterministic: Node processing order affects results. Run multiple times and select the partition with highest modularity, or use Leiden algorithm (more stable).
- Resolution parameter: Standard Louvain uses γ=1 in modularity. Varying γ reveals communities at different scales. γ>1 finds smaller communities; γ<1 finds larger ones.
- Leiden > Louvain: Louvain can produce badly connected communities (communities where removing one node disconnects them). Leiden algorithm fixes this guarantee.
- Temporal stability: In dynamic networks, community assignments can change drastically between snapshots even when the network changes minimally. Use temporal smoothing.
- Interpretation: Community detection finds structure, but interpreting WHY nodes cluster requires domain knowledge. Don't over-interpret automatically detected communities.
- 非确定性: 节点处理顺序会影响结果。可多次运行并选择模块化度最高的划分,或使用Leiden算法(稳定性更强)。
- 分辨率参数: 标准Louvain算法在模块化度计算中使用γ=1。调整γ值可揭示不同尺度的社区。γ>1时会发现更小的社区;γ<1时会发现更大的社区。
- Leiden优于Louvain: Louvain可能产生连接性差的社区(移除一个节点就会导致社区断开)。Leiden算法解决了这个问题并提供保障。
- 时间稳定性: 在动态网络中,即使网络变化极小,不同快照间的社区分配也可能发生巨大变化。建议使用时间平滑方法。
- 解读: 社区检测仅能发现结构,而解读节点聚类的原因需要领域知识。不要过度解读自动检测出的社区。
References
参考资料
- For Leiden algorithm (improved Louvain), see
references/leiden.md - For multi-resolution community detection, see
references/multi-resolution.md
- 关于Leiden算法(Louvain的改进版),请查看
references/leiden.md - 关于多分辨率社区检测,请查看
references/multi-resolution.md