algo-net-community

Compare original and translation side by side

🇺🇸

Original

English
🇨🇳

Translation

Chinese

Louvain Community Detection

Louvain社区检测

Overview

概述

Louvain algorithm detects communities by optimizing modularity — the fraction of edges within communities minus expected fraction if edges were random. A greedy, hierarchical algorithm that runs in O(n log n) for sparse graphs. Produces a hierarchy of communities at multiple resolutions.
Louvain算法通过优化模块化度(modularity)来检测社区——模块化度指社区内部边的比例减去边随机分布时的预期比例。这是一种贪心层次算法,在稀疏图上的时间复杂度为O(n log n),可生成多分辨率的社区层次结构。

When to Use

使用场景

Trigger conditions:
  • Discovering natural groupings in social, organizational, or interaction networks
  • Segmenting users/customers by behavioral similarity
  • Analyzing modular structure of complex networks
When NOT to use:
  • For overlapping communities (use DEMON or BigCLAM)
  • When communities are pre-defined and you're classifying nodes (use label propagation)
触发条件:
  • 在社交、组织或交互网络中发现自然分组
  • 根据行为相似度细分用户/客户
  • 分析复杂网络的模块化结构
不适用场景:
  • 检测重叠社区(请使用DEMON或BigCLAM算法)
  • 社区已预定义且需对节点进行分类(请使用标签传播算法)

Algorithm

算法

IRON LAW: Modularity Has a RESOLUTION LIMIT
Louvain optimizes modularity, which has a known resolution limit
(Fortunato & Barthélemy, 2007): it cannot detect communities smaller
than √(2E) where E = total edges. In large networks, small but real
communities may be merged. Use multi-resolution methods or Leiden
algorithm (improved Louvain) for better results.
IRON LAW: Modularity Has a RESOLUTION LIMIT
Louvain optimizes modularity, which has a known resolution limit
(Fortunato & Barthélemy, 2007): it cannot detect communities smaller
than √(2E) where E = total edges. In large networks, small but real
communities may be merged. Use multi-resolution methods or Leiden
algorithm (improved Louvain) for better results.

Phase 1: Input Validation

阶段1:输入验证

Build undirected weighted graph from interaction data. Edge weights represent interaction strength (frequency, duration, volume). Gate: Graph loaded, no isolated nodes (or decide how to handle them).
从交互数据构建无向加权图。边权重代表交互强度(频率、时长、数量)。 检查点: 图已加载,无孤立节点(或已确定孤立节点的处理方式)。

Phase 2: Core Algorithm

阶段2:核心算法

Phase 1 — Local moves:
  1. Assign each node to its own community
  2. For each node, compute modularity gain of moving to each neighbor's community
  3. Move node to community with maximum positive gain
  4. Repeat until no beneficial moves remain
Phase 2 — Aggregation: 5. Build new graph where nodes = communities, edges = sum of inter-community edges 6. Repeat Phase 1 on the aggregated graph 7. Continue until modularity stops improving
阶段1——局部移动:
  1. 将每个节点分配到自身所在的社区
  2. 对每个节点,计算其移动到每个邻居社区后的模块化度增益
  3. 将节点移动到模块化度增益最大的社区(增益需为正)
  4. 重复上述步骤,直到无有益移动为止
阶段2——聚合: 5. 构建新图,其中节点代表原社区,边代表社区间边的权重总和 6. 在聚合后的图上重复阶段1的操作 7. 持续迭代,直到模块化度不再提升

Phase 3: Verification

阶段3:验证

Check: modularity Q > 0 (non-trivial partitioning), community sizes are reasonable (not one giant + many singletons), manual inspection of sample communities. Gate: Modularity positive, community sizes follow power-law-like distribution.
检查:模块化度Q>0(非平凡划分)、社区大小合理(不存在一个巨型社区加大量单节点社区的情况)、对样本社区进行人工检查。 检查点: 模块化度为正,社区大小符合类似幂律分布。

Phase 4: Output

阶段4:输出

Return community assignments with modularity score.
返回带有模块化度分数的社区分配结果。

Output Format

输出格式

json
{
  "communities": [{"id": 0, "size": 45, "top_members": ["Alice", "Bob"], "internal_density": 0.35}],
  "summary": {"num_communities": 12, "modularity": 0.65, "largest": 120, "smallest": 5},
  "metadata": {"algorithm": "louvain", "nodes": 500, "edges": 2000}
}
json
{
  "communities": [{"id": 0, "size": 45, "top_members": ["Alice", "Bob"], "internal_density": 0.35}],
  "summary": {"num_communities": 12, "modularity": 0.65, "largest": 120, "smallest": 5},
  "metadata": {"algorithm": "louvain", "nodes": 500, "edges": 2000}
}

Examples

示例

Sample I/O

示例输入输出

Input: Email network of 200 employees, weighted by email frequency Expected: Communities roughly corresponding to departments/teams, modularity ~0.5-0.7.
输入: 包含200名员工的邮件网络,按邮件频率加权 预期结果: 社区大致对应部门/团队,模块化度约为0.5-0.7。

Edge Cases

边缘情况

InputExpectedWhy
Complete graphOne community or random splitNo modular structure
Disconnected componentsEach component = communityNatural separation
Weighted vs unweightedDifferent communitiesWeights change modularity calculation
输入预期结果原因
完全图单个社区或随机划分无模块化结构
不连通的组件每个组件为一个社区自然分隔
加权图 vs 无权图不同的社区划分权重会改变模块化度的计算

Gotchas

注意事项

  • Non-deterministic: Node processing order affects results. Run multiple times and select the partition with highest modularity, or use Leiden algorithm (more stable).
  • Resolution parameter: Standard Louvain uses γ=1 in modularity. Varying γ reveals communities at different scales. γ>1 finds smaller communities; γ<1 finds larger ones.
  • Leiden > Louvain: Louvain can produce badly connected communities (communities where removing one node disconnects them). Leiden algorithm fixes this guarantee.
  • Temporal stability: In dynamic networks, community assignments can change drastically between snapshots even when the network changes minimally. Use temporal smoothing.
  • Interpretation: Community detection finds structure, but interpreting WHY nodes cluster requires domain knowledge. Don't over-interpret automatically detected communities.
  • 非确定性: 节点处理顺序会影响结果。可多次运行并选择模块化度最高的划分,或使用Leiden算法(稳定性更强)。
  • 分辨率参数: 标准Louvain算法在模块化度计算中使用γ=1。调整γ值可揭示不同尺度的社区。γ>1时会发现更小的社区;γ<1时会发现更大的社区。
  • Leiden优于Louvain: Louvain可能产生连接性差的社区(移除一个节点就会导致社区断开)。Leiden算法解决了这个问题并提供保障。
  • 时间稳定性: 在动态网络中,即使网络变化极小,不同快照间的社区分配也可能发生巨大变化。建议使用时间平滑方法。
  • 解读: 社区检测仅能发现结构,而解读节点聚类的原因需要领域知识。不要过度解读自动检测出的社区。

References

参考资料

  • For Leiden algorithm (improved Louvain), see
    references/leiden.md
  • For multi-resolution community detection, see
    references/multi-resolution.md
  • 关于Leiden算法(Louvain的改进版),请查看
    references/leiden.md
  • 关于多分辨率社区检测,请查看
    references/multi-resolution.md