data-cohort-analysis

Cohort Analysis

Framework

IRON LAW: Aggregate Metrics Hide Cohort Differences

A 70% monthly retention rate OVERALL can mask that the January cohort retains
at 85% while the June cohort retains at 50%. Aggregate metrics blend improving
and deteriorating cohorts together, hiding both problems and progress.
ALWAYS analyze by cohort before drawing conclusions.
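
To see how the blend can hide the split, here is a minimal Python sketch. The cohort sizes (4,000 January signups, 3,000 June signups) are hypothetical, chosen only so that the per-cohort rates quoted above combine to roughly 70%.

```python
# Hypothetical cohort sizes; only the 85% / 50% / 70% rates come from the text.
cohorts = {
    "Jan": {"users": 4000, "retained": 3400},  # 85% retained
    "Jun": {"users": 3000, "retained": 1500},  # 50% retained
}

# Retention computed separately for each cohort.
per_cohort = {
    name: c["retained"] / c["users"] for name, c in cohorts.items()
}

# Retention computed on the pooled population, as an aggregate dashboard would.
total_users = sum(c["users"] for c in cohorts.values())
total_retained = sum(c["retained"] for c in cohorts.values())
overall = total_retained / total_users

for name, rate in per_cohort.items():
    print(f"{name}: {rate:.0%}")
print(f"Overall: {overall:.0%}")
```

The pooled figure looks healthy even though the two cohorts sit 35 percentage points apart.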

Core Concepts

Cohort: A group of users who share a common characteristic in a specific time period. Most common: acquisition cohort (grouped by signup month).
Retention Matrix: Rows = cohorts (by signup month), Columns = time periods after signup (Month 0, 1, 2...). Cells = % of cohort still active.
           Month 0  Month 1  Month 2  Month 3
Jan cohort   100%     65%     48%      40%
Feb cohort   100%     60%     42%      35%
Mar cohort   100%     70%     55%      48%  ← Improvement!
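
One way to build such a matrix from raw data is sketched below in plain Python. The input shapes (a `signups` mapping and an `activity` list of user/month pairs), the `YYYY-MM` month format, and the function names are assumptions for illustration. Signup is counted as Month 0 activity so that every cohort starts at 100%.

```python
from collections import defaultdict


def month_index(ym):
    """Convert 'YYYY-MM' into a running month number so offsets can be subtracted."""
    year, month = ym.split("-")
    return int(year) * 12 + int(month)


def retention_matrix(signups, activity):
    """signups: {user_id: 'YYYY-MM'}; activity: iterable of (user_id, 'YYYY-MM')."""
    cohort_users = defaultdict(set)  # cohort -> all users who signed up in it
    active = defaultdict(set)        # (cohort, month offset) -> active users
    for user, signup in signups.items():
        cohort_users[signup].add(user)
        active[(signup, 0)].add(user)  # assumption: signup counts as Month 0
    for user, month in activity:
        signup = signups[user]
        offset = month_index(month) - month_index(signup)
        if offset > 0:
            active[(signup, offset)].add(user)

    matrix = {}
    for cohort in sorted(cohort_users):
        size = len(cohort_users[cohort])
        last = max(offset for (c, offset) in active if c == cohort)
        matrix[cohort] = [
            len(active.get((cohort, o), set())) / size for o in range(last + 1)
        ]
    return matrix


# Hypothetical sample data: four January signups, two February signups.
signups = {"u1": "2024-01", "u2": "2024-01", "u3": "2024-01",
           "u4": "2024-01", "u5": "2024-02", "u6": "2024-02"}
activity = [("u1", "2024-02"), ("u2", "2024-02"), ("u3", "2024-02"),
            ("u1", "2024-03"), ("u2", "2024-03"), ("u5", "2024-03")]

matrix = retention_matrix(signups, activity)
for cohort, row in matrix.items():
    print(cohort, [f"{rate:.0%}" for rate in row])
```

Newer cohorts naturally have shorter rows, because fewer periods have elapsed since signup.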

Retention Types

Type            Definition                       Use Case
N-day           % active on exactly day N        Games, daily-use apps
N-day bounded   % active within first N days     General product usage
Week/Month      % active in week/month N         SaaS, subscriptions
Unbounded       % who ever return after day N    Low-frequency products
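
The difference between these definitions is easiest to see on the same data. The sketch below covers three of the four types, under two stated assumptions: day 0 (signup) is excluded from the bounded window, and "after day N" is read as "on day N or any later day". The data layout and function names are hypothetical.

```python
def n_day(active_days, n):
    """Share of users active on exactly day n."""
    return sum(n in days for days in active_days.values()) / len(active_days)


def n_day_bounded(active_days, n):
    """Share of users active at least once on days 1..n (assumption: day 0 excluded)."""
    hits = sum(any(1 <= d <= n for d in days) for days in active_days.values())
    return hits / len(active_days)


def unbounded(active_days, n):
    """Share of users active on day n or any later day (assumed reading of 'after day N')."""
    hits = sum(any(d >= n for d in days) for days in active_days.values())
    return hits / len(active_days)


# Hypothetical users: the set holds the days since signup on which each was active.
active_days = {
    "a": {0, 1, 7},
    "b": {0, 3},
    "c": {0, 10},
    "d": {0, 5},
}

print("N-day (7):        ", n_day(active_days, 7))
print("N-day bounded (7):", n_day_bounded(active_days, 7))
print("Unbounded (7):    ", unbounded(active_days, 7))
```

The same four users yield three different "day 7 retention" numbers, which is why the definition should be fixed before cohorts are compared.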

Analysis Steps

Phase 1: Define Cohort and Activity
  • Cohort definition: signup date, first purchase date, or other milestone
  • Activity definition: login, purchase, specific action — must match the product's core value
  • Time granularity: daily (for daily-use products), weekly, or monthly
Phase 2: Build Retention Matrix
  • Group users into cohorts
  • For each cohort, calculate retention at each time period
  • Visualize as a heatmap (darker = higher retention)
Phase 3: Identify Patterns
  • Retention curve shape: Does it flatten (good — stable core users) or keep declining (bad — everyone eventually churns)?
  • Cohort comparison: Are newer cohorts retaining better or worse than older ones?
  • Drop-off cliff: Is there a specific period where retention drops sharply? (e.g., Day 1 → Day 7 drops 50%)
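
The three checks in Phase 3 can be sketched as small helpers over the example matrix from Core Concepts (values in percent). The function names and the 5-point flattening threshold are assumptions for illustration, not a standard.

```python
# Example retention matrix from Core Concepts, in percent.
matrix = {
    "Jan": [100, 65, 48, 40],
    "Feb": [100, 60, 42, 35],
    "Mar": [100, 70, 55, 48],
}


def largest_drop(curve):
    """Return (period, drop in points) for the steepest period-over-period decline."""
    drops = [(p, curve[p - 1] - curve[p]) for p in range(1, len(curve))]
    return max(drops, key=lambda item: item[1])


def is_flattening(curve, tolerance_pp=5):
    """True if the latest decline is below tolerance_pp (hypothetical threshold)."""
    return curve[-2] - curve[-1] < tolerance_pp


def cohort_delta(matrix, older, newer, period):
    """Percentage-point difference between two cohorts at the same period."""
    return matrix[newer][period] - matrix[older][period]


print(largest_drop(matrix["Jan"]))            # steepest decline for January
print(is_flattening(matrix["Jan"]))           # still losing 8 points in Month 3
print(cohort_delta(matrix, "Jan", "Mar", 3))  # March vs January at Month 3
```

With only four periods of data the flattening check is weak evidence at best; a longer curve gives a more reliable read.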
Phase 4: Connect to Actions
  • What changed for the improving/deteriorating cohorts? (product update, marketing channel shift, onboarding change)
  • Can you isolate the cause through A/B test or event analysis?
Phase 5: LTV Projection
  • Use cohort retention curves to project future revenue per cohort
  • LTV = Σ (retention_month_n × ARPU_month_n) for all future months
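
As a rough illustration of the LTV formula, the sketch below sums retention times ARPU over the months that have actually been observed. The flat $10 ARPU is hypothetical, Month 0 is included as an assumption, and no extrapolation beyond the observed curve is attempted, so the result is a lower bound on the full sum rather than a complete projection.

```python
def projected_ltv(retention, arpu):
    """Sum retention[n] * arpu[n] over a finite horizon of observed months."""
    if len(retention) != len(arpu):
        raise ValueError("retention and arpu must cover the same months")
    return sum(r * a for r, a in zip(retention, arpu))


# Retention curves from the example matrix; ARPU values are hypothetical.
jan_retention = [1.00, 0.65, 0.48, 0.40]
mar_retention = [1.00, 0.70, 0.55, 0.48]
arpu = [10.0, 10.0, 10.0, 10.0]

jan_ltv = projected_ltv(jan_retention, arpu)
mar_ltv = projected_ltv(mar_retention, arpu)
print(f"Jan cohort LTV over 4 months: ${jan_ltv:.2f}")
print(f"Mar cohort LTV over 4 months: ${mar_ltv:.2f}")
```

Extending the horizon would require fitting or extrapolating the retention curve, which is a separate modeling choice.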

Output Format

Cohort Analysis: {Product}

Cohort Definition

  • Cohort: {signup month / first purchase}
  • Activity: {what counts as "active"}
  • Period: {daily / weekly / monthly}

Retention Matrix

Cohort    M0     M1    M2    M3    M4    M5    M6
{month}   100%   {%}   {%}   {%}   {%}   {%}   {%}

Key Findings

  1. {retention curve shape}
  2. {cohort trend — improving or deteriorating}
  3. {critical drop-off point}

Cohort Comparison

Metric          Oldest Cohort   Newest Cohort   Delta
M1 retention    {%}             {%}             {±pp}
M3 retention    {%}             {%}             {±pp}
Projected LTV   ${X}            ${X}            {%}

Recommendations

  1. {action to improve retention at critical drop-off point}

Gotchas

  • Define "active" carefully: Login ≠ value delivery. A user who logs in but doesn't complete the core action (purchase, send message, create document) shouldn't count as "retained."
  • Cohort size matters: 50% retention in a cohort of 10 users is just 5 users, too few to support a conclusion. Check that each cohort is large enough for its retention rate to be statistically meaningful.
  • Survivorship bias in aggregates: "Average retention is improving" may just mean a larger share of new users, who are always at M0 = 100% and pull the blended average up.
  • Seasonal cohorts behave differently: December cohorts (holiday shoppers) often retain worse than March cohorts (organic discovery). Compare same-season cohorts year over year (YoY).
  • Retention ≠ engagement depth: A user who returns once a month for 5 hours and one who returns daily for 30 seconds both count as retained, yet their engagement is very different. Layer in activity depth metrics.
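
To make the cohort-size point concrete, one option is to attach a confidence interval to each retention rate. The sketch below uses the Wilson score interval, a standard interval for a binomial proportion. It is a technique brought in here for illustration, not something the steps above prescribe, and the 95% level (z = 1.96) is an assumption.

```python
import math


def wilson_interval(retained, cohort_size, z=1.96):
    """Wilson score interval for a retention rate; z=1.96 gives about 95% coverage."""
    if cohort_size <= 0:
        raise ValueError("cohort_size must be positive")
    p = retained / cohort_size
    denom = 1 + z**2 / cohort_size
    center = (p + z**2 / (2 * cohort_size)) / denom
    half_width = (
        z
        * math.sqrt(p * (1 - p) / cohort_size + z**2 / (4 * cohort_size**2))
        / denom
    )
    return center - half_width, center + half_width


# Same 50% retention rate, very different certainty.
small_low, small_high = wilson_interval(5, 10)
large_low, large_high = wilson_interval(500, 1000)
print(f"10 users:   {small_low:.1%} to {small_high:.1%}")
print(f"1000 users: {large_low:.1%} to {large_high:.1%}")
```

For the 10-user cohort the interval spans roughly 24% to 76%, so the observed 50% is compatible with both excellent and poor retention. For the 1,000-user cohort it narrows to roughly 47% to 53%.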

References

  • For SQL retention query templates, see references/retention-sql.md
  • For LTV projection from cohort data, see references/cohort-ltv.md