ab-test-analysis


A/B Test Analysis

Evaluate A/B test results with statistical rigor and translate findings into clear product decisions.

Context

You are analyzing A/B test results for $ARGUMENTS.
If the user provides data files (CSV, Excel, or analytics exports), read and analyze them directly. Generate Python scripts for statistical calculations when needed.

Instructions

  1. Understand the experiment:
    • What was the hypothesis?
    • What was changed (the variant)?
    • What is the primary metric? Any guardrail metrics?
    • How long did the test run?
    • What is the traffic split?
  2. Validate the test setup:
    • Sample size: Is the sample large enough for the expected effect size?
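      • Use the formula: n = ((Z_α/2 + Z_β)² × 2 × p × (1-p)) / MDE², where n is the sample size per group, p is the baseline conversion rate, Z_β is the normal quantile for the target power, and MDE is the minimum detectable effect as an absolute difference
      • Flag the test as underpowered if its power is below 80%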
    • Duration: Did the test run for at least 1-2 full business cycles (for example, whole weeks, so that weekday and weekend behavior are both covered)?
    • Randomization: Any evidence of sample ratio mismatch (SRM)?
    • Novelty/primacy effects: Was there enough time to wash out initial behavior changes?
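
The sample size and SRM checks in step 2 can be sketched with the Python standard library alone. This is a minimal sketch, not a definitive implementation: the function names `sample_size_per_group` and `srm_p_value` are hypothetical, the sample size uses the normal approximation for two equal-sized groups with the MDE given as an absolute difference, and the SRM check assumes a two-arm test with a known intended split.

```python
import math
from statistics import NormalDist


def sample_size_per_group(baseline_rate, mde_abs, alpha=0.05, power=0.80):
    """Approximate users needed per group to detect an absolute lift of mde_abs."""
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for the target power
    variance = 2 * baseline_rate * (1 - baseline_rate)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / mde_abs ** 2)


def srm_p_value(n_control, n_variant, expected_control_share=0.5):
    """Chi-squared goodness-of-fit p-value (1 degree of freedom) for the split."""
    total = n_control + n_variant
    exp_control = total * expected_control_share
    exp_variant = total - exp_control
    chi2 = ((n_control - exp_control) ** 2 / exp_control
            + (n_variant - exp_variant) ** 2 / exp_variant)
    # With 1 degree of freedom, chi2 is the square of a standard normal,
    # so its tail probability follows from the normal CDF.
    return 2 * (1 - NormalDist().cdf(math.sqrt(chi2)))


if __name__ == "__main__":
    # Illustrative numbers only: 10% baseline, 1 percentage point MDE.
    print(sample_size_per_group(0.10, 0.01))
    print(srm_p_value(5000, 5400))
```

A very small SRM p-value (a threshold such as 0.001 is a common convention, assumed here rather than taken from the instructions above) suggests the assignment mechanism is broken and the results should not be trusted until the cause is found.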
  3. Calculate statistical significance:
    • Conversion rate for control and variant
    • Relative lift: (variant - control) / control × 100
    • p-value: Using a two-tailed z-test or chi-squared test
    • Confidence interval: 95% CI for the difference
    • Statistical significance: Is p < 0.05?
    • Practical significance: Is the lift meaningful for the business?
    If the user provides raw data, generate and run a Python script to calculate these.
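
The quantities in step 3 could be computed along these lines. This is a sketch under stated assumptions: the function name `analyze_conversion` and its return keys are hypothetical, the p-value comes from a two-tailed two-proportion z-test with a pooled standard error, and the confidence interval for the absolute difference uses the unpooled (Wald) standard error. It assumes both groups have at least some conversions and some non-conversions, so neither standard error is zero.

```python
import math
from statistics import NormalDist


def analyze_conversion(conv_c, n_c, conv_v, n_v, alpha=0.05):
    """Two-proportion z-test for control (c) versus variant (v)."""
    p_c = conv_c / n_c
    p_v = conv_v / n_v
    diff = p_v - p_c

    # Pooled standard error for the hypothesis test (null: equal rates).
    p_pool = (conv_c + conv_v) / (n_c + n_v)
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_c + 1 / n_v))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error for the confidence interval of the difference.
    se_unpooled = math.sqrt(p_c * (1 - p_c) / n_c + p_v * (1 - p_v) / n_v)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)

    return {
        "control_rate": p_c,
        "variant_rate": p_v,
        "relative_lift_pct": diff / p_c * 100,
        "p_value": p_value,
        "ci_low": diff - z_crit * se_unpooled,
        "ci_high": diff + z_crit * se_unpooled,
        "significant": p_value < alpha,
    }


if __name__ == "__main__":
    # Illustrative counts only, not real experiment data.
    for key, value in analyze_conversion(1000, 10000, 1100, 10000).items():
        print(f"{key}: {value}")
```

Statistical significance is only half of the check: whether the interval's lower bound is still a lift worth shipping is the practical-significance question, and that depends on the business context.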
  4. Check guardrail metrics:
    • Did any guardrail metrics (revenue, engagement, page load time) degrade?
    • A winning primary metric with degraded guardrails may not be a true win
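
One possible way to flag a degraded guardrail is to look at the confidence interval for the variant-minus-control difference on that metric. The sketch below assumes a continuous metric summarized by mean, standard deviation, and sample size per group, and uses a normal approximation. The function name `guardrail_degraded` and the `higher_is_better` flag are hypothetical.

```python
import math
from statistics import NormalDist


def guardrail_degraded(mean_c, sd_c, n_c, mean_v, sd_v, n_v,
                       higher_is_better=True, alpha=0.05):
    """Return (degraded, ci), where ci is the CI for variant minus control."""
    diff = mean_v - mean_c
    se = math.sqrt(sd_c ** 2 / n_c + sd_v ** 2 / n_v)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    ci = (diff - z_crit * se, diff + z_crit * se)
    if higher_is_better:
        degraded = ci[1] < 0   # whole interval lies below zero
    else:
        degraded = ci[0] > 0   # whole interval lies above zero
    return degraded, ci


if __name__ == "__main__":
    # Illustrative numbers only: page load time in seconds, lower is better.
    print(guardrail_degraded(1.20, 0.5, 10000, 1.25, 0.5, 10000,
                             higher_is_better=False))
```

A result of `False` means only that no degradation was detected. If the interval is wide, the guardrail is inconclusive rather than safe.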
  5. Interpret results:
    | Outcome | Recommendation |
    |---|---|
    | Significant positive lift, no guardrail issues | Ship it: roll out to 100% |
    | Significant positive lift, guardrail concerns | Investigate: understand trade-offs before shipping |
    | Not significant, positive trend | Extend the test: more data is needed to detect an effect of this size |
    | Not significant, flat | Stop the test: no meaningful difference detected |
    | Significant negative lift | Don't ship: revert to control, analyze why |
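
The interpretation rules in step 5 can be expressed as a small decision function. This is an illustrative sketch only: the name `recommend` and the `flat_band_pct` threshold separating a "positive trend" from a "flat" result are assumptions, since the rules do not define where that line falls. A non-significant negative trend is not covered by the rules either, and this sketch treats it as "Stop".

```python
def recommend(p_value, relative_lift_pct, guardrails_ok,
              alpha=0.05, flat_band_pct=0.5):
    """Map test results onto a Ship / Investigate / Extend / Stop / Don't ship call."""
    significant = p_value < alpha
    if significant and relative_lift_pct > 0:
        return "Ship" if guardrails_ok else "Investigate"
    if significant and relative_lift_pct < 0:
        return "Don't ship"
    # Not significant: positive trend versus flat (flat_band_pct is an assumed cutoff).
    if relative_lift_pct > flat_band_pct:
        return "Extend"
    return "Stop"


if __name__ == "__main__":
    print(recommend(p_value=0.02, relative_lift_pct=10.0, guardrails_ok=True))
```

A function like this is a starting point for the recommendation, and the final call should still weigh practical significance and the trade-offs behind any guardrail concern.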
  6. Provide the analysis summary:
    ## A/B Test Results: [Test Name]
    
    **Hypothesis**: [What we expected]
    **Duration**: [X days] | **Sample**: [N control / M variant]
    
    | Metric | Control | Variant | Lift | p-value | Significant? |
    |---|---|---|---|---|---|
    | [Primary] | X% | Y% | +Z% | 0.0X | Yes/No |
    | [Guardrail] | ... | ... | ... | ... | ... |
    
    **Recommendation**: [Ship / Extend / Stop / Investigate]
    **Reasoning**: [Why]
    **Next steps**: [What to do]
Think step by step. Save as markdown. Generate Python scripts for calculations if raw data is provided.

