ab-test-setup

A/B Test Setup Skill

Overview

Production-ready A/B testing toolkit for calculating sample sizes, designing rigorous test plans, and analyzing results with statistical significance testing. Designed for growth teams, product managers, and marketers who need to make data-driven decisions from controlled experiments.

Quick Start

```bash
# Calculate required sample sizes for a test
python scripts/sample_size_calculator.py --baseline 0.05 --mde 0.10 --power 0.80

# Design a complete A/B test plan
python scripts/test_designer.py test_config.json

# Analyze A/B test results
python scripts/results_analyzer.py results.json
```

Tools Overview

| Tool | Purpose | Input | Output |
|------|---------|-------|--------|
| `sample_size_calculator.py` | Sample size calculation | Baseline rate, MDE (minimum detectable effect), power | Required samples + duration |
| `test_designer.py` | Test plan design | JSON test config | Complete test plan document |
| `results_analyzer.py` | Results analysis | JSON with test results | Statistical analysis + recommendation |

Workflows

Workflow 1: New A/B Test Setup

  1. Define hypothesis and success metric
  2. Run `sample_size_calculator.py` with baseline conversion and minimum detectable effect
  3. Create test configuration JSON (see Common Patterns)
  4. Run `test_designer.py` to generate complete test plan
  5. Share plan with stakeholders for alignment before launch

Workflow 2: Test Results Analysis

  1. Collect test results into JSON format
  2. Run `results_analyzer.py` to get statistical significance
  3. Review confidence interval, p-value, and effect size
  4. Check for segment-level effects if overall result is inconclusive
  5. Make ship/no-ship decision based on analysis
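
Steps 2 and 3 reduce to a two-proportion z-test. The sketch below shows one common way to compute it: a pooled standard error for the test statistic and an unpooled one for the confidence interval. `results_analyzer.py` may use a different method, and `two_proportion_z_test` is a hypothetical name.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Two-sided z-test comparing the conversion rates of control (a) and treatment (b)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    # Under H0 (equal rates) both groups share a single pooled rate.
    p_pool = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(p_pool * (1 - p_pool) * (1 / n_a + 1 / n_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    # The interval for the absolute difference uses the unpooled standard error.
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    return {
        "relative_lift": diff / p_a,
        "z": z,
        "p_value": p_value,
        "ci_abs_diff": (diff - z_crit * se, diff + z_crit * se),
        "significant": p_value < alpha,
    }


# Counts from the Test Results JSON pattern in Common Patterns.
result = two_proportion_z_test(563, 12500, 625, 12500)
print(result)
```

For those counts the relative lift is about 11%, but the p-value is roughly 0.065 and the 95% interval for the difference still includes zero. At a significance level of 0.05, that example is inconclusive rather than a clear win. This is the situation where step 4 applies.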

Workflow 3: Experimentation Program Review

  1. Compile results from multiple past tests
  2. Run `results_analyzer.py --batch` on all results
  3. Review win rate, average effect size, and velocity
  4. Identify patterns in winning vs losing tests
  5. Optimize test pipeline based on learnings
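
The step 3 metrics can be approximated in a few lines once each past test has been summarized. This sketch assumes a hypothetical `ExperimentSummary` record per test. It counts a test as a "win" when it is both significant and positive, and it interprets "velocity" as tests concluded per 30 days over a review window you supply. `results_analyzer.py --batch` may define these metrics differently, and the sample records are illustrative inputs, not real results.

```python
from dataclasses import dataclass
from statistics import mean


@dataclass
class ExperimentSummary:
    """Hypothetical per-test record; field names are illustrative."""
    name: str
    relative_lift: float  # (treatment rate - control rate) / control rate
    p_value: float


def program_metrics(tests, window_days, alpha=0.05):
    """Win rate, average lift, and velocity for a batch of concluded tests."""
    # A "win" here means statistically significant AND in the desired direction.
    winners = [t for t in tests if t.p_value < alpha and t.relative_lift > 0]
    return {
        "win_rate": len(winners) / len(tests),
        "avg_lift_all": mean(t.relative_lift for t in tests),
        "avg_lift_winners": mean(t.relative_lift for t in winners) if winners else 0.0,
        # Velocity read as tests concluded per 30 days over the review window.
        "tests_per_30_days": len(tests) / window_days * 30,
    }


# Illustrative inputs only.
past = [
    ExperimentSummary("cta_color", 0.08, 0.01),
    ExperimentSummary("pricing_copy", -0.02, 0.40),
    ExperimentSummary("hero_image", 0.03, 0.20),
    ExperimentSummary("checkout_steps", 0.12, 0.003),
]
m = program_metrics(past, window_days=90)
print(m)
```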

Reference Documentation

See `references/ab-testing-guide.md` for comprehensive methodology covering:
  • Statistical foundations (z-tests, confidence intervals)
  • Sample size theory and trade-offs
  • Common experimentation pitfalls
  • Multi-variant and sequential testing
  • Bayesian vs frequentist approaches
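
The following sketch makes the Bayesian side of that comparison concrete. It uses a Beta-Binomial model and Monte Carlo sampling to estimate the probability that treatment beats control. The uniform Beta(1, 1) prior, the draw count, and the function name are assumptions for illustration, not the method prescribed in `references/ab-testing-guide.md`.

```python
import random


def prob_treatment_beats_control(conv_c, n_c, conv_t, n_t, draws=50_000, seed=0):
    """Monte Carlo estimate of P(rate_t > rate_c) under uniform Beta(1, 1) priors."""
    rng = random.Random(seed)  # seeded so repeated runs give the same estimate
    wins = 0
    for _ in range(draws):
        # Beta(1 + conversions, 1 + non-conversions) is each rate's posterior.
        rate_c = rng.betavariate(1 + conv_c, 1 + n_c - conv_c)
        rate_t = rng.betavariate(1 + conv_t, 1 + n_t - conv_t)
        wins += rate_t > rate_c
    return wins / draws


# Counts from the Test Results JSON pattern in Common Patterns.
prob = prob_treatment_beats_control(563, 12500, 625, 12500)
print(f"P(treatment > control) ~ {prob:.3f}")
```

For those counts, the estimate comes out at roughly 0.97. A p-value and this posterior probability answer different questions, so a report should say which one it is quoting.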

Common Patterns

Pattern: Test Configuration JSON

```json
{
  "test_name": "Homepage CTA Button Color",
  "hypothesis": "Changing the CTA button from blue to green will increase click-through rate",
  "metric_primary": "cta_click_rate",
  "metric_secondary": ["signup_rate", "bounce_rate"],
  "baseline_rate": 0.045,
  "minimum_detectable_effect": 0.10,
  "significance_level": 0.05,
  "power": 0.80,
  "variants": [
    {"name": "control", "description": "Current blue CTA button"},
    {"name": "treatment", "description": "Green CTA button"}
  ],
  "daily_traffic": 5000,
  "allocation": {"control": 0.50, "treatment": 0.50}
}
```
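
Before running `test_designer.py`, it can help to sanity-check the config. The sketch below applies a few plausible checks to the fields in the example. Which keys are required, and the specific rules, are assumptions rather than the script's actual validation logic. `validate_config` is a hypothetical helper.

```python
import math

# Keys this sketch treats as mandatory (an assumption; metric_secondary is optional here).
REQUIRED_KEYS = {
    "test_name", "hypothesis", "metric_primary", "baseline_rate",
    "minimum_detectable_effect", "significance_level", "power",
    "variants", "daily_traffic", "allocation",
}


def validate_config(config: dict) -> list:
    """Return a list of human-readable problems; an empty list means the config passed."""
    missing = REQUIRED_KEYS - set(config)
    if missing:
        return [f"missing keys: {sorted(missing)}"]
    errors = []
    # Rates and probabilities must lie strictly between 0 and 1.
    for key in ("baseline_rate", "significance_level", "power"):
        if not 0 < config[key] < 1:
            errors.append(f"{key} must be between 0 and 1")
    if config["minimum_detectable_effect"] <= 0:
        errors.append("minimum_detectable_effect must be positive")
    if config["daily_traffic"] <= 0:
        errors.append("daily_traffic must be positive")
    # Every variant needs an allocation share, and the shares must cover all traffic.
    names = {v["name"] for v in config["variants"]}
    if names != set(config["allocation"]):
        errors.append("allocation keys must match variant names")
    if not math.isclose(sum(config["allocation"].values()), 1.0):
        errors.append("allocation shares must sum to 1")
    return errors


# The Test Configuration JSON example, as a Python dict.
example = {
    "test_name": "Homepage CTA Button Color",
    "hypothesis": "Changing the CTA button from blue to green will increase click-through rate",
    "metric_primary": "cta_click_rate",
    "metric_secondary": ["signup_rate", "bounce_rate"],
    "baseline_rate": 0.045,
    "minimum_detectable_effect": 0.10,
    "significance_level": 0.05,
    "power": 0.80,
    "variants": [
        {"name": "control", "description": "Current blue CTA button"},
        {"name": "treatment", "description": "Green CTA button"},
    ],
    "daily_traffic": 5000,
    "allocation": {"control": 0.50, "treatment": 0.50},
}
print(validate_config(example) or "config OK")
```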

Pattern: Test Results JSON

```json
{
  "test_name": "Homepage CTA Button Color",
  "variants": {
    "control": {"visitors": 12500, "conversions": 563},
    "treatment": {"visitors": 12500, "conversions": 625}
  },
  "metric": "cta_click_rate",
  "significance_level": 0.05
}
```
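
A results file in this shape is easy to consume directly. This minimal sketch parses it and reports each variant's conversion rate and relative lift versus control. It assumes the baseline variant is named `control`, and `summarize` is a hypothetical helper.

```python
import json

RESULTS_JSON = """
{
  "test_name": "Homepage CTA Button Color",
  "variants": {
    "control": {"visitors": 12500, "conversions": 563},
    "treatment": {"visitors": 12500, "conversions": 625}
  },
  "metric": "cta_click_rate",
  "significance_level": 0.05
}
"""


def summarize(results: dict, control: str = "control") -> dict:
    """Conversion rate per variant plus relative lift against the control variant."""
    rates = {name: v["conversions"] / v["visitors"] for name, v in results["variants"].items()}
    base = rates[control]
    return {name: {"rate": r, "relative_lift": (r - base) / base} for name, r in rates.items()}


summary = summarize(json.loads(RESULTS_JSON))
for name, s in summary.items():
    print(f"{name}: rate={s['rate']:.2%}, lift vs control={s['relative_lift']:+.1%}")
```

For the example above, treatment converts at 5.00% vs 4.50% for control, a relative lift of about +11.0%. Whether that lift is statistically significant still requires a test such as the one `results_analyzer.py` performs.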

Quick Reference: Common Effect Sizes

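| Context | Small Effect | Medium Effect | Large Effect |
|---------|--------------|---------------|--------------|
| Conversion Rate | 2-5% relative | 5-15% relative | > 15% relative |
| Revenue per User | 1-3% | 3-8% | > 8% |
| Engagement Rate | 3-5% | 5-10% | > 10% |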
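
To label an observed lift with the table's categories, a simple lookup is enough. This sketch makes several assumptions. It treats every row as a relative lift, although the table explicitly marks only the conversion-rate row as relative. It places a value exactly on the Small/Medium boundary (such as 5% for conversion rate) in Medium. It treats the "> X%" Large column as strict. The context keys are hypothetical.

```python
# Lower bounds of the Small, Medium, and Large buckets from the Quick Reference table.
THRESHOLDS = {
    "conversion_rate": (0.02, 0.05, 0.15),
    "revenue_per_user": (0.01, 0.03, 0.08),
    "engagement_rate": (0.03, 0.05, 0.10),
}


def classify_effect(context: str, relative_lift: float) -> str:
    """Map an observed relative lift (either sign) to a size bucket for the given context."""
    small, medium, large = THRESHOLDS[context]
    size = abs(relative_lift)
    if size > large:      # the table's "> X%" column is treated as strict
        return "large"
    if size >= medium:
        return "medium"
    if size >= small:
        return "small"
    return "below small"


print(classify_effect("conversion_rate", 0.11))   # e.g. an 11% relative lift
print(classify_effect("revenue_per_user", 0.02))
```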