ab-test-setup
A/B Test Setup Skill
Overview
Production-ready A/B testing toolkit for calculating sample sizes, designing rigorous test plans, and analyzing results with statistical significance testing. Designed for growth teams, product managers, and marketers who need to make data-driven decisions from controlled experiments.
Quick Start
```bash
# Calculate required sample sizes for a test
python scripts/sample_size_calculator.py --baseline 0.05 --mde 0.10 --power 0.80

# Design a complete A/B test plan
python scripts/test_designer.py test_config.json

# Analyze A/B test results
python scripts/results_analyzer.py results.json
```
Tools Overview
| Tool | Purpose | Input | Output |
|---|---|---|---|
| `sample_size_calculator.py` | Sample size calculation | Baseline rate, MDE, power | Required samples + duration |
| `test_designer.py` | Test plan design | JSON test config | Complete test plan document |
| `results_analyzer.py` | Results analysis | JSON with test results | Statistical analysis + recommendation |
Workflows
Workflow 1: New A/B Test Setup
- Define hypothesis and success metric
- Run `sample_size_calculator.py` with baseline conversion and minimum detectable effect
- Create test configuration JSON (see Common Patterns)
- Run `test_designer.py` to generate complete test plan
- Share plan with stakeholders for alignment before launch
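The sample-size step can be illustrated with the standard two-proportion formula. This is a minimal sketch, not the actual logic of `sample_size_calculator.py`, which is not shown here. It assumes the MDE is a relative lift over the baseline, a two-sided test, and an even split of traffic across variants; the function names `sample_size_per_variant` and `estimated_duration_days` are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Visitors needed per variant for a two-sided two-proportion z-test.

    Assumption: relative_mde is a relative lift (0.10 means baseline * 1.10).
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def estimated_duration_days(n_per_variant, daily_traffic, n_variants=2):
    """Days needed to reach the sample size, assuming all traffic enters the test."""
    return math.ceil(n_per_variant * n_variants / daily_traffic)


if __name__ == "__main__":
    n = sample_size_per_variant(baseline=0.05, relative_mde=0.10, power=0.80)
    print(n, estimated_duration_days(n, daily_traffic=5000))
```

Because the required sample scales with the inverse square of the absolute difference, halving the MDE roughly quadruples the traffic needed, which is usually the main trade-off to discuss with stakeholders before launch.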
Workflow 2: Test Results Analysis
- Collect test results into JSON format
- Run `results_analyzer.py` to get statistical significance
- Review confidence interval, p-value, and effect size
- Check for segment-level effects if overall result is inconclusive
- Make ship/no-ship decision based on analysis
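To make the quantities in this workflow concrete, here is a minimal sketch of a two-sided two-proportion z-test. It is an assumption about the kind of analysis `results_analyzer.py` performs, not its actual code; the function name `two_proportion_z_test` and the returned keys are hypothetical. The inputs mirror the `visitors` and `conversions` fields of the Test Results JSON pattern.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(visitors_a, conversions_a, visitors_b, conversions_b, alpha=0.05):
    """Compare variant B (treatment) against variant A (control)."""
    p_a = conversions_a / visitors_a
    p_b = conversions_b / visitors_b
    diff = p_b - p_a

    # Pooled standard error is used for the hypothesis test (H0: p_a == p_b).
    pooled = (conversions_a + conversions_b) / (visitors_a + visitors_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / visitors_a + 1 / visitors_b))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error is used for the confidence interval on the difference.
    se_unpooled = math.sqrt(p_a * (1 - p_a) / visitors_a + p_b * (1 - p_b) / visitors_b)
    margin = NormalDist().inv_cdf(1 - alpha / 2) * se_unpooled

    return {
        "absolute_diff": diff,
        "relative_lift": diff / p_a,
        "z": z,
        "p_value": p_value,
        "ci": (diff - margin, diff + margin),
        "significant": p_value < alpha,
    }


if __name__ == "__main__":
    # Figures taken from the Test Results JSON pattern in this document.
    print(two_proportion_z_test(12500, 563, 12500, 625))
```

With the sample figures from the Test Results JSON pattern (563 vs 625 conversions out of 12,500 visitors each), this sketch gives a z-statistic of roughly 1.84 and a p-value of roughly 0.065, so the result would be treated as inconclusive at a 0.05 significance level, which is the case where the segment-level check applies.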
Workflow 3: Experimentation Program Review
- Compile results from multiple past tests
- Run `results_analyzer.py --batch` on all results
- Review win rate, average effect size, and velocity
- Identify patterns in winning vs losing tests
- Optimize test pipeline based on learnings
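The program-level metrics can be sketched as a small aggregation. The output format of `results_analyzer.py --batch` is not documented here, so the record fields below (`relative_lift`, `significant`, `end_date`) and the definitions are assumptions: a win is a significant positive lift, and velocity is the number of tests completed per 30 days.

```python
from datetime import date


def summarize_program(tests):
    """Summarize past tests.

    Each test is a dict with hypothetical keys:
    'relative_lift' (float), 'significant' (bool), 'end_date' (datetime.date).
    """
    if not tests:
        raise ValueError("at least one test result is required")

    wins = [t for t in tests if t["significant"] and t["relative_lift"] > 0]
    win_rate = len(wins) / len(tests)
    avg_winning_lift = (
        sum(t["relative_lift"] for t in wins) / len(wins) if wins else 0.0
    )

    # Velocity: tests per 30 days between the first and last completed test.
    end_dates = [t["end_date"] for t in tests]
    span_days = max((max(end_dates) - min(end_dates)).days, 1)
    tests_per_30_days = len(tests) / (span_days / 30)

    return {
        "win_rate": win_rate,
        "avg_winning_lift": avg_winning_lift,
        "tests_per_30_days": tests_per_30_days,
    }
```

Averaging only the winning lifts tends to overstate typical impact, so it is worth reading that figure next to the win rate rather than on its own.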
Reference Documentation
See `references/ab-testing-guide.md` for comprehensive methodology covering:
- Statistical foundations (z-tests, confidence intervals)
- Sample size theory and trade-offs
- Common experimentation pitfalls
- Multi-variant and sequential testing
- Bayesian vs frequentist approaches
Common Patterns
Pattern: Test Configuration JSON
```json
{
  "test_name": "Homepage CTA Button Color",
  "hypothesis": "Changing the CTA button from blue to green will increase click-through rate",
  "metric_primary": "cta_click_rate",
  "metric_secondary": ["signup_rate", "bounce_rate"],
  "baseline_rate": 0.045,
  "minimum_detectable_effect": 0.10,
  "significance_level": 0.05,
  "power": 0.80,
  "variants": [
    {"name": "control", "description": "Current blue CTA button"},
    {"name": "treatment", "description": "Green CTA button"}
  ],
  "daily_traffic": 5000,
  "allocation": {"control": 0.50, "treatment": 0.50}
}
```

Pattern: Test Results JSON
```json
{
  "test_name": "Homepage CTA Button Color",
  "variants": {
    "control": {"visitors": 12500, "conversions": 563},
    "treatment": {"visitors": 12500, "conversions": 625}
  },
  "metric": "cta_click_rate",
  "significance_level": 0.05
}
```

Quick Reference: Common Effect Sizes
| Context | Small Effect | Medium Effect | Large Effect |
|---|---|---|---|
| Conversion Rate | 2-5% relative | 5-15% relative | > 15% relative |
| Revenue per User | 1-3% | 3-8% | > 8% |
| Engagement Rate | 3-5% | 5-10% | > 10% |
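
As a quick way to apply the conversion-rate row of this table, the sketch below computes a relative lift and maps it to a bucket. The helper names are hypothetical, and two details are assumptions because the table does not specify them: lifts under 2% are labeled "below small", and a value on a shared boundary (5%) goes to the larger bucket.

```python
def relative_lift(control_rate, treatment_rate):
    """Relative change of the treatment rate over the control rate."""
    return (treatment_rate - control_rate) / control_rate


def classify_conversion_lift(lift):
    """Bucket a relative conversion-rate lift using the table's thresholds."""
    magnitude = abs(lift)  # direction is ignored; only the size is classified
    if magnitude > 0.15:
        return "large"
    if magnitude >= 0.05:
        return "medium"
    if magnitude >= 0.02:
        return "small"
    return "below small"  # assumption: the table has no bucket under 2%


if __name__ == "__main__":
    # Rates derived from the Test Results JSON pattern: 563/12500 and 625/12500.
    lift = relative_lift(563 / 12500, 625 / 12500)
    print(round(lift, 4), classify_conversion_lift(lift))
```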