ab-test-setup

A/B Test Setup Skill

Overview

Production-ready A/B testing toolkit for calculating sample sizes, designing rigorous test plans, and analyzing results with statistical significance testing. Designed for growth teams, product managers, and marketers who need to make data-driven decisions from controlled experiments.

Quick Start

bash

# Calculate required sample sizes for a test
python scripts/sample_size_calculator.py --baseline 0.05 --mde 0.10 --power 0.80

# Design a complete A/B test plan
python scripts/test_designer.py test_config.json

# Analyze A/B test results
python scripts/results_analyzer.py results.json

Tools Overview

| Tool | Purpose | Input | Output |
|------|---------|-------|--------|
| `sample_size_calculator.py` | Sample size calculation | Baseline rate, minimum detectable effect (MDE), power | Required samples + duration |
| `test_designer.py` | Test plan design | JSON test config | Complete test plan document |
| `results_analyzer.py` | Results analysis | JSON with test results | Statistical analysis + recommendation |

Workflows

Workflow 1: New A/B Test Setup

1. Define hypothesis and success metric
2. Run `sample_size_calculator.py` with baseline conversion and minimum detectable effect
3. Create test configuration JSON (see Common Patterns)
4. Run `test_designer.py` to generate complete test plan
5. Share plan with stakeholders for alignment before launch
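
To make step 2 concrete, here is a minimal sketch of the kind of calculation a sample size tool typically performs for a two-sided, two-proportion z-test. It is not the implementation of `sample_size_calculator.py`. The function names are hypothetical, and the sketch assumes that the MDE is a relative lift (so `0.10` means a 10% relative improvement over the baseline) and that traffic is split evenly across variants.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_mde, alpha=0.05, power=0.80):
    """Approximate visitors needed per variant (two-sided two-proportion z-test).

    Assumption: relative_mde is a relative lift over the baseline rate.
    """
    p1 = baseline
    p2 = baseline * (1 + relative_mde)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # critical value for significance
    z_beta = NormalDist().inv_cdf(power)           # critical value for power
    p_bar = (p1 + p2) / 2
    numerator = (
        z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
        + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))
    ) ** 2
    return math.ceil(numerator / (p2 - p1) ** 2)


def duration_days(n_per_variant, daily_traffic, n_variants=2):
    """Days needed to reach the sample size if all daily traffic enters the test."""
    return math.ceil(n_per_variant * n_variants / daily_traffic)


if __name__ == "__main__":
    n = sample_size_per_variant(baseline=0.05, relative_mde=0.10, power=0.80)
    print("visitors per variant:", n)
    print("days at 5000 visitors/day:", duration_days(n, daily_traffic=5000))
```

Because the required sample grows with the inverse square of the absolute difference, halving the MDE roughly quadruples the sample size, which is worth checking before committing to a test duration.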

Workflow 2: Test Results Analysis

1. Collect test results into JSON format
2. Run `results_analyzer.py` to get statistical significance
3. Review confidence interval, p-value, and effect size
4. Check for segment-level effects if overall result is inconclusive
5. Make ship/no-ship decision based on analysis
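
The quantities reviewed in step 3 can be sketched with a standard two-proportion z-test. This is an illustration under stated assumptions, not the code inside `results_analyzer.py`: the function name and the returned keys are hypothetical, the p-value uses a pooled standard error, and the confidence interval for the absolute difference uses an unpooled one.

```python
import math
from statistics import NormalDist


def two_proportion_z_test(control_visitors, control_conversions,
                          treatment_visitors, treatment_conversions,
                          alpha=0.05):
    """Two-sided z-test for the difference between two conversion rates."""
    p_c = control_conversions / control_visitors
    p_t = treatment_conversions / treatment_visitors
    diff = p_t - p_c

    # Pooled standard error, used for the test statistic under H0: p_c == p_t
    pooled = (control_conversions + treatment_conversions) / (
        control_visitors + treatment_visitors)
    se_pooled = math.sqrt(
        pooled * (1 - pooled) * (1 / control_visitors + 1 / treatment_visitors))
    z = diff / se_pooled
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))

    # Unpooled standard error, used for the confidence interval of the difference
    se_unpooled = math.sqrt(
        p_c * (1 - p_c) / control_visitors
        + p_t * (1 - p_t) / treatment_visitors)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)

    return {
        "control_rate": p_c,
        "treatment_rate": p_t,
        "relative_lift": diff / p_c,
        "z": z,
        "p_value": p_value,
        "ci_absolute_diff": (diff - z_crit * se_unpooled,
                             diff + z_crit * se_unpooled),
        "significant": p_value < alpha,
    }


if __name__ == "__main__":
    # Numbers taken from the Test Results JSON pattern in this document
    print(two_proportion_z_test(12500, 563, 12500, 625))
```

With the numbers from the Test Results JSON pattern, this sketch gives a relative lift of about 11% but a p-value a little above 0.05, so the interval for the difference still includes zero. That is the kind of inconclusive result step 4 refers to.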

Workflow 3: Experimentation Program Review

1. Compile results from multiple past tests
2. Run `results_analyzer.py --batch` on all results
3. Review win rate, average effect size, and velocity
4. Identify patterns in winning vs losing tests
5. Optimize test pipeline based on learnings
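
As a rough illustration of the program-level metrics in step 3, the sketch below summarizes a list of results shaped like the Test Results JSON pattern. It does not reproduce the behavior of `--batch`. The function names are hypothetical, a "win" is assumed to mean a statistically significant positive lift, and velocity is left out because the results format shown in this document carries no dates.

```python
import math
from statistics import NormalDist, mean


def relative_lift_and_p_value(result):
    """Relative lift and two-sided p-value for one test (pooled z-test).

    Assumes both variants have visitors and the control has conversions.
    """
    control = result["variants"]["control"]
    treatment = result["variants"]["treatment"]
    p_c = control["conversions"] / control["visitors"]
    p_t = treatment["conversions"] / treatment["visitors"]
    pooled = (control["conversions"] + treatment["conversions"]) / (
        control["visitors"] + treatment["visitors"])
    se = math.sqrt(pooled * (1 - pooled)
                   * (1 / control["visitors"] + 1 / treatment["visitors"]))
    z = (p_t - p_c) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return (p_t - p_c) / p_c, p_value


def summarize_program(results):
    """Win rate and average relative lift across past tests."""
    lifts = []
    wins = 0
    for result in results:
        lift, p_value = relative_lift_and_p_value(result)
        alpha = result.get("significance_level", 0.05)
        lifts.append(lift)
        if lift > 0 and p_value < alpha:  # assumption: win = significant positive lift
            wins += 1
    return {
        "tests": len(results),
        "win_rate": wins / len(results),
        "average_relative_lift": mean(lifts),
    }
```

Averaging lifts over all tests, not only the winners, keeps the summary from overstating what the program typically delivers.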

Reference Documentation

See `references/ab-testing-guide.md` for comprehensive methodology covering:

- Statistical foundations (z-tests, confidence intervals)
- Sample size theory and trade-offs
- Common experimentation pitfalls
- Multi-variant and sequential testing
- Bayesian vs frequentist approaches

Common Patterns

Pattern: Test Configuration JSON

json

{
  "test_name": "Homepage CTA Button Color",
  "hypothesis": "Changing the CTA button from blue to green will increase click-through rate",
  "metric_primary": "cta_click_rate",
  "metric_secondary": ["signup_rate", "bounce_rate"],
  "baseline_rate": 0.045,
  "minimum_detectable_effect": 0.10,
  "significance_level": 0.05,
  "power": 0.80,
  "variants": [
    {"name": "control", "description": "Current blue CTA button"},
    {"name": "treatment", "description": "Green CTA button"}
  ],
  "daily_traffic": 5000,
  "allocation": {"control": 0.50, "treatment": 0.50}
}
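
Before handing a configuration like this to a design tool, it can help to sanity-check it. The sketch below is one hypothetical way to do that. The checks are assumptions about what a valid configuration looks like, inferred from the fields shown above, and are not documented behavior of `test_designer.py`.

```python
import json

REQUIRED_KEYS = [
    "test_name", "hypothesis", "metric_primary", "baseline_rate",
    "minimum_detectable_effect", "significance_level", "power",
    "variants", "daily_traffic", "allocation",
]


def validate_test_config(config):
    """Return a list of human-readable problems; an empty list means no issues found."""
    problems = [f"missing key: {key}" for key in REQUIRED_KEYS if key not in config]
    if problems:
        return problems

    # Rates and probabilities must lie strictly between 0 and 1
    for key in ("baseline_rate", "significance_level", "power"):
        if not 0 < config[key] < 1:
            problems.append(f"{key} must be between 0 and 1")
    if config["minimum_detectable_effect"] <= 0:
        problems.append("minimum_detectable_effect must be positive")
    if config["daily_traffic"] <= 0:
        problems.append("daily_traffic must be positive")

    # Every variant needs an allocation, and allocations must sum to 1
    variant_names = {variant["name"] for variant in config["variants"]}
    if set(config["allocation"]) != variant_names:
        problems.append("allocation keys must match variant names")
    if abs(sum(config["allocation"].values()) - 1.0) > 1e-9:
        problems.append("allocation must sum to 1.0")
    return problems


if __name__ == "__main__":
    # test_config.json is the file name used in Quick Start
    with open("test_config.json") as handle:
        for problem in validate_test_config(json.load(handle)):
            print("config problem:", problem)
```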

Pattern: Test Results JSON

json

{
  "test_name": "Homepage CTA Button Color",
  "variants": {
    "control": {"visitors": 12500, "conversions": 563},
    "treatment": {"visitors": 12500, "conversions": 625}
  },
  "metric": "cta_click_rate",
  "significance_level": 0.05
}

Quick Reference: Common Effect Sizes

| Context | Small Effect | Medium Effect | Large Effect |
|---------|--------------|---------------|--------------|
| Conversion Rate | 2-5% relative | 5-15% relative | > 15% relative |
| Revenue per User | 1-3% | 3-8% | > 8% |
| Engagement Rate | 3-5% | 5-10% | > 10% |
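
The table can be turned into a small lookup for labeling an observed lift. This is only a sketch: the function name and context keys are hypothetical, lifts are expressed as fractions (`0.11` for 11%), values exactly on a shared boundary are assumed to fall into the larger bucket, and anything under the smallest threshold is labeled "below small".

```python
# (small_min, medium_min, large_min) thresholds taken from the table above
EFFECT_THRESHOLDS = {
    "conversion_rate": (0.02, 0.05, 0.15),
    "revenue_per_user": (0.01, 0.03, 0.08),
    "engagement_rate": (0.03, 0.05, 0.10),
}


def classify_effect(context, lift):
    """Label the magnitude of a lift as below small, small, medium, or large."""
    small_min, medium_min, large_min = EFFECT_THRESHOLDS[context]
    magnitude = abs(lift)  # a negative lift is classified by its size
    if magnitude > large_min:
        return "large"
    if magnitude >= medium_min:
        return "medium"
    if magnitude >= small_min:
        return "small"
    return "below small"


if __name__ == "__main__":
    print(classify_effect("conversion_rate", 0.11))
```

A label like this is a planning aid for choosing a realistic MDE; it says nothing about whether an observed lift is statistically significant.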
