ab-test-setup
This skill should be used when the user asks to "set up an A/B test", "calculate sample size", "design an experiment", "analyze A/B test results", "check statistical significance", "determine test duration", or "evaluate conversion rate experiments".
Source: borghei/claude-skills

Install via NPX:

```bash
npx skill4agent add borghei/claude-skills ab-test-setup
```

A/B Test Setup Skill
Overview
Production-ready A/B testing toolkit for calculating sample sizes, designing rigorous test plans, and analyzing results with statistical significance testing. Designed for growth teams, product managers, and marketers who need to make data-driven decisions from controlled experiments.
Quick Start
```bash
# Calculate the required sample size for a test
python scripts/sample_size_calculator.py --baseline 0.05 --mde 0.10 --power 0.80

# Design a complete A/B test plan
python scripts/test_designer.py test_config.json

# Analyze A/B test results
python scripts/results_analyzer.py results.json
```

Tools Overview
| Tool | Purpose | Input | Output |
|---|---|---|---|
| `sample_size_calculator.py` | Sample size calculation | Baseline rate, MDE, power | Required samples + duration |
| `test_designer.py` | Test plan design | JSON test config | Complete test plan document |
| `results_analyzer.py` | Results analysis | JSON with test results | Statistical analysis + recommendation |
Workflows
Workflow 1: New A/B Test Setup
- Define the hypothesis and success metric
- Run `sample_size_calculator.py` with the baseline conversion rate and minimum detectable effect (the underlying math is sketched after this list)
- Create a test configuration JSON (see Common Patterns)
- Run `test_designer.py` to generate the complete test plan
- Share the plan with stakeholders for alignment before launch
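
The sample-size step rests on the standard two-proportion power calculation (covered in the reference guide). Below is a minimal sketch of that math, assuming a two-sided z-test and a relative MDE; `samples_per_variant` is an illustrative helper, not the script's actual API.

```python
import math
from statistics import NormalDist

def samples_per_variant(baseline: float, mde_relative: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors required per variant for a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + mde_relative)     # relative MDE: 0.10 means a +10% lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Same inputs as the Quick Start command above
print(samples_per_variant(0.05, 0.10))     # ≈ 31,231 visitors per variant
```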
Workflow 2: Test Results Analysis
- Collect test results into JSON format
- Run `results_analyzer.py` to get statistical significance (a sketch of the underlying test follows this list)
- Review the confidence interval, p-value, and effect size
- Check for segment-level effects if the overall result is inconclusive
- Make a ship/no-ship decision based on the analysis
Workflow 3: Experimentation Program Review
- Compile results from multiple past tests
- Run `results_analyzer.py --batch` on all results
- Review win rate, average effect size, and testing velocity (see the aggregation sketch after this list)
- Identify patterns in winning vs. losing tests
- Optimize the test pipeline based on learnings
Reference Documentation
See `references/ab-testing-guide.md` for comprehensive methodology covering:
- Statistical foundations (z-tests, confidence intervals)
- Sample size theory and trade-offs
- Common experimentation pitfalls
- Multi-variant and sequential testing
- Bayesian vs frequentist approaches
Common Patterns
Pattern: Test Configuration JSON
```json
{
  "test_name": "Homepage CTA Button Color",
  "hypothesis": "Changing the CTA button from blue to green will increase click-through rate",
  "metric_primary": "cta_click_rate",
  "metric_secondary": ["signup_rate", "bounce_rate"],
  "baseline_rate": 0.045,
  "minimum_detectable_effect": 0.10,
  "significance_level": 0.05,
  "power": 0.80,
  "variants": [
    {"name": "control", "description": "Current blue CTA button"},
    {"name": "treatment", "description": "Green CTA button"}
  ],
  "daily_traffic": 5000,
  "allocation": {"control": 0.50, "treatment": 0.50}
}
```
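
The duration portion of the test plan can fall directly out of this config: required visitors divided by daily traffic. A back-of-the-envelope sketch, with an illustrative function name rather than `test_designer.py`'s actual logic:

```python
import math

def test_duration_days(samples_per_variant: int, daily_traffic: int,
                       n_variants: int = 2) -> int:
    """Days needed to reach the required sample at the given traffic level."""
    return math.ceil(samples_per_variant * n_variants / daily_traffic)

# For the config above: roughly 35,000 visitors per variant (4.5% baseline,
# +10% relative MDE, 80% power), 2 variants split 50/50, 5,000 visitors/day
print(test_duration_days(35_000, 5_000))   # 14 days
```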
json
{
"test_name": "Homepage CTA Button Color",
"variants": {
"control": {"visitors": 12500, "conversions": 563},
"treatment": {"visitors": 12500, "conversions": 625}
},
"metric": "cta_click_rate",
"significance_level": 0.05
}Quick Reference: Common Effect Sizes
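
Results in this shape can be consumed directly. A small sketch, assuming the file is the `results.json` from the Quick Start, that derives per-variant rates and the relative lift:

```python
import json

with open("results.json") as f:            # file name from the Quick Start
    variants = json.load(f)["variants"]

rate_c = variants["control"]["conversions"] / variants["control"]["visitors"]
rate_t = variants["treatment"]["conversions"] / variants["treatment"]["visitors"]
print(f"control {rate_c:.4f}  treatment {rate_t:.4f}  "
      f"lift {(rate_t - rate_c) / rate_c:+.1%}")       # lift +11.0%
```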
Quick Reference: Common Effect Sizes

| Context | Small Effect | Medium Effect | Large Effect |
|---|---|---|---|
| Conversion Rate | 2-5% relative | 5-15% relative | > 15% relative |
| Revenue per User | 1-3% | 3-8% | > 8% |
| Engagement Rate | 3-5% | 5-10% | > 10% |
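
Relative thresholds translate into quite small absolute movements at low baselines. For example, on the 4.5% `baseline_rate` from the config pattern above:

```python
baseline = 0.045   # the baseline_rate from the config pattern above
for label, rel in [("small", 0.05), ("medium", 0.10), ("large", 0.15)]:
    print(f"{label}: {rel:.0%} relative = {baseline * rel:+.3%} absolute")
# small: 5% relative = +0.225% absolute, and so on
```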