ab-test-setup
This skill should be used when the user asks to "set up an A/B test", "calculate sample size", "design an experiment", "analyze A/B test results", "check statistical significance", "determine test duration", or "evaluate conversion rate experiments".
Source: borghei/claude-skills

Install via NPX:

```bash
npx skill4agent add borghei/claude-skills ab-test-setup
```

A/B Test Setup Skill
Overview
Production-ready A/B testing toolkit for calculating sample sizes, designing rigorous test plans, and analyzing results with statistical significance testing. Designed for growth teams, product managers, and marketers who need to make data-driven decisions from controlled experiments.
Quick Start
```bash
# Calculate the required sample size for a test
python scripts/sample_size_calculator.py --baseline 0.05 --mde 0.10 --power 0.80

# Design a complete A/B test plan
python scripts/test_designer.py test_config.json

# Analyze A/B test results
python scripts/results_analyzer.py results.json
```

Tools Overview
| Tool | Purpose | Input | Output |
|---|---|---|---|
| `sample_size_calculator.py` | Sample size calculation | Baseline rate, MDE, power | Required samples + duration |
| `test_designer.py` | Test plan design | JSON test config | Complete test plan document |
| `results_analyzer.py` | Results analysis | JSON with test results | Statistical analysis + recommendation |
Workflows
Workflow 1: New A/B Test Setup
- Define the hypothesis and success metric
- Run `sample_size_calculator.py` with the baseline conversion rate and minimum detectable effect (the underlying math is sketched after this list)
- Create a test configuration JSON (see Common Patterns)
- Run `test_designer.py` to generate the complete test plan
- Share the plan with stakeholders for alignment before launch
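
The sample-size step rests on the standard two-proportion power calculation (covered in the reference guide). Below is a minimal sketch of that math, assuming a two-sided z-test and a relative MDE; `samples_per_variant` is an illustrative helper, not the script's actual API.

```python
import math
from statistics import NormalDist

def samples_per_variant(baseline: float, mde_relative: float,
                        alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors required per variant for a two-sided two-proportion z-test."""
    p1 = baseline
    p2 = baseline * (1 + mde_relative)     # relative MDE: 0.10 means a +10% lift
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)

# Same inputs as the Quick Start command above
print(samples_per_variant(0.05, 0.10))     # ≈ 31,231 visitors per variant
```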
Workflow 2: Test Results Analysis
- Collect test results into JSON format
- Run `results_analyzer.py` to get statistical significance (a sketch of the underlying test follows this list)
- Review the confidence interval, p-value, and effect size
- Check for segment-level effects if the overall result is inconclusive
- Make a ship/no-ship decision based on the analysis
Workflow 3: Experimentation Program Review
- Compile results from multiple past tests
- Run `results_analyzer.py --batch` on all results
- Review win rate, average effect size, and testing velocity (see the aggregation sketch after this list)
- Identify patterns in winning vs. losing tests
- Optimize the test pipeline based on learnings
Reference Documentation
See `references/ab-testing-guide.md` for comprehensive methodology covering:
- Statistical foundations (z-tests, confidence intervals)
- Sample size theory and trade-offs
- Common experimentation pitfalls
- Multi-variant and sequential testing
- Bayesian vs frequentist approaches
Common Patterns
Pattern: Test Configuration JSON
```json
{
  "test_name": "Homepage CTA Button Color",
  "hypothesis": "Changing the CTA button from blue to green will increase click-through rate",
  "metric_primary": "cta_click_rate",
  "metric_secondary": ["signup_rate", "bounce_rate"],
  "baseline_rate": 0.045,
  "minimum_detectable_effect": 0.10,
  "significance_level": 0.05,
  "power": 0.80,
  "variants": [
    {"name": "control", "description": "Current blue CTA button"},
    {"name": "treatment", "description": "Green CTA button"}
  ],
  "daily_traffic": 5000,
  "allocation": {"control": 0.50, "treatment": 0.50}
}
```
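
The duration portion of the test plan can fall directly out of this config: required visitors divided by daily traffic. A back-of-the-envelope sketch, with an illustrative function name rather than `test_designer.py`'s actual logic:

```python
import math

def test_duration_days(samples_per_variant: int, daily_traffic: int,
                       n_variants: int = 2) -> int:
    """Days needed to reach the required sample at the given traffic level."""
    return math.ceil(samples_per_variant * n_variants / daily_traffic)

# For the config above: roughly 35,000 visitors per variant (4.5% baseline,
# +10% relative MDE, 80% power), 2 variants split 50/50, 5,000 visitors/day
print(test_duration_days(35_000, 5_000))   # 14 days
```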
json
{
"test_name": "Homepage CTA Button Color",
"variants": {
"control": {"visitors": 12500, "conversions": 563},
"treatment": {"visitors": 12500, "conversions": 625}
},
"metric": "cta_click_rate",
"significance_level": 0.05
}Quick Reference: Common Effect Sizes
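
Results in this shape can be consumed directly. A small sketch, assuming the file is the `results.json` from the Quick Start, that derives per-variant rates and the relative lift:

```python
import json

with open("results.json") as f:            # file name from the Quick Start
    variants = json.load(f)["variants"]

rate_c = variants["control"]["conversions"] / variants["control"]["visitors"]
rate_t = variants["treatment"]["conversions"] / variants["treatment"]["visitors"]
print(f"control {rate_c:.4f}  treatment {rate_t:.4f}  "
      f"lift {(rate_t - rate_c) / rate_c:+.1%}")       # lift +11.0%
```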
Quick Reference: Common Effect Sizes

| Context | Small Effect | Medium Effect | Large Effect |
|---|---|---|---|
| Conversion Rate | 2-5% relative | 5-15% relative | > 15% relative |
| Revenue per User | 1-3% | 3-8% | > 8% |
| Engagement Rate | 3-5% | 5-10% | > 10% |
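
Relative thresholds translate into quite small absolute movements at low baselines. For example, on the 4.5% `baseline_rate` from the config pattern above:

```python
baseline = 0.045   # the baseline_rate from the config pattern above
for label, rel in [("small", 0.05), ("medium", 0.10), ("large", 0.15)]:
    print(f"{label}: {rel:.0%} relative = {baseline * rel:+.3%} absolute")
# small: 5% relative = +0.225% absolute, and so on
```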