experiment-designer

Experiment Designer

Design, prioritize, and evaluate product experiments with clear hypotheses and defensible decisions.

When To Use

Use this skill for:

A/B and multivariate experiment planning
Hypothesis writing and success criteria definition
Sample size and minimum detectable effect planning
Experiment prioritization with ICE scoring
Reading statistical output for product decisions

Core Workflow

Write hypothesis in If/Then/Because format

If we change `[intervention]`
Then `[metric]` will change by `[expected direction/magnitude]`
Because `[behavioral mechanism]`

Define metrics before running test

Primary metric: single decision metric
Guardrail metrics: quality/risk protection
Secondary metrics: diagnostics only

Estimate sample size

Baseline conversion or baseline mean
Minimum detectable effect (MDE)
Significance level (alpha) and power

Use:

```bash
python3 scripts/sample_size_calculator.py --baseline-rate 0.12 --mde 0.02 --mde-type absolute
```
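
To make the inputs above concrete, here is a minimal sketch of a per-variant sample size estimate for a conversion-rate test. It assumes a two-sided two-proportion z-test with equal allocation and uses the standard normal-approximation formula. The function name `sample_size_per_variant` and its defaults are illustrative assumptions and are not taken from `scripts/sample_size_calculator.py`, whose internals may differ.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline_rate, mde, mde_type="absolute",
                            alpha=0.05, power=0.8):
    """Approximate per-variant n for a two-sided two-proportion z-test.

    Hypothetical helper for illustration; not the shipped script.
    """
    p1 = baseline_rate
    # A relative MDE (0.10 = +10% of baseline) is converted to an absolute one.
    delta = mde if mde_type == "absolute" else baseline_rate * mde
    p2 = p1 + delta
    if delta == 0 or not (0 < p1 < 1 and 0 < p2 < 1):
        raise ValueError("rates must lie in (0, 1) and the MDE must be non-zero")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)           # quantile for desired power
    variance = p1 * (1 - p1) + p2 * (1 - p2)       # sum of Bernoulli variances
    return ceil((z_alpha + z_beta) ** 2 * variance / delta ** 2)


n = sample_size_per_variant(0.12, 0.02, "absolute")
print(f"per variant: {n}, total for two variants: {2 * n}")
```

Halving the MDE roughly quadruples the required sample, which is why the MDE should be agreed on before the test is scheduled.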

Prioritize experiments with ICE

Impact: potential upside
Confidence: evidence quality
Ease: cost/speed/complexity

ICE Score = (Impact * Confidence * Ease) / 10
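
As an illustration of the formula above, the following sketch scores and ranks a small backlog. It assumes each component is rated on a 1 to 10 scale, which the text does not specify; the `Idea` class and the example backlog entries are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Idea:
    """One experiment candidate; ratings assumed to be on a 1-10 scale."""
    name: str
    impact: int
    confidence: int
    ease: int

    @property
    def ice_score(self) -> float:
        # ICE Score = (Impact * Confidence * Ease) / 10
        return (self.impact * self.confidence * self.ease) / 10


def rank_by_ice(ideas):
    """Return ideas sorted from highest to lowest ICE score."""
    return sorted(ideas, key=lambda idea: idea.ice_score, reverse=True)


# Hypothetical backlog used only to show the ranking.
backlog = [
    Idea("Shorter signup form", impact=6, confidence=7, ease=8),
    Idea("New pricing page", impact=9, confidence=4, ease=3),
    Idea("Checkout copy tweak", impact=3, confidence=6, ease=10),
]

for idea in rank_by_ice(backlog):
    print(f"{idea.ice_score:6.1f}  {idea.name}")
```

Because the three ratings are multiplied, a low score on any one of them pulls the whole idea down the list.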

Launch with stopping rules

Decide fixed sample size or fixed duration in advance
Avoid repeated peeking unless you use a sequential testing method designed for it
Monitor guardrails continuously
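
The risk behind the peeking rule can be shown with a small simulation. The sketch below runs A/A tests (no true effect) and compares the false-positive rate of a single final look against stopping at the first "significant" interim look. It assumes a normally distributed metric with known unit variance and a plain z-test at every look; the function name and all parameters are illustrative.

```python
import random
from math import sqrt
from statistics import NormalDist


def simulate_peeking(n_sims=500, looks=10, batch=100, alpha=0.05, seed=7):
    """Return (fixed_rate, peeking_rate) of false positives in A/A tests."""
    rng = random.Random(seed)
    z_crit = NormalDist().inv_cdf(1 - alpha / 2)
    fixed_hits = 0
    peeking_hits = 0
    for _ in range(n_sims):
        sum_a = sum_b = 0.0
        n = 0
        crossed = False
        z = 0.0
        for _ in range(looks):
            # Both arms draw from the same N(0, 1), so any "win" is noise.
            sum_a += sum(rng.gauss(0, 1) for _ in range(batch))
            sum_b += sum(rng.gauss(0, 1) for _ in range(batch))
            n += batch
            z = (sum_a / n - sum_b / n) / sqrt(2 / n)
            if abs(z) > z_crit:
                crossed = True  # an analyst peeking here would have stopped
        fixed_hits += abs(z) > z_crit  # decision made only at the final look
        peeking_hits += crossed
    return fixed_hits / n_sims, peeking_hits / n_sims


fixed_rate, peeking_rate = simulate_peeking()
print(f"false positives, single final look: {fixed_rate:.3f}")
print(f"false positives, stop at any look:  {peeking_rate:.3f}")
```

The single-look rate should stay near alpha, while the stop-at-any-look rate is noticeably higher, which is the inflation that pre-registered stopping rules are meant to prevent.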

Interpret results

Statistical significance is not business significance
Compare point estimate + confidence interval to decision threshold
Investigate novelty effects and segment heterogeneity
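
One way to compare a point estimate and confidence interval against a decision threshold is sketched below. It assumes a conversion-rate metric, a Wald (normal-approximation) interval for the difference in proportions, and a simple three-way decision rule. The function names, the rule, and the example counts are illustrative assumptions, not part of the skill.

```python
from math import sqrt
from statistics import NormalDist


def diff_confidence_interval(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Wald CI for (treatment rate - control rate); returns (diff, low, high)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    se = sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(1 - alpha / 2)
    return diff, diff - z * se, diff + z * se


def decide(low, high, threshold):
    """Hypothetical rule: act only when the whole interval clears the threshold."""
    if low >= threshold:
        return "ship"
    if high < threshold:
        return "do not ship"
    return "inconclusive"


# Made-up counts: control 600/5000, treatment 700/5000.
diff, low, high = diff_confidence_interval(600, 5000, 700, 5000)
print(f"lift = {diff:.4f}, 95% CI = [{low:.4f}, {high:.4f}]")
print("decision at a 0.01 absolute-lift threshold:", decide(low, high, 0.01))
```

In this example the interval excludes zero, so the result is statistically significant, yet it still straddles the 0.01 business threshold, so the rule returns "inconclusive".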

Hypothesis Quality Checklist

Common Experiment Pitfalls

Underpowered tests leading to false negatives
Running too many simultaneous changes without isolation
Changing targeting or implementation mid-test
Stopping early on random spikes
Ignoring sample ratio mismatch and instrumentation drift
Declaring success from p-value without effect-size context
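
Sample ratio mismatch in particular is cheap to check. The sketch below applies a chi-square goodness-of-fit test with one degree of freedom to the observed assignment counts of a two-variant test. The function name `srm_p_value` and the example counts are hypothetical, and the alerting cutoff is left to the team.

```python
from math import erfc, sqrt


def srm_p_value(n_control, n_treatment, expected_control_share=0.5):
    """p-value of a chi-square test that traffic matches the planned split."""
    total = n_control + n_treatment
    exp_control = total * expected_control_share
    exp_treatment = total - exp_control
    chi2 = ((n_control - exp_control) ** 2 / exp_control
            + (n_treatment - exp_treatment) ** 2 / exp_treatment)
    # For 1 degree of freedom the chi-square survival function
    # reduces to erfc(sqrt(chi2 / 2)).
    return erfc(sqrt(chi2 / 2))


print(f"5000 vs 5050: p = {srm_p_value(5000, 5050):.3f}")  # small imbalance
print(f"5000 vs 5300: p = {srm_p_value(5000, 5300):.4f}")  # larger imbalance
```

A very small p-value suggests that assignment or logging is broken, in which case the experiment's results should not be trusted until the cause is found.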

Statistical Interpretation Guardrails

A p-value below alpha indicates evidence against the null hypothesis, not proof that the effect is real.
A confidence interval that crosses zero (or the no-effect value) means the direction of the effect is uncertain.
Wide intervals imply low precision, even when the result is significant.
Use practical significance thresholds tied to business impact.

See:

`references/experiment-playbook.md`
`references/statistics-reference.md`

Tooling

scripts/sample_size_calculator.py

Computes required sample size (per variant and total) from:

baseline rate
MDE (absolute or relative)
significance level (alpha)
statistical power

Example:

```bash
python3 scripts/sample_size_calculator.py \
  --baseline-rate 0.10 \
  --mde 0.015 \
  --mde-type absolute \
  --alpha 0.05 \
  --power 0.8
```
