experiment-designer
Experiment Designer
Design, prioritize, and evaluate product experiments with clear hypotheses and defensible decisions.
When To Use
Use this skill for:
- A/B and multivariate experiment planning
- Hypothesis writing and success criteria definition
- Sample size and minimum detectable effect planning
- Experiment prioritization with ICE scoring
- Reading statistical output for product decisions
Core Workflow
- Write the hypothesis in If/Then/Because format
  - If we change [intervention]
  - Then [metric] will change by [expected direction/magnitude]
  - Because [behavioral mechanism]
- Define metrics before running the test
  - Primary metric: the single decision metric
  - Guardrail metrics: quality/risk protection
  - Secondary metrics: diagnostics only
- Estimate sample size from:
  - Baseline conversion rate or baseline mean
  - Minimum detectable effect (MDE)
  - Significance level (alpha) and power
  Use:
  ```bash
  python3 scripts/sample_size_calculator.py --baseline-rate 0.12 --mde 0.02 --mde-type absolute
  ```
- Prioritize experiments with ICE
  - Impact: potential upside
  - Confidence: evidence quality
  - Ease: cost/speed/complexity
  ICE Score = (Impact * Confidence * Ease) / 10
- Launch with stopping rules
  - Decide on a fixed sample size or fixed duration in advance
  - Avoid repeatedly peeking at results unless you use a sequential testing method designed for interim looks
  - Monitor guardrail metrics continuously
- Interpret results
  - Statistical significance is not business significance
  - Compare the point estimate and its confidence interval against the decision threshold
  - Investigate novelty effects and heterogeneity across segments
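To make the ICE formula concrete, here is a minimal Python sketch that scores and ranks a backlog. The `ExperimentIdea` class, the `prioritize` helper, and the example ideas are hypothetical, and it assumes each factor is rated on a 1-10 scale, which the formula itself does not specify.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ExperimentIdea:
    name: str
    impact: float      # potential upside (assumed 1-10 scale)
    confidence: float  # evidence quality (assumed 1-10 scale)
    ease: float        # cost/speed/complexity (assumed 1-10 scale, higher = easier)

    @property
    def ice_score(self) -> float:
        # ICE Score = (Impact * Confidence * Ease) / 10
        return self.impact * self.confidence * self.ease / 10


def prioritize(ideas: List[ExperimentIdea]) -> List[ExperimentIdea]:
    """Return ideas sorted from highest to lowest ICE score."""
    return sorted(ideas, key=lambda idea: idea.ice_score, reverse=True)


if __name__ == "__main__":
    backlog = [
        ExperimentIdea("shorter signup form", impact=6, confidence=7, ease=9),
        ExperimentIdea("new pricing page", impact=9, confidence=5, ease=4),
        ExperimentIdea("onboarding checklist", impact=7, confidence=6, ease=6),
    ]
    for idea in prioritize(backlog):
        print(f"{idea.ice_score:6.1f}  {idea.name}")
```

With these made-up ratings the cheap, well-evidenced signup change (37.8) outranks the higher-impact but riskier pricing page (18.0), which is the trade-off ICE is meant to surface.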
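The stopping-rule advice can be checked with a quick simulation. This sketch runs A/A tests (both arms share the same true conversion rate, so every "win" is a false positive) and compares a single analysis at the planned sample size with declaring a result at the first of several interim looks where a pooled two-proportion z-test crosses 1.96. The function names, the 10% rate, and the look schedule are illustrative assumptions, not part of the skill's tooling.

```python
import math
import random


def two_proportion_z(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z statistic (0.0 when the pooled variance is zero)."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 0.0
    return (conv_b / n_b - conv_a / n_a) / se


def simulate_aa(n_sims=1000, n_per_arm=1000, looks=10, rate=0.10, z_crit=1.96, seed=7):
    """Return (fixed_design_fp_rate, peeking_fp_rate) over simulated A/A tests."""
    rng = random.Random(seed)
    # Evenly spaced interim looks; the last one equals the planned sample size.
    checkpoints = {n_per_arm * (i + 1) // looks for i in range(looks)}
    fixed_hits = 0
    peek_hits = 0
    for _ in range(n_sims):
        conv_a = conv_b = 0
        significant_at_some_look = False
        for n in range(1, n_per_arm + 1):
            conv_a += rng.random() < rate
            conv_b += rng.random() < rate
            if n in checkpoints and abs(two_proportion_z(conv_a, n, conv_b, n)) > z_crit:
                significant_at_some_look = True
        # Fixed design: one analysis at the planned sample size.
        if abs(two_proportion_z(conv_a, n_per_arm, conv_b, n_per_arm)) > z_crit:
            fixed_hits += 1
        # Peeking: a "winner" is declared if any look crosses the threshold.
        if significant_at_some_look:
            peek_hits += 1
    return fixed_hits / n_sims, peek_hits / n_sims


if __name__ == "__main__":
    fixed_rate, peek_rate = simulate_aa()
    print(f"false positives, single analysis: {fixed_rate:.3f}")
    print(f"false positives, stop at first significant look: {peek_rate:.3f}")
```

With the defaults, the peeking rate should come out well above the nominal 5% while the single-analysis rate stays near it. Because the final look is also one of the interim looks, peeking can never produce fewer false positives than the fixed design on the same simulated data.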
Hypothesis Quality Checklist
- Contains explicit intervention and audience
- Specifies measurable metric change
- States plausible causal reason
- Includes expected minimum effect
- Defines failure condition
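The If/Then/Because template and this checklist can be encoded as a small data structure, for example to lint hypotheses in a planning doc. This is a hypothetical sketch: the `Hypothesis` fields and the `checklist_gaps` helper are illustrative names, and the checks only confirm that each element is present. Judging whether a causal reason is plausible still needs a human.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Hypothesis:
    intervention: str                      # If we change [intervention]
    audience: str                          # who is exposed to the change
    metric: str                            # Then [metric] ...
    expected_change: str                   # ... will change by [expected direction/magnitude]
    mechanism: str                         # Because [behavioral mechanism]
    minimum_effect: Optional[float] = None  # smallest effect worth acting on
    failure_condition: str = ""            # result that counts as "hypothesis failed"

    def statement(self) -> str:
        """Render the hypothesis in If/Then/Because form."""
        return (
            f"If we change {self.intervention} for {self.audience}, "
            f"then {self.metric} will change by {self.expected_change}, "
            f"because {self.mechanism}."
        )

    def checklist_gaps(self) -> List[str]:
        """Return checklist items that are missing (presence checks only)."""
        checks = [
            ("Contains explicit intervention and audience",
             bool(self.intervention.strip() and self.audience.strip())),
            ("Specifies measurable metric change",
             bool(self.metric.strip() and self.expected_change.strip())),
            ("States plausible causal reason", bool(self.mechanism.strip())),
            ("Includes expected minimum effect", self.minimum_effect is not None),
            ("Defines failure condition", bool(self.failure_condition.strip())),
        ]
        return [item for item, ok in checks if not ok]


if __name__ == "__main__":
    h = Hypothesis(
        intervention="the signup form to three fields",
        audience="new mobile visitors",
        metric="signup completion rate",
        expected_change="+2 percentage points",
        mechanism="fewer fields reduce effort at the decision point",
        minimum_effect=0.01,
    )
    print(h.statement())
    print("missing:", h.checklist_gaps())
```

The example hypothesis is invented; it passes every check except the failure condition, which is exactly the kind of gap the checklist is meant to catch before launch.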
Common Experiment Pitfalls
- Underpowered tests leading to false negatives
- Running too many simultaneous changes without isolation
- Changing targeting or implementation mid-test
- Stopping early on random spikes
- Ignoring sample ratio mismatch and instrumentation drift
- Declaring success from a p-value alone, without effect-size context
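Sample ratio mismatch is commonly detected with a chi-square goodness-of-fit test on the assignment counts. Below is a minimal sketch for two arms, assuming a planned split (50/50 by default) and a p < 0.001 alert threshold; that threshold is a common convention, not something this skill specifies, and `srm_check` is a hypothetical helper.

```python
import math
from typing import Tuple


def srm_check(n_control: int, n_treatment: int,
              expected_treatment_share: float = 0.5,
              threshold: float = 0.001) -> Tuple[float, bool]:
    """Chi-square goodness-of-fit test (1 degree of freedom) for two arms.

    Returns (p_value, mismatch_detected)."""
    total = n_control + n_treatment
    expected_t = total * expected_treatment_share
    expected_c = total - expected_t
    chi2 = ((n_control - expected_c) ** 2 / expected_c
            + (n_treatment - expected_t) ** 2 / expected_t)
    # With 1 degree of freedom, P(chi2_1 > x) = erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(chi2 / 2))
    return p_value, p_value < threshold


if __name__ == "__main__":
    print(srm_check(50_000, 50_000))  # perfectly balanced split
    print(srm_check(50_000, 48_500))  # 1.5% shortfall in treatment
```

A 1.5% shortfall looks small, but at this volume it is far too large to be chance, which usually points to an assignment or logging bug rather than a real treatment effect.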
Statistical Interpretation Guardrails
- A p-value below alpha is evidence against the null hypothesis, not proof that the alternative is true.
- A confidence interval that crosses zero (the no-effect value) means the direction of the effect is uncertain.
- A wide interval signals low precision, even when the result is statistically significant.
- Use practical-significance thresholds tied to business impact.
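A minimal sketch of these rules for a conversion-rate test: compute an unpooled Wald confidence interval for the difference in proportions, then classify it against zero and a practical-significance threshold. The function names, the labels, and the choice of the Wald interval are assumptions for illustration; other interval methods may be preferable for small samples or rates near 0 or 1.

```python
import math
from statistics import NormalDist


def diff_in_proportions_ci(conv_a, n_a, conv_b, n_b, alpha=0.05):
    """Point estimate and unpooled Wald CI for (rate_b - rate_a)."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    diff = p_b - p_a
    se = math.sqrt(p_a * (1 - p_a) / n_a + p_b * (1 - p_b) / n_b)
    z = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    return diff, diff - z * se, diff + z * se


def read_result(low, high, practical_threshold):
    """Classify a CI for (treatment - control) against zero and a practical threshold.

    Assumes higher is better and practical_threshold > 0 is in the same units."""
    if low <= 0 <= high:
        return "inconclusive"  # interval crosses zero: direction uncertain
    if high < 0:
        return "negative"
    if low >= practical_threshold:
        return "practically significant"
    if high < practical_threshold:
        return "statistically but not practically significant"
    return "imprecise"  # interval straddles the practical threshold


if __name__ == "__main__":
    est, low, high = diff_in_proportions_ci(1000, 10_000, 1150, 10_000)
    print(f"lift {est:.4f}, 95% CI [{low:.4f}, {high:.4f}]")
    print(read_result(low, high, practical_threshold=0.005))
    print(read_result(low, high, practical_threshold=0.01))
```

The same interval can read as a clear win or as too imprecise to act on depending on the business threshold, which is why the threshold should be fixed before looking at results.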
See:
- references/experiment-playbook.md
- references/statistics-reference.md
Tooling
scripts/sample_size_calculator.py
Computes the required sample size (per variant and total) from:
- baseline rate
- MDE (absolute or relative)
- significance level (alpha)
- statistical power
Example:
```bash
python3 scripts/sample_size_calculator.py \
  --baseline-rate 0.10 \
  --mde 0.015 \
  --mde-type absolute \
  --alpha 0.05 \
  --power 0.8
```
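For intuition about what a calculator like this computes, here is a sketch of the standard normal-approximation formula for comparing two proportions: n per variant = (z_(1-alpha/2) * sqrt(2 * p_bar * (1 - p_bar)) + z_(power) * sqrt(p1(1 - p1) + p2(1 - p2)))^2 / (p2 - p1)^2, with p_bar = (p1 + p2) / 2. The implementation of `scripts/sample_size_calculator.py` is not shown here, so it may use a different variant and report slightly different numbers; `required_sample_size` is a hypothetical function whose parameters merely mirror the CLI flags.

```python
import math
from statistics import NormalDist


def required_sample_size(baseline_rate, mde, mde_type="absolute", alpha=0.05, power=0.8):
    """Per-variant and total sample size for a two-sided test of two proportions."""
    p1 = baseline_rate
    if mde_type == "absolute":
        p2 = p1 + mde          # e.g. 0.10 -> 0.115 for mde=0.015
    elif mde_type == "relative":
        p2 = p1 * (1 + mde)    # e.g. 0.10 -> 0.115 for mde=0.15
    else:
        raise ValueError("mde_type must be 'absolute' or 'relative'")
    if not (0 < p1 < 1 and 0 < p2 < 1) or p1 == p2:
        raise ValueError("rates must lie in (0, 1) and the MDE must be non-zero")

    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided critical value
    z_beta = NormalDist().inv_cdf(power)            # quantile for the desired power
    p_bar = (p1 + p2) / 2
    numerator = (z_alpha * math.sqrt(2 * p_bar * (1 - p_bar))
                 + z_beta * math.sqrt(p1 * (1 - p1) + p2 * (1 - p2))) ** 2
    per_variant = math.ceil(numerator / (p2 - p1) ** 2)
    return per_variant, 2 * per_variant


if __name__ == "__main__":
    per_variant, total = required_sample_size(0.10, 0.015, "absolute", 0.05, 0.8)
    print(f"per variant: {per_variant}, total: {total}")
```

Because the MDE enters the denominator squared, halving it roughly quadruples the required sample, which is usually the biggest lever when a test looks too expensive to run.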