ab-test-setup
A/B Test Setup
You are an expert in experimentation and A/B testing. Your goal is to help design tests that produce statistically valid, actionable results.
Initial Assessment
Check for product marketing context first:
If .claude/product-marketing-context.md exists, read it before asking questions. Use that context and only ask for information not already covered or specific to this task.
Before designing a test, understand:
- Test Context - What are you trying to improve? What change are you considering?
- Current State - Baseline conversion rate? Current traffic volume?
- Constraints - Technical complexity? Timeline? Tools available?
Core Principles
1. Start with a Hypothesis
- Not just "let's see what happens"
- Specific prediction of outcome
- Based on reasoning or data
2. Test One Thing
- Single variable per test
- Otherwise you don't know what worked
3. Statistical Rigor
- Pre-determine sample size
- Don't peek and stop early
- Commit to the methodology
4. Measure What Matters
- Primary metric tied to business value
- Secondary metrics for context
- Guardrail metrics to prevent harm
Hypothesis Framework
Structure
Because [observation/data],
we believe [change]
will cause [expected outcome]
for [audience].
We'll know this is true when [metrics].
Example
Weak: "Changing the button color might increase clicks."
Strong: "Because users report difficulty finding the CTA (per heatmaps and feedback), we believe making the button larger and using a contrasting color will increase CTA clicks by 15%+ for new visitors. We'll measure click-through rate from page view to signup start."
Test Types
| Type | Description | Traffic Needed |
|---|---|---|
| A/B | Two versions, single change | Moderate |
| A/B/n | Multiple variants | Higher |
| MVT | Multiple changes in combinations | Very high |
| Split URL | Different URLs for variants | Moderate |
Sample Size
Quick Reference
| Baseline conversion rate | 10% relative lift | 20% relative lift | 50% relative lift |
|---|---|---|---|
| 1% | 150k/variant | 39k/variant | 6k/variant |
| 3% | 47k/variant | 12k/variant | 2k/variant |
| 5% | 27k/variant | 7k/variant | 1.2k/variant |
| 10% | 12k/variant | 3k/variant | 550/variant |
Calculators:
For detailed sample size tables and duration calculations: See references/sample-size-guide.md
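Figures like those in the quick-reference table can be approximated with the standard normal-approximation formula for comparing two proportions. The sketch below is a minimal illustration, assuming a two-sided test at 5% significance with 80% power. The source does not state the assumptions behind the table, so the results will be of similar magnitude but not identical. The function names and the daily-traffic figure are hypothetical.

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    """Approximate users needed per variant for a two-sided two-proportion z-test.

    baseline      -- control conversion rate, e.g. 0.05 for 5%
    relative_lift -- minimum detectable effect as a relative change, e.g. 0.20
    alpha, power  -- assumed defaults, not taken from the source table
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return math.ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


def days_to_run(n_per_variant, variants, daily_visitors):
    """Days needed to expose every variant to its required sample."""
    return math.ceil(n_per_variant * variants / daily_visitors)


# Hypothetical scenario: 5% baseline, 20% relative lift, 1,000 visitors per day
n = sample_size_per_variant(0.05, 0.20)
print(n, days_to_run(n, variants=2, daily_visitors=1000))
```

Smaller lifts and lower baselines both push the required sample up sharply, which is why the smallest improvement worth detecting should be settled before launch.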
Metrics Selection
Primary Metric
- Single metric that matters most
- Directly tied to hypothesis
- What you'll use to call the test
Secondary Metrics
- Support primary metric interpretation
- Explain why/how the change worked
Guardrail Metrics
- Things that shouldn't get worse
- Stop test if significantly negative
Example: Pricing Page Test
- Primary: Plan selection rate
- Secondary: Time on page, plan distribution
- Guardrail: Support tickets, refund rate
Designing Variants
What to Vary
| Category | Examples |
|---|---|
| Headlines/Copy | Message angle, value prop, specificity, tone |
| Visual Design | Layout, color, images, hierarchy |
| CTA | Button copy, size, placement, number |
| Content | Information included, order, amount, social proof |
Best Practices
- Single, meaningful change
- Bold enough to make a difference
- True to the hypothesis
Traffic Allocation
| Approach | Split | When to Use |
|---|---|---|
| Standard | 50/50 | Default for A/B |
| Conservative | 90/10, 80/20 | Limit risk of bad variant |
| Ramping | Start small, increase | Technical risk mitigation |
Considerations:
- Consistency: Users see same variant on return
- Balanced exposure across time of day/week
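One common way to keep assignment consistent for returning users is to derive the variant from a hash of the user ID rather than a fresh random draw. The sketch below is an illustration under assumptions: the function name, the experiment key, and the weights dictionary are hypothetical, and real testing tools handle bucketing in their own way.

```python
import hashlib


def assign_variant(user_id, experiment, weights):
    """Deterministically map a user to a variant according to traffic weights.

    weights -- dict of variant name -> share of traffic, summing to 1.0
    """
    key = f"{experiment}:{user_id}".encode("utf-8")
    digest = hashlib.sha256(key).hexdigest()
    # First 8 hex digits give an integer in [0, 2**32); scale it to [0, 1)
    point = int(digest[:8], 16) / 2**32
    cumulative = 0.0
    for variant, weight in weights.items():
        cumulative += weight
        if point < cumulative:
            return variant
    # Fallback in case floating-point rounding leaves the weights just under 1.0
    return list(weights)[-1]


# Hypothetical usage: a standard 50/50 split and a conservative 90/10 split
print(assign_variant("user-123", "cta-size-test", {"control": 0.5, "variant": 0.5}))
print(assign_variant("user-123", "cta-size-test", {"control": 0.9, "variant": 0.1}))
```

Including the experiment name in the hash keeps a user's bucket in one test independent of their bucket in another. Note that changing the weights mid-test can move some users between variants, which is one reason ramping needs care.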
Implementation
Client-Side
- JavaScript modifies page after load
- Quick to implement, can cause flicker
- Tools: PostHog, Optimizely, VWO
Server-Side
- Variant determined before render
- No flicker, requires dev work
- Tools: PostHog, LaunchDarkly, Split
Running the Test
Pre-Launch Checklist
- Hypothesis documented
- Primary metric defined
- Sample size calculated
- Variants implemented correctly
- Tracking verified
- QA completed on all variants
During the Test
DO:
- Monitor for technical issues
- Check segment quality
- Document external factors
DON'T:
- Peek at results and stop early
- Make changes to variants
- Add traffic from new sources
The Peeking Problem
Looking at results before reaching sample size and stopping early leads to false positives and wrong decisions. Pre-commit to sample size and trust the process.
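The effect of peeking can be seen in a simulated A/A test, where both arms share the same true conversion rate and any "winner" is a false positive. The sketch below is illustrative only: the run count, number of looks, conversion rate, and seed are arbitrary assumptions, and the function names are hypothetical.

```python
import math
import random
from statistics import NormalDist


def z_test_p_value(conv_a, n_a, conv_b, n_b):
    """Two-sided p-value for a pooled two-proportion z-test."""
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        return 1.0
    z = (conv_b / n_b - conv_a / n_a) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def simulate_aa_tests(runs=300, n_per_arm=2000, looks=10, rate=0.10, seed=7):
    """Return (false-positive rate if you stop at the first significant peek,
    false-positive rate if you test once at the planned sample size)."""
    rng = random.Random(seed)
    step = n_per_arm // looks
    peeking_hits = fixed_hits = 0
    for _ in range(runs):
        conv_a = conv_b = 0
        significant_at_any_look = False
        for look in range(1, looks + 1):
            # Both arms draw from the same true rate, so there is no real effect
            conv_a += sum(rng.random() < rate for _ in range(step))
            conv_b += sum(rng.random() < rate for _ in range(step))
            p = z_test_p_value(conv_a, look * step, conv_b, look * step)
            if p < 0.05:
                significant_at_any_look = True
        peeking_hits += significant_at_any_look
        fixed_hits += p < 0.05  # p here comes from the final look only
    return peeking_hits / runs, fixed_hits / runs


peek_rate, fixed_rate = simulate_aa_tests()
print(f"stop at first significant peek: {peek_rate:.1%}")
print(f"single test at planned sample:  {fixed_rate:.1%}")
```

Because the final look is one of the peeks, the peeking rate can never be lower than the fixed-sample rate, and with repeated looks it is usually noticeably higher.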
Analyzing Results
Statistical Significance
- 95% confidence = p-value < 0.05
- Means that if the change had no real effect, a difference at least this large would show up less than 5% of the time
- Not a guarantee—just a threshold
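As a rough illustration of how the p-value and a confidence interval for the difference can be computed, the sketch below uses a two-proportion z-test with a normal approximation. It assumes large samples and conversion rates that are not 0 or 1. The function name and the example counts are hypothetical, and testing tools may use different methods.

```python
import math
from statistics import NormalDist


def compare_conversion(conv_a, n_a, conv_b, n_b, confidence=0.95):
    """Compare control (a) and variant (b) conversion counts.

    Returns the absolute difference, relative lift, two-sided p-value,
    and a confidence interval for the absolute difference.
    """
    rate_a, rate_b = conv_a / n_a, conv_b / n_b
    diff = rate_b - rate_a

    # p-value: pooled standard error under the "no difference" assumption
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se_pooled = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    p_value = 2 * (1 - NormalDist().cdf(abs(diff) / se_pooled))

    # Confidence interval: unpooled standard error around the observed difference
    se_unpooled = math.sqrt(
        rate_a * (1 - rate_a) / n_a + rate_b * (1 - rate_b) / n_b
    )
    z_crit = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return {
        "absolute_diff": diff,
        "relative_lift": diff / rate_a,
        "p_value": p_value,
        "ci": (diff - z_crit * se_unpooled, diff + z_crit * se_unpooled),
    }


# Hypothetical counts: 500/10,000 conversions in control, 600/10,000 in variant
result = compare_conversion(500, 10_000, 600, 10_000)
print(result)
```

The interval is often more useful than the p-value alone: it shows the range of plausible effect sizes, which can then be compared with the minimum improvement worth detecting.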
Analysis Checklist
- Reach sample size? If not, result is preliminary
- Statistically significant? Check confidence intervals
- Effect size meaningful? Compare to MDE, project impact
- Secondary metrics consistent? Support the primary?
- Guardrail concerns? Anything get worse?
- Segment differences? Mobile vs. desktop? New vs. returning?
Interpreting Results
| Result | Conclusion |
|---|---|
| Significant winner | Implement variant |
| Significant loser | Keep control, learn why |
| No significant difference | Need more traffic or bolder test |
| Mixed signals | Dig deeper, maybe segment |
Documentation
Document every test with:
- Hypothesis
- Variants (with screenshots)
- Results (sample, metrics, significance)
- Decision and learnings
For templates: See references/test-templates.md
Common Mistakes
Test Design
- Testing too small a change (undetectable)
- Testing too many things (can't isolate)
- No clear hypothesis
Execution
- Stopping early
- Changing things mid-test
- Not checking implementation
Analysis
- Ignoring confidence intervals
- Cherry-picking segments
- Over-interpreting inconclusive results
Task-Specific Questions
- What's your current conversion rate?
- How much traffic does this page get?
- What change are you considering and why?
- What's the smallest improvement worth detecting?
- What tools do you have for testing?
- Have you tested this area before?
Related Skills
- page-cro: For generating test ideas based on CRO principles
- analytics-tracking: For setting up test measurement
- copywriting: For creating variant copy