cro-optimization

CRO Optimization

Run conversion rate optimization as a structured discipline: audit → hypothesize → test → decide. Stack-agnostic. Tool-agnostic.
This skill is for running tests against existing pages and flows. For writing landing page copy from scratch, use landing-page-copy. For setting up the analytics that make CRO possible, use analytics-strategy.

When to use

  • Traffic converts at a lower rate than expected
  • Specific funnel step has high drop-off
  • Pages with high traffic that could move the needle if optimized
  • A/B testing infrastructure exists (or can be set up)
  • Statistical significance and sample size questions

When NOT to use

  • Without sufficient traffic to test (under ~5,000 monthly conversions per variant)
  • Pre-launch (no users to test on yet)
  • Strategy or messaging-level questions that need qualitative research first
  • Brand-defining choices (CRO can't optimize a fundamentally wrong brand)

Required inputs

  • The page or flow under optimization
  • Current conversion rate and traffic volume
  • Access to analytics (event tracking, funnel data)
  • An A/B testing tool (or willingness to set one up)
  • Time and budget for testing (typically 2 to 8 weeks per test)

The framework: 4 phases

1. Audit

Diagnose before treating.
Quantitative audit:
  • Funnel data. Where are users dropping off? The biggest drop is the biggest opportunity.
  • Segmentation. Does the funnel perform differently by source, device, geography, audience type?
  • Performance data. Are slow pages dragging conversions?
  • Search Console / on-site search. What are users looking for that they can't find?
Qualitative audit:
  • Session replay. Watch 20+ sessions of users on the target flow. Note friction, confusion, hesitation.
  • Heatmaps. Where do users click? Where do they scroll? Where do they not?
  • User interviews / surveys. Why did users not convert? Survey people who started but abandoned.
  • Form analytics. Which fields cause abandonment? Which cause errors?
  • Customer support tickets. What conversion-related questions come in?
Heuristic audit:
  • Apply CRO heuristics to the flow:
    • Is the value proposition clear in 5 seconds?
    • Is there a single primary CTA per page?
    • Is the form length appropriate to the offer?
    • Is trust/social proof present?
    • Are objections handled?
    • Is the page accessible? (Accessibility issues hurt conversion silently.)
The audit produces a list of suspected friction points. Each becomes a hypothesis candidate.

2. Hypothesis

A testable statement.
Hypothesis structure:
Because [observation from audit], we believe that [change] will produce [predicted outcome] for [user segment], because [reason].
Example:
Because session replays show users abandoning at the shipping step (audit), we believe that adding visible shipping cost to the product page (change) will increase add-to-cart conversion by 5 percent (outcome) for desktop users (segment), because users are surprised by shipping cost and abandon (reason).
Hypothesis quality criteria:
  • Specific change (not "improve the design")
  • Measurable outcome (with a target)
  • Grounded in evidence (audit, research, prior tests)
  • Tied to a known mechanism (why would this work?)
Hypothesis prioritization (ICE or PIE):
  • Impact: How much could this move the metric?
  • Confidence: How likely is the hypothesis to be right?
  • Ease: How easy to test? (Time, complexity, risk)
Score each from 1 to 10. Test the hypotheses with the highest combined scores first.
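
The ICE scoring above can be sketched in a few lines of Python. This is a minimal illustration, not a standard implementation: the `Hypothesis` class, its field names, and the sample backlog are hypothetical, and the "combined score" is assumed here to be a plain sum (averaging or multiplying the three scores are also used in practice).

```python
from dataclasses import dataclass


@dataclass
class Hypothesis:
    """One test candidate; field names are illustrative, not a standard schema."""
    name: str
    impact: int      # 1-10: how much could this move the metric?
    confidence: int  # 1-10: how likely is the hypothesis to be right?
    ease: int        # 1-10: how easy is it to test?

    def ice_score(self) -> int:
        # Assumption: combine by summing the three scores.
        return self.impact + self.confidence + self.ease


def prioritize(backlog: list[Hypothesis]) -> list[Hypothesis]:
    """Return hypotheses ordered from highest to lowest combined score."""
    return sorted(backlog, key=lambda h: h.ice_score(), reverse=True)


# Hypothetical backlog; the scores are invented for illustration.
backlog = [
    Hypothesis("Redesign checkout flow", impact=9, confidence=4, ease=2),
    Hypothesis("Show shipping cost on product page", impact=7, confidence=8, ease=6),
    Hypothesis("Shorten signup form", impact=6, confidence=5, ease=9),
]

for h in prioritize(backlog):
    print(f"{h.ice_score():>2}  {h.name}")
```

A high-impact idea can still rank last when confidence and ease are low, which is the point of scoring all three dimensions.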

3. Test design

A test that produces an unambiguous answer.
Sample size and duration:
Use a sample size calculator (most A/B tools have one) before launching. Inputs:
  • Baseline conversion rate
  • Minimum detectable effect (the smallest lift you'd care about)
  • Statistical power (typically 80%)
  • Significance level (typically 5%, i.e. 95% confidence)
This produces the required sample size per variant. Run the test until that sample is reached and for at least one full business cycle, whichever takes longer (typically 2 weeks minimum, to cover weekends and weekly patterns).
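
For a rough idea of what such a calculator does internally, here is a sketch using the textbook normal-approximation formula for a two-sided, two-proportion z-test. The function name and defaults are assumptions for illustration; your A/B tool may use a different method (sequential or Bayesian, for example), so treat the output as an order-of-magnitude check rather than a replacement for the tool's own calculator.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(baseline: float, relative_lift: float,
                            alpha: float = 0.05, power: float = 0.80) -> int:
    """Visitors needed per variant for a two-sided two-proportion z-test.

    baseline      -- current conversion rate, e.g. 0.05 for 5%
    relative_lift -- minimum detectable effect, e.g. 0.20 for a 20% relative lift
    """
    p1 = baseline
    p2 = baseline * (1 + relative_lift)          # rate if the lift is real
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # about 1.96 for alpha = 0.05
    z_power = NormalDist().inv_cdf(power)          # about 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_power) ** 2 * variance / (p2 - p1) ** 2)


for baseline, lift in [(0.02, 0.10), (0.05, 0.20), (0.10, 0.20)]:
    n = sample_size_per_variant(baseline, lift)
    print(f"baseline {baseline:.0%}, lift {lift:.0%}: {n:,} visitors per variant")
```

Halving the minimum detectable effect roughly quadruples the required sample, which is why small lifts on low-baseline pages are so expensive to detect.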
Common test setup mistakes:
  • Stopping the test the moment significance is hit (peeking)
  • Running tests for too short a period to capture a full business cycle
  • Running multiple overlapping tests on the same flow
  • Testing during atypical periods (Black Friday, holidays, major campaigns)
  • Excluding mobile when 50%+ of traffic is mobile (or vice versa)
  • Testing on too small a slice of traffic (low statistical power)
  • Not segmenting analysis (overall lift can hide negative impact on a segment)
Test parameters to define before launch:
  • Primary metric (one)
  • Guardrail metrics (do not go down)
  • Sample size
  • Duration (minimum and maximum)
  • Decision criteria (when to ship, when to kill, when to extend)
  • Segments to analyze in addition to overall
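
To see why segments belong in the plan, the sketch below computes relative lift per segment and overall. All counts are hypothetical and the function names are illustrative. In this made-up data the variant wins overall while losing on mobile.

```python
def relative_lift(control_conv: int, control_n: int,
                  variant_conv: int, variant_n: int) -> float:
    """Relative change in conversion rate, variant vs control."""
    control_rate = control_conv / control_n
    variant_rate = variant_conv / variant_n
    return (variant_rate - control_rate) / control_rate


def lift_report(segments: dict) -> dict:
    """Lift per segment plus the pooled 'overall' lift."""
    report = {}
    totals = {"control": [0, 0], "variant": [0, 0]}
    for name, arms in segments.items():
        (cc, cn), (vc, vn) = arms["control"], arms["variant"]
        report[name] = relative_lift(cc, cn, vc, vn)
        for arm, (conv, n) in arms.items():
            totals[arm][0] += conv
            totals[arm][1] += n
    report["overall"] = relative_lift(*totals["control"], *totals["variant"])
    return report


# Hypothetical counts: (conversions, visitors) per arm.
segments = {
    "desktop": {"control": (500, 10_000), "variant": (600, 10_000)},
    "mobile":  {"control": (300, 10_000), "variant": (270, 10_000)},
}

for name, lift in lift_report(segments).items():
    print(f"{name:<8} {lift:+.1%}")
```

Per-segment lifts need their own significance checks before acting on them; this sketch only shows the direction of the effect.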

4. Decide

After the test concludes.
Decision framework:
| Outcome | Decision |
| --- | --- |
| Variant clearly wins (>95% significance, exceeds minimum effect) | Ship variant. Document. Continue testing. |
| Variant clearly loses | Kill. Capture the lesson. Iterate hypothesis. |
| Inconclusive (neither significant) | Larger test, different angle, or move on. Don't ship "tied" variants. |
| Small lift, lots of variance | Probably not worth shipping. Even if "winner," may not replicate. |
| Wins overall, loses for important segment | Investigate segment. Consider segment-specific solution. |
Anti-patterns:
  • "It looks like it's winning, ship it" before reaching significance
  • Shipping a variant because the team wants to (HiPPO - highest paid person's opinion)
  • Killing tests too early because they look bad
  • Re-running tests until they "win" (false positive risk)
  • Not capturing the learning when a test loses
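
Writing the decision criteria down as code before launch is one way to resist the anti-patterns above. The sketch below is a simplified, hypothetical mapping of the decision framework onto a function: the name `decide`, its parameters, and the returned strings are assumptions, and "small lift, lots of variance" is approximated as "significant but below the minimum effect".

```python
def decide(p_value: float, observed_lift: float, min_effect: float,
           losing_segments: tuple[str, ...] = (), alpha: float = 0.05) -> str:
    """Map pre-registered criteria onto a decision (simplified sketch)."""
    significant = p_value < alpha
    if significant and observed_lift < 0:
        return "kill"
    if significant and observed_lift >= min_effect:
        if losing_segments:
            # Wins overall but loses for an important segment.
            return "investigate segments: " + ", ".join(losing_segments)
        return "ship"
    if significant:
        # Detectable, but smaller than the minimum effect worth shipping.
        return "probably not worth shipping"
    return "inconclusive: larger test, different angle, or move on"


print(decide(p_value=0.01, observed_lift=0.08, min_effect=0.05))
print(decide(p_value=0.30, observed_lift=0.03, min_effect=0.05))
print(decide(p_value=0.01, observed_lift=0.08, min_effect=0.05,
             losing_segments=("mobile",)))
```

Call this only once, after the planned sample and minimum duration are reached; running it on interim data reintroduces the peeking problem.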

Statistical foundations

Significance and confidence

A 95% confidence threshold (a significance level of 5%) means: if there were truly no difference between variants, there's only a 5% chance you'd see results this extreme by chance.
That's not the same as "95% chance the variant wins."
Many CRO tools instead report Bayesian probabilities ("95% chance of being best"), which answer a different question. Read the methodology your tool uses.
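
The difference between the two readings can be made concrete. The sketch below computes a frequentist p-value (pooled two-proportion z-test) and a Bayesian "probability the variant is better" (Monte Carlo over Beta posteriors with uniform Beta(1, 1) priors) on the same hypothetical counts. The function names, the prior, and the data are assumptions; real tools may use different priors or tests.

```python
import random
from math import sqrt
from statistics import NormalDist


def two_sided_p_value(c_conv: int, c_n: int, v_conv: int, v_n: int) -> float:
    """Pooled two-proportion z-test: P(result this extreme | no true difference)."""
    pooled = (c_conv + v_conv) / (c_n + v_n)
    se = sqrt(pooled * (1 - pooled) * (1 / c_n + 1 / v_n))
    z = (v_conv / v_n - c_conv / c_n) / se
    return 2 * (1 - NormalDist().cdf(abs(z)))


def prob_variant_better(c_conv: int, c_n: int, v_conv: int, v_n: int,
                        draws: int = 20_000, seed: int = 0) -> float:
    """Monte Carlo estimate of P(variant rate > control rate), Beta(1, 1) priors."""
    rng = random.Random(seed)
    wins = 0
    for _ in range(draws):
        control = rng.betavariate(1 + c_conv, 1 + c_n - c_conv)
        variant = rng.betavariate(1 + v_conv, 1 + v_n - v_conv)
        wins += variant > control
    return wins / draws


# Hypothetical result: control 500/10,000, variant 560/10,000.
print(f"p-value:            {two_sided_p_value(500, 10_000, 560, 10_000):.3f}")
print(f"P(variant better):  {prob_variant_better(500, 10_000, 560, 10_000):.3f}")
```

On these counts the p-value lands just above 0.05 (not significant at the 5% level) while the Bayesian probability is above 95%, so the same data can look like a miss in one tool and a win in another.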

Sample size

Conversion testing needs more sample than people intuit. Quick reference:
| Baseline rate | Minimum detectable effect | Sample per variant |
| --- | --- | --- |
| 2% | 10% relative lift | ~75,000 |
| 2% | 20% relative lift | ~19,000 |
| 5% | 10% relative lift | ~30,000 |
| 5% | 20% relative lift | ~7,500 |
| 10% | 10% relative lift | ~14,000 |
| 10% | 20% relative lift | ~3,500 |
(Approximate. Use a calculator.)
If your monthly visitors per variant don't reach these numbers, A/B testing won't produce reliable results. Iterate via design and qualitative research instead.
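
A quick feasibility check follows from the required sample and your traffic. The sketch below assumes an even traffic split and uses an 8-week cap, taken from the typical 2 to 8 week test window mentioned under required inputs; the function names and the example traffic figure are hypothetical.

```python
from math import ceil


def weeks_to_reach_sample(sample_per_variant: int, weekly_visitors: int,
                          variants: int = 2) -> int:
    """Whole weeks needed when traffic is split evenly across all variants."""
    visitors_per_variant_per_week = weekly_visitors / variants
    return ceil(sample_per_variant / visitors_per_variant_per_week)


def is_testable(sample_per_variant: int, weekly_visitors: int,
                max_weeks: int = 8, variants: int = 2) -> bool:
    """True if the sample is reachable within max_weeks (8 is an assumed cap)."""
    return weeks_to_reach_sample(sample_per_variant, weekly_visitors, variants) <= max_weeks


# Hypothetical page with 12,000 visitors per week, control plus one variant.
for needed in (30_000, 75_000):
    weeks = weeks_to_reach_sample(needed, 12_000)
    print(f"{needed:,} per variant: {weeks} weeks, testable={is_testable(needed, 12_000)}")
```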

Multiple testing

The more variants and metrics tested simultaneously, the more false positives. Adjust significance thresholds for multiple comparisons (Bonferroni or similar).
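
As a minimal sketch of the Bonferroni adjustment: divide the significance level by the number of comparisons, and require each p-value to clear the stricter threshold. The helper names and p-values below are hypothetical. `family_wise_error` shows the uncorrected risk of at least one false positive, assuming independent comparisons.

```python
def bonferroni_alpha(alpha: float, comparisons: int) -> float:
    """Per-comparison threshold that keeps the overall error rate at alpha."""
    return alpha / comparisons


def family_wise_error(alpha: float, comparisons: int) -> float:
    """Chance of at least one false positive with no correction (independent tests)."""
    return 1 - (1 - alpha) ** comparisons


def significant_after_correction(p_values: dict[str, float],
                                 alpha: float = 0.05) -> dict[str, bool]:
    """Flag which comparisons survive the Bonferroni-adjusted threshold."""
    threshold = bonferroni_alpha(alpha, len(p_values))
    return {name: p < threshold for name, p in p_values.items()}


# Hypothetical p-values for three variants tested against one control.
p_values = {"variant_b": 0.030, "variant_c": 0.012, "variant_d": 0.200}

print(f"uncorrected false-positive risk: {family_wise_error(0.05, 3):.1%}")
print(significant_after_correction(p_values))
```

Here `variant_b` would pass an uncorrected 0.05 threshold but fails the adjusted one. Bonferroni is conservative; less strict corrections exist, which is what "or similar" leaves room for.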

Workflow

  1. Audit. Quantitative + qualitative + heuristic.
  2. Generate hypotheses. From audit findings. Apply hypothesis structure.
  3. Prioritize. ICE or PIE. Top 3 to 5 to test next.
  4. Design the test. Sample size, duration, primary and guardrail metrics, decision criteria.
  5. Implement. Build variants. QA carefully (broken variants invalidate tests).
  6. Run. Don't peek. Don't stop early.
  7. Analyze. Overall and by segment. Note interesting patterns regardless of significance.
  8. Decide. Ship, kill, or extend.
  9. Document. Hypothesis, design, results, decision, lesson.
  10. Compound. Apply lessons to next round of hypotheses.

Failure patterns

  • Testing without audit. Random changes, random results.
  • Vague hypotheses. "Make it better" is not a hypothesis.
  • Peeking and early stopping. Bias toward false positives.
  • Underpowered tests. Not enough sample for a real conclusion.
  • HiPPO override. Highest paid person's opinion overrides the data.
  • Testing during atypical periods. Holidays distort results.
  • Single metric obsession. Conversion goes up but average order value craters. Net loss.
  • No guardrail metrics. Testing for one outcome, missing damage to others.
  • Documentation gap. Wins captured, losses forgotten. Same hypothesis re-tested 3 times.
  • Treating each test in isolation. Compounding learning across tests is where CRO programs really win.
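
The single-metric trap is easy to check with arithmetic: revenue per visitor is conversion rate times average order value. The numbers below are hypothetical and only illustrate how a conversion "win" can be a net loss when average order value is not tracked as a guardrail.

```python
def revenue_per_visitor(conversion_rate: float, average_order_value: float) -> float:
    """Expected revenue from one visitor."""
    return conversion_rate * average_order_value


# Hypothetical test result: conversion up 10%, average order value down 15%.
control = revenue_per_visitor(conversion_rate=0.040, average_order_value=80.0)
variant = revenue_per_visitor(conversion_rate=0.044, average_order_value=68.0)

change = (variant - control) / control
print(f"control: {control:.2f}  variant: {variant:.2f}  change: {change:+.1%}")
```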

Output format

Default output: a markdown test plan at cro-test-[hypothesis-slug].md per test. After the test runs, append the results section.
Structure:

Test: [Hypothesis short name]

Hypothesis

Because [observation], we believe that [change] will produce [outcome] for [segment], because [reason].

Audit evidence

[What evidence supports this hypothesis]

Test design

  • Primary metric:
  • Guardrail metrics:
  • Sample size required:
  • Duration: minimum X, maximum Y
  • Variant traffic split:
  • Segments to analyze:

Decision criteria

  • Ship if: [conditions]
  • Kill if: [conditions]
  • Extend if: [conditions]

Results (filled after test)

  • Sample reached:
  • Duration actual:
  • Primary metric: [variant vs control + significance]
  • Guardrail metrics: [results]
  • Segment analysis: [findings]

Decision

[Ship / Kill / Extend / Iterate] - [Why]

Lesson

[What this teaches us, regardless of outcome]

---
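
The cro-test-[hypothesis-slug].md naming convention can be applied mechanically. The helper below is a hypothetical sketch; the slug rule (lowercase, runs of non-alphanumeric characters collapsed to a single hyphen) is an assumption, since the convention itself is not spelled out here.

```python
import re


def plan_filename(hypothesis_name: str) -> str:
    """Build a test plan file name from a hypothesis short name."""
    # Assumed slug rule: lowercase, non-alphanumeric runs become one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", hypothesis_name.lower()).strip("-")
    return f"cro-test-{slug}.md"


print(plan_filename("Show shipping cost on product page"))
# prints "cro-test-show-shipping-cost-on-product-page.md"
```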

Reference files

  • references/hypothesis-library.md - Common high-impact hypothesis patterns by funnel stage.