cro-optimization
CRO Optimization
Run conversion rate optimization as a structured discipline: audit → hypothesize → test → decide. Stack-agnostic. Tool-agnostic.
This skill is for running tests against existing pages and flows. For writing landing page copy from scratch, use `landing-page-copy`. For setting up the analytics that make CRO possible, use `analytics-strategy`.
When to use
- Traffic is converting at a lower rate than expected
- A specific funnel step has high drop-off
- High-traffic pages could move the needle if optimized
- A/B testing infrastructure exists (or can be set up)
- There are open questions about statistical significance and sample size
When NOT to use
- Without sufficient traffic to test (under ~5,000 monthly visitors per variant)
- Pre-launch (no users to test on yet)
- Strategy or messaging-level questions that need qualitative research first
- Brand-defining choices (CRO can't optimize a fundamentally wrong brand)
Required inputs
- The page or flow under optimization
- Current conversion rate and traffic volume
- Access to analytics (event tracking, funnel data)
- An A/B testing tool (or willingness to set one up)
- Time and budget for testing (typically 2 to 8 weeks per test)
The framework: 4 phases
1. Audit
Diagnose before treating.
Quantitative audit:
- Funnel data. Where are users dropping off? The biggest drop is the biggest opportunity.
- Segmentation. Does the funnel perform differently by source, device, geography, audience type?
- Performance data. Are slow pages dragging conversions?
- Search Console / on-site search. What are users looking for that they can't find?
Qualitative audit:
- Session replay. Watch 20+ sessions of users on the target flow. Note friction, confusion, hesitation.
- Heatmaps. Where do users click? Where do they scroll? Where do they not?
- User interviews / surveys. Why did users not convert? Survey people who started but abandoned.
- Form analytics. Which fields cause abandonment? Which cause errors?
- Customer support tickets. What conversion-related questions come in?
Heuristic audit:
- Apply CRO heuristics to the flow:
  - Is the value proposition clear in 5 seconds?
  - Is there a single primary CTA per page?
  - Is the form length appropriate to the offer?
  - Is trust/social proof present?
  - Are objections handled?
  - Is the page accessible? (Accessibility issues hurt conversion silently.)
The audit produces a list of suspected friction points. Each becomes a hypothesis candidate.
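The funnel check in the quantitative audit can be sketched in a few lines. This is a minimal illustration, not a prescribed method: the function name `biggest_drop_off`, the step names, and the user counts are all hypothetical.

```python
from typing import List, Tuple


def biggest_drop_off(funnel: List[Tuple[str, int]]) -> Tuple[str, str, float]:
    """Return (from_step, to_step, drop_off_rate) for the largest step-to-step loss."""
    worst = ("", "", 0.0)
    for (name_a, users_a), (name_b, users_b) in zip(funnel, funnel[1:]):
        if users_a == 0:
            continue  # nobody reached this step, so there is no rate to compute
        drop = 1 - users_b / users_a
        if drop > worst[2]:
            worst = (name_a, name_b, drop)
    return worst


# Hypothetical counts of users reaching each step, for illustration only.
funnel = [
    ("product_page", 10_000),
    ("add_to_cart", 6_000),
    ("shipping", 4_800),
    ("payment", 1_920),
    ("purchase", 1_728),
]

step_from, step_to, rate = biggest_drop_off(funnel)
print(f"Largest drop-off: {step_from} -> {step_to} ({rate:.0%})")
```

Run the same calculation per segment (source, device, geography) to see whether the worst step is the same everywhere.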
2. Hypothesis
A testable statement.
Hypothesis structure:
Because [observation from audit], we believe that [change] will produce [predicted outcome] for [user segment], because [reason].
Example:
Because session replays show users abandoning at the shipping step (audit), we believe that adding visible shipping cost to the product page (change) will increase add-to-cart conversion by 5 percent (outcome) for desktop users (segment), because users are surprised by shipping cost and abandon (reason).
Hypothesis quality criteria:
- Specific change (not "improve the design")
- Measurable outcome (with a target)
- Grounded in evidence (audit, research, prior tests)
- Tied to a known mechanism (why would this work?)
Hypothesis prioritization (ICE shown here; PIE scores Potential, Importance, and Ease the same way):
- Impact: How much could this move the metric?
- Confidence: How likely is the hypothesis to be right?
- Ease: How easy to test? (Time, complexity, risk)
Score each 1 to 10. Test the hypotheses with the highest combined scores first.
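A small sketch of ICE scoring, assuming "combined" means the mean of the three scores (some teams multiply them instead). The `Hypothesis` class, the `prioritize` function, and the example scores are hypothetical.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Hypothesis:
    name: str
    impact: int      # 1 to 10
    confidence: int  # 1 to 10
    ease: int        # 1 to 10

    def ice_score(self) -> float:
        # Assumption: the combined score is the mean of the three inputs.
        return (self.impact + self.confidence + self.ease) / 3


def prioritize(hypotheses: List[Hypothesis]) -> List[Hypothesis]:
    """Return hypotheses sorted from highest to lowest ICE score."""
    for h in hypotheses:
        for value in (h.impact, h.confidence, h.ease):
            if not 1 <= value <= 10:
                raise ValueError(f"{h.name}: scores must be between 1 and 10")
    return sorted(hypotheses, key=lambda h: h.ice_score(), reverse=True)


# Hypothetical backlog, for illustration only.
backlog = [
    Hypothesis("Redesign checkout", impact=9, confidence=4, ease=2),
    Hypothesis("Show shipping cost on product page", impact=8, confidence=7, ease=6),
    Hypothesis("Shorten signup form", impact=6, confidence=5, ease=9),
]

for h in prioritize(backlog):
    print(f"{h.ice_score():.1f}  {h.name}")
```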
3. Test design
A test that produces an unambiguous answer.
Sample size and duration:
Use a sample size calculator (most A/B tools have one) before launching. Inputs:
- Baseline conversion rate
- Minimum detectable effect (the smallest lift you'd care about)
- Statistical power (typically 80%)
- Significance level (typically 5%, i.e. 95% confidence)
This produces the required sample size per variant. Run the test until that sample is reached AND for at least one full business cycle (typically 2 weeks minimum, to cover weekends and weekly patterns), whichever takes longer.
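A rough way to turn a required sample size into a planned duration. This sketch assumes you take the longer of "days to reach the sample" and the 2-week minimum, then round up to whole weeks so every weekday is covered equally. The function name `planned_duration_days` and its parameters are hypothetical.

```python
import math


def planned_duration_days(
    sample_per_variant: int,
    variants: int,
    daily_visitors: float,
    traffic_share: float = 1.0,  # fraction of visitors entered into the test
    min_days: int = 14,          # minimum business cycle from the text above
) -> int:
    """Days to run: enough for the sample, at least min_days, in whole weeks."""
    daily_in_test = daily_visitors * traffic_share
    if daily_in_test <= 0:
        raise ValueError("daily traffic in the test must be positive")
    days_for_sample = math.ceil(sample_per_variant * variants / daily_in_test)
    days = max(days_for_sample, min_days)
    return math.ceil(days / 7) * 7  # round up to full weeks


# Hypothetical traffic figures, for illustration only.
print(planned_duration_days(30_000, variants=2, daily_visitors=3_000))
print(planned_duration_days(7_500, variants=2, daily_visitors=5_000))
```

If the result is far beyond 8 weeks, the test is likely underpowered for the chosen minimum detectable effect.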
Common test setup mistakes:
- Stopping the test the moment significance is hit (peeking)
- Running tests for too short a period to capture a full business cycle
- Running multiple overlapping tests on the same flow
- Testing during atypical periods (Black Friday, holidays, major campaigns)
- Excluding mobile when 50%+ of traffic is mobile (or vice versa)
- Testing on too small a slice of traffic (low statistical power)
- Not segmenting analysis (overall lift can hide negative impact on a segment)
Test parameters to define before launch:
- Primary metric (one)
- Guardrail metrics (must not get worse)
- Sample size
- Duration (minimum and maximum)
- Decision criteria (when to ship, when to kill, when to extend)
- Segments to analyze in addition to overall
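One way to enforce "define before launch" is to hold the parameters in a small record and check it for gaps. This is an illustrative sketch only: the `ExperimentPlan` class, its field names, and the example values are hypothetical, not part of any A/B tool.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class ExperimentPlan:
    primary_metric: str             # exactly one
    guardrail_metrics: List[str]
    sample_per_variant: int
    min_days: int
    max_days: int
    ship_if: str
    kill_if: str
    extend_if: str
    segments: List[str] = field(default_factory=list)

    def problems(self) -> List[str]:
        """Return a list of gaps that should block launch (empty means ready)."""
        issues = []
        if not self.primary_metric.strip():
            issues.append("primary metric is missing")
        if not self.guardrail_metrics:
            issues.append("no guardrail metrics defined")
        if self.sample_per_variant <= 0:
            issues.append("sample size must be positive")
        if self.min_days > self.max_days:
            issues.append("minimum duration exceeds maximum duration")
        for label, rule in (("ship", self.ship_if), ("kill", self.kill_if),
                            ("extend", self.extend_if)):
            if not rule.strip():
                issues.append(f"{label} criterion is missing")
        return issues


# Hypothetical plan, for illustration only.
plan = ExperimentPlan(
    primary_metric="add_to_cart_rate",
    guardrail_metrics=["average_order_value", "refund_rate"],
    sample_per_variant=30_000,
    min_days=14,
    max_days=42,
    ship_if="significant and lift at or above the minimum effect",
    kill_if="significant negative lift",
    extend_if="sample not reached by the minimum duration",
    segments=["desktop", "mobile"],
)
print(plan.problems())
```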
4. Decide
After the test concludes.
Decision framework:
| Outcome | Decision |
|---|---|
| Variant clearly wins (>95% significance, exceeds minimum effect) | Ship variant. Document. Continue testing. |
| Variant clearly loses | Kill. Capture the lesson. Iterate hypothesis. |
| Inconclusive (neither significant) | Larger test, different angle, or move on. Don't ship "tied" variants. |
| Small lift, lots of variance | Probably not worth shipping. Even if "winner," may not replicate. |
| Wins overall, loses for important segment | Investigate segment. Consider segment-specific solution. |
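The table can be approximated as a small decision function. This is a simplified sketch: the name `decide` and its parameters are hypothetical, and treating "significant but below the minimum effect" as the "small lift" row is an assumption. Real decisions should also weigh guardrail metrics.

```python
def decide(
    p_value: float,
    relative_lift: float,       # e.g. 0.06 for a 6% relative lift
    min_effect: float,          # minimum detectable effect chosen before launch
    alpha: float = 0.05,
    key_segment_loses: bool = False,
) -> str:
    """Map a finished test's result to one of the outcomes in the table."""
    if p_value >= alpha:
        return "inconclusive: run a larger test, try a different angle, or move on"
    if relative_lift < 0:
        return "kill: capture the lesson and iterate the hypothesis"
    if relative_lift < min_effect:
        # Assumption: a significant lift below the minimum effect is "small lift".
        return "hold: lift is below the minimum effect and may not replicate"
    if key_segment_loses:
        return "investigate: consider a segment-specific solution"
    return "ship: document the result and continue testing"


print(decide(p_value=0.01, relative_lift=0.08, min_effect=0.05))
print(decide(p_value=0.30, relative_lift=0.08, min_effect=0.05))
```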
Anti-patterns:
- "It looks like it's winning, ship it" before reaching significance
- Shipping a variant because the team wants to (HiPPO - highest paid person's opinion)
- Killing tests too early because they look bad
- Re-running tests until they "win" (false positive risk)
- Not capturing the learning when a test loses
Statistical foundations
Significance and confidence
A 95% confidence level (a 5% significance level) means: if there were truly no difference between variants, there's only a 5% chance you'd see results this extreme by chance.
That's not the same as "95% chance the variant wins."
Many CRO tools report Bayesian probabilities instead ("95% chance of being best"), which is a different quantity. Read the methodology your tool uses.
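For the frequentist reading, a pooled two-proportion z-test is one common way to get the p-value. The sketch below uses only the Python standard library; the function name `two_proportion_p_value` and the example counts are hypothetical, and your tool may use a different test.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_p_value(conv_a: int, n_a: int, conv_b: int, n_b: int) -> float:
    """Two-sided p-value for the difference between two conversion rates."""
    p_a = conv_a / n_a
    p_b = conv_b / n_b
    pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        raise ValueError("cannot test when the pooled rate is 0 or 1")
    z = (p_b - p_a) / se
    # Probability of a result at least this extreme if there is no true difference.
    return 2 * (1 - NormalDist().cdf(abs(z)))


# Hypothetical counts: control 500/10,000 vs variant 570/10,000.
p = two_proportion_p_value(500, 10_000, 570, 10_000)
print(f"p-value: {p:.3f}  significant at 5%: {p < 0.05}")
```

The p-value answers "how surprising is this if nothing changed?", not "how likely is the variant to be better?".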
Sample size
Conversion testing needs a larger sample than most people expect. Quick reference:
| Baseline rate | Minimum detectable effect | Sample per variant |
|---|---|---|
| 2% | 10% relative lift | ~75,000 |
| 2% | 20% relative lift | ~19,000 |
| 5% | 10% relative lift | ~30,000 |
| 5% | 20% relative lift | ~7,500 |
| 10% | 10% relative lift | ~14,000 |
| 10% | 20% relative lift | ~3,500 |
(Approximate. Use a calculator.)
If your monthly traffic per variant doesn't reach these numbers, A/B testing won't produce reliable results. Iterate via design and qualitative research instead.
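The quick-reference figures can be approximated with the standard normal-approximation formula for comparing two proportions. This sketch assumes a two-sided test at 5% significance and 80% power. The function name `sample_size_per_variant` is hypothetical, and results land in the same ballpark as the table rather than matching it exactly, since calculators differ in the formula they use.

```python
from math import ceil
from statistics import NormalDist


def sample_size_per_variant(
    baseline: float,        # e.g. 0.05 for a 5% conversion rate
    relative_lift: float,   # e.g. 0.10 for a 10% relative lift
    alpha: float = 0.05,    # two-sided significance level
    power: float = 0.80,
) -> int:
    """Approximate visitors needed per variant to detect the given lift."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)
    z_beta = NormalDist().inv_cdf(power)
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    return ceil((z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2)


for baseline in (0.02, 0.05, 0.10):
    for lift in (0.10, 0.20):
        n = sample_size_per_variant(baseline, lift)
        print(f"baseline {baseline:.0%}, lift {lift:.0%}: ~{n:,} per variant")
```

Halving the minimum detectable effect roughly quadruples the required sample, which is why small expected lifts are so expensive to test.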
Multiple testing
The more variants and metrics tested simultaneously, the more false positives. Adjust significance thresholds for multiple comparisons (Bonferroni or similar).
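A minimal Bonferroni sketch: divide the significance level by the number of comparisons and hold every p-value to that stricter threshold. The function name `bonferroni_significant` and the example p-values are hypothetical.

```python
from typing import Dict


def bonferroni_significant(p_values: Dict[str, float], alpha: float = 0.05) -> Dict[str, bool]:
    """Flag which comparisons stay significant after Bonferroni correction."""
    if not p_values:
        return {}
    threshold = alpha / len(p_values)  # stricter per-comparison threshold
    return {name: p < threshold for name, p in p_values.items()}


# Hypothetical p-values for three variants against the same control.
results = bonferroni_significant(
    {"variant_b": 0.030, "variant_c": 0.012, "variant_d": 0.200}
)
print(results)
```

Here `variant_b` would pass an uncorrected 0.05 threshold but fails the corrected one (0.05 / 3), which is the false-positive protection the correction buys. Bonferroni is conservative; other corrections trade some of that protection for power.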
Workflow
- Audit. Quantitative + qualitative + heuristic.
- Generate hypotheses. From audit findings. Apply hypothesis structure.
- Prioritize. ICE or PIE. Top 3 to 5 to test next.
- Design the test. Sample size, duration, primary and guardrail metrics, decision criteria.
- Implement. Build variants. QA carefully (broken variants invalidate tests).
- Run. Don't peek. Don't stop early.
- Analyze. Overall and by segment. Note interesting patterns regardless of significance.
- Decide. Ship, kill, or extend.
- Document. Hypothesis, design, results, decision, lesson.
- Compound. Apply lessons to next round of hypotheses.
Failure patterns
- Testing without audit. Random changes, random results.
- Vague hypotheses. "Make it better" is not a hypothesis.
- Peeking and early stopping. Bias toward false positives.
- Underpowered tests. Not enough sample for a real conclusion.
- HiPPO override. Highest paid person's opinion overrides the data.
- Testing during atypical periods. Holidays distort results.
- Single metric obsession. Conversion goes up but average order value craters. Net loss.
- No guardrail metrics. Testing for one outcome, missing damage to others.
- Documentation gap. Wins captured, losses forgotten. Same hypothesis re-tested 3 times.
- Treating each test in isolation. Compounding learning across tests is where CRO programs really win.
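The "single metric obsession" pattern above is easy to see with arithmetic. In this sketch, revenue per visitor acts as the guardrail; the function name and all figures are hypothetical.

```python
def revenue_per_visitor(conversion_rate: float, average_order_value: float) -> float:
    """Expected revenue from one visitor."""
    return conversion_rate * average_order_value


# Hypothetical results: the variant converts 10% better but orders are 15% smaller.
control = revenue_per_visitor(conversion_rate=0.040, average_order_value=80.0)
variant = revenue_per_visitor(conversion_rate=0.044, average_order_value=68.0)

change = variant / control - 1
print(f"control: {control:.2f}  variant: {variant:.2f}  change: {change:+.1%}")
```

The variant "wins" on conversion rate and still loses money per visitor, which is what a guardrail metric is there to catch.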
Output format
Default output: a markdown test plan at `cro-test-[hypothesis-slug].md` per test. After the test runs, append the results section.
Structure:
Test: [Hypothesis short name]
Hypothesis
Because [observation], we believe that [change] will produce [outcome] for [segment], because [reason].
Audit evidence
[What evidence supports this hypothesis]
Test design
- Primary metric:
- Guardrail metrics:
- Sample size required:
- Duration: minimum X, maximum Y
- Variant traffic split:
- Segments to analyze:
Decision criteria
- Ship if: [conditions]
- Kill if: [conditions]
- Extend if: [conditions]
Results (filled after test)
- Sample reached:
- Duration actual:
- Primary metric: [variant vs control + significance]
- Guardrail metrics: [results]
- Segment analysis: [findings]
Decision
[Ship / Kill / Extend / Iterate] - [Why]
Lesson
[What this teaches us, regardless of outcome]
---
Reference files
- `references/hypothesis-library.md` - Common high-impact hypothesis patterns by funnel stage.