ab-message-testing

A/B Message Testing for Sales Bots

You are an expert in building automated testing systems for sales bots. Your goal is to help design systems that automatically test message variations to optimize conversion rates.

Initial Assessment

Before providing guidance, understand:
  1. Context
    • What volume of conversations does your bot handle?
    • What outcomes are you trying to optimize?
    • What messages are currently underperforming?
  2. Current State
    • Are you running any tests today?
    • How do you decide what messages to send?
    • What data do you have on message performance?
  3. Goals
    • What would better testing help you achieve?
    • What metrics matter most?

Core Principles

1. Test Everything That Matters

  • Small changes can have big impacts
  • Don't assume you know what works
  • Let data decide

2. Statistical Rigor

  • Large enough sample size
  • Long enough duration
  • Proper randomization

3. One Variable at a Time

  • Isolate what changed
  • Otherwise you don't know what worked
  • Test sequentially, not simultaneously

4. Continuous Optimization

  • Testing is ongoing
  • Winners become the new baseline
  • Always be testing something

What to Test

Message Content

Opening messages:
  • Greeting style
  • Value proposition
  • Question vs. statement
  • Personalization level
Response messages:
  • Tone and voice
  • Length
  • Structure
  • CTAs
Objection responses:
  • Acknowledgment style
  • Reframe approach
  • Proof points
  • Follow-up questions

Message Structure

Length:
  • Short vs. detailed
  • Single message vs. chunked
  • Number of sentences
Format:
  • With vs. without bullets
  • With vs. without emoji
  • Question at end vs. not
Tone:
  • Formal vs. casual
  • Enthusiastic vs. calm
  • Direct vs. soft

Conversation Flow

Question order:
  • Qualification order
  • Easy first vs. hard first
  • Gradual build-up vs. direct questions
Branching:
  • Different paths
  • Skip logic
  • Progressive disclosure

Test Architecture

Basic A/B Test

          Contact arrives
                 ↓
     Random assignment (50/50)
          ┌──────┴──────┐
          ↓             ↓
      Variant A     Variant B
          ↓             ↓
        Track         Track
          └──────┬──────┘
                 ↓
          Analyze results
                 ↓
          Implement winner

Multi-Variant Test

When to use:
  • High volume
  • Testing multiple ideas
  • Want faster learning
Structure:
  • Control: 40%
  • Variant A: 20%
  • Variant B: 20%
  • Variant C: 20%
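
A minimal sketch of the 40/20/20/20 split above, assuming the same hash-bucket approach as the randomization function in the Implementation section: cumulative thresholds of 40, 60, 80 and 100 send buckets 0-39 to the control, 40-59 to A, 60-79 to B and 80-99 to C. The test ID `T-multi` and the `contact-N` IDs are made up for illustration.

```python
import hashlib
from collections import Counter

# Hypothetical multi-variant configuration: control 40%, three challengers 20% each
VARIANTS = [
    {"name": "control", "percentage": 40},
    {"name": "A", "percentage": 20},
    {"name": "B", "percentage": 20},
    {"name": "C", "percentage": 20},
]


def validate_allocation(variants):
    """Reject configurations whose percentages do not cover all 100 buckets."""
    total = sum(v["percentage"] for v in variants)
    if total != 100:
        raise ValueError(f"percentages sum to {total}, expected 100")


def assign(contact_id, test_id, variants):
    """Deterministically map a contact to a variant via an MD5 bucket in 0..99."""
    digest = hashlib.md5(f"{contact_id}:{test_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100
    cumulative = 0
    for v in variants:
        cumulative += v["percentage"]
        if bucket < cumulative:
            return v["name"]
    raise ValueError("percentages must sum to 100")


validate_allocation(VARIANTS)
counts = Counter(assign(f"contact-{i}", "T-multi", VARIANTS) for i in range(10_000))
shares = {name: counts[name] / 10_000 for name in ("control", "A", "B", "C")}
print(shares)  # close to 0.4 / 0.2 / 0.2 / 0.2, not exact
```

Because the test ID is part of the hash input, a contact's bucket in one test is effectively independent of its bucket in another test.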

Sequential Testing

When to use:
  • Lower volume
  • Need faster decisions
  • Willing to accept more risk
Structure:
  • Monitor continuously
  • Stop when clear winner emerges
  • Use adaptive algorithms

Implementation

Randomization

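import hashlib

def assign_variant(contact_id, test_id, variants):
    # Consistent assignment (same contact always gets same variant for a given test).
    # The ":" separator keeps ("12", "3") and ("1", "23") from hashing identically.
    digest = hashlib.md5(f"{contact_id}:{test_id}".encode("utf-8")).hexdigest()
    bucket = int(digest, 16) % 100  # md5 yields hex text; convert to an int in 0..99

    cumulative = 0
    for variant in variants:
        cumulative += variant["percentage"]
        if bucket < cumulative:
            return variant["name"]
    raise ValueError("variant percentages must sum to 100")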

Message Selection

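def get_message(context, message_key):
    # Check for active test (get_active_test / get_default_message are app-specific lookups)
    test = get_active_test(message_key)
    if test is None:
        return get_default_message(message_key)

    # Get variant assignment (assign_variant returns the variant's name)
    variant_name = assign_variant(context["contact_id"], test["id"], test["variants"])

    # Return variant message: variants is a list, so look the variant up by name
    variant = next(v for v in test["variants"] if v["name"] == variant_name)
    return variant["message"]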

Result Tracking

from datetime import datetime, timezone

def track_result(contact_id, test_id, variant, outcome):
    result = {
        "contact_id": contact_id,
        "test_id": test_id,
        "variant": variant,
        "outcome": outcome,  # responded, converted, dropped, etc.
        "timestamp": datetime.now(timezone.utc),
    }
    store(result)  # store / update_test_stats are app-specific persistence hooks
    update_test_stats(test_id, variant, outcome)
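
The significance test later in this section works on per-variant `impressions` and `conversions` counts, so something has to turn tracked outcomes into those counts. One way to do it, sketched here under assumptions: a row is logged for every contact who receives the tested message (the outcome name `"sent"` is hypothetical), and each contact counts at most once per variant, so duplicate events do not inflate conversions. `summarize_results` is a hypothetical helper, not part of the original design.

```python
from collections import defaultdict


def summarize_results(results, test_id, success_outcome="converted"):
    """Collapse tracked result rows into per-variant counts of distinct contacts."""
    exposed = defaultdict(set)    # variant -> contacts who saw it
    succeeded = defaultdict(set)  # variant -> contacts who reached success_outcome
    for row in results:
        if row["test_id"] != test_id:
            continue
        exposed[row["variant"]].add(row["contact_id"])
        if row["outcome"] == success_outcome:
            succeeded[row["variant"]].add(row["contact_id"])
    return {
        variant: {"impressions": len(contacts), "conversions": len(succeeded[variant])}
        for variant, contacts in exposed.items()
    }


rows = [
    {"contact_id": "c1", "test_id": "T1", "variant": "control", "outcome": "sent"},
    {"contact_id": "c1", "test_id": "T1", "variant": "control", "outcome": "converted"},
    {"contact_id": "c2", "test_id": "T1", "variant": "control", "outcome": "sent"},
    {"contact_id": "c3", "test_id": "T1", "variant": "A", "outcome": "sent"},
    {"contact_id": "c3", "test_id": "T1", "variant": "A", "outcome": "converted"},
    {"contact_id": "c3", "test_id": "T1", "variant": "A", "outcome": "converted"},  # duplicate event
    {"contact_id": "c4", "test_id": "T2", "variant": "A", "outcome": "converted"},  # other test
]
print(summarize_results(rows, "T1"))
# {'control': {'impressions': 2, 'conversions': 1}, 'A': {'impressions': 1, 'conversions': 1}}
```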

Statistical Analysis

Sample Size Calculation

Inputs needed:
  • Baseline conversion rate
  • Minimum detectable effect (MDE)
  • Significance level (typically 5%, i.e. 95% confidence)
  • Statistical power (typically 80%)
Quick reference (approximate sample size per variant; lift is relative, so a 10% lift on a 5% baseline means 5.0% → 5.5%):

| Baseline rate | 10% lift | 20% lift | 50% lift |
|---------------|----------|----------|----------|
| 5%            | 30,000   | 7,500    | 1,200    |
| 10%           | 14,000   | 3,500    | 560      |
| 20%           | 6,400    | 1,600    | 260      |
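
A sketch of the standard sample-size formula for comparing two proportions with a two-sided test, using only the Python standard library; `sample_size_per_variant` is a hypothetical helper name. It takes the inputs listed above, with the MDE expressed as a relative lift. The quick-reference table appears to follow a simpler rule of thumb (roughly 16·p(1-p)/δ², using only the baseline variance); this formula also accounts for the variant's variance, so it returns somewhat larger numbers, most noticeably for large lifts (about 1,470 rather than 1,200 per variant for a 50% lift on a 5% baseline).

```python
import math
from statistics import NormalDist


def sample_size_per_variant(baseline, relative_lift, alpha=0.05, power=0.80):
    """Contacts needed in EACH variant to detect baseline -> baseline*(1+relative_lift)."""
    p1 = baseline
    p2 = baseline * (1 + relative_lift)
    z_alpha = NormalDist().inv_cdf(1 - alpha / 2)  # two-sided: 1.96 for alpha = 0.05
    z_beta = NormalDist().inv_cdf(power)           # 0.84 for 80% power
    variance = p1 * (1 - p1) + p2 * (1 - p2)
    n = (z_alpha + z_beta) ** 2 * variance / (p2 - p1) ** 2
    return math.ceil(n)


for baseline in (0.05, 0.10, 0.20):
    row = [sample_size_per_variant(baseline, lift) for lift in (0.10, 0.20, 0.50)]
    print(f"{baseline:.0%}", row)
```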

Significance Testing

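from math import sqrt
from statistics import NormalDist

def is_significant(variant_a, variant_b, confidence=0.95):
    # Two-proportion z-test with a pooled standard error
    p_a = variant_a["conversions"] / variant_a["impressions"]
    p_b = variant_b["conversions"] / variant_b["impressions"]
    p_pooled = (variant_a["conversions"] + variant_b["conversions"]) / (
        variant_a["impressions"] + variant_b["impressions"]
    )

    se = sqrt(p_pooled * (1 - p_pooled)
              * (1 / variant_a["impressions"] + 1 / variant_b["impressions"]))
    if se == 0:
        return False  # both variants at 0% or both at 100%: no detectable difference

    z = (p_b - p_a) / se

    # Two-sided critical value derived from `confidence` (1.96 for 0.95)
    z_critical = NormalDist().inv_cdf(1 - (1 - confidence) / 2)
    return abs(z) > z_critical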
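
A worked example of the same two-proportion z-test, with a two-sided p-value added via `statistics.NormalDist`. The counts (100 of 1,000 contacts converting on A, 130 of 1,000 on B) are illustrative, not real data. The pooled rate is 230 / 2,000 = 11.5%, the standard error is sqrt(0.115 × 0.885 × (1/1000 + 1/1000)) ≈ 0.0143, and z = (0.13 - 0.10) / 0.0143 ≈ 2.10, which clears the 1.96 threshold for 95% confidence.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(conv_a, n_a, conv_b, n_b):
    """Return (z, two-sided p-value) for the difference between two conversion rates."""
    p_a, p_b = conv_a / n_a, conv_b / n_b
    p_pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n_a + 1 / n_b))
    z = (p_b - p_a) / se
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


z, p = two_proportion_z_test(100, 1000, 130, 1000)
print(f"z = {z:.2f}, p = {p:.3f}")  # z = 2.10, p = 0.035
```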

When to Call a Test

Don't stop early:
  • Initial results are noisy
  • Novelty effects exist
  • Wait for full sample size
Stop when:
  • Sample size reached
  • Statistical significance achieved
  • Predetermined duration elapsed
Consider:
  • Business impact of waiting
  • Cost of wrong decision
  • Opportunity cost
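
The risk of stopping early can be shown with a small simulation, sketched here with only the standard library: run A/A tests (both arms share the same true 10% rate, so every declared winner is a false positive) and compare checking once at the end against checking after every batch and stopping at the first |z| > 1.96. The rate, batch sizes and seed are arbitrary choices for illustration. Repeated peeking typically pushes the false-positive rate well above the nominal 5% (to somewhere around 15-20% with ten looks), which is what the advice above guards against.

```python
import random
from math import sqrt


def z_stat(conv_a, n_a, conv_b, n_b):
    """Pooled two-proportion z statistic (0 when there is no variance yet)."""
    p_pooled = (conv_a + conv_b) / (n_a + n_b)
    se = sqrt(p_pooled * (1 - p_pooled) * (1 / n_a + 1 / n_b))
    return 0.0 if se == 0 else (conv_b / n_b - conv_a / n_a) / se


def false_positive_rate(looks, batch, rate=0.10, sims=1000, seed=7):
    """Share of A/A tests that declare a winner when checked after every batch."""
    rng = random.Random(seed)
    hits = 0
    for _ in range(sims):
        conv_a = conv_b = n = 0
        for _ in range(looks):
            conv_a += sum(rng.random() < rate for _ in range(batch))
            conv_b += sum(rng.random() < rate for _ in range(batch))
            n += batch
            if abs(z_stat(conv_a, n, conv_b, n)) > 1.96:
                hits += 1  # stopped early and "called" a winner that does not exist
                break
    return hits / sims


single = false_positive_rate(looks=1, batch=1000)   # one look at the full sample
peeking = false_positive_rate(looks=10, batch=100)  # same total sample, ten looks
print(f"one look: {single:.3f}, ten looks: {peeking:.3f}")
```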

Test Management

Test Lifecycle

1. Hypothesis: Document what you're testing and why. "We believe [change] will improve [metric] because [reason]."
2. Design:
  • Define variants
  • Set sample size and duration
  • Choose metrics
3. Launch:
  • Implement variants
  • Start tracking
  • Monitor for issues
4. Analyze:
  • Wait for significance
  • Check secondary metrics
  • Look for segment effects
5. Decide:
  • Implement winner
  • Document learnings
  • Plan next test

Test Documentation

Test Name: Opening Message Greeting Style
Test ID: T-2024-001
Status: Running

Hypothesis:
A casual greeting will increase response rate because
it feels more human and less corporate.

Variants:
- Control (50%): "Hello! Thanks for reaching out..."
- Variant A (50%): "Hey there! Great to hear from you..."

Primary Metric: Response rate
Secondary Metrics: Sentiment, conversion rate
Sample Size Target: 1,000 per variant
Duration: 2 weeks or until significant

Results:
[To be completed]
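
If test documentation is kept in code or a database rather than a document, the template above maps naturally onto a record type. A sketch assuming Python dataclasses; the class and field names are hypothetical, and `validate` only checks what the template makes checkable (allocation sums to 100% and a positive sample-size target).

```python
from dataclasses import dataclass, field


@dataclass
class Variant:
    name: str
    percentage: int
    message: str


@dataclass
class ExperimentRecord:
    name: str
    test_id: str
    status: str
    hypothesis: str
    variants: list[Variant]
    primary_metric: str
    secondary_metrics: list[str] = field(default_factory=list)
    sample_size_target: int = 0  # per variant
    duration_days: int = 0
    results: str = ""  # filled in after analysis

    def validate(self):
        total = sum(v.percentage for v in self.variants)
        if total != 100:
            raise ValueError(f"{self.test_id}: allocation sums to {total}%, expected 100%")
        if self.sample_size_target <= 0:
            raise ValueError(f"{self.test_id}: set a sample size target before launch")


record = ExperimentRecord(
    name="Opening Message Greeting Style",
    test_id="T-2024-001",
    status="Running",
    hypothesis="A casual greeting will increase response rate because "
               "it feels more human and less corporate.",
    variants=[
        Variant("Control", 50, "Hello! Thanks for reaching out..."),
        Variant("Variant A", 50, "Hey there! Great to hear from you..."),
    ],
    primary_metric="Response rate",
    secondary_metrics=["Sentiment", "Conversion rate"],
    sample_size_target=1000,
    duration_days=14,
)
record.validate()  # passes: 50 + 50 = 100 and a target is set
```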

Test Calendar

Always have:
  • Current test running
  • Next test planned
  • Backlog of ideas
Avoid:
  • Testing too many things at once
  • Overlapping tests on same messages
  • Testing during anomalous periods
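
The rule against overlapping tests on the same message is easy to check mechanically before launch. A sketch, assuming each planned test is described by an ID, the `message_key` it changes, and inclusive start and end dates; these field names and the sample plan are hypothetical.

```python
from datetime import date


def overlapping_tests(tests):
    """Return (id, id) pairs that change the same message during overlapping dates."""
    clashes = []
    for i, a in enumerate(tests):
        for b in tests[i + 1:]:
            same_message = a["message_key"] == b["message_key"]
            dates_overlap = a["start"] <= b["end"] and b["start"] <= a["end"]
            if same_message and dates_overlap:
                clashes.append((a["id"], b["id"]))
    return clashes


plan = [
    {"id": "T-1", "message_key": "opening", "start": date(2024, 3, 1), "end": date(2024, 3, 14)},
    {"id": "T-2", "message_key": "opening", "start": date(2024, 3, 10), "end": date(2024, 3, 24)},
    {"id": "T-3", "message_key": "pricing_cta", "start": date(2024, 3, 1), "end": date(2024, 3, 14)},
    {"id": "T-4", "message_key": "opening", "start": date(2024, 3, 15), "end": date(2024, 3, 28)},
]
print(overlapping_tests(plan))  # [('T-1', 'T-2'), ('T-2', 'T-4')]
```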

Advanced Testing

Multi-Armed Bandit

Concept: Dynamically allocate more traffic to winning variants.
Benefits:
  • Faster optimization
  • Less regret (fewer impressions to losers)
  • Continuous optimization
Trade-off:
  • Less statistical purity
  • Harder to analyze
  • May miss longer-term effects
Use when:
  • High volume
  • Speed matters
  • Clear conversion signal
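
The text does not name a specific bandit algorithm; Thompson sampling with Beta posteriors is one common choice for yes/no conversion outcomes and is sketched here as an assumption. Each round draws a plausible conversion rate for every variant from Beta(1 + conversions, 1 + non-conversions) and sends the variant with the highest draw, so traffic drifts toward the better performer while weaker variants still get occasional exposure. The true rates in the simulation are invented for illustration.

```python
import random


def thompson_choose(stats, rng):
    """Pick a variant by sampling each one's Beta posterior (uniform Beta(1, 1) prior)."""
    draws = {
        name: rng.betavariate(1 + s["conversions"], 1 + s["impressions"] - s["conversions"])
        for name, s in stats.items()
    }
    return max(draws, key=draws.get)


def simulate(true_rates, rounds=10_000, seed=42):
    """Run a simulated bandit; true_rates are unknown to the algorithm."""
    rng = random.Random(seed)
    stats = {name: {"impressions": 0, "conversions": 0} for name in true_rates}
    for _ in range(rounds):
        name = thompson_choose(stats, rng)
        stats[name]["impressions"] += 1
        if rng.random() < true_rates[name]:  # simulated contact converts
            stats[name]["conversions"] += 1
    return stats


result = simulate({"control": 0.05, "variant_a": 0.10})
print({name: s["impressions"] for name, s in result.items()})
```

The trade-offs listed above show up directly: the losing variant ends with few impressions, so its rate estimate stays imprecise, and a plain bandit like this assumes conversion rates do not drift over time.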

Personalized Testing

Concept: Different messages work for different segments.
Implementation:
  • Test within segments
  • Analyze segment interactions
  • Deploy segment-specific winners
Example:
  • Message A wins for enterprise
  • Message B wins for SMB
  • Deploy both, targeted appropriately
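
Segment-level analysis can be as simple as grouping outcomes by segment before comparing variants. The sketch below picks the variant with the highest observed conversion rate per segment; the counts are invented to mirror the enterprise/SMB example. Pooled across segments, B looks like the overall winner (45/200 vs. 40/200) even though A wins in enterprise. In practice each segment also needs its own adequate sample size and significance check, which this sketch omits.

```python
from collections import defaultdict


def winners_by_segment(results):
    """results: iterable of (segment, variant, converted). Returns segment -> best variant."""
    counts = defaultdict(lambda: defaultdict(lambda: [0, 0]))  # segment -> variant -> [conv, total]
    for segment, variant, converted in results:
        tally = counts[segment][variant]
        tally[0] += int(converted)
        tally[1] += 1
    winners = {}
    for segment, by_variant in counts.items():
        rates = {v: conv / total for v, (conv, total) in by_variant.items()}
        winners[segment] = max(rates, key=rates.get)
    return winners


data = (
    [("enterprise", "A", i < 30) for i in range(100)]    # A: 30% in enterprise
    + [("enterprise", "B", i < 20) for i in range(100)]  # B: 20%
    + [("smb", "A", i < 10) for i in range(100)]         # A: 10% in SMB
    + [("smb", "B", i < 25) for i in range(100)]         # B: 25%
)
print(winners_by_segment(data))                               # {'enterprise': 'A', 'smb': 'B'}
print(winners_by_segment(("all", v, c) for _, v, c in data))  # {'all': 'B'}
```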

Sequential Testing

Concept: Test in phases, eliminate losers early.
Process:
  1. Test 4 variants with 25% each
  2. Eliminate bottom 2
  3. Test remaining 2 with 50% each
  4. Implement winner
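
The phased process can be sketched as two small steps: rank the phase-one variants and keep the top two, then split traffic evenly among the survivors. The function names and counts below are hypothetical. This ranks by observed rate only; with small phase-one samples a true winner can be eliminated by noise, so each phase still needs a planned sample size.

```python
def phase_one_survivors(stats, keep=2):
    """stats: variant -> {"impressions", "conversions"}. Keep the `keep` best observed rates."""
    ranked = sorted(
        stats,
        key=lambda v: stats[v]["conversions"] / stats[v]["impressions"],
        reverse=True,
    )
    return ranked[:keep]


def even_allocation(names):
    """Split 100% of traffic across names; any remainder goes to the first entries."""
    base, remainder = divmod(100, len(names))
    return [
        {"name": name, "percentage": base + (1 if i < remainder else 0)}
        for i, name in enumerate(names)
    ]


phase_one = {
    "A": {"impressions": 1000, "conversions": 50},
    "B": {"impressions": 1000, "conversions": 70},
    "C": {"impressions": 1000, "conversions": 40},
    "D": {"impressions": 1000, "conversions": 65},
}
survivors = phase_one_survivors(phase_one)
print(survivors)                   # ['B', 'D']
print(even_allocation(survivors))  # [{'name': 'B', 'percentage': 50}, {'name': 'D', 'percentage': 50}]
```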

Measuring Success

Primary Metrics

Response rate: % of messages that get a response
Conversion rate: % that complete desired action (book meeting, qualify, etc.)
Engagement rate: % of conversations that continue rather than drop off
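
A sketch of computing the three primary metrics from per-conversation records. The record fields are hypothetical, and response rate is measured here as the share of conversations in which the contact replied to the tested message. Because the text does not define how long a conversation must continue to count as engaged, this sketch assumes a conversation is engaged when the contact replies at least twice; pick whatever threshold fits your funnel.

```python
def primary_metrics(conversations):
    """Each record: {"responded": bool, "converted": bool, "contact_turns": int}."""
    n = len(conversations)
    if n == 0:
        raise ValueError("no conversations to measure")
    responded = sum(1 for c in conversations if c["responded"])
    converted = sum(1 for c in conversations if c["converted"])
    engaged = sum(1 for c in conversations if c["contact_turns"] >= 2)  # assumed threshold
    return {
        "response_rate": responded / n,
        "conversion_rate": converted / n,
        "engagement_rate": engaged / n,
    }


sample = [
    {"responded": True, "converted": True, "contact_turns": 5},
    {"responded": True, "converted": False, "contact_turns": 1},
    {"responded": False, "converted": False, "contact_turns": 0},
    {"responded": True, "converted": False, "contact_turns": 3},
]
print(primary_metrics(sample))
# {'response_rate': 0.75, 'conversion_rate': 0.25, 'engagement_rate': 0.5}
```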

Secondary Metrics

Sentiment: Positive/negative reaction
Conversation length: Engagement depth
Time to conversion: Speed through funnel

Guardrail Metrics

Opt-out rate: Are we annoying people?
Complaint rate: Negative feedback
Brand perception: Are we hurting the brand?
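
Guardrails are usually checked as "the variant must not be meaningfully worse than control" rather than optimized. A sketch for the two countable guardrails, opt-outs and complaints, assuming a hypothetical tolerance of a 20% relative increase over control; brand perception usually comes from surveys or qualitative review and is left out. With small counts a raw ratio like this is noisy, so in practice it would be paired with a significance test.

```python
def guardrail_violations(control, variant, max_relative_increase=0.20):
    """control/variant: {"contacts", "opt_outs", "complaints"}. Returns metrics that got worse."""
    violations = []
    for metric in ("opt_outs", "complaints"):
        baseline_rate = control[metric] / control["contacts"]
        variant_rate = variant[metric] / variant["contacts"]
        if variant_rate > baseline_rate * (1 + max_relative_increase):
            violations.append(metric)
    return violations


control = {"contacts": 1000, "opt_outs": 10, "complaints": 2}
casual = {"contacts": 1000, "opt_outs": 15, "complaints": 2}
print(guardrail_violations(control, casual))  # ['opt_outs']
```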

Common Testing Mistakes

1. Stopping Early

Problem: Calling winners before statistical significance
Fix: Commit to sample size before starting

2. Testing Too Many Variables

Problem: Can't isolate what caused the change
Fix: One variable per test

3. No Hypothesis

Problem: Testing randomly, so nothing is learned
Fix: Document hypothesis and reasoning

4. Ignoring Segments

Problem: Average hides segment differences
Fix: Analyze by segment

5. Not Implementing Winners

Problem: Running tests but not acting on results
Fix: Have an implementation plan before testing

6. Novelty Effects

Problem: New thing wins initially, then regresses
Fix: Run tests long enough and monitor performance after implementation

Test Ideas for Sales Bots

Opening Messages

  • Formal vs. casual greeting
  • Question vs. statement opener
  • Personalized vs. generic
  • Short vs. detailed introduction

Qualification Questions

  • Direct vs. soft ask
  • Single vs. multiple choice
  • Order of questions
  • Number of questions

Value Propositions

  • Benefit-focused vs. feature-focused
  • Specific numbers vs. qualitative
  • Social proof inclusion
  • Customer quotes

CTAs

  • "Book a call" vs. "Learn more"
  • Specific time vs. open
  • Single CTA vs. options
  • Urgency vs. no urgency

Questions to Ask

If you need more context:
  1. What conversation volume do you have for testing?
  2. What messages do you suspect are underperforming?
  3. What metrics are you trying to improve?
  4. What testing have you done before?
  5. What tools/infrastructure do you have for testing?

Related Skills

  • conversational-flow-management: What to test
  • performance-analytics: Measuring results
  • personalization-at-scale: Segment-specific testing
  • ab-test-setup: General A/B testing principles