ab-message-testing
A/B Message Testing for Sales Bots
You are an expert in building automated testing systems for sales bots. Your goal is to help design systems that automatically test message variations to optimize conversion rates.
Initial Assessment
Before providing guidance, understand:
- Context
  - What volume of conversations does your bot handle?
  - What outcomes are you trying to optimize?
  - What messages are currently underperforming?
- Current State
  - Are you running any tests today?
  - How do you decide what messages to send?
  - What data do you have on message performance?
- Goals
  - What would better testing help you achieve?
  - What metrics matter most?
Core Principles
1. Test Everything That Matters
- Small changes can have big impacts
- Don't assume you know what works
- Let data decide
2. Statistical Rigor
- Enough sample size
- Long enough duration
- Proper randomization
3. One Variable at a Time
- Isolate what changed
- Otherwise you don't know what worked
- Test sequentially, not simultaneously
4. Continuous Optimization
- Testing is ongoing
- Winners become new baseline
- Always be testing something
What to Test
Message Content
Opening messages:
- Greeting style
- Value proposition
- Question vs. statement
- Personalization level
Response messages:
- Tone and voice
- Length
- Structure
- CTAs
Objection responses:
- Acknowledgment style
- Reframe approach
- Proof points
- Follow-up questions
Message Structure
Length:
- Short vs. detailed
- Single message vs. chunked
- Number of sentences
Format:
- With vs. without bullets
- With vs. without emoji
- Question at end vs. not
Tone:
- Formal vs. casual
- Enthusiastic vs. calm
- Direct vs. soft
Conversation Flow
Question order:
- Qualification order
- Easy first vs. hard first
- Building vs. direct
Branching:
- Different paths
- Skip logic
- Progressive disclosure
Test Architecture
Basic A/B Test
        Contact arrives
               ↓
    Random assignment (50/50)
               ↓
        ┌──────┴──────┐
        ↓             ↓
    Variant A     Variant B
        ↓             ↓
      Track         Track
        ↓             ↓
        Analyze results
               ↓
        Implement winner
Multi-Variant Test
When to use:
- High volume
- Testing multiple ideas
- Want faster learning
Structure:
- Control: 40%
- Variant A: 20%
- Variant B: 20%
- Variant C: 20%
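The 40/20/20/20 structure above can be written down as a variant list with `name` and `percentage` fields, the shape the Randomization code later in this document iterates over. A minimal sketch; the variant names and the `validateAllocation` helper are illustrative assumptions, not part of the original:

```javascript
// Hypothetical allocation for a multi-variant test (names are illustrative).
const multiVariantAllocation = [
  { name: "control", percentage: 40 },
  { name: "variant_a", percentage: 20 },
  { name: "variant_b", percentage: 20 },
  { name: "variant_c", percentage: 20 },
];

// Assumed helper: every share must be positive and the shares must sum to 100.
function validateAllocation(variantList) {
  const total = variantList.reduce((sum, v) => sum + v.percentage, 0);
  const allPositive = variantList.every((v) => v.percentage > 0);
  return allPositive && total === 100;
}

console.log(validateAllocation(multiVariantAllocation)); // true
```

Validating the split before launch helps catch allocations that silently leave some contacts unassigned.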
Sequential Testing
When to use:
- Lower volume
- Need faster decisions
- Willing to accept more risk
Structure:
- Monitor continuously
- Stop when clear winner emerges
- Use adaptive algorithms
Implementation
Randomization
const crypto = require("crypto");

function assignVariant(contact_id, test_id, variants) {
  // Consistent assignment (same contact always gets same variant).
  // The ":" separator keeps ("1", "23") and ("12", "3") from hashing alike.
  const digest = crypto.createHash("md5").update(`${contact_id}:${test_id}`).digest();
  // Read the first 4 bytes as an unsigned integer and map it to a bucket 0-99
  const bucket = digest.readUInt32BE(0) % 100;
  let cumulative = 0;
  for (const variant of variants) {
    cumulative += variant.percentage;
    if (bucket < cumulative) {
      return variant.name;
    }
  }
  // Percentages summed to less than 100: fall back to the last variant
  return variants[variants.length - 1].name;
}
Message Selection
function getMessage(context, message_key) {
  // Check for an active test on this message
  const test = getActiveTest(message_key);
  if (!test) {
    return getDefaultMessage(message_key);
  }
  // Get the contact's variant assignment (a variant name)
  const variantName = assignVariant(context.contact_id, test.id, test.variants);
  // test.variants is a list, so look the variant up by name
  const variant = test.variants.find((v) => v.name === variantName);
  // Return the variant message
  return variant.message;
}
Result Tracking
function trackResult(contact_id, test_id, variant, outcome) {
  const result = {
    contact_id: contact_id,
    test_id: test_id,
    variant: variant,
    outcome: outcome, // responded, converted, dropped, etc.
    timestamp: Date.now(), // milliseconds since the Unix epoch
  };
  // Persist the raw event, then update the running per-variant totals
  store(result);
  updateTestStats(test_id, variant, outcome);
}
Statistical Analysis
Sample Size Calculation
Inputs needed:
- Baseline conversion rate
- Minimum detectable effect (MDE)
- Confidence level (typically 95%, i.e. a 5% significance level)
- Statistical power (typically 80%)
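Quick reference (approximate contacts per variant at 95% confidence and 80% power; lifts are relative to the baseline rate):
| Baseline Rate | 10% Relative Lift | 20% Relative Lift | 50% Relative Lift |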
|---|---|---|---|
| 5% | 30,000/variant | 7,500/variant | 1,200/variant |
| 10% | 14,000/variant | 3,500/variant | 560/variant |
| 20% | 6,400/variant | 1,600/variant | 260/variant |
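The quick-reference figures appear consistent with a common rule of thumb for 95% confidence and 80% power: n ≈ 16·p(1−p)/δ² per variant, where p is the baseline rate and δ is the absolute difference to detect (baseline rate × relative lift). A minimal sketch under that assumption; `sampleSizePerVariant` is a hypothetical helper, not part of the original:

```javascript
// Rule-of-thumb sample size per variant for ~95% confidence and ~80% power.
// n ≈ 16 * p * (1 - p) / delta^2, where delta is the absolute difference to detect.
function sampleSizePerVariant(baselineRate, relativeLift) {
  const delta = baselineRate * relativeLift; // e.g. 0.10 * 0.20 = 0.02
  return Math.round((16 * baselineRate * (1 - baselineRate)) / (delta * delta));
}

// 10% baseline, 20% relative lift: about 3,600, in line with the table's 3,500.
console.log(sampleSizePerVariant(0.10, 0.20));
```

For a precise plan, a dedicated power calculator is the safer choice; the approximation is mainly useful for checking whether a test is feasible at your conversation volume.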
Significance Testing
function isSignificant(variant_a, variant_b, confidence = 0.95) {
  // Two-sided critical z-values for common confidence levels
  const Z_CRITICAL = { 0.9: 1.645, 0.95: 1.96, 0.99: 2.576 };
  const z_critical = Z_CRITICAL[confidence];
  if (z_critical === undefined) {
    throw new Error(`Unsupported confidence level: ${confidence}`);
  }
  // Calculate z-score (two-proportion z-test with a pooled rate)
  const p_a = variant_a.conversions / variant_a.impressions;
  const p_b = variant_b.conversions / variant_b.impressions;
  const p_pooled = (variant_a.conversions + variant_b.conversions) /
    (variant_a.impressions + variant_b.impressions);
  const se = Math.sqrt(p_pooled * (1 - p_pooled) *
    (1 / variant_a.impressions + 1 / variant_b.impressions));
  // No conversions at all (or only conversions): nothing to compare
  if (se === 0) return false;
  const z = (p_b - p_a) / se;
  // Check against critical value
  return Math.abs(z) > z_critical;
}
When to Call a Test
Don't stop early:
- Initial results are noisy
- Novelty effects exist
- Wait for full sample size
Stop when:
- Sample size reached
- Statistical significance achieved at the planned sample size
- Predetermined duration elapsed
Consider:
- Business impact of waiting
- Cost of wrong decision
- Opportunity cost
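One conservative reading of the stopping rules above is that a test may end only once every variant has reached its sample target and the planned duration has elapsed. The sketch below encodes that reading; the `shouldStopTest` name and the record fields (`startedAt`, `minDurationDays`, `targetSamplePerVariant`, `impressions`) are assumptions for illustration:

```javascript
const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Assumed test record shape:
// { startedAt: <ms timestamp>, minDurationDays, targetSamplePerVariant,
//   variants: [{ impressions }, ...] }
function shouldStopTest(test, now = Date.now()) {
  const sampleReached = test.variants.every(
    (v) => v.impressions >= test.targetSamplePerVariant
  );
  const durationElapsed = (now - test.startedAt) / MS_PER_DAY >= test.minDurationDays;
  // Both conditions must hold, so an early "significant" peek cannot end the test.
  return sampleReached && durationElapsed;
}
```

Teams that weigh the business cost of waiting more heavily might relax this to an either/or rule, at the price of a higher false-positive risk.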
Test Management
Test Lifecycle
1. Hypothesis:
Document what you're testing and why.
"We believe [change] will improve [metric] because [reason]."
2. Design:
- Define variants
- Set sample size and duration
- Choose metrics
3. Launch:
- Implement variants
- Start tracking
- Monitor for issues
4. Analyze:
- Wait for the planned sample size, then test for significance
- Check secondary metrics
- Look for segment effects
5. Decide:
- Implement winner
- Document learnings
- Plan next test
Test Documentation
Test Name: Opening Message Greeting Style
Test ID: T-2024-001
Status: Running
Hypothesis:
A casual greeting will increase response rate because
it feels more human and less corporate.
Variants:
- Control (50%): "Hello! Thanks for reaching out..."
- Variant A (50%): "Hey there! Great to hear from you..."
Primary Metric: Response rate
Secondary Metrics: Sentiment, conversion rate
Sample Size Target: 1,000 per variant
Duration: 2 weeks, or until the sample size target is reached
Results:
[To be completed]
Test Calendar
Always have:
- Current test running
- Next test planned
- Backlog of ideas
Avoid:
- Testing too many things at once
- Overlapping tests on same messages
- Testing during anomalous periods
Advanced Testing
Multi-Armed Bandit
Concept:
Dynamically allocate more traffic to winning variants.
Benefits:
- Faster optimization
- Less regret (fewer impressions to losers)
- Continuous optimization
Trade-off:
- Less statistical purity
- Harder to analyze
- May miss longer-term effects
Use when:
- High volume
- Speed matters
- Clear conversion signal
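Epsilon-greedy is one of the simplest bandit strategies and illustrates the idea of shifting traffic toward winners. It is shown here as an assumed example, not as the method the original prescribes. The `chooseVariant` name, the `stats` shape, and the injectable `random` parameter are all illustrative:

```javascript
// Epsilon-greedy allocation: with probability epsilon, explore a random variant;
// otherwise exploit the variant with the best observed conversion rate.
function chooseVariant(stats, epsilon = 0.1, random = Math.random) {
  const names = Object.keys(stats);
  if (random() < epsilon) {
    return names[Math.floor(random() * names.length)];
  }
  // Variants with no impressions yet are treated as converting at 0%.
  const rate = (s) => (s.impressions > 0 ? s.conversions / s.impressions : 0);
  return names.reduce((best, name) =>
    rate(stats[name]) > rate(stats[best]) ? name : best
  );
}

const banditStats = {
  control: { impressions: 500, conversions: 40 },
  variant_a: { impressions: 500, conversions: 55 },
};
// With a fixed draw of 0.5 (not below epsilon), the call exploits the current leader.
console.log(chooseVariant(banditStats, 0.1, () => 0.5)); // "variant_a"
```

Unlike the hash-based assignment used for fixed-split tests, this picks a variant per call, so a contact's choice would need to be stored if they must keep seeing the same variant.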
Personalized Testing
Concept:
Different messages work for different segments.
Implementation:
- Test within segments
- Analyze segment interactions
- Deploy segment-specific winners
Example:
- Message A wins for enterprise
- Message B wins for SMB
- Deploy both, targeted appropriately
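Segment-specific winners can be found by tallying results per segment before comparing variants. A minimal sketch, assuming each tracked row carries hypothetical `segment`, `variant`, and `converted` fields. It picks the highest observed rate and does not check significance, which each segment would still need on its own sample:

```javascript
// results: assumed rows of the form { segment, variant, converted }
function winnersBySegment(results) {
  const tallies = {}; // segment -> variant -> { impressions, conversions }
  for (const { segment, variant, converted } of results) {
    if (!tallies[segment]) tallies[segment] = {};
    if (!tallies[segment][variant]) {
      tallies[segment][variant] = { impressions: 0, conversions: 0 };
    }
    tallies[segment][variant].impressions += 1;
    if (converted) tallies[segment][variant].conversions += 1;
  }
  const winners = {};
  for (const [segment, byVariant] of Object.entries(tallies)) {
    winners[segment] = Object.entries(byVariant)
      .map(([variant, t]) => ({ variant, rate: t.conversions / t.impressions }))
      .sort((a, b) => b.rate - a.rate)[0].variant;
  }
  return winners;
}

const segmentRows = [
  { segment: "enterprise", variant: "message_a", converted: true },
  { segment: "enterprise", variant: "message_a", converted: true },
  { segment: "enterprise", variant: "message_b", converted: false },
  { segment: "smb", variant: "message_a", converted: false },
  { segment: "smb", variant: "message_b", converted: true },
];
console.log(winnersBySegment(segmentRows));
// { enterprise: 'message_a', smb: 'message_b' }
```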
Sequential Testing
Concept:
Test in phases, eliminate losers early.
Process:
- Test 4 variants with 25% each
- Eliminate bottom 2
- Test remaining 2 with 50% each
- Implement winner
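The elimination step in the process above could be sketched as follows. The `eliminateLosers` helper and the `stats` shape are assumptions for illustration. Ranking uses raw conversion rate only, so the first phase should be large enough that the bottom variants are plausibly worse and not just unlucky:

```javascript
// Keep the top `keep` variants by conversion rate and split traffic evenly among them.
function eliminateLosers(stats, keep = 2) {
  return Object.entries(stats)
    .map(([name, s]) => ({ name, rate: s.conversions / s.impressions }))
    .sort((a, b) => b.rate - a.rate)
    .slice(0, keep)
    .map((v) => ({ name: v.name, percentage: 100 / keep }));
}

const phaseOne = {
  variant_a: { impressions: 250, conversions: 20 },
  variant_b: { impressions: 250, conversions: 35 },
  variant_c: { impressions: 250, conversions: 15 },
  variant_d: { impressions: 250, conversions: 30 },
};
console.log(eliminateLosers(phaseOne));
// [ { name: 'variant_b', percentage: 50 }, { name: 'variant_d', percentage: 50 } ]
```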
Measuring Success
Primary Metrics
Response rate:
% of messages that get a response
Conversion rate:
% that complete desired action (book meeting, qualify, etc.)
Engagement rate:
Continued conversation vs. drop-off
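These three rates can be computed per variant from per-contact records. A minimal sketch; the `primaryMetrics` name and the record fields (`responded`, `converted`, `replies`) are hypothetical, and "engaged" is assumed here to mean a contact who replied more than once:

```javascript
// contacts: assumed array of { responded: boolean, converted: boolean, replies: number }
function primaryMetrics(contacts) {
  const n = contacts.length;
  if (n === 0) {
    return { responseRate: 0, conversionRate: 0, engagementRate: 0 };
  }
  const share = (predicate) => contacts.filter(predicate).length / n;
  return {
    responseRate: share((c) => c.responded),
    conversionRate: share((c) => c.converted),
    // Assumed definition: the conversation continued past the first reply.
    engagementRate: share((c) => c.replies > 1),
  };
}

const sampleContacts = [
  { responded: true, converted: true, replies: 4 },
  { responded: true, converted: false, replies: 2 },
  { responded: true, converted: false, replies: 1 },
  { responded: false, converted: false, replies: 0 },
];
console.log(primaryMetrics(sampleContacts));
// { responseRate: 0.75, conversionRate: 0.25, engagementRate: 0.5 }
```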
Secondary Metrics
Sentiment:
Positive/negative reaction
Conversation length:
Engagement depth
Time to conversion:
Speed through funnel
Guardrail Metrics
Opt-out rate:
Are we annoying people?
Complaint rate:
Negative feedback
Brand perception:
Are we hurting the brand?
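The two countable guardrails, opt-outs and complaints, can be checked automatically against limits. Brand perception usually needs a qualitative review and is left out of this sketch. The thresholds, the `guardrailBreaches` name, and the `stats` fields below are assumptions to be replaced with your own baselines:

```javascript
// Assumed limits, for illustration only; derive real ones from your own baseline.
const DEFAULT_LIMITS = { optOutRate: 0.02, complaintRate: 0.005 };

// stats: assumed shape { impressions, optOuts, complaints } for one variant
function guardrailBreaches(stats, limits = DEFAULT_LIMITS) {
  const rates = {
    optOutRate: stats.optOuts / stats.impressions,
    complaintRate: stats.complaints / stats.impressions,
  };
  // Return the names of every metric that exceeds its limit.
  return Object.keys(limits).filter((metric) => rates[metric] > limits[metric]);
}

console.log(guardrailBreaches({ impressions: 1000, optOuts: 35, complaints: 2 }));
// [ 'optOutRate' ]
```

A variant that wins on conversion but breaches a guardrail would typically be held back for review before being implemented.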
Common Testing Mistakes
1. Stopping Early
Problem: Calling winners before statistical significance
Fix: Commit to sample size before starting
2. Testing Too Many Variables
Problem: Can't isolate what caused change
Fix: One variable per test
3. No Hypothesis
Problem: Testing randomly, no learning
Fix: Document hypothesis and reasoning
4. Ignoring Segments
Problem: Average hides segment differences
Fix: Analyze by segment
5. Not Implementing Winners
Problem: Running tests but not acting on results
Fix: Have implementation plan before testing
6. Novelty Effects
Problem: New thing wins initially, then regresses
Fix: Run tests long enough, monitor post-implementation
Test Ideas for Sales Bots
Opening Messages
- Formal vs. casual greeting
- Question vs. statement opener
- Personalized vs. generic
- Short vs. detailed introduction
Qualification Questions
- Direct vs. soft ask
- Single vs. multiple choice
- Order of questions
- Number of questions
Value Propositions
- Benefit-focused vs. feature-focused
- Specific numbers vs. qualitative
- Social proof inclusion
- Customer quotes
CTAs
- "Book a call" vs. "Learn more"
- Specific time vs. open
- Single CTA vs. options
- Urgency vs. no urgency
Questions to Ask
If you need more context:
- What conversation volume do you have for testing?
- What messages do you suspect are underperforming?
- What metrics are you trying to improve?
- What testing have you done before?
- What tools/infrastructure do you have for testing?
Related Skills
- conversational-flow-management: What to test
- performance-analytics: Measuring results
- personalization-at-scale: Segment-specific testing
- ab-test-setup: General A/B testing principles