abtesting-stats

Purpose

This skill performs statistical analysis for A/B testing, including t-tests, chi-squared tests, Mann-Whitney U tests, p-value calculations, confidence intervals (CIs), multiple testing corrections like Bonferroni or Benjamini-Hochberg (BH), and Bayesian A/B testing methods.

When to Use

Use this skill when comparing two groups in experiments, such as website variants, to determine statistical significance. Apply it for hypothesis testing in data science workflows, validating A/B test results, or analyzing user behavior metrics. Avoid it if the data cannot be expressed as numeric measurements or category counts, or if sample sizes are too small (<10 per group).

Key Capabilities

Conduct t-tests to compare the means of approximately normally distributed data.
Perform chi-squared tests for categorical data.
Run Mann-Whitney U tests for non-parametric comparisons.
Calculate p-values and 95% CIs for effect sizes.
Apply multiple-comparison corrections: Bonferroni for family-wise error control or BH for false discovery rate (FDR) control.
Execute Bayesian A/B tests using priors like uniform or beta distributions.
Handle input data from CSV, JSON, or in-memory arrays.

Usage Patterns

Invoke via CLI for quick runs or integrate via API for scripted workflows. Always provide data sources and specify the test type. Use JSON config files for complex parameters. For example, pipe data directly into CLI or call API endpoints in loops for batch processing. Ensure data is pre-cleaned (e.g., remove NaNs) before use.
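
The pre-cleaning step can be sketched in plain Python. This is a minimal sketch, not part of the OpenClaw tooling: the helper name `load_groups`, the column names `group` and `sales`, and the payload shape are assumptions chosen for illustration.

```python
import csv
import io
import json
import math

def load_groups(csv_text, group_col="group", value_col="sales"):
    """Group numeric values by variant, skipping blank, non-numeric, and NaN cells."""
    groups = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        raw = (row.get(value_col) or "").strip()
        try:
            value = float(raw)
        except ValueError:
            continue  # blank or non-numeric cell
        if math.isnan(value):
            continue  # the literal "NaN" parses as a float, so drop it explicitly
        groups.setdefault(row[group_col], []).append(value)
    return groups

sample = (
    "group,sales\n"
    "variantA,10.5\n"
    "variantA,NaN\n"
    "variantB,12.0\n"
    "variantB,\n"
    "variantB,9.5\n"
)
groups = load_groups(sample)

# Hypothetical request body; the real schema may differ.
payload = json.dumps({"test": "t-test", "data": groups})
print(payload)
```

Cleaning before submission keeps rows with missing values from silently shifting a group's mean or sample size.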

Common Commands/API

Use the OpenClaw CLI with the abtesting-stats subcommand. Authentication requires setting $OPENCLAW_API_KEY as an environment variable.

CLI Command for t-test:

openclaw abtesting-stats run --test t-test --data-path data.csv --groups groupA groupB --alpha 0.05

This computes a two-sample t-test and outputs p-value and CI.
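
For intuition about what a two-sample t-test reports, here is a standard-library sketch. It assumes Welch's unequal-variance variant, which may not be what the CLI uses, and it evaluates the t distribution numerically (Simpson's rule plus bisection) instead of relying on a statistics package. All function names and the sample data are illustrative.

```python
import math
from statistics import mean, variance

def t_pdf(x, df):
    """Density of Student's t distribution with df degrees of freedom."""
    log_c = math.lgamma((df + 1) / 2) - math.lgamma(df / 2) - 0.5 * math.log(df * math.pi)
    return math.exp(log_c - (df + 1) / 2 * math.log1p(x * x / df))

def t_cdf(x, df, steps=2000):
    """CDF via Simpson's rule on [0, |x|]; the density is symmetric about 0."""
    a = abs(x)
    if a == 0:
        return 0.5
    h = a / steps
    total = t_pdf(0.0, df) + t_pdf(a, df)
    for i in range(1, steps):
        total += (4 if i % 2 else 2) * t_pdf(i * h, df)
    area = total * h / 3
    return 0.5 + area if x > 0 else 0.5 - area

def t_quantile(p, df):
    """Invert the CDF by bisection; assumes 0.5 < p < 1 and a quantile below 100."""
    lo, hi = 0.0, 100.0
    for _ in range(60):
        mid = (lo + hi) / 2
        if t_cdf(mid, df) < p:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2

def welch_t_test(a, b, alpha=0.05):
    """Return (t, df, two-sided p-value, CI for mean(a) - mean(b))."""
    na, nb = len(a), len(b)
    va, vb = variance(a) / na, variance(b) / nb  # squared standard errors
    diff = mean(a) - mean(b)
    se = math.sqrt(va + vb)
    t = diff / se
    # Welch-Satterthwaite approximation of the degrees of freedom
    df = (va + vb) ** 2 / (va ** 2 / (na - 1) + vb ** 2 / (nb - 1))
    p = max(0.0, 2 * (1 - t_cdf(abs(t), df)))
    margin = t_quantile(1 - alpha / 2, df) * se
    return t, df, p, (diff - margin, diff + margin)

# Made-up measurements for two variants
group_a = [20.1, 22.3, 19.8, 21.5, 23.0, 20.7, 22.1, 21.9, 20.4, 22.6]
group_b = [24.2, 25.1, 23.8, 26.0, 24.9, 25.5, 23.9, 26.3, 24.4, 25.7]
t_stat, df, p_value, ci = welch_t_test(group_a, group_b)
print(f"t={t_stat:.3f} df={df:.1f} p={p_value:.3g} CI=({ci[0]:.2f}, {ci[1]:.2f})")
```

A confidence interval that excludes 0 corresponds to a p-value below alpha, which is why the two are usually reported together.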

API endpoint for chi-squared: POST to /api/abtesting/stats with JSON body:

json

{  
  "test": "chi-squared",  
  "data": {"category1": [10, 20], "category2": [15, 25]},  
  "alpha": 0.01  
}

Response includes p-value and expected frequencies.
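
To make "p-value and expected frequencies" concrete, the following sketch runs a Pearson chi-squared test of independence on the counts from the request above. It assumes each category maps to one row of a 2x2 table and applies no continuity correction; the endpoint itself may do either differently. The closed-form p-value used here is valid only for 1 degree of freedom.

```python
import math

def chi_squared_2x2(table):
    """Pearson chi-squared test for a 2x2 contingency table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    # Expected count under independence: row total * column total / grand total
    expected = [[r * c / total for c in col_totals] for r in row_totals]
    stat = sum(
        (obs - exp) ** 2 / exp
        for obs_row, exp_row in zip(table, expected)
        for obs, exp in zip(obs_row, exp_row)
    )
    # With 1 degree of freedom, P(X > stat) = erfc(sqrt(stat / 2))
    p_value = math.erfc(math.sqrt(stat / 2))
    return stat, p_value, expected

stat, p_value, expected = chi_squared_2x2([[10, 20], [15, 25]])
print(f"chi2={stat:.4f} p={p_value:.3f}")
print(expected)
```

For these counts the statistic is small and the p-value is far above 0.01, so the test would not reject independence.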

CLI for Mann-Whitney:

openclaw abtesting-stats run --test mann-whitney --file input.json --key metric --significance 0.01

Expects JSON with arrays for each group.
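
As a rough illustration of the rank-based comparison, here is a standard-library sketch of a two-sided Mann-Whitney U test. It uses the normal approximation with a tie correction and no continuity correction, so it is only reasonable for moderately sized groups; whether the CLI uses an exact or approximate method is not stated in this document.

```python
import math

def mann_whitney_u(a, b):
    """Return (U, two-sided p-value) using the normal approximation."""
    combined = sorted((v, g) for g, sample in enumerate((a, b)) for v in sample)
    n = len(combined)
    rank_sum_a = 0.0
    tie_term = 0
    i = 0
    while i < n:
        j = i
        while j < n and combined[j][0] == combined[i][0]:
            j += 1
        avg_rank = (i + 1 + j) / 2  # mean of the 1-based ranks i+1 .. j
        t = j - i
        tie_term += t ** 3 - t
        rank_sum_a += avg_rank * sum(1 for k in range(i, j) if combined[k][1] == 0)
        i = j
    na, nb = len(a), len(b)
    u_a = rank_sum_a - na * (na + 1) / 2
    u = min(u_a, na * nb - u_a)
    mu = na * nb / 2
    sigma = math.sqrt(na * nb / 12 * ((n + 1) - tie_term / (n * (n - 1))))
    if sigma == 0:
        return u, 1.0  # all values identical, so there is no evidence of a shift
    z = (u - mu) / sigma
    return u, math.erfc(abs(z) / math.sqrt(2))

u_stat, p_value = mann_whitney_u([1, 2, 3, 4, 5], [6, 7, 8, 9, 10])
print(u_stat, p_value)
```

Because the test works on ranks, it is insensitive to outliers and does not require normally distributed data.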

API for Bayesian A/B: POST to /api/abtesting/bayesian with:

json

{  
  "test": "bayesian",  
  "conversions": [50, 60],  
  "trials": [1000, 1000],  
  "prior": "beta"  
}

Returns posterior probabilities and credible intervals.
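
One common way to obtain such posterior quantities is a Beta-Binomial model with Monte Carlo sampling, sketched below. The uniform Beta(1, 1) prior, the function name, and the returned keys are assumptions for illustration; the endpoint's actual model and response schema are not documented here.

```python
import random

def bayesian_ab(conversions, trials, prior=(1.0, 1.0), draws=50_000, seed=0):
    """Estimate P(rate_B > rate_A) and a 95% credible interval for rate_B - rate_A."""
    rng = random.Random(seed)
    alpha0, beta0 = prior
    (conv_a, conv_b), (n_a, n_b) = conversions, trials
    diffs = []
    for _ in range(draws):
        # Posterior of each rate is Beta(alpha0 + successes, beta0 + failures)
        rate_a = rng.betavariate(alpha0 + conv_a, beta0 + n_a - conv_a)
        rate_b = rng.betavariate(alpha0 + conv_b, beta0 + n_b - conv_b)
        diffs.append(rate_b - rate_a)
    diffs.sort()
    return {
        "prob_b_better": sum(d > 0 for d in diffs) / draws,
        "ci_95": (diffs[int(0.025 * draws)], diffs[int(0.975 * draws)]),
    }

result = bayesian_ab([50, 60], [1000, 1000])
print(result)
```

The probability that B beats A can be read directly as decision support, while the credible interval shows how large the difference plausibly is.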

Config format is JSON, e.g., save the following as config.json:

json

{  
  "test": "bonferroni",  
  "p_values": [0.01, 0.02, 0.05]  
}

Then run:

openclaw abtesting-stats apply-config config.json
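
The two corrections named in this skill are simple enough to sketch directly. The functions below return adjusted p-values for the list in the config above; they are textbook Bonferroni and Benjamini-Hochberg procedures written for illustration, and the function names are not part of any OpenClaw API.

```python
def bonferroni(p_values):
    """Multiply each p-value by the number of tests, capped at 1."""
    m = len(p_values)
    return [min(1.0, p * m) for p in p_values]

def benjamini_hochberg(p_values):
    """BH-adjusted p-values, returned in the original order."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, p_values[i] * m / rank)
        adjusted[i] = running_min
    return adjusted

p_values = [0.01, 0.02, 0.05]
bonf = bonferroni(p_values)
bh = benjamini_hochberg(p_values)
print(bonf)
print(bh)
```

Bonferroni controls the chance of any false positive and is therefore more conservative. BH controls the expected share of false positives among the rejected hypotheses, so its adjusted values are never larger than Bonferroni's.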

Integration Notes

Integrate by setting $OPENCLAW_API_KEY for all API calls. For Python scripts, use the OpenClaw SDK:

python

import os
import openclaw

# Read the key from the environment rather than hard-coding it.
client = openclaw.Client(api_key=os.environ['OPENCLAW_API_KEY'])
response = client.post('/api/abtesting/stats', json={'test': 't-test', 'data': [...]})

Handle asynchronous responses by checking for 'job_id' in the response and polling /api/jobs/{job_id}. Combine with data tools like Pandas for preprocessing, e.g., load a CSV and format it as JSON. Stay within the rate limit (max 10/sec) by batching requests.
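
The polling step could look like the sketch below. It takes the status fetcher as a parameter so it does not depend on any particular SDK call; the `status` field and its values `done` and `failed` are assumptions, since the job response format is not described in this document.

```python
import time

def wait_for_job(fetch_status, job_id, interval=1.0, timeout=60.0, sleep=time.sleep):
    """Poll /api/jobs/{job_id} until a terminal state or until the timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        body = fetch_status(f"/api/jobs/{job_id}")
        if body.get("status") in ("done", "failed"):  # assumed terminal states
            return body
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} did not finish within {timeout}s")
        sleep(interval)

# Stand-in for a real HTTP call: returns canned responses in order.
polled_paths = []
responses = iter([
    {"status": "running"},
    {"status": "running"},
    {"status": "done", "p_value": 0.03},
])

def fake_fetch(path):
    polled_paths.append(path)
    return next(responses)

result = wait_for_job(fake_fetch, "abc123", sleep=lambda seconds: None)
print(result)
```

With a polling interval of one second, a single job stays well under the 10/sec limit; several concurrent jobs would need a shared limiter.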

Error Handling

Check for common errors like invalid data formats (e.g., non-numeric inputs) by validating inputs first. If the API returns 401, ensure $OPENCLAW_API_KEY is set and valid. For the CLI, parse errors from stdout (e.g., "Error: Insufficient samples"). Use try-except in code:

python

try:  
    result = client.post(...)  
except openclaw.AuthError:  
    print("Authentication failed; check $OPENCLAW_API_KEY")

Log detailed errors with the --verbose flag in the CLI for debugging, e.g.:

openclaw abtesting-stats run --verbose ...

Retry transient errors (e.g., 503) up to 3 times with exponential backoff.
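
A retry loop with exponential backoff might be sketched as follows. `TransientError` is a hypothetical stand-in for whatever exception the client raises on a 503, and the base delay of 0.5 seconds is an arbitrary choice.

```python
import time

class TransientError(Exception):
    """Hypothetical stand-in for a retryable failure such as an HTTP 503."""

def with_retries(call, max_retries=3, base_delay=0.5, sleep=time.sleep):
    """Run call(), retrying transient failures with delays of 0.5s, 1s, 2s."""
    for attempt in range(max_retries + 1):
        try:
            return call()
        except TransientError:
            if attempt == max_retries:
                raise  # retries exhausted, surface the error
            sleep(base_delay * 2 ** attempt)

# Demo: a call that fails twice before succeeding; delays are recorded, not slept.
delays = []
attempts = {"count": 0}

def flaky_call():
    attempts["count"] += 1
    if attempts["count"] <= 2:
        raise TransientError("503 Service Unavailable")
    return "ok"

result = with_retries(flaky_call, sleep=delays.append)
print(result, delays)
```

Only transient failures are retried; an authentication error such as a 401 should fail immediately, since repeating the request cannot fix it.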

Concrete Usage Examples

T-test on sales data: To compare average sales between two ad variants, run:
```
openclaw abtesting-stats run --test t-test --data-path sales.csv --groups variantA variantB
```
Assuming sales.csv has the columns group and sales, output such as p-value=0.03, CI=[5.2, 10.4] indicates a statistically significant difference at the 0.05 level, since the p-value is below 0.05.

Bayesian A/B for click-through rates: To test email campaigns, POST to
```
/api/abtesting/bayesian
```
with:
```json
{  
  "conversions": [120, 150],  
  "trials": [1000, 1000]  
}  
```
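With a uniform Beta(1, 1) prior, these counts give a posterior probability of roughly 97% that variant B is better (the exact figure depends on the prior), which guides decisions without p-values.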

Graph Relationships

Related to: abtesting-experiment (provides data setup for this skill)
Related to: stats-visualization (uses outputs like CIs for plotting)
Connected via: abtesting cluster (shares common A/B testing utilities)
