<project>/
├── data/ # input parquet/csv (or generate in-notebook)
├── src/
│ ├── train.py # PyMC model fit → MLflow log
│ ├── predict.py # reload idata, compute decision metrics
│ └── plots.py # posterior, trace, loss, ROPE visualizations
├── notebooks/
│ └── demo.py # marimo walkthrough
└── mlruns/ # MLflow tracking store (gitignored)

| Field | Type | Description |
|---|---|---|
| `n_a` | int | Visitors assigned to control (A) |
| `conversions_a` | int | Conversions in control |
| `n_b` | int | Visitors assigned to treatment (B) |
| `conversions_b` | int | Conversions in treatment |
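
When no experiment export is available, the four counts in the table above can be simulated instead, which is also the case where the true rates are known. A minimal sketch, assuming made-up true rates and sample sizes; the `simulate_arm` helper and the specific numbers are illustrative, not part of the project code:

```python
import random


def simulate_arm(n_visitors, true_rate, rng):
    """Count conversions among n_visitors independent Bernoulli(true_rate) visitors."""
    return sum(rng.random() < true_rate for _ in range(n_visitors))


rng = random.Random(42)          # fixed seed so the synthetic data is reproducible
true_p_a, true_p_b = 0.04, 0.05  # assumed ground-truth rates, for illustration only
n_a = n_b = 5000                 # assumed visitors per arm

conversions_a = simulate_arm(n_a, true_p_a, rng)
conversions_b = simulate_arm(n_b, true_p_b, rng)
```

With synthetic data, the fitted posterior can be checked against `true_p_a` and `true_p_b`, which is not possible with real traffic.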
import ibis

table = ibis.duckdb.connect().read_parquet("data/experiment.parquet")
summary = (
    table
    .group_by("variant")
    .aggregate(
        visitors=table.count(),
        conversions=table.converted.sum().cast("int64"),
    )
    .execute()
)
n_a = int(summary.loc[summary.variant == "control", "visitors"].iloc[0])
conversions_a = int(summary.loc[summary.variant == "control", "conversions"].iloc[0])
n_b = int(summary.loc[summary.variant == "treatment", "visitors"].iloc[0])
conversions_b = int(summary.loc[summary.variant == "treatment", "conversions"].iloc[0])

import pymc as pm
with pm.Model() as ab_model:
    # Priors: Beta(1, 1) is uniform, for when you have no prior knowledge.
    # Use informative priors if you have historical conversion rates.
    p_a = pm.Beta("p_A", alpha=1, beta=1)
    p_b = pm.Beta("p_B", alpha=1, beta=1)

    # The quantity of interest: absolute lift
    delta = pm.Deterministic("delta", p_b - p_a)
    pm.Deterministic("relative_lift", (p_b - p_a) / p_a)

    # Likelihood: use Binomial with sufficient statistics,
    # NOT N independent Bernoulli observations
    pm.Binomial("obs_A", n=n_a, p=p_a, observed=conversions_a)
    pm.Binomial("obs_B", n=n_b, p=p_b, observed=conversions_b)

    idata = pm.sample(
        draws=2000, tune=1000, chains=4,
        random_seed=42, progressbar=False,
    )

| Situation | Prior | Why |
|---|---|---|
| No idea what to expect | Beta(1, 1) | Uniform on [0, 1] |
| Typical web conversion (~3-5%) | Beta(3, 97) | Concentrates around 3% |
| Strong historical data (last quarter's rate) | Beta(α, β) from method of moments | Use the data you have |
# mu: prior mean conversion rate; kappa: prior strength (alpha + beta)
alpha_0 = mu * kappa
beta_0 = (1 - mu) * kappa
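
The two lines above leave `mu` and `kappa` to the reader. As a rough sketch of the method-of-moments row in the table, the helpers below turn a historical mean (and optionally a variance) into Beta parameters. The function names are hypothetical; the only fact relied on is that a Beta distribution with mean `mu` and `alpha + beta = kappa` has variance `mu * (1 - mu) / (kappa + 1)`.

```python
def beta_prior_from_mean_strength(mu, kappa):
    """Beta parameters for prior mean mu and prior strength kappa (alpha + beta)."""
    if not 0.0 < mu < 1.0:
        raise ValueError("mu must lie strictly between 0 and 1")
    if kappa <= 0:
        raise ValueError("kappa must be positive")
    return mu * kappa, (1.0 - mu) * kappa


def beta_prior_from_moments(mean, variance):
    """Match a Beta distribution to a historical mean and variance."""
    # Var = mean * (1 - mean) / (kappa + 1)  =>  kappa = mean * (1 - mean) / Var - 1
    kappa = mean * (1.0 - mean) / variance - 1.0
    if kappa <= 0:
        raise ValueError("variance is too large for a Beta distribution with this mean")
    return beta_prior_from_mean_strength(mean, kappa)


# A 3% historical rate worth about 100 pseudo-observations
alpha_0, beta_0 = beta_prior_from_mean_strength(0.03, 100)
```

The example values reproduce the Beta(3, 97) row of the table; a larger `kappa` expresses more trust in the historical rate.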
import numpy as np

delta_samples = idata.posterior["delta"].to_numpy().flatten()
prob_b_better = float(np.mean(delta_samples > 0))

p_a_samples = idata.posterior["p_A"].to_numpy().flatten()
p_b_samples = idata.posterior["p_B"].to_numpy().flatten()
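
The decision rule later in this document compares `loss_choosing_a` and `loss_choosing_b`, but their computation does not appear in the text. A plausible reading, following the `np.maximum(draws_a - draws_b, 0)` pattern in the sequential loop, is the posterior mean of the rate given up when the other arm is better. The sketch below is an assumption along those lines. It uses plain-Python stand-in draws from conjugate Beta posteriors with made-up counts, so it runs without a fitted model.

```python
import random
from statistics import fmean


def expected_losses(p_a_samples, p_b_samples):
    """Return (loss_choosing_a, loss_choosing_b) from paired posterior draws."""
    # Choosing A costs (p_B - p_A) whenever B is actually better, and vice versa.
    loss_choosing_a = fmean(max(b - a, 0.0) for a, b in zip(p_a_samples, p_b_samples))
    loss_choosing_b = fmean(max(a - b, 0.0) for a, b in zip(p_a_samples, p_b_samples))
    return loss_choosing_a, loss_choosing_b


# Stand-in draws: with a Beta(1, 1) prior and k conversions out of n visitors,
# the posterior is Beta(1 + k, 1 + n - k). The counts here are hypothetical.
rng = random.Random(42)
p_a_samples = [rng.betavariate(1 + 100, 1 + 2000 - 100) for _ in range(10_000)]
p_b_samples = [rng.betavariate(1 + 130, 1 + 2000 - 130) for _ in range(10_000)]

loss_choosing_a, loss_choosing_b = expected_losses(p_a_samples, p_b_samples)
```

With the NumPy arrays pulled from `idata`, the same quantities would be `np.mean(np.maximum(p_b_samples - p_a_samples, 0))` and `np.mean(np.maximum(p_a_samples - p_b_samples, 0))`.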
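**Pick the arm with lower expected loss.** An arm's expected loss is the
conversion rate you give up on average, over the posterior, in the cases
where the other arm is actually better. Picking the lower one is the
Bayes-optimal decision under absolute-error loss. When both losses are
tiny (< 0.0001), the arms are effectively equivalent, so stop the
experiment.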
rope = 0.005  # minimum practically significant difference
prob_b_clears_rope = float(np.mean(delta_samples > rope))
prob_equivalent = float(np.mean(np.abs(delta_samples) < rope))

if loss_choosing_b < loss_choosing_a and prob_b_clears_rope > 0.90:
    decision = "Ship B"
elif prob_equivalent > 0.50:
    decision = "Practically equivalent — pick the cheaper option"
else:
    decision = "Keep collecting data"

import arviz as az
**Convergence checks:**
- R-hat < 1.01 for all parameters
- ESS (bulk and tail) > 400
- Trace plot shows well-mixed chains (no trends, no stuck chains)
If any check fails, increase `draws` and `tune` before trusting the
results.
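
To make the R-hat threshold concrete, here is a sketch of the classic Gelman-Rubin statistic computed by hand. It is a simplified stand-in: ArviZ's own diagnostic uses a rank-normalized split variant, so its numbers will differ slightly. The chains below are synthetic normal draws, not output from the model above.

```python
import random
from statistics import fmean, variance


def gelman_rubin_rhat(chains):
    """Classic (non-split, non-rank-normalized) R-hat for equal-length chains."""
    n = len(chains[0])
    within = fmean(variance(chain) for chain in chains)              # W
    between_over_n = variance([fmean(chain) for chain in chains])    # B / n
    pooled = (n - 1) / n * within + between_over_n                   # pooled variance estimate
    return (pooled / within) ** 0.5


rng = random.Random(0)
# Four well-mixed chains drawn from the same distribution
mixed = [[rng.gauss(0.0, 1.0) for _ in range(1000)] for _ in range(4)]
# Same chains, but the last one is shifted to mimic a chain stuck elsewhere
stuck = mixed[:3] + [[x + 1.0 for x in mixed[3]]]

rhat_mixed = gelman_rubin_rhat(mixed)
rhat_stuck = gelman_rubin_rhat(stuck)
```

The shifted chain inflates the between-chain variance, which pushes R-hat past the 1.01 threshold, while the well-mixed chains stay close to 1.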
| Kind | What |
|---|---|
| Params | n_a, n_b, conversions_a, conversions_b, prior_alpha, prior_beta, draws, tune, chains, seed, rope |
| Metrics | prob_b_better, expected_loss_a, expected_loss_b, posterior_mean_delta, hdi_94_low, hdi_94_high, rhat_max, ess_min, prob_b_clears_rope |
| Tags | data_hash, true_p_a, true_p_b (if synthetic) |
| Artifacts | posterior/idata.nc, plots/{posterior.png, trace.png, loss.png, rope.png} |
import numpy as np
from scipy import stats as sp_stats

for n in range(100, 10001, 100):
    # Hypothetical conversions at sample size n, given the observed rates
    k_a = round(n * observed_rate_a)
    k_b = round(n * observed_rate_b)
    # Conjugate Beta posteriors for each arm
    post_a = sp_stats.beta(alpha_0 + k_a, beta_0 + n - k_a)
    post_b = sp_stats.beta(alpha_0 + k_b, beta_0 + n - k_b)
    draws_a = post_a.rvs(5000)
    draws_b = post_b.rvs(5000)
    # Expected loss of choosing B
    expected_loss = np.mean(np.maximum(draws_a - draws_b, 0))
    # Stop when expected_loss < your tolerance
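
The loop above leaves the stopping step as a comment. One way it could be completed is sketched below with the standard library only. The function name, the default tolerance, and the example rates are assumptions for illustration, and the loss is the expected loss of choosing B, as in the loop.

```python
import random
from statistics import fmean


def visitors_until_confident(rate_a, rate_b, alpha_0=1.0, beta_0=1.0,
                             tolerance=0.0005, n_draws=2000, seed=0):
    """Smallest per-arm sample size (in steps of 100) at which the expected
    loss of choosing B drops below tolerance, or None if it never does."""
    rng = random.Random(seed)
    for n in range(100, 10001, 100):
        # Hypothetical conversion counts if the observed rates held at size n
        k_a = round(n * rate_a)
        k_b = round(n * rate_b)
        losses = []
        for _ in range(n_draws):
            # Draws from the conjugate Beta posteriors of each arm
            draw_a = rng.betavariate(alpha_0 + k_a, beta_0 + n - k_a)
            draw_b = rng.betavariate(alpha_0 + k_b, beta_0 + n - k_b)
            losses.append(max(draw_a - draw_b, 0.0))
        if fmean(losses) < tolerance:
            return n
    return None


n_stop = visitors_until_confident(rate_a=0.05, rate_b=0.07)
```

Because the result rests on Monte Carlo draws, it can shift by a step or two with a different seed; treat it as a planning estimate rather than a guarantee.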

demo.py
marimo edit --sandbox demo.py