bayesian-reasoning-calibration


Bayesian Reasoning & Calibration


Purpose

Apply Bayesian reasoning to systematically update probability estimates as new evidence arrives. This helps you make better forecasts, avoid overconfidence, and show explicitly how beliefs should change with data.

When to Use This Skill


Making forecasts or predictions with uncertainty
Updating beliefs when new evidence emerges
Calibrating confidence in estimates
Testing hypotheses with imperfect data
Evaluating risks with incomplete information
Avoiding anchoring and overconfidence biases
Making decisions under uncertainty
Comparing multiple competing explanations
Assessing diagnostic test results
Forecasting project outcomes with new data

Trigger phrases: "What's the probability", "update my belief", "how confident", "forecast", "prior probability", "likelihood", "Bayes", "calibration", "base rate", "posterior probability"


What is Bayesian Reasoning?


A systematic way to update probability estimates using Bayes' Theorem:

P(H|E) = P(E|H) × P(H) / P(E)

Where:

P(H) = Prior: Probability of hypothesis before seeing evidence
P(E|H) = Likelihood: Probability of evidence if hypothesis is true
P(E|¬H) = Probability of evidence if hypothesis is false
P(H|E) = Posterior: Updated probability after seeing evidence
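P(E) is rarely known directly. It is usually expanded with the law of total probability over H and ¬H, which gives the form used in the worked example and in Step 4. Dividing the posterior for H by the posterior for ¬H cancels P(E), which yields the odds form listed in the Quick Reference:

```latex
P(H \mid E) = \frac{P(E \mid H)\,P(H)}{P(E \mid H)\,P(H) + P(E \mid \neg H)\,P(\neg H)},
\qquad P(\neg H) = 1 - P(H)

\frac{P(H \mid E)}{P(\neg H \mid E)}
  = \frac{P(H)}{P(\neg H)} \times \frac{P(E \mid H)}{P(E \mid \neg H)}
\quad\text{(posterior odds = prior odds} \times \text{likelihood ratio)}
```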

Quick Example:

Should we launch Feature X?

Prior Belief

Before beta testing: 60% chance that adoption will exceed 20%

Base rate: Similar features get 15-25% adoption
Our feature seems stronger than average
Prior: 60%

New Evidence

Beta test: 35% of users adopted (70 of 200 users)

Likelihoods

If true adoption is >20%:

P(seeing 35% in beta | adoption >20%) = 75% (a high beta result is likely if the hypothesis is true)

If true adoption is ≤20%:

P(seeing 35% in beta | adoption ≤20%) = 15% (a high beta result is unlikely if the hypothesis is false)

Bayesian Update

Posterior = (75% × 60%) / [(75% × 60%) + (15% × 40%)]

Posterior = 45% / (45% + 6%) = 45% / 51% ≈ 88%

Conclusion

Updated belief: 88% confident that adoption will exceed 20%.

The evidence strongly supports launching, but success is not certain.
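The arithmetic in the Feature X example can be checked in a few lines. This is a minimal sketch: the inputs (60%, 75%, 15%) come from the example, while the function name `posterior` is an illustrative choice, not something defined by this skill's resources.

```python
# Checks the Feature X quick example: prior 60%, P(E|H) = 75%, P(E|not H) = 15%.
# `posterior` is a hypothetical helper name used only for illustration.

def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """P(H|E) for a binary hypothesis, with P(E) expanded by total probability."""
    joint_h = p_e_given_h * prior                # P(E|H) x P(H), 0.45 here
    joint_not_h = p_e_given_not_h * (1 - prior)  # P(E|not H) x P(not H), 0.06 here
    return joint_h / (joint_h + joint_not_h)     # divide by P(E) = 0.51


if __name__ == "__main__":
    p = posterior(prior=0.60, p_e_given_h=0.75, p_e_given_not_h=0.15)
    print(f"Posterior: {p:.1%}")  # about 88%, matching the hand calculation
```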

Workflow


Copy this checklist and track your progress:

Bayesian Reasoning Progress:
- [ ] Step 1: Define the question
- [ ] Step 2: Establish prior beliefs
- [ ] Step 3: Identify evidence and likelihoods
- [ ] Step 4: Calculate posterior
- [ ] Step 5: Calibrate and document

Step 1: Define the question

Clarify the hypothesis (a specific, testable claim), the probability to estimate, the timeframe (when the outcome will be known), the success criteria, and why it matters (which decision depends on it). Example: "Product feature will achieve >20% adoption within 3 months", which matters for the launch decision.

Step 2: Establish prior beliefs

Set the initial probability using base rates (general frequency), a reference class (similar situations), and specific differences, then assign an explicit probability with a justification. Good priors are based on base rates, account for differences, are honest about uncertainty, and include ranges if you are unsure (e.g., 40-60%). Avoid purely intuitive priors, ignoring base rates, or extreme values without justification.

Step 3: Identify evidence and likelihoods

Assess the evidence (a specific observation or data point) and its diagnostic power (does it distinguish between the hypotheses?). Estimate P(E|H), the probability of the evidence if the hypothesis is TRUE, and P(E|¬H), the probability if it is FALSE, then calculate the likelihood ratio LR = P(E|H) / P(E|¬H). As a rough scale: LR > 10 = very strong evidence, 3-10 = moderate, 1-3 = weak, ≈1 = not diagnostic, <1 = evidence against.
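A small sketch of this step, assuming hypothetical helpers `likelihood_ratio` and `evidence_strength` (not part of any library or of this skill's resources). The bands follow the rough scale above; treating |LR - 1| ≤ 0.05 as "about 1" is an assumed tolerance chosen for illustration.

```python
# Hypothetical Step 3 helpers. Band cutoffs follow the rough scale in the text;
# the 0.05 tolerance for "LR about 1" is an illustrative assumption.

def likelihood_ratio(p_e_given_h: float, p_e_given_not_h: float) -> float:
    """LR = P(E|H) / P(E|not H)."""
    if p_e_given_not_h <= 0:
        # P(E|not H) = 0 would mean the evidence is impossible if H is false,
        # which is almost never a defensible estimate.
        raise ValueError("P(E|not H) must be greater than 0")
    return p_e_given_h / p_e_given_not_h


def evidence_strength(lr: float, tolerance: float = 0.05) -> str:
    """Label an LR using the rough scale from Step 3."""
    if abs(lr - 1) <= tolerance:
        return "not diagnostic"
    if lr < 1:
        return "evidence against"
    if lr > 10:
        return "very strong"
    if lr >= 3:
        return "moderate"
    return "weak"


if __name__ == "__main__":
    lr = likelihood_ratio(0.75, 0.15)  # Feature X beta test
    print(f"LR = {lr:.1f} ({evidence_strength(lr)})")
```

Whether an LR of exactly 3 or exactly 10 counts as "moderate" is a judgment call the scale leaves open; this sketch puts both in the moderate band.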

Step 4: Calculate posterior

Apply Bayes' Theorem: P(H|E) = [P(E|H) × P(H)] / P(E), or use the odds form: Posterior Odds = Prior Odds × Likelihood Ratio. Calculate P(E) = P(E|H)×P(H) + P(E|¬H)×P(¬H), compute the posterior probability, and interpret how much it changed. For simple cases → use the `resources/template.md` calculator. For complex cases (multiple hypotheses) → study `resources/methodology.md`.
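As a sanity check on Step 4, the theorem form and the odds form should always agree. The sketch below assumes illustrative function names (not taken from `resources/template.md`) and uses the Feature X numbers.

```python
# Sketch for Step 4: the theorem form and the odds form give the same posterior.
# Function names are illustrative assumptions.

def posterior_bayes(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """P(H|E) = P(E|H) P(H) / P(E), with P(E) from total probability."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e


def posterior_odds_form(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Posterior odds = prior odds x LR, then convert the odds back to a probability."""
    prior_odds = prior / (1 - prior)        # 60% -> 1.5
    lr = p_e_given_h / p_e_given_not_h      # 75% / 15% -> 5
    post_odds = prior_odds * lr             # 1.5 x 5 -> 7.5
    return post_odds / (1 + post_odds)      # 7.5 / 8.5 -> about 0.88


if __name__ == "__main__":
    a = posterior_bayes(0.60, 0.75, 0.15)
    b = posterior_odds_form(0.60, 0.75, 0.15)
    print(f"theorem form: {a:.4f}, odds form: {b:.4f}")
```

The odds form is convenient for chaining several pieces of evidence, but it is undefined at a prior of exactly 100% (infinite odds), which is one more reason to avoid extreme priors.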

Step 5: Calibrate and document

Check calibration (over- or underconfident?), validate assumptions (are the likelihoods reasonable?), perform a sensitivity analysis, create `bayesian-reasoning-calibration.md`, and note limitations. Self-check using `resources/evaluators/rubric_bayesian_reasoning_calibration.json`: verify that the prior is based on base rates, the likelihoods are justified, the evidence is diagnostic (LR ≠ 1), the calculation is correct, the posterior is calibrated, assumptions are stated, and sensitivity is noted. Minimum standard: score ≥ 3.5.
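One simple way to run the sensitivity analysis is to recompute the posterior over every combination of plausible inputs and report the range. In this minimal sketch, the low/base/high values are hypothetical ranges around the Feature X inputs, and `posterior_range` is an illustrative helper.

```python
from itertools import product

# Sensitivity sketch for Step 5. The input ranges are hypothetical;
# substitute your own low/base/high estimates.

def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    p_e = p_e_given_h * prior + p_e_given_not_h * (1 - prior)
    return p_e_given_h * prior / p_e


def posterior_range(priors, p_e_h_values, p_e_not_h_values):
    """Evaluate every combination of inputs and return the (min, max) posterior."""
    results = [
        posterior(p, a, b)
        for p, a, b in product(priors, p_e_h_values, p_e_not_h_values)
    ]
    return min(results), max(results)


if __name__ == "__main__":
    low, high = posterior_range(
        priors=[0.40, 0.50, 0.60],            # e.g. a 40-60% prior range, as in Step 2
        p_e_h_values=[0.65, 0.75, 0.85],
        p_e_not_h_values=[0.10, 0.15, 0.25],
    )
    print(f"Posterior ranges from {low:.0%} to {high:.0%}")
```

If the decision (for example, "launch") holds across the whole range, it is robust to these input errors. If it flips, the inputs driving the flip deserve more scrutiny.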


Common Patterns


For forecasting:

Use base rates as starting point
Update incrementally as evidence arrives
Track forecast accuracy over time
Calibrate by comparing predictions to outcomes
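Comparing predictions to outcomes can be made concrete with a scoring rule and a calibration table. This sketch uses the Brier score, a standard scoring rule for probability forecasts. The text does not prescribe a specific metric, so treat that choice, the 10-bin grouping, and the sample track record as assumptions.

```python
from collections import defaultdict

# Calibration sketch. Brier score and 10-bin grouping are assumed choices;
# the track record below is hypothetical.

def brier_score(forecasts: list[float], outcomes: list[int]) -> float:
    """Mean squared gap between forecast probabilities and 0/1 outcomes (0 is perfect)."""
    return sum((p - o) ** 2 for p, o in zip(forecasts, outcomes)) / len(forecasts)


def calibration_table(forecasts: list[float], outcomes: list[int], bins: int = 10):
    """Per probability bin: (mean forecast, observed frequency, count)."""
    groups = defaultdict(list)
    for p, o in zip(forecasts, outcomes):
        groups[min(int(p * bins), bins - 1)].append((p, o))  # p = 1.0 joins the top bin
    table = []
    for b in sorted(groups):
        pairs = groups[b]
        mean_forecast = sum(p for p, _ in pairs) / len(pairs)
        hit_rate = sum(o for _, o in pairs) / len(pairs)
        table.append((mean_forecast, hit_rate, len(pairs)))
    return table


if __name__ == "__main__":
    forecasts = [0.9, 0.8, 0.8, 0.7, 0.3, 0.2]  # hypothetical past forecasts
    outcomes = [1, 1, 0, 0, 0, 0]               # what actually happened
    print(f"Brier score: {brier_score(forecasts, outcomes):.3f}")
    for mean_p, hit, n in calibration_table(forecasts, outcomes):
        print(f"forecast ~{mean_p:.0%}: happened {hit:.0%} of the time (n={n})")
```

In a well-calibrated record, each bin's hit rate is close to its mean forecast. A meaningful table needs far more forecasts than this toy example.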

For hypothesis testing:

State competing hypotheses explicitly
Calculate likelihood ratio for evidence
Update belief proportionally to evidence strength
Don't claim certainty unless LR is extreme
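Updating incrementally and in proportion to evidence strength can be sketched in odds form: each new piece of evidence multiplies the current odds by its likelihood ratio. This assumes the pieces of evidence are conditionally independent given H and given ¬H. Correlated evidence, such as two reports from one source, would be double-counted. The LR values are hypothetical.

```python
# Incremental updating in odds form. Assumes conditionally independent
# evidence; the likelihood ratios used below are hypothetical.

def update_sequentially(prior: float, likelihood_ratios: list[float]) -> list[float]:
    """Return the posterior probability after each piece of evidence."""
    odds = prior / (1 - prior)
    history = []
    for lr in likelihood_ratios:
        odds *= lr                          # posterior odds = prior odds x LR
        history.append(odds / (1 + odds))   # convert back to a probability
    return history


if __name__ == "__main__":
    for step, p in enumerate(update_sequentially(0.5, [2.0, 0.5, 4.0]), start=1):
        print(f"after evidence {step}: {p:.0%}")
```

Evidence with LR < 1 pulls the belief back down. Under the independence assumption, the final posterior does not depend on the order in which the evidence arrives.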

For risk assessment:

Consider multiple scenarios (not just binary)
Update risks as new data arrives
Use ranges when uncertain about likelihoods
Perform sensitivity analysis
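With more than two scenarios, Bayes' Theorem still applies: each posterior is proportional to P(E | scenario) × P(scenario), normalized so the posteriors sum to 1. This sketch assumes the scenarios are mutually exclusive and exhaustive. The scenario names, priors, and likelihoods are hypothetical.

```python
# Multi-scenario update sketch. Scenarios must be mutually exclusive and
# exhaustive; all names and numbers below are hypothetical.

def update_scenarios(priors: dict[str, float], likelihoods: dict[str, float]) -> dict[str, float]:
    if abs(sum(priors.values()) - 1) > 1e-9:
        raise ValueError("scenario priors must sum to 1")
    weights = {s: likelihoods[s] * p for s, p in priors.items()}  # P(E|S) x P(S)
    p_evidence = sum(weights.values())  # P(E) by total probability
    return {s: w / p_evidence for s, w in weights.items()}


if __name__ == "__main__":
    priors = {"on_time": 0.5, "minor_delay": 0.3, "major_delay": 0.2}
    # P(a key dependency slips | scenario)
    likelihoods = {"on_time": 0.1, "minor_delay": 0.4, "major_delay": 0.8}
    for scenario, p in update_scenarios(priors, likelihoods).items():
        print(f"{scenario}: {p:.0%}")
```

With only two scenarios (H and ¬H), this reduces to the binary formula used in the Quick Example.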

For avoiding bias:

Force explicit priors (prevents anchoring to evidence)
Use reference classes (prevents ignoring base rates)
Calculate mathematically (prevents motivated reasoning)
Document before seeing outcome (enables calibration)


Guardrails


Do:

State priors explicitly before seeing all evidence
Use base rates and reference classes
Estimate likelihoods with justification
Update incrementally as evidence arrives
Be honest about uncertainty
Perform sensitivity analysis
Track forecasts for calibration
Acknowledge limits of the model

Don't:

Use extreme priors (1%, 99%) without exceptional justification
Ignore base rates (common bias)
Treat all evidence as equally diagnostic
Update to 100% certainty (almost never justified)
Cherry-pick evidence
Skip documenting reasoning
Forget to calibrate (compare predictions to outcomes)
Apply to questions where probability is meaningless
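The guardrails against extreme priors and 100% certainty follow directly from the update rule, a point sometimes called Cromwell's rule. A prior of exactly 0 or 1 can never be moved by evidence, a 1% prior needs very strong evidence to become plausible, and any finite likelihood ratio leaves the posterior short of certainty. The inputs below are illustrative.

```python
# Illustrates why priors of 0%/100% and updates to 100% are avoided.
# `update` is a hypothetical helper; all inputs are illustrative.

def update(prior: float, lr: float) -> float:
    """Posterior from prior and LR, written to stay defined at prior = 0 or 1."""
    weighted = lr * prior
    return weighted / (weighted + (1 - prior))


if __name__ == "__main__":
    print(update(1.0, 0.01))   # 100% prior: strong counter-evidence changes nothing
    print(update(0.0, 1000))   # 0% prior: overwhelming evidence changes nothing
    print(update(0.01, 10))    # 1% prior with LR 10: only about 9%
    print(update(0.60, 1000))  # very strong evidence: close to, but below, 100%
```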


Quick Reference


Standard template:
```
resources/template.md
```
Multiple hypotheses:
```
resources/methodology.md
```

Examples:

resources/examples/product-launch.md

resources/examples/medical-diagnosis.md

Quality rubric:

resources/evaluators/rubric_bayesian_reasoning_calibration.json

Bayesian Formula (Odds Form):

Posterior Odds = Prior Odds × Likelihood Ratio

Likelihood Ratio:

LR = P(Evidence | Hypothesis True) / P(Evidence | Hypothesis False)

Output naming:

bayesian-reasoning-calibration.md

or

{topic}-forecast.md
