bayesian-reasoning-calibration
Bayesian Reasoning & Calibration
Purpose
Apply Bayesian reasoning to systematically update probability estimates as new evidence arrives. This helps make better forecasts, avoid overconfidence, and explicitly show how beliefs should change with data.
When to Use This Skill
- Making forecasts or predictions with uncertainty
- Updating beliefs when new evidence emerges
- Calibrating confidence in estimates
- Testing hypotheses with imperfect data
- Evaluating risks with incomplete information
- Avoiding anchoring and overconfidence biases
- Making decisions under uncertainty
- Comparing multiple competing explanations
- Assessing diagnostic test results
- Forecasting project outcomes with new data
Trigger phrases: "What's the probability", "update my belief", "how confident", "forecast", "prior probability", "likelihood", "Bayes", "calibration", "base rate", "posterior probability"
What is Bayesian Reasoning?
A systematic way to update probability estimates using Bayes' Theorem:
P(H|E) = P(E|H) × P(H) / P(E)
Where:
- P(H) = Prior: Probability of hypothesis before seeing evidence
- P(E|H) = Likelihood: Probability of evidence if hypothesis is true
- P(E|¬H) = Probability of evidence if hypothesis is false
- P(H|E) = Posterior: Updated probability after seeing evidence
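The formula can be sketched in a few lines of Python. This is a minimal illustration for a binary hypothesis, not part of the skill's resources: the function name `posterior` and its parameter names are assumptions, and P(E) is expanded with the law of total probability, P(E) = P(E|H) × P(H) + P(E|¬H) × P(¬H).

```python
def posterior(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Return P(H|E) for a binary hypothesis via Bayes' Theorem.

    The name and signature are illustrative assumptions.
    """
    for name, p in (
        ("prior", prior),
        ("p_e_given_h", p_e_given_h),
        ("p_e_given_not_h", p_e_given_not_h),
    ):
        if not 0.0 <= p <= 1.0:
            raise ValueError(f"{name} must be in [0, 1], got {p}")

    # Law of total probability: P(E) = P(E|H) P(H) + P(E|¬H) P(¬H)
    p_e = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)
    if p_e == 0.0:
        raise ValueError("P(E) is zero: the evidence is impossible under both hypotheses")

    return p_e_given_h * prior / p_e


# Even prior, evidence nine times likelier under H than under ¬H.
print(round(posterior(0.5, 0.9, 0.1), 3))  # 0.9
```

Passing P(E|¬H) explicitly, rather than P(E), keeps the inputs to quantities that can be estimated separately for each hypothesis.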
Quick Example:
Should we launch Feature X?
Prior Belief
Before beta testing: 60% chance of adoption >20%
- Base rate: Similar features get 15-25% adoption
- Our feature seems stronger than average
- Prior: 60%
New Evidence
Beta test: 35% of users adopted (70 of 200 users)
Likelihoods
If true adoption is >20%:
- P(seeing 35% in beta | adoption >20%) = 75% (likely to see high beta if true)
If true adoption is ≤20%:
- P(seeing 35% in beta | adoption ≤20%) = 15% (unlikely to see high beta if false)
Bayesian Update
Posterior = (75% × 60%) / [(75% × 60%) + (15% × 40%)]
Posterior = 45% / (45% + 6%) = 88%
Conclusion
Updated belief: 88% confident adoption will exceed 20%
Evidence strongly supports launch, but not certain.
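As a cross-check, the same update can be worked in odds form (Posterior Odds = Prior Odds × Likelihood Ratio), using only the numbers from the Feature X example:

$$
\begin{aligned}
\text{Prior odds} &= \frac{0.60}{0.40} = 1.5 \\
\text{LR} &= \frac{P(E \mid H)}{P(E \mid \neg H)} = \frac{0.75}{0.15} = 5 \\
\text{Posterior odds} &= 1.5 \times 5 = 7.5 \\
P(H \mid E) &= \frac{7.5}{1 + 7.5} \approx 0.88
\end{aligned}
$$

Both routes give the same 88% posterior. The odds form makes it easier to see that the evidence multiplied the odds by five.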
Workflow
Copy this checklist and track your progress:
Bayesian Reasoning Progress:
- [ ] Step 1: Define the question
- [ ] Step 2: Establish prior beliefs
- [ ] Step 3: Identify evidence and likelihoods
- [ ] Step 4: Calculate posterior
- [ ] Step 5: Calibrate and document
Step 1: Define the question
Clarify the hypothesis (a specific, testable claim), the probability to estimate, the timeframe (when the outcome will be known), the success criteria, and why this matters (what decision depends on it). Example: "Product feature will achieve >20% adoption within 3 months", which matters because the launch decision depends on it.
Step 2: Establish prior beliefs
Set the initial probability using base rates (general frequency), a reference class (similar situations), specific differences from that class, and an explicit probability assignment with justification. Good priors are based on base rates, account for differences, are honest about uncertainty, and include ranges if unsure (e.g., 40-60%). Avoid purely intuitive priors, ignoring base rates, or extreme values without justification.
Step 3: Identify evidence and likelihoods
Assess evidence (specific observation/data), diagnostic power (does it distinguish hypotheses?), P(E|H) (probability if hypothesis TRUE), P(E|¬H) (probability if FALSE), and calculate likelihood ratio = P(E|H) / P(E|¬H). LR > 10 = very strong evidence, 3-10 = moderate, 1-3 = weak, ≈1 = not diagnostic, <1 = evidence against.
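The likelihood-ratio bands above can be sketched as a small helper. The function names, the treatment of band boundaries (3 and 10), and the `tolerance` that defines "≈1" are assumptions made for illustration, since the text gives only approximate ranges.

```python
import math


def likelihood_ratio(p_e_given_h: float, p_e_given_not_h: float) -> float:
    """LR = P(E|H) / P(E|¬H); infinite when the evidence is impossible under ¬H."""
    if p_e_given_h == 0.0 and p_e_given_not_h == 0.0:
        raise ValueError("evidence is impossible under both hypotheses")
    if p_e_given_not_h == 0.0:
        return math.inf
    return p_e_given_h / p_e_given_not_h


def evidence_strength(lr: float, tolerance: float = 0.1) -> str:
    """Map a likelihood ratio to the qualitative bands from Step 3."""
    if abs(lr - 1.0) <= tolerance:  # assumed width of the "≈1" band
        return "not diagnostic"
    if lr < 1.0:
        return "evidence against"
    if lr > 10.0:
        return "very strong"
    if lr >= 3.0:  # assumption: the 3-10 band includes both endpoints
        return "moderate"
    return "weak"


# Feature X example: P(E|H) = 75%, P(E|¬H) = 15%
print(evidence_strength(likelihood_ratio(0.75, 0.15)))  # moderate
```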
Step 4: Calculate posterior
Apply Bayes' Theorem: P(H|E) = [P(E|H) × P(H)] / P(E), or use odds form: Posterior Odds = Prior Odds × Likelihood Ratio. Calculate P(E) = P(E|H)×P(H) + P(E|¬H)×P(¬H), get posterior probability, and interpret change. For simple cases → Use the resources/template.md calculator. For complex cases (multiple hypotheses) → Study resources/methodology.md.
Step 5: Calibrate and document
Check calibration (over/underconfident?), validate assumptions (are likelihoods reasonable?), perform sensitivity analysis, create bayesian-reasoning-calibration.md, and note limitations. Self-check using resources/evaluators/rubric_bayesian_reasoning_calibration.json: verify prior based on base rates, likelihoods justified, evidence diagnostic (LR ≠ 1), calculation correct, posterior calibrated, assumptions stated, sensitivity noted. Minimum standard: Score ≥ 3.5.
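One way to carry out the sensitivity analysis mentioned in Step 5 is to recompute the posterior over a grid of plausible inputs and report the resulting range. The sketch below is an assumption about how that could look: the `sensitivity` helper and the input ranges are hypothetical, chosen around the Feature X estimates (60%, 75%, 15%).

```python
from itertools import product


def posterior(prior, p_e_given_h, p_e_given_not_h):
    """Binary Bayes update; P(E) comes from the law of total probability."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)
    return p_e_given_h * prior / p_e


def sensitivity(priors, p_e_given_h_values, p_e_given_not_h_values):
    """Return the (min, max) posterior over every combination of the inputs."""
    results = [
        posterior(p, a, b)
        for p, a, b in product(priors, p_e_given_h_values, p_e_given_not_h_values)
    ]
    return min(results), max(results)


# Hypothetical uncertainty ranges around the point estimates.
low, high = sensitivity(
    priors=[0.5, 0.6, 0.7],
    p_e_given_h_values=[0.65, 0.75, 0.85],
    p_e_given_not_h_values=[0.10, 0.15, 0.25],
)
print(f"posterior range: {low:.2f} to {high:.2f}")  # posterior range: 0.72 to 0.95
```

If the decision is the same at both ends of the range, the conclusion is robust to these inputs. If it flips, the likelihood estimates deserve more scrutiny before acting.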
Common Patterns
For forecasting:
- Use base rates as starting point
- Update incrementally as evidence arrives
- Track forecast accuracy over time
- Calibrate by comparing predictions to outcomes
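Incremental updating can be sketched as a loop in which each posterior becomes the prior for the next piece of evidence. This assumes the pieces of evidence are conditionally independent given the hypothesis, an assumption the text does not state. The function names and the evidence values are hypothetical.

```python
def update(prior: float, p_e_given_h: float, p_e_given_not_h: float) -> float:
    """Single binary Bayes update."""
    p_e = p_e_given_h * prior + p_e_given_not_h * (1.0 - prior)
    return p_e_given_h * prior / p_e


def update_sequentially(prior, evidence):
    """Apply one update per (P(E|H), P(E|¬H)) pair and keep every intermediate belief.

    Assumes each piece of evidence is conditionally independent of the
    others given H (and given ¬H).
    """
    beliefs = [prior]
    for p_e_given_h, p_e_given_not_h in evidence:
        beliefs.append(update(beliefs[-1], p_e_given_h, p_e_given_not_h))
    return beliefs


# Hypothetical stream: 20% base rate, two supportive signals, then one against.
history = update_sequentially(0.20, [(0.8, 0.4), (0.6, 0.3), (0.2, 0.5)])
print([round(b, 3) for b in history])  # [0.2, 0.333, 0.5, 0.286]
```

Keeping the full history, rather than only the final value, gives a record that can later be compared against outcomes.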
For hypothesis testing:
- State competing hypotheses explicitly
- Calculate likelihood ratio for evidence
- Update belief proportionally to evidence strength
- Don't claim certainty unless LR is extreme
For risk assessment:
- Consider multiple scenarios (not just binary)
- Update risks as new data arrives
- Use ranges when uncertain about likelihoods
- Perform sensitivity analysis
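For more than two scenarios, the same update generalizes to P(Hᵢ|E) ∝ P(E|Hᵢ) × P(Hᵢ), normalized over all scenarios. The sketch below assumes the scenarios are mutually exclusive and exhaustive. The function name, scenario labels, and probabilities are hypothetical.

```python
def update_scenarios(priors: dict[str, float], likelihoods: dict[str, float]) -> dict[str, float]:
    """Posterior over mutually exclusive, exhaustive scenarios."""
    if abs(sum(priors.values()) - 1.0) > 1e-9:
        raise ValueError("priors must sum to 1")

    # Unnormalized weight for each scenario: P(E|H_i) * P(H_i)
    weights = {h: priors[h] * likelihoods[h] for h in priors}
    total = sum(weights.values())  # this is P(E)
    if total == 0.0:
        raise ValueError("evidence is impossible under every scenario")

    return {h: w / total for h, w in weights.items()}


# Hypothetical project risk: P(scenario) and P(missed milestone | scenario)
priors = {"on time": 0.5, "minor delay": 0.3, "major delay": 0.2}
likelihoods = {"on time": 0.1, "minor delay": 0.5, "major delay": 0.9}

for scenario, p in update_scenarios(priors, likelihoods).items():
    print(f"{scenario}: {p:.2f}")
```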
For avoiding bias:
- Force explicit priors (prevents anchoring to evidence)
- Use reference classes (prevents ignoring base rates)
- Calculate mathematically (prevents motivated reasoning)
- Document before seeing outcome (enables calibration)
Guardrails
Do:
- State priors explicitly before seeing all evidence
- Use base rates and reference classes
- Estimate likelihoods with justification
- Update incrementally as evidence arrives
- Be honest about uncertainty
- Perform sensitivity analysis
- Track forecasts for calibration
- Acknowledge limits of the model
Don't:
- Use extreme priors (1%, 99%) without exceptional justification
- Ignore base rates (common bias)
- Treat all evidence as equally diagnostic
- Update to 100% certainty (almost never justified)
- Cherry-pick evidence
- Skip documenting reasoning
- Forget to calibrate (compare predictions to outcomes)
- Apply to questions where probability is meaningless
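Comparing predictions to outcomes can be made concrete with a forecast log. The sketch below uses two standard tools that the text itself does not name: the Brier score (mean squared error of the stated probabilities) and a binned calibration table. The function names, the bin count, and the log entries are all hypothetical.

```python
from collections import defaultdict


def brier_score(forecasts):
    """Mean squared error between stated probabilities and 0/1 outcomes (lower is better)."""
    return sum((p - outcome) ** 2 for p, outcome in forecasts) / len(forecasts)


def calibration_table(forecasts, n_bins=5):
    """Group forecasts into equal-width probability bins.

    Returns (mean forecast, observed frequency, count) per non-empty bin.
    """
    bins = defaultdict(list)
    for p, outcome in forecasts:
        index = min(int(p * n_bins), n_bins - 1)  # p == 1.0 goes in the top bin
        bins[index].append((p, outcome))

    table = []
    for index in sorted(bins):
        rows = bins[index]
        mean_forecast = sum(p for p, _ in rows) / len(rows)
        observed = sum(o for _, o in rows) / len(rows)
        table.append((mean_forecast, observed, len(rows)))
    return table


# Hypothetical forecast log: (stated probability, outcome as 1 or 0)
log = [(0.9, 1), (0.9, 1), (0.9, 0), (0.7, 1), (0.7, 0), (0.3, 0), (0.3, 1), (0.1, 0)]

print(f"Brier score: {brier_score(log):.2f}")
for mean_forecast, observed, count in calibration_table(log):
    print(f"forecast ~{mean_forecast:.2f} -> observed {observed:.2f} (n={count})")
```

A well-calibrated forecaster's observed frequencies track the mean forecast in each bin. With only a handful of entries per bin, as here, the comparison is noisy and should be read loosely.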
Quick Reference
- Standard template: resources/template.md
- Multiple hypotheses: resources/methodology.md
- Examples: resources/examples/product-launch.md, resources/examples/medical-diagnosis.md
- Quality rubric: resources/evaluators/rubric_bayesian_reasoning_calibration.json
Bayesian Formula (Odds Form):
Posterior Odds = Prior Odds × Likelihood Ratio
Likelihood Ratio:
LR = P(Evidence | Hypothesis True) / P(Evidence | Hypothesis False)
Output naming: bayesian-reasoning-calibration.md or {topic}-forecast.md