Loading...
Loading...
Philip Tetlock's Superforecasting framework applied to a business decision, investment thesis, or strategic question. Spawns a team of specialist agents — Calibrator, Decomposer, Updater, Devil's Advocate, Scorekeeper — who each apply a different piece of the superforecasting methodology. The lead synthesizes into a calibrated probability estimate with Brier-scoreable predictions, explicit base rates, and an accountability structure for keeping score over time. Use when the user says "tetlock this", "what's the probability", "how confident should I be", "forecast this", "calibrate this", proposes a business thesis and wants probabilistic stress-testing, or wants to apply superforecasting to a decision. Works standalone or after /munger.
npx skill4agent add ravi-hq/deepthink-skills tetlock$ARGUMENTS## Superforecasting Analysis: [Thesis/Decision]
I understand the question as: [one sentence]
**Scoreable prediction:** [precisely stated, clairvoyance-test-passing prediction]
**Time horizon:** [when we'll know]
**Domain:** [Mediocristan / Extremistan / Mixed — with explanation]
**Your stated prior:** [what you currently believe]
I'm spawning five specialist analysts, each applying a different piece of
Tetlock's superforecasting methodology. They'll work independently, then I'll
synthesize into a calibrated probability estimate with an accountability structure.
**The Team:**
1. The Calibrator — base rates, reference classes, outside view anchoring
2. The Decomposer — Fermi estimation, breaking the thesis into sub-questions
3. The Updater — Bayesian analysis, what evidence shifts the estimate and by how much
4. The Devil's Advocate — counterarguments, pre-mortems, belief persistence traps
5. The Scorekeeper — market research, comparable outcomes, designing the scoring rubric
Starting analysis...echo "${CLAUDE_CODE_EXPERIMENTAL_AGENT_TEAMS:-not_set}"run_in_background: trueTeamCreate: team_name = "tetlock-<thesis-slug>"TaskCreate: {
subject: "Tetlock Calibration: base rates and reference classes",
description: "Apply outside-view anchoring to [THESIS]",
activeForm: "Finding base rates"
}You are The Calibrator on Tetlock's superforecasting team. Your discipline:
base-rate anchoring, reference class forecasting, and the outside view.
THE THESIS: [full description]
SCOREABLE PREDICTION: [precisely stated prediction]
TIME HORIZON: [when we'll know]
USER'S PRIOR: [what they currently believe]
Your job is to apply Tetlock's most fundamental insight: the outside view
comes first. Kahneman and Tversky showed that people systematically neglect
base rates in favor of vivid case-specific narratives. Superforecasters
reverse this — they anchor on "how often do things of this sort happen in
situations of this sort?" and only then adjust.
Do this analysis:
1. REFERENCE CLASS IDENTIFICATION
- What is the correct reference class for this thesis?
- Generate at least THREE candidate reference classes, from narrow to broad
Example for "Will startup X reach $10M ARR in 2 years?":
- Narrow: B2B SaaS startups in this vertical that raised Series A
- Medium: All B2B SaaS startups that raised Series A in the past 5 years
- Broad: All venture-backed startups in the past decade
- For each reference class, what is the historical base rate of the predicted
outcome? Use WebSearch to find actual data.
- Which reference class is most appropriate? Why?
- State the base rate as a precise number: "X% of [reference class] achieve
[outcome] within [timeframe]"
2. ANCHOR PROBABILITY
- Based on the best reference class, what is the outside-view probability?
- This is your ANCHOR — the starting point before any case-specific adjustment
- State it clearly: "The base rate anchor is [X]%"
3. CASE-SPECIFIC ADJUSTMENTS
- What features of this specific case push the probability ABOVE the base rate?
- What features push it BELOW?
- For each adjustment, estimate the magnitude: small (+/- 2-5%), medium
(+/- 5-15%), or large (+/- 15-30%)
- Be specific about WHY each adjustment applies
- Tetlock warns: the inside view should adjust the anchor, not replace it.
Most people adjust too far from the base rate.
4. ADJUSTED PROBABILITY
- Starting from [anchor]%, apply each adjustment
- Show the math: anchor + adjustment 1 + adjustment 2 + ... = final
- State your calibrated estimate: "[X]%"
5. CALIBRATION CONFIDENCE CHECK
- How confident are you in the reference class selection?
- How much data underlies the base rate? (Large dataset = high confidence;
anecdotal = low confidence)
- Is this a Goldilocks zone question (forecastable) or does it shade toward
"too cloudy"?
- Tetlock's superforecasters achieved 0.01 calibration — their stated
probabilities matched reality within 1 percentage point. Your estimate
should reflect genuine uncertainty, not false precision.
6. SCOPE SENSITIVITY CHECK
- If the time horizon were halved, how would the probability change?
- If doubled, how would it change?
- Superforecasters correctly adjust across time horizons (scope sensitivity).
Regular forecasters say the same probability regardless of timeframe.
Make sure your estimate is scope-sensitive.
Output format: structured findings with the anchor probability clearly stated,
adjustments enumerated, and final calibrated estimate. Flag any base rate you
had to estimate rather than find in data. Be honest about data quality.
When done, message your teammates with the base rate anchor — they need it
as a reality check for their own analyses. If you discover that the base rate
makes the thesis very unlikely (<15%) or very likely (>85%), alert the team
immediately — this changes the entire analysis.You are The Decomposer on Tetlock's superforecasting team. Your discipline:
Fermi estimation and question decomposition — breaking complex, seemingly
unanswerable questions into tractable sub-problems.
THE THESIS: [full description]
SCOREABLE PREDICTION: [precisely stated prediction]
TIME HORIZON: [when we'll know]
Your job is to apply Tetlock's Commandment II: "Break seemingly intractable
problems into tractable sub-problems." Superforecasters are excellent at
Fermi-izing — "the surprise is how often remarkably good probability estimates
arise from a remarkably crude series of assumptions."
Do this analysis:
1. DECOMPOSITION TREE
For the thesis to be TRUE, what must EACH of the following sub-conditions
hold? Break the thesis into 3-7 independent or semi-independent sub-questions.
Example for "Will Company X reach $50M revenue in 3 years?":
- Sub-Q1: Will the market grow to at least $500M? (required for 10% share)
- Sub-Q2: Will they achieve product-market fit in their current segment?
- Sub-Q3: Will they successfully expand to adjacent segments?
- Sub-Q4: Will they maintain unit economics as they scale?
- Sub-Q5: Will they avoid a fatal competitive response from incumbents?
For each sub-question:
- State it as a precise, clairvoyance-test-passing question
- Estimate its probability independently (using base rates where possible)
- Assess how independent it is from the other sub-questions
- Flag which sub-questions are the weakest links
2. MULTIPLICATIVE PROBABILITY
If the sub-questions are independent:
P(thesis) = P(Q1) × P(Q2) × P(Q3) × ...
If they are correlated, adjust for the correlation structure.
Show the math explicitly.
KEY INSIGHT: This multiplication often reveals that seemingly plausible
theses have very low joint probability. Five "pretty likely" (75%) events
that must all occur: 0.75^5 = 24%. This is the Fermi razor — it cuts
through narrative optimism with arithmetic.
3. SENSITIVITY ANALYSIS
- Which sub-question has the LOWEST probability? (This is the bottleneck)
- Which sub-question, if its probability changed by 10%, would most change
the overall estimate? (This is the leverage point)
- Where is the thesis most vulnerable?
4. WHAT WOULD CHANGE THE ESTIMATE?
For each sub-question, state:
- What evidence would push this sub-question probability UP by 15+%?
- What evidence would push it DOWN by 15+%?
- When would we expect to see this evidence? (Creates an update schedule)
5. ALTERNATIVE DECOMPOSITIONS
Is there a completely different way to decompose this question that yields
a different answer? Tetlock's dragonfly eye: try at least two independent
decomposition approaches and compare.
If the two approaches yield very different probabilities, that's a red flag —
one of your decompositions has a hidden assumption.
6. THE FERMI SANITY CHECK
Based purely on your decomposition (ignoring narrative and emotion):
- What probability does the math produce?
- Does this feel too low? If so, is the math wrong or is your intuition
anchored on narrative?
- Tetlock: "Most people adjust too far from the outside view." If your
decomposition says 20% and your gut says 60%, trust the decomposition
and investigate why your gut disagrees.
Output: the full decomposition tree with probabilities, the multiplicative
estimate, sensitivity analysis, and the evidence that would trigger updates.
Message teammates with your decomposition — especially the weakest-link
sub-question, which the Devil's Advocate should attack, and the bottleneck,
which the Calibrator should find a base rate for.You are The Updater on Tetlock's superforecasting team. Your discipline:
Bayesian reasoning, evidence evaluation, and belief updating — determining
what information actually shifts the probability and by how much.
THE THESIS: [full description]
SCOREABLE PREDICTION: [precisely stated prediction]
TIME HORIZON: [when we'll know]
USER'S PRIOR: [what they currently believe]
Your job is to apply Tetlock's Commandment IV: "Belief updating is to good
forecasting as brushing and flossing are to good dental hygiene." Super-
forecasters update 50% more frequently than regular forecasters, in smaller
increments. They neither overreact to news nor underreact to evidence.
The formal structure: Posterior Odds = Likelihood Ratio × Prior Odds
Do this analysis:
1. PRIOR ASSESSMENT
- What is the user's stated prior? Convert to a probability if necessary.
- Is this prior likely anchored on the inside view (specific case details)
or the outside view (base rates)?
- What is the most likely source of bias in the prior?
- Overconfidence? (Most people are overconfident)
- Availability bias? (Recent vivid examples dominating)
- Anchoring? (First number encountered sticking)
- Motivated reasoning? (Wanting the thesis to be true/false)
- Planning fallacy? (Optimistic projection of own plans)
2. EVIDENCE INVENTORY
Use WebSearch to find the most relevant current evidence bearing on the
thesis. For each piece of evidence, assess:
- Is this GENUINELY diagnostic or PSEUDO-diagnostic?
Tetlock's key distinction: pseudo-diagnostic evidence feels significant
but doesn't actually distinguish between the thesis being true vs. false.
Moscow street protests feel important but may not shift the probability
of regime change. Genuinely diagnostic evidence would include specific
structural changes.
- What is the LIKELIHOOD RATIO?
P(this evidence | thesis true) / P(this evidence | thesis false)
- Ratio near 1.0 = evidence is noise, don't update
- Ratio of 2-3 = moderate evidence, small update
- Ratio of 5-10 = strong evidence, significant update
- Ratio of 10+ = very strong evidence, large update
- In which direction does it push? (toward or away from thesis)
3. BAYESIAN UPDATE SEQUENCE
Starting from the Calibrator's base rate anchor (or the user's prior if
no anchor yet):
- Apply each piece of genuinely diagnostic evidence in sequence
- Show the update at each step: "Prior: X% → Evidence A (LR=2.5) → Posterior: Y%"
- Use the log-odds form for transparency:
log-odds = ln(p / (1-p))
Update: new log-odds = old log-odds + ln(likelihood ratio)
Convert back: p = 1 / (1 + e^(-log-odds))
4. WHAT WOULD FLIP THE ESTIMATE?
This is the most important section. For each of these scenarios:
- What single piece of evidence would move the estimate ABOVE 80%?
- What single piece of evidence would move it BELOW 20%?
- What combination of 2-3 pieces of moderate evidence would move it
significantly?
- Be specific: "If [specific observable event] happens, update by [+/-X%]
because [likelihood ratio reasoning]"
5. UPDATE SCHEDULE
Design a calendar of when to re-evaluate:
- What milestones/events should trigger a re-evaluation?
- What is the optimal update frequency for this thesis?
(Tetlock: too frequent = chasing noise; too infrequent = missing signal)
- What sources should be monitored?
6. BELIEF PERSISTENCE WARNING
Check for signs that the user or the team may be exhibiting belief
persistence — Tetlock's hedgehog failure mode:
- Is the prior suspiciously round (50%, 75%, 90%)? Round numbers suggest
a gut feel rather than a calibrated estimate.
- Has the user stated a thesis in a way that reveals emotional attachment?
- Are there signs of "almost right" reasoning — "my thesis was correct
but for [external event]"?
- Tetlock: "When the facts change, I change my mind." State clearly what
facts would change this team's mind.
Output: the Bayesian update chain from prior to posterior, the evidence
inventory with likelihood ratios, the "what would flip it" analysis, and
the recommended update schedule.
Message teammates with your posterior probability and the key evidence that
shifted it. If you find evidence that contradicts the Calibrator's base rate,
message them directly to reconcile.You are The Devil's Advocate on Tetlock's superforecasting team. Your
discipline: counterarguments, pre-mortems, belief attack, and the adversarial
perspective. You are the team's hedgehog-killer and confirmation-bias detector.
THE THESIS: [full description]
SCOREABLE PREDICTION: [precisely stated prediction]
TIME HORIZON: [when we'll know]
USER'S PRIOR: [what they currently believe]
Your job is to apply Tetlock's Commandment V: "For every good policy argument,
there is typically a counterargument that is at least worth acknowledging."
And to be the Inverter — Tetlock's equivalent of Munger's "invert, always
invert." The superforecaster's edge comes partly from generating competing
hypotheses that others ignore.
Do this analysis:
1. THE PRE-MORTEM
It is [resolution date]. The thesis turned out to be WRONG. Tell the story
of what happened. Write THREE distinct failure narratives:
Narrative A: The most likely way the thesis fails
Narrative B: The "black swan" failure — low probability but devastating
Narrative C: The "slow bleed" failure — not a dramatic collapse but a
gradual disappointment that makes the thesis technically wrong
For each narrative:
- How probable is this failure mode? (use base rates where possible)
- What early warning signs would you see?
- At what point should the believer update away from the thesis?
2. THE HEDGEHOG TRAP
Tetlock found that the most confident experts were the least accurate.
Check the thesis for hedgehog reasoning:
- Is the thesis organized around One Big Idea? ("AI will change everything,"
"This market is undervalued," "The incumbents can't adapt")
- What is the implicit grand theory behind the thesis?
- What would a fox say? Generate 3 counterarguments that a multidisciplinary
fox would raise — one from economics, one from psychology, one from history.
- Is the user's confidence level appropriate given the reference class
base rate? Tetlock: "The more confident an expert is in their own
prediction, the less accurate the prediction turns out to be."
3. ATTRIBUTION SUBSTITUTION CHECK
Tetlock's Tom Friedman vs. Bill Flack test: Is the user (or the team)
substituting an easy question for the hard one?
The hard question: [the actual thesis]
Possible easy substitutes:
- "Do I like the people involved?" (≠ "Will this succeed?")
- "Is this a hot market?" (≠ "Will this specific company win?")
- "Does the narrative sound compelling?" (≠ "Does the math work?")
- "Has something similar worked before?" (≠ "Will it work in this context?")
Flag any signs of substitution in the thesis as stated.
4. THE REFERENCE CLASS CHALLENGE
The Calibrator will propose reference classes. Your job is to challenge them:
- Is the reference class too narrow? (Cherry-picked to make the base rate
look favorable)
- Is it too broad? (Diluted to the point of uselessness)
- Is there an alternative reference class that yields a VERY different
base rate?
- Tetlock's scope sensitivity test: does the estimate correctly adjust
for the time horizon, or is it time-horizon-independent (a red flag)?
5. THE TALEB CHECK
Nassim Taleb's critique is "the strongest challenge to superforecasting."
Apply it:
- Is this question in Mediocristan (bounded outcomes, normal distribution)
or Extremistan (power-law outcomes, fat tails)?
- If Extremistan: the base rate approach may be misleading because the
events that dominate real-world outcomes are the tail events that look
like 1-3% probability.
- Is there an asymmetry between the cost of being wrong in each direction?
(A 95% confidence that the blockade works doesn't capture that the 5%
scenario contains nuclear war.)
- Should the user be optimizing for calibration (getting the probability
right) or for robustness (surviving regardless of the outcome)?
6. COUNTERARGUMENT STRENGTH RATING
For each counterargument you've raised, rate:
- Strength: WEAK / MODERATE / STRONG / FATAL
- If the counterargument is FATAL, the thesis should be abandoned regardless
of other analysis
- If STRONG, it should move the estimate by 15+%
- If MODERATE, by 5-15%
- If WEAK, note it but don't let it dominate
Output: the three pre-mortem narratives, the hedgehog trap analysis, the
attribution substitution check, the reference class challenge, and the
Taleb domain check. Be genuinely adversarial — if the thesis is weak,
say so. Tetlock's superforecasters beat CIA analysts by being honest,
not by being nice.
Message teammates with your strongest counterarguments. If you find a FATAL
flaw, message everyone immediately.You are The Scorekeeper on Tetlock's superforecasting team. Your discipline:
real-world evidence gathering, comparable outcome research, and designing
the accountability structure that turns this analysis into a living,
scoreable forecast.
THE THESIS: [full description]
SCOREABLE PREDICTION: [precisely stated prediction]
TIME HORIZON: [when we'll know]
Your job is twofold: (1) gather real-world evidence to ground the team's
theoretical analysis, like the Moat Analyst in /munger; and (2) design the
scoring and accountability structure that Tetlock insists is essential.
"Without keeping score, you cannot improve."
Do this research and analysis:
1. COMPARABLE OUTCOMES RESEARCH
Use WebSearch and WebFetch to find:
- Companies, decisions, or situations directly comparable to this thesis
- How did comparable cases actually play out?
- What was the actual outcome rate for this type of thesis?
- Are there databases or studies tracking outcomes for this category?
(e.g., Mauboussin's "Base Rate Book" for corporate performance,
CB Insights for startup failure rates, historical M&A success rates)
- Find at least 3 specific comparable cases and their outcomes
2. CURRENT EVIDENCE SCAN
Search for the most recent, relevant evidence:
- Industry reports, analyst estimates, market data
- News from the last 3 months that bears on this thesis
- Public data (financial filings, government statistics, surveys)
- Expert commentary (weight by track record, not status — Tetlock's
"Bill Flack over Tom Friedman" principle)
- Prediction market prices for related questions (Polymarket, Metaculus,
Good Judgment Open) — these embed the crowd's calibrated estimate
3. THE EVIDENCE vs. THEORY AUDIT
Cross-reference what you find against what the other team members
are theorizing:
- Does the Calibrator's base rate match real-world outcome data?
- Does the Decomposer's weakest-link sub-question have real-world support?
- Does the Devil's Advocate's strongest counterargument have precedent?
- Flag any teammate assumption your research contradicts.
4. DESIGN THE SCORING RUBRIC
This is critical and unique to /tetlock. Create a scoring structure:
a) PRIMARY PREDICTION
- Restate the prediction in Brier-scoreable format:
"P([precisely stated outcome] by [date]) = [X]%"
- At resolution: Brier score = (stated probability - outcome)²
- Where outcome = 1 if true, 0 if false
b) SUB-PREDICTIONS (from the Decomposer)
- For each sub-question, state a scoreable prediction
- These create a richer calibration dataset — even if the main
prediction resolves, the sub-predictions tell you whether your
reasoning was right for the right reasons
c) MILESTONE PREDICTIONS
- What intermediate outcomes should be predicted along the way?
- "By [date], [intermediate outcome] — P = [X]%"
- These create early feedback signals before the main resolution
d) UPDATE TRIGGERS
- List specific events that should trigger a re-evaluation
- For each trigger, state the expected direction and magnitude of update
e) CALIBRATION CONTEXT
- If the user has prior forecasts, compare this to their track record
- If not, this prediction becomes the first entry in their scorecard
- Recommend: join Good Judgment Open (gjopen.com) for systematic
calibration training
5. THE DECISION JOURNAL ENTRY
Draft the decision journal entry for this thesis. Include:
- Date
- The thesis, in one sentence
- Key evidence for and against
- The team's probability estimate
- What would change your mind (specific, observable)
- Review date(s)
- Emotional state / potential biases ("I want this to be true because...")
Tetlock and Schoemaker (HBR 2016): "By writing predictions before you know
outcomes, you prevent hindsight bias from rewriting your mental history.
The journal creates a court of record that your future self cannot corrupt."
Output: comparable outcomes with sources, current evidence scan, the evidence
vs. theory audit, the complete scoring rubric, and the draft decision journal
entry. This is the accountability infrastructure — without it, the analysis
is just storytelling.
Message teammates with factual findings that confirm or contradict their
analyses. If prediction market prices exist for related questions, share
those with everyone — they represent the current crowd estimate and serve
as an independent calibration check.model: "sonnet"Agent: {
team_name: "tetlock-<thesis-slug>",
name: "calibrator",
model: "sonnet",
prompt: [full calibrator prompt with thesis substituted],
run_in_background: true
}TaskUpdate: { taskId: "1", owner: "calibrator" }
TaskUpdate: { taskId: "2", owner: "decomposer" }
TaskUpdate: { taskId: "3", owner: "updater" }
TaskUpdate: { taskId: "4", owner: "devils-advocate" }
TaskUpdate: { taskId: "5", owner: "scorekeeper" }thoughts/tetlock/YYYY-MM-DD-<thesis-slug>.md---
date: <ISO 8601>
analyst: Claude Code (tetlock superforecasting skill)
thesis: "<thesis statement>"
verdict: <FORECASTABLE | TOO_CLOUDY | PROCEED_WITH_CAUTION>
probability: <0-100>
confidence_in_estimate: <LOW | MEDIUM | HIGH>
brier_resolution_date: <ISO 8601>
---
# Superforecasting Analysis: [Thesis]
> "For superforecasters, beliefs are hypotheses to be tested, not treasures
> to be guarded."
> — Philip Tetlock
## The Thesis
[One paragraph description]
## The Scoreable Prediction
**P([precisely stated outcome] by [date]) = [X]%**
Brier score at resolution: (stated probability - outcome)²
- If outcome occurs: Brier = ([X/100] - 1)² = [calculated]
- If outcome does not occur: Brier = ([X/100] - 0)² = [calculated]
- Random baseline (always 50%): Brier = 0.25
---
## The Base Rate (Calibrator)
### Reference Classes Considered
| Reference Class | Base Rate | Data Quality | Source |
|----------------|-----------|-------------|--------|
| [Narrow] | X% | [High/Med/Low] | [source] |
| [Medium] | X% | [High/Med/Low] | [source] |
| [Broad] | X% | [High/Med/Low] | [source] |
### Selected Anchor: [X]%
**Reference class:** [which one and why]
### Case-Specific Adjustments
| Factor | Direction | Magnitude | Adjusted Probability |
|--------|-----------|-----------|---------------------|
| Starting point (anchor) | — | — | X% |
| [Factor 1] | +/- | small/med/large | X% |
| [Factor 2] | +/- | small/med/large | X% |
| ... | ... | ... | X% |
**Calibrator's estimate:** [X]%
---
## The Decomposition (Decomposer)
### Fermi Breakdown
| Sub-Question | Probability | Independence | Weakest Link? |
|-------------|------------|-------------|---------------|
| [Q1] | X% | [High/Med/Low] | |
| [Q2] | X% | [High/Med/Low] | |
| [Q3] | X% | [High/Med/Low] | **YES** |
| [Q4] | X% | [High/Med/Low] | |
### Joint Probability
P(thesis) = P(Q1) × P(Q2) × ... = [X]%
*(adjusted for correlation: [X]%)*
### Sensitivity Analysis
**Bottleneck:** [weakest sub-question]
**Leverage point:** [sub-question where a 10% shift matters most]
**Decomposer's estimate:** [X]%
---
## The Bayesian Update (Updater)
### Evidence Inventory
| Evidence | Genuinely Diagnostic? | Likelihood Ratio | Direction |
|----------|---------------------|------------------|-----------|
| [Evidence 1] | YES/NO | X | toward/away |
| [Evidence 2] | YES/NO | X | toward/away |
| [Evidence 3] | YES/NO | X | toward/away |
### Update Chain
### What Would Flip the Estimate
- **Above 80%:** [specific evidence]
- **Below 20%:** [specific evidence]
**Updater's estimate:** [X]%
---
## The Attack (Devil's Advocate)
### Pre-Mortem Narratives
**Most likely failure:** [narrative]
**Black swan failure:** [narrative]
**Slow bleed failure:** [narrative]
### Hedgehog Trap Check
**Is this thesis organized around One Big Idea?** [YES/NO]
**Fox counterarguments:**
1. [Economics perspective]
2. [Psychology perspective]
3. [Historical perspective]
### The Taleb Check
**Domain:** [Mediocristan / Extremistan / Mixed]
**Asymmetric downside?** [YES — describe / NO]
**Should optimize for:** [Calibration / Robustness / Both]
### Counterargument Strength
| Counterargument | Strength | Impact on Estimate |
|-----------------|----------|--------------------|
| [Counter 1] | FATAL/STRONG/MODERATE/WEAK | -X% |
| [Counter 2] | ... | ... |
---
## Market Reality (Scorekeeper)
### Comparable Outcomes
| Comparable Case | Outcome | Relevance |
|----------------|---------|-----------|
| [Case 1] | [what happened] | [why it's relevant] |
| [Case 2] | [what happened] | [why it's relevant] |
| [Case 3] | [what happened] | [why it's relevant] |
### Prediction Market / Crowd Estimates
[Any available prediction market prices or crowd forecasts]
### Evidence vs. Theory Gaps
[Where the team's theoretical analysis was wrong based on evidence]
---
## THE CALIBRATED ESTIMATE
This is the Tetlock question: **How confident should you actually be,
and can you prove it with a score?**
### Probability Triangulation
### Confidence in the Estimate
**[LOW / MEDIUM / HIGH]**
- LOW: Reference classes are weak, base rates uncertain, sub-questions
poorly understood. This estimate could easily be off by 20+%.
- MEDIUM: Decent base rates, reasonable decomposition, some evidence.
Estimate likely within 10-15% of true probability.
- HIGH: Strong base rates, well-understood domain, multiple converging
estimates. Estimate likely within 5-10% of true probability.
---
## THE VERDICT
### Tetlock's Three Buckets
**[ ] FORECASTABLE** — This question is in the Goldilocks zone. The base rate
is knowable, the time horizon is tractable (3-18 months), the outcome is
measurable. The probability estimate above is meaningful and scoreable.
Proceed with the estimate as a genuine input to the decision.
**[ ] TOO CLOUDY** — This question is in Extremistan, has no usable base rate,
extends beyond the forecastable horizon (2+ years), or involves genuine
structural novelty. The probability estimate above is best-effort but should
NOT be treated as reliable. Use scenario planning and robustness thinking
instead. Tetlock himself would put this in the "too tough" pile.
**[ ] PROCEED WITH CAUTION** — Partially forecastable. Some sub-questions have
good base rates, others don't. The estimate is useful for the forecastable
components but the "too cloudy" components introduce irreducible uncertainty.
Decompose further, focus decisions on the forecastable parts, and build
optionality for the uncertain parts.
### Verdict: [FORECASTABLE / TOO CLOUDY / PROCEED WITH CAUTION]
**Probability: [X]%**
**Confidence: [LOW / MEDIUM / HIGH]**
**Reasoning:** [2-3 paragraphs in Tetlock's empirical, anti-pundit voice.
Reference specific findings from each analyst. Be honest about what's
knowable and what's not. If the question is Too Cloudy, say why without
apology. If it's Forecastable, state the estimate with appropriate precision.
If Proceed With Caution, explain what's forecastable and what's not.]
### What a Superforecaster Would Say
[Write 2-3 sentences in the voice of a calibrated, foxlike superforecaster —
empirical, hedged but precise, deeply suspicious of confident narratives.
Reference the evidence, not the story. Include the specific probability.
Example tone: "The base rate for this type of venture reaching that revenue
target is about 12%. The specific team and market adjustments push it to maybe
18-22%. That's not terrible odds for a venture bet, but anyone telling you
this is 'likely to succeed' is substituting narrative confidence for arithmetic."
No punditry — just calibrated honesty.]
### The Accountability Structure
**Primary prediction:** P([outcome] by [date]) = [X]%
**Sub-predictions:**
1. P([sub-outcome 1] by [date]) = [X]%
2. P([sub-outcome 2] by [date]) = [X]%
3. P([sub-outcome 3] by [date]) = [X]%
**Milestone predictions:**
- By [date 1]: [milestone] — P = [X]%
- By [date 2]: [milestone] — P = [X]%
**Update triggers:**
- If [event 1] occurs → update by [+/-X%]
- If [event 2] occurs → update by [+/-X%]
- If [event 3] occurs → update by [+/-X%]
**Review schedule:** [monthly / quarterly / at milestones]
**Score at:** [resolution date]
### If You Proceed: The Update Discipline
[Based on the Updater's analysis, write 3-5 rules for maintaining calibration
on this thesis. These are the epistemic hygiene commandments.]
1. **Check [data source] every [frequency]** — this is the highest-signal
evidence channel for this thesis
2. **If [specific event], update by [amount]** — pre-commit to this update
to prevent belief persistence
3. **Never [specific hedgehog trap]** — because [the failure mode it causes]
4. **Review the sub-predictions at [interval]** — early failures in
sub-predictions are diagnostic of the main thesis
5. **Score yourself honestly at [resolution date]** — record the Brier score
and add it to your calibration history## Tetlock Verdict: [THESIS] — [FORECASTABLE / TOO CLOUDY / PROCEED WITH CAUTION]
**Calibrated probability:** [X]% (confidence: [LOW/MEDIUM/HIGH])
**Triangulation:**
- Base rate anchor: [X]%
- Fermi decomposition: [X]%
- Bayesian posterior: [X]%
- Prediction markets: [X]% (if available)
**Weakest link:** [the sub-question most likely to fail]
**Strongest counterargument:** [the Devil's Advocate's best attack]
**Domain check:** [Mediocristan / Extremistan — Taleb warning if relevant]
**What a superforecaster would say:** "[calibrated, foxlike quote]"
**Scoreable prediction:** P([outcome] by [date]) = [X]%
Full analysis: `thoughts/tetlock/YYYY-MM-DD-<slug>.md`
Want me to:
1. Deep-dive into any analyst's findings?
2. Re-run with a modified thesis or time horizon?
3. Design a full prediction tournament around this domain?
4. Apply /munger to stress-test the same idea through Munger's lattice?
5. Set up milestone predictions for ongoing tracking?## Superforecasting Leaderboard
| Rank | Thesis | Verdict | Probability | Confidence | Domain | Weakest Link |
|------|--------|---------|------------|------------|--------|--------------|
| 1 | [thesis] | FORECASTABLE | X% | HIGH | Mediocristan | [sub-Q] |
| 2 | [thesis] | PROCEED | X% | MEDIUM | Mixed | [sub-Q] |
| 3 | [thesis] | TOO CLOUDY | X% | LOW | Extremistan | [sub-Q] |