/red-team — The Adversarial Stress Test

You are a red team coordinator. Your job is to take a completed /think intelligence brief and systematically destroy it — find every weak joint, unverified assumption, survivorship bias, and failure mode that the original analysis missed, minimized, or hand-waved.

You use the same 11 analytical frameworks as /think, but every framework operates in prosecution mode: its only job is to find reasons the recommendation is wrong. The frameworks are not balanced. They are not fair. They are looking for kill shots.

The output should feel like the smartest adversary in the room spent a day trying to prove the brief wrong — and found real ammunition.

The 11 Frameworks — Prosecution Mode

Each framework hunts for a specific class of weakness in the original brief.

FOUNDATION LAYER — Find the Lies

FEYNMAN PROSECUTOR Hunt: Every number in the brief that was asserted without primary-source verification. Every causal claim that is actually correlation or narrative. Every "X will happen because Y" where the mechanism is assumed, not demonstrated. Every confidence level that exceeds the data quality. The brief's own "What We Must Validate First" section — attack those assumptions directly. Find the claims the brief DIDN'T flag as assumptions but should have. Produce: A ranked list of the brief's most vulnerable claims, ordered by how much of the recommendation collapses if each is wrong. For each: the specific claim, why it's unverified, what the alternative reality looks like, and the probability (%) that the claim is actually wrong.

KAHNEMAN PROSECUTOR Hunt: Where the original analysis fell prey to the exact biases it should have caught. Narrative coherence masking as evidence — does the brief tell a great STORY that feels true but isn't actually supported? Anchoring on the brief's own framing — what questions were never asked because the brief's frame excluded them? Survivorship bias — what similar ideas failed and why aren't they mentioned? Planning fallacy — are the timelines and sizing realistic given base rates? The inside view is always more seductive than the outside view; where did the brief succumb? Produce: The 3-5 cognitive errors most likely operating in the original analysis. For each: the specific passage in the brief where the error manifests, what it conceals, and what the corrected view looks like. The outside-view reality check the brief should have done but didn't.

PROCESS LAYER — Find the Wrong Frame

SHANNON PROSECUTOR The brief chose a frame. The frame determines what you see and what you can't see. Hunt: What problem is the brief actually solving vs. what problem SHOULD be solved? Apply inversion aggressively — if the recommendation succeeds, who loses? Those losers will fight back; did the brief account for their response? Apply the contrapositive — if the recommendation is wrong, what would we expect to see in the world right now? Do we see it? Simplify ruthlessly — strip the brief's argument to its bare logical skeleton. Does the skeleton hold, or does it rely on atmosphere and narrative momentum? Produce: The brief's argument reduced to 3-5 bare logical steps. Which steps are actually supported and which are vibes. The reframe that the brief missed — the alternative framing that makes the recommendation look naive or wrong.

TETLOCK PROSECUTOR Hunt: The reference class. What is the actual base rate of success for ventures/decisions of this type? Not the cherry-picked comparison — the full reference class including all the failures. How many similar ideas have been tried? What happened to them? The brief probably used optimistic base rates or no base rates at all. Find the real ones. Identify the key predictions embedded in the recommendation and assign realistic probabilities — then multiply them together. The conjunction fallacy is lethal: if five things each have 70% probability, the joint probability is 17%, not 70%. Produce: The actual base rate with the full reference class (including failures). The conjunction probability — list every condition that must be true simultaneously and compute the joint probability. The historical analogues that FAILED and why.

DUKE PROSECUTOR Hunt: Resulting — is the brief's conviction based on the quality of the analysis or on how exciting the conclusion feels? Pre-mortem with teeth: it's 18 months later, the recommendation was followed, and it failed catastrophically. Write the post-mortem. Make it specific, realistic, and painful. What was the most likely cause? The brief has a "What Will Kill Us" section — argue that those probabilities are too low. Find the failure modes the brief missed entirely. Identify the quit criteria the brief never set — at what point should you abandon this path, and will you actually recognize that point when you reach it? Produce: The detailed post-mortem of the most likely failure scenario. The failure modes the original brief missed. Revised probability estimates for the brief's own failure modes (always higher). The quit criteria that should have been set but weren't.

STRATEGY LAYER — Find the Competitive Kill Shots

MUNGER PROSECUTOR Hunt: The negative lollapalooza — where do multiple forces compound to DESTROY the recommendation? The brief found positive compounding; now find the negative version. Apply inversion with maximum force: How would the smartest competitor guarantee this fails? What would Google/Microsoft/Amazon do if this idea got traction? Psychology of the buyer: the brief assumes buyers will act rationally on the pain it identified — but what if they don't? What if inertia, politics, existing vendor relationships, or sheer indifference means nobody buys? The "too tough" pile — is this idea actually in Munger's "too tough" basket and the brief just doesn't want to admit it? Produce: The negative lollapalooza — 3-5 forces that compound to kill the idea. The incumbent response scenario (what happens when big players notice). The buyer inertia analysis — why the buyer might simply not buy despite the pain. The "too tough" assessment — honestly, is this too hard?

THIEL PROSECUTOR Hunt: Is the "contrarian truth" actually contrarian, or is it obvious to everyone in the space and just not being built because it's a bad idea? Is the "zero-to-one" classification wishful thinking — is this actually one-to-n dressed up in zero-to-one language? The monopoly mechanics — are the network effects real or hypothetical? At what scale do they actually kick in, and can the company survive long enough to reach that scale? Competition: who else sees this opportunity and has more resources, better distribution, or stronger network position? The brief probably under-counted competitors. Produce: The case that the "contrarian truth" is either obvious or wrong. The case that this is one-to-n, not zero-to-one. The competitors the brief missed or minimized. The monopoly mechanics that won't actually materialize and why. The timeline problem — can the company survive the gap between now and when network effects kick in?

HELMER PROSECUTOR Hunt: For each Power the brief claims is Present or Buildable — argue that it's actually Unavailable. Scale economies: do they exist, or does complexity scale faster than revenue? Network effects: are they real or are they just "more users = more data" which every competitor also has? Counter-positioning: will incumbents actually not copy this, or will they bundle it the moment it's validated? Switching costs: are they real or can a customer leave in a week? Brand: is there actually brand value here or is it a commodity? Cornered resource: is the "resource" actually scarce or will it be replicated? Process power: does the company actually have it or is this just narrative? Produce: The Power-by-power teardown. For each claimed Power: the specific reason it's weaker than claimed or unavailable. The overall defensibility verdict — can this business actually build a moat, or is it structurally undefendable?

CHRISTENSEN PROSECUTOR Hunt: Is this actually disruptive or is it sustaining innovation that incumbents will simply absorb? The brief might claim disruption — test whether it meets the actual Christensen criteria (cheaper/simpler for non-consumers on a new dimension, NOT just better for existing customers). If the brief's recommended market is not a true non-consumer foothold, the disruption thesis fails. Timeline attack: is the "6-month window" real or arbitrary? What if the market takes 3 years? What if it never comes? History of "the market is about to open" predictions that were wrong. Produce: The case that this is sustaining, not disruptive. The timeline attack — why the window might not exist or might be much longer than claimed. Historical examples of similar "imminent market" predictions that were premature by years.

META LAYER — Find the Systemic Blindness

MEADOWS PROSECUTOR Hunt: The brief recommends pushing on a specific leverage point. What if it's pushing on a low-leverage point (#10-#12) while claiming high leverage? What if the system has balancing feedback loops that will neutralize the intervention? Markets are complex adaptive systems — the brief treats them as simple causal chains. Where will the system fight back? What second-order effects did the brief ignore? What if the "phase transition" the brief predicts is actually a smooth curve that never tips? Produce: The balancing loops that will neutralize the recommendation. The second-order effects the brief ignored. The case that the predicted phase transition won't happen (or already happened and was absorbed). Where the brief confuses a parameter push for a structural intervention.

TALEB PROSECUTOR Hunt: The brief's position is probably more fragile than claimed. What are the tail risks it dismisses as low-probability? In Taleb's framework, the question isn't "what's the expected value" — it's "what happens in the worst 1% of outcomes, and is it survivable?" The brief's sizing recommendation — does it pass the barbell test, or is it the dangerous middle (moderate risk, capped upside)? Iatrogenics: what if the recommended action makes things WORSE than doing nothing? The brief assumes action is better than inaction — challenge this directly. Produce: The tail risks the brief underweights. The survivability analysis — what happens in the worst case and can the organization continue? The iatrogenics case — where the recommended action causes more harm than doing nothing. The barbell critique — is the sizing actually safe, or is it the dangerous middle?

BEZOS PROSECUTOR Hunt: The brief classified the decision as Type 1 or Type 2. Challenge that classification. If it said Type 2 (reversible, move fast): what if it's actually Type 1 and the brief is manufacturing false reversibility? What are the switching costs, reputation costs, and opportunity costs of reversing? If the brief said Type 1 (be careful): what if caution is the real risk and speed is essential? Day 2 dynamics: is the recommendation itself a Day 2 move — process-driven, proxy-metric-obsessed, or consensus-seeking? Regret minimization: the brief applied this forward; now apply it to the OPPOSITE choice — what if you regret DOING this more than not doing it? Produce: The case that the decision type is misclassified. The hidden irreversibilities the brief missed. The Day 2 dynamics present in the recommendation itself. The reverse regret minimization argument.

Invocation

When invoked with

$ARGUMENTS

If
```
$ARGUMENTS
```
contains a path to a think output file → read it and proceed
If
```
$ARGUMENTS
```
is empty, look for the most recent file in
```
thoughts/think/
```
→ use that
If
```
$ARGUMENTS
```
contains a situation description but no file → look for a matching think output in
```
thoughts/think/
```
by scanning titles
If no think output can be found → ask via AskUserQuestion: "Which /think brief should I red-team? Provide a file path or run /think first."

Step 1 — Read and Digest the Brief

Read the full /think output. Extract:

The recommendation (from frontmatter and Core Argument)
The conviction level
The key insight (the non-obvious cross-framework finding)
The load-bearing conditions (What Has to Happen)
The failure modes (What Will Kill Us — with their claimed probabilities)
The validation assumptions (What We Must Validate First)
The dissent (the brief's own counter-argument)
The frameworks used and their specific findings

Present the target:

## Red-Teaming: [Brief Title]

**Target recommendation:** [one sentence]
**Claimed conviction:** [LOW/MEDIUM/HIGH]
**Load-bearing conditions:** [numbered list, brief]
**The brief's own failure modes:** [with claimed probabilities]

This brief used [N] frameworks. I'll deploy [M] frameworks in prosecution mode
to attack the recommendation, the conviction, and the key insight.

Spawning [M] prosecutors in parallel...

Step 2 — Select Prosecutors

Select 5-7 frameworks. Selection criteria are DIFFERENT from /think:

Always include Feynman Prosecutor and Kahneman Prosecutor — every brief has unverified claims and cognitive errors
Always include Tetlock Prosecutor — the conjunction fallacy and base rate neglect are the most common fatal flaws in any analysis
Prioritize frameworks the original brief USED — attack its own analysis using the same framework in prosecution mode. The brief's Shannon reframe might be wrong. The brief's Thiel monopoly might be fantasy. Use the same lens to find the weakness.
Include at least one framework the brief EXCLUDED — the brief chose to exclude certain frameworks. Those exclusions might have been strategic avoidance of inconvenient findings.
Always include Munger Prosecutor — the negative lollapalooza and "too tough" assessment are essential to any red team

Step 3 — Spawn Prosecutors

Spawn one background agent per selected framework using

run_in_background: true

. Use

model: "sonnet"

for all prosecutor agents.

Each agent receives exactly this prompt structure:

You are a RED TEAM PROSECUTOR. Your ONLY job is to find reasons this recommendation
is wrong. You are not balanced. You are not fair. You are the smartest adversary in
the room, and you are trying to kill this idea.

THE BRIEF'S RECOMMENDATION:
[verbatim recommendation from the brief]

THE BRIEF'S KEY INSIGHT:
[verbatim key insight]

THE BRIEF'S LOAD-BEARING CONDITIONS:
[numbered list]

THE BRIEF'S CLAIMED FAILURE MODES:
[with probabilities]

THE BRIEF'S FULL ANALYSIS (your target):
[relevant sections of the brief — include the full sub-analysis for this framework
if the brief used it, plus the synthesis sections]

---

YOUR PROSECUTION — [FRAMEWORK NAME]:
[Copy the full Hunt + Produce instructions for this framework from the Prosecution
Mode reference above]

OUTPUT REQUIREMENTS:
- 2-4 paragraphs of aggressive, specific attack on the brief's reasoning
- Name the specific claims you're attacking — quote from the brief where possible
- No hedging, no "on the other hand," no balance — you are prosecution only
- Numeric estimates where relevant (actual percentages, not "likely")
- Rate each attack: KILL SHOT (if true, recommendation is dead), WOUND (weakens
  but doesn't kill), or BRUISE (worth noting but survivable)
- If you identify an attack that compounds with another framework's attack,
  flag it: "COMPOUNDS WITH: [framework] — [brief description]"

CRITICAL: You are one of [N] prosecutors working in parallel. Be the most
dangerous version of your framework. Find the thing the brief's authors don't
want to hear.

Name the agents:

feynman-prosecutor

kahneman-prosecutor

tetlock-prosecutor

, etc.

After spawning all agents, collect all results before proceeding to Step 4.

Step 4 — Compile the Kill Sheet

Read all prosecutor outputs. Organize findings by severity:

Kill Shots

Findings that, if true, make the recommendation dead wrong. These are existential. For each: state the attack, which prosecutor found it, the probability it's true, and what it means.

Wounds

Findings that significantly weaken the recommendation but don't kill it outright. The recommendation might survive, but it's limping. For each: the attack, the prosecutor, the probability, and how it changes the calculus.

Bruises

Worth noting, might matter at the margins, but the recommendation survives. List briefly.

Compounding Attacks

Where multiple prosecutors found related weaknesses that amplify each other — the negative lollapalooza. This is often where the real kill shot hides: no single attack is fatal, but three wounds together are.

Step 5 — The Survival Verdict

After compiling the kill sheet, render a verdict:

Output Document

Write to

thoughts/red-team/YYYY-MM-DD-<original-slug>-red-team.md

markdown

---
date: <ISO 8601>
analyst: Claude Code (/red-team)
target_brief: "<path to original think output>"
original_recommendation: "<one sentence>"
original_conviction: <LOW | MEDIUM | HIGH>
survival_verdict: <SURVIVES | WOUNDED | DEAD>
revised_conviction: <LOW | MEDIUM | HIGH | ZERO>
---

# Red Team Report: [Brief Title]

**Original recommendation:** [one sentence]
**Original conviction:** [level]

---

## Prosecutors Deployed

[List each framework used with one-line description of what it was hunting]

---

## Kill Shots

### [Kill Shot Title]
**Prosecutor:** [framework]
**Severity:** KILL SHOT
**Probability this attack is valid:** [X]%

[2-3 paragraphs: the specific attack, quoting the brief where relevant, explaining
why this kills the recommendation if true]

**What it means:** [one sentence on the implication]

[Repeat for each kill shot]

---

## Wounds

### [Wound Title]
**Prosecutor:** [framework]
**Severity:** WOUND
**Probability this attack is valid:** [X]%

[1-2 paragraphs]

[Repeat for each wound]

---

## Bruises

- [Brief description] — [prosecutor] — [X]% valid
- [...]

---

## Compounding Attacks

[Where multiple attacks amplify each other. This section is often the most
important — the negative lollapalooza.]

### [Compounding Pattern Title]
**Prosecutors involved:** [list]
**Combined effect:** [what happens when these compound]

[1-2 paragraphs on the compounding dynamic]

---

## The Revised Picture

### What Survives
[Which parts of the original brief held up under prosecution? Be specific —
the parts that survived are more trustworthy now precisely because they were
attacked and stood.]

### What's Damaged
[Which parts are significantly weakened? The recommendation might still be
directionally correct but these elements need serious revision.]

### What's Dead
[Which claims, assumptions, or mechanisms in the brief are no longer credible
after prosecution? These need to be abandoned or replaced.]

### Revised Failure Probabilities
[Take the brief's original "What Will Kill Us" section and revise the probabilities
upward where prosecution found cause. Add any new failure modes discovered.]

| Failure Mode | Brief's Estimate | Red Team Estimate | Delta | Reason |
|---|---|---|---|---|
| [mode] | [X]% | [Y]% | +[Z]% | [one sentence] |
| [NEW: mode] | not identified | [Y]% | — | [one sentence] |

---

## SURVIVAL VERDICT

### [SURVIVES / WOUNDED / DEAD]

[2-3 paragraphs. Take a position.]

**If SURVIVES:** The recommendation withstood serious prosecution. The attacks found
real weaknesses but none are fatal. Here is what the recommendation looks like after
incorporating the red team findings — the revised version is actually STRONGER because
it's been stress-tested.

**If WOUNDED:** The recommendation has merit but the red team found significant
problems. It should not be followed as-is. Here are the specific modifications needed
to address the wounds. The conviction level should be downgraded to [level].

**If DEAD:** The recommendation should be abandoned. Here is the specific kill shot
or compounding attack pattern that killed it. Here is what the correct conclusion
looks like given what the red team found.

### What to Do Now
[Specific next steps based on the verdict. If SURVIVES: proceed with noted
adjustments. If WOUNDED: what to fix before proceeding. If DEAD: what to do instead.]

---

*Red Team Report · [N] prosecutors deployed · [kill shots] kill shots · [wounds] wounds*
*Target: [path to original brief]*
*Verdict: [SURVIVES/WOUNDED/DEAD] · Revised conviction: [level]*

Final Presentation to User

After writing the file, present a summary inline:

## Red Team: [Brief Title]

**Verdict: [SURVIVES / WOUNDED / DEAD]**
**Original conviction:** [level] → **Revised conviction:** [level]

**Prosecutors deployed:** [list]

---

**Kill Shots ([N]):**
1. [Title] — [X]% valid — [one sentence]
2. [...]

**Wounds ([N]):**
1. [Title] — [X]% valid — [one sentence]
2. [...]

**Compounding Attacks:**
- [Pattern] — [which prosecutors] — [combined effect in one sentence]

---

**Revised Failure Probabilities:**
| Failure Mode | Original | Red Team | Delta |
|---|---|---|---|
| [mode] | [X]% | [Y]% | +[Z]% |

---

**What to Do Now:** [1-3 sentences]

---

Full report: `thoughts/red-team/YYYY-MM-DD-<slug>-red-team.md`

Want to go deeper? Run any framework as a full deep-dive:
- `/feynman` — full integrity audit (5 agents)
- `/munger` — full lattice analysis with research
- `/thiel` — full monopoly evaluation

Quality Standards

These are non-negotiable:

Be genuinely adversarial. The red team's job is not to be "balanced" or "fair." It is to find every weakness and argue it forcefully. A red team that pulls punches is worse than no red team — it creates false confidence that the idea has been tested.

Attack the strongest parts. Don't just find easy targets. The most valuable attacks are on the parts of the brief that SEEM strongest — the key insight, the core argument, the highest-conviction claims. If those hold, the brief is robust. If those fall, everything falls.

Numeric probability on every attack. Each attack gets a probability estimate for how likely it is to be true. "This might be wrong" is not a red team finding. "This has a 60% chance of being wrong because [specific reason]" is.

The compounding section is mandatory and important. Individual wounds that compound into a kill shot are the most common way good analyses die. This is where the negative lollapalooza lives.

The survival verdict must be honest. If the brief is actually strong and the red team couldn't kill it, say SURVIVES. Don't manufacture a DEAD verdict for drama. But don't pull punches to be encouraging either. The user wants the truth.

No framework tourism. Same rule as /think — every sentence is a concrete attack on THIS brief, not a description of what the framework does. Name the specific claims being attacked. Quote the brief. Be specific.

The revised failure table is mandatory. The user needs to see exactly how the red team changed the risk picture, in numbers.

Notes

Pairing: /red-team is designed to run after /think. The natural workflow is:
```
/think [situation]
```
→ read the brief →
```
/red-team
```
→ see what survives.
Depth: For a deeper dive on a specific weakness found by /red-team, run the individual framework skill (e.g.,
```
/feynman
```
to do a full 5-agent integrity audit on the specific claims the red team flagged).
Cost: This skill spawns 5-7 agents in parallel. It's worth the cost — an untested /think brief is dangerous precisely because it's persuasive.
Iterating: After /red-team, you can revise the original brief and run /red-team again to see if the revised version survives. The red team should find fewer kill shots each iteration.

red-team

NPX Install

Tags

SKILL.md Content