Pre-Mortem

Identify failure modes before they occur by systematically questioning plans, designs, and implementations. Based on Gary Klein's technique, popularized by Shreyas Doshi (Stripe).

Usage

/premortem              # Auto-detect context, choose depth
/premortem quick        # Force quick analysis (plans, PRs)
/premortem deep         # Force deep analysis (before implementation)
/premortem <file>       # Analyze specific plan or code

Core Concept

"Imagine it's 3 months from now and this project has failed spectacularly. Why did it fail?"

Risk Categories (Shreyas Framework)

Category	Symbol	Meaning
Tiger	`[TIGER]`	Clear threat that will hurt us if not addressed
Paper Tiger	`[PAPER]`	Looks threatening but probably fine
Elephant	`[ELEPHANT]`	Thing nobody wants to talk about

CRITICAL: Verify Before Flagging

Do NOT flag risks based on pattern-matching alone. Every potential tiger MUST go through verification.

The False Positive Problem

Common mistakes that create false tigers:

Seeing a hardcoded path without checking for an `if exists():` fallback
Finding missing feature X without asking "is X in scope?"
Flagging code at line N without reading lines N±20 for context
Assuming error case isn't handled without tracing the code

Verification Checklist (REQUIRED)

Before flagging ANY tiger, verify:

yaml

potential_finding:
  what: "Hardcoded path at line 42"

verification:
  context_read: true    # Did I read ±20 lines around the finding?
  fallback_check: true  # Is there try/except, if exists(), or else branch?
  scope_check: true     # Is this even in scope for this code?
  dev_only_check: true  # Is this in __main__, tests/, or dev-only code?

result: tiger | paper_tiger | false_alarm

If ANY verification check is "no" or "unknown", DO NOT flag as tiger.
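
The gating rule above can be sketched in a few lines of Python. This is a minimal illustration, not part of the skill itself: the `Verification` class and `may_flag_as_tiger` function are hypothetical names, and `None` is assumed to stand for "unknown".

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class Verification:
    """One field per checklist item; None means "unknown" (assumed encoding)."""
    context_read: Optional[bool] = None
    fallback_check: Optional[bool] = None
    scope_check: Optional[bool] = None
    dev_only_check: Optional[bool] = None


def may_flag_as_tiger(v: Verification) -> bool:
    """Return True only when every checklist item is explicitly True.

    A False ("no") or None ("unknown") on any item blocks the tiger flag.
    """
    return all(getattr(v, f.name) is True for f in fields(v))
```

For example, `may_flag_as_tiger(Verification(True, True, True, None))` is `False` because one check is still unknown.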

Required Evidence Format

Every tiger MUST include:

yaml

tiger:
  risk: "<description>"
  location: "file.py:42"
  severity: high|medium
  # REQUIRED - what mitigation was checked and NOT found:
  mitigation_checked: "No exists() check, no try/except, no fallback branch"

If you cannot fill in `mitigation_checked` with specific evidence, it's not a verified tiger.
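
A structural check for the evidence format might look like the sketch below. The helper names are hypothetical, and the check only confirms that each required field is present and non-blank; whether the evidence is specific enough still needs a human or agent judgment.

```python
from typing import Dict, List

# Fields taken from the tiger format shown above.
REQUIRED_TIGER_FIELDS = ("risk", "location", "severity", "mitigation_checked")


def missing_tiger_fields(tiger: Dict[str, str]) -> List[str]:
    """Return the required fields that are absent or blank."""
    return [
        name for name in REQUIRED_TIGER_FIELDS
        if not str(tiger.get(name) or "").strip()
    ]


def is_verified_tiger(tiger: Dict[str, str]) -> bool:
    """A tiger counts as verified only if no required field is missing."""
    return not missing_tiger_fields(tiger)
```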

Workflow

Step 1: Detect Context & Depth

python

# Auto-detect based on context
if in_plan_creation:
    depth = "quick"  # Localized scope
elif before_implementation:
    depth = "deep"  # Global scope
elif pr_review:
    depth = "quick"  # Localized scope
else:
    # Ask user
    AskUserQuestion(
        question="What depth of pre-mortem analysis?",
        header="Depth",
        options=[
            {"label": "Quick (2-3 min)", "description": "Plans, PRs, localized changes"},
            {"label": "Deep (5-10 min)", "description": "Before implementation, global scope"}
        ]
    )
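
The depth selection for this step can also be written as a small standalone function, which makes the precedence order easy to test. This is a sketch under assumptions: `choose_depth` is a hypothetical name, the three flags are passed in explicitly, and `None` signals that the caller should fall back to asking the user.

```python
from typing import Optional


def choose_depth(in_plan_creation: bool = False,
                 before_implementation: bool = False,
                 pr_review: bool = False) -> Optional[str]:
    """Pick the analysis depth, checking contexts in the order given in Step 1."""
    if in_plan_creation:
        return "quick"  # Localized scope
    if before_implementation:
        return "deep"   # Global scope
    if pr_review:
        return "quick"  # Localized scope
    return None  # No context detected: ask the user
```

Because plan creation is checked first, `choose_depth(in_plan_creation=True, before_implementation=True)` returns `"quick"`.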

Step 2: Run Appropriate Checklist

Quick Checklist (Plans, PRs)

Run through these mentally and note any that apply:

Core Questions:

What's the single biggest thing that could go wrong?
Any external dependencies that could fail?
Is rollback possible if this breaks?
Edge cases not covered in tests?
Unclear requirements that could cause rework?

Output Format:

yaml

premortem:
  mode: quick
  context: "<plan/PR being analyzed>"

  # Two-pass process: first gather potential risks, then verify each one
  potential_risks:  # Pass 1: Pattern-matching findings
    - "hardcoded path at line 42"
    - "missing error handling for X"

  # Pass 2: After verification
  tigers:
    - risk: "<description>"
      location: "file.py:42"
      severity: high|medium
      category: dependency|integration|requirements|testing
      mitigation_checked: "<what was NOT found>"  # REQUIRED

  elephants:
    - risk: "<unspoken concern>"
      severity: medium

  paper_tigers:
    - risk: "<looks scary but ok>"
      reason: "<why it's fine - what mitigation EXISTS>"
      location: "file.py:42-48"  # Show the mitigation location

  false_alarms:  # Findings that turned out to be nothing
    - finding: "<what was initially flagged>"
      reason: "<why it's not a risk>"
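
The two-pass process in this output format can be sketched as a triage function: pass 1 supplies the raw findings, and pass 2 sorts each one into a bucket. The `triage` name and the `verify` callback are hypothetical; `verify` is assumed to return one of the three result values from the verification checklist.

```python
from typing import Callable, Dict, List

# Maps a verification result to the report section it belongs in.
BUCKET_FOR = {
    "tiger": "tigers",
    "paper_tiger": "paper_tigers",
    "false_alarm": "false_alarms",
}


def triage(potential_risks: List[str],
           verify: Callable[[str], str]) -> Dict[str, List[str]]:
    """Pass 2: sort every pass-1 finding into exactly one report bucket."""
    report: Dict[str, List[str]] = {name: [] for name in BUCKET_FOR.values()}
    for finding in potential_risks:
        result = verify(finding)
        if result not in BUCKET_FOR:
            raise ValueError(f"unexpected verification result: {result!r}")
        report[BUCKET_FOR[result]].append(finding)
    return report
```

Raising on an unexpected result is a design choice in this sketch: it keeps an unverified finding from silently landing in any bucket.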

Deep Checklist (Before Implementation)

Work through each category systematically:

Technical Risks:

Scalability: Works at 10x/100x current load?
Dependencies: External services + fallbacks defined?
Data: Availability, consistency, migrations clear?
Latency: SLA requirements will be met?
Security: Auth, injection, OWASP considered?
Error handling: All failure modes covered?

Integration Risks:

Breaking changes identified?
Migration path defined?
Rollback strategy exists?
Feature flags needed?

Process Risks:

Requirements clear and complete?
All stakeholder input gathered?
Tech debt being tracked?
Maintenance burden understood?

Testing Risks:

Coverage gaps identified?
Integration test plan exists?
Load testing needed?
Manual testing plan defined?

Output Format:

yaml

premortem:
  mode: deep
  context: "<implementation being analyzed>"

  # Two-pass process
  potential_risks:  # Pass 1: Initial scan findings
    - "no circuit breaker for external API"
    - "hardcoded timeout value"

  # Pass 2: After verification (read context, check for mitigations)
  tigers:
    - risk: "<description>"
      location: "file.py:42"
      severity: high|medium
      category: scalability|dependency|data|security|integration|testing
      mitigation_checked: "<what mitigations were looked for and NOT found>"
      suggested_fix: "<how to address>"

  elephants:
    - risk: "<unspoken concern>"
      severity: medium|high
      suggested_fix: "<suggested approach>"

  paper_tigers:
    - risk: "<looks scary>"
      reason: "<why it's actually ok - cite the mitigation code>"
      location: "file.py:45-52"

  false_alarms:
    - finding: "<initial concern>"
      reason: "<why verification showed it's not a risk>"

  checklist_gaps:
    - category: "<which checklist section>"
      items_failed: ["<item1>", "<item2>"]
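
The `checklist_gaps` section could be derived mechanically from the checklist answers. The sketch below assumes a hypothetical input shape, a mapping from checklist section to per-item pass/fail booleans, which the skill itself does not prescribe.

```python
from typing import Dict, List


def checklist_gaps(answers: Dict[str, Dict[str, bool]]) -> List[dict]:
    """Collect failed items per checklist section, skipping clean sections."""
    gaps = []
    for category, items in answers.items():
        failed = [item for item, passed in items.items() if not passed]
        if failed:
            gaps.append({"category": category, "items_failed": failed})
    return gaps
```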

Step 3: Present Risks via AskUserQuestion

BLOCKING: Present findings and require user decision.

python

# Build risk summary
risk_summary = format_risks(tigers, elephants)

AskUserQuestion(
    question=f"""Pre-Mortem identified {len(tigers)} tigers, {len(elephants)} elephants:

{risk_summary}

How would you like to proceed?""",
    header="Risks",
    options=[
        {
            "label": "Accept risks and proceed",
            "description": "Acknowledged but not blocking"
        },
        {
            "label": "Add mitigations to plan (Recommended)",
            "description": "Update plan with risk mitigations before proceeding"
        },
        {
            "label": "Research mitigation options",
            "description": "I don't know how to mitigate - help me find solutions"
        },
        {
            "label": "Discuss specific risks",
            "description": "Talk through particular concerns"
        }
    ]
)
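
The `format_risks` helper is used in this step but never defined. One possible shape is sketched below, assuming each risk is a dict with `risk` and `severity` keys and loosely following the layout of the example session; the real helper may differ.

```python
from typing import Dict, List


def format_risks(tigers: List[Dict[str, str]],
                 elephants: List[Dict[str, str]]) -> str:
    """Render tigers and elephants as numbered, severity-tagged lines."""
    lines: List[str] = []
    for label, risks in (("TIGERS", tigers), ("ELEPHANTS", elephants)):
        if not risks:
            continue  # Omit empty sections entirely
        lines.append(f"{label}:")
        for number, risk in enumerate(risks, start=1):
            lines.append(f"{number}. [{risk['severity'].upper()}] {risk['risk']}")
    return "\n".join(lines)
```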

Step 4: Handle User Response

If "Accept risks and proceed"

python

# Log acceptance for audit trail
print("Risks acknowledged. Proceeding with implementation.")
# Continue to next workflow step

If "Add mitigations to plan"

python

# User provides mitigation approach
# Update plan file with mitigations section
# Re-run quick premortem to verify mitigations address risks

If "Research mitigation options"

python

# Spawn parallel research for each HIGH severity tiger
for tiger in high_severity_tigers:
    # Internal: How has codebase handled this before?
    Task(
        subagent_type="scout",
        prompt=f"""
        Find how this codebase has previously handled: {tiger.category}

        Specifically looking for patterns related to: {tiger.risk}

        Return:
        - File:line references to similar solutions
        - Patterns used
        - Libraries/utilities available
        """
    )

    # External: What are best practices?
    Task(
        subagent_type="oracle",
        prompt=f"""
        Research best practices for: {tiger.risk}

        Context: {tiger.category} in a {tech_stack} codebase

        Return:
        - Recommended approaches (ranked)
        - Library options
        - Common pitfalls to avoid
        """
    )

# Wait for research to complete
# Synthesize options
# Present via AskUserQuestion with 2-4 mitigation options

If "Discuss specific risks"

python

# Ask which risk to discuss
AskUserQuestion(
    question="Which risk would you like to discuss?",
    header="Risk",
    options=[format_risk_option(r) for r in all_risks[:4]]
)

# Then have conversation about that specific risk

Step 5: Update Plan (if mitigations added)

If user added mitigations, append to the plan:

markdown

Risk Mitigations (Pre-Mortem)

Tigers Addressed:

{risk} (severity: {severity})
- Mitigation: {user_or_researched_mitigation}
- Added to phase: {phase_number}

Accepted Risks:

{risk} - Accepted because: {reason}

Pre-Mortem Run:

Date: {timestamp}
Mode: {quick|deep}
Tigers: {count}
Elephants: {count}
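
Filling in the plan section from collected data could be done with a small renderer. This is only a sketch: `render_mitigations` and the dict keys (`mitigation`, `phase`, `reason`, and the `run` metadata keys) are assumed names, not something the skill defines.

```python
from typing import Dict, List


def render_mitigations(addressed: List[Dict[str, str]],
                       accepted: List[Dict[str, str]],
                       run: Dict[str, object]) -> str:
    """Build the Risk Mitigations section as plain text lines."""
    lines = ["Risk Mitigations (Pre-Mortem)", "", "Tigers Addressed:", ""]
    for tiger in addressed:
        lines += [
            f"{tiger['risk']} (severity: {tiger['severity']})",
            f"- Mitigation: {tiger['mitigation']}",
            f"- Added to phase: {tiger['phase']}",
            "",
        ]
    lines += ["Accepted Risks:", ""]
    for risk in accepted:
        lines += [f"{risk['risk']} - Accepted because: {risk['reason']}", ""]
    lines += [
        "Pre-Mortem Run:",
        "",
        f"Date: {run['timestamp']}",
        f"Mode: {run['mode']}",
        f"Tigers: {run['tigers']}",
        f"Elephants: {run['elephants']}",
    ]
    return "\n".join(lines)
```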

Integration Points

In create_plan / plan-agent

After plan structure is approved, before ExitPlanMode:

python

# Run quick premortem
/premortem quick

# If HIGH risks found, block until addressed
# If only MEDIUM/LOW, inform and proceed

After plan approval, before implementation

python

# Run deep premortem on full plan
/premortem deep thoughts/shared/plans/YYYY-MM-DD-feature.md

# Block until all HIGH tigers addressed

In PR review workflows

python

# Run quick premortem on diff scope
/premortem quick

# Inform of any risks found

Severity Thresholds

Severity	Blocking?	Action Required
HIGH	Yes	Must address or explicitly accept
MEDIUM	No	Inform user, recommend addressing
LOW	No	Note for awareness
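
The blocking rule in this table can be expressed as a filter over the risk list. The sketch below assumes each risk is a dict with `risk` and `severity` keys plus an optional `addressed` flag, and that explicit acceptances arrive as a collection of risk names; all of these are illustrative choices.

```python
from typing import Dict, Iterable, List


def blocking_risks(risks: List[Dict[str, object]],
                   accepted: Iterable[str] = ()) -> List[str]:
    """Return HIGH risks that are neither addressed nor explicitly accepted."""
    accepted_names = set(accepted)
    return [
        str(risk["risk"])
        for risk in risks
        if str(risk["severity"]).lower() == "high"
        and not risk.get("addressed", False)
        and risk["risk"] not in accepted_names
    ]
```

The workflow stays blocked while this list is non-empty; MEDIUM and LOW risks never appear in it.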

Example Session

User: /premortem deep thoughts/shared/plans/2026-01-08-api-rate-limiting.md

Claude: Running deep pre-mortem on API rate limiting plan...

[Reviews plan against deep checklist]

Pre-mortem complete. Found 2 tigers, 1 elephant:

**TIGERS:**
1. [HIGH] No circuit breaker for external payment API
   - Category: dependency
   - If payment API is slow/down, requests will pile up

2. [HIGH] No rollback strategy defined
   - Category: integration
   - If rate limiting breaks auth flow, no quick fix path

**ELEPHANTS:**
1. [MEDIUM] Team hasn't used Redis before
   - We're introducing Redis for rate limit counters
   - Learning curve could slow implementation

**PAPER TIGERS:**
1. Database migration size - Only adds one index, <1s migration

[AskUserQuestion presented]

User: Research mitigation options

Claude: Researching mitigations...

[Spawns scout + oracle in parallel]

Found 3 options for circuit breaker:

1. **Use existing retry util** (Recommended)
   - Found at src/utils/retry.ts:45
   - Already handles exponential backoff
   - Just need to add circuit breaker state

2. **Add resilience4j-lite**
   - Popular library for this pattern
   - More features but new dependency

3. **Custom implementation**
   - Full control but more code to maintain

Which approach for the circuit breaker?

User: Option 1 - use existing retry util

Claude: Added to plan:

Risk Mitigations (Pre-Mortem)

Tigers Addressed:

No circuit breaker (HIGH)
- Mitigation: Extend src/utils/retry.ts with circuit breaker state
- Added to Phase 1

Shall I also address the rollback strategy tiger?
