premortem
Pre-Mortem
Identify failure modes before they occur by systematically questioning plans, designs, and implementations. Based on Gary Klein's technique, popularized by Shreyas Doshi (Stripe).
Usage
```
/premortem          # Auto-detect context, choose depth
/premortem quick    # Force quick analysis (plans, PRs)
/premortem deep     # Force deep analysis (before implementation)
/premortem <file>   # Analyze specific plan or code
```
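The invocation forms above can be read as an optional depth keyword followed by an optional target path. The sketch below shows one way to parse that shape. `parse_premortem_args` and `PremortemArgs` are hypothetical names introduced for illustration; the skill's actual argument handling is not shown in this document.

```python
from typing import NamedTuple, Optional


class PremortemArgs(NamedTuple):
    depth: Optional[str]   # "quick", "deep", or None to auto-detect
    target: Optional[str]  # plan or code file, or None for the current context


def parse_premortem_args(command: str) -> PremortemArgs:
    """Split a '/premortem ...' invocation into depth and target (hypothetical helper)."""
    parts = command.split()
    if not parts or parts[0] != "/premortem":
        raise ValueError(f"not a /premortem command: {command!r}")
    args = parts[1:]
    depth = None
    if args and args[0] in ("quick", "deep"):
        depth = args.pop(0)
    # Anything left is treated as the file to analyze (paths with spaces are not handled).
    target = args[0] if args else None
    return PremortemArgs(depth, target)
```

Under these assumptions, `/premortem deep plan.md` yields a forced depth plus a target, while a bare `/premortem` leaves both unset so that depth is auto-detected.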
Core Concept
"Imagine it's 3 months from now and this project has failed spectacularly. Why did it fail?"
Risk Categories (Shreyas Framework)
| Category | Symbol | Meaning |
|---|---|---|
| Tiger | | Clear threat that will hurt us if not addressed |
| Paper Tiger | | Looks threatening but probably fine |
| Elephant | | Thing nobody wants to talk about |
CRITICAL: Verify Before Flagging
Do NOT flag risks based on pattern-matching alone. Every potential tiger MUST go through verification.
The False Positive Problem
Common mistakes that create false tigers:
- Seeing a hardcoded path without checking for an `if exists():` fallback
- Finding missing feature X without asking "is X in scope?"
- Flagging code at line N without reading lines N±20 for context
- Assuming error case isn't handled without tracing the code
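The "read lines N±20" habit can be made mechanical. The sketch below extracts that window and scans it for textual hints of a mitigation. Both function names and the marker list are assumptions made for illustration; a hint only tells you where to look, and tracing the code is still required.

```python
def context_window(lines: list[str], line_no: int, radius: int = 20) -> list[tuple[int, str]]:
    """Return (line_number, text) pairs for line_no ± radius, clamped to the file.

    line_no is 1-indexed, matching "file.py:42"-style locations.
    """
    if not 1 <= line_no <= len(lines):
        raise ValueError(f"line {line_no} is outside the file (1..{len(lines)})")
    start = max(1, line_no - radius)
    end = min(len(lines), line_no + radius)
    return [(n, lines[n - 1]) for n in range(start, end + 1)]


def has_fallback_hint(window: list[tuple[int, str]]) -> bool:
    """Crude textual hint that a mitigation may exist nearby (assumed marker list)."""
    markers = ("try:", "except", "exists(", "else:")
    return any(marker in text for _, text in window for marker in markers)
```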
Verification Checklist (REQUIRED)
Before flagging ANY tiger, verify:
```yaml
potential_finding:
  what: "Hardcoded path at line 42"
  verification:
    context_read: true    # Did I read ±20 lines around the finding?
    fallback_check: true  # Is there try/except, if exists(), or else branch?
    scope_check: true     # Is this even in scope for this code?
    dev_only_check: true  # Is this in __main__, tests/, or dev-only code?
  result: tiger | paper_tiger | false_alarm
```
If ANY verification check is "no" or "unknown", DO NOT flag as tiger.
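The "any no or unknown means no tiger" rule is a simple all-or-nothing gate, sketched below. The sketch assumes `True` means the check was performed and the finding survived it, and that a missing key or `None` stands for "unknown". `may_flag_as_tiger` is a hypothetical name, not part of the skill.

```python
from typing import Optional

REQUIRED_CHECKS = ("context_read", "fallback_check", "scope_check", "dev_only_check")


def may_flag_as_tiger(verification: dict[str, Optional[bool]]) -> bool:
    """True only when every required check was performed and passed.

    A missing key or None counts as "unknown"; False counts as "no".
    """
    return all(verification.get(check) is True for check in REQUIRED_CHECKS)
```

Comparing with `is True` rather than relying on truthiness keeps values such as the string `"unknown"` from slipping through the gate.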
Required Evidence Format
Every tiger MUST include:
```yaml
tiger:
  risk: "<description>"
  location: "file.py:42"
  severity: high|medium
  # REQUIRED - what mitigation was checked and NOT found:
  mitigation_checked: "No exists() check, no try/except, no fallback branch"
```
If you cannot fill in `mitigation_checked` with specific evidence, it's not a verified tiger.
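One way to enforce the evidence format is to reject any tiger entry with an empty required field. This is a minimal sketch: the field names come from the YAML above, while `tiger_evidence_problems` and the severity whitelist are illustrative assumptions.

```python
REQUIRED_TIGER_FIELDS = ("risk", "location", "severity", "mitigation_checked")
ALLOWED_SEVERITIES = ("high", "medium")


def tiger_evidence_problems(tiger: dict) -> list[str]:
    """Return a list of problems; an empty list means the entry has the required evidence."""
    problems = []
    for field in REQUIRED_TIGER_FIELDS:
        value = tiger.get(field)
        if not isinstance(value, str) or not value.strip():
            problems.append(f"missing or empty field: {field}")
    severity = tiger.get("severity")
    if isinstance(severity, str) and severity.strip() and severity not in ALLOWED_SEVERITIES:
        problems.append(f"severity must be one of {ALLOWED_SEVERITIES}, got {severity!r}")
    return problems
```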
Workflow
Step 1: Detect Context & Depth
```python
# Auto-detect based on context
if in_plan_creation:
    depth = "quick"  # Localized scope
elif before_implementation:
    depth = "deep"  # Global scope
elif pr_review:
    depth = "quick"  # Localized scope
else:
    # Ask user
    AskUserQuestion(
        question="What depth of pre-mortem analysis?",
        header="Depth",
        options=[
            {"label": "Quick (2-3 min)", "description": "Plans, PRs, localized changes"},
            {"label": "Deep (5-10 min)", "description": "Before implementation, global scope"}
        ]
    )
```
Step 2: Run Appropriate Checklist
Quick Checklist (Plans, PRs)
Run through these mentally and note any that apply:
Core Questions:
- What's the single biggest thing that could go wrong?
- Any external dependencies that could fail?
- Is rollback possible if this breaks?
- Edge cases not covered in tests?
- Unclear requirements that could cause rework?
Output Format:
```yaml
premortem:
  mode: quick
  context: "<plan/PR being analyzed>"
  # Two-pass process: first gather potential risks, then verify each one
  potential_risks:  # Pass 1: Pattern-matching findings
    - "hardcoded path at line 42"
    - "missing error handling for X"
  # Pass 2: After verification
  tigers:
    - risk: "<description>"
      location: "file.py:42"
      severity: high|medium
      category: dependency|integration|requirements|testing
      mitigation_checked: "<what was NOT found>"  # REQUIRED
  elephants:
    - risk: "<unspoken concern>"
      severity: medium
  paper_tigers:
    - risk: "<looks scary but ok>"
      reason: "<why it's fine - what mitigation EXISTS>"
      location: "file.py:42-48"  # Show the mitigation location
  false_alarms:  # Findings that turned out to be nothing
    - finding: "<what was initially flagged>"
      reason: "<why it's not a risk>"
```
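The two-pass process in the output format can be sketched as a bucketing step over verified findings. The `VerifiedFinding` record and its fields are assumptions made for this example. In particular, leaving unverified findings in `potential_risks` is one possible reading of "verify before flagging", not something the format prescribes.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class VerifiedFinding:
    finding: str                     # Pass 1 text, e.g. "hardcoded path at line 42"
    in_scope: bool                   # False means there is nothing to worry about here
    mitigation_found: Optional[str]  # what mitigation exists (and where), or None
    mitigation_checked: str = ""     # what was looked for and NOT found


def bucket(findings: list[VerifiedFinding]) -> dict[str, list]:
    """Pass 2: sort findings into the output sections based on verification evidence."""
    out = {"potential_risks": [], "tigers": [], "paper_tigers": [], "false_alarms": []}
    for f in findings:
        if not f.in_scope:
            out["false_alarms"].append({"finding": f.finding, "reason": "out of scope"})
        elif f.mitigation_found:
            out["paper_tigers"].append({"risk": f.finding, "reason": f.mitigation_found})
        elif f.mitigation_checked:
            out["tigers"].append({"risk": f.finding, "mitigation_checked": f.mitigation_checked})
        else:
            # No evidence either way: still unverified, so it is not promoted to a tiger.
            out["potential_risks"].append(f.finding)
    return out
```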
Deep Checklist (Before Implementation)
Work through each category systematically:
Technical Risks:
- Scalability: Works at 10x/100x current load?
- Dependencies: External services + fallbacks defined?
- Data: Availability, consistency, migrations clear?
- Latency: SLA requirements will be met?
- Security: Auth, injection, OWASP considered?
- Error handling: All failure modes covered?
Integration Risks:
- Breaking changes identified?
- Migration path defined?
- Rollback strategy exists?
- Feature flags needed?
Process Risks:
- Requirements clear and complete?
- All stakeholder input gathered?
- Tech debt being tracked?
- Maintenance burden understood?
Testing Risks:
- Coverage gaps identified?
- Integration test plan exists?
- Load testing needed?
- Manual testing plan defined?
Output Format:
```yaml
premortem:
  mode: deep
  context: "<implementation being analyzed>"
  # Two-pass process
  potential_risks:  # Pass 1: Initial scan findings
    - "no circuit breaker for external API"
    - "hardcoded timeout value"
  # Pass 2: After verification (read context, check for mitigations)
  tigers:
    - risk: "<description>"
      location: "file.py:42"
      severity: high|medium
      category: scalability|dependency|data|security|integration|testing
      mitigation_checked: "<what mitigations were looked for and NOT found>"
      suggested_fix: "<how to address>"
  elephants:
    - risk: "<unspoken concern>"
      severity: medium|high
      suggested_fix: "<suggested approach>"
  paper_tigers:
    - risk: "<looks scary>"
      reason: "<why it's actually ok - cite the mitigation code>"
      location: "file.py:45-52"
  false_alarms:
    - finding: "<initial concern>"
      reason: "<why verification showed it's not a risk>"
  checklist_gaps:
    - category: "<which checklist section>"
      items_failed: ["<item1>", "<item2>"]
```
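The `checklist_gaps` section can be derived directly from the answers to the deep checklist. A minimal sketch, assuming each checklist item is recorded as `True` (satisfied) or `False` (not satisfied); the `checklist_gaps` function name and the input shape are hypothetical.

```python
def checklist_gaps(results: dict[str, dict[str, bool]]) -> list[dict]:
    """Build checklist_gaps entries from per-item pass/fail answers.

    results maps a checklist section (e.g. "Integration Risks") to
    {item: True if satisfied, False if not}.
    """
    gaps = []
    for category, items in results.items():
        failed = [item for item, ok in items.items() if not ok]
        if failed:
            gaps.append({"category": category, "items_failed": failed})
    return gaps
```

Sections with no failed items are omitted, so an empty list means the plan passed every checklist item that was recorded.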
Step 3: Present Risks via AskUserQuestion
BLOCKING: Present findings and require user decision.
```python
# Build risk summary
risk_summary = format_risks(tigers, elephants)
AskUserQuestion(
    question=f"""Pre-Mortem identified {len(tigers)} tigers, {len(elephants)} elephants:
{risk_summary}
How would you like to proceed?""",
    header="Risks",
    options=[
        {
            "label": "Accept risks and proceed",
            "description": "Acknowledged but not blocking"
        },
        {
            "label": "Add mitigations to plan (Recommended)",
            "description": "Update plan with risk mitigations before proceeding"
        },
        {
            "label": "Research mitigation options",
            "description": "I don't know how to mitigate - help me find solutions"
        },
        {
            "label": "Discuss specific risks",
            "description": "Talk through particular concerns"
        }
    ]
)
```
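The snippet above calls `format_risks` without defining it. One plausible shape for that helper is sketched below, loosely following the layout of the example session; the exact output format is an assumption, not the skill's actual implementation.

```python
def format_risks(tigers: list[dict], elephants: list[dict]) -> str:
    """Render tigers and elephants as a short plain-text summary (illustrative only)."""
    lines = []
    for title, risks in (("TIGERS", tigers), ("ELEPHANTS", elephants)):
        if not risks:
            continue
        lines.append(f"{title}:")
        for i, r in enumerate(risks, start=1):
            severity = str(r.get("severity", "unknown")).upper()
            lines.append(f"{i}. [{severity}] {r['risk']}")
            if r.get("category"):
                lines.append(f"   - Category: {r['category']}")
    return "\n".join(lines)
```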
Step 4: Handle User Response
If "Accept risks and proceed"
```python
# Log acceptance for audit trail
print("Risks acknowledged. Proceeding with implementation.")
# Continue to next workflow step
```
If "Add mitigations to plan"
```python
# User provides mitigation approach
# Update plan file with mitigations section
# Re-run quick premortem to verify mitigations address risks
```
If "Research mitigation options"
```python
# Spawn parallel research for each HIGH severity tiger
for tiger in high_severity_tigers:
    # Internal: How has codebase handled this before?
    Task(
        subagent_type="scout",
        prompt=f"""
        Find how this codebase has previously handled: {tiger.category}
        Specifically looking for patterns related to: {tiger.risk}
        Return:
        - File:line references to similar solutions
        - Patterns used
        - Libraries/utilities available
        """
    )
    # External: What are best practices?
    Task(
        subagent_type="oracle",
        prompt=f"""
        Research best practices for: {tiger.risk}
        Context: {tiger.category} in a {tech_stack} codebase
        Return:
        - Recommended approaches (ranked)
        - Library options
        - Common pitfalls to avoid
        """
    )

# Wait for research to complete
# Synthesize options
# Present via AskUserQuestion with 2-4 mitigation options
```
If "Discuss specific risks"
```python
# Ask which risk to discuss
AskUserQuestion(
    question="Which risk would you like to discuss?",
    header="Risk",
    options=[format_risk_option(r) for r in all_risks[:4]]
)
# Then have conversation about that specific risk
```
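`format_risk_option` is likewise used without a definition. A minimal sketch, assuming it returns the same `label`/`description` dictionaries used by the other `AskUserQuestion` calls in this document; the label length limit is an arbitrary choice for illustration.

```python
def format_risk_option(risk: dict, max_label: int = 40) -> dict:
    """Turn a risk entry into a {"label", "description"} option (hypothetical shape)."""
    text = risk["risk"]
    # Keep labels short; the full text still appears in the description.
    label = text if len(text) <= max_label else text[: max_label - 3] + "..."
    severity = str(risk.get("severity", "unknown")).upper()
    return {"label": label, "description": f"[{severity}] {text}"}
```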
Step 5: Update Plan (if mitigations added)
If user added mitigations, append to the plan:
```markdown
## Risk Mitigations (Pre-Mortem)

### Tigers Addressed:
- {risk} (severity: {severity})
  - Mitigation: {user_or_researched_mitigation}
  - Added to phase: {phase_number}

### Accepted Risks:
- {risk} - Accepted because: {reason}

### Pre-Mortem Run:
- Date: {timestamp}
- Mode: {quick|deep}
- Tigers: {count}
- Elephants: {count}
```
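Filling the template is plain string assembly. The sketch below is one way to do it. `render_mitigations`, its parameter names, and the heading levels it emits are assumptions, since the original heading markers did not survive in this copy of the document.

```python
from datetime import date


def render_mitigations(addressed: list[dict], accepted: list[dict], mode: str,
                       tigers: int, elephants: int, run_date: date) -> str:
    """Fill the mitigation template with concrete values (illustrative helper)."""
    out = ["## Risk Mitigations (Pre-Mortem)", "", "### Tigers Addressed:"]
    for t in addressed:
        out.append(f"- {t['risk']} (severity: {t['severity']})")
        out.append(f"  - Mitigation: {t['mitigation']}")
        out.append(f"  - Added to phase: {t['phase']}")
    out += ["", "### Accepted Risks:"]
    for a in accepted:
        out.append(f"- {a['risk']} - Accepted because: {a['reason']}")
    out += [
        "", "### Pre-Mortem Run:",
        f"- Date: {run_date.isoformat()}",
        f"- Mode: {mode}",
        f"- Tigers: {tigers}",
        f"- Elephants: {elephants}",
    ]
    return "\n".join(out) + "\n"
```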
Integration Points
In create_plan / plan-agent
After plan structure is approved, before ExitPlanMode:
```python
# Run quick premortem
/premortem quick
# If HIGH risks found, block until addressed
# If only MEDIUM/LOW, inform and proceed
```
After plan approval, before implementation
```python
# Run deep premortem on full plan
/premortem deep thoughts/shared/plans/YYYY-MM-DD-feature.md
# Block until all HIGH tigers addressed
```
In PR review workflows
```python
# Run quick premortem on diff scope
/premortem quick
# Inform of any risks found
```
Severity Thresholds
| Severity | Blocking? | Action Required |
|---|---|---|
| HIGH | Yes | Must address or explicitly accept |
| MEDIUM | No | Inform user, recommend addressing |
| LOW | No | Note for awareness |
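The table reduces to a small lookup plus one rule: the workflow stays blocked while any HIGH risk has been neither addressed nor explicitly accepted. A minimal sketch under that reading; `is_blocking` and `must_block` are hypothetical names.

```python
BLOCKING = {"high": True, "medium": False, "low": False}


def is_blocking(severity: str) -> bool:
    """HIGH blocks; MEDIUM and LOW do not (per the thresholds table)."""
    return BLOCKING[severity.lower()]


def must_block(risks: list[dict], accepted: frozenset[str] = frozenset()) -> bool:
    """True while any HIGH risk is neither addressed nor explicitly accepted.

    `accepted` holds the descriptions of risks the user has explicitly accepted;
    addressed risks are assumed to have been removed from `risks` already.
    """
    return any(is_blocking(r["severity"]) and r["risk"] not in accepted for r in risks)
```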
Example Session
```
User: /premortem deep thoughts/shared/plans/2026-01-08-api-rate-limiting.md

Claude: Running deep pre-mortem on API rate limiting plan...
[Reviews plan against deep checklist]
Pre-mortem complete. Found 2 tigers, 1 elephant:

**TIGERS:**
1. [HIGH] No circuit breaker for external payment API
   - Category: dependency
   - If payment API is slow/down, requests will pile up
2. [HIGH] No rollback strategy defined
   - Category: integration
   - If rate limiting breaks auth flow, no quick fix path

**ELEPHANTS:**
1. [MEDIUM] Team hasn't used Redis before
   - We're introducing Redis for rate limit counters
   - Learning curve could slow implementation

**PAPER TIGERS:**
1. Database migration size - Only adds one index, <1s migration

[AskUserQuestion presented]

User: Research mitigation options

Claude: Researching mitigations...
[Spawns scout + oracle in parallel]
Found 3 options for circuit breaker:
1. **Use existing retry util** (Recommended)
   - Found at src/utils/retry.ts:45
   - Already handles exponential backoff
   - Just need to add circuit breaker state
2. **Add resilience4j-lite**
   - Popular library for this pattern
   - More features but new dependency
3. **Custom implementation**
   - Full control but more code to maintain
Which approach for the circuit breaker?

User: Option 1 - use existing retry util

Claude: Added to plan:

## Risk Mitigations (Pre-Mortem)

### Tigers Addressed:
- No circuit breaker (HIGH)
  - Mitigation: Extend src/utils/retry.ts with circuit breaker state
  - Added to Phase 1

Shall I also address the rollback strategy tiger?
```