error-recovery

Error Recovery

Overview

Handle failures gracefully with structured recovery.
Core principle: When things break, don't panic. Assess, preserve, recover, verify.
Announce at start: "I'm using error-recovery to handle this failure."

The Recovery Protocol

Error Detected
       ▼
┌─────────────┐
│ 1. ASSESS   │ ← Severity? Scope? Impact?
└──────┬──────┘
       ▼
┌─────────────┐
│ 2. PRESERVE │ ← Capture evidence before it's lost
└──────┬──────┘
       ▼
┌─────────────┐
│ 3. RECOVER  │ ← Follow decision tree
└──────┬──────┘
       ▼
┌─────────────┐
│ 4. VERIFY   │ ← Confirm clean state
└──────┬──────┘
       ▼
┌─────────────┐
│ 5. DOCUMENT │ ← Record what happened
└─────────────┘

Step 1: Assess Severity

Severity Levels

| Level | Description | Examples |
|---|---|---|
| Critical | System unusable, data at risk | Build completely broken, tests cause data loss |
| Major | Significant functionality broken | Feature doesn't work, many tests failing |
| Minor | Isolated issue, workaround exists | Single test flaky, style error |
| Info | Warning only, not blocking | Deprecation notice, performance hint |

Assessment Questions

Error Assessment

Error: [Description of error]
Location: [Where it occurred]

Severity Checklist

  • Is the system still functional?
  • Is any data at risk?
  • Are other features affected?
  • Is this blocking progress?

Scope

  • Files affected: [list]
  • Features affected: [list]
  • Users affected: [none/some/all]
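
The checklist answers can be turned into a rough triage rule. The following is a minimal sketch, not part of the original skill: the function name `classify_severity` and the mapping from answers to levels are assumptions chosen to line up with the Severity Levels table. The Info level is left out because the checklist alone does not separate it from Minor.

```bash
#!/usr/bin/env bash
# Hypothetical helper: maps checklist answers (yes/no) to a severity level.
# The mapping is an assumption based on the Severity Levels table.
# Args: functional data_at_risk others_affected blocking
classify_severity() {
  local functional="$1" data_at_risk="$2" others_affected="$3" blocking="$4"

  if [ "$data_at_risk" = "yes" ] || [ "$functional" = "no" ]; then
    echo "Critical"
  elif [ "$others_affected" = "yes" ] || [ "$blocking" = "yes" ]; then
    echo "Major"
  else
    echo "Minor"
  fi
}
```

For example, `classify_severity yes no yes no` (system functional, no data at risk, other features affected, not blocking) prints `Major`.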

Step 2: Preserve Evidence

Capture BEFORE attempting fixes:

Error Logs

```bash
# Capture error output
pnpm test 2>&1 | tee error-log.txt

# Or from failed command
./failing-command 2>&1 | tee error-log.txt
```

Stack Traces

Stack Trace

```
Error: Connection refused
    at Database.connect (src/db/connection.ts:45)
    at UserService.init (src/services/user.ts:23)
    at main (src/index.ts:12)
```

State Capture

```bash
# Git state
git status
git diff

# Environment state
env | grep -E "NODE|NPM|PATH"

# Dependency state
pnpm list
```

Screenshot (if visual)

For UI errors, capture screenshots before changes.
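
The individual capture commands in this step can be bundled so that nothing is forgotten under pressure. This is a sketch under assumptions: the function name `capture_evidence`, the timestamped directory layout, and the file names are all hypothetical, and every capture is best-effort so that a missing tool does not abort the snapshot.

```bash
#!/usr/bin/env bash
# Hypothetical helper: snapshot git, environment, and dependency state into
# a timestamped directory BEFORE any fix is attempted.
# The directory layout and file names are assumptions for illustration.
capture_evidence() {
  local root="${1:-evidence}"
  local dir
  dir="$root/$(date +%Y%m%d-%H%M%S)"
  mkdir -p "$dir" || return 1

  # Each capture is best-effort: a missing tool must not abort the snapshot.
  git status > "$dir/git-status.txt" 2>&1 || true
  git diff > "$dir/git-diff.txt" 2>&1 || true
  env | grep -E "NODE|NPM|PATH" > "$dir/env.txt" 2>&1 || true
  pnpm list > "$dir/pnpm-list.txt" 2>&1 || true

  # Print the snapshot directory so callers can reference it later.
  echo "$dir"
}
```

Calling `capture_evidence` with no argument writes into `evidence/<timestamp>/` and prints that path. Review `env.txt` before attaching it to an issue, since variables such as `NPM_TOKEN` match the grep pattern.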

Step 3: Recover

Decision Tree

What type of failure?
    ┌────┴────┬────────────┬────────────┐
    │         │            │            │
  Code      Build      Environment   External
  Error     Error        Issue       Service
    │         │            │            │
    ▼         ▼            ▼            ▼
 ┌─────┐   ┌─────┐      ┌─────┐      ┌─────┐
 │Git  │   │Clean│      │Re-  │      │Wait/│
 │reco-│   │build│      │init │      │Retry│
 │very │   │     │      │     │      │     │
 └─────┘   └─────┘      └─────┘      └─────┘
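
The decision tree maps naturally onto a `case` statement. The sketch below is illustrative only: the function name `recovery_strategy` and the keywords used for failure types and strategies are assumed names, not identifiers from the skill.

```bash
#!/usr/bin/env bash
# Hypothetical dispatcher mirroring the decision tree.
# The keywords (code, build, environment, external) are assumed names.
recovery_strategy() {
  case "$1" in
    code)        echo "git-recovery" ;;
    build)       echo "clean-build" ;;
    environment) echo "re-init" ;;
    external)    echo "wait-retry" ;;
    *)
      echo "unknown failure type: $1" >&2
      return 1
      ;;
  esac
}
```

An unknown failure type returns a non-zero status instead of guessing, which leaves the caller free to escalate.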

Code Error Recovery

**Single file broken:**

```bash
# Revert just that file
git checkout HEAD -- path/to/file.ts
```

**Feature broken (multiple files):**

```bash
# Find last good commit
git log --oneline

# Revert to that commit (soft reset keeps changes staged)
git reset --soft [GOOD_COMMIT]

# Or hard reset (discards changes)
git reset --hard [GOOD_COMMIT]
```

**Working directory is a mess:**

```bash
# Stash current changes
git stash

# Verify clean state
git status

# Optionally recover stash later
git stash pop
```
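
Because `git reset --hard` discards uncommitted work, it can help to leave a way back before running it. The following is a sketch, not the skill's prescribed procedure: the function name `safe_hard_reset` and the `backup/pre-reset-<timestamp>` branch naming are assumptions. It records the current commit on a backup branch and stashes local changes (including untracked files) before resetting.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: record the current HEAD on a backup branch and stash
# uncommitted work before running the destructive `git reset --hard`.
# The backup branch naming scheme is an assumption.
safe_hard_reset() {
  local target="$1"
  local backup
  backup="backup/pre-reset-$(date +%Y%m%d-%H%M%S)"

  # Refuse to continue if the target is not a valid commit.
  git rev-parse --verify --quiet "${target}^{commit}" > /dev/null || {
    echo "not a commit: $target" >&2
    return 1
  }

  # Keep a pointer to the current commit so it stays reachable.
  git branch "$backup" || return 1

  # Stash tracked and untracked changes; this is a no-op on a clean tree.
  git stash push --include-untracked --message "$backup" > /dev/null || return 1

  git reset --hard "$target" > /dev/null || return 1

  # Print the backup branch name so the caller can return to it.
  echo "$backup"
}
```

After `safe_hard_reset [GOOD_COMMIT]`, the previous commit is still reachable through the printed branch name, and any local edits can be inspected with `git stash list`.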

Build Error Recovery

```bash
# Clean build artifacts
rm -rf node_modules dist build .cache

# Reinstall dependencies
pnpm install --frozen-lockfile # Clean install from lock file

# Rebuild
pnpm build
```

Environment Error Recovery

```bash
# Check environment
env | grep -E "NODE|PNPM"

# Reset Node modules
rm -rf node_modules
pnpm install --frozen-lockfile

# If using nvm, verify version
nvm use

# Re-run init script
./scripts/init.sh
```
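
When checking the Node version, a mismatch is easier to spot with an explicit comparison than by eye. This is a minimal sketch: `same_node_major` is a hypothetical helper that compares major versions only, and it assumes both inputs are numeric version strings (an `.nvmrc` containing an alias such as `lts/*` would not work with it).

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare two Node version strings by major version,
# ignoring an optional leading "v" (e.g. "v20.11.0" vs "20").
same_node_major() {
  local expected="${1#v}" actual="${2#v}"
  # Strip everything from the first dot onward to keep only the major part.
  [ "${expected%%.*}" = "${actual%%.*}" ]
}
```

A possible use is `same_node_major "$(cat .nvmrc)" "$(node --version)" || echo "Node major version differs from .nvmrc"`, which assumes the project keeps its expected version in `.nvmrc`.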

External Service Error

```bash
# Check if service is up
# If down, wait and retry
# If still down, check status page
# Document as external blocker
```
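
The "wait and retry" step can be sketched as a small retry loop with a doubling delay. The function name `retry_with_backoff` and its argument order are assumptions made for this example; the skill itself does not prescribe a retry policy.

```bash
#!/usr/bin/env bash
# Hypothetical helper: retry a command with a doubling delay between attempts.
# Usage: retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
retry_with_backoff() {
  local max_attempts="$1" delay="$2"
  shift 2
  local attempt=1

  while true; do
    if "$@"; then
      return 0
    fi
    if [ "$attempt" -ge "$max_attempts" ]; then
      echo "still failing after $attempt attempts: $*" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}
```

For instance, `retry_with_backoff 5 2 curl --fail --silent --output /dev/null "$HEALTH_URL"` would probe a health endpoint up to five times, waiting 2, 4, 8, and 16 seconds between attempts. `HEALTH_URL` is a placeholder for whatever endpoint the external service exposes. If the final attempt fails, the non-zero exit status is the cue to check the status page and document the blocker.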

Step 4: Verify

After recovery, verify clean state:

Basic Verification

```bash
# Clean working directory
git status
# Expected: "nothing to commit, working tree clean" or known changes

# Tests pass
pnpm test

# Build succeeds
pnpm build

# Types check
pnpm typecheck
```

Functionality Verification

```bash
# Run the specific thing that was broken
pnpm test --grep "specific test"

# Or verify the feature manually
```
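
The verification commands in this step can be run as a batch that keeps going after a failure, so one run shows everything that is still broken. This is a sketch under assumptions: `run_checks` is a hypothetical helper, and each argument is treated as one shell command line.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run every verification command, even after a failure,
# and report which ones failed. Each argument is one command line.
run_checks() {
  local failed=0 check
  for check in "$@"; do
    if bash -c "$check" > /dev/null 2>&1; then
      echo "PASS  $check"
    else
      echo "FAIL  $check"
      failed=$((failed + 1))
    fi
  done
  # Succeed only if no check failed.
  [ "$failed" -eq 0 ]
}
```

A possible invocation is `run_checks 'test -z "$(git status --porcelain)"' "pnpm test" "pnpm build" "pnpm typecheck"`. The first check uses `git status --porcelain` because plain `git status` exits with status 0 even when the tree is dirty.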

Step 5: Document

Issue Comment

```bash
gh issue comment [ISSUE_NUMBER] --body "## Error Recovery

**Error encountered:** [Description]

**Severity:** Major

**Evidence:**
\`\`\`
[Error output]
\`\`\`

**Recovery actions:**
1. [Action 1]
2. [Action 2]

**Verification:**
- [x] Tests pass
- [x] Build succeeds

**Root cause:** [If known]

**Prevention:** [If applicable]
"
```

Knowledge Graph

```javascript
// Store for future reference
mcp__memory__add_observations({
  observations: [{
    entityName: "Issue #[NUMBER]",
    contents: [
      "Encountered [error type] on [date]",
      "Caused by: [root cause]",
      "Resolved by: [recovery action]"
    ]
  }]
});
```
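
The first part of the issue comment in this step can also be generated from captured evidence instead of being typed by hand. The sketch below is an assumption-laden example: `render_recovery_comment` is a hypothetical helper, it fills in only the error, severity, and evidence fields, and the 50-line cap on evidence is an arbitrary choice.

```bash
#!/usr/bin/env bash
# Hypothetical helper: print the opening fields of the issue-comment body.
# Args: description severity evidence_file
render_recovery_comment() {
  local description="$1" severity="$2" evidence_file="$3"
  printf '## Error Recovery\n\n'
  printf '**Error encountered:** %s\n\n' "$description"
  printf '**Severity:** %s\n\n' "$severity"
  printf '**Evidence:**\n'
  printf '```\n'
  # Limit evidence to the last 50 lines to keep the comment readable (assumed cap).
  tail -n 50 "$evidence_file"
  printf '```\n'
}
```

Assuming a `gh` version that supports `--body-file`, the output could be posted with `render_recovery_comment "Build fails" Major error-log.txt | gh issue comment [ISSUE_NUMBER] --body-file -`.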

Common Recovery Patterns

"Tests were passing, now failing"

```bash
# What changed?
git diff HEAD~3

# Did dependencies change?
git diff HEAD~3 pnpm-lock.yaml

# Clean reinstall
rm -rf node_modules && pnpm install --frozen-lockfile
```

"Works locally, fails in CI"

```bash
# Check for environment differences
# - Node version
# - OS differences
# - Env vars

# Run with CI-like settings
CI=true pnpm test
```

"Build was working, now broken"

```bash
# Check TypeScript errors
pnpm typecheck

# Check for circular dependencies
pnpm dlx madge --circular src/

# Clean build
rm -rf dist && pnpm build
```

"I broke everything"

```bash
# Don't panic

# Find last known good state
git log --oneline

# Reset to that state
git reset --hard [GOOD_COMMIT]

# Verify
pnpm test

# Start again more carefully
```

Escalation

If recovery fails after 2-3 attempts:

Escalation: Unrecoverable Error

Issue: #[NUMBER]
Error: [Description]
Recovery attempts:
  1. [Attempt 1] - [Result]
  2. [Attempt 2] - [Result]
Current state: [Broken/Partially working]
Evidence preserved: [Links to logs, screenshots]
Requesting help with: [Specific question]

Mark issue as Blocked and await human input.
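
The "try a few times, then escalate" rule can be sketched as a loop over recovery commands with a verification command after each one. Everything named here is hypothetical: `recover_or_escalate`, its argument order, and the abbreviated report it prints are assumptions modelled on the escalation template.

```bash
#!/usr/bin/env bash
# Hypothetical driver: try recovery commands in order, run a verification
# command after each, and print an escalation report if none succeeds.
# Usage: recover_or_escalate VERIFY_CMD RECOVERY_CMD...
recover_or_escalate() {
  local verify="$1"
  shift
  local n=0 action report=""

  for action in "$@"; do
    n=$((n + 1))
    # An attempt counts only if the action runs AND verification passes.
    if bash -c "$action" > /dev/null 2>&1 && bash -c "$verify" > /dev/null 2>&1; then
      echo "Recovered by attempt $n: $action"
      return 0
    fi
    report="${report}  ${n}. ${action} - failed"$'\n'
  done

  printf 'Escalation: Unrecoverable Error\nRecovery attempts:\n%s' "$report"
  return 1
}
```

For example, `recover_or_escalate "pnpm test" "pnpm install --frozen-lockfile" "rm -rf node_modules dist && pnpm install --frozen-lockfile && pnpm build"` tries a reinstall first and a full clean rebuild second. A non-zero exit status means the printed report should go into the escalation message.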

Checklist

When error occurs:
  • Severity assessed
  • Evidence preserved (logs, state, screenshots)
  • Recovery action selected
  • Recovery executed
  • Clean state verified
  • Tests pass
  • Build succeeds
  • Issue documented

Integration

This skill is called by:
  • issue-driven-development - When errors occur
  • ci-monitoring - CI failures
This skill may trigger:
  • research-after-failure - If cause is unknown
  • Issue update via issue-lifecycle