error-recovery

Error Recovery Protocol

When an error occurs, stop, diagnose it, and choose the recovery strategy that fits the cause. Do not retry blindly.
Core principle: Every error carries a signal. Read the signal first, then act.

Error Classification

Classify every error into one of four categories; the recovery strategy depends on the category:

Transient Error

Retrying usually fixes it. Infrastructure or network related.
  • Examples: timeout, rate limit (429), connection drop, temporary service outage
  • Strategy: Wait & Retry with exponential backoff

Configuration Error

Environment or setup issue. Code is correct but setup is wrong.
  • Examples: missing env variable, wrong file path, permission denied, missing dependency
  • Strategy: Fix & Continue — identify the issue, fix it, re-run

Logic Error

Code or approach is wrong. Retrying produces the same error.
  • Examples: KeyError, TypeError, wrong algorithm, expectation mismatch
  • Strategy: Alternative Approach — try a different method

Permanent / External Error

Outside your control; cannot be fixed from your side. Caused by an external service or a permission boundary.
  • Examples: 403 Forbidden, 404 Not Found, quota exceeded, API deprecated
  • Strategy: Escalation — inform the user, ask for direction
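
The four categories above can be sketched as a small classifier. This is only an illustration: the `ErrorCategory` enum, the `classify` function, and the choice of Python exception types are assumptions made for this example, not part of the protocol. The 503 entry is likewise an assumed stand-in for "temporary service outage".

```python
from enum import Enum
from typing import Optional


class ErrorCategory(Enum):
    TRANSIENT = "Transient"
    CONFIG = "Config"
    LOGIC = "Logic"
    PERMANENT = "Permanent"


# Hypothetical mapping from HTTP status codes to categories,
# following the examples listed for each category.
STATUS_CATEGORIES = {
    429: ErrorCategory.TRANSIENT,   # rate limit
    503: ErrorCategory.TRANSIENT,   # assumed: temporary service outage
    403: ErrorCategory.PERMANENT,   # forbidden
    404: ErrorCategory.PERMANENT,   # not found
}


def classify(exc: BaseException, status: Optional[int] = None) -> Optional[ErrorCategory]:
    """Return the category for an error, or None if it cannot be determined."""
    if status in STATUS_CATEGORIES:
        return STATUS_CATEGORIES[status]
    if isinstance(exc, (TimeoutError, ConnectionError)):
        return ErrorCategory.TRANSIENT
    if isinstance(exc, (FileNotFoundError, PermissionError, ImportError)):
        return ErrorCategory.CONFIG
    if isinstance(exc, (KeyError, TypeError)):
        return ErrorCategory.LOGIC
    return None  # undetermined: the protocol says to escalate
```

Returning `None` for an unrecognized error keeps the "category cannot be determined" case explicit instead of guessing a category.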

Retry Strategy

For transient errors, use exponential backoff:
Attempt 1: Retry immediately
Attempt 2: Wait 2 seconds, then retry
Attempt 3: Wait 4 seconds, then retry
Maximum retries: 3 attempts. If all 3 fail, re-evaluate the category, then move on or escalate.
Rate limit (429) special rule:
  • If the response has a Retry-After header, wait that duration
  • Otherwise wait 60 seconds, then retry
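
A minimal sketch of this retry schedule, assuming three retries with waits of 0, 2, and 4 seconds. The `retry_transient` function and the `RateLimited` exception are hypothetical names introduced for the example. The sketch also assumes Retry-After has already been parsed into seconds (the header may also carry an HTTP date, which is not handled here).

```python
import time
from typing import Callable, Optional, Sequence, TypeVar

T = TypeVar("T")


class RateLimited(Exception):
    """Hypothetical exception representing an HTTP 429 response."""

    def __init__(self, retry_after: Optional[float] = None):
        super().__init__("rate limited (429)")
        self.retry_after = retry_after  # seconds from Retry-After, if present


TRANSIENT = (TimeoutError, ConnectionError, RateLimited)


def retry_transient(
    operation: Callable[[], T],
    delays: Sequence[float] = (0, 2, 4),
    rate_limit_wait: float = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation; on a transient failure, retry once per entry in delays."""
    try:
        return operation()
    except TRANSIENT as exc:
        last_error = exc
    for delay in delays:
        if isinstance(last_error, RateLimited):
            # 429 special rule: honor Retry-After, otherwise wait 60 seconds.
            if last_error.retry_after is not None:
                delay = last_error.retry_after
            else:
                delay = rate_limit_wait
        sleep(delay)
        try:
            return operation()
        except TRANSIENT as exc:
            last_error = exc
    # All retries failed: the caller should re-evaluate the category.
    raise last_error
```

The `sleep` parameter exists only so the waits can be observed or skipped in tests; in normal use the default `time.sleep` applies.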

Decision Tree

Error received
    |
Classify the error
    |
+------------------------------------+
| Transient?  -> Wait & Retry (max 3)|
| Config?     -> Fix & Continue      |
| Logic?      -> Alternative approach|
| Permanent?  -> Escalation          |
+------------------------------------+
    |
Every strategy fails -> Escalation
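
The tree above could be expressed as a lookup with an escalation fallback. The `choose_strategy` function and its `strategy_failed` flag are illustrative assumptions, not names the protocol defines.

```python
from typing import Optional

# Category -> strategy, as listed in the decision tree.
STRATEGIES = {
    "Transient": "Wait & Retry (max 3)",
    "Config": "Fix & Continue",
    "Logic": "Alternative approach",
    "Permanent": "Escalation",
}


def choose_strategy(category: Optional[str], strategy_failed: bool = False) -> str:
    """Pick the strategy for a category; escalate when the category is
    unknown or the category's own strategy has already failed."""
    if category not in STRATEGIES or strategy_failed:
        return "Escalation"
    return STRATEGIES[category]
```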

Escalation Protocol

Escalate to the user when:
  • 3 retries failed
  • The error is permanent / external
  • 2 different strategies failed in a row
  • The error category cannot be determined
Report the escalation in this format:
ERROR ESCALATION
================================
Failed step : [step name]
Error       : [error message summary]
Category    : [Transient / Config / Logic / Permanent]
Tried       : [what was attempted — short list]
Result      : All strategies exhausted
================================
Options:
  A) [Alternative approach suggestion]
  B) [Simpler / partial solution]
  C) Skip this step, continue
  D) Stop the task
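
As a rough sketch, the escalation conditions and the report layout could be coded as follows. The function names `should_escalate` and `format_escalation`, their parameters, and the use of plain strings for categories are assumptions made for this example.

```python
import string
from typing import Optional, Sequence

RULE = "=" * 32


def should_escalate(
    category: Optional[str],
    failed_retries: int = 0,
    failed_strategies: int = 0,
) -> bool:
    """True when any of the four escalation conditions holds."""
    return (
        category is None                 # category cannot be determined
        or category == "Permanent"       # permanent / external error
        or failed_retries >= 3           # 3 retries failed
        or failed_strategies >= 2        # 2 consecutive different strategies failed
    )


def format_escalation(
    step: str,
    error: str,
    category: str,
    tried: Sequence[str],
    options: Sequence[str],
) -> str:
    """Render the ERROR ESCALATION report as plain text."""
    lines = [
        "ERROR ESCALATION",
        RULE,
        f"Failed step : {step}",
        f"Error       : {error}",
        f"Category    : {category}",
        f"Tried       : {', '.join(tried)}",
        "Result      : All strategies exhausted",
        RULE,
        "Options:",
    ]
    # Label options A), B), C), ... in the order given.
    for letter, option in zip(string.ascii_uppercase, options):
        lines.append(f"  {letter}) {option}")
    return "\n".join(lines)
```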

Partial Success

For bulk operations where some items succeed and some fail:
PARTIAL SUCCESS
================================
Successful : N / Total
Failed     : M items
================================
Failed items:
  - [item]: [reason]

Options:
  A) Retry only failed items
  B) Continue with successful items, skip failed
  C) Cancel all
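
One way to produce this report is to collect successes and failures per item rather than aborting on the first failure. The sketch below is illustrative only: `run_bulk` and `format_partial_success` are hypothetical helpers, and items are assumed to be strings.

```python
from typing import Callable, Dict, Iterable, List, Tuple

RULE = "=" * 32


def run_bulk(
    items: Iterable[str],
    action: Callable[[str], object],
) -> Tuple[List[str], Dict[str, str]]:
    """Apply action to every item, recording failures instead of stopping."""
    succeeded: List[str] = []
    failed: Dict[str, str] = {}
    for item in items:
        try:
            action(item)
            succeeded.append(item)
        except Exception as exc:
            failed[item] = str(exc)  # keep the reason for the report
    return succeeded, failed


def format_partial_success(succeeded: List[str], failed: Dict[str, str]) -> str:
    """Render the PARTIAL SUCCESS summary as plain text."""
    total = len(succeeded) + len(failed)
    lines = [
        "PARTIAL SUCCESS",
        RULE,
        f"Successful : {len(succeeded)} / {total}",
        f"Failed     : {len(failed)} items",
        RULE,
        "Failed items:",
    ]
    lines += [f"  - {item}: {reason}" for item, reason in failed.items()]
    return "\n".join(lines)
```

Keeping the failed items and their reasons in a dictionary makes option A (retry only failed items) a matter of passing `failed.keys()` back to `run_bulk`.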

Error Log

Log every error and recovery attempt:
[ERROR LOG]
Step     : [step name / number]
Error    : [message]
Category : [type]
Attempt 1: [strategy] -> [result]
Attempt 2: [strategy] -> [result]
Result   : Recovered / Escalated
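
The log entry could be kept as a small record that accumulates attempts and renders the layout above. The `ErrorLog` class and its field names are assumptions for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Tuple


@dataclass
class ErrorLog:
    step: str
    error: str
    category: str
    attempts: List[Tuple[str, str]] = field(default_factory=list)
    recovered: bool = False  # set to True once a strategy succeeds

    def record(self, strategy: str, result: str) -> None:
        """Append one recovery attempt and its outcome."""
        self.attempts.append((strategy, result))

    def render(self) -> str:
        """Render the entry in the [ERROR LOG] layout."""
        lines = [
            "[ERROR LOG]",
            f"Step     : {self.step}",
            f"Error    : {self.error}",
            f"Category : {self.category}",
        ]
        for number, (strategy, result) in enumerate(self.attempts, start=1):
            lines.append(f"Attempt {number}: {strategy} -> {result}")
        lines.append(f"Result   : {'Recovered' if self.recovered else 'Escalated'}")
        return "\n".join(lines)
```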

When to Skip

Skip the recovery protocol when:
  • The error is expected behavior (e.g., "file not found" when checking existence)
  • The user said "ignore errors, continue"
  • The task is one-off and non-repeatable
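
The first case, an error that is really an expected outcome, might look like this in practice. The `read_if_exists` helper is a hypothetical name used only for this sketch.

```python
from pathlib import Path
from typing import Optional, Union


def read_if_exists(path: Union[str, Path]) -> Optional[str]:
    """Return the file's text, or None if the file does not exist."""
    try:
        return Path(path).read_text(encoding="utf-8")
    except FileNotFoundError:
        # Expected when probing for an optional file: no classification,
        # no retry, no escalation.
        return None
```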

Guardrails

  • Never blind-retry a logic error — retrying won't help, change the approach.
  • Always log every attempt — even successful recoveries need a record.
  • Cross-skill: integrates with checkpoint-guardian (risk assessment before retry), memory-ledger (logs errors and fixes), and agent-reviewer (retrospective analysis).