python-errors-reliability

Python Errors and Reliability

Overview

Design errors so callers can act and operators can diagnose quickly. Treat these recommendations as preferred defaults — when a default conflicts with project constraints, suggest a better-fit alternative and call out tradeoffs and compensating controls.

When to Use

  • Implementing or reviewing timeout, deadline, or retry logic.
  • Translating exceptions across layer boundaries (e.g., infra → domain).
  • Classifying failures as retryable vs. permanent.
  • Adding idempotency guarantees to retried writes.
  • Diagnosing swallowed exceptions or silent failures in existing code.
When NOT to use:
  • Pure data-transformation code with no I/O or failure modes.
  • Simple validation that raises immediately with no translation needed.
  • Performance tuning unrelated to failure handling (see `python-concurrency-performance`).

Quick Reference

  • Preserve cause chains at translation boundaries.
  • Catch only what can be handled.
  • Keep timeout/deadline policy explicit.
  • Keep retry policy explicit and bounded.
  • Classify retryable vs. permanent failures with explicit policy data.
  • Keep idempotency expectations explicit for retried writes.
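
The points above can be sketched together as one small retry helper. This is a minimal illustration under stated assumptions, not a definitive implementation: `RetryableError`, `PermanentError`, `RETRYABLE_STATUS`, `RetryPolicy`, `call_with_retry`, and `make_write` are hypothetical names, the status codes in the policy table are an illustrative choice, and `send` stands in for whatever write call a project actually uses.

```python
import random
import time
import uuid
from dataclasses import dataclass
from typing import Callable, TypeVar

T = TypeVar("T")


class RetryableError(Exception):
    """Hypothetical marker: the operation may succeed if attempted again."""


class PermanentError(Exception):
    """Hypothetical marker: retrying cannot help (e.g. invalid input)."""


# Policy data: which HTTP-style status codes count as retryable.
# The specific codes are an illustrative assumption, not a universal rule.
RETRYABLE_STATUS = frozenset({429, 502, 503, 504})


def classify(status: int) -> type[Exception]:
    """Map a status code to a failure class using the policy table."""
    return RetryableError if status in RETRYABLE_STATUS else PermanentError


@dataclass(frozen=True)
class RetryPolicy:
    max_attempts: int = 3          # hard bound on the number of tries
    total_deadline_s: float = 5.0  # time budget across all attempts
    base_delay_s: float = 0.1      # first backoff delay, doubled each retry


def call_with_retry(
    op: Callable[[float], T],
    policy: RetryPolicy,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run op(timeout_s) until success, a permanent error, or a bound is hit."""
    deadline = time.monotonic() + policy.total_deadline_s
    for attempt in range(1, policy.max_attempts + 1):
        remaining = deadline - time.monotonic()
        if remaining <= 0:
            raise TimeoutError("retry deadline exceeded")
        try:
            # Hand the remaining budget to op so each attempt has an explicit timeout.
            return op(remaining)
        except RetryableError:
            if attempt == policy.max_attempts:
                raise  # attempt bound reached: surface the last failure
            delay = policy.base_delay_s * 2 ** (attempt - 1)
            # Jittered backoff, capped so we never sleep past the deadline.
            budget = max(0.0, deadline - time.monotonic())
            sleep(min(random.uniform(0, delay), budget))
        # PermanentError (and anything else) is not caught and propagates at once.
    raise AssertionError("unreachable when max_attempts >= 1")


def make_write(send: Callable[[str, float], T]) -> Callable[[float], T]:
    """Bind one idempotency key that every retry of this write reuses."""
    key = str(uuid.uuid4())  # generated once per logical write, not per attempt
    return lambda timeout_s: send(key, timeout_s)
```

A caller would wrap a write as `call_with_retry(make_write(send), RetryPolicy())`. Because the key is created once, outside the retry loop, a receiver that deduplicates on it can treat repeated attempts as the same write; whether the receiver actually does so is something the project has to guarantee separately.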

Common Mistakes

  • Swallowing exceptions: a bare `except: pass`, or `except Exception` with no logging, loses diagnostic context.
  • Unbounded retries: retrying without a maximum count or total deadline leads to cascading failures and resource exhaustion.
  • Dropping the cause chain: raising a new exception without `from original` discards the root cause operators need for diagnosis.
  • Treating all errors as retryable: retrying a 400 Bad Request or a validation error wastes resources and delays the real fix.
  • Implicit timeouts: relying on library defaults (or no timeout at all) produces unpredictable latency under failure conditions.
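
As a counterpart to the first and third mistakes, here is a short sketch of translating an infrastructure error into a domain error while keeping the cause chain, and of catching only the failure a layer can actually handle. `ConfigError`, `load_config`, and `load_config_or_default` are hypothetical names chosen for illustration; the file-loading scenario is an assumption, not something the guidance prescribes.

```python
import json
import logging

logger = logging.getLogger(__name__)


class ConfigError(Exception):
    """Hypothetical domain-level error that callers can act on."""


def load_config(path: str) -> dict:
    """Read a JSON config file, translating infra errors into ConfigError."""
    try:
        with open(path, encoding="utf-8") as fh:
            return json.load(fh)
    except FileNotFoundError as exc:
        # Translate, but keep the original exception as __cause__ via `from`.
        raise ConfigError(f"config file missing: {path}") from exc
    except json.JSONDecodeError as exc:
        raise ConfigError(f"config file is not valid JSON: {path}") from exc
    # Anything else (PermissionError, MemoryError, ...) propagates untouched.


def load_config_or_default(path: str, default: dict) -> dict:
    """Handle only the failure this layer can recover from, and log it."""
    try:
        return load_config(path)
    except ConfigError:
        # logger.exception records the traceback, including the chained cause,
        # so the fallback is visible to operators rather than silent.
        logger.exception("falling back to default config for %s", path)
        return default
```

With `from exc`, the traceback shows the original `FileNotFoundError` or `JSONDecodeError` followed by "The above exception was the direct cause of the following exception", which gives operators the root cause that a plain `raise ConfigError(...)` would hide behind a less specific message.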

Scope Note

  • Treat these recommendations as preferred defaults for common cases, not universal rules.
  • If a default conflicts with project constraints or worsens the outcome, suggest a better-fit alternative and explain why it is better for this case.
  • When deviating, call out tradeoffs and compensating controls (tests, observability, migration, rollback).

Invocation Notice

  • Inform the user when this skill is being invoked by name: `python-errors-reliability`.

References

  • references/error-strategy.md
  • references/retryability-classification.md
  • references/retries-timeouts.md