fortify
MANDATORY PREPARATION
Invoke {{command_prefix}}agent-workflow. It contains workflow principles, anti-patterns, and the Context Gathering Protocol. Follow the protocol before proceeding; if no workflow context exists yet, you MUST run {{command_prefix}}teach-maestro first.
Consult the guardrails-safety reference in the agent-workflow skill for defense-in-depth patterns and error boundary design.
Make the workflow resilient. Every external call will eventually fail: model APIs, tools, databases, third-party services. Fortification ensures the workflow handles those failures gracefully.
Fortification Layers
Layer 1: Input Validation
- Validate all inputs before processing
- Return clear error messages for invalid input
- Set size limits on all input fields
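A minimal Python sketch of Layer 1, assuming requests arrive as a dict of string fields. The field names, the limits in `MAX_FIELD_LENGTHS`, and `InputValidationError` are hypothetical; rejecting unknown fields is an extra assumption that keeps unbounded data out of the workflow.

```python
# Hypothetical per-field size limits (characters); tune these for your workflow.
MAX_FIELD_LENGTHS = {"prompt": 8_000, "user_id": 64}
REQUIRED_FIELDS = ("prompt", "user_id")


class InputValidationError(ValueError):
    """Raised before any processing when the input is invalid."""


def validate_input(payload: dict) -> dict:
    """Check required fields, types, and size limits; return the payload if valid."""
    errors = []
    for field in REQUIRED_FIELDS:
        if field not in payload:
            errors.append(f"missing required field '{field}'")
    for field, value in payload.items():
        if field not in MAX_FIELD_LENGTHS:
            # Assumption: unknown fields are rejected rather than silently passed on.
            errors.append(f"unexpected field '{field}'")
            continue
        if not isinstance(value, str):
            errors.append(f"field '{field}' must be a string")
            continue
        limit = MAX_FIELD_LENGTHS[field]
        if len(value) > limit:
            errors.append(f"field '{field}' is {len(value)} characters; limit is {limit}")
    if errors:
        # One aggregated, human-readable message instead of failing on the first problem.
        raise InputValidationError("; ".join(errors))
    return payload
```

Collecting every problem before raising gives the caller one clear message instead of a fix-one-and-resubmit loop.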
Layer 2: Retry with Backoff
For transient failures (network errors, rate limits, timeouts):
```yaml
Retry strategy:
  max_retries: 3
  initial_delay: 1s
  backoff_multiplier: 2
  max_delay: 30s
  retryable_errors: [429, 500, 502, 503, 504, TIMEOUT, CONNECTION_ERROR]
  non_retryable_errors: [400, 401, 403, 404]
```

Layer 3: Fallback Responses
When retries are exhausted:
- Use a cached previous response (if applicable)
- Use a simpler/cheaper model as fallback
- Return a graceful degradation response
- Escalate to human review
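Layers 2 and 3 work as a pair: retry transient failures with exponential backoff, then fall back once retries run out. A minimal Python sketch, assuming the external call signals failure by raising a hypothetical `CallError` that carries either an HTTP status or a symbolic code such as `"TIMEOUT"`. The constants mirror the retry YAML; `call_with_retry`, `backoff_delays`, and the injectable `sleep` parameter are illustrative names, not part of any library.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")

# Values mirror the Layer 2 retry strategy.
MAX_RETRIES = 3
INITIAL_DELAY_S = 1.0
BACKOFF_MULTIPLIER = 2.0
MAX_DELAY_S = 30.0
RETRYABLE_CODES = {429, 500, 502, 503, 504, "TIMEOUT", "CONNECTION_ERROR"}


class CallError(Exception):
    """Hypothetical error type carrying an HTTP status or a symbolic code."""

    def __init__(self, code):
        super().__init__(f"external call failed: {code}")
        self.code = code


def backoff_delays(max_retries: int = MAX_RETRIES) -> list:
    """Delay before retry n (0-based): INITIAL_DELAY_S * BACKOFF_MULTIPLIER**n, capped."""
    return [min(INITIAL_DELAY_S * BACKOFF_MULTIPLIER**n, MAX_DELAY_S) for n in range(max_retries)]


def call_with_retry(
    call: Callable[[], T],
    fallback: Optional[Callable[[], T]] = None,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Retry transient failures with exponential backoff, then fall back (Layer 3)."""
    delays = backoff_delays()
    last_error = None
    for attempt in range(MAX_RETRIES + 1):  # one initial attempt plus MAX_RETRIES retries
        try:
            return call()
        except CallError as err:
            if err.code not in RETRYABLE_CODES:
                raise  # e.g. 400/401/403/404: retrying cannot succeed
            last_error = err
            if attempt < MAX_RETRIES:
                sleep(delays[attempt])
    # Retries exhausted: degrade gracefully instead of surfacing the raw error.
    if fallback is not None:
        return fallback()
    raise last_error
```

With these defaults, a call that keeps timing out is attempted four times, waiting 1s, 2s, and 4s in between, before the fallback (a cached response, a cheaper model, or a degradation message) runs. Non-retryable codes skip both the retries and the fallback here, on the assumption that an authentication or validation error should reach the caller unchanged. Production code often also adds random jitter to each delay so that many clients do not retry in lockstep.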
Layer 4: Circuit Breakers
When a service is consistently failing:
```yaml
Circuit breaker:
  failure_threshold: 5 consecutive failures
  state: CLOSED → OPEN (after threshold) → HALF_OPEN (after cooldown)
  cooldown: 60 seconds
  half_open_max_requests: 1
```

Layer 5: Timeout Controls
Every external call needs a timeout:
- Model API calls: 30-120s depending on task
- Tool executions: 10-60s depending on tool
- Database queries: 5-15s
- Third-party APIs: 10-30s
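Layers 4 and 5 compose naturally: a timed-out call is just another failure for the breaker to count. A minimal Python sketch, assuming a single synchronous caller; `CircuitBreaker`, `CircuitOpenError`, and the injectable `clock` are illustrative names, and the defaults mirror the circuit breaker YAML. The timeout uses the standard library's `concurrent.futures`, which bounds how long the caller waits but cannot stop the underlying work, so a client library's own timeout parameter is preferable when one exists.

```python
import concurrent.futures
import time
from typing import Callable, TypeVar

T = TypeVar("T")

# Shared worker pool. A timed-out call keeps running in its thread: the timeout
# bounds how long the caller waits, not the work itself.
_POOL = concurrent.futures.ThreadPoolExecutor(max_workers=8)


class CircuitOpenError(Exception):
    """Raised without touching the service while the breaker is OPEN."""


class CircuitBreaker:
    """CLOSED -> OPEN after N consecutive failures -> HALF_OPEN after the cooldown."""

    def __init__(self, failure_threshold: int = 5, cooldown_s: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self.failure_threshold = failure_threshold
        self.cooldown_s = cooldown_s
        self.clock = clock
        self.state = "CLOSED"
        self.failures = 0  # consecutive failures
        self.opened_at = 0.0

    def call(self, fn: Callable[[], T], timeout_s: float) -> T:
        if self.state == "OPEN":
            if self.clock() - self.opened_at < self.cooldown_s:
                raise CircuitOpenError("service is failing; rejecting call without trying")
            self.state = "HALF_OPEN"  # cooldown over: let one trial request through
        try:
            # Layer 5: wait at most timeout_s; a timeout counts as a failure.
            result = _POOL.submit(fn).result(timeout=timeout_s)
        except Exception:
            self._on_failure()
            raise
        self._on_success()
        return result

    def _on_success(self) -> None:
        self.state = "CLOSED"
        self.failures = 0

    def _on_failure(self) -> None:
        self.failures += 1
        # A failed trial reopens immediately; otherwise open at the threshold.
        if self.state == "HALF_OPEN" or self.failures >= self.failure_threshold:
            self.state = "OPEN"
            self.opened_at = self.clock()
```

With one synchronous caller, HALF_OPEN admits exactly one trial request, matching `half_open_max_requests: 1`; a breaker shared across threads would additionally need a lock and an in-flight flag. This sketch counts every exception as a failure; you may prefer to count only retryable errors so that a burst of 400s does not open the circuit.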
Fortification Audit
For each component, verify:
- Input validation present
- Retry logic for transient failures
- Fallback for when retries fail
- Timeout set
- Error logged with context
- User gets a meaningful error (not a stack trace)
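The last two audit items, an error logged with context and a meaningful message for the user, can be enforced by one wrapper around each component. A minimal Python sketch, assuming each component is a zero-argument callable; `run_component`, the `ok`/`message` response shape, and the short reference-id scheme are hypothetical.

```python
import logging
import uuid
from typing import Any, Callable

logger = logging.getLogger("workflow")


def run_component(name: str, fn: Callable[[], Any], **context: Any) -> dict:
    """Run one workflow component; log failures with context, return a user-safe result."""
    try:
        return {"ok": True, "result": fn()}
    except Exception:
        # Short reference id lets support match the user's report to the log entry.
        error_id = uuid.uuid4().hex[:8]
        # logger.exception records the full traceback plus context for operators...
        logger.exception("component %s failed (error_id=%s, context=%r)", name, error_id, context)
        # ...while the user only sees a plain-language message, never a stack trace.
        return {
            "ok": False,
            "message": f"The {name} step is temporarily unavailable. "
            f"Please try again later (reference {error_id}).",
        }
```

The traceback stays in the logs where operators need it, and the reference id ties the user's report back to that entry.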
Recommended Next Step
After fortification, run {{command_prefix}}evaluate to verify that error handling works under realistic failure scenarios.
NEVER:
- Retry non-retryable errors (authentication failures, validation errors)
- Retry without backoff (you'll make the problem worse)
- Swallow errors silently (log and handle, don't ignore)
- Set infinite timeouts (a stalled call will hang forever)
- Skip the fallback (if retries are exhausted and there is no fallback, the user sees an error)