error-coordinator
Error Coordinator
Purpose
Provides expertise in building resilient multi-agent systems with robust error handling, failure detection, and recovery mechanisms. Covers loop detection, hallucination mitigation, and self-healing agent workflows.
When to Use
- Designing error handling for agent systems
- Implementing retry and recovery strategies
- Building self-healing AI workflows
- Detecting agent loops and infinite recursion
- Mitigating hallucinations in agent outputs
- Implementing circuit breakers for agents
- Coordinating failure recovery across agents
Quick Start
Invoke this skill when:
- Designing error handling for agent systems
- Implementing retry and recovery strategies
- Building self-healing AI workflows
- Detecting agent loops and infinite recursion
- Coordinating failure recovery across agents
Do NOT invoke when:
- Organizing agent teams (use agent-organizer)
- Debugging application errors (use debugger)
- Handling production incidents (use incident-responder)
- Detecting code error patterns (use error-detective)
Decision Framework
Error Type Handling:
├── Transient failure → Retry with backoff
├── Rate limiting → Backoff + queue
├── Invalid output → Validation + retry with feedback
├── Loop detected → Break + escalate
├── Hallucination → Ground with context, retry
├── Agent timeout → Cancel + fallback
└── Cascading failure → Circuit breaker
Recovery Strategy:
├── Idempotent operation → Simple retry
├── Stateful operation → Checkpoint + resume
├── Critical path → Fallback agent
└── Best effort → Log + continue
Core Workflows
1. Loop Detection System
- Track agent invocation history
- Detect repeated state patterns
- Set maximum iteration limits
- Implement escape hatch triggers
- Log loop occurrences for analysis
- Escalate to supervisor or human
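The steps above could be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a prescribed implementation: `LoopDetector`, `LoopDetected`, and the default limits are hypothetical names and values, and agent state is assumed to be JSON-serializable so it can be fingerprinted.

```python
import hashlib
import json
from collections import Counter


class LoopDetected(Exception):
    """Raised when an agent looks stuck; the caller should break and escalate."""


class LoopDetector:
    """Hypothetical helper: tracks invocation history and flags repeated states."""

    def __init__(self, max_iterations=25, max_repeats=3):
        self.max_iterations = max_iterations  # hard cap on recorded invocations
        self.max_repeats = max_repeats        # identical states tolerated before tripping
        self.history = []                     # ordered fingerprints, kept for later analysis
        self.seen = Counter()                 # fingerprint -> number of occurrences

    @staticmethod
    def fingerprint(agent, state):
        # Canonical JSON so that dictionary key order does not change the hash.
        payload = json.dumps({"agent": agent, "state": state},
                             sort_keys=True, default=str)
        return hashlib.sha256(payload.encode("utf-8")).hexdigest()

    def record(self, agent, state):
        """Record one invocation; raise LoopDetected if a limit is hit."""
        fp = self.fingerprint(agent, state)
        self.history.append(fp)
        self.seen[fp] += 1
        if len(self.history) > self.max_iterations:
            raise LoopDetected(
                f"iteration limit of {self.max_iterations} exceeded")
        if self.seen[fp] >= self.max_repeats:
            raise LoopDetected(
                f"{agent} repeated the same state {self.seen[fp]} times")
```

A coordinator would call `record()` before each agent step and treat `LoopDetected` as the escape hatch: stop the run, log `history`, and hand off to a supervisor or a human.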
2. Hallucination Mitigation
- Ground responses with source data
- Implement output validation
- Cross-check with retrieval
- Add confidence scoring
- Flag low-confidence outputs
- Provide feedback for retry
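One way to picture the validation, scoring, and feedback steps above is the sketch below. It uses a crude lexical-overlap heuristic purely for illustration; a real system would more likely use retrieval or an entailment check. `validate_against_sources`, `ValidationResult`, and both thresholds are hypothetical names and values, not part of any library.

```python
import re
from dataclasses import dataclass, field


@dataclass
class ValidationResult:
    confidence: float                 # share of claims supported by the sources
    flagged: bool                     # True when confidence is below the threshold
    unsupported: list = field(default_factory=list)

    def feedback(self):
        """Build a message that can be appended to the retry prompt."""
        if not self.unsupported:
            return ""
        listed = "\n".join(f"- {claim}" for claim in self.unsupported)
        return ("These claims are not supported by the provided sources; "
                "revise or remove them:\n" + listed)


def _tokens(text):
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def validate_against_sources(answer, sources, min_overlap=0.6, threshold=0.8):
    """Assumed heuristic: a sentence counts as grounded when at least
    `min_overlap` of its distinct words also appear in the source texts."""
    source_tokens = set()
    for source in sources:
        source_tokens |= _tokens(source)

    # Treat each sentence of the answer as one claim.
    claims = [c.strip() for c in re.split(r"(?<=[.!?])\s+", answer) if c.strip()]
    unsupported = []
    for claim in claims:
        words = _tokens(claim)
        overlap = len(words & source_tokens) / len(words) if words else 1.0
        if overlap < min_overlap:
            unsupported.append(claim)

    confidence = 1.0 - len(unsupported) / len(claims) if claims else 0.0
    return ValidationResult(confidence, confidence < threshold, unsupported)
```

When `flagged` is true, the coordinator can retry the agent with `feedback()` added to the prompt, which corresponds to the "Validation + retry with feedback" branch of the decision framework.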
3. Circuit Breaker Implementation
- Track failure rates per agent
- Define failure threshold
- Open circuit on threshold breach
- Provide fallback behavior
- Implement half-open state for testing
- Close circuit on recovery
- Monitor and alert on breaker state
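The closed, open, and half-open states described above might look like this in code. It is a minimal sketch under simplifying assumptions: `CircuitBreaker` and `CircuitOpenError` are hypothetical names, the breaker counts consecutive failures rather than a true failure rate, and it is not thread-safe. The `clock` parameter exists only so that tests can control time.

```python
import time


class CircuitOpenError(Exception):
    """Raised when the circuit is open and no fallback was supplied."""


class CircuitBreaker:
    def __init__(self, failure_threshold=3, reset_timeout=30.0,
                 clock=time.monotonic):
        self.failure_threshold = failure_threshold  # failures before opening
        self.reset_timeout = reset_timeout          # seconds to wait before a trial call
        self.clock = clock
        self.state = "closed"
        self.failures = 0
        self.opened_at = None

    def call(self, func, *args, fallback=None, **kwargs):
        if self.state == "open":
            if self.clock() - self.opened_at >= self.reset_timeout:
                self.state = "half_open"  # let one trial call through
            elif fallback is not None:
                return fallback()         # degrade without touching the agent
            else:
                raise CircuitOpenError("circuit is open")
        try:
            result = func(*args, **kwargs)
        except Exception:
            self._on_failure()
            if fallback is not None:
                return fallback()
            raise
        self._on_success()
        return result

    def _on_failure(self):
        self.failures += 1
        # A failed trial call in half-open state reopens the circuit at once.
        if self.state == "half_open" or self.failures >= self.failure_threshold:
            self.state = "open"
            self.opened_at = self.clock()

    def _on_success(self):
        self.state = "closed"
        self.failures = 0
```

To track failures per agent, one breaker instance can be kept per agent name, for example in a dictionary keyed by that name. Exposing `state` to a metrics or alerting system covers the monitoring step.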
Best Practices
- Implement timeouts for all agent calls
- Use exponential backoff with jitter
- Log all failures with full context
- Design for graceful degradation
- Test failure scenarios explicitly
- Monitor error rates and patterns
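As an illustration of exponential backoff with jitter and a bounded retry count, here is a minimal sketch. `retry_with_backoff` and its defaults are hypothetical. The "full jitter" variant shown, a uniformly random delay between zero and the exponential cap, is one common choice among several. `sleep` and `rng` are injectable only to make the function testable.

```python
import logging
import random
import time

logger = logging.getLogger("error_coordinator")  # hypothetical logger name


def retry_with_backoff(func, max_attempts=5, base_delay=0.5, max_delay=30.0,
                       retry_on=(Exception,), sleep=time.sleep,
                       rng=random.random):
    """Call func() until it succeeds or max_attempts is reached."""
    for attempt in range(1, max_attempts + 1):
        try:
            return func()
        except retry_on as exc:
            if attempt == max_attempts:
                # Retries exhausted: surface the failure instead of hiding it.
                logger.error("giving up after %d attempts: %r", attempt, exc)
                raise
            # The cap doubles on every attempt, bounded by max_delay.
            cap = min(max_delay, base_delay * 2 ** (attempt - 1))
            delay = rng() * cap  # full jitter: uniform in [0, cap)
            logger.warning("attempt %d failed (%r); retrying in %.2fs",
                           attempt, exc, delay)
            sleep(delay)
```

Passing a narrower `retry_on` tuple, such as `(TimeoutError, ConnectionError)`, keeps non-transient errors from being retried at all.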
Anti-Patterns
| Anti-Pattern | Problem | Correct Approach |
|---|---|---|
| Infinite retries | Resource exhaustion | Max retry limits |
| Silent failures | Hidden problems | Log and alert |
| No timeouts | Hung processes | Always set timeouts |
| Same retry interval | Thundering herd | Exponential backoff with jitter |
| No fallbacks | Complete failure | Graceful degradation |
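The "no timeouts", "silent failures", and "no fallbacks" rows can be addressed together. The sketch below assumes agents are exposed as async callables. `call_with_timeout` and the logger name are hypothetical; `asyncio.wait_for` is the standard-library call that cancels the awaited task when the timeout expires.

```python
import asyncio
import logging

logger = logging.getLogger("error_coordinator")  # hypothetical logger name


async def call_with_timeout(primary, fallback, timeout_s=10.0):
    """Run primary() under a timeout; on timeout or error, log and degrade."""
    try:
        return await asyncio.wait_for(primary(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # wait_for has already cancelled the primary call at this point.
        logger.warning("primary agent timed out after %.1fs; using fallback",
                       timeout_s)
    except Exception:
        # Log with traceback so the failure is never silent.
        logger.exception("primary agent failed; using fallback")
    return await fallback()
```

A possible usage, with two stand-in agents:

```python
async def slow_agent():
    await asyncio.sleep(5)
    return "full answer"

async def cached_agent():
    return "cached answer"

result = asyncio.run(call_with_timeout(slow_agent, cached_agent, timeout_s=0.1))
```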