agent-reviewer

Agent Reviewer Protocol

Task is done — now look back. What went well, what went wrong, what should be different next time? Goal: never repeat the same mistake and continuously improve skills and processes.
Core principle: Retrospectives are painful but necessary. A good agent evaluates itself.

6 Review Dimensions

1. Goal Alignment

Did the result match the original intent?
  • Was the user's actual request met?
  • Did scope creep occur?
  • Over-delivery or under-delivery?

2. Efficiency

Did the task take longer than necessary?
  • Unnecessary tool calls?
  • Repeated operations?
  • Sequential steps that could have been parallel?
  • Token/resource waste?

3. Decision Quality

Were decisions well-reasoned?
  • Were assumptions verified?
  • Were alternatives considered?
  • Did early decisions cause later problems?

4. Error Handling

How were errors addressed?
  • Detected quickly?
  • Right strategy applied?
  • Same error repeated?

5. Communication

How good was the interaction with the user?
  • Unnecessary confirmations requested?
  • Critical information missing at key points?
  • Too many or too few questions?

6. Reusability

Can lessons from this task transfer to the next?
  • General patterns discovered?
  • Which skills were missing or insufficient?
  • Which decisions should become standard?

Finding Severity

| Severity | Meaning | Action |
|----------|---------|--------|
| CRITICAL | Endangered the task or significantly reduced quality | Must fix |
| MODERATE | Created inefficiency but didn't break the result | Improve |
| POSITIVE | Something that went better than expected | Repeat, standardize |
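
The three severity levels map naturally onto an enumeration. The following is a minimal sketch, not part of the protocol itself: the `Severity` class and the `summarize` helper are hypothetical names introduced here, and the list-of-tuples shape for findings is an assumption.

```python
from enum import Enum


class Severity(Enum):
    """Finding severities, each paired with its required action."""

    CRITICAL = "Must fix"
    MODERATE = "Improve"
    POSITIVE = "Repeat, standardize"


def summarize(findings: list[tuple[Severity, str]]) -> str:
    """Count findings per severity and join the counts with ' | '."""
    counts = {severity: 0 for severity in Severity}
    for severity, _description in findings:
        counts[severity] += 1
    # Enum iteration follows definition order: critical, moderate, positive.
    return " | ".join(f"{counts[s]} {s.name.lower()}" for s in Severity)
```

A string built this way has the same shape as the `Findings` line of the retrospective header, for example `1 critical | 0 moderate | 2 positive`.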

Output Format

AGENT REVIEWER — Task Retrospective
Task     : [task name]
Score    : X/10
Findings : N critical | N moderate | N positive

Dimension Scores

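| Dimension | Score | Summary |
|-----------|-------|---------|
| Goal Alignment | X/10 | ... |
| Efficiency | X/10 | ... |
| Decision Quality | X/10 | ... |
| Error Handling | X/10 | ... |
| Communication | X/10 | ... |
| Reusability | X/10 | ... |
| Overall | X/10 | |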

Critical Findings

[If any — what happened, why critical, how to prevent]

Improvement Areas

[Inefficiencies, missed opportunities]

What Went Well

[Decisions and approaches worth repeating]

Action Items

For Next Task

  1. [Concrete change — what to do]
  2. [Concrete change]

Skill / Process Improvement

  1. [Which skill should be updated / added]
  2. [Which pattern should be standardized]

Lessons Learned

[Items a future agent instance should know — candidates for memory-ledger]

---
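
The template leaves open how the overall score relates to the six dimension scores. As one possible reading, the sketch below assumes the overall score is the rounded mean of the dimension scores; that rule, and the names `DIMENSIONS`, `overall_score`, and `render_header_fields`, are illustrative assumptions rather than anything the protocol specifies.

```python
from statistics import mean

DIMENSIONS = [
    "Goal Alignment", "Efficiency", "Decision Quality",
    "Error Handling", "Communication", "Reusability",
]


def overall_score(scores: dict[str, int]) -> int:
    """Assumed rule: overall is the rounded mean of the six dimension scores."""
    missing = [d for d in DIMENSIONS if d not in scores]
    if missing:
        raise ValueError(f"missing dimension scores: {missing}")
    return round(mean(scores[d] for d in DIMENSIONS))


def render_header_fields(task: str, scores: dict[str, int], findings: str) -> str:
    """Fill in the Task, Score, and Findings lines of the template header."""
    return "\n".join([
        f"Task     : {task}",
        f"Score    : {overall_score(scores)}/10",
        f"Findings : {findings}",
    ])
```

A weighted mean, or a cap on the overall score whenever a critical finding exists, would be equally defensible; the point is only that the rule should be fixed once so that scores stay comparable across retrospectives.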

Inefficiency Patterns — Auto-Detect

Scan the task history for these patterns:
| Pattern | Symptom | Fix |
|---------|---------|-----|
| Repeated tool call | Same file/API read 2+ times | Cache it |
| Unnecessary confirmation | Low-risk step triggered approval | Adjust checkpoint-guardian threshold |
| Late assumption discovery | "Actually it should be..." after error | Trigger assumption-checker earlier |
| Sequential parallel steps | Independent steps ran sequentially | Use parallel-planner |
| Blind retry | Logic error treated as transient | Fix error-recovery categorization |
| Context loss | Previous step info forgotten | Memory-ledger not updated |
| Over-decomposition | 2-step task split into 8 | Adjust task-decomposer granularity |
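
As an illustration of what scanning the task history could look like, here is a minimal sketch that detects the first pattern, the repeated tool call. The `ToolCall` record and the idea that the history is a flat list of such records are assumptions made for this example; a real agent's history format will differ.

```python
from collections import Counter
from typing import NamedTuple


class ToolCall(NamedTuple):
    """Hypothetical history record: which tool touched which file or API."""

    tool: str
    target: str


def repeated_tool_calls(
    history: list[ToolCall], threshold: int = 2
) -> dict[ToolCall, int]:
    """Return every call seen `threshold` or more times, with its count."""
    counts = Counter(history)
    return {call: n for call, n in counts.items() if n >= threshold}
```

Each entry in the result is a candidate for caching. The other patterns in the table need richer signals (approval events, error categories, step dependencies), so they are not covered by this sketch.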

Skill Performance Evaluation

Evaluate skills used during the task:

Skills Used

| Skill | Used? | Effective? | Notes |
|-------|-------|------------|-------|
| task-decomposer | Yes/No | Good/Fair/Poor | ... |
| checkpoint-guardian | Yes/No | Good/Fair/Poor | ... |
| assumption-checker | Yes/No | Good/Fair/Poor | ... |
| tool-selector | Yes/No | Good/Fair/Poor | ... |
| parallel-planner | Yes/No | Good/Fair/Poor | ... |
| error-recovery | Yes/No | Good/Fair/Poor | ... |
| memory-ledger | Yes/No | Good/Fair/Poor | ... |
| output-critic | Yes/No | Good/Fair/Poor | ... |

Missing / untriggered skills and why?

---

When to Skip

  • Task was single-step or under 5 minutes
  • Prototype / experimental task
  • User said "no retrospective needed"
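
The skip conditions above amount to a simple predicate. The sketch below is one way to express it; `TaskSummary` and its fields are hypothetical, and treating "single-step" as a step count of one or less is an assumption.

```python
from dataclasses import dataclass


@dataclass
class TaskSummary:
    """Hypothetical summary of a finished task."""

    step_count: int
    duration_minutes: float
    is_prototype: bool = False
    user_declined_review: bool = False


def should_skip_retrospective(task: TaskSummary) -> bool:
    """True when any one of the listed skip conditions applies."""
    return (
        task.step_count <= 1           # single-step task
        or task.duration_minutes < 5   # under 5 minutes
        or task.is_prototype           # prototype / experimental task
        or task.user_declined_review   # user said no retrospective is needed
    )
```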

Guardrails

  • Be honest, not kind — the value is in finding problems, not hiding them.
  • Concrete suggestions only — "do better" is useless; "cache file reads to avoid 3 redundant calls" is actionable.
  • Cross-skill: this is the ecosystem's feedback loop — findings here should update other skills and processes.