agents-best-practices-harness-design
agents-best-practices Skill
Skill by ara.so — AI Agent Skills collection.
This skill provides provider-neutral best practices for designing, auditing, and refactoring agentic harnesses: the control plane around a model that validates, authorizes, executes, and observes tool calls. It applies to coding and research agents as well as agents for support, operations, finance, legal, healthcare, education, and workflow automation.
Core principle: The model proposes actions; the harness validates, authorizes, executes, records, and returns observations.
What This Skill Does
- Generate MVP agent blueprints for new domains with typed tools, permissions, and launch gates
- Audit existing agent harnesses for brittle loops, unbounded tools, missing budgets, and observability gaps
- Design tools and permissions with risk-appropriate approval gates
- Structure planning mode and goal-like loops with checkpoints and budgets
- Build context and memory strategies that preserve active state across compaction
- Optimize prompt caching and cost telemetry
- Integrate skills, MCP, and external connectors with progressive disclosure
- Implement security, evals, and observability for production readiness
Installation
Option A: Via `skills` CLI (Recommended)
bash
npx skills add DenisSergeevitch/agents-best-practices -g
The `-g` flag installs globally for all projects.
Option B: Manual Install
For Codex:
bash
mkdir -p "${CODEX_HOME:-$HOME/.codex}/skills"
git clone https://github.com/DenisSergeevitch/agents-best-practices.git \
  "${CODEX_HOME:-$HOME/.codex}/skills/agents-best-practices"
For Claude Code (user-level):
bash
mkdir -p "$HOME/.claude/skills"
git clone https://github.com/DenisSergeevitch/agents-best-practices.git \
  "$HOME/.claude/skills/agents-best-practices"
For Claude Code (project-level):
bash
mkdir -p .claude/skills
git clone https://github.com/DenisSergeevitch/agents-best-practices.git \
  .claude/skills/agents-best-practices
Verification
After install, verify the skill is discoverable:
bash
# Codex
ls "${CODEX_HOME:-$HOME/.codex}/skills/agents-best-practices"
# Claude Code
ls "$HOME/.claude/skills/agents-best-practices"
You should see `SKILL.md`, `README.md`, `icon.jpeg`, and `references/`.
---
Repository Structure
agents-best-practices/
├── SKILL.md # skill entrypoint (this file)
├── README.md # public-facing overview
├── icon.jpeg # skill icon
└── references/
    ├── mvp-agent-blueprint.md # MVP harness generator
    ├── architecture.md # component model
    ├── agentic-loop.md # loop invariants and budgets
    ├── tools-and-permissions.md # typed tools and risk classes
    ├── planning-and-goals.md # planning mode and long-running goals
    ├── context-memory-compaction.md # context, memory, retrieval
    ├── prompt-caching-and-cost.md # cache-aware context layout
    ├── skills-and-connectors.md # Agent Skills, MCP, connectors
    ├── system-prompts-instructions.md # instruction hierarchy
    ├── provider-api-patterns.md # OpenAI, Anthropic, compatible APIs
    ├── security-evals-observability.md # guardrails, tracing, evals
    ├── agent-legibility-feedback-loops.md # artifacts and cleanup
    ├── checklists.md # implementation and audit checklists
    ├── coverage-audit.md # topic coverage verification
    └── source-links.md # official references
Core Concepts
1. The Agentic Loop
Every agent follows this pattern:
user/task → context builder → model call → typed tool call
→ schema validation → permission check → execution or pause
→ structured observation → next step or final answer
Key invariants:
- Every tool call gets a result (success, denial, timeout, malformed, abort)
- Risk changes the loop (reads vs. drafts vs. writes vs. external communications)
- Budgets prevent runaway loops (steps, time, tokens, cost, tool calls)
- Active state survives compaction
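The first invariant, that every tool call ends in a result, can be sketched as one harness function that walks the validation, permission, and execution stages. This is a minimal synchronous sketch under stated assumptions, not the skill's own implementation: `ToolSpec`, `Observation`, `handleToolCall`, and the toy `read_ticket` tool are hypothetical names introduced here.

```typescript
// Hypothetical types for illustration; none of these names come from the skill itself.
type ObservationStatus = "success" | "denied" | "malformed" | "error";

interface Observation {
  tool: string;
  status: ObservationStatus;
  detail: string;
}

interface ToolSpec {
  name: string;
  requiresApproval: boolean;
  // Returns an error message when the arguments are invalid, otherwise null.
  validate: (args: Record<string, unknown>) => string | null;
  execute: (args: Record<string, unknown>) => string;
}

// Every proposed call ends in exactly one Observation, whatever goes wrong.
function handleToolCall(
  registry: Map<string, ToolSpec>,
  toolName: string,
  args: Record<string, unknown>,
  approved: boolean
): Observation {
  const spec = registry.get(toolName);
  if (!spec) {
    return { tool: toolName, status: "malformed", detail: "unknown tool" };
  }
  const problem = spec.validate(args); // schema validation
  if (problem !== null) {
    return { tool: toolName, status: "malformed", detail: problem };
  }
  if (spec.requiresApproval && !approved) { // permission check
    return { tool: toolName, status: "denied", detail: "approval required" };
  }
  try {
    return { tool: toolName, status: "success", detail: spec.execute(args) };
  } catch (err) {
    const message = err instanceof Error ? err.message : String(err);
    return { tool: toolName, status: "error", detail: message };
  }
}

const registry = new Map<string, ToolSpec>([
  ["read_ticket", {
    name: "read_ticket",
    requiresApproval: false,
    validate: (args) => (typeof args["id"] === "string" ? null : "id must be a string"),
    execute: (args) => `ticket ${String(args["id"])}`,
  }],
]);

const ok = handleToolCall(registry, "read_ticket", { id: "T-1" }, false);
const bad = handleToolCall(registry, "read_ticket", { id: 7 }, false);
const missing = handleToolCall(registry, "delete_everything", {}, false);
```

Because denials and malformed calls come back as observations rather than exceptions, the model sees them on the next step and can adjust instead of stalling.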
Reference: `references/agentic-loop.md`
2. Tools and Permissions
Risk classes determine permission requirements:
| Risk Class | Examples | Permission |
|---|---|---|
| | Read CRM, support tickets | Autonomous with scope |
| | Draft email, Slack message | Autonomous with label |
| | Update record | Approval gate |
| | Send email, post to Slack | Approval gate |
| | Delete, archive | Approval gate |
| | Admin tools, deploy | Approval gate |
| | Charge card, transfer funds | Approval gate |
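One way to turn a table like this into harness behavior is a lookup that defaults to the strictest mode. The sketch below is illustrative only: the risk-class names are borrowed from the configuration example later in this document, pairing them with the table rows is an assumption, and `PermissionMode` and `permissionFor` are hypothetical names.

```typescript
// Permission modes mirroring the third column of the table above.
type PermissionMode = "autonomous_scoped" | "autonomous_labeled" | "approval_gate";

// Assumed mapping: class names come from the configuration example in this
// document; which table row each one belongs to is a guess for illustration.
const PERMISSION_BY_RISK = new Map<string, PermissionMode>([
  ["read_private_data", "autonomous_scoped"],
  ["draft_external_message", "autonomous_labeled"],
  ["write_database", "approval_gate"],
  ["external_communication", "approval_gate"],
  ["destructive_action", "approval_gate"],
]);

// Unknown or unclassified risk classes fall back to the strictest mode.
function permissionFor(riskClass: string): PermissionMode {
  return PERMISSION_BY_RISK.get(riskClass) ?? "approval_gate";
}

const readMode = permissionFor("read_private_data");
const unknownMode = permissionFor("not_yet_classified");
```

Failing closed matters most for connectors added later: a tool nobody has classified yet should pause for approval rather than run autonomously.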
Pattern: Typed Tools
typescript
// Good: Narrow, typed, deterministic
interface SendCustomerEmailTool {
name: "send_customer_email";
parameters: {
account_id: string;
template: "renewal_reminder" | "upgrade_offer" | "support_followup";
variables: Record<string, string>;
};
permission: "approval_gate";
}
// Bad: Generic, untyped, unbounded
interface SendMessageTool {
name: "send_message";
parameters: {
to: string;
body: string;
};
}
Reference: `references/tools-and-permissions.md`
3. Planning and Goals
Planning mode separates thinking from acting:
typescript
interface PlanningResult {
plan: string; // What the agent intends to do
required_approvals: string[]; // Tools needing human approval
estimated_steps: number;
estimated_cost_usd: number;
risk_summary: string;
}
// User approves the plan, then agent executes
Goal-like loops need:
- Step budget (e.g., max 20 steps)
- Time budget (e.g., 5 minutes)
- Cost budget (e.g., $0.50)
- Checkpoints (e.g., save state every 5 steps)
- Termination reasons (success, budget, validation failure, user abort)
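The budgets, checkpoints, and termination reasons listed above can be combined into a small stop check that runs before every step. This is a sketch under assumptions: `LoopBudget`, `LoopProgress`, `shouldStop`, `isCheckpoint`, and the simulated per-step cost and duration are all hypothetical, and validation failure is omitted to keep the example short.

```typescript
type TerminationReason =
  | "success" | "step_budget" | "time_budget" | "cost_budget" | "user_abort";

interface LoopBudget {
  maxSteps: number;
  maxSeconds: number;
  maxCostUsd: number;
  checkpointEvery: number; // save state every N steps
}

interface LoopProgress {
  steps: number;
  elapsedSeconds: number;
  costUsd: number;
  done: boolean;
  aborted: boolean;
}

// Returns a reason to stop, or null if the loop may take another step.
function shouldStop(budget: LoopBudget, progress: LoopProgress): TerminationReason | null {
  if (progress.aborted) return "user_abort";
  if (progress.done) return "success";
  if (progress.steps >= budget.maxSteps) return "step_budget";
  if (progress.elapsedSeconds >= budget.maxSeconds) return "time_budget";
  if (progress.costUsd >= budget.maxCostUsd) return "cost_budget";
  return null;
}

function isCheckpoint(budget: LoopBudget, progress: LoopProgress): boolean {
  return progress.steps > 0 && progress.steps % budget.checkpointEvery === 0;
}

// Simulated run: each step costs $0.04 and takes 10 seconds; the task never finishes.
function simulate(budget: LoopBudget) {
  const progress: LoopProgress = {
    steps: 0, elapsedSeconds: 0, costUsd: 0, done: false, aborted: false,
  };
  const checkpoints: number[] = [];
  let reason = shouldStop(budget, progress);
  while (reason === null) {
    progress.steps += 1;
    progress.elapsedSeconds += 10;
    progress.costUsd += 0.04;
    if (isCheckpoint(budget, progress)) checkpoints.push(progress.steps);
    reason = shouldStop(budget, progress);
  }
  return { reason, steps: progress.steps, checkpoints };
}

const outcome = simulate({ maxSteps: 20, maxSeconds: 300, maxCostUsd: 0.5, checkpointEvery: 5 });
// outcome.reason is "cost_budget": the cost budget runs out at step 13,
// before the step or time budgets, with checkpoints taken at steps 5 and 10.
```

Returning a named reason rather than a boolean lets the trace record why a run ended, which the audit and checklist patterns in this document rely on.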
Reference: `references/planning-and-goals.md`
4. Context and Memory
Context hierarchy:
typescript
interface AgentContext {
// Stable, cache-friendly prefix
system_instructions: string;
skill_descriptions: string[];
// Active state (outside prompt)
plan: Plan | null;
pending_approvals: Approval[];
todos: Todo[];
artifacts: Artifact[];
// Recent conversation (compacted)
messages: Message[];
// Retrieved knowledge
retrieved_docs: Document[];
}
Compaction rules:
- Preserve active state (plan, approvals, todos, artifacts) outside the prompt
- Summarize conversation, not decisions
- Rehydrate from state, not chat history
- Label trust boundaries (user, model, tool, external)
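The rules above can be illustrated with a compaction step that only ever touches the conversation, while the prompt is rebuilt from separately stored state. This is a simplified sketch: `ChatMessage`, `ActiveWork`, `compactMessages`, and `buildPrompt` are hypothetical names, and the summary is a stub where a real harness would likely call a model.

```typescript
type TrustSource = "user" | "model" | "tool" | "external";

interface ChatMessage {
  source: TrustSource; // trust label carried with every message
  text: string;
}

// Active state lives outside the conversation, so compaction cannot drop it.
interface ActiveWork {
  plan: string | null;
  pendingApprovals: string[];
  todos: string[];
}

// Keeps the most recent messages verbatim and replaces older ones with a stub.
// The stub only records how much was dropped; it does not summarize content.
function compactMessages(messages: ChatMessage[], keepRecent: number): ChatMessage[] {
  if (messages.length <= keepRecent) return messages;
  const dropped = messages.length - keepRecent;
  const summary: ChatMessage = {
    source: "model",
    text: `[summary of ${dropped} earlier messages]`,
  };
  return [summary, ...messages.slice(messages.length - keepRecent)];
}

// Rehydrates the prompt from stored state first, then the compacted conversation.
function buildPrompt(work: ActiveWork, messages: ChatMessage[]): string {
  const stateLines = [
    `plan: ${work.plan ?? "none"}`,
    `pending_approvals: ${work.pendingApprovals.join(", ") || "none"}`,
    `todos: ${work.todos.join(", ") || "none"}`,
  ];
  const chatLines = messages.map((m) => `[${m.source}] ${m.text}`);
  return [...stateLines, ...chatLines].join("\n");
}

const work: ActiveWork = {
  plan: "draft renewal email",
  pendingApprovals: ["send_email#42"],
  todos: ["fetch usage"],
};
const chat: ChatMessage[] = [
  { source: "user", text: "Check account A-1" },
  { source: "model", text: "Reading profile" },
  { source: "tool", text: "{ tier: 'pro' }" },
  { source: "model", text: "Drafting email" },
  { source: "user", text: "Looks good" },
];
const compacted = compactMessages(chat, 2);
const rebuiltPrompt = buildPrompt(work, compacted);
```

Even after three of the five messages are collapsed into a stub, the plan, the pending approval, and the todo list are all still present in the rebuilt prompt.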
Reference: `references/context-memory-compaction.md`
5. Prompt Caching
Cache-aware layout:
typescript
// Stable prefix (cached)
const systemPrefix = [
systemInstructions,
allSkillDescriptions,
allToolSchemas,
permanentExamples
];
// Dynamic suffix (not cached)
const dynamicSuffix = [
retrievedDocs,
recentMessages,
currentTask
];
// OpenAI: system, cached_user, user
// Anthropic: system (cached), user (cached), user
Cost telemetry:
typescript
interface ModelCallTelemetry {
input_tokens: number;
output_tokens: number;
cached_tokens: number;
cost_usd: number;
cache_hit_rate: number;
}
Reference: `references/prompt-caching-and-cost.md`
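The `cost_usd` and `cache_hit_rate` telemetry fields can be derived from token counts and a price table. The sketch below is an assumption-laden example: `TokenUsage`, `PricePerMillion`, `cacheHitRate`, and `costUsd` are hypothetical names, the prices are placeholders rather than any provider's real rates, and it assumes cached tokens are counted as a subset of input tokens (providers report this differently, so adjust as needed).

```typescript
interface TokenUsage {
  inputTokens: number;  // total prompt tokens, including cached ones (assumption)
  cachedTokens: number; // portion of inputTokens served from the cache
  outputTokens: number;
}

// Prices in USD per million tokens.
interface PricePerMillion {
  input: number;
  cachedInput: number;
  output: number;
}

// Fraction of prompt tokens served from the cache; 0 when there was no input.
function cacheHitRate(usage: TokenUsage): number {
  return usage.inputTokens === 0 ? 0 : usage.cachedTokens / usage.inputTokens;
}

// Uncached input, cached input, and output are each billed at their own rate.
function costUsd(usage: TokenUsage, price: PricePerMillion): number {
  const uncached = usage.inputTokens - usage.cachedTokens;
  const total =
    uncached * price.input +
    usage.cachedTokens * price.cachedInput +
    usage.outputTokens * price.output;
  return total / 1e6;
}

// Placeholder prices for illustration only.
const examplePrices: PricePerMillion = { input: 3, cachedInput: 0.3, output: 15 };
const exampleUsage: TokenUsage = { inputTokens: 10000, cachedTokens: 8000, outputTokens: 500 };

const hitRate = cacheHitRate(exampleUsage);           // 0.8
const callCost = costUsd(exampleUsage, examplePrices); // about 0.0159 USD
```

A hit rate that stays at zero across calls with an identical prefix usually points to something dynamic, such as a timestamp or retrieved document, having slipped into the stable prefix.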
6. Skills and Connectors
Progressive disclosure:
typescript
// Step 1: Load skill summaries (cached)
const skillIndex = [
{ name: "web-search", description: "Search the web..." },
{ name: "code-analysis", description: "Analyze codebases..." }
];
// Step 2: Load full skill when relevant
if (userNeedsWebSearch) {
const webSearchSkill = loadSkill("web-search");
context.push(webSearchSkill.fullInstructions);
}
MCP integration:
typescript
// Expose MCP server as typed tools
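// Assumes the official MCP TypeScript SDK client, whose listTools() resolves to { tools: [...] }
const { tools: mcpTools } = await mcpClient.listTools();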
for (const tool of mcpTools) {
registerTool({
name: tool.name,
description: tool.description,
parameters: tool.inputSchema,
permission: classifyRisk(tool.name)
});
}
Reference: `references/skills-and-connectors.md`
Common Patterns
Pattern 1: Generate MVP Agent Blueprint
User request:
Build an agent for account renewal risk. It should read CRM, support tickets,
and usage data, then draft renewal actions.
Agent response using this skill:
MVP Blueprint: Account Renewal Risk Agent
Harness level: Approval-gated Level 2
Core loop:
user/task → context builder → model call → typed tool call
→ schema validation → permission check → execution or pause
→ structured observation → next step or final brief
Minimal tools:
- `read_account_profile`
  - permission: `read_private_data`
  - Input: account_id
  - Output: { name, tier, arr, contract_end_date, renewal_probability }
- `list_support_tickets`
  - permission: `read_private_data`
  - Input: account_id, days: 90
  - Output: Ticket[]
- `fetch_usage_summary`
  - permission: `read_private_data`
  - Input: account_id, days: 30
  - Output: { active_users, feature_adoption, last_login }
- `draft_customer_email`
  - permission: `draft_external_message`
  - Input: account_id, template, variables
  - Output: { subject, body, draft_id }
- `request_approval`
  - permission: `approval_gate`
  - Input: action, rationale, draft_id
  - Output: { approval_id, status: "pending" }
Launch gate:
- 20 historical accounts processed
- Trace review confirms no unapproved external sends
- Human acceptance on ≥80% of draft actions
- No unbounded tool results or timeout loops
Implementation checklist:
- Typed tools with JSON schemas
- Permission check before every execution
- Step budget (max 15 steps per account)
- Cost telemetry and budget ($0.10 per account)
- Structured observations with trust labels
- Eval: injection, timeout, missing result, budget exhaustion
Reference: `references/mvp-agent-blueprint.md`
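The "typed tools with JSON schemas" item in the checklist above implies that arguments are validated before the permission check runs. A hand-written validator for the blueprint's `draft_customer_email` tool might look like the sketch below. It is illustrative only: `validateDraftEmailArgs` and `Validation` are hypothetical names, the template list is taken from the typed-tool example earlier in this document, and a real harness would more likely use a JSON Schema library.

```typescript
// Template names reused from the typed-tool example earlier in this document.
const ALLOWED_TEMPLATES = ["renewal_reminder", "upgrade_offer", "support_followup"];

interface DraftEmailArgs {
  account_id: string;
  template: string;
  variables: Record<string, string>;
}

type Validation =
  | { ok: true; args: DraftEmailArgs }
  | { ok: false; errors: string[] };

// Checks model-proposed arguments and collects every problem found.
function validateDraftEmailArgs(raw: unknown): Validation {
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    return { ok: false, errors: ["arguments must be an object"] };
  }
  const input = raw as Record<string, unknown>;
  const accountId = input["account_id"];
  const template = input["template"];
  const variables = input["variables"];
  const errors: string[] = [];

  if (typeof accountId !== "string" || accountId.length === 0) {
    errors.push("account_id must be a non-empty string");
  }
  if (typeof template !== "string" || ALLOWED_TEMPLATES.indexOf(template) === -1) {
    errors.push("template must be one of: " + ALLOWED_TEMPLATES.join(", "));
  }
  const variablesOk =
    typeof variables === "object" &&
    variables !== null &&
    !Array.isArray(variables) &&
    Object.keys(variables).every(
      (key) => typeof (variables as Record<string, unknown>)[key] === "string"
    );
  if (!variablesOk) {
    errors.push("variables must be an object of string values");
  }

  if (errors.length > 0) return { ok: false, errors };
  return {
    ok: true,
    args: {
      account_id: accountId as string,
      template: template as string,
      variables: variables as Record<string, string>,
    },
  };
}

const accepted = validateDraftEmailArgs({
  account_id: "A-1",
  template: "renewal_reminder",
  variables: { first_name: "Ada" },
});
const rejected = validateDraftEmailArgs({
  account_id: "A-1",
  template: "free_text",
  variables: {},
});
```

Returning the error list as data, rather than throwing, lets the harness hand it back to the model as a "malformed" observation.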
---
Pattern 2: Audit Existing Agent
User request:
Our research agent sometimes runs tools forever and forgets why it made
a decision after context compaction. Audit the harness.
Agent response using this skill:
Audit: Research Agent Harness
Failure points identified:
- No loop budgets (agentic-loop.md)
  - Missing: step budget, time budget, cost budget
  - Risk: Runaway loops, cost overruns
  - Fix: Add `max_steps: 30`, `timeout_seconds: 300`, `max_cost_usd: 2.0`
- Compaction loses active state (context-memory-compaction.md)
  - Current: Summarizes chat history, drops plan and approvals
  - Risk: Agent forgets what it's doing mid-task
  - Fix: Store plan, approvals, todos, artifacts outside prompt; rehydrate on load
- Tool results unbounded (tools-and-permissions.md)
  - Current: Returns full API responses (some 50KB+)
  - Risk: Context overflow, cost explosion
  - Fix: Add `max_result_bytes: 4096` and structured summarization
- No event trace (security-evals-observability.md)
  - Current: Logs model input/output only
  - Risk: Cannot debug tool call → execution → observation flow
  - Fix: Add structured trace: `{ step, tool, args, result, duration, cost }`
- Trust boundaries unlabeled (context-memory-compaction.md)
  - Current: Mixes user input, model output, tool results
  - Risk: Injection attacks, confused deputy
  - Fix: Label every message: `source: "user" | "model" | "tool" | "external"`
Fix priority:
- Add loop budgets and termination reasons
- Move active state outside prompt
- Bound tool results and add structured observations
- Add event trace
- Add evals for injection, timeout, budget exhaustion
References: `references/agentic-loop.md`, `references/context-memory-compaction.md`, `references/security-evals-observability.md`
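The "bound tool results" fix can be sketched as a wrapper applied to every tool output before it enters the context. This is a rough illustration, not the audited harness: `BoundedResult` and `boundToolResult` are hypothetical names, and size is measured in string length (UTF-16 code units) as an approximation of the byte limit named in the audit.

```typescript
interface BoundedResult {
  source: "tool";        // trust label, as recommended in the audit
  content: string;
  truncated: boolean;
  originalLength: number;
}

// Keeps at most maxLength characters of the raw result and appends a short
// marker so the model can tell that data was cut off.
function boundToolResult(raw: string, maxLength: number): BoundedResult {
  if (raw.length <= maxLength) {
    return { source: "tool", content: raw, truncated: false, originalLength: raw.length };
  }
  const marker = `\n[truncated ${raw.length - maxLength} of ${raw.length} characters]`;
  return {
    source: "tool",
    content: raw.slice(0, maxLength) + marker,
    truncated: true,
    originalLength: raw.length,
  };
}

const small = boundToolResult("{\"status\":\"ok\"}", 4096);
const large = boundToolResult("a".repeat(5000), 4096);
```

Plain truncation is the crudest option; the structured summarization mentioned in the audit would replace the cut with a digest of the dropped fields.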
---
Pattern 3: Design Tools and Permissions
User request:
The ops agent needs Slack, Linear, Google Drive, and our internal deploy API.
How should tools and permissions work?
Agent response using this skill:
Tool Design: Ops Agent
Risk classification:
| Tool | Risk Class | Permission |
|---|---|---|
| | | Autonomous (scoped) |
| | | Autonomous (scoped) |
| | | Autonomous (labeled) |
| | | Approval gate |
| | | Approval gate |
| | | Approval gate |
Tool schemas:
typescript
// Good: Narrow, typed
interface PostSlackMessageTool {
name: "post_slack_message";
parameters: {
channel: string; // Must match allow-list
message: string;
thread_ts?: string;
};
permission: "approval_gate";
}
// Bad: Generic, unbounded
interface SendMessageTool {
name: "send_message";
parameters: {
platform: string;
destination: string;
content: string;
};
}
Approval flow:
typescript
// Agent proposes
const proposal = {
tool: "post_slack_message",
args: { channel: "#incidents", message: "Deploy complete." },
rationale: "Notify team of successful rollout."
};
// Harness pauses and stores
const approval = await requestApproval(proposal);
// Human reviews in UI
// On approval, harness executes and returns observation
const result = await executeWithApproval(approval.id);
Connector governance:
typescript
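// Wrap external APIs as typed tools
import { WebClient } from "@slack/web-api";
class SlackConnector {
// The client and the allow-list are injected so the class compiles and can be tested
constructor(
private readonly slackClient: WebClient,
private readonly allowedChannels: string[]
) {}
async postMessage(channel: string, message: string): Promise<ToolResult> {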
// Validate channel against allow-list
if (!this.allowedChannels.includes(channel)) {
return { status: "denied", reason: "Channel not in allow-list" };
}
// Execute
const response = await this.slackClient.chat.postMessage({
channel,
text: message
});
// Return structured observation
return {
status: "success",
data: { ts: response.ts, channel: response.channel },
metadata: { timestamp: Date.now(), cost_usd: 0 }
};
}
}
Reference: `references/tools-and-permissions.md`, `references/skills-and-connectors.md`
Pattern 4: Production Readiness Checklist
User request:
What should I check before launching this agent?
Agent response using this skill:
Production Readiness Checklist
Runtime:
- Loop budgets enforced (step, time, cost, tool calls)
- Termination reasons logged (success, budget, validation, abort)
- Retries with exponential backoff for transient failures
- Active state (plan, approvals, todos, artifacts) outside prompt
- Context compaction preserves active state
Tools:
- Every tool has JSON schema and docstring
- Risk classification for every tool
- Permission check before every execution
- Tool results bounded (max 4KB per result)
- Structured observations with trust labels
- No generic tools (execute_anything, send_message, write_database)
Security:
- Input validation for all tool arguments
- Approval gates for high-risk tools
- Injection evals pass (prompt injection, jailbreak, PII leakage)
- Sandboxing for code execution tools
- Rate limits and abuse detection
Observability:
- Structured event trace (step, tool, args, result, duration, cost)
- Cost telemetry per task (input tokens, output tokens, cached tokens, USD)
- Error categorization (validation, permission, execution, timeout)
- Dashboards for cost, latency, success rate, approval rate
Evals:
- Historical task replay (≥50 tasks)
- Adversarial inputs (injection, confused deputy, unbounded loops)
- Edge cases (empty results, timeouts, malformed args, missing approvals)
- Human eval on ≥80% of high-risk tool calls
Launch gates:
- Shadow mode with human review for 1 week
- No unapproved external communications
- No cost overruns (≤10% over estimate)
- Incident response plan documented
Reference: `references/checklists.md`, `references/security-evals-observability.md`
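The runtime item "retries with exponential backoff for transient failures" can be sketched as follows. This is an example under assumptions rather than the checklist's prescribed implementation: `backoffDelayMs` and `retryWithBackoff` are hypothetical names, the 200 ms base and 5 s cap are arbitrary, jitter is omitted, and the retry loop is synchronous with an injected `sleep` purely to keep the sketch easy to test. A production harness would await an asynchronous delay instead.

```typescript
// Delay before retry number `attempt` (0-based): base * 2^attempt, capped at maxDelayMs.
function backoffDelayMs(attempt: number, baseMs: number, maxDelayMs: number): number {
  return Math.min(maxDelayMs, baseMs * Math.pow(2, attempt));
}

// Runs `operation` until it succeeds, fails with a non-transient error,
// or uses up maxAttempts. The last error is rethrown on failure.
function retryWithBackoff<T>(
  operation: () => T,
  isTransient: (err: unknown) => boolean,
  maxAttempts: number,
  sleep: (ms: number) => void
): T {
  let lastError: unknown = new Error("maxAttempts must be at least 1");
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    try {
      return operation();
    } catch (err) {
      lastError = err;
      // Stop immediately on permanent errors or when no attempts remain.
      if (!isTransient(err) || attempt === maxAttempts - 1) break;
      sleep(backoffDelayMs(attempt, 200, 5000));
    }
  }
  throw lastError;
}

// Usage: the operation fails twice with a transient error, then succeeds.
const waits: number[] = [];
let calls = 0;
const value = retryWithBackoff(
  () => {
    calls += 1;
    if (calls < 3) throw new Error("timeout");
    return "ok";
  },
  () => true,
  5,
  (ms) => { waits.push(ms); }
);
// value is "ok" after 3 calls, having waited 200 ms and then 400 ms.
```

Only failures classified as transient are retried; validation and permission errors should go back to the model as observations straight away.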
---
Provider API Patterns
OpenAI (Compatible)
typescript
import OpenAI from "openai";
const client = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
// Keep the message list in a variable so tool results can be appended to it
const messages: OpenAI.Chat.ChatCompletionMessageParam[] = [
  { role: "system", content: systemInstructions },
  { role: "user", content: task }
];
const response = await client.chat.completions.create({
  model: "gpt-4o",
  messages,
  tools: tools.map(t => ({
    type: "function",
    function: {
      name: t.name,
      description: t.description,
      parameters: t.parameters
    }
  })),
  tool_choice: "auto"
});
// Handle tool calls
const assistantMessage = response.choices[0].message;
if (assistantMessage.tool_calls) {
  // The assistant message that requested the tools must precede their results
  messages.push(assistantMessage);
  for (const toolCall of assistantMessage.tool_calls) {
    if (toolCall.type !== "function") continue; // skip non-function tool calls
    const result = await executeToolWithPermission(
      toolCall.function.name,
      JSON.parse(toolCall.function.arguments)
    );
    messages.push({
      role: "tool",
      tool_call_id: toolCall.id,
      content: JSON.stringify(result)
    });
  }
}
Anthropic
typescript
import Anthropic from "@anthropic-ai/sdk";
const client = new Anthropic({ apiKey: process.env.ANTHROPIC_API_KEY });
// Keep the message list in a variable so tool results can be appended to it
const messages: Anthropic.MessageParam[] = [
  { role: "user", content: task }
];
const response = await client.messages.create({
  model: "claude-3-7-sonnet-20250219",
  max_tokens: 4096,
  system: [
    { type: "text", text: systemInstructions, cache_control: { type: "ephemeral" } }
  ],
  messages,
  tools: tools.map(t => ({
    name: t.name,
    description: t.description,
    input_schema: t.parameters
  }))
});
// Handle tool calls
if (response.stop_reason === "tool_use") {
  // Echo the assistant turn first; tool results must answer a preceding tool_use block
  messages.push({ role: "assistant", content: response.content });
  const toolResults: Anthropic.ToolResultBlockParam[] = [];
  for (const block of response.content) {
    if (block.type === "tool_use") {
      const result = await executeToolWithPermission(block.name, block.input);
      toolResults.push({
        type: "tool_result",
        tool_use_id: block.id,
        content: JSON.stringify(result)
      });
    }
  }
  // Return all results for this turn in a single user message
  messages.push({ role: "user", content: toolResults });
}
// Track cache metrics
console.log({
  input_tokens: response.usage.input_tokens,
  cache_read_tokens: response.usage.cache_read_input_tokens,
  cache_creation_tokens: response.usage.cache_creation_input_tokens
});
Reference: `references/provider-api-patterns.md`
Configuration Example
Harness config:
typescript
interface AgentConfig {
model: {
provider: "openai" | "anthropic" | "openai-compatible";
model: string;
base_url?: string;
api_key_env: string;
};
loop: {
max_steps: number; // e.g., 30
timeout_seconds: number; // e.g., 300
max_cost_usd: number; // e.g., 1.0
max_tool_calls_per_step: number; // e.g., 5
};
context: {
max_messages: number; // e.g., 50
compaction_threshold: number; // e.g., 40
max_tool_result_bytes: number; // e.g., 4096
};
permissions: {
auto_approve: string[]; // e.g., ["read_private_data", "draft_external_message"]
require_approval: string[]; // e.g., ["external_communication", "destructive_action"]
};
observability: {
trace_enabled: boolean;
cost_tracking_enabled: boolean;
eval_mode: "shadow" | "production";
};
}
const config: AgentConfig = {
model: {
provider: "anthropic",
model: "claude-3-7-sonnet-20250219",
api_key_env: "ANTHROPIC_API_KEY"
},
loop: {
max_steps: 30,
timeout_seconds: 300,
max_cost_usd: 1.0,
max_tool_calls_per_step: 5
},
context: {
max_messages: 50,
compaction_threshold: 40,
max_tool_result_bytes: 4096
},
permissions: {
auto_approve: ["read_private_data", "draft_external_message"],
require_approval: ["external_communication", "write_database", "destructive_action"]
},
observability: {
trace_enabled: true,
cost_tracking_enabled: true,
eval_mode: "shadow"
}
};
Troubleshooting
Issue: Agent loops forever
Symptoms: Agent exceeds step budget or timeout without completing task.
Diagnosis:
- Check `references/agentic-loop.md` for loop invariants
- Verify step budget, time budget, and termination conditions are enforced
- Review tool results: are they bounded? Are failures properly observed?
Fix:
typescript
// Add hard budgets
if (step >= config.loop.max_steps) {
return { status: "budget_exhausted", reason: "max_steps" };
}
if (Date.now() - startTime > config.loop.timeout_seconds * 1000) {
return { status: "timeout", reason: "time_budget" };
}
// Add stop rules, each with its own termination reason (reason strings are illustrative)
if (allTodosComplete()) {
  return { status: "terminated", reason: "success" };
}
if (userAborted()) {
  return { status: "terminated", reason: "user_abort" };
}
if (criticalError()) {
  return { status: "terminated", reason: "critical_error" };
}
Issue: Context compaction loses active work
Symptoms: Agent forgets plan, pending approvals, or todos after compaction.
Diagnosis:
- Check `references/context-memory-compaction.md`
- Verify active state is stored outside the prompt
Fix:
typescript
interface ActiveState {
plan: Plan | null;
pending_approvals: Approval[];
todos: Todo[];
artifacts: Artifact[];
}
// Store outside prompt
const state = loadState(taskId);
// Rehydrate after compaction, including approvals that are still pending
const context = buildContext({
  system: systemInstructions,
  plan: state.plan,
  pending_approvals: state.pending_approvals,
  todos: state.todos,
  messages: compactedMessages
});
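To make the store-then-rehydrate idea above concrete, here is a small self-contained sketch. Everything in it is an assumption made for illustration: the in-memory `stateStore` stands in for a real database, `compact` is a deliberately naive summarizer, and the simplified `ActiveState`, `saveState`, and `buildContext` shapes are hypothetical rather than the skill's actual API.

```typescript
// Hypothetical sketch: active state lives in a store keyed by task ID,
// so compaction can shrink the message history without touching it.
interface Todo { id: string; text: string; done: boolean; }
interface Approval { id: string; tool: string; status: "pending" | "approved" | "denied"; }
interface ActiveState {
  plan: string | null;
  pending_approvals: Approval[];
  todos: Todo[];
}
interface Message { role: "system" | "user" | "assistant" | "tool"; content: string; }

const stateStore = new Map<string, ActiveState>(); // stand-in for a database

function saveState(taskId: string, state: ActiveState): void {
  stateStore.set(taskId, state);
}

function loadState(taskId: string): ActiveState {
  const state = stateStore.get(taskId);
  if (!state) throw new Error(`No active state for task ${taskId}`);
  return state;
}

// Naive compaction: replace all but the most recent messages with one summary line.
function compact(messages: Message[], keepLast: number): Message[] {
  if (messages.length <= keepLast) return messages;
  const dropped = messages.length - keepLast;
  const summary: Message = { role: "system", content: `[${dropped} earlier messages summarized]` };
  return [summary, ...messages.slice(messages.length - keepLast)];
}

// Rebuild the prompt from stored state plus the compacted history.
function buildContext(taskId: string, systemInstructions: string, messages: Message[]): Message[] {
  const state = loadState(taskId);
  const openTodos = state.todos.filter((t) => !t.done).map((t) => t.text);
  const pending = state.pending_approvals.filter((a) => a.status === "pending").map((a) => a.tool);
  const stateBlock = [
    `Plan: ${state.plan ?? "(none)"}`,
    `Open todos: ${openTodos.join("; ") || "(none)"}`,
    `Pending approvals: ${pending.join("; ") || "(none)"}`,
  ].join("\n");
  return [
    { role: "system", content: systemInstructions },
    { role: "system", content: stateBlock },
    ...messages,
  ];
}

saveState("task-1", {
  plan: "Migrate billing tables",
  pending_approvals: [{ id: "a1", tool: "write_database", status: "pending" }],
  todos: [
    { id: "t1", text: "Back up tables", done: true },
    { id: "t2", text: "Run migration", done: false },
  ],
});
const history: Message[] = Array.from({ length: 50 }, (_, i) => ({
  role: "user" as const,
  content: `message ${i}`,
}));
const rebuilt = buildContext("task-1", "You are a careful operations agent.", compact(history, 10));
console.log(rebuilt[1].content);
```

Because the plan, open todos, and pending approvals are re-rendered from the store on every turn, dropping old messages cannot erase them.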
Issue: Approval gates bypassed
Symptoms: High-risk tool executed without approval record.
Diagnosis:
- Check references/tools-and-permissions.md
- Verify permission check runs before every execution
Fix:
typescript
async function executeToolWithPermission(tool: string, args: any): Promise<ToolResult> {
const permission = getToolPermission(tool);
if (permission === "approval_gate") {
const approval = await findApproval(tool, args);
if (!approval || approval.status !== "approved") {
return { status: "denied", reason: "Requires human approval" };
}
}
// Execute only after permission check passes
return await executeTool(tool, args);
}
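One way the approval lookup in the fix above might work is to bind each approval record to the exact tool call it covers. The sketch below is synchronous and in-memory for brevity, and all names in it (`policies`, `approvalKey`, `recordApproval`, `executeWithPermission`, and the example tool names) are hypothetical, not part of the skill.

```typescript
// Hypothetical sketch: approvals are records stored outside the prompt and
// bound to one exact tool call, so approving one call never unlocks another.
type RiskPolicy = "auto_approve" | "require_approval";
type Args = Record<string, unknown>;

interface ApprovalRecord {
  key: string;
  status: "pending" | "approved" | "denied";
  approver?: string;
}

type ToolResult =
  | { status: "ok"; output: string }
  | { status: "denied"; reason: string };

const policies = new Map<string, RiskPolicy>([
  ["read_record", "auto_approve"],
  ["delete_record", "require_approval"],
]);
const approvals = new Map<string, ApprovalRecord>();

// Canonical key: tool name plus arguments with sorted top-level keys.
function approvalKey(tool: string, args: Args): string {
  const sorted = Object.keys(args).sort().map((k) => [k, args[k]]);
  return `${tool}:${JSON.stringify(sorted)}`;
}

function recordApproval(tool: string, args: Args, approver: string): void {
  const key = approvalKey(tool, args);
  approvals.set(key, { key, status: "approved", approver });
}

function executeWithPermission(tool: string, args: Args, run: (args: Args) => string): ToolResult {
  // Tools without a policy entry fall back to the strictest path.
  const policy = policies.get(tool) ?? "require_approval";
  if (policy === "require_approval") {
    const approval = approvals.get(approvalKey(tool, args));
    if (!approval || approval.status !== "approved") {
      // The denial is returned as a result, so the model still gets an observation.
      return { status: "denied", reason: "Requires human approval" };
    }
  }
  return { status: "ok", output: run(args) };
}

const deleteRecord = (args: Args): string => `deleted ${String(args["id"])}`;
console.log(executeWithPermission("delete_record", { id: 7 }, deleteRecord).status); // prints "denied"
recordApproval("delete_record", { id: 7 }, "alice");
console.log(executeWithPermission("delete_record", { id: 7 }, deleteRecord).status); // prints "ok"
console.log(executeWithPermission("delete_record", { id: 8 }, deleteRecord).status); // prints "denied"
```

Keying the record on the arguments as well as the tool name is the design choice that matters here: an approval for deleting record 7 says nothing about record 8.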
Issue: Cost explosion
Symptoms: Task costs 10x estimate; cached_tokens = 0.
Diagnosis:
- Check references/prompt-caching-and-cost.md
- Verify stable prefix is cached
Fix:
typescript
// OpenAI: prompt caching is automatic for long prompts that share a prefix
// (1,024 tokens or more at the time of writing); messages take no cache_control field.
// Keep stable content first and dynamic content last.
const messages = [
  { role: "system", content: stablePrefix },
  { role: "user", content: cachedKnowledge },
  { role: "user", content: dynamicTask }
];
// Anthropic: mark cacheable system blocks explicitly
const system = [
  { type: "text", text: stablePrefix, cache_control: { type: "ephemeral" } }
];
// Track cache hit rate
if (cacheHitRate < 0.7) {
  console.warn("Low cache hit rate; review context layout");
}
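The `cacheHitRate` value used above has to be derived from each provider's usage payload. The following sketch shows one way to normalize it. The usage field names follow the OpenAI and Anthropic API responses as documented at the time of writing and should be verified against the API version in use; `CacheStats`, `fromOpenAI`, `fromAnthropic`, and `warnIfLow` are hypothetical helper names.

```typescript
// Hypothetical sketch: normalize provider usage objects into one cache hit rate.
interface OpenAIUsage {
  prompt_tokens: number; // total input tokens, cached ones included
  prompt_tokens_details?: { cached_tokens?: number };
}

interface AnthropicUsage {
  input_tokens: number; // input tokens that were neither read from nor written to the cache
  cache_read_input_tokens?: number;
  cache_creation_input_tokens?: number;
}

interface CacheStats {
  inputTokens: number;
  cachedTokens: number;
  hitRate: number;
}

function stats(inputTokens: number, cachedTokens: number): CacheStats {
  const hitRate = inputTokens > 0 ? cachedTokens / inputTokens : 0;
  return { inputTokens, cachedTokens, hitRate };
}

function fromOpenAI(usage: OpenAIUsage): CacheStats {
  return stats(usage.prompt_tokens, usage.prompt_tokens_details?.cached_tokens ?? 0);
}

function fromAnthropic(usage: AnthropicUsage): CacheStats {
  const read = usage.cache_read_input_tokens ?? 0;
  const written = usage.cache_creation_input_tokens ?? 0;
  // Total input is the sum of uncached, cache-read, and cache-write tokens.
  return stats(usage.input_tokens + read + written, read);
}

// Returns true (and logs a warning) when the hit rate is below the threshold.
function warnIfLow(s: CacheStats, threshold = 0.7): boolean {
  const low = s.hitRate < threshold;
  if (low) {
    console.warn(`Low cache hit rate (${(s.hitRate * 100).toFixed(1)}%); review context layout`);
  }
  return low;
}

const openaiStats = fromOpenAI({ prompt_tokens: 2000, prompt_tokens_details: { cached_tokens: 1536 } });
const anthropicStats = fromAnthropic({ input_tokens: 200, cache_read_input_tokens: 1800 });
console.log(warnIfLow(openaiStats), warnIfLow(anthropicStats));
```

A hit rate of zero on a long, repeated prompt usually points to dynamic content (timestamps, request IDs, reordered tools) sitting inside what was meant to be the stable prefix.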
Issue: Injection attack
Symptoms: Agent executes unintended tool calls from user input.
Diagnosis:
- Check references/security-evals-observability.md
- Verify input validation and trust labels
Fix:
typescript
// Label trust boundaries
const messages = [
{ role: "user", content: userInput, metadata: { source: "user", trusted: false } },
{ role: "assistant", content: modelOutput, metadata: { source: "model" } },
{ role: "tool", content: toolResult, metadata: { source: "tool", trusted: true } }
];
// Validate tool arguments against schema
// (validateSchema is assumed to return { valid: boolean, errors: string[] })
const validation = validateSchema(tool.parameters, args);
if (!validation.valid) {
  return { status: "validation_failed", errors: validation.errors };
}
// Run injection evals
await runEval("prompt_injection", testCases);
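The schema check in the fix above could look roughly like the following. This is a deliberately tiny validator for flat argument objects, written only to illustrate the idea; it is a stand-in for a real JSON Schema library, and `ToolSchema`, `ValidationResult`, and `validateArgs` are hypothetical names.

```typescript
// Hypothetical sketch: a tiny validator for flat tool-argument schemas.
type FieldType = "string" | "number" | "boolean";

interface ToolSchema {
  required: string[];
  properties: Record<string, FieldType>;
}

interface ValidationResult {
  ok: boolean;
  errors: string[];
}

// Own-property check that ignores inherited keys such as "toString".
function has(obj: object, key: string): boolean {
  return Object.prototype.hasOwnProperty.call(obj, key);
}

function validateArgs(schema: ToolSchema, args: Record<string, unknown>): ValidationResult {
  const errors: string[] = [];
  for (const name of schema.required) {
    if (!has(args, name)) errors.push(`missing required field "${name}"`);
  }
  for (const [name, value] of Object.entries(args)) {
    if (!has(schema.properties, name)) {
      // Reject unknown fields so injected extras never reach the tool.
      errors.push(`unexpected field "${name}"`);
    } else if (typeof value !== schema.properties[name]) {
      errors.push(`field "${name}" should be ${schema.properties[name]}, got ${typeof value}`);
    }
  }
  return { ok: errors.length === 0, errors };
}

const sendEmailSchema: ToolSchema = {
  required: ["to", "subject"],
  properties: { to: "string", subject: "string", urgent: "boolean" },
};

const good = validateArgs(sendEmailSchema, { to: "a@example.com", subject: "Hi" });
const bad = validateArgs(sendEmailSchema, { to: "a@example.com", bcc: "x@evil.example" });
console.log(good.ok, bad.errors);
```

Returning a result object with an explicit `ok` flag and an `errors` array keeps the failure path usable as an observation for the model, instead of relying on a bare boolean.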
When to Use This Skill
This skill activates when conversations involve:
- Agent architecture: harness, loop, runtime, control plane
- Tool design: permissions, approvals, typed tools, risk classes
- Planning: planning mode, goal loops, checkpoints, budgets
- Context: memory, compaction, retrieval, active state
- Security: injection, guardrails, evals, sandboxing
- Observability: tracing, cost telemetry, launch gates
- Connectors: skills, MCP, external APIs, progressive disclosure
- Production readiness: checklists, incident response, audits
Key References
All detailed references live in references/:
- MVP Blueprint: mvp-agent-blueprint.md (domain-specific harness generator)
- Loop Design: agentic-loop.md (invariants, retries, budgets, stopping)
- Tools: tools-and-permissions.md (typed tools, risk classes, approvals)
- Planning: planning-and-goals.md (planning mode, goal loops)
- Context: context-memory-compaction.md (context, memory, retrieval)
- Caching: prompt-caching-and-cost.md (cache-aware layout, cost telemetry)
- Connectors: skills-and-connectors.md (Agent Skills, MCP, progressive disclosure)
- APIs: provider-api-patterns.md (OpenAI, Anthropic, and compatible APIs)
- Security: security-evals-observability.md (guardrails, tracing, evals)
- Checklists: checklists.md (implementation and audit checklists)
Philosophy Summary
- The harness acts, not the model — the model proposes; harness validates, authorizes, executes, records
- Every tool call gets a result — denial, timeout, malformed, abort are observations too
- Risk changes the loop — reads, drafts, writes, external comms, destructive, privileged need different paths
- Draft and commit are separate — high-risk side effects require approval records outside prompt
- Context is built, not dumped — retrieve just enough, label trust, preserve active state
- Long-running work needs budgets — step, time, token, cost, tool-call budgets are product features
- Skills and connectors are progressively disclosed — expose names first, load details when relevant
- Repeated failures become harness features — validators, tools, docs, evals, policies beat repeating prompt advice
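Two of the principles above, that every tool call gets a result and that risk changes the loop, can be expressed directly in types. The sketch below is one possible shape under stated assumptions: the `Observation` variants, the `Route` names, and the mapping from risk class to route are all hypothetical and chosen only for illustration.

```typescript
// Hypothetical sketch: every proposed call resolves to exactly one observation,
// and the risk class decides which path the call takes through the harness.
type RiskClass = "read" | "draft" | "write" | "external_comms" | "destructive" | "privileged";

type Observation =
  | { kind: "result"; output: string }
  | { kind: "denied"; reason: string }
  | { kind: "timeout"; afterMs: number }
  | { kind: "malformed"; errors: string[] }
  | { kind: "aborted"; by: string };

type Route = "auto_approve" | "require_approval";

// One possible mapping; a real harness would load this from policy config.
function routeFor(risk: RiskClass): Route {
  switch (risk) {
    case "read":
    case "draft":
      return "auto_approve";
    case "write":
    case "external_comms":
    case "destructive":
    case "privileged":
      return "require_approval";
  }
}

// Turn any observation into text the model sees on its next turn.
function toModelMessage(tool: string, obs: Observation): string {
  switch (obs.kind) {
    case "result":
      return `${tool} returned: ${obs.output}`;
    case "denied":
      return `${tool} was denied: ${obs.reason}`;
    case "timeout":
      return `${tool} timed out after ${obs.afterMs} ms`;
    case "malformed":
      return `${tool} call was malformed: ${obs.errors.join("; ")}`;
    case "aborted":
      return `${tool} was aborted by ${obs.by}`;
  }
}

console.log(toModelMessage("send_email", { kind: "denied", reason: "Requires human approval" }));
// prints "send_email was denied: Requires human approval"
```

Because both switches are exhaustive over their union types, adding a new risk class or observation kind produces a compile error until every path handles it.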
License
MIT License — see repository for details.
Learn More
- Repository: github.com/DenisSergeevitch/agents-best-practices
- Agent Skills Spec: agentskills.io/specification
- Official API docs: references/source-links.md