agentic-harness-patterns-zh


Agentic Harness Patterns(中文版)

Agentic Harness Patterns (Chinese Version)

生产级 AI 编程助手不只是"大模型 + 工具调用循环"。循环本身很简单。Harness — 记忆、技能、安全、上下文控制、委派、可扩展性 — 才是让 Agent 可靠、安全、大规模运行的关键。
适合: 正在构建或扩展 AI Agent 运行时、自定义 Agent、或高级多 Agent 工作流的工程师。 不适合: Prompt 工程、模型选择、通用软件架构、LLM API 入门。
所有原则均从生产级运行时决策中提炼而来。Claude Code 作为实证依据,而非唯一实现。
Production-grade AI programming assistants are far more than a "large model + tool-calling loop". The loop itself is simple. The harness — memory, skills, security, context control, delegation, extensibility — is what makes an Agent run reliably, securely, and at scale.
Suitable for: Engineers building or scaling AI Agent runtimes, custom Agents, or advanced multi-Agent workflows. Not suitable for: Prompt engineering, model selection, general software architecture, entry-level LLM API learning.
All principles are extracted from production-level runtime decisions. Claude Code is used as empirical evidence, not the only implementation.

选择你的问题

Choose Your Problem

你想要... | 阅读
让 Agent 跨会话记住并持续改进 | 记忆系统
打包可复用的工作流和专业知识 | 技能系统
让 Agent 强大地使用工具但不危险 | 工具与安全
给 Agent 正确的上下文、合理的成本 | 上下文工程
将工作拆分给多个 Agent 而不失控 | 多 Agent 协调
通过 Hook、后台任务或启动逻辑扩展行为 | 生命周期与可扩展性
开始构建之前: 先读 踩坑指南 — 这些是最不直觉但最烧时间的失败模式。

What you want... | Read
Let the Agent remember and improve continuously across sessions | Memory System
Package reusable workflows and domain expertise | Skill System
Let the Agent use tools powerfully without risk | Tools and Security
Give the Agent the right context at reasonable cost | Context Engineering
Split work among multiple Agents without losing control | Multi-Agent Coordination
Extend behavior via Hooks, background tasks, or startup logic | Lifecycle and Extensibility
Before you start building: read the Pitfall Guide first — these are the most counterintuitive, most time-burning failure modes.

1. 记忆系统

1. Memory System

用户痛点: "我的 Agent 下次对话就忘了所有纠正和项目规则。"
黄金法则: 区分 Agent 知道的(指令记忆)、Agent 学到的(自动记忆)、和 Agent 提取的(会话记忆)。三层的持久性、信任度、审查需求各不相同。
适用场景: 任何跨会话运行或需要持续积累项目知识的 Agent。
工作原理:
  • 指令记忆 是人工策划的、分层的配置,按优先级注入系统上下文(组织级 → 用户级 → 项目级 → 本地级;本地优先)。项目编码规范、行为规则都在这里。它是人类编写的,稳定不变。
  • 自动记忆 是 Agent 自主写入的持久知识,带有类型分类法(用户 / 反馈 / 项目 / 引用)和有上限的索引。写入是两步操作:先写主题文件,再更新索引。上限防止无限增长 — 不清理的话,新条目会被静默截断。
  • 会话提取 以后台 Agent 的形式在会话结束时运行。它直接写入自动记忆 — 先主题文件再索引 — 遵循相同的两步保存不变式。互斥锁确保:如果主 Agent 在当轮已经写过记忆,提取器直接跳过。这是自主学习循环。
  • 审查与晋升 审计所有记忆层并提议跨层移动(自动记忆 → 项目规范、个人设置或团队记忆)。它永远不自主应用更改 — 提议需要用户明确批准。
从这里开始: 定义你的记忆层(指令、自动、提取)。实现两步保存不变式(先主题文件,再索引)。核心写入路径稳定后再加后台提取。
在 Claude Code 中: 使用
/remember
审计和晋升各层自动记忆条目。
权衡:
  • 更多记忆层 = 更丰富的回忆但更高的维护负担。不定期清理的话,索引上限导致静默数据丢失。
  • 会话提取在会话结束时增加延迟,但大幅提升跨会话学习能力。
深入阅读: references/memory-persistence-pattern.md

User pain point: "My Agent forgets all corrections and project rules in the next conversation."
Golden Rule: Distinguish what the Agent knows (instruction memory), what the Agent has learned (automatic memory), and what the Agent extracts (session memory). The three layers differ in persistence, trust, and review requirements.
Applicable scenarios: Any Agent that runs across sessions or needs to continuously accumulate project knowledge.
How it works:
  • Instruction memory is manually curated, hierarchical configuration injected into the system context by priority (organization level → user level → project level → local level; local takes precedence). Project coding specifications and behavior rules are stored here. It is written by humans and stable.
  • Automatic memory is persistent knowledge written autonomously by the Agent, with a type taxonomy (user / feedback / project / reference) and a capped index. Writing is a two-step operation: first write the topic file, then update the index. The cap prevents unlimited growth — if not cleaned up, new entries will be silently truncated.
  • Session extraction runs as a background Agent at the end of the session. It directly writes to automatic memory — first topic file then index — following the same two-step save invariant. A mutex ensures that if the main Agent has already written memory in the current round, the extractor skips directly. This is the autonomous learning loop.
  • Review and promotion audits all memory layers and proposes cross-layer moves (automatic memory → project specifications, personal settings or team memory). It never applies changes autonomously — proposals require explicit user approval.
Start here: Define your memory layers (instruction, automatic, extraction). Implement the two-step save invariant (topic file first, then index). Add background extraction after the core write path is stable.
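The two-step save invariant above (topic file first, then capped index) can be sketched as follows. This is an illustrative layout, not Claude Code's actual on-disk format; names like `save_memory` and `INDEX_CAP` are assumptions.

```python
import json
from pathlib import Path

INDEX_CAP = 5  # hard cap on index entries; overflow is dropped, not errored

def save_memory(root: Path, topic: str, body: str, kind: str = "project") -> bool:
    """Two-step save: topic file first, then index. Returns False when the
    index is full (the entry exists on disk but is invisible to recall)."""
    root.mkdir(parents=True, exist_ok=True)
    # Step 1: persist the topic file. If this fails, the index is untouched,
    # so the index never points at a missing file.
    (root / f"{topic}.md").write_text(body, encoding="utf-8")
    # Step 2: update the index, respecting the hard cap.
    index_path = root / "index.json"
    index = json.loads(index_path.read_text()) if index_path.exists() else []
    if any(e["topic"] == topic for e in index):
        return True  # already indexed; topic file was refreshed in step 1
    if len(index) >= INDEX_CAP:
        return False  # silent-truncation point; caller should prune
    index.append({"topic": topic, "kind": kind})
    index_path.write_text(json.dumps(index))
    return True
```

Ordering the steps this way keeps the invariant: the index may lag the files, but it can never reference a file that does not exist.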
In Claude Code: Use
/remember
to audit and promote automatic memory entries across layers.
Trade-offs:
  • More memory layers = richer recall but higher maintenance burden. If not cleaned up regularly, the index cap will cause silent data loss.
  • Session extraction adds latency at the end of the session, but greatly improves cross-session learning ability.
Further reading: references/memory-persistence-pattern.md

2. Skills 系统

2. Skills System

用户痛点: "我想让 Agent 复用工作流和领域知识,不用每次重新解释。"
黄金法则: 技能是懒加载的指令集,不是立即注入的 prompt。发现必须廉价(仅元数据);完整内容只在激活时加载。
适用场景: 任何需要可复用、可组合工作流并根据用户意图匹配激活的 Agent。
工作原理:
  • 发现 是预算约束的:Agent 看到所有可用技能的紧凑列表(名称、描述、触发提示拼接在一起),每条硬限在固定字符数,总量限制在上下文窗口的约 1%。把触发关键词放在前面 — 后面会被截断。
  • 加载 是懒的:只有元数据进入始终在线的上下文。完整技能内容只在激活时加载,闲置 token 成本接近零。
  • 执行 可以是内联的(共享上下文)或隔离的(fork 子 Agent,有自己的 token 预算)。隔离防止重型技能耗尽父级上下文。
  • 来源 可以是内置的、用户安装的、或从插件动态加载的。通过规范路径去重,防止同一技能在重叠的源目录中出现两次。
从这里开始: 选一种元数据格式(推荐 frontmatter)。实现两阶段发现:启动时廉价列表,调用时懒加载内容。在目录增长前设好每条字符上限。
权衡:
  • 懒加载省 token 但首次激活多一轮延迟。
  • Fork 执行提供隔离但失去父级已积累的上下文。
深入阅读: references/skill-runtime-pattern.md

User pain point: "I want the Agent to reuse workflows and domain knowledge without re-explaining every time."
Golden Rule: Skills are lazy-loaded instruction sets, not immediately injected prompts. Discovery must be cheap (metadata only); full content is only loaded when activated.
Applicable scenarios: Any Agent that requires reusable, composable workflows matched and activated based on user intent.
How it works:
  • Discovery is budget-constrained: the Agent sees a compact list of all available skills (name, description, trigger prompts concatenated together), each with a hard character limit, and the total amount is limited to about 1% of the context window. Put trigger keywords first — the rest will be truncated.
  • Loading is lazy: only metadata enters the always-on context. Full skill content loads only on activation, so idle token cost is near zero.
  • Execution can be inline (shared context) or isolated (fork child Agent with its own token budget). Isolation prevents heavy skills from exhausting the parent context.
  • Sources can be built-in, user-installed, or dynamically loaded from plugins. Deduplication is done via canonical paths to prevent the same skill from appearing twice in overlapping source directories.
Start here: Choose a metadata format (frontmatter is recommended). Implement two-stage discovery: cheap list at startup, lazy load content on call. Set per-entry character limits before the directory grows.
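The two-stage discovery just described can be sketched as below. The per-entry cap, the ~1% budget arithmetic, and the field names are illustrative assumptions, not Claude Code's actual limits.

```python
PER_ENTRY_CHARS = 60               # hard per-entry cap (assumed value)
CONTEXT_WINDOW_TOKENS = 200_000
LIST_BUDGET_CHARS = CONTEXT_WINDOW_TOKENS * 4 // 100  # ~1% of window at ~4 chars/token

def discovery_list(skills: list[dict]) -> str:
    """Stage 1: compact, budget-constrained list. Trigger keywords should
    lead each description, because everything past PER_ENTRY_CHARS is cut."""
    lines, used = [], 0
    for s in skills:
        entry = f"{s['name']}: {s['description']}"[:PER_ENTRY_CHARS]
        if used + len(entry) > LIST_BUDGET_CHARS:
            break  # total budget exhausted; later skills become invisible
        lines.append(entry)
        used += len(entry)
    return "\n".join(lines)

def activate(skill: dict) -> str:
    """Stage 2: lazy load. Full content is read only now, so skills that
    are never activated cost almost nothing."""
    return skill["load_content"]()  # e.g. reads the skill file from disk
```

Note how the truncation in stage 1 is what makes "put trigger keywords first" a hard requirement rather than a style preference.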
Trade-offs:
  • Lazy loading saves tokens but adds an extra round of latency on first activation.
  • Fork execution provides isolation but loses the context already accumulated by the parent.
Further reading: references/skill-runtime-pattern.md

3. 工具与安全

3. Tools and Security

用户痛点: "我想让 Agent 强大地使用工具,但不要危险。"
黄金法则: 默认关闭(fail-closed)。工具是串行的、有门控的,除非显式标记为并发安全且通过了权限管道。
适用场景: 任何需要工具注册、并发控制或权限门控的 Agent 运行时。
工作原理:
  • 注册 使用 fail-closed 默认值:工具默认不可并发、非只读,除非开发者主动标记。这防止状态变更操作的意外并行执行。
  • 并发分类 是按调用的,不是按工具类型:同一工具对某些输入安全、对另一些不安全。运行时将一批工具调用分成连续组 — 安全调用并行执行,任何不安全调用开始一个串行段。
  • 权限管道 从多个来源按严格优先级顺序评估规则,涵盖配置文件(用户、项目、本地、标志、策略)、CLI 参数、命令级规则和会话授权。评估器是有状态的 — 它追踪拒绝次数、转换模式、更新状态作为副作用。
  • 处理器分发 因执行环境而异:交互式(人类提示)、自动化(协调器)或异步(Swarm Agent)。相同的权限规则供给不同的审批界面。
从这里开始: 让每个工具调用都通过一个权限关卡。默认 fail-closed(拒绝/询问)。在上线任何自动批准模式之前,先加上受保护路径的免豁免规则。
在 Claude Code 中: 使用
/update-config
配置权限规则和 Hook。
权衡:
  • Fail-closed 默认值意味着新工具开箱即安全,但开发者必须主动标记并发安全 — 忘记标记只读工具会静默降低吞吐量。
  • 多来源权限分层功能强大但规则冲突时难以调试。
深入阅读: references/tool-registry-pattern.md | references/permission-gate-pattern.md

User pain point: "I want the Agent to use tools powerfully, but not dangerously."
Golden Rule: Fail-closed by default. Tools are serial and gated unless explicitly marked concurrency-safe and cleared by the permission pipeline.
Applicable scenarios: Any Agent runtime that requires tool registration, concurrency control, or permission gating.
How it works:
  • Registration uses fail-closed defaults: tools are not concurrent and not read-only by default, unless the developer actively marks them. This prevents accidental parallel execution of state-changing operations.
  • Concurrency classification is per call, not per tool type: the same tool is safe for some inputs and unsafe for others. The runtime splits a batch of tool calls into consecutive groups — safe calls are executed in parallel, and any unsafe call starts a serial segment.
  • Permission pipeline evaluates rules from multiple sources in strict priority order, covering configuration files (user, project, local, flags, policies), CLI parameters, command-level rules and session authorization. The evaluator is stateful — it tracks rejection counts, transitions modes, and updates state as a side effect.
  • Handler dispatch varies by execution environment: interactive (human prompt), automated (orchestrator), or asynchronous (Swarm Agent). The same permission rules feed different approval surfaces.
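The per-call grouping described in the concurrency bullet can be sketched as a pure function: adjacent safe calls merge into one parallel group, and each unsafe call forms its own serial segment. The predicate and call shapes are illustrative.

```python
def group_calls(calls, is_safe):
    """Split a batch of tool calls into consecutive execution groups.
    is_safe is evaluated per call (tool name + inputs), not per tool type."""
    groups, run = [], []
    for call in calls:
        if is_safe(call):
            run.append(call)          # extend the current parallel run
        else:
            if run:
                groups.append(run)    # close the parallel run in order
                run = []
            groups.append([call])     # unsafe call runs alone, serially
    if run:
        groups.append(run)
    return groups
```

For example, two reads followed by a mutating shell command and another read yield three groups: the reads in parallel, the mutation alone, then the final read.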
Start here: Route every tool call through a permission gate. Default to fail-closed (deny/ask). Add no-bypass rules for protected paths before shipping any auto-approval mode.
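A minimal sketch of that gate, assuming glob-style rules and an illustrative protected-path list (this is not Claude Code's actual rule format):

```python
from fnmatch import fnmatch

# Deny rules here cannot be bypassed by auto-approval (hypothetical paths).
PROTECTED = ["/etc/*", "~/.ssh/*"]

def evaluate(path: str, rules: list[tuple[str, str]], auto_approve: bool = False) -> str:
    """rules: (glob, decision) pairs already sorted by source priority
    (e.g. policy > local > project > user). Returns 'allow' | 'deny' | 'ask'."""
    for pattern in PROTECTED:
        if fnmatch(path, pattern):
            return "deny"             # protected paths: deny regardless of mode
    for pattern, decision in rules:
        if fnmatch(path, pattern):
            return decision           # first (highest-priority) match wins
    # Fail-closed default: with no matching rule, ask a human unless an
    # auto-approval mode has been deliberately enabled.
    return "allow" if auto_approve else "ask"
```

The ordering matters: the protected-path check runs before any rule or mode can grant access, which is exactly the "no-bypass rules before auto-approval" advice above.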
In Claude Code: Use
/update-config
to configure permission rules and Hooks.
Trade-offs:
  • Fail-closed default means new tools are secure out of the box, but developers must actively mark concurrency safety — forgetting to mark read-only tools will silently reduce throughput.
  • Multi-source permission layering is powerful but difficult to debug when rules conflict.
Further reading: references/tool-registry-pattern.md | references/permission-gate-pattern.md

4. 上下文工程

4. Context Engineering

用户痛点: "我的 Agent 看到太多、太少、或者看错了。"
黄金法则: 把上下文当预算管,不是垃圾桶。窗口里的每个 token 都必须通过四种操作之一赢得它的位置:选择、写回、压缩、隔离。
适用场景: 任何在长会话中性能下降、委派工作污染父上下文、或因急切加载导致启动慢的 Agent。
工作原理:
  • 选择 — 按需加载,不要一次全加载。使用三级渐进披露:元数据(始终存在,廉价)、指令(激活时加载)、资源(按需加载)。记忆化昂贵的上下文构建器,只在已知变更点失效 — 不要响应式。
  • 写回 — 上下文不是只读的。Agent 将信息写回持久存储:自动记忆条目、后台提取输出、任务状态、权限规则。写回循环是把无状态工具调用者变成学习系统的关键。
  • 压缩 — 长会话耗尽窗口。响应式压实在会话中间总结旧轮次,保留近期上下文同时回收预算。将快照数据标记为快照,让模型知道要重新获取当前状态。
  • 隔离 — 委派工作不能污染父级上下文。协调器 worker 零上下文继承(只有显式 prompt)。Fork 子级继承全部上下文但单层限制(不能递归 fork)。文件系统级隔离(worktree)给 Agent 自己的工作副本。
从这里开始: 审计你当前每轮的上下文成本。对每个变长块加硬上限。在启用任何压缩之前,先加截断恢复指针(告诉模型调用哪个工具获取完整输出)。
权衡:
  • 激进缓存降低延迟但有过时风险 — 每个变更点必须显式清除缓存,否则模型在剩余会话中用过时数据。
  • 渐进披露省 token 但意味着模型在技能激活前无法推理其完整能力。
深入阅读: references/context-engineering-pattern.md(索引) | select | compress | isolate

User pain point: "My Agent sees too much, too little, or the wrong thing."
Golden Rule: Manage context as a budget, not a trash can. Every token in the window must earn its place through one of four operations: select, write back, compress, isolate.
Applicable scenarios: Any Agent whose performance degrades in long sessions, whose delegated work pollutes the parent context, or whose startup is slow due to eager loading.
How it works:
  • Select — load on demand, don't load all at once. Use three-level progressive disclosure: metadata (always present, cheap), instructions (loaded on activation), resources (loaded on demand). Memoize expensive context builders, only invalidate at known change points — don't make it reactive.
  • Write back — context is not read-only. The Agent writes information back to persistent storage: automatic memory entries, background extraction output, task status, permission rules. The write-back loop is the key to turning a stateless tool caller into a learning system.
  • Compress — long sessions exhaust the window. Reactive compaction summarizes old rounds in the middle of the session, retaining recent context while reclaiming budget. Mark snapshot data as snapshots so the model knows to re-fetch the current state.
  • Isolate — delegated work must not pollute the parent context. Coordinator workers have zero context inheritance (only explicit prompts). Fork children inherit the full context but have a single-layer limit (cannot recursively fork). File system level isolation (worktree) gives the Agent its own working copy.
Start here: Audit your current per-round context cost. Add hard caps for each variable-length block. Add truncation recovery pointers (tell the model which tool to call to get full output) before enabling any compression.
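The "memoize, invalidate only at known change points" rule from the Select step can be sketched as follows; the class name and the change points are illustrative.

```python
class MemoizedContextBlock:
    """An expensive context builder that runs once and reuses its result
    every turn, until an explicit invalidate() at a known change point."""

    def __init__(self, build):
        self._build = build       # expensive function, e.g. scans the repo
        self._cache = None
        self.build_count = 0      # observability hook for this sketch

    def get(self):
        if self._cache is None:
            self._cache = self._build()
            self.build_count += 1
        return self._cache

    def invalidate(self):
        """Call at every known change point (file edit, config write).
        Forgetting one means the model sees stale data for the rest of
        the session, which is pitfall 5 below."""
        self._cache = None
```

The design choice to invalidate manually rather than reactively trades correctness risk for predictability: you know exactly when the builder runs, and the cost of a missed change point is visible as stale context rather than as surprise latency.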
Trade-offs:
  • Aggressive caching reduces latency but risks staleness — every change point must explicitly clear the cache, or the model uses stale data for the rest of the session.
  • Progressive disclosure saves tokens but means the model cannot reason about the full capabilities of a skill before activation.
Further reading: references/context-engineering-pattern.md (index) | select | compress | isolate

5. 多 Agent 协调

5. Multi-Agent Coordination

用户痛点: "我要并行、专业化和协调,但不要混乱。"
黄金法则: 协调者必须自己综合,不能委派理解。"基于你的发现,修复它" 是反模式 — 协调者应该消化 worker 结果成精确规格,然后再派发实现。
适用场景: 当任务对单个 Agent 太大时、需要并行探索时、或想要持久化的专业队友时。
工作原理:
三种委派模式服务不同的任务形态:
模式 | 上下文共享 | 适合
Coordinator | 无 — worker 从零开始 | 复杂多阶段任务(研究 → 综合 → 实现 → 验证)
Fork | 全部 — 子级继承父级历史 | 共享已加载上下文的快速并行拆分
Swarm | 对等共享任务列表 | 长期运行的独立工作流
关键约束:
  • Fork 只有单层 — 递归 fork 会指数级放大上下文成本。
  • Swarm 队友不能生成其他队友 — 名单是扁平的,防止不受控增长。
  • 结果异步到达;即发即忘注册立即返回 ID,父级可以继续工作。
从这里开始: 选一种委派模式并完整实现,再考虑混合模式。每个子 Agent prompt 写成自包含文档。在研究和实现 worker 之间加综合步骤 — 这是协调者创造价值的地方。
Coordinator 模式实现清单:
  1. 定义阶段化工作流:研究 → 综合 → 实现 → 验证
  2. 为每个 worker 写自包含 prompt(不要"基于你的发现")
  3. 过滤每个 worker 的工具集,只给它需要的
  4. 决定继续 vs 新建策略:上下文重叠就继续,验证阶段必须新建
权衡:
  • Coordinator 最安全但最慢 — 每阶段等前一阶段完成。
  • Fork 最快但只有一层,共享父级全部上下文成本。
  • Swarm 最灵活但最难协调 — 对等方只能通过共享任务列表通信。
深入阅读: references/agent-orchestration-pattern.md

User pain point: "I want parallelism, specialization and coordination, but no chaos."
Golden Rule: The coordinator must synthesize results itself; it cannot delegate understanding. "Based on your findings, fix it" is an anti-pattern — the coordinator should digest worker results into a precise specification before dispatching implementation.
Applicable scenarios: When the task is too large for a single Agent, when parallel exploration is needed, or when persistent specialized teammates are desired.
How it works:
Three delegation modes serve different task forms:
Mode | Context Sharing | Suitable For
Coordinator | None — worker starts from scratch | Complex multi-stage tasks (research → synthesis → implementation → verification)
Fork | Full — child inherits parent history | Fast parallel splitting over already-loaded shared context
Swarm | Peer-shared task list | Long-running independent workflows
Key constraints:
  • Fork only has one layer — recursive fork will exponentially amplify context cost.
  • Swarm teammates cannot generate other teammates — the list is flat to prevent uncontrolled growth.
  • Results arrive asynchronously; fire-and-forget registration returns an ID immediately, and the parent can continue working.
Start here: Choose one delegation mode and implement it completely before considering hybrid modes. Write each child Agent prompt as a self-contained document. Add a synthesis step between research and implementation workers — this is where the coordinator creates value.
Coordinator mode implementation checklist:
  1. Define phased workflow: research → synthesis → implementation → verification
  2. Write self-contained prompts for each worker (don't use "based on your findings")
  3. Filter each worker's toolset to only provide required tools
  4. Decide a continue-vs-fresh strategy per phase: continue the worker when context overlaps; the verification phase must always get a fresh one
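The checklist above can be sketched end to end, with plain functions standing in for real sub-Agent calls. Everything here is illustrative scaffolding, not an actual orchestration API.

```python
def coordinate(task: str, research_workers, implement, verify):
    """Phased coordinator: research -> synthesis -> implementation -> verification."""
    # Phase 1: research workers each return findings as text.
    findings = [worker(task) for worker in research_workers]
    # Phase 2: synthesis. The coordinator itself digests findings into a
    # precise spec. This is the step that must not be delegated; handing a
    # worker "based on your findings, fix it" is the anti-pattern.
    spec = "\n".join(f"- {f}" for f in findings)
    prompt = (
        f"Task: {task}\n"
        f"Findings (complete and self-contained, no shared history):\n{spec}"
    )
    # Phase 3: the implementation worker gets only this explicit prompt,
    # with zero inherited context.
    result = implement(prompt)
    # Phase 4: verification runs in a fresh worker, never the implementer.
    return verify(result)
```

Because the prompt carries the full digested spec, the implementation worker needs no access to the research workers' transcripts, which is what keeps the parent context unpolluted.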
Trade-offs:
  • Coordinator is the safest but slowest — each phase waits for the previous phase to complete.
  • Fork is the fastest but single-layer only, and the child carries the parent's full context cost.
  • Swarm is the most flexible but hardest to coordinate — peers can only communicate through shared task lists.
Further reading: references/agent-orchestration-pattern.md

6. 生命周期与可扩展性

6. Lifecycle and Scalability

用户痛点: "我需要 Hook、后台任务和干净的启动序列。"
黄金法则: 可扩展性是注入点,不是继承层次。Hook 在生命周期时刻附加副作用;任务用严格状态机追踪异步工作;Bootstrap 以记忆化阶段按依赖顺序初始化。
适用场景: 需要不修改核心代码就扩展 Agent 行为、追踪长期运行的后台工作、或为多种入口模式结构化初始化时。
工作原理:
  • Hook 在定义的生命周期时刻附加副作用(工具执行前/后、prompt 提交、Agent 启停)。信任是全有或全无:工作区不受信任时所有 Hook 跳过 — 不只是可疑的那些。会话级 Hook 是临时的,会话结束时清理。
  • 长期运行工作 通过带类型的状态机追踪。每个工作单元有带类型前缀的 ID、严格生命周期(运行中 → 完成 / 失败 / 被杀)、和磁盘后备输出。回收是两阶段的:终态时立即清磁盘,父级收到通知后才懒清内存。
  • Bootstrap 将初始化结构化为按依赖排序的、记忆化的阶段。信任边界 — 用户授权同意的时刻 — 是关键拐点:安全敏感子系统(遥测、秘密环境变量)不能在信任建立前激活。多种入口模式(CLI、服务器、SDK)共享同一 Bootstrap 路径,不同入口点。
从这里开始: 让所有 Hook 通过单一分发点。在加任何外部 Hook 类型之前先实现信任门控。在 init 时注册清理处理器,不在使用点。
在 Claude Code 中: 使用
/update-config
配置 Hook(工具执行前/后、prompt 提交)。
权衡:
  • 全有或全无的 Hook 信任简单但粗糙 — 一个不受信任的 Hook 禁用整个扩展系统。
  • 磁盘后备任务输出保持内存恒定但增加与并发工作单元成正比的 I/O 延迟。
深入阅读: references/hook-lifecycle-pattern.md | references/task-decomposition-pattern.md | references/bootstrap-sequence-pattern.md

User pain point: "I need hooks, background tasks, and a clean startup sequence."
Golden Rule: Extensibility means injection points, not inheritance hierarchies. Hooks attach side effects at lifecycle moments; tasks track asynchronous work with a strict state machine; Bootstrap initializes in dependency-ordered, memoized phases.
Applicable scenarios: When you need to extend Agent behavior without modifying core code, track long-running background work, or structure initialization for multiple entry modes.
How it works:
  • Hooks attach side effects at defined lifecycle moments (before/after tool execution, prompt submission, Agent start/stop). Trust is all-or-nothing: when the workspace is untrusted, all Hooks are skipped — not just the suspicious ones. Session-level Hooks are temporary and cleaned up when the session ends.
  • Long-running work is tracked through a typed state machine. Each work unit has a type-prefixed ID, a strict lifecycle (running → completed / failed / killed), and disk-backed output. Reclamation is two-stage: disk output is cleared immediately on reaching a terminal state, while in-memory state is lazily cleared only after the parent has been notified.
  • Bootstrap structures initialization into dependency-ordered, memoized phases. The trust boundary — the moment the user grants consent — is the key inflection point: security-sensitive subsystems (telemetry, secret environment variables) must not activate before trust is established. Multiple entry modes (CLI, server, SDK) share the same Bootstrap path through different entry points.
Start here: Route all Hooks through a single dispatch point. Implement trust gating before adding any external Hook types. Register cleanup handlers at init time, not at the point of use.
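A minimal sketch of that single dispatch point with all-or-nothing trust; event names and the class shape are illustrative assumptions.

```python
class HookBus:
    """Every lifecycle event funnels through dispatch(); an untrusted
    workspace skips every hook, not just suspicious ones."""

    def __init__(self, workspace_trusted: bool):
        self.trusted = workspace_trusted
        self._hooks = {}  # event name -> list of callbacks

    def register(self, event: str, fn):
        self._hooks.setdefault(event, []).append(fn)

    def dispatch(self, event: str, payload) -> list:
        if not self.trusted:
            return []  # all-or-nothing: untrusted workspace disables everything
        return [fn(payload) for fn in self._hooks.get(event, [])]
```

Funneling through one method is what makes the trust gate enforceable: there is exactly one place where the check can be forgotten, instead of one per call site.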
In Claude Code: Use
/update-config
to configure Hooks (before/after tool execution, prompt submission).
Trade-offs:
  • All-or-nothing Hook trust is simple but crude — one untrusted Hook disables the entire extension system.
  • Disk-backed task output keeps memory constant but increases I/O latency proportional to concurrent work units.
Further reading: references/hook-lifecycle-pattern.md | references/task-decomposition-pattern.md | references/bootstrap-sequence-pattern.md

踩坑指南

Pitfall Guide

违反这些原则会导致 bug 的非直觉设计:
  1. 并发分类是按调用的,不是按工具类型。 同一工具对某些输入安全、对另一些不安全。不要假设工具的并发行为是静态的 — 运行时按调用决定。
  2. 权限评估有副作用。 权限检查器追踪拒绝次数、转换模式、更新状态。不要把它当纯查询函数。
  3. 大多数异步工作跳过 "pending" 状态。 实践中,工作单元直接注册为 "running"。不要构建假设每个工作单元从 pending 开始的 UI。
  4. Fork 子级不能 fork。 递归保护维护单层不变式。Fork 工具留在子级工具池中(为了 prompt 缓存共享)但在调用时被阻止。
  5. 上下文构建器是记忆化的但手动失效。 添加上下文源而不添加对应的失效点,模型在整个会话中看到过时数据。
  6. 记忆索引有硬上限。 超过上限的条目被静默截断。不定期清理的话,新条目变得不可见。
  7. 技能列表预算很紧。 描述被拼接并按条目限制字符。把最具区分度的触发语言放在前面 — 后面会被截掉。
  8. Hook 信任是全有或全无。 工作区不受信任时,整个 Hook 系统被禁用,不只是个别可疑的 Hook。
  9. 工具的默认权限是 "allow"。 没有实现自定义权限逻辑的工具完全委托给基于规则的系统。只在需要工具特定门控(路径 ACL、配额等)时才覆盖。
  10. 回收需要通知。 终态工作单元只有在父级收到完成信号后才可被 GC。在通知前回收会产生竞态,父级永远读不到结果。

Counterintuitive design decisions that cause bugs when violated:
  1. Concurrency classification is per call, not per tool type. The same tool is safe for some inputs and unsafe for others. Do not assume that the concurrency behavior of a tool is static — the runtime decides per call.
  2. Permission evaluation has side effects. The permission checker tracks rejection counts, transitions modes, and updates state. Do not treat it as a pure query function.
  3. Most asynchronous work skips the "pending" state. In practice, work units are directly registered as "running". Do not build UIs that assume every work unit starts from pending.
  4. Fork children cannot fork. Recursion protection maintains the single-layer invariant. The Fork tool remains in the child tool pool (for prompt cache sharing) but is blocked when called.
  5. Context builders are memoized but manually invalidated. If you add a context source without adding a corresponding invalidation point, the model will see stale data for the entire session.
  6. Memory indexes have hard caps. Entries exceeding the cap are silently truncated. If not cleaned up regularly, new entries become invisible.
  7. Skill list budget is very tight. Descriptions are concatenated and limited by characters per entry. Put the most distinctive trigger language first — the rest will be truncated.
  8. Hook trust is all-or-nothing. When the workspace is untrusted, the entire Hook system is disabled, not just individual suspicious Hooks.
  9. The default permission for tools is "allow". Tools that do not implement custom permission logic are fully delegated to the rule-based system. Only override when tool-specific gating (path ACL, quota, etc.) is required.
  10. Reclamation requires notification. Final state work units can only be GCed after the parent receives the completion signal. Reclamation before notification creates a race condition, and the parent will never read the result.
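Pitfalls 3 and 10 can be made concrete in one sketch: units register directly as running (no pending state), and reap() refuses to reclaim a terminal unit until the parent has read its result. The registry shape and state names are illustrative.

```python
class WorkRegistry:
    def __init__(self):
        self._units = {}  # id -> {"state": ..., "result": ..., "notified": bool}

    def register(self, uid: str):
        # Pitfall 3: units start as "running", skipping "pending" entirely.
        self._units[uid] = {"state": "running", "result": None, "notified": False}

    def finish(self, uid: str, result):
        self._units[uid].update(state="completed", result=result)

    def read_result(self, uid: str):
        unit = self._units[uid]
        unit["notified"] = True  # the parent has now observed the outcome
        return unit["result"]

    def reap(self, uid: str) -> bool:
        """Pitfall 10: reclaiming before notification would drop the result."""
        unit = self._units[uid]
        if unit["state"] == "running" or not unit["notified"]:
            return False  # refuse; the parent has not seen the result yet
        del self._units[uid]
        return True
```

The refusal in reap() is the whole point: making "notified" a precondition turns the race described in pitfall 10 into an explicit, testable invariant.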

本技能不适用于

This skill is not applicable to

本技能是关于 Agent 周围的 harness 的,不是:
  • Prompt 工程或系统 prompt 设计
  • 模型选择或微调
  • 通用软件架构(MVC、微服务)
  • 聊天 UI 或对话界面
  • LLM API 集成基础
如果你的问题是关于模型本身而非模型周围的系统,本技能不适用。
This skill is about the harness around Agents, not:
  • Prompt engineering or system prompt design
  • Model selection or fine-tuning
  • General software architecture (MVC, microservices)
  • Chat UI or conversational interfaces
  • LLM API integration basics
If your question is about the model itself rather than the system around the model, this skill is not applicable.