Agentic Harness Patterns
Production-grade AI programming assistants are far more than a "large model + tool-calling loop"; the loop itself is trivial. The harness (memory, skills, security, context control, delegation, and extensibility) is what makes Agents run reliably, securely, and at scale.
Suitable for: Engineers building or scaling AI Agent runtimes, custom Agents, or advanced multi-Agent workflows.
Not suitable for: Prompt engineering, model selection, general software architecture, entry-level LLM API learning.
All principles are extracted from production-grade runtime decisions. Claude Code is cited as empirical evidence, not as the only possible implementation.
Choose Your Problem
Before you start building: read the Pitfall Guide first. It covers the most counterintuitive and time-consuming failure modes.
1. Memory System
User pain point: "My Agent forgets all corrections and project rules in the next conversation."
Golden Rule: Distinguish between what the Agent knows (instruction memory), what the Agent has learned (automatic memory), and what the Agent retrieves (session memory). The three layers have different requirements for persistence, trust, and audit.
Applicable scenarios: Any Agent that runs across sessions or needs to continuously accumulate project knowledge.
How it works:
- Instruction memory is manually curated, hierarchical configuration injected into the system context by priority (organization level → user level → project level → local level; local takes precedence). Project coding specifications and behavior rules are stored here. It is written by humans and stable.
- Automatic memory is persistent knowledge written autonomously by the Agent, with a type taxonomy (user / feedback / project / reference) and a capped index. Writing is a two-step operation: first write the topic file, then update the index. The cap prevents unlimited growth — if not cleaned up, new entries will be silently truncated.
- Session extraction runs as a background Agent at the end of the session. It directly writes to automatic memory — first topic file then index — following the same two-step save invariant. A mutex ensures that if the main Agent has already written memory in the current round, the extractor skips directly. This is the autonomous learning loop.
- Review and promotion audits all memory layers and proposes cross-layer moves (automatic memory → project specifications, personal settings or team memory). It never applies changes autonomously — proposals require explicit user approval.
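The two-step save invariant described above can be sketched as follows. This is a minimal sketch: the file layout, the `MAX_INDEX_ENTRIES` cap, and the function name are illustrative, not the actual implementation.

```python
import json
from pathlib import Path

MAX_INDEX_ENTRIES = 100  # illustrative cap; real caps are an implementation choice

def save_memory(root: Path, topic: str, body: str, kind: str = "project") -> bool:
    """Two-step save invariant: write the topic file first, then update the index.
    Returns False when the capped index silently drops the new entry."""
    topics_dir = root / "topics"
    topics_dir.mkdir(parents=True, exist_ok=True)
    # Step 1: the topic file is the durable source of truth.
    (topics_dir / f"{topic}.md").write_text(body, encoding="utf-8")
    # Step 2: update the capped index; the file exists on disk either way.
    index_path = root / "index.json"
    index = json.loads(index_path.read_text(encoding="utf-8")) if index_path.exists() else []
    if any(entry["topic"] == topic for entry in index):
        return True  # already indexed; the topic file was refreshed above
    if len(index) >= MAX_INDEX_ENTRIES:
        return False  # cap reached: the entry is on disk but invisible to recall
    index.append({"topic": topic, "kind": kind})
    index_path.write_text(json.dumps(index, indent=2), encoding="utf-8")
    return True
```

Ordering matters: if the process dies between the two steps, you have an orphaned topic file (harmless), never a dangling index entry pointing at nothing.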
Start here: Define your memory layers (instruction, automatic, extraction). Implement the two-step save invariant (topic file first, then index). Add background extraction after the core write path is stable.
In Claude Code: use … to audit and promote automatic memory entries across layers.
Trade-offs:
- More memory layers = richer recall but higher maintenance burden. If not cleaned up regularly, the index cap will cause silent data loss.
- Session extraction adds latency at the end of the session, but greatly improves cross-session learning ability.
Further reading: references/memory-persistence-pattern.md
2. Skills System
User pain point: "I want the Agent to reuse workflows and domain knowledge without re-explaining every time."
Golden Rule: Skills are lazy-loaded instruction sets, not immediately injected prompts. Discovery must be cheap (metadata only); full content is only loaded when activated.
Applicable scenarios: Any Agent that requires reusable, composable workflows matched and activated based on user intent.
How it works:
- Discovery is budget-constrained: the Agent sees a compact list of all available skills (name, description, trigger prompts concatenated together), each with a hard character limit, and the total amount is limited to about 1% of the context window. Put trigger keywords first — the rest will be truncated.
- Loading is lazy: only metadata enters the always-online context. The full skill content is only loaded when activated, and idle token cost is close to zero.
- Execution can be inline (shared context) or isolated (fork child Agent with its own token budget). Isolation prevents heavy skills from exhausting the parent context.
- Sources can be built-in, user-installed, or dynamically loaded from plugins. Deduplication is done via canonical paths to prevent the same skill from appearing twice in overlapping source directories.
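A minimal sketch of the two-stage discovery above, assuming a frontmatter-style metadata record. The budget constants, field names, and function names are illustrative.

```python
from dataclasses import dataclass
from pathlib import Path

ENTRY_CHAR_LIMIT = 120    # illustrative per-skill cap
TOTAL_CHAR_BUDGET = 2000  # illustrative: roughly 1% of the context window

@dataclass
class SkillMeta:
    name: str
    description: str  # put trigger keywords first: the tail is what gets cut
    path: Path        # full content lives here; never read at startup

def discovery_listing(skills: list[SkillMeta]) -> str:
    """Stage 1: cheap, budget-capped metadata listing injected at startup."""
    lines, used = [], 0
    for skill in skills:
        entry = f"{skill.name}: {skill.description}"[:ENTRY_CHAR_LIMIT]
        if used + len(entry) > TOTAL_CHAR_BUDGET:
            break  # budget exhausted: remaining skills are invisible this session
        lines.append(entry)
        used += len(entry)
    return "\n".join(lines)

def load_skill(skill: SkillMeta) -> str:
    """Stage 2: full content is read only when the skill is activated."""
    return skill.path.read_text(encoding="utf-8")
```

Idle cost is only the listing; a skill that is never activated never pays for its full content.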
Start here: Choose a metadata format (frontmatter is recommended). Implement two-stage discovery: cheap list at startup, lazy load content on call. Set per-entry character limits before the directory grows.
Trade-offs:
- Lazy loading saves tokens but adds an extra round of latency on first activation.
- Fork execution provides isolation but loses the context already accumulated by the parent.
Further reading: references/skill-runtime-pattern.md
3. Tools and Security
User pain point: "I want the Agent to use tools powerfully, but not dangerously."
Golden Rule: Fail-closed by default. Tools are serial and gated, unless explicitly marked as concurrency-safe and passed through the permission pipeline.
Applicable scenarios: Any Agent runtime that requires tool registration, concurrency control, or permission gating.
How it works:
- Registration uses fail-closed defaults: tools are not concurrent and not read-only by default, unless the developer actively marks them. This prevents accidental parallel execution of state-changing operations.
- Concurrency classification is per call, not per tool type: the same tool is safe for some inputs and unsafe for others. The runtime splits a batch of tool calls into consecutive groups — safe calls are executed in parallel, and any unsafe call starts a serial segment.
- Permission pipeline evaluates rules from multiple sources in strict priority order, covering configuration files (user, project, local, flags, policies), CLI parameters, command-level rules and session authorization. The evaluator is stateful — it tracks rejection counts, transitions modes, and updates state as a side effect.
- Handler distribution varies by execution environment: interactive (human prompt), automated (orchestrator) or asynchronous (Swarm Agent). The same permission rules feed different approval interfaces.
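The per-call grouping rule above can be sketched like this. The names and the safety predicate are illustrative; a real runtime derives safety from the tool's declared flags plus the concrete input.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class ToolCall:
    name: str
    args: dict

def group_calls(calls: list[ToolCall],
                safe: Callable[[ToolCall], bool]) -> list[tuple[str, list[ToolCall]]]:
    """Split a batch into consecutive segments: runs of concurrency-safe calls
    form one parallel group; each unsafe call becomes its own serial group.
    Safety is judged per call, not per tool type."""
    groups, run = [], []
    for call in calls:
        if safe(call):
            run.append(call)
        else:
            if run:
                groups.append(("parallel", run))
                run = []
            groups.append(("serial", [call]))
    if run:
        groups.append(("parallel", run))
    return groups
```

Note that ordering is preserved: a write between two reads still fences them into separate parallel segments.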
Start here: Let every tool call pass through a permission gate. Fail-closed (reject/ask) by default. Add non-exempt rules for protected paths before launching any auto-approval mode.
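A minimal fail-closed gate over priority-ordered rule sources might look like the sketch below. Rule matching is simplified to exact tool names; real evaluators match patterns and arguments, and carry more state than a single rejection counter.

```python
from dataclasses import dataclass

@dataclass
class Rule:
    tool: str      # exact tool name here; real rules match patterns and arguments
    decision: str  # "allow" | "deny" | "ask"

@dataclass
class PermissionGate:
    sources: list[list[Rule]]  # strict priority order: earlier sources win
    rejections: int = 0        # evaluation is stateful, not a pure query

    def evaluate(self, tool: str) -> str:
        for rules in self.sources:
            for rule in rules:
                if rule.tool == tool:
                    if rule.decision == "deny":
                        self.rejections += 1  # side effect: rejection tracking
                    return rule.decision
        return "ask"  # fail-closed default: unmatched calls require approval
```

The fail-closed default lives in the final `return "ask"`: anything no rule covers falls through to human approval rather than silent execution.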
In Claude Code: use … to configure permission rules and hooks.
Trade-offs:
- Fail-closed default means new tools are secure out of the box, but developers must actively mark concurrency safety — forgetting to mark read-only tools will silently reduce throughput.
- Multi-source permission layering is powerful but difficult to debug when rules conflict.
Further reading: references/tool-registry-pattern.md | references/permission-gate-pattern.md
4. Context Engineering
User pain point: "My Agent sees too much, too little, or the wrong thing."
Golden Rule: Manage context as a budget, not a trash can. Every token in the window must earn its place through one of four operations: select, write back, compress, isolate.
Applicable scenarios: Any Agent whose performance degrades in long sessions, whose delegated work pollutes the parent context, or whose startup is slow due to eager loading.
How it works:
- Select — load on demand, don't load all at once. Use three-level progressive disclosure: metadata (always present, cheap), instructions (loaded on activation), resources (loaded on demand). Memoize expensive context builders, only invalidate at known change points — don't make it reactive.
- Write back — context is not read-only. The Agent writes information back to persistent storage: automatic memory entries, background extraction output, task status, permission rules. The write-back loop is the key to turning a stateless tool caller into a learning system.
- Compress — long sessions exhaust the window. Reactive compaction summarizes old rounds in the middle of the session, retaining recent context while reclaiming budget. Mark snapshot data as snapshots so the model knows to re-fetch the current state.
- Isolate — delegated work must not pollute the parent context. Coordinator workers have zero context inheritance (only explicit prompts). Fork children inherit the full context but have a single-layer limit (cannot recursively fork). File system level isolation (worktree) gives the Agent its own working copy.
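The compress operation above can be sketched as follows. The length-based cost proxy and the placeholder summarizer stand in for real token counting and a model call.

```python
def compact(history: list[str], budget: int, keep_recent: int,
            summarize=lambda old: f"[summary of {len(old)} earlier turns]") -> list[str]:
    """Reactive compaction: when the transcript exceeds its budget, summarize
    everything except the most recent turns. The default `summarize` is a
    placeholder for a model call; character length stands in for token count."""
    cost = sum(len(turn) for turn in history)
    if cost <= budget or len(history) <= keep_recent:
        return history  # within budget: nothing to compact
    old, recent = history[:-keep_recent], history[-keep_recent:]
    return [summarize(old)] + recent
```

Recent turns survive verbatim, so the model keeps exact wording where it matters most, while the reclaimed budget comes entirely from the summarized prefix.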
Start here: Audit your current per-round context cost. Add hard caps for each variable-length block. Add truncation recovery pointers (tell the model which tool to call to get full output) before enabling any compression.
Trade-offs:
- Aggressive caching reduces latency but risks staleness: every change point must explicitly invalidate the cache, or the model will use stale data for the rest of the session.
- Progressive disclosure saves tokens but means the model cannot reason about the full capabilities of a skill before activation.
Further reading: references/context-engineering-pattern.md (index) | select | compress | isolate
5. Multi-Agent Coordination
User pain point: "I want parallelism, specialization and coordination, but no chaos."
Golden Rule: The coordinator must do its own synthesis; understanding cannot be delegated. "Based on your findings, fix it" is an anti-pattern: the coordinator should digest worker results into a precise specification before dispatching implementation.
Applicable scenarios: When the task is too large for a single Agent, when parallel exploration is needed, or when persistent specialized teammates are desired.
How it works:
Three delegation modes serve different task forms:
| Mode | Context Sharing | Suitable For |
|---|---|---|
| Coordinator | None — worker starts from scratch | Complex multi-stage tasks (research → synthesis → implementation → verification) |
| Fork | Full — child inherits parent history | Fast parallel splitting of shared loaded context |
| Swarm | Peer-to-peer shared task list | Long-running independent workflows |
Key constraints:
- Fork is limited to a single layer: recursive forking would amplify context cost exponentially.
- Swarm teammates cannot generate other teammates — the list is flat to prevent uncontrolled growth.
- Results arrive asynchronously; fire-and-forget registration returns an ID immediately, and the parent can continue working.
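Fire-and-forget dispatch with zero context inheritance can be sketched with plain threads. The ID scheme, class shape, and method names are illustrative, not a real orchestrator API.

```python
import itertools
import threading

class Delegator:
    """Fire-and-forget delegation: dispatch returns a typed, prefixed ID
    immediately; the worker runs in the background with zero context
    inheritance (it sees only the prompt it was given)."""

    def __init__(self):
        self.results = {}
        self._threads = {}
        self._ids = itertools.count(1)

    def dispatch(self, worker, prompt: str) -> str:
        task_id = f"worker_{next(self._ids)}"
        thread = threading.Thread(
            target=lambda: self.results.__setitem__(task_id, worker(prompt)))
        self._threads[task_id] = thread
        thread.start()
        return task_id  # the parent keeps working; the result arrives later

    def collect(self, task_id: str):
        """Block until the worker finishes and return its result."""
        self._threads[task_id].join()
        return self.results[task_id]
```

Because the worker receives only `prompt`, the prompt must be a self-contained document — exactly the coordinator-mode discipline described above.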
Start here: Choose one delegation mode and implement it completely before considering hybrid modes. Write each child Agent prompt as a self-contained document. Add a synthesis step between research and implementation workers — this is where the coordinator creates value.
Coordinator mode implementation checklist:
- Define phased workflow: research → synthesis → implementation → verification
- Write self-contained prompts for each worker (don't use "based on your findings")
- Filter each worker's toolset to only provide required tools
- Decide on a continue-vs-new strategy: continue an existing worker when context overlaps; always create a fresh worker for the verification phase
Trade-offs:
- Coordinator is the safest but slowest — each phase waits for the previous phase to complete.
- Fork is the fastest but only has one layer, sharing the full context cost of the parent.
- Swarm is the most flexible but hardest to coordinate — peers can only communicate through shared task lists.
Further reading: references/agent-orchestration-pattern.md
6. Lifecycle and Extensibility
User pain point: "I need hooks, background tasks, and a clean startup sequence."
Golden Rule: Extensibility comes from injection points, not inheritance hierarchies. Hooks attach side effects at lifecycle moments; tasks track asynchronous work with a strict state machine; bootstrap initializes in dependency order with memoized phases.
Applicable scenarios: When you need to extend Agent behavior without modifying core code, track long-running background work, or structure initialization for multiple entry modes.
How it works:
- Hooks attach side effects at defined lifecycle moments (before/after tool execution, prompt submission, Agent start/stop). Trust is all-or-nothing: when the workspace is untrusted, all hooks are skipped, not just suspicious ones. Session-level hooks are temporary and are cleaned up when the session ends.
- Long-running work is tracked through a typed state machine. Each work unit has a typed prefixed ID, strict lifecycle (running → completed / failed / killed), and disk-backed output. Reclamation is two-stage: disk is cleared immediately when in final state, memory is lazily cleared only after the parent is notified.
- Bootstrap structures initialization into dependency-ordered, memoized phases. The trust boundary — the moment the user gives authorization consent — is a key inflection point: security-sensitive subsystems (telemetry, secret environment variables) cannot be activated before trust is established. Multiple entry modes (CLI, server, SDK) share the same Bootstrap path, different entry points.
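A single-distribution-point hook dispatcher with all-or-nothing trust gating might look like this sketch. Event names and the API shape are illustrative.

```python
from collections import defaultdict

class HookDispatcher:
    """Single distribution point for lifecycle hooks. Trust is all-or-nothing:
    an untrusted workspace skips every hook, not just suspicious ones."""

    def __init__(self, workspace_trusted: bool):
        self.workspace_trusted = workspace_trusted
        self._hooks = defaultdict(list)  # lifecycle moment -> handlers
        self._session_hooks = []         # cleaned up when the session ends

    def register(self, event: str, handler, session_scoped: bool = False):
        self._hooks[event].append(handler)
        if session_scoped:
            self._session_hooks.append((event, handler))

    def fire(self, event: str, payload: dict) -> list:
        if not self.workspace_trusted:
            return []  # all-or-nothing gate: no hook runs at all
        return [handler(payload) for handler in self._hooks[event]]

    def end_session(self):
        for event, handler in self._session_hooks:
            self._hooks[event].remove(handler)
        self._session_hooks.clear()
```

Because every hook fires through one `fire` call, the trust check lives in exactly one place — there is no second code path that could forget it.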
Start here: Route every hook through a single distribution point. Implement trust gating before adding any external hook types. Register cleanup handlers at init time, not at the point of use.
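The strict task lifecycle and two-stage reclamation can be sketched as a small state machine. State names follow the text; the class shape is illustrative.

```python
FINAL_STATES = {"completed", "failed", "killed"}
TRANSITIONS = {"running": FINAL_STATES}  # most work skips "pending" entirely

class WorkUnit:
    def __init__(self, kind: str, number: int):
        self.id = f"{kind}_{number}"  # typed, prefixed ID
        self.state = "running"        # registered directly as running
        self.notified = False         # has the parent seen the result?

    def transition(self, new_state: str):
        if new_state not in TRANSITIONS.get(self.state, set()):
            raise ValueError(f"illegal transition {self.state} -> {new_state}")
        self.state = new_state
        # Stage 1 of reclamation: disk output may be cleared here (final state).

    def can_reclaim(self) -> bool:
        # Stage 2: memory is reclaimed only after the parent is notified;
        # reclaiming earlier races the parent, which would never see the result.
        return self.state in FINAL_STATES and self.notified
```

The empty transition set for final states makes them terminal by construction: a completed unit cannot be revived by a stray signal.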
In Claude Code: use … to configure hooks (before/after tool execution, prompt submission).
Trade-offs:
- All-or-nothing hook trust is simple but blunt: an untrusted workspace disables the entire extension system, not just individual hooks.
- Disk-backed task output keeps memory constant but increases I/O latency proportional to concurrent work units.
Further reading: references/hook-lifecycle-pattern.md | references/task-decomposition-pattern.md | references/bootstrap-sequence-pattern.md
Pitfall Guide
Violating these principles leads to counterintuitive bugs:
- Concurrency classification is per call, not per tool type. The same tool is safe for some inputs and unsafe for others. Do not assume a tool's concurrency behavior is static: the runtime decides per call.
- Permission evaluation has side effects. The permission checker tracks rejection counts, transitions modes, and updates state. Do not treat it as a pure query function.
- Most asynchronous work skips the "pending" state. In practice, work units are registered directly as "running". Do not build UIs that assume every work unit starts from pending.
- Fork children cannot fork. Recursion protection maintains the single-layer invariant. The Fork tool remains in the child's tool pool (for prompt-cache sharing) but is blocked when called.
- Context builders are memoized but manually invalidated. If you add a context source without a corresponding invalidation point, the model will see stale data for the entire session.
- Memory indexes have hard caps. Entries exceeding the cap are silently truncated; without regular cleanup, new entries become invisible.
- The skill-list budget is very tight. Descriptions are concatenated and capped per entry. Put the most distinctive trigger language first: the rest will be truncated.
- Hook trust is all-or-nothing. When the workspace is untrusted, the entire hook system is disabled, not just individual suspicious hooks.
- The default tool-level permission is "allow". Tools that do not implement custom permission logic are fully delegated to the rule-based system. Only override when tool-specific gating (path ACLs, quotas, etc.) is required.
- Reclamation requires notification. Final-state work units can only be GCed after the parent receives the completion signal. Reclaiming before notification creates a race condition: the parent will never read the result.
Where this skill does not apply
This skill is about the harness around Agents, not:
- Prompt engineering or system prompt design
- Model selection or fine-tuning
- General software architecture (MVC, microservices)
- Chat UI or conversational interfaces
- LLM API integration basics
If your question is about the model itself rather than the system around the model, this skill is not applicable.