harness-creator

Harness Creator

Production harness engineering for AI coding agents.

For: Engineers building or extending coding-agent runtimes, custom agents, multi-session workflows, or anyone who wants their agent to work reliably across sessions.

Not for: Prompt engineering, model selection, generic software architecture, or one-off agent tasks.

All principles are grounded in the Learn Harness Engineering framework and production agent runtime decisions.

Choose Your Problem

| If you want to... | Read |
| --- | --- |
| Make the agent remember corrections and project rules between sessions | Memory Persistence |
| Package reusable workflows and domain knowledge | Skill Runtime |
| Let the agent work powerfully but not dangerously | Tool Registry & Safety |
| Give the agent the right context at the right cost | Context Engineering |
| Split work across multiple agents without chaos | Multi-agent Coordination |
| Extend behavior with hooks, background tasks, startup logic | Lifecycle & Bootstrap |
| Build the complete 5-subsystem harness | Five Subsystems Guide |

Before you start building: Read the Gotchas — these are the non-obvious failure modes that cost the most time.

The Five-Subsystem Harness Framework

Every harness consists of five subsystems:

Instructions (Recipe Shelf): AGENTS.md, CLAUDE.md, docs/ hierarchy
State (Prep Station): feature_list.json, progress.md, session-handoff.md
Verification (Quality Check Window): Verification commands, test suites, type checks
Scope (Task Boundaries): One-feature-at-a-time policies, definition of done
Lifecycle (Session Management): init.sh, clean-state checklists, handoff procedures

When creating or improving a harness, systematically address each subsystem.
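
As a rough illustration of working through each subsystem, the sketch below checks which expected files are missing from a project. The `EXPECTED_ARTIFACTS` mapping and the `missingArtifacts` function are hypothetical names, and tying Verification to `init.sh` and Scope to `feature_list.json` is an assumption made for the example, not something the framework prescribes.

```typescript
type Subsystem = "instructions" | "state" | "verification" | "scope" | "lifecycle";

// Assumed mapping from each subsystem to files named in the list above.
// Adjust it to whatever your project actually uses.
const EXPECTED_ARTIFACTS: Record<Subsystem, string[]> = {
  instructions: ["AGENTS.md"],
  state: ["feature_list.json", "progress.md"],
  verification: ["init.sh"],
  scope: ["feature_list.json"],
  lifecycle: ["init.sh", "session-handoff.md"],
};

// For each subsystem, list the expected artifacts absent from `presentFiles`.
function missingArtifacts(presentFiles: string[]): Record<Subsystem, string[]> {
  const present = new Set(presentFiles);
  const result = {} as Record<Subsystem, string[]>;
  for (const subsystem of Object.keys(EXPECTED_ARTIFACTS) as Subsystem[]) {
    result[subsystem] = EXPECTED_ARTIFACTS[subsystem].filter((file) => !present.has(file));
  }
  return result;
}

// Example: a project that has only the three minimal files.
const report = missingArtifacts(["AGENTS.md", "init.sh", "feature_list.json"]);
console.log(report);
```

A file check like this only shows that an artifact exists; whether it is followed still needs the scoring described in Phase 2.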

Creating a Harness

Phase 1: Context Gathering

Start by understanding the user's situation:

What project is this for? (tech stack, size, complexity)
What agent tool are they using? (Claude Code, Codex, Cursor, etc.)
What exists already? (any AGENTS.md, progress tracking, verification?)
What problems are they experiencing? (agent overreach, lost context, broken tests?)
What's the team's tolerance for structure? (minimal vs. comprehensive)

If the user hasn't provided this context, ask before proceeding.

Phase 2: Harness Assessment (Existing Projects)

If the user has an existing harness, assess it using the five-subsystem framework:

For each subsystem, score 1-5:

5: Exemplary, documented, consistently followed
4: Good, mostly complete, occasional gaps
3: Adequate, covers basics, missing polish
2: Weak, incomplete, inconsistently applied
1: Missing or actively harmful

Identify the lowest-scoring subsystem — that's the bottleneck. Focus improvement efforts there first.
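
The scoring step can be sketched as a small function that validates the 1-5 scores and returns the lowest-scoring subsystem. The names `Scores`, `SUBSYSTEMS`, and `findBottleneck` are illustrative, and the tie-breaking rule (the subsystem listed first wins) is an assumption, since the text does not say how to handle ties.

```typescript
type Subsystem = "instructions" | "state" | "verification" | "scope" | "lifecycle";
type Scores = Record<Subsystem, number>;

// Fixed evaluation order; on a tie, the earlier entry is reported.
const SUBSYSTEMS: Subsystem[] = ["instructions", "state", "verification", "scope", "lifecycle"];

function findBottleneck(scores: Scores): Subsystem {
  let lowest: Subsystem = "instructions";
  for (const name of SUBSYSTEMS) {
    const score = scores[name];
    // Reject anything that is not an integer from 1 to 5 (this also catches NaN).
    if (score < 1 || score > 5 || Math.floor(score) !== score) {
      throw new Error(`score for ${name} must be an integer from 1 to 5`);
    }
    if (score < scores[lowest]) {
      lowest = name;
    }
  }
  return lowest;
}

// Example assessment with placeholder scores.
const bottleneck = findBottleneck({
  instructions: 4,
  state: 2,
  verification: 3,
  scope: 2,
  lifecycle: 5,
});
console.log(bottleneck); // prints "state"
```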

Phase 3: Design

Based on the assessment, design the harness components:

Instructions:

Create a short AGENTS.md (~50-100 lines) as the routing layer
Link to detailed docs in docs/ directory (ARCHITECTURE.md, PRODUCT.md, etc.)
Define startup workflow: what the agent reads before coding

State:

Create feature_list.json with feature definitions and status tracking
Create or update progress.md for session continuity
Design session-handoff.md template if needed

Verification:

List explicit verification commands in AGENTS.md
Ensure init.sh runs verification
Design quality score tracking if appropriate

Scope:

Define one-feature-at-a-time policy
Document feature dependencies
Create definition of done checklist

Lifecycle:

Create init.sh for initialization
Design clean-state checklist
Document session handoff procedure

Phase 4: Implementation

Create the harness files. Use bundled scripts where available:

```bash
# Use bundled scripts from scripts/ directory
# (See scripts/ section for available tools)
```

Phase 5: Testing and Benchmarking

Test the harness with real agent sessions:

Baseline: Run a representative task without the harness
With Harness: Run the same task with the harness
Measure: Success rate, time, token usage, rework
Compare: Quantify the improvement

For rigorous benchmarking, see the "Running Benchmarks" section below.

Harness File Templates

AGENTS.md Structure

A minimal AGENTS.md should include:

```markdown
# AGENTS.md

[One-sentence project purpose]

## Startup Workflow

Before writing code:

1. [Step 1: e.g., Read this file]
2. [Step 2: e.g., Read ARCHITECTURE.md]
3. [Step 3: e.g., Run ./init.sh]
4. [Step 4: e.g., Read feature_list.json]

## Working Rules

- [Rule 1: e.g., One feature at a time]
- [Rule 2: e.g., Verification required before claiming done]
- [Rule 3: e.g., Update progress before ending session]

## Required Artifacts

- `feature_list.json`: Feature state tracker
- `progress.md`: Session continuity log
- `init.sh`: Standard startup and verification

## Definition of Done

## End of Session

Before ending:

1. Update progress.md
2. Update feature_list.json
3. Record blockers/risks
4. Commit with descriptive message
5. Leave clean restart path
```

feature_list.json Structure

```json
{
  "features": [
    {
      "id": "feat-001",
      "name": "Document Import",
      "description": "Allow users to import PDF and TXT documents",
      "dependencies": [],
      "status": "done",
      "evidence": "tests pass, manual verification on 2024-01-15"
    },
    {
      "id": "feat-002",
      "name": "Document Chunking",
      "description": "Split documents into ~500 char chunks with metadata",
      "dependencies": ["feat-001"],
      "status": "in-progress",
      "evidence": ""
    }
  ]
}
```
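
One way an agent or script might use this file is to pick the next feature to work on while respecting the dependency list and the one-feature-at-a-time rule. The sketch below is illustrative only: `Feature` mirrors the fields in the example, `nextWorkableFeature` is a hypothetical helper, and only the `"done"` and `"in-progress"` status values come from the example (any other status is treated as not yet started, which is an assumption).

```typescript
interface Feature {
  id: string;
  name: string;
  description: string;
  dependencies: string[];
  status: string; // "done" and "in-progress" appear in the example above
  evidence: string;
}

// Returns the feature to work on next, or undefined if nothing is workable.
function nextWorkableFeature(features: Feature[]): Feature | undefined {
  // One feature at a time: finish whatever is already in progress first.
  const inProgress = features.find((f) => f.status === "in-progress");
  if (inProgress !== undefined) {
    return inProgress;
  }
  // Otherwise pick the first unfinished feature whose dependencies are all done.
  const doneIds = new Set(features.filter((f) => f.status === "done").map((f) => f.id));
  return features.find(
    (f) => f.status !== "done" && f.dependencies.every((dep) => doneIds.has(dep)),
  );
}

// The two features from the example file.
const exampleFeatures: Feature[] = [
  {
    id: "feat-001",
    name: "Document Import",
    description: "Allow users to import PDF and TXT documents",
    dependencies: [],
    status: "done",
    evidence: "tests pass, manual verification on 2024-01-15",
  },
  {
    id: "feat-002",
    name: "Document Chunking",
    description: "Split documents into ~500 char chunks with metadata",
    dependencies: ["feat-001"],
    status: "in-progress",
    evidence: "",
  },
];

const next = nextWorkableFeature(exampleFeatures);
console.log(next === undefined ? "nothing to do" : next.id); // prints "feat-002"
```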

init.sh Structure

```bash
#!/bin/bash
# Stop at the first failing command, unset variable, or failed pipeline stage
set -euo pipefail

echo "=== Installing dependencies ==="
npm install

echo "=== Running type check ==="
npm run check   # assumes package.json defines a "check" script

echo "=== Running tests ==="
npm test

echo "=== Building application ==="
npm run build

echo "=== Verification complete ==="
```

Running Benchmarks

To measure harness effectiveness:

Step 1: Define Representative Tasks

Pick 2-3 tasks that are:

Real work the user would actually do
Challenging enough to fail without proper harness
Verifiable (clear success criteria)

Step 2: Run Comparative Sessions

For each task:

Without Harness: Run the task on a clean repo copy
With Harness: Run the same task with the harness in place

Record:

Success/failure
Time taken
Token usage
Rework required
Session restarts needed

Step 3: Aggregate Results

Calculate:

Success rate improvement
Time efficiency change
Token efficiency change
Qualitative feedback
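
The quantitative part of this aggregation can be sketched as follows. The `SessionResult` shape, the function names, and the choice to report time and token changes as a fraction of the baseline mean are all assumptions for illustration; the sample numbers are placeholders, not measurements.

```typescript
interface SessionResult {
  task: string;
  withHarness: boolean;
  success: boolean;
  minutes: number;
  tokens: number;
}

interface Summary {
  successRate: number;
  meanMinutes: number;
  meanTokens: number;
}

// Success rate and mean cost over one group of sessions.
function summarize(results: SessionResult[]): Summary {
  if (results.length === 0) {
    throw new Error("no sessions to summarize");
  }
  const n = results.length;
  const total = (pick: (r: SessionResult) => number): number =>
    results.reduce((acc, r) => acc + pick(r), 0);
  return {
    successRate: results.filter((r) => r.success).length / n,
    meanMinutes: total((r) => r.minutes) / n,
    meanTokens: total((r) => r.tokens) / n,
  };
}

// Compares harness sessions to baseline sessions.
// Negative minutesChange / tokensChange means the harness runs were cheaper.
function compare(results: SessionResult[]) {
  const baseline = summarize(results.filter((r) => !r.withHarness));
  const harness = summarize(results.filter((r) => r.withHarness));
  return {
    successRateDelta: harness.successRate - baseline.successRate,
    minutesChange: (harness.meanMinutes - baseline.meanMinutes) / baseline.meanMinutes,
    tokensChange: (harness.meanTokens - baseline.meanTokens) / baseline.meanTokens,
  };
}

// Placeholder data: two tasks, each run with and without the harness.
const sessions: SessionResult[] = [
  { task: "task-a", withHarness: false, success: false, minutes: 40, tokens: 100000 },
  { task: "task-b", withHarness: false, success: true, minutes: 60, tokens: 140000 },
  { task: "task-a", withHarness: true, success: true, minutes: 30, tokens: 90000 },
  { task: "task-b", withHarness: true, success: true, minutes: 45, tokens: 102000 },
];

const comparison = compare(sessions);
console.log(comparison);
```

With only 2-3 tasks the numbers are indicative rather than statistically meaningful, so read them alongside the qualitative feedback.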

Step 4: Iterate

Use results to identify:

Which harness components add most value
Which components are over-engineered
Where to focus improvement efforts

Bundled Resources

References (Deep-Dive Patterns)

| Document | Covers |
| --- | --- |
| Memory Persistence | Four-level instruction hierarchy, auto-memory taxonomy, background extraction |
| Context Engineering | Select / Compress / Isolate / Write operations, budget management |
| Tool Registry | Fail-closed registration, per-call concurrency, permission pipeline |
| Multi-Agent | Coordinator / Fork / Swarm patterns, context sharing |
| Lifecycle & Bootstrap | Hook system, long-running tasks, dependency-ordered init |
| Gotchas | 15 non-obvious failure modes with fixes |

Templates

- `templates/agents.md`: AGENTS.md / CLAUDE.md skeleton
- `templates/feature-list.json`: Feature state tracker
- `templates/init.sh`: Standard initialization script
- `templates/progress.md`: Session progress log
- `templates/session-handoff.md`: Session handoff template

Scripts (Optional)

- `scripts/create-harness.ts`: Generate harness files from templates
- `scripts/validate-harness.ts`: Check harness completeness
- `scripts/run-benchmark.ts`: Execute harness effectiveness comparison

Gotchas

Non-obvious principles that will cause bugs if you violate them:

Memory index caps fire silently — Long entries invisible once cap hit. Keep hooks to one line.
Priority ordering counterintuitive — Local beats project beats user beats org. Test full stack.
Extraction timing creates race window — User can start next turn before background extraction completes.
Derivable content doesn't belong in memory — Architecture and code patterns are in the repo already.
Concurrent classification is per-call, not per-tool — Same tool safe for some inputs, unsafe for others.
Permission evaluation has side effects — Tracks denials, transforms modes, updates state.
Most async work skips "pending" state — Work units register directly as "running".
Fork children must not fork — Recursive guard preserves single-level invariant.
Context builders memoized but manually invalidated — Add invalidation or face staleness.
Hook trust all-or-nothing — One untrusted hook disables entire extension system.
Eviction requires notification — Terminal work unit only GC-eligible after parent notified.
Skill listing budgets tight — Front-load distinctive trigger language, tails get cut.

Full guide: Gotchas — 15 failure modes with fixes.
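
To make one of these concrete, here is a toy sketch of the "per-call, not per-tool" concurrency rule. Everything in it is hypothetical: the tool names, the `isConcurrencySafe` classifier, and the prefix-based read-only check are stand-ins, and a real runtime would need a much more careful analysis (a command such as `cat a > b` would be misclassified here).

```typescript
interface ToolCall {
  tool: string;
  input: string;
}

// Classifies a single call by looking at its input, not just the tool name.
function isConcurrencySafe(call: ToolCall): boolean {
  if (call.tool === "read_file") {
    return true;
  }
  if (call.tool === "shell") {
    // Assumed read-only commands; the same tool is unsafe for other inputs.
    const readOnly = ["ls", "cat", "git status"];
    return readOnly.some((cmd) => call.input === cmd || call.input.startsWith(cmd + " "));
  }
  return false; // fail closed: unknown tools run on their own
}

// Groups consecutive safe calls into one batch; each unsafe call runs alone.
function planBatches(calls: ToolCall[]): ToolCall[][] {
  const batches: ToolCall[][] = [];
  let current: ToolCall[] = [];
  for (const call of calls) {
    if (isConcurrencySafe(call)) {
      current.push(call);
    } else {
      if (current.length > 0) {
        batches.push(current);
        current = [];
      }
      batches.push([call]);
    }
  }
  if (current.length > 0) {
    batches.push(current);
  }
  return batches;
}

const batches = planBatches([
  { tool: "shell", input: "ls src" },
  { tool: "read_file", input: "a.ts" },
  { tool: "shell", input: "rm -rf build" },
  { tool: "shell", input: "cat README.md" },
]);
console.log(batches.map((b) => b.length)); // prints [ 2, 1, 1 ]
```

The `shell` tool lands in a concurrent batch for `ls` and `cat` but runs alone for `rm`, which is the behavior a per-tool flag cannot express.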

When to Use This Skill

Use this skill when:

User says "I need to set up AGENTS.md for my project"
User wants to improve their agent's reliability
User is experiencing agent failures, lost context, or broken work
User asks "how do I make my agent work better?"
User wants to benchmark harness effectiveness
User needs templates for harness files
User is following the Learn Harness Engineering course

Communication Style

Explain harness concepts in practical terms (kitchen analogy works well)
Focus on measurable outcomes, not theoretical perfection
Start minimal, add structure as needed
Show before/after comparisons to build confidence
Acknowledge tradeoffs (more structure = more reliability but more upfront work)

Getting Started

If the user is new to harness engineering:

Start with assessment: Run the five-subsystem assessment on their current setup
Pick lowest-scoring subsystem: Focus improvement efforts there first
Create minimal viable harness: AGENTS.md + init.sh + feature_list.json
Test with real task: Measure before/after improvement

If the user is experienced:

Ask what specific problem: Don't assume — let them describe the pain point
Understand harness maturity: What exists already? What's working?
Design targeted improvements: Use reference patterns for guidance
Optionally run benchmarks: Quantify impact with before/after comparison

When NOT to Use This Skill

This skill is about the harness around an agent, not:

Prompt engineering or system prompt design
Model selection or fine-tuning
Generic software architecture (MVC, microservices)
Chat UIs or conversational interfaces
LLM API integration basics

If your question is about the model itself rather than the system around it, this skill does not apply.
