harness-engineering


Harness Engineering


Core Principle


The repo is the harness. Agent failures are harness failures. When an agent breaks a rule, fix the harness — not the agent.

| Pillar | What It Solves |
|---|---|
| Context Engineering | Agent hallucinations, wrong tool usage, stale docs |
| Architectural Constraints | Boundary violations, silent regressions, ambiguous failures |
| Garbage Collection | Entropy accumulation, dead code, doc drift |


When to Apply


| Situation | Action |
|---|---|
| New repo setup for AI development | Apply all three pillars from day one |
| Agent repeatedly hallucinating tools/APIs | Pillar 1: add tool declarations to AGENTS.md |
| Agent crossing module boundaries | Pillar 2: add a structural test enforcing the boundary |
| Agent producing code that passes CI but breaks conventions | Pillar 2: convert the convention into a linter rule or structural test |
| Docs drifting from implementation | Pillar 1: add CI cross-link validation |
| Codebase growing, agent quality degrading | Pillar 3: schedule a GC agent for dead code and unused exports |
| Agent ignoring AGENTS.md guidance | Check whether the guidance is generic advice or a specific failure lesson; rewrite it as a failure ledger entry |
| Post-incident on agent-generated code | Add the failure to AGENTS.md and add a constraint to prevent recurrence |


Pillar 1: Context Engineering


Goal: Make the repository a knowledge product that agents can consume without hallucination.


1.1 AGENTS.md as Failure Ledger


Every line in AGENTS.md should trace to a real failure, not generic best practice.

```
# Pattern
BAD:  "Follow clean code principles"
BAD:  "Use meaningful variable names"
GOOD: "Never import from packages/internal: agent imported shared/db directly on 2025-12-03, broke build"
GOOD: "Always use OrderService.create(), not Order.new: direct instantiation skips validation (incident #247)"
```


**Failure ledger entry format:**
```yaml
rule: Never call PaymentGateway directly from controllers
context: Agent bypassed service layer, sent duplicate charges (2025-11-15)
fix: Use PaymentService.charge() which handles idempotency
```

When writing or updating AGENTS.md:

- If the rule doesn't reference a specific failure or concrete constraint → cut it
- If the rule says "should" → rewrite it as "must" and state the consequence
- If the rule has no enforcement mechanism → pair it with a Pillar 2 constraint
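These three rules are mechanical enough to check automatically. Below is a minimal Python sketch. It assumes ledger entries have already been parsed into dicts with the `rule`/`context`/`fix` keys from the YAML format above. The function `audit_ledger_entry`, the date/incident-number heuristic, and the extra `enforced_by` key are hypothetical and do not come from any existing tool.

```python
import re

# Hypothetical heuristic: a "specific failure" is referenced by a date
# (YYYY-MM-DD) or an incident number such as "#247" in the context field.
FAILURE_REF = re.compile(r"\d{4}-\d{2}-\d{2}|#\d+")
WEAK_MODAL = re.compile(r"\bshould\b", re.IGNORECASE)


def audit_ledger_entry(entry: dict) -> list[str]:
    """Return the problems found in one ledger entry (an empty list means OK)."""
    problems = []
    if not FAILURE_REF.search(entry.get("context", "")):
        problems.append("no specific failure referenced: cut it or add context")
    if WEAK_MODAL.search(entry.get("rule", "")):
        problems.append("uses 'should': rewrite as 'must' with a consequence")
    # 'enforced_by' is a hypothetical extra key naming the Pillar 2 check.
    if not entry.get("enforced_by"):
        problems.append("no enforcement: pair it with a Pillar 2 constraint")
    return problems


entry = {
    "rule": "Never call PaymentGateway directly from controllers",
    "context": "Agent bypassed service layer, sent duplicate charges (2025-11-15)",
    "fix": "Use PaymentService.charge() which handles idempotency",
}
print(audit_ledger_entry(entry))
```

The example entry passes the first two checks and is flagged only for its missing enforcement link.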



1.2 Tool Declaration Mandate


Undeclared tools don't exist for agents. Explicitly list available tools, commands, and scripts.

```markdown
## Available Tools

- `bin/test`: Run test suite (prefer over raw pytest/rspec)
- `bin/lint`: Run linters with auto-fix
- `bin/db-reset`: Reset dev database
- `make deploy-staging`: Deploy to staging (requires approval)

## DO NOT USE

- `rm -rf` on any directory
- Direct database queries in production
- `curl` to external APIs without going through ApiClient
```


If an agent uses a tool not in this list → add it (if valid) or add it to DO NOT USE (if dangerous).
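The declared and forbidden lists can also be checked mechanically before an agent's shell command runs. In this minimal Python sketch, the lists mirror the AGENTS.md example above and `classify_command` is a hypothetical helper. The matching is a naive token-prefix comparison, so it will not catch variants such as `rm -fr`, and it treats every direct `curl` as forbidden.

```python
import shlex

# Mirrors the example AGENTS.md lists; a real setup would parse them from the file.
AVAILABLE = ["bin/test", "bin/lint", "bin/db-reset", "make deploy-staging"]
FORBIDDEN_PREFIXES = [("rm", "-rf"), ("curl",)]  # simplification: any direct curl


def classify_command(command: str) -> str:
    """Return 'allowed', 'forbidden', or 'undeclared' for one shell command."""
    tokens = shlex.split(command)
    for prefix in FORBIDDEN_PREFIXES:
        if tuple(tokens[: len(prefix)]) == prefix:
            return "forbidden"
    for tool in AVAILABLE:
        tool_tokens = shlex.split(tool)
        # Compare whole tokens so "bin/testing" does not match "bin/test".
        if tokens[: len(tool_tokens)] == tool_tokens:
            return "allowed"
    # Undeclared: review it, then add it to AVAILABLE or to the forbidden list.
    return "undeclared"


for cmd in ["bin/test --fail-fast", "rm -rf build/", "pytest -x"]:
    print(cmd, "->", classify_command(cmd))
```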


1.3 Docs as System of Record


RULE: docs/ is canonical truth
ENFORCEMENT: CI validates cross-links between docs/ and source code
MECHANISM:
  - Every public API must have a corresponding docs/ entry
  - CI script checks: for each @api-doc tag in source → matching file in docs/
  - Broken link = CI failure, not warning

Stale docs are worse than no docs — agents trust what they read.
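Here is a minimal sketch of the CI check in Python. It assumes that source files carry tags of the hypothetical form `@api-doc docs/<path>.md` and that sources live under `src/`. Neither the tag syntax nor the helper names come from an existing tool.

```python
import re
from pathlib import Path

# Hypothetical tag syntax, e.g. a comment "# @api-doc docs/api/orders.md".
API_DOC_TAG = re.compile(r"@api-doc\s+(\S+)")


def find_broken_doc_links(src_root: Path, repo_root: Path) -> list[tuple[Path, str]]:
    """Return (source_file, doc_ref) pairs whose target is not a file under docs/."""
    broken = []
    for path in sorted(src_root.rglob("*")):
        if not path.is_file():
            continue
        text = path.read_text(encoding="utf-8", errors="ignore")
        for doc_ref in API_DOC_TAG.findall(text):
            if not doc_ref.startswith("docs/") or not (repo_root / doc_ref).is_file():
                broken.append((path, doc_ref))
    return broken


def main(repo_root: Path = Path(".")) -> int:
    """CI entry point: print every broken link and return non-zero if any exist."""
    broken = find_broken_doc_links(repo_root / "src", repo_root)
    for source, doc_ref in broken:
        print(f"ERROR: {source} tags @api-doc {doc_ref}, but that file is missing")
    return 1 if broken else 0
```

Calling it as `raise SystemExit(main())` from the CI script makes a broken link fail the build instead of printing a warning.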


1.4 Isolated Observability Per Task


Each agent task gets its own log context, not a shared monitoring stream.

PATTERN:
  - Assign task_id to each agent invocation
  - Route logs to task-specific output (file, log group, trace)
  - Post-task: review task log for failures, add to AGENTS.md if new pattern

Shared monitoring hides individual agent failures in noise.
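One way to get per-task isolation using only the Python standard library is to give each task its own named logger. That logger writes to `<log_dir>/<task_id>.log` and does not propagate to the root logger. The helper names and file layout here are assumptions for illustration.

```python
import logging
import uuid
from pathlib import Path
from typing import Optional


def task_logger(log_dir: Path, task_id: Optional[str] = None) -> tuple[str, logging.Logger]:
    """Create a logger whose records go only to <log_dir>/<task_id>.log."""
    task_id = task_id or uuid.uuid4().hex[:12]
    logger = logging.getLogger(f"agent.task.{task_id}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep this task out of the shared root stream
    log_dir.mkdir(parents=True, exist_ok=True)
    handler = logging.FileHandler(log_dir / f"{task_id}.log", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(asctime)s %(name)s %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return task_id, logger


def task_failures(log_dir: Path, task_id: str) -> list[str]:
    """Post-task review: return the ERROR lines from a single task's log."""
    lines = (log_dir / f"{task_id}.log").read_text(encoding="utf-8").splitlines()
    return [line for line in lines if " ERROR " in line]
```

Any line returned by `task_failures` is a candidate for a new AGENTS.md entry if it reveals a pattern that is not already recorded.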


Pillar 2: Architectural Constraints


Goal: Make violations impossible or immediately visible through deterministic enforcement.


2.1 Teaching Linter Errors


Failure messages must include the remediation, not just the violation.

```
# BAD linter output
ERROR: Import violation in src/api/handler.ts

# GOOD linter output
ERROR: Import violation in src/api/handler.ts
  ↳ Cannot import from 'packages/db' in 'src/api/'
  ↳ Use 'packages/db-client' instead (facade pattern)
  ↳ See: docs/architecture/data-access.md
```


Agents read error messages literally. A teaching error message prevents the same mistake on the next attempt.
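Teaching messages are easiest to keep consistent when every rule produces them through one formatter. Here is a small Python sketch built around a hypothetical `ImportViolation` record. Printing the example reproduces the GOOD output above.

```python
from dataclasses import dataclass


@dataclass
class ImportViolation:
    file: str         # file containing the bad import
    forbidden: str    # module that may not be imported here
    scope: str        # directory the rule applies to
    replacement: str  # what to import instead
    why: str          # one-phrase reason for the rule
    doc: str          # where the rule is documented


def teaching_message(v: ImportViolation) -> str:
    """Render a violation as: what failed, what to do instead, where to read more."""
    return "\n".join([
        f"ERROR: Import violation in {v.file}",
        f"  ↳ Cannot import from '{v.forbidden}' in '{v.scope}'",
        f"  ↳ Use '{v.replacement}' instead ({v.why})",
        f"  ↳ See: {v.doc}",
    ])


print(teaching_message(ImportViolation(
    file="src/api/handler.ts",
    forbidden="packages/db",
    scope="src/api/",
    replacement="packages/db-client",
    why="facade pattern",
    doc="docs/architecture/data-access.md",
)))
```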


2.2 Structural Tests


Enforce architectural boundaries in CI, not in documentation.

```
# Pseudocode structural tests

test "no cross-boundary imports":
  for each FILE in src/api/**:
    IMPORTS = parse_imports(FILE)
    FORBIDDEN = ["packages/internal", "src/admin", "src/worker"]
    assert IMPORTS intersection FORBIDDEN == empty

test "dependency direction":
  # ordered top to bottom, consistent with .dependency-layers.yml
  # (infrastructure may import domain, domain imports nothing)
  LAYERS = [presentation, application, infrastructure, domain]
  for each LAYER in LAYERS:
    for each IMPORT in LAYER.imports:
      assert IMPORT.layer_index >= LAYER.index  # only import same or lower

test "API contract stability":
  CURRENT = parse_openapi("api/openapi.yml")
  PREVIOUS = parse_openapi("api/openapi.yml", ref="main")
  assert no_breaking_changes(CURRENT, PREVIOUS)
```
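This is a runnable counterpart to the first pseudocode test, written for a Python code base. It uses the standard-library `ast` module and follows pytest's `test_` naming convention. It assumes the forbidden directories map to dotted module names (`packages/internal` to `packages.internal`, and so on). Only absolute imports are checked.

```python
import ast
from pathlib import Path

# Assumption: directories in the pseudocode map to dotted Python module names.
FORBIDDEN = ("packages.internal", "src.admin", "src.worker")


def imported_modules(source: str) -> set[str]:
    """Return the absolute module names imported by one Python source string."""
    modules = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            modules.update(alias.name for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            # Record "pkg" and "pkg.name" so "from packages import internal" is caught.
            modules.add(node.module)
            modules.update(f"{node.module}.{alias.name}" for alias in node.names)
    return modules  # relative imports (level > 0) are skipped


def is_forbidden(module: str) -> bool:
    """Match a forbidden package or any submodule, but not e.g. 'src.adminx'."""
    return any(module == f or module.startswith(f + ".") for f in FORBIDDEN)


def find_boundary_violations(root: Path) -> list[str]:
    violations = []
    for path in sorted(root.rglob("*.py")):
        for module in sorted(imported_modules(path.read_text(encoding="utf-8"))):
            if is_forbidden(module):
                violations.append(f"{path}: imports {module}")
    return violations


def test_no_cross_boundary_imports() -> None:
    # Collected by pytest; assumes it runs from the repository root.
    violations = find_boundary_violations(Path("src/api"))
    assert not violations, "\n".join(violations)
```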

2.3 Numeric CI Gates


Every check must be binary pass/fail. Advisory warnings are invisible to agents.

| Check Type | Gate Implementation |
|---|---|
| Test coverage | `coverage >= 80%` or fail |
| Bundle size | `size <= 250KB` or fail |
| Lint errors | `errors == 0` or fail |
| Type errors | `errors == 0` or fail |
| Security vulns | `critical == 0` or fail |

Rule: If it matters, it's a gate. If it's a warning, agents will ignore it.
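A gate runner only needs to compare each metric with its threshold and fail on any miss. In this Python sketch the metric keys are hypothetical. Real values would come from whatever coverage, bundling, lint, type-check, and audit tools the repo already uses. A missing metric counts as a failure, in keeping with the no-warnings rule.

```python
import operator

# Thresholds from the table above; the metric key names are hypothetical.
GATES = {
    "coverage_percent": (operator.ge, 80),
    "bundle_kb": (operator.le, 250),
    "lint_errors": (operator.eq, 0),
    "type_errors": (operator.eq, 0),
    "critical_vulns": (operator.eq, 0),
}


def evaluate_gates(metrics: dict[str, float]) -> list[str]:
    """Return every failed gate; a missing metric fails too, never just warns."""
    failures = []
    for name, (compare, threshold) in GATES.items():
        value = metrics.get(name)
        if value is None:
            failures.append(f"{name}: metric missing")
        elif not compare(value, threshold):
            failures.append(f"{name}: {value} fails {compare.__name__} {threshold}")
    return failures
```

A CI step would print the returned list and exit non-zero whenever it is not empty.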


2.4 Dependency Layering


Make dependency direction explicit and enforced.

```yaml
# .dependency-layers.yml (or equivalent config)
layers:
  - name: presentation
    paths: ["src/ui/", "src/api/routes/"]
    can_import: [application, domain]
  - name: application
    paths: ["src/services/", "src/use-cases/"]
    can_import: [domain, infrastructure]
  - name: domain
    paths: ["src/models/", "src/entities/"]
    can_import: []  # domain has no dependencies
  - name: infrastructure
    paths: ["src/db/", "src/external/"]
    can_import: [domain]
```


Without explicit layers, agents will create circular dependencies.
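The layer config can be enforced with a small checker. This Python sketch inlines the YAML config as a dict, because loading the real file would need a YAML parser such as PyYAML. `check_import` rejects any cross-layer edge not listed in `can_import`. `find_layer_cycle` verifies that the config itself is acyclic. All helper names are hypothetical.

```python
from typing import Optional

# .dependency-layers.yml inlined as Python data (no YAML library needed here).
LAYERS = [
    {"name": "presentation", "paths": ["src/ui/", "src/api/routes/"],
     "can_import": ["application", "domain"]},
    {"name": "application", "paths": ["src/services/", "src/use-cases/"],
     "can_import": ["domain", "infrastructure"]},
    {"name": "domain", "paths": ["src/models/", "src/entities/"], "can_import": []},
    {"name": "infrastructure", "paths": ["src/db/", "src/external/"],
     "can_import": ["domain"]},
]
ALLOWED = {layer["name"]: set(layer["can_import"]) for layer in LAYERS}


def layer_of(path: str) -> Optional[str]:
    """Return the layer that owns a file path, or None if the path is unlayered."""
    for layer in LAYERS:
        if any(path.startswith(prefix) for prefix in layer["paths"]):
            return layer["name"]
    return None


def check_import(importer: str, imported: str) -> Optional[str]:
    """Return an error message if importer -> imported crosses layers illegally."""
    src, dst = layer_of(importer), layer_of(imported)
    if src is None or dst is None or src == dst:
        return None  # same-layer and unlayered imports are out of scope here
    if dst not in ALLOWED[src]:
        allowed = ", ".join(sorted(ALLOWED[src])) or "nothing"
        return f"{importer}: '{src}' may not import '{dst}' (allowed: {allowed})"
    return None


def find_layer_cycle() -> Optional[list[str]]:
    """Depth-first search for a cycle in the can_import graph itself."""
    visiting, done = set(), set()

    def visit(node: str, path: list[str]) -> Optional[list[str]]:
        visiting.add(node)
        path.append(node)
        for nxt in sorted(ALLOWED.get(node, set())):
            if nxt in visiting:
                return path[path.index(nxt):] + [nxt]
            if nxt not in done:
                cycle = visit(nxt, path)
                if cycle:
                    return cycle
        path.pop()
        visiting.discard(node)
        done.add(node)
        return None

    for name in ALLOWED:
        if name not in done:
            cycle = visit(name, [])
            if cycle:
                return cycle
    return None
```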


Pillar 3: Garbage Collection


Goal: Actively fight entropy rather than accumulating technical debt.


3.1 Background GC Agents


Schedule periodic scans that produce small, auto-mergeable PRs.

GC_TASKS:
  - dead_code: find unused exports, unreachable functions → remove
  - stale_docs: find docs/ entries with no matching source → flag
  - unused_deps: find package.json/Gemfile entries with no imports → remove
  - orphan_tests: find test files with no matching source file → flag
  - config_drift: diff .env.example vs actual config usage → reconcile

FREQUENCY: weekly for active repos, monthly for stable repos
OUTPUT: one small PR per GC task (not one mega-PR)
MERGE: auto-merge if CI passes, otherwise flag for review
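Most GC tasks reduce to a set difference between what exists and what is referenced. As one example, here is a Python sketch of `orphan_tests`. It assumes the hypothetical convention that `tests/**/test_<name>.py` covers a module named `<name>.py` somewhere under `src/`.

```python
from pathlib import Path


def find_orphan_tests(src_root: Path, tests_root: Path) -> list[Path]:
    """Flag test_<name>.py files that have no <name>.py anywhere under src_root.

    The test_<module>.py naming convention is an assumption; adapt the mapping
    to the repo's real layout.
    """
    source_names = {p.stem for p in src_root.rglob("*.py")}
    orphans = []
    for test in sorted(tests_root.rglob("test_*.py")):
        module = test.stem[len("test_"):]
        if module not in source_names:
            orphans.append(test)
    return orphans
```

Under the OUTPUT rule above, the flagged paths would go into their own small PR, separate from the other GC tasks.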


3.2 Custom Verification Tools


Build repo-specific fast-feedback tools instead of relying on generic linters.

PRINCIPLE: fast feedback > comprehensive analysis

Examples:
  - bin/check-api-contracts → validates OpenAPI spec matches routes (5s)
  - bin/check-imports → validates dependency layers (2s)
  - bin/check-docs → validates doc cross-links (3s)

RULE: if a custom check takes > 30s, it's too slow for agent feedback loops
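The 30-second rule can be enforced by the runner itself instead of by convention. This Python sketch uses `subprocess.run` with its `timeout` argument, so a check that exceeds the budget is killed and counted as a failure. It assumes the `bin/check-*` scripts listed above exist and exit non-zero on failure.

```python
import subprocess
import time

BUDGET_SECONDS = 30  # "if a custom check takes > 30s, it's too slow"


def run_check(command: list[str], budget: float = BUDGET_SECONDS) -> tuple[bool, float]:
    """Run one check command and return (passed, elapsed_seconds).

    A non-zero exit, a missing executable, or exceeding the budget all fail.
    """
    start = time.monotonic()
    try:
        result = subprocess.run(command, capture_output=True, text=True, timeout=budget)
        passed = result.returncode == 0
    except subprocess.TimeoutExpired:
        passed = False  # subprocess.run kills the child when the timeout expires
    except FileNotFoundError:
        passed = False  # the check script itself is missing
    return passed, time.monotonic() - start
```

For example, `run_check(["bin/check-imports"])` returns `(True, elapsed)` only if the script exits 0 within the budget.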


3.3 Feedback Loop Discipline


When an agent produces bad output, follow this diagnostic order:

1. Check AGENTS.md — is the constraint documented?
   If NO → add it (Pillar 1 fix)
   If YES → is it enforced in CI?

2. Check CI — does a gate catch this failure?
   If NO → add structural test or linter rule (Pillar 2 fix)
   If YES → is the error message teaching?

3. Check error message — does it include remediation?
   If NO → improve the error message (Pillar 2.1 fix)
   If YES → the harness is correct, investigate agent-specific issue

NEVER: Skip to "the agent is broken" without completing steps 1-3
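The diagnostic order is a fixed decision sequence, so a triage script or checklist bot could encode it directly. Here is a minimal Python sketch. The `FailureFacts` fields are hypothetical inputs that a human or tool would fill in.

```python
from dataclasses import dataclass


@dataclass
class FailureFacts:
    documented_in_agents_md: bool  # step 1: is the constraint in AGENTS.md?
    caught_by_ci_gate: bool        # step 2: does a CI gate catch this failure?
    error_has_remediation: bool    # step 3: does the error message teach the fix?


def next_harness_fix(facts: FailureFacts) -> str:
    """Walk steps 1-3 in order and return the first fix that applies."""
    if not facts.documented_in_agents_md:
        return "Pillar 1: add the constraint to AGENTS.md"
    if not facts.caught_by_ci_gate:
        return "Pillar 2: add a structural test or linter rule"
    if not facts.error_has_remediation:
        return "Pillar 2.1: improve the error message"
    # Only after all three checks pass is the agent itself suspected.
    return "Harness is correct: investigate the agent-specific issue"
```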


Harness Assessment Checklist


Evaluate an existing repo's harness maturity:


Context Engineering


- AGENTS.md exists at repo root
- AGENTS.md entries reference specific failures, not generic advice
- Available tools/scripts are explicitly listed
- Forbidden operations are explicitly listed
- `docs/` has CI-enforced cross-links to source


Architectural Constraints


- Linter errors include remediation instructions
- At least one structural test enforces module boundaries
- All CI checks are binary pass/fail (no advisory warnings)
- Dependency direction is documented and enforced


Garbage Collection


- Dead code removal happens on a schedule (not ad-hoc)
- Stale docs are detected automatically
- Custom verification tools exist for repo-specific invariants
- Post-failure diagnosis follows harness-first order (3.3)

Scoring: Count passing items per pillar. Fewer than 3 passing items in any pillar (out of 5 for Context Engineering and 4 each for the other two) = harness gap.
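Here is a Python sketch of the scoring. It reads the threshold as "fewer than 3 passing items in a pillar," so the same rule works for the five-item and four-item pillars. The function name and input shape are hypothetical.

```python
def score_harness(results: dict[str, list[bool]], threshold: int = 3) -> dict[str, str]:
    """Map each pillar to 'passed/total', appending ' GAP' below the threshold."""
    report = {}
    for pillar, items in results.items():
        passed = sum(items)  # True counts as 1, False as 0
        flag = " GAP" if passed < threshold else ""
        report[pillar] = f"{passed}/{len(items)}{flag}"
    return report


results = {
    "Context Engineering": [True, True, True, True, False],
    "Architectural Constraints": [True, False, False, True],
    "Garbage Collection": [True, True, True, False],
}
print(score_harness(results))
```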
