harness-engineering

Harness Engineering

Core Principle

The repo is the harness. Agent failures are harness failures. When an agent breaks a rule, fix the harness, not the agent.

| Pillar | What It Solves |
|---|---|
| Context Engineering | Agent hallucinations, wrong tool usage, stale docs |
| Architectural Constraints | Boundary violations, silent regressions, ambiguous failures |
| Garbage Collection | Entropy accumulation, dead code, doc drift |

When to Apply

| Situation | Action |
|---|---|
| New repo setup for AI development | Apply all three pillars from day one |
| Agent repeatedly hallucinating tools/APIs | Pillar 1: add tool declarations to AGENTS.md |
| Agent crossing module boundaries | Pillar 2: add structural test enforcing boundary |
| Agent producing code that passes CI but breaks conventions | Pillar 2: convert convention to linter rule or structural test |
| Docs drifting from implementation | Pillar 1: add CI cross-link validation |
| Codebase growing, agent quality degrading | Pillar 3: schedule GC agent for dead code and unused exports |
| Agent ignoring AGENTS.md guidance | Check: is guidance generic advice or a specific failure lesson? Rewrite as failure ledger entry |
| Post-incident on agent-generated code | Add failure to AGENTS.md, add constraint to prevent recurrence |

Pillar 1: Context Engineering

Goal: Make the repository a knowledge product that agents can consume without hallucination.

1.1 AGENTS.md as Failure Ledger

Every line in AGENTS.md should trace to a real failure, not generic best practice.

Pattern

BAD: "Follow clean code principles"
BAD: "Use meaningful variable names"
GOOD: "Never import from packages/internal (agent imported shared/db directly on 2025-12-03, broke build)"
GOOD: "Always use OrderService.create(), not Order.new (direct instantiation skips validation, incident #247)"

**Failure ledger entry format:**
```yaml
rule: Never call PaymentGateway directly from controllers
context: Agent bypassed service layer, sent duplicate charges (2025-11-15)
fix: Use PaymentService.charge() which handles idempotency
```

When writing or updating AGENTS.md:
  • If the rule doesn't reference a specific failure or concrete constraint → cut it
  • If the rule says "should" → rewrite as "must" with consequence
  • If the rule has no enforcement mechanism → pair it with a Pillar 2 constraint
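The ledger checks above lend themselves to automation. A minimal sketch, assuming each entry has already been loaded into a dict with the `rule`, `context`, and `fix` keys from the format above; the function name `lint_ledger_entry` is hypothetical and not part of any existing tool.

```python
import re

# Field names follow the failure ledger entry format; everything else is illustrative.
REQUIRED_FIELDS = ("rule", "context", "fix")


def lint_ledger_entry(entry: dict) -> list[str]:
    """Return a list of problems found in one failure-ledger entry."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not str(entry.get(field, "")).strip():
            problems.append(f"missing field: {field}")
    rule = str(entry.get("rule", ""))
    # "should" is advisory wording; the guidance asks for "must" plus a consequence.
    if re.search(r"\bshould\b", rule, flags=re.IGNORECASE):
        problems.append('rule says "should": rewrite as "must" with a consequence')
    return problems


good_entry = {
    "rule": "Never call PaymentGateway directly from controllers",
    "context": "Agent bypassed service layer, sent duplicate charges (2025-11-15)",
    "fix": "Use PaymentService.charge() which handles idempotency",
}
vague_entry = {"rule": "Code should be clean"}

print(lint_ledger_entry(good_entry))
print(lint_ledger_entry(vague_entry))
```

An entry with no `context` is the generic-advice case: it cannot be traced to a failure, so it is a candidate for cutting.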

1.2 Tool Declaration Mandate

Undeclared tools don't exist for agents. Explicitly list available tools, commands, and scripts.

Available Tools

  • bin/test: Run test suite (prefer over raw pytest/rspec)
  • bin/lint: Run linters with auto-fix
  • bin/db-reset: Reset dev database
  • make deploy-staging: Deploy to staging (requires approval)

DO NOT USE

  • rm -rf on any directory
  • Direct database queries in production
  • curl to external APIs without going through ApiClient

If an agent uses a tool not in this list → add it (if valid) or add it to DO NOT USE (if dangerous).
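One way to act on the rule above is to classify each command an agent ran against the two lists. A rough sketch, assuming commands are available as plain strings; `classify_command` is a hypothetical helper, and treating every `curl` call as forbidden is a simplification of the "without going through ApiClient" condition.

```python
import shlex

# Mirrors the Available Tools list.
ALLOWED = ("bin/test", "bin/lint", "bin/db-reset", "make deploy-staging")
# Simplified from the DO NOT USE list (assumption: any curl call is flagged).
FORBIDDEN = ("rm -rf", "curl")


def _matches(command: str, prefix: str) -> bool:
    """True if the command is the prefix itself or the prefix plus arguments."""
    return command == prefix or command.startswith(prefix + " ")


def classify_command(command: str) -> str:
    """Return 'forbidden', 'allowed', or 'undeclared' for one shell command."""
    normalized = " ".join(shlex.split(command))  # collapse whitespace and quoting
    if any(_matches(normalized, p) for p in FORBIDDEN):
        return "forbidden"
    if any(_matches(normalized, p) for p in ALLOWED):
        return "allowed"
    return "undeclared"


for cmd in ("bin/test --fast", "rm -rf build", "pytest tests/"):
    print(cmd, "->", classify_command(cmd))
```

An "undeclared" result is the trigger for the decision above: declare the tool or forbid it.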

1.3 Docs as System of Record

RULE: docs/ is canonical truth
ENFORCEMENT: CI validates cross-links between docs/ and source code
MECHANISM:
  - Every public API must have a corresponding docs/ entry
  - CI script checks: for each @api-doc tag in source → matching file in docs/
  - Broken link = CI failure, not warning
Stale docs are worse than no docs — agents trust what they read.
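The cross-link mechanism can be sketched in a few lines. This assumes the `@api-doc` tag is followed by the path of its docs file (for example `@api-doc docs/api/orders.md`); the source does not specify the tag syntax, so that format and the function name are assumptions.

```python
import re

# Assumed tag syntax: "@api-doc <path-to-doc-file>".
API_DOC_TAG = re.compile(r"@api-doc\s+(\S+)")


def find_broken_doc_links(sources: dict[str, str], doc_files: set[str]) -> list[tuple[str, str]]:
    """Return (source_path, doc_path) pairs where the referenced doc file is missing."""
    broken = []
    for path, text in sorted(sources.items()):
        for target in API_DOC_TAG.findall(text):
            if target not in doc_files:
                broken.append((path, target))
    return broken


sources = {
    "src/api/orders.ts": "// @api-doc docs/api/orders.md\nexport function createOrder() {}",
    "src/api/refunds.ts": "// @api-doc docs/api/refunds.md\nexport function refund() {}",
}
doc_files = {"docs/api/orders.md"}

broken = find_broken_doc_links(sources, doc_files)
print(broken)
```

In CI, a non-empty result would exit non-zero so that a broken link is a failure rather than a warning.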

1.4 Isolated Observability Per Task

Each agent task gets its own log context, not a shared monitoring stream.
PATTERN:
  - Assign task_id to each agent invocation
  - Route logs to task-specific output (file, log group, trace)
  - Post-task: review task log for failures, add to AGENTS.md if new pattern
Shared monitoring hides individual agent failures in noise.
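The pattern above maps onto Python's standard `logging` module. A minimal sketch, assuming one logger per task; in-memory streams stand in for the task-specific file, log group, or trace, and `make_task_logger` is a hypothetical helper.

```python
import io
import logging
import uuid


def make_task_logger(task_id: str, stream: io.StringIO) -> logging.Logger:
    """Create a logger whose output goes only to this task's stream."""
    logger = logging.getLogger(f"agent.task.{task_id}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep task output out of any shared root handlers
    handler = logging.StreamHandler(stream)
    handler.setFormatter(logging.Formatter(f"task={task_id} %(levelname)s %(message)s"))
    logger.addHandler(handler)
    return logger


task_a, task_b = uuid.uuid4().hex, uuid.uuid4().hex
out_a, out_b = io.StringIO(), io.StringIO()
log_a = make_task_logger(task_a, out_a)
log_b = make_task_logger(task_b, out_b)

log_a.error("import violation in src/api/handler.ts")
log_b.info("tests passed")

print(out_a.getvalue(), end="")
print(out_b.getvalue(), end="")
```

Each stream now holds exactly one task's history, which is what the post-task review step reads.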

Pillar 2: Architectural Constraints

Goal: Make violations impossible or immediately visible through deterministic enforcement.

2.1 Teaching Linter Errors

Failure messages must include the remediation, not just the violation.

BAD linter output

ERROR: Import violation in src/api/handler.ts

GOOD linter output

ERROR: Import violation in src/api/handler.ts
  ↳ Cannot import from 'packages/db' in 'src/api/'
  ↳ Use 'packages/db-client' instead (facade pattern)
  ↳ See: docs/architecture/data-access.md

Agents read error messages literally. A teaching error message prevents the same mistake on the next attempt.
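A custom lint rule can carry its remediation as data so that every violation prints the same teaching message. The sketch below is illustrative only; `ImportRule` and `teaching_error` are hypothetical names, not an existing linter API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ImportRule:
    forbidden: str    # package that must not be imported
    scope: str        # directory the rule applies to
    replacement: str  # what to import instead
    reason: str       # short rationale shown in the message
    doc: str          # where the agent can read more


def teaching_error(file_path: str, rule: ImportRule) -> str:
    """Format a violation together with its remediation and a doc pointer."""
    return "\n".join([
        f"ERROR: Import violation in {file_path}",
        f"  ↳ Cannot import from '{rule.forbidden}' in '{rule.scope}'",
        f"  ↳ Use '{rule.replacement}' instead ({rule.reason})",
        f"  ↳ See: {rule.doc}",
    ])


rule = ImportRule(
    forbidden="packages/db",
    scope="src/api/",
    replacement="packages/db-client",
    reason="facade pattern",
    doc="docs/architecture/data-access.md",
)
message = teaching_error("src/api/handler.ts", rule)
print(message)
```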

2.2 Structural Tests

Enforce architectural boundaries in CI, not in documentation.

Pseudocode structural tests

test "no cross-boundary imports":
  For each FILE in src/api/**:
    IMPORTS = parse_imports(FILE)
    FORBIDDEN = ["packages/internal", "src/admin", "src/worker"]
    assert IMPORTS intersection FORBIDDEN == empty

test "dependency direction":
  LAYERS = [presentation, application, domain, infrastructure]
  For each LAYER in LAYERS:
    For each IMPORT in LAYER.imports:
      assert IMPORT.layer_index >= LAYER.index  # only import same or lower

test "API contract stability":
  CURRENT = parse_openapi("api/openapi.yml")
  PREVIOUS = parse_openapi("api/openapi.yml", ref="main")
  assert no_breaking_changes(CURRENT, PREVIOUS)
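The first pseudocode test, "no cross-boundary imports", could look roughly like this in Python. It is a sketch under assumptions: file contents are passed in as strings, and imports are found with a simple regex over JavaScript/TypeScript-style `import ... from '...'` statements rather than a real parser.

```python
import re

# Matches the module specifier in `import x from '...'` and `import '...'`.
IMPORT_SPEC = re.compile(r"""(?:from|import)\s+['"]([^'"]+)['"]""")
FORBIDDEN = ("packages/internal", "src/admin", "src/worker")


def parse_imports(source: str) -> list[str]:
    """Return every module specifier imported by the given source text."""
    return IMPORT_SPEC.findall(source)


def boundary_violations(files: dict[str, str]) -> list[tuple[str, str]]:
    """Return (file, import) pairs where a file under src/api/ imports a forbidden path."""
    violations = []
    for path, source in sorted(files.items()):
        if not path.startswith("src/api/"):
            continue  # the rule only constrains the API layer
        for spec in parse_imports(source):
            if any(spec == f or spec.startswith(f + "/") for f in FORBIDDEN):
                violations.append((path, spec))
    return violations


files = {
    "src/api/handler.ts": "import { db } from 'packages/internal/db';\nimport { log } from \"packages/logger\";",
    "src/api/routes.ts": "import { client } from 'packages/db-client';",
    "src/worker/job.ts": "import { secret } from 'packages/internal/keys';",
}
violations = boundary_violations(files)
print(violations)
```

Run as a CI test, the assertion would simply be that the returned list is empty.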

2.3 Numeric CI Gates

Every check must be binary pass/fail. Advisory warnings are invisible to agents.

| Check Type | Gate Implementation |
|---|---|
| Test coverage | coverage >= 80% or fail |
| Bundle size | size <= 250KB or fail |
| Lint errors | errors == 0 or fail |
| Type errors | errors == 0 or fail |
| Security vulns | critical == 0 or fail |

Rule: If it matters, it's a gate. If it's a warning, agents will ignore it.
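The gates in the table reduce to a comparison per metric. A minimal sketch, assuming the metrics have already been collected into a dict; the metric key names are made up for illustration.

```python
import operator
from typing import Callable, NamedTuple


class Gate(NamedTuple):
    metric: str
    compare: Callable[[float, float], bool]
    threshold: float


# Thresholds come from the table above; metric names are hypothetical.
GATES = [
    Gate("coverage_percent", operator.ge, 80),
    Gate("bundle_size_kb", operator.le, 250),
    Gate("lint_errors", operator.eq, 0),
    Gate("type_errors", operator.eq, 0),
    Gate("critical_vulns", operator.eq, 0),
]


def failed_gates(metrics: dict[str, float]) -> list[str]:
    """Return the metrics whose gate fails; a missing metric counts as a failure."""
    failures = []
    for gate in GATES:
        value = metrics.get(gate.metric)
        if value is None or not gate.compare(value, gate.threshold):
            failures.append(gate.metric)
    return failures


metrics = {
    "coverage_percent": 78.5,
    "bundle_size_kb": 240,
    "lint_errors": 0,
    "type_errors": 0,
    "critical_vulns": 0,
}
print(failed_gates(metrics))
```

A CI step would exit non-zero whenever the list is non-empty, which keeps every check binary.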

2.4 Dependency Layering

Make dependency direction explicit and enforced.

.dependency-layers.yml (or equivalent config)

layers:
  - name: presentation
    paths: ["src/ui/", "src/api/routes/"]
    can_import: [application, domain]
  - name: application
    paths: ["src/services/", "src/use-cases/"]
    can_import: [domain, infrastructure]
  - name: domain
    paths: ["src/models/", "src/entities/"]
    can_import: []  # domain has no dependencies
  - name: infrastructure
    paths: ["src/db/", "src/external/"]
    can_import: [domain]

Without explicit layers, agents will create circular dependencies.
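Enforcing the layer config comes down to a lookup per import edge. The sketch below hard-codes the same layers as a Python dict instead of reading the YAML file, and assumes that imports within one layer are allowed; both choices are assumptions for illustration.

```python
from typing import Optional

# Same content as the layer config above, inlined to keep the sketch self-contained.
LAYERS = {
    "presentation": {"paths": ("src/ui/", "src/api/routes/"), "can_import": {"application", "domain"}},
    "application": {"paths": ("src/services/", "src/use-cases/"), "can_import": {"domain", "infrastructure"}},
    "domain": {"paths": ("src/models/", "src/entities/"), "can_import": set()},
    "infrastructure": {"paths": ("src/db/", "src/external/"), "can_import": {"domain"}},
}


def layer_of(path: str) -> Optional[str]:
    """Return the layer that owns the path, or None if no layer claims it."""
    for name, layer in LAYERS.items():
        if path.startswith(layer["paths"]):
            return name
    return None


def is_allowed(importer: str, imported: str) -> bool:
    """Check one import edge against the can_import lists."""
    src, dst = layer_of(importer), layer_of(imported)
    if src is None or dst is None:
        return True  # files outside the layered paths are not judged here
    return src == dst or dst in LAYERS[src]["can_import"]


print(is_allowed("src/services/pay.ts", "src/db/conn.ts"))
print(is_allowed("src/models/order.ts", "src/db/conn.ts"))
```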

Pillar 3: Garbage Collection

Goal: Actively fight entropy rather than accumulating technical debt.

3.1 Background GC Agents

Schedule periodic scans that produce small, auto-mergeable PRs.
GC_TASKS:
  - dead_code: find unused exports, unreachable functions → remove
  - stale_docs: find docs/ entries with no matching source → flag
  - unused_deps: find package.json/Gemfile entries with no imports → remove
  - orphan_tests: find test files with no matching source file → flag
  - config_drift: diff .env.example vs actual config usage → reconcile

FREQUENCY: weekly for active repos, monthly for stable repos
OUTPUT: one small PR per GC task (not one mega-PR)
MERGE: auto-merge if CI passes, otherwise flag for review
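As an illustration of one GC task, the `orphan_tests` scan can be approximated by matching file names. This sketch assumes a `test_<name>` naming convention and compares file stems only; a real repo may need a different mapping.

```python
from pathlib import PurePosixPath


def orphan_tests(source_files: list[str], test_files: list[str]) -> list[str]:
    """Return test files whose name matches no source file (assumes test_<name> naming)."""
    source_stems = {PurePosixPath(p).stem for p in source_files}
    orphans = []
    for test in test_files:
        stem = PurePosixPath(test).stem
        target = stem[len("test_"):] if stem.startswith("test_") else stem
        if target not in source_stems:
            orphans.append(test)
    return sorted(orphans)


sources = ["src/orders.py", "src/payments.py"]
tests = ["tests/test_orders.py", "tests/test_refunds.py"]
print(orphan_tests(sources, tests))
```

Per the task list above, orphans are flagged for review rather than removed automatically.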

3.2 Custom Verification Tools

Build repo-specific fast-feedback tools instead of relying on generic linters.
PRINCIPLE: fast feedback > comprehensive analysis

Examples:
  - bin/check-api-contracts → validates OpenAPI spec matches routes (5s)
  - bin/check-imports → validates dependency layers (2s)
  - bin/check-docs → validates doc cross-links (3s)

RULE: if a custom check takes > 30s, it's too slow for agent feedback loops
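The 30-second rule can itself be checked by timing each custom check. A small sketch, assuming a check is any callable that returns a truthy value on success; `timed_check` is a hypothetical wrapper.

```python
import time

BUDGET_SECONDS = 30.0  # from the rule above


def timed_check(name: str, check, budget: float = BUDGET_SECONDS) -> dict:
    """Run one check and report its result, duration, and whether it met the time budget."""
    start = time.perf_counter()
    passed = bool(check())
    elapsed = time.perf_counter() - start
    return {"name": name, "passed": passed, "seconds": elapsed, "within_budget": elapsed <= budget}


fast = timed_check("check-imports", lambda: True)
# A deliberately slow check measured against a tiny budget to show the failing case.
slow = timed_check("check-docs", lambda: time.sleep(0.05) or True, budget=0.01)
print(fast["within_budget"], slow["within_budget"])
```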

3.3 Feedback Loop Discipline

When an agent produces bad output, follow this diagnostic order:
1. Check AGENTS.md — is the constraint documented?
   If NO → add it (Pillar 1 fix)
   If YES → is it enforced in CI?

2. Check CI — does a gate catch this failure?
   If NO → add structural test or linter rule (Pillar 2 fix)
   If YES → is the error message teaching?

3. Check error message — does it include remediation?
   If NO → improve the error message (Pillar 2.1 fix)
   If YES → the harness is correct, investigate agent-specific issue

NEVER: Skip to "the agent is broken" without completing steps 1-3
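The diagnostic order can be written as a small decision function, which makes the "harness first" ordering explicit. The three boolean inputs are assumed to come from a human or a script answering steps 1 to 3; the function name is hypothetical.

```python
def diagnose(documented: bool, gated: bool, teaches: bool) -> str:
    """Walk steps 1-3 in order and return the first fix that applies."""
    if not documented:
        return "Pillar 1 fix: add the constraint to AGENTS.md"
    if not gated:
        return "Pillar 2 fix: add a structural test or linter rule"
    if not teaches:
        return "Pillar 2.1 fix: add remediation to the error message"
    # Only reachable after all three harness checks have passed.
    return "Harness is correct: investigate agent-specific issue"


print(diagnose(documented=True, gated=False, teaches=False))
```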

Harness Assessment Checklist

Evaluate an existing repo's harness maturity:

Context Engineering

  • AGENTS.md exists at repo root
  • AGENTS.md entries reference specific failures, not generic advice
  • Available tools/scripts are explicitly listed
  • Forbidden operations are explicitly listed
  • docs/ has CI-enforced cross-links to source

Architectural Constraints

  • Linter errors include remediation instructions
  • At least one structural test enforces module boundaries
  • All CI checks are binary pass/fail (no advisory warnings)
  • Dependency direction is documented and enforced

Garbage Collection

  • Dead code removal happens on a schedule (not ad-hoc)
  • Stale docs are detected automatically
  • Custom verification tools exist for repo-specific invariants
  • Post-failure diagnosis follows harness-first order (3.3)

Scoring: Count passing items per pillar. Fewer than 3 passing items in any pillar = harness gap.
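The scoring rule is simple enough to compute directly. A sketch, assuming each pillar's checklist has been recorded as a list of booleans (one per item) and using a minimum of 3 passing items per pillar.

```python
def harness_gaps(checklist: dict[str, list[bool]], minimum: int = 3) -> dict[str, int]:
    """Return each pillar whose passing-item count is below the minimum, with its score."""
    scores = {pillar: sum(items) for pillar, items in checklist.items()}
    return {pillar: score for pillar, score in scores.items() if score < minimum}


checklist = {
    "Context Engineering": [True, True, True, False, False],
    "Architectural Constraints": [True, False, False, True],
    "Garbage Collection": [True, True, True, True],
}
print(harness_gaps(checklist))
```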

Cross-References

  • hierarchical-agents: AGENTS.md structure, generation process, JIT indexing
  • tdd-workflow: structural test implementation via red-green-refactor
  • structured-logging: observability patterns for task-level isolation
  • quality-gate agent: CI gate execution mechanics
  • references/patterns-catalog.md: topology templates, AGENTS.md patterns, structural test examples, GC agent templates