harness-engineering
Harness Engineering
Core Principle
The repo is the harness. Agent failures are harness failures. When an agent breaks a rule, fix the harness — not the agent.
| Pillar | What It Solves |
|---|---|
| Context Engineering | Agent hallucinations, wrong tool usage, stale docs |
| Architectural Constraints | Boundary violations, silent regressions, ambiguous failures |
| Garbage Collection | Entropy accumulation, dead code, doc drift |
When to Apply
| Situation | Action |
|---|---|
| New repo setup for AI development | Apply all three pillars from day one |
| Agent repeatedly hallucinating tools/APIs | Pillar 1: add tool declarations to AGENTS.md |
| Agent crossing module boundaries | Pillar 2: add structural test enforcing boundary |
| Agent producing code that passes CI but breaks conventions | Pillar 2: convert convention to linter rule or structural test |
| Docs drifting from implementation | Pillar 1: add CI cross-link validation |
| Codebase growing, agent quality degrading | Pillar 3: schedule GC agent for dead code and unused exports |
| Agent ignoring AGENTS.md guidance | Check: is guidance generic advice or a specific failure lesson? Rewrite as failure ledger entry |
| Post-incident on agent-generated code | Add failure to AGENTS.md, add constraint to prevent recurrence |
Pillar 1: Context Engineering
Goal: Make the repository a knowledge product that agents can consume without hallucination.
1.1 AGENTS.md as Failure Ledger
Every line in AGENTS.md should trace to a real failure, not generic best practice.
Pattern:
BAD: "Follow clean code principles"
BAD: "Use meaningful variable names"
GOOD: "Never import from packages/internal — agent imported shared/db directly on 2025-12-03, broke build"
GOOD: "Always use OrderService.create(), not Order.new — direct instantiation skips validation (incident #247)"
**Failure ledger entry format:**
```yaml
rule: Never call PaymentGateway directly from controllers
context: Agent bypassed service layer, sent duplicate charges (2025-11-15)
fix: Use PaymentService.charge() which handles idempotency
```
When writing or updating AGENTS.md:
- If the rule doesn't reference a specific failure or concrete constraint → cut it
- If the rule says "should" → rewrite as "must" with consequence
- If the rule has no enforcement mechanism → pair it with a Pillar 2 constraint
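The three review rules above can be checked mechanically. The following is a minimal sketch, assuming ledger entries are available as dictionaries with the `rule`/`context`/`fix` fields shown earlier. The function name `lint_entry` and the evidence pattern (an ISO date or an `incident #N` reference) are illustrative assumptions, not part of any existing tool.

```python
import re

# Field names follow the rule/context/fix entry format; everything else is assumed.
REQUIRED_FIELDS = ("rule", "context", "fix")
# Assumed evidence markers: an ISO date or an "incident #N" reference.
EVIDENCE = re.compile(r"\d{4}-\d{2}-\d{2}|incident #\d+", re.IGNORECASE)


def lint_entry(entry: dict) -> list[str]:
    """Return the problems found in one failure-ledger entry."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not entry.get(field):
            problems.append(f"missing field: {field}")
    if re.search(r"\bshould\b", entry.get("rule", ""), re.IGNORECASE):
        problems.append('rule says "should": rewrite as "must" with a consequence')
    if not EVIDENCE.search(entry.get("context", "")):
        problems.append("context does not reference a specific failure")
    return problems


good = {
    "rule": "Never call PaymentGateway directly from controllers",
    "context": "Agent bypassed service layer, sent duplicate charges (2025-11-15)",
    "fix": "Use PaymentService.charge() which handles idempotency",
}
vague = {"rule": "Code should be clean", "context": "General best practice", "fix": ""}

print(lint_entry(good))   # no problems reported
print(lint_entry(vague))  # three problems reported
```

A check like this does not judge whether a rule is useful; it only flags entries that have the shape of generic advice.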
1.2 Tool Declaration Mandate
Undeclared tools don't exist for agents. Explicitly list available tools, commands, and scripts.
Available Tools
- `bin/test`: Run test suite (prefer over raw pytest/rspec)
- `bin/lint`: Run linters with auto-fix
- `bin/db-reset`: Reset dev database
- `make deploy-staging`: Deploy to staging (requires approval)
DO NOT USE
- `rm -rf` on any directory
- Direct database queries in production
- `curl` to external APIs without going through ApiClient
If an agent uses a tool not in this list → add it (if valid) or add it to DO NOT USE (if dangerous).
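One way to act on the last rule is to classify each command an agent ran against the declared lists. This is a rough sketch: the `classify` function and the way the lists are represented are assumptions for illustration. It also treats every `curl` call as forbidden, which is stricter than the rule as written.

```python
import shlex

# Hypothetical in-memory copies of the two AGENTS.md sections.
AVAILABLE = {"bin/test", "bin/lint", "bin/db-reset", "make deploy-staging"}
# Simplification: any curl call counts as forbidden, not only calls that bypass ApiClient.
FORBIDDEN = ("rm -rf", "curl")


def _matches(command: str, tool: str) -> bool:
    """True when the command is the tool itself or the tool followed by arguments."""
    return command == tool or command.startswith(tool + " ")


def classify(command: str) -> str:
    """Classify a shell command as 'allowed', 'forbidden', or 'undeclared'."""
    normalized = " ".join(shlex.split(command))
    if any(_matches(normalized, tool) for tool in FORBIDDEN):
        return "forbidden"
    if any(_matches(normalized, tool) for tool in AVAILABLE):
        return "allowed"
    return "undeclared"


for cmd in ("bin/test tests/unit", "rm  -rf build/", "pytest -x"):
    print(cmd, "->", classify(cmd))
```

An `undeclared` result is the signal to update AGENTS.md, either by listing the tool or by adding it to DO NOT USE.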
1.3 Docs as System of Record
RULE: docs/ is canonical truth
ENFORCEMENT: CI validates cross-links between docs/ and source code
MECHANISM:
- Every public API must have a corresponding docs/ entry
- CI script checks: for each @api-doc tag in source → matching file in docs/
- Broken link = CI failure, not warning
Stale docs are worse than no docs: agents trust what they read.
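The cross-link check described above could look roughly like this. The tag syntax is an assumption: the text only says that each `@api-doc` tag must map to a file in `docs/`, so the form `@api-doc <path relative to docs/>` and the function name `find_broken_links` are hypothetical.

```python
import re
import tempfile
from pathlib import Path

# Assumed tag shape: "@api-doc <path relative to docs/>".
TAG = re.compile(r"@api-doc\s+(\S+)")


def find_broken_links(src_root: Path, docs_root: Path) -> list[str]:
    """Return 'source_file -> doc_path' for every tag whose doc file is missing."""
    broken = []
    for source in sorted(src_root.rglob("*")):
        if not source.is_file():
            continue
        text = source.read_text(encoding="utf-8", errors="ignore")
        for doc_path in TAG.findall(text):
            if not (docs_root / doc_path).is_file():
                broken.append(f"{source.relative_to(src_root)} -> {doc_path}")
    return broken


def demo() -> list[str]:
    """Build a throwaway repo layout and run the check against it."""
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "src").mkdir()
        (root / "docs").mkdir()
        (root / "docs" / "orders.md").write_text("# Orders API\n", encoding="utf-8")
        (root / "src" / "orders.py").write_text("# @api-doc orders.md\n", encoding="utf-8")
        (root / "src" / "refunds.py").write_text("# @api-doc refunds.md\n", encoding="utf-8")
        return find_broken_links(root / "src", root / "docs")


print(demo())
```

In CI, a non-empty result would exit non-zero so that a broken link fails the build rather than producing a warning.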
1.4 Isolated Observability Per Task
Each agent task gets its own log context, not a shared monitoring stream.
PATTERN:
- Assign task_id to each agent invocation
- Route logs to task-specific output (file, log group, trace)
- Post-task: review task log for failures, add to AGENTS.md if new pattern
Shared monitoring hides individual agent failures in noise.
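As an illustration of the pattern above, the sketch below routes each task's records to its own file using Python's standard `logging` module. The helper names, the logger naming scheme `agent.task.<task_id>`, and the one-file-per-task layout are assumptions; a log group or trace would serve the same purpose.

```python
import logging
import tempfile
from pathlib import Path


def open_task_logger(log_dir: Path, task_id: str) -> logging.Logger:
    """Create a logger that writes only to <log_dir>/<task_id>.log."""
    logger = logging.getLogger(f"agent.task.{task_id}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep task records out of any shared root handlers
    handler = logging.FileHandler(log_dir / f"{task_id}.log", encoding="utf-8")
    handler.setFormatter(logging.Formatter("%(levelname)s %(name)s %(message)s"))
    logger.addHandler(handler)
    return logger


def close_task_logger(logger: logging.Logger) -> None:
    """Flush and detach the task's handlers once the task is finished."""
    for handler in list(logger.handlers):
        handler.close()
        logger.removeHandler(handler)


def demo() -> dict[str, str]:
    """Run two fake tasks and return each task's log contents keyed by task_id."""
    with tempfile.TemporaryDirectory() as tmp:
        log_dir = Path(tmp)
        task_a = open_task_logger(log_dir, "task-a")
        task_b = open_task_logger(log_dir, "task-b")
        task_a.error("import boundary violation in src/api/handler.ts")
        task_b.info("tests passed")
        close_task_logger(task_a)
        close_task_logger(task_b)
        return {p.stem: p.read_text(encoding="utf-8") for p in sorted(log_dir.glob("*.log"))}


for task_id, contents in demo().items():
    print(task_id, "->", contents.strip())
```

The post-task review then reads a single file per task instead of filtering a shared stream.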
Pillar 2: Architectural Constraints
Goal: Make violations impossible or immediately visible through deterministic enforcement.
2.1 Teaching Linter Errors
Failure messages must include remediation, not just violation.
BAD linter output:
ERROR: Import violation in src/api/handler.ts
GOOD linter output:
ERROR: Import violation in src/api/handler.ts
  ↳ Cannot import from 'packages/db' in 'src/api/'
  ↳ Use 'packages/db-client' instead (facade pattern)
  ↳ See: docs/architecture/data-access.md
Agents read error messages literally. A teaching error message helps prevent the same mistake on the next attempt.
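A custom lint rule can produce the GOOD output above from structured data instead of hand-written strings. This is a minimal sketch: the `Violation` dataclass and `format_teaching_error` are hypothetical names, not the API of any particular linter.

```python
from dataclasses import dataclass


@dataclass
class Violation:
    """What a teaching error needs: the violation, the fix, and where to read more."""
    file: str
    forbidden: str
    scope: str
    replacement: str
    reason: str
    doc: str


def format_teaching_error(v: Violation) -> str:
    """Render a violation as a four-line message that includes its remediation."""
    return "\n".join([
        f"ERROR: Import violation in {v.file}",
        f"  ↳ Cannot import from '{v.forbidden}' in '{v.scope}'",
        f"  ↳ Use '{v.replacement}' instead ({v.reason})",
        f"  ↳ See: {v.doc}",
    ])


example = Violation(
    file="src/api/handler.ts",
    forbidden="packages/db",
    scope="src/api/",
    replacement="packages/db-client",
    reason="facade pattern",
    doc="docs/architecture/data-access.md",
)
print(format_teaching_error(example))
```

Making `replacement` and `doc` required fields means a rule cannot be registered without its remediation.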
2.2 Structural Tests
Enforce architectural boundaries in CI, not in documentation.
Pseudocode structural tests:
test "no cross-boundary imports":
  For each FILE in src/api/**:
    IMPORTS = parse_imports(FILE)
    FORBIDDEN = ["packages/internal", "src/admin", "src/worker"]
    assert IMPORTS intersection FORBIDDEN == empty
test "dependency direction":
  LAYERS = [presentation, application, infrastructure, domain]  # outermost to innermost, consistent with 2.4
  For each LAYER in LAYERS:
    For each IMPORT in LAYER.imports:
      assert IMPORT.layer_index >= LAYER.index  # only import from the same layer or one further in
test "API contract stability":
  CURRENT = parse_openapi("api/openapi.yml")
  PREVIOUS = parse_openapi("api/openapi.yml", ref="main")
  assert no_breaking_changes(CURRENT, PREVIOUS)
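The first pseudocode test can be turned into something runnable with little machinery. The sketch below assumes ES-module style source and extracts import specifiers with a regular expression, a simplification that misses relative paths and dynamic imports. The function names and sample files are illustrative.

```python
import re

# Rough matcher for `import x from '...'` and `import '...'`; a real test would use a parser.
IMPORT = re.compile(r"""(?:from|import)\s+['"]([^'"]+)['"]""")
FORBIDDEN = ("packages/internal", "src/admin", "src/worker")


def forbidden_imports(source: str) -> list[str]:
    """Return the imported specifiers that fall under a forbidden prefix."""
    found = IMPORT.findall(source)
    return [s for s in found if any(s == f or s.startswith(f + "/") for f in FORBIDDEN)]


def violations(files: dict[str, str]) -> dict[str, list[str]]:
    """Map each file under src/api/ to its forbidden imports, omitting clean files."""
    result = {}
    for path, source in files.items():
        if path.startswith("src/api/"):
            bad = forbidden_imports(source)
            if bad:
                result[path] = bad
    return result


# Hypothetical file contents standing in for a checkout.
SAMPLE = {
    "src/api/handler.ts": "import { db } from 'packages/internal/db';\n"
                          "import { client } from 'packages/db-client';\n",
    "src/api/health.ts": "import 'packages/db-client';\n",
    "src/admin/tools.ts": "import { x } from 'src/worker/jobs';\n",
}

print(violations(SAMPLE))
```

A CI test would assert that `violations(...)` is empty for the real tree, so a boundary crossing fails the build.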
2.3 Numeric CI Gates
Every check must be binary pass/fail. Advisory warnings are invisible to agents.
| Check Type | Gate Implementation |
|---|---|
| Test coverage | |
| Bundle size | |
| Lint errors | |
| Type errors | |
| Security vulns | |
Rule: If it matters, it's a gate. If it's a warning, agents will ignore it.
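A binary gate reduces to comparing a measured number against a threshold and exiting non-zero on failure. The sketch below shows one way to express that. Every threshold and metric name in it is a hypothetical placeholder, since the text does not state concrete values.

```python
import operator

# Hypothetical gates: metric name -> (comparison, threshold). All values are placeholders.
GATES = {
    "coverage_pct": (operator.ge, 80.0),
    "bundle_kb": (operator.le, 500.0),
    "lint_errors": (operator.eq, 0),
    "type_errors": (operator.eq, 0),
    "high_severity_vulns": (operator.eq, 0),
}


def evaluate_gates(metrics: dict) -> list[str]:
    """Return one failure message per gate that is missing or not satisfied."""
    failures = []
    for name, (compare, threshold) in GATES.items():
        value = metrics.get(name)
        if value is None or not compare(value, threshold):
            failures.append(f"{name}: got {value}, required {compare.__name__} {threshold}")
    return failures


def exit_code(metrics: dict) -> int:
    """Collapse all gates into a single pass (0) or fail (1) result."""
    return 1 if evaluate_gates(metrics) else 0


passing = {"coverage_pct": 84.2, "bundle_kb": 410.0, "lint_errors": 0,
           "type_errors": 0, "high_severity_vulns": 0}
failing = dict(passing, coverage_pct=79.9)

print(exit_code(passing), evaluate_gates(passing))
print(exit_code(failing), evaluate_gates(failing))
```

A missing metric counts as a failure here, so a check that silently stops reporting cannot pass the gate.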
2.4 Dependency Layering
Make dependency direction explicit and enforced.
.dependency-layers.yml (or equivalent config):
layers:
  - name: presentation
    paths: ["src/ui/", "src/api/routes/"]
    can_import: [application, domain]
  - name: application
    paths: ["src/services/", "src/use-cases/"]
    can_import: [domain, infrastructure]
  - name: domain
    paths: ["src/models/", "src/entities/"]
    can_import: []  # domain has no dependencies
  - name: infrastructure
    paths: ["src/db/", "src/external/"]
    can_import: [domain]
Without explicit layers, agents tend to introduce circular dependencies.
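To show how such a config could be enforced, the sketch below mirrors the layers as Python data and checks a single import edge. It rests on two assumptions not stated in the config: imports within the same layer are allowed, and files outside every declared path are left unchecked.

```python
# The layer config above, mirrored as plain data so the example has no YAML dependency.
LAYERS = {
    "presentation": {"paths": ["src/ui/", "src/api/routes/"], "can_import": ["application", "domain"]},
    "application": {"paths": ["src/services/", "src/use-cases/"], "can_import": ["domain", "infrastructure"]},
    "domain": {"paths": ["src/models/", "src/entities/"], "can_import": []},
    "infrastructure": {"paths": ["src/db/", "src/external/"], "can_import": ["domain"]},
}


def layer_of(path: str):
    """Return the name of the layer whose path prefix matches, or None."""
    for name, layer in LAYERS.items():
        if any(path.startswith(prefix) for prefix in layer["paths"]):
            return name
    return None


def is_allowed(importer: str, imported: str) -> bool:
    """Check one import edge against the layer rules."""
    src, dst = layer_of(importer), layer_of(imported)
    if src is None or dst is None:
        return True  # assumption: files outside declared layers are not checked
    # Assumption: same-layer imports are allowed.
    return src == dst or dst in LAYERS[src]["can_import"]


edges = [
    ("src/api/routes/orders.ts", "src/services/orders.ts"),
    ("src/db/order_repo.ts", "src/models/order.ts"),
    ("src/models/order.ts", "src/db/connection.ts"),
    ("src/ui/page.ts", "src/db/connection.ts"),
]
for importer, imported in edges:
    print(importer, "->", imported, "allowed" if is_allowed(importer, imported) else "FORBIDDEN")
```

The last two edges are rejected: domain may import nothing outside itself, and presentation may not reach infrastructure directly.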
Pillar 3: Garbage Collection
Goal: Actively fight entropy rather than accumulating technical debt.
3.1 Background GC Agents
Schedule periodic scans that produce small, auto-mergeable PRs.
GC_TASKS:
- dead_code: find unused exports, unreachable functions → remove
- stale_docs: find docs/ entries with no matching source → flag
- unused_deps: find package.json/Gemfile entries with no imports → remove
- orphan_tests: find test files with no matching source file → flag
- config_drift: diff .env.example vs actual config usage → reconcile
FREQUENCY: weekly for active repos, monthly for stable repos
OUTPUT: one small PR per GC task (not one mega-PR)
MERGE: auto-merge if CI passes, otherwise flag for review
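As one concrete instance of a GC task, the sketch below approximates `unused_deps` for a `package.json`. It is a heuristic under stated assumptions: imports are found with a regular expression, only the `dependencies` section is read, and packages used solely by config or tooling would be reported as unused.

```python
import json
import re

# Matches `from 'x'`, `import 'x'`, and `require('x')`; a heuristic, not a parser.
IMPORT = re.compile(r"""(?:from\s+|import\s+|require\()\s*['"]([^'"]+)['"]""")


def package_name(specifier: str) -> str:
    """Map 'lodash/fp' to 'lodash' and '@scope/pkg/sub' to '@scope/pkg'."""
    parts = specifier.split("/")
    return "/".join(parts[:2]) if specifier.startswith("@") else parts[0]


def unused_dependencies(package_json: str, sources: list[str]) -> list[str]:
    """Return declared dependencies that no source text imports."""
    declared = json.loads(package_json).get("dependencies", {})
    used = {package_name(s) for text in sources for s in IMPORT.findall(text)}
    return sorted(name for name in declared if name not in used)


# Hypothetical manifest and sources standing in for a checkout.
MANIFEST = json.dumps({"dependencies": {"lodash": "^4", "express": "^4", "left-pad": "^1", "@acme/ui": "^2"}})
SOURCES = [
    "import { map } from 'lodash/fp';\nimport { Button } from '@acme/ui/button';\n",
    "const express = require('express');\nimport './local-module';\n",
]

print(unused_dependencies(MANIFEST, SOURCES))
```

Because the scan can produce false positives, the resulting PR still goes through CI before any auto-merge, as the MERGE rule requires.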
3.2 Custom Verification Tools
Build repo-specific fast-feedback tools instead of relying on generic linters.
PRINCIPLE: fast feedback > comprehensive analysis
Examples:
- bin/check-api-contracts → validates OpenAPI spec matches routes (5s)
- bin/check-imports → validates dependency layers (2s)
- bin/check-docs → validates doc cross-links (3s)
RULE: if a custom check takes > 30s, it's too slow for agent feedback loops
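The 30-second rule can itself be enforced by the wrapper that runs the checks. This is a sketch assuming each check is an ordinary command. The `run_check` name is hypothetical, and the demo runs a trivial Python one-liner in place of a real `bin/check-*` script.

```python
import subprocess
import sys
import time

BUDGET_SECONDS = 30.0  # the limit stated in the rule above


def run_check(argv: list[str], budget: float = BUDGET_SECONDS) -> tuple[bool, float]:
    """Run one check; it passes only if it exits 0 within the time budget."""
    start = time.monotonic()
    try:
        result = subprocess.run(argv, capture_output=True, text=True, timeout=budget)
    except subprocess.TimeoutExpired:
        return False, budget  # too slow counts as a failure, not a warning
    return result.returncode == 0, time.monotonic() - start


# Stand-in command; in a repo this would be something like ["bin/check-imports"].
ok, elapsed = run_check([sys.executable, "-c", "print('imports ok')"])
print("pass" if ok else "fail", f"{elapsed:.2f}s")
```

Treating a timeout as a failure keeps slow checks from drifting into the feedback loop unnoticed.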
3.3 Feedback Loop Discipline
When an agent produces bad output, follow this diagnostic order:
1. Check AGENTS.md — is the constraint documented?
If NO → add it (Pillar 1 fix)
If YES → is it enforced in CI?
2. Check CI — does a gate catch this failure?
If NO → add structural test or linter rule (Pillar 2 fix)
If YES → is the error message teaching?
3. Check error message — does it include remediation?
If NO → improve the error message (Pillar 2.1 fix)
If YES → the harness is correct, investigate agent-specific issue
NEVER: Skip to "the agent is broken" without completing steps 1-3
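The diagnostic order above is a short decision chain, sketched here as a function. The three boolean inputs are the answers to steps 1 to 3; the function name and the wording of the returned strings are illustrative.

```python
def diagnose(documented: bool, gated: bool, teaching: bool) -> str:
    """Return the first harness fix to apply, checking the steps in order."""
    if not documented:
        return "Pillar 1: add the constraint to AGENTS.md"
    if not gated:
        return "Pillar 2: add a structural test or linter rule"
    if not teaching:
        return "Pillar 2.1: improve the error message with remediation"
    return "Harness is correct: investigate the agent-specific issue"


print(diagnose(documented=True, gated=False, teaching=False))
print(diagnose(documented=True, gated=True, teaching=True))
```

The agent-specific branch is reachable only when all three harness checks pass, which is the point of the NEVER rule.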
Harness Assessment Checklist
Evaluate an existing repo's harness maturity:
Context Engineering
- AGENTS.md exists at repo root
- AGENTS.md entries reference specific failures, not generic advice
- Available tools/scripts are explicitly listed
- Forbidden operations are explicitly listed
- `docs/` has CI-enforced cross-links to source
Architectural Constraints
- Linter errors include remediation instructions
- At least one structural test enforces module boundaries
- All CI checks are binary pass/fail (no advisory warnings)
- Dependency direction is documented and enforced
Garbage Collection
- Dead code removal happens on a schedule (not ad-hoc)
- Stale docs are detected automatically
- Custom verification tools exist for repo-specific invariants
- Post-failure diagnosis follows harness-first order (3.3)
Scoring: Count passing items per pillar. Fewer than 3 passing items in any pillar = harness gap.
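The scoring rule is simple enough to compute directly. The sketch below reads the threshold as "fewer than three passing items in a pillar", which is an interpretation, and the example results are made up.

```python
def harness_gaps(checklist: dict[str, list[bool]], minimum: int = 3) -> list[str]:
    """Return the pillars whose count of passing items is below the minimum."""
    return [pillar for pillar, items in checklist.items() if sum(items) < minimum]


# Hypothetical assessment: one boolean per checklist item, in the order listed above.
assessment = {
    "Context Engineering": [True, True, True, False, False],
    "Architectural Constraints": [True, False, False, True],
    "Garbage Collection": [True, True, True, True],
}

print(harness_gaps(assessment))
```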
Cross-References
- `hierarchical-agents`: AGENTS.md structure, generation process, JIT indexing
- `tdd-workflow`: structural test implementation via red-green-refactor
- `structured-logging`: observability patterns for task-level isolation
- `quality-gate` agent: CI gate execution mechanics
- references/patterns-catalog.md: topology templates, AGENTS.md patterns, structural test examples, GC agent templates