harness-engineering

Harness: Agent-First Engineering Scaffolding

The harness is the scaffolding that makes coding agents effective in a repository. It encodes the knowledge, boundaries, and rules that an agent needs to reason about the full business domain directly from the repo itself.

The philosophy: agents execute, humans steer. The engineer's job is not to write code but to design environments, specify intent, and build feedback loops. The harness is what makes this possible.

Why It Matters

From the agent's point of view, anything it cannot access in context at run time effectively does not exist. Slack discussions, Google Docs, and tacit team knowledge are all invisible to it. The harness makes this knowledge legible by encoding it as repository-local, versioned artifacts.

A well-harnessed repo gives agents:

A map (AGENTS.md) — where to look, not what to memorize
Boundaries (domain specs) — what can depend on what
Rules (golden principles) — taste encoded as enforceable invariants
Quality baselines (scoring) — where the gaps are

Without this scaffolding, agents replicate whatever patterns they find — including bad ones. The harness is what prevents entropy from compounding.

Harness Maturity Model

The harness is built incrementally. Each level builds on the previous — don't try to jump from Level 0 to Level 4 in one pass. Assess the current level and build toward the next.

Level	Name	What it enables	Key artifacts
0	Unharnessed	Agents guess everything — no map, no rules	Nothing
1	Map	Agents know where to look and what the codebase does	AGENTS.md, ARCHITECTURE.md, docs/
2	Rules	Agents know what's allowed and what isn't	.harness/principles.yml, enforcement.yml, domains.yml
3	Feedback	Agents self-correct via quality signals and process patterns	.harness/quality.yml, doc-gardening, GC sweeps
4	Autonomy	Agents operate independently with defined escalation boundaries	Worktree isolation, escalation rules, agent-to-agent review

Each level compounds. A repo at Level 2 without Level 1 has rules nobody can find. A repo at Level 3 without Level 2 has quality grades but no way to enforce improvement. Build the foundation first.

During assessment (Phase 1), determine the current maturity level. During planning (Phase 2), target the next level — not all levels at once. Repeat the harness workflow to climb.

Critical exception: for repos where agents actively write code (agent-first or agent-assisted), create architecture boundaries (domain definitions and the forward-only dependency rule) alongside the knowledge layer instead of deferring them to a separate Level 2 pass. The article is explicit: strict architecture is a day-one prerequisite for agent-driven development, not a scaling concern. Without boundaries, agents introduce entropy faster than anyone can contain it.
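To make the assessment step concrete, here is a minimal Python sketch that infers the current level from the artifacts on disk and targets only the adjacent level. It assumes the Level 2 files live under .harness/, treats .harness/quality.yml as the only file-level signal for Level 3, and does not detect Level 4 at all, because its artifacts are practices rather than files. File presence is only a rough proxy: the assessment still has to judge whether the content is real.

```python
from pathlib import Path

# Detectable artifacts per level, from the maturity table. Level 4 (worktree
# isolation, escalation rules, agent review) is a set of practices, not files,
# so it is not detected here.
LEVEL_ARTIFACTS = {
    1: ["AGENTS.md", "ARCHITECTURE.md", "docs"],
    2: [".harness/principles.yml", ".harness/enforcement.yml", ".harness/domains.yml"],
    3: [".harness/quality.yml"],
}

def current_level(repo: Path) -> int:
    """Highest level whose artifacts exist, counting only an unbroken run from Level 1."""
    level = 0
    for lvl in sorted(LEVEL_ARTIFACTS):
        if not all((repo / p).exists() for p in LEVEL_ARTIFACTS[lvl]):
            break  # levels compound: quality grades without rules do not count
        level = lvl
    return level

def next_target(repo: Path) -> int:
    """Plan for the adjacent level only, never several levels at once."""
    return min(current_level(repo) + 1, 4)
```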

Multi-Turn Workflow

Building a harness is interactive. Work through the phases below, presenting results and waiting for user confirmation at each phase boundary. The user may want to skip, reorder, or expand phases — follow their lead.

Phase 1: Assess    → Analyze the repo, report agent readiness
Phase 2: Plan      → Propose a tailored harness, user confirms
Phase 3: Knowledge → AGENTS.md, docs/, ARCHITECTURE.md
Phase 4: Domains   → Identify domains, map layers, generate .harness/domains.yml
Phase 5: Enforce   → Golden principles, rules → .harness/principles.yml, enforcement.yml
Phase 6: Quality   → Grade domains → .harness/quality.yml
Phase 7: Process   → Doc-gardening, GC, review patterns
Phase 8: Verify    → Cross-check everything, report completeness

Depth-First Bootstrap

Build the harness depth-first, not breadth-first. Early harness work is slower than expected — not because the repo is broken, but because the environment is underspecified. Each phase unlocks the next:

AGENTS.md unlocks docs/ (agents know where to put deeper content)
docs/ unlocks architecture awareness (agents can read domain context)
Architecture specs unlock correct domain identification
Domain specs unlock meaningful enforcement rules
Enforcement rules unlock quality scoring (you can't grade what you can't check)

When something fails, the fix is almost never "try harder." Ask: what capability or context is missing? Then build that piece first.

For updates to an existing harness, the same phases apply, but the assessment diffs against the current .harness/ specs and updates only what has drifted.

Phase 1: Assess

Examine the repository across every harness layer. The goal is understanding what exists, what's missing, and what's misaligned — not immediately fixing things.

What to examine:

Area	What to look for
Structure	Directories, languages, package manifests, monorepo vs single-package
Tech stack	Frameworks, build systems, deployment targets, dependency managers
Agent config	AGENTS.md, CLAUDE.md, .cursor/, .github/copilot/, any existing agent instructions
Documentation	README, docs/, architecture docs, ADRs, inline doc comments
Code organization	Domain structure, module boundaries, import patterns, dependency graph
Tests	Frameworks, coverage, CI gates, test organization
Observability	Logging patterns (structured?), metrics, error handling, tracing
Process	PR templates, review workflow, CI/CD configuration
Team config	Agent-first (agents write 90%+ code), agent-assisted (mixed), or agent-ready (preparing for future agent use)
Dependencies	Are external libraries agent-legible? Stable APIs, good docs, training data representation?

Read references/assessment.md for the detailed checklist and scoring rubric.
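Some rows of this table can be checked mechanically before any judgment is needed. The Python sketch below is a hypothetical helper, not part of references/assessment.md. It reports which of the known agent-instruction locations exist, and it guesses monorepo vs single-package by counting package manifests outside vendored directories. The manifest list and the "more than one manifest means monorepo" rule are assumptions; adjust both for each stack.

```python
from pathlib import Path

# Locations named in the "Agent config" row of the table.
AGENT_CONFIG_PATHS = ["AGENTS.md", "CLAUDE.md", ".cursor", ".github/copilot"]
# Assumed manifest names; extend for other ecosystems.
MANIFESTS = {"package.json", "pyproject.toml", "Cargo.toml", "go.mod"}
# Vendored or generated trees that contain manifests of their own.
SKIP_DIRS = {".git", "node_modules", "target", ".venv"}

def find_agent_config(repo: Path) -> dict[str, bool]:
    """Map each known agent-instruction location to whether it exists."""
    return {p: (repo / p).exists() for p in AGENT_CONFIG_PATHS}

def count_manifests(repo: Path) -> int:
    """Count package manifests, ignoring anything under SKIP_DIRS."""
    return sum(
        1
        for p in repo.rglob("*")
        if p.is_file()
        and p.name in MANIFESTS
        and not SKIP_DIRS.intersection(p.relative_to(repo).parts)
    )

def looks_like_monorepo(repo: Path) -> bool:
    """Heuristic: more than one manifest suggests multiple packages."""
    return count_manifests(repo) > 1
```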

Team configuration shapes the harness: an agent-first repo needs strong enforcement and GC from day one. An agent-assisted repo needs clear boundaries but can rely more on human review. An agent-ready repo mostly needs the knowledge layer.

Output: An Agent Readiness Report — a structured summary of current state per layer, key findings, and recommended harness components (prioritized).

Present the report and wait for the user to confirm or adjust before planning.

Phase 2: Plan

Propose a harness plan tailored to this specific repo. Not every repo needs every component — right-size based on the assessment.

Sizing by maturity level:

Current level	Target	What to build
0 → 1	Map	AGENTS.md, ARCHITECTURE.md, core docs/ structure
1 → 2	Rules	.harness/domains.yml, principles.yml, enforcement.yml
2 → 3	Feedback	.harness/quality.yml, doc-gardening, GC patterns
3 → 4	Autonomy	Worktree isolation, escalation boundaries, agent review

Also consider repo size — a small repo (< 5k LOC) may only need Level 1–2, while a large codebase (50k+ LOC) benefits from all four levels.
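The sizing guidance can be expressed as a small Python function. It rests on one stated assumption. The text only says that small repos (< 5k LOC) may stop at Level 2 and that 50k+ LOC codebases benefit from all four levels. Treating every repo at or above 5k LOC as eligible for Level 4 is therefore a simplification made here, not part of the guidance.

```python
from typing import Optional

# What to build for each target level, copied from the table above.
WHAT_TO_BUILD = {
    1: "AGENTS.md, ARCHITECTURE.md, core docs/ structure",
    2: ".harness/domains.yml, principles.yml, enforcement.yml",
    3: ".harness/quality.yml, doc-gardening, GC patterns",
    4: "Worktree isolation, escalation boundaries, agent review",
}

def plan_target(current_level: int, loc: int) -> Optional[int]:
    """Return the next level to build toward, or None if no further level is warranted."""
    # Assumption: only the < 5k LOC cutoff is modeled; larger repos may climb to Level 4.
    ceiling = 2 if loc < 5_000 else 4
    target = current_level + 1  # one adjacent level per pass, never a jump
    return target if target <= ceiling else None
```

For a 20k LOC repo at Level 1, WHAT_TO_BUILD[plan_target(1, 20_000)] yields the Level 2 rule files.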

The plan should list every artifact to be created or updated, grouped by phase, with a brief note on what each one does. Present it as a checklist the user can approve, modify, or trim.

Wait for confirmation before implementing.

Phase 3: Knowledge Layer

Build the artifacts that give agents a map of the codebase.

AGENTS.md (~100 lines)

The single most important file. It is a routing table, not an encyclopedia.

It should contain:

3–5 non-negotiable rules (the ones that cause the most damage when violated)
Pointers to deeper docs: ARCHITECTURE.md, docs/, active plans
How to verify work (build/test commands)
What the repo is and how it's structured (2–3 sentences)

Everything else belongs in docs/. If AGENTS.md exceeds ~100 lines, it's too long and should be refactored into docs/ with pointers.
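A line budget is easy to enforce in CI. This Python sketch treats the ~100-line guideline as a hard limit of 100, which is an assumption you can tune. It phrases each failure so that an agent knows what to do next. The function name and the message wording are illustrative.

```python
from pathlib import Path

AGENTS_MD_LINE_BUDGET = 100  # assumption: the "~100 lines" guideline as a hard limit

def check_agents_md(path: Path, budget: int = AGENTS_MD_LINE_BUDGET) -> list[str]:
    """Return agent-legible problems with AGENTS.md; an empty list means it passes."""
    if not path.exists():
        return [f"{path} is missing. Agents have no map of the repo. Fix: add a ~100-line AGENTS.md."]
    lines = len(path.read_text(encoding="utf-8").splitlines())
    if lines > budget:
        return [
            f"{path} has {lines} lines (budget {budget}). It is a routing table, not an "
            "encyclopedia. Fix: move detailed sections into docs/ and leave one-line pointers."
        ]
    return []
```

In CI, a thin wrapper script could print each returned message and exit non-zero when the list is not empty.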

docs/ Directory

docs/
├── design-docs/
│   ├── index.md              # Catalogue with verification status
│   └── core-beliefs.md       # Agent-first operating principles
├── exec-plans/
│   ├── active/               # In-flight work
│   ├── completed/            # Done work (context for future agents)
│   └── tech-debt-tracker.md  # Known debt with priority
├── generated/                # Auto-generated (DB schema, API specs)
├── product-specs/
│   ├── index.md              # Feature catalogue
│   └── <feature>.md
├── references/               # External docs in agent-friendly format
├── PRODUCT_SENSE.md          # Product principles, personas, domain sensitivity
└── <DOMAIN>.md               # Domain guides (only those relevant to the repo)

Every file in docs/ should follow progressive disclosure structure:

Summary (2–3 sentences) — enough for an agent to decide if this file is relevant
Key decisions — the 3–5 most important things, up front
Details — full content for agents that need to go deeper
Pointers — links to related docs for further context

This prevents the "one big AGENTS.md" problem from recurring at the file level. Agents should be able to read just the summary of each doc and navigate to the right one, rather than loading every file into context.

Only create what the repo actually needs. Each file should contain real content derived from the assessment — not boilerplate.
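Progressive disclosure also makes docs/ cheap to navigate mechanically. This Python sketch assumes each file opens with a title heading followed by a plain-prose summary paragraph. It builds a file-to-summary index that an agent could read instead of loading every file. It also flags files whose summary is missing or runs past three sentences. The sentence count is a rough regex split, so abbreviations such as "e.g." will inflate it.

```python
import re
from pathlib import Path

# A sentence ends at ., ! or ? followed by whitespace or end of text (so "AGENTS.md" is not split).
SENTENCE_END = re.compile(r"[.!?](?:\s|$)")

def summary_of(markdown: str) -> str:
    """First prose paragraph, skipping headings and leading blank lines."""
    para: list[str] = []
    for line in markdown.splitlines():
        text = line.strip()
        if not text or text.startswith("#"):
            if para:
                break  # the summary paragraph has ended
            continue
        para.append(text)
    return " ".join(para)

def docs_index(docs_dir: Path) -> dict[str, str]:
    """Map each Markdown file under docs/ to its summary paragraph."""
    return {
        p.relative_to(docs_dir).as_posix(): summary_of(p.read_text(encoding="utf-8"))
        for p in sorted(docs_dir.rglob("*.md"))
    }

def summary_problems(docs_dir: Path, max_sentences: int = 3) -> list[str]:
    """Flag docs whose summary is missing or longer than the 2-3 sentence budget."""
    problems = []
    for name, summary in docs_index(docs_dir).items():
        sentences = [s for s in SENTENCE_END.split(summary) if s.strip()]
        if not sentences:
            problems.append(f"{name}: no summary paragraph")
        elif len(sentences) > max_sentences:
            problems.append(f"{name}: summary has {len(sentences)} sentences")
    return problems
```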

ARCHITECTURE.md

Top-level domain map answering: what are the domains, how do they relate, what are the dependency rules, where does new code go.

Read references/knowledge-layer.md for templates and writing guidance. Read references/core-beliefs.md for the core beliefs template and content guide.

Phase 4: Architecture Layer

Define domain boundaries and dependency rules as machine-readable specs.

Domain Identification

A domain is a vertical slice — a tracer bullet that cuts through all integration layers end-to-end, from data shapes to user-facing output. It is NOT a horizontal technical layer.

The litmus test: can you trace a user action from UI through runtime, service, repo, and types — and does that path stay within one coherent business concept? If yes, that's a domain.

CORRECT (vertical slices):        WRONG (horizontal layers):
┌──────────┐ ┌──────────┐         ┌──────────────────────────┐
│ Billing  │ │ Onboard  │         │ controllers/             │ ← NOT a domain
│ ┌─────┐  │ │ ┌─────┐  │         │ models/                  │ ← NOT a domain
│ │Types│  │ │ │Types│  │         │ services/                │ ← NOT a domain
│ │Confg│  │ │ │Confg│  │         │ utils/                   │ ← NOT a domain
│ │Repo │  │ │ │Svc  │  │         └──────────────────────────┘
│ │Svc  │  │ │ │UI   │  │
│ │UI   │  │ │ └─────┘  │
│ └─────┘  │ └──────────┘
└──────────┘

Look for business concepts, not technical functions:

"billing", "onboarding", "search" = domains (vertical, own their full stack)
"controllers", "utils", "testing", "tooling" = layers or concerns (horizontal)

Read references/architecture-layer.md for detailed identification heuristics and the tracer-bullet test.

Layer Structure

Within each domain, code is organized into layers:

Types → Config → Repo → Service → Runtime → UI

The key rule: dependencies flow forward only. Each layer may depend on the layers before it in this sequence, never on the layers after it, so a types module never imports from service. Cross-cutting concerns (auth, telemetry, feature flags) enter through a single explicit interface called Providers.

Not every domain has every layer. A CLI tool might only have Types → Config → Service → Runtime. A library might only have Types → Service. Map what exists.
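Here is a minimal Python sketch of the forward-only rule, written as a predicate over layer names. It assumes lowercase layer identifiers that match the sequence above. A domain that lacks some layers needs no special handling, because the remaining layers keep the same relative order. Providers are deliberately left out, since the text does not pin down where that interface sits in the order.

```python
# Layer order within a domain, from the sequence above.
LAYERS = ["types", "config", "repo", "service", "runtime", "ui"]

def may_import(importer: str, imported: str) -> bool:
    """True if code in layer `importer` may import from layer `imported`.

    A layer may use itself or any layer to its left, never one to its right.
    Raises ValueError for a name that is not in LAYERS.
    """
    return LAYERS.index(imported) <= LAYERS.index(importer)
```

So may_import("service", "types") is True, and may_import("types", "service") is False.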

Generate .harness/domains.yml

Create the domain specification. Read references/yml-schemas.md for the schema and references/architecture-layer.md for identification heuristics.

Phase 5: Enforcement Layer

Encode architectural taste as machine-readable rules. The goal: enforce boundaries centrally, allow autonomy locally.

Golden Principles (.harness/principles.yml)

Identify 5–10 opinionated rules specific to this repo. Each principle needs:

What: The rule itself
Why: Why it matters (what goes wrong without it)
How to check: lint, structural test, review, or manual inspection
Examples: Concrete good/bad code snippets from this codebase

Start with principles from the assessment — patterns that are already causing problems, or invariants that are currently maintained manually but should be enforced.
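A principle is only useful to an agent when all four parts are present. This Python sketch models a single principle as a dataclass and validates a set of them against the guidance above: 5-10 principles, each with a rationale, a recognized check method, and both a good and a bad example. The field names are hypothetical. The real schema lives in references/yml-schemas.md.

```python
from dataclasses import dataclass

CHECK_METHODS = {"lint", "structural test", "review", "manual inspection"}

@dataclass
class Principle:
    # Hypothetical field names; the real schema is in references/yml-schemas.md.
    what: str
    why: str
    how_to_check: str
    good_example: str = ""
    bad_example: str = ""

    def problems(self) -> list[str]:
        """Everything this principle is missing, per the four-part checklist."""
        issues = []
        if not self.what.strip():
            issues.append("missing 'what': state the rule itself")
        if not self.why.strip():
            issues.append("missing 'why': say what goes wrong without it")
        if self.how_to_check not in CHECK_METHODS:
            issues.append(f"'how_to_check' must be one of {sorted(CHECK_METHODS)}")
        if not (self.good_example and self.bad_example):
            issues.append("needs a good and a bad example from this codebase")
        return issues

def validate_principles(principles: list[Principle]) -> list[str]:
    """Check the 5-10 count and every principle's completeness."""
    issues = []
    if not 5 <= len(principles) <= 10:
        issues.append(f"expected 5-10 principles, found {len(principles)}")
    for i, principle in enumerate(principles):
        issues.extend(f"principle {i}: {msg}" for msg in principle.problems())
    return issues
```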

Mechanical Rules (.harness/enforcement.yml)

Concrete rules that tooling can check:

Naming conventions for files, types, functions
File size limits
Structured logging requirements
Import boundary checks
Test coverage expectations
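Two of these rules, file size limits and structured logging, can be checked with the standard library alone. The sketch assumes a Python codebase, a hypothetical 400-line limit, and a project rule that logging goes through a structured logger instead of bare print() calls. It uses the ast module, so the word "print" inside a string or comment is not flagged.

```python
import ast
from pathlib import Path

MAX_FILE_LINES = 400  # hypothetical limit; the real value belongs in enforcement.yml

def check_source(name: str, source: str, max_lines: int = MAX_FILE_LINES) -> list[str]:
    """Report file-size and structured-logging violations in one Python source file."""
    issues = []
    line_count = len(source.splitlines())
    if line_count > max_lines:
        issues.append(
            f"{name}: {line_count} lines exceeds the {max_lines}-line limit; split it by responsibility"
        )
    for node in ast.walk(ast.parse(source, filename=name)):
        # Only real calls to the name print, not the word inside strings or comments.
        if isinstance(node, ast.Call) and isinstance(node.func, ast.Name) and node.func.id == "print":
            issues.append(f"{name}:{node.lineno}: print() call; use the structured logger instead")
    return issues

def check_tree(root: Path) -> list[str]:
    """Run check_source over every .py file under root."""
    return [
        msg
        for path in sorted(root.rglob("*.py"))
        for msg in check_source(str(path), path.read_text(encoding="utf-8"))
    ]
```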

Agent-Legible Error Messages

This is one of the highest-leverage patterns in the entire harness. Every enforcement rule MUST include a violation_message template with four parts:

What's wrong — the specific violation
Why it matters — rationale linked to a principle
How to fix it — concrete remediation steps
Where to look — file paths or doc pointers

Lint error messages are a delivery mechanism for injecting remediation instructions into an agent's context at the exact moment it needs them. Generic messages ("boundary violation in X") are nearly useless. Rich messages ("X imports from Y, violating forward-only rule. Fix: inject via Providers. See: ARCHITECTURE.md#cross-cutting") let agents self-correct immediately.
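One way to make the four-part structure hard to skip is to build every message from a record that cannot be created unless all four parts are present. This is a Python sketch. The class name and the "Why/Fix/See" wording are assumptions, modeled on the rich example above.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Violation:
    what: str   # the specific violation
    why: str    # rationale, linked to a principle
    fix: str    # concrete remediation steps
    where: str  # file paths or doc pointers

    def __post_init__(self) -> None:
        # Refuse to build a message that is missing any of the four parts.
        for part in ("what", "why", "fix", "where"):
            if not getattr(self, part).strip():
                raise ValueError(f"violation_message is missing its '{part}' part")

    def message(self) -> str:
        return f"{self.what}. Why: {self.why}. Fix: {self.fix}. See: {self.where}"
```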

Generate Enforcement Code

The .harness/*.yml specs describe rules. But specs that nothing checks are documentation that rots — the same problem the harness is designed to prevent.

For every enforcement rule, also generate at minimum one concrete artifact:

Lint configuration: ESLint/Ruff/Clippy config that enforces naming, imports, or structural rules — with agent-legible error messages
CI workflow: GitHub Actions / CI job that validates AGENTS.md links, docs/ cross-references, or knowledge freshness
Structural test: A test file that validates architectural invariants (e.g., import direction, domain boundary compliance)
Script: A validation script that checks file size limits, banned patterns, or naming conventions

Even stub implementations are better than nothing. A lint rule with a TODO body is more valuable than a perfectly documented YAML spec that nothing reads.

Read references/enforcement-layer.md for the principles catalog and patterns. Read references/yml-schemas.md for schemas.
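As an example of a structural test, the sketch below checks import direction in a Python codebase using the standard ast module. It assumes a hypothetical layout in which each domain is a top-level package and each layer is a subpackage, so billing/service/invoices.py is the module billing.service.invoices. It only inspects absolute imports within a single domain. Relative imports, imports of the form "from billing import service", and cross-domain rules are left out to keep it short.

```python
import ast
from typing import List, Optional, Tuple

# Assumed layout (hypothetical): <domain>.<layer>.<module>
LAYERS = ["types", "config", "repo", "service", "runtime", "ui"]

def domain_and_layer(module: str) -> Optional[Tuple[str, str]]:
    """Split 'billing.service.invoices' into ('billing', 'service'); None if not layered."""
    parts = module.split(".")
    if len(parts) >= 2 and parts[1] in LAYERS:
        return parts[0], parts[1]
    return None

def imported_modules(source: str) -> List[Tuple[int, str]]:
    """(line, module) for every absolute import in the source."""
    found = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            found.extend((node.lineno, alias.name) for alias in node.names)
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            found.append((node.lineno, node.module))
    return found

def backward_imports(module: str, source: str) -> List[str]:
    """Agent-legible messages for imports that point to a later layer of the same domain."""
    here = domain_and_layer(module)
    if here is None:
        return []
    domain, layer = here
    messages = []
    for lineno, target in imported_modules(source):
        there = domain_and_layer(target)
        if there and there[0] == domain and LAYERS.index(there[1]) > LAYERS.index(layer):
            messages.append(
                f"{module}:{lineno} imports {target}: {layer} may not depend on {there[1]} "
                "(forward-only rule). Fix: move the shared code down a layer or inject it "
                "via Providers. See: ARCHITECTURE.md"
            )
    return messages
```

In a pytest suite, a test could walk the source tree, derive each file's module name, and assert that backward_imports returns an empty list. The failure output is then the agent-legible message itself.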

Phase 6: Quality Scoring

Grade each domain across standardized dimensions.

Dimensions: code quality, test coverage, documentation, observability, reliability, security.

Scale: A (exemplary) through F (missing/broken).

The initial scoring is a baseline. Future harness updates compare current state against these grades to track improvement or detect drift.

Generate .harness/quality.yml with scores, gap notes, and review dates. Read references/quality-scoring.md for the rubric.
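Comparing a new grading pass against the baseline amounts to a per-dimension diff. The sketch assumes grades are stored as single letters and uses the conventional A, B, C, D, F ladder. The text names only the endpoints A and F, so the intermediate letters are an assumption. The dimension keys are illustrative.

```python
GRADES = "ABCDF"  # assumption: conventional letter ladder; the text names only A and F

def compare(baseline: dict[str, str], current: dict[str, str]) -> dict[str, str]:
    """Classify each graded dimension as improved, regressed, unchanged, or new."""
    result = {}
    for dimension, grade in current.items():
        if dimension not in baseline:
            result[dimension] = "new"
            continue
        # A lower index means a better grade.
        delta = GRADES.index(baseline[dimension]) - GRADES.index(grade)
        result[dimension] = "improved" if delta > 0 else "regressed" if delta < 0 else "unchanged"
    return result
```

Filtering the result for "regressed" gives the drift list that a later harness update should look at first.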

Phase 6.5: Operational Legibility (if applicable)

For repos with a running application (web app, API, service), assess whether agents can observe the app, not just the code. The article's team made the running application directly legible to agents — this is what enabled 6+ hour autonomous agent sessions.

Assess and recommend:

Worktree-bootable: Can the app boot per git worktree so each agent run gets an isolated instance? If not, flag this as a high-priority gap.
Browser automation: For UI apps — can agents drive the app via Chrome DevTools Protocol (screenshots, DOM snapshots, navigation)?
Observability: Can agents query logs (LogQL), metrics (PromQL), and traces (TraceQL) from their own instance?
Ephemeral state: Are logs, metrics, and app state torn down when the agent's task completes?

This phase is only relevant for repos with runnable applications. Libraries, CLI tools, and infrastructure repos can skip it.

Phase 7: Process Patterns

Document the patterns that keep a harness-driven codebase healthy over time. These go into the appropriate docs/ guide files.

Doc-gardening: Recurring scans for stale or incorrect documentation
Garbage collection: Identifying and cleaning up pattern drift, duplicated helpers, or accumulated "AI slop". This is urgent, not optional: without automated GC, agent-generated codebases degrade fast enough to consume 20% of engineering time in manual cleanup.
Agent review: At Level 3+ maturity, agent-to-agent review should be the primary quality gate, not a supplement to human review. Humans review for judgment calls only (business logic, product decisions, architectural direction). The progression: L1-2 humans review everything → L3 agents pre-review, humans spot-check → L4 agent-to-agent review, humans only for escalations.
Merge philosophy: Short-lived PRs, follow-up fixes over indefinite blocking. Prerequisite: this is only appropriate when automated enforcement is in place (Level 2+ maturity), test coverage catches regressions, and agents can generate follow-up fixes. Without these, relaxed merge gates are reckless.
Feedback encoding: How review comments and bugs become doc updates or rules
Escalation boundaries: Define what decisions require human judgment vs. what agents can resolve autonomously — prevents both over-asking (slow) and under-asking (dangerous)
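The review progression, the merge prerequisites, and the human-judgment boundary can each be written as a small Python function. This sketch encodes the rules exactly as listed above. The category strings and function names are hypothetical, and a real escalation policy would classify changes with more nuance than simple set membership.

```python
# Judgment calls the text reserves for humans.
HUMAN_JUDGMENT = {"business logic", "product decisions", "architectural direction"}

def review_gate(level: int) -> str:
    """Primary review gate at a given maturity level, following the progression above."""
    if level <= 2:
        return "humans review everything"
    if level == 3:
        return "agents pre-review, humans spot-check"
    return "agent-to-agent review, humans only for escalations"

def needs_human(change_categories: set[str]) -> bool:
    """Escalate when a change touches any category that calls for human judgment."""
    return not HUMAN_JUDGMENT.isdisjoint(change_categories)

def relaxed_merge_allowed(level: int, tests_catch_regressions: bool, agents_fix_follow_ups: bool) -> bool:
    """Short-lived PRs with follow-up fixes are only safe once every prerequisite holds."""
    return level >= 2 and tests_catch_regressions and agents_fix_follow_ups
```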

Read references/process-patterns.md for templates.

Phase 8: Verify

Update Flow

When updating an existing harness:

Detect drift: Compare .harness/ specs against the actual codebase
- New directories/modules not in domains.yml
- Docs referencing deleted or moved files
- Quality scores older than the configured review cadence
- Principles being violated in recently added code
Propose targeted updates: Don't rebuild — update only what drifted
Implement changes: Same phase structure, but scoped to the drift
Re-verify: Run the full verification checklist
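Once the specs are parsed, three of the drift checks above reduce to simple comparisons. The sketch makes two assumptions. First, domains.yml has already been loaded into a mapping from each domain name to the top-level source directories it owns; this shape is hypothetical, and the real schema is in references/yml-schemas.md. Second, docs use ordinary Markdown links, resolved relative to the linking file. Only the standard library is needed.

```python
import re
from datetime import date, timedelta
from pathlib import Path

# Markdown link target, stopping before any "#fragment".
LINK = re.compile(r"\[[^\]]*\]\(([^)\s#]+)")

def unmapped_dirs(src_root: Path, domains: dict[str, list[str]]) -> list[str]:
    """Top-level source directories that no domain claims."""
    claimed = {d for dirs in domains.values() for d in dirs}
    return sorted(p.name for p in src_root.iterdir() if p.is_dir() and p.name not in claimed)

def dead_links(doc: Path) -> list[str]:
    """Relative link targets in a Markdown doc that no longer exist on disk."""
    targets = LINK.findall(doc.read_text(encoding="utf-8"))
    return [
        t for t in targets
        if "://" not in t and not t.startswith("mailto:") and not (doc.parent / t).exists()
    ]

def stale_scores(reviewed_on: dict[str, date], cadence_days: int, today: date) -> list[str]:
    """Domains whose quality grades are older than the review cadence."""
    limit = timedelta(days=cadence_days)
    return sorted(name for name, when in reviewed_on.items() if today - when > limit)
```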

.harness/ Directory

All machine-readable harness configuration lives in .harness/ at the repo root.

.harness/
├── config.yml         # Harness metadata, version, tech stack summary
├── domains.yml        # Business domain definitions + layer rules
├── principles.yml     # Golden principles with rationale + examples
├── enforcement.yml    # Mechanical rules (naming, limits, logging, imports)
├── quality.yml        # Per-domain quality grades + gap tracking
└── knowledge.yml      # Knowledge base structure configuration

See references/yml-schemas.md for complete schemas with examples.

Adaptation by Tech Stack

The harness is tech-agnostic but the implementation adapts:

Naming conventions: camelCase for JS/TS, snake_case for Python/Rust
Layer names: May differ — "repo" might be "repository" or "data-access"
Build commands: Vary per stack — capture in AGENTS.md
Dependency enforcement: Import style differs between module systems
Logging: Different structured logging libraries per ecosystem

Identify the stack during assessment and adapt all templates accordingly. Don't force conventions from one ecosystem onto another.
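A per-stack naming check shows why conventions must be looked up rather than assumed. This Python sketch covers only function and variable names for the four languages listed above. Type names are PascalCase in all of them and are not handled. It allows leading underscores for private names, and it refuses to guess for a language that has not been configured.

```python
import re

# Function/variable conventions from the list above; leading underscores allowed.
CAMEL = re.compile(r"^_*[a-z][a-zA-Z0-9]*$")
SNAKE = re.compile(r"^_*[a-z][a-z0-9]*(_[a-z0-9]+)*$")

CONVENTIONS = {
    "javascript": CAMEL,
    "typescript": CAMEL,
    "python": SNAKE,
    "rust": SNAKE,
}

def follows_convention(identifier: str, language: str) -> bool:
    """Check an identifier against its own ecosystem's convention, never another's."""
    pattern = CONVENTIONS.get(language)
    if pattern is None:
        raise ValueError(f"no naming convention configured for {language!r}; identify the stack first")
    return bool(pattern.match(identifier))
```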
