long-run

Long Run Harness

Orchestrates multi-day execution of complex tasks through a milestone pipeline. Each milestone passes through plan-crafting → run-plan → review-work with checkpoints between milestones for recovery from interruptions.

Core Principle

Long-running execution must be resumable, auditable, and fail-safe. Every state transition is persisted to disk before the next action begins. If execution stops for any reason — rate limit, crash, user pause, context loss — it can resume from the last checkpoint without repeating completed work.
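One way to honor "persisted to disk before the next action begins" is an atomic write: write to a temporary file, flush it, then rename it over the real file. The following is a minimal sketch under stated assumptions, not the harness's actual implementation. The names `persist_state` and `transition` are hypothetical, and state is serialized as JSON purely for brevity (the harness itself keeps its state in state.md).

```python
import json
import os
import tempfile
from pathlib import Path


def persist_state(path: Path, state: dict) -> None:
    """Write state atomically: temp file in the same directory, fsync, rename."""
    path.parent.mkdir(parents=True, exist_ok=True)
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name + ".")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            json.dump(state, tmp, indent=2)
            tmp.flush()
            os.fsync(tmp.fileno())
        # os.replace swaps the file in one step, so a crash leaves either
        # the old state or the new state on disk, never a half-written file.
        os.replace(tmp_name, path)
    except BaseException:
        os.unlink(tmp_name)
        raise


def transition(path: Path, state: dict, milestone: str, new_status: str) -> dict:
    """Record the new status on disk first; only then may the caller act on it."""
    updated = {**state, milestone: new_status}
    persist_state(path, updated)
    return updated
```

A caller would invoke `transition(...)` and start the next action only after it returns, so that a crash at any point leaves a readable state file behind.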

Hard Gates

  1. Milestones must exist before execution. They come either from the milestone-planning skill or from the user. Never generate milestones inline during execution.
  2. State file must be updated before and after every milestone. No in-memory-only state. If it's not on disk, it didn't happen.
  3. Each milestone must complete the full pipeline. plan-crafting → run-plan → review-work. No shortcuts. No skipping review-work "because it looked fine."
  4. Failed milestones block dependents. If M2 depends on M1 and M1 fails review, M2 does not start. Period.
  5. User confirmation required at gate points. Before starting a new milestone phase (planning, execution, review), check if the user wants to continue, pause, or abort.
  6. Never modify completed milestones. Once a milestone passes review-work, its files are locked. If a later milestone needs changes to earlier work, that is a new milestone.
  7. Checkpoint after every milestone completion. Write a checkpoint file recording what was done, test results, and review verdict before proceeding.

When To Use

  • After milestone-planning has produced a milestone DAG
  • When the user says "long run", "start long run", "execute milestones", or "run all milestones"
  • When resuming a previously paused long run session

When NOT To Use

  • When milestones don't exist yet (use milestone-planning first)
  • When there's only one milestone (use plan-crafting + run-plan directly)
  • For quick tasks that don't warrant multi-phase execution

Input

  1. Harness state directory path, e.g., docs/engineering-discipline/harness/<session-slug>/
  2. The directory must contain state.md and milestones/*.md files
If no state directory exists, ask the user if they want to run milestone-planning first.

Process

Phase 1: Load and Validate State

  1. Read state.md from the harness directory
  2. Read all milestone files from milestones/
  3. Validate:
    • All milestones referenced in state.md have corresponding files
    • Dependency DAG is valid (no cycles, topological sort possible)
    • No milestone is in an invalid state (e.g., "executing" without a plan file)
  4. Determine current position:
    • Which milestones are completed?
    • Which milestones are ready to start (all dependencies met)?
    • Is this a fresh start or a resume?
  5. Present status to the user:

Long Run Status: [Session Name]

Progress: N/M milestones completed
Current phase: [planning M3 | executing M3 | reviewing M3 | ready to start M3]
Next up: [M3, M4 (parallel)]

Completed: M1 ✓, M2 ✓
In progress: M3 (executing)
Pending: M4, M5

6. Ask user to confirm: continue, pause, or abort.
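The DAG validation in step 3 (no cycles, topological sort possible) can be sketched with Python's standard `graphlib` module. This is an illustration under assumptions rather than the harness's own code: `validate_dag` is a hypothetical helper, and the dependency map is assumed to have already been parsed out of the milestone files.

```python
from graphlib import CycleError, TopologicalSorter


def validate_dag(deps: dict[str, set[str]]) -> list[str]:
    """Return milestones in a valid execution order, or raise ValueError.

    deps maps each milestone ID to the set of milestone IDs it depends on.
    """
    known = set(deps)
    for milestone, prereqs in deps.items():
        missing = prereqs - known
        if missing:
            raise ValueError(
                f"{milestone} depends on unknown milestones: {sorted(missing)}"
            )
    try:
        # static_order() yields every prerequisite before its dependents.
        return list(TopologicalSorter(deps).static_order())
    except CycleError as exc:
        # graphlib reports the detected cycle as the second element of args.
        raise ValueError(f"dependency cycle: {exc.args[1]}") from exc
```

For example, `validate_dag({"M1": set(), "M2": {"M1"}})` returns an order with M1 before M2, while a map in which M1 and M2 depend on each other raises `ValueError`.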

Phase 2: Milestone Execution Loop

For each milestone in topological order:
┌─────────────────────────────────────┐
│         Milestone Pipeline          │
│                                     │
│  ┌──────────┐    ┌─────────┐        │
│  │  Plan    │───→│  Run    │        │
│  │ Crafting │    │  Plan   │        │
│  └──────────┘    └────┬────┘        │
│                       │             │
│                  ┌────▼────┐        │
│                  │ Review  │        │
│                  │  Work   │        │
│                  └────┬────┘        │
│                       │             │
│              ┌────────▼────────┐    │
│              │   PASS?         │    │
│              │ Yes → checkpoint│    │
│              │ No  → retry     │    │
│              └─────────────────┘    │
└─────────────────────────────────────┘

Step 2-1: Gate Check

Before starting a milestone:
  1. Verify all dependency milestones have status completed
  2. Verify no file conflicts with in-progress parallel milestones
  3. Update state.md: set milestone status to planning
  4. Update execution log with timestamp
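The first two gate checks can be expressed as a small pure function. The sketch below assumes that statuses, dependencies, and file scopes have already been loaded into dictionaries. The function name `ready_milestones` and the `scope` mapping are hypothetical, introduced only for illustration.

```python
ACTIVE = {"planning", "executing", "validating"}


def ready_milestones(
    status: dict[str, str],
    deps: dict[str, set[str]],
    scope: dict[str, set[str]],
) -> list[str]:
    """Pending milestones whose dependencies are all completed and whose
    file scope does not overlap any milestone currently in progress."""
    busy_files: set[str] = set()
    for name, state in status.items():
        if state in ACTIVE:
            busy_files |= scope.get(name, set())

    ready = []
    for name, state in status.items():
        if state != "pending":
            continue
        deps_done = all(status.get(d) == "completed" for d in deps.get(name, set()))
        no_conflict = not (scope.get(name, set()) & busy_files)
        if deps_done and no_conflict:
            ready.append(name)
    return sorted(ready)
```

Because a failed or skipped prerequisite never has status completed, its dependents are never returned, which matches Hard Gate 4.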

Step 2-2: Plan Crafting Phase

  1. Compose a Context Brief from the milestone definition:
    • Goal → from milestone file
    • Scope → files affected from milestone file
    • Success Criteria → from milestone file
    • Constraints → inherited from the parent problem + completed milestone context
    • Completed milestone context contract: From each completed predecessor, include ONLY:
      • Files created/modified (from checkpoint's "Files Changed" list)
      • Interface contracts established (function signatures, API shapes, type definitions)
      • Success criteria that were verified as met
    • Do NOT include: execution logs, review documents, worker/validator output, or full checkpoint contents
    • Note: Context Briefs composed from milestone definitions omit the Complexity Assessment section, since routing has already been determined by the milestone-planning phase. The brief goes directly to plan-crafting without re-routing.
  2. Invoke the plan-crafting skill pattern:
    • Create a plan document at docs/engineering-discipline/plans/YYYY-MM-DD-<milestone-name>.md
    • The plan must satisfy all milestone success criteria
    • The plan must not modify files outside the milestone's scope
  3. Update state.md: record plan file path for this milestone
  4. User gate: Present the plan and ask for approval before execution
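The completed milestone context contract in step 1 is effectively an allow-list: only three kinds of information cross a milestone boundary. A rough sketch of that filtering follows. All field names (`files_changed`, `interface_contracts`, `verified_criteria`, and so on) are hypothetical, since the source does not define a data format for milestones or checkpoints.

```python
# Only these predecessor fields may cross a milestone boundary.
ALLOWED_PREDECESSOR_FIELDS = (
    "files_changed",
    "interface_contracts",
    "verified_criteria",
)


def compose_context_brief(milestone: dict, predecessors: dict[str, dict]) -> dict:
    """Build a Context Brief from a milestone definition plus filtered
    context from each completed predecessor's checkpoint."""
    return {
        "goal": milestone["goal"],
        "scope": milestone["scope"],
        "success_criteria": milestone["success_criteria"],
        "constraints": milestone.get("constraints", []),
        "completed_context": {
            name: {
                key: checkpoint[key]
                for key in ALLOWED_PREDECESSOR_FIELDS
                if key in checkpoint
            }
            for name, checkpoint in predecessors.items()
        },
    }
```

Anything not on the allow-list, such as execution logs or review documents, is dropped even if it is present in the checkpoint data.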

Step 2-3: Run Plan Phase

  1. Update state.md: set milestone status to executing, increment Attempts counter by 1
  2. Execute the plan using the run-plan skill pattern:
    • Worker-validator loop for each task
    • Parallel execution for independent tasks
    • Information-isolated validators
  3. If run-plan reports failure after 3 retries on any task:
    • Update state.md: set milestone status to failed
    • Record failure details in execution log
    • Stop and report to user. Do not proceed to dependent milestones.
  4. If all tasks complete: proceed to review phase

Step 2-4: Review Work Phase

  1. Update state.md: set milestone status to validating
  2. Invoke the review-work skill pattern:
    • Information-isolated review against the plan document
    • Binary PASS/FAIL verdict
  3. If PASS:
    • Update state.md: set milestone status to completed
    • Write checkpoint file (see Checkpoint Format below)
    • Update execution log
    • Proceed to next milestone
  4. If FAIL:
    • Record review findings in execution log
    • Retry decision (based on the Attempts counter in state.md, which persists across crashes):
      • If Attempts == 1: return to Step 2-3 with review feedback (re-execute same plan)
      • If Attempts == 2: return to Step 2-2 (re-plan with review feedback as constraint)
      • If Attempts >= 3: set status to failed, stop, report to user
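The retry decision is a pure function of the persisted Attempts counter, which is what makes it safe to re-evaluate after a crash. A minimal sketch, in which the `NextStep` enum and the `after_failed_review` function are illustrative names only:

```python
from enum import Enum


class NextStep(Enum):
    RE_EXECUTE = "return to Step 2-3 with review feedback"
    RE_PLAN = "return to Step 2-2 with review feedback as constraint"
    FAIL = "set status to failed, stop, report to user"


def after_failed_review(attempts: int) -> NextStep:
    """Map the Attempts counter read from state.md to the next action.

    The counter is incremented in Step 2-3, so it is at least 1 by the
    time a review can fail.
    """
    if attempts < 1:
        raise ValueError("a review cannot fail before the first execution attempt")
    if attempts == 1:
        return NextStep.RE_EXECUTE
    if attempts == 2:
        return NextStep.RE_PLAN
    return NextStep.FAIL
```

Because the function keeps no state of its own, resuming a session and calling it again with the same counter value yields the same decision.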

Step 2-5: Cross-Milestone Integration Check

After a milestone passes review-work but before writing the checkpoint, verify that the milestone's output integrates correctly with all previously completed milestones:
  1. Run the project's highest-level verification (from state.md's Verification Strategy or rediscover using plan-crafting's Verification Discovery order)
  2. Check cross-milestone interfaces: If the completed milestone defines or consumes interfaces from predecessor milestones, verify they are compatible (function signatures match, API contracts hold, types align)
If integration check passes: Proceed to checkpoint.
If integration check fails — Cross-Milestone Failure Response:
The milestone passed its own review-work (internal correctness) but breaks integration with other milestones. This is a boundary problem.
  1. Diagnose (attempt 1):
    • Read the failure output
    • Identify which interface boundary or interaction is broken
    • Determine if the fix belongs to the current milestone or requires a corrective milestone
    • If fixable within current milestone scope: dispatch a targeted fix worker → re-run review-work → re-run integration check
    • If the fix is outside current milestone scope: proceed to escalation
  2. Diagnose (attempt 2):
    • If the first fix didn't resolve it, re-analyze
    • Apply a second targeted fix
    • Re-run integration check
  3. Escalate to user (after 2 failed attempts):
    • Report: which milestones are involved, what integration boundary failed, what fixes were tried
    • Options: add corrective milestone, rollback to checkpoint, accept and continue (user acknowledges the integration gap)
    • Log the user's decision in state.md execution log

Step 2-6: Checkpoint

After a milestone passes review:
Write checkpoints/M<N>-checkpoint.md:

Checkpoint: M<N> — [Milestone Name]

Completed: YYYY-MM-DD HH:MM
Duration: [time from planning start to review pass]
Attempts: [number of plan-execute-review cycles]

Plan File

docs/engineering-discipline/plans/YYYY-MM-DD-<name>.md

Review File

docs/engineering-discipline/reviews/YYYY-MM-DD-<name>-review.md

Test Results

[Full test suite status at checkpoint time]

Files Changed

[List of files created/modified in this milestone]

State After Milestone

[Brief description of system state — what works now that didn't before]
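To make the template concrete, here is one way the checkpoint file could be rendered from structured data. This is a sketch only: the `Checkpoint` dataclass and `render_checkpoint` function are hypothetical, and the `#`/`##` heading markers are an assumption, since the rendered template above shows the section titles but not their markup.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    number: int
    name: str
    completed: str
    duration: str
    attempts: int
    plan_file: str
    review_file: str
    test_results: str
    files_changed: list[str]
    state_after: str


def render_checkpoint(cp: Checkpoint) -> str:
    """Render the checkpoint template sections in the order listed above."""
    files = "\n".join(f"- {path}" for path in cp.files_changed)
    sections = [
        # \u2014 is the dash used in the template's title line.
        f"# Checkpoint: M{cp.number} \u2014 {cp.name}",
        f"Completed: {cp.completed}\nDuration: {cp.duration}\nAttempts: {cp.attempts}",
        f"## Plan File\n{cp.plan_file}",
        f"## Review File\n{cp.review_file}",
        f"## Test Results\n{cp.test_results}",
        f"## Files Changed\n{files}",
        f"## State After Milestone\n{cp.state_after}",
    ]
    return "\n\n".join(sections) + "\n"
```

The "Files Changed" list produced here is the same list that the Step 2-2 context contract later passes on to dependent milestones.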

Phase 3: Parallel Milestone Execution

When multiple milestones have all dependencies satisfied and no file conflicts:
  1. Identify parallelizable milestone group
  2. Run plan-crafting for ALL parallel milestones first (sequentially — plans are lightweight)
  3. Present ALL plans together for batch approval: "Milestones M3 and M4 can run in parallel. Here are both plans. Approve each individually."
  4. User approves or rejects each plan independently. Only approved milestones proceed to execution. Rejected milestones return to Step 2-2 while approved ones execute.
  5. If all approved, dispatch each milestone's pipeline concurrently:
    • Each milestone runs run-plan → review-work (plan already approved in step 3)
    • Each runs in a worktree (isolation: "worktree") to prevent file conflicts
    • After both complete and pass review, merge worktrees back
  6. If either fails: handle independently (the other can continue if no dependency)
Worktree merge protocol:
  1. Both milestones pass review in their respective worktrees
  2. Check for file conflicts between worktree changes
  3. If no conflicts: merge sequentially (M_lower first, then M_higher)
  4. If conflicts detected: stop, report to user, request manual resolution
  5. After merge: run full test suite on merged result
  6. If tests fail: stop, report to user
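Steps 2 to 4 of the merge protocol (detect overlapping files, then merge lower-numbered milestones first) can be sketched as follows. The example assumes milestone IDs of the form `M<N>` and that each worktree's changed-file set has already been collected. `plan_merge` is a hypothetical name, and the merge itself is left to the caller.

```python
def plan_merge(changes: dict[str, set[str]]) -> list[str]:
    """Return the merge order (lowest milestone number first), or raise
    RuntimeError if two worktrees changed the same file."""
    names = sorted(changes, key=lambda name: int(name.lstrip("M")))
    for i, first in enumerate(names):
        for second in names[i + 1:]:
            overlap = changes[first] & changes[second]
            if overlap:
                # The protocol says to stop and request manual resolution.
                raise RuntimeError(
                    f"{first} and {second} both changed: {sorted(overlap)}"
                )
    return names
```

For instance, `plan_merge({"M4": {"b.py"}, "M3": {"a.py"}})` returns `["M3", "M4"]`, after which the full test suite would still need to run on the merged result.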

Phase 4: Completion

After all milestones are completed (including the Integration Verification Milestone from milestone-planning):
  1. Update state.md: set overall status to completing
  2. Final E2E Gate: Run the project's highest-level verification one final time on the fully integrated codebase
  3. Run full test suite for regression check
  4. If Final E2E Gate fails:
    • Diagnose: identify which milestone's output is the likely cause
    • Create a corrective milestone via Mid-Execution Correction procedure
    • Execute corrective milestone through the full pipeline (plan-crafting → run-plan → review-work)
    • Re-run E2E Gate after correction
    • If 2 corrective attempts fail: escalate to user with full diagnosis
  5. If Final E2E Gate passes: Update state.md: set overall status to completed
  6. Generate completion summary:

Long Run Complete: [Session Name]

Started: YYYY-MM-DD
Completed: YYYY-MM-DD
Total milestones: N
Total attempts: [sum of all milestone attempts]

Milestone Summary

| Milestone | Status | Attempts | Duration |
| --- | --- | --- | --- |
| M1: [name] | ✓ completed | 1 | 2h |
| M2: [name] | ✓ completed | 2 | 4h |
| ... | | | |

Final Test Suite

[PASS/FAIL — N passed, M failed]

Files Changed (Total)

[Aggregated list across all milestones]

7. Present to user and suggest `simplify` for a final code quality pass

Recovery Protocol

When resuming a paused or interrupted session:
  1. Read state.md to determine last known state
  2. For each milestone, determine recovery action:
| Last Status | Recovery Action |
| --- | --- |
| pending | Start normally |
| planning | Restart plan-crafting (plan file may be incomplete) |
| executing | Check run-plan progress; resume or restart |
| validating | Restart review-work (review may be incomplete) |
| completed | Skip (already checkpointed) |
| failed | Present failure to user; ask whether to retry or skip (see Skip Rules below) |
| skipped | Skip (user previously chose to skip this milestone) |
  3. For executing milestones: check if tasks in the plan have checkboxes marked. Resume from the first unchecked task.
  4. Read the Attempts counter from state.md to determine retry budget remaining. Do not reset the counter on resume; it persists across crashes to prevent infinite retry loops.
  5. Present recovery plan to user before proceeding.
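The recovery table and the "first unchecked task" rule lend themselves to a lookup table plus a small parser. The sketch below assumes plan tasks are written as Markdown task-list items (`- [ ]` and `- [x]`). That format is an assumption, since the source only says that tasks have checkboxes, and both names are hypothetical.

```python
import re
from typing import Optional

# Recovery action per last known status, mirroring the table above.
RECOVERY_ACTIONS = {
    "pending": "Start normally",
    "planning": "Restart plan-crafting",
    "executing": "Check run-plan progress; resume or restart",
    "validating": "Restart review-work",
    "completed": "Skip (already checkpointed)",
    "failed": "Present failure to user; ask whether to retry or skip",
    "skipped": "Skip (user previously chose to skip)",
}

# Matches "- [ ] task" or "- [x] task", with optional leading indentation.
TASK_LINE = re.compile(r"^\s*[-*] \[( |x|X)\] (.+)$")


def first_unchecked_task(plan_text: str) -> Optional[str]:
    """Return the text of the first task whose checkbox is unmarked,
    or None when every task is already checked."""
    for line in plan_text.splitlines():
        match = TASK_LINE.match(line)
        if match and match.group(1) == " ":
            return match.group(2).strip()
    return None
```

A return value of `None` for an executing milestone suggests that all tasks finished before the interruption, in which case the milestone would move on to review.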

Mid-Execution Correction

If execution reveals that a completed milestone's output is incorrect or a new milestone is needed:
  1. Pause execution — do not continue with dependent milestones
  2. Log the discovery in state.md execution log: what was found, which milestone triggered the discovery
  3. User decision required: present the situation and options:
    • Add corrective milestone: Create a new milestone definition (the user writes the goal and success criteria, or re-run milestone-planning for just the new scope). Insert it into the DAG with appropriate dependencies. Resume execution from the new milestone.
    • Re-plan from a checkpoint: Roll back to a completed milestone's checkpoint, mark subsequent milestones as pending, reset their Attempts to 0, and restart from that point.
    • Abort: Set overall status to failed and stop.
  4. New milestones follow the same pipeline — plan-crafting → run-plan → review-work. No shortcuts even for "quick fixes."
  5. Completed milestones are never modified (Hard Gate #6 still applies). The corrective milestone produces new files or overwrites with a full plan cycle.

Skip Rules

When a user chooses to skip a failed milestone:
  1. Set milestone status to skipped in state.md
  2. Log the skip event with user's reason in execution log
  3. Dependents of a skipped milestone are also blocked by default, same as failed. The DAG contract is: dependents run only after prerequisites are completed.
  4. The user may explicitly unblock a dependent by acknowledging the missing prerequisite: "Proceed with M4 despite M2 being skipped." Log this override in the execution log.
  5. If the user unblocks a dependent, add a note to that milestone's Context Brief during plan-crafting: "Prerequisite M2 was skipped. The following outputs are missing: [list from M2's success criteria]."
Skipped milestones cannot be un-skipped. If the user wants to attempt the milestone later, create a new milestone with the same goal.
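The blocking rule and its explicit override can be captured in a single predicate. In this sketch, `overrides` is a hypothetical set of (dependent, skipped prerequisite) pairs recorded when the user says something like "Proceed with M4 despite M2 being skipped". The source does not specify how overrides are stored.

```python
def may_start(
    milestone: str,
    status: dict[str, str],
    deps: dict[str, set[str]],
    overrides: set[tuple[str, str]],
) -> bool:
    """True if every prerequisite is completed, or is skipped with an
    explicit user override for this exact dependent/prerequisite pair."""
    for prereq in deps.get(milestone, set()):
        state = status.get(prereq)
        if state == "completed":
            continue
        if state == "skipped" and (milestone, prereq) in overrides:
            continue
        return False
    return True
```

Note that an override applies only to skipped prerequisites: a failed prerequisite still blocks its dependents, in line with Hard Gate 4.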

Duration Guard

If a single milestone's total active time (from planning start to review completion) becomes excessive:
  1. Soft limit: If a milestone has stayed in planning or executing status for what appears to be a disproportionately large share of the overall work, pause and report to user: "Milestone M3 has been in progress for an extended period. Continue, re-scope, or abort?"
  2. Hard limit on attempts: The 3-attempt limit (see the retry decision in Step 2-4) bounds retry loops. But if even a single attempt's plan-crafting generates more than 15 tasks, pause and report: "This milestone's plan has N tasks; it may be too large for a single milestone. Consider splitting."
  3. Purpose: Prevent a single runaway milestone from consuming the entire execution budget or running indefinitely on flaky tests.

Context Window Management

Long-running sessions will hit context window limits. Claude Code automatically compresses old messages (context collapse). The harness must be designed to survive this:
  1. Never rely on conversation memory for state. All state lives in state.md and milestone files on disk. If the context is compressed, the harness re-reads state files, so no information is lost.
  2. Each milestone is a fresh context boundary. When starting a new milestone's plan-crafting, the worker subagent starts with a clean context. It receives only the milestone definition and completed predecessor context (see the completed milestone context contract in Step 2-2), not the full conversation history.
  3. Checkpoint files are the source of truth. If context is lost mid-milestone, recovery reads the checkpoint files, not compressed conversation summaries.
  4. Avoid accumulating large inline state. Do not build up a running summary of all milestones in the conversation. Instead, reference state.md and checkpoint files by path.

Rate Limit Handling

Long-running sessions will encounter rate limits. Claude Code has built-in retry with exponential backoff (up to 10 retries, 5-minute max backoff). The harness should work with this, not against it:
  1. Let Claude Code handle transient rate limits. Short 429/529 errors are retried automatically with backoff. Do not preemptively save state on every API error.
  2. Save state on persistent rate limits. If a rate limit persists beyond the automatic retry window (you'll see repeated "rate limit" messages), record current state to disk immediately.
  3. Log the rate limit event in execution log with timestamp.
  4. Report to user: "Rate limit hit. State saved. Resume with long-run when ready."
  5. Do NOT add manual retry loops on top of Claude Code's built-in retry, because this causes retry amplification.
  6. Background agent bail: Claude Code's background agents (like reviewer subagents) bail immediately on 529 overload errors instead of retrying. This is why Phase 2.5 reviewer failure handling exists — reviewer failures are often transient rate limits, not permanent errors.

Anti-Patterns

| Anti-Pattern | Why It Fails |
| --- | --- |
| Generating milestones inline instead of using milestone-planning | Milestones lack adversarial review; poor decomposition |
| Skipping review-work for "simple" milestones | Undetected defects compound across milestones |
| Continuing after a milestone fails | Dependent milestones build on broken foundation |
| Not updating state.md between phases | Crash loses progress; cannot resume |
| Modifying completed milestone files | Breaks checkpoint invariant; invalidates reviews |
| Running parallel milestones without worktree isolation | File conflicts corrupt both milestones |
| Auto-retrying on rate limit | Wastes quota; user may prefer to wait |
| Skipping user gates between milestones | User loses control of multi-day execution |
| Merging worktrees without conflict check | Silent data loss if files overlap |
| Skipping cross-milestone integration check | Milestones pass independently but break each other at boundaries |
| Retrying E2E failures indefinitely without user escalation | 2-attempt limit exists to avoid budget waste on misdiagnosed problems |

Minimal Checklist

  • State directory exists with valid state.md and milestone files
  • Dependency DAG validated (no cycles)
  • Current position determined (fresh start or resume)
  • User confirmed continuation at session start
  • Each milestone goes through plan-crafting → run-plan → review-work
  • state.md updated before and after every phase transition
  • Checkpoint written after every successful milestone
  • Failed milestones block dependents
  • Parallel milestones use worktree isolation
  • Cross-milestone integration check passes after each milestone
  • Final E2E Gate passes at completion
  • Full test suite passes at completion

Transition

After long run completion:
  • For final code quality pass → simplify skill
  • If issues found in completion testing → systematic-debugging skill
  • If user wants to extend with more milestones → milestone-planning skill
This skill itself does not invoke the next skill. It reports completion and lets the user decide the next step.