long-run
Long Run Harness
Orchestrates multi-day execution of complex tasks through a milestone pipeline. Each milestone passes through plan-crafting → run-plan → review-work with checkpoints between milestones for recovery from interruptions.
Core Principle
Long-running execution must be resumable, auditable, and fail-safe. Every state transition is persisted to disk before the next action begins. If execution stops for any reason — rate limit, crash, user pause, context loss — it can resume from the last checkpoint without repeating completed work.
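One common way to make a state transition durable before the next action begins is an atomic write: write the new contents to a temporary file, flush it to disk, then rename it over the old file. The sketch below illustrates that idea. The helper name `write_atomically` is hypothetical and not part of the harness; it only assumes that state is plain text such as state.md.

```python
import os
import tempfile
from pathlib import Path


def write_atomically(path: Path, text: str) -> None:
    """Replace `path` with `text` so a crash never leaves a half-written file."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # Create the temp file next to the target so the rename stays on one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(text)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before renaming
        os.replace(tmp_name, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise


if __name__ == "__main__":
    state_file = Path(tempfile.mkdtemp()) / "state.md"
    write_atomically(state_file, "M1: planning\n")   # persist first
    write_atomically(state_file, "M1: executing\n")  # then act on the new state
    print(state_file.read_text(encoding="utf-8"))
```

A reader that opens state.md at any moment sees either the old contents or the new contents, never a mixture, which is what resuming from the last checkpoint relies on.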
Hard Gates
1. Milestones must exist before execution. Either from the `milestone-planning` skill or user-provided. Never generate milestones inline during execution.
2. State file must be updated before and after every milestone. No in-memory-only state. If it's not on disk, it didn't happen.
3. Each milestone must complete the full pipeline. plan-crafting → run-plan → review-work. No shortcuts. No skipping review-work "because it looked fine."
4. Failed milestones block dependents. If M2 depends on M1 and M1 fails review, M2 does not start. Period.
5. User confirmation required at gate points. Before starting a new milestone phase (planning, execution, review), check if the user wants to continue, pause, or abort.
6. Never modify completed milestones. Once a milestone passes review-work, its files are locked. If a later milestone needs changes to earlier work, that is a new milestone.
7. Checkpoint after every milestone completion. Write a checkpoint file recording what was done, test results, and review verdict before proceeding.
When To Use
- After `milestone-planning` has produced a milestone DAG
- When the user says "long run", "start long run", "execute milestones", or "run all milestones"
- When resuming a previously paused long run session
When NOT To Use
- When milestones don't exist yet (use `milestone-planning` first)
- When there's only one milestone (use plan-crafting + run-plan directly)
- For quick tasks that don't warrant multi-phase execution
Input
- Harness state directory path, e.g., `docs/engineering-discipline/harness/<session-slug>/`
- The directory must contain `state.md` and `milestones/*.md` files
If no state directory exists, ask the user if they want to run `milestone-planning` first.
Process
Phase 1: Load and Validate State
1. Read `state.md` from the harness directory
2. Read all milestone files from `milestones/`
3. Validate:
   - All milestones referenced in state.md have corresponding files
   - Dependency DAG is valid (no cycles, topological sort possible)
   - No milestone is in an invalid state (e.g., "executing" without a plan file)
4. Determine current position:
   - Which milestones are completed?
   - Which milestones are ready to start (all dependencies met)?
   - Is this a fresh start or a resume?
5. Present status to the user:
Long Run Status: [Session Name]
Progress: N/M milestones completed
Current phase: [planning M3 | executing M3 | reviewing M3 | ready to start M3]
Next up: [M3, M4 (parallel)]
Completed: M1 ✓, M2 ✓
In progress: M3 (executing)
Pending: M4, M5
6. Ask user to confirm: continue, pause, or abort.
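The DAG validation and the "ready to start" check can be sketched with Kahn's algorithm for topological sorting. This is an illustrative sketch, not the harness's own implementation: the function names and the in-memory shapes (`deps` maps each milestone id to the ids it depends on, `status` maps each id to a status string) are assumptions.

```python
from collections import deque


def topological_order(deps: dict[str, set[str]]) -> list[str]:
    """Return milestone ids in dependency order; raise ValueError if the DAG is invalid."""
    unknown = {d for ds in deps.values() for d in ds} - set(deps)
    if unknown:
        raise ValueError(f"dependencies without a milestone file: {sorted(unknown)}")
    remaining = {m: len(ds) for m, ds in deps.items()}  # unmet dependency counts
    dependents: dict[str, list[str]] = {m: [] for m in deps}
    for m, ds in deps.items():
        for d in ds:
            dependents[d].append(m)
    queue = deque(sorted(m for m, n in remaining.items() if n == 0))
    order: list[str] = []
    while queue:
        m = queue.popleft()
        order.append(m)
        for child in sorted(dependents[m]):
            remaining[child] -= 1
            if remaining[child] == 0:
                queue.append(child)
    if len(order) != len(deps):
        raise ValueError("dependency cycle detected")  # some milestones never became free
    return order


def ready_milestones(deps: dict[str, set[str]], status: dict[str, str]) -> list[str]:
    """Pending milestones whose dependencies are all completed."""
    return [
        m for m in topological_order(deps)
        if status[m] == "pending" and all(status[d] == "completed" for d in deps[m])
    ]


if __name__ == "__main__":
    deps = {"M1": set(), "M2": {"M1"}, "M3": {"M2"}, "M4": {"M2"}, "M5": {"M3", "M4"}}
    status = {"M1": "completed", "M2": "completed",
              "M3": "pending", "M4": "pending", "M5": "pending"}
    print(ready_milestones(deps, status))  # ['M3', 'M4']
```

With M1 and M2 completed, M3 and M4 come back as ready together, which matches the "Next up: [M3, M4 (parallel)]" line in the status display.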
Phase 2: Milestone Execution Loop
For each milestone in topological order:
┌─────────────────────────────────────┐
│         Milestone Pipeline          │
│                                     │
│  ┌──────────┐    ┌─────────┐        │
│  │  Plan    │───→│  Run    │        │
│  │ Crafting │    │  Plan   │        │
│  └──────────┘    └────┬────┘        │
│                       │             │
│                  ┌────▼────┐        │
│                  │ Review  │        │
│                  │  Work   │        │
│                  └────┬────┘        │
│                       │             │
│              ┌────────▼────────┐    │
│              │      PASS?      │    │
│              │ Yes → checkpoint│    │
│              │ No  → retry     │    │
│              └─────────────────┘    │
└─────────────────────────────────────┘
Step 2-1: Gate Check
Before starting a milestone:
- Verify all dependency milestones have status `completed`
- Verify no file conflicts with in-progress parallel milestones
- Update state.md: set milestone status to `planning`
- Update execution log with timestamp
Step 2-2: Plan Crafting Phase
- Compose a Context Brief from the milestone definition:
  - Goal → from milestone file
  - Scope → files affected from milestone file
  - Success Criteria → from milestone file
  - Constraints → inherited from the parent problem + completed milestone context
  - Completed milestone context contract: From each completed predecessor, include ONLY:
    - Files created/modified (from checkpoint's "Files Changed" list)
    - Interface contracts established (function signatures, API shapes, type definitions)
    - Success criteria that were verified as met
    - Do NOT include: execution logs, review documents, worker/validator output, or full checkpoint contents
  - Note: Context Briefs composed from milestone definitions omit the Complexity Assessment section, since routing has already been determined by the milestone-planning phase. The brief goes directly to plan-crafting without re-routing.
- Invoke the `plan-crafting` skill pattern:
  - Create a plan document at `docs/engineering-discipline/plans/YYYY-MM-DD-<milestone-name>.md`
  - The plan must satisfy all milestone success criteria
  - The plan must not modify files outside the milestone's scope
- Update state.md: record plan file path for this milestone
- User gate: Present the plan and ask for approval before execution
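The completed milestone context contract is effectively an allow-list: only three kinds of information cross from a predecessor into the next Context Brief. A minimal sketch of that filter follows. The key names and the dictionary shape of a parsed checkpoint are hypothetical; the document does not define a machine-readable checkpoint schema.

```python
# Hypothetical key names for the three fields the contract allows.
ALLOWED_KEYS = ("files_changed", "interface_contracts", "verified_success_criteria")


def predecessor_context(checkpoints: dict[str, dict]) -> dict[str, dict]:
    """Keep only the allowed fields from each completed predecessor's checkpoint."""
    return {
        milestone: {key: data[key] for key in ALLOWED_KEYS if key in data}
        for milestone, data in checkpoints.items()
    }


if __name__ == "__main__":
    checkpoints = {
        "M1": {
            "files_changed": ["src/config.py"],
            "interface_contracts": ["def load(path: str) -> Config"],
            "verified_success_criteria": ["config file loads without error"],
            "execution_log": "...long log...",  # dropped by the filter
            "review_document": "PASS",          # dropped by the filter
        }
    }
    print(sorted(predecessor_context(checkpoints)["M1"]))
```

Filtering by allow-list rather than deny-list means a new field added to checkpoints later stays out of the brief until someone deliberately admits it.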
Step 2-3: Run Plan Phase
- Update state.md: set milestone status to `executing`, increment `Attempts` counter by 1
- Execute the plan using the `run-plan` skill pattern:
  - Worker-validator loop for each task
  - Parallel execution for independent tasks
  - Information-isolated validators
- If run-plan reports failure after 3 retries on any task:
  - Update state.md: set milestone status to `failed`
  - Record failure details in execution log
  - Stop and report to user. Do not proceed to dependent milestones.
- If all tasks complete: proceed to review phase
Step 2-4: Review Work Phase
- Update state.md: set milestone status to `validating`
- Invoke the `review-work` skill pattern:
  - Information-isolated review against the plan document
  - Binary PASS/FAIL verdict
- If PASS:
  - Update state.md: set milestone status to `completed`
  - Write checkpoint file (see Checkpoint Format below)
  - Update execution log
  - Proceed to next milestone
- If FAIL:
  - Record review findings in execution log
  - Retry decision (based on `Attempts` counter in state.md, which persists across crashes):
    - If Attempts == 1: return to Step 2-3 with review feedback (re-execute same plan)
    - If Attempts == 2: return to Step 2-2 (re-plan with review feedback as constraint)
    - If Attempts >= 3: set status to `failed`, stop, report to user
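The retry decision is a pure function of the persisted `Attempts` counter, which is why it survives a crash. A small sketch of that mapping is below; the function name and the returned labels are illustrative assumptions, not identifiers from the harness.

```python
def next_action_after_failed_review(attempts: int) -> str:
    """Map the persisted Attempts counter to the step taken after a FAIL verdict."""
    if attempts < 1:
        # Step 2-3 increments Attempts before execution, so 0 means corrupt state.
        raise ValueError("Attempts must be at least 1 after a run-plan phase")
    if attempts == 1:
        return "re-execute"  # back to Step 2-3 with review feedback, same plan
    if attempts == 2:
        return "re-plan"     # back to Step 2-2 with review feedback as a constraint
    return "fail"            # Attempts >= 3: set status to failed, stop, report to user


if __name__ == "__main__":
    for attempts in (1, 2, 3):
        print(attempts, next_action_after_failed_review(attempts))
```

Because the function takes no other input, a resumed session that re-reads state.md reaches the same decision the interrupted session would have reached.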
Step 2-5: Cross-Milestone Integration Check
After a milestone passes review-work but before writing the checkpoint, verify that the milestone's output integrates correctly with all previously completed milestones:
- Run the project's highest-level verification (from state.md's Verification Strategy or rediscover using plan-crafting's Verification Discovery order)
- Check cross-milestone interfaces: If the completed milestone defines or consumes interfaces from predecessor milestones, verify they are compatible (function signatures match, API contracts hold, types align)
If integration check passes: Proceed to checkpoint.
If integration check fails — Cross-Milestone Failure Response:
The milestone passed its own review-work (internal correctness) but breaks integration with other milestones. This is a boundary problem.
- Diagnose (attempt 1):
  - Read the failure output
  - Identify which interface boundary or interaction is broken
  - Determine if the fix belongs to the current milestone or requires a corrective milestone
  - If fixable within current milestone scope: dispatch a targeted fix worker → re-run review-work → re-run integration check
  - If the fix is outside current milestone scope: proceed to escalation
- Diagnose (attempt 2):
  - If the first fix didn't resolve it, re-analyze
  - Apply a second targeted fix
  - Re-run integration check
- Escalate to user (after 2 failed attempts):
  - Report: which milestones are involved, what integration boundary failed, what fixes were tried
  - Options: add corrective milestone, rollback to checkpoint, accept and continue (user acknowledges the integration gap)
  - Log the user's decision in state.md execution log
Step 2-6: Checkpoint
After a milestone passes review:
Write `checkpoints/M<N>-checkpoint.md`:
Checkpoint: M<N> — [Milestone Name]
Completed: YYYY-MM-DD HH:MM
Duration: [time from planning start to review pass]
Attempts: [number of plan-execute-review cycles]
Plan File
docs/engineering-discipline/plans/YYYY-MM-DD-<name>.md
Review File
docs/engineering-discipline/reviews/YYYY-MM-DD-<name>-review.md
Test Results
[Full test suite status at checkpoint time]
Files Changed
[List of files created/modified in this milestone]
State After Milestone
[Brief description of system state — what works now that didn't before]
Phase 3: Parallel Milestone Execution
When multiple milestones have all dependencies satisfied and no file conflicts:
1. Identify parallelizable milestone group
2. Run plan-crafting for ALL parallel milestones first (sequentially, since plans are lightweight)
3. Present ALL plans together for batch approval: "Milestones M3 and M4 can run in parallel. Here are both plans. Approve each individually."
4. User approves or rejects each plan independently. Only approved milestones proceed to execution. Rejected milestones return to Step 2-2 while approved ones execute.
5. If all approved, dispatch each milestone's pipeline concurrently:
   - Each milestone runs run-plan → review-work (plan already approved in step 3)
   - Each runs in a worktree (`isolation: "worktree"`) to prevent file conflicts
   - After both complete and pass review, merge worktrees back
   - If either fails: handle independently (the other can continue if no dependency)
Worktree merge protocol:
- Both milestones pass review in their respective worktrees
- Check for file conflicts between worktree changes
- If no conflicts: merge sequentially (M_lower first, then M_higher)
- If conflicts detected: stop, report to user, request manual resolution
- After merge: run full test suite on merged result
- If tests fail: stop, report to user
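The conflict check and merge ordering in the worktree merge protocol can be sketched as set operations over each milestone's changed files. This is a sketch under assumptions: the function names are hypothetical, milestone ids are assumed to look like "M3", and the changed-file sets are assumed to come from each worktree's diff. The merge itself and the test run are left out.

```python
def milestone_number(milestone_id: str) -> int:
    """Numeric part of an id such as "M3" (assumed id format)."""
    return int(milestone_id[1:])


def conflicting_files(changes: dict[str, set[str]]) -> dict[str, list[str]]:
    """Map each file changed by more than one milestone to the milestones changing it."""
    touched: dict[str, list[str]] = {}
    for milestone in sorted(changes, key=milestone_number):
        for path in changes[milestone]:
            touched.setdefault(path, []).append(milestone)
    return {path: ms for path, ms in sorted(touched.items()) if len(ms) > 1}


def merge_order(changes: dict[str, set[str]]) -> list[str]:
    """Return the sequential merge order, or raise if manual resolution is required."""
    conflicts = conflicting_files(changes)
    if conflicts:
        raise RuntimeError(f"file conflicts need manual resolution: {conflicts}")
    return sorted(changes, key=milestone_number)  # lower-numbered milestone first


if __name__ == "__main__":
    print(merge_order({"M4": {"src/report.py"}, "M3": {"src/parser.py"}}))  # ['M3', 'M4']
    print(conflicting_files({"M3": {"src/shared.py"}, "M4": {"src/shared.py"}}))
```

Raising instead of attempting an automatic resolution mirrors the protocol's rule to stop and report to the user when conflicts are detected.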
Phase 4: Completion
After all milestones are completed (including the Integration Verification Milestone from milestone-planning):
1. Update state.md: set overall status to `completing`
2. Final E2E Gate: Run the project's highest-level verification one final time on the fully integrated codebase
   - Run full test suite for regression check
   - If Final E2E Gate fails:
     - Diagnose: identify which milestone's output is the likely cause
     - Create a corrective milestone via Mid-Execution Correction procedure
     - Execute corrective milestone through the full pipeline (plan-crafting → run-plan → review-work)
     - Re-run E2E Gate after correction
     - If 2 corrective attempts fail: escalate to user with full diagnosis
   - If Final E2E Gate passes: Update state.md: set overall status to `completed`
3. Generate completion summary:
Long Run Complete: [Session Name]
Started: YYYY-MM-DD
Completed: YYYY-MM-DD
Total milestones: N
Total attempts: [sum of all milestone attempts]
Milestone Summary
| Milestone | Status | Attempts | Duration |
|---|---|---|---|
| M1: [name] | ✓ completed | 1 | 2h |
| M2: [name] | ✓ completed | 2 | 4h |
| ... |
Final Test Suite
[PASS/FAIL — N passed, M failed]
Files Changed (Total)
[Aggregated list across all milestones]
4. Present to user and suggest `simplify` for a final code quality pass
Recovery Protocol
When resuming a paused or interrupted session:
- Read state.md to determine last known state
- For each milestone, determine recovery action:
| Last Status | Recovery Action |
|---|---|
| `pending` | Start normally |
| `planning` | Restart plan-crafting (plan file may be incomplete) |
| `executing` | Check run-plan progress; resume or restart |
| `validating` | Restart review-work (review may be incomplete) |
| `completed` | Skip (already checkpointed) |
| `failed` | Present failure to user; ask whether to retry or skip (see Skip Rules below) |
| `skipped` | Skip (user previously chose to skip this milestone) |
- For `executing` milestones: check if tasks in the plan have checkboxes marked. Resume from the first unchecked task.
- Read the `Attempts` counter from state.md to determine retry budget remaining. Do not reset the counter on resume: it persists across crashes to prevent infinite retry loops.
- Present recovery plan to user before proceeding.
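Resuming an interrupted run-plan phase comes down to finding the first unchecked task in the plan file. The sketch below assumes the plan uses GitHub-style task list syntax (`- [x]` for done, `- [ ]` for open); the document only says tasks have checkboxes, so that syntax and the function name are assumptions.

```python
import re
from typing import Optional

# Matches "- [ ] text" or "- [x] text", with optional leading indentation.
CHECKBOX = re.compile(r"^\s*[-*] \[( |x|X)\] (.+)$")


def first_unchecked_task(plan_text: str) -> Optional[str]:
    """Return the text of the first unchecked task, or None if every task is checked."""
    for line in plan_text.splitlines():
        match = CHECKBOX.match(line)
        if match and match.group(1) == " ":
            return match.group(2).strip()
    return None


if __name__ == "__main__":
    plan = (
        "- [x] Task 1: add parser\n"
        "- [ ] Task 2: wire CLI\n"
        "- [ ] Task 3: write docs\n"
    )
    print(first_unchecked_task(plan))  # Task 2: wire CLI
```

A result of `None` means every task is already checked, in which case the milestone can move on to review-work rather than re-running anything.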
Mid-Execution Correction
If execution reveals that a completed milestone's output is incorrect or a new milestone is needed:
- Pause execution — do not continue with dependent milestones
- Log the discovery in state.md execution log: what was found, which milestone triggered the discovery
- User decision required: present the situation and options:
  - Add corrective milestone: Create a new milestone definition (the user writes the goal and success criteria, or re-run milestone-planning for just the new scope). Insert it into the DAG with appropriate dependencies. Resume execution from the new milestone.
  - Re-plan from a checkpoint: Roll back to a completed milestone's checkpoint, mark subsequent milestones as `pending`, reset their `Attempts` to 0, and restart from that point.
  - Abort: Set overall status to `failed` and stop.
- New milestones follow the same pipeline — plan-crafting → run-plan → review-work. No shortcuts even for "quick fixes."
- Completed milestones are never modified (Hard Gate #6 still applies). The corrective milestone produces new files or overwrites with a full plan cycle.
Skip Rules
When a user chooses to skip a failed milestone:
- Set milestone status to `skipped` in state.md
- Log the skip event with user's reason in execution log
- Dependents of a skipped milestone are also blocked by default, same as `failed`. The DAG contract is: dependents run only after prerequisites are `completed`.
- The user may explicitly unblock a dependent by acknowledging the missing prerequisite: "Proceed with M4 despite M2 being skipped." Log this override in the execution log.
- If the user unblocks a dependent, add a note to that milestone's Context Brief during plan-crafting: "Prerequisite M2 was skipped. The following outputs are missing: [list from M2's success criteria]."
Skipped milestones cannot be un-skipped. If the user wants to attempt the milestone later, create a new milestone with the same goal.
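The blocking rule and the explicit user override can be expressed as a single predicate. This sketch assumes a hypothetical representation in which overrides are recorded as (dependent, prerequisite) pairs; the document only requires that the override be logged, not how it is stored.

```python
from typing import AbstractSet, Tuple


def can_start(
    milestone: str,
    deps: dict[str, set[str]],
    status: dict[str, str],
    overrides: AbstractSet[Tuple[str, str]] = frozenset(),
) -> bool:
    """True if every prerequisite is completed, or is skipped and explicitly overridden."""
    for prereq in deps[milestone]:
        if status[prereq] == "completed":
            continue
        if status[prereq] == "skipped" and (milestone, prereq) in overrides:
            continue  # user acknowledged the missing prerequisite
        return False
    return True


if __name__ == "__main__":
    deps = {"M2": set(), "M4": {"M2"}}
    status = {"M2": "skipped", "M4": "pending"}
    print(can_start("M4", deps, status))                             # False
    print(can_start("M4", deps, status, overrides={("M4", "M2")}))   # True
```

The override applies only to `skipped` prerequisites; a `failed` prerequisite still blocks until the user chooses to retry or skip it.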
Duration Guard
If a single milestone's total active time (from planning start to review completion) becomes excessive:
- Soft limit: If a milestone has been in `planning` or `executing` status for more than what appears to be a proportionally large share of the overall work, pause and report to user: "Milestone M3 has been in progress for an extended period. Continue, re-scope, or abort?"
- Hard limit on attempts: The 3-attempt limit (F1) bounds retry loops. But if even a single attempt's plan-crafting generates more than 15 tasks, pause and report: "This milestone's plan has N tasks; it may be too large for a single milestone. Consider splitting."
- Purpose: Prevent a single runaway milestone from consuming the entire execution budget or running indefinitely on flaky tests.
Context Window Management
Long-running sessions will hit context window limits. Claude Code automatically compresses old messages (context collapse). The harness must be designed to survive this:
- Never rely on conversation memory for state. All state lives in `state.md` and milestone files on disk. If the context is compressed, the harness re-reads state files, so no information is lost.
- Each milestone is a fresh context boundary. When starting a new milestone's plan-crafting, the worker subagent starts with a clean context. It receives only the milestone definition and completed predecessor context (see F8 contract), not the full conversation history.
- Checkpoint files are the source of truth. If context is lost mid-milestone, recovery reads the checkpoint files, not compressed conversation summaries.
- Avoid accumulating large inline state. Do not build up a running summary of all milestones in the conversation. Instead, reference state.md and checkpoint files by path.
Rate Limit Handling
Long-running sessions will encounter rate limits. Claude Code has built-in retry with exponential backoff (up to 10 retries, 5-minute max backoff). The harness should work with this, not against it:
- Let claude-code handle transient rate limits. Short 429/529 errors are retried automatically with backoff. Do not preemptively save state on every API error.
- Save state on persistent rate limits. If a rate limit persists beyond the automatic retry window (you'll see repeated "rate limit" messages), record current state to disk immediately.
- Log the rate limit event in execution log with timestamp.
- Report to user: "Rate limit hit. State saved. Resume with `long-run` when ready."
- Do NOT add manual retry loops on top of claude-code's built-in retry, because this causes retry amplification.
- Background agent bail: Claude Code's background agents (like reviewer subagents) bail immediately on 529 overload errors instead of retrying. This is why Phase 2.5 reviewer failure handling exists — reviewer failures are often transient rate limits, not permanent errors.
Anti-Patterns
| Anti-Pattern | Why It Fails |
|---|---|
| Generating milestones inline instead of using milestone-planning | Milestones lack adversarial review; poor decomposition |
| Skipping review-work for "simple" milestones | Undetected defects compound across milestones |
| Continuing after a milestone fails | Dependent milestones build on broken foundation |
| Not updating state.md between phases | Crash loses progress; cannot resume |
| Modifying completed milestone files | Breaks checkpoint invariant; invalidates reviews |
| Running parallel milestones without worktree isolation | File conflicts corrupt both milestones |
| Auto-retrying on rate limit | Wastes quota; user may prefer to wait |
| Skipping user gates between milestones | User loses control of multi-day execution |
| Merging worktrees without conflict check | Silent data loss if files overlap |
| Skipping cross-milestone integration check | Milestones pass independently but break each other at boundaries |
| Retrying E2E failures indefinitely without user escalation | 2-attempt limit exists to avoid budget waste on misdiagnosed problems |
Minimal Checklist
- State directory exists with valid state.md and milestone files
- Dependency DAG validated (no cycles)
- Current position determined (fresh start or resume)
- User confirmed continuation at session start
- Each milestone goes through plan-crafting → run-plan → review-work
- state.md updated before and after every phase transition
- Checkpoint written after every successful milestone
- Failed milestones block dependents
- Parallel milestones use worktree isolation
- Cross-milestone integration check passes after each milestone
- Final E2E Gate passes at completion
- Full test suite passes at completion
Transition
After long run completion:
- For final code quality pass → `simplify` skill
- If issues found in completion testing → `systematic-debugging` skill
- If user wants to extend with more milestones → `milestone-planning` skill
This skill itself does not invoke the next skill. It reports completion and lets the user decide the next step.