milestone-planning

Milestone Planning (Ultraplan)

Decomposes a complex task into milestones by spawning 5 parallel reviewer agents, synthesizing their independent analyses, and producing a milestone dependency DAG.

Core Principle

Milestones are the unit of long-running execution. A bad milestone decomposition cascades into days of wasted work. Therefore milestone generation must be adversarial — multiple independent perspectives must challenge each other before milestones are locked.

Hard Gates

All 5 reviewer agents must run in parallel. Sequential execution is prohibited. Dispatch all 5 concurrently in a single message via the Agent tool.
Each reviewer receives the full problem statement. Do not split or filter the problem per reviewer. Every reviewer sees everything.
Reviewers must not see each other's findings. Each reviewer operates independently. No cross-pollination during the review phase.
Synthesis must address every reviewer's concern. The synthesis agent must explicitly respond to each finding — accepted, rejected with reason, or deferred to a specific milestone.
Every milestone must have measurable success criteria. "Working correctly" is not a criterion. Specific test commands, file existence checks, or behavioral assertions are required.
Milestone dependencies must form a DAG. Circular dependencies are a plan failure. Every milestone must have a clear topological ordering.
Do not generate milestones for trivial tasks. If the problem can be solved in a single plan-crafting cycle (fewer than ~8 tasks), tell the user to use plan-crafting directly.
Reviewer outputs must be passed verbatim to the synthesis agent. Do not summarize, filter, or reframe. Copy each reviewer's full output into the designated placeholder. The main agent must not editorialize the handoff.
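The DAG gate above can be checked mechanically. A minimal sketch, assuming the milestone dependencies are available as a plain mapping from milestone ID to the IDs it depends on; the `execution_phases` helper and the `M1`..`M4` IDs are illustrative and not part of the skill. It uses Python's standard `graphlib` module:

```python
from graphlib import CycleError, TopologicalSorter


def execution_phases(deps):
    """Group milestones into phases whose members can run in parallel.

    `deps` maps a milestone ID to the set of milestone IDs it depends on
    (a hypothetical in-memory form of the milestone plan).
    """
    sorter = TopologicalSorter(deps)
    sorter.prepare()  # raises CycleError when the dependencies are circular
    phases = []
    while sorter.is_active():
        ready = sorted(sorter.get_ready())  # everything whose dependencies are done
        phases.append(ready)
        sorter.done(*ready)
    return phases


if __name__ == "__main__":
    print(execution_phases({"M3": {"M1", "M2"}, "M4": {"M3"}}))
    try:
        execution_phases({"M1": {"M2"}, "M2": {"M1"}})
    except CycleError as exc:
        print("plan failure:", exc.args[0])
```

If `prepare()` raises `CycleError`, no topological ordering exists, which corresponds to the plan failure described in the gate.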

When To Use

When the user presents a complex, multi-day task
When the long-run harness needs milestone decomposition
When the user says "plan milestones", "break this into milestones", or "ultraplan"
When a task clearly requires multiple independent implementation phases

When NOT To Use

Single-day tasks (use plan-crafting directly)
Tasks with fewer than ~8 implementation steps
When milestones are already defined and the user wants execution (use long-run)
When work scope is still ambiguous (use clarification first)

Input

The skill requires a clear problem statement as input. This can come from:

A Context Brief file produced by the `clarification` skill (preferred)
A direct, detailed request from the user (must include goal, scope, constraints)

If the input is ambiguous, return to the `clarification` skill before proceeding.

Process

Phase 1: Problem Framing

Before dispatching reviewers, frame the problem:

Read the input (Context Brief or user request)
Identify: goal, scope boundaries, technical constraints, success criteria
If a codebase is involved, dispatch an Explore agent to map relevant architecture
Compose the Problem Brief — a self-contained document that each reviewer will receive:

Problem Brief

Goal: [What must be achieved]

Scope:

In: [What is included]
Out: [What is explicitly excluded]

Technical Context: [Relevant architecture, existing code, constraints]

Constraints: [Time, compatibility, dependencies, performance requirements]

Success Criteria: [Specific, measurable outcomes]

Verification Strategy:

Level: [e2e | integration | skill/agent | test-suite | build-only]
Command: [exact command to run the verification]
What it validates: [what passing this verification proves]

Phase 2: Parallel Reviewer Dispatch

Dispatch all 5 reviewer agents concurrently in a single message via the Agent tool. Each receives the full Problem Brief and its reviewer-specific prompt.
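The concurrency and isolation properties of this dispatch can be modelled in a few lines. This is only an analogy: the Agent tool is not a Python API, and `run_reviewer` below is a hypothetical stand-in for a single reviewer call. The sketch shows every reviewer receiving the same full brief, starting together, and never seeing another reviewer's result:

```python
import asyncio

REVIEWERS = ["feasibility", "architecture", "risk", "dependency", "user-value"]


async def run_reviewer(role: str, problem_brief: str) -> str:
    """Hypothetical stand-in for one reviewer agent; a real call would go here."""
    await asyncio.sleep(0)  # placeholder for the agent's actual work
    return f"[{role}] analysis of a {len(problem_brief)}-character brief"


async def dispatch_reviewers(problem_brief: str) -> dict[str, str]:
    # All five coroutines are created before any is awaited, so none waits on
    # another, and each receives only the unmodified brief.
    tasks = [run_reviewer(role, problem_brief) for role in REVIEWERS]
    results = await asyncio.gather(*tasks)
    return dict(zip(REVIEWERS, results))


if __name__ == "__main__":
    for role, output in asyncio.run(dispatch_reviewers("brief")).items():
        print(role, "->", output)
```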

Agent configuration for reviewers:

Use `run_in_background: true` so reviewers execute concurrently without blocking each other
Do NOT set `isolation: "worktree"`; reviewers are read-only analysts, not code writers
The claude-code fork agent default is `maxTurns: 200`, and reviewers should complete well within this. If a reviewer appears stuck (no response after an extended time), the likely cause is a rate limit or timeout; see Phase 2.5 for failure handling.

Reviewer 1: Feasibility Analyst

You are a feasibility analyst reviewing a problem decomposition.

Problem Brief

{PROBLEM_BRIEF}

Your Task

Analyze the feasibility of solving this problem. For each major component:

Technical feasibility: Can this be built with the stated tech stack? Identify any components that require research, prototyping, or may not be possible as described.
Effort estimation: Classify each component as:
- Small (1-3 tasks, < 1 plan cycle)
- Medium (4-8 tasks, 1 plan cycle)
- Large (9+ tasks, multiple plan cycles → candidate for milestone)
- Uncertain (requires spike/prototype before estimation)
Risk of underestimation: Flag components that appear simple but have hidden complexity (integration points, edge cases, data migration, backward compatibility).
Suggested milestone boundaries: Based on effort and risk, suggest where natural milestone boundaries should fall. A milestone should be independently deliverable and testable.

Output Format

For each suggested milestone:

Name: [milestone name]
Effort: [Small/Medium/Large/Uncertain]
Feasibility risk: [Low/Medium/High] — [reason]
Key deliverable: [what this milestone produces]

Also list:

Spike candidates: Components needing prototype before planning
Underestimation risks: Components likely harder than they appear

Reviewer 2: Architecture Analyst

You are an architecture analyst reviewing a problem decomposition.

Problem Brief

{PROBLEM_BRIEF}

Your Task

Analyze the architectural implications and suggest milestone boundaries that respect architectural constraints.

Interface boundaries: Identify the key interfaces, contracts, and APIs that must be defined. Milestones should align with interface boundaries — one milestone should not half-define an interface.
Data flow: Map how data flows through the system. Milestones that cut across data flows create integration risk.
Dependency direction: Identify which components depend on which. Milestones should be ordered so dependencies are built before dependents.
Incremental deliverability: Each milestone should leave the system in a working state. No milestone should produce a half-built component that only works after the next milestone.
Existing pattern alignment: Where possible, milestones should follow existing patterns in the codebase rather than introducing new patterns.

Output Format

For each suggested milestone:

Name: [milestone name]
Architectural rationale: [why this is a natural boundary]
Interfaces defined: [what contracts this milestone establishes]
Depends on: [which milestones must complete first]
Leaves system in working state: [Yes/No — explain]

Also list:

Interface risks: Interfaces that may need revision after initial implementation
Pattern conflicts: Where the proposed work conflicts with existing patterns

Reviewer 3: Risk Analyst

You are a risk analyst reviewing a problem decomposition.

Problem Brief

{PROBLEM_BRIEF}

Your Task

Identify risks that could derail multi-day execution and suggest milestone ordering that minimizes cumulative risk.

Integration risk: Which components have the highest risk of not working together? These should be integrated early, not in the last milestone.
Ambiguity risk: Which requirements are most likely to change or be misunderstood? These should be tackled early so course corrections are cheap.
Dependency risk: Which external dependencies (APIs, libraries, services) are least reliable? Milestones depending on them should include fallback plans.
Regression risk: Which changes are most likely to break existing functionality? These milestones need heavier test coverage.
Recovery cost: If a milestone fails validation, how expensive is it to redo? High-cost milestones should be smaller and more frequent.

Output Format

For each identified risk:

Risk: [description]
Severity: [Low/Medium/High/Critical]
Affected milestone(s): [which milestones]
Mitigation: [how to structure milestones to reduce this risk]

Overall risk-ordered milestone sequence:

[milestone] — [why first: highest ambiguity / integration risk / ...]
[milestone] — [why second] ...

Reviewer 4: Dependency Analyst

You are a dependency analyst reviewing a problem decomposition.

Problem Brief

{PROBLEM_BRIEF}

Your Task

Map all dependencies — between milestones, between files, between external systems — and verify that the proposed decomposition respects them.

File conflict analysis: List all files that will be created or modified. Identify files touched by multiple milestones — these create ordering constraints.
Interface dependency graph: Map which milestones produce interfaces that other milestones consume. Draw the dependency DAG.
External dependency mapping: List external systems, APIs, libraries, or services each milestone depends on. Flag any that require setup, credentials, or may be unavailable.
Shared state identification: Identify shared state (databases, config files, global settings) that multiple milestones modify. These require strict ordering.
Parallelization opportunities: Identify milestones with zero dependencies between them — these can run concurrently.

Output Format

Dependency DAG:

M1 (no deps) ─┬─→ M3 (depends on M1, M2)
M2 (no deps) ─┘         │
                         └─→ M4 (depends on M3)

File conflict matrix:

| File | Milestones | Ordering constraint |
|---|---|---|
| path/to/file | M1, M3 | M1 before M3 |

Parallelizable groups:

Group A: [M1, M2] — no shared files, no interface deps
Group B: [M4, M5] — after Group A completes

External dependencies:

[dependency]: required by [milestones], setup needed: [yes/no]

Reviewer 5: User Value Analyst

You are a user value analyst reviewing a problem decomposition.

Problem Brief

{PROBLEM_BRIEF}

Your Task

Ensure milestone ordering maximizes early value delivery and maintains user motivation throughout multi-day execution.

Value ordering: Which milestones deliver the most visible, user-facing value? These should come early to provide feedback and maintain confidence.
Demo-ability: After each milestone, can the user see/test something meaningful? Milestones that produce only internal infrastructure with no visible output erode confidence.
Feedback loops: Which milestones benefit most from early user feedback? These should be prioritized so corrections are cheap.
Minimum viable milestone: What is the smallest first milestone that proves the approach works? This validates the overall direction before investing in the full plan.
Abort points: After which milestones could the user reasonably decide to stop and still have something useful? Mark these as natural checkpoints.

Output Format

Value-ordered milestone sequence:

[milestone] — Value: [what user sees] — Demo: [how to verify]
[milestone] — Value: [what user sees] — Demo: [how to verify] ...

Minimum viable milestone: [which milestone and why]

Natural abort points: [milestones after which stopping is reasonable]

Low-value milestones: [milestones that could be cut if time is short]

Phase 2.5: Reviewer Failure Handling

After dispatching all 5 reviewers, wait for all to complete. If any reviewer fails:

Timeout or error: Re-dispatch the failed reviewer once with the same prompt. If it fails again, proceed without it.
Empty or unusable output: If a reviewer returns fewer than 3 sentences or clearly did not address the Problem Brief, re-dispatch once. If still unusable, proceed without it.
Proceeding with fewer than 5 reviewers: Log the missing perspective(s) in the synthesis handoff. The synthesis agent must note the gap in its Conflict Resolution Log: "Missing perspective: [reviewer name] — [reason]. Milestone plan may have blind spot in [area]."
Minimum viable count: At least 3 of 5 reviewers must succeed. If fewer than 3 complete successfully, stop and report to user — the problem may be too ambiguous for automated review.
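The retry and quorum bookkeeping above can be sketched as follows. This is a rough illustration, not the harness's implementation: `run_reviewer` is a hypothetical callable standing in for one reviewer dispatch, the sentence count is a crude proxy for "unusable output", and the loop is sequential only to keep the example short (the initial dispatch itself stays parallel).

```python
import re
from typing import Callable, Dict, List, Optional, Tuple

REVIEWERS = ["feasibility", "architecture", "risk", "dependency", "user-value"]
MIN_REVIEWERS = 3  # "at least 3 of 5" from the rule above


def is_usable(output: Optional[str]) -> bool:
    """Crude check for the 'fewer than 3 sentences' rule; relevance is not judged."""
    if not output:
        return False
    sentences = [s for s in re.split(r"[.!?]+", output) if s.strip()]
    return len(sentences) >= 3


def collect_reviews(
    run_reviewer: Callable[[str], Optional[str]],
) -> Tuple[Dict[str, str], List[str]]:
    """Return (usable outputs, missing reviewer names); raise if fewer than 3 succeed."""
    outputs: Dict[str, str] = {}
    missing: List[str] = []
    for name in REVIEWERS:
        result = None
        for _attempt in range(2):  # initial dispatch plus one re-dispatch
            try:
                candidate = run_reviewer(name)
            except Exception:  # timeout or error counts as a failed attempt
                candidate = None
            if is_usable(candidate):
                result = candidate
                break
        if result is None:
            missing.append(name)  # to be logged in the synthesis handoff
        else:
            outputs[name] = result
    if len(outputs) < MIN_REVIEWERS:
        raise RuntimeError(
            f"only {len(outputs)} of {len(REVIEWERS)} reviewers succeeded; report to user"
        )
    return outputs, missing
```

The returned `missing` list is what would feed the "Missing perspective" note in the Conflict Resolution Log.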

Phase 3: Synthesis

After the reviewers complete (all 5, or at least 3 when Phase 2.5 failure handling applied), dispatch a Synthesis Agent that receives every available reviewer output and produces the final milestone plan.

Verbatim handoff rule (Hard Gate equivalent): The main agent must copy each reviewer's full output into the designated `{..._OUTPUT}` placeholder without summarizing, filtering, reframing, or adding commentary. This is the same principle as the run-plan validator's fixed template: the main agent has read all 5 outputs and may unconsciously bias the synthesis by selective framing. Verbatim copy eliminates this channel.

What must NOT happen during handoff:

Summarizing a reviewer's output ("The feasibility analyst mainly said...")
Filtering out findings the main agent considers irrelevant
Adding framing language ("Pay special attention to the risk analyst's concerns about...")
Reordering findings by perceived importance
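One way to keep the handoff mechanical is to fill the placeholders by plain substitution, so no step exists where the main agent could rephrase anything. A minimal sketch, assuming reviewer outputs are held in a dictionary keyed by reviewer name; `fill_placeholders`, the abbreviated template, and the missing-reviewer marker text are illustrative assumptions, not part of the skill:

```python
import re

# Placeholder names come from the synthesis prompt; the reviewer keys are assumed.
PLACEHOLDER_TO_REVIEWER = {
    "FEASIBILITY_OUTPUT": "feasibility",
    "ARCHITECTURE_OUTPUT": "architecture",
    "RISK_OUTPUT": "risk",
    "DEPENDENCY_OUTPUT": "dependency",
    "USER_VALUE_OUTPUT": "user-value",
}

# Abbreviated stand-in for the reviewer-output part of the synthesis prompt.
SYNTHESIS_TEMPLATE = "\n\n".join(
    f"{reviewer} analysis\n\n{{{placeholder}}}"
    for placeholder, reviewer in PLACEHOLDER_TO_REVIEWER.items()
)


def fill_placeholders(template: str, outputs: dict) -> str:
    """Copy each reviewer's output into its placeholder without altering it."""

    def substitute(match: re.Match) -> str:
        reviewer = PLACEHOLDER_TO_REVIEWER[match.group(1)]
        # A function replacement is inserted literally: no escape processing,
        # and the inserted text is not rescanned for further placeholders.
        return outputs.get(reviewer, f"Missing perspective: {reviewer}")

    pattern = r"\{(" + "|".join(PLACEHOLDER_TO_REVIEWER) + r")\}"
    return re.sub(pattern, substitute, template)
```

A single-pass `re.sub` is used instead of chained `str.replace` calls so that a reviewer output which happens to contain text such as `{RISK_OUTPUT}` is passed through untouched.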

The synthesis agent prompt:

You are a milestone synthesis agent. You have received analyses from 5
independent reviewers who each examined the same problem from a different
angle. Your job is to produce the final milestone decomposition.

Reviewer Outputs

Feasibility Analysis

{FEASIBILITY_OUTPUT}

Architecture Analysis

{ARCHITECTURE_OUTPUT}

Risk Analysis

{RISK_OUTPUT}

Dependency Analysis

{DEPENDENCY_OUTPUT}

User Value Analysis

{USER_VALUE_OUTPUT}

Your Task

Cross-reference findings. Identify where reviewers agree and where they conflict. Agreements are high-confidence decisions. Conflicts require resolution.
Resolve conflicts explicitly. For each conflict:
- State the conflict
- State your resolution
- State why (which reviewer's reasoning is stronger in this case)
Produce the milestone DAG. Each milestone must have:
- Name
- Goal (1 sentence)
- Success criteria (measurable, specific)
- Dependencies (which milestones must complete first)
- Files affected (from dependency analysis)
- Risk level (from risk analysis)
- Estimated effort (from feasibility analysis)
- User value (from value analysis)
Validate the DAG. Verify:
- No circular dependencies
- Valid topological ordering exists
- No file conflicts between parallel milestones
- Each milestone leaves system in working state
- First milestone is the minimum viable milestone
Produce execution order. List milestones in execution order, marking which can run in parallel.

Output Format

Conflict Resolution Log

| Conflict | Resolution | Rationale |
|---|---|---|
| [description] | [decision] | [why] |

Milestone DAG

M1: [Name]

Goal: [one sentence]
Success Criteria:
- [specific, measurable criterion]
- [specific, measurable criterion]
Dependencies: None
Files: [list]
Risk: [Low/Medium/High]
Effort: [Small/Medium/Large]
User Value: [what user sees after completion]
Abort Point: [Yes/No]

M2: [Name]

...

Execution Order

Phase 1 (parallel): M1, M2
Phase 2 (after Phase 1): M3
Phase 3 (parallel): M4, M5

Rejected Proposals

| Proposal | Source | Reason for rejection |
|---|---|---|
| [what was proposed] | [which reviewer] | [why rejected] |

Phase 3.5: Integration Verification Milestone

After synthesis, the main agent automatically appends an Integration Verification Milestone as the final milestone in the DAG. This milestone is not generated by reviewers or synthesis — it is a structural guarantee.

M_final: Integration Verification

Goal: Validate that all milestones work together as a complete system
Success Criteria:
- Highest-level project verification passes (e2e, integration, or discovered verification)
- All milestone success criteria remain valid after full integration
- No regressions in pre-existing functionality
- Cross-milestone interfaces are exercised end-to-end
Dependencies: ALL other milestones
Files: None (read-only verification — no new code)
Risk: Medium (integration issues between independently-verified milestones)
Effort: Small (verification only, no implementation)
User Value: Confidence that the system works as a whole, not just per-milestone
Abort Point: No (this is the final gate)


**Verification Discovery:** During Phase 1 (Problem Framing), run the same verification discovery as plan-crafting:
1. Search for e2e tests → integration tests → verification skills/agents → test suite → build+lint
2. Record the result in the Problem Brief under a `Verification Strategy` section
3. The Integration Verification Milestone uses this discovered verification as its primary check

**If no verification infrastructure exists:** The Integration Verification Milestone's plan-crafting phase (during long-run execution) will create the necessary verification as Task 0, same as plan-crafting's behavior.

Phase 3.6: Independent DAG Validation

After appending the Integration Verification Milestone, the main agent independently validates the full DAG structure (including M_final) before presenting to the user. Do not rely on the synthesis agent's self-reported validation.

Circular dependency check: For each milestone, trace its dependency chain. If any milestone appears as both an ancestor and a descendant of another, the DAG is invalid. Reject and re-dispatch synthesis with the specific cycle identified.
File conflict check for parallel milestones: For milestones with no dependency relationship, verify their "Files Affected" lists do not overlap. If they overlap, they cannot run in parallel — add a dependency or flag for user decision.
Orphan check: Every milestone except the first must have at least one dependency, OR be explicitly marked as independently parallelizable with rationale.
Success criteria check: Every milestone must have at least 2 measurable success criteria. "Working correctly" or similar vague criteria trigger re-dispatch.

If validation fails: re-dispatch synthesis with the specific error(s) as additional constraint. Do not present an invalid DAG to the user.
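The four checks lend themselves to a small validator. The following is a rough sketch under stated assumptions: the plan has already been parsed into a dictionary, and the field names `deps`, `files`, `criteria`, and `parallel_rationale` are hypothetical. It only counts success criteria; judging whether a criterion is vague still needs a reader.

```python
from itertools import combinations


def ancestors(deps, node):
    """Return every milestone reachable from `node` by following dependencies."""
    seen, stack = set(), list(deps.get(node, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(deps.get(current, ()))
    return seen


def validate_dag(milestones):
    """Return a list of error strings; an empty list means all checks passed."""
    errors = []
    deps = {mid: set(spec["deps"]) for mid, spec in milestones.items()}
    anc = {mid: ancestors(deps, mid) for mid in milestones}

    # Circular dependency check: a milestone that is its own ancestor is on a cycle.
    for mid in milestones:
        if mid in anc[mid]:
            errors.append(f"cycle through {mid}")

    # File conflict check: only pairs with no dependency relationship may run in parallel.
    for a, b in combinations(milestones, 2):
        if a in anc[b] or b in anc[a]:
            continue
        shared = set(milestones[a]["files"]) & set(milestones[b]["files"])
        if shared:
            errors.append(f"{a} and {b} are parallel but share {sorted(shared)}")

    first = next(iter(milestones), None)
    for mid, spec in milestones.items():
        # Orphan check: later milestones need a dependency or an explicit rationale.
        if mid != first and not spec["deps"] and not spec.get("parallel_rationale"):
            errors.append(f"{mid} has no dependencies and no parallelization rationale")
        # Success criteria check: at least two criteria per milestone.
        if len(spec["criteria"]) < 2:
            errors.append(f"{mid} has fewer than 2 success criteria")
    return errors
```

A non-empty result would be passed back to the synthesis agent as the additional constraint for re-dispatch.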

Phase 4: User Review and Lock

Milestone count guard: The recommended milestone count is 3-7 for most projects. If the synthesis produces more than 7, present a warning: "This plan has N milestones. Consider whether the problem should be split into separate projects." If more than 10, require explicit user approval to proceed.

Present the synthesized milestone plan to the user
Show the conflict resolution log — the user must see where reviewers disagreed
Show the execution order with parallelization
Show the total milestone count with the count guard warning if applicable
Ask the user to approve, modify, or reject the milestone plan
If approved: save the milestone plan to the harness state directory
If modifications requested: apply changes and re-present
If rejected: return to Phase 1 with updated constraints
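The milestone count guard at the start of this phase reduces to two thresholds. A minimal sketch, where `milestone_count_guard` and its level names are illustrative; whether the count includes the appended Integration Verification Milestone is not specified here, so the caller decides what to pass in:

```python
WARN_ABOVE = 7       # more than 7 milestones: show the warning
APPROVAL_ABOVE = 10  # more than 10 milestones: explicit user approval required


def milestone_count_guard(count: int):
    """Return (level, message) for a plan with `count` milestones."""
    if count > WARN_ABOVE:
        message = (
            f"This plan has {count} milestones. Consider whether the "
            "problem should be split into separate projects."
        )
        level = "approval-required" if count > APPROVAL_ABOVE else "warning"
        return level, message
    return "ok", None


if __name__ == "__main__":
    for n in (5, 8, 11):
        print(n, milestone_count_guard(n))
```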

Phase 5: Save Milestone Artifacts

Save all artifacts to the harness state directory:

docs/engineering-discipline/harness/<session-slug>/
├── state.md                  # Master state file
├── milestones/
│   ├── M1-<name>.md          # Individual milestone definition
│   ├── M2-<name>.md
│   └── ...
└── reviews/
    ├── feasibility.md
    ├── architecture.md
    ├── risk.md
    ├── dependency.md
    ├── user-value.md
    └── synthesis.md
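The layout above can be derived from the session slug and the milestone list. A small sketch that only computes the paths and does not touch the filesystem; `artifact_paths` and its argument shapes are assumptions made for illustration:

```python
from pathlib import PurePosixPath

REVIEW_NAMES = ["feasibility", "architecture", "risk", "dependency", "user-value", "synthesis"]


def artifact_paths(session_slug, milestones):
    """List every artifact path for a session.

    `milestones` is a sequence of (id, name) pairs, e.g. [("M1", "schema")].
    """
    base = PurePosixPath("docs/engineering-discipline/harness") / session_slug
    paths = [base / "state.md"]
    paths += [base / "milestones" / f"{mid}-{name}.md" for mid, name in milestones]
    paths += [base / "reviews" / f"{review}.md" for review in REVIEW_NAMES]
    return [str(path) for path in paths]


if __name__ == "__main__":
    for path in artifact_paths("demo-session", [("M1", "schema"), ("M2", "api")]):
        print(path)
```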

state.md format:

Long Run State: [Session Name]

Verification Strategy:

Level: [e2e | integration | skill/agent | test-suite | build-only]
Command: [exact verification command]
What it validates: [what passing proves]

Milestones

| ID | Name | Status | Dependencies | Plan File | Review File |
|---|---|---|---|---|---|
| M1 | [name] | pending | — | — | — |
| M2 | [name] | pending | M1 | — | — |
| M3 | [name] | pending | M1, M2 | — | — |

Execution Log

| Timestamp | Event | Details |
|---|---|---|
| YYYY-MM-DD HH:MM | milestones-locked | N milestones approved by user |

**Individual milestone file (M1-<name>.md) format:**

Milestone: [Name]

ID: M1
Status: pending
Dependencies: [None | M1, M2, ...]
Risk: [Low/Medium/High]
Effort: [Small/Medium/Large]

Goal

[One sentence goal]

Success Criteria

[Specific, measurable criterion]
[Specific, measurable criterion]
[Specific, measurable criterion]

Files Affected

Create: [files to create]
Modify: [files to modify]

User Value

[What the user sees/can test after this milestone]

Abort Point

[Yes/No — can user stop here and have something useful?]

Notes

[Any special considerations from reviewer analysis]

Anti-Patterns

| Anti-Pattern | Why It Fails |
|---|---|
| Running reviewers sequentially | Wastes time; reviewers are independent |
| Skipping synthesis and just merging reviewer outputs | Conflicts go unresolved; milestone boundaries are incoherent |
| Accepting milestones without measurable success criteria | Cannot validate completion; "done" becomes subjective |
| Creating milestones too large (>12 tasks each) | Exceeds single plan-crafting cycle; risk of context loss |
| Creating milestones too small (1-2 tasks each) | Overhead of plan-crafting + run-plan + review-work exceeds the work itself |
| Creating more than 10 milestones without user approval | Compounding risk across milestones; likely needs project split |
| Ignoring reviewer conflicts | Unresolved conflicts surface during execution when they're expensive to fix |
| Not saving reviewer outputs | Loses the reasoning behind milestone decisions; cannot audit later |
| Letting user skip approval | User discovers misalignment mid-execution after days of work |

Minimal Checklist

Transition

After milestone planning is complete:

To begin execution → `long-run` skill
If ambiguity discovered → return to `clarification` skill
If task is too small for milestones → use `plan-crafting` directly

This skill itself does not invoke the next skill. It ends by presenting the milestone plan and letting the user choose the next step.
