milestone-planning
Milestone Planning (Ultraplan)
Decomposes a complex task into milestones by spawning 5 parallel reviewer agents, synthesizing their independent analyses, and producing a milestone dependency DAG.
Core Principle
Milestones are the unit of long-running execution. A bad milestone decomposition cascades into days of wasted work. Therefore milestone generation must be adversarial — multiple independent perspectives must challenge each other before milestones are locked.
Hard Gates
- All 5 reviewer agents must run in parallel. Sequential execution is prohibited. Dispatch all 5 concurrently in a single message via the Agent tool.
- Each reviewer receives the full problem statement. Do not split or filter the problem per reviewer. Every reviewer sees everything.
- Reviewers must not see each other's findings. Each reviewer operates independently. No cross-pollination during the review phase.
- Synthesis must address every reviewer's concern. The synthesis agent must explicitly respond to each finding — accepted, rejected with reason, or deferred to a specific milestone.
- Every milestone must have measurable success criteria. "Working correctly" is not a criterion. Specific test commands, file existence checks, or behavioral assertions are required.
- Milestone dependencies must form a DAG. Circular dependencies are a plan failure. Every milestone must have a clear topological ordering.
- Do not generate milestones for trivial tasks. If the problem can be solved in a single plan-crafting cycle (fewer than ~8 tasks), tell the user to use plan-crafting directly.
- Reviewer outputs must be passed verbatim to the synthesis agent. Do not summarize, filter, or reframe. Copy each reviewer's full output into the designated placeholder. The main agent must not editorialize the handoff.
When To Use
- When the user presents a complex, multi-day task
- When the long-run harness needs milestone decomposition
- When the user says "plan milestones", "break this into milestones", or "ultraplan"
- When a task clearly requires multiple independent implementation phases
When NOT To Use
- Single-day tasks (use plan-crafting directly)
- Tasks with fewer than ~8 implementation steps
- When milestones are already defined and the user wants execution (use long-run)
- When work scope is still ambiguous (use clarification first)
Input
The skill requires a clear problem statement as input. This can come from:
- A Context Brief file produced by the `clarification` skill (preferred)
- A direct, detailed request from the user (must include goal, scope, constraints)
If the input is ambiguous, return to the `clarification` skill before proceeding.
Process
Phase 1: Problem Framing
Before dispatching reviewers, frame the problem:
- Read the input (Context Brief or user request)
- Identify: goal, scope boundaries, technical constraints, success criteria
- If a codebase is involved, dispatch an Explore agent to map relevant architecture
- Compose the Problem Brief — a self-contained document that each reviewer will receive:
Problem Brief
Goal: [What must be achieved]
Scope:
- In: [What is included]
- Out: [What is explicitly excluded]
Technical Context:
[Relevant architecture, existing code, constraints]
Constraints:
[Time, compatibility, dependencies, performance requirements]
Success Criteria:
[Specific, measurable outcomes]
Verification Strategy:
- Level: [e2e | integration | skill/agent | test-suite | build-only]
- Command: [exact command to run the verification]
- What it validates: [what passing this verification proves]
Phase 2: Parallel Reviewer Dispatch
Dispatch all 5 reviewer agents concurrently in a single message via the Agent tool. Each receives the full Problem Brief and its reviewer-specific prompt.
Agent configuration for reviewers:
- Use `run_in_background: true` so reviewers execute concurrently without blocking each other
- Do NOT set `isolation: "worktree"`; reviewers are read-only analysts, not code writers
- The claude-code fork agent default is `maxTurns: 200`; reviewers should complete well within this. If a reviewer appears stuck (no response after extended time), this is likely a rate limit or timeout. See Phase 2.5 for failure handling.
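A minimal sketch of the dispatch pattern, assuming a hypothetical `run_reviewer` coroutine stands in for the Agent tool call (the real tool interface is not shown in this document). It illustrates the two gates that matter here: every reviewer receives the same full Problem Brief, and all five start together rather than one after another.

```python
import asyncio

REVIEWERS = ["feasibility", "architecture", "risk", "dependency", "user-value"]


async def run_reviewer(name: str, problem_brief: str) -> str:
    """Hypothetical stand-in for one Agent tool call; returns the raw output."""
    await asyncio.sleep(0)  # placeholder for the real, slow agent run
    return f"[{name}] analysis of a {len(problem_brief)}-character brief"


async def dispatch_all(problem_brief: str) -> dict[str, object]:
    """Start every reviewer at once; each one sees the full, unfiltered brief."""
    tasks = [run_reviewer(name, problem_brief) for name in REVIEWERS]
    # return_exceptions=True keeps one failed reviewer from cancelling the
    # others, so failures can be inspected afterwards instead of aborting.
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return dict(zip(REVIEWERS, results))
```

Calling `asyncio.run(dispatch_all(brief))` returns one entry per reviewer; an entry holding an exception object marks a reviewer that needs the failure handling described in Phase 2.5.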
Reviewer 1: Feasibility Analyst
You are a feasibility analyst reviewing a problem decomposition.
Problem Brief
{PROBLEM_BRIEF}
Your Task
Analyze the feasibility of solving this problem. For each major component:
- Technical feasibility: Can this be built with the stated tech stack? Identify any components that require research, prototyping, or may not be possible as described.
- Effort estimation: Classify each component as:
  - Small (1-3 tasks, < 1 plan cycle)
  - Medium (4-8 tasks, 1 plan cycle)
  - Large (9+ tasks, multiple plan cycles → candidate for milestone)
  - Uncertain (requires spike/prototype before estimation)
- Risk of underestimation: Flag components that appear simple but have hidden complexity (integration points, edge cases, data migration, backward compatibility).
- Suggested milestone boundaries: Based on effort and risk, suggest where natural milestone boundaries should fall. A milestone should be independently deliverable and testable.
Output Format
For each suggested milestone:
- Name: [milestone name]
- Effort: [Small/Medium/Large/Uncertain]
- Feasibility risk: [Low/Medium/High] — [reason]
- Key deliverable: [what this milestone produces]
Also list:
- Spike candidates: Components needing prototype before planning
- Underestimation risks: Components likely harder than they appear
Reviewer 2: Architecture Analyst
You are an architecture analyst reviewing a problem decomposition.
Problem Brief
{PROBLEM_BRIEF}
Your Task
Analyze the architectural implications and suggest milestone boundaries that respect architectural constraints.
- Interface boundaries: Identify the key interfaces, contracts, and APIs that must be defined. Milestones should align with interface boundaries; one milestone should not half-define an interface.
- Data flow: Map how data flows through the system. Milestones that cut across data flows create integration risk.
- Dependency direction: Identify which components depend on which. Milestones should be ordered so dependencies are built before dependents.
- Incremental deliverability: Each milestone should leave the system in a working state. No milestone should produce a half-built component that only works after the next milestone.
- Existing pattern alignment: Where possible, milestones should follow existing patterns in the codebase rather than introducing new patterns.
Output Format
For each suggested milestone:
- Name: [milestone name]
- Architectural rationale: [why this is a natural boundary]
- Interfaces defined: [what contracts this milestone establishes]
- Depends on: [which milestones must complete first]
- Leaves system in working state: [Yes/No — explain]
Also list:
- Interface risks: Interfaces that may need revision after initial implementation
- Pattern conflicts: Where the proposed work conflicts with existing patterns
Reviewer 3: Risk Analyst
You are a risk analyst reviewing a problem decomposition.
Problem Brief
{PROBLEM_BRIEF}
Your Task
Identify risks that could derail multi-day execution and suggest milestone ordering that minimizes cumulative risk.
- Integration risk: Which components have the highest risk of not working together? These should be integrated early, not in the last milestone.
- Ambiguity risk: Which requirements are most likely to change or be misunderstood? These should be tackled early so course corrections are cheap.
- Dependency risk: Which external dependencies (APIs, libraries, services) are least reliable? Milestones depending on them should include fallback plans.
- Regression risk: Which changes are most likely to break existing functionality? These milestones need heavier test coverage.
- Recovery cost: If a milestone fails validation, how expensive is it to redo? High-cost milestones should be smaller and more frequent.
Output Format
For each identified risk:
- Risk: [description]
- Severity: [Low/Medium/High/Critical]
- Affected milestone(s): [which milestones]
- Mitigation: [how to structure milestones to reduce this risk]
Overall risk-ordered milestone sequence:
- [milestone] — [why first: highest ambiguity / integration risk / ...]
- [milestone] — [why second] ...
Reviewer 4: Dependency Analyst
You are a dependency analyst reviewing a problem decomposition.
Problem Brief
{PROBLEM_BRIEF}
Your Task
Map all dependencies (between milestones, between files, between external systems) and verify that the proposed decomposition respects them.
- File conflict analysis: List all files that will be created or modified. Identify files touched by multiple milestones; these create ordering constraints.
- Interface dependency graph: Map which milestones produce interfaces that other milestones consume. Draw the dependency DAG.
- External dependency mapping: List external systems, APIs, libraries, or services each milestone depends on. Flag any that require setup, credentials, or may be unavailable.
- Shared state identification: Identify shared state (databases, config files, global settings) that multiple milestones modify. These require strict ordering.
- Parallelization opportunities: Identify milestones with zero dependencies between them; these can run concurrently.
Output Format
Dependency DAG:
M1 (no deps) ─┬─→ M3 (depends on M1, M2)
M2 (no deps) ─┘   │
                  └─→ M4 (depends on M3)
File conflict matrix:
| File | Milestones | Ordering constraint |
|---|---|---|
| path/to/file | M1, M3 | M1 before M3 |
Parallelizable groups:
- Group A: [M1, M2] — no shared files, no interface deps
- Group B: [M4, M5] — after Group A completes
External dependencies:
- [dependency]: required by [milestones], setup needed: [yes/no]
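The parallelizable groups requested in this output format can be derived mechanically once the dependency DAG is known. A minimal sketch, assuming the DAG is a plain dict from milestone ID to the set of its prerequisite IDs; the function name `parallel_groups` and the sample data are illustrative and not part of the skill.

```python
def parallel_groups(deps: dict[str, set[str]]) -> list[list[str]]:
    """Group milestones into waves. Every milestone in a wave has all of its
    dependencies satisfied by earlier waves, so a wave can run concurrently."""
    remaining = {m: set(d) for m, d in deps.items()}
    done: set[str] = set()
    groups: list[list[str]] = []
    while remaining:
        ready = sorted(m for m, d in remaining.items() if d <= done)
        if not ready:
            # Nothing can start: a cycle, or a dependency on an unknown milestone.
            raise ValueError(f"unsatisfiable dependencies among: {sorted(remaining)}")
        groups.append(ready)
        done.update(ready)
        for m in ready:
            del remaining[m]
    return groups


# Sample DAG matching the diagram: M3 needs M1 and M2, M4 needs M3.
sample = {"M1": set(), "M2": set(), "M3": {"M1", "M2"}, "M4": {"M3"}}
print(parallel_groups(sample))  # [['M1', 'M2'], ['M3'], ['M4']]
```

This only considers declared dependencies. Two milestones in the same wave may still touch the same file, which is what the file conflict matrix is for.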
Reviewer 5: User Value Analyst
You are a user value analyst reviewing a problem decomposition.
Problem Brief
{PROBLEM_BRIEF}
Your Task
Ensure milestone ordering maximizes early value delivery and maintains user motivation throughout multi-day execution.
- Value ordering: Which milestones deliver the most visible, user-facing value? These should come early to provide feedback and maintain confidence.
- Demo-ability: After each milestone, can the user see/test something meaningful? Milestones that produce only internal infrastructure with no visible output erode confidence.
- Feedback loops: Which milestones benefit most from early user feedback? These should be prioritized so corrections are cheap.
- Minimum viable milestone: What is the smallest first milestone that proves the approach works? This validates the overall direction before investing in the full plan.
- Abort points: After which milestones could the user reasonably decide to stop and still have something useful? Mark these as natural checkpoints.
Output Format
Value-ordered milestone sequence:
- [milestone] — Value: [what user sees] — Demo: [how to verify]
- [milestone] — Value: [what user sees] — Demo: [how to verify] ...
Minimum viable milestone: [which milestone and why]
Natural abort points: [milestones after which stopping is reasonable]
Low-value milestones: [milestones that could be cut if time is short]
Phase 2.5: Reviewer Failure Handling
After dispatching all 5 reviewers, wait for all to complete. If any reviewer fails:
- Timeout or error: Re-dispatch the failed reviewer once with the same prompt. If it fails again, proceed without it.
- Empty or unusable output: If a reviewer returns fewer than 3 sentences or clearly did not address the Problem Brief, re-dispatch once. If still unusable, proceed without it.
- Proceeding with fewer than 5 reviewers: Log the missing perspective(s) in the synthesis handoff. The synthesis agent must note the gap in its Conflict Resolution Log: "Missing perspective: [reviewer name] — [reason]. Milestone plan may have blind spot in [area]."
- Minimum viable count: At least 3 of 5 reviewers must succeed. If fewer than 3 complete successfully, stop and report to user — the problem may be too ambiguous for automated review.
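The retry and minimum-count rules above can be sketched as follows. This is an illustration under stated assumptions: `run` is a hypothetical callable standing in for one reviewer dispatch, the sentence count is a rough proxy for "fewer than 3 sentences" (whether an output addresses the Problem Brief still needs judgment), and the loop is sequential only to keep the retry logic readable; real dispatch stays concurrent.

```python
from typing import Callable, Optional

MIN_REVIEWERS = 3  # at least 3 of 5 reviewers must succeed


def is_usable(output: Optional[str]) -> bool:
    """Rough proxy for the sentence rule: count non-empty sentence fragments."""
    if not output:
        return False
    normalized = output.replace("!", ".").replace("?", ".")
    sentences = [s for s in normalized.split(".") if s.strip()]
    return len(sentences) >= 3


def collect_reviews(
    names: list[str], run: Callable[[str], str]
) -> tuple[dict[str, str], list[str]]:
    """Run each reviewer, re-dispatching once on error or unusable output.

    Returns (usable outputs, names of missing perspectives).
    """
    outputs: dict[str, str] = {}
    missing: list[str] = []
    for name in names:
        for _attempt in range(2):  # first try plus one re-dispatch
            try:
                result = run(name)
            except Exception:
                continue
            if is_usable(result):
                outputs[name] = result
                break
        else:
            missing.append(name)  # both attempts failed; proceed without it
    if len(outputs) < MIN_REVIEWERS:
        raise RuntimeError(f"only {len(outputs)} reviewers succeeded; report to user")
    return outputs, missing
```

The `missing` list is what gets logged in the synthesis handoff so the synthesis agent can record the gap.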
Phase 3: Synthesis
After all 5 reviewers complete, dispatch a Synthesis Agent that receives all 5 reviewer outputs and produces the final milestone plan.
Verbatim handoff rule (Hard Gate equivalent): The main agent must copy each reviewer's full output into the designated `{..._OUTPUT}` placeholder without summarizing, filtering, reframing, or adding commentary. This is the same principle as the run-plan validator's fixed template: the main agent has read all 5 outputs and may unconsciously bias the synthesis by selective framing. Verbatim copy eliminates this channel.
What must NOT happen during handoff:
- Summarizing a reviewer's output ("The feasibility analyst mainly said...")
- Filtering out findings the main agent considers irrelevant
- Adding framing language ("Pay special attention to the risk analyst's concerns about...")
- Reordering findings by perceived importance
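One way to keep the handoff mechanical is to do the substitution in code, so no summarizing or reframing step exists at all. A minimal sketch, assuming the reviewer outputs are held in a dict keyed by short reviewer names; the placeholder names come from the synthesis prompt, while `fill_template` and the missing-perspective marker are illustrative assumptions.

```python
import re

# Placeholder -> reviewer key. The keys are hypothetical labels for this sketch.
REVIEWER_FOR = {
    "{FEASIBILITY_OUTPUT}": "feasibility",
    "{ARCHITECTURE_OUTPUT}": "architecture",
    "{RISK_OUTPUT}": "risk",
    "{DEPENDENCY_OUTPUT}": "dependency",
    "{USER_VALUE_OUTPUT}": "user-value",
}
PATTERN = re.compile("|".join(re.escape(p) for p in REVIEWER_FOR))


def fill_template(template: str, outputs: dict[str, str]) -> str:
    """Substitute each reviewer's raw output for its placeholder, unmodified."""

    def substitute(match: re.Match) -> str:
        reviewer = REVIEWER_FOR[match.group(0)]
        # No trimming or rewording; a missing reviewer is stated plainly.
        return outputs.get(reviewer, f"[Missing perspective: {reviewer}]")

    # A single pass over the template: inserted reviewer text is never
    # rescanned, so placeholder-like strings inside an output stay untouched.
    return PATTERN.sub(substitute, template)
```

Using a replacement function rather than a replacement string also means backslashes in reviewer output are copied literally.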
The synthesis agent prompt:
You are a milestone synthesis agent. You have received analyses from 5
independent reviewers who each examined the same problem from a different
angle. Your job is to produce the final milestone decomposition.
Reviewer Outputs
Feasibility Analysis
{FEASIBILITY_OUTPUT}
Architecture Analysis
{ARCHITECTURE_OUTPUT}
Risk Analysis
{RISK_OUTPUT}
Dependency Analysis
{DEPENDENCY_OUTPUT}
User Value Analysis
{USER_VALUE_OUTPUT}
Your Task
- Cross-reference findings. Identify where reviewers agree and where they conflict. Agreements are high-confidence decisions. Conflicts require resolution.
- Resolve conflicts explicitly. For each conflict:
  - State the conflict
  - State your resolution
  - State why (which reviewer's reasoning is stronger in this case)
- Produce the milestone DAG. Each milestone must have:
  - Name
  - Goal (1 sentence)
  - Success criteria (measurable, specific)
  - Dependencies (which milestones must complete first)
  - Files affected (from dependency analysis)
  - Risk level (from risk analysis)
  - Estimated effort (from feasibility analysis)
  - User value (from value analysis)
- Validate the DAG. Verify:
  - No circular dependencies
  - Valid topological ordering exists
  - No file conflicts between parallel milestones
  - Each milestone leaves system in working state
  - First milestone is the minimum viable milestone
- Produce execution order. List milestones in execution order, marking which can run in parallel.
Output Format
Conflict Resolution Log
| Conflict | Resolution | Rationale |
|---|---|---|
| [description] | [decision] | [why] |
Milestone DAG
M1: [Name]
- Goal: [one sentence]
- Success Criteria:
- [specific, measurable criterion]
- [specific, measurable criterion]
- Dependencies: None
- Files: [list]
- Risk: [Low/Medium/High]
- Effort: [Small/Medium/Large]
- User Value: [what user sees after completion]
- Abort Point: [Yes/No]
M2: [Name]
...
Execution Order
Phase 1 (parallel): M1, M2
Phase 2 (after Phase 1): M3
Phase 3 (parallel): M4, M5
Rejected Proposals
| Proposal | Source | Reason for rejection |
|---|---|---|
| [what was proposed] | [which reviewer] | [why rejected] |
Phase 3.5: Integration Verification Milestone
After synthesis, the main agent automatically appends an Integration Verification Milestone as the final milestone in the DAG. This milestone is not generated by reviewers or synthesis — it is a structural guarantee.
M_final: Integration Verification
- Goal: Validate that all milestones work together as a complete system
- Success Criteria:
- Highest-level project verification passes (e2e, integration, or discovered verification)
- All milestone success criteria remain valid after full integration
- No regressions in pre-existing functionality
- Cross-milestone interfaces are exercised end-to-end
- Dependencies: ALL other milestones
- Files: None (read-only verification — no new code)
- Risk: Medium (integration issues between independently-verified milestones)
- Effort: Small (verification only, no implementation)
- User Value: Confidence that the system works as a whole, not just per-milestone
- Abort Point: No (this is the final gate)
**Verification Discovery:** During Phase 1 (Problem Framing), run the same verification discovery as plan-crafting:
1. Search for e2e tests → integration tests → verification skills/agents → test suite → build+lint
2. Record the result in the Problem Brief under a `Verification Strategy` section
3. The Integration Verification Milestone uses this discovered verification as its primary check
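The search order in step 1 amounts to a first-match lookup over a fixed priority list. A minimal sketch, assuming discovery has already produced a dict of whatever verification commands exist in the project; the level names come from the Verification Strategy template, while `pick_verification` and the example commands are hypothetical.

```python
from typing import Optional

# Highest-priority level first, mirroring the search order in step 1.
VERIFICATION_LEVELS = ["e2e", "integration", "skill/agent", "test-suite", "build-only"]


def pick_verification(available: dict[str, str]) -> Optional[tuple[str, str]]:
    """Return (level, command) for the strongest verification that exists,
    or None when the project has no verification infrastructure."""
    for level in VERIFICATION_LEVELS:
        if level in available:
            return level, available[level]
    return None


# Hypothetical project that has unit and integration tests but no e2e suite.
found = pick_verification({"test-suite": "make test", "integration": "make integration"})
print(found)  # ('integration', 'make integration')
```

A `None` result corresponds to the case where no verification infrastructure exists.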
**If no verification infrastructure exists:** The Integration Verification Milestone's plan-crafting phase (during long-run execution) will create the necessary verification as Task 0, same as plan-crafting's behavior.
Phase 3.6: Independent DAG Validation
After appending the Integration Verification Milestone, the main agent independently validates the full DAG structure (including M_final) before presenting to the user. Do not rely on the synthesis agent's self-reported validation.
- Circular dependency check: For each milestone, trace its dependency chain. If any milestone appears as both an ancestor and a descendant of another, the DAG is invalid. Reject and re-dispatch synthesis with the specific cycle identified.
- File conflict check for parallel milestones: For milestones with no dependency relationship, verify their "Files Affected" lists do not overlap. If they overlap, they cannot run in parallel — add a dependency or flag for user decision.
- Orphan check: Every milestone except the first must have at least one dependency, OR be explicitly marked as independently parallelizable with rationale.
- Success criteria check: Every milestone must have at least 2 measurable success criteria. "Working correctly" or similar vague criteria trigger re-dispatch.
If validation fails: re-dispatch synthesis with the specific error(s) as additional constraint. Do not present an invalid DAG to the user.
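Three of these checks are mechanical enough to sketch in code. The example below assumes the plan has been parsed into three dicts keyed by milestone ID (dependencies, affected files, success criteria); that representation and the name `validate_dag` are assumptions for illustration. The orphan check is omitted, and the criteria check only counts entries, since judging whether a criterion is vague still needs a reader.

```python
from itertools import combinations


def ancestors(milestone: str, deps: dict[str, set[str]]) -> set[str]:
    """All milestones reachable by following dependency edges from `milestone`."""
    seen: set[str] = set()
    stack = list(deps.get(milestone, ()))
    while stack:
        current = stack.pop()
        if current not in seen:
            seen.add(current)
            stack.extend(deps.get(current, ()))
    return seen


def validate_dag(
    deps: dict[str, set[str]],
    files: dict[str, set[str]],
    criteria: dict[str, list[str]],
) -> list[str]:
    """Return a list of error strings; an empty list means the checks passed."""
    errors: list[str] = []
    anc = {m: ancestors(m, deps) for m in deps}
    # Circular dependency: a milestone that is its own ancestor.
    for m in sorted(deps):
        if m in anc[m]:
            errors.append(f"cycle: {m} depends on itself transitively")
    # File conflicts only matter for pairs with no dependency relationship.
    for a, b in combinations(sorted(deps), 2):
        unrelated = a not in anc[b] and b not in anc[a]
        shared = files.get(a, set()) & files.get(b, set())
        if unrelated and shared:
            errors.append(f"file conflict: {a} and {b} both touch {sorted(shared)}")
    # At least 2 success criteria per milestone.
    for m in sorted(deps):
        if len(criteria.get(m, [])) < 2:
            errors.append(f"criteria: {m} has fewer than 2 success criteria")
    return errors
```

The returned error strings can be passed back as the additional constraints when synthesis is re-dispatched.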
Phase 4: User Review and Lock
Milestone count guard: The recommended milestone count is 3-7 for most projects. If the synthesis produces more than 7, present a warning: "This plan has N milestones. Consider whether the problem should be split into separate projects." If more than 10, require explicit user approval to proceed.
- Present the synthesized milestone plan to the user
- Show the conflict resolution log — the user must see where reviewers disagreed
- Show the execution order with parallelization
- Show the total milestone count with the count guard warning if applicable
- Ask the user to approve, modify, or reject the milestone plan
- If approved: save the milestone plan to the harness state directory
- If modifications requested: apply changes and re-present
- If rejected: return to Phase 1 with updated constraints
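The milestone count guard reduces to two thresholds. A minimal sketch, assuming the count is taken from the final plan; whether M_final is included in that count is not specified here, and the function name and action labels are illustrative.

```python
from typing import Optional


def milestone_count_guard(count: int) -> tuple[str, Optional[str]]:
    """Map a milestone count to (action, message) using the 7 and 10 thresholds."""
    warning = (
        f"This plan has {count} milestones. Consider whether the problem "
        "should be split into separate projects."
    )
    if count > 10:
        # Beyond 10, the plan may not proceed without explicit user approval.
        return "require_approval", warning
    if count > 7:
        return "warn", warning
    return "ok", None


print(milestone_count_guard(9)[0])   # warn
print(milestone_count_guard(12)[0])  # require_approval
```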
Phase 5: Save Milestone Artifacts
Save all artifacts to the harness state directory:
```
docs/engineering-discipline/harness/<session-slug>/
├── state.md                 # Master state file
├── milestones/
│   ├── M1-<name>.md         # Individual milestone definition
│   ├── M2-<name>.md
│   └── ...
└── reviews/
    ├── feasibility.md
    ├── architecture.md
    ├── risk.md
    ├── dependency.md
    ├── user-value.md
    └── synthesis.md
```

**state.md format:**

```markdown
# Long Run State: [Session Name]

Created: YYYY-MM-DD HH:MM
Last Updated: YYYY-MM-DD HH:MM
Status: milestone-planning-complete | executing | paused | completing | completed | failed

Verification Strategy:
- Level: [e2e | integration | skill/agent | test-suite | build-only]
- Command: [exact verification command]
- What it validates: [what passing proves]

## Milestones

| ID | Name | Status | Attempts | Dependencies | Plan File | Review File |
|---|---|---|---|---|---|---|
| M1 | [name] | pending | 0 | — | — | — |
| M2 | [name] | pending | 0 | M1 | — | — |
| M3 | [name] | pending | 0 | M1, M2 | — | — |

Status values: pending | planning | executing | validating | completed | failed | skipped
Attempts: number of plan-execute-review cycles attempted (incremented at each Step 2-3 start)

## Execution Log

| Timestamp | Event | Details |
|---|---|---|
| YYYY-MM-DD HH:MM | milestones-locked | N milestones approved by user |
```
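The Milestones table in state.md could be generated from the approved plan rather than written by hand. The following is a minimal sketch under that assumption; `render_milestone_table` and its input shape (tuples of ID, name, and dependency IDs) are hypothetical. The column layout and the initial `pending` / `0` values follow the template.

```python
# The dash placeholder the template uses for empty cells (U+2014).
EMPTY = "\u2014"


def render_milestone_table(milestones):
    """Render the state.md Milestones table for freshly locked milestones.

    milestones: iterable of (id, name, dependency_ids) tuples.
    """
    header = "| ID | Name | Status | Attempts | Dependencies | Plan File | Review File |"
    separator = "|---|---|---|---|---|---|---|"
    rows = []
    for mid, name, deps in milestones:
        dep_cell = ", ".join(deps) if deps else EMPTY
        # New milestones start as pending with 0 attempts and no plan or review file.
        rows.append(f"| {mid} | {name} | pending | 0 | {dep_cell} | {EMPTY} | {EMPTY} |")
    return "\n".join([header, separator, *rows])
```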
**Individual milestone file (M1-<name>.md) format:**

```markdown
# Milestone: [Name]

ID: M1
Status: pending
Dependencies: [None | M1, M2, ...]
Risk: [Low/Medium/High]
Effort: [Small/Medium/Large]

## Goal

[One sentence goal]

## Success Criteria

- [Specific, measurable criterion]
- [Specific, measurable criterion]
- [Specific, measurable criterion]

## Files Affected

- Create: [files to create]
- Modify: [files to modify]

## User Value

[What the user sees/can test after this milestone]

## Abort Point

[Yes/No — can user stop here and have something useful?]

## Notes

[Any special considerations from reviewer analysis]
```
Anti-Patterns
| Anti-Pattern | Why It Fails |
|---|---|
| Running reviewers sequentially | Wastes time; reviewers are independent |
| Skipping synthesis and just merging reviewer outputs | Conflicts go unresolved; milestone boundaries are incoherent |
| Accepting milestones without measurable success criteria | Cannot validate completion; "done" becomes subjective |
| Creating milestones too large (>12 tasks each) | Exceeds single plan-crafting cycle; risk of context loss |
| Creating milestones too small (1-2 tasks each) | Overhead of plan-crafting + run-plan + review-work exceeds the work itself |
| Creating more than 10 milestones without user approval | Compounding risk across milestones; likely needs project split |
| Ignoring reviewer conflicts | Unresolved conflicts surface during execution when they're expensive to fix |
| Not saving reviewer outputs | Loses the reasoning behind milestone decisions; cannot audit later |
| Letting user skip approval | User discovers misalignment mid-execution after days of work |
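The two sizing anti-patterns in the table can be checked mechanically if each milestone carries an estimated task count. This is only an illustrative sketch: the `size_warnings` function and the idea of a per-milestone task estimate are assumptions, while the thresholds (more than 12 tasks, or 1-2 tasks) come from the table.

```python
def size_warnings(task_counts):
    """Flag milestones whose estimated task count falls outside the sizing bounds.

    task_counts: dict mapping milestone ID to an estimated number of tasks.
    """
    warnings = []
    for mid, count in task_counts.items():
        if count > 12:
            # Likely exceeds a single plan-crafting cycle.
            warnings.append(f"{mid}: too large ({count} tasks)")
        elif count <= 2:
            # Planning and review overhead likely exceeds the work itself.
            warnings.append(f"{mid}: too small ({count} tasks)")
    return warnings
```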
Minimal Checklist
- Problem Brief composed with goal, scope, constraints, success criteria
- All 5 reviewers dispatched in parallel (single message)
- Each reviewer received the full Problem Brief
- Synthesis agent received all 5 reviewer outputs
- All reviewer conflicts explicitly resolved
- Every milestone has measurable success criteria
- Milestone DAG has no circular dependencies
- First milestone is the minimum viable milestone
- Integration Verification Milestone appended as final milestone
- User approved the milestone plan
- All artifacts saved to harness state directory
Transition
After milestone planning is complete:
- To begin execution → `long-run` skill
- If ambiguity discovered → return to `clarification` skill
- If task is too small for milestones → use `plan-crafting` directly
This skill itself does not invoke the next skill. It ends by presenting the milestone plan and letting the user choose the next step.