long-run
Orchestrates multi-day execution of complex tasks through milestones. Each milestone goes through plan-crafting, run-plan (worker-validator), and review-work phases with checkpoint/recovery. Triggers when the user says "long run", "start long run", "execute milestones", or "run all milestones".
NPX install: `npx skill4agent add tmdgusya/engineering-discipline long-run`

Long Run Harness
Orchestrates multi-day execution of complex tasks through a milestone pipeline. Each milestone passes through plan-crafting → run-plan → review-work with checkpoints between milestones for recovery from interruptions.
Core Principle
Long-running execution must be resumable, auditable, and fail-safe. Every state transition is persisted to disk before the next action begins. If execution stops for any reason — rate limit, crash, user pause, context loss — it can resume from the last checkpoint without repeating completed work.
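A minimal sketch of "persisted to disk before the next action begins". It assumes the harness writes `state.md` itself. The approach is to write to a temporary file in the same directory, flush it, and then rename it over the target. A crash mid-write therefore never leaves a half-written state file. `save_state` is a hypothetical helper, not part of the skill.

```python
import os
import tempfile
from pathlib import Path


def save_state(path: Path, text: str) -> None:
    """Replace `path` with `text` so readers never see a partial file (hypothetical helper)."""
    path.parent.mkdir(parents=True, exist_ok=True)
    # The temp file lives in the same directory, hence on the same filesystem,
    # which is what makes the final rename atomic on POSIX systems.
    fd, tmp = tempfile.mkstemp(dir=path.parent, prefix=path.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "w", encoding="utf-8") as f:
            f.write(text)
            f.flush()
            os.fsync(f.fileno())  # make sure the bytes reach disk before the rename
        os.replace(tmp, path)     # swap in the new state in a single step
    except BaseException:
        # Remove the temp file if anything failed before the rename.
        if os.path.exists(tmp):
            os.unlink(tmp)
        raise
```

Calling a helper like this before a phase starts and again after it finishes matches the "before and after every milestone" gate below.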
Hard Gates
- Milestones must exist before execution. Either from the `milestone-planning` skill or user-provided. Never generate milestones inline during execution.
- State file must be updated before and after every milestone. No in-memory-only state. If it's not on disk, it didn't happen.
- Each milestone must complete the full pipeline. plan-crafting → run-plan → review-work. No shortcuts. No skipping review-work "because it looked fine."
- Failed milestones block dependents. If M2 depends on M1 and M1 fails review, M2 does not start. Period.
- User confirmation required at gate points. Before starting a new milestone phase (planning, execution, review), check if the user wants to continue, pause, or abort.
- Never modify completed milestones. Once a milestone passes review-work, its files are locked. If a later milestone needs changes to earlier work, that is a new milestone.
- Checkpoint after every milestone completion. Write a checkpoint file recording what was done, test results, and review verdict before proceeding.
When To Use
- After `milestone-planning` has produced a milestone DAG
- When the user says "long run", "start long run", "execute milestones", or "run all milestones"
- When resuming a previously paused long run session
When NOT To Use
- When milestones don't exist yet (use `milestone-planning` first)
- When there's only one milestone (use plan-crafting + run-plan directly)
- For quick tasks that don't warrant multi-phase execution
Input
- Harness state directory path, e.g., `docs/engineering-discipline/harness/<session-slug>/`
- The directory must contain `state.md` and `milestones/*.md` files

If no state directory exists, ask the user if they want to run `milestone-planning` first.

Process
Phase 1: Load and Validate State
- Read `state.md` from the harness directory
- Read all milestone files from `milestones/`
- Validate:
  - All milestones referenced in state.md have corresponding files
  - Dependency DAG is valid (no cycles, topological sort possible)
  - No milestone is in an invalid state (e.g., "executing" without a plan file)
- Determine current position:
  - Which milestones are completed?
  - Which milestones are ready to start (all dependencies met)?
  - Is this a fresh start or a resume?
- Present status to the user:

```markdown
## Long Run Status: [Session Name]
**Progress:** N/M milestones completed
**Current phase:** [planning M3 | executing M3 | reviewing M3 | ready to start M3]
**Next up:** [M3, M4 (parallel)]
Completed: M1 ✓, M2 ✓
In progress: M3 (executing)
Pending: M4, M5
```

- Ask user to confirm: continue, pause, or abort.
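The validation and "ready to start" checks above can be sketched with Python's standard `graphlib`. The sketch assumes dependencies have already been parsed from the milestone files into a dict that maps each milestone ID to its prerequisite IDs. `validate_dag` and `ready_to_start` are hypothetical names.

```python
from graphlib import CycleError, TopologicalSorter


def validate_dag(deps: dict[str, set[str]]) -> list[str]:
    """Return milestones in a valid execution order.

    `deps` maps each milestone ID to the IDs it depends on. Raises ValueError
    if a dependency has no milestone file or the graph contains a cycle.
    """
    known = set(deps)
    for milestone, prereqs in deps.items():
        unknown = prereqs - known
        if unknown:
            raise ValueError(f"{milestone} depends on unknown milestones {sorted(unknown)}")
    try:
        return list(TopologicalSorter(deps).static_order())
    except CycleError as err:
        # graphlib stores the nodes of the detected cycle as the second argument.
        raise ValueError(f"dependency cycle: {err.args[1]}") from err


def ready_to_start(deps: dict[str, set[str]], status: dict[str, str]) -> list[str]:
    """Milestones still pending whose prerequisites are all completed."""
    return sorted(
        m for m, prereqs in deps.items()
        if status.get(m, "pending") == "pending"
        and all(status.get(p) == "completed" for p in prereqs)
    )
```

When `ready_to_start` returns more than one milestone, that is the cue for Phase 3 parallel execution.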
Phase 2: Milestone Execution Loop
For each milestone in topological order:
```
┌─────────────────────────────────────┐
│ Milestone Pipeline                  │
│                                     │
│  ┌──────────┐    ┌─────────┐        │
│  │ Plan     │───→│ Run     │        │
│  │ Crafting │    │ Plan    │        │
│  └──────────┘    └────┬────┘        │
│                       │             │
│                  ┌────▼────┐        │
│                  │ Review  │        │
│                  │ Work    │        │
│                  └────┬────┘        │
│                       │             │
│              ┌────────▼────────┐    │
│              │ PASS?           │    │
│              │ Yes → checkpoint│    │
│              │ No  → retry     │    │
│              └─────────────────┘    │
└─────────────────────────────────────┘
```

Step 2-1: Gate Check
Before starting a milestone:
- Verify all dependency milestones have status `completed`
- Verify no file conflicts with in-progress parallel milestones
- Update state.md: set milestone status to `planning`
- Update execution log with timestamp
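Here is a sketch of the gate check. It assumes each milestone's scope is available as a set of file paths from its milestone file, and that statuses use this skill's names (`planning`, `executing`, `validating`, `completed`, and so on). `gate_check` is hypothetical. Only an empty result would justify writing `planning` to state.md.

```python
IN_PROGRESS = {"planning", "executing", "validating"}


def gate_check(
    milestone: str,
    deps: dict[str, set[str]],
    status: dict[str, str],
    scope: dict[str, set[str]],
) -> list[str]:
    """Return reasons `milestone` may not start; an empty list means the gate passes."""
    problems = []
    # Every dependency must already be completed.
    for prereq in sorted(deps.get(milestone, set())):
        if status.get(prereq) != "completed":
            problems.append(f"dependency {prereq} is {status.get(prereq, 'pending')}")
    # No file in this milestone's scope may be claimed by an in-progress milestone.
    for other, other_status in sorted(status.items()):
        if other != milestone and other_status in IN_PROGRESS:
            overlap = scope.get(milestone, set()) & scope.get(other, set())
            if overlap:
                problems.append(f"file conflict with {other}: {sorted(overlap)}")
    return problems
```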
Step 2-2: Plan Crafting Phase
- Compose a Context Brief from the milestone definition:
  - Goal → from milestone file
  - Scope → files affected from milestone file
  - Success Criteria → from milestone file
  - Constraints → inherited from the parent problem + completed milestone context
  - Completed milestone context contract: From each completed predecessor, include ONLY:
    - Files created/modified (from checkpoint's "Files Changed" list)
    - Interface contracts established (function signatures, API shapes, type definitions)
    - Success criteria that were verified as met
    - Do NOT include: execution logs, review documents, worker/validator output, or full checkpoint contents
  - Note: Context Briefs composed from milestone definitions omit the Complexity Assessment section, since routing has already been determined by the milestone-planning phase. The brief goes directly to plan-crafting without re-routing.
- Invoke the `plan-crafting` skill pattern:
  - Create a plan document at `docs/engineering-discipline/plans/YYYY-MM-DD-<milestone-name>.md`
  - The plan must satisfy all milestone success criteria
  - The plan must not modify files outside the milestone's scope
- Update state.md: record plan file path for this milestone
- User gate: Present the plan and ask for approval before execution
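The completed milestone context contract works as an allow-list over checkpoint fields. The sketch below assumes a hypothetical `Checkpoint` record parsed from `checkpoints/M<N>-checkpoint.md`. The extra fields are there only to show what is deliberately dropped.

```python
from dataclasses import dataclass


@dataclass
class Checkpoint:
    """Fields a predecessor's checkpoint might carry (hypothetical shape)."""
    milestone: str
    files_changed: list[str]
    interfaces: list[str]          # e.g. function signatures, API shapes
    verified_criteria: list[str]
    execution_log: str = ""        # never forwarded to the Context Brief
    review_notes: str = ""         # never forwarded
    worker_output: str = ""        # never forwarded


def predecessor_context(checkpoints: list[Checkpoint]) -> str:
    """Render only the contract fields of each completed predecessor."""
    sections = []
    for cp in checkpoints:
        sections.append(
            f"### {cp.milestone}\n"
            f"Files changed: {', '.join(cp.files_changed) or 'none'}\n"
            f"Interfaces: {'; '.join(cp.interfaces) or 'none'}\n"
            f"Verified criteria: {'; '.join(cp.verified_criteria) or 'none'}"
        )
    return "\n\n".join(sections)
```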
Step 2-3: Run Plan Phase
- Update state.md: set milestone status to `executing`, increment `Attempts` counter by 1
- Execute the plan using the `run-plan` skill pattern:
  - Worker-validator loop for each task
  - Parallel execution for independent tasks
  - Information-isolated validators
- If run-plan reports failure after 3 retries on any task:
  - Update state.md: set milestone status to `failed`
  - Record failure details in execution log
  - Stop and report to user. Do not proceed to dependent milestones.
- If all tasks complete: proceed to review phase
Step 2-4: Review Work Phase
- Update state.md: set milestone status to `validating`
- Invoke the `review-work` skill pattern:
  - Information-isolated review against the plan document
  - Binary PASS/FAIL verdict
- If PASS:
  - Update state.md: set milestone status to `completed`
  - Run the cross-milestone integration check (Step 2-5), then write the checkpoint file (Step 2-6)
  - Update execution log
  - Proceed to next milestone
- If FAIL:
  - Record review findings in execution log
  - Retry decision (based on the `Attempts` counter in state.md, which persists across crashes):
    - If Attempts == 1: return to Step 2-3 with review feedback (re-execute same plan)
    - If Attempts == 2: return to Step 2-2 (re-plan with review feedback as constraint)
    - If Attempts >= 3: set status to `failed`, stop, report to user
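The retry decision depends only on the verdict and the persisted `Attempts` counter, so it can be a pure function. This sketch uses hypothetical step labels. `attempts` is the value Step 2-3 has already incremented for the run being reviewed. PASS maps to the Step 2-5 integration check, which runs before the checkpoint.

```python
def next_step_after_review(verdict: str, attempts: int) -> str:
    """Map a review-work verdict and the Attempts counter from state.md to the next step."""
    if attempts < 1:
        raise ValueError("a reviewed run implies Attempts >= 1")
    if verdict == "PASS":
        return "integration-check"   # Step 2-5, then checkpoint
    if attempts == 1:
        return "rerun-plan"          # back to Step 2-3 with feedback, same plan
    if attempts == 2:
        return "replan"              # back to Step 2-2, feedback becomes a constraint
    return "failed"                  # Attempts >= 3: stop and report to the user
```

Consider a milestone that keeps failing. The first review fails and triggers a rerun, which bumps Attempts to 2. That review fails too and triggers a re-plan; execution bumps Attempts to 3. A third failure returns `failed`. Each milestone is therefore capped at three plan-execute-review cycles.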
Step 2-5: Cross-Milestone Integration Check
After a milestone passes review-work but before writing the checkpoint, verify that the milestone's output integrates correctly with all previously completed milestones:
- Run the project's highest-level verification (from state.md's Verification Strategy or rediscover using plan-crafting's Verification Discovery order)
- Check cross-milestone interfaces: If the completed milestone defines or consumes interfaces from predecessor milestones, verify they are compatible (function signatures match, API contracts hold, types align)
If integration check passes: Proceed to checkpoint.
If integration check fails — Cross-Milestone Failure Response:
The milestone passed its own review-work (internal correctness) but breaks integration with other milestones. This is a boundary problem.
- Diagnose (attempt 1):
  - Read the failure output
  - Identify which interface boundary or interaction is broken
  - Determine if the fix belongs to the current milestone or requires a corrective milestone
  - If fixable within current milestone scope: dispatch a targeted fix worker → re-run review-work → re-run integration check
  - If the fix is outside current milestone scope: proceed to escalation
- Diagnose (attempt 2):
  - If the first fix didn't resolve it, re-analyze
  - Apply a second targeted fix
  - Re-run integration check
- Escalate to user (after 2 failed attempts):
  - Report: which milestones are involved, what integration boundary failed, what fixes were tried
  - Options: add corrective milestone, rollback to checkpoint, accept and continue (user acknowledges the integration gap)
  - Log the user's decision in state.md execution log
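The two-attempt budget fits in a small loop. This sketch assumes three hypothetical callbacks:

- `run_check` runs the highest-level verification.
- `diagnose_in_scope` reports whether the fix belongs to the current milestone.
- `apply_fix` dispatches the targeted fix worker and re-runs review-work.

```python
from typing import Callable


def integration_gate(
    run_check: Callable[[], bool],
    diagnose_in_scope: Callable[[], bool],
    apply_fix: Callable[[], None],
    max_fix_attempts: int = 2,
) -> str:
    """Return "passed", or "escalate" once the fix budget is spent or the fix is out of scope."""
    if run_check():
        return "passed"
    for _ in range(max_fix_attempts):
        if not diagnose_in_scope():
            return "escalate"        # the fix belongs to another milestone
        apply_fix()
        if run_check():
            return "passed"
    return "escalate"                # report to the user with what was tried
```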
Step 2-6: Checkpoint
After a milestone passes review:
Write `checkpoints/M<N>-checkpoint.md`:

```markdown
# Checkpoint: M<N> — [Milestone Name]
**Completed:** YYYY-MM-DD HH:MM
**Duration:** [time from planning start to review pass]
**Attempts:** [number of plan-execute-review cycles]
## Plan File
`docs/engineering-discipline/plans/YYYY-MM-DD-<name>.md`
## Review File
`docs/engineering-discipline/reviews/YYYY-MM-DD-<name>-review.md`
## Test Results
[Full test suite status at checkpoint time]
## Files Changed
[List of files created/modified in this milestone]
## State After Milestone
[Brief description of system state — what works now that didn't before]
```

Phase 3: Parallel Milestone Execution
When multiple milestones have all dependencies satisfied and no file conflicts:
- Identify parallelizable milestone group
- Run plan-crafting for ALL parallel milestones first (sequentially — plans are lightweight)
- Present ALL plans together for batch approval: "Milestones M3 and M4 can run in parallel. Here are both plans. Approve each individually."
- User approves or rejects each plan independently. Only approved milestones proceed to execution. Rejected milestones return to Step 2-2 while approved ones execute.
- Dispatch each approved milestone's pipeline concurrently:
  - Each milestone runs run-plan → review-work (its plan was already approved in the batch-approval step above)
  - Each runs in a worktree (`isolation: "worktree"`) to prevent file conflicts
  - After both complete and pass review, merge worktrees back
- If either fails: handle independently (the other can continue if no dependency)
Worktree merge protocol:
- Both milestones pass review in their respective worktrees
- Check for file conflicts between worktree changes
- If no conflicts: merge sequentially (M_lower first, then M_higher)
- If conflicts detected: stop, report to user, request manual resolution
- After merge: run full test suite on merged result
- If tests fail: stop, report to user
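The following sketches the conflict check and merge order. It assumes each worktree's changed files are already known, for example from `git diff --name-only` against the base commit. Any file touched by both milestones counts as a conflict, which is stricter than git's line-level merge. `plan_merge` is hypothetical, and the full test suite still has to run on the merged result.

```python
import re


def milestone_number(milestone_id: str) -> int:
    """'M12' -> 12, so M2 sorts before M10."""
    match = re.fullmatch(r"M(\d+)", milestone_id)
    if not match:
        raise ValueError(f"unexpected milestone id {milestone_id!r}")
    return int(match.group(1))


def plan_merge(changes: dict[str, set[str]]) -> list[str]:
    """Return the merge order (lower milestone first) or raise if worktrees overlap.

    `changes` maps each passing milestone ID to the files changed in its worktree.
    """
    ids = sorted(changes, key=milestone_number)
    for i, a in enumerate(ids):
        for b in ids[i + 1:]:
            overlap = changes[a] & changes[b]
            if overlap:
                raise RuntimeError(f"{a} and {b} both changed {sorted(overlap)}; resolve manually")
    return ids
```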
Phase 4: Completion
After all milestones are completed (including the Integration Verification Milestone from milestone-planning):
- Update state.md: set overall status to `completing`
- Final E2E Gate: Run the project's highest-level verification one final time on the fully integrated codebase
- Run full test suite for regression check
- If Final E2E Gate fails:
  - Diagnose: identify which milestone's output is the likely cause
  - Create a corrective milestone via the Mid-Execution Correction procedure
  - Execute corrective milestone through the full pipeline (plan-crafting → run-plan → review-work)
  - Re-run E2E Gate after correction
  - If 2 corrective attempts fail: escalate to user with full diagnosis
- If Final E2E Gate passes: Update state.md: set overall status to `completed`
- Generate completion summary:

```markdown
# Long Run Complete: [Session Name]
**Started:** YYYY-MM-DD
**Completed:** YYYY-MM-DD
**Total milestones:** N
**Total attempts:** [sum of all milestone attempts]
## Milestone Summary
| Milestone | Status | Attempts | Duration |
|-----------|--------|----------|----------|
| M1: [name] | ✓ completed | 1 | 2h |
| M2: [name] | ✓ completed | 2 | 4h |
| ...
## Final Test Suite
[PASS/FAIL — N passed, M failed]
## Files Changed (Total)
[Aggregated list across all milestones]
```

- Present to user and suggest `simplify` for a final code quality pass
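The totals in the completion summary are simple aggregations. This sketch produces the "Total attempts" figure and the Milestone Summary rows. It assumes per-milestone attempts and durations have already been read from the checkpoint files. `MilestoneResult` and `summary_table` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class MilestoneResult:
    ident: str      # e.g. "M1"
    name: str
    attempts: int
    hours: float


def summary_table(results: list[MilestoneResult]) -> tuple[int, str]:
    """Return (total attempts, Markdown Milestone Summary table)."""
    rows = [
        "| Milestone | Status | Attempts | Duration |",
        "|-----------|--------|----------|----------|",
    ]
    for r in results:
        # Every milestone reaching the summary has passed review.
        rows.append(f"| {r.ident}: {r.name} | ✓ completed | {r.attempts} | {r.hours:g}h |")
    return sum(r.attempts for r in results), "\n".join(rows)
```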
Recovery Protocol
When resuming a paused or interrupted session:
- Read state.md to determine last known state
- For each milestone, determine recovery action:
| Last Status | Recovery Action |
|---|---|
| `pending` | Start normally |
| `planning` | Restart plan-crafting (plan file may be incomplete) |
| `executing` | Check run-plan progress; resume or restart |
| `validating` | Restart review-work (review may be incomplete) |
| `completed` | Skip (already checkpointed) |
| `failed` | Present failure to user; ask whether to retry or skip (see Skip Rules below) |
| `skipped` | Skip (user previously chose to skip this milestone) |

- For `executing` milestones: check if tasks in the plan have checkboxes marked. Resume from the first unchecked task.
- Read the `Attempts` counter from state.md to determine the remaining retry budget. Do not reset the counter on resume: it persists across crashes to prevent infinite retry loops.
- Present recovery plan to user before proceeding.
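To resume an `executing` milestone, the harness must find the first unchecked task in its plan. This sketch assumes the plan marks tasks with Markdown task-list checkboxes (`- [ ]` / `- [x]`), one checkbox per task. `first_unchecked_task` is hypothetical.

```python
import re
from typing import Optional

# Matches task items such as "- [ ] Task 3: ..." or "* [x] ..." at the start of a line.
TASK = re.compile(r"^[ \t]*[-*] \[([ xX])\] (.+)$", re.MULTILINE)


def first_unchecked_task(plan_markdown: str) -> Optional[str]:
    """Return the text of the first unchecked task, or None if every task is checked."""
    for mark, text in TASK.findall(plan_markdown):
        if mark == " ":
            return text
    return None
```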
Mid-Execution Correction
If execution reveals that a completed milestone's output is incorrect or a new milestone is needed:
- Pause execution — do not continue with dependent milestones
- Log the discovery in state.md execution log: what was found, which milestone triggered the discovery
- User decision required: present the situation and options:
- Add corrective milestone: Create a new milestone definition (the user writes the goal and success criteria, or re-run milestone-planning for just the new scope). Insert it into the DAG with appropriate dependencies. Resume execution from the new milestone.
- Re-plan from a checkpoint: Roll back to a completed milestone's checkpoint, mark subsequent milestones as `pending`, reset their `Attempts` to 0, and restart from that point.
- Abort: Set overall status to `failed` and stop.
- New milestones follow the same pipeline — plan-crafting → run-plan → review-work. No shortcuts even for "quick fixes."
- Completed milestones are never modified (Hard Gate #6 still applies). The corrective milestone either creates new files or overwrites earlier output through its own full plan cycle.
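The "Re-plan from a checkpoint" option amounts to a bulk state reset. This sketch rests on two assumptions: state.md has been parsed into a dict that maps each milestone ID to its status and attempts, and "subsequent" means later in the topological execution order. `roll_back` is hypothetical. It only updates the in-memory state, so the result still has to be persisted to state.md and logged.

```python
def roll_back(order: list[str], state: dict[str, dict], checkpoint: str) -> None:
    """Mark every milestone after `checkpoint` in `order` as pending with 0 attempts (in place)."""
    if state[checkpoint]["status"] != "completed":
        raise ValueError(f"{checkpoint} has no checkpoint to roll back to")
    for milestone in order[order.index(checkpoint) + 1:]:
        state[milestone]["status"] = "pending"
        state[milestone]["attempts"] = 0
```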
Skip Rules
When a user chooses to skip a failed milestone:
- Set milestone status to `skipped` in state.md
- Log the skip event with user's reason in execution log
- Dependents of a skipped milestone are also blocked by default, the same as for `failed`. The DAG contract is: dependents run only after prerequisites are `completed`.
- The user may explicitly unblock a dependent by acknowledging the missing prerequisite: "Proceed with M4 despite M2 being skipped." Log this override in the execution log.
- If the user unblocks a dependent, add a note to that milestone's Context Brief during plan-crafting: "Prerequisite M2 was skipped. The following outputs are missing: [list from M2's success criteria]."
Skipped milestones cannot be un-skipped. If the user wants to attempt the milestone later, create a new milestone with the same goal.
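The skip rule reduces to one predicate per dependent. This sketch assumes user overrides are recorded as (dependent, skipped prerequisite) pairs. `may_start` is hypothetical.

```python
def may_start(
    milestone: str,
    deps: dict[str, set[str]],
    status: dict[str, str],
    overrides: set[tuple[str, str]],
) -> bool:
    """True if every prerequisite is completed, or skipped with a logged user override.

    `overrides` holds pairs such as ("M4", "M2") for
    "Proceed with M4 despite M2 being skipped."
    """
    for prereq in deps.get(milestone, set()):
        prereq_status = status.get(prereq, "pending")
        if prereq_status == "completed":
            continue
        if prereq_status == "skipped" and (milestone, prereq) in overrides:
            continue
        return False  # failed, skipped without an override, or not finished yet
    return True
```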
Duration Guard
If a single milestone's total active time (from planning start to review completion) becomes excessive:
- Soft limit: If a milestone has been in `planning` or `executing` status for a disproportionately large share of the overall work, pause and report to user: "Milestone M3 has been in progress for an extended period. Continue, re-scope, or abort?"
- Hard limit on attempts: The 3-attempt limit (Step 2-4) bounds retry loops. But if even a single attempt's plan-crafting generates more than 15 tasks, pause and report: "This milestone's plan has N tasks — it may be too large for a single milestone. Consider splitting."
- Purpose: Prevent a single runaway milestone from consuming the entire execution budget or running indefinitely on flaky tests.
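Here is a sketch of both guards. The 15-task threshold comes from the text above. The skill gives no number for the soft limit, so this assumes a hypothetical rule: warn when elapsed time exceeds `factor` times the average duration of completed milestones. The check is skipped until at least one milestone has finished. The messages are paraphrases, and `duration_guard` is hypothetical.

```python
from datetime import timedelta

MAX_PLAN_TASKS = 15  # threshold stated in the Duration Guard


def duration_guard(
    task_count: int,
    elapsed: timedelta,
    completed_durations: list[timedelta],
    factor: float = 3.0,  # hypothetical; the skill does not specify a number
) -> list[str]:
    """Return pause messages for the user; an empty list means keep going."""
    warnings = []
    if task_count > MAX_PLAN_TASKS:
        warnings.append(f"Plan has {task_count} tasks (> {MAX_PLAN_TASKS}); consider splitting.")
    if completed_durations:
        average = sum(completed_durations, timedelta()) / len(completed_durations)
        if elapsed > average * factor:
            warnings.append("Milestone in progress for an extended period. Continue, re-scope, or abort?")
    return warnings
```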
Context Window Management
Long-running sessions will hit context window limits. Claude Code automatically compresses old messages (context collapse). The harness must be designed to survive this:
- Never rely on conversation memory for state. All state lives in `state.md` and milestone files on disk. If the context is compressed, the harness re-reads state files; no information is lost.
- Each milestone is a fresh context boundary. When starting a new milestone's plan-crafting, the worker subagent starts with a clean context. It receives only the milestone definition and completed predecessor context (see the completed milestone context contract in Step 2-2), not the full conversation history.
- Checkpoint files are the source of truth. If context is lost mid-milestone, recovery reads the checkpoint files, not compressed conversation summaries.
- Avoid accumulating large inline state. Do not build up a running summary of all milestones in the conversation. Instead, reference state.md and checkpoint files by path.
Rate Limit Handling
Long-running sessions will encounter rate limits. Claude Code has built-in retry with exponential backoff (up to 10 retries, 5-minute max backoff). The harness should work with this, not against it:
- Let claude-code handle transient rate limits. Short 429/529 errors are retried automatically with backoff. Do not preemptively save state on every API error.
- Save state on persistent rate limits. If a rate limit persists beyond the automatic retry window (you'll see repeated "rate limit" messages), record current state to disk immediately.
  - Log the rate limit event in execution log with timestamp.
  - Report to user: "Rate limit hit. State saved. Resume with `long-run` when ready."
- Do NOT add manual retry loops on top of claude-code's built-in retry; this causes retry amplification.
- Background agent bail: Claude Code's background agents (like reviewer subagents) bail immediately on 529 overload errors instead of retrying. This is why Phase 2.5 reviewer failure handling exists — reviewer failures are often transient rate limits, not permanent errors.
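This sketch decides when a rate limit counts as persistent while leaving retries to Claude Code. It assumes the harness can observe each rate-limit failure that surfaces after the built-in backoff gives up. The consecutive-failure `threshold` is a hypothetical stand-in for "repeated rate limit messages". `RateLimitWatch` is not part of any Claude Code API.

```python
from datetime import datetime, timezone
from typing import Callable, Optional


class RateLimitWatch:
    """Count rate-limit failures that survive built-in retries; never retries anything itself."""

    def __init__(self, save_state: Callable[[], None], log: Callable[[str], None], threshold: int = 3):
        self.save_state = save_state
        self.log = log
        self.threshold = threshold
        self.consecutive = 0

    def record(self, rate_limited: bool) -> Optional[str]:
        """Feed one outcome; return the user message once the limit looks persistent."""
        if not rate_limited:
            self.consecutive = 0
            return None
        self.consecutive += 1
        if self.consecutive == self.threshold:  # save and report exactly once
            self.save_state()
            self.log(f"{datetime.now(timezone.utc).isoformat()} persistent rate limit; state saved")
            return "Rate limit hit. State saved. Resume with `long-run` when ready."
        return None
```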
Anti-Patterns
| Anti-Pattern | Why It Fails |
|---|---|
| Generating milestones inline instead of using milestone-planning | Milestones lack adversarial review; poor decomposition |
| Skipping review-work for "simple" milestones | Undetected defects compound across milestones |
| Continuing after a milestone fails | Dependent milestones build on broken foundation |
| Not updating state.md between phases | Crash loses progress; cannot resume |
| Modifying completed milestone files | Breaks checkpoint invariant; invalidates reviews |
| Running parallel milestones without worktree isolation | File conflicts corrupt both milestones |
| Auto-retrying on rate limit | Wastes quota; user may prefer to wait |
| Skipping user gates between milestones | User loses control of multi-day execution |
| Merging worktrees without conflict check | Silent data loss if files overlap |
| Skipping cross-milestone integration check | Milestones pass independently but break each other at boundaries |
| Retrying E2E failures indefinitely without user escalation | 2-attempt limit exists to avoid budget waste on misdiagnosed problems |
Minimal Checklist
- State directory exists with valid state.md and milestone files
- Dependency DAG validated (no cycles)
- Current position determined (fresh start or resume)
- User confirmed continuation at session start
- Each milestone goes through plan-crafting → run-plan → review-work
- State.md updated before and after every phase transition
- Checkpoint written after every successful milestone
- Failed milestones block dependents
- Parallel milestones use worktree isolation
- Cross-milestone integration check passes after each milestone
- Final E2E Gate passes at completion
- Full test suite passes at completion
Transition
After long run completion:
- For final code quality pass → `simplify` skill
- If issues found in completion testing → `systematic-debugging` skill
- If user wants to extend with more milestones → `milestone-planning` skill
This skill itself does not invoke the next skill. It reports completion and lets the user decide the next step.