Software Development: Extreme Programming Workflow

The Workflow

📋 PLAN     → Discuss and break down the feature
🔴 DEVELOP  → TDD cycle (red → green → refactor → review)
💾 COMMIT   → Save working state
🔁 ITERATE  → Next task or proceed to complete
✅ COMPLETE → Suggest retrospective

The DEVELOP cycle is a task's Definition of Done: no task is complete until the review step passes. Review is inside the cycle, not after it.

Phase 1: Planning (📋 PLAN) — Interactive

Understand and decompose the feature before writing any code. Pin down any unfamiliar domain terms before coding.

Follow this sequence: DISCUSS → CLARIFY → SLICE → FALSIFY → CONFIRM. See planning reference for detailed steps and examples.

CLARIFY — Surface Hidden Premises

Requirements are arguments in disguise — stated conclusions resting on unstated premises. Restate the requirement as "Given [premises], then [conclusion]" and ask what premises are missing. Challenge assumptions — especially those that feel obvious. STOP until questions are answered.

Run the premise checklist in planning reference

CLARIFY

, including the context-specific cases (UI, test seam, date-range membership, codegen, how to present the questions) before CONFIRM.

SLICE — Break Into Tasks

Tasks must be vertical (end-to-end functionality), small (one TDD cycle), ordered (dependency first, then value), and testable (clear acceptance criteria). Slice by behaviour, not by implementation layer.

FALSIFY — Test Your Understanding

Ask: "What scenario would prove this understanding wrong?" If you can't think of one, that's a warning sign — not a green light.

CONFIRM — Agree on Plan

Summarise, present ordered task list, STOP — explicitly agree on the first task.

Planning Output Format

## Tasks for [Feature]

1. [ ] [Task description] — [acceptance criteria] — DoD: new tests for new behaviour + affected-submodule tests green (whole project if no submodules) + fix-loop clean + for UI tasks, feature exercised in a browser
2. [ ] [Task description] — [acceptance criteria] — DoD: new tests for new behaviour + affected-submodule tests green (whole project if no submodules) + fix-loop clean + for UI tasks, feature exercised in a browser

**Assumptions surfaced:** [key premises uncovered during clarify/falsify]
**First task:** [Task 1 description]

The Definition of Done is identical for every task. A task cannot be ticked without it.

Phase 2: Development (🔴 DEVELOP) — Interactive

Each task runs a four-step cycle; all four must complete before the task is done.

Step 1: 🔴 Red — Failing Test

Write a failing test for the next behaviour. If the change threads through multiple call sites, write a failing test per call site before going green — a helper-level test alone can leave caller wiring silently wrong. Use the appropriate language skill. See development reference.

Step 2: 🟢 Green — Make It Pass

Surgical: touch only what the test requires. Match existing style. No drive-by edits to adjacent code, comments, or formatting. Every changed line should trace to the failing test. Broader cleanup belongs in Refactor.

Public signature changes: if you change a public function signature, read every caller before calling green. Compile-passing is not enough — see development reference

✅ VERIFY

Step 3: 🔵 Refactor

Clean up while the domain is fresh and tests are green. Continue refactoring while behaviour stays green; ask only when choosing between materially different cleanup directions or expanding scope. See refactor reference.

Public signature changes: same rule as Green — after a signature change, read every caller. Refactor is where this most often bites.

Step 4: 🔍 Review — Fix-Loop

Not optional. Only skip for pure non-code edits (comments, docs-only changes) and state the skip explicitly. For one-line build/config wiring with no production logic, targeted verification may replace fix-loop if the final response states the skip and names the command run.

Delegate to the
fix-loop
skill — runs code-reviewer → fixer until critical findings resolve (or the iteration cap hits). The reviewer's remit includes simplification opportunities, so fresh-eyes cleanup happens here. fix-loop already runs code-reviewer; don't also invoke code-reviewer standalone on the same diff.
If unresolved critical findings or test regressions remain: stop and surface to the user. The task is not done.
Non-critical findings (Warning / Suggestion): list them in one line each to the user before COMMIT and ask if any should be addressed now. Cheaper to fold in than to revisit in a follow-up commit.

Scope touches new behaviour without new tests is a critical finding, not a suggestion — even when sibling code lacks tests. Precedent is not permission.

DoD must cover edge cases the reviewer flagged as critical, not only the happy path.

Only after step 4 passes is the task complete. Proceed to COMMIT.

Phase 3: Commit (💾 COMMIT) — Autonomous

Run the project's canonical commit verification command (compile + lint + test) — must be green. Find it from README/CONTRIBUTING, build scripts, package manager scripts, Makefile, or workspace instructions. No red commits; no pushes without a green check on every commit. Before running broad commit checks, inspect
```
git status --short --branch
```
and the branch diff. If the check will include substantial pre-existing branch changes outside the current task, say so and prefer targeted verification unless the user explicitly asks for the full check. If the user interrupts a broad check and asks to commit/push anyway, proceed only after targeted verification and record the interrupted check in the PR/final summary.
Summarise what will be committed and ask the user to confirm.
Commit directly. If the user, repository, or agent platform specifies a co-author trailer, include that exact trailer. Otherwise use the current agent's appropriate public attribution if known; do not hard-code another agent's identity.

Phase 4: Iterate (🔁 ITERATE) — Interactive

Mark the task done (only if step 4 of DEVELOP passed). Adjust remaining tasks if needed. Return to Phase 2, or proceed to Phase 5 if none remain.

Phase 5: Complete (✅ COMPLETE) — Interactive

Feature done, all tasks committed. Before closing out:

Suggest a retrospective. Ask the user if they'd like to run the

retrospective

skill to surface gaps in this workflow and propose edits. This is the final step — do not skip it.

Phase Transitions

Announce clearly when switching:

📋 PLAN → Starting feature discussion
🔴 DEVELOP → Writing failing test for [behaviour]
🟢 DEVELOP → Making test pass
🔵 REFACTOR → Improving [aspect]
🔍 REVIEW → Delegating to fix-loop
💾 COMMIT → Running commit verification, then committing approved changes
🔁 ITERATE → Reviewing remaining tasks and moving to next task
✅ COMPLETE → Feature done

Red Flags — unverified ground

The same mistake wears many costumes: treating an unchecked claim as confirmed, then building on it. When you catch yourself about to rely on any of these, do the grounding action first:

You're about to trust…	Grounding action before continuing
A memory note, comment, or ticket claim about how the code works	Open the source and confirm it still holds — notes and tickets go stale
A test run that passed	Confirm it actually exercised the new behaviour — a name-filtered or happy-path-only run can pass while covering nothing
Green / it compiled / the tool said "success"	Inspect the real effect: a success flag whose return value was never checked, a process that reported "started" but isn't serving, or one output rendered in two places — all look fine while being wrong
"The rename/move is done"	grep the old name, path, and package across code, docs, and README in one pass — doc strings and README links lag code moves and otherwise surface at review

If you can't ground it, say so — don't proceed on the assumption.

software-development

NPX Install

Tags

SKILL.md Content