cs-refactor
AI has two consistent failure modes when refactoring code independently: first, it lacks awareness of the module's actual requirements and constraints, producing modifications that are not functionally equivalent; second, it takes on a scope that exceeds its context capacity and forgets earlier constraints as it proceeds. This process inserts a scan checklist and method library between "wanting to optimize" and "starting to modify", so that the AI only undertakes tasks it can reliably complete correctly and pauses honestly for the rest.
Full process:
scan (generate optimization point checklist) → design (confirm which items to implement and the order with the user) → apply (execute item by item, with manual approval for each step)
Core Discipline: Behavioral equivalence is the bottom line. If an action will alter externally observable behavior, do not use the refactor workflow; route to feature (requirement change) or issue (bug fix).
Fastforward Mode (use for small refactors)
When changes are obviously minor — single function, single component, 1-3 optimization points, self-verifiable with tests, no need for human visual inspection — going through the full three stages is overkill. In fastforward mode, the AI directly identifies the points, aligns with the user once, modifies in place, runs tests for self-verification, and produces no scan / design / checklist artifacts.
Trigger signals: the user says "small refactor", "quick refactor", "simply optimize XX function", "modify directly", or "skip the steps".
When not to use fastforward:
- Changes span > 1 file
- Expected modification points exceed 3
- Requires visual verification (frontend effects, performance perception)
- Modifies public interfaces (needs to use Parallel Change)
- No test coverage
- Cross-module
In such cases, advise users to follow the standard process. If fastforward starts and the task becomes complex, switch back to the full process starting from scan.
Where to place files
Refactor outputs are gathered under `codestable/refactors/`, with an independent directory for each refactor:
```
codestable/
└── refactors/
    └── {YYYY-MM-DD}-{slug}/
        ├── {slug}-scan.md            ← Optimization point checklist generated in Stage 1
        ├── {slug}-refactor-design.md ← Execution plan from Stage 2 (selected items, order, verification)
        ├── {slug}-checklist.yaml     ← Generated in Stage 2, used to advance Stage 3
        └── {slug}-apply-notes.md     ← Execution records from Stage 3 (what was done in each step, verification results, deviations)
```
Directory naming aligns with feature / issue: `YYYY-MM-DD-{english-slug}`. The date is set on the day of first creation and remains unchanged; the slug uses lowercase letters, numbers, and hyphens, and should be short while clearly indicating what is being modified.
Why use a separate directory instead of mixing with features/: Refactor outputs are "scans + execution records of the current code state", which are time-sensitive and their value decays over time; feature outputs are "why this capability is designed this way", which have weak timeliness. The archiving logic is different, so mixing them will make it hard to find content later.
Three Stages
| Stage | Sub-process | Output | Lead |
|---|---|---|---|
| 1 scan | Generate optimization point checklist | {slug}-scan.md | AI scans code + runs pre-checks, user selects items |
| 2 design | Finalize execution plan | {slug}-refactor-design.md + {slug}-checklist.yaml | AI drafts, user conducts overall review |
| 3 apply | Execute item by item | Code changes + {slug}-apply-notes.md | AI executes, manual approval for each step |
There are checkpoints between stages. The scan checklist cannot enter design until the user selects items; no code modifications are made until the design is approved by the user; items marked for HUMAN verification in apply cannot proceed to the next step without user confirmation.
Stage 1: scan (Generate optimization point checklist)
First run pre-checks (7 items); stop if any are hit
Run the pre-checks before starting the scan. If any item is hit, abort the scan and provide routing suggestions; do not force a checklist. The 7 checks and output format are in `reference/refusal-routing.md` in the same directory.
Zero valid outputs — if no worthwhile optimizations are found after scanning, state this honestly instead of forcing entries.
Lock scan scope
Before starting the scan, confirm one thing with the user: which files to scan this time. Default rules:
- User specifies specific files/components → scan only those
- User says "this page" → scan the page's entry component + directly imported internal modules, do not trace public dependencies
- User says "this module" → scan files in the module directory, do not go beyond module boundaries
- Scope > 15 files or > 3000 lines → trigger the 6th pre-check, ask the user to narrow the scope
Include test files in the scope (to judge test coverage for the 2nd pre-check).
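The scope threshold above is mechanical and can be checked before scanning starts. A minimal TypeScript sketch; the `scopeTooLarge` helper and the file data are illustrative assumptions, not part of this workflow:

```typescript
// Hypothetical helper: decide whether a scan scope is small enough,
// per the "> 15 files or > 3000 lines" pre-check threshold above.
interface ScopeFile {
  path: string;
  lines: number;
}

const MAX_FILES = 15;
const MAX_LINES = 3000;

function scopeTooLarge(files: ScopeFile[]): boolean {
  const totalLines = files.reduce((sum, f) => sum + f.lines, 0);
  return files.length > MAX_FILES || totalLines > MAX_LINES;
}

// Illustrative scope: 3 files but 3200 lines total → ask the user to narrow it
const scope: ScopeFile[] = [
  { path: "src/UserCard.vue", lines: 1200 },
  { path: "src/useUser.ts", lines: 800 },
  { path: "src/api/user.ts", lines: 1200 },
];
console.log(scopeTooLarge(scope)); // true: 3200 lines exceeds 3000
```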
What to look for during scanning
Use the four-layer classification of the method library as a template to search in the code:
- L1 Behavioral Equivalence Migration Signals: A function is called in many places but its interface/implementation needs modification → candidate for Parallel Change; a whole block of old logic needs to be replaced with a new implementation → candidate for Strangler Fig
- L2 Code-level Refactoring Signals: Overly long functions (> 50 lines / cyclomatic complexity > 10), repeated conditional fragments, mysterious temporary variables, deeply nested if-else
- L3 Structure Splitting Signals: Components > 300 lines, one file handles multiple tasks, container/presentation logic mixed, identical logic written separately in multiple components (frontend); Controller directly calls DB, missing Service layer, Repository bypassed (backend)
- L4 Performance Signals: Repeated calculations (memoizable), N+1 queries, list without virtualization/pagination, event listeners not cleaned up, deep reactivity for large objects (Vue)
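The L1 Parallel Change candidate can be made concrete. A minimal sketch with hypothetical names (`getUser` / `getUserByQuery` stand in for a widely called function whose interface must change without breaking callers):

```typescript
// Parallel Change (expand → migrate → contract), sketched on a hypothetical API.
// Old signature: a positional argument. Target: an options object.

interface User {
  id: string;
  name: string;
}

// Expand: introduce the new interface alongside the old one.
function getUserByQuery(query: { id: string; includeProfile?: boolean }): User {
  // New implementation; behavior observed by existing callers must stay identical.
  return { id: query.id, name: `user-${query.id}` };
}

// Migrate: the old interface stays in place and delegates, so every existing
// caller keeps its observable behavior while call sites move over one by one.
function getUser(id: string): User {
  return getUserByQuery({ id });
}

// Contract: once grep shows no remaining getUser callers, delete the old
// function in its own commit so that step can be reverted alone.
console.log(getUser("42").name); // prints "user-42"
```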
The complete method library list lives in a reference file in the same directory and must be fully loaded as a matching table during scanning.
Output format
- Top Overview (one paragraph): Scan scope / number of findings / distribution by category / distribution by risk / recommended priority items / recommended cautious items
- Checklist Items (one markdown block per item): field order and hard constraints are in `reference/scan-checklist-format.md` in the same directory
After scanning, submit the entire `{slug}-scan.md` to the user. The user selects items to implement (mark ✓) and flags questions or rejections (mark ✗, with reasons); then proceed to Stage 2.
Do not select items on behalf of the user.
Stage 2: design (Finalize execution plan)
Input
- `{slug}-scan.md` with user selections (✓ items are to be implemented this time; ✗ items are archived for traceability)
- Method library (each selected item must map to a method ID M-Ln-NN)
Tasks
- Sort order. Items with dependencies are placed first (e.g., L1 Parallel Change often needs to run first, followed by L2 extraction). Independent items are prioritized by "low risk + AI self-verifiable", and HUMAN verification items are grouped at the end.
- Add execution details for each item: Referenced method ID, specific steps, preconditions, exit signals, verification responsible party (AI / HUMAN), rollback strategy (how to restore if problems occur).
- Identify pre-dependencies: Items with insufficient test coverage need a pre-step of "supplement characterization tests"; items modifying public interfaces need a pre-step of "search for callers".
- Overall review: Submit the full draft of `{slug}-refactor-design.md` to the user. After user approval, change the `status` in the frontmatter from `draft` to `approved`.
- Extract checklist: Extract `{slug}-checklist.yaml` from the design, with steps corresponding to execution order and checks corresponding to each step's exit signals.
Design file structure
```markdown
---
doc_type: refactor-design
refactor: {YYYY-MM-DD}-{slug}
status: draft | approved
scope: {one sentence describing scan scope}
summary: {one sentence describing the items to be implemented}
---
# {slug} refactor design

## 1. Scope of this refactor
- Which items were selected from the scan (listed by number)
- Items explicitly not implemented (marked ✗) and reasons
- Estimated total workload / total risk level

## 2. Pre-dependencies
- Test coverage supplement actions (if needed)
- Caller search actions (if needed)
- Other one-time preparations

## 3. Execution order
List by step, one block per step:
- Step N: {one sentence action}
  - Referenced method: M-Ln-NN {method name}
  - Specific operations: {apply method library steps to specific files/functions in this project}
  - Exit signals: {tests AI runs / pages HUMAN checks}
  - Verification responsibility: AI self-verification | HUMAN
  - Rollback: {how to restore if problems occur, usually git revert the step}

## 4. Risks and key points
- Summary of high-risk steps (separately highlight steps with high risk in this design)
- Error-prone points (e.g., cross-step data flow changes)
```
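The exact schema of `{slug}-checklist.yaml` is left to Stage 2. One possible shape, with steps in execution order and checks mirroring each step's exit signals — every field name and value below is an illustrative assumption, not a fixed schema:

```yaml
# Illustrative sketch only — field names are assumptions, not a fixed schema.
refactor: "{YYYY-MM-DD}-{slug}"
steps:
  - id: 1
    action: "Supplement characterization tests for the target function"
    method: M-L2-03        # method ID from the library (illustrative)
    verify: AI             # AI self-verification | HUMAN
    checks:
      - "test suite passes"
      - "grep shows no residual old references"
    status: pending        # pending | done
  - id: 2
    action: "Parallel Change on the public interface"
    method: M-L1-01
    verify: HUMAN
    checks:
      - "user visually confirms the affected page"
    status: pending
```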
Stage 3: apply (Execute item by item)
Advancement rules
- One step at a time, no batch operations. Strictly follow the checklist order; do not start the next step until the current step is completed.
- Verify after each step:
- AI self-verification items: Run specified tests / type checks / lint / grep for no residual old references. Record in apply-notes if passed, then proceed to the next step.
- HUMAN verification items: Pause, report "Step N has been completed, please visually confirm at {specific page / operation steps}, I will continue after confirmation". Do not proceed without explicit "continue" from the user.
- Record deviations immediately: If unconsidered situations are found during execution (e.g., a caller in dynamic import), pause and report, do not act on your own. Align with the user, add the deviation to apply-notes, and return to Stage 2 to modify the design if necessary.
- Self-check for behavioral equivalence: After each step, ask yourself — "Could this step change externally observable behavior?". If in doubt, roll back the step and do not proceed.
apply-notes format
```markdown
---
doc_type: refactor-apply-notes
refactor: {YYYY-MM-DD}-{slug}
---
# {slug} apply notes

## Step 1: {action}
- Completion time: {date}
- Modified files: {file list}
- Verification result: {test output / HUMAN confirmation quote}
- Deviations: {none / specific description}

## Step 2: ...
```
After completion
- Run full tests + type checks + lint
- Ask the user for a final overall visual confirmation (especially for frontend: open key pages and test functions)
- After confirmation, finalize the commit, with the commit message referencing the refactor directory
Exit conditions
Common pitfalls
- AI forces checklist entries: Clearly hits pre-checks but finds excuses to bypass, and generates entries with non-quantifiable issues like "code can be more elegant" — should pause immediately and provide routing suggestions
- Includes behavior changes: "Incidentally fixed a bug / optimized a prompt" during refactoring — should pause and split into an independent issue or feature
- Combines cross-step actions: Submits 2-3 steps in one commit for speed — loses the ability to roll back a single step if problems occur
- Includes preference items: Naming preferences, quotes, arrow functions vs function — these go to decisions, not refactor
- Directly starts scanning a large module: Enters scan without splitting a scope of >15 files / >3000 lines, resulting in an unmanageable long checklist
- Skips HUMAN verification items: Frontend effects cannot be seen by AI; cannot replace manual visual inspection with "type checks passed"
- Proceeds with insufficient coverage: Modifies modules without tests, with "behavioral equivalence" only a verbal promise
Boundaries with adjacent workflows
- feature: Adding new capabilities / modifying requirements → feature. If "incidentally implement X" comes up during refactoring, pause and split it out.
- issue: Fixing bugs / correcting behavior → issue. Bugs found during refactoring should be recorded as new issues, not secretly fixed in the current PR.
- decisions: Project-wide long-term constraints ("use composable from now on", "disable mixin") → decisions. Refactor can reference existing decisions as basis, but does not produce decisions.
- architecture: Cross-module boundary restructuring / layer adjustment → architecture + decisions. A single refactor does not cross modules; cross-module work should be split into "update architecture documentation + record decisions + N module-level refactors".
- tricks / learning: Reusable techniques found during refactoring → tricks; pitfalls encountered → learning.
Related documents
- `codestable/reference/system-overview.md` — CodeStable system overview
- — Ultra-light channel for small refactors
- `reference/scan-checklist-format.md` — Fields, order, hard constraints, and anti-pattern samples for scan checklist items
- `reference/refusal-routing.md` — 7 scan pre-checks + routing table + rejection output format
- — Refactor method library (four-layer classification L1-L4, unified fields)
- `codestable/reference/shared-conventions.md` — Shared conventions across workflows
- Project architecture entry — Review before scanning to confirm module boundaries