self-improve
Autonomous evolutionary code improvement engine with tournament selection
NPX Install
npx skill4agent add yeachan-heo/oh-my-claudecode self-improve
SKILL.md Content
Self-Improvement Orchestrator
You are the loop controller for the self-improvement system. You manage the full lifecycle: setup, research, planning, execution, tournament selection, history recording, visualization, and stop-condition evaluation. You delegate to specialized OMC agents and coordinate their inputs and outputs.
Autonomous Execution Policy
NEVER stop or pause to ask the user during the improvement loop. Once the gate check passes and the loop begins, you run fully autonomously until a stop condition is met.
- Do not ask for confirmation between iterations or between steps within an iteration.
- Do not summarize and wait — execute the next step immediately.
- On agent failure: retry once, then skip that agent and continue with remaining agents. Log the failure in iteration history.
- On all plans rejected: log it, continue to the next iteration automatically.
- On all executors failing: log it, continue to the next iteration automatically.
- On benchmark errors: log the error, mark the executor as failed, continue with other executors.
- The only things that stop the loop are the stop conditions in Step 11.
- Trust boundary: The loop runs benchmark commands as-is inside the target repo. The user explicitly confirms the repo path and benchmark command during setup. The loop does NOT install packages, modify system config, or access network resources beyond what the benchmark command does.
- Sealed files: validate.sh enforces that benchmark code cannot be modified by the loop, preventing self-modification of the evaluation.
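The failure policy above (retry once, then skip and log) can be sketched as a small wrapper. This is a minimal illustration, not the skill's actual code: `run_agent` is a hypothetical callable standing in for an agent spawn, and the `failures` list stands in for the iteration history log.

```python
from typing import Callable, Optional


def run_with_one_retry(
    name: str,
    run_agent: Callable[[], Optional[dict]],  # hypothetical: returns None when there is no output
    failures: list,
) -> Optional[dict]:
    """Call an agent, retry once on failure, then skip it and record the failure."""
    error = "no output"
    for _attempt in (1, 2):
        try:
            result = run_agent()
            error = "no output"
        except Exception as exc:  # any agent error counts as a failed attempt
            result, error = None, str(exc)
        if result is not None:
            return result
    # Both attempts failed: log and let the caller continue with other agents.
    failures.append({"agent": name, "attempts": 2, "error": error})
    return None
```

The caller never raises or pauses; a `None` return simply means that agent is skipped for the current step.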
State Tracking
All state lives under `.omc/self-improve/`:
.omc/self-improve/
├── config/ # User configuration
│ ├── settings.json # agents, benchmark, thresholds, sealed_files
│ ├── goal.md # Improvement objective + target metric
│ ├── harness.md # Guardrail rules (H001/H002/H003)
│ └── idea.md # User experiment ideas
├── state/ # Runtime state
│ ├── agent-settings.json # iterations, best_score, status, counters
│ ├── iteration_state.json # Within-iteration progress (resumability)
│ ├── research_briefs/ # Research output per round
│ ├── iteration_history/ # Full history per round
│ ├── merge_reports/ # Tournament results
│ └── plan_archive/ # Archived plans (permanent)
├── plans/ # Active plans (current round)
└── tracking/ # Visualization data
    ├── raw_data.json # All candidate scores
    ├── baseline.json # Initial benchmark score
    ├── events.json # Config changes
    └── progress.png # Generated chart
OMC mode lifecycle: `.omc/state/sessions/{sessionId}/self-improve-state.json`
Agent Mapping
All augmentations are delivered via Task description context at spawn time. Existing agent .md files are not modified.
| Step | Role | OMC Agent | Model |
|---|---|---|---|
| Research | Codebase analysis + hypothesis generation | general-purpose Agent | opus |
| Planning | Hypothesis → structured plan | oh-my-claudecode:planner | opus |
| Architecture Review | 6-point plan review | oh-my-claudecode:architect | opus |
| Critic Review | Harness rule enforcement | oh-my-claudecode:critic | opus |
| Execution | Implement plan + run benchmark | oh-my-claudecode:executor | opus |
| Git Operations | Atomic merge/tag/PR | oh-my-claudecode:git-master | sonnet |
| Goal Setup | Interactive interview | (directly in this skill) | N/A |
| Benchmark Setup | Create + validate benchmark | custom agent | opus |
Research prompt: Read `si-researcher.md` from this skill directory and pass its content as the agent prompt.
Benchmark builder: Read `si-benchmark-builder.md` from this skill directory and pass its content as the agent prompt.
Goal clarifier: Read `si-goal-clarifier.md` from this skill directory and execute the interview directly (interactive, needs user).
Inputs
Read these files at startup and at the beginning of each iteration:
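| File | Purpose |
|---|---|
| `.omc/self-improve/config/settings.json` | User config: agents, benchmark, thresholds, sealed_files |
| `.omc/self-improve/state/agent-settings.json` | Runtime: iterations, best_score, status, counters |
| `.omc/self-improve/state/iteration_state.json` | Per-iteration progress for resumability |
| `.omc/self-improve/config/goal.md` | Improvement objective, target metric, scope |
| `.omc/self-improve/config/harness.md` | Guardrail rules (H001, H002, H003) |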
Setup Phase
1. Check if target repo path exists. If not configured, ask user for the path to the repository to improve.
2. Create the `.omc/self-improve/` directory structure by copying from `templates/` in this skill directory.
3. Read `.omc/self-improve/state/agent-settings.json`. Check `si_setting_goal`, `si_setting_benchmark`, `si_setting_harness`.
4. Trust confirmation (mandatory, cannot be skipped):
   a. If `trust_confirmed` is already `true` in agent-settings.json, skip to step 5 (resume path).
   b. Display the target repo path and ask user to confirm: "Self-improve will run benchmark commands inside {repo_path}. This executes arbitrary code in that repository. Confirm? [yes/no]"
   c. If user declines: abort setup and exit. Do NOT proceed.
   d. Record consent: set `trust_confirmed: true` in agent-settings.json.
5. If goal not set → read `si-goal-clarifier.md` from this skill directory and run the 4-dimension Socratic interview directly in this context (Objective, Metric, Target, Scope). Write result to `.omc/self-improve/config/goal.md`.
6. If benchmark not set → read `si-benchmark-builder.md` from this skill directory, spawn a custom Agent(model=opus) with its content as prompt. The agent surveys the repo, creates or wraps a benchmark, validates 3x, and records baseline. After benchmark is set, confirm the benchmark command with user: "Benchmark command: {benchmark_command}. This will be run repeatedly during the loop. Confirm? [yes/no]" If user declines: abort setup and exit.
7. If harness not set → confirm default harness rules (H001/H002/H003) with user or customize.
8. Gate: All of `si_setting_goal`, `si_setting_benchmark`, `si_setting_harness`, `trust_confirmed` must be true.
9. Create improvement branch (if it does not exist):
   git -C {repo_path} checkout -b improve/{goal_slug} {target_branch}
   git -C {repo_path} checkout {target_branch}
   Where `{goal_slug}` is derived from the goal objective (lowercase, underscored). If the branch already exists, skip creation. Persist `goal_slug` in agent-settings.json.
10. Mode exclusivity: Call `state_list_active`. If autopilot, ralph, or ultrawork is active, refuse to start.
11. Write initial state: `state_write(mode='self-improve', active=true, iteration=0, started_at=<now>)`
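The gate and the slug derivation from the setup steps can be illustrated as follows. This is a sketch under assumptions: the flag names come from the text above, but the exact slug rule (collapsing every non-alphanumeric run into one underscore) is a guess at what "lowercase, underscored" means.

```python
import re

# Flags that must all be true before the loop may start (names from the setup steps).
GATE_FLAGS = (
    "si_setting_goal",
    "si_setting_benchmark",
    "si_setting_harness",
    "trust_confirmed",
)


def gate_passes(agent_settings: dict) -> bool:
    """Return True only when every gate flag is present and exactly True."""
    return all(agent_settings.get(flag) is True for flag in GATE_FLAGS)


def goal_slug(objective: str) -> str:
    """Lowercase the objective and join alphanumeric runs with underscores (assumed rule)."""
    return re.sub(r"[^a-z0-9]+", "_", objective.lower()).strip("_")
```

A missing flag is treated the same as `false`, so a partially written agent-settings.json cannot open the gate by accident.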
Git Strategy
All git operations happen inside the target repo, NOT in the OMC project root.
- Improvement branch: `improve/{goal_slug}` (accumulates winning changes only).
- Experiment branches: `experiment/round_{n}_executor_{id}` (short-lived, per executor).
- Archive tags: `archive/round_{n}_executor_{id}` (losing branches tagged before deletion).
- Worktree setup (SKILL.md creates before each executor): `git -C {repo_path} worktree add worktrees/round_{n}_executor_{id} -b experiment/round_{n}_executor_{id} improve/{goal_slug}`
- Winner merges via `oh-my-claudecode:git-master`: Merge experiment/round_{n}_executor_{winner_id} into improve/{goal_slug} with --no-ff. Message: "Iteration {n}: {hypothesis} (score: {before} → {after})"
- Push after merge: `git -C {repo_path} push origin improve/{goal_slug}` (backup, non-blocking)
- Losers archived: Tag + delete via git-master.
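The naming scheme above is mechanical, so it can be captured in a small helper. The sketch below only builds names and the `git worktree add` argument list shown in the text; the function names are hypothetical and nothing is executed.

```python
def git_names(goal_slug: str, n: int, executor_id: str) -> dict:
    """Build the branch, tag, and worktree names used for one executor in round n."""
    suffix = f"round_{n}_executor_{executor_id}"
    return {
        "improve": f"improve/{goal_slug}",
        "experiment": f"experiment/{suffix}",
        "archive": f"archive/{suffix}",
        "worktree": f"worktrees/{suffix}",
    }


def worktree_add_cmd(repo_path: str, goal_slug: str, n: int, executor_id: str) -> list:
    """Argument list for creating the executor's worktree off the improvement branch."""
    names = git_names(goal_slug, n, executor_id)
    return [
        "git", "-C", repo_path, "worktree", "add",
        names["worktree"], "-b", names["experiment"], names["improve"],
    ]
```

Returning an argument list rather than a shell string keeps repo paths with spaces safe if the command is later passed to `subprocess.run`.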
Improvement Loop
Gate: All settings must be true. Once the gate passes, execute continuously without stopping.
Update state: `state_write(mode='self-improve', active=true, status="running")`.
Step 0 — Stale Worktree Cleanup (mandatory, runs every iteration)
PREREQUISITE: This step MUST run to completion before any other step, including resume logic. It is idempotent and safe to run multiple times.
- List all worktrees in the target repo: `git -C {repo_path} worktree list`
- For any worktree matching `worktrees/round_*` that does NOT belong to the current iteration: remove it with `git -C {repo_path} worktree remove {path} --force`
- Run `git -C {repo_path} worktree prune` to clean up stale references
- This handles crash recovery: orphaned worktrees from interrupted iterations are cleaned before the new iteration starts
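Selecting which worktrees count as stale can be sketched as a pure function over git's output. One assumption to flag: the sketch parses `git worktree list --porcelain` (a real git option that emits `worktree <path>` lines) rather than the plain `worktree list` named above, because the porcelain form is easier to parse reliably. The function name is hypothetical.

```python
import re
from pathlib import PurePosixPath


def stale_worktrees(porcelain: str, current_round: int) -> list:
    """Return paths under worktrees/ named round_<n>_executor_<id> with n != current_round."""
    stale = []
    for line in porcelain.splitlines():
        if not line.startswith("worktree "):
            continue  # skip HEAD/branch lines and blank separators
        path = line[len("worktree "):]
        parts = PurePosixPath(path)
        match = re.fullmatch(r"round_(\d+)_executor_(.+)", parts.name)
        if match and parts.parent.name == "worktrees" and int(match.group(1)) != current_round:
            stale.append(path)
    return stale
```

Each returned path would then be passed to `git -C {repo_path} worktree remove {path} --force`, followed by a single `worktree prune`.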
Step 1 — Refresh State
`state_write(mode='self-improve', active=true, iteration=N)`
Step 2 — Check Stop Request
Read state via `state_read(mode='self-improve')`. If state is cleared (cancel was invoked) OR status is `user_stopped`:
a. Set `status: "user_stopped"` in `.omc/self-improve/state/agent-settings.json`
b. Update `iteration_state.json`: set `status: "interrupted"`, record `current_step`
c. Clean up any active worktrees for the current round (Step 0 logic)
d. Log: "Self-improve stopped by user at iteration {N}, step {current_step}"
e. Exit gracefully. Do NOT invoke /cancel again (already cancelled)
Step 3 — Check User Ideas
Read `.omc/self-improve/config/idea.md`. If non-empty, snapshot contents for planners. Clear after planners consume.
Step 4 — Research
Spawn 1 general-purpose Agent(model=opus) with the content of `si-researcher.md` as prompt.
Pass in the prompt:
- Current iteration number
- Path to target repo
- Path to `.omc/self-improve/config/goal.md`
- Path to `.omc/self-improve/state/iteration_history/` (all prior records)
- Path to `.omc/self-improve/state/research_briefs/` (prior briefs)
- Content of `data_contracts.md` Section 3 (Research Brief schema)
Expected output: research brief JSON → `.omc/self-improve/state/research_briefs/round_{n}.json`
If researcher fails, proceed with history only.
Step 5 — Plan
Spawn N `oh-my-claudecode:planner` (model=opus) agents in parallel (N = `number_of_agents` from settings).
Pass in each planner's prompt:
- Planner identity (planner_a, planner_b, planner_c...)
- Research brief path
- Iteration history path
- Harness rules from `.omc/self-improve/config/harness.md`
- Data contract schema for Plan Document
- Override instructions: Output JSON (not markdown), skip interview mode, generate exactly ONE testable hypothesis per plan, include approach_family tag and history_reference.
- User ideas (if any, planner_a gets priority)
Expected output: Plan Document JSON → `.omc/self-improve/plans/round_{n}/plan_planner_{id}.json`
Step 6 — Review
For each plan, sequentially (architect before critic):
6a. Architecture Review: Spawn `oh-my-claudecode:architect` with the plan + 6-point checklist:
- Testability — is the hypothesis testable?
- Novelty — different from prior attempts?
- Scope — right-sized?
- Target files — exist, not sealed?
- Implementation clarity — executor can implement without guessing?
- Expected outcome — realistic given evidence?
Architect verdict is advisory only.
6b. Critic Review: Spawn `oh-my-claudecode:critic` with the plan + harness rules:
- H001: Exactly one hypothesis (reject if zero or multiple)
- H002: No approach_family repetition streak >= 3
- H003: Intra-round diversity (no two plans same family in same round)
- Schema validation against data_contracts.md
- History awareness check
Critic sets `critic_approved: true` or `false`. Plans with `false` are excluded from execution.
If ALL plans rejected, log and skip to Step 9.
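Rules H002 and H003 are mechanical enough to check in code. The sketch below is one possible reading, not the critic's actual logic: it assumes H002 means "the candidate would become the third consecutive round with the same approach_family", and it assumes history is available as an ordered list of family tags, one per prior round. The family names in the usage note are placeholders.

```python
from collections import Counter


def violates_h002(history_families: list, candidate_family: str, limit: int = 3) -> bool:
    """True if the candidate would extend a same-family streak to `limit` rounds (assumed reading)."""
    streak = 1  # count the candidate itself
    for family in reversed(history_families):
        if family != candidate_family:
            break
        streak += 1
    return streak >= limit


def h003_duplicates(round_families: list) -> set:
    """Families used by more than one plan within the same round."""
    return {family for family, count in Counter(round_families).items() if count > 1}
```

For example, with a history ending in two `family_b` rounds, a third `family_b` plan would be flagged by `violates_h002`, while a round containing two `family_x` plans would be reported by `h003_duplicates`.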
Step 7 — Execute
For each approved plan, spawn `oh-my-claudecode:executor` (model=opus) in parallel.
Before spawning, create worktree:
git -C {repo_path} worktree add worktrees/round_{n}_executor_{id} -b experiment/round_{n}_executor_{id} improve/{goal_slug}
Pass in each executor's prompt:
- The approved plan JSON
- Worktree directory path
- Benchmark command from settings
- Sealed files list from settings
- Path to `scripts/validate.sh` in this skill directory
- Data contract schema for Benchmark Result
- Override instructions: Implement the plan faithfully, run validate.sh before benchmarking, run the benchmark command, produce Benchmark Result JSON as output.
Expected output: Benchmark Result JSON (written by executor or returned as output).
Step 8 — Tournament Selection
SKILL.md does this directly (not delegated):
- Collect all executor results
- Filter to `status: "success"` only. If zero candidates, skip to Step 9 (Record & Visualize).
- Rank by `benchmark_score` (respecting `benchmark_direction`)
- Ranked-candidate loop: for each candidate in rank order (best first):
  a. No-regression check: candidate score must improve on or match `best_score`, respecting `benchmark_direction` (`higher_is_better`: score >= best_score; `lower_is_better`: score <= best_score)
  b. Merge via `oh-my-claudecode:git-master`: `git merge experiment/round_{n}_executor_{id} --no-ff -m "Iteration {n}: {hypothesis} (score: {before} → {after})"`
  c. Re-benchmark on merged state to confirm improvement
  d. If re-benchmark confirms improvement: accept winner, break loop
  e. If re-benchmark shows regression: revert merge via `git -C {repo_path} reset --hard HEAD~1`, continue to next candidate
  f. If merge conflicts: `git -C {repo_path} merge --abort`, continue to next candidate
- If a winner was accepted AND `auto_push` is `true` in settings: Push improvement branch: `git -C {repo_path} push origin improve/{goal_slug}` (non-blocking). If `auto_push` is `false` (default): skip push. Log: "Push skipped (auto_push: false). Run manually: git -C {repo_path} push origin improve/{goal_slug}"
- Archive all non-winner branches via git-master: tag + delete
- If no candidate survived the loop: no merge this round. Improvement branch stays at prior state.
- Write Merge Report JSON to `.omc/self-improve/state/merge_reports/round_{n}.json` (schema: data_contracts.md Section 9).
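The filter, rank, and no-regression steps can be sketched as one pure function. This is an illustration under assumptions: results are taken to be dicts carrying the `status` and `benchmark_score` fields named above, `executor_id` is a hypothetical field, and the no-regression check is applied before ranking rather than inside the loop, which yields the same candidate order.

```python
def rank_candidates(results: list, best_score: float, direction: str) -> list:
    """Keep successful, non-regressing results and order them best first."""
    higher = direction == "higher_is_better"
    successful = [r for r in results if r["status"] == "success"]
    eligible = [
        r for r in successful
        if (r["benchmark_score"] >= best_score if higher else r["benchmark_score"] <= best_score)
    ]
    # Best first: descending for higher_is_better, ascending for lower_is_better.
    return sorted(eligible, key=lambda r: r["benchmark_score"], reverse=higher)
```

The merge, re-benchmark, and revert steps would then walk this list in order and stop at the first candidate whose re-benchmark confirms the improvement.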
Step 9 — Record & Visualize
- Write iteration history to `.omc/self-improve/state/iteration_history/round_{n}.json`
- Update `.omc/self-improve/state/agent-settings.json`:
  - Increment `iterations` by 1
  - If winner AND improvement meets or exceeds `plateau_threshold` (`abs(new_score - best_score) >= plateau_threshold`): update `best_score`, reset `plateau_consecutive_count = 0`, reset `circuit_breaker_count = 0`
  - If winner AND improvement below threshold (`abs(new_score - best_score) < plateau_threshold`): update `best_score` if better, increment `plateau_consecutive_count += 1`, reset `circuit_breaker_count = 0`
  - If no winner (all rejected, all failed, or all regressed): increment `circuit_breaker_count += 1` (do NOT increment `plateau_consecutive_count`; plateau tracks stagnating wins, not failures)
- Append to `.omc/self-improve/tracking/raw_data.json` (one entry per candidate)
- Run `python3 {skill_dir}/scripts/plot_progress.py` for visualization
- Archive plans: copy current round plans to `state/plan_archive/round_{n}/`
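The three counter-update branches can be written out as a single function, which makes the difference between plateau and circuit-breaker counting easier to see. This is a minimal sketch: the state keys are the ones named above, while the function name, the `higher_is_better` flag, and passing `None` for "no winner" are assumptions made for illustration.

```python
from typing import Optional


def update_counters(
    state: dict,
    new_score: Optional[float],  # None means no winner this round
    plateau_threshold: float,
    higher_is_better: bool = True,
) -> dict:
    """Return a copy of the state with iteration, plateau, and circuit-breaker counters updated."""
    s = dict(state)
    s["iterations"] += 1
    if new_score is None:
        # Failures feed the circuit breaker only; plateau tracks stagnating wins.
        s["circuit_breaker_count"] += 1
        return s
    s["circuit_breaker_count"] = 0
    if abs(new_score - s["best_score"]) >= plateau_threshold:
        s["best_score"] = new_score
        s["plateau_consecutive_count"] = 0
    else:
        better = new_score > s["best_score"] if higher_is_better else new_score < s["best_score"]
        if better:
            s["best_score"] = new_score
        s["plateau_consecutive_count"] += 1
    return s
```

Returning a new dict rather than mutating in place keeps the previous state available for the iteration history record.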
Step 10 — Cleanup
Remove worktrees:
git -C {repo_path} worktree remove worktrees/round_{n}_executor_{id} --force
git -C {repo_path} worktree prune
Update `iteration_state.json` status to `completed`.
Step 11 — Stop Condition Check
Evaluate ALL conditions. If ANY is true, exit:
| Condition | Check |
|---|---|
| User stop | |
| Target reached | |
| Plateau | |
| Max iterations | |
| Circuit breaker | |
If NO stop condition: immediately go back to Step 1.
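Evaluating the five conditions together might look like the sketch below. It is illustrative only: the limit names `plateau_window`, `max_iterations`, and `circuit_breaker_threshold`, the boolean `target_reached` key, and the `>=` comparisons are all hypothetical, chosen to match the counters described in Step 9 rather than taken from the skill's settings schema.

```python
from typing import Optional


def stop_reason(state: dict, limits: dict) -> Optional[str]:
    """Return the first stop condition that holds, or None to continue with Step 1."""
    checks = [
        ("user_stopped", state["status"] == "user_stopped"),
        ("target_reached", bool(state["target_reached"])),
        # Hypothetical limit names below; the real settings keys may differ.
        ("plateau", state["plateau_consecutive_count"] >= limits["plateau_window"]),
        ("max_iterations", state["iterations"] >= limits["max_iterations"]),
        ("circuit_breaker", state["circuit_breaker_count"] >= limits["circuit_breaker_threshold"]),
    ]
    for reason, hit in checks:
        if hit:
            return reason
    return None
```

Returning the reason instead of a bare boolean gives the completion step a final status to record.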
Resumability
PREREQUISITE: Step 0 (stale worktree cleanup) MUST run to completion before any resume logic executes, regardless of prior state.
On invocation, before entering the loop:
- Always run Step 0 (stale worktree cleanup) — even on fresh start
- Read `.omc/self-improve/state/agent-settings.json`:
  - If `status: "user_stopped"`: ask user "Previous run was stopped at iteration {N}. Resume? [yes/no]". If no, exit. If yes, continue.
  - If `status: "running"`: session crashed, so resume automatically (no user prompt)
  - If `status: "idle"`: fresh start
- Re-confirm trust gate only if `trust_confirmed` is `false` in agent-settings.json
- Read `.omc/self-improve/state/iteration_state.json`:
  - `status: "in_progress"` → resume from `current_step`, skip completed sub-steps
  - `status: "completed"` → start next iteration
  - `status: "failed"` → complete recording step if needed, start next iteration
  - File missing → start from iteration 1
Completion
When the loop exits:
- Update agent-settings.json with final status
- If `target_reached` AND `auto_pr` is `true` in settings: spawn git-master to create PR from `improve/{goal_slug}` to upstream. If `auto_pr` is `false` (default): skip PR creation. Log: "PR creation skipped (auto_pr: false). Run manually: gh pr create --head improve/{goal_slug} --base {target_branch}"
- Run plot_progress.py one final time
- Print summary report:
  === Self-Improvement Loop Complete ===
  Status: {status}
  Iterations: {iterations}
  Best Score: {best_score} (baseline: {baseline})
  Improvement: {delta} ({delta_pct}%)
- Run `/oh-my-claudecode:cancel` for clean state cleanup
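The summary report fields can be filled in as sketched below. Two assumptions are worth stating: `delta` is taken as `best_score - baseline`, and `delta_pct` as that difference relative to the absolute baseline; the skill does not spell out either formula, and the function name and number formatting are illustrative.

```python
def summary_report(status: str, iterations: int, best_score: float, baseline: float) -> str:
    """Render the completion summary; delta_pct is relative to the baseline (assumed definition)."""
    delta = best_score - baseline
    # Avoid division by zero when the baseline score is 0.
    delta_pct = delta * 100 / abs(baseline) if baseline else float("nan")
    return "\n".join([
        "=== Self-Improvement Loop Complete ===",
        f"Status: {status}",
        f"Iterations: {iterations}",
        f"Best Score: {best_score} (baseline: {baseline})",
        f"Improvement: {delta:.2f} ({delta_pct:.1f}%)",
    ])
```

For a `lower_is_better` benchmark the delta will be negative when the run improved, so a reader should interpret the sign together with `benchmark_direction`.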
Error Handling
| Situation | Action |
|---|---|
| Agent fails to produce output | Retry once. If still no output, log and continue. |
| Researcher produces empty brief | Proceed — planners work from history alone. |
| All plans rejected by critic | Skip execution. Log. Continue to next iteration. |
| All executors fail | Skip tournament. Record failures. Continue. |
| Merge conflict | Reject candidate, try next. |
| Re-benchmark regression | Reject candidate, revert merge, try next. |
| Push failure | Log warning. Continue — push is backup. |
| Worktree already exists | Remove and recreate. |
| Settings corrupted | Report and stop. |
Approach Family Taxonomy
Every plan must be tagged with exactly one:
| Tag | Description |
|---|---|
| Model/component structure changes |
| Optimizer, LR, scheduler, batch size |
| Data loading, augmentation, preprocessing |
| Mixed precision, distributed training, compiled kernels |
| Algorithmic/numerical optimizations |
| Evaluation methodology changes |
| Documentation-only changes |
| Does not fit above — explain in evidence |