Loading...
Loading...
· Batch-improve skill collections with evaluation loops, lint checks, behavioral tests, peer review. Triggers: 'skill refiner', 'improve skills', 'quality sweep', 'batch improve', 'skill loop'. Not for one skill.
npx skill4agent add iuliandita/skills skill-refinerskill-refiner [--iterations N] [--mode MODE] [--secondary HARNESS] [--threshold N] [--plateau N]| Flag | Default | Description |
|---|---|---|
| 10 | Maximum iterations for phase 1 |
| circuit-breaker | |
| auto-detect | Secondary harness for cross-model review, or |
| 85 | Focus threshold - skip skills scoring above this (user can override max) |
| 2 | Minimum score delta to keep iterating |
SKILL_REFINER_SECONDARY=<harness>skill-refiner/YYYY-MM-DD-HHMMSS.refiner-runs.jsonreferences/harness-detection.mdreferences/harness-detection.mdclaude -pcodex execreferences/evaluation-criteria.mdreferences/test-cases.mdreferences/harness-detection.mdrefactor(skill-refiner): iteration N - skill1(+X), skill2(+Y)--- iteration N / max -------------------------------------------
improved: skill1 (72 > 80 | G:pass A:76 B:78 X:90), skill2 (68 > 73 | G:pass A:70 B:72 X:100)
gated: skillZ (lint/spec failed - excluded from scoring)
skipped: M skills above threshold
reverted: skill3 (proposed change scored -2, rolled back | G:pass A:74 B:69 X:100)
contested: skill4 (secondary flagged major, primary disagreed)
plateau: yes/no (max delta: +X)
-----------------------------------------------------------------references/evaluation-criteria.mdconventions.mdrefactor(skill-refiner): meta - improve <target> (+N)--mode auto=== skill-refiner run complete ===================================
Branch: skill-refiner/YYYY-MM-DD-HHMMSS
Primary: <harness> <version> (<model>, effort: <level>)
Secondary: <harness> <version> (<model>, effort: <level>) | none
Pool: N skills (skill-creator, skill-refiner excluded)
Config: iterations=M, threshold=T, mode=MODE, plateau=P
Iterations: N (of max M)
Terminated: plateau / threshold / cap / user
Score changes:
skill1: 62 > 88 (+26) [G:pass A:84 B:86 X:90]
skill2: 71 > 85 (+14) [G:pass A:82 B:79 X:100]
...
skill-creator: 80 > 84 (+4) [G:pass A:82 B:81 X:100] [meta]
skill-refiner: 78 > 83 (+5) [G:pass A:80 B:79 X:100] [meta]
Aggregate: avg X.X | min X.X | max X.X
Reverted: X changes across Y iterations
Contested: Z flags escalated to human
=================================================================.refiner-runs.jsonreferences/evaluation-criteria.mdreferences/test-cases.md--mode auto--mode auto