Loading...
Loading...
Found 2 Skills
Evaluate and improve skills through measured testing. Run trigger evaluations to test whether skill descriptions cause correct activation, optimize descriptions via automated train/test loops, benchmark skill output quality with A/B comparisons, and validate skill structure. Use when user says "improve skill", "test skill triggers", "optimize description", "benchmark skill", "eval skill", or "skill quality". Do NOT use for creating new skills (use skill-creator-engineer).
Run isolated eval and grading calls using CC 2.1.81 --bare mode. Constructs claude -p --bare invocations for skill evaluation, trigger testing, and LLM grading without plugin/hook interference. Use when running eval pipelines, grading skill outputs, benchmarking prompt quality, or testing trigger accuracy in isolation.