@rules/test-matrix.md
@rules/scenario-design.md
@rules/evidence-reporting.md
@references/prompt-pack-template.md
Skill Tester
Prove a skill works as intended before trusting it.
<purpose>
- Test whether a skill triggers on the right user requests and stays inactive on the wrong ones.
- Verify the skill's workflow, support-file routing, scripts/assets, and validation instructions against realistic usage.
- Expand coverage around edge cases, boundary prompts, ambiguity, missing inputs, malformed resources, and regression risks.
</purpose>
<routing_rule>
Use this skill when the user wants to test, validate, QA, regression-test, or edge-case-test an existing skill or skill folder.
Use a skill-creation workflow when the main job is creating or structurally refactoring a skill.
Use an experiment-driven optimization workflow when the main job is repeated measured optimization across experiments.
Use browser or project-specific QA skills when the target is an application feature rather than a skill.
Do not use when:
- there is no skill or skill draft to evaluate
- the user wants only generic documentation review
- the task is app/browser QA unrelated to skill behavior
- the user has already requested a full experiment loop with scoring and mutations
</routing_rule>
<trigger_conditions>
Positive examples:
- "Test and tell me whether it triggers correctly."
- "Verify whether this skill works as intended, including edge cases." (Korean-language requests with the same meaning should also trigger.)
- "Create a regression test pack for this skill's trigger and workflow behavior."
- "Validate the , rules, references, and scripts before I ship this skill."
Negative examples:
- "Create a new Codex skill for browser QA." Route to .
- "Run QA on my web app checkout flow." Route to app QA, not this skill.
- "Optimize this skill through repeated benchmark experiments." Route to .
Boundary example:
- "Review this skill and fix any issues you find."
Start with this skill if the emphasis is evidence and failures; switch to a skill-editing workflow only for structural edits after the test findings are clear.
</trigger_conditions>
<supported_targets>
- Skill folders containing SKILL.md and optional localized variants such as SKILL.ko.md.
- Skill metadata, trigger descriptions, routing rules, and examples.
- Directly linked rules, references, scripts, and assets.
- Trigger prompt packs, workflow simulations, validation checklists, and regression reports.
- Edge cases around ambiguity, missing inputs, conflicting instructions, unsupported targets, and resource drift.
</supported_targets>
<required_inputs>
Minimum input:
- Target skill path or pasted skill content.
- Intended job of the skill, if not obvious from metadata.
If either is missing, inspect local context first. Ask only when the target skill or intended behavior cannot be inferred safely.
Optional but useful:
- Known prompts that should trigger.
- Known prompts that should not trigger.
- Expected outputs or workflow checkpoints.
- Recent failures, regressions, or edge cases to reproduce.
</required_inputs>
<skill_architecture>
Load support files deliberately:
- Use rules/test-matrix.md to choose what dimensions to test.
- Use rules/scenario-design.md to write positive, negative, boundary, adversarial, and localization scenarios.
- Use rules/evidence-reporting.md to report pass/fail evidence and next fixes.
- Use scripts/validate-skill.mjs for deterministic static checks when a filesystem skill folder is available.
- Use references/prompt-pack-template.md when the user asks for reusable regression tests or a prompt pack artifact.
Keep test evidence close to the target skill when the user asks for reusable artifacts; otherwise report findings inline.
</skill_architecture>
<workflow>
| Phase | Task | Output |
|---|---|---|
| 0 | Identify target skill, intended behavior, and neighboring skills that might conflict | Test scope |
| 1 | Read SKILL.md and directly linked support files needed for the test | Baseline behavior map |
| 2 | Build a scenario matrix covering positive, negative, boundary, edge, and regression cases | Test matrix |
| 3 | Run static anatomy checks and inspect support-file references | Static findings |
| 4 | Simulate skill routing and workflow execution for each scenario | Pass/fail table |
| 5 | Classify failures by trigger, scope, resource placement, workflow, validation, or safety | Ranked defects |
| 6 | Recommend minimal fixes or hand off to a skill-creation or skill-editing workflow when edits are needed | Evidence-backed report |
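Phase 4 can be pictured as a small simulation loop producing the pass/fail table; a minimal sketch, where `shouldTrigger` is an assumed stand-in for however activation is judged, not a real API:

```javascript
// Run each scenario through a routing predicate and record pass/fail.
// A scenario carries the prompt and whether the skill should activate.
export function runScenarios(scenarios, shouldTrigger) {
  return scenarios.map((s) => {
    const triggered = shouldTrigger(s.prompt);
    return { id: s.id, kind: s.kind, triggered, pass: triggered === s.expectTrigger };
  });
}
```

In practice the predicate is the tester's judgment of routing behavior, but keeping results in this row shape makes the pass/fail table mechanical to emit.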
</workflow>
<test_requirements>
Every meaningful skill test should include at least:
- 3 positive trigger scenarios.
- 2 negative trigger scenarios.
- 2 boundary or ambiguous scenarios.
- 2 edge-case scenarios, such as missing inputs, malformed paths, unsupported language, conflicting instructions, or absent support files.
- 1 regression scenario for a known or likely failure.
For localized skills, include at least one scenario in each supported language when trigger behavior depends on language. In this repository, include at least one Korean positive or boundary request when testing skills that ship SKILL.ko.md.
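The minimum counts above can be checked mechanically before any simulation starts; a sketch, where the `kind` labels are assumptions chosen to mirror this section:

```javascript
// Minimum scenario counts per kind, mirroring the requirements above.
const MINIMUMS = {
  positive: 3,
  negative: 2,
  boundary: 2,
  edge: 2,
  regression: 1,
};

// Return a human-readable list of coverage gaps, empty when the matrix passes.
export function coverageGaps(scenarios) {
  const counts = {};
  for (const s of scenarios) counts[s.kind] = (counts[s.kind] ?? 0) + 1;
  return Object.entries(MINIMUMS)
    .filter(([kind, min]) => (counts[kind] ?? 0) < min)
    .map(([kind, min]) => `${kind}: have ${counts[kind] ?? 0}, need ${min}`);
}
```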
</test_requirements>
<failure_taxonomy>
Classify each issue as one of:
- `trigger-miss`: target request may not activate the skill.
- `over-trigger`: unrelated request may activate the skill.
- `routing-conflict`: neighboring skill or workflow owns the request better.
- `workflow-gap`: instructions do not tell the agent what to do next.
- `resource-drift`: linked files are missing, stale, duplicated, or misplaced.
- `validation-gap`: completion can be claimed without evidence.
- `edge-case-gap`: missing handling for realistic boundary conditions.
- `safety-risk`: instructions allow risky or irreversible behavior without checks.
</failure_taxonomy>
<output_contract>
Default report format:
```markdown
## Skill Test Report
**Target**: `skills/example/`
**Intended behavior**: ...
**Verdict**: pass | pass-with-risks | fail
### Scenario results
| # | Scenario | Expected behavior | Observed | Verdict | Notes |
|----|------|--------------------|----------|----------|--------|
### Findings
1. [severity] [taxonomy] Evidence-backed issue and affected file/section.
### Edge cases covered
- ...
### Recommended fixes
- Minimal next edit or handoff target.
### Validation evidence
- Commands run, files read, and checks completed.
```
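Assembling the report skeleton can be kept mechanical; a minimal sketch, with field names chosen as illustrative assumptions rather than a fixed schema:

```javascript
// Render the fixed top of the report; the remaining sections are
// filled in by the tester from the scenario table and evidence log.
export function renderReport({ target, intended, verdict, findings }) {
  return [
    "## Skill Test Report",
    `**Target**: \`${target}\``,
    `**Intended behavior**: ${intended}`,
    `**Verdict**: ${verdict}`,
    "### Findings",
    ...findings.map((f, i) => `${i + 1}. [${f.severity}] [${f.taxonomy}] ${f.issue}`),
  ].join("\n");
}
```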
If the user asks for reusable tests, also create a prompt pack or checklist under the target skill's folder or in a task-specific workspace.
</output_contract>
<validation_checklist>
Before declaring a skill tested, confirm:
- The scenario matrix meets the minimum counts in the test requirements above.
- Static checks ran (scripts/validate-skill.mjs when a skill folder is available) and their findings were recorded.
- Every failure is classified with the failure taxonomy and backed by evidence.
- The report follows the output contract, including validation evidence.
</validation_checklist>