Loading...
Loading...
Install vigiles and test a Claude Code harness — hooks, skills, settings, CLAUDE.md — by picking the right tier (unit / deterministic / eval) and writing a test that passes. Use when the user wants to check that a hook fires or blocks, that a skill triggers, that injected context lands, or that a harness change moves what the agent does.
npx skill4agent add zernie/vigiles test-harness| What you're testing | Tier | Cost | API |
|---|---|---|---|
| "Does this hook block/allow event X?" — pure hook logic, every event type (incl. Edit/Write, PreCompact, SessionEnd, SubagentStop) | Unit | free, milliseconds, no | |
| "Is the hook actually wired into the assembled plugin and does it fire in a real session?" | Deterministic | free, no API key (real | |
"Did the injected context (a SessionStart hook, a | Deterministic | free, no API key | |
| "Does this skill's description trigger when it should (recall) and stay quiet when it shouldn't (precision)?" | Eval | paid (real model) | |
| "Does this harness change move what the agent does?" (A/B, signal vs noise) | Eval | paid (real model) | |
vigilespackage.jsonnpm i -D vigiles # or: pnpm add -D vigiles / yarn add -D vigilesclaudenpm i -g @anthropic-ai/claude-codeclaude.claude/settings.json.claude/settings.local.jsonhooks.claude-plugin/plugin.jsonhooksskillsagentsmcpServershooks/hooks.jsonskills/<name>/SKILL.mdagents/<name>.mdcommands/<name>.mdPreToolUseSessionStartrunHookimport { runHook, assertHookBlocked } from "vigiles/testing";
const r = runHook(hookCommand, {
hook_event_name: "PreToolUse",
tool_name: "Bash",
tool_input: { command: "git commit --no-verify" },
});
assertHookBlocked(r); // exit 2 / decision:"block" / permissionDecision:"deny"{ trusted: false }{ recordEgress: true }r.egressassertNoEgress(r)assertEgressOnly(r, [...]){ egress: { allow: ["registry.npmjs.org"] } }nftr.egressr.egressDroppeddocs/sandboxing.mdrunHarnessTestimport {
runHarnessTest,
scriptModel,
assertHookFired,
assertRequestContains,
} from "vigiles/testing";
const r = await runHarnessTest({
pluginDir: "./", // or { settings: { hooks: {...} } }
transcript: true,
model: scriptModel([{ text: "ok" }]),
});
assertHookFired(r, "SessionStart");
assertRequestContains(r, "expected injected text"); // did it actually land?runEvalimport { runEval, assertSignificant } from "vigiles/testing";
const report = await runEval({
arms: { off: {}, on: { pluginDir: "./" } },
task: "…a task the harness change should affect…",
measure: (ctx) => ({ ok: /* a bare predicate over the trace */ true }),
trials: 6,
cache: "readwrite",
});
assertSignificant(report, { baseline: "off", arm: "on", metric: "ok" });npx vigiles test # *.harness.{mjs,ts} — unit + deterministic, no API key
npx vigiles eval --trials=6 # *.eval.{mjs,ts} — real model (local / nightly, not CI)runHookclaudeclaude⊘ SKIPPED✓skip(reason)vigiles/testingvigiles test --no-skipPreToolUseSessionStartassertRequestContainspluginDirmeasureTriggerRatedocs/harness-testing.md