Loading...
Loading...
Run isolated eval and grading calls using CC 2.1.81 --bare mode. Constructs claude -p --bare invocations for skill evaluation, trigger testing, and LLM grading without plugin/hook interference. Use when running eval pipelines, grading skill outputs, benchmarking prompt quality, or testing trigger accuracy in isolation.
npx skill4agent add yonatangross/orchestkit bare-evalclaude -p --bare--bare-p--plugin-dir# --bare requires ANTHROPIC_API_KEY (OAuth/keychain disabled)
export ANTHROPIC_API_KEY="sk-ant-..."
# Verify CC version
claude --version # Must be >= 2.1.81| Call Type | Command Pattern |
|---|---|
| Grading | |
| Trigger | |
| Optimize | |
| Force-skill | |
Read("${CLAUDE_SKILL_DIR}/references/invocation-patterns.md")Read("${CLAUDE_SKILL_DIR}/references/grading-schemas.md")npm run eval:skill# eval-common.sh detects ANTHROPIC_API_KEY → sets BARE_MODE=true
# Scripts add --bare to all non-plugin calls automaticallyrun_with_skill| Scenario | Without --bare | With --bare | Savings |
|---|---|---|---|
| Single grading call | ~3-5s startup | ~0.5-1s | 2-4x |
| Trigger (per prompt) | ~3-5s | ~0.5-1s | 2-4x |
| Full eval (50 calls) | ~150-250s overhead | ~25-50s | 3-5x |
Read("${CLAUDE_SKILL_DIR}/rules/_sections.md")Read("${CLAUDE_SKILL_DIR}/references/troubleshooting.md")eval:skilleval:triggereval:qualityoptimize-description.shdoctor/references/version-compatibility.md