Loading...
Loading...
Found 18 Skills
Use when testing, reviewing, pressure-testing, refining, packaging, or validating agent skills for academic research workflows before installing or relying on them.
Evaluate Agent Skill design quality against official specifications and best practices. Use when reviewing, auditing, or improving SKILL.md files and skill packages. Provides multi-dimensional scoring and actionable improvement suggestions.
Create and improve agent skills following the Agent Skills specification. Use when asked to create, write, or update skills.
After an agentic task completes, perform a retrospective analysis across 6 dimensions (goal alignment, efficiency, decision quality, error handling, communication, reusability). Score performance, identify inefficiency patterns, evaluate skill usage, and produce actionable improvement recommendations. Triggers on "how did it go", "retrospective", "review performance", "what could be better", or after any long agentic task completes.
Audit existing skills with Tessl scoring, metadata and trigger-coverage checks, repo conventions, and skill-authoring best practices. Use when creating or revising a skill, triaging weak self-activation, or comparing a skill against source-repo guidance such as `AGENTS.md`, `CLAUDE.md`, or repo rules, plus external skill guidance. Do not use to verify general application code or to rewrite unrelated docs.
Evaluates agent skills against Anthropic's best practices. Use when asked to review, evaluate, assess, or audit a skill for quality. Analyzes SKILL.md structure, naming conventions, description quality, content organization, and identifies anti-patterns. Produces actionable improvement recommendations.
Evaluate and improve skills through measured testing. Run trigger evaluations to test whether skill descriptions cause correct activation, optimize descriptions via automated train/test loops, benchmark skill output quality with A/B comparisons, and validate skill structure. Use when user says "improve skill", "test skill triggers", "optimize description", "benchmark skill", "eval skill", or "skill quality". Do NOT use for creating new skills (use skill-creator-engineer).
Designs and writes high-quality Agent Skills (SKILL.md + optional reference files/scripts). Use when asked to create a new Skill, rewrite an existing Skill, improve Skill structure/metadata, or generate templates/evaluations for Skills.
Run isolated eval and grading calls using CC 2.1.81 --bare mode. Constructs claude -p --bare invocations for skill evaluation, trigger testing, and LLM grading without plugin/hook interference. Use when running eval pipelines, grading skill outputs, benchmarking prompt quality, or testing trigger accuracy in isolation.
Comprehensive security and safety evaluation system for agent skills (.skill files). Use when users provide GitHub URLs, website links, or .skill files for download and request security assessment, safety evaluation, or ask "is this skill safe to use." Evaluates prompt injection risks, malicious code patterns, hidden instructions, data exfiltration attempts, and provides actionable recommendations with risk scoring.
This skill should be used when the user wants to run baseline evaluations on existing agent skills, regenerate transcripts after a model upgrade, or check whether a skill still solves the gap it was authored for. Common triggers include "rerun the baselines", "re-eval skill X", "test all the skills", "check for skill drift", and "run the evals". Bakes in verbatim transcript capture (no paraphrasing), deterministic-only grading (regex / contains / file_exists — no LLM-as-judge), and the iteration-N workspace convention. Skip when authoring a new skill (use skill-creator) or modifying skill content directly.
Define the design rules (Skill Laws) that all Skills must follow, including core principles such as AI-first, human-centric, and ready-to-use. When to use: When users create a new Skill, optimize an existing Skill, ask about Skill design specifications, or need to evaluate Skill quality.