Search Results: skill-evaluation

Found 18 Skills

AI & Machine Learning | vincenzoimp/academic-rese...

skill-evaluation

Use when testing, reviewing, pressure-testing, refining, packaging, or validating agent skills for academic research workflows before installing or relying on them.

AI & Machine Learning | softaworks/agent-toolkit

skill-judge

Evaluate Agent Skill design quality against official specifications and best practices. Use when reviewing, auditing, or improving SKILL.md files and skill packages. Provides multi-dimensional scoring and actionable improvement suggestions.

3.4k

AI & Machine Learning | sickn33/antigravity-aweso...

skill-writer

Create and improve agent skills following the Agent Skills specification. Use when asked to create, write, or update skills.

AI & Machine Learning | fatih-developer/fth-skill...

agent-reviewer

After an agentic task completes, perform a retrospective analysis across 6 dimensions (goal alignment, efficiency, decision quality, error handling, communication, reusability). Score performance, identify inefficiency patterns, evaluate skill usage, and produce actionable improvement recommendations. Triggers on "how did it go", "retrospective", "review performance", "what could be better", or after any long agentic task completes.

AI & Machine Learning | uinaf/skills

skill-audit

Audit existing skills with Tessl scoring, metadata and trigger-coverage checks, repo conventions, and skill-authoring best practices. Use when creating or revising a skill, triaging weak self-activation, or comparing a skill against source-repo guidance such as `AGENTS.md`, `CLAUDE.md`, or repo rules, plus external skill guidance. Do not use to verify general application code or to rewrite unrelated docs.

AI & Machine Learning | vuralserhat86/antigravity...

skill_evaluator

Evaluates agent skills against Anthropic's best practices. Use when asked to review, evaluate, assess, or audit a skill for quality. Analyzes SKILL.md structure, naming conventions, description quality, content organization, and identifies anti-patterns. Produces actionable improvement recommendations.

AI & Machine Learning | notque/claude-code-toolki...

skill-eval

Evaluate and improve skills through measured testing. Run trigger evaluations to test whether skill descriptions cause correct activation, optimize descriptions via automated train/test loops, benchmark skill output quality with A/B comparisons, and validate skill structure. Use when user says "improve skill", "test skill triggers", "optimize description", "benchmark skill", "eval skill", or "skill quality". Do NOT use for creating new skills (use skill-creator-engineer).

AI & Machine Learning | walletconnect/skills

skill-writing

Designs and writes high-quality Agent Skills (SKILL.md + optional reference files/scripts). Use when asked to create a new Skill, rewrite an existing Skill, improve Skill structure/metadata, or generate templates/evaluations for Skills.

AI & Machine Learning | yonatangross/orchestkit

bare-eval

Run isolated eval and grading calls using CC 2.1.81 `--bare` mode. Constructs `claude -p --bare` invocations for skill evaluation, trigger testing, and LLM grading without plugin/hook interference. Use when running eval pipelines, grading skill outputs, benchmarking prompt quality, or testing trigger accuracy in isolation.

Security & Compliance | jeredblu/eval-marketplace

agent-skill-evaluator

Comprehensive security and safety evaluation system for agent skills (.skill files). Use when users provide GitHub URLs, website links, or .skill files for download and request security assessment, safety evaluation, or ask "is this skill safe to use." Evaluates prompt injection risks, malicious code patterns, hidden instructions, data exfiltration attempts, and provides actionable recommendations with risk scoring.

AI & Machine Learning | zrosenbauer/skills

skill-eval

This skill should be used when the user wants to run baseline evaluations on existing agent skills, regenerate transcripts after a model upgrade, or check whether a skill still solves the gap it was authored for. Common triggers include "rerun the baselines", "re-eval skill X", "test all the skills", "check for skill drift", and "run the evals". Bakes in verbatim transcript capture (no paraphrasing), deterministic-only grading (regex / contains / file_exists — no LLM-as-judge), and the iteration-N workspace convention. Skip when authoring a new skill (use skill-creator) or modifying skill content directly.

AI & Machine Learning | steelan9199/wechat-publis...

skill-laws

Define the design rules (Skill Laws) that all Skills must follow, including core principles such as AI-first, human-centric, and ready-to-use. When to use: When users create a new Skill, optimize an existing Skill, ask about Skill design specifications, or need to evaluate Skill quality.

🇨🇳 Chinese (translated)