Found 1,928 Skills
Measure and improve how well your AI works. Use when AI gives wrong answers, accuracy is bad, responses are unreliable, you need to test AI quality, evaluate your AI, write metrics, benchmark performance, optimize prompts, improve results, or systematically make your AI better. Covers DSPy evaluation, metrics, and optimization.
Score, grade, or evaluate things using AI against a rubric. Use when grading essays, scoring code reviews, rating candidate responses, auditing support quality, evaluating compliance, building a quality rubric, running QA checks against criteria, assessing performance, rating content quality, or any task where you need numeric scores with justifications — not just categories.
Machine learning development patterns, model training, evaluation, and deployment. Use when building ML pipelines, training models, feature engineering, model evaluation, or deploying ML systems to production.
Triage GitHub bug reports for actionability. Use when evaluating whether a bug issue has sufficient detail and identifying missing information from the reporter.
Plan and (when feasible) implement or execute user acceptance tests (UAT) / end-to-end acceptance scenarios. Converts requirements or user stories into acceptance criteria, test cases, test data, and a sign-off checklist; suggests automation (Playwright/Cypress for web, golden/snapshot tests for CLIs/APIs). Use when validating user-visible behavior for a release, or mapping requirements to acceptance coverage.
Core philosophy for designing Claude Code skills: when to use skills vs agents, the knowledge test, and what makes skills valuable. Use when deciding component type or evaluating skill quality.
Use this skill to work with Microsoft Foundry (Azure AI Foundry): deploy AI models from catalog, build RAG applications with knowledge indexes, create and evaluate AI agents. USE FOR: Microsoft Foundry, AI Foundry, deploy model, model catalog, RAG, knowledge index, create agent, evaluate agent, agent monitoring. DO NOT USE FOR: Azure Functions (use azure-functions), App Service (use azure-create-app).
Deep-dive analysis of GitHub projects. Use when the user mentions a GitHub repo/project name and wants to understand it — triggered by phrases like "help me look at this project", "learn about XXX", "how is this project", "analyze the repo", or any request to explore/evaluate a GitHub project. Covers architecture, community health, competitive landscape, and cross-platform knowledge sources.
Evaluate every produced output (code, report, plan, data, API response) against type-specific quality criteria, score it from 1 to 10, make an accept/reject decision, and provide actionable improvement suggestions. Triggers on "evaluate", "check", "review", "quality control", "is this good enough", "score it", or before passing output to the next step in an agentic workflow.
Build automated evaluation suites for AI agents using golden datasets, rubrics, and regression gates.
Evaluate and improve the user experience of interfaces (CLI, web, mobile).
Generate and evaluate marketing slogans for any product or service. Creates options across multiple angles, scores against criteria, and recommends the best fit.