Search Results: uat

Found 1,928 Skills

massgen-develops-massgen

Guide for using MassGen to develop and improve itself. This skill should be used when agents need to run MassGen experiments programmatically (using automation mode) OR analyze terminal UI/UX quality (using visual evaluation tools). These are mutually exclusive workflows for different improvement goals.

AI & Machine Learning | whitespectre/ai-assistant...

eval-clarity

Score assistant responses for clarity on a strict 1-5 scale, then return strict JSON only with score, rationale, and improvement suggestions. Use when the user asks to evaluate clarity, grade clarity, or critique clarity quality.

AI & Machine Learning | lingzhi227/claude-skills

novelty-assessment

Assess research idea novelty through systematic literature search. Multi-round search-evaluate loops with harsh critic persona. Binary novel/not-novel decision with justification. Use before committing to a research direction.

AI & Machine Learning | hamelsmu/evals-skills

eval-audit

Audit an LLM eval pipeline and surface problems: missing error analysis, unvalidated judges, vanity metrics, etc. Use when inheriting an eval system, when unsure whether evals are trustworthy, or as a starting point when no eval infrastructure exists. Do NOT use when the goal is to build a new evaluator from scratch (use error-analysis, write-judge-prompt, or validate-evaluator instead).

AI & Machine Learning | jpoehnelt/skills

agent-dx-cli-scale

A scoring scale for evaluating how well a CLI is designed for AI agents, based on the "Rewrite Your CLI for AI Agents" principles.

Security & Compliance | carrilloapps/skills

sar-cybersecurity

Use this skill whenever the user asks for a security analysis, vulnerability assessment, security audit, or any form of Security Assessment Report (SAR) over a codebase, infrastructure, API, database, or system. Triggers include: "audit my code", "find security issues", "run a security check", "generate a SAR", "check for vulnerabilities", "is this code secure", or any request that involves evaluating the security posture of a project. Also triggers when the user uploads or references source code, config files, environment variables, or architecture diagrams and asks for a security opinion. Do NOT use for generic coding tasks, code reviews focused on quality rather than security, or performance optimization unless a security angle is explicitly present.

Automation | alirezarezvani/claude-ski...

setup

Set up a new autoresearch experiment interactively. Collects domain, target file, eval command, metric, direction, and evaluator.

AI & Machine Learning | yonatangross/orchestkit

bare-eval

Run isolated eval and grading calls using CC 2.1.81 --bare mode. Constructs claude -p --bare invocations for skill evaluation, trigger testing, and LLM grading without plugin/hook interference. Use when running eval pipelines, grading skill outputs, benchmarking prompt quality, or testing trigger accuracy in isolation.

Version Control | nathan13888/nice-skills

wtf

Quick situational awareness for the current git branch. Summarizes what a feature branch is about by analyzing commits and changes against trunk. On trunk, highlights recent interesting activity. Use when user says "wtf", "what's going on", "what is this branch", "what changed", or "catch me up".

AI & Machine Learning | akillness/oh-my-skills

langsmith

Instrument, trace, evaluate, and monitor LLM applications and AI agents with LangSmith. Use when setting up observability for LLM pipelines, running offline or online evaluations, managing prompts in the Prompt Hub, creating datasets for regression testing, or deploying agent servers. Triggers on: langsmith, langchain tracing, llm tracing, llm observability, llm evaluation, trace llm calls, @traceable, wrap_openai, langsmith evaluate, langsmith dataset, langsmith feedback, langsmith prompt hub, langsmith project, llm monitoring, llm debugging, llm quality, openevals, langsmith cli, langsmith experiment, annotate llm, llm judge.

2 scripts | Attention

Project Management | shubhamsaboo/awesome-llm-...

decision-helper

Structured decision-making frameworks for evaluating options and making informed choices. Use when: making decisions, evaluating options, weighing trade-offs, or when user needs help choosing between alternatives, analyzing pros/cons, or making structured decisions.

Data Processing | octagonai/skills

industry-pe-ratios

Retrieve industry-specific P/E ratios using Octagon MCP. Use when comparing company valuations to specific industry peers, analyzing sub-sector valuations, and understanding niche market valuations beyond broad sector averages.
