Testing and benchmarking LLM agents, including behavioral testing, capability assessment, reliability metrics, and production monitoring; even top agents achieve less than 50% on real-world benchmarks. Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.
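As a concrete illustration of the reliability-metric idea above, here is a minimal Python sketch that estimates an agent's pass rate over repeated runs; `run_agent` is a hypothetical stand-in for a real agent call, stubbed with a canned answer so the example executes.

```python
import statistics

def run_agent(task: str) -> str:
    """Stub agent returning a canned answer so the sketch runs end to end.
    Replace with a call to your real agent."""
    return "The answer is 391."

def pass_rate(task: str, check, n: int = 10) -> float:
    """Reliability metric: the fraction of n independent runs that pass
    `check`. Agents are stochastic, so a single passing run says little."""
    return statistics.mean(check(run_agent(task)) for _ in range(n))

print(f"pass rate: {pass_rate('What is 17 * 23?', lambda out: '391' in out):.0%}")
```

Repeating the same task and reporting a rate, rather than a single pass/fail, is what separates a reliability metric from a one-shot capability check.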
Evaluates LLMs across 60+ academic benchmarks (MMLU, HumanEval, GSM8K, TruthfulQA, HellaSwag). Use when benchmarking model quality, comparing models, reporting academic results, or tracking training progress. Industry standard used by EleutherAI, HuggingFace, and major labs. Supports HuggingFace, vLLM, and API backends.
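A minimal sketch of driving the harness from its Python API (lm-eval 0.4+, installed via `pip install lm-eval`); the checkpoint, task, and batch size here are illustrative choices, not recommendations.

```python
import lm_eval

# Zero-shot HellaSwag on a small HuggingFace checkpoint.
results = lm_eval.simple_evaluate(
    model="hf",                                      # HuggingFace backend
    model_args="pretrained=EleutherAI/pythia-160m",  # any HF model id
    tasks=["hellaswag"],
    num_fewshot=0,
    batch_size=8,
)
print(results["results"]["hellaswag"])               # accuracy and related metrics
```

The equivalent CLI invocation is `lm_eval --model hf --model_args pretrained=EleutherAI/pythia-160m --tasks hellaswag --batch_size 8`.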
Master LLM-as-a-Judge evaluation techniques including direct scoring, pairwise comparison, rubric generation, and bias mitigation. Use when building evaluation systems, comparing model outputs, or establishing quality standards for AI-generated content.
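One of those techniques sketched in Python: pairwise comparison with a simple position-bias mitigation, judging each pair twice with the answer order swapped and keeping only verdicts that agree. `call_judge` is a hypothetical placeholder for a real judge-model call.

```python
JUDGE_PROMPT = """Compare the two answers to the question below.
Reply with exactly "A", "B", or "TIE".

Question: {question}
Answer A: {a}
Answer B: {b}"""

def call_judge(prompt: str) -> str:
    """Placeholder: send `prompt` to a judge model and return 'A', 'B', or 'TIE'."""
    return "TIE"

def pairwise_judge(question: str, ans1: str, ans2: str) -> str:
    first = call_judge(JUDGE_PROMPT.format(question=question, a=ans1, b=ans2))
    swapped = call_judge(JUDGE_PROMPT.format(question=question, a=ans2, b=ans1))
    swapped = {"A": "B", "B": "A", "TIE": "TIE"}[swapped]  # map back to original order
    return first if first == swapped else "TIE"            # disagreement is inconclusive

print(pairwise_judge("What is 2 + 2?", "4", "Four."))      # TIE with the stub judge
```

Judges tend to favor the first answer shown, so order-swapping and discarding inconsistent verdicts is a cheap, common mitigation.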
Use this skill when users need to evaluate potential co-founders, assess founder compatibility, design equity splits, or navigate co-founder relationships. Activates for "should I work with this person," "co-founder fit," "equity split," or founding team questions.
Configure Spring Boot Actuator for production-grade monitoring, health probes, secured management endpoints, and Micrometer metrics across JVM services.
Rigorously and meticulously judge and score story texts, analyzing quality along the dimensions of market potential, originality, and content highlights. Suitable for first-pass novel screening and multi-dimensional evaluation and scoring.
Use when "evaluating technology", "choosing frameworks", "stack comparison", "technology decisions", or asking about "React vs Vue", "PostgreSQL vs MySQL", "AWS vs GCP", "build vs buy"
Build Discounted Cash Flow (DCF) valuation models. Calculate intrinsic value with customizable assumptions. Generate professional valuation reports.
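A minimal sketch of the underlying formula, assuming a fixed discount rate and a Gordon-growth terminal value; all cash flows and rates below are illustrative.

```python
def dcf_value(cash_flows, discount_rate, terminal_growth):
    """Present value of explicit free cash flows plus a terminal value."""
    pv_explicit = sum(
        cf / (1 + discount_rate) ** t
        for t, cf in enumerate(cash_flows, start=1)
    )
    # Gordon growth: final-year cash flow as a perpetuity growing at g.
    terminal = cash_flows[-1] * (1 + terminal_growth) / (discount_rate - terminal_growth)
    pv_terminal = terminal / (1 + discount_rate) ** len(cash_flows)
    return pv_explicit + pv_terminal

# Five years of projected FCF ($M), 10% WACC, 2.5% perpetual growth.
print(round(dcf_value([100, 110, 121, 133, 146], 0.10, 0.025), 1))  # ~1693.2
```

Note that the terminal value dominates here (roughly 73% of the total), which is why the growth and discount-rate assumptions deserve the most scrutiny in any DCF.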
Help users make better hiring decisions. Use when someone is evaluating job candidates, making hiring decisions, conducting reference checks, reviewing work samples or take-homes, calibrating their hiring bar, or deciding between finalists.
Help users make better decisions between competing options. Use when someone is weighing pros and cons, comparing alternatives, struggling with a difficult choice, deciding between speed and quality, or asking "should we do X or Y?"
LLM prompt testing, evaluation, and CI/CD quality gates using Promptfoo. Invoke when:
- Setting up prompt evaluation or regression testing
- Integrating LLM testing into CI/CD pipelines
- Configuring security testing (red teaming, jailbreaks)
- Comparing prompt or model performance
- Building evaluation suites for RAG, factuality, or safety

Keywords: promptfoo, llm evaluation, prompt testing, red team, CI/CD, regression testing
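For illustration, a minimal configuration written and run from Python (promptfoo is a Node CLI, invoked here via `npx`, so Node.js must be installed); the provider id, prompt, and assertion are assumed examples rather than a recommended setup.

```python
import pathlib
import subprocess

# Minimal promptfoo config: one prompt, one provider, one assertion.
CONFIG = """\
prompts:
  - "Answer concisely: {{question}}"
providers:
  - openai:gpt-4o-mini
tests:
  - vars:
      question: "What is the capital of France?"
    assert:
      - type: contains
        value: "Paris"
"""

pathlib.Path("promptfooconfig.yaml").write_text(CONFIG)
# Runs the evaluation; assumes an OPENAI_API_KEY in the environment.
subprocess.run(["npx", "promptfoo@latest", "eval", "-c", "promptfooconfig.yaml"], check=True)
```

The same `eval` command is what a CI job would run, which is how failed assertions can act as the quality gate the entry describes.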
Calculate the deviation of asset prices relative to the long-term exponential growth trend line, assess whether the current period falls within a historical extreme range, and optionally perform macro factor analysis to evaluate the market regime.
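One way to read that description as code, sketched on synthetic data: fit a log-linear trend by least squares, then rank the latest relative deviation against the full history. Thresholds, data sourcing, and the optional macro-factor step are left out.

```python
import numpy as np

t = np.arange(3000)                                        # trading days
price = 100 * np.exp(0.0005 * t + 0.1 * np.sin(t / 200))   # synthetic price series

slope, intercept = np.polyfit(t, np.log(price), 1)         # log-linear trend fit
trend = np.exp(intercept + slope * t)                      # exponential trend line
deviation = price / trend - 1                              # relative deviation

pct = (deviation < deviation[-1]).mean() * 100             # historical percentile
print(f"current deviation {deviation[-1]:+.1%} ({pct:.0f}th percentile)")
```

A reading near the 0th or 100th percentile is what "historical extreme range" means in this context.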