Comprehensive security and privacy evaluation system for MCP (Model Context Protocol) servers. Use when users provide GitHub URLs to MCP servers and request security assessment, privacy evaluation, or ask "is this MCP safe to use." Evaluates security vulnerabilities, privacy risks, code quality, community feedback, and provides actionable recommendations with risk scoring.
Comprehensive security and safety evaluation system for agent skills (.skill files). Use when users provide GitHub URLs, website links, or .skill files for download and request security assessment, safety evaluation, or ask "is this skill safe to use." Evaluates prompt injection risks, malicious code patterns, hidden instructions, data exfiltration attempts, and provides actionable recommendations with risk scoring.
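Both of these evaluators report a risk score. As a rough illustration of how such a score can be assembled (the categories, weights, and 0-10 scale below are assumptions, not either skill's actual rubric):

```python
# Hypothetical weighted risk score: each category is rated 0-10
# (10 = worst) and combined into a 0-100 overall risk figure.
CATEGORY_WEIGHTS = {
    "code_vulnerabilities": 0.35,
    "privacy_risks": 0.25,
    "hidden_instructions": 0.25,
    "community_signals": 0.15,
}

def risk_score(findings: dict[str, float]) -> float:
    """Combine per-category ratings (0-10) into a 0-100 risk score."""
    return 10 * sum(CATEGORY_WEIGHTS[c] * findings[c] for c in CATEGORY_WEIGHTS)

score = risk_score({
    "code_vulnerabilities": 2,
    "privacy_risks": 6,
    "hidden_instructions": 0,
    "community_signals": 3,
})
print(f"overall risk: {score:.0f}/100")  # higher = riskier
```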
Conduct expert heuristic evaluations using Nielsen's heuristics and domain-specific criteria.
Comprehensive evaluation of potential stock investments combining valuation analysis, fundamental research, technical assessment, and clear buy/hold/sell recommendations. Use when the user asks about buying a stock, evaluating investment opportunities, analyzing watchlist candidates, or requests stock recommendations. Provides specific entry prices, position sizing, and conviction ratings.
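One common way to turn an entry price into a position size is fixed-fractional risk sizing; the sketch below shows the arithmetic (the 1% risk budget and the prices are placeholder assumptions, not this skill's actual model):

```python
def position_size(account: float, risk_fraction: float,
                  entry: float, stop: float) -> int:
    """Shares to buy so a stop-out loses at most risk_fraction of the account."""
    risk_per_share = entry - stop
    if risk_per_share <= 0:
        raise ValueError("stop must be below entry for a long position")
    return int((account * risk_fraction) / risk_per_share)

# Risk 1% of a $50,000 account on a $100 entry with a $92 stop.
print(position_size(50_000, 0.01, entry=100.0, stop=92.0))  # 62 shares
```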
Comprehensive technology stack evaluation and comparison tool with TCO analysis, security assessment, and intelligent recommendations for engineering teams.
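For context, a TCO comparison of this kind reduces to arithmetic over upfront and recurring costs; a back-of-the-envelope sketch (all dollar figures and the 36-month horizon are made-up placeholders):

```python
def tco(upfront: float, monthly: float, eng_hours_per_month: float,
        hourly_rate: float = 150.0, months: int = 36) -> float:
    """Three-year total cost of ownership: license/hardware plus ops labor."""
    return upfront + months * (monthly + eng_hours_per_month * hourly_rate)

print(f"managed service: ${tco(0, 2_000, 10):,.0f}")      # $126,000
print(f"self-hosted:     ${tco(25_000, 400, 60):,.0f}")   # $363,400
```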
INVOKE THIS SKILL for LLM-as-judge evaluation workflows on Arize: creating/updating evaluators, running evaluations on spans or experiments, tasks, trigger-run, column mapping, and continuous monitoring. Use when the user says: create an evaluator, LLM judge, hallucination/faithfulness/correctness/relevance, run eval, score my spans or experiment, ax tasks, trigger-run, trigger eval, column mapping, continuous monitoring, query filter for evals, evaluator version, or improve an evaluator prompt.
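As a generic illustration of what column mapping means here, independent of Arize's actual API: it binds the variables in an evaluator's prompt template to the columns or span attributes they should be filled from (the attribute names below follow OpenInference-style conventions but are assumptions):

```python
# Hypothetical column mapping: template variable -> span attribute.
column_mapping = {
    "input": "attributes.input.value",
    "output": "attributes.output.value",
}

template = "Is the answer faithful to the input?\nInput: {input}\nAnswer: {output}"
span = {"attributes.input.value": "2+2?", "attributes.output.value": "4"}

# Resolve each template variable through the mapping, then render.
prompt = template.format(**{var: span[col] for var, col in column_mapping.items()})
print(prompt)
```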
Validates dataset formatting and quality for SageMaker model fine-tuning (SFT, DPO, or RLVR). Use when the user says "is my dataset okay", "evaluate my data", "check my training data", "I have my own data", or before starting any fine-tuning job. Detects file format, checks schema compliance against the selected model and technique, and reports whether the data is ready for training or evaluation.
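To make the schema-compliance step concrete, here is a minimal sketch of that kind of check (the required fields per technique are illustrative assumptions; the skill derives the real schema from the selected model and technique):

```python
import json

# Assumed field requirements, for illustration only.
REQUIRED_FIELDS = {
    "sft": {"prompt", "completion"},
    "dpo": {"prompt", "chosen", "rejected"},
}

def validate_jsonl(path: str, technique: str) -> list[str]:
    """Return human-readable problems found in a JSONL training file."""
    required = REQUIRED_FIELDS[technique]
    problems = []
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            try:
                record = json.loads(line)
            except json.JSONDecodeError:
                problems.append(f"line {lineno}: not valid JSON")
                continue
            missing = required - record.keys()
            if missing:
                problems.append(f"line {lineno}: missing {sorted(missing)}")
    return problems

# Demo: one record that lacks the "completion" field.
with open("train.jsonl", "w", encoding="utf-8") as f:
    f.write('{"prompt": "hi"}\n')

issues = validate_jsonl("train.jsonl", technique="sft")
print("ready for training" if not issues else "\n".join(issues))
```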
Generates a Jupyter notebook that evaluates a fine-tuned SageMaker model using LLM-as-a-Judge. Use when the user says "evaluate my model", "how did my model perform", "compare models", or after a training job completes. Supports built-in and custom evaluation metrics, evaluation dataset setup, and judge model selection.
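The core of such a notebook is a judge-scoring loop; a minimal sketch (the prompt wording, the 1-5 scale, and the call_judge_model stub are assumptions, standing in for whichever judge model is selected):

```python
JUDGE_PROMPT = """Rate the response from 1 (poor) to 5 (excellent) on {metric}.
Question: {question}
Response: {response}
Reply with a single integer."""

def mean_judge_score(examples, metric, call_judge_model):
    """Average 1-5 judge ratings over an evaluation set."""
    scores = []
    for ex in examples:
        reply = call_judge_model(JUDGE_PROMPT.format(metric=metric, **ex))
        scores.append(int(reply.strip()))
    return sum(scores) / len(scores)

# Stub judge for demonstration; a real notebook would call a hosted model.
examples = [{"question": "What is 2 + 2?", "response": "4"}]
print(mean_judge_score(examples, "correctness", lambda prompt: "5"))
```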
Discover scientific equations from data using LLM-guided evolutionary search (LLM-SR). Multi-island algorithm with softmax-based cluster sampling, island reset, and LLM-proposed equation mutations. Use for symbolic regression and equation discovery.
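The softmax-based cluster sampling step works like this: within an island, clusters of candidate equations are drawn with probability proportional to exp(score / T), so better-scoring clusters are favored without starving the rest. A minimal sketch (the temperature and scores are placeholders):

```python
import math
import random

def sample_cluster(cluster_scores: list[float], temperature: float = 1.0) -> int:
    """Return a cluster index, sampled with softmax weight exp(score / T)."""
    m = max(cluster_scores)  # subtract the max for numerical stability
    weights = [math.exp((s - m) / temperature) for s in cluster_scores]
    r = random.random() * sum(weights)
    for i, w in enumerate(weights):
        r -= w
        if r <= 0:
            return i
    return len(weights) - 1

print(sample_cluster([-0.1, -2.5, -0.3]))  # usually picks cluster 0 or 2
```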
Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from Artificial Analysis API, and running custom model evaluations with vLLM/lighteval. Works with the model-index metadata format.
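For reference, a model-index entry pairs a task and dataset with a metric and value; a hedged sketch using huggingface_hub's EvalResult/ModelCardData helpers (assumed available in your huggingface_hub version; the model, dataset, and score are made-up placeholders):

```python
from huggingface_hub import EvalResult, ModelCardData

card_data = ModelCardData(
    model_name="my-org/my-model",  # placeholder repo id
    eval_results=[
        EvalResult(
            task_type="text-generation",
            dataset_type="hellaswag",
            dataset_name="HellaSwag",
            metric_type="accuracy",
            metric_value=0.62,  # placeholder score
        )
    ],
)
print(card_data.to_yaml())  # emits the model-index YAML block for the card
```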
Evaluate trade-offs and produce a Trade-off Evaluation Pack (trade-off brief, options+criteria matrix, all-in cost/opportunity cost table, impact ranges, recommendation, stop/continue triggers). Use for tradeoff/trade-off, pros and cons, cost-benefit, opportunity cost, build vs buy, ship fast vs ship better, continue vs stop (sunk costs). Category: Leadership.
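The options+criteria matrix at the core of the pack reduces to a small weighted-sum computation; a sketch with made-up options, weights, and 1-5 scores (the recommendation itself would also weigh impact ranges and triggers, not just this number):

```python
# Criteria weights sum to 1.0; scores are 1 (worst) to 5 (best).
criteria_weights = {"cost": 0.4, "speed": 0.3, "risk": 0.3}
scores = {
    "build": {"cost": 2, "speed": 2, "risk": 4},
    "buy":   {"cost": 4, "speed": 5, "risk": 3},
}

for option, row in scores.items():
    total = sum(criteria_weights[c] * row[c] for c in criteria_weights)
    print(f"{option}: weighted score {total:.2f}")  # build: 2.60, buy: 4.00
```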
Evaluates agent skills against Anthropic's best practices. Use when asked to review, evaluate, assess, or audit a skill for quality. Analyzes SKILL.md structure, naming conventions, description quality, content organization, and identifies anti-patterns. Produces actionable improvement recommendations.
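As a concrete illustration of the structural part of such a review, a minimal SKILL.md frontmatter check (the lowercase-hyphen naming rule and the length threshold below are assumptions, not Anthropic's exact limits):

```python
import re

def check_skill_md(text: str) -> list[str]:
    """Flag basic structural problems in a SKILL.md file."""
    m = re.match(r"^---\n(.*?)\n---\n", text, re.DOTALL)
    if not m:
        return ["missing YAML frontmatter"]
    # Naive key: value parse, sufficient for flat frontmatter.
    front = dict(
        line.split(":", 1) for line in m.group(1).splitlines() if ":" in line
    )
    problems = []
    name = front.get("name", "").strip()
    desc = front.get("description", "").strip()
    if not re.fullmatch(r"[a-z0-9-]+", name):
        problems.append("name should be lowercase-with-hyphens")
    if len(desc) < 20:
        problems.append("description too short to trigger reliably")
    return problems

print(check_skill_md("---\nname: My Skill\ndescription: x\n---\nBody"))
```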