Found 20 Skills
Automated reproduction of comprehensive model evaluation benchmarks following the Benchmark Suite V3. Auto-activates for model benchmarking, comparison evaluation, or performance testing between AI models.
Generates a Jupyter notebook that evaluates a fine-tuned SageMaker model using LLM-as-a-Judge. Use when the user says "evaluate my model", "how did my model perform", "compare models", or after a training job completes. Supports built-in and custom evaluation metrics, evaluation dataset setup, and judge model selection.
Machine learning in Python with scikit-learn. Use when working with supervised learning (classification, regression), unsupervised learning (clustering, dimensionality reduction), model evaluation, hyperparameter tuning, preprocessing, or building ML pipelines. Provides comprehensive reference documentation for algorithms, preprocessing techniques, pipelines, and best practices.
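For orientation, a minimal sketch of the pipeline-and-evaluation workflow this skill covers (the dataset and estimator are illustrative choices, not part of the skill itself):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

# Illustrative only: scale features, fit a classifier, score with 5-fold cross-validation.
X, y = load_breast_cancer(return_X_y=True)
pipe = make_pipeline(StandardScaler(), LogisticRegression(max_iter=1000))
scores = cross_val_score(pipe, X, y, cv=5, scoring="accuracy")
print(f"Mean CV accuracy: {scores.mean():.3f}")
```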
Comprehensive toolkit for survival analysis and time-to-event modeling in Python using scikit-survival. Use this skill when working with censored survival data, performing time-to-event analysis, fitting Cox models, Random Survival Forests, Gradient Boosting models, or Survival SVMs, evaluating survival predictions with concordance index or Brier score, handling competing risks, or implementing any survival analysis workflow with the scikit-survival library.
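As a rough illustration of the kind of workflow described above, a minimal scikit-survival sketch that fits a Cox model on one of the library's bundled datasets and scores it with the concordance index:

```python
from sksurv.datasets import load_veterans_lung_cancer
from sksurv.preprocessing import OneHotEncoder
from sksurv.linear_model import CoxPHSurvivalAnalysis

# Illustrative only: fit a Cox proportional hazards model on a bundled example dataset.
X, y = load_veterans_lung_cancer()          # y is a structured array of (event indicator, time)
X_num = OneHotEncoder().fit_transform(X)    # encode categorical covariates
cox = CoxPHSurvivalAnalysis().fit(X_num, y)
print(f"Concordance index: {cox.score(X_num, y):.3f}")  # .score() returns Harrell's c-index
```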
Expert-level data science covering machine learning, statistical modeling, experimentation, predictive analytics, and advanced analytics.
Best practices for scikit-learn machine learning, model development, evaluation, and deployment in Python.
Evaluates code generation models across HumanEval, MBPP, MultiPL-E, and 15+ benchmarks with pass@k metrics. Use when benchmarking code models, comparing coding abilities, testing multi-language support, or measuring code generation quality. Industry standard from BigCode Project used by HuggingFace leaderboards.
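The pass@k metric mentioned here has a standard unbiased estimator (introduced with HumanEval); a small self-contained sketch:

```python
from math import comb

def pass_at_k(n: int, c: int, k: int) -> float:
    """Unbiased pass@k estimator: n samples generated per problem, c of them correct.

    Returns the probability that at least one of k randomly drawn samples is correct.
    """
    if n - c < k:
        return 1.0  # every size-k draw must contain a correct sample
    return 1.0 - comb(n - c, k) / comb(n, k)

# Example: 200 samples per problem, 37 pass the unit tests.
print(f"pass@10 ≈ {pass_at_k(200, 37, 10):.3f}")
```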
Master fine-tuning of large language models for specific domains and tasks. Covers data preparation, training techniques, optimization strategies, and evaluation methods. Use when adapting models for specialized applications, reducing inference costs, or improving domain-specific performance.
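As one common example of the training techniques in scope, a parameter-efficient LoRA setup with the Hugging Face peft library might look roughly like this (the model name and hyperparameters are placeholders, not recommendations):

```python
from transformers import AutoModelForCausalLM
from peft import LoraConfig, get_peft_model

# Illustrative only: wrap a small causal LM with LoRA adapters for fine-tuning.
base = AutoModelForCausalLM.from_pretrained("facebook/opt-125m")
lora_cfg = LoraConfig(
    r=8,                                   # adapter rank
    lora_alpha=16,                         # scaling factor
    lora_dropout=0.05,
    target_modules=["q_proj", "v_proj"],   # attention projections to adapt
    task_type="CAUSAL_LM",
)
model = get_peft_model(base, lora_cfg)
model.print_trainable_parameters()  # only a small fraction of weights are trainable
```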
The industry standard library for machine learning in Python. Provides simple and efficient tools for predictive data analysis, covering classification, regression, clustering, dimensionality reduction, model selection, and preprocessing.
Range bar evaluation metrics for quant trading. TRIGGERS - range bar metrics, Sharpe ratio, WFO metrics, PSR, DSR, MinTRL.
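Of the metrics listed, the annualized Sharpe ratio is the simplest to show; a minimal sketch, assuming a series of per-period returns and an annualization factor supplied by the user:

```python
import numpy as np

def sharpe_ratio(returns: np.ndarray, periods_per_year: float, risk_free: float = 0.0) -> float:
    """Annualized Sharpe ratio of a return series (assumes i.i.d. per-period returns)."""
    excess = returns - risk_free / periods_per_year
    return float(np.sqrt(periods_per_year) * excess.mean() / excess.std(ddof=1))

# Example with 252 periods per year on synthetic returns.
rng = np.random.default_rng(0)
print(f"Sharpe ≈ {sharpe_ratio(rng.normal(0.0005, 0.01, 1000), 252):.2f}")
```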
Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from Artificial Analysis API, and running custom model evaluations with vLLM/lighteval. Works with the model-index metadata format.
Use when choosing or evaluating a startup revenue model, pricing/value metric, packaging/tier design, or calculating unit economics (LTV, CAC, payback, gross margin, NRR), including usage-based/credit/AI pricing and variable compute/COGS constraints.
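For the unit-economics calculations named above, the back-of-the-envelope formulas are standard; a small sketch with entirely hypothetical figures:

```python
def unit_economics(arpa_monthly: float, gross_margin: float,
                   monthly_churn: float, cac: float) -> dict:
    """Simple subscription unit economics: LTV, LTV/CAC, and CAC payback in months."""
    contribution = arpa_monthly * gross_margin   # gross profit per account per month
    ltv = contribution / monthly_churn           # expected lifetime gross profit per account
    return {
        "LTV": ltv,
        "LTV/CAC": ltv / cac,
        "CAC payback (months)": cac / contribution,
    }

# Example: $200/mo ARPA, 80% gross margin, 2% monthly churn, $2,000 CAC
# -> LTV $8,000, LTV/CAC 4.0, payback 12.5 months.
print(unit_economics(200, 0.80, 0.02, 2000))
```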