Search Results: benchmarking

Found 117 Skills

AI & Machine Learningvllm-project/vllm-skills

vllm-prefix-cache-bench

This is a skill for benchmarking the efficiency of automatic prefix caching in vLLM using fixed prompts, real-world datasets, or synthetic prefix/suffix patterns. Use when the user asks to benchmark prefix caching hit rate, caching efficiency, or repeated-prompt performance in vLLM.

🇺🇸|EnglishTranslated

AI & Machine Learningkiterlin/intelligent-dete...

nemo-evaluator-sdk

Evaluates LLMs across 100+ benchmarks from 18+ harnesses (MMLU, HumanEval, GSM8K, safety, VLM) with multi-backend execution. Use when needing scalable evaluation on local Docker, Slurm HPC, or cloud platforms. NVIDIA's enterprise-grade platform with container-first architecture for reproducible benchmarking.

🇺🇸|EnglishTranslated

AI & Machine Learningkiterlin/intelligent-dete...

evaluating-llms-harness

Evaluates LLMs across 60+ academic benchmarks (MMLU, HumanEval, GSM8K, TruthfulQA, HellaSwag). Use when benchmarking model quality, comparing models, reporting academic results, or tracking training progress. Industry standard used by EleutherAI, HuggingFace, and major labs. Supports HuggingFace, vLLM, APIs.

🇺🇸|EnglishTranslated

AI & Machine Learningrysweet/amplihack

eval-recipes-runner

Run Microsoft's eval-recipes benchmarks to validate amplihack improvements against baseline agents. Auto-activates when testing improvements, running evals, or benchmarking changes.

🇺🇸|EnglishTranslated

Code Qualityoutfitter-dev/agents

performance

This skill should be used when profiling code, optimizing bottlenecks, benchmarking, or when "performance", "profiling", "optimization", or "--perf" are mentioned.

🇺🇸|EnglishTranslated

AI & Machine Learningeyadsibai/ltk

nemo-evaluator

Use when evaluating LLMs, running benchmarks like MMLU/HumanEval/GSM8K, setting up evaluation pipelines, or asking about "NeMo Evaluator", "LLM benchmarking", "model evaluation", "MMLU", "HumanEval", "GSM8K", "benchmark harnesses"

🇺🇸|EnglishTranslated

Marketing & Growtharchive-dot-com/creator-m...

engagement-rate-calculator-benchmarker

Calculate engagement rates for creator posts and benchmark them against platform and tier averages. This skill should be used when calculating an influencer's engagement rate, benchmarking creator engagement against industry averages, evaluating whether a creator's engagement is above or below average for their tier, comparing engagement rates across platforms, checking if engagement rates suggest fake followers, auditing a creator's engagement quality before a partnership, analyzing engagement by content type (reels, stories, feed posts, TikTok videos), or assessing engagement trends across a creator's recent posts. For estimating fair market rates based on engagement, see creator-rate-estimator. For full creator vetting beyond engagement, see creator-vetting-scorecard. For scoring niche fit, see niche-fit-scorer.

🇺🇸|EnglishTranslated

Data Processingmarketcalls/openalgo-indi...

custom-indicator

Create a custom technical indicator using Numba JIT + NumPy. Generates production-grade, O(n) optimized indicator functions with charting and benchmarking.

🇺🇸|EnglishTranslated

Data Processinganthropics/financial-serv...

comps-analysis

Build institutional-grade comparable company analyses with operating metrics, valuation multiples, and statistical benchmarking in Excel/spreadsheet format. **Perfect for:** - Public company valuation (M&A, investment analysis) - Benchmarking performance vs. industry peers - Pricing IPOs or funding rounds - Identifying valuation outliers (over/under-valued) - Supporting investment committee presentations - Creating sector overview reports **Not ideal for:** - Private companies without comparable public peers - Highly diversified conglomerates - Distressed/bankrupt companies - Pre-revenue startups - Companies with unique business models

🇺🇸|EnglishTranslated

Uncategorizedabsolutelyskilled/absolut...

compensation-strategy

Use this skill when benchmarking compensation, designing equity plans, building leveling frameworks, or structuring total rewards. Triggers on compensation benchmarking, equity grants, stock options, leveling, pay bands, total rewards, salary ranges, and any task requiring compensation strategy or structure design.

🇺🇸|EnglishTranslated

Data Processinglinuszz/business-strategy...

dupont-analysis

Decompose Return on Equity into component ratios to identify performance drivers. Use for financial analysis, performance benchmarking, and identifying improvement opportunities.

🇺🇸|EnglishTranslated

Testing & QAabsolutelyskilled/absolut...

load-testing

Use this skill when load testing services, benchmarking API performance, planning capacity, or identifying bottlenecks under stress. Triggers on k6, Artillery, JMeter, load testing, stress testing, soak testing, spike testing, performance benchmarks, throughput testing, and any task requiring load or performance testing.

🇺🇸|EnglishTranslated