Found 63 Skills
Social media campaign analysis and performance tracking. Calculates engagement rates, ROI, and benchmarks across platforms. Use for analyzing social media performance, calculating engagement rates, measuring campaign ROI, comparing platform metrics, or benchmarking against industry standards.
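A minimal sketch of the underlying arithmetic, using made-up campaign numbers (the skill's actual formulas and industry benchmarks may differ):

```python
# Illustrative campaign math; the figures are hypothetical, not from the skill.
campaign_spend = 5_000.00          # total cost of the campaign
attributed_revenue = 14_500.00     # revenue attributed to the campaign

# ROI expressed as a percentage of spend.
roi_pct = (attributed_revenue - campaign_spend) / campaign_spend * 100

# One common engagement-rate definition: interactions divided by reach.
interactions = 3_200               # likes + comments + shares + saves
reach = 85_000
engagement_rate_pct = interactions / reach * 100

print(f"ROI: {roi_pct:.1f}%")                          # 190.0%
print(f"Engagement rate: {engagement_rate_pct:.2f}%")  # 3.76%
```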
Research-driven code review and validation at multiple levels of abstraction. Two modes: (1) Session review — after making changes, review and verify work using parallel reviewers that research-validate every assumption; (2) Full codebase audit — deep end-to-end evaluation using parallel teams of subagent-spawning reviewers. Use when reviewing changes, verifying work quality, auditing a codebase, validating correctness, checking assumptions, finding defects, reducing complexity. NOT for writing new code, explaining code, or benchmarking.
Instrument Python LLM apps, build golden datasets, write eval-based tests, run them, and root-cause failures — covering the full eval-driven development cycle. Make sure to use this skill whenever a user is developing, testing, QA-ing, evaluating, or benchmarking a Python project that calls an LLM, even if they don't say "evals" explicitly. Use for making sure an AI app works correctly, catching regressions after prompt changes, debugging why an agent started behaving differently, or validating output quality before shipping.
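For context, a hedged sketch of what an eval-based test can look like; `summarize`, the golden dataset path, and the must-include rubric are hypothetical placeholders for whatever the project under test actually exposes:

```python
# Hypothetical eval-style test for an LLM app built from a reviewed golden dataset.
import json
import pytest

from my_app import summarize  # assumed project function that calls an LLM


def load_golden_cases(path="golden/summaries.jsonl"):
    """Each line: {"input": ..., "must_include": [...]} curated from good outputs."""
    with open(path) as f:
        return [json.loads(line) for line in f]


@pytest.mark.parametrize("case", load_golden_cases())
def test_summary_covers_key_facts(case):
    output = summarize(case["input"])
    # Simple deterministic check; real evals often layer model-graded rubrics on top.
    for fact in case["must_include"]:
        assert fact.lower() in output.lower(), f"missing key fact: {fact}"
```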
Testing and benchmarking LLM agents, including behavioral testing, capability assessment, reliability metrics, and production monitoring, a space where even top agents achieve less than 50% on real-world benchmarks. Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.
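One simple reliability metric of this kind is a trial-based success rate; a sketch under an assumed agent/task interface, not the skill's own harness:

```python
# Run each task several times and report per-task and overall success rates.
def reliability(agent, tasks, trials=5):
    wins = {task.name: 0 for task in tasks}
    for task in tasks:
        for _ in range(trials):
            # `agent.run` and `task.expected` are assumed names for illustration.
            if agent.run(task.prompt) == task.expected:
                wins[task.name] += 1
    rates = {name: w / trials for name, w in wins.items()}
    overall = sum(rates.values()) / len(rates) if rates else 0.0
    return rates, overall
```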
Evaluates code generation models across HumanEval, MBPP, MultiPL-E, and 15+ benchmarks with pass@k metrics. Use when benchmarking code models, comparing coding abilities, testing multi-language support, or measuring code generation quality. Industry standard from BigCode Project used by HuggingFace leaderboards.
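The pass@k numbers these benchmarks report are typically computed with the unbiased estimator from the Codex/HumanEval paper; a small sketch, noting that the harness's own implementation may differ in detail:

```python
# Standard unbiased pass@k estimator (Chen et al., 2021):
# n samples per problem, c of them correct.
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples drawn from n is correct."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# e.g. 200 samples, 30 correct, pass@10
print(round(pass_at_k(200, 30, 10), 4))
```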
Evaluates LLMs across 60+ academic benchmarks (MMLU, HumanEval, GSM8K, TruthfulQA, HellaSwag). Use when benchmarking model quality, comparing models, reporting academic results, or tracking training progress. Industry standard used by EleutherAI, HuggingFace, and major labs. Supports HuggingFace, vLLM, APIs.
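A minimal sketch of driving the harness from Python, assuming the `simple_evaluate` entry point and argument names of recent releases; the exact signature varies by version, so treat this as illustrative:

```python
# Assumes lm-evaluation-harness's Python API; check your installed version.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-160m",
    tasks=["hellaswag", "gsm8k"],
    batch_size=8,
)
print(results["results"])  # per-task metrics (accuracy, exact match, etc.)
```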
Expert-level performance optimization, profiling, benchmarking, and tuning
Analyzes and optimizes SQL queries using EXPLAIN plans, index recommendations, query rewrites, and performance benchmarking. Use for "query optimization", "slow queries", "database performance", or "EXPLAIN analysis".
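As a lightweight illustration of plan inspection, SQLite's EXPLAIN QUERY PLAN can be queried straight from the standard library; Postgres and MySQL expose their own EXPLAIN variants with different output:

```python
# Check whether a query uses an index or falls back to a full scan.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    CREATE INDEX idx_orders_customer ON orders (customer_id);
""")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = ?", (42,)
).fetchall()
for row in plan:
    print(row)  # should show a SEARCH using idx_orders_customer rather than a SCAN
```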
Automated reproduction of comprehensive model evaluation benchmarks following the Benchmark Suite V3. Auto-activates for model benchmarking, comparison evaluation, or performance testing between AI models.
Provides guidance for writing and benchmarking optimized CUDA kernels for NVIDIA GPUs (H100, A100, T4) targeting HuggingFace diffusers and transformers libraries. Supports models like LTX-Video, Stable Diffusion, LLaMA, Mistral, and Qwen. Includes integration with the HuggingFace Kernels Hub (get_kernel) for loading pre-compiled kernels, and benchmarking scripts to compare kernel performance against baseline implementations.
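The baseline-comparison part typically comes down to event-based timing like the sketch below; the Kernels Hub loader mentioned in the description is only referenced in a comment here, since its exact API is an assumption:

```python
# CUDA-event timing of a baseline op, the usual pattern for kernel benchmarks.
import torch

def benchmark(fn, *args, warmup=10, iters=100):
    """Time a CUDA op with events after warmup; returns ms per iteration."""
    for _ in range(warmup):
        fn(*args)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

x = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
baseline_ms = benchmark(torch.nn.functional.gelu, x)
# custom = get_kernel("org/repo")  # hypothetical Kernels Hub kernel to compare against
print(f"baseline GELU: {baseline_ms:.3f} ms/iter")
```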
Calculate engagement rates for creator posts and benchmark them against platform and tier averages. This skill should be used when calculating an influencer's engagement rate, benchmarking creator engagement against industry averages, evaluating whether a creator's engagement is above or below average for their tier, comparing engagement rates across platforms, checking if engagement rates suggest fake followers, auditing a creator's engagement quality before a partnership, analyzing engagement by content type (reels, stories, feed posts, TikTok videos), or assessing engagement trends across a creator's recent posts. For estimating fair market rates based on engagement, see creator-rate-estimator. For full creator vetting beyond engagement, see creator-vetting-scorecard. For scoring niche fit, see niche-fit-scorer.
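A sketch of the per-post math; the tier benchmark values below are illustrative placeholders, not the skill's reference data:

```python
# Hypothetical tier averages in percent; real benchmarks vary by platform and year.
TIER_BENCHMARKS = {"nano": 4.0, "micro": 2.5, "mid": 1.8, "macro": 1.2}

def engagement_rate(likes, comments, followers, shares=0, saves=0):
    return (likes + comments + shares + saves) / followers * 100

def compare_to_tier(rate, tier):
    benchmark = TIER_BENCHMARKS[tier]
    delta = rate - benchmark
    verdict = "above" if delta >= 0 else "below"
    return f"{rate:.2f}% vs {benchmark:.2f}% {tier} average ({verdict} by {abs(delta):.2f} pts)"

rate = engagement_rate(likes=1800, comments=140, followers=52_000)
print(compare_to_tier(rate, "micro"))
```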
Score each creator on a completed campaign across consistency, content quality, engagement rate, and brand alignment, then produce a ranked retention list for future campaigns. This skill should be used when grading creators after a campaign ends, evaluating influencer performance post-campaign, ranking creators by campaign performance, building a retention list of top creators, deciding which creators to rebook for the next campaign, scoring influencer deliverables after a launch, comparing creator performance across a campaign roster, auditing which creators delivered the most value, or tiering creators into re-engage versus one-and-done lists. For calculating engagement rates and benchmarking them by tier, see engagement-rate-calculator-benchmarker. For scoring niche fit before a campaign, see niche-fit-scorer. For building the full campaign report with ROI narrative, see campaign-roi-calculator-narrative-builder.
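A sketch of a weighted scorecard and retention ranking; the criteria weights and the re-engage cutoff are illustrative, not the skill's own rubric:

```python
# Hypothetical weights over the four criteria named above (sum to 1.0).
WEIGHTS = {"consistency": 0.2, "content_quality": 0.3,
           "engagement_rate": 0.3, "brand_alignment": 0.2}

def campaign_score(scores: dict) -> float:
    """scores: criterion -> 0-10 rating; returns the weighted total on a 0-10 scale."""
    return sum(WEIGHTS[c] * scores[c] for c in WEIGHTS)

creators = {
    "@avery": {"consistency": 9, "content_quality": 8, "engagement_rate": 7, "brand_alignment": 9},
    "@jordan": {"consistency": 6, "content_quality": 7, "engagement_rate": 9, "brand_alignment": 5},
}

ranked = sorted(creators, key=lambda name: campaign_score(creators[name]), reverse=True)
for name in ranked:
    total = campaign_score(creators[name])
    tier = "re-engage" if total >= 7.5 else "one-and-done"
    print(f"{name}: {total:.1f} -> {tier}")
```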