Found 63 Skills
Social media campaign analysis and performance tracking. Calculates engagement rates, ROI, and benchmarks across platforms. Use for analyzing social media performance, calculating engagement rates, measuring campaign ROI, comparing platform metrics, or benchmarking against industry standards.
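A minimal sketch of the underlying arithmetic, using made-up campaign numbers (the skill's actual formulas and industry benchmarks may differ):

```python
# Illustrative campaign math; the figures are hypothetical, not from the skill.
campaign_spend = 5_000.00          # total cost of the campaign
attributed_revenue = 14_500.00     # revenue attributed to the campaign

# ROI expressed as a percentage of spend.
roi_pct = (attributed_revenue - campaign_spend) / campaign_spend * 100

# One common engagement-rate definition: interactions divided by reach.
interactions = 3_200               # likes + comments + shares + saves
reach = 85_000
engagement_rate_pct = interactions / reach * 100

print(f"ROI: {roi_pct:.1f}%")                          # 190.0%
print(f"Engagement rate: {engagement_rate_pct:.2f}%")  # 3.76%
```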
Research-driven code review and validation at multiple levels of abstraction. Two modes: (1) Session review — after making changes, review and verify work using parallel reviewers that research-validate every assumption; (2) Full codebase audit — deep end-to-end evaluation using parallel teams of subagent-spawning reviewers. Use when reviewing changes, verifying work quality, auditing a codebase, validating correctness, checking assumptions, finding defects, reducing complexity. NOT for writing new code, explaining code, or benchmarking.
Instrument Python LLM apps, build golden datasets, write eval-based tests, run them, and root-cause failures — covering the full eval-driven development cycle. Make sure to use this skill whenever a user is developing, testing, QA-ing, evaluating, or benchmarking a Python project that calls an LLM, even if they don't say "evals" explicitly. Use for making sure an AI app works correctly, catching regressions after prompt changes, debugging why an agent started behaving differently, or validating output quality before shipping.
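For context, a hedged sketch of what an eval-based test can look like; `summarize`, the golden dataset path, and the must-include rubric are hypothetical placeholders for whatever the project under test actually exposes:

```python
# Hypothetical eval-style test for an LLM app built from a reviewed golden dataset.
import json
import pytest

from my_app import summarize  # assumed project function that calls an LLM


def load_golden_cases(path="golden/summaries.jsonl"):
    """Each line: {"input": ..., "must_include": [...]} curated from good outputs."""
    with open(path) as f:
        return [json.loads(line) for line in f]


@pytest.mark.parametrize("case", load_golden_cases())
def test_summary_covers_key_facts(case):
    output = summarize(case["input"])
    # Simple deterministic check; real evals often layer model-graded rubrics on top.
    for fact in case["must_include"]:
        assert fact.lower() in output.lower(), f"missing key fact: {fact}"
```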
Testing and benchmarking LLM agents, including behavioral testing, capability assessment, reliability metrics, and production monitoring, a space where even top agents achieve less than 50% on real-world benchmarks. Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.
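One simple reliability metric of this kind is a trial-based success rate; a sketch under an assumed agent/task interface, not the skill's own harness:

```python
# Run each task several times and report per-task and overall success rates.
def reliability(agent, tasks, trials=5):
    wins = {task.name: 0 for task in tasks}
    for task in tasks:
        for _ in range(trials):
            # `agent.run` and `task.expected` are assumed names for illustration.
            if agent.run(task.prompt) == task.expected:
                wins[task.name] += 1
    rates = {name: w / trials for name, w in wins.items()}
    overall = sum(rates.values()) / len(rates) if rates else 0.0
    return rates, overall
```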
Evaluates code generation models across HumanEval, MBPP, MultiPL-E, and 15+ benchmarks with pass@k metrics. Use when benchmarking code models, comparing coding abilities, testing multi-language support, or measuring code generation quality. Industry standard from BigCode Project used by HuggingFace leaderboards.
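The pass@k numbers these benchmarks report are typically computed with the unbiased estimator from the Codex/HumanEval paper; a small sketch, noting that the harness's own implementation may differ in detail:

```python
# Standard unbiased pass@k estimator (Chen et al., 2021):
# n samples per problem, c of them correct.
import numpy as np

def pass_at_k(n: int, c: int, k: int) -> float:
    """Probability that at least one of k samples drawn from n is correct."""
    if n - c < k:
        return 1.0
    return 1.0 - np.prod(1.0 - k / np.arange(n - c + 1, n + 1))

# e.g. 200 samples, 30 correct, pass@10
print(round(pass_at_k(200, 30, 10), 4))
```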
Evaluates LLMs across 60+ academic benchmarks (MMLU, HumanEval, GSM8K, TruthfulQA, HellaSwag). Use when benchmarking model quality, comparing models, reporting academic results, or tracking training progress. Industry standard used by EleutherAI, HuggingFace, and major labs. Supports HuggingFace, vLLM, APIs.
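A minimal sketch of driving the harness from Python, assuming the `simple_evaluate` entry point and argument names of recent releases; the exact signature varies by version, so treat this as illustrative:

```python
# Assumes lm-evaluation-harness's Python API; check your installed version.
import lm_eval

results = lm_eval.simple_evaluate(
    model="hf",
    model_args="pretrained=EleutherAI/pythia-160m",
    tasks=["hellaswag", "gsm8k"],
    batch_size=8,
)
print(results["results"])  # per-task metrics (accuracy, exact match, etc.)
```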
Expert-level performance optimization, profiling, benchmarking, and tuning
Analyzes and optimizes SQL queries using EXPLAIN plans, index recommendations, query rewrites, and performance benchmarking. Use for "query optimization", "slow queries", "database performance", or "EXPLAIN analysis".
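As a lightweight illustration of plan inspection, SQLite's EXPLAIN QUERY PLAN can be queried straight from the standard library; Postgres and MySQL expose their own EXPLAIN variants with different output:

```python
# Check whether a query uses an index or falls back to a full scan.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total REAL);
    CREATE INDEX idx_orders_customer ON orders (customer_id);
""")

plan = conn.execute(
    "EXPLAIN QUERY PLAN SELECT * FROM orders WHERE customer_id = ?", (42,)
).fetchall()
for row in plan:
    print(row)  # should show a SEARCH using idx_orders_customer rather than a SCAN
```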
Automated reproduction of comprehensive model evaluation benchmarks following the Benchmark Suite V3. Auto-activates for model benchmarking, comparison evaluation, or performance testing between AI models.
Provides guidance for writing and benchmarking optimized CUDA kernels for NVIDIA GPUs (H100, A100, T4) targeting HuggingFace diffusers and transformers libraries. Supports models like LTX-Video, Stable Diffusion, LLaMA, Mistral, and Qwen. Includes integration with the HuggingFace Kernels Hub (get_kernel) for loading pre-compiled kernels, and benchmarking scripts to compare kernel performance against baseline implementations.
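The baseline-comparison part typically comes down to event-based timing like the sketch below; the Kernels Hub loader mentioned in the description is only referenced in a comment here, since its exact API is an assumption:

```python
# CUDA-event timing of a baseline op, the usual pattern for kernel benchmarks.
import torch

def benchmark(fn, *args, warmup=10, iters=100):
    """Time a CUDA op with events after warmup; returns ms per iteration."""
    for _ in range(warmup):
        fn(*args)
    torch.cuda.synchronize()
    start = torch.cuda.Event(enable_timing=True)
    end = torch.cuda.Event(enable_timing=True)
    start.record()
    for _ in range(iters):
        fn(*args)
    end.record()
    torch.cuda.synchronize()
    return start.elapsed_time(end) / iters

x = torch.randn(4096, 4096, device="cuda", dtype=torch.float16)
baseline_ms = benchmark(torch.nn.functional.gelu, x)
# custom = get_kernel("org/repo")  # hypothetical Kernels Hub kernel to compare against
print(f"baseline GELU: {baseline_ms:.3f} ms/iter")
```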
Calculate engagement rates for creator posts and benchmark them against platform and tier averages. This skill should be used when calculating an influencer's engagement rate, benchmarking creator engagement against industry averages, evaluating whether a creator's engagement is above or below average for their tier, comparing engagement rates across platforms, checking if engagement rates suggest fake followers, auditing a creator's engagement quality before a partnership, analyzing engagement by content type (reels, stories, feed posts, TikTok videos), or assessing engagement trends across a creator's recent posts. For estimating fair market rates based on engagement, see creator-rate-estimator. For full creator vetting beyond engagement, see creator-vetting-scorecard. For scoring niche fit, see niche-fit-scorer.
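A sketch of the per-post math; the tier benchmark values below are illustrative placeholders, not the skill's reference data:

```python
# Hypothetical tier averages in percent; real benchmarks vary by platform and year.
TIER_BENCHMARKS = {"nano": 4.0, "micro": 2.5, "mid": 1.8, "macro": 1.2}

def engagement_rate(likes, comments, followers, shares=0, saves=0):
    return (likes + comments + shares + saves) / followers * 100

def compare_to_tier(rate, tier):
    benchmark = TIER_BENCHMARKS[tier]
    delta = rate - benchmark
    verdict = "above" if delta >= 0 else "below"
    return f"{rate:.2f}% vs {benchmark:.2f}% {tier} average ({verdict} by {abs(delta):.2f} pts)"

rate = engagement_rate(likes=1800, comments=140, followers=52_000)
print(compare_to_tier(rate, "micro"))
```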
Score each creator on a completed campaign across consistency, content quality, engagement rate, and brand alignment, then produce a ranked retention list for future campaigns. This skill should be used when grading creators after a campaign ends, evaluating influencer performance post-campaign, ranking creators by campaign performance, building a retention list of top creators, deciding which creators to rebook for the next campaign, scoring influencer deliverables after a launch, comparing creator performance across a campaign roster, auditing which creators delivered the most value, or tiering creators into re-engage versus one-and-done lists. For calculating engagement rates and benchmarking them by tier, see engagement-rate-calculator-benchmarker. For scoring niche fit before a campaign, see niche-fit-scorer. For building the full campaign report with ROI narrative, see campaign-roi-calculator-narrative-builder.
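A sketch of a weighted scorecard and retention ranking; the criteria weights and the re-engage cutoff are illustrative, not the skill's own rubric:

```python
# Hypothetical weights over the four criteria named above (sum to 1.0).
WEIGHTS = {"consistency": 0.2, "content_quality": 0.3,
           "engagement_rate": 0.3, "brand_alignment": 0.2}

def campaign_score(scores: dict) -> float:
    """scores: criterion -> 0-10 rating; returns the weighted total on a 0-10 scale."""
    return sum(WEIGHTS[c] * scores[c] for c in WEIGHTS)

creators = {
    "@avery": {"consistency": 9, "content_quality": 8, "engagement_rate": 7, "brand_alignment": 9},
    "@jordan": {"consistency": 6, "content_quality": 7, "engagement_rate": 9, "brand_alignment": 5},
}

ranked = sorted(creators, key=lambda name: campaign_score(creators[name]), reverse=True)
for name in ranked:
    total = campaign_score(creators[name])
    tier = "re-engage" if total >= 7.5 else "one-and-done"
    print(f"{name}: {total:.1f} -> {tier}")
```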