Search Results: benchmarking

Found 90 Skills

Backend Developmentpatricio0312rev/skills

sql-query-optimizer

Analyzes and optimizes SQL queries using EXPLAIN plans, index recommendations, query rewrites, and performance benchmarking. Use for "query optimization", "slow queries", "database performance", or "EXPLAIN analysis".

🇺🇸|EnglishTranslated

Data Processinganthropics/financial-serv...

fsi-comps-analysis

Build institutional-grade comparable company analyses with operating metrics, valuation multiples, and statistical benchmarking in Excel/spreadsheet format. **Perfect for:** - Public company valuation (M&A, investment analysis) - Benchmarking performance vs. industry peers - Pricing IPOs or funding rounds - Identifying valuation outliers (over/under-valued) - Supporting investment committee presentations - Creating sector overview reports **Not ideal for:** - Private companies without comparable public peers - Highly diversified conglomerates - Distressed/bankrupt companies - Pre-revenue startups - Companies with unique business models

🇺🇸|EnglishTranslated

Backend Developmentsamber/cc-skills-golang

golang-benchmark

Golang benchmarking, profiling, and performance measurement. Use when writing, running, or comparing Go benchmarks, profiling hot paths with pprof, interpreting CPU/memory/trace profiles, analyzing results with benchstat, setting up CI benchmark regression detection, or investigating production performance with Prometheus runtime metrics. Also use when the developer needs deep analysis on a specific performance indicator - this skill provides the measurement methodology, while golang-performance provides the optimization patterns.

🇺🇸|EnglishTranslated

Backend Developmentsamber/cc-skills-golang

golang-troubleshooting

Troubleshoot Golang programs systematically - find and fix the root cause. Use when encountering bugs, crashes, deadlocks, or unexpected behavior in Go code. Covers debugging methodology, common Go pitfalls, test-driven debugging, pprof setup and capture, Delve debugger, race detection, GODEBUG tracing, and production debugging. Start here for any 'something is wrong' situation. Not for interpreting profiles or benchmarking (see golang-benchmark skill) or applying optimization patterns (see golang-performance skill).

🇺🇸|EnglishTranslated

Testing & QAeduardo-sl/go-agent-skill...

go-test-quality

Go testing patterns for production-grade code: subtests, test helpers, fixtures, golden files, httptest, testcontainers, property-based testing, and fuzz testing. Covers mocking strategies, test isolation, coverage analysis, and test design philosophy. Use when writing tests, improving coverage, reviewing test quality, setting up test infrastructure, or choosing a testing approach. Trigger examples: "add tests", "improve coverage", "write tests for this", "test helpers", "mock this dependency", "integration test", "fuzz test". Do NOT use for performance benchmarking methodology (use go-performance-review), security testing (use go-security-audit), or table-driven test patterns specifically (use go-test-table-driven).

🇺🇸|EnglishTranslated

AI & Machine Learningakillness/oh-my-skills

skill-autoresearch

Autonomously optimize an existing AI skill by running it repeatedly against binary evals, mutating one instruction at a time, and keeping only changes that improve pass rate. Based on Karpathy-style autoresearch, but applied to SKILL.md iteration instead of ML training. Use when optimizing a skill, benchmarking prompt quality, building evals for a skill, or running self-improvement loops on reusable agent instructions. Triggers on: skill-autoresearch, optimize this skill, improve this skill, benchmark this skill, eval my skill, run autoresearch on this skill, self-improve skill.

🇺🇸|EnglishTranslated

Data Processingasgard-ai-platform/skills

algo-hr-compensation

Conduct compensation benchmarking analysis to position salaries against market data. Use this skill when the user needs to assess pay competitiveness, build salary bands, or analyze pay equity — even if they say 'are we paying market rate', 'salary benchmarking', or 'compensation analysis'.

🇺🇸|EnglishTranslated

AI & Machine Learningvllm-project/vllm-skills

vllm-bench-serve

Benchmark vLLM or OpenAI-compatible serving endpoints using vllm bench serve. Supports multiple datasets (random, sharegpt, sonnet, HF), backends (openai, openai-chat, vllm-pooling, embeddings), throughput/latency testing with request-rate control, and result saving. Use when benchmarking LLM serving performance, measuring TTFT/TPOT, or load testing inference APIs.

🇺🇸|EnglishTranslated

AI & Machine Learningkiterlin/intelligent-dete...

nemo-evaluator-sdk

Evaluates LLMs across 100+ benchmarks from 18+ harnesses (MMLU, HumanEval, GSM8K, safety, VLM) with multi-backend execution. Use when needing scalable evaluation on local Docker, Slurm HPC, or cloud platforms. NVIDIA's enterprise-grade platform with container-first architecture for reproducible benchmarking.

🇺🇸|EnglishTranslated

AI & Machine Learningkiterlin/intelligent-dete...

evaluating-llms-harness

Evaluates LLMs across 60+ academic benchmarks (MMLU, HumanEval, GSM8K, TruthfulQA, HellaSwag). Use when benchmarking model quality, comparing models, reporting academic results, or tracking training progress. Industry standard used by EleutherAI, HuggingFace, and major labs. Supports HuggingFace, vLLM, APIs.

🇺🇸|EnglishTranslated

Marketing & Growthsales-skills/sales

sales-customer-feedback

Customer feedback, NPS, CSAT, CES, Voice of Customer strategy across platforms — survey design, response rate optimization, closed-loop feedback, text analytics, benchmarking, program governance. Use when NPS scores are stagnant, survey response rates are low, feedback isn't driving action, unsure which CX metric to use, need to design a VoC program, comparing feedback tools (Medallia vs Qualtrics vs SurveyMonkey vs Typeform), or customers feel over-surveyed. Do NOT use for product review collection like Trustpilot or G2 (use /sales-customer-reviews) or in-app message surveys (use /sales-in-app-messaging).

🇺🇸|EnglishTranslated

AI & Machine Learningalpoxdev/hypercore

autoresearch-skill

[Hyper] Optimize an existing Codex skill through baseline-first experiments, binary evals, optional guards, and one-mutation-at-a-time iteration. Use for skill autoresearch, measured trigger/workflow improvement, self-optimizing a skill, benchmarking skill changes, or resuming skill experiment artifacts.

🇺🇸|EnglishTranslated

1 scripts/Checked