Search Results: benchmarking

Found 117 Skills

model-evaluation-benchmark

Automated reproduction of comprehensive model evaluation benchmarks following the Benchmark Suite V3. Auto-activates for model benchmarking, comparison evaluation, or performance testing between AI models.

Data Processing | founderjourney/claude-ski...

saas-financial-projections

Senior SaaS CFO / Financial Analyst (15+ years) specialized in financial modeling, projections, and exit strategy for bootstrapped and VC-backed SaaS companies. Activate when user needs: (1) Revenue projections (1-5 years), (2) Exit valuation and multiples, (3) Unit economics analysis (CAC, LTV, payback), (4) Scenario modeling (conservative/base/optimistic), (5) Fundraising narratives with financial backing, (6) M&A due diligence financials, (7) SaaS metrics benchmarking, (8) Cohort analysis and churn modeling. Triggers: "proyecciones", "projections", "exit", "valuation", "ARR", "MRR", "multiples", "revenue forecast", "financial model", "exit strategy", "CAC", "LTV", "unit economics", "churn", "fundraising", "M&A", "acquisition", "5 year plan".

Code Quality | wyattowalsh/agents

honest-review

Research-driven code review and validation at multiple levels of abstraction. Two modes: (1) Session review — after making changes, review and verify work using parallel reviewers that research-validate every assumption; (2) Full codebase audit — deep end-to-end evaluation using parallel teams of subagent-spawning reviewers. Use when reviewing changes, verifying work quality, auditing a codebase, validating correctness, checking assumptions, finding defects, reducing complexity. NOT for writing new code, explaining code, or benchmarking.

AI & Machine Learning | ascend/agent-skills

hccl-test

HCCL (Huawei Collective Communication Library) performance testing for Ascend NPU clusters. Use for testing distributed communication bandwidth, verifying HCCL functionality, and benchmarking collective operations like AllReduce, AllGather. Covers MPI installation, multi-node pre-flight checks (SSH/CANN version/NPU health), and production testing workflows.

5 scripts/Attention

Testing & QA | davila7/claude-code-templ...

agent-evaluation

Testing and benchmarking LLM agents, including behavioral testing, capability assessment, reliability metrics, and production monitoring, where even top agents achieve less than 50% on real-world benchmarks. Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.

Marketing & Growth | alirezarezvani/claude-ski...

social-media-analyzer

Social media campaign analysis and performance tracking. Calculates engagement rates, ROI, and benchmarks across platforms. Use for analyzing social media performance, calculating engagement rate, measuring campaign ROI, comparing platform metrics, or benchmarking against industry standards.

2 scripts/Checked

Product & Design | jk-0001/skills

competitive-analysis

Perform a deep competitive analysis for a solopreneur business. Use when mapping competitors in detail, finding exploitable gaps, understanding competitor strategy, benchmarking your own offering, or deciding how to position against the field. Goes deeper than the broad landscape mapping in market-research — this is focused dissection of specific competitors. Trigger on "analyze my competitors", "competitive analysis", "who are my competitors", "competitor deep-dive", "how do I beat the competition", "competitive landscape", "benchmark against competitors".

Code Quality | personamanagmentlayer/pcl

performance-expert

Expert-level performance optimization, profiling, benchmarking, and tuning.

AI & Machine Learning | 404kidwiz/claude-supercod...

performance-monitor

Expert in observing, benchmarking, and optimizing AI agents. Specializes in token usage tracking, latency analysis, and quality evaluation metrics. Use when optimizing agent costs, measuring performance, or implementing evals. Triggers include "agent performance", "token usage", "latency optimization", "eval", "agent metrics", "cost optimization", "agent benchmarking".

Documentation & Writing | abdullahbeam/nexus-design...

generate-philosophy-doc

Generate comprehensive philosophy and standards documents for any domain (UX design, landing pages, email outbound, API design, etc.). Load when user says "create philosophy doc", "generate standards for [domain]", "build best practices guide", or "create benchmarking document". Conducts deep research, synthesizes findings, and produces structured philosophy documents with principles, frameworks, anti-patterns, checklists, case studies, and metrics.

2 scripts/Checked

Testing & QA | sentenz/skills

cpp-benchmark-testing

Automates benchmark test creation for C++ projects using Google Benchmark with consistent software testing patterns. Use when creating performance benchmarks, profiling tests, or when the user mentions benchmarking, Google Benchmark, or performance testing.

Testing & QA | eduardo-sl/go-agent-skill...

go-test-quality

Go testing patterns for production-grade code: subtests, test helpers, fixtures, golden files, httptest, testcontainers, property-based testing, and fuzz testing. Covers mocking strategies, test isolation, coverage analysis, and test design philosophy. Use when writing tests, improving coverage, reviewing test quality, setting up test infrastructure, or choosing a testing approach. Trigger examples: "add tests", "improve coverage", "write tests for this", "test helpers", "mock this dependency", "integration test", "fuzz test". Do NOT use for performance benchmarking methodology (use go-performance-review), security testing (use go-security-audit), or table-driven test patterns specifically (use go-test-table-driven).
