Search Results: benchmark-testing

Found 11 Skills

AI & Machine Learning | bobmatnyc/claude-mpm-skil...

local-llm-ops

Local LLM operations with Ollama on Apple Silicon, including setup, model pulls, chat launchers, benchmarks, and diagnostics.

Testing & QA | affaan-m/everything-claud...

golang-testing

Go testing patterns including table-driven tests, subtests, benchmarks, fuzzing, and test coverage. Follows TDD methodology with idiomatic Go practices.

AI & Machine Learning | nvidia/skills

tilegym-adding-cutile-kernel

Add a new cuTile GPU kernel operator to TileGym. Covers dispatch registration in ops.py, cuTile backend implementation, __init__.py exports, test creation, and benchmark in tests/benchmark. Use when adding, creating, or implementing a new cuTile operator/kernel in TileGym, or when asking how to register a new cuTile op.

Testing & QA | davila7/claude-code-templ...

agent-evaluation

Testing and benchmarking LLM agents, including behavioral testing, capability assessment, reliability metrics, and production monitoring. Even top agents achieve less than 50% on real-world benchmarks. Use when: agent testing, agent evaluation, benchmark agents, agent reliability, test agent.

AI & Machine Learning | rysweet/amplihack

self-improving-agent-builder

Encodes a continuous improvement loop for goal-seeking agents: EVAL, ANALYZE, RESEARCH (hypothesis + evidence + counter-arguments), IMPROVE, RE-EVAL, DECIDE. Auto-commits improvements (+2% net, no regression >5%) and reverts failures. Works with all 4 SDK implementations. Auto-activates on "improve agent", "self-improving loop", "agent eval loop", "benchmark agents", "run improvement cycle".

AI & Machine Learning | mcollina/skills

skill-optimizer

Optimizes AI skills for activation, clarity, and cross-model reliability. Use when creating or editing skill packs, diagnosing weak skill uptake, reducing regressions, tuning instruction salience, improving examples, shrinking context cost, or setting benchmark/release gates for skills. Trigger terms: skill optimization, activation gap, benchmark skill, with/without skill delta, regression, context budget, prompt salience.

Frontend Development | garrytan/gstack

benchmark

Performance regression detection using the browse daemon. Establishes baselines for page load times, Core Web Vitals, and resource sizes. Compares before/after on every PR. Tracks performance trends over time. Use when: "performance", "benchmark", "page speed", "lighthouse", "web vitals", "bundle size", "load time".

Backend Development | jabrena/cursor-rules-java

164-java-profiling-verify

Use when you need to verify Java performance optimizations by comparing profiling results before and after refactoring, including baseline validation, post-refactoring report generation, quantitative before/after metrics comparison, side-by-side flamegraph analysis, regression detection, or creating profiling-comparison-analysis and profiling-final-results documentation. Part of the skills-for-java project.

AI & Machine Learning | nvidia/skills

adding-cutile-kernel

Tools & Utilities | garrytan/gbrain

maintain

Brain health checks: back-link enforcement, citation audit, filing validation, stale info detection, orphan pages, and benchmarks. Use when asked to check brain health, run maintenance, or audit quality.

AI & Machine Learning | sgl-project/sglang

add-sgl-kernel

Step-by-step tutorial for adding a heavyweight AOT CUDA/C++ kernel to sgl-kernel, including tests and benchmarks.
