Search Results: benchmarking

Found 147 Skills

agent-benchmark-suite

Agent skill for benchmark-suite - invoke with $agent-benchmark-suite

Marketing & Growth | postplusai/postplus-skill...

instagram-content-benchmark

Benchmark Instagram posts and Reels to discover winning content patterns, shortlist high-value examples, and extract reusable hooks and formats.

AI & Machine Learning | nvidia/skills

perf-workload-profiling

Code instrumentation for timing workloads. Two scenarios: (1) Training loop — inject manual timing to report per-iteration latency, throughput (samples/sec), and data load time. (2) Standalone kernel/op — write CUDA event timing code with warmup, per-iteration statistics, and anti-pattern avoidance. Also covers NVTX annotation for labeling profiler timelines. NOT for: running or analyzing profiler tools (nsys, ncu, Nsight Systems, Nsight Compute), writing kernels (Triton, CuTe, CUDA), applying optimizations (CUDA Graphs, gradient checkpointing, fusion), or interpreting roofline/SOL% metrics. Triggers: "measure throughput", "benchmark this function", "time my training loop", "samples per second", "NVTX annotate", "instrument my dataloader", "data load time", "kernel timing", "how do I time".

AI & Machine Learning | bbuf/sglang-auto-driven-s...

sglang-sota-humanize-loop

Run an autonomous Humanize-governed SGLang SOTA performance loop for one LLM model: first perform the fixed fair SGLang/vLLM/TensorRT-LLM deployment search and benchmark, then start one RLCR loop that repeatedly decides the gap, profiles the current bottleneck, runs layer/kernel pipeline analysis, patches SGLang code, optionally uses ncu-report-skill for kernel evidence, and revalidates until SGLang matches or beats the best observed framework under the same workload and SLA.

Tools & Utilities | kaggle/kaggle-skills

write-kaggle-benchmarks

Write, push, run, publish, and manage Kaggle Benchmark tasks using the kaggle CLI and the kaggle-benchmarks Python SDK. Use when the user wants to create or push a benchmark task (optionally with attached Kaggle datasets), run benchmarks against LLM models, check task/run status, stream or fetch execution logs, download results and source notebooks, publish a task to make it public, or troubleshoot benchmark workflows.

Code Quality | dralgorhythm/claude-agent...

optimizing-code

Improve code performance without changing behavior. Use when code fails latency/throughput requirements. Covers profiling, caching, and algorithmic optimization.

Tools & Utilities | jesseotremblay/claude-ski...

analyzing-funding-landscape

Analyzes venture capital, investment trends, funding rounds, investor strategies, M&A activity, and funding patterns in specific markets or industries. Use when the user requests funding analysis, VC landscape research, investment trend analysis, or wants to understand investor activity and funding dynamics.

Product & Design | gpu-cli/skills

tui-review

Critically review terminal user interfaces for UX quality, responsiveness, visual design, and interactivity. Use when asked to "review my TUI", "test my TUI UX", "audit my terminal UI", "check TUI responsiveness", "review TUI keybindings", "check interactivity", or any request to evaluate the user experience quality of a ratatui/crossterm/ncurses-based terminal application. Launches the TUI in tmux, systematically tests 10 dimensions (responsiveness, input conflicts, visual clarity, navigation, feedback loops, error states, layout, keyboard design, permission flows, visual design & color), and produces a graded report with screenshots and specific findings. Benchmarks against Claude Code, OpenCode, and Codex — the three best-in-class AI terminal UIs.

AI & Machine Learning | affaan-m/everything-claud...

agent-harness-construction

Design and optimize AI agent action spaces, tool definitions, and observation formatting for higher completion rates.

AI & Machine Learning | ruvnet/ruflo

agent-performance-benchmarker

Agent skill for performance-benchmarker - invoke with $agent-performance-benchmarker

AI & Machine Learning | vincenzoimp/academic-rese...

cs-methodology-evaluation

Use when designing or auditing computer science experiments, evaluation plans, baselines, metrics, ablations, datasets, statistical tests, benchmarks, validity threats, or reproducibility claims.

Testing & QA | jeremylongshore/claude-co...

load-testing-apis

Execute comprehensive load and stress testing to validate API performance and scalability. Use when validating API performance under load. Trigger with phrases like "load test the API", "stress test API", or "benchmark API performance".
