Found 1,204 Skills
Quick single-paper lookup via AlphaXiv LLM-optimized summaries with tiered source fallback. Use when user says "explain this paper", "summarize paper", pastes an arXiv/AlphaXiv URL, or provides a bare arXiv ID for quick understanding - not for broad literature search.
OpenAI-compatible proxy aggregating 14 free-tier LLM providers with automatic failover and per-key rate tracking.
Recipes and configs for serving LLMs locally on RTX 3090 GPUs using vLLM, llama.cpp, and SGLang with an OpenAI-compatible API.
Cross-model benchmark for gstack skills. Runs the same prompt through Claude, GPT (via Codex CLI), and Gemini side-by-side — compares latency, tokens, cost, and optionally quality via LLM judge. Answers "which model is actually best for this skill?" with data instead of vibes. Separate from /benchmark, which measures web page performance. Use when: "benchmark models", "compare models", "which model is best for X", "cross-model comparison", "model shootout". (gstack) Voice triggers (speech-to-text aliases): "compare models", "model shootout", "which model is best".
Sync, search, and classify X/Twitter bookmarks locally with full-text search, LLM classification, and agent integration.
Build typed LLM applications with PydanticAI: schema-constrained outputs, tool integration, validation, retries, and deterministic downstream handoffs. Use when users need reliable structured outputs instead of free-form text generation.
Adversarial robustness engineering for ML/AI—evasion, poisoning, extraction, membership-inference threat models; robust training, sanitization, detectors; ASR/certified evals; lab model attacks; data-pipeline integrity; production I/O guardrails (classical ML and LLM/multimodal). Use for adversarial examples, robustness suites, poison audits, deploy guardrails—not LLM app red team (ai-redteam), governance (ai-risk-governance), safety classifier R&D (ml-research-engineer-safeguards), safeguard serving (ml-infrastructure-engineer-safeguards), privacy research (privacy-research-engineer-safeguards), AppSec pentest (penetration-tester).
Write a high-quality prompt for any LLM or AI assistant — Claude, Claude Code, ChatGPT, Gemini, Cursor, Windsurf, Copilot, or any coding / chat agent. Use this skill whenever the user asks to write, improve, refine, shorten, or rewrite a prompt; asks "how should I phrase this for [model]" or "what's a good prompt for [task]"; describes a task they want an AI to do but hasn't yet formulated it as a prompt; or pastes an existing prompt and asks for revision. Based on Boris's (Anthropic, Claude Code creator) prompt methodology — short and accurate prompts, plan-before-code, feedback loops, persistent context in files. The universal principles (short, plan-first, feedback-loop, no-padding) apply to any LLM; the Claude-Code-specific anchors (CLAUDE.md, @file, slash commands) only apply when the target is Claude Code. If the user's intent is unclear (target model, deliverable, scope, or whether the AI has a way to self-verify is missing), ask 1–3 targeted clarifying questions via AskUserQuestion before writing the prompt.
Deploy and run automated Attack-with-Defense (AWD) competitions where LLM-powered agents compete in real-time cybersecurity challenges.
Execute the /integrate command for LLM agents. Triggers when the user types `/integrate`, `/integrate --product`, or asks to "integrate a Juspay product", "set up payments", "add payment SDK", or any variation of setting up a Juspay product into their app or codebase. This skill drives a fully guided, doc-driven wizard: it reads product summaries locally, probes candidates via MCP, then fetches actual documentation pages and generates complete integration code.
Run an autonomous Humanize-governed vLLM SOTA performance loop for one LLM: first perform the fixed fair vLLM/SGLang/TensorRT-LLM deployment search and benchmark, then start one RLCR loop that repeatedly decides the gap, profiles the current bottleneck, runs layer/kernel pipeline analysis, patches vLLM code, optionally uses ncu-report-skill for kernel evidence, and revalidates until vLLM matches or beats the best observed framework under the same workload and SLA.
Bootstrap evaluators from production traces — emit SDK code, a framework-agnostic JSON spec, or publish online LLM-judge evaluators directly to Datadog. Use when user says "bootstrap evaluators", "generate evaluators", "create evals from traces", "eval bootstrap", "write evaluators", "build eval suite", "publish evaluators", or wants to generate BaseEvaluator/LLMJudge code or online judge configs from production LLM trace data. Works with ml_app and optional RCA report or failure hypothesis.