Found 3,723 Skills
Run a heavy neural-trader job (long walk-forward, big Monte-Carlo, parameter sweep, model training) on the Anthropic Managed Agent cloud runtime instead of locally
Build or adapt a local browser/CDP harness to drive and inspect a web, IDE, or Electron UI. Use for local UI verification, screenshots, accessibility snapshots, perf profiles, visual diffs, or reproducing UI bugs.
Onboard 1-node GitHub MR functional tests for GB200 from existing mr-scoped 2-node tests.
Use this temporary smoke-test skill to verify skills.sh indexing and download snapshot behavior for a fresh UnifAPI agent skills repository.
Use when the user asks to make something faster, try many variants, run recursive optimization, benchmark latency/throughput/cost, or choose the best implementation by repeated measured tests.
Concept prototype — validate the core idea is worth designing before writing GDDs. Run right after /brainstorm and /setup-engine. Routes to HTML, Engine, or Paper path based on game type. Produces a throwaway build and a PROCEED/PIVOT/KILL verdict.
Map test coverage to GDD critical paths, identify fixed bugs without regression tests, flag coverage drift from new features, and maintain tests/regression-suite.md. Run after implementing a bug fix or before a release gate.
Pre-Production validation — build a production-quality end-to-end build to confirm the full game loop is achievable before committing to Production. Run after GDDs, architecture, and UX specs are complete. Produces a PROCEED/PIVOT/KILL verdict that gates the Pre-Production → Production transition.
Verify a claim with fresh local evidence: restate it falsifiably, capture baseline and treatment, compare artifacts, and return VERIFIED, NOT VERIFIED, or INCONCLUSIVE.
Hunting skill for business logic vulnerabilities. Built from 12 public bug bounty reports. Covers coupon-race-stacking (Instacart, Stripe, Reverb), negative-quantity-in-cart price tampering (Upserve, Eternal/Zomato), decimal/fraction price-field overflow (Shipt), client-side checkout amount trust on PayPal redirect (WordPress.org), price-per-unit mass-assignment (Krisp), and archived-price swap / cart-TOCTOU (Stripe). Use when hunting business logic — heavy emphasis on financial-impact-demonstrated cases.
Owner-scoped task decomposition with gates, rollback, verification commands, and smoke tests.
Owns the smoke test contract for an ML experiment: a small, diagnostic-by-construction pytest that fits the experiment's learner on a portion of the real `data/` source and predicts on a *disjoint* portion that deliberately carries **no pre-history buffer**. The assertion is structural — the number of predictions must equal the number of rows in the predict grid. A pipeline that loads-then-features-then-splits will silently drop the cold-start rows of the predict slice and the test will fail with a row-count mismatch; a pipeline that marks X early and references upstream history nodes from feature steps will pass trivially. The smoke test is the executable proof of the X-marker placement rule from `build-ml-pipeline`. TRIGGER when: `test-ml-pipeline` has dispatched here to write the smoke test for an approved experiment; `pytest tests/smoke/` is failing on row count; the user asks "why is the smoke test failing?"; a pipeline edit in `build-ml-pipeline` needs an executable proof; an experiment script changes the pipeline shape and the matching smoke test needs revisiting. SKIP when: the design note does not exist or is not yet approved (route to `iterate-ml-experiment`); the user is asking about a regression test or schema invariant (route to `regression-test-ml-pipeline` / `distribution-test-ml-pipeline` once those exist); the question is the *interpretation* of CV metrics, not predict-time correctness (route to `evaluate-ml-pipeline`). HOW TO USE: read the matching experiment's `journal/NN_*.md` and `experiments/NN_*.py` first to understand the pipeline's source binding (what env-dict keys does `build_learner` expect?). Then construct two env-dicts from the **real `data/` source** — a train env and a predict env — such that the predict env carries *only the rows we want predictions for* and *no pre-history buffer*. The hard assertion is that the prediction count matches the predict-env row count exactly. 
The soft assertion is that the smoke set's MAE is within `3 × CV_mean` (or the task-appropriate analogue). **Do not write the design note or run CV; those are other skills' jobs.**
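The smoke-test contract in the last entry can be sketched as follows. This is a minimal illustration under loud assumptions, not the skill's actual implementation: `MeanLearner`, `make_envs`, `smoke_check`, and the env-dict keys `"X"` and `"y"` are all hypothetical stand-ins, and the toy data replaces the real `data/` source that the skill requires.

```python
from statistics import mean


class MeanLearner:
    """Hypothetical stand-in for the experiment's learner (not a real API)."""

    def fit(self, env):
        # Learn a single constant from the train env's targets.
        self.mean_ = mean(env["y"])
        return self

    def predict(self, env):
        # One prediction per row of the predict grid, cold-start rows included.
        return [self.mean_ for _ in env["X"]]


def make_envs(rows, targets, split):
    """Build disjoint train and predict env-dicts (keys are assumptions).

    The predict env carries only the rows we want predictions for,
    with no pre-history buffer taken from the train portion.
    """
    train_env = {"X": rows[:split], "y": targets[:split]}
    predict_env = {"X": rows[split:], "y": targets[split:]}
    return train_env, predict_env


def smoke_check(rows, targets, split, cv_mean_mae):
    """Run the hard (row count) and soft (MAE bound) assertions."""
    train_env, predict_env = make_envs(rows, targets, split)
    preds = MeanLearner().fit(train_env).predict(predict_env)

    # Hard assertion: prediction count equals the predict-env row count.
    assert len(preds) == len(predict_env["X"]), "row-count mismatch"

    # Soft assertion: smoke-set MAE within 3 x the CV mean MAE.
    errors = [abs(p - t) for p, t in zip(preds, predict_env["y"])]
    mae = mean(errors)
    return len(preds), mae, mae <= 3 * cv_mean_mae
```

In a real pytest file the hard assertion would fail the test outright, while the soft result would more likely be reported or marked as a warning; that split is a design choice assumed here, not something the listing specifies.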