Found 26 Skills
Supply-chain testing via package-manager dependency confusion, where internal package names resolve to attacker-controlled packages on public registries, leading to malicious installs and script execution. Use for npm/pip/gem/Maven/Composer/Docker manifest review and authorized red-team supply-chain exercises.
Adaptive exploration pipeline that integrates /brainstorm, /think, and /red-team with intelligent pivoting. Unlike /deepthink (which takes a fixed idea and iterates), /prospect starts with divergent brainstorming, picks the most promising vein, runs deep analysis, and — crucially — can PIVOT back to divergent thinking when: the idea dies under red-team, an adjacent opportunity surfaces during analysis, or the research reveals the real opportunity is elsewhere. Produces a prospecting report: the landscape explored, veins assayed, pivots taken, and the final stake with conviction. Use when the user says "prospect", "explore this space", "find opportunities", "what should I build", "explore and analyze", or has a domain/trend they want to both explore AND evaluate.
LLM guardrails with NeMo, Guardrails AI, and OpenAI. Input/output rails, hallucination prevention, fact-checking, toxicity detection, red-teaming patterns. Use when building LLM guardrails, safety checks, or red-team workflows.
Senior Code Architect & Quality Assurance Engineer for 2026. Specialized in context-aware AI code reviews, automated PR auditing, and technical debt mitigation. Expert in neutralizing "AI-Smells," identifying performance bottlenecks, and enforcing architectural integrity through multi-job red-teaming and surgical remediation suggestions.
Run a model-diverse subagent council to investigate the same problem from multiple perspectives, compare findings, and produce a final recommendation. Use this skill whenever the user asks for a council, second opinions, multiple agents/models to evaluate one question, parallel investigation, red-team/blue-team comparison, or help deciding between competing technical approaches.
Apply structured critical thinking — identifying claims, evidence, reasoning chains, hidden assumptions, and logical fallacies — to evaluate or construct specific written arguments rigorously. Use this skill when the user presents a concrete argument, claim, op-ed, research finding, or piece of reasoning to be analyzed for logical validity or flaws, even if they say 'is this argument valid', 'what logical fallacies are in this', or 'what assumptions am I making in this thesis'. Do NOT use for casual plan review, trip planning, project risk brainstorming, or pre-mortems — 'poke holes in my plan' requests are red-team / risk review, not argument analysis.
Guides ML/research engineering for safeguards—safety classifier development, harm benchmarks and eval suites, labeled dataset design, fine-tuning and ablations, calibration and slice analysis, attack-surface research memos, and promotion criteria for new moderation models. Use when building or evaluating guardrail models, designing safety benchmarks, measuring precision/recall on policy categories, comparing mitigation techniques, or writing research reports on classifier improvements—not for production inference gateways (ml-infrastructure-engineer-safeguards), PII/leakage privacy research (privacy-research-engineer-safeguards), red-team attack campaigns (ai-redteam), AI governance policy (ai-risk-governance), general non-safety research (ai-researcher), or token-efficiency studies (research-engineer-scientist-tokens).
Find every way users can break your AI before they do. Use when you need to red-team your AI, test for jailbreaks, find prompt injection vulnerabilities, run adversarial testing, do a safety audit before launch, prove your AI is safe for compliance, stress-test guardrails, or verify your AI holds up against adversarial users. Covers automated attack generation, iterative red-teaming with DSPy, and MIPROv2-optimized adversarial testing.
Provides calibrated decision analysis using Charlie Munger-style multiple mental models, inversion, incentive mapping, circle-of-competence checks, misjudgment audits, second-order effects, and forecast updates. Use when the user asks for an oracle take, a hard call, a decision memo, a premortem, an outside view, a red-team, a sanity-check, what am I missing, think this through, or wants a strategy, hire, investment, plan, product, partnership, or major life choice analysed. Avoid for simple factual lookups or time-sensitive legal, medical, or market questions without fresh evidence.
End-to-end deep research and analysis pipeline. Takes a raw idea or market question, conducts deep web research, builds a competitive landscape, runs multi-framework intelligence analysis (/think), stress-tests it (/red-team), researches the red team findings, re-thinks with adversarial data, re-red-teams, and iterates until divergence between think and red-team is low (conviction stabilizes). Then generates a comprehensive single-file HTML report with all findings: market landscape, competitive analysis, intelligence briefs, red team results, how to win, and how you could lose. Use when the user says "/deepthink", "deep think", "deep research", or wants a comprehensive research-to-report pipeline on any idea, market, or strategic question.
Use when the user asks to "create an evaluator", "create evals", "create a scenario", "write a test scenario", "design a test case", "test my agent", "build eval coverage", "plan a test suite", "create red team tests", "set up test profiles", "configure conditional actions", "write a conditional action evaluator", "build a deterministic test", "design an IVR test", "IVR navigation test", "write a unit test for a voice agent", "build a regression test", "scripted scenario", "scripted voice test", "structured evaluator", "exact flow test", "sequential conditions", "fixed sequence test", or "run evals". Covers individual evaluator design, suite coverage strategy, test profiles, mock-tool data design, conditional actions (deterministic / unit test / regression / IVR navigation flows), and best practices for workflow / red-team / edge-case / deterministic test types.
Guides AI ops leadership—LLM SRE, model/prompt releases, eval/incidents, cost/capacity, vendors, and cross-functional cadence. Use for AI platform ops, LLM SLAs, incidents, rollout governance, unit economics, red-team/eval gates, and team rituals—not memory (ai-memory-developer), context code (ai-context-engineer), security programs (cybersecurity), token roadmaps (ai-token-improvement-plan-engineer), solution architecture (applied-ai-architect-commercial-enterprise), skills portfolio (ai-skill-manager), or vertical AI product eng management (engineering-manager-vertical-ai-products). Prompt/eval team management and golden-set release policy: engineering-manager-agent-prompts-evals. Safeguard inference platform: ml-infrastructure-engineer-safeguards. Safeguard model research: ml-research-engineer-safeguards.