Found 1,927 Skills
Evaluate solutions through multi-round debate between independent judges until they reach consensus.
Open-source AI observability platform for LLM tracing, evaluation, and monitoring. Use when debugging LLM applications with detailed traces, running evaluations on datasets, or monitoring production AI systems with real-time insights.
Use when finishing a ticket or pull request and the user asks to validate, demo, or sign off on delivered behavior, including non-user-facing changes. Triggers include "UAT", "verify", "walk me through", "show what changed", "can we merge?", "sign off", "acceptance test", "demo this", "ready to merge", "validate the changes", "show me it works", and similar phrases indicating a need for an acceptance walkthrough or demonstration before merge.
Autonomous crypto business development patterns — multi-chain token discovery, 100-point scoring with wallet forensics, x402 micropayments, ERC-8004 on-chain identity, LLM cascade routing, and pipeline automation for CEX/DEX listing acquisition. Use when building AI agents for crypto BD, token evaluation, exchange listing outreach, or autonomous commerce with payment protocols.
Design click/first-click tests to evaluate navigation and information findability.
Analyze real estate and infrastructure investments including REITs, direct property valuation, and infrastructure assets. Use when the user asks about real estate investing, REITs, cap rates, NOI, FFO, AFFO, property valuation, or infrastructure investments. Also trigger when users mention 'rental property analysis', 'cash-on-cash return', 'gross rent multiplier', 'REIT dividends', 'real estate sectors', 'cell towers', 'toll roads', 'LTV ratio', 'DSCR', or ask whether to invest in real estate directly or through REITs.
Analyze a startup from three perspectives: VC investor, job applicant, and CEO/founder. Use this skill whenever the user wants to evaluate a startup, assess whether to invest in or join a startup, do due diligence, evaluate a job offer from a startup, understand a startup's competitive position, or assess company health and trajectory. Triggers: "analyze this startup", "should I join [company]", "is [company] a good investment", "evaluate [company]", "due diligence on [company]", "what do you think of [startup]", "should I take this startup job offer", "how healthy is [company]", "startup assessment", "company analysis", "is [company] worth joining", "what's the outlook for [company]", "research [company] for me", and any mention of evaluating or assessing a startup or tech company from an investment, career, or strategic perspective. Provide all three perspectives by default.
Apply Structural Equation Modeling (SEM) to test hypothesized causal structures by combining measurement models (CFA) and structural models (path analysis). Use this skill when the user needs to validate latent constructs, test mediation or moderation paths, assess model fit with CFI/TLI/RMSEA/SRMR, or when they ask 'do these variables form a causal chain', 'how do I test my theoretical model', or 'is my measurement model valid'.
Build Discounted Cash Flow (DCF) valuation models to estimate intrinsic value. Use this skill when the user needs to value a company, evaluate an investment, estimate fair share price, or build financial projections — even if they say 'what is this company worth', 'should we acquire them', or 'build me a valuation model'.
Provides guidance for automatically evolving and optimizing AI agents across any domain using LLM-driven evolution algorithms. Use when building self-improving agents, optimizing agent prompts and skills against benchmarks, or implementing automated agent evaluation loops.
Help a CS or AI PhD student turn a rough research idea into a validated next-step decision using the handbook's FIVE+C framework. Use this skill whenever the user says they have a research idea, wants to know whether an idea is worth pursuing, needs help choosing between project directions, is preparing to pitch an idea to an advisor or senior student, or feels unsure whether a project is too incremental, too ambitious, already solved, hard to evaluate, or missing resources.
Launch a meta-judge, then a judge sub-agent, to evaluate results produced in the current conversation.