Found 1,928 Skills
Use this skill when the user asks to "evaluate MCP tools", "test tool selection", "improve tool descriptions", "check MCP schema quality", "eval my MCP server", or wants to measure whether Claude uses their MCP tools correctly. Tests tool selection accuracy, analyzes schema quality, and iteratively optimizes descriptions. Companion to build-mcp-server.
A method for iteratively improving text instructions for agents (skills / slash commands / task prompts / CLAUDE.md sections / code generation prompts) by having unbiased executors run them, then evaluating the results from both perspectives (executor self-report + instruction-side metrics). Repeat until improvement plateaus. Use immediately after creating or significantly revising a prompt or skill, or when you suspect that an agent isn't behaving as expected because of ambiguity in the instructions.
Nassim Taleb's Antifragility framework applied to a business idea, system, or portfolio position. Spawns a team of specialist agents (Fat-Tail Detector, Fragility Auditor, Optionality Scout, Iatrogenics Checker, Skin-in-the-Game Auditor), each of which applies a distinct lens from Taleb's Incerto to evaluate whether the subject is fragile, robust, or antifragile. The lead synthesizes their findings into a convexity assessment: the payoff structure under disorder, the hidden tail risks, and the honest Taleb verdict. Use when the user says "taleb this", "is this fragile", "antifragility analysis", "what would Taleb think", "tail risk check", or proposes a business/system and wants structural risk analysis. Works standalone or after /munger for complementary analysis.
Enforce IPPF/UNFPA/UNAIDS publication-standard citations on MEL/SRHR output. Use whenever Ane produces a theory of change, evaluation design, indicator set, donor report, or SRHR programme analysis. Injects current authoritative framework versions with author and year, flags outdated versions, and applies the data-gap protocol. Do not use for non-MEL work.
Run cross-framework agent comparisons using evaluatorq from orqkit — compares any combination of agents (orq.ai, LangGraph, CrewAI, OpenAI Agents SDK, Vercel AI SDK) head-to-head on the same dataset with LLM-as-a-judge scoring. Use when comparing agents, benchmarking, or wanting side-by-side evaluation. Do NOT use when comparing only orq.ai configurations with no external agents (use run-experiment instead).
Discounted cash flow (DCF) valuation model built from Longbridge financial data — historical FCF (operating cash flow minus capex), projected FCF with growth assumptions, WACC (Beta / risk-free rate / equity risk premium), terminal value, intrinsic value vs current price, and margin of safety. Triggers: "DCF", "现金流折现", "内在价值", "自由现金流", "WACC", "折现率", "安全边际", "终值", "现金流贴现", "現金流折現", "內在價值", "自由現金流", "折現率", "安全邊際", "DCF model", "discounted cash flow", "intrinsic value", "free cash flow", "WACC", "discount rate", "margin of safety", "terminal value", "Gordon growth".
Guides senior corporate transaction leadership—deal thesis, valuation and offer strategy, negotiation priorities, structure (cash/stock/earnout/RWI/locked box), IC and board recommendations, adviser and banker management, go/no-go and walk-away, and oversight of execution through close. Use when leading an M&A, divestiture, financing, or JV as deal principal, preparing investment committee or board materials, setting negotiation mandates, or adjudicating price/structure—not for closing matrices and diligence logistics (transaction-manager), contract drafting (corporate-counsel, commercial-counsel), general strategy consulting (business-consultant), or sales quote-to-cash (deal-operations-administrator). Human executives and counsel approve binding terms.
Full evaluation workflow: launch a run, watch progress, and summarize results. Use for end-to-end agent testing.
Query and browse evaluation results stored in MLflow. Use when the user wants to look up runs by invocation ID, compare metrics across models, fetch artifacts (configs, logs, results), or set up the MLflow MCP server. ALWAYS triggers on mentions of MLflow, experiment results, run comparison, invocation IDs in the context of results, or MLflow MCP setup.
Attach judges to AI Config variations for automatic LLM-as-a-judge evaluation. Create custom judges, configure sampling rates, and monitor quality scores.
Implement Cisco's Foundry specification for agentic AI security evaluation systems with a multi-agent architecture.
Validate existing offers using Hormozi's Value Equation. Scores offers, exposes weaknesses, and provides actionable fixes. Activates for "validate my offer," "rate my offer," or "is my offer good."