Found 1,164 Skills
Design click/first-click tests to evaluate navigation and information findability.
Instrument Python LLM apps, build golden datasets, write eval-based tests, run them, and root-cause failures — covering the full eval-driven development cycle. Make sure to use this skill whenever a user is developing, testing, QA-ing, evaluating, or benchmarking a Python project that calls an LLM, even if they don't say "evals" explicitly. Use for making sure an AI app works correctly, catching regressions after prompt changes, debugging why an agent started behaving differently, or validating output quality before shipping.
Tech Stock Earnings Deep-Dive Analysis and Multi-Perspective Investment Memo System (v3.0). Covers 16 major analysis modules (A-P), 6 investment philosophy perspectives, institutional-grade evidence standards, an anti-bias framework, and an actionable decision system. Use this skill when users mention topics such as tech company earnings analysis, quarterly/annual report interpretation, earnings calls, revenue growth analysis, margin changes, guidance, valuation models, DCF, reverse DCF, EV/EBITDA, PEG, Rule of 40, management analysis, competitive landscape, position sizing, whether to buy/sell/add to a tech stock position, how to interpret a company's latest earnings, doing a deep dive, multi-angle valuation, how investment masters view a company, variant view, key forces, kill conditions, ownership structure, executive team, partner ecosystem, or macro policy impact. Even if the user simply asks "help me look at NVDA's latest earnings", "how did META do this quarter", or "should I keep holding MSFT", trigger this skill to provide comprehensive earnings analysis and a multi-perspective investment memo. This skill complements the us-value-investing skill: us-value-investing focuses on four-dimensional scoring of long-term value, while this skill focuses on in-depth dissection of the latest earnings, comprehensive judgment across multiple investment philosophies, and actionable position decisions.
Attach judges to AI Config variations for automatic LLM-as-a-judge evaluation. Create custom judges, configure sampling rates, and monitor quality scores.
Master dispatcher for all MLflow workflows. Use this skill when the user wants to do anything with MLflow — tracing, evaluating, debugging, or improving an agent. Routes to the right MLflow sub-skill automatically. Triggers on: "use mlflow", "help with mlflow", "mlflow agent", "add mlflow to my project", "trace my agent", "evaluate my agent", or any MLflow task without a specific skill in mind.
Produces structured judgment briefs for contested situations — news events, decisions, conflicts, strategy questions. Surfaces hidden bets, real disagreements, unspeakable truths, and who concretely pays. Use when the user wants sharper thinking about something messy, not a summary.
This skill should be used when the user asks to "optimize prompts", "design prompt templates", "evaluate LLM outputs", "build agentic systems", "implement RAG", "create few-shot examples", "analyze token usage", or "design AI workflows". Use for prompt engineering patterns, LLM evaluation frameworks, agent architectures, and structured output design.
Design and evaluate compression strategies for long-running sessions.
Gathers and filters information systematically. Applies scanning, focusing, filtering, triangulating, monitoring, and synthesizing modes to build accurate situational awareness. Use when researching, verifying claims, monitoring signals, or combining multiple sources. Triggers on "what's happening", "verify this", "monitor for", "gather information", "is this true".
LLM observability platform for tracing, evaluation, prompt management, and cost tracking. Use when setting up Langfuse, monitoring LLM costs, tracking token usage, or implementing prompt versioning.
Retrieve industry-specific P/E ratios using Octagon MCP. Use when comparing company valuations to specific industry peers, analyzing sub-sector valuations, and understanding niche market valuations beyond broad sector averages.
MCP (Model Context Protocol) server build and evaluation guide, including local conventions for tool surfaces, config, and testing.