Found 1,927 Skills
Tech hype vs. fundamentals analysis via Longbridge: identifies valuation bubbles and fundamental disconnects in A-share / HK tech stocks. Compares PE / PS / EV-EBITDA historical percentile against actual revenue / profit growth. Analyses which AI / EV / semiconductor theme plays have fundamental support vs. pure sentiment-driven momentum. Triggers: "科技炒作", "AI泡沫", "估值泡沫", "科技估值", "概念股", "主题炒作", "基本面背离", "炒作识别", "科技泡沫", "主題炒作", "基本面背離", "tech hype", "AI bubble", "valuation bubble", "tech valuation", "theme stocks", "hype vs fundamentals", "concept stocks", "narrative vs reality", "AI concept", "semiconductor bubble".
Write, refine, run, and QA promptfoo evaluation suites: promptfooconfig.yaml, prompts, providers, vars, tests, assertions, model-graded rubrics, transforms, datasets, exports, and CI gates. Use for non-redteam eval coverage, regression tests, or new eval matrices. Do not use for adversarial redteam plugin or strategy setup.
Analyze equity securities, factor models, and equity portfolio construction. Use when the user asks about stocks, equity valuation ratios, index construction methods, or style analysis. Also trigger when users mention 'P/E ratio', 'growth vs value', 'market cap weighting', 'sector allocation', 'GICS classification', 'earnings per share', 'Fama-French factors', 'CAPM', 'dividend yield', 'PEG ratio', 'EV/EBITDA', or ask which factors explain equity returns.
Guides ML/research engineering for safeguards—safety classifier development, harm benchmarks and eval suites, labeled dataset design, fine-tuning and ablations, calibration and slice analysis, attack-surface research memos, and promotion criteria for new moderation models. Use when building or evaluating guardrail models, designing safety benchmarks, measuring precision/recall on policy categories, comparing mitigation techniques, or writing research reports on classifier improvements—not for production inference gateways (ml-infrastructure-engineer-safeguards), PII/leakage privacy research (privacy-research-engineer-safeguards), red-team attack campaigns (ai-redteam), AI governance policy (ai-risk-governance), general non-safety research (ai-researcher), or token-efficiency studies (research-engineer-scientist-tokens).
Run, monitor, analyze, and debug LLM evaluations via nemo-evaluator-launcher. Covers running evaluations, checking status and live progress, debugging failed runs, exporting artifacts and logs, and analyzing results. ALWAYS triggers on mentions of running evaluations, checking progress, debugging failed evals, analyzing or analysing runs or results, run directories or artifact paths on clusters, Slurm job issues, invocation IDs, or inspecting logs (client logs, server logs, SSH to cluster, tail logs, grep logs). Do NOT use for creating or modifying evaluation configs.
Bootstrap evaluators from production traces — emit SDK code, a framework-agnostic JSON spec, or publish online LLM-judge evaluators directly to Datadog. Use when user says "bootstrap evaluators", "generate evaluators", "create evals from traces", "eval bootstrap", "write evaluators", "build eval suite", "publish evaluators", or wants to generate BaseEvaluator/LLMJudge code or online judge configs from production LLM trace data. Works with ml_app and optional RCA report or failure hypothesis.
Extract false-positive and false-negative gaps from VLM binary-classification-question (BCQ, yes/no) predictions. Use after running VLM evaluation when you have a predictions JSON and need to identify failure cases for DEFT root cause analysis on a binary-classification VLM workflow.
Help users develop product taste and intuition. Use when someone wants to improve their product judgment, struggles to evaluate design quality, needs to make decisions without complete data, or wants to build better product instincts.
Comprehensive quality auditing and evaluation of tools, frameworks, and systems against industry best practices, with detailed scoring across 12 critical dimensions.
Use when choosing or evaluating a startup revenue model, pricing/value metric, packaging/tier design, or calculating unit economics (LTV, CAC, payback, gross margin, NRR), including usage-based/credit/AI pricing and variable compute/COGS constraints.
Gathers and filters information systematically. Applies scanning, focusing, filtering, triangulating, monitoring, and synthesizing modes to build accurate situational awareness. Use when researching, verifying claims, monitoring signals, or combining multiple sources. Triggers on "what's happening", "verify this", "monitor for", "gather information", "is this true".
Retrieve analysts' price target summary for any stock using Octagon MCP. Use when evaluating analyst sentiment, upside/downside potential, consensus expectations, and tracking target trends over time.