Loading...
Loading...
Found 126 Skills
Conduct statistical hypothesis testing including null/alternative hypothesis formulation, p-values, Type I/II errors, and test statistic selection. Use this skill when the user needs to determine whether a result is statistically significant, choose the right statistical test, interpret p-values correctly, or evaluate research findings — even if they say 'is this result significant', 'which statistical test should I use', or 'what does this p-value mean'.
Use when hunting for threats in an environment, analyzing IOCs, or detecting behavioral anomalies in telemetry. Covers hypothesis-driven threat hunting, IOC sweep generation, z-score anomaly detection, and MITRE ATT&CK-mapped signal prioritization.
Use when planning A/B tests in LaunchDarkly, Optimizely, or similar platforms. Sizes the experiment (sample size, MDE, runtime), drafts hypothesis + success metrics + guardrails, and produces a launch checklist + rollback plan.
Bootstrap evaluators from production traces — emit SDK code, a framework-agnostic JSON spec, or publish online LLM-judge evaluators directly to Datadog. Use when user says "bootstrap evaluators", "generate evaluators", "create evals from traces", "eval bootstrap", "write evaluators", "build eval suite", "publish evaluators", or wants to generate BaseEvaluator/LLMJudge code or online judge configs from production LLM trace data. Works with ml_app and optional RCA report or failure hypothesis.
Automated hypothesis generation and testing using large language models. Use this skill when generating scientific hypotheses from datasets, combining literature insights with empirical data, testing hypotheses against observational data, or conducting systematic hypothesis exploration for research discovery in domains like deception detection, AI content detection, mental health analysis, or other empirical research tasks.
Эксперт анализа распределений. Используй для statistical distributions, data analysis и hypothesis testing.
Use when diagnosing unexpected behavior, failed workflows, bugs, browser or Node.js runtime issues, logs, traces, or when preparing a root-cause hypothesis. 诊断异常、定位 bug、判断修复方向时使用:先建立证据表,区分运行时事实和代码推断,避免多层猜测;证据不足时添加 copy-friendly 浏览器日志或本地 Node.js JSONL 日志。
Use when a BizOps lead, COO, or process-improvement owner needs to document an end-to-end business process (procurement, employee onboarding, incident handoff, customer-onboarding, claims adjudication) in BPMN-style notation, measure cycle times by stage, surface where work spends most of its time waiting vs. being worked, and quantify the gap between processing time and total elapsed time. Pairs Lean / Six Sigma / Theory-of-Constraints canon with deterministic stdlib-only Python tools to produce a process map, a ranked bottleneck list (with severity + root-cause hypothesis), and a cycle-time analysis (P50, P90, value-add ratio, Little's-Law throughput). Distinct from sales-pipeline, system-reliability (SLO), and strategic-OKR work — this is tactical process documentation for internal operations.
Autonomous NeMo-RL research agent workflow for directed hypothesis testing and open-ended discovery. Guides agents through the full experiment lifecycle: understanding recipes and environments, wiring RL or NeMo-gym runs, launching reproducible baselines and iterations, analyzing results, preserving human oversight, and using git plus TSV logs as the research ledger.
When the user wants to design, prioritize, or analyze growth experiments -- including A/B tests, hypothesis frameworks, ICE/RICE scoring, or growth sprints. Also use when the user says "A/B test," "experiment design," "growth sprint," "experiment prioritization," or "statistical significance." For analytics setup, see product-analytics. For growth modeling, see growth-modeling.
Systematic debugging methodology with root cause analysis. Phases: investigate, hypothesize, validate, verify. Capabilities: backward call stack tracing, multi-layer validation, verification protocols, symptom analysis, regression prevention. Actions: debug, investigate, trace, analyze, validate, verify bugs. Keywords: debugging, root cause, bug fix, stack trace, error investigation, test failure, exception handling, breakpoint, logging, reproduce, isolate, regression, call stack, symptom vs cause, hypothesis testing, validation, verification protocol. Use when: encountering bugs, analyzing test failures, tracing unexpected behavior, investigating performance issues, preventing regressions, validating fixes before completion claims.
Construct well-structured arguments using the hypothesis-argument-example triad. Covers formulating falsifiable hypotheses, building logical arguments (deductive, inductive, analogical, evidential), providing concrete examples, and steelmanning counterarguments. Use when writing or reviewing PR descriptions that propose technical changes, justifying design decisions in ADRs, constructing substantive code review feedback, or building a research argument or technical proposal.