Loading...
Loading...
Found 1,927 Skills
Retrieve and analyze simulation results from a Coval run. Use when user wants to review evaluation outcomes or debug agent behavior.
Guide for adding a new benchmark or training environment to NeMo-Gym. Use when the user asks to add, create, or integrate a benchmark, evaluation, training environment, or resources server into NeMo-Gym. Also use when wrapping an existing 3rd-party benchmark library. Covers the full workflow: data preparation, resources server implementation, agent wiring, YAML config, testing, and reward profiling (baselining). Triggered by: "add benchmark", "new resources server", "integrate benchmark", "wrap benchmark", "add training environment", "add eval".
Use when the user wants to iterate on a viral-article scoring system itself, calibrate or improve a scoring prompt against labeled samples, or run batch scoring experiments on a fixed article set. Best for prompt-only scoring research where the evaluator scripts stay fixed and only the scoring rubric/prompt is meant to evolve.
Update financial models with new data — quarterly earnings, management guidance, macro changes, or revised assumptions. Adjusts estimates, recalculates valuation, and flags material changes. Use after earnings, guidance updates, or when assumptions need refreshing. Triggers on "update model", "plug earnings", "refresh estimates", "update numbers for [company]", "new guidance", or "revise estimates".
Simulate a Nature-style reviewer assessment from the referee perspective rather than an author rebuttal. Use when the user wants a pre-submission review, reviewer report, peer-review style critique, novelty/significance/technical soundness assessment, reviewer-style manuscript evaluation, 审稿人视角评估, 预审稿意见, or Nature reviewer report. Return 3 reviewer reports plus a cross-review synthesis, grounded only in the local Nature reviewer source basis.
Extract false-positive and false-negative gaps from VLM binary-classification-question (BCQ, yes/no) predictions. Use after running VLM evaluation when you have a predictions JSON and need to identify failure cases for DEFT root cause analysis on a binary-classification VLM workflow.
Core principles of SEO including E-E-A-T, Core Web Vitals, technical foundations, content quality, and how modern search engines evaluate pages. This skill explains *why* SEO works, not how to execute specific optimizations.
Fast DataFrame library (Apache Arrow). Select, filter, group_by, joins, lazy evaluation, CSV/Parquet I/O, expression API, for high-performance data analysis workflows.
EU MDR 2017/745 compliance specialist for medical device classification, technical documentation, clinical evidence, and post-market surveillance. Covers Annex VIII classification rules, Annex II/III technical files, Annex XIV clinical evaluation, and EUDAMED integration.
Use when choosing or evaluating a startup revenue model, pricing/value metric, packaging/tier design, or calculating unit economics (LTV, CAC, payback, gross margin, NRR), including usage-based/credit/AI pricing and variable compute/COGS constraints.
Analyzes events through computer science lens using computational complexity, algorithms, data structures, systems architecture, information theory, and software engineering principles to evaluate feasibility, scalability, security. Provides insights on algorithmic efficiency, system design, computational limits, data management, and technical trade-offs. Use when: Technology evaluation, system architecture, algorithm design, scalability analysis, security assessment. Evaluates: Computational complexity, algorithmic efficiency, system architecture, scalability, data integrity, security.
LLM observability platform for tracing, evaluation, prompt management, and cost tracking. Use when setting up Langfuse, monitoring LLM costs, tracking token usage, or implementing prompt versioning.