Found 20 Skills
Machine learning development patterns, model training, evaluation, and deployment. Use when building ML pipelines, training models, feature engineering, model evaluation, or deploying ML systems to production.
Evaluates LLMs across 60+ academic benchmarks (MMLU, HumanEval, GSM8K, TruthfulQA, HellaSwag). Use when benchmarking model quality, comparing models, reporting academic results, or tracking training progress. Industry standard used by EleutherAI, HuggingFace, and major labs. Supports HuggingFace, vLLM, APIs.
CRITICAL RULE: You MUST use this skill whenever the task involves machine learning or data analysis. Use this skill if the user's prompt or requirements mention any of the following: clustering, classification, regression, time series forecasting, statistical testing, model comparison, ML, or data analysis. SQL/BigQuery ML HANDOFF: If the user requires a SQL solution, use this skill to dictate the ANALYSIS STEPS (e.g., markdown analysis cells, visualization logic), but defer to `bigquery` for all SQL syntax.
LLM observability platform for tracing, evaluation, and monitoring. Use when debugging LLM applications, evaluating model outputs against datasets, monitoring production systems, or building systematic testing pipelines for AI applications.
Designs structured benchmarks for comparing algorithms, models, or implementations. Selects appropriate metrics (latency, throughput, memory, accuracy), designs representative test cases, captures hardware/software context, produces comparison tables with tradeoff analysis, and includes reproduction instructions. Triggers on: "benchmark", "compare performance", "which is faster", "latency comparison", "memory comparison", "run benchmark", "design benchmark", "compare implementations", "evaluate algorithms", "performance comparison", "throughput test", "speed test". Use this skill when comparing two or more implementations, algorithms, or models.
Roc Curve Plotter - Auto-activating skill for ML Training. Triggers on: roc curve plotter. Part of the ML Training skill category.
Automatically collects hot topics in the AI field, or writes AI technical articles on specified topics in the writing style of 'Second Brother'. It focuses on hands-on tests of AI coding tools (Claude Code, Qoder, Cursor, TRAE, etc.), engineering implementation of large models (SpringAI, LangChain, RAG, etc.), AI Agents and workflow orchestration, evaluation of domestic large models (GLM, Tongyi Qianwen, DeepSeek, MiniMax, Kimi, etc.), and evaluation of various AI tools and Agent tools. Trigger keywords: write an AI article, AI technical article, large model evaluation, AI tool actual test, GLM, Claude Code, Qoder, Cursor, TRAE, SpringAI, RAG, Agent, workflow, domestic large model, collect AI hot topics, AI topic, etc.
INVOKE THIS SKILL when creating, running, or analyzing Arize experiments. Covers experiment CRUD, exporting runs, comparing results, and evaluation workflows using the ax CLI.