Found 24 Skills
Manage Databricks Model Serving endpoints via CLI. Use when asked to create, configure, query, or manage model serving endpoints for LLM inference, custom models, or external models.
Use when the request mentions "LLM inference", "serving LLM", "vLLM", "llama.cpp", "GGUF", "text generation", "model serving", "inference optimization", "KV cache", "continuous batching", "speculative decoding", "local LLM", or "CPU inference".
Flask Ml Api Creator - Auto-activating skill for ML Deployment. Triggers on: flask ml api creator. Part of the ML Deployment skill category.
Vertex Ai Deployer - Auto-activating skill for ML Deployment. Triggers on: vertex ai deployer. Part of the ML Deployment skill category.
Build production ML systems with PyTorch 2.x, TensorFlow, and modern ML frameworks. Implements model serving, feature engineering, A/B testing, and monitoring. Use PROACTIVELY for ML model deployment, inference optimization, or production ML infrastructure.
Onnx Converter - Auto-activating skill for ML Deployment. Triggers on: onnx converter. Part of the ML Deployment skill category.
Model Export Helper - Auto-activating skill for ML Deployment. Triggers on: model export helper. Part of the ML Deployment skill category.
Builds production AI/ML systems — model training, fine-tuning, MLOps pipelines, model serving, evaluation frameworks, RAG optimization, and agent orchestration at scale. Use when the user asks to build, train, or deploy ML models, set up MLOps pipelines, optimize RAG systems, create inference endpoints, or design production AI agents.
Triton Inference Config - Auto-activating skill for ML Deployment. Triggers on: triton inference config. Part of the ML Deployment skill category.
ML API expert. Use for model serving, inference endpoints, FastAPI, and ML deployment.
Coaches end-to-end ML system design interviews covering inference pipelines, recommendation systems, RAG, feature stores, and monitoring. Use for L6+ design rounds, ML architecture whiteboarding, system design practice, serving tradeoff analysis. Activate on "ML system design", "ML interview", "recommendation system design", "RAG architecture", "feature store design", "model serving". NOT for coding interviews, behavioral questions, ML theory quizzes, or paper implementations.
Start, query, and stop a network-specific TAO inference microservice ({network_arch}-inference-microservice) by delegating container execution to the appropriate platform skill. Handles container image resolution, job-payload JSON construction, and the service registry. Use when the user wants to run inference on a TAO model checkpoint using a microservice container, deploy a TAO inference endpoint, or stop a running inference container.