Loading...
Loading...
Found 28 Skills
Multi-cloud orchestration for ML workloads with automatic cost optimization. Use when you need to run training or batch jobs across multiple clouds, leverage spot instances with auto-recovery, or optimize GPU costs across providers.
Kubernetes container orchestration with Helm, operators, and service mesh. Use for cluster management.
Expert knowledge for Azure Service Fabric development including troubleshooting, best practices, decision making, architecture & design patterns, limits & quotas, security, configuration, integrations & coding patterns, and deployment. Use when building Service Fabric clusters, Reliable Actors/Collections, reverse proxy, remoting, or Azure-integrated apps, and other Azure Service Fabric related development tasks. Not for Azure Kubernetes Service (AKS) (use azure-kubernetes-service), Azure App Service (use azure-app-service), Azure Container Apps (use azure-container-apps), Azure Red Hat OpenShift (use azure-redhat-openshift).
Lists TrueFoundry workspaces and clusters. Provides workspace FQNs for deployment, cluster connectivity status, available GPU types, and base domains.
Implement GitOps continuous delivery for Kubernetes using ArgoCD or Flux. Use for automated deployments with Git as single source of truth, pull-based delivery, drift detection, multi-cluster management, and progressive rollouts.
Debug Kubernetes pods, nodes, and workloads. Use when pods are failing, containers crash, nodes are unhealthy, or users mention debugging, troubleshooting, or diagnosing Kubernetes issues.
Expert-level Databricks platform, Apache Spark, Delta Lake, MLflow, notebooks, and cluster management
Master Kubernetes with pods, deployments, services, ingress, ConfigMaps, secrets, and production cluster management.
Remote command execution and file transfer on SageMaker HyperPod cluster nodes via AWS Systems Manager (SSM). This is the primary interface for accessing HyperPod nodes — direct SSH is not available. Use when any skill, workflow, or user request needs to execute commands on cluster nodes, upload files to nodes, read/download files from nodes, run diagnostics, install packages, or perform any operation requiring shell access to HyperPod instances. Other HyperPod skills depend on this skill for all node-level operations.
Manage the full lifecycle of Alibaba Cloud E-MapReduce (EMR) ECS clusters—creation, scaling, renewal, and status queries. Use this Skill when users want to set up big data clusters, view cluster status, add nodes, release nodes, configure auto-scaling, check cluster and node states, or diagnose creation failures. Also applicable for scenarios like "create a Hadoop cluster", "data lake cluster", "running out of resources", "check my cluster", "renew", etc. NOTE: This Skill does NOT support cluster deletion, release, or termination under any circumstances. Any request to delete or terminate a cluster will be refused and redirected to the EMR console.
Orquestração Docker Swarm, gerenciamento de cluster e implantações em produção
OpenSearch development best practices for indexing, querying, search optimization, vector search, and cluster management