Total 50,523 skills; DevOps & Cloud Services has 3,052 skills
Showing 12 of 3,052 skills
Densify integration. Manage data and records, and automate workflows. Use when the user wants to interact with Densify data.
Deploy and manage web apps with Firebase App Hosting. Use this skill when deploying Next.js/Angular apps with backends.
Guides FinOps analysis on AWS, GCP, and Azure: cost visibility and allocation, tagging and showback/chargeback models, rightsizing and waste removal, RI/Savings Plan/CUD recommendations, budgets and forecasts, anomaly detection, unit economics (cost per service/customer), and FinOps cadence with engineering accountability. Use when optimizing cloud spend, analyzing CUR/billing exports, building cost dashboards, explaining bill spikes, or improving allocation. Not for GL mapping, capex, depreciation, or month-end ledger close (compute-accounting-manager), enterprise EA negotiation (enterprise-cloud-architect), hands-on resource provisioning (cloud-engineer), or hardware supply efficiency (data-center-compute-supply-efficiency).
Manage secrets and PKI with HashiCorp Vault. Configure secret engines, authentication methods, and policies. Use when implementing centralized secrets management, dynamic credentials, or certificate management.
Dev environment setup for Megatron Bridge — container-based development, uv package management, lockfile regeneration, adding dependencies, Slurm container usage, and common build pitfalls.
Monitor submitted jobs (PTQ, evaluation, deployment) on SLURM clusters. Use when the user asks "check job status", "is my job done", "monitor my evaluation", "what's the status of the PTQ", "check on job <slurm_job_id>", or after any skill submits a long-running job. Also triggers on "nel status", "squeue", or any request to check progress of a previously submitted job.
Build Docker images for Python services following team conventions. Use this skill when writing Dockerfiles, authoring CI image build pipelines, or adding a new service — covers mitodl image naming, git short-ref tags, relocatable uv venvs, and shared library handling.
Modify existing Pulumi infrastructure stacks safely. Use this skill when making any Pulumi IaC changes — always edit the existing stack entrypoint, never create new files, preserve assumeRole and cross-account configuration, and validate with pulumi preview before finishing.
Watchman Monitoring integration. Manage data and records, and automate workflows. Use when the user wants to interact with Watchman Monitoring data.
Query DTS (Data Transmission Service) task status and details across all Alibaba Cloud regions. v12.1 (enhanced reliability): timeout increased to 10s, with exponential backoff (0.2s, 0.4s) for better timeout handling. Parallel execution remains 6-8x faster than v10 (39s → 6s with --workers 16). API retry logic ensures consistent results (no count variations). Supports filtering by instance ID or job name. Automatically polls all 27 regions and 3 job types. Strictly filters for PrePaid/PostPaid tasks and outputs a full Chinese report with Region information. Tasks are grouped by type (Migration/Sync/Subscribe) and sorted by CreateTime within each group. Use this skill when checking DTS task status, finding migration/sync tasks, verifying task counts, or filtering tasks by instance ID or job name.
Alibaba Cloud Elasticsearch instance diagnosis skill. Use for cluster health checks, troubleshooting, and performance analysis on Elasticsearch instances. Triggers (English): Elasticsearch diagnosis, ES instance issues, slow search, write failures, cluster Red/Yellow, unassigned shards, node disconnected, load imbalance, thread pool 429, JVM/OOM/circuit breaker, disk watermark / read-only index, instance activating / change stuck, service avalanche / all shards failed. Triggers (Chinese, matched literally): ES诊断、阿里云ES、Elasticsearch诊断、ES集群/实例故障排查、ES健康检查、集群红灯/变红/黄灯/变黄、集群异常、分片未分配、主分片未分配、节点掉线/离线、负载不均衡、搜索/查询变慢、慢查询、写入失败/变慢/拒绝、线程池打满、HTTP 429、内存过高、OOM、断路器、磁盘满/水位、索引只读、实例激活中/activating、变更卡住/未完成、雪崩、服务不可用、all shards failed.
Amazon Polly integration. Manage data and records, and automate workflows. Use when the user wants to interact with Amazon Polly data.