Total: 50,391 skills. DevOps & Cloud Services: 3,048 skills.
Showing 12 of 3,048 skills
Plan and run backups, set recovery objectives, and run disaster recovery drills. Use this skill when defining RPO/RTO targets, designing backup architecture, deciding what to back up and how often, planning for full-region or platform outages, or running a restoration drill. Triggers on backup, restore, RPO, RTO, disaster recovery, DR, business continuity, what if the database is gone, what if our hosting goes down, recovery drill, ransomware planning. Also triggers when an incident reveals a gap in restoration capability.
Cognito integration. Manage data and records, and automate workflows. Use when the user wants to interact with Cognito data.
Monitoring, logging, and tracing implementation using OpenTelemetry as the unified standard. Use when building production systems requiring visibility into performance, errors, and behavior. Covers OpenTelemetry (metrics, logs, traces), Prometheus, Grafana, Loki, Jaeger, Tempo, structured logging (structlog, tracing, slog, pino), and alerting.
Guides users through writing, validating, and operationalizing Non-Functional Requirements (NFRs), Service Level Objectives (SLOs), Service Level Indicators (SLIs), and fitness functions. This skill should be used when a user wants to define or review NFRs for a system, translate NFRs into SLOs/SLIs, or generate automatable fitness functions (performance tests, ArchUnit-style architecture tests, availability checks, recovery drills) that validate a system against its non-functional requirements.
Query and download logs from Papertrail using the paperctl CLI. Use when: (1) downloading logs from Taskcluster workers or other systems; (2) searching for specific log entries across systems; (3) investigating CI failures by pulling worker logs; (4) listing available systems or groups in Papertrail. Triggers: "papertrail", "pull logs", "worker logs", "download logs", "search logs"
System tuning
Designs and implements CI/CD pipelines for automated testing, building, deployment, and security scanning across multiple platforms. Covers pipeline optimization, test integration, artifact management, and release automation.
Retrieve staging credentials/JWT token for the Aircall dashboard
Sets up and configures Google Kubernetes Engine (GKE) clusters for production use. Use when creating new GKE clusters, choosing between Autopilot and Standard modes, configuring networking (VPC-native, private clusters), setting up node pools, or planning cluster architecture for Spring Boot microservices. Includes regional vs. zonal decisions, security hardening, and resource provisioning guidance.
Expert-level Helm 3 package management, chart development, templating, and production operations
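Multi-service debugging skill: a debugging workflow for hybrid Vercel + GCP Cloud Run architectures. Use when: troubleshooting cross-service issues, aggregating and analyzing logs, debugging inter-service communication, or locating production failures. Triggers: "调试" (debug), "debug", "日志" (logs), "logs", "错误" (error), "error", "服务" (service), "service", "通信" (communication), "超时" (timeout), "timeout"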
Comprehensive toolkit for generating best-practice Makefiles that follow current standards and conventions. Use this skill when creating new Makefiles, implementing build automation, or building production-ready build systems.