Loading...
Loading...
Found 38 Skills
Expert SRE investigator for incidents and debugging. Uses hypothesis-driven methodology and systematic triage. Can query Axiom observability when available. Use for incident response, root cause analysis, production debugging, or log investigation.
Observability and SRE expert. Use when setting up monitoring, logging, tracing, defining SLOs, or managing incidents. Covers Prometheus, Grafana, OpenTelemetry, and incident response best practices.
Take a list of projects and their related documentation, and organize them into the SRED format for submission.
Expert SRE incident responder specializing in rapid problem resolution, modern observability, and comprehensive incident management. Masters incident command, blameless post-mortems, error budget management, and system reliability patterns. Handles critical outages, communication strategies, and continuous improvement. Use IMMEDIATELY for production incidents or SRE practices.
Guide incident response from detection to post-mortem using SRE principles, severity classification, on-call management, blameless culture, and communication protocols. Use when setting up incident processes, designing escalation policies, or conducting post-mortems.
Expert in SRE practices, incident management, root cause analysis, and automated remediation.
Expert SRE incident responder specializing in rapid problem resolution.
Go back through the previous year of work and create a Notion doc that groups relevant links into projects that can then be documented as SRED projects.
Senior DevOps Engineer with expertise in CI/CD automation, infrastructure as code, monitoring, and SRE practices. Proficient in cloud platforms, containerization, configuration management, and building scalable DevOps pipelines with focus on automation and operational excellence.
Prometheus, Grafana, CloudWatch, Azure Monitor, Stackdriver, logging, alerting, and SRE practices
Expert guidance for designing, implementing, and maintaining cloud infrastructure using Experience in Infrastructure as Code (IaC) principles. Use this skill for architecting cloud solutions, setting up CI/CD pipelines, implementing observability, and following SRE best practices.
Use when defining, reviewing, or operating SLOs/SLIs/error budgets. Triggers on "define an SLO", "what should our SLO be", "error budget", "burn rate", "SLI", "service level objective", "Google SRE workbook", "multi-window burn-rate alert", or any reliability-target question. Ships SLO designer, error-budget calculator with multi-window burn-rate thresholds, and SLO reviewer that catches the common bugs (target too aggressive, window too short, conflicting SLOs, no SLI definition). 4 references on SLO principles + SLI design + error budget math + composition with feature-flags-architect/chaos-engineering/kubernetes-operator. NOT a generic observability skill — specifically the SLO discipline.