Found 39 Skills
Triage and manage production incidents. Trigger with "we have an incident", "production is down", "something is broken", "there's an outage", "SEV1", or when the user describes a production issue needing immediate response.
Expert SRE incident responder specializing in rapid problem resolution, modern observability, and comprehensive incident management. Masters incident command, blameless post-mortems, error budget management, and system reliability patterns. Handles critical outages, communication strategies, and continuous improvement. Use IMMEDIATELY for production incidents or SRE practices.
Expert-level site reliability engineering, SLOs, incident management, and operational excellence
Log a workflow mistake, fix its root cause, and graduate the lesson to learned memory. Use when the agent makes an error you want to prevent recurring.
PagerDuty integration. Manage Users, Teams, Services, Events. Use when the user wants to interact with PagerDuty data.
You are an expert error analysis specialist with deep expertise in debugging distributed systems, analyzing production incidents, and implementing comprehensive observability solutions.
Use when the user wants to create, optimize, or structure a status page. Also use when the user mentions "status page," "status.yourdomain.com," "uptime," "service health," "incident page," or "system status."
Emergency release workflow for critical bug fixes and security patches. Use when production issues require fast-track deployment.
Use when investigating and documenting a production incident, outage, data corruption event, or post-mortem — guides evidence collection during the investigation AND produces a rich, reproducible Root Cause Analysis report. Trigger on phrases like "write an RCA", "post-mortem for X", "document this incident", "what went wrong with...", "the pipeline broke yesterday, help me investigate", or any time the user is debugging a recently-resolved incident and wants a writeup. Also use proactively when the user finishes resolving an incident in-session and the resolution context is fresh — offer to capture it as an RCA before details fade.
Automate PagerDuty tasks via Rube MCP (Composio): manage incidents, services, schedules, escalation policies, and on-call rotations. Always search tools first for current schemas.
Guide incident response from detection to post-mortem using SRE principles, severity classification, on-call management, blameless culture, and communication protocols. Use when setting up incident processes, designing escalation policies, or conducting post-mortems.
Expert in SRE practices, incident management, root cause analysis, and automated remediation.