Found 28 Skills
Use this skill when the user asks to "investigate incident", "triage this alert", "what's firing", "who got paged", "incident response", "check incident status", "SLO breaching", "error budget burned", "check service level", "SLI status", "who was notified", "check notification delivery", "verify alert routing", "MTTR", "incident severity", "error budget", "burn rate", "acknowledge incident", "resolve incident", "production incident", "what alerts are active", "incident timeline", "on-call triage", or wants to triage, manage, or respond to incidents using alerts, SLOs, and notifications.
Use this skill when managing production incidents, designing on-call rotations, writing runbooks, conducting post-mortems, setting up status pages, or running war rooms. Triggers on incident response, incident commander, on-call schedule, pager escalation, runbook authoring, post-incident review, blameless retro, status page updates, war room coordination, severity classification, and any task requiring structured incident lifecycle management.
Log a workflow mistake, fix its root cause, and graduate the lesson to learned memory. Use when the agent makes an error you want to prevent recurring.
Use when defining SLIs/SLOs, managing error budgets, or building reliable systems at scale. Invoke for incident management, chaos engineering, toil reduction, capacity planning.
Automate PagerDuty tasks via Rube MCP (Composio): manage incidents, services, schedules, escalation policies, and on-call rotations. Always search tools first for current schemas.
Expert-level site reliability engineering, SLOs, incident management, and operational excellence.
Conduct systematic root cause analysis to identify underlying problems. Use structured methodologies to prevent recurring issues and drive improvements.
You are an expert error analysis specialist with deep expertise in debugging distributed systems, analyzing production incidents, and implementing comprehensive observability solutions.
This document describes the PagerDuty REST APIs. Use when working with the PagerDuty API or when the user needs to interact with this API.
Expert SRE incident responder specializing in rapid problem resolution, modern observability, and comprehensive incident management. Masters incident command, blameless post-mortems, error budget management, and system reliability patterns. Handles critical outages, communication strategies, and continuous improvement. Use IMMEDIATELY for production incidents or SRE practices.
Triage and manage production incidents. Trigger with "we have an incident", "production is down", "something is broken", "there's an outage", "SEV1", or when the user describes a production issue needing immediate response.
Expert in SRE practices, incident management, root cause analysis, and automated remediation.