Loading...
Loading...
Found 19 Skills
Use this skill when managing production incidents, designing on-call rotations, writing runbooks, conducting post-mortems, setting up status pages, or running war rooms. Triggers on incident response, incident commander, on-call schedule, pager escalation, runbook authoring, post-incident review, blameless retro, status page updates, war room coordination, severity classification, and any task requiring structured incident lifecycle management.
Use when defining SLIs/SLOs, managing error budgets, or building reliable systems at scale. Invoke for incident management, chaos engineering, toil reduction, capacity planning.
Conduct systematic root cause analysis to identify underlying problems. Use structured methodologies to prevent recurring issues and drive improvements.
Log a workflow mistake, fix its root cause, and graduate the lesson to learned memory. Use when the agent makes an error you want to prevent recurring.
Automate PagerDuty tasks via Rube MCP (Composio): manage incidents, services, schedules, escalation policies, and on-call rotations. Always search tools first for current schemas.
Triage and manage production incidents. Trigger with "we have an incident", "production is down", "something is broken", "there's an outage", "SEV1", or when the user describes a production issue needing immediate response.
Create Post Incident Records (PIRs) by analysing incidents discovered from PagerDuty. Orchestrates pagerduty-oncall, datadog-analyser, and traffic-spikes-investigator skills to enrich each incident with observability and traffic data, auto-determines severity, and outputs completed PIR forms. Use when asked to "create a PIR", "write a post incident record", "fill out PIR form", "incident report", "analyse incidents", or after on-call shifts need documentation.
When the user wants to create, optimize, or structure a status page. Also use when the user mentions "status page," "status.yourdomain.com," "uptime," "service health," "incident page," or "system status."
DevOps and IT Ops automation - CI/CD, monitoring, incident management, and infrastructure workflows
Use this skill when implementing SRE practices, defining error budgets, reducing toil, planning capacity, or improving service reliability. Triggers on SRE, error budgets, SLOs, SLAs, toil automation, incident management, postmortems, on-call rotation, capacity planning, chaos engineering, and any task requiring reliability engineering decisions.
You are an expert error analysis specialist with deep expertise in debugging distributed systems, analyzing production incidents, and implementing comprehensive observability solutions.
This document describes the PagerDuty REST APIs.. Use when working with the PagerDuty API or when the user needs to interact with this API.