Found 22 Skills
Guides technical program management—multi-team initiatives with dependencies, milestones, RAID tracking, launch readiness, stakeholder status, and cross-functional coordination across engineering, product, and infrastructure (not application code or BRDs). Use when running a technical program, dependency maps, milestones, exec status, or unblocking cross-team delivery—not for requirements (business-analyst), rollout (deployment-strategist), CI/CD (devops), data roadmaps (data-manager), or single-team delivery (fullstack-software-engineer). Incidents: incident-management-engineer. Architecture: senior-system-architecture. Strategy: business-consultant. Comms: communication-lead. DC site build: data-center-design-execution-lead. DC portfolio: data-center-portfolio-planning-execution-lead. M&A/financing deal execution and closing matrix: transaction-manager. Exec/VIP and community customer escalations: community-executive-escalations-program-manager. CVD/disclosure: technical-program-manager-security-cvd.
Use when defining SLIs/SLOs, managing error budgets, or building reliable systems at scale. Invoke for incident management, chaos engineering, toil reduction, capacity planning.
Conduct systematic root cause analysis to identify underlying problems. Use structured methodologies to prevent recurring issues and drive improvements.
Triage and manage production incidents. Trigger with "we have an incident", "production is down", "something is broken", "there's an outage", "SEV1", or when the user describes a production issue needing immediate response.
Automate PagerDuty tasks via Rube MCP (Composio): manage incidents, services, schedules, escalation policies, and on-call rotations. Always search tools first for current schemas.
Expert SRE incident responder specializing in rapid problem resolution, modern observability, and comprehensive incident management. Masters incident command, blameless post-mortems, error budget management, and system reliability patterns. Handles critical outages, communication strategies, and continuous improvement. Use IMMEDIATELY for production incidents or SRE practices.
Expert-level site reliability engineering, SLOs, incident management, and operational excellence.
PagerDuty integration. Manage Users, Teams, Services, Events. Use when the user wants to interact with PagerDuty data.
When the user wants to create, optimize, or structure a status page. Also use when the user mentions "status page," "status.yourdomain.com," "uptime," "service health," "incident page," or "system status."
Emergency release workflow for critical bug fixes and security patches. Use when production issues require fast-track deployment.
Expert in SRE practices, incident management, root cause analysis, and automated remediation.
DevOps and IT Ops automation: CI/CD, monitoring, incident management, and infrastructure workflows.