Found 3 Skills
An engineering runbook: service overview, alerts table, dashboard links, common procedures with copy-pasteable commands, on-call rotation, and an incident-response checklist. Use when the brief mentions "runbook", "ops doc", "on-call guide", "SRE doc", or "运维手册" (Chinese for "operations manual").
Use this skill when managing production incidents, designing on-call rotations, writing runbooks, conducting post-mortems, setting up status pages, or running war rooms. Triggers on incident response, incident commander, on-call schedule, pager escalation, runbook authoring, post-incident review, blameless retro, status page updates, war room coordination, severity classification, and any task requiring structured incident lifecycle management.
Design and run a monitoring system for a website or web app. Use this skill when setting up uptime checks, defining SLOs, configuring error tracking, choosing what to alert on, designing on-call rotations, or fixing alert fatigue. Triggers on monitoring, alerts, uptime, SLO, SLA, error rate, on-call, pager, alert fatigue, observability, dashboards, what should we monitor. Also triggers when an incident reveals a gap in monitoring.