Search Results: error-budget

Found 13 Skills

DevOps & Cloud Services | alirezarezvani/claude-ski...

slo-architect

Use when defining, reviewing, or operating SLOs, SLIs, or error budgets. Triggers on "define an SLO", "what should our SLO be", "error budget", "burn rate", "SLI", "service level objective", "Google SRE workbook", "multi-window burn-rate alert", or any reliability-target question. Ships an SLO designer, an error-budget calculator with multi-window burn-rate thresholds, and an SLO reviewer that catches common mistakes (target too aggressive, window too short, conflicting SLOs, no SLI definition). Includes four references covering SLO principles, SLI design, error-budget math, and composition with feature-flags-architect, chaos-engineering, and kubernetes-operator. Not a generic observability skill; it covers the SLO discipline specifically.
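For readers unfamiliar with the error-budget and burn-rate arithmetic this entry refers to, here is a minimal sketch of the standard definitions. It is not taken from the slo-architect skill itself; every function name below is hypothetical, and the 30-day window and the 2%-in-1-hour, 5%-in-6-hours figures are the commonly cited examples from the Google SRE Workbook, used here only as illustrative inputs.

```python
HOURS_PER_DAY = 24
MINUTES_PER_DAY = 24 * 60


def error_budget_minutes(slo_target: float, window_days: int) -> float:
    """Minutes of total unavailability the SLO tolerates over the window."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return (1.0 - slo_target) * window_days * MINUTES_PER_DAY


def burn_rate(observed_error_ratio: float, slo_target: float) -> float:
    """How many times faster than 'exactly on budget' errors are occurring."""
    if not 0.0 < slo_target < 1.0:
        raise ValueError("slo_target must be strictly between 0 and 1")
    return observed_error_ratio / (1.0 - slo_target)


def burn_rate_threshold(budget_fraction: float,
                        alert_window_hours: float,
                        slo_window_days: int) -> float:
    """Burn rate that consumes budget_fraction of the budget in alert_window_hours."""
    return budget_fraction * (slo_window_days * HOURS_PER_DAY) / alert_window_hours


def should_page(long_window_error_ratio: float,
                short_window_error_ratio: float,
                slo_target: float,
                threshold: float) -> bool:
    """Multi-window check: fire only if both windows exceed the threshold."""
    return (burn_rate(long_window_error_ratio, slo_target) >= threshold
            and burn_rate(short_window_error_ratio, slo_target) >= threshold)


if __name__ == "__main__":
    # A 99.9% SLO over 30 days tolerates roughly 43.2 minutes of downtime.
    print(round(error_budget_minutes(0.999, 30), 1))
    # Spending 2% of a 30-day budget in 1 hour is a burn rate of about 14.4.
    print(round(burn_rate_threshold(0.02, 1, 30), 1))
    # Spending 5% of a 30-day budget in 6 hours is a burn rate of about 6.
    print(round(burn_rate_threshold(0.05, 6, 30), 1))
```

Requiring both the long and the short window to exceed the threshold is what keeps the alert from firing on a brief spike and lets it stop soon after the error ratio recovers.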


3 scripts/Checked

DevOps & Cloud Services | daemon-blockint-tech/agen...

cicd-engineer

Guides CI/CD for agent skills repositories and skill packages: pipeline design (build, test, validate, package), GitHub Actions for PR checks and release promotion, environment gates, secrets hygiene (no secrets in repo), skill-creator integration (quick_validate.py, package_skill.py), .skill artifact strategy, rollback, and operational runbooks for skill releases. Use when the user mentions CI/CD, CI/CD engineer, pipeline design, GitHub Actions, skill validation CI, package skills, release pipeline, deploy skills, PR checks, continuous integration, or skill release workflow. Not for application-only CI without skill packaging (devops), pre-flight plan go/no-go (build-validator), IDP or golden paths (platform-engineer), org-wide SLO and error-budget programs without pipeline ownership (site-reliability-engineer), or portfolio catalog governance without pipeline YAML (ai-skill-manager).


DevOps & Cloud Services | jeffallan/claude-skills

sre-engineer

Use when defining SLIs/SLOs, managing error budgets, or building reliable systems at scale. Invoke for incident management, chaos engineering, toil reduction, capacity planning.


DevOps & Cloud Services | sickn33/antigravity-aweso...

observability-monitoring-slo-implement

You are an SLO (Service Level Objective) expert specializing in implementing reliability standards and error budget-based practices. Design SLO frameworks, define SLIs, and build monitoring that balances reliability with delivery velocity.


DevOps & Cloud Services | wshobson/agents

slo-implementation

Define and implement Service Level Indicators (SLIs) and Service Level Objectives (SLOs) with error budgets and alerting. Use when establishing reliability targets, implementing SRE practices, or measuring service performance.


DevOps & Cloud Services | sickn33/antigravity-aweso...

incident-responder

Expert SRE incident responder specializing in rapid problem resolution, modern observability, and comprehensive incident management. Masters incident command, blameless post-mortems, error budget management, and system reliability patterns. Handles critical outages, communication strategies, and continuous improvement. Use IMMEDIATELY for production incidents or SRE practices.


DevOps & Cloud Services | thebushidocollective/han

sre-reliability-engineering

Use when building reliable and scalable distributed systems.


DevOps & Cloud Services | patricio0312rev/skills

alerting-dashboard-builder

Creates SLO-based alerts and operational dashboards with key charts, alert thresholds, and runbook links. Use for "alerting", "dashboards", "SLO", or "monitoring".


DevOps & Cloud Services | elastic/agent-skills

observability-manage-slos

Create and manage SLOs in Elastic Observability using the Kibana API. Use when defining SLIs, setting error budgets, or managing SLO lifecycle.


DevOps & Cloud Services | absolutelyskilled/absolut...

site-reliability

Use this skill when implementing SRE practices, defining error budgets, reducing toil, planning capacity, or improving service reliability. Triggers on SRE, error budgets, SLOs, SLAs, toil automation, incident management, postmortems, on-call rotation, capacity planning, chaos engineering, and any task requiring reliability engineering decisions.


DevOps & Cloud Services | 404kidwiz/claude-supercod...

sre-engineer

Expert Site Reliability Engineer specializing in SLOs, error budgets, and reliability engineering practices. Proficient in incident management, post-mortems, capacity planning, and building scalable, resilient systems with focus on reliability, availability, and performance.


DevOps & Cloud Services | daemon-blockint-tech/agen...

site-reliability-engineer

Guides Site Reliability Engineering: SLI/SLO and error budgets, reliability dashboards and burn-rate alerting, production readiness reviews (PRRs), capacity planning for availability, toil reduction, dependency and failure-mode analysis, release reliability (canaries, rollback criteria), and service-owner incident mitigation tied to customer impact. Use when defining or operating SLOs, measuring error budget burn, improving service reliability, running PRRs before launch, planning scalable resilient capacity, or leading technical mitigation during outages. Not for CI/CD pipeline implementation (devops), incident program and paging policy design (incident-management-engineer), cloud access and patch tickets (cloud-system-administrator), load-test profiling (performance-engineer), rollout cutover strategy (deployment-strategist), or greenfield cloud build-out (cloud-engineer).
