bmad-observability-readiness
BMAD Observability Readiness Skill
When to Invoke
Use this skill when the user:
- Mentions missing or low-quality logging, metrics, or tracing.
- Requests monitoring/alerting setup before a launch or major release.
- Needs SLOs, dashboards, or on-call runbooks.
- Reports alert fatigue or noise that needs rationalization.
- Wants to ensure performance and reliability work has data coverage.
If instrumentation already exists and only specific bug fixes are required, hand over to bmad-development-execution with the backlog produced here.
Mission
Deliver a comprehensive observability plan that enables diagnosis, alerting, and measurement across the system. Ensure downstream performance, reliability, and security work has trustworthy telemetry.
Inputs Required
- Architecture diagrams and component inventory.
- Existing logging/monitoring/tracing configuration (if any).
- Current incidents, outages, or blind spots experienced by the team.
- SLAs/SLOs, business KPIs, or compliance reporting requirements.
Outputs
- Observability plan detailing metrics, logs, traces, dashboards, and retention policies.
- Instrumentation backlog with implementation tasks, owners, and acceptance criteria.
- SLO dashboard specification covering golden signals, alert thresholds, and runbook links.
- Updated runbook or escalation paths if gaps were discovered.
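Alert thresholds in an SLO dashboard specification are often expressed as error-budget burn rates. The following is a minimal sketch under that assumption; the function names, the two-window check, and the default threshold of 14.4 are illustrative choices and are not prescribed by this skill.

```python
def burn_rate(error_ratio: float, slo_target: float) -> float:
    """Return how fast the error budget is consumed; 1.0 means exactly on budget."""
    budget = 1.0 - slo_target  # allowed fraction of failed requests
    if budget <= 0:
        raise ValueError("slo_target must be below 1.0")
    return error_ratio / budget


def should_page(short_window_ratio: float,
                long_window_ratio: float,
                slo_target: float,
                threshold: float = 14.4) -> bool:
    """Page only when both windows burn fast, which filters out brief spikes."""
    return (burn_rate(short_window_ratio, slo_target) >= threshold
            and burn_rate(long_window_ratio, slo_target) >= threshold)
```

Requiring both a short and a long window to exceed the threshold is one common way to reduce alert noise; the windows and threshold should be tuned to the SLO period the team actually uses.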
Process
- Audit current telemetry coverage, tooling, and data retention. Document gaps.
- Define observability objectives aligned with user journeys and business KPIs.
- Design instrumentation strategy: metrics taxonomy, structured logging, trace spans, event schemas.
- Establish SLOs, SLIs, and alerting strategy with on-call expectations and noise controls.
- Produce dashboards/reporting requirements and data governance notes.
- Create backlog with prioritized instrumentation tasks and verification approach.
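The structured-logging part of the instrumentation strategy can be sketched with Python's standard logging module. This is an illustration only: the `PII_FIELDS` set, the `fields` attribute name, and the `[REDACTED]` marker are assumptions, and a real logging standard would define its own field list and masking rules.

```python
import json
import logging

# Hypothetical list of sensitive keys; set this from your own data policy.
PII_FIELDS = {"email", "phone", "ssn"}


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object, masking PII fields."""

    def format(self, record: logging.LogRecord) -> str:
        # Structured context is passed via extra={"fields": {...}}.
        fields = getattr(record, "fields", {})
        payload = {
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        for key, value in fields.items():
            payload[key] = "[REDACTED]" if key in PII_FIELDS else value
        return json.dumps(payload, sort_keys=True)


def build_logger(name: str) -> logging.Logger:
    """Create a logger that emits JSON lines to stderr."""
    logger = logging.getLogger(name)
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.propagate = False
    return logger


if __name__ == "__main__":
    log = build_logger("checkout")
    log.info("order placed",
             extra={"fields": {"order_id": "A1", "email": "user@example.com"}})
```

Masking at the formatter keeps redaction in one place, so individual call sites do not need to remember which fields are sensitive.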
Quality Gates
- Every critical user journey has metrics and alerts defined (latency, errors, saturation, traffic).
- Logging standards specify structure, PII handling, and retention.
- Alert runbooks are documented or flagged for creation.
- Observability plan references integration with performance, security, and incident workflows.
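The first gate can be checked mechanically once the plan is written down as data. The sketch below assumes a hypothetical plan format in which each journey maps to the sets of signals that have a metric and an alert; the format and the function name are illustrative, not part of this skill.

```python
GOLDEN_SIGNALS = ("latency", "errors", "saturation", "traffic")


def coverage_gaps(journeys: dict[str, dict[str, set[str]]]) -> dict[str, list[str]]:
    """Return, per journey, the golden signals lacking a metric or an alert."""
    gaps: dict[str, list[str]] = {}
    for name, plan in journeys.items():
        metrics = plan.get("metrics", set())
        alerts = plan.get("alerts", set())
        missing = [s for s in GOLDEN_SIGNALS
                   if s not in metrics or s not in alerts]
        if missing:
            gaps[name] = missing
    return gaps


if __name__ == "__main__":
    plan = {
        "checkout": {"metrics": set(GOLDEN_SIGNALS), "alerts": {"latency", "errors"}},
        "login": {"metrics": set(GOLDEN_SIGNALS), "alerts": set(GOLDEN_SIGNALS)},
    }
    print(coverage_gaps(plan))
```

An empty result means every listed journey passes the gate; any journey that appears in the result needs further instrumentation tasks in the backlog.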
Error Handling
- If telemetry tooling is undecided, present comparative options with trade-offs.
- Highlight dependencies on platform teams or infrastructure before finalizing timeline.
- Escalate when observability requirements conflict with compliance or privacy constraints.