oodle-alerting
Oodle Alerting — Notifiers, Policies, Muting
This skill teaches the agent to wire alert routing end-to-end: notifiers (where alerts go), notification policies (how they're matched and routed), and muting rules (when to silence them).
Prerequisites
```bash
brew install oodle-ai/oodle/oodle
oodle configure
```

Verify all three subsystems respond:

```bash
oodle notifiers list -o json | jq 'length'
oodle notification-policies list -o json | jq 'length'
oodle muting-rules list -o json | jq 'length'
```
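To see which subsystem fails without reading three separate outputs, the checks can be wrapped in a loop. This is a minimal sketch. `check_oodle_subsystems` is a hypothetical helper, not part of the CLI, and it assumes only that each `oodle <resource> list -o json` call exits non-zero when it fails.

```bash
# Hypothetical helper (not part of the oodle CLI): report which alerting
# subsystems respond. Assumes each `oodle <resource> list` call exits non-zero
# on failure (bad API key, unreachable endpoint, and so on).
check_oodle_subsystems() {
  local resource failed=0
  for resource in notifiers notification-policies muting-rules; do
    if oodle "$resource" list -o json >/dev/null 2>&1; then
      printf '%s: ok\n' "$resource"
    else
      printf '%s: FAILED\n' "$resource"
      failed=1
    fi
  done
  # Non-zero if at least one subsystem did not respond.
  return "$failed"
}

# Usage, after `oodle configure`:
#   check_oodle_subsystems || echo "see Failure Handling" >&2
```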
Command Execution Order
Before running any oodle command:
- Check whether the required resource ID or name is already in context.
- If not, run the discovery command (e.g., `oodle notifiers list -o json`).
- If the result is ambiguous, ask the user to confirm before proceeding.
- Run the target command with the resolved ID.
- Do not run speculative commands (e.g., do not `delete` a notifier without checking which policies reference it).
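The discover-then-confirm rule above can be enforced mechanically. This is a sketch with assumptions. `pick_single_id` is a hypothetical helper that reads candidate IDs, one per line, and succeeds only when exactly one remains. The usage line assumes list items carry `name` and `id` fields, which the policy `.id` lookup later in this skill suggests but does not confirm for notifiers.

```bash
# Hypothetical helper: read candidate IDs (one per line) on stdin and print the
# single ID. Exit 1 on no match, 2 on several, so the agent stops and asks.
pick_single_id() {
  local line first="" all="" count=0
  while IFS= read -r line || [ -n "$line" ]; do
    [ -n "$line" ] || continue            # skip blank lines
    count=$((count + 1))
    if [ "$count" -eq 1 ]; then first="$line"; fi
    all="$all $line"
  done
  case $count in
    0) echo "no match; rerun discovery or ask the user" >&2; return 1 ;;
    1) printf '%s\n' "$first" ;;
    *) echo "ambiguous:$all; ask the user which one" >&2; return 2 ;;
  esac
}

# Usage (assumes list items have "name" and "id" fields):
#   id=$(oodle notifiers list -o json \
#     | jq -r '.[] | select(.name == "slack-ops") | .id' \
#     | pick_single_id) || exit 1
```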
Quick Reference
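| Task | Command |
|---|---|
| List notifiers | `oodle notifiers list -o json` |
| Get notifier | `oodle notifiers get <id> -o json` |
| Create notifier | `oodle notifiers create -f notifier.json` |
| Update notifier | `oodle notifiers update <id> -f notifier.json` |
| Delete notifier | `oodle notifiers delete <id> --force` |
| List policies | `oodle notification-policies list -o json` |
| Get policy | `oodle notification-policies get <id> -o json` |
| Create policy | `oodle notification-policies create -f policy.json` |
| Update policy | `oodle notification-policies update <id> -f policy.json` |
| Delete policy | `oodle notification-policies delete <id>` |
| List muting rules | `oodle muting-rules list -o json` |
| Get muting rule | `oodle muting-rules get <id> -o json` |
| Create muting rule | `oodle muting-rules create -f mute.json` |
| Update muting rule | `oodle muting-rules update <id> -f mute.json` |
| Delete muting rule | `oodle muting-rules delete <id>` |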
Common Operations
Notifiers — channel definitions
A notifier wraps a destination (Slack, PagerDuty, email, webhook). The `config` block depends on `type`.

Slack:

```json
{
  "name": "slack-ops",
  "type": "slack",
  "config": { "webhookUrl": "https://hooks.slack.com/services/T000/B000/XXXX" }
}
```

PagerDuty:

```json
{
  "name": "pd-platform",
  "type": "pagerduty",
  "config": { "integrationKey": "abc123def456" }
}
```

Email:

```json
{
  "name": "email-platform",
  "type": "email",
  "config": { "addresses": ["oncall@example.com", "alerts@example.com"] }
}
```

Webhook:

```json
{
  "name": "webhook-incident-bot",
  "type": "webhook",
  "config": { "url": "https://incidents.example.com/oodle" }
}
```
```bash
# ✅ CORRECT
oodle notifiers create -f notifier.json

# ❌ WRONG — missing config block, server returns 400
oodle notifiers create -f <(echo '{"name":"x","type":"slack"}')
```
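Because the `config` key changes with `type`, generating the file from one template makes it harder to forget the `config` block. This is a minimal sketch. `notifier_json` is a hypothetical helper that covers the three single-value types shown above. Email is left out because it takes a list. The helper does no JSON escaping, so it assumes names and values contain no quotes or backslashes.

```bash
# Hypothetical helper: print a notifier definition with the type-specific config
# key used in the examples above. No JSON escaping: inputs must not contain " or \.
notifier_json() {
  local name="$1" ntype="$2" value="$3" key
  case $ntype in
    slack)     key=webhookUrl ;;
    pagerduty) key=integrationKey ;;
    webhook)   key=url ;;
    *) echo "unsupported type: $ntype" >&2; return 1 ;;
  esac
  printf '{"name":"%s","type":"%s","config":{"%s":"%s"}}\n' \
    "$name" "$ntype" "$key" "$value"
}

# Usage:
#   notifier_json pd-platform pagerduty abc123def456 > notifier.json
#   oodle notifiers create -f notifier.json
```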
Notification policies — routing
A notification policy matches alerts by labels and routes them to a receiver (notifier name).
```json
{
  "name": "platform-prod",
  "matchers": [
    {"name": "team", "value": "platform"},
    {"name": "env", "value": "prod"}
  ],
  "receiver": "slack-ops",
  "routes": [
    {
      "matchers": [{"name": "severity", "value": "critical"}],
      "receiver": "pd-platform"
    }
  ]
}
```
```bash
# ✅ CORRECT
oodle notification-policies create -f policy.json

# ❌ WRONG — references a notifier that doesn't exist; server returns 400
# (no "receiver" set, or "receiver": "non-existent")
```
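Since the server rejects a policy whose `receiver` is not an existing notifier name, the policy can be checked before `create`. This is a sketch. `receiver_exists` is a hypothetical helper that reads notifier names, one per line, on stdin. The usage lines assume `oodle notifiers list -o json` returns an array of objects with a `name` field.

```bash
# Hypothetical helper: succeed if the wanted receiver appears among the notifier
# names read from stdin (one per line). Returns 2 for an empty receiver.
receiver_exists() {
  local wanted="$1" name
  if [ -z "$wanted" ]; then
    echo "receiver is empty" >&2
    return 2
  fi
  while IFS= read -r name || [ -n "$name" ]; do
    [ "$name" = "$wanted" ] && return 0
  done
  echo "no notifier named '$wanted'" >&2
  return 1
}

# Usage: check every receiver in policy.json, including nested routes:
#   names=$(oodle notifiers list -o json | jq -r '.[].name')
#   for r in $(jq -r '.. | .receiver? // empty' policy.json); do
#     printf '%s\n' "$names" | receiver_exists "$r" || exit 1
#   done
```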
Muting rules — scheduled silence
Muting rules silence alerts whose labels match the `matchers` between `startsAt` and `endsAt` (RFC 3339 timestamps).

```json
{
  "name": "deploy-window-2024-01-15",
  "matchers": [
    {"name": "service", "value": "api"},
    {"name": "env", "value": "prod"}
  ],
  "startsAt": "2024-01-15T02:00:00Z",
  "endsAt": "2024-01-15T06:00:00Z"
}
```
```bash
# ✅ CORRECT — bounded window
oodle muting-rules create -f mute.json

# ❌ WRONG — endsAt before startsAt (rejected by server)
{"startsAt": "2024-01-15T06:00:00Z", "endsAt": "2024-01-15T02:00:00Z"}
```
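The ordering rule can be checked locally before calling the API. This is a minimal sketch. `mute_window_ok` is a hypothetical helper. It assumes both timestamps use the fixed UTC form of the examples (`YYYY-MM-DDTHH:MM:SSZ`). Under that assumption, comparing the digit strings compares the instants, so no date parsing is needed. RFC 3339 values with offsets such as `+02:00` are rejected here rather than converted.

```bash
# Hypothetical helper: accept only a window where both bounds are present, in
# YYYY-MM-DDTHH:MM:SSZ form, and endsAt is strictly after startsAt.
mute_window_ok() {
  local starts="$1" ends="$2" s e
  local utc='[0-9][0-9][0-9][0-9]-[0-9][0-9]-[0-9][0-9]T[0-9][0-9]:[0-9][0-9]:[0-9][0-9]Z'
  case $starts in $utc) ;; *) echo "bad startsAt: '$starts'" >&2; return 1 ;; esac
  case $ends in $utc) ;; *) echo "bad endsAt: '$ends' (missing, null, or not UTC)" >&2; return 1 ;; esac
  # Same fixed layout, so the digits alone order the timestamps (YYYYMMDDhhmmss).
  s=$(printf '%s' "$starts" | tr -cd '0-9')
  e=$(printf '%s' "$ends" | tr -cd '0-9')
  if [ "$e" -le "$s" ]; then
    echo "endsAt ($ends) must be after startsAt ($starts)" >&2
    return 1
  fi
}

# Usage:
#   mute_window_ok "$(jq -r .startsAt mute.json)" "$(jq -r .endsAt mute.json)" \
#     && oodle muting-rules create -f mute.json
```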
Best Practices
Test a new notifier with a low-severity monitor before wiring it to a critical alert
A misconfigured Slack webhook fails silently — alerts disappear into the void.
```bash
# ✅ CORRECT — create notifier, attach to a severity=info test monitor, fire it once,
# confirm the message arrives, then attach to the production policy.
oodle notifiers create -f notifier.json
oodle monitors create -f test-monitor.json   # severity=info, low threshold
# wait for fire, confirm receipt, then:
oodle notification-policies update pol_123 -f policy.json   # adds the new notifier

# ❌ WRONG — attach an untested notifier directly to a severity=critical policy
oodle notifiers create -f notifier.json
oodle notification-policies create -f policy-critical.json
```
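The manual "confirm receipt" step is the one most easily skipped. One way to keep it explicit is a gate that proceeds only on an affirmative answer. `confirm_receipt` is a hypothetical helper. `pol_123` and the file names are the placeholders from the example above.

```bash
# Hypothetical gate: ask whether the test alert arrived; succeed only on "y"/"yes".
confirm_receipt() {
  local answer=""
  printf 'Did the test alert via %s arrive? [y/N] ' "$1" >&2
  IFS= read -r answer || true     # EOF counts as "no" unless text was read
  case $answer in
    y|Y|yes|YES) return 0 ;;
    *) echo "not confirmed; production policies left unchanged" >&2; return 1 ;;
  esac
}

# Usage:
#   oodle notifiers create -f notifier.json
#   oodle monitors create -f test-monitor.json   # severity=info, low threshold
#   confirm_receipt slack-ops && oodle notification-policies update pol_123 -f policy.json
```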
Always `get` a notifier before deleting it — check which policies reference it

Deleting a notifier referenced by an active policy causes alerts to be dropped silently.
```bash
# ✅ CORRECT — find references first
oodle notifiers get notif_slack_ops -o json
# any() emits each referencing policy once, even if several routes use slack-ops
oodle notification-policies list -o json | jq '.[] | select(.receiver=="slack-ops" or any(.routes[]?; .receiver=="slack-ops")) | .id'
# update those policies to a different receiver, then:
oodle notifiers delete notif_slack_ops --force

# ❌ WRONG — delete without checking; live policies suddenly route to a missing receiver
oodle notifiers delete notif_slack_ops --force
```
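The check-then-delete sequence can be turned into a guard that refuses to delete while references remain. This is a sketch. `policy_refs` and `guarded_delete_notifier` are hypothetical helpers. Like the `jq` filter above, `policy_refs` looks only at the top-level `receiver` and first-level `routes`, matching the example policy shape. The list output is fetched first, so an `oodle` failure stops the sequence instead of being read as "no references".

```bash
# Hypothetical: print IDs of policies whose receiver (top level or first-level
# route) is NAME. Fails if the list call fails, so errors never look like "none".
policy_refs() {
  local name="$1" policies
  policies=$(oodle notification-policies list -o json) || return 1
  printf '%s\n' "$policies" |
    jq -r --arg n "$name" '.[] | select(.receiver == $n or any(.routes[]?; .receiver == $n)) | .id'
}

# Hypothetical: delete notifier ID only when no referencing policy IDs are given.
guarded_delete_notifier() {
  local id="$1"
  [ -n "$id" ] || { echo "missing notifier ID" >&2; return 2; }
  shift
  if [ "$#" -gt 0 ]; then
    echo "refusing to delete $id; still referenced by: $*" >&2
    return 1
  fi
  oodle notifiers delete "$id" --force
}

# Usage:
#   refs=$(policy_refs slack-ops) || exit 1
#   guarded_delete_notifier notif_slack_ops $refs   # unquoted: one argument per ID
```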
Always set `endsAt` on muting rules — never create open-ended mutes

An open-ended mute that nobody remembers becomes a permanent silence on a critical alert.
```bash
# ✅ CORRECT — bounded mute window
"startsAt": "2024-01-15T02:00:00Z", "endsAt": "2024-01-15T06:00:00Z"

# ❌ WRONG — open-ended; alerts stay silenced forever
"startsAt": "2024-01-15T02:00:00Z", "endsAt": null
```
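One way to make an open-ended mute impossible is to compute `endsAt` from a required duration. This is a sketch under two assumptions. First, GNU `date` is available: `-d` parses the RFC 3339 start and `@EPOCH` values, while BSD/macOS `date` uses different flags. Second, names and values need no JSON escaping. `mute_rule_json` is a hypothetical helper.

```bash
# Hypothetical helper: print a muting rule whose endsAt = startsAt + HOURS.
# Args: NAME STARTS_AT HOURS [LABEL=VALUE ...]. Assumes GNU date; no JSON escaping.
mute_rule_json() {
  [ "$#" -ge 3 ] || { echo "usage: mute_rule_json NAME STARTS_AT HOURS [LABEL=VALUE...]" >&2; return 1; }
  local name="$1" starts="$2" hours="$3" start_epoch ends pair matchers="" sep=""
  shift 3
  case $hours in
    ''|*[!0-9]*) echo "hours must be a positive integer" >&2; return 1 ;;
  esac
  if [ "$hours" -le 0 ]; then
    echo "hours must be a positive integer" >&2; return 1
  fi
  start_epoch=$(date -u -d "$starts" +%s) || return 1
  ends=$(date -u -d "@$((start_epoch + hours * 3600))" +%Y-%m-%dT%H:%M:%SZ) || return 1
  # Each remaining argument becomes one exact-match matcher, e.g. service=api.
  for pair in "$@"; do
    matchers="${matchers}${sep}{\"name\":\"${pair%%=*}\",\"value\":\"${pair#*=}\"}"
    sep=","
  done
  printf '{"name":"%s","matchers":[%s],"startsAt":"%s","endsAt":"%s"}\n' \
    "$name" "$matchers" "$starts" "$ends"
}

# Usage (reproduces the deploy-window example above):
#   mute_rule_json deploy-window-2024-01-15 2024-01-15T02:00:00Z 4 service=api env=prod > mute.json
#   oodle muting-rules create -f mute.json
```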
Scope policy matchers to the smallest set of labels that uniquely identify the team

Overly broad matchers (`{env: prod}` only) route every prod alert to one notifier.
```bash
# ✅ CORRECT
"matchers": [{"name":"team","value":"platform"},{"name":"env","value":"prod"}]

# ❌ WRONG — every prod alert in the org routes here
"matchers": [{"name":"env","value":"prod"}]
```
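A lightweight lint can catch the overly broad case before a policy is created. This is a sketch. `matchers_scoped` is a hypothetical helper. It treats `team`, the ownership label used in this skill's examples, as mandatory. Organizations that scope by a different label would change that string.

```bash
# Hypothetical lint: succeed only if the given matcher label names include "team".
matchers_scoped() {
  local label
  for label in "$@"; do
    [ "$label" = "team" ] && return 0
  done
  echo "matchers ($*) have no 'team' label; this policy may catch other teams' alerts" >&2
  return 1
}

# Usage:
#   matchers_scoped $(jq -r '.matchers[].name' policy.json) || exit 1
```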
Use nested `routes` for severity escalation, not separate top-level policies

A child route inherits the parent matchers; defining a separate top-level policy for `severity=critical` will double-fire.
```bash
# ✅ CORRECT
"matchers": [{"name":"team","value":"platform"}],
"receiver": "slack-ops",
"routes": [{"matchers":[{"name":"severity","value":"critical"}],"receiver":"pd-platform"}]

# ❌ WRONG — two top-level policies, both match, both fire
```
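To make the inheritance rule concrete, the sketch below models the CORRECT policy in plain shell. It is an illustration, not Oodle's evaluator. It assumes semantics like Prometheus Alertmanager's routing tree, which this policy format resembles. Matchers use exact label equality, all of them must match, a child route is tried only after its parent matches, and the matching child's receiver replaces the parent's. `has_label` and `route_platform_alert` are hypothetical names.

```bash
# has_label KEY=VALUE LABEL...: true if KEY=VALUE is among the alert's labels.
has_label() {
  local want="$1" l
  shift
  for l in "$@"; do
    [ "$l" = "$want" ] && return 0
  done
  return 1
}

# Model of: matchers team=platform -> slack-ops,
#           child route severity=critical -> pd-platform.
route_platform_alert() {
  if ! has_label team=platform "$@"; then
    return 1              # parent does not match; the child is never consulted
  fi
  if has_label severity=critical "$@"; then
    echo pd-platform      # child inherits team=platform and adds severity=critical
  else
    echo slack-ops
  fi
}

# Usage:
#   route_platform_alert team=platform severity=critical   # prints pd-platform
```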
Failure Handling
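| Error | Cause | Fix |
|---|---|---|
| 401 Unauthorized | Invalid or missing API key | Run `oodle configure` |
| 404 Not Found | Notifier / policy / muting-rule ID does not exist | Verify with the matching `list` command |
| connection refused | Wrong API endpoint | Check the endpoint in the CLI configuration (`oodle configure`) |
| 400 Bad Request | Policy references a notifier name that doesn't exist | Run `oodle notifiers list -o json` and fix `receiver` |
| Slack messages not arriving | Wrong webhook URL or webhook deactivated | Re-issue the Slack incoming webhook; update notifier `config.webhookUrl` |
| PagerDuty incidents not created | Wrong `integrationKey` | Confirm the routing key in PagerDuty; update notifier `config.integrationKey` |
| Mute didn't take effect | Matchers don't match the alert's labels, or the current time is outside `startsAt`/`endsAt` | Run `oodle muting-rules get <id> -o json` and compare with the alert's labels |
| 429 Too Many Requests | Bulk policy sync | Add a delay between requests |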
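For the 429 case, each call in a bulk sync can be retried with exponential backoff. This is a generic sketch. `retry_with_backoff` is a hypothetical helper. It cannot tell a 429 apart from other failures because the CLI's exit codes for HTTP statuses are not documented here, so the attempt count should stay small.

```bash
# Hypothetical helper: run a command, retrying failures with doubling delays.
# Args: MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
retry_with_backoff() {
  local max="$1" delay="$2" attempt=1
  shift 2
  while :; do
    "$@" && return 0
    if [ "$attempt" -ge "$max" ]; then
      echo "giving up after $attempt attempts: $*" >&2
      return 1
    fi
    sleep "$delay"
    delay=$((delay * 2))
    attempt=$((attempt + 1))
  done
}

# Usage (policy files are placeholders):
#   for f in policies/*.json; do
#     retry_with_backoff 5 1 oodle notification-policies create -f "$f" || exit 1
#     sleep 1   # space out bulk calls
#   done
```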