oodle-alerting

Oodle Alerting — Notifiers, Policies, Muting

This skill teaches the agent to wire alert routing end-to-end: notifiers (where alerts go), notification policies (how they're matched and routed), and muting rules (when to silence them).

Prerequisites

bash
brew install oodle-ai/oodle/oodle
oodle configure
Verify all three subsystems respond:
bash
oodle notifiers list -o json | jq 'length'
oodle notification-policies list -o json | jq 'length'
oodle muting-rules list -o json | jq 'length'

Command Execution Order

Before running any oodle command:
  1. Check whether the required resource ID or name is already in context.
  2. If not, run the discovery command (e.g., `oodle notifiers list -o json`).
  3. If the result is ambiguous, ask the user to confirm before proceeding.
  4. Run the target command with the resolved ID.
  5. Do not run speculative commands (e.g., do not `delete` a notifier without checking which policies reference it).
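
The resolve-then-act steps above can be sketched as a small guard. `require_single_match` is a hypothetical helper, not part of the oodle CLI; it assumes candidate IDs arrive on stdin one per line, for example from a `jq` filter over the list output, and it assumes list entries carry `id` and `name` fields.

```bash
# Hypothetical helper (not part of the oodle CLI): reads candidate IDs on
# stdin, one per line, and succeeds only when exactly one candidate exists.
require_single_match() {
  local ids=() line
  while IFS= read -r line || [ -n "$line" ]; do
    [ -n "$line" ] && ids+=("$line")
  done
  case "${#ids[@]}" in
    0) echo "no match: run the list command and check the name" >&2; return 1 ;;
    1) printf '%s\n' "${ids[0]}" ;;
    *) echo "ambiguous (${#ids[@]} matches): ask the user to confirm" >&2; return 2 ;;
  esac
}

# Possible usage (assumes list entries expose "id" and "name"):
#   id="$(oodle notifiers list -o json |
#     jq -r '.[] | select(.name == "slack-ops") | .id' |
#     require_single_match)" && oodle notifiers get "$id" -o json
```

Returning distinct exit codes for "not found" and "ambiguous" lets the caller decide whether to rerun discovery or stop and ask the user.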

Quick Reference

| Task | Command |
|---|---|
| List notifiers | `oodle notifiers list -o json` |
| Get notifier | `oodle notifiers get <id> -o json` |
| Create notifier | `oodle notifiers create -f notifier.json` |
| Update notifier | `oodle notifiers update <id> -f notifier.json` |
| Delete notifier | `oodle notifiers delete <id> --force` |
| List policies | `oodle notification-policies list -o json` |
| Get policy | `oodle notification-policies get <id> -o json` |
| Create policy | `oodle notification-policies create -f policy.json` |
| Update policy | `oodle notification-policies update <id> -f policy.json` |
| Delete policy | `oodle notification-policies delete <id> --force` |
| List muting rules | `oodle muting-rules list -o json` |
| Get muting rule | `oodle muting-rules get <id> -o json` |
| Create muting rule | `oodle muting-rules create -f mute.json` |
| Update muting rule | `oodle muting-rules update <id> -f mute.json` |
| Delete muting rule | `oodle muting-rules delete <id> --force` |

Common Operations

Notifiers — channel definitions

A notifier wraps a destination (Slack, PagerDuty, email, webhook). The `config` block depends on `type`.
Slack:
json
{
  "name": "slack-ops",
  "type": "slack",
  "config": { "webhookUrl": "https://hooks.slack.com/services/T000/B000/XXXX" }
}
PagerDuty:
json
{
  "name": "pd-platform",
  "type": "pagerduty",
  "config": { "integrationKey": "abc123def456" }
}
Email:
json
{
  "name": "email-platform",
  "type": "email",
  "config": { "addresses": ["oncall@example.com", "alerts@example.com"] }
}
Webhook:
json
{
  "name": "webhook-incident-bot",
  "type": "webhook",
  "config": { "url": "https://incidents.example.com/oodle" }
}

✅ CORRECT

oodle notifiers create -f notifier.json

❌ WRONG — missing config block, server returns 400

oodle notifiers create -f <(echo '{"name":"x","type":"slack"}')

Notification policies — routing

A notification policy matches alerts by labels and routes them to a receiver (notifier name).
json
{
  "name": "platform-prod",
  "matchers": [
    {"name": "team", "value": "platform"},
    {"name": "env",  "value": "prod"}
  ],
  "receiver": "slack-ops",
  "routes": [
    {
      "matchers": [{"name": "severity", "value": "critical"}],
      "receiver": "pd-platform"
    }
  ]
}

✅ CORRECT

oodle notification-policies create -f policy.json

❌ WRONG — references a notifier that doesn't exist; server returns 400

(no `receiver` set, or `receiver: "non-existent"`)


Muting rules — scheduled silence

Muting rules silence alerts whose labels match the matchers between `startsAt` and `endsAt` (RFC 3339 timestamps).
json
{
  "name": "deploy-window-2024-01-15",
  "matchers": [
    {"name": "service", "value": "api"},
    {"name": "env",     "value": "prod"}
  ],
  "startsAt": "2024-01-15T02:00:00Z",
  "endsAt":   "2024-01-15T06:00:00Z"
}

✅ CORRECT — bounded window

oodle muting-rules create -f mute.json

❌ WRONG — endsAt before startsAt (rejected by server)

{"startsAt": "2024-01-15T06:00:00Z", "endsAt": "2024-01-15T02:00:00Z"}
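
The window can be checked locally before calling `create`. This is a minimal sketch, not an oodle feature: `mute_window_ok` is a hypothetical helper, and it assumes both timestamps are UTC and written in the same `YYYY-MM-DDTHH:MM:SSZ` shape as the examples, which is what makes plain string comparison match chronological order. Timestamps with offsets or fractional seconds would need real date parsing.

```bash
# Hypothetical pre-flight check. Assumes both timestamps are UTC RFC 3339
# strings in the same "YYYY-MM-DDTHH:MM:SSZ" shape, so that plain string
# comparison agrees with chronological order.
mute_window_ok() {
  local starts="$1" ends="$2"
  if [ -z "$ends" ] || [ "$ends" = "null" ]; then
    echo "endsAt is missing: refusing an open-ended mute" >&2
    return 1
  fi
  if [[ ! "$starts" < "$ends" ]]; then
    echo "endsAt ($ends) must be later than startsAt ($starts)" >&2
    return 1
  fi
}
```

For example, `mute_window_ok "2024-01-15T02:00:00Z" "2024-01-15T06:00:00Z"` succeeds, while swapping the two arguments fails with a message on stderr.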


Best Practices

Test a new notifier with a low-severity monitor before wiring it to a critical alert

A misconfigured Slack webhook fails silently — alerts disappear into the void.

✅ CORRECT: create the notifier, attach it to a `severity=info` test monitor, fire it once, confirm the message arrives, then attach it to the production policy.

oodle notifiers create -f notifier.json
oodle monitors create -f test-monitor.json   # severity=info, low threshold

Wait for the test monitor to fire, confirm receipt, then:

oodle notification-policies update pol_123 -f policy.json   # adds the new notifier

❌ WRONG: attach an untested notifier directly to a `severity=critical` policy

oodle notifiers create -f notifier.json
oodle notification-policies create -f policy-critical.json

Always `get` a notifier before deleting it, and check which policies reference it

Deleting a notifier referenced by an active policy causes alerts to be dropped silently.

✅ CORRECT — find references first

oodle notifiers get notif_slack_ops -o json
oodle notification-policies list -o json | jq '.[] | select(.receiver=="slack-ops" or any(.routes[]?; .receiver=="slack-ops")) | .id'

Update those policies to a different receiver, then:

oodle notifiers delete notif_slack_ops --force

❌ WRONG — delete without checking; live policies suddenly route to a missing receiver

oodle notifiers delete notif_slack_ops --force
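
The check-then-delete flow can be wrapped so that the unsafe path is harder to take by accident. `safe_delete_notifier` below is a hypothetical wrapper, not an oodle subcommand. It assumes `jq` is installed, that the policy list is a JSON array, and that each policy exposes `id`, `receiver`, and an optional `routes` array one level deep; deeper nesting would need a recursive filter.

```bash
# Hypothetical wrapper: refuses to delete a notifier while any policy still
# routes to it, either at the top level or in a first-level nested route.
#   $1 = notifier ID (e.g. notif_slack_ops), $2 = notifier name (e.g. slack-ops)
safe_delete_notifier() {
  local id="$1" name="$2" refs
  # The pipeline status is jq's, so a jq parse failure aborts with code 2.
  refs="$(oodle notification-policies list -o json |
    jq -r --arg n "$name" '
      .[]
      | select(.receiver == $n or any(.routes[]?; .receiver == $n))
      | .id')" || return 2
  if [ -n "$refs" ]; then
    printf 'refusing to delete %s; still referenced by:\n%s\n' "$id" "$refs" >&2
    return 1
  fi
  oodle notifiers delete "$id" --force
}
```

A call such as `safe_delete_notifier notif_slack_ops slack-ops` prints the referencing policy IDs and stops, which leaves time to repoint those policies first.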

Always set `endsAt` on muting rules; never create open-ended mutes

An open-ended mute that nobody remembers becomes a permanent silence on a critical alert.

✅ CORRECT — bounded mute window

"startsAt": "2024-01-15T02:00:00Z", "endsAt": "2024-01-15T06:00:00Z"

❌ WRONG — open-ended; alerts stay silenced forever

"startsAt": "2024-01-15T02:00:00Z", "endsAt": null

Scope policy matchers to the smallest set of labels that uniquely identify the team

Overly broad matchers (`{env: prod}` only) route every prod alert to one notifier.

✅ CORRECT

"matchers": [{"name":"team","value":"platform"},{"name":"env","value":"prod"}]

❌ WRONG — every prod alert in the org routes here

"matchers": [{"name":"env","value":"prod"}]

Use nested `routes` for severity escalation, not separate top-level policies

A child route inherits the parent matchers; defining a separate top-level policy for `severity=critical` will double-fire, because the alert matches both policies.

✅ CORRECT

"matchers": [{"name":"team","value":"platform"}], "receiver": "slack-ops", "routes": [{"matchers":[{"name":"severity","value":"critical"}],"receiver":"pd-platform"}]

❌ WRONG — two top-level policies, both match, both fire


Failure Handling

| Error | Cause | Fix |
|---|---|---|
| 401 Unauthorized | Invalid or missing API key | Run `oodle configure` or set `OODLE_API_KEY` |
| 404 Not Found | Notifier / policy / muting-rule ID does not exist | Verify with the matching `list` command |
| connection refused | Wrong `OODLE_DEPLOYMENT` URL | Check the `OODLE_DEPLOYMENT` env var |
| receiver not found | Policy references a notifier name that doesn't exist | Run `oodle notifiers list -o json` and match the `name` field exactly |
| Slack messages not arriving | Wrong webhook URL or webhook deactivated | Re-issue the Slack incoming webhook; update notifier `config.webhookUrl`; re-test |
| PagerDuty incidents not created | Wrong `integrationKey` or wrong service | Confirm the routing key in PagerDuty; update notifier `config.integrationKey` |
| Mute didn't take effect | `startsAt` is in the future or label matchers don't match the alert | Run `oodle muting-rules get <id> -o json` and compare matchers to the firing alert's labels |
| 429 Too Many Requests | Bulk policy sync | Add `--retries 3`, throttle to <10 creates per second |
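
For the 429 case, a bulk sync can be paced with a fixed pause between calls. The loop below is a rough sketch: `sync_policies` is a hypothetical helper, and the 0.15 s gap is an arbitrary choice that keeps the rate at roughly 6 to 7 creates per second, under the limit of 10 named in the table. The `--retries 3` flag is the one the table mentions.

```bash
# Hypothetical bulk-sync loop: creates one policy per JSON file passed as an
# argument, sleeping between calls so the rate stays below 10 per second.
# Returns non-zero if any create failed, after attempting all files.
sync_policies() {
  local f failed=0
  for f in "$@"; do
    oodle notification-policies create -f "$f" --retries 3 || failed=$((failed + 1))
    sleep 0.15
  done
  [ "$failed" -eq 0 ]
}
```

Invoked as `sync_policies policies/*.json`, it keeps going past individual failures so that one bad file does not block the rest of the sync.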

References
