oodle-drop-rules
Oodle Drop Rules — Cost Control
This skill teaches the agent to drop or sample high-volume metrics safely: estimate the impact first, prefer sampling over dropping, and never silently delete a metric a dashboard depends on.
Prerequisites
```bash
brew install oodle-ai/oodle/oodle
oodle configure
```

Confirm the drop-rules endpoint works:

```bash
oodle drop-rules list -o json | jq 'length'
```
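It can help to fail fast when the toolchain is not ready. The sketch below is a hypothetical helper, `oodle_preflight`, which is not part of the oodle CLI. It checks that `oodle` and `jq` are on `PATH` and that the same `oodle drop-rules list -o json` call succeeds. It assumes that a non-zero exit status means the CLI could not reach or authenticate against the endpoint.

```bash
# Hypothetical helper, not an oodle subcommand: fail fast before touching rules.
oodle_preflight() {
  local tool
  for tool in oodle jq; do
    if ! command -v "$tool" > /dev/null 2>&1; then
      echo "preflight: '$tool' not found on PATH" >&2
      return 1
    fi
  done
  # Same call as the endpoint check above; assumes a non-zero exit means
  # the CLI could not reach or authenticate against the endpoint.
  if ! oodle drop-rules list -o json > /dev/null; then
    echo "preflight: 'oodle drop-rules list' failed; try re-running 'oodle configure'" >&2
    return 1
  fi
}
```

A script that creates or updates rules could start with `oodle_preflight || exit 1`.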
Command Execution Order
Before running any `oodle` command:
- Check whether the target metric prefix and matchers are already in context.
- If they are not, run `oodle metrics list --match <prefix> -o json | jq 'length'` to estimate impact.
- Confirm that no critical dashboards or monitors depend on the affected series:
  - `oodle dashboards list -o json | jq '.[] | select(any(.panels[]?; (.query // "") | contains("<metric>")))'`
  - `oodle monitors list -o json | jq '.[] | select((.query // "") | contains("<metric>"))'`
- Prefer `action: sample` (e.g. `sampleRate: 0.1`) on the first attempt. Switch to `action: drop` only after the sampled data confirms the metric is low-value.
- Run `oodle drop-rules create -f rule.json`.
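The ordered steps above can be enforced in one place. This is a sketch, not an oodle feature. `count_dependents` and `guarded_create` are hypothetical names, and the jq paths (`.panels[].query` on dashboards, `.query` on monitors) rest on the same assumptions as the checks above. Each `oodle` call is captured before jq runs, so a CLI failure stops the function instead of being read as "zero dependents".

```bash
# Sketch only: count_dependents and guarded_create are hypothetical helpers.
# Assumed JSON shapes: dashboards have .panels[].query, monitors have .query.

# Prints how many dashboards plus monitors mention the metric (substring match).
count_dependents() {
  local metric=$1 json dashboards monitors
  json=$(oodle dashboards list -o json) || return 1
  # any() counts each dashboard once; (.query // "") skips panels without a query
  dashboards=$(jq --arg m "$metric" \
    '[.[] | select(any(.panels[]?; (.query // "") | contains($m)))] | length' <<< "$json") || return 1
  json=$(oodle monitors list -o json) || return 1
  monitors=$(jq --arg m "$metric" \
    '[.[] | select((.query // "") | contains($m))] | length' <<< "$json") || return 1
  echo $((dashboards + monitors))
}

# Estimate impact, refuse if anything depends on the metric, then create.
guarded_create() {
  local rule_file=$1 prefix=$2 metric=$3 json series deps
  json=$(oodle metrics list --match "$prefix" -o json) || return 1
  series=$(jq 'length' <<< "$json") || return 1
  echo "rule would affect $series series matching '$prefix'" >&2
  deps=$(count_dependents "$metric") || return 1
  if [ "$deps" -gt 0 ]; then
    echo "refusing: $deps dashboard(s)/monitor(s) reference '$metric'" >&2
    return 1
  fi
  oodle drop-rules create -f "$rule_file"
}
```

Example: `guarded_create rule.json debug_ debug_traffic_total`. Because `contains` is a substring match, the check errs toward refusing. For example, a dashboard that queries `debug_traffic_total_bucket` also counts as a dependent.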
Quick Reference
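| Task | Command |
|---|---|
| List rules | `oodle drop-rules list -o json` |
| Get rule | `oodle drop-rules get <id> -o json` |
| Create rule | `oodle drop-rules create -f rule.json` |
| Update rule | `oodle drop-rules update <id> -f rule.json` |
| Delete rule | `oodle drop-rules delete <id> --force` |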
Common Operations
Rule schema
```json
{
  "name": "drop-noisy-debug-metrics",
  "matchers": [
    {"name": "level", "value": "debug"},
    {"name": "env", "value": "staging"}
  ],
  "action": "drop",
  "sampleRate": null
}
```

| Field | Meaning |
|---|---|
| `matchers` | All matchers must match for the rule to fire (logical AND) |
| `action` | `drop` or `sample` |
| `sampleRate` | Required when `action` is `sample`: a value between 0 and 1; `null` for `drop` |
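Checking a rule file against the table above locally can catch a missing `sampleRate` before the API rejects the rule. This sketch assumes `sampleRate` must be strictly between 0 and 1 for `sample`, and `null` (or absent) for `drop`. The helper name `validate_rule` is hypothetical.

```bash
# Hypothetical local check; mirrors the field table above, assuming sampleRate
# must be strictly between 0 and 1 for sample and null (or absent) for drop.
validate_rule() {
  jq -e '
    (.name | type == "string" and length > 0)
    and (.matchers | type == "array" and length > 0)
    and all(.matchers[]; (.name | type == "string") and (.value | type == "string"))
    and (if .action == "sample" then (.sampleRate | type == "number" and . > 0 and . < 1)
         elif .action == "drop" then .sampleRate == null
         else false end)
  ' "$1" > /dev/null
}
```

Example: `validate_rule rule.json && oodle drop-rules create -f rule.json`. `jq -e` exits non-zero when the expression evaluates to `false`, so an invalid rule file never reaches the create step.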
Estimating impact before creating a rule
```bash
# ✅ CORRECT — count series that will be affected
oodle metrics list --match "debug_" -o json | jq 'length'

# ✅ CORRECT — confirm no dashboard panel queries the metric
oodle dashboards list -o json | jq '.[] | select(any(.panels[]?; (.query // "") | contains("debug_traffic_total")))'

# ✅ CORRECT — confirm no monitor depends on it
oodle monitors list -o json | jq '.[] | select((.query // "") | contains("debug_traffic_total"))'

# ❌ WRONG — create the rule first, find out from on-call later
oodle drop-rules create -f rule.json
```
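To turn the series count into a hard stop, a hypothetical `check_impact` helper can compare it against a limit your team has agreed on. The limit is an assumption of this sketch, not an oodle setting. The helper prints the count and returns non-zero when the prefix matches more series than the limit allows.

```bash
# Hypothetical helper: max_series is a limit your team agrees on, not an oodle setting.
check_impact() {
  local prefix=$1 max_series=$2 json count
  # Capture the CLI output first so an oodle failure is not masked by jq.
  json=$(oodle metrics list --match "$prefix" -o json) || return 1
  count=$(jq 'length' <<< "$json") || return 1
  echo "$count"
  if [ "$count" -gt "$max_series" ]; then
    echo "'$prefix' matches $count series (limit $max_series); review before creating a rule" >&2
    return 1
  fi
}
```

Example: `check_impact debug_ 500 && oodle drop-rules create -f rule.json`.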
Creating a `sample` rule (safer first step)
```json
{
  "name": "sample-debug-metrics-staging",
  "matchers": [
    {"name": "__name__", "value": "debug_traffic_total"},
    {"name": "env", "value": "staging"}
  ],
  "action": "sample",
  "sampleRate": 0.1
}
```

```bash
# ✅ CORRECT — keep 10% of the series; observe for a week before dropping
oodle drop-rules create -f rule.json
```
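Generating rule files like the one above, rather than hand-editing them, keeps the JSON valid and the names consistent. This sketch assumes the field layout shown above. The helper name `make_sample_rule` is illustrative, and so is the choice to turn underscores in the metric name into hyphens in the rule name; oodle does not require either.

```bash
# Hypothetical generator for a rule shaped like the JSON above.
make_sample_rule() {
  local metric=$1 env=$2 rate=$3
  # --argjson keeps the rate a JSON number; the name follows
  # <action>-<metric-or-domain>-<scope>, with underscores turned into hyphens.
  jq -cn --arg metric "$metric" --arg env "$env" --argjson rate "$rate" '{
    name: ("sample-" + ($metric | split("_") | join("-")) + "-" + $env),
    matchers: [
      {name: "__name__", value: $metric},
      {name: "env", value: $env}
    ],
    action: "sample",
    sampleRate: $rate
  }'
}
```

Example: `make_sample_rule debug_traffic_total staging 0.1 > rule.json && oodle drop-rules create -f rule.json`.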
Promoting a `sample` rule to `drop`
```bash
# ✅ CORRECT — get → switch action → update (stop if any step fails)
oodle drop-rules get dr_123 -o json > rule.json &&
  jq '.action = "drop" | .sampleRate = null' rule.json > rule.new.json &&
  oodle drop-rules update dr_123 -f rule.new.json

# ❌ WRONG — partial payload nulls matchers
oodle drop-rules update dr_123 -f <(echo '{"action":"drop"}')
```
Deleting (re-enabling ingestion)
```bash
# ✅ CORRECT — confirm the rule exists, then delete it by ID
oodle drop-rules get dr_123 -o json > /dev/null && oodle drop-rules delete dr_123 --force

# ❌ WRONG — speculative delete by name match
oodle drop-rules delete "$(oodle drop-rules list | grep debug | awk '{print $1}')" --force
```
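When only the rule name is known, a safer alternative to the `grep` pipeline is to match the name exactly and refuse unless exactly one rule matches. This sketch assumes each entry in `oodle drop-rules list -o json` has `id` and `name` fields. `delete_rule_by_name` is a hypothetical helper.

```bash
# Hypothetical helper; assumes list entries carry "id" and "name" fields.
delete_rule_by_name() {
  local name=$1 rules count id
  rules=$(oodle drop-rules list -o json) || return 1
  # Exact name equality, unlike grep, which also matches substrings.
  count=$(jq --arg n "$name" '[.[] | select(.name == $n)] | length' <<< "$rules") || return 1
  if [ "$count" -ne 1 ]; then
    echo "expected exactly one rule named '$name', found $count" >&2
    return 1
  fi
  id=$(jq -r --arg n "$name" '.[] | select(.name == $n) | .id' <<< "$rules") || return 1
  oodle drop-rules get "$id" -o json > /dev/null && oodle drop-rules delete "$id" --force
}
```

With rules named `drop-debug-metrics-staging` and `drop-debug-metrics-staging-old`, `grep debug` would pick up both. This helper deletes only the exact match.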
Best Practices
Estimate impact with `oodle metrics list --match <prefix> -o json | jq 'length'` before creating a rule

A drop rule that matches more series than expected can hide real signal.

```bash
# ✅ CORRECT
oodle metrics list --match "kube_pod_" -o json | jq 'length'
# e.g. 1742 series: confirm with the team that all 1742 are safe to drop before creating a rule

# ❌ WRONG — create rule based on a guess; later discover a critical metric was matched
oodle drop-rules create -f rule.json
```
Prefer `action: sample` with `sampleRate: 0.1` on the first iteration

Sampling preserves enough signal to confirm that the metric truly is low-value before you drop it entirely.

```bash
# ✅ CORRECT — week 1: sample at 10%, observe dashboards
"action": "sample", "sampleRate": 0.1

# week 2: if no dashboards or monitors regressed, switch to drop
"action": "drop", "sampleRate": null

# ❌ WRONG — drop on first attempt; can break a dashboard nobody remembered
"action": "drop"
```
Always include both `env` and `service` (or `__name__`) in matchers

A broad matcher such as `{level: debug}` on its own can match production metrics that happen to share the label.

```bash
# ✅ CORRECT — scoped to one env
"matchers": [{"name":"level","value":"debug"},{"name":"env","value":"staging"}]

# ❌ WRONG — also drops debug metrics in prod
"matchers": [{"name":"level","value":"debug"}]
```
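A quick lint for this practice: the hypothetical `check_scoped_matchers` passes only when a rule has an `env` matcher plus a `service` or `__name__` matcher. It assumes matchers are `name`/`value` pairs, as in the schema above. Note that a rule scoped only by `level` and `env` fails this stricter check.

```bash
# Hypothetical lint: require an env matcher plus a service or __name__ matcher.
check_scoped_matchers() {
  jq -e '
    [.matchers[]?.name] as $names
    | any($names[]; . == "env")
      and any($names[]; . == "service" or . == "__name__")
  ' "$1" > /dev/null
}
```

Example: `check_scoped_matchers rule.json || echo "rule.json is too broad" >&2`.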
Always `get` before `update` to preserve fields

`update` is a full-document replace, so any field missing from the payload is lost.

```bash
# ✅ CORRECT — stop if any step fails
oodle drop-rules get dr_123 -o json > rule.json &&
  jq '.sampleRate = 0.05' rule.json > rule.new.json &&
  oodle drop-rules update dr_123 -f rule.new.json

# ❌ WRONG — drops matchers and action
oodle drop-rules update dr_123 -f <(echo '{"sampleRate":0.05}')
```
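The get, edit, update sequence can be wrapped so that every change sends a full document. In this sketch, `patch_rule` is a hypothetical helper. It applies any jq filter to the fetched rule and refuses to send the update if the result has lost its matchers or action. Temporary files come from `mktemp` and are removed afterwards.

```bash
# Hypothetical helper wrapping the documented get -> jq -> update sequence.
patch_rule() {
  local id=$1 filter=$2 old new rc=1
  old=$(mktemp) || return 1
  new=$(mktemp) || { rm -f "$old"; return 1; }
  # Fetch the full rule, apply the edit, and sanity-check before replacing.
  if oodle drop-rules get "$id" -o json > "$old" &&
     jq "$filter" "$old" > "$new" &&
     jq -e '(.matchers | length) > 0 and (.action | type == "string")' "$new" > /dev/null; then
    oodle drop-rules update "$id" -f "$new"
    rc=$?
  fi
  rm -f "$old" "$new"
  return "$rc"
}
```

Examples: `patch_rule dr_123 '.sampleRate = 0.05'`. Promotion to drop becomes `patch_rule dr_123 '.action = "drop" | .sampleRate = null'`.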
Name rules with `<action>-<metric-or-domain>-<scope>`

Predictable names make it easy to find and revert a rule when a dashboard breaks.

```bash
# ✅ CORRECT
"name": "drop-debug-metrics-staging"
"name": "sample-kube-pod-info-prod"

# ❌ WRONG
"name": "rule1"
```
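The naming convention can be checked before a rule is created. This sketch, `check_rule_name` (hypothetical), makes several assumptions: names start with `drop` or `sample`; they have at least two further hyphen-separated segments, one for the metric or domain and one for the scope; and each segment uses only lowercase letters and digits.

```bash
# Hypothetical check for <action>-<metric-or-domain>-<scope>; assumes lowercase
# letters and digits in each hyphen-separated segment.
check_rule_name() {
  local re='^(drop|sample)(-[[:lower:][:digit:]]+){2,}$'
  [[ $1 =~ $re ]]
}
```

Example: `check_rule_name "$(jq -r .name rule.json)" || echo "rename the rule first" >&2`.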
Failure Handling
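| Error | Cause | Fix |
|---|---|---|
| 401 Unauthorized | Invalid or missing API key | Run `oodle configure` |
| 404 Not Found | Drop rule ID does not exist | Verify the ID with `oodle drop-rules list -o json` |
| connection refused | Wrong endpoint configured | Check the endpoint set with `oodle configure` |
| Validation error on create/update | `action: sample` set without `sampleRate` | Add a `sampleRate` between 0 and 1 |
| Dashboard panel suddenly empty | Drop rule matched a metric the panel queries | Find the rule with `oodle drop-rules list -o json`, then narrow its matchers or delete it with `oodle drop-rules delete <id> --force` |
| Monitor went into a no-data state | Drop rule matched the monitor's metric | Same fix as above; alternatively, switch the rule from `drop` to `sample` |
| Cost did not decrease | Matchers don't actually match the high-volume series | Re-run `oodle metrics list --match <prefix> -o json` and compare the matched series with the rule's matchers |
| 429 Too Many Requests | Bulk drop-rule sync | Add a delay between calls |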
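For the 429 row, one option is to retry with exponential backoff. This is a generic sketch with several limits. `retry_with_backoff` is hypothetical and takes whole-second delays. It retries on any non-zero exit status because it cannot tell a 429 apart from other failures. Retrying `create` can also duplicate a rule if a request succeeded but still reported an error, so list the rules after a bulk sync.

```bash
# Generic retry sketch: retries on any non-zero exit, doubling an integer delay.
retry_with_backoff() {
  local attempts=$1 delay=$2 i
  shift 2
  for ((i = 1; i <= attempts; i++)); do
    "$@" && return 0
    if ((i < attempts)); then
      sleep "$delay"
      delay=$((delay * 2))
    fi
  done
  return 1
}
```

Example: `retry_with_backoff 5 1 oodle drop-rules create -f rule.json`.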