| # | Discipline | Use when … |
|---|---|---|
| 1 | Forecast and Prediction | Predicting future metric values for capacity planning, cost estimation, or proactive alerting |
| 2 | Detecting Changes | A metric shifted — find when the character of the signal changed, regardless of whether it crossed a limit |
| 3 | Detecting Violations | A metric is currently out of bounds — find entities that exceed or fall below an acceptable range |
| 4 | Timeseries Characteristics | Characterizing a signal's seasonality, noise level, and trend before further analysis |

| Question | Tool | Why |
|---|---|---|
| "Did this metric change in the last N hours?" | | Detects when the signal's character changed (spike, step, trend onset, variability shift) without requiring a known acceptable limit |
| "Which services spiked or dropped recently?" | | Finds the specific entities and timestamps where change occurred; returns empty for stable signals |
| "When did CPU start trending up?" | | Pinpoints the onset of a directional shift |
| "Which hosts are currently above 90% CPU?" | | Known fixed limit — fire alerts when exceeded |
| "Which services are currently above their usual load?" | | Learns the normal distribution from the data and flags sustained threshold violations |
| "Which services are high right now vs. their weekly pattern?" | | Accounts for time-of-day/day-of-week patterns before deciding what is anomalous |

Pitfall: Running `adaptive-anomaly-detector` on a broad fleet to answer "which service changed load?" typically flags every service that has any variation, producing low-signal results. Use `timeseries-novelty-detection` first to identify entities where the load character genuinely shifted, then use the anomaly detectors to measure the severity of those specific signals.

`timeseries-forecast`, `adaptive-anomaly-detector`, `seasonal-baseline-anomaly-detector`, `static-threshold-analyzer`, `timeseries-novelty-detection`, `execute-dql`

| Column | Content |
|---|---|
| Rank | 🥇 🥈 🥉 ordered by urgency or magnitude |
| Signal / Entity | Metric name and entity or dimension |
| Last Actual | Most recent non-null value from the historical series |
| Forecast | Point forecast at the end of the horizon |
| Range | Lower – Upper confidence band at the same horizon point |
| Trend | % change from Last Actual to Forecast: 🔴 >+20% / 🟠 +5–20% / 🟢 ±5% stable / 🔵 −5–20% declining / ⚫ <−20% sharp drop |
| Action | ✅ No action / ⚠️ Monitor / 🔴 Act now |
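
The Trend column above can be reproduced with a few lines of arithmetic. The following is a minimal Python sketch, not part of any tool's API: the function name `classify_trend` and the bucket labels `sharp_rise` and `rise` are hypothetical, and the handling of values that land exactly on 5% or 20% is an assumption, since the table does not say which side they belong to.

```python
from typing import Tuple


def classify_trend(last_actual: float, forecast: float) -> Tuple[float, str]:
    """Return (percent change, bucket) for one row of the output table."""
    if last_actual == 0:
        # Percent change has no meaning against a zero baseline.
        raise ValueError("percent change is undefined when Last Actual is 0")

    # % change from Last Actual to Forecast, as defined in the Trend row.
    pct = (forecast - last_actual) * 100.0 / abs(last_actual)

    # Assumption: values exactly on a boundary fall into the milder bucket.
    if pct > 20:
        bucket = "sharp_rise"   # red: above +20%
    elif pct > 5:
        bucket = "rise"         # orange: +5% to +20%
    elif pct >= -5:
        bucket = "stable"       # green: within plus or minus 5%
    elif pct >= -20:
        bucket = "decline"      # blue: -5% to -20%
    else:
        bucket = "sharp_drop"   # black: below -20%
    return pct, bucket


if __name__ == "__main__":
    for last, fc in [(100.0, 130.0), (100.0, 103.0), (100.0, 70.0)]:
        print(last, fc, classify_trend(last, fc))
```

Dividing by the absolute value of Last Actual keeps the sign of the change meaningful for metrics that can be negative; for strictly positive metrics such as CPU usage it makes no difference.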

`forecast`, `timeseries-forecast`, `references/forecasting-analyzer.md`

`timeseries`, `arrayLast(arr)`, `arrayFirst(arr)`, `(arrayLast - arrayFirst) / number_of_intervals`, `filter isNotNull(field)`, `toLong()`, `Long`, `dt.smartscape.*`, `dt.entity.*`, `by:{}`

```
// Hourly CPU per host over 24h: 4-point moving average, latest value, and net change; top 20 by net change
timeseries cpu = avg(dt.host.cpu.usage), from: now()-24h, interval: 1h, by: {dt.smartscape.host}
| fieldsAdd moving_avg = arrayMovingAvg(cpu, 4)
| fieldsAdd current = arrayLast(cpu)
| fieldsAdd trend = arrayLast(cpu) - arrayFirst(cpu)
| filter isNotNull(current)
| sort trend desc
| limit 20
| fields dt.smartscape.host, current, trend, moving_avg
```

```
// 7-day p95 CPU per host, bucketed into saturation-risk tiers (>85 HIGH, >70 MEDIUM, otherwise LOW)
timeseries cpu = avg(dt.host.cpu.usage), from: now()-7d, interval: 1h, by: {dt.smartscape.host}
| fieldsAdd p95 = arrayPercentile(cpu, 95)
| fieldsAdd saturation_risk = if(p95 > 85, "HIGH", else: if(p95 > 70, "MEDIUM", else: "LOW"))
| filter isNotNull(p95)
| sort p95 desc
| fields dt.smartscape.host, p95, saturation_risk
```

```
// Linear estimate of days until 90% CPU from 30 daily averages; 9999 means no upward growth
timeseries cpu = avg(dt.host.cpu.usage), from: now()-30d, interval: 1d, by: {dt.smartscape.host}
| fieldsAdd current = arrayLast(cpu)
| fieldsAdd daily_growth = (arrayLast(cpu) - arrayFirst(cpu)) / 30
| filter isNotNull(current)
| fieldsAdd days_to_saturation = if(daily_growth > 0, toLong((90 - current) / daily_growth), else: 9999)
| sort days_to_saturation asc
| limit 20
| fields dt.smartscape.host, current, daily_growth, days_to_saturation
```
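
The arithmetic behind `days_to_saturation` in the 30-day query can be checked by hand. The following is a rough Python sketch of that linear extrapolation, not the query engine's implementation: the function name and parameters are hypothetical, and truncation with `int()` is an assumption about how `toLong()` rounds.

```python
def days_to_saturation(first: float, current: float,
                       window_days: int = 30,
                       limit: float = 90.0,
                       never: int = 9999) -> int:
    """Estimate days until `limit` is reached, assuming growth stays linear."""
    # Average change per day across the window, as in the query's daily_growth.
    daily_growth = (current - first) / window_days
    if daily_growth <= 0:
        # Flat or declining usage: the query reports the sentinel 9999.
        return never
    # Remaining headroom divided by the daily growth rate.
    return int((limit - current) / daily_growth)


if __name__ == "__main__":
    print(days_to_saturation(60.0, 75.0))  # growing host
    print(days_to_saturation(70.0, 70.0))  # flat host
    print(days_to_saturation(80.0, 95.0))  # host already above the limit
```

A host that is already above the limit while still growing yields a negative value, so an ascending sort places it ahead of every host that has headroom left. The estimate uses only the first and last points, so a single unusual day at either end of the window shifts the result considerably.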

```
// Rank hosts by how far the latest hourly CPU value sits from the 24h mean
timeseries cpu = avg(dt.host.cpu.usage), from: now()-24h, interval: 1h, by: {dt.smartscape.host}
| fieldsAdd baseline_avg = arrayAvg(cpu)
| fieldsAdd current = arrayLast(cpu)
| fieldsAdd anomaly_score = if(isNotNull(current) and isNotNull(baseline_avg), abs(current - baseline_avg), else: 0)
| sort anomaly_score desc
| limit 20
| fields dt.smartscape.host, current, baseline_avg, anomaly_score
```
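
To make the scoring in the 24-hour query concrete, here is a small Python sketch of the same idea: the score is the absolute distance between the latest value and the window mean. It is an illustration under assumptions, not a description of how DQL evaluates the query. In particular, skipping nulls when averaging and treating a null latest value as score 0 are assumptions, and the host names are made up.

```python
from statistics import fmean
from typing import Dict, List, Optional, Sequence, Tuple


def anomaly_score(series: Sequence[Optional[float]]) -> float:
    """Absolute deviation of the latest value from the window mean."""
    values = [v for v in series if v is not None]
    if not values or series[-1] is None:
        # Mirrors the query's fallback of 0 when a value is missing.
        return 0.0
    # The baseline includes the latest point, as arrayAvg(cpu) does in the query.
    return abs(series[-1] - fmean(values))


def rank_hosts(cpu_by_host: Dict[str, Sequence[Optional[float]]],
               limit: int = 20) -> List[Tuple[str, float]]:
    """Return up to `limit` (host, score) pairs, highest score first."""
    scored = [(host, anomaly_score(s)) for host, s in cpu_by_host.items()]
    return sorted(scored, key=lambda item: item[1], reverse=True)[:limit]


if __name__ == "__main__":
    sample = {
        "host-a": [10.0, 10.0, 10.0, 50.0],  # hypothetical data
        "host-b": [40.0, 40.0, 40.0, 40.0],
    }
    print(rank_hosts(sample))
```

The score is in raw metric units and is not normalized by variability, so a host whose CPU is always noisy can outrank a normally quiet host that has just moved by a smaller but more unusual amount.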

```
// Discover metric keys containing "cpu" seen in the last hour, most frequent first
metrics from: now() - 1h
| filter contains(metric.key, "cpu")
| summarize count(), by: {metric.key}
| sort `count()` desc
```

`references/forecasting-analyzer.md`, `timeseries-forecast`, `references/capacity-forecasting.md`, `references/anomaly-scoring.md`, `adaptive-anomaly-detector`, `seasonal-baseline-anomaly-detector`, `static-threshold-analyzer`, `references/novelty-detection.md`, `timeseries-novelty-detection`, `references/trend-detection.md`, `timeseries-novelty-detection`