pagerduty-oncall


PagerDuty On-Call Incident Investigator

Authenticate, list escalation policies, fetch all incidents and their details, then analyse relevance across Envato on-call teams.

Arguments

  • $ARGUMENTS[0] (optional): start date in YYYY-MM-DD format. Defaults to today's date.
  • $ARGUMENTS[1] (optional): end date in YYYY-MM-DD format. Defaults to today's date.
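
The defaulting rule for the two date arguments can be sketched as below. This is an illustration only: `resolveDateRange` and `formatLocalDate` are hypothetical names, and the real `scripts/fetch-pd.js` may handle dates differently. It assumes "today" means the user's local calendar date and that a start date later than the end date should be rejected.

```javascript
// Hypothetical helpers; not taken from scripts/fetch-pd.js.
const DATE_RE = /^\d{4}-\d{2}-\d{2}$/;

// Format a Date as YYYY-MM-DD using local (not UTC) calendar fields.
function formatLocalDate(d) {
  const pad = (n) => String(n).padStart(2, "0");
  return `${d.getFullYear()}-${pad(d.getMonth() + 1)}-${pad(d.getDate())}`;
}

// args[0] is the start date, args[1] the end date; both fall back to today.
function resolveDateRange(args, now = new Date()) {
  const today = formatLocalDate(now);
  const start = args[0] || today;
  const end = args[1] || today;
  for (const value of [start, end]) {
    if (!DATE_RE.test(value)) {
      throw new Error(`Expected YYYY-MM-DD, got "${value}"`);
    }
  }
  // YYYY-MM-DD strings sort chronologically, so string comparison is enough.
  if (start > end) {
    throw new Error(`Start date ${start} is after end date ${end}`);
  }
  return { start, end };
}
```

Calling `resolveDateRange(process.argv.slice(3))` would line up with the positional arguments used in the fetch command, assuming the output directory is passed first.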

Target Escalation Policies

The list of escalation policies to investigate is resolved in order:
  1. config.json: the escalation_policies array in config.json
  2. PD_ESCALATION_POLICIES: comma-separated env var (e.g. "Elements On Call, Platform Engineering (GPET) On-Call")
  3. If both are empty, all escalation policies are included
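
The resolution order above might look roughly like this in Node. `resolveTargetPolicies` is a hypothetical name, and the sketch assumes the first non-empty source wins outright rather than being merged with the next one; the actual script may behave differently.

```javascript
// Sketch only: the real fetch script may resolve policies differently.
// Assumption: the first non-empty source wins; sources are not merged.
function resolveTargetPolicies(config = {}, env = process.env) {
  // 1. escalation_policies array from the parsed config.json
  const fromConfig = Array.isArray(config.escalation_policies)
    ? config.escalation_policies.map((name) => String(name).trim()).filter(Boolean)
    : [];
  if (fromConfig.length > 0) return fromConfig;

  // 2. Comma-separated PD_ESCALATION_POLICIES environment variable
  const fromEnv = (env.PD_ESCALATION_POLICIES || "")
    .split(",")
    .map((name) => name.trim())
    .filter(Boolean);
  if (fromEnv.length > 0) return fromEnv;

  // 3. Both empty: null signals "no filter, include every policy"
  return null;
}
```

Because names are split on commas, a policy whose name itself contains a comma could not be expressed through the env var in this sketch and would need to go in config.json.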

System Requirements

Output Directory

All intermediate JSON and the final report are saved to:
.pagerduty-oncall-tmp/
├── ep-list.json              # Parsed escalation policies
├── incidents.json            # Parsed incident list (filtered by target EPs)
├── logs/<INCIDENT_ID>.json   # Parsed log per incident
├── notes/<INCIDENT_ID>.json  # Parsed notes per incident
├── analytics/<INCIDENT_ID>.json # Parsed analytics per incident
├── summary.json              # Execution summary (counts, errors)
└── report.md                 # Final analysis report

Execution

1. Fetch All Data

Run the single fetch script. It handles authentication, EP listing, incident listing, and gathering logs/notes/analytics for each incident — all sequentially to avoid PagerDuty API rate limits.
bash
node scripts/fetch-pd.js .pagerduty-oncall-tmp $ARGUMENTS[0] $ARGUMENTS[1]
If this fails with an authentication error, use AskUserQuestion to inform the user and link to the PagerDuty CLI User Guide for setup instructions. Do NOT continue until the script succeeds.
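
To illustrate what "all sequentially" means for the per-incident requests, here is a minimal sketch. It is not the contents of `scripts/fetch-pd.js`: `fetchSequentially` and the `fetchOne` callback are hypothetical, and the optional pause between requests is an assumption rather than documented behaviour.

```javascript
const sleep = (ms) => new Promise((resolve) => setTimeout(resolve, ms));

// fetchOne is a hypothetical async callback supplied by the caller,
// e.g. something that loads logs, notes, or analytics for one incident ID.
async function fetchSequentially(ids, fetchOne, delayMs = 0) {
  const results = {};
  const errors = [];
  for (const id of ids) {
    try {
      // Awaiting inside the loop keeps at most one request in flight.
      results[id] = await fetchOne(id);
    } catch (err) {
      // Record the failure and continue with the remaining incidents.
      errors.push({ id, message: err.message });
    }
    if (delayMs > 0) await sleep(delayMs);
  }
  return { results, errors };
}
```

Collecting failures instead of aborting would make it possible to report error counts in a summary file, though whether the real script does so is not stated here.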

2. Analyse and Report

Read summary.json first to understand the scope. Then read incidents.json and all files from the logs/, notes/, and analytics/ subdirectories using the Read tool.
Produce a structured analysis and save it using Write to .pagerduty-oncall-tmp/report.md:
  1. Incident Summary Table — For each incident: ID, title, service, escalation policy, status, urgency, created/resolved timestamps (user's local time, not UTC), duration
  2. Cross-Team Correlation — Identify incidents that overlap in time across different escalation policies. Flag potential cascading failures or shared root causes
  3. Timeline — Chronological view of all incidents across all teams in user's local time, highlighting clusters of activity
  4. Key Findings — Patterns, recurring services, repeated triggers, or escalation policy gaps
  5. Recommendations — Actionable suggestions based on the analysis
After writing the report, inform the user of the report location:
.pagerduty-oncall-tmp/report.md
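
The time-overlap check behind the Cross-Team Correlation item could be sketched as follows. The field names (`id`, `escalationPolicy`, `createdAt`, `resolvedAt`) are hypothetical and may not match the shape of incidents.json; unresolved incidents are assumed to run until "now".

```javascript
// Field names here are illustrative, not the actual incidents.json schema.
function findCrossPolicyOverlaps(incidents, now = Date.now()) {
  const spans = incidents
    .map((incident) => ({
      id: incident.id,
      policy: incident.escalationPolicy,
      start: Date.parse(incident.createdAt),
      // Assumption: an unresolved incident is still open, so it ends "now".
      end: incident.resolvedAt ? Date.parse(incident.resolvedAt) : now,
    }))
    .sort((a, b) => a.start - b.start);

  const pairs = [];
  for (let i = 0; i < spans.length; i++) {
    for (let j = i + 1; j < spans.length; j++) {
      // Sorted by start time: once one span starts after span i ends,
      // no later span can overlap span i either.
      if (spans[j].start > spans[i].end) break;
      if (spans[i].policy !== spans[j].policy) {
        pairs.push([spans[i].id, spans[j].id]);
      }
    }
  }
  return pairs;
}
```

Each returned pair is only a candidate for a cascading failure or shared root cause; overlapping in time does not by itself show the incidents are related.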