Building Incident Response Playbooks
When to Use
- Establishing an incident response program from scratch or maturing an existing one
- Documenting procedures for a new incident type after a novel attack
- Automating response workflows in a SOAR platform (Cortex XSOAR, Splunk SOAR)
- Preparing for compliance audits requiring documented IR procedures (SOC 2, PCI-DSS, HIPAA)
- Conducting a gap analysis of existing IR capabilities against specific threat scenarios
Do not use for one-time ad hoc investigations; playbooks are reusable procedure documents, not case-specific reports.
Prerequisites
- Organizational risk assessment identifying top incident scenarios by likelihood and impact
- NIST SP 800-61r3 or SANS PICERL framework adopted as the organizational IR standard
- Asset inventory with business criticality ratings and data classification
- RACI chart defining roles: Incident Commander, SOC analysts, system administrators, legal, communications
- Existing detection capabilities inventory (SIEM rules, EDR detections, IDS signatures)
- SOAR platform access if building automated playbooks
Workflow
Step 1: Select and Scope the Incident Type
Define the specific scenario the playbook will address:
- Identify the top incident types based on organizational risk assessment and historical data
- Scope each playbook to a single incident type for clarity (do not combine unrelated scenarios)
- Define trigger conditions that activate the playbook
Common playbook types:
Priority Playbooks (build first):
1. Ransomware incident response
2. Phishing/credential compromise
3. Business email compromise
4. Malware infection
5. Data breach/exfiltration
6. DDoS attack
7. Insider threat
8. Account takeover
9. Web application compromise
10. Cloud infrastructure compromise
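One way to decide which playbooks to build first is a simple likelihood-times-impact ranking. The sketch below is only an illustration: the `Scenario` class, the 1 to 5 scales, and every score are hypothetical placeholders, not values from a real risk assessment.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Scenario:
    name: str
    likelihood: int  # hypothetical scale: 1 (rare) to 5 (frequent)
    impact: int      # hypothetical scale: 1 (minor) to 5 (severe)

    @property
    def risk(self) -> int:
        # Plain multiplicative score; substitute your own risk model.
        return self.likelihood * self.impact


def rank_scenarios(scenarios):
    """Return scenarios ordered from highest to lowest risk score."""
    return sorted(scenarios, key=lambda s: s.risk, reverse=True)


# Placeholder scores for illustration only.
scenarios = [
    Scenario("DDoS attack", likelihood=2, impact=3),
    Scenario("Ransomware incident response", likelihood=3, impact=5),
    Scenario("Phishing/credential compromise", likelihood=5, impact=3),
]
ranked = rank_scenarios(scenarios)
for s in ranked:
    print(f"{s.risk:>2}  {s.name}")
```

Scenarios with equal scores keep their input order, so ties still need a human decision about which playbook comes first.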
Step 2: Define the Playbook Structure
Every playbook should follow a consistent structure:
PLAYBOOK TEMPLATE
━━━━━━━━━━━━━━━━
1. Playbook Metadata
- Name, version, owner, last review date
- Trigger conditions
- Severity criteria
2. RACI Matrix
- Who is Responsible, Accountable, Consulted, Informed for each step
3. Detection & Triage
- How the incident is detected
- Initial triage checklist
- Severity classification criteria
4. Containment
- Short-term containment actions
- Long-term containment actions
- Evidence preservation requirements
5. Eradication
- Root cause identification
- Malware/threat removal steps
- Verification procedures
6. Recovery
- System restoration steps
- Validation criteria
- Monitoring requirements post-recovery
7. Post-Incident
- Lessons learned meeting trigger
- Report template
- Detection improvement actions
8. Communication
- Internal notification matrix
- External notification requirements (regulators, customers, law enforcement)
- Status update cadence
9. Appendices
- Tool-specific procedures
- Contact lists
- Evidence collection checklists
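If playbooks are kept as structured data rather than free text, the template can be enforced mechanically. The following is a minimal sketch under that assumption; the `Playbook` class, its field names, and the choice of which sections count as required are all hypothetical.

```python
from dataclasses import dataclass, field
from datetime import date

# Section titles taken from the template; treating these six as
# mandatory is an assumption made for this sketch.
REQUIRED_SECTIONS = (
    "Detection & Triage",
    "Containment",
    "Eradication",
    "Recovery",
    "Post-Incident",
    "Communication",
)


@dataclass
class Playbook:
    name: str
    version: str
    owner: str
    last_reviewed: date
    trigger: str
    sections: dict = field(default_factory=dict)  # section title -> list of steps

    def missing_sections(self):
        """List required sections that are absent or contain no steps."""
        return [s for s in REQUIRED_SECTIONS if not self.sections.get(s)]


draft = Playbook(
    name="Phishing Incident Response",
    version="0.1",
    owner="SOC Manager",
    last_reviewed=date(2025, 11, 1),
    trigger="Phishing email reported via abuse@ mailbox or phishing button",
    sections={"Detection & Triage": ["Extract email headers"], "Containment": []},
)
print(draft.missing_sections())
```

A check like this could run in a review pipeline so that a draft with empty sections is caught before it is published.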
Step 3: Write Decision Trees and Escalation Criteria
Define clear decision points with binary outcomes:
Detection Alert Received
├── Is the alert a true positive?
│   ├── YES → Classify severity
│   │   ├── P1 (Critical) → Page incident commander, begin containment immediately
│   │   ├── P2 (High) → Notify IR lead, begin investigation within 30 min
│   │   ├── P3 (Medium) → Queue for investigation within 4 hours
│   │   └── P4 (Low) → Document and investigate within 24 hours
│   └── NO → Document as false positive, tune detection rule
└── Cannot determine → Escalate to Tier 2 for deeper analysis
Escalation triggers:
- Any P1 incident: Immediate escalation to IR lead and CISO
- Data exfiltration confirmed: Legal counsel and privacy officer notified
- Customer data involved: Customer notification process activated
- Third-party involvement: Vendor security contact engaged
- Law enforcement needed: General counsel authorizes before contact
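The triage decision tree above translates directly into code, which is useful when the same logic later has to drive a SOAR workflow. This is a sketch only: the `Severity` enum and `route_alert` function are hypothetical names, while the action strings are copied from the tree.

```python
from enum import Enum
from typing import Optional


class Severity(Enum):
    P1 = "Critical"
    P2 = "High"
    P3 = "Medium"
    P4 = "Low"


# Actions and deadlines copied from the decision tree.
ACTIONS = {
    Severity.P1: "Page incident commander, begin containment immediately",
    Severity.P2: "Notify IR lead, begin investigation within 30 min",
    Severity.P3: "Queue for investigation within 4 hours",
    Severity.P4: "Document and investigate within 24 hours",
}


def route_alert(true_positive: Optional[bool],
                severity: Optional[Severity] = None) -> str:
    """Return the next action; true_positive=None means 'cannot determine'."""
    if true_positive is None:
        return "Escalate to Tier 2 for deeper analysis"
    if not true_positive:
        return "Document as false positive, tune detection rule"
    if severity is None:
        raise ValueError("a true positive needs a severity classification")
    return ACTIONS[severity]


print(route_alert(True, Severity.P2))
print(route_alert(None))
```

Modeling "cannot determine" as an explicit third state keeps an uncertain analyst from being forced into a yes/no answer.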
Step 4: Define Specific Technical Procedures
Write tool-specific instructions for each step (not generic guidance):
CONTAINMENT - Endpoint Isolation via CrowdStrike:
1. Open Falcon Console > Hosts > Search for affected hostname
2. Click on the host > Host Details
3. Click "Contain Host" button in upper right
4. Confirm isolation (host will only communicate with CrowdStrike cloud)
5. Document containment action in incident ticket with timestamp
6. Verify containment: Host should show "Contained" status badge
CONTAINMENT - Block C2 Domain at DNS:
1. SSH to DNS server: ssh admin@dns-primary.corp.local
2. Add to block zone: echo 'zone "evil.com" { type master; file "/etc/bind/db.sinkhole"; };' | sudo tee -a /etc/bind/named.conf.local
3. Validate the configuration and reload DNS: sudo named-checkconf && sudo rndc reload
4. Verify: dig @dns-primary evil.com (should resolve to sinkhole IP 10.0.0.99)
5. Document blocked domain in incident ticket
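Typing zone stanzas by hand during an incident invites mistakes, so the DNS block procedure may benefit from a small generator that validates the domain first. The sketch below assumes the same sinkhole zone file path used in the procedure; `sinkhole_stanza` and the hostname pattern are hypothetical helpers, and the zone name and file path are emitted as quoted strings, the form shown in BIND's documentation.

```python
import re

# Conservative hostname pattern (an assumption for this sketch); it rejects
# quotes, braces, semicolons, and whitespace that could corrupt named.conf.
DOMAIN_RE = re.compile(
    r"^(?=.{1,253}$)([a-z0-9](?:[a-z0-9-]{0,61}[a-z0-9])?\.)+[a-z]{2,63}$"
)


def sinkhole_stanza(domain: str, zone_file: str = "/etc/bind/db.sinkhole") -> str:
    """Build a BIND zone stanza that points a domain at the sinkhole zone file."""
    domain = domain.strip().rstrip(".").lower()
    if not DOMAIN_RE.match(domain):
        raise ValueError(f"refusing to write suspicious domain: {domain!r}")
    return f'zone "{domain}" {{ type master; file "{zone_file}"; }};'


print(sinkhole_stanza("Evil.com"))
```

The generated line would still go through a configuration check and reload on the DNS server; this helper only reduces the chance of a malformed entry.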
Step 5: Integrate with SOAR Platform
Convert manual playbook steps into automated workflows:
- Map each playbook step to a SOAR action (API call, script, human decision point)
- Define automation boundaries (what runs automatically vs. what requires analyst approval)
- Build enrichment automations for the triage phase
- Create containment automations with approval gates for high-impact actions
- Configure notification automations for stakeholder communication
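The automation boundary described above can be sketched as an approval gate: low-impact actions run unattended, while high-impact ones wait for an analyst. This is a platform-neutral illustration, not the API of Cortex XSOAR or Splunk SOAR; `Action`, `execute`, and the sample actions are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Action:
    name: str
    run: Callable[[], str]
    high_impact: bool = False  # high-impact actions require analyst approval


def execute(action: Action, approve: Callable[[Action], bool]) -> str:
    """Run an action, consulting the approval callback for high-impact ones."""
    if action.high_impact and not approve(action):
        return f"{action.name}: skipped (approval denied)"
    return f"{action.name}: {action.run()}"


def deny_all(action: Action) -> bool:
    # Stand-in for an analyst prompt; always declines in this demo.
    return False


enrich = Action("enrich_iocs", run=lambda: "3 indicators enriched")
isolate = Action("isolate_host", run=lambda: "host contained", high_impact=True)

print(execute(enrich, deny_all))   # enrichment runs without approval
print(execute(isolate, deny_all))  # containment is held at the gate
```

In a real workflow the approval callback would be replaced by the platform's own human-decision step.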
Step 6: Test and Maintain the Playbook
Validate the playbook through exercises and keep it current:
- Conduct tabletop exercises with the IR team walking through the playbook
- Perform live-fire exercises simulating the incident type in a test environment
- Review and update after every real incident that uses the playbook
- Schedule quarterly reviews for accuracy of contact lists, tool procedures, and escalation paths
- Track playbook metrics: mean time to contain, mean time to resolve, false positive rate
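The time-based metrics in the last item can be computed from incident timestamps. The sketch below assumes each incident record carries `detected`, `contained`, and `resolved` timestamps; those field names and all sample values are hypothetical.

```python
from datetime import datetime
from statistics import mean


def mean_minutes(incidents, start_key, end_key):
    """Average elapsed minutes between two timestamps, skipping open incidents."""
    durations = [
        (i[end_key] - i[start_key]).total_seconds() / 60
        for i in incidents
        if i.get(start_key) and i.get(end_key)
    ]
    return mean(durations) if durations else None


# Hypothetical incident records for illustration only.
incidents = [
    {"detected": datetime(2025, 1, 6, 9, 0),
     "contained": datetime(2025, 1, 6, 9, 40),
     "resolved": datetime(2025, 1, 6, 12, 0)},
    {"detected": datetime(2025, 1, 9, 14, 0),
     "contained": datetime(2025, 1, 9, 15, 20),
     "resolved": datetime(2025, 1, 9, 17, 0)},
    {"detected": datetime(2025, 1, 12, 8, 0),
     "contained": None,
     "resolved": None},  # still open, excluded from both averages
]

mttc = mean_minutes(incidents, "detected", "contained")
mttr = mean_minutes(incidents, "detected", "resolved")
print(f"Mean time to contain: {mttc:.0f} min")
print(f"Mean time to resolve: {mttr:.0f} min")
```

Tracking these per playbook, rather than across all incidents, shows whether a specific playbook revision actually shortened response times.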
Key Concepts
| Term | Definition |
|---|---|
| Playbook | Documented, repeatable set of procedures for responding to a specific incident type |
| Runbook | More granular than a playbook; step-by-step technical instructions for a specific task within a playbook |
| RACI Matrix | Responsibility assignment chart defining who is Responsible, Accountable, Consulted, and Informed for each activity |
| Decision Tree | Flowchart-based logic defining the response path based on binary conditions at each decision point |
| Escalation Criteria | Predefined conditions that trigger notification of higher-level personnel or external parties |
| SOAR Playbook | Automated workflow in a Security Orchestration, Automation, and Response platform executing playbook steps |
Tools & Systems
- Cortex XSOAR: SOAR platform with visual playbook editor, 700+ integrations, and collaborative War Room
- Splunk SOAR: SOAR platform integrated with Splunk ES, drag-and-drop playbook builder with 2,800+ automated actions
- TheHive: Open-source incident response platform with case templates that function as playbook frameworks
- Confluence / GitLab Wiki: Documentation platforms for maintaining human-readable playbook documents with version control
- Tines: No-code security automation platform for building playbook workflows without programming
Common Scenarios
Scenario: Building a Phishing Response Playbook from Scratch
Context: An organization with a 5-person SOC has no documented phishing response procedure. Analysts handle phishing reports inconsistently.
Approach:
- Interview SOC analysts to document their current ad hoc process
- Define the trigger: user reports phishing email via abuse@ mailbox or phishing button
- Write triage steps: extract email headers, check sender reputation, analyze URLs/attachments in sandbox
- Define containment: quarantine email from all mailboxes, block sender domain, reset passwords if credentials entered
- Build SOAR automation: auto-extract IOCs from reported email, enrich via VirusTotal, create case in TheHive
- Test with simulated phishing email and measure response time improvement
Pitfalls:
- Writing overly generic procedures that don't reference specific tool interfaces or commands
- Not including the communication plan for notifying users who received the phishing email
- Forgetting to define the criteria for when a phishing report becomes a full incident investigation
- Not versioning the playbook or scheduling regular review cycles
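The IOC extraction step in this scenario can be prototyped with the Python standard library before it is wired into a SOAR platform. This is a minimal sketch: `extract_iocs`, the URL pattern, and the sample message are hypothetical, it only reads `text/plain` parts, and enrichment and case creation are left out.

```python
import re
from email import message_from_string
from email.utils import parseaddr
from urllib.parse import urlparse

# Simple URL pattern for plain-text bodies; an assumption, not a full URL grammar.
URL_RE = re.compile(r"https?://[^\s\"'<>]+")


def extract_iocs(raw_email: str) -> dict:
    """Pull the sender domain and any URLs out of a reported email."""
    msg = message_from_string(raw_email)
    sender = parseaddr(msg.get("From", ""))[1]
    body = ""
    for part in msg.walk():
        if part.get_content_type() == "text/plain":
            payload = part.get_payload(decode=True) or b""
            charset = part.get_content_charset() or "utf-8"
            body += payload.decode(charset, errors="replace")
    urls = sorted(set(URL_RE.findall(body)))
    hosts = sorted({urlparse(u).hostname for u in urls if urlparse(u).hostname})
    return {
        "sender_domain": sender.rpartition("@")[2].lower(),
        "urls": urls,
        "url_hosts": hosts,
    }


# Made-up sample using reserved example domains.
SAMPLE = """From: helpdesk@example.net
To: user@corp.com
Subject: Password expires today

Reset here: http://login.example.org/reset?id=1
"""

iocs = extract_iocs(SAMPLE)
print(iocs)
```

HTML bodies and attachments would need separate handling, and the From header is easily spoofed, so the extracted sender domain is a lead for triage rather than proof of origin.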
Output Format
INCIDENT RESPONSE PLAYBOOK
============================
Playbook Name: Phishing Incident Response
Version: 2.1
Owner: SOC Manager
Last Reviewed: 2025-11-01
Next Review: 2026-02-01
Trigger: Phishing email reported via abuse@corp.com or phish button
RACI MATRIX
Activity | SOC L1 | SOC L2 | IR Lead | Legal | Comms
Initial Triage | R | C | I | |
Email Analysis | R | A | I | |
Containment | | R | A | I |
Credential Reset | | R | A | |
User Notification | | C | A | | R
Regulatory Notification | | | C | R | A
Lessons Learned | C | C | R | I | I
PROCEDURE STEPS
[Detailed steps with tool-specific instructions]
DECISION TREE
[Flowchart logic]
ESCALATION MATRIX
[Conditions and contacts]
METRICS
Target MTTA (mean time to acknowledge): 15 minutes
Target MTTC (mean time to contain): 1 hour
Target MTTR (mean time to resolve): 4 hours
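A RACI matrix in this format can be checked automatically before the playbook is published. The sketch below transcribes the sample matrix and applies one common convention, that every activity has at least one Responsible role and exactly one Accountable role; that rule and the `raci_issues` helper are assumptions of this example, not requirements stated by the playbook.

```python
ROLES = ["SOC L1", "SOC L2", "IR Lead", "Legal", "Comms"]

# Rows transcribed from the sample matrix; blank cells are empty strings.
MATRIX = {
    "Initial Triage":          ["R", "C", "I", "", ""],
    "Email Analysis":          ["R", "A", "I", "", ""],
    "Containment":             ["", "R", "A", "I", ""],
    "Credential Reset":        ["", "R", "A", "", ""],
    "User Notification":       ["", "C", "A", "", "R"],
    "Regulatory Notification": ["", "", "C", "R", "A"],
    "Lessons Learned":         ["C", "C", "R", "I", "I"],
}


def raci_issues(matrix, roles=ROLES):
    """Report rows that break the assumed R/A conventions."""
    issues = []
    for activity, cells in matrix.items():
        if len(cells) != len(roles):
            issues.append(f"{activity}: expected {len(roles)} cells, found {len(cells)}")
            continue
        if cells.count("R") == 0:
            issues.append(f"{activity}: no Responsible role")
        if cells.count("A") != 1:
            issues.append(
                f"{activity}: expected exactly one Accountable role, found {cells.count('A')}"
            )
    return issues


issues = raci_issues(MATRIX)
for issue in issues:
    print(issue)
```

Under that convention the check reports the Initial Triage and Lessons Learned rows, which list no Accountable role; whether that is a gap or a deliberate choice is for the playbook owner to decide.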