root-cause-analysis

Root Cause Analysis Orchestration Skill

This skill helps you systematically identify the root cause of any problem using proven methodologies from the Toyota Production System and other industry-standard techniques.

Quick Reference: When to Load Which Resource

| Your Problem Type | Load Resource | Why |
| --- | --- | --- |
| Need to understand 5 Whys, Fishbone, Pareto, Fault Tree methodology | resources/rca-methodologies.md | Learn each method step-by-step with examples |
| Looking for common root causes in your domain | resources/common-root-causes.md | Pattern match against known causes: software, hardware, process, personal |
| Want to see complete worked examples | resources/example-analyses.md | Study real cases: software bugs, vehicle maintenance, system failures, personal problems |
| Advanced: need barrier analysis, complex cause mapping | resources/advanced-techniques.md | Formal methods: Fault Tree, Barrier Analysis, multi-methodology chains |

Core Principle

Do not treat symptoms—find and fix the root cause. As Taiichi Ohno, architect of the Toyota Production System, said: "By repeating why five times, the nature of the problem as well as its solution becomes clear."

Orchestration Protocol

Phase 1: Problem Classification

Quickly identify your problem domain and complexity:
Problem Domain:
  • Software: Code bugs, system failures, performance, deployment
  • Hardware: Equipment, mechanical, electrical, maintenance
  • Process: Workflow, procedures, organizational, communication
  • Personal: Life challenges, productivity, habits, wellbeing
Complexity Level:
  • Simple: Clear failure chain, 1-2 likely causes → Use 5 Whys
  • Complex: Multiple possible causes, unknown scope → Start with Fishbone
  • Critical/Safety: High stakes, needs rigor → Use Fault Tree
  • Multiple Issues: Many competing problems → Use Pareto first
Action: Load appropriate resource file(s) based on classification.

Phase 2: Methodology Selection

Based on problem type, select your approach:
| Situation | Recommended | Load |
| --- | --- | --- |
| Single clear failure | 5 Whys | rca-methodologies.md |
| Complex/multiple possibilities | Fishbone → 5 Whys | rca-methodologies.md |
| Competing priorities | Pareto → 5 Whys | rca-methodologies.md |
| Safety/high-stakes | Fault Tree | advanced-techniques.md |
| Process breakdown | Barrier Analysis | advanced-techniques.md |
| Pattern matching | Common causes + 5 Whys | common-root-causes.md |
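
The selection table above can be encoded as a small lookup. This is only a sketch: the situation keys, the `Recommendation` type, and the `recommend` helper are hypothetical names chosen for illustration, and the paths assume the `resources/` file names listed in the Quick Reference.

```python
from typing import NamedTuple


class Recommendation(NamedTuple):
    methods: tuple[str, ...]  # methodologies to apply, in order
    resource: str             # resource file to load


# Mirrors the Phase 2 table; the keys are hypothetical identifiers.
SELECTION = {
    "single_clear_failure": Recommendation(
        ("5 Whys",), "resources/rca-methodologies.md"),
    "complex_multiple_possibilities": Recommendation(
        ("Fishbone", "5 Whys"), "resources/rca-methodologies.md"),
    "competing_priorities": Recommendation(
        ("Pareto", "5 Whys"), "resources/rca-methodologies.md"),
    "safety_high_stakes": Recommendation(
        ("Fault Tree",), "resources/advanced-techniques.md"),
    "process_breakdown": Recommendation(
        ("Barrier Analysis",), "resources/advanced-techniques.md"),
    "pattern_matching": Recommendation(
        ("Common causes", "5 Whys"), "resources/common-root-causes.md"),
}


def recommend(situation: str) -> Recommendation:
    """Look up the recommended methodology chain for a situation key."""
    try:
        return SELECTION[situation]
    except KeyError:
        raise ValueError(f"unknown situation: {situation!r}") from None


rec = recommend("complex_multiple_possibilities")
print(" -> ".join(rec.methods), "| load", rec.resource)
```

Raising on an unknown key is a deliberate choice here: an unclassified problem should go back through Phase 1 rather than silently default to a method.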

Phase 3: Execution & Verification

During Analysis:
  1. Define problem clearly (What/Where/When/Impact)
  2. Gather evidence systematically
  3. Apply selected methodology
  4. Document reasoning at each step
  5. Verify root cause with Forward/Backward tests
Before Finalizing:
  • Validate conclusion against evidence
  • Check for red flags (see common-root-causes.md)
  • Confirm actionability (can you fix this?)
  • Develop solutions addressing root cause

Problem Definition Framework

Create a clear problem statement before analysis:
Essential Elements:
  • What: Observable symptom (not assumed cause)
  • Where: Location/system/component affected
  • When: Timeline, frequency, pattern
  • Impact: Users/systems affected, severity
Example: "Users in EU region experience 3-5 second dashboard load delays during 9-11 AM UTC peak hours, affecting ~2,000 daily active users. Started after v2.4 deployment on Nov 18th."
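
One way to keep the four elements from being skipped is to capture them in a small record before analysis starts. The sketch below is illustrative only: `ProblemStatement` and `missing_elements` are hypothetical names, and the field values are taken from the example statement above.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class ProblemStatement:
    what: str    # observable symptom, not an assumed cause
    where: str   # location, system, or component affected
    when: str    # timeline, frequency, pattern
    impact: str  # users or systems affected, severity

    def missing_elements(self) -> list[str]:
        """Return the names of elements that were left blank."""
        return [f.name for f in fields(self)
                if not getattr(self, f.name).strip()]


statement = ProblemStatement(
    what="3-5 second dashboard load delays",
    where="EU region",
    when="9-11 AM UTC peak hours, since the v2.4 deployment on Nov 18th",
    impact="~2,000 daily active users",
)
print(statement.missing_elements())
```

The check only catches blank fields. Whether `what` describes a symptom rather than an assumed cause still needs human review.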

Evidence Gathering (Go and See)

Follow Toyota's principle—collect facts, not opinions:
Key Evidence Sources:
  • Logs, metrics, monitoring data
  • Timeline of events and changes
  • System/code/configuration changes before problem
  • Environmental factors (load, traffic, season)
  • User reports and reproduction steps
  • System state before/during/after
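
As a sketch of how a timeline of changes can be turned into a list of suspects, the code below filters change events that fall inside a window before the problem's onset. Everything here is an assumption made for illustration: the `Event` type, the `changes_before` helper, the seven-day default window, and the sample events (including the year) are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Event:
    at: datetime
    kind: str         # e.g. "deploy", "config", "alert"
    description: str


def changes_before(events, onset, window=timedelta(days=7)):
    """Return change events in the window leading up to onset, most recent first."""
    change_kinds = {"deploy", "config"}
    candidates = [
        e for e in events
        if e.kind in change_kinds and onset - window <= e.at <= onset
    ]
    return sorted(candidates, key=lambda e: e.at, reverse=True)


# Hypothetical sample data, not taken from a real incident.
events = [
    Event(datetime(2024, 11, 5, 10, 0), "config", "cache TTL changed"),
    Event(datetime(2024, 11, 18, 14, 0), "deploy", "v2.4 released"),
    Event(datetime(2024, 11, 19, 9, 5), "alert", "dashboard latency alert"),
]
onset = datetime(2024, 11, 19, 9, 0)

for event in changes_before(events, onset):
    print(event.at.isoformat(), event.kind, event.description)
```

A change close to the onset is a candidate to investigate, not a conclusion. It still has to be confirmed against logs, metrics, and reproduction steps.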

RCA Methodologies

See resources/rca-methodologies.md for the complete methodology guide.

Resource Files Summary

resources/rca-methodologies.md

Comprehensive methodology guide covering:
  • 5 Whys: Step-by-step process with software examples
  • Fishbone Diagram: Structure, 6 M's categories, process
  • Pareto Analysis: Prioritization using 80/20 rule
  • Fault Tree Analysis: Top-down formal analysis
  • Barrier Analysis: Control failure examination
  • Structured 6-phase RCA process, domain-specific guidance, templates
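
To illustrate the Pareto item in the list above, the following sketch ranks cause categories by count and returns the smallest leading group whose cumulative share reaches a threshold. The function name `pareto_vital_few`, the 0.8 default, and the incident counts are assumptions for illustration. The resource file may present the method differently.

```python
def pareto_vital_few(counts: dict[str, int], threshold: float = 0.8) -> list[str]:
    """Return the top categories whose cumulative share first reaches threshold."""
    total = sum(counts.values())
    if total <= 0:
        return []
    vital, running = [], 0
    # Walk categories from most to least frequent, accumulating their share.
    for name, count in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
        vital.append(name)
        running += count
        if running / total >= threshold:
            break
    return vital


# Hypothetical incident counts by cause category.
incidents = {
    "configuration": 50,
    "code defect": 32,
    "dependency": 9,
    "deployment": 5,
    "other": 4,
}
print(pareto_vital_few(incidents))
```

With these made-up counts, two of the five categories account for 82% of incidents, so they would be the first candidates for a 5 Whys pass.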

resources/common-root-causes.md

Pattern reference catalog by domain:
  • Software Engineering: Code defects, configuration, dependencies, deployment
  • Hardware & Equipment: Mechanical, electrical, operational, maintenance
  • Process & Operations: Workflow, design, resources
  • Personal/Life: Health, habits, environment, skills
  • Red flags, recurring themes, pattern recognition

resources/example-analyses.md

Four worked examples with full analysis:
  1. Software Bug: JWT authentication (5 Whys)
  2. Vehicle Maintenance: Overheating (5 Whys)
  3. System Failure: E-commerce checkout (Fishbone + 5 Whys)
  4. Personal Productivity: Missed deadlines (Fishbone + 5 Whys)

resources/advanced-techniques.md

Formal methods for complex problems:
  • Fault Tree Analysis: Boolean logic, safety systems
  • Barrier Analysis: Control failures
  • Multi-Methodology Chains: Complex orchestration
  • Verification Frameworks: Comprehensive testing
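
The Boolean logic behind a fault tree can be sketched with AND/OR gates over basic events. This is a minimal illustration under stated assumptions: the `Gate` type, the `occurs` function, and the checkout-outage tree are hypothetical and are not taken from the resource file.

```python
from dataclasses import dataclass, field


@dataclass
class Gate:
    kind: str                                   # "AND" or "OR"
    inputs: list = field(default_factory=list)  # basic event names or nested Gates


def occurs(node, basic_events: dict[str, bool]) -> bool:
    """Evaluate whether the event at this node occurs, top-down and recursively."""
    if isinstance(node, str):
        return basic_events[node]  # leaf: look up the basic event's state
    results = [occurs(child, basic_events) for child in node.inputs]
    if node.kind == "AND":
        return all(results)
    if node.kind == "OR":
        return any(results)
    raise ValueError(f"unknown gate kind: {node.kind!r}")


# Hypothetical top event: checkout is unavailable if the database is down,
# OR if both the primary and the fallback payment gateway are down.
checkout_unavailable = Gate("OR", [
    "database_down",
    Gate("AND", ["primary_gateway_down", "fallback_gateway_down"]),
])

state = {
    "database_down": False,
    "primary_gateway_down": True,
    "fallback_gateway_down": False,
}
print(occurs(checkout_unavailable, state))
```

The AND gate makes the redundancy explicit: a single gateway failure does not reach the top event, which is the kind of reasoning a safety or high-stakes analysis needs to document.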

How This Skill Works

  1. Clarify your situation: Domain, observations, evidence, time
  2. Recommend approach: Complexity analysis, methodology, resources
  3. Guide through analysis: Problem statement, evidence, methodology, verification
  4. Deliver output: Analysis, root cause, solutions, implementation

Quick Start: 5-Minute RCA

  1. State problem (What/Where/When/Impact)
  2. First Why: fact-based answer
  3. Second Why: dig deeper
  4. Third Why: dig deeper again
  5. Verify: would fixing this prevent it?
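
The quick-start steps can be recorded as a chain of question, answer, and evidence, which makes it easy to spot answers that are not yet fact-based. The sketch below is hypothetical throughout: the `Why` type, both helpers, and the sample chain are invented for illustration. Reading the chain back with "therefore" is one common sanity check, not necessarily the Forward/Backward tests this skill refers to.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Why:
    question: str
    answer: str
    evidence: str  # the fact that supports this answer; empty if none yet


def unsupported(chain: list[Why]) -> list[int]:
    """Return 1-based positions of answers with no evidence recorded."""
    return [i for i, w in enumerate(chain, start=1) if not w.evidence.strip()]


def read_back(problem: str, chain: list[Why]) -> str:
    """Read the chain from the deepest cause up to the problem."""
    steps = [w.answer for w in reversed(chain)] + [problem]
    return ", therefore ".join(steps)


# Hypothetical chain, invented for illustration.
problem = "the dashboard loads slowly at peak hours"
chain = [
    Why("Why is the dashboard slow?",
        "dashboard queries take several seconds", "query latency metrics"),
    Why("Why are the queries slow?",
        "a query added in v2.4 scans the full table", "slow query log"),
    Why("Why does it scan the full table?",
        "the filtered column has no index", ""),
]

print(unsupported(chain))
print(read_back(problem, chain))
```

If the read-back sentence does not make sense, or an answer has no evidence, the chain is not ready for step 5.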

Templates & Examples

  • 5 Whys Template in resources/rca-methodologies.md
  • Fishbone Template in resources/rca-methodologies.md
  • Worked Examples in resources/example-analyses.md
  • Solution Structures in resources/example-analyses.md

Next Steps

  1. Identify problem domain (software/hardware/process/personal)
  2. Load appropriate resource from table above
  3. Select methodology based on complexity
  4. Follow step-by-step process in resource
  5. Verify root cause (Forward/Backward tests)
  6. Develop actionable solutions

Remember: The goal is systematic investigation, which means disciplined questioning until you reach a cause you can actually fix.