cause-and-effect

Cause and Effect Analysis

Apply Fishbone (Ishikawa) diagram analysis to systematically explore all potential causes of a problem across multiple categories.

Description

Systematically examine potential causes across six categories: People, Process, Technology, Environment, Methods, and Materials. The result is a structured "fishbone" view that identifies the contributing factors behind the problem.

Usage

/cause-and-effect [problem_description]

Variables

  • PROBLEM: Issue to analyze (default: prompt for input)
  • CATEGORIES: Categories to explore (default: all six)

Steps

  1. State the problem clearly (the "head" of the fish)
  2. For each category, brainstorm potential causes:
    • People: Skills, training, communication, team dynamics
    • Process: Workflows, procedures, standards, reviews
    • Technology: Tools, infrastructure, dependencies, configuration
    • Environment: Workspace, deployment targets, external factors
    • Methods: Approaches, patterns, architectures, practices
    • Materials: Data, dependencies, third-party services, resources
  3. For each potential cause, ask "why" to dig deeper
  4. Identify which causes are contributing vs. root causes
  5. Prioritize causes by impact and likelihood
  6. Propose solutions for highest-priority causes
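The steps above can be sketched as a small data model. This is a minimal illustration, not part of the command itself: the `Cause` class, the 1 to 5 scales, the use of a plain product for step 5, and the example scores are all assumptions made for the sketch.

```python
from dataclasses import dataclass, field

# The six categories named in the description.
CATEGORIES = ("People", "Process", "Technology", "Environment", "Methods", "Materials")


@dataclass
class Cause:
    """One potential cause on a fishbone branch (hypothetical structure)."""
    description: str
    category: str
    impact: int       # assumed scale: 1 (low) to 5 (high)
    likelihood: int   # assumed scale: 1 (unlikely) to 5 (very likely)
    whys: list[str] = field(default_factory=list)  # step 3: answers to repeated "why?"
    is_root: bool = False                          # step 4: root vs. contributing

    def __post_init__(self) -> None:
        if self.category not in CATEGORIES:
            raise ValueError(f"unknown category: {self.category}")

    @property
    def priority(self) -> int:
        # Step 5: combine impact and likelihood (a simple product is assumed).
        return self.impact * self.likelihood


def prioritize(causes: list[Cause]) -> list[Cause]:
    """Return causes sorted from highest to lowest priority."""
    return sorted(causes, key=lambda c: c.priority, reverse=True)


# Illustrative scores only; they are not taken from a real analysis.
causes = [
    Cause("No caching layer", "Technology", impact=4, likelihood=4,
          whys=["Redis not in tech stack"]),
    Cause("No SLA defined for response times", "Process", impact=5, likelihood=5,
          is_root=True),
    Cause("No CDN for static assets", "Environment", impact=2, likelihood=3),
]

ranked = prioritize(causes)
for cause in ranked:
    print(f"{cause.priority:>2}  [{cause.category}] {cause.description}")
```

Keeping `whys` and `is_root` on each cause makes it possible to separate contributing causes from root causes after the brainstorm, rather than deciding while listing them.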

Examples

Example 1: API Response Latency

Problem: API responses take 3+ seconds (target: <500ms)

PEOPLE
├─ Team unfamiliar with performance optimization
├─ No one owns performance monitoring
└─ Frontend team doesn't understand backend constraints

PROCESS
├─ No performance testing in CI/CD
├─ No SLA defined for response times
└─ Performance regression not caught in code review

TECHNOLOGY
├─ Database queries not optimized
│  └─ Why: No query analysis tools in place
├─ N+1 queries in ORM
│  └─ Why: Eager loading not configured
├─ No caching layer
│  └─ Why: Redis not in tech stack
└─ Synchronous external API calls
   └─ Why: No async architecture in place

ENVIRONMENT
├─ Production uses smaller database instance than needed
├─ No CDN for static assets
└─ Single region deployment (high latency for distant users)

METHODS
├─ REST API design requires multiple round trips
├─ No pagination on large datasets
└─ Full object serialization instead of selective fields

MATERIALS
├─ Large JSON payloads (unnecessary data)
├─ Uncompressed responses
└─ Third-party API (payment gateway) is slow
   └─ Why: Free tier with rate limiting

ROOT CAUSES:
- No performance requirements defined (Process)
- Missing performance monitoring tooling (Technology)
- Architecture doesn't support caching/async (Methods)

SOLUTIONS (Priority Order):
1. Add database indexes (quick win, high impact)
2. Implement Redis caching layer (medium effort, high impact)
3. Make external API calls async with webhooks (high effort, high impact)
4. Define and monitor performance SLAs (low effort, prevents regression)
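The tree layout used in this example can be generated from structured data. The following is a rough sketch under the assumption that each cause is a `(description, whys)` pair; `render_branch` is a hypothetical helper, not something the command provides.

```python
def render_branch(category: str, causes: list[tuple[str, list[str]]]) -> str:
    """Render one category as a text tree; each cause is (description, whys)."""
    lines = [category.upper()]
    for i, (description, whys) in enumerate(causes):
        last = i == len(causes) - 1
        lines.append(("└─ " if last else "├─ ") + description)
        # Non-final causes keep the vertical bar so the branch stays connected.
        prefix = "   " if last else "│  "
        for why in whys:
            lines.append(f"{prefix}└─ Why: {why}")
    return "\n".join(lines)


materials = [
    ("Large JSON payloads (unnecessary data)", []),
    ("Uncompressed responses", []),
    ("Third-party API (payment gateway) is slow", ["Free tier with rate limiting"]),
]
text = render_branch("Materials", materials)
print(text)
```

Run on the Materials branch, this reproduces the five lines shown in the example above. The sketch draws every "Why" with a closing connector, which is only tidy when a cause has a single "Why" entry.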

Example 2: Flaky Test Suite

Problem: 15% of test runs fail, passing on retry

PEOPLE
├─ Test-writing skills vary across team
├─ New developers copy existing flaky patterns
└─ No one assigned to fix flaky tests

PROCESS
├─ Flaky tests marked as "known issue" and ignored
├─ No policy against merging with flaky tests
└─ Test failures don't block deployments

TECHNOLOGY
├─ Race conditions in async test setup
├─ Tests share global state
├─ Test database not isolated per test
├─ setTimeout used instead of proper waiting
└─ CI environment inconsistent (different CPU/memory)

ENVIRONMENT
├─ CI runner under heavy load
├─ Network timing varies (external API mocks flaky)
└─ Timezone differences between local and CI

METHODS
├─ Integration tests not properly isolated
├─ No retry logic for legitimate timing issues
└─ Tests depend on execution order

MATERIALS
├─ Test data fixtures overlap
├─ Shared test database polluted
└─ Mock data doesn't match production patterns

ROOT CAUSES:
- No test isolation strategy (Methods + Technology)
- Process accepts flaky tests (Process)
- Async timing not handled properly (Technology)

SOLUTIONS:
1. Implement per-test database isolation (high impact)
2. Replace setTimeout with proper async/await patterns (medium impact)
3. Add pre-commit hook blocking flaky test patterns (prevents new issues)
4. Enforce policy: flaky test = block merge (process change)

Example 3: Feature Takes 3 Months Instead of 3 Weeks

Problem: Simple CRUD feature took 12 weeks vs. a 3-week estimate

PEOPLE
├─ Developer unfamiliar with codebase
├─ Key architect on vacation during critical phase
└─ Designer changed requirements mid-development

PROCESS
├─ Requirements not finalized before starting
├─ No code review for first 6 weeks (large diff)
├─ Multiple rounds of design revision
└─ QA started late (found issues in week 10)

TECHNOLOGY
├─ Codebase has high coupling (change ripple effects)
├─ No automated tests (manual testing slow)
├─ Legacy code required refactoring first
└─ Development environment setup took 2 weeks

ENVIRONMENT
├─ Staging environment broken for 3 weeks
├─ Production data needed for testing (compliance delay)
└─ Dependencies blocked by another team

METHODS
├─ No incremental delivery (big bang approach)
├─ Over-engineering (added future features "while we're at it")
└─ No design doc (discovered issues during implementation)

MATERIALS
├─ Third-party API changed during development
├─ Production data model differs from staging
└─ Missing design assets (waited for designer)

ROOT CAUSES:
- No requirements lock-down before start (Process)
- Architecture prevents incremental changes (Technology)
- Big bang approach vs. iterative (Methods)
- Development environment not automated (Technology)

SOLUTIONS:
1. Require design doc + finalized requirements before starting (Process)
2. Implement feature flags for incremental delivery (Methods)
3. Automate dev environment setup (Technology)
4. Refactor high-coupling areas (Technology, long-term)

Notes

  • Fishbone reveals systemic issues across domains
  • Multiple causes often combine to create problems
  • Don't stop at first cause in each category—dig deeper
  • Some causes span multiple categories (mark them)
  • Root causes usually lie in Process or Methods (not just Technology)
  • Use with the /why command for deeper analysis of specific causes
  • Prioritize solutions by: impact × feasibility ÷ effort
  • Address root causes, not just symptoms
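The prioritization rule in the notes (impact × feasibility ÷ effort) can be applied as a simple score. This is a sketch under stated assumptions: the `Solution` class, the scales (impact and effort from 1 to 5, feasibility from 0 to 1), and the numbers given to each solution are illustrative and do not come from a real assessment.

```python
from dataclasses import dataclass


@dataclass
class Solution:
    """A proposed fix with hypothetical scoring fields."""
    name: str
    impact: float       # assumed scale: 1 (low) to 5 (high)
    feasibility: float  # assumed scale: 0 (infeasible) to 1 (certain)
    effort: float       # assumed scale: 1 (trivial) to 5 (large); must be > 0

    @property
    def score(self) -> float:
        # impact x feasibility / effort, as stated in the notes.
        if self.effort <= 0:
            raise ValueError("effort must be positive")
        return self.impact * self.feasibility / self.effort


# Illustrative values for three solutions from Example 1.
solutions = [
    Solution("Implement Redis caching layer", impact=4, feasibility=0.8, effort=3),
    Solution("Make external API calls async with webhooks", impact=4, feasibility=0.6, effort=5),
    Solution("Add database indexes", impact=4, feasibility=0.9, effort=1),
]

ranked_solutions = sorted(solutions, key=lambda s: s.score, reverse=True)
for s in ranked_solutions:
    print(f"{s.score:5.2f}  {s.name}")
```

With these assumed values the low-effort fix ranks first, which is how a "quick win, high impact" item ends up at the top of a list even when all candidates have the same impact.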