Designing Distributed Systems

Design scalable, reliable, and fault-tolerant distributed systems using proven patterns and consistency models.

Purpose

Distributed systems are the foundation of modern cloud-native applications. Understanding fundamental trade-offs (CAP theorem, PACELC), consistency models, replication patterns, and resilience strategies is essential for building systems that scale globally while maintaining correctness and availability.

When to Use This Skill

Apply when:
  • Designing microservices architectures with multiple services
  • Building systems that must scale across multiple datacenters or regions
  • Choosing between consistency vs availability during network partitions
  • Selecting replication strategies (single-leader, multi-leader, leaderless)
  • Implementing distributed transactions (saga pattern, event sourcing, CQRS)
  • Designing partition-tolerant systems with proper consistency guarantees
  • Building resilient services with circuit breakers, bulkheads, retries
  • Implementing service discovery and inter-service communication

Core Concepts

CAP Theorem Fundamentals

CAP Theorem: In a distributed system experiencing a network partition, choose between Consistency (C) or Availability (A). Partition tolerance (P) is mandatory.
Network partitions WILL occur → Always design for P

During partition:
├─ CP (Consistency + Partition Tolerance)
│  Use when: Financial transactions, inventory, seat booking
│  Trade-off: System unavailable during partition
│  Examples: HBase, MongoDB (default), etcd
└─ AP (Availability + Partition Tolerance)
   Use when: Social media, caching, analytics, shopping carts
   Trade-off: Stale reads possible, conflicts need resolution
   Examples: Cassandra, DynamoDB, Riak
PACELC: Extends CAP to consider normal operations (no partition).
  • If Partition: Choose Availability (A) or Consistency (C)
  • Else (normal): Choose Latency (L) or Consistency (C)

Consistency Models Spectrum

Strong Consistency ◄─────────────────────► Eventual Consistency
      │                    │                      │
  Linearizable      Causal Consistency     Convergent
  (Slowest,         (Middle Ground,        (Fastest,
   Most Consistent)  Causally Ordered)     Eventually Consistent)
Strong Consistency (Linearizability):
  • All operations appear atomically in sequential order
  • Reads always return most recent write
  • Use for: Bank balances, inventory stock, seat booking
  • Trade-off: Higher latency, reduced availability
Eventual Consistency:
  • If no new updates, all replicas eventually converge
  • Use for: Social feeds, product catalogs, user profiles, DNS
  • Trade-off: Stale reads possible, conflict resolution needed
Causal Consistency:
  • Causally related operations seen in same order by all nodes
  • Use for: Chat apps, collaborative editing, comment threads
  • Trade-off: More complex than eventual, requires causality tracking
Bounded Staleness:
  • Staleness bounded by time or version count
  • Use for: Real-time dashboards, leaderboards, monitoring
  • Trade-off: Must monitor lag, more complex than eventual

Replication Patterns

1. Leader-Follower (Single-Leader):
  • All writes to leader, replicated to followers
  • Followers handle reads (load distribution)
  • Synchronous: Wait for follower ACK (strong consistency, higher latency)
  • Asynchronous: Don't wait (eventual consistency, possible data loss)
  • Use for: General-purpose default; gives strong consistency when replication is synchronous
2. Multi-Leader:
  • Multiple leaders accept writes in different datacenters
  • Leaders replicate to each other
  • Conflict resolution required: Last-Write-Wins, application merge, vector clocks
  • Use for: Multi-datacenter, low write latency, geo-distributed users
  • Trade-off: Conflict resolution complexity
3. Leaderless (Dynamo-style):
  • No single leader, quorum-based reads/writes
  • Quorum rule: W + R > N (W=write quorum, R=read quorum, N=replicas)
  • Example: N=5, W=3, R=3 → Strong consistency (W+R=6 > 5, overlap guaranteed)
  • Use for: Maximum availability, partition tolerance
  • Trade-off: Complexity, read repair needed
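
The quorum rule above can be sanity-checked with a one-line predicate (a minimal Python sketch, not tied to any particular store):

```python
def is_strong_quorum(n: int, w: int, r: int) -> bool:
    """Read and write quorums are guaranteed to overlap only when W + R > N."""
    return w + r > n

# N=5 with W=3, R=3 guarantees overlap; W=1, R=1 permits stale reads.
print(is_strong_quorum(5, 3, 3))  # True
print(is_strong_quorum(5, 1, 1))  # False
```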

Partitioning Strategies

Hash Partitioning (Consistent Hashing):
  • Key → Hash(Key) → Partition assignment
  • Even distribution, minimal rebalancing when nodes added/removed
  • Use for: Point queries by ID, even distribution critical
  • Examples: Cassandra, DynamoDB, Redis Cluster
Range Partitioning:
  • Key ranges assigned to partitions (A-F, G-M, N-S, T-Z)
  • Enables range queries, ordered data
  • Risk: Hot spots if data skewed
  • Use for: Time-series data, leaderboards, range scans
  • Examples: HBase, Bigtable
Geographic Partitioning:
  • Partition by location (US-East, EU-West, APAC)
  • Use for: Data locality, GDPR compliance, low latency
  • Examples: Spanner, Cosmos DB
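
A toy consistent-hash ring with virtual nodes illustrates the idea (a sketch only; production systems use faster hash functions and replication-aware placement):

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes, vnodes=64):
        self._keys = []    # sorted virtual-node hashes
        self._owners = {}  # vnode hash -> physical node
        for node in nodes:
            for i in range(vnodes):
                h = self._hash(f"{node}#{i}")
                bisect.insort(self._keys, h)
                self._owners[h] = node

    @staticmethod
    def _hash(s: str) -> int:
        return int(hashlib.sha256(s.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # The first virtual node clockwise from the key's hash owns the key.
        idx = bisect.bisect_right(self._keys, self._hash(key)) % len(self._keys)
        return self._owners[self._keys[idx]]
```

Because only the departed node's virtual nodes disappear, removing a node reassigns only the keys that node owned — the "minimal rebalancing" property noted above.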

Resilience Patterns

Circuit Breaker:
[Closed] → Normal operation
   │ (failures exceed threshold)
[Open] → Fail fast (don't call failing service)
   │ (timeout expires)
[Half-Open] → Try single request
   │ success → [Closed]
   │ failure → [Open]
  • Prevents cascading failures
  • Fast-fail instead of waiting for timeout
  • See references/resilience-patterns.md
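
The state machine above can be sketched in a few lines of Python (a minimal, single-threaded illustration — real implementations add locking, metrics, and per-error classification):

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker sketch: Closed -> Open -> Half-Open."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            if time.monotonic() - self.opened_at >= self.reset_timeout:
                self.state = "half-open"   # allow a single trial request
            else:
                raise RuntimeError("circuit open: failing fast")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.state = "open"
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        self.state = "closed"
        return result
```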
Bulkhead Isolation:
  • Isolate resources (thread pools, connection pools)
  • Failure in one partition doesn't affect others
  • Like ship compartments preventing total flooding
Timeout and Retry:
  • Timeout: Set deadlines, fail fast if exceeded
  • Retry: Exponential backoff with jitter
  • Idempotency: Ensure safe retry (critical)
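
Exponential backoff with full jitter can be sketched like this (illustrative only; real code should catch only transient error types, not bare Exception):

```python
import random
import time

def retry(fn, max_attempts=5, base_delay=0.1, cap=5.0):
    """Retry an idempotent call with capped exponential backoff and full jitter.
    Only safe when `fn` is idempotent."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # retries exhausted: surface the error
            # Full jitter: sleep a random amount up to the capped backoff.
            time.sleep(random.uniform(0, min(cap, base_delay * 2 ** attempt)))
```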
Rate Limiting and Backpressure:
  • Protect services from overload
  • Token bucket, leaky bucket algorithms
  • Backpressure: Signal upstream to slow down
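
A token bucket is compact enough to sketch directly (single-threaded illustration; production limiters add locking and distributed counters):

```python
import time

class TokenBucket:
    """Token-bucket rate limiter sketch: refill at `rate` tokens/s up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill lazily based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # over the limit: caller should shed or delay the request
```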

Transaction Patterns

Saga Pattern:
  • Coordinate distributed transactions across services
  • Avoids blocking distributed two-phase commit (2PC)
Choreography: Services react to events
Order Service → OrderCreated event
Payment Service → listens → PaymentProcessed event
Inventory Service → listens → InventoryReserved event
(Compensating: if payment fails → InventoryReleased event)
Orchestration: Central coordinator
Saga Orchestrator:
1. Call Order Service
2. Call Payment Service
3. Call Inventory Service
(If step fails → call compensating transactions in reverse)
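
The orchestration flow above reduces to a small control loop (a sketch: each step is an (action, compensation) pair; real orchestrators persist saga state and handle partial compensation failures):

```python
def run_saga(steps):
    """Run (action, compensate) pairs in order; on failure, compensate
    completed steps in reverse order, then re-raise."""
    completed = []
    try:
        for action, compensate in steps:
            action()
            completed.append(compensate)
    except Exception:
        for compensate in reversed(completed):
            compensate()  # undo in reverse order
        raise
```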
Event Sourcing:
  • Store state changes as immutable events
  • Rebuild state by replaying events
  • Audit trail, time travel, debugging
  • Trade-off: Query complexity, snapshot optimization
CQRS (Command Query Responsibility Segregation):
  • Separate read and write models
  • Write model: Normalized, transactional
  • Read model: Denormalized, cached, optimized
  • Use for: Different read/write patterns, high read:write ratio (10:1+)
  • Often paired with Event Sourcing
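
The replay idea behind event sourcing fits in a tiny sketch (a hypothetical Account aggregate; real stores persist events durably and snapshot for long histories):

```python
class Account:
    """Event-sourcing sketch: current state is a fold over an immutable,
    append-only event log; replaying the log rebuilds the state."""

    def __init__(self, events=()):
        self.balance = 0
        self.events = []
        for event in events:
            self._apply(event)   # replay history

    def _apply(self, event):
        kind, amount = event
        if kind == "deposited":
            self.balance += amount
        elif kind == "withdrawn":
            self.balance -= amount
        self.events.append(event)

    def deposit(self, amount):
        self._apply(("deposited", amount))

    def withdraw(self, amount):
        self._apply(("withdrawn", amount))
```

In a CQRS pairing, the same event log would also feed denormalized read models asynchronously.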

Service Discovery

Client-Side Discovery:
  • Client queries service registry (Consul, etcd, Eureka)
  • Client load balances and calls service directly
  • Pro: No proxy overhead
  • Con: Client complexity
Server-Side Discovery:
  • Client calls load balancer
  • Load balancer queries registry and routes
  • Pro: Simple clients
  • Con: Load balancer single point of failure
Service Mesh:
  • Sidecar proxies handle discovery, routing, retry, circuit breaking
  • Examples: Istio, Linkerd
  • Pro: Decouples communication logic from services
  • Con: Operational complexity
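
Client-side discovery can be sketched with an in-memory stand-in for a registry such as Consul or etcd (illustrative only; real registries add health checks, TTLs, and watches):

```python
class Registry:
    """In-memory service registry sketch with round-robin resolution."""

    def __init__(self):
        self._instances = {}
        self._cursor = {}

    def register(self, service, address):
        self._instances.setdefault(service, []).append(address)

    def resolve(self, service):
        """Client-side discovery: pick the next healthy instance round-robin."""
        instances = self._instances[service]
        i = self._cursor.get(service, 0)
        self._cursor[service] = (i + 1) % len(instances)
        return instances[i]
```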

Caching Strategies

Cache-Aside (Lazy Loading):
Read:
1. Check cache → hit? return
2. Miss? Query database
3. Store in cache, return
Write-Through:
Write:
1. Write to cache
2. Cache writes to database synchronously
3. Return success
Write-Behind (Write-Back):
Write:
1. Write to cache
2. Return success
3. Cache writes to database asynchronously (batched)
Cache Invalidation:
  • TTL (Time-To-Live): Expire after duration
  • Event-based: Invalidate on data change
  • Manual: Explicit invalidation on update
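
The cache-aside read path above can be sketched with a tiny TTL cache (names like `get_user` and `TTLCache` are hypothetical; in practice the cache would be Redis or similar):

```python
import time

class TTLCache:
    """Tiny in-process TTL cache used to illustrate cache-aside (not production code)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() >= expires:
            del self._data[key]   # lazy TTL expiry
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, time.monotonic() + ttl)

def get_user(user_id, cache, db, ttl=300):
    """Cache-aside: check cache first; on miss, read the DB and populate."""
    user = cache.get(user_id)
    if user is None:
        user = db[user_id]            # miss: query the database
        cache.set(user_id, user, ttl)
    return user
```

Note the staleness trade-off: until the TTL expires or an invalidation runs, reads keep returning the cached value even after the database changes.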

Decision Frameworks

Choosing Consistency Model

Decision Tree:
├─ Money involved? → Strong Consistency
├─ Double-booking unacceptable? → Strong Consistency
├─ Causality important (chat, edits)? → Causal Consistency
├─ Read-heavy, stale tolerable? → Eventual Consistency
└─ Default? → Eventual (then strengthen if needed)

Choosing Replication Pattern

├─ Single region writes? → Leader-Follower
├─ Multi-region writes + conflicts OK? → Multi-Leader
├─ Multi-region writes + no conflicts? → Leader-Follower with failover
└─ Maximum availability? → Leaderless (quorum)

Choosing Partitioning Strategy

├─ Need range scans? → Range Partitioning (risk: hot spots)
├─ Data residency requirements? → Geographic Partitioning
└─ Default? → Hash Partitioning (consistent hashing)

Quick Reference Tables

CAP/PACELC System Comparison

System      If Partition   Else (Normal)       Use Case
Spanner     PC             EC (strong)         Global SQL
DynamoDB    PA             EL (eventual)       High availability
Cassandra   PA             EL (tunable)        Wide-column store
MongoDB     PC             EC (default)        Document store
Cosmos DB   PA/PC          EL/EC (5 levels)    Multi-model

Consistency Model Use Cases

Use Case                 Consistency Model
Bank account balance     Strong (Linearizable)
Seat booking (airline)   Strong (Linearizable)
Inventory stock count    Strong or Bounded
Shopping cart            Eventual
Product catalog          Eventual
Collaborative editing    Causal
Chat messages            Causal
Social media likes       Eventual
DNS records              Eventual

Quorum Configurations

Configuration   W   R   N   Consistency            Use Case
Strong          3   3   5   Strong (W+R > N)       Banking
Balanced        3   2   5   Eventual (W+R = N)     Default
Write-heavy     2   3   5   Eventual (W+R = N)     Logs
Read-heavy      3   1   5   Eventual (W+R < N)     Cache
Max Avail       1   1   5   Eventual (W+R < N)     Analytics

Note: Guaranteed quorum overlap (strong consistency) requires W + R strictly greater than N; W + R = N leaves a window for stale reads.

Progressive Disclosure

Detailed References

For comprehensive coverage of specific topics, see:
  • references/cap-pacelc-theorem.md - CAP and PACELC deep-dive with PACELC matrix
  • references/consistency-models.md - Strong, eventual, causal, bounded staleness patterns
  • references/replication-patterns.md - Leader-follower, multi-leader, leaderless replication
  • references/partitioning-strategies.md - Hash, range, geographic partitioning with examples
  • references/consensus-algorithms.md - Raft and Paxos overview (when consensus needed)
  • references/resilience-patterns.md - Circuit breaker, bulkhead, timeout, retry, rate limiting
  • references/saga-pattern.md - Choreography vs orchestration with working examples
  • references/event-sourcing-cqrs.md - Event sourcing and CQRS implementation patterns
  • references/service-discovery.md - Client-side, server-side, service mesh patterns
  • references/caching-strategies.md - Cache-aside, write-through, write-behind, invalidation

Working Examples

Complete, runnable examples demonstrating patterns:
  • examples/consistent-hashing/ - Consistent hashing implementation with virtual nodes
  • examples/circuit-breaker/ - Circuit breaker pattern with state transitions
  • examples/saga-orchestration/ - Saga orchestrator with compensating transactions
  • examples/event-sourcing/ - Event store with replay and snapshots
  • examples/cqrs/ - CQRS with separate read/write models
  • examples/service-discovery/ - Consul-based service discovery and registration

ASCII Diagrams

Visual representations for complex concepts:
  • diagrams/cap-theorem.txt - CAP theorem decision tree
  • diagrams/replication-topologies.txt - Leader-follower, multi-leader, leaderless
  • diagrams/saga-flow.txt - Saga choreography and orchestration flows
  • diagrams/caching-patterns.txt - Cache-aside, write-through, write-behind

Integration with Other Skills

Related Skills:
  • For Kubernetes deployment: See kubernetes-operations skill for pod anti-affinity, service mesh
  • For infrastructure: See infrastructure-as-code skill for deploying distributed systems
  • For databases: See databases-sql and databases-nosql for replication configuration
  • For messaging: See message-queues skill for event-driven architectures, saga orchestration
  • For monitoring: See observability skill for distributed tracing, monitoring patterns
  • For testing: See performance-engineering skill for load testing distributed systems
  • For security: See security-hardening skill for mTLS, service authentication

Common Patterns

Multi-Datacenter Pattern

1. Choose replication: Multi-leader or Leaderless
2. Partition data geographically
3. Implement conflict resolution (LWW, vector clocks, app-specific)
4. Monitor replication lag
5. Add circuit breakers between datacenters
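
For step 3, vector clocks detect whether two versions conflict at all — concurrent writes need merging, causally ordered ones do not (a minimal sketch):

```python
def merge(a, b):
    """Element-wise max combines two vector clocks (sketch)."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}

def happened_before(a, b):
    """a -> b iff every counter in a is <= b and at least one is strictly less."""
    nodes = a.keys() | b.keys()
    return (all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
            and any(a.get(n, 0) < b.get(n, 0) for n in nodes))

def concurrent(a, b):
    """Concurrent versions are real conflicts needing LWW or application merge."""
    return not happened_before(a, b) and not happened_before(b, a)
```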

Event-Driven Saga Pattern

1. Define saga steps and compensating actions
2. Choose choreography (events) or orchestration (coordinator)
3. Implement idempotent handlers (retries safe)
4. Publish events with outbox pattern (transactional)
5. Monitor saga progress and timeouts

High-Availability Pattern

1. Use leaderless replication (N=5, W=3, R=2)
2. Partition with consistent hashing
3. Add circuit breakers for failing nodes
4. Implement read repair and anti-entropy
5. Monitor quorum health

Best Practices

Design for Failure:
  • Network partitions will occur - always design for partition tolerance
  • Use timeouts, retries with exponential backoff
  • Implement circuit breakers to prevent cascading failures
  • Test chaos engineering scenarios (partition nodes, inject latency)
Choose Consistency Carefully:
  • Default to eventual consistency, strengthen only where needed
  • Strong consistency has real costs (latency, availability)
  • Use bounded staleness for middle ground
Idempotency is Critical:
  • Design operations to be safely retryable
  • Use unique request IDs for deduplication
  • Essential for saga compensating transactions
Monitor and Observe:
  • Distributed tracing with correlation IDs
  • Monitor replication lag, quorum health
  • Alert on circuit breaker state changes
  • Track saga progress and failures
Partition Strategically:
  • Hash partitioning for even distribution
  • Range partitioning for range queries (monitor hot spots)
  • Geographic partitioning for compliance, latency
Version Everything:
  • Event schemas evolve - use versioning
  • API versioning for service compatibility
  • Database schema migrations in distributed systems
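
The request-ID deduplication idea above can be sketched as a decorator (illustrative; production systems persist seen IDs with a TTL rather than holding them in process memory):

```python
def idempotent(handler):
    """Deduplicate retries by unique request ID (sketch)."""
    results = {}

    def wrapper(request_id, payload):
        if request_id in results:
            return results[request_id]  # retry: replay the recorded result
        results[request_id] = handler(payload)
        return results[request_id]

    return wrapper
```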

Anti-Patterns to Avoid

Distributed Monolith:
  • Microservices with tight coupling
  • Shared database across services
  • Fix: Database per service, async communication
Two-Phase Commit (2PC) Overuse:
  • Slow, blocking, reduces availability
  • Fix: Use saga pattern for distributed transactions
Ignoring Network Failures:
  • Assuming network is reliable
  • Fix: Always add timeouts, retries, circuit breakers
Strong Consistency Everywhere:
  • Unnecessary latency and complexity
  • Fix: Use eventual consistency by default, strengthen where needed
No Conflict Resolution Strategy:
  • Multi-leader without handling conflicts
  • Fix: Choose LWW, vector clocks, or app-specific merge
Cache Stampede:
  • TTL expires, all clients query database
  • Fix: Probabilistic early expiration, request coalescing
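
Probabilistic early expiration can be sketched XFetch-style: as an entry nears its TTL, each reader independently chooses to recompute with rising probability, so one client refreshes instead of all of them stampeding at expiry (a sketch; `recompute_cost` is the measured time to rebuild the entry):

```python
import math
import random
import time

def should_recompute(expiry, recompute_cost, beta=1.0, now=None):
    """XFetch-style check: recompute early with probability that grows
    as `now` approaches `expiry`; larger beta favors earlier recomputes."""
    now = time.monotonic() if now is None else now
    # math.log(random.random()) is <= 0, so the subtraction moves `now` forward.
    return now - recompute_cost * beta * math.log(random.random()) >= expiry
```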

Troubleshooting

Replication Lag Too High:
  • Check network bandwidth between datacenters
  • Monitor write throughput on leader
  • Consider async replication or multi-leader
Split-Brain Scenario:
  • Multiple leaders elected during partition
  • Fix: Use consensus (Raft, Paxos) for leader election
  • Implement fencing tokens to prevent dual writes
Hot Partitions:
  • Range partitioning with skewed data
  • Fix: Add hash component, manually redistribute, use composite keys
Saga Timeout/Stalled:
  • Service unavailable, saga can't complete
  • Fix: Implement saga timeout with automated rollback
  • Dead letter queue for manual intervention
Conflict Resolution Failures:
  • Multi-leader conflicts unhandled
  • Fix: Implement clear resolution strategy (LWW, merge, manual)
  • Monitor conflict rate, alert on spikes
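
The fencing-token fix for split-brain can be sketched on the storage side: the store remembers the highest token it has seen and rejects writes from deposed leaders (illustrative only; tokens would come from the consensus-based lock service):

```python
class FencedStore:
    """Fencing-token sketch: writes carrying a stale token are rejected."""

    def __init__(self):
        self.highest_token = 0
        self.data = {}

    def write(self, token, key, value):
        if token < self.highest_token:
            raise PermissionError(f"stale fencing token {token}: writer was deposed")
        self.highest_token = token
        self.data[key] = value
```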