Designing Distributed Systems

Design scalable, reliable, and fault-tolerant distributed systems using proven patterns and consistency models.

Purpose

Distributed systems are the foundation of modern cloud-native applications. Understanding fundamental trade-offs (CAP theorem, PACELC), consistency models, replication patterns, and resilience strategies is essential for building systems that scale globally while maintaining correctness and availability.

When to Use This Skill

Apply when:
  • Designing microservices architectures with multiple services
  • Building systems that must scale across multiple datacenters or regions
  • Choosing between consistency vs availability during network partitions
  • Selecting replication strategies (single-leader, multi-leader, leaderless)
  • Implementing distributed transactions (saga pattern, event sourcing, CQRS)
  • Designing partition-tolerant systems with proper consistency guarantees
  • Building resilient services with circuit breakers, bulkheads, retries
  • Implementing service discovery and inter-service communication

Core Concepts

CAP Theorem Fundamentals

CAP Theorem: In a distributed system experiencing a network partition, choose between Consistency (C) or Availability (A). Partition tolerance (P) is mandatory.
Network partitions WILL occur → Always design for P

During partition:
├─ CP (Consistency + Partition Tolerance)
│  Use when: Financial transactions, inventory, seat booking
│  Trade-off: System unavailable during partition
│  Examples: HBase, MongoDB (default), etcd
└─ AP (Availability + Partition Tolerance)
   Use when: Social media, caching, analytics, shopping carts
   Trade-off: Stale reads possible, conflicts need resolution
   Examples: Cassandra, DynamoDB, Riak
PACELC: Extends CAP to consider normal operations (no partition).
  • If Partition: Choose Availability (A) or Consistency (C)
  • Else (normal): Choose Latency (L) or Consistency (C)

Consistency Models Spectrum

Strong Consistency ◄─────────────────────► Eventual Consistency
      │                    │                      │
  Linearizable      Causal Consistency     Convergent
  (Slowest,         (Middle Ground,        (Fastest,
   Most Consistent)  Causally Ordered)     Eventually Consistent)
Strong Consistency (Linearizability):
  • All operations appear atomically in sequential order
  • Reads always return most recent write
  • Use for: Bank balances, inventory stock, seat booking
  • Trade-off: Higher latency, reduced availability
Eventual Consistency:
  • If no new updates, all replicas eventually converge
  • Use for: Social feeds, product catalogs, user profiles, DNS
  • Trade-off: Stale reads possible, conflict resolution needed
Causal Consistency:
  • Causally related operations seen in same order by all nodes
  • Use for: Chat apps, collaborative editing, comment threads
  • Trade-off: More complex than eventual, requires causality tracking
Bounded Staleness:
  • Staleness bounded by time or version count
  • Use for: Real-time dashboards, leaderboards, monitoring
  • Trade-off: Must monitor lag, more complex than eventual

Replication Patterns

1. Leader-Follower (Single-Leader):
  • All writes to leader, replicated to followers
  • Followers handle reads (load distribution)
  • Synchronous: Wait for follower ACK (strong consistency, higher latency)
  • Asynchronous: Don't wait (eventual consistency, possible data loss)
  • Use for: General-purpose default; gives strong consistency when replication is synchronous
2. Multi-Leader:
  • Multiple leaders accept writes in different datacenters
  • Leaders replicate to each other
  • Conflict resolution required: Last-Write-Wins, application merge, vector clocks
  • Use for: Multi-datacenter, low write latency, geo-distributed users
  • Trade-off: Conflict resolution complexity
3. Leaderless (Dynamo-style):
  • No single leader, quorum-based reads/writes
  • Quorum rule: W + R > N (W=write quorum, R=read quorum, N=replicas)
  • Example: N=5, W=3, R=3 → Strong consistency (W+R=6 > 5, overlap guaranteed)
  • Use for: Maximum availability, partition tolerance
  • Trade-off: Complexity, read repair needed
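
The quorum rule above can be sanity-checked with a one-line predicate (a minimal Python sketch, not tied to any particular store):

```python
def is_strong_quorum(n: int, w: int, r: int) -> bool:
    """Read and write quorums are guaranteed to overlap only when W + R > N."""
    return w + r > n

# N=5 with W=3, R=3 guarantees overlap; W=1, R=1 permits stale reads.
print(is_strong_quorum(5, 3, 3))  # True
print(is_strong_quorum(5, 1, 1))  # False
```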

Partitioning Strategies

Hash Partitioning (Consistent Hashing):
  • Key → Hash(Key) → Partition assignment
  • Even distribution, minimal rebalancing when nodes added/removed
  • Use for: Point queries by ID, even distribution critical
  • Examples: Cassandra, DynamoDB, Redis Cluster
Range Partitioning:
  • Key ranges assigned to partitions (A-F, G-M, N-S, T-Z)
  • Enables range queries, ordered data
  • Risk: Hot spots if data skewed
  • Use for: Time-series data, leaderboards, range scans
  • Examples: HBase, Bigtable
Geographic Partitioning:
  • Partition by location (US-East, EU-West, APAC)
  • Use for: Data locality, GDPR compliance, low latency
  • Examples: Spanner, Cosmos DB
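
A toy consistent-hash ring with virtual nodes illustrates the idea (a sketch only; production systems use faster hash functions and replication-aware placement):

```python
import bisect
import hashlib

class HashRing:
    """Toy consistent-hash ring with virtual nodes (illustrative sketch)."""

    def __init__(self, nodes, vnodes=64):
        self._keys = []    # sorted virtual-node hashes
        self._owners = {}  # vnode hash -> physical node
        for node in nodes:
            for i in range(vnodes):
                h = self._hash(f"{node}#{i}")
                bisect.insort(self._keys, h)
                self._owners[h] = node

    @staticmethod
    def _hash(s: str) -> int:
        return int(hashlib.sha256(s.encode()).hexdigest(), 16)

    def node_for(self, key: str) -> str:
        # The first virtual node clockwise from the key's hash owns the key.
        idx = bisect.bisect_right(self._keys, self._hash(key)) % len(self._keys)
        return self._owners[self._keys[idx]]
```

Because only the departed node's virtual nodes disappear, removing a node reassigns only the keys that node owned — the "minimal rebalancing" property noted above.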

Resilience Patterns

Circuit Breaker:
[Closed] → Normal operation
   │ (failures exceed threshold)
[Open] → Fail fast (don't call failing service)
   │ (timeout expires)
[Half-Open] → Try single request
   │ success → [Closed]
   │ failure → [Open]
  • Prevents cascading failures
  • Fast-fail instead of waiting for timeout
  • See references/resilience-patterns.md
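
The state machine above can be sketched in a few lines of Python (a minimal, single-threaded illustration — real implementations add locking, metrics, and per-error classification):

```python
import time

class CircuitBreaker:
    """Minimal circuit breaker sketch: Closed -> Open -> Half-Open."""

    def __init__(self, failure_threshold=5, reset_timeout=30.0):
        self.failure_threshold = failure_threshold
        self.reset_timeout = reset_timeout
        self.failures = 0
        self.state = "closed"
        self.opened_at = 0.0

    def call(self, fn, *args, **kwargs):
        if self.state == "open":
            if time.monotonic() - self.opened_at >= self.reset_timeout:
                self.state = "half-open"   # allow a single trial request
            else:
                raise RuntimeError("circuit open: failing fast")
        try:
            result = fn(*args, **kwargs)
        except Exception:
            self.failures += 1
            if self.state == "half-open" or self.failures >= self.failure_threshold:
                self.state = "open"
                self.opened_at = time.monotonic()
            raise
        self.failures = 0
        self.state = "closed"
        return result
```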
Bulkhead Isolation:
  • Isolate resources (thread pools, connection pools)
  • Failure in one partition doesn't affect others
  • Like ship compartments preventing total flooding
Timeout and Retry:
  • Timeout: Set deadlines, fail fast if exceeded
  • Retry: Exponential backoff with jitter
  • Idempotency: Ensure safe retry (critical)
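
Exponential backoff with full jitter can be sketched like this (illustrative only; real code should catch only transient error types, not bare Exception):

```python
import random
import time

def retry(fn, max_attempts=5, base_delay=0.1, cap=5.0):
    """Retry an idempotent call with capped exponential backoff and full jitter.
    Only safe when `fn` is idempotent."""
    for attempt in range(max_attempts):
        try:
            return fn()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # retries exhausted: surface the error
            # Full jitter: sleep a random amount up to the capped backoff.
            time.sleep(random.uniform(0, min(cap, base_delay * 2 ** attempt)))
```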
Rate Limiting and Backpressure:
  • Protect services from overload
  • Token bucket, leaky bucket algorithms
  • Backpressure: Signal upstream to slow down
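
A token bucket is compact enough to sketch directly (single-threaded illustration; production limiters add locking and distributed counters):

```python
import time

class TokenBucket:
    """Token-bucket rate limiter sketch: refill at `rate` tokens/s up to `capacity`."""

    def __init__(self, rate: float, capacity: float):
        self.rate = rate
        self.capacity = capacity
        self.tokens = capacity
        self.updated = time.monotonic()

    def allow(self, cost: float = 1.0) -> bool:
        now = time.monotonic()
        # Refill lazily based on elapsed time, capped at capacity.
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= cost:
            self.tokens -= cost
            return True
        return False  # over the limit: caller should shed or delay the request
```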

Transaction Patterns

Saga Pattern:
  • Coordinate distributed transactions across services
  • Avoids blocking distributed two-phase commit (2PC)
Choreography: Services react to events
Order Service → OrderCreated event
Payment Service → listens → PaymentProcessed event
Inventory Service → listens → InventoryReserved event
(Compensating: if payment fails → InventoryReleased event)
Orchestration: Central coordinator
Saga Orchestrator:
1. Call Order Service
2. Call Payment Service
3. Call Inventory Service
(If step fails → call compensating transactions in reverse)
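
The orchestration flow above reduces to a small control loop (a sketch: each step is an (action, compensation) pair; real orchestrators persist saga state and handle partial compensation failures):

```python
def run_saga(steps):
    """Run (action, compensate) pairs in order; on failure, compensate
    completed steps in reverse order, then re-raise."""
    completed = []
    try:
        for action, compensate in steps:
            action()
            completed.append(compensate)
    except Exception:
        for compensate in reversed(completed):
            compensate()  # undo in reverse order
        raise
```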
Event Sourcing:
  • Store state changes as immutable events
  • Rebuild state by replaying events
  • Audit trail, time travel, debugging
  • Trade-off: Query complexity, snapshot optimization
CQRS (Command Query Responsibility Segregation):
  • Separate read and write models
  • Write model: Normalized, transactional
  • Read model: Denormalized, cached, optimized
  • Use for: Different read/write patterns, high read:write ratio (10:1+)
  • Often paired with Event Sourcing
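
The replay idea behind event sourcing fits in a tiny sketch (a hypothetical Account aggregate; real stores persist events durably and snapshot for long histories):

```python
class Account:
    """Event-sourcing sketch: current state is a fold over an immutable,
    append-only event log; replaying the log rebuilds the state."""

    def __init__(self, events=()):
        self.balance = 0
        self.events = []
        for event in events:
            self._apply(event)   # replay history

    def _apply(self, event):
        kind, amount = event
        if kind == "deposited":
            self.balance += amount
        elif kind == "withdrawn":
            self.balance -= amount
        self.events.append(event)

    def deposit(self, amount):
        self._apply(("deposited", amount))

    def withdraw(self, amount):
        self._apply(("withdrawn", amount))
```

In a CQRS pairing, the same event log would also feed denormalized read models asynchronously.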

Service Discovery

Client-Side Discovery:
  • Client queries service registry (Consul, etcd, Eureka)
  • Client load balances and calls service directly
  • Pro: No proxy overhead
  • Con: Client complexity
Server-Side Discovery:
  • Client calls load balancer
  • Load balancer queries registry and routes
  • Pro: Simple clients
  • Con: Load balancer single point of failure
Service Mesh:
  • Sidecar proxies handle discovery, routing, retry, circuit breaking
  • Examples: Istio, Linkerd
  • Pro: Decouples communication logic from services
  • Con: Operational complexity
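
Client-side discovery can be sketched with an in-memory stand-in for a registry such as Consul or etcd (illustrative only; real registries add health checks, TTLs, and watches):

```python
class Registry:
    """In-memory service registry sketch with round-robin resolution."""

    def __init__(self):
        self._instances = {}
        self._cursor = {}

    def register(self, service, address):
        self._instances.setdefault(service, []).append(address)

    def resolve(self, service):
        """Client-side discovery: pick the next healthy instance round-robin."""
        instances = self._instances[service]
        i = self._cursor.get(service, 0)
        self._cursor[service] = (i + 1) % len(instances)
        return instances[i]
```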

Caching Strategies

Cache-Aside (Lazy Loading):
Read:
1. Check cache → hit? return
2. Miss? Query database
3. Store in cache, return
Write-Through:
Write:
1. Write to cache
2. Cache writes to database synchronously
3. Return success
Write-Behind (Write-Back):
Write:
1. Write to cache
2. Return success
3. Cache writes to database asynchronously (batched)
Cache Invalidation:
  • TTL (Time-To-Live): Expire after duration
  • Event-based: Invalidate on data change
  • Manual: Explicit invalidation on update
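
The cache-aside read path above can be sketched with a tiny TTL cache (names like `get_user` and `TTLCache` are hypothetical; in practice the cache would be Redis or similar):

```python
import time

class TTLCache:
    """Tiny in-process TTL cache used to illustrate cache-aside (not production code)."""

    def __init__(self):
        self._data = {}

    def get(self, key):
        entry = self._data.get(key)
        if entry is None:
            return None
        value, expires = entry
        if time.monotonic() >= expires:
            del self._data[key]   # lazy TTL expiry
            return None
        return value

    def set(self, key, value, ttl):
        self._data[key] = (value, time.monotonic() + ttl)

def get_user(user_id, cache, db, ttl=300):
    """Cache-aside: check cache first; on miss, read the DB and populate."""
    user = cache.get(user_id)
    if user is None:
        user = db[user_id]            # miss: query the database
        cache.set(user_id, user, ttl)
    return user
```

Note the staleness trade-off: until the TTL expires or an invalidation runs, reads keep returning the cached value even after the database changes.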

Decision Frameworks

Choosing Consistency Model

Decision Tree:
├─ Money involved? → Strong Consistency
├─ Double-booking unacceptable? → Strong Consistency
├─ Causality important (chat, edits)? → Causal Consistency
├─ Read-heavy, stale tolerable? → Eventual Consistency
└─ Default? → Eventual (then strengthen if needed)

Choosing Replication Pattern

├─ Single region writes? → Leader-Follower
├─ Multi-region writes + conflicts OK? → Multi-Leader
├─ Multi-region writes + no conflicts? → Leader-Follower with failover
└─ Maximum availability? → Leaderless (quorum)

Choosing Partitioning Strategy

├─ Need range scans? → Range Partitioning (risk: hot spots)
├─ Data residency requirements? → Geographic Partitioning
└─ Default? → Hash Partitioning (consistent hashing)

Quick Reference Tables

CAP/PACELC System Comparison

System      If Partition   Else (Normal)       Use Case
Spanner     PC             EC (strong)         Global SQL
DynamoDB    PA             EL (eventual)       High availability
Cassandra   PA             EL (tunable)        Wide-column store
MongoDB     PC             EC (default)        Document store
Cosmos DB   PA/PC          EL/EC (5 levels)    Multi-model

Consistency Model Use Cases

Use Case                 Consistency Model
Bank account balance     Strong (Linearizable)
Seat booking (airline)   Strong (Linearizable)
Inventory stock count    Strong or Bounded
Shopping cart            Eventual
Product catalog          Eventual
Collaborative editing    Causal
Chat messages            Causal
Social media likes       Eventual
DNS records              Eventual

Quorum Configurations

Configuration   W   R   N   Consistency            Use Case
Strong          3   3   5   Strong (W+R > N)       Banking
Balanced        3   2   5   Eventual (W+R = N)     Default
Write-heavy     2   3   5   Eventual (W+R = N)     Logs
Read-heavy      3   1   5   Eventual (W+R < N)     Cache
Max Avail       1   1   5   Eventual (W+R < N)     Analytics

Note: Guaranteed quorum overlap (strong consistency) requires W + R strictly greater than N; W + R = N leaves a window for stale reads.

Progressive Disclosure

Detailed References

For comprehensive coverage of specific topics, see:
  • references/cap-pacelc-theorem.md - CAP and PACELC deep-dive with PACELC matrix
  • references/consistency-models.md - Strong, eventual, causal, bounded staleness patterns
  • references/replication-patterns.md - Leader-follower, multi-leader, leaderless replication
  • references/partitioning-strategies.md - Hash, range, geographic partitioning with examples
  • references/consensus-algorithms.md - Raft and Paxos overview (when consensus needed)
  • references/resilience-patterns.md - Circuit breaker, bulkhead, timeout, retry, rate limiting
  • references/saga-pattern.md - Choreography vs orchestration with working examples
  • references/event-sourcing-cqrs.md - Event sourcing and CQRS implementation patterns
  • references/service-discovery.md - Client-side, server-side, service mesh patterns
  • references/caching-strategies.md - Cache-aside, write-through, write-behind, invalidation

Working Examples

Complete, runnable examples demonstrating patterns:
  • examples/consistent-hashing/ - Consistent hashing implementation with virtual nodes
  • examples/circuit-breaker/ - Circuit breaker pattern with state transitions
  • examples/saga-orchestration/ - Saga orchestrator with compensating transactions
  • examples/event-sourcing/ - Event store with replay and snapshots
  • examples/cqrs/ - CQRS with separate read/write models
  • examples/service-discovery/ - Consul-based service discovery and registration

ASCII Diagrams

Visual representations for complex concepts:
  • diagrams/cap-theorem.txt - CAP theorem decision tree
  • diagrams/replication-topologies.txt - Leader-follower, multi-leader, leaderless
  • diagrams/saga-flow.txt - Saga choreography and orchestration flows
  • diagrams/caching-patterns.txt - Cache-aside, write-through, write-behind

Integration with Other Skills

Related Skills:
  • For Kubernetes deployment: See kubernetes-operations skill for pod anti-affinity, service mesh
  • For infrastructure: See infrastructure-as-code skill for deploying distributed systems
  • For databases: See databases-sql and databases-nosql for replication configuration
  • For messaging: See message-queues skill for event-driven architectures, saga orchestration
  • For monitoring: See observability skill for distributed tracing, monitoring patterns
  • For testing: See performance-engineering skill for load testing distributed systems
  • For security: See security-hardening skill for mTLS, service authentication

Common Patterns

Multi-Datacenter Pattern

1. Choose replication: Multi-leader or Leaderless
2. Partition data geographically
3. Implement conflict resolution (LWW, vector clocks, app-specific)
4. Monitor replication lag
5. Add circuit breakers between datacenters
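
For step 3, vector clocks detect whether two versions conflict at all — concurrent writes need merging, causally ordered ones do not (a minimal sketch):

```python
def merge(a, b):
    """Element-wise max combines two vector clocks (sketch)."""
    return {n: max(a.get(n, 0), b.get(n, 0)) for n in a.keys() | b.keys()}

def happened_before(a, b):
    """a -> b iff every counter in a is <= b and at least one is strictly less."""
    nodes = a.keys() | b.keys()
    return (all(a.get(n, 0) <= b.get(n, 0) for n in nodes)
            and any(a.get(n, 0) < b.get(n, 0) for n in nodes))

def concurrent(a, b):
    """Concurrent versions are real conflicts needing LWW or application merge."""
    return not happened_before(a, b) and not happened_before(b, a)
```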

Event-Driven Saga Pattern

1. Define saga steps and compensating actions
2. Choose choreography (events) or orchestration (coordinator)
3. Implement idempotent handlers (retries safe)
4. Publish events with outbox pattern (transactional)
5. Monitor saga progress and timeouts

High-Availability Pattern

1. Use leaderless replication (N=5, W=3, R=2)
2. Partition with consistent hashing
3. Add circuit breakers for failing nodes
4. Implement read repair and anti-entropy
5. Monitor quorum health

Best Practices

Design for Failure:
  • Network partitions will occur - always design for partition tolerance
  • Use timeouts, retries with exponential backoff
  • Implement circuit breakers to prevent cascading failures
  • Test chaos engineering scenarios (partition nodes, inject latency)
Choose Consistency Carefully:
  • Default to eventual consistency, strengthen only where needed
  • Strong consistency has real costs (latency, availability)
  • Use bounded staleness for middle ground
Idempotency is Critical:
  • Design operations to be safely retryable
  • Use unique request IDs for deduplication
  • Essential for saga compensating transactions
Monitor and Observe:
  • Distributed tracing with correlation IDs
  • Monitor replication lag, quorum health
  • Alert on circuit breaker state changes
  • Track saga progress and failures
Partition Strategically:
  • Hash partitioning for even distribution
  • Range partitioning for range queries (monitor hot spots)
  • Geographic partitioning for compliance, latency
Version Everything:
  • Event schemas evolve - use versioning
  • API versioning for service compatibility
  • Database schema migrations in distributed systems
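
The request-ID deduplication idea above can be sketched as a decorator (illustrative; production systems persist seen IDs with a TTL rather than holding them in process memory):

```python
def idempotent(handler):
    """Deduplicate retries by unique request ID (sketch)."""
    results = {}

    def wrapper(request_id, payload):
        if request_id in results:
            return results[request_id]  # retry: replay the recorded result
        results[request_id] = handler(payload)
        return results[request_id]

    return wrapper
```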

Anti-Patterns to Avoid

Distributed Monolith:
  • Microservices with tight coupling
  • Shared database across services
  • Fix: Database per service, async communication
Two-Phase Commit (2PC) Overuse:
  • Slow, blocking, reduces availability
  • Fix: Use saga pattern for distributed transactions
Ignoring Network Failures:
  • Assuming network is reliable
  • Fix: Always add timeouts, retries, circuit breakers
Strong Consistency Everywhere:
  • Unnecessary latency and complexity
  • Fix: Use eventual consistency by default, strengthen where needed
No Conflict Resolution Strategy:
  • Multi-leader without handling conflicts
  • Fix: Choose LWW, vector clocks, or app-specific merge
Cache Stampede:
  • TTL expires, all clients query database
  • Fix: Probabilistic early expiration, request coalescing
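
Probabilistic early expiration can be sketched XFetch-style: as an entry nears its TTL, each reader independently chooses to recompute with rising probability, so one client refreshes instead of all of them stampeding at expiry (a sketch; `recompute_cost` is the measured time to rebuild the entry):

```python
import math
import random
import time

def should_recompute(expiry, recompute_cost, beta=1.0, now=None):
    """XFetch-style check: recompute early with probability that grows
    as `now` approaches `expiry`; larger beta favors earlier recomputes."""
    now = time.monotonic() if now is None else now
    # math.log(random.random()) is <= 0, so the subtraction moves `now` forward.
    return now - recompute_cost * beta * math.log(random.random()) >= expiry
```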

Troubleshooting

Replication Lag Too High:
  • Check network bandwidth between datacenters
  • Monitor write throughput on leader
  • Consider async replication or multi-leader
Split-Brain Scenario:
  • Multiple leaders elected during partition
  • Fix: Use consensus (Raft, Paxos) for leader election
  • Implement fencing tokens to prevent dual writes
Hot Partitions:
  • Range partitioning with skewed data
  • Fix: Add hash component, manually redistribute, use composite keys
Saga Timeout/Stalled:
  • Service unavailable, saga can't complete
  • Fix: Implement saga timeout with automated rollback
  • Dead letter queue for manual intervention
Conflict Resolution Failures:
  • Multi-leader conflicts unhandled
  • Fix: Implement clear resolution strategy (LWW, merge, manual)
  • Monitor conflict rate, alert on spikes
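
The fencing-token fix for split-brain can be sketched on the storage side: the store remembers the highest token it has seen and rejects writes from deposed leaders (illustrative only; tokens would come from the consensus-based lock service):

```python
class FencedStore:
    """Fencing-token sketch: writes carrying a stale token are rejected."""

    def __init__(self):
        self.highest_token = 0
        self.data = {}

    def write(self, token, key, value):
        if token < self.highest_token:
            raise PermissionError(f"stale fencing token {token}: writer was deposed")
        self.highest_token = token
        self.data[key] = value
```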