# System Design Framework
A structured approach to designing large-scale distributed systems. Apply these principles when architecting new services, reviewing system designs, estimating capacity, or preparing for system design discussions.
## Core Principle
Start with requirements, not solutions. Every system design begins by clarifying what you are building, for whom, and at what scale. Jumping to architecture before understanding constraints produces over-engineered or under-engineered systems.
The foundation: Scalable systems are not invented from scratch -- they are assembled from well-understood building blocks (load balancers, caches, queues, databases, CDNs) connected by clear data flows. The skill lies in choosing the right blocks, sizing them correctly, and understanding the tradeoffs each choice introduces. A four-step process -- scope, high-level design, deep dive, wrap-up -- keeps the design focused and communicable.
## Scoring
Goal: 10/10. When reviewing or creating system designs, rate them 0-10 based on adherence to the principles below. A 10/10 means the design clearly states requirements, includes back-of-the-envelope estimates, uses appropriate building blocks, addresses scaling and reliability, and acknowledges tradeoffs. Lower scores indicate gaps to address. Always provide the current score and specific improvements needed to reach 10/10.
## The System Design Framework
Six areas for building reliable, scalable distributed systems:
### 1. The Four-Step Process
Core concept: Every system design follows four stages: (1) understand the problem and establish design scope, (2) propose a high-level design and get buy-in, (3) dive deep into critical components, (4) wrap up with tradeoffs and future improvements.
Why it works: Without a structured process, designs either stay too abstract or get lost in premature detail. The four-step approach ensures you invest time proportionally -- broad strokes first, depth where it matters.
Key insights:
- Step 1 consumes ~5-10 minutes: ask clarifying questions, list functional and non-functional requirements, agree on scale (DAU, QPS, storage)
- Step 2 consumes ~15-20 minutes: draw a high-level diagram with APIs, services, data stores, and data flow arrows
- Step 3 consumes ~15-20 minutes: pick 2-3 components that are hardest or most critical and design them in detail
- Step 4 consumes ~5 minutes: summarize tradeoffs, identify bottlenecks, suggest future improvements
- Never skip Step 1 -- ambiguity in scope leads to wasted design effort
- Get explicit agreement on assumptions before proceeding
Code applications:
| Context | Pattern | Example |
|---|---|---|
| New service kickoff | Write a one-page design doc with all four steps before coding | Requirements, API contract, data model, capacity estimate, then implementation |
| Architecture review | Walk reviewers through the four steps sequentially | Present scope, high-level diagram, deep-dive on the riskiest component, open questions |
| Incident postmortem | Trace the failure back through the four-step lens | Which requirement was missed? Which building block failed? What tradeoff bit us? |
See: references/four-step-process.md
### 2. Back-of-the-Envelope Estimation
Core concept: Use powers of two, latency numbers, and simple arithmetic to estimate QPS, storage, bandwidth, and server count before committing to an architecture.
Why it works: Estimation prevents two failure modes: over-provisioning (wasting money) and under-provisioning (outages under load). A 2-minute calculation can save weeks of rework.
Key insights:
- Know the powers of two: 2^10 = 1,024 ≈ 1 thousand, 2^20 ≈ 1 million, 2^30 ≈ 1 billion, 2^40 ≈ 1 trillion
- Memory read ~100 ns, SSD read ~100 us, disk seek ~10 ms, round-trip same datacenter ~0.5 ms, cross-continent ~150 ms
- Availability nines: 99.9% = 8.77 hours downtime/year, 99.99% = 52.6 minutes/year
- QPS estimation: DAU x average-actions-per-day / 86,400 seconds; peak QPS is typically 2-5x average
- Storage estimation: records-per-day x record-size x retention-period
- Always round aggressively -- the goal is order of magnitude, not precision
Code applications:
| Context | Pattern | Example |
|---|---|---|
| Capacity planning | Estimate QPS then multiply by growth factor | 100M DAU x 5 actions / 86400 = ~5,800 QPS avg, ~30K QPS peak |
| Storage budgeting | Estimate per-record size and multiply by volume and retention | 500M tweets/day x 300 bytes x 365 days = ~55 TB/year |
| SLA definition | Convert availability nines to allowed downtime | Four nines (99.99%) = ~52 minutes downtime per year |
See: references/estimation-numbers.md
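The estimation formulas above can be sketched as small helpers. This is an illustrative sketch: the 86,400 seconds/day constant and the 2-5x peak factor come from the bullets above; the function names are mine.

```python
# Back-of-the-envelope estimation helpers for QPS, storage, and availability.

SECONDS_PER_DAY = 86_400
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def avg_qps(dau: int, actions_per_day: float) -> float:
    """Average queries per second: DAU x actions-per-day / seconds-per-day."""
    return dau * actions_per_day / SECONDS_PER_DAY

def peak_qps(dau: int, actions_per_day: float, peak_factor: float = 5) -> float:
    """Peak QPS is typically 2-5x the average; 5x is the conservative end."""
    return avg_qps(dau, actions_per_day) * peak_factor

def yearly_storage_tb(records_per_day: int, record_bytes: int) -> float:
    """Storage per year: records/day x bytes/record x 365, in decimal TB."""
    return records_per_day * record_bytes * 365 / 1e12

def downtime_minutes_per_year(availability: float) -> float:
    """Convert availability nines into allowed downtime per year."""
    return (1 - availability) * MINUTES_PER_YEAR

print(round(avg_qps(100_000_000, 5)))            # ≈ 5,787, i.e. ~5,800 avg QPS
print(yearly_storage_tb(500_000_000, 300))       # ≈ 54.75 TB/year
print(downtime_minutes_per_year(0.9999))         # ≈ 52.56 minutes/year
```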
### 3. Building Blocks
Core concept: Scalable systems are assembled from a standard toolkit: DNS, CDN, load balancers, reverse proxies, application servers, caches, message queues, and consistent hashing.
Why it works: Each block solves a specific scaling or reliability problem. Knowing when and why to introduce each block prevents both premature complexity and avoidable bottlenecks.
Key insights:
- DNS resolves domain names; CDN caches static assets at edge locations close to users
- Load balancers distribute traffic -- L4 (transport layer, fast, simple) vs L7 (application layer, content-aware routing)
- Caching layers: client-side, CDN, web server, application (e.g., Redis/Memcached), database query cache
- Cache strategies: cache-aside (app manages), read-through (cache manages reads), write-through (cache manages writes synchronously), write-behind (cache writes asynchronously)
- Message queues (Kafka, RabbitMQ, SQS) decouple producers from consumers, absorb traffic spikes, and enable async processing
- Consistent hashing distributes keys across nodes with minimal redistribution when nodes are added or removed
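The cache-aside strategy from the list above can be sketched in a few lines: the application manages the cache, reading through on a miss and explicitly invalidating on writes. Plain dicts stand in for Redis and the database, and TTL handling is omitted; all names here are illustrative.

```python
db = {"user:1": {"name": "Ada"}}   # stand-in for the database
cache: dict = {}                   # stand-in for Redis/Memcached

def get_user(key: str):
    if key in cache:               # 1. try the cache first
        return cache[key]
    value = db.get(key)            # 2. on a miss, read the database
    if value is not None:
        cache[key] = value         # 3. populate the cache for next time
    return value

def update_user(key: str, value) -> None:
    db[key] = value                # write goes to the database...
    cache.pop(key, None)           # ...and explicitly invalidates the cache entry
```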
Code applications:
| Context | Pattern | Example |
|---|---|---|
| Read-heavy workload | Add cache-aside with Redis in front of the database | Cache user profiles with TTL; invalidate on write |
| Traffic spikes | Insert a message queue between API and workers | Enqueue image-resize jobs; workers pull at their own pace |
| Global users | Place a CDN in front of static assets | Serve JS/CSS/images from edge; origin only serves API |
| Uneven load | Use consistent hashing for shard assignment | Add a node and only ~1/n keys need to move |
See: references/building-blocks.md
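Consistent hashing can be sketched as a sorted ring of virtual-node hashes: each key is owned by the first node clockwise from its hash. This is a sketch under assumptions, not a production implementation; md5 is an arbitrary stable hash choice and 100 virtual nodes per server is an illustrative default.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node: str) -> None:
        # Each physical node owns `vnodes` points on the ring.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def lookup(self, key: str) -> str:
        # First ring entry clockwise from the key's hash (wrapping around).
        idx = bisect.bisect(self.ring, (self._hash(key), "")) % len(self.ring)
        return self.ring[idx][1]
```

Adding a node moves only the keys on the arcs it takes over (roughly 1/n of them) rather than reshuffling everything, which is the property the table above relies on.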
### 4. Database Design and Scaling
Core concept: Choose SQL vs NoSQL based on data shape and access patterns, then scale vertically first, horizontally (replication and sharding) when vertical limits are reached.
Why it works: The database is usually the first bottleneck. Understanding replication, sharding strategies, and denormalization tradeoffs lets you delay expensive re-architectures and plan growth deliberately.
Key insights:
- Vertical scaling (bigger machine) is simpler but has a ceiling; horizontal scaling (more machines) is harder but nearly unlimited
- Replication: leader-follower (one writer, many readers) for read-heavy; multi-leader for multi-region writes
- Sharding strategies: hash-based (even distribution, hard range queries), range-based (efficient range queries, risk of hotspots), directory-based (flexible, extra lookup)
- SQL when you need ACID transactions, complex joins, and a well-defined schema; NoSQL when you need flexible schema, horizontal scale, or very high write throughput
- Denormalization trades storage and write complexity for faster reads -- use it when read performance is critical and data doesn't change frequently
- Celebrity/hotspot problem: if one shard gets disproportionate traffic, add a secondary partition or cache layer
Code applications:
| Context | Pattern | Example |
|---|---|---|
| Read-heavy API | Leader-follower replication with read replicas | Route reads to replicas, writes to leader; accept slight replication lag |
| User data at scale | Hash-based sharding on user_id | Shard key = hash(user_id) % num_shards; even distribution, each shard independent |
| Analytics dashboard | Denormalize into read-optimized materialized views | Pre-join and aggregate nightly; serve dashboards from the materialized table |
| Multi-region app | Multi-leader replication with conflict resolution | Each region has a leader; last-write-wins or application-level merge |
See: references/database-scaling.md
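The hash-based sharding row above can be sketched as a routing function. md5 stands in for a stable hash because Python's built-in `hash()` is salted per process; `NUM_SHARDS` and the `pools` comment are hypothetical.

```python
import hashlib

NUM_SHARDS = 8  # illustrative; resharding when this changes is its own problem

def shard_for(user_id: int) -> int:
    """shard key = hash(user_id) % num_shards, with a process-stable hash."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Route a query to the owning shard's connection pool (hypothetical `pools` dict):
# pools[shard_for(42)].execute("SELECT * FROM users WHERE user_id = %s", (42,))
```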
### 5. Common System Designs
Core concept: Most systems are variations of a small set of well-known designs: URL shortener, rate limiter, notification system, news feed, chat system, search autocomplete, web crawler, and unique ID generator.
Why it works: Studying common designs builds a mental library of patterns and tradeoffs. When a new problem arrives, you recognize which known design it most resembles and adapt rather than invent from scratch.
Key insights:
- URL shortener: base62 encoding, key-value store, 301 vs 302 redirect tradeoff, analytics via redirect logging
- Rate limiter: token bucket or sliding window algorithm, placed at API gateway or middleware, return 429 with Retry-After header
- News feed: fanout-on-write (push to followers' caches at post time) vs fanout-on-read (pull and merge at read time); hybrid for celebrity accounts
- Chat system: WebSocket for real-time bidirectional communication, message queue for delivery guarantees, presence service via heartbeat
- Search autocomplete: trie data structure, top-k frequent queries, precompute and cache results for popular prefixes
- Web crawler: BFS with URL frontier, politeness (robots.txt, rate limiting per domain), deduplication via content hash
- Unique ID generator: UUID (simple, no coordination) vs Snowflake (time-sortable, 64-bit, datacenter-aware)
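The token bucket algorithm named above can be sketched as follows. Capacity and refill rate are caller-chosen; the `now` parameter is injected only to make the sketch testable.

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: refill at a steady rate, reject when empty."""

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller responds 429 with a Retry-After header
```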
Code applications:
| Context | Pattern | Example |
|---|---|---|
| Short link service | Base62 encode an auto-increment ID or hash | ID 11157 encodes to "2TX" with the alphabet [0-9a-zA-Z] |
| API protection | Token bucket rate limiter at gateway | 100 tokens/min per API key; refill at steady rate; reject with 429 |
| Social feed | Hybrid fanout: push for normal users, pull for celebrities | Pre-compute feeds for accounts with < 10K followers; merge at read time for celebrity posts |
| Distributed IDs | Snowflake: timestamp + datacenter + machine + sequence | 64-bit, time-sortable, no coordination required between generators |
See: references/common-designs.md
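A base62 encoder/decoder sketch for the short-link pattern above, using the common [0-9a-zA-Z] alphabet ordering, under which ID 11157 encodes to "2TX".

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode(n: int) -> str:
    """Encode a non-negative integer ID as a base62 string."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))  # digits come out least-significant first

def decode(s: str) -> int:
    """Decode a base62 string back to the integer ID."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n

print(encode(11157))  # → "2TX"
```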
### 6. Reliability and Operations
Core concept: A system is only as good as its ability to stay up, recover from failures, and be observed. Health checks, monitoring, logging, and deployment strategies are not afterthoughts -- they are first-class design concerns.
Why it works: Production systems fail in ways that design diagrams never predict. Operational readiness -- metrics, alerts, rollback plans, and redundancy -- determines whether a failure becomes a minor blip or a major outage.
Key insights:
- Health checks: liveness (is the process alive?) and readiness (can it serve traffic?) -- Kubernetes uses both
- Monitoring stack: metrics (Prometheus, Datadog), logging (ELK, CloudWatch), tracing (Jaeger, Zipkin) -- the three pillars of observability
- Deployment strategies: rolling (gradual replacement), blue-green (two identical environments, instant switch), canary (small percentage first, then expand)
- Disaster recovery: RPO (how much data can you lose) and RTO (how long until recovery) define your backup and failover strategy
- Multi-datacenter: active-passive (failover) or active-active (both serving); active-active requires data synchronization and conflict resolution
- Autoscaling: scale on CPU, memory, queue depth, or custom metrics; always set both min and max instance counts
Code applications:
| Context | Pattern | Example |
|---|---|---|
| Zero-downtime deploy | Blue-green with health check gates | Route traffic to green after health checks pass; keep blue as instant rollback |
| Gradual rollout | Canary deploy with metric comparison | Send 5% of traffic to new version; compare error rate and latency; promote or rollback |
| Failure detection | Liveness and readiness probes | A failed liveness probe restarts the container; a failed readiness probe removes it from load balancing |
| Data safety | Define RPO/RTO and implement accordingly | RPO = 1 hour means hourly backups; RTO = 5 min means automated failover |
See: references/reliability-operations.md
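The canary promote-or-rollback decision from the table above can be sketched as a metric comparison. The threshold values here are illustrative assumptions, not recommendations, and the metric-dict shape is mine.

```python
def canary_verdict(baseline: dict, canary: dict,
                   max_error_delta: float = 0.005,
                   max_latency_ratio: float = 1.2) -> str:
    """Compare canary metrics against baseline; both dicts carry
    'error_rate' (fraction) and 'p99_ms' (p99 latency in milliseconds)."""
    if canary["error_rate"] > baseline["error_rate"] + max_error_delta:
        return "rollback"  # error rate regressed beyond tolerance
    if canary["p99_ms"] > baseline["p99_ms"] * max_latency_ratio:
        return "rollback"  # latency regressed beyond tolerance
    return "promote"
```

In practice this comparison would run repeatedly against a metrics backend while the canary serves its 5% slice, with rollback automated.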
## Common Mistakes
| Mistake | Why It Fails | Fix |
|---|---|---|
| Jumping to architecture without clarifying requirements | You solve the wrong problem or miss critical constraints | Spend the first 5-10 minutes on scope: features, scale, SLA |
| No back-of-the-envelope estimation | Over-provision or under-provision by orders of magnitude | Estimate QPS, storage, and bandwidth before choosing components |
| Single point of failure | One component failure takes down the entire system | Add redundancy at every layer: multi-server, multi-AZ, multi-region |
| Premature sharding | Adds enormous operational complexity before it is needed | Scale vertically first, add read replicas, cache aggressively, shard last |
| Caching without invalidation strategy | Stale data causes bugs and user confusion | Define TTL, cache-aside with explicit invalidation on writes |
| Synchronous calls everywhere | One slow downstream service cascades latency to all callers | Use message queues for non-latency-critical paths; set timeouts on sync calls |
| Ignoring the celebrity/hotspot problem | One shard or cache key gets hammered, others idle | Detect hot keys, add secondary partitioning, or use local caches |
| No monitoring or alerting | You find out about failures from users, not dashboards | Instrument metrics, logs, and traces from day one |
## Quick Diagnostic
| Question | If No | Action |
|---|---|---|
| Are functional and non-functional requirements explicitly listed? | Design is based on assumptions | Write down features, DAU, QPS, storage, latency SLA, availability SLA |
| Do you have a back-of-the-envelope estimate for QPS and storage? | Capacity is a guess | Calculate: DAU x actions / 86400 for QPS; records x size x retention for storage |
| Is every component in the diagram redundant? | Single points of failure exist | Add replicas, failover, or multi-AZ for each component |
| Is the database scaling strategy defined? | You will hit a wall under growth | Plan: vertical first, then read replicas, then sharding with a clear shard key |
| Is there a caching layer for read-heavy paths? | Database takes unnecessary load | Add Redis/Memcached with cache-aside and a defined TTL |
| Are async paths using message queues? | Tight coupling, cascading failures | Decouple with Kafka/SQS for background jobs, notifications, analytics |
| Is there a monitoring and alerting plan? | Blind to failures in production | Define metrics, log aggregation, tracing, and alert thresholds |
| Is the deployment strategy defined? | Risky all-at-once releases | Choose rolling, blue-green, or canary with automated rollback |
## Reference Files
- four-step-process.md: The complete four-step process with time allocation, example questions, and tips for each stage
- estimation-numbers.md: Powers of two, latency numbers, availability nines, QPS/storage/bandwidth estimation with worked examples
- building-blocks.md: DNS, CDN, load balancers, caching strategies, message queues, consistent hashing
- database-scaling.md: SQL vs NoSQL, replication, sharding strategies, denormalization, database selection guide
- common-designs.md: URL shortener, rate limiter, news feed, chat system, search autocomplete, web crawler, unique ID generator
- reliability-operations.md: Health checks, monitoring, logging, deployment strategies, disaster recovery, autoscaling
## Further Reading
This skill is based on Alex Xu's practical system design methodology. For the complete guides with detailed diagrams and walkthroughs:
- "System Design Interview -- An Insider's Guide" by Alex Xu (Volume 1)
- "System Design Interview -- An Insider's Guide: Volume 2" by Alex Xu
- "Designing Data-Intensive Applications" by Martin Kleppmann (deep dive into data systems fundamentals)
- ByteByteGo -- Alex Xu's platform with visual system design explanations
## About the Author
Alex Xu is a software engineer and the creator of ByteByteGo, one of the most popular platforms for learning system design. His two-volume System Design Interview series has become the de facto preparation resource for engineers at all levels, with over 500,000 copies sold. Xu's approach emphasizes structured thinking, back-of-the-envelope estimation, and clear communication of design decisions. Before ByteByteGo, he worked at Twitter, Apple, and Oracle. His visual explanations and step-by-step frameworks have made system design accessible to a broad engineering audience, transforming what was traditionally an opaque topic into a learnable, repeatable skill.