# System Design Framework
A structured approach to designing large-scale distributed systems. Apply these principles when architecting new services, reviewing system designs, estimating capacity, or preparing for system design discussions.
## Core Principle
Start with requirements, not solutions. Every system design begins by clarifying what you are building, for whom, and at what scale. Jumping to architecture before understanding constraints produces over-engineered or under-engineered systems.
The foundation: Scalable systems are not invented from scratch -- they are assembled from well-understood building blocks (load balancers, caches, queues, databases, CDNs) connected by clear data flows. The skill lies in choosing the right blocks, sizing them correctly, and understanding the tradeoffs each choice introduces. A four-step process -- scope, high-level design, deep dive, wrap-up -- keeps the design focused and communicable.
## Scoring
Goal: 10/10. When reviewing or creating system designs, rate them 0-10 based on adherence to the principles below. A 10/10 means the design clearly states requirements, includes back-of-the-envelope estimates, uses appropriate building blocks, addresses scaling and reliability, and acknowledges tradeoffs. Lower scores indicate gaps to address. Always provide the current score and specific improvements needed to reach 10/10.
## The System Design Framework
Six areas for building reliable, scalable distributed systems:
### 1. The Four-Step Process
Core concept: Every system design follows four stages: (1) understand the problem and establish design scope, (2) propose a high-level design and get buy-in, (3) dive deep into critical components, (4) wrap up with tradeoffs and future improvements.
Why it works: Without a structured process, designs either stay too abstract or get lost in premature detail. The four-step approach ensures you invest time proportionally -- broad strokes first, depth where it matters.
Key insights:
- Step 1 consumes ~5-10 minutes: ask clarifying questions, list functional and non-functional requirements, agree on scale (DAU, QPS, storage)
- Step 2 consumes ~15-20 minutes: draw a high-level diagram with APIs, services, data stores, and data flow arrows
- Step 3 consumes ~15-20 minutes: pick 2-3 components that are hardest or most critical and design them in detail
- Step 4 consumes ~5 minutes: summarize tradeoffs, identify bottlenecks, suggest future improvements
- Never skip Step 1 -- ambiguity in scope leads to wasted design effort
- Get explicit agreement on assumptions before proceeding
Code applications:
| Context | Pattern | Example |
|---|---|---|
| New service kickoff | Write a one-page design doc with all four steps before coding | Requirements, API contract, data model, capacity estimate, then implementation |
| Architecture review | Walk reviewers through the four steps sequentially | Present scope, high-level diagram, deep-dive on the riskiest component, open questions |
| Incident postmortem | Trace the failure back through the four-step lens | Which requirement was missed? Which building block failed? What tradeoff bit us? |
See: references/four-step-process.md
### 2. Back-of-the-Envelope Estimation
Core concept: Use powers of two, latency numbers, and simple arithmetic to estimate QPS, storage, bandwidth, and server count before committing to an architecture.
Why it works: Estimation prevents two failure modes: over-provisioning (wasting money) and under-provisioning (outages under load). A 2-minute calculation can save weeks of rework.
Key insights:
- Know the powers of two: 2^10 = 1,024 ≈ 1 thousand, 2^20 ≈ 1 million, 2^30 ≈ 1 billion, 2^40 ≈ 1 trillion
- Memory read ~100 ns, SSD read ~100 us, disk seek ~10 ms, round-trip same datacenter ~0.5 ms, cross-continent ~150 ms
- Availability nines: 99.9% = 8.77 hours downtime/year, 99.99% = 52.6 minutes/year
- QPS estimation: DAU x average-actions-per-day / 86,400 seconds; peak QPS is typically 2-5x average
- Storage estimation: records-per-day x record-size x retention-period
- Always round aggressively -- the goal is order of magnitude, not precision
Code applications:
| Context | Pattern | Example |
|---|---|---|
| Capacity planning | Estimate QPS then multiply by growth factor | 100M DAU x 5 actions / 86400 = ~5,800 QPS avg, ~30K QPS peak |
| Storage budgeting | Estimate per-record size and multiply by volume and retention | 500M tweets/day x 300 bytes x 365 days = ~55 TB/year |
| SLA definition | Convert availability nines to allowed downtime | Four nines (99.99%) = ~52 minutes downtime per year |
See: references/estimation-numbers.md
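The estimation formulas above can be sketched as small helpers. This is an illustrative sketch: the 86,400 seconds/day constant and the 2-5x peak factor come from the bullets above; the function names are mine.

```python
# Back-of-the-envelope estimation helpers for QPS, storage, and availability.

SECONDS_PER_DAY = 86_400
MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600

def avg_qps(dau: int, actions_per_day: float) -> float:
    """Average queries per second: DAU x actions-per-day / seconds-per-day."""
    return dau * actions_per_day / SECONDS_PER_DAY

def peak_qps(dau: int, actions_per_day: float, peak_factor: float = 5) -> float:
    """Peak QPS is typically 2-5x the average; 5x is the conservative end."""
    return avg_qps(dau, actions_per_day) * peak_factor

def yearly_storage_tb(records_per_day: int, record_bytes: int) -> float:
    """Storage per year: records/day x bytes/record x 365, in decimal TB."""
    return records_per_day * record_bytes * 365 / 1e12

def downtime_minutes_per_year(availability: float) -> float:
    """Convert availability nines into allowed downtime per year."""
    return (1 - availability) * MINUTES_PER_YEAR

print(round(avg_qps(100_000_000, 5)))            # ≈ 5,787, i.e. ~5,800 avg QPS
print(yearly_storage_tb(500_000_000, 300))       # ≈ 54.75 TB/year
print(downtime_minutes_per_year(0.9999))         # ≈ 52.56 minutes/year
```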
### 3. Building Blocks
Core concept: Scalable systems are assembled from a standard toolkit: DNS, CDN, load balancers, reverse proxies, application servers, caches, message queues, and consistent hashing.
Why it works: Each block solves a specific scaling or reliability problem. Knowing when and why to introduce each block prevents both premature complexity and avoidable bottlenecks.
Key insights:
- DNS resolves domain names; CDN caches static assets at edge locations close to users
- Load balancers distribute traffic -- L4 (transport layer, fast, simple) vs L7 (application layer, content-aware routing)
- Caching layers: client-side, CDN, web server, application (e.g., Redis/Memcached), database query cache
- Cache strategies: cache-aside (app manages), read-through (cache manages reads), write-through (cache manages writes synchronously), write-behind (cache writes asynchronously)
- Message queues (Kafka, RabbitMQ, SQS) decouple producers from consumers, absorb traffic spikes, and enable async processing
- Consistent hashing distributes keys across nodes with minimal redistribution when nodes are added or removed
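The cache-aside strategy from the list above can be sketched in a few lines: the application manages the cache, reading through on a miss and explicitly invalidating on writes. Plain dicts stand in for Redis and the database, and TTL handling is omitted; all names here are illustrative.

```python
db = {"user:1": {"name": "Ada"}}   # stand-in for the database
cache: dict = {}                   # stand-in for Redis/Memcached

def get_user(key: str):
    if key in cache:               # 1. try the cache first
        return cache[key]
    value = db.get(key)            # 2. on a miss, read the database
    if value is not None:
        cache[key] = value         # 3. populate the cache for next time
    return value

def update_user(key: str, value) -> None:
    db[key] = value                # write goes to the database...
    cache.pop(key, None)           # ...and explicitly invalidates the cache entry
```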
Code applications:
| Context | Pattern | Example |
|---|---|---|
| Read-heavy workload | Add cache-aside with Redis in front of the database | Cache user profiles with TTL; invalidate on write |
| Traffic spikes | Insert a message queue between API and workers | Enqueue image-resize jobs; workers pull at their own pace |
| Global users | Place a CDN in front of static assets | Serve JS/CSS/images from edge; origin only serves API |
| Uneven load | Use consistent hashing for shard assignment | Add a node and only ~1/n keys need to move |
See: references/building-blocks.md
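Consistent hashing can be sketched as a sorted ring of virtual-node hashes: each key is owned by the first node clockwise from its hash. This is a sketch under assumptions, not a production implementation; md5 is an arbitrary stable hash choice and 100 virtual nodes per server is an illustrative default.

```python
import bisect
import hashlib

class ConsistentHashRing:
    """Minimal consistent-hash ring with virtual nodes."""

    def __init__(self, nodes=(), vnodes: int = 100):
        self.vnodes = vnodes
        self.ring = []  # sorted list of (hash, node) pairs
        for node in nodes:
            self.add(node)

    def _hash(self, key: str) -> int:
        return int(hashlib.md5(key.encode()).hexdigest(), 16)

    def add(self, node: str) -> None:
        # Each physical node owns `vnodes` points on the ring.
        for i in range(self.vnodes):
            bisect.insort(self.ring, (self._hash(f"{node}#{i}"), node))

    def lookup(self, key: str) -> str:
        # First ring entry clockwise from the key's hash (wrapping around).
        idx = bisect.bisect(self.ring, (self._hash(key), "")) % len(self.ring)
        return self.ring[idx][1]
```

Adding a node moves only the keys on the arcs it takes over (roughly 1/n of them) rather than reshuffling everything, which is the property the table above relies on.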
### 4. Database Design and Scaling
Core concept: Choose SQL vs NoSQL based on data shape and access patterns, then scale vertically first, horizontally (replication and sharding) when vertical limits are reached.
Why it works: The database is usually the first bottleneck. Understanding replication, sharding strategies, and denormalization tradeoffs lets you delay expensive re-architectures and plan growth deliberately.
Key insights:
- Vertical scaling (bigger machine) is simpler but has a ceiling; horizontal scaling (more machines) is harder but nearly unlimited
- Replication: leader-follower (one writer, many readers) for read-heavy; multi-leader for multi-region writes
- Sharding strategies: hash-based (even distribution, hard range queries), range-based (efficient range queries, risk of hotspots), directory-based (flexible, extra lookup)
- SQL when you need ACID transactions, complex joins, and a well-defined schema; NoSQL when you need flexible schema, horizontal scale, or very high write throughput
- Denormalization trades storage and write complexity for faster reads -- use it when read performance is critical and data doesn't change frequently
- Celebrity/hotspot problem: if one shard gets disproportionate traffic, add a secondary partition or cache layer
Code applications:
| Context | Pattern | Example |
|---|---|---|
| Read-heavy API | Leader-follower replication with read replicas | Route reads to replicas, writes to leader; accept slight replication lag |
| User data at scale | Hash-based sharding on user_id | Shard key = hash(user_id) % num_shards; even distribution, each shard independent |
| Analytics dashboard | Denormalize into read-optimized materialized views | Pre-join and aggregate nightly; serve dashboards from the materialized table |
| Multi-region app | Multi-leader replication with conflict resolution | Each region has a leader; last-write-wins or application-level merge |
See: references/database-scaling.md
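The hash-based sharding row above can be sketched as a routing function. md5 stands in for a stable hash because Python's built-in `hash()` is salted per process; `NUM_SHARDS` and the `pools` comment are hypothetical.

```python
import hashlib

NUM_SHARDS = 8  # illustrative; resharding when this changes is its own problem

def shard_for(user_id: int) -> int:
    """shard key = hash(user_id) % num_shards, with a process-stable hash."""
    digest = hashlib.md5(str(user_id).encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

# Route a query to the owning shard's connection pool (hypothetical `pools` dict):
# pools[shard_for(42)].execute("SELECT * FROM users WHERE user_id = %s", (42,))
```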
### 5. Common System Designs
Core concept: Most systems are variations of a small set of well-known designs: URL shortener, rate limiter, notification system, news feed, chat system, search autocomplete, web crawler, and unique ID generator.
Why it works: Studying common designs builds a mental library of patterns and tradeoffs. When a new problem arrives, you recognize which known design it most resembles and adapt rather than invent from scratch.
Key insights:
- URL shortener: base62 encoding, key-value store, 301 vs 302 redirect tradeoff, analytics via redirect logging
- Rate limiter: token bucket or sliding window algorithm, placed at API gateway or middleware, return 429 with Retry-After header
- News feed: fanout-on-write (push to followers' caches at post time) vs fanout-on-read (pull and merge at read time); hybrid for celebrity accounts
- Chat system: WebSocket for real-time bidirectional communication, message queue for delivery guarantees, presence service via heartbeat
- Search autocomplete: trie data structure, top-k frequent queries, precompute and cache results for popular prefixes
- Web crawler: BFS with URL frontier, politeness (robots.txt, rate limiting per domain), deduplication via content hash
- Unique ID generator: UUID (simple, no coordination) vs Snowflake (time-sortable, 64-bit, datacenter-aware)
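The token bucket algorithm named above can be sketched as follows. Capacity and refill rate are caller-chosen; the `now` parameter is injected only to make the sketch testable.

```python
import time

class TokenBucket:
    """Token-bucket rate limiter: refill at a steady rate, reject when empty."""

    def __init__(self, capacity: int, refill_per_sec: float):
        self.capacity = capacity
        self.refill_per_sec = refill_per_sec
        self.tokens = float(capacity)
        self.last = time.monotonic()

    def allow(self, now=None) -> bool:
        now = time.monotonic() if now is None else now
        # Refill proportionally to elapsed time, capped at capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.refill_per_sec)
        self.last = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False  # caller responds 429 with a Retry-After header
```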
Code applications:
| Context | Pattern | Example |
|---|---|---|
| Short link service | Base62 encode an auto-increment ID or hash | ID 11157 encodes to "2TX" with the alphabet [0-9a-zA-Z] |
| API protection | Token bucket rate limiter at gateway | 100 tokens/min per API key; refill at steady rate; reject with 429 |
| Social feed | Hybrid fanout: push for normal users, pull for celebrities | Pre-compute feeds for accounts with < 10K followers; merge at read time for celebrity posts |
| Distributed IDs | Snowflake: timestamp + datacenter + machine + sequence | 64-bit, time-sortable, no coordination required between generators |
See: references/common-designs.md
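A base62 encoder/decoder sketch for the short-link pattern above, using the common [0-9a-zA-Z] alphabet ordering, under which ID 11157 encodes to "2TX".

```python
ALPHABET = "0123456789abcdefghijklmnopqrstuvwxyzABCDEFGHIJKLMNOPQRSTUVWXYZ"

def encode(n: int) -> str:
    """Encode a non-negative integer ID as a base62 string."""
    if n == 0:
        return ALPHABET[0]
    out = []
    while n:
        n, rem = divmod(n, 62)
        out.append(ALPHABET[rem])
    return "".join(reversed(out))  # digits come out least-significant first

def decode(s: str) -> int:
    """Decode a base62 string back to the integer ID."""
    n = 0
    for ch in s:
        n = n * 62 + ALPHABET.index(ch)
    return n

print(encode(11157))  # → "2TX"
```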
### 6. Reliability and Operations
Core concept: A system is only as good as its ability to stay up, recover from failures, and be observed. Health checks, monitoring, logging, and deployment strategies are not afterthoughts -- they are first-class design concerns.
Why it works: Production systems fail in ways that design diagrams never predict. Operational readiness -- metrics, alerts, rollback plans, and redundancy -- determines whether a failure becomes a minor blip or a major outage.
Key insights:
- Health checks: liveness (is the process alive?) and readiness (can it serve traffic?) -- Kubernetes uses both
- Monitoring stack: metrics (Prometheus, Datadog), logging (ELK, CloudWatch), tracing (Jaeger, Zipkin) -- the three pillars of observability
- Deployment strategies: rolling (gradual replacement), blue-green (two identical environments, instant switch), canary (small percentage first, then expand)
- Disaster recovery: RPO (how much data can you lose) and RTO (how long until recovery) define your backup and failover strategy
- Multi-datacenter: active-passive (failover) or active-active (both serving); active-active requires data synchronization and conflict resolution
- Autoscaling: scale on CPU, memory, queue depth, or custom metrics; always set both min and max instance counts
Code applications:
| Context | Pattern | Example |
|---|---|---|
| Zero-downtime deploy | Blue-green with health check gates | Route traffic to green after health checks pass; keep blue as instant rollback |
| Gradual rollout | Canary deploy with metric comparison | Send 5% of traffic to new version; compare error rate and latency; promote or rollback |
| Failure detection | Liveness and readiness probes | A failed liveness probe restarts the container; a failed readiness probe removes it from load balancing |
| Data safety | Define RPO/RTO and implement accordingly | RPO = 1 hour means hourly backups; RTO = 5 min means automated failover |
See: references/reliability-operations.md
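The canary promote-or-rollback decision from the table above can be sketched as a metric comparison. The threshold values here are illustrative assumptions, not recommendations, and the metric-dict shape is mine.

```python
def canary_verdict(baseline: dict, canary: dict,
                   max_error_delta: float = 0.005,
                   max_latency_ratio: float = 1.2) -> str:
    """Compare canary metrics against baseline; both dicts carry
    'error_rate' (fraction) and 'p99_ms' (p99 latency in milliseconds)."""
    if canary["error_rate"] > baseline["error_rate"] + max_error_delta:
        return "rollback"  # error rate regressed beyond tolerance
    if canary["p99_ms"] > baseline["p99_ms"] * max_latency_ratio:
        return "rollback"  # latency regressed beyond tolerance
    return "promote"
```

In practice this comparison would run repeatedly against a metrics backend while the canary serves its 5% slice, with rollback automated.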
## Common Mistakes
| Mistake | Why It Fails | Fix |
|---|---|---|
| Jumping to architecture without clarifying requirements | You solve the wrong problem or miss critical constraints | Spend the first 5-10 minutes on scope: features, scale, SLA |
| No back-of-the-envelope estimation | Over-provision or under-provision by orders of magnitude | Estimate QPS, storage, and bandwidth before choosing components |
| Single point of failure | One component failure takes down the entire system | Add redundancy at every layer: multi-server, multi-AZ, multi-region |
| Premature sharding | Adds enormous operational complexity before it is needed | Scale vertically first, add read replicas, cache aggressively, shard last |
| Caching without invalidation strategy | Stale data causes bugs and user confusion | Define TTL, cache-aside with explicit invalidation on writes |
| Synchronous calls everywhere | One slow downstream service cascades latency to all callers | Use message queues for non-latency-critical paths; set timeouts on sync calls |
| Ignoring the celebrity/hotspot problem | One shard or cache key gets hammered, others idle | Detect hot keys, add secondary partitioning, or use local caches |
| No monitoring or alerting | You find out about failures from users, not dashboards | Instrument metrics, logs, and traces from day one |
## Quick Diagnostic
| Question | If No | Action |
|---|---|---|
| Are functional and non-functional requirements explicitly listed? | Design is based on assumptions | Write down features, DAU, QPS, storage, latency SLA, availability SLA |
| Do you have a back-of-the-envelope estimate for QPS and storage? | Capacity is a guess | Calculate: DAU x actions / 86400 for QPS; records x size x retention for storage |
| Is every component in the diagram redundant? | Single points of failure exist | Add replicas, failover, or multi-AZ for each component |
| Is the database scaling strategy defined? | You will hit a wall under growth | Plan: vertical first, then read replicas, then sharding with a clear shard key |
| Is there a caching layer for read-heavy paths? | Database takes unnecessary load | Add Redis/Memcached with cache-aside and a defined TTL |
| Are async paths using message queues? | Tight coupling, cascading failures | Decouple with Kafka/SQS for background jobs, notifications, analytics |
| Is there a monitoring and alerting plan? | Blind to failures in production | Define metrics, log aggregation, tracing, and alert thresholds |
| Is the deployment strategy defined? | Risky all-at-once releases | Choose rolling, blue-green, or canary with automated rollback |
## Reference Files
- four-step-process.md: The complete four-step process with time allocation, example questions, and tips for each stage
- estimation-numbers.md: Powers of two, latency numbers, availability nines, QPS/storage/bandwidth estimation with worked examples
- building-blocks.md: DNS, CDN, load balancers, caching strategies, message queues, consistent hashing
- database-scaling.md: SQL vs NoSQL, replication, sharding strategies, denormalization, database selection guide
- common-designs.md: URL shortener, rate limiter, news feed, chat system, search autocomplete, web crawler, unique ID generator
- reliability-operations.md: Health checks, monitoring, logging, deployment strategies, disaster recovery, autoscaling
## Further Reading
This skill is based on Alex Xu's practical system design methodology. For the complete guides with detailed diagrams and walkthroughs:
- "System Design Interview -- An Insider's Guide" by Alex Xu (Volume 1)
- "System Design Interview -- An Insider's Guide: Volume 2" by Alex Xu
- "Designing Data-Intensive Applications" by Martin Kleppmann (deep dive into data systems fundamentals)
- ByteByteGo -- Alex Xu's platform with visual system design explanations
## About the Author
Alex Xu is a software engineer and the creator of ByteByteGo, one of the most popular platforms for learning system design. His two-volume System Design Interview series has become the de facto preparation resource for engineers at all levels, with over 500,000 copies sold. Xu's approach emphasizes structured thinking, back-of-the-envelope estimation, and clear communication of design decisions. Before ByteByteGo, he worked at Twitter, Apple, and Oracle. His visual explanations and step-by-step frameworks have made system design accessible to a broad engineering audience, transforming what was traditionally an opaque topic into a learnable, repeatable skill.