data-systems-architecture

Compare original and translation side by side

🇺🇸

Original

English

🇨🇳

Translation

Chinese

Data Systems Architecture

数据系统架构

Overview

概述

Core principle: Good data system architecture balances reliability (correct operation under faults), scalability (handling growth gracefully), and maintainability (enabling productive change over time). Every architectural decision involves trade-offs between these concerns.

This skill synthesizes knowledge from three foundational texts:

Designing Data-Intensive Applications (Kleppmann) - distributed systems, storage engines, scaling
The Art of PostgreSQL (Fontaine) - PostgreSQL-specific patterns, SQL as programming
PostgreSQL Query Optimization (Dombrovskaya et al.) - execution plans, performance tuning

核心原则：优秀的数据系统架构需平衡可靠性（故障下仍能正确运行）、可扩展性（从容应对业务增长）和可维护性（支持长期高效迭代变更）。每一项架构决策都需要在这些关注点之间进行权衡。

本技能整合了三本经典著作的知识：

《Designing Data-Intensive Applications》（Kleppmann）- 分布式系统、存储引擎、扩展方案
《The Art of PostgreSQL》（Fontaine）- PostgreSQL专属模式、SQL编程
《PostgreSQL Query Optimization》（Dombrovskaya等）- 执行计划、性能调优

When to Use

适用场景

Symptom	Start With
Designing a new database/schema	`01-foundational-principles.md`
Normalization vs denormalization decisions	`02-data-modeling.md`
Need to understand OLTP vs OLAP	`03-storage-engines.md`
Slow queries, index selection	`04-indexing.md`
Planning for growth, read replicas	`05-scaling-patterns.md`
Race conditions, deadlocks, isolation issues	`06-transactions-concurrency.md`
N+1 queries, ORM problems, application integration	`07-application-integration.md`

场景	开始阅读
设计新数据库/模式	`01-foundational-principles.md`
规范化与反规范化决策	`02-data-modeling.md`
需要理解OLTP与OLAP的区别	`03-storage-engines.md`
查询缓慢、索引选择	`04-indexing.md`
规划增长方案、只读副本	`05-scaling-patterns.md`
竞态条件、死锁、隔离级别问题	`06-transactions-concurrency.md`
N+1查询、ORM问题、应用集成	`07-application-integration.md`

Navigation

01-foundational-principles.md    - Reliability/Scalability/Maintainability, load parameters
02-data-modeling.md              - Normalization, denormalization, schema design patterns
03-storage-engines.md            - B-trees, LSM-trees, OLTP vs OLAP, PostgreSQL internals
04-indexing.md                   - Index types, compound indexes, covering indexes, maintenance
05-scaling-patterns.md           - Replication, partitioning, sharding strategies
06-transactions-concurrency.md   - ACID, isolation levels, MVCC, locking patterns
07-application-integration.md    - ORM pitfalls, N+1, business logic placement, batch processing

01-foundational-principles.md    - 可靠性/可扩展性/可维护性、负载参数
02-data-modeling.md              - 规范化、反规范化、模式设计模式
03-storage-engines.md            - B-trees、LSM-trees、OLTP vs OLAP、PostgreSQL内部机制
04-indexing.md                   - 索引类型、复合索引、覆盖索引、维护
05-scaling-patterns.md           - 复制、分区、分片策略
06-transactions-concurrency.md   - ACID、隔离级别、MVCC、锁模式
07-application-integration.md    - ORM陷阱、N+1问题、业务逻辑位置、批处理

Quick Decision Framework

快速决策框架

New system design?
├─ Yes → Read 01, then 02 for data model
└─ No → What's the problem?
         ├─ "Queries are slow" → Read 04 (indexing) + 03 (storage patterns)
         ├─ "Data is inconsistent" → Read 02 (modeling) + 06 (transactions)
         ├─ "Can't handle the load" → Read 05 (scaling) + 03 (OLTP vs OLAP)
         ├─ "App makes too many queries" → Read 07 (N+1, ORM patterns)
         └─ "Race conditions/deadlocks" → Read 06 (concurrency)

新系统设计？
├─ 是 → 先阅读01，再阅读02了解数据模型
└─ 否 → 问题是什么？
         ├─ "查询缓慢" → 阅读04（索引） + 03（存储模式）
         ├─ "数据不一致" → 阅读02（建模） + 06（事务）
         ├─ "无法承载负载" → 阅读05（扩展） + 03（OLTP vs OLAP）
         ├─ "应用查询次数过多" → 阅读07（N+1、ORM模式）
         └─ "竞态条件/死锁" → 阅读06（并发）

Core Concepts (Quick Reference)

核心概念（快速参考）

The Three Pillars

三大支柱

Concern	Definition	Key Question
Reliability	System works correctly under faults	What happens when things fail?
Scalability	Handles growth gracefully	What's 10x load look like?
Maintainability	Easy to operate and evolve	Can new engineers understand this?

关注点	定义	关键问题
可靠性	系统在故障下仍能正确运行	出现故障时会发生什么？
可扩展性	从容应对业务增长	10倍负载下系统表现如何？
可维护性	易于运维和迭代	新工程师能快速理解吗？

Data Model Selection

数据模型选择

Model	Best For	Avoid When
Relational	Many-to-many relationships, joins, consistency	Highly hierarchical data, constant schema changes
Document	Self-contained docs, tree structures	Need for joins, many-to-many
Graph	Highly connected data, recursive queries	Simple CRUD, no relationship traversal

模型	适用场景	避免场景
关系型	多对多关系、关联查询、一致性要求高	高度层级化数据、频繁变更模式
文档型	自包含文档、树形结构	需要关联查询、多对多关系
图型	高度关联数据、递归查询	简单CRUD操作、无需关系遍历

OLTP vs OLAP

Aspect	OLTP	OLAP
Query pattern	Point lookups, few rows	Aggregates, many rows
Optimization	Index everything used in WHERE	Fewer indexes, full scans OK
Storage	Row-oriented	Consider column-oriented

维度	OLTP	OLAP
查询模式	点查询、少量行	聚合查询、大量行
优化方向	为WHERE子句中用到的字段建立索引	减少索引、允许全表扫描
存储方式	行式存储	考虑列式存储

Index Type Quick Reference

索引类型快速参考

Type	Use Case	PostgreSQL
B-tree	Equality, range, sorting	Default, most queries
Hash	Equality only	Faster for exact match
GIN	Arrays, JSONB, full-text	`@>` , `@@` operators
GiST	Geometric, range types	PostGIS, nearest-neighbor
BRIN	Large, naturally ordered tables	Time-series data

类型	适用场景	PostgreSQL支持情况
B-tree	等值查询、范围查询、排序	默认类型，适用于大多数查询
Hash	仅等值查询	精确匹配速度更快
GIN	数组、JSONB、全文检索	支持 `@>` 、 `@@` 运算符
GiST	几何类型、范围类型	支持PostGIS、最近邻查询
BRIN	大型、自然有序的表	时间序列数据

Isolation Levels

隔离级别

Level	Prevents	PostgreSQL Default?
Read Committed	Dirty reads	Yes
Repeatable Read	+ Non-repeatable reads	No
Serializable	All anomalies	No (uses SSI)

级别	防止问题	是否为PostgreSQL默认？
Read Committed	脏读	是
Repeatable Read	+ 不可重复读	否
Serializable	所有异常	否（使用SSI）

Design Checklist

设计检查清单

References

参考资料

Kleppmann, M. Designing Data-Intensive Applications (O'Reilly, 2017)
Fontaine, D. The Art of PostgreSQL (2nd ed., 2020)
Dombrovskaya, H., Novikov, B., Bailliekova, A. PostgreSQL Query Optimization (Apress, 2021)

Kleppmann, M. 《Designing Data-Intensive Applications》（O'Reilly, 2017）
Fontaine, D. 《The Art of PostgreSQL》（第2版, 2020）
Dombrovskaya, H., Novikov, B., Bailliekova, A. 《PostgreSQL Query Optimization》（Apress, 2021）

data-systems-architecture

Original

Translation

Data Systems Architecture

数据系统架构

Overview

概述

When to Use

适用场景

Navigation

导航

Reference Files (Load as needed)

参考文件（按需加载）

Quick Decision Framework

快速决策框架

Core Concepts (Quick Reference)

核心概念（快速参考）

The Three Pillars

三大支柱

Data Model Selection

数据模型选择

OLTP vs OLAP

OLTP vs OLAP

Index Type Quick Reference

索引类型快速参考

Isolation Levels

隔离级别

Design Checklist

设计检查清单

References

参考资料