
# Database Schema Design

## Overview

Guide the design, implementation, and optimization of database schemas with sound data modeling, safe migrations, effective indexing, and appropriate query patterns. This skill covers the full lifecycle from conceptual modeling through physical optimization, ensuring schemas that are normalized, performant, and safely evolvable.
Announce at start: "I'm using the database-schema-design skill to design the database schema."

## Phase 1: Discovery and Conceptual Model

Ask these questions to understand the data requirements:
| # | Question | What It Determines |
|---|----------|--------------------|
| 1 | What entities does the system manage? | Table names |
| 2 | What are the relationships between entities? | Foreign keys, join tables |
| 3 | What are the key attributes of each entity? | Column definitions |
| 4 | What are the primary query patterns? | Index strategy |
| 5 | What is the expected data volume? (rows, growth rate) | Partitioning, scaling |
| 6 | What is the read/write ratio? | Normalization vs denormalization |
| 7 | SQL or NoSQL? (or both?) | Storage engine selection |

### Storage Engine Decision Table

| Factor | Choose SQL (PostgreSQL, MySQL) | Choose Document (MongoDB) | Choose Key-Value (Redis) |
|--------|-------------------------------|---------------------------|--------------------------|
| Data shape | Structured, relational | Semi-structured, nested | Simple lookups, caching |
| Query complexity | Complex joins, aggregations | Document-level queries | Key-based access only |
| Consistency needs | ACID required | Eventual consistency OK | Ephemeral or cached data |
| Schema evolution | Migrations manageable | Schema-free flexibility | No schema |
| Scale pattern | Vertical first, then read replicas | Horizontal sharding | In-memory, limited size |

STOP after discovery — present the conceptual model (entities, relationships, cardinality) for confirmation.

## Phase 2: Logical Model Design

Translate the conceptual model into tables, columns, types, and constraints.

### Column Design Rules

| Decision | Guidance |
|----------|----------|
| Primary keys | UUIDs for distributed systems, auto-increment for single-node |
| Column types | Use the most specific type (`timestamptz`, not `varchar`, for dates) |
| Nullability | Default NOT NULL; allow NULL only when absence is meaningful |
| Defaults | Set sensible defaults (`created_at DEFAULT now()`) |
| Constraints | Add CHECK, UNIQUE, and FK constraints at the schema level |
| Naming | `snake_case`; singular or plural table names — be consistent |
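The rules above can be sketched in a single table definition. This is an illustrative example, not a prescribed schema; it assumes PostgreSQL 13+ for `gen_random_uuid()`:

```sql
-- Illustrative only: applies the column design rules above (PostgreSQL)
CREATE TABLE users (
    id           uuid PRIMARY KEY DEFAULT gen_random_uuid(), -- UUID PK suits distributed systems
    email        text NOT NULL UNIQUE,                       -- NOT NULL by default; UNIQUE at schema level
    display_name text,                                       -- NULL allowed: absence is meaningful here
    age          int CHECK (age >= 0),                       -- CHECK constraint enforced by the database
    created_at   timestamptz NOT NULL DEFAULT now()          -- specific type plus a sensible default
);
```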

### Normalization Guide

| Normal Form | Rule | Violation Example | Fix |
|-------------|------|-------------------|-----|
| 1NF | Atomic values, no repeating groups | `tags VARCHAR` storing `"urgent,priority,vip"` | Separate `order_tags` table |
| 2NF | All non-key columns depend on entire PK | `product_name` in `order_items` (composite PK) | Move to `products` table |
| 3NF | No transitive dependencies | `city` depends on `zip_code`, not `user_id` | Separate `zip_codes` table |

Rule: Always start normalized. Denormalize only with measured evidence.
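The 1NF fix in the table above can be sketched as DDL (table and column names are hypothetical):

```sql
-- Before (violates 1NF): tags packed into one column
--   orders(id, tags VARCHAR)   e.g. 'urgent,priority,vip'

-- After: one row per tag, enforced atomically by the schema
CREATE TABLE order_tags (
    order_id bigint NOT NULL REFERENCES orders(id),
    tag      text   NOT NULL,
    PRIMARY KEY (order_id, tag)   -- a tag appears at most once per order
);
```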

### Denormalization Decision Table

| Scenario | Pattern | When to Apply |
|----------|---------|---------------|
| Read-heavy dashboards | Materialized views or summary tables | Measured slow query |
| Frequently joined data | Embed as JSONB column | Join is >80% of query time |
| Reporting / analytics | Separate denormalized reporting tables | OLAP workload |
| Caching layer | Computed columns refreshed on write | High-frequency reads |
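The read-heavy dashboard case can be sketched with a materialized view (PostgreSQL; the `orders` table and refresh cadence are illustrative):

```sql
-- Precompute dashboard aggregates instead of recomputing per request
CREATE MATERIALIZED VIEW daily_order_totals AS
SELECT date_trunc('day', created_at) AS day,
       count(*)   AS order_count,
       sum(total) AS revenue
FROM orders
GROUP BY 1;

-- CONCURRENTLY refresh avoids blocking readers; it requires a unique index
CREATE UNIQUE INDEX ON daily_order_totals (day);
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_order_totals;
```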

### Relationship Patterns

| Relationship | Implementation | Index Needed |
|--------------|----------------|--------------|
| One-to-one | FK with UNIQUE constraint on child | On FK column |
| One-to-many | FK on the "many" side | On FK column |
| Many-to-many | Junction/join table with composite PK | On both FK columns |
| Polymorphic | Separate FK columns with CHECK constraint (preferred) or type+id pattern | On type+id or each FK |
| Self-referential (trees) | `parent_id` FK to same table; or `ltree`/materialized path | On `parent_id` or path |

STOP after logical model — present the table definitions for review.
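Two of the patterns above as DDL sketches (PostgreSQL; all table names are hypothetical, and `num_nonnulls` assumes PostgreSQL 9.6+):

```sql
-- Many-to-many: junction table with composite PK; both FK columns indexed
CREATE TABLE user_roles (
    user_id bigint NOT NULL REFERENCES users(id),
    role_id bigint NOT NULL REFERENCES roles(id),
    PRIMARY KEY (user_id, role_id)           -- also serves lookups by user_id
);
CREATE INDEX idx_user_roles_role ON user_roles(role_id);

-- Polymorphic (preferred form): one FK column per parent, exactly one set
CREATE TABLE comments (
    id       bigserial PRIMARY KEY,
    post_id  bigint REFERENCES posts(id),
    photo_id bigint REFERENCES photos(id),
    body     text NOT NULL,
    CHECK (num_nonnulls(post_id, photo_id) = 1)
);
```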

## Phase 3: Physical Model and Indexing

### Index Type Decision Table

| Index Type | Best For | Example |
|------------|----------|---------|
| B-tree (default) | Equality and range queries | `CREATE INDEX idx_users_email ON users(email)` |
| GIN | Full-text search, JSONB, arrays | `CREATE INDEX idx_posts_search ON posts USING GIN(to_tsvector('english', body))` |
| Partial | Subset of rows matching condition | `CREATE INDEX idx_active_users ON users(email) WHERE active = true` |
| Covering (INCLUDE) | Index-only scans avoiding table lookup | `CREATE INDEX idx_users_email ON users(email) INCLUDE (name)` |
| Composite | Multi-column queries | `CREATE INDEX idx_orders ON orders(tenant_id, status)` |

### Composite Index Column Order

| Position | Column Type | Reason |
|----------|-------------|--------|
| First | High-cardinality equality columns | Most selective filter first |
| Middle | Additional equality columns | Further narrows results |
| Last | Range columns (dates, numbers) | Range scan on remaining rows |

Rule: A composite index on `(A, B, C)` supports queries on `A`, `A+B`, and `A+B+C` — but NOT `B` alone or `C` alone.
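A concrete sketch of the leftmost-prefix rule, using a hypothetical `orders` table:

```sql
-- Equality columns first, range column last
CREATE INDEX idx_orders_lookup ON orders (tenant_id, status, created_at);

-- Served by the index (leftmost prefixes):
SELECT * FROM orders WHERE tenant_id = 42;
SELECT * FROM orders WHERE tenant_id = 42 AND status = 'paid';
SELECT * FROM orders WHERE tenant_id = 42 AND status = 'paid'
                       AND created_at >= now() - interval '7 days';

-- NOT served (skips the leading column, so the planner falls back to a scan):
SELECT * FROM orders WHERE status = 'paid';
```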

### Query Optimization Checklist

| Signal in EXPLAIN ANALYZE | Problem | Fix |
|---------------------------|---------|-----|
| Seq Scan on large table | Missing index | Add appropriate index |
| Nested Loop with large outer table | Inefficient join | Add index or restructure query |
| High actual vs estimated rows | Stale statistics | Run `ANALYZE` on table |
| Hash Join high memory | `work_mem` too low | Tune `work_mem` or restructure |
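To surface these signals, run the suspect query under `EXPLAIN ANALYZE`. The plan text below is illustrative (exact costs and formatting vary by PostgreSQL version and data):

```sql
EXPLAIN ANALYZE
SELECT * FROM orders WHERE tenant_id = 42 AND status = 'paid';

-- A plan line such as:
--   Seq Scan on orders  (cost=0.00..4581.00 rows=12 width=64)
--                       (actual time=0.8..35.1 rows=11800 loops=1)
-- shows two problems at once: a sequential scan (missing index) and
-- estimated rows (12) far below actual rows (11800), i.e. stale statistics:
ANALYZE orders;
```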

### N+1 Detection and Prevention

```sql
-- N+1 problem (bad):
SELECT * FROM users;
-- Then for EACH user: SELECT * FROM orders WHERE user_id = ?;

-- Fixed with join:
SELECT u.*, o.* FROM users u LEFT JOIN orders o ON o.user_id = u.id;

-- Fixed with batch load:
SELECT * FROM orders WHERE user_id = ANY($1);
```

STOP after physical model — present indexes and optimization strategy for review.

## Phase 4: Migration Strategy

### Zero-Downtime Migration (Expand-Contract)

Never make a breaking change in a single migration. Use two phases:

**Expand phase (backward compatible):**
1. Add new column/table (nullable or with default)
2. Deploy code that writes to both old and new
3. Backfill existing data in batches
4. Deploy code that reads from new

**Contract phase (after all code uses new schema):**
1. Remove code that writes to old
2. Drop old column/table
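The expand and contract phases above, sketched as PostgreSQL migrations (the `username`/`display_name` rename is a hypothetical example):

```sql
-- Expand, step 1: add the new column nullable (no table rewrite, no long lock)
ALTER TABLE users ADD COLUMN display_name text;

-- Expand, step 3: backfill in batches while application code dual-writes;
-- rerun until the UPDATE affects 0 rows
UPDATE users SET display_name = username
WHERE display_name IS NULL
AND id IN (SELECT id FROM users WHERE display_name IS NULL LIMIT 1000);

-- Contract: a separate, later migration, only after every reader and writer
-- uses display_name
ALTER TABLE users DROP COLUMN username;
```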

### Migration Safety Rules

| Rule | Rationale |
|------|-----------|
| Every migration has a corresponding rollback | Safe to revert |
| Test rollback in staging before production | Verify reversibility |
| Data-destructive rollbacks need explicit approval | Prevent accidental data loss |
| Keep migration files immutable once applied | Reproducible state |
| Backfill large tables in batches (1000 rows) | Avoid table locks |

### Backfill Pattern

```sql
-- Backfill in chunks of 1000; repeat until the UPDATE affects 0 rows
UPDATE users SET display_name = username
WHERE display_name IS NULL
AND id IN (SELECT id FROM users WHERE display_name IS NULL LIMIT 1000);
```

### Migration Type Decision Table

| Change Type | Safe Approach | Dangerous Approach |
|-------------|---------------|--------------------|
| Add column | Add nullable or with default | Add NOT NULL without default |
| Remove column | Expand-contract (two deploys) | Drop column directly |
| Rename column | Add new, copy data, drop old | ALTER RENAME (breaks queries) |
| Add index | `CREATE INDEX CONCURRENTLY` | `CREATE INDEX` (locks table) |
| Change column type | Add new column, migrate data | `ALTER COLUMN TYPE` (locks table) |

STOP after migration plan — confirm rollback strategy before finalizing.
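The "add index" row above, sketched for PostgreSQL (index and table names are illustrative):

```sql
-- Safe: builds the index without holding an exclusive lock on the table.
-- Note: CONCURRENTLY cannot run inside a transaction block, so keep it in
-- its own non-transactional migration.
CREATE INDEX CONCURRENTLY idx_orders_status ON orders(status);

-- If the build fails midway it leaves behind an INVALID index;
-- drop it and retry:
DROP INDEX CONCURRENTLY IF EXISTS idx_orders_status;
```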

## Phase 5: Save and Transition

After explicit approval:

1. Save schema design to `docs/database/` or generate migration files
2. Commit with message: `docs(db): add schema design for <feature>`

### Transition Decision Table

| User Intent | Next Skill | Rationale |
|-------------|-----------|-----------|
| "Create the migrations" | planning | Plan migration implementation |
| "Write specs for this" | spec-writing | Behavioral specs for data operations |
| "Implement the schema" | test-driven-development | TDD with migration tests |
| "Just save the design" | None | Schema design is the deliverable |
| "Review for performance" | performance-optimization | Analyze query patterns |

## ORM Guidance
ORM指导

| ORM | Language | Strength | Watch Out For |
|-----|----------|----------|---------------|
| Prisma | TypeScript | Type-safe schema, migrations | N+1 in nested queries, limited raw SQL |
| Drizzle | TypeScript | SQL-like API, lightweight | Newer ecosystem, fewer guides |
| SQLAlchemy | Python | Mature, flexible, raw SQL support | Complex session management |
| GORM | Go | Convention-based, auto-migrate | Silent failures, implicit behavior |

### ORM Best Practices

- Always review generated SQL (enable query logging in development)
- Use eager loading to prevent N+1 queries
- Write raw SQL for complex queries rather than fighting the ORM
- Use ORM migrations, not auto-sync, in production
- Test query performance with realistic data volumes

### Connection Pooling

- Use a connection pooler (PgBouncer, built-in pool)
- Pool size formula: `connections = (CPU cores * 2) + disk spindles`
- Use transaction-level pooling for most workloads
- Application servers should not open raw connections
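To check how a pool's sizing holds up, PostgreSQL exposes per-connection state in `pg_stat_activity`; a quick sketch:

```sql
-- Connections by state: a large 'idle' count suggests the pool is oversized
SELECT state, count(*) FROM pg_stat_activity GROUP BY state;

-- Compare against the server-side limit
SHOW max_connections;
```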

## Anti-Patterns / Common Mistakes

| Mistake | Why It Is Wrong | What To Do Instead |
|---------|-----------------|--------------------|
| No foreign key constraints | Orphaned data, broken relationships | Always define FK constraints |
| VARCHAR for everything | Loses type safety, wastes storage | Use specific types (`timestamptz`, `int`, `uuid`) |
| No indexes on FK columns | Slow joins on related tables | Index every FK column |
| Premature denormalization | Complexity without measured benefit | Start normalized, denormalize with evidence |
| Dropping columns directly | Breaks running application code | Use expand-contract pattern |
| `CREATE INDEX` without CONCURRENTLY | Locks table during index creation | Always use `CONCURRENTLY` in production |
| Auto-sync schema in production | Unpredictable destructive changes | Use explicit migration files |
| No rollback plan for migrations | Cannot recover from failed deploy | Write a down migration for every up migration |
| Nullable columns everywhere | Loses data integrity guarantees | Default NOT NULL, allow NULL intentionally |

## Anti-Rationalization Guards

- Do NOT skip the conceptual model — understand entities and relationships first
- Do NOT add indexes speculatively — measure query patterns first
- Do NOT denormalize without measured evidence of a performance problem
- Do NOT create migrations without rollback plans
- Do NOT skip the discovery phase — understand query patterns and data volume
- Do NOT drop columns or tables without the expand-contract pattern in production

## Documentation Lookup (Context7)

Use `mcp__context7__resolve-library-id` and then `mcp__context7__query-docs` for up-to-date docs. Returned docs override memorized knowledge.

- `prisma` — for schema syntax, relations, or migration API
- `typeorm` — for entity decorators, repository patterns, or query builder
- `knex` — for query builder syntax, migrations, or seed files

## Integration Points

| Skill | Relationship |
|-------|--------------|
| api-design | Upstream: API resources map to database entities |
| spec-writing | Upstream: specs define data persistence requirements |
| planning | Downstream: schema design informs implementation plan |
| test-driven-development | Downstream: migration tests written before migration code |
| performance-optimization | Downstream: query optimization after schema is live |
| reverse-engineering-specs | Upstream: reverse-engineer existing schema behavior |
| senior-backend | Parallel: backend specialist for ORM and query patterns |

## Verification Gate

Before claiming the schema design is complete:
  1. VERIFY all entities and relationships are modeled
  2. VERIFY normalization is at least 3NF (or denormalization is justified)
  3. VERIFY indexes are defined for all query patterns and FK columns
  4. VERIFY migration strategy includes rollback for every step
  5. VERIFY the user has approved the schema design
  6. VERIFY connection pooling strategy is defined for production

## Skill Type

Flexible — Adapt storage engine, normalization level, and index strategy to project needs while preserving the conceptual-to-physical modeling progression, migration safety rules, and measured-evidence-before-denormalization principle.