
# Database Schema Design

## Overview

Guide the design, implementation, and optimization of database schemas with sound data modeling, safe migrations, effective indexing, and appropriate query patterns. This skill covers the full lifecycle from conceptual modeling through physical optimization, ensuring schemas that are normalized, performant, and safely evolvable.
Announce at start: "I'm using the database-schema-design skill to design the database schema."

## Phase 1: Discovery and Conceptual Model

Ask these questions to understand the data requirements:
| # | Question | What It Determines |
|---|----------|--------------------|
| 1 | What entities does the system manage? | Table names |
| 2 | What are the relationships between entities? | Foreign keys, join tables |
| 3 | What are the key attributes of each entity? | Column definitions |
| 4 | What are the primary query patterns? | Index strategy |
| 5 | What is the expected data volume? (rows, growth rate) | Partitioning, scaling |
| 6 | What is the read/write ratio? | Normalization vs denormalization |
| 7 | SQL or NoSQL? (or both?) | Storage engine selection |

### Storage Engine Decision Table

| Factor | Choose SQL (PostgreSQL, MySQL) | Choose Document (MongoDB) | Choose Key-Value (Redis) |
|--------|-------------------------------|---------------------------|--------------------------|
| Data shape | Structured, relational | Semi-structured, nested | Simple lookups, caching |
| Query complexity | Complex joins, aggregations | Document-level queries | Key-based access only |
| Consistency needs | ACID required | Eventual consistency OK | Ephemeral or cached data |
| Schema evolution | Migrations manageable | Schema-free flexibility | No schema |
| Scale pattern | Vertical first, then read replicas | Horizontal sharding | In-memory, limited size |

STOP after discovery — present the conceptual model (entities, relationships, cardinality) for confirmation.

## Phase 2: Logical Model Design

Translate the conceptual model into tables, columns, types, and constraints.

### Column Design Rules

| Decision | Guidance |
|----------|----------|
| Primary keys | UUIDs for distributed systems, auto-increment for single-node |
| Column types | Use the most specific type (`timestamptz`, not `varchar`, for dates) |
| Nullability | Default NOT NULL; allow NULL only when absence is meaningful |
| Defaults | Set sensible defaults (`created_at DEFAULT now()`) |
| Constraints | Add CHECK, UNIQUE, and FK constraints at the schema level |
| Naming | `snake_case`; singular or plural table names — be consistent |
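The rules above can be sketched in a single table definition. This is an illustrative example, not a prescribed schema; it assumes PostgreSQL 13+ for `gen_random_uuid()`:

```sql
-- Illustrative only: applies the column design rules above (PostgreSQL)
CREATE TABLE users (
    id           uuid PRIMARY KEY DEFAULT gen_random_uuid(), -- UUID PK suits distributed systems
    email        text NOT NULL UNIQUE,                       -- NOT NULL by default; UNIQUE at schema level
    display_name text,                                       -- NULL allowed: absence is meaningful here
    age          int CHECK (age >= 0),                       -- CHECK constraint enforced by the database
    created_at   timestamptz NOT NULL DEFAULT now()          -- specific type plus a sensible default
);
```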

### Normalization Guide

| Normal Form | Rule | Violation Example | Fix |
|-------------|------|-------------------|-----|
| 1NF | Atomic values, no repeating groups | `tags VARCHAR` storing `"urgent,priority,vip"` | Separate `order_tags` table |
| 2NF | All non-key columns depend on entire PK | `product_name` in `order_items` (composite PK) | Move to `products` table |
| 3NF | No transitive dependencies | `city` depends on `zip_code`, not `user_id` | Separate `zip_codes` table |

Rule: Always start normalized. Denormalize only with measured evidence.
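The 1NF fix in the table above can be sketched as DDL (table and column names are hypothetical):

```sql
-- Before (violates 1NF): tags packed into one column
--   orders(id, tags VARCHAR)   e.g. 'urgent,priority,vip'

-- After: one row per tag, enforced atomically by the schema
CREATE TABLE order_tags (
    order_id bigint NOT NULL REFERENCES orders(id),
    tag      text   NOT NULL,
    PRIMARY KEY (order_id, tag)   -- a tag appears at most once per order
);
```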

### Denormalization Decision Table

| Scenario | Pattern | When to Apply |
|----------|---------|---------------|
| Read-heavy dashboards | Materialized views or summary tables | Measured slow query |
| Frequently joined data | Embed as JSONB column | Join is >80% of query time |
| Reporting / analytics | Separate denormalized reporting tables | OLAP workload |
| Caching layer | Computed columns refreshed on write | High-frequency reads |
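The read-heavy dashboard case can be sketched with a materialized view (PostgreSQL; the `orders` table and refresh cadence are illustrative):

```sql
-- Precompute dashboard aggregates instead of recomputing per request
CREATE MATERIALIZED VIEW daily_order_totals AS
SELECT date_trunc('day', created_at) AS day,
       count(*)   AS order_count,
       sum(total) AS revenue
FROM orders
GROUP BY 1;

-- CONCURRENTLY refresh avoids blocking readers; it requires a unique index
CREATE UNIQUE INDEX ON daily_order_totals (day);
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_order_totals;
```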

### Relationship Patterns

| Relationship | Implementation | Index Needed |
|--------------|----------------|--------------|
| One-to-one | FK with UNIQUE constraint on child | On FK column |
| One-to-many | FK on the "many" side | On FK column |
| Many-to-many | Junction/join table with composite PK | On both FK columns |
| Polymorphic | Separate FK columns with CHECK constraint (preferred) or type+id pattern | On type+id or each FK |
| Self-referential (trees) | `parent_id` FK to same table; or `ltree`/materialized path | On `parent_id` or path |

STOP after logical model — present the table definitions for review.
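Two of the patterns above as DDL sketches (PostgreSQL; all table names are hypothetical, and `num_nonnulls` assumes PostgreSQL 9.6+):

```sql
-- Many-to-many: junction table with composite PK; both FK columns indexed
CREATE TABLE user_roles (
    user_id bigint NOT NULL REFERENCES users(id),
    role_id bigint NOT NULL REFERENCES roles(id),
    PRIMARY KEY (user_id, role_id)           -- also serves lookups by user_id
);
CREATE INDEX idx_user_roles_role ON user_roles(role_id);

-- Polymorphic (preferred form): one FK column per parent, exactly one set
CREATE TABLE comments (
    id       bigserial PRIMARY KEY,
    post_id  bigint REFERENCES posts(id),
    photo_id bigint REFERENCES photos(id),
    body     text NOT NULL,
    CHECK (num_nonnulls(post_id, photo_id) = 1)
);
```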

## Phase 3: Physical Model and Indexing

### Index Type Decision Table

| Index Type | Best For | Example |
|------------|----------|---------|
| B-tree (default) | Equality and range queries | `CREATE INDEX idx_users_email ON users(email)` |
| GIN | Full-text search, JSONB, arrays | `CREATE INDEX idx_posts_search ON posts USING GIN(to_tsvector('english', body))` |
| Partial | Subset of rows matching condition | `CREATE INDEX idx_active_users ON users(email) WHERE active = true` |
| Covering (INCLUDE) | Index-only scans avoiding table lookup | `CREATE INDEX idx_users_email ON users(email) INCLUDE (name)` |
| Composite | Multi-column queries | `CREATE INDEX idx_orders ON orders(tenant_id, status)` |

### Composite Index Column Order

| Position | Column Type | Reason |
|----------|-------------|--------|
| First | High-cardinality equality columns | Most selective filter first |
| Middle | Additional equality columns | Further narrows results |
| Last | Range columns (dates, numbers) | Range scan on remaining rows |

Rule: A composite index on `(A, B, C)` supports queries on `A`, `A+B`, and `A+B+C` — but NOT `B` alone or `C` alone.
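A concrete sketch of the leftmost-prefix rule, using a hypothetical `orders` table:

```sql
-- Equality columns first, range column last
CREATE INDEX idx_orders_lookup ON orders (tenant_id, status, created_at);

-- Served by the index (leftmost prefixes):
SELECT * FROM orders WHERE tenant_id = 42;
SELECT * FROM orders WHERE tenant_id = 42 AND status = 'paid';
SELECT * FROM orders WHERE tenant_id = 42 AND status = 'paid'
                       AND created_at >= now() - interval '7 days';

-- NOT served (skips the leading column, so the planner falls back to a scan):
SELECT * FROM orders WHERE status = 'paid';
```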

### Query Optimization Checklist

| Signal in EXPLAIN ANALYZE | Problem | Fix |
|---------------------------|---------|-----|
| Seq Scan on large table | Missing index | Add appropriate index |
| Nested Loop with large outer table | Inefficient join | Add index or restructure query |
| High actual vs estimated rows | Stale statistics | Run `ANALYZE` on table |
| Hash Join high memory | `work_mem` too low | Tune `work_mem` or restructure |
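To surface these signals, run the suspect query under `EXPLAIN ANALYZE`. The plan text below is illustrative (exact costs and formatting vary by PostgreSQL version and data):

```sql
EXPLAIN ANALYZE
SELECT * FROM orders WHERE tenant_id = 42 AND status = 'paid';

-- A plan line such as:
--   Seq Scan on orders  (cost=0.00..4581.00 rows=12 width=64)
--                       (actual time=0.8..35.1 rows=11800 loops=1)
-- shows two problems at once: a sequential scan (missing index) and
-- estimated rows (12) far below actual rows (11800), i.e. stale statistics:
ANALYZE orders;
```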

### N+1 Detection and Prevention

```sql
-- N+1 problem (bad):
SELECT * FROM users;
-- Then for EACH user: SELECT * FROM orders WHERE user_id = ?;

-- Fixed with join:
SELECT u.*, o.* FROM users u LEFT JOIN orders o ON o.user_id = u.id;

-- Fixed with batch load:
SELECT * FROM orders WHERE user_id = ANY($1);
```

STOP after physical model — present indexes and optimization strategy for review.

## Phase 4: Migration Strategy

### Zero-Downtime Migration (Expand-Contract)

Never make a breaking change in a single migration. Use two phases:

**Expand phase (backward compatible):**
1. Add new column/table (nullable or with default)
2. Deploy code that writes to both old and new
3. Backfill existing data in batches
4. Deploy code that reads from new

**Contract phase (after all code uses new schema):**
1. Remove code that writes to old
2. Drop old column/table
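The expand and contract phases above, sketched as PostgreSQL migrations (the `username`/`display_name` rename is a hypothetical example):

```sql
-- Expand, step 1: add the new column nullable (no table rewrite, no long lock)
ALTER TABLE users ADD COLUMN display_name text;

-- Expand, step 3: backfill in batches while application code dual-writes;
-- rerun until the UPDATE affects 0 rows
UPDATE users SET display_name = username
WHERE display_name IS NULL
AND id IN (SELECT id FROM users WHERE display_name IS NULL LIMIT 1000);

-- Contract: a separate, later migration, only after every reader and writer
-- uses display_name
ALTER TABLE users DROP COLUMN username;
```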

### Migration Safety Rules

| Rule | Rationale |
|------|-----------|
| Every migration has a corresponding rollback | Safe to revert |
| Test rollback in staging before production | Verify reversibility |
| Data-destructive rollbacks need explicit approval | Prevent accidental data loss |
| Keep migration files immutable once applied | Reproducible state |
| Backfill large tables in batches (1000 rows) | Avoid table locks |

### Backfill Pattern

```sql
-- Backfill in chunks of 1000; repeat until the UPDATE affects 0 rows
UPDATE users SET display_name = username
WHERE display_name IS NULL
AND id IN (SELECT id FROM users WHERE display_name IS NULL LIMIT 1000);
```

### Migration Type Decision Table

| Change Type | Safe Approach | Dangerous Approach |
|-------------|---------------|--------------------|
| Add column | Add nullable or with default | Add NOT NULL without default |
| Remove column | Expand-contract (two deploys) | Drop column directly |
| Rename column | Add new, copy data, drop old | ALTER RENAME (breaks queries) |
| Add index | `CREATE INDEX CONCURRENTLY` | `CREATE INDEX` (locks table) |
| Change column type | Add new column, migrate data | `ALTER COLUMN TYPE` (locks table) |

STOP after migration plan — confirm rollback strategy before finalizing.
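The "add index" row above, sketched for PostgreSQL (index and table names are illustrative):

```sql
-- Safe: builds the index without holding an exclusive lock on the table.
-- Note: CONCURRENTLY cannot run inside a transaction block, so keep it in
-- its own non-transactional migration.
CREATE INDEX CONCURRENTLY idx_orders_status ON orders(status);

-- If the build fails midway it leaves behind an INVALID index;
-- drop it and retry:
DROP INDEX CONCURRENTLY IF EXISTS idx_orders_status;
```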

## Phase 5: Save and Transition

After explicit approval:

1. Save schema design to `docs/database/` or generate migration files
2. Commit with message: `docs(db): add schema design for <feature>`

### Transition Decision Table

| User Intent | Next Skill | Rationale |
|-------------|-----------|-----------|
| "Create the migrations" | planning | Plan migration implementation |
| "Write specs for this" | spec-writing | Behavioral specs for data operations |
| "Implement the schema" | test-driven-development | TDD with migration tests |
| "Just save the design" | None | Schema design is the deliverable |
| "Review for performance" | performance-optimization | Analyze query patterns |

## ORM Guidance
ORM指导

| ORM | Language | Strength | Watch Out For |
|-----|----------|----------|---------------|
| Prisma | TypeScript | Type-safe schema, migrations | N+1 in nested queries, limited raw SQL |
| Drizzle | TypeScript | SQL-like API, lightweight | Newer ecosystem, fewer guides |
| SQLAlchemy | Python | Mature, flexible, raw SQL support | Complex session management |
| GORM | Go | Convention-based, auto-migrate | Silent failures, implicit behavior |

### ORM Best Practices

- Always review generated SQL (enable query logging in development)
- Use eager loading to prevent N+1 queries
- Write raw SQL for complex queries rather than fighting the ORM
- Use ORM migrations, not auto-sync, in production
- Test query performance with realistic data volumes

### Connection Pooling

- Use a connection pooler (PgBouncer, built-in pool)
- Pool size formula: `connections = (CPU cores * 2) + disk spindles`
- Use transaction-level pooling for most workloads
- Application servers should not open raw connections
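To check how a pool's sizing holds up, PostgreSQL exposes per-connection state in `pg_stat_activity`; a quick sketch:

```sql
-- Connections by state: a large 'idle' count suggests the pool is oversized
SELECT state, count(*) FROM pg_stat_activity GROUP BY state;

-- Compare against the server-side limit
SHOW max_connections;
```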

## Anti-Patterns / Common Mistakes

| Mistake | Why It Is Wrong | What To Do Instead |
|---------|-----------------|--------------------|
| No foreign key constraints | Orphaned data, broken relationships | Always define FK constraints |
| VARCHAR for everything | Loses type safety, wastes storage | Use specific types (`timestamptz`, `int`, `uuid`) |
| No indexes on FK columns | Slow joins on related tables | Index every FK column |
| Premature denormalization | Complexity without measured benefit | Start normalized, denormalize with evidence |
| Dropping columns directly | Breaks running application code | Use expand-contract pattern |
| `CREATE INDEX` without CONCURRENTLY | Locks table during index creation | Always use `CONCURRENTLY` in production |
| Auto-sync schema in production | Unpredictable destructive changes | Use explicit migration files |
| No rollback plan for migrations | Cannot recover from failed deploy | Write a down migration for every up migration |
| Nullable columns everywhere | Loses data integrity guarantees | Default NOT NULL, allow NULL intentionally |

## Anti-Rationalization Guards

- Do NOT skip the conceptual model — understand entities and relationships first
- Do NOT add indexes speculatively — measure query patterns first
- Do NOT denormalize without measured evidence of a performance problem
- Do NOT create migrations without rollback plans
- Do NOT skip the discovery phase — understand query patterns and data volume
- Do NOT drop columns or tables without the expand-contract pattern in production

## Documentation Lookup (Context7)

Use `mcp__context7__resolve-library-id` and then `mcp__context7__query-docs` for up-to-date docs. Returned docs override memorized knowledge.

- `prisma` — for schema syntax, relations, or migration API
- `typeorm` — for entity decorators, repository patterns, or query builder
- `knex` — for query builder syntax, migrations, or seed files

## Integration Points

| Skill | Relationship |
|-------|--------------|
| api-design | Upstream: API resources map to database entities |
| spec-writing | Upstream: specs define data persistence requirements |
| planning | Downstream: schema design informs implementation plan |
| test-driven-development | Downstream: migration tests written before migration code |
| performance-optimization | Downstream: query optimization after schema is live |
| reverse-engineering-specs | Upstream: reverse-engineer existing schema behavior |
| senior-backend | Parallel: backend specialist for ORM and query patterns |

## Verification Gate

Before claiming the schema design is complete:
  1. VERIFY all entities and relationships are modeled
  2. VERIFY normalization is at least 3NF (or denormalization is justified)
  3. VERIFY indexes are defined for all query patterns and FK columns
  4. VERIFY migration strategy includes rollback for every step
  5. VERIFY the user has approved the schema design
  6. VERIFY connection pooling strategy is defined for production

## Skill Type

Flexible — Adapt storage engine, normalization level, and index strategy to project needs while preserving the conceptual-to-physical modeling progression, migration safety rules, and measured-evidence-before-denormalization principle.