Database Schema Design
Overview
Guide the design, implementation, and optimization of database schemas with sound data modeling, safe migrations, effective indexing, and appropriate query patterns. This skill covers the full lifecycle from conceptual modeling through physical optimization, ensuring schemas that are normalized, performant, and safely evolvable.
Announce at start: "I'm using the database-schema-design skill to design the database schema."
Phase 1: Discovery and Conceptual Model
Ask these questions to understand the data requirements:
| # | Question | What It Determines |
|---|---|---|
| 1 | What entities does the system manage? | Table names |
| 2 | What are the relationships between entities? | Foreign keys, join tables |
| 3 | What are the key attributes of each entity? | Column definitions |
| 4 | What are the primary query patterns? | Index strategy |
| 5 | What is the expected data volume? (rows, growth rate) | Partitioning, scaling |
| 6 | What is the read/write ratio? | Normalization vs denormalization |
| 7 | SQL or NoSQL? (or both?) | Storage engine selection |
Storage Engine Decision Table
| Factor | Choose SQL (PostgreSQL, MySQL) | Choose Document (MongoDB) | Choose Key-Value (Redis) |
|---|---|---|---|
| Data shape | Structured, relational | Semi-structured, nested | Simple lookups, caching |
| Query complexity | Complex joins, aggregations | Document-level queries | Key-based access only |
| Consistency needs | ACID required | Eventual consistency OK | Ephemeral or cached data |
| Schema evolution | Migrations manageable | Schema-free flexibility | No schema |
| Scale pattern | Vertical first, then read replicas | Horizontal sharding | In-memory, limited size |
STOP after discovery — present the conceptual model (entities, relationships, cardinality) for confirmation.
Phase 2: Logical Model Design
Translate the conceptual model into tables, columns, types, and constraints.
Column Design Rules
| Decision | Guidance |
|---|---|
| Primary keys | UUIDs for distributed systems, auto-increment for single-node |
| Column types | Use the most specific type available (e.g. `timestamptz` for dates, `uuid` for IDs, not generic `varchar`) |
| Nullability | Default NOT NULL; allow NULL only when absence is meaningful |
| Defaults | Set sensible defaults (e.g. `now()` for timestamp columns) |
| Constraints | Add CHECK, UNIQUE, and FK constraints at the schema level |
| Naming | Use consistent `snake_case` for table and column names |
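Taken together, the rules above produce DDL like the following sketch (PostgreSQL syntax; the table and column names are illustrative, not from any specific design):

```sql
-- Illustrative users table applying the column design rules
CREATE TABLE users (
    id         uuid        PRIMARY KEY DEFAULT gen_random_uuid(),
    email      text        NOT NULL UNIQUE,
    age        integer     CHECK (age >= 0),         -- CHECK constraint at the schema level
    created_at timestamptz NOT NULL DEFAULT now(),   -- specific type plus a sensible default
    deleted_at timestamptz                           -- NULL is meaningful here: "not deleted"
);
```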
Normalization Guide
| Normal Form | Rule | Violation Example | Fix |
|---|---|---|---|
| 1NF | Atomic values, no repeating groups | e.g. a `tags` column storing comma-separated values | Separate `tags` table |
| 2NF | All non-key columns depend on entire PK | e.g. `product_name` on a composite-PK `order_items` table depends only on `product_id` | Move to `products` table |
| 3NF | No transitive dependencies | e.g. `department_name` stored alongside `department_id` on `employees` | Separate `departments` table |
Rule: Always start normalized. Denormalize only with measured evidence.
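As a sketch of the "start normalized" rule, a 2NF/3NF-compliant order model might look like this (table and column names are illustrative):

```sql
-- Product attributes live once in products, not on every order line (3NF)
CREATE TABLE products (
    id          uuid    PRIMARY KEY,
    name        text    NOT NULL,
    price_cents integer NOT NULL CHECK (price_cents >= 0)
);

-- Every non-key column depends on the whole composite key (2NF)
CREATE TABLE order_items (
    order_id   uuid    NOT NULL REFERENCES orders (id),
    product_id uuid    NOT NULL REFERENCES products (id),
    quantity   integer NOT NULL CHECK (quantity > 0),
    PRIMARY KEY (order_id, product_id)
);
```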
Denormalization Decision Table
| Scenario | Pattern | When to Apply |
|---|---|---|
| Read-heavy dashboards | Materialized views or summary tables | Measured slow query |
| Frequently joined data | Embed as JSONB column | Join is >80% of query time |
| Reporting / analytics | Separate denormalized reporting tables | OLAP workload |
| Caching layer | Computed columns refreshed on write | High-frequency reads |
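One hedged sketch of the materialized-view pattern for read-heavy dashboards (PostgreSQL; view and column names are illustrative):

```sql
-- Summary table for a read-heavy dashboard, refreshed on a schedule
CREATE MATERIALIZED VIEW daily_order_totals AS
SELECT user_id,
       date_trunc('day', created_at) AS day,
       sum(total_cents)              AS total_cents
FROM orders
GROUP BY user_id, date_trunc('day', created_at);

-- CONCURRENTLY avoids blocking readers but requires a unique index on the view
CREATE UNIQUE INDEX ON daily_order_totals (user_id, day);
REFRESH MATERIALIZED VIEW CONCURRENTLY daily_order_totals;
```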
Relationship Patterns
| Relationship | Implementation | Index Needed |
|---|---|---|
| One-to-One | FK with UNIQUE constraint on child | On FK column |
| One-to-Many | FK on the "many" side | On FK column |
| Many-to-Many | Junction/join table with composite PK | On both FK columns |
| Polymorphic | Separate FK columns with CHECK constraint (preferred) or type+id pattern | On type+id or each FK |
| Self-referential (trees) | `parent_id` FK to the same table (adjacency list) or materialized path | On parent_id or path |
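For example, the many-to-many row above might be implemented like this (illustrative names):

```sql
-- Junction table with a composite PK for a users <-> roles many-to-many
CREATE TABLE user_roles (
    user_id uuid NOT NULL REFERENCES users (id),
    role_id uuid NOT NULL REFERENCES roles (id),
    PRIMARY KEY (user_id, role_id)   -- also serves lookups by user_id
);

-- The composite PK does not help reverse lookups, so index the second FK too
CREATE INDEX idx_user_roles_role_id ON user_roles (role_id);
```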
STOP after logical model — present the table definitions for review.
Phase 3: Physical Model and Indexing
Index Type Decision Table
| Index Type | Best For | Example |
|---|---|---|
| B-tree (default) | Equality and range queries | `WHERE email = ?`, `WHERE created_at > ?` |
| GIN | Full-text search, JSONB, arrays | `CREATE INDEX ... USING GIN (data)` |
| Partial | Subset of rows matching condition | `CREATE INDEX ... WHERE deleted_at IS NULL` |
| Covering (INCLUDE) | Index-only scans avoiding table lookup | `CREATE INDEX ... (user_id) INCLUDE (email)` |
| Composite | Multi-column queries | `CREATE INDEX ... (user_id, created_at)` |
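A partial and a covering index from the table above might look like this in PostgreSQL (index and column names are illustrative):

```sql
-- Partial index: covers only active rows, smaller than indexing everything
CREATE INDEX idx_users_active_email ON users (email) WHERE deleted_at IS NULL;

-- Covering index: INCLUDE carries extra columns so the query can skip the heap
CREATE INDEX idx_orders_user ON orders (user_id) INCLUDE (status, total_cents);
```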
Composite Index Column Order
| Position | Column Type | Reason |
|---|---|---|
| First | High-cardinality equality columns | Most selective filter first |
| Middle | Additional equality columns | Further narrows results |
| Last | Range columns (dates, numbers) | Range scan on remaining rows |
Rule: A composite index on `(A, B, C)` supports queries on `A`, `A+B`, `A+B+C` — but NOT `B` alone or `C` alone.
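A sketch of the column-order rule in practice (table and column names are illustrative):

```sql
-- Equality columns first (tenant_id, status), range column last (created_at)
CREATE INDEX idx_orders_lookup ON orders (tenant_id, status, created_at);

-- Served well:      WHERE tenant_id = ? AND status = ? AND created_at > ?
-- Served partially: WHERE tenant_id = ?
-- NOT served:       WHERE status = ? alone, or WHERE created_at > ? alone
```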
Query Optimization Checklist
| Signal in EXPLAIN ANALYZE | Problem | Fix |
|---|---|---|
| Seq Scan on large table | Missing index | Add appropriate index |
| Nested Loop with large outer table | Inefficient join | Add index or restructure query |
| High actual vs estimated rows | Stale statistics | Run `ANALYZE` on the table |
| Hash Join high memory | Working set exceeds `work_mem` | Tune `work_mem` or reduce join input size |
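A typical way to surface these signals (the query and table names are illustrative):

```sql
-- ANALYZE shows actual row counts and timings next to the planner's estimates
EXPLAIN (ANALYZE, BUFFERS)
SELECT *
FROM orders
WHERE user_id = $1
  AND created_at > now() - interval '30 days';
-- A "Seq Scan on orders" over many rows here suggests a composite index
-- on (user_id, created_at); a large estimate gap suggests running ANALYZE.
```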
N+1 Detection and Prevention
```sql
-- N+1 problem (bad):
SELECT * FROM users;
-- Then for EACH user: SELECT * FROM orders WHERE user_id = ?;

-- Fixed with join:
SELECT u.*, o.* FROM users u LEFT JOIN orders o ON o.user_id = u.id;

-- Fixed with batch load:
SELECT * FROM orders WHERE user_id = ANY($1);
```

STOP after physical model — present indexes and optimization strategy for review.
Phase 4: Migration Strategy
Zero-Downtime Migration (Expand-Contract)
Never make a breaking change in a single migration. Use two phases:
Expand phase (backward compatible):
- Add new column/table (nullable or with default)
- Deploy code that writes to both old and new
- Backfill existing data in batches
- Deploy code that reads from new
Contract phase (after all code uses new schema):
- Remove code that writes to old
- Drop old column/table
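A minimal expand-contract sketch for renaming `users.username` to `display_name` (column names are illustrative, assuming PostgreSQL):

```sql
-- Expand (deploy 1, backward compatible): nullable column, no table rewrite
ALTER TABLE users ADD COLUMN display_name text;
-- Application now writes both columns; backfill runs in batches between deploys.

-- Contract (deploy 2, after all readers use display_name):
ALTER TABLE users ALTER COLUMN display_name SET NOT NULL;
ALTER TABLE users DROP COLUMN username;
```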
Migration Safety Rules
| Rule | Rationale |
|---|---|
| Every migration has a corresponding rollback | Safe to revert |
| Test rollback in staging before production | Verify reversibility |
| Data-destructive rollbacks need explicit approval | Prevent accidental data loss |
| Keep migration files immutable once applied | Reproducible state |
| Backfill large tables in batches (1000 rows) | Avoid table locks |
Backfill Pattern
```sql
-- Backfill in chunks of 1000
UPDATE users SET display_name = username
WHERE display_name IS NULL
  AND id IN (SELECT id FROM users WHERE display_name IS NULL LIMIT 1000);
```

Migration Type Decision Table
| Change Type | Safe Approach | Dangerous Approach |
|---|---|---|
| Add column | Add nullable or with default | Add NOT NULL without default |
| Remove column | Expand-contract (two deploys) | Drop column directly |
| Rename column | Add new, copy data, drop old | ALTER RENAME (breaks queries) |
| Add index | `CREATE INDEX CONCURRENTLY` | Plain `CREATE INDEX` (locks writes) |
| Change column type | Add new column, migrate data | In-place `ALTER COLUMN ... TYPE` (locks and may rewrite the table) |
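For the "Add index" row, the safe form in PostgreSQL is:

```sql
-- Builds without blocking writes; cannot run inside a transaction block
CREATE INDEX CONCURRENTLY idx_orders_user_id ON orders (user_id);
-- If the build fails it leaves an INVALID index behind: drop it and retry
```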
STOP after migration plan — confirm rollback strategy before finalizing.
Phase 5: Save and Transition
After explicit approval:
- Save schema design to `docs/database/` or generate migration files
- Commit with message: `docs(db): add schema design for <feature>`
Transition Decision Table
| User Intent | Next Skill | Rationale |
|---|---|---|
| "Create the migrations" | | Plan migration implementation |
| "Write specs for this" | | Behavioral specs for data operations |
| "Implement the schema" | | TDD with migration tests |
| "Just save the design" | None | Schema design is the deliverable |
| "Review for performance" | | Analyze query patterns |
ORM Guidance
| ORM | Language | Strength | Watch Out For |
|---|---|---|---|
| Prisma | TypeScript | Type-safe schema, migrations | N+1 in nested queries, limited raw SQL |
| Drizzle | TypeScript | SQL-like API, lightweight | Newer ecosystem, fewer guides |
| SQLAlchemy | Python | Mature, flexible, raw SQL support | Complex session management |
| GORM | Go | Convention-based, auto-migrate | Silent failures, implicit behavior |
ORM Best Practices
- Always review generated SQL (enable query logging in development)
- Use eager loading to prevent N+1 queries
- Write raw SQL for complex queries rather than fighting the ORM
- Use ORM migrations, not auto-sync in production
- Test query performance with realistic data volumes
Connection Pooling
- Use a connection pooler (PgBouncer, built-in pool)
- Pool size formula: `connections = (CPU cores * 2) + disk spindles` (e.g. 8 cores with one SSD: 8 * 2 + 1 = 17)
- Use transaction-level pooling for most workloads
- Application servers should not open raw connections
Anti-Patterns / Common Mistakes
| Mistake | Why It Is Wrong | What To Do Instead |
|---|---|---|
| No foreign key constraints | Orphaned data, broken relationships | Always define FK constraints |
| VARCHAR for everything | Loses type safety, wastes storage | Use specific types (timestamptz, int, uuid) |
| No indexes on FK columns | Slow joins on related tables | Index every FK column |
| Premature denormalization | Complexity without measured benefit | Start normalized, denormalize with evidence |
| Dropping columns directly | Breaks running application code | Use expand-contract pattern |
| Creating indexes without `CONCURRENTLY` | Locks table during index creation | Always use `CREATE INDEX CONCURRENTLY` in production |
| Auto-sync schema in production | Unpredictable destructive changes | Use explicit migration files |
| No rollback plan for migrations | Cannot recover from failed deploy | Write a down migration for every up migration |
| Nullable columns everywhere | Loses data integrity guarantees | Default NOT NULL, allow NULL intentionally |
Anti-Rationalization Guards
- Do NOT skip the conceptual model — understand entities and relationships first
- Do NOT add indexes speculatively — measure query patterns first
- Do NOT denormalize without measured evidence of a performance problem
- Do NOT create migrations without rollback plans
- Do NOT skip the discovery phase — understand query patterns and data volume
- Do NOT drop columns or tables without expand-contract pattern in production
Documentation Lookup (Context7)
Use `mcp__context7__resolve-library-id` then `mcp__context7__query-docs` for up-to-date docs. Returned docs override memorized knowledge.
- `prisma` — for schema syntax, relations, or migration API
- `typeorm` — for entity decorators, repository patterns, or query builder
- `knex` — for query builder syntax, migrations, or seed files
Integration Points
| Skill | Relationship |
|---|---|
| | Upstream: API resources map to database entities |
| | Upstream: specs define data persistence requirements |
| | Downstream: schema design informs implementation plan |
| | Downstream: migration tests written before migration code |
| | Downstream: query optimization after schema is live |
| | Upstream: reverse-engineer existing schema behavior |
| | Parallel: backend specialist for ORM and query patterns |
Verification Gate
Before claiming the schema design is complete:
- VERIFY all entities and relationships are modeled
- VERIFY normalization is at least 3NF (or denormalization is justified)
- VERIFY indexes are defined for all query patterns and FK columns
- VERIFY migration strategy includes rollback for every step
- VERIFY the user has approved the schema design
- VERIFY connection pooling strategy is defined for production
Skill Type
Flexible — Adapt storage engine, normalization level, and index strategy to project needs while preserving the conceptual-to-physical modeling progression, migration safety rules, and measured-evidence-before-denormalization principle.