database-designer


Database Designer - POWERFUL Tier Skill


Overview


A comprehensive database design skill that provides expert-level analysis, optimization, and migration capabilities for modern database systems. This skill combines theoretical principles with practical tools to help architects and developers create scalable, performant, and maintainable database schemas.

Core Competencies


Schema Design & Analysis


  • Normalization Analysis: Automated detection of normalization levels (1NF through BCNF)
  • Denormalization Strategy: Smart recommendations for performance optimization
  • Data Type Optimization: Identification of inappropriate types and size issues
  • Constraint Analysis: Missing foreign keys, unique constraints, and null checks
  • Naming Convention Validation: Consistent table and column naming patterns
  • ERD Generation: Automatic Mermaid diagram creation from DDL
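As a concrete illustration of the constraint checks above, here is a minimal sketch using Python's built-in `sqlite3` with hypothetical `customers`/`orders` tables. It shows why missing foreign keys matter: with enforcement enabled, an orphaned row is rejected instead of silently corrupting referential integrity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires opting in to FK enforcement

conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total       NUMERIC NOT NULL CHECK (total >= 0)
    )
""")

conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (10, 1, 99.5)")  # valid

try:
    # Orphaned row: customer 42 does not exist, so the FK constraint rejects it
    conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (11, 42, 5)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The same holds for the other engines in this guide; PostgreSQL and MySQL (InnoDB) enforce foreign keys by default.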

Index Optimization


  • Index Gap Analysis: Identification of missing indexes on foreign keys and query patterns
  • Composite Index Strategy: Optimal column ordering for multi-column indexes
  • Index Redundancy Detection: Elimination of overlapping and unused indexes
  • Performance Impact Modeling: Selectivity estimation and query cost analysis
  • Index Type Selection: B-tree, hash, partial, covering, and specialized indexes
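Index gap analysis can be demonstrated end to end with SQLite's `EXPLAIN QUERY PLAN` (a lighter-weight cousin of PostgreSQL's `EXPLAIN`; table and index names here are illustrative). Before the index exists, the plan shows a full scan; after creating it, the same query becomes an index search:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total NUMERIC)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry a human-readable step description in the last column
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT total FROM orders WHERE customer_id = 7"
before = plan(query)  # no index covers customer_id, so this is a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
after = plan(query)   # the planner now searches the new index

print(before)  # e.g. "SCAN orders"
print(after)   # e.g. "SEARCH orders USING INDEX idx_orders_customer (customer_id=?)"
```

Exact plan wording varies by SQLite version, but the scan-versus-search distinction is stable and mirrors the Seq Scan signal discussed under EXPLAIN plan reading below.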

Migration Management


  • Zero-Downtime Migrations: Expand-contract pattern implementation
  • Schema Evolution: Safe column additions, deletions, and type changes
  • Data Migration Scripts: Automated data transformation and validation
  • Rollback Strategy: Complete reversal capabilities with validation
  • Execution Planning: Ordered migration steps with dependency resolution

Database Design Principles


→ See references/database-design-reference.md for details

Best Practices


Schema Design


  1. Use meaningful names: Clear, consistent naming conventions
  2. Choose appropriate data types: Right-sized columns for storage efficiency
  3. Define proper constraints: Foreign keys, check constraints, unique indexes
  4. Consider future growth: Plan for scale from the beginning
  5. Document relationships: Clear foreign key relationships and business rules

Performance Optimization


  1. Index strategically: Cover common query patterns without over-indexing
  2. Monitor query performance: Regular analysis of slow queries
  3. Partition large tables: Improve query performance and maintenance
  4. Use appropriate isolation levels: Balance consistency with performance
  5. Implement connection pooling: Efficient resource utilization

Security Considerations


  1. Principle of least privilege: Grant minimal necessary permissions
  2. Encrypt sensitive data: At rest and in transit
  3. Audit access patterns: Monitor and log database access
  4. Validate inputs: Prevent SQL injection attacks
  5. Regular security updates: Keep database software current
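Point 4 above deserves a worked example. The canonical fix for SQL injection is parameterized queries; the sketch below (Python's `sqlite3`, hypothetical `users` table) contrasts string interpolation with bound parameters:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice'), ('bob')")

malicious = "alice' OR '1'='1"

# Vulnerable: interpolation lets the input rewrite the WHERE clause,
# producing ... WHERE name = 'alice' OR '1'='1' (matches every row)
unsafe = conn.execute(f"SELECT name FROM users WHERE name = '{malicious}'").fetchall()

# Safe: a bound parameter is treated strictly as data, never as SQL
safe = conn.execute("SELECT name FROM users WHERE name = ?", (malicious,)).fetchall()

print(len(unsafe), len(safe))  # 2 0
```

The same placeholder mechanism exists in every major driver (`%s` in psycopg, `?` in JDBC); input validation is a defense in depth on top of it, not a substitute.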

Query Generation Patterns


SELECT with JOINs


```sql
-- INNER JOIN: only matching rows
SELECT o.id, c.name, o.total
FROM orders o
INNER JOIN customers c ON c.id = o.customer_id;

-- LEFT JOIN: all left rows, NULLs for non-matches
SELECT c.name, COUNT(o.id) AS order_count
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.name;

-- Self-join: hierarchical data (employees/managers)
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON m.id = e.manager_id;
```

Common Table Expressions (CTEs)


```sql
-- Recursive CTE for org chart
WITH RECURSIVE org AS (
  SELECT id, name, manager_id, 1 AS depth
  FROM employees WHERE manager_id IS NULL
  UNION ALL
  SELECT e.id, e.name, e.manager_id, o.depth + 1
  FROM employees e INNER JOIN org o ON o.id = e.manager_id
)
SELECT * FROM org ORDER BY depth, name;
```

Window Functions


```sql
-- ROW_NUMBER for pagination / dedup
SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at DESC) AS rn
FROM orders;

-- RANK with gaps, DENSE_RANK without gaps
SELECT name, score, RANK() OVER (ORDER BY score DESC) AS rank FROM leaderboard;

-- LAG/LEAD for comparing adjacent rows
SELECT date, revenue,
  revenue - LAG(revenue) OVER (ORDER BY date) AS daily_change
FROM daily_sales;
```

Aggregation Patterns


```sql
-- FILTER clause (PostgreSQL) for conditional aggregation
SELECT
  COUNT(*) AS total,
  COUNT(*) FILTER (WHERE status = 'active') AS active,
  AVG(amount) FILTER (WHERE amount > 0) AS avg_positive
FROM accounts;

-- GROUPING SETS for multi-level rollups
SELECT region, product, SUM(revenue)
FROM sales
GROUP BY GROUPING SETS ((region, product), (region), ());
```


Migration Patterns


Up/Down Migration Scripts


Every migration must have a reversible counterpart. Name files with a timestamp prefix for ordering:

```
migrations/
├── 20260101_000001_create_users.up.sql
├── 20260101_000001_create_users.down.sql
├── 20260115_000002_add_users_email_index.up.sql
└── 20260115_000002_add_users_email_index.down.sql
```
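A migration runner only needs the lexicographic order of those timestamp prefixes to resolve dependencies. A minimal sketch (hypothetical file names, SQLite as the target) makes this concrete:

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical migrations directory with timestamp-prefixed up scripts,
# deliberately created out of order to show that sorting restores it
root = Path(tempfile.mkdtemp())
(root / "20260115_000002_add_users_email_index.up.sql").write_text(
    "CREATE INDEX idx_users_email ON users(email);"
)
(root / "20260101_000001_create_users.up.sql").write_text(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);"
)

conn = sqlite3.connect(":memory:")
applied = []
# Lexicographic sort of the timestamp prefix is chronological order,
# so create_users runs before the index migration that depends on it
for script in sorted(root.glob("*.up.sql")):
    conn.executescript(script.read_text())
    applied.append(script.name)

print(applied)
```

Production runners (Flyway, Alembic, golang-migrate) add a schema-version table on top of this so each script runs exactly once.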

Zero-Downtime Migrations (Expand/Contract)


Use the expand-contract pattern to avoid locking or breaking running code:
  1. Expand — add the new column/table (nullable, with default)
  2. Migrate data — backfill in batches; dual-write from application
  3. Transition — application reads from new column; stop writing to old
  4. Contract — drop old column in a follow-up migration
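The four steps above can be sketched end to end. This is a compressed illustration against SQLite (hypothetical `users` table with a `name` column being renamed to `display_name`); in production each step would be its own deployed migration, with dual-writes happening in application code between steps 1 and 3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('Ada Lovelace')")

# 1. Expand: add the new column as nullable so existing writes keep working
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

# 2. Migrate data: backfill existing rows (real systems do this in batches)
conn.execute("UPDATE users SET display_name = name WHERE display_name IS NULL")

# 3. Transition: the application reads display_name and stops writing name
row = conn.execute("SELECT display_name FROM users WHERE id = 1").fetchone()

# 4. Contract: a follow-up migration drops the old column
#    (DROP COLUMN needs SQLite >= 3.35; other engines have supported it for longer)
conn.execute("ALTER TABLE users DROP COLUMN name")

print(row[0])
```

At no point does a reader or writer see a schema it cannot handle, which is what makes the pattern zero-downtime.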

Data Backfill Strategies


```sql
-- Batch update to avoid long-running locks
UPDATE users SET email_normalized = LOWER(email)
WHERE id IN (SELECT id FROM users WHERE email_normalized IS NULL LIMIT 5000);
-- Repeat in a loop until 0 rows affected
```
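The "repeat until 0 rows affected" loop is usually driven from application code. A sketch of the driver, using `sqlite3` and the cursor's `rowcount` as the termination signal (the batch size and table are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, email_normalized TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [(f"User{i}@Example.com",) for i in range(12000)])

BATCH = 5000
while True:
    cur = conn.execute("""
        UPDATE users SET email_normalized = LOWER(email)
        WHERE id IN (SELECT id FROM users WHERE email_normalized IS NULL LIMIT ?)
    """, (BATCH,))
    conn.commit()  # commit per batch so locks are released between iterations
    if cur.rowcount == 0:
        break  # nothing left to backfill

remaining = conn.execute(
    "SELECT COUNT(*) FROM users WHERE email_normalized IS NULL").fetchone()[0]
print(remaining)  # 0
```

On a busy production table you would also sleep briefly between batches to let other writers through.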

Rollback Procedures


  • Always test the `down.sql` in staging before deploying the `up.sql` to production
  • Keep the rollback window short — if the contract step has run, rollback requires a new forward migration
  • For irreversible changes (dropping columns with data), take a logical backup first


Performance Optimization


Indexing Strategies


| Index Type | Use Case | Example |
|---|---|---|
| B-tree (default) | Equality, range, ORDER BY | `CREATE INDEX idx_users_email ON users(email);` |
| GIN | Full-text search, JSONB, arrays | `CREATE INDEX idx_docs_body ON docs USING gin(to_tsvector('english', body));` |
| GiST | Geometry, range types, nearest-neighbor | `CREATE INDEX idx_locations ON places USING gist(coords);` |
| Partial | Subset of rows (reduce size) | `CREATE INDEX idx_active ON users(email) WHERE active = true;` |
| Covering | Index-only scans | `CREATE INDEX idx_cov ON orders(customer_id) INCLUDE (total, created_at);` |

EXPLAIN Plan Reading


```sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT ...;
```

Key signals to watch:
  • Seq Scan on large tables — missing index
  • Nested Loop with high row estimates — consider a hash/merge join or add an index
  • Buffers shared read much higher than hit — working set exceeds memory

N+1 Query Detection


Symptoms: the application issues one query per row (e.g., fetching related records in a loop).
Fixes:
  • Use a `JOIN` or subquery to fetch everything in one round trip
  • ORM eager loading (`select_related` / `includes` / `with`)
  • DataLoader pattern for GraphQL resolvers
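The pattern and its fix can be counted directly. In this sketch (illustrative `customers`/`orders` tables in `sqlite3`), the loop costs N+1 = 3 queries for two customers, while one LEFT JOIN returns the same totals in a single round trip:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total NUMERIC);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (1, 1, 10), (2, 1, 20), (3, 2, 5);
""")

# N+1: one query for the customers, then one more per customer
queries = 0
totals_n_plus_1 = {}
customers = conn.execute("SELECT id, name FROM customers").fetchall()
queries += 1
for cid, name in customers:
    row = conn.execute(
        "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer_id = ?", (cid,))
    queries += 1
    totals_n_plus_1[name] = row.fetchone()[0]

# Fix: a single JOIN does the same work in one round trip
totals_join = dict(conn.execute("""
    SELECT c.name, COALESCE(SUM(o.total), 0)
    FROM customers c LEFT JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
""").fetchall())

print(queries, totals_join)
```

With thousands of customers the loop version costs thousands of round trips, which is why ORMs surface eager-loading knobs for exactly this case.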

Connection Pooling


| Tool | Protocol | Best For |
|---|---|---|
| PgBouncer | PostgreSQL | Transaction/statement pooling, low overhead |
| ProxySQL | MySQL | Query routing, read/write splitting |
| Built-in pool (HikariCP, SQLAlchemy pool) | Any | Application-level pooling |

Rule of thumb: set the pool size to `(2 * CPU cores) + disk spindles`. For cloud SSDs, start with `2 * vCPUs` and tune.
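To make the mechanics concrete, here is a toy fixed-size pool sized by the rule of thumb above (assuming one spindle; class and names are illustrative, not any library's API). Real pools like HikariCP add health checks, timeouts, and leak detection on top of this core idea:

```python
import os
import queue
import sqlite3

# Hypothetical sizing per the rule of thumb, with spindle count assumed to be 1
pool_size = 2 * (os.cpu_count() or 1) + 1

class SimplePool:
    """Toy fixed-size pool: acquire blocks when all connections are checked out."""
    def __init__(self, size):
        self._q = queue.Queue()
        for _ in range(size):
            self._q.put(sqlite3.connect(":memory:", check_same_thread=False))

    def acquire(self, timeout=5):
        # Blocks up to `timeout` seconds, which naturally backpressures callers
        return self._q.get(timeout=timeout)

    def release(self, conn):
        self._q.put(conn)

pool = SimplePool(pool_size)
conn = pool.acquire()
one = conn.execute("SELECT 1").fetchone()[0]
pool.release(conn)
print(one, pool_size)
```

The key property is the fixed upper bound: the database sees at most `pool_size` concurrent connections no matter how many application threads exist.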

Read Replicas and Query Routing


  • Route all `SELECT` queries to replicas; send writes to the primary
  • Account for replication lag (typically <1s for async, 0 for sync)
  • Use `pg_last_wal_replay_lsn()` to detect lag before reading critical data

  • 将所有
    SELECT
    路由至副本;写入操作至主库
  • 考虑复制延迟(异步复制通常<1s,同步复制为0)
  • 读取关键数据前,使用
    pg_last_wal_replay_lsn()
    检测延迟
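The routing rule itself is simple enough to sketch. This statement-based router (names are illustrative) sends anything not clearly a read to the primary; real routers must additionally pin open transactions to the primary and respect the lag check above:

```python
# Statements that are safe to serve from a replica
READ_PREFIXES = ("select", "show", "explain")

def route(sql: str) -> str:
    stmt = sql.lstrip().lower()
    # Anything that is not clearly a read goes to the primary for safety:
    # misrouting a write to a replica fails, misrouting a read merely adds load
    return "replica" if stmt.startswith(READ_PREFIXES) else "primary"

print(route("SELECT * FROM orders"))         # replica
print(route("UPDATE users SET active = 1"))  # primary
```

Note the asymmetry in the failure modes, which is why the default branch targets the primary.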

Multi-Database Decision Matrix


| Criteria | PostgreSQL | MySQL | SQLite | SQL Server |
|---|---|---|---|---|
| Best for | Complex queries, JSONB, extensions | Web apps, read-heavy workloads | Embedded, dev/test, edge | Enterprise .NET stacks |
| JSON support | Excellent (JSONB + GIN) | Good (JSON type) | Minimal | Good (OPENJSON) |
| Replication | Streaming, logical | Group replication, InnoDB Cluster | N/A | Always On AG |
| Licensing | Open source (PostgreSQL License) | Open source (GPL) / commercial | Public domain | Commercial |
| Max practical size | Multi-TB | Multi-TB | ~1 TB (single-writer) | Multi-TB |

When to choose:
  • PostgreSQL — default choice for new projects; best extensibility and standards compliance
  • MySQL — existing MySQL ecosystem; simple read-heavy web applications
  • SQLite — mobile apps, CLI tools, unit-test databases, IoT/edge
  • SQL Server — mandated by enterprise policy; deep .NET/Azure integration

NoSQL Considerations


| Database | Model | Use When |
|---|---|---|
| MongoDB | Document | Schema flexibility, rapid prototyping, content management |
| Redis | Key-value / cache | Session store, rate limiting, leaderboards, pub/sub |
| DynamoDB | Key-value / document | Serverless AWS apps, single-digit-ms latency at any scale |

Use SQL as the default. Reach for NoSQL only when the access pattern clearly benefits from it.


Sharding & Replication


Horizontal vs Vertical Partitioning


  • Vertical partitioning: Split columns across tables (e.g., separate BLOB columns). Reduces I/O for narrow queries.
  • Horizontal partitioning (sharding): Split rows across databases/servers. Required when a single node cannot hold the dataset or handle the throughput.

Sharding Strategies


| Strategy | How It Works | Pros | Cons |
|---|---|---|---|
| Hash | `shard = hash(key) % N` | Even distribution | Resharding is expensive |
| Range | Shard by date or ID range | Simple, good for time-series | Hot spots on the latest shard |
| Geographic | Shard by user region | Data locality, compliance | Cross-region queries are hard |
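The hash strategy from the table above fits in a few lines; the shard count is an assumption for illustration. One practical detail worth encoding: use a stable hash, not Python's built-in `hash()`, which is randomized per process and would route the same key to different shards across restarts:

```python
import zlib

N_SHARDS = 8  # assumed shard count for illustration

def shard_for(key: str) -> int:
    # CRC32 is stable across processes and machines, unlike Python's hash()
    return zlib.crc32(key.encode("utf-8")) % N_SHARDS

assignments = {k: shard_for(k) for k in ("user:1", "user:2", "user:3")}
print(assignments)

# The resharding cost in the table comes from this modulus: changing
# N_SHARDS remaps most keys, which is why consistent hashing is often
# preferred when the shard count must grow over time.
```

The same stability concern applies in any language: pick a hash whose output is fixed by specification (CRC32, xxHash, MD5) rather than an implementation detail.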

Replication Patterns


| Pattern | Consistency | Latency | Use Case |
|---|---|---|---|
| Synchronous | Strong | Higher write latency | Financial transactions |
| Asynchronous | Eventual | Low write latency | Read-heavy web apps |
| Semi-synchronous | At least one replica confirmed | Moderate | Balance of safety and speed |


Cross-References


  • sql-database-assistant — query writing, optimization, and debugging for day-to-day SQL work
  • database-schema-designer — ERD modeling, normalization analysis, and schema generation
  • migration-architect — large-scale migration planning across database engines or major schema overhauls
  • senior-backend — application-layer patterns (connection pooling, ORM best practices)
  • senior-devops — infrastructure provisioning for database clusters and replicas


Conclusion


Effective database design requires balancing multiple competing concerns: performance, scalability, maintainability, and business requirements. This skill provides the tools and knowledge to make informed decisions throughout the database lifecycle, from initial schema design through production optimization and evolution.
The included tools automate common analysis and optimization tasks, while the comprehensive guides provide the theoretical foundation for making sound architectural decisions. Whether building a new system or optimizing an existing one, these resources provide expert-level guidance for creating robust, scalable database solutions.