database-designer


Database Designer - POWERFUL Tier Skill


Overview


A comprehensive database design skill that provides expert-level analysis, optimization, and migration capabilities for modern database systems. This skill combines theoretical principles with practical tools to help architects and developers create scalable, performant, and maintainable database schemas.

Core Competencies


Schema Design & Analysis


  • Normalization Analysis: Automated detection of normalization levels (1NF through BCNF)
  • Denormalization Strategy: Smart recommendations for performance optimization
  • Data Type Optimization: Identification of inappropriate types and size issues
  • Constraint Analysis: Missing foreign keys, unique constraints, and null checks
  • Naming Convention Validation: Consistent table and column naming patterns
  • ERD Generation: Automatic Mermaid diagram creation from DDL
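As a concrete illustration of the constraint checks above, here is a minimal sketch using Python's built-in `sqlite3` with hypothetical `customers`/`orders` tables. It shows why missing foreign keys matter: with enforcement enabled, an orphaned row is rejected instead of silently corrupting referential integrity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite requires opting in to FK enforcement

conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
conn.execute("""
    CREATE TABLE orders (
        id          INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(id),
        total       NUMERIC NOT NULL CHECK (total >= 0)
    )
""")

conn.execute("INSERT INTO customers (id, name) VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (10, 1, 99.5)")  # valid

try:
    # Orphaned row: customer 42 does not exist, so the FK constraint rejects it
    conn.execute("INSERT INTO orders (id, customer_id, total) VALUES (11, 42, 5)")
except sqlite3.IntegrityError as e:
    print("rejected:", e)
```

The same holds for the other engines in this guide; PostgreSQL and MySQL (InnoDB) enforce foreign keys by default.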

Index Optimization


  • Index Gap Analysis: Identification of missing indexes on foreign keys and query patterns
  • Composite Index Strategy: Optimal column ordering for multi-column indexes
  • Index Redundancy Detection: Elimination of overlapping and unused indexes
  • Performance Impact Modeling: Selectivity estimation and query cost analysis
  • Index Type Selection: B-tree, hash, partial, covering, and specialized indexes
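Index gap analysis can be demonstrated end to end with SQLite's `EXPLAIN QUERY PLAN` (a lighter-weight cousin of PostgreSQL's `EXPLAIN`; table and index names here are illustrative). Before the index exists, the plan shows a full scan; after creating it, the same query becomes an index search:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total NUMERIC)")

def plan(sql):
    # EXPLAIN QUERY PLAN rows carry a human-readable step description in the last column
    return " ".join(row[-1] for row in conn.execute("EXPLAIN QUERY PLAN " + sql))

query = "SELECT total FROM orders WHERE customer_id = 7"
before = plan(query)  # no index covers customer_id, so this is a full table scan
conn.execute("CREATE INDEX idx_orders_customer ON orders(customer_id)")
after = plan(query)   # the planner now searches the new index

print(before)  # e.g. "SCAN orders"
print(after)   # e.g. "SEARCH orders USING INDEX idx_orders_customer (customer_id=?)"
```

Exact plan wording varies by SQLite version, but the scan-versus-search distinction is stable and mirrors the Seq Scan signal discussed under EXPLAIN plan reading below.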

Migration Management


  • Zero-Downtime Migrations: Expand-contract pattern implementation
  • Schema Evolution: Safe column additions, deletions, and type changes
  • Data Migration Scripts: Automated data transformation and validation
  • Rollback Strategy: Complete reversal capabilities with validation
  • Execution Planning: Ordered migration steps with dependency resolution

Database Design Principles


→ See references/database-design-reference.md for details

Best Practices


Schema Design


  1. Use meaningful names: Clear, consistent naming conventions
  2. Choose appropriate data types: Right-sized columns for storage efficiency
  3. Define proper constraints: Foreign keys, check constraints, unique indexes
  4. Consider future growth: Plan for scale from the beginning
  5. Document relationships: Clear foreign key relationships and business rules

Performance Optimization


  1. Index strategically: Cover common query patterns without over-indexing
  2. Monitor query performance: Regular analysis of slow queries
  3. Partition large tables: Improve query performance and maintenance
  4. Use appropriate isolation levels: Balance consistency with performance
  5. Implement connection pooling: Efficient resource utilization

Security Considerations


  1. Principle of least privilege: Grant minimal necessary permissions
  2. Encrypt sensitive data: At rest and in transit
  3. Audit access patterns: Monitor and log database access
  4. Validate inputs: Prevent SQL injection attacks
  5. Regular security updates: Keep database software current
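Point 4 above deserves a worked example. The canonical fix for SQL injection is parameterized queries; the sketch below (Python's `sqlite3`, hypothetical `users` table) contrasts string interpolation with bound parameters:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('alice'), ('bob')")

malicious = "alice' OR '1'='1"

# Vulnerable: interpolation lets the input rewrite the WHERE clause,
# producing ... WHERE name = 'alice' OR '1'='1' (matches every row)
unsafe = conn.execute(f"SELECT name FROM users WHERE name = '{malicious}'").fetchall()

# Safe: a bound parameter is treated strictly as data, never as SQL
safe = conn.execute("SELECT name FROM users WHERE name = ?", (malicious,)).fetchall()

print(len(unsafe), len(safe))  # 2 0
```

The same placeholder mechanism exists in every major driver (`%s` in psycopg, `?` in JDBC); input validation is a defense in depth on top of it, not a substitute.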

Query Generation Patterns


SELECT with JOINs


```sql
-- INNER JOIN: only matching rows
SELECT o.id, c.name, o.total
FROM orders o
INNER JOIN customers c ON c.id = o.customer_id;

-- LEFT JOIN: all left rows, NULLs for non-matches
SELECT c.name, COUNT(o.id) AS order_count
FROM customers c
LEFT JOIN orders o ON o.customer_id = c.id
GROUP BY c.name;

-- Self-join: hierarchical data (employees/managers)
SELECT e.name AS employee, m.name AS manager
FROM employees e
LEFT JOIN employees m ON m.id = e.manager_id;
```

Common Table Expressions (CTEs)


```sql
-- Recursive CTE for org chart
WITH RECURSIVE org AS (
  SELECT id, name, manager_id, 1 AS depth
  FROM employees WHERE manager_id IS NULL
  UNION ALL
  SELECT e.id, e.name, e.manager_id, o.depth + 1
  FROM employees e INNER JOIN org o ON o.id = e.manager_id
)
SELECT * FROM org ORDER BY depth, name;
```

Window Functions


```sql
-- ROW_NUMBER for pagination / dedup
SELECT *, ROW_NUMBER() OVER (PARTITION BY customer_id ORDER BY created_at DESC) AS rn
FROM orders;

-- RANK with gaps, DENSE_RANK without gaps
SELECT name, score, RANK() OVER (ORDER BY score DESC) AS rank FROM leaderboard;

-- LAG/LEAD for comparing adjacent rows
SELECT date, revenue,
  revenue - LAG(revenue) OVER (ORDER BY date) AS daily_change
FROM daily_sales;
```

Aggregation Patterns


```sql
-- FILTER clause (PostgreSQL) for conditional aggregation
SELECT
  COUNT(*) AS total,
  COUNT(*) FILTER (WHERE status = 'active') AS active,
  AVG(amount) FILTER (WHERE amount > 0) AS avg_positive
FROM accounts;

-- GROUPING SETS for multi-level rollups
SELECT region, product, SUM(revenue)
FROM sales
GROUP BY GROUPING SETS ((region, product), (region), ());
```


Migration Patterns


Up/Down Migration Scripts


Every migration must have a reversible counterpart. Name files with a timestamp prefix for ordering:

```
migrations/
├── 20260101_000001_create_users.up.sql
├── 20260101_000001_create_users.down.sql
├── 20260115_000002_add_users_email_index.up.sql
└── 20260115_000002_add_users_email_index.down.sql
```
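A migration runner only needs the lexicographic order of those timestamp prefixes to resolve dependencies. A minimal sketch (hypothetical file names, SQLite as the target) makes this concrete:

```python
import sqlite3
import tempfile
from pathlib import Path

# Hypothetical migrations directory with timestamp-prefixed up scripts,
# deliberately created out of order to show that sorting restores it
root = Path(tempfile.mkdtemp())
(root / "20260115_000002_add_users_email_index.up.sql").write_text(
    "CREATE INDEX idx_users_email ON users(email);"
)
(root / "20260101_000001_create_users.up.sql").write_text(
    "CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT);"
)

conn = sqlite3.connect(":memory:")
applied = []
# Lexicographic sort of the timestamp prefix is chronological order,
# so create_users runs before the index migration that depends on it
for script in sorted(root.glob("*.up.sql")):
    conn.executescript(script.read_text())
    applied.append(script.name)

print(applied)
```

Production runners (Flyway, Alembic, golang-migrate) add a schema-version table on top of this so each script runs exactly once.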

Zero-Downtime Migrations (Expand/Contract)


Use the expand-contract pattern to avoid locking or breaking running code:
  1. Expand — add the new column/table (nullable, with default)
  2. Migrate data — backfill in batches; dual-write from application
  3. Transition — application reads from new column; stop writing to old
  4. Contract — drop old column in a follow-up migration
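The four steps above can be sketched end to end. This is a compressed illustration against SQLite (hypothetical `users` table with a `name` column being renamed to `display_name`); in production each step would be its own deployed migration, with dual-writes happening in application code between steps 1 and 3:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute("INSERT INTO users (name) VALUES ('Ada Lovelace')")

# 1. Expand: add the new column as nullable so existing writes keep working
conn.execute("ALTER TABLE users ADD COLUMN display_name TEXT")

# 2. Migrate data: backfill existing rows (real systems do this in batches)
conn.execute("UPDATE users SET display_name = name WHERE display_name IS NULL")

# 3. Transition: the application reads display_name and stops writing name
row = conn.execute("SELECT display_name FROM users WHERE id = 1").fetchone()

# 4. Contract: a follow-up migration drops the old column
#    (DROP COLUMN needs SQLite >= 3.35; other engines have supported it for longer)
conn.execute("ALTER TABLE users DROP COLUMN name")

print(row[0])
```

At no point does a reader or writer see a schema it cannot handle, which is what makes the pattern zero-downtime.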

Data Backfill Strategies


```sql
-- Batch update to avoid long-running locks
UPDATE users SET email_normalized = LOWER(email)
WHERE id IN (SELECT id FROM users WHERE email_normalized IS NULL LIMIT 5000);
-- Repeat in a loop until 0 rows affected
```
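The "repeat until 0 rows affected" loop is usually driven from application code. A sketch of the driver, using `sqlite3` and the cursor's `rowcount` as the termination signal (the batch size and table are illustrative):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (id INTEGER PRIMARY KEY, email TEXT, email_normalized TEXT)")
conn.executemany("INSERT INTO users (email) VALUES (?)",
                 [(f"User{i}@Example.com",) for i in range(12000)])

BATCH = 5000
while True:
    cur = conn.execute("""
        UPDATE users SET email_normalized = LOWER(email)
        WHERE id IN (SELECT id FROM users WHERE email_normalized IS NULL LIMIT ?)
    """, (BATCH,))
    conn.commit()  # commit per batch so locks are released between iterations
    if cur.rowcount == 0:
        break  # nothing left to backfill

remaining = conn.execute(
    "SELECT COUNT(*) FROM users WHERE email_normalized IS NULL").fetchone()[0]
print(remaining)  # 0
```

On a busy production table you would also sleep briefly between batches to let other writers through.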

Rollback Procedures


  • Always test the `down.sql` in staging before deploying the `up.sql` to production
  • Keep the rollback window short — if the contract step has run, rollback requires a new forward migration
  • For irreversible changes (dropping columns with data), take a logical backup first


Performance Optimization


Indexing Strategies


| Index Type | Use Case | Example |
|---|---|---|
| B-tree (default) | Equality, range, ORDER BY | `CREATE INDEX idx_users_email ON users(email);` |
| GIN | Full-text search, JSONB, arrays | `CREATE INDEX idx_docs_body ON docs USING gin(to_tsvector('english', body));` |
| GiST | Geometry, range types, nearest-neighbor | `CREATE INDEX idx_locations ON places USING gist(coords);` |
| Partial | Subset of rows (reduce size) | `CREATE INDEX idx_active ON users(email) WHERE active = true;` |
| Covering | Index-only scans | `CREATE INDEX idx_cov ON orders(customer_id) INCLUDE (total, created_at);` |

EXPLAIN Plan Reading


```sql
EXPLAIN (ANALYZE, BUFFERS, FORMAT TEXT) SELECT ...;
```

Key signals to watch:
  • Seq Scan on large tables — missing index
  • Nested Loop with high row estimates — consider a hash/merge join or add an index
  • Buffers shared read much higher than hit — working set exceeds memory

N+1 Query Detection


Symptoms: the application issues one query per row (e.g., fetching related records in a loop).
Fixes:
  • Use a `JOIN` or subquery to fetch everything in one round trip
  • ORM eager loading (`select_related` / `includes` / `with`)
  • DataLoader pattern for GraphQL resolvers
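The pattern and its fix can be counted directly. In this sketch (illustrative `customers`/`orders` tables in `sqlite3`), the loop costs N+1 = 3 queries for two customers, while one LEFT JOIN returns the same totals in a single round trip:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (id INTEGER PRIMARY KEY, customer_id INTEGER, total NUMERIC);
    INSERT INTO customers VALUES (1, 'Ada'), (2, 'Grace');
    INSERT INTO orders VALUES (1, 1, 10), (2, 1, 20), (3, 2, 5);
""")

# N+1: one query for the customers, then one more per customer
queries = 0
totals_n_plus_1 = {}
customers = conn.execute("SELECT id, name FROM customers").fetchall()
queries += 1
for cid, name in customers:
    row = conn.execute(
        "SELECT COALESCE(SUM(total), 0) FROM orders WHERE customer_id = ?", (cid,))
    queries += 1
    totals_n_plus_1[name] = row.fetchone()[0]

# Fix: a single JOIN does the same work in one round trip
totals_join = dict(conn.execute("""
    SELECT c.name, COALESCE(SUM(o.total), 0)
    FROM customers c LEFT JOIN orders o ON o.customer_id = c.id
    GROUP BY c.name
""").fetchall())

print(queries, totals_join)
```

With thousands of customers the loop version costs thousands of round trips, which is why ORMs surface eager-loading knobs for exactly this case.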

Connection Pooling


| Tool | Protocol | Best For |
|---|---|---|
| PgBouncer | PostgreSQL | Transaction/statement pooling, low overhead |
| ProxySQL | MySQL | Query routing, read/write splitting |
| Built-in pool (HikariCP, SQLAlchemy pool) | Any | Application-level pooling |

Rule of thumb: set the pool size to `(2 * CPU cores) + disk spindles`. For cloud SSDs, start with `2 * vCPUs` and tune.
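To make the mechanics concrete, here is a toy fixed-size pool sized by the rule of thumb above (assuming one spindle; class and names are illustrative, not any library's API). Real pools like HikariCP add health checks, timeouts, and leak detection on top of this core idea:

```python
import os
import queue
import sqlite3

# Hypothetical sizing per the rule of thumb, with spindle count assumed to be 1
pool_size = 2 * (os.cpu_count() or 1) + 1

class SimplePool:
    """Toy fixed-size pool: acquire blocks when all connections are checked out."""
    def __init__(self, size):
        self._q = queue.Queue()
        for _ in range(size):
            self._q.put(sqlite3.connect(":memory:", check_same_thread=False))

    def acquire(self, timeout=5):
        # Blocks up to `timeout` seconds, which naturally backpressures callers
        return self._q.get(timeout=timeout)

    def release(self, conn):
        self._q.put(conn)

pool = SimplePool(pool_size)
conn = pool.acquire()
one = conn.execute("SELECT 1").fetchone()[0]
pool.release(conn)
print(one, pool_size)
```

The key property is the fixed upper bound: the database sees at most `pool_size` concurrent connections no matter how many application threads exist.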

Read Replicas and Query Routing


  • Route all `SELECT` queries to replicas; send writes to the primary
  • Account for replication lag (typically <1s for async, 0 for sync)
  • Use `pg_last_wal_replay_lsn()` to detect lag before reading critical data

  • 将所有
    SELECT
    路由至副本;写入操作至主库
  • 考虑复制延迟(异步复制通常<1s,同步复制为0)
  • 读取关键数据前,使用
    pg_last_wal_replay_lsn()
    检测延迟
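The routing rule itself is simple enough to sketch. This statement-based router (names are illustrative) sends anything not clearly a read to the primary; real routers must additionally pin open transactions to the primary and respect the lag check above:

```python
# Statements that are safe to serve from a replica
READ_PREFIXES = ("select", "show", "explain")

def route(sql: str) -> str:
    stmt = sql.lstrip().lower()
    # Anything that is not clearly a read goes to the primary for safety:
    # misrouting a write to a replica fails, misrouting a read merely adds load
    return "replica" if stmt.startswith(READ_PREFIXES) else "primary"

print(route("SELECT * FROM orders"))         # replica
print(route("UPDATE users SET active = 1"))  # primary
```

Note the asymmetry in the failure modes, which is why the default branch targets the primary.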

Multi-Database Decision Matrix


| Criteria | PostgreSQL | MySQL | SQLite | SQL Server |
|---|---|---|---|---|
| Best for | Complex queries, JSONB, extensions | Web apps, read-heavy workloads | Embedded, dev/test, edge | Enterprise .NET stacks |
| JSON support | Excellent (JSONB + GIN) | Good (JSON type) | Minimal | Good (OPENJSON) |
| Replication | Streaming, logical | Group replication, InnoDB Cluster | N/A | Always On AG |
| Licensing | Open source (PostgreSQL License) | Open source (GPL) / commercial | Public domain | Commercial |
| Max practical size | Multi-TB | Multi-TB | ~1 TB (single-writer) | Multi-TB |

When to choose:
  • PostgreSQL — default choice for new projects; best extensibility and standards compliance
  • MySQL — existing MySQL ecosystem; simple read-heavy web applications
  • SQLite — mobile apps, CLI tools, unit-test databases, IoT/edge
  • SQL Server — mandated by enterprise policy; deep .NET/Azure integration

NoSQL Considerations


| Database | Model | Use When |
|---|---|---|
| MongoDB | Document | Schema flexibility, rapid prototyping, content management |
| Redis | Key-value / cache | Session store, rate limiting, leaderboards, pub/sub |
| DynamoDB | Key-value / document | Serverless AWS apps, single-digit-ms latency at any scale |

Use SQL as the default. Reach for NoSQL only when the access pattern clearly benefits from it.


Sharding & Replication


Horizontal vs Vertical Partitioning


  • Vertical partitioning: Split columns across tables (e.g., separate BLOB columns). Reduces I/O for narrow queries.
  • Horizontal partitioning (sharding): Split rows across databases/servers. Required when a single node cannot hold the dataset or handle the throughput.

Sharding Strategies


| Strategy | How It Works | Pros | Cons |
|---|---|---|---|
| Hash | `shard = hash(key) % N` | Even distribution | Resharding is expensive |
| Range | Shard by date or ID range | Simple, good for time-series | Hot spots on the latest shard |
| Geographic | Shard by user region | Data locality, compliance | Cross-region queries are hard |
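The hash strategy from the table above fits in a few lines; the shard count is an assumption for illustration. One practical detail worth encoding: use a stable hash, not Python's built-in `hash()`, which is randomized per process and would route the same key to different shards across restarts:

```python
import zlib

N_SHARDS = 8  # assumed shard count for illustration

def shard_for(key: str) -> int:
    # CRC32 is stable across processes and machines, unlike Python's hash()
    return zlib.crc32(key.encode("utf-8")) % N_SHARDS

assignments = {k: shard_for(k) for k in ("user:1", "user:2", "user:3")}
print(assignments)

# The resharding cost in the table comes from this modulus: changing
# N_SHARDS remaps most keys, which is why consistent hashing is often
# preferred when the shard count must grow over time.
```

The same stability concern applies in any language: pick a hash whose output is fixed by specification (CRC32, xxHash, MD5) rather than an implementation detail.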

Replication Patterns


| Pattern | Consistency | Latency | Use Case |
|---|---|---|---|
| Synchronous | Strong | Higher write latency | Financial transactions |
| Asynchronous | Eventual | Low write latency | Read-heavy web apps |
| Semi-synchronous | At least one replica confirmed | Moderate | Balance of safety and speed |


Cross-References


  • sql-database-assistant — query writing, optimization, and debugging for day-to-day SQL work
  • database-schema-designer — ERD modeling, normalization analysis, and schema generation
  • migration-architect — large-scale migration planning across database engines or major schema overhauls
  • senior-backend — application-layer patterns (connection pooling, ORM best practices)
  • senior-devops — infrastructure provisioning for database clusters and replicas


Conclusion


Effective database design requires balancing multiple competing concerns: performance, scalability, maintainability, and business requirements. This skill provides the tools and knowledge to make informed decisions throughout the database lifecycle, from initial schema design through production optimization and evolution.
The included tools automate common analysis and optimization tasks, while the comprehensive guides provide the theoretical foundation for making sound architectural decisions. Whether building a new system or optimizing an existing one, these resources provide expert-level guidance for creating robust, scalable database solutions.