database-schema-designer
Compare original and translation side by side
🇺🇸
Original
English🇨🇳
Translation
ChineseDatabase Schema Designer
数据库Schema设计器
Overview
概述
This skill provides comprehensive guidance for designing robust, scalable database schemas for both SQL and NoSQL databases. Whether building from scratch or evolving existing schemas, this framework ensures data integrity, performance, and maintainability.
When to use this skill:
- Designing new database schemas
- Refactoring or migrating existing schemas
- Optimizing database performance
- Choosing between SQL and NoSQL approaches
- Creating database migrations
- Establishing indexing strategies
- Modeling complex relationships
- Planning data archival and partitioning
本技能为SQL和NoSQL数据库设计健壮、可扩展的数据库Schema提供全面指导。无论是从零开始构建还是迭代现有Schema,此框架都能确保数据完整性、性能和可维护性。
何时使用此技能:
- 设计新的数据库Schema
- 重构或迁移现有Schema
- 优化数据库性能
- 在SQL和NoSQL方案间做选择
- 创建数据库迁移脚本
- 制定索引策略
- 建模复杂关系
- 规划数据归档与分区
Database Design Philosophy
数据库设计理念
Core Principles
核心原则
1. Model the Domain, Not the UI
- Schema reflects business entities and relationships
- Don't let UI requirements drive data structure
- Separate presentation concerns from data model
2. Optimize for Reads or Writes (Not Both)
- OLTP (transactional): Normalized, optimized for writes
- OLAP (analytical): Denormalized, optimized for reads
- Choose based on access patterns
3. Plan for Scale From Day One
- Indexing strategy
- Partitioning approach
- Caching layer
- Read replicas
4. Data Integrity Over Performance
- Use constraints, foreign keys, validation
- Performance issues can be optimized later
- Data corruption is costly to fix
1. 建模业务领域,而非UI
- Schema反映业务实体及关系
- 不要让UI需求驱动数据结构
- 将展示层关注点与数据模型分离
2. 针对读或写优化(而非两者兼顾)
- OLTP(事务型):规范化设计,针对写操作优化
- OLAP(分析型):反规范化设计,针对读操作优化
- 根据访问模式选择合适方案
3. 从第一天就规划可扩展性
- 索引策略
- 分区方案
- 缓存层
- 只读副本
4. 数据完整性优先于性能
- 使用约束、外键、校验规则
- 性能问题可后续优化
- 数据损坏的修复成本极高
SQL Database Design
SQL数据库设计
Normalization
规范化
Database normalization reduces redundancy and ensures data integrity.
数据库规范化可减少冗余并确保数据完整性。
1st Normal Form (1NF)
第一范式(1NF)
Rule: Each column contains atomic (indivisible) values, no repeating groups.
sql
-- ❌ Violates 1NF (multiple values in one column)
CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT,
product_ids VARCHAR(255) -- '101,102,103' (bad!)
);
-- ✅ Follows 1NF
CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT
);
CREATE TABLE order_items (
id INT PRIMARY KEY,
order_id INT,
product_id INT,
FOREIGN KEY (order_id) REFERENCES orders(id)
);规则:每个列包含原子(不可分割)值,无重复组。
sql
-- ❌ 违反1NF(单个列包含多个值)
CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT,
product_ids VARCHAR(255) -- '101,102,103'(错误!)
);
-- ✅ 符合1NF
CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT
);
CREATE TABLE order_items (
id INT PRIMARY KEY,
order_id INT,
product_id INT,
FOREIGN KEY (order_id) REFERENCES orders(id)
);2nd Normal Form (2NF)
第二范式(2NF)
Rule: Must be in 1NF + all non-key columns depend on the entire primary key.
sql
-- ❌ Violates 2NF (customer_name depends only on customer_id, not full key)
CREATE TABLE order_items (
order_id INT,
product_id INT,
customer_id INT,
customer_name VARCHAR(100), -- Depends on customer_id only
quantity INT,
PRIMARY KEY (order_id, product_id)
);
-- ✅ Follows 2NF (customer data in separate table)
CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT,
FOREIGN KEY (customer_id) REFERENCES customers(id)
);
CREATE TABLE order_items (
order_id INT,
product_id INT,
quantity INT,
PRIMARY KEY (order_id, product_id)
);
CREATE TABLE customers (
id INT PRIMARY KEY,
name VARCHAR(100)
);规则:必须满足1NF,且所有非键列完全依赖于整个主键。
sql
-- ❌ 违反2NF(customer_name仅依赖于customer_id,而非整个主键)
CREATE TABLE order_items (
order_id INT,
product_id INT,
customer_id INT,
customer_name VARCHAR(100), -- 仅依赖于customer_id
quantity INT,
PRIMARY KEY (order_id, product_id)
);
-- ✅ 符合2NF(客户数据存放在单独表中)
CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT,
FOREIGN KEY (customer_id) REFERENCES customers(id)
);
CREATE TABLE order_items (
order_id INT,
product_id INT,
quantity INT,
PRIMARY KEY (order_id, product_id)
);
CREATE TABLE customers (
id INT PRIMARY KEY,
name VARCHAR(100)
);3rd Normal Form (3NF)
第三范式(3NF)
Rule: Must be in 2NF + no transitive dependencies (non-key columns depend only on primary key).
sql
-- ❌ Violates 3NF (country depends on postal_code, not on customer_id)
CREATE TABLE customers (
id INT PRIMARY KEY,
name VARCHAR(100),
postal_code VARCHAR(10),
country VARCHAR(50) -- Depends on postal_code, not id
);
-- ✅ Follows 3NF
CREATE TABLE customers (
id INT PRIMARY KEY,
name VARCHAR(100),
postal_code VARCHAR(10),
FOREIGN KEY (postal_code) REFERENCES postal_codes(code)
);
CREATE TABLE postal_codes (
code VARCHAR(10) PRIMARY KEY,
country VARCHAR(50)
);规则:必须满足2NF,且无传递依赖(非键列仅依赖于主键)。
sql
-- ❌ 违反3NF(country依赖于postal_code,而非customer_id)
CREATE TABLE customers (
id INT PRIMARY KEY,
name VARCHAR(100),
postal_code VARCHAR(10),
country VARCHAR(50) -- 依赖于postal_code,而非id
);
-- ✅ 符合3NF
CREATE TABLE customers (
id INT PRIMARY KEY,
name VARCHAR(100),
postal_code VARCHAR(10),
FOREIGN KEY (postal_code) REFERENCES postal_codes(code)
);
CREATE TABLE postal_codes (
code VARCHAR(10) PRIMARY KEY,
country VARCHAR(50)
);Denormalization (When to Break Rules)
反规范化(何时打破规则)
Sometimes denormalization improves performance for read-heavy applications.
sql
-- Denormalized for performance (caching derived data)
CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT,
total_amount DECIMAL(10, 2), -- Calculated from order_items
item_count INT, -- Calculated from order_items
created_at TIMESTAMP
);
-- Trigger or application code keeps denormalized data in syncWhen to denormalize:
- Read-heavy applications (reporting, analytics)
- Frequently joined tables causing performance issues
- Pre-calculated aggregates (counts, sums, averages)
- Caching derived data to avoid complex joins
有时反规范化可提升读密集型应用的性能。
sql
-- 为性能做反规范化(缓存衍生数据)
CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT,
total_amount DECIMAL(10, 2), -- 从order_items计算得出
item_count INT, -- 从order_items计算得出
created_at TIMESTAMP
);
-- 通过触发器或应用代码保持反规范化数据同步何时反规范化:
- 读密集型应用(报表、分析)
- 频繁关联导致性能问题的表
- 预计算聚合值(计数、求和、平均值)
- 缓存衍生数据以避免复杂关联
Data Types
数据类型
Choose appropriate data types for efficiency and accuracy.
选择合适的数据类型以兼顾效率与准确性。
String Types
字符串类型
sql
-- Fixed-length (use for predictable lengths)
CHAR(10) -- ISO date: '2025-10-31'
CHAR(2) -- State code: 'CA'
-- Variable-length (use for variable lengths)
VARCHAR(255) -- Email, name, short text
TEXT -- Long text (articles, descriptions)
-- ✅ Good: Appropriate sizes
email VARCHAR(255)
phone_number VARCHAR(20)
postal_code VARCHAR(10)
-- ❌ Bad: Wasteful or too small
email VARCHAR(500) -- Too large
description VARCHAR(50) -- Too small for long textsql
-- 固定长度(用于可预测长度的内容)
CHAR(10) -- ISO日期:'2025-10-31'
CHAR(2) -- 州代码:'CA'
-- 可变长度(用于长度可变的内容)
VARCHAR(255) -- 邮箱、名称、短文本
TEXT -- 长文本(文章、描述)
-- ✅ 推荐:合适的长度
email VARCHAR(255)
phone_number VARCHAR(20)
postal_code VARCHAR(10)
-- ❌ 不推荐:浪费空间或长度不足
email VARCHAR(500) -- 过长
description VARCHAR(50) -- 长文本长度不足Numeric Types
数值类型
sql
-- Integer types
TINYINT -- -128 to 127 (age, status codes)
SMALLINT -- -32,768 to 32,767 (quantities)
INT -- -2.1B to 2.1B (IDs, counts)
BIGINT -- Large numbers (timestamps, large IDs)
-- Decimal types
DECIMAL(10, 2) -- Exact precision (money: $99,999,999.99)
FLOAT -- Approximate (scientific calculations)
DOUBLE -- Higher precision approximations
-- ✅ Use DECIMAL for money
CREATE TABLE products (
id INT PRIMARY KEY,
price DECIMAL(10, 2) -- Exact precision
);
-- ❌ Don't use FLOAT for money
price FLOAT -- Rounding errors!sql
-- 整数类型
TINYINT -- -128至127(年龄、状态码)
SMALLINT -- -32,768至32,767(数量)
INT -- -21亿至21亿(ID、计数)
BIGINT -- 大数值(时间戳、大型ID)
-- 小数类型
DECIMAL(10, 2) -- 精确精度(货币:$99,999,999.99)
FLOAT -- 近似值(科学计算)
DOUBLE -- 更高精度的近似值
-- ✅ 货币类型使用DECIMAL
CREATE TABLE products (
id INT PRIMARY KEY,
price DECIMAL(10, 2) -- 精确精度
);
-- ❌ 货币类型不要使用FLOAT
price FLOAT -- 会出现舍入误差!Date/Time Types
日期/时间类型
sql
DATE -- Date only: 2025-10-31
TIME -- Time only: 14:30:00
DATETIME -- Date + time: 2025-10-31 14:30:00
TIMESTAMP -- Unix timestamp (auto-converts timezone)
-- ✅ Always store in UTC
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMPsql
DATE -- 仅日期:2025-10-31
TIME -- 仅时间:14:30:00
DATETIME -- 日期+时间:2025-10-31 14:30:00
TIMESTAMP -- Unix时间戳(自动转换时区)
-- ✅ 始终以UTC时区存储
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
updated_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP ON UPDATE CURRENT_TIMESTAMPBoolean
布尔类型
sql
-- PostgreSQL
is_active BOOLEAN DEFAULT TRUE
-- MySQL
is_active TINYINT(1) DEFAULT 1sql
-- PostgreSQL
is_active BOOLEAN DEFAULT TRUE
-- MySQL
is_active TINYINT(1) DEFAULT 1Indexing Strategies
索引策略
Indexes speed up reads but slow down writes. Use strategically.
索引可加速读操作,但会减慢写操作,需策略性使用。
When to Create Indexes
何时创建索引
sql
-- ✅ Index foreign keys
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
-- ✅ Index frequently queried columns
CREATE INDEX idx_users_email ON users(email);
-- ✅ Index columns used in WHERE, ORDER BY, GROUP BY
CREATE INDEX idx_orders_created_at ON orders(created_at);
-- ✅ Composite index for multi-column queries
CREATE INDEX idx_orders_customer_status ON orders(customer_id, status);sql
-- ✅ 为外键创建索引
CREATE INDEX idx_orders_customer_id ON orders(customer_id);
-- ✅ 为频繁查询的列创建索引
CREATE INDEX idx_users_email ON users(email);
-- ✅ 为WHERE、ORDER BY、GROUP BY中使用的列创建索引
CREATE INDEX idx_orders_created_at ON orders(created_at);
-- ✅ 为多列查询创建复合索引
CREATE INDEX idx_orders_customer_status ON orders(customer_id, status);Index Types
索引类型
B-Tree Index (Default)
sql
-- Best for equality and range queries
CREATE INDEX idx_products_price ON products(price);
-- Queries that benefit:
SELECT * FROM products WHERE price > 100;
SELECT * FROM products WHERE price BETWEEN 50 AND 150;Hash Index
sql
-- Best for exact matches only (not ranges)
CREATE INDEX idx_users_email USING HASH ON users(email);
-- Queries that benefit:
SELECT * FROM users WHERE email = 'user@example.com';Full-Text Index
sql
-- Best for text search
CREATE FULLTEXT INDEX idx_articles_content ON articles(title, content);
-- Queries that benefit:
SELECT * FROM articles WHERE MATCH(title, content) AGAINST('database design');Partial Index (PostgreSQL)
sql
-- Index only specific rows
CREATE INDEX idx_active_users ON users(email) WHERE is_active = TRUE;B-Tree索引(默认)
sql
-- 最适合等值和范围查询
CREATE INDEX idx_products_price ON products(price);
-- 受益的查询:
SELECT * FROM products WHERE price > 100;
SELECT * FROM products WHERE price BETWEEN 50 AND 150;哈希索引
sql
-- 仅适合精确匹配(不支持范围查询)
CREATE INDEX idx_users_email USING HASH ON users(email);
-- 受益的查询:
SELECT * FROM users WHERE email = 'user@example.com';全文索引
sql
-- 最适合文本搜索
CREATE FULLTEXT INDEX idx_articles_content ON articles(title, content);
-- 受益的查询:
SELECT * FROM articles WHERE MATCH(title, content) AGAINST('database design');部分索引(PostgreSQL)
sql
-- 仅为特定行创建索引
CREATE INDEX idx_active_users ON users(email) WHERE is_active = TRUE;Composite Indexes (Column Order Matters)
复合索引(列顺序很重要)
sql
-- ✅ Good: Index supports both queries
CREATE INDEX idx_orders_customer_status ON orders(customer_id, status);
-- Query 1: Uses index efficiently
SELECT * FROM orders WHERE customer_id = 123 AND status = 'pending';
-- Query 2: Uses index (customer_id only)
SELECT * FROM orders WHERE customer_id = 123;
-- ❌ Query 3: Doesn't use index (status is second column)
SELECT * FROM orders WHERE status = 'pending';Rule of Thumb: Put most selective column first, or most frequently queried alone.
sql
-- ✅ 推荐:索引支持两种查询
CREATE INDEX idx_orders_customer_status ON orders(customer_id, status);
-- 查询1:高效使用索引
SELECT * FROM orders WHERE customer_id = 123 AND status = 'pending';
-- 查询2:使用索引(仅customer_id)
SELECT * FROM orders WHERE customer_id = 123;
-- ❌ 查询3:不使用索引(status是第二列)
SELECT * FROM orders WHERE status = 'pending';经验法则:将选择性最高的列放在前面,或最常单独查询的列放在前面。
Constraints
约束
Use constraints to enforce data integrity at the database level.
使用约束在数据库层面强制数据完整性。
Primary Key
主键
sql
-- Auto-incrementing integer
CREATE TABLE users (
id INT AUTO_INCREMENT PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL
);
-- UUID (better for distributed systems)
CREATE TABLE users (
id CHAR(36) PRIMARY KEY DEFAULT (UUID()),
email VARCHAR(255) UNIQUE NOT NULL
);sql
-- 自增整数
CREATE TABLE users (
id INT AUTO_INCREMENT PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL
);
-- UUID(更适合分布式系统)
CREATE TABLE users (
id CHAR(36) PRIMARY KEY DEFAULT (UUID()),
email VARCHAR(255) UNIQUE NOT NULL
);Foreign Key
外键
sql
CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT NOT NULL,
FOREIGN KEY (customer_id) REFERENCES customers(id)
ON DELETE CASCADE -- Delete orders when customer deleted
ON UPDATE CASCADE -- Update orders when customer ID changes
);
-- Alternatives:
ON DELETE RESTRICT -- Prevent deletion if referenced
ON DELETE SET NULL -- Set to NULL when parent deleted
ON DELETE NO ACTION -- Same as RESTRICTsql
CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT NOT NULL,
FOREIGN KEY (customer_id) REFERENCES customers(id)
ON DELETE CASCADE -- 删除客户时同时删除订单
ON UPDATE CASCADE -- 更新客户ID时同步更新订单
);
-- 其他选项:
ON DELETE RESTRICT -- 若存在引用则阻止删除
ON DELETE SET NULL -- 父记录删除时设为NULL
ON DELETE NO ACTION -- 与RESTRICT相同Unique Constraint
唯一约束
sql
CREATE TABLE users (
id INT PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
username VARCHAR(50) UNIQUE NOT NULL
);
-- Composite unique constraint
CREATE TABLE enrollments (
student_id INT,
course_id INT,
UNIQUE (student_id, course_id) -- Prevent duplicate enrollments
);sql
CREATE TABLE users (
id INT PRIMARY KEY,
email VARCHAR(255) UNIQUE NOT NULL,
username VARCHAR(50) UNIQUE NOT NULL
);
-- 复合唯一约束
CREATE TABLE enrollments (
student_id INT,
course_id INT,
UNIQUE (student_id, course_id) -- 防止重复选课
);Check Constraint
检查约束
sql
CREATE TABLE products (
id INT PRIMARY KEY,
price DECIMAL(10, 2) CHECK (price >= 0),
stock INT CHECK (stock >= 0),
discount_percent INT CHECK (discount_percent BETWEEN 0 AND 100)
);sql
CREATE TABLE products (
id INT PRIMARY KEY,
price DECIMAL(10, 2) CHECK (price >= 0),
stock INT CHECK (stock >= 0),
discount_percent INT CHECK (discount_percent BETWEEN 0 AND 100)
);Not Null Constraint
非空约束
sql
CREATE TABLE users (
id INT PRIMARY KEY,
email VARCHAR(255) NOT NULL,
name VARCHAR(100) NOT NULL,
bio TEXT -- Nullable (optional)
);sql
CREATE TABLE users (
id INT PRIMARY KEY,
email VARCHAR(255) NOT NULL,
name VARCHAR(100) NOT NULL,
bio TEXT -- 可为空(可选字段)
);Common Schema Patterns
常见Schema模式
One-to-Many (Orders → Order Items)
一对多(订单 → 订单项)
sql
CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE order_items (
id INT PRIMARY KEY,
order_id INT NOT NULL,
product_id INT NOT NULL,
quantity INT NOT NULL,
price DECIMAL(10, 2) NOT NULL,
FOREIGN KEY (order_id) REFERENCES orders(id) ON DELETE CASCADE
);sql
CREATE TABLE orders (
id INT PRIMARY KEY,
customer_id INT NOT NULL,
created_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE order_items (
id INT PRIMARY KEY,
order_id INT NOT NULL,
product_id INT NOT NULL,
quantity INT NOT NULL,
price DECIMAL(10, 2) NOT NULL,
FOREIGN KEY (order_id) REFERENCES orders(id) ON DELETE CASCADE
);Many-to-Many (Students ↔ Courses)
多对多(学生 ↔ 课程)
sql
CREATE TABLE students (
id INT PRIMARY KEY,
name VARCHAR(100) NOT NULL
);
CREATE TABLE courses (
id INT PRIMARY KEY,
title VARCHAR(200) NOT NULL
);
-- Junction table (also called join table, linking table)
CREATE TABLE enrollments (
student_id INT,
course_id INT,
enrolled_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
grade VARCHAR(2),
PRIMARY KEY (student_id, course_id),
FOREIGN KEY (student_id) REFERENCES students(id) ON DELETE CASCADE,
FOREIGN KEY (course_id) REFERENCES courses(id) ON DELETE CASCADE
);sql
CREATE TABLE students (
id INT PRIMARY KEY,
name VARCHAR(100) NOT NULL
);
CREATE TABLE courses (
id INT PRIMARY KEY,
title VARCHAR(200) NOT NULL
);
-- 关联表(也称为连接表、链接表)
CREATE TABLE enrollments (
student_id INT,
course_id INT,
enrolled_at TIMESTAMP DEFAULT CURRENT_TIMESTAMP,
grade VARCHAR(2),
PRIMARY KEY (student_id, course_id),
FOREIGN KEY (student_id) REFERENCES students(id) ON DELETE CASCADE,
FOREIGN KEY (course_id) REFERENCES courses(id) ON DELETE CASCADE
);Self-Referencing (Employees → Manager)
自引用(员工 → 经理)
sql
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(100) NOT NULL,
manager_id INT,
FOREIGN KEY (manager_id) REFERENCES employees(id)
);sql
CREATE TABLE employees (
id INT PRIMARY KEY,
name VARCHAR(100) NOT NULL,
manager_id INT,
FOREIGN KEY (manager_id) REFERENCES employees(id)
);Polymorphic Relationships (Comments on Posts/Photos)
多态关联(帖子/照片的评论)
sql
-- Approach 1: Separate foreign keys with CHECK constraint
CREATE TABLE comments (
id INT PRIMARY KEY,
content TEXT NOT NULL,
post_id INT,
photo_id INT,
CHECK (
(post_id IS NOT NULL AND photo_id IS NULL) OR
(post_id IS NULL AND photo_id IS NOT NULL)
),
FOREIGN KEY (post_id) REFERENCES posts(id) ON DELETE CASCADE,
FOREIGN KEY (photo_id) REFERENCES photos(id) ON DELETE CASCADE
);
-- Approach 2: commentable_type + commentable_id (Rails-style)
CREATE TABLE comments (
id INT PRIMARY KEY,
content TEXT NOT NULL,
commentable_type VARCHAR(50) NOT NULL, -- 'Post' or 'Photo'
commentable_id INT NOT NULL
);
-- Note: No foreign key constraint possible (less data integrity)sql
-- 方案1:使用单独外键加CHECK约束
CREATE TABLE comments (
id INT PRIMARY KEY,
content TEXT NOT NULL,
post_id INT,
photo_id INT,
CHECK (
(post_id IS NOT NULL AND photo_id IS NULL) OR
(post_id IS NULL AND photo_id IS NOT NULL)
),
FOREIGN KEY (post_id) REFERENCES posts(id) ON DELETE CASCADE,
FOREIGN KEY (photo_id) REFERENCES photos(id) ON DELETE CASCADE
);
-- 方案2:commentable_type + commentable_id(Rails风格)
CREATE TABLE comments (
id INT PRIMARY KEY,
content TEXT NOT NULL,
commentable_type VARCHAR(50) NOT NULL, -- 'Post' 或 'Photo'
commentable_id INT NOT NULL
);
-- 注意:无法创建外键约束(数据完整性较弱)NoSQL Database Design
NoSQL数据库设计
Document Databases (MongoDB)
文档型数据库(MongoDB)
When to use:
- Schema flexibility needed
- Rapid iteration
- Hierarchical data
- Read-heavy workloads
何时使用:
- 需要Schema灵活性
- 快速迭代
- 层级化数据
- 读密集型工作负载
Embedding vs Referencing
嵌入 vs 引用
Embedding (Denormalization)
json
{
"_id": "order_123",
"customer": {
"id": "cust_456",
"name": "Jane Smith",
"email": "jane@example.com"
},
"items": [
{ "product_id": "prod_789", "quantity": 2, "price": 29.99 },
{ "product_id": "prod_101", "quantity": 1, "price": 49.99 }
],
"total": 109.97,
"created_at": "2025-10-31T10:30:00Z"
}When to embed:
- Data accessed together frequently
- 1:few relationships (few items)
- Child documents don't need independent existence
Referencing (Normalization)
json
{
"_id": "order_123",
"customer_id": "cust_456",
"item_ids": ["item_1", "item_2"],
"total": 109.97,
"created_at": "2025-10-31T10:30:00Z"
}When to reference:
- Data accessed independently
- 1:many relationships (many items)
- Large documents (approaching 16MB limit)
- Frequently updated data
嵌入(反规范化)
json
{
"_id": "order_123",
"customer": {
"id": "cust_456",
"name": "Jane Smith",
"email": "jane@example.com"
},
"items": [
{ "product_id": "prod_789", "quantity": 2, "price": 29.99 },
{ "product_id": "prod_101", "quantity": 1, "price": 49.99 }
],
"total": 109.97,
"created_at": "2025-10-31T10:30:00Z"
}何时嵌入:
- 数据经常被一起访问
- 一对少的关系(少量子项)
- 子文档无需独立存在
引用(规范化)
json
{
"_id": "order_123",
"customer_id": "cust_456",
"item_ids": ["item_1", "item_2"],
"total": 109.97,
"created_at": "2025-10-31T10:30:00Z"
}何时引用:
- 数据需独立访问
- 一对多的关系(大量子项)
- 文档过大(接近16MB限制)
- 数据频繁更新
Indexing in MongoDB
MongoDB中的索引
javascript
// Create index
db.users.createIndex({ email: 1 }, { unique: true });
// Composite index
db.orders.createIndex({ customer_id: 1, created_at: -1 });
// Text index for search
db.articles.createIndex({ title: "text", content: "text" });
// Geospatial index
db.stores.createIndex({ location: "2dsphere" });javascript
// 创建索引
db.users.createIndex({ email: 1 }, { unique: true });
// 复合索引
db.orders.createIndex({ customer_id: 1, created_at: -1 });
// 文本索引用于搜索
db.articles.createIndex({ title: "text", content: "text" });
// 地理空间索引
db.stores.createIndex({ location: "2dsphere" });Database Migrations
数据库迁移
Migration Best Practices
迁移最佳实践
1. Always Reversible
sql
-- Up migration
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
-- Down migration
ALTER TABLE users DROP COLUMN phone;2. Backward Compatible
sql
-- ✅ Good: Add nullable column
ALTER TABLE users ADD COLUMN middle_name VARCHAR(50);
-- ❌ Bad: Add required column (breaks existing code)
ALTER TABLE users ADD COLUMN middle_name VARCHAR(50) NOT NULL;
-- ✅ Better: Add nullable, then populate, then make required
-- Migration 1: Add column
ALTER TABLE users ADD COLUMN middle_name VARCHAR(50);
-- Migration 2: Populate with default
UPDATE users SET middle_name = '' WHERE middle_name IS NULL;
-- Migration 3: Make required
ALTER TABLE users MODIFY COLUMN middle_name VARCHAR(50) NOT NULL;3. Data Migrations Separate from Schema Changes
sql
-- Migration 1: Schema change
ALTER TABLE orders ADD COLUMN status VARCHAR(20) DEFAULT 'pending';
-- Migration 2: Data migration
UPDATE orders SET status = 'completed' WHERE completed_at IS NOT NULL;4. Test Migrations on Production Copy
- Test on staging with production data snapshot
- Measure migration duration
- Plan for downtime (if needed)
1. 始终支持回滚
sql
-- 正向迁移
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
-- 回滚迁移
ALTER TABLE users DROP COLUMN phone;2. 保持向后兼容
sql
-- ✅ 推荐:添加可为空的列
ALTER TABLE users ADD COLUMN middle_name VARCHAR(50);
-- ❌ 不推荐:添加必填列(会破坏现有代码)
ALTER TABLE users ADD COLUMN middle_name VARCHAR(50) NOT NULL;
-- ✅ 更好的方式:先添加可为空列,再填充数据,最后设为必填
-- 迁移1:添加列
ALTER TABLE users ADD COLUMN middle_name VARCHAR(50);
-- 迁移2:填充默认值
UPDATE users SET middle_name = '' WHERE middle_name IS NULL;
-- 迁移3:设为必填
ALTER TABLE users MODIFY COLUMN middle_name VARCHAR(50) NOT NULL;3. 数据迁移与Schema变更分离
sql
-- 迁移1:Schema变更
ALTER TABLE orders ADD COLUMN status VARCHAR(20) DEFAULT 'pending';
-- 迁移2:数据迁移
UPDATE orders SET status = 'completed' WHERE completed_at IS NOT NULL;4. 在生产环境副本上测试迁移
- 使用生产数据快照在预发布环境测试
- 测量迁移耗时
- 规划停机时间(若需要)
Zero-Downtime Migrations
零停机迁移
Adding a Column:
sql
-- Step 1: Add nullable column
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
-- Step 2: Deploy code that writes to new column
-- (Application now writes to both old and new column)
-- Step 3: Backfill existing rows
UPDATE users SET phone = old_phone WHERE phone IS NULL;
-- Step 4: Make column required (if needed)
ALTER TABLE users MODIFY COLUMN phone VARCHAR(20) NOT NULL;Renaming a Column:
sql
-- Step 1: Add new column
ALTER TABLE users ADD COLUMN email_address VARCHAR(255);
-- Step 2: Copy data
UPDATE users SET email_address = email;
-- Step 3: Deploy code that reads from new column
-- Step 4: Deploy code that writes to new column
-- Step 5: Drop old column
ALTER TABLE users DROP COLUMN email;添加列:
sql
-- 步骤1:添加可为空的列
ALTER TABLE users ADD COLUMN phone VARCHAR(20);
-- 步骤2:部署写入新列的代码
-- (应用现在同时写入新旧列)
-- 步骤3:回填现有行数据
UPDATE users SET phone = old_phone WHERE phone IS NULL;
-- 步骤4:(若需要)将列设为必填
ALTER TABLE users MODIFY COLUMN phone VARCHAR(20) NOT NULL;重命名列:
sql
-- 步骤1:添加新列
ALTER TABLE users ADD COLUMN email_address VARCHAR(255);
-- 步骤2:复制数据
UPDATE users SET email_address = email;
-- 步骤3:部署读取新列的代码
-- 步骤4:部署写入新列的代码
-- 步骤5:删除旧列
ALTER TABLE users DROP COLUMN email;Performance Optimization
性能优化
Query Optimization
查询优化
Use EXPLAIN to analyze queries:
sql
EXPLAIN SELECT * FROM orders WHERE customer_id = 123 AND status = 'pending';Look for:
- Type: ALL (table scan - bad), index, ref, eq_ref
- Possible keys: Indexes available
- Key: Index actually used
- Rows: Estimated rows scanned
Optimization techniques:
- Add indexes on WHERE, ORDER BY, GROUP BY columns
- Avoid SELECT * (fetch only needed columns)
- Use LIMIT for pagination
- Denormalize for read-heavy queries
使用EXPLAIN分析查询:
sql
EXPLAIN SELECT * FROM orders WHERE customer_id = 123 AND status = 'pending';关注要点:
- Type:ALL(全表扫描 - 糟糕)、index、ref、eq_ref
- Possible keys:可用的索引
- Key:实际使用的索引
- Rows:预估扫描行数
优化技巧:
- 为WHERE、ORDER BY、GROUP BY列添加索引
- 避免SELECT *(仅获取所需列)
- 分页使用LIMIT
- 对读密集型查询做反规范化
N+1 Query Problem
N+1查询问题
python
undefinedpython
undefined❌ Bad: N+1 queries (1 query for orders + N queries for customers)
❌ 糟糕:N+1查询(1次订单查询 + N次客户查询)
orders = db.query("SELECT * FROM orders")
for order in orders:
customer = db.query(f"SELECT * FROM customers WHERE id = {order.customer_id}")
print(f"{customer.name} ordered {order.total}")
orders = db.query("SELECT * FROM orders")
for order in orders:
customer = db.query(f"SELECT * FROM customers WHERE id = {order.customer_id}")
print(f"{customer.name} ordered {order.total}")
✅ Good: Single query with JOIN
✅ 推荐:使用JOIN的单查询
results = db.query("""
SELECT orders.*, customers.name
FROM orders
JOIN customers ON orders.customer_id = customers.id
""")
for result in results:
print(f"{result.name} ordered {result.total}")
---results = db.query("""
SELECT orders.*, customers.name
FROM orders
JOIN customers ON orders.customer_id = customers.id
""")
for result in results:
print(f"{result.name} ordered {result.total}")
---Integration with Agents
与Agent的集成
Backend System Architect
后端系统架构师
- Uses this skill when designing data models
- Applies normalization and indexing strategies
- Plans for scalability and performance
- 设计数据模型时使用此技能
- 应用规范化和索引策略
- 规划可扩展性与性能
Code Quality Reviewer
代码质量审核员
- Validates schema design follows best practices
- Checks for missing indexes and constraints
- Reviews migration safety
- 验证Schema设计是否遵循最佳实践
- 检查缺失的索引和约束
- 审核迁移的安全性
AI/ML Engineer
AI/ML工程师
- Uses denormalization patterns for analytics
- Designs data pipelines and aggregation tables
- 为分析场景使用反规范化模式
- 设计数据管道和聚合表
Quick Start Checklist
快速入门检查清单
When designing a new schema:
- Identify entities and relationships
- Choose SQL or NoSQL based on requirements
- Normalize to 3NF (SQL) or decide embed/reference (NoSQL)
- Define primary keys (INT auto-increment or UUID)
- Add foreign key constraints
- Choose appropriate data types
- Add unique constraints where needed
- Plan indexing strategy (foreign keys, WHERE columns)
- Add NOT NULL constraints for required fields
- Create CHECK constraints for validation
- Plan for soft deletes (deleted_at column) if needed
- Add timestamps (created_at, updated_at)
- Design migration scripts (up and down)
- Test migrations on staging
Skill Version: 1.0.0
Last Updated: 2025-10-31
Maintained by: AI Agent Hub Team
设计新Schema时:
- 识别实体与关系
- 根据需求选择SQL或NoSQL
- 规范化至3NF(SQL)或决定嵌入/引用策略(NoSQL)
- 定义主键(自增INT或UUID)
- 添加外键约束
- 选择合适的数据类型
- 在需要的地方添加唯一约束
- 规划索引策略(外键、WHERE列)
- 为必填字段添加非空约束
- 添加检查约束用于校验
- (若需要)规划软删除(deleted_at列)
- 添加时间戳(created_at、updated_at)
- 设计迁移脚本(正向和回滚)
- 在预发布环境测试迁移
技能版本:1.0.0
最后更新:2025-10-31
维护团队:AI Agent Hub Team