
# Database Migration

## Migration File Conventions

Every schema change must be captured in a versioned migration file. Follow these conventions:
  • Timestamped naming: Use `YYYYMMDDHHMMSS_descriptive_name` (e.g., `20240315143022_add_status_to_orders.sql`).
  • Sequential ordering: Migrations run in order. Never insert a migration before one that has already been applied.
  • One concern per file: Each migration handles a single logical change. Do not combine unrelated schema changes.
  • Descriptive names: The filename should describe the change, not the ticket number. Use verbs like `add`, `remove`, `create`, `drop`, `rename`, `alter`.
  • Idempotent when possible: Use `IF NOT EXISTS` / `IF EXISTS` guards to make re-runs safe.
```text
migrations/
  20240301100000_create_users_table.sql
  20240305120000_add_email_index_to_users.sql
  20240310090000_create_orders_table.sql
  20240315143022_add_status_to_orders.sql
```
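The timestamp convention is easy to automate. A small sketch of a filename generator; the helper name `newMigrationName` is illustrative, not part of any migration tool:

```javascript
// Hypothetical helper that builds a migration filename following the
// YYYYMMDDHHMMSS_descriptive_name convention described above.
function newMigrationName(description, date = new Date()) {
  const pad = n => String(n).padStart(2, '0');
  const stamp =
    date.getUTCFullYear() +
    pad(date.getUTCMonth() + 1) +
    pad(date.getUTCDate()) +
    pad(date.getUTCHours()) +
    pad(date.getUTCMinutes()) +
    pad(date.getUTCSeconds());
  // Lowercase, underscore-separated, verb-first description
  const slug = description.trim().toLowerCase().replace(/[^a-z0-9]+/g, '_');
  return `${stamp}_${slug}.sql`;
}

console.log(newMigrationName('add status to orders', new Date(Date.UTC(2024, 2, 15, 14, 30, 22))));
// → 20240315143022_add_status_to_orders.sql
```

Using UTC for the timestamp avoids ordering glitches when engineers in different time zones create migrations in the same window.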

## Forward and Rollback Pairs

Every migration must have a corresponding rollback. If you cannot write a safe rollback, document why and flag it for review.
```sql
-- Migration: 20240315143022_add_status_to_orders.sql

-- UP
ALTER TABLE orders ADD COLUMN status VARCHAR(50) DEFAULT 'pending';
CREATE INDEX idx_orders_status ON orders (status);

-- DOWN
DROP INDEX IF EXISTS idx_orders_status;
ALTER TABLE orders DROP COLUMN IF EXISTS status;
```

### Rollback Rules

| Scenario | Rollback Strategy |
| --- | --- |
| Add column | Drop column |
| Add index | Drop index |
| Create table | Drop table |
| Add constraint | Drop constraint |
| Rename column | Rename back |
| Drop column | Cannot auto-rollback; restore from backup or re-add the column in a new migration |
| Data backfill | Reverse backfill or accept the data state |

## Zero-Downtime Schema Changes

When your application cannot tolerate downtime during deploys, follow the expand-and-contract pattern. Never make breaking schema changes in a single step.

### Step-by-Step: Renaming a Column

Renaming a column directly (`ALTER TABLE ... RENAME COLUMN`) breaks running application instances that reference the old name. Instead:
  1. Deploy 1 — Expand: Add the new column alongside the old one.
     ```sql
     ALTER TABLE users ADD COLUMN full_name VARCHAR(255);
     ```
  2. Deploy 2 — Dual Write: Update application code to write to both columns.
     ```javascript
     // Write to both columns
     await db.query(
       'UPDATE users SET name = $1, full_name = $1 WHERE id = $2',
       [name, userId]
     );
     ```
  3. Deploy 3 — Backfill: Copy existing data from the old column to the new column.
     ```sql
     UPDATE users SET full_name = name WHERE full_name IS NULL;
     ```
  4. Deploy 4 — Cut Over: Update the application to read from and write to the new column only.
     ```javascript
     // Read from new column
     const user = await db.query('SELECT full_name FROM users WHERE id = $1', [userId]);
     ```
  5. Deploy 5 — Contract: Drop the old column after a safe observation period.
     ```sql
     ALTER TABLE users DROP COLUMN name;
     ```

### Step-by-Step: Adding a NOT NULL Constraint

Adding a `NOT NULL` constraint on an existing column with null values will fail or lock the table. Safe approach:
  1. Add a check constraint as `NOT VALID` (PostgreSQL) to avoid scanning existing rows:
     ```sql
     ALTER TABLE orders ADD CONSTRAINT orders_status_not_null
       CHECK (status IS NOT NULL) NOT VALID;
     ```
  2. Backfill any null values:
     ```sql
     UPDATE orders SET status = 'unknown' WHERE status IS NULL;
     ```
  3. Validate the constraint (acquires a lighter lock):
     ```sql
     ALTER TABLE orders VALIDATE CONSTRAINT orders_status_not_null;
     ```
  4. Optionally convert to a proper `NOT NULL` column constraint:
     ```sql
     ALTER TABLE orders ALTER COLUMN status SET NOT NULL;
     ALTER TABLE orders DROP CONSTRAINT orders_status_not_null;
     ```

## Dangerous Operations

| Operation | Risk | Safe Alternative |
| --- | --- | --- |
| `RENAME COLUMN` | Breaks queries referencing the old name | Expand-and-contract pattern (add new, migrate, drop old) |
| `CHANGE COLUMN TYPE` | May require a full table rewrite, long lock | Add a new column with the new type, backfill, swap |
| `DROP COLUMN` | Irreversible data loss | Verify no code references, back up data, then drop |
| `DROP TABLE` | Irreversible data and schema loss | Rename to `_deprecated_` first, drop after an observation period |
| `ADD NOT NULL` (to existing column) | Fails if nulls exist; full table scan | Add as nullable, backfill, then add the constraint |
| `ADD COLUMN WITH DEFAULT` (pre-PG11) | Full table rewrite on older PostgreSQL | Add a nullable column, set the default, backfill |
| `CREATE INDEX` | Blocks writes on the table (non-concurrent) | Use `CREATE INDEX CONCURRENTLY` (PostgreSQL) |
| `ADD FOREIGN KEY` | Validates all existing rows, long lock | Add as `NOT VALID`, then `VALIDATE` separately |
| `ALTER TABLE ... LOCK` | Exclusive lock blocks all access | Minimize lock duration, run during low traffic |

## Data Migration Strategies

### Backfill Scripts

For populating new columns with computed or default data:
```javascript
// Batch backfill to avoid locking and memory issues
async function backfillStatus(db, batchSize = 1000) {
  let totalUpdated = 0;
  let updated;

  do {
    const result = await db.query(`
      UPDATE orders
      SET status = 'pending'
      WHERE id IN (
        SELECT id FROM orders
        WHERE status IS NULL
        LIMIT $1
        FOR UPDATE SKIP LOCKED
      )
      RETURNING id
    `, [batchSize]);

    updated = result.rowCount;
    totalUpdated += updated;
    console.log(`Backfilled ${totalUpdated} rows...`);

    // Pause between batches to reduce load
    await new Promise(resolve => setTimeout(resolve, 100));
  } while (updated === batchSize);

  console.log(`Backfill complete. Total rows updated: ${totalUpdated}`);
}
```

### Dual-Write Pattern

When migrating data to a new structure or new system:
  1. Start writing to both old and new locations.
  2. Backfill historical data from old to new.
  3. Verify consistency between old and new.
  4. Switch reads to new location.
  5. Stop writing to old location.
  6. Decommission old structure after observation period.
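Steps 1 and 5 bracket the riskiest part, the dual-write window. A minimal sketch of a dual-write wrapper, with `oldStore` and `newStore` as stand-ins for the real locations:

```javascript
// Dual-write wrapper: every write goes to both locations so the new
// store stays current while historical data is backfilled.
// `oldStore` and `newStore` are placeholders for the actual systems.
class DualWriter {
  constructor(oldStore, newStore) {
    this.oldStore = oldStore;
    this.newStore = newStore;
  }

  async write(key, value) {
    // The old location remains the source of truth until cutover,
    // so write it first and fail loudly if it fails.
    await this.oldStore.set(key, value);
    try {
      await this.newStore.set(key, value);
    } catch (err) {
      // A failed new-store write must not break the request path;
      // log it and let the verification step (3) reconcile.
      console.error('dual-write to new store failed:', err.message);
    }
  }
}
```

Writing the old location first keeps it authoritative; discrepancies on the new side surface during the consistency check rather than failing user requests.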

### Shadow Tables

For large structural changes where you cannot modify the original table in place:
```sql
-- 1. Create the new table with desired schema
CREATE TABLE users_v2 (
  id BIGINT GENERATED ALWAYS AS IDENTITY PRIMARY KEY,
  full_name VARCHAR(255) NOT NULL,
  email VARCHAR(255) NOT NULL UNIQUE,
  created_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
);

-- 2. Copy data in batches
INSERT INTO users_v2 (full_name, email, created_at)
SELECT
  COALESCE(first_name || ' ' || last_name, first_name, 'Unknown'),
  email,
  created_at
FROM users
WHERE id BETWEEN 1 AND 10000;
-- Repeat for remaining batches

-- 3. Set up triggers or dual-write for new data during migration

-- 4. Swap tables atomically
ALTER TABLE users RENAME TO users_deprecated;
ALTER TABLE users_v2 RENAME TO users;

-- 5. Drop old table after observation period
-- DROP TABLE users_deprecated;
```

## Migration Testing

### Pre-Deployment Testing Checklist

  • Migration runs successfully on a fresh database.
  • Migration runs successfully on a copy of production data.
  • Rollback runs successfully and leaves database in prior state.
  • Migration completes within acceptable time on production-sized data.
  • No exclusive locks held for more than a few seconds.
  • Application code is compatible with both before and after schema states (for zero-downtime deploys).
  • Data integrity is preserved (row counts, checksums on critical columns).

### Testing Against Production-Like Data

Always test migrations against a database with realistic data volume:

```bash
# Restore a production backup to a test environment
pg_restore --dbname=test_db production_backup.dump

# Run the migration
psql -d test_db -f migrations/20240315143022_add_status_to_orders.sql

# Verify results
psql -d test_db -c "SELECT COUNT(*) FROM orders WHERE status IS NULL;"

# Time the migration
time psql -d test_db -f migrations/20240315143022_add_status_to_orders.sql
```

### Database Seeding for Development

Maintain seed files that populate development databases with realistic test data:
```javascript
// seeds/001_users.js
exports.seed = async function(knex) {
  await knex('users').del();
  await knex('users').insert([
    { id: 1, name: 'Alice Developer', email: 'alice@example.com', role: 'admin' },
    { id: 2, name: 'Bob Tester', email: 'bob@example.com', role: 'user' },
    { id: 3, name: 'Carol Manager', email: 'carol@example.com', role: 'manager' },
  ]);
};

// seeds/002_orders.js
exports.seed = async function(knex) {
  await knex('orders').del();
  await knex('orders').insert([
    { id: 1, user_id: 1, status: 'completed', total: 99.99 },
    { id: 2, user_id: 2, status: 'pending', total: 49.50 },
    { id: 3, user_id: 1, status: 'shipped', total: 150.00 },
  ]);
};
```

### Seed Data Principles

  • Seeds are idempotent: running them twice produces the same state.
  • Seeds use fixed IDs for referential integrity.
  • Seeds cover all enum values and edge cases.
  • Seeds never contain real user data or secrets.
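The idempotency principle in miniature: seeding keyed by fixed IDs means a second run overwrites rather than duplicates. A toy sketch against an in-memory table (with Knex on PostgreSQL, `.insert(rows).onConflict('id').merge()` gives the same effect without the delete-then-insert step):

```javascript
// Idempotent seed: upsert by fixed id, so any number of runs
// leaves the table in exactly the same state.
const seedRows = [
  { id: 1, name: 'Alice Developer', role: 'admin' },
  { id: 2, name: 'Bob Tester', role: 'user' },
];

function seedUsers(table /* Map of id -> row, standing in for the users table */) {
  for (const row of seedRows) table.set(row.id, row);
  return table;
}
```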

## ORM Migration Tools Comparison

| Tool | Language | Migration Format | Rollback | Auto-Generate | Key Feature |
| --- | --- | --- | --- | --- | --- |
| Prisma Migrate | JS/TS | SQL files from schema diff | Limited (reset-based) | Yes, from `schema.prisma` | Declarative schema, drift detection |
| Knex.js | JS/TS | JavaScript files | Manual `down()` function | No (manual) | Flexible, raw SQL support |
| Alembic | Python | Python files | Manual `downgrade()` | Yes, from SQLAlchemy models | Branching support, auto-detect |
| ActiveRecord | Ruby | Ruby DSL files | Automatic for reversible `change` methods | No (manual) | Reversible DSL methods |
| Diesel | Rust | SQL files | Manual `down.sql` | Yes, from schema diff | Compile-time schema verification |
| Flyway | Java/JVM | SQL or Java files | Paid feature (undo) | No (manual) | Convention-based, polyglot |
| golang-migrate | Go | SQL files | Manual `down.sql` | No (manual) | CLI-first, driver-agnostic |

## Locking Considerations

Database schema changes acquire locks that can block application queries. Understand the lock implications:

### PostgreSQL Lock Levels

| DDL Operation | Lock Acquired | Blocks Reads | Blocks Writes | Duration |
| --- | --- | --- | --- | --- |
| `CREATE INDEX` | `ShareLock` | No | Yes | Duration of index build |
| `CREATE INDEX CONCURRENTLY` | `ShareUpdateExclusiveLock` | No | No | Longer build, but non-blocking |
| `ADD COLUMN` (nullable, no default) | `AccessExclusiveLock` | Yes | Yes | Near-instant (metadata only) |
| `ADD COLUMN` (with default, PG11+) | `AccessExclusiveLock` | Yes | Yes | Near-instant (virtual default) |
| `DROP COLUMN` | `AccessExclusiveLock` | Yes | Yes | Near-instant (marks as dropped) |
| `ALTER COLUMN TYPE` | `AccessExclusiveLock` | Yes | Yes | Full table rewrite |
| `ADD CONSTRAINT` | `AccessExclusiveLock` | Yes | Yes | Full table scan to validate |
| `ADD CONSTRAINT ... NOT VALID` | `ShareRowExclusiveLock` | No | Partially | Near-instant |
| `VALIDATE CONSTRAINT` | `ShareUpdateExclusiveLock` | No | No | Scans table, non-blocking |

### Mitigating Lock Contention

```sql
-- Set a lock timeout to fail fast instead of waiting indefinitely
SET lock_timeout = '5s';

-- Retry the operation if it times out
-- Application code should handle this and retry with backoff

-- Cancel long-running queries that block migrations
-- (pg_cancel_backend cancels the current query; pg_terminate_backend kills the session)
SELECT pg_cancel_backend(pid)
FROM pg_stat_activity
WHERE state = 'active'
  AND query_start < NOW() - INTERVAL '5 minutes'
  AND query NOT LIKE '%pg_stat_activity%';
```
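The "retry with backoff" advice above can be sketched as a wrapper around the statement that may hit `lock_timeout`; `withLockRetry` and `runStatement` are illustrative names, not a library API:

```javascript
// Retry a DDL statement that may fail on `lock_timeout`, backing off
// exponentially between attempts. `runStatement` stands in for the
// real database call; a production version would also check that the
// error actually is a lock timeout before retrying.
async function withLockRetry(runStatement, { attempts = 5, baseDelayMs = 500 } = {}) {
  for (let attempt = 1; ; attempt++) {
    try {
      return await runStatement();
    } catch (err) {
      if (attempt >= attempts) throw err;
      const delay = baseDelayMs * 2 ** (attempt - 1);
      console.warn(`attempt ${attempt} failed (${err.message}), retrying in ${delay}ms`);
      await new Promise(resolve => setTimeout(resolve, delay));
    }
  }
}
```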

## Migration Deployment Checklist

Run through this checklist before applying any migration to production:

### Planning

  • Migration has been reviewed by another engineer.
  • Forward migration and rollback have been written and tested.
  • Migration has been tested against a production-sized dataset.
  • Estimated execution time is known and acceptable.
  • Lock impact has been analyzed (see locking table above).
  • Application code is compatible with schema before and after migration.

### Execution

  • Database backup taken immediately before migration.
  • Migration applied during low-traffic period (if it requires locks).
  • Migration output monitored for errors.
  • Application health verified after migration completes.
  • Row counts and data integrity spot-checked.

### Post-Migration

  • Old schema artifacts cleaned up (after observation period).
  • Rollback script archived but accessible.
  • Monitoring confirms no query performance regressions.
  • Migration documented in changelog or release notes.