Database Migration Patterns
Schema Evolution Strategies
| Strategy | Risk | Downtime | Best For |
|---|
| Additive-Only | Very Low | None | APIs with backward-compatibility guarantees |
| Expand-Contract |
Low | None | Renaming, restructuring, type changes |
|
Parallel Change | Low | None | High-risk changes on critical tables |
|
Lazy Migration | Medium | None | Large tables where bulk migration is too slow |
|
Big Bang | High | Yes | Dev/staging or small datasets only |
Default to Additive-Only. Escalate to Expand-Contract only when you must modify or remove existing structures.
Zero-Downtime Patterns
Every production migration must avoid locking tables or breaking running application code.
| Operation | Pattern | Key Constraint |
|---|
| Add column | Nullable first | Never add NOT NULL without default on large tables |
| Rename column |
Expand-contract | Add new → dual-write → backfill → switch reads → drop old |
|
Drop column | Deprecate first | Stop reading → stop writing → deploy → drop |
|
Change type | Parallel column | Add new type → dual-write + cast → switch → drop old |
|
Add index | Concurrent |
CREATE INDEX CONCURRENTLY — don't wrap in transaction |
|
Split table | Extract + FK | Create new → backfill → add FK → update queries → drop old columns |
|
Change constraint | Two-phase | Add
NOT VALID →
VALIDATE CONSTRAINT separately |
|
Add enum value | Append only | Never remove or rename existing values |
Migration Tools
| Tool | Ecosystem | Style | Key Strength |
|---|
| Prisma Migrate | TypeScript/Node | Declarative (schema diff) | ORM integration, shadow DB |
| Knex |
JavaScript/Node | Imperative (up/down) | Lightweight, flexible |
|
Drizzle Kit | TypeScript/Node | Declarative (schema diff) | Type-safe, SQL-like |
|
Alembic | Python | Imperative (upgrade/downgrade) | Granular control, autogenerate |
|
Django Migrations | Python/Django | Declarative (model diff) | Auto-detection |
|
Flyway | JVM / CLI | SQL file versioning | Simple, wide DB support |
|
golang-migrate | Go / CLI | SQL (up/down files) | Minimal, embeddable |
|
Atlas | Go / CLI | Declarative (HCL/SQL diff) | Schema-as-code, linting, CI |
Match the tool to your ORM and deployment pipeline. Prefer declarative for simple schemas, imperative for fine-grained data manipulation.
Rollback Strategies
| Approach | When to Use |
|---|
| Reversible (up + down) | Schema-only changes, early-stage products |
| Forward-only (corrective migration) |
Data-destructive changes, production at scale |
|
Hybrid | Reversible for schema, forward-only for data |
Data Preservation
- 1. Soft-delete columns — rename with
_deprecated suffix instead of dropping - Snapshot tables — INLINECODE5
- Point-in-time recovery — ensure WAL archiving covers migration windows
- Logical backups —
pg_dump of affected tables before migration
Blue-Green Database
CODEBLOCK0
Data Migration Patterns
Backfill Strategies
| Strategy | Best For |
|---|
| Inline backfill | Small tables (< 100K rows) |
| Batched backfill |
Medium tables (100K–10M rows) |
|
Background job | Large tables (10M+ rows) |
|
Lazy backfill | When immediate consistency not required |
Batch Processing
CODEBLOCK1
Dual-Write Period
For expand-contract and parallel change:
- 1. Dual-write — application writes to both old and new columns/tables
- Backfill — fill new structure with historical data
- Verify — assert consistency (row counts, checksums)
- Cut over — switch reads to new, stop writing to old
- Cleanup — drop old structure after cool-down period
Testing Migrations
Test Against Production-Like Data
- - Never test against empty or synthetic data only
- Use anonymized production snapshots
- Match data volume — a migration working on 1K rows may lock on 10M
- Reproduce edge cases: NULLs, empty strings, max-length, unicode
Migration CI Pipeline
CODEBLOCK2
Every migration PR must pass: up → down → up → tests.
Migration Checklist
Pre-Migration
- - [ ] Tested against production-like data volume
- [ ] Rollback written and tested
- [ ] Backup of affected tables created
- [ ] App code compatible with both old and new schema
- [ ] Execution time benchmarked on staging
- [ ] Lock impact analyzed
- [ ] Replication lag monitoring in place
During Migration
- - [ ] Monitor lock waits and active queries
- [ ] Monitor replication lag
- [ ] Watch for error rate spikes
- [ ] Keep rollback command ready
Post-Migration
- - [ ] Schema matches expected state
- [ ] Integration tests pass against migrated DB
- [ ] Data integrity validated (row counts, checksums)
- [ ] ORM schema / type definitions updated
- [ ] Deprecated structures cleaned up after cool-down
- [ ] Migration documented in team runbook
NEVER Do
- 1. NEVER run untested migrations directly in production
- NEVER drop a column without first removing all application references and deploying
- NEVER add
NOT NULL to a large table without a default value in a single statement - NEVER mix schema DDL and data mutations in the same migration file
- NEVER skip the dual-write phase when renaming columns in a live system
- NEVER assume migrations are instantaneous — always benchmark on production-scale data
- NEVER disable foreign key checks to "speed up" migrations in production
- NEVER deploy application code that depends on a schema change before the migration has completed
数据库迁移模式
模式演进策略
| 策略 | 风险 | 停机时间 | 最佳适用场景 |
|---|
| 仅新增 | 极低 | 无 | 需要向后兼容保证的API |
| 扩展-收缩 |
低 | 无 | 重命名、重构、类型变更 |
|
并行变更 | 低 | 无 | 关键表上的高风险变更 |
|
惰性迁移 | 中 | 无 | 批量迁移速度过慢的大表 |
|
大爆炸 | 高 | 是 | 仅限开发/测试环境或小数据集 |
默认使用仅新增策略。 仅在必须修改或移除现有结构时,才升级到扩展-收缩策略。
零停机模式
每个生产环境的迁移都必须避免锁表或破坏正在运行的应用程序代码。
| 操作 | 模式 | 关键约束 |
|---|
| 添加列 | 先设为可空 | 切勿在大表上添加没有默认值的 NOT NULL 约束 |
| 重命名列 |
扩展-收缩 | 新增列 → 双写 → 回填 → 切换读取 → 删除旧列 |
|
删除列 | 先弃用 | 停止读取 → 停止写入 → 部署 → 删除 |
|
变更类型 | 并行列 | 添加新类型 → 双写 + 类型转换 → 切换 → 删除旧类型 |
|
添加索引 | 并发 | CREATE INDEX CONCURRENTLY — 不要包裹在事务中 |
|
拆分表 | 提取 + 外键 | 创建新表 → 回填 → 添加外键 → 更新查询 → 删除旧列 |
|
变更约束 | 两阶段 | 先添加 NOT VALID → 再单独执行 VALIDATE CONSTRAINT |
|
添加枚举值 | 仅追加 | 切勿删除或重命名现有值 |
迁移工具
| 工具 | 生态系统 | 风格 | 核心优势 |
|---|
| Prisma Migrate | TypeScript/Node | 声明式(模式差异) | ORM集成,影子数据库 |
| Knex |
JavaScript/Node | 命令式(上/下) | 轻量级,灵活 |
|
Drizzle Kit | TypeScript/Node | 声明式(模式差异) | 类型安全,类SQL |
|
Alembic | Python | 命令式(升级/降级) | 精细控制,自动生成 |
|
Django Migrations | Python/Django | 声明式(模型差异) | 自动检测 |
|
Flyway | JVM / CLI | SQL文件版本控制 | 简单,广泛数据库支持 |
|
golang-migrate | Go / CLI | SQL(上/下文件) | 极简,可嵌入 |
|
Atlas | Go / CLI | 声明式(HCL/SQL差异) | 模式即代码,代码检查,CI集成 |
根据你的ORM和部署流水线选择合适的工具。简单模式优先选择声明式,精细数据操作优先选择命令式。
回滚策略
| 方法 | 何时使用 |
|---|
| 可逆(上 + 下) | 仅模式变更,早期阶段产品 |
| 仅向前(修正性迁移) |
数据破坏性变更,大规模生产环境 |
|
混合 | 模式变更可逆,数据变更仅向前 |
数据保留
- 1. 软删除列 — 使用 deprecated 后缀重命名,而非直接删除
- 快照表 — CREATE TABLE backup<表名><日期> AS SELECT * FROM <表名>
- 时间点恢复 — 确保WAL归档覆盖迁移窗口期
- 逻辑备份 — 迁移前对受影响的表执行 pg_dump
蓝绿数据库
- 1. 复制主库 → 从库(绿色)
- 对绿色环境应用迁移
- 对绿色环境运行验证套件
- 将流量切换到绿色环境
- 保留蓝色环境作为回滚目标(N小时)
- 在确认窗口期过后停用蓝色环境
数据迁移模式
回填策略
| 策略 | 最佳适用场景 |
|---|
| 内联回填 | 小表(少于10万行) |
| 批量回填 |
中等表(10万–1000万行) |
|
后台任务 | 大表(超过1000万行) |
|
惰性回填 | 不需要立即一致性的场景 |
批量处理
sql
DO $$
DECLARE
batch_size INT := 1000;
rows_updated INT;
BEGIN
LOOP
UPDATE my_table
SET newcol = computevalue(old_col)
WHERE id IN (
SELECT id FROM my_table
WHERE new_col IS NULL
LIMIT batch_size
FOR UPDATE SKIP LOCKED
);
GET DIAGNOSTICS rowsupdated = ROWCOUNT;
EXIT WHEN rows_updated = 0;
PERFORM pg_sleep(0.1); -- 节流以减少锁压力
COMMIT;
END LOOP;
END $$;
双写期
适用于扩展-收缩和并行变更:
- 1. 双写 — 应用程序同时写入旧列/表和新列/表
- 回填 — 用历史数据填充新结构
- 验证 — 断言一致性(行数、校验和)
- 切换 — 将读取切换到新结构,停止写入旧结构
- 清理 — 在冷却期后删除旧结构
测试迁移
使用类生产数据测试
- - 切勿仅使用空数据或合成数据进行测试
- 使用脱敏的生产快照
- 匹配数据量 — 在1000行上能正常工作的迁移可能在1000万行上导致锁表
- 复现边界情况:NULL值、空字符串、最大长度、Unicode
迁移CI流水线
yaml
steps:
- run: docker compose up -d db
- run: npm run migrate:up # 应用所有迁移
- run: npm run migrate:down # 回滚所有迁移
- run: npm run migrate:up # 重新应用(幂等性)
- run: npm run test:integration # 验证应用程序
- run: npm run migrate:status # 无待处理迁移
每个迁移PR必须通过:上 → 下 → 上 → 测试。
迁移检查清单
迁移前
- - [ ] 已使用类生产数据量进行测试
- [ ] 已编写并测试回滚方案
- [ ] 已创建受影响表的备份
- [ ] 应用程序代码兼容新旧两种模式
- [ ] 已在预发布环境基准测试执行时间
- [ ] 已分析锁影响
- [ ] 已部署复制延迟监控
迁移期间
- - [ ] 监控锁等待和活跃查询
- [ ] 监控复制延迟
- [ ] 关注错误率峰值
- [ ] 准备好回滚命令
迁移后
- - [ ] 模式与预期状态一致
- [ ] 集成测试在迁移后的数据库上通过
- [ ] 数据完整性已验证(行数、校验和)
- [ ] ORM模式/类型定义已更新
- [ ] 冷却期后已清理弃用结构
- [ ] 迁移已在团队运行手册中记录
绝对禁止
- 1. 绝对不要 未经测试直接在生产环境运行迁移
- 绝对不要 在未移除所有应用程序引用并部署前删除列
- 绝对不要 在单个语句中为没有默认值的大表添加 NOT NULL 约束
- 绝对不要 在同一个迁移文件中混合模式DDL和数据变更
- 绝对不要 在在线系统中重命名列时跳过双写阶段
- 绝对不要 假设迁移是瞬间完成的 — 始终使用生产规模数据进行基准测试
- 绝对不要 为加速生产环境迁移而禁用外键检查
- 绝对不要 在迁移完成前部署依赖模式变更的应用程序代码