Data Move

Data migration fails in silent corruption, ordering bugs, and unclear cutover. Treat it as ETL with production risk: explicit mapping, checkpoints, and reconciliation against sources of truth.

When to Offer This Workflow

Trigger conditions:

- Moving data between databases, regions, or tenants
Large backfills after schema changes
Zero or minimal downtime requirements

Initial offer:

Use seven stages: (1) scope & invariants, (2) source/target mapping, (3) batching & idempotency, (4) validation rules, (5) execution strategy (big bang vs phased), (6) cutover & rollback, (7) reconciliation & sign-off). Confirm volume, downtime budget, and compliance (PII, audit).

Stage 1: Scope & Invariants

Goal: Define what moves, what must never diverge, and ordering dependencies (foreign keys, references).

Questions

1. Cutover moment: read-only window vs dual-write?
Immutable identifiers: preserve primary keys or remap with mapping tables?
Deletes: soft-delete vs hard-delete semantics in target

Exit condition: Written invariants (e.g., “every migrated row has legacy_id for traceability”).

Stage 2: Source/Target Mapping

Goal: Field-level mapping document; transforms (timezone, encoding, rounding); defaults for nulls.

Practices

- Surrogate keys generated deterministically or via mapping table
Document one-way vs bi-directional sync if any

Stage 3: Batching & Idempotency

Goal: Jobs restartable; same input yields same output (idempotent writes or upsert keys).

Practices

- Checkpoint by primary key or updated_at watermark
Throttle to protect source and target DB

Stage 4: Validation Rules

Goal: Row counts, checksums, sample joins, business invariants (sums, balances).

Practices

- Shadow compare: run parallel queries on old vs new for critical aggregates

Exit condition: Validation checklist signed before cutover.

Stage 5: Execution Strategy

Goal: Phased by tenant/region vs single window—risk vs complexity trade-off.

Patterns

- Dual-write then backfill then flip reads
Blue/green tables with rename swap

Stage 6: Cutover & Rollback

Goal: Runbook: who flips DNS/config, order of steps, rollback triggers (error rate, failed checks).

Practices

- Feature flags for read path to new store
Keep rollback script tested in staging

Stage 7: Reconciliation & Sign-off

Goal: Post-cutover 24–72h monitoring; reconciliation job scheduled; support playbook for edge cases.

Final Review Checklist

- [ ] Invariants and mapping documented
[ ] Idempotent batches with checkpoints
[ ] Validation and shadow checks passed
[ ] Cutover/rollback runbook tested
[ ] Reconciliation after go-live

Tips for Effective Guidance

- Never assume “batch job finished” = correct—prove with checks.
Clock skew and timezone bugs are classic—call them out in transforms.
Pair with db-migrate for schema timing vs data movement.

Handling Deviations

- Small one-off SQL: still document mapping and run counts before/after.

数据迁移

数据迁移常因静默损坏、排序错误和切换不清晰而失败。应将其视为带有生产风险的ETL：明确映射、检查点，并针对数据源进行对账。

何时提供此工作流

触发条件：

- 在数据库、区域或租户之间移动数据
架构变更后的大规模回填
零或最小停机时间要求

初始方案：

使用七个阶段：(1) 范围与不变性，(2) 源/目标映射，(3) 批处理与幂等性，(4) 验证规则，(5) 执行策略（大爆炸式 vs 分阶段），(6) 切换与回滚，(7) 对账与签收。确认数据量、停机预算和合规性（PII、审计）。

阶段1：范围与不变性

目标： 定义什么需要迁移、什么绝不能出现偏差，以及排序依赖关系（外键、引用）。

问题

1. 切换时刻：只读窗口 vs 双写？
不可变标识符：保留主键还是使用映射表重新映射？
删除：目标中的软删除 vs 硬删除语义

退出条件： 已编写的不变性规则（例如，每个迁移的行都有用于可追溯性的 legacy_id）。

阶段2：源/目标映射

目标： 字段级映射文档；转换（时区、编码、舍入）；空值的默认值。

实践

- 代理键通过确定性方式或映射表生成
如有任何单向与双向同步，需记录在案

阶段3：批处理与幂等性

目标： 作业可重启；相同输入产生相同输出（幂等写入或更新键）。

实践

- 按主键或updated_at水印设置检查点
限流以保护源和目标数据库

阶段4：验证规则

目标： 行数、校验和、抽样连接、业务不变性（总和、余额）。

实践

- 影子比较：对关键聚合值在旧系统与新系统上并行运行查询

退出条件： 切换前签署验证检查清单。

阶段5：执行策略

目标： 按租户/区域分阶段 vs 单窗口——风险与复杂性的权衡。

模式

- 双写，然后回填，最后切换读取
蓝绿表，通过重命名交换

阶段6：切换与回滚

目标： 操作手册：谁切换DNS/配置、步骤顺序、回滚触发条件（错误率、检查失败）。

实践

- 为新存储的读取路径设置功能开关
保持回滚脚本在预发布环境中经过测试

阶段7：对账与签收

目标： 切换后24-72小时监控；安排对账作业；为边缘情况准备支持手册。

最终审查清单

- [ ] 不变性规则和映射已记录
[ ] 带检查点的幂等批处理
[ ] 验证和影子检查已通过
[ ] 切换/回滚操作手册已测试
[ ] 上线后对账

有效指导技巧

- 切勿假设批处理作业完成=正确——用检查来证明。
时钟偏差和时区错误是经典问题——在转换中明确指出。
与db-migrate配合使用，协调架构时机与数据移动。

处理偏差

- 小型一次性SQL：仍需记录映射，并运行前后计数。

data-move数据迁移

data-move

Data Move

When to Offer This Workflow

Stage 1: Scope & Invariants

Questions

Stage 2: Source/Target Mapping

Practices

Stage 3: Batching & Idempotency

Practices

Stage 4: Validation Rules

Practices

Stage 5: Execution Strategy

Patterns

Stage 6: Cutover & Rollback

Practices

Stage 7: Reconciliation & Sign-off

Final Review Checklist

Tips for Effective Guidance

Handling Deviations

数据迁移

何时提供此工作流

阶段1：范围与不变性

问题

阶段2：源/目标映射

实践

阶段3：批处理与幂等性

实践

阶段4：验证规则

实践

阶段5：执行策略

模式

阶段6：切换与回滚

实践

阶段7：对账与签收

最终审查清单

有效指导技巧

处理偏差

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载

相关推荐

self-improvement

self-improvement

self-improvement

self-improvement