Data Model

Analytics models succeed when grain is explicit, keys are stable, and slowly changing dimensions are chosen deliberately—not “star schema by default.”

When to Offer This Workflow

Trigger conditions:

- Designing a warehouse, lakehouse, or BI layer
Confusion on one row per what; duplicate counts in reports
Refactoring dimensional models for performance or clarity

Initial offer:

Use six stages: (1) business questions & grain, (2) conformed dimensions, (3) facts & measures, (4) dimensions & SCD types, (5) keys & integrity, (6) performance & evolution). Confirm tooling (dbt, dimensional DW, BigQuery, etc.).

Stage 1: Business Questions & Grain

Goal: Grain = the atomic row: e.g., “one line item per order per day” not “sort of per order.”

Practices

- List questions the model must answer; derive grain from smallest needed detail

Exit condition: One sentence grain per fact table.

Stage 2: Conformed Dimensions

Goal: Same customer/product definitions across facts—shared dimension tables or SCD policy aligned.

Stage 3: Facts & Measures

Goal: Additive vs semi-additive vs non-additive measures documented (balances, distinct counts).

Practices

- Degenerate dimensions vs junk dimensions—avoid wide fact sprawl without reason

Stage 4: Dimensions & SCD Types

Goal: SCD1 overwrite vs SCD2 history with valid_from/valid_to vs SCD3 limited history—match compliance and reporting needs.

Stage 5: Keys & Integrity

Goal: Surrogate keys in facts; natural keys preserved as attributes; referential integrity strategy in the warehouse layer.

Stage 6: Performance & Evolution

Goal: Partition and cluster keys for large facts; late-arriving facts policy; version dims when schema evolves.

Final Review Checklist

- [ ] Grain explicit per fact table
[ ] Conformed dimensions planned
[ ] Measure additivity documented
[ ] SCD strategy per critical dimension
[ ] Keys and late-arriving data handled

Tips for Effective Guidance

- Fan traps and chasm traps in BI—flag when joining across facts incorrectly.
Snapshot fact tables for point-in-time balances vs transaction facts.

Handling Deviations

- Event-only pipelines: still model curated dimensions for analysis, not only raw JSON.

数据模型

当粒度明确、键值稳定、缓慢变化维度经过审慎选择时，分析模型才能成功——而非默认使用星型模式。

何时提供此工作流

触发条件：

- 设计数据仓库、湖仓一体或BI层时
对每行代表什么存在困惑；报表中出现重复计数
为提升性能或清晰度而重构维度模型

初始建议：

使用六个阶段：(1)业务问题与粒度，(2)一致性维度，(3)事实与度量，(4)维度与SCD类型，(5)键值与完整性，(6)性能与演进。确认工具链（dbt、维度数据仓库、BigQuery等）。

阶段1：业务问题与粒度

目标： 粒度=原子行：例如每天每订单的每个行项目，而非大致按订单。

实践

- 列出模型必须回答的问题；从所需最小细节推导粒度

退出条件： 每个事实表一句粒度描述。

阶段2：一致性维度

目标： 跨事实表保持相同的客户/产品定义——共享维度表或SCD策略对齐。

阶段3：事实与度量

目标： 记录可加性 vs 半可加性 vs 不可加性度量（余额、去重计数）。

实践

- 退化维度 vs 垃圾维度——避免 无理由 扩展 宽事实表

阶段4：维度与SCD类型

目标： SCD1覆盖 vs SCD2历史记录（使用validfrom/validto） vs SCD3有限历史——匹配合规与报表需求。

阶段5：键值与完整性

目标： 事实表中使用代理键；自然键作为属性保留；数据仓库层制定参照完整性策略。

阶段6：性能与演进

目标： 为大事实表设置分区和聚簇键；制定延迟到达事实策略；模式演进时版本化 维度。

最终审查清单

- [ ] 每个事实表粒度明确
[ ] 计划好一致性维度
[ ] 记录度量可加性
[ ] 每个关键维度制定SCD策略
[ ] 处理键值与延迟到达数据

有效指导技巧

- BI中的扇出陷阱和沟壑陷阱——在跨事实表错误连接时标记。
快照事实表用于时间点余额 vs 事务事实表。

处理偏差情况

- 仅事件管道：仍需为分析建模精选维度，而非仅原始JSON。

data-model数据模型

data-model

Data Model

When to Offer This Workflow

Stage 1: Business Questions & Grain

Practices

Stage 2: Conformed Dimensions

Stage 3: Facts & Measures

Practices

Stage 4: Dimensions & SCD Types

Stage 5: Keys & Integrity

Stage 6: Performance & Evolution

Final Review Checklist

Tips for Effective Guidance

Handling Deviations

数据模型

何时提供此工作流

阶段1：业务问题与粒度

实践

阶段2：一致性维度

阶段3：事实与度量

实践

阶段4：维度与SCD类型

阶段5：键值与完整性

阶段6：性能与演进

最终审查清单

有效指导技巧

处理偏差情况

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载

data-model数据模型

data-model

Data Model

When to Offer This Workflow

Stage 1: Business Questions & Grain

Practices

Stage 2: Conformed Dimensions

Stage 3: Facts & Measures

Practices

Stage 4: Dimensions & SCD Types

Stage 5: Keys & Integrity

Stage 6: Performance & Evolution

Final Review Checklist

Tips for Effective Guidance

Handling Deviations

数据模型

何时提供此工作流

阶段1：业务问题与粒度

实践

阶段2：一致性维度

阶段3：事实与度量

实践

阶段4：维度与SCD类型

阶段5：键值与完整性

阶段6：性能与演进

最终审查清单

有效指导技巧

处理偏差情况

标签

通过对话安装

方式一：安装 SkillHub 和技能

方式二：设置 SkillHub 为优先技能安装源

通过命令行安装

下载

相关推荐

self-improvement

self-improvement

self-improvement

self-improvement