Fossil Record
"Code tells you what a system does. History tells you what a system survived."
What It Does
git blame tells you WHO changed a line. git log tells you WHEN. Fossil Record tells you WHY, by analyzing patterns across the entire commit history to reconstruct the evolutionary pressures that shaped the codebase.
Every line of code is the result of a decision. Most decisions aren't documented. But they leave fossils: commit patterns, revert sequences, hotfix clusters, refactor waves, and the sediment of a hundred small choices that accumulated into the architecture you see today.
The Geological Model
Fossil Record treats your git history as a geological record, with distinct layers and eras:
| Geological Concept | Code Equivalent |
|---|---|
| Sediment Layers | Periods of steady development (feature commits) |
| Fault Lines | Major refactors, rewrites, or architecture changes |
| Impact Craters | Incident responses, emergency hotfixes, reverts |
| Fossil Beds | Code that hasn't changed in a long time (stable or forgotten?) |
| Erosion Patterns | Gradual drift from original design intent |
| Extinction Events | Deleted modules, abandoned features, removed dependencies |
| Adaptive Radiation | Rapid diversification after a major change (new abstraction spawning many implementations) |
The Eight Excavation Modes
1. Pressure Analysis
Question: What external forces shaped this code?
Analyzes commit message patterns, timing, and clustering to identify:
- Deadline pressure: Commits accelerating toward a date, then stopping
- Incident pressure: Hotfix → fix → fix-the-fix → revert → different-fix patterns
- Stakeholder pressure: Feature requests appearing as interruptive commit sequences
- Technical debt pressure: Refactors that are started, abandoned, restarted
CODEBLOCK0
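As a rough illustration of the deadline-pressure signal, the sketch below buckets commit dates by ISO week and flags weeks whose volume is several times the median week. It assumes the dates were exported with `git log --format=%aI`; the function names and the `factor` threshold are hypothetical choices made for this example, not Fossil Record's own interface.

```python
from collections import Counter
from datetime import datetime


def weekly_commit_counts(iso_dates):
    """Count commits per ISO (year, week) from ISO-8601 author dates."""
    counts = Counter()
    for text in iso_dates:
        year, week, _ = datetime.fromisoformat(text).isocalendar()
        counts[(year, week)] += 1
    return dict(sorted(counts.items()))


def spike_weeks(counts, factor=3.0):
    """Return weeks whose count is at least `factor` times the median week.

    `factor` is an illustrative threshold (an assumption), not a tuned value.
    """
    if not counts:
        return []
    values = sorted(counts.values())
    median = values[len(values) // 2]
    return [week for week, n in counts.items() if n >= factor * median]
```

A spike followed by a quiet period is only a hint of a deadline; commit messages and hotfix clusters in the following weeks would be needed to confirm it.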
2. Decision Reconstruction
Question: What decisions were made here, and what alternatives were considered?
Analyzes:
- Reverted commits (something was tried and rejected)
- Branches that were created but never merged (abandoned approaches)
- Comments that reference alternatives ("we could have used X but...")
- Sequential implementations of the same feature (iteration history)
CODEBLOCK1
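One of the signals above, reverted commits, can be approximated from commit messages alone. The sketch below relies on the default message that `git revert` writes ("This reverts commit <sha>."); reverts done by hand or with an edited message would be missed. `find_reverts` and its input shape are assumptions for illustration.

```python
import re

# Default trailer written by `git revert`; custom messages will not match.
REVERT_RE = re.compile(r"This reverts commit ([0-9a-f]{7,40})")


def find_reverts(commits):
    """Return (reverting_sha, reverted_sha) pairs.

    `commits` is an iterable of (sha, full_message) tuples, for example
    built from `git log --format=%H%n%B` (the parsing of that output is
    left out of this sketch).
    """
    pairs = []
    for sha, message in commits:
        for match in REVERT_RE.finditer(message):
            pairs.append((sha, match.group(1)))
    return pairs
```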
3. Hotspot Archaeology
Question: Why is this specific area of code so volatile?
Goes beyond "this file changes often" to ask "what kind of changes happen here and what drives them?"
CODEBLOCK2
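To make "what kind of changes happen here" concrete, a first approximation is to label each commit subject that touched a file by keyword and count the labels. The keyword lists and category names below are illustrative assumptions; substring matching is crude (for example, "add" also matches "address"), so treat the result as a starting point for reading the history, not a verdict.

```python
from collections import Counter

# First matching rule wins. Keywords are hypothetical and should be
# adapted to the commit-message conventions of the repository.
RULES = [
    ("dependency", ("bump", "upgrade", "dependency")),
    ("refactor", ("refactor", "cleanup", "rename")),
    ("bug fix", ("fix", "bug", "hotfix")),
    ("config", ("config", "threshold", "tune")),
    ("feature", ("add", "feature", "support")),
]


def classify(subject):
    """Assign one coarse change category to a commit subject line."""
    lowered = subject.lower()
    for label, keywords in RULES:
        if any(keyword in lowered for keyword in keywords):
            return label
    return "other"


def change_profile(subjects):
    """Count categories across all subjects that touched one file."""
    return Counter(classify(subject) for subject in subjects)
```

The subjects for a single file could come from `git log --format=%s -- <path>`.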
4. Extinction Mapping
Question: What used to be here, and why did it die?
Traces deleted code through git history to reconstruct what was removed and the conditions of its removal:
- Was it replaced? By what?
- Was it gradually abandoned or suddenly deleted?
- Did its removal cause any subsequent issues (fixes referencing the deleted module)?
- Is anything still alive that was designed to work with the extinct module?
CODEBLOCK3
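A minimal sketch of the first step, finding what was deleted and when, is shown below. It assumes the log was produced with `git log --diff-filter=D --name-only --format="@@%H %aI"`, where the `@@` prefix is an arbitrary marker chosen here so commit headers can be told apart from paths. The function names are hypothetical.

```python
def parse_deletions(log_text):
    """Map each deleted path to the (sha, date) of its most recent deletion.

    git log lists newest commits first, so the first time a path is seen
    is its latest removal.
    """
    deleted = {}
    current = None
    for line in log_text.splitlines():
        if line.startswith("@@"):
            sha, date = line[2:].split(" ", 1)
            current = (sha, date)
        elif line.strip() and current is not None:
            deleted.setdefault(line, current)
    return deleted


def extinct_paths(deleted, live_paths):
    """Drop paths that were later re-created and still exist today."""
    live = set(live_paths)
    return {path: info for path, info in deleted.items() if path not in live}
```

`live_paths` could be the output of `git ls-files`. Finding survivors that still reference the extinct module (orphaned tables, stale imports) would need a separate search of the current tree.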
5. Sediment Dating
Question: How old is this code really, and has it been maintained or just preserved?
For each module/file, determines:
- Birth date: When was it first created?
- Last meaningful change: Not just whitespace/formatting — actual behavior change
- Maintenance frequency: Is it regularly updated or untouched?
- Author diversity: Has only one person ever modified this? (bus factor = 1)
- Era classification: Which architectural era does this code belong to?
CODEBLOCK4
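The per-file facts listed above can be sketched as a single pass over commit records. The example assumes each record is a `(date, author, paths)` tuple with dates as `YYYY-MM-DD` strings (which sort correctly as text); how those records are extracted from git is left open. Note the simplification: every commit counts as a change, so formatting-only commits are not filtered out as "meaningful change" would require.

```python
def date_files(commits):
    """Compute birth date, last change, and author set for every path."""
    stats = {}
    for date, author, paths in commits:
        for path in paths:
            entry = stats.setdefault(
                path, {"born": date, "last": date, "authors": set()}
            )
            entry["born"] = min(entry["born"], date)
            entry["last"] = max(entry["last"], date)
            entry["authors"].add(author)
    return stats


def fossil_beds(stats, cutoff):
    """Paths whose last change is older than `cutoff` (YYYY-MM-DD)."""
    return sorted(path for path, s in stats.items() if s["last"] < cutoff)


def single_author_paths(stats):
    """Paths only one person has ever modified (bus factor = 1)."""
    return sorted(path for path, s in stats.items() if len(s["authors"]) == 1)
```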
6. Fault Line Detection
Question: Where are the tectonic boundaries in this codebase?
Identifies major architectural shifts by finding:
- Large-scale rename/move operations
- Dependency replacements (library A → library B)
- Directory restructuring
- Changes to build systems, frameworks, or deployment targets
CODEBLOCK5
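Large-scale renames are the easiest of these signals to detect mechanically. The sketch below counts rename entries per commit, assuming input from `git log -M --name-status --format="@@%H"` (rename lines start with `R` followed by a similarity score and tab-separated old and new paths). The `@@` marker, the function names, and the default threshold are assumptions for this example.

```python
def rename_counts(log_text):
    """Count rename entries (status R...) per commit sha."""
    counts = {}
    current = None
    for line in log_text.splitlines():
        if line.startswith("@@"):
            current = line[2:]
            counts[current] = 0
        elif current is not None and line.startswith("R") and "\t" in line:
            counts[current] += 1
    return counts


def fault_lines(counts, min_renames=50):
    """Commits that renamed or moved at least `min_renames` files.

    The threshold is an illustrative default; a sensible value depends
    on the size of the repository.
    """
    return [sha for sha, n in counts.items() if n >= min_renames]
```

A migration spread across many small commits would not cross the threshold, so grouping commits by week before counting may be a useful refinement.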
7. Author Topology
Question: How was knowledge distributed, and where are the gaps?
Maps which developers contributed to which areas, and identifies:
- Knowledge monopolies: Areas only one person has ever touched
- Knowledge transfers: When a new contributor takes over an area
- Knowledge voids: When all contributors to an area have left the project
- Collaboration patterns: Which areas have healthy multi-author contribution
CODEBLOCK6
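The monopoly and void categories above can be sketched with plain set operations. The example assumes a list of `(author, path)` pairs taken from the history and a set of currently active authors supplied by the caller, since git alone cannot tell who has left the project. Grouping by the first two path components is an arbitrary choice for illustration.

```python
from collections import defaultdict


def area_of(path, depth=2):
    """Reduce a file path to its area, e.g. src/billing/x.py -> src/billing."""
    return "/".join(path.split("/")[:depth])


def topology(touches, active_authors):
    """Label each area as 'void', 'monopoly', or 'shared'.

    void:     no author who touched the area is still active
    monopoly: exactly one author has ever touched the area
    shared:   several authors, at least one still active
    """
    authors_by_area = defaultdict(set)
    for author, path in touches:
        authors_by_area[area_of(path)].add(author)

    report = {}
    for area, authors in authors_by_area.items():
        if not authors & set(active_authors):
            report[area] = "void"
        elif len(authors) == 1:
            report[area] = "monopoly"
        else:
            report[area] = "shared"
    return report
```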
8. Evolution Trajectory
Question: Where is this codebase heading?
Extrapolates from historical patterns to predict:
- Which areas are actively evolving (increasing commit diversity and frequency)
- Which areas are calcifying (decreasing modifications, aging contributors)
- Which architectural patterns are expanding vs. contracting
- What the next likely "extinction event" or "fault line" might be
CODEBLOCK7
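One simple way to separate "evolving" from "calcifying" areas is to fit a straight line to an area's commit counts per period and look at the slope. This is a deliberately naive stand-in for whatever extrapolation Fossil Record performs; the function names, labels, and `tolerance` value are assumptions.

```python
def trend_slope(counts):
    """Least-squares slope of commit counts over equally spaced periods."""
    n = len(counts)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(counts) / n
    numerator = sum((x - mean_x) * (y - mean_y) for x, y in enumerate(counts))
    denominator = sum((x - mean_x) ** 2 for x in range(n))
    return numerator / denominator


def classify_trend(counts, tolerance=0.5):
    """Label an area by the direction of its commit trend.

    `tolerance` (commits per period) is an illustrative cutoff.
    """
    slope = trend_slope(counts)
    if slope > tolerance:
        return "evolving"
    if slope < -tolerance:
        return "calcifying"
    return "steady"
```

For example, `counts` could hold the number of commits touching `src/auth/` in each of the last eight quarters. A linear trend ignores seasonality and one-off migrations, so its output is a prompt for investigation, not a forecast.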
Integration
CODEBLOCK8
Output: The Geological Survey
CODEBLOCK9
Why It Matters
Code review looks at the present. Testing validates the expected. Fossil Record illuminates the past — because a codebase that doesn't understand its own history is condemned to repeat its own mistakes.
Zero external dependencies. Pure git analysis. No APIs, no cloud, no cost.
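Fossil Record
"Code tells you what a system does. History tells you what a system survived."
What It Does
git blame tells you WHO changed a line. git log tells you WHEN. Fossil Record tells you WHY, by analyzing patterns across the entire commit history to reconstruct the evolutionary pressures that shaped the codebase.
Every line of code is the result of a decision. Most decisions aren't documented. But they leave fossils: commit patterns, revert sequences, hotfix clusters, refactor waves, and the architecture that hundreds of small choices settled into.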
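The Geological Model
Fossil Record treats your Git history as a geological record, with distinct layers and eras:
| Geological Concept | Code Equivalent |
|---|---|
| Sediment Layers | Periods of steady development (feature commits) |
| Fault Lines | Major refactors, rewrites, or architecture changes |
| Impact Craters | Incident responses, emergency hotfixes, reverts |
| Fossil Beds | Code that hasn't changed in a long time (stable or forgotten?) |
| Erosion Patterns | Gradual drift from original design intent |
| Extinction Events | Deleted modules, abandoned features, removed dependencies |
| Adaptive Radiation | Rapid diversification after a major change (new abstraction spawning many implementations) |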
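The Eight Excavation Modes
1. Pressure Analysis
Question: What external forces shaped this code?
Analyzes commit message patterns, timing, and clustering to identify:
- Deadline pressure: Commits accelerating toward a date, then stopping
- Incident pressure: Hotfix → fix → fix-the-fix → revert → different-fix patterns
- Stakeholder pressure: Feature requests appearing as interruptive commit sequences
- Technical debt pressure: Refactors that are started, abandoned, restarted
Output: A timeline of external pressures and their impact on code quality.
Example: Between March 3 and March 17, commit velocity tripled and test coverage
dropped from 84% to 61%. Three hotfixes followed the next week.
This area of the code still bears the scars of that deadline.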
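2. Decision Reconstruction
Question: What decisions were made here, and what alternatives were considered?
Analyzes:
- Reverted commits (something was tried and rejected)
- Branches that were created but never merged (abandoned approaches)
- Comments that reference alternatives ("we could have used X but...")
- Sequential implementations of the same feature (iteration history)
Output: A decision tree showing what was tried, what was kept, and what was abandoned.
Example: Authentication was implemented 3 times:
v1 (session-based, commits a1b2..c3d4, reverted)
v2 (JWT, commits e5f6..g7h8, survived 4 months)
v3 (OAuth2, commits i9j0..k1l2, current)
Pressure: v1→v2 was driven by scaling problems. v2→v3 was driven by an SSO requirement.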
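3. Hotspot Archaeology
Question: Why is this specific area of code so volatile?
Goes beyond "this file changes often" to ask "what kind of changes happen here and what drives them?"
Change classification:
├── Bug fixes: the same function modified to fix different bugs (fragile design)
├── Feature accretion: functions growing as features are bolted on (missing abstraction)
├── Config churn: constants/thresholds adjusted repeatedly (unclear requirements)
├── Refactor oscillation: code refactored again and again (no design consensus)
└── Dependency churn: changes driven by upstream library updates (fragile coupling)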
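4. Extinction Mapping
Question: What used to be here, and why did it die?
Traces deleted code through Git history to reconstruct what was removed and the conditions of its removal:
- Was it replaced? By what?
- Was it gradually abandoned or suddenly deleted?
- Did its removal cause any subsequent issues (fixes referencing the deleted module)?
- Is anything still alive that was designed to work with the extinct module?
Output: An extinction timeline showing what disappeared, when, and what was left behind.
Example: The recommendations module was deleted in commit x1y2z3 (June 2024).
3 orphaned database tables still exist.
2 API routes still reference recommendation types in their schemas.
1 test file still imports the recommendation engine mock.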
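5. Sediment Dating
Question: How old is this code really, and has it been maintained or just preserved?
For each module/file, determines:
- Birth date: When was it first created?
- Last meaningful change: Not just whitespace/formatting, but an actual behavior change
- Maintenance frequency: Is it regularly updated or untouched?
- Author diversity: Has only one person ever modified this? (bus factor = 1)
- Era classification: Which architectural era does this code belong to?
Output: An age map of the codebase with era boundaries.
Example:
src/auth/ Born: 2023-01, Last modified: 2025-11, Era: Current (Gen 3)
src/utils/ Born: 2021-06, Last modified: 2022-03, Era: Founding (Gen 1)
src/payments/ Born: 2024-08, Last modified: 2024-08, Era: Growth (Gen 2)
⚠️ src/utils/ has had no meaningful change in 3 years. Fossil bed.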
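6. Fault Line Detection
Question: Where are the tectonic boundaries in this codebase?
Identifies major architectural shifts by finding:
- Large-scale rename/move operations
- Dependency replacements (library A → library B)
- Directory restructuring
- Changes to build systems, frameworks, or deployment targets
Output: A fault line map showing architectural eras and their boundaries.
Example: 3 major fault lines detected:
1. [2022-09] Monolith → microservices split (142 files moved)
2. [2023-06] REST → GraphQL migration (89 files modified)
3. [2024-03] JavaScript → TypeScript conversion (204 files renamed)
Warning: Fault line #2 is incomplete. 23 endpoints still use REST.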
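7. Author Topology
Question: How was knowledge distributed, and where are the gaps?
Maps which developers contributed to which areas, and identifies:
- Knowledge monopolies: Areas only one person has ever touched
- Knowledge transfers: When a new contributor takes over an area
- Knowledge voids: When all contributors to an area have left the project
- Collaboration patterns: Which areas have healthy multi-author contribution
Output: A knowledge topology map with risk assessment.
Example: src/billing/: all 247 commits were made by Developer X (last active: January 2024).
Developer X is no longer on the team.
No other contributor has ever modified this module.
Knowledge void. Recommendation: schedule dedicated onboarding for this module.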
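8. Evolution Trajectory
Question: Where is this codebase heading?
Extrapolates from historical patterns to predict:
- Which areas are actively evolving (increasing commit diversity and frequency)
- Which areas are calcifying (decreasing modifications, aging contributors)
- Which architectural patterns are expanding vs. contracting
- What the next likely "extinction event" or "fault line" might be
Output: A trajectory forecast based on historical momentum.
Example: The codebase is trending toward:
✓ Full TypeScript adoption (92% converted, about 2 months to completion)
✓ GraphQL as the primary API layer (78% migrated)
⚠ Growing divergence between the /api and /services naming conventions
⚠ Declining test coverage in modules older than 2 years (neglect pattern)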
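Integration
Invoke Fossil Record when:
├── Joining a new project → run a full geological survey
├── Before modifying old code → run Sediment Dating + Decision Reconstruction
├── After an incident → run Pressure Analysis on the affected area
├── During an architecture review → run Fault Line Detection + Evolution Trajectory
├── When someone asks "why?" → run Decision Reconstruction on that specific area
└── Onboarding a new developer → generate the full evolutionary narrative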
Output: The Geological Survey
╔══════════════════════════════════════════════════════════════╗
║  Fossil Record: Geological Survey                            ║
║  Repository: acme-platform                                   ║
║  History depth: 3 years, 4,721 commits                       ║
╠══════════════════════════════════════════════════════════════╣
║                                                              ║
║  Eras identified: 3                                          ║
║  ├── Founding (2022-01 → 2022-09): monolith, Express, JS     ║
║  ├── Growth (2022-09 → 2024-03): microservices, REST, JS/TS  ║
║  └── Current (2024-03 → present): microservices, GraphQL, TS ║
║                                                              ║
║  Fault lines: 3 major