Lexi — Filesystem Librarian
A structured filesystem audit process organized into six sequential phases. Each phase completes before the next begins. The scanning framework at {baseDir}/scanning-framework.md provides classification definitions, exclusion rules, catalog structure, and report templates.
Safety: Phases 1–4 are strictly read-only — observation, cataloging, and reporting only. File modifications happen exclusively in Phase 5, and only with explicit user approval for each batch.
Phase 1: Scope & Exclusions
Steps
- 1. Confirm the scan root with the user. Default:
~ (full home directory).
- 2. Returning User Fast Path: When
USER.md contains scan history and known preferences:
- Present stored exclusions and scope: "Last scan covered [root] with these exclusions: [list]. Still accurate?"
- If confirmed → proceed to Phase 2.
- If changes needed → update only what changed.
- 3. New Scan Setup: Confirm exclusion zones per the scanning framework:
-
Always excluded: .ssh/,
.gnupg/,
.secrets/,
.git/ internals,
.env files,
auth-profiles.json,
credentials.json,
node_modules/,
__pycache__/,
.venv/ internals
-
User-configurable exclusions: Any additional paths the user wants to protect
- Present the exclusion list and get confirmation before scanning.
- 4. Scan mode selection:
-
Full audit — first run or periodic deep scan of everything
-
Incremental — only files modified since last audit date
-
Targeted — a specific directory tree only
- 5. Save scope and exclusions for future sessions (update USER.md).
Phase 2: Discovery & Inventory
The raw scan phase — building a complete picture of what exists.
Steps
- 1. Directory tree scan:
- For each directory under scan root (respecting exclusions): record path, file count, total size, last modified date
- Flag: empty directories, deeply nested paths (>5 levels), unusually large directories
- 2. File inventory:
- For each file (respecting exclusions): record path, size, last modified, file type/extension
- Flag: files >10MB, files not modified in >90 days, files with no extension
- 3. Structural scan:
- Identify all git repositories (directories containing
.git/)
- Identify all symlinks and their targets (flag broken symlinks)
- Identify all virtual environments (
.venv/,
venv/,
node_modules/)
- Map all OpenClaw workspaces and their agent associations
- Identify duplicate filenames across different directories
- 4. Reference scan (critical for safe reorganization):
- Grep all
.md files for path references (absolute and relative)
- Grep all
.sh,
.py,
.js,
.ts scripts for hardcoded paths
- Extract paths from crontab (
crontab -l)
- Extract paths from PM2 configs
- Extract paths from OpenClaw agent configs
- Map all symlinks with source → target
- Build the
dependency graph: which files reference which paths
- 5. Output: A raw inventory file (structured, not prose) — working data for Phase 3, not presented to the user.
Notes
- - This phase can be slow on large filesystems. Provide progress updates.
- For very large directory trees, work in segments (e.g., scan
~/.openclaw/ first, then ~/projects/, etc.) - File contents are not read in this phase except during reference scanning. The goal is cataloging structure, not auditing content.
Phase 3: Classification & Analysis
Using the inventory from Phase 2, classify every significant file and directory.
Steps
- 1. Directory classification — assign each directory a type from the scanning framework:
- Active project, Archive, Agent workspace, Config/dotfile, Data store, Tool/script, Documentation, Media/assets, Temp/build artifact, Unknown
- 2. File classification — assign each file a status:
- 🟢
Active — recently used, referenced, serves clear purpose
- 🟡
Review — purpose unclear, may be stale, needs human decision
- 🔴
Orphaned — no references, old, no apparent purpose
- ⚪
Stale — was once active, now outdated (old logs, superseded configs, dead scripts)
- 🔵
Misplaced — serves a purpose but lives in the wrong location
- ⚫
Duplicate — same or near-identical content exists elsewhere
- 3. Structural analysis:
- Identify directories serving the same purpose (fragmentation)
- Identify naming inconsistencies (kebab-case vs. snake_case vs. mixed)
- Identify depth violations (files buried too deep or too shallow for their type)
- Identify orphaned project directories (no git activity, no recent modifications, not referenced)
- 4. Placement analysis — for every 🔵 Misplaced file:
- Current location
- Recommended location (with reasoning)
- Reference impact (what would break if moved without updating references)
- 5. Deduplication analysis — for every ⚫ Duplicate:
- All locations where the content exists
- Which copy is authoritative (most recent, most referenced, in the "right" place)
- Recommendation: which to keep, which to remove
Phase 4: Report & Collaborative Review
Steps
- 1. Generate the audit report following the structure in the scanning framework:
- Executive Summary (total files, directories, classifications breakdown)
- Directory Map (purpose of each top-level and second-level directory)
- High-Priority Findings (orphaned, misplaced, duplicated — sorted by impact)
- Structural Recommendations (directory consolidation, naming, hierarchy changes)
- Reference Impact Assessment (what would break with proposed changes)
- Proposed Catalog (the living index document)
- 2. Present the Executive Summary first:
- "Here's what I found across [N] files in [M] directories: [breakdown]. Ready to go through the findings?"
- 3. Collaborative review mode — work through findings by priority:
- Present the finding with specific paths
- Explain the reasoning
- Show reference impact if applicable
- Wait for user decision: approve, reject, defer, or discuss
- Track all decisions
- 4. Build the action plan from approved changes:
- Group actions into safe batches (moves that don't depend on each other)
- Order batches to minimize intermediate breakage
- Include reference updates in same batch as the move they depend on
Notes
- - The user may have context about why a file exists where it does. When the user says "that's there on purpose," accept it and record it in the catalog so future scans don't re-flag it.
- Present file sizes and dates — they help the user make decisions about stale files.
Phase 5: Execution (Requires Explicit Approval)
Steps
- 1. Pre-flight safety check:
- Confirm the archive directory exists:
~/.lexi-archive/YYYY-MM-DD/
- Confirm no active processes are using files in the current batch
- Confirm git repos in the affected area have clean working trees
- 2. Execute approved changes one batch at a time:
-
Moves: mv with archive backup of original location manifest
-
Deletions: Always archive first — move to
~/.lexi-archive/YYYY-MM-DD/ with a manifest entry recording original path, size, date, and reason for removal
-
Reference updates: Update all files that referenced the old path
-
Symlink cleanup: Remove broken symlinks, update targets for moved files
- 3. After each batch:
- Verify the moves completed correctly
- Run a quick reference check — grep for any remaining old-path references
- Report results to user before proceeding to next batch
- 4. Post-execution:
- Generate a changelog: what moved, what was archived, what references were updated
- Update the catalog with new locations
- Save the changelog to INLINECODE28
Notes
- - The archive is sacred — files are always archived before removal.
- If a reference update would modify a file in an excluded zone (e.g.,
.secrets/), flag it for manual update instead. - If anything unexpected occurs during execution, stop and report rather than attempting silent recovery.
Phase 6: Catalog Generation
Steps
- 1. Generate or update the living catalog at
~/CATALOG.md:
- Top-level directory map with purposes
- Key file locations (configs, scripts, data stores)
- Agent workspace index
- Project directory index with status (active/archived/paused)
- Conventions (naming, depth, where new files of each type should go)
- Last audit date and summary
- 2. The catalog is the primary deliverable. Other agents reference it when deciding where to store a file. It should be:
- Scannable (table format where possible)
- Authoritative (single source of truth for "where does X go?")
- Maintainable (updated by Lexi on each audit, not manually)
- 3. Save the full audit report to the Lexi workspace:
<lexi_workspace>/audits/audit-YYYY-MM-DD.md
Incremental Mode (Weekly Cron / On-Demand)
For scans after the initial full audit:
- 1. Scan only files modified since last audit date
- Check for new files not in the catalog
- Check for deleted files still in the catalog
- Check for broken references (files moved without updating refs)
- Generate a diff report: what changed, what needs attention
- Update the catalog with any confirmed changes
Slash Command
This skill responds to /lexi as a slash command trigger. Also invoked by "audit my files", "organize", "run lexi", "clean up my files", "file audit", "catalog", or similar.
Lexi — 文件系统管理员
一个结构化的文件系统审计流程,分为六个连续阶段。每个阶段完成后才进入下一阶段。位于 {baseDir}/scanning-framework.md 的扫描框架提供了分类定义、排除规则、目录结构和报告模板。
安全说明: 第1至第4阶段严格为只读操作——仅进行观察、编目和报告。文件修改仅在第5阶段进行,且每批次操作均需获得用户明确批准。
第1阶段:范围与排除项
步骤
- 1. 与用户确认扫描根目录。默认值:~(完整主目录)。
- 2. 回访用户快速通道: 当 USER.md 包含扫描历史记录和已知偏好时:
- 展示已存储的排除项和范围:上次扫描覆盖了 [根目录],排除项为:[列表]。是否仍然准确?
- 若确认 → 进入第2阶段。
- 若需更改 → 仅更新变更部分。
- 3. 新扫描设置: 根据扫描框架确认排除区域:
-
始终排除: .ssh/、.gnupg/、.secrets/、.git/ 内部文件、.env 文件、auth-profiles.json、credentials.json、node_modules/、
pycache/、.venv/ 内部文件
-
用户可配置排除项: 用户希望保护的任何其他路径
- 展示排除列表并在扫描前获得确认。
- 4. 扫描模式选择:
-
完整审计 — 首次运行或定期深度扫描所有内容
-
增量扫描 — 仅扫描上次审计日期后修改的文件
-
定向扫描 — 仅扫描特定目录树
- 5. 保存范围和排除项以供后续会话使用(更新 USER.md)。
第2阶段:发现与清单
原始扫描阶段——构建现有内容的完整图景。
步骤
- 1. 目录树扫描:
- 对于扫描根目录下的每个目录(遵守排除规则):记录路径、文件数量、总大小、最后修改日期
- 标记:空目录、深度嵌套路径(>5层)、异常大的目录
- 2. 文件清单:
- 对于每个文件(遵守排除规则):记录路径、大小、最后修改日期、文件类型/扩展名
- 标记:大于10MB的文件、超过90天未修改的文件、无扩展名的文件
- 3. 结构扫描:
- 识别所有Git仓库(包含 .git/ 的目录)
- 识别所有符号链接及其目标(标记损坏的符号链接)
- 识别所有虚拟环境(.venv/、venv/、node_modules/)
- 映射所有OpenClaw工作空间及其代理关联
- 识别不同目录中的重复文件名
- 4. 引用扫描(对安全重组至关重要):
- 在所有 .md 文件中搜索路径引用(绝对路径和相对路径)
- 在所有 .sh、.py、.js、.ts 脚本中搜索硬编码路径
- 从 crontab 中提取路径(crontab -l)
- 从 PM2 配置中提取路径
- 从 OpenClaw 代理配置中提取路径
- 映射所有符号链接的源→目标
- 构建
依赖关系图:哪些文件引用了哪些路径
- 5. 输出: 原始清单文件(结构化,非散文)——第3阶段的工作数据,不向用户展示。
注意事项
- - 此阶段在大型文件系统上可能较慢。请提供进度更新。
- 对于非常大的目录树,分段工作(例如,先扫描 ~/.openclaw/,然后扫描 ~/projects/ 等)
- 此阶段不读取文件内容,引用扫描除外。目标是编目结构,而非审计内容。
第3阶段:分类与分析
使用第2阶段的清单,对每个重要文件和目录进行分类。
步骤
- 1. 目录分类 — 为每个目录分配扫描框架中的类型:
- 活跃项目、归档、代理工作空间、配置/点文件、数据存储、工具/脚本、文档、媒体/资源、临时/构建产物、未知
- 2. 文件分类 — 为每个文件分配状态:
- 🟢
活跃 — 最近使用、被引用、有明确用途
- 🟡
待审查 — 用途不明、可能过时、需人工决策
- 🔴
孤立 — 无引用、陈旧、无明显用途
- ⚪
过时 — 曾活跃但现已过时(旧日志、被取代的配置、废弃脚本)
- 🔵
错位 — 有用途但位于错误位置
- ⚫
重复 — 其他位置存在相同或几乎相同的内容
- 3. 结构分析:
- 识别用途相同的目录(碎片化)
- 识别命名不一致(kebab-case 与 snake_case 与混合)
- 识别深度违规(文件埋得太深或太浅,不符合其类型)
- 识别孤立项目目录(无Git活动、近期无修改、未被引用)
- 4. 位置分析 — 对于每个 🔵 错位文件:
- 当前位置
- 推荐位置(附理由)
- 引用影响(若移动而不更新引用会破坏什么)
- 5. 去重分析 — 对于每个 ⚫ 重复文件:
- 内容存在的所有位置
- 哪个副本是权威的(最新、引用最多、位于正确位置)
- 建议:保留哪个,删除哪个
第4阶段:报告与协作审查
步骤
- 1. 生成审计报告,遵循扫描框架中的结构:
- 执行摘要(文件总数、目录数、分类明细)
- 目录映射(每个顶级和二级目录的用途)
- 高优先级发现(孤立、错位、重复——按影响排序)
- 结构建议(目录合并、命名、层级变更)
- 引用影响评估(建议的变更会破坏什么)
- 建议目录(动态索引文档)
- 2. 首先展示执行摘要:
- 我在 [M] 个目录中的 [N] 个文件中发现了以下内容:[明细]。准备好查看发现项了吗?
- 3. 协作审查模式 — 按优先级逐项审查发现:
- 展示发现项及具体路径
- 解释理由
- 如适用,展示引用影响
- 等待用户决策:批准、拒绝、推迟或讨论
- 记录所有决策
- 4. 根据批准的变更构建行动计划:
- 将操作分组为安全批次(互不依赖的移动操作)
- 排序批次以最小化中间破坏
- 在移动操作所在批次中包含引用更新
注意事项
- - 用户可能了解文件为何存在于当前位置。当用户说这是有意为之时,接受并记录到目录中,以便未来扫描不再标记。
- 展示文件大小和日期——它们有助于用户对过时文件做出决策。
第5阶段:执行(需明确批准)
步骤
- 1. 飞行前安全检查:
- 确认归档目录存在:~/.lexi-archive/YYYY-MM-DD/
- 确认当前批次中没有活跃进程正在使用文件
- 确认受影响区域中的Git仓库具有干净的工作树
- 2. 一次执行一个批次的已批准变更:
-
移动: 使用 mv 并备份原始位置清单到归档
-
删除: 始终先归档——移动到 ~/.lexi-archive/YYYY-MM-DD/,并记录原始路径、大小、日期和删除原因
-
引用更新: 更新所有引用旧路径的文件
-
符号链接清理: 删除损坏的符号链接,更新已移动文件的目标
- 3. 每个批次后:
- 验证移动操作是否正确完成
- 运行快速引用检查——搜索任何残留的旧路径引用
- 在进入下一批次前向用户报告结果
- 4. 执行后:
- 生成变更日志:移动了什么、归档了什么、更新了什么引用
- 用新位置更新目录
- 将变更日志保存到 ~/.lexi-archive/YYYY-MM-DD/changelog.md
注意事项
- - 归档是神圣的——文件在删除前始终先归档。
- 如果引用更新会修改排除区域中的文件(例如 .secrets/),则标记为手动更新。
- 如果在执行过程中发生任何意外情况,停止并报告,而非尝试静默恢复。
第6阶段:目录生成
步骤
- 1. 生成或更新动态目录 位于 ~/CATALOG.md:
- 顶级目录映射及用途
- 关键文件位置(配置、