Research Library Skill
A local-first multimedia research library for capturing, organizing, and searching hardware project knowledge.
What It Does
- Store documents: code, PDFs, CAD files, images, schematics
- Extract automatically: text from PDFs, EXIF from images, functions from code
- Search intelligently: full-text search with material-type weighting (your own work ranks higher than external research)
- Project isolation: Arduino stays separate from CNC, with no cross-contamination
- Cross-reference: link knowledge, e.g. "this servo tuning applies to that project"
- Async extraction: searches never block while OCR runs
- Backup daily: 30-day rolling snapshots
Installation
```bash
clawhub install research-library
# or
pip install /path/to/research-library
```
Quick Start
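```bash
# Initialize the database
reslib status

# Add a document
reslib add ~/projects/arduino/servo.py --project arduino --material-type reference

# Search
reslib search "servo tuning"

# Link knowledge
reslib link 5 12 --type applies_to
```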
Features
CLI Commands
- reslib add: Import documents (auto-detect + extract)
- reslib search: Full-text search with filters
- reslib get: View document details
- reslib archive / reslib unarchive: Manage documents
- reslib export: Export as JSON/Markdown
- reslib link: Create document relationships
- reslib projects: Manage projects
- reslib tags: Manage tags
- reslib status: System overview
- reslib backup / reslib restore: Snapshots
- reslib/smoke_test.sh: Quick validation
Technical
- Storage: SQLite 3.45+ with FTS5 virtual table
- Extraction: PDF (pdfplumber + OCR), images (EXIF + OCR), code (AST + regex)
- Confidence Scoring: 0.0-1.0 based on quality + source
- Material Weighting: Reference (1.0) vs Research (0.5)
- Project Isolation: Scoped searches, no contamination
- Async Workers: 2-4 configurable extraction workers
- Catalog Separation: real_world vs openclaw projects
- Backup: Daily snapshots, 30-day retention
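To make the material weighting and project scoping above concrete, here is a minimal sketch using Python's built-in sqlite3 module with an FTS5 table. The schema, the `build_index` and `search` helpers, and the way the weight is combined with the score are all assumptions for illustration, not reslib's actual implementation; only the 1.0 and 0.5 weights come from this README. It also assumes the local SQLite build has FTS5 enabled.

```python
import sqlite3

# Weights taken from the "Material Weighting" line; everything else is assumed.
MATERIAL_WEIGHTS = {"reference": 1.0, "research": 0.5}


def build_index(rows):
    """Create an in-memory FTS5 index (hypothetical schema, not reslib's)."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE VIRTUAL TABLE docs USING fts5("
        "title, body, project UNINDEXED, material_type UNINDEXED)"
    )
    conn.executemany("INSERT INTO docs VALUES (?, ?, ?, ?)", rows)
    return conn


def search(conn, query, project):
    """Return matching titles from a single project, best match first."""
    sql = (
        "SELECT title, material_type, bm25(docs) "
        "FROM docs WHERE docs MATCH ? AND project = ?"
    )
    hits = conn.execute(sql, (query, project)).fetchall()
    # bm25() is negative and lower is better, so scaling by the weight
    # pulls research material toward zero and ranks it below reference.
    ranked = sorted(hits, key=lambda hit: hit[2] * MATERIAL_WEIGHTS[hit[1]])
    return [title for title, _, _ in ranked]


if __name__ == "__main__":
    conn = build_index([
        ("forum thread", "servo tuning gains for the gimbal", "arduino", "research"),
        ("bench notes", "servo tuning gains for the gimbal", "arduino", "reference"),
        ("spindle log", "servo tuning on the z axis", "cnc", "reference"),
    ])
    print(search(conn, "servo tuning", "arduino"))
```

Filtering on the project column inside the query, rather than after ranking, is one simple way to keep results from one project out of another's searches.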
Configuration
Copy reslib/config.json and customize:
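```json
{
  "db_path": "~/.openclaw/research/library.db",
  "num_workers": 2,
  "worker_timeout_sec": 300,
  "max_retries": 3,
  "backup_retention_days": 30,
  "backup_dir": "~/.openclaw/research/backups",
  "file_size_limit_mb": 200,
  "project_size_limit_gb": 2
}
```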
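As a rough illustration of how a customized config could be merged with defaults, the sketch below loads JSON text, rejects unknown keys, and expands `~` in the two path settings. The `load_config` helper and its validation rules are hypothetical; in particular, treating the documented "2-4 workers" range as a hard limit is an assumption. The key names and default values are the ones from the sample config.

```python
import json
import os

# Defaults mirror the sample config in this README.
DEFAULTS = {
    "db_path": "~/.openclaw/research/library.db",
    "num_workers": 2,
    "worker_timeout_sec": 300,
    "max_retries": 3,
    "backup_retention_days": 30,
    "backup_dir": "~/.openclaw/research/backups",
    "file_size_limit_mb": 200,
    "project_size_limit_gb": 2,
}


def load_config(text):
    """Merge user JSON over the defaults and validate it (hypothetical helper)."""
    user = json.loads(text) if text.strip() else {}
    unknown = set(user) - set(DEFAULTS)
    if unknown:
        raise KeyError(f"unknown config keys: {sorted(unknown)}")
    cfg = {**DEFAULTS, **user}
    # Assumed rule: enforce the documented 2-4 worker range.
    if not 2 <= cfg["num_workers"] <= 4:
        raise ValueError("num_workers must be between 2 and 4")
    # Expand "~" so the paths can be handed straight to sqlite3 or shutil.
    for key in ("db_path", "backup_dir"):
        cfg[key] = os.path.expanduser(cfg[key])
    return cfg
```

Rejecting unknown keys early is a design choice that catches typos such as a missing underscore before they silently fall back to a default.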
Integration with War Room
Use RL1 protocol in war room DNA:
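```python
from reslib import ResearchDatabase, ResearchSearch

db = ResearchDatabase()
search = ResearchSearch(db)

# Before researching, check what is already known
prior = search.search("servo tuning", project="rc-quadcopter")
if prior:
    print(f"Found {len(prior)} existing records")
else:
    # New research is needed...
    db.add_research(title=..., content=...)  # remaining arguments elided
```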
Performance
All targets exceeded:
| Operation | Target | Actual |
|---|---|---|
| PDF extraction | <100ms | 20.6ms |
| Search (50 docs) | <100ms | 0.33ms |
| Worker throughput | >6/sec | 414.69/sec |
Testing
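```bash
# Run all tests
pytest tests/

# Quick smoke test
bash reslib/smoke_test.sh

# Performance tests
pytest tests/test_integration.py -v -k stress
```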
Known Limitations (Phase 2)
- OCR quality varies on hand-drawn sketches
- FTS5 designed for <10K documents (PostgreSQL path for scale)
- No automatic web research gathering (manual only)
- Vector embeddings ready but inactive
- CAD file parsing is metadata-only
Documentation
See /docs/:
- CLI-REFERENCE.md: All commands + examples
- EXTRACTION-GUIDE.md: How extraction works
- SEARCH-GUIDE.md: Ranking + weighting
- WORKER-GUIDE.md: Async queue details
- INTEGRATION.md: War room RL1 protocol
Phase 2 Roadmap
- Real-world PDF calibration
- FTS5 scaling tests (10K docs)
- Auto-detection (reference vs research)
- Web research enrichment
- Vector embeddings (semantic search)
- PostgreSQL upgrade path
Building From Source
```bash
cd research-library
pip install -e .
pytest tests/
python -m reslib status
```
Support
Issues? See TECHNICAL-NOTES.md for troubleshooting.
Production-ready MVP. 214 tests passing. 15K lines. Ready to use.
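Research Library Skill
A local-first multimedia research library for capturing, organizing, and searching hardware project knowledge.
What It Does
- Store documents: code, PDFs, CAD files, images, schematics
- Extract automatically: text from PDFs, EXIF from images, functions from code
- Search intelligently: full-text search with material-type weighting (your own work ranks higher than external research)
- Project isolation: Arduino stays separate from CNC, with no cross-contamination
- Cross-reference: link knowledge, e.g. "this servo tuning applies to that project"
- Async extraction: searches never block while OCR runs
- Backup daily: 30-day rolling snapshots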
Installation
```bash
clawhub install research-library
# or
pip install /path/to/research-library
```
Quick Start
```bash
# Initialize the database
reslib status

# Add a document
reslib add ~/projects/arduino/servo.py --project arduino --material-type reference

# Search
reslib search "servo tuning"

# Link knowledge
reslib link 5 12 --type applies_to
```
Features
CLI Commands
- reslib add: Import documents (auto-detect + extract)
- reslib search: Full-text search with filters
- reslib get: View document details
- reslib archive / reslib unarchive: Manage documents
- reslib export: Export as JSON/Markdown
- reslib link: Create document relationships
- reslib projects: Manage projects
- reslib tags: Manage tags
- reslib status: System overview
- reslib backup / reslib restore: Snapshots
- reslib/smoke_test.sh: Quick validation
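Technical
- Storage: SQLite 3.45+ with FTS5 virtual table
- Extraction: PDF (pdfplumber + OCR), images (EXIF + OCR), code (AST + regex)
- Confidence Scoring: 0.0-1.0 based on quality + source
- Material Weighting: Reference (1.0) vs Research (0.5)
- Project Isolation: Scoped searches, no cross-contamination
- Async Workers: 2-4 configurable extraction workers
- Catalog Separation: real_world vs openclaw projects
- Backup: Daily snapshots, 30-day retention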
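The async worker line can be pictured as a small pool of threads draining a job queue, with failed jobs retried a bounded number of times. This is only a sketch under assumptions: `run_workers` and its retry policy are hypothetical, the `num_workers` and `max_retries` parameter names are borrowed from the sample config, and a real extraction queue would also need timeouts and persistence.

```python
import queue
import threading


def run_workers(jobs, extract, num_workers=2, max_retries=3):
    """Run extract(job) on a thread pool; returns {job: result or None}.

    A job that raises is re-queued until it has been tried max_retries
    times, after which its result is recorded as None.
    """
    pending = queue.Queue()
    for job in jobs:
        pending.put((job, 0))
    results = {}
    lock = threading.Lock()

    def worker():
        while True:
            try:
                job, attempts = pending.get_nowait()
            except queue.Empty:
                return  # nothing left for this worker to do
            try:
                value = extract(job)
            except Exception:
                if attempts + 1 < max_retries:
                    pending.put((job, attempts + 1))
                    continue
                value = None  # give up after the last allowed attempt
            with lock:
                results[job] = value

    threads = [threading.Thread(target=worker) for _ in range(num_workers)]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

Because extraction runs on its own threads, a caller that only reads the index does not have to wait for slow jobs such as OCR to finish.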
Configuration
Copy reslib/config.json and customize:
```json
{
  "db_path": "~/.openclaw/research/library.db",
  "num_workers": 2,
  "worker_timeout_sec": 300,
  "max_retries": 3,
  "backup_retention_days": 30,
  "backup_dir": "~/.openclaw/research/backups",
  "file_size_limit_mb": 200,
  "project_size_limit_gb": 2
}
```
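The backup retention setting implies some pruning pass over the backup directory. The sketch below shows one way that selection could work as a pure function, so it can be checked without touching the filesystem. The `library-YYYY-MM-DD.db` naming scheme and the `expired_snapshots` helper are assumptions for illustration; the README does not say how reslib names or prunes its snapshots. It needs Python 3.9+ for `str.removeprefix`.

```python
from datetime import date, timedelta
from pathlib import Path


def expired_snapshots(paths, today, retention_days=30):
    """Return the snapshot paths dated more than retention_days before today.

    Assumes hypothetical names like library-2024-05-01.db. Files that do
    not match the pattern are skipped rather than treated as expired.
    """
    cutoff = today - timedelta(days=retention_days)
    expired = []
    for path in paths:
        stamp = Path(path).stem.removeprefix("library-")
        try:
            taken = date.fromisoformat(stamp)
        except ValueError:
            continue  # not a dated snapshot, leave it alone
        if taken < cutoff:
            expired.append(path)
    return expired
```

Skipping unrecognized files is the conservative choice here: a pruning job should never delete something it cannot positively identify as an old snapshot.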
Integration with War Room
Use RL1 protocol in war room DNA:
```python
from reslib import ResearchDatabase, ResearchSearch

db = ResearchDatabase()
search = ResearchSearch(db)

# Before researching, check what is already known
prior = search.search("servo tuning", project="rc-quadcopter")
if prior:
    print(f"Found {len(prior)} existing records")
else:
    # New research is needed...
    db.add_research(title=..., content=...)  # remaining arguments elided
```
Performance
All targets exceeded:
| Operation | Target | Actual |
|---|---|---|
| PDF extraction | <100ms | 20.6ms |
| Search (50 docs) | <100ms | 0.33ms |
| Worker throughput | >6/sec | 414.69/sec |
Testing
```bash
# Run all tests
pytest tests/

# Quick smoke test
bash reslib/smoke_test.sh

# Performance tests
pytest tests/test_integration.py -v -k stress
```
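Known Limitations (Phase 2)
- OCR quality varies on hand-drawn sketches
- FTS5 designed for <10K documents (larger deployments need to migrate to PostgreSQL)
- No automatic web research gathering (manual only)
- Vector embeddings ready but inactive
- CAD file parsing is metadata-only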
Documentation
See the /docs/ directory:
- CLI-REFERENCE.md: All commands + examples
- EXTRACTION-GUIDE.md: How extraction works
- SEARCH-GUIDE.md: Ranking + weighting
- WORKER-GUIDE.md: Async queue details
- INTEGRATION.md: War room RL1 protocol
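Phase 2 Roadmap
- Real-world PDF calibration
- FTS5 scaling tests (10K docs)
- Auto-detection (reference vs research)
- Web research enrichment
- Vector embeddings (semantic search)
- PostgreSQL upgrade path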
Building From Source
```bash
cd research-library
pip install -e .
pytest tests/
python -m reslib status
```
Support
Issues? See TECHNICAL-NOTES.md for troubleshooting.
Production-ready MVP. 214 tests passing. 15K lines of code. Ready to use.