Boof 🍑
Local-first document processing: PDF → markdown → RAG index → token-efficient analysis.
Documents stay local. Only relevant chunks go to the LLM. Maximum knowledge absorption, minimum token burn.
Powered by opendataloader-pdf — #1 in PDF parsing benchmarks (0.90 overall, 0.93 table accuracy). CPU-only, no GPU required.
Quick Reference
Convert + index a document
    bash {SKILL_DIR}/scripts/boof.sh /path/to/document.pdf
Convert with custom collection name
    bash {SKILL_DIR}/scripts/boof.sh /path/to/document.pdf --collection my-project
Query indexed content
    qmd query "your question" -c collection-name
Core Workflow
1. Boof it: Run boof.sh on a PDF. This converts it to markdown via opendataloader-pdf (local Java engine, no API, no GPU) and indexes it into QMD for semantic search.
2. Query it: Use qmd query to retrieve only the relevant chunks. Send those chunks to the LLM, not the entire document.
3. Analyze it: The LLM sees focused, relevant excerpts. No wasted tokens, no lost-in-the-middle problems.
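The three steps can be strung together in a small shell helper. This is a minimal sketch, not part of Boof itself: `boof_and_ask` and the `BOOF_SH` variable are hypothetical names, `QMD_BIN` is assumed to point at the qmd binary, and the prompt wording is only illustrative. It relies only on the `boof.sh <pdf> --collection <name>` and `qmd query <question> -c <name>` forms described in this document.

```shell
# Hypothetical helper: convert + index a PDF, then build a prompt that
# contains only the chunks qmd returns for the question.
# Assumes BOOF_SH points at scripts/boof.sh and QMD_BIN at the qmd binary.
boof_and_ask() (
  pdf="$1"; collection="$2"; question="$3"

  # Step 1: convert the PDF to markdown and index it into the collection.
  # boof.sh output is sent to stderr so stdout carries only the prompt.
  bash "${BOOF_SH:?set BOOF_SH to the path of boof.sh}" \
    "$pdf" --collection "$collection" >&2 || exit 1

  # Step 2: retrieve only the relevant chunks.
  chunks="$("${QMD_BIN:-qmd}" query "$question" -c "$collection")" || exit 1

  # Step 3: emit a focused prompt (excerpts, not the whole document).
  printf 'Answer using only these excerpts:\n\n%s\n\nQuestion: %s\n' \
    "$chunks" "$question"
)
```

A call might look like `boof_and_ask paper.pdf my-project "What dataset was used?"`, with the printed prompt then passed to whichever LLM client is in use.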
When to Use Each Approach
"Analyze this specific aspect of the paper" → Boof + query (cheapest, most focused)
"Summarize this entire document" → Boof, then read the markdown section by section. Summarize each section individually, then merge summaries. See advanced-usage.md.
"Compare findings across multiple papers" → Boof all papers into one collection, then query across them.
"Find where the paper discusses X" → qmd search "X" -c collection for exact match, qmd query "X" -c collection for semantic match.
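For the multi-paper comparison case, every PDF has to land in the same collection before querying. A minimal sketch, assuming a hypothetical `BOOF_SH` variable that holds the path to boof.sh; the helper name `boof_all` is likewise made up for illustration:

```shell
# Hypothetical helper: index several PDFs into one shared collection so a
# single qmd query can search across all of them.
# Assumes BOOF_SH points at scripts/boof.sh.
boof_all() (
  collection="$1"; shift
  failed=0
  for pdf in "$@"; do
    if ! bash "${BOOF_SH:?set BOOF_SH to the path of boof.sh}" \
         "$pdf" --collection "$collection"; then
      # Keep going so one bad PDF does not block the rest.
      echo "failed to boof: $pdf" >&2
      failed=1
    fi
  done
  exit "$failed"
)
```

Usage could be `boof_all my-review paper1.pdf paper2.pdf`, followed by `qmd query "sample size" -c my-review` to search across both papers.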
Output Location
Converted markdown files are saved to knowledge/boofed/ by default (override with --output-dir).
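Because the converted markdown is an ordinary file in the output directory, it can be post-processed with standard tools. For the section-by-section summarization route, the file first needs to be cut into sections. The sketch below assumes the converted markdown uses `#`-style headings, which this document does not confirm, and the helper name `split_sections` is hypothetical. It does not special-case fenced code blocks that contain lines starting with `#`.

```shell
# Hypothetical helper: split a markdown file into one file per heading.
# Assumes sections start with ATX headings ("# ", "## ", ...).
# Text before the first heading goes to section-000.md.
split_sections() (
  md="$1"; outdir="$2"
  mkdir -p "$outdir" || exit 1
  awk -v dir="$outdir" '
    /^#+ / { n++ }   # each heading starts a new section
    {
      file = sprintf("%s/section-%03d.md", dir, n)   # n stays 0 before the first heading
      if (file != prev) {
        if (prev != "") close(prev)
        prev = file
      }
      print > file
    }
  ' "$md"
)
```

Each resulting `section-NNN.md` can then be summarized on its own and the summaries merged afterwards.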
Setup
If boof.sh reports missing dependencies, see setup-guide.md for installation instructions (Java + opendataloader-pdf + QMD).
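A quick pre-flight check can show which tools are missing before running anything. This is a generic sketch with a hypothetical helper name, `check_deps`; it is not the check boof.sh itself performs, and it only looks for commands on `PATH`.

```shell
# Hypothetical pre-flight check: report which required commands are missing.
# This mirrors the idea of a dependency check; it is not boof.sh's own logic.
check_deps() (
  missing=0
  for cmd in "$@"; do
    if ! command -v "$cmd" >/dev/null 2>&1; then
      echo "missing dependency: $cmd" >&2
      missing=1
    fi
  done
  exit "$missing"
)
```

For example, `check_deps java qmd` returns non-zero and names each tool it cannot find. Note that qmd may be installed outside `PATH` (the default location is under ~/.bun/bin), in which case passing its full path is the safer check.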
Environment
- ODL_ENV: Path to the opendataloader-pdf Python venv (default: ~/.openclaw/tools/odl-env)
- QMD_BIN: Path to the qmd binary (default: ~/.bun/bin/qmd)
- BOOF_OUTPUT_DIR: Default output directory (default: ~/.openclaw/workspace/knowledge/boofed)
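A wrapper script could resolve these settings with the usual shell default-value expansion. The sketch below is an assumption about how such a wrapper might look, not a copy of boof.sh; the variable names `QMD_BIN` and `BOOF_OUTPUT_DIR` are inferred and should be checked against the script before relying on them.

```shell
# Resolve settings, falling back to the documented defaults when a variable
# is unset or empty. QMD_BIN and BOOF_OUTPUT_DIR are assumed variable names.
ODL_ENV="${ODL_ENV:-$HOME/.openclaw/tools/odl-env}"
QMD_BIN="${QMD_BIN:-$HOME/.bun/bin/qmd}"
BOOF_OUTPUT_DIR="${BOOF_OUTPUT_DIR:-$HOME/.openclaw/workspace/knowledge/boofed}"
export ODL_ENV QMD_BIN BOOF_OUTPUT_DIR
```

Setting a variable before the call, for example `BOOF_OUTPUT_DIR=/tmp/boofed`, would then take precedence over the default.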