MinerU Document Explorer
PDF reading toolkit via doc-search CLI. Search first, then read relevant pages — never scan an entire PDF.
⚠️ Network capabilities: This skill can optionally call external APIs (PageIndex outline generation, MinerU cloud OCR, embedding/reranker services) and run a local FastAPI server. All network features are opt-in and disabled by default.
Path conventions
SKILL_DIR = <parent directory of this file>
SCRIPTS = SKILL_DIR/scripts
Setup check
Read SKILL_DIR/config-state.json. If missing or setup_complete is not true:
1. Read references/setup.md and run the installer
2. After setup, ask the user if they want to configure PageIndex (e.g. "If you have an OpenAI-compatible API key, you can enable PageIndex to auto-generate a document outline, which is useful for scanned docs or manuals. Want to set it up?")
3. If the user provides pageindex_api_key / pageindex_base_url → write them to SCRIPTS/doc-search/config.yaml; if skipped → continue immediately, do not block
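The setup check above can be sketched in Python. This is a minimal sketch, assuming config-state.json holds a JSON object with a boolean setup_complete field; the helper name needs_setup is hypothetical and not part of the skill.

```python
import json
from pathlib import Path


def needs_setup(skill_dir: Path) -> bool:
    """Return True unless config-state.json exists and has setup_complete == true."""
    state_file = Path(skill_dir) / "config-state.json"
    try:
        state = json.loads(state_file.read_text(encoding="utf-8"))
    except (FileNotFoundError, json.JSONDecodeError):
        # A missing or unparsable state file is treated as "not set up".
        return True
    # Assumption: only a literal JSON true counts; the string "true" does not.
    return not (isinstance(state, dict) and state.get("setup_complete") is True)
```

When needs_setup returns True, the next step is still to read references/setup.md and run the installer as described above; the sketch only covers the decision.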
⚠️ MUST read reference docs before acting — no guessing
Any uncertainty about parameters, return fields, or query phrasing → MUST read the corresponding cmd file before running any command. Do not infer or guess.
- references/cmd-init.md / cmd-outline.md / cmd-pages.md
- references/cmd-search-keyword.md / cmd-search-semantic.md / cmd-elements.md
For complex tasks, errors, unexpected results, or unfamiliar scenarios → MUST read references/tips.md first. It contains proven workflows and hard-won pitfalls that will save you from repeating mistakes.
Command cheatsheet
All output is JSON on stdout. --timeout is a global flag before the subcommand; default is 120s.
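doc-search init --doc_path <path or URL>
doc-search outline --doc_id <doc_id> [--max_depth N] [--root_node <node ID>]
doc-search pages --doc_id <doc_id> --page_idxs <page indices> [--no_image] [--return_text]
doc-search search-keyword --doc_id <doc_id> --page_idxs <page indices> --pattern <regex> [--return_text]
doc-search search-semantic --doc_id <doc_id> --page_idxs <page indices> --query <query> [--top_k N] [--no_image] [--return_text]
doc-search --timeout 300 elements --doc_id <doc_id> --page_idxs <page indices> --query <query>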
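The two rules stated for the cheatsheet (JSON on stdout, --timeout before the subcommand) can be illustrated with a small Python wrapper. This is a sketch, not part of the skill: build_argv and run_doc_search are hypothetical names, flag names are passed through without validation, and run_doc_search assumes doc-search is on PATH.

```python
import json
import subprocess
from typing import Any, List, Optional


def build_argv(subcommand: str, timeout: Optional[int] = None, **flags: Any) -> List[str]:
    """Build a doc-search argv list with --timeout placed before the subcommand."""
    argv = ["doc-search"]
    if timeout is not None:
        # --timeout is a global flag, so it must precede the subcommand.
        argv += ["--timeout", str(timeout)]
    argv.append(subcommand)
    for name, value in flags.items():
        if value is None or value is False:
            continue  # optional flag left out
        argv.append(f"--{name}")
        if value is not True:  # True means a bare switch with no value
            argv.append(str(value))
    return argv


def run_doc_search(subcommand: str, timeout: Optional[int] = None, **flags: Any) -> Any:
    """Run doc-search and parse its stdout as JSON (requires doc-search on PATH)."""
    proc = subprocess.run(
        build_argv(subcommand, timeout, **flags),
        capture_output=True, text=True, check=True,
    )
    return json.loads(proc.stdout)
```

For example, build_argv("init", timeout=300, doc_path="paper.pdf") yields ["doc-search", "--timeout", "300", "init", "--doc_path", "paper.pdf"]. Consult the matching cmd reference file for the real parameters and return fields before relying on any flag.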
Key reminders
- Use outline and keyword search to narrow the reading range; never scan the full document
- --page_idxs is 0-indexed; do not confuse it with printed page numbers
- After extracting figures/tables with elements, you must read crop_path to verify the result. The query should describe the actual chart image, not the caption text. If the query fails, check page_idxs or rephrase the query
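Two of the reminders above lend themselves to small checks. The sketch below is illustrative only: printed_to_page_idx and missing_crops are hypothetical helpers, and front_matter_pages is an assumed way to model PDFs whose printed numbering starts after a cover or table of contents. How crop_path values are pulled out of the elements output is left to the caller, because the JSON layout is documented in the cmd reference files rather than here.

```python
import os
from typing import Iterable, List


def printed_to_page_idx(printed_page: int, front_matter_pages: int = 0) -> int:
    """Convert a printed page number to a 0-indexed page index.

    front_matter_pages is the number of PDF pages that come before printed
    page 1 (cover, table of contents, ...); hypothetical, for illustration.
    """
    if printed_page < 1 or front_matter_pages < 0:
        raise ValueError("printed_page must be >= 1 and front_matter_pages >= 0")
    return printed_page - 1 + front_matter_pages


def missing_crops(crop_paths: Iterable[str]) -> List[str]:
    """Return the crop_path values that do not point to an existing file."""
    return [p for p in crop_paths if not os.path.isfile(p)]
```

An existing file is only the first half of the verification: the cropped image still has to be opened and read to confirm it shows the chart or table that was asked for.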
Lessons learned (mandatory)
After completing any PDF task: pitfalls / new workflows / parameter discoveries → append to references/tips.md, 1-2 lines each, conclusions only.
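Appending a lesson can be sketched as follows. This is a minimal sketch under assumptions: append_tip is a hypothetical helper, and the "- " bullet format is a guess, since the layout of references/tips.md is not described here.

```python
from pathlib import Path


def append_tip(tips_file: Path, tip: str) -> None:
    """Append one short lesson (1-2 non-empty lines) to the tips file."""
    lines = [line.strip() for line in tip.strip().splitlines() if line.strip()]
    if not 1 <= len(lines) <= 2:
        raise ValueError("a tip should be 1-2 non-empty lines")
    tips_file = Path(tips_file)
    tips_file.parent.mkdir(parents=True, exist_ok=True)
    # Make sure the new entry starts on its own line.
    existing = tips_file.read_text(encoding="utf-8") if tips_file.exists() else ""
    prefix = "" if existing == "" or existing.endswith("\n") else "\n"
    with tips_file.open("a", encoding="utf-8") as fh:
        # Assumed format: one bullet per tip, second line indented under it.
        fh.write(prefix + "- " + "\n  ".join(lines) + "\n")
```

The length check mirrors the "1-2 lines each, conclusions only" rule; anything longer is rejected instead of being silently truncated.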
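MinerU Document Explorer
A PDF reading toolkit built on the doc-search command-line tool. Search first, then read the relevant pages; there is no need to scan the entire PDF.
⚠️ Network capabilities: This skill can optionally call external APIs (PageIndex outline generation, MinerU cloud OCR, embedding/reranker services) and run a local FastAPI server. All network features are optional and disabled by default.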
Path conventions
SKILL_DIR = <parent directory of this file>
SCRIPTS = SKILL_DIR/scripts
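Setup check
Read SKILL_DIR/config-state.json. If the file is missing or setup_complete is not true:
1. Read references/setup.md and run the installer
2. After setup, ask the user whether they want to configure PageIndex (e.g. "If you have an OpenAI-compatible API key, you can enable PageIndex to auto-generate a document outline, which is useful for scanned docs or manuals. Want to set it up?")
3. If the user provides pageindex_api_key / pageindex_base_url → write them to SCRIPTS/doc-search/config.yaml; if skipped → continue immediately, do not block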
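⚠️ MUST read reference docs before acting: no guessing
Any uncertainty about parameters, return fields, or query phrasing → MUST read the corresponding cmd file before running any command. Do not infer or guess.
- references/cmd-init.md / cmd-outline.md / cmd-pages.md
- references/cmd-search-keyword.md / cmd-search-semantic.md / cmd-elements.md
For complex tasks, errors, unexpected results, or unfamiliar scenarios → MUST read references/tips.md first. It contains proven workflows and hard-won lessons that help avoid repeating mistakes.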
Command cheatsheet
All output is JSON on stdout. --timeout is a global flag placed before the subcommand; the default is 120 seconds.
doc-search init --doc_path <path or URL>
doc-search outline --doc_id <doc_id> [--max_depth N] [--root_node <node ID>]
doc-search pages --doc_id <doc_id> --page_idxs <page indices> [--no_image] [--return_text]
doc-search search-keyword --doc_id <doc_id> --page_idxs <page indices> --pattern <regex> [--return_text]
doc-search search-semantic --doc_id <doc_id> --page_idxs <page indices> --query <query> [--top_k N] [--no_image] [--return_text]
doc-search --timeout 300 elements --doc_id <doc_id> --page_idxs <page indices> --query <query>
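Key reminders
- Use outline and keyword search to narrow the reading range; never scan the full document
- --page_idxs is 0-indexed; do not confuse it with printed page numbers
- After extracting figures/tables with elements, you must read crop_path to verify the result. The query should describe the actual chart image, not the caption text. If the query fails, check page_idxs or rephrase the query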
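Lessons learned (mandatory)
After completing any PDF task: pitfalls encountered / new workflows / parameter discoveries → append to references/tips.md, 1-2 lines each, conclusions only.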