Tool List

1. pdf_to_markdown

Convert PDF documents to Markdown format, preserving document structure, formulas, tables, and images.

Description: Uses MinerU to parse a PDF document and write the result as Markdown, with support for OCR, formula recognition, and table extraction.

Parameters:

- file_path (string, required): Absolute path to the PDF file
- output_dir (string, required): Absolute path to the output directory
- backend (string, optional): Parsing backend. One of hybrid-auto-engine (default), pipeline, or vlm-auto-engine
- language (string, optional): OCR language code, such as en (English), ch (Chinese), or ja (Japanese). Defaults to auto-detection
- enable_formula (boolean, optional): Whether to enable formula recognition. Defaults to true
- enable_table (boolean, optional): Whether to enable table extraction. Defaults to true
- start_page (integer, optional): First page to parse, counting from 0. Defaults to 0
- end_page (integer, optional): Last page to parse, counting from 0. Defaults to -1, which parses all pages
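
The parameter rules above can be checked before a call is made. The following is a minimal sketch, not part of the tool itself: `validate_arguments` is a hypothetical helper, and the page-range check assumes that `end_page` must not be smaller than `start_page` unless it is -1.

```python
import os

# Backend names as listed in the parameter description above.
ALLOWED_BACKENDS = {"hybrid-auto-engine", "pipeline", "vlm-auto-engine"}


def validate_arguments(arguments: dict) -> list[str]:
    """Return a list of problems found in an arguments dict (hypothetical helper)."""
    problems = []
    # Both paths are required and must be absolute.
    for key in ("file_path", "output_dir"):
        value = arguments.get(key)
        if not isinstance(value, str) or not value:
            problems.append(f"{key} is required")
        elif not os.path.isabs(value):
            problems.append(f"{key} must be an absolute path")
    # The backend is optional and falls back to the documented default.
    backend = arguments.get("backend", "hybrid-auto-engine")
    if backend not in ALLOWED_BACKENDS:
        problems.append(f"unknown backend: {backend}")
    # Page numbers count from 0; -1 for end_page means "all pages".
    start = arguments.get("start_page", 0)
    end = arguments.get("end_page", -1)
    if start < 0:
        problems.append("start_page must be >= 0")
    if end != -1 and end < start:
        problems.append("end_page must be -1 or >= start_page")  # assumption
    return problems
```

An empty list means the arguments passed every check in this sketch. The tool may apply further checks of its own.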

Return Value:
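
    {
      "success": true,
      "output_path": "/path/to/output",
      "markdown_content": "Converted Markdown content...",
      "images": ["list of image paths"],
      "tables": ["list of table information"],
      "formula_count": 10
    }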

Examples:

python .claude/skills/pdf-process/script/pdf_parser.py \
  '{"name": "pdf_to_markdown", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output"}}'

# Use specific backend
python .claude/skills/pdf-process/script/pdf_parser.py \
  '{"name": "pdf_to_markdown", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output", "backend": "pipeline"}}'

# Parse specific pages
python .claude/skills/pdf-process/script/pdf_parser.py \
  '{"name": "pdf_to_markdown", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output", "start_page": 0, "end_page": 5}}'
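
The same calls can be assembled from Python, which avoids hand-quoting the JSON payload on the shell. This is a rough sketch: `build_command` is a hypothetical helper, and the script path is copied from the examples above.

```python
import json
import sys

# Script path as used in the shell examples above.
SCRIPT = ".claude/skills/pdf-process/script/pdf_parser.py"


def build_command(tool_name: str, **arguments) -> list[str]:
    """Build the argv list for one tool call (hypothetical helper)."""
    payload = json.dumps({"name": tool_name, "arguments": arguments})
    return [sys.executable, SCRIPT, payload]


cmd = build_command(
    "pdf_to_markdown",
    file_path="/path/to/document.pdf",
    output_dir="/path/to/output",
    start_page=0,
    end_page=5,
)
# To run it: subprocess.run(cmd, capture_output=True, text=True, check=True)
```

Passing the command as a list means no shell is involved, so the payload needs no extra quoting.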

2. pdf_to_json

Convert PDF documents to JSON format, including detailed layout and structural information.

Description: Uses MinerU to parse a PDF document and write the result as JSON, containing structured information such as text blocks, images, tables, and formulas.

Parameters:

- file_path (string, required): Absolute path to the PDF file
- output_dir (string, required): Absolute path to the output directory
- backend (string, optional): Parsing backend. One of hybrid-auto-engine (default), pipeline, or vlm-auto-engine
- language (string, optional): OCR language code, such as en (English), ch (Chinese), or ja (Japanese). Defaults to auto-detection
- enable_formula (boolean, optional): Whether to enable formula recognition. Defaults to true
- enable_table (boolean, optional): Whether to enable table extraction. Defaults to true
- start_page (integer, optional): First page to parse, counting from 0. Defaults to 0
- end_page (integer, optional): Last page to parse, counting from 0. Defaults to -1, which parses all pages

Return Value:
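
    {
      "success": true,
      "output_path": "/path/to/output.json",
      "pages": [
        {
          "page_no": 0,
          "page_size": [595, 842],
          "blocks": [
            {
              "type": "text",
              "text": "Text content",
              "bbox": [x, y, x, y]
            }
          ],
          "images": [],
          "tables": [],
          "formulas": []
        }
      ],
      "metadata": {
        "total_pages": 10,
        "author": "Author",
        "title": "Title"
      }
    }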
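
A consumer of this result might walk the `pages` and `blocks` arrays to collect plain text. The sketch below assumes the field names `success`, `pages`, `page_no`, `blocks`, `type`, and `text` from the return value. `text_by_page` is a hypothetical helper, and the sample data, including the `bbox` numbers, is made up for illustration.

```python
import json


def text_by_page(result: dict) -> dict[int, str]:
    """Join the text of all text blocks on each page, keyed by page_no."""
    if not result.get("success"):
        raise ValueError("parsing did not succeed")
    pages = {}
    for page in result.get("pages", []):
        texts = [
            block["text"]
            for block in page.get("blocks", [])
            if block.get("type") == "text"
        ]
        pages[page["page_no"]] = "\n".join(texts)
    return pages


# Illustrative sample only; real output comes from pdf_to_json.
sample = json.loads("""
{
  "success": true,
  "pages": [
    {
      "page_no": 0,
      "blocks": [
        {"type": "text", "text": "First paragraph", "bbox": [0, 0, 100, 20]},
        {"type": "text", "text": "Second paragraph", "bbox": [0, 30, 100, 50]}
      ]
    }
  ]
}
""")
page_text = text_by_page(sample)
```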

Examples:

python .claude/skills/pdf-process/script/pdf_parser.py \
  '{"name": "pdf_to_json", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output"}}'

# Use specific backend and language
python .claude/skills/pdf-process/script/pdf_parser.py \
  '{"name": "pdf_to_json", "arguments": {"file_path": "/path/to/document.pdf", "output_dir": "/path/to/output", "backend": "hybrid-auto-engine", "language": "ch"}}'

Installation Instructions

1. Install MinerU

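    # Update pip and install uv
    pip install --upgrade pip
    pip install uv

    # Install MinerU (with all features)
    uv pip install -U "mineru[all]"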

2. Verify Installation

    # Check that MinerU installed successfully
    mineru --version

    # Test basic functionality
    mineru --help

3. System Requirements

- Python Version: 3.10-3.13
- Operating System: Linux / Windows / macOS 14.0+
- Memory:
  - pipeline backend: minimum 16GB, recommended 32GB+
  - hybrid/vlm backend: minimum 16GB, recommended 32GB+
- Disk Space: minimum 20GB (SSD recommended)
- GPU (optional):
  - pipeline backend: runs on CPU only
  - hybrid/vlm backend: requires an NVIDIA GPU (Volta architecture or newer) or Apple Silicon
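
The Python version requirement, together with the Windows restriction noted under Troubleshooting, can be checked up front. This is a small sketch under those stated ranges. `python_supported` is a hypothetical helper and not part of MinerU.

```python
import platform
import sys


def python_supported(version: tuple[int, int], system: str) -> bool:
    """Check a (major, minor) version against the documented ranges."""
    low, high = (3, 10), (3, 13)
    if system == "Windows":
        # Per Troubleshooting: Windows only supports 3.10-3.12.
        high = (3, 12)
    return low <= version <= high


# Check the interpreter that is currently running.
ok = python_supported(sys.version_info[:2], platform.system())
```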

Use Cases

1. Academic Paper Parsing: Extract structured content such as formulas, tables, and images
2. Technical Document Conversion: Convert PDF documents to Markdown for version control and online publishing
3. OCR Processing: Process scanned PDFs and PDFs with garbled text
4. Multilingual Documents: OCR support for 109 languages
5. Batch Processing: Convert multiple PDF documents in one run
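
For the batch processing case, one payload per PDF can be generated from a directory listing. This is only a sketch: `batch_payloads` is a hypothetical helper, and giving each PDF its own output subdirectory is a design choice made here, not something the tool requires.

```python
import json
from pathlib import Path


def batch_payloads(pdf_dir: str, output_root: str) -> list[str]:
    """Build one pdf_to_markdown payload per PDF in pdf_dir (hypothetical helper)."""
    payloads = []
    # resolve() makes the paths absolute, as the tool requires.
    for pdf in sorted(Path(pdf_dir).resolve().glob("*.pdf")):
        arguments = {
            "file_path": str(pdf),
            # One output subdirectory per document, named after the file.
            "output_dir": str(Path(output_root).resolve() / pdf.stem),
        }
        payloads.append(json.dumps({"name": "pdf_to_markdown", "arguments": arguments}))
    return payloads
```

Each returned string can be passed as the single argument to pdf_parser.py, in the same way as the shell examples.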

Backend Selection Recommendations

- hybrid-auto-engine (default): Balances accuracy and speed, suitable for most scenarios
- pipeline: Suitable for CPU-only environments, best compatibility
- vlm-auto-engine: Highest accuracy, requires GPU acceleration
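
These recommendations can be written as a simple selection rule. The sketch below is one reading of the list above. `choose_backend` and its two flags are hypothetical, and detecting whether a GPU is present is left to the caller.

```python
def choose_backend(has_gpu: bool, prefer_accuracy: bool = False) -> str:
    """Pick a backend name following the recommendations above."""
    if not has_gpu:
        # CPU-only environments: pipeline has the best compatibility.
        return "pipeline"
    if prefer_accuracy:
        # Highest accuracy, but needs GPU acceleration.
        return "vlm-auto-engine"
    # Default: balance of accuracy and speed.
    return "hybrid-auto-engine"
```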

Notes

1. File Paths: All paths must be absolute paths
2. Output Directory: Directories that do not exist are created automatically
3. Performance: Using a GPU can significantly improve parsing speed
4. Page Numbers: Page numbers start counting from 0
5. Memory: Processing large documents may consume more memory

Troubleshooting

Common Issues

1. Installation Failure:

- Make sure you are using Python 3.10-3.13
- On Windows, only Python 3.10-3.12 is supported (ray does not support 3.13)
- Installing with uv pip install resolves most dependency conflicts

2. Insufficient Memory:

- Use the pipeline backend
- Limit the pages parsed with start_page and end_page
- Reduce virtual memory allocation

3. Slow Parsing Speed:

- Enable GPU acceleration
- Use the hybrid-auto-engine backend
- Disable unnecessary features (formulas, tables)

4. Low OCR Accuracy:

- Specify the correct document language
- Make sure the backend supports OCR (use pipeline or hybrid-*)
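
For the insufficient-memory case, limiting pages with start_page and end_page can be done by splitting a document into fixed-size windows and parsing one window per call. This sketch assumes that end_page is inclusive, which the parameter description does not state outright. `page_windows` is a hypothetical helper.

```python
def page_windows(total_pages: int, window: int) -> list[tuple[int, int]]:
    """Split pages 0..total_pages-1 into (start_page, end_page) pairs.

    end_page is treated as inclusive here (an assumption).
    """
    if total_pages < 0 or window < 1:
        raise ValueError("total_pages must be >= 0 and window must be >= 1")
    return [
        (start, min(start + window, total_pages) - 1)
        for start in range(0, total_pages, window)
    ]


# A 10-page document in windows of 4 pages.
windows = page_windows(10, 4)
```

Each pair can then be passed as start_page and end_page in a separate call, so that only a few pages are processed at a time.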

Related Resources

- MinerU Official Documentation: https://opendatalab.github.io/MinerU/
- MinerU GitHub: https://github.com/opendatalab/MinerU
- Online Demo: https://mineru.net/

