extract-tables-from-pdf: PDF Table Extraction

Extract tables from PDF documents using MinerU's table detection engine, which identifies and extracts structured table data from both native and scanned PDFs. Features: automatic table detection, extraction that preserves row/column structure, an OCR mode for scanned PDF tables, and handling of complex table layouts including merged cells and nested tables. Use when you need to: extract tables from a PDF, get table data from a PDF document, parse PDF tables into a structured format, pull data tables out

Author: admin | Source: ClawHub

Extract Tables from PDF

Use MinerU (mineru-open-api) to convert and extract content from .pdf files.

Install

bash
npm install -g mineru-open-api

Or install via Go (macOS/Linux):

go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest
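
After either install route, it can help to confirm that the binary is reachable before going further. The check below is a minimal sketch: the helper name `check_mineru_cli` is made up for illustration and is not part of mineru-open-api, and the hint assumes a default Go setup, where `go install` places binaries in `GOBIN`, or in `GOPATH/bin` when `GOBIN` is unset.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not part of mineru-open-api): report whether the CLI is on PATH.
check_mineru_cli() {
  if command -v mineru-open-api >/dev/null 2>&1; then
    echo "mineru-open-api found at: $(command -v mineru-open-api)"
  else
    # go install writes binaries to GOBIN, or GOPATH/bin when GOBIN is unset.
    echo 'mineru-open-api not found on PATH; if it was installed via Go, add "$(go env GOPATH)/bin" to PATH' >&2
    return 1
  fi
}
```

Call `check_mineru_cli` once after installing; a non-zero exit status means the shell cannot find the command yet.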

Quick Start

bash

Extract tables from a PDF (requires a token)

mineru-open-api extract report.pdf -o ./out/

Use the explicit table flag and OCR for scanned documents

mineru-open-api extract scanned.pdf --ocr --table -o ./out/
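
The same extract command can be applied to a whole directory of PDFs. The loop below is a sketch under a few assumptions: `extract_all_pdfs` is a hypothetical helper name, one output subdirectory per document is an arbitrary layout choice, and the output directory is created up front because this page does not say whether the CLI creates it.

```bash
#!/usr/bin/env bash
# Hypothetical helper: run `mineru-open-api extract` on every .pdf in a directory.
# Usage: extract_all_pdfs <input-dir> <output-dir> [extra flags, e.g. --ocr --table]
extract_all_pdfs() {
  local src="$1" out="$2" failed=0 pdf name
  shift 2
  for pdf in "$src"/*.pdf; do
    [ -e "$pdf" ] || continue          # the glob matched nothing
    name=$(basename "$pdf" .pdf)
    mkdir -p "$out/$name"              # one output directory per document (assumed layout)
    # "$@" forwards any extra flags unchanged to the CLI.
    mineru-open-api extract "$pdf" "$@" -o "$out/$name/" || failed=1
  done
  return "$failed"
}
```

For example, `extract_all_pdfs ./scans ./out --ocr --table` processes every scanned PDF in `./scans` and returns a non-zero status if any single extraction failed.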

Authentication

The extract and crawl commands require a token:

bash
mineru-open-api auth # Interactive token setup
export MINERU_TOKEN=your-token # Or set it via an environment variable

Create a token at: https://mineru.net/apiManage/token
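
In scripts, a missing token is easier to diagnose if it is caught before the CLI is invoked. The guard below is a sketch, not part of mineru-open-api: `require_mineru_token` is a hypothetical name, and it only inspects the `MINERU_TOKEN` environment variable, so a token saved through `mineru-open-api auth` would presumably not be seen by it.

```bash
#!/usr/bin/env bash
# Hypothetical guard: fail early when MINERU_TOKEN is not exported.
# Assumption: the script relies on the environment variable, not on `mineru-open-api auth`.
require_mineru_token() {
  if [ -z "${MINERU_TOKEN:-}" ]; then
    echo "MINERU_TOKEN is not set; run 'mineru-open-api auth' or export MINERU_TOKEN" >&2
    return 1
  fi
}
```

A typical use is `require_mineru_token && mineru-open-api extract report.pdf -o ./out/`, which skips the extraction when no token is exported.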

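Capabilities

- Supports local files and URLs
- Requires a token (mineru-open-api auth or the MINERU_TOKEN environment variable)
- Supported input format: .pdf
- Use --language to specify the language (default: ch; use en for English)
- Use --pages to specify a page range (where applicable)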
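
The options listed above can be combined in a small wrapper. This is a sketch only: `extract_tables` is a hypothetical helper, and the exact value syntax accepted by `--pages` (written as `1-5` in the usage note) is an assumption that should be checked against the CLI's own help output.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper around `mineru-open-api extract`.
# Usage: extract_tables <file-or-url> <out-dir> [language] [pages]
extract_tables() {
  local input="$1" out="$2" lang="${3:-}" pages="${4:-}"
  # Rebuild the argument list, adding optional flags only when a value was given.
  set -- extract "$input" -o "$out"
  if [ -n "$lang" ]; then
    set -- "$@" --language "$lang"   # e.g. en; the CLI defaults to ch
  fi
  if [ -n "$pages" ]; then
    set -- "$@" --pages "$pages"     # value syntax is an assumption
  fi
  mineru-open-api "$@"
}
```

For example, `extract_tables report.pdf ./out/ en 1-5` requests English and a page range, while `extract_tables https://example.com/report.pdf ./out/` (a placeholder URL) passes a remote document with the defaults.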

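Notes

- Table recognition requires the extract command with a token. flash-extract does not support tables. Use the --table flag (enabled by default).
- Output goes to stdout by default; use -o to save to a file
- Binary formats (docx) require the -o flag (they cannot be streamed to stdout)
- All progress/status messages are written to stderr
- MinerU is an open-source project developed by OpenDataLab (Shanghai AI Laboratory): https://github.com/opendatalab/MinerU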
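
Because results go to stdout and progress goes to stderr, the two streams can be captured separately with ordinary shell redirection. The helper below is a sketch; `extract_to_file` and the file names in the usage note are hypothetical, and this page does not state which format the CLI writes to stdout.

```bash
#!/usr/bin/env bash
# Hypothetical helper: keep the extraction result and the progress log apart.
# Usage: extract_to_file <input.pdf> <result-file> <log-file>
extract_to_file() {
  local input="$1" result="$2" log="$3"
  # No -o flag: the result is written to stdout, progress/status to stderr.
  mineru-open-api extract "$input" >"$result" 2>"$log"
}
```

For example, `extract_to_file report.pdf result.out extract.log` leaves the extracted content in `result.out` and the status messages in `extract.log`. Binary formats such as docx still need `-o`, as noted above.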
