table-ocr: Table OCR Recognition

OCR and extract tables from scanned PDFs and images using MinerU. Recognizes table structures in image-based documents and converts them to structured Markdown. Features: table detection and recognition from PDFs and images (.png, .jpg, .jpeg, .webp). OCR for scanned documents with image-embedded tables. Supports complex table layouts with merged cells. Combined OCR and table extraction in one pass. Use when you need to: extract tables from scanned PDFs, OCR tables from images, convert image tab

Author: admin | Source: ClawHub

Table OCR

Use MinerU (mineru-open-api) to convert and extract content from .pdf files and images (.png/.jpg/.jpeg/.webp).

Install

bash
npm install -g mineru-open-api

Or install via Go (macOS/Linux):

go install github.com/opendatalab/MinerU-Ecosystem/cli/mineru-open-api@latest

Quick Start

bash

Extract tables from a PDF (requires a token)

mineru-open-api extract report.pdf -o ./out/

Use the explicit table and OCR flags for scanned documents

mineru-open-api extract scanned.pdf --ocr --table -o ./out/
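
To apply the same command to a whole folder of scans, a small loop is usually enough. The following is a minimal sketch, not part of the tool: the function name `extract_scanned_dir` and the `MINERU_BIN` override variable are hypothetical, and it assumes that `-o` accepts the same output directory for every input, as in the commands above.

```shell
# Hypothetical batch helper (not provided by mineru-open-api).
# MINERU_BIN is an assumed override variable, useful for dry runs
# such as MINERU_BIN=echo; the CLI itself does not read it.
extract_scanned_dir() {
    in_dir=$1
    out_dir=$2
    for f in "$in_dir"/*.pdf "$in_dir"/*.png "$in_dir"/*.jpg "$in_dir"/*.jpeg "$in_dir"/*.webp; do
        # Skip patterns that matched no file.
        [ -e "$f" ] || continue
        # Same flags as the scanned-document command above; stop on the first failure.
        "${MINERU_BIN:-mineru-open-api}" extract "$f" --ocr --table -o "$out_dir" || return 1
    done
}
```

Calling `MINERU_BIN=echo extract_scanned_dir ./scans ./out/` prints the commands that would run without contacting the service.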

Authentication

The extract and crawl commands require a token:

bash
mineru-open-api auth # interactive token setup
export MINERU_TOKEN=your-token # or set it via an environment variable

Create a token at: https://mineru.net/apiManage/token
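
In scripts it can help to fail early when the token is missing. The sketch below is an illustration only: `require_mineru_token` is a hypothetical helper, and it checks just the `MINERU_TOKEN` environment variable, so it will not notice a token saved through `mineru-open-api auth`.

```shell
# Hypothetical guard (not provided by mineru-open-api).
# Only inspects the MINERU_TOKEN environment variable; a token stored
# by `mineru-open-api auth` is not detected by this check.
require_mineru_token() {
    if [ -z "${MINERU_TOKEN:-}" ]; then
        echo "MINERU_TOKEN is not set; run 'mineru-open-api auth' or export it" >&2
        return 1
    fi
    return 0
}
```

A typical use would be `require_mineru_token && mineru-open-api extract report.pdf -o ./out/`.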

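Capabilities

- Supports local files and URLs
- Requires a token (mineru-open-api auth or the MINERU_TOKEN environment variable)
- Supported input formats: .pdf and images (.png/.jpg/.jpeg/.webp)
- Set the language with --language (default: ch; use en for English)
- Set a page range with --pages (where applicable)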

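Notes

- Table recognition requires running the extract command with a token. Use --ocr for scanned content and --table for table detection (extract enables both by default).
- Output goes to standard output by default; use -o to save it to a file.
- Binary formats (docx) require the -o flag because they cannot be streamed to standard output.
- All progress and status messages go to standard error.
- MinerU is an open-source project from OpenDataLab (Shanghai AI Laboratory): https://github.com/opendatalab/MinerU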
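
Because results go to standard output and progress goes to standard error, the two streams can be captured separately with ordinary shell redirection. The wrapper below is a minimal sketch under that assumption; `extract_to_file` and the `MINERU_BIN` override variable are hypothetical names, not part of the CLI.

```shell
# Hypothetical wrapper (not provided by mineru-open-api).
# Sends the extracted result (stdout) to one file and the progress and
# status messages (stderr) to another.
# MINERU_BIN is an assumed override variable for dry runs; the CLI does not read it.
extract_to_file() {
    input=$1
    result_file=$2
    log_file=$3
    "${MINERU_BIN:-mineru-open-api}" extract "$input" >"$result_file" 2>"$log_file"
}
```

For example, `extract_to_file report.pdf report.md extract.log` would keep the extracted text apart from the status messages.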
