PaddleOCR Document Parsing Skill

When to Use This Skill

Use Document Parsing for:

- Documents with tables (invoices, financial reports, spreadsheets)
- Documents with mathematical formulas (academic papers, scientific documents)
- Documents with charts and diagrams
- Multi-column layouts (newspapers, magazines, brochures)
- Complex document structures requiring layout analysis
- Any document requiring structured understanding

Use Text Recognition instead for:

- Simple text-only extraction
- Quick OCR tasks where speed is critical
- Screenshots or simple images with clear text

Installation

Install the Python dependencies before using this skill. From the skill directory (skills/paddleocr-doc-parsing), run:

```bash
pip install -r scripts/requirements.txt
```

Optional: for document optimization and split_pdf.py (page extraction), also install:

```bash
pip install -r scripts/requirements-optimize.txt
```

How to Use This Skill

⛔ MANDATORY RESTRICTIONS - DO NOT VIOLATE ⛔

1. ONLY use the PaddleOCR Document Parsing API - Execute the script python scripts/vl_caller.py
2. NEVER parse documents directly - Do NOT parse documents yourself
3. NEVER offer alternatives - Do NOT suggest "I can try to analyze it" or similar
4. IF the API fails - Display the error message and STOP immediately
5. NO fallback methods - Do NOT attempt document parsing any other way

If the script execution fails (API not configured, network error, etc.):

- Show the error message to the user
- Do NOT offer to help using your vision capabilities
- Do NOT ask "Would you like me to try parsing it?"
- Simply stop and wait for the user to fix the configuration

Basic Workflow

1. Execute document parsing:

```bash
python scripts/vl_caller.py --file-url "URL provided by user" --pretty
```

Or, for local files:

```bash
python scripts/vl_caller.py --file-path "file path" --pretty
```

Optional: explicitly set the file type:

```bash
python scripts/vl_caller.py --file-url "URL provided by user" --file-type 0 --pretty
```

- --file-type 0: PDF
- --file-type 1: image
- If omitted, the service can infer file type from input.
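When an agent assembles the command programmatically, the choice between --file-url and --file-path and the optional --file-type value can be derived from the input. The sketch below is a minimal illustration, not part of the skill: the helper names and the image-suffix set are assumptions, and returning None simply omits --file-type so the service infers the type.

```python
import shlex
from pathlib import Path
from typing import List, Optional

# Assumed image suffixes; the skill itself only documents 0 = PDF, 1 = image.
IMAGE_SUFFIXES = {".png", ".jpg", ".jpeg", ".bmp", ".tif", ".tiff"}


def guess_file_type(name: str) -> Optional[int]:
    """Return 0 for PDF, 1 for a known image suffix, or None to let the service infer."""
    suffix = Path(name).suffix.lower()
    if suffix == ".pdf":
        return 0
    if suffix in IMAGE_SUFFIXES:
        return 1
    return None


def build_command(source: str, pretty: bool = True) -> List[str]:
    """Build a vl_caller.py argument list for a URL or a local file path."""
    is_url = source.startswith(("http://", "https://"))
    cmd = ["python", "scripts/vl_caller.py",
           "--file-url" if is_url else "--file-path", source]
    # Ignore a URL query string (e.g. "?sig=...") when looking at the suffix.
    name = source.split("?", 1)[0] if is_url else source
    file_type = guess_file_type(name)
    if file_type is not None:
        cmd += ["--file-type", str(file_type)]
    if pretty:
        cmd.append("--pretty")
    return cmd


print(shlex.join(build_command("https://example.com/paper.pdf")))
# python scripts/vl_caller.py --file-url https://example.com/paper.pdf --file-type 0 --pretty
```

Passing --file-type only when the suffix is unambiguous keeps the service's own inference as the default for anything else.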

Default behavior: save raw JSON to a temp file:
- If --output is omitted, the script saves automatically under the system temp directory
- Default path pattern: <system-temp>/paddleocr/doc-parsing/results/result_<timestamp>_<id>.json
- If --output is provided, it overrides the default temp-file destination
- If --stdout is provided, JSON is printed to stdout and no file is saved
- In save mode, the script prints the absolute saved path on stderr: Result saved to: /absolute/path/...
- In default/custom save mode, read and parse the saved JSON file before responding
- In save mode, always tell the user the saved file path and that the full raw JSON is available there
- Use --stdout only when you explicitly want to skip file persistence
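In save mode the agent has to recover the file path from stderr before reading the JSON. A minimal sketch, assuming the stderr line has exactly the "Result saved to: <path>" form shown above and that the script exits with a non-zero status on failure (an assumption; the exit status is not documented here). The helper names are hypothetical.

```python
import json
import subprocess
import sys
from pathlib import Path
from typing import Any, Dict, List, Optional

SAVED_PREFIX = "Result saved to:"


def find_saved_path(stderr_text: str) -> Optional[Path]:
    """Return the path from the 'Result saved to: ...' stderr line, if any."""
    for line in stderr_text.splitlines():
        line = line.strip()
        if line.startswith(SAVED_PREFIX):
            return Path(line[len(SAVED_PREFIX):].strip())
    return None


def run_and_load(args: List[str]) -> Dict[str, Any]:
    """Run vl_caller.py in save mode (no --stdout) and load the saved JSON."""
    proc = subprocess.run(
        [sys.executable, "scripts/vl_caller.py", *args],
        capture_output=True,
        text=True,
    )
    saved = find_saved_path(proc.stderr)
    # Assumption: a non-zero exit status means failure. Show the script's
    # own error text and stop; no fallback parsing is attempted.
    if proc.returncode != 0 or saved is None:
        raise RuntimeError(proc.stderr.strip() or "vl_caller.py produced no result path")
    with saved.open(encoding="utf-8") as fh:
        return json.load(fh)
```

For example, run_and_load(["--file-path", "./financial_report.pdf"]) would return the envelope; the path found by find_saved_path is the one to report to the user.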

2. The output JSON contains COMPLETE content with all document data:

- Headers, footers, page numbers
- Main text content
- Tables with structure
- Formulas (with LaTeX)
- Figures and charts
- Footnotes and references
- Seals and stamps
- Layout and reading order

Input type note:
- Supported file types depend on the model and endpoint configuration.
- Always follow the file type constraints documented by your endpoint API.

3. Extract what the user needs from the output JSON using these fields:

- Top-level text
- result[n].markdown
- result[n].prunedResult
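A sketch of pulling those three fields out of a loaded envelope. It assumes, based on the result[n] notation, that result is a list of per-page objects; it also tolerates markdown being either a string or an object with a "text" key, which is a guess about the provider's shape rather than something this skill documents. The function names are hypothetical.

```python
from typing import Any, Dict


def page_markdown(page: Dict[str, Any]) -> str:
    """Return a page's markdown as a string.

    Assumption: markdown is either a plain string or an object with a "text" key.
    """
    md = page.get("markdown") or ""
    if isinstance(md, dict):
        return md.get("text", "")
    return md


def extract_fields(envelope: Dict[str, Any]) -> Dict[str, Any]:
    """Collect top-level text plus per-page markdown and prunedResult."""
    pages = envelope.get("result") or []
    # Assumption: result is a list of per-page objects (result[n] notation).
    if not isinstance(pages, list):
        raise TypeError("expected result to be a list of page objects")
    return {
        "text": envelope.get("text", ""),
        "page_markdown": [page_markdown(p) for p in pages],
        "page_pruned": [p.get("prunedResult") for p in pages],
    }
```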

IMPORTANT: Complete Content Display

CRITICAL: You must display the COMPLETE extracted content to the user based on their needs.

- The output JSON contains ALL document content in a structured format
- In save mode, the raw provider result can be inspected in the saved JSON file
- Display the full content requested by the user; do NOT truncate or summarize
- If the user asks for "all text", show the entire text field
- If the user asks for "tables", show ALL tables in the document
- If the user asks for "main content", filter out headers/footers but show ALL body text
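For "main content" requests, header/footer filtering can be done on the structured page data. This sketch assumes each page's prunedResult carries a parsing_res_list of blocks with block_label and block_content keys, as PaddleOCR 3.x layout-parsing outputs commonly do; the label set below is hypothetical, so confirm the actual labels against references/output_schema.md before relying on it.

```python
from typing import Any, Dict, Iterable, List, Optional

# Hypothetical labels treated as non-body content; adjust to your deployment.
NON_BODY_LABELS = {"header", "footer", "number"}


def body_blocks(pruned_pages: Iterable[Optional[Dict[str, Any]]]) -> List[str]:
    """Return block contents in reading order, minus header/footer-like blocks.

    Assumption: each prunedResult has a "parsing_res_list" of blocks with
    "block_label" and "block_content" keys.
    """
    out: List[str] = []
    for page in pruned_pages:
        for block in (page or {}).get("parsing_res_list", []):
            if block.get("block_label") in NON_BODY_LABELS:
                continue  # skip running headers, footers, page numbers
            out.append(block.get("block_content", ""))
    return out
```

Everything not in the excluded set is kept, so the filter errs toward showing ALL body text.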

What this means:

- DO: Display complete text, all tables, all formulas as requested
- DO: Present content using these fields: top-level text, result[n].markdown, and result[n].prunedResult
- DON'T: Truncate with "..." unless the content is excessively long (>10,000 chars)
- DON'T: Summarize or provide excerpts when the user asks for full content
- DON'T: Say "Here's a preview" when the user expects complete output
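The 10,000-character rule can be expressed as a simple guard. In this sketch (the function name and the trailing note format are hypothetical), content at or under the threshold is always returned whole; only longer content is cut, and the note points the user at the saved JSON file so nothing is hidden.

```python
from typing import Optional

MAX_UNTRUNCATED_CHARS = 10_000  # threshold from the rule above


def render_for_user(content: str, saved_path: Optional[str] = None) -> str:
    """Return content whole unless it exceeds the threshold."""
    if len(content) <= MAX_UNTRUNCATED_CHARS:
        return content
    hidden = len(content) - MAX_UNTRUNCATED_CHARS
    note = f"\n\n... [{hidden:,} more characters not shown]"
    if saved_path:
        # Tell the user where the complete raw JSON lives.
        note += f" Full raw JSON: {saved_path}"
    return content[:MAX_UNTRUNCATED_CHARS] + note
```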

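Example - Correct:

```
User: Extract all text from this document
Agent: I've parsed the complete document. Here's all the extracted text:

[Display the entire text field or the concatenated regions in reading order]

Document Statistics:
- Total regions: 25
- Text blocks: 15
- Tables: 3
- Formulas: 2

Quality: Excellent (confidence: 0.92)
```

Example - Incorrect:

```
User: Extract all text
Agent: I found a document with multiple sections. Here's the beginning:
"Introduction..." (content truncated for brevity)
```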

Understanding the JSON Response

The output JSON uses an envelope wrapping the raw API result:

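```json
{
  "ok": true,
  "text": "Full markdown/HTML text extracted from all pages",
  "result": { ... },  // raw provider response
  "error": null
}
```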

Key fields:

- text - extracted markdown text from all pages (use this for quick text display)
- result - raw provider response object
- result[n].prunedResult - structured parsing output for each page (layout/content/confidence and related metadata)
- result[n].markdown - full rendered output for each page in markdown/HTML

Raw result location (default): the temp-file path printed by the script on stderr
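Because the envelope carries both ok and error, a caller can check the status before touching text or result and stop on failure, as the mandatory restrictions require. A minimal sketch; the exception class and function name are hypothetical, and the error value is stringified because its exact type is not documented here.

```python
from typing import Any, Dict


class ParsingFailed(Exception):
    """Envelope reported failure: show the message to the user and stop."""


def require_ok(envelope: Dict[str, Any]) -> Dict[str, Any]:
    """Return the envelope if ok is true, otherwise raise with its error."""
    if envelope.get("ok") is not True:
        raise ParsingFailed(str(envelope.get("error") or "unknown error"))
    return envelope
```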

Usage Examples

Example 1: Extract Full Document Text
```bash
python scripts/vl_caller.py \
  --file-url "https://example.com/paper.pdf" \
  --pretty
```

Then use:

- Top-level text for quick full-text output
- result[n].markdown when page-level output is needed

Example 2: Extract Structured Page Data
```bash
python scripts/vl_caller.py \
  --file-path ./financial_report.pdf \
  --pretty
```

Then use:

- result[n].prunedResult for structured parsing data (layout/content/confidence)
- result[n].markdown for rendered page content

Example 3: Print JSON Without Saving
```bash
python scripts/vl_caller.py \
  --file-url "URL" \
  --stdout \
  --pretty
```

Then return:

- Full text when the user asks for full document content
- result[n].prunedResult and result[n].markdown when the user needs complete structured page data
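Example 3 can also be driven from Python when the agent wants the envelope in memory without a temp file. A sketch under stated assumptions: stdout contains only the JSON document in --stdout mode, and the 660-second subprocess timeout (a hypothetical value) sits a little above the 600-second default API timeout so the script's own timeout error surfaces first. Function names are hypothetical.

```python
import json
import subprocess
import sys
from typing import Any, Dict


def load_stdout_envelope(stdout: str, stderr: str) -> Dict[str, Any]:
    """Parse the JSON envelope printed in --stdout mode."""
    if not stdout.strip():
        # Nothing on stdout: report the script's stderr verbatim and stop.
        raise RuntimeError(stderr.strip() or "vl_caller.py printed no JSON")
    return json.loads(stdout)


def parse_url_to_memory(file_url: str, timeout_s: float = 660) -> Dict[str, Any]:
    """Run Example 3 (--stdout, nothing saved) and return the parsed envelope."""
    proc = subprocess.run(
        [sys.executable, "scripts/vl_caller.py", "--file-url", file_url, "--stdout"],
        capture_output=True,
        text=True,
        timeout=timeout_s,  # hypothetical margin above the 600 s API default
    )
    return load_stdout_envelope(proc.stdout, proc.stderr)
```

--pretty is omitted because it only changes whitespace; json.loads accepts either form.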

First-Time Configuration

When API is not configured:

The error will show:
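```
CONFIG_ERROR: PADDLEOCR_DOC_PARSING_API_URL not configured. Please set it to your Triton endpoint, e.g.: http://10.0.0.1:8020/v2/models/layout-parsing/infer
```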

Configuration workflow:

1. Show the exact error message to the user.

2. Guide the user to configure:

- Set PADDLEOCR_DOC_PARSING_API_URL to the full Triton inference endpoint URL.
  - Format: http://<host>:<port>/v2/models/layout-parsing/infer
  - Example: http://10.0.133.33:8020/v2/models/layout-parsing/infer
- If the service is behind nginx with Basic Auth, also set:
  - PADDLEOCR_BASIC_AUTH_USER - nginx username (e.g. ocr_admin)
  - PADDLEOCR_BASIC_AUTH_PASSWORD - nginx password
- PADDLEOCR_ACCESS_TOKEN is not required for local deployments. Leave it empty or omit it.
- Optionally set PADDLEOCR_DOC_PARSING_TIMEOUT (default: 600 seconds).
- In OpenClaw, set the environment variables in ~/.openclaw/openclaw.json: CODEBLOCK12

3. Ask the user to confirm the environment is configured.

4. Retry only after confirmation:

- Once the user confirms the environment variables are set, retry the original parsing task.
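Before retrying, the variables from step 2 can be sanity-checked locally. This sketch does not contact the service and is not part of the skill; its checks (URL shape, Basic Auth user and password set together, numeric positive timeout) are heuristics derived from the configuration notes above, and the function name is hypothetical.

```python
import os
from typing import List, Mapping


def check_config(env: Mapping[str, str] = os.environ) -> List[str]:
    """Return a list of configuration problems (an empty list means OK)."""
    problems: List[str] = []
    url = env.get("PADDLEOCR_DOC_PARSING_API_URL", "").strip()
    if not url:
        problems.append("PADDLEOCR_DOC_PARSING_API_URL is not set")
    elif not url.startswith(("http://", "https://")):
        problems.append("PADDLEOCR_DOC_PARSING_API_URL must be an http(s) URL")
    elif not url.rstrip("/").endswith("/infer"):
        # Heuristic: the documented format ends in /v2/models/<model>/infer.
        problems.append("PADDLEOCR_DOC_PARSING_API_URL should end with /infer")

    # Basic Auth only makes sense with both values present (or neither).
    user = env.get("PADDLEOCR_BASIC_AUTH_USER", "")
    password = env.get("PADDLEOCR_BASIC_AUTH_PASSWORD", "")
    if bool(user) != bool(password):
        problems.append("set both PADDLEOCR_BASIC_AUTH_USER and PADDLEOCR_BASIC_AUTH_PASSWORD, or neither")

    timeout = env.get("PADDLEOCR_DOC_PARSING_TIMEOUT", "")
    if timeout:
        try:
            if float(timeout) <= 0:
                problems.append("PADDLEOCR_DOC_PARSING_TIMEOUT must be positive")
        except ValueError:
            problems.append("PADDLEOCR_DOC_PARSING_TIMEOUT must be a number of seconds")
    return problems
```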

Handling Large Files

There is no file size limit for the API. For PDFs, the maximum is 100 pages per request.
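For PDFs longer than the per-request limit, the page range has to be split into batches. A minimal sketch of the arithmetic (1-based, inclusive ranges; the function name is hypothetical); each range would then be extracted and parsed as a separate request.

```python
from typing import List, Tuple

MAX_PAGES_PER_REQUEST = 100  # PDF limit stated above


def page_batches(total_pages: int, limit: int = MAX_PAGES_PER_REQUEST) -> List[Tuple[int, int]]:
    """Split pages 1..total_pages into inclusive (start, end) ranges of at most `limit` pages."""
    if total_pages < 0 or limit < 1:
        raise ValueError("total_pages must be >= 0 and limit >= 1")
    return [(start, min(start + limit - 1, total_pages))
            for start in range(1, total_pages + 1, limit)]


print(page_batches(250))  # [(1, 100), (101, 200), (201, 250)]
```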

Tips for large files:

Use URL for Large Local Files (Recommended)

For very large local files, prefer --file-url over --file-path to avoid base64 encoding overhead: CODEBLOCK13
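The overhead being avoided is easy to quantify: standard base64 emits 4 output bytes for every 3 input bytes (rounded up, with padding), so a local file sent inline grows by about a third before any JSON framing. The sketch assumes --file-path sends the file as padded standard base64, as the tip above implies; the function name is hypothetical.

```python
import base64


def base64_size(n_bytes: int) -> int:
    """Bytes of padded standard base64 output for n_bytes of input."""
    return 4 * ((n_bytes + 2) // 3)


# Sanity check against the standard library encoder.
assert base64_size(10) == len(base64.b64encode(bytes(10)))

size = 30 * 1024 * 1024  # a 30 MiB local PDF, for example
print(base64_size(size) / 2**20)  # 40.0 (MiB after encoding)
```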

Process Specific Pages (PDF Only)

If you only need certain pages from a large PDF, extract them first: CODEBLOCK14

Error Handling

Service unreachable:

error: API request failed: ...

→ Check that the Triton service is running and PADDLEOCR_DOC_PARSING_API_URL is correct

Request timeout:

error: API request timed out after 600s

→ Increase PADDLEOCR_DOC_PARSING_TIMEOUT or check server load

Unsupported format:

error: Unsupported file format

→ The file format is not supported; convert the file to PDF, PNG, or JPG
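The three remediation hints above can be attached to an error message mechanically. This sketch assumes the messages begin with the prefixes shown (optionally after "error:"); the table and function name are hypothetical, and the helper only suggests the fix to show the user, with no retry or fallback.

```python
from typing import List, Optional, Tuple

# (message prefix, remediation hint) pairs taken from the list above.
HINTS: List[Tuple[str, str]] = [
    ("API request timed out", "Increase PADDLEOCR_DOC_PARSING_TIMEOUT or check server load."),
    ("API request failed", "Check that the Triton service is running and PADDLEOCR_DOC_PARSING_API_URL is correct."),
    ("Unsupported file format", "Convert the file to PDF, PNG, or JPG."),
]


def hint_for(error_message: str) -> Optional[str]:
    """Return the matching remediation hint, or None if the error is unrecognized."""
    msg = error_message.strip()
    if msg.lower().startswith("error:"):
        msg = msg[len("error:"):].strip()
    for prefix, hint in HINTS:
        if msg.startswith(prefix):
            return hint
    return None
```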

Important Notes

- The script NEVER filters content - It always returns complete data
- The AI agent decides what to present - Based on the user's specific request
- All data is always available - Can be re-interpreted for different needs
- No information is lost - Complete document structure preserved

Reference Documentation

- references/output_schema.md - Output format specification

Note: Model version and capabilities are determined by your Triton deployment (PADDLEOCR_DOC_PARSING_API_URL).

Load these reference documents into context when:

- Debugging complex parsing issues
- Understanding the output format
- Working with provider API details

Testing the Skill

To verify the skill is working properly:
CODEBLOCK18

This tests configuration and optionally API connectivity.


paddleocr-vl-locally (PaddleOCR local parsing)
