aiparse-ocr: Intelligent PDF Parsing

Parse PDF files using an LLM. **No registration required; a free trial is available.** Extract information from PDF files and return the results in JSON or Markdown format. Use this skill to extract structured data from PDF documents, convert PDF content to JSON or Markdown, or prepare PDF files for analysis. PDF files with many pages can take some time to process. Before declaring a task failed, agents must always use the task ID to check its status or monitor the output.

Author: admin | Source: ClawHub

AI Parse

A skill for parsing PDF files using Large Language Models.

Capabilities

- Extract information from PDF files
- Return results in JSON or Markdown format
- Resume processing from an existing task ID
- Save task ID information to a JSON file for reference

Parameters

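Parameter	Type	Required	Description
pdf_path	string	required	Path to the PDF file to process
result_path	string	required	Path where the parsed result is saved
format	string	required	Output format: json or md
task_id_path	string	required	Path where the task ID information is saved (JSON format)
--task-id	string	optional	Existing task ID used to resume processing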

Usage Examples

Normal Upload Mode

    python handler.py <pdf_path> <result_path> <format> <task_id_path>

Resume from Existing Task or Check Status

    python handler.py --task-id <task_id> <result_path>

Task ID File Format

When using normal upload mode, a task ID file will be created at task_id_path with the following JSON structure:

    {
        "task_id": "AAFXKO",
        "pdf_path": "test.pdf",
        "submit_time": "2026-04-04 00:33:27"
    }

This file can be used to:

- Track the submitted task
- Retrieve the task ID later for status checking
- Resume processing if interrupted
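
Retrieving the task ID later can be sketched in a few lines of Python. This is a minimal sketch, not part of handler.py: the helper name `read_task_id` is hypothetical, and it assumes the task ID file holds a JSON object with a `task_id` key.

```python
import json
from pathlib import Path


def read_task_id(task_id_path):
    """Return the task ID stored in a task ID file, or None if it cannot be read.

    Hypothetical helper; assumes the file is a JSON object with a "task_id" key.
    """
    path = Path(task_id_path)
    if not path.is_file():
        # No file yet, for example because the upload never succeeded.
        return None
    try:
        record = json.loads(path.read_text(encoding="utf-8"))
    except json.JSONDecodeError:
        # The file exists but does not contain valid JSON.
        return None
    if not isinstance(record, dict):
        return None
    return record.get("task_id")
```

Returning `None` instead of raising lets a caller distinguish "no task was submitted" from "a task exists and should be checked again".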

Implementation

Implemented by handler.py, which:

- Uploads PDF files to the processing service
- Polls for processing completion
- Downloads and saves results in the requested format
- Supports resuming from existing task IDs
- Saves task ID information to a JSON file
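
The polling step can be illustrated with a generic loop. This is a sketch under stated assumptions rather than the code in handler.py: `wait_for_task` and the `get_status` callable are hypothetical, and the status strings `processing`, `done`, and `failed` are placeholders, since the service's actual status values are not documented here.

```python
import time


def wait_for_task(get_status, task_id, interval=5.0, timeout=600.0, sleep=time.sleep):
    """Poll get_status(task_id) until a terminal state is reported or time runs out.

    get_status is a caller-supplied function (hypothetical) that returns a status
    string. The names "done" and "failed" are assumed terminal states.
    """
    deadline = time.monotonic() + timeout
    while True:
        status = get_status(task_id)
        if status in ("done", "failed"):
            return status
        if time.monotonic() >= deadline:
            # The caller stopped waiting; the task itself may still be running.
            return "timeout"
        sleep(interval)
```

A `timeout` result only means the caller stopped waiting. The task may still be processing, so the task ID should be kept for a later status check.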

Environment Requirements

- Python 3.6+
- requests library

Return Value

The parsed result will be saved to the specified result_path in the requested format:

- JSON format: Structured JSON with task details and extracted content
- Markdown format: Formatted Markdown with page-by-page content
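
Reading the saved result back might look like the following. This is a minimal sketch: `load_result` is a hypothetical helper, it assumes the result file is UTF-8 text and that the format is given as `json` or `md`, and it makes no assumption about the keys inside the JSON output.

```python
import json
from pathlib import Path


def load_result(result_path, fmt):
    """Load a saved result: parsed JSON for "json", raw text for "md".

    Hypothetical helper; the structure of the JSON content is not inspected.
    """
    if fmt not in ("json", "md"):
        raise ValueError("unsupported format: {!r}".format(fmt))
    text = Path(result_path).read_text(encoding="utf-8")
    if fmt == "json":
        return json.loads(text)
    return text
```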

Notes

- For large PDF files, processing may take several minutes
- Free users can process 30 PDF pages; visit https://api.pinocch.com/index for extra trial credits
- The --task-id parameter can be used to resume processing if it is interrupted
- Check the console output for processing progress and status updates
- The task ID file is created immediately after a successful upload
- IMPORTANT FOR AGENTS: Before declaring a task as failed, always use the task ID to check the task's current status. Use the --task-id parameter to resume or verify the task. The task may still be processing or may have completed successfully.

AI Parse

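A skill for parsing PDF files using large language models.

Capabilities

- Extract information from PDF files
- Return results in JSON or Markdown format
- Resume processing from an existing task ID
- Save task ID information to a JSON file for reference

Parameters

Parameter	Type	Required	Description
pdf_path	string	required	Path to the PDF file to process
result_path	string	required	Path where the parsed result is saved
format	string	required	Output format: json or md
task_id_path	string	required	Path where the task ID information is saved (JSON format)
--task-id	string	optional	Existing task ID used to resume processing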

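Usage Examples

Normal Upload Mode

    python handler.py <pdf_path> <result_path> <format> <task_id_path>

Resume from Existing Task or Check Status

    python handler.py --task-id <task_id> <result_path>

Task ID File Format

When using normal upload mode, a task ID file is created at task_id_path with the following JSON structure:

    {
        "task_id": "AAFXKO",
        "pdf_path": "test.pdf",
        "submit_time": "2026-04-04 00:33:27"
    }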

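This file can be used to:

- Track the submitted task
- Retrieve the task ID later for status checking
- Resume processing if interrupted

Implementation

Implemented by handler.py, which:

- Uploads PDF files to the processing service
- Polls for processing completion
- Downloads and saves results in the requested format
- Supports resuming from existing task IDs
- Saves task ID information to a JSON file

Environment Requirements

- Python 3.6+
- requests library

Return Value

The parsed result will be saved to the specified result_path in the requested format:

- JSON format: Structured JSON with task details and extracted content
- Markdown format: Formatted Markdown with page-by-page content

Notes

- For large PDF files, processing may take several minutes
- Free users can process 30 PDF pages; visit https://api.pinocch.com/index for extra trial credits
- The --task-id parameter can be used to resume processing if it is interrupted
- Check the console output for processing progress and status updates
- The task ID file is created immediately after a successful upload
- Note for agents: Before declaring a task as failed, always use the task ID to check the task's current status. Use the --task-id parameter to resume or verify the task. The task may still be processing or may have completed successfully.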
