AI Parse
A skill for parsing PDF files using Large Language Models.
Capabilities
- Extract information from PDF files
- Return results in JSON or Markdown format
- Resume processing from an existing task ID
- Save task ID information to a JSON file for reference
Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `pdf_path` | string | required | Path to the PDF file to process |
| `result_path` | string | required | Path to save the parsing result |
| `format` | string | required | Output format: "json" or "md" |
| `task_id_path` | string | required | Path to save task ID information (JSON format) |
| `--task-id` | string | optional | Existing task ID to resume processing |
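To illustrate how these parameters fit together, here is a minimal pre-flight check that could be run before calling the handler. `validate_params` is a hypothetical helper and not part of `handler.py`; the only facts taken from the table are the parameter names and the two allowed format values.

```python
import os

ALLOWED_FORMATS = ("json", "md")  # the two formats listed in the parameter table


def validate_params(pdf_path, result_path, fmt, task_id_path):
    """Hypothetical pre-flight check; returns a list of problems (empty if none)."""
    problems = []
    if not os.path.isfile(pdf_path):
        problems.append("pdf_path is not an existing file: %s" % pdf_path)
    if fmt not in ALLOWED_FORMATS:
        problems.append("format must be 'json' or 'md', got %r" % (fmt,))
    for name, value in (("result_path", result_path),
                        ("task_id_path", task_id_path)):
        if not value:
            problems.append("%s must not be empty" % name)
    return problems
```

An empty list means the arguments look usable. Whether `handler.py` performs similar checks itself is not stated here.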
Usage Examples
Normal Upload Mode
```bash
python handler.py <pdf_path> <result_path> <format> <task_id_path>
```
Resume from Existing Task or Check Status
```bash
python handler.py --task-id <task_id> <result_path>
```
Task ID File Format
When using normal upload mode, a task ID file will be created at task_id_path with the following JSON structure:
```json
{
  "task_id": "AAFXKO",
  "pdf_path": "test.pdf",
  "submit_time": "2026-04-04 00:33:27"
}
```
This file can be used to:
- Track the submitted task
- Retrieve the task ID later for status checking
- Resume processing if interrupted
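As a sketch of the second use, the snippet below reads the task ID back from a task ID file. It assumes the file holds a JSON object with a `task_id` field, as in the structure documented for this file; `read_task_id` is a hypothetical helper name, not something `handler.py` is known to expose.

```python
import json


def read_task_id(task_id_path):
    """Return the task ID stored in a task ID file, or None if unavailable."""
    try:
        with open(task_id_path, "r", encoding="utf-8") as fh:
            info = json.load(fh)
    except (OSError, ValueError):
        # Missing file or invalid JSON: there is nothing to resume from.
        return None
    if not isinstance(info, dict):
        return None
    return info.get("task_id")
```

The returned value can then be passed to `--task-id` when checking status or resuming.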
Implementation
Implemented by `handler.py`, which:
- Uploads PDF files to the processing service
- Polls for processing completion
- Downloads and saves results in the requested format
- Supports resuming from existing task IDs
- Saves task ID information to JSON file
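The polling step in the list above can be sketched generically. This is not the code in `handler.py`: the status strings `"done"` and `"failed"`, the default interval, and the timeout are assumptions chosen for illustration, and the actual status check is left to a caller-supplied function so that no service endpoint has to be guessed.

```python
import time


def poll_until_done(check_status, interval=5.0, timeout=600.0,
                    sleep=time.sleep, clock=time.monotonic):
    """Call check_status() until it reports a final state or time runs out.

    check_status is a caller-supplied function returning a status string.
    The names "done" and "failed" are assumptions made for this sketch.
    """
    deadline = clock() + timeout
    while True:
        status = check_status()
        if status in ("done", "failed"):
            return status
        if clock() >= deadline:
            return "timeout"  # the task may still be running remotely
        sleep(interval)
```

Returning `"timeout"` instead of raising keeps a slow task distinct from a failed one, which matters because a task that outlives the local wait can still be resumed by its task ID.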
Environment Requirements
- Python 3.6+
- requests library
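A quick way to confirm both requirements before running the handler is sketched below. `check_environment` is a hypothetical helper; it only tests the interpreter version and whether the named module can be found, not whether the installed `requests` version is compatible.

```python
import importlib.util
import sys


def check_environment(min_version=(3, 6), module="requests"):
    """Return a list of unmet requirements (empty if everything is present)."""
    problems = []
    if sys.version_info < min_version:
        problems.append("Python %d.%d or newer is required" % min_version)
    if importlib.util.find_spec(module) is None:
        problems.append("the %s library is not installed" % module)
    return problems
```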
Return Value
The parsed result is saved to the specified `result_path` in the requested format:
- JSON format: Structured JSON with task details and extracted content
- Markdown format: Formatted Markdown with page-by-page content
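Once the result has been saved, it can be read back according to the format that was requested. The sketch below assumes the file is UTF-8 encoded, which is not stated in this document, and makes no assumption about the keys inside the JSON result; `load_result` is a hypothetical helper name.

```python
import json


def load_result(result_path, fmt):
    """Load a saved result: parsed JSON for "json", raw text for "md"."""
    if fmt not in ("json", "md"):
        raise ValueError("unsupported format: %r" % (fmt,))
    # UTF-8 is an assumption of this sketch.
    with open(result_path, "r", encoding="utf-8") as fh:
        return json.load(fh) if fmt == "json" else fh.read()
```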
Notes
- For large PDF files, processing may take several minutes
- Free users can process 30 PDF pages; visit https://api.pinocch.com/index for extra trial credits
- The `--task-id` parameter can be used to resume processing if interrupted
- Check the console output for processing progress and status updates
- The task ID file is created immediately after successful upload
- IMPORTANT FOR AGENTS: Before declaring a task as failed, always use the task ID to check the current status of the task. Use the `--task-id` parameter to resume or verify the task status. The task may still be processing or have completed successfully.
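The guidance for agents can be expressed as a small decision rule. The function below is a hypothetical sketch: how a run is judged successful is left to the caller, and `task_info` is assumed to be the parsed contents of the task ID file (a dict with a `task_id` field) or `None` if no file was written.

```python
def next_step(run_succeeded, task_info):
    """Suggest a follow-up action after a handler.py run.

    run_succeeded: whether the caller considers the run successful
                   (how that is detected is left to the caller).
    task_info:     contents of the task ID file as a dict, or None.
    """
    if run_succeeded:
        return ("done", None)
    task_id = (task_info or {}).get("task_id")
    if task_id:
        # A task was submitted, so it may still be processing or already done.
        return ("check_status", task_id)
    # No task ID on record: nothing to resume, so the failure can be reported.
    return ("report_failure", None)
```

Only the `"report_failure"` outcome corresponds to declaring the task failed. In the `"check_status"` case, the returned task ID is the value to pass to `--task-id`.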
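AI Parse
A skill for parsing PDF files using large language models.
Capabilities
- Extract information from PDF files
- Return results in JSON or Markdown format
- Resume processing from an existing task ID
- Save task ID information to a JSON file for reference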
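Parameters
| Parameter | Type | Required | Description |
|---|---|---|---|
| `pdf_path` | string | required | Path to the PDF file to process |
| `result_path` | string | required | Path to save the parsing result |
| `format` | string | required | Output format: "json" or "md" |
| `task_id_path` | string | required | Path to save task ID information (JSON format) |
| `--task-id` | string | optional | Existing task ID to resume processing |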
Usage Examples
Normal Upload Mode
```bash
python handler.py <pdf_path> <result_path> <format> <task_id_path>
```
Resume from Existing Task or Check Status
```bash
python handler.py --task-id <task_id> <result_path>
```
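Task ID File Format
In normal upload mode, a task ID file is created at `task_id_path` with the following JSON structure:
```json
{
  "task_id": "AAFXKO",
  "pdf_path": "test.pdf",
  "submit_time": "2026-04-04 00:33:27"
}
```
This file can be used to:
- Track the submitted task
- Retrieve the task ID later for status checking
- Resume processing if interrupted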
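Implementation
Implemented by `handler.py`, which:
- Uploads PDF files to the processing service
- Polls for processing completion
- Downloads and saves results in the requested format
- Supports resuming from existing task IDs
- Saves task ID information to a JSON file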
Environment Requirements
- Python 3.6+
- requests library
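Return Value
The parsed result is saved to the specified `result_path` in the requested format:
- JSON format: Structured JSON with task details and extracted content
- Markdown format: Formatted Markdown with page-by-page content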
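Notes
- For large PDF files, processing may take several minutes
- Free users can process 30 PDF pages; visit https://api.pinocch.com/index for extra trial credits
- If processing is interrupted, the `--task-id` parameter can be used to resume it
- Check the console output for processing progress and status updates
- The task ID file is created immediately after successful upload
- Note for agents: Before declaring a task as failed, always use the task ID to check the current status of the task. Use the `--task-id` parameter to resume or verify the task status. The task may still be processing or have completed successfully.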