PDF Batch Processor
Apply common operations to multiple PDF files in one batch. Everything runs locally, so you avoid expensive online services and keep your data private.
Core Capabilities
1. Merge multiple PDFs
- Combine multiple PDF files into one
- Preserve page order
- Optionally add a table of contents
2. Split a PDF
- Split by page ranges
- Split each page into a separate file
- Extract specific pages
3. Rotate pages
- Rotate all pages or specific page ranges
- Support 90/180/270 degree rotation
4. Extract text
- Extract text from all pages
- Export to plain text or markdown
- Batch extract from multiple PDFs in a folder
5. Extract images
- Save all images from a PDF to separate image files
- Preserve original image quality when possible
6. Compress PDF
- Reduce file size for web/email
- Three compression levels (low/medium/high)
When to use this skill
✅ Use when:
- You have multiple PDFs that need the same operation
- You want to keep processing local (private, no uploads needed)
- You need to automate PDF processing in a workflow
❌ Don't use when:
- You only need to edit one page manually (use a GUI PDF editor)
- The PDF is encrypted (decrypt it first) or is a scanned, image-only file (run OCR first)
- You need advanced editing (add/remove content, edit text)
Notes
- Works with standard PDF files
- Scanned or image-only PDFs need OCR first; run an OCR tool before processing
- All processing is local; your files never leave your machine
Usage Examples
Merge multiple PDFs
```bash
python scripts/merge_pdfs.py --output combined.pdf file1.pdf file2.pdf file3.pdf
```
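For orientation, here is a minimal sketch of what a merge script along the lines of `scripts/merge_pdfs.py` might do internally. It is not the skill's actual implementation: the function name `merge_pdfs` is hypothetical, and it assumes a recent pypdf release in which `PdfWriter.append` accepts a file path. Inputs are appended in the order given, which is what keeps the page order intact.

```python
from pathlib import Path
from typing import Sequence


def merge_pdfs(inputs: Sequence[str], output: str) -> int:
    """Append every input PDF, in the order given, and write one output file.

    Returns the number of pages written. Hypothetical helper for
    illustration, not the skill's real script.
    """
    if not inputs:
        raise ValueError("at least one input PDF is required")
    missing = [p for p in inputs if not Path(p).is_file()]
    if missing:
        raise FileNotFoundError(f"input file(s) not found: {', '.join(missing)}")

    # Imported lazily so the argument checks above run even without pypdf.
    from pypdf import PdfWriter

    writer = PdfWriter()
    for path in inputs:
        writer.append(path)  # copies all pages of `path` onto the end
    page_count = len(writer.pages)
    writer.write(output)
    writer.close()
    return page_count
```

Calling `merge_pdfs(["file1.pdf", "file2.pdf", "file3.pdf"], "combined.pdf")` would mirror the command above.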
Split a PDF into individual pages
```bash
python scripts/split_pdfs.py --input document.pdf --output output-folder/ --mode pages
```
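The sketch below illustrates, under stated assumptions, how the two splitting styles could work with pypdf. The function names, the `"1-3,5"` range syntax, and the `<name>_page_001.pdf` output naming are all hypothetical; the source only says that splitting by page ranges and per page is supported.

```python
from pathlib import Path
from typing import List


def parse_page_ranges(spec: str, total_pages: int) -> List[int]:
    """Turn a spec such as "1-3,5" into zero-based page indices.

    The "1-3,5" syntax is an assumption made for this illustration.
    """
    indices: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        start_text, _, end_text = part.partition("-")
        start = int(start_text)
        end = int(end_text) if end_text else start
        if not 1 <= start <= end <= total_pages:
            raise ValueError(f"page range {part!r} is outside 1-{total_pages}")
        indices.extend(range(start - 1, end))
    return indices


def split_into_pages(input_pdf: str, output_dir: str) -> List[Path]:
    """Write each page of input_pdf to its own single-page PDF."""
    # Imported lazily so parse_page_ranges works without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    out = Path(output_dir)
    out.mkdir(parents=True, exist_ok=True)
    reader = PdfReader(input_pdf)
    stem = Path(input_pdf).stem
    written: List[Path] = []
    for number, page in enumerate(reader.pages, start=1):
        writer = PdfWriter()
        writer.add_page(page)
        target = out / f"{stem}_page_{number:03d}.pdf"  # assumed naming scheme
        writer.write(target)
        written.append(target)
    return written
```

Extracting specific pages would follow the same pattern: look up `reader.pages[i]` for each index returned by `parse_page_ranges` and add those pages to a single writer.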
Extract all text from PDFs in a folder
```bash
python scripts/extract_text.py --input ./pdfs/ --output ./text/
```
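As a rough illustration of batch text extraction, the following sketch pairs each PDF in a folder with a `.txt` output and fills it using pypdf's `extract_text()`. The function names, the non-recursive folder scan, and the `<name>.txt` output naming are assumptions for this example rather than documented behavior of `scripts/extract_text.py`.

```python
from pathlib import Path
from typing import List, Tuple


def plan_text_outputs(input_dir: str, output_dir: str) -> List[Tuple[Path, Path]]:
    """Pair each PDF directly inside input_dir with its target .txt file."""
    src, dst = Path(input_dir), Path(output_dir)
    pdfs = sorted(p for p in src.iterdir() if p.suffix.lower() == ".pdf")
    return [(pdf, dst / f"{pdf.stem}.txt") for pdf in pdfs]


def extract_folder(input_dir: str, output_dir: str) -> int:
    """Extract text from every PDF in input_dir; return how many were processed."""
    # Imported lazily so plan_text_outputs works without pypdf installed.
    from pypdf import PdfReader

    jobs = plan_text_outputs(input_dir, output_dir)
    Path(output_dir).mkdir(parents=True, exist_ok=True)
    for pdf_path, txt_path in jobs:
        reader = PdfReader(pdf_path)
        # Image-only pages typically yield no text here, which is why
        # scanned PDFs need OCR before this step.
        pages = [page.extract_text() or "" for page in reader.pages]
        txt_path.write_text("\n\n".join(pages), encoding="utf-8")
    return len(jobs)
```

Separating the planning step from the extraction step makes it easy to preview which files a batch run would touch before any output is written.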
Rotate all pages 90 degrees clockwise
```bash
python scripts/rotate_pdf.py --input input.pdf --output output.pdf --degrees 90
```
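A rotation script could be sketched as follows. The names `normalize_rotation` and `rotate_pdf` are hypothetical, and accepting negative or over-360 angles is an assumption added for convenience; the source only lists 90, 180, and 270 degrees. The sketch relies on pypdf's `PageObject.rotate`, which turns a page clockwise in 90-degree steps.

```python
def normalize_rotation(degrees: int) -> int:
    """Map a clockwise angle onto 0, 90, 180 or 270; reject anything else."""
    angle = degrees % 360
    if angle % 90 != 0:
        raise ValueError("rotation must be a multiple of 90 degrees")
    return angle


def rotate_pdf(input_pdf: str, output_pdf: str, degrees: int) -> None:
    """Rotate every page of input_pdf clockwise and write the result."""
    angle = normalize_rotation(degrees)

    # Imported lazily so normalize_rotation works without pypdf installed.
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(input_pdf)
    writer = PdfWriter()
    for page in reader.pages:
        # add_page returns the page object now owned by the writer.
        writer.add_page(page).rotate(angle)
    writer.write(output_pdf)
```

Rotating only a page range would mean applying `rotate` to the selected pages and adding the others unchanged.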
Installation
```bash
pip install pypdf pillow
```