
pdf-all-in-one: All-in-One PDF Tool

All-in-one PDF processing tool. Merge, split, extract, convert PDFs. Supports text extraction, table recognition, PDF-to-image conversion, OCR. Triggers: PDF, pdf.

Author: admin | Source: ClawHub
Version: V 1.0.2
Security check: passed
Downloads: 187
Price: free
Favorites: 0

pdf-all-in-one

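PDF All-in-One Processing Guide

Overview

This guide covers a broad range of PDF processing operations, including conversion to images. For advanced features, see REFERENCE.md.

Working directory: /pdf-all-in-one-workspace/

Quick Start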

python
from pypdf import PdfReader, PdfWriter

# Read the PDF
reader = PdfReader("document.pdf")
print(f"Pages: {len(reader.pages)}")

# Extract text from every page
text = ""
for page in reader.pages:
    text += page.extract_text()

PDF to Images

Convert PDF pages to PNG/JPG

python
from pdf2image import convert_from_path
import os

# Configuration
pdf_path = "input.pdf"
output_dir = "pdf-all-in-one-workspace/output_images"
os.makedirs(output_dir, exist_ok=True)

# Convert the PDF to a list of PIL images, one per page
images = convert_from_path(pdf_path, dpi=150)

# Save each page as a PNG
for i, image in enumerate(images):
    output_path = f"{output_dir}/page_{i+1}.png"
    image.save(output_path, "PNG")
    print(f"Saved: {output_path}")

print(f"Total pages converted: {len(images)}")
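The dpi value controls how large each rendered image is. A PDF page is measured in points (72 per inch), so the pixel size can be estimated before converting. The sketch below is only an approximation of what the renderer produces (exact rounding may differ by a pixel), and `pixel_size` is a hypothetical helper name, not part of pdf2image.

```python
POINTS_PER_INCH = 72  # PDF user-space units per inch


def pixel_size(width_pt: float, height_pt: float, dpi: int) -> tuple[int, int]:
    """Estimate the rendered size in pixels of a page measured in points."""
    return (
        round(width_pt * dpi / POINTS_PER_INCH),
        round(height_pt * dpi / POINTS_PER_INCH),
    )


if __name__ == "__main__":
    # US Letter is 612 x 792 points
    print(pixel_size(612, 792, 150))  # (1275, 1650)
    print(pixel_size(612, 792, 300))  # (2550, 3300)
```

Doubling the dpi quadruples the pixel count, which is worth keeping in mind for memory use on long documents.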

Convert a specific page range

python
from pdf2image import convert_from_path

# Convert pages 1-5 only
images = convert_from_path("document.pdf", dpi=200, first_page=1, last_page=5)

for i, image in enumerate(images):
    image.save(f"pdf-all-in-one-workspace/page_{i+1}.jpg", "JPEG", quality=95)
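The same first_page/last_page parameters can be used to process a long document in batches, so that only a few page images are held in memory at a time. This is a minimal sketch under assumptions: `page_chunks` and `convert_in_chunks` are hypothetical helper names, and the caller is assumed to know the total page count already (for example from `len(PdfReader(path).pages)`).

```python
from typing import Iterator


def page_chunks(total_pages: int, chunk_size: int) -> Iterator[tuple[int, int]]:
    """Yield 1-based, inclusive (first_page, last_page) pairs covering the document."""
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    for first in range(1, total_pages + 1, chunk_size):
        yield first, min(first + chunk_size - 1, total_pages)


def convert_in_chunks(pdf_path: str, total_pages: int, out_dir: str,
                      chunk_size: int = 10, dpi: int = 150) -> None:
    """Convert a PDF batch by batch; out_dir is assumed to exist already."""
    # Imported lazily so page_chunks can be used without pdf2image installed
    from pdf2image import convert_from_path

    for first, last in page_chunks(total_pages, chunk_size):
        images = convert_from_path(pdf_path, dpi=dpi, first_page=first, last_page=last)
        for offset, image in enumerate(images):
            # first + offset is the 1-based page number in the source PDF
            image.save(f"{out_dir}/page_{first + offset}.png", "PNG")
```

For a 12-page document and a chunk size of 5, `page_chunks` yields (1, 5), (6, 10), and (11, 12).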

Prerequisites

bash
# Install the Python library
pip install pdf2image

# Install the system dependency (poppler)
# Ubuntu/Debian:
sudo apt-get install poppler-utils

# CentOS/RHEL:
sudo yum install poppler-utils

# macOS:
brew install poppler
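Because pdf2image shells out to poppler's command-line programs, a quick check that they are on the PATH can save a confusing error later. The sketch below uses only the standard library; `missing_tools` is a hypothetical helper, and the tool names checked (pdftoppm and pdfinfo) are the poppler programs pdf2image is assumed to need here.

```python
import shutil
from typing import Callable, Iterable, Optional


def missing_tools(
    tools: Iterable[str],
    which: Callable[[str], Optional[str]] = shutil.which,
) -> list[str]:
    """Return the tools that cannot be found on the PATH."""
    return [tool for tool in tools if which(tool) is None]


if __name__ == "__main__":
    missing = missing_tools(["pdftoppm", "pdfinfo"])
    if missing:
        print("Missing poppler tools:", ", ".join(missing))
    else:
        print("poppler tools found")
```

The `which` parameter exists only so the lookup can be replaced in tests; normal callers can ignore it.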

Python Libraries

pypdf - Basic Operations

Merge PDFs

python
from pypdf import PdfWriter, PdfReader

writer = PdfWriter()
for pdf_file in ["doc1.pdf", "doc2.pdf", "doc3.pdf"]:
    reader = PdfReader(pdf_file)
    for page in reader.pages:
        writer.add_page(page)

with open("merged.pdf", "wb") as output:
    writer.write(output)

Split a PDF

python
from pypdf import PdfReader, PdfWriter

# Write each page to its own single-page PDF
reader = PdfReader("input.pdf")
for i, page in enumerate(reader.pages):
    writer = PdfWriter()
    writer.add_page(page)
    with open(f"pdf-all-in-one-workspace/page_{i+1}.pdf", "wb") as output:
        writer.write(output)
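Splitting does not have to be one file per page. A common variation is to pull a selection such as "1-3,5" into a single output file. The sketch below is one possible way to do that; `parse_page_ranges` and `extract_pages` are hypothetical helpers, and the range syntax is an assumption made for this example rather than anything pypdf defines.

```python
def parse_page_ranges(spec: str, total_pages: int) -> list[int]:
    """Turn a spec such as "1-3,5" (1-based, inclusive) into zero-based page indices."""
    indices: list[int] = []
    for part in spec.split(","):
        part = part.strip()
        if not part:
            continue
        start_text, _, end_text = part.partition("-")
        start = int(start_text)
        end = int(end_text) if end_text else start
        if not 1 <= start <= end <= total_pages:
            raise ValueError(f"invalid range {part!r} for a {total_pages}-page document")
        indices.extend(range(start - 1, end))
    return indices


def extract_pages(src: str, dst: str, spec: str) -> None:
    """Copy the selected pages of src into a new PDF at dst."""
    # Imported lazily so parse_page_ranges works without pypdf installed
    from pypdf import PdfReader, PdfWriter

    reader = PdfReader(src)
    writer = PdfWriter()
    for index in parse_page_ranges(spec, len(reader.pages)):
        writer.add_page(reader.pages[index])
    with open(dst, "wb") as output:
        writer.write(output)
```

For example, `extract_pages("input.pdf", "selection.pdf", "1-3,5")` would write pages 1, 2, 3, and 5 to selection.pdf.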

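Extract metadata

python
from pypdf import PdfReader

reader = PdfReader("document.pdf")
# metadata is None when the PDF carries no document information
meta = reader.metadata
print(f"Title: {meta.title}")
print(f"Author: {meta.author}")
print(f"Subject: {meta.subject}")
print(f"Creator: {meta.creator}")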
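Many PDFs leave some or all of these fields unset, and the metadata object itself may be missing. A small wrapper can substitute a placeholder instead of printing None or raising. This is a sketch only: `describe_metadata` is a hypothetical helper that works on any object exposing these four attributes, which is why it can be tried out with a plain namespace.

```python
from types import SimpleNamespace

FIELDS = ("title", "author", "subject", "creator")


def describe_metadata(meta, placeholder: str = "(not set)") -> dict[str, str]:
    """Return the four common metadata fields, replacing missing values."""
    # getattr with a default also covers the case where meta itself is None
    return {field: getattr(meta, field, None) or placeholder for field in FIELDS}


if __name__ == "__main__":
    sample = SimpleNamespace(title="Report", author=None, subject="", creator="Word")
    for field, value in describe_metadata(sample).items():
        print(f"{field}: {value}")
```

With pypdf, the call would be `describe_metadata(reader.metadata)`.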

Rotate pages

python
from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
writer = PdfWriter()

page = reader.pages[0]
page.rotate(90)  # rotate 90 degrees clockwise
writer.add_page(page)

with open("rotated.pdf", "wb") as output:
    writer.write(output)

pdfplumber - Text and Table Extraction

Extract text with layout

python
import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    for page in pdf.pages:
        text = page.extract_text()
        print(text)

Extract tables

python
import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    for i, page in enumerate(pdf.pages):
        tables = page.extract_tables()
        for j, table in enumerate(tables):
            print(f"Table {j+1} on page {i+1}:")
            for row in table:
                print(row)

Advanced table extraction

python
import pandas as pd
import pdfplumber

with pdfplumber.open("document.pdf") as pdf:
    all_tables = []
    for page in pdf.pages:
        tables = page.extract_tables()
        for table in tables:
            if table:
                # Treat the first row as the header
                df = pd.DataFrame(table[1:], columns=table[0])
                all_tables.append(df)

if all_tables:
    combined_df = pd.concat(all_tables, ignore_index=True)
    # Writing .xlsx requires an Excel engine such as openpyxl
    combined_df.to_excel("pdf-all-in-one-workspace/extracted_tables.xlsx", index=False)
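Extracted tables often contain None cells, blank rows, and empty or repeated header names, all of which make the DataFrame step above awkward. One possible cleanup pass, written in plain Python, is sketched below. `table_to_records` is a hypothetical helper, and the naming scheme for blank or duplicate headers (column_N, name_2) is just an assumption for illustration.

```python
def table_to_records(table):
    """Convert a table (list of rows, cells may be None) into a list of dicts."""
    if not table:
        return []

    # Build unique, non-empty header names from the first row
    header = []
    seen = {}
    for position, cell in enumerate(table[0]):
        name = (cell or "").strip() or f"column_{position + 1}"
        count = seen.get(name, 0)
        seen[name] = count + 1
        header.append(name if count == 0 else f"{name}_{count + 1}")

    records = []
    for row in table[1:]:
        cells = [(cell or "").strip() for cell in row]
        if any(cells):  # skip rows that are entirely empty
            records.append(dict(zip(header, cells)))
    return records
```

The resulting list of dicts can be passed straight to `pd.DataFrame(records)`.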

reportlab - Creating PDFs

Basic PDF creation

python
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

c = canvas.Canvas("pdf-all-in-one-workspace/hello.pdf", pagesize=letter)
width, height = letter

c.drawString(100, height - 100, "Hello World!")
c.drawString(100, height - 120, "This PDF was created with reportlab")
c.line(100, height - 140, 400, height - 140)
c.save()

Subscripts and superscripts

python
from reportlab.platypus import Paragraph
from reportlab.lib.styles import getSampleStyleSheet

styles = getSampleStyleSheet()
# Paragraph markup: <sub> for subscripts, <super> for superscripts
chemical = Paragraph("H<sub>2</sub>O", styles["Normal"])
squared = Paragraph("x<super>2</super> + y<super>2</super>", styles["Normal"])

Command-Line Tools

pdftotext (poppler-utils)

bash
# Extract text
pdftotext input.pdf output.txt

# Extract text, preserving layout
pdftotext -layout input.pdf output.txt

# Extract specific pages (1 through 5)
pdftotext -f 1 -l 5 input.pdf output.txt

qpdf

bash
# Merge PDFs
qpdf --empty --pages file1.pdf file2.pdf -- merged.pdf

# Extract pages 1-5 into a new file
qpdf input.pdf --pages . 1-5 -- pages1-5.pdf

# Rotate page 1 by 90 degrees clockwise
qpdf input.pdf output.pdf --rotate=+90:1

# Remove the password
qpdf --password=mypassword --decrypt encrypted.pdf decrypted.pdf
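These qpdf commands can also be driven from Python when they need to fit into a larger script. The sketch below builds the same merge command shown above as an argument list and runs it with subprocess. `qpdf_merge_command` and `merge_with_qpdf` are hypothetical helper names, and qpdf is assumed to be installed and on the PATH.

```python
import subprocess
from typing import Sequence


def qpdf_merge_command(inputs: Sequence[str], output: str) -> list[str]:
    """Build the argument list for: qpdf --empty --pages <inputs...> -- <output>."""
    if not inputs:
        raise ValueError("at least one input PDF is required")
    return ["qpdf", "--empty", "--pages", *inputs, "--", output]


def merge_with_qpdf(inputs: Sequence[str], output: str) -> None:
    """Run qpdf to merge the inputs; raises CalledProcessError on a non-zero exit."""
    subprocess.run(qpdf_merge_command(inputs, output), check=True)
```

Passing a list rather than a shell string avoids quoting problems with file names that contain spaces. Note that qpdf signals warnings with exit status 3, which `check=True` treats as a failure, so a stricter or looser policy may be preferable depending on the inputs.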

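pdftk

bash
# Merge
pdftk file1.pdf file2.pdf cat output merged.pdf

# Split into one file per page
pdftk input.pdf burst

# Rotate page 1 to face east (90 degrees clockwise)
pdftk input.pdf rotate 1east output rotated.pdf

pdfimages - Extract Images from a PDF

bash
# Extract all images, saving JPEG-encoded (DCT) images as .jpg files
pdfimages -j input.pdf pdf-all-in-one-workspace/output_prefix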

Common Tasks

Extract text from a scanned PDF (OCR)

python
import pytesseract
from pdf2image import convert_from_path

# Render each page to an image, then run OCR on it
images = convert_from_path("scanned.pdf")

text = ""
for i, image in enumerate(images):
    text += f"Page {i+1}:\n"
    text += pytesseract.image_to_string(image)
    text += "\n\n"

print(text)
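OCR is slow, so it can be worth checking first whether a PDF already has a usable text layer. One rough heuristic is to run ordinary text extraction and fall back to OCR only when very little text comes back. The sketch below illustrates that idea; `needs_ocr` is a hypothetical helper and the 20-character threshold is an arbitrary assumption that would need tuning for real documents.

```python
from typing import Iterable, Optional


def needs_ocr(page_texts: Iterable[Optional[str]], min_chars_per_page: int = 20) -> bool:
    """Guess whether a PDF is scanned: True when the average extracted text per page is very short."""
    texts = [text or "" for text in page_texts]
    if not texts:
        return False  # nothing to OCR in an empty document
    total_chars = sum(len(text.strip()) for text in texts)
    return total_chars / len(texts) < min_chars_per_page
```

With pypdf, the input could be `needs_ocr(page.extract_text() for page in reader.pages)`.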

Add a watermark

python
from pypdf import PdfReader, PdfWriter

# The first page of watermark.pdf is stamped onto every page
watermark = PdfReader("watermark.pdf").pages[0]
reader = PdfReader("document.pdf")
writer = PdfWriter()

for page in reader.pages:
    page.merge_page(watermark)
    writer.add_page(page)

with open("pdf-all-in-one-workspace/watermarked.pdf", "wb") as output:
    writer.write(output)

Password protection

python
from pypdf import PdfReader, PdfWriter

reader = PdfReader("input.pdf")
writer = PdfWriter()

for page in reader.pages:
    writer.add_page(page)

# First argument is the user password, second is the owner password
writer.encrypt("userpassword", "ownerpassword")

with open("pdf-all-in-one-workspace/encrypted.pdf", "wb") as output:
    writer.write(output)

Quick Reference

| Task | Best tool | Command/Code |
| --- | --- | --- |
| PDF to images | pdf2image | convert_from_path(pdf, dpi=150) |
| Merge PDFs | pypdf | writer.add_page(page) |
| Split a PDF | pypdf | one file per page |
| Extract text | pdfplumber | page.extract_text() |
| Extract tables | pdfplumber | page.extract_tables() |
| Create PDFs | reportlab | Canvas or Platypus |
| OCR a scanned PDF | pytesseract | pdf2image + pytesseract |
| Extract images | pdfimages | pdfimages -j input.pdf output_prefix |
| Merge from the command line | qpdf | qpdf --empty --pages ... |

Working Directory Structure

/
└── pdf-all-in-one-workspace/
    ├── input/            # place input PDFs here
    ├── output_images/    # converted image output
    ├── output_pdfs

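Tags

skill ai

Install via Chat

This skill can be installed through a chat conversation on the following platforms:

OpenClaw, WorkBuddy, QClaw, Kimi, Claude

Option 1: Install SkillHub and the skill

Help me install SkillHub and the pdf-all-in-one-1776352264 skill

Option 2: Set SkillHub as the preferred skill installation source

Set SkillHub as my preferred skill installation source, then help me install the pdf-all-in-one-1776352264 skill

Install via Command Line

skillhub install pdf-all-in-one-1776352264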

Download

pdf-all-in-one v1.0.2 (free)

File size: 21.66 KB | Published: 2026-4-17 15:43

v1.0.2 (latest), 2026-4-17 15:43
Added PDF to image conversion feature, renamed from pdf-cn
