Pinterest Scraper

Full-featured Pinterest image scraper with automatic scrolling and multiple output options.

When This Skill Activates

This skill triggers when the user wants to download images from Pinterest.

Reasoning Framework

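| Step | Action | Why |
|------|--------|-----|
| 1 | EXTRACT | Parse Pinterest URL to determine board/user/search |
| 2 | LAUNCH | Start a Playwright browser with stealth options |
| 3 | SCROLL | Load images incrementally (Pinterest uses infinite scroll) |
| 4 | COLLECT | Extract image URLs and select quality |
| 5 | DEDUPE | Hash-based duplicate detection |
| 6 | DOWNLOAD | Save images to the output folder |
| 7 | NOTIFY | Optional: send to Telegram |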
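Step 1 of the framework can be illustrated with a short sketch. The helper name `classify_pinterest_url`, the returned dictionary shape, and the path patterns it recognizes are assumptions made for illustration; the scraper's own parsing is not shown in this document and may differ (for example for regional domains or short links).

```python
from urllib.parse import urlparse, parse_qs


def classify_pinterest_url(url: str) -> dict:
    """Classify a URL as 'search', 'board', or 'user' (hypothetical helper)."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    # Assumption: only pinterest.com and its subdomains are accepted here.
    if host != "pinterest.com" and not host.endswith(".pinterest.com"):
        raise ValueError(f"not a Pinterest URL: {url}")

    parts = [p for p in parsed.path.split("/") if p]
    if parts[:1] == ["search"]:
        # Search pages carry the query in the ?q= parameter.
        query = parse_qs(parsed.query).get("q", [""])[0]
        return {"kind": "search", "query": query}
    if parts[:1] == ["pin"]:
        raise ValueError("single-pin URLs are not handled by this sketch")
    if len(parts) >= 2:
        return {"kind": "board", "user": parts[0], "board": parts[1]}
    if len(parts) == 1:
        return {"kind": "user", "user": parts[0]}
    raise ValueError(f"cannot classify URL: {url}")
```

A two-segment path is treated as user plus board and a one-segment path as a user profile; anything else is rejected rather than guessed.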

Setup

    pip install playwright requests
    playwright install chromium

Decision Tree

What are you trying to do?

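    ├── Download images from a board/user
    │   └── Use: -u URL -s [scroll count]
    │
    ├── Get the highest quality
    │   └── Use: -q originals
    │
    ├── Get smaller/faster downloads
    │   └── Use: -q 736x or 236x
    │
    ├── Send images to your phone
    │   └── Use: --telegram --token X --chat Y
    │
    ├── Resume an interrupted scrape
    │   └── Use: --resume
    │
    └── Debug issues
        └── Use: -v (verbose logging)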

Quality Selection Decision

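| Quality | Use Case | File Size |
|---------|----------|-----------|
| originals | Best quality, archiving | Largest |
| 736x | Good balance | Medium |
| 474x | Thumbnail quality | Small |
| 236x | Preview only | Smallest |
| all | Save all versions | Largest total size |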
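One way a quality choice could be applied to a collected image URL is sketched below. It assumes image URLs have the form `https://i.pinimg.com/<size>/...`, where the first path segment names the size; that URL shape and the helper name `with_quality` are assumptions, not something this document confirms about the scraper. A rewritten URL is not guaranteed to exist on the server, particularly for `originals`.

```python
import re

# Quality names taken from the table above ("all" would mean every entry).
QUALITIES = ("originals", "736x", "474x", "236x")

# Assumed URL shape: https://i.pinimg.com/<size>/<rest of path>
_SIZE_SEGMENT = re.compile(r"^(https://i\.pinimg\.com/)([^/]+)(/.+)$")


def with_quality(image_url: str, quality: str) -> str:
    """Return image_url with its size segment replaced by `quality`."""
    if quality not in QUALITIES:
        raise ValueError(f"unknown quality: {quality}")
    match = _SIZE_SEGMENT.match(image_url)
    if match is None:
        raise ValueError(f"unexpected image URL: {image_url}")
    prefix, _old_size, rest = match.groups()
    return f"{prefix}{quality}{rest}"
```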

Usage

Command Line

    python scrape_pinterest.py -u URL [options]

| Option | Description | Default |
|--------|-------------|---------|
| -u, --url | Pinterest URL (required) | - |
| -s, --scrolls | Number of scrolls | 50 |

Common Examples

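    # Basic scrape (50 scrolls, originals, current directory)
    python scrape_pinterest.py -u URL

    # Verbose mode (logs to console + scrape.log)
    python scrape_pinterest.py -u URL -v

    # With Telegram sending
    python scrape_pinterest.py -u URL --telegram --token TOKEN --chat CHAT_ID

    # Resume an interrupted scrape
    python scrape_pinterest.py -u URL --resume -v

    # Show help
    python scrape_pinterest.py --help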

Python API

This tool is CLI-based. Run it from your Python code:

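    import subprocess

    # Run the scraper from the scripts directory
    result = subprocess.run(
        ["python3", "scrape_pinterest.py", "-u", "URL", "-s", "50", "-q", "originals"],
        cwd="./scripts",
        capture_output=True,
        text=True,
    )

    print(result.returncode)  # 0 = success
    print(result.stdout)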
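If the scraper is called from several places, the argument list can be assembled by a small wrapper. This is a sketch only: `build_command` and `run_scraper` are hypothetical names, and the flags used (`-u`, `-s`, `-q`, `-v`, `--resume`) are the ones documented in this README. The `./scripts` working directory is taken from the subprocess example.

```python
import subprocess
import sys


def build_command(url, scrolls=50, quality="originals", verbose=False, resume=False):
    """Assemble the scraper's argument list from keyword options."""
    cmd = [sys.executable, "scrape_pinterest.py",
           "-u", url, "-s", str(scrolls), "-q", quality]
    if verbose:
        cmd.append("-v")
    if resume:
        cmd.append("--resume")
    return cmd


def run_scraper(url, **options):
    """Run the scraper and return the CompletedProcess for inspection."""
    return subprocess.run(
        build_command(url, **options),
        cwd="./scripts",        # directory assumed to contain the script
        capture_output=True,
        text=True,
    )
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain `&` or `?`.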

Features

| Feature | Description |
|---------|-------------|
| Infinite Scroll | Automatic scrolling loads more images |
| Quality Options | |

Verbose Logging

Use -v or --verbose for detailed logging:

    python scrape_pinterest.py -u URL -v

What gets logged:

- Scroll progress (every 10 scrolls)
- Images found per scroll
- Download progress (X/Y)
- Telegram send status
- Errors and warnings

Log files:

- Console: INFO level
- scrape.log: DEBUG level (detailed)
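The two log destinations described above map naturally onto Python's standard `logging` module. The sketch below shows one possible setup; the logger name `pinterest_scraper` and the function `configure_logging` are assumptions, and the scraper's actual configuration may differ.

```python
import logging


def configure_logging(log_path="scrape.log", verbose=True):
    """Console at INFO; when verbose, also a file at DEBUG."""
    logger = logging.getLogger("pinterest_scraper")  # hypothetical name
    logger.setLevel(logging.DEBUG)   # let handlers decide what to keep
    logger.handlers.clear()          # avoid duplicate handlers on re-run

    console = logging.StreamHandler()
    console.setLevel(logging.INFO)
    console.setFormatter(logging.Formatter("%(levelname)s %(message)s"))
    logger.addHandler(console)

    if verbose:
        file_handler = logging.FileHandler(log_path, encoding="utf-8")
        file_handler.setLevel(logging.DEBUG)
        file_handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s"))
        logger.addHandler(file_handler)
    return logger
```

Setting the logger itself to DEBUG and filtering per handler is what lets the file receive detail that the console hides.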

Troubleshooting

Problem: No images downloaded

- Cause: Too few scrolls, or Pinterest did not finish loading
- Fix: Increase the -s value (try 100-200)
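To see why the scroll count matters, here is a minimal sketch of a scroll-and-collect loop. It is written against a Playwright sync `page` object and uses `page.mouse.wheel`, `page.wait_for_timeout`, and `page.eval_on_selector_all`; the function name, the 4000-pixel step, and the one-second pause are assumptions, not the scraper's actual values.

```python
def collect_image_urls(page, max_scrolls=50, pause_ms=1000):
    """Scroll a Playwright page and return image URLs in first-seen order."""
    found = {}  # dict keeps insertion order and drops repeats

    def harvest():
        # Read the src of every <img> currently present in the DOM.
        for src in page.eval_on_selector_all("img", "els => els.map(e => e.src)"):
            if src:
                found.setdefault(src, None)

    harvest()
    for _ in range(max_scrolls):
        page.mouse.wheel(0, 4000)        # scroll down by 4000 px (assumed step)
        page.wait_for_timeout(pause_ms)  # give lazy-loaded pins time to appear
        harvest()
    return list(found)
```

With the real library, `page` would come from `sync_playwright()`, for example `browser = p.chromium.launch()` followed by `page = browser.new_page()` and `page.goto(url)`. Each scroll only reveals another batch of pins, so a low `max_scrolls` caps how many images can be found.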

Problem: "Browser not found"

- Cause: Playwright browser not installed
- Fix: playwright install chromium

Problem: SSL certificate errors (Mac)

- Cause: macOS SSL issues
- Fix: Use verify=False in requests calls

Problem: Duplicate images

- Cause: Deduplication disabled or failed
- Fix: Use the --dedup flag (default: on)
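Hash-based duplicate detection can be sketched in a few lines. The choice of SHA-256 over the raw file bytes and the function name `dedupe_by_hash` are assumptions; this document does not say which hash the scraper uses or whether it hashes bytes or URLs.

```python
import hashlib


def dedupe_by_hash(images):
    """Keep the first image for each distinct content hash.

    `images` is an iterable of (name, data) pairs, where data is bytes.
    """
    seen = set()
    unique = []
    for name, data in images:
        digest = hashlib.sha256(data).hexdigest()
        if digest in seen:
            continue  # identical bytes were already kept under another name
        seen.add(digest)
        unique.append((name, data))
    return unique
```

Hashing content catches the same file served under different URLs, but not the same picture re-encoded at a different size, which would produce different bytes.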

Problem: Resume not working

- Cause: State file missing or URL changed
- Fix: Use the same URL as the original scrape and check .scrape_state.json
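The reason a changed URL breaks resume is easier to see with a sketch of how such a state file could work. The JSON layout below (`url` plus a `downloaded` list) and both function names are hypothetical; only the file name `.scrape_state.json` comes from this document.

```python
import json
from pathlib import Path

STATE_FILE = ".scrape_state.json"


def load_state(output_dir, url):
    """Return saved state for `url`, or a fresh state if none matches."""
    fresh = {"url": url, "downloaded": []}
    path = Path(output_dir) / STATE_FILE
    if not path.exists():
        return fresh
    state = json.loads(path.read_text(encoding="utf-8"))
    # A state file written for a different URL cannot be resumed.
    if state.get("url") != url:
        return fresh
    return state


def save_state(output_dir, state):
    """Write state via a temp file so an interrupted write cannot corrupt it."""
    path = Path(output_dir) / STATE_FILE
    tmp = path.with_name(STATE_FILE + ".tmp")
    tmp.write_text(json.dumps(state), encoding="utf-8")
    tmp.replace(path)
```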

Problem: Telegram not sending

- Cause: Invalid token or chat ID, or rate limiting
- Fix: Verify the bot token and chat ID; sends to Telegram are limited to 100 images per batch
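Given the 100-images-per-batch limit stated above, a sender would need to split its file list before uploading. The following sketch shows only the splitting step; the name `batched` and the constant are illustrative, and no Telegram API call is made.

```python
TELEGRAM_BATCH_LIMIT = 100  # per-batch limit as stated in this document


def batched(items, size=TELEGRAM_BATCH_LIMIT):
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])
```

For 250 downloaded files this yields batches of 100, 100, and 50, so a failure in one batch does not have to block the others.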

Problem: Verbose logs not writing

- Cause: File permission issue
- Fix: Check write permissions on the output directory

Self-Check

- [ ] Pinterest URL is valid (board/user/search)
- [ ] Playwright installed: playwright install chromium
- [ ] Quality selected appropriately for the use case
- [ ] Output directory exists or is writable
- [ ] For Telegram: token and chat ID are correct
- [ ] For resume: using the same URL as the original scrape
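Two items on the checklist above can be checked automatically before a run. The sketch below is an assumption about how such a check might look; `preflight` is a hypothetical helper, and it verifies only that the `playwright` package is importable and that the output location is writable, not that a browser binary is installed.

```python
import importlib.util
import os
from pathlib import Path


def preflight(output_dir="."):
    """Return a list of human-readable problems; an empty list means OK."""
    problems = []

    if importlib.util.find_spec("playwright") is None:
        problems.append("playwright is not installed (pip install playwright)")

    # If the directory does not exist yet, check the nearest existing
    # ancestor, since that is where it would have to be created.
    probe = Path(output_dir).resolve()
    while not probe.exists() and probe != probe.parent:
        probe = probe.parent
    if not os.access(probe, os.W_OK):
        problems.append(f"cannot write to {probe}")

    return problems
```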

Notes

- Pinterest loads content dynamically, so scrolling is required to get more images
- Use verify=False for requests (Mac SSL issues)
- State is saved to .scrape_state.json for resume
- Telegram sends are limited to 100 images per batch
- Verbose mode writes detailed logs to scrape.log

Quick Reference

| Task | Command |
|------|---------|
| Basic scrape | python scrape_pinterest.py -u URL |
| Verbose debug | python scrape_pinterest.py -u URL -v |

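Pinterest Scraper

Full-featured Pinterest image scraper with automatic scrolling and multiple output options.

When This Skill Activates

This skill triggers when the user wants to download images from Pinterest.

Reasoning Framework

| Step | Action | Why |
|------|--------|-----|
| 1 | EXTRACT | Parse Pinterest URL to determine board/user/search |
| 2 | LAUNCH | Start a Playwright browser with stealth options |
| 3 | SCROLL | Load images incrementally (Pinterest uses infinite scroll) |
| 4 | COLLECT | Extract image URLs and select quality |
| 5 | DEDUPE | Hash-based duplicate detection |
| 6 | DOWNLOAD | Save images to the output folder |
| 7 | NOTIFY | Optional: send to Telegram |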

Setup

    pip install playwright requests
    playwright install chromium

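Decision Tree

What are you trying to do?

    ├── Download images from a board/user
    │   └── Use: -u URL -s [scroll count]
    │
    ├── Get the highest quality
    │   └── Use: -q originals
    │
    ├── Get smaller/faster downloads
    │   └── Use: -q 736x or 236x
    │
    ├── Send images to your phone
    │   └── Use: --telegram --token X --chat Y
    │
    ├── Resume an interrupted scrape
    │   └── Use: --resume
    │
    └── Debug issues
        └── Use: -v (verbose logging)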

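Quality Selection Decision

| Quality | Use Case | File Size |
|---------|----------|-----------|
| originals | Best quality, archiving | Largest |
| 736x | Good balance | Medium |
| 474x | Thumbnail quality | Small |
| 236x | Preview only | Smallest |
| all | Save all versions | Largest total size |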

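Usage

Command Line

    python scrape_pinterest.py -u URL [options]

| Option | Description | Default |
|--------|-------------|---------|
| -u, --url | Pinterest URL (required) | - |
| -s, --scrolls | Number of scrolls | 50 |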

Common Examples

    # Basic scrape (50 scrolls, originals, current directory)
    python scrape_pinterest.py -u URL

    # Verbose mode (logs to console + scrape.log)
    python scrape_pinterest.py -u URL -v

    # With Telegram sending
    python scrape_pinterest.py -u URL --telegram --token TOKEN --chat CHAT_ID

    # Resume an interrupted scrape
    python scrape_pinterest.py -u URL --resume -v

    # Show help
    python scrape_pinterest.py --help

Python API

This tool is CLI-based. Run it from your Python code:

    import subprocess

    # Run the scraper from the scripts directory
    result = subprocess.run(
        ["python3", "scrape_pinterest.py", "-u", "URL", "-s", "50", "-q", "originals"],
        cwd="./scripts",
        capture_output=True,
        text=True,
    )

    print(result.returncode)  # 0 = success
    print(result.stdout)

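Features

| Feature | Description |
|---------|-------------|
| Infinite Scroll | Automatic scrolling loads more images |
| Quality Options | |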

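Verbose Logging

Use -v or --verbose for detailed logging:

    python scrape_pinterest.py -u URL -v

What gets logged:

- Scroll progress (every 10 scrolls)
- Images found per scroll
- Download progress (X/Y)
- Telegram send status
- Errors and warnings

Log files:

- Console: INFO level
- scrape.log: DEBUG level (detailed)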

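Troubleshooting

Problem: No images downloaded

- Cause: Too few scrolls, or Pinterest did not finish loading
- Fix: Increase the -s value (try 100-200)

Problem: "Browser not found"

- Cause: Playwright browser not installed
- Fix: playwright install chromium

Problem: SSL certificate errors (Mac)

- Cause: macOS SSL issues
- Fix: Use verify=False in requests calls

Problem: Duplicate images

- Cause: Deduplication disabled or failed
- Fix: Use the --dedup flag (default: on)

Problem: Resume not working

- Cause: State file missing or URL changed
- Fix: Use the same URL as the original scrape and check .scrape_state.json

Problem: Telegram not sending

- Cause: Invalid token or chat ID, or rate limiting
- Fix: Verify the bot token and chat ID; sends to Telegram are limited to 100 images per batch

Problem: Verbose logs not writing

- Cause: File permission issue
- Fix: Check write permissions on the output directory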

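Self-Check

- [ ] Pinterest URL is valid (board/user/search)
- [ ] Playwright installed: playwright install chromium
- [ ] Quality selected appropriately for the use case
- [ ] Output directory exists or is writable
- [ ] For Telegram: token and chat ID are correct
- [ ] For resume: using the same URL as the original scrape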

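Notes

- Pinterest loads content dynamically, so scrolling is required to get more images
- Use verify=False for requests (Mac SSL issues)
- State is saved to .scrape_state.json for resume
- Telegram sends are limited to 100 images per batch
- Verbose mode writes detailed logs to scrape.log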

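Quick Reference

| Task | Command |
|------|---------|
| Basic scrape | python scrape_pinterest.py -u URL |
| Verbose debug | python scrape_pinterest.py -u URL -v |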
