Pinterest Scraper
Full-featured Pinterest image scraper with automatic scrolling and multiple output options.
When This Skill Activates
This skill triggers when the user wants to download images from Pinterest.
Reasoning Framework
| Step | Action | Why |
|---|---|---|
| 1 | EXTRACT | Parse Pinterest URL to determine board/user/search |
| 2 | LAUNCH | Start Playwright browser with stealth options |
| 3 | SCROLL | Incrementally load images (Pinterest uses infinite scroll) |
| 4 | COLLECT | Extract image URLs with quality selection |
| 5 | DEDUP | Hash-based duplicate detection |
| 6 | DOWNLOAD | Save images to output folder |
| 7 | NOTIFY | Optional: send to Telegram |
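Step 1 (EXTRACT) can be illustrated with a small classifier. This is only a sketch: the function name `classify_pinterest_url` is hypothetical, and the URL shapes it checks (`/search/pins/?q=...`, `/<username>/`, `/<username>/<board>/`) are assumptions about how Pinterest URLs usually look, not something taken from the scraper's source.

```python
from urllib.parse import urlparse, parse_qs


def classify_pinterest_url(url: str) -> str:
    """Return 'search', 'user', or 'board' for a Pinterest URL (hypothetical helper)."""
    parsed = urlparse(url)
    if "pinterest." not in parsed.netloc.lower():
        raise ValueError(f"Not a Pinterest URL: {url}")

    # Path segments without empty strings: "/user/board/" -> ["user", "board"]
    segments = [s for s in parsed.path.split("/") if s]

    # Assumed shape: /search/pins/?q=<query>
    if segments[:1] == ["search"]:
        if "q" not in parse_qs(parsed.query):
            raise ValueError("Search URL has no 'q' parameter")
        return "search"
    if len(segments) == 1:
        return "user"   # assumed shape: /<username>/
    if len(segments) == 2:
        return "board"  # assumed shape: /<username>/<board>/
    raise ValueError(f"Unrecognized Pinterest URL shape: {url}")
```

Rejecting unknown shapes early, before a browser is launched, keeps a mistyped URL from costing a full scroll session.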
Setup
```bash
pip install playwright requests
playwright install chromium
```
Decision Tree
What are you trying to do?
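```
├── Download images from a board/user
│   └── Use: -u "URL" -s [scrolls]
│
├── Get the highest quality
│   └── Use: -q originals
│
├── Get smaller/faster downloads
│   └── Use: -q 736x or 236x
│
├── Send images to your phone
│   └── Use: --telegram --token X --chat Y
│
├── Resume an interrupted scrape
│   └── Use: --resume
│
└── Debug a problem
    └── Use: -v (verbose logging)
```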
Quality Selection Decision
| Quality | Use Case | File Size |
|---|---|---|
| originals | Best quality, archiving | Largest |
| 736x | Good balance | Medium |
| 474x | Thumbnail quality | Small |
| 236x | Preview only | Smallest |
| all | Save every version | Largest total |
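One way the quality names above could map to image URLs is by swapping the size segment of the image path. The sketch below assumes URLs of the form `https://i.pinimg.com/<size>/<aa>/<bb>/<cc>/<name>.jpg`; that shape, and the helper name `with_quality`, are assumptions for illustration and may not match what the scraper does internally.

```python
import re

QUALITIES = ("originals", "736x", "474x", "236x")

# Assumed URL shape: https://i.pinimg.com/<size>/<aa>/<bb>/<cc>/<name>.jpg
_SIZE_SEGMENT = re.compile(
    r"(?<=i\.pinimg\.com/)(originals|\d+x(?:\d+)?(?:_RS)?)(?=/)"
)


def with_quality(image_url: str, quality: str) -> list[str]:
    """Return the URL(s) for the requested quality; 'all' expands to every size."""
    if quality != "all" and quality not in QUALITIES:
        raise ValueError(f"Unknown quality: {quality}")
    if not _SIZE_SEGMENT.search(image_url):
        raise ValueError(f"No size segment found in: {image_url}")
    wanted = QUALITIES if quality == "all" else (quality,)
    # Replace only the first size segment, leaving the rest of the path intact
    return [_SIZE_SEGMENT.sub(q, image_url, count=1) for q in wanted]
```

The sketch only rewrites the string. It does not check that the rewritten URL exists, so a real downloader would still need to handle a failed request for a given size.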
Usage
Command Line
```bash
python scrape_pinterest.py -u "URL" [options]
```
| Option | Description | Default |
|---|---|---|
| `-u, --url` | Pinterest URL (required) | - |
| `-s, --scrolls` | Number of scrolls | 50 |
| `-o, --output` | Output folder | `./pinterest_output` |
| `-q, --quality` | Quality: originals/736x/474x/236x/all | originals |
| `-v, --verbose` | Enable verbose logging | false |
| `--telegram` | Send images to Telegram | false |
| `--token` | Telegram bot token | - |
| `--chat` | Telegram chat ID | - |
| `--resume` | Resume from previous scrape | false |
| `--dedup` | Skip duplicates | true |
| `--no-dedup` | Disable deduplication | - |
| `--telegram-only` | Only send existing files | false |
Common Examples
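```bash
# Basic scrape (50 scrolls, originals, default output folder)
python scrape_pinterest.py -u "URL"

# Verbose mode (logs to console + scrape.log)
python scrape_pinterest.py -u "URL" -v

# More scrolls, custom output, medium quality
python scrape_pinterest.py -u "URL" -s 100 -o ./output -q 736x -v

# With Telegram delivery
python scrape_pinterest.py -u "URL" --telegram --token "TOKEN" --chat "CHATID"

# Resume an interrupted scrape
python scrape_pinterest.py -u "URL" --resume -v

# Show help
python scrape_pinterest.py --help
```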
Python API
This tool is CLI-based. Run it from your Python code:
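```python
import subprocess

# Run the scraper; every argument must be a string, including the scroll count
result = subprocess.run(
    ["python3", "scrape_pinterest.py", "-u", "URL", "-s", "50", "-q", "originals"],
    cwd="./scripts",
    capture_output=True,
    text=True,
)

print(result.returncode)  # 0 = success
print(result.stdout)
```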
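If the scraper is called from several places, the documented options can be wrapped in a small helper that builds the argument list. This is a sketch, not part of the tool: `build_command` and `run_scraper` are hypothetical names, and only flags listed in the options table are used.

```python
import subprocess
import sys


def build_command(url, scrolls=50, output="./pinterest_output",
                  quality="originals", verbose=False, resume=False):
    """Build the argv list for scrape_pinterest.py from the documented options."""
    cmd = [sys.executable, "scrape_pinterest.py",
           "-u", url, "-s", str(scrolls), "-o", output, "-q", quality]
    if verbose:
        cmd.append("-v")
    if resume:
        cmd.append("--resume")
    return cmd


def run_scraper(url, cwd="./scripts", **options):
    """Run the scraper; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(build_command(url, **options), cwd=cwd,
                          capture_output=True, text=True, check=True)
```

Passing the arguments as a list avoids shell quoting problems with URLs that contain `&` or `?`.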
Features
| Feature | Description |
|---|---|
| Infinite Scroll | Automatic scrolling loads more images |
| Quality Options | originals/736x/474x/236x/all |
| Telegram | Send directly to Telegram |
| Deduplication | Hash-based duplicate detection |
| Resume | Continue from previous scrape |
| URL Types | Boards, user profiles, search results |
| Verbose Logging | `-v` flag, logs to console + `scrape.log` |
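Hash-based duplicate detection can be sketched as a set of content digests. The document does not say which hash the scraper uses or whether it hashes URLs or file contents, so SHA-256 over the downloaded bytes is an assumption here, as is the class name `Deduplicator`.

```python
import hashlib


class Deduplicator:
    """Track content hashes so identical image bytes are only saved once."""

    def __init__(self):
        self._seen = set()

    def is_new(self, data: bytes) -> bool:
        """Return True the first time these bytes are seen, False afterwards."""
        digest = hashlib.sha256(data).hexdigest()
        if digest in self._seen:
            return False
        self._seen.add(digest)
        return True
```

Hashing the bytes rather than the URL also catches the same image served from two different addresses.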
Verbose Logging
Use -v or --verbose for detailed logging:
```bash
python scrape_pinterest.py -u "URL" -v
```
What gets logged:
- Scroll progress (every 10 scrolls)
- Images found per scroll
- Download progress (X/Y)
- Telegram send status
- Errors and warnings
Log files:
- Console: INFO level
- `scrape.log`: DEBUG level (detailed)
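The two log levels above correspond to a standard `logging` setup with one handler per destination. The sketch below is an assumption about how this could be wired, not the scraper's actual code: the logger name is hypothetical, and writing `scrape.log` into the output directory is a guess.

```python
import logging
from pathlib import Path


def setup_logging(output_dir: str, verbose: bool = False) -> logging.Logger:
    """Console at INFO; with verbose=True, also scrape.log at DEBUG."""
    logger = logging.getLogger("pinterest_scraper")  # hypothetical logger name
    logger.setLevel(logging.DEBUG)
    logger.handlers.clear()  # avoid duplicate handlers on repeated calls

    console = logging.StreamHandler()
    console.setLevel(logging.INFO)
    logger.addHandler(console)

    if verbose:
        # Assumed location: scrape.log inside the output directory
        Path(output_dir).mkdir(parents=True, exist_ok=True)
        file_handler = logging.FileHandler(
            Path(output_dir) / "scrape.log", encoding="utf-8"
        )
        file_handler.setLevel(logging.DEBUG)
        file_handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(message)s")
        )
        logger.addHandler(file_handler)
    return logger
```

The logger itself stays at DEBUG so that each handler's own level decides what it emits.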
Troubleshooting
Problem: No images downloaded
- Cause: Too few scrolls, or Pinterest did not load
- Fix: Increase the `-s` value (try 100-200)
Problem: "Browser not found"
- Cause: Playwright not installed
- Fix: `playwright install chromium`
Problem: SSL certificate errors (Mac)
- Cause: macOS SSL issues
- Fix: Use `verify=False` in requests calls
Problem: Duplicate images
- Cause: Deduplication disabled or failed
- Fix: Use the `--dedup` flag (on by default)
Problem: Resume not working
- Cause: State file missing or URL changed
- Fix: Use the same URL as the original scrape and check `.scrape_state.json`
Problem: Telegram not sending
- Cause: Invalid token/chat ID, or rate limiting
- Fix: Verify the bot token and chat ID; Telegram sends are limited to 100 images per batch
Problem: Verbose logs not writing
- Cause: File permission issue
- Fix: Check write permissions in the output directory
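For the Telegram batch limit mentioned above, one simple approach is to split the list of files into chunks before sending. This sketch covers only the chunking. The helper name `batches` is hypothetical, and the value 100 is taken from this document rather than from the Telegram API reference.

```python
from typing import Iterator, List, Sequence

BATCH_LIMIT = 100  # per-batch limit as stated in this document


def batches(paths: Sequence[str], size: int = BATCH_LIMIT) -> Iterator[List[str]]:
    """Yield consecutive slices of at most `size` file paths."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(paths), size):
        yield list(paths[start:start + size])
```

For example, 250 files would be sent as three batches of 100, 100, and 50.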
Self-Check
- [ ] Pinterest URL is valid (board/user/search)
- [ ] Playwright installed: `playwright install chromium`
- [ ] Quality selected appropriately for use case
- [ ] Output directory exists or is writable
- [ ] For Telegram: token and chat ID correct
- [ ] For resume: using same URL as original scrape
Notes
- Pinterest loads dynamically, so scrolling is required to get more images
- Use `verify=False` for requests (Mac SSL issues)
- State is saved to `.scrape_state.json` for resume
- Telegram is limited to 100 images per batch
- Verbose mode writes detailed logs to `scrape.log`
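The resume behaviour described in these notes can be pictured as a small JSON state file keyed by the scrape URL. The layout below (`url` plus a `downloaded` list) and the file's location in the output directory are assumptions made for illustration; the real `.scrape_state.json` may store different fields.

```python
import json
from pathlib import Path

STATE_FILE = ".scrape_state.json"


def load_state(output_dir: str, url: str) -> set:
    """Return already-downloaded image URLs, or an empty set if the
    state file is missing or was written for a different scrape URL."""
    path = Path(output_dir) / STATE_FILE
    if not path.exists():
        return set()
    state = json.loads(path.read_text(encoding="utf-8"))
    if state.get("url") != url:
        return set()
    return set(state.get("downloaded", []))


def save_state(output_dir: str, url: str, downloaded: set) -> None:
    """Write the state to a temporary file first, then swap it into place."""
    path = Path(output_dir) / STATE_FILE
    path.parent.mkdir(parents=True, exist_ok=True)
    tmp = path.with_suffix(".tmp")
    payload = {"url": url, "downloaded": sorted(downloaded)}
    tmp.write_text(json.dumps(payload), encoding="utf-8")
    tmp.replace(path)
```

Comparing the stored URL with the requested one mirrors the advice to resume with the same URL as the original scrape.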
Quick Reference
| Task | Command |
|---|---|
| Basic scrape | `python scrape_pinterest.py -u "URL"` |
| Verbose debug | `python scrape_pinterest.py -u "URL" -v` |
| High quality | `python scrape_pinterest.py -u "URL" -q originals` |
| Fast/small | `python scrape_pinterest.py -u "URL" -q 236x` |
| Send to Telegram | `python scrape_pinterest.py -u "URL" --telegram --token X --chat Y` |
| Resume | `python scrape_pinterest.py -u "URL" --resume` |
| Custom output | `python scrape_pinterest.py -u "URL" -o ./myfolder` |
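Pinterest Scraper
Full-featured Pinterest image scraper with automatic scrolling and multiple output options.
When This Skill Activates
This skill triggers when the user wants to download images from Pinterest.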
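Reasoning Framework
| Step | Action | Why |
|---|---|---|
| 1 | EXTRACT | Parse Pinterest URL to determine board/user/search |
| 2 | LAUNCH | Start Playwright browser with stealth options |
| 3 | SCROLL | Incrementally load images (Pinterest uses infinite scroll) |
| 4 | COLLECT | Extract image URLs with quality selection |
| 5 | DEDUP | Hash-based duplicate detection |
| 6 | DOWNLOAD | Save images to output folder |
| 7 | NOTIFY | Optional: send to Telegram |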
Setup
```bash
pip install playwright requests
playwright install chromium
```
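Decision Tree
What are you trying to do?
```
├── Download images from a board/user
│   └── Use: -u "URL" -s [scrolls]
│
├── Get the highest quality
│   └── Use: -q originals
│
├── Get smaller/faster downloads
│   └── Use: -q 736x or 236x
│
├── Send images to your phone
│   └── Use: --telegram --token X --chat Y
│
├── Resume an interrupted scrape
│   └── Use: --resume
│
└── Debug a problem
    └── Use: -v (verbose logging)
```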
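Quality Selection Decision
| Quality | Use Case | File Size |
|---|---|---|
| originals | Best quality, archiving | Largest |
| 736x | Good balance | Medium |
| 474x | Thumbnail quality | Small |
| 236x | Preview only | Smallest |
| all | Save every version | Largest total |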
Usage
Command Line
```bash
python scrape_pinterest.py -u "URL" [options]
```
| Option | Description | Default |
|---|---|---|
| `-u, --url` | Pinterest URL (required) | - |
| `-s, --scrolls` | Number of scrolls | 50 |
| `-o, --output` | Output folder | `./pinterest_output` |
| `-q, --quality` | Quality: originals/736x/474x/236x/all | originals |
| `-v, --verbose` | Enable verbose logging | false |
| `--telegram` | Send images to Telegram | false |
| `--token` | Telegram bot token | - |
| `--chat` | Telegram chat ID | - |
| `--resume` | Resume from previous scrape | false |
| `--dedup` | Skip duplicates | true |
| `--no-dedup` | Disable deduplication | - |
| `--telegram-only` | Only send existing files | false |
Common Examples
```bash
# Basic scrape (50 scrolls, originals, default output folder)
python scrape_pinterest.py -u "URL"

# Verbose mode (logs to console + scrape.log)
python scrape_pinterest.py -u "URL" -v

# More scrolls, custom output, medium quality
python scrape_pinterest.py -u "URL" -s 100 -o ./output -q 736x -v

# With Telegram delivery
python scrape_pinterest.py -u "URL" --telegram --token "TOKEN" --chat "CHATID"

# Resume an interrupted scrape
python scrape_pinterest.py -u "URL" --resume -v

# Show help
python scrape_pinterest.py --help
```
Python API
This tool is CLI-based. Run it from your Python code:
```python
import subprocess

# Run the scraper; every argument must be a string, including the scroll count
result = subprocess.run(
    ["python3", "scrape_pinterest.py", "-u", "URL", "-s", "50", "-q", "originals"],
    cwd="./scripts",
    capture_output=True,
    text=True,
)

print(result.returncode)  # 0 = success
print(result.stdout)
```
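Features
| Feature | Description |
|---|---|
| Infinite Scroll | Automatic scrolling loads more images |
| Quality Options | originals/736x/474x/236x/all |
| Telegram | Send directly to Telegram |
| Deduplication | Hash-based duplicate detection |
| Resume | Continue from previous scrape |
| URL Types | Boards, user profiles, search results |
| Verbose Logging | `-v` flag, logs to console + `scrape.log` |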
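Verbose Logging
Use `-v` or `--verbose` for detailed logging:
```bash
python scrape_pinterest.py -u "URL" -v
```
What gets logged:
- Scroll progress (every 10 scrolls)
- Images found per scroll
- Download progress (X/Y)
- Telegram send status
- Errors and warnings
Log files:
- Console: INFO level
- `scrape.log`: DEBUG level (detailed)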
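Troubleshooting
Problem: No images downloaded
- Cause: Too few scrolls, or Pinterest did not load
- Fix: Increase the `-s` value (try 100-200)
Problem: "Browser not found"
- Cause: Playwright not installed
- Fix: `playwright install chromium`
Problem: SSL certificate errors (Mac)
- Cause: macOS SSL issues
- Fix: Use `verify=False` in requests calls
Problem: Duplicate images
- Cause: Deduplication disabled or failed
- Fix: Use the `--dedup` flag (on by default)
Problem: Resume not working
- Cause: State file missing or URL changed
- Fix: Use the same URL as the original scrape and check `.scrape_state.json`
Problem: Telegram not sending
- Cause: Invalid token/chat ID, or rate limiting
- Fix: Verify the bot token and chat ID; Telegram sends are limited to 100 images per batch
Problem: Verbose logs not writing
- Cause: File permission issue
- Fix: Check write permissions in the output directory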
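Self-Check
- [ ] Pinterest URL is valid (board/user/search)
- [ ] Playwright installed: `playwright install chromium`
- [ ] Quality selected appropriately for use case
- [ ] Output directory exists or is writable
- [ ] For Telegram: token and chat ID correct
- [ ] For resume: using same URL as original scrape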
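Notes
- Pinterest loads dynamically, so scrolling is required to get more images
- Use `verify=False` for requests (Mac SSL issues)
- State is saved to `.scrape_state.json` for resume
- Telegram is limited to 100 images per batch
- Verbose mode writes detailed logs to `scrape.log`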
Quick Reference
| Task | Command |
|---|---|
| Basic scrape | `python scrape_pinterest.py -u "URL"` |
| Verbose debug | `python scrape_pinterest.py -u "URL" -v` |
| High quality | `python scrape_pinterest.py -u "URL" -q originals` |
| Fast/small | `python scrape_pinterest.py -u "URL" -q 236x` |
| Send to Telegram | `python scrape_pinterest.py -u "URL" --telegram --token X --chat Y` |
| Resume | `python scrape_pinterest.py -u "URL" --resume` |
| Custom output | `python scrape_pinterest.py -u "URL" -o ./myfolder` |