Setup
On first use, read setup.md for integration guidelines.
When to Use
User needs browser automation: web scraping, E2E testing, PDF generation, screenshots, or any headless Chrome task. Agent handles page navigation, element interaction, waiting strategies, and data extraction.
Architecture
Scripts and outputs in ~/puppeteer/. See memory-template.md for structure.
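~/puppeteer/
├── memory.md    # Status + preferences
├── scripts/     # Reusable automation scripts
└── output/      # Screenshots, PDFs, scraped data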
Quick Reference
| Topic | File |
|---|---|
| Setup process | setup.md |
| Memory template | memory-template.md |
| Selectors guide | selectors.md |
| Waiting patterns | waiting.md |
Core Rules
1. Always Wait Before Acting
Never click or type immediately after navigation. Always wait for the element:
await page.waitForSelector('#button');
await page.click('#button');
Clicking before the element exists is a common cause of "element not found" errors.
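A minimal sketch of the wait-then-click rule as a reusable helper. The name clickWhenReady and the 5000 ms default timeout are hypothetical choices, not part of Puppeteer; page is assumed to be a Puppeteer Page.

```javascript
// Hypothetical helper: wait until the element is present and visible, then click it.
// `page` is assumed to be a Puppeteer Page; the default timeout is an arbitrary choice.
async function clickWhenReady(page, selector, timeoutMs = 5000) {
  // waitForSelector rejects with a timeout error if the element never becomes visible.
  await page.waitForSelector(selector, { visible: true, timeout: timeoutMs });
  await page.click(selector);
}
```

Inside an async function it would be called as await clickWhenReady(page, '#button').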
2. Use Specific Selectors
Prefer stable selectors in this order:
1. [data-testid="submit"]: test attributes (most stable)
2. #unique-id: IDs
3. form button[type="submit"]: semantic combinations
4. .class-name: classes (least stable, changes often)
Avoid: div > div > div > button — breaks on any DOM change.
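One way to apply this ordering in a script is to try candidates from most to least stable and use the first that matches. The sketch below assumes page is a Puppeteer Page; firstMatchingSelector is a hypothetical helper name.

```javascript
// Hypothetical helper: try candidate selectors in priority order and
// return the first one that matches an element, or null if none match.
async function firstMatchingSelector(page, candidates) {
  for (const selector of candidates) {
    // page.$ resolves to null when nothing matches the selector.
    const handle = await page.$(selector);
    if (handle !== null) {
      await handle.dispose(); // release the handle; only the selector string is needed
      return selector;
    }
  }
  return null;
}
```

The candidate list would typically start with a data-testid selector and end with a class selector, mirroring the order above.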
3. Handle Navigation Explicitly
After clicks that navigate, wait for navigation:
await Promise.all([
  page.waitForNavigation(),
  page.click('a.next-page')
]);
Without this, the script continues before the new page loads.
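The same pattern can be wrapped in a helper that also returns the navigation response. This is a sketch, assuming page is a Puppeteer Page; the helper name and the choice of 'networkidle2' as the wait condition are assumptions that may not suit every site.

```javascript
// Hypothetical helper: click a link and wait for the resulting navigation.
async function clickAndWaitForNavigation(page, selector) {
  // Start waiting before the click so a fast navigation is not missed.
  const [response] = await Promise.all([
    page.waitForNavigation({ waitUntil: 'networkidle2' }),
    page.click(selector),
  ]);
  // Main-resource response, or null for anchor and History API navigations.
  return response;
}
```

Checking the returned response (for example its status) is a cheap way to notice that the click landed on an error page.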
4. Set Realistic Viewport
Always set viewport for consistent rendering:
await page.setViewport({ width: 1280, height: 800 });
Default viewport is 800x600 — many sites render differently or show mobile views.
5. Handle Popups and Dialogs
Dismiss dialogs before they block interaction:
page.on('dialog', async dialog => {
  await dialog.dismiss(); // or dialog.accept()
});
Unhandled dialogs freeze the script.
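When debugging, it can help to record what each dialog said before dismissing it. A minimal sketch, assuming page is a Puppeteer Page; autoDismissDialogs and the seen array are hypothetical names.

```javascript
// Hypothetical helper: dismiss every dialog and keep a record of its type and message.
// Register it before triggering any action that might open a dialog.
function autoDismissDialogs(page, seen = []) {
  page.on('dialog', async (dialog) => {
    seen.push({ type: dialog.type(), message: dialog.message() });
    await dialog.dismiss();
  });
  return seen;
}
```

The returned array fills up as dialogs appear, so it can be inspected or logged at the end of the run.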
6. Close Browser on Errors
Always wrap in try/finally:
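import puppeteer from 'puppeteer';

const browser = await puppeteer.launch();
try {
  // ... automation code
} finally {
  // Runs on success and on error, so the browser process never leaks
  await browser.close();
}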
Leaked browser processes consume memory and ports.
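The try/finally rule can be captured once in a wrapper so individual scripts cannot forget it. This is a sketch: withBrowser is a hypothetical helper, and launch is any function that returns a browser-like object with a close() method.

```javascript
// Hypothetical helper: run `task` with a browser and always close it afterwards.
// `launch` is injected, e.g. () => puppeteer.launch(), so the helper has no hard dependency.
async function withBrowser(launch, task) {
  const browser = await launch();
  try {
    return await task(browser);
  } finally {
    // Executes whether `task` resolved or threw.
    await browser.close();
  }
}

// Possible usage (assumes the puppeteer package is installed):
//   const title = await withBrowser(() => puppeteer.launch(), async (browser) => {
//     const page = await browser.newPage();
//     await page.goto('https://example.com');
//     return page.title();
//   });
```

Errors thrown by the task still propagate to the caller; the wrapper only guarantees cleanup.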
7. Respect Rate Limits
Add delays between requests to avoid blocks:
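// page.waitForTimeout() was removed in Puppeteer v22; a plain timer works in every version
await new Promise(resolve => setTimeout(resolve, 1000 + Math.random() * 2000));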
Hammering sites triggers CAPTCHAs and IP bans.
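The randomized pause can be factored into small helpers so the delay range is set in one place. A sketch using only standard timers; randomDelayMs and politePause are hypothetical names, and the 1000 to 3000 ms defaults simply mirror the range above.

```javascript
// Pick a delay in [minMs, maxMs). `random` is injectable so the function can be tested.
function randomDelayMs(minMs = 1000, maxMs = 3000, random = Math.random) {
  return minMs + random() * (maxMs - minMs);
}

// Pause for a randomized interval using a plain timer (no Puppeteer API involved).
function politePause(minMs, maxMs) {
  return new Promise((resolve) => setTimeout(resolve, randomDelayMs(minMs, maxMs)));
}
```

Calling await politePause() between page.goto() calls spaces requests out without making the interval perfectly regular.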
Common Traps
- page.click() on invisible element → fails silently, use waitForSelector with visible: true
- Screenshots of elements off-screen → blank image, scroll into view first
- page.evaluate() returns undefined → cannot return DOM nodes, only serializable data
- Headless blocked by site → use headless: 'new' or set user agent
- Form submit reloads page → page.waitForNavigation() or data is lost
- Shadow DOM elements invisible to selectors → use page.evaluateHandle() to pierce shadow roots
- Cookies not persisting → launch with userDataDir for session persistence
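To illustrate the serialization trap: instead of returning DOM nodes from the browser context, map them to plain objects first. A minimal sketch, assuming page is a Puppeteer Page; extractLinks is a hypothetical helper name.

```javascript
// Hypothetical helper: extract link text and URLs as plain objects.
// The callback runs in the browser, so it must return serializable data, not DOM nodes.
async function extractLinks(page, selector = 'a') {
  return page.$$eval(selector, (anchors) =>
    anchors.map((a) => ({ text: a.textContent.trim(), href: a.href }))
  );
}
```

The result is an ordinary array that can be written to ~/puppeteer/output/ with JSON.stringify.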
Security & Privacy
Data that stays local:
- All scraped data in ~/puppeteer/output/
- Browser profile in specified userDataDir
This skill does NOT:
- Send scraped data anywhere
- Store credentials (you provide them per-script)
- Access files outside ~/puppeteer/
Related Skills
Install with clawhub install <slug> if user confirms:
- playwright: Cross-browser automation alternative
- chrome: Chrome DevTools and debugging
- web: General web development
Feedback
- If useful: clawhub star puppeteer
- Stay updated: clawhub sync