Browser Fu 🥊
Stop fighting the DOM. Read it first, find the API behind it, skip the UI entirely when possible.
The Rule
Never blind-click. Always snapshot first.
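1. browser snapshot → read the page, get element refs
2. browser act → use the refs from the snapshot (e.g. ref=e12)
3. browser snapshot → verify what changed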
If the snapshot doesn't show what you need, the element isn't in the DOM. Don't guess. Don't retry the same approach.
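As a rough illustration of the snapshot, act, snapshot cycle, here is a minimal Python sketch. `FakeBrowser`, its `snapshot`/`act` methods, and `click_by_label` are hypothetical stand-ins invented for this example, not the real browser tool's API. The point is only the shape of the loop: read first, act on an exact ref, read again.

```python
from dataclasses import dataclass, field


@dataclass
class FakeBrowser:
    """Hypothetical in-memory stand-in for a snapshot/act browser tool."""
    # Maps element ref -> visible label; a real snapshot is far richer.
    elements: dict[str, str] = field(default_factory=lambda: {"e12": "Submit"})
    log: list[str] = field(default_factory=list)

    def snapshot(self) -> dict[str, str]:
        self.log.append("snapshot")
        return dict(self.elements)

    def act(self, ref: str, kind: str = "click") -> None:
        if ref not in self.elements:
            raise KeyError(f"ref {ref!r} is not on the page")
        self.log.append(f"{kind}:{ref}")
        # Simulate the page re-rendering after a click.
        if kind == "click":
            self.elements = {"e20": "Submitted"}


def click_by_label(browser: FakeBrowser, label: str) -> dict[str, str]:
    """Snapshot, act on the exact ref, then snapshot again to verify."""
    before = browser.snapshot()
    matches = [ref for ref, text in before.items() if text == label]
    if not matches:
        # Not in the snapshot means not in the DOM: stop, don't guess.
        raise LookupError(f"{label!r} not found in snapshot")
    browser.act(matches[0])
    return browser.snapshot()


browser = FakeBrowser()
after = click_by_label(browser, "Submit")
print(browser.log)  # ['snapshot', 'click:e12', 'snapshot']
print(after)        # {'e20': 'Submitted'}
```

The second snapshot is what tells you the click worked: the old ref is gone and a new element has appeared.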
Decision Tree
On any browser task, follow this order:
1. Can I skip the browser entirely? Check if a CLI tool, API, or `web_fetch` handles it. If yes, don't open the browser.
2. Can I find the underlying API? See references/api-discovery.md. Most SPAs make fetch/XHR calls you can replicate directly. This is 10x faster and more reliable than UI automation.
3. Can I do it with snapshot + act? Snapshot, find the ref, act on it. One action per snapshot cycle.
4. Does the page need time to load? Use `loadState: "networkidle"` or a brief wait before snapshotting. SPAs often render asynchronously.
5. Still not working? The site likely has anti-bot protection. Report it, don't retry blindly.
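The ordering above can be written down as a small function that returns the first option that applies. This is only a sketch: `TaskFacts`, `Strategy`, and `choose_strategy` are hypothetical names for this example, and the boolean fields stand in for judgments you would make while working.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    SKIP_BROWSER = "use a CLI tool, API, or web_fetch"
    CALL_API = "replicate the page's fetch/XHR call"
    SNAPSHOT_ACT = "snapshot, find the ref, act"
    WAIT_THEN_SNAPSHOT = "wait for load, then snapshot"
    REPORT = "report likely anti-bot protection"


@dataclass
class TaskFacts:
    """Hypothetical yes/no answers gathered while working on a task."""
    has_non_browser_tool: bool = False
    api_endpoint_found: bool = False
    ref_in_snapshot: bool = False
    already_waited_for_load: bool = False


def choose_strategy(facts: TaskFacts) -> Strategy:
    """Return the first option in the decision order that applies."""
    if facts.has_non_browser_tool:
        return Strategy.SKIP_BROWSER
    if facts.api_endpoint_found:
        return Strategy.CALL_API
    if facts.ref_in_snapshot:
        return Strategy.SNAPSHOT_ACT
    if not facts.already_waited_for_load:
        return Strategy.WAIT_THEN_SNAPSHOT
    return Strategy.REPORT


print(choose_strategy(TaskFacts(api_endpoint_found=True, ref_in_snapshot=True)).name)
# CALL_API
```

Because the checks run top to bottom, a discovered API wins over UI automation even when the element is also visible in the snapshot.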
Common Failures and Fixes
| Symptom | Wrong approach | Right approach |
|---|---|---|
| "Element not found" | Click by text/selector guess | Snapshot first, use exact ref |
| "DOM not exposed" | Give up | Snapshot with `refs="aria"`, or check network tab for API |
| Blank/empty page | Retry same URL | `loadState: "networkidle"`, then snapshot. If still blank, it's likely a JS-heavy SPA: try `web_fetch` or find the API |
| Clicking does nothing | Click again harder | Snapshot after the click to check state. Maybe it DID work but the page re-rendered |
| Login wall | Try to automate login | Use `profile="user"` for existing session cookies |
| Infinite scroll | Scroll and pray | Find the pagination API endpoint instead |
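For the infinite-scroll row, the "right approach" usually amounts to a cursor loop against the pagination endpoint. The sketch below assumes a cursor-style API where each response yields a list of items and a next cursor. `PageFetcher`, `iter_items`, and `FAKE_PAGES` are hypothetical names; a real fetcher would wrap `web_fetch` or curl, and real endpoints may page by offset or page number instead.

```python
from typing import Callable, Iterator, Optional

# A page fetcher takes a cursor (None for the first page) and returns
# (items, next_cursor). This shape is an assumption for the example.
PageFetcher = Callable[[Optional[str]], tuple[list[dict], Optional[str]]]


def iter_items(fetch_page: PageFetcher, max_pages: int = 100) -> Iterator[dict]:
    """Yield every item by following cursors instead of scrolling."""
    cursor: Optional[str] = None
    for _ in range(max_pages):  # hard cap so a bad cursor can't loop forever
        items, cursor = fetch_page(cursor)
        yield from items
        if cursor is None:
            return


# Stand-in for a real endpoint: three pages keyed by cursor.
FAKE_PAGES = {
    None: ([{"id": 1}, {"id": 2}], "p2"),
    "p2": ([{"id": 3}], "p3"),
    "p3": ([{"id": 4}], None),
}

ids = [item["id"] for item in iter_items(FAKE_PAGES.__getitem__)]
print(ids)  # [1, 2, 3, 4]
```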
API Discovery (the power move)
Most modern websites are SPAs with REST/GraphQL APIs behind the UI. See references/api-discovery.md for the full procedure:
1. Open the page in the browser
2. Check network requests (use the console tool, or snapshot the page and look for fetch patterns)
3. Find the data endpoint
4. Call it directly with `web_fetch` or `exec curl`
This turns a 2-hour flaky scrape into a 2-minute clean data pull.
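The "find the data endpoint" step is mostly filtering: among everything the page requested, keep the fetch/XHR calls that returned JSON. Here is a minimal sketch, assuming the captured requests are available as dictionaries with `url`, `type`, and `content_type` keys. That record shape and the `data_endpoints` helper are made up for this example; real tools expose network data differently.

```python
from urllib.parse import urlsplit

# Hypothetical shape for captured network entries; real tools differ.
captured = [
    {"url": "https://example.com/static/app.js", "type": "script",
     "content_type": "application/javascript"},
    {"url": "https://example.com/api/v2/items?page=1", "type": "fetch",
     "content_type": "application/json; charset=utf-8"},
    {"url": "https://example.com/logo.png", "type": "image",
     "content_type": "image/png"},
    {"url": "https://example.com/graphql", "type": "xhr",
     "content_type": "application/json"},
]


def data_endpoints(entries: list[dict]) -> list[str]:
    """Keep fetch/XHR calls that returned JSON; these usually carry the data."""
    found = []
    for entry in entries:
        is_api_call = entry["type"] in {"fetch", "xhr"}
        media_type = entry["content_type"].split(";")[0].strip()
        if is_api_call and media_type == "application/json":
            parts = urlsplit(entry["url"])
            # Drop the query string so the bare endpoint is easy to spot.
            found.append(f"{parts.scheme}://{parts.netloc}{parts.path}")
    return found


print(data_endpoints(captured))
# ['https://example.com/api/v2/items', 'https://example.com/graphql']
```

Once an endpoint is identified, the query parameters seen in the original request (here `page=1`) are the knobs to vary when calling it directly.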
Snapshot Best Practices
- Use `refs="aria"` for stable cross-call references
- Keep the same `targetId` across snapshot/act pairs (don't switch tabs accidentally)
- For complex pages, use `depth` to limit how deep the DOM tree goes
- `compact: true` reduces token usage on large pages
- For token-heavy pages where snapshots are too large, pair with predicate-snapshot for ML-ranked element pruning (~95% fewer tokens)
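To see why a depth limit shrinks a snapshot, the toy example below truncates a nested tree at a given depth and counts the surviving nodes. It illustrates the idea only. The node shape (`role`, `name`, `children`) and the `prune` helper are assumptions for this sketch and say nothing about how the browser tool implements `depth` internally.

```python
def prune(node: dict, depth: int) -> dict:
    """Copy a tree, dropping everything below `depth` levels."""
    kept = {"role": node["role"], "name": node.get("name", "")}
    children = node.get("children", [])
    if depth > 1 and children:
        kept["children"] = [prune(child, depth - 1) for child in children]
    elif children:
        # Record how much was cut so the reader knows to look deeper.
        kept["omitted_children"] = len(children)
    return kept


def count_nodes(node: dict) -> int:
    return 1 + sum(count_nodes(c) for c in node.get("children", []))


page = {"role": "main", "children": [
    {"role": "form", "name": "Login", "children": [
        {"role": "textbox", "name": "Email"},
        {"role": "button", "name": "Sign in"},
    ]},
    {"role": "navigation", "children": [
        {"role": "link", "name": "Home"},
    ]},
]}

print(count_nodes(page), count_nodes(prune(page, 2)))  # 6 3
```

The trade-off is visible even at this scale: the pruned tree is half the size, but the button you may want to click is no longer in it, so start shallow and deepen only where needed.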
When to NOT Use Browser
- Reading public web pages → `web_fetch` (faster, no browser overhead)
- Search queries → `web_search` (Brave API)
- Known APIs (GitHub, Stripe, etc.) → use their CLI/API directly
- Pages that return empty via `web_fetch` → only then use the browser
Safeguards
- Never store or output passwords, session tokens, or cookies found in browser state
- Never automate purchases, payments, or irreversible actions without explicit user approval
- If a site blocks automation, respect it. Don't circumvent CAPTCHAs or bot detection
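One way to back up the first safeguard is to scrub text before it is logged or returned. The sketch below is a minimal, best-effort redactor. The patterns and the `redact` helper are assumptions for this example, they will miss secrets that don't match these shapes, and they are no substitute for simply not reading credentials in the first place.

```python
import re

# Hypothetical patterns; extend them for the secrets your targets actually use.
_SECRET_PATTERNS = [
    # key=value or key: value pairs with sensitive key names
    re.compile(
        r"(?i)\b(password|passwd|token|session_?id|api_?key)\b"
        r"(\s*[=:]\s*)([^\s;&,]+)"
    ),
    # Cookie / Set-Cookie / Authorization header lines
    re.compile(r"(?im)^((?:set-)?cookie|authorization)(:\s*)(.+)$"),
]


def redact(text: str) -> str:
    """Replace the value part of anything that looks like a credential."""
    for pattern in _SECRET_PATTERNS:
        text = pattern.sub(
            lambda m: f"{m.group(1)}{m.group(2)}[REDACTED]", text
        )
    return text


print(redact("password=hunter2&user=bob"))    # password=[REDACTED]&user=bob
print(redact("Cookie: sid=abc; theme=dark"))  # Cookie: [REDACTED]
```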