browser-read
Extract readable text from an already-open browser page and return markdown, suitable for pages where web_fetch is blocked or missing auth context.
When to use
- web_fetch returned an error or empty content.
- Page requires authentication/cookies/session state available only in the browser.
- You need text extraction from Twitter/X or LinkedIn timelines/articles where screenshot/OCR was previously used.
When NOT to use
- web_fetch already returns good markdown/text (faster and cheaper).
- Purely static pages where normal fetch is sufficient.
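The two lists above reduce to a simple rule: try web_fetch first, and switch to browser-read only when it fails, returns nothing, or the page needs browser session state. A minimal sketch of that rule follows. The helper name `shouldUseBrowserRead`, the `{ error, content }` result shape, and the `needsBrowserSession` flag are all hypothetical and are not part of this skill.

```javascript
// Hypothetical helper: decides whether to fall back from web_fetch to browser-read.
// The fetchResult shape ({ error, content }) is an assumption made for illustration.
function shouldUseBrowserRead(fetchResult, { needsBrowserSession = false } = {}) {
  // Pages that depend on cookies or login state need the live browser.
  if (needsBrowserSession) return true;

  // A missing result or an explicit error means web_fetch did not work.
  if (!fetchResult || fetchResult.error) return true;

  // Empty or whitespace-only content counts as a failed fetch.
  const content =
    typeof fetchResult.content === "string" ? fetchResult.content.trim() : "";
  return content.length === 0;
}
```

When this returns false, the web_fetch output is already usable and the browser round trip can be skipped.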
Steps
1. Navigate to the URL with browser navigate.
2. Read the extraction script from ~/clawd/skills/browser-read/extract.js.
3. Run browser act with kind=evaluate and pass the script contents as fn.
4. The script returns {title, content, excerpt, byline, siteName, length}, where content is markdown.
5. If extraction fails or returns empty content, the script falls back to document.body.innerText.
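The fallback in the last step can be pictured as a small wrapper around the extraction result. The sketch below is not taken from extract.js. The function name `withFallback`, the 200-character excerpt, and the choice to set `length` to the content's character count are assumptions made for illustration.

```javascript
// Hypothetical sketch of the fallback step; not the actual extract.js code.
// `doc` is any object shaped like `document` (needs `title` and `body.innerText`).
function withFallback(result, doc) {
  const content =
    result && typeof result.content === "string" ? result.content.trim() : "";

  // Extraction produced usable markdown: return it as-is.
  if (content.length > 0) {
    return { ...result, content, length: content.length };
  }

  // Extraction failed or was empty: fall back to the page's visible text.
  const text = (doc.body && doc.body.innerText ? doc.body.innerText : "").trim();
  return {
    title: (result && result.title) || doc.title || "",
    content: text,
    excerpt: text.slice(0, 200), // assumed excerpt length
    byline: null,
    siteName: null,
    length: text.length,
  };
}
```

In a browser this would be called as `withFallback(extracted, document)`. Taking the document as a parameter also makes it easy to exercise with a plain object outside the browser.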
Example (tool calls)
```json
{
  "action": "navigate",
  "targetId": "...",
  "url": "https://example.com"
}
{
  "action": "act",
  "targetId": "...",
  "kind": "evaluate",
  "fn": "(() => { ... return {title, content, excerpt, byline, siteName, length}; })()"
}
```
Notes
- extract.js is a self-contained IIFE, so it can be passed directly as the fn value to browser act.
- This is a lightweight extractor; it intentionally strips script/style/nav/header/footer/aside/cookie/ad elements before conversion.
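The stripping step mentioned in the notes might look roughly like the sketch below. It is not the code from extract.js. The name `stripNoise` and the selector list, especially the attribute selectors used to guess at cookie banners and ads, are assumptions made for illustration.

```javascript
// Hypothetical selector list; the real extract.js may match differently.
const NOISE_SELECTORS = [
  "script", "style", "nav", "header", "footer", "aside",
  '[class*="cookie"]', '[id*="cookie"]',   // assumed cookie-banner heuristics
  '[class*="advert"]', '[id*="advert"]',   // assumed ad heuristics
];

// Removes noisy elements from a deep clone so the live page is left untouched.
function stripNoise(root, selectors = NOISE_SELECTORS) {
  const clone = root.cloneNode(true);
  let removed = 0;
  for (const node of clone.querySelectorAll(selectors.join(","))) {
    node.remove(); // also harmless if an ancestor was already removed
    removed += 1;
  }
  return { clone, removed };
}
```

In a page context, `stripNoise(document.body).clone` would give a cleaned subtree to convert to markdown. Working on a clone is a design choice here so that evaluating the script does not alter what the user sees.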