Gbrow: The Browser Your AI Agent Actually Needs
A full-featured headless browser powered by Playwright and Bun. It reads pages through the accessibility tree instead of expensive vision models.
Why Gbrow?
| Traditional (screenshots + vision) | Gbrow (accessibility tree) |
|---|---|
| Screenshot → upload to GPT-4o → wait → read | `ariaSnapshot()` → instant structured text |
| ~$0.01 per page read | Free |
| 3-10 seconds per page | < 100ms |
| Fails on API key issues | Works without an API key |
| Click by fragile CSS selector | Click by @ref (`@e1`, `@e2`, etc.) |
Quick Setup
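```bash
# Clone and install
git clone https://github.com/ashish797/Gbrow.git ~/.openclaw/workspace/skills/Gbrow
cd ~/.openclaw/workspace/skills/Gbrow
bash setup.sh
```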
Or one-liner:
```bash
curl -fsSL https://raw.githubusercontent.com/ashish797/Gbrow/main/setup.sh | bash
```
How It Works
1. Start the server
```bash
cd ~/.openclaw/workspace/skills/Gbrow
bun run src/server.ts
```
2. Read the page (accessibility tree)
The snapshot gives you a structured view with clickable refs:
```
@e1 [heading] Welcome [level=1]
@e2 [link] Get Started
@e3 [button] Login
@e4 [textbox] Search
```
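A script that consumes the snapshot usually needs those lines as records before it can choose a ref. The sketch below is one possible parser, assuming every line follows the `@ref [role] name [key=value]` shape of the sample snapshot. `SnapshotNode`, `parse_snapshot`, and `find_ref` are hypothetical helper names, not part of Gbrow.

```python
import re
from dataclasses import dataclass, field
from typing import Optional

# Assumed line shape, based on the sample snapshot: "@e1 [role] name [key=value]"
LINE_RE = re.compile(
    r"^(@e\d+)\s+\[([^\]]+)\]\s*(.*?)\s*((?:\[[^\]=]+=[^\]]*\]\s*)*)$"
)
ATTR_RE = re.compile(r"\[([^\]=]+)=([^\]]*)\]")


@dataclass
class SnapshotNode:
    ref: str    # e.g. "@e3"
    role: str   # e.g. "button"
    name: str   # accessible name, e.g. "Login"
    attrs: dict = field(default_factory=dict)  # trailing [key=value] pairs


def parse_snapshot(text: str) -> list:
    """Turn snapshot text into SnapshotNode records, skipping unknown lines."""
    nodes = []
    for raw in text.splitlines():
        match = LINE_RE.match(raw.strip())
        if not match:
            continue
        ref, role, name, attr_blob = match.groups()
        nodes.append(SnapshotNode(ref, role, name, dict(ATTR_RE.findall(attr_blob))))
    return nodes


def find_ref(nodes: list, role: str, name: str) -> Optional[str]:
    """Return the ref of the first node with this role and name, else None."""
    for node in nodes:
        if node.role == role and node.name == name:
            return node.ref
    return None
```

With the sample snapshot, `find_ref(parse_snapshot(text), "button", "Login")` yields the ref to pass to `click`. Real snapshots may contain line shapes this pattern does not cover, so unmatched lines are skipped rather than treated as errors.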
3. Click by ref
```
click @e2         → clicks "Get Started"
fill @e4 "query"  → types into the search box
```
Commands
Navigation
| Command | Description | Example |
|---|---|---|
| `goto <url>` | Navigate to URL | `goto https://example.com` |
| `back` | History back | `back` |
| `forward` | History forward | `forward` |
| `reload` | Reload page | `reload` |
| `url` | Print current URL | `url` |
Reading
| Command | Description | Example |
|---|---|---|
| `snapshot` | Accessibility tree with @refs | `snapshot -i` (interactive only) |
| `text` | Cleaned page text | `text` |
| `html [selector]` | Raw HTML | `html .article` |
| `links` | All links as "text → href" | `links` |
| `forms` | Form fields as JSON | `forms` |
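Since `links` prints one "text → href" pair per line, its output is easy to post-process. The following is a minimal sketch of such a parser; it assumes the separator is an arrow with a space on each side, which the README implies but does not spell out, and `parse_links` is a hypothetical helper name.

```python
def parse_links(output: str) -> list:
    """Split `links` output into (text, href) tuples.

    Assumes each line looks like "text → href". The split happens on the
    last arrow so link text that itself contains an arrow stays intact.
    """
    pairs = []
    for line in output.splitlines():
        text, separator, href = line.rpartition(" → ")
        if separator:  # skip blank lines or lines without the arrow
            pairs.append((text.strip(), href.strip()))
    return pairs
```

An agent could then filter the pairs by href prefix or link text before deciding where to navigate next.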
Interaction
| Command | Description | Example |
|---|---|---|
| `click <ref>` | Click element | `click @e3` |
| `fill <ref> <text>` | Fill input | `fill @e4 "hello"` |
| `select <ref> <value>` | Select dropdown option | `select @e5 "option1"` |
| `type <ref> <text>` | Type with keyboard | `type @e4 "search term"` |
| `press <key>` | Press key | `press Enter` |
| `scroll <direction>` | Scroll page | `scroll down` |
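The command strings in these tables map naturally onto the `{"command": ..., "args": [...]}` JSON body that the README's HTTP example uses for `goto`. The sketch below shows one way to do that conversion with shell-style quoting. Whether the server expects every command's arguments split this way is an assumption, and `to_payload` is a hypothetical helper name.

```python
import shlex


def to_payload(command_line: str) -> dict:
    """Convert a command string such as 'fill @e4 "hello"' into a JSON-ready dict.

    shlex.split honours double quotes, so a quoted multi-word argument
    stays a single entry in args.
    """
    parts = shlex.split(command_line)
    if not parts:
        raise ValueError("empty command")
    return {"command": parts[0], "args": parts[1:]}


# A short interaction flow expressed as payloads, ready to be JSON-encoded.
flow = [to_payload(line) for line in (
    'fill @e4 "search term"',
    "press Enter",
    "scroll down",
)]
```

Keeping a flow as a list of plain strings makes it easy to log and replay; the conversion to payloads happens only at the point of sending.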
Inspection
| Command | Description | Example |
|---|---|---|
| `js <expr>` | Run JavaScript | `js document.title` |
| `css <sel> <prop>` | Computed CSS | `css .box color` |
| `attrs <ref>` | Element attributes | `attrs @e1` |
| `is <prop> <ref>` | State check | `is visible @e3` |
Tabs
| Command | Description |
|---|---|
| INLINECODE43 | List open tabs |
| INLINECODE44 | Switch to tab N |
| `newtab` | Open new tab |
| `closetab` | Close current tab |
Visual
| Command | Description |
|---|---|
| `screenshot` | Take screenshot |
| `responsive <w> <h>` | Set viewport size |
| `pdf` | Save page as PDF |
Snapshot Flags
| Flag | Description |
|---|---|
| `-i` | Interactive elements only (buttons, links, inputs) |
| INLINECODE51 | Compact (remove empty structural nodes) |
| `-d N` | Limit tree depth |
| `-s <sel>` | Scope to CSS selector |
| `-D` | Diff against previous snapshot |
| `-a` | Annotated screenshot with ref overlays |
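When snapshots are requested from a script, the flags can be assembled from named options rather than hand-written strings. This sketch covers only the flags whose spelling appears in this README (`-i`, `-d`, `-s`, `-D`, `-a`). `snapshot_args` is a hypothetical helper, and passing the flags as separate argument entries is an assumption about how the server parses them.

```python
from typing import Optional


def snapshot_args(
    interactive: bool = False,
    depth: Optional[int] = None,
    selector: Optional[str] = None,
    diff: bool = False,
    annotate: bool = False,
) -> list:
    """Build the argument list for a `snapshot` command."""
    args = []
    if interactive:
        args.append("-i")            # interactive elements only
    if depth is not None:
        if depth < 0:
            raise ValueError("depth must be non-negative")
        args += ["-d", str(depth)]   # limit tree depth
    if selector:
        args += ["-s", selector]     # scope to a CSS selector
    if diff:
        args.append("-D")            # diff against the previous snapshot
    if annotate:
        args.append("-a")            # annotated screenshot with ref overlays
    return args
```

For example, `snapshot_args(interactive=True, depth=3, selector=".article")` produces the arguments for an interactive-only snapshot, three levels deep, scoped to `.article`.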
HTTP API
All commands go through the HTTP API:
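```bash
# Read the port and token from the state file
PORT=$(python3 -c "import json; print(json.load(open('.gstack/browse.json'))['port'])")
TOKEN=$(python3 -c "import json; print(json.load(open('.gstack/browse.json'))['token'])")

# Send a command
curl -s -X POST "http://127.0.0.1:${PORT}/command" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"command":"goto","args":["https://example.com"]}'
```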
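The same call can be made from Python using only the standard library. The sketch below reads the port and token from the `.gstack/browse.json` state file and posts to the `/command` endpoint with a bearer token, as the README describes. The function names are hypothetical, and because the response format is not documented here, the body is returned as raw text.

```python
import json
import urllib.request
from pathlib import Path


def load_state(path: str = ".gstack/browse.json") -> tuple:
    """Read (port, token) from the state file the server writes."""
    state = json.loads(Path(path).read_text())
    return int(state["port"]), state["token"]


def build_request(port: int, token: str, command: str, args) -> urllib.request.Request:
    """Build the POST /command request without sending it."""
    body = json.dumps({"command": command, "args": list(args)}).encode("utf-8")
    return urllib.request.Request(
        f"http://127.0.0.1:{port}/command",
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def send_command(command: str, *args: str, state_path: str = ".gstack/browse.json") -> str:
    """Send one command to a running server and return the raw response text."""
    port, token = load_state(state_path)
    request = build_request(port, token, command, args)
    with urllib.request.urlopen(request, timeout=30) as response:
        return response.read().decode("utf-8")
```

With the server running, `send_command("goto", "https://example.com")` followed by `send_command("snapshot", "-i")` would navigate and then read the page. Building the request separately from sending it keeps the payload logic checkable without a live server.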
Architecture
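```
┌─────────────┐     HTTP      ┌────────────────────┐
│  OpenClaw   │ ────────────▶ │   Gbrow Server     │
│  Agent      │               │ (Bun + Playwright) │
└─────────────┘               └─────────┬──────────┘
                                        │
                                        ▼
                              ┌────────────────────┐
                              │   Chromium         │
                              │   (headless)       │
                              └─────────┬──────────┘
                                        │
                                        ▼
                              ┌────────────────────┐
                              │ Accessibility Tree │
                              │  (ariaSnapshot)    │
                              └────────────────────┘
```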
No vision models. No external API calls. Just structured text from the browser's accessibility layer.
Credits
Built on top of gstack by Garry Tan (Y Combinator). Adapted for OpenClaw, with permission, under the MIT license.
License
MIT
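Gbrow: The Browser Your AI Agent Actually Needs
A full-featured headless browser powered by Playwright and Bun. It reads pages through the accessibility tree instead of expensive vision models.
Why Gbrow?
| Traditional (screenshots + vision) | Gbrow (accessibility tree) |
|---|---|
| Screenshot → upload to GPT-4o → wait → read | `ariaSnapshot()` → instant structured text |
| ~$0.01 per page read | Free |
| 3-10 seconds per page | < 100ms |
| Fails on API key issues | Works without an API key |
| Click by fragile CSS selector | Click by @ref (`@e1`, `@e2`, etc.) |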
Quick Setup
```bash
# Clone and install
git clone https://github.com/ashish797/Gbrow.git ~/.openclaw/workspace/skills/Gbrow
cd ~/.openclaw/workspace/skills/Gbrow
bash setup.sh
```
Or one-liner:
```bash
curl -fsSL https://raw.githubusercontent.com/ashish797/Gbrow/main/setup.sh | bash
```
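How It Works
1. Start the server
```bash
cd ~/.openclaw/workspace/skills/Gbrow
bun run src/server.ts
```
2. Read the page (accessibility tree)
The snapshot gives you a structured view with clickable refs:
```
@e1 [heading] Welcome [level=1]
@e2 [link] Get Started
@e3 [button] Login
@e4 [textbox] Search
```
3. Click by ref
```
click @e2         → clicks "Get Started"
fill @e4 "query"  → types into the search box
```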
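Commands
Navigation
| Command | Description | Example |
|---|---|---|
| `goto <url>` | Navigate to URL | `goto https://example.com` |
| `back` | History back | `back` |
| `forward` | History forward | `forward` |
| `reload` | Reload page | `reload` |
| `url` | Print current URL | `url` |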
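Reading
| Command | Description | Example |
|---|---|---|
| `snapshot` | Accessibility tree with @refs | `snapshot -i` (interactive only) |
| `text` | Cleaned page text | `text` |
| `html [selector]` | Raw HTML | `html .article` |
| `links` | All links as "text → href" | `links` |
| `forms` | Form fields as JSON | `forms` |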
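Interaction
| Command | Description | Example |
|---|---|---|
| `click <ref>` | Click element | `click @e3` |
| `fill <ref> <text>` | Fill input | `fill @e4 "hello"` |
| `select <ref> <value>` | Select dropdown option | `select @e5 "option1"` |
| `type <ref> <text>` | Type with keyboard | `type @e4 "search term"` |
| `press <key>` | Press key | `press Enter` |
| `scroll <direction>` | Scroll page | `scroll down` |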
Inspection
| Command | Description | Example |
|---|---|---|
| `js <expr>` | Run JavaScript | `js document.title` |
| `css <sel> <prop>` | Computed CSS | `css .box color` |
| `attrs <ref>` | Element attributes | `attrs @e1` |
| `is <prop> <ref>` | State check | `is visible @e3` |
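Tabs
| Command | Description |
|---|---|
| INLINECODE43 | List open tabs |
| INLINECODE44 | Switch to tab N |
| `newtab` | Open new tab |
| `closetab` | Close current tab |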
Visual
| Command | Description |
|---|---|
| `screenshot` | Take screenshot |
| `responsive <w> <h>` | Set viewport size |
| `pdf` | Save page as PDF |
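Snapshot Flags
| Flag | Description |
|---|---|
| `-i` | Interactive elements only (buttons, links, inputs) |
| INLINECODE51 | Compact (remove empty structural nodes) |
| `-d N` | Limit tree depth |
| `-s <sel>` | Scope to CSS selector |
| `-D` | Diff against previous snapshot |
| `-a` | Annotated screenshot with ref overlays |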
HTTP API
All commands go through the HTTP API:
```bash
# Read the port and token from the state file
PORT=$(python3 -c "import json; print(json.load(open('.gstack/browse.json'))['port'])")
TOKEN=$(python3 -c "import json; print(json.load(open('.gstack/browse.json'))['token'])")

# Send a command
curl -s -X POST "http://127.0.0.1:${PORT}/command" \
  -H "Authorization: Bearer ${TOKEN}" \
  -H "Content-Type: application/json" \
  -d '{"command":"goto","args":["https://example.com"]}'
```
Architecture
```
┌─────────────┐     HTTP      ┌────────────────────┐
│  OpenClaw   │ ────────────▶ │   Gbrow Server     │
│  Agent      │               │ (Bun + Playwright) │
└─────────────┘               └─────────┬──────────┘
                                        │
                                        ▼
                              ┌────────────────────┐
                              │   Chromium         │
                              │   (headless)       │
                              └─────────┬──────────┘
                                        │
                                        ▼
                              ┌────────────────────┐
                              │ Accessibility Tree │
                              │  (ariaSnapshot)    │
                              └────────────────────┘
```
No vision models. No external API calls. Just structured text from the browser's accessibility layer.
Credits
Built on top of gstack by Garry Tan (Y Combinator). Adapted for OpenClaw, with permission, under the MIT license.
License
MIT