Browser Automation with browser-act CLI
`browser-act` is a CLI for browser automation with stealth and captcha-solving capabilities. It supports two browser types (Stealth and Real Chrome) and provides commands for navigation, page interaction, data extraction, tab/session management, and more.
All commands output human-readable text by default. Use --format json for structured JSON output, ideal for AI agent integration and scripting.
Installation
Source: browser-act-cli on PyPI · Homepage
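```bash
# Upgrade if already installed, otherwise install fresh
uv tool upgrade browser-act-cli || uv tool install browser-act-cli --python 3.12
```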
The CLI is an open-source package published to PyPI by BrowserAct. Run the install command at the start of every session to ensure the latest version.
Global options available on every command:
| Option | Default | Description |
|---|---|---|
| `--session <name>` | `default` | Session name (isolates browser state) |
| `--format <text\|json>` | `text` | Output format |
| `--no-auto-dialog` | off | Disable automatic JavaScript dialog handling (alerts, confirms, prompts) |
| `--version` | | Show version |
| `-h, --help` | | Show help |
Browser Selection
browser-act supports two browser types. Choose based on the task:
| Scenario | Use | Why |
|---|---|---|
| Target site has bot detection / anti-scraping | Stealth | Anti-detection fingerprinting bypasses bot checks |
| Need proxy or privacy mode | Stealth | Real Chrome does not support `--proxy` / `--mode` |
| Need multiple browsers in parallel | Stealth | Each Stealth browser is independent; create multiple and run in parallel sessions |
| Need user's existing login sessions from their daily browser | Real Chrome | Connects directly to user's Chrome, reusing existing login sessions |
| No bot detection, no login needed | Either | Stealth is the safer default; Real Chrome is simpler |
Stealth Browser
Local browsers with anti-detection fingerprinting. Ideal for sites with bot detection.
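```bash
# Create
browser-act browser create my-browser
browser-act browser create my-browser --proxy socks5://host:port --mode private
# Update
browser-act browser update <browser_id> --name new-name
browser-act browser update <browser_id> --proxy http://proxy:8080 --mode private
# List / Delete / Clear profile
browser-act browser list                            # List all Stealth browsers
browser-act browser list --page 2 --page-size 10    # Paginated list
browser-act browser delete <browser_id>             # ⚠ Destructive: always confirm with the user before deleting
browser-act browser clear-profile <browser_id>
```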
| Option | Description |
|---|---|
| `--desc` | Browser description |
| `--proxy <url>` | Proxy with scheme (`http`, `https`, `socks4`, `socks5`), e.g. `socks5://host:port` |
| `--mode <normal\|private>` | `normal` (default): persists cache, cookies, login across launches. `private`: fresh environment every launch, no saved state |
Stealth browsers in normal mode (default) persist cookies, cache, and login sessions across launches — you can log in once and reuse the session, similar to a regular browser profile. Use --mode private when the task should not persist any state.
Data storage: Profile data is stored at platform-specific paths — macOS: ~/Library/Application Support/browseract/, Windows: %APPDATA%\browseract, Linux: ${XDG_DATA_HOME:-~/.local/share}/browseract. To clean up persistent data, delete the browser with browser-act browser delete <browser_id> or use browser-act browser clear-profile <browser_id> to reset its profile.
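The platform-specific paths above can be resolved from a script. The following is a minimal sketch and not part of the CLI: `profile_dir` is a hypothetical helper, and mapping `uname -s` values such as `MINGW*` or `CYGWIN*` to the Windows path is an assumption.

```shell
# Hypothetical helper: print the browser-act data directory for an OS name.
# The paths come from the Data storage note; the OS detection is an assumption.
profile_dir() {
  os="${1:-$(uname -s)}"
  case "$os" in
    Darwin)
      printf '%s\n' "$HOME/Library/Application Support/browseract" ;;
    Linux)
      printf '%s\n' "${XDG_DATA_HOME:-$HOME/.local/share}/browseract" ;;
    MINGW*|MSYS*|CYGWIN*)
      printf '%s\n' "${APPDATA}\\browseract" ;;
    *)
      echo "unsupported platform: $os" >&2
      return 1 ;;
  esac
}
```

Calling `profile_dir` with no argument uses the current platform, so `ls "$(profile_dir)"` would list whatever is stored there before you decide to delete a browser or clear its profile.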
Real Chrome
Two modes: auto-connect to your running Chrome (default), or use a BrowserAct-managed kernel.
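```bash
browser-act browser real open https://example.com                # Auto-connect to your running Chrome
browser-act browser real open https://example.com --ba-kernel    # Use the BrowserAct-provided browser kernel
```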
Both browser types support --headed to show the browser UI (default: headless). Use for debugging:
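```bash
browser-act browser open <browser_id> https://example.com --headed
browser-act browser real open https://example.com --ba-kernel --headed
```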
Core Workflow
Every browser automation follows this loop: Open → Inspect → Interact → Verify
1. Open: browser-act browser open <browser_id> <url> (Stealth) or browser-act browser real open <url> (Real Chrome)
2. Inspect: browser-act state, which returns interactive elements with index numbers
3. Interact: use indices from state (browser-act click 5, browser-act input 3 "text")
4. Verify: browser-act state or browser-act screenshot to confirm the result
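```bash
browser-act browser open <browser_id> https://example.com
browser-act state
# Output: [3] input Search, [5] button Go
browser-act input 3 "browser automation"
browser-act click 5
browser-act wait stable
browser-act state    # Always re-inspect after page changes
```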
Important: After any action that changes the page (click, navigation, form submit), run wait stable then state to get fresh element indices. Old indices become invalid after page changes.
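One way to make the re-inspect rule hard to forget is to wrap it. The sketch below is an illustration under assumptions: `act_and_refresh` and the `BROWSER_ACT_BIN` variable are hypothetical names, and only the `wait stable` and `state` subcommands come from this document.

```shell
# Path to the CLI; BROWSER_ACT_BIN is a hypothetical override for testing.
BA="${BROWSER_ACT_BIN:-browser-act}"

# Run one page-changing action, then wait for the page to settle and
# print fresh element indices. Stops at the first command that fails.
act_and_refresh() {
  "$BA" "$@" && "$BA" wait stable && "$BA" state
}
```

For example, `act_and_refresh click 5` would click element 5 and then print the new element list, so the next command never uses stale indices.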
Command Chaining
Commands can be chained with && in a single shell invocation. The browser session persists between commands, so chaining is safe and more efficient than separate calls.
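```bash
# Open + wait + inspect in a single invocation
browser-act browser open <browser_id> https://example.com && browser-act wait stable && browser-act state
# Chain multiple interactions
browser-act input 3 "browser automation" && browser-act click 5
# Navigate and capture
browser-act navigate https://example.com/dashboard && browser-act wait stable && browser-act screenshot
```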
When to chain: Use && when you don't need to read intermediate output before proceeding (e.g., fill multiple fields, then click). Run commands separately when you need to parse the output first (e.g., state to discover indices, then interact using those indices).
Command Reference
Navigation
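```bash
browser-act navigate <url>    # Navigate to URL
browser-act back              # Go back
browser-act forward           # Go forward
browser-act reload            # Reload page
```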
Page State & Interaction
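```bash
# Inspect
browser-act state                       # Interactive elements with index numbers
browser-act screenshot                  # Screenshot (auto path)
browser-act screenshot ./page.png       # Screenshot to a specific path
# Interact (use indices from state)
browser-act click <index>               # Click element
browser-act hover <index>               # Hover over element
browser-act input <index> "text"        # Click element, then type text
browser-act keys Enter                  # Send keyboard keys
browser-act scroll down                 # Scroll down (default 500px)
browser-act scroll up --amount 1000     # Scroll up 1000px
```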
Data Extraction
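```bash
browser-act get title            # Page title
browser-act get html             # Full page HTML
browser-act get text <index>     # Text content of an element
browser-act get value <index>    # Value of an input/textarea
browser-act get markdown         # Page as Markdown
```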
JavaScript Evaluation
```bash
browser-act eval "document.title"    # Execute JavaScript
```
Tab Management
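```bash
browser-act tab list               # List open tabs
browser-act tab switch <tab_id>    # Switch to a tab
browser-act tab close              # Close the current tab
browser-act tab close <tab_id>     # Close a specific tab
```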
Wait
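```bash
browser-act wait stable                    # Wait for the page to stabilize (document ready + network idle, default 30s)
browser-act wait stable --timeout 60000    # Custom timeout (milliseconds)
```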
Network Inspection
CODEBLOCK12
Use network request <request_id> to get full detail for a single request. The detail view includes: request headers, post data (for POST/PUT), response headers, and response body. Binary responses show a [base64, N chars] placeholder instead of raw content.
Dialog Management
Handle JavaScript dialogs (alert, confirm, prompt). By default, browser-act auto-accepts dialogs. Use --no-auto-dialog to disable this and handle them manually.
CODEBLOCK13
Manual dialog flow: Pass --no-auto-dialog when opening the browser, then use dialog status to detect dialogs and dialog accept / dialog dismiss to handle them.
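The manual flow can be sketched as two small helpers. This is an illustration under assumptions: `open_manual`, `resolve_dialog`, and `BROWSER_ACT_BIN` are hypothetical names, placing `--no-auto-dialog` directly after the program name is assumed, and the output and exit status of `dialog status` are not documented in this section.

```shell
# Path to the CLI; BROWSER_ACT_BIN is a hypothetical override for testing.
BA="${BROWSER_ACT_BIN:-browser-act}"

# Open a Stealth browser with automatic dialog handling turned off.
# Assumption: global options go directly after the program name.
open_manual() {
  "$BA" --no-auto-dialog browser open "$1" "$2"
}

# Show the pending dialog, then accept or dismiss it.
resolve_dialog() {
  case "$1" in
    accept|dismiss) ;;
    *) echo "usage: resolve_dialog accept|dismiss" >&2; return 2 ;;
  esac
  "$BA" dialog status && "$BA" dialog "$1"
}
```

A typical sequence would be `open_manual <browser_id> <url>`, an action that triggers a confirm box, then `resolve_dialog dismiss`. Rejecting unknown arguments up front avoids sending a mistyped subcommand to the CLI.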
Captcha Solving
When a page presents a captcha challenge, use solve-captcha to solve it automatically. This is a built-in capability of the browser-act service — run the command directly without hesitation.
CODEBLOCK14
Parallel Automation
Use separate sessions to run multiple browsers in parallel. Each --session <name> creates an isolated browser context — commands to different sessions can execute concurrently without conflicts.
CODEBLOCK15
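As a further sketch of the same idea, shell job control can fan commands out across sessions and wait for all of them. The helper names, the `BROWSER_ACT_BIN` variable, the output file names, and the position of `--session` directly after the program name are assumptions; only `--session <name>` and `state` come from this document.

```shell
# Path to the CLI; BROWSER_ACT_BIN is a hypothetical override for testing.
BA="${BROWSER_ACT_BIN:-browser-act}"

# Run one command inside a named session.
in_session() {
  session="$1"; shift
  "$BA" --session "$session" "$@"
}

# Capture the page state of two sessions concurrently, one file per session.
parallel_state() {
  in_session "$1" state > "$1.state.txt" &
  in_session "$2" state > "$2.state.txt" &
  wait    # block until both background jobs have finished
}
```

Because each session has its own browser context, the two `state` calls do not interfere; writing to separate files keeps their output from interleaving.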
Always close sessions when done to free resources.
Session Management
Sessions isolate browser state. Each session runs its own background server.
CODEBLOCK16
The server auto-shuts down after a period of inactivity.
Site Notes
Operational experience accumulated during browser automation is stored per domain in references/site-notes/.
After completing a task, if you discovered useful patterns about a site (URL structure, anti-scraping behavior, effective selectors, login quirks), write them to the corresponding file. Only write verified facts, not guesses.
File format:
CODEBLOCK17
Before operating on a target site, check if a note file exists and read it for prior knowledge. Notes are dated — treat them as hints that may have changed, not guarantees.
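The pre-flight check can be scripted. The sketch below is an assumption-heavy illustration: the helper names are hypothetical, and naming each note `<domain>.md` is a guess, since this section only says notes are stored per domain in references/site-notes/.

```shell
# Hypothetical helpers; the "<domain>.md" file name is an assumption.
NOTES_DIR="references/site-notes"

# Extract the host part of a URL using POSIX parameter expansion.
url_domain() {
  host="${1#*://}"      # drop the scheme, if any
  host="${host%%/*}"    # drop the path
  host="${host%%:*}"    # drop the port
  printf '%s\n' "$host"
}

# Build the note path for a URL.
note_path() {
  printf '%s/%s.md\n' "$NOTES_DIR" "$(url_domain "$1")"
}

# Print the note for a URL if one exists; return non-zero otherwise.
read_site_note() {
  path="$(note_path "$1")"
  [ -f "$path" ] && cat "$path"
}
```

Running `read_site_note https://example.com/login || echo "no prior notes"` before opening the site surfaces any earlier findings without failing the script when none exist.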
System Commands
CODEBLOCK18
If you encounter issues or have suggestions for improving browser-act, use feedback to let us know. This directly helps us improve the tool and this skill.
Troubleshooting
- browser-act: command not found: Run INLINECODE54
References
| Path | Description |
|---|---|
| INLINECODE55 | Project declarations on user-sensitive information (not automation instructions). |
| INLINECODE56 | Per-site operational experience. Read before operating on a known site. |
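Browser Automation with browser-act CLI
browser-act is a command-line tool for browser automation with stealth and captcha-solving capabilities. It supports two browser types (Stealth and Real Chrome) and provides commands for navigation, page interaction, data extraction, tab/session management, and more.
All commands output human-readable text by default. Use --format json for structured JSON output, which suits AI agent integration and scripting.
Installation
Source: browser-act-cli on PyPI · Homepage
```bash
# Upgrade if already installed, otherwise install fresh
uv tool upgrade browser-act-cli || uv tool install browser-act-cli --python 3.12
```
The CLI is an open-source package published to PyPI by BrowserAct. Run the install command at the start of every session to ensure the latest version.
Global options available on every command: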
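| Option | Default | Description |
|---|---|---|
| `--session <name>` | `default` | Session name (isolates browser state) |
| `--format <text\|json>` | `text` | Output format |
| `--no-auto-dialog` | off | Disable automatic JavaScript dialog handling (alerts, confirms, prompts) |
| `--version` | | Show version |
| `-h, --help` | | Show help |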
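Browser Selection
browser-act supports two browser types. Choose based on the task:
| Scenario | Use | Why |
|---|---|---|
| Target site has bot detection / anti-scraping | Stealth | Anti-detection fingerprinting bypasses bot checks |
| Need proxy or privacy mode | Stealth | Real Chrome does not support `--proxy` / `--mode` |
| Need multiple browsers in parallel | Stealth | Each Stealth browser is independent; create multiple and run in parallel sessions |
| Need user's existing login sessions from their daily browser | Real Chrome | Connects directly to user's Chrome, reusing existing login sessions |
| No bot detection, no login needed | Either | Stealth is the safer default; Real Chrome is simpler |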
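Stealth Browser
Local browsers with anti-detection fingerprinting. Ideal for sites with bot detection.
```bash
# Create
browser-act browser create my-browser
browser-act browser create my-browser --proxy socks5://host:port --mode private
# Update
browser-act browser update <browser_id> --name new-name
browser-act browser update <browser_id> --proxy http://proxy:8080 --mode private
# List / Delete / Clear profile
browser-act browser list                            # List all Stealth browsers
browser-act browser list --page 2 --page-size 10    # Paginated list
browser-act browser delete <browser_id>             # ⚠ Destructive: always confirm with the user before deleting
browser-act browser clear-profile <browser_id>
```
| Option | Description |
|---|---|
| `--desc` | Browser description |
| `--proxy <url>` | Proxy with scheme (`http`, `https`, `socks4`, `socks5`), e.g. `socks5://host:port` |
| `--mode <normal\|private>` | `normal` (default): persists cache, cookies, login across launches. `private`: fresh environment every launch, no saved state |
Stealth browsers in normal mode (default) persist cookies, cache, and login sessions across launches, so you can log in once and reuse the session, similar to a regular browser profile. Use --mode private when the task should not persist any state.
Data storage: Profile data is stored at platform-specific paths. macOS: ~/Library/Application Support/browseract/, Windows: %APPDATA%\browseract, Linux: ${XDG_DATA_HOME:-~/.local/share}/browseract. To clean up persistent data, delete the browser with browser-act browser delete <browser_id> or use browser-act browser clear-profile <browser_id> to reset its profile.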
Real Chrome
Two modes: auto-connect to your running Chrome (default), or use a BrowserAct-managed kernel.
```bash
browser-act browser real open https://example.com                # Auto-connect to your running Chrome
browser-act browser real open https://example.com --ba-kernel    # Use the BrowserAct-provided browser kernel
```
Both browser types support --headed to show the browser UI (default: headless). Use it for debugging:
```bash
browser-act browser open <browser_id> https://example.com --headed
browser-act browser real open https://example.com --ba-kernel --headed
```
Core Workflow
Every browser automation follows this loop: Open → Inspect → Interact → Verify
1. Open: browser-act browser open <browser_id> <url> (Stealth) or browser-act browser real open <url> (Real Chrome)
2. Inspect: browser-act state, which returns interactive elements with index numbers
3. Interact: use indices from state (browser-act click 5, browser-act input 3 "text")
4. Verify: browser-act state or browser-act screenshot to confirm the result
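```bash
browser-act browser open <browser_id> https://example.com
browser-act state
# Output: [3] input Search, [5] button Go
browser-act input 3 "browser automation"
browser-act click 5
browser-act wait stable
browser-act state    # Always re-inspect after page changes
```
Important: After any action that changes the page (click, navigation, form submit), run wait stable then state to get fresh element indices. Old indices become invalid after page changes.
Command Chaining
Commands can be chained with && in a single shell invocation. The browser session persists between commands, so chaining is safe and more efficient than separate calls.
```bash
# Open + wait + inspect in a single invocation
browser-act browser open <browser_id> https://example.com && browser-act wait stable && browser-act state
# Chain multiple interactions
browser-act input 3 "browser automation" && browser-act click 5
# Navigate and capture
browser-act navigate https://example.com/dashboard && browser-act wait stable && browser-act screenshot
```
When to chain: Use && when you don't need to read intermediate output before proceeding (e.g., fill multiple fields, then click). Run commands separately when you need to parse the output first (e.g., state to discover indices, then interact using those indices).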
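Command Reference
Navigation
```bash
browser-act navigate <url>    # Navigate to URL
browser-act back              # Go back
browser-act forward           # Go forward
browser-act reload            # Reload page
```
Page State & Interaction
```bash
# Inspect
browser-act state                       # Interactive elements with index numbers
browser-act screenshot                  # Screenshot (auto path)
browser-act screenshot ./page.png       # Screenshot to a specific path
# Interact (use indices from state)
browser-act click <index>               # Click element
browser-act hover <index>               # Hover over element
browser-act input <index> "text"        # Click element, then type text
browser-act keys Enter                  # Send keyboard keys
browser-act scroll down                 # Scroll down (default 500px)
browser-act scroll up --amount 1000     # Scroll up 1000px
```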
Data Extraction
```bash
browser-act get title            # Page title
browser-act get html             # Full page HTML
browser-act get text <index>     # Text content of an element
browser-act get value <index>    # Value of an input/textarea
browser-act get markdown         # Page as Markdown
```
JavaScript Evaluation
```bash
browser-act eval "document.title"    # Execute JavaScript
```
Tab Management
```bash
browser-act tab list               # List open tabs
browser-act tab switch <tab_id>    # Switch to a tab
browser-act tab close              # Close the current tab
browser-act tab close <tab_id>     # Close a specific tab
```
Wait
```bash
browser-act wait stable                    # Wait for the page to stabilize (document ready + network idle, default 30s)
browser-act wait stable --timeout 60000    # Custom timeout (milliseconds)
```
Network Inspection
bash
browser-act network requests # List all captured requests