Browser Automation with browser-act CLI

browser-act is a CLI for browser automation with stealth and captcha-solving capabilities. It supports two browser types (Stealth and Real Chrome) and provides commands for navigation, page interaction, data extraction, tab and session management, and more.

All commands output human-readable text by default. Use --format json for structured JSON output, ideal for AI agent integration and scripting.

Installation

Source: browser-act-cli on PyPI · Homepage

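```bash
# Upgrade if already installed, otherwise install fresh
uv tool upgrade browser-act-cli || uv tool install browser-act-cli --python 3.12
```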

The CLI is an open-source package published to PyPI by BrowserAct. Run the install command at the start of every session to ensure the latest version.
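When the install step runs unattended at the start of a session, a small guard can make failures easier to diagnose. The sketch below wraps the same upgrade-or-install command in a function and first checks that `uv` is on the PATH. The function name `ensure_browser_act` is hypothetical and not part of the CLI.

```shell
# Hypothetical helper (not part of browser-act): run the documented
# upgrade-or-install command, but fail early if uv is missing.
ensure_browser_act() {
    if ! command -v uv >/dev/null 2>&1; then
        echo "uv not found on PATH; install uv first" >&2
        return 1
    fi
    # Upgrade if already installed, otherwise install fresh.
    uv tool upgrade browser-act-cli || uv tool install browser-act-cli --python 3.12
}
```

Calling `ensure_browser_act` once per session keeps the install logic in one place.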

Global options available on every command:

Option	Default	Description
--session <name>	default	Session name (isolates browser state)
--format <text|json>	text	Output format: human-readable text or structured JSON

Browser Selection

browser-act supports two browser types. Choose based on the task:

Scenario	Use	Why
Target site has bot detection / anti-scraping	Stealth	Anti-detection fingerprinting bypasses bot checks
Need proxy or privacy mode

Stealth Browser

Local browsers with anti-detection fingerprinting. Ideal for sites with bot detection.

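```bash
# Create
browser-act browser create my-browser
browser-act browser create my-browser --proxy socks5://host:port --mode private

# Update
browser-act browser update <browser_id> --name new-name
browser-act browser update <browser_id> --proxy http://proxy:8080 --mode private

# List / Delete / Clear profile
browser-act browser list                           # List all stealth browsers
browser-act browser list --page 2 --page-size 10   # Paginated list
browser-act browser delete <browser_id>            # ⚠ Destructive: always confirm with the user before deleting
browser-act browser clear-profile <browser_id>
```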

Option	Description
--desc	Browser description
--proxy <url>	Proxy with scheme (http, https, socks4, socks5), e.g. socks5://host:port
--mode <normal|private>	normal (default): persists cache, cookies, and login across launches. private: fresh environment every launch, no saved state

Stealth browsers in normal mode (default) persist cookies, cache, and login sessions across launches — you can log in once and reuse the session, similar to a regular browser profile. Use --mode private when the task should not persist any state.

Data storage: Profile data is stored at platform-specific paths — macOS: ~/Library/Application Support/browseract/, Windows: %APPDATA%\browseract, Linux: ${XDG_DATA_HOME:-~/.local/share}/browseract. To clean up persistent data, delete the browser with browser-act browser delete <browser_id> or use browser-act browser clear-profile <browser_id> to reset its profile.
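To locate the profile data from a script, the platform-specific paths above can be resolved with a small function. This is a minimal sketch: the name `browseract_data_dir` is hypothetical, and the Windows branch assumes a Unix-like shell (such as Git Bash) that exports `APPDATA`.

```shell
# Hypothetical helper: print the profile data directory documented above.
# Pass an OS name (as printed by `uname -s`) to override detection.
browseract_data_dir() {
    os=${1:-$(uname -s)}
    case "$os" in
        Darwin)
            echo "$HOME/Library/Application Support/browseract" ;;
        MINGW*|MSYS*|CYGWIN*)
            # Assumes a Unix-like shell on Windows that exports APPDATA.
            echo "$APPDATA/browseract" ;;
        *)
            # Linux and other XDG platforms; fall back to ~/.local/share.
            echo "${XDG_DATA_HOME:-$HOME/.local/share}/browseract" ;;
    esac
}
```

The fallback uses `$HOME` rather than `~` because a tilde inside a quoted parameter expansion is not expanded by the shell.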

Real Chrome

Two modes: auto-connect to your running Chrome (default), or use a BrowserAct-managed kernel.

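```bash
browser-act browser real open https://example.com               # Auto-connect to your running Chrome
browser-act browser real open https://example.com --ba-kernel   # Use the BrowserAct-provided browser kernel
```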

Both browser types support --headed to show the browser UI (default: headless). Use for debugging:

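```bash
browser-act browser open <browser_id> https://example.com --headed
browser-act browser real open https://example.com --ba-kernel --headed
```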

Core Workflow

Every browser automation follows this loop: Open → Inspect → Interact → Verify

1. Open: browser-act browser open <browser_id> <url> (Stealth) or browser-act browser real open <url> (Real Chrome)
2. Inspect: browser-act state, which returns interactive elements with index numbers
3. Interact: use indices from state (browser-act click 5, browser-act input 3 "text")
4. Verify: browser-act state or browser-act screenshot to confirm the result

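```bash
browser-act browser open <browser_id> https://example.com
browser-act state
# Output: [3] input Search, [5] button Go
browser-act input 3 "browser automation"
browser-act click 5
browser-act wait stable
browser-act state        # Always re-inspect after page changes
```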

Important: After any action that changes the page (click, navigation, form submit), run wait stable then state to get fresh element indices. Old indices become invalid after page changes.
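Because every page-changing action should be followed by the same two commands, the sequence can be bundled into one shell function. The wrapper name `ba_act` is hypothetical; it only chains commands already described in this document.

```shell
# Hypothetical wrapper (not a browser-act command): run one page-changing
# action, wait for the page to settle, then print fresh element indices.
ba_act() {
    browser-act "$@" && browser-act wait stable && browser-act state
}
```

For example, `ba_act click 5` clicks element 5 and then prints the refreshed state, so stale indices are never reused by accident.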

Command Chaining

Commands can be chained with && in a single shell invocation. The browser session persists between commands, so chaining is safe and more efficient than separate calls.

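```bash
# Open + wait + inspect in a single call
browser-act browser open <browser_id> https://example.com && browser-act wait stable && browser-act state

# Chain multiple interactions
browser-act input 3 "browser automation" && browser-act click 5

# Navigate and capture
browser-act navigate https://example.com/dashboard && browser-act wait stable && browser-act screenshot
```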

When to chain: Use && when you don't need to read intermediate output before proceeding (e.g., fill multiple fields, then click). Run commands separately when you need to parse the output first (e.g., state to discover indices, then interact using those indices).
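The parse-then-interact case can be sketched as a filter that extracts an index from the text output of state. This assumes one element per line in the form `[N] kind label`, which is an assumption about the output layout rather than a documented format; the helper name `find_index` is hypothetical.

```shell
# Hypothetical helper: read `browser-act state` text on stdin and print the
# index of the first element whose line contains the given label.
# Assumes one element per line in the form "[N] kind label".
find_index() {
    grep -F -- "$1" | sed -n 's/^[[:space:]]*\[\([0-9][0-9]*\)\].*/\1/p' | head -n 1
}

# Usage (two separate steps, because the index must be read first):
#   idx=$(browser-act state | find_index "Go")
#   browser-act click "$idx"
```

The match is a plain substring test, so a short label can match more than one element; only the first match is returned.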

Command Reference

Navigation

CODEBLOCK6

Page State & Interaction

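```bash
# Inspect
browser-act state                        # Interactive elements with index numbers
browser-act screenshot                   # Screenshot (automatic path)
browser-act screenshot ./page.png        # Screenshot to a specific path

# Interact (use indices from state)
browser-act click <index>                # Click element
browser-act hover <index>                # Hover over element
browser-act input <index> "text"         # Click element, then type text
browser-act keys Enter                   # Send keyboard keys
browser-act scroll down                  # Scroll down (default 500px)
browser-act scroll up --amount 1000      # Scroll up 1000px
```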

Data Extraction

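```bash
browser-act get title                    # Page title
browser-act get html                     # Full page HTML
browser-act get text <index>             # Text content of element
browser-act get value <index>            # Value of input/textarea
browser-act get markdown                 # Page as Markdown
```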

JavaScript Evaluation

```bash
browser-act eval "document.title"        # Execute JavaScript
```

Tab Management

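```bash
browser-act tab list                     # List open tabs
browser-act tab switch <tab_id>          # Switch to tab
browser-act tab close                    # Close current tab
browser-act tab close <tab_id>           # Close specific tab
```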

Wait

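```bash
browser-act wait stable                  # Wait for page to settle (document ready + network idle, default 30s)
browser-act wait stable --timeout 60000  # Custom timeout (milliseconds)
```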
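The wait stable timeout is given in milliseconds, with a default of 30 seconds. If a script tracks timeouts in seconds, a thin wrapper can do the conversion. The name `wait_stable_secs` is hypothetical.

```shell
# Hypothetical helper: wait for the page to settle with a timeout given in
# seconds; `wait stable --timeout` takes milliseconds.
wait_stable_secs() {
    secs=${1:-30}
    browser-act wait stable --timeout "$((secs * 1000))"
}
```

For example, `wait_stable_secs 60` runs `browser-act wait stable --timeout 60000`.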

Network Inspection

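```bash
browser-act network requests                 # List all captured requests
browser-act network request <request_id>     # Full detail for a single request
```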

Use network request <request_id> to get full detail for a single request. The detail view includes: request headers, post data (for POST/PUT), response headers, and response body. Binary responses show a [base64, N chars] placeholder instead of raw content.

Dialog Management

Handle JavaScript dialogs (alert, confirm, prompt). By default, browser-act auto-accepts dialogs. Use --no-auto-dialog to disable this and handle them manually.

CODEBLOCK13

Manual dialog flow: Pass --no-auto-dialog when opening the browser, then use dialog status to detect dialogs and dialog accept / dialog dismiss to handle them.

Captcha Solving

When a page presents a captcha challenge, use solve-captcha to solve it automatically. This is a built-in capability of the browser-act service — run the command directly without hesitation.

CODEBLOCK14

Parallel Automation

Use separate sessions to run multiple browsers in parallel. Each --session <name> creates an isolated browser context — commands to different sessions can execute concurrently without conflicts.

CODEBLOCK15

Always close sessions when done to free resources.
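As a rough illustration of concurrent sessions, the sketch below runs state in several named sessions at once using shell background jobs. It assumes the global --session option is accepted before the subcommand, which this document does not confirm; the helper names `in_session` and `state_all` and the session names are hypothetical.

```shell
# Hypothetical helper: run one browser-act command inside a named session.
# Assumes the global --session option is accepted before the subcommand.
in_session() {
    session=$1
    shift
    browser-act --session "$session" "$@"
}

# Capture the state of every named session concurrently, writing one
# <session>.txt file per session into an existing output directory.
state_all() {
    outdir=$1
    shift
    for name in "$@"; do
        in_session "$name" state > "$outdir/$name.txt" &
    done
    wait    # Block until every background command has finished.
}
```

For example, `state_all ./out shop-a shop-b` would write `./out/shop-a.txt` and `./out/shop-b.txt`. Closing the sessions afterwards is left to the session commands described in the next section.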

Session Management

Sessions isolate browser state. Each session runs its own background server.

CODEBLOCK16

The server auto-shuts down after a period of inactivity.

Site Notes

Operational experience accumulated during browser automation is stored per domain in references/site-notes/.

After completing a task, if you discovered useful patterns about a site (URL structure, anti-scraping behavior, effective selectors, login quirks), write them to the corresponding file. Only write verified facts, not guesses.

File format:

CODEBLOCK17

Before operating on a target site, check if a note file exists and read it for prior knowledge. Notes are dated — treat them as hints that may have changed, not guarantees.
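The "check for a note first" step can be sketched as a lookup keyed on the URL's host. The file naming used here (`<domain>.md` inside references/site-notes/) is an assumption for illustration, as is the helper name `site_note`.

```shell
# Hypothetical helper: print the site note for a URL's domain, if one exists.
# Assumes notes are named <domain>.md inside references/site-notes/.
site_note() {
    notes_dir=${2:-references/site-notes}
    # Strip the scheme, then everything after the host (port, path, query).
    host=${1#*://}
    host=${host%%[/:?#]*}
    note="$notes_dir/$host.md"
    if [ -f "$note" ]; then
        cat "$note"
    else
        echo "no site note for $host" >&2
        return 1
    fi
}
```

A non-zero exit status signals that no prior knowledge is recorded, so a calling script can proceed without a note.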

System Commands

CODEBLOCK18

If you encounter issues or have suggestions for improving browser-act, use feedback to let us know. This directly helps us improve the tool and this skill.

Troubleshooting

- browser-act: command not found — Run INLINECODE54

References

Path	Description
INLINECODE55	Project declarations on user-sensitive information (not automation instructions).
INLINECODE56	Per-site operational experience. Read before operating on a known site.

