BrowserWing Executor API
Overview
BrowserWing Executor provides comprehensive browser automation capabilities through HTTP APIs. You can control browser navigation, interact with page elements, extract data, and analyze page structure.
Configuration
API Base URL: The BrowserWing Executor API address is configurable via an environment variable.
- Environment Variable: BROWSERWING_EXECUTOR_URL
- Default Value: http://127.0.0.1:8080
- How to get the URL: Read the environment variable $BROWSERWING_EXECUTOR_URL; if it is not set, use the default http://127.0.0.1:8080
Base URL Format: ${BROWSERWING_EXECUTOR_URL}/api/v1/executor or http://127.0.0.1:8080/api/v1/executor (if the environment variable is not set)
Authentication: Use the X-BrowserWing-Key: <api-key> header or Authorization: Bearer <token>, if required.
Important: Always construct the API URL by reading the environment variable first. In shell commands, use: ${BROWSERWING_EXECUTOR_URL:-http://127.0.0.1:8080}
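The lookup rule above can be wrapped in a small shell helper. This is a minimal sketch: the function names bw_base_url and bw_auth_header and the BROWSERWING_API_KEY variable are illustrative assumptions, not part of the BrowserWing API. Only the environment variable, the default address, the path prefix, and the header name come from this section.

```shell
# Resolve the executor base URL: use $BROWSERWING_EXECUTOR_URL when it is set
# and non-empty, otherwise fall back to the documented default.
bw_base_url() {
  printf '%s/api/v1/executor\n' "${BROWSERWING_EXECUTOR_URL:-http://127.0.0.1:8080}"
}

# Print the API-key header line, or nothing when no key is configured.
# BROWSERWING_API_KEY is a hypothetical variable name used for illustration.
bw_auth_header() {
  if [ -n "${BROWSERWING_API_KEY:-}" ]; then
    printf 'X-BrowserWing-Key: %s\n' "$BROWSERWING_API_KEY"
  fi
}
```

A call such as curl -X GET "$(bw_base_url)/help" then picks up whichever address is configured.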
Core Capabilities
- Page Navigation: Navigate to URLs, go back/forward, reload
- Element Interaction: Click, type, select, hover on page elements
- Data Extraction: Extract text, attributes, values from elements
- Accessibility Analysis: Get accessibility snapshot to understand page structure
- Advanced Operations: Screenshot, JavaScript execution, keyboard input
- Batch Processing: Execute multiple operations in sequence
API Endpoints
1. Discover Available Commands
IMPORTANT: Always call this endpoint first to see all available commands and their parameters.
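EXECUTOR_URL="${BROWSERWING_EXECUTOR_URL:-http://127.0.0.1:8080}"
curl -X GET "${EXECUTOR_URL}/api/v1/executor/help"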
Response: Returns complete list of all commands with parameters, examples, and usage guidelines.
Query specific command:
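EXECUTOR_URL="${BROWSERWING_EXECUTOR_URL:-http://127.0.0.1:8080}"
curl -X GET "${EXECUTOR_URL}/api/v1/executor/help?command=extract"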
2. Get Accessibility Snapshot
CRITICAL: Always call this after navigation to understand page structure and get element RefIDs.
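EXECUTOR_URL="${BROWSERWING_EXECUTOR_URL:-http://127.0.0.1:8080}"
curl -X GET "${EXECUTOR_URL}/api/v1/executor/snapshot"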
Response Example:
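{
  "success": true,
  "snapshot_text": "Clickable Elements:\n  @e1 Login (role: button)\n  @e2 Sign Up (role: link)\n\nInput Elements:\n  @e3 Email (role: textbox) [placeholder: your@email.com]\n  @e4 Password (role: textbox)"
}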
Use Cases:
- Understand what interactive elements are on the page
- Get element RefIDs (@e1, @e2, etc.) for precise identification
- See element labels, roles, and attributes
- The accessibility tree is cleaner than raw DOM and better for LLMs
- RefIDs are stable references that work reliably across page changes
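To make the RefID lookup concrete, here is a minimal sketch that pulls RefIDs out of snapshot text. It assumes the snapshot text lists one element per line with its RefID (for example "@e1 Login (role: button)"); the helper names bw_refids and bw_find_label are hypothetical.

```shell
# Print each RefID (@e1, @e2, ...) found in the snapshot text, one per line,
# in order of first appearance and without duplicates.
bw_refids() {
  printf '%s\n' "$1" | grep -oE '@e[0-9]+' | awk '!seen[$0]++'
}

# Print the snapshot lines that mention a given label, e.g. "Login".
bw_find_label() {
  printf '%s\n' "$1" | grep -F -- "$2"
}
```

Matching on the visible label first and then reading the RefID off that line keeps later click or type calls tied to what the snapshot actually reported.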
3. Common Operations
Note: All examples below use EXECUTOR_URL="${BROWSERWING_EXECUTOR_URL:-http://127.0.0.1:8080}" to read the API address from environment variable, with http://127.0.0.1:8080 as fallback default.
Navigate to URL
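EXECUTOR_URL="${BROWSERWING_EXECUTOR_URL:-http://127.0.0.1:8080}"
curl -X POST "${EXECUTOR_URL}/api/v1/executor/navigate" \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://example.com"}'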
Click Element
EXECUTOR_URL="${BROWSERWING_EXECUTOR_URL:-http://127.0.0.1:8080}"
curl -X POST "${EXECUTOR_URL}/api/v1/executor/click" \
-H 'Content-Type: application/json' \
-d '{"identifier": "@e1"}'
Identifier formats:
- RefID (Recommended): @e1, @e2 (from snapshot)
- CSS Selector: #button-id, .class-name
- XPath: //button[@type='submit']
- Text: Login (text content)
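The formats above can be told apart by their leading characters. The helper below is a rough heuristic sketch with a hypothetical name (bw_identifier_kind); the API itself decides how an identifier is resolved, and this function only mirrors the listed formats for logging or validation on the client side.

```shell
# Guess which identifier format a string uses, based on its prefix.
bw_identifier_kind() {
  case "$1" in
    @e[0-9]*) echo refid ;;  # RefID from a snapshot, e.g. @e1
    //*)      echo xpath ;;  # XPath expression, e.g. //button[@id='login']
    '#'*|.*)  echo css ;;    # CSS id or class selector
    *)        echo text ;;   # anything else is treated as text content
  esac
}
```

CSS selectors that start with a tag name (such as button.primary) fall through to "text" here, which is one reason RefIDs are the safer choice.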
Type Text
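EXECUTOR_URL="${BROWSERWING_EXECUTOR_URL:-http://127.0.0.1:8080}"
curl -X POST "${EXECUTOR_URL}/api/v1/executor/type" \
  -H 'Content-Type: application/json' \
  -d '{"identifier": "@e3", "text": "user@example.com"}'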
Extract Data
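EXECUTOR_URL="${BROWSERWING_EXECUTOR_URL:-http://127.0.0.1:8080}"
curl -X POST "${EXECUTOR_URL}/api/v1/executor/extract" \
  -H 'Content-Type: application/json' \
  -d '{
    "selector": ".product-item",
    "fields": ["text", "href"],
    "multiple": true
  }'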
Wait for Element
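EXECUTOR_URL="${BROWSERWING_EXECUTOR_URL:-http://127.0.0.1:8080}"
curl -X POST "${EXECUTOR_URL}/api/v1/executor/wait" \
  -H 'Content-Type: application/json' \
  -d '{"identifier": ".loading", "state": "hidden", "timeout": 10}'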
Batch Operations
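EXECUTOR_URL="${BROWSERWING_EXECUTOR_URL:-http://127.0.0.1:8080}"
curl -X POST "${EXECUTOR_URL}/api/v1/executor/batch" \
  -H 'Content-Type: application/json' \
  -d '{
    "operations": [
      {"type": "navigate", "params": {"url": "https://example.com"}, "stop_on_error": true},
      {"type": "click", "params": {"identifier": "@e1"}, "stop_on_error": true},
      {"type": "type", "params": {"identifier": "@e3", "text": "query"}, "stop_on_error": true}
    ]
  }'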
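A batch body is a list of operation objects, so it can be assembled from small pieces. The sketch below is illustrative: the helper names are hypothetical, and the JSON keys (operations, type, params, stop_on_error) are assumed from the batch request format rather than confirmed against GET /help.

```shell
# Emit one batch operation object. $1 is the operation type and $2 is the
# params object as a JSON string. stop_on_error is set to true for every step.
bw_batch_op() {
  printf '{"type": "%s", "params": %s, "stop_on_error": true}' "$1" "$2"
}

# Join any number of operation objects into a /batch request body.
bw_batch_body() {
  printf '{"operations": ['
  sep=''
  for op in "$@"; do
    printf '%s%s' "$sep" "$op"
    sep=', '
  done
  printf ']}\n'
}
```

The resulting string can be passed to curl with -d. Values are inserted verbatim, so any text containing double quotes would need JSON escaping first.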
Instructions
Step-by-step workflow:
- 0. Get API URL: First, read the API base URL from the environment variable $BROWSERWING_EXECUTOR_URL. If it is not set, use the default http://127.0.0.1:8080. In shell commands, use: EXECUTOR_URL="${BROWSERWING_EXECUTOR_URL:-http://127.0.0.1:8080}"
- 1. Discover commands: Call GET /help to see all available operations and their parameters (do this first if unsure).
- 2. Navigate: Use POST /navigate to open the target webpage.
- 3. Analyze page: Call GET /snapshot to understand page structure and get element RefIDs.
- 4. Interact: Use element RefIDs (like @e1, @e2) or CSS selectors to:
  - Click elements: POST /click
  - Input text: POST /type
  - Select options: POST /select
  - Wait for elements: POST /wait
- 5. Extract data: Use POST /extract to get information from the page.
- 6. Present results: Format and show extracted data to the user.
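The workflow above can be sketched as a pair of thin curl wrappers plus one function that chains the steps. All function names and the BW_CURL and BW_BASE variables are hypothetical conveniences, and the extract body keys (selector, fields, multiple) are assumed from the extract request format; check GET /help for the authoritative parameters.

```shell
# HTTP client command; override BW_CURL (for example with a stub) to dry-run.
BW_CURL="${BW_CURL:-curl}"
BW_BASE="${BROWSERWING_EXECUTOR_URL:-http://127.0.0.1:8080}/api/v1/executor"

# GET an executor endpoint, e.g. bw_get snapshot
bw_get() {
  "$BW_CURL" -sS -X GET "${BW_BASE}/$1"
}

# POST a JSON body to an executor endpoint,
# e.g. bw_post click '{"identifier": "@e1"}'
bw_post() {
  "$BW_CURL" -sS -X POST "${BW_BASE}/$1" \
    -H 'Content-Type: application/json' \
    -d "$2"
}

# Navigate, snapshot, then extract. Stops at the first failing call.
# $1 is the page URL, $2 is a CSS selector for the elements to extract.
bw_scrape() {
  bw_post navigate "{\"url\": \"$1\"}" &&
  bw_get snapshot &&
  bw_post extract "{\"selector\": \"$2\", \"fields\": [\"text\"], \"multiple\": true}"
}
```

Because the arguments are spliced into JSON as-is, this sketch only suits URLs and selectors without double quotes.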
Complete Example
User Request: "Search for 'laptop' on example.com and get the first 5 results"
Your Actions:
- 1. Navigate to search page:
CODEBLOCK10
- 2. Get page structure to find search input:
EXECUTOR_URL="${BROWSERWING_EXECUTOR_URL:-http://127.0.0.1:8080}"
curl -X GET "${EXECUTOR_URL}/api/v1/executor/snapshot"
Response shows: INLINECODE30
- 3. Type search query:
CODEBLOCK12
- 4. Press Enter to submit:
CODEBLOCK13
- 5. Wait for results to load:
CODEBLOCK14
- 6. Extract search results:
CODEBLOCK15
- 7. Present the extracted data:
CODEBLOCK16
Key Commands Reference
Navigation
- POST /navigate - Navigate to URL
- INLINECODE32 - Go back in history
- INLINECODE33 - Go forward in history
- INLINECODE34 - Reload current page
Element Interaction
- POST /click - Click element (supports: RefID @e1, CSS selector, XPath, text content)
- POST /type - Type text into input (supports: RefID @e3, CSS selector, XPath)
- POST /select - Select dropdown option
- INLINECODE40 - Hover over element
- POST /wait - Wait for element state (visible, hidden, enabled)
- INLINECODE42 - Press keyboard key (Enter, Tab, Ctrl+S, etc.)
Data Extraction
- POST /extract - Extract data from elements (supports multiple elements, custom fields)
- INLINECODE44 - Get element text content
- INLINECODE45 - Get input element value
- INLINECODE46 - Get page URL and title
- INLINECODE47 - Get all page text
- INLINECODE48 - Get full HTML
Page Analysis
- GET /snapshot - Get accessibility snapshot (⭐ ALWAYS call after navigation)
- INLINECODE50 - Get all clickable elements
- INLINECODE51 - Get all input elements
Advanced
- POST /screenshot - Take page screenshot (base64 encoded)
- INLINECODE53 - Execute JavaScript code
- POST /batch - Execute multiple operations in sequence
- INLINECODE55 - Scroll to page bottom
- INLINECODE56 - Resize browser window
- INLINECODE57 - Manage browser tabs (list, new, switch, close)
- INLINECODE58 - Intelligently fill multiple form fields at once
Debug & Monitoring
- GET /console-messages - Get browser console messages (logs, warnings, errors)
- INLINECODE60 - Get network requests made by the page
- INLINECODE61 - Configure JavaScript dialog (alert, confirm, prompt) handling
- INLINECODE62 - Upload files to input elements
- INLINECODE63 - Drag and drop elements
- INLINECODE64 - Close the current page/tab
Element Identification
You can identify elements using:
- 1. RefID (Recommended): @e1, @e2, INLINECODE67
  - Most reliable method - stable across page changes
  - Get RefIDs from the /snapshot endpoint
  - Valid for 5 minutes after snapshot
  - Example: "identifier": "@e1"
  - Works with multi-strategy fallback for robustness
- 2. CSS Selector: #id, .class, INLINECODE72
  - Standard CSS selectors
  - Example: INLINECODE73
- 3. XPath: //button[@id='login'], INLINECODE75
  - XPath expressions for complex queries
  - Example: INLINECODE76
- 4. Text Content: Login, Sign Up, INLINECODE79
  - Searches buttons and links with matching text
  - Example: INLINECODE80
- 5. ARIA Label: Elements with an aria-label attribute
  - Automatically searched
Guidelines
Before starting:
- Get API URL first: Read from the $BROWSERWING_EXECUTOR_URL environment variable, or use http://127.0.0.1:8080 as the default
- Call GET /help if you're unsure about available commands or their parameters
- Ensure the browser is started (if not, it will auto-start on the first operation)
During automation:
- Always call /snapshot after navigation to get page structure and RefIDs
- Prefer RefIDs (like @e1) over CSS selectors for reliability and stability
- Re-snapshot after page changes to get updated RefIDs
- Use /wait for dynamic content that loads asynchronously
- Check element states before interaction (visible, enabled)
- Use /batch for multiple sequential operations to improve efficiency
Error handling:
- If an operation fails, check the element identifier and try a different format
- For timeout errors, increase the timeout value
- If an element is not found, call /snapshot again to refresh the page structure
- Explain errors clearly to the user with suggested solutions
Data extraction:
- Use the fields parameter to specify what to extract: INLINECODE91
- Set multiple: true to extract from multiple elements
- Format extracted data in a readable way for the user
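As an illustration of how the fields and multiple parameters combine, the sketch below builds an /extract request body. The helper name bw_extract_body is hypothetical, and the selector key and the example field names (text, href) are assumptions based on the extract request format; confirm them with GET /help?command=extract.

```shell
# Build an /extract request body.
# $1 = CSS selector, $2 = true or false for "multiple",
# remaining arguments = field names to extract.
bw_extract_body() {
  selector=$1
  multiple=$2
  shift 2
  fields=''
  for f in "$@"; do
    # Append a comma separator only after the first field.
    fields="${fields}${fields:+, }\"$f\""
  done
  printf '{"selector": "%s", "fields": [%s], "multiple": %s}\n' \
    "$selector" "$fields" "$multiple"
}
```

For example, bw_extract_body .product-item true text href produces a body asking for the text and href of every matching element.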
Complete Workflow Example
Scenario: User wants to login to a website
CODEBLOCK17
Your Actions:
Step 1: Navigate to login page
CODEBLOCK18
Step 2: Get page structure
EXECUTOR_URL="${BROWSERWING_EXECUTOR_URL:-http://127.0.0.1:8080}"
curl -X GET "${EXECUTOR_URL}/api/v1/executor/snapshot"
Response:
CODEBLOCK20
Step 3: Enter username
CODEBLOCK21
Step 4: Enter password
CODEBLOCK22
Step 5: Click login button
CODEBLOCK23
Step 6: Wait for login success (optional)
CODEBLOCK24
Step 7: Inform user
CODEBLOCK25
Batch Operation Example
Scenario: Fill out a form with multiple fields
Instead of making 5 separate API calls, use one batch operation:
CODEBLOCK26
Best Practices
- 1. Discovery first: If unsure, call /help or /help?command=<name> to learn about commands
- 2. Structure first: Always call /snapshot after navigation to understand the page
- 3. Use RefIDs: They're more reliable than CSS selectors (elements might have dynamic classes)
- 4. Wait for dynamic content: Use /wait before interacting with elements that load asynchronously
- 5. Batch when possible: Use /batch for multiple sequential operations
- 6. Handle errors gracefully: Provide clear explanations and suggestions when operations fail
- 7. Verify results: After operations, check whether the desired outcome was achieved
Common Scenarios
Form Filling
- 1. Navigate to the form page
- 2. Get an accessibility snapshot to find input elements and their RefIDs
- 3. Use /type for each field: @e1, @e2, etc.
- 4. Use /select for dropdowns
- 5. Click the submit button using its RefID
Data Scraping
- 1. Navigate to the target page
- 2. Wait for content to load with INLINECODE102
- 3. Use /extract with a CSS selector and INLINECODE104
- 4. Specify fields to extract: INLINECODE105
Search Operations
- 1. Navigate to the search page
- 2. Get an accessibility snapshot to locate the search input
- 3. Type the search query into the input
- 4. Press Enter or click the search button
- 5. Wait for results
- 6. Extract the results data
Login Automation
- 1. Navigate to the login page
- 2. Get an accessibility snapshot to find RefIDs
- 3. Type username: INLINECODE106
- 4. Type password: INLINECODE107
- 5. Click login button: INLINECODE108
- 6. Wait for a success indicator
Important Notes
- The browser must be running (it will auto-start on the first operation if needed)
- Operations are executed on the currently active browser tab
- The accessibility snapshot updates after each navigation and click operation
- All timeouts are in seconds
- Use wait_visible: true (default) for reliable element interaction
- API address: Always read from the $BROWSERWING_EXECUTOR_URL environment variable, falling back to http://127.0.0.1:8080 if it is not set
- Authentication, if configured: use the X-BrowserWing-Key header or a JWT token
Troubleshooting
Element not found:
- Call /snapshot to see available elements
- Try a different identifier format (RefID, CSS selector, text)
- Check whether the page has finished loading
Timeout errors:
- Increase the timeout value in the request
- Check whether the element actually appears on the page
- Use /wait with the appropriate state before interaction
Extraction returns empty:
- Verify that the CSS selector matches the target elements
- Check whether the content has loaded (use /wait first)
- Try different extraction fields or type
Quick Reference
CODEBLOCK27
Response Format
All operations return:
CODEBLOCK28
Error response:
CODEBLOCK29
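When scripting against these responses, a quick success check helps decide whether to continue or to re-snapshot. The sketch below assumes the response body carries a boolean success field, as the snapshot response does; the helper name bw_ok is hypothetical, and a real JSON parser is preferable where one is available.

```shell
# Succeed (exit status 0) when the response body contains "success": true.
# This is a plain-text check intended for environments without a JSON parser.
bw_ok() {
  printf '%s' "$1" | grep -Eq '"success"[[:space:]]*:[[:space:]]*true'
}
```

A caller might write: if bw_ok "$response"; then continue with the next step; otherwise report the error to the user.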
BrowserWing Executor API
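Overview
BrowserWing Executor provides comprehensive browser automation capabilities through HTTP APIs. You can control browser navigation, interact with page elements, extract data, and analyze page structure.
Configuration
API Base URL: The BrowserWing Executor API address is configurable via an environment variable.
- Environment Variable: BROWSERWING_EXECUTOR_URL
- Default Value: http://127.0.0.1:8080
- How to get the URL: Read the environment variable $BROWSERWING_EXECUTOR_URL; if it is not set, use the default http://127.0.0.1:8080
Base URL Format: ${BROWSERWING_EXECUTOR_URL}/api/v1/executor or http://127.0.0.1:8080/api/v1/executor (if the environment variable is not set)
Authentication: Use the X-BrowserWing-Key: <api-key> header or Authorization: Bearer <token>, if required.
Important: Always construct the API URL by reading the environment variable first. In shell commands, use: ${BROWSERWING_EXECUTOR_URL:-http://127.0.0.1:8080}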
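Core Capabilities
- Page Navigation: Navigate to URLs, go back/forward, reload
- Element Interaction: Click, type, select, hover on page elements
- Data Extraction: Extract text, attributes, values from elements
- Accessibility Analysis: Get an accessibility snapshot to understand page structure
- Advanced Operations: Screenshot, JavaScript execution, keyboard input
- Batch Processing: Execute multiple operations in sequence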
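API Endpoints
1. Discover Available Commands
IMPORTANT: Always call this endpoint first to see all available commands and their parameters.
EXECUTOR_URL="${BROWSERWING_EXECUTOR_URL:-http://127.0.0.1:8080}"
curl -X GET "${EXECUTOR_URL}/api/v1/executor/help"
Response: Returns a complete list of all commands with parameters, examples, and usage guidelines.
Query a specific command:
EXECUTOR_URL="${BROWSERWING_EXECUTOR_URL:-http://127.0.0.1:8080}"
curl -X GET "${EXECUTOR_URL}/api/v1/executor/help?command=extract"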
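2. Get Accessibility Snapshot
CRITICAL: Always call this after navigation to understand page structure and get element RefIDs.
EXECUTOR_URL="${BROWSERWING_EXECUTOR_URL:-http://127.0.0.1:8080}"
curl -X GET "${EXECUTOR_URL}/api/v1/executor/snapshot"
Response Example:
{
  "success": true,
  "snapshot_text": "Clickable Elements:\n  @e1 Login (role: button)\n  @e2 Sign Up (role: link)\n\nInput Elements:\n  @e3 Email (role: textbox) [placeholder: your@email.com]\n  @e4 Password (role: textbox)"
}
Use Cases:
- Understand what interactive elements are on the page
- Get element RefIDs (@e1, @e2, etc.) for precise identification
- See element labels, roles, and attributes
- The accessibility tree is cleaner than raw DOM and better for LLMs
- RefIDs are stable references that work reliably across page changes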
3. Common Operations
Note: All examples below use EXECUTOR_URL="${BROWSERWING_EXECUTOR_URL:-http://127.0.0.1:8080}" to read the API address from the environment variable, with http://127.0.0.1:8080 as the fallback default.
Navigate to URL
EXECUTOR_URL="${BROWSERWING_EXECUTOR_URL:-http://127.0.0.1:8080}"
curl -X POST "${EXECUTOR_URL}/api/v1/executor/navigate" \
  -H 'Content-Type: application/json' \
  -d '{"url": "https://example.com"}'
Click Element
EXECUTOR_URL="${BROWSERWING_EXECUTOR_URL:-http://127.0.0.1:8080}"
curl -X POST "${EXECUTOR_URL}/api/v1/executor/click" \
  -H 'Content-Type: application/json' \
  -d '{"identifier": "@e1"}'
Identifier formats:
- RefID (Recommended): @e1, @e2 (from snapshot)
- CSS Selector: #button-id, .class-name
- XPath: //button[@type='submit']
- Text: Login (text content)
Type Text
EXECUTOR_URL="${BROWSERWING_EXECUTOR_URL:-http://127.0.0.1:8080}"
curl -X POST "${EXECUTOR_URL}/api/v1/executor/type" \
  -H 'Content-Type: application/json' \
  -d '{"identifier": "@e3", "text": "user@example.com"}'
Extract Data
EXECUTOR_URL="${BROWSERWING_EXECUTOR_URL:-http://127.0.0.1:8080}"
curl -X POST "${EXECUTOR_URL}/api/v1/executor/extract" \
  -H 'Content-Type: application/json' \
  -d '{
    "selector": ".product-item",
    "fields": ["text", "href"],
    "multiple": true
  }'
Wait for Element
EXECUTOR_URL="${BROWSERWING_EXECUTOR_URL:-http://127.0.0.1:8080}"
curl -X POST "${EXECUTOR_URL}/api/v1/executor/wait" \
  -H 'Content-Type: application/json' \
  -d '{"identifier": ".loading", "state": "hidden", "timeout": 10}'
Batch Operations
EXECUTOR_URL="${BROWSERWING_EXECUTOR_URL:-http://127.0.0.1:8080}"
curl -X POST "${EXECUTOR_URL}/api/v1/executor/batch" \
  -H 'Content-Type: application/json' \
  -d '{
    "operations": [
      {"type": "navigate", "params": {"url": "https://example.com"}, "stop_on_error": true},
      {"type": "click", "params": {"identifier": "@e1"}, "stop_on_error": true},
      {"type": "type", "params": {"identifier": "@e3", "text": "query"}, "stop_on_error": true}
    ]
  }'
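Instructions
Step-by-step workflow:
- 0. Get API URL: First, read the API base URL from the environment variable $BROWSERWING_EXECUTOR_URL. If it is not set, use the default http://127.0.0.1:8080. In shell commands, use: EXECUTOR_URL="${BROWSERWING_EXECUTOR_URL:-http://127.0.0.1:8080}"
- 1. Discover commands: Call GET /help to see all available operations and their parameters (do this first if unsure).
- 2. Navigate: Use POST /navigate to open the target webpage.
- 3. Analyze page: Call GET /snapshot to understand page structure and get element RefIDs.
- 4. Interact: Use element RefIDs (like @e1, @e2) or CSS selectors to:
  - Click elements: POST /click
  - Input text: POST /type
  - Select options: POST /select
  - Wait for elements: POST /wait
- 5. Extract data: Use POST /extract to get information from the page.
- 6. Present results: Format and show extracted data to the user.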
Complete Example
User Request: "Search for 'laptop' on example.com and get the first 5 results"
Your Actions:
- 1. Navigate to search page:
curl -X POST http://127.0.0.1:180