# Cloud Browser Automation with scrapeless-browser
## Important: Session Management with --session-id
All browser operation commands accept the `--session-id` parameter, which selects the Scrapeless session to use.
### Recommended Workflow
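```bash
# Step 1: Create a session and save the session ID
SESSION_ID=$(scrapeless-scraping-browser new-session --name workflow --ttl 1800 --json | jq -r '.taskId')
# Step 2: Use that session ID for all operations
scrapeless-scraping-browser --session-id $SESSION_ID open https://example.com
scrapeless-scraping-browser --session-id $SESSION_ID snapshot -i
scrapeless-scraping-browser --session-id $SESSION_ID click @e1
# Step 3: Close when done
scrapeless-scraping-browser --session-id $SESSION_ID close
```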
### Automatic Session Management
If you don't specify `--session-id`:
1. The CLI queries for running sessions.
2. If a running session exists, it uses the latest one.
3. If no running session exists, it creates a new one automatically.
For production workflows, always pass `--session-id` to ensure consistency.
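The selection rule above can be sketched as a small shell function. This only illustrates the described behavior and is not the CLI's actual implementation: the helper name `pick_session` and the input format (one `createdAt sessionId` pair per line, with ISO 8601 timestamps in a single timezone) are assumptions.

```bash
# pick_session: read "createdAt sessionId" lines on stdin and print the ID of
# the most recently created session, or "NEW" when no session is running.
# Hypothetical helper; the input format is an assumption for illustration.
pick_session() {
  # ISO 8601 timestamps sort lexicographically, so a reverse sort puts the
  # latest session on the first line.
  latest=$(LC_ALL=C sort -r | head -n 1 | cut -d ' ' -f 2)
  if [ -n "$latest" ]; then
    printf '%s\n' "$latest"
  else
    printf 'NEW\n'
  fi
}
```

Because the fallback silently picks whichever session is newest, two scripts running at the same time could end up sharing a session, which is one reason to pass `--session-id` explicitly.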
## Authentication Setup
Before using scrapeless-browser, you MUST set up authentication:
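```bash
# Method 1: Config file (recommended, persistent)
scrapeless-scraping-browser config set apiKey your_api_token_here
# Method 2: Environment variable
export SCRAPELESS_API_KEY=your_api_token_here
# Verify that it is set
scrapeless-scraping-browser config get apiKey
```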
Get your API token from https://app.scrapeless.com
## Session Management Behavior
The CLI manages Scrapeless sessions as follows:
- Session Creation: The first command creates a new Scrapeless session
- Session Persistence: Sessions remain active only while the connection is maintained
- Session Termination: Sessions terminate automatically when the connection closes
- Reconnection Limitation: Terminated sessions cannot be reconnected to
**Important**: For multi-step workflows, consider using the TypeScript API to maintain persistent connections.
## Core Workflow
Every browser automation follows this pattern:
1. Create Session: Create a session and save the session ID
2. Navigate: Use `--session-id` to navigate to the URL
3. Snapshot: Get element refs with `--session-id`
4. Interact: Use refs to click, fill, and select with `--session-id`
5. Re-snapshot: After navigation or DOM changes, get fresh refs
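```bash
# Set the API token first
scrapeless-scraping-browser config set apiKey your_token
# Create a session
SESSION_ID=$(scrapeless-scraping-browser new-session --name form-fill --ttl 600 --json | jq -r '.taskId')
# Start automation with the session ID
scrapeless-scraping-browser --session-id $SESSION_ID open https://example.com/form
scrapeless-scraping-browser --session-id $SESSION_ID snapshot -i
# Output: @e1 [input type=email], @e2 [input type=password], @e3 [button] Submit
scrapeless-scraping-browser --session-id $SESSION_ID fill @e1 "user@example.com"
scrapeless-scraping-browser --session-id $SESSION_ID fill @e2 "password123"
scrapeless-scraping-browser --session-id $SESSION_ID click @e3
scrapeless-scraping-browser --session-id $SESSION_ID wait --load networkidle
scrapeless-scraping-browser --session-id $SESSION_ID snapshot -i # Check result
```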
## Command Chaining
Commands can be chained with `&&` in a single shell invocation:
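```bash
# Chain open + wait + snapshot
scrapeless-scraping-browser open https://example.com && scrapeless-scraping-browser wait --load networkidle && scrapeless-scraping-browser snapshot -i
# Chain multiple interactions
scrapeless-scraping-browser fill @e1 "user@example.com" && scrapeless-scraping-browser fill @e2 "password123" && scrapeless-scraping-browser click @e3
```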
**When to chain**: Use `&&` when you don't need to read intermediate output. Run commands separately when you need to parse output first (e.g., snapshot to discover refs, then interact).
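When the next command depends on a ref discovered by a snapshot, the snapshot output has to be parsed between the two calls. A minimal sketch of that parsing step follows. The helper name `find_ref` is hypothetical, and it assumes the snapshot prints one element per line beginning with its `@eN` ref; the real output format may differ.

```bash
# find_ref: print the first @eN ref on a snapshot line containing the given text.
# Assumes one element per line, e.g. '@e3 [button] Submit' (format is an assumption).
find_ref() {
  grep -F -- "$1" | grep -o '@e[0-9][0-9]*' | head -n 1
}
```

A possible use is to capture the snapshot with `SNAPSHOT=$(scrapeless-scraping-browser snapshot -i)`, then run `REF=$(printf '%s\n' "$SNAPSHOT" | find_ref "Submit")` and pass `"$REF"` to `click` as a separate command.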
## Essential Commands
**Note**: All commands below support the optional `--session-id <id>` parameter.
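```bash
# Navigation and sessions
scrapeless-scraping-browser new-session [options] # Create a new browser session
scrapeless-scraping-browser [--session-id <id>] open <url> # Navigate to URL
scrapeless-scraping-browser [--session-id <id>] close # Close the browser session
scrapeless-scraping-browser sessions # List running sessions
scrapeless-scraping-browser stop <session-id> # Stop a specific session
scrapeless-scraping-browser stop-all # Stop all sessions
```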
### Session Creation with Advanced Options
The `new-session` command supports extensive customization options:
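```bash
# Basic session creation
scrapeless-scraping-browser new-session --name my-session --ttl 1800
# Session with proxy settings
scrapeless-scraping-browser new-session \
  --name proxy-session \
  --proxy-country US \
  --proxy-state CA \
  --proxy-city "Los Angeles" \
  --ttl 3600
# Session with custom browser configuration
scrapeless-scraping-browser new-session \
  --name mobile-session \
  --platform iOS \
  --screen-width 375 \
  --screen-height 812 \
  --user-agent "Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X)" \
  --timezone America/Los_Angeles \
  --languages en,es
# Session with recording enabled
scrapeless-scraping-browser new-session \
  --name recorded-session \
  --recording true \
  --ttl 7200
```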
**Available Options:**
- `--name <name>`: Session name for identification
- `--ttl`: Session timeout in seconds (default: 180)
- `--recording`: Enable session recording
- `--proxy-country`: Proxy country code (e.g., AU, US, GB, CN, JP)
- `--proxy-state`: Proxy state/region (e.g., NSW, CA, NY, TX)
- `--proxy-city`: Proxy city (e.g., sydney, newyork, london, tokyo)
- `--user-agent`: Custom user agent string
- `--platform`: Platform (Windows, macOS, Linux, iOS, Android)
- `--screen-width`: Screen width in pixels (default: 1920)
- `--screen-height`: Screen height in pixels (default: 1080)
- `--timezone`: Timezone (default: America/New_York)
- `--languages`: Comma-separated language codes (default: en)
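```bash
# Snapshot
scrapeless-scraping-browser [--session-id <id>] snapshot -i # Interactive elements with refs (recommended)
scrapeless-scraping-browser [--session-id <id>] snapshot -i -C # Include cursor-interactive elements
scrapeless-scraping-browser [--session-id <id>] snapshot -s "#selector" # Scope to a CSS selector
# Interaction (use @refs from the snapshot)
scrapeless-scraping-browser [--session-id <id>] click @e1 # Click element
scrapeless-scraping-browser [--session-id <id>] fill @e2 "text" # Clear and type text
scrapeless-scraping-browser [--session-id <id>] type @e2 "text" # Type without clearing
scrapeless-scraping-browser [--session-id <id>] press Enter # Press key
scrapeless-scraping-browser [--session-id <id>] scroll down 500 # Scroll page
scrapeless-scraping-browser [--session-id <id>] scroll down 500 --selector "div.content" # Scroll within element
# Get information
scrapeless-scraping-browser [--session-id <id>] get text @e1 # Get element text
scrapeless-scraping-browser get url # Get current URL
scrapeless-scraping-browser get title # Get page title
scrapeless-scraping-browser screenshot # Take screenshot
scrapeless-scraping-browser screenshot --full # Full page screenshot
# Wait
scrapeless-scraping-browser wait @e1 # Wait for element
scrapeless-scraping-browser wait --load networkidle # Wait for network idle
scrapeless-scraping-browser wait --url "/page" # Wait for URL pattern
scrapeless-scraping-browser wait 2000 # Wait milliseconds
# Cookies & Storage
scrapeless-scraping-browser cookies # Get all cookies
scrapeless-scraping-browser cookies set # Set cookie
scrapeless-scraping-browser cookies clear # Clear cookies
scrapeless-scraping-browser storage local # Get localStorage
scrapeless-scraping-browser storage local set # Set localStorage
# Multi-page
scrapeless-scraping-browser pages # List all pages/tabs
scrapeless-scraping-browser page # Switch to page
scrapeless-scraping-browser tab new [url] # Open new tab
scrapeless-scraping-browser tab close [n] # Close tab
# Live preview
scrapeless-scraping-browser live # Get live preview URL
```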
## Common Patterns
### Form Submission
```bash
scrapeless-scraping-browser config set apiKey your_token
scrapeless-scraping-browser open https://example.com/signup
scrapeless-scraping-browser snapshot -i
scrapeless-scraping-browser fill @e1 "Jane Doe"
scrapeless-scraping-browser fill @e2 "jane@example.com"
scrapeless-scraping-browser click @e3
scrapeless-scraping-browser wait --load networkidle
```
### Data Extraction
```bash
scrapeless-scraping-browser config set apiKey your_token
scrapeless-scraping-browser open https://example.com/products
scrapeless-scraping-browser snapshot -i --json
scrapeless-scraping-browser get text @e5 --json
```
### Common Session Configuration Scenarios
#### Mobile Device Simulation
```bash
# Simulate iPhone for mobile-specific content
SESSION_ID=$(scrapeless-scraping-browser new-session \
  --name "mobile-test" \
  --platform iOS \
  --screen-width 375 \
  --screen-height 812 \
  --user-agent "Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X)" \
  --json | jq -r '.taskId')
scrapeless-scraping-browser --session-id $SESSION_ID open https://m.example.com
```
#### Geographic Content Testing
```bash
# Access content from different regions
SESSION_ID=$(scrapeless-scraping-browser new-session \
  --name "geo-test" \
  --proxy-country AU \
  --proxy-city sydney \
  --timezone "Australia/Sydney" \
  --languages "en-AU,en" \
  --json | jq -r '.taskId')
scrapeless-scraping-browser --session-id $SESSION_ID open https://example.com
```
#### High-Resolution Desktop Testing
```bash
# Test on high-resolution displays
SESSION_ID=$(scrapeless-scraping-browser new-session \
  --name "desktop-4k" \
  --platform macOS \
  --screen-width 3840 \
  --screen-height 2160 \
  --json | jq -r '.taskId')
scrapeless-scraping-browser --session-id $SESSION_ID open https://example.com
```
#### Session Recording for Debugging
```bash
# Enable recording for troubleshooting
SESSION_ID=$(scrapeless-scraping-browser new-session \
  --name "debug-session" \
  --recording true \
  --ttl 7200 \
  --json | jq -r '.taskId')
scrapeless-scraping-browser --session-id $SESSION_ID open https://example.com
# Session recording will be available for review
```
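Each scenario above captures the session ID with `jq -r '.taskId'`. If the response lacks that field, `jq -r` prints the literal string `null`, and the following `--session-id` call would receive a bogus ID. A small guard, sketched here with the hypothetical helper name `require_session_id`, can stop the script early:

```bash
# require_session_id: succeed only if $1 looks like a usable session ID.
# `jq -r` prints the literal string "null" when the field is missing, so both
# the empty string and "null" are rejected.
require_session_id() {
  case "$1" in
    ''|null)
      echo "new-session did not return a taskId" >&2
      return 1
      ;;
  esac
  return 0
}
```

In a script this would sit right after the `SESSION_ID=$(...)` assignment, for example `require_session_id "$SESSION_ID" || exit 1`.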
### Session Persistence
**Important**: Scrapeless sessions terminate when connections close, so with the CLI each stage of a workflow needs its own session:
```bash
scrapeless-scraping-browser config set apiKey your_token
# Create a session for login
scrapeless-scraping-browser create --name "login-session" --ttl 1800
scrapeless-scraping-browser open https://app.example.com/login
scrapeless-scraping-browser snapshot -i
scrapeless-scraping-browser fill @e1 "username"
scrapeless-scraping-browser fill @e2 "password"
scrapeless-scraping-browser click @e3
scrapeless-scraping-browser wait --url "/dashboard"
# For subsequent operations, create a new session
# (Cannot reuse previous session due to connection termination)
scrapeless-scraping-browser create --name "dashboard-session" --ttl 1800
scrapeless-scraping-browser open https://app.example.com/dashboard
```
**Better Alternative**: Use TypeScript API for multi-step workflows:
```typescript
import { BrowserManager } from './dist/browser.js';
const manager = new BrowserManager();
await manager.launch({ id: 'persistent-workflow', action: 'launch' });
const page = manager.getPage();
await page.goto('https://app.example.com/login');
// Login persists throughout the session
await page.fill('#username', 'user');
await page.fill('#password', 'pass');
await page.click('#login');
await page.waitForURL('/dashboard');
await page.goto('https://app.example.com/profile');
// Session and cookies maintained
await manager.close();
```
### Using Proxies
```bash
scrapeless-scraping-browser config set apiKey your_token
# Use residential proxy from specific country
scrapeless-scraping-browser config set proxyCountry US
scrapeless-scraping-browser open https://example.com
# Use custom proxy
scrapeless-scraping-browser config set proxyUrl "http://user:pass@proxy.com:8080"
scrapeless-scraping-browser open https://example.com
# Use proxy with state and city (v2 API)
scrapeless-scraping-browser config set proxyCountry US
scrapeless-scraping-browser config set proxyState CA
scrapeless-scraping-browser config set proxyCity "Los Angeles"
scrapeless-scraping-browser open https://example.com
```
### Browser Fingerprinting
```bash
scrapeless-scraping-browser config set apiKey your_token
# Set browser fingerprint to avoid detection
scrapeless-scraping-browser config set fingerprint chrome
scrapeless-scraping-browser open https://example.com
# Customize browser fingerprint details
scrapeless-scraping-browser config set userAgent "Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X)"
scrapeless-scraping-browser config set platform iOS
scrapeless-scraping-browser config set screenWidth 375
scrapeless-scraping-browser config set screenHeight 812
scrapeless-scraping-browser config set timezone "Asia/Shanghai"
scrapeless-scraping-browser config set languages "zh-CN,en"
scrapeless-scraping-browser open https://example.com
```
### Session Recording
```bash
scrapeless-scraping-browser config set apiKey your_token
# Enable session recording for debugging
scrapeless-scraping-browser config set sessionRecording true
scrapeless-scraping-browser open https://example.com
# ... perform actions ...
scrapeless-scraping-browser close
# Recording will be available in Scrapeless dashboard
```
### Multiple Sessions
**Note**: Due to session termination behavior, the `--session-id` parameter has limitations. For reliable multi-session workflows, create separate sessions for each task:
```bash
scrapeless-scraping-browser config set apiKey your_token
# Create first session for task A
scrapeless-scraping-browser create --name "task-a" --ttl 1800
scrapeless-scraping-browser open https://site-a.com
# Complete task A operations...
# Create second session for task B
scrapeless-scraping-browser create --name "task-b" --ttl 1800
scrapeless-scraping-browser open https://site-b.com
# Complete task B operations...
# List all running sessions
scrapeless-scraping-browser sessions
# Stop specific session
scrapeless-scraping-browser stop <session-id>
# Stop all sessions
scrapeless-scraping-browser stop-all
```
**Alternative**: For complex multi-session workflows, use the TypeScript API which supports persistent connections.
## Configuration File
Configuration is managed via the `config` command. All settings are stored in `~/.scrapeless/config.json`.
**Priority**: Config file > Environment variable (only `SCRAPELESS_API_KEY` supports env var)
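The priority rule can be sketched as a lookup that consults the config file first and falls back to the environment variable. This is an illustration, not the CLI's own resolution code: the helper name `resolve_api_key` is hypothetical, and it assumes the config file is flat JSON with an `"apiKey": "..."` entry, which the `sed` pattern matches without being a general JSON parser.

```bash
# resolve_api_key: print the API key, preferring the config file over the
# SCRAPELESS_API_KEY environment variable (the priority described above).
# Assumes a flat JSON file containing "apiKey": "..."; the sed pattern is a
# simplification and not a general JSON parser.
resolve_api_key() {
  config_file=${1:-"$HOME/.scrapeless/config.json"}
  key=''
  if [ -f "$config_file" ]; then
    key=$(sed -n 's/.*"apiKey"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$config_file" | head -n 1)
  fi
  if [ -z "$key" ]; then
    key=${SCRAPELESS_API_KEY:-}
  fi
  # Fail (non-zero status) when neither source provides a key.
  [ -n "$key" ] && printf '%s\n' "$key"
}
```

One consequence of this ordering is that exporting `SCRAPELESS_API_KEY` has no effect while a different key is stored in the config file.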
Available configuration options:
- `apiKey` - Your Scrapeless API token (required)
- `apiVersion` - API version (v1 or v2, default: v2)
- `sessionTtl` - Session timeout in seconds
- `sessionName` - Session name for identification
- `sessionRecording` - Enable session recording (true/false)
- `proxyUrl` - Custom proxy URL
- `proxyCountry` - Proxy country code
- `proxyState` - Proxy state/province
- `proxyCity` - Proxy city
- `fingerprint` - Browser fingerprint
- `debug` - Enable debug logging
## Agent Mode (JSON Output)
Use `--json` for machine-readable output:
```bash
scrapeless-scraping-browser snapshot -i --json
scrapeless-scraping-browser get text @e1 --json
scrapeless-scraping-browser is visible @e2 --json
```
## Ref Lifecycle (Important)
Refs (`@e1`, `@e2`, etc.) are invalidated when the page changes. Always re-snapshot after:
- Clicking links or buttons that navigate
- Form submissions
- Dynamic content loading
```bash
scrapeless-scraping-browser click @e5 # Navigates to new page
scrapeless-scraping-browser snapshot -i # MUST re-snapshot
scrapeless-scraping-browser click @e1 # Use new refs
```
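One way to avoid forgetting the re-snapshot is to bundle it with the action that changes the page. The wrapper below is a sketch under assumptions: the name `act_and_resnapshot` and the `SB_CLI` variable are hypothetical, and only the `wait --load networkidle` and `snapshot -i` subcommands come from this document.

```bash
# CLI name kept in a variable so the helper can be dry-run (e.g. SB_CLI=echo).
SB_CLI=${SB_CLI:-scrapeless-scraping-browser}

# act_and_resnapshot: run one interaction, wait for the page to settle, then
# take a fresh snapshot so later commands use current refs.
# Usage: act_and_resnapshot click @e5
act_and_resnapshot() {
  "$SB_CLI" "$@" || return 1
  "$SB_CLI" wait --load networkidle || return 1
  "$SB_CLI" snapshot -i
}
```

The snapshot printed by the wrapper is the only valid source of refs for the next command; refs from before the action should be discarded.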
## Session Management
### Important Session Behavior
**Critical**: Scrapeless sessions have specific connection requirements:
- ✅ **Sessions work perfectly with persistent connections**
- ❌ **Sessions automatically terminate when the connection is closed**
- ❌ **Reconnecting to a terminated session will fail**
### Recommended Usage Patterns
#### For Single Operations
```bash
# Create and use a session for a single task
scrapeless-scraping-browser create --name "single-task" --ttl 600
scrapeless-scraping-browser open https://example.com
scrapeless-scraping-browser screenshot
```
#### For Multi-Step Operations
For complex workflows requiring multiple steps, use the TypeScript API instead of CLI:
```typescript
import { BrowserManager } from './dist/browser.js';
const manager = new BrowserManager();
await manager.launch({ id: 'workflow', action: 'launch' });
const page = manager.getPage();
await page.goto('https://example.com');
await page.screenshot({ path: 'step1.png' });
await page.goto('https://another-site.com');
await page.screenshot({ path: 'step2.png' });
await manager.close();
```
### Session ID Parameter Limitations
The `--session-id` parameter has limitations due to Scrapeless session behavior:
```bash
# This will fail if the session connection was previously closed
scrapeless-scraping-browser --session-id <id> open https://example.com
# Error: Session has been terminated and cannot be reconnected
```
**Workaround**: Create new sessions for each workflow instead of reusing session IDs.
### Session Management Commands
Always close sessions when done to avoid leaked resources:
```bash
scrapeless-scraping-browser close # Close current session
scrapeless-scraping-browser stop <session-id> # Stop specific session by ID
scrapeless-scraping-browser stop-all # Stop all sessions
```
Check running sessions:
```bash
scrapeless-scraping-browser sessions
# Returns: sessionId, createdAt, status, sessionName
```
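To make sure a session is closed even when a step fails, the cleanup can be tied to the task itself. The following is a minimal sketch: `with_session` and `SB_CLI` are hypothetical names, the cleanup uses the `--session-id <id> close` form shown in this document, and it does not cover interruption by signals.

```bash
# CLI name kept in a variable so the helper can be dry-run (e.g. SB_CLI=echo).
SB_CLI=${SB_CLI:-scrapeless-scraping-browser}

# with_session: run a command with the session ID appended as its last
# argument, then close that session whether or not the command succeeded.
# Usage: with_session "$SESSION_ID" my_task
with_session() {
  session_id=$1
  shift
  status=0
  "$@" "$session_id" || status=$?
  "$SB_CLI" --session-id "$session_id" close
  # Report the task's own exit status, not the status of the cleanup.
  return "$status"
}
```

Here `my_task` would be a shell function that receives the session ID as `$1` and runs the workflow steps against it.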
## Live Preview
Get a real-time preview of your browser session via WebSocket:
```bash
# Get live preview URL for current session
scrapeless-scraping-browser live
# Get live preview URL for specific session
scrapeless-scraping-browser live abc123def456
# Returns live preview URL for browser viewing
# Open this URL in your browser to view the live session
```
## Timeouts and Slow Pages
For slow websites, use explicit waits:
```bash
# Wait for network activity to settle
scrapeless-scraping-browser wait --load networkidle
# Wait for specific element
scrapeless-scraping-browser wait @e1
# Wait for URL pattern
scrapeless-scraping-browser wait --url "/dashboard"
# Wait fixed duration (milliseconds)
scrapeless-scraping-browser wait 5000
```
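On very slow pages a single wait may not be enough, and a generic retry wrapper is one option. The sketch below is plain shell and not part of the CLI; the name `retry` is hypothetical, and it assumes the wrapped command exits non-zero on failure, which this document does not confirm for `wait`.

```bash
# retry: run a command up to ATTEMPTS times, sleeping DELAY seconds between
# attempts. Succeeds as soon as one attempt succeeds.
# Usage: retry 3 2 scrapeless-scraping-browser wait @e1
retry() {
  attempts=$1
  delay=$2
  shift 2
  n=1
  while :; do
    "$@" && return 0
    [ "$n" -ge "$attempts" ] && return 1
    n=$((n + 1))
    sleep "$delay"
  done
}
```

A fixed delay keeps the sketch short; a growing delay may suit pages whose load time varies widely.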
## Error Handling
Common errors and solutions:
**Authentication Error:**
```bash
# Make sure API token is set
scrapeless-scraping-browser config set apiKey your_token
# Or use environment variable
export SCRAPELESS_API_KEY=your_token
```
**Session Not Found:**
```bash
# Session may have expired, create new one
scrapeless-scraping-browser open https://example.com
```
**Element Not Found:**
```bash
# Re-snapshot to get fresh refs
scrapeless-scraping-browser snapshot -i
```
**Timeout:**
```bash
# Increase session timeout (in seconds)
scrapeless-scraping-browser config set sessionTtl 600
scrapeless-scraping-browser open https://example.com
```
## Debugging
Enable debug mode for detailed logs:
```bash
scrapeless-scraping-browser config set debug true
scrapeless-scraping-browser open https://example.com
```
Or use the `--debug` flag:
```bash
scrapeless-scraping-browser --debug open https://example.com
```
## Configuration Options
| Configuration | Description |
|---------------|-------------|
| `apiKey` | Your API token (required) |
| `apiVersion` | API version (v1 or v2, default: v2) |
| `sessionTtl` | Session timeout in seconds |
| `sessionName` | Session name for identification |
| `sessionRecording` | Enable session recording (true/false) |
| `proxyUrl` | Custom proxy URL |
| `proxyCountry` | Proxy country code |
| `proxyState` | Proxy state/province |
| `proxyCity` | Proxy city |
| `fingerprint` | Browser fingerprint |
| `userAgent` | Custom user agent string |
| `platform` | Platform type (Windows, Linux, macOS, iOS, Android) |
| `screenWidth` | Screen width in pixels |
| `screenHeight` | Screen height in pixels |
| `timezone` | Timezone (e.g., America/New_York, Asia/Shanghai) |
| `languages` | Comma-separated language codes (e.g., en,zh-CN) |
| `debug` | Enable debug logging |
Set configuration using:
```bash
scrapeless-scraping-browser config set <key> <value>
```
Or use environment variable for API key only:
```bash
export SCRAPELESS_API_KEY=your_token
```
## Key Differences from Local Browsers
1. **Cloud-based**: Runs on Scrapeless infrastructure, not locally
2. **Residential Proxies**: Built-in support for residential proxy rotation
3. **Anti-detection**: Automatic browser fingerprinting and stealth features
4. **Session Recording**: Optional recording of browser sessions
5. **No Installation**: No need to install Chrome/Chromium locally
6. **Scalable**: Run multiple sessions in parallel
## Limitations
- Profile management is not currently supported
- Browser extensions are not currently supported
- Requires active internet connection
- Requires valid Scrapeless API token
## Best Practices
1. **Always set API token** before running commands (via config or env var)
2. **Let automatic session management work** - the CLI will reuse sessions intelligently
3. **Use --session-id** only when you need parallel workflows
4. **Close sessions** when done to avoid charges
5. **Use config file** for persistent settings instead of environment variables
6. **Enable recording** for debugging complex flows
7. **Re-snapshot** after page changes
8. **Use --json** for programmatic access
9. **Set session timeout** appropriately for your use case (in seconds)
## Updates
Check for and install updates:
```bash
# Check version
scrapeless-scraping-browser version
# Update via npm
npm update -g scrapeless-scraping-browser-skills
```
## Support
- Documentation: https://docs.scrapeless.com
- API Reference: https://api.scrapeless.com/docs
- GitHub Issues: https://github.com/scrapeless-ai/scraping-browser-skill/issues
# Cloud Browser Automation with scrapeless-browser
## Important: Session Management with --session-id
All browser operation commands accept the `--session-id` parameter, which selects the Scrapeless session to use.
### Recommended Workflow
```bash
# Step 1: Create a session and save the session ID
SESSION_ID=$(scrapeless-scraping-browser new-session --name workflow --ttl 1800 --json | jq -r '.taskId')
# Step 2: Use that session ID for all operations
scrapeless-scraping-browser --session-id $SESSION_ID open https://example.com
scrapeless-scraping-browser --session-id $SESSION_ID snapshot -i
scrapeless-scraping-browser --session-id $SESSION_ID click @e1
# Step 3: Close when done
scrapeless-scraping-browser --session-id $SESSION_ID close
```
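### Automatic Session Management
If you don't specify `--session-id`:
1. The CLI queries for running sessions.
2. If a running session exists, it uses the latest one.
3. If no running session exists, it creates a new one automatically.
For production workflows, always pass `--session-id` to ensure consistency.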
## Authentication Setup
Before using scrapeless-browser, you MUST set up authentication:
```bash
# Method 1: Config file (recommended, persistent)
scrapeless-scraping-browser config set apiKey your_api_token_here
# Method 2: Environment variable
export SCRAPELESS_API_KEY=your_api_token_here
# Verify that it is set
scrapeless-scraping-browser config get apiKey
```
Get your API token from https://app.scrapeless.com
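## Session Management Behavior
The CLI manages Scrapeless sessions as follows:
- Session Creation: The first command creates a new Scrapeless session
- Session Persistence: Sessions remain active only while the connection is maintained
- Session Termination: Sessions terminate automatically when the connection closes
- Reconnection Limitation: Terminated sessions cannot be reconnected to
**Important**: For multi-step workflows, consider using the TypeScript API to maintain persistent connections.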
## Core Workflow
Every browser automation follows this pattern:
1. Create Session: Create a session and save the session ID
2. Navigate: Use `--session-id` to navigate to the URL
3. Snapshot: Get element refs with `--session-id`
4. Interact: Use refs to click, fill, and select with `--session-id`
5. Re-snapshot: After navigation or DOM changes, get fresh refs
```bash
# Set the API token first
scrapeless-scraping-browser config set apiKey your_token
# Create a session
SESSION_ID=$(scrapeless-scraping-browser new-session --name form-fill --ttl 600 --json | jq -r '.taskId')
# Start automation with the session ID
scrapeless-scraping-browser --session-id $SESSION_ID open https://example.com/form
scrapeless-scraping-browser --session-id $SESSION_ID snapshot -i
# Output: @e1 [input type=email], @e2 [input type=password], @e3 [button] Submit
scrapeless-scraping-browser --session-id $SESSION_ID fill @e1 "user@example.com"
scrapeless-scraping-browser --session-id $SESSION_ID fill @e2 "password123"
scrapeless-scraping-browser --session-id $SESSION_ID click @e3
scrapeless-scraping-browser --session-id $SESSION_ID wait --load networkidle
scrapeless-scraping-browser --session-id $SESSION_ID snapshot -i # Check result
```
## Command Chaining
Commands can be chained with `&&` in a single shell invocation:
```bash
# Chain open + wait + snapshot
scrapeless-scraping-browser open https://example.com && scrapeless-scraping-browser wait --load networkidle && scrapeless-scraping-browser snapshot -i
# Chain multiple interactions
scrapeless-scraping-browser fill @e1 "user@example.com" && scrapeless-scraping-browser fill @e2 "password123" && scrapeless-scraping-browser click @e3
```
**When to chain**: Use `&&` when you don't need to read intermediate output. Run commands separately when you need to parse output first (e.g., snapshot to discover refs, then interact).
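## Essential Commands
**Note**: All commands below support the optional `--session-id <id>` parameter.
```bash
# Navigation and sessions
scrapeless-scraping-browser new-session [options] # Create a new browser session
scrapeless-scraping-browser [--session-id <id>] open <url> # Navigate to URL
scrapeless-scraping-browser [--session-id <id>] close # Close the browser session
scrapeless-scraping-browser sessions # List running sessions
scrapeless-scraping-browser stop <session-id> # Stop a specific session
scrapeless-scraping-browser stop-all # Stop all sessions
```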
### Session Creation with Advanced Options
The `new-session` command supports extensive customization options:
```bash
# Basic session creation
scrapeless-scraping-browser new-session --name my-session --ttl 1800
# Session with proxy settings
scrapeless-scraping-browser new-session \
  --name proxy-session \
  --proxy-country US \
  --proxy-state CA \
  --proxy-city "Los Angeles" \
  --ttl 3600
# Session with custom browser configuration
scrapeless-scraping-browser new-session \
  --name mobile-session \
  --platform iOS \
  --screen-width 375 \
  --screen-height 812 \
  --user-agent "Mozilla/5.0 (iPhone; CPU iPhone OS 15_0 like Mac OS X)" \
  --timezone America/Los_Angeles \
  --languages en,es
# Session with recording enabled
scrapeless-scraping-browser new-session \
  --name recorded-session \
  --recording true \
  --ttl 7200
```
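**Available Options:**
- `--name`: Session name for identification
- `--ttl`: Session timeout in seconds (default: 180)
- `--recording`: Enable session recording
- `--proxy-country`: Proxy country code (e.g., AU, US, GB, CN, JP)
- `--proxy-state`: Proxy state/region (e.g., NSW, CA, NY, TX)
- `--proxy-city`: Proxy city (e.g., sydney, newyork, london, tokyo)
- `--user-agent`: Custom user agent string
- `--platform`: Platform (Windows, macOS, Linux, iOS, Android)
- `--screen-width`: Screen width in pixels (default: 1920)
- `--screen-height`: Screen height in pixels (default: 1080)
- `--timezone`: Timezone (default: America/New_York)
- `--languages`: Comma-separated language codes (default: en)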
```bash
# Snapshot
scrapeless-scraping-browser [--session-id <id>] snapshot -i # Interactive elements with refs (recommended)
scrapeless-scraping-browser [--session-id <id>] snapshot -i -C # Include cursor-interactive elements
scrapeless-scraping-browser [--session-id <id>] snapshot -s "#selector" # Scope to a CSS selector
# Interaction (use @refs from the snapshot)
scrapeless-scraping-browser [--session-id <id>] click @e1 # Click element
scrapeless-scraping-browser [--session-id <id>] fill @e2 "text" # Clear and type text
scrapeless-scraping-browser [--session-id <id>] type @e2 "text" # Type without clearing
scrapeless-scraping-browser [--session-id <id>] press Enter # Press key
scrapeless-scraping-browser [--session-id <id>] scroll down 500 # Scroll page
scrapeless-scraping-browser [--session-id <id>] scroll down 500 --selector "div.content" # Scroll within element
# Get information
scrapeless-scraping-browser [--session-id <id>] get text @e1 # Get element
```