Browser Automation with PinchTab
PinchTab gives agents a browser they can drive through stable accessibility refs, low-token text extraction, and persistent profiles or instances. Treat it as a CLI-first browser skill; use the HTTP API only when the CLI is unavailable or you need profile-management routes that do not exist in the CLI yet.
Preferred tool surface:
- Use `pinchtab` CLI commands first.
- Use `curl` for profile-management routes or non-shell/API fallback flows.
- Use `jq` only when you need structured parsing from JSON responses.
Agent Identity And Attribution
When multiple agents share one PinchTab server, always give each agent a stable ID.
- CLI flows: prefer `pinchtab --agent-id <agent-id> ...`
- Long-running shells: set `PINCHTAB_AGENT_ID=<agent-id>`
- Raw HTTP flows: send `X-Agent-Id: <agent-id>` on requests that should be attributed to that agent
That identity is recorded as `agentId` in activity events and powers:
- scheduler task attribution when work is dispatched on behalf of an agent
If you are switching between unrelated browser tasks, do not reuse the same agent ID unless you intentionally want one combined activity trail.
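Here is a minimal shell sketch of the three attribution styles. It assumes the CLI takes a global `--agent-id` flag, that the environment variable is spelled `PINCHTAB_AGENT_ID`, and that `PINCHTAB_URL` points at your server. The fallback port is only an example. The `PINCHTAB_BIN` and `CURL_BIN` overrides are hypothetical hooks that let you dry-run the commands with `echo`. `pt` and `pt_get` are illustrative names.

```bash
PINCHTAB_URL="${PINCHTAB_URL:-http://localhost:9868}"   # example address only
AGENT_ID="${AGENT_ID:-research-agent}"                   # one stable ID per agent

# CLI flow: pass the ID explicitly on every command.
pt() {
  "${PINCHTAB_BIN:-pinchtab}" --agent-id "$AGENT_ID" "$@"
}

# Long-running shell: export the ID once (variable name assumed).
export PINCHTAB_AGENT_ID="$AGENT_ID"

# Raw HTTP flow: attach X-Agent-Id to every attributed request.
pt_get() {
  "${CURL_BIN:-curl}" -s -H "X-Agent-Id: $AGENT_ID" "$PINCHTAB_URL$1"
}
```

With these helpers, `pt snap -i -c` and `pt_get /tabs` should both be recorded under the same `agentId`.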
Safety Defaults
- Default to `http://localhost` targets. Only use a remote PinchTab server when the user explicitly provides it and, if needed, a token.
- Prefer read-only operations first (`text`, `snap -i -c`, `snap -d`, `find`), then standard interactions (`click`, `fill`, `type`, `press`, `select`, `hover`, `scroll`).
- Do not evaluate arbitrary JavaScript unless a simpler PinchTab command cannot answer the question.
- Do not upload local files unless the user explicitly names the file to upload and the destination flow requires it.
- Do not save screenshots, PDFs, or downloads to arbitrary paths. Use a user-specified path or a safe temporary/workspace path.
- Never use PinchTab to inspect unrelated local files, browser secrets, stored credentials, or system configuration outside the task.
Core Workflow
Every PinchTab automation follows this pattern:
1. Ensure the correct server, profile, or instance is available for the task.
2. Navigate with `pinchtab nav <url>` or `pinchtab instance navigate <instance-id> <url>`.
3. Observe with `pinchtab snap -i -c`, `pinchtab snap --text`, or `pinchtab text`, then collect the current refs, such as `e5`.
4. Interact with those fresh refs using `click`, `fill`, `type`, `press`, `select`, `hover`, or `scroll`.
5. Re-snapshot or re-read text after any navigation, submit, modal open, accordion expand, or other DOM-changing action.
Rules:
- Never act on stale refs after the page changes.
- Default to `pinchtab text` when you need content, not layout.
- Default to `pinchtab snap -i -c` when you need actionable elements.
- Use screenshots only for visual verification, UI diffs, or debugging.
- Start multi-site or parallel work by choosing the right instance or profile first.
Selectors
PinchTab uses a unified selector system. Any command that targets an element accepts these formats:
| Selector | Example | Resolves via |
|---|---|---|
| Ref | `e5` | Snapshot cache (fastest) |
| CSS | `#login`, `.btn`, `[data-testid="x"]` | `document.querySelector` |
| XPath | `xpath://button[@id="submit"]` | CDP search |
| Text | `text:Sign In` | Visible text match |
| Semantic | `find:login button` | Natural-language query via `/find` |
Auto-detection: bare e5 → ref, #id / .class / [attr] → CSS, //path → XPath. Use explicit prefixes (css:, xpath:, text:, find:) when auto-detection is ambiguous.
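```bash
pinchtab click e5                        # Ref
pinchtab click "#submit"                 # CSS (auto-detected)
pinchtab click "text:Sign In"            # Text match
pinchtab click "xpath://button[@type]"   # XPath
pinchtab fill "#email" "user@test.com"   # CSS
pinchtab fill e3 "user@test.com"         # Ref
```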
The same syntax works in the HTTP API via the selector field:
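```json
{"kind": "click", "selector": "text:Sign In"}
{"kind": "fill", "selector": "#email", "text": "user@test.com"}
{"kind": "click", "selector": "e5"}
```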
The legacy `ref` field is still accepted for backward compatibility.
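The auto-detection rules can be restated as a small classifier. This is a hypothetical shell function written from the rules above. It is not PinchTab's own resolver. Labeling everything else `ambiguous`, where an explicit prefix is the safe choice, is an assumption.

```bash
# Hypothetical restatement of the documented auto-detection rules.
# Prints ref, css, xpath, text, find, or "ambiguous" (use an explicit prefix).
classify_selector() {
  case "$1" in
    css:*)          echo css ;;        # explicit prefixes take priority
    xpath:*)        echo xpath ;;
    text:*)         echo text ;;
    find:*)         echo find ;;
    //*)            echo xpath ;;      # bare //path
    '#'*|.*|'['*)   echo css ;;        # #id, .class, [attr]
    e|e*[!0-9]*)    echo ambiguous ;;  # "e" or e + non-digits is not a ref
    e[0-9]*)        echo ref ;;        # bare e5, e12, ...
    *)              echo ambiguous ;;
  esac
}
```

For example, `classify_selector email` reports `ambiguous`, which is exactly the case where `css:email` or `text:email` should be written out.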
Command Chaining
Use && only when you do not need to inspect intermediate output before deciding the next step.
Good:
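```bash
pinchtab nav https://pinchtab.com && pinchtab snap -i -c
pinchtab click --wait-nav e5 && pinchtab snap -i -c
pinchtab nav https://pinchtab.com --block-images && pinchtab text
```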
Run commands separately when you must read the snapshot output first:
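```bash
pinchtab nav https://pinchtab.com
pinchtab snap -i -c
# Read the refs and pick the right e#
pinchtab click e7
pinchtab snap -i -c
```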
Challenge Solving
If a page shows a challenge instead of content (e.g., "Just a moment..."), call `POST /solve` with `{"maxAttempts": 3}` to auto-detect and resolve it. Use `POST /tabs/TAB_ID/solve` for the tab-scoped variant. Solving works best with `stealthLevel: "full"` in the config. The call is safe to make speculatively: it returns immediately if no challenge is present. See api.md for the full solver options.
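Here is a curl sketch of both solve calls. It assumes both routes accept a JSON body with `Content-Type: application/json` and that `PINCHTAB_URL` points at your server. The fallback port is only an example. `solve_challenge` and `solve_challenge_in_tab` are hypothetical helper names, and `CURL_BIN` is a dry-run hook.

```bash
PINCHTAB_URL="${PINCHTAB_URL:-http://localhost:9868}"   # example address only

# Detect and solve a challenge on the active tab (returns at once if none).
solve_challenge() {
  "${CURL_BIN:-curl}" -s -X POST "$PINCHTAB_URL/solve" \
    -H 'Content-Type: application/json' -d '{"maxAttempts": 3}'
}

# Same, scoped to one tab; $1 is the tab ID from navigate or GET /tabs.
solve_challenge_in_tab() {
  "${CURL_BIN:-curl}" -s -X POST "$PINCHTAB_URL/tabs/$1/solve" \
    -H 'Content-Type: application/json' -d '{"maxAttempts": 3}'
}
```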
Handling Authentication and State
Pick a pattern before interacting with the site:
1. One-off browsing: `pinchtab instance start` → use `--server http://localhost:<port>` for commands.
2. Reuse a profile: `pinchtab instance start --profile work --mode headed` → switch to `--mode headless` after the login is stored.
3. Create a profile via HTTP: `POST /profiles` with `{"name":"..."}`, then `POST /profiles/<name>/start`.
4. Human-assisted login: start headed, let the human sign in, then have the agent reuse the profile headless.
5. HTTP-only agent: use `POST /instances/start`, then target the instance port with `curl`. Send `X-Agent-Id` for attribution.
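Here is how the "Create a profile via HTTP" pattern might look with curl. This sketch makes two assumptions. First, `PINCHTAB_URL` points at the main PinchTab server rather than an instance port (the fallback port is only an example). Second, profile names are simple enough (letters, digits, dashes) to embed in JSON without escaping. The helper name and the `CURL_BIN` hook are hypothetical.

```bash
PINCHTAB_URL="${PINCHTAB_URL:-http://localhost:9868}"   # example address only

# Create a named profile, then start it. -f makes curl fail on HTTP errors,
# so the start request only runs if creation succeeded.
create_and_start_profile() {
  local name="$1"
  "${CURL_BIN:-curl}" -sf -X POST "$PINCHTAB_URL/profiles" \
    -H 'Content-Type: application/json' \
    -d "{\"name\":\"$name\"}" &&
    "${CURL_BIN:-curl}" -sf -X POST "$PINCHTAB_URL/profiles/$name/start"
}
```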
If the server is exposed beyond localhost, require a token. See TRUST.md.
Agent sessions: Each agent can get its own revocable session token via `pinchtab session create --agent-id <id>` or `POST /sessions`. Set `PINCHTAB_SESSION=ses_...` or send `Authorization: Session ses_...`. Sessions have an idle timeout (default 30m) and a maximum lifetime (default 24h).
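Once a session token exists, the CLI and curl can share it. You get the token from `pinchtab session create --agent-id <id>` or `POST /sessions`. This sketch assumes `PINCHTAB_URL` points at your server. `ses_REPLACE_ME` is a placeholder, not a real token. `pt_session_get`, the `research-agent` default, and the `CURL_BIN` hook are hypothetical.

```bash
PINCHTAB_URL="${PINCHTAB_URL:-http://localhost:9868}"   # example address only

# CLI: export the token so pinchtab commands in this shell use it.
# ses_REPLACE_ME is a placeholder; paste the token from `session create`.
export PINCHTAB_SESSION="${PINCHTAB_SESSION:-ses_REPLACE_ME}"

# HTTP: send the same token, plus X-Agent-Id to keep attribution.
pt_session_get() {
  "${CURL_BIN:-curl}" -s \
    -H "Authorization: Session $PINCHTAB_SESSION" \
    -H "X-Agent-Id: ${AGENT_ID:-research-agent}" \
    "$PINCHTAB_URL$1"
}
```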
Essential Commands
Server and targeting
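```bash
pinchtab server                                      # Start the server in the foreground
pinchtab daemon install                              # Install as a system service
pinchtab health                                      # Check server status
pinchtab instances                                   # List running instances
pinchtab profiles                                    # List available profiles
pinchtab --server http://localhost:9868 snap -i -c   # Target a specific instance
```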
Navigation and tabs
CODEBLOCK5
Observation
CODEBLOCK6
Guidance:
- `snap -i -c` is the default for finding actionable refs.
- `snap -d` is the default follow-up snapshot for multi-step flows.
- `text` is the default for reading articles, dashboards, reports, or confirmation messages.
- `find` is useful when the page is large and you already know the semantic target.
- Refs from `snap -i` and full `snap` use different numbering. Do not mix them: if you snapshot with `-i`, use those refs, and if you re-snapshot without `-i`, get fresh refs before acting.
Interaction
All interaction commands accept unified selectors (refs, CSS, XPath, text, semantic). See the Selectors section above.
CODEBLOCK7
Rules:
- Prefer `fill` for deterministic form entry.
- Prefer `type` only when the site depends on keystroke events.
- Prefer `click --wait-nav` when a click is expected to navigate.
- Use low-level `mouse` commands only when the normal click/hover abstractions are insufficient, such as drag handles, canvas widgets, or sites that depend on exact pointer sequences.
- Re-snapshot immediately after `click`, `press Enter`, `select`, or `scroll` if the UI can change.
- To discover valid dropdown values, snapshot with `filter=interactive` first. The output shows `<option>` elements with their `value` attributes. Then use `select` with the exact value.
- If a click opens a JS dialog (`alert`, `confirm`, `prompt`), pass `"dialogAction": "accept"` or `"dialogAction": "dismiss"` in the click action body. The dialog is then handled in the same call. Without it, the click hangs until `/tabs/TAB_ID/dialog` is called from a parallel request. A pending dialog also blocks subsequent `/snapshot` and `/text` calls.
- For the `scroll` action via HTTP, use `"scrollX"` / `"scrollY"` for pixel deltas, or `"selector"` to scroll an element into view, e.g. `{"kind":"scroll","scrollY":1500}` or `{"kind":"scroll","selector":"#footer"}`. The `x`/`y` fields are target viewport coordinates, not scroll deltas.
- The download HTTP endpoint (`GET /download?url=...` or `GET /tabs/TAB_ID/download?url=...`) returns JSON `{contentType, data (base64), size, url}`, not raw bytes. Base64-decode `data` to get the file. Only `http`/`https` URLs are allowed. Private/internal hosts are blocked unless listed in `security.downloadAllowedDomains`.
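The dialog, scroll, and download rules translate into requests like the ones below. This sketch assumes tab-scoped actions are sent with `POST /tabs/<tab-id>/action`. It also assumes `PINCHTAB_URL` points at your server (the fallback port is only an example) and that `jq` and `base64` are installed. Selectors are pasted into JSON unescaped, so keep them free of double quotes. The helper names and the `CURL_BIN` hook are hypothetical.

```bash
PINCHTAB_URL="${PINCHTAB_URL:-http://localhost:9868}"   # example address only

# Send one JSON action body to a specific tab.
tab_action() {
  "${CURL_BIN:-curl}" -s -X POST "$PINCHTAB_URL/tabs/$1/action" \
    -H 'Content-Type: application/json' -d "$2"
}

# Click something that opens a confirm() dialog and accept it in the same call.
click_and_accept() {
  tab_action "$1" "{\"kind\":\"click\",\"selector\":\"$2\",\"dialogAction\":\"accept\"}"
}

# Scroll by a pixel delta via scrollY (x/y would be viewport coordinates).
scroll_by() {
  tab_action "$1" "{\"kind\":\"scroll\",\"scrollY\":$2}"
}

# The download route returns JSON, so extract .data and decode it to a file.
# Use a user-specified or temporary/workspace path for $3.
download_to() {
  "${CURL_BIN:-curl}" -s -G "$PINCHTAB_URL/tabs/$1/download" \
    --data-urlencode "url=$2" | jq -r '.data' | base64 -d > "$3"
}
```

For example, `download_to "$TAB_ID" https://example.com/report.pdf /tmp/report.pdf` writes the decoded bytes rather than the JSON envelope.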
Export, debug, and verification
CODEBLOCK8
Advanced operations: explicit opt-in only
Use these only when the task explicitly requires them and safer commands are insufficient.
CODEBLOCK9
Rules:
- `eval` is for narrow, read-only DOM inspection unless the user explicitly asks for a page mutation.
- `download` should prefer a safe temporary or workspace path over an arbitrary filesystem location.
- `upload` requires a file path the user explicitly provided or clearly approved for the task.
HTTP API fallback
CODEBLOCK10
Use the API when:
- the agent cannot shell out,
- profile creation or mutation is required,
- or you need explicit instance- and tab-scoped routes.
Tab-scoped HTTP API
Important: Each `POST /navigate` creates a new tab by default. The default (non-tab-scoped) endpoints such as `/snapshot`, `/action`, and `/text` operate on the active tab, which may not be the tab you just opened. In multi-tab workflows, always use tab-scoped routes to avoid acting on the wrong page.
Get the tab ID from the navigate response or from GET /tabs.
CODEBLOCK11
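To keep multi-tab work pinned to one tab, a helper can capture the tab ID once and build every route from it. This sketch rests on several assumptions: the navigate body field is `url`, the response exposes the ID as `tabId`, `/tabs/<id>/text` answers a plain GET, and `jq` is installed. Check these names against api.md. `tab_url` and `nav_and_read` are hypothetical.

```bash
PINCHTAB_URL="${PINCHTAB_URL:-http://localhost:9868}"   # example address only

# Build a tab-scoped route so every call targets the same tab.
tab_url() {
  printf '%s/tabs/%s/%s\n' "$PINCHTAB_URL" "$1" "$2"
}

# Open a page, remember its tab ID, then read text from that exact tab.
# Assumed: request field "url", response field "tabId", GET on /text.
nav_and_read() {
  local tab_id
  tab_id=$(curl -s -X POST "$PINCHTAB_URL/navigate" \
    -H 'Content-Type: application/json' \
    -d "{\"url\":\"$1\"}" | jq -r '.tabId')
  # jq prints "null" when the field is missing.
  if [ -z "$tab_id" ] || [ "$tab_id" = null ]; then
    echo "no tab ID in navigate response" >&2
    return 1
  fi
  curl -s "$(tab_url "$tab_id" text)"
}
```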
The default (non-tab-scoped) endpoints also support screenshots and PDF:
CODEBLOCK12
Navigation with waitNav: When clicking a link or button that triggers page navigation, include "waitNav": true in the action body. Without it, PinchTab returns a navigation_changed error to protect against unexpected navigation during form interactions.
CODEBLOCK13
All tab-scoped routes follow the pattern /tabs/{TAB_ID}/... and mirror the default endpoints. The full list includes: navigate, back, forward, reload, snapshot, screenshot, text, pdf, action, actions, dialog, wait, find, lock, unlock, cookies, metrics, network, solve, close, storage, evaluate, download, upload, handoff, and resume.
Common Patterns
Open a page and inspect actions
CODEBLOCK14
Fill and submit a form
CODEBLOCK15
Search, then extract the result page cheaply
CODEBLOCK16
Form submission: Always click the submit button rather than using `press Enter`. Many sites attach their submission logic to the button's click handler instead of a native form submit, so an Enter keypress may do nothing.
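Here is a sketch of the fill-then-click rule with the CLI. The selectors `#email`, `#message`, and `text:Send` describe a hypothetical contact form, and `send_contact_form` is an illustrative name. `PINCHTAB_BIN` is a dry-run hook. The sketch uses `click --wait-nav` because submitting is expected to navigate.

```bash
# Fill deterministically, click the real submit button (not Enter),
# wait for the navigation, then confirm the outcome from page text.
send_contact_form() {
  local pt="${PINCHTAB_BIN:-pinchtab}"
  "$pt" fill "#email" "$1" &&
    "$pt" fill "#message" "$2" &&
    "$pt" click --wait-nav "text:Send" &&
    "$pt" text
}
```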
Use diff snapshots in a multi-step flow
CODEBLOCK17
Target elements without a snapshot
When you know the page structure, skip the snapshot and use CSS or text selectors directly:
CODEBLOCK18
Security and Token Economy
- Use a dedicated automation profile, not a daily browsing profile.
- If PinchTab is reachable off-machine, require a token and bind only to the interfaces you actually need.
- Prefer `text`, `snap -i -c`, and `snap -d` before screenshots, PDFs, `eval`, downloads, or uploads.
- Use `--block-images` for read-heavy tasks that do not need visual assets.
- Stop or isolate instances when switching between unrelated accounts or environments.
Diffing and Verification
- Use `pinchtab snap -d` after each state-changing action in long workflows.
- Use `pinchtab text` to confirm success messages, table updates, or navigation outcomes.
- Use `pinchtab screenshot` only when visual regressions, CAPTCHA, or layout-specific confirmation matter.
- If a ref disappears after a change, treat that as expected and fetch fresh refs instead of retrying the stale one.
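Text-based verification is cheap to script. This minimal sketch assumes `pinchtab text` prints the page text to stdout. `verify_text`, `step_and_verify`, and the `PINCHTAB_BIN` hook are hypothetical names.

```bash
# Confirm an outcome from page text rather than a screenshot.
# Succeeds only if the expected message appears in the text output.
verify_text() {
  "${PINCHTAB_BIN:-pinchtab}" text | grep -qF -- "$1"
}

# One state-changing step: act, diff the snapshot, then confirm via text.
step_and_verify() {
  local pt="${PINCHTAB_BIN:-pinchtab}"
  "$pt" click --wait-nav "$1" &&
    "$pt" snap -d &&
    verify_text "$2"
}
```

A non-zero exit from `step_and_verify` is the signal to re-snapshot and pick a fresh ref rather than retrying the old one.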
References