When to Use This Skill
Activate when the user's request involves interacting with a website. Specifically, activate when the user:
- Needs to do anything on a website ("Send a LinkedIn message", "Book an Airbnb", "Search Google for...")
- Asks how to interact with a site ("How do I post a tweet?", "How to apply on LinkedIn?")
- Wants to fill out forms, click buttons, navigate, search, filter, or browse on a specific site
- Wants to take a screenshot of a web page or monitor changes
- Builds browser-based AI agents, web scrapers, or E2E tests for external websites
- Automates repetitive web tasks (data entry, form submission, content posting)
- Wants to control their existing Chrome browser (Extension mode)
What Actionbook Provides
Actionbook is a library of pre-verified page interaction data. actionbook search finds actions matching a task description; actionbook get "<ID>" returns a structured document describing a page's purpose, functional capabilities, and DOM structure with inline CSS selectors — eliminating the need for runtime page structure discovery.
search and get
search — Find actions by task description
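```bash
actionbook search "<query>"                      # Search by task intent
actionbook search "<query>" --domain site.com    # Filter by domain
actionbook search "<query>" --url <url>          # Filter by URL
actionbook search "<query>" -p 2 -s 20           # Pagination
```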
Each result includes:
- ID: use with actionbook get "<ID>" to retrieve full details
- Type: page (full page) or area (page section)
- Description: page overview and function summary
- URL: page where this action applies
- Health Score: selector reliability percentage (0–100%)
- Updated: last verified date
Constructing an effective search query
The query string is the primary signal for finding the right action. Pack it with the user's full intent — not just a site name or a vague keyword.
Include in the query:
1. Target site: the website name or domain
2. Task verb: what the user wants to do (search, book, post, filter, login, compose, etc.)
3. Object / context: what they're acting on (listings, messages, flights, repositories, etc.)
4. Specific details: any constraints, filters, or parameters the user mentioned (dates, location, category, language, etc.)
Rule of thumb: Rewrite the user's request as a single descriptive sentence and use that as the query.
| User says | Bad query | Good query |
|---|---|---|
| "Book an Airbnb in Tokyo for next week" | "airbnb" | "airbnb search listings Tokyo dates check-in check-out guests" |
| "Search arXiv for recent NLP papers" | "arxiv search" | "arxiv advanced search papers NLP natural language processing recent" |
| "Send a LinkedIn connection request" | "linkedin" | "linkedin send connection request invite someone" |
| "Post a tweet with an image" | "twitter post" | "twitter compose new tweet post with image media attachment" |
| "Filter GitHub issues by label" | "github issues" | "github repository issues filter by label search issues" |
When the user provides extra context (e.g., specific dates, a city name, a topic), fold it into the query even if it won't match a stored action literally — it helps the search engine rank relevant pages higher.
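```bash
# User: "Help me apply for a software engineer job on LinkedIn"
actionbook search "linkedin jobs search apply software engineer application form"

# User: "I need to search arXiv for machine learning papers"
actionbook search "arxiv advanced search papers machine learning subject category"
```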
If the target domain or URL is known, always pass it with --domain or --url; both narrow the results and improve precision.
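The query-building guidance above can be sketched as a small shell helper. This is only an illustration under stated assumptions: `build_query` is a hypothetical function, not part of Actionbook, and the script prints the resulting `actionbook search` command (using the `--domain` flag described above) instead of running it, so it works without the CLI installed.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Actionbook): join the four query
# ingredients (site, task verb, object, details) into one query string.
build_query() {
  local site="$1" verb="$2" object="$3" details="$4"
  local query="" part
  for part in "$site" "$verb" "$object" "$details"; do
    # Skip empty parts so the query never contains stray spaces
    [ -n "$part" ] && query="${query:+$query }$part"
  done
  printf '%s\n' "$query"
}

query="$(build_query "airbnb" "search" "listings" "Tokyo dates check-in check-out guests")"

# Print the command rather than executing it
printf 'actionbook search "%s" --domain %s\n' "$query" "airbnb.com"
```

Keeping the ingredients as separate arguments makes it easy to see when one of them (usually the specific details) has been left out of the query.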
get — Retrieve full action details by ID
```bash
# Use the ID from the search results directly
actionbook get "arxiv.org:/search/advanced:default"
```
Returns a structured document with:
1. Page URL: exact URL and query/path parameters
2. Page Overview: what the page does
3. Page Function Summary: interactive capabilities (e.g., "Search Term Input", "Subject Classification Filtering")
4. Page Structure Summary: DOM hierarchy with CSS selectors inline
Selectors appear embedded in the structure description, e.g.:
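```
Search term form area: contains the search term input field (input[type=text]),
the field selection dropdown (select[name=searchtype]), and the submit button (button.Search)
```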
Extract CSS selectors from the structure summary for use with browser commands.
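As a rough sketch of that extraction step, the snippet below pulls parenthesized selectors out of a structure summary. It assumes selectors are written as `(selector)` with no nested parentheses, which is an assumption about the output style, not a documented format. `extract_selectors` and the sample `summary` text are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print every parenthesized fragment, one per line,
# with the surrounding parentheses removed.
# Assumes no nested parentheses inside a selector.
extract_selectors() {
  grep -oE '\([^()]+\)' | sed -E 's/^\((.*)\)$/\1/'
}

# Sample text in the style of a structure summary (illustrative only)
summary='Search form area: contains a term input field (input[type=text]),
a field selection dropdown (select[name=searchtype]), and a submit button (button.Search)'

printf '%s\n' "$summary" | extract_selectors
```

For the sample text this prints `input[type=text]`, `select[name=searchtype]`, and `button.Search`, one per line. The pattern would also pick up ordinary parenthetical remarks and would split selectors that contain parentheses themselves, such as `:nth-child(2)`, so its output is best treated as a candidate list to review.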
Browser Commands
Quick reference. Full details with all flags and options: command-reference.md.
Navigation
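```bash
actionbook browser open <url>        # Open URL in a new tab
actionbook browser goto <url>        # Navigate the current page
actionbook browser back / forward    # History navigation
actionbook browser reload            # Reload the page
actionbook browser pages             # List open tabs
actionbook browser switch            # Switch tab
actionbook browser close             # Close the browser
```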
Interactions
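```bash
actionbook browser click "<selector>"             # Click an element
actionbook browser fill "<selector>" "text"       # Clear the field, then type
actionbook browser type "<selector>" "text"       # Append text
actionbook browser select "<selector>" "value"    # Select a dropdown option
actionbook browser hover "<selector>"             # Hover
actionbook browser press Enter                    # Press a key
```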
Observation
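```bash
actionbook browser text                      # Full page text
actionbook browser text "<selector>"         # Element text
actionbook browser snapshot                  # Accessibility tree (live page structure)
actionbook browser screenshot                # Save a screenshot
actionbook browser screenshot --full-page    # Full-page screenshot
actionbook browser wait "<selector>"         # Wait for an element
actionbook browser wait-nav                  # Wait for navigation
```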
actionbook browser close cleans up the browser session. Skip it if the user asks for the browser to remain open.
Examples
User request: "Search arXiv for papers about Neural Networks, search in titles only"
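```bash
# 1. Search: include the full intent (site + task + topic + filter preference)
actionbook search "arxiv advanced search papers neural networks title field" --domain arxiv.org

# 2. Get details: read the Page Structure Summary for selectors
actionbook get "arxiv.org:/search/advanced:default"
# Response includes: input[type=text], select[name=searchtype], button.Search, etc.

# 3. Automate using the selectors from the response
actionbook browser open "https://arxiv.org/search/advanced"
actionbook browser fill "input[type=text]" "Neural Network"
actionbook browser select "select[name=searchtype]" "title"
actionbook browser click "button.Search"
actionbook browser wait-nav
actionbook browser text
actionbook browser close
```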
Fallback
Actionbook stores page data captured at indexing time. Websites evolve, so selectors may become outdated.
When a selector from actionbook get fails at runtime, actionbook browser snapshot provides the live accessibility tree with current selectors. Use selectors from the snapshot output to retry the interaction.
Selectors used in browser commands should come from actionbook get or actionbook browser snapshot output in the current session — not from prior knowledge or memory.
If actionbook search returns no results for a page, use snapshot as the primary source, or fall back to other available tools.
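The fallback flow described above might look like the following wrapper. It is a sketch, not Actionbook's own tooling: `click_or_snapshot` is a hypothetical function, and it assumes the CLI exits with a non-zero status when a selector cannot be used, which should be verified against the actual command behavior.

```shell
#!/usr/bin/env bash
# Hypothetical wrapper: try a click with a stored selector; if it fails,
# print the live snapshot so a current selector can be chosen for a retry.
# Assumption: actionbook exits non-zero when the selector cannot be used.
click_or_snapshot() {
  local selector="$1"
  if actionbook browser click "$selector"; then
    return 0
  fi
  echo "click failed for '$selector'; printing live snapshot" >&2
  actionbook browser snapshot
  return 1
}
```

A caller could run `click_or_snapshot "button.Search"` and, on a non-zero return, read the printed snapshot, pick a selector from it, and retry the click with that selector.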
References
Login flows, OAuth, 2FA handling, session persistence