Curated Search Skill
Summary
Domain-restricted full-text search over a curated whitelist of technical documentation (MDN, Python docs, etc.). Provides clean, authoritative results without web spam.
External Endpoints
This skill does not call any external network endpoints during search operations. The crawler optionally makes outbound HTTP requests during index builds (one‑time setup), but those are user‑initiated (npm run crawl) and respect the configured domain whitelist.
Security & Privacy
- Search is fully local – After the index is built, all queries run offline; no data leaves your machine.
- Crawling is optional and whitelist-scoped – The crawler only accesses domains you explicitly list in config.yaml. It respects robots.txt and configurable delays.
- No telemetry – No usage data is transmitted externally.
- Local configuration only – Configuration is read from the local config.yaml and the index file in data/.
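The whitelist scoping described above can be pictured as a hostname check made before any URL is fetched. This is a minimal sketch, not the skill's actual crawler code: `isAllowedUrl` and the sample domain list are hypothetical, and it assumes exact hostname matching with no subdomain wildcards.

```javascript
// Hypothetical helper: true only when the URL's hostname is on the whitelist.
// `allowedDomains` stands in for the `domains` list in config.yaml.
function isAllowedUrl(rawUrl, allowedDomains) {
  let hostname;
  try {
    hostname = new URL(rawUrl).hostname.toLowerCase();
  } catch {
    return false; // unparseable URLs are never fetched
  }
  // Assumption: exact match only, so subdomains must be listed explicitly.
  return allowedDomains.some((d) => d.toLowerCase() === hostname);
}

const allowedDomains = ["docs.python.org", "developer.mozilla.org"];
console.log(isAllowedUrl("https://docs.python.org/3/tutorial/", allowedDomains));
console.log(isAllowedUrl("https://example.com/page", allowedDomains));
```

Matching on the parsed hostname rather than on a substring of the URL avoids accepting look-alike addresses such as `docs.python.org.example.com`.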
Model Invocation Note
The curated-search.search tool is invoked only when the user explicitly calls it. It does not run autonomously. OpenClaw calls the tool handler (scripts/search.js) when the user asks to search the curated index.
Trust Statement
By using this skill, you trust that the code operates locally and only crawls domains you approve. The skill does not send your queries or workspace data to any third party. Review the open‑source implementation before installing.
Tool: curated-search.search
Search the curated index.
Parameters
| Name | Type | Required | Default | Description |
|---|---|---|---|---|
| `query` | string | yes | n/a | Search query terms |
| `limit` | number | no | 5 | Maximum results (capped by `config.max_limit`, typically 100) |
| `domain` | string | no | null | Filter to a specific domain (e.g., `docs.python.org`) |
| `min_score` | number | no | 0.0 | Minimum relevance score (0.0–1.0); filters out low-quality matches |
| `offset` | number | no | 0 | Pagination offset (skip the first N results) |
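As an illustration of how these defaults and limits fit together, the following sketch normalizes a parameter object before a search. `normalizeParams` is a hypothetical function, not the code in scripts/search.js; only the default values, the 1000-character query limit, and the error codes are taken from this document.

```javascript
// Hypothetical validation sketch. `config.max_limit` mirrors the cap named
// in the table above; 100 is the "typical" value, used here as a fallback.
function normalizeParams(input, config = { max_limit: 100 }) {
  const fail = (error, message) => ({ ok: false, error, message });

  const query = typeof input.query === "string" ? input.query.trim() : "";
  if (query.length === 0) return fail("missing_query", "No query provided");
  if (query.length > 1000) return fail("query_too_long", "Query exceeds 1000 characters");

  const limit = input.limit ?? 5;
  if (limit > config.max_limit) return fail("limit_exceeded", `Limit > ${config.max_limit}`);

  return {
    ok: true,
    params: {
      query,
      limit,
      domain: input.domain ?? null,
      min_score: input.min_score ?? 0.0,
      offset: input.offset ?? 0,
    },
  };
}

console.log(normalizeParams({ query: "python tutorial" }));
console.log(normalizeParams({ query: "async await", limit: 500 }));
```

The sketch rejects an oversized `limit` rather than silently clamping it, which matches the `limit_exceeded` error code; whether the real handler does the same is an assumption.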
Response
JSON array of result objects:
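```json
[
  {
    "title": "Python Tutorial",
    "url": "https://docs.python.org/3/tutorial/",
    "snippet": "Python is an easy to learn, powerful programming language...",
    "domain": "docs.python.org",
    "score": 0.87,
    "crawled_at": 1707712345678
  }
]
```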
Fields:
- `title`: Document title (cleaned)
- `url`: Source URL (canonical)
- `snippet`: Excerpt (~200 chars) from the content
- `domain`: Hostname of the source
- `score`: BM25 relevance score (higher is better; not normalized, but typically falls in the 0–1 range)
- `crawled_at`: Unix timestamp (in milliseconds) of when the page was crawled
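To show how the result fields interact with the `domain`, `min_score`, `offset`, and `limit` parameters, here is a small sketch that filters and paginates an array of result objects. `selectResults` and `crawledDate` are hypothetical helpers, the sample entries other than the Python tutorial are invented for illustration, and treating `crawled_at` as milliseconds is an assumption based on its 13-digit example value.

```javascript
// Sample data shaped like the documented response (values are illustrative).
const sampleResults = [
  { title: "Python Tutorial", url: "https://docs.python.org/3/tutorial/", snippet: "...", domain: "docs.python.org", score: 0.87, crawled_at: 1707712345678 },
  { title: "async function", url: "https://developer.mozilla.org/", snippet: "...", domain: "developer.mozilla.org", score: 0.42, crawled_at: 1707712345678 },
  { title: "Python Library", url: "https://docs.python.org/3/library/", snippet: "...", domain: "docs.python.org", score: 0.61, crawled_at: 1707712345678 },
];

// Hypothetical helper: apply the documented filters, then paginate.
function selectResults(results, { domain = null, min_score = 0.0, offset = 0, limit = 5 } = {}) {
  return results
    .filter((r) => domain === null || r.domain === domain)
    .filter((r) => r.score >= min_score)
    .sort((a, b) => b.score - a.score) // higher score first
    .slice(offset, offset + limit);
}

// Assumption: crawled_at is in milliseconds, which is what Date expects.
function crawledDate(result) {
  return new Date(result.crawled_at).toISOString();
}

console.log(selectResults(sampleResults, { min_score: 0.5 }).map((r) => r.title));
console.log(crawledDate(sampleResults[0]));
```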
Example Agent Calls
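```
search CuratedSearch for python tutorial
search CuratedSearch for async await limit=3 domain=developer.mozilla.org
search CuratedSearch for linux man page min_score=0.3
```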
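One way to read the call form above is as free-text query words mixed with `key=value` options. The sketch below splits a call into the tool's parameters under that reading. `parseAgentCall` is hypothetical, and how OpenClaw actually maps an agent request onto the tool is not described in this document.

```javascript
// Hypothetical parser for calls of the form
//   search CuratedSearch for <query words> [limit=N] [domain=HOST] [min_score=X] [offset=N]
function parseAgentCall(text) {
  const match = /^search\s+CuratedSearch\s+for\s+(.+)$/i.exec(text.trim());
  if (!match) return null;

  const params = {};
  const queryWords = [];
  for (const token of match[1].split(/\s+/)) {
    const kv = /^(limit|domain|min_score|offset)=(.+)$/.exec(token);
    if (!kv) {
      queryWords.push(token); // plain word: part of the query
    } else if (kv[1] === "domain") {
      params.domain = kv[2];
    } else {
      params[kv[1]] = Number(kv[2]); // numeric options
    }
  }
  params.query = queryWords.join(" ");
  return params;
}

console.log(parseAgentCall("search CuratedSearch for async await limit=3 domain=developer.mozilla.org"));
```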
Errors
If an error occurs, the tool exits non-zero and prints a JSON error object to stderr, e.g.:
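```json
{
  "error": "index_not_found",
  "message": "Search index not found. The index has not been built.",
  "suggestion": "Run the crawler first: npm run crawl",
  "details": { "path": "data/index.json" }
}
```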
Common error codes:
| Code | Meaning | Suggested Fix |
|---|---|---|
| `config_missing` | Configuration file not found | Specify the `--config` path or ensure `config.yaml` exists |
| `config_invalid` | YAML parsing failed | Check the syntax in `config.yaml` |
| `config_missing_index_path` | `index.path` not set | Add `index.path` to the config |
| `index_not_found` | Index file missing | Run `npm run crawl` to build the index |
| `index_corrupted` | Index file incompatible or corrupted | Rebuild the index with `npm run crawl` |
| `index_init_failed` | Unexpected index initialization error | Check permissions, reinstall dependencies |
| `missing_query` | No query provided | Provide the `--query` argument |
| `query_too_long` | Query exceeds 1000 characters | Shorten the query |
| `limit_exceeded` | Limit > `config.max_limit` | Use a smaller limit |
| `invalid_domain` | Domain filter malformed | Use a format like `docs.python.org` |
| `conflicting_flags` | Mutually exclusive flags used (e.g., `--stats` with `--query`) | Pass only one of the mutually exclusive flags |
| `stats_failed` | Could not retrieve index stats | Ensure the index is accessible |
| `search_failed` | Search execution threw an error | Check the query and index integrity |
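A caller that captures the tool's stderr could turn the JSON error object into a short hint along these lines. This is only a sketch: `describeToolError` is a hypothetical helper, and the fallback hints are paraphrased from the table above rather than taken from the tool's own output.

```javascript
// Hypothetical helper: interpret the JSON error object printed to stderr.
function describeToolError(stderrText) {
  let err;
  try {
    err = JSON.parse(stderrText);
  } catch {
    return { code: "unknown", hint: "stderr did not contain a JSON error object" };
  }
  // Codes whose documented fix is to (re)build the index.
  const rebuildCodes = ["index_not_found", "index_corrupted"];
  const fallback = rebuildCodes.includes(err.error)
    ? "Run npm run crawl to build the index"
    : "See the error code table";
  // Prefer the tool's own suggestion when it provides one.
  return { code: err.error, hint: err.suggestion ?? fallback };
}

const sampleStderr = JSON.stringify({
  error: "index_not_found",
  message: "Search index not found. The index has not been built.",
  suggestion: "Run the crawler first: npm run crawl",
  details: { path: "data/index.json" },
});
console.log(describeToolError(sampleStderr));
```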
Configuration
Edit config.yaml in the skill directory. Key sections:
- `domains`: whitelist of allowed domains (required)
- `seeds`: starting URLs for crawling
- `crawl`: depth, delay, timeout, max_documents
- `content`: min_content_length, max_content_length
- `index`: path to index files
- `search`: default_limit, max_limit, min_score
See README.md for full configuration docs.
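The two requirements stated in this document, a non-empty `domains` list and a set `index.path`, can be checked once the YAML has been parsed into an object. The sketch below assumes the parsed config is a plain object with the section names listed above. `validateConfig`, the sample values, and the `domains_missing` label are hypothetical; only `config_missing_index_path` is a documented error code.

```javascript
// Illustrative parsed config; every value here is made up for the example.
const sampleConfig = {
  domains: ["docs.python.org", "developer.mozilla.org"],
  seeds: ["https://docs.python.org/3/"],
  index: { path: "data/index.json" },
};

// Hypothetical check: returns a list of problems, empty when the config passes.
function validateConfig(config) {
  const problems = [];
  if (!Array.isArray(config.domains) || config.domains.length === 0) {
    problems.push("domains_missing"); // not a documented code, used for illustration
  }
  if (!config.index || typeof config.index.path !== "string" || config.index.path === "") {
    problems.push("config_missing_index_path");
  }
  return problems;
}

console.log(validateConfig(sampleConfig));
console.log(validateConfig({ domains: [] }));
```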
Support
- Full documentation: `README.md`
- Technical specs: `specs/`
- Build plan: `PLAN.md`
- Contributor guide: `CONTRIBUTING.md`
- Issues: Report on GitHub (or via OpenClaw maintainers)