Curated Search Skill

Summary

Domain-restricted full-text search over a curated whitelist of technical documentation (MDN, Python docs, etc.). Provides clean, authoritative results without web spam.

External Endpoints

This skill does not call any external network endpoints during search operations. The crawler optionally makes outbound HTTP requests during index builds (one‑time setup), but those are user‑initiated (npm run crawl) and respect the configured domain whitelist.

Security & Privacy

- Search is fully local – After the index is built, all queries run offline; no data leaves your machine.
- Crawling is optional and whitelist‑scoped – The crawler only accesses domains you explicitly list in config.yaml. It respects robots.txt and configurable delays.
- No telemetry – No usage data is transmitted externally.
- Local files only – Configuration is read from the local config.yaml, and the index is read from data/.

Model Invocation Note

The curated-search.search tool is invoked only when the user explicitly calls it. It does not run autonomously. OpenClaw calls the tool handler (scripts/search.js) when the user asks to search the curated index.

Trust Statement

By using this skill, you trust that the code operates locally and only crawls domains you approve. The skill does not send your queries or workspace data to any third party. Review the open‑source implementation before installing.

Tool: curated-search.search

Search the curated index.

Parameters

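| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| query | string | yes | (none) | Search query terms |
| limit | number | no | 5 | Maximum number of results (capped by config.max_limit, usually 100) |
| domain | string | no | null | Restrict results to a specific domain (e.g. docs.python.org) |
| min_score | number | no | 0.0 | Minimum relevance score (0.0–1.0); filters out low-quality matches |
| offset | number | no | 0 | Pagination offset (skip the first N results) |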
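
The defaults and the limit cap described for the parameters can be sketched as a small normalization step. This is only an illustration: the helper name `normalizeSearchParams`, the `DEFAULTS` object, and the clamping behavior are assumptions, not code taken from scripts/search.js. The cap of 100 is the "usual" value of config.max_limit mentioned in the parameter description.

```javascript
// Hypothetical helper: names and clamping rules are illustrative assumptions,
// not the skill's actual implementation.
const DEFAULTS = { limit: 5, domain: null, min_score: 0.0, offset: 0 };

function normalizeSearchParams(params, maxLimit = 100) {
  if (typeof params.query !== "string" || params.query.trim() === "") {
    throw new Error("query is required and must be a non-empty string");
  }
  const merged = { ...DEFAULTS, ...params };
  return {
    query: merged.query.trim(),
    // Cap limit at maxLimit and keep it at least 1.
    limit: Math.min(Math.max(1, Math.trunc(merged.limit)), maxLimit),
    domain: merged.domain,
    // Keep min_score inside the documented 0.0-1.0 range.
    min_score: Math.min(Math.max(0, merged.min_score), 1),
    // A negative offset makes no sense for pagination.
    offset: Math.max(0, Math.trunc(merged.offset)),
  };
}

console.log(normalizeSearchParams({ query: " python tutorial ", limit: 500 }));
```

Only `query` has to be supplied; every other field falls back to its documented default.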

Response

JSON array of result objects:

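```json
[
  {
    "title": "Python Tutorial",
    "url": "https://docs.python.org/3/tutorial/",
    "snippet": "Python is an easy to learn, powerful programming language...",
    "domain": "docs.python.org",
    "score": 0.87,
    "crawled_at": 1707712345678
  }
]
```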

Fields:

- title: Document title (cleaned)
- url: Source URL (canonical)
- snippet: Excerpt (~200 chars) from the content
- domain: Hostname of the source
- score: BM25 relevance score (higher is better; not normalized to 0–1, though values typically fall in that range)
- crawled_at: Unix timestamp of when the page was crawled
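
As a rough sketch of how a caller might consume these fields, the snippet below formats one result for display. The sample object reuses the values from this document's response example; the helper names `crawledDate` and `formatResult` are hypothetical. One assumption is worth flagging: the sample `crawled_at` value has 13 digits, which suggests milliseconds rather than seconds, so the sketch accepts either.

```javascript
// Sample result using the values from the response example in this document.
const sampleResults = [
  {
    title: "Python Tutorial",
    url: "https://docs.python.org/3/tutorial/",
    snippet: "Python is an easy to learn, powerful programming language...",
    domain: "docs.python.org",
    score: 0.87,
    crawled_at: 1707712345678,
  },
];

// Assumption: values of 1e12 or more are milliseconds, smaller ones are seconds.
function crawledDate(result) {
  const ms = result.crawled_at >= 1e12 ? result.crawled_at : result.crawled_at * 1000;
  return new Date(ms);
}

// Hypothetical formatter: one summary line followed by the source URL.
function formatResult(result) {
  const day = crawledDate(result).toISOString().slice(0, 10);
  const score = result.score.toFixed(2);
  return `${result.title} (${result.domain}, score ${score}, crawled ${day})\n${result.url}`;
}

for (const result of sampleResults) {
  console.log(formatResult(result));
}
```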

Example Agent Calls

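```
search CuratedSearch for python tutorial
search CuratedSearch for async await limit=3 domain=developer.mozilla.org
search CuratedSearch for linux man page min_score=0.3
```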
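
This document does not describe how OpenClaw turns a phrase such as `search CuratedSearch for async await limit=3 domain=developer.mozilla.org` into tool parameters. The sketch below shows one plausible reading, in which `key=value` tokens become options and everything else is the query. The function `parseAgentCall` and its parsing rules are assumptions made for illustration.

```javascript
// Hypothetical parser: the real mapping from agent phrases to tool
// parameters is not documented here.
const NUMERIC_KEYS = new Set(["limit", "min_score", "offset"]);
const OPTION_PATTERN = /^(limit|domain|min_score|offset)=(.+)$/;

function parseAgentCall(line) {
  const match = /^search\s+CuratedSearch\s+for\s+(.+)$/.exec(line.trim());
  if (!match) return null; // not a curated-search request

  const options = {};
  const terms = [];
  for (const token of match[1].split(/\s+/)) {
    const option = OPTION_PATTERN.exec(token);
    if (!option) {
      terms.push(token); // plain word: part of the query
      continue;
    }
    const [, key, value] = option;
    options[key] = NUMERIC_KEYS.has(key) ? Number(value) : value;
  }
  return { query: terms.join(" "), ...options };
}

console.log(parseAgentCall("search CuratedSearch for linux man page min_score=0.3"));
```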

Errors

If an error occurs, the tool exits non-zero and prints a JSON error object to stderr, e.g.:

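```json
{
  "error": "indexnotfound",
  "message": "Search index not found. The index has not been built yet.",
  "suggestion": "Run the crawler first: npm run crawl",
  "details": { "path": "data/index.json" }
}
```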

Common error codes:

| Code | Meaning | Suggested Fix |
| --- | --- | --- |
| configmissing | Configuration file not found | Specify `--config` path or ensure config.yaml exists |
| configinvalid | | |
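
A caller that captures the tool's stderr could turn the JSON error object into a readable message along these lines. The function `describeToolError` is a hypothetical helper. It assumes the error object has the `error`, `message`, `suggestion`, and `details.path` fields shown in this document's example, and falls back gracefully when stderr is not JSON.

```javascript
// Hypothetical helper for presenting the tool's stderr output to a user.
function describeToolError(stderrText) {
  let payload;
  try {
    payload = JSON.parse(stderrText);
  } catch {
    payload = null; // stderr was not JSON
  }
  if (payload === null || typeof payload !== "object") {
    return `Unrecognized error output: ${stderrText.trim()}`;
  }

  const parts = [`[${payload.error}] ${payload.message}`];
  if (payload.suggestion) parts.push(`Suggested fix: ${payload.suggestion}`);
  if (payload.details && payload.details.path) parts.push(`Path: ${payload.details.path}`);
  return parts.join("\n");
}

const sampleStderr = JSON.stringify({
  error: "indexnotfound",
  message: "Search index not found. The index has not been built yet.",
  suggestion: "Run the crawler first: npm run crawl",
  details: { path: "data/index.json" },
});
console.log(describeToolError(sampleStderr));
```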

Configuration

Edit config.yaml in the skill directory. Key sections:

- domains: whitelist of allowed domains (required)
- seeds: starting URLs for crawling
- crawl: depth, delay, timeout, max_documents
- content: min_content_length, max_content_length
- index: path to index files
- search: default_limit, max_limit, min_score

See README.md for full configuration docs.
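
Before starting a crawl, it can be useful to confirm that the configuration is consistent with the whitelist. The sketch below works on an already-parsed config object (loading the YAML itself is left out) and is hypothetical throughout. It assumes `domains` is a list of hostnames, `seeds` is a list of absolute URLs, and hostnames are compared by exact match. README.md remains the authority on the real schema.

```javascript
// Hypothetical pre-flight check; the shapes of `domains` and `seeds`
// are assumptions, not the documented schema.
function checkConfig(config) {
  const problems = [];
  const hasDomains = Array.isArray(config.domains) && config.domains.length > 0;
  if (!hasDomains) {
    problems.push("domains is required and must be a non-empty list");
  }

  // Every seed URL should point at a whitelisted hostname (exact match).
  const allowed = new Set(hasDomains ? config.domains : []);
  for (const seed of config.seeds ?? []) {
    const host = new URL(seed).hostname; // throws on a malformed URL
    if (!allowed.has(host)) {
      problems.push(`seed outside whitelist: ${seed}`);
    }
  }
  return problems;
}

console.log(
  checkConfig({
    domains: ["docs.python.org"],
    seeds: ["https://docs.python.org/3/", "https://example.com/"],
  })
);
```

An empty array means no problems were found under these assumptions.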

Support

- Full documentation: README.md
- Technical specs: specs/
- Build plan: PLAN.md
- Contributor guide: CONTRIBUTING.md
- Issues: Report on GitHub (or via OpenClaw maintainers)

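Curated Search Skill

Summary

Domain-restricted full-text search over a curated whitelist of technical documentation (MDN, Python docs, etc.). Provides clean, authoritative results without web spam.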

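External Endpoints

This skill does not call any external network endpoints during search operations. The crawler optionally makes outbound HTTP requests during index builds (one-time setup), but those requests are user-initiated (npm run crawl) and respect the configured domain whitelist.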

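Security & Privacy

- Search is fully local – After the index is built, all queries run offline; no data leaves your machine.
- Crawling is optional and whitelist-scoped – The crawler only accesses domains you explicitly list in config.yaml. It respects robots.txt and configurable delays.
- No telemetry – No usage data is transmitted externally.
- Local files only – Configuration is read from the local config.yaml, and the index is read from data/.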

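Model Invocation Note

The curated-search.search tool is invoked only when the user explicitly calls it. It does not run autonomously. OpenClaw calls the tool handler (scripts/search.js) when the user asks to search the curated index.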

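Trust Statement

By using this skill, you trust that the code operates locally and only crawls domains you approve. The skill does not send your queries or workspace data to any third party. Review the open-source implementation before installing.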

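Tool: curated-search.search

Search the curated index.

Parameters

| Name | Type | Required | Default | Description |
| --- | --- | --- | --- | --- |
| query | string | yes | (none) | Search query terms |
| limit | number | no | 5 | Maximum number of results (capped by config.max_limit, usually 100) |
| domain | string | no | null | Restrict results to a specific domain (e.g. docs.python.org) |
| min_score | number | no | 0.0 | Minimum relevance score (0.0–1.0); filters out low-quality matches |
| offset | number | no | 0 | Pagination offset (skip the first N results) |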

Response

JSON array of result objects:

```json
[
  {
    "title": "Python Tutorial",
    "url": "https://docs.python.org/3/tutorial/",
    "snippet": "Python is an easy to learn, powerful programming language...",
    "domain": "docs.python.org",
    "score": 0.87,
    "crawled_at": 1707712345678
  }
]
```

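Fields:

- title: Document title (cleaned)
- url: Source URL (canonical)
- snippet: Excerpt (~200 chars) from the content
- domain: Hostname of the source
- score: BM25 relevance score (higher is better; not normalized to 0–1, though values typically fall in that range)
- crawled_at: Unix timestamp of when the page was crawled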

Example Agent Calls

```
search CuratedSearch for python tutorial
search CuratedSearch for async await limit=3 domain=developer.mozilla.org
search CuratedSearch for linux man page min_score=0.3
```

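Errors

If an error occurs, the tool exits non-zero and prints a JSON error object to stderr, e.g.:

```json
{
  "error": "indexnotfound",
  "message": "Search index not found. The index has not been built yet.",
  "suggestion": "Run the crawler first: npm run crawl",
  "details": { "path": "data/index.json" }
}
```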

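Common error codes:

| Code | Meaning | Suggested Fix |
| --- | --- | --- |
| configmissing | Configuration file not found | Specify `--config` path or ensure config.yaml exists |
| configinvalid | | |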

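Configuration

Edit config.yaml in the skill directory. Key sections:

- domains: whitelist of allowed domains (required)
- seeds: starting URLs for crawling
- crawl: depth, delay, timeout, maximum number of documents
- content: minimum and maximum content length
- index: path to index files
- search: default limit, maximum limit, minimum score

See README.md for full configuration docs.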

Support

- Full documentation: README.md
- Technical specs: specs/
- Build plan: PLAN.md
- Contributor guide: CONTRIBUTING.md
- Issues: Report on GitHub (or via OpenClaw maintainers)