Firecrawl Local Skill

Self-hosted Firecrawl integration using the v1 REST API. It tests connectivity first,
then runs scrape, crawl, or map requests and polls asynchronous crawl jobs automatically.

Setup (one-time)

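mkdir -p ~/.openclaw/skills/firecrawl-local
cp run.sh ~/.openclaw/skills/firecrawl-local/run.sh
chmod +x ~/.openclaw/skills/firecrawl-local/run.sh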

The script lives at scripts/run.sh in this skill folder; copy it into place as shown above, adjusting the source path if you run the commands from the skill root.

Prerequisites: curl and jq installed, and Firecrawl running at localhost:3002.

Optional env vars:

export FIRECRAWL_LOCAL_URL="http://localhost:3002"  # default
export FIRECRAWL_API_KEY="fc-your-key"              # only needed if auth enabled
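
A minimal sketch of how a wrapper script might consume these two variables. The helper names `resolve_base_url` and `auth_header` are hypothetical, and the Bearer header format is an assumption about Firecrawl's auth scheme; scripts/run.sh may do this differently.

```shell
# Hypothetical helpers, not taken from scripts/run.sh.

# Print the base URL, falling back to the documented default.
resolve_base_url() {
  printf '%s\n' "${FIRECRAWL_LOCAL_URL:-http://localhost:3002}"
}

# Print an Authorization header line only when an API key is set.
# Prints nothing (and still succeeds) when FIRECRAWL_API_KEY is empty or unset.
auth_header() {
  if [ -n "${FIRECRAWL_API_KEY:-}" ]; then
    printf 'Authorization: Bearer %s\n' "$FIRECRAWL_API_KEY"
  fi
}
```

A caller would add a header option to its curl invocation only when `auth_header` prints a non-empty line, so unauthenticated local instances receive no Authorization header at all.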

Commands

Default — scrape a single page (URL only, no subcommand needed)

firecrawl-local https://docs.example.com/api

Scrape — explicit, with format options

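firecrawl-local scrape https://docs.example.com/api
firecrawl-local scrape https://docs.example.com/api --formats markdown,html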

Map — discover all URLs on a site

firecrawl-local map https://docs.example.com
firecrawl-local map https://docs.example.com --limit 200

Crawl — bulk extract multiple pages (async, auto-polled)

firecrawl-local crawl https://docs.example.com
firecrawl-local crawl https://docs.example.com --limit 30 --max-depth 2
firecrawl-local crawl https://docs.example.com --include /docs --exclude /blog
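
The crawl flags above presumably end up as fields in the JSON body of a crawl request. The sketch below shows one way that mapping could look, assuming the Firecrawl v1 field names `limit`, `maxDepth`, `includePaths`, and `excludePaths`. The function name `build_crawl_body` is hypothetical, and the actual mapping in scripts/run.sh is not shown in this document.

```shell
# Build a JSON body for a crawl request with jq, so every value is escaped
# by jq instead of being spliced into the JSON text by the shell.
# Arguments: url [limit] [max_depth] [include_path] [exclude_path]
# The limit defaults to 50 (the documented crawl default); empty optional
# arguments are left out of the body entirely.
build_crawl_body() {
  jq -n \
    --arg url "$1" \
    --arg limit "${2:-50}" \
    --arg depth "${3:-}" \
    --arg include "${4:-}" \
    --arg exclude "${5:-}" \
    '{url: $url, limit: ($limit | tonumber)}
     + (if $depth != "" then {maxDepth: ($depth | tonumber)} else {} end)
     + (if $include != "" then {includePaths: [$include]} else {} end)
     + (if $exclude != "" then {excludePaths: [$exclude]} else {} end)'
}
```

For example, `build_crawl_body https://docs.example.com 30 2 /docs /blog` corresponds to the second and third commands above combined.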

Agent Instructions

When to use each command

Goal	Command
Get content from one URL (quickest)	firecrawl-local <url>
Discover what pages exist

Optimal workflows

Documentation RAG pipeline:
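1. map https://docs.example.com → get the full URL list
2. scrape <specific key pages> → targeted extraction
3. Pass the markdown to the embedding pipeline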

Full site ingestion:
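1. crawl https://docs.example.com --limit 50 --max-depth 3
2. Results are polled automatically and returned as a JSON array of {url, markdown}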
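
The documentation RAG workflow (map the site, scrape selected pages, hand the markdown to an embedding step) could be scripted roughly as follows. This is a sketch under stated assumptions: `ingest_docs` and the `FIRECRAWL_CMD` override are hypothetical names, the response shapes are the ones listed under "Reading the output", and the embedding step is left out.

```shell
# FIRECRAWL_CMD defaults to the firecrawl-local CLI described in this
# document; override it to point at a wrapper or a test stub.
FIRECRAWL_CMD="${FIRECRAWL_CMD:-firecrawl-local}"

# Usage: ingest_docs <site-url> <path-filter> <output-dir>
# 1. map the site, 2. keep URLs containing the filter string,
# 3. scrape each URL and write its markdown to <output-dir>/page-N.md.
# Prints the number of pages written.
ingest_docs() {
  local site="$1" filter="$2" out_dir="$3" urls url n=0
  mkdir -p "$out_dir" || return 1
  urls=$("$FIRECRAWL_CMD" map "$site" | jq -r '.links[]' | grep -F -- "$filter") || true
  while IFS= read -r url; do
    [ -n "$url" ] || continue
    n=$((n + 1))
    # </dev/null keeps the scrape command from consuming the URL list.
    "$FIRECRAWL_CMD" scrape "$url" </dev/null | jq -r '.data.markdown' > "$out_dir/page-$n.md"
  done <<EOF
$urls
EOF
  printf '%s\n' "$n"
}
```

Mapping first keeps the scrape step limited to the pages that matter, which is the point of preferring this workflow over a full crawl.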

Parameters

Flag	Applies to	Description
--limit N	map, crawl	Max pages (default: 50 for crawl, 500 for map)
--max-depth N

Reading the output

- scrape: Returns {success, data: {markdown, html, metadata}}
- map: Returns {success, links: [...]}
- crawl: Returns {success, data: [{url, markdown, metadata}, ...]} (after polling completes)
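
Given those shapes (scrape: {success, data: {markdown, html, metadata}}, map: {success, links: [...]}, crawl: {success, data: [...]}), the fields an agent usually wants can be pulled out with jq. The helper names below are hypothetical; each one reads a single JSON response on stdin.

```shell
# scrape: print the page markdown.
scrape_markdown() { jq -r '.data.markdown'; }

# map: print one discovered URL per line.
map_links() { jq -r '.links[]'; }

# crawl: print "url<TAB>markdown length" for every crawled page.
crawl_summary() { jq -r '.data[] | [.url, (.markdown | length)] | @tsv'; }

# Exit 0 only when the response reports success: true.
is_success() { jq -e '.success == true' >/dev/null; }
```

A typical use would be `firecrawl-local map https://docs.example.com | map_links`, with `is_success` as a guard before any further processing.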

Failure signals and fixes

Error	Cause	Fix
Local Firecrawl unavailable	Service not running	Start Firecrawl, check port 3002
success: false

Script reference

See scripts/run.sh for the full implementation. Key design decisions:

- Health check uses the /health endpoint with a 3s timeout
- Auth header is sent only when FIRECRAWL_API_KEY is set
- Crawl polling retries every 5s, up to 60 attempts (5 minutes)
- All parameters are passed through jq when building the request JSON, so values are escaped rather than spliced in by the shell
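
The health check and polling decisions above can be sketched as follows. This is not the code from scripts/run.sh: the function names, the `POLL_INTERVAL` and `POLL_MAX_ATTEMPTS` variables, the GET /v1/crawl/<id> status endpoint, and the "completed" and "failed" status values are assumptions based on the Firecrawl v1 API, and the auth header is omitted for brevity.

```shell
BASE_URL="${FIRECRAWL_LOCAL_URL:-http://localhost:3002}"
POLL_INTERVAL="${POLL_INTERVAL:-5}"           # seconds between polls
POLL_MAX_ATTEMPTS="${POLL_MAX_ATTEMPTS:-60}"  # 60 x 5s = 5 minutes

# Return 0 if the service answers /health within 3 seconds.
health_check() {
  curl -sf --max-time 3 "$BASE_URL/health" >/dev/null
}

# Print the status field of a crawl job (assumed endpoint and field name).
# Redefine this function to exercise poll_crawl without a running service.
fetch_status() {
  curl -sf "$BASE_URL/v1/crawl/$1" | jq -r '.status'
}

# Poll until the job reports "completed" (return 0), reports "failed"
# (return 1), or the attempt budget is exhausted (return 2).
poll_crawl() {
  local id="$1" attempt=0 status
  while [ "$attempt" -lt "$POLL_MAX_ATTEMPTS" ]; do
    status=$(fetch_status "$id") || status="unreachable"
    case "$status" in
      completed) return 0 ;;
      failed) return 1 ;;
    esac
    attempt=$((attempt + 1))
    sleep "$POLL_INTERVAL"
  done
  return 2
}
```

Keeping the status lookup in its own function separates the retry budget from the HTTP details, and a distinct return code for a timeout lets the caller tell a failed crawl apart from one that is still running.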
