Stealth Scraper

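A Playwright-based web scraping skill with anti-bot evasion. It supports simple, stealth, and batch modes.

Description

A web scraping tool with built-in anti-detection techniques for getting past common anti-bot mechanisms. The anti-detection code is written by hand and does not depend on any third-party stealth plugin.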

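Core capabilities:

- 🕵️ Stealth mode: hides the webdriver fingerprint, randomizes the User-Agent, adds random delays, and disables fingerprint tracking
- 📄 Simple mode: fast scraping of page content
- 📦 Batch mode: scrapes multiple URLs concurrently with automatic rate limiting
- 🎯 Precise extraction: targeted scraping with CSS selectors
- 📸 Screenshots and HTML saving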

Configuration

```yaml
bins: [node]
```

Setup

Install the dependencies before first use:

```bash
cd ~/.openclaw/workspace/skills/stealth-scraper
npm install
```

If Chromium was not installed automatically, run:

```bash
node scripts/setup.js
```

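Usage

Simple mode (fast scraping)

```bash
node scripts/scraper-simple.js <url> [options]
```

Options:

| Option | Description | Default |
| --- | --- | --- |
| `--wait <ms>` | Time to wait after the page loads (milliseconds) | 2000 |
| `--selector <css>` | CSS selector; only the content of matching elements is extracted | None (extracts everything) |

Examples:

```bash
# Basic scrape
node scripts/scraper-simple.js https://example.com

# Wait 5 seconds before extracting
node scripts/scraper-simple.js https://example.com --wait 5000

# Extract only the article body
node scripts/scraper-simple.js https://example.com --selector article.content
```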
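To make the option defaults concrete, here is a minimal sketch of how the two documented options could be parsed with Node's built-in `util.parseArgs`. The function name `parseSimpleArgs` and the returned object shape are assumptions for illustration; this is not the code in `scripts/scraper-simple.js`.

```javascript
const { parseArgs } = require("node:util");

// Hypothetical parser for the two documented options (not the script's own code).
function parseSimpleArgs(argv) {
  const { values, positionals } = parseArgs({
    args: argv,
    allowPositionals: true, // the URL is passed as a positional argument
    options: {
      wait: { type: "string", default: "2000" }, // documented default: 2000 ms
      selector: { type: "string" },              // no default: extract everything
    },
  });

  const waitMs = Number(values.wait);
  if (!Number.isFinite(waitMs) || waitMs < 0) {
    throw new Error(`--wait expects a non-negative number of milliseconds, got "${values.wait}"`);
  }

  // selector is null when the option is omitted
  return { url: positionals[0], waitMs, selector: values.selector ?? null };
}
```

Calling `parseSimpleArgs(process.argv.slice(2))` would feed it the real command line; unknown options make `parseArgs` throw, which surfaces typos early.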

Stealth mode (anti-bot evasion)

```bash
node scripts/scraper-stealth.js <url> [options]
```

Options:

| Option | Description | Default |
| --- | --- | --- |
| `--wait <ms>` | Time to wait after the page loads (milliseconds) | 2000 |
| `--selector <css>` | | |

Examples:

```bash
# Stealth scrape
node scripts/scraper-stealth.js https://example.com

# Use a proxy and take a screenshot
node scripts/scraper-stealth.js https://example.com --proxy http://127.0.0.1:7890 --screenshot ./shot.png

# Auto-scroll and save the HTML
node scripts/scraper-stealth.js https://example.com --scroll --html ./page.html

# Visit with a cookie
node scripts/scraper-stealth.js https://example.com --cookie '[{"name":"token","value":"abc123","domain":".example.com"}]'

# Precise extraction with a wait
node scripts/scraper-stealth.js https://example.com --selector div.main --wait 5000
```
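Stealth mode is described as using a random User-Agent and random delays. The sketch below shows one simple way such helpers could be written. The names `USER_AGENTS`, `pickUserAgent`, `randomDelayMs`, and `sleep` are hypothetical, and the UA strings are illustrative samples rather than the pool the script actually uses.

```javascript
// Illustrative User-Agent strings only; a real pool would be larger and kept current.
const USER_AGENTS = [
  "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
  "Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
  "Mozilla/5.0 (X11; Linux x86_64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/120.0.0.0 Safari/537.36",
];

// Pick one entry uniformly at random. `random` is injectable so the choice
// can be made deterministic in tests.
function pickUserAgent(pool = USER_AGENTS, random = Math.random) {
  return pool[Math.floor(random() * pool.length)];
}

// Return an integer delay in the inclusive range [minMs, maxMs].
function randomDelayMs(minMs, maxMs, random = Math.random) {
  return Math.floor(minMs + random() * (maxMs - minMs + 1));
}

// Promise-based pause, e.g. `await sleep(randomDelayMs(500, 1500))`.
function sleep(ms) {
  return new Promise((resolve) => setTimeout(resolve, ms));
}
```

With Playwright, the chosen string would typically be passed as the `userAgent` option of `browser.newContext()`, and the delay awaited between navigation steps.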

Batch mode

```bash
node scripts/scraper-batch.js [options] <url1> <url2> ...

# or

node scripts/scraper-batch.js --file urls.txt [options]
```

Options:

| Option | Description | Default |
| --- | --- | --- |
| `--file <path>` | URL list file (one URL per line) | None |
| `--concurrency <n>` | | |

Examples:

```bash
# Scrape several URLs in one batch
node scripts/scraper-batch.js https://a.com https://b.com https://c.com

# Read the URL list from a file, in stealth mode
node scripts/scraper-batch.js --file urls.txt --stealth --concurrency 2

# Write the results to a file
node scripts/scraper-batch.js --file urls.txt --output results.json
```
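The idea behind a concurrency limit such as `--concurrency 2` can be sketched with a small worker pool. This is an illustration under assumptions, not the implementation in `scripts/scraper-batch.js`: the helper name `mapWithConcurrency` and the `error` field on failed entries are hypothetical.

```javascript
// Hypothetical helper: run `worker` over `items` with at most `limit` tasks in flight.
// Results keep the input order regardless of which task finishes first.
async function mapWithConcurrency(items, limit, worker) {
  const results = new Array(items.length);
  let next = 0; // index of the next unclaimed item

  async function runner() {
    while (next < items.length) {
      const i = next++; // claim an index; no other code runs between check and claim
      try {
        results[i] = await worker(items[i], i);
      } catch (err) {
        // Assumed failure shape: a `success: false` entry so one bad URL
        // does not abort the whole batch.
        results[i] = { success: false, url: String(items[i]), error: String(err.message || err) };
      }
    }
  }

  // Start up to `limit` runners; each pulls items until none are left.
  const count = Math.max(1, Math.min(limit, items.length));
  await Promise.all(Array.from({ length: count }, runner));
  return results;
}
```

A caller would pass the URL list, the concurrency value, and an async function that scrapes one URL. Rate limiting could be layered on by awaiting a delay inside that function.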

Output Format

All modes output the same JSON structure:

```json
{
  "success": true,
  "url": "https://example.com",
  "title": "Example Domain",
  "content": "Page text content...",
  "links": [{"text": "More info", "href": "https://..."}],
  "images": [{"alt": "Logo", "src": "https://..."}],
  "metadata": {
    "description": "...",
    "keywords": "...",
    "author": "..."
  },
  "elapsedSeconds": 2.35
}
```
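As a usage illustration, a consumer of this output might reduce each result to a one-line summary. The helper `summarizeResult` below is hypothetical, and it assumes that failed scrapes still carry `success: false` and, where available, a `url` field; the failure shape is not specified here.

```javascript
// Hypothetical helper: turn one scraper result (JSON text) into a short summary line.
function summarizeResult(jsonText) {
  const result = JSON.parse(jsonText);

  // Assumption: failures are reported with success === false.
  if (!result.success) {
    return `FAILED ${result.url ?? "(unknown url)"}`;
  }

  // Guard against missing arrays so partial results do not throw.
  const links = Array.isArray(result.links) ? result.links.length : 0;
  const images = Array.isArray(result.images) ? result.images.length : 0;
  return `${result.title} | ${links} links, ${images} images, ${result.elapsedSeconds}s`;
}
```

The JSON text could come from a file written by batch mode's `--output` option or from a script's standard output, depending on how the script is invoked.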

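Anti-Detection Features

| Technique | Description |
| --- | --- |
| navigator.webdriver hiding | Deletes the webdriver property so the browser looks like a regular one |
| User-Agent rotation | |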
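A minimal sketch of the webdriver-hiding technique, injected through Playwright's `addInitScript`, could look like the following. The function name `hideWebdriver` and its optional parameter are assumptions for illustration; the skill's actual init script may differ.

```javascript
// Sketch of an init script that removes navigator.webdriver.
// `nav` defaults to the page's own navigator; the parameter exists only so the
// function can be exercised against a stand-in object outside a browser.
function hideWebdriver(nav = navigator) {
  const proto = Object.getPrototypeOf(nav);
  // Browsers define `webdriver` as a getter on the prototype, so it is removed there.
  if (proto && "webdriver" in proto) {
    delete proto.webdriver;
  }
}

// Assumed wiring (needs the playwright package, so it is left commented out):
//   const { chromium } = require("playwright");
//   const browser = await chromium.launch();
//   const context = await browser.newContext();
//   await context.addInitScript(hideWebdriver); // runs before any page script
```

Because init scripts run before the page's own scripts, detection code on the page never sees the original property.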

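Notes

- The anti-detection code is written by hand and uses no third-party stealth plugin (such as puppeteer-extra-plugin-stealth), to avoid being recognized by anti-bot detection systems
- Playwright is used instead of Puppeteer because it offers a better foundation for anti-detection
- All anti-detection code is injected through addInitScript before the page loads
- Please comply with the target site's robots.txt and terms of service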
