Web scraping framework with anti-bot bypass and adaptive parsing. Use when the user needs to: (1) Scrape data from websites, (2) Bypass Cloudflare/anti-bot protection, (3) Build large-scale crawlers, (4) Extract structured data from web pages, (5) Monitor website changes, (6) Collect data for AI training/RAG. Triggers on phrases like "scrape this website", "抓取这个网站", "爬取数据", "帮我抓一下", "extract data from", "monitor this site".
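An adaptive web scraping framework that gets past anti-bot protection, scales to large crawls, and keeps working when a site is redesigned.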
bash
python
from scrapling.fetchers import Fetcher
page = Fetcher.get("https://quotes.toscrape.com/")
quotes = page.css(".quote .text::text").getall()
print(quotes)
python
from scrapling.fetchers import StealthyFetcher
python
from scrapling.fetchers import DynamicFetcher
| Fetcher | Use case | Characteristics |
|---|---|---|
| Fetcher | Plain HTTP requests | Fastest; suited to static pages |
| StealthyFetcher |
python
page = Fetcher.get("https://example.com")
python
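import asyncio
from scrapling.fetchers import FetcherSession, AsyncStealthySession
async def main():
    # "async with" is only valid inside a coroutine, so wrap it in main().
    async with AsyncStealthySession(headless=True) as session:
        page = await session.fetch("https://example.com")
asyncio.run(main())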
python
from scrapling.spiders import Spider, Response
class MySpider(Spider):
    name = "products"
    start_urls = ["https://shop.example.com/"]
    concurrent_requests = 10  # number of concurrent requests

    async def parse(self, response: Response):
        for item in response.css(".product"):
            yield {
                "title": item.css("h2::text").get(),
                "price": item.css(".price::text").get(),
            }
        # Pagination: follow the next-page link if there is one
        next_page = response.css(".next a::attr(href)").get()
        if next_page:
            yield response.follow(next_page)
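
The pagination step yields a follow-up request for whatever `href` the next-page link carries, and that value is often relative. As a rough illustration of the URL resolution involved (an assumption about what following a link requires, not a description of Scrapling internals), the standard library's `urljoin` shows how a relative link maps to an absolute URL. The helper name `resolve_next_page` is hypothetical.

```python
from typing import Optional
from urllib.parse import urljoin


def resolve_next_page(current_url: str, href: Optional[str]) -> Optional[str]:
    """Resolve a possibly relative next-page href against the current page URL."""
    if not href:
        # No next-page link was found, so there is nothing to follow.
        return None
    return urljoin(current_url, href)
```

A missing link returns `None`, which mirrors the `if next_page:` guard in the spider: the crawl simply stops at the last page.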
python
python
from scrapling.spiders import Spider, Request
from scrapling.fetchers import FetcherSession, AsyncStealthySession
class MultiSpider(Spider):
    name = "multi"

    def configure_sessions(self, manager):
        # Plain requests: fast
        manager.add("fast", FetcherSession(impersonate="chrome"))
        # Stealth requests: get past anti-bot protection
        manager.add("stealth", AsyncStealthySession(headless=True), lazy=True)

    async def parse(self, response):
        for link in response.css("a::attr(href)").getall():
            if "protected" in link:
                yield Request(link, sid="stealth")  # use the stealth session
            else:
                yield Request(link, sid="fast")  # use the fast session
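
The routing rule inside `parse` can be pulled out into a small pure function, which makes it easy to test without starting a crawl. The following is a minimal sketch; `choose_session_id` and `PROTECTED_MARKERS` are hypothetical names, and checking only the URL path (rather than the whole link, as the spider above does) is an illustrative design choice.

```python
from urllib.parse import urlparse

# Hypothetical path fragments that mark pages behind anti-bot protection.
PROTECTED_MARKERS = ("protected",)


def choose_session_id(link: str) -> str:
    """Return the session id ("stealth" or "fast") to use for a link."""
    path = urlparse(link).path
    # Look only at the path so that a query string such as ?ref=protected
    # does not route an ordinary page through the slower stealth session.
    if any(marker in path for marker in PROTECTED_MARKERS):
        return "stealth"
    return "fast"
```

Inside the spider this would be used as `yield Request(link, sid=choose_session_id(link))`.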
Automatically relocate elements after a site redesign:
python
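
To give a feel for what relocating an element after a redesign can mean, here is a conceptual sketch using only the standard library. It is not Scrapling's algorithm or API: it assumes each element has been reduced to a small dict of tag, classes, and text (a hypothetical representation), and picks the candidate most similar to a previously saved element. The names `signature` and `relocate` and the `0.6` threshold are illustrative assumptions.

```python
from difflib import SequenceMatcher
from typing import Optional


def signature(element: dict) -> str:
    """Flatten an element description into a comparable string."""
    classes = " ".join(sorted(element.get("classes", [])))
    return f"{element.get('tag', '')}|{classes}|{element.get('text', '')}"


def relocate(saved: dict, candidates: list, threshold: float = 0.6) -> Optional[dict]:
    """Return the candidate most similar to the saved element, or None."""
    target = signature(saved)
    best, best_score = None, 0.0
    for candidate in candidates:
        score = SequenceMatcher(None, target, signature(candidate)).ratio()
        if score > best_score:
            best, best_score = candidate, score
    # Reject weak matches so an unrelated element is never returned.
    return best if best_score >= threshold else None
```

With a saved `span.price` element, a redesigned page whose price now lives in `span.price-new` would still be matched, while an unrelated navigation link would fall below the threshold.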
python
from scrapling.fetchers import StealthyFetcher, ProxyRotator
proxies = ProxyRotator([
    "http://proxy1:8080",
    "http://proxy2:8080",
])
page = StealthyFetcher.fetch(
    "https://example.com",
    proxy=proxies.next(),  # take the next proxy from the rotation
)
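
The idea behind rotating proxies is simple round-robin selection. The sketch below shows that idea with the standard library only; `SimpleRotator` is a hypothetical stand-in written for illustration and makes no claim about how `ProxyRotator` is implemented.

```python
from itertools import cycle
from typing import Iterable


class SimpleRotator:
    """Hand out proxies one at a time, wrapping around at the end."""

    def __init__(self, proxies: Iterable[str]):
        pool = list(proxies)
        if not pool:
            raise ValueError("at least one proxy is required")
        self._pool = cycle(pool)

    def next(self) -> str:
        # Each call advances the cycle, so consecutive requests
        # leave through different proxies.
        return next(self._pool)
```

Round-robin spreads requests evenly across the pool; a production rotator would typically also drop proxies that keep failing.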
bash
Let Claude or Cursor call Scrapling directly to scrape data:
bash
pip install "scrapling[ai]"
Add it to the Claude Desktop config:
json
{
  "mcpServers": {
    "scrapling": {
      "command": "scrapling",
      "args": ["mcp"]
    }
  }
}
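
If the config file already lists other MCP servers, the Scrapling entry should be merged in rather than pasted over them. The following sketch shows one way to do that on an in-memory dict; `add_scrapling_server` is a hypothetical helper, and reading or writing the actual config file (whose location depends on the platform) is left out.

```python
import json


def add_scrapling_server(config: dict) -> dict:
    """Return a copy of config with the scrapling MCP server added."""
    merged = dict(config)
    servers = dict(merged.get("mcpServers", {}))
    # Same entry as in the JSON snippet above.
    servers["scrapling"] = {"command": "scrapling", "args": ["mcp"]}
    merged["mcpServers"] = servers
    return merged


# Example: an existing config with one unrelated (made-up) server.
existing = {"mcpServers": {"other": {"command": "other-tool", "args": []}}}
print(json.dumps(add_scrapling_server(existing), indent=2))
```

The function copies the input instead of mutating it, so a failed write cannot leave the original settings half-updated.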
python
from scrapling.fetchers import StealthyFetcher
page = StealthyFetcher.fetch("https://item.jd.com/12345.html", headless=True)
price = page.css(".price::text").get()
title = page.css(".sku-name::text").get()
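
The scraped price is raw text and may include a currency symbol, thousands separators, or nothing at all when the selector misses. A minimal sketch of turning it into a number follows; `parse_price` is a hypothetical helper, and the accepted formats are an assumption about what such pages typically show.

```python
import re
from decimal import Decimal
from typing import Optional


def parse_price(text: Optional[str]) -> Optional[Decimal]:
    """Extract the first number from a price string, or None if absent."""
    if not text:
        return None
    # Digits with optional thousands separators and a decimal part.
    match = re.search(r"\d[\d,]*(?:\.\d+)?", text)
    if match is None:
        return None
    return Decimal(match.group().replace(",", ""))
```

`Decimal` is used instead of `float` so that prices compare and sum without rounding surprises.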
python
from scrapling.spiders import Spider, Response
class JobsSpider(Spider):
    name = "jobs"
    start_urls = ["https://www.zhipin.com/jobdetail/?query=Python"]

    async def parse(self, response: Response):
        for job in response.css(".job-list li"):
            yield {
                "title": job.css(".job-name::text").get(),
                "salary": job.css(".salary::text").get(),
                "company": job.css(".company-name::text").get(),
            }
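
Items yielded this way often carry stray whitespace, and list entries that are not real job cards produce records with no title. A small post-processing step can tidy them up; the sketch below is one possible approach, with `clean_item` and `REQUIRED_FIELDS` as hypothetical names.

```python
from typing import Optional

# Fields that must be non-empty for a record to be kept (an assumption).
REQUIRED_FIELDS = ("title",)


def clean_item(item: dict) -> Optional[dict]:
    """Strip whitespace from string fields; drop records missing required fields."""
    cleaned = {
        key: (value.strip() if isinstance(value, str) else value)
        for key, value in item.items()
    }
    if any(not cleaned.get(field) for field in REQUIRED_FIELDS):
        return None
    return cleaned
```

Optional fields such as `company` are kept as `None` rather than dropped, so every surviving record has the same keys.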
python
from scrapling.fetchers import Fetcher
import json
def check_competitor(url):
    """Fetch a competitor page and summarize what it currently shows."""
    page = Fetcher.get(url)
    return {
        "products": len(page.css(".product")),
        "price_range": page.css(".price::text").getall(),
        "updated": page.css(".update-time::text").get(),
    }
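
Monitoring only becomes useful once two snapshots are compared. Assuming each run of `check_competitor` returns a flat dict like the one above, the following sketch reports which fields changed between runs; `diff_snapshots` is a hypothetical helper, and how snapshots are stored between runs is left open.

```python
def diff_snapshots(previous: dict, current: dict) -> dict:
    """Return {field: {"old": ..., "new": ...}} for every field that changed."""
    changes = {}
    for key in sorted(set(previous) | set(current)):
        old, new = previous.get(key), current.get(key)
        if old != new:
            changes[key] = {"old": old, "new": new}
    return changes
```

An empty result means nothing changed, so a scheduler can skip sending a notification for that run.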
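This skill can be installed through conversation on the following platforms:
Help me install SkillHub and the nmb-scrapling-1776279424 skill
Set SkillHub as my preferred skill installation source, then help me install the nmb-scrapling-1776279424 skill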
skillhub install nmb-scrapling-1776279424
File size: 4.69 KB | Published: 2026-4-17 15:32