
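Real Estate Spider: Real Estate Data Crawler

A general-purpose crawler skill for Chinese real estate agency websites (Anjuke, Soufun, Beike Zhaofang, Lianjia), with anti-crawling countermeasures and automatic data extraction.

Author: admin | Source: ClawHub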
Version: 1.0.0 | Security check: passed | Downloads: 90 | Price: free | Favorites: 0

Real Estate Spider

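Skill name: Real Estate Spider
Description:

Crawler skill for real estate agency websites

Introduction

This skill crawls data from the major Chinese real estate agency websites:

  1. Anjuke (anjuke.com)
  2. Soufun (soufun.com)
  3. Beike Zhaofang (ke.com)
  4. Lianjia (lianjia.com)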

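Prerequisites

  • Python 3.x
  • The agent-browser skill, already installed
  • The requests library (installable with pip)

Install dependencies

bash

# Install the Python requests library, plus beautifulsoup4 and lxml
pip install requests beautifulsoup4 lxml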

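Main features

1. Anti-crawling countermeasures

  • Simulates a real browser fingerprint
  • Random delays to avoid rate-based detection
  • Cookie and session management
  • Proxy IP support (optional)
  • CAPTCHA handling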
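
As a rough illustration of three of the items above (random delays, cookie persistence, and an optional proxy), here is a minimal sketch. It uses only the Python standard library rather than requests so it runs without extra packages. The names make_opener and polite_sleep are hypothetical helpers for this example and are not part of the skill.

```python
import random
import time
import urllib.request
from http.cookiejar import CookieJar


def make_opener(user_agent, proxy=None):
    """Return (opener, cookie_jar). Cookies set by the server are kept in
    cookie_jar and sent back on later requests made through the opener."""
    jar = CookieJar()
    handlers = [urllib.request.HTTPCookieProcessor(jar)]
    if proxy:
        # Route both schemes through the same proxy URL (optional).
        handlers.append(urllib.request.ProxyHandler({"http": proxy, "https": proxy}))
    opener = urllib.request.build_opener(*handlers)
    opener.addheaders = [
        ("User-Agent", user_agent),
        ("Accept-Language", "zh-CN,zh;q=0.9"),
    ]
    return opener, jar


def polite_sleep(low=2.0, high=5.0, rng=random, sleep=time.sleep):
    """Sleep for a random duration in [low, high] seconds and return it."""
    delay = rng.uniform(low, high)
    sleep(delay)
    return delay
```

A caller would typically invoke polite_sleep() before each opener.open(url) call. The rng and sleep parameters exist only so the delay can be tested without actually waiting.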

2. Data extraction

  • Price
  • Floor area
  • Location
  • Layout (rooms and halls)
  • Decoration status
  • Year built
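
Extracted values usually arrive as display strings, so a normalization step is often useful before analysis. The sketch below assumes common listing formats such as "350万" (price in units of 10,000 yuan), "89.5㎡", "2室1厅", and "2005年建". These formats and the function names are assumptions made for illustration; the real sites may format fields differently.

```python
import re


def parse_price_wan(text):
    """'350万' -> 3500000.0 (yuan); None if no price in 万 is found."""
    m = re.search(r"(\d+(?:\.\d+)?)\s*万", text)
    return float(m.group(1)) * 10_000 if m else None


def parse_area(text):
    """'89.5㎡' -> 89.5 (square metres); None if no area is found."""
    m = re.search(r"(\d+(?:\.\d+)?)\s*(?:㎡|平方米|平米|m²)", text)
    return float(m.group(1)) if m else None


def parse_layout(text):
    """'2室1厅' -> (2, 1) as (rooms, halls); None if not matched."""
    m = re.search(r"(\d+)室(\d+)厅", text)
    return (int(m.group(1)), int(m.group(2))) if m else None


def parse_year(text):
    """'2005年建' -> 2005; None if no four-digit year is found."""
    m = re.search(r"((?:19|20)\d{2})年", text)
    return int(m.group(1)) if m else None
```

Returning None for unparseable input keeps rows with missing or unusual values (for example a price shown as "面议") from raising during a long crawl.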

3. Export formats

  • JSON
  • CSV
  • Excel
  • Charts for visualization
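
For the JSON and CSV formats, one possible approach is to derive the column names from a dataclass so the header cannot drift out of sync with the record fields. This is a sketch under that assumption: Listing is a hypothetical record type, and Excel or chart output would need a third-party library that is not shown here.

```python
import csv
import io
import json
from dataclasses import asdict, dataclass, fields


@dataclass
class Listing:  # hypothetical record type used only in this sketch
    title: str
    price: str
    area: str


def to_json(records):
    """Serialize dataclass records to a JSON string, keeping Chinese text readable."""
    return json.dumps([asdict(r) for r in records], ensure_ascii=False, indent=2)


def to_csv(records, record_type):
    """Serialize dataclass records to CSV text with a header row taken
    from the dataclass field names."""
    buffer = io.StringIO(newline="")
    names = [f.name for f in fields(record_type)]
    writer = csv.DictWriter(buffer, fieldnames=names)
    writer.writeheader()
    for record in records:
        writer.writerow(asdict(record))
    return buffer.getvalue()
```

To write files instead of strings, the same writer logic can be pointed at open(path, "w", newline="", encoding="utf-8").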

Usage

Basic crawler scripts

bash

# Crawl Anjuke data with the Python script
python3 ~/.openclaw/workspace/skills/real-estate-spider/scripts/anjuke_crawler.py

# Use the shell script together with agent-browser
bash ~/.openclaw/workspace/skills/real-estate-spider/scripts/bypass_anjuke.sh

Configuring the target site

python

# Example configuration file
# ~/.openclaw/workspace/skills/real-estate-spider/config/realestateconfig.py

# One entry per site: the base URL plus the CSS selectors for each field.
CONFIG = {
    "anjuke": {
        "url": "https://www.anjuke.com",
        "data_selectors": {
            "price": ".property-price",
            "area": ".property-area",
            "location": ".property-location",
            "type": ".property-type",
        },
    },
    "ke": {
        "url": "https://ke.com",
        "data_selectors": {
            "price": ".price-text",
            "area": ".area-text",
            "location": ".location-text",
            "type": ".type-text",
        },
    },
    "lianjia": {
        "url": "https://www.lianjia.com",
        "data_selectors": {
            "price": ".total-price",
            "area": ".area-num",
            "location": ".location-text",
            "type": ".house-type",
        },
    },
    "soufun": {
        "url": "https://www.soufun.com",
        "data_selectors": {
            "price": ".price-num",
            "area": ".area-num",
            "location": ".location-text",
            "type": ".type-text",
        },
    },
}
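
The data_selectors entries in the configuration above are simple class selectors. As a hedged illustration of how such selectors could map page markup to fields, the sketch below uses only the standard library html.parser; in practice a library such as BeautifulSoup would be the more usual choice. ClassTextExtractor and extract_fields are hypothetical names, the sample markup in the usage note is invented, and the sketch handles only single-class selectors like ".property-price".

```python
from html.parser import HTMLParser


class ClassTextExtractor(HTMLParser):
    """Collect the text of the first element carrying each wanted CSS class.

    Limitation: an unclosed void tag such as <br> inside a captured
    element would upset the depth counter; this is a sketch only.
    """

    def __init__(self, wanted):
        super().__init__()
        self.wanted = wanted      # field name -> class name (no leading dot)
        self.result = {}
        self._field = None        # field currently being captured, if any
        self._depth = 0           # open tags inside the captured element
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if self._field is not None:
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        for field, cls in self.wanted.items():
            if cls in classes and field not in self.result:
                self._field, self._depth, self._chunks = field, 1, []
                break

    def handle_endtag(self, tag):
        if self._field is None:
            return
        self._depth -= 1
        if self._depth == 0:
            self.result[self._field] = "".join(self._chunks).strip()
            self._field = None

    def handle_data(self, data):
        if self._field is not None:
            self._chunks.append(data)


def extract_fields(html, selectors):
    """Apply {'price': '.property-price', ...} to an HTML string."""
    wanted = {field: sel.lstrip(".") for field, sel in selectors.items()}
    parser = ClassTextExtractor(wanted)
    parser.feed(html)
    parser.close()
    return parser.result
```

For example, extract_fields('<span class="property-price">350<em>万</em></span>', {"price": ".property-price"}) yields a dictionary with the single key "price". Fields whose selector matches nothing are simply absent from the result.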

General crawler template

python

# General crawler script template

import csv
import json
import random
import re
import time
from dataclasses import asdict, dataclass

import requests

# CONFIG is the dictionary from the configuration example. This import
# assumes the config/ directory is on sys.path; adjust it to your layout.
from realestateconfig import CONFIG


@dataclass
class PropertyData:
    title: str
    price: str
    area: str
    location: str
    house_type: str
    age: str
    orientation: str
    decoration: str


class RealEstateSpider:
    def __init__(self, website_name):
        self.website_name = website_name
        self.config = CONFIG[website_name]
        self.base_url = self.config["url"]
        self.selectors = self.config["data_selectors"]

    def crawl(self, city="北京", district=None):
        """Crawl property data for the given city and district."""
        # Build the URL
        url = self.build_url(city, district)

        # Send the request
        data = self.send_request(url)

        # Parse the data
        properties = self.parse_data(data)

        # Return the results
        return properties

    def build_url(self, city, district):
        """Build the target URL. The district argument is not used yet, and
        the paths below are placeholders to check against each site."""
        if self.website_name == "anjuke":
            return f"{self.base_url}/fangyuan/{city}"
        elif self.website_name == "ke":
            return f"{self.base_url}/city/{city}"
        elif self.website_name == "lianjia":
            return f"{self.base_url}/ershoufang/{city}"
        elif self.website_name == "soufun":
            return f"{self.base_url}/esf/{city}"
        else:
            return self.base_url

    def send_request(self, url):
        """Send the request with browser-like headers and a random delay."""
        headers = {
            "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/146.0.0.0 Safari/537.36",
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8",
            "Accept-Language": "zh-CN,zh;q=0.9",
            # "br" is left out: requests decodes Brotli only when a brotli
            # package is installed.
            "Accept-Encoding": "gzip, deflate",
            "Connection": "keep-alive",
            "Cache-Control": "no-cache",
            "Upgrade-Insecure-Requests": "1",
        }

        # Random delay to avoid rate-based detection
        sleep_time = random.uniform(2, 5)
        time.sleep(sleep_time)

        # Send the request (simplified example; adjust for each site)
        response = requests.get(url, headers=headers, timeout=10)
        response.raise_for_status()
        return response.text

    def parse_data(self, html_data):
        """Parse the HTML. The real logic depends on each site's markup."""
        properties = []

        # Example parsing logic. The quoting and the lazy ".*?" separators
        # are reconstructed and assume JSON-like data embedded in the page.
        pattern = re.compile(
            r'"price":"([\d.]+)",.*?"avgprice":"([\d.]+)",.*?"areanum":"([\d.]+)",'
            r'.*?"houseage":"([\d年]+)",.*?"orient":"([^"]+)",'
            r'.*?"fitmentname":"([^"]+)",.*?"title":"([^"]+)"',
            re.DOTALL,
        )
        matches = pattern.findall(html_data)

        for match in matches:
            item = PropertyData(
                title=match[6],
                price=match[0],
                area=match[2],
                location="",  # adjust per site
                house_type="",  # adjust per site
                age=match[3],
                orientation=match[4],
                decoration=match[5],
            )
            properties.append(item)

        return properties

    def save_data(self, properties, format="json"):
        """Save the results as JSON or CSV."""
        if format == "json":
            with open(f"{self.website_name}_properties.json", "w", encoding="utf-8") as f:
                json.dump([asdict(prop) for prop in properties], f, ensure_ascii=False, indent=2)
        elif format == "csv":
            with open(f"{self.website_name}_properties.csv", "w", newline="", encoding="utf-8") as f:
                writer = csv.writer(f)
                writer.writerow(["title", "price", "area", "location", "house_type", "age", "orientation", "decoration"])
                for prop in properties:
                    writer.writerow([prop.title, prop.price, prop.area, prop.location, prop.house_type, prop.age, prop.orientation, prop.decoration])


if __name__ == "__main__":
    # Example: crawl Anjuke data
    spider = RealEstateSpider("anjuke")
    properties = spider.crawl(city="南京")
    spider.save_data(properties, format="json")
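
The send_request method in the template makes a single attempt, so one transient network error ends the crawl. One common way to handle that is retrying with exponential backoff and jitter, sketched below. fetch_with_retry is a hypothetical helper that is not part of the skill, and the fetch argument stands in for any function that takes a URL and returns a response body.

```python
import random
import time


def fetch_with_retry(fetch, url, attempts=3, base_delay=2.0,
                     sleep=time.sleep, rng=random):
    """Call fetch(url), retrying on failure.

    After failed attempt i (counting from 0) the wait is
    base_delay * 2**i seconds plus up to one second of random jitter.
    The last error is re-raised once all attempts are used up.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    last_error = None
    for i in range(attempts):
        try:
            return fetch(url)
        except Exception as exc:  # real code should catch the HTTP library's own errors
            last_error = exc
            if i < attempts - 1:
                sleep(base_delay * (2 ** i) + rng.uniform(0, 1))
    raise last_error
```

With the template, a call might look like fetch_with_retry(spider.send_request, url). Note that send_request already sleeps for a random interval before each request, so the two delays add up.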

Browser automation with agent-browser

bash

# Use agent-browser to get past JavaScript-based detection

agent-browser set viewport 1920 1080
agent-browser set headers '{
  "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/146.0.0.0 Safari/537.36",
  "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/avif,image/webp,image/apng,*/*;q=0.8",
  "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8",
  "Accept-Encoding": "gzip, deflate, br",
  "Cache-Control": "no-cache",
  "Connection": "keep-alive",
  "Upgrade-Insecure-Requests": "1"
}'

# Open the real estate site
agent-browser open https://www.anjuke.com
agent-browser wait 3000
agent-browser snapshot -i

# Simulate a human

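Tags

skill ai

Install via chat

This skill can be installed through chat on the following platforms:

OpenClaw WorkBuddy QClaw Kimi Claude

Option 1: Install SkillHub and the skill

Help me install SkillHub and the real-estate-spider-1775987762 skill

Option 2: Set SkillHub as the preferred skill source

Set SkillHub as my preferred skill installation source, then help me install the real-estate-spider-1775987762 skill

Install via command line

skillhub install real-estate-spider-1775987762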

Download

Real Estate Spider v1.0.0 (free)

File size: 31.44 KB | Released: 2026-4-13 11:45

v1.0.0 (latest), 2026-4-13 11:45
Initial release of Real Estate Spider – a universal crawler for major Chinese real estate sites.

- Supports data extraction from Anjuke, Soufun, Beike (ke.com), and Lianjia.
- Includes anti-crawling strategies: browser fingerprint simulation, random delay, session & cookie management, optional proxy IP, and captcha handling.
- Extracts core real estate info: price, area, location, type, decoration, and year built.
- Allows export in JSON, CSV, Excel, and supports visualization.
- Provides usage scripts for both Python and agent-browser automation.
