Lowyat Forum Research Tool
End-to-end research pipeline: Search → Scrape → Analyze
Workflow
Step 1: Understand the user's research topic
- Ask clarifying questions if needed (e.g. what specifically they want to learn)
- Break the topic into 3-5 search keyword variations
Step 2: Search for relevant threads
- Use WebSearch with `site:forum.lowyat.net <keywords>` to find threads
- Use `allowed_domains: ["forum.lowyat.net"]` to filter results
- Run multiple searches in parallel with different keyword angles
- Present the most relevant threads to the user with titles and URLs
- Let the user pick which threads to scrape, or recommend the best ones
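The query construction in this step can be sketched in a few lines of Python. This is only an illustration: `build_queries` and `SITE` are hypothetical names, not part of the tool, and the sketch just prefixes each keyword variation with the `site:` operator while dropping blanks and case-insensitive duplicates.

```python
SITE = "forum.lowyat.net"  # the only domain this workflow searches


def build_queries(keyword_variations):
    """Prefix each keyword variation with the site: operator.

    Blank entries and case-insensitive duplicates are skipped.
    (Hypothetical helper, shown for illustration only.)
    """
    seen = set()
    queries = []
    for kw in keyword_variations:
        kw = " ".join(kw.split())  # collapse stray whitespace
        if not kw or kw.lower() in seen:
            continue
        seen.add(kw.lower())
        queries.append(f"site:{SITE} {kw}")
    return queries


if __name__ == "__main__":
    variations = [
        "mechanical keyboard recommendation",
        "cherry mx switch",
        "keychron Malaysia",
    ]
    for query in build_queries(variations):
        print(query)
```

Each resulting string can then be passed to WebSearch as a separate search.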
Step 3: Scrape the selected threads
- The scraper script (`datascraping.py`) should be in the project root
- Install Python dependencies:

    pip install requests beautifulsoup4 html5lib openpyxl tqdm

  Or if you have uv installed:

    uv sync

- Run the scraper for each thread:

    python datascraping.py <thread URL>

- IMPORTANT: Do NOT include `/all` or `/+N` suffixes in the URL; just use the base topic URL (e.g. https://forum.lowyat.net/topic/5411252)
- To scrape multiple threads, run them sequentially (one at a time) to be respectful to the server. Only run up to 3 in parallel if the user explicitly asks for speed, using `&` and `wait`
- Output: `<topic_id>.xlsx` files with columns: Name, Date, Comment
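Since search results often link to a specific page of a thread, it may help to normalize URLs before passing them to the scraper. The sketch below is one possible way to do that and to run the threads one at a time. `base_topic_url` and `scrape_sequentially` are hypothetical helper names, and the only thing assumed about `datascraping.py` is that it takes the thread URL as its single argument, as in the command above.

```python
import re
import subprocess
import sys

# Matches a trailing "/all" or "/+N" pagination suffix, with an optional final "/".
_SUFFIX = re.compile(r"/(?:all|\+\d+)/?$")


def base_topic_url(url):
    """Strip a trailing /all or /+N suffix so only the base topic URL remains."""
    return _SUFFIX.sub("", url.strip()).rstrip("/")


def scrape_sequentially(urls, script="datascraping.py"):
    """Run the scraper on one thread at a time.

    check=True raises CalledProcessError at the first failing run,
    so later threads are not scraped after a failure.
    """
    for url in urls:
        subprocess.run([sys.executable, script, base_topic_url(url)], check=True)


if __name__ == "__main__":
    print(base_topic_url("https://forum.lowyat.net/topic/5411252/+40"))
```

Running threads in a plain loop like this keeps at most one scraper process hitting the server at any time.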
Step 4: Analyze the scraped data
- Read the scraped .xlsx files to understand the forum discussions
- Synthesize findings across all threads into a structured summary
- Organize insights by the user's research questions
- Include: consensus opinions, brand recommendations, price ranges, warnings, and specific user experiences
- Cite which thread/user said what when relevant
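A first pass over the scraped files could look like the sketch below. It is an illustration under assumptions rather than part of the tool: it assumes each workbook has a header row followed by rows in Name, Date, Comment order, and `load_rows`, `mention_counts`, and `quotes_for` are hypothetical helper names. Simple substring counts like these only point to where to read more closely; they do not replace reading the posts.

```python
from collections import Counter


def load_rows(xlsx_path):
    """Read (Name, Date, Comment) rows from one scraped file.

    Assumes row 1 is a header and the first three columns are
    Name, Date, Comment, in that order.
    """
    from openpyxl import load_workbook  # third-party, from the dependency list

    wb = load_workbook(xlsx_path, read_only=True)
    ws = wb.active
    rows = [row[:3] for row in ws.iter_rows(min_row=2, values_only=True)]
    wb.close()
    return rows


def mention_counts(rows, terms):
    """Count how many posts mention each term (case-insensitive substring match)."""
    counts = Counter()
    for _name, _date, comment in rows:
        text = str(comment or "").lower()
        for term in terms:
            if term.lower() in text:
                counts[term] += 1
    return counts


def quotes_for(rows, term, topic_id, limit=3):
    """Return up to `limit` (topic_id, user, comment) tuples mentioning `term`."""
    term = term.lower()
    hits = [
        (topic_id, name, comment)
        for name, _date, comment in rows
        if term in str(comment or "").lower()
    ]
    return hits[:limit]
```

Keeping the topic ID and username next to each quote makes it straightforward to cite which thread and user said what in the final summary.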
Scraper Details
- Forum uses 20 posts per page, paginated via the `/+N` URL suffix
- Scraper auto-detects total pages and crawls all of them
- Random 0.5–2s delay between page requests
- Saves incrementally after each page, so it is safe to interrupt
- If the .xlsx already exists, it resumes by appending
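To make the pagination scheme concrete, here is a minimal sketch of how page URLs and the delay could be derived. It is not taken from `datascraping.py`. It assumes that `N` in `/+N` is a post offset in steps of 20 (so page 2 is `/+20`, page 3 is `/+40`), and `page_urls` and `polite_delay` are hypothetical names.

```python
import random

POSTS_PER_PAGE = 20  # from the scraper details above


def page_urls(base_url, total_pages):
    """List the URLs for every page of a topic.

    Assumption: N in /+N is a post offset, so page k (1-based)
    lives at <base>/+{(k - 1) * 20}; page 1 is the bare base URL.
    """
    base = base_url.rstrip("/")
    urls = [base]
    for page in range(1, total_pages):
        urls.append(f"{base}/+{page * POSTS_PER_PAGE}")
    return urls


def polite_delay():
    """Pick a random pause in seconds within the 0.5-2s range."""
    return random.uniform(0.5, 2.0)


if __name__ == "__main__":
    for url in page_urls("https://forum.lowyat.net/topic/5411252", 3):
        print(url)
```

A crawler built on this would sleep for `polite_delay()` seconds between consecutive page requests.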
Tips for good searches
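- Use brand names: `site:forum.lowyat.net Toto toilet`
- Use Malay keywords too: `site:forum.lowyat.net kipas exhaust tandas`
- Add "recommendation" or "review": `site:forum.lowyat.net water heater recommendation`
- Search by location: `site:forum.lowyat.net bathroom shop KL Selangor`
- Try year filters for recency: `site:forum.lowyat.net smart toilet 2024 2025`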
Example usage
User: "I want to research mechanical keyboards on Lowyat"
1. Search with variations: `mechanical keyboard recommendation`, `cherry mx switch`, `keychron Malaysia`, `custom keyboard`
2. Present top threads to user
3. Scrape selected threads (sequentially by default; up to 3 in parallel only if the user asks for speed)
4. Read the xlsx files and provide analysis: popular brands, price ranges, where to buy, common complaints
Disclaimer
Scraped data contains publicly available usernames, dates, and comments from forum.lowyat.net. This tool is intended for personal research purposes only. Users are responsible for how they store, share, and use the scraped data in compliance with applicable privacy laws and Lowyat forum's terms of service.