Daily Literature Search Skill
Automated literature search system for academic researchers. Performs scheduled searches across multiple databases (PubMed, OpenAlex, Semantic Scholar), automatically deduplicates results, downloads open-access papers, and generates daily reports.
🎯 Use Cases
- Daily literature monitoring for specific research topics
- Automated paper collection for literature reviews
- Staying up to date with the latest publications in your field
- Building a personal paper library with automatic categorization
📦 Components
1. Core Search Script (daily_literature_search.py)
Main execution script with the following features:
- Multi-source search: PubMed, OpenAlex, Semantic Scholar
- Automatic deduplication: By DOI, both within each batch and against the local library
- OA detection: Uses the Unpaywall API to identify open-access papers
- Auto-download: Downloads OA papers from PubMed Central or publisher sites
- Smart categorization: Classifies papers by topic (configurable keywords)
- Daily reports: Generates Markdown reports with search statistics
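The within-batch half of the DOI deduplication can be sketched as follows. This is a minimal illustration, not the script's code: it assumes each search hit is a dict with an optional `doi` key, and `normalize_doi` and `dedupe_batch` are hypothetical names. Normalizing first matters because the same DOI can arrive as a bare string from one source (PubMed) and as a `https://doi.org/` URL from another (OpenAlex).

```python
from __future__ import annotations

from typing import Iterable, Optional

# Prefixes that wrap an otherwise identical DOI (hypothetical list; extend as needed).
_DOI_PREFIXES = ("https://doi.org/", "http://doi.org/",
                 "https://dx.doi.org/", "http://dx.doi.org/", "doi:")


def normalize_doi(doi: Optional[str]) -> Optional[str]:
    """Lower-case a DOI and strip URL/'doi:' prefixes so duplicates compare equal."""
    if not doi:
        return None
    doi = doi.strip().lower()
    for prefix in _DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):].strip()
            break
    return doi or None


def dedupe_batch(records: Iterable[dict]) -> tuple[list[dict], int]:
    """Keep the first record per DOI; records without a DOI are kept unchanged.

    Returns (unique_records, number_of_within_batch_duplicates).
    """
    seen: set[str] = set()
    unique: list[dict] = []
    duplicates = 0
    for record in records:
        doi = normalize_doi(record.get("doi"))
        if doi is None:
            unique.append(record)
        elif doi in seen:
            duplicates += 1
        else:
            seen.add(doi)
            unique.append(record)
    return unique, duplicates
```

Records without a DOI are passed through rather than dropped, since they cannot be compared by DOI at all.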
2. Upload Analyzer (analyze_uploaded.py)
Analyzes and categorizes manually uploaded papers:
- Filename-based classification: Uses keyword matching
- DOI extraction: From filenames and metadata
- Batch processing: Handles multiple files at once
- Report generation: Creates categorization summary
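DOI extraction from filenames and metadata strings might look like the sketch below. It assumes that saved files replace the `/` in a DOI with `_` (a slash cannot appear in a filename), which the skill does not state; `extract_doi` is a hypothetical helper, and the pattern is a Crossref-style approximation rather than a complete DOI grammar.

```python
from __future__ import annotations

import re
from typing import Optional

# "10." + 4-9 digit registrant code, a separator, then the suffix. The separator is "/"
# in metadata and assumed to be "_" in filenames.
_DOI_RE = re.compile(r"(10\.\d{4,9})[/_]([-._;()/:A-Za-z0-9]+)")


def extract_doi(text: str) -> Optional[str]:
    """Return the first DOI found in a filename or metadata string, lower-cased."""
    text = re.sub(r"\.pdf$", "", text.strip(), flags=re.IGNORECASE)  # drop the extension
    match = _DOI_RE.search(text)
    if match is None:
        return None
    prefix, suffix = match.groups()
    # Trailing sentence punctuation is almost never part of a DOI.
    return f"{prefix}/{suffix.rstrip('.,;')}".lower()
```

Only the separator right after the registrant code is turned back into `/`, so an underscore that is genuinely part of the DOI suffix survives.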
⚙️ Configuration
Directory Structure
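```text
papers/
├── B-ALL/raw/          # Category 1 (e.g., B-ALL research)
├── MM/raw/             # Category 2 (e.g., multiple myeloma)
├── OTHER/raw/          # Other papers
├── daily_search_logs/  # Search logs and reports
└── upload_temp/        # Temporary upload directory
```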
Search Keywords (Customizable)
Edit SEARCH_KEYWORDS in daily_literature_search.py:
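```python
SEARCH_KEYWORDS = [
    "Inotuzumab ozogamicin",
    "Elranatamab",
    "Teclistamab",
    "Talquetamab",
    "Blinatumomab",
    "(CAR-T AND B-ALL)",
]
```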
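Assuming PubMed is queried through NCBI E-utilities (the skill does not say which endpoint it uses), each keyword could become one `esearch` request restricted to the recent search window. `build_pubmed_query_url` is a hypothetical name; `db`, `term`, `retmax`, `reldate`, `datetype`, `retmode`, and `email` are documented E-utilities parameters.

```python
from urllib.parse import urlencode

ESEARCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/esearch.fcgi"


def build_pubmed_query_url(keyword: str, max_results: int = 10, days: int = 7,
                           email: str = "") -> str:
    """Build an esearch URL for one keyword, limited to items from the last `days` days."""
    params = {
        "db": "pubmed",
        "term": keyword,          # boolean queries such as "(CAR-T AND B-ALL)" pass through
        "retmax": max_results,    # cap on returned PMIDs
        "reldate": days,          # relative window, counted back from today
        "datetype": "pdat",       # apply the window to the publication date
        "retmode": "json",
    }
    if email:
        params["email"] = email   # NCBI asks client tools to identify themselves
    return f"{ESEARCH_URL}?{urlencode(params)}"
```

The matching PMIDs are returned in the `esearchresult.idlist` field of the JSON response.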
Classification Keywords
Edit B_ALL_KEYWORDS and MM_KEYWORDS in analyze_uploaded.py to match your research domains.
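A minimal sketch of keyword-based classification, assuming a simple "most keyword hits wins" rule. The keyword contents below are illustrative only (the real lists live in `analyze_uploaded.py`), and `classify_by_keywords` is a hypothetical helper; the category names match the directory structure above.

```python
from __future__ import annotations

# Illustrative contents only; replace with the lists from analyze_uploaded.py.
B_ALL_KEYWORDS = ["b-all", "acute lymphoblastic", "inotuzumab", "blinatumomab"]
MM_KEYWORDS = ["myeloma", "elranatamab", "teclistamab", "talquetamab"]

CATEGORY_KEYWORDS = {"B-ALL": B_ALL_KEYWORDS, "MM": MM_KEYWORDS}


def classify_by_keywords(text: str, default: str = "OTHER") -> str:
    """Return the category whose keywords occur most often in `text` (case-insensitive)."""
    lowered = text.lower()
    scores = {
        category: sum(lowered.count(keyword) for keyword in keywords)
        for category, keywords in CATEGORY_KEYWORDS.items()
    }
    best = max(scores, key=scores.get)  # ties go to the first category listed
    return best if scores[best] > 0 else default
```

Matching on substrings means a filename such as `Teclistamab_in_multiple_myeloma.pdf` still scores for MM even though words are joined by underscores.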
🚀 Usage
Manual Execution
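```bash
# Run the daily search
python3 papers/daily_literature_search.py

# Analyze uploaded papers
python3 papers/analyze_uploaded.py
```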
Scheduled Execution (Cron)
Add to crontab for automatic daily searches:
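```bash
# Search every day at 6:30 AM
30 6 * * * /usr/bin/python3 /path/to/papers/daily_literature_search.py >> /path/to/papers/daily_search_logs/cron.log 2>&1
```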
Configuration Options
| Parameter | Default | Description |
|---|---|---|
| `MAX_RESULTS_PER_KEYWORD` | 10 | Max results per keyword per source |
| `DATE_RANGE_DAYS` | 7 | Search window (most recent N days) |
| `SOURCES` | `["pm", "oa", "s2"]` | Databases to search (PubMed, OpenAlex, Semantic Scholar) |
| `USER_EMAIL` | (none) | Contact email for polite API access (environment variable) |
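One way to hold these settings in code is sketched below. `SearchConfig` and `from_env` are hypothetical names (the script may simply use module-level constants); the defaults mirror the table, and only `USER_EMAIL` is read from the environment because that is the only parameter documented as an environment variable.

```python
from __future__ import annotations

import os
from dataclasses import dataclass, field


@dataclass
class SearchConfig:
    """Search settings with the documented default values."""
    max_results_per_keyword: int = 10
    date_range_days: int = 7
    sources: list[str] = field(default_factory=lambda: ["pm", "oa", "s2"])
    user_email: str = ""

    @classmethod
    def from_env(cls) -> "SearchConfig":
        # Only USER_EMAIL comes from the environment; everything else keeps its default.
        return cls(user_email=os.environ.get("USER_EMAIL", ""))
```

`default_factory` gives every instance its own `sources` list, so editing one config never mutates another.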
📊 Output
Daily Report Example
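```markdown
📚 Daily Literature Search Report
Search date: 2026-03-18
📊 Search Summary
| … | 24 | 0 | 24 |
| Total | 53 | 0 | 53 |
🔀 Deduplication Statistics
- Raw search results: 130
- After deduplication: 110
- Within-batch duplicates: 2
- Already in library: 18
```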
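The deduplication block of the report could be rendered as in this sketch. It assumes "After deduplication" is the raw count minus both kinds of duplicates, which matches the example (130 - 2 - 18 = 110) but is not stated by the skill; `render_dedup_section` is a hypothetical helper.

```python
def render_dedup_section(raw: int, batch_dupes: int, in_library: int) -> str:
    """Render the deduplication statistics as Markdown lines for the daily report."""
    unique = raw - batch_dupes - in_library  # assumed definition of the deduplicated count
    lines = [
        "🔀 Deduplication Statistics",
        f"- Raw search results: {raw}",
        f"- After deduplication: {unique}",
        f"- Within-batch duplicates: {batch_dupes}",
        f"- Already in library: {in_library}",
    ]
    return "\n".join(lines)
```

Calling `render_dedup_section(130, 2, 18)` reproduces the numbers in the example report.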
File Organization
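- Reports: `papers/daily_search_logs/daily_report_YYYY-MM-DD.md`
- Logs: `papers/daily_search_logs/daily_search_YYYY-MM-DD.log`
- Papers: `papers/{CATEGORY}/raw/{DOI}.pdf`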
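These paths can be computed with `pathlib`, as in the sketch below. The helper names are hypothetical, and writing a DOI's `/` as `_` in the PDF filename is an assumption, since a raw DOI cannot be used as a filename directly.

```python
from __future__ import annotations

from datetime import date
from pathlib import Path


def report_path(root: Path, day: date) -> Path:
    """papers/daily_search_logs/daily_report_YYYY-MM-DD.md"""
    return root / "daily_search_logs" / f"daily_report_{day.isoformat()}.md"


def log_path(root: Path, day: date) -> Path:
    """papers/daily_search_logs/daily_search_YYYY-MM-DD.log"""
    return root / "daily_search_logs" / f"daily_search_{day.isoformat()}.log"


def paper_path(root: Path, category: str, doi: str) -> Path:
    """papers/{CATEGORY}/raw/{DOI}.pdf, with "/" replaced by "_" (assumed convention)."""
    safe_doi = doi.replace("/", "_")
    return root / category / "raw" / f"{safe_doi}.pdf"
```

`date.isoformat()` already yields the `YYYY-MM-DD` form used in the filenames.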
🔧 Advanced Features
1. Library Deduplication
Automatically checks new results against existing library:
- Scans all category directories for existing DOIs
- Extracts DOIs from filenames and historical logs
- Skips papers already in library
- Reports duplicate statistics
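The library check can be sketched as below. It assumes PDFs are saved as `<CATEGORY>/raw/<DOI>.pdf` with `/` written as `_`, and that historical logs sit in `daily_search_logs/*.log`; `existing_dois` and `drop_known` are hypothetical names, and the log scan simply collects anything DOI-shaped.

```python
from __future__ import annotations

import re
from pathlib import Path

_DOI_RE = re.compile(r"10\.\d{4,9}/[-._;()/:A-Za-z0-9]+")


def existing_dois(papers_root: Path) -> set[str]:
    """Collect DOIs already in the library from PDF filenames and historical logs."""
    found: set[str] = set()
    # 1) Filenames: the first "_" is assumed to stand in for the DOI's "/".
    for pdf in papers_root.glob("*/raw/*.pdf"):
        prefix, sep, suffix = pdf.stem.partition("_")
        if prefix.startswith("10.") and sep and suffix:
            found.add(f"{prefix}/{suffix}".lower())
    # 2) Logs: any DOI-shaped string, minus trailing sentence punctuation.
    log_dir = papers_root / "daily_search_logs"
    if log_dir.is_dir():
        for log in log_dir.glob("*.log"):
            text = log.read_text(encoding="utf-8", errors="ignore")
            found.update(m.rstrip(".,;").lower() for m in _DOI_RE.findall(text))
    return found


def drop_known(records: list[dict], library: set[str]) -> tuple[list[dict], int]:
    """Skip records whose DOI is already in the library; return (kept, skipped_count)."""
    kept = [r for r in records if (r.get("doi") or "").lower() not in library]
    return kept, len(records) - len(kept)
```

The skipped count from `drop_known` is the "Already in library" figure the report needs.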
2. Open Access Detection
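Uses the Unpaywall API to identify open-access (OA) papers:
```python
is_oa, oa_url = check_open_access(doi)
if is_oa:
    download_paper(oa_url, save_path)
```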
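A possible shape for the Unpaywall lookup is sketched below, using only the standard library. It relies on the Unpaywall v2 endpoint (`/v2/{doi}?email=...`) and its `is_oa` and `best_oa_location` fields; unlike the `check_open_access(doi)` call above, this hypothetical version takes the email explicitly, and the parsing is split out so it can be tested without network access.

```python
from __future__ import annotations

import json
from typing import Optional
from urllib.parse import quote
from urllib.request import urlopen

UNPAYWALL_URL = "https://api.unpaywall.org/v2/{doi}?email={email}"


def parse_unpaywall(record: dict) -> tuple[bool, Optional[str]]:
    """Return (is_oa, best PDF URL, falling back to the landing-page URL)."""
    if not record.get("is_oa"):
        return False, None
    location = record.get("best_oa_location") or {}
    return True, location.get("url_for_pdf") or location.get("url")


def check_open_access(doi: str, email: str, timeout: float = 30.0) -> tuple[bool, Optional[str]]:
    """Query Unpaywall for one DOI; the API requires an email parameter."""
    url = UNPAYWALL_URL.format(doi=quote(doi), email=quote(email))
    with urlopen(url, timeout=timeout) as response:  # raises HTTPError for unknown DOIs
        return parse_unpaywall(json.load(response))
```

Falling back to `url` matters because some OA locations expose only a landing page, with no direct PDF link.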
3. PubMed Central Integration
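For biomedical papers with a numeric PubMed ID (PMID), automatically tries PubMed Central (PMC):
```python
if pmid and str(pmid).isdigit():
    download_from_pubmed(pmid, save_path)
```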
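One plausible first step inside a helper like `download_from_pubmed` is mapping the PMID to a PMC ID, sketched here. The endpoint URL is an assumption (NCBI has been moving PMC services, so check the current PMC ID Converter documentation), while the JSON shape (a `records` list whose entries carry `pmcid` when the article is in PMC) follows that API. `idconv_url` and `pmcid_from_idconv` are hypothetical names.

```python
from __future__ import annotations

from typing import Optional
from urllib.parse import urlencode

# Assumed PMC ID Converter endpoint; verify against NCBI's current docs before use.
IDCONV_URL = "https://www.ncbi.nlm.nih.gov/pmc/utils/idconv/v1.0/"


def idconv_url(pmid: str, email: str, tool: str = "daily_literature_search") -> str:
    """Build an ID Converter request for one numeric PMID (same guard as above)."""
    if not str(pmid).isdigit():
        raise ValueError(f"not a numeric PMID: {pmid!r}")
    params = {"ids": pmid, "format": "json", "tool": tool, "email": email}
    return f"{IDCONV_URL}?{urlencode(params)}"


def pmcid_from_idconv(payload: dict) -> Optional[str]:
    """Return the PMCID from a converter response, or None if the article is not in PMC."""
    for record in payload.get("records", []):
        if record.get("pmcid"):
            return record["pmcid"]
    return None
```

A `None` result means the paper has no PMC copy, so the Unpaywall route is the remaining option.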
🛠️ Customization Guide
Change Research Topics
1. Edit `SEARCH_KEYWORDS` in `daily_literature_search.py`
2. Update category names and keywords
3. Modify the directory structure if needed
Add New Categories
1. Create a new directory: `papers/NEW_CATEGORY/raw/`
2. Add classification keywords in the `classify_paper()` function
3. Update report generation to include the new category
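Step 1 can be automated so that the keyword mapping is the single place a category is declared. This is a sketch under that assumption; `ensure_category_dirs` is a hypothetical helper, and `NEW_CATEGORY` stands in for your own category name.

```python
from __future__ import annotations

from pathlib import Path


def ensure_category_dirs(papers_root: Path, categories: list[str]) -> list[Path]:
    """Create papers/<CATEGORY>/raw/ for every category; existing folders are left alone."""
    raw_dirs: list[Path] = []
    for category in categories:
        raw_dir = papers_root / category / "raw"
        raw_dir.mkdir(parents=True, exist_ok=True)  # safe to call on every run
        raw_dirs.append(raw_dir)
    return raw_dirs
```

For example, `ensure_category_dirs(Path("papers"), ["B-ALL", "MM", "NEW_CATEGORY", "OTHER"])` at startup keeps the folders in step with the categories the classifier knows about.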
Integrate with Notification Systems
Add email/Slack/Discord notifications after search completion:
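```python
# At the end of main()
send_notification(f"Daily search complete: found {results['total']} papers")
```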
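For Slack, `send_notification` could post to an incoming webhook, which accepts a JSON body with a `text` field. The sketch below is hypothetical: it adds a `webhook_url` argument that the one-argument call above does not have (you might read it from an environment variable of your choosing), and it splits out request building so it can be checked offline.

```python
from __future__ import annotations

import json
from urllib.request import Request, urlopen


def build_slack_request(webhook_url: str, message: str) -> Request:
    """Build a POST request carrying {"text": message} for a Slack incoming webhook."""
    body = json.dumps({"text": message}).encode("utf-8")
    return Request(webhook_url, data=body,
                   headers={"Content-Type": "application/json"}, method="POST")


def send_notification(message: str, webhook_url: str, timeout: float = 10.0) -> int:
    """Send the message and return the HTTP status code."""
    with urlopen(build_slack_request(webhook_url, message), timeout=timeout) as resp:
        return resp.status
```

Email or Discord would follow the same pattern with a different payload and transport.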
📋 Requirements
Python Dependencies
```bash
pip install requests
# Most other modules are part of the Python standard library
```
API Access (Optional but Recommended)
- Semantic Scholar API key: Higher rate limits
- OpenAlex API key: Polite pool access
- Unpaywall: Free, no key needed (an email address is required)
Set environment variables:
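```bash
export SEMANTIC_SCHOLAR_API_KEY=your-key
export OPENALEX_API_KEY=your-key
export USER_EMAIL=your@email.com
```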
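How these variables might be turned into request credentials is sketched below. Semantic Scholar reads its key from the `x-api-key` header, and OpenAlex uses a `mailto` parameter for its polite pool; the `api_key` parameter name for OpenAlex is an assumption to confirm against its documentation. Both helper names are hypothetical.

```python
from __future__ import annotations

import os


def semantic_scholar_headers() -> dict[str, str]:
    """Headers for Semantic Scholar requests; empty when no key is configured."""
    key = os.environ.get("SEMANTIC_SCHOLAR_API_KEY")
    return {"x-api-key": key} if key else {}


def openalex_params() -> dict[str, str]:
    """Query parameters for OpenAlex requests."""
    params: dict[str, str] = {}
    email = os.environ.get("USER_EMAIL")
    if email:
        params["mailto"] = email      # identifies the caller for the polite pool
    key = os.environ.get("OPENALEX_API_KEY")
    if key:
        params["api_key"] = key       # assumed parameter name; check the OpenAlex docs
    return params
```

Returning empty dicts when variables are unset keeps the unauthenticated path working.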
⚠️ Important Notes
1. Rate limits: Respect API rate limits, especially when running without API keys
2. Storage: Monitor the disk space used by downloaded PDFs
3. Copyright: Only download open-access or otherwise legally available papers
4. Email: Set `USER_EMAIL` for polite API access
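A simple way to respect per-source rate limits is to enforce a minimum interval between calls, as in this sketch. `MinIntervalLimiter` is a hypothetical class, and the right interval depends on each API's published limits (NCBI, for example, allows 3 requests per second without an API key, so an interval of 1/3 s). The clock and sleep functions are injectable so the behaviour can be tested without waiting.

```python
from __future__ import annotations

import time
from typing import Callable, Optional


class MinIntervalLimiter:
    """Block in wait() so that successive calls are at least `interval` seconds apart."""

    def __init__(self, interval: float,
                 clock: Callable[[], float] = time.monotonic,
                 sleep: Callable[[float], None] = time.sleep) -> None:
        self.interval = interval
        self._clock = clock
        self._sleep = sleep
        self._last: Optional[float] = None

    def wait(self) -> None:
        now = self._clock()
        if self._last is not None:
            remaining = self.interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now += remaining  # the next call is measured from when this one proceeds
        self._last = now
```

Using one limiter per source lets a slow API be throttled without slowing the others; call `limiter.wait()` right before each request.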
🔄 Version History
- 1.0.0 (2026-03-18): Initial release
  - Multi-source search (PubMed, OpenAlex, Semantic Scholar)
  - Automatic deduplication (batch + library)
  - OA detection and download
  - Smart categorization
  - Daily reports with statistics
🤝 Contributing
To contribute improvements:
1. Fork the skill repository
2. Test your changes with your own literature search
3. Submit a pull request describing the improvements
📄 License
This skill is provided as-is for academic research purposes. Users are responsible for compliance with publisher terms and copyright laws.