Daily Literature Search Skill

Automated literature search system for academic researchers. Performs scheduled searches across multiple databases (PubMed, OpenAlex, Semantic Scholar), automatically deduplicates results, downloads open-access papers, and generates daily reports.

🎯 Use Cases

- Daily literature monitoring for specific research topics
- Automated paper collection for literature reviews
- Staying up to date on the latest publications in your field
- Building a personal paper library with automatic categorization

📦 Components

1. Core Search Script (`daily_literature_search.py`)

Main execution script with the following features:

- Multi-source search: PubMed, OpenAlex, Semantic Scholar
- Automatic deduplication: By DOI (within batch + against local library)
- OA detection: Uses Unpaywall API to identify open-access papers
- Auto-download: Downloads OA papers from PubMed Central or publisher sites
- Smart categorization: Classifies papers by topic (configurable keywords)
- Daily reports: Generates Markdown reports with search statistics

2. Upload Analyzer (`analyze_uploaded.py`)

Analyzes and categorizes manually uploaded papers:

- Filename-based classification: Uses keyword matching
- DOI extraction: From filenames and metadata
- Batch processing: Handles multiple files at once
- Report generation: Creates categorization summary
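
The keyword matching and DOI extraction listed above could be sketched roughly as follows. This is a minimal illustration, not the contents of `analyze_uploaded.py`: the keyword lists, the DOI regular expression, and the function names `extract_doi` and `classify_filename` are all assumptions made for the example.

```python
import re

# Hypothetical keyword lists; the real ones live in analyze_uploaded.py.
B_ALL_KEYWORDS = ["b-all", "blinatumomab", "inotuzumab"]
MM_KEYWORDS = ["myeloma", "teclistamab", "elranatamab", "talquetamab"]

# Loose DOI pattern: "10.", a 4-9 digit registrant code, "/", then a suffix.
DOI_PATTERN = re.compile(r"10\.\d{4,9}/[^\s\"<>]+")


def extract_doi(text):
    """Return the first DOI-like string in text, lowercased, or None."""
    match = DOI_PATTERN.search(text)
    if match is None:
        return None
    # Drop trailing punctuation and a ".pdf" extension caught by the pattern.
    doi = match.group(0).rstrip(".,;)")
    if doi.lower().endswith(".pdf"):
        doi = doi[:-4]
    return doi.lower()


def classify_filename(filename):
    """Assign a category by case-insensitive keyword matching."""
    name = filename.lower()
    if any(keyword in name for keyword in B_ALL_KEYWORDS):
        return "B-ALL"
    if any(keyword in name for keyword in MM_KEYWORDS):
        return "MM"
    return "OTHER"
```

Substring matching is simple but order-sensitive: a file that mentions both a B-ALL and an MM keyword lands in the first category checked.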

⚙️ Configuration

Directory Structure

CODEBLOCK0

Search Keywords (Customizable)

Edit SEARCH_KEYWORDS in daily_literature_search.py:

CODEBLOCK1

Classification Keywords

Edit B_ALL_KEYWORDS and MM_KEYWORDS in analyze_uploaded.py to match your research domains.

🚀 Usage

Manual Execution

CODEBLOCK2

Scheduled Execution (Cron)

Add to crontab for automatic daily searches:

CODEBLOCK3

Configuration Options

Parameter	Default	Description
`MAX_RESULTS_PER_KEYWORD`	10	Max results per keyword per source
`DATE_RANGE_DAYS`

📊 Output

Daily Report Example

CODEBLOCK4

File Organization

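- Reports: `papers/daily_search_logs/daily_report_YYYY-MM-DD.md`
- Logs: `papers/daily_search_logs/daily_search_YYYY-MM-DD.log`
- Papers: `papers/{CATEGORY}/raw/{DOI}.pdf`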

🔧 Advanced Features

1. Library Deduplication

Automatically checks new results against existing library:

- Scans all category directories for existing DOIs
- Extracts DOIs from filenames and historical logs
- Skips papers already in library
- Reports duplicate statistics
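
One way the two-stage check (within the batch, then against the library) might look is sketched below. The function names, the `paper` dictionary shape, and the convention that a PDF is stored as the DOI with its first "/" replaced by "_" are assumptions for illustration; the actual script may differ.

```python
from pathlib import Path


def normalize_doi(doi):
    """Lowercase a DOI and strip common URL or 'doi:' prefixes."""
    doi = doi.strip().lower()
    for prefix in ("https://doi.org/", "http://doi.org/", "doi:"):
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
    return doi


def library_dois(root):
    """Collect DOIs from PDF names under <root>/<category>/raw/.

    Assumes files are saved as <DOI>.pdf with the first "/" replaced
    by "_", since "/" cannot appear in a filename.
    """
    found = set()
    for pdf in Path(root).glob("*/raw/*.pdf"):
        found.add(normalize_doi(pdf.stem.replace("_", "/", 1)))
    return found


def deduplicate(papers, known):
    """Return (new_papers, stats) after removing both kinds of duplicate."""
    seen = set()
    fresh = []
    stats = {"batch_duplicates": 0, "already_in_library": 0}
    for paper in papers:
        doi = paper.get("doi")
        if not doi:
            fresh.append(paper)  # no DOI to compare, so keep it
            continue
        key = normalize_doi(doi)
        if key in seen:
            stats["batch_duplicates"] += 1
        elif key in known:
            stats["already_in_library"] += 1
        else:
            seen.add(key)
            fresh.append(paper)
    return fresh, stats
```

Normalizing before comparison matters because the same DOI can arrive as a bare string from one source and as a `https://doi.org/` URL from another.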

2. Open Access Detection

Uses Unpaywall API to identify OA papers:

CODEBLOCK5
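
As a rough sketch of what such a lookup can involve, the example below queries the Unpaywall v2 endpoint for one DOI and reads the `is_oa` and `best_oa_location` fields from the response. The explicit `email` parameter, the helper `parse_unpaywall`, the fallback to the landing-page URL, and the error handling are assumptions of this sketch rather than the script's actual behavior.

```python
import json
import urllib.parse
import urllib.request

UNPAYWALL_URL = "https://api.unpaywall.org/v2/{doi}?email={email}"


def parse_unpaywall(record):
    """Return (is_oa, oa_url) from a decoded Unpaywall response."""
    is_oa = bool(record.get("is_oa"))
    location = record.get("best_oa_location") or {}
    # url_for_pdf can be null even for OA records; fall back to "url",
    # which may point at a landing page rather than a PDF.
    oa_url = location.get("url_for_pdf") or location.get("url")
    return is_oa, oa_url


def check_open_access(doi, email, timeout=15):
    """Query Unpaywall for one DOI; returns (False, None) on any failure."""
    url = UNPAYWALL_URL.format(
        doi=urllib.parse.quote(doi, safe="/"),
        email=urllib.parse.quote(email),
    )
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return parse_unpaywall(json.load(response))
    except (OSError, ValueError):
        # Network errors, HTTP errors, and malformed JSON all count as "not OA".
        return False, None
```

Keeping the parsing separate from the network call makes the field handling easy to check without contacting the API.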

3. PubMed Central Integration

Automatically tries PMC for biomedical papers:

CODEBLOCK6

🛠️ Customization Guide

Change Research Topics

1. Edit SEARCH_KEYWORDS in `daily_literature_search.py`
2. Update category names and keywords
3. Modify directory structure if needed

Add New Categories

1. Create new directory: `papers/NEWCATEGORY/raw/`
2. Add classification keywords in classify_paper() function
3. Update report generation to include new category

Integrate with Notification Systems

Add email/Slack/Discord notifications after search completion:

CODEBLOCK7
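
For Slack specifically, a notification helper might be sketched as below, posting a JSON body with a `text` field to an incoming webhook. The environment variable name `SLACK_WEBHOOK_URL` and the helper `build_slack_request` are hypothetical; email or Discord would need a different transport.

```python
import json
import os
import urllib.request


def build_slack_request(webhook_url, message):
    """Build the POST request for a Slack incoming webhook."""
    body = json.dumps({"text": message}).encode("utf-8")
    return urllib.request.Request(
        webhook_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send_notification(message, timeout=10):
    """Post message to Slack; returns False when unconfigured or on error."""
    webhook_url = os.environ.get("SLACK_WEBHOOK_URL")  # hypothetical variable name
    if not webhook_url:
        return False
    try:
        request = build_slack_request(webhook_url, message)
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status == 200
    except (OSError, ValueError):
        # A failed notification should not abort the search run itself.
        return False
```

Returning a boolean instead of raising keeps a notification outage from turning a successful search into a failed cron job.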

📋 Requirements

Python Dependencies

CODEBLOCK8

API Access (Optional but Recommended)

- Semantic Scholar API Key: Higher rate limits
- OpenAlex API Key: Polite pool access
- Unpaywall: Free, no key needed (email required)

Set environment variables:

CODEBLOCK9
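
On the Python side, these settings could be read along the following lines. The variable names `SEMANTIC_SCHOLAR_API_KEY` and `OPENALEX_API_KEY` are assumed spellings, and `load_api_settings` and `semantic_scholar_headers` are hypothetical helpers; only `USER_EMAIL` is named elsewhere in this document.

```python
import os


def load_api_settings(env=None):
    """Read optional API credentials, using None for anything unset or empty."""
    env = os.environ if env is None else env
    return {
        "semantic_scholar_key": env.get("SEMANTIC_SCHOLAR_API_KEY") or None,
        "openalex_key": env.get("OPENALEX_API_KEY") or None,
        "user_email": env.get("USER_EMAIL") or None,
    }


def semantic_scholar_headers(settings):
    """Send the key in the x-api-key header only when one is configured."""
    key = settings["semantic_scholar_key"]
    return {"x-api-key": key} if key else {}
```

Treating every credential as optional lets the search run unauthenticated, at lower rate limits, when no keys are set.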

⚠️ Important Notes

1. Rate Limits: Respect API rate limits, especially without API keys
2. Storage: Monitor disk space for downloaded PDFs
3. Copyright: Only download open-access or legally available papers
4. Email: Set USER_EMAIL for polite API access
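
One simple way to respect rate limits is to enforce a minimum delay between requests to the same API. The `Throttle` class below is an illustrative sketch, not part of the skill, and a suitable interval depends on each provider's published limits.

```python
import time


class Throttle:
    """Enforce a minimum interval between successive calls."""

    def __init__(self, min_interval, clock=time.monotonic, sleep=time.sleep):
        self.min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time of the previous call, None before the first

    def wait(self):
        """Sleep just long enough to honour min_interval, then record the time."""
        now = self._clock()
        if self._last is not None:
            remaining = self.min_interval - (now - self._last)
            if remaining > 0:
                self._sleep(remaining)
                now = self._clock()
        self._last = now
```

A caller would create one `Throttle` per API and call `wait()` immediately before each request; the injectable `clock` and `sleep` arguments exist only to make the class testable without real delays.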

🔄 Version History

- 1.0.0 (2026-03-18): Initial release
  - Multi-source search (PubMed, OpenAlex, Semantic Scholar)
  - Automatic deduplication (batch + library)
  - OA detection and download
  - Smart categorization
  - Daily reports with statistics

🤝 Contributing

To contribute improvements:

1. Fork the skill repository
2. Test changes with your own literature search
3. Submit pull request with description of improvements

📄 License

This skill is provided as-is for academic research purposes. Users are responsible for compliance with publisher terms and copyright laws.

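Daily Literature Search Skill

Automated literature search system for academic researchers. Performs scheduled searches across multiple databases (PubMed, OpenAlex, Semantic Scholar), automatically deduplicates results, downloads open-access papers, and generates daily reports.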

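🎯 Use Cases

- Daily literature monitoring: for specific research topics
- Automated paper collection: for literature reviews
- Staying up to date: following the latest publications in your field
- Building a personal paper library: with automatic categorization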

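📦 Components

1. Core Search Script (`daily_literature_search.py`)

Main execution script with the following features:

- Multi-source search: PubMed, OpenAlex, Semantic Scholar
- Automatic deduplication: By DOI (within batch + against local library)
- Open-access detection: Uses the Unpaywall API to identify open-access papers
- Auto-download: Downloads open-access papers from PubMed Central or publisher sites
- Smart categorization: Classifies papers by topic (configurable keywords)
- Daily reports: Generates Markdown reports with search statistics

2. Upload Analyzer (`analyze_uploaded.py`)

Analyzes and categorizes manually uploaded papers:

- Filename-based classification: Uses keyword matching
- DOI extraction: From filenames and metadata
- Batch processing: Handles multiple files at once
- Report generation: Creates a categorization summary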

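⚙️ Configuration

Directory Structure

```
papers/
├── B-ALL/raw/            # Category 1 (e.g., B-ALL research)
├── MM/raw/               # Category 2 (e.g., multiple myeloma)
├── OTHER/raw/            # Other papers
├── daily_search_logs/    # Search logs and reports
└── upload_temp/          # Temporary upload directory
```

Search Keywords (Customizable)

Edit SEARCH_KEYWORDS in daily_literature_search.py:

```python
SEARCH_KEYWORDS = [
    "Inotuzumab ozogamicin",
    "Elranatamab",
    "Teclistamab",
    "Talquetamab",
    "Blinatumomab",
    "(CAR-T AND B-ALL)",
]
```

Classification Keywords

Edit B_ALL_KEYWORDS and MM_KEYWORDS in analyze_uploaded.py to match your research domains.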

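🚀 Usage

Manual Execution

```bash
# Run the daily search
python3 papers/daily_literature_search.py

# Analyze uploaded papers
python3 papers/analyze_uploaded.py
```

Scheduled Execution (Cron)

Add to crontab for automatic daily searches:

```bash
# Daily search at 6:30 AM
30 6 * * * /usr/bin/python3 /path/to/papers/daily_literature_search.py >> /path/to/papers/daily_search_logs/cron.log 2>&1
```

Configuration Options

Parameter	Default	Description
MAX_RESULTS_PER_KEYWORD	10	Max results per keyword per source
DATE_RANGE_DAYS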

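📊 Output

Daily Report Example

```markdown
📚 Daily Literature Search Report

Search date: 2026-03-18

📊 Search Summary

| Category | Found | Downloaded | Paywalled |
|----------|-------|------------|-----------|
| B-ALL | 28 | 0 | 28 |
| MM | 24 | 0 | 24 |
| Total | 53 | 0 | 53 |

🔀 Deduplication Statistics

- Raw search results: 130
- After deduplication: 110
- Duplicates within batch: 2
- Already in library: 18
```

File Organization

- Reports: papers/daily_search_logs/daily_report_YYYY-MM-DD.md
- Logs: papers/daily_search_logs/daily_search_YYYY-MM-DD.log
- Papers: papers/{CATEGORY}/raw/{DOI}.pdf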

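🔧 Advanced Features

1. Library Deduplication

Automatically checks new results against the existing library:

- Scans all category directories for existing DOIs
- Extracts DOIs from filenames and historical logs
- Skips papers already in the library
- Reports duplicate statistics

2. Open Access Detection

Uses the Unpaywall API to identify open-access papers:

```python
is_oa, oa_url = check_open_access(doi)
if is_oa:
    download_paper(oa_url, save_path)
```

3. PubMed Central Integration

Automatically tries PMC for biomedical papers:

```python
if pmid and str(pmid).isdigit():
    download_from_pubmed(pmid, save_path)
```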

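🛠️ Customization Guide

Change Research Topics

1. Edit SEARCH_KEYWORDS in daily_literature_search.py
2. Update category names and keywords
3. Modify the directory structure if needed

Add New Categories

1. Create a new directory: papers/NEWCATEGORY/raw/
2. Add classification keywords in the classify_paper() function
3. Update report generation to include the new category

Integrate with Notification Systems

Add email/Slack/Discord notifications after search completion:

```python
# At the end of main()
send_notification(f"Daily search complete: found {results['total']} papers")
```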

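📋 Requirements

Python Dependencies

```bash
pip install requests
# Most other modules are part of the standard library
```

API Access (Optional but Recommended)

- Semantic Scholar API Key: Higher rate limits
- OpenAlex API Key: Polite pool access
- Unpaywall: Free, no key needed (email required)

Set environment variables:

```bash
export SEMANTIC_SCHOLAR_API_KEY="your-key"
export OPENALEX_API_KEY="your-key"
export USER_EMAIL="your@email.com"
```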

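⚠️ Important Notes

1. Rate Limits: Respect API rate limits, especially without API keys
2. Storage: Monitor disk space for downloaded PDFs
3. Copyright: Only download open-access or legally available papers
4. Email: Set USER_EMAIL for polite API access

🔄 Version History

- 1.0.0 (2026-03-18): Initial release
  - Multi-source search (PubMed, OpenAlex, Semantic Scholar)
  - Automatic deduplication (batch + library)
  - Open-access detection and download
  - Smart categorization
  - Daily reports with statistics

🤝 Contributing

To contribute improvements:

1. Fork the skill repository
2. Test changes with your own literature search
3. Submit a pull request describing the improvements

📄 License

This skill is provided as-is for academic research purposes. Users are responsible for compliance with publisher terms and copyright laws.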
