Literature Review
Help write academic literature reviews using a multi-engine search integration (S2, OA, CR, PM).
Capabilities
- Multi-Source Search: Find relevant academic papers using Semantic Scholar (S2), OpenAlex (OA), Crossref (CR), and PubMed (PM).
- Full Abstracts: Sources that provide abstracts now return them in full (PubMed uses efetch for full XML records).
- DOI Extraction: DOIs are extracted from all sources for cross-referencing and deduplication.
- Automatic Deduplication: When searching multiple sources (`--source all` or `--source both`), results are automatically deduplicated by DOI.
- Polite Access: Automatic email identification for the OpenAlex/Crossref "Polite Pool" (via the `USER_EMAIL` env var).
- Abstract Reconstruction: Reconstructs abstracts from the OpenAlex inverted index format.
- Synthesis: Group papers by theme and draft review sections based on metadata.
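As an illustration of the abstract reconstruction step, here is a minimal sketch. It assumes the OpenAlex inverted index is a mapping from each word to the list of positions where that word occurs. The function name `reconstruct_abstract` is hypothetical and is not taken from `scripts/lit_search.py`; the sample index is made up.

```python
def reconstruct_abstract(inverted_index):
    """Rebuild plain text from an inverted index of the form {word: [positions]}."""
    if not inverted_index:
        return ""  # no abstract supplied for this record
    # Flatten to (position, word) pairs, then order them by position.
    positioned = [
        (position, word)
        for word, positions in inverted_index.items()
        for position in positions
    ]
    positioned.sort()
    return " ".join(word for _, word in positioned)


# Made-up index, for illustration only.
sample = {"the": [0, 3], "cat": [1], "chased": [2], "mouse": [4]}
print(reconstruct_abstract(sample))  # the cat chased the mouse
```

A word that occurs several times appears once in the index with several positions, which is why the pairs are flattened before sorting.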
Environment Variables
| Variable | Purpose | Default |
|---|---|---|
| `USER_EMAIL` | Email for polite API access | `anonymous@example.org` |
| `CLAWDBOT_EMAIL` | Fallback if `USER_EMAIL` is not set | none |
| `SEMANTIC_SCHOLAR_API_KEY` | Optional S2 API key for higher rate limits | none |
| `OPENALEX_API_KEY` | Optional OpenAlex API key | none |
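The fallback order in the table can be sketched as follows. This is an assumption about how the lookup might work, not the script's actual code: `resolve_contact_email` and `polite_query` are hypothetical names. The `search` and `mailto` query parameters follow the OpenAlex convention for identifying a caller by email.

```python
import os
from urllib.parse import urlencode

DEFAULT_EMAIL = "anonymous@example.org"  # default listed in the table


def resolve_contact_email(env=None):
    """Return USER_EMAIL, else CLAWDBOT_EMAIL, else the documented default."""
    env = os.environ if env is None else env
    for name in ("USER_EMAIL", "CLAWDBOT_EMAIL"):
        value = (env.get(name) or "").strip()
        if value:
            return value
    return DEFAULT_EMAIL


def polite_query(search_terms, env=None):
    """Build an OpenAlex-style query string that carries the email as `mailto`."""
    return urlencode({"search": search_terms, "mailto": resolve_contact_email(env)})


# Passing a plain dict instead of os.environ keeps the example reproducible.
print(polite_query("liquorice prebiotic", {"USER_EMAIL": "me@example.org"}))
```

Accepting the environment as a parameter makes the lookup easy to test without touching real environment variables.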
Workflows
1. Broad Search (All Bases)
Get a comprehensive overview from all major academic databases. Results are automatically deduplicated by DOI.
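```bash
python3 scripts/lit_search.py search "impact of glycyrrhiza on bifidobacterium" --limit 5 --source all
```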
2. Targeted Search
- OpenAlex (`oa`): Fast and comprehensive, good abstracts.
- Semantic Scholar (`s2`): High-quality citation data and TL;DRs.
- Crossref (`cr`): Precise DOI-based metadata (no abstracts).
- PubMed (`pm`): Gold standard for biomedical research, full abstracts and PMIDs.

```bash
python3 scripts/lit_search.py search "prebiotic effects of liquorice" --source pm
```
3. Comparing Sources
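Search both S2 and OA in one run to reduce the chance of missing relevant papers. Results are deduplicated by default.

```bash
python3 scripts/lit_search.py search "Bifidobacterium infantis growth" --source both
```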
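DOI-based deduplication across sources could look roughly like the sketch below. It assumes each result is a dict with an optional `doi` key, and that the same DOI may arrive as a bare identifier or as a `https://doi.org/` URL. The helper names, the normalization rules, and the sample records are illustrative assumptions, not the script's own logic.

```python
DOI_PREFIXES = (
    "https://doi.org/",
    "http://doi.org/",
    "https://dx.doi.org/",
    "http://dx.doi.org/",
    "doi:",
)


def normalize_doi(doi):
    """Lower-case a DOI and strip a resolver or `doi:` prefix; None if empty."""
    if not doi:
        return None
    doi = doi.strip().lower()
    for prefix in DOI_PREFIXES:
        if doi.startswith(prefix):
            doi = doi[len(prefix):]
            break
    return doi or None


def dedupe_by_doi(records):
    """Keep the first record per DOI; records without a DOI are always kept."""
    seen = set()
    unique = []
    for record in records:
        key = normalize_doi(record.get("doi"))
        if key is not None:
            if key in seen:
                continue  # same paper already returned by an earlier source
            seen.add(key)
        unique.append(record)
    return unique


# Placeholder records, not real search output.
records = [
    {"source": "oa", "doi": "https://doi.org/10.1000/EXAMPLE.1"},
    {"source": "s2", "doi": "10.1000/example.1"},
    {"source": "pm", "doi": None},
]
print([r["source"] for r in dedupe_by_doi(records)])  # ['oa', 's2' dropped] -> ['oa', 'pm']
```

Records without a DOI cannot be matched this way, so the sketch keeps them all rather than risk dropping distinct papers.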
4. Getting Full Details (S2)
Retrieve detailed metadata including TL;DR summaries.
```bash
python3 scripts/lit_search.py details DOI:10.1016/j.foodchem.2023.136000
```
5. Writing the Review
1. Extract: Pull key findings from the abstracts found.
2. Organize: Group findings into a logical structure (e.g., chronological or thematic).
3. Draft: Use the "Think step-by-step" approach to synthesize multiple sources into a coherent narrative.
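The organize step can be approximated with simple keyword matching, as in the sketch below. The theme names, the keywords, and the `group_by_theme` helper are all hypothetical. It assumes each result is a dict with `title`, `abstract`, and `year` keys. Real thematic grouping usually needs human judgment on top of this.

```python
from collections import defaultdict


def group_by_theme(records, themes):
    """Assign each record to every theme whose keywords appear in its text.

    `themes` maps a theme name to a list of keywords. Records matching no
    theme go under "unsorted". Each group is ordered chronologically.
    """
    grouped = defaultdict(list)
    for record in records:
        text = f"{record.get('title') or ''} {record.get('abstract') or ''}".lower()
        matched = [
            name
            for name, keywords in themes.items()
            if any(keyword.lower() in text for keyword in keywords)
        ]
        for name in matched or ["unsorted"]:
            grouped[name].append(record)
    for papers in grouped.values():
        papers.sort(key=lambda r: r.get("year") or 0)
    return dict(grouped)


# Placeholder records and themes, for illustration only.
papers = [
    {"title": "Prebiotic fibre trial", "abstract": None, "year": 2020},
    {"title": "Gut microbiota shifts", "abstract": "Effects of prebiotic intake", "year": 2018},
    {"title": "Unrelated topic", "abstract": "", "year": 2021},
]
groups = group_by_theme(papers, {"prebiotics": ["prebiotic"]})
print({name: [p["year"] for p in items] for name, items in groups.items()})
```

Sorting each group by year gives a chronological thread inside every theme, so both structures named in step 2 come from one pass.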
Output Format
Each result includes:
- `id`: Source-specific identifier (PMID for PubMed, OpenAlex ID, S2 paper ID, DOI for Crossref)
- `doi`: DOI when available (used for deduplication)
- `title`: Paper title
- `year`: Publication year
- `authors`: List of author names
- `abstract`: Full abstract text (when available)
- `venue`: Journal or conference name
- `citationCount`: Citation count (S2, OA)
- `source`: Which database the result came from
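For downstream processing, the fields above could be modelled as a small typed record. The sketch below assumes each result is available as a dict keyed by these field names; how the script actually serializes its output is not stated here. The `Paper` class and `from_result` method are hypothetical.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class Paper:
    """One search result, mirroring the documented output fields."""
    id: str
    source: str
    title: str = ""
    doi: Optional[str] = None
    year: Optional[int] = None
    authors: List[str] = field(default_factory=list)
    abstract: Optional[str] = None
    venue: Optional[str] = None
    citationCount: Optional[int] = None  # documented for S2 and OA only

    @classmethod
    def from_result(cls, result):
        """Build a Paper from a result dict, ignoring any unknown keys."""
        known = {f.name for f in fields(cls)}
        return cls(**{key: value for key, value in result.items() if key in known})


# Placeholder result, not real search output.
paper = Paper.from_result({"id": "12345", "source": "pm", "title": "Placeholder title"})
print(paper.doi, paper.citationCount)  # None None
```

Optional fields default to `None` because several of them are documented as present only when the source supplies them.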
Tips for Success
- Citations: Always cross-reference the DOI or PMID to keep the bibliography accurate.
- Filtering: Focus on papers with a higher `citationCount` or recent publication years for a more up-to-date review.
- PubMed for Medicine: Use `--source pm` for the most reliable biomedical literature.
- Deduplication: Multi-source searches automatically remove duplicates; use single sources if you need raw counts.
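The filtering tip can be turned into a small helper like the sketch below. It assumes results are dicts with `year` and `citationCount` keys. The `shortlist` name, its thresholds, and the sample records are illustrative assumptions. A missing `citationCount` is treated as 0, which matters for sources other than S2 and OA.

```python
def shortlist(records, min_year=None, min_citations=0, top=10):
    """Filter by year and citation count, then rank by citations and recency."""
    kept = [
        record
        for record in records
        if (min_year is None or (record.get("year") or 0) >= min_year)
        and (record.get("citationCount") or 0) >= min_citations
    ]
    # Most cited first; ties are broken by the more recent year.
    kept.sort(
        key=lambda r: (r.get("citationCount") or 0, r.get("year") or 0),
        reverse=True,
    )
    return kept[:top]


# Placeholder records, for illustration only.
papers = [
    {"id": "a", "year": 2015, "citationCount": 120},
    {"id": "b", "year": 2022, "citationCount": 8},
    {"id": "c", "year": 2021, "citationCount": None},
    {"id": "d", "year": 2023, "citationCount": 8},
]
print([p["id"] for p in shortlist(papers, min_year=2020)])  # ['d', 'b', 'c']
```

Setting `min_citations` above 0 would drop every record without a citation count, so leave it at 0 when mixing in Crossref or PubMed results.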
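Literature Review
Help write academic literature reviews using a multi-engine search integration (S2, OA, CR, PM).
Capabilities
- Multi-Source Search: Find relevant academic papers using Semantic Scholar (S2), OpenAlex (OA), Crossref (CR), and PubMed (PM).
- Full Abstracts: Sources that provide abstracts now return them in full (PubMed uses efetch for full XML records).
- DOI Extraction: DOIs are extracted from all sources for cross-referencing and deduplication.
- Automatic Deduplication: When searching multiple sources (`--source all` or `--source both`), results are automatically deduplicated by DOI.
- Polite Access: Automatic email identification for the OpenAlex/Crossref "Polite Pool" via the `USER_EMAIL` environment variable.
- Abstract Reconstruction: Reconstructs abstracts from the OpenAlex inverted index format.
- Synthesis: Group papers by theme and draft review sections based on metadata.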
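Environment Variables
| Variable | Purpose | Default |
|---|---|---|
| `USER_EMAIL` | Email for polite API access | `anonymous@example.org` |
| `CLAWDBOT_EMAIL` | Fallback email if `USER_EMAIL` is not set | none |
| `SEMANTIC_SCHOLAR_API_KEY` | Optional S2 API key for higher rate limits | none |
| `OPENALEX_API_KEY` | Optional OpenAlex API key | none |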
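Workflows
1. Broad Search (All Databases)
Get a comprehensive overview from all major academic databases. Results are automatically deduplicated by DOI.
```bash
python3 scripts/lit_search.py search "impact of glycyrrhiza on bifidobacterium" --limit 5 --source all
```
2. Targeted Search
- OpenAlex (`oa`): Fast and comprehensive, good abstracts.
- Semantic Scholar (`s2`): High-quality citation data and TL;DR summaries.
- Crossref (`cr`): Precise DOI-based metadata (no abstracts).
- PubMed (`pm`): Gold standard for biomedical research, with full abstracts and PMIDs.
```bash
python3 scripts/lit_search.py search "prebiotic effects of liquorice" --source pm
```
3. Comparing Sources
Search both S2 and OA in one run to reduce the chance of missing relevant papers. Results are deduplicated by default.
```bash
python3 scripts/lit_search.py search "Bifidobacterium infantis growth" --source both
```
4. Getting Full Details (S2)
Retrieve detailed metadata including TL;DR summaries.
```bash
python3 scripts/lit_search.py details DOI:10.1016/j.foodchem.2023.136000
```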
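5. Writing the Review
1. Extract: Pull key findings from the abstracts found.
2. Organize: Group findings into a logical structure (e.g., chronological or thematic).
3. Draft: Use a step-by-step thinking approach to synthesize multiple sources into a coherent narrative.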
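Output Format
Each result includes:
- `id`: Source-specific identifier (PMID for PubMed, OpenAlex ID, S2 paper ID, DOI for Crossref)
- `doi`: DOI when available (used for deduplication)
- `title`: Paper title
- `year`: Publication year
- `authors`: List of author names
- `abstract`: Full abstract text (when available)
- `venue`: Journal or conference name
- `citationCount`: Citation count (S2, OA)
- `source`: Which database the result came from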
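Tips for Success
- Citations: Always cross-reference the DOI or PMID to keep the bibliography accurate.
- Filtering: Focus on papers with higher citation counts or recent publication years for a more up-to-date review.
- PubMed for Medicine: Use `--source pm` for the most reliable biomedical literature.
- Deduplication: Multi-source searches automatically remove duplicates; use a single source if you need raw counts.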