Semantic Scholar Search Workflow

Search academic papers via the Semantic Scholar API using a structured 4-phase workflow.

Critical rule: NEVER make multiple sequential Bash calls for API requests. Always write ONE Python script that runs all searches, then execute it once. All rate limiting is handled inside s2.py automatically.

Phase 1: Understand & Plan

Parse the user's intent and choose a search strategy:

Decision Tree

User wants...	Strategy	Function
Broad topic exploration	Relevance search	INLINECODE1
Precise technical terms, exact phrases

Query Construction Rules

- Ambiguous terms (e.g., "stem cells" could mean mesenchymal or stem-like T cells): Use build_bool_query() with exact phrases and exclusions. Example:

  build_bool_query(phrases=["stem-like T cells"], required=["CD4", "TCF7"], excluded=["mesenchymal", "hematopoietic stem cell"])

- Multi-context queries (e.g., "topic X in cancer AND autoimmunity"): Plan separate searches, then deduplicate with INLINECODE15
- Broad topics: Use search_relevance() with filters (year, venue, fieldsOfStudy, minCitationCount)
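
For multi-context queries, merging the separate result lists might look like the sketch below. It is not the code in s2.py: `merge_unique` is a hypothetical name, and the sketch assumes each result is a dict carrying the Semantic Scholar `paperId` field.

```python
def merge_unique(*result_lists):
    """Merge several result lists, keeping the first occurrence of each paperId."""
    seen = set()
    merged = []
    for results in result_lists:
        for paper in results:
            pid = paper.get("paperId")
            if pid is None or pid in seen:
                continue  # skip records without an ID and repeated papers
            seen.add(pid)
            merged.append(paper)
    return merged


# Two made-up result lists that overlap on one paper
cancer = [{"paperId": "a1", "title": "X in cancer"},
          {"paperId": "b2", "title": "X overview"}]
autoimmunity = [{"paperId": "b2", "title": "X overview"},
                {"paperId": "c3", "title": "X in autoimmunity"}]

merged = merge_unique(cancer, autoimmunity)
print([p["paperId"] for p in merged])  # ['a1', 'b2', 'c3']
```

Keeping the first occurrence preserves the ranking of whichever search ran first, which is one reasonable choice when the first query is the more specific one.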

Plan Filters

Filter	Use when
INLINECODE17	Recent work only
INLINECODE18

Checkpoint: Before proceeding, verify: (1) search strategy matches user intent, (2) filters are appropriate, (3) query is specific enough to avoid irrelevant results.

Phase 2: Execute Search

Write ONE Python script. Example:

CODEBLOCK0

Execute with: INLINECODE24

Rules:

- Import everything from s2: INLINECODE25
- Write the script to /tmp/s2_search.py (or a similar temp path)
- Execute it with one Bash call. Never chain multiple API calls via separate Bash invocations.
- Rate limiting, retries, and backoff are automatic inside s2.py

Checkpoint: Verify the script ran successfully (no exceptions) and returned results. If 0 results, broaden the query or relax filters before presenting.
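
One way to handle the zero-results case inside the same single script is to try progressively looser filter sets. The sketch below is illustrative only: `search_with_fallback` and `fake_search` are hypothetical names, and `fake_search` is a stand-in for a real s2.py search function so the example runs without network access. The `year` and `min_citations` keyword names come from the filter list in the API Quick Reference.

```python
def search_with_fallback(search_fn, query, filter_steps):
    """Try each filter set in order and return the first non-empty result."""
    for filters in filter_steps:
        papers = search_fn(query, **filters)
        if papers:
            return papers, filters
    return [], None


def fake_search(query, **filters):
    # Stand-in for a real search call: pretends that a strict
    # citation threshold filters out every paper.
    if filters.get("min_citations", 0) >= 100:
        return []
    return [{"paperId": "demo", "title": query}]


# Strictest filters first, then progressively relaxed
steps = [{"year": "2023-", "min_citations": 100}, {"year": "2020-"}, {}]
papers, used = search_with_fallback(fake_search, "stem-like T cells", steps)
print(len(papers), used)  # 1 {'year': '2020-'}
```

Returning the filter set that succeeded makes it easy to report the search strategy actually used when presenting results.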

Worked Examples

Example 1: Author workflow — "Find papers by Yann LeCun on self-supervised learning"

CODEBLOCK1

Example 2: Citation chain — "Who cited the Transformer paper and what did they build on?"

CODEBLOCK2

Example 3: Multi-seed recommendations with BibTeX export — "Find papers like these two but not about NLP"

CODEBLOCK3

Phase 3: Summarize & Present

- Use format_results() for consistent output (summary table + top-10 details)
- If the user's language is Chinese, present summaries in Chinese
- Always note the total result count and the search strategy used
- Highlight the most relevant papers based on the user's specific question

Phase 4: User Interaction Loop

After presenting results, always offer these options:

1. Translate: titles/summaries to Chinese (or another language)
2. Details: full abstract for specific paper numbers
3. Refine: narrow or expand the search with different terms/filters
4. Similar: find papers similar to a specific result (find_similar())
5. Citations: who cited a specific paper (get_citations())
6. Export: save results via export_bibtex(), export_markdown(), or INLINECODE32
7. Done: end the search session

Loop until user says done. Each follow-up uses the same single-script pattern.

API Quick Reference

Helper Module (`s2.py`)

CODEBLOCK4

Paper Search Functions

Function	Purpose	Max Results
INLINECODE34	Simple broad search	1,000
INLINECODE35

Author Functions

Function	Purpose	Max Results
INLINECODE44	Find researchers by name	1,000
INLINECODE45

Filter Parameters (kwargs)

INLINECODE49, publication_date, venue, fields_of_study, min_citations, pub_types, INLINECODE55

- year: "2020-", "-2019", INLINECODE59
- INLINECODE60: "2024-01-01:2024-06-30" (YYYY-MM-DD range; open-ended ranges are allowed)
- INLINECODE62: Review, JournalArticle, Conference, ClinicalTrial, MetaAnalysis, Dataset, Book, CaseReport, Editorial, LettersAndComments, News, Study, INLINECODE75
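
To make the open-ended year forms concrete, here is a small parser sketch. `parse_year_filter` is a hypothetical helper, not part of s2.py. The closed-range and single-year inputs are included on the assumption that s2.py passes the value through to the Semantic Scholar API, which accepts those forms.

```python
def parse_year_filter(spec):
    """Return (start, end) for a year filter string; None means open-ended."""
    if "-" not in spec:
        year = int(spec)          # single year, e.g. "2021"
        return year, year
    start, end = spec.split("-", 1)
    return (int(start) if start else None,
            int(end) if end else None)


for spec in ["2020-", "-2019", "2016-2020", "2021"]:
    print(spec, parse_year_filter(spec))
# 2020- (2020, None)
# -2019 (None, 2019)
# 2016-2020 (2016, 2020)
# 2021 (2021, 2021)
```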

Boolean Query Syntax (bulk search only)

Syntax	Example	Meaning
INLINECODE76	INLINECODE77	Exact phrase
INLINECODE78	+transformer	Must include
-	-survey	Exclude
|	CNN | RNN	OR
*	neuro*	Prefix wildcard
()	(CNN | RNN) +attention	Grouping

Use build_bool_query(phrases, required, excluded, or_terms) to construct safely.
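
To show how the four arguments could map onto the syntax above, here is a minimal sketch. It is not the implementation in s2.py: `sketch_bool_query` is a hypothetical name, and quoting multi-word required or excluded terms is an assumption about what a safe builder would do. It does not escape double quotes inside terms.

```python
def _term(text):
    # Quote anything containing a space so it is treated as one phrase.
    return f'"{text}"' if " " in text else text


def sketch_bool_query(phrases=(), required=(), excluded=(), or_terms=()):
    """Assemble a bulk-search query string from its parts."""
    parts = [f'"{p}"' for p in phrases]               # exact phrases
    parts += [f"+{_term(t)}" for t in required]       # must include
    parts += [f"-{_term(t)}" for t in excluded]       # exclude
    if or_terms:
        # Group the alternatives so the OR does not leak into other terms
        parts.append("(" + " | ".join(_term(t) for t in or_terms) + ")")
    return " ".join(parts)


q = sketch_bool_query(
    phrases=["stem-like T cells"],
    required=["CD4", "TCF7"],
    excluded=["mesenchymal", "hematopoietic stem cell"],
)
print(q)
# "stem-like T cells" +CD4 +TCF7 -mesenchymal -"hematopoietic stem cell"
```

Building the string from lists avoids hand-typed operator mistakes, such as a stray space after `-` or an ungrouped `|`.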

Output Functions

Function	Purpose
INLINECODE89	Markdown summary table
INLINECODE90

Supported ID Formats

INLINECODE98, ARXIV:2106.15928, PMID:19872477, PMCID:PMC2323569, CorpusId:215416146, ACL:2020.acl-main.447, DBLP:conf/acl/..., MAG:3015453090, INLINECODE106

Paper Fields

Default: INLINECODE107

Additional: abstract, references, citations, openAccessPdf, publicationDate, publicationVenue, fieldsOfStudy, s2FieldsOfStudy, journal, isOpenAccess, referenceCount, influentialCitationCount, citationStyles, embedding, INLINECODE122

Author fields: name, affiliations, paperCount, citationCount, hIndex, homepage, externalIds, INLINECODE130

Rate Limiting

Handled automatically by s2.py: 1.1s gap between requests, exponential backoff (2s→4s→8s→16s→32s, max 60s) on 429/504 errors, up to 5 retries.
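
The schedule described above can be sketched as follows. This is an illustration of the stated numbers, not the code in s2.py: `call_with_retry` and `send` are hypothetical names, and whether the real helper applies the 1.1s gap in addition to the backoff delay is an assumption. The `sleep` argument is injectable so the example runs instantly.

```python
import time

RETRY_STATUSES = {429, 504}


def backoff_delays(retries=5, base=2.0, cap=60.0):
    """Delay before each retry: 2, 4, 8, 16, 32 seconds, never above the cap."""
    return [min(base * 2 ** i, cap) for i in range(retries)]


def call_with_retry(send, sleep=time.sleep, min_gap=1.1):
    """Call send() until it returns a non-retryable status or retries run out.

    send is a hypothetical callable returning (status_code, body).
    """
    delays = backoff_delays()
    for attempt in range(len(delays) + 1):
        sleep(min_gap)                  # fixed spacing before every request
        status, body = send()
        if status not in RETRY_STATUSES:
            return status, body
        if attempt < len(delays):
            sleep(delays[attempt])      # exponential backoff before retrying
    raise RuntimeError(f"still failing after {len(delays)} retries")


# Simulated responses: two retryable errors, then success
responses = iter([(429, None), (504, None), (200, {"total": 3})])
slept = []
result = call_with_retry(lambda: next(responses), sleep=slept.append)
print(result)  # (200, {'total': 3})
print(slept)   # [1.1, 2.0, 1.1, 4.0, 1.1]
```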

Troubleshooting

Error	Cause	Fix
INLINECODE132	Missing or invalid API key	Verify `S2_API_KEY` is set: INLINECODE134
INLINECODE135 after 5 retries

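Semantic Scholar Search Workflow

Search academic papers via the Semantic Scholar API using a structured four-phase workflow.

Critical rule: Never make multiple sequential Bash calls for API requests. Always write one Python script that runs all searches, then execute it once. All rate limiting is handled automatically by s2.py.

Phase 1: Understand & Plan

Parse the user's intent and choose a search strategy:

Decision Tree

User wants...	Strategy	Function
Broad topic exploration	Relevance search	search_relevance()
Precise technical terms, exact phrases

Query Construction Rules

- Ambiguous terms (e.g., "stem cells" could mean mesenchymal stem cells or stem-like T cells): Use build_bool_query() with exact phrases and exclusions
- Example: build_bool_query(phrases=["stem-like T cells"], required=["CD4", "TCF7"], excluded=["mesenchymal", "hematopoietic stem cell"])
- Multi-context queries (e.g., "topic X in cancer and autoimmunity"): Plan separate searches, then deduplicate with deduplicate()
- Broad topics: Use search_relevance() with filters (year, venue, fields of study, minimum citation count)

Plan Filters

Filter	Use when
year="2020-"	Recent work only
publication_date="2024-01-01:2024-06-30"

Checkpoint: Before proceeding, verify: (1) the search strategy matches the user's intent, (2) the filters are appropriate, (3) the query is specific enough to avoid irrelevant results.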

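Phase 2: Execute Search

Write one Python script. Example:

```python
import sys, os

# Locate the installed skill directory; fall back to the current directory
SKILL_DIR = next((p for p in [
    os.path.expanduser("~/.claude/skills/semanticscholar-skill"),
    os.path.expanduser("~/.openclaw/skills/semanticscholar-skill"),
] if os.path.isdir(p)), ".")
sys.path.insert(0, SKILL_DIR)
from s2 import *

# Build a precise query
q = build_bool_query(
    phrases=["stem-like T cells"],
    required=["CD4", "IBD"],
    excluded=["mesenchymal"],
)
papers = search_bulk(q, max_results=30, year="2018-", fields_of_study="Medicine")
papers = deduplicate(papers)

print(format_results(papers, "Stem-like CD4 T cells in IBD"))
```

Execute with: python3 /tmp/s2_search.py

Rules:

- Import everything from s2: from s2 import *
- Write the script to /tmp/s2_search.py (or a similar temp path)
- Execute it with one Bash call. Never chain multiple API calls via separate Bash invocations.
- Rate limiting, retries, and backoff are handled automatically inside s2.py

Checkpoint: Verify the script ran successfully (no exceptions) and returned results. If there are 0 results, broaden the query or relax the filters before presenting.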

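Worked Examples

Each snippet assumes the import preamble from the Phase 2 script (SKILL_DIR setup and from s2 import *).

Example 1 (author workflow): "Find papers by Yann LeCun on self-supervised learning"

```python
authors = search_authors("Yann LeCun", max_results=5)
print(format_authors(authors))

# Use the first match's ID to fetch their papers
author_id = authors[0]["authorId"]
papers = get_author_papers(author_id, max_results=50)

# Filter locally by topic
ssl_papers = [p for p in papers if "self-supervised" in (p.get("title") or "").lower()]
print(format_results(ssl_papers, "Yann LeCun - Self-Supervised Learning"))
```

Example 2 (citation chain): "Who cited the Transformer paper and what did they build on?"

```python
paper = get_paper("DOI:10.48550/arXiv.1706.03762")
print(f"Title: {paper['title']}, Citations: {paper['citationCount']}")

# Get highly cited papers that cite this one
citing = get_citations(paper["paperId"], max_results=50)
citing_papers = [c["citingPaper"] for c in citing if c.get("citingPaper")]
# Treat a missing or null citation count as 0 so the sort cannot fail
citing_papers.sort(key=lambda p: p.get("citationCount") or 0, reverse=True)
print(format_results(citing_papers, "Highly cited papers citing 'Attention Is All You Need'"))
```

Example 3 (multi-seed recommendations with BibTeX export): "Find papers like these two but not about NLP"

```python
recs = recommend(
    positive_ids=["DOI:10.1038/nature14539", "ARXIV:2010.11929"],
    negative_ids=["ARXIV:1706.03762"],
    limit=20,
)
print(format_results(recs, "Vision papers similar to Deep Learning and ViT, excluding NLP"))

# Export BibTeX for the top 10 results
bib_data = batch_papers([r["paperId"] for r in recs[:10]], fields="title,citationStyles")
print(export_bibtex(bib_data))
```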

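Phase 3: Summarize & Present

- Use format_results() for consistent output (summary table + top-10 details)
- If the user's language is Chinese, present summaries in Chinese
- Always note the total result count and the search strategy used
- Highlight the most relevant papers based on the user's specific question

Phase 4: User Interaction Loop

After presenting results, always offer these options:

1. Translate: titles/summaries to Chinese (or another language)
2. Details: full abstract for specific paper numbers
3. Refine: narrow or expand the search with different terms/filters
4. Similar: find papers similar to a specific result (find_similar())
5. Citations: who cited a specific paper (get_citations())
6. Export: save results via export_bibtex(), export_markdown(), or export_json()
7. Done: end the search session

Loop until the user says done. Each follow-up uses the same single-script pattern.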

Helper Module (`s2.py`)