Asta MCP — Academic Paper Search
Asta is Ai2's Scientific Corpus Tool, exposing the Semantic Scholar academic graph over MCP (streamable HTTP transport). This skill tells agents which Asta tool to call for which intent, and how to compose them into useful workflows.
- MCP endpoint: `https://asta-tools.allen.ai/mcp/v1`
- Auth: `x-api-key` header (request a key at https://share.hsforms.com/1L4hUh20oT3mu8iXJQMV77w3ioxm)
- Transport: streamable HTTP
Prerequisite Check
Before invoking any tool, verify the Asta MCP server is registered in the host agent. Tool names will be prefixed by the MCP server name chosen at install time (commonly asta__<tool> or mcp__asta__<tool>). If no Asta tools are visible, direct the user to the Installation section below.
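The prefix check above can be sketched as a small lookup over the tool names the host exposes. This is only an illustration: `resolve_tool` and the `visible_tools` list are hypothetical names, and how a host enumerates its tools depends on the agent.

```python
def resolve_tool(visible_tools, base_name):
    """Return the visible tool named `base_name` or ending in `__<base_name>`."""
    for name in visible_tools:
        if name == base_name or name.endswith("__" + base_name):
            return name
    # Nothing matched: the Asta MCP server is probably not registered.
    return None
```

A `None` result is the cue to send the user to the Installation section rather than attempting a call.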
Tool Map — Intent → Asta Tool
| User intent | Asta tool | Notes |
|---|---|---|
| Broad topic search | `search_papers_by_relevance` | Supports venue + date filters |
| Known paper title | `search_paper_by_title` | Optional venue restriction |
| Known DOI / arXiv / PMID / CorpusId / MAG / ACL / SHA / URL | `get_paper` | Single-paper lookup |
| Multiple known IDs at once | `get_paper_batch` | Batch lookup; prefer over N sequential `get_paper` calls |
| Who cited paper X | `get_citations` | Citation traversal with filters, paginated |
| Find author by name | `search_authors_by_name` | Returns profile info |
| An author's publications | `get_author_papers` | Pass author id from previous call |
| Find passages mentioning X | `snippet_search` | ~500-word excerpts from paper bodies |
All tools accept date-range filters and field selection — pass them whenever the user's intent constrains scope (e.g., "recent", "since 2022", "at NeurIPS").
⚠️ fields parameter — avoid context blowups
`get_paper` / `get_paper_batch` accept a `fields` string. Never request `citations` or `references` via `fields`: a single highly cited paper (e.g. Attention Is All You Need) returns 200k+ characters and will overflow the agent's context window. Use the dedicated `get_citations` tool instead (it paginates).
Safe default `fields` for `get_paper`:
`title,year,authors,venue,tldr,url,abstract`
Add `journal`, `publicationDate`, `fieldsOfStudy`, `isOpenAccess` only when needed.
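As a minimal sketch of the rule above, a host-side helper could assemble the `fields` string and refuse the two expensive fields before any call is made. The helper name `build_fields` and both constants are hypothetical; they are not part of Asta.

```python
# Hypothetical guard for the `fields` argument of get_paper / get_paper_batch.
SAFE_DEFAULT_FIELDS = ("title", "year", "authors", "venue", "tldr", "url", "abstract")
FORBIDDEN_FIELDS = {"citations", "references"}  # use get_citations instead


def build_fields(extra=()):
    """Return a comma-separated fields string, rejecting forbidden fields."""
    requested = list(SAFE_DEFAULT_FIELDS)
    for name in extra:
        # Also reject dotted forms such as "citations.title" (an assumed syntax).
        if name.split(".")[0] in FORBIDDEN_FIELDS:
            raise ValueError(f"do not request {name!r} via fields; use get_citations")
        if name not in requested:
            requested.append(name)
    return ",".join(requested)
```

For example, `build_fields(["journal"])` appends `journal` to the safe defaults, while `build_fields(["citations"])` raises instead of letting the oversized request through.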
Workflow Patterns
Pattern 1 — Topic Discovery
1. `search_papers_by_relevance(query, year="2022-", venue=?)` → initial hits
2. Rank/present top N by `citationCount` + recency
3. Offer follow-ups: `get_citations` on the most influential, or `snippet_search` for specific claims
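The ranking step can be sketched as a plain sort over the returned records. This assumes each hit is a dict with `citationCount` and `year` keys, either of which may be null; using the year only as a tie-break is one possible reading of "citationCount + recency", not Asta's own ranking.

```python
def rank_hits(hits, top_n=10):
    """Sort by citation count (descending), breaking ties by newer year."""
    def sort_key(paper):
        citations = paper.get("citationCount") or 0  # treat null as 0
        year = paper.get("year") or 0
        return (citations, year)

    return sorted(hits, key=sort_key, reverse=True)[:top_n]
```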
Pattern 2 — Seed-Paper Expansion
1. `get_paper(DOI|arXiv|...)` → verify seed
2. `get_citations(paperId)` → forward expansion
3. Optionally `search_papers_by_relevance` with seed title terms for sideways discovery
4. Deduplicate by `paperId` before presenting
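The deduplication step might look like the following sketch, which merges the citation results and the sideways-search results while keeping the first record seen for each `paperId`. It assumes results arrive as lists of dicts; `dedupe_by_paper_id` is a hypothetical helper name.

```python
def dedupe_by_paper_id(*result_lists):
    """Merge result lists, keeping the first record seen for each paperId."""
    seen = set()
    merged = []
    for results in result_lists:
        for paper in results:
            paper_id = paper.get("paperId")
            if paper_id is not None:
                if paper_id in seen:
                    continue  # already collected from an earlier list
                seen.add(paper_id)
            # Records without an id cannot be compared, so they are kept as-is.
            merged.append(paper)
    return merged
```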
Pattern 3 — Author Deep-Dive
1. `search_authors_by_name(name)` → pick correct profile (disambiguate by affiliation)
2. `get_author_papers(authorId)` → full publication list
3. Filter client-side by topic keywords or date
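The client-side filter in the last step could be sketched as below. It assumes each paper is a dict with `title` and `year` keys and matches keywords against the title only; both the helper name and the matching rule are illustrative choices.

```python
def filter_papers(papers, keywords=(), since_year=None):
    """Keep papers whose title contains any keyword and whose year is recent enough."""
    lowered = [k.lower() for k in keywords]
    kept = []
    for paper in papers:
        title = (paper.get("title") or "").lower()
        year = paper.get("year")
        if lowered and not any(k in title for k in lowered):
            continue  # no keyword appears in the title
        if since_year is not None and (year is None or year < since_year):
            continue  # unknown or too-old publication year
        kept.append(paper)
    return kept
```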
Pattern 4 — Evidence Retrieval
1. `snippet_search(claim_query)` → find passages making/supporting a claim
2. For each hit, optionally `get_paper(id)` for full metadata
Output & Interaction Rules
- Always report total count and which tool was used.
- Present top 10 as a table (title, year, venue, citations), then details for the most relevant.
- If the user writes in Chinese, present summaries in Chinese; keep titles in original language.
- After results, offer: Details / Refine / Citations / Snippet / Export / Done.
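A sketch of the first two rules: one function that emits the count line, the tool name, and a top-10 table. The record keys (`title`, `year`, `venue`, `citationCount`) and the `n/a` marker for missing values are assumptions made for illustration.

```python
def render_results_table(papers, total, tool_name, limit=10):
    """Render a count line plus a Markdown table of the top results."""
    def cell(value):
        # Show missing values explicitly instead of inventing them.
        return "n/a" if value in (None, "") else str(value).replace("|", "\\|")

    lines = [
        f"Total: {total} (tool: {tool_name})",
        "",
        "| Title | Year | Venue | Citations |",
        "|---|---|---|---|",
    ]
    for paper in papers[:limit]:
        lines.append(
            f"| {cell(paper.get('title'))} | {cell(paper.get('year'))} "
            f"| {cell(paper.get('venue'))} | {cell(paper.get('citationCount'))} |"
        )
    return "\n".join(lines)
```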
Critical Rules
- Prefer batched intent over ping-pong. If the user's question needs two independent lookups, issue them as parallel MCP tool calls in one turn, not sequentially.
- Never guess IDs. If a user gives a fuzzy title, use `search_paper_by_title` before `get_paper`.
- Respect rate limits. An API key raises the limits but does not remove them, so do not expand citation graphs beyond what the user asked for.
- Do not fabricate fields. If Asta returns a null `abstract` or `venue`, say so rather than inventing one.
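For hosts that drive tools from code rather than from the model's own parallel tool calls, the first rule can be sketched with `asyncio.gather`. Here `call_tool` is a hypothetical async callable taking a tool name and an arguments dict; the real client interface depends on the MCP library the host uses.

```python
import asyncio


async def run_independent_lookups(call_tool, requests):
    """Issue independent tool calls concurrently and return results in order.

    `call_tool` is a hypothetical async callable (tool_name, arguments) -> result
    supplied by whatever MCP client the host uses.
    """
    tasks = [call_tool(name, arguments) for name, arguments in requests]
    return await asyncio.gather(*tasks)
```

Results come back in the same order as `requests`, so the caller can pair each answer with the lookup that produced it.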
Relationship to semanticscholar-skill
Both wrap the Semantic Scholar corpus, but target different runtimes:
| | `semanticscholar-skill` | `asta-skill` |
|---|---|---|
| Transport | Python + direct REST (`s2.py`) | MCP (streamable HTTP) |
| Host needs | `S2_API_KEY` + Python | Asta MCP registered in host |
| Best for | Scripted batch workflows, custom filters | Zero-code agent integration (Claude Code, Codex, Cursor, Windsurf, OpenClaw) |
| Auth | `S2_API_KEY` | `ASTA_API_KEY` via `x-api-key` header |
Use asta-skill when the host agent supports MCP; fall back to semanticscholar-skill for scripted/pipeline work.
Installation
Set ASTA_API_KEY in your shell first:
```bash
export ASTA_API_KEY="..."  # request at https://share.hsforms.com/1L4hUh20oT3mu8iXJQMV77w3ioxm
```
Claude Code
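```bash
# Name and URL are positional arguments; the header is quoted so it stays one argument.
claude mcp add --transport http asta \
  https://asta-tools.allen.ai/mcp/v1 \
  --header "x-api-key: $ASTA_API_KEY"
```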
Or edit ~/.claude.json / .mcp.json:
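```json
{
  "mcpServers": {
    "asta": {
      "type": "http",
      "url": "https://asta-tools.allen.ai/mcp/v1",
      "headers": { "x-api-key": "${ASTA_API_KEY}" }
    }
  }
}
```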
Codex CLI
Edit ~/.codex/config.toml:
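```toml
[mcp_servers.asta]
type = "http"
url = "https://asta-tools.allen.ai/mcp/v1"
headers = { x-api-key = "${ASTA_API_KEY}" }
```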
Windsurf / Cursor / Hermes / other MCP clients
Add to the client's MCP server config file:
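```json
{
  "mcpServers": {
    "asta": {
      "serverUrl": "https://asta-tools.allen.ai/mcp/v1",
      "headers": { "x-api-key": "<ASTA_API_KEY>" }
    }
  }
}
```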
LM Studio
LM Studio 0.3.17+ supports remote MCP servers. Edit ~/.lmstudio/mcp.json (macOS/Linux) or %USERPROFILE%\.lmstudio\mcp.json (Windows) — or in the app: Program tab → Install > Edit mcp.json:
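```json
{
  "mcpServers": {
    "asta": {
      "url": "https://asta-tools.allen.ai/mcp/v1",
      "headers": { "x-api-key": "<ASTA_API_KEY>" }
    }
  }
}
```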
Only models with "Tool Use: Supported" in LM Studio's model loader will be able to call Asta tools. Recommended: Qwen 2.5 / 3 Instruct (7B+), Llama 3.1 / 3.3 Instruct (8B+), Mistral / Mixtral Instruct.
OpenClaw
Install this skill into ~/.openclaw/skills/asta-skill/ and register the MCP server in your OpenClaw config using the same URL + x-api-key header pattern. The skill's frontmatter declares ASTA_API_KEY as required via metadata.openclaw.requires.env.
Verification
After installation, ask the agent: "Use Asta to look up the paper with DOI 10.48550/arXiv.1706.03762." A successful call returns the "Attention Is All You Need" paper metadata. If the agent reports no Asta tools, the MCP server is not registered — re-check the config file path and restart the host.
Troubleshooting
| Symptom | Cause | Fix |
|---|---|---|
| Authentication error | Missing or invalid `x-api-key` header | Verify `ASTA_API_KEY` is set and the header is forwarded |
| Rate-limit error | Rate limit hit | Slow down / batch; ensure the API key is attached (unauthenticated limits are lower) |
| No Asta tools visible | MCP server not registered in host | Re-run install step, restart agent |
| Empty `abstract` | Not all corpus papers have full text | Use `snippet_search` instead, or fall back to title + TLDR |
| Author disambiguation wrong | Common name collisions | Inspect affiliations in `search_authors_by_name` before calling `get_author_papers` |
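For the rate-limit row, "slow down" could be implemented as exponential backoff around a call, as in the sketch below. `RateLimited` is a hypothetical exception standing in for however the host's client signals a rate-limit response, and the delays are arbitrary defaults.

```python
import time


class RateLimited(Exception):
    """Hypothetical error a host client might raise when the rate limit is hit."""


def call_with_backoff(call, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Retry `call()` with exponential backoff while it raises RateLimited."""
    for attempt in range(max_attempts):
        try:
            return call()
        except RateLimited:
            if attempt == max_attempts - 1:
                raise  # give up after the last attempt
            sleep(base_delay * 2 ** attempt)
```

Passing `sleep` as a parameter keeps the helper easy to exercise without real waiting.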