OpenPaperGraph — Literature Discovery & Citation Analysis
You are a research assistant with access to a CLI tool for academic literature discovery and analysis.
Setup
The CLI is located at: SKILLDIR/openpapergraphcli.py
Before first use, ensure dependencies are installed:
pip install httpx pymupdf scholarly
All commands output JSON to stdout. Run from the SKILL_DIR directory.
Architecture: Multi-Source
This tool reduces dependency on any single data source:
| Task | Primary Sources | Fallback |
|---|---|---|
| Search | arXiv + DBLP + S2 | Deduplicated, sorted by citations |
| References | Download PDF → parse reference list | S2 API |
| Citations | Google Scholar | S2 API |
| Citation counts | Google Scholar | S2 |
| Recommendations | S2 Recommendations API | None |
| Reference resolution | arXiv → S2 → CrossRef → OpenAlex | Multi-cascade |
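The reference-resolution cascade in the last row can be pictured as trying each source in order and stopping at the first hit. The following is a minimal sketch under stated assumptions, not the tool's actual implementation: the resolver callables, the `resolve_all` helper, and the result keys are hypothetical; only the source order and the idea of a resolve rate come from this document.

```python
from typing import Callable, Optional

# A resolver takes a raw reference string and returns paper metadata or None.
Resolver = Callable[[str], Optional[dict]]


def resolve_reference(raw: str, cascade: list[tuple[str, Resolver]]) -> Optional[dict]:
    """Try each (source_name, resolver) in order; return the first hit."""
    for source_name, resolver in cascade:
        try:
            hit = resolver(raw)
        except Exception:
            # Assumption: a failing source is treated like a miss.
            continue
        if hit is not None:
            return {**hit, "source": source_name}
    return None


def resolve_all(raw_refs: list[str], cascade: list[tuple[str, Resolver]]) -> dict:
    """Resolve a whole reference list and report the resolve rate."""
    resolved, unresolved = [], []
    for raw in raw_refs:
        hit = resolve_reference(raw, cascade)
        if hit is None:
            unresolved.append(raw)
        else:
            resolved.append(hit)
    rate = len(resolved) / len(raw_refs) if raw_refs else 0.0
    return {"resolved": resolved, "unresolved": unresolved, "resolve_rate": rate}
```

In real use the cascade would be a list such as `[("arxiv", ...), ("semanticscholar", ...), ("crossref", ...), ("openalex", ...)]`, with each callable wrapping the corresponding API.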
Available Commands
1. Search Papers
Multi-source search across arXiv, DBLP, and Semantic Scholar. Supports conference filtering.
python SKILLDIR/openpapergraphcli.py search QUERY --source SOURCE --venue VENUE --limit N
- --source: all (default, multi-source), arxiv, dblp, or s2
- --venue: Filter by conference: ICLR, NeurIPS, ICML, ACL, EMNLP, NAACL, WebConf, KDD
- --limit: Max results (default 20)
When to use: User asks to find papers, search for literature, or look up specific topics/conferences.
2. Build Citation Network
Construct a citation graph from seed papers. References come from PDF parsing (downloaded from arXiv/Unpaywall), citations from Google Scholar. Falls back to S2 when needed.
python SKILLDIR/openpapergraphcli.py graph PAPER_ID1 PAPER_ID2 --depth 1 --output graph.json
- Paper IDs can be: S2 hex ID (204e3073...), arXiv ID (ARXIV:1706.03762), DOI (DOI:10.1145/...), paper title ("attention is all you need"), PDF path (paper.pdf), BibTeX file (refs.bib), or Zotero CSL-JSON export (zotero.json)
- --depth: Expansion depth (1 or 2, default 1)
- --output: Save graph to file for later analysis/export
When to use: User wants to explore the citation landscape around specific papers.
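Because the graph command accepts several kinds of identifier, it can help to check what kind of input you are about to pass. The sketch below is an illustration only: `classify_paper_id` is a hypothetical helper, and the rules (prefix checks, file extensions, a 40-character hex string for S2 IDs) are assumptions about how such inputs could be told apart, not a description of the CLI's internal logic.

```python
import re


def classify_paper_id(value: str) -> str:
    """Guess which kind of paper identifier a string is (hypothetical rules)."""
    lower = value.strip().lower()
    if lower.startswith("arxiv:"):
        return "arxiv"
    if lower.startswith("doi:"):
        return "doi"
    if lower.endswith(".pdf"):
        return "pdf"
    if lower.endswith(".bib"):
        return "bibtex"
    if lower.endswith(".json"):
        return "zotero_csl_json"
    # Assumption: Semantic Scholar paper IDs are 40-character hex strings.
    if re.fullmatch(r"[0-9a-f]{40}", lower):
        return "s2"
    # Anything else is treated as a free-text title.
    return "title"
```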
3. Paper Recommendations
Get related paper recommendations based on one or more papers (via S2 Recommendations API).
python SKILLDIR/openpapergraphcli.py recommend PAPER_ID1 PAPER_ID2 --limit 10
- Also accepts paper titles and PDF paths as input
When to use: User wants to discover related or similar papers they may have missed.
4. Monitor New Papers
Check for recently published papers on a research topic (multi-source: arXiv + DBLP + S2, citation counts enriched via Google Scholar).
python SKILLDIR/openpapergraphcli.py monitor TOPIC --year-from 2025 --limit 20
When to use: User wants to stay updated on latest publications in a field.
5. Topic Analysis
Analyze a citation graph for topics, keyword distribution, year trends, and top authors.
python SKILLDIR/openpapergraphcli.py analyze graph.json
When to use: User wants to understand the thematic structure of a set of papers.
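To make the idea of keyword distribution, year trends, and top authors concrete, here is a small counting sketch. It is an assumption-laden illustration, not the analyze command's algorithm: the `analyze_papers` helper, the stopword list, and the input shape (a list of dicts with title, year, authors) are all hypothetical.

```python
import re
from collections import Counter

# Hypothetical, deliberately tiny stopword list.
STOPWORDS = {"the", "for", "and", "with", "all", "you", "via", "from"}


def analyze_papers(papers: list[dict], top_n: int = 5) -> dict:
    """Count title keywords, papers per year, and author frequency."""
    keywords, years, authors = Counter(), Counter(), Counter()
    for paper in papers:
        words = re.findall(r"[a-z]{3,}", paper.get("title", "").lower())
        keywords.update(w for w in words if w not in STOPWORDS)
        if paper.get("year"):
            years[paper["year"]] += 1
        authors.update(paper.get("authors", []))
    return {
        "keywords": keywords.most_common(top_n),
        "years": dict(sorted(years.items())),
        "top_authors": authors.most_common(top_n),
    }
```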
6. Research Summary
Generate a research summary from a citation graph. Uses LLM if any provider is configured, otherwise falls back to extractive analysis.
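python SKILLDIR/openpapergraphcli.py summary graph.json --style STYLE
python SKILLDIR/openpapergraphcli.py summary graph.json --provider deepseek --model deepseek-chat
- --style: overview (default), trends, or gaps
- --provider: LLM provider name (e.g. openai, deepseek, qwen, zhipu, moonshot)
- --model: Override the provider's default model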
When to use: User wants a quick overview of a research area or to identify trends/gaps.
7. PDF Reference Extraction
Extract references from a PDF paper, resolving via multi-source cascade (arXiv → S2 → CrossRef → OpenAlex).
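python SKILLDIR/openpapergraphcli.py pdf /path/to/paper.pdf
python SKILLDIR/openpapergraphcli.py pdf /path/to/paper.pdf --use-grobid
- --use-grobid: Use GROBID for structured extraction (requires Docker service on port 8070)
- Returns: resolved papers, unresolved references, and resolve rate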
When to use: User provides a PDF and wants to find/analyze its references.
7b. Build Graph from PDF Reference Lists
Build a citation graph directly from one or more PDF papers' reference lists.
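python SKILLDIR/openpapergraphcli.py graph-from-pdf paper.pdf [paper2.pdf ...] --output graph.json
python SKILLDIR/openpapergraphcli.py graph-from-pdf paper.pdf --depth 1 --include-unresolved -o graph.json
- --depth 0 (default): Only PDF references. --depth 1: Also expand resolved papers.
- --include-unresolved: Keep unresolved references as nodes in the graph (marked resolved=false)
- --use-grobid: Use GROBID for structured extraction
- References resolved via: arXiv → Semantic Scholar → CrossRef → OpenAlex (multi-source cascade)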
When to use: User has PDF papers and wants a citation graph faithful to the actual reference lists.
8. Zotero Import
Import papers from a Zotero library or collection.
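python SKILLDIR/openpapergraphcli.py zotero --user-id ID --api-key KEY [--collection KEY] [--list-collections]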
When to use: User wants to import their existing Zotero library for analysis.
9. Export
Export a citation graph as BibTeX, CSV, Markdown, or JSON. All formats sort papers by year descending.
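python SKILLDIR/openpapergraphcli.py export graph.json --format bibtex --output refs.bib
python SKILLDIR/openpapergraphcli.py export graph.json --format csv --output papers.csv
python SKILLDIR/openpapergraphcli.py export graph.json --format markdown --output papers.md
python SKILLDIR/openpapergraphcli.py export graph.json --format json --output papers.json
- --format: bibtex (default), csv, markdown, or json
- CSV/Markdown/JSON include full fields: id, title, authors, year, citations, source, url, doi, arxiv_id, abstract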
When to use: User wants to save results for use in a reference manager, spreadsheet, or documentation.
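The CSV export described here (full field set, newest papers first) can be approximated in a few lines if you need to post-process results yourself. This is a sketch under assumptions: `papers_to_csv` is a hypothetical helper, the input is assumed to be a list of dicts, and placing undated papers last is a choice made for this example rather than documented behavior.

```python
import csv
import io

# Field order taken from the export description above.
FIELDS = ["id", "title", "authors", "year", "citations",
          "source", "url", "doi", "arxiv_id", "abstract"]


def papers_to_csv(papers: list[dict]) -> str:
    """Render papers as CSV text, sorted by year descending."""
    # Papers without a year sort last (assumption for this sketch).
    ordered = sorted(papers, key=lambda p: p.get("year") or 0, reverse=True)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, lineterminator="\n")
    writer.writeheader()
    for paper in ordered:
        row = {field: paper.get(field, "") for field in FIELDS}
        if isinstance(row["authors"], list):
            row["authors"] = "; ".join(row["authors"])
        writer.writerow(row)
    return buffer.getvalue()
```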
9b. Export Interactive HTML Graph
Export a citation graph as a self-contained interactive HTML visualization.
python SKILLDIR/openpapergraphcli.py export-html graph.json --output graph.html
python SKILLDIR/openpapergraphcli.py export-html graph.json --output graph.html --title "My Research" --summary --inline
- --title: Custom page title (default: "Paper Graph")
- --summary: Pre-generate the AI summary at export time (requires an LLM API key in the environment). The result is embedded; the API key is NOT.
- --inline: Inline the vis-network JS for fully offline use (~500KB larger, no CDN needed)
- --provider / --model: Override the LLM provider/model for --summary
- Layout: Semantic left-to-right hierarchy: References (LEFT) → Seeds (CENTER) → Citations (RIGHT)
- Node types: Seeds (purple stars), References (blue circles), Citations (green diamonds), with legend
- Features: bidirectional hover linking, type filter, search/filter, in-page export, seed source management (add/remove seeds)
- Summary modes: (A) Pre-generate with --summary, (B) Runtime API key (20+ providers), (C) Manual copy/paste (CORS-proof)
- Security: API keys are never embedded in the HTML output
When to use: User wants a visual, interactive exploration of the citation network, or wants to share a browsable graph.
9c. Interactive Graph Server (serve)
Start a local HTTP server for interactive graph management. Unlike export-html (static, read-only), serve lets users add papers, convert nodes to seeds, remove seeds, and all changes persist to the graph JSON file.
CODEBLOCK12
- --port: Server port (default: 8787)
- INLINECODE59: Custom page title
- Add papers: "+ Add Paper" button in toolbar. Input via title/ID, BibTeX, or PDF upload. Toggle "Treat as Seed Paper" to control expansion.
  - Seed: Full expansion. Fetches references + citations from S2/Google Scholar, adds nodes + edges
  - Non-seed: Lightweight. Only checks relationships with existing seeds, no expansion
- Convert to seed: Click any non-seed paper in the list → "⬆ Convert to Seed" button appears. Also available in the node tooltip when clicking graph nodes.
- Remove seed: Seeds/Sources tab → "Remove" button. Deletes seed + exclusive connections.
- Persistent: All changes are immediately written to the graph JSON file and survive a page refresh.
- Dedup: Papers matched by DOI > arXiv ID > title+year similarity (no duplicates)
When to use: User wants to interactively build and manage a citation network through the browser, with all changes persisted. Use export-html instead when you want a static file for sharing.
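The dedup priority used by serve (DOI, then arXiv ID, then title+year similarity) can be sketched as a pairwise comparison. Everything specific below is an assumption for illustration: the `same_paper` helper, the 0.9 similarity threshold, stripping arXiv version suffixes, and letting the first identifier that both records carry decide the outcome.

```python
import re
from difflib import SequenceMatcher


def _normalize_title(title: str) -> str:
    """Lowercase, drop punctuation, collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9 ]", " ", title.lower()).split())


def same_paper(a: dict, b: dict, threshold: float = 0.9) -> bool:
    """Decide whether two records describe the same paper."""
    # 1. DOI: decisive when both records have one.
    if a.get("doi") and b.get("doi"):
        return a["doi"].lower() == b["doi"].lower()
    # 2. arXiv ID: compare without version suffix such as "v5".
    if a.get("arxiv_id") and b.get("arxiv_id"):
        strip = lambda s: re.sub(r"v\d+$", "", s)
        return strip(a["arxiv_id"]) == strip(b["arxiv_id"])
    # 3. Title + year: years must not conflict, titles must be similar.
    if a.get("year") and b.get("year") and a["year"] != b["year"]:
        return False
    title_a = _normalize_title(a.get("title", ""))
    title_b = _normalize_title(b.get("title", ""))
    if not title_a or not title_b:
        return False
    return SequenceMatcher(None, title_a, title_b).ratio() >= threshold
```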
10. Remove Seed Paper
Remove a seed paper and all papers exclusively connected to it from a graph.
CODEBLOCK13
- Accepts paper ID or title substring (fuzzy match)
- Removes the seed + papers connected only to that seed (not shared with other seeds)
- Cleans up all incident edges
- Overwrites the graph file (use -o to save to a different file)
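The removal rule above (drop the seed, drop papers connected only to that seed, then drop their edges) can be sketched on a toy graph. This is a simplified illustration for depth-1 graphs, not the command's code: the `remove_seed` function and the graph shape (a dict of node metadata with an `is_seed` flag plus a list of edge pairs) are hypothetical.

```python
def remove_seed(nodes: dict, edges: list, seed_id: str):
    """Return (nodes, edges) with the seed and its exclusive neighbors removed."""
    if not nodes.get(seed_id, {}).get("is_seed"):
        raise ValueError(f"{seed_id!r} is not a seed paper")

    def neighbors(node_id):
        # Edge direction is ignored: references and citations both count.
        return {b if a == node_id else a for a, b in edges if node_id in (a, b)}

    other_seeds = {n for n, meta in nodes.items()
                   if meta.get("is_seed") and n != seed_id}
    doomed = {seed_id}
    for neighbor in neighbors(seed_id):
        if neighbor in other_seeds:
            continue
        # Keep papers that are also linked to another seed.
        if not (neighbors(neighbor) & other_seeds):
            doomed.add(neighbor)

    kept_nodes = {n: meta for n, meta in nodes.items() if n not in doomed}
    kept_edges = [(a, b) for a, b in edges if a not in doomed and b not in doomed]
    return kept_nodes, kept_edges
```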
11. Remove Non-Seed Paper
Remove a single non-seed paper from a graph.
CODEBLOCK14
- Accepts paper ID or title substring (fuzzy match)
- Only works for non-seed papers (use remove-seed for seeds)
- Cleans up all incident edges
- Overwrites the graph file (use -o to save to a different file)
12. List Conferences
Show supported conference venues for filtering.
CODEBLOCK15
13. List LLM Providers
Show all 20 supported LLM providers and whether their API key is configured.
CODEBLOCK16
Workflow Guidelines
1. Start with search: Help the user find relevant seed papers first (default: multi-source)
2. Build a graph: Use seed paper IDs to construct a citation network, save to a .json file
3. Explore interactively: Use serve to open the graph in the browser, add papers, and convert papers to seeds
4. Analyze: Run topic analysis or generate a summary on the saved graph
5. Discover more: Use recommendations to find papers the user may have missed
6. Export: Save results as BibTeX/CSV/Markdown/JSON for the user's reference manager
7. Share: Generate a static HTML graph for sharing/viewing (export-html)
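The non-interactive steps of this workflow can be laid out as argument lists ready for a process runner. This is only a sketch: `workflow_commands` is a hypothetical helper, the CLI path is copied as it appears in this document, and the serve step is left out because it starts a long-running server.

```python
# CLI path as written in this document; adjust to your installation.
CLI = ["python", "SKILLDIR/openpapergraphcli.py"]


def workflow_commands(query: str, seed_ids: list[str],
                      graph_file: str = "graph.json") -> list[list[str]]:
    """Build the argv list for each non-interactive workflow step."""
    return [
        CLI + ["search", query, "--limit", "20"],
        CLI + ["graph", *seed_ids, "--depth", "1", "--output", graph_file],
        CLI + ["analyze", graph_file],
        CLI + ["recommend", *seed_ids, "--limit", "10"],
        CLI + ["export", graph_file, "--format", "bibtex", "--output", "refs.bib"],
        CLI + ["export-html", graph_file, "--output", "graph.html"],
    ]
```

Passing each list to a process runner (rather than a shell string) avoids quoting problems when a query or title contains spaces.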
Output Format
All commands output JSON to stdout. When presenting results to the user:
- Show paper titles, authors, year, and citation counts in a readable format
- For large result sets, summarize the top results and mention the total count
- Paper IDs can be: S2 hex IDs, arXiv IDs (ARXIV:xxxx), DOIs (DOI:xxxx), paper titles, or PDF file paths
- The source field in results indicates where each paper came from (arxiv, semanticscholar, googlescholar, crossref, openalex, dblp)
Environment Variables
S2_API_KEY (Recommended)
Semantic Scholar API key. Free at semanticscholar.org/product/api.
- Purpose: Authenticates requests to the S2 API (paper search, citation data, recommendations)
- Why needed: Without it, S2 enforces strict rate limiting, and frequent calls return 429 errors
- Role: S2 is the fallback in the multi-source architecture. When PDF download or Google Scholar fails, the system falls back to S2. It is also the exclusive source for the recommend command
LLM Provider API Key (Optional — any one of 20 providers)
The summary command supports 20 LLM providers. Set any one API key to enable LLM-powered summaries:
US: OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, DEEPSEEK_API_KEY, GROQ_API_KEY, TOGETHER_API_KEY, FIREWORKS_API_KEY, MISTRAL_API_KEY, XAI_API_KEY, PERPLEXITY_API_KEY, INLINECODE84
Chinese: ZHIPUAI_API_KEY (Zhipu AI), MOONSHOT_API_KEY (Moonshot AI), BAICHUAN_API_KEY (Baichuan), YI_API_KEY (01.AI), DASHSCOPE_API_KEY (Tongyi Qianwen / Qwen), ARK_API_KEY (Doubao), MINIMAX_API_KEY, STEPFUN_API_KEY (StepFun), SENSENOVA_API_KEY (SenseTime)
Custom: Set LLM_API_KEY + LLM_BASE_URL + LLM_MODEL for any OpenAI-compatible endpoint.
Additional environment variables:
- LLM_PROVIDER: Explicitly select LLM provider (alternative to the --provider CLI flag)
- INLINECODE99: Override default model for the selected provider (alternative to the --model CLI flag)
- INLINECODE101: Custom directory for PDF download cache (defaults to system temp)
Without any LLM key, summary uses extractive analysis and export-html hides the AI summary panel. All other commands are unaffected. Run llm-providers to check status.
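The choice between LLM-powered and extractive summaries depends only on which environment variables are set. The sketch below illustrates that decision for a handful of providers. It rests on assumptions: the `summary_mode` helper, the mapping from provider names to key variables, and the precedence order (explicit LLM_PROVIDER, then the first configured key, then the custom endpoint) are hypothetical and may differ from the tool's behavior.

```python
# Subset of providers; the name-to-variable mapping is an assumption.
KEY_VARS = {
    "openai": "OPENAI_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
    "qwen": "DASHSCOPE_API_KEY",
    "zhipu": "ZHIPUAI_API_KEY",
    "moonshot": "MOONSHOT_API_KEY",
}
CUSTOM_VARS = ("LLM_API_KEY", "LLM_BASE_URL", "LLM_MODEL")


def summary_mode(env: dict) -> tuple:
    """Return ("llm", provider) or ("extractive", None) for a given environment."""
    configured = [name for name, var in KEY_VARS.items() if env.get(var)]
    explicit = env.get("LLM_PROVIDER")
    if explicit in configured:
        return ("llm", explicit)
    if configured:
        return ("llm", configured[0])
    # Custom OpenAI-compatible endpoint needs all three variables.
    if all(env.get(var) for var in CUSTOM_VARS):
        return ("llm", "custom")
    return ("extractive", None)
```

Calling `summary_mode(dict(os.environ))` would apply the same check to the current process environment.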
Cross-Tool Compatibility
This CLI is designed to be called by any AI coding tool (Claude Code, OpenClaw, Codex, etc.):
- All output is structured JSON on stdout
- Errors go to stderr
- Exit code 0 = success, 1 = argument error, 2 = runtime error
- No interactive input required; all parameters are passed via command-line flags
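A calling tool can rely on this contract by parsing stdout as JSON and mapping the exit code to an error. The wrapper below is a minimal sketch: `run_cli` and `CliError` are hypothetical names, and the timeout value is arbitrary.

```python
import json
import subprocess


class CliError(RuntimeError):
    """Raised when the CLI exits with a non-zero code."""


def run_cli(argv: list[str], timeout: float = 300.0):
    """Run a command, return parsed JSON from stdout, raise on failure."""
    proc = subprocess.run(argv, capture_output=True, text=True, timeout=timeout)
    if proc.returncode == 1:
        raise CliError(f"argument error: {proc.stderr.strip()}")
    if proc.returncode != 0:
        raise CliError(f"runtime error (exit {proc.returncode}): {proc.stderr.strip()}")
    return json.loads(proc.stdout)


# Example call (path as written in this document):
# run_cli(["python", "SKILLDIR/openpapergraphcli.py", "analyze", "graph.json"])
```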
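OpenPaperGraph: Literature Discovery and Citation Analysis
You are a research assistant with access to a command-line tool for academic literature discovery and analysis.
Setup
The command-line tool is located at: SKILLDIR/openpapergraphcli.py
Before first use, make sure the dependencies are installed:
pip install httpx pymupdf scholarly
All commands output JSON to stdout. Run them from the SKILL_DIR directory.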
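Architecture: Multi-Source
This tool reduces dependency on any single data source:
| Task | Primary Sources | Fallback |
|---|---|---|
| Search | arXiv + DBLP + S2 | Deduplicated, sorted by citations |
| References | Download PDF → parse reference list | S2 API |
| Citations | Google Scholar | S2 API |
| Citation counts | Google Scholar | S2 |
| Recommendations | S2 Recommendations API | None |
| Reference resolution | arXiv → S2 → CrossRef → OpenAlex | Multi-level cascade |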
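Available Commands
1. Search Papers
Multi-source search across arXiv, DBLP, and Semantic Scholar. Supports conference filtering.
python SKILLDIR/openpapergraphcli.py search QUERY --source SOURCE --venue VENUE --limit N
- --source: all (default, multi-source), arxiv, dblp, or s2
- --venue: Filter by conference: ICLR, NeurIPS, ICML, ACL, EMNLP, NAACL, WebConf, KDD
- --limit: Max results (default 20)
When to use: User asks to find papers, search for literature, or look up specific topics/conferences.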
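2. Build Citation Network
Construct a citation graph from seed papers. References come from PDF parsing (downloaded from arXiv/Unpaywall), citations from Google Scholar. Falls back to S2 when needed.
python SKILLDIR/openpapergraphcli.py graph PAPER_ID1 PAPER_ID2 --depth 1 --output graph.json
- Paper IDs can be: S2 hex ID (204e3073...), arXiv ID (ARXIV:1706.03762), DOI (DOI:10.1145/...), paper title (attention is all you need), PDF path (paper.pdf), BibTeX file (refs.bib), or Zotero CSL-JSON export (zotero.json)
- --depth: Expansion depth (1 or 2, default 1)
- --output: Save graph to file for later analysis/export
When to use: User wants to explore the citation landscape around specific papers.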
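3. Paper Recommendations
Get related paper recommendations based on one or more papers (via the S2 Recommendations API).
python SKILLDIR/openpapergraphcli.py recommend PAPER_ID1 PAPER_ID2 --limit 10
When to use: User wants to discover related or similar papers they may have missed.
4. Monitor New Papers
Check for recently published papers on a research topic (multi-source: arXiv + DBLP + S2, citation counts enriched via Google Scholar).
python SKILLDIR/openpapergraphcli.py monitor TOPIC --year-from 2025 --limit 20
When to use: User wants to stay updated on the latest publications in a field.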
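5. Topic Analysis
Analyze a citation graph for topics, keyword distribution, year trends, and top authors.
python SKILLDIR/openpapergraphcli.py analyze graph.json
When to use: User wants to understand the thematic structure of a set of papers.
6. Research Summary
Generate a research summary from a citation graph. Uses an LLM if any provider is configured, otherwise falls back to extractive analysis.
python SKILLDIR/openpapergraphcli.py summary graph.json --style STYLE
python SKILLDIR/openpapergraphcli.py summary graph.json --provider deepseek --model deepseek-chat
- --style: overview (default), trends, or gaps
- --provider: LLM provider name (e.g. openai, deepseek, qwen, zhipu, moonshot)
- --model: Override the provider's default model
When to use: User wants a quick overview of a research area or to identify trends/gaps.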
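7. PDF Reference Extraction
Extract references from a PDF paper, resolving them via the multi-source cascade (arXiv → S2 → CrossRef → OpenAlex).
python SKILLDIR/openpapergraphcli.py pdf /path/to/paper.pdf
python SKILLDIR/openpapergraphcli.py pdf /path/to/paper.pdf --use-grobid
- --use-grobid: Use GROBID for structured extraction (requires a Docker service running on port 8070)
- Returns: resolved papers, unresolved references, and resolve rate
When to use: User provides a PDF and wants to find/analyze its references.
7b. Build Graph from PDF Reference Lists
Build a citation graph directly from the reference lists of one or more PDF papers.
python SKILLDIR/openpapergraphcli.py graph-from-pdf paper.pdf [paper2.pdf ...] --output graph.json
python SKILLDIR/openpapergraphcli.py graph-from-pdf paper.pdf --depth 1 --include-unresolved -o graph.json
- --depth 0 (default): Only PDF references. --depth 1: Also expand resolved papers.
- --include-unresolved: Keep unresolved references as nodes in the graph (marked resolved=false)
- --use-grobid: Use GROBID for structured extraction
- References resolved via: arXiv → Semantic Scholar → CrossRef → OpenAlex (multi-source cascade)
When to use: User has PDF papers and wants a citation graph faithful to the actual reference lists.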
8. Zotero Import
Import papers from a Zotero library or collection.
python SKILLDIR/openpapergraphcli.py zotero --user-id ID --api-key KEY [--collection KEY] [--list-collections]
When to use: User wants to import their existing Zotero library for analysis.
9. Export
Export a citation graph as BibTeX, CSV, Markdown, or JSON. All formats sort papers by year descending.
python SKILLDIR/openpapergraphcli.py export graph.json --format bibtex --output refs.bib
python SKILLDIR/openpapergraphcli.py export graph.json --format csv --output papers.csv
python SKILLDIR/openpapergraphcli.py export graph.json --format markdown --output papers.md
python SKILLDIR/openpapergraphcli.py export graph.json --format json --output papers.json
- --format: bibtex (default), csv, markdown, or json
- CSV/Markdown/JSON include full fields: id, title, authors, year, citations, source, url, doi, arxiv_id, abstract
When to use: User wants to save results for use in a reference manager, spreadsheet, or documentation.
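9b. Export Interactive HTML Graph
Export a citation graph as a self-contained interactive HTML visualization.
python SKILLDIR/openpapergraphcli.py export-html graph.json --output graph.html
python SKILLDIR/openpapergraphcli.py export-html graph.json --output graph.html --title "My Research" --summary --inline
- --title: Custom page title (default: Paper Graph)
- --summary: Pre-generate the AI summary at export time (requires an LLM API key in the environment). The result is embedded; the API key is not.
- --inline: Inline the vis-network JS for fully offline use (~500KB larger, no CDN needed)
- --provider / --model: Override the LLM provider/model for --summary
- Layout: Semantic left-to-right hierarchy: References (left) → Seeds (center) → Citations (right)
- Node types: Seeds (purple stars), References (blue circles), Citations (green diamonds), with legend
- Features: bidirectional hover linking, type filter, search/filter, in-page export, seed source management (add/remove seeds)
- Summary modes: (A) Pre-generate with --summary, (B) Runtime API key (20+ providers), (C) Manual copy/paste (CORS-proof)
- Security: API keys are never embedded in the HTML output
When to use: User wants a visual, interactive exploration of the citation network, or wants to share a browsable graph.
9c. Interactive Graph Server (serve)
Start a local HTTP server for interactive graph management. Unlike export-html (static, read-only), serve lets users add papers, convert nodes to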