OpenPaperGraph — Literature Discovery & Citation Analysis

You are a research assistant with access to a CLI tool for academic literature discovery and analysis.

Setup

The CLI is located at: SKILLDIR/openpapergraphcli.py

Before first use, ensure dependencies are installed:

    pip install httpx pymupdf scholarly

All commands output JSON to stdout. Run from the SKILL_DIR directory.

Architecture: Multi-Source

This tool reduces dependency on any single data source:

Task	Primary Sources	Fallback
Search	arXiv + DBLP + S2	Deduplicated, sorted by citations
References	PDF parsing (arXiv/Unpaywall)	S2

Available Commands

1. Search Papers

Multi-source search across arXiv, DBLP, and Semantic Scholar. Supports conference filtering.

    python SKILLDIR/openpapergraphcli.py search <query> --source <source> --venue <venue> --limit <n>

- --source: all (default, multi-source), arxiv, dblp, or s2
- --venue: Filter by conference: ICLR, NeurIPS, ICML, ACL, EMNLP, NAACL, WebConf, KDD
- --limit: Max results (default 20)

When to use: User asks to find papers, search for literature, or look up specific topics/conferences.
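
The multi-source default merges hits from arXiv, DBLP, and S2, removes duplicates, and sorts by citation count. The sketch below shows one way such a merge step could work. It assumes each hit is a dict with `title` and `citations` keys; the title normalization rule and the `merge_results` helper are illustrative assumptions, not the tool's actual code.

```python
import re


def normalize_title(title: str) -> str:
    """Lowercase and collapse non-alphanumerics so near-identical titles compare equal."""
    return re.sub(r"[^a-z0-9]+", " ", title.lower()).strip()


def merge_results(*source_lists: list[dict]) -> list[dict]:
    """Merge hits from several sources, keep one record per title, sort by citations."""
    best: dict[str, dict] = {}
    for hits in source_lists:
        for paper in hits:
            key = normalize_title(paper["title"])
            kept = best.get(key)
            # Keep whichever duplicate reports the higher citation count.
            if kept is None or (paper.get("citations") or 0) > (kept.get("citations") or 0):
                best[key] = paper
    return sorted(best.values(), key=lambda p: p.get("citations") or 0, reverse=True)
```

Keeping the record with the higher citation count is a design choice made here for simplicity; a real merge might instead combine fields from both records.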

2. Build Citation Network

Construct a citation graph from seed papers. References come from PDF parsing (downloaded from arXiv/Unpaywall), citations from Google Scholar. Falls back to S2 when needed.

    python SKILLDIR/openpapergraphcli.py graph <paper_id_1> <paper_id_2> --depth 1 --output graph.json

- Paper IDs can be: S2 hex ID (204e3073...), arXiv ID (ARXIV:1706.03762), DOI (DOI:10.1145/...), paper title ("attention is all you need"), PDF path (paper.pdf), BibTeX file (refs.bib), or Zotero CSL-JSON export (zotero.json)
- --depth: Expansion depth (1 or 2, default 1)
- --output: Save graph to file for later analysis/export

When to use: User wants to explore the citation landscape around specific papers.
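
Because one positional argument can carry several kinds of identifier, it can help to see how they might be told apart. The following is a minimal heuristic sketch, assuming the prefixes and file extensions listed above; `classify_paper_id` is a hypothetical helper and the CLI's real detection logic may differ.

```python
import re


def classify_paper_id(value: str) -> str:
    """Guess which kind of identifier a string is (illustrative heuristic only)."""
    lowered = value.lower()
    if lowered.startswith("arxiv:"):
        return "arxiv"
    if lowered.startswith("doi:"):
        return "doi"
    if lowered.endswith(".pdf"):
        return "pdf"
    if lowered.endswith(".bib"):
        return "bibtex"
    if lowered.endswith(".json"):
        return "csl-json"
    # S2 paper IDs are 40-character hexadecimal strings.
    if re.fullmatch(r"[0-9a-f]{40}", lowered):
        return "s2"
    # Anything else is treated as a free-text title.
    return "title"
```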

3. Paper Recommendations

Get related paper recommendations based on one or more papers (via S2 Recommendations API).

    python SKILLDIR/openpapergraphcli.py recommend <paper_id_1> <paper_id_2> --limit 10

- Also accepts paper titles and PDF paths as input

When to use: User wants to discover related or similar papers they may have missed.

4. Monitor New Papers

Check for recently published papers on a research topic (multi-source: arXiv + DBLP + S2, citation counts enriched via Google Scholar).

    python SKILLDIR/openpapergraphcli.py monitor <topic> --year-from 2025 --limit 20

When to use: User wants to stay updated on latest publications in a field.

5. Topic Analysis

Analyze a citation graph for topics, keyword distribution, year trends, and top authors.

    python SKILLDIR/openpapergraphcli.py analyze graph.json

When to use: User wants to understand the thematic structure of a set of papers.

6. Research Summary

Generate a research summary from a citation graph. Uses LLM if any provider is configured, otherwise falls back to extractive analysis.

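    python SKILLDIR/openpapergraphcli.py summary graph.json --style <style>
    python SKILLDIR/openpapergraphcli.py summary graph.json --provider deepseek --model deepseek-chat

- --style: overview (default), trends, or gaps
- --provider: LLM provider name (e.g. openai, deepseek, qwen, zhipu, moonshot)
- --model: Override the provider's default model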

When to use: User wants a quick overview of a research area or to identify trends/gaps.

7. PDF Reference Extraction

Extract references from a PDF paper, resolving via multi-source cascade (arXiv → S2 → CrossRef → OpenAlex).

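    python SKILLDIR/openpapergraphcli.py pdf /path/to/paper.pdf
    python SKILLDIR/openpapergraphcli.py pdf /path/to/paper.pdf --use-grobid

- --use-grobid: Use GROBID for structured extraction (requires Docker service on port 8070)
- Returns: resolved papers, unresolved references, and resolve rate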

When to use: User provides a PDF and wants to find/analyze its references.
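
The cascade described above (arXiv → S2 → CrossRef → OpenAlex) tries each source in turn and stops at the first match. The sketch below illustrates the pattern with pluggable lookup functions. The `resolve_references` name, the `Resolver` type, and the output keys are assumptions chosen to mirror the "resolved papers, unresolved references, and resolve rate" description, not the tool's real interface.

```python
from typing import Callable, Optional

# A resolver takes a raw reference string and returns paper metadata or None.
Resolver = Callable[[str], Optional[dict]]


def resolve_references(raw_refs: list[str], resolvers: list[tuple[str, Resolver]]) -> dict:
    """Try each resolver in order per reference; stop at the first one that returns a match."""
    resolved, unresolved = [], []
    for ref in raw_refs:
        for source, lookup in resolvers:
            paper = lookup(ref)
            if paper is not None:
                resolved.append({**paper, "source": source})
                break
        else:
            # No resolver in the cascade recognised this reference.
            unresolved.append(ref)
    rate = len(resolved) / len(raw_refs) if raw_refs else 0.0
    return {"resolved": resolved, "unresolved": unresolved, "resolve_rate": rate}
```

Passing the resolvers as an ordered list keeps the priority explicit and makes it easy to test the cascade with stub functions instead of live APIs.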

7b. Build Graph from PDF Reference Lists

Build a citation graph directly from one or more PDF papers' reference lists.

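    python SKILLDIR/openpapergraphcli.py graph-from-pdf paper.pdf [paper2.pdf ...] --output graph.json
    python SKILLDIR/openpapergraphcli.py graph-from-pdf paper.pdf --depth 1 --include-unresolved -o graph.json

- --depth 0 (default): Only PDF references. --depth 1: Also expand resolved papers.
- --include-unresolved: Keep unresolved references as nodes in the graph (marked resolved=false)
- --use-grobid: Use GROBID for structured extraction
- References resolved via: arXiv → Semantic Scholar → CrossRef → OpenAlex (multi-source cascade)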

When to use: User has PDF papers and wants a citation graph faithful to the actual reference lists.

8. Zotero Import

Import papers from a Zotero library or collection.

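    python SKILLDIR/openpapergraphcli.py zotero --user-id ID --api-key KEY [--collection KEY] [--list-collections]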

When to use: User wants to import their existing Zotero library for analysis.

9. Export

Export a citation graph as BibTeX, CSV, Markdown, or JSON. All formats sort papers by year descending.

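    python SKILLDIR/openpapergraphcli.py export graph.json --format bibtex --output refs.bib
    python SKILLDIR/openpapergraphcli.py export graph.json --format csv --output papers.csv
    python SKILLDIR/openpapergraphcli.py export graph.json --format markdown --output papers.md
    python SKILLDIR/openpapergraphcli.py export graph.json --format json --output papers.json

- --format: bibtex (default), csv, markdown, or json
- CSV/Markdown/JSON include full fields: id, title, authors, year, citations, source, url, doi, arxiv_id, abstract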

When to use: User wants to save results for use in a reference manager, spreadsheet, or documentation.
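
To make the CSV layout concrete, here is a minimal sketch that writes the ten fields listed above, newest paper first. It assumes papers are dicts keyed by those field names and that `authors` may be a list of names; `papers_to_csv` is a hypothetical helper, not the CLI's export code.

```python
import csv
import io

FIELDS = ["id", "title", "authors", "year", "citations", "source", "url", "doi", "arxiv_id", "abstract"]


def papers_to_csv(papers: list[dict]) -> str:
    """Render papers as CSV text, newest first; missing fields become empty cells."""
    ordered = sorted(papers, key=lambda p: p.get("year") or 0, reverse=True)
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=FIELDS, extrasaction="ignore", lineterminator="\n")
    writer.writeheader()
    for paper in ordered:
        row = dict(paper)
        # Assumed: authors arrive as a list of names; join them into one cell.
        if isinstance(row.get("authors"), list):
            row["authors"] = "; ".join(row["authors"])
        writer.writerow(row)
    return buffer.getvalue()
```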

9b. Export Interactive HTML Graph

Export a citation graph as a self-contained interactive HTML visualization.

    python SKILLDIR/openpapergraphcli.py export-html graph.json --output graph.html
    python SKILLDIR/openpapergraphcli.py export-html graph.json --output graph.html --title "My Research" --summary --inline

- --title: Custom page title (default: "Paper Graph")
- --summary: Pre-generate AI summary at export time (requires LLM API key in env). The result is embedded; the API key is not.
- --inline: Inline vis-network JS for fully offline use (~500KB larger, no CDN needed)
- --provider / --model: Override LLM provider/model for --summary
- Layout: Semantic left-to-right hierarchy: References (left) → Seeds (center) → Citations (right)
- Node types: Seeds (purple stars), References (blue circles), Citations (green diamonds), with legend
- Features: bidirectional hover linking, type filter, search/filter, in-page export, seed source management (add/remove seeds)
- Summary modes: (A) Pre-generate with --summary, (B) Runtime API key (20+ providers), (C) Manual copy/paste (CORS-proof)
- Security: API keys are never embedded in the HTML output

When to use: User wants a visual, interactive exploration of the citation network, or wants to share a browsable graph.

9c. Interactive Graph Server (`serve`)

Start a local HTTP server for interactive graph management. Unlike export-html (static, read-only), serve lets users add papers, convert nodes to seeds, remove seeds, and all changes persist to the graph JSON file.

CODEBLOCK12

- --port: Server port (default: 8787)
- --title: Custom page title
- Add papers: "+ Add Paper" button in toolbar. Input via title/ID, BibTeX, or PDF upload. Toggle "Treat as Seed Paper" to control expansion.
- Seed: Full expansion. Fetches references and citations from S2/Google Scholar, then adds nodes and edges.
- Non-seed: Lightweight. Only checks relationships with existing seeds; no expansion.
- Convert to seed: Click any non-seed paper in the list and a "⬆ Convert to Seed" button appears. Also available in the node tooltip when clicking graph nodes.
- Remove seed: Seeds/Sources tab → "Remove" button. Deletes the seed and its exclusive connections.
- Persistent: All changes are written immediately to the graph JSON file and survive a page refresh.
- Dedup: Papers are matched by DOI > arXiv ID > title+year similarity, so no duplicates are added.

When to use: User wants to interactively build and manage a citation network through the browser, with all changes persisted. Use export-html instead when you want a static file for sharing.
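
The dedup priority above (DOI, then arXiv ID, then title+year similarity) can be sketched as a single comparison function. This is an illustration under assumptions: records are dicts with `doi`, `arxiv_id`, `title`, and `year` keys, and the 0.9 similarity threshold and the `same_paper` name are invented for the example.

```python
from difflib import SequenceMatcher


def same_paper(a: dict, b: dict, title_threshold: float = 0.9) -> bool:
    """Decide whether two records are the same paper, checking the strongest key first."""
    # 1. DOI: treated as decisive when both records carry one.
    if a.get("doi") and b.get("doi"):
        return a["doi"].lower() == b["doi"].lower()
    # 2. arXiv ID: treated as decisive when both records carry one.
    if a.get("arxiv_id") and b.get("arxiv_id"):
        return a["arxiv_id"] == b["arxiv_id"]
    # 3. Fallback: same year and near-identical title.
    if a.get("year") != b.get("year"):
        return False
    ratio = SequenceMatcher(None, a["title"].lower(), b["title"].lower()).ratio()
    return ratio >= title_threshold
```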

10. Remove Seed Paper

Remove a seed paper and all papers exclusively connected to it from a graph.

CODEBLOCK13

- Accepts paper ID or title substring (fuzzy match)
- Removes the seed + papers connected only to that seed (not shared with other seeds)
- Cleans up all incident edges
- Overwrites the graph file (use -o to save to a different file)
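
The removal rule above (drop the seed, drop papers linked only to that seed, drop their edges) can be sketched as follows. The graph schema used here, with `nodes` carrying `id` and `is_seed` and `edges` carrying `source` and `target`, is an assumption for illustration; the real graph JSON layout is not documented in this file.

```python
def remove_seed(graph: dict, seed_id: str) -> dict:
    """Return a copy of the graph without the seed and the papers linked only to it."""
    seeds = {n["id"] for n in graph["nodes"] if n.get("is_seed")}
    other_seeds = seeds - {seed_id}

    # Map every node to the set of nodes it shares an edge with.
    neighbours: dict[str, set[str]] = {n["id"]: set() for n in graph["nodes"]}
    for edge in graph["edges"]:
        neighbours[edge["source"]].add(edge["target"])
        neighbours[edge["target"]].add(edge["source"])

    # A paper is exclusive when it touches this seed and no other seed.
    doomed = {seed_id}
    for node_id in neighbours[seed_id]:
        if node_id not in seeds and not (neighbours[node_id] & other_seeds):
            doomed.add(node_id)

    return {
        "nodes": [n for n in graph["nodes"] if n["id"] not in doomed],
        "edges": [e for e in graph["edges"]
                  if e["source"] not in doomed and e["target"] not in doomed],
    }
```

Returning a new dict rather than mutating the input mirrors the option of writing to a different file with -o.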

11. Remove Non-Seed Paper

Remove a single non-seed paper from a graph.

CODEBLOCK14

- Accepts paper ID or title substring (fuzzy match)
- Only works for non-seed papers (use remove-seed for seeds)
- Cleans up all incident edges
- Overwrites the graph file (use -o to save to a different file)

12. List Conferences

Show supported conference venues for filtering.

CODEBLOCK15

13. List LLM Providers

Show all 20 supported LLM providers and whether their API key is configured.

CODEBLOCK16

Workflow Guidelines

1. Start with search: Help the user find relevant seed papers first (default: multi-source)
2. Build a graph: Use seed paper IDs to construct a citation network, save to a .json file
3. Explore interactively: Use serve to open the graph in the browser, add papers, and convert papers to seeds
4. Analyze: Run topic analysis or generate a summary on the saved graph
5. Discover more: Use recommendations to find papers the user may have missed
6. Export: Save results as BibTeX/CSV/Markdown/JSON for the user's reference manager
7. Share: Generate a static HTML graph for sharing/viewing (export-html)

Output Format

All commands output JSON to stdout. When presenting results to the user:

- Show paper titles, authors, year, and citation counts in a readable format
- For large result sets, summarize the top results and mention the total count
- Paper IDs can be: S2 hex IDs, arXiv IDs (ARXIV:xxxx), DOIs (DOI:xxxx), paper titles, or PDF file paths
- The source field in results indicates where each paper came from (arxiv, semanticscholar, googlescholar, crossref, openalex, dblp)

Environment Variables

`S2_API_KEY` (Recommended)

Semantic Scholar API key. Free at semanticscholar.org/product/api.

- Purpose: Authenticates requests to the S2 API (paper search, citation data, recommendations)
- Why needed: Without it, S2 enforces strict rate limiting, and frequent calls return 429 errors
- Role: S2 is the fallback in the multi-source architecture. When PDF download or Google Scholar fails, the system falls back to S2. It is also the exclusive source for the recommend command.

LLM Provider API Key (Optional — any one of 20 providers)

The summary command supports 20 LLM providers. Set any one API key to enable LLM-powered summaries:

US: OPENAI_API_KEY, ANTHROPIC_API_KEY, GEMINI_API_KEY, DEEPSEEK_API_KEY, GROQ_API_KEY, TOGETHER_API_KEY, FIREWORKS_API_KEY, MISTRAL_API_KEY, XAI_API_KEY, PERPLEXITY_API_KEY, INLINECODE84

Chinese: ZHIPUAI_API_KEY (智谱), MOONSHOT_API_KEY (月之暗面), BAICHUAN_API_KEY (百川), YI_API_KEY (零一万物), DASHSCOPE_API_KEY (通义千问), ARK_API_KEY (豆包), MINIMAX_API_KEY, STEPFUN_API_KEY (阶跃星辰), SENSENOVA_API_KEY (商汤)

Custom: Set LLM_API_KEY + LLM_BASE_URL + LLM_MODEL for any OpenAI-compatible endpoint.

Additional environment variables:

- LLM_PROVIDER: Explicitly select LLM provider (alternative to --provider CLI flag)
- INLINECODE99: Override default model for the selected provider (alternative to --model CLI flag)
- INLINECODE101: Custom directory for PDF download cache (defaults to system temp)

Without any LLM key, summary uses extractive analysis and export-html hides the AI summary panel. All other commands are unaffected. Run llm-providers to check status.
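
The fallback rule above can be checked from the environment before calling summary. This is a rough sketch, assuming that the provider names given for --provider map to the key variables shown; the `PROVIDER_KEYS` mapping covers only a few providers and both helper names are hypothetical. The llm-providers command remains the authoritative check.

```python
import os

# Assumed name-to-variable mapping for a few of the providers listed above.
PROVIDER_KEYS = {
    "openai": "OPENAI_API_KEY",
    "deepseek": "DEEPSEEK_API_KEY",
    "zhipu": "ZHIPUAI_API_KEY",
    "moonshot": "MOONSHOT_API_KEY",
}

# A custom OpenAI-compatible endpoint needs all three of these.
CUSTOM_VARS = ("LLM_API_KEY", "LLM_BASE_URL", "LLM_MODEL")


def configured_providers(env=None) -> list[str]:
    """List the known providers whose API key variable is set and non-empty."""
    env = os.environ if env is None else env
    return [name for name, var in PROVIDER_KEYS.items() if env.get(var)]


def summary_mode(env=None) -> str:
    """Return "llm" if any provider or a full custom endpoint is configured, else "extractive"."""
    env = os.environ if env is None else env
    has_custom = all(env.get(var) for var in CUSTOM_VARS)
    return "llm" if configured_providers(env) or has_custom else "extractive"
```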

Cross-Tool Compatibility

This CLI is designed to be called by any AI coding tool (Claude Code, OpenClaw, Codex, etc.):

- All output is structured JSON on stdout
- Errors go to stderr
- Exit code 0 = success, 1 = argument error, 2 = runtime error
- No interactive input required; all parameters are passed via command-line flags
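
A calling tool can rely on this contract (JSON on stdout, errors on stderr, exit codes 0/1/2) with a thin wrapper. The sketch below is one possible shape; `run_cli` and `CliError` are hypothetical names, and the wrapper works with any command that follows the same contract.

```python
import json
import subprocess
import sys


class CliError(RuntimeError):
    """Raised when the CLI exits with a non-zero status."""


def run_cli(argv: list[str]):
    """Run a command, parse stdout as JSON, and turn non-zero exit codes into errors."""
    proc = subprocess.run(argv, capture_output=True, text=True)
    if proc.returncode == 1:
        raise CliError(f"argument error: {proc.stderr.strip()}")
    if proc.returncode != 0:
        raise CliError(f"runtime error (exit {proc.returncode}): {proc.stderr.strip()}")
    return json.loads(proc.stdout)


# Example call (not executed here):
# run_cli([sys.executable, "SKILLDIR/openpapergraphcli.py", "search", "graph neural networks", "--limit", "5"])
```

Passing the arguments as a list avoids shell quoting problems with paper titles that contain spaces.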
