Related Works Report Pipeline

Orchestrate a full related-works reporting workflow for: $ARGUMENTS

Overview

This workflow turns a small set of paper markdown files into a reproducible report:

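    paper md files
    -> step 1: extract Related Works + cited papers
    -> step 2: deduplicate cited papers
    -> sequential Tavily abstract lookup + local arXiv fetch
    -> normalized citation companion text
    -> final_related_works_report.md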

The output is not just the final report. Every intermediate artifact is written into the user-selected work folder so the process can be resumed, audited, or partially rerun.

Required Inputs

Before running, collect or infer:

- PAPER_MD_PATHS: one or more paper markdown paths
- WORKDIR: a user-specified working folder

If WORKDIR is missing, ask the user for it before doing substantial work.

All process outputs and the final report must be written under WORKDIR. Do not write process artifacts elsewhere.

Constants

- TAVILYONLY = true: Tavily is the only allowed search mechanism for title-to-arXiv matching
- TAVILYBATCHMODE = sequential: finish batch N before starting batch N+1
- NOSEARCHFALLBACK = true: if Tavily fails, record the failure; do not switch to arXiv search, arXiv API search, or another provider
- LOWCONTEXTABSTRACTMODE = true: write one JSON line per title, then render markdown from the JSONL
- CITATIONNORMALIZATIONREQUIRED = true: the final workflow must include a normalized-citation companion section so Part 1 can be aligned with dedup ids like P001
- FINALREPORTNAME = final_related_works_report.md

Work Folder Layout

Create and maintain these artifacts under WORKDIR:

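Artifact	Purpose
step1_extracted_related_works_and_citations.md	Per-source verbatim Related Works + citation tables
step1_normalized_related_works.md	Citation-normalized companion text (Phase 2B)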

Execution Rule

Run phases in order. Do not stop after a checkpoint unless:

- the user explicitly says to stop, or
- an input is missing and must be confirmed, or
- Tavily failures are severe enough that the user should decide whether to continue with partial results

Parallelism rules:

- Phase 1 extraction may use one clean-context sub agent per source paper.
- Phase 3 abstract lookup must run batches sequentially, not in parallel.

Phase 0: Initialize the Work Folder

1. Validate PAPER_MD_PATHS.
2. Create WORKDIR, WORKDIR/title_batches, and WORKDIR/abstract_batches.
3. Record the chosen source files in the first process artifact.
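The three Phase 0 steps could be sketched roughly as follows. This is only an illustration, not the skill's own code: the function name `init_workdir` and the manifest file name `step0_sources.md` are hypothetical, since the text above only says the chosen files go into "the first process artifact".

```python
from pathlib import Path


def init_workdir(paper_md_paths, workdir):
    """Validate inputs and create the Phase 0 folder layout.

    `step0_sources.md` is a hypothetical name for the first process artifact.
    """
    papers = [Path(p) for p in paper_md_paths]
    if not papers:
        raise ValueError("PAPER_MD_PATHS must contain at least one path")
    missing = [str(p) for p in papers if not p.is_file()]
    if missing:
        raise FileNotFoundError(f"missing paper markdown files: {missing}")

    # Create WORKDIR and the two batch subfolders in one pass.
    root = Path(workdir)
    for sub in ("title_batches", "abstract_batches"):
        (root / sub).mkdir(parents=True, exist_ok=True)

    # Record the chosen source files so a later run can be audited.
    manifest = root / "step0_sources.md"
    lines = ["Source papers", ""] + [f"- {p}" for p in papers]
    manifest.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return manifest
```

Because the folders are created with `exist_ok=True`, rerunning the step on an existing work folder leaves earlier artifacts in place, which fits the goal of resumable runs.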

Phase 1: Extract Related Works and cited papers

Use one clean-context sub agent per source markdown file.

Each sub agent must return:

- the verbatim Related Works section
- the papers cited inside that section
- enough citation metadata to later support normalization and deduplication:
  - citation token in the text ([12], Guo et al., 2017, etc.)
  - year
  - title
  - authors
  - raw reference text

Merge all outputs into WORKDIR/step1_extracted_related_works_and_citations.md.
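One way to hold the per-citation metadata listed above is a small record type. The sketch below is an assumption about shape only: the class name `CitationRecord`, its field names, and the table column order are hypothetical and are not taken from the skill's scripts.

```python
from dataclasses import dataclass


@dataclass
class CitationRecord:
    source_paper: str    # source markdown file the citation came from
    citation_token: str  # e.g. "[12]" or "Guo et al., 2017"
    year: str
    title: str
    authors: str
    raw_reference: str   # reference-list entry, copied verbatim

    def to_table_row(self) -> str:
        """Render the record as one markdown table row, escaping pipes."""
        cells = [self.citation_token, self.year, self.title,
                 self.authors, self.raw_reference]
        return "| " + " | ".join(c.replace("|", "\\|") for c in cells) + " |"
```

Keeping the raw reference text next to the parsed fields means a wrong parse can still be corrected later without going back to the source paper.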

Phase 2: Deduplicate cited papers

Build WORKDIR/step2_deduplicated_paper_list.md.

Rules:

- deduplicate conservatively by normalized title
- merge only when the works are clearly the same paper
- preserve source occurrences so every original citation can be traced back
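A minimal sketch of conservative deduplication, assuming "normalized title" means lowercasing, stripping accents and punctuation, and collapsing whitespace. The input keys (`title`, `source`, `token`) and the output shape are hypothetical. Only exact matches after normalization are merged, so near-duplicates stay separate for manual review.

```python
import re
import unicodedata


def normalize_title(title: str) -> str:
    """Lowercase, strip accents and punctuation, collapse whitespace."""
    text = unicodedata.normalize("NFKD", title)
    text = "".join(ch for ch in text if not unicodedata.combining(ch))
    text = re.sub(r"[^a-z0-9]+", " ", text.lower())
    return text.strip()


def deduplicate(citations):
    """Group citations whose normalized titles are identical.

    Returns entries in first-seen order, each with a dedup id such as P001
    and every (source, token) occurrence preserved for traceability.
    """
    by_key = {}
    for c in citations:
        key = normalize_title(c["title"])
        if not key:
            # Untitled references are never merged: give each its own key.
            key = f"__untitled_{len(by_key)}"
        entry = by_key.setdefault(key, {"title": c["title"], "occurrences": []})
        entry["occurrences"].append((c["source"], c["token"]))
    return [
        {"dedupid": f"P{i:03d}", **entry}
        for i, entry in enumerate(by_key.values(), start=1)
    ]
```

Assigning ids in first-seen order keeps the numbering stable as long as the source papers are processed in the same order.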

Phase 2B: Citation normalization companion text

Before final assembly, produce WORKDIR/step1_normalized_related_works.md.

Goal:

- keep the original Related Works text untouched in step1_extracted_related_works_and_citations.md
- create a companion version where in-text citations are rewritten to dedup ids like P001

Preferred format:

- numeric citations: [12] -> [P052]
- author-year citations: (Guo et al., 2017) -> [P095]
- grouped citations: rewrite each cited work individually when the mapping is unambiguous

Rules:

- only replace citations when the mapping from source citation token to dedup id is unambiguous
- if a citation is ambiguous, keep the original token and add a short note below that section
- do not overwrite the verbatim source text

This step exists because the final report should be easy to align with the deduplicated bibliography.
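The replacement rules for numeric citations can be sketched as below. This is an illustrative assumption, not the skill's implementation: the function name, the shape of `token_to_ids`, and the choice to render a grouped citation as adjacent brackets (`[P003][P010]`) are all hypothetical. Author-year citations would need a separate pattern.

```python
import re


def normalize_numeric_citations(text, token_to_ids):
    """Rewrite [12]-style citations to dedup ids where the mapping is unambiguous.

    `token_to_ids` maps a citation number (as a string) to the set of dedup
    ids it could refer to. Returns the rewritten text and the tokens left as-is.
    """
    unresolved = []

    def replace_group(match):
        numbers = [n.strip() for n in match.group(1).split(",")]
        resolved = []
        for n in numbers:
            ids = token_to_ids.get(n, set())
            if len(ids) != 1:
                # Ambiguous or unknown: keep the whole original token.
                unresolved.append(match.group(0))
                return match.group(0)
            resolved.append(f"[{next(iter(ids))}]")
        return "".join(resolved)

    rewritten = re.sub(r"\[(\d+(?:\s*,\s*\d+)*)\]", replace_group, text)
    return rewritten, unresolved
```

The function returns a new string and never touches its input, so the verbatim text stays intact. The `unresolved` list can feed the short note that the rules ask for under each section.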

Phase 3: arXiv abstracts via Tavily + local helper

Use the helper scripts stored inside this skill:

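- .cursor/skills/related-works-report-from-paper-mds/scripts/fetch_arxiv_abs.py
- .cursor/skills/related-works-report-from-paper-mds/scripts/jsonltoabstractbatchmd.py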

Search rules

- Tavily MCP only
- query shape: <paper title> arXiv
- preferred matches: abs, then html, then pdf
- convert html/pdf links to the canonical https://arxiv.org/abs/ form
- no fallback search provider
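The match preference and URL canonicalization could look roughly like this. It is a sketch under assumptions: the function name and regular expression are hypothetical, and the canonical form is taken to be `https://arxiv.org/abs/` followed by the arXiv identifier. It only ranks URLs; checking that the hit's title actually matches the queried paper is still required before a match is recorded.

```python
import re

# Matches new-style ids (2301.01234, optional version) and old-style ids (cs/0301001).
_ARXIV_URL = re.compile(
    r"^https?://(?:www\.)?arxiv\.org/(abs|html|pdf)/"
    r"((?:\d{4}\.\d{4,5}|[a-z\-]+(?:\.[A-Z]{2})?/\d{7})(?:v\d+)?)(?:\.pdf)?/?$"
)
_PREFERENCE = {"abs": 0, "html": 1, "pdf": 2}


def pick_arxiv_abs_url(result_urls):
    """Return the canonical abs URL of the best-ranked arXiv hit, or None."""
    best = None
    for url in result_urls:
        m = _ARXIV_URL.match(url.strip())
        if not m:
            continue  # not an arXiv paper URL; ignore it
        kind, arxiv_id = m.group(1), m.group(2)
        rank = _PREFERENCE[kind]
        if best is None or rank < best[0]:
            best = (rank, arxiv_id)
    return None if best is None else f"https://arxiv.org/abs/{best[1]}"
```

Returning `None` when nothing matches lets the caller leave the arXiv URL empty instead of guessing.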

Batch rules

- process batches sequentially
- inside a batch, process titles one by one
- if Tavily rate limits, wait and retry Tavily only
- if Tavily still fails, record the error in the JSONL and leave arXiv URL and Abstract empty
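The retry behaviour can be sketched with a stand-in search callable. Everything named here is hypothetical: Tavily is reached through MCP in this workflow, so `search`, the `RateLimited` exception, the status strings, and the backoff values are placeholders that only illustrate "retry the same provider, then record the failure".

```python
import time


class RateLimited(Exception):
    """Hypothetical signal that the search call hit a rate limit."""


def search_with_retry(search, query, max_attempts=3, wait_seconds=5.0, sleep=time.sleep):
    """Call `search(query)`, retrying the same provider on rate limits only.

    Returns (status, error, results). On failure the results are empty so the
    caller can record the error and leave the arXiv URL and abstract blank.
    """
    last_error = ""
    for attempt in range(1, max_attempts + 1):
        try:
            return "ok", "", search(query)
        except RateLimited as exc:
            last_error = f"rate limited: {exc}"
            if attempt < max_attempts:
                sleep(wait_seconds * attempt)  # simple linear backoff
        except Exception as exc:  # any other failure is recorded, not retried
            return "error", str(exc), []
    return "error", last_error, []
```

The function contains no second provider, so a persistent failure can only end in a recorded error. That matches the no-fallback rule.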

Low-context pattern

For each processed title, immediately append one JSON line to:

- WORKDIR/abstract_batches/batchXX_fetches.jsonl

Each line should include at least:

- dedupid
- inputtitle
- tavilystatus
- tavilyerror
- arxiv_url
- fetch
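Appending one record per title might look like the sketch below. The function name is hypothetical, and treating `fetch` as a JSON object holding the helper script's output is an assumption. The keys mirror the field list above.

```python
import json
from pathlib import Path


def append_fetch_record(jsonl_path, dedupid, inputtitle, tavilystatus,
                        tavilyerror="", arxiv_url="", fetch=None):
    """Append one JSON object as a single line and return the record written."""
    record = {
        "dedupid": dedupid,
        "inputtitle": inputtitle,
        "tavilystatus": tavilystatus,
        "tavilyerror": tavilyerror,
        "arxiv_url": arxiv_url,
        "fetch": fetch or {},  # assumed: metadata + abstract from the fetch helper
    }
    path = Path(jsonl_path)
    path.parent.mkdir(parents=True, exist_ok=True)
    # Append mode keeps earlier lines intact, so an interrupted batch can resume.
    with path.open("a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")
    return record
```

Writing each line as soon as a title is processed keeps results out of the conversation context and on disk, which is the point of the low-context pattern.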

After a batch completes, render markdown with:

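    python3 .cursor/skills/related-works-report-from-paper-mds/scripts/jsonltoabstractbatchmd.py \
      WORKDIR/abstract_batches/batchXX_fetches.jsonl \
      WORKDIR/abstract_batches/batchXX_results.md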

Phase 4: Final assembly

Use the final builder script stored inside this skill:

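    python3 .cursor/skills/related-works-report-from-paper-mds/scripts/buildfinalrelatedworksreport.py \
      WORKDIR/step1_extracted_related_works_and_citations.md \
      WORKDIR/step2_deduplicated_paper_list.md \
      WORKDIR/abstract_batches \
      WORKDIR/final_related_works_report.md \
      WORKDIR/step1_normalized_related_works.md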

The final report should contain:

- Summary
- Part 1: Related Works original text
- Part 1B: citation-normalized companion text
- Part 2: BibTeX-style entries with retrieved abstracts when available

Key Rules

- Never fabricate a paper match or abstract.
- Never use non-Tavily search when resolving titles to arXiv.
- Keep all process artifacts inside WORKDIR.
- Prefer scripts inside this skill over ad hoc in-message code.
- Preserve source-paper order in Part 1.
- Preserve dedup order from step2 in Part 2.

Utility Scripts

- scripts/fetch_arxiv_abs.py: compact metadata + abstract extraction from a known arXiv URL
- scripts/jsonltoabstractbatchmd.py: render one batch markdown from JSONL
- scripts/buildfinalrelatedworksreport.py: assemble the final report from workdir artifacts

Additional Resources

- For copy-paste invocations and expected WORKDIR contents, see examples.md

Example Invocation

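    /related-works-report-from-paper-mds \
      0refs/papermds/2025ConfidenceVLA.md 0refs/papermds/2025SAFE.md 0refs/papermds/2025FAILDetect.md --workdir 0docs/relatedworksreportrun_02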
