ArXiv Paper Processor

Use this skill for per-paper manual summarization, with optional batch artifact download.

- Single-paper mode: process one paper directory (e.g. <run_dir>/<arxiv_id>/).
- Batch predownload mode: download artifacts for many paper directories under one run directory before writing any summaries.

Language Parameter

- Use a workflow language parameter (for example English or Chinese) and apply it manually.
- The per-paper summary.md must be written in the selected language.
- If download scripts are called directly, pass --language <LANG> for traceability.
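The following is a minimal sketch of how a driver could forward the workflow language to the single-paper download scripts. The script paths and the --paper-dir, --language, and --force flags come from the commands in this document. The helper names, and the assumption that commands run from the skill's root directory, are hypothetical.

```python
import subprocess
from pathlib import Path

# Script paths as they appear in the commands in this skill (assumes cwd = skill root).
SOURCE_SCRIPT = "scripts/download_arxiv_source.py"
PDF_SCRIPT = "scripts/download_arxiv_pdf.py"


def build_single_paper_cmd(script: str, paper_dir: Path, language: str,
                           force: bool = False) -> list[str]:
    """Hypothetical helper: build the argv for one single-paper download script."""
    cmd = ["python3", script, "--paper-dir", str(paper_dir), "--language", language]
    if force:
        cmd.append("--force")  # ignore reusable local artifacts and download again
    return cmd


def run_download(script: str, paper_dir: Path, language: str) -> int:
    """Run one download script; it fetches artifacts only and never writes summary.md."""
    return subprocess.run(build_single_paper_cmd(script, paper_dir, language)).returncode
```

Passing argv as a list avoids shell quoting problems when paper directories contain spaces.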

Core Principle

Scripts only fetch artifacts. The model performs reading and writing.

Non-negotiable Constraint

- Do not generate summary.md by script-based snippet extraction, regex harvesting, or template autofill.
- Do not use Python/shell scripts to auto-compose section text from abstract/introduction fragments.
- Scripts in this skill are only for artifact download (source/pdf) and trace logs.
- The final summary.md must come from model-side reading and synthesis of the paper content.

Optional Batch Artifact Download (Many Papers)

Use this first when Stage B has many papers:

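```bash
python3 scripts/download_papers_batch.py \
  --run-dir /path/to/run \
  --artifact source_then_pdf \
  --max-workers 3 \
  --min-interval-sec 5 \
  --language English
```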

Key behavior:

- Supports --artifact source, --artifact pdf, or --artifact source_then_pdf (default).
- Supports concurrency (--max-workers) and safe throttling/retry (--min-interval-sec, plus retry arguments).
- Uses run-local throttle state by default (<run_dir>/.runtime/arxiv_download_state.json) to reduce the risk of HTTP 429 (Too Many Requests) responses.
- Skips papers that already have usable source/source_extract/*.tex or an existing source/paper.pdf (unless --force is set).
- Resume-friendly: if a paper already has a completed summary.md, you can skip that paper's summary-writing step.
- Writes the batch log to <run_dir>/download_batch_log.json by default.
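The min-interval throttling and retry behavior above can be sketched as follows. This is not the batch script's implementation. The JSON layout of the state file (a single last_request_ts field), the exponential backoff schedule, and all names are assumptions for illustration. Concurrent workers would also need a lock around the state file, which this sketch omits.

```python
import json
import time
from pathlib import Path
from typing import Callable


class MinIntervalThrottle:
    """Enforce a minimum gap between requests, persisting the last request time.

    Hypothetical state layout: {"last_request_ts": <unix seconds>}.
    """

    def __init__(self, state_path: Path, min_interval_sec: float,
                 clock: Callable[[], float] = time.time,
                 sleep: Callable[[float], None] = time.sleep):
        self.state_path = state_path
        self.min_interval_sec = min_interval_sec
        self.clock = clock
        self.sleep = sleep

    def _load_last(self) -> float:
        try:
            return float(json.loads(self.state_path.read_text())["last_request_ts"])
        except (FileNotFoundError, KeyError, TypeError, ValueError):
            return 0.0  # no usable state yet: the first request may go immediately

    def wait(self) -> float:
        """Sleep until min_interval_sec has passed since the last request; return the delay."""
        delay = max(0.0, self._load_last() + self.min_interval_sec - self.clock())
        if delay > 0:
            self.sleep(delay)
        self.state_path.parent.mkdir(parents=True, exist_ok=True)
        self.state_path.write_text(json.dumps({"last_request_ts": self.clock()}))
        return delay


def backoff_delays(max_retries: int, base_sec: float = 5.0) -> list[float]:
    """Hypothetical retry schedule after an HTTP 429: base, 2*base, 4*base, ..."""
    return [base_sec * 2 ** attempt for attempt in range(max_retries)]
```

Keeping the timestamp in a file under the run directory means that a resumed run still respects the interval set by the previous run.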

Step 1: Download Source (Preferred)

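```bash
python3 scripts/download_arxiv_source.py \
  --paper-dir /path/to/run/2602.00528 \
  --language English
```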

This writes:

- source/source_bundle.bin
- source/source_extract/
- source/download_source_log.json

If usable source already exists and --force is not set, the script reuses local artifacts.

Step 2: If Needed, Download PDF

```bash
python3 scripts/download_arxiv_pdf.py \
  --paper-dir /path/to/run/2602.00528 \
  --language English
```

This writes:

- source/paper.pdf
- source/download_pdf_log.json

If the PDF already exists and --force is not set, the script reuses local artifacts.
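Both reuse rules, and the batch skip rule, can be approximated with a local check before any network call. This sketch makes two assumptions: "usable source" means at least one .tex file anywhere under source/source_extract/, and an empty paper.pdf counts as missing. The real scripts may apply stricter checks, and the helper names are hypothetical.

```python
from pathlib import Path


def source_reusable(paper_dir: Path) -> bool:
    """Assumption: source is usable if any .tex file exists under source_extract/."""
    extract_dir = paper_dir / "source" / "source_extract"
    return extract_dir.is_dir() and any(extract_dir.rglob("*.tex"))


def pdf_reusable(paper_dir: Path) -> bool:
    pdf = paper_dir / "source" / "paper.pdf"
    return pdf.is_file() and pdf.stat().st_size > 0  # treat an empty file as missing


def batch_should_skip(paper_dir: Path, force: bool = False) -> bool:
    """Documented batch rule: skip if usable .tex or paper.pdf exists, unless --force."""
    return not force and (source_reusable(paper_dir) or pdf_reusable(paper_dir))
```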

Step 3: Model Reads and Summarizes

1. If summary.md already exists and follows the required format, skip this paper and mark it complete.
2. Read metadata.md first.
3. If source/source_extract/ already exists with readable .tex files, use it directly.
4. Otherwise, if source/paper.pdf already exists, use the PDF directly.
5. If neither exists, run the download scripts (single-paper scripts or the batch script) first.
6. Manually write summary.md in the same paper directory, in the selected language.
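The decision order in steps 1 and 3 to 5 can be expressed as a small function that only decides what to read. The summary itself is still written by the model. This sketch makes simplifying assumptions: "follows the required format" is reduced to a check for the snapshot heading, and the returned labels are hypothetical.

```python
from pathlib import Path


def next_action(paper_dir: Path) -> str:
    """Return 'skip', 'read_source', 'read_pdf', or 'download_first' (hypothetical labels)."""
    summary = paper_dir / "summary.md"
    # Step 1 (simplified): treat summary.md as complete if it has the snapshot section.
    if summary.is_file() and "## 1. Paper Snapshot" in summary.read_text(encoding="utf-8"):
        return "skip"
    # Step 2, reading metadata.md, is manual and not modeled here.
    extract_dir = paper_dir / "source" / "source_extract"
    if extract_dir.is_dir() and any(extract_dir.rglob("*.tex")):
        return "read_source"  # step 3: prefer LaTeX source
    if (paper_dir / "source" / "paper.pdf").is_file():
        return "read_pdf"  # step 4: fall back to the PDF
    return "download_first"  # step 5: run the download scripts
```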

Do not rely on rule-based auto summarization.
Do not rely on auto-extracted snippets as the primary writing basis.

Quality Requirement

- Every section should include paper-specific details that are traceable to full-text reading.
- Sections 4, 5, and 10 should reflect concrete method and evaluation details, not generic wording.
- If key details are unclear in the source, explicitly note the uncertainty instead of guessing.
- Match the detail level shown in references/summary-example-en.md and references/summary-example-zh.md.
- If your draft is clearly shorter or less specific than the examples, expand it before finishing.
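A rough length comparison against the reference example can catch drafts that are clearly too short. It cannot judge specificity, which still requires reading the draft. This sketch counts non-whitespace characters so that English and Chinese text are measured alike. The 0.7 ratio, the mapping from language to example file, and the function names are hypothetical. The check only inspects a manually written draft and generates no text.

```python
from pathlib import Path


def visible_length(text: str) -> int:
    """Count non-whitespace characters (works for both English and Chinese text)."""
    return sum(1 for ch in text if not ch.isspace())


def looks_too_short(draft: str, example: str, min_ratio: float = 0.7) -> bool:
    """Hypothetical heuristic: flag drafts much shorter than the reference example."""
    return visible_length(draft) < min_ratio * visible_length(example)


def check_draft(paper_dir: Path, language: str = "English") -> bool:
    # Assumes the skill root is the working directory so references/ resolves.
    suffix = "zh" if language == "Chinese" else "en"
    example = Path("references") / f"summary-example-{suffix}.md"
    draft = (paper_dir / "summary.md").read_text(encoding="utf-8")
    return looks_too_short(draft, example.read_text(encoding="utf-8"))
```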

Required Output

- <paper_dir>/summary.md in the fixed section format.
- Pay special attention to section ## 10. Brief Conclusion: write a 3-4 sentence mini-conclusion that covers contribution, method, evaluation setup, and results with paper-specific details.
- In section ## 1. Paper Snapshot, use exact keys: ArXiv ID, Title, Authors, Publish date, Primary category, Reading basis.
- Do not use key variants such as Reading source, Author list, Published on, or lowercase key names.

See references/summary-format.md for exact section requirements.

Related Skills

This skill is a sub-skill of arxiv-summarizer-orchestrator.

Pipeline position:

1. Step 1 (upstream): arxiv-search-collector produces the selected paper directories and metadata.
2. Step 2 (this skill): arxiv-paper-processor downloads artifacts and writes one summary.md per paper.
3. Step 3 (downstream): arxiv-batch-reporter uses these per-paper summaries to generate the final collection report.

Use this skill together with Step 1 and Step 3 for full end-to-end execution.
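Before handing off to arxiv-batch-reporter, it can help to list which paper directories still lack a summary. This sketch assumes three things: each paper directory is a direct child of the run directory, each contains the metadata.md produced upstream, and hidden directories such as .runtime should be ignored. The function name is hypothetical.

```python
from pathlib import Path


def pending_papers(run_dir: Path) -> list[str]:
    """Return names of paper directories that have metadata.md but no summary.md."""
    pending = []
    for child in sorted(run_dir.iterdir()):
        if not child.is_dir() or child.name.startswith("."):
            continue  # skip plain files and hidden dirs such as .runtime/
        if (child / "metadata.md").is_file() and not (child / "summary.md").is_file():
            pending.append(child.name)
    return pending
```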
