ArXiv Paper Processor
Use this skill for per-paper manual summarization, with optional batch artifact download.
- Single-paper mode: process one paper directory (e.g. `<run_dir>/<arxiv_id>/`).
- Batch predownload mode: process many paper directories under one run directory before writing summaries.
Language Parameter
- Use a workflow language parameter (for example, English or Chinese) and apply it manually.
- The per-paper `summary.md` must be written in the selected language.
- If you call the download scripts directly, pass `--language <LANG>` for traceability.
Core Principle
Scripts only fetch artifacts. The model performs reading and writing.
Non-negotiable Constraint
- Do not generate `summary.md` by script-based snippet extraction, regex harvesting, or template autofill.
- Do not use Python/shell scripts to auto-compose section text from abstract/introduction fragments.
- Scripts in this skill are only for artifact download (`source`/`pdf`) and trace logs.
- The final `summary.md` must come from model-side reading and synthesis of the paper content.
Optional Batch Artifact Download (Many Papers)
Use this first when Stage B has many papers:
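```bash
python3 scripts/download_papers_batch.py \
  --run-dir /path/to/run \
  --artifact source_then_pdf \
  --max-workers 3 \
  --min-interval-sec 5 \
  --language English
```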
Key behavior:
- Supports `--artifact source`, `--artifact pdf`, or `--artifact source_then_pdf` (default).
- Supports concurrency (`--max-workers`) and safe throttling/retry (`--min-interval-sec` and the retry arguments).
- By default, keeps throttle state local to the run (`<run_dir>/.runtime/arxiv_download_state.json`) to reduce the risk of HTTP 429 (Too Many Requests) responses.
- Skips papers that already have usable `source/source_extract/*.tex` files or an existing `source/paper.pdf` (unless `--force` is set).
- Resume-friendly: if a paper already has a completed `summary.md`, you can skip that paper's summary-writing step.
- Writes the batch log to `<run_dir>/download_batch_log.json` by default.
Step 1: Download Source (Preferred)
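```bash
python3 scripts/download_arxiv_source.py \
  --paper-dir /path/to/run/2602.00528 \
  --language English
```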
This writes:
- `source/source_bundle.bin`
- `source/source_extract/`
- `source/download_source_log.json`
If usable source already exists and --force is not set, the script reuses local artifacts.
Step 2: If Needed, Download PDF
```bash
python3 scripts/download_arxiv_pdf.py \
  --paper-dir /path/to/run/2602.00528 \
  --language English
```
This writes:
- `source/paper.pdf`
- `source/download_pdf_log.json`
If PDF already exists and --force is not set, the script reuses local artifacts.
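Step 2 is needed only when Step 1 produced no readable `.tex` files. The following is a minimal bash sketch of that test. It assumes the files sit directly under `source/source_extract/`, which is the glob the batch skip rule uses. `needs_pdf` is a hypothetical helper, not one of this skill's scripts.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with this skill).
# Returns 0 (true) when no readable .tex file sits directly under
# <paper_dir>/source/source_extract/, i.e. the Step 2 PDF download is needed.
needs_pdf() {
  local paper_dir="$1" f
  for f in "$paper_dir"/source/source_extract/*.tex; do
    # Without nullglob, an unmatched glob is kept as a literal string,
    # so check that the path really is a readable regular file.
    if [[ -f "$f" && -r "$f" ]]; then
      return 1
    fi
  done
  return 0
}
```

Use it as `if needs_pdf "$paper_dir"; then ...; fi` around the Step 2 command, so papers with usable LaTeX source never trigger a PDF download. `.tex` files nested in subdirectories are not counted by this sketch.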
Step 3: Model Reads and Summarizes
1. If `summary.md` already exists and follows the required format, skip this paper and mark it complete.
2. Read `metadata.md` first.
3. If `source/source_extract/` already exists with readable `.tex` files, use it directly.
4. Otherwise, if `source/paper.pdf` already exists, use the PDF directly.
5. If neither exists, run the download scripts (the single-paper scripts or the batch script) first.
6. Manually write `summary.md` in the same paper directory, in the selected language.
Do not rely on rule-based auto summarization.
Do not rely on auto-extracted snippets as the primary writing basis.
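The checklist above follows a fixed decision order, which a small bash function can make explicit. This is a minimal sketch with two assumptions: a non-empty `summary.md` counts as existing, and top-level `.tex` files under `source/source_extract/` count as readable source. `reading_basis` is a hypothetical helper. It only picks an input; reading the paper and writing `summary.md` stay model-side.

```bash
#!/usr/bin/env bash
# Hypothetical helper (not shipped with this skill). Prints the next action
# for one paper directory by checking which files exist; it never reads
# paper content and never writes summary.md.
#   summary  -> summary.md exists; confirm its format by reading it, then skip
#   tex      -> read the .tex files under source/source_extract/
#   pdf      -> read source/paper.pdf
#   download -> run the download scripts first
reading_basis() {
  local paper_dir="$1" f
  if [[ -s "$paper_dir/summary.md" ]]; then
    echo "summary"
    return 0
  fi
  for f in "$paper_dir"/source/source_extract/*.tex; do
    # An unmatched glob stays literal, so require a real readable file.
    if [[ -f "$f" && -r "$f" ]]; then
      echo "tex"
      return 0
    fi
  done
  if [[ -f "$paper_dir/source/paper.pdf" ]]; then
    echo "pdf"
  else
    echo "download"
  fi
}
```

To list the next action for every paper in a run, assume each non-hidden subdirectory of the run directory is a paper directory and run `for d in /path/to/run/*/; do echo "$(reading_basis "$d") $d"; done`. The `*/` glob does not match the hidden `.runtime/` directory. A `summary` result still means you must confirm the existing file follows the required format before marking the paper complete.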
Quality Requirement
- Every section should include paper-specific details that are traceable to a full-text reading.
- Sections 4, 5, and 10 should reflect concrete method and evaluation details, not generic wording.
- If key details are unclear in the source, explicitly note the uncertainty instead of guessing.
- Match the level of detail shown in `references/summary-example-en.md` and `references/summary-example-zh.md`.
- If your draft is clearly shorter or less specific than the examples, expand it before finishing.
Required Output
- `<paper_dir>/summary.md` in the fixed section format.
- Pay special attention to section `## 10. Brief Conclusion`: write a 3-4 sentence mini-conclusion that covers the contribution, method, evaluation setup, and results, with paper-specific details.
- In section `## 1. Paper Snapshot`, use these exact keys: `ArXiv ID`, `Title`, `Authors`, `Publish date`, `Primary category`, `Reading basis`.
- Do not use key variants such as `Reading source`, `Author list`, `Published on`, or lowercase key names.
See references/summary-format.md for exact section requirements.
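Because the Snapshot keys must match exactly, a read-only mechanical check can catch key drift after the summary has been written by hand. This is a minimal bash sketch. It assumes the section starts with a line `## 1. Paper Snapshot` and ends at the next `## ` heading. `check_snapshot_keys` is a hypothetical helper. It uses plain substring matching, so it cannot confirm the key layout defined in `references/summary-format.md`, and it writes nothing.

```bash
#!/usr/bin/env bash
# Hypothetical read-only check (not shipped with this skill). It inspects an
# already hand-written summary.md and never writes or fills in any text.
# Returns 0 when section "## 1. Paper Snapshot" contains every exact key and
# none of the disallowed variants; prints one line per problem otherwise.
check_snapshot_keys() {
  local summary="$1" section key bad status=0
  # Keep only the lines after "## 1. Paper Snapshot" and before the next "## " heading.
  section="$(awk '/^## 1\. Paper Snapshot/ {inside=1; next} /^## / {inside=0} inside' "$summary")"
  # Case-sensitive substring match: a key written only in lowercase is reported as missing.
  for key in "ArXiv ID" "Title" "Authors" "Publish date" "Primary category" "Reading basis"; do
    if ! grep -qF -- "$key" <<<"$section"; then
      echo "missing key: $key"
      status=1
    fi
  done
  for bad in "Reading source" "Author list" "Published on"; do
    if grep -qF -- "$bad" <<<"$section"; then
      echo "disallowed variant: $bad"
      status=1
    fi
  done
  return "$status"
}
```

Run it as `check_snapshot_keys /path/to/run/2602.00528/summary.md`. A non-zero exit status means the Snapshot section needs a manual fix.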
Related Skills
This skill is a sub-skill of arxiv-summarizer-orchestrator.
Pipeline position:
1. Step 1 (upstream): `arxiv-search-collector` produces the selected paper directories and metadata.
2. Step 2 (this skill): `arxiv-paper-processor` downloads artifacts and writes one `summary.md` per paper.
3. Step 3 (downstream): `arxiv-batch-reporter` uses these per-paper summaries to generate the final collection report.
Use this skill together with Step 1 and Step 3 for full end-to-end execution.