ArXiv Summarizer Orchestrator

Run the full pipeline by composing three sub-skills.

Sub-skill Order

1. arxiv-search-collector
2. arxiv-paper-processor
3. arxiv-batch-reporter

Workflow Parameters

- language: manual language parameter used by all stages. Defaults to English when omitted.
- paper_processing_mode: subagent_parallel or serial.
- max_parallel_papers: defaults to 5 when paper_processing_mode=subagent_parallel.

Workflow

Stage A: Collection Setup + Query Retrieval

1. Initialize one run with arxiv-search-collector/scripts/init_collection_run.py.
2. The model generates multiple focused queries from the original topic and writes a minimal query_plan.json (label + query only).
3. Run arxiv-search-collector/scripts/fetch_queries_batch.py with the plan file (recommended).
4. (Optional fallback) Call arxiv-search-collector/scripts/fetch_query_metadata.py manually to fetch queries one by one.
5. The model reads each indexed query list and decides which indexes to keep.
6. Merge the selected items with arxiv-search-collector/scripts/merge_selected_papers.py.
7. If relevance or coverage is still not good enough, iterate Stage A:
   - generate another query plan with new labels,
   - fetch again,
   - re-merge with --incremental and an updated selection-json,
   - set weak labels to an empty keep list ([]) to explicitly drop them.
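The plan and selection files described above can be sketched as follows. This is only an illustration: the source says the plan holds label + query and that weak labels get an empty keep list, but the exact JSON layout (a list of objects, and a label-to-indexes mapping for the selection) is an assumption, as are the helper names write_query_plan and drop_weak_labels and the rule that labels must be unique.

```python
import json
from pathlib import Path


def write_query_plan(path, queries):
    """Write a minimal plan: one object per query, holding only label and query.

    The list-of-objects layout is an assumed schema, not the collector's documented one.
    """
    plan = [{"label": label, "query": query} for label, query in queries]
    labels = [item["label"] for item in plan]
    if len(labels) != len(set(labels)):
        # Assumption: each label identifies exactly one query in a plan.
        raise ValueError("labels must be unique within a plan")
    Path(path).write_text(
        json.dumps(plan, indent=2, ensure_ascii=False), encoding="utf-8"
    )
    return plan


def drop_weak_labels(selection, weak_labels):
    """Return a copy of the selection with each weak label mapped to an empty keep list."""
    updated = dict(selection)
    for label in weak_labels:
        updated[label] = []
    return updated
```

Check the real file formats against the collector scripts before relying on this layout.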

Pass --language <LANG> to collector scripts so all generated markdown files in Stage A follow the selected language.
Use serial query fetch in Stage A with conservative controls (for example --min-interval-sec 5, --retry-max 4).
Default collector settings already include retries/backoff and run-local throttle state (<run_dir>/.runtime/arxiv_api_state.json), so manual tuning is usually unnecessary.
Prefer cache reuse (no --force) unless query parameters changed or data refresh is required.
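For intuition, serial fetching with a minimum interval and bounded retries might look like the sketch below. It does not reproduce the collector scripts: fetch_with_throttle is a hypothetical helper, the exponential backoff schedule is an assumption, and treating retry_max as the number of retries after the first attempt is also an assumption about what --retry-max means.

```python
import time


def fetch_with_throttle(fetch, queries, min_interval_sec=5.0, retry_max=4,
                        sleep=time.sleep, clock=time.monotonic):
    """Call fetch(query) one query at a time, spacing calls and retrying failures.

    sleep and clock are injectable so the pacing can be tested without waiting.
    """
    results = {}
    last_call = None
    for query in queries:
        for attempt in range(retry_max + 1):
            if last_call is not None:
                # Enforce the minimum gap between consecutive requests.
                wait = min_interval_sec - (clock() - last_call)
                if wait > 0:
                    sleep(wait)
            last_call = clock()
            try:
                results[query] = fetch(query)
                break
            except Exception:
                if attempt == retry_max:
                    raise  # retries exhausted, surface the error
                # Assumed backoff schedule: double the delay on each retry.
                sleep(min_interval_sec * (2 ** attempt))
    return results
```

Because the defaults already include retries and throttling, a wrapper like this is only relevant if you call the API outside the provided scripts.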

Output: one run directory with per-paper metadata subdirectories.

Stage B: Per-paper Artifact Download + Manual Summary

For each paper directory, invoke sub-skill arxiv-paper-processor once and let that skill produce <paper_dir>/summary.md.

Recommended pre-step for many papers:

1. Run one batch artifact download before per-paper reading:

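```bash
python3 arxiv-paper-processor/scripts/download_papers_batch.py \
  --run-dir /path/to/run \
  --artifact source_then_pdf \
  --max-workers 3 \
  --min-interval-sec 5 \
  --language <LANG>
```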

Per-paper execution steps (inside arxiv-paper-processor):

1. If <paper_dir>/summary.md already exists and is complete, skip this paper.
2. If usable source (source/source_extract/*.tex) or a PDF (source/paper.pdf) already exists, skip the download.
3. If artifacts are missing, download the source with arxiv-paper-processor/scripts/download_arxiv_source.py.
4. If the source is unusable, download the PDF with arxiv-paper-processor/scripts/download_arxiv_pdf.py.
5. The model reads the content and manually writes <paper_dir>/summary.md following the reference format, in the selected language.
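The skip and download checks above amount to a small decision function. The sketch below is illustrative only: next_action is a hypothetical helper, and "complete" is approximated as "the file exists and is non-empty", which is an assumption since the source does not define completeness.

```python
from pathlib import Path


def next_action(paper_dir):
    """Decide what to do for one paper directory: skip, read, or download_source."""
    paper_dir = Path(paper_dir)

    summary = paper_dir / "summary.md"
    # Assumption: a non-empty summary.md stands in for a "complete" summary.
    if summary.is_file() and summary.stat().st_size > 0:
        return "skip"

    extract_dir = paper_dir / "source" / "source_extract"
    has_tex = extract_dir.is_dir() and any(extract_dir.glob("*.tex"))
    has_pdf = (paper_dir / "source" / "paper.pdf").is_file()
    if has_tex or has_pdf:
        return "read"  # artifacts already present, no download needed

    # Nothing usable yet: try the source first; the PDF is the fallback
    # if the downloaded source turns out to be unusable.
    return "download_source"
```

Writing the summary itself stays a manual model step; this function only selects which step comes next.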

Parallel strategy for many papers:

- Default: paper_processing_mode=subagent_parallel with max_parallel_papers=5.
- Optional: paper_processing_mode=serial to process one paper at a time.
- In parallel mode, run multiple arxiv-paper-processor instances in batches; the number of concurrent papers must not exceed max_parallel_papers.
- Wait for one batch to finish before starting the next batch.
- In serial mode, run exactly one arxiv-paper-processor instance at a time.
- Each subagent worker should own exactly one paper directory to avoid file conflicts.
- Do not use scripts to auto-compose summary text; scripts are download-only tools.
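The batching rule can be sketched as below. Threads stand in for subagents here, which is an assumption made purely for illustration; batches and process_papers are hypothetical helpers, and process_one represents one arxiv-paper-processor invocation on a single paper directory.

```python
from concurrent.futures import ThreadPoolExecutor


def batches(items, size):
    """Split items into consecutive batches of at most `size` elements."""
    if size < 1:
        raise ValueError("size must be at least 1")
    return [items[i:i + size] for i in range(0, len(items), size)]


def process_papers(paper_dirs, process_one, mode="subagent_parallel",
                   max_parallel_papers=5):
    """Run process_one over every paper directory, batch by batch."""
    size = 1 if mode == "serial" else max_parallel_papers
    results = {}
    for batch in batches(list(paper_dirs), size):
        # Leaving the with-block waits for the whole batch to finish,
        # so the next batch never overlaps with this one.
        with ThreadPoolExecutor(max_workers=size) as pool:
            for paper_dir, outcome in zip(batch, pool.map(process_one, batch)):
                results[paper_dir] = outcome
    return results
```

Each worker receives exactly one paper directory, which matches the one-directory-per-worker rule above.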

Output: all paper directories contain summary.md.

Stage C: Bundle + Final Hierarchical Report

1. Run arxiv-batch-reporter/scripts/collect_summaries_bundle.py --language <LANG>.
2. The model reads summaries_bundle.md and writes collection_report_template.md in the base directory.
3. In the template, each paper leaf entry must include one standalone placeholder line: {{ARXIV_BRIEF:<arxiv_id>}}.
4. Run arxiv-batch-reporter/scripts/render_collection_report.py to generate the final collection_report.md.
5. Do not manually paraphrase per-paper conclusion lines in the final report; they must come from section 10 of each paper's summary.md via script injection.
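To make the placeholder rule concrete, here is a minimal sketch of standalone-line substitution. It is not the logic of render_collection_report.py: render_template is a hypothetical function, the briefs mapping is assumed to have been extracted from each summary.md beforehand, and failing on an unknown ID is an assumed policy.

```python
import re

# Matches a line that contains nothing but one placeholder (plus optional whitespace).
PLACEHOLDER = re.compile(r"^\s*\{\{ARXIV_BRIEF:([^{}\s]+)\}\}\s*$")


def render_template(template_text, briefs):
    """Replace each standalone placeholder line with the brief for that arXiv ID."""
    rendered = []
    for line in template_text.splitlines():
        match = PLACEHOLDER.match(line)
        if match is None:
            rendered.append(line)  # ordinary line, keep as written
            continue
        arxiv_id = match.group(1)
        if arxiv_id not in briefs:
            raise KeyError(f"no brief available for {arxiv_id}")
        rendered.append(briefs[arxiv_id])
    return "\n".join(rendered)
```

In this sketch a placeholder embedded in the middle of a sentence is left untouched, which is one reason to keep each placeholder on its own line.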

If language is non-English (for example Chinese), all intermediate markdown files and final reports should follow that language.

Periodic Scheduling

This orchestrator is suitable for cron/scheduled execution in OpenClaw:

- Frequency examples: daily, weekly, monthly.
- For rolling windows, use lookback (1d, 7d, 30d) when initializing runs.
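As a rough illustration of what a rolling window means, a lookback value such as 7d can be turned into a start and end time. How init_collection_run.py actually interprets lookback is not stated in the source; lookback_window is a hypothetical helper that handles only the day-based values shown above.

```python
import re
from datetime import datetime, timedelta, timezone


def lookback_window(lookback, now=None):
    """Convert a day-based lookback such as '7d' into a (start, end) pair in UTC."""
    match = re.fullmatch(r"(\d+)d", lookback)
    if match is None:
        raise ValueError(f"unsupported lookback value: {lookback!r}")
    end = now or datetime.now(timezone.utc)
    start = end - timedelta(days=int(match.group(1)))
    return start, end
```

A daily schedule would pair naturally with 1d, a weekly one with 7d, and a monthly one with 30d.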

Output Layout

INLINECODE53

- task_meta.json, task_meta.md
- query_results/, query_selection/
- <arxiv_id>/metadata.md + downloaded source/pdf + summary.md
- summaries_bundle.md
- collection_report_template.md
- final rendered collection report (e.g. collection_report.md)

Use references/workflow-checklist.md as the execution checklist.

Related Skills

This is the top-level orchestration skill.

Before using it, install and enable these three sub-skills:

- arxiv-search-collector
- arxiv-paper-processor
- arxiv-batch-reporter

Execution order inside this orchestrator:

1. arxiv-search-collector (Stage A)
2. arxiv-paper-processor (Stage B)
3. arxiv-batch-reporter (Stage C)
