Citation Diversifier (budget-as-constraints) [NO NEW FACTS]

Purpose: fix a common survey failure mode:

- the draft reads under-cited (or reuses the same few citations everywhere)
- the pipeline fails the global unique-citation gate

This skill does not change prose by itself.
It produces a constraint sheet: output/CITATION_BUDGET_REPORT.md.

Inputs

- `output/DRAFT.md`
- `outline/outline.yml` (H3 ids/titles; used to allocate budgets per subsection)
- `outline/writer_context_packs.jsonl` (source of allowed_bibkeys_{selected,mapped,chapter,global} per H3)
- `citations/ref.bib`

Output

- `output/CITATION_BUDGET_REPORT.md`

Non-negotiables (NO NEW FACTS)

- Only propose citation keys that exist in citations/ref.bib.
- Only propose keys that are in-scope for the target H3 (prefer subsection-first scope; use chapter/global only when truly cross-cutting).
- Do not propose “padding citations” that would require adding new claims or new numbers.
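
The first rule can be checked mechanically before a budget is written. The following is a minimal sketch, not the pipeline's own validator: it extracts entry keys from BibTeX text with a simple regular expression and reports proposed keys that are absent. The function names and the sample bibliography are hypothetical and exist only for illustration.

```python
import re

# Matches the start of a BibTeX entry such as "@article{smith2020," and captures the key.
# Simplification (assumption): one entry key per "@type{key," header; no full BibTeX parsing.
BIB_ENTRY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def bib_keys(bib_text: str) -> set:
    """Return every entry key found in a BibTeX source string."""
    return set(BIB_ENTRY.findall(bib_text))


def unknown_keys(proposed: list, bib_text: str) -> list:
    """Return proposed keys that are missing from the bibliography, in input order."""
    known = bib_keys(bib_text)
    return [key for key in proposed if key not in known]


# Hypothetical stand-in for the contents of citations/ref.bib.
sample_bib = """
@article{smith2020, title={A}, year={2020}}
@inproceedings{lee2021,
  title={B}
}
"""

print(unknown_keys(["smith2020", "ghost1999", "lee2021"], sample_bib))  # ['ghost1999']
```

Any key this check returns should be dropped from the budget rather than added to the bibliography, since adding entries would go beyond the NO NEW FACTS constraint.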

What a good budget report looks like (contract)

The report should feel like a constraint sheet, not a random list:

- It states the blocking policy target and the gap-to-target (how many unique keys are missing; policy default is recommended).
- For each H3, it proposes a scope-safe budget sized to actually close the gap:
  - small gaps: 3-6 keys / H3 is often enough
  - A150++ gaps: plan for ~6-12 keys / H3 (and avoid duplicates across H3 budgets)
- It gives placement guidance (where in the subsection those keys can be embedded without adding new facts).

Canonical (parseable) lines required (downstream validators depend on these):

- The target is derived from queries.md:citation_target (recommended by default for A150++).
- `- Global target (policy; blocking): >= ...`
- `- Gap:` (gap-to-target; if 0, injection can be a no-op PASS)

Optional (always reported; may be blocking depending on citation_target):

- `- Global recommended target: >= ...`
- `- Gap to recommended:`
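
To illustrate why these lines need to stay parseable, here is a rough sketch of how a downstream check might read them. It assumes the blocking lines take the form `- Global target (policy; blocking): >= N` and `- Gap: K` with integer values; the exact wording the real validators expect, the function name, and the sample report are all assumptions made for this example.

```python
import re

# Assumed canonical line shapes; the real validators may expect different wording.
TARGET_RE = re.compile(r"^- Global target \(policy; blocking\): >= (\d+)\s*$", re.MULTILINE)
GAP_RE = re.compile(r"^- Gap: (\d+)\s*$", re.MULTILINE)


def parse_budget_header(report: str) -> dict:
    """Extract the blocking target and gap; raise if either canonical line is missing."""
    target = TARGET_RE.search(report)
    gap = GAP_RE.search(report)
    if target is None or gap is None:
        raise ValueError("missing canonical target/gap line")
    gap_value = int(gap.group(1))
    # A gap of 0 means the injection step can be a no-op PASS.
    return {"target": int(target.group(1)), "gap": gap_value, "noop": gap_value == 0}


# Hypothetical report header used only to exercise the parser.
sample_report = """- Status: FAIL
- Global target (policy; blocking): >= 150
- Gap: 12
- Gap to recommended: 30
"""

print(parse_budget_header(sample_report))
```

Anchoring each pattern to a full line keeps the optional "Gap to recommended" line from being mistaken for the blocking gap.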

Recommended prioritization (scope-safe):

- allowed_bibkeys_selected → allowed_bibkeys_mapped → allowed_bibkeys_chapter
- Use allowed_bibkeys_global only for:
  - benchmarks/protocol papers
  - widely-used datasets/suites
  - cross-cutting surveys/method papers referenced across chapters

How this connects to writing (LLM-first)

After you generate the budget report:

- Apply it using citation-injector (LLM edits to output/DRAFT.md, NO NEW FACTS).
- Then run draft-polisher to remove any “budget dump voice” while keeping citation keys unchanged.

Important: citation-injector is LLM-first. Its script is validation-only.

Workflow

1) Diagnose the global situation

- Read output/DRAFT.md and estimate the “unique-key gap” (or use pipeline-auditor’s FAIL reason).
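
The estimate can be approximated in a few lines. This sketch assumes the draft uses Pandoc-style bracketed citations such as `[@key1; @key2]`; if the draft uses another citation syntax, the patterns need to change. The function names and the sample text are hypothetical.

```python
import re

# Assumption: citations appear as Pandoc-style bracketed groups, e.g. "[@smith2020; @lee2021]".
BRACKET = re.compile(r"\[([^\[\]]*@[^\[\]]*)\]")
KEY = re.compile(r"@([A-Za-z0-9_][A-Za-z0-9_:\-]*)")


def unique_cited_keys(draft: str) -> set:
    """Collect the distinct citation keys that appear inside bracketed citation groups."""
    keys = set()
    for group in BRACKET.findall(draft):
        keys.update(KEY.findall(group))
    return keys


def unique_key_gap(draft: str, target: int) -> int:
    """Number of additional unique keys needed to reach the target (never negative)."""
    return max(0, target - len(unique_cited_keys(draft)))


# Hypothetical draft text; the e-mail address is outside brackets and is not counted.
draft = "Agents plan [@smith2020; @lee2021]. Tools help [@smith2020]. Mail a@b.c is ignored."

print(sorted(unique_cited_keys(draft)))  # ['lee2021', 'smith2020']
print(unique_key_gap(draft, target=5))   # 3
```

When pipeline-auditor already reports the gap in its FAIL reason, that number should take precedence over this estimate.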

2) Allocate budgets per H3 (scope-first)

- Use outline/outline.yml to enumerate H3s in paper order.
- For each H3, read its allowed key sets from outline/writer_context_packs.jsonl.
- Pick a small set of unused keys that strengthen positioning without requiring new claims.
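
The allocation step can be sketched as a greedy pass over the H3s in paper order, assuming the allowed key sets have already been loaded into a dict keyed by H3 id. The record layout of writer_context_packs.jsonl is not specified here, so loading is left out; only the `allowed_bibkeys_*` field names come from this document, and the function name, H3 ids, and keys below are hypothetical.

```python
# Scope priority from narrowest to widest; allowed_bibkeys_global is deliberately left out
# of the default walk because it is reserved for truly cross-cutting references.
PRIORITY = ("allowed_bibkeys_selected", "allowed_bibkeys_mapped", "allowed_bibkeys_chapter")


def allocate_budgets(h3_order, allowed, already_cited, per_h3=6):
    """Greedily pick up to per_h3 unused keys for each H3, never repeating a key across H3s."""
    taken = set(already_cited)  # keys already in the draft do not widen the unique-key count
    budgets = {}
    for h3 in h3_order:
        picks = []
        for scope in PRIORITY:
            for key in allowed.get(h3, {}).get(scope, []):
                if len(picks) >= per_h3:
                    break
                if key not in taken:
                    picks.append(key)
                    taken.add(key)
        budgets[h3] = picks
    return budgets


# Hypothetical in-memory view of two H3 subsections.
allowed = {
    "3.1": {
        "allowed_bibkeys_selected": ["a", "b"],
        "allowed_bibkeys_mapped": ["c"],
        "allowed_bibkeys_chapter": ["d", "e"],
    },
    "3.2": {
        "allowed_bibkeys_selected": ["b", "f"],
        "allowed_bibkeys_mapped": ["c", "g"],
        "allowed_bibkeys_chapter": ["d"],
    },
}

print(allocate_budgets(["3.1", "3.2"], allowed, already_cited={"a"}, per_h3=3))
# {'3.1': ['b', 'c', 'd'], '3.2': ['f', 'g']}
```

A greedy pass like this only produces a baseline. Whether a key actually strengthens positioning without new claims still needs a judgment call per subsection.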

3) Write output/CITATION_BUDGET_REPORT.md
Required structure:

- `- Status: PASS|FAIL`
- `- Global target (policy; blocking): >= ...`
- `- Gap:`
- `## Summary` (gap + strategy)
- `## Per-subsection budgets` (H3 id/title → suggested keys → placement hint)

Script (optional; deterministic report generator)

If you want a deterministic first-pass budget report, run the helper script. Treat it as a baseline and refine the plan as needed.

Quick Start

- `python scripts/run.py --help`
- `python scripts/run.py --workspace workspaces/`

All Options

- `--workspace`
- `--unit-id` (optional)
- `--inputs <semicolon-separated>` (rare override; prefer defaults)
- `--outputs <semicolon-separated>` (rare override; default writes output/CITATION_BUDGET_REPORT.md)
- `--checkpoint` (optional)

Examples

- Default IO:
  - `python scripts/run.py --workspace workspaces/`

Done criteria

- output/CITATION_BUDGET_REPORT.md exists and has actionable, in-scope budgets.
- After applying the plan via citation-injector, pipeline-auditor no longer FAILs on global unique citations.
