Citation Diversifier (budget-as-constraints) [NO NEW FACTS]
Purpose: fix a common survey failure mode:
- the draft reads under-cited (or reuses the same few citations everywhere)
- the pipeline fails the global unique-citation gate
This skill does not change prose by itself.
It produces a constraint sheet: output/CITATION_BUDGET_REPORT.md.
Inputs
- output/DRAFT.md
- outline/outline.yml (H3 ids/titles; used to allocate budgets per subsection)
- outline/writer_context_packs.jsonl (source of allowed_bibkeys_{selected,mapped,chapter,global} per H3)
- citations/ref.bib
Output
- output/CITATION_BUDGET_REPORT.md
Non-negotiables (NO NEW FACTS)
- Only propose citation keys that exist in citations/ref.bib.
- Only propose keys that are in-scope for the target H3 (prefer subsection-first scope; use chapter/global only when truly cross-cutting).
- Do not propose “padding citations” that would require adding new claims or new numbers.
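The first two rules can be checked mechanically before any key reaches the report. The sketch below is a minimal illustration, not the skill's actual script: `bib_keys` and `filter_proposals` are hypothetical helper names, and the regex assumes standard BibTeX entries of the form `@type{key,`.

```python
import re

# Matches the entry key in headers such as "@article{Smith2020," (standard BibTeX syntax).
BIB_ENTRY = re.compile(r"@\w+\s*\{\s*([^,\s]+)\s*,")


def bib_keys(bib_text):
    """Return the set of entry keys found in a BibTeX string."""
    return set(BIB_ENTRY.findall(bib_text))


def filter_proposals(proposed, known_keys, in_scope_keys):
    """Keep proposed keys that exist in ref.bib AND are in scope for the H3.

    Order is preserved and repeated keys are dropped.
    """
    seen = set()
    kept = []
    for key in proposed:
        if key in known_keys and key in in_scope_keys and key not in seen:
            seen.add(key)
            kept.append(key)
    return kept
```

The third rule (no padding citations) is a judgment about the prose, so it stays with the reviewer or the LLM pass and is not something this filter can decide.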
What a good budget report looks like (contract)
The report should feel like a constraint sheet, not a random list:
- It states the blocking policy target and the gap-to-target (how many unique keys are missing; the citation_target policy defaults to recommended).
- For each H3, it proposes a scope-safe budget sized to actually close the gap:
  - small gaps: 3-6 keys / H3 is often enough
  - A150++ gaps: plan for ~6-12 keys / H3 (and avoid duplicates across H3 budgets)
- It gives placement guidance (where in the subsection those keys can be embedded without adding new facts).
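One way to turn the sizing guidance into a number is to spread the gap evenly across H3s and clamp the result to the 3-12 range mentioned above. This is an assumed heuristic for illustration only; `keys_per_h3` and its `floor`/`ceiling` parameters are hypothetical and not part of the skill.

```python
import math


def keys_per_h3(gap, num_h3, floor=3, ceiling=12):
    """Suggest a per-H3 budget size that would close `gap` if every H3 used it.

    Assumption: the gap is spread evenly, then clamped to [floor, ceiling].
    Returns 0 when there is nothing to close.
    """
    if gap <= 0 or num_h3 <= 0:
        return 0
    return min(ceiling, max(floor, math.ceil(gap / num_h3)))
```

For example, a gap of 40 unique keys across 8 H3s gives 5 keys per H3. When the result is clamped at the ceiling, the budgets alone cannot close the gap, which is worth stating in the report summary.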
Canonical (parseable) lines required (downstream validators depend on these):
- The target is derived from queries.md:citation_target (recommended by default for A150++).
- `- Global target (policy; blocking): >= ...`
- `- Gap: ...` (gap-to-target; if 0, injection can be a no-op PASS)
Optional (always reported; may be blocking depending on citation_target):
- `- Global recommended target: >= ...`
- `- Gap to recommended: ...`
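A downstream validator might read the two blocking lines with anchored regular expressions. The sketch below assumes the lines are spelled `- Global target (policy; blocking): >= N` and `- Gap: N` with integer values; those exact spellings, and the `parse_gate` helper, are assumptions for illustration rather than the pipeline's real validator.

```python
import re

# Assumed canonical spellings; adjust to the strings the real validator expects.
TARGET_LINE = re.compile(r"^- Global target \(policy; blocking\): >= (\d+)\s*$", re.MULTILINE)
GAP_LINE = re.compile(r"^- Gap: (\d+)\s*$", re.MULTILINE)


def parse_gate(report_text):
    """Return (target, gap) from a budget report, or raise if either line is missing."""
    target = TARGET_LINE.search(report_text)
    gap = GAP_LINE.search(report_text)
    if target is None or gap is None:
        raise ValueError("report is missing a canonical target or gap line")
    return int(target.group(1)), int(gap.group(1))
```

Anchoring each pattern to a full line keeps the optional "gap to recommended" line from being mistaken for the blocking gap.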
Recommended prioritization (scope-safe):
- allowed_bibkeys_selected → allowed_bibkeys_mapped → allowed_bibkeys_chapter
- Use allowed_bibkeys_global only for:
  - benchmarks/protocol papers
  - widely-used datasets/suites
  - cross-cutting surveys/method papers referenced across chapters
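The tier order above can be sketched as a single pass over one H3's key sets, with the global tier opt-in only. This is a minimal illustration: it assumes each pack is a dict whose `allowed_bibkeys_*` fields hold lists of keys, and `pick_budget` is a hypothetical helper name.

```python
def pick_budget(pack, used, size, allow_global=False):
    """Pick up to `size` unused keys for one H3, walking tiers from narrowest to widest.

    `pack` is assumed to map each allowed_bibkeys_* field to a list of keys.
    `used` holds keys already cited in the draft or assigned to another H3.
    """
    tiers = ["allowed_bibkeys_selected", "allowed_bibkeys_mapped", "allowed_bibkeys_chapter"]
    if allow_global:
        # Reserve for benchmarks, datasets, and cross-cutting surveys.
        tiers.append("allowed_bibkeys_global")
    picked = []
    for tier in tiers:
        for key in pack.get(tier, []):
            if len(picked) == size:
                return picked
            if key not in used and key not in picked:
                picked.append(key)
    return picked
```

Keeping `allow_global` off by default means a wider scope has to be a deliberate choice for each H3.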
How this connects to writing (LLM-first)
After you generate the budget report:
- Apply it using citation-injector (LLM edits to output/DRAFT.md, NO NEW FACTS).
- Then run draft-polisher to remove any “budget dump voice” while keeping citation keys unchanged.
Important: citation-injector is LLM-first. Its script is validation-only.
Workflow
1) Diagnose the global situation
- Read output/DRAFT.md and estimate the “unique-key gap” (or use pipeline-auditor’s FAIL reason).
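The gap estimate can be approximated by counting distinct keys in the draft. The sketch below assumes Pandoc-style bracketed citations such as `[@Key1; @Key2]`; the source does not state the draft's citation syntax, so treat the patterns and the helper names as assumptions.

```python
import re

# Assumed syntax: Pandoc-style citations inside square brackets, e.g. [@Key1; @Key2].
BRACKET = re.compile(r"\[([^\[\]]*@[^\[\]]*)\]")
CITE_KEY = re.compile(r"@([A-Za-z0-9_:.\-]+)")


def unique_cited_keys(draft_text):
    """Return the set of distinct citation keys used anywhere in the draft."""
    keys = set()
    for group in BRACKET.findall(draft_text):
        # Strip a trailing period that belongs to the sentence, not the key.
        keys.update(key.rstrip(".") for key in CITE_KEY.findall(group))
    return keys


def unique_key_gap(draft_text, target):
    """Number of additional unique keys needed to reach `target` (never negative)."""
    return max(0, target - len(unique_cited_keys(draft_text)))
```

If pipeline-auditor already reports the gap in its FAIL reason, that figure should take precedence over this estimate.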
2) Allocate budgets per H3 (scope-first)
- Use outline/outline.yml to enumerate H3s in paper order.
- For each H3, read its allowed key sets from outline/writer_context_packs.jsonl.
- Pick a small set of unused keys that strengthen positioning without requiring new claims.
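Step 2 can be sketched as one pass over the H3s in paper order, sharing a set of taken keys so that no key lands in two budgets. The field name `sub_id` and both helper names are hypothetical, and the JSONL layout (one object per line with `allowed_bibkeys_*` lists) is an assumption about the context packs.

```python
import json

TIERS = ("allowed_bibkeys_selected", "allowed_bibkeys_mapped", "allowed_bibkeys_chapter")


def load_packs(jsonl_text):
    """Map each H3 id to its context pack (one JSON object per non-empty line)."""
    packs = {}
    for line in jsonl_text.splitlines():
        if line.strip():
            pack = json.loads(line)
            packs[pack["sub_id"]] = pack  # "sub_id" is an assumed field name
    return packs


def allocate(h3_ids, packs, already_cited, per_h3):
    """Assign up to `per_h3` unused keys to each H3, in the order given."""
    taken = set(already_cited)  # keys in the draft plus keys assigned so far
    budgets = {}
    for h3 in h3_ids:
        pack = packs.get(h3, {})
        candidates = [key for tier in TIERS for key in pack.get(tier, [])]
        chosen = []
        for key in candidates:
            if len(chosen) == per_h3:
                break
            if key not in taken:
                taken.add(key)
                chosen.append(key)
        budgets[h3] = chosen
    return budgets
```

Earlier H3s get first pick under this scheme, so a key shared by two subsections goes to whichever appears first in the outline. Whether a picked key genuinely strengthens positioning without a new claim still needs a human or LLM check.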
3) Write output/CITATION_BUDGET_REPORT.md
Required structure:
- `- Status: PASS|FAIL`
- `- Global target (policy; blocking): >= ...`
- `- Gap: ...`
- `## Summary` (gap + strategy)
- `## Per-subsection budgets` (H3 id/title → suggested keys → placement hint)
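A report with this shape could be rendered as follows. This is a sketch under stated assumptions: the line spellings (`- Status:`, `- Global target (policy; blocking): >=`, `- Gap:`) and the two headings are assumed, the rule that a zero gap means PASS is an assumption, and `render_report` with its budget dict fields (`id`, `title`, `keys`, `hint`) is hypothetical.

```python
def render_report(target, unique_now, budgets, strategy="Add unused in-scope keys per H3."):
    """Render a budget report as text.

    `budgets` is a list of dicts with the assumed fields id, title, keys, hint.
    """
    gap = max(0, target - unique_now)
    status = "PASS" if gap == 0 else "FAIL"  # assumption: zero gap means PASS
    lines = [
        f"- Status: {status}",
        f"- Global target (policy; blocking): >= {target}",
        f"- Gap: {gap}",
        "",
        "## Summary",
        f"- Unique keys in draft: {unique_now}; gap to target: {gap}.",
        f"- Strategy: {strategy}",
        "",
        "## Per-subsection budgets",
    ]
    for item in budgets:
        keys = ", ".join(item["keys"])
        lines.append(f"- {item['id']} {item['title']} → {keys} → {item['hint']}")
    return "\n".join(lines) + "\n"
```

Writing the three canonical lines first, one per line, keeps them easy for a line-oriented validator to find.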
Script (optional; deterministic report generator)
If you want a deterministic first-pass budget report, run the helper script. Treat it as a baseline and refine the plan as needed.
Quick Start
- python scripts/run.py --help
- python scripts/run.py --workspace workspaces/
All Options
- --workspace
- --unit-id (optional)
- --inputs <semicolon-separated> (rare override; prefer defaults)
- --outputs <semicolon-separated> (rare override; default writes output/CITATION_BUDGET_REPORT.md)
- --checkpoint (optional)
Examples
- python scripts/run.py --workspace workspaces/
Done criteria
- output/CITATION_BUDGET_REPORT.md exists and has actionable, in-scope budgets.
- After applying the plan via citation-injector, pipeline-auditor no longer FAILs on global unique citations.