Cross-Ref: PR & Issue Linker

You find hidden connections between PRs and issues that humans miss at scale.
The core loop is: fetch → analyze in parallel → cluster → verify → report → act.

Before doing anything, read references/principles.md. Those rules override
everything in this file when there's a conflict.

Overview

Repos accumulate duplicate PRs and orphaned issue→PR links over time. Manual
cross-referencing doesn't scale past a few dozen items. This skill uses parallel
Sonnet subagents to analyze up to 1000 PRs and 1000 issues simultaneously,
finding two kinds of links:

1. Duplicate PRs — PRs that address the same bug or feature (even with
   different approaches or wording)
2. Issue→PR links — Open issues that already have a PR solving them but
   no explicit "fixes #N" reference

Results are grouped into thematic clusters, scored by actionability,
and presented with available actions (comment, close, label) — not just
as a flat list of pairs.

Configuration

The user provides these at invocation time (ask if not given):

Parameter	Default	Description
repo	(ask)	GitHub `owner/repo` to analyze
pr_count

Default mode is plan (dry-run). The skill always starts by generating
the report. The user must explicitly choose to execute actions after reviewing
the findings. This matters because actions can't be undone.

Workflow

Phase 1: Data Collection

Fetch PR and issue metadata from the GitHub API. This phase is deterministic
and uses the shell script — no AI needed.

    scripts/fetch-data.sh <owner/repo> <workspace-dir> [pr_count] [issue_count] [pr_state] [issue_state]

This produces:

- workspace/prs.json — Full PR metadata
- workspace/issues.json — Full issue metadata (PRs filtered out)
- workspace/existing-refs.json — Pre-extracted explicit cross-references
- workspace/pr-index.txt — Compact one-line-per-PR index
- workspace/issue-index.txt — Compact one-line-per-issue index

The existing references map captures what's already linked (via "fixes #N",
"closes #N", etc.) so subagents can focus on what's missing.

Phase 2: Parallel Analysis (Sonnet Subagents)

This is where the intelligence happens. Split PRs into batches and spawn
parallel Sonnet subagents. Each subagent receives:

- Its batch of PRs (full metadata from prs.json, ~50 PRs)
- The complete issue index (compact, ~60KB)
- The complete PR index (compact, ~60KB) — for duplicate detection
- The existing references map (so it skips already-linked items)

Spawn subagents using the Task tool:

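    For each batch B of {batch_size} PRs:
      Task(
        subagent_type="general-purpose",
        model="sonnet",
        prompt=<see below>
      )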

Subagent prompt template:

Important: When building each subagent prompt, paste the FULL contents of
references/principles.md into the "Decision Principles" section below.
Do not summarize or condense — include the complete text. This ensures
subagents always use the latest principles without drift.

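    You are a cross-reference analyst for a GitHub repository. Your job is to
    find connections between PRs and issues that are not yet explicitly linked.

    Decision Principles (these override everything else)

    {paste the full contents of references/principles.md here}

    Your Batch

    You are analyzing PRs {start_num} through {end_num} of {total_prs}.

    PR Details (your batch)

    {full PR metadata for this batch from prs.json}

    Full Issue Index

    {contents of issue-index.txt}

    Full PR Index

    {contents of pr-index.txt}

    Known References

    {contents of existing-refs.json}

    Your Task

    Find two types of connections:

    1. Issue→PR links

    For each PR in your batch, decide whether it resolves any issue in the
    index. Evidence must include at least one of:

    - Both describe the same error message or failure path
    - The PR modifies the component/module the issue describes as broken
    - The PR body explicitly refers to the problem the issue describes
      (even without #N)

    Title similarity alone is not enough. Skip any link that already exists
    in Known References.

    2. Duplicate PRs

    For each PR in your batch, check whether any other PR in the full PR
    index addresses the same problem. Evidence must include at least one of:

    - Both modify the same files for the same reason
    - Both fix the same bug/behavior (even with different approaches)
    - One is a resubmission or continuation of the other (same branch,
      similar body)

    Touching the same code area is not enough; the PRs must address the same
    specific problem.

    3. Flag uncertainty

    If you encounter a pair where the evidence is ambiguous (you see a
    possible connection but cannot confirm it from the available data), flag
    it with status: "manual_review_required" instead of guessing a
    confidence. Include what is missing (e.g. "need to see the full diff to
    confirm file overlap").

    Output Format

    Return only a JSON array. No other text.

    [
      {
        "type": "issue_link",
        "pr": 5678,
        "pr_author": "@username",
        "issue": 1234,
        "confidence": "high|medium|low",
        "status": "confirmed|manual_review_required",
        "root_cause": "One sentence: what shared problem connects these",
        "evidence": "Specific: same error message, same file, same component, etc.",
        "missing_evidence": "null, or what is needed to confirm this connection"
      },
      {
        "type": "duplicate_pr",
        "pr_a": 5678,
        "pr_b": 5679,
        "pr_a_author": "@username_a",
        "pr_b_author": "@username_b",
        "confidence": "high|medium|low",
        "status": "confirmed|manual_review_required",
        "root_cause": "One sentence: what shared problem connects these",
        "evidence": "Specific: same files modified, same branch, resubmission, etc.",
        "missing_evidence": "null, or what is needed to confirm this connection"
      }
    ]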

Parallelism: Spawn ALL batch subagents simultaneously. With batch_size=50
and 1000 PRs, that's 20 parallel subagents. This is the power of the skill —
what would take hours sequentially completes in minutes.
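The batching arithmetic above can be sketched as follows. This is a minimal illustration, not the skill's actual implementation: `make_batches` is a hypothetical helper, and the list of PR numbers stands in for the data loaded from workspace/prs.json.

```python
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def make_batches(items: Sequence[T], batch_size: int = 50) -> List[List[T]]:
    """Split items into consecutive batches of at most batch_size elements."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return [list(items[i:i + batch_size]) for i in range(0, len(items), batch_size)]


# Stand-in for the PR numbers loaded from workspace/prs.json.
pr_numbers = list(range(1, 1001))
batches = make_batches(pr_numbers, batch_size=50)
print(len(batches))  # 20, one subagent per batch
```

Each batch would then be handed to its own subagent, with the last batch allowed to be shorter when the PR count is not a multiple of batch_size.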

Phase 3: Merge, Deduplicate & Cluster

After all subagents return:

1. Collect all JSON results into a single array
2. Deduplicate duplicate_pr entries (A→B and B→A are the same link)
3. Merge confidence — if two subagents found the same link, take the
   higher confidence and merge both evidence strings
4. Filter by confidence_threshold
5. Build clusters — group related findings into thematic clusters (see below)
6. Score clusters by actionability (see below)
7. Sort clusters by score (highest first)

Save to workspace/results-unverified.json.
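Steps 2 to 4 above could look roughly like the sketch below. It assumes each finding is a dict with the fields `type`, `pr_a`/`pr_b` or `pr`/`issue`, `confidence`, and `evidence`, and it assumes confidence_threshold is a minimum level (low, medium, or high); `finding_key` and `merge_findings` are hypothetical names.

```python
from typing import Dict, List, Tuple

RANK = {"low": 0, "medium": 1, "high": 2}


def finding_key(finding: dict) -> Tuple:
    """Order-independent identity of a link, so A→B and B→A collapse."""
    if finding["type"] == "duplicate_pr":
        a, b = sorted((finding["pr_a"], finding["pr_b"]))
        return ("duplicate_pr", a, b)
    return ("issue_link", finding["pr"], finding["issue"])


def merge_findings(findings: List[dict], confidence_threshold: str) -> List[dict]:
    merged: Dict[Tuple, dict] = {}
    for finding in findings:
        key = finding_key(finding)
        if key not in merged:
            merged[key] = dict(finding)  # copy so the input is not mutated
            continue
        kept = merged[key]
        # Keep the higher of the two confidence levels.
        if RANK[finding["confidence"]] > RANK[kept["confidence"]]:
            kept["confidence"] = finding["confidence"]
        # Merge evidence strings, skipping exact repeats.
        if finding["evidence"] not in kept["evidence"]:
            kept["evidence"] = kept["evidence"] + " | " + finding["evidence"]
    minimum = RANK[confidence_threshold]
    return [f for f in merged.values() if RANK[f["confidence"]] >= minimum]


sample = [
    {"type": "duplicate_pr", "pr_a": 100, "pr_b": 101,
     "confidence": "medium", "evidence": "same file"},
    {"type": "duplicate_pr", "pr_a": 101, "pr_b": 100,
     "confidence": "high", "evidence": "same branch"},
    {"type": "issue_link", "pr": 100, "issue": 50,
     "confidence": "low", "evidence": "similar error"},
]
result = merge_findings(sample, "medium")
```

In this example the two duplicate_pr entries collapse into one high-confidence finding with both evidence strings, and the low-confidence issue link is filtered out.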

Clustering Algorithm

Instead of reporting isolated pairs, group connected findings into clusters.
Two findings belong to the same cluster if they share any PR or issue number.

Example: If you find PR#100 ↔ PR#101 (duplicate) and PR#100 ↔ Issue#50
(link), these form a single cluster: "Cluster: Issue#50 + PR#100 + PR#101".

Cluster structure:
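    {
      "cluster_id": 1,
      "theme": "Onboard token mismatch: OPENCLAW_GATEWAY_TOKEN ignored",
      "items": ["PR#22662", "PR#22658", "Issue#22638"],
      "findings": [ ...individual findings in this cluster... ],
      "score": 8.5,
      "cluster_status": "actionable|needs_review|manual_review_required",
      "suggested_actions": [ ...see Phase 4b... ]
    }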

The theme is a one-line summary that describes what this cluster is about
— the shared root cause or feature area. Generate it from the root_cause
fields of the cluster's findings.
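The rule "two findings belong to the same cluster if they share any PR or issue number" amounts to finding connected components. One way to sketch it is with a small union-find; this is an illustrative approach rather than the skill's confirmed implementation, and `item_ids` and `build_clusters` are hypothetical names. Theme and score are left out.

```python
from collections import defaultdict
from typing import Dict, List


def item_ids(finding: dict) -> List[str]:
    """The two items a finding connects, e.g. ['PR#100', 'Issue#50']."""
    if finding["type"] == "duplicate_pr":
        return [f"PR#{finding['pr_a']}", f"PR#{finding['pr_b']}"]
    return [f"PR#{finding['pr']}", f"Issue#{finding['issue']}"]


def build_clusters(findings: List[dict]) -> List[dict]:
    parent: Dict[str, str] = {}

    def find(x: str) -> str:
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # Union the two endpoints of every finding.
    for finding in findings:
        a, b = item_ids(finding)
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Group findings by the root of their first endpoint.
    groups = defaultdict(list)
    for finding in findings:
        groups[find(item_ids(finding)[0])].append(finding)

    clusters = []
    for cluster_id, group in enumerate(groups.values(), start=1):
        items = sorted({i for f in group for i in item_ids(f)})
        clusters.append({"cluster_id": cluster_id, "items": items, "findings": group})
    return clusters


clusters = build_clusters([
    {"type": "duplicate_pr", "pr_a": 100, "pr_b": 101},
    {"type": "issue_link", "pr": 100, "issue": 50},
    {"type": "issue_link", "pr": 7, "issue": 9},
])
print(clusters[0]["items"])  # ['Issue#50', 'PR#100', 'PR#101']
```

The first two findings share PR#100 and end up in one cluster, matching the example above; the unrelated third finding forms its own cluster.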

Actionability Scoring

Each cluster gets a score based on these signals (clamp result to 0-10):

Signal	Points	Why it matters
All items open	+3	Can still be acted on
At least one high-confidence finding

Clusters scoring 7+ are actionable (green in report).
Clusters scoring 4-6 need review (yellow).
Clusters scoring 0-3 are low priority (gray).
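The clamping and banding can be sketched as below. The thresholds come from the three bands above; how fractional scores between bands (for example 6.5) are treated is an assumption here, and the function names and band labels are hypothetical.

```python
def clamp_score(raw: float) -> float:
    """Clamp a raw signal total into the 0-10 range."""
    return max(0.0, min(10.0, raw))


def score_band(raw: float) -> str:
    """Map a score to its report band (assumes 7+ / 4 to below 7 / below 4)."""
    score = clamp_score(raw)
    if score >= 7:
        return "actionable"      # green
    if score >= 4:
        return "needs_review"    # yellow
    return "low_priority"        # gray


print(score_band(8.5))  # actionable
```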

Phase 3b: Evidence Verification

The batch subagents work from truncated bodies (500 chars) and compact indexes.
That's good enough for discovery but not for final decisions. This phase takes
the candidates and verifies them against deeper data.

Spawn a single verification subagent (Sonnet) that:

1. Reads workspace/results-unverified.json
2. For each high/medium candidate, fetches deeper evidence via gh:
   - Duplicate PRs: gh pr diff {id} --name-only for both PRs to confirm they
     actually touch the same files. If the file lists don't overlap at all,
     downgrade to low or remove.
   - Issue→PR links: gh issue view {id} --json body,comments to read the full
     issue body (not truncated) and check if any commenter already noted the
     connection.
   - For both: gh pr view {id} --json body to read the full PR body when the
     truncated version was ambiguous.
3. For manual_review_required items: attempts to resolve with deeper data.
   If still ambiguous after the deep check, keeps the flag so it goes to the user.
4. Upgrades, downgrades, or removes candidates based on the deeper evidence.
5. Recalculates cluster scores after confidence changes.
6. Writes the verified results to workspace/results.json.
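The file-overlap check for duplicate PRs might be sketched like this. It assumes the gh CLI is installed and authenticated; `changed_files` and `recheck_duplicate` are hypothetical helpers, and treating "downgrade to low or remove" as "drop to low, or remove if already low" is one possible reading, not a rule stated in the source.

```python
import subprocess
from typing import Optional, Set


def changed_files(pr_number: int, repo: str) -> Set[str]:
    """File paths touched by a PR, read from `gh pr diff --name-only`."""
    out = subprocess.run(
        ["gh", "pr", "diff", str(pr_number), "--repo", repo, "--name-only"],
        capture_output=True, text=True, check=True,
    ).stdout
    return {line.strip() for line in out.splitlines() if line.strip()}


def recheck_duplicate(confidence: str, files_a: Set[str], files_b: Set[str]) -> Optional[str]:
    """Return the confidence to keep, or None to remove the candidate."""
    if files_a & files_b:
        return confidence  # overlapping files: keep the candidate as is
    # No overlap at all: assumed policy is drop to low, or remove if already low.
    return None if confidence == "low" else "low"


print(recheck_duplicate("high", {"src/auth.py"}, {"docs/readme.md"}))  # low
```

Keeping the comparison separate from the gh call makes the decision logic easy to check without network access.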

Verification subagent prompt:

CODEBLOCK4

This phase catches false positives that slipped through the discovery phase.
The batch subagents are optimized for recall (find everything plausible); the
verifier is optimized for precision (keep only what's real).

Skip this phase if the total candidate count is under 5 — the cost of
verification outweighs the benefit for small result sets.

Phase 4: Generate Report

Present the report to the user organized by clusters, not flat pairs.

Report structure:

CODEBLOCK5

Phase 4b: Suggested Actions Per Cluster

For each cluster, suggest appropriate actions based on confidence and item states.

For duplicate PRs (high confidence, both open):

1. 💬 Comment — link the PRs so authors can coordinate
2. 🏷️ Label — add duplicate label to the weaker PR
3. ❌ Close — close the weaker PR as duplicate (only if very clear)

For duplicate PRs (one open, one closed):

1. 💬 Comment — note the connection for context (lower priority)

For issue→PR links (high confidence):

1. 💬 Comment on issue — note that a PR addresses this
2. 🏷️ Label issue — add has-pr or similar

For manual_review_required items:

1. ⚠️ Flag for human — present in a separate section, no automated action

Action rules:

- Never suggest closing without high confidence + verification
- Never suggest labeling without at least medium confidence
- Always suggest commenting as the minimum action (it's the safest)
- For clusters with mixed confidence, suggest the action matching the
  lowest-confidence finding (conservative)
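The four action rules can be expressed as a small gate. This is a sketch under assumptions: `allowed_actions` is a hypothetical helper, and `verified` stands for whether the cluster went through the evidence verification phase.

```python
from typing import List

RANK = {"low": 0, "medium": 1, "high": 2}


def allowed_actions(confidences: List[str], verified: bool) -> List[str]:
    """Actions that may be suggested for a cluster, most conservative reading."""
    # Mixed confidence: the lowest-confidence finding decides.
    lowest = min(confidences, key=lambda c: RANK[c])
    actions = ["comment"]  # commenting is always the minimum action
    if RANK[lowest] >= RANK["medium"]:
        actions.append("label")
    if lowest == "high" and verified:
        actions.append("close")
    return actions


print(allowed_actions(["high", "medium"], verified=True))  # ['comment', 'label']
```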

Phase 5: Interactive Action Strategy

After presenting the report, ask the user how they want to proceed.
Read references/commenting-strategy.md for rate-limiting details.

Present action choices per cluster:

For each actionable cluster, let the user pick:

- Comment only — just link the items
- Comment + label — link and add labels
- Comment + close — link and close duplicates (high confidence only)
- Skip — do nothing for this cluster
- Manual — I'll handle this one myself

Then present the timing strategy. Read references/commenting-strategy.md for
the full tier definitions, rate calculations, and daily budget math. Present
the user with the strategy table from that file, populated with the actual
counts from the report. If total actions exceed the daily budget, show the
multi-day plan as described in commenting-strategy.md.

Always offer Dry Run (report only, no actions) as the default choice.
Also offer Skip — save the report but don't act at all.

Phase 6: Execute Actions

If the user chooses to act, build workspace/approved-comments.json and
execute with rate limiting via the shell script.

approved-comments.json schema (array of objects):

[
  {
    "target_number": 1234,
    "type": "issue_link|duplicate_pr",
    "body": "The full comment text to post",
    "cluster_id": 1,
    "finding_index": 0
  }
]

- target_number — the issue or PR number to comment on (used by post-comments.sh)
- type — finding type, used for logging only
- body — the complete comment text
- cluster_id and finding_index — traceability back to the report
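Before handing the file to the script, each entry could be checked against the schema above. The sketch below is illustrative only: `validate_entry` is a hypothetical helper, and rejecting an empty body is an added assumption rather than a documented rule.

```python
from typing import List

REQUIRED_FIELDS = {
    "target_number": int,
    "type": str,
    "body": str,
    "cluster_id": int,
    "finding_index": int,
}
VALID_TYPES = {"issue_link", "duplicate_pr"}


def validate_entry(entry: dict) -> List[str]:
    """Return a list of problems; an empty list means the entry looks valid."""
    problems = []
    for field, expected in REQUIRED_FIELDS.items():
        if field not in entry:
            problems.append(f"missing {field}")
        elif not isinstance(entry[field], expected):
            problems.append(f"{field} should be {expected.__name__}")
    if isinstance(entry.get("type"), str) and entry["type"] not in VALID_TYPES:
        problems.append(f"unknown type {entry['type']!r}")
    if isinstance(entry.get("body"), str) and not entry["body"].strip():
        problems.append("empty body")  # assumption: blank comments are never wanted
    return problems


entry = {
    "target_number": 1234,
    "type": "issue_link",
    "body": "The full comment text to post",
    "cluster_id": 1,
    "finding_index": 0,
}
print(validate_entry(entry))  # []
```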

CODEBLOCK7

For label and close actions, execute them inline (not via the script)
since they don't need the same rate limiting as comments:
CODEBLOCK8

Always execute in this order within a cluster:

1. Post comments first (so the context exists before close/label)
2. Add labels
3. Close (only after comment is posted)
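The ordering rule can be enforced with a stable sort. In this sketch the action dict shape (`{"action": ..., "target": ...}`) and the helper name are hypothetical; the guard reflects the requirement that a close only happens after a comment.

```python
from typing import List

ACTION_ORDER = {"comment": 0, "label": 1, "close": 2}


def order_cluster_actions(actions: List[dict]) -> List[dict]:
    """Sort one cluster's actions into comment, label, close order."""
    kinds = {a["action"] for a in actions}
    if "close" in kinds and "comment" not in kinds:
        raise ValueError("a close must be preceded by a comment")
    # sorted() is stable, so actions of the same kind keep their relative order.
    return sorted(actions, key=lambda a: ACTION_ORDER[a["action"]])


planned = order_cluster_actions([
    {"action": "close", "target": 101},
    {"action": "label", "target": 101},
    {"action": "comment", "target": 101},
])
print([a["action"] for a in planned])  # ['comment', 'label', 'close']
```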

Comment style: Comments should feel like they're from a helpful maintainer,
not a bot. Vary the opener and closer for each comment to avoid sounding
repetitive. Always mention the PR author by name.
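One way to vary the opener without repeating the previous one is to exclude it from the candidates before choosing. The sketch below uses the four openers listed in this section; `pick_opener` is a hypothetical helper and random selection is just one possible rotation strategy.

```python
import random
from typing import Optional

OPENERS = [
    "Heads up — this might be related.",
    "Worth a look:",
    "Noticed a possible connection here.",
    "This could be relevant to what you're working on.",
]


def pick_opener(previous: Optional[str], rng: random.Random) -> str:
    """Pick an opener that differs from the one used in the previous comment."""
    candidates = [opener for opener in OPENERS if opener != previous]
    return rng.choice(candidates)


rng = random.Random(0)
last = None
for _ in range(5):
    last = pick_opener(last, rng)
    print(last)
```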

Comment templates (vary the opener each time):

Openers (rotate through these, never use the same one twice in a row):

- "Heads up — this might be related."
- "Worth a look:"
- "Noticed a possible connection here."
- "This could be relevant to what you're working on."

For issue→PR links (comment on the issue):
CODEBLOCK9

For duplicate PRs (comment on the newer PR):
CODEBLOCK10

Every comment includes a correction path because wrong links erode trust.

Save progress to workspace/comment-progress.json for resume support.
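A resume-friendly progress file could be handled as sketched below. The file layout (`{"done": [[cluster_id, finding_index], ...]}`) is purely hypothetical, since the real format of comment-progress.json is not specified here; the helper names are illustrative too.

```python
import json
import os
from typing import List, Set, Tuple

Key = Tuple[int, int]  # (cluster_id, finding_index)


def load_done(path: str) -> Set[Key]:
    """Read the set of already-posted comments; empty if no previous run."""
    if not os.path.exists(path):
        return set()
    with open(path) as fh:
        return {tuple(pair) for pair in json.load(fh)["done"]}


def save_done(path: str, done: Set[Key]) -> None:
    """Write progress via a temp file so an interruption cannot corrupt it."""
    tmp = path + ".tmp"
    with open(tmp, "w") as fh:
        json.dump({"done": sorted(done)}, fh)
    os.replace(tmp, path)


def pending(comments: List[dict], done: Set[Key]) -> List[dict]:
    """Comments from approved-comments.json that still need posting."""
    return [c for c in comments if (c["cluster_id"], c["finding_index"]) not in done]
```

A run would call `save_done` after each successful post, so a rate-limit pause or network error loses at most the comment in flight.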

Error Handling

- API rate limit hit: Pause, show remaining reset time, save progress.
- Subagent returns invalid JSON: Log the error, skip that batch, warn user.
  Don't retry — the batch results are lost but other batches continue.
- PR/issue not found (deleted): Skip silently, note in report.
- Network error during commenting: Save progress immediately, offer resume.
- Subagent returns empty results: Normal — not every batch has links.
- Close/label fails: Log the error, continue with remaining actions.
  Never retry a close — the user should investigate manually.

Workspace Structure

CODEBLOCK11

Resume Support

If a previous run exists in the workspace:

- Phase 1-3: Skip if results.json exists and user confirms
- Phase 4: Skip if report.md exists and user confirms
- Phase 5-6: Resume from comment-progress.json if commenting was interrupted
- Ask: "Found a previous run with {N} results. Resume commenting or start fresh?"

Tips for Operators

- Start with a smaller count (100 PRs, 100 issues) to validate before scaling
- Always review the report in plan mode before executing actions
- The compact index approach keeps memory usage manageable — don't fetch full
  PR bodies (500 char truncation is intentional)
- For very active repos (>10K PRs), increase batch_size to reduce subagent count
- Token costs: ~20 subagent calls for 1000 PRs at batch_size=50, each with
  ~120KB context. Plan accordingly.
- The gh CLI token needs repo scope (private) or public_repo (public),
  plus issues:write for posting comments.

