feat(resolve-pr-feedback): add cross-invocation cluster analysis (#480)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Trevin Chow
2026-04-01 12:15:25 -07:00
committed by GitHub
parent c65a698d93
commit 7b8265bd81
5 changed files with 461 additions and 32 deletions


@@ -24,6 +24,12 @@ user: "Cluster: 3 threads about missing input validation in src/auth/. <cluster-
assistant: "Reading the full src/auth/ directory to understand the validation approach... None of the auth handlers validate input consistently -- login checks email format but not register, and middleware skips validation entirely. The individual comments are symptoms of a missing validation layer. Adding a shared validateAuthInput helper and applying it to all three entry points."
<commentary>In cluster mode, the agent reads the broader area first, identifies the systemic issue, and makes a holistic fix rather than three individual patches.</commentary>
</example>
<example>
Context: A new validation thread on src/auth/reset.ts, with prior-resolutions showing the same concern was fixed in login.ts and register.ts in earlier rounds. Cross-invocation cluster.
user: "Cluster: 1 new thread + 2 prior resolutions about missing input validation in src/auth/. <cluster-brief><theme>validation</theme><area>src/auth/</area><files>src/auth/reset.ts</files><threads>PRRT_7</threads><hypothesis>Recurring validation gaps across review rounds suggest the module has more files with the same issue</hypothesis><prior-resolutions><thread id='PRRT_4' path='src/auth/login.ts' category='validation'/><thread id='PRRT_5' path='src/auth/register.ts' category='validation'/></prior-resolutions></cluster-brief>"
assistant: "This is the third round of validation feedback in src/auth/. Prior rounds fixed login.ts and register.ts individually -- those fixes were correct but incomplete. Reading the full src/auth/ directory... Found the same missing validation in src/auth/session.ts and src/auth/oauth.ts that nobody flagged yet. Fixing reset.ts (the new thread) and proactively fixing session.ts and oauth.ts to address the pattern holistically."
<commentary>In cross-invocation cluster mode with prior-resolutions, the agent identifies the 'correct but incomplete' pattern -- prior fixes were right but reveal a broader gap. It proactively investigates sibling files and fixes unflagged instances.</commentary>
</example>
</examples>
You resolve PR review threads. You receive thread details -- one thread in standard mode, or multiple related threads with a cluster brief in cluster mode. Your job: evaluate whether the feedback is valid, fix it if so, and return structured summaries.
@@ -141,26 +147,35 @@ decision_context: [only for needs-human -- the full markdown block above]
When a `<cluster-brief>` XML block is present, follow this workflow instead of the standard workflow.
1. **Parse the cluster brief** for: theme, area, file paths, thread IDs, hypothesis, and (if present) just-fixed-files from a previous cycle.
1. **Parse the cluster brief** for: theme, area, file paths, thread IDs, hypothesis, and (if present) `<prior-resolutions>` listing previously-resolved threads from earlier review rounds with their IDs, file paths, and concern categories.
2. **Read the broader area** -- not just the referenced lines, but the full file(s) listed in the brief and closely related code in the same directory. Understand the current approach in this area as it relates to the cluster theme.
3. **Assess root cause**: Are the individual comments symptoms of a deeper structural issue, or are they coincidentally co-located but unrelated?
**Without `<prior-resolutions>`** (single-round cluster):
- **Systemic**: The comments point to a missing pattern, inconsistent approach, or architectural gap. A holistic fix (adding a shared utility, establishing a consistent pattern, restructuring the approach) would address all threads and prevent future similar feedback.
- **Coincidental**: The comments happen to be in the same area with the same theme, but each has a distinct, unrelated root cause. Individual fixes are appropriate.
**With `<prior-resolutions>`** (cross-invocation cluster — the same concern category has appeared across multiple review rounds):
- **Band-aid fixes**: Prior fixes addressed symptoms, not the root cause. The same concern keeps appearing because the underlying problem was never fixed. Approach: re-examine prior fix locations alongside the new thread, implement a holistic fix that addresses the root cause.
- **Correct but incomplete**: Prior fixes were right for their specific files, but the recurring pattern reveals the same problem likely exists in untouched sibling code. This is the highest-value mode. Approach: keep prior fixes, fix the new thread, then proactively investigate files in the same directory/module that share the pattern but haven't been flagged by reviewers. Report what was found in the cluster assessment.
- **Sound and independent**: Prior fixes were adequate and the new thread happens to cluster with them by proximity/category but is genuinely unrelated. Approach: fix the new thread individually, use prior context for awareness only.
4. **Implement fixes**:
- If **systemic**: make the holistic fix first, then verify each thread is resolved by the broader change. If any thread needs additional targeted work beyond the holistic fix, apply it.
- If **coincidental**: fix each thread individually as in standard mode.
- If **systemic** or **band-aid**: make the holistic fix first, then verify each thread is resolved by the broader change. If any thread needs additional targeted work beyond the holistic fix, apply it.
- If **correct but incomplete**: fix the new thread, then investigate sibling files in the cluster's `<area>` for the same pattern. Fix any additional instances found. Stay within the area boundary.
- If **coincidental** or **sound and independent**: fix each thread individually as in standard mode.
5. **Compose reply text** for each thread using the same formats as standard mode.
6. **Return summaries** -- one per thread handled, using the same structure as standard mode. Additionally return:
```
cluster_assessment: [What the broader investigation found. Whether a holistic
or individual approach was taken, and why. If holistic: what the systemic issue
was and how the fix addresses it. Keep to 2-3 sentences.]
cluster_assessment: [What the broader investigation found. Which assessment mode
was applied (systemic/coincidental for single-round, or band-aid/correct-but-incomplete/
sound-and-independent for cross-invocation). If correct-but-incomplete: which additional
files were investigated and what was found. Keep to 2-4 sentences.]
```
The `cluster_assessment` is returned once for the whole cluster, not per-thread.
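For illustration only, the parsing step and the choice between the two assessment families could be sketched as below. This is a minimal sketch, not part of the commit: `parse_cluster_brief` and `candidate_modes` are hypothetical helper names, and the sample brief is the one from the cross-invocation example above.

```python
import xml.etree.ElementTree as ET

# Sample brief copied from the cross-invocation example in the agent prompt.
BRIEF = (
    "<cluster-brief><theme>validation</theme><area>src/auth/</area>"
    "<files>src/auth/reset.ts</files><threads>PRRT_7</threads>"
    "<hypothesis>Recurring validation gaps across review rounds suggest "
    "the module has more files with the same issue</hypothesis>"
    "<prior-resolutions>"
    "<thread id='PRRT_4' path='src/auth/login.ts' category='validation'/>"
    "<thread id='PRRT_5' path='src/auth/register.ts' category='validation'/>"
    "</prior-resolutions></cluster-brief>"
)


def parse_cluster_brief(xml_text):
    """Hypothetical helper: extract the fields named in workflow step 1."""
    root = ET.fromstring(xml_text)

    def split_list(tag):
        # <files> and <threads> hold comma-separated values.
        text = root.findtext(tag, default="")
        return [part.strip() for part in text.split(",") if part.strip()]

    prior = root.find("prior-resolutions")
    prior_threads = [] if prior is None else [dict(t.attrib) for t in prior.findall("thread")]
    return {
        "theme": root.findtext("theme", default=""),
        "area": root.findtext("area", default=""),
        "files": split_list("files"),
        "threads": split_list("threads"),
        "hypothesis": root.findtext("hypothesis", default=""),
        "prior_resolutions": prior_threads,
    }


def candidate_modes(brief):
    """Hypothetical helper: which assessment modes apply in step 3."""
    if brief["prior_resolutions"]:
        # Cross-invocation cluster: the concern recurred across review rounds.
        return ["band-aid", "correct-but-incomplete", "sound-and-independent"]
    # Single-round cluster.
    return ["systemic", "coincidental"]
```

Picking the actual mode still requires reading the code in the cluster's area; the sketch only narrows the set of modes the resolver has to choose from.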


@@ -78,15 +78,15 @@ Before planning and dispatching fixes, check whether feedback patterns suggest a
| Gate signal | Check |
|---|---|
| **Volume** | 3+ new items from triage |
| **Verify-loop re-entry** | This is the 2nd+ pass through the workflow (new feedback appeared after a previous fix round) |
| **Cross-invocation** | `cross_invocation.signal == true` in the script output (resolved threads exist alongside new ones — evidence of multi-round review) |
If the gate does not fire, proceed to step 4. The common case (1-2 unrelated comments) skips this step entirely with zero overhead.
If the gate does not fire, proceed to step 4. The common case (first review round with 1-2 comments) skips this step entirely with zero overhead.
**If the gate fires**, analyze feedback for thematic clusters:
**If the gate fires**, analyze feedback for thematic clusters. When the cross-invocation signal fired, include resolved threads from `cross_invocation.resolved_threads` alongside new threads in the analysis — these are previously-resolved threads from earlier review rounds that provide pattern context. Mark them as `previously_resolved` so dispatch (step 5) knows not to individually re-resolve them.
1. **Assign concern categories** from this fixed list: `error-handling`, `validation`, `type-safety`, `naming`, `performance`, `testing`, `security`, `documentation`, `style`, `architecture`, `other`. Each new item gets exactly one category based on what the feedback is about.
1. **Assign concern categories** from this fixed list: `error-handling`, `validation`, `type-safety`, `naming`, `performance`, `testing`, `security`, `documentation`, `style`, `architecture`, `other`. Each item (new and previously-resolved) gets exactly one category based on what the feedback is about.
2. **Group by category + spatial proximity**. Two items form a potential cluster when they share a concern category AND are spatially proximate (same file, or files in the same directory subtree).
2. **Group by category + spatial proximity**. Two items form a potential cluster when they share a concern category AND are spatially proximate (same file, or files in the same directory subtree). Clusters can span new and previously-resolved threads.
| Thematic match | Spatial proximity | Action |
|---|---|---|
@@ -102,20 +102,17 @@ If the gate does not fire, proceed to step 4. The common case (1-2 unrelated com
<theme>[concern category]</theme>
<area>[common directory path]</area>
<files>[comma-separated file paths]</files>
<threads>[comma-separated thread/comment IDs]</threads>
<threads>[comma-separated new thread/comment IDs]</threads>
<hypothesis>[one sentence: what the individual comments collectively suggest about a deeper issue]</hypothesis>
<prior-resolutions>
<thread id="PRRT_..." path="..." category="..."/>
</prior-resolutions>
</cluster-brief>
```
On verify-loop re-entry, add context about the previous cycle:
```xml
<cluster-brief>
...
<just-fixed-files>[files modified in the previous fix cycle]</just-fixed-files>
</cluster-brief>
```
The `<prior-resolutions>` element lists previously-resolved threads that clustered with the new threads — their IDs, file paths, and assigned concern categories. This gives the resolver agent the full cross-round picture. When no previously-resolved threads are in the cluster, omit the element.
4. **Items not in any cluster** remain as individual items and are dispatched normally in step 5.
4. **Items not in any cluster** remain as individual items and are dispatched normally in step 5. Previously-resolved threads that don't cluster with any new thread are dropped — they provided context but no pattern was found.
5. **If no clusters are found** after analysis (the gate fired but items don't form thematic+spatial groups), proceed with all items as individual. The gate was a false positive -- the only cost was the analysis itself.
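As a rough illustration of the gate and the grouping rules above, the following sketch assumes that volume and the cross-invocation signal are the active gate signals, and it simplifies "spatially proximate" to "same parent directory". The function names and the item dictionary shape (`id`, `path`, `category`, `previously_resolved`) are hypothetical, not taken from the skill.

```python
from collections import defaultdict
from pathlib import PurePosixPath


def gate_fires(new_items, cross_invocation_signal):
    """Assumed gate: 3+ new items, or the script reported cross_invocation.signal."""
    return len(new_items) >= 3 or bool(cross_invocation_signal)


def build_clusters(items):
    """Group items by (concern category, parent directory).

    Simplification: sharing a parent directory stands in for the broader
    "same directory subtree" proximity rule.
    """
    groups = defaultdict(list)
    for item in items:
        area = str(PurePosixPath(item["path"]).parent)
        groups[(item["category"], area)].append(item)

    clusters = []
    for (category, area), members in sorted(groups.items()):
        new = [m["id"] for m in members if not m["previously_resolved"]]
        prior = [m["id"] for m in members if m["previously_resolved"]]
        # A cluster needs at least two items and at least one new thread;
        # groups made only of previously-resolved threads are dropped.
        if len(members) >= 2 and new:
            clusters.append(
                {"theme": category, "area": area, "threads": new, "prior_resolutions": prior}
            )
    return clusters


ITEMS = [
    {"id": "PRRT_7", "path": "src/auth/reset.ts", "category": "validation", "previously_resolved": False},
    {"id": "PRRT_4", "path": "src/auth/login.ts", "category": "validation", "previously_resolved": True},
    {"id": "PRRT_5", "path": "src/auth/register.ts", "category": "validation", "previously_resolved": True},
    {"id": "PRRT_9", "path": "src/api/users.ts", "category": "naming", "previously_resolved": False},
    {"id": "PRRT_2", "path": "src/db/pool.ts", "category": "performance", "previously_resolved": True},
]
```

With these sample items, only the three `validation` threads under `src/auth` form a cluster. `PRRT_9` stays an individual item, and `PRRT_2` is dropped because it is previously resolved and clusters with no new thread.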
@@ -133,9 +130,13 @@ If step 3 produced clusters, include them in the task list as cluster items alon
Process all three feedback types. Review threads are the primary type; PR comments and review bodies are secondary but should not be ignored.
#### Dispatch boundary for previously-resolved threads
Previously-resolved threads (from `cross_invocation.resolved_threads`) participate in clustering and appear in cluster briefs as `<prior-resolutions>` context. They are NEVER individually dispatched — they were already resolved in prior rounds. Only new threads get individual or cluster dispatch.
#### Individual dispatch (default)
**For review threads** (`review_threads`): Spawn a `compound-engineering:workflow:pr-comment-resolver` agent for each thread that is NOT already assigned to a cluster from step 3. Clustered threads are handled by cluster dispatch below -- do not dispatch them individually.
**For review threads** (`review_threads`): Spawn a `compound-engineering:workflow:pr-comment-resolver` agent for each new thread that is NOT already assigned to a cluster from step 3. Clustered threads are handled by cluster dispatch below -- do not dispatch them individually.
Each agent receives:
- The thread ID
@@ -264,7 +265,7 @@ The `review_threads` array should be empty (except `needs-human` items).
**If new threads remain**, check the iteration count for this run:
- **First or second fix-verify cycle**: Record which files were modified and which concern categories were addressed in this cycle. Then repeat from step 2 for the remaining threads. The cluster analysis gate (step 3) will fire on re-entry because verify-loop re-entry is a gate signal, enabling broader investigation of recurring patterns.
- **First or second fix-verify cycle**: Repeat from step 2 for the remaining threads. The re-fetch in step 1 will pick up threads resolved in earlier cycles as resolved threads in `cross_invocation`, so the cross-invocation gate (step 3) will fire naturally if patterns emerge across cycles.
- **After the second fix-verify cycle** (3rd pass would begin): Stop looping. Surface remaining issues to the user with context about the recurring pattern: "Multiple rounds of feedback on [area/theme] suggest a deeper issue. Here's what we've fixed so far and what keeps appearing." Use the same `needs-human` escalation pattern -- leave threads open and present the pattern for the user to decide.


@@ -25,16 +25,19 @@ if [ -z "$OWNER" ] || [ -z "$REPO" ]; then
fi
# Fetch review threads, regular PR comments, and review bodies in one query.
# Output is a JSON object with three keys:
# review_threads - unresolved, non-outdated inline code review threads
# pr_comments - top-level PR conversation comments (excludes PR author)
# review_bodies - review submissions with non-empty body text (excludes PR author)
# Output is a JSON object with four keys:
# review_threads - unresolved, non-outdated inline code review threads
# pr_comments - top-level PR conversation comments (excludes PR author)
# review_bodies - review submissions with non-empty body text (excludes PR author)
# cross_invocation - cross-invocation awareness envelope:
# signal: true when both resolved and unresolved threads exist (multi-round review)
# resolved_threads: last 10 resolved threads by recency, for cluster analysis input
gh api graphql -f owner="$OWNER" -f repo="$REPO" -F pr="$PR_NUMBER" -f query='
query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) {
repository(owner: $owner, name: $repo) {
pullRequest(number: $pr) {
author { login }
reviewThreads(first: 100) {
reviewThreads(first: 50) {
edges {
node {
id
@@ -42,7 +45,7 @@ query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) {
isOutdated
path
line
comments(first: 50) {
comments(first: 10) {
nodes {
id
author { login }
@@ -75,13 +78,27 @@ query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) {
}
}
}
}' | jq '.data.repository.pullRequest as $pr | {
review_threads: [$pr.reviewThreads.edges[]
| select(.node.isResolved == false and .node.isOutdated == false)],
}' | jq '.data.repository.pullRequest as $pr |
# Unresolved threads (existing behavior, unchanged)
[$pr.reviewThreads.edges[]
| select(.node.isResolved == false and .node.isOutdated == false)] as $unresolved |
# Resolved threads for cross-invocation awareness (last 10 by most recent comment)
[$pr.reviewThreads.edges[]
| select(.node.isResolved == true)
| { thread_id: .node.id, path: .node.path, line: .node.line,
first_comment_body: .node.comments.nodes[0].body,
last_comment_at: ([.node.comments.nodes[].createdAt] | sort | last) }]
| sort_by(.last_comment_at) | .[-10:] | reverse as $resolved |
{
review_threads: $unresolved,
pr_comments: [$pr.comments.nodes[]
| select(.author.login != $pr.author.login)
| select(.body | test("^\\s*$") | not)],
review_bodies: [$pr.reviews.nodes[]
| select(.body != null and .body != "")
| select(.author.login != $pr.author.login)]
| select(.author.login != $pr.author.login)],
cross_invocation: {
signal: (($resolved | length) > 0 and ($unresolved | length) > 0),
resolved_threads: $resolved
}
}'
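To make the new jq stage easier to follow, here is a hedged Python rendering of the same selection logic, run against hand-written sample data rather than real `gh api graphql` output. `build_cross_invocation` and `SAMPLE_EDGES` are hypothetical names introduced for this sketch; the field names mirror the query above.

```python
def build_cross_invocation(edges, limit=10):
    """Mirror of the jq stage: keep the `limit` most recently commented
    resolved threads (newest first) and compute the gate signal."""
    unresolved = [
        e for e in edges
        if not e["node"]["isResolved"] and not e["node"]["isOutdated"]
    ]

    resolved = []
    for edge in edges:
        node = edge["node"]
        if not node["isResolved"]:
            continue
        comments = node["comments"]["nodes"]
        resolved.append({
            "thread_id": node["id"],
            "path": node["path"],
            "line": node["line"],
            "first_comment_body": comments[0]["body"] if comments else None,
            # ISO-8601 timestamps sort correctly as plain strings.
            "last_comment_at": max((c["createdAt"] for c in comments), default=None),
        })

    # Oldest first, keep the tail, then reverse so the newest comes first.
    resolved.sort(key=lambda t: t["last_comment_at"] or "")
    resolved = resolved[-limit:][::-1]

    return {
        "signal": bool(resolved) and bool(unresolved),
        "resolved_threads": resolved,
    }


def _edge(thread_id, path, resolved, created_at):
    """Build one sample edge in the shape the GraphQL query returns."""
    return {"node": {
        "id": thread_id, "isResolved": resolved, "isOutdated": False,
        "path": path, "line": 1,
        "comments": {"nodes": [{"body": "Please validate input", "createdAt": created_at}]},
    }}


SAMPLE_EDGES = [
    _edge("PRRT_4", "src/auth/login.ts", True, "2026-03-28T10:00:00Z"),
    _edge("PRRT_5", "src/auth/register.ts", True, "2026-03-30T09:00:00Z"),
    _edge("PRRT_7", "src/auth/reset.ts", False, "2026-04-01T08:00:00Z"),
]
```

One consequence of the query limits worth keeping in mind: because only the first 10 comments per thread are fetched, `last_comment_at` reflects the latest of those fetched comments, which may not be the thread's true last comment on long threads.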