feat(resolve-pr-feedback): add cross-invocation cluster analysis (#480)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Trevin Chow
2026-04-01 12:15:25 -07:00
committed by GitHub
parent c65a698d93
commit 7b8265bd81
5 changed files with 461 additions and 32 deletions

View File

@@ -78,15 +78,15 @@ Before planning and dispatching fixes, check whether feedback patterns suggest a
| Gate signal | Check |
|---|---|
| **Volume** | 3+ new items from triage |
| **Verify-loop re-entry** | This is the 2nd+ pass through the workflow (new feedback appeared after a previous fix round) |
| **Cross-invocation** | `cross_invocation.signal == true` in the script output (resolved threads exist alongside new ones — evidence of multi-round review) |
If the gate does not fire, proceed to step 4. The common case (1-2 unrelated comments) skips this step entirely with zero overhead.
If the gate does not fire, proceed to step 4. The common case (first review round with 1-2 comments) skips this step entirely with zero overhead.
**If the gate fires**, analyze feedback for thematic clusters:
**If the gate fires**, analyze feedback for thematic clusters. When the cross-invocation signal fired, include resolved threads from `cross_invocation.resolved_threads` alongside new threads in the analysis — these are previously-resolved threads from earlier review rounds that provide pattern context. Mark them as `previously_resolved` so dispatch (step 5) knows not to individually re-resolve them.
1. **Assign concern categories** from this fixed list: `error-handling`, `validation`, `type-safety`, `naming`, `performance`, `testing`, `security`, `documentation`, `style`, `architecture`, `other`. Each new item gets exactly one category based on what the feedback is about.
1. **Assign concern categories** from this fixed list: `error-handling`, `validation`, `type-safety`, `naming`, `performance`, `testing`, `security`, `documentation`, `style`, `architecture`, `other`. Each item (new and previously-resolved) gets exactly one category based on what the feedback is about.
2. **Group by category + spatial proximity**. Two items form a potential cluster when they share a concern category AND are spatially proximate (same file, or files in the same directory subtree).
2. **Group by category + spatial proximity**. Two items form a potential cluster when they share a concern category AND are spatially proximate (same file, or files in the same directory subtree). Clusters can span new and previously-resolved threads.
| Thematic match | Spatial proximity | Action |
|---|---|---|
@@ -102,20 +102,17 @@ If the gate does not fire, proceed to step 4. The common case (1-2 unrelated com
<theme>[concern category]</theme>
<area>[common directory path]</area>
<files>[comma-separated file paths]</files>
<threads>[comma-separated thread/comment IDs]</threads>
<threads>[comma-separated new thread/comment IDs]</threads>
<hypothesis>[one sentence: what the individual comments collectively suggest about a deeper issue]</hypothesis>
<prior-resolutions>
<thread id="PRRT_..." path="..." category="..."/>
</prior-resolutions>
</cluster-brief>
```
On verify-loop re-entry, add context about the previous cycle:
```xml
<cluster-brief>
...
<just-fixed-files>[files modified in the previous fix cycle]</just-fixed-files>
</cluster-brief>
```
The `<prior-resolutions>` element lists previously-resolved threads that clustered with the new threads — their IDs, file paths, and assigned concern categories. This gives the resolver agent the full cross-round picture. When no previously-resolved threads are in the cluster, omit the element.
4. **Items not in any cluster** remain as individual items and are dispatched normally in step 5.
4. **Items not in any cluster** remain as individual items and are dispatched normally in step 5. Previously-resolved threads that don't cluster with any new thread are dropped — they provided context but no pattern was found.
5. **If no clusters are found** after analysis (the gate fired but items don't form thematic+spatial groups), proceed with all items as individual. The gate was a false positive -- the only cost was the analysis itself.
@@ -133,9 +130,13 @@ If step 3 produced clusters, include them in the task list as cluster items alon
Process all three feedback types. Review threads are the primary type; PR comments and review bodies are secondary but should not be ignored.
#### Dispatch boundary for previously-resolved threads
Previously-resolved threads (from `cross_invocation.resolved_threads`) participate in clustering and appear in cluster briefs as `<prior-resolutions>` context. They are NEVER individually dispatched — they were already resolved in prior rounds. Only new threads get individual or cluster dispatch.
#### Individual dispatch (default)
**For review threads** (`review_threads`): Spawn a `compound-engineering:workflow:pr-comment-resolver` agent for each thread that is NOT already assigned to a cluster from step 3. Clustered threads are handled by cluster dispatch below -- do not dispatch them individually.
**For review threads** (`review_threads`): Spawn a `compound-engineering:workflow:pr-comment-resolver` agent for each new thread that is NOT already assigned to a cluster from step 3. Clustered threads are handled by cluster dispatch below -- do not dispatch them individually.
Each agent receives:
- The thread ID
@@ -264,7 +265,7 @@ The `review_threads` array should be empty (except `needs-human` items).
**If new threads remain**, check the iteration count for this run:
- **First or second fix-verify cycle**: Record which files were modified and which concern categories were addressed in this cycle. Then repeat from step 2 for the remaining threads. The cluster analysis gate (step 3) will fire on re-entry because verify-loop re-entry is a gate signal, enabling broader investigation of recurring patterns.
- **First or second fix-verify cycle**: Repeat from step 2 for the remaining threads. The re-fetch in step 1 will pick up threads resolved in earlier cycles as resolved threads in `cross_invocation`, so the cross-invocation gate (step 3) will fire naturally if patterns emerge across cycles.
- **After the second fix-verify cycle** (3rd pass would begin): Stop looping. Surface remaining issues to the user with context about the recurring pattern: "Multiple rounds of feedback on [area/theme] suggest a deeper issue. Here's what we've fixed so far and what keeps appearing." Use the same `needs-human` escalation pattern -- leave threads open and present the pattern for the user to decide.

View File

@@ -25,16 +25,19 @@ if [ -z "$OWNER" ] || [ -z "$REPO" ]; then
fi
# Fetch review threads, regular PR comments, and review bodies in one query.
# Output is a JSON object with three keys:
# review_threads - unresolved, non-outdated inline code review threads
# pr_comments - top-level PR conversation comments (excludes PR author)
# review_bodies - review submissions with non-empty body text (excludes PR author)
# Output is a JSON object with four keys:
# review_threads - unresolved, non-outdated inline code review threads
# pr_comments - top-level PR conversation comments (excludes PR author)
# review_bodies - review submissions with non-empty body text (excludes PR author)
# cross_invocation - cross-invocation awareness envelope:
# signal: true when both resolved and unresolved threads exist (multi-round review)
# resolved_threads: last N resolved threads by recency, for cluster analysis input
gh api graphql -f owner="$OWNER" -f repo="$REPO" -F pr="$PR_NUMBER" -f query='
query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) {
repository(owner: $owner, name: $repo) {
pullRequest(number: $pr) {
author { login }
reviewThreads(first: 100) {
reviewThreads(first: 50) {
edges {
node {
id
@@ -42,7 +45,7 @@ query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) {
isOutdated
path
line
comments(first: 50) {
comments(first: 10) {
nodes {
id
author { login }
@@ -75,13 +78,27 @@ query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) {
}
}
}
}' | jq '.data.repository.pullRequest as $pr | {
review_threads: [$pr.reviewThreads.edges[]
| select(.node.isResolved == false and .node.isOutdated == false)],
}' | jq '.data.repository.pullRequest as $pr |
# Unresolved threads (existing behavior, unchanged)
[$pr.reviewThreads.edges[]
| select(.node.isResolved == false and .node.isOutdated == false)] as $unresolved |
# Resolved threads for cross-invocation awareness (last 10 by most recent comment)
[$pr.reviewThreads.edges[]
| select(.node.isResolved == true)
| { thread_id: .node.id, path: .node.path, line: .node.line,
first_comment_body: .node.comments.nodes[0].body,
last_comment_at: ([.node.comments.nodes[].createdAt] | sort | last) }]
| sort_by(.last_comment_at) | .[-10:] | reverse as $resolved |
{
review_threads: $unresolved,
pr_comments: [$pr.comments.nodes[]
| select(.author.login != $pr.author.login)
| select(.body | test("^\\s*$") | not)],
review_bodies: [$pr.reviews.nodes[]
| select(.body != null and .body != "")
| select(.author.login != $pr.author.login)]
| select(.author.login != $pr.author.login)],
cross_invocation: {
signal: (($resolved | length) > 0 and ($unresolved | length) > 0),
resolved_threads: $resolved
}
}'