feat(resolve-pr-feedback): add cross-invocation cluster analysis (#480)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 12:15:25 -07:00
parent c65a698d93
commit 7b8265bd81
5 changed files with 461 additions and 32 deletions
--- a/plugins/compound-engineering/skills/resolve-pr-feedback/SKILL.md
+++ b/plugins/compound-engineering/skills/resolve-pr-feedback/SKILL.md
@@ -78,15 +78,15 @@ Before planning and dispatching fixes, check whether feedback patterns suggest a
 | Gate signal | Check |
 |---|---|
 | **Volume** | 3+ new items from triage |
-| **Verify-loop re-entry** | This is the 2nd+ pass through the workflow (new feedback appeared after a previous fix round) |
+| **Cross-invocation** | `cross_invocation.signal == true` in the script output (resolved threads exist alongside new ones — evidence of multi-round review) |

-If the gate does not fire, proceed to step 4. The common case (1-2 unrelated comments) skips this step entirely with zero overhead.
+If the gate does not fire, proceed to step 4. The common case (first review round with 1-2 comments) skips this step entirely with zero overhead.

-**If the gate fires**, analyze feedback for thematic clusters:
+**If the gate fires**, analyze feedback for thematic clusters. When the cross-invocation signal fired, include resolved threads from `cross_invocation.resolved_threads` alongside new threads in the analysis — these are previously-resolved threads from earlier review rounds that provide pattern context. Mark them as `previously_resolved` so dispatch (step 5) knows not to individually re-resolve them.

-1. **Assign concern categories** from this fixed list: `error-handling`, `validation`, `type-safety`, `naming`, `performance`, `testing`, `security`, `documentation`, `style`, `architecture`, `other`. Each new item gets exactly one category based on what the feedback is about.
+1. **Assign concern categories** from this fixed list: `error-handling`, `validation`, `type-safety`, `naming`, `performance`, `testing`, `security`, `documentation`, `style`, `architecture`, `other`. Each item (new and previously-resolved) gets exactly one category based on what the feedback is about.

-2. **Group by category + spatial proximity**. Two items form a potential cluster when they share a concern category AND are spatially proximate (same file, or files in the same directory subtree).
+2. **Group by category + spatial proximity**. Two items form a potential cluster when they share a concern category AND are spatially proximate (same file, or files in the same directory subtree). Clusters can span new and previously-resolved threads.

   | Thematic match | Spatial proximity | Action |
   |---|---|---|
@@ -102,20 +102,17 @@ If the gate does not fire, proceed to step 4. The common case (1-2 unrelated com
     <theme>[concern category]</theme>
     <area>[common directory path]</area>
     <files>[comma-separated file paths]</files>
-     <threads>[comma-separated thread/comment IDs]</threads>
+     <threads>[comma-separated new thread/comment IDs]</threads>
     <hypothesis>[one sentence: what the individual comments collectively suggest about a deeper issue]</hypothesis>
+     <prior-resolutions>
+       <thread id="PRRT_..." path="..." category="..."/>
+     </prior-resolutions>
   </cluster-brief>
   ```

-   On verify-loop re-entry, add context about the previous cycle:
-   ```xml
-   <cluster-brief>
-     ...
-     <just-fixed-files>[files modified in the previous fix cycle]</just-fixed-files>
-   </cluster-brief>
-   ```
+   The `<prior-resolutions>` element lists previously-resolved threads that clustered with the new threads — their IDs, file paths, and assigned concern categories. This gives the resolver agent the full cross-round picture. When no previously-resolved threads are in the cluster, omit the element.

-4. **Items not in any cluster** remain as individual items and are dispatched normally in step 5.
+4. **Items not in any cluster** remain as individual items and are dispatched normally in step 5. Previously-resolved threads that don't cluster with any new thread are dropped — they provided context but no pattern was found.

 5. **If no clusters are found** after analysis (the gate fired but items don't form thematic+spatial groups), proceed with all items as individual. The gate was a false positive -- the only cost was the analysis itself.

@@ -133,9 +130,13 @@ If step 3 produced clusters, include them in the task list as cluster items alon

 Process all three feedback types. Review threads are the primary type; PR comments and review bodies are secondary but should not be ignored.

+#### Dispatch boundary for previously-resolved threads
+
+Previously-resolved threads (from `cross_invocation.resolved_threads`) participate in clustering and appear in cluster briefs as `<prior-resolutions>` context. They are NEVER individually dispatched — they were already resolved in prior rounds. Only new threads get individual or cluster dispatch.
+
 #### Individual dispatch (default)

-**For review threads** (`review_threads`): Spawn a `compound-engineering:workflow:pr-comment-resolver` agent for each thread that is NOT already assigned to a cluster from step 3. Clustered threads are handled by cluster dispatch below -- do not dispatch them individually.
+**For review threads** (`review_threads`): Spawn a `compound-engineering:workflow:pr-comment-resolver` agent for each new thread that is NOT already assigned to a cluster from step 3. Clustered threads are handled by cluster dispatch below -- do not dispatch them individually.

 Each agent receives:
 - The thread ID
@@ -264,7 +265,7 @@ The `review_threads` array should be empty (except `needs-human` items).

 **If new threads remain**, check the iteration count for this run:

- **First or second fix-verify cycle**: Record which files were modified and which concern categories were addressed in this cycle. Then repeat from step 2 for the remaining threads. The cluster analysis gate (step 3) will fire on re-entry because verify-loop re-entry is a gate signal, enabling broader investigation of recurring patterns.
+- **First or second fix-verify cycle**: Repeat from step 2 for the remaining threads. The re-fetch in step 1 will pick up threads resolved in earlier cycles as resolved threads in `cross_invocation`, so the cross-invocation gate (step 3) will fire naturally if patterns emerge across cycles.

 - **After the second fix-verify cycle** (3rd pass would begin): Stop looping. Surface remaining issues to the user with context about the recurring pattern: "Multiple rounds of feedback on [area/theme] suggest a deeper issue. Here's what we've fixed so far and what keeps appearing." Use the same `needs-human` escalation pattern -- leave threads open and present the pattern for the user to decide.

--- a/plugins/compound-engineering/skills/resolve-pr-feedback/scripts/get-pr-comments
+++ b/plugins/compound-engineering/skills/resolve-pr-feedback/scripts/get-pr-comments
@@ -25,16 +25,19 @@ if [ -z "$OWNER" ] || [ -z "$REPO" ]; then
 fi

 # Fetch review threads, regular PR comments, and review bodies in one query.
-# Output is a JSON object with three keys:
-#   review_threads - unresolved, non-outdated inline code review threads
-#   pr_comments    - top-level PR conversation comments (excludes PR author)
-#   review_bodies  - review submissions with non-empty body text (excludes PR author)
+# Output is a JSON object with four keys:
+#   review_threads   - unresolved, non-outdated inline code review threads
+#   pr_comments      - top-level PR conversation comments (excludes PR author)
+#   review_bodies    - review submissions with non-empty body text (excludes PR author)
+#   cross_invocation - cross-invocation awareness envelope:
+#     signal: true when both resolved and unresolved threads exist (multi-round review)
+#     resolved_threads: last N resolved threads by recency, for cluster analysis input
 gh api graphql -f owner="$OWNER" -f repo="$REPO" -F pr="$PR_NUMBER" -f query='
 query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) {
  repository(owner: $owner, name: $repo) {
    pullRequest(number: $pr) {
      author { login }
-      reviewThreads(first: 100) {
+      reviewThreads(first: 50) {
        edges {
          node {
            id
@@ -42,7 +45,7 @@ query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) {
            isOutdated
            path
            line
-            comments(first: 50) {
+            comments(first: 10) {
              nodes {
                id
                author { login }
@@ -75,13 +78,27 @@ query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) {
      }
    }
  }
-}' | jq '.data.repository.pullRequest as $pr | {
-  review_threads: [$pr.reviewThreads.edges[]
-    | select(.node.isResolved == false and .node.isOutdated == false)],
+}' | jq '.data.repository.pullRequest as $pr |
+  # Unresolved threads (existing behavior, unchanged)
+  [$pr.reviewThreads.edges[]
+    | select(.node.isResolved == false and .node.isOutdated == false)] as $unresolved |
+  # Resolved threads for cross-invocation awareness (last 10 by most recent comment)
+  [$pr.reviewThreads.edges[]
+    | select(.node.isResolved == true)
+    | { thread_id: .node.id, path: .node.path, line: .node.line,
+        first_comment_body: .node.comments.nodes[0].body,
+        last_comment_at: ([.node.comments.nodes[].createdAt] | sort | last) }]
+    | sort_by(.last_comment_at) | .[-10:] | reverse as $resolved |
+{
+  review_threads: $unresolved,
  pr_comments: [$pr.comments.nodes[]
    | select(.author.login != $pr.author.login)
    | select(.body | test("^\\s*$") | not)],
  review_bodies: [$pr.reviews.nodes[]
    | select(.body != null and .body != "")
-    | select(.author.login != $pr.author.login)]
+    | select(.author.login != $pr.author.login)],
+  cross_invocation: {
+    signal: (($resolved | length) > 0 and ($unresolved | length) > 0),
+    resolved_threads: $resolved
+  }
 }'