feat(resolve-pr-feedback): add gated feedback clustering to detect systemic issues (#441)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 12:40:55 -07:00
parent 4c7f51f35b
commit a301a08205
3 changed files with 502 additions and 20 deletions
--- a/plugins/compound-engineering/agents/workflow/pr-comment-resolver.md
+++ b/plugins/compound-engineering/agents/workflow/pr-comment-resolver.md
@@ -1,6 +1,6 @@
 ---
 name: pr-comment-resolver
-description: "Evaluates and resolves a single PR review thread -- assesses validity, implements fixes, and returns a structured summary with reply text. Spawned by the resolve-pr-feedback skill."
+description: "Evaluates and resolves one or more related PR review threads -- assesses validity, implements fixes, and returns structured summaries with reply text. Spawned by the resolve-pr-feedback skill."
 color: blue
 model: inherit
 ---
@@ -18,9 +18,22 @@ user: "Thread PRRT_def456 on api.ts:78 -- reviewer says: 'No error handling for
 assistant: "Reading api.ts... There's a try/catch at line 72 that wraps this fetch call. The reviewer may have missed it. Verdict: not-addressing."
 <commentary>The agent verifies the concern against actual code and determines it's invalid.</commentary>
 </example>
+<example>
+Context: Three review threads about missing validation in the same module, dispatched as a cluster.
+user: "Cluster: 3 threads about missing input validation in src/auth/. <cluster-brief><theme>validation</theme><area>src/auth/</area><files>src/auth/login.ts, src/auth/register.ts, src/auth/middleware.ts</files><threads>PRRT_1, PRRT_2, PRRT_3</threads><hypothesis>Individual validation gaps suggest the module lacks a consistent validation strategy</hypothesis></cluster-brief>"
+assistant: "Reading the full src/auth/ directory to understand the validation approach... None of the auth handlers validate input consistently -- login checks email format but not register, and middleware skips validation entirely. The individual comments are symptoms of a missing validation layer. Adding a shared validateAuthInput helper and applying it to all three entry points."
+<commentary>In cluster mode, the agent reads the broader area first, identifies the systemic issue, and makes a holistic fix rather than three individual patches.</commentary>
+</example>
 </examples>

-You resolve a single PR review thread. You receive the thread ID, file path, line number, and full comment text. Your job: evaluate whether the feedback is valid, fix it if so, and return a structured summary.
+You resolve PR review threads. You receive thread details -- one thread in standard mode, or multiple related threads with a cluster brief in cluster mode. Your job: evaluate whether the feedback is valid, fix it if so, and return structured summaries.
+
+## Mode Detection
+
+| Input | Mode |
+|-------|------|
+| Thread details without `<cluster-brief>` | **Standard** -- evaluate and fix one thread (or one file's worth of threads) |
+| Thread details with `<cluster-brief>` XML block | **Cluster** -- investigate the broader area before making targeted fixes |

 ## Evaluation Rubric

@@ -44,7 +57,7 @@ Before touching any code, read the referenced file and classify the feedback:

 **Escalate (verdict: `needs-human`)** when: architectural changes that affect other systems, security-sensitive decisions, ambiguous business logic, or conflicting reviewer feedback. This should be rare -- most feedback has a clear right answer.

-## Workflow
+## Standard Mode Workflow

 1. **Read the code** at the referenced file and line. For review threads, the file path and line are provided directly. For PR comments and review bodies (no file/line context), identify the relevant files from the comment text and the PR diff.
 2. **Evaluate validity** using the rubric above.
@@ -124,10 +137,39 @@ reason: [one-line explanation]
 decision_context: [only for needs-human -- the full markdown block above]
 ```

+## Cluster Mode Workflow
+
+When a `<cluster-brief>` XML block is present, follow this workflow instead of the standard workflow.
+
+1. **Parse the cluster brief** for: theme, area, file paths, thread IDs, hypothesis, and (if present) just-fixed-files from a previous cycle.
+
+2. **Read the broader area** -- not just the referenced lines, but the full file(s) listed in the brief and closely related code in the same directory. Understand the current approach in this area as it relates to the cluster theme.
+
+3. **Assess root cause**: Are the individual comments symptoms of a deeper structural issue, or are they coincidentally co-located but unrelated?
+   - **Systemic**: The comments point to a missing pattern, inconsistent approach, or architectural gap. A holistic fix (adding a shared utility, establishing a consistent pattern, restructuring the approach) would address all threads and prevent future similar feedback.
+   - **Coincidental**: The comments happen to be in the same area with the same theme, but each has a distinct, unrelated root cause. Individual fixes are appropriate.
+
+4. **Implement fixes**:
+   - If **systemic**: make the holistic fix first, then verify each thread is resolved by the broader change. If any thread needs additional targeted work beyond the holistic fix, apply it.
+   - If **coincidental**: fix each thread individually as in standard mode.
+
+5. **Compose reply text** for each thread using the same formats as standard mode.
+
+6. **Return summaries** -- one per thread handled, using the same structure as standard mode. Additionally return:
+
+```
+cluster_assessment: [What the broader investigation found. Whether a holistic
+or individual approach was taken, and why. If holistic: what the systemic issue
+was and how the fix addresses it. Keep to 2-3 sentences.]
+```
+
+The `cluster_assessment` is returned once for the whole cluster, not per-thread.
+
 ## Principles

- Stay focused on the specific thread. Don't fix adjacent issues unless the feedback explicitly references them.
 - Read before acting. Never assume the reviewer is right without checking the code.
 - Never assume the reviewer is wrong without checking the code.
 - If the reviewer's suggestion would work but a better approach exists, use the better approach and explain why in the reply.
 - Maintain consistency with the existing codebase style and patterns.
+- In standard mode: stay focused on the specific thread. Don't fix adjacent issues unless the feedback explicitly references them.
+- In cluster mode: read broadly, but keep fixes scoped to the cluster theme. Don't use the broader read as an excuse to refactor unrelated code.
--- a/plugins/compound-engineering/skills/resolve-pr-feedback/SKILL.md
+++ b/plugins/compound-engineering/skills/resolve-pr-feedback/SKILL.md
@@ -64,9 +64,59 @@ Before processing, classify each piece of feedback as **new** or **already handl

 The distinction is about content, not who posted what. A deferral from a teammate, a previous skill run, or a manual reply all count.

-If there are no new items across all feedback types, skip steps 3-7 and go straight to step 8.
+If there are no new items across all feedback types, skip steps 3-8 and go straight to step 9.

-### 3. Plan
+### 3. Cluster Analysis (Gated)
+
+Before planning and dispatching fixes, check whether feedback patterns suggest a systemic issue that warrants broader investigation rather than individual fixes.
+
+**Gate check**: Cluster analysis only runs when at least one signal fires. If neither fires, skip directly to step 4.
+
+| Gate signal | Check |
+|---|---|
+| **Volume** | 4+ new items from triage |
+| **Verify-loop re-entry** | This is the 2nd+ pass through the workflow (new feedback appeared after a previous fix round) |
+
+If the gate does not fire, proceed to step 4. The common case (1-3 unrelated comments) skips this step entirely with zero overhead.
+
+**If the gate fires**, analyze feedback for thematic clusters:
+
+1. **Assign concern categories** from this fixed list: `error-handling`, `validation`, `type-safety`, `naming`, `performance`, `testing`, `security`, `documentation`, `style`, `architecture`, `other`. Each new item gets exactly one category based on what the feedback is about.
+
+2. **Group by category + spatial proximity**. Two items form a potential cluster when they share a concern category AND are spatially proximate (same file, or files in the same directory subtree).
+
+   | Thematic match | Spatial proximity | Action |
+   |---|---|---|
+   | Same category | Same file | Cluster |
+   | Same category | Same directory subtree | Cluster |
+   | Same category | Unrelated locations | No cluster |
+   | Different categories | Any | No cluster (same-file grouping still applies for conflict avoidance) |
+
+3. **Synthesize a cluster brief** for each cluster of 2+ items. Pass briefs to agents using a `<cluster-brief>` XML block:
+
+   ```xml
+   <cluster-brief>
+     <theme>[concern category]</theme>
+     <area>[common directory path]</area>
+     <files>[comma-separated file paths]</files>
+     <threads>[comma-separated thread/comment IDs]</threads>
+     <hypothesis>[one sentence: what the individual comments collectively suggest about a deeper issue]</hypothesis>
+   </cluster-brief>
+   ```
+
+   On verify-loop re-entry, add context about the previous cycle:
+   ```xml
+   <cluster-brief>
+     ...
+     <just-fixed-files>[files modified in the previous fix cycle]</just-fixed-files>
+   </cluster-brief>
+   ```
+
+4. **Items not in any cluster** remain as individual items and are dispatched normally in step 5.
+
+5. **If no clusters are found** after analysis (the gate fired but items don't form thematic+spatial groups), proceed with all items as individual. The gate was a false positive -- the only cost was the analysis itself.
+
+### 4. Plan

 Create a task list of all **new** unresolved items grouped by type (e.g., `TaskCreate` in Claude Code, `update_plan` in Codex):
 - Code changes requested
@@ -74,11 +124,15 @@ Create a task list of all **new** unresolved items grouped by type (e.g., `TaskC
 - Style/convention fixes
 - Test additions needed

-### 4. Implement (PARALLEL)
+If step 3 produced clusters, include them in the task list as cluster items alongside individual items.
+
+### 5. Implement (PARALLEL)

 Process all three feedback types. Review threads are the primary type; PR comments and review bodies are secondary but should not be ignored.

-**For review threads** (`review_threads`): Spawn a `compound-engineering:workflow:pr-comment-resolver` agent for each.
+#### Individual dispatch (default)
+
+**For review threads** (`review_threads`): Spawn a `compound-engineering:workflow:pr-comment-resolver` agent for each thread that is NOT already assigned to a cluster from step 3. Clustered threads are handled by cluster dispatch below -- do not dispatch them individually.

 Each agent receives:
 - The thread ID
@@ -87,7 +141,19 @@ Each agent receives:
 - The PR number (for context)
 - The feedback type (`review_thread`)

-**For PR comments and review bodies** (`pr_comments`, `review_bodies`): These lack file/line context. Spawn a `compound-engineering:workflow:pr-comment-resolver` agent for each actionable item. The agent receives the comment ID, body text, PR number, and feedback type (`pr_comment` or `review_body`). The agent must identify the relevant files from the comment text and the PR diff.
+**For PR comments and review bodies** (`pr_comments`, `review_bodies`): These lack file/line context. Spawn a `compound-engineering:workflow:pr-comment-resolver` agent for each actionable non-clustered item. The agent receives the comment ID, body text, PR number, and feedback type (`pr_comment` or `review_body`). The agent must identify the relevant files from the comment text and the PR diff.
+
+#### Cluster dispatch
+
+For each cluster identified in step 3, dispatch ONE `compound-engineering:workflow:pr-comment-resolver` agent that receives:
+- The `<cluster-brief>` XML block
+- All thread details for threads in the cluster (IDs, file paths, line numbers, comment text)
+- The PR number
+- The feedback types
+
+The cluster agent reads the broader area before making targeted fixes. It returns one summary per thread it handled (same structure as individual agents), plus a `cluster_assessment` field describing what broader investigation revealed and whether a holistic or individual approach was taken.
+
+#### Agent return format

 Each agent returns a short summary:
 - **verdict**: `fixed`, `fixed-differently`, `replied`, `not-addressing`, or `needs-human`
@@ -97,6 +163,9 @@ Each agent returns a short summary:
 - **files_changed**: list of files modified (empty if replied/not-addressing)
 - **reason**: brief explanation of what was done or why it was skipped

+Cluster agents additionally return:
+- **cluster_assessment**: what the broader investigation found, whether a holistic or individual approach was taken
+
 Verdict meanings:
 - `fixed` -- code change made as requested
 - `fixed-differently` -- code change made, but with a better approach than suggested
@@ -104,17 +173,19 @@ Verdict meanings:
 - `not-addressing` -- feedback is factually wrong about the code; skip with evidence
 - `needs-human` -- cannot determine the right action; needs user decision

-**Batching**: If there are 1-4 items total, dispatch all in parallel. For 5+ items, batch in groups of 4.
+#### Batching and conflict avoidance

-**Conflict avoidance**: If multiple threads reference the same file, group them into a single agent dispatch to avoid parallel edit conflicts. The agent handling a multi-thread file receives all threads for that file and addresses them sequentially.
+**Batching**: Clusters count as 1 dispatch unit regardless of how many threads they contain. If there are 1-4 dispatch units total (clusters + individual items), dispatch all in parallel. For 5+ dispatch units, batch in groups of 4.

-Fixes can occasionally expand beyond their referenced file (e.g., renaming a method updates callers elsewhere). This is rare but can cause parallel agents to collide. The verification step (step 7) catches this -- if re-fetching shows unresolved threads or if the commit reveals inconsistent changes, re-run the affected agents sequentially.
+**Conflict avoidance**: No two dispatch units that touch the same file should run in parallel. Before dispatching, check for file overlaps across all dispatch units (clusters and individual items). If a cluster's file list overlaps with an individual item's file, or with another cluster's files, serialize those units -- dispatch one, wait for it to complete, then dispatch the next. Non-overlapping units can still run in parallel. Within a single dispatch unit handling multiple threads on the same file, the agent addresses them sequentially.

-Platforms that do not support parallel dispatch should run agents sequentially.
+**Sequential fallback**: Platforms that do not support parallel dispatch should run agents sequentially. Dispatch cluster units first (they are higher-leverage), then individual items.

-### 5. Commit and Push
+Fixes can occasionally expand beyond their referenced file (e.g., renaming a method updates callers elsewhere). This is rare but can cause parallel agents to collide. The verification step (step 8) catches this -- if re-fetching shows unresolved threads or if the commit reveals inconsistent changes, re-run the affected agents sequentially.

-After all agents complete, check whether any files were actually changed. If all verdicts are `replied`, `not-addressing`, or `needs-human` (no code changes), skip this step entirely and proceed to step 6.
+### 6. Commit and Push
+
+After all agents complete, check whether any files were actually changed. If all verdicts are `replied`, `not-addressing`, or `needs-human` (no code changes), skip this step entirely and proceed to step 7.

 If there are file changes:

@@ -132,7 +203,7 @@ git commit -m "Address PR review feedback (#PR_NUMBER)
 git push
 ```

-### 6. Reply and Resolve
+### 7. Reply and Resolve

 After the push succeeds, post replies and resolve where applicable. The mechanism depends on the feedback type.

@@ -178,7 +249,7 @@ gh pr comment PR_NUMBER --body "REPLY_TEXT"

 Include enough quoted context in the reply so the reader can follow which comment is being addressed without scrolling.

-### 7. Verify
+### 8. Verify

 Re-fetch feedback to confirm resolution:

@@ -186,11 +257,17 @@ Re-fetch feedback to confirm resolution:
 bash scripts/get-pr-comments PR_NUMBER
 ```

-The `review_threads` array should be empty (except `needs-human` items). If threads remain, repeat from step 1 for the remaining threads.
+The `review_threads` array should be empty (except `needs-human` items).
+
+**If new threads remain**, check the iteration count for this run:
+
+- **First or second fix-verify cycle**: Record which files were modified and which concern categories were addressed in this cycle. Then repeat from step 2 for the remaining threads. The cluster analysis gate (step 3) will fire on re-entry because verify-loop re-entry is a gate signal, enabling broader investigation of recurring patterns.
+
+- **After the second fix-verify cycle** (3rd pass would begin): Stop looping. Surface remaining issues to the user with context about the recurring pattern: "Multiple rounds of feedback on [area/theme] suggest a deeper issue. Here's what we've fixed so far and what keeps appearing." Use the same `needs-human` escalation pattern -- leave threads open and present the pattern for the user to decide.

 PR comments and review bodies have no resolve mechanism, so they will still appear in the output. Verify they were replied to by checking the PR conversation.

-### 8. Summary
+### 9. Summary

 Present a concise summary of all work done. Group by verdict, one line per item describing *what was done* not just *where*. This is the primary output the user sees.

@@ -205,6 +282,15 @@ Replied (count): [what questions were answered]
 Not addressing (count): [what was skipped and why]
 ```

+If any clusters were investigated, append a cluster investigation section:
+
+```
+Cluster investigations (count):
+
+1. [theme] in [area]: [cluster_assessment from the agent --
+   what was found, whether a holistic or individual approach was taken]
+```
+
 If any agent returned `needs-human`, append a decisions section. These are rare but high-signal. Each `needs-human` agent returns a `decision_context` field with a structured analysis: what the reviewer said, what the agent investigated, why it needs a decision, concrete options with tradeoffs, and the agent's lean if it has one.

 Present the `decision_context` directly -- it's already structured for the user to read and decide quickly:
@@ -264,7 +350,7 @@ This fetches thread IDs and their first comment IDs (minimal fields, no bodies)

 ### 2. Fix, Reply, Resolve

-Spawn a single `compound-engineering:workflow:pr-comment-resolver` agent for the thread. Then follow the same commit -> push -> reply -> resolve flow as Full Mode steps 5-6.
+Spawn a single `compound-engineering:workflow:pr-comment-resolver` agent for the thread. Then follow the same commit -> push -> reply -> resolve flow as Full Mode steps 6-7.

 ---