feat(resolve-pr-feedback): add gated feedback clustering to detect systemic issues (#441)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-29 12:40:55 -07:00
parent 4c7f51f35b
commit a301a08205
3 changed files with 502 additions and 20 deletions
--- a/docs/plans/2026-03-29-002-feat-pr-feedback-clustering-plan.md
+++ b/docs/plans/2026-03-29-002-feat-pr-feedback-clustering-plan.md
@@ -0,0 +1,354 @@
+---
+title: "feat(resolve-pr-feedback): Add feedback clustering to detect systemic issues"
+type: feat
+status: completed
+date: 2026-03-29
+deepened: 2026-03-29
+---
+
+# feat(resolve-pr-feedback): Add feedback clustering to detect systemic issues
+
+## Overview
+
+Add a gated cluster analysis phase to the resolve-pr-feedback skill that detects when concentrated, thematically similar feedback signals a systemic issue rather than isolated bugs. The analysis is gated: it runs only when feedback patterns warrant it (high volume or verify-loop re-entry), keeping the common case (2-3 unrelated comments) at zero extra cost. Same-file concentration alone does not open the gate; it is used only inside the analysis as a spatial proximity signal. When clusters are detected, the skill dispatches a single investigation-aware agent per cluster that reads the broader area before fixing, rather than N individual fixers playing whack-a-mole. Verify-loop re-entry (new feedback after a fix round) automatically triggers the gate, so cross-cycle patterns are caught without a separate detection mechanism.
+
+## Problem Frame
+
+The resolve-pr-feedback skill currently processes feedback items individually. The only grouping is same-file conflict avoidance (grouping threads that reference the same file into one agent dispatch). There is no semantic analysis of whether multiple feedback items collectively point to a deeper structural issue.
+
+This leads to a whack-a-mole pattern:
+1. Review bots post 4 comments about missing error handling across different functions in `auth.ts`
+2. The skill fixes each one individually — adds a try/catch here, a null check there
+3. The review bot re-runs and finds 3 more error handling gaps the individual fixes didn't cover
+4. The cycle repeats because the underlying issue (the error handling *strategy* in that module) was never examined
+
+The insight: individual comments don't say "this whole approach is wrong," but when you see 2+ comments about the same category of concern in the same area of code, the inference is that the approach in that area needs rethinking — not just N individual patches.
+
+## Requirements Trace
+
+- R1. Detect thematic+spatial clusters in feedback before dispatching fix agents
+- R2. When clusters are detected, investigate the broader area before making targeted fixes
+- R3. Treat verify-loop re-entry (new feedback after a fix round) as a signal to investigate more broadly via the cluster analysis gate
+- R4. Preserve existing behavior for non-clustered feedback (isolated items still get individual agents)
+- R5. Keep the skill prompt-driven (no code changes — this is all SKILL.md and agent markdown)
+- R6. Gate cluster analysis on signal strength — don't run it unconditionally on every pass, only when feedback patterns warrant the cost
+
+## Scope Boundaries
+
+- No changes to the GraphQL scripts (fetch, reply, resolve)
+- No changes to targeted mode (single-thread URL) — clustering only applies in full mode
+- No new agents — extend the existing pr-comment-resolver agent with cluster context handling
+- No changes to the verdict taxonomy (fixed, fixed-differently, replied, not-addressing, needs-human)
+- Clustering is a signal for the orchestrator, not a new data structure or API
+
+## Context & Research
+
+### Relevant Code and Patterns
+
+- `plugins/compound-engineering/skills/resolve-pr-feedback/SKILL.md` — the orchestrator skill, 285 lines
+- `plugins/compound-engineering/agents/workflow/pr-comment-resolver.md` — the worker agent, 134 lines
+- Current same-file grouping at SKILL.md lines 107-113 — conflict avoidance pattern to extend
+- The ce:review skill's confidence-gated merge/dedup pipeline — precedent for pre-dispatch analysis
+- The todo-resolve skill uses the same pr-comment-resolver agent and batching pattern
+
+### Institutional Learnings
+
+- **Whack-a-mole state machines** (`docs/solutions/skill-design/git-workflow-skills-need-explicit-state-machines-2026-03-27.md`): Skills handling multiple dimensions of state need explicit re-verification after every mutating action. Directly applicable — after fixing a cluster, re-verify the whole area, not just the individual threads.
+- **Cluster before filter** (`docs/solutions/skill-design/claude-permissions-optimizer-classification-fix.md`): Pipeline ordering is an architectural invariant. Group/cluster related items before deciding how to address them, otherwise individually below-threshold items that are part of a meaningful pattern get discarded.
+- **Status-gated resolution** (`docs/solutions/workflow/todo-status-lifecycle.md`): Quality gates belong upstream in triage, not at the resolve boundary. The cluster analysis step is exactly this — a quality gate before dispatch.
+- **Pass paths not content** (`docs/solutions/skill-design/pass-paths-not-content-to-subagents-2026-03-26.md`): When dispatching cluster-aware agents, pass thread IDs and file paths, not full comment bodies.
+
+## Key Technical Decisions
+
+- **Cluster analysis lives in the orchestrator (SKILL.md), not the agent**: The orchestrator sees all feedback and can detect cross-thread patterns. Individual agents only see their assigned threads. The orchestrator synthesizes the cluster brief; the agent receives it as context alongside the thread details.
+
+- **Extend existing grouping rather than replacing it**: The current same-file grouping (SKILL.md lines 107-113) already groups threads that reference the same file. Cluster analysis is a semantic layer on top of this — it groups by theme + proximity, and the same-file grouping becomes a special case of spatial proximity.
+
+- **Single agent per cluster, not a new "investigator" agent**: The pr-comment-resolver agent already reads code, evaluates validity, and fixes. For clusters, it receives additional context (the cluster brief and all related threads) and follows an extended workflow: read the broader area first, assess root cause, then decide between holistic fix and individual fixes. This avoids a new agent and keeps the existing parallel dispatch architecture.
+
+- **Cross-cycle detection is a gate signal, not a separate mechanism**: When the Verify step finds new feedback after a fix round, that re-entry automatically triggers the cluster analysis gate. No separate concern-category matching or structural comparison needed — the cluster analysis step handles thematic grouping with the just-fixed file context. This avoids the fragility of comparing LLM-generated category labels across inference passes.
+
+- **Cluster threshold: 2+ items with shared theme AND proximity**: A single comment is never a cluster. Two items sharing both thematic similarity and spatial proximity form the minimum cluster. The threshold is deliberately low because the cost of investigating more broadly is small (agent time is cheap) and the cost of missing a systemic issue is high (another review loop).
+
+- **Cluster analysis is gated, not always-on**: Running cluster analysis on every pass adds latency and token cost for the common case (2-3 unrelated comments). Instead, cluster analysis only fires when the feedback already shows concentration signals. The gate uses cheap, structural checks that are byproducts of triage — not new LLM inference. Gate signals: (a) volume threshold (4+ new items total — enough that patterns are plausible), or (b) verify-loop re-entry (new feedback appeared after a fix round — the strongest signal). Same-file concentration is deliberately excluded as a gate signal because it's the most common feedback pattern and is already handled by existing same-file grouping; it would cause the gate to fire on the majority of runs. If no gate signal fires, skip cluster analysis entirely and proceed directly to plan/dispatch as today.
+
+- **Verify-loop re-entry is a gate signal, not a separate comparison mechanism**: Cross-cycle detection does not need its own concern-category matching or structural comparison. The fact that new feedback appeared after a fix round IS the whack-a-mole signal. Any verify-loop re-entry automatically triggers the cluster analysis gate. The cluster analysis step itself handles the thematic grouping — it doesn't need a separate mechanism to tell it "this is cross-cycle." On re-entry, the cluster analysis step receives which files were just fixed as additional context, so it can assess whether new feedback relates to just-fixed areas.
+
+## Open Questions
+
+### Resolved During Planning
+
+- **Should clusters replace or supplement individual dispatch?** Supplement. Non-clustered items still get individual agents. A cluster dispatches one agent that handles all its threads together. Both can happen in the same run.
+- **Should the agent decide holistic vs. individual, or the orchestrator?** The agent. The orchestrator detects the cluster and synthesizes the brief, but the agent reads the code and is better positioned to judge whether individual fixes suffice or a broader change is needed.
+- **How does the cluster brief get passed?** In a `<cluster-brief>` XML block in the agent prompt — structurally delimited for unambiguous activation. The brief contains: theme label, affected directory/area, file paths, thread IDs, and a one-sentence hypothesis. No full comment bodies — the agent reads threads itself. This prevents accidental cluster mode activation (e.g., todo-resolve passing text that coincidentally mentions "cluster") and follows the pass-paths-not-content principle.
+
+### Deferred to Implementation
+
+- **Exact wording of the cluster analysis prompt**: The heuristics are defined but the prompt phrasing that gets the LLM orchestrator to reliably detect clusters will need iteration.
+- **Whether the "holistic fix" mode needs examples in the agent**: The agent may need 1-2 examples of cluster-aware evaluation in its `<examples>` section. Testing will show if the current examples plus the new workflow instructions are sufficient.
+
+## High-Level Technical Design
+
+> *This illustrates the intended approach and is directional guidance for review, not an implementation specification. The implementing agent should treat it as context, not code to reproduce.*
+
+```
+Current flow:
+  Fetch -> Triage -> Plan -> Dispatch(per-thread) -> Commit -> Reply -> Verify -> Summary
+
+New flow:
+  Fetch -> Triage -> [Gate Check] -> Plan -> Dispatch -> Commit -> Reply -> Verify -> Summary
+                         |                     |                              |
+                    Gate fires?            If clusters:                  New feedback?
+                    /        \             1 agent/cluster               /          \
+                 YES          NO           If isolated:              YES            NO
+                  |            |            1 agent/thread        (re-entry         done
+           Cluster Analysis    |            (same as today)     triggers gate)
+                  |            |
+           Synthesize briefs   |
+                  \           /
+                   v         v
+                 Plan step (unified)
+```
+
+**Cluster analysis gate:**
+
+The gate uses cheap structural checks — byproducts of triage, not new LLM inference. Cluster analysis only runs when at least one gate signal fires:
+
+| Gate signal | Source | Cost |
+|---|---|---|
+| Volume: 4+ new items total | Item count from triage | Zero — simple count |
+| Verify-loop re-entry: this is the 2nd+ pass | Iteration state | Zero — binary flag |
+
+Same-file concentration is deliberately NOT a gate signal. Multiple items landing on the same file is the most common feedback pattern, and existing same-file grouping already handles it for conflict avoidance. Running cluster analysis every time 2+ items hit the same file would add overhead to the majority of runs for little benefit. Same-file concentration is valuable *inside* the analysis (once the gate has fired for another reason) as a spatial proximity signal, but it should not open the gate itself.
+
+If no gate signal fires (the common case: 1-3 items across different files), skip cluster analysis entirely and proceed to plan/dispatch with zero clustering overhead. If the first pass misses a cluster due to low volume, verify-loop re-entry catches it on the second pass.
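As an illustration only (the plan itself is prompt-driven and adds no code), the gate described above can be sketched in Python. The names `check_cluster_gate`, `GateDecision`, and `VOLUME_THRESHOLD` are hypothetical and do not appear in SKILL.md; the sketch assumes the orchestrator tracks an item count and a pass number.

```python
from dataclasses import dataclass

# Hypothetical constant mirroring the "4+ new items total" gate signal.
VOLUME_THRESHOLD = 4


@dataclass(frozen=True)
class GateDecision:
    fires: bool
    reasons: tuple[str, ...]


def check_cluster_gate(new_item_count: int, pass_number: int) -> GateDecision:
    """Decide whether cluster analysis should run on this pass.

    pass_number is 1 on the first pass and 2 or more on verify-loop re-entry.
    Same-file concentration is intentionally not an input to the gate.
    """
    reasons = []
    if new_item_count >= VOLUME_THRESHOLD:
        reasons.append("volume")
    if pass_number >= 2:
        reasons.append("verify-loop re-entry")
    return GateDecision(fires=bool(reasons), reasons=tuple(reasons))
```

Both inputs are byproducts of triage and iteration state, which is why the check costs nothing beyond a comparison.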
+
+**Cluster detection decision matrix:**
+
+Spatial proximity is a hard requirement for clustering. Thematic similarity without proximity is better handled by cross-cycle escalation (Unit 4), which catches the case where the same theme keeps producing new issues across the codebase.
+
+| Thematic similarity | Spatial proximity | Item count | Action |
+|---|---|---|---|
+| Yes | Yes (same file) | 2+ | Cluster -> investigate area |
+| Yes | Yes (same directory/module) | 2+ | Cluster -> investigate area |
+| Yes | No (unrelated locations) | any | No cluster (cross-cycle escalation catches recurring themes) |
+| No | Yes (same file) | any | Same-file grouping only (existing behavior for conflict avoidance) |
+| No | No | any | Individual dispatch (existing behavior) |
+
+Spatial proximity means: same file, or files in the same directory subtree (e.g., `src/auth/login.ts` and `src/auth/middleware.ts` are proximate; `src/auth/login.ts` and `src/database/pool.ts` are not).
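A minimal sketch of the proximity rule and the decision matrix, for illustration only. The function names are hypothetical, and two details are assumptions rather than part of the plan: "same directory subtree" is read as "one file's directory equals or contains the other's", and the repository root is not treated as a shared area.

```python
from pathlib import PurePosixPath


def is_proximate(path_a: str, path_b: str) -> bool:
    """True for the same file, the same directory, or nested directories."""
    dir_a = PurePosixPath(path_a).parent
    dir_b = PurePosixPath(path_b).parent
    if dir_a == dir_b:
        return True
    # Assumption: the repository root alone does not make two files proximate.
    root = PurePosixPath(".")
    if root in (dir_a, dir_b):
        return False
    return dir_a in dir_b.parents or dir_b in dir_a.parents


def classify_pair(same_theme: bool, path_a: str, path_b: str) -> str:
    """Apply the decision matrix to a pair of feedback items."""
    if same_theme and is_proximate(path_a, path_b):
        return "cluster"
    if path_a == path_b:
        return "same-file grouping"
    return "individual"
```

In the skill itself this judgment is made by the LLM orchestrator from the prompt; the code only makes the matrix rows concrete.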
+
+**Cluster brief structure:**
+
+The cluster brief is passed to agents in a `<cluster-brief>` XML block for unambiguous activation. Contents are constrained to avoid inflating agent context:
+
+```xml
+<cluster-brief>
+  <theme>Missing input validation</theme>
+  <area>src/auth/</area>
+  <files>src/auth/login.ts, src/auth/register.ts, src/auth/middleware.ts</files>
+  <threads>PRRT_abc123, PRRT_def456, PRRT_ghi789</threads>
+  <hypothesis>Individual validation gaps suggest the module lacks a consistent validation strategy</hypothesis>
+</cluster-brief>
+```
+
+No full comment bodies in the brief. The agent reads threads via their IDs.
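The brief structure can be sketched as a small Python helper, purely as an illustration. `ClusterBrief`, `render_cluster_brief`, and `has_cluster_brief` are hypothetical names; in the plan the orchestrator writes the block as prompt text and the agent detects it by reading the prompt.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass


@dataclass(frozen=True)
class ClusterBrief:
    theme: str
    area: str
    files: list[str]
    thread_ids: list[str]
    hypothesis: str


def render_cluster_brief(brief: ClusterBrief) -> str:
    """Serialize the brief; it carries paths and thread IDs, never comment bodies."""
    root = ET.Element("cluster-brief")
    ET.SubElement(root, "theme").text = brief.theme
    ET.SubElement(root, "area").text = brief.area
    ET.SubElement(root, "files").text = ", ".join(brief.files)
    ET.SubElement(root, "threads").text = ", ".join(brief.thread_ids)
    ET.SubElement(root, "hypothesis").text = brief.hypothesis
    ET.indent(root)  # pretty-print; requires Python 3.9+
    return ET.tostring(root, encoding="unicode")


def has_cluster_brief(prompt: str) -> bool:
    """Cluster mode activates only on the delimited block, not on the word 'cluster'."""
    return "<cluster-brief>" in prompt and "</cluster-brief>" in prompt
```

Checking for the delimiters rather than a keyword is what keeps other consumers such as todo-resolve from activating cluster mode by accident.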
+
+**Cross-cycle escalation:**
+
+```
+Verify re-fetch finds new threads
+  -> Any new feedback after a fix round = verify-loop re-entry
+  -> Re-entry automatically triggers the cluster analysis gate
+  -> Cluster analysis receives additional context: files just fixed in previous cycle
+  -> Cap at 2 fix-verify iterations before surfacing to user
+```
+
+No separate concern-category matching for cross-cycle detection. The re-entry itself is the signal. The cluster analysis step (which only runs because the gate fired) handles the thematic grouping and determines whether new feedback relates to just-fixed areas.
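The bounded loop can be sketched as follows, for illustration only. `run_fix_verify_loop` and its callback parameters are hypothetical; the sketch assumes one callback performs a whole fix round (triage through reply) and another re-fetches unresolved threads.

```python
from typing import Callable

# Mirrors "cap at 2 fix-verify iterations before surfacing to user".
MAX_FIX_VERIFY_CYCLES = 2


def run_fix_verify_loop(
    initial_threads: list[str],
    fix_round: Callable[[list[str], bool], None],
    fetch_new_threads: Callable[[], list[str]],
) -> list[str]:
    """Run fix/verify cycles and return threads left for the user.

    An empty list means a clean exit. fix_round receives the re-entry flag,
    which is the gate signal; no cross-cycle category comparison is made.
    """
    threads = initial_threads
    for cycle in range(1, MAX_FIX_VERIFY_CYCLES + 1):
        is_reentry = cycle > 1
        fix_round(threads, is_reentry)
        threads = fetch_new_threads()
        if not threads:
            return []
    # Cap reached: surface what remains instead of looping again.
    return threads
```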
+
+## Implementation Units
+
+- [x] **Unit 1: Add gated cluster analysis step to SKILL.md**
+
+**Goal:** Insert a gated step between Triage (Step 2) and Plan (Step 3) that checks whether feedback patterns warrant cluster analysis, and only runs the analysis when they do. The common case (2-3 unrelated comments) skips this step entirely.
+
+**Requirements:** R1, R4, R6
+
+**Dependencies:** None
+
+**Files:**
+- Modify: `plugins/compound-engineering/skills/resolve-pr-feedback/SKILL.md`
+
+**Approach:**
+- Add new "Step 2.5: Cluster Analysis (Gated)" after the triage step
+- **Gate check first**: Before any thematic analysis, check two structural signals: (a) volume — 4+ new items total, (b) verify-loop re-entry — this is the 2nd+ pass through the workflow. If neither fires, skip to Plan step with zero clustering overhead. Same-file concentration is not a gate signal (it's the most common pattern and already handled by existing same-file grouping), but it is used inside the analysis as a spatial proximity indicator once the gate has fired
+- **If gate fires**: Group items by concern category AND spatial proximity. Concern categories are broad labels assigned during this step (error handling, validation, type safety, naming, performance, etc.) — not free-text; use a fixed category list so labels are consistent and comparable. Use the decision matrix from the technical design section to determine actionable clusters
+- When clusters are found, synthesize a `<cluster-brief>` XML block per cluster: the theme, affected files/areas, the hypothesis, and the list of thread IDs. On verify-loop re-entry, include which files were just fixed in the previous cycle as additional context
+- Items not in any cluster remain as individual items (preserving existing behavior)
+- If the gate fired but no clusters are found after thematic analysis, proceed with all items as individual (the gate was a false positive — no cost beyond the analysis itself)
+- Renumber subsequent steps (current Step 3 becomes Step 4, etc.)
+
+**Patterns to follow:**
+- The existing same-file grouping at SKILL.md lines 107-113 — extend this concept semantically
+- The ce:review skill's merge/dedup pipeline across personas — precedent for cross-item analysis before dispatch
+
+**Test scenarios:**
+- Happy path: 5 items across different files, 3 share a validation theme in same directory -> gate fires (volume >= 4), cluster detected for the 3 validation items, other 2 dispatched individually
+- Edge case: 3 items about same theme on same file -> gate does NOT fire (below volume threshold, not a re-entry). Same-file grouping handles conflict avoidance. If the first pass misses a deeper issue and verify finds new feedback, re-entry catches it on the second pass
+- Edge case: 2 unrelated items on different files -> gate does NOT fire, cluster analysis skipped entirely
+- Edge case: verify-loop re-entry with only 1 new item -> gate fires (re-entry signal), analysis runs with context about just-fixed files
+- Happy path: 1 clustered group + 2 isolated items -> cluster gets a brief in `<cluster-brief>` XML block, isolated items pass through unchanged
+- Edge case: gate fires (volume), 4 items on same file but all different themes -> analysis runs, finds no thematic cluster, proceeds with same-file grouping only (false positive gate, low cost)
+- Edge case: items in same directory subtree (e.g., `src/auth/login.ts` and `src/auth/middleware.ts`) -> proximate, eligible for clustering
+- Edge case: 2 items with same theme in completely unrelated files -> NOT clustered (no spatial proximity)
+
+**Verification:**
+- Gate check runs on every pass at near-zero cost (2 structural checks: item count and re-entry flag)
+- Cluster analysis only runs when gate fires
+- The common case (1-3 items) skips cluster analysis entirely
+- Same-file grouping continues to work independently for conflict avoidance regardless of whether the gate fires
+- Renumbering is consistent throughout the document. Specific cross-references to update: (1) "skip steps 3-7 and go straight to step 8" (line 67), (2) "verification step (step 7)" (line 111), (3) "proceed to step 6" (line 117), (4) "repeat from step 1" (line 189), (5) "step 2" (line 222), (6) Targeted Mode "Full Mode steps 5-6" (line 267)
+
+---
+
+- [x] **Unit 2: Modify dispatch logic for cluster-aware processing**
+
+**Goal:** Change Steps 3-4 (Plan and Implement) so that clusters dispatch a single agent with the cluster brief and all related threads, while isolated items dispatch individually as before.
+
+**Requirements:** R2, R4
+
+**Dependencies:** Unit 1
+
+**Files:**
+- Modify: `plugins/compound-engineering/skills/resolve-pr-feedback/SKILL.md`
+
+**Approach:**
+- In the Plan step, task items now include both clusters (with their briefs) and isolated items
+- In the Implement step, for each cluster: dispatch ONE pr-comment-resolver agent that receives the `<cluster-brief>` XML block, all thread details in the cluster, and an instruction to read the broader area before fixing
+- For isolated items: dispatch exactly as today (one agent per thread, same-file grouping still applies)
+- Batching rule adjusts: clusters count as 1 dispatch unit regardless of how many threads they contain; batching of 4 applies to dispatch units (clusters + isolated items), not raw thread count
+- Sequential fallback ordering: when the platform does not support parallel dispatch, dispatch cluster units first (they are higher-leverage), then isolated items
+- The agent for a cluster returns one summary per thread it handled (same verdict structure), plus a `cluster_assessment` field describing what broader investigation revealed and whether a holistic or individual approach was taken
+
+**Patterns to follow:**
+- Existing same-file grouping and batching logic at SKILL.md lines 107-113
+- The pr-comment-resolver's multi-thread-on-same-file handling — similar pattern, extended to multi-thread-on-same-theme
+
+**Test scenarios:**
+- Happy path: 1 cluster of 3 threads + 2 isolated threads -> 3 dispatch units (1 cluster agent + 2 individual agents), all within the batch-of-4 limit
+- Happy path: cluster agent receives the `<cluster-brief>` XML block and all 3 thread details in its prompt
+- Edge case: 8 isolated items, no clusters -> existing behavior unchanged (2 batches of 4)
+- Edge case: sequential fallback -> clusters dispatched before isolated items
+- Edge case: 2 clusters of 3 each + 2 isolated -> 4 dispatch units (2 cluster agents + 2 individual agents)
+- Happy path: cluster agent returns per-thread verdicts (one summary per thread, same structure as individual agents)
+
+**Verification:**
+- Clustered threads are handled by a single agent dispatch with the cluster brief as context
+- Isolated threads are dispatched individually as before
+- Batching counts dispatch units, not raw threads
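A rough sketch of the batching rule, for illustration only. `DispatchUnit` and `plan_batches` are hypothetical names, and the sketch ignores same-file grouping of isolated threads for brevity. Clusters are placed first, which also gives the ordering used in the sequential fallback.

```python
from dataclasses import dataclass

# Batch size of 4, applied to dispatch units rather than raw threads.
BATCH_SIZE = 4


@dataclass(frozen=True)
class DispatchUnit:
    kind: str  # "cluster" or "isolated"
    thread_ids: tuple[str, ...]


def plan_batches(
    clusters: list[list[str]], isolated: list[str]
) -> list[list[DispatchUnit]]:
    """Build dispatch units (clusters first) and split them into batches."""
    units = [DispatchUnit("cluster", tuple(ids)) for ids in clusters]
    units += [DispatchUnit("isolated", (thread_id,)) for thread_id in isolated]
    return [units[i:i + BATCH_SIZE] for i in range(0, len(units), BATCH_SIZE)]
```

With one cluster of three threads and two isolated threads, this yields a single batch of three units, matching the first test scenario above.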
+
+---
+
+- [x] **Unit 3: Extend pr-comment-resolver for cluster investigation**
+
+**Goal:** Add cluster-aware workflow to the pr-comment-resolver agent so it can receive a cluster brief and investigate the broader area before making targeted fixes.
+
+**Requirements:** R2
+
+**Dependencies:** Unit 2
+
+**Files:**
+- Modify: `plugins/compound-engineering/agents/workflow/pr-comment-resolver.md`
+
+**Approach:**
+- Add a "Cluster Mode" section to the agent, structured as a mode detection table (following ce:review's pattern): if a `<cluster-brief>` XML block is present in the prompt, activate cluster mode; otherwise, standard single-thread mode
+- Cluster mode workflow: (1) Parse the `<cluster-brief>` block for theme, area, file paths, thread IDs, and hypothesis. (2) Read the broader area — not just the referenced lines, but the full file(s) and closely related code in the same directory. (3) Assess whether the individual comments are symptoms of a deeper structural issue. (4) If yes: make a holistic fix that addresses the root cause, then verify each thread is resolved by the broader fix. (5) If no: fix each thread individually as in standard mode.
+- The agent returns the standard per-thread verdict summaries plus a `cluster_assessment` field: a brief description of what broader investigation revealed and whether a holistic or individual approach was taken. This field is consumed by the orchestrator's Summary step to present cluster investigation results to the user
+- Add 1-2 examples showing cluster-aware evaluation (e.g., 3 error handling comments -> agent reads broader area, identifies missing error boundary pattern, adds it, resolves all 3 threads)
+- Update the agent's frontmatter description to reflect that it handles one or more related threads (e.g., "Evaluates and resolves one or more related PR review threads -- assesses validity, implements fixes, and returns structured summaries with reply text. Spawned by the resolve-pr-feedback skill.")
+- Preserve existing single-thread behavior unchanged when no `<cluster-brief>` block is present
+
+**Patterns to follow:**
+- Existing multi-thread-on-same-file handling in the agent (it already handles multiple threads sequentially when grouped by file)
+- The evaluation rubric's existing structure — cluster mode adds a preliminary "read broader area" step before applying the rubric to each thread
+
+**Test scenarios:**
+- Happy path: agent receives cluster brief about "missing validation" across 3 functions -> reads full file, identifies validation pattern gap, adds validation helper and applies to all 3 locations, returns 3 `fixed` verdicts + cluster_assessment
+- Happy path: agent receives cluster brief but determines individual fixes suffice (comments are coincidentally in same area but unrelated root causes) -> fixes individually, cluster_assessment says "individual fixes appropriate"
+- Edge case: cluster brief + 1 thread that's actually `not-addressing` -> agent still investigates broadly for the valid threads, returns `not-addressing` for the invalid one
+- Happy path: no `<cluster-brief>` block provided -> existing single-thread behavior unchanged (including when dispatched by todo-resolve, which never sends a cluster brief)
+- Integration: cluster agent's per-thread verdicts flow correctly into the orchestrator's commit/reply/resolve steps
+- Integration: cluster_assessment field is consumed by the Summary step to present investigation results to the user
+
+**Verification:**
+- Agent reads the broader area before fixing when `<cluster-brief>` block is present
+- Agent returns per-thread verdicts compatible with the orchestrator's existing commit/reply/resolve flow
+- Existing single-thread behavior is preserved when no `<cluster-brief>` block is present
+- The `<cluster-brief>` XML delimiter prevents accidental cluster mode activation from other consumers (e.g., todo-resolve)
+
+---
+
+- [x] **Unit 4: Add verify-loop re-entry handling and iteration cap**
+
+**Goal:** Modify the Verify step so that any verify-loop re-entry (new feedback after a fix round) automatically triggers the cluster analysis gate from Unit 1, and cap iterations to prevent infinite loops.
+
+**Requirements:** R3, R6
+
+**Dependencies:** Unit 1
+
+**Files:**
+- Modify: `plugins/compound-engineering/skills/resolve-pr-feedback/SKILL.md`
+
+**Approach:**
+- In the Verify step, after re-fetching feedback, if new threads remain: record the files and themes just fixed in this cycle, then loop back to Triage (Step 2). The cluster analysis gate in Step 2.5 fires automatically because "verify-loop re-entry" is one of its gate signals. No separate comparison or concern-category matching needed — the cluster analysis step itself handles thematic grouping with the just-fixed context
+- On re-entry, pass the list of files modified in the previous cycle to the cluster analysis step so it can assess whether new feedback relates to just-fixed areas
+- Add an iteration cap: after 2 fix-verify cycles, surface remaining issues to the user with context about the recurring pattern rather than continuing to loop. Frame it as: "Multiple rounds of feedback on [area/theme] suggest a deeper issue. Here's what we've fixed so far and what keeps appearing." (Consistent with ce:review's `max_rounds: 2` bounded re-review loop)
+- The iteration cap applies per-run, not per-cluster
+
+**Patterns to follow:**
+- The existing verify-and-repeat logic at SKILL.md lines 186-189
+- The whack-a-mole state machine pattern from `docs/solutions/skill-design/git-workflow-skills-need-explicit-state-machines-2026-03-27.md`
+- The `needs-human` escalation pattern already in the skill — iteration cap uses the same "surface to user with structured context" approach
+- The ce:review `max_rounds: 2` bounded loop precedent
+
+**Test scenarios:**
+- Happy path: fix 3 issues, verify re-fetch finds 2 new issues -> re-entry triggers gate, cluster analysis runs with just-fixed context, and the new items may cluster around the just-fixed area
+- Happy path: fix 3 issues, verify re-fetch finds 1 unrelated issue on different file -> re-entry triggers gate, cluster analysis runs but finds no cluster (1 item, different area), proceeds with individual dispatch
+- Edge case: 2 fix-verify cycles -> after 2nd cycle, surface to user with "recurring pattern" framing instead of looping again
+- Edge case: fix round resolves everything, verify finds zero new threads -> clean exit, no re-entry
+- Edge case: re-entry with only 1 new item on a file that was just fixed -> gate fires (re-entry), cluster analysis has just-fixed context to assess the connection
+- Integration: verify-loop re-entry feeds into the same gated cluster analysis step from Unit 1 (not a separate mechanism)
+
+**Verification:**
+- Any verify-loop re-entry triggers the cluster analysis gate
+- The cluster analysis step receives just-fixed file context on re-entry
+- Iteration cap prevents infinite fix-verify loops
+- No separate concern-category matching or structural comparison needed for cross-cycle detection
+
+## System-Wide Impact
+
+- **Interaction graph:** The resolve-pr-feedback skill dispatches pr-comment-resolver agents. This change modifies what context those agents receive (`<cluster-brief>` XML block) and how the orchestrator decides dispatch grouping. The commit/reply/resolve flow downstream is unchanged — cluster agents return the same per-thread verdict structure. The `cluster_assessment` field flows into the Summary step as a new section: "Cluster investigations: [count clusters investigated, what was found, holistic vs individual approach taken]."
+- **Error propagation:** The cluster analysis step is additive. If it fails, meaning the orchestrator produces zero clusters from the analysis, the skill falls back to existing individual dispatch for all items. The result is the existing behavior, not a broken workflow, and the user sees no difference.
+- **State lifecycle risks:** Cross-cycle detection needs only two pieces of state: the verify-loop re-entry flag and the list of files fixed in the previous cycle. Both live within a single skill run and are not persisted across runs, so no new state storage is needed.
+- **API surface parity:** The todo-resolve skill also uses pr-comment-resolver but dispatches for individual todos, not PR feedback clusters. No changes needed to todo-resolve — the cluster mode in pr-comment-resolver only activates when a cluster brief is present.
+- **Unchanged invariants:** Targeted mode (single URL) is completely unaffected — it is a separate entry path and never triggers cluster analysis. The verdict taxonomy, reply format, GraphQL scripts, and commit/push flow are all unchanged. The pr-comment-resolver agent's existing single-thread behavior is preserved when no `<cluster-brief>` block is present, ensuring todo-resolve and any other consumers are unaffected.
+
+## Risks & Dependencies
+
+| Risk | Mitigation |
+|------|------------|
+| Cluster detection is too aggressive (groups unrelated items) | Require both thematic similarity AND spatial proximity. The decision matrix has clear thresholds. Easy to tune prompt wording if false positives appear. |
+| Cluster detection is too conservative (misses real patterns) | Low threshold (2+ items). Agent time is cheap — false positive clusters just mean a broader read before fixing, which rarely hurts. |
+| Cluster agent makes a holistic fix that breaks something the individual fixes wouldn't have | The agent still returns per-thread verdicts. The verify step catches regressions. The iteration cap prevents infinite loops. |
+| Verify-loop re-entry triggers gate unnecessarily (new feedback is unrelated to just-fixed work) | Low cost — the gate fires, cluster analysis runs, finds no cluster, and proceeds with individual dispatch. The only overhead is the analysis step itself, which is lightweight when no clusters exist. |
+| Cluster analysis runs too often (gate too sensitive) | Only 2 signals: volume >= 4 and re-entry. Volume threshold is tunable. False positive gates add only the analysis step overhead — no agent dispatch, no broader-area reads. |
+| Cluster analysis runs too rarely (gate too conservative) | The gate is additive — if it misses a cluster on the first pass (e.g., 3 items about the same theme, below volume threshold), verify-loop re-entry catches it on the second pass. One extra review cycle is an acceptable cost for keeping the common case fast. |
+| Prompt length growth in SKILL.md | The gated cluster analysis step adds ~40-60 lines. The skill is currently 285 lines. This keeps it under 350, well within reasonable skill length. |
+
+## Sources & References
+
+- Related code: `plugins/compound-engineering/skills/resolve-pr-feedback/SKILL.md`
+- Related code: `plugins/compound-engineering/agents/workflow/pr-comment-resolver.md`
+- Institutional learning: `docs/solutions/skill-design/git-workflow-skills-need-explicit-state-machines-2026-03-27.md`
+- Institutional learning: `docs/solutions/skill-design/claude-permissions-optimizer-classification-fix.md`
+- Institutional learning: `docs/solutions/workflow/todo-status-lifecycle.md`
+- Institutional learning: `docs/solutions/skill-design/pass-paths-not-content-to-subagents-2026-03-26.md`
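
The gate and grouping rules this plan relies on (two gate signals, then clustering on thematic similarity AND spatial proximity with a 2+ item threshold) can be sketched in a few lines. The skill expresses these rules as prompt instructions, so the Python below is only an illustration, not the implementation. `FeedbackItem`, `gate_fires`, and `find_clusters` are hypothetical names, the category strings are illustrative, and "spatial proximity" is approximated here as "same parent directory", which is an assumption.

```python
from collections import defaultdict
from dataclasses import dataclass
from pathlib import PurePosixPath

VOLUME_THRESHOLD = 4  # "volume >= 4" gate signal from the risk table
MIN_CLUSTER_SIZE = 2  # "Low threshold (2+ items)" from the risk table


@dataclass(frozen=True)
class FeedbackItem:
    """Hypothetical record for one new feedback item after triage."""
    thread_id: str
    path: str      # file the feedback refers to
    category: str  # single concern category assigned to the item


def gate_fires(new_item_count: int, pass_number: int) -> bool:
    """Return True when either gate signal fires: volume or verify-loop re-entry."""
    return new_item_count >= VOLUME_THRESHOLD or pass_number >= 2


def find_clusters(items):
    """Group items sharing a category AND a parent directory (assumed proximity rule)."""
    groups = defaultdict(list)
    for item in items:
        directory = str(PurePosixPath(item.path).parent)
        groups[(item.category, directory)].append(item)
    # Keep only groups large enough to count as a cluster.
    return {key: members for key, members in groups.items()
            if len(members) >= MIN_CLUSTER_SIZE}
```

Items left out of every cluster would simply go through individual dispatch, so a gate that fires without producing clusters costs only this analysis step.
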
--- a/plugins/compound-engineering/agents/workflow/pr-comment-resolver.md
+++ b/plugins/compound-engineering/agents/workflow/pr-comment-resolver.md
@@ -1,6 +1,6 @@
 ---
 name: pr-comment-resolver
-description: "Evaluates and resolves a single PR review thread -- assesses validity, implements fixes, and returns a structured summary with reply text. Spawned by the resolve-pr-feedback skill."
+description: "Evaluates and resolves one or more related PR review threads -- assesses validity, implements fixes, and returns structured summaries with reply text. Spawned by the resolve-pr-feedback skill."
 color: blue
 model: inherit
 ---
@@ -18,9 +18,22 @@ user: "Thread PRRT_def456 on api.ts:78 -- reviewer says: 'No error handling for
 assistant: "Reading api.ts... There's a try/catch at line 72 that wraps this fetch call. The reviewer may have missed it. Verdict: not-addressing."
 <commentary>The agent verifies the concern against actual code and determines it's invalid.</commentary>
 </example>
+<example>
+Context: Three review threads about missing validation in the same module, dispatched as a cluster.
+user: "Cluster: 3 threads about missing input validation in src/auth/. <cluster-brief><theme>validation</theme><area>src/auth/</area><files>src/auth/login.ts, src/auth/register.ts, src/auth/middleware.ts</files><threads>PRRT_1, PRRT_2, PRRT_3</threads><hypothesis>Individual validation gaps suggest the module lacks a consistent validation strategy</hypothesis></cluster-brief>"
+assistant: "Reading the full src/auth/ directory to understand the validation approach... None of the auth handlers validate input consistently -- login checks email format but register does not, and middleware skips validation entirely. The individual comments are symptoms of a missing validation layer. Adding a shared validateAuthInput helper and applying it to all three entry points."
+<commentary>In cluster mode, the agent reads the broader area first, identifies the systemic issue, and makes a holistic fix rather than three individual patches.</commentary>
+</example>
 </examples>

-You resolve a single PR review thread. You receive the thread ID, file path, line number, and full comment text. Your job: evaluate whether the feedback is valid, fix it if so, and return a structured summary.
+You resolve PR review threads. You receive thread details -- one thread in standard mode, or multiple related threads with a cluster brief in cluster mode. Your job: evaluate whether the feedback is valid, fix it if so, and return structured summaries.
+
+## Mode Detection
+
+| Input | Mode |
+|-------|------|
+| Thread details without `<cluster-brief>` | **Standard** -- evaluate and fix one thread (or one file's worth of threads) |
+| Thread details with `<cluster-brief>` XML block | **Cluster** -- investigate the broader area before making targeted fixes |

 ## Evaluation Rubric

@@ -44,7 +57,7 @@ Before touching any code, read the referenced file and classify the feedback:

 **Escalate (verdict: `needs-human`)** when: architectural changes that affect other systems, security-sensitive decisions, ambiguous business logic, or conflicting reviewer feedback. This should be rare -- most feedback has a clear right answer.

-## Workflow
+## Standard Mode Workflow

 1. **Read the code** at the referenced file and line. For review threads, the file path and line are provided directly. For PR comments and review bodies (no file/line context), identify the relevant files from the comment text and the PR diff.
 2. **Evaluate validity** using the rubric above.
@@ -124,10 +137,39 @@ reason: [one-line explanation]
 decision_context: [only for needs-human -- the full markdown block above]
 ```

+## Cluster Mode Workflow
+
+When a `<cluster-brief>` XML block is present, follow this workflow instead of the standard workflow.
+
+1. **Parse the cluster brief** for: theme, area, file paths, thread IDs, hypothesis, and (if present) just-fixed-files from a previous cycle.
+
+2. **Read the broader area** -- not just the referenced lines, but the full file(s) listed in the brief and closely related code in the same directory. Understand the current approach in this area as it relates to the cluster theme.
+
+3. **Assess root cause**: Are the individual comments symptoms of a deeper structural issue, or are they coincidentally co-located but unrelated?
+   - **Systemic**: The comments point to a missing pattern, inconsistent approach, or architectural gap. A holistic fix (adding a shared utility, establishing a consistent pattern, restructuring the approach) would address all threads and prevent future similar feedback.
+   - **Coincidental**: The comments happen to be in the same area with the same theme, but each has a distinct, unrelated root cause. Individual fixes are appropriate.
+
+4. **Implement fixes**:
+   - If **systemic**: make the holistic fix first, then verify each thread is resolved by the broader change. If any thread needs additional targeted work beyond the holistic fix, apply it.
+   - If **coincidental**: fix each thread individually as in standard mode.
+
+5. **Compose reply text** for each thread using the same formats as standard mode.
+
+6. **Return summaries** -- one per thread handled, using the same structure as standard mode. Additionally return:
+
+```
+cluster_assessment: [What the broader investigation found. Whether a holistic
+or individual approach was taken, and why. If holistic: what the systemic issue
+was and how the fix addresses it. Keep to 2-3 sentences.]
+```
+
+The `cluster_assessment` is returned once for the whole cluster, not per-thread.
+
 ## Principles

-- Stay focused on the specific thread. Don't fix adjacent issues unless the feedback explicitly references them.
 - Read before acting. Never assume the reviewer is right without checking the code.
 - Never assume the reviewer is wrong without checking the code.
 - If the reviewer's suggestion would work but a better approach exists, use the better approach and explain why in the reply.
 - Maintain consistency with the existing codebase style and patterns.
+- In standard mode: stay focused on the specific thread. Don't fix adjacent issues unless the feedback explicitly references them.
+- In cluster mode: read broadly, but keep fixes scoped to the cluster theme. Don't use the broader read as an excuse to refactor unrelated code.
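
The mode detection rule in the agent file above (cluster mode when a `<cluster-brief>` block is present, standard mode otherwise) can be illustrated with a small parser. This is a sketch under stated assumptions: the agent is prompt-driven and does not run this code, `detect_mode` and `ClusterBrief` are hypothetical names, and the brief is assumed to be well-formed XML with comma-separated `files` and `threads` fields as in the example.

```python
import re
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Matches the first <cluster-brief> block embedded anywhere in the agent input.
_BRIEF_RE = re.compile(r"<cluster-brief>.*?</cluster-brief>", re.DOTALL)


@dataclass
class ClusterBrief:
    """Hypothetical container for the fields named in the cluster brief."""
    theme: str
    area: str
    files: List[str]
    threads: List[str]
    hypothesis: str
    just_fixed_files: List[str] = field(default_factory=list)


def _csv(text: Optional[str]) -> List[str]:
    """Split a comma-separated field into trimmed, non-empty entries."""
    return [part.strip() for part in (text or "").split(",") if part.strip()]


def detect_mode(agent_input: str) -> Tuple[str, Optional[ClusterBrief]]:
    """Return ("cluster", brief) if a brief is present, else ("standard", None)."""
    match = _BRIEF_RE.search(agent_input)
    if match is None:
        return "standard", None
    root = ET.fromstring(match.group(0))
    brief = ClusterBrief(
        theme=(root.findtext("theme") or "").strip(),
        area=(root.findtext("area") or "").strip(),
        files=_csv(root.findtext("files")),
        threads=_csv(root.findtext("threads")),
        hypothesis=(root.findtext("hypothesis") or "").strip(),
        # Only present on verify-loop re-entry; defaults to an empty list.
        just_fixed_files=_csv(root.findtext("just-fixed-files")),
    )
    return "cluster", brief
```

A strict XML parser would reject a hypothesis containing a raw `&` or `<`, so a real consumer of this format would need escaping or a more lenient extraction.
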
--- a/plugins/compound-engineering/skills/resolve-pr-feedback/SKILL.md
+++ b/plugins/compound-engineering/skills/resolve-pr-feedback/SKILL.md
@@ -64,9 +64,59 @@ Before processing, classify each piece of feedback as **new** or **already handl

 The distinction is about content, not who posted what. A deferral from a teammate, a previous skill run, or a manual reply all count.

-If there are no new items across all feedback types, skip steps 3-7 and go straight to step 8.
+If there are no new items across all feedback types, skip steps 3-8 and go straight to step 9.

-### 3. Plan
+### 3. Cluster Analysis (Gated)
+
+Before planning and dispatching fixes, check whether feedback patterns suggest a systemic issue that warrants broader investigation rather than individual fixes.
+
+**Gate check**: Cluster analysis only runs when at least one signal fires. If neither fires, skip directly to step 4.
+
+| Gate signal | Check |
+|---|---|
+| **Volume** | 4+ new items from triage |
+| **Verify-loop re-entry** | This is the 2nd+ pass through the workflow (new feedback appeared after a previous fix round) |
+
+If the gate does not fire, proceed to step 4. The common case (1-3 unrelated comments) skips this step entirely with zero overhead.
+
+**If the gate fires**, analyze feedback for thematic clusters:
+
+1. **Assign concern categories** from this fixed list: `error-handling`, `validation`, `type-safety`, `naming`, `performance`, `testing`, `security`, `documentation`, `style`, `architecture`, `other`. Each new item gets exactly one category based on what the feedback is about.
+
+2. **Group by category + spatial proximity**. Two items form a potential cluster when they share a concern category AND are spatially proximate (same file, or files in the same directory subtree).
+
+   | Thematic match | Spatial proximity | Action |
+   |---|---|---|
+   | Same category | Same file | Cluster |
+   | Same category | Same directory subtree | Cluster |
+   | Same category | Unrelated locations | No cluster |
+   | Different categories | Any | No cluster (same-file grouping still applies for conflict avoidance) |
+
+3. **Synthesize a cluster brief** for each cluster of 2+ items. Pass briefs to agents using a `<cluster-brief>` XML block:
+
+   ```xml
+   <cluster-brief>
+     <theme>[concern category]</theme>
+     <area>[common directory path]</area>
+     <files>[comma-separated file paths]</files>
+     <threads>[comma-separated thread/comment IDs]</threads>
+     <hypothesis>[one sentence: what the individual comments collectively suggest about a deeper issue]</hypothesis>
+   </cluster-brief>
+   ```
+
+   On verify-loop re-entry, add context about the previous cycle:
+   ```xml
+   <cluster-brief>
+     ...
+     <just-fixed-files>[files modified in the previous fix cycle]</just-fixed-files>
+   </cluster-brief>
+   ```
+
+4. **Items not in any cluster** remain as individual items and are dispatched normally in step 5.
+
+5. **If no clusters are found** after analysis (the gate fired but items don't form thematic+spatial groups), proceed with all items as individual. The gate was a false positive -- the only cost was the analysis itself.
+
+### 4. Plan

 Create a task list of all **new** unresolved items grouped by type (e.g., `TaskCreate` in Claude Code, `update_plan` in Codex):
 - Code changes requested
@@ -74,11 +124,15 @@ Create a task list of all **new** unresolved items grouped by type (e.g., `TaskC
 - Style/convention fixes
 - Test additions needed

-### 4. Implement (PARALLEL)
+If step 3 produced clusters, include them in the task list as cluster items alongside individual items.
+
+### 5. Implement (PARALLEL)

 Process all three feedback types. Review threads are the primary type; PR comments and review bodies are secondary but should not be ignored.

-**For review threads** (`review_threads`): Spawn a `compound-engineering:workflow:pr-comment-resolver` agent for each.
+#### Individual dispatch (default)
+
+**For review threads** (`review_threads`): Spawn a `compound-engineering:workflow:pr-comment-resolver` agent for each thread that is NOT already assigned to a cluster from step 3. Clustered threads are handled by cluster dispatch below -- do not dispatch them individually.

 Each agent receives:
 - The thread ID
@@ -87,7 +141,19 @@ Each agent receives:
 - The PR number (for context)
 - The feedback type (`review_thread`)

-**For PR comments and review bodies** (`pr_comments`, `review_bodies`): These lack file/line context. Spawn a `compound-engineering:workflow:pr-comment-resolver` agent for each actionable item. The agent receives the comment ID, body text, PR number, and feedback type (`pr_comment` or `review_body`). The agent must identify the relevant files from the comment text and the PR diff.
+**For PR comments and review bodies** (`pr_comments`, `review_bodies`): These lack file/line context. Spawn a `compound-engineering:workflow:pr-comment-resolver` agent for each actionable non-clustered item. The agent receives the comment ID, body text, PR number, and feedback type (`pr_comment` or `review_body`). The agent must identify the relevant files from the comment text and the PR diff.
+
+#### Cluster dispatch
+
+For each cluster identified in step 3, dispatch ONE `compound-engineering:workflow:pr-comment-resolver` agent that receives:
+- The `<cluster-brief>` XML block
+- All thread details for threads in the cluster (IDs, file paths, line numbers, comment text)
+- The PR number
+- The feedback types
+
+The cluster agent reads the broader area before making targeted fixes. It returns one summary per thread it handled (same structure as individual agents), plus a `cluster_assessment` field describing what broader investigation revealed and whether a holistic or individual approach was taken.
+
+#### Agent return format

 Each agent returns a short summary:
 - **verdict**: `fixed`, `fixed-differently`, `replied`, `not-addressing`, or `needs-human`
@@ -97,6 +163,9 @@ Each agent returns a short summary:
 - **files_changed**: list of files modified (empty if replied/not-addressing)
 - **reason**: brief explanation of what was done or why it was skipped

+Cluster agents additionally return:
+- **cluster_assessment**: what the broader investigation found, whether a holistic or individual approach was taken
+
 Verdict meanings:
 - `fixed` -- code change made as requested
 - `fixed-differently` -- code change made, but with a better approach than suggested
@@ -104,17 +173,19 @@ Verdict meanings:
 - `not-addressing` -- feedback is factually wrong about the code; skip with evidence
 - `needs-human` -- cannot determine the right action; needs user decision

-**Batching**: If there are 1-4 items total, dispatch all in parallel. For 5+ items, batch in groups of 4.
+#### Batching and conflict avoidance

-**Conflict avoidance**: If multiple threads reference the same file, group them into a single agent dispatch to avoid parallel edit conflicts. The agent handling a multi-thread file receives all threads for that file and addresses them sequentially.
+**Batching**: Clusters count as 1 dispatch unit regardless of how many threads they contain. If there are 1-4 dispatch units total (clusters + individual items), dispatch all in parallel. For 5+ dispatch units, batch in groups of 4.

-Fixes can occasionally expand beyond their referenced file (e.g., renaming a method updates callers elsewhere). This is rare but can cause parallel agents to collide. The verification step (step 7) catches this -- if re-fetching shows unresolved threads or if the commit reveals inconsistent changes, re-run the affected agents sequentially.
+**Conflict avoidance**: No two dispatch units that touch the same file should run in parallel. Before dispatching, check for file overlaps across all dispatch units (clusters and individual items). If a cluster's file list overlaps with an individual item's file, or with another cluster's files, serialize those units -- dispatch one, wait for it to complete, then dispatch the next. Non-overlapping units can still run in parallel. Within a single dispatch unit handling multiple threads on the same file, the agent addresses them sequentially.

-Platforms that do not support parallel dispatch should run agents sequentially.
+**Sequential fallback**: Platforms that do not support parallel dispatch should run agents sequentially. Dispatch cluster units first (they are higher-leverage), then individual items.

-### 5. Commit and Push
+Fixes can occasionally expand beyond their referenced file (e.g., renaming a method updates callers elsewhere). This is rare but can cause parallel agents to collide. The verification step (step 8) catches this -- if re-fetching shows unresolved threads or if the commit reveals inconsistent changes, re-run the affected agents sequentially.

-After all agents complete, check whether any files were actually changed. If all verdicts are `replied`, `not-addressing`, or `needs-human` (no code changes), skip this step entirely and proceed to step 6.
+### 6. Commit and Push
+
+After all agents complete, check whether any files were actually changed. If all verdicts are `replied`, `not-addressing`, or `needs-human` (no code changes), skip this step entirely and proceed to step 7.

 If there are file changes:

@@ -132,7 +203,7 @@ git commit -m "Address PR review feedback (#PR_NUMBER)
 git push
 ```

-### 6. Reply and Resolve
+### 7. Reply and Resolve

 After the push succeeds, post replies and resolve where applicable. The mechanism depends on the feedback type.

@@ -178,7 +249,7 @@ gh pr comment PR_NUMBER --body "REPLY_TEXT"

 Include enough quoted context in the reply so the reader can follow which comment is being addressed without scrolling.

-### 7. Verify
+### 8. Verify

 Re-fetch feedback to confirm resolution:

@@ -186,11 +257,17 @@ Re-fetch feedback to confirm resolution:
 bash scripts/get-pr-comments PR_NUMBER
 ```

-The `review_threads` array should be empty (except `needs-human` items). If threads remain, repeat from step 1 for the remaining threads.
+The `review_threads` array should be empty (except `needs-human` items).
+
+**If new threads remain**, check the iteration count for this run:
+
+- **First or second fix-verify cycle**: Record which files were modified and which concern categories were addressed in this cycle. Then repeat from step 2 for the remaining threads. The cluster analysis gate (step 3) will fire on re-entry because verify-loop re-entry is a gate signal, enabling broader investigation of recurring patterns.
+
+- **After the second fix-verify cycle** (3rd pass would begin): Stop looping. Surface remaining issues to the user with context about the recurring pattern: "Multiple rounds of feedback on [area/theme] suggest a deeper issue. Here's what we've fixed so far and what keeps appearing." Use the same `needs-human` escalation pattern -- leave threads open and present the pattern for the user to decide.

 PR comments and review bodies have no resolve mechanism, so they will still appear in the output. Verify they were replied to by checking the PR conversation.

-### 8. Summary
+### 9. Summary

 Present a concise summary of all work done. Group by verdict, one line per item describing *what was done* not just *where*. This is the primary output the user sees.

@@ -205,6 +282,15 @@ Replied (count): [what questions were answered]
 Not addressing (count): [what was skipped and why]
 ```

+If any clusters were investigated, append a cluster investigation section:
+
+```
+Cluster investigations (count):
+
+1. [theme] in [area]: [cluster_assessment from the agent --
+   what was found, whether a holistic or individual approach was taken]
+```
+
 If any agent returned `needs-human`, append a decisions section. These are rare but high-signal. Each `needs-human` agent returns a `decision_context` field with a structured analysis: what the reviewer said, what the agent investigated, why it needs a decision, concrete options with tradeoffs, and the agent's lean if it has one.

 Present the `decision_context` directly -- it's already structured for the user to read and decide quickly:
@@ -264,7 +350,7 @@ This fetches thread IDs and their first comment IDs (minimal fields, no bodies)

 ### 2. Fix, Reply, Resolve

-Spawn a single `compound-engineering:workflow:pr-comment-resolver` agent for the thread. Then follow the same commit -> push -> reply -> resolve flow as Full Mode steps 5-6.
+Spawn a single `compound-engineering:workflow:pr-comment-resolver` agent for the thread. Then follow the same commit -> push -> reply -> resolve flow as Full Mode steps 6-7.

 ---