diff --git a/plugins/compound-engineering/agents/workflow/ce-pr-comment-resolver.agent.md b/plugins/compound-engineering/agents/workflow/ce-pr-comment-resolver.agent.md index 27f6fc3..0f46dd7 100644 --- a/plugins/compound-engineering/agents/workflow/ce-pr-comment-resolver.agent.md +++ b/plugins/compound-engineering/agents/workflow/ce-pr-comment-resolver.agent.md @@ -126,25 +126,21 @@ decision_context: [only for needs-human -- the full markdown block above] When a `` XML block is present, follow this workflow instead of the standard workflow. -1. **Parse the cluster brief** for: theme, area, file paths, thread IDs, hypothesis, and (if present) `` listing previously-resolved threads from earlier review rounds with their IDs, file paths, and concern categories. +Cluster briefs always represent a cross-invocation cluster: the same concern category has appeared across multiple review rounds, and `` lists the previously-resolved threads from earlier rounds. + +1. **Parse the cluster brief** for: theme, area, file paths, thread IDs, hypothesis, and `` listing previously-resolved threads with their IDs, file paths, and concern categories. 2. **Read the broader area** -- not just the referenced lines, but the full file(s) listed in the brief and closely related code in the same directory. Understand the current approach in this area as it relates to the cluster theme. -3. **Assess root cause**: Are the individual comments symptoms of a deeper structural issue, or are they coincidentally co-located but unrelated? - - **Without ``** (single-round cluster): - - **Systemic**: The comments point to a missing pattern, inconsistent approach, or architectural gap. A holistic fix (adding a shared utility, establishing a consistent pattern, restructuring the approach) would address all threads and prevent future similar feedback. - - **Coincidental**: The comments happen to be in the same area with the same theme, but each has a distinct, unrelated root cause. Individual fixes are appropriate. - - **With ``** (cross-invocation cluster — the same concern category has appeared across multiple review rounds): +3. **Assess root cause**. Pick one mode: - **Band-aid fixes**: Prior fixes addressed symptoms, not the root cause. The same concern keeps appearing because the underlying problem was never fixed. Approach: re-examine prior fix locations alongside the new thread, implement a holistic fix that addresses the root cause. - **Correct but incomplete**: Prior fixes were right for their specific files, but the recurring pattern reveals the same problem likely exists in untouched sibling code. This is the highest-value mode. Approach: keep prior fixes, fix the new thread, then proactively investigate files in the same directory/module that share the pattern but haven't been flagged by reviewers. Report what was found in the cluster assessment. - **Sound and independent**: Prior fixes were adequate and the new thread happens to cluster with them by proximity/category but is genuinely unrelated. Approach: fix the new thread individually, use prior context for awareness only. 4. **Implement fixes**: - - If **systemic** or **band-aid**: make the holistic fix first, then verify each thread is resolved by the broader change. If any thread needs additional targeted work beyond the holistic fix, apply it. + - If **band-aid**: make the holistic fix first, then verify each thread is resolved by the broader change. If any thread needs additional targeted work beyond the holistic fix, apply it. - If **correct but incomplete**: fix the new thread, then investigate sibling files in the cluster's `` for the same pattern. Fix any additional instances found. Stay within the area boundary. - - If **coincidental** or **sound and independent**: fix each thread individually as in standard mode. + - If **sound and independent**: fix each thread individually as in standard mode. 5. **Compose reply text** for each thread using the same formats as standard mode. @@ -152,9 +148,9 @@ When a `` XML block is present, follow this workflow instead of t ``` cluster_assessment: [What the broader investigation found. Which assessment mode -was applied (systemic/coincidental for single-round, or band-aid/correct-but-incomplete/ -sound-and-independent for cross-invocation). If correct-but-incomplete: which additional -files were investigated and what was found. Keep to 2-4 sentences.] +was applied (band-aid / correct-but-incomplete / sound-and-independent). If +correct-but-incomplete: which additional files were investigated and what was +found. Keep to 2-4 sentences.] ``` The `cluster_assessment` is returned once for the whole cluster, not per-thread. diff --git a/plugins/compound-engineering/skills/ce-resolve-pr-feedback/SKILL.md b/plugins/compound-engineering/skills/ce-resolve-pr-feedback/SKILL.md index 92aa94f..4b7d100 100644 --- a/plugins/compound-engineering/skills/ce-resolve-pr-feedback/SKILL.md +++ b/plugins/compound-engineering/skills/ce-resolve-pr-feedback/SKILL.md @@ -76,33 +76,33 @@ The distinction is about content, not who posted what. A deferral from a teammat If there are no new items across all feedback types, skip steps 3-9 and go straight to step 10. -### 3. Cluster Analysis (Gated) +### 3. Cross-Invocation Cluster Analysis (Gated) -Before planning and dispatching fixes, check whether feedback patterns suggest a systemic issue that warrants broader investigation rather than individual fixes. +Before planning and dispatching fixes, check whether the same concern has appeared across multiple review rounds — evidence of a recurring pattern that warrants broader investigation rather than another surgical fix. -**Gate check**: Cluster analysis only runs when at least one signal fires. If neither fires, skip directly to step 4. +**Gate check (two stages)**: Both must pass, or skip to step 4. -| Gate signal | Check | -|---|---| -| **Volume** | 3+ new items from triage | -| **Cross-invocation** | `cross_invocation.signal == true` in the script output (resolved threads exist alongside new ones — evidence of multi-round review) | +1. **Signal stage**: `cross_invocation.signal == true` in the script output — resolved threads exist alongside new ones. First-round reviews always fail this stage. +2. **Spatial-overlap precheck**: at least one new `review_thread` shares an exact file path or directory subtree with a thread in `cross_invocation.resolved_threads`. The signal alone only means multi-round review exists; it is not itself evidence that recurring feedback has landed in the same area. This precheck compares paths only — no category inference, no LLM calls — so the false-positive tax is cheap. Skip this stage if the script output lacks file paths on resolved threads; in that case the signal stage governs alone. -If the gate does not fire, proceed to step 4. The common case (first review round with 1-2 comments) skips this step entirely with zero overhead. +Only inline `review_threads` participate in the precheck. `pr_comments` and `review_bodies` have no file paths and cannot contribute to spatial overlap; they are always dispatched individually regardless of clustering. -**If the gate fires**, analyze feedback for thematic clusters. When the cross-invocation signal fired, include resolved threads from `cross_invocation.resolved_threads` alongside new threads in the analysis — these are previously-resolved threads from earlier review rounds that provide pattern context. Mark them as `previously_resolved` so dispatch (step 5) knows not to individually re-resolve them. +Single-round clustering (grouping new-only threads by theme + proximity within one review) is deliberately not performed: the evidence is too thin to justify holistic fixes and the false-positive rate is high. First-round "one helper would fix all of these" opportunities are handled as individual fixes until repeated reviewer evidence promotes the pattern into cross-invocation mode. + +**If both gate stages pass**, analyze feedback for thematic clusters that span new threads and previously-resolved threads. Include resolved threads from `cross_invocation.resolved_threads` alongside new threads in the analysis. Mark prior-resolved threads as `previously_resolved` so dispatch (step 5) knows not to individually re-resolve them. 1. **Assign concern categories** from this fixed list: `error-handling`, `validation`, `type-safety`, `naming`, `performance`, `testing`, `security`, `documentation`, `style`, `architecture`, `other`. Each item (new and previously-resolved) gets exactly one category based on what the feedback is about. -2. **Group by category + spatial proximity**. Form groups from all categorized items -- new and previously-resolved together, not new items only. Two items form a potential cluster when they share a concern category AND are spatially proximate (same file, or files in the same directory subtree). +2. **Group by category + spatial proximity, requiring cross-round evidence**. Two items form a potential cluster when they share a concern category AND are spatially proximate (same file, or files in the same directory subtree). A cluster must contain **at least one previously-resolved thread** — a new-only group lacks cross-round evidence and is dispatched individually. - | Thematic match | Spatial proximity | Action | - |---|---|---| - | Same category | Same file | Cluster | - | Same category | Same directory subtree | Cluster | - | Same category | Unrelated locations | No cluster | - | Different categories | Any | No cluster (same-file grouping still applies for conflict avoidance) | + | Thematic match | Spatial proximity | Contains prior-resolved? | Action | + |---|---|---|---| + | Same category | Same file or subtree | Yes | Cluster | + | Same category | Same file or subtree | No (new-only) | No cluster | + | Same category | Unrelated locations | Any | No cluster | + | Different categories | Any | Any | No cluster | -3. **Synthesize a cluster brief** for each cluster of 2+ items. Pass briefs to agents using a `` XML block: +3. **Synthesize a cluster brief** for each cluster. Pass briefs to agents using a `` XML block: ```xml @@ -110,18 +110,18 @@ If the gate does not fire, proceed to step 4. The common case (first review roun [common directory path] [comma-separated file paths] [comma-separated new thread/comment IDs] - [one sentence: what the individual comments collectively suggest about a deeper issue] + [one sentence: what the recurring feedback across rounds suggests about a deeper issue] ``` - The `` element lists previously-resolved threads that clustered with the new threads — their IDs, file paths, and assigned concern categories. This gives the resolver agent the full cross-round picture. When no previously-resolved threads are in the cluster, omit the element. + The `` element is always present and lists the previously-resolved threads in the cluster — their IDs, file paths, and concern categories. This gives the resolver agent the full cross-round picture. -4. **Items not in any cluster** remain as individual items and are dispatched normally in step 5. Previously-resolved threads that don't cluster with any new thread are dropped — they provided context but no pattern was found. +4. **Items not in any cluster** remain as individual items and are dispatched normally in step 5. Previously-resolved threads that don't cluster with any new thread are dropped — they provided context but no cross-round pattern was found. -5. **If no clusters are found** after analysis (the gate fired but items don't form thematic+spatial groups), proceed with all items as individual. The gate was a false positive -- the only cost was the analysis itself. +5. **If no clusters are found** after analysis (the signal fired but no new thread clustered with a prior-resolved thread), proceed with all items as individual. The only cost was the analysis itself. ### 4. Plan