feat(resolve-pr-feedback): add gated feedback clustering to detect systemic issues (#441)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -64,9 +64,59 @@ Before processing, classify each piece of feedback as **new** or **already handl
|
||||
|
||||
The distinction is about content, not who posted what. A deferral from a teammate, a previous skill run, or a manual reply all count.
|
||||
|
||||
If there are no new items across all feedback types, skip steps 3-7 and go straight to step 8.
|
||||
If there are no new items across all feedback types, skip steps 3-8 and go straight to step 9.
|
||||
|
||||
### 3. Plan
|
||||
### 3. Cluster Analysis (Gated)
|
||||
|
||||
Before planning and dispatching fixes, check whether feedback patterns suggest a systemic issue that warrants broader investigation rather than individual fixes.
|
||||
|
||||
**Gate check**: Cluster analysis only runs when at least one signal fires. If neither fires, skip directly to step 4.
|
||||
|
||||
| Gate signal | Check |
|---|---|
| **Volume** | 4+ new items from triage |
| **Verify-loop re-entry** | This is the 2nd+ pass through the workflow (new feedback appeared after a previous fix round) |
|
||||
|
||||
If the gate does not fire, proceed to step 4. The common case (1-3 unrelated comments) skips this step entirely with zero overhead.
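The gate check above can be sketched as a single predicate. This is a minimal illustration, not part of the skill itself; the function name `gate_fires` and its parameters are hypothetical, and it assumes the caller tracks how many passes through the workflow have happened.

```python
def gate_fires(new_item_count: int, pass_number: int) -> bool:
    """Return True when cluster analysis should run.

    new_item_count: number of new items produced by triage.
    pass_number: 1 on the first pass, 2 or more on verify-loop re-entry
                 (hypothetical counter kept by the caller).
    """
    volume = new_item_count >= 4   # "4+ new items from triage"
    reentry = pass_number >= 2     # "2nd+ pass through the workflow"
    return volume or reentry
```

With this reading, the common case of 1-3 comments on a first pass never reaches the clustering logic.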
|
||||
|
||||
**If the gate fires**, analyze feedback for thematic clusters:
|
||||
|
||||
1. **Assign concern categories** from this fixed list: `error-handling`, `validation`, `type-safety`, `naming`, `performance`, `testing`, `security`, `documentation`, `style`, `architecture`, `other`. Each new item gets exactly one category based on what the feedback is about.
|
||||
|
||||
2. **Group by category + spatial proximity**. Two items form a potential cluster when they share a concern category AND are spatially proximate (same file, or files in the same directory subtree).
|
||||
|
||||
| Thematic match | Spatial proximity | Action |
|---|---|---|
| Same category | Same file | Cluster |
| Same category | Same directory subtree | Cluster |
| Same category | Unrelated locations | No cluster |
| Different categories | Any | No cluster (same-file grouping still applies for conflict avoidance) |
|
||||
|
||||
3. **Synthesize a cluster brief** for each cluster of 2+ items. Pass briefs to agents using a `<cluster-brief>` XML block:
|
||||
|
||||
```xml
<cluster-brief>
  <theme>[concern category]</theme>
  <area>[common directory path]</area>
  <files>[comma-separated file paths]</files>
  <threads>[comma-separated thread/comment IDs]</threads>
  <hypothesis>[one sentence: what the individual comments collectively suggest about a deeper issue]</hypothesis>
</cluster-brief>
```
|
||||
|
||||
On verify-loop re-entry, add context about the previous cycle:
|
||||
```xml
<cluster-brief>
  ...
  <just-fixed-files>[files modified in the previous fix cycle]</just-fixed-files>
</cluster-brief>
```
|
||||
|
||||
4. **Items not in any cluster** remain as individual items and are dispatched normally in step 5.
|
||||
|
||||
5. **If no clusters are found** after analysis (the gate fired but items don't form thematic+spatial groups), proceed with all items as individual. The gate was a false positive -- the only cost was the analysis itself.
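The grouping rule in the steps above (same category plus spatial proximity, clusters of 2+ items) can be sketched as follows. This is an illustration under stated assumptions, not the skill's implementation: the `Item` type and function names are hypothetical, "same directory subtree" is read as "one file's directory equals or contains the other's", and the grouping is a simple greedy pass that does not merge groups after the fact.

```python
from dataclasses import dataclass
from pathlib import PurePosixPath


@dataclass(frozen=True)
class Item:
    thread_id: str  # hypothetical identifier for a thread or comment
    category: str   # one of the fixed concern categories
    path: str       # repository-relative file path


def _proximate(a: str, b: str) -> bool:
    """Assumed reading of 'same file or same directory subtree'."""
    da, db = PurePosixPath(a).parent, PurePosixPath(b).parent
    if da == db:
        return True  # same file or same directory
    root = PurePosixPath(".")
    if root in (da, db):
        return False  # the repository root does not count as a shared subtree
    return da in db.parents or db in da.parents


def find_clusters(items: list[Item]) -> tuple[list[list[Item]], list[Item]]:
    """Greedy grouping: returns (clusters of 2+ items, individual items)."""
    groups: list[list[Item]] = []
    for item in items:
        for group in groups:
            same_category = group[0].category == item.category
            if same_category and any(_proximate(item.path, m.path) for m in group):
                group.append(item)
                break
        else:
            groups.append([item])  # no matching group: start a new one
    clusters = [g for g in groups if len(g) >= 2]
    individual = [g[0] for g in groups if len(g) == 1]
    return clusters, individual
```

If `find_clusters` returns no clusters, every item falls through to individual dispatch, which matches the false-positive case described in item 5.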
|
||||
|
||||
### 4. Plan
|
||||
|
||||
Create a task list of all **new** unresolved items grouped by type (e.g., `TaskCreate` in Claude Code, `update_plan` in Codex):
|
||||
- Code changes requested
|
||||
@@ -74,11 +124,15 @@ Create a task list of all **new** unresolved items grouped by type (e.g., `TaskC
|
||||
- Style/convention fixes
|
||||
- Test additions needed
|
||||
|
||||
### 4. Implement (PARALLEL)
|
||||
If step 3 produced clusters, include them in the task list as cluster items alongside individual items.
|
||||
|
||||
### 5. Implement (PARALLEL)
|
||||
|
||||
Process all three feedback types. Review threads are the primary type; PR comments and review bodies are secondary but should not be ignored.
|
||||
|
||||
**For review threads** (`review_threads`): Spawn a `compound-engineering:workflow:pr-comment-resolver` agent for each.
|
||||
#### Individual dispatch (default)
|
||||
|
||||
**For review threads** (`review_threads`): Spawn a `compound-engineering:workflow:pr-comment-resolver` agent for each thread that is NOT already assigned to a cluster from step 3. Clustered threads are handled by cluster dispatch below -- do not dispatch them individually.
|
||||
|
||||
Each agent receives:
|
||||
- The thread ID
|
||||
@@ -87,7 +141,19 @@ Each agent receives:
|
||||
- The PR number (for context)
|
||||
- The feedback type (`review_thread`)
|
||||
|
||||
**For PR comments and review bodies** (`pr_comments`, `review_bodies`): These lack file/line context. Spawn a `compound-engineering:workflow:pr-comment-resolver` agent for each actionable item. The agent receives the comment ID, body text, PR number, and feedback type (`pr_comment` or `review_body`). The agent must identify the relevant files from the comment text and the PR diff.
|
||||
**For PR comments and review bodies** (`pr_comments`, `review_bodies`): These lack file/line context. Spawn a `compound-engineering:workflow:pr-comment-resolver` agent for each actionable non-clustered item. The agent receives the comment ID, body text, PR number, and feedback type (`pr_comment` or `review_body`). The agent must identify the relevant files from the comment text and the PR diff.
|
||||
|
||||
#### Cluster dispatch
|
||||
|
||||
For each cluster identified in step 3, dispatch ONE `compound-engineering:workflow:pr-comment-resolver` agent that receives:
- The `<cluster-brief>` XML block
- All thread details for threads in the cluster (IDs, file paths, line numbers, comment text)
- The PR number
- The feedback types
|
||||
|
||||
The cluster agent reads the broader area before making targeted fixes. It returns one summary per thread it handled (same structure as individual agents), plus a `cluster_assessment` field describing what broader investigation revealed and whether a holistic or individual approach was taken.
|
||||
|
||||
#### Agent return format
|
||||
|
||||
Each agent returns a short summary:
|
||||
- **verdict**: `fixed`, `fixed-differently`, `replied`, `not-addressing`, or `needs-human`
|
||||
@@ -97,6 +163,9 @@ Each agent returns a short summary:
|
||||
- **files_changed**: list of files modified (empty if replied/not-addressing)
|
||||
- **reason**: brief explanation of what was done or why it was skipped
|
||||
|
||||
Cluster agents additionally return:
|
||||
- **cluster_assessment**: what the broader investigation found, whether a holistic or individual approach was taken
|
||||
|
||||
Verdict meanings:
|
||||
- `fixed` -- code change made as requested
|
||||
- `fixed-differently` -- code change made, but with a better approach than suggested
|
||||
@@ -104,17 +173,19 @@ Verdict meanings:
|
||||
- `not-addressing` -- feedback is factually wrong about the code; skip with evidence
|
||||
- `needs-human` -- cannot determine the right action; needs user decision
|
||||
|
||||
**Batching**: If there are 1-4 items total, dispatch all in parallel. For 5+ items, batch in groups of 4.
|
||||
#### Batching and conflict avoidance
|
||||
|
||||
**Conflict avoidance**: If multiple threads reference the same file, group them into a single agent dispatch to avoid parallel edit conflicts. The agent handling a multi-thread file receives all threads for that file and addresses them sequentially.
|
||||
**Batching**: Clusters count as 1 dispatch unit regardless of how many threads they contain. If there are 1-4 dispatch units total (clusters + individual items), dispatch all in parallel. For 5+ dispatch units, batch in groups of 4.
|
||||
|
||||
Fixes can occasionally expand beyond their referenced file (e.g., renaming a method updates callers elsewhere). This is rare but can cause parallel agents to collide. The verification step (step 7) catches this -- if re-fetching shows unresolved threads or if the commit reveals inconsistent changes, re-run the affected agents sequentially.
|
||||
**Conflict avoidance**: No two dispatch units that touch the same file should run in parallel. Before dispatching, check for file overlaps across all dispatch units (clusters and individual items). If a cluster's file list overlaps with an individual item's file, or with another cluster's files, serialize those units -- dispatch one, wait for it to complete, then dispatch the next. Non-overlapping units can still run in parallel. Within a single dispatch unit handling multiple threads on the same file, the agent addresses them sequentially.
|
||||
|
||||
Platforms that do not support parallel dispatch should run agents sequentially.
|
||||
**Sequential fallback**: Platforms that do not support parallel dispatch should run agents sequentially. Dispatch cluster units first (they are higher-leverage), then individual items.
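One way to picture the batching and conflict-avoidance rules together is as a planner that splits dispatch units into waves: each wave holds at most 4 units, and no two units in a wave touch the same file. The sketch below is an assumption-laden illustration; `plan_waves` and the shape of its input (unit name mapped to a set of files) are hypothetical, and units are taken in the order given, so listing clusters first keeps them ahead of individual items.

```python
def plan_waves(units: dict[str, set[str]], max_parallel: int = 4) -> list[list[str]]:
    """Split dispatch units into waves that are safe to run in parallel.

    units: maps a dispatch-unit name (cluster or individual item) to the
           set of files it is expected to touch.
    """
    waves: list[list[str]] = []
    remaining = list(units)  # preserves insertion order
    while remaining:
        wave: list[str] = []
        used_files: set[str] = set()
        deferred: list[str] = []
        for name in remaining:
            fits = len(wave) < max_parallel
            if fits and used_files.isdisjoint(units[name]):
                wave.append(name)
                used_files |= units[name]
            else:
                deferred.append(name)  # overlaps or wave is full: wait
        waves.append(wave)
        remaining = deferred
    return waves
```

A unit with an empty file set (for example a PR comment whose files are not yet known) never conflicts in this sketch, so a real planner would need a more conservative rule for such items. On platforms without parallel dispatch, the same plan with `max_parallel=1` yields a sequential order.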
|
||||
|
||||
### 5. Commit and Push
|
||||
Fixes can occasionally expand beyond their referenced file (e.g., renaming a method updates callers elsewhere). This is rare but can cause parallel agents to collide. The verification step (step 8) catches this -- if re-fetching shows unresolved threads or if the commit reveals inconsistent changes, re-run the affected agents sequentially.
|
||||
|
||||
After all agents complete, check whether any files were actually changed. If all verdicts are `replied`, `not-addressing`, or `needs-human` (no code changes), skip this step entirely and proceed to step 6.
|
||||
### 6. Commit and Push
|
||||
|
||||
After all agents complete, check whether any files were actually changed. If all verdicts are `replied`, `not-addressing`, or `needs-human` (no code changes), skip this step entirely and proceed to step 7.
|
||||
|
||||
If there are file changes:
|
||||
|
||||
@@ -132,7 +203,7 @@ git commit -m "Address PR review feedback (#PR_NUMBER)
|
||||
git push
|
||||
```
|
||||
|
||||
### 6. Reply and Resolve
|
||||
### 7. Reply and Resolve
|
||||
|
||||
After the push succeeds, post replies and resolve where applicable. The mechanism depends on the feedback type.
|
||||
|
||||
@@ -178,7 +249,7 @@ gh pr comment PR_NUMBER --body "REPLY_TEXT"
|
||||
|
||||
Include enough quoted context in the reply so the reader can follow which comment is being addressed without scrolling.
|
||||
|
||||
### 7. Verify
|
||||
### 8. Verify
|
||||
|
||||
Re-fetch feedback to confirm resolution:
|
||||
|
||||
@@ -186,11 +257,17 @@ Re-fetch feedback to confirm resolution:
|
||||
bash scripts/get-pr-comments PR_NUMBER
|
||||
```
|
||||
|
||||
The `review_threads` array should be empty (except `needs-human` items). If threads remain, repeat from step 1 for the remaining threads.
|
||||
The `review_threads` array should be empty (except `needs-human` items).
|
||||
|
||||
**If new threads remain**, check the iteration count for this run:
|
||||
|
||||
- **First or second fix-verify cycle**: Record which files were modified and which concern categories were addressed in this cycle. Then repeat from step 2 for the remaining threads. The cluster analysis gate (step 3) will fire on re-entry because verify-loop re-entry is a gate signal, enabling broader investigation of recurring patterns.
|
||||
|
||||
- **After the second fix-verify cycle** (3rd pass would begin): Stop looping. Surface remaining issues to the user with context about the recurring pattern: "Multiple rounds of feedback on [area/theme] suggest a deeper issue. Here's what we've fixed so far and what keeps appearing." Use the same `needs-human` escalation pattern -- leave threads open and present the pattern for the user to decide.
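The loop cap described in the two bullets above can be sketched as a small decision function. The bullets overlap at the second cycle, so this sketch assumes the stricter reading: at most two fix-verify cycles run, and a third pass is never started. The names `next_step` and `MAX_FIX_VERIFY_CYCLES` and the returned labels are hypothetical.

```python
MAX_FIX_VERIFY_CYCLES = 2  # assumption: no third pass is ever started


def next_step(open_threads: int, completed_cycles: int) -> str:
    """Decide what to do after the verify step.

    open_threads: unresolved review threads, excluding needs-human items.
    completed_cycles: fix-verify cycles finished so far in this run.
    """
    if open_threads == 0:
        return "summary"             # nothing left to fix
    if completed_cycles < MAX_FIX_VERIFY_CYCLES:
        return "repeat-from-step-2"  # re-entry makes the cluster gate fire
    return "escalate-to-user"        # leave threads open, present the pattern
```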
|
||||
|
||||
PR comments and review bodies have no resolve mechanism, so they will still appear in the output. Verify they were replied to by checking the PR conversation.
|
||||
|
||||
### 8. Summary
|
||||
### 9. Summary
|
||||
|
||||
Present a concise summary of all work done. Group by verdict, one line per item describing *what was done* not just *where*. This is the primary output the user sees.
|
||||
|
||||
@@ -205,6 +282,15 @@ Replied (count): [what questions were answered]
|
||||
Not addressing (count): [what was skipped and why]
|
||||
```
|
||||
|
||||
If any clusters were investigated, append a cluster investigation section:
|
||||
|
||||
```
Cluster investigations (count):

1. [theme] in [area]: [cluster_assessment from the agent --
   what was found, whether a holistic or individual approach was taken]
```
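As a rough illustration of how the cluster section could be assembled from agent results, the sketch below renders one numbered line per cluster. The function name and the `(theme, area, cluster_assessment)` tuple shape are assumptions made for this example; only the output layout comes from the template above.

```python
def render_cluster_section(investigations: list[tuple[str, str, str]]) -> str:
    """Render the cluster section from (theme, area, cluster_assessment) tuples."""
    if not investigations:
        return ""  # no clusters investigated: omit the section entirely
    lines = [f"Cluster investigations ({len(investigations)}):", ""]
    for number, (theme, area, assessment) in enumerate(investigations, start=1):
        lines.append(f"{number}. {theme} in {area}: {assessment}")
    return "\n".join(lines)
```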
|
||||
|
||||
If any agent returned `needs-human`, append a decisions section. These are rare but high-signal. Each `needs-human` agent returns a `decision_context` field with a structured analysis: what the reviewer said, what the agent investigated, why it needs a decision, concrete options with tradeoffs, and the agent's lean if it has one.
|
||||
|
||||
Present the `decision_context` directly -- it's already structured for the user to read and decide quickly:
|
||||
@@ -264,7 +350,7 @@ This fetches thread IDs and their first comment IDs (minimal fields, no bodies)
|
||||
|
||||
### 2. Fix, Reply, Resolve
|
||||
|
||||
Spawn a single `compound-engineering:workflow:pr-comment-resolver` agent for the thread. Then follow the same commit -> push -> reply -> resolve flow as Full Mode steps 5-6.
|
||||
Spawn a single `compound-engineering:workflow:pr-comment-resolver` agent for the thread. Then follow the same commit -> push -> reply -> resolve flow as Full Mode steps 6-7.
|
||||
|
||||
---