feat(resolve-pr-feedback): add cross-invocation cluster analysis (#480)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -0,0 +1,79 @@
|
||||
---
|
||||
date: 2026-04-01
|
||||
topic: cross-invocation-cluster-analysis
|
||||
---
|
||||
|
||||
# Cross-Invocation Cluster Analysis for resolve-pr-feedback
|
||||
|
||||
## Problem Frame
|
||||
|
||||
The resolve-pr-feedback skill's cluster analysis is gated on two signals: volume (3+ items) and verify-loop re-entry (2nd+ pass within the same invocation). The verify-loop signal is effectively dead — it requires new review threads to appear between push and verify, but automated reviewers take minutes while verify runs seconds after push. The timing gap makes this gate unreliable at best, and in the common case of automated reviewers, impossible.
|
||||
|
||||
This leaves volume as the only working gate. The skill misses the exact scenario clustering was designed for: a reviewer posts feedback about the same *class* of problem across multiple rounds, with each round containing only 1-2 threads. Individually, no round triggers the volume gate. But taken together, there's a clear recurring pattern — e.g., "three separate rounds of feedback all about missing convergence behavior in target writers." The skill should step back and investigate the problem class holistically rather than applying band-aids to each instance.
|
||||
|
||||
## Requirements
|
||||
|
||||
**Detection Signal**
|
||||
|
||||
- R1. Replace the verify-loop re-entry gate signal with a cross-invocation awareness signal. Before triaging, the skill checks whether it has previously resolved threads on this same PR. Its own prior reply comments are the evidence.
|
||||
- R2. If prior resolutions exist and new unresolved feedback has arrived since the last resolution, that constitutes the re-entry signal — even with just 1 new item. If no prior resolutions are found (first invocation), the cross-invocation signal does not fire and processing continues with the volume gate as the only cluster trigger.
|
||||
- R3. The volume gate (3+ items) remains unchanged as a parallel trigger. The two gates are OR'd: either one fires cluster analysis.
|
||||
|
||||
**Cost Control**
|
||||
|
||||
- R9. Cross-invocation detection must not add GraphQL API calls. The existing `get-pr-comments` query should be broadened to return both unresolved and resolved threads (with skill replies) in a single call. All cross-invocation analysis — detection, overlap check, clustering — works on data already in memory from that one call.
|
||||
- R10. Cross-invocation clustering is scoped to the last N resolution rounds (not all history). A "round" is the set of threads resolved in a single skill invocation. This bounds the data the skill processes regardless of PR history length. Planning should determine the right value of N; 2-3 rounds is likely sufficient since recurring patterns surface in recent history.
|
||||
- R11. When the cross-invocation signal fires but the volume gate does not, the skill runs a lightweight overlap check first: compare concern categories and file paths between new and prior threads using data already fetched. Promote to full clustering only if category or spatial overlap exists. If no overlap, skip clustering and process the new thread(s) individually.
|
||||
|
||||
**Clustering Input**
|
||||
|
||||
- R4. When the cross-invocation signal fires and overlap is confirmed (R11), cluster analysis considers both the new thread(s) AND previously-resolved threads from the last N rounds as input. This enables detecting that the same concern category keeps recurring across rounds.
|
||||
- R5. Previously-resolved threads are included in category assignment and spatial grouping alongside new threads, so clusters can span rounds.
|
||||
|
||||
**Resolver Behavior on Cross-Invocation Clusters**
|
||||
|
||||
- R6. When a cross-invocation cluster forms, the resolver agent assesses the prior fixes and applies one of three modes:
|
||||
- **Band-aid fixes** — prior fixes addressed symptoms, not root cause. Re-examine and potentially redo them as part of a holistic fix.
|
||||
- **Correct but incomplete** — prior fixes were right for their scope, but the recurring pattern reveals the same problem likely exists in untouched sibling code. Keep prior fixes, fix the new thread, and proactively investigate whether the pattern extends to code no reviewer has flagged yet. This is the highest-value mode — it's what catches "three rounds of the same concern category in different files means there are probably more files with the same issue."
|
||||
- **Sound and independent** — prior fixes were adequate and the new thread is genuinely unrelated despite clustering. Use prior context for awareness only.
|
||||
- R7. The cluster brief XML gains a `<prior-resolutions>` element listing previously-resolved thread IDs and their concern categories, with reply timestamps (createdAt) to establish ordering across rounds, so the resolver agent has the full cross-round picture.
|
||||
|
||||
**Within-Session Verify Loop**
|
||||
|
||||
- R8. The within-session verify loop (step 8: if new threads remain, repeat from step 2) continues to function as a workflow mechanism. Replies posted during earlier cycles within the same session count as prior resolutions for the cross-invocation signal, so the new gate naturally subsumes the old verify-loop re-entry gate.
|
||||
|
||||
## Success Criteria
|
||||
|
||||
- Recurring feedback about the same problem class across 2+ rounds triggers cluster analysis, even when each round has only 1-2 threads
|
||||
- A single new thread on a PR with prior resolutions in the same concern category produces a cluster brief that includes both the new and old threads
|
||||
- The resolver agent can distinguish three modes: "prior fixes were band-aids, redo holistically", "prior fixes were correct but incomplete, investigate sibling code", and "prior fixes were sound, this is independent"
|
||||
- Token cost is bounded: a PR with 15 prior resolution rounds costs no more for clustering than a PR with 3, and unrelated new feedback on a multi-round PR skips clustering entirely after the lightweight overlap check
|
||||
|
||||
## Scope Boundaries
|
||||
|
||||
- No persistent state files or `.context/` storage — detection relies entirely on GitHub PR comment history
|
||||
- No changes to the volume gate threshold or the cluster spatial grouping rules
|
||||
- No changes to how the resolver agent handles standard (non-cluster) threads
|
||||
- The `get-pr-comments` script currently filters to unresolved threads only (`isResolved == false`). Per R9, this query is broadened to also return resolved threads — no new script, just a wider filter in the existing one
|
||||
|
||||
## Key Decisions
|
||||
|
||||
- **Detection via own replies, not persistent state**: Prior resolutions are detected by checking for the skill's own reply comments on PR threads. This keeps the skill stateless and avoids `.context/` file management. The data is already authoritative (GitHub is the source of truth for what was resolved).
|
||||
- **Three-mode resolver assessment**: The agent distinguishes band-aid fixes (redo), correct-but-incomplete fixes (keep fixes, investigate sibling code), and sound-and-independent fixes (context only). The "correct but incomplete" mode is the highest-value case — it's what turns "three rounds of the same concern in different files" into proactive investigation of untouched code with the same pattern.
|
||||
- **Cross-invocation signal subsumes verify-loop signal**: Within-session cycles produce replies that count as prior resolutions, so the new gate handles both cross-session and within-session re-entry without needing a separate verify-loop signal.
|
||||
- **Bounded lookback, not full history**: Clustering only considers the last N resolution rounds. Recurring patterns surface in recent history — if the same concern category appeared in the last 2-3 rounds, that's the signal. Going back further adds cost without proportional value.
|
||||
- **Zero additional API calls**: Cross-invocation detection piggybacks on the existing `get-pr-comments` query by broadening the filter. All analysis — detection, overlap check, clustering — happens in-memory on data already fetched. No new GraphQL calls.
|
||||
- **Two-tier cost control**: The lightweight overlap check (R11) prevents unnecessary full clustering. Most multi-round PRs get unrelated feedback in later rounds; those skip clustering entirely after a cheap metadata comparison. Full clustering only runs when there's evidence it will find something.
|
||||
|
||||
## Outstanding Questions
|
||||
|
||||
### Deferred to Planning
|
||||
|
||||
- [Affects R1][Technical] How should the skill identify its own prior replies? Options include checking the authenticated `gh` user, matching a reply-text pattern, or both. Planning should check what the existing `resolve-pr-thread` and `reply-to-pr-thread` scripts produce and what's easily queryable.
|
||||
- [Affects R4][Technical] How should previously-resolved threads be represented in the triage list alongside new threads? They need a status marker (e.g., `previously-resolved`) so clustering can include them while dispatch skips re-resolution of threads that don't cluster.
|
||||
- [Affects R9][Technical] What fields does the existing `get-pr-comments` GraphQL query return per thread? Planning should check whether the query already fetches enough data (file path, line range, comment body, author) to support both resolved and unresolved threads without changing the response shape, or whether fields need to be added.
|
||||
- [Affects R10][Technical] What is the right value of N for resolution round lookback? 2-3 is the starting hypothesis. Planning should consider typical PR review patterns and the marginal value of deeper lookback.
|
||||
|
||||
## Next Steps
|
||||
|
||||
-> `/ce:plan` for structured implementation planning
|
||||
@@ -0,0 +1,317 @@
|
||||
---
|
||||
title: "feat(resolve-pr-feedback): cross-invocation cluster analysis"
|
||||
type: feat
|
||||
status: completed
|
||||
date: 2026-04-01
|
||||
origin: docs/brainstorms/2026-04-01-cross-invocation-cluster-analysis-requirements.md
|
||||
---
|
||||
|
||||
# Cross-Invocation Cluster Analysis for resolve-pr-feedback
|
||||
|
||||
## Overview
|
||||
|
||||
Replace the dead verify-loop re-entry gate signal in the resolve-pr-feedback skill with a cross-invocation awareness signal that detects recurring feedback patterns across multiple review rounds on the same PR. The change touches three files: the `get-pr-comments` script (data), the SKILL.md (orchestration), and the pr-comment-resolver agent (cluster handling).
|
||||
|
||||
## Problem Frame
|
||||
|
||||
The skill's cluster analysis has two gates: volume (3+ items) and verify-loop re-entry (2nd+ pass within same invocation). The verify-loop gate is dead — automated reviewers post minutes after push, but verify runs seconds after. This leaves volume as the only gate, which misses the highest-value scenario: a reviewer posts 1-2 threads per round about the same class of problem across multiple rounds. Cross-invocation awareness detects this pattern by checking for resolved threads alongside new ones — evidence of multi-round review. (see origin: `docs/brainstorms/2026-04-01-cross-invocation-cluster-analysis-requirements.md`)
|
||||
|
||||
## Requirements Trace
|
||||
|
||||
- R1. Cross-invocation awareness signal replaces verify-loop re-entry gate
|
||||
- R2. Prior resolutions + new feedback = re-entry signal, even with 1 new item
|
||||
- R3. Volume gate (3+) unchanged, OR'd with cross-invocation signal
|
||||
- R4. Clustering input includes new + prior threads (bounded to last N)
|
||||
- R5. Previously-resolved threads participate in category assignment and spatial grouping
|
||||
- R6. Three-mode resolver assessment: band-aid (redo), correct-but-incomplete (investigate siblings), sound-and-independent (context only)
|
||||
- R7. Cluster brief gains `<prior-resolutions>` element with metadata
|
||||
- R8. Within-session verify loop subsumes into cross-invocation signal
|
||||
- R9. Zero additional GraphQL calls — broaden existing query's jq filter
|
||||
- R10. Bounded lookback: last N resolved threads (simplified from "rounds" — see Key Technical Decisions)
|
||||
|
||||
## Scope Boundaries
|
||||
|
||||
- No persistent state files or `.context/` storage
|
||||
- No changes to the volume gate threshold or spatial grouping rules
|
||||
- No changes to standard (non-cluster) thread handling
|
||||
- No new scripts — extend the existing `get-pr-comments` script
|
||||
|
||||
## Context & Research
|
||||
|
||||
### Relevant Code and Patterns
|
||||
|
||||
- `plugins/compound-engineering/skills/resolve-pr-feedback/SKILL.md` — skill orchestration, steps 1-9
|
||||
- `plugins/compound-engineering/skills/resolve-pr-feedback/scripts/get-pr-comments` — GraphQL query + jq filter; already fetches resolved threads in the query but drops them in jq (`isResolved == false`)
|
||||
- `plugins/compound-engineering/agents/workflow/pr-comment-resolver.md` — resolver agent with standard and cluster modes
|
||||
|
||||
### Institutional Learnings
|
||||
|
||||
- **Script-first architecture** (`docs/solutions/skill-design/script-first-skill-architecture.md`): Classification and filtering logic must live in the script, not in SKILL.md instructions. The script should output pre-computed analysis so the model receives structured decisions, not raw data to classify. 60-75% token savings.
|
||||
- **Explicit state machines** (`docs/solutions/skill-design/git-workflow-skills-need-explicit-state-machines-2026-03-27.md`): Model the cross-invocation gate as a decision table with explicit outcomes, not prose conditionals.
|
||||
- **Pass paths, not content** (`docs/solutions/skill-design/pass-paths-not-content-to-subagents-2026-03-26.md`): The `<prior-resolutions>` element should contain metadata (thread IDs, categories, file paths, timestamps), not full comment bodies. The resolver reads full content on demand.
|
||||
- **Status-gated resolution** (`docs/solutions/workflow/todo-status-lifecycle.md`): Previously-resolved threads must be enforced at the dispatch boundary — they participate in clustering but are never individually dispatched.
|
||||
|
||||
## Key Technical Decisions
|
||||
|
||||
- **jq filter change, not GraphQL change**: The existing query fetches all threads including resolved ones. The `isResolved == false` filter is in jq. Broadening this filter adds resolved threads to the output at zero API cost. (see origin: R9)
|
||||
- **Any resolved thread is a prior resolution — no author matching needed**: The brainstorm originally required detecting the skill's own prior replies. The plan simplifies this: any resolved thread on the PR is evidence of a prior review round. This eliminates the `gh api user` call, `author.login` matching, reply pattern detection, and the `set -e` error handling complexity. Multi-round review is the signal, regardless of who resolved the threads.
|
||||
- **N bounds total resolved threads, not "rounds"**: The brainstorm defined "rounds" as groups of threads resolved in a single invocation, which required fragile timestamp-based clustering in jq. The plan simplifies to: take the last N resolved threads (by `createdAt` of the most recent comment). This is a trivial jq sort + limit. N=10 is the starting value (covering typical PR history without excessive data). Successive reviews naturally cluster around changed code, so thread-level bounding is sufficient.
|
||||
- **No spatial overlap check**: The brainstorm's R11 specified a lightweight overlap check before full clustering. The plan drops this: successive reviews almost always cluster around the same code areas, so the overlap check would almost always pass. The cost it prevents (clustering with ~10 resolved threads + 1-2 new ones) is small. Skipping it keeps the orchestration simpler.
|
||||
- **Script computes the cross-invocation envelope**: Per the script-first learning, the script outputs a `cross_invocation` object with `signal` (boolean) and `resolved_threads` (array). The SKILL.md receives pre-computed analysis.
|
||||
|
||||
## Open Questions
|
||||
|
||||
### Resolved During Planning
|
||||
|
||||
- **How to detect prior resolutions**: Any resolved thread = prior resolution. No author matching, no reply pattern matching, no user API call. Resolved threads exist alongside new ones in the script output.
|
||||
- **How to bound the lookback**: Last N=10 resolved threads by most-recent comment timestamp. Simple jq sort + slice.
|
||||
- **Whether to check spatial overlap first**: No. Successive reviews naturally cluster around changed code. The overlap check adds orchestration complexity for negligible token savings.
|
||||
|
||||
### Deferred to Implementation
|
||||
|
||||
- **Optimal value of N**: Starting at 10. If PRs with extensive resolved thread history show performance issues, reduce. If patterns are missed, increase.
|
||||
|
||||
---
|
||||
|
||||
## High-Level Technical Design
|
||||
|
||||
> *This illustrates the intended approach and is directional guidance for review, not implementation specification. The implementing agent should treat it as context, not code to reproduce.*
|
||||
|
||||
```
|
||||
┌──────────────────────────────────────────────────────┐
|
||||
│ get-pr-comments script (data layer) │
|
||||
│ │
|
||||
│ GraphQL query (unchanged) │
|
||||
│ │ │
|
||||
│ ▼ │
|
||||
│ jq filter (broadened) │
|
||||
│ │ │
|
||||
│ ├── review_threads: [unresolved, as before] │
|
||||
│ ├── pr_comments: [as before] │
|
||||
│ ├── review_bodies: [as before] │
|
||||
│ └── cross_invocation: │
|
||||
│ signal: true/false │
|
||||
│ resolved_threads: [ │
|
||||
│ { thread_id, path, line, │
|
||||
│ first_comment_body, last_comment_at } │
|
||||
│ ...last N by recency │
|
||||
│ ] │
|
||||
└──────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌──────────────────────────────────────────────────────┐
|
||||
│ SKILL.md (orchestration layer) │
|
||||
│ │
|
||||
│ Step 1: Fetch (calls modified script) │
|
||||
│ │
|
||||
│ Step 2: Triage (as before) │
|
||||
│ │
|
||||
│ Step 3: Cluster gate (CHANGED) │
|
||||
│ ┌────────────────────────────────────────────┐ │
|
||||
│ │ Volume (3+)? ─── YES ──> full clustering │ │
|
||||
│ │ │ │ │
|
||||
│ │ NO │ │
|
||||
│ │ │ │ │
|
||||
│ │ cross_invocation.signal? ─ NO ──> skip │ │
|
||||
│ │ │ │ │
|
||||
│ │ YES │ │
|
||||
│ │ │ │ │
|
||||
│ │ Full clustering (new + resolved threads) │ │
|
||||
│ └────────────────────────────────────────────┘ │
|
||||
│ │
|
||||
│ Step 5: Dispatch │
|
||||
│ - resolved threads: cluster input only │
|
||||
│ - new threads: cluster or individual │
|
||||
│ │
|
||||
│ Step 8: Verify loop (simplified) │
|
||||
│ - removes old verify-loop re-entry logic │
|
||||
│ - relies on cross-invocation signal next run │
|
||||
└──────────────────────────────────────────────────────┘
|
||||
│
|
||||
▼
|
||||
┌──────────────────────────────────────────────────────┐
|
||||
│ pr-comment-resolver agent (cluster mode) │
|
||||
│ │
|
||||
│ Receives <cluster-brief> with <prior-resolutions> │
|
||||
│ │
|
||||
│ Three-mode assessment: │
|
||||
│ 1. Band-aid: redo prior fixes holistically │
|
||||
│ 2. Correct-but-incomplete: keep fixes, │
|
||||
│ investigate sibling code │
|
||||
│ 3. Sound-and-independent: context only │
|
||||
└──────────────────────────────────────────────────────┘
|
||||
```
|
||||
|
||||
## Implementation Units
|
||||
|
||||
- [x] **Unit 1: Extend `get-pr-comments` script**
|
||||
|
||||
**Goal:** Broaden the jq filter to include resolved threads and output a cross-invocation envelope alongside the existing data.
|
||||
|
||||
**Requirements:** R1, R2, R9, R10
|
||||
|
||||
**Dependencies:** None
|
||||
|
||||
**Files:**
|
||||
- Modify: `plugins/compound-engineering/skills/resolve-pr-feedback/scripts/get-pr-comments`
|
||||
|
||||
**Approach:**
|
||||
- Widen the jq filter: keep the existing `review_threads` array (unresolved, non-outdated, as before). Add a new selection for resolved threads (`isResolved == true`), sorted by most-recent comment `createdAt`, limited to the last N=10.
|
||||
- Output the existing three keys (`review_threads`, `pr_comments`, `review_bodies`) unchanged, plus a new `cross_invocation` object containing: `signal` (boolean — true when both resolved threads and unresolved review threads exist), and `resolved_threads` (array of objects with `thread_id`, `path`, `line`, `first_comment_body`, `last_comment_at`).
|
||||
- No `gh api user` call. No author matching. No reply pattern detection. The signal is simply: resolved threads exist AND new threads exist.
|
||||
|
||||
**Patterns to follow:**
|
||||
- Existing jq pipeline in `get-pr-comments` — extend the `$pr` extraction, don't restructure it
|
||||
- Keep all logic in jq
|
||||
|
||||
**Test scenarios:**
|
||||
- Happy path: PR with 2 resolved threads and 1 new thread -> `cross_invocation.signal: true`, `resolved_threads` has 2 entries, `review_threads` has 1
|
||||
- Happy path: PR with no resolved threads -> `cross_invocation.signal: false`, `resolved_threads` empty
|
||||
- Happy path: PR with resolved threads but no unresolved threads -> `cross_invocation.signal: false` (nothing new to cluster)
|
||||
- Edge case: PR with 20 resolved threads -> only last 10 (by recency) included
|
||||
- Edge case: PR with resolved threads but all unresolved threads are outdated -> `review_threads` empty, signal false
|
||||
|
||||
**Verification:**
|
||||
- Run against a test PR with known resolved threads and verify the output JSON shape
|
||||
- Existing `review_threads`, `pr_comments`, `review_bodies` output is identical to current behavior
|
||||
|
||||
---
|
||||
|
||||
- [x] **Unit 2: Update SKILL.md orchestration**
|
||||
|
||||
**Goal:** Replace the verify-loop re-entry gate with the cross-invocation signal, update cluster brief format, enforce dispatch boundary for resolved threads, and simplify the verify loop.
|
||||
|
||||
**Requirements:** R1, R2, R3, R4, R5, R7, R8
|
||||
|
||||
**Dependencies:** Unit 1 (script must output the cross-invocation envelope)
|
||||
|
||||
**Files:**
|
||||
- Modify: `plugins/compound-engineering/skills/resolve-pr-feedback/SKILL.md`
|
||||
|
||||
**Approach:**
|
||||
|
||||
*Step 1 (Fetch)*: No change — the script now returns the cross-invocation envelope automatically.
|
||||
|
||||
*Step 2 (Triage)*: No changes. Triage classifies new vs already-handled among unresolved threads. Resolved threads from `cross_invocation` are not triage subjects — they're a separate input to clustering.
|
||||
|
||||
*Step 3 (Cluster Analysis)*: Replace the gate table:
|
||||
|
||||
| Gate signal | Check |
|
||||
|---|---|
|
||||
| **Volume** | 3+ new items from triage |
|
||||
| **Cross-invocation** | `cross_invocation.signal == true` |
|
||||
|
||||
When cross-invocation gate fires: include resolved threads from `cross_invocation.resolved_threads` alongside new threads in category assignment and spatial grouping. Resolved threads get a `previously_resolved` marker.
|
||||
|
||||
Update cluster brief XML to include `<prior-resolutions>`:
|
||||
```xml
|
||||
<cluster-brief>
|
||||
<theme>[concern category]</theme>
|
||||
<area>[common directory path]</area>
|
||||
<files>[comma-separated file paths]</files>
|
||||
<threads>[comma-separated thread/comment IDs]</threads>
|
||||
<hypothesis>[one sentence]</hypothesis>
|
||||
<prior-resolutions>
|
||||
<thread id="PRRT_..." path="..." category="..."/>
|
||||
</prior-resolutions>
|
||||
</cluster-brief>
|
||||
```
|
||||
|
||||
Remove the `<just-fixed-files>` element — subsumed by `<prior-resolutions>`.
|
||||
|
||||
*Step 5 (Dispatch)*: Add dispatch boundary rule: resolved threads participate in clustering and appear in cluster briefs, but are NEVER individually dispatched. Only new threads get individual or cluster dispatch.
|
||||
|
||||
*Step 8 (Verify)*: Simplify. Remove "Record which files were modified and which concern categories were addressed" and the verify-loop re-entry language. If new threads remain after 2 fix-verify cycles, escalate. Cross-invocation signal handles re-entry across sessions; within-session re-entry works because replies from earlier cycles make threads resolved on re-fetch.
|
||||
|
||||
**Patterns to follow:**
|
||||
- Existing gate table format in step 3
|
||||
- Existing cluster brief XML structure
|
||||
- Existing dispatch boundary logic in step 5
|
||||
|
||||
**Test scenarios:**
|
||||
- Happy path: 1 new thread + cross-invocation signal -> cluster analysis runs, resolved threads included
|
||||
- Happy path: 3 new threads + no cross-invocation signal -> volume gate fires, no resolved threads
|
||||
- Happy path: 1 new thread + no cross-invocation signal -> both gates skip, no clustering
|
||||
- Edge case: cross-invocation cluster with 1 new + 2 resolved -> brief includes all 3, dispatch only addresses the new thread (plus siblings the resolver identifies)
|
||||
- Edge case: resolved thread in a cluster -> in the brief for context, NOT dispatched individually
|
||||
- Integration: verify loop re-fetches after this session's fixes, resolved threads from this cycle appear in `cross_invocation`
|
||||
|
||||
**Verification:**
|
||||
- Gate table in step 3 has exactly two rows (Volume, Cross-invocation)
|
||||
- No references to "verify-loop re-entry" remain
|
||||
- `<just-fixed-files>` removed from cluster brief documentation
|
||||
- Step 5 has "resolved threads are cluster-only" rule
|
||||
- Step 8 no longer tracks files/categories or references re-entry as a gate signal
|
||||
|
||||
---
|
||||
|
||||
- [x] **Unit 3: Update pr-comment-resolver agent for cross-invocation clusters**
|
||||
|
||||
**Goal:** Add handling for the `<prior-resolutions>` element in cluster mode and implement the three-mode assessment for cross-invocation clusters.
|
||||
|
||||
**Requirements:** R6, R7
|
||||
|
||||
**Dependencies:** Unit 2 (SKILL.md must send the new cluster brief format)
|
||||
|
||||
**Files:**
|
||||
- Modify: `plugins/compound-engineering/agents/workflow/pr-comment-resolver.md`
|
||||
|
||||
**Approach:**
|
||||
|
||||
Update the Cluster Mode Workflow section:
|
||||
|
||||
Step 1 (Parse cluster brief): Add `<prior-resolutions>` to parsed elements.
|
||||
|
||||
Step 3 (Assess root cause): When `<prior-resolutions>` is present, expand from two modes (systemic vs coincidental) to three:
|
||||
|
||||
- **Band-aid fixes** — prior fixes addressed symptoms, not root cause. Approach: re-examine prior fix locations, implement holistic fix.
|
||||
- **Correct but incomplete** — prior fixes were right for their files, but the recurring pattern likely exists in untouched sibling code. This is the highest-value mode. Approach: keep prior fixes, fix the new thread, proactively investigate files in the same directory/module for the same pattern. Report findings in cluster assessment.
|
||||
- **Sound and independent** — prior fixes adequate, new thread is genuinely unrelated. Approach: fix individually, use prior context for awareness only.
|
||||
|
||||
Add a cross-invocation example showing the "correct but incomplete" mode.
|
||||
|
||||
Update `cluster_assessment` return to include which mode was applied and, for "correct but incomplete" mode, which additional files were investigated.
|
||||
|
||||
**Patterns to follow:**
|
||||
- Existing cluster mode workflow structure
|
||||
- Existing example format in `<examples>`
|
||||
- Existing `cluster_assessment` return structure
|
||||
|
||||
**Test scenarios:**
|
||||
- Happy path: cluster with `<prior-resolutions>` where pattern extends to untouched code -> "correct but incomplete", investigates siblings
|
||||
- Happy path: cluster with `<prior-resolutions>` where prior fixes were shallow -> "band-aid", holistic fix
|
||||
- Happy path: cluster with `<prior-resolutions>` where new thread is unrelated -> "sound and independent"
|
||||
- Happy path: cluster WITHOUT `<prior-resolutions>` -> existing two-mode assessment, no behavior change
|
||||
- Edge case: `<prior-resolutions>` present but empty -> fall back to existing behavior
|
||||
|
||||
**Verification:**
|
||||
- Cluster mode workflow mentions all three assessment modes
|
||||
- `<prior-resolutions>` is listed as a parsed element
|
||||
- New example demonstrates "correct but incomplete" mode
|
||||
- `cluster_assessment` format documented for all three modes
|
||||
- References to `<just-fixed-files>` removed (subsumed by `<prior-resolutions>`)
|
||||
- Existing standard mode and non-prior cluster mode unchanged
|
||||
|
||||
## System-Wide Impact
|
||||
|
||||
- **Interaction graph:** `get-pr-comments` is called by SKILL.md step 1 and step 8 (verify). Both callers now receive the `cross_invocation` envelope. Step 8's re-fetch picks up this session's replies as resolved threads.
|
||||
- **Error propagation:** No new external calls to fail. The only change is a jq filter broadening — if resolved threads are missing from the GraphQL response, `cross_invocation.signal` is false (graceful degradation).
|
||||
- **API surface parity:** The script's existing three output keys are unchanged. Callers that don't read `cross_invocation` are unaffected.
|
||||
- **Unchanged invariants:** Targeted mode is unaffected. Volume gate threshold, spatial grouping rules, and individual dispatch logic are unchanged.
|
||||
|
||||
## Risks & Dependencies
|
||||
|
||||
| Risk | Mitigation |
|
||||
|------|------------|
|
||||
| Resolved threads from manual (non-skill) resolution included as prior resolutions | Acceptable — any resolved thread is evidence of prior review attention. If it was manually resolved without a fix, clustering with it may produce a "sound and independent" assessment, which is the correct outcome |
|
||||
| Resolved threads with 50+ comments hit pagination limits | Existing query fetches `comments(first: 50)`. The `last_comment_at` timestamp comes from whatever comments are fetched — graceful degradation |
|
||||
| "Correct but incomplete" mode causes resolver to touch files not in review threads | Bounded by the cluster's `<area>` (directory path). Resolver already reads broadly in cluster mode |
|
||||
| Within-session verify loop depends on GitHub API reflecting resolved state quickly | GitHub's GraphQL is eventually consistent. If a just-resolved thread hasn't propagated, the cross-invocation signal won't fire for that thread on re-fetch — it will be caught on the next invocation instead. Acceptable degradation |
|
||||
|
||||
## Sources & References
|
||||
|
||||
- **Origin document:** [docs/brainstorms/2026-04-01-cross-invocation-cluster-analysis-requirements.md](docs/brainstorms/2026-04-01-cross-invocation-cluster-analysis-requirements.md)
|
||||
- Related skill: `plugins/compound-engineering/skills/resolve-pr-feedback/SKILL.md`
|
||||
- Related agent: `plugins/compound-engineering/agents/workflow/pr-comment-resolver.md`
|
||||
- Related script: `plugins/compound-engineering/skills/resolve-pr-feedback/scripts/get-pr-comments`
|
||||
- Learnings: `docs/solutions/skill-design/script-first-skill-architecture.md`, `docs/solutions/skill-design/git-workflow-skills-need-explicit-state-machines-2026-03-27.md`
|
||||
Reference in New Issue
Block a user