diff --git a/docs/brainstorms/2026-04-01-cross-invocation-cluster-analysis-requirements.md b/docs/brainstorms/2026-04-01-cross-invocation-cluster-analysis-requirements.md new file mode 100644 index 0000000..60592c5 --- /dev/null +++ b/docs/brainstorms/2026-04-01-cross-invocation-cluster-analysis-requirements.md @@ -0,0 +1,79 @@ +--- +date: 2026-04-01 +topic: cross-invocation-cluster-analysis +--- + +# Cross-Invocation Cluster Analysis for resolve-pr-feedback + +## Problem Frame + +The resolve-pr-feedback skill's cluster analysis is gated on two signals: volume (3+ items) and verify-loop re-entry (2nd+ pass within the same invocation). The verify-loop signal is effectively dead — it requires new review threads to appear between push and verify, but automated reviewers take minutes while verify runs seconds after push. The timing gap makes this gate unreliable at best, and in the common case of automated reviewers, impossible. + +This leaves volume as the only working gate. The skill misses the exact scenario clustering was designed for: a reviewer posts feedback about the same *class* of problem across multiple rounds, with each round containing only 1-2 threads. Individually, no round triggers the volume gate. But taken together, there's a clear recurring pattern — e.g., "three separate rounds of feedback all about missing convergence behavior in target writers." The skill should step back and investigate the problem class holistically rather than applying band-aids to each instance. + +## Requirements + +**Detection Signal** + +- R1. Replace the verify-loop re-entry gate signal with a cross-invocation awareness signal. Before triaging, the skill checks whether it has previously resolved threads on this same PR. Its own prior reply comments are the evidence. +- R2. If prior resolutions exist and new unresolved feedback has arrived since the last resolution, that constitutes the re-entry signal — even with just 1 new item. If no prior resolutions are found (first invocation), the cross-invocation signal does not fire and processing continues with the volume gate as the only cluster trigger. +- R3. The volume gate (3+ items) remains unchanged as a parallel trigger. The two gates are OR'd: either one fires cluster analysis. + +**Cost Control** + +- R9. Cross-invocation detection must not add GraphQL API calls. The existing `get-pr-comments` query should be broadened to return both unresolved and resolved threads (with skill replies) in a single call. All cross-invocation analysis — detection, overlap check, clustering — works on data already in memory from that one call. +- R10. Cross-invocation clustering is scoped to the last N resolution rounds (not all history). A "round" is the set of threads resolved in a single skill invocation. This bounds the data the skill processes regardless of PR history length. Planning should determine the right value of N; 2-3 rounds is likely sufficient since recurring patterns surface in recent history. +- R11. When the cross-invocation signal fires but the volume gate does not, the skill runs a lightweight overlap check first: compare concern categories and file paths between new and prior threads using data already fetched. Promote to full clustering only if category or spatial overlap exists. If no overlap, skip clustering and process the new thread(s) individually. + +**Clustering Input** + +- R4. When the cross-invocation signal fires and overlap is confirmed (R11), cluster analysis considers both the new thread(s) AND previously-resolved threads from the last N rounds as input. This enables detecting that the same concern category keeps recurring across rounds. +- R5. Previously-resolved threads are included in category assignment and spatial grouping alongside new threads, so clusters can span rounds. + +**Resolver Behavior on Cross-Invocation Clusters** + +- R6. When a cross-invocation cluster forms, the resolver agent assesses the prior fixes and applies one of three modes: + - **Band-aid fixes** — prior fixes addressed symptoms, not root cause. Re-examine and potentially redo them as part of a holistic fix. + - **Correct but incomplete** — prior fixes were right for their scope, but the recurring pattern reveals the same problem likely exists in untouched sibling code. Keep prior fixes, fix the new thread, and proactively investigate whether the pattern extends to code no reviewer has flagged yet. This is the highest-value mode — it's what catches "three rounds of the same concern category in different files means there are probably more files with the same issue." + - **Sound and independent** — prior fixes were adequate and the new thread is genuinely unrelated despite clustering. Use prior context for awareness only. +- R7. The cluster brief XML gains a `` element listing previously-resolved thread IDs and their concern categories, with reply timestamps (createdAt) to establish ordering across rounds, so the resolver agent has the full cross-round picture. + +**Within-Session Verify Loop** + +- R8. The within-session verify loop (step 8: if new threads remain, repeat from step 2) continues to function as a workflow mechanism. Replies posted during earlier cycles within the same session count as prior resolutions for the cross-invocation signal, so the new gate naturally subsumes the old verify-loop re-entry gate. + +## Success Criteria + +- Recurring feedback about the same problem class across 2+ rounds triggers cluster analysis, even when each round has only 1-2 threads +- A single new thread on a PR with prior resolutions in the same concern category produces a cluster brief that includes both the new and old threads +- The resolver agent can distinguish three modes: "prior fixes were band-aids, redo holistically", "prior fixes were correct but incomplete, investigate sibling code", and "prior fixes were sound, this is independent" +- Token cost is bounded: a PR with 15 prior resolution rounds costs no more for clustering than a PR with 3, and unrelated new feedback on a multi-round PR skips clustering entirely after the lightweight overlap check + +## Scope Boundaries + +- No persistent state files or `.context/` storage — detection relies entirely on GitHub PR comment history +- No changes to the volume gate threshold or the cluster spatial grouping rules +- No changes to how the resolver agent handles standard (non-cluster) threads +- The `get-pr-comments` script currently filters to unresolved threads only (`isResolved == false`). Per R9, this query is broadened to also return resolved threads — no new script, just a wider filter in the existing one + +## Key Decisions + +- **Detection via own replies, not persistent state**: Prior resolutions are detected by checking for the skill's own reply comments on PR threads. This keeps the skill stateless and avoids `.context/` file management. The data is already authoritative (GitHub is the source of truth for what was resolved). +- **Three-mode resolver assessment**: The agent distinguishes band-aid fixes (redo), correct-but-incomplete fixes (keep fixes, investigate sibling code), and sound-and-independent fixes (context only). The "correct but incomplete" mode is the highest-value case — it's what turns "three rounds of the same concern in different files" into proactive investigation of untouched code with the same pattern. +- **Cross-invocation signal subsumes verify-loop signal**: Within-session cycles produce replies that count as prior resolutions, so the new gate handles both cross-session and within-session re-entry without needing a separate verify-loop signal. +- **Bounded lookback, not full history**: Clustering only considers the last N resolution rounds. Recurring patterns surface in recent history — if the same concern category appeared in the last 2-3 rounds, that's the signal. Going back further adds cost without proportional value. +- **Zero additional API calls**: Cross-invocation detection piggybacks on the existing `get-pr-comments` query by broadening the filter. All analysis — detection, overlap check, clustering — happens in-memory on data already fetched. No new GraphQL calls. +- **Two-tier cost control**: The lightweight overlap check (R11) prevents unnecessary full clustering. Most multi-round PRs get unrelated feedback in later rounds; those skip clustering entirely after a cheap metadata comparison. Full clustering only runs when there's evidence it will find something. + +## Outstanding Questions + +### Deferred to Planning + +- [Affects R1][Technical] How should the skill identify its own prior replies? Options include checking the authenticated `gh` user, matching a reply-text pattern, or both. Planning should check what the existing `resolve-pr-thread` and `reply-to-pr-thread` scripts produce and what's easily queryable. +- [Affects R4][Technical] How should previously-resolved threads be represented in the triage list alongside new threads? They need a status marker (e.g., `previously-resolved`) so clustering can include them while dispatch skips re-resolution of threads that don't cluster. +- [Affects R9][Technical] What fields does the existing `get-pr-comments` GraphQL query return per thread? Planning should check whether the query already fetches enough data (file path, line range, comment body, author) to support both resolved and unresolved threads without changing the response shape, or whether fields need to be added. +- [Affects R10][Technical] What is the right value of N for resolution round lookback? 2-3 is the starting hypothesis. Planning should consider typical PR review patterns and the marginal value of deeper lookback. + +## Next Steps + +-> `/ce:plan` for structured implementation planning diff --git a/docs/plans/2026-04-01-001-feat-cross-invocation-cluster-analysis-plan.md b/docs/plans/2026-04-01-001-feat-cross-invocation-cluster-analysis-plan.md new file mode 100644 index 0000000..08b813d --- /dev/null +++ b/docs/plans/2026-04-01-001-feat-cross-invocation-cluster-analysis-plan.md @@ -0,0 +1,317 @@ +--- +title: "feat(resolve-pr-feedback): cross-invocation cluster analysis" +type: feat +status: completed +date: 2026-04-01 +origin: docs/brainstorms/2026-04-01-cross-invocation-cluster-analysis-requirements.md +--- + +# Cross-Invocation Cluster Analysis for resolve-pr-feedback + +## Overview + +Replace the dead verify-loop re-entry gate signal in the resolve-pr-feedback skill with a cross-invocation awareness signal that detects recurring feedback patterns across multiple review rounds on the same PR. The change touches three files: the `get-pr-comments` script (data), the SKILL.md (orchestration), and the pr-comment-resolver agent (cluster handling). + +## Problem Frame + +The skill's cluster analysis has two gates: volume (3+ items) and verify-loop re-entry (2nd+ pass within same invocation). The verify-loop gate is dead — automated reviewers post minutes after push, but verify runs seconds after. This leaves volume as the only gate, which misses the highest-value scenario: a reviewer posts 1-2 threads per round about the same class of problem across multiple rounds. Cross-invocation awareness detects this pattern by checking for resolved threads alongside new ones — evidence of multi-round review. (see origin: `docs/brainstorms/2026-04-01-cross-invocation-cluster-analysis-requirements.md`) + +## Requirements Trace + +- R1. Cross-invocation awareness signal replaces verify-loop re-entry gate +- R2. Prior resolutions + new feedback = re-entry signal, even with 1 new item +- R3. Volume gate (3+) unchanged, OR'd with cross-invocation signal +- R4. Clustering input includes new + prior threads (bounded to last N) +- R5. Previously-resolved threads participate in category assignment and spatial grouping +- R6. Three-mode resolver assessment: band-aid (redo), correct-but-incomplete (investigate siblings), sound-and-independent (context only) +- R7. Cluster brief gains `` element with metadata +- R8. Within-session verify loop subsumes into cross-invocation signal +- R9. Zero additional GraphQL calls — broaden existing query's jq filter +- R10. Bounded lookback: last N resolved threads (simplified from "rounds" — see Key Technical Decisions) + +## Scope Boundaries + +- No persistent state files or `.context/` storage +- No changes to the volume gate threshold or spatial grouping rules +- No changes to standard (non-cluster) thread handling +- No new scripts — extend the existing `get-pr-comments` script + +## Context & Research + +### Relevant Code and Patterns + +- `plugins/compound-engineering/skills/resolve-pr-feedback/SKILL.md` — skill orchestration, steps 1-9 +- `plugins/compound-engineering/skills/resolve-pr-feedback/scripts/get-pr-comments` — GraphQL query + jq filter; already fetches resolved threads in the query but drops them in jq (`isResolved == false`) +- `plugins/compound-engineering/agents/workflow/pr-comment-resolver.md` — resolver agent with standard and cluster modes + +### Institutional Learnings + +- **Script-first architecture** (`docs/solutions/skill-design/script-first-skill-architecture.md`): Classification and filtering logic must live in the script, not in SKILL.md instructions. The script should output pre-computed analysis so the model receives structured decisions, not raw data to classify. 60-75% token savings. +- **Explicit state machines** (`docs/solutions/skill-design/git-workflow-skills-need-explicit-state-machines-2026-03-27.md`): Model the cross-invocation gate as a decision table with explicit outcomes, not prose conditionals. +- **Pass paths, not content** (`docs/solutions/skill-design/pass-paths-not-content-to-subagents-2026-03-26.md`): The `` element should contain metadata (thread IDs, categories, file paths, timestamps), not full comment bodies. The resolver reads full content on demand. +- **Status-gated resolution** (`docs/solutions/workflow/todo-status-lifecycle.md`): Previously-resolved threads must be enforced at the dispatch boundary — they participate in clustering but are never individually dispatched. + +## Key Technical Decisions + +- **jq filter change, not GraphQL change**: The existing query fetches all threads including resolved ones. The `isResolved == false` filter is in jq. Broadening this filter adds resolved threads to the output at zero API cost. (see origin: R9) +- **Any resolved thread is a prior resolution — no author matching needed**: The brainstorm originally required detecting the skill's own prior replies. The plan simplifies this: any resolved thread on the PR is evidence of a prior review round. This eliminates the `gh api user` call, `author.login` matching, reply pattern detection, and the `set -e` error handling complexity. Multi-round review is the signal, regardless of who resolved the threads. +- **N bounds total resolved threads, not "rounds"**: The brainstorm defined "rounds" as groups of threads resolved in a single invocation, which required fragile timestamp-based clustering in jq. The plan simplifies to: take the last N resolved threads (by `createdAt` of the most recent comment). This is a trivial jq sort + limit. N=10 is the starting value (covering typical PR history without excessive data). Successive reviews naturally cluster around changed code, so thread-level bounding is sufficient. +- **No spatial overlap check**: The brainstorm's R11 specified a lightweight overlap check before full clustering. The plan drops this: successive reviews almost always cluster around the same code areas, so the overlap check would almost always pass. The cost it prevents (clustering with ~10 resolved threads + 1-2 new ones) is small. Skipping it keeps the orchestration simpler. +- **Script computes the cross-invocation envelope**: Per the script-first learning, the script outputs a `cross_invocation` object with `signal` (boolean) and `resolved_threads` (array). The SKILL.md receives pre-computed analysis. + +## Open Questions + +### Resolved During Planning + +- **How to detect prior resolutions**: Any resolved thread = prior resolution. No author matching, no reply pattern matching, no user API call. Resolved threads exist alongside new ones in the script output. +- **How to bound the lookback**: Last N=10 resolved threads by most-recent comment timestamp. Simple jq sort + slice. +- **Whether to check spatial overlap first**: No. Successive reviews naturally cluster around changed code. The overlap check adds orchestration complexity for negligible token savings. + +### Deferred to Implementation + +- **Optimal value of N**: Starting at 10. If PRs with extensive resolved thread history show performance issues, reduce. If patterns are missed, increase. + +--- + +## High-Level Technical Design + +> *This illustrates the intended approach and is directional guidance for review, not implementation specification. The implementing agent should treat it as context, not code to reproduce.* + +``` +┌──────────────────────────────────────────────────────┐ +│ get-pr-comments script (data layer) │ +│ │ +│ GraphQL query (unchanged) │ +│ │ │ +│ ▼ │ +│ jq filter (broadened) │ +│ │ │ +│ ├── review_threads: [unresolved, as before] │ +│ ├── pr_comments: [as before] │ +│ ├── review_bodies: [as before] │ +│ └── cross_invocation: │ +│ signal: true/false │ +│ resolved_threads: [ │ +│ { thread_id, path, line, │ +│ first_comment_body, last_comment_at } │ +│ ...last N by recency │ +│ ] │ +└──────────────────────────────────────────────────────┘ + │ + ▼ +┌──────────────────────────────────────────────────────┐ +│ SKILL.md (orchestration layer) │ +│ │ +│ Step 1: Fetch (calls modified script) │ +│ │ +│ Step 2: Triage (as before) │ +│ │ +│ Step 3: Cluster gate (CHANGED) │ +│ ┌────────────────────────────────────────────┐ │ +│ │ Volume (3+)? ─── YES ──> full clustering │ │ +│ │ │ │ │ +│ │ NO │ │ +│ │ │ │ │ +│ │ cross_invocation.signal? ─ NO ──> skip │ │ +│ │ │ │ │ +│ │ YES │ │ +│ │ │ │ │ +│ │ Full clustering (new + resolved threads) │ │ +│ └────────────────────────────────────────────┘ │ +│ │ +│ Step 5: Dispatch │ +│ - resolved threads: cluster input only │ +│ - new threads: cluster or individual │ +│ │ +│ Step 8: Verify loop (simplified) │ +│ - removes old verify-loop re-entry logic │ +│ - relies on cross-invocation signal next run │ +└──────────────────────────────────────────────────────┘ + │ + ▼ +┌──────────────────────────────────────────────────────┐ +│ pr-comment-resolver agent (cluster mode) │ +│ │ +│ Receives with │ +│ │ +│ Three-mode assessment: │ +│ 1. Band-aid: redo prior fixes holistically │ +│ 2. Correct-but-incomplete: keep fixes, │ +│ investigate sibling code │ +│ 3. Sound-and-independent: context only │ +└──────────────────────────────────────────────────────┘ +``` + +## Implementation Units + +- [x] **Unit 1: Extend `get-pr-comments` script** + +**Goal:** Broaden the jq filter to include resolved threads and output a cross-invocation envelope alongside the existing data. + +**Requirements:** R1, R2, R9, R10 + +**Dependencies:** None + +**Files:** +- Modify: `plugins/compound-engineering/skills/resolve-pr-feedback/scripts/get-pr-comments` + +**Approach:** +- Widen the jq filter: keep the existing `review_threads` array (unresolved, non-outdated, as before). Add a new selection for resolved threads (`isResolved == true`), sorted by most-recent comment `createdAt`, limited to the last N=10. +- Output the existing three keys (`review_threads`, `pr_comments`, `review_bodies`) unchanged, plus a new `cross_invocation` object containing: `signal` (boolean — true when both resolved threads and unresolved review threads exist), and `resolved_threads` (array of objects with `thread_id`, `path`, `line`, `first_comment_body`, `last_comment_at`). +- No `gh api user` call. No author matching. No reply pattern detection. The signal is simply: resolved threads exist AND new threads exist. + +**Patterns to follow:** +- Existing jq pipeline in `get-pr-comments` — extend the `$pr` extraction, don't restructure it +- Keep all logic in jq + +**Test scenarios:** +- Happy path: PR with 2 resolved threads and 1 new thread -> `cross_invocation.signal: true`, `resolved_threads` has 2 entries, `review_threads` has 1 +- Happy path: PR with no resolved threads -> `cross_invocation.signal: false`, `resolved_threads` empty +- Happy path: PR with resolved threads but no unresolved threads -> `cross_invocation.signal: false` (nothing new to cluster) +- Edge case: PR with 20 resolved threads -> only last 10 (by recency) included +- Edge case: PR with resolved threads but all unresolved threads are outdated -> `review_threads` empty, signal false + +**Verification:** +- Run against a test PR with known resolved threads and verify the output JSON shape +- Existing `review_threads`, `pr_comments`, `review_bodies` output is identical to current behavior + +--- + +- [x] **Unit 2: Update SKILL.md orchestration** + +**Goal:** Replace the verify-loop re-entry gate with the cross-invocation signal, update cluster brief format, enforce dispatch boundary for resolved threads, and simplify the verify loop. + +**Requirements:** R1, R2, R3, R4, R5, R7, R8 + +**Dependencies:** Unit 1 (script must output the cross-invocation envelope) + +**Files:** +- Modify: `plugins/compound-engineering/skills/resolve-pr-feedback/SKILL.md` + +**Approach:** + +*Step 1 (Fetch)*: No change — the script now returns the cross-invocation envelope automatically. + +*Step 2 (Triage)*: No changes. Triage classifies new vs already-handled among unresolved threads. Resolved threads from `cross_invocation` are not triage subjects — they're a separate input to clustering. + +*Step 3 (Cluster Analysis)*: Replace the gate table: + +| Gate signal | Check | +|---|---| +| **Volume** | 3+ new items from triage | +| **Cross-invocation** | `cross_invocation.signal == true` | + +When cross-invocation gate fires: include resolved threads from `cross_invocation.resolved_threads` alongside new threads in category assignment and spatial grouping. Resolved threads get a `previously_resolved` marker. + +Update cluster brief XML to include ``: +```xml + + [concern category] + [common directory path] + [comma-separated file paths] + [comma-separated thread/comment IDs] + [one sentence] + + + + +``` + +Remove the `` element — subsumed by ``. + +*Step 5 (Dispatch)*: Add dispatch boundary rule: resolved threads participate in clustering and appear in cluster briefs, but are NEVER individually dispatched. Only new threads get individual or cluster dispatch. + +*Step 8 (Verify)*: Simplify. Remove "Record which files were modified and which concern categories were addressed" and the verify-loop re-entry language. If new threads remain after 2 fix-verify cycles, escalate. Cross-invocation signal handles re-entry across sessions; within-session re-entry works because replies from earlier cycles make threads resolved on re-fetch. + +**Patterns to follow:** +- Existing gate table format in step 3 +- Existing cluster brief XML structure +- Existing dispatch boundary logic in step 5 + +**Test scenarios:** +- Happy path: 1 new thread + cross-invocation signal -> cluster analysis runs, resolved threads included +- Happy path: 3 new threads + no cross-invocation signal -> volume gate fires, no resolved threads +- Happy path: 1 new thread + no cross-invocation signal -> both gates skip, no clustering +- Edge case: cross-invocation cluster with 1 new + 2 resolved -> brief includes all 3, dispatch only addresses the new thread (plus siblings the resolver identifies) +- Edge case: resolved thread in a cluster -> in the brief for context, NOT dispatched individually +- Integration: verify loop re-fetches after this session's fixes, resolved threads from this cycle appear in `cross_invocation` + +**Verification:** +- Gate table in step 3 has exactly two rows (Volume, Cross-invocation) +- No references to "verify-loop re-entry" remain +- `` removed from cluster brief documentation +- Step 5 has "resolved threads are cluster-only" rule +- Step 8 no longer tracks files/categories or references re-entry as a gate signal + +--- + +- [x] **Unit 3: Update pr-comment-resolver agent for cross-invocation clusters** + +**Goal:** Add handling for the `` element in cluster mode and implement the three-mode assessment for cross-invocation clusters. + +**Requirements:** R6, R7 + +**Dependencies:** Unit 2 (SKILL.md must send the new cluster brief format) + +**Files:** +- Modify: `plugins/compound-engineering/agents/workflow/pr-comment-resolver.md` + +**Approach:** + +Update the Cluster Mode Workflow section: + +Step 1 (Parse cluster brief): Add `` to parsed elements. + +Step 3 (Assess root cause): When `` is present, expand from two modes (systemic vs coincidental) to three: + +- **Band-aid fixes** — prior fixes addressed symptoms, not root cause. Approach: re-examine prior fix locations, implement holistic fix. +- **Correct but incomplete** — prior fixes were right for their files, but the recurring pattern likely exists in untouched sibling code. This is the highest-value mode. Approach: keep prior fixes, fix the new thread, proactively investigate files in the same directory/module for the same pattern. Report findings in cluster assessment. +- **Sound and independent** — prior fixes adequate, new thread is genuinely unrelated. Approach: fix individually, use prior context for awareness only. + +Add a cross-invocation example showing the "correct but incomplete" mode. + +Update `cluster_assessment` return to include which mode was applied and, for "correct but incomplete" mode, which additional files were investigated. + +**Patterns to follow:** +- Existing cluster mode workflow structure +- Existing example format in `` +- Existing `cluster_assessment` return structure + +**Test scenarios:** +- Happy path: cluster with `` where pattern extends to untouched code -> "correct but incomplete", investigates siblings +- Happy path: cluster with `` where prior fixes were shallow -> "band-aid", holistic fix +- Happy path: cluster with `` where new thread is unrelated -> "sound and independent" +- Happy path: cluster WITHOUT `` -> existing two-mode assessment, no behavior change +- Edge case: `` present but empty -> fall back to existing behavior + +**Verification:** +- Cluster mode workflow mentions all three assessment modes +- `` is listed as a parsed element +- New example demonstrates "correct but incomplete" mode +- `cluster_assessment` format documented for all three modes +- References to `` removed (subsumed by ``) +- Existing standard mode and non-prior cluster mode unchanged + +## System-Wide Impact + +- **Interaction graph:** `get-pr-comments` is called by SKILL.md step 1 and step 8 (verify). Both callers now receive the `cross_invocation` envelope. Step 8's re-fetch picks up this session's replies as resolved threads. +- **Error propagation:** No new external calls to fail. The only change is a jq filter broadening — if resolved threads are missing from the GraphQL response, `cross_invocation.signal` is false (graceful degradation). +- **API surface parity:** The script's existing three output keys are unchanged. Callers that don't read `cross_invocation` are unaffected. +- **Unchanged invariants:** Targeted mode is unaffected. Volume gate threshold, spatial grouping rules, and individual dispatch logic are unchanged. + +## Risks & Dependencies + +| Risk | Mitigation | +|------|------------| +| Resolved threads from manual (non-skill) resolution included as prior resolutions | Acceptable — any resolved thread is evidence of prior review attention. If it was manually resolved without a fix, clustering with it may produce a "sound and independent" assessment, which is the correct outcome | +| Resolved threads with 50+ comments hit pagination limits | Existing query fetches `comments(first: 50)`. The `last_comment_at` timestamp comes from whatever comments are fetched — graceful degradation | +| "Correct but incomplete" mode causes resolver to touch files not in review threads | Bounded by the cluster's `` (directory path). Resolver already reads broadly in cluster mode | +| Within-session verify loop depends on GitHub API reflecting resolved state quickly | GitHub's GraphQL is eventually consistent. If a just-resolved thread hasn't propagated, the cross-invocation signal won't fire for that thread on re-fetch — it will be caught on the next invocation instead. Acceptable degradation | + +## Sources & References + +- **Origin document:** [docs/brainstorms/2026-04-01-cross-invocation-cluster-analysis-requirements.md](docs/brainstorms/2026-04-01-cross-invocation-cluster-analysis-requirements.md) +- Related skill: `plugins/compound-engineering/skills/resolve-pr-feedback/SKILL.md` +- Related agent: `plugins/compound-engineering/agents/workflow/pr-comment-resolver.md` +- Related script: `plugins/compound-engineering/skills/resolve-pr-feedback/scripts/get-pr-comments` +- Learnings: `docs/solutions/skill-design/script-first-skill-architecture.md`, `docs/solutions/skill-design/git-workflow-skills-need-explicit-state-machines-2026-03-27.md` diff --git a/plugins/compound-engineering/agents/workflow/pr-comment-resolver.md b/plugins/compound-engineering/agents/workflow/pr-comment-resolver.md index e7c4563..2c64c0c 100644 --- a/plugins/compound-engineering/agents/workflow/pr-comment-resolver.md +++ b/plugins/compound-engineering/agents/workflow/pr-comment-resolver.md @@ -24,6 +24,12 @@ user: "Cluster: 3 threads about missing input validation in src/auth/. In cluster mode, the agent reads the broader area first, identifies the systemic issue, and makes a holistic fix rather than three individual patches. + +Context: A new validation thread on src/auth/reset.ts, with prior-resolutions showing the same concern was fixed in login.ts and register.ts in earlier rounds. Cross-invocation cluster. +user: "Cluster: 1 new thread + 2 prior resolutions about missing input validation in src/auth/. validationsrc/auth/src/auth/reset.tsPRRT_7Recurring validation gaps across review rounds suggest the module has more files with the same issue" +assistant: "This is the third round of validation feedback in src/auth/. Prior rounds fixed login.ts and register.ts individually -- those fixes were correct but incomplete. Reading the full src/auth/ directory... Found the same missing validation in src/auth/session.ts and src/auth/oauth.ts that nobody flagged yet. Fixing reset.ts (the new thread) and proactively fixing session.ts and oauth.ts to address the pattern holistically." +In cross-invocation cluster mode with prior-resolutions, the agent identifies the 'correct but incomplete' pattern -- prior fixes were right but reveal a broader gap. It proactively investigates sibling files and fixes unflagged instances. + You resolve PR review threads. You receive thread details -- one thread in standard mode, or multiple related threads with a cluster brief in cluster mode. Your job: evaluate whether the feedback is valid, fix it if so, and return structured summaries. @@ -141,26 +147,35 @@ decision_context: [only for needs-human -- the full markdown block above] When a `` XML block is present, follow this workflow instead of the standard workflow. -1. **Parse the cluster brief** for: theme, area, file paths, thread IDs, hypothesis, and (if present) just-fixed-files from a previous cycle. +1. **Parse the cluster brief** for: theme, area, file paths, thread IDs, hypothesis, and (if present) `` listing previously-resolved threads from earlier review rounds with their IDs, file paths, and concern categories. 2. **Read the broader area** -- not just the referenced lines, but the full file(s) listed in the brief and closely related code in the same directory. Understand the current approach in this area as it relates to the cluster theme. 3. **Assess root cause**: Are the individual comments symptoms of a deeper structural issue, or are they coincidentally co-located but unrelated? + + **Without ``** (single-round cluster): - **Systemic**: The comments point to a missing pattern, inconsistent approach, or architectural gap. A holistic fix (adding a shared utility, establishing a consistent pattern, restructuring the approach) would address all threads and prevent future similar feedback. - **Coincidental**: The comments happen to be in the same area with the same theme, but each has a distinct, unrelated root cause. Individual fixes are appropriate. + **With ``** (cross-invocation cluster — the same concern category has appeared across multiple review rounds): + - **Band-aid fixes**: Prior fixes addressed symptoms, not the root cause. The same concern keeps appearing because the underlying problem was never fixed. Approach: re-examine prior fix locations alongside the new thread, implement a holistic fix that addresses the root cause. + - **Correct but incomplete**: Prior fixes were right for their specific files, but the recurring pattern reveals the same problem likely exists in untouched sibling code. This is the highest-value mode. Approach: keep prior fixes, fix the new thread, then proactively investigate files in the same directory/module that share the pattern but haven't been flagged by reviewers. Report what was found in the cluster assessment. + - **Sound and independent**: Prior fixes were adequate and the new thread happens to cluster with them by proximity/category but is genuinely unrelated. Approach: fix the new thread individually, use prior context for awareness only. + 4. **Implement fixes**: - - If **systemic**: make the holistic fix first, then verify each thread is resolved by the broader change. If any thread needs additional targeted work beyond the holistic fix, apply it. - - If **coincidental**: fix each thread individually as in standard mode. + - If **systemic** or **band-aid**: make the holistic fix first, then verify each thread is resolved by the broader change. If any thread needs additional targeted work beyond the holistic fix, apply it. + - If **correct but incomplete**: fix the new thread, then investigate sibling files in the cluster's `` for the same pattern. Fix any additional instances found. Stay within the area boundary. + - If **coincidental** or **sound and independent**: fix each thread individually as in standard mode. 5. **Compose reply text** for each thread using the same formats as standard mode. 6. **Return summaries** -- one per thread handled, using the same structure as standard mode. Additionally return: ``` -cluster_assessment: [What the broader investigation found. Whether a holistic -or individual approach was taken, and why. If holistic: what the systemic issue -was and how the fix addresses it. Keep to 2-3 sentences.] +cluster_assessment: [What the broader investigation found. Which assessment mode +was applied (systemic/coincidental for single-round, or band-aid/correct-but-incomplete/ +sound-and-independent for cross-invocation). If correct-but-incomplete: which additional +files were investigated and what was found. Keep to 2-4 sentences.] ``` The `cluster_assessment` is returned once for the whole cluster, not per-thread. diff --git a/plugins/compound-engineering/skills/resolve-pr-feedback/SKILL.md b/plugins/compound-engineering/skills/resolve-pr-feedback/SKILL.md index 489f829..7773017 100644 --- a/plugins/compound-engineering/skills/resolve-pr-feedback/SKILL.md +++ b/plugins/compound-engineering/skills/resolve-pr-feedback/SKILL.md @@ -78,15 +78,15 @@ Before planning and dispatching fixes, check whether feedback patterns suggest a | Gate signal | Check | |---|---| | **Volume** | 3+ new items from triage | -| **Verify-loop re-entry** | This is the 2nd+ pass through the workflow (new feedback appeared after a previous fix round) | +| **Cross-invocation** | `cross_invocation.signal == true` in the script output (resolved threads exist alongside new ones — evidence of multi-round review) | -If the gate does not fire, proceed to step 4. The common case (1-2 unrelated comments) skips this step entirely with zero overhead. +If the gate does not fire, proceed to step 4. The common case (first review round with 1-2 comments) skips this step entirely with zero overhead. -**If the gate fires**, analyze feedback for thematic clusters: +**If the gate fires**, analyze feedback for thematic clusters. When the cross-invocation signal fired, include resolved threads from `cross_invocation.resolved_threads` alongside new threads in the analysis — these are previously-resolved threads from earlier review rounds that provide pattern context. Mark them as `previously_resolved` so dispatch (step 5) knows not to individually re-resolve them. -1. **Assign concern categories** from this fixed list: `error-handling`, `validation`, `type-safety`, `naming`, `performance`, `testing`, `security`, `documentation`, `style`, `architecture`, `other`. Each new item gets exactly one category based on what the feedback is about. +1. **Assign concern categories** from this fixed list: `error-handling`, `validation`, `type-safety`, `naming`, `performance`, `testing`, `security`, `documentation`, `style`, `architecture`, `other`. Each item (new and previously-resolved) gets exactly one category based on what the feedback is about. -2. **Group by category + spatial proximity**. Two items form a potential cluster when they share a concern category AND are spatially proximate (same file, or files in the same directory subtree). +2. **Group by category + spatial proximity**. Two items form a potential cluster when they share a concern category AND are spatially proximate (same file, or files in the same directory subtree). Clusters can span new and previously-resolved threads. | Thematic match | Spatial proximity | Action | |---|---|---| @@ -102,20 +102,17 @@ If the gate does not fire, proceed to step 4. The common case (1-2 unrelated com [concern category] [common directory path] [comma-separated file paths] - [comma-separated thread/comment IDs] + [comma-separated new thread/comment IDs] [one sentence: what the individual comments collectively suggest about a deeper issue] + + + ``` - On verify-loop re-entry, add context about the previous cycle: - ```xml - - ... - [files modified in the previous fix cycle] - - ``` + The `` element lists previously-resolved threads that clustered with the new threads — their IDs, file paths, and assigned concern categories. This gives the resolver agent the full cross-round picture. When no previously-resolved threads are in the cluster, omit the element. -4. **Items not in any cluster** remain as individual items and are dispatched normally in step 5. +4. **Items not in any cluster** remain as individual items and are dispatched normally in step 5. Previously-resolved threads that don't cluster with any new thread are dropped — they provided context but no pattern was found. 5. **If no clusters are found** after analysis (the gate fired but items don't form thematic+spatial groups), proceed with all items as individual. The gate was a false positive -- the only cost was the analysis itself. @@ -133,9 +130,13 @@ If step 3 produced clusters, include them in the task list as cluster items alon Process all three feedback types. Review threads are the primary type; PR comments and review bodies are secondary but should not be ignored. +#### Dispatch boundary for previously-resolved threads + +Previously-resolved threads (from `cross_invocation.resolved_threads`) participate in clustering and appear in cluster briefs as `` context. They are NEVER individually dispatched — they were already resolved in prior rounds. Only new threads get individual or cluster dispatch. + #### Individual dispatch (default) -**For review threads** (`review_threads`): Spawn a `compound-engineering:workflow:pr-comment-resolver` agent for each thread that is NOT already assigned to a cluster from step 3. Clustered threads are handled by cluster dispatch below -- do not dispatch them individually. +**For review threads** (`review_threads`): Spawn a `compound-engineering:workflow:pr-comment-resolver` agent for each new thread that is NOT already assigned to a cluster from step 3. Clustered threads are handled by cluster dispatch below -- do not dispatch them individually. Each agent receives: - The thread ID @@ -264,7 +265,7 @@ The `review_threads` array should be empty (except `needs-human` items). **If new threads remain**, check the iteration count for this run: -- **First or second fix-verify cycle**: Record which files were modified and which concern categories were addressed in this cycle. Then repeat from step 2 for the remaining threads. The cluster analysis gate (step 3) will fire on re-entry because verify-loop re-entry is a gate signal, enabling broader investigation of recurring patterns. +- **First or second fix-verify cycle**: Repeat from step 2 for the remaining threads. The re-fetch in step 1 will pick up threads resolved in earlier cycles as resolved threads in `cross_invocation`, so the cross-invocation gate (step 3) will fire naturally if patterns emerge across cycles. - **After the second fix-verify cycle** (3rd pass would begin): Stop looping. Surface remaining issues to the user with context about the recurring pattern: "Multiple rounds of feedback on [area/theme] suggest a deeper issue. Here's what we've fixed so far and what keeps appearing." Use the same `needs-human` escalation pattern -- leave threads open and present the pattern for the user to decide. diff --git a/plugins/compound-engineering/skills/resolve-pr-feedback/scripts/get-pr-comments b/plugins/compound-engineering/skills/resolve-pr-feedback/scripts/get-pr-comments index 8c909e2..1e9267b 100755 --- a/plugins/compound-engineering/skills/resolve-pr-feedback/scripts/get-pr-comments +++ b/plugins/compound-engineering/skills/resolve-pr-feedback/scripts/get-pr-comments @@ -25,16 +25,19 @@ if [ -z "$OWNER" ] || [ -z "$REPO" ]; then fi # Fetch review threads, regular PR comments, and review bodies in one query. -# Output is a JSON object with three keys: -# review_threads - unresolved, non-outdated inline code review threads -# pr_comments - top-level PR conversation comments (excludes PR author) -# review_bodies - review submissions with non-empty body text (excludes PR author) +# Output is a JSON object with four keys: +# review_threads - unresolved, non-outdated inline code review threads +# pr_comments - top-level PR conversation comments (excludes PR author) +# review_bodies - review submissions with non-empty body text (excludes PR author) +# cross_invocation - cross-invocation awareness envelope: +# signal: true when both resolved and unresolved threads exist (multi-round review) +# resolved_threads: last N resolved threads by recency, for cluster analysis input gh api graphql -f owner="$OWNER" -f repo="$REPO" -F pr="$PR_NUMBER" -f query=' query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) { repository(owner: $owner, name: $repo) { pullRequest(number: $pr) { author { login } - reviewThreads(first: 100) { + reviewThreads(first: 50) { edges { node { id @@ -42,7 +45,7 @@ query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) { isOutdated path line - comments(first: 50) { + comments(first: 10) { nodes { id author { login } @@ -75,13 +78,27 @@ query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) { } } } -}' | jq '.data.repository.pullRequest as $pr | { - review_threads: [$pr.reviewThreads.edges[] - | select(.node.isResolved == false and .node.isOutdated == false)], +}' | jq '.data.repository.pullRequest as $pr | + # Unresolved threads (existing behavior, unchanged) + [$pr.reviewThreads.edges[] + | select(.node.isResolved == false and .node.isOutdated == false)] as $unresolved | + # Resolved threads for cross-invocation awareness (last 10 by most recent comment) + [$pr.reviewThreads.edges[] + | select(.node.isResolved == true) + | { thread_id: .node.id, path: .node.path, line: .node.line, + first_comment_body: .node.comments.nodes[0].body, + last_comment_at: ([.node.comments.nodes[].createdAt] | sort | last) }] + | sort_by(.last_comment_at) | .[-10:] | reverse as $resolved | +{ + review_threads: $unresolved, pr_comments: [$pr.comments.nodes[] | select(.author.login != $pr.author.login) | select(.body | test("^\\s*$") | not)], review_bodies: [$pr.reviews.nodes[] | select(.body != null and .body != "") - | select(.author.login != $pr.author.login)] + | select(.author.login != $pr.author.login)], + cross_invocation: { + signal: (($resolved | length) > 0 and ($unresolved | length) > 0), + resolved_threads: $resolved + } }'