fix(ce-resolve-pr-feedback): stop dropping unresolved and actionable feedback (#617)

2026-04-20 20:44:16 -07:00
parent 3ed4a4fa0f
commit 153bea8669
4 changed files with 43 additions and 21 deletions
--- a/plugins/compound-engineering/agents/workflow/ce-pr-comment-resolver.agent.md
+++ b/plugins/compound-engineering/agents/workflow/ce-pr-comment-resolver.agent.md
@@ -32,6 +32,11 @@ Before touching any code, read the referenced file and classify the feedback:
 3. **Is it still relevant?** Has the code at this location changed since the review?
   - NO -> verdict: `not-addressing`

+   **Outdated threads (`isOutdated=true`):** The diff hunk shifted, so the reported line may no longer be where the concern lives. GitHub also exposes `line` as nullable -- outdated and file-level threads often have `line == null`. Start the lookup at whichever location field is available, preferring in order: `line`, `startLine`, `originalLine`, `originalStartLine`. If none resolve to current content matching the reviewer's description, extract an anchor from the comment (a symbol, identifier, or distinctive phrase) and search the **same file** once for it before concluding. Do not search other files. Three outcomes:
+   - Anchor found in the file (here or elsewhere in it) -> re-evaluate at that location using steps 2-4.
+   - Anchor not found and the comment describes concrete in-place code -> verdict: `not-addressing` with evidence ("searched <file> for <anchor>, not present").
+   - Anchor not found and the comment suggests the code was extracted to another file -> verdict: `needs-human`. Do not grep the repo; the reviewer's surrounding context is gone and picking the right new location is a judgment call for the user.
+
 4. **Would fixing improve the code?**
   - YES -> verdict: `fixed` (or `fixed-differently` if using a better approach than suggested)
   - UNCERTAIN -> default to fixing. Agent time is cheap.
--- a/plugins/compound-engineering/skills/ce-resolve-pr-feedback/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-resolve-pr-feedback/SKILL.md
@@ -49,7 +49,7 @@ Returns a JSON object with three keys:

 | Key | Contents | Has file/line? | Resolvable? |
 |-----|----------|---------------|-------------|
-| `review_threads` | Unresolved, non-outdated inline code review threads | Yes | Yes (GraphQL) |
+| `review_threads` | Unresolved inline code review threads (includes outdated; each carries its `isOutdated` flag so the resolver can account for line drift) | Yes | Yes (GraphQL) |
 | `pr_comments` | Top-level PR conversation comments (excludes PR author) | No | No |
 | `review_bodies` | Review submission bodies with non-empty text (excludes PR author) | No | No |

@@ -72,7 +72,7 @@ Before processing, classify each piece of feedback as **new** or **already handl

 The distinction is about content, not who posted what. A deferral from a teammate, a previous skill run, or a manual reply all count. Similarly, actionability is about content -- bot feedback that requests a specific code change is actionable; a bot's boilerplate header wrapping those requests is not.

-**Silent drop.** Non-actionable items are dropped without narration. Do not announce, list, or count dropped items in conversation, the task list, or the step 10 summary. Known review-bot wrappers (CodeRabbit, Codex, Gemini Code Assist, Copilot) and CI/status bot summaries (Codecov) are already filtered at the script level and will not appear; if a similar wrapper slips through, drop it silently and move on.
+**Silent drop.** Non-actionable items are dropped without narration. Do not announce, list, or count dropped items in conversation, the task list, or the step 10 summary. Review-bot wrappers from CodeRabbit, Codex, Gemini Code Assist, and Copilot (bodies like "Here are some automated review suggestions...") commonly appear here -- recognize them by their boilerplate content, drop silently. Only CI/status bot summaries (Codecov) are pre-filtered at the script level; everything else relies on this content-aware check so bot format changes cannot silently hide actionable findings.

 If there are no new items across all feedback types, skip steps 3-9 and go straight to step 10.

@@ -147,10 +147,11 @@ Previously-resolved threads (from `cross_invocation.resolved_threads`) participa

 Each agent receives:
 - The thread ID
- The file path and line number
+- The file path and location fields: `line`, `originalLine`, `startLine`, `originalStartLine` (any can be null; outdated and file-level threads often have `line == null` and must fall back to `originalLine`)
 - The full comment text (all comments in the thread)
 - The PR number (for context)
 - The feedback type (`review_thread`)
+- The `isOutdated` flag from the thread node (tells the agent the reported line may have drifted)

 **For PR comments and review bodies** (`pr_comments`, `review_bodies`): These lack file/line context. Spawn a `workflow:ce-pr-comment-resolver` agent for each actionable non-clustered item. The agent receives the comment ID, body text, PR number, and feedback type (`pr_comment` or `review_body`). The agent must identify the relevant files from the comment text and the PR diff.

@@ -375,7 +376,7 @@ This fetches thread IDs and their first comment IDs (minimal fields, no bodies)

 ### 2. Fix, Reply, Resolve

-Spawn a single `workflow:ce-pr-comment-resolver` agent for the thread. Then follow the same validate -> commit -> push -> reply -> resolve flow as Full Mode steps 6-8.
+Spawn a single `workflow:ce-pr-comment-resolver` agent for the thread. Pass the same fields full mode does, including `isOutdated` and the location fields (`line`, `originalLine`, `startLine`, `originalStartLine`) -- targeted threads can be outdated too and need the same relocation handling. Then follow the same validate -> commit -> push -> reply -> resolve flow as Full Mode steps 6-8.

 ---

--- a/plugins/compound-engineering/skills/ce-resolve-pr-feedback/scripts/get-pr-comments
+++ b/plugins/compound-engineering/skills/ce-resolve-pr-feedback/scripts/get-pr-comments
@@ -33,17 +33,18 @@ fi
 #     signal: true when both resolved and unresolved threads exist (multi-round review)
 #     resolved_threads: last N resolved threads by recency, for cluster analysis input
 #
-# Bot filtering: the bots listed below fall into two categories:
-#   - AI review bots (coderabbitai, codex, gemini, copilot): put actionable
-#     feedback in inline review comments (review_threads). Their top-level PR
-#     comments and review submission bodies are wrappers/summaries with no
-#     actionable asks on their own.
-#   - CI/status bots (codecov): post coverage, build, or deploy summaries. Never
-#     actionable on their own; any follow-up belongs in human review comments.
-# Filtering them at the source keeps the agent focused on real feedback and
-# avoids narrating "ignoring bot wrapper" on every run. Add new bot logins here
-# as they're observed; prefer exact login match over pattern matching to avoid
-# dropping actionable content from general-purpose bots.
+# Bot filtering: only CI/status bots (codecov, etc.) are filtered at the source.
+# Their output is structurally never actionable -- coverage numbers, build
+# summaries, deploy status -- and that holds regardless of format changes.
+# AI review bots (coderabbitai, codex, gemini, copilot) are NOT filtered here.
+# Historically their top-level comments were assumed to always be wrappers, but
+# that turned out to be wrong: Codex sometimes posts actionable findings as
+# top-level PR comments with no inline thread counterpart. Any source-level
+# heuristic to separate wrapper from actionable for these bots is brittle (one
+# bot format change away from silently dropping feedback). SKILL.md step 2
+# has a content-aware actionability check and Silent Drop rule that handles
+# wrappers correctly, so we trust that layer instead. Add new logins to the CI
+# list only if their output is structurally non-actionable like codecov's.
 gh api graphql -f owner="$OWNER" -f repo="$REPO" -F pr="$PR_NUMBER" -f query='
 query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) {
  repository(owner: $owner, name: $repo) {
@@ -57,6 +58,9 @@ query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) {
            isOutdated
            path
            line
+            originalLine
+            startLine
+            originalStartLine
            comments(first: 10) {
              nodes {
                id
@@ -87,11 +91,15 @@ query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) {
    }
  }
 }' | jq '.data.repository.pullRequest as $pr |
-  # Bots whose top-level comments and review bodies are wrappers or CI summaries, not actionable.
-  ["coderabbitai", "chatgpt-codex-connector", "gemini-code-assist", "copilot-pull-request-reviewer", "codecov"] as $review_bot_logins |
-  # Unresolved threads (existing behavior, unchanged)
+  # Structurally non-actionable bot output; always dropped.
+  ["codecov"] as $ci_bot_logins |
+  # Unresolved threads. `isOutdated` means the diff hunk around the comment
+  # has shifted since the thread was opened -- not that the reviewer concern
+  # was addressed. Resolution state is the only authoritative signal; outdated
+  # threads are still surfaced (with their isOutdated flag intact) so the
+  # resolver can factor in that the referenced line may have moved.
  [$pr.reviewThreads.edges[]
-    | select(.node.isResolved == false and .node.isOutdated == false)] as $unresolved |
+    | select(.node.isResolved == false)] as $unresolved |
  # Resolved threads for cross-invocation awareness (last 10 by most recent comment)
  [$pr.reviewThreads.edges[]
    | select(.node.isResolved == true)
@@ -103,12 +111,16 @@ query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) {
  review_threads: $unresolved,
  pr_comments: [$pr.comments.nodes[]
    | select(.author.login != $pr.author.login)
-    | select(.author.login as $l | $review_bot_logins | index($l) | not)
+    | select(
+        .author.login as $l | $ci_bot_logins | index($l) | not
+      )
    | select(.body | test("^\\s*$") | not)],
  review_bodies: [$pr.reviews.nodes[]
    | select(.body != null and .body != "")
    | select(.author.login != $pr.author.login)
-    | select(.author.login as $l | $review_bot_logins | index($l) | not)],
+    | select(
+        .author.login as $l | $ci_bot_logins | index($l) | not
+      )],
  cross_invocation: {
    signal: (($resolved | length) > 0 and ($unresolved | length) > 0),
    resolved_threads: $resolved
--- a/plugins/compound-engineering/skills/ce-resolve-pr-feedback/scripts/get-thread-for-comment
+++ b/plugins/compound-engineering/skills/ce-resolve-pr-feedback/scripts/get-thread-for-comment
@@ -36,8 +36,12 @@ query($owner: String!, $repo: String!, $pr: Int!) {
        nodes {
          id
          isResolved
+          isOutdated
          path
          line
+          originalLine
+          startLine
+          originalStartLine
          comments(first: 100) {
            nodes {
              id