From 153bea8669d63848f57942e842cd58ed664e7435 Mon Sep 17 00:00:00 2001 From: Trevin Chow Date: Mon, 20 Apr 2026 20:44:16 -0700 Subject: [PATCH] fix(ce-resolve-pr-feedback): stop dropping unresolved and actionable feedback (#617) --- .../workflow/ce-pr-comment-resolver.agent.md | 5 ++ .../skills/ce-resolve-pr-feedback/SKILL.md | 9 ++-- .../scripts/get-pr-comments | 46 ++++++++++++------- .../scripts/get-thread-for-comment | 4 ++ 4 files changed, 43 insertions(+), 21 deletions(-) diff --git a/plugins/compound-engineering/agents/workflow/ce-pr-comment-resolver.agent.md b/plugins/compound-engineering/agents/workflow/ce-pr-comment-resolver.agent.md index 0f46dd7..d385c78 100644 --- a/plugins/compound-engineering/agents/workflow/ce-pr-comment-resolver.agent.md +++ b/plugins/compound-engineering/agents/workflow/ce-pr-comment-resolver.agent.md @@ -32,6 +32,11 @@ Before touching any code, read the referenced file and classify the feedback: 3. **Is it still relevant?** Has the code at this location changed since the review? - NO -> verdict: `not-addressing` + **Outdated threads (`isOutdated=true`):** The diff hunk shifted, so the reported line may no longer be where the concern lives. GitHub also exposes `line` as nullable -- outdated and file-level threads often have `line == null`. Start the lookup at whichever location field is available, preferring in order: `line`, `startLine`, `originalLine`, `originalStartLine`. If none resolve to current content matching the reviewer's description, extract an anchor from the comment (a symbol, identifier, or distinctive phrase) and search the **same file** once for it before concluding. Do not search other files. Three outcomes: + - Anchor found in the file (here or elsewhere in it) -> re-evaluate at that location using steps 2-4. + - Anchor not found and the comment describes concrete in-place code -> verdict: `not-addressing` with evidence ("searched for , not present"). + - Anchor not found and the comment suggests the code was extracted to another file -> verdict: `needs-human`. Do not grep the repo; the reviewer's surrounding context is gone and picking the right new location is a judgment call for the user. + 4. **Would fixing improve the code?** - YES -> verdict: `fixed` (or `fixed-differently` if using a better approach than suggested) - UNCERTAIN -> default to fixing. Agent time is cheap. diff --git a/plugins/compound-engineering/skills/ce-resolve-pr-feedback/SKILL.md b/plugins/compound-engineering/skills/ce-resolve-pr-feedback/SKILL.md index 4b7d100..98c8bd6 100644 --- a/plugins/compound-engineering/skills/ce-resolve-pr-feedback/SKILL.md +++ b/plugins/compound-engineering/skills/ce-resolve-pr-feedback/SKILL.md @@ -49,7 +49,7 @@ Returns a JSON object with three keys: | Key | Contents | Has file/line? | Resolvable? | |-----|----------|---------------|-------------| -| `review_threads` | Unresolved, non-outdated inline code review threads | Yes | Yes (GraphQL) | +| `review_threads` | Unresolved inline code review threads (includes outdated; each carries its `isOutdated` flag so the resolver can account for line drift) | Yes | Yes (GraphQL) | | `pr_comments` | Top-level PR conversation comments (excludes PR author) | No | No | | `review_bodies` | Review submission bodies with non-empty text (excludes PR author) | No | No | @@ -72,7 +72,7 @@ Before processing, classify each piece of feedback as **new** or **already handl The distinction is about content, not who posted what. A deferral from a teammate, a previous skill run, or a manual reply all count. Similarly, actionability is about content -- bot feedback that requests a specific code change is actionable; a bot's boilerplate header wrapping those requests is not. -**Silent drop.** Non-actionable items are dropped without narration. Do not announce, list, or count dropped items in conversation, the task list, or the step 10 summary. Known review-bot wrappers (CodeRabbit, Codex, Gemini Code Assist, Copilot) and CI/status bot summaries (Codecov) are already filtered at the script level and will not appear; if a similar wrapper slips through, drop it silently and move on. +**Silent drop.** Non-actionable items are dropped without narration. Do not announce, list, or count dropped items in conversation, the task list, or the step 10 summary. Review-bot wrappers from CodeRabbit, Codex, Gemini Code Assist, and Copilot (bodies like "Here are some automated review suggestions...") commonly appear here -- recognize them by their boilerplate content, drop silently. Only CI/status bot summaries (Codecov) are pre-filtered at the script level; everything else relies on this content-aware check so bot format changes cannot silently hide actionable findings. If there are no new items across all feedback types, skip steps 3-9 and go straight to step 10. @@ -147,10 +147,11 @@ Previously-resolved threads (from `cross_invocation.resolved_threads`) participa Each agent receives: - The thread ID -- The file path and line number +- The file path and location fields: `line`, `originalLine`, `startLine`, `originalStartLine` (any can be null; outdated and file-level threads often have `line == null` and must fall back to `originalLine`) - The full comment text (all comments in the thread) - The PR number (for context) - The feedback type (`review_thread`) +- The `isOutdated` flag from the thread node (tells the agent the reported line may have drifted) **For PR comments and review bodies** (`pr_comments`, `review_bodies`): These lack file/line context. Spawn a `workflow:ce-pr-comment-resolver` agent for each actionable non-clustered item. The agent receives the comment ID, body text, PR number, and feedback type (`pr_comment` or `review_body`). The agent must identify the relevant files from the comment text and the PR diff. @@ -375,7 +376,7 @@ This fetches thread IDs and their first comment IDs (minimal fields, no bodies) ### 2. Fix, Reply, Resolve -Spawn a single `workflow:ce-pr-comment-resolver` agent for the thread. Then follow the same validate -> commit -> push -> reply -> resolve flow as Full Mode steps 6-8. +Spawn a single `workflow:ce-pr-comment-resolver` agent for the thread. Pass the same fields full mode does, including `isOutdated` and the location fields (`line`, `originalLine`, `startLine`, `originalStartLine`) -- targeted threads can be outdated too and need the same relocation handling. Then follow the same validate -> commit -> push -> reply -> resolve flow as Full Mode steps 6-8. --- diff --git a/plugins/compound-engineering/skills/ce-resolve-pr-feedback/scripts/get-pr-comments b/plugins/compound-engineering/skills/ce-resolve-pr-feedback/scripts/get-pr-comments index 5dfab1d..1aa15ac 100755 --- a/plugins/compound-engineering/skills/ce-resolve-pr-feedback/scripts/get-pr-comments +++ b/plugins/compound-engineering/skills/ce-resolve-pr-feedback/scripts/get-pr-comments @@ -33,17 +33,18 @@ fi # signal: true when both resolved and unresolved threads exist (multi-round review) # resolved_threads: last N resolved threads by recency, for cluster analysis input # -# Bot filtering: the bots listed below fall into two categories: -# - AI review bots (coderabbitai, codex, gemini, copilot): put actionable -# feedback in inline review comments (review_threads). Their top-level PR -# comments and review submission bodies are wrappers/summaries with no -# actionable asks on their own. -# - CI/status bots (codecov): post coverage, build, or deploy summaries. Never -# actionable on their own; any follow-up belongs in human review comments. -# Filtering them at the source keeps the agent focused on real feedback and -# avoids narrating "ignoring bot wrapper" on every run. Add new bot logins here -# as they're observed; prefer exact login match over pattern matching to avoid -# dropping actionable content from general-purpose bots. +# Bot filtering: only CI/status bots (codecov, etc.) are filtered at the source. +# Their output is structurally never actionable -- coverage numbers, build +# summaries, deploy status -- and that holds regardless of format changes. +# AI review bots (coderabbitai, codex, gemini, copilot) are NOT filtered here. +# Historically their top-level comments were assumed to always be wrappers, but +# that turned out to be wrong: Codex sometimes posts actionable findings as +# top-level PR comments with no inline thread counterpart. Any source-level +# heuristic to separate wrapper from actionable for these bots is brittle (one +# bot format change away from silently dropping feedback). SKILL.md step 2 +# has a content-aware actionability check and Silent Drop rule that handles +# wrappers correctly, so we trust that layer instead. Add new logins to the CI +# list only if their output is structurally non-actionable like codecov's. gh api graphql -f owner="$OWNER" -f repo="$REPO" -F pr="$PR_NUMBER" -f query=' query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) { repository(owner: $owner, name: $repo) { @@ -57,6 +58,9 @@ query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) { isOutdated path line + originalLine + startLine + originalStartLine comments(first: 10) { nodes { id @@ -87,11 +91,15 @@ query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) { } } }' | jq '.data.repository.pullRequest as $pr | - # Bots whose top-level comments and review bodies are wrappers or CI summaries, not actionable. - ["coderabbitai", "chatgpt-codex-connector", "gemini-code-assist", "copilot-pull-request-reviewer", "codecov"] as $review_bot_logins | - # Unresolved threads (existing behavior, unchanged) + # Structurally non-actionable bot output; always dropped. + ["codecov"] as $ci_bot_logins | + # Unresolved threads. `isOutdated` means the diff hunk around the comment + # has shifted since the thread was opened -- not that the reviewer concern + # was addressed. Resolution state is the only authoritative signal; outdated + # threads are still surfaced (with their isOutdated flag intact) so the + # resolver can factor in that the referenced line may have moved. [$pr.reviewThreads.edges[] - | select(.node.isResolved == false and .node.isOutdated == false)] as $unresolved | + | select(.node.isResolved == false)] as $unresolved | # Resolved threads for cross-invocation awareness (last 10 by most recent comment) [$pr.reviewThreads.edges[] | select(.node.isResolved == true) @@ -103,12 +111,16 @@ query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) { review_threads: $unresolved, pr_comments: [$pr.comments.nodes[] | select(.author.login != $pr.author.login) - | select(.author.login as $l | $review_bot_logins | index($l) | not) + | select( + .author.login as $l | $ci_bot_logins | index($l) | not + ) | select(.body | test("^\\s*$") | not)], review_bodies: [$pr.reviews.nodes[] | select(.body != null and .body != "") | select(.author.login != $pr.author.login) - | select(.author.login as $l | $review_bot_logins | index($l) | not)], + | select( + .author.login as $l | $ci_bot_logins | index($l) | not + )], cross_invocation: { signal: (($resolved | length) > 0 and ($unresolved | length) > 0), resolved_threads: $resolved diff --git a/plugins/compound-engineering/skills/ce-resolve-pr-feedback/scripts/get-thread-for-comment b/plugins/compound-engineering/skills/ce-resolve-pr-feedback/scripts/get-thread-for-comment index 56c4564..7f45f3e 100755 --- a/plugins/compound-engineering/skills/ce-resolve-pr-feedback/scripts/get-thread-for-comment +++ b/plugins/compound-engineering/skills/ce-resolve-pr-feedback/scripts/get-thread-for-comment @@ -36,8 +36,12 @@ query($owner: String!, $repo: String!, $pr: Int!) { nodes { id isResolved + isOutdated path line + originalLine + startLine + originalStartLine comments(first: 100) { nodes { id