fix(ce-resolve-pr-feedback): stop dropping unresolved and actionable feedback (#617)
Some checks failed
CI / pr-title (push) Has been cancelled
CI / test (push) Has been cancelled
Release PR / release-pr (push) Has been cancelled
Release PR / publish-cli (push) Has been cancelled

This commit is contained in:
Trevin Chow
2026-04-20 20:44:16 -07:00
committed by GitHub
parent 3ed4a4fa0f
commit 153bea8669
4 changed files with 43 additions and 21 deletions

View File

@@ -32,6 +32,11 @@ Before touching any code, read the referenced file and classify the feedback:
3. **Is it still relevant?** Has the code at this location changed since the review?
- NO -> verdict: `not-addressing`
**Outdated threads (`isOutdated=true`):** The diff hunk shifted, so the reported line may no longer be where the concern lives. GitHub also exposes `line` as nullable -- outdated and file-level threads often have `line == null`. Start the lookup at whichever location field is available, preferring in order: `line`, `startLine`, `originalLine`, `originalStartLine`. If none resolve to current content matching the reviewer's description, extract an anchor from the comment (a symbol, identifier, or distinctive phrase) and search the **same file** once for it before concluding. Do not search other files. Three outcomes:
- Anchor found in the file (here or elsewhere in it) -> re-evaluate at that location using steps 2-4.
- Anchor not found and the comment describes concrete in-place code -> verdict: `not-addressing` with evidence ("searched <file> for <anchor>, not present").
- Anchor not found and the comment suggests the code was extracted to another file -> verdict: `needs-human`. Do not grep the repo; the reviewer's surrounding context is gone and picking the right new location is a judgment call for the user.
4. **Would fixing improve the code?**
- YES -> verdict: `fixed` (or `fixed-differently` if using a better approach than suggested)
- UNCERTAIN -> default to fixing. Agent time is cheap.

View File

@@ -49,7 +49,7 @@ Returns a JSON object with three keys:
| Key | Contents | Has file/line? | Resolvable? |
|-----|----------|---------------|-------------|
| `review_threads` | Unresolved, non-outdated inline code review threads | Yes | Yes (GraphQL) |
| `review_threads` | Unresolved inline code review threads (includes outdated; each carries its `isOutdated` flag so the resolver can account for line drift) | Yes | Yes (GraphQL) |
| `pr_comments` | Top-level PR conversation comments (excludes PR author) | No | No |
| `review_bodies` | Review submission bodies with non-empty text (excludes PR author) | No | No |
@@ -72,7 +72,7 @@ Before processing, classify each piece of feedback as **new** or **already handl
The distinction is about content, not who posted what. A deferral from a teammate, a previous skill run, or a manual reply all count. Similarly, actionability is about content -- bot feedback that requests a specific code change is actionable; a bot's boilerplate header wrapping those requests is not.
**Silent drop.** Non-actionable items are dropped without narration. Do not announce, list, or count dropped items in conversation, the task list, or the step 10 summary. Known review-bot wrappers (CodeRabbit, Codex, Gemini Code Assist, Copilot) and CI/status bot summaries (Codecov) are already filtered at the script level and will not appear; if a similar wrapper slips through, drop it silently and move on.
**Silent drop.** Non-actionable items are dropped without narration. Do not announce, list, or count dropped items in conversation, the task list, or the step 10 summary. Review-bot wrappers from CodeRabbit, Codex, Gemini Code Assist, and Copilot (bodies like "Here are some automated review suggestions...") commonly appear here -- recognize them by their boilerplate content, drop silently. Only CI/status bot summaries (Codecov) are pre-filtered at the script level; everything else relies on this content-aware check so bot format changes cannot silently hide actionable findings.
If there are no new items across all feedback types, skip steps 3-9 and go straight to step 10.
@@ -147,10 +147,11 @@ Previously-resolved threads (from `cross_invocation.resolved_threads`) participa
Each agent receives:
- The thread ID
- The file path and line number
- The file path and location fields: `line`, `originalLine`, `startLine`, `originalStartLine` (any can be null; outdated and file-level threads often have `line == null` and must fall back to `originalLine`)
- The full comment text (all comments in the thread)
- The PR number (for context)
- The feedback type (`review_thread`)
- The `isOutdated` flag from the thread node (tells the agent the reported line may have drifted)
**For PR comments and review bodies** (`pr_comments`, `review_bodies`): These lack file/line context. Spawn a `workflow:ce-pr-comment-resolver` agent for each actionable non-clustered item. The agent receives the comment ID, body text, PR number, and feedback type (`pr_comment` or `review_body`). The agent must identify the relevant files from the comment text and the PR diff.
@@ -375,7 +376,7 @@ This fetches thread IDs and their first comment IDs (minimal fields, no bodies)
### 2. Fix, Reply, Resolve
Spawn a single `workflow:ce-pr-comment-resolver` agent for the thread. Then follow the same validate -> commit -> push -> reply -> resolve flow as Full Mode steps 6-8.
Spawn a single `workflow:ce-pr-comment-resolver` agent for the thread. Pass the same fields full mode does, including `isOutdated` and the location fields (`line`, `originalLine`, `startLine`, `originalStartLine`) -- targeted threads can be outdated too and need the same relocation handling. Then follow the same validate -> commit -> push -> reply -> resolve flow as Full Mode steps 6-8.
---

View File

@@ -33,17 +33,18 @@ fi
# signal: true when both resolved and unresolved threads exist (multi-round review)
# resolved_threads: last N resolved threads by recency, for cluster analysis input
#
# Bot filtering: the bots listed below fall into two categories:
# - AI review bots (coderabbitai, codex, gemini, copilot): put actionable
# feedback in inline review comments (review_threads). Their top-level PR
# comments and review submission bodies are wrappers/summaries with no
# actionable asks on their own.
# - CI/status bots (codecov): post coverage, build, or deploy summaries. Never
# actionable on their own; any follow-up belongs in human review comments.
# Filtering them at the source keeps the agent focused on real feedback and
# avoids narrating "ignoring bot wrapper" on every run. Add new bot logins here
# as they're observed; prefer exact login match over pattern matching to avoid
# dropping actionable content from general-purpose bots.
# Bot filtering: only CI/status bots (codecov, etc.) are filtered at the source.
# Their output is structurally never actionable -- coverage numbers, build
# summaries, deploy status -- and that holds regardless of format changes.
# AI review bots (coderabbitai, codex, gemini, copilot) are NOT filtered here.
# Historically their top-level comments were assumed to always be wrappers, but
# that turned out to be wrong: Codex sometimes posts actionable findings as
# top-level PR comments with no inline thread counterpart. Any source-level
# heuristic to separate wrapper from actionable for these bots is brittle (one
# bot format change away from silently dropping feedback). SKILL.md step 2
# has a content-aware actionability check and Silent Drop rule that handles
# wrappers correctly, so we trust that layer instead. Add new logins to the CI
# list only if their output is structurally non-actionable like codecov's.
gh api graphql -f owner="$OWNER" -f repo="$REPO" -F pr="$PR_NUMBER" -f query='
query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) {
repository(owner: $owner, name: $repo) {
@@ -57,6 +58,9 @@ query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) {
isOutdated
path
line
originalLine
startLine
originalStartLine
comments(first: 10) {
nodes {
id
@@ -87,11 +91,15 @@ query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) {
}
}
}' | jq '.data.repository.pullRequest as $pr |
# Bots whose top-level comments and review bodies are wrappers or CI summaries, not actionable.
["coderabbitai", "chatgpt-codex-connector", "gemini-code-assist", "copilot-pull-request-reviewer", "codecov"] as $review_bot_logins |
# Unresolved threads (existing behavior, unchanged)
# Structurally non-actionable bot output; always dropped.
["codecov"] as $ci_bot_logins |
# Unresolved threads. `isOutdated` means the diff hunk around the comment
# has shifted since the thread was opened -- not that the reviewer concern
# was addressed. Resolution state is the only authoritative signal; outdated
# threads are still surfaced (with their isOutdated flag intact) so the
# resolver can factor in that the referenced line may have moved.
[$pr.reviewThreads.edges[]
| select(.node.isResolved == false and .node.isOutdated == false)] as $unresolved |
| select(.node.isResolved == false)] as $unresolved |
# Resolved threads for cross-invocation awareness (last 10 by most recent comment)
[$pr.reviewThreads.edges[]
| select(.node.isResolved == true)
@@ -103,12 +111,16 @@ query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) {
review_threads: $unresolved,
pr_comments: [$pr.comments.nodes[]
| select(.author.login != $pr.author.login)
| select(.author.login as $l | $review_bot_logins | index($l) | not)
| select(
.author.login as $l | $ci_bot_logins | index($l) | not
)
| select(.body | test("^\\s*$") | not)],
review_bodies: [$pr.reviews.nodes[]
| select(.body != null and .body != "")
| select(.author.login != $pr.author.login)
| select(.author.login as $l | $review_bot_logins | index($l) | not)],
| select(
.author.login as $l | $ci_bot_logins | index($l) | not
)],
cross_invocation: {
signal: (($resolved | length) > 0 and ($unresolved | length) > 0),
resolved_threads: $resolved

View File

@@ -36,8 +36,12 @@ query($owner: String!, $repo: String!, $pr: Int!) {
nodes {
id
isResolved
isOutdated
path
line
originalLine
startLine
originalStartLine
comments(first: 100) {
nodes {
id