feat(ce-resolve-pr-feedback): drop bot noise, centralize test runs (#610)

This commit is contained in:
Trevin Chow
2026-04-20 00:33:03 -07:00
committed by GitHub
parent c1f68d4d55
commit b35de99788
3 changed files with 47 additions and 17 deletions


@@ -44,7 +44,9 @@ Before touching any code, read the referenced file and classify the feedback:
1. **Read the code** at the referenced file and line. For review threads, the file path and line are provided directly. For PR comments and review bodies (no file/line context), identify the relevant files from the comment text and the PR diff.
2. **Evaluate validity** using the rubric above.
3. **If fixing**: implement the change. Keep it focused -- address the feedback, don't refactor the neighborhood. Write a test when the fix warrants one and none exists.
**Test scope rule.** Run only targeted tests for what you changed: a specific test file, a test pattern, or the test you just wrote. Examples: `bun test path/foo.test.ts`, `pytest tests/module/test_foo.py`, `rspec spec/models/user_spec.rb`. **Never run the full project test suite** (bare `bun test`, `pytest`, `rspec` with no path) -- the parent skill runs it once against the combined diff from all resolvers. Skip targeted tests entirely for pure doc/comment/string-literal edits with no behavioral impact. If you can't locate targeted tests, note it in `reason` and let the combined run catch any issues; do not downgrade your verdict.
4. **Compose the reply text** for the parent to post. Quote the specific sentence or passage being addressed -- not the entire comment if it's long. This helps readers follow the conversation without scrolling.
For fixed items:


@@ -72,7 +72,9 @@ Before processing, classify each piece of feedback as **new** or **already handl
The distinction is about content, not who posted what. A deferral from a teammate, a previous skill run, or a manual reply all count. Similarly, actionability is about content -- bot feedback that requests a specific code change is actionable; a bot's boilerplate header wrapping those requests is not.
**Silent drop.** Non-actionable items are dropped without narration. Do not announce, list, or count dropped items in conversation, the task list, or the step 10 summary. Known review-bot wrappers (CodeRabbit, Codex, Gemini Code Assist, Copilot) and CI/status bot summaries (Codecov) are already filtered at the script level and will not appear; if a similar wrapper slips through, drop it silently and move on.
If there are no new items across all feedback types, skip steps 3-9 and go straight to step 10.
### 3. Cluster Analysis (Gated)
@@ -190,13 +192,25 @@ Verdict meanings:
**Sequential fallback**: Platforms that do not support parallel dispatch should run agents sequentially. Dispatch cluster units first (they are higher-leverage), then individual items.
Fixes can occasionally expand beyond their referenced file (e.g., renaming a method updates callers elsewhere). This is rare but can cause parallel agents to collide. Step 6 (combined validation) catches test breakage; step 9 (verify) catches unresolved threads. If either surfaces inconsistent changes from parallel fixes, re-run the affected agents sequentially.
### 6. Validate Combined State
After all agents complete, aggregate `files_changed` across every returned summary (individual and cluster alike). If it's empty -- all verdicts are `replied`, `not-addressing`, or `needs-human` -- skip steps 6 and 7 entirely and proceed to step 8.
Resolvers run only targeted tests on their own changes. This step runs the project's full validation **once** against the combined diff to catch cross-agent interactions that targeted runs can't see.
1. **Run the project's validation command** (test suite, type check, or whatever the repo's AGENTS.md/CLAUDE.md specifies). Run once, not per-agent.
2. **Green** -> proceed to step 7.
3. **Red, failures touch files resolvers changed** -> one inline diagnose-and-fix pass. Re-run validation. If still red, escalate with a `needs-human` item containing the test output; do **not** commit.
4. **Red, failures touch only files no resolver changed** -> treat as pre-existing. Proceed to step 7, but add a footer to the commit message: `Note: pre-existing failure in <test> not addressed by this PR.`
Record the validation outcome (command run, pass/fail counts, any pre-existing failures noted) for the step 10 summary.
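The branching above can be sketched as a small shell fragment. This is a minimal sketch, not part of the skill itself: `run_validation` stands in for the repo's real validation command (e.g. `bun test`), and `CHANGED_FILES` is a hypothetical aggregate of resolver-reported files.

```shell
# Sketch of the step-6 gate. run_validation is a stand-in for the repo's
# real validation command; CHANGED_FILES is a hypothetical aggregate of
# the files_changed lists returned by resolver sub-agents.
run_validation() { true; }              # replace with e.g. `bun test`
CHANGED_FILES="src/foo.ts src/bar.ts"   # empty when no resolver changed code

if [ -z "$CHANGED_FILES" ]; then
  decision="skip"       # no code changes: skip steps 6-7, go to step 8
elif run_validation; then
  decision="commit"     # green: proceed to step 7
else
  decision="diagnose"   # red: one fix pass, or note pre-existing failures
fi
echo "$decision"
```

With the stubbed command returning success and a non-empty file list, the sketch prints `commit`.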
### 7. Commit and Push
1. Stage only files reported by sub-agents and commit with a message referencing the PR:
@@ -212,7 +226,7 @@ git commit -m "Address PR review feedback (#PR_NUMBER)
git push
```
### 8. Reply and Resolve
After the push succeeds, post replies and resolve where applicable. The mechanism depends on the feedback type.
@@ -258,7 +272,7 @@ gh pr comment PR_NUMBER --body "REPLY_TEXT"
Include enough quoted context in the reply so the reader can follow which comment is being addressed without scrolling.
### 9. Verify
Re-fetch feedback to confirm resolution:
@@ -276,7 +290,7 @@ The `review_threads` array should be empty (except `needs-human` items).
PR comments and review bodies have no resolve mechanism, so they will still appear in the output. Verify they were replied to by checking the PR conversation.
### 10. Summary
Present a concise summary of all work done. Group by verdict, one line per item describing *what was done* not just *where*. This is the primary output the user sees.
@@ -289,6 +303,8 @@ Fixed (count): [brief description of each fix]
Fixed differently (count): [what was changed and why the approach differed]
Replied (count): [what questions were answered]
Not addressing (count): [what was skipped and why]
Validation: [one line -- e.g., "bun test passed (893/893)" or "bun test passed with pre-existing failure in X noted"; omit when no code changes were committed]
```
If any clusters were investigated, append a cluster investigation section:
@@ -359,7 +375,7 @@ This fetches thread IDs and their first comment IDs (minimal fields, no bodies)
### 2. Fix, Reply, Resolve
Spawn a single `workflow:ce-pr-comment-resolver` agent for the thread. Then follow the same validate -> commit -> push -> reply -> resolve flow as Full Mode steps 6-8.
---


@@ -27,11 +27,23 @@ fi
# Fetch review threads, regular PR comments, and review bodies in one query.
# Output is a JSON object with four keys:
# review_threads - unresolved, non-outdated inline code review threads
# pr_comments - top-level PR conversation comments (excludes PR author and known review bots)
# review_bodies - review submissions with non-empty body text (excludes PR author and known review bots)
# cross_invocation - cross-invocation awareness envelope:
# signal: true when both resolved and unresolved threads exist (multi-round review)
# resolved_threads: last N resolved threads by recency, for cluster analysis input
#
# Bot filtering: the bots listed below fall into two categories:
# - AI review bots (coderabbitai, codex, gemini, copilot): put actionable
# feedback in inline review comments (review_threads). Their top-level PR
# comments and review submission bodies are wrappers/summaries with no
# actionable asks on their own.
# - CI/status bots (codecov): post coverage, build, or deploy summaries. Never
# actionable on their own; any follow-up belongs in human review comments.
# Filtering them at the source keeps the agent focused on real feedback and
# avoids narrating "ignoring bot wrapper" on every run. Add new bot logins here
# as they're observed; prefer exact login match over pattern matching to avoid
# dropping actionable content from general-purpose bots.
gh api graphql -f owner="$OWNER" -f repo="$REPO" -F pr="$PR_NUMBER" -f query='
query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) {
repository(owner: $owner, name: $repo) {
@@ -62,8 +74,6 @@ query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) {
id
author { login }
body
createdAt
url
}
}
reviews(first: 50) {
@@ -72,13 +82,13 @@ query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) {
author { login }
body
state
createdAt
url
}
}
}
}
}' | jq '.data.repository.pullRequest as $pr |
# Bots whose top-level comments and review bodies are wrappers or CI summaries, not actionable.
["coderabbitai", "chatgpt-codex-connector", "gemini-code-assist", "copilot-pull-request-reviewer", "codecov"] as $review_bot_logins |
# Unresolved threads (existing behavior, unchanged)
[$pr.reviewThreads.edges[]
| select(.node.isResolved == false and .node.isOutdated == false)] as $unresolved |
@@ -93,10 +103,12 @@ query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) {
review_threads: $unresolved,
pr_comments: [$pr.comments.nodes[]
| select(.author.login != $pr.author.login)
| select(.author.login as $l | $review_bot_logins | index($l) | not)
| select(.body | test("^\\s*$") | not)],
review_bodies: [$pr.reviews.nodes[]
| select(.body != null and .body != "")
| select(.author.login != $pr.author.login)
| select(.author.login as $l | $review_bot_logins | index($l) | not)],
cross_invocation: {
signal: (($resolved | length) > 0 and ($unresolved | length) > 0),
resolved_threads: $resolved