From b35de997884e9d6cf69ef19c983d9e61cf9e4bd8 Mon Sep 17 00:00:00 2001 From: Trevin Chow Date: Mon, 20 Apr 2026 00:33:03 -0700 Subject: [PATCH] feat(ce-resolve-pr-feedback): drop bot noise, centralize test runs (#610) --- .../workflow/ce-pr-comment-resolver.agent.md | 4 ++- .../skills/ce-resolve-pr-feedback/SKILL.md | 34 ++++++++++++++----- .../scripts/get-pr-comments | 26 ++++++++++---- 3 files changed, 47 insertions(+), 17 deletions(-) diff --git a/plugins/compound-engineering/agents/workflow/ce-pr-comment-resolver.agent.md b/plugins/compound-engineering/agents/workflow/ce-pr-comment-resolver.agent.md index 6cb7eb0..27f6fc3 100644 --- a/plugins/compound-engineering/agents/workflow/ce-pr-comment-resolver.agent.md +++ b/plugins/compound-engineering/agents/workflow/ce-pr-comment-resolver.agent.md @@ -44,7 +44,9 @@ Before touching any code, read the referenced file and classify the feedback: 1. **Read the code** at the referenced file and line. For review threads, the file path and line are provided directly. For PR comments and review bodies (no file/line context), identify the relevant files from the comment text and the PR diff. 2. **Evaluate validity** using the rubric above. -3. **If fixing**: implement the change. Keep it focused -- address the feedback, don't refactor the neighborhood. Verify the change doesn't break the immediate logic. +3. **If fixing**: implement the change. Keep it focused -- address the feedback, don't refactor the neighborhood. Write a test when the fix warrants one and none exists. + + **Test scope rule.** Run only targeted tests for what you changed: a specific test file, a test pattern, or the test you just wrote. Examples: `bun test path/foo.test.ts`, `pytest tests/module/test_foo.py`, `rspec spec/models/user_spec.rb`. **Never run the full project test suite** (bare `bun test`, `pytest`, `rspec` with no path) -- the parent skill runs it once against the combined diff from all resolvers. Skip targeted tests entirely for pure doc/comment/string-literal edits with no behavioral impact. If you can't locate targeted tests, note it in `reason` and let the combined run catch any issues; do not downgrade your verdict. 4. **Compose the reply text** for the parent to post. Quote the specific sentence or passage being addressed -- not the entire comment if it's long. This helps readers follow the conversation without scrolling. For fixed items: diff --git a/plugins/compound-engineering/skills/ce-resolve-pr-feedback/SKILL.md b/plugins/compound-engineering/skills/ce-resolve-pr-feedback/SKILL.md index 4b0bbad..92aa94f 100644 --- a/plugins/compound-engineering/skills/ce-resolve-pr-feedback/SKILL.md +++ b/plugins/compound-engineering/skills/ce-resolve-pr-feedback/SKILL.md @@ -72,7 +72,9 @@ Before processing, classify each piece of feedback as **new** or **already handl The distinction is about content, not who posted what. A deferral from a teammate, a previous skill run, or a manual reply all count. Similarly, actionability is about content -- bot feedback that requests a specific code change is actionable; a bot's boilerplate header wrapping those requests is not. -If there are no new items across all feedback types, skip steps 3-8 and go straight to step 9. +**Silent drop.** Non-actionable items are dropped without narration. Do not announce, list, or count dropped items in conversation, the task list, or the step 10 summary. Known review-bot wrappers (CodeRabbit, Codex, Gemini Code Assist, Copilot) and CI/status bot summaries (Codecov) are already filtered at the script level and will not appear; if a similar wrapper slips through, drop it silently and move on. + +If there are no new items across all feedback types, skip steps 3-9 and go straight to step 10. ### 3. Cluster Analysis (Gated) @@ -190,13 +192,25 @@ Verdict meanings: **Sequential fallback**: Platforms that do not support parallel dispatch should run agents sequentially. Dispatch cluster units first (they are higher-leverage), then individual items. -Fixes can occasionally expand beyond their referenced file (e.g., renaming a method updates callers elsewhere). This is rare but can cause parallel agents to collide. The verification step (step 8) catches this -- if re-fetching shows unresolved threads or if the commit reveals inconsistent changes, re-run the affected agents sequentially. +Fixes can occasionally expand beyond their referenced file (e.g., renaming a method updates callers elsewhere). This is rare but can cause parallel agents to collide. Step 6 (combined validation) catches test breakage; step 9 (verify) catches unresolved threads. If either surfaces inconsistent changes from parallel fixes, re-run the affected agents sequentially. -### 6. Commit and Push +### 6. Validate Combined State -After all agents complete, check whether any files were actually changed. If all verdicts are `replied`, `not-addressing`, or `needs-human` (no code changes), skip this step entirely and proceed to step 7. +After all agents complete, aggregate `files_changed` across every returned summary (individual and cluster alike). If it's empty -- all verdicts are `replied`, `not-addressing`, or `needs-human` -- skip steps 6 and 7 entirely and proceed to step 8. -If there are file changes: +Resolvers run only targeted tests on their own changes. This step runs the project's full validation **once** against the combined diff to catch cross-agent interactions that targeted runs can't see. + +1. **Run the project's validation command** (test suite, type check, or whatever the repo's AGENTS.md/CLAUDE.md specifies). Run once, not per-agent. + +2. **Green** -> proceed to step 7. + +3. **Red, failures touch files resolvers changed** -> one inline diagnose-and-fix pass. Re-run validation. If still red, escalate with a `needs-human` item containing the test output; do **not** commit. + +4. **Red, failures touch only files no resolver changed** -> treat as pre-existing. Proceed to step 7, but add a footer to the commit message: `Note: pre-existing failure in not addressed by this PR.` + +Record the validation outcome (command run, pass/fail counts, any pre-existing failures noted) for the step 10 summary. + +### 7. Commit and Push 1. Stage only files reported by sub-agents and commit with a message referencing the PR: @@ -212,7 +226,7 @@ git commit -m "Address PR review feedback (#PR_NUMBER) git push ``` -### 7. Reply and Resolve +### 8. Reply and Resolve After the push succeeds, post replies and resolve where applicable. The mechanism depends on the feedback type. @@ -258,7 +272,7 @@ gh pr comment PR_NUMBER --body "REPLY_TEXT" Include enough quoted context in the reply so the reader can follow which comment is being addressed without scrolling. -### 8. Verify +### 9. Verify Re-fetch feedback to confirm resolution: @@ -276,7 +290,7 @@ The `review_threads` array should be empty (except `needs-human` items). PR comments and review bodies have no resolve mechanism, so they will still appear in the output. Verify they were replied to by checking the PR conversation. -### 9. Summary +### 10. Summary Present a concise summary of all work done. Group by verdict, one line per item describing *what was done* not just *where*. This is the primary output the user sees. @@ -289,6 +303,8 @@ Fixed (count): [brief description of each fix] Fixed differently (count): [what was changed and why the approach differed] Replied (count): [what questions were answered] Not addressing (count): [what was skipped and why] + +Validation: [one line -- e.g., "bun test passed (893/893)" or "bun test passed with pre-existing failure in X noted"; omit when no code changes were committed] ``` If any clusters were investigated, append a cluster investigation section: @@ -359,7 +375,7 @@ This fetches thread IDs and their first comment IDs (minimal fields, no bodies) ### 2. Fix, Reply, Resolve -Spawn a single `workflow:ce-pr-comment-resolver` agent for the thread. Then follow the same commit -> push -> reply -> resolve flow as Full Mode steps 6-7. +Spawn a single `workflow:ce-pr-comment-resolver` agent for the thread. Then follow the same validate -> commit -> push -> reply -> resolve flow as Full Mode steps 6-8. --- diff --git a/plugins/compound-engineering/skills/ce-resolve-pr-feedback/scripts/get-pr-comments b/plugins/compound-engineering/skills/ce-resolve-pr-feedback/scripts/get-pr-comments index 1e9267b..5dfab1d 100755 --- a/plugins/compound-engineering/skills/ce-resolve-pr-feedback/scripts/get-pr-comments +++ b/plugins/compound-engineering/skills/ce-resolve-pr-feedback/scripts/get-pr-comments @@ -27,11 +27,23 @@ fi # Fetch review threads, regular PR comments, and review bodies in one query. # Output is a JSON object with four keys: # review_threads - unresolved, non-outdated inline code review threads -# pr_comments - top-level PR conversation comments (excludes PR author) -# review_bodies - review submissions with non-empty body text (excludes PR author) +# pr_comments - top-level PR conversation comments (excludes PR author and known review bots) +# review_bodies - review submissions with non-empty body text (excludes PR author and known review bots) # cross_invocation - cross-invocation awareness envelope: # signal: true when both resolved and unresolved threads exist (multi-round review) # resolved_threads: last N resolved threads by recency, for cluster analysis input +# +# Bot filtering: the bots listed below fall into two categories: +# - AI review bots (coderabbitai, codex, gemini, copilot): put actionable +# feedback in inline review comments (review_threads). Their top-level PR +# comments and review submission bodies are wrappers/summaries with no +# actionable asks on their own. +# - CI/status bots (codecov): post coverage, build, or deploy summaries. Never +# actionable on their own; any follow-up belongs in human review comments. +# Filtering them at the source keeps the agent focused on real feedback and +# avoids narrating "ignoring bot wrapper" on every run. Add new bot logins here +# as they're observed; prefer exact login match over pattern matching to avoid +# dropping actionable content from general-purpose bots. gh api graphql -f owner="$OWNER" -f repo="$REPO" -F pr="$PR_NUMBER" -f query=' query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) { repository(owner: $owner, name: $repo) { @@ -62,8 +74,6 @@ query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) { id author { login } body - createdAt - url } } reviews(first: 50) { @@ -72,13 +82,13 @@ query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) { author { login } body state - createdAt - url } } } } }' | jq '.data.repository.pullRequest as $pr | + # Bots whose top-level comments and review bodies are wrappers or CI summaries, not actionable. + ["coderabbitai", "chatgpt-codex-connector", "gemini-code-assist", "copilot-pull-request-reviewer", "codecov"] as $review_bot_logins | # Unresolved threads (existing behavior, unchanged) [$pr.reviewThreads.edges[] | select(.node.isResolved == false and .node.isOutdated == false)] as $unresolved | @@ -93,10 +103,12 @@ query FetchPRFeedback($owner: String!, $repo: String!, $pr: Int!) { review_threads: $unresolved, pr_comments: [$pr.comments.nodes[] | select(.author.login != $pr.author.login) + | select(.author.login as $l | $review_bot_logins | index($l) | not) | select(.body | test("^\\s*$") | not)], review_bodies: [$pr.reviews.nodes[] | select(.body != null and .body != "") - | select(.author.login != $pr.author.login)], + | select(.author.login != $pr.author.login) + | select(.author.login as $l | $review_bot_logins | index($l) | not)], cross_invocation: { signal: (($resolved | length) > 0 and ($unresolved | length) > 0), resolved_threads: $resolved