Merge step (f/g/h): retire todo system, retool pr-comments-review, rewrite ce-upstream-merge

Applies triage doc 021 in full. Legacy file-based todo system is gone; the two local workflows that depended on it are retooled around platform- native task tracking and ce-code-review-style walkthrough patterns. Deletions: - commands/resolve_todo_parallel.md — obsolete; ce-pr-comments-review now orchestrates resolution inline - commands/pr-comments-to-todos.md — replaced by ce-pr-comments-review skill - skills/ce-upstream-merge/assets/merge-triage-template.md — template now lives inline in SKILL.md since decisions live in the task tracker Retool: - New skill skills/ce-pr-comments-review/SKILL.md: fetch PR comments, pressure-test each via validator agent, walk through with AskUserQuestion (Fix now / Clarify / Push back / Skip), dispatch ce-pr-comment-resolver for accepted fixes. Uses TaskCreate instead of todos/*.md. Rewrite: - skills/ce-upstream-merge/SKILL.md: drops the "create todos in todos/" pattern entirely. Phase 2 builds a platform-task decision set instead. Phase 3 walks through with AskUserQuestion. Phase 4 splits execution into 7 named checkpoints (b-h) so the user can inspect and course- correct mid-flight. Adds explicit "hidden conflicts" detection in Phase 1 (upstream rename of file local deleted, upstream deletion of file local depends on, structural refactors). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 14:01:15 -05:00
parent 4d4586d4a8
commit 9b690499ab
5 changed files with 232 additions and 558 deletions
--- a/plugins/compound-engineering/skills/ce-pr-comments-review/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-pr-comments-review/SKILL.md
@@ -0,0 +1,140 @@
+---
+name: ce-pr-comments-review
+description: "Fetch PR review comments, pressure-test each for validity against the actual codebase, then walk through a decision-per-finding (Fix now / Clarify / Push back / Skip) and apply fixes for accepted items. Every comment is scrutinized before any code changes."
+argument-hint: "[PR number, GitHub URL, or 'current' for current branch PR]"
+---
+
+# PR Comments Review
+
+Reviewer comments are **suggestions, not orders**. This skill fetches them, pressure-tests each one against the actual codebase, and asks you to accept, clarify, push back, or skip — one decision per finding. Only accepted items change code.
+
+The flow mirrors `ce-code-review`'s walk-through pattern: validate first, decide per finding, resolve in parallel where safe.
+
+## When to use
+
+- After a human reviewer leaves comments on an open PR and you want an agent pass that **doesn't blindly apply everything**.
+- When a zip-agent or bot reviewer has left many comments and you need the validator pressure-test before acting on any of them.
+- Before declaring a PR "responded to" — as a forcing function that no comment got silently ignored.
+
+## Input
+
+<pr_target> #$ARGUMENTS </pr_target>
+
+If the input above is empty, ask: "Which PR should we review? (number, URL, or 'current')". Default to `current` when unambiguous.
+
+## Workflow
+
+### Phase 1: Fetch
+
+Resolve the PR reference:
+
+- Numeric → PR number.
+- GitHub URL → extract PR number.
+- `current` or empty → `gh pr status` on the current branch.
+
+Then gather:
+
+```bash
+gh pr view <PR> --json number,title,url,author,headRefName,baseRefName
+gh api repos/{owner}/{repo}/pulls/<PR>/comments
+gh pr view <PR> --json reviews,reviewDecision
+```
+
+Group comments by file + line + thread for context. Skip:
+
+- Bare acknowledgments ("LGTM", "Looks good").
+- Questions that were answered inline in later replies.
+- Threads already marked resolved.
+
+### Phase 2: Pressure-test (validate each comment)
+
+**Do not create tasks or prompt the user yet.** First validate.
+
+For each remaining comment, dispatch a validator agent. In Claude Code:
+
+```
+Agent(subagent_type: "compound-engineering:review:ce-zip-agent-validator",
+      description: "Validate PR comment",
+      prompt: "<comment context + file + line + surrounding code>")
+```
+
+The validator's job is to read the actual code and decide whether the comment applies. It returns a result of the shape:
+
+| Field | Values |
+|-------|--------|
+| `assessment` | `Clear & Correct` / `Unclear` / `Likely Incorrect` / `YAGNI` |
+| `recommended_action` | `Fix now` / `Clarify` / `Push back` / `Skip` |
+| `verified_code` | Yes/No plus what was checked |
+| `verified_tests` | Yes/No plus existing coverage |
+| `verified_usage` | Yes/No plus how the code is used |
+| `prior_decisions` | Yes/No plus any intentional design |
+| `technical_justification` | Prose reasoning |
+
+Prefer parallel dispatch for 2+ comments. For 5+ comments, batch in groups of 4 to keep context bounded.
+
+Preserve every validator result — do not drop the `YAGNI` / `Likely Incorrect` outputs. The user sees all of them in Phase 4.
+
+### Phase 3: Track as tasks
+
+Use the platform's task tool (`TaskCreate` in Claude Code, `update_plan` in Codex, etc.) to register one task per validated comment. Task subject: `<priority>: <short title>`. Task description: include file:line, the validator's assessment, recommended action, and technical justification.
+
+This replaces the old file-based `todos/*.md` layout. No files are written to disk.
+
+### Phase 4: Walk-through (one decision per comment)
+
+Present decisions one at a time using the platform's blocking question tool (call `ToolSearch` with `select:AskUserQuestion` first in Claude Code if the schema is not loaded; `request_user_input` in Codex; `ask_user` in Gemini and Pi via the `pi-ask-user` extension). Fall back to numbered options in chat only when no blocking tool is available.
+
+For each task, show:
+
+- The reviewer's original comment (verbatim).
+- File + line + a code snippet around the location.
+- The validator's assessment, technical justification, and recommended action.
+
+Then ask:
+
+**Stem:** `How should the agent handle this comment?`
+
+**Options (the four match the validator's `recommended_action` vocabulary so the user can accept the recommendation with one click or override it):**
+
+1. `Fix now` — apply the reviewer's suggestion (or a concrete variant).
+2. `Clarify` — reply to the reviewer asking for specifics; do not change code.
+3. `Push back` — reply disagreeing with the comment's premise; do not change code.
+4. `Skip` — take no action; mark the task complete with a skip reason.
+
+Order findings: `Clear & Correct` and `Likely Incorrect` first (they drive the biggest decisions), then `Unclear`, then `YAGNI`. Within each, newer threads before older ones.
+
+**Group related findings** when the same thread touches multiple lines or files — present them together in a single decision step, not sequentially.
+
+### Phase 5: Resolve
+
+Apply decisions in parallel where possible.
+
+- **Fix now** → dispatch `compound-engineering:workflow:ce-pr-comment-resolver` per finding (or grouped set). Prefer parallel; fall back to sequential if dependencies exist. When the resolver is done, post a reply on the thread confirming the fix and include the commit SHA.
+- **Clarify** → post the clarifying reply on the thread via `gh api`. Do not mark the thread resolved.
+- **Push back** → post the disagreement (pulling the technical justification from the validator result) on the thread. Do not mark the thread resolved.
+- **Skip** → do nothing on the thread. Note the skip reason in the task summary.
+
+If 5+ `Fix now` decisions are made, batch the resolvers in groups of 4, returning only short status summaries (file, line, fix applied, tests run).
+
+### Phase 6: Summarize
+
+Print a concise report:
+
+- Count by decision: `N fixed | N clarified | N pushed back | N skipped`.
+- One-line per fix with file:line.
+- Any validator disagreements (user overrode the recommendation) flagged for future review.
+
+If any `Fix now` commits landed, remind the user to push and mark the threads resolved on GitHub when appropriate:
+
+```bash
+gh pr view <PR> --json reviewThreads
+# Resolve threads via GraphQL mutation as needed
+```
+
+## Important rules
+
+- **Every comment is pressure-tested.** No comment leaves Phase 2 without a validator assessment.
+- **User makes the call on every finding.** The validator recommends; the user decides.
+- **No file-based state.** Tasks live in the platform's task tracker; validator outputs live in task descriptions. Nothing writes to `todos/*.md`.
+- **Parallel where safe, sequential when required.** Fixes touching the same file must be serialized; independent fixes can run in parallel.
+- **Preserve reviewer quotes verbatim.** When replying on threads, the reviewer's original wording must be traceable in the context.