Merge step (f/g/h): retire todo system, retool pr-comments-review, rewrite ce-upstream-merge

Applies triage doc 021 in full. Legacy file-based todo system is gone; the two local workflows that depended on it are retooled around platform-native task tracking and ce-code-review-style walkthrough patterns.

Deletions:
- commands/resolve_todo_parallel.md: obsolete; ce-pr-comments-review now orchestrates resolution inline
- commands/pr-comments-to-todos.md: replaced by ce-pr-comments-review skill
- skills/ce-upstream-merge/assets/merge-triage-template.md: template now lives inline in SKILL.md since decisions live in the task tracker

Retool:
- New skill skills/ce-pr-comments-review/SKILL.md: fetch PR comments, pressure-test each via validator agent, walk through with AskUserQuestion (Fix now / Clarify / Push back / Skip), dispatch ce-pr-comment-resolver for accepted fixes. Uses TaskCreate instead of todos/*.md.

Rewrite:
- skills/ce-upstream-merge/SKILL.md: drops the "create todos in todos/" pattern entirely. Phase 2 builds a platform-task decision set instead. Phase 3 walks through with AskUserQuestion. Phase 4 splits execution into 7 named checkpoints (b-h) so the user can inspect and course-correct mid-flight. Adds explicit "hidden conflicts" detection in Phase 1 (upstream rename of file local deleted, upstream deletion of file local depends on, structural refactors).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 14:01:15 -05:00
parent 4d4586d4a8
commit 9b690499ab
5 changed files with 232 additions and 558 deletions
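
Reviewer note: the commit message names "upstream deletion of file local depends on" as one of the hidden conflicts Phase 1 should detect. A minimal sketch of that check follows, assuming the upstream-deleted paths have already been written to a text file, one per line. The helper name `find_dangling_refs` and the rule of naming a skill by its directory when the file is `SKILL.md` are illustrative assumptions, not part of this commit.

```shell
# Hypothetical helper (not part of the commit): print the names of
# upstream-deleted skills/agents/commands that the local tree still mentions.
#   $1: file listing upstream-deleted paths, one per line
#   $2: local directory to search
find_dangling_refs() {
  local deleted_list="$1" search_dir="$2" path name
  while IFS= read -r path; do
    [ -n "$path" ] || continue
    case "$(basename "$path")" in
      # Assumption: a skill is identified by its directory, not by SKILL.md.
      SKILL.md) name="$(basename "$(dirname "$path")")" ;;
      # Otherwise use the file name without its extension.
      *) name="$(basename "$path")"; name="${name%.*}" ;;
    esac
    # -r recurse, -l list matching files only, -F treat the name as a fixed string.
    if grep -rlF -- "$name" "$search_dir" >/dev/null 2>&1; then
      printf '%s\n' "$name"
    fi
  done < "$deleted_list"
}
```

One way to produce the input list is `git diff --diff-filter=D --name-only "$(git merge-base HEAD origin/main)" origin/main`, written to a file outside the directory being searched so the list does not match itself. Plain substring matching will over-report short or generic names, so the output is best treated as a list of candidates to inspect.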
--- a/plugins/compound-engineering/commands/pr-comments-to-todos.md
+++ b/plugins/compound-engineering/commands/pr-comments-to-todos.md
@@ -1,334 +0,0 @@
---
-name: pr-comments-to-todos
-description: Fetch PR comments and convert them into todo files for triage
-argument-hint: "[PR number, GitHub URL, or 'current' for current branch PR]"
---
-
-# PR Comments to Todos
-
-Convert GitHub PR review comments into structured todo files compatible with `/triage`.
-
-<command_purpose>Fetch all review comments from a PR and create individual todo files in the `todos/` directory, following the file-todos skill format.</command_purpose>
-
-## Review Target
-
-<review_target> #$ARGUMENTS </review_target>
-
-## Workflow
-
-### 1. Identify PR and Fetch Comments
-
-<task_list>
-
- [ ] Determine the PR to process:
-  - If numeric: use as PR number directly
-  - If GitHub URL: extract PR number from URL
-  - If "current" or empty: detect from current branch with `gh pr status`
- [ ] Fetch PR metadata: `gh pr view PR_NUMBER --json title,body,url,author,headRefName`
- [ ] Fetch all review comments: `gh api repos/{owner}/{repo}/pulls/{PR_NUMBER}/comments`
- [ ] Fetch review thread comments: `gh pr view PR_NUMBER --json reviews,reviewDecision`
- [ ] Group comments by file/thread for context
-
-</task_list>
-
-### 2. Pressure Test Each Comment
-
-<critical_evaluation>
-
-**IMPORTANT: Treat reviewer comments as suggestions, not orders.**
-
-Before creating a todo, apply engineering judgment to each comment. Not all feedback is equally valid - your job is to make the right call for the codebase, not just please the reviewer.
-
-#### Step 2a: Verify Before Accepting
-
-For each comment, verify:
- [ ] **Check the code**: Does the concern actually apply to this code?
- [ ] **Check tests**: Are there existing tests that cover this case?
- [ ] **Check usage**: How is this code actually used? Does the concern matter in practice?
- [ ] **Check compatibility**: Would the suggested change break anything?
- [ ] **Check prior decisions**: Was this intentional? Is there a reason it's done this way?
-
-#### Step 2b: Assess Each Comment
-
-Assign an assessment to each comment:
-
-| Assessment | Meaning |
-|------------|---------|
-| **Clear & Correct** | Valid concern, well-reasoned, applies to this code |
-| **Unclear** | Ambiguous, missing context, or doesn't specify what to change |
-| **Likely Incorrect** | Misunderstands the code, context, or requirements |
-| **YAGNI** | Over-engineering, premature abstraction, no clear benefit |
-
-#### Step 2c: Include Assessment in Todo
-
-**IMPORTANT: ALL comments become todos.** Never drop feedback - include the pressure test assessment IN the todo so `/triage` can use it to decide.
-
-For each comment, the todo will include:
- The assessment (Clear & Correct / Unclear / Likely Incorrect / YAGNI)
- The verification results (what was checked)
- Technical justification (why valid, or why you think it should be skipped)
- Recommended action for triage (Fix now / Clarify / Push back / Skip)
-
-The human reviews during `/triage` and makes the final call.
-
-</critical_evaluation>
-
-### 3. Categorize All Comments
-
-<categorization>
-
-For ALL comments (regardless of assessment), determine:
-
-**Severity (Priority):**
- 🔴 **P1 (Critical)**: Security issues, data loss risks, breaking changes, blocking bugs
- 🟡 **P2 (Important)**: Performance issues, architectural concerns, significant code quality
- 🔵 **P3 (Nice-to-have)**: Style suggestions, minor improvements, documentation
-
-**Category Tags:**
- `security` - Security vulnerabilities or concerns
- `performance` - Performance issues or optimizations
- `architecture` - Design or structural concerns
- `bug` - Functional bugs or edge cases
- `quality` - Code quality, readability, maintainability
- `testing` - Test coverage or test quality
- `documentation` - Missing or unclear documentation
- `style` - Code style or formatting
- `needs-clarification` - Comment requires clarification before implementing
- `pushback-candidate` - Human should review before accepting
-
-**Skip these (don't create todos):**
- Simple acknowledgments ("LGTM", "Looks good")
- Questions that were answered inline
- Already resolved threads
-
-**Note:** Comments assessed as YAGNI or Likely Incorrect still become todos with that assessment included. The human decides during `/triage` whether to accept or reject.
-
-</categorization>
-
-### 4. Create Todo Files Using file-todos Skill
-
-<critical_instruction>Create todo files for ALL actionable comments immediately. Use the file-todos skill structure and naming convention.</critical_instruction>
-
-#### Determine Next Issue ID
-
-```bash
-# Find the highest existing issue ID
-ls todos/ 2>/dev/null | grep -o '^[0-9]\+' | sort -n | tail -1 | awk '{printf "%03d", $1+1}'
-# If no todos exist, start with 001
-```
-
-#### File Naming Convention
-
-```
-{issue_id}-pending-{priority}-{brief-description}.md
-```
-
-Examples:
-```
-001-pending-p1-sql-injection-vulnerability.md
-002-pending-p2-missing-error-handling.md
-003-pending-p3-rename-variable-for-clarity.md
-```
-
-#### Todo File Structure
-
-For each comment, create a file with this structure:
-
-```yaml
---
-status: pending
-priority: p1  # or p2, p3 based on severity
-issue_id: "001"
-tags: [code-review, pr-feedback, {category}]
-dependencies: []
---
-```
-
-```markdown
-# [Brief Title from Comment]
-
-## Problem Statement
-
-[Summarize the reviewer's concern - what is wrong or needs improvement]
-
-**PR Context:**
- PR: #{PR_NUMBER} - {PR_TITLE}
- File: {file_path}:{line_number}
- Reviewer: @{reviewer_username}
-
-## Assessment (Pressure Test)
-
-| Criterion | Result |
-|-----------|--------|
-| **Assessment** | Clear & Correct / Unclear / Likely Incorrect / YAGNI |
-| **Recommended Action** | Fix now / Clarify / Push back / Skip |
-| **Verified Code?** | Yes/No - [what was checked] |
-| **Verified Tests?** | Yes/No - [existing coverage] |
-| **Verified Usage?** | Yes/No - [how code is used] |
-| **Prior Decisions?** | Yes/No - [any intentional design] |
-
-**Technical Justification:**
-[If pushing back or marking YAGNI, provide specific technical reasoning. Reference codebase constraints, requirements, or trade-offs. Example: "This abstraction would be YAGNI - we only have one implementation and no plans for variants."]
-
-## Findings
-
- **Original Comment:** "{exact reviewer comment}"
- **Location:** `{file_path}:{line_number}`
- **Code Context:**
-  ```{language}
-  {relevant code snippet}
-  ```
- **Why This Matters:** [Impact if not addressed, or why it doesn't matter]
-
-## Proposed Solutions
-
-### Option 1: [Primary approach based on reviewer suggestion]
-
-**Approach:** [Describe the fix]
-
-**Pros:**
- Addresses reviewer concern directly
- [Other benefits]
-
-**Cons:**
- [Any drawbacks]
-
-**Effort:** Small / Medium / Large
-
-**Risk:** Low / Medium / High
-
---
-
-### Option 2: [Alternative if applicable]
-
-[Only include if there's a meaningful alternative approach]
-
-## Recommended Action
-
-*(To be filled during triage)*
-
-## Technical Details
-
-**Affected Files:**
- `{file_path}:{line_number}` - {what needs changing}
-
-**Related Components:**
- [Components affected by this change]
-
-## Resources
-
- **PR:** #{PR_NUMBER}
- **Comment Link:** {direct_link_to_comment}
- **Reviewer:** @{reviewer_username}
-
-## Acceptance Criteria
-
- [ ] Reviewer concern addressed
- [ ] Tests pass
- [ ] Code reviewed and approved
- [ ] PR comment resolved
-
-## Work Log
-
-### {today's date} - Created from PR Review
-
-**By:** Claude Code
-
-**Actions:**
- Extracted comment from PR #{PR_NUMBER} review
- Created todo for triage
-
-**Learnings:**
- Original reviewer context: {any additional context}
-```
-
-### 5. Parallel Todo Creation (For Multiple Comments)
-
-<parallel_processing>
-
-When processing PRs with many comments (5+), create todos in parallel for efficiency:
-
-1. Synthesize all comments into a categorized list
-2. Assign severity (P1/P2/P3) to each
-3. Launch parallel Write operations for all todos
-4. Each todo follows the file-todos skill template exactly
-
-</parallel_processing>
-
-### 6. Summary Report
-
-After creating all todo files, present:
-
-````markdown
-## ✅ PR Comments Converted to Todos
-
-**PR:** #{PR_NUMBER} - {PR_TITLE}
-**Branch:** {branch_name}
-**Total Comments Processed:** {X}
-
-### Created Todo Files:
-
-**🔴 P1 - Critical:**
- `{id}-pending-p1-{desc}.md` - {summary}
-
-**🟡 P2 - Important:**
- `{id}-pending-p2-{desc}.md` - {summary}
-
-**🔵 P3 - Nice-to-Have:**
- `{id}-pending-p3-{desc}.md` - {summary}
-
-### Skipped (Not Actionable):
- {count} comments skipped (LGTM, questions answered, resolved threads)
-
-### Assessment Summary:
-
-All comments were pressure tested and included in todos:
-
-| Assessment | Count | Description |
-|------------|-------|-------------|
-| **Clear & Correct** | {X} | Valid concerns, recommend fixing |
-| **Unclear** | {X} | Need clarification before implementing |
-| **Likely Incorrect** | {X} | May misunderstand context - review during triage |
-| **YAGNI** | {X} | May be over-engineering - review during triage |
-
-**Note:** All assessments are included in the todo files. Human judgment during `/triage` makes the final call on whether to accept, clarify, or reject each item.
-
-### Next Steps:
-
-1. **Triage the todos:**
-   ```bash
-   /triage
-   ```
-   Review each todo and approve (pending → ready) or skip
-
-2. **Work on approved items:**
-   ```bash
-   /resolve_todo_parallel
-   ```
-
-3. **After fixes, resolve PR comments:**
-   ```bash
-   bin/resolve-pr-thread THREAD_ID
-   ```
-````
-
-## Important Notes
-
-<requirements>
- Ensure `todos/` directory exists before creating files
- Each todo must have unique issue_id (never reuse)
- All todos start with `status: pending` for triage
- Include `code-review` and `pr-feedback` tags on all todos
- Preserve exact reviewer quotes in Findings section
- Link back to original PR and comment in Resources
-</requirements>
-
-## Integration with /triage
-
-The output of this command is designed to work seamlessly with `/triage`:
-
-1. **This command** creates `todos/*-pending-*.md` files
-2. **`/triage`** reviews each pending todo and:
-   - Approves → renames to `*-ready-*.md`
-   - Skips → deletes the todo file
-3. **`/resolve_todo_parallel`** works on approved (ready) todos
--- a/plugins/compound-engineering/commands/resolve_todo_parallel.md
+++ b/plugins/compound-engineering/commands/resolve_todo_parallel.md
@@ -1,36 +0,0 @@
---
-name: resolve_todo_parallel
-description: Resolve all pending CLI todos using parallel processing
-argument-hint: "[optional: specific todo ID or pattern]"
---
-
-Resolve all TODO comments using parallel processing.
-
-## Workflow
-
-### 1. Analyze
-
-Get all unresolved TODOs from the /todos/\*.md directory
-
-If any todo recommends deleting, removing, or gitignoring files in `docs/plans/` or `docs/solutions/`, skip it and mark it as `wont_fix`. These are compound-engineering pipeline artifacts that are intentional and permanent.
-
-### 2. Plan
-
-Create a TodoWrite list of all unresolved items grouped by type. Make sure to look at dependencies that might occur and prioritize the ones needed by others. For example, if you need to change a name, you must wait to do the others. Output a mermaid flow diagram showing how we can do this. Can we do everything in parallel? Do we need to do one first that leads to others in parallel? I'll put the to-dos in the mermaid diagram flow‑wise so the agent knows how to proceed in order.
-
-### 3. Implement (PARALLEL)
-
-Spawn a pr-comment-resolver agent for each unresolved item in parallel.
-
-So if there are 3 comments, it will spawn 3 pr-comment-resolver agents in parallel, like this:
-
-1. Task pr-comment-resolver(comment1)
-2. Task pr-comment-resolver(comment2)
-3. Task pr-comment-resolver(comment3)
-
-Always run all in parallel subagents/Tasks for each Todo item.
-
-### 4. Commit & Resolve
-
- Commit changes
- Remove the TODO from the file, and mark it as resolved.
--- a/plugins/compound-engineering/skills/ce-pr-comments-review/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-pr-comments-review/SKILL.md
@@ -0,0 +1,140 @@
+---
+name: ce-pr-comments-review
+description: "Fetch PR review comments, pressure-test each for validity against the actual codebase, then walk through a decision-per-finding (Fix now / Clarify / Push back / Skip) and apply fixes for accepted items. Every comment is scrutinized before any code changes."
+argument-hint: "[PR number, GitHub URL, or 'current' for current branch PR]"
+---
+
+# PR Comments Review
+
+Reviewer comments are **suggestions, not orders**. This skill fetches them, pressure-tests each one against the actual codebase, and asks you to accept, clarify, push back, or skip — one decision per finding. Only accepted items change code.
+
+The flow mirrors `ce-code-review`'s walk-through pattern: validate first, decide per finding, resolve in parallel where safe.
+
+## When to use
+
+- After a human reviewer leaves comments on an open PR and you want an agent pass that **doesn't blindly apply everything**.
+- When a zip-agent or bot reviewer has left many comments and you need the validator pressure-test before acting on any of them.
+- Before declaring a PR "responded to" — as a forcing function that no comment got silently ignored.
+
+## Input
+
+<pr_target> #$ARGUMENTS </pr_target>
+
+If the input above is empty, ask: "Which PR should we review? (number, URL, or 'current')". Default to `current` when unambiguous.
+
+## Workflow
+
+### Phase 1: Fetch
+
+Resolve the PR reference:
+
+- Numeric → PR number.
+- GitHub URL → extract PR number.
+- `current` or empty → `gh pr status` on the current branch.
+
+Then gather:
+
+```bash
+gh pr view <PR> --json number,title,url,author,headRefName,baseRefName
+gh api repos/{owner}/{repo}/pulls/<PR>/comments
+gh pr view <PR> --json reviews,reviewDecision
+```
+
+Group comments by file + line + thread for context. Skip:
+
+- Bare acknowledgments ("LGTM", "Looks good").
+- Questions that were answered inline in later replies.
+- Threads already marked resolved.
+
+### Phase 2: Pressure-test (validate each comment)
+
+**Do not create tasks or prompt the user yet.** First validate.
+
+For each remaining comment, dispatch a validator agent. In Claude Code:
+
+```
+Agent(subagent_type: "compound-engineering:review:ce-zip-agent-validator",
+      description: "Validate PR comment",
+      prompt: "<comment context + file + line + surrounding code>")
+```
+
+The validator's job is to read the actual code and decide whether the comment applies. It returns a result of the shape:
+
+| Field | Values |
+|-------|--------|
+| `assessment` | `Clear & Correct` / `Unclear` / `Likely Incorrect` / `YAGNI` |
+| `recommended_action` | `Fix now` / `Clarify` / `Push back` / `Skip` |
+| `verified_code` | Yes/No plus what was checked |
+| `verified_tests` | Yes/No plus existing coverage |
+| `verified_usage` | Yes/No plus how the code is used |
+| `prior_decisions` | Yes/No plus any intentional design |
+| `technical_justification` | Prose reasoning |
+
+Prefer parallel dispatch for 2+ comments. For 5+ comments, batch in groups of 4 to keep context bounded.
+
+Preserve every validator result — do not drop the `YAGNI` / `Likely Incorrect` outputs. The user sees all of them in Phase 4.
+
+### Phase 3: Track as tasks
+
+Use the platform's task tool (`TaskCreate` in Claude Code, `update_plan` in Codex, etc.) to register one task per validated comment. Task subject: `<priority>: <short title>`. Task description: include file:line, the validator's assessment, recommended action, and technical justification.
+
+This replaces the old file-based `todos/*.md` layout. No files are written to disk.
+
+### Phase 4: Walk-through (one decision per comment)
+
+Present decisions one at a time using the platform's blocking question tool (call `ToolSearch` with `select:AskUserQuestion` first in Claude Code if the schema is not loaded; `request_user_input` in Codex; `ask_user` in Gemini and Pi via the `pi-ask-user` extension). Fall back to numbered options in chat only when no blocking tool is available.
+
+For each task, show:
+
+- The reviewer's original comment (verbatim).
+- File + line + a code snippet around the location.
+- The validator's assessment, technical justification, and recommended action.
+
+Then ask:
+
+**Stem:** `How should the agent handle this comment?`
+
+**Options (the four match the validator's `recommended_action` vocabulary so the user can accept the recommendation with one click or override it):**
+
+1. `Fix now` — apply the reviewer's suggestion (or a concrete variant).
+2. `Clarify` — reply to the reviewer asking for specifics; do not change code.
+3. `Push back` — reply disagreeing with the comment's premise; do not change code.
+4. `Skip` — take no action; mark the task complete with a skip reason.
+
+Order findings: `Clear & Correct` and `Likely Incorrect` first (they drive the biggest decisions), then `Unclear`, then `YAGNI`. Within each, newer threads before older ones.
+
+**Group related findings** when the same thread touches multiple lines or files — present them together in a single decision step, not sequentially.
+
+### Phase 5: Resolve
+
+Apply decisions in parallel where possible.
+
+- **Fix now** → dispatch `compound-engineering:workflow:ce-pr-comment-resolver` per finding (or grouped set). Prefer parallel; fall back to sequential if dependencies exist. When the resolver is done, post a reply on the thread confirming the fix and include the commit SHA.
+- **Clarify** → post the clarifying reply on the thread via `gh api`. Do not mark the thread resolved.
+- **Push back** → post the disagreement (pulling the technical justification from the validator result) on the thread. Do not mark the thread resolved.
+- **Skip** → do nothing on the thread. Note the skip reason in the task summary.
+
+If 5+ `Fix now` decisions are made, batch the resolvers in groups of 4, returning only short status summaries (file, line, fix applied, tests run).
+
+### Phase 6: Summarize
+
+Print a concise report:
+
+- Count by decision: `N fixed | N clarified | N pushed back | N skipped`.
+- One line per fix with file:line.
+- Any validator disagreements (user overrode the recommendation) flagged for future review.
+
+If any `Fix now` commits landed, remind the user to push and mark the threads resolved on GitHub when appropriate:
+
+```bash
+gh pr view <PR> --json reviewThreads
+# Resolve threads via GraphQL mutation as needed
+```
+
+## Important rules
+
+- **Every comment is pressure-tested.** No comment leaves Phase 2 without a validator assessment.
+- **User makes the call on every finding.** The validator recommends; the user decides.
+- **No file-based state.** Tasks live in the platform's task tracker; validator outputs live in task descriptions. Nothing writes to `todos/*.md`.
+- **Parallel where safe, sequential when required.** Fixes touching the same file must be serialized; independent fixes can run in parallel.
+- **Preserve reviewer quotes verbatim.** When replying on threads, the reviewer's original wording must be traceable in the context.
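
Reviewer note: Phase 1 of the new skill resolves the PR reference three ways (numeric, GitHub URL, or `current`). The sketch below shows one way that normalization could look. The function name `resolve_pr_ref` is hypothetical, and for the `current` case it uses `gh pr view --json number --jq '.number'` instead of the `gh pr status` call named in the skill. Both choices are assumptions made for illustration.

```shell
# Hypothetical helper (not part of the commit): normalize a PR reference
# to a bare PR number on stdout. Returns non-zero on unrecognized input.
resolve_pr_ref() {
  local ref="${1:-current}" num
  case "$ref" in
    current)
      # Assumption: with no PR argument, gh looks up the current branch's PR.
      gh pr view --json number --jq '.number'
      ;;
    *[!0-9]*)
      # Contains a non-digit: accept only a GitHub pull request URL.
      case "$ref" in
        https://github.com/*/pull/[0-9]*)
          num="${ref#*/pull/}"     # drop everything through "/pull/"
          num="${num%%[!0-9]*}"    # drop any trailing path such as "/files"
          printf '%s\n' "$num"
          ;;
        *)
          printf 'unrecognized PR reference: %s\n' "$ref" >&2
          return 1
          ;;
      esac
      ;;
    *)
      # Digits only: already a PR number.
      printf '%s\n' "$ref"
      ;;
  esac
}
```

For example, `resolve_pr_ref https://github.com/o/r/pull/123/files` prints `123`, and an empty argument falls through to the `current` branch lookup, matching the skill's "default to `current`" rule.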
--- a/plugins/compound-engineering/skills/ce-upstream-merge/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-upstream-merge/SKILL.md
@@ -1,199 +1,160 @@
 ---
 name: ce-upstream-merge
-description: This skill should be used when incorporating upstream git changes into a local fork while preserving local intent. It provides a structured workflow for analyzing divergence, categorizing conflicts, creating triage todos for each conflict, reviewing decisions one-by-one with the user, and executing all resolutions. Triggers on "merge upstream", "incorporate upstream changes", "sync fork", or when local and remote branches have diverged significantly.
+description: "Incorporate upstream git changes into a local fork without losing local intent. Analyzes divergence, surfaces file-level and hidden conflicts, walks you through a per-finding decision (Accept remote / Keep local / Merge both / Keep deleted), then executes resolutions in a structured multi-step pass with checkpoints. Triggers on 'merge upstream', 'incorporate upstream changes', 'sync fork', or when local and remote branches have diverged significantly."
 ---

 # Upstream Merge

-Incorporate upstream changes into a local fork without losing local intent. Analyze divergence, categorize every changed file, triage conflicts interactively, then execute all decisions in a single structured pass.
+Incorporate upstream changes into a local fork without losing local intent. The flow mirrors `ce-code-review`'s shape: analyze, walk through per-finding with the user, then execute with checkpoints.

 ## Prerequisites

-Before starting, establish context:
+Before starting:

-1. **Identify the guiding principle** — ask the user what local intent must be preserved (e.g., "FastAPI pivot is non-negotiable", "custom branding must remain"). This principle governs every triage decision.
-2. **Confirm remote** — verify `git remote -v` shows the correct upstream origin.
-3. **Fetch latest** — `git fetch origin` to get current upstream state.
+1. **Identify the guiding principle** — ask the user which local intent must survive. Every decision reads against this principle.
+2. **Confirm remote** — `git remote -v` should show the intended upstream.
+3. **Fetch latest** — `git fetch origin`.

-## Phase 1: Analyze Divergence
+## Phase 1: Analyze divergence

-Gather the full picture before making any decisions.
-
-**Run these commands:**
+Gather the full picture:

 ```bash
-# Find common ancestor
 git merge-base HEAD origin/main
-
-# Count divergence
-git rev-list --count HEAD ^origin/main   # local-only commits
-git rev-list --count origin/main ^HEAD   # remote-only commits
-
-# List all changed files on each side
+git rev-list --count HEAD ^origin/main
+git rev-list --count origin/main ^HEAD
 git diff --name-only $(git merge-base HEAD origin/main) HEAD > /tmp/local-changes.txt
 git diff --name-only $(git merge-base HEAD origin/main) origin/main > /tmp/remote-changes.txt
-```
-
-**Categorize every file into three buckets:**
-
-| Bucket | Definition | Action |
-|--------|-----------|--------|
-| **Remote-only** | Changed upstream, untouched locally | Accept automatically |
-| **Local-only** | Changed locally, untouched upstream | Keep as-is |
-| **Both-changed** | Modified on both sides | Create triage todo |
-
-```bash
-# Generate buckets
 comm -23 <(sort /tmp/remote-changes.txt) <(sort /tmp/local-changes.txt) > /tmp/remote-only.txt
 comm -13 <(sort /tmp/remote-changes.txt) <(sort /tmp/local-changes.txt) > /tmp/local-only.txt
 comm -12 <(sort /tmp/remote-changes.txt) <(sort /tmp/local-changes.txt) > /tmp/both-changed.txt
 ```

-**Present summary to user:**
+File-level buckets:

-```
-Divergence Analysis:
- Common ancestor: [commit hash]
- Local: X commits ahead | Remote: Y commits ahead
- Remote-only: N files (auto-accept)
- Local-only: N files (auto-keep)
- Both-changed: N files (need triage)
-```
+| Bucket | Meaning | Default action |
+|--------|---------|----------------|
+| **Remote-only** | Changed upstream, local untouched | Accept automatically |
+| **Local-only** | Changed locally, upstream untouched | Keep as-is |
+| **Both-changed** | Modified on both sides | Decision required |

-## Phase 2: Create Triage Todos
+**Also surface hidden conflicts.** File-level bucketing misses three common patterns that still matter:

-For each file in the "both-changed" bucket, create a triage todo using the template at [merge-triage-template.md](./assets/merge-triage-template.md).
+- **Upstream rename of a file local deleted** (remote-only deletion + remote-only creation under a new path, but local intent was to drop it). Check by diffing the set of deleted paths against any new creations at parallel paths.
+- **Upstream deletion of a file local depends on** (remote-only deletion, but local skills/commands reference the deleted thing). Grep the local tree for names of upstream-deleted skills, agents, and commands.
+- **Structural refactors** (massive renames, namespace changes, convention shifts) where every file "looks fine" individually but the project no longer fits together. Scan the remote-only diff for patterns that affect many files consistently.

-**Process:**
+Present the summary to the user with real counts, real examples, and flagged hidden conflicts.

-1. Determine next issue ID: `ls todos/ | grep -o '^[0-9]\+' | sort -n | tail -1`
-2. For each both-changed file:
-   - Read both versions (local and remote)
-   - Generate the diff: `git diff $(git merge-base HEAD origin/main)..origin/main -- <file>`
-   - Analyze what each side intended
-   - Write a recommendation based on the guiding principle
-   - Create todo: `todos/{id}-pending-p2-merge-{brief-name}.md`
+## Phase 2: Build the decision set

-**Naming convention for merge triage todos:**
+Create one platform task (`TaskCreate` in Claude Code; `update_plan` in Codex) per decision that actually needs user input:

-```
-{id}-pending-p2-merge-{component-name}.md
-```
+- Each both-changed file that survives on both sides (content conflict).
+- Each local-kept / remote-deleted file (keep local with rename, or follow upstream's removal).
+- Each hidden conflict surfaced in Phase 1.

-Examples:
- `001-pending-p2-merge-marketplace-json.md`
- `002-pending-p2-merge-kieran-python-reviewer.md`
- `003-pending-p2-merge-workflows-review.md`
+Do **not** create tasks for:

-**Use parallel agents** to create triage docs when there are many conflicts (batch 4-6 at a time).
+- Trivially-resolved both-changed cases (e.g., both-deleted — keep deleted).
+- Pure remote-only files where local has no dependency on them (auto-accept).
+- Pure local-only files untouched upstream (auto-keep).

-**Announce when complete:**
+**Task subject format:** `<short-filename>: <conflict shape>` (e.g., `ce-brainstorm/SKILL.md: content merge`). **Task description:** include file paths, both intents, a recommendation, and acceptance criteria. Keep it under ~40 lines so the walk-through stays readable.

-```
-Created N triage todos in todos/. Ready to review one-by-one.
-```
+Use parallel agents to draft descriptions when there are many decisions (batch 4-6 at a time). No file writes to `todos/` or anywhere else — the platform's task tracker is the source of truth.

-## Phase 3: Triage (Review One-by-One)
+Announce when the decision set is built: `Decision set: N items. Ready to walk through.`

-Present each triage todo to the user for a decision. Follow the `/triage` command pattern.
+## Phase 3: Walk-through (one decision per finding)

-**For each conflict, present:**
+Present each task one at a time using the platform's blocking question tool. In Claude Code, `AskUserQuestion` (call `ToolSearch` with `select:AskUserQuestion` first if not loaded); in Codex, `request_user_input`; in Gemini, `ask_user`; in Pi, `ask_user` via the `pi-ask-user` extension. Fall back to numbered options in chat only when no blocking tool is available.

-```
---
-Conflict X/N: [filename]
+For each finding, show:

-Category: [agent/command/skill/config]
-Conflict Type: [content/modify-delete/add-add]
+- The file (or group of files if related).
+- The conflict shape (content / rename / modify-delete / hidden).
+- Both intents in one sentence each.
+- The recommendation and the reasoning tied to the guiding principle.
+- The diff summary when it's short enough to be useful.

-Remote intent: [what upstream changed and why]
-Local intent: [what local changed and why]
+Then ask:

-Recommendation: [Accept remote / Keep local / Merge both / Keep deleted]
-Reasoning: [why, referencing the guiding principle]
+**Stem:** `How should the merge resolve this?`

---
-How should we handle this?
-1. Accept remote — take upstream version as-is
-2. Keep local — preserve local version
-3. Merge both — combine changes (specify how)
-4. Keep deleted — file was deleted locally, keep it deleted
-```
+**Options (match the decision vocabulary):**

-**Use AskUserQuestion tool** for each decision with appropriate options.
+1. `Accept remote` — take upstream's version as-is.
+2. `Keep local` — preserve local as-is; discard upstream's change.
+3. `Merge both` — combine; specify the merge shape (e.g., upstream structure + local content hunks).
+4. `Keep deleted` — file stays gone regardless of upstream's continued version.

-**Record decisions** by updating the triage todo:
- Fill the "Decision" section with the chosen resolution
- Add merge instructions if "merge both" was selected
- Update status: `pending` → `ready`
+**Group related findings** — e.g., all files in a renamed skill dir, or all files in a cohesive upstream refactor. Group decisions reduce fatigue and catch inconsistency.

-**Group related files** when presenting (e.g., present all 7 dspy-ruby files together, not separately).
+**Track progress:** show `X/N decided` before each question. Update the corresponding task's status to `completed` as decisions land.

-**Track progress:** Show "X/N completed" with each presentation.
+## Phase 4: Execute in checkpoints

-## Phase 4: Execute Decisions
+Never apply all decisions silently in one commit. Break into checkpoints so the user can inspect and course-correct mid-flight.

-After all triage decisions are made, execute them in a structured order.
-
-### Step 1: Create Working Branch
+### Setup

 ```bash
-git branch backup-local-changes   # safety net
-git checkout -b merge-upstream origin/main
+git branch backup-local-<date> main    # safety net
+git checkout -b merge-upstream-<date> origin/main
 ```

-### Step 2: Execute in Order
+All subsequent work lands on the new branch. The backup branch is the escape hatch.

-Process decisions in this sequence to avoid conflicts:
+### Checkpoint sequence

-1. **Deletions first** — Remove files that should stay deleted
-2. **Copy local-only files** — `git checkout backup-local-changes -- <file>` for local additions
-3. **Merge files** — Apply "merge both" decisions (the most complex step)
-4. **Update metadata** — Counts, versions, descriptions, changelogs
+1. **(b) Carry in local-only files** — `git checkout backup-local-<date> -- <file>` for each local-only addition. Exclude files that need special handling (e.g., a locally-modified agent that upstream renamed — port that in step c.5). Commit.
+2. **(c) Rename / restructure to upstream convention** — move local custom agents to flat `ce-<name>.agent.md`, rename local custom skill dirs to `ce-<name>/`, delete upstream's renamed versions of locally-deleted items, update frontmatter `name:` fields. Commit.
+3. **(d) Content merges** — apply each `Merge both` decision. Commit per-file or per-skill for clean review.
+4. **(e) Ports across rename boundaries** — e.g., when upstream renamed a skill and local had modifications under the old path, port the modifications into the new path. Commit.
+5. **(f) Deletions + retools** — apply `Keep deleted` and `Accept remote` deletion decisions. If a local skill's purpose survives but its mechanism was deprecated, retool it into the new pattern (don't just delete). Commit.
+6. **(g) Reference sweep** — grep for old namespace references, old agent names, old slash commands; replace with new equivalents. Commit.
+7. **(h) Metadata refresh** — recompute agent/skill counts in `README.md`, `plugin.json` descriptions, and marketplace catalogs. Never hand-bump release-owned versions. Commit.

-### Step 3: Verify
+At each checkpoint, ask the user before proceeding: `Checkpoint N/7 complete. Proceed to <next>, hold for inspection, or revise?`
+
+### Validate before merging

 ```bash
-# Validate JSON/YAML files
-cat <config-files> | python3 -m json.tool > /dev/null
-
-# Verify component counts match descriptions
-# (skill-specific: count agents, commands, skills, etc.)
-
-# Check diff summary
-git diff --stat HEAD
+bun run release:validate
+bun test
+grep -r "compound-engineering:" plugins/ | grep -v "docs/\|specs/\|\.md:" | head
 ```

-### Step 4: Commit and Merge to Main
+Expected: tests pass (or match the pre-merge baseline), the release validator passes, and the grep prints nothing, i.e. no stray old-namespace agent refs remain in skill/agent/command content.
+
+### Merge to main
+
+Ask before merging: `All checkpoints applied and verified. Merge to main now?` On yes:

 ```bash
-git add <specific-files>   # stage explicitly, not -A
-git commit -m "Merge upstream vX.Y.Z with [guiding principle] (vX.Y.Z+1)"
 git checkout main
-git merge merge-upstream
+git merge merge-upstream-<date>
 ```

-**Ask before merging to main** — confirm the user wants to proceed.
+## Decision heuristics

-## Decision Framework
-
-When making recommendations, apply these heuristics:
-
-| Signal | Recommendation |
-|--------|---------------|
+| Signal | Default recommendation |
+|--------|------------------------|
 | Remote adds new content, no local equivalent | Accept remote |
 | Remote updates content local deleted intentionally | Keep deleted |
-| Remote has structural improvements (formatting, frontmatter) + local has content changes | Merge both: remote structure + local content |
-| Both changed same content differently | Merge both: evaluate which serves the guiding principle |
-| Remote renames what local deleted | Keep deleted |
-| File is metadata (counts, versions, descriptions) | Defer to Phase 4 — recalculate from actual files |
+| Remote structural improvements + local content changes | Merge both: remote structure + local content |
+| Both changed same content differently | Merge both: pick the side that serves the guiding principle; port the minimum set of hunks from the other |
+| Remote renames what local deleted | Keep deleted (drop the renamed upstream version too) |
+| File is metadata (counts, versions, descriptions) | Defer to the metadata-refresh checkpoint — recalculate from actual files |
+| Upstream removed a skill local depends on (hidden conflict) | Fix local references AND accept the removal, OR retool local so it no longer needs the removed skill |

-## Important Rules
+## Important rules

-- **Never auto-resolve "both-changed" files** — always triage with user
-- **Never code during triage** — triage is for decisions only, execution is Phase 4
-- **Always create a backup branch** before making changes
-- **Always stage files explicitly** — never `git add -A` or `git add .`
-- **Group related files** — don't present 7 files from the same skill directory separately
-- **Metadata is derived, not merged** — counts, versions, and descriptions should be recalculated from actual files after all other changes are applied
-- **Preserve the guiding principle** — every recommendation should reference it
+- **No file-based state.** The decision set lives in the platform's task tracker, not in a `todos/` directory. Prior versions of this skill wrote `todos/NNN-*.md`; that pattern is retired.
+- **Never auto-resolve content conflicts.** Every `Merge both` / `Keep local` / `Accept remote` call is the user's, not the agent's.
+- **Checkpoint, don't mega-commit.** Seven small commits read better than one 400-file mega-commit.
+- **Stage explicitly.** Prefer `git add <paths>` over `git add -A` so private files and stale scratch never sneak in.
+- **Group related findings during walk-through.** A 7-file rename is one decision, not seven.
+- **Metadata is derived, not merged.** Component counts, descriptions, and version strings are computed after all other work lands.
+- **Preserve the guiding principle.** Every recommendation should reference it; every decision should be defensible against it.
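The reference sweep in checkpoint (g) and the validation grep above could be wrapped in a small reusable helper. The following is a minimal sketch only, not part of the skill: the function name `find_old_refs` is hypothetical, and it assumes a `grep` that supports `--exclude-dir` (GNU and BSD grep both do). It skips `docs/` and `specs/` by directory rather than by filtering output lines, and unlike the command in the diff it does not exclude Markdown files.

```shell
# Hypothetical helper (not part of the skill): list files under ROOT that
# still contain PATTERN, skipping docs/ and specs/ directories.
find_old_refs() {
  local pattern="$1" root="$2"
  # -r recurse, -l print matching file names only, -F treat PATTERN as a literal string
  grep -rlF --exclude-dir=docs --exclude-dir=specs -- "$pattern" "$root" | sort
}
```

A possible use is to hold a checkpoint open while the list is non-empty, for example `[ -z "$(find_old_refs 'compound-engineering:' plugins/)" ]` before committing the sweep.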
--- a/plugins/compound-engineering/skills/ce-upstream-merge/assets/merge-triage-template.md
+++ b/plugins/compound-engineering/skills/ce-upstream-merge/assets/merge-triage-template.md
@@ -1,57 +0,0 @@
----
-status: pending
-priority: p2
-issue_id: "XXX"
-tags: [upstream-merge]
-dependencies: []
----
-
-# Merge Conflict: [filename]
-
-## File Info
-
-| Field | Value |
-|-------|-------|
-| **File** | `path/to/file` |
-| **Category** | agent / command / skill / config / other |
-| **Conflict Type** | content / modify-delete / add-add |
-
-## What Changed
-
-### Remote Version
-
-[What the upstream version added, changed, or intended]
-
-### Local Version
-
-[What the local version added, changed, or intended]
-
-## Diff
-
-<details>
-<summary>Show diff</summary>
-
-```diff
-[Relevant diff content]
-```
-
-</details>
-
-## Recommendation
-
-**Suggested resolution:** Accept remote / Keep local / Merge both / Keep deleted
-
-[Reasoning for the recommendation, considering the local fork's guiding principles]
-
-## Decision
-
-**Resolution:** *(filled during triage)*
-
-**Details:** *(specific merge instructions if "merge both")*
-
-## Acceptance Criteria
-
-- [ ] Resolution applied correctly
-- [ ] No content lost unintentionally
-- [ ] Local intent preserved
-- [ ] File validates (JSON/YAML if applicable)