feat(ce-polish-beta): human-in-the-loop polish phase between /ce:review and merge (#568)

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
2026-04-16 17:55:10 -05:00
parent 3d96c0f074
commit 070092d997
25 changed files with 3800 additions and 0 deletions
--- a/docs/plans/2026-04-15-001-feat-ce-polish-skill-plan.md
+++ b/docs/plans/2026-04-15-001-feat-ce-polish-skill-plan.md
@@ -0,0 +1,639 @@
+---
+title: "feat: Add /ce:polish skill for human-in-the-loop refinement before merge"
+type: feat
+status: active
+date: 2026-04-15
+---
+
+# feat: Add `/ce:polish` skill for human-in-the-loop refinement before merge
+
+## Overview
+
+Add a new workflow skill at `plugins/compound-engineering/skills/ce-polish/SKILL.md` that implements the "polish phase" — a human-in-the-loop refinement step that runs AFTER `/ce:review` (tests + review green) and BEFORE merge. Polish is the second of two human-in-the-loop moments in an otherwise-automated flow; the first is `/ce:brainstorm` (WHAT to build). Polish answers: *does this feel right to a real user?*
+
+The skill accepts a PR number, URL, or branch name (blank → current branch), verifies that review has already completed successfully, merges latest `main` into the branch with the user's confirmation, starts a local dev server from a user-authored `.claude/launch.json` (with per-framework auto-detect as a fallback), opens the app in the host IDE's built-in browser when available (Claude Code desktop, Cursor, soon Codex) and falls back to printing the URL otherwise, generates an end-user-testable checklist from the diff and PR body, and dispatches polish sub-agents (design iterators, frontend race reviewers, simplicity reviewers) to fix issues the human flags. If the polish batch exceeds one "focus area" (more than one component, cross-cutting files, or cannot be tested as a single user flow), the skill refuses to batch-fix and emits a stacked-PR hand-off artifact.
+
+Ship as `ce:polish-beta` first per the beta-skills framework; promote to stable after usage feedback.
+
+## Problem Frame
+
+The compound-engineering plugin automates most of the development flow end-to-end (`/ce:ideate → /ce:brainstorm → /ce:plan → /ce:work → /ce:review`). Today there is no structured step between a green review and merge. Two gaps result:
+
+1. **Craft/UX is never experienced as an end user.** Review catches correctness, security, and structural issues. It does not catch "this animation is janky," "the empty state is ugly," or "this response feels slow." A human has to use the feature to notice those.
+2. **Polish work accidentally becomes scope creep.** When a human does sit down to polish, it's easy to keep adding to the same PR until it's too large to understand or review again — and the polish never ships cleanly.
+
+Polish needs its own shaped step: bounded, human-driven, but automation-assisted for the fixes themselves. It also needs an explicit size gate so polish tasks that outgrow the PR get split into stacked PRs rather than bloating the original.
+
+The transcript that motivated this plan frames polish as "the second human-in-the-loop moment" — deliberately paired with brainstorm on either end of an automated middle.
+
+## Requirements Trace
+
+From the feature description (10 deliverables):
+
+- **R1.** Command lives as a skill at `plugins/compound-engineering/skills/ce-polish-beta/SKILL.md` with frontmatter `name`, `description`, `argument-hint`, `disable-model-invocation: true` — matching the canonical `ce:review` / `ce:work` / `ce:brainstorm` shape under the beta-first convention (promoted to `skills/ce-polish/` in a follow-up PR).
+- **R2.** Skill SKILL.md structured for progressive disclosure: body under ~500 lines, per-framework dev-server recipes and checklist/dispatch templates extracted to `references/`, deterministic classifiers in `scripts/`.
+- **R3.** `$ARGUMENTS` parses PR number, PR URL, branch name, or blank → current branch, plus named tokens that strip before the target is interpreted: `mode:headless` (machine envelope for LFG/pipelines) and `trust-fork:1` (explicit fork-PR trust override). Additional tokens (`mode:report-only`, `mode:autonomous`) are deferred to follow-up PRs so the surface stays honest about what's actually implemented.
+- **R4.** Dev-server lifecycle is config-driven with auto-detect fallback. Primary source is `.claude/launch.json` at the repo root (Claude Code's launch-config convention); when absent or incomplete, fall back to per-framework auto-detection (Rails / Next.js / Vite / Procfile / Overmind) and offer to write a minimal `launch.json` stub the user can confirm and save for future runs. Kill and restart surface the PID and log path so the user can reclaim control.
+- **R4b.** When running inside an IDE with an embedded browser (Claude Code desktop, Cursor, future Codex), open the polish URL in that browser; otherwise print the URL for the user to open manually. Detection is best-effort and non-blocking — failure to detect the IDE always falls through to printing the URL.
+- **R5.** Skill refuses to polish untested or unreviewed work, based on two signals: the latest `.context/compound-engineering/ce-review/<run-id>/` artifact's verdict, plus `gh pr checks` green.
+- **R6.** Test checklist is generated from the diff, PR body, and (if available) the plan referenced via `plan:<path>` — never by asking the human "what would you like to test?".
+- **R7.** Polish sub-agents are dispatched via fully qualified names (`compound-engineering:design:design-iterator`, `compound-engineering:review:julik-frontend-races-reviewer`, etc.). Dispatch is sequential below 5 items, parallel above — with the invariant that items touching the same file path never run concurrently.
+- **R8.** A "too big" detector operates on two tiers. Per-item: items exceeding file-count, cross-surface, or diff-line thresholds are refused and routed to a stacked-PR hand-off artifact. Per-batch: when the overall polish run shows the PR as a whole is too large (majority-oversized items, repeated `replan` actions from the user, or a preemptive diff-size probe before checklist generation), polish escalates to re-planning — writes a `replan-seed.md` pointing back to the originating brainstorm/plan and routes the user to `/ce:plan` or `/ce:brainstorm`. The size gate at both tiers is load-bearing, not decoration.
+- **R9.** `/ce:polish` slots between `/ce:review` and `/git-commit-push-pr` in the workflow. `/ce:work` Phase 3 offers polish as a next step after `/ce:review` completes. `mode:headless` variant exists so LFG and future pipelines can chain it.
+- **R10.** Feature branch for this work: `feat/ce-polish-command`. No release-owned versions bumped in the PR.
+
+## Scope Boundaries
+
+**In scope:**
+- New beta skill `skills/ce-polish-beta/` (promoted to `skills/ce-polish/` in a follow-up PR per the beta-skills framework)
+- `.claude/launch.json` reader + auto-detect fallback + stub-writer; per-framework dev-server recipes (Rails, Next.js/Node, Vite, Procfile/Overmind) as the fallback path
+- IDE detection (Claude Code, Cursor, future Codex) for embedded-browser handoff; progressive enhancement, never a gate
+- Edit-file-then-ack human interaction loop via `.context/compound-engineering/ce-polish/<run-id>/checklist.md`
+- Two-tier size gate: per-item (stacked-PR seed) and per-batch (replan escalation back to `/ce:plan` or `/ce:brainstorm`)
+- Fork-PR trust boundary check at the entry gate (requires `trust-fork:1` token for cross-repository PRs)
+- Reuse of `resolve-base.sh` (duplicated into the new skill's `references/`, per the "no cross-directory references" rule)
+- Sub-agent orchestration of existing design and review agents — no new agents created in this PR
+- README.md component count update (author edit, not release-owned)
+
+**Out of scope:**
+- Creating a new "copy/microcopy polish" sub-agent — out of scope; surfaced as a future consideration. Copy polish folds into the `design-iterator` loop for v1.
+- Modifying `/ce:work` or `/ce:review` to automatically chain into `/ce:polish`. The first release is manually invoked after `/ce:review`. Automatic chaining belongs in a follow-up PR once beta usage proves the shape.
+- Version bumps in `plugins/compound-engineering/.claude-plugin/plugin.json` or `.claude-plugin/marketplace.json`, or manual `CHANGELOG.md` entries — release-please automation owns these (per `plugins/compound-engineering/AGENTS.md`).
+- Adding a web UI / browser-extension annotation layer for polish note-taking. The transcript mentions annotating in the browser; in v1, notes are captured as plain prose input to the skill, which then dispatches fixes. Browser-side annotation is a follow-up.
+
+## Context & Research
+
+### Relevant Code and Patterns
+
+- **Skill-as-slash-command pattern:** Since v2.39.0, former `/command-name` slash commands live under `plugins/compound-engineering/skills/<command-name>/SKILL.md` (see `plugins/compound-engineering/AGENTS.md`). No `commands/` directory exists. Polish follows this pattern.
+- **Argument parsing (token-based):** `plugins/compound-engineering/skills/ce-review/SKILL.md:19-33` defines the canonical `mode:*`, `base:*`, `plan:*` token-stripping pattern. Polish adopts it verbatim for future extensibility.
+- **Frontmatter for interactively-invocable workflow skills:** `plugins/compound-engineering/skills/ce-review/SKILL.md:1-5` and `plugins/compound-engineering/skills/ce-work/SKILL.md:1-5` — `name: ce:<verb>`, description with natural-language trigger phrases, `argument-hint`, no `disable-model-invocation` for stable workflow skills.
+- **Beta-first convention:** `plugins/compound-engineering/skills/ce-work-beta/` shows the beta pattern. Frontmatter: `name: ce:<verb>-beta`, description prefixed `[BETA]`, `disable-model-invocation: true`. Convention documented in `docs/solutions/skill-design/beta-skills-framework.md`.
+- **Branch / PR acquisition:** `plugins/compound-engineering/skills/ce-review/SKILL.md:184-267` — clean-worktree check via `git status --porcelain`, then `gh pr checkout <n>` for PRs, `git checkout <branch>` for branches, shared `resolve-base.sh` helper for base-branch resolution.
+- **Port detection cascade:** `plugins/compound-engineering/skills/test-browser/SKILL.md:97-143` — CLI flag → `AGENTS.md`/`CLAUDE.md` → `package.json` dev-script → `.env*` → default `3000`. Polish reuses this cascade as-is.
+- **Review artifact location and envelope:** `plugins/compound-engineering/skills/ce-review/SKILL.md:509-516` (headless envelope exposes `Artifact: .context/compound-engineering/ce-review/<run-id>/`) and `SKILL.md:675-680` (what's written). Polish reads this to gate entry.
+- **Scratch space convention:** `.context/compound-engineering/<workflow>/<run-id>/` with `RUN_ID=$(date +%Y%m%d-%H%M%S)-$(head -c4 /dev/urandom | od -An -tx1 | tr -d ' ')`. Used by ce-review, ce-optimize, ce-plan-deepening.
+- **Sub-agent dispatch:** `plugins/compound-engineering/skills/resolve-pr-feedback/SKILL.md:135-164` is the canonical parallel-dispatch pattern. `plugins/compound-engineering/skills/ce-review/references/subagent-template.md` is the canonical sub-agent prompt shape. Fully qualified names mandatory; omit `mode` on tool calls to honor user permission settings.
+- **Polish-relevant existing agents:** `agents/design/design-iterator.md`, `agents/design/design-implementation-reviewer.md`, `agents/design/figma-design-sync.md`, `agents/review/code-simplicity-reviewer.md`, `agents/review/maintainability-reviewer.md`, `agents/review/julik-frontend-races-reviewer.md`. All referenced via fully qualified `compound-engineering:<category>:<name>`.
+- **Complexity / focus-area heuristic:** `plugins/compound-engineering/skills/ce-work/SKILL.md:36-42` (Trivial / Small / Large matrix) and `plugins/compound-engineering/skills/ce-work/references/shipping-workflow.md:25-30, 108-112` (Tier 1 single-concern criteria). Polish's "too big" detector extends these.
+- **Mode detection and headless envelope:** `plugins/compound-engineering/skills/ce-review/SKILL.md:36-72` — the mode table, the headless rules, and the terminal `Review complete` signal. Polish mirrors this shape with `Polish complete`.
+
+### Institutional Learnings
+
+- **`docs/solutions/skill-design/git-workflow-skills-need-explicit-state-machines-2026-03-27.md`** — Branch/PR-switching skills must be modeled as explicit state machines and re-probe at each transition. Polish re-reads `git branch --show-current`, server PID, and PR number after every checkout or kill. Never carries earlier values forward in prose.
+- **`docs/solutions/skill-design/compound-refresh-skill-improvements.md`** — Question-before-evidence is an anti-pattern. Polish generates the test checklist *before* asking the human what to test; the human edits the generated list rather than authoring it from scratch. All confirmations include concrete command/port/PID so the human can judge without a follow-up.
+- **`docs/solutions/skill-design/pass-paths-not-content-to-subagents-2026-03-26.md`** — Orchestrator hands paths to sub-agents; sub-agents do their own reads. Polish passes the diff file list, the review artifact path, and the PR number — never inlined diff content.
+- **`docs/solutions/best-practices/codex-delegation-best-practices-2026-04-01.md`** — ~5-7 unit crossover for parallel dispatch; "never split units that share files." Polish goes sequential below 5 items, parallel above, with the same-file collision guard.
+- **`docs/solutions/skill-design/script-first-skill-architecture.md`** — Deterministic classification (project-type, file-to-surface mapping, oversize detection) belongs in bundled scripts, not the model. 60-75% token reduction.
+- **`docs/solutions/workflow/todo-status-lifecycle.md`** — Status fields only have value when a downstream consumer branches on them. Polish's `status: {manageable | oversized}` per-item field is load-bearing — the dispatcher branches on it (`manageable` → fix, `oversized` → stacked-PR seed).
+- **`docs/solutions/developer-experience/branch-based-plugin-install-and-testing-2026-03-26.md`** — Shared checkout can't serve two branches. If the user is already on a worktree for the target PR, attach; do not silently re-checkout the primary.
+- **`docs/solutions/skill-design/beta-skills-framework.md`** + `.../ce-work-beta-promotion-checklist-2026-03-31.md` — New workflow skills ship first as `-beta` with `disable-model-invocation: true`. Promotion later requires updating every caller in the same PR.
+
+### External References
+
+None required. Repo patterns and institutional learnings cover every decision; no external framework behavior is in dispute. (For cross-platform "kill process by port," `lsof -i :$PORT -t | xargs -r kill` is portable across macOS/Linux; documented inline in the dev-server reference file.)
+
+## Key Technical Decisions
+
+- **Ship as beta first (`skills/ce-polish-beta/`, `name: ce:polish-beta`).** Polish is a new human-in-the-loop workflow skill with multiple novel patterns (dev-server lifecycle, CI-check verification, checklist generation, stacked-PR hand-off). Per `beta-skills-framework.md`, new workflow skills ship beta first with `disable-model-invocation: true`. Promote to `ce:polish` in a follow-up PR once real usage validates the shape. *Rationale: every novel pattern listed below could miss on first design; beta contains blast radius and signals "this shape is not final yet."*
+- **Follow `ce:review`'s token-based argument parsing, not `ce:work`'s `<input_document>` wrapper.** Polish needs structured flags (`mode:*`, eventually `focus:*`, `skip-server-restart`) combined with a free-form target (PR/branch/blank). `ce:review`'s table-based token stripping is the right pattern. *Rationale: pattern already proven in the plugin's most-flag-rich skill.*
+- **Config-first dev-server, `.claude/launch.json` as primary source.** Polish reads `.claude/launch.json` at the repo root first. Schema: VS Code-compatible `version` + `configurations[]` array, each entry with `name`, `runtimeExecutable`, `runtimeArgs`, `port`, `cwd`, `env`. If multiple configurations exist, ask the user to pick. If no `launch.json` exists, fall back to per-framework auto-detect. If auto-detect succeeds, offer to write a minimal `launch.json` stub back to disk so future runs are deterministic. *Rationale: user-authored config is a cleaner trust boundary than auto-executing `bin/dev` from a checked-out branch, piggybacks on a standard Claude Code / VS Code / Cursor users are already adopting, and eliminates detection ambiguity on monorepos or unusual project layouts. Standard is not fully unified across IDEs yet — we lead with `.claude/launch.json` because it's the Claude Code native path; users on other IDEs can still author it.*
+- **Reuse `test-browser`'s port-detection cascade as the auto-detect fallback.** When `launch.json` is absent, cascade: CLI flag → `AGENTS.md`/`CLAUDE.md` → `package.json` dev-script → `.env*` → default `3000`. Do not invent a new cascade. *Rationale: consistency across the plugin, and the cascade already handles the long tail of project conventions when the user hasn't authored explicit config.*
+- **IDE-aware browser handoff.** After the server is reachable, probe for the host IDE via environment variables (`CLAUDE_CODE`, `CURSOR_TRACE_ID`, `TERM_PROGRAM=vscode`, future Codex signals). If running inside an IDE with an embedded browser, emit an open-in-browser instruction the IDE understands; otherwise print `http://localhost:<port>` in the interactive summary. Detection failure is silent — always fall through to printing the URL. *Rationale: polish is inherently iterative, and a built-in browser keeps the loop inside the editor. But IDE detection is a moving target across tools, so treat it as progressive enhancement, never a gate.*
+- **Kill-by-port uses `lsof -i :$PORT -t | xargs -r kill`, gated behind user confirmation.** Portable across macOS/Linux. The confirmation step is mandatory — the plugin's posture everywhere else is "ask the user to do environment setup" (see `test-browser` which tells the user to start the server manually rather than starting it itself). Polish breaks this posture only with explicit consent, and only for the kill step; the start step also asks before executing. *Rationale: destructive action on user's local processes; user consent is non-negotiable.*
+- **Start dev server via background task with PID + log-path reported.** Use the platform's `run_in_background` + Monitor equivalent (in Claude Code: `Bash(..., run_in_background=true)`), capture PID, and print the log tail file path so the user can `tail -f` it themselves. *Rationale: dev servers outlive the polish run; the user must be able to reclaim control.*
+- **Entry gate reads the latest `ce-review` artifact, not CI alone.** Polish looks at `.context/compound-engineering/ce-review/*/` sorted by mtime; requires verdict `Ready to merge` or `Ready with fixes`. *Additionally* runs `gh pr checks <pr> --json bucket,state` for CI green signal. If either gate fails, refuse with clear routing message ("run `/ce:review` first" or "wait for CI"). *Rationale: the review artifact is the canonical "review done" signal in the plugin; CI green is the canonical "tests passed" signal. Both are required.*
+- **Merge `main` back into the branch with user confirmation, not rebase.** `git fetch origin && git merge origin/<base>` after clean-worktree check. Merge, not rebase, because polish operates on a PR that may already have external review comments tied to commits — rebasing orphans those. *Rationale: preserve review-thread anchoring.*
+- **Test checklist generation happens in the model with a bundled prompt template; classification (file → surface, item → oversized) happens in scripts.** The checklist is a judgment artifact (what's worth experiencing as a user); classification is deterministic. Split accordingly per `script-first-skill-architecture.md`.
+- **Sub-agent selection via deterministic rules + diff signal.** Script inspects the diff and emits a proposed agent set: design agents if `.erb`/`.tsx`/`.vue`/`.svelte`/`.css`/`.scss` files changed; frontend-races reviewer if `stimulus`/`turbo`/`hotwire` or async JS patterns detected; simplicity/maintainability reviewer for all polish runs as a sanity pass. *Rationale: agents-as-personas pattern matches `ce:review`; the orchestrator doesn't guess.*
+- **Size gate is load-bearing.** Each checklist item carries `status: {manageable | oversized}`. The dispatcher branches: `manageable` → dispatch a fix sub-agent; `oversized` → refuse to fix, write a stacked-PR seed to `.context/compound-engineering/ce-polish/<run-id>/stacked-pr-<n>.md`, and emit guidance to the user with a proposed branch name. *Rationale: without branching consumption, size gates rot into decoration (per `todo-status-lifecycle.md`).*
+- **Worktree-aware checkout.** Before `gh pr checkout`, probe `git worktree list --porcelain` for the PR branch. If found, attach (cd into the worktree) rather than switching the user's primary checkout. *Rationale: silent branch switches on a running server + shared checkout are one of the more painful ways this could misbehave (per `branch-based-plugin-install-and-testing`).*
+- **`mode:headless` support from v1.** Emit structured completion envelope with `Polish complete` terminal signal, artifact path, and pending-stacked-PR list — mirroring `ce:review` headless. *Rationale: LFG and future pipelines need a machine-consumable completion shape; retrofitting later is harder than building it in.*
+
+## Open Questions
+
+### Resolved During Planning
+
+- *Should polish ship as stable or beta first?* **Beta (`ce:polish-beta`).** Resolved via `beta-skills-framework.md` learning — multiple novel patterns warrant beta containment. Promotion follow-up PR will flip the name and update callers.
+- *Where does polish verify "review done"?* Latest `.context/compound-engineering/ce-review/<run-id>/` artifact verdict + `gh pr checks`. Both must pass.
+- *Does polish itself manage the dev server, or ask the user to?* Polish manages it (kill + restart) with user confirmation at each step. This is a deliberate posture break from `test-browser`, justified because polish is inherently a tight iterate-and-see loop where manual server juggling is the thing polish exists to eliminate.
+- *Rebase or merge when pulling latest main?* Merge. Rebasing would orphan existing PR review-thread anchors.
+- *What agents does polish dispatch?* Existing design and review agents (`design-iterator`, `design-implementation-reviewer`, `figma-design-sync`, `code-simplicity-reviewer`, `maintainability-reviewer`, `julik-frontend-races-reviewer`). No new agents in this PR.
+- *When sub-agents run in parallel, how are file-collision-prone items handled?* Items touching overlapping file paths always run sequentially regardless of total count. The dispatcher groups items by file-path intersection before deciding parallel vs sequential.
+
+### Deferred to Implementation
+
+- *Exact file-count / line-count thresholds for "oversized."* The classifier script should start conservative (e.g., >5 distinct file paths, or >2 distinct surface categories, or >300 diff lines for a single polish item) and be tuned after first beta runs. Don't pretend the thresholds are precisely right at plan time.
+- *Exact format of the stacked-PR seed artifact.* Minimum: target branch name suggestion, description seed, file list, references to the review artifact. Detailed schema belongs in implementation once the downstream consumer (a future `/ce:stack-pr`?) is clearer.
+- *Which log-tail strategy on each platform.* Rails `bin/dev` writes to stdout; Next.js `npm run dev` to stdout; Procfile/Overmind to overmind socket. Specific tail capture belongs in per-framework `references/dev-server-*.md`.
+- *Whether `/ce:work` should auto-chain into `/ce:polish` after review completes.* Deferred to a follow-up PR. First release is manually invoked; chain integration after beta usage signals the shape is right.
+- *What happens if the user is in a git worktree but the PR is not checked out in any worktree.* Recommended behavior is "offer `git worktree add`" but the UX needs to be designed during implementation with an actual worktree scenario to trigger against.
+
+## High-Level Technical Design
+
+> *This illustrates the intended approach and is directional guidance for review, not implementation specification. The implementing agent should treat it as context, not code to reproduce.*
+
+### State machine
+
+```mermaid
+flowchart TB
+    A[Start: parse args] --> B{Target provided?}
+    B -->|PR number/URL| C[gh pr view + worktree probe]
+    B -->|Branch name| D[git checkout]
+    B -->|Blank| E[Use current branch]
+    C --> F{Review artifact green?}
+    D --> F
+    E --> F
+    F -->|No| FAIL1[Refuse: run /ce:review first]
+    F -->|Yes| G{CI checks green?}
+    G -->|No| FAIL2[Refuse: wait for CI]
+    G -->|Yes| H[Ask: merge main?]
+    H -->|Confirm| I[git merge origin/base]
+    H -->|Skip| LJ{launch.json exists?}
+    I --> LJ
+    LJ -->|Valid single config| K[Use config]
+    LJ -->|Valid multi config| LJP[Ask: which config?]
+    LJP --> K
+    LJ -->|Invalid JSON| FAIL4[Refuse: fix launch.json]
+    LJ -->|Missing| J[Auto-detect project type]
+    J --> JP[Detect port cascade]
+    JP --> JS[Ask: save as launch.json?]
+    JS --> K
+    K --> L[Ask: kill existing server?]
+    L -->|Confirm| M[lsof kill + start background]
+    L -->|Skip| N{Server already reachable?}
+    M --> IDE[Probe IDE env vars]
+    N -->|Yes| IDE
+    N -->|No| FAIL3[Refuse: no server]
+    IDE --> PRE{Preemptive size probe > 30 files or 1000 lines?}
+    PRE -->|Yes| REPLAN1[Write replan-seed; route to /ce:plan or /ce:brainstorm]
+    PRE -->|No| O[Generate checklist + open in IDE browser or print URL]
+    O --> P[Size gate classification per item]
+    P --> MAJ{Majority items oversized?}
+    MAJ -->|Yes| REPLAN2[Write replan-seed; ask continue / replan / rethink]
+    MAJ -->|No| Q{Any items oversized?}
+    Q -->|Yes| R[Write stacked-PR seeds + warn]
+    Q -->|No| S[Present checklist to human]
+    R --> S
+    REPLAN2 -->|continue subset| S
+    S --> T[Human edits checklist.md, replies ready/done/cancel]
+    T --> U{Any items action=fix?}
+    U -->|No| Z[Write polish summary]
+    U -->|action=replan detected| REPLAN3[Escalate to re-plan]
+    U -->|Yes| V[Group by file collision]
+    V --> W[Dispatch fix sub-agents]
+    W --> WX[Rewrite checklist.md with results]
+    WX --> T
+    Z --> END[Polish complete envelope]
+    REPLAN1 --> END
+    REPLAN2 -->|halt| END
+    REPLAN3 --> END
+```
+
+### Skill directory shape
+
+```
+skills/ce-polish-beta/
+├── SKILL.md                              # <500 lines, orchestrator logic
+├── references/
+│   ├── resolve-base.sh                   # duplicated from ce-review per no-cross-dir rule
+│   ├── launch-json-schema.md             # .claude/launch.json schema + stub template
+│   ├── ide-detection.md                  # env-var probe table for Claude/Cursor/Codex
+│   ├── dev-server-detection.md           # port cascade (duplicated from test-browser)
+│   ├── dev-server-rails.md               # bin/dev, Procfile.dev, port conventions (fallback)
+│   ├── dev-server-next.md                # npm run dev, turbopack flags (fallback)
+│   ├── dev-server-vite.md                # vite dev, --host, --port (fallback)
+│   ├── dev-server-procfile.md            # overmind, foreman, socket handling (fallback)
+│   ├── checklist-template.md             # prompt scaffold for checklist generation
+│   ├── subagent-dispatch-matrix.md       # file-pattern -> agent-type rules
+│   ├── stacked-pr-seed-template.md       # format for oversized-item hand-offs
+│   └── replan-seed-template.md           # format for batch-level replan escalation
+├── scripts/
+│   ├── detect-project-type.sh            # signature-file glob -> type string
+│   ├── read-launch-json.sh               # .claude/launch.json parser w/ sentinels
+│   ├── extract-surfaces.sh               # diff -> file:surface JSON
+│   ├── classify-oversized.sh             # per-item -> {manageable|oversized}
+│   └── parse-checklist.sh                # edited checklist.md -> action JSON
+```
+
+### Headless completion envelope (mirrors ce:review)
+
+```
+Polish complete (headless mode).
+
+Scope: <pr-or-branch>
+Review artifact: <path-to-ce-review-run-dir>
+Dev server: <pid> on :<port> (logs: <path>)
+IDE browser: <opened-in:claude-code|cursor|none>
+Checklist items: <n> total (<k> fixed, <m> skipped, <j> stacked, <r> replan)
+Stacked PRs: <list-or-none>
+Replan seed: <path-or-none>
+Escalation: <none|replan-suggested|replan-required>
+Artifact: .context/compound-engineering/ce-polish/<run-id>/
+
+Polish complete
+```
+
+## Implementation Units
+
+- [ ] **Unit 1: Skill skeleton, frontmatter, and argument parsing**
+
+  **Goal:** Create `skills/ce-polish-beta/SKILL.md` with frontmatter, argument-parsing table, mode detection, and input-triage phase that lands at the entry gate without attempting any state changes.
+
+  **Requirements:** R1, R2, R3, R10
+
+  **Dependencies:** None
+
+  **Files:**
+  - Create: `plugins/compound-engineering/skills/ce-polish-beta/SKILL.md`
+  - Test: `tests/fixtures/sample-plugin/skills/ce-polish-beta/SKILL.md` (fixture for converter tests) and converter coverage in `tests/converter.test.ts`
+
+  **Approach:**
+  - Frontmatter: `name: ce:polish-beta`, description starts `[BETA] ...`, `argument-hint: "[PR number, PR URL, branch name, or blank for current branch]"`, `disable-model-invocation: true`.
+  - Parse `$ARGUMENTS` via `ce:review`-style token table: `mode:headless`, `trust-fork:1`. Strip tokens, interpret remainder as PR number / URL / branch / blank. (`mode:report-only` and `mode:autonomous` are deferred — add in a follow-up PR once a downstream consumer needs them.)
+  - Conflicting mode token detection — stop and emit an envelope mirror of `ce:review` Stage 6.
+  - Phase 0 (Input Triage) only for this unit; later units extend with behavior.
+
+  **Patterns to follow:**
+  - Frontmatter: `plugins/compound-engineering/skills/ce-review/SKILL.md:1-5`
+  - Argument table: `plugins/compound-engineering/skills/ce-review/SKILL.md:19-33`
+  - Beta skill posture: `plugins/compound-engineering/skills/ce-work-beta/SKILL.md` frontmatter
+  - Cross-platform tool-selection rules: `plugins/compound-engineering/AGENTS.md` section on tool selection
+
+  **Test scenarios:**
+  - Happy path: `$ARGUMENTS="123"` → parsed as PR number 123, no mode flags.
+  - Happy path: `$ARGUMENTS=""` → parsed as "use current branch".
+  - Happy path: `$ARGUMENTS="mode:headless 123"` → headless mode, PR 123.
+  - Happy path: `$ARGUMENTS="https://github.com/foo/bar/pull/42"` → parsed as PR URL 42.
+  - Edge case: `$ARGUMENTS="feat/my-branch"` → parsed as branch name.
+  - Happy path: `$ARGUMENTS="trust-fork:1 123"` → trust-fork flag set, PR 123; fork-PR check in Unit 3 will honor it.
+  - Error path: `$ARGUMENTS="mode:headless mode:autonomous"` → unknown-mode-token envelope (only `mode:headless` is implemented in v1), no further dispatch.
+  - Integration: converter test confirms the skill is discovered and YAML frontmatter parses under `install --to opencode` and `install --to codex` without the colon-unquoting bug (see `plugin.compound-engineering/AGENTS.md` YAML rule).
+
+  **Verification:** Invoking `/ce:polish-beta` with no arguments prints the parsed target and exits cleanly at end of Phase 0 without attempting checkout, server work, or sub-agent dispatch.
+
+- [ ] **Unit 2: Branch / PR acquisition with worktree awareness**
+
+  **Goal:** Check out the requested PR or branch safely. Probe for an existing worktree; attach rather than re-checkout when possible. Refuse with a clear message when the working tree is dirty.
+
+  **Requirements:** R3, R4
+
+  **Dependencies:** Unit 1
+
+  **Files:**
+  - Modify: `plugins/compound-engineering/skills/ce-polish-beta/SKILL.md` (new phase)
+  - Create: `plugins/compound-engineering/skills/ce-polish-beta/references/resolve-base.sh` (copied from `plugins/compound-engineering/skills/ce-review/references/resolve-base.sh` verbatim)
+  - Test: extend `tests/converter.test.ts` to confirm the duplicated script is included in the skill's output tree on conversion.
+
+  **Approach:**
+  - Clean-worktree probe via `git status --porcelain`. Non-empty → emit the same message `ce-review` uses; do not proceed.
+  - For PR number/URL: `gh pr view <n> --json url,headRefName,baseRefName,headRepositoryOwner,state,mergeable`, then `git worktree list --porcelain` and grep for the head branch. If present in a worktree, cd into that worktree's path and announce the attach. Otherwise `gh pr checkout <n>`.
+  - For branch name: same worktree probe, then `git checkout <branch>` if not in a worktree.
+  - For blank: use current branch, run `resolve-base.sh` to find the base.
+  - Re-read `git branch --show-current` after any checkout (state-machine discipline from `git-workflow-skills-need-explicit-state-machines`).
+
+  **Patterns to follow:**
+  - Branch/PR acquisition block: `plugins/compound-engineering/skills/ce-review/SKILL.md:184-267`
+  - State-machine discipline: `docs/solutions/skill-design/git-workflow-skills-need-explicit-state-machines-2026-03-27.md`
+
+  **Test scenarios:**
+  - Happy path: clean worktree, PR number provided, PR not in any worktree → `gh pr checkout` executes, branch matches `headRefName`.
+  - Happy path: clean worktree, PR number provided, PR already in a worktree at `../polish-pr-123` → attach (print worktree path), no `gh pr checkout`.
+  - Edge case: dirty worktree → emit uncommitted-changes message, exit without checkout.
+  - Edge case: PR state is `MERGED` or `CLOSED` → emit "PR not open, nothing to polish" and exit.
+  - Error path: `gh pr view` fails because `gh` is not authenticated → surface the actual error to the user; do not swallow (per AGENTS.md "no error suppression" rule).
+  - Integration: running the skill on a PR branch already checked out via `gh pr checkout` earlier should re-confirm via `git branch --show-current` and proceed without re-checkout.
+
+  **Verification:** The skill never silently switches a user's primary checkout when a worktree for the PR exists, and never proceeds past Phase 1 with a dirty working tree.
+
+- [ ] **Unit 3: Entry gate — fork-PR trust check + review artifact + CI check + merge-main**
+
+  **Goal:** Verify the work is actually ready (and safe) to polish before taking any further action. Refuse cleanly if the PR is from a fork without explicit trust, if review is not green, or if CI is failing. Offer to merge latest `main` in with user confirmation.
+
+  **Requirements:** R5, R10
+
+  **Dependencies:** Unit 2
+
+  **Files:**
+  - Modify: `plugins/compound-engineering/skills/ce-polish-beta/SKILL.md` (new phase)
+  - Modify: `plugins/compound-engineering/skills/ce-review/SKILL.md` — single additive step in the finalize phase: write `metadata.json` alongside the existing synthesized-findings file containing `{branch, head_sha, created_at}`. No other ce-review behavior changes. This is the writer counterpart to polish's SHA-binding reader.
+  - Test: fixture under `tests/fixtures/sample-plugin/.context/compound-engineering/ce-review/20260415-120000-abcd/` with both a "ready to merge" and a "not ready" synthesized-findings file, each with a matching `metadata.json`, to exercise both gate outcomes and the SHA-binding paths. Also include one fixture artifact without `metadata.json` to exercise the pre-metadata.json fallback.
+
+  **Approach:**
+  - **Fork-PR trust check (first, before anything else in this phase):** For PR-number and PR-URL targets, run `gh pr view <n> --json isCrossRepository,headRepositoryOwner,author`. If `isCrossRepository=true`, refuse unless `$ARGUMENTS` contains the explicit token `trust-fork:1`. Refusal message prints the PR author, head repo, and instructions to re-invoke with the trust-fork token. For branch-name and blank targets, skip this check (the user already has the code on disk; they are the trust boundary).
+  - **Branch + SHA binding (before reading the artifact's verdict):** Compute `current_branch = git branch --show-current` and `current_sha = git rev-parse HEAD`. The entry gate must verify that the ce-review artifact it is about to read was produced against **this branch** at **this SHA** or an ancestor SHA. Binding logic:
+    - Read `.context/compound-engineering/ce-review/*/metadata.json` sorted by mtime; pick the newest whose `branch` matches `current_branch`. If none match, emit "No review artifact found for branch `<current_branch>` — run `/ce:review` first." and exit.
+    - If the matching artifact's `head_sha` equals `current_sha`, bind succeeds.
+    - If `current_sha` is a descendant of the artifact's `head_sha` (test: `git merge-base --is-ancestor <artifact_head_sha> <current_sha>`), warn "review covers `<artifact_head_sha>`; you have N additional commits — re-run /ce:review to cover them" and, unless `$ARGUMENTS` contains `accept-stale-review:1`, refuse. Never silently accept a partial-coverage artifact.
+    - If `current_sha` is neither equal to nor a descendant of the artifact's `head_sha` (different branch lineage, force-push, or reset), refuse unconditionally with "review artifact is not an ancestor of HEAD; re-run /ce:review."
+    - `metadata.json` is a small additive file ce-review writes alongside its existing artifact (see Unchanged Invariants — ce-review gains one small additive field, no behavior change). If a pre-metadata.json artifact is the only match, fall back to the mtime-vs-HEAD-commit-time heuristic: if any commit on `current_branch` is newer than the artifact mtime, warn and require `accept-stale-review:1`. The fallback exists for backwards-compatibility during the rollout window and is documented as such — it is not the preferred path.
+  - Read the matching artifact. Parse verdict. Accept `Ready to merge` and `Ready with fixes`; reject `Not ready`.
+  - Run `gh pr checks <pr-or-branch> --json bucket,state --jq '.[] | select(.state != "SUCCESS" and .state != "SKIPPED")'`. Non-empty → "CI not green" and exit (headless mode emits structured failure envelope; interactive offers to wait-and-retry).
+  - Offer "Merge latest `main` into this branch?" via the platform's blocking question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini) with a numbered-options fallback. On confirm: `git fetch origin && git merge origin/<base>` where `<base>` is from `resolve-base.sh`.
+  - Merge conflict → stop, do not attempt resolution; tell the user to resolve manually and re-invoke.
+
+  **Patterns to follow:**
+  - Artifact reading: `plugins/compound-engineering/skills/ce-review/SKILL.md:509-516, 675-680`
+  - Question-tool pattern: `plugins/compound-engineering/AGENTS.md` Cross-Platform User Interaction rules
+  - State-machine: re-read branch after merge.
+
+  **Test scenarios:**
+  - Happy path (fork + trust): PR is from a fork, `trust-fork:1` token present → fork check passes, proceed to review-artifact gate.
+  - Error path (fork without trust): PR is from a fork, no `trust-fork:1` token → refusal message prints PR author + head repo, exits before any server command runs.
+  - Happy path (same-repo): PR is from the same repo (`isCrossRepository=false`) → fork check is a no-op, proceed.
+  - Happy path (SHA binding exact match): artifact's `metadata.json` has `branch: feat/x`, `head_sha: abc123`; current branch `feat/x`, current SHA `abc123` → bind succeeds, proceed to verdict parse.
+  - Happy path (SHA binding ancestor-with-warning-accepted): artifact at `abc123`, current SHA `def456` is a descendant of `abc123`, `accept-stale-review:1` token present → warn "2 commits newer than review," proceed.
+  - Error path (SHA binding ancestor-without-accept): same scenario, no `accept-stale-review:1` → refuse with "re-run /ce:review to cover N additional commits."
+  - Error path (SHA binding diverged): artifact at `abc123`, current SHA `zzz999` on a different lineage (force-push or different branch) → refuse unconditionally.
+  - Error path (branch mismatch): artifact's metadata shows `branch: feat/a`, current branch is `feat/b` → refuse with "no review artifact found for branch `feat/b`."
+  - Happy path (pre-metadata.json fallback): artifact has no `metadata.json` (produced by an older ce-review), artifact mtime is newer than the HEAD commit time → warn but proceed.
+  - Edge case (pre-metadata.json fallback, stale): artifact has no `metadata.json`, HEAD commit is newer than artifact mtime → require `accept-stale-review:1` or refuse.
+  - Happy path: latest artifact says "Ready to merge", `gh pr checks` all `SUCCESS`, user confirms merge → merges cleanly and proceeds.
+  - Happy path: user skips merge-main → proceeds without merging.
+  - Edge case: no review artifact on disk → refuse with routing message.
+  - Edge case: latest review artifact is older than the latest commit on the branch → warn "review may be stale; re-run /ce:review" (don't hard-refuse — the user may have made only polish-intent commits, but flag it).
+  - Error path: `gh pr checks` shows a failing job → refuse with the job name in the error message.
+  - Error path: `git merge origin/<base>` produces a conflict → surface conflict file list, exit without attempting resolution.
+  - Integration: gate messages flow through headless envelope correctly when `mode:headless` is set.
+
+  **Verification:** Running `/ce:polish-beta` on a branch with no review artifact, or with failing CI, exits before touching the dev server or generating any checklist.
+
+- [ ] **Unit 4: Dev-server lifecycle (launch.json-first, auto-detect fallback, IDE browser handoff)**
+
+  **Goal:** Resolve the dev-server start command from `.claude/launch.json` when present; fall back to per-framework auto-detect when absent and offer to write a `launch.json` stub; optionally kill any existing listener on the target port; start the server in the background; detect the host IDE and open the polish URL in its embedded browser when available, otherwise print the URL.
+
+  **Requirements:** R4, R4b
+
+  **Dependencies:** Unit 3
+
+  **Files:**
+  - Modify: `plugins/compound-engineering/skills/ce-polish-beta/SKILL.md` (new phase)
+  - Create: `plugins/compound-engineering/skills/ce-polish-beta/scripts/detect-project-type.sh`
+  - Create: `plugins/compound-engineering/skills/ce-polish-beta/scripts/read-launch-json.sh` — parses `.claude/launch.json`, emits selected configuration as JSON on stdout, or `__NO_LAUNCH_JSON__` / `__INVALID_LAUNCH_JSON__` sentinel on failure
+  - Create: `plugins/compound-engineering/skills/ce-polish-beta/references/launch-json-schema.md` — documents the schema polish reads, the stub template written on fallback, and worked examples for Rails / Next / Vite / Procfile
+  - Create: `plugins/compound-engineering/skills/ce-polish-beta/references/ide-detection.md` — env-var probe table (`CLAUDE_CODE`, `CURSOR_TRACE_ID`, `TERM_PROGRAM`, future Codex signals) and browser-open command per IDE
+  - Create: `plugins/compound-engineering/skills/ce-polish-beta/references/dev-server-detection.md`
+  - Create: `plugins/compound-engineering/skills/ce-polish-beta/references/dev-server-rails.md`
+  - Create: `plugins/compound-engineering/skills/ce-polish-beta/references/dev-server-next.md`
+  - Create: `plugins/compound-engineering/skills/ce-polish-beta/references/dev-server-vite.md`
+  - Create: `plugins/compound-engineering/skills/ce-polish-beta/references/dev-server-procfile.md`
+  - Test: `tests/skills/ce-polish-beta-dev-server.test.ts` — unit tests for `read-launch-json.sh` (valid single-config, valid multi-config, missing file, invalid JSON) and `detect-project-type.sh` (signature tree per framework plus `unknown`).
+
+  **Approach:**
+  - **Step 1 — Resolve the start command, config-first:**
+    - Run `read-launch-json.sh` at the repo root. If it returns a valid configuration object, use it: `runtimeExecutable` + `runtimeArgs` + `port` + `cwd` + `env`. If multiple configurations are defined, ask the user to pick via the platform's blocking question tool.
+    - If it returns `__NO_LAUNCH_JSON__`, fall through to Step 2 (auto-detect).
+    - If it returns `__INVALID_LAUNCH_JSON__`, stop with a clear parse-error message pointing at the file — do not silently fall back; a broken config should be fixed, not worked around.
+  - **Step 2 — Auto-detect fallback when launch.json is absent:**
+    - Script `detect-project-type.sh` inspects signature files: `bin/dev` and `Gemfile` → `rails`; `next.config.js`/`next.config.mjs` → `next`; `vite.config.*` → `vite`; `Procfile` / `Procfile.dev` → `procfile`; otherwise `unknown`.
+    - Port detection: reuse the `test-browser` cascade verbatim (CLI flag → `AGENTS.md`/`CLAUDE.md` → `package.json` dev-script → `.env*` → default `3000`). Duplicate the relevant prose into `references/dev-server-detection.md` (no cross-skill references).
+    - For multi-signature (monorepo-ish): ask the user to disambiguate. For `unknown`: ask the user for the start command explicitly; do not guess.
+  - **Step 3 — Offer to persist launch.json stub (fallback path only):**
+    - Once auto-detect (or user prompt) has produced a working command + port, ask the user: "Save this as `.claude/launch.json` for future runs?" via the platform's blocking question tool. On confirm: render `references/launch-json-schema.md` stub template with the resolved values and write to the repo root. On decline: proceed without writing; future runs will auto-detect again.
+  - **Step 4 — Kill any existing listener on the target port (with consent):**
+    - Ask: "Kill existing listener on port `<port>` (PID `<pid>`, command `<name>`)?" with `AskUserQuestion` / numbered-options fallback. On confirm: `lsof -i :$PORT -t | xargs -r kill`; re-probe after 1s; if still listening, `kill -9` with a second confirmation.
+  - **Step 5 — Start server in the background:**
+    - Start via the platform's background-command primitive (`Bash(..., run_in_background=true)` in Claude Code; equivalent elsewhere). For platforms without a background primitive (Codex currently), fall back to asking the user to start the server in another terminal and paste back PID + port.
+    - Redirect stdout+stderr to `.context/compound-engineering/ce-polish/<run-id>/server.log`.
+    - Probe reachability: `curl -sfI http://localhost:<port>` for up to 30s. Print PID, log path.
+  - **Step 6 — Host IDE detection and browser handoff:**
+    - Load `references/ide-detection.md`. Probe env vars in order: `CLAUDE_CODE` (Claude Code desktop), `CURSOR_TRACE_ID` (Cursor), future Codex signal, `TERM_PROGRAM=vscode` (plain VS Code). On a positive match, emit the IDE's open-in-browser instruction for `http://localhost:<port>`. On no match, print the URL in the interactive summary. Detection failure is never fatal.
+
+  **Patterns to follow:**
+  - Port cascade: `plugins/compound-engineering/skills/test-browser/SKILL.md:97-143`
+  - Script-first architecture: `docs/solutions/skill-design/script-first-skill-architecture.md`
+  - Pre-resolution sentinel pattern (for `read-launch-json.sh`): `plugins/compound-engineering/AGENTS.md` pre-resolution exception rule
+  - No error suppression / no shell chaining in SKILL.md bodies (per `plugins/compound-engineering/AGENTS.md`)
+
+  **Test scenarios:**
+  - Happy path (launch.json, single config): `.claude/launch.json` with one Rails configuration → `read-launch-json.sh` returns it, skill uses it verbatim, auto-detect not invoked.
+  - Happy path (launch.json, multi-config): `.claude/launch.json` with `web` + `worker` configurations → skill prompts user to pick before proceeding.
+  - Happy path (no launch.json, Rails auto-detect): fixture with `bin/dev` + `Gemfile`, no `.claude/launch.json` → auto-detect returns `rails`, skill offers to write stub.
+  - Happy path (stub accepted): auto-detect succeeds, user says yes to "save launch.json?" → file written at `.claude/launch.json` with correct schema, subsequent run uses it without re-prompting.
+  - Happy path (Next.js auto-detect): fixture with `next.config.mjs`, no launch.json → `next` detected.
+  - Happy path (Procfile/Overmind auto-detect): fixture with `Procfile.dev`, no launch.json → `procfile`.
+  - Happy path (IDE detect — Claude Code): `CLAUDE_CODE` env var set → browser-open instruction emitted.
+  - Happy path (IDE detect — Cursor): `CURSOR_TRACE_ID` env var set → Cursor browser-open instruction emitted.
+  - Happy path (IDE detect — terminal): no IDE env vars set → URL printed, no browser-open attempt.
+  - Edge case (invalid launch.json): `.claude/launch.json` exists but is malformed JSON → skill stops with parse-error pointing at file, does not fall back silently.
+  - Edge case (multi-signature auto-detect): `bin/dev` + `next.config.mjs` (monorepo-ish) → skill asks the user to disambiguate.
+  - Edge case (unknown auto-detect): no signatures, no launch.json → skill prompts user for start command.
+  - Error path: port in use, user declines to kill → skill exits cleanly with "cannot continue without dev server."
+  - Error path: kill succeeds but server fails to start within 30s → exit with the log tail printed.
+  - Error path (no background primitive): Codex or other platform without background-command support → skill asks user to start the server manually and paste PID + port.
+  - Integration: server PID/log path propagated into the run artifact so the user can tail logs after the polish run ends; `launch.json` written during a first run is consumed by the next run without re-prompting.
+
+  **Verification:** `launch.json` is the first source checked; auto-detect runs only when it is missing; a user who accepts the stub offer gets a durable config that makes subsequent runs deterministic. For each supported project type, the skill starts a reachable dev server on the correct port and reports PID + log path. When running inside Claude Code / Cursor, the polish URL opens in the embedded browser; elsewhere the URL is printed.
+
+- [ ] **Unit 5: Checklist generation, size gate, and sub-agent dispatch**
+
+  **Goal:** Generate an end-user-testable checklist from the diff + PR body + (optional) plan, classify each item as `manageable` or `oversized`, route `oversized` items to stacked-PR seed files, dispatch polish sub-agents for `manageable` items with file-collision-safe grouping.
+
+  **Requirements:** R6, R7, R8
+
+  **Dependencies:** Unit 4
+
+  **Files:**
+  - Modify: `plugins/compound-engineering/skills/ce-polish-beta/SKILL.md` (new phase — the core of polish)
+  - Create: `plugins/compound-engineering/skills/ce-polish-beta/scripts/extract-surfaces.sh`
+  - Create: `plugins/compound-engineering/skills/ce-polish-beta/scripts/classify-oversized.sh`
+  - Create: `plugins/compound-engineering/skills/ce-polish-beta/scripts/parse-checklist.sh` — parses the edited `checklist.md`, emits JSON array of `{id, action, files, surface, status, notes}`; surfaces parse errors with line numbers on stderr
+  - Create: `plugins/compound-engineering/skills/ce-polish-beta/references/checklist-template.md` — markdown scaffold with per-item schema, field descriptions, and allowed-action list
+  - Create: `plugins/compound-engineering/skills/ce-polish-beta/references/subagent-dispatch-matrix.md`
+  - Create: `plugins/compound-engineering/skills/ce-polish-beta/references/stacked-pr-seed-template.md`
+  - Test: `tests/skills/ce-polish-beta-size-gate.test.ts` — unit tests on `classify-oversized.sh` (manageable + oversized fixture items), on `parse-checklist.sh` (well-formed + malformed files + unknown actions), and on dispatcher branching by action.
+
+  **Approach:**
+  - `extract-surfaces.sh` reads `git diff --name-only <base>...HEAD` and emits JSON mapping each file to one of `{view, controller, model, api, config, asset, test, other}` based on path heuristics (matches `app/views/`, `app/controllers/`, etc. for Rails; `pages/`/`app/` for Next; `src/components/` for Vite).
+  - Model synthesizes the checklist using `references/checklist-template.md` as a scaffold: diff + PR body + plan → list of per-item markdown sections. Each item is a top-level `## Item N — <title>` block with YAML-ish fields: `action:` (default `keep`), `files:`, `surface:`, `status:` (from `classify-oversized.sh`), `notes:` (block scalar). The template explains the allowed `action` values and documents that editing `action` is the only input channel.
+  - `classify-oversized.sh` reads each checklist item's file-path list and returns `status: manageable` or `status: oversized` based on:
+    - >5 distinct file paths, OR
+    - >2 distinct surface categories, OR
+    - >300 lines of diff spanned (sum of `git diff --numstat <base>...HEAD` for the item's files).
+  - Thresholds are explicitly conservative starting points; revisit after beta runs.
+  - For each `oversized` item: write `.context/compound-engineering/ce-polish/<run-id>/stacked-pr-<n>.md` using `references/stacked-pr-seed-template.md`. In the checklist file, oversized items are included but marked `status: oversized` and `action: stacked` (immutable — user editing `action` on an oversized item is rejected on re-read with a pointer to the stacked seed).
+  - **Human interaction loop (edit-file-then-ack):**
+    1. Polish writes `.context/compound-engineering/ce-polish/<run-id>/checklist.md` with all items in their default state (`action: keep` except oversized which are pinned `action: stacked`).
+    2. Polish announces the file path, a short summary of item count and stacked count, the dev-server URL (and whether it was opened in the IDE browser), and exits to the user prompt with one instruction: *"Test the app, edit `action:` on each item to `keep` / `skip` / `fix` / `note`, add prose under `notes:` as needed, then reply `ready` to dispatch or `done` to finish."*
+    3. User edits the file in their editor of choice (the IDE that's open anyway). They may also **add new `## Item N — ...` sections** for anything the generated checklist missed — polish re-runs size classification on added items during the next parse.
+    4. On user reply `ready`: `parse-checklist.sh` reads the file. Unknown action values, malformed YAML-ish fields, or edits to pinned `status: oversized / action: stacked` items produce a structured error — polish prints the error with line number and asks the user to fix the file, does not dispatch.
+    5. On a clean parse, polish dispatches per-action:
+       - `keep` → record in `dispatch-log.json`, no sub-agent
+       - `skip` → record in `dispatch-log.json`, no sub-agent
+       - `fix` → dispatch sub-agent using the item's `notes:` block as the fix directive (per the dispatch matrix rules below)
+       - `note` → record in `dispatch-log.json`, no sub-agent
+       - `stacked` → already handled at classification; never dispatched
+       - `replan` → escalate: this item is bigger than polish can handle. Polish writes `.context/compound-engineering/ce-polish/<run-id>/replan-seed.md` capturing the item's `notes:`, file list, and originating brainstorm/plan path (from `plan:<path>` argument if provided, else `docs/plans/` most recent match). The run halts with a routing message recommending `/ce:plan <path>` to revise the plan or `/ce:brainstorm` to rethink scope.
+  - **Escalation thresholds (batch-level replan):** in addition to the per-item `replan` action, polish auto-suggests (does not auto-execute) batch-level replan when any of these fire:
+    - More than half the generated items are classified `oversized` (the PR as a whole is too large, not just individual items)
+    - More than 3 items are marked `replan` by the user in a single round
+    - The initial diff against base exceeds >30 files or >1000 lines before checklist generation — polish preempts the loop entirely and emits the escalation message before writing `checklist.md`, so the user does not do exploratory testing on a scope that should not have reached polish
+    When any threshold fires, polish writes `replan-seed.md`, pauses the loop, and asks the user via the platform's blocking question tool: (a) continue polishing the subset that is manageable, (b) halt and re-plan via `/ce:plan`, (c) halt and rethink via `/ce:brainstorm`. The user's answer is durable — polish records it in the artifact so later runs do not re-prompt.
+    6. After dispatch, polish rewrites `checklist.md` in place: each previously-`fix` item now shows `result: {fixed | failed}`, a one-line summary, and (for fixed items) a link to the commit SHA or pending diff. All other items retain their prior state. Polish announces the updated file and awaits the next reply.
+    7. On user reply `done`: polish stops the loop, proceeds to Unit 6 (envelope + artifact write).
+    8. On user reply `cancel`: polish stops without dispatching remaining actions, records the partial state in the artifact, proceeds to Unit 6.
+  - Dispatch rules (from `references/subagent-dispatch-matrix.md`):
+    - `asset`/`view` files → `compound-engineering:design:design-iterator`
+    - If a Figma link is in the PR body → also `compound-engineering:design:design-implementation-reviewer`
+    - Async JS / `stimulus_*` / `turbo_*` files → `compound-engineering:review:julik-frontend-races-reviewer`
+    - Every polish run → `compound-engineering:review:code-simplicity-reviewer` + `compound-engineering:review:maintainability-reviewer` as a sanity pass on dispatched items (not a blanket run — only over touched files).
+  - Group `fix`-action items by file-path intersection. Items sharing any file run sequentially in a single agent invocation; disjoint items may run in parallel.
+  - Parallelize only when the number of disjoint `fix` groups is >=5 (crossover rule from `codex-delegation-best-practices`). Below 5, run sequentially — overhead isn't worth it.
+  - **Headless mode behavior:** `mode:headless` cannot use the edit-file-then-ack loop (no human to edit the file). In headless mode, polish generates `checklist.md`, emits the structured envelope with item list and stacked seeds, and exits with `Polish complete` — it does NOT wait for user edits or dispatch fixes. A downstream caller can re-invoke interactively to complete the loop. Document this in Unit 6.
+
+  **Patterns to follow:**
+  - Parallel dispatch: `plugins/compound-engineering/skills/resolve-pr-feedback/SKILL.md:135-164`
+  - Sub-agent template: `plugins/compound-engineering/skills/ce-review/references/subagent-template.md`
+  - Fully qualified agent names: `plugins/compound-engineering/AGENTS.md`
+  - Pass paths not content: `docs/solutions/skill-design/pass-paths-not-content-to-subagents-2026-03-26.md`
+  - Load-bearing status fields: `docs/solutions/workflow/todo-status-lifecycle.md`
+
+  **Test scenarios:**
+  - Happy path (manageable): 3 items, 4 total files across 2 surfaces → all `manageable`, user marks 2 `fix` + 1 `keep`, dispatch sequential (below 5-group crossover).
+  - Happy path (oversized): 1 item touching 8 files across 4 surfaces → `oversized`, stacked-PR seed written, item pinned in checklist.md, user cannot change its action.
+  - Happy path (parallel): 6 disjoint items all marked `fix` → parallel dispatch.
+  - Happy path (edit-ack round-trip): polish writes checklist.md, user changes 2 items to `fix`, replies `ready`, polish dispatches, rewrites checklist.md with results, user replies `done` → clean exit.
+  - Edge case (file collision): 5 items with 2 sharing a file, all `fix` → first 4 run parallel, those 2 serialize into one sub-agent.
+  - Edge case (human-added item oversized): human adds a free-form `## Item N` section that spans many files → size gate re-runs on next parse, item becomes `oversized`, pinned; polish warns.
+  - Edge case (replan action on one item): user marks 1 item `replan` → polish writes replan-seed.md, halts, routes to `/ce:plan`, does not dispatch remaining `fix` items from the same round.
+  - Edge case (batch-level preemptive replan): diff touches 45 files / 1500 lines → polish preempts before checklist generation, writes replan-seed.md, asks continue-subset / halt-for-replan / halt-for-brainstorm.
+  - Edge case (majority-oversized): 5 of 8 generated items classified `oversized` → polish writes replan-seed.md and prompts user for continue-subset / halt.
+  - Edge case (3+ replan actions in one round): user marks 4 items `replan` in one round → polish escalates even though no preemptive signal fired.
+  - Error path (malformed checklist): user introduces an unknown `action:` value or breaks the item header format → parse-checklist.sh reports line number, polish asks user to fix file, does not dispatch.
+  - Error path (editing pinned oversized item): user changes a `status: oversized` item's action to `fix` → parse rejects the edit with pointer to the stacked-PR seed file.
+  - Error path (sub-agent fails): sub-agent fails to produce a fix → recorded as `result: failed` in updated checklist.md, dispatch-log.json captures full error, polish does not retry automatically.
+  - Error path (diff empty): polish invoked with no changes vs base → refuse with "nothing to polish."
+  - Error path (cancel mid-loop): user replies `cancel` after round 1 with fixes in flight → polish stops dispatch, records partial state, proceeds to envelope with partial summary.
+  - Headless: `mode:headless` generates checklist.md, emits envelope with item list + stacked seeds + replan flag if any, exits with `Polish complete` — never waits for user ack, never dispatches.
+  - Integration: checklist + dispatch + artifact writing round-trips through the run artifact; later `/ce:polish` runs on the same PR can see prior run's output.
+
+  **Verification:** For a PR with 4 polish items (1 oversized, 3 manageable sharing one file), the skill writes 1 stacked-PR seed, pins the oversized item in `checklist.md`, the user edits two of the three manageable items to `fix`, polish dispatches them via a single sequential sub-agent invocation (file collision), rewrites `checklist.md` with results, and the user replies `done` — producing a summary record with `fixed: 2`, `kept: 1`, `stacked: 1`, `replanned: 0`. For a PR diff of 50 files touching 5 surfaces, polish preempts before checklist generation and routes the user to `/ce:plan`.
+
+- [ ] **Unit 6: Headless envelope, run artifact, and workflow stitching**
+
+  **Goal:** Emit structured completion envelopes (interactive + headless), write the canonical run artifact, and document where `/ce:polish` slots in the overall workflow.
+
+  **Requirements:** R9
+
+  **Dependencies:** Unit 5
+
+  **Files:**
+  - Modify: `plugins/compound-engineering/skills/ce-polish-beta/SKILL.md` (final phase + workflow-integration prose)
+  - Modify: `plugins/compound-engineering/README.md` — add `ce:polish-beta` to the Skills table; update skill count (note: this is a substantive doc update, not a release-owned count change — it reflects a genuine new file, not a release version bump).
+  - Test: `tests/skills/ce-polish-beta-envelope.test.ts` — snapshot tests for both interactive and headless completion output.
+
+  **Approach:**
+  - Write per-run artifact at `.context/compound-engineering/ce-polish/<run-id>/` with: `checklist.md` (evolves in place across rounds), `dispatch-log.json` (agent assignments + outcomes + classifier decisions for threshold tuning), `stacked-pr-<n>.md` files, `replan-seed.md` (present only when escalation fired), `server.log` (from Unit 4), `summary.md`.
+  - Interactive mode: print a human-readable summary and, if any stacked-PR seeds exist, offer to create them via `gh pr create` in a new branch — or stop and let the user run `/git-commit-push-pr` themselves.
+  - Headless mode: emit the envelope shape from the High-Level Technical Design section, terminal signal `Polish complete`.
+  - Skill prose includes a "Where this fits" section linking to `/ce:review` upstream and `/git-commit-push-pr` downstream. Uses semantic wording ("load the `git-commit-push-pr` skill") per the cross-platform reference rules.
+
+  **Patterns to follow:**
+  - Headless envelope: `plugins/compound-engineering/skills/ce-review/SKILL.md:509-516`
+  - Run artifact shape: `plugins/compound-engineering/skills/ce-review/SKILL.md:675-680`
+  - Cross-platform reference wording: `plugins/compound-engineering/AGENTS.md` Cross-Platform Reference Rules
+
+  **Test scenarios:**
+  - Happy path (interactive): successful polish run ending with 2 fixes and 1 stacked → summary prints correctly, user prompted about stacked PR creation.
+  - Happy path (headless): same scenario in `mode:headless` → envelope matches the documented shape byte-for-byte, `Polish complete` is the last line.
+  - Edge case (0 items fixed): skill exits cleanly, envelope reports `Checklist items: 0 fixed`.
+  - Edge case (only oversized items): skill reports all items stacked, no fixes dispatched, server still started.
+  - Integration: `bun run release:validate` after this unit still passes (no release-owned file changes).
+  - Integration: README skill table includes `ce:polish-beta` with the correct description; `bun test` converter tests pass.
+
+  **Verification:** A consumer of `mode:headless` (e.g., a future LFG chain) can parse the envelope, detect `Polish complete`, and read the artifact path reliably. `README.md` reflects the new skill. `bun run release:validate` passes without release-owned version changes.
+
+## System-Wide Impact
+
+- **Interaction graph:** `/ce:polish-beta` invokes six existing agents (design-iterator, design-implementation-reviewer, figma-design-sync, code-simplicity-reviewer, maintainability-reviewer, julik-frontend-races-reviewer) via sub-agent dispatch. It reads from `/ce:review`'s run-artifact directory and writes to its own. It does not modify any existing skill's behavior; integration with `/ce:work` (auto-chain) is deliberately deferred.
+- **Error propagation:** Gate failures (no review artifact, failing CI, dirty worktree, merge conflict, no dev server) all exit cleanly at the phase boundary with an actionable message. No silent skipping. Sub-agent failures are recorded in the artifact and surfaced to the user; polish never proceeds as if a failed fix succeeded.
+- **State lifecycle risks:** The dev server outlives the polish run. PID + log path must be in the artifact and the final summary. Otherwise the user has no clean way to reclaim or kill the server after the session ends. Worktree state must be re-probed after every checkout (state-machine discipline).
+- **API surface parity:** `mode:headless` envelope shape mirrors `ce:review` so downstream consumers can parse both with the same logic. Future `/ce:polish` (stable) promotion must preserve the envelope exactly.
+- **Integration coverage:** Unit tests alone will not cover the cross-layer behavior of "review artifact + CI check + merge-main + server lifecycle + sub-agent dispatch" as a single flow. Beta usage on a real PR is the integration test for v1.
+- **Unchanged invariants:**
+  - `/ce:review`'s synthesis, finding taxonomy, and headless envelope are unchanged.
+  - `/ce:work`'s shipping workflow is unchanged.
+  - `/git-commit-push-pr` is unchanged.
+  - No existing agents are modified.
+  - No release-owned files (`.claude-plugin/plugin.json`, `.claude-plugin/marketplace.json`, root `CHANGELOG.md`) are touched.
+- **Additive change to `/ce:review` artifact shape:** `/ce:review` gains a small, additive `metadata.json` file per run artifact containing `{branch, head_sha, created_at}`. This is required by Unit 3's SHA-binding entry gate so polish can refuse stale review artifacts. The change is purely additive — existing artifact consumers are unaffected, the written files otherwise keep their current shape, and a fallback path handles pre-metadata.json artifacts via mtime comparison against the HEAD commit time. The `/ce:review` skill edit is scoped to a single write step in its finalize phase and does not alter finding synthesis or envelope output.
+
+## Risks & Dependencies
+
+| Risk | Mitigation |
+|------|------------|
+| Dev-server lifecycle is novel ground; the per-framework recipes will miss edge cases (monorepos, custom scripts, non-standard ports). | Lead with user-authored `.claude/launch.json` — sidesteps detection entirely for users who opt in. Auto-detect remains as fallback. Ship as beta (`ce:polish-beta`) with `disable-model-invocation: true`. `unknown` project type always falls back to asking the user for the start command. Revisit thresholds and recipes after first beta runs. |
+| `.claude/launch.json` is not a fully standardized format across Claude Code / Cursor / VS Code / Codex. Leading with it may surprise users on other IDEs who expect `.vscode/launch.json` or `tasks.json`. | Document the schema polish reads in `references/launch-json-schema.md` with worked examples. On absence, auto-detect still covers most cases. Revisit after beta if a clear cross-IDE standard emerges — the config format can be swapped without touching the rest of the skill. |
+| IDE detection (Claude Code / Cursor / future Codex) is a moving target; env-var signals shift between releases. | Treat IDE detection as progressive enhancement. Detection failure never blocks — always falls through to printing the URL. Encode the env-var table in `references/ide-detection.md` so updates are a single-file change. |
+| A fork PR's checked-out `.claude/launch.json` is attacker-controlled; auto-executing its `runtimeExecutable` + `runtimeArgs` inside the maintainer's shell is arbitrary code execution. | Entry gate probes `gh pr view --json isCrossRepository,headRepositoryOwner`. For fork PRs, refuse by default and require an explicit `trust-fork:1` argument token plus printing the PR author + repo before any server command runs. Document this in Unit 3's entry gate alongside the review-artifact and CI check. |
+| `lsof` kill on a port may terminate a server the user cares about (not the expected dev server). | Always confirm the kill with the user by printing the PID and process name before asking. Never kill without consent. Never use `kill -9` without a second confirmation after a graceful kill fails. |
+| `git merge origin/<base>` may conflict, leaving the branch in a half-merged state. | Exit cleanly on conflict with the conflict file list; do not attempt resolution. User resolves manually and re-invokes. |
+| Silent primary-checkout switches during an active `bin/dev` / `npm run dev` can serve the wrong branch's assets. | Worktree probe before `gh pr checkout`: if PR is already checked out in a worktree, attach. Dev server is always killed+restarted after any checkout before the checklist is presented. |
+| The "oversized" classifier thresholds (>5 files, >2 surfaces, >300 diff lines for per-item; >30 files / >1000 lines for batch preempt) are guesses. Over-triggering creates friction; under-triggering defeats the guard. | Thresholds configurable via the classifier script. Ship conservative defaults; document as "revisit after beta runs." The size gate is load-bearing in the dispatcher, so incorrect thresholds produce visible friction the user will report. The run artifact must record every classifier decision (item file count, surface count, diff-line count, classification result, user override if any) so thresholds can be tuned empirically. |
+| Polish escalates to re-planning (writing `replan-seed.md` and routing to `/ce:plan` or `/ce:brainstorm`) but cannot itself invoke those skills. A user who dismisses the escalation and continues anyway produces work the stacked-PR path cannot safely absorb. | Replan escalation is presented via the platform's blocking question tool with a durable recorded answer. `continue subset` is explicitly offered so the user can proceed on the part that fits polish while acknowledging the replan-seed. The seed file persists and the summary flags it so a later reviewer sees that the user consciously deferred a replan. |
+| Sub-agents running in parallel may collide on file writes. | Dispatcher groups items by file-path intersection; colliding items serialize. No item is ever dispatched to two agents simultaneously. |
+| The skill assumes `.context/compound-engineering/ce-review/` exists. On a fresh clone or a new branch where `/ce:review` has never run, the gate will fail with "no review artifact." | Gate's refusal message explicitly routes the user to `/ce:review` first. No silent fallback. |
+| `gh pr checks` may not return results for a brand-new PR where CI hasn't started yet. | Interactive mode: offer to wait-and-retry with a 30s interval; user can cancel. Headless mode: treat as non-green and emit failure envelope. |
+| Promotion from beta to stable requires updating every orchestration caller in the same PR; missing one leaves stale references. | Implementation Unit 6 catalogs the integration points (`README.md`, future `/ce:work` auto-chain, potential LFG integration). Promotion PR follows the `ce-work-beta-promotion-checklist` precedent. |
+| The human-in-the-loop step pauses automation indefinitely in headless mode if the caller doesn't expect it. | `mode:headless` never prompts interactively; if human judgment is required (oversized items, ambiguous project type, kill confirmation), headless fails fast with a structured "human input required" envelope and does not hang. |
+
+## Security Considerations
+
+`/ce:polish-beta` runs attacker-influenced code (the checked-out branch's dev server, `launch.json`, and diff) inside the maintainer's shell and on a local network port. The individual guardrails are distributed across Units 3-5; this section consolidates the threat model so the boundaries stay explicit as the skill evolves.
+
+| Concern | Trust boundary | Control | Unit |
+|---------|---------------|---------|------|
+| Fork-PR `launch.json` is attacker-authored — its `runtimeExecutable` + `runtimeArgs` run in the maintainer's shell. | Cross-repo PR code is untrusted by default. | Entry gate probes `gh pr view --json isCrossRepository,headRepositoryOwner`. Fork PRs refuse unconditionally unless `trust-fork:1` is passed; the PR author + source repo are printed before any server command runs. Headless mode never auto-trusts a fork. | Unit 3 |
+| `launch.json` from a same-repo branch can still be malicious if the branch was written by a compromised contributor. | User-authored config on a trusted repo is the trust boundary. The user who invokes `/ce:polish-beta` must trust their own repo's branches. | Document the trust model in `references/launch-json-schema.md`. No separate guard — this matches the trust model of any IDE that executes `.vscode/launch.json`. | Unit 4 |
+| Killing a process bound to the project's dev-server port may terminate an unrelated server the user cares about. | User explicit consent required per kill. | Print PID + process name, ask via the platform's blocking question tool; never kill without confirmation; never use `kill -9` without a second confirmation after graceful kill fails; headless mode refuses to kill unless `allow-port-kill:1` is passed. | Unit 4 |
+| Dev server bound to `0.0.0.0` exposes attacker-influenced code to the network. | Dev server should be localhost-only. | All framework recipes and the `launch.json` schema document default to `localhost`/`127.0.0.1` host binding. Reject a configured host of `0.0.0.0` unless the user explicitly overrides. | Unit 4 |
+| Reusing a stale `/ce:review` artifact across branches (e.g., the user ran review on branch A, then checked out branch B and invoked polish) would gate polish on the wrong verdict. | Review artifact is trusted only for the exact SHA it was computed against (and descendants the user acknowledges). | SHA-binding check: `metadata.json` must match current branch and SHA, or be an ancestor with `accept-stale-review:1`, else refuse. Pre-metadata.json fallback uses mtime-vs-commit-time with the same accept-token. | Unit 3 |
+| Artifact files written to `.context/compound-engineering/ce-polish/<run-id>/` may be read by other skills or committed by accident. | Artifacts are local-only, never committed. | `.context/` is already gitignored at repo root; polish never writes outside it. Run IDs are per-run so concurrent invocations cannot interleave. | Unit 6 |
+| Sub-agent dispatch passes user-supplied `notes:` text as fix directives. Malicious notes could attempt prompt injection against the sub-agent. | The user authoring `notes:` is the same user who invoked polish; notes are not an external input. | No separate guard — same trust level as any user-typed directive to the agent. Document that `notes:` is interpreted as a directive in `references/checklist-template.md`. | Unit 5 |
+
+The table is the full surface area: there are no other untrusted inputs into polish beyond (a) fork-PR contents, (b) same-repo branch contents, (c) the port-binding process table, (d) the review artifact on disk, and (e) user-typed notes.
+
+## Documentation / Operational Notes
+
+- `README.md` skill table gains one row for `ce:polish-beta`. Count update is a substantive doc edit, not a release-owned version bump.
+- No `CHANGELOG.md` entry in this PR; release-please composes it from the conventional commit (`feat(ce-polish): add /ce:polish-beta skill for human-in-the-loop refinement`).
+- Feature branch name: `feat/ce-polish-command`.
+- After the beta PR merges, monitor usage feedback for ~2 weeks of active use before opening a promotion PR. Promotion criteria: no P0/P1 issues in beta usage, `unknown` fall-back rate <20% of runs, stacked-PR-seed path exercised at least once.
+- Beta-to-stable promotion PR checklist lives in `docs/solutions/skill-design/ce-work-beta-promotion-checklist-2026-03-31.md` — apply it by analogy.
+
+## Sources & References
+
+- Motivating transcript: user-provided polish-phase description (attached to `/modify-plugin` invocation, this planning run).
+- Research agents consulted this planning run:
+  - `compound-engineering:research:repo-research-analyst` — patterns, architecture, directory layout, frontmatter conventions, existing agent inventory.
+  - `compound-engineering:research:learnings-researcher` — institutional findings across `docs/solutions/`.
+- Related code (all repo-relative):
+  - `plugins/compound-engineering/skills/ce-review/SKILL.md` (argument table, branch/PR acquisition, headless envelope)
+  - `plugins/compound-engineering/skills/ce-work/SKILL.md` (complexity matrix, phase structure)
+  - `plugins/compound-engineering/skills/ce-brainstorm/SKILL.md` (interactive posture baseline)
+  - `plugins/compound-engineering/skills/test-browser/SKILL.md` (port detection cascade, framework-agnostic probing)
+  - `plugins/compound-engineering/skills/resolve-pr-feedback/SKILL.md` (parallel sub-agent dispatch pattern)
+  - `plugins/compound-engineering/skills/ce-work-beta/SKILL.md` (beta posture)
+  - `plugins/compound-engineering/skills/ce-review/references/resolve-base.sh` (base-branch resolver — duplicated, not referenced)
+  - `plugins/compound-engineering/skills/ce-review/references/subagent-template.md` (sub-agent prompt shape)
+  - `plugins/compound-engineering/agents/design/design-iterator.md`
+  - `plugins/compound-engineering/agents/design/design-implementation-reviewer.md`
+  - `plugins/compound-engineering/agents/design/figma-design-sync.md`
+  - `plugins/compound-engineering/agents/review/code-simplicity-reviewer.md`
+  - `plugins/compound-engineering/agents/review/maintainability-reviewer.md`
+  - `plugins/compound-engineering/agents/review/julik-frontend-races-reviewer.md`
+- Institutional learnings:
+  - `docs/solutions/skill-design/git-workflow-skills-need-explicit-state-machines-2026-03-27.md`
+  - `docs/solutions/skill-design/compound-refresh-skill-improvements.md`
+  - `docs/solutions/skill-design/research-agent-pipeline-separation-2026-04-05.md`
+  - `docs/solutions/skill-design/pass-paths-not-content-to-subagents-2026-03-26.md`
+  - `docs/solutions/best-practices/codex-delegation-best-practices-2026-04-01.md`
+  - `docs/solutions/developer-experience/branch-based-plugin-install-and-testing-2026-03-26.md`
+  - `docs/solutions/best-practices/conditional-visual-aids-in-generated-documents-2026-03-29.md`
+  - `docs/solutions/workflow/todo-status-lifecycle.md`
+  - `docs/solutions/skill-design/script-first-skill-architecture.md`
+  - `docs/solutions/skill-design/beta-skills-framework.md`
+  - `docs/solutions/skill-design/ce-work-beta-promotion-checklist-2026-03-31.md`
+- Project AGENTS.md rules applied throughout:
+  - `AGENTS.md` (repo root) — branching, commit conventions, release versioning, file reference rules
+  - `plugins/compound-engineering/AGENTS.md` — skill compliance checklist, cross-platform rules, reference file inclusion, tool selection
--- a/docs/plans/2026-04-16-001-fix-ce-polish-beta-detection-gaps-plan.md
+++ b/docs/plans/2026-04-16-001-fix-ce-polish-beta-detection-gaps-plan.md
@@ -0,0 +1,456 @@
+---
+title: fix: Close ce-polish-beta detection gaps from PR #568 feedback
+type: fix
+status: active
+date: 2026-04-16
+---
+
+# fix: Close ce-polish-beta detection gaps from PR #568 feedback
+
+## Overview
+
+Address four concrete detection/resolution gaps in `ce-polish-beta` raised by @tmchow on EveryInc/compound-engineering-plugin#568:
+
+1. Framework coverage — Nuxt, SvelteKit, Remix, Astro fall through to `unknown` (the commenter calls them "table stakes alongside Next and Vite")
+2. Monorepo blind spot — `detect-project-type.sh` only inspects the repo root, so a Turborepo with `apps/web/next.config.js` returns `unknown`
+3. Package-manager detection is documented in prose but not implemented; Next/Vite stubs silently write `npm run dev` on pnpm/yarn/bun projects
+4. Port cascade is lossy — `.env` reader doesn't strip quotes or trailing comments, `AGENTS.md`/`CLAUDE.md` grep hits unrelated doc references, no probe of `next.config.*` / `vite.config.*` / `config/puma.rb` / `docker-compose.yml`
+
+All four are detection/resolution bugs in an already-shipped beta skill (`disable-model-invocation: true`, so no auto-trigger regression risk). Fix scope is the skill's own `scripts/` and `references/` trees plus the Phase 3 wiring in `SKILL.md`.
+
+## Problem Frame
+
+Polish's dev-server lifecycle (Phase 3 in SKILL.md) has three resolution jobs:
+
+- **What project type is this?** → `scripts/detect-project-type.sh`
+- **How do I start it?** → per-type recipe in `references/dev-server-<type>.md`, substituted into a `launch.json` stub
+- **What port will it bind to?** → inline cascade documented in `references/dev-server-detection.md`
+
+All three jobs currently fail for common-but-unhandled shapes (monorepos, Nuxt/Astro, pnpm-only repos, quoted `.env` values). Users hit these gaps the first time they run polish on anything outside the four project types the skill was bootstrapped with (rails, next, vite, procfile). The fallback — "ask the user to author `.claude/launch.json`" — works but pushes onto the user a discovery problem the skill should do itself.
+
+Feedback is the first real contact the skill has had with a reviewer outside the original plan, and it lines up with hazards already flagged in `references/dev-server-vite.md` ("SvelteKit, SolidStart, Qwik City, and Astro all use Vite… Different default ports apply") and `references/dev-server-next.md` ("Monorepo roots: users should set `cwd`… to the specific Next app"). The skill knew these were gaps and punted — this plan closes the punt.
+
+## Requirements Trace
+
+- **R1.** Nuxt, SvelteKit, Astro, and Remix are recognized first-class project types (no longer fall through to `unknown`).
+- **R2.** `detect-project-type.sh` finds a framework config inside a monorepo workspace (up to a bounded depth) and returns a type + relative `cwd`, so the stub-writer can populate `cwd` in `launch.json` without user intervention.
+- **R3.** Next and Vite stubs use the package manager indicated by the lockfile (`pnpm` / `yarn` / `bun` / `npm`) instead of hard-coding `npm`.
+- **R4.** Port resolution prefers authoritative config files (framework config, `config/puma.rb`, `Procfile.dev`, `docker-compose.yml`) over prose references. `.env` parsing correctly strips surrounding quotes and trailing `# comment`. The noisy `AGENTS.md`/`CLAUDE.md` grep is removed.
+- **R5.** Existing users are not regressed. Repos that previously detected correctly continue to detect the same type; repos with `.claude/launch.json` are unaffected (launch.json still wins).
+- **R6.** Each new or modified script has unit-test coverage in `tests/skills/` mirroring the existing `ce-polish-beta-dev-server.test.ts` harness (tmp git repo, Bun.spawn, exit-code + stdout assertions).
+
+## Scope Boundaries
+
+- **Not** adding Python (Django, Flask, FastAPI), Go, Elixir/Phoenix, Deno/Fresh, Angular, Gatsby, Expo, Electron, Tauri, Storybook, or Ruby non-Rails (Sinatra, Hanami). Trevor listed these as gaps; they each need their own recipe file and dev-server conventions, and together they would roughly double the skill's surface area. Defer to a follow-up plan.
+- **Not** changing `.claude/launch.json` priority — launch.json always wins over auto-detect. This plan only improves what auto-detect does when launch.json is absent.
+- **Not** rewriting the IDE handoff, kill-by-port, or reachability probe in Phase 3.5/3.6. Those are unaffected.
+- **Not** changing headless-mode semantics. All new scripts are probes; they don't mutate state, so headless rules ("never write .claude/launch.json, never kill without token") are preserved.
+- **Not** adding a framework config parser beyond a conservative regex. Arbitrary JS/TS config files can set `port` via computed expressions the regex won't catch; when the probe misses, the cascade falls through to framework defaults. Document this as best-effort, not authoritative.
+- **Not** bumping plugin version, marketplace version, or writing a release entry. Per repo `AGENTS.md`, release-please owns that.
+
+## Context & Research
+
+### Relevant Code and Patterns
+
+- `plugins/compound-engineering/skills/ce-polish-beta/scripts/detect-project-type.sh` — current root-only classifier with precedence rules (rails beats procfile, `multiple` for real disambiguation)
+- `plugins/compound-engineering/skills/ce-polish-beta/scripts/read-launch-json.sh` — existing script that emits sentinel outputs (`__NO_LAUNCH_JSON__`, `__INVALID_LAUNCH_JSON__`, `__MISSING_CONFIGURATIONS__`, `__CONFIG_NOT_FOUND__`). The sentinel pattern is the convention new scripts should follow for signaling "no match, fall through"
+- `plugins/compound-engineering/skills/ce-polish-beta/scripts/parse-checklist.sh` — pattern for set-unsafe `set -u`, bash regex (`[[ =~ ]]`), and awk/jq composition within a single script. New scripts should match this style (no `set -euo pipefail`; the existing scripts use `set -u` only, by convention)
+- `plugins/compound-engineering/skills/ce-polish-beta/references/dev-server-<rails|next|vite|procfile>.md` — per-type recipe shape: Signature, Start command, Port, Stub generation, Common gotchas
+- `plugins/compound-engineering/skills/ce-polish-beta/references/launch-json-schema.md` — stub templates grouped by project type; the stub-writer block to parameterize
+- `tests/skills/ce-polish-beta-dev-server.test.ts` — test harness pattern: tmp git repo, touch signature files, invoke script via `Bun.spawn`, assert `exitCode` + `stdout.trim()`. All new scripts follow this shape.
+- `plugins/compound-engineering/skills/ce-polish-beta/SKILL.md` Phase 3.2 (lines 272-291) — project-type routing table; the surface that needs extending for new types and the `<type>@<cwd>` return variant
+- `plugins/compound-engineering/skills/ce-polish-beta/SKILL.md` Phase 3.3 (lines 293-303) — stub-writer; where package-manager substitution and `cwd` population land
+
+### Institutional Learnings
+
+None directly applicable; this work extends patterns already proven in the same skill.
+
+### Cross-Repo Reference (informational only)
+
+`plugins/compound-engineering/skills/test-browser/SKILL.md` has an inline port cascade that polish's `dev-server-detection.md` is a copy of (per the self-contained-skill rule). This plan does not modify `test-browser` — the two cascades stay independent by design. Note for maintainers: if test-browser adopts a parallel resolve-port script later, the two skills will need the standard manual-sync note updated.
+
+## Key Technical Decisions
+
+- **Decision: detect-project-type.sh returns `<type>` at root and `<type>@<cwd>` for monorepo hits, never just `<cwd>`.** Rationale: keeps the existing single-token protocol intact for the 90% root-detection case; downstream readers split on `@` when present. `@` is chosen over `:` because `:` is reserved for the outer multi-hit separator (see below). Alternative considered: return structured JSON. Rejected because every other script in `scripts/` returns plain-text tokens and consumers use `case`/`awk` on them, and JSON would force `jq` onto a detector that today only uses bash builtins.
+
+- **Decision: Output grammar is `<type>` or `<type>@<cwd>` for single hits, `multiple` or `multiple:<type>@<cwd>,<type>@<cwd>,...` for multi-hits.** The four concrete shapes are:
+  - `next` (single hit at root)
+  - `next@apps/web` (single hit in monorepo)
+  - `multiple` (multiple signatures at root — existing behavior, unchanged)
+  - `multiple:next@apps/web,rails@apps/api` (multiple hits across monorepo workspaces, always emitted as `type@path` pairs even when types are the same)
+  Rationale: `:` is the outer multi-hit delimiter and `@` is the inner type-path delimiter, making the grammar unambiguous under naive `awk -F:` or bash parameter expansion. Document this explicitly in the script header comment so callers cannot misread it.
+
+- **Decision: New scripts accept an optional path as a positional argument, not `--cwd`.** Rationale: every existing script in `scripts/` uses positional args (`parse-checklist.sh <path>`, `classify-oversized.sh <path> <path>`) or derives cwd from `git rev-parse --show-toplevel`. Flag-parsing would be a new convention. Follow the existing pattern: optional positional path defaults to `git rev-parse --show-toplevel`.
+
+- **Decision: Expected-no-result sentinels exit 0, not 1.** Rationale: the existing convention in `read-launch-json.sh` (header comment on lines 20-21 of that file) reserves non-zero exit for operational failure only (missing `jq`, no git root). `__NO_PACKAGE_JSON__` and similar sentinels exit 0 with the sentinel on stdout; callers pattern-match on stdout, not exit code.
+
+- **Decision: No provenance output on stderr.** Rationale: stderr across all existing scripts is reserved for `ERROR: ...` messages only. Provenance ("resolved_from: framework_config") would break that convention. `resolve-port.sh` emits a single-line integer on stdout, matching the simplicity of existing scripts. If future debugging surfaces real demand for provenance, add a second script or a `--verbose` mode in a follow-up — not speculatively.
+
+- **Decision: Monorepo probe has a depth cap of 3 and walks only if root detection returned `unknown`.** Rationale: depth 3 covers the common layouts (`apps/web/next.config.js`, `packages/frontend/vite.config.ts`, `services/api/next.config.js`). Running unconditionally would slow the common case and risk false positives when the root is a known type with example configs nested elsewhere (fixtures, templates). Depth 3 is a hard cap because deeper nesting usually means the user already needs to author `launch.json`.
+
+- **Decision: Exclude `node_modules/`, `.git/`, `vendor/`, `dist/`, `build/`, `coverage/`, `.next/`, `.nuxt/`, `.svelte-kit/`, `.turbo/`, `tmp/`, `fixtures/` from the monorepo probe.** Rationale: these directories ship config files as fixtures or build output that the user doesn't own. Without exclusion, a Rails app with `node_modules/next/.../examples/` would register as Next, and a monorepo with test fixtures would surface false positives.
+
+- **Decision: `resolve-package-manager.sh` returns one token (`npm` / `pnpm` / `yarn` / `bun`) plus the start command (stdout line 1 and line 2 respectively) so stub-writer substitution is deterministic.** Rationale: `pnpm dev` and `bun run dev` use different argv shapes. A single-token return would force the consumer to maintain a lookup table; emitting both the binary and the canonical args keeps all PM-specific knowledge in one place (the resolver).
+
+- **Decision: `resolve-port.sh` replaces the inline `dev-server-detection.md` cascade.** Rationale: the cascade lives in skill prose and has silently-buggy shell (unstripped quotes, noisy grep). Lifting it into a tested script with the sentinel-output convention makes the behavior assertable and fixes the bugs at the same site. `dev-server-detection.md` becomes a thin pointer to the script with the framework-default table retained.
+
+- **Decision: Port cascade probes authoritative config files first, `.env*` second, default last.** Rationale: Trevor's core complaint is that the current cascade prefers *prose* (AGENTS.md) over *config* (next.config.js, config/puma.rb). Flipping that ordering restores "the code is the source of truth."
+
+- **Decision: Drop the `AGENTS.md` / `CLAUDE.md` grep entirely.** Rationale: users who need to override have the explicit `--port` / `port:` CLI token and the `.claude/launch.json` escape hatch. Grepping instruction files for port numbers catches unrelated mentions ("connects to Stripe on port 8443", "example: localhost:3000") far more often than it captures a real override.
+
+- **Decision: Framework config probes use a conservative regex and treat misses as "no pin, fall through".** Rationale: parsing arbitrary JS/TS reliably requires a JS runtime, which polish doesn't ship with. A regex that catches `port: 3000`, `port: "3000"`, and `server: { port: 3000 }` literals covers the common patterns. Missed ports fall through to framework default — same behavior as today, just with more chances to catch an explicit value along the way.
+
+## Open Questions
+
+### Resolved During Planning
+
+- **Should Remix get a dedicated signature or route through Vite?** Resolved: both. Classic Remix ships `remix.config.js` without Vite; Remix 2.x+ ships `vite.config.ts`. Classic pattern gets its own signature in the detector so it resolves without ambiguity; new Remix continues to resolve as `vite` (the existing Vite recipe already documents SvelteKit/Astro/etc. as framework-on-Vite). The `remix` recipe notes both paths.
+
+- **Should the monorepo probe return all matches or just one?** Resolved: return one if there's a single match, `multiple` with `<type>@<path>` pairs if several. Multiple matches at depth ≤3 is the genuine disambiguation case the existing `multiple` sentinel was designed for; the new output is `multiple:next@apps/web,next@apps/admin` so the interactive prompt in Phase 3.2 can list the options.
+
+- **Where does SKILL.md document the new `<type>@<cwd>` format?** Resolved: extend the existing Phase 3.2 routing table with a "Paths with `@<cwd>` suffix" paragraph and update Phase 3.3 to substitute `cwd` when present. No new top-level section.
+
+- **Does the port resolver need to parse `docker-compose.yml`?** Resolved: yes, but lightly — grep for `- "<port>:<port>"` under a `ports:` key on the service named `web` / `app` / `frontend`. Full YAML parsing is out of scope; a line-anchored regex catches the common compose shape and misses gracefully on exotic configs.
+
+### Deferred to Implementation
+
+- **Exact regex for framework config port probes.** Start with `port:\s*[0-9]+` and `port:\s*["']?[0-9]+["']?`, tighten if tests surface false positives. Unit 4 owns this.
+- **Whether `pnpm dev` should be `pnpm dev` or `pnpm run dev`.** Both work; pick whichever is idiomatic per the current pnpm docs at the time of implementation and pin it in the resolver's lookup table.
+- **Whether to probe `bun.lock` ahead of `bun.lockb`.** Bun recently added a text lockfile format (`bun.lock`) alongside the binary (`bun.lockb`); priority likely doesn't matter (only one will be present) but the resolver should match whichever is there.
+
+## Implementation Units
+
+- [x] **Unit 1: Add first-class recipes for Nuxt, Astro, Remix, SvelteKit**
+
+**Goal:** Give the four "table stakes" JS frontend frameworks their own reference recipes with correct ports, start commands, and stub templates, so they stop falling through to `unknown`.
+
+**Requirements:** R1, R6
+
+**Dependencies:** None (recipe files are additive; they don't activate until Unit 2 extends the detector)
+
+**Files:**
+- Create: `plugins/compound-engineering/skills/ce-polish-beta/references/dev-server-nuxt.md`
+- Create: `plugins/compound-engineering/skills/ce-polish-beta/references/dev-server-astro.md`
+- Create: `plugins/compound-engineering/skills/ce-polish-beta/references/dev-server-remix.md`
+- Create: `plugins/compound-engineering/skills/ce-polish-beta/references/dev-server-sveltekit.md`
+- Modify: `plugins/compound-engineering/skills/ce-polish-beta/references/launch-json-schema.md` (add 4 stub templates)
+
+**Approach:**
+- Mirror the structure of `dev-server-next.md` exactly: Signature / Start command / Port / Stub generation / Common gotchas
+- Defaults per the current framework docs: Nuxt port 3000, Astro port 4321, Remix port 3000 (classic) or 5173 (Vite), SvelteKit port 5173
+- Each recipe's "Common gotchas" section notes interactions users will actually hit: Nuxt's Nitro, Astro's SSR vs SSG dev behavior, Remix's classic-vs-Vite fork, SvelteKit's adapter-free dev mode
+- Stub templates in `launch-json-schema.md` match the existing Next/Vite/Rails/Procfile pattern
+
+**Patterns to follow:**
+- `plugins/compound-engineering/skills/ce-polish-beta/references/dev-server-next.md` for overall shape
+- `plugins/compound-engineering/skills/ce-polish-beta/references/dev-server-vite.md` for framework-on-Vite notes (relevant to SvelteKit and new Remix)
+
+**Test scenarios:** Test expectation: none — reference markdown is consumed by the model, not asserted. Unit 5's integration test covers that these recipes are selected correctly when their respective signatures are present.
+
+**Verification:**
+- Four new reference files exist with all five required sections
+- `launch-json-schema.md` has stub templates for all four new types
+- A reader landing on a new recipe can answer "what command do I run, at what port, with what launch.json stub?" without leaving the file
+
+- [x] **Unit 2: Extend detect-project-type.sh with new signatures and monorepo probe**
+
+**Goal:** The detector recognizes Nuxt/Astro/Remix/SvelteKit at the repo root and descends up to depth 3 into workspaces when root detection returns `unknown`, emitting `<type>` or `<type>@<cwd>` as appropriate.
+
+**Requirements:** R1, R2, R5
+
+**Dependencies:** Unit 1 (new types must have recipes before the detector returns them, so Phase 3.2 routing in Unit 5 doesn't dead-end)
+
+**Files:**
+- Modify: `plugins/compound-engineering/skills/ce-polish-beta/scripts/detect-project-type.sh`
+- Create: `tests/skills/ce-polish-beta-project-type.test.ts`
+
+**Approach:**
+- Keep the existing root-scan precedence block intact (rails beats procfile, single-match returns `<type>`)
+- Add signature checks for `nuxt.config.{js,mjs,ts}`, `astro.config.{js,mjs,ts}`, `remix.config.{js,ts}`, and `svelte.config.{js,mjs,ts}` at root
+- When the root-scan yields zero matches, run a shallow `find` with `-maxdepth 3` excluding `node_modules`, `.git`, `vendor`, `dist`, `build`, `coverage`, `.next`, `.nuxt`, `.svelte-kit`, `.turbo`, `tmp`, `fixtures` looking for any supported signature filename
+- Collect hits as `(type, relative-dir)` pairs. Deduplicate on the pair
+- Single hit → emit `<type>@<cwd>` (or bare `<type>` when the hit is `.`)
+- Multiple hits → emit `multiple:<type1>@<cwd1>,<type2>@<cwd2>,...` (always include the type prefix so the grammar is unambiguous under naive `awk -F:` on the outer separator)
+- Zero monorepo hits → emit `unknown` unchanged
+- **Header comment requirements:** document the output grammar explicitly (the four concrete shapes: `<type>` / `<type>@<cwd>` / `multiple` / `multiple:<type>@<cwd>,...`), the depth cap of 3 with its rationale, and the exclusion list. Callers should not have to reverse-engineer the grammar from examples
+
+**Execution note:** Test-first — add the new test file with scenarios for each new signature, monorepo single-hit, monorepo multi-hit, exclusion of `node_modules`, and the unchanged-root-detection regression cases. Run the suite red, then modify the detector to go green. This script is load-bearing for dev-server startup and has no production telemetry; tests are the only safety net.
+
+**Patterns to follow:**
+- Existing `detect-project-type.sh` precedence block (rails-before-procfile)
+- `tests/skills/ce-polish-beta-dev-server.test.ts` for test harness shape
+
+**Test scenarios:**
+- Happy path: `nuxt.config.ts` at root → `nuxt`
+- Happy path: `astro.config.mjs` at root → `astro`
+- Happy path: `remix.config.js` at root → `remix`
+- Happy path: `svelte.config.js` at root → `sveltekit`
+- Happy path: `apps/web/next.config.js` in Turborepo layout → `next@apps/web`
+- Happy path: `packages/frontend/vite.config.ts` in pnpm-workspace layout → `vite@packages/frontend`
+- Edge case: `apps/web/next.config.js` and `apps/admin/next.config.js` → `multiple:next@apps/web,next@apps/admin`
+- Edge case: `apps/web/next.config.js` and `apps/api/Gemfile+bin/dev` → `multiple:next@apps/web,rails@apps/api`
+- Edge case: signature inside `node_modules/next/examples/...` → ignored (root returns `unknown`)
+- Edge case: signature at depth 4 (`projects/app/web/client/next.config.js`) → ignored
+- Edge case: signature alongside `bin/dev`+`Gemfile` at root → returns `rails` (root wins, no probe runs)
+- Regression: existing 4-type root detection unchanged when signatures present at root
+- Regression: `Procfile.dev` + `bin/dev` + `Gemfile` → still returns `rails`, not `multiple`
+
+**Verification:**
+- All 12 test scenarios pass
+- `bash scripts/detect-project-type.sh` run in a real Turborepo returns `next@apps/web` (or whichever app path matches)
+- Run in the plugin's own repo root still returns the existing detection (or `unknown`, matching prior behavior)
+
+- [x] **Unit 3: Package-manager resolver script**
+
+**Goal:** A new `resolve-package-manager.sh` emits the project's package manager (`npm` / `pnpm` / `yarn` / `bun`) plus the canonical dev-server argv, so the stub-writer can substitute both without in-agent judgment.
+
+**Requirements:** R3, R6
+
+**Dependencies:** None
+
+**Files:**
+- Create: `plugins/compound-engineering/skills/ce-polish-beta/scripts/resolve-package-manager.sh`
+- Create: `tests/skills/ce-polish-beta-package-manager.test.ts`
+
+**Approach:**
+- Accept an optional path as a positional argument (first positional); default to repo root via `git rev-parse --show-toplevel` when omitted
+- In the resolved path, check for lockfiles in priority order: `pnpm-lock.yaml` → `yarn.lock` → `bun.lockb` / `bun.lock` → `package-lock.json`
+- Emit two lines on stdout: line 1 = token (`npm` | `pnpm` | `yarn` | `bun`), line 2 = canonical command tail as a space-separated argv (e.g., `run dev` for npm/bun, `dev` for pnpm/yarn)
+- Fall through to `npm` + `run dev` only when a `package.json` is present and no lockfile matches (matches prior hardcoded behavior, so no regression for vanilla projects). If the path is a valid directory but contains no `package.json`, do not fall through to `npm` — emit the sentinel instead (see next bullet), so callers can distinguish "JavaScript project with no lockfile" from "not a JavaScript project at all"
+- If the path is a valid directory but contains no `package.json`, emit sentinel `__NO_PACKAGE_JSON__` on stdout and exit 0 (expected-no-match, matching `read-launch-json.sh` sentinel convention — callers pattern-match on stdout, not exit code)
+- When both `bun.lockb` (binary) and `bun.lock` (text) are present in the same directory, prefer `bun.lock` (text). Rationale: Bun's text lockfile is the newer, canonical format; the binary format is a legacy variant. Only one will normally be present, but the resolver must deterministically pick one when both exist
+- If the path itself does not exist or is not a directory, emit `ERROR:` on stderr and exit 1 (operational failure, distinct from expected-no-match)
+- **Header comment requirements:** document the two-line stdout grammar (line 1 = binary, line 2 = argv tail), the lockfile priority order and why, and the sentinel-vs-error exit-code split
+
+**Patterns to follow:**
+- `plugins/compound-engineering/skills/ce-polish-beta/scripts/read-launch-json.sh` for sentinel outputs and exit codes
+- Existing `detect-project-type.sh` for simple lockfile-presence checks
+
+**Test scenarios:**
+- Happy path: `pnpm-lock.yaml` present → stdout: `pnpm\ndev`
+- Happy path: `yarn.lock` present → stdout: `yarn\ndev`
+- Happy path: `bun.lockb` present → stdout: `bun\nrun dev`
+- Happy path: `bun.lock` (text format) present → stdout: `bun\nrun dev`
+- Happy path: `package-lock.json` present → stdout: `npm\nrun dev`
+- Happy path: no lockfile, `package.json` present → stdout: `npm\nrun dev` (safe default)
+- Edge case: both `pnpm-lock.yaml` and `yarn.lock` present → stdout: `pnpm\ndev` (priority order wins)
+- Edge case: positional path pointing to `apps/web` — reads lockfile from subdir, not repo root
+- Edge case: positional path to a directory without `package.json` → stdout `__NO_PACKAGE_JSON__`, exit 0 (expected-no-match sentinel)
+- Edge case: no positional arg, not in a git repo → stderr `ERROR:` + exit 1 (operational failure)
+- Edge case: positional path but directory doesn't exist → stderr `ERROR:` + exit 1 (operational failure)
+
+**Verification:**
+- All test scenarios pass
+- Running from a real pnpm repo returns `pnpm\ndev`
+- Running from a real npm repo returns `npm\nrun dev`
+
+- [x] **Unit 4: Port resolver script with authoritative config probes**
+
+**Goal:** A new `resolve-port.sh` probes config files in priority order (framework config → `config/puma.rb` → `Procfile.dev` → `docker-compose.yml` → `package.json` scripts → `.env*` → default), correctly parses `.env` values (stripping quotes and `# comment`), and drops the `AGENTS.md`/`CLAUDE.md` grep.
+
+**Requirements:** R4, R6
+
+**Dependencies:** None
+
+**Files:**
+- Create: `plugins/compound-engineering/skills/ce-polish-beta/scripts/resolve-port.sh`
+- Create: `tests/skills/ce-polish-beta-resolve-port.test.ts`
+- Modify: `plugins/compound-engineering/skills/ce-polish-beta/references/dev-server-detection.md`
+
+**Approach:**
+- Accept optional positional path as the first positional argument (defaults to `git rev-parse --show-toplevel` when omitted) — consistent with `parse-checklist.sh` and the Unit 3 resolver
+- Accept optional `--type <rails|next|vite|nuxt|astro|remix|sveltekit|procfile>` flag to scope which probes run (e.g., skip `config/puma.rb` for Next). Type is a classification, not a path, so the flag form is appropriate and distinguishable from the positional path
+- Accept optional `--port <n>` flag as an explicit override (emit immediately when present, before any probing)
+- Probe order (first hit wins):
+  1. Explicit `--port` flag
+  2. Framework config: `next.config.*` / `vite.config.*` / `nuxt.config.*` / `astro.config.*` — conservative regex for `port:\s*["']?[0-9]+["']?` or `server.port\s*=\s*[0-9]+`. Numeric literals only; reject matches where the value is a variable reference (e.g., `process.env.PORT`, `getPort()`) so we do not emit a misleading default
+  3. Rails: `config/puma.rb` `port\s+[0-9]+`
+  4. Procfile: `Procfile.dev` `web:` line scanned for `-p <n>` / `--port <n>`
+  5. `docker-compose.yml`: in service named `web` / `app` / `frontend`, the first `"<n>:<n>"` line under `ports:`
+  6. `package.json` `dev`/`start` script for `--port <n>` / `-p <n>`
+  7. `.env*` files: check in override order **`.env.local` → `.env.development` → `.env`** (first hit wins, matching the convention most JS frameworks use where `.env.local` overrides `.env.development` which overrides `.env`). Parse `PORT=<n>`, stripping surrounding `"` or `'` and truncating at `#` (after trimming whitespace)
+  8. Framework default (emitted from a lookup table: rails/next/nuxt/remix=3000, vite/sveltekit=5173, astro=4321, procfile=3000, unknown=3000)
+- Emit the resolved port as a single line on stdout. Do **not** emit provenance — stderr is reserved for `ERROR:` messages, matching the existing convention in `read-launch-json.sh` and `parse-checklist.sh`. If future debugging demand surfaces, add a `--verbose` mode in a follow-up rather than speculatively
+- Rewrite `dev-server-detection.md`: the inline bash cascade is removed; the file becomes a navigable pointer ("Port resolution runs via `scripts/resolve-port.sh`") plus the framework-default table and probe-order rationale. Include an explicit **sync-note block** listing the three intentional divergences from `test-browser`'s inline cascade: (a) quote stripping on `.env` values, (b) comment stripping on `.env` values, (c) removal of the `AGENTS.md`/`CLAUDE.md` grep. The block tells a future maintainer of either skill exactly what not to "fix" back to symmetry
+- **Header comment requirements:** document the probe-order rationale (config-before-prose), the `.env` parsing contract (quote + comment stripping), and the reason `AGENTS.md`/`CLAUDE.md` grepping is deliberately omitted
+
+**Execution note:** Test-first — `.env` parsing bugs are the whole point. Write cases for quoted, single-quoted, comment-trailed, whitespace-padded, and multi-line forms first. Implement against those cases.
+
+**Patterns to follow:**
+- Existing cascade in `references/dev-server-detection.md` for probe order (improved, not replaced wholesale)
+- `scripts/parse-checklist.sh` for bash regex patterns and awk/sed composition
+- `scripts/read-launch-json.sh` for sentinel conventions and stderr-for-diagnostics
+
+**Test scenarios:**
+- Happy path: `--port 8080` explicit → `8080`
+- Happy path: `next.config.js` with `port: 4000` → `4000`
+- Happy path: `next.config.ts` with `server: { port: 4000 }` → `4000`
+- Happy path: `config/puma.rb` with `port 3001` → `3001` (rails type)
+- Happy path: `Procfile.dev` `web: bundle exec puma -p 4567` → `4567`
+- Happy path: `docker-compose.yml` with `web:\n  ports:\n    - "9000:9000"` → `9000`
+- Happy path: `package.json` `"dev": "next dev --port 4000"` → `4000`
+- Edge case: `.env` `PORT=3001` → `3001`
+- Edge case: `.env` `PORT="3001"` → `3001` (quotes stripped)
+- Edge case: `.env` `PORT='3001'` → `3001` (single quotes stripped)
+- Edge case: `.env` `PORT=3001 # dev only` → `3001` (comment stripped)
+- Edge case: `.env` `PORT="3001" # quoted+commented` → `3001`
+- Edge case: `.env` `  PORT = 3001  ` → `3001` (whitespace tolerated)
+- Edge case: `.env.local` `PORT=4000` + `.env` `PORT=3000` both present → `4000` (`.env.local` precedence)
+- Edge case: `.env.development` `PORT=4000` + `.env` `PORT=3000` both present → `4000` (`.env.development` precedence)
+- Edge case: `.env.local` `PORT=4000` + `.env.development` `PORT=5000` both present → `4000` (`.env.local` beats `.env.development`)
+- Edge case: multiple probes hit — framework config wins over `.env` (priority order)
+- Edge case: no probe matches, `--type next` → `3000` (default)
+- Edge case: no probe matches, `--type vite` → `5173`
+- Edge case: no probe matches, `--type astro` → `4321`
+- Edge case: no probe matches, no `--type` → `3000` (unknown default)
+- Error path: malformed `docker-compose.yml` — probe misses, falls through (no crash)
+- Error path: `next.config.js` with computed port (`port: getPort()`) — regex misses, falls through
+- Error path: `next.config.js` with `port: process.env.PORT || 3000` — probe rejects the variable reference and falls through to `.env` / default (does not emit `3000` as if it were a framework-config hit)
+- Error path: positional path does not exist → stderr `ERROR:` + exit 1 (operational failure, not a fall-through)
+- Regression: `AGENTS.md` mentioning port `8443` in prose — ignored (grep removed)
+- Regression: `CLAUDE.md` mentioning `localhost:3000` in examples — ignored
+
+**Verification:**
+- All 20+ test scenarios pass
+- Running in the plugin's own repo root returns `3000` (default, since no framework config)
+- Running against a synthetic Rails repo with `config/puma.rb port 3001` returns `3001`
+- `dev-server-detection.md` no longer contains inline shell; it describes the probe order and framework-default table
+
+- [x] **Unit 5: Wire new scripts and signatures into SKILL.md Phase 3**
+
+**Goal:** SKILL.md Phase 3.2 routes the four new types and handles the `<type>@<cwd>` format; Phase 3.3 substitutes package-manager + cwd into stubs; port resolution calls `resolve-port.sh` instead of the inline cascade.
+
+**Requirements:** R1, R2, R3, R4, R5
+
+**Dependencies:** Units 1–4 (recipes, signatures, resolvers all exist)
+
+**Files:**
+- Modify: `plugins/compound-engineering/skills/ce-polish-beta/SKILL.md` (Phase 3.2 routing table, Phase 3.3 stub-writer logic, references list at bottom)
+
+**Approach:**
+- Phase 3.2 routing table gains four new rows (nuxt, astro, remix, sveltekit)
+- Phase 3.2 adds a paragraph under the table: "When the detector returns `<type>@<cwd>`, route by `<type>` as usual, and carry `<cwd>` into the stub-writer for Phase 3.3. When the detector returns `multiple:<type1>@<cwd1>,<type2>@<cwd2>,...`, the interactive prompt lists the `<type>@<cwd>` pairs and asks the user to pick one; headless mode emits the standard `multiple` failure with the pair list appended."
+- Phase 3.3 stub-writer logic updated: "For Next/Vite/Nuxt/Astro/Remix/SvelteKit stubs, call `resolve-package-manager.sh` (passing `<cwd>` as the positional arg when present) and substitute the emitted binary and args into `runtimeExecutable` / `runtimeArgs`. When the detector emitted `<type>@<cwd>`, populate the stub's `cwd` field with that value. For port, call `resolve-port.sh [<cwd>] --type <type>` and substitute the emitted port."
+- References list at the bottom of SKILL.md gains the three new reference files (Unit 1) and two new scripts (Units 3 and 4)
+- `dev-server-detection.md` reference in the "Cascade" section is kept but its description changes to "Port-resolution documentation — the runtime path is `scripts/resolve-port.sh`"
+
+**Patterns to follow:**
+- Existing Phase 3.2 table structure and prose (keep the table format, add rows)
+- Existing Phase 3.3 stub-writer prose (keep imperative style, add substitution bullets)
+- Existing reference list at SKILL.md bottom (alphabetical within scripts/references groups)
+
+**Test scenarios:**
+- Test expectation: none — SKILL.md content is model-consumed. The behavior it documents is asserted by Units 2, 3, and 4 unit tests.
+
+**Verification:**
+- `bun test tests/skills/ce-polish-beta-*` passes (all old + new tests green)
+- `bun run release:validate` passes (SKILL.md structure intact, no broken references)
+- Reading SKILL.md Phase 3 start-to-finish, a reader can trace: "detector says `next@apps/web`" → "Phase 3.3 substitutes pm+port+cwd from resolvers into Next stub" → "final stub has `cwd: apps/web`, `runtimeExecutable: pnpm`, `port: 3001`"
+- Four new reference files and two new scripts appear in the SKILL.md references list
+
+## High-Level Technical Design
+
+> *This illustrates the intended approach and is directional guidance for review, not implementation specification. The implementing agent should treat it as context, not code to reproduce.*
+
+**Data flow through Phase 3 after the fix:**
+
+```
+    .claude/launch.json exists? ──yes──▶ use it verbatim ──▶ Phase 3.5
+          │
+          no
+          ▼
+    detect-project-type.sh
+          │
+          ├─ rails | next | vite | procfile | nuxt | astro | remix | sveltekit
+          │         │
+          │         ▼
+          │    load references/dev-server-<type>.md
+          │    (recipe: command, default port, gotchas)
+          │
+          ├─ <type>@<cwd>     (monorepo hit, depth ≤ 3)
+          │         │
+          │         ▼
+          │    load recipe + remember cwd for stub-writer
+          │
+          ├─ multiple[:<type>@<cwd>,...]   (disambiguation needed)
+          │         │
+          │         ▼
+          │    interactive: user picks <type>@<cwd> pair
+          │    headless: fail with pair list
+          │
+          └─ unknown          (no signature anywhere in scan scope)
+                    │
+                    ▼
+               interactive: ask for exec/args/port
+               headless: fail
+
+    ── stub-writer (Phase 3.3) ──────────────────────────
+
+    pm = resolve-package-manager.sh [<cwd>]   (Next/Vite/Nuxt/Astro/Remix/SvelteKit)
+    port = resolve-port.sh [<cwd>] --type <type>
+
+    stub = template(type).with(
+             runtimeExecutable = pm.bin,
+             runtimeArgs       = pm.args,
+             port              = port,
+             cwd               = cwd if present
+           )
+```
+
+**Probe-order for `resolve-port.sh` (first hit wins):**
+
+| Rank | Source | Why this order |
+|------|--------|----------------|
+| 1 | Explicit CLI `--port` | User intent is authoritative |
+| 2 | Framework config (`next.config.*` / `vite.config.*` / `nuxt.config.*` / `astro.config.*`) | The framework itself reads this |
+| 3 | `config/puma.rb` (rails only) | Rails server actually binds here |
+| 4 | `Procfile.dev` web line | What `bin/dev` / foreman actually runs |
+| 5 | `docker-compose.yml` web service ports | Container port binding, often authoritative in Docker-first dev |
+| 6 | `package.json` `dev`/`start` scripts | Falls back to npm-style CLI flags |
+| 7 | `.env*` (quote- and comment-stripped) | Env override, commonly used |
+| 8 | Framework default | Last resort, documented table |
+
+## System-Wide Impact
+
+- **Interaction graph:** Phase 3.2 routing consumes detector output; Phase 3.3 stub-writer consumes resolver output. No other phases touch these scripts. Headless mode's "never mutate state" invariant is preserved because all new scripts are read-only probes.
+- **Error propagation:** New scripts follow the sentinel-on-stdout + exit-code convention. Phase 3 already handles sentinel outputs from `read-launch-json.sh`; new sentinels (`__NO_PACKAGE_JSON__`) integrate into the same handler shape. Unknown probes fall through to framework defaults (same as today) rather than erroring.
+- **State lifecycle risks:** None. No persisted state changes; the stub-writer writes `.claude/launch.json` only in interactive mode with user consent (Phase 3.3 existing behavior, preserved).
+- **API surface parity:** Not applicable — this is a skill-internal detection subsystem. The skill's public contract (argument tokens, `checklist.md` format, headless envelope shape) is unchanged.
+- **Integration coverage:** Unit 5's verification explicitly traces a full monorepo + pnpm + custom-port scenario end-to-end to catch integration bugs the per-unit tests miss.
+- **Unchanged invariants:**
+  - `.claude/launch.json` always wins over auto-detect (Phase 3.1 unchanged)
+  - `rails` still beats `procfile` at root (existing precedence preserved)
+  - Headless mode still never writes `.claude/launch.json`
+  - The cross-skill `dev-server-detection.md` duplication note (vs `test-browser`) remains manual-sync; this plan does not modify `test-browser`
+
+## Risks & Dependencies
+
+| Risk | Mitigation |
+|------|------------|
+| Monorepo probe false-positive (e.g., config in a fixture directory) | Exclusion list (`node_modules`, `fixtures`, etc.) in the probe; depth cap at 3; `multiple` output still triggers user disambiguation |
+| Framework config regex misses a valid port (e.g., computed expression) | Falls through to `.env` then framework default — same as today, just with more chances to catch a literal. Documented as best-effort |
+| Package-manager resolver picks wrong PM (e.g., stale `yarn.lock` in a pnpm-migrated repo) | Priority order follows common-case lockfile precedence; user can override via `launch.json`. Documented in the resolver's header comment |
+| New test files slow the suite | Each new test file adds ~10-20 cases using the existing tmp-repo harness (already fast in `ce-polish-beta-dev-server.test.ts`); measurable impact expected < 2 seconds |
+| Changing `dev-server-detection.md` breaks a downstream reader | The file is only referenced from within the skill; no external consumers. Grep confirms no cross-skill references before the change lands |
+| Dropping `AGENTS.md`/`CLAUDE.md` port grep regresses users relying on it | Very low — the grep was added speculatively and the lossy pattern (`localhost:3000` match) makes it more likely to have surfaced wrong values than correct ones in the wild. Explicit `--port` and `.claude/launch.json` both remain as override paths |
+| Polish's `resolve-port.sh` diverges from `test-browser`'s inline cascade and the two drift silently | Unit 4 adds an explicit sync-note block inside `dev-server-detection.md` enumerating the three intentional divergences (quote stripping, comment stripping, no `AGENTS.md`/`CLAUDE.md` grep). A future maintainer who "fixes" `test-browser` by copying polish's cascade, or vice versa, will hit the sync-note first. No automated cross-skill check — acceptable because both skills are internal and the cascade is small |
+
+## Documentation / Operational Notes
+
+- Update PR description on #568 (or a follow-up PR) to note that these gaps are fixed and reference this plan
+- No marketplace release entry, version bump, or CHANGELOG edit — release-please handles it
+- No user-facing docs outside the skill's own reference tree
+- Keep `dev-server-detection.md` as a navigable doc explaining probe order + framework defaults, even though the implementation now lives in `resolve-port.sh`. Reviewers will still land there first when debugging port issues
+
+## Sources & References
+
+- **Origin:** PR feedback from @tmchow on EveryInc/compound-engineering-plugin#568 ([comment](https://github.com/EveryInc/compound-engineering-plugin/pull/568#issuecomment-4254733274))
+- **Previous plan:** `docs/plans/2026-04-15-001-feat-ce-polish-skill-plan.md` (feature this fixes)
+- **Related files:**
+  - `plugins/compound-engineering/skills/ce-polish-beta/scripts/detect-project-type.sh`
+  - `plugins/compound-engineering/skills/ce-polish-beta/references/dev-server-detection.md`
+  - `plugins/compound-engineering/skills/ce-polish-beta/references/dev-server-next.md`
+  - `plugins/compound-engineering/skills/ce-polish-beta/references/dev-server-vite.md`
+  - `plugins/compound-engineering/skills/ce-polish-beta/references/launch-json-schema.md`
+  - `plugins/compound-engineering/skills/ce-polish-beta/SKILL.md` (Phase 3)
+- **Test harness pattern:** `tests/skills/ce-polish-beta-dev-server.test.ts`