Merge upstream origin/main (v2.60.0) with fork customizations preserved

Incorporates 78 upstream commits while preserving all local fork intent:
- Keep deleted: dhh-rails, kieran-rails, dspy-ruby, andrew-kane-gem-writer (FastAPI pivot)
- Merge both: ce-review (zip-agent + design-conformance wiring),
  kieran-python-reviewer (pipeline + FastAPI conventions),
  ce-brainstorm/ce-plan/ce-work (improvements + deploy wiring),
  todo-create (template refs + assessment block),
  best-practices-researcher (rename + FastAPI refs)
- Accept remote: 142 remote-only files, plugin.json, README.md
- Keep local: 71 local-only files (custom agents, skills, commands, voice)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
John Lamb
2026-03-31 12:28:53 -05:00
153 changed files with 12801 additions and 3761 deletions

View File

@@ -1,7 +1,7 @@
---
name: ce:review
description: "Structured code review using tiered persona agents, confidence-gated findings, and a merge/dedup pipeline. Use when reviewing code changes before creating a PR."
argument-hint: "[mode:autofix|mode:report-only] [PR number, GitHub URL, or branch name]"
argument-hint: "[blank to review current branch, or provide PR link]"
---
# Code Review
@@ -16,15 +16,30 @@ Reviews code changes using dynamically selected reviewer personas. Spawns parall
- Can be invoked standalone
- Can run as a read-only or autofix review step inside larger workflows
## Mode Detection
## Argument Parsing
Check `$ARGUMENTS` for `mode:autofix` or `mode:report-only`. If either token is present, strip it from the remaining arguments before interpreting the rest as the PR number, GitHub URL, or branch name.
Parse `$ARGUMENTS` for the following optional tokens. Strip each recognized token before interpreting the remainder as the PR number, GitHub URL, or branch name.
| Token | Example | Effect |
|-------|---------|--------|
| `mode:autofix` | `mode:autofix` | Select autofix mode (see Mode Detection below) |
| `mode:report-only` | `mode:report-only` | Select report-only mode |
| `mode:headless` | `mode:headless` | Select headless mode for programmatic callers (see Mode Detection below) |
| `base:<sha-or-ref>` | `base:abc1234` or `base:origin/main` | Skip scope detection — use this as the diff base directly |
| `plan:<path>` | `plan:docs/plans/2026-03-25-001-feat-foo-plan.md` | Load this plan for requirements verification |
All tokens are optional. Each one present means one less thing to infer. When absent, fall back to existing behavior for that stage.
**Conflicting mode flags:** If multiple mode tokens appear in arguments, stop and do not dispatch agents. If `mode:headless` is one of the conflicting tokens, emit the headless error envelope: `Review failed (headless mode). Reason: conflicting mode flags — <mode_a> and <mode_b> cannot be combined.` Otherwise emit the generic form: `Review failed. Reason: conflicting mode flags — <mode_a> and <mode_b> cannot be combined.`
## Mode Detection
| Mode | When | Behavior |
|------|------|----------|
| **Interactive** (default) | No mode token present | Review, present findings, ask for policy decisions when needed, and optionally continue into fix/push/PR next steps |
| **Interactive** (default) | No mode token present | Review, apply safe_auto fixes automatically, present findings, ask for policy decisions on gated/manual findings, and optionally continue into fix/push/PR next steps |
| **Autofix** | `mode:autofix` in arguments | No user interaction. Review, apply only policy-allowed `safe_auto` fixes, re-review in bounded rounds, write a run artifact, and emit residual downstream work when needed |
| **Report-only** | `mode:report-only` in arguments | Strictly read-only. Review and report only, then stop with no edits, artifacts, todos, commits, pushes, or PR actions |
| **Headless** | `mode:headless` in arguments | Programmatic mode for skill-to-skill invocation. Apply `safe_auto` fixes silently (single pass), return all other findings as structured text output, write run artifacts, skip todos, and return "Review complete" signal. No interactive prompts. |
### Autofix mode rules
@@ -42,6 +57,19 @@ Check `$ARGUMENTS` for `mode:autofix` or `mode:report-only`. If either token is
- **Do not switch the shared checkout.** If the caller passes an explicit PR or branch target, `mode:report-only` must run in an isolated checkout/worktree or stop instead of running `gh pr checkout` / `git checkout`.
- **Do not overlap mutating review with browser testing on the same checkout.** If a future orchestrator wants fixes, run the mutating review phase after browser testing or in an isolated checkout/worktree.
### Headless mode rules
- **Skip all user questions.** Never use the platform question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini) or other interactive prompts. Infer intent conservatively if the diff metadata is thin.
- **Require a determinable diff scope.** If headless mode cannot determine a diff scope (no branch, PR, or `base:` ref determinable without user interaction), emit `Review failed (headless mode). Reason: no diff scope detected. Re-invoke with a branch name, PR number, or base:<ref>.` and stop without dispatching agents.
- **Apply only `safe_auto -> review-fixer` findings in a single pass.** No bounded re-review rounds. Leave `gated_auto`, `manual`, `human`, and `release` work unresolved and return them in the structured output.
- **Return all non-auto findings as structured text output.** Use the headless output envelope format (see Stage 6 below) preserving severity, autofix_class, owner, requires_verification, confidence, evidence[], and pre_existing per finding.
- **Write a run artifact** under `.context/compound-engineering/ce-review/<run-id>/` summarizing findings, applied fixes, and advisory outputs. Include the artifact path in the structured output.
- **Do not create todo files.** The caller receives structured findings and routes downstream work itself.
- **Do not switch the shared checkout.** If the caller passes an explicit PR or branch target, `mode:headless` must run in an isolated checkout/worktree or stop instead of running `gh pr checkout` / `git checkout`. When stopping, emit `Review failed (headless mode). Reason: cannot switch shared checkout. Re-invoke with base:<ref> to review the current checkout, or run from an isolated worktree.`
- **Not safe for concurrent use on a shared checkout.** Unlike `mode:report-only`, headless mutates files (applies `safe_auto` fixes). Callers must not run headless concurrently with other mutating operations on the same checkout.
- **Never commit, push, or create a PR** from headless mode. The caller owns those decisions.
- **End with "Review complete" as the terminal signal** so callers can detect completion. If all reviewers fail or time out, emit `Code review degraded (headless mode). Reason: 0 of N reviewers returned results.` followed by "Review complete".
## Severity Scale
All reviewers use P0-P3:
@@ -73,7 +101,7 @@ Routing rules:
## Reviewers
8 personas in two tiers, plus CE-specific agents. See [persona-catalog.md](./references/persona-catalog.md) for the full catalog.
16 reviewer personas in layered conditionals, plus CE-specific agents. See the persona catalog included below for the full catalog.
**Always-on (every review):**
@@ -82,10 +110,11 @@ Routing rules:
| `compound-engineering:review:correctness-reviewer` | Logic errors, edge cases, state bugs, error propagation |
| `compound-engineering:review:testing-reviewer` | Coverage gaps, weak assertions, brittle tests |
| `compound-engineering:review:maintainability-reviewer` | Coupling, complexity, naming, dead code, abstraction debt |
| `compound-engineering:review:project-standards-reviewer` | CLAUDE.md and AGENTS.md compliance -- frontmatter, references, naming, portability |
| `compound-engineering:review:agent-native-reviewer` | Verify new features are agent-accessible |
| `compound-engineering:research:learnings-researcher` | Search docs/solutions/ for past issues related to this PR |
**Conditional (selected per diff):**
**Cross-cutting conditional (selected per diff):**
| Agent | Select when diff touches... |
|-------|---------------------------|
@@ -94,18 +123,31 @@ Routing rules:
| `compound-engineering:review:api-contract-reviewer` | Routes, serializers, type signatures, versioning |
| `compound-engineering:review:data-migrations-reviewer` | Migrations, schema changes, backfills |
| `compound-engineering:review:reliability-reviewer` | Error handling, retries, timeouts, background jobs |
| `compound-engineering:review:adversarial-reviewer` | Diff >=50 changed non-test/non-generated/non-lockfile lines, or auth, payments, data mutations, external APIs |
| `compound-engineering:review:previous-comments-reviewer` | Reviewing a PR that has existing review comments or threads |
**Stack-specific conditional (selected per diff):**
| Agent | Select when diff touches... |
|-------|---------------------------|
| `compound-engineering:review:dhh-rails-reviewer` | Rails architecture, service objects, session/auth choices, or Hotwire-vs-SPA boundaries |
| `compound-engineering:review:kieran-rails-reviewer` | Rails application code where conventions, naming, and maintainability are in play |
| `compound-engineering:review:kieran-python-reviewer` | Python modules, endpoints, scripts, or services |
| `compound-engineering:review:kieran-typescript-reviewer` | TypeScript components, services, hooks, utilities, or shared types |
| `compound-engineering:review:julik-frontend-races-reviewer` | Stimulus/Turbo controllers, DOM events, timers, animations, or async UI flows |
**CE conditional (migration & external review):**
| Agent | Select when... |
|-------|----------------|
| `compound-engineering:review:design-conformance-reviewer` | Repo contains design documents or active plan matching current branch |
| `compound-engineering:review:schema-drift-detector` | Diff includes migration files -- cross-references schema.rb against included migrations |
| `compound-engineering:review:deployment-verification-agent` | Diff includes migration files -- produces deployment checklist with SQL verification queries |
| `compound-engineering:review:zip-agent-validator` | PR URL contains `git.zoominfo.com` -- pressure-tests zip-agent comments for validity |
## Review Scope
Every review spawns all 3 always-on personas plus the 2 CE always-on agents, then adds applicable conditionals. The tier model naturally right-sizes: a small config change triggers 0 conditionals = 5 reviewers. A large auth feature triggers security + maybe reliability = 7 reviewers.
Every review spawns all 4 always-on personas plus the 2 CE always-on agents, then adds whichever cross-cutting and stack-specific conditionals fit the diff. The model naturally right-sizes: a small config change triggers 0 conditionals = 6 reviewers. A Rails auth feature might trigger security + reliability + kieran-rails + dhh-rails = 10 reviewers.
## Protected Artifacts
@@ -123,9 +165,26 @@ If a reviewer flags any file in these directories for cleanup or removal, discar
Compute the diff range, file list, and diff. Minimize permission prompts by combining into as few commands as possible.
**If `base:` argument is provided (fast path):**
The caller already knows the diff base. Skip all base-branch detection, remote resolution, and merge-base computation. Use the provided value directly:
```
BASE_ARG="{base_arg}"
BASE=$(git merge-base HEAD "$BASE_ARG" 2>/dev/null) || BASE="$BASE_ARG"
```
Then produce the same output as the other paths:
```
echo "BASE:$BASE" && echo "FILES:" && git diff --name-only $BASE && echo "DIFF:" && git diff -U10 $BASE && echo "UNTRACKED:" && git ls-files --others --exclude-standard
```
This path works with any ref — a SHA, `origin/main`, a branch name. Automated callers (ce:work, lfg, slfg) should prefer this to avoid the detection overhead. **Do not combine `base:` with a PR number or branch target.** If both are present, stop with an error: "Cannot use `base:` with a PR number or branch target — `base:` implies the current checkout is already the correct branch. Pass `base:` alone, or pass the target alone and let scope detection resolve the base." This avoids scope/intent mismatches where the diff base comes from one source but the code and metadata come from another.
**If a PR number or GitHub URL is provided as an argument:**
If `mode:report-only` is active, do **not** run `gh pr checkout <number-or-url>` on the shared checkout. Tell the caller: "mode:report-only cannot switch the shared checkout to review a PR target. Run it from an isolated worktree/checkout for that PR, or run report-only with no target argument on the already checked out branch." Stop here unless the review is already running in an isolated checkout.
If `mode:report-only` or `mode:headless` is active, do **not** run `gh pr checkout <number-or-url>` on the shared checkout. For `mode:report-only`, tell the caller: "mode:report-only cannot switch the shared checkout to review a PR target. Run it from an isolated worktree/checkout for that PR, or run report-only with no target argument on the already checked out branch." For `mode:headless`, emit `Review failed (headless mode). Reason: cannot switch shared checkout. Re-invoke with base:<ref> to review the current checkout, or run from an isolated worktree.` Stop here unless the review is already running in an isolated checkout.
First, verify the worktree is clean before switching branches:
@@ -179,7 +238,7 @@ Extract PR title/body, base branch, and PR URL from `gh pr view`, then extract t
Check out the named branch, then diff it against the base branch. Substitute the provided branch name (shown here as `<branch>`).
If `mode:report-only` is active, do **not** run `git checkout <branch>` on the shared checkout. Tell the caller: "mode:report-only cannot switch the shared checkout to review another branch. Run it from an isolated worktree/checkout for `<branch>`, or run report-only on the current checkout with no target argument." Stop here unless the review is already running in an isolated checkout.
If `mode:report-only` or `mode:headless` is active, do **not** run `git checkout <branch>` on the shared checkout. For `mode:report-only`, tell the caller: "mode:report-only cannot switch the shared checkout to review another branch. Run it from an isolated worktree/checkout for `<branch>`, or run report-only on the current checkout with no target argument." For `mode:headless`, emit `Review failed (headless mode). Reason: cannot switch shared checkout. Re-invoke with base:<ref> to review the current checkout, or run from an isolated worktree.` Stop here unless the review is already running in an isolated checkout.
First, verify the worktree is clean before switching branches:
@@ -193,97 +252,45 @@ If the output is non-empty, inform the user: "You have uncommitted changes on th
git checkout <branch>
```
Then detect the review base branch before computing the merge-base. When the branch has an open PR, resolve the base ref from the PR's actual base repository (not just `origin`), mirroring the PR-mode logic for fork safety. Fall back to `origin/HEAD`, GitHub metadata, then common branch names:
Then detect the review base branch and compute the merge-base. Run the `references/resolve-base.sh` script, which handles fork-safe remote resolution with multi-fallback detection (PR metadata -> `origin/HEAD` -> `gh repo view` -> common branch names):
```
REVIEW_BASE_BRANCH=""
PR_BASE_REPO=""
if command -v gh >/dev/null 2>&1; then
PR_META=$(gh pr view --json baseRefName,url 2>/dev/null || true)
if [ -n "$PR_META" ]; then
REVIEW_BASE_BRANCH=$(echo "$PR_META" | jq -r '.baseRefName // empty')
PR_BASE_REPO=$(echo "$PR_META" | jq -r '.url // empty' | sed -n 's#https://github.com/\([^/]*/[^/]*\)/pull/.*#\1#p')
fi
fi
if [ -z "$REVIEW_BASE_BRANCH" ]; then REVIEW_BASE_BRANCH=$(git symbolic-ref --quiet --short refs/remotes/origin/HEAD 2>/dev/null | sed 's#^origin/##'); fi
if [ -z "$REVIEW_BASE_BRANCH" ] && command -v gh >/dev/null 2>&1; then REVIEW_BASE_BRANCH=$(gh repo view --json defaultBranchRef --jq '.defaultBranchRef.name' 2>/dev/null); fi
if [ -z "$REVIEW_BASE_BRANCH" ]; then
for candidate in main master develop trunk; do
if git rev-parse --verify "origin/$candidate" >/dev/null 2>&1 || git rev-parse --verify "$candidate" >/dev/null 2>&1; then
REVIEW_BASE_BRANCH="$candidate"
break
fi
done
fi
if [ -n "$REVIEW_BASE_BRANCH" ]; then
if [ -n "$PR_BASE_REPO" ]; then
PR_BASE_REMOTE=$(git remote -v | awk "index(\$2, \"github.com:$PR_BASE_REPO\") || index(\$2, \"github.com/$PR_BASE_REPO\") {print \$1; exit}")
if [ -n "$PR_BASE_REMOTE" ]; then
git rev-parse --verify "$PR_BASE_REMOTE/$REVIEW_BASE_BRANCH" >/dev/null 2>&1 || git fetch --no-tags "$PR_BASE_REMOTE" "$REVIEW_BASE_BRANCH" 2>/dev/null || true
BASE_REF=$(git rev-parse --verify "$PR_BASE_REMOTE/$REVIEW_BASE_BRANCH" 2>/dev/null || true)
fi
fi
if [ -z "$BASE_REF" ]; then
git rev-parse --verify "origin/$REVIEW_BASE_BRANCH" >/dev/null 2>&1 || git fetch --no-tags origin "$REVIEW_BASE_BRANCH" 2>/dev/null || true
BASE_REF=$(git rev-parse --verify "origin/$REVIEW_BASE_BRANCH" 2>/dev/null || git rev-parse --verify "$REVIEW_BASE_BRANCH" 2>/dev/null || true)
fi
if [ -n "$BASE_REF" ]; then BASE=$(git merge-base HEAD "$BASE_REF" 2>/dev/null) || BASE=""; else BASE=""; fi
else BASE=""; fi
RESOLVE_OUT=$(bash references/resolve-base.sh) || { echo "ERROR: resolve-base.sh failed"; exit 1; }
if [ -z "$RESOLVE_OUT" ] || echo "$RESOLVE_OUT" | grep -q '^ERROR:'; then echo "${RESOLVE_OUT:-ERROR: resolve-base.sh produced no output}"; exit 1; fi
BASE=$(echo "$RESOLVE_OUT" | sed 's/^BASE://')
```
If the script outputs an error, stop instead of falling back to `git diff HEAD`; a branch review without the base branch would only show uncommitted changes and silently miss all committed work.
On success, produce the diff:
```
if [ -n "$BASE" ]; then echo "BASE:$BASE" && echo "FILES:" && git diff --name-only $BASE && echo "DIFF:" && git diff -U10 $BASE && echo "UNTRACKED:" && git ls-files --others --exclude-standard; else echo "ERROR: Unable to resolve review base branch locally. Fetch the base branch and rerun, or provide a PR number so the review scope can be determined from PR metadata."; fi
echo "BASE:$BASE" && echo "FILES:" && git diff --name-only $BASE && echo "DIFF:" && git diff -U10 $BASE && echo "UNTRACKED:" && git ls-files --others --exclude-standard
```
If the branch has an open PR, the detection above uses the PR's base repository to resolve the merge-base, which handles fork workflows correctly. You may still fetch additional PR metadata with `gh pr view` for title, body, and linked issues, but do not fail if no PR exists. If the base branch still cannot be resolved after the detection and fetch attempts, stop instead of falling back to `git diff HEAD`; a branch review without the base branch would only show uncommitted changes and silently miss all committed work.
You may still fetch additional PR metadata with `gh pr view` for title, body, and linked issues, but do not fail if no PR exists.
**If no argument (standalone on current branch):**
Detect the review base branch before computing the merge-base. When the current branch has an open PR, resolve the base ref from the PR's actual base repository (not just `origin`), mirroring the PR-mode logic for fork safety. Fall back to `origin/HEAD`, GitHub metadata, then common branch names:
Detect the review base branch and compute the merge-base using the same `references/resolve-base.sh` script as branch mode:
```
REVIEW_BASE_BRANCH=""
PR_BASE_REPO=""
if command -v gh >/dev/null 2>&1; then
PR_META=$(gh pr view --json baseRefName,url 2>/dev/null || true)
if [ -n "$PR_META" ]; then
REVIEW_BASE_BRANCH=$(echo "$PR_META" | jq -r '.baseRefName // empty')
PR_BASE_REPO=$(echo "$PR_META" | jq -r '.url // empty' | sed -n 's#https://github.com/\([^/]*/[^/]*\)/pull/.*#\1#p')
fi
fi
if [ -z "$REVIEW_BASE_BRANCH" ]; then REVIEW_BASE_BRANCH=$(git symbolic-ref --quiet --short refs/remotes/origin/HEAD 2>/dev/null | sed 's#^origin/##'); fi
if [ -z "$REVIEW_BASE_BRANCH" ] && command -v gh >/dev/null 2>&1; then REVIEW_BASE_BRANCH=$(gh repo view --json defaultBranchRef --jq '.defaultBranchRef.name' 2>/dev/null); fi
if [ -z "$REVIEW_BASE_BRANCH" ]; then
for candidate in main master develop trunk; do
if git rev-parse --verify "origin/$candidate" >/dev/null 2>&1 || git rev-parse --verify "$candidate" >/dev/null 2>&1; then
REVIEW_BASE_BRANCH="$candidate"
break
fi
done
fi
if [ -n "$REVIEW_BASE_BRANCH" ]; then
if [ -n "$PR_BASE_REPO" ]; then
PR_BASE_REMOTE=$(git remote -v | awk "index(\$2, \"github.com:$PR_BASE_REPO\") || index(\$2, \"github.com/$PR_BASE_REPO\") {print \$1; exit}")
if [ -n "$PR_BASE_REMOTE" ]; then
git rev-parse --verify "$PR_BASE_REMOTE/$REVIEW_BASE_BRANCH" >/dev/null 2>&1 || git fetch --no-tags "$PR_BASE_REMOTE" "$REVIEW_BASE_BRANCH" 2>/dev/null || true
BASE_REF=$(git rev-parse --verify "$PR_BASE_REMOTE/$REVIEW_BASE_BRANCH" 2>/dev/null || true)
fi
fi
if [ -z "$BASE_REF" ]; then
git rev-parse --verify "origin/$REVIEW_BASE_BRANCH" >/dev/null 2>&1 || git fetch --no-tags origin "$REVIEW_BASE_BRANCH" 2>/dev/null || true
BASE_REF=$(git rev-parse --verify "origin/$REVIEW_BASE_BRANCH" 2>/dev/null || git rev-parse --verify "$REVIEW_BASE_BRANCH" 2>/dev/null || true)
fi
if [ -n "$BASE_REF" ]; then BASE=$(git merge-base HEAD "$BASE_REF" 2>/dev/null) || BASE=""; else BASE=""; fi
else BASE=""; fi
RESOLVE_OUT=$(bash references/resolve-base.sh) || { echo "ERROR: resolve-base.sh failed"; exit 1; }
if [ -z "$RESOLVE_OUT" ] || echo "$RESOLVE_OUT" | grep -q '^ERROR:'; then echo "${RESOLVE_OUT:-ERROR: resolve-base.sh produced no output}"; exit 1; fi
BASE=$(echo "$RESOLVE_OUT" | sed 's/^BASE://')
```
If the script outputs an error, stop instead of falling back to `git diff HEAD`; a standalone review without the base branch would only show uncommitted changes and silently miss all committed work on the branch.
On success, produce the diff:
```
if [ -n "$BASE" ]; then echo "BASE:$BASE" && echo "FILES:" && git diff --name-only $BASE && echo "DIFF:" && git diff -U10 $BASE && echo "UNTRACKED:" && git ls-files --others --exclude-standard; else echo "ERROR: Unable to resolve review base branch locally. Fetch the base branch and rerun, or provide a PR number so the review scope can be determined from PR metadata."; fi
echo "BASE:$BASE" && echo "FILES:" && git diff --name-only $BASE && echo "DIFF:" && git diff -U10 $BASE && echo "UNTRACKED:" && git ls-files --others --exclude-standard
```
Parse: `BASE:` = merge-base SHA, `FILES:` = file list, `DIFF:` = diff, `UNTRACKED:` = files excluded from review scope because they are not staged. Using `git diff $BASE` (without `..HEAD`) diffs the merge-base against the working tree, which includes committed, staged, and unstaged changes together. If the base branch cannot be resolved after the detection and fetch attempts, stop instead of falling back to `git diff HEAD`; a standalone review without the base branch would only show uncommitted changes and silently miss all committed work on the branch.
Using `git diff $BASE` (without `..HEAD`) diffs the merge-base against the working tree, which includes committed, staged, and unstaged changes together.
**Untracked file handling:** Always inspect the `UNTRACKED:` list, even when `FILES:`/`DIFF:` are non-empty. Untracked files are outside review scope until staged. If the list is non-empty, tell the user which files are excluded. If any of them should be reviewed, stop and tell the user to `git add` them first and rerun. Only continue when the user is intentionally reviewing tracked changes only.
**Untracked file handling:** Always inspect the `UNTRACKED:` list, even when `FILES:`/`DIFF:` are non-empty. Untracked files are outside review scope until staged. If the list is non-empty, tell the user which files are excluded. If any of them should be reviewed, stop and tell the user to `git add` them first and rerun. Only continue when the user is intentionally reviewing tracked changes only. In `mode:headless` or `mode:autofix`, do not stop to ask — proceed with tracked changes only and note the excluded untracked files in the Coverage section of the output.
### Stage 2: Intent discovery
@@ -299,7 +306,7 @@ Understand what the change is trying to accomplish. The source of intent depends
echo "BRANCH:" && git rev-parse --abbrev-ref HEAD && echo "COMMITS:" && git log --oneline ${BASE}..HEAD
```
Combined with conversation context (plan section summary, PR description, caller-provided description), write a 2-3 line intent summary:
Combined with conversation context (plan section summary, PR description), write a 2-3 line intent summary:
```
Intent: Simplify tax calculation by replacing the multi-tier rate lookup
@@ -311,11 +318,31 @@ Pass this to every reviewer in their spawn prompt. Intent shapes *how hard each
**When intent is ambiguous:**
- **Interactive mode:** Ask one question using the platform's interactive question tool (AskUserQuestion in Claude Code, request_user_input in Codex): "What is the primary goal of these changes?" Do not spawn reviewers until intent is established.
- **Autofix/report-only modes:** Infer intent conservatively from the branch name, diff, PR metadata, and caller context. Note the uncertainty in Coverage or Verdict reasoning instead of blocking.
- **Autofix/report-only/headless modes:** Infer intent conservatively from the branch name, diff, PR metadata, and caller context. Note the uncertainty in Coverage or Verdict reasoning instead of blocking.
### Stage 2b: Plan discovery (requirements verification)
Locate the plan document so Stage 6 can verify requirements completeness. Check these sources in priority order — stop at the first hit:
1. **`plan:` argument.** If the caller passed a plan path, use it directly. Read the file to confirm it exists.
2. **PR body.** If PR metadata was fetched in Stage 1, scan the body for paths matching `docs/plans/*.md`. If exactly one match is found and the file exists, use it as `plan_source: explicit`. If multiple plan paths appear, treat as ambiguous — demote to `plan_source: inferred` for the most recent match that exists on disk, or skip if none exist or none clearly relate to the PR title/intent. Always verify the selected file exists before using it — stale or copied plan links in PR descriptions are common.
3. **Auto-discover.** Extract 2-3 keywords from the branch name (e.g., `feat/onboarding-skill` -> `onboarding`, `skill`). Glob `docs/plans/*` and filter filenames containing those keywords. If exactly one match, use it. If multiple matches or the match looks ambiguous (e.g., generic keywords like `review`, `fix`, `update` that could hit many plans), **skip auto-discovery** — a wrong plan is worse than no plan. If zero matches, skip.
**Confidence tagging:** Record how the plan was found:
- `plan:` argument -> `plan_source: explicit` (high confidence)
- Single unambiguous PR body match -> `plan_source: explicit` (high confidence)
- Multiple/ambiguous PR body matches -> `plan_source: inferred` (lower confidence)
- Auto-discover with single unambiguous match -> `plan_source: inferred` (lower confidence)
If a plan is found, read its **Requirements Trace** (R1, R2, etc.) and **Implementation Units** (checkbox items). Store the extracted requirements list and `plan_source` for Stage 6. Do not block the review if no plan is found — requirements verification is additive, not required.
### Stage 3: Select reviewers
Read the diff and file list from Stage 1. The 3 always-on personas and 2 CE always-on agents are automatic. For each conditional persona in [persona-catalog.md](./references/persona-catalog.md), decide whether the diff warrants it. This is agent judgment, not keyword matching.
Read the diff and file list from Stage 1. The 4 always-on personas and 2 CE always-on agents are automatic. For each cross-cutting and stack-specific conditional persona in the persona catalog included below, decide whether the diff warrants it. This is agent judgment, not keyword matching.
**`previous-comments` is PR-only.** Only select this persona when Stage 1 gathered PR metadata (PR number or URL was provided as an argument, or `gh pr view` returned metadata for the current branch). Skip it entirely for standalone branch reviews with no associated PR -- there are no prior comments to check.
Stack-specific personas are additive. A Rails UI change may warrant `kieran-rails` plus `julik-frontend-races`; a TypeScript API diff may warrant `kieran-typescript` plus `api-contract` and `reliability`.
For CE conditional agents, check if the diff includes files matching `db/migrate/*.rb`, `db/schema.rb`, or data backfill scripts. If the PR URL contains `git.zoominfo.com`, select `zip-agent-validator`.
@@ -326,29 +353,55 @@ Review team:
- correctness (always)
- testing (always)
- maintainability (always)
- project-standards (always)
- agent-native-reviewer (always)
- learnings-researcher (always)
- security -- new endpoint in routes.rb accepts user-provided redirect URL
- kieran-rails -- controller and Turbo flow changed in app/controllers and app/views
- dhh-rails -- diff adds service objects around ordinary Rails CRUD
- data-migrations -- adds migration 20260303_add_index_to_orders
- schema-drift-detector -- migration files present
```
This is progress reporting, not a blocking confirmation.
### Stage 3b: Discover project standards paths
Before spawning sub-agents, find the file paths (not contents) of all relevant standards files for the `project-standards` persona. Use the native file-search/glob tool to locate:
1. Use the native file-search tool (e.g., Glob in Claude Code) to find all `**/CLAUDE.md` and `**/AGENTS.md` in the repo.
2. Filter to those whose directory is an ancestor of at least one changed file. A standards file governs all files below it (e.g., `plugins/compound-engineering/AGENTS.md` applies to everything under `plugins/compound-engineering/`).
Pass the resulting path list to the `project-standards` persona inside a `<standards-paths>` block in its review context (see Stage 4). The persona reads the files itself, targeting only the sections relevant to the changed file types. This keeps the orchestrator's work cheap (path discovery only) and avoids bloating the subagent prompt with content the reviewer may not fully need.
### Stage 4: Spawn sub-agents
Spawn each selected persona reviewer as a parallel sub-agent using the template in [subagent-template.md](./references/subagent-template.md). Each persona sub-agent receives:
#### Model tiering
Persona sub-agents do focused, scoped work and should use cheaper/faster models to reduce cost and latency. The orchestrator itself stays on the default (most capable) model.
Use the platform's cheapest capable model for all persona and CE sub-agents. In Claude Code, pass `model: "haiku"` in the Agent tool call. On other platforms, use the equivalent fast/cheap tier (e.g., `gpt-4o-mini` in Codex). If the platform has no model override mechanism or the available model names are unknown, omit the model parameter and let agents inherit the default -- a working review on the parent model is better than a broken dispatch from an unrecognized model name.
CE always-on agents (agent-native-reviewer, learnings-researcher) and CE conditional agents (design-conformance-reviewer, schema-drift-detector, deployment-verification-agent, zip-agent-validator) also use the cheaper model tier since they perform scoped, focused work.
The orchestrator (this skill) stays on the default model because it handles intent discovery, reviewer selection, finding merge/dedup, and synthesis -- tasks that benefit from stronger reasoning.
#### Spawning
Spawn each selected persona reviewer as a parallel sub-agent using the subagent template included below. Each persona sub-agent receives:
1. Their persona file content (identity, failure modes, calibration, suppress conditions)
2. Shared diff-scope rules from [diff-scope.md](./references/diff-scope.md)
3. The JSON output contract from [findings-schema.json](./references/findings-schema.json)
4. Review context: intent summary, file list, diff
2. Shared diff-scope rules from the diff-scope reference included below
3. The JSON output contract from the findings schema included below
4. PR metadata: title, body, and URL when reviewing a PR (empty string otherwise). Passed in a `<pr-context>` block so reviewers can verify code against stated intent
5. Review context: intent summary, file list, diff
6. **For `project-standards` only:** the standards file path list from Stage 3b, wrapped in a `<standards-paths>` block appended to the review context
Persona sub-agents are **read-only**: they review and return structured JSON. They do not edit files or propose refactors.
Read-only here means **non-mutating**, not "no shell access." Reviewer sub-agents may use non-mutating inspection commands when needed to gather evidence or verify scope, including read-oriented `git` / `gh` usage such as `git diff`, `git show`, `git blame`, `git log`, and `gh pr view`. They must not edit files, change branches, commit, push, create PRs, or otherwise mutate the checkout or repository state.
Each persona sub-agent returns JSON matching [findings-schema.json](./references/findings-schema.json):
Each persona sub-agent returns JSON matching the findings schema included below:
```json
{
@@ -361,44 +414,126 @@ Each persona sub-agent returns JSON matching [findings-schema.json](./references
**CE always-on agents** (agent-native-reviewer, learnings-researcher) are dispatched as standard Agent calls in parallel with the persona agents. Give them the same review context bundle the personas receive: entry mode, any PR metadata gathered in Stage 1, intent summary, review base branch name when known, `BASE:` marker, file list, diff, and `UNTRACKED:` scope notes. Do not invoke them with a generic "review this" prompt. Their output is unstructured and synthesized separately in Stage 6.
**CE conditional agents** (design-conformance-reviewer, schema-drift-detector, deployment-verification-agent, zip-agent-validator) are also dispatched as standard Agent calls when applicable. Pass the same review context bundle plus the applicability reason (for example, which design docs were found, or which migration files triggered the agent). For schema-drift-detector specifically, pass the resolved review base branch explicitly so it never assumes `main`. For zip-agent-validator, pass the full PR URL and the PR number so it can fetch comments from the GHE API. Their output is unstructured and must be preserved for Stage 6 synthesis just like the CE always-on agents.
**CE conditional agents** (design-conformance-reviewer, schema-drift-detector, deployment-verification-agent, zip-agent-validator) are also dispatched as standard Agent calls when applicable. Pass the same review context bundle plus the applicability reason (for example, which migration files triggered the agent, which design docs were found, or that the PR URL matched `git.zoominfo.com`). For schema-drift-detector specifically, pass the resolved review base branch explicitly so it never assumes `main`. For zip-agent-validator, pass the full PR URL and the PR number so it can fetch comments from the GHE API. Their output is unstructured and must be preserved for Stage 6 synthesis just like the CE always-on agents.
### Stage 5: Merge findings
Convert multiple reviewer JSON payloads into one deduplicated, confidence-gated finding set.
1. **Validate.** Check each output against the schema. Drop malformed findings (missing required fields). Record the drop count.
2. **Confidence gate.** Suppress findings below 0.60 confidence. Record the suppressed count. This matches the persona instructions: findings below 0.60 are noise and should not survive synthesis.
2. **Confidence gate.** Suppress findings below 0.60 confidence. Exception: P0 findings at 0.50+ confidence survive the gate -- critical-but-uncertain issues must not be silently dropped. Record the suppressed count. This matches the persona instructions and the schema's confidence thresholds.
3. **Deduplicate.** Compute fingerprint: `normalize(file) + line_bucket(line, +/-3) + normalize(title)`. When fingerprints match, merge: keep highest severity, keep highest confidence with strongest evidence, union evidence, note which reviewers flagged it.
4. **Separate pre-existing.** Pull out findings with `pre_existing: true` into a separate list.
5. **Normalize routing.** For each merged finding, set the final `autofix_class`, `owner`, and `requires_verification`. If reviewers disagree, keep the most conservative route. Synthesis may narrow a finding from `safe_auto` to `gated_auto` or `manual`, but must not widen it without new evidence.
6. **Partition the work.** Build three sets:
4. **Cross-reviewer agreement.** When 2+ independent reviewers flag the same issue (same fingerprint), boost the merged confidence by 0.10 (capped at 1.0). Cross-reviewer agreement is strong signal -- independent reviewers converging on the same issue is more reliable than any single reviewer's confidence. Note the agreement in the Reviewer column of the output (e.g., "security, correctness").
5. **Separate pre-existing.** Pull out findings with `pre_existing: true` into a separate list.
5. **Resolve disagreements.** When reviewers flag the same code region but disagree on severity, autofix_class, or owner, record the disagreement in the finding's evidence (e.g., "security rated P0, correctness rated P1 -- keeping P0"). This transparency helps the user understand why a finding was routed the way it was.
6. **Normalize routing.** For each merged finding, set the final `autofix_class`, `owner`, and `requires_verification`. If reviewers disagree, keep the most conservative route. Synthesis may narrow a finding from `safe_auto` to `gated_auto` or `manual`, but must not widen it without new evidence.
7. **Partition the work.** Build three sets:
- in-skill fixer queue: only `safe_auto -> review-fixer`
- residual actionable queue: unresolved `gated_auto` or `manual` findings whose owner is `downstream-resolver`
- report-only queue: `advisory` findings plus anything owned by `human` or `release`
7. **Sort.** Order by severity (P0 first) -> confidence (descending) -> file path -> line number.
8. **Collect coverage data.** Union residual_risks and testing_gaps across reviewers.
9. **Preserve CE agent artifacts.** Keep the learnings, agent-native, schema-drift, deployment-verification, and zip-agent-validator outputs alongside the merged finding set. Do not drop unstructured agent output just because it does not match the persona JSON schema. For zip-agent-validator specifically, its validated findings use the standard findings schema and enter the merge pipeline (steps 1-7) like persona findings. Its `residual_risks` entries (collapsed zip-agent comments) are preserved separately for the Zip Agent Validation section in Stage 6.
8. **Sort.** Order by severity (P0 first) -> confidence (descending) -> file path -> line number.
9. **Collect coverage data.** Union residual_risks and testing_gaps across reviewers.
10. **Preserve CE agent artifacts.** Keep the learnings, agent-native, schema-drift, deployment-verification, and zip-agent-validator outputs alongside the merged finding set. Do not drop unstructured agent output just because it does not match the persona JSON schema. For zip-agent-validator specifically, its validated findings use the standard findings schema and enter the merge pipeline (steps 1-7) like persona findings. Its `residual_risks` entries (collapsed zip-agent comments) are preserved separately for the Zip Agent Validation section in Stage 6.
### Stage 6: Synthesize and present
Assemble the final report using the template in [review-output-template.md](./references/review-output-template.md):
Assemble the final report using **pipe-delimited markdown tables for findings** from the review output template included below. The table format is mandatory for finding rows in interactive mode — do not render findings as freeform text blocks or horizontal-rule-separated prose. Other report sections (Applied Fixes, Learnings, Coverage, etc.) use bullet lists and the `---` separator before the verdict, as shown in the template.
1. **Header.** Scope, intent, mode, reviewer team with per-conditional justifications.
2. **Findings.** Grouped by severity (P0, P1, P2, P3). Each finding shows file, issue, reviewer(s), confidence, and synthesized route.
3. **Applied Fixes.** Include only if a fix phase ran in this invocation.
4. **Residual Actionable Work.** Include when unresolved actionable findings were handed off or should be handed off.
5. **Pre-existing.** Separate section, does not count toward verdict.
6. **Learnings & Past Solutions.** Surface learnings-researcher results: if past solutions are relevant, flag them as "Known Pattern" with links to docs/solutions/ files.
7. **Agent-Native Gaps.** Surface agent-native-reviewer results. Omit section if no gaps found.
8. **Schema Drift Check.** If schema-drift-detector ran, summarize whether drift was found. If drift exists, list the unrelated schema objects and the required cleanup command. If clean, say so briefly.
9. **Deployment Notes.** If deployment-verification-agent ran, surface the key Go/No-Go items: blocking pre-deploy checks, the most important verification queries, rollback caveats, and monitoring focus areas. Keep the checklist actionable rather than dropping it into Coverage.
10. **Zip Agent Validation.** If zip-agent-validator ran, summarize the results: how many zip-agent comments were evaluated, how many validated (these appear as findings in the severity-grouped tables above), and how many collapsed with reasons. This section provides traceability -- reviewers can see that zip-agent comments were evaluated, not ignored.
11. **Coverage.** Suppressed count, residual risks, testing gaps, failed/timed-out reviewers, and any intent uncertainty carried by non-interactive modes.
12. **Verdict.** Ready to merge / Ready with fixes / Not ready. Fix order if applicable.
2. **Findings.** Rendered as pipe-delimited tables grouped by severity (`### P0 -- Critical`, `### P1 -- High`, `### P2 -- Moderate`, `### P3 -- Low`). Each finding row shows `#`, file, issue, reviewer(s), confidence, and synthesized route. Omit empty severity levels. Never render findings as freeform text blocks or numbered lists.
3. **Requirements Completeness.** Include only when a plan was found in Stage 2b. For each requirement (R1, R2, etc.) and implementation unit in the plan, report whether corresponding work appears in the diff. Use a simple checklist: met / not addressed / partially addressed. Routing depends on `plan_source`:
- **`explicit`** (caller-provided or PR body): Flag unaddressed requirements as P1 findings with `autofix_class: manual`, `owner: downstream-resolver`. These enter the residual actionable queue and can become todos.
- **`inferred`** (auto-discovered): Flag unaddressed requirements as P3 findings with `autofix_class: advisory`, `owner: human`. These stay in the report only — no todos, no autonomous follow-up. An inferred plan match is a hint, not a contract.
Omit this section entirely when no plan was found — do not mention the absence of a plan.
4. **Applied Fixes.** Include only if a fix phase ran in this invocation.
5. **Residual Actionable Work.** Include when unresolved actionable findings were handed off or should be handed off.
6. **Pre-existing.** Separate section, does not count toward verdict.
7. **Learnings & Past Solutions.** Surface learnings-researcher results: if past solutions are relevant, flag them as "Known Pattern" with links to docs/solutions/ files.
8. **Agent-Native Gaps.** Surface agent-native-reviewer results. Omit section if no gaps found.
9. **Schema Drift Check.** If schema-drift-detector ran, summarize whether drift was found. If drift exists, list the unrelated schema objects and the required cleanup command. If clean, say so briefly.
10. **Deployment Notes.** If deployment-verification-agent ran, surface the key Go/No-Go items: blocking pre-deploy checks, the most important verification queries, rollback caveats, and monitoring focus areas. Keep the checklist actionable rather than dropping it into Coverage.
11. **Zip Agent Validation.** If zip-agent-validator ran, summarize the results: how many zip-agent comments were evaluated, how many validated (these appear as findings in the severity-grouped tables above), and how many collapsed with reasons. This section provides traceability -- reviewers can see that zip-agent comments were evaluated, not ignored.
12. **Coverage.** Suppressed count, residual risks, testing gaps, failed/timed-out reviewers, and any intent uncertainty carried by non-interactive modes.
13. **Verdict.** Ready to merge / Ready with fixes / Not ready. Fix order if applicable. When an `explicit` plan has unaddressed requirements, the verdict must reflect it — a PR that's code-clean but missing planned requirements is "Not ready" unless the omission is intentional. When an `inferred` plan has unaddressed requirements, note it in the verdict reasoning but do not block on it alone.
Do not include time estimates.
**Format verification:** Before delivering the report, verify the findings sections use pipe-delimited table rows (`| # | File | Issue | ... |`) not freeform text. If you catch yourself rendering findings as prose blocks separated by horizontal rules or bullet points, stop and reformat into tables.
### Headless output format
In `mode:headless`, replace the interactive pipe-delimited table report with a structured text envelope. The envelope follows the same structural pattern as document-review's headless output (completion header, metadata block, findings grouped by autofix_class, trailing sections) while using ce:review's own section headings and per-finding fields.
```
Code review complete (headless mode).
Scope: <scope-line>
Intent: <intent-summary>
Reviewers: <reviewer-list with conditional justifications>
Verdict: <Ready to merge | Ready with fixes | Not ready>
Artifact: .context/compound-engineering/ce-review/<run-id>/
Applied N safe_auto fixes.
Gated-auto findings (concrete fix, changes behavior/contracts):
[P1][gated_auto -> downstream-resolver][needs-verification] File: <file:line> -- <title> (<reviewer>, confidence <N>)
Why: <why_it_matters>
Suggested fix: <suggested_fix or "none">
Evidence: <evidence[0]>
Evidence: <evidence[1]>
Manual findings (actionable, needs handoff):
[P1][manual -> downstream-resolver] File: <file:line> -- <title> (<reviewer>, confidence <N>)
Why: <why_it_matters>
Evidence: <evidence[0]>
Advisory findings (report-only):
[P2][advisory -> human] File: <file:line> -- <title> (<reviewer>, confidence <N>)
Why: <why_it_matters>
Pre-existing issues:
[P2][gated_auto -> downstream-resolver] File: <file:line> -- <title> (<reviewer>, confidence <N>)
Why: <why_it_matters>
Residual risks:
- <risk>
Learnings & Past Solutions:
- <learning>
Agent-Native Gaps:
- <gap description>
Schema Drift Check:
- <drift status>
Deployment Notes:
- <deployment note>
Testing gaps:
- <gap>
Coverage:
- Suppressed: <N> findings below 0.60 confidence (P0 at 0.50+ retained)
- Untracked files excluded: <file1>, <file2>
- Failed reviewers: <reviewer>
Review complete
```
**Formatting rules:**
- The `[needs-verification]` marker appears only on findings where `requires_verification: true`.
- The `Artifact:` line gives callers the path to the full run artifact for machine-readable access to the complete findings schema. The text envelope is the primary handoff; the artifact is for debugging and full-fidelity access.
- Findings with `owner: release` appear in the Advisory section (they are operational/rollout items, not code fixes).
- Findings with `pre_existing: true` appear in the Pre-existing section regardless of autofix_class.
- The Verdict appears in the metadata header (deliberately reordered from the interactive format where it appears at the bottom) so programmatic callers get the verdict first.
- Omit any section with zero items.
- If all reviewers fail or time out, emit `Code review degraded (headless mode). Reason: 0 of N reviewers returned results.` followed by "Review complete".
- End with "Review complete" as the terminal signal so callers can detect completion.
## Quality Gates
Before delivering the review, verify:
@@ -410,9 +545,11 @@ Before delivering the review, verify:
5. **Protected artifacts are respected.** Discard any findings that recommend deleting or gitignoring files in `docs/brainstorms/`, `docs/plans/`, or `docs/solutions/`.
6. **Findings don't duplicate linter output.** Don't flag things the project's linter/formatter would catch (missing semicolons, wrong indentation). Focus on semantic issues.
## Language-Agnostic
## Language-Aware Conditionals
This skill does NOT use language-specific reviewer agents. Persona reviewers adapt their criteria to the language/framework based on project context (loaded automatically). This keeps the skill simple and avoids maintaining parallel reviewers per language.
This skill uses stack-specific reviewer agents when the diff clearly warrants them. Keep those agents opinionated. They are not generic language checkers; they add a distinct review lens on top of the always-on and cross-cutting personas.
Do not spawn them mechanically from file extensions alone. The trigger is meaningful changed behavior, architecture, or UI state in that stack.
## After Review
@@ -432,17 +569,26 @@ After presenting findings and verdict (Stage 6), route the next steps by mode. R
**Interactive mode**
- Ask a single policy question only when actionable work exists.
- Recommended default:
- Apply `safe_auto -> review-fixer` findings automatically without asking. These are safe by definition.
- Ask a policy question **using the platform's blocking question tool** (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini) only when `gated_auto` or `manual` findings remain after safe fixes. Do not replace with a conversational open-ended question. Adapt the options to match what actually remains:
**When `gated_auto` findings are present** (with or without `manual`):
```
What should I do with the actionable findings?
1. Apply safe_auto fixes and leave the rest as residual work (Recommended)
2. Apply safe_auto fixes only
3. Review report only
Safe fixes have been applied. What should I do with the remaining findings?
1. Review and approve specific gated fixes (Recommended)
2. Leave as residual work
3. Report only -- no further action
```
- Tailor the prompt to the actual action sets. If the fixer queue is empty, do not offer "Apply safe_auto fixes" options. Ask whether to externalize the residual actionable work or keep the review report-only instead.
**When only `manual` findings remain** (no `gated_auto`):
```
Safe fixes have been applied. The remaining findings need manual resolution. What should I do?
1. Leave as residual work (Recommended)
2. Report only -- no further action
```
If no blocking question tool is available, present the applicable numbered options as text and wait for the user's selection before proceeding.
- If no `gated_auto` or `manual` findings remain after safe fixes, skip the policy question entirely — report what was fixed and proceed to next steps.
- Only include `gated_auto` findings in the fixer queue after the user explicitly approves the specific items. Do not widen the queue based on severity alone.
**Autofix mode**
@@ -459,6 +605,15 @@ After presenting findings and verdict (Stage 6), route the next steps by mode. R
- Do not create residual todos or `.context` artifacts.
- Stop after Stage 6. Everything remains in the report.
**Headless mode**
- Ask no questions.
- Apply only the `safe_auto -> review-fixer` queue in a single pass. Do not enter the bounded re-review loop (Step 3). Spawn one fixer subagent, apply fixes, then proceed directly to Step 4.
- Leave `gated_auto`, `manual`, `human`, and `release` items unresolved — they appear in the structured text output.
- Output the headless output envelope (see Stage 6) instead of the interactive report.
- Write a run artifact (Step 4) but do not create todo files.
- Stop after the structured text output and "Review complete" signal. No commit/push/PR.
#### Step 3: Apply fixes with one fixer and bounded rounds
- Spawn exactly one fixer subagent for the current fixer queue in the current checkout. That fixer applies all approved changes and runs the relevant targeted tests in one pass against a consistent tree.
@@ -470,7 +625,7 @@ After presenting findings and verdict (Stage 6), route the next steps by mode. R
#### Step 4: Emit artifacts and downstream handoff
- In interactive and autofix modes, write a per-run artifact under `.context/compound-engineering/ce-review/<run-id>/` containing:
- In interactive, autofix, and headless modes, write a per-run artifact under `.context/compound-engineering/ce-review/<run-id>/` containing:
- synthesized findings
- applied fixes
- residual actionable work
@@ -498,8 +653,32 @@ After presenting findings and verdict (Stage 6), route the next steps by mode. R
If "Create a PR": first publish the branch with `git push --set-upstream origin HEAD`, then use `gh pr create` with a title and summary derived from the branch changes.
If "Push fixes": push the branch with `git push` to update the existing PR.
**Autofix and report-only modes:** stop after the report, artifact emission, and residual-work handoff. Do not commit, push, or create a PR.
**Autofix, report-only, and headless modes:** stop after the report, artifact emission, and residual-work handoff. Do not commit, push, or create a PR.
## Fallback
If the platform doesn't support parallel sub-agents, run reviewers sequentially. Everything else (stages, output format, merge pipeline) stays the same.
---
## Included References
### Persona Catalog
@./references/persona-catalog.md
### Subagent Template
@./references/subagent-template.md
### Diff Scope Rules
@./references/diff-scope.md
### Findings Schema
@./references/findings-schema.json
### Review Output Template
@./references/review-output-template.md

View File

@@ -102,9 +102,10 @@
"_meta": {
"confidence_thresholds": {
"suppress": "Below 0.60 -- do not report. Finding is speculative noise.",
"flag": "0.60-0.69 -- include only when the persona's calibration says the issue is actionable at that confidence.",
"report": "0.70+ -- report with full confidence."
"suppress": "Below 0.60 -- do not report. Finding is speculative noise. Exception: P0 findings at 0.50+ may be reported.",
"flag": "0.60-0.69 -- include only when the issue is clearly actionable with concrete evidence.",
"confident": "0.70-0.84 -- real and important. Report with full evidence.",
"certain": "0.85-1.00 -- verifiable from the code alone. Report."
},
"severity_definitions": {
"P0": "Critical breakage, exploitable vulnerability, data loss/corruption. Must fix before merge.",
@@ -113,10 +114,10 @@
"P3": "Low-impact, narrow scope, minor improvement. User's discretion."
},
"autofix_classes": {
"safe_auto": "Local, deterministic code or test fix suitable for the in-skill fixer in autonomous mode.",
"gated_auto": "Concrete fix exists, but it changes behavior, permissions, contracts, or other sensitive areas that deserve explicit approval.",
"manual": "Actionable issue that should become residual work rather than an in-skill autofix.",
"advisory": "Informational or operational item that should be surfaced in the report only."
"safe_auto": "Local, deterministic code or test fix suitable for the in-skill fixer. Examples: extract duplicated helper, add missing nil check, fix off-by-one, add missing test, remove dead code. Do not default to advisory when a concrete safe fix exists.",
"gated_auto": "Concrete fix exists, but it changes behavior, permissions, contracts, or other sensitive areas that deserve explicit approval. Examples: add auth to unprotected endpoint, change API response shape.",
"manual": "Actionable issue that requires design decisions or cross-cutting changes. Examples: redesign data model, add pagination strategy, choose between architectural approaches.",
"advisory": "Informational or operational item that should be surfaced in the report only. Examples: design asymmetry the PR improves but does not fully resolve, residual risk notes, deployment considerations."
},
"owners": {
"review-fixer": "The in-skill fixer can own this when policy allows.",

View File

@@ -1,8 +1,8 @@
# Persona Catalog
13 reviewer personas organized in three tiers, plus CE-specific agents. The orchestrator uses this catalog to select which reviewers to spawn for each review.
21 reviewer personas organized into always-on, cross-cutting conditional, stack-specific conditional, and language/framework conditional layers, plus CE-specific agents. The orchestrator uses this catalog to select which reviewers to spawn for each review.
## Always-on (3 personas + 2 CE agents)
## Always-on (4 personas + 2 CE agents)
Spawned on every review regardless of diff content.
@@ -13,6 +13,7 @@ Spawned on every review regardless of diff content.
| `correctness` | `compound-engineering:review:correctness-reviewer` | Logic errors, edge cases, state bugs, error propagation, intent compliance |
| `testing` | `compound-engineering:review:testing-reviewer` | Coverage gaps, weak assertions, brittle tests, missing edge case tests |
| `maintainability` | `compound-engineering:review:maintainability-reviewer` | Coupling, complexity, naming, dead code, premature abstraction |
| `project-standards` | `compound-engineering:review:project-standards-reviewer` | CLAUDE.md and AGENTS.md compliance -- frontmatter, references, naming, cross-platform portability, tool selection |
**CE agents (unstructured output, synthesized separately):**
@@ -21,7 +22,7 @@ Spawned on every review regardless of diff content.
| `compound-engineering:review:agent-native-reviewer` | Verify new features are agent-accessible |
| `compound-engineering:research:learnings-researcher` | Search docs/solutions/ for past issues related to this PR's modules and patterns |
## Conditional (5 personas)
## Conditional (7 personas)
Spawned when the orchestrator identifies relevant patterns in the diff. The orchestrator reads the full diff and reasons about selection -- this is agent judgment, not keyword matching.
@@ -32,6 +33,20 @@ Spawned when the orchestrator identifies relevant patterns in the diff. The orch
| `api-contract` | `compound-engineering:review:api-contract-reviewer` | Route definitions, serializer/interface changes, event schemas, exported type signatures, API versioning |
| `data-migrations` | `compound-engineering:review:data-migrations-reviewer` | Migration files, schema changes, backfill scripts, data transformations |
| `reliability` | `compound-engineering:review:reliability-reviewer` | Error handling, retry logic, circuit breakers, timeouts, background jobs, async handlers, health checks |
| `adversarial` | `compound-engineering:review:adversarial-reviewer` | Diff has >=50 changed non-test, non-generated, non-lockfile lines, OR touches auth, payments, data mutations, external API integrations, or other high-risk domains |
| `previous-comments` | `compound-engineering:review:previous-comments-reviewer` | **PR-only.** Reviewing a PR that has existing review comments or review threads from prior review rounds. Skip entirely when no PR metadata was gathered in Stage 1. |
## Stack-Specific Conditional (5 personas)
These reviewers keep their original opinionated lens. They are additive with the cross-cutting personas above, not replacements for them.
| Persona | Agent | Select when diff touches... |
|---------|-------|---------------------------|
| `dhh-rails` | `compound-engineering:review:dhh-rails-reviewer` | Rails architecture, service objects, authentication/session choices, Hotwire-vs-SPA boundaries, or abstractions that may fight Rails conventions |
| `kieran-rails` | `compound-engineering:review:kieran-rails-reviewer` | Rails controllers, models, views, jobs, components, routes, or other application-layer Ruby code where clarity and conventions matter |
| `kieran-python` | `compound-engineering:review:kieran-python-reviewer` | Python modules, endpoints, services, scripts, or typed domain code |
| `kieran-typescript` | `compound-engineering:review:kieran-typescript-reviewer` | TypeScript components, services, hooks, utilities, or shared types |
| `julik-frontend-races` | `compound-engineering:review:julik-frontend-races-reviewer` | Stimulus/Turbo controllers, DOM event wiring, timers, async UI flows, animations, or frontend state transitions with race potential |
## Language & Framework Conditional (5 personas)
@@ -47,7 +62,7 @@ Spawned when the orchestrator identifies language or framework-specific patterns
## CE Conditional Agents (design, migration & external review)
These CE-native agents provide specialized analysis beyond what the persona agents cover.
These CE-native agents provide specialized analysis beyond what the persona agents cover. Spawn them when the diff includes database migrations, schema.rb, data backfills, design documents, or when the PR originates from specific platforms.
| Agent | Focus | Select when... |
|-------|-------|----------------|
@@ -58,8 +73,9 @@ These CE-native agents provide specialized analysis beyond what the persona agen
## Selection rules
1. **Always spawn all 3 always-on personas** plus the 2 CE always-on agents.
2. **For each conditional persona**, the orchestrator reads the diff and decides whether the persona's domain is relevant. This is a judgment call, not a keyword match.
3. **For language/framework conditional personas**, spawn when the diff contains files matching the persona's language or framework domain. Multiple language personas can be active simultaneously (e.g., both `python-quality` and `typescript-quality` if the diff touches both).
4. **For CE conditional agents**, check each agent's selection criteria. `design-conformance-reviewer`: spawn when the repo contains design docs or an active plan matching the branch. `schema-drift-detector` and `deployment-verification-agent`: spawn when the diff includes migration files (`db/migrate/*.rb`, `db/schema.rb`) or data backfill scripts. `zip-agent-validator`: spawn when the PR URL contains `git.zoominfo.com`.
5. **Announce the team** before spawning with a one-line justification per conditional reviewer selected.
1. **Always spawn all 4 always-on personas** plus the 2 CE always-on agents.
2. **For each cross-cutting conditional persona**, the orchestrator reads the diff and decides whether the persona's domain is relevant. This is a judgment call, not a keyword match.
3. **For each stack-specific conditional persona**, use file types and changed patterns as a starting point, then decide whether the diff actually introduces meaningful work for that reviewer. Do not spawn language-specific reviewers just because one config or generated file happens to match the extension.
4. **For each language/framework conditional persona**, check whether the diff touches language or framework-specific patterns that warrant deeper domain expertise. These are additive with stack-specific personas, not replacements.
5. **For CE conditional agents**, spawn when applicable: migration files (`db/migrate/*.rb`, `db/schema.rb`) or data backfill scripts trigger schema-drift-detector and deployment-verification-agent; design documents or active plans trigger design-conformance-reviewer; PR URLs containing `git.zoominfo.com` trigger zip-agent-validator.
6. **Announce the team** before spawning with a one-line justification per conditional reviewer selected.

View File

@@ -0,0 +1,94 @@
#!/usr/bin/env bash
# Resolve the review base branch and compute the merge-base for ce:review.
# Handles fork-safe remote resolution, PR metadata, and multi-fallback detection.
#
# Usage: bash references/resolve-base.sh
# Output: BASE:<sha> on success, ERROR:<message> on failure.
#
# Detects the base branch from (in priority order):
# 1. PR metadata (base ref + base repo for fork safety)
# 2. origin/HEAD symbolic ref
# 3. gh repo view defaultBranchRef
# 4. Common branch names: main, master, develop, trunk
set -euo pipefail
REVIEW_BASE_BRANCH=""
PR_BASE_REPO=""
PR_BASE_REMOTE=""
BASE_REF=""
# Step 1: Try PR metadata (handles fork workflows)
if command -v gh >/dev/null 2>&1; then
PR_META=$(gh pr view --json baseRefName,url 2>/dev/null || true)
if [ -n "$PR_META" ]; then
REVIEW_BASE_BRANCH=$(echo "$PR_META" | jq -r '.baseRefName // empty' 2>/dev/null || true)
PR_BASE_REPO=$(echo "$PR_META" | jq -r '.url // empty' 2>/dev/null | sed -n 's#https://github.com/\([^/]*/[^/]*\)/pull/.*#\1#p' || true)
fi
fi
# Step 2: Fall back to origin/HEAD
if [ -z "$REVIEW_BASE_BRANCH" ]; then
REVIEW_BASE_BRANCH=$(git symbolic-ref --quiet --short refs/remotes/origin/HEAD 2>/dev/null | sed 's#^origin/##' || true)
fi
# Step 3: Fall back to gh repo view
if [ -z "$REVIEW_BASE_BRANCH" ] && command -v gh >/dev/null 2>&1; then
REVIEW_BASE_BRANCH=$(gh repo view --json defaultBranchRef --jq '.defaultBranchRef.name' 2>/dev/null || true)
fi
# Step 4: Fall back to common branch names
if [ -z "$REVIEW_BASE_BRANCH" ]; then
for candidate in main master develop trunk; do
if git rev-parse --verify "origin/$candidate" >/dev/null 2>&1 || git rev-parse --verify "$candidate" >/dev/null 2>&1; then
REVIEW_BASE_BRANCH="$candidate"
break
fi
done
fi
# Resolve the base ref from the correct remote (fork-safe)
if [ -n "$REVIEW_BASE_BRANCH" ]; then
if [ -n "$PR_BASE_REPO" ]; then
PR_BASE_REMOTE=$(git remote -v | awk "index(\$2, \"github.com:$PR_BASE_REPO\") || index(\$2, \"github.com/$PR_BASE_REPO\") {print \$1; exit}")
if [ -n "$PR_BASE_REMOTE" ]; then
git rev-parse --verify "$PR_BASE_REMOTE/$REVIEW_BASE_BRANCH" >/dev/null 2>&1 || git fetch --no-tags "$PR_BASE_REMOTE" "$REVIEW_BASE_BRANCH:refs/remotes/$PR_BASE_REMOTE/$REVIEW_BASE_BRANCH" 2>/dev/null || git fetch --no-tags "$PR_BASE_REMOTE" "$REVIEW_BASE_BRANCH" 2>/dev/null || true
BASE_REF=$(git rev-parse --verify "$PR_BASE_REMOTE/$REVIEW_BASE_BRANCH" 2>/dev/null || true)
fi
fi
if [ -z "$BASE_REF" ]; then
# Only try origin if it exists as a remote; otherwise skip to avoid
# confusing errors in fork setups where origin points at the user's fork.
if git remote get-url origin >/dev/null 2>&1; then
git rev-parse --verify "origin/$REVIEW_BASE_BRANCH" >/dev/null 2>&1 || git fetch --no-tags origin "$REVIEW_BASE_BRANCH:refs/remotes/origin/$REVIEW_BASE_BRANCH" 2>/dev/null || git fetch --no-tags origin "$REVIEW_BASE_BRANCH" 2>/dev/null || true
BASE_REF=$(git rev-parse --verify "origin/$REVIEW_BASE_BRANCH" 2>/dev/null || true)
fi
# Fall back to a bare local ref only if remote resolution failed
if [ -z "$BASE_REF" ]; then
BASE_REF=$(git rev-parse --verify "$REVIEW_BASE_BRANCH" 2>/dev/null || true)
fi
fi
fi
# Compute merge-base
if [ -n "$BASE_REF" ]; then
BASE=$(git merge-base HEAD "$BASE_REF" 2>/dev/null) || BASE=""
if [ -z "$BASE" ] && [ "$(git rev-parse --is-shallow-repository 2>/dev/null || echo false)" = "true" ]; then
if git remote get-url origin >/dev/null 2>&1; then
git fetch --no-tags --unshallow origin 2>/dev/null || true
BASE=$(git merge-base HEAD "$BASE_REF" 2>/dev/null) || BASE=""
fi
if [ -z "$BASE" ] && [ -n "$PR_BASE_REMOTE" ] && [ "$PR_BASE_REMOTE" != "origin" ]; then
git fetch --no-tags --unshallow "$PR_BASE_REMOTE" 2>/dev/null || true
BASE=$(git merge-base HEAD "$BASE_REF" 2>/dev/null) || BASE=""
fi
fi
else
BASE=""
fi
if [ -n "$BASE" ]; then
echo "BASE:$BASE"
else
echo "ERROR:Unable to resolve review base branch locally. Fetch the base branch and rerun, or provide a PR number so the review scope can be determined from PR metadata."
fi

View File

@@ -92,6 +92,15 @@ Use this **exact format** when presenting synthesized review findings. Findings
- Residual risks: No rate limiting on export endpoint
- Testing gaps: No test for concurrent export requests
### Zip Agent Validation
- Evaluated: 8 zip-agent comments
- Validated: 2 (appear as findings #3 and #6 above)
- Collapsed: 6
- `app/services/order_service.rb:45`: "Missing error handling" -- handled by ApplicationService base class rescue
- `app/controllers/api/orders_controller.rb:18`: "Unbounded query" -- pagination enforced by ApiController concern
- _(4 more collapsed for stylistic/formatting concerns)_
---
> **Verdict:** Ready with fixes
@@ -101,16 +110,37 @@ Use this **exact format** when presenting synthesized review findings. Findings
> **Fix order:** P0 auth bypass -> P1 memory/pagination -> P2 error handling if straightforward
```
## Anti-patterns
Do NOT produce output like this. The following is wrong:
```markdown
Findings
Sev: P1
File: foo.go:42
Issue: Some problem description
Reviewer(s): adversarial
Confidence: 0.85
Route: advisory -> human
────────────────────────────────────────
Sev: P2
File: bar.go:99
Issue: Another problem
```
This fails because: no pipe-delimited tables, no severity-grouped `###` headers, uses box-drawing horizontal rules, no numbered findings, no `## Code Review Results` title, and the verdict is not in a blockquote. Always use the table format from the example above.
## Formatting Rules
- **Pipe-delimited markdown tables** -- never ASCII box-drawing characters
- **Pipe-delimited markdown tables** for findings -- never ASCII box-drawing characters or per-finding horizontal-rule separators between entries (the report-level `---` before the verdict is still required)
- **Severity-grouped sections** -- `### P0 -- Critical`, `### P1 -- High`, `### P2 -- Moderate`, `### P3 -- Low`. Omit empty severity levels.
- **Always include file:line location** for code review issues
- **Reviewer column** shows which persona(s) flagged the issue. Multiple reviewers = cross-reviewer agreement.
- **Confidence column** shows the finding's confidence score
- **Route column** shows the synthesized handling decision as ``<autofix_class> -> <owner>``.
- **Header includes** scope, intent, and reviewer team with per-conditional justifications
- **Mode line** -- include `interactive`, `autofix`, or `report-only`
- **Mode line** -- include `interactive`, `autofix`, `report-only`, or `headless`
- **Applied Fixes section** -- include only when a fix phase ran in this review invocation
- **Residual Actionable Work section** -- include only when unresolved actionable findings were handed off for later work
- **Pre-existing section** -- separate table, no confidence column (these are informational)
@@ -120,6 +150,19 @@ Use this **exact format** when presenting synthesized review findings. Findings
- **Deployment Notes section** -- key checklist items from deployment-verification-agent. Omit if the agent did not run.
- **Zip Agent Validation section** -- summary of zip-agent comment evaluation: total, validated (with cross-references to findings table), collapsed (with reasons). Omit if the agent did not run.
- **Coverage section** -- suppressed count, residual risks, testing gaps, failed reviewers
- **Zip Agent Validation section** -- summary of zip-agent comment evaluation: total, validated (with cross-references to findings table), collapsed (with reasons). Omit if the agent did not run.
- **Summary uses blockquotes** for verdict, reasoning, and fix order
- **Horizontal rule** (`---`) separates findings from verdict
- **`###` headers** for each section -- never plain text headers
## Headless Mode Format
In `mode:headless`, replace the interactive pipe-delimited table report with a structured text envelope. The headless format is defined in the `### Headless output format` section of SKILL.md. Key differences from the interactive format:
- **No pipe-delimited tables.** Findings use `[severity][autofix_class -> owner] File: <file:line> -- <title>` line format with indented Why/Evidence/Suggested fix lines.
- **Findings grouped by autofix_class** (gated-auto, manual, advisory) instead of severity. Within each group, findings are sorted by severity.
- **Verdict in header** (top of output) instead of bottom, so programmatic callers get it first.
- **`Artifact:` line** in metadata header gives callers the path to the full run artifact.
- **`[needs-verification]` marker** on findings where `requires_verification: true`.
- **Evidence lines** included per finding.
- **Completion signal:** "Review complete" as the final line.

View File

@@ -22,18 +22,45 @@ Return ONLY valid JSON matching the findings schema below. No prose, no markdown
{schema}
Confidence rubric (0.0-1.0 scale):
- 0.00-0.29: Not confident / likely false positive. Do not report.
- 0.30-0.49: Somewhat confident. Do not report -- too speculative for actionable review.
- 0.50-0.59: Moderately confident. Real but uncertain. Do not report unless P0 severity.
- 0.60-0.69: Confident enough to flag. Include only when the issue is clearly actionable.
- 0.70-0.84: Highly confident. Real and important. Report with full evidence.
- 0.85-1.00: Certain. Verifiable from the code alone. Report.
Suppress threshold: 0.60. Do not emit findings below 0.60 confidence (except P0 at 0.50+).
False-positive categories to actively suppress:
- Pre-existing issues unrelated to this diff (mark pre_existing: true for unchanged code the diff does not interact with; if the diff makes it newly relevant, it is secondary, not pre-existing)
- Pedantic style nitpicks that a linter/formatter would catch
- Code that looks wrong but is intentional (check comments, commit messages, PR description for intent)
- Issues already handled elsewhere in the codebase (check callers, guards, middleware)
- Suggestions that restate what the code already does in different words
- Generic "consider adding" advice without a concrete failure mode
Rules:
- Suppress any finding below your stated confidence floor (see your Confidence calibration section).
- Every finding MUST include at least one evidence item grounded in the actual code.
- Set pre_existing to true ONLY for issues in unchanged code that are unrelated to this diff. If the diff makes the issue newly relevant, it is NOT pre-existing.
- You are operationally read-only. You may use non-mutating inspection commands, including read-oriented `git` / `gh` commands, to gather evidence. Do not edit files, change branches, commit, push, create PRs, or otherwise mutate the checkout or repository state.
- Set `autofix_class` conservatively. Use `safe_auto` only when the fix is local, deterministic, and low-risk. Use `gated_auto` when a concrete fix exists but changes behavior/contracts/permissions. Use `manual` for actionable residual work. Use `advisory` for report-only items that should not become code-fix work.
- Set `autofix_class` accurately -- not every finding is `advisory`. Use this decision guide:
- `safe_auto`: The fix is local and deterministic — the fixer can apply it mechanically without design judgment. Examples: extracting a duplicated helper, adding a missing nil/null check, fixing an off-by-one, adding a missing test for an untested code path, removing dead code.
- `gated_auto`: A concrete fix exists but it changes contracts, permissions, or crosses a module boundary in a way that deserves explicit approval. Examples: adding authentication to an unprotected endpoint, changing a public API response shape, switching from soft-delete to hard-delete.
- `manual`: Actionable work that requires design decisions or cross-cutting changes. Examples: redesigning a data model, choosing between two valid architectural approaches, adding pagination to an unbounded query.
- `advisory`: Report-only items that should not become code-fix work. Examples: noting a design asymmetry the PR improves but doesn't fully resolve, flagging a residual risk, deployment notes.
Do not default to `advisory` when uncertain -- if a concrete fix is obvious, classify it as `safe_auto` or `gated_auto`.
- Set `owner` to the default next actor for this finding: `review-fixer`, `downstream-resolver`, `human`, or `release`.
- Set `requires_verification` to true whenever the likely fix needs targeted tests, a focused re-review, or operational validation before it should be trusted.
- suggested_fix is optional. Only include it when the fix is obvious and correct. A bad suggestion is worse than none.
- If you find no issues, return an empty findings array. Still populate residual_risks and testing_gaps if applicable.
- **Intent verification:** Compare the code changes against the stated intent (and PR title/body when available). If the code does something the intent does not describe, or fails to do something the intent promises, flag it as a finding. Mismatches between stated intent and actual code are high-value findings.
</output-contract>
<pr-context>
{pr_metadata}
</pr-context>
<review-context>
Intent: {intent_summary}
@@ -52,5 +79,6 @@ Diff:
| `{diff_scope_rules}` | `references/diff-scope.md` content | Primary/secondary/pre-existing tier rules |
| `{schema}` | `references/findings-schema.json` content | The JSON schema reviewers must conform to |
| `{intent_summary}` | Stage 2 output | 2-3 line description of what the change is trying to accomplish |
| `{pr_metadata}` | Stage 1 output | PR title, body, and URL when reviewing a PR. Empty string when reviewing a branch or standalone checkout |
| `{file_list}` | Stage 1 output | List of changed files from the scope step |
| `{diff}` | Stage 1 output | The actual diff content to review |