refactor(cli)!: rename all skills and agents to consistent ce- prefix (#503)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
763
plugins/compound-engineering/skills/ce-code-review/SKILL.md
Normal file
763
plugins/compound-engineering/skills/ce-code-review/SKILL.md
Normal file
@@ -0,0 +1,763 @@
|
||||
---
|
||||
name: ce-code-review
|
||||
description: "Structured code review using tiered persona agents, confidence-gated findings, and a merge/dedup pipeline. Use when reviewing code changes before creating a PR."
|
||||
argument-hint: "[blank to review current branch, or provide PR link]"
|
||||
---
|
||||
|
||||
# Code Review
|
||||
|
||||
Reviews code changes using dynamically selected reviewer personas. Spawns parallel sub-agents that return structured JSON, then merges and deduplicates findings into a single report.
|
||||
|
||||
## When to Use
|
||||
|
||||
- Before creating a PR
|
||||
- After completing a task during iterative implementation
|
||||
- When feedback is needed on any code changes
|
||||
- Can be invoked standalone
|
||||
- Can run as a read-only or autofix review step inside larger workflows
|
||||
|
||||
## Argument Parsing
|
||||
|
||||
Parse `$ARGUMENTS` for the following optional tokens. Strip each recognized token before interpreting the remainder as the PR number, GitHub URL, or branch name.
|
||||
|
||||
| Token | Example | Effect |
|
||||
|-------|---------|--------|
|
||||
| `mode:autofix` | `mode:autofix` | Select autofix mode (see Mode Detection below) |
|
||||
| `mode:report-only` | `mode:report-only` | Select report-only mode |
|
||||
| `mode:headless` | `mode:headless` | Select headless mode for programmatic callers (see Mode Detection below) |
|
||||
| `base:<sha-or-ref>` | `base:abc1234` or `base:origin/main` | Skip scope detection — use this as the diff base directly |
|
||||
| `plan:<path>` | `plan:docs/plans/2026-03-25-001-feat-foo-plan.md` | Load this plan for requirements verification |
|
||||
|
||||
All tokens are optional. Each one present means one less thing to infer. When absent, fall back to existing behavior for that stage.
|
||||
|
||||
**Conflicting mode flags:** If multiple mode tokens appear in arguments, stop and do not dispatch agents. If `mode:headless` is one of the conflicting tokens, emit the headless error envelope: `Review failed (headless mode). Reason: conflicting mode flags — <mode_a> and <mode_b> cannot be combined.` Otherwise emit the generic form: `Review failed. Reason: conflicting mode flags — <mode_a> and <mode_b> cannot be combined.`
|
||||
|
||||
## Mode Detection
|
||||
|
||||
| Mode | When | Behavior |
|
||||
|------|------|----------|
|
||||
| **Interactive** (default) | No mode token present | Review, apply safe_auto fixes automatically, present findings, ask for policy decisions on gated/manual findings, and optionally continue into fix/push/PR next steps |
|
||||
| **Autofix** | `mode:autofix` in arguments | No user interaction. Review, apply only policy-allowed `safe_auto` fixes, re-review in bounded rounds, write a run artifact, and emit residual downstream work when needed |
|
||||
| **Report-only** | `mode:report-only` in arguments | Strictly read-only. Review and report only, then stop with no edits, artifacts, todos, commits, pushes, or PR actions |
|
||||
| **Headless** | `mode:headless` in arguments | Programmatic mode for skill-to-skill invocation. Apply `safe_auto` fixes silently (single pass), return all other findings as structured text output, write run artifacts, skip todos, and return "Review complete" signal. No interactive prompts. |
|
||||
|
||||
### Autofix mode rules
|
||||
|
||||
- **Skip all user questions.** Never pause for approval or clarification once scope has been established.
|
||||
- **Apply only `safe_auto -> review-fixer` findings.** Leave `gated_auto`, `manual`, `human`, and `release` work unresolved.
|
||||
- **Write a run artifact** under `.context/compound-engineering/ce-code-review/<run-id>/` summarizing findings, applied fixes, residual actionable work, and advisory outputs.
|
||||
- **Create durable todo files only for unresolved actionable findings** whose final owner is `downstream-resolver`. Load the `ce-todo-create` skill for the canonical directory path and naming convention.
|
||||
- **Never commit, push, or create a PR** from autofix mode. Parent workflows own those decisions.
|
||||
|
||||
### Report-only mode rules
|
||||
|
||||
- **Skip all user questions.** Infer intent conservatively if the diff metadata is thin.
|
||||
- **Never edit files or externalize work.** Do not write `.context/compound-engineering/ce-code-review/<run-id>/`, do not create todo files, and do not commit, push, or create a PR.
|
||||
- **Safe for parallel read-only verification.** `mode:report-only` is the only mode that is safe to run concurrently with browser testing on the same checkout.
|
||||
- **Do not switch the shared checkout.** If the caller passes an explicit PR or branch target, `mode:report-only` must run in an isolated checkout/worktree or stop instead of running `gh pr checkout` / `git checkout`.
|
||||
- **Do not overlap mutating review with browser testing on the same checkout.** If a future orchestrator wants fixes, run the mutating review phase after browser testing or in an isolated checkout/worktree.
|
||||
|
||||
### Headless mode rules
|
||||
|
||||
- **Skip all user questions.** Never use the platform question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini) or other interactive prompts. Infer intent conservatively if the diff metadata is thin.
|
||||
- **Require a determinable diff scope.** If headless mode cannot determine a diff scope (no branch, PR, or `base:` ref determinable without user interaction), emit `Review failed (headless mode). Reason: no diff scope detected. Re-invoke with a branch name, PR number, or base:<ref>.` and stop without dispatching agents.
|
||||
- **Apply only `safe_auto -> review-fixer` findings in a single pass.** No bounded re-review rounds. Leave `gated_auto`, `manual`, `human`, and `release` work unresolved and return them in the structured output.
|
||||
- **Return all non-auto findings as structured text output.** Use the headless output envelope format (see Stage 6 below) preserving severity, autofix_class, owner, requires_verification, confidence, pre_existing, and suggested_fix per finding. Enrich with detail-tier fields (why_it_matters, evidence[]) from the per-agent artifact files on disk (see Detail enrichment in Stage 6).
|
||||
- **Write a run artifact** under `.context/compound-engineering/ce-code-review/<run-id>/` summarizing findings, applied fixes, and advisory outputs. Include the artifact path in the structured output.
|
||||
- **Do not create todo files.** The caller receives structured findings and routes downstream work itself.
|
||||
- **Do not switch the shared checkout.** If the caller passes an explicit PR or branch target, `mode:headless` must run in an isolated checkout/worktree or stop instead of running `gh pr checkout` / `git checkout`. When stopping, emit `Review failed (headless mode). Reason: cannot switch shared checkout. Re-invoke with base:<ref> to review the current checkout, or run from an isolated worktree.`
|
||||
- **Not safe for concurrent use on a shared checkout.** Unlike `mode:report-only`, headless mutates files (applies `safe_auto` fixes). Callers must not run headless concurrently with other mutating operations on the same checkout.
|
||||
- **Never commit, push, or create a PR** from headless mode. The caller owns those decisions.
|
||||
- **End with "Review complete" as the terminal signal** so callers can detect completion. If all reviewers fail or time out, emit `Code review degraded (headless mode). Reason: 0 of N reviewers returned results.` followed by "Review complete".
|
||||
|
||||
### Interactive mode rules
|
||||
|
||||
- **Pre-load the platform question tool before any question fires.** In Claude Code, `AskUserQuestion` is a deferred tool — its schema is not available at session start. At the start of Interactive-mode work (before Stage 2 intent-ambiguity questions, the After-Review routing question, walk-through per-finding questions, bulk-preview Proceed/Cancel, and tracker-defer failure sub-questions), call `ToolSearch` with query `select:AskUserQuestion` to load the schema. Load it **once, eagerly, at the top of the Interactive flow** — do not wait for the first question site and do not decide it on a per-site basis. On Codex (`request_user_input`) and Gemini (`ask_user`) this step is not required; the tools are loaded by default.
|
||||
- **The numbered-list fallback only applies on confirmed load failure.** The skill's fallback pattern — "present the options as a numbered list and wait for the user's reply" — is valid **only** when `ToolSearch` returns no match or the tool call explicitly fails. Rendering a question as narrative text because the tool feels inconvenient, because the model is in report-formatting mode, or because the instruction was buried in a long skill is a bug. A question that calls for a user decision must either fire the tool or fail loudly.
|
||||
|
||||
## Severity Scale
|
||||
|
||||
All reviewers use P0-P3:
|
||||
|
||||
| Level | Meaning | Action |
|
||||
|-------|---------|--------|
|
||||
| **P0** | Critical breakage, exploitable vulnerability, data loss/corruption | Must fix before merge |
|
||||
| **P1** | High-impact defect likely hit in normal usage, breaking contract | Should fix |
|
||||
| **P2** | Moderate issue with meaningful downside (edge case, perf regression, maintainability trap) | Fix if straightforward |
|
||||
| **P3** | Low-impact, narrow scope, minor improvement | User's discretion |
|
||||
|
||||
## Action Routing
|
||||
|
||||
Severity answers **urgency**. Routing answers **who acts next** and **whether this skill may mutate the checkout**.
|
||||
|
||||
| `autofix_class` | Default owner | Meaning |
|
||||
|-----------------|---------------|---------|
|
||||
| `safe_auto` | `review-fixer` | Local, deterministic fix suitable for the in-skill fixer when the current mode allows mutation |
|
||||
| `gated_auto` | `downstream-resolver` or `human` | Concrete fix exists, but it changes behavior, contracts, permissions, or another sensitive boundary that should not be auto-applied by default |
|
||||
| `manual` | `downstream-resolver` or `human` | Actionable work that should be handed off rather than fixed in-skill |
|
||||
| `advisory` | `human` or `release` | Report-only output such as learnings, rollout notes, or residual risk |
|
||||
|
||||
Routing rules:
|
||||
|
||||
- **Synthesis owns the final route.** Persona-provided routing metadata is input, not the last word.
|
||||
- **Choose the more conservative route on disagreement.** A merged finding may move from `safe_auto` to `gated_auto` or `manual`, but never the other way without stronger evidence.
|
||||
- **Only `safe_auto -> review-fixer` enters the in-skill fixer queue automatically.**
|
||||
- **`requires_verification: true` means a fix is not complete without targeted tests, a focused re-review, or operational validation.**
|
||||
|
||||
## Reviewers
|
||||
|
||||
17 reviewer personas in layered conditionals, plus CE-specific agents. See the persona catalog included below for the full catalog.
|
||||
|
||||
**Always-on (every review):**
|
||||
|
||||
| Agent | Focus |
|
||||
|-------|-------|
|
||||
| `review:ce-correctness-reviewer` | Logic errors, edge cases, state bugs, error propagation |
|
||||
| `review:ce-testing-reviewer` | Coverage gaps, weak assertions, brittle tests |
|
||||
| `review:ce-maintainability-reviewer` | Coupling, complexity, naming, dead code, abstraction debt |
|
||||
| `review:ce-project-standards-reviewer` | CLAUDE.md and AGENTS.md compliance -- frontmatter, references, naming, portability |
|
||||
| `review:ce-agent-native-reviewer` | Verify new features are agent-accessible |
|
||||
| `research:ce-learnings-researcher` | Search docs/solutions/ for past issues related to this PR |
|
||||
|
||||
**Cross-cutting conditional (selected per diff):**
|
||||
|
||||
| Agent | Select when diff touches... |
|
||||
|-------|---------------------------|
|
||||
| `review:ce-security-reviewer` | Auth, public endpoints, user input, permissions |
|
||||
| `review:ce-performance-reviewer` | DB queries, data transforms, caching, async |
|
||||
| `review:ce-api-contract-reviewer` | Routes, serializers, type signatures, versioning |
|
||||
| `review:ce-data-migrations-reviewer` | Migrations, schema changes, backfills |
|
||||
| `review:ce-reliability-reviewer` | Error handling, retries, timeouts, background jobs |
|
||||
| `review:ce-adversarial-reviewer` | Diff >=50 changed non-test/non-generated/non-lockfile lines, or auth, payments, data mutations, external APIs |
|
||||
| `review:ce-cli-readiness-reviewer` | CLI command definitions, argument parsing, CLI framework usage, command handler implementations |
|
||||
| `review:ce-previous-comments-reviewer` | Reviewing a PR that has existing review comments or threads |
|
||||
|
||||
**Stack-specific conditional (selected per diff):**
|
||||
|
||||
| Agent | Select when diff touches... |
|
||||
|-------|---------------------------|
|
||||
| `review:ce-dhh-rails-reviewer` | Rails architecture, service objects, session/auth choices, or Hotwire-vs-SPA boundaries |
|
||||
| `review:ce-kieran-rails-reviewer` | Rails application code where conventions, naming, and maintainability are in play |
|
||||
| `review:ce-kieran-python-reviewer` | Python modules, endpoints, scripts, or services |
|
||||
| `review:ce-kieran-typescript-reviewer` | TypeScript components, services, hooks, utilities, or shared types |
|
||||
| `review:ce-julik-frontend-races-reviewer` | Stimulus/Turbo controllers, DOM events, timers, animations, or async UI flows |
|
||||
|
||||
**CE conditional (migration-specific):**
|
||||
|
||||
| Agent | Select when diff includes migration files |
|
||||
|-------|------------------------------------------|
|
||||
| `review:ce-schema-drift-detector` | Cross-references schema.rb against included migrations |
|
||||
| `review:ce-deployment-verification-agent` | Produces deployment checklist with SQL verification queries |
|
||||
|
||||
## Review Scope
|
||||
|
||||
Every review spawns all 4 always-on personas plus the 2 CE always-on agents, then adds whichever cross-cutting and stack-specific conditionals fit the diff. The model naturally right-sizes: a small config change triggers 0 conditionals = 6 reviewers. A Rails auth feature might trigger security + reliability + kieran-rails + dhh-rails = 10 reviewers.
|
||||
|
||||
## Protected Artifacts
|
||||
|
||||
The following paths are compound-engineering pipeline artifacts and must never be flagged for deletion, removal, or gitignore by any reviewer:
|
||||
|
||||
- `docs/brainstorms/*` -- requirements documents created by ce-brainstorm
|
||||
- `docs/plans/*.md` -- plan files created by ce-plan (living documents with progress checkboxes)
|
||||
- `docs/solutions/*.md` -- solution documents created during the pipeline
|
||||
|
||||
If a reviewer flags any file in these directories for cleanup or removal, discard that finding during synthesis.
|
||||
|
||||
## How to Run
|
||||
|
||||
### Stage 1: Determine scope
|
||||
|
||||
Compute the diff range, file list, and diff. Minimize permission prompts by combining into as few commands as possible.
|
||||
|
||||
**If `base:` argument is provided (fast path):**
|
||||
|
||||
The caller already knows the diff base. Skip all base-branch detection, remote resolution, and merge-base computation. Use the provided value directly:
|
||||
|
||||
```
|
||||
BASE_ARG="{base_arg}"
|
||||
BASE=$(git merge-base HEAD "$BASE_ARG" 2>/dev/null) || BASE="$BASE_ARG"
|
||||
```
|
||||
|
||||
Then produce the same output as the other paths:
|
||||
|
||||
```
|
||||
echo "BASE:$BASE" && echo "FILES:" && git diff --name-only $BASE && echo "DIFF:" && git diff -U10 $BASE && echo "UNTRACKED:" && git ls-files --others --exclude-standard
|
||||
```
|
||||
|
||||
This path works with any ref — a SHA, `origin/main`, a branch name. Automated callers (ce-work, lfg, slfg) should prefer this to avoid the detection overhead. **Do not combine `base:` with a PR number or branch target.** If both are present, stop with an error: "Cannot use `base:` with a PR number or branch target — `base:` implies the current checkout is already the correct branch. Pass `base:` alone, or pass the target alone and let scope detection resolve the base." This avoids scope/intent mismatches where the diff base comes from one source but the code and metadata come from another.
|
||||
|
||||
**If a PR number or GitHub URL is provided as an argument:**
|
||||
|
||||
If `mode:report-only` or `mode:headless` is active, do **not** run `gh pr checkout <number-or-url>` on the shared checkout. For `mode:report-only`, tell the caller: "mode:report-only cannot switch the shared checkout to review a PR target. Run it from an isolated worktree/checkout for that PR, or run report-only with no target argument on the already checked out branch." For `mode:headless`, emit `Review failed (headless mode). Reason: cannot switch shared checkout. Re-invoke with base:<ref> to review the current checkout, or run from an isolated worktree.` Stop here unless the review is already running in an isolated checkout.
|
||||
|
||||
First, verify the worktree is clean before switching branches:
|
||||
|
||||
```
|
||||
git status --porcelain
|
||||
```
|
||||
|
||||
If the output is non-empty, inform the user: "You have uncommitted changes on the current branch. Stash or commit them before reviewing a PR, or use standalone mode (no argument) to review the current branch as-is." Do not proceed with checkout until the worktree is clean.
|
||||
|
||||
Then check out the PR branch so persona agents can read the actual code (not the current checkout):
|
||||
|
||||
```
|
||||
gh pr checkout <number-or-url>
|
||||
```
|
||||
|
||||
Then fetch PR metadata. Capture the base branch name and the PR base repository identity, not just the branch name:
|
||||
|
||||
```
|
||||
gh pr view <number-or-url> --json title,body,baseRefName,headRefName,url
|
||||
```
|
||||
|
||||
Use the repository portion of the returned PR URL as `<base-repo>` (for example, `EveryInc/compound-engineering-plugin` from `https://github.com/EveryInc/compound-engineering-plugin/pull/348`).
|
||||
|
||||
Then compute a local diff against the PR's base branch so re-reviews also include local fix commits and uncommitted edits. Substitute the PR base branch from metadata (shown here as `<base>`) and the PR base repository identity derived from the PR URL (shown here as `<base-repo>`). Resolve the base ref from the PR's actual base repository, not by assuming `origin` points at that repo:
|
||||
|
||||
```
|
||||
PR_BASE_REMOTE=$(git remote -v | awk 'index($2, "github.com:<base-repo>") || index($2, "github.com/<base-repo>") {print $1; exit}')
|
||||
if [ -n "$PR_BASE_REMOTE" ]; then PR_BASE_REMOTE_REF="$PR_BASE_REMOTE/<base>"; else PR_BASE_REMOTE_REF=""; fi
|
||||
PR_BASE_REF=$(git rev-parse --verify "$PR_BASE_REMOTE_REF" 2>/dev/null || git rev-parse --verify <base> 2>/dev/null || true)
|
||||
if [ -z "$PR_BASE_REF" ]; then
|
||||
if [ -n "$PR_BASE_REMOTE_REF" ]; then
|
||||
git fetch --no-tags "$PR_BASE_REMOTE" <base>:refs/remotes/"$PR_BASE_REMOTE"/<base> 2>/dev/null || git fetch --no-tags "$PR_BASE_REMOTE" <base> 2>/dev/null || true
|
||||
PR_BASE_REF=$(git rev-parse --verify "$PR_BASE_REMOTE_REF" 2>/dev/null || git rev-parse --verify <base> 2>/dev/null || true)
|
||||
else
|
||||
if git fetch --no-tags https://github.com/<base-repo>.git <base> 2>/dev/null; then
|
||||
PR_BASE_REF=$(git rev-parse --verify FETCH_HEAD 2>/dev/null || true)
|
||||
fi
|
||||
if [ -z "$PR_BASE_REF" ]; then PR_BASE_REF=$(git rev-parse --verify <base> 2>/dev/null || true); fi
|
||||
fi
|
||||
fi
|
||||
if [ -n "$PR_BASE_REF" ]; then BASE=$(git merge-base HEAD "$PR_BASE_REF" 2>/dev/null) || BASE=""; else BASE=""; fi
|
||||
```
|
||||
|
||||
```
|
||||
if [ -n "$BASE" ]; then echo "BASE:$BASE" && echo "FILES:" && git diff --name-only $BASE && echo "DIFF:" && git diff -U10 $BASE && echo "UNTRACKED:" && git ls-files --others --exclude-standard; else echo "ERROR: Unable to resolve PR base branch <base> locally. Fetch the base branch and rerun so the review scope stays aligned with the PR."; fi
|
||||
```
|
||||
|
||||
Extract PR title/body, base branch, and PR URL from `gh pr view`, then extract the base marker, file list, diff content, and `UNTRACKED:` list from the local command. Do not use `gh pr diff` as the review scope after checkout -- it only reflects the remote PR state and will miss local fix commits until they are pushed. If the base ref still cannot be resolved from the PR's actual base repository after the fetch attempt, stop instead of falling back to `git diff HEAD`; a PR review without the PR base branch is incomplete.
|
||||
|
||||
**If a branch name is provided as an argument:**
|
||||
|
||||
Check out the named branch, then diff it against the base branch. Substitute the provided branch name (shown here as `<branch>`).
|
||||
|
||||
If `mode:report-only` or `mode:headless` is active, do **not** run `git checkout <branch>` on the shared checkout. For `mode:report-only`, tell the caller: "mode:report-only cannot switch the shared checkout to review another branch. Run it from an isolated worktree/checkout for `<branch>`, or run report-only on the current checkout with no target argument." For `mode:headless`, emit `Review failed (headless mode). Reason: cannot switch shared checkout. Re-invoke with base:<ref> to review the current checkout, or run from an isolated worktree.` Stop here unless the review is already running in an isolated checkout.
|
||||
|
||||
First, verify the worktree is clean before switching branches:
|
||||
|
||||
```
|
||||
git status --porcelain
|
||||
```
|
||||
|
||||
If the output is non-empty, inform the user: "You have uncommitted changes on the current branch. Stash or commit them before reviewing another branch, or provide a PR number instead." Do not proceed with checkout until the worktree is clean.
|
||||
|
||||
```
|
||||
git checkout <branch>
|
||||
```
|
||||
|
||||
Then detect the review base branch and compute the merge-base. Run the `references/resolve-base.sh` script, which handles fork-safe remote resolution with multi-fallback detection (PR metadata -> `origin/HEAD` -> `gh repo view` -> common branch names):
|
||||
|
||||
```
|
||||
RESOLVE_OUT=$(bash references/resolve-base.sh) || { echo "ERROR: resolve-base.sh failed"; exit 1; }
|
||||
if [ -z "$RESOLVE_OUT" ] || echo "$RESOLVE_OUT" | grep -q '^ERROR:'; then echo "${RESOLVE_OUT:-ERROR: resolve-base.sh produced no output}"; exit 1; fi
|
||||
BASE=$(echo "$RESOLVE_OUT" | sed 's/^BASE://')
|
||||
```
|
||||
|
||||
If the script outputs an error, stop instead of falling back to `git diff HEAD`; a branch review without the base branch would only show uncommitted changes and silently miss all committed work.
|
||||
|
||||
On success, produce the diff:
|
||||
|
||||
```
|
||||
echo "BASE:$BASE" && echo "FILES:" && git diff --name-only $BASE && echo "DIFF:" && git diff -U10 $BASE && echo "UNTRACKED:" && git ls-files --others --exclude-standard
|
||||
```
|
||||
|
||||
You may still fetch additional PR metadata with `gh pr view` for title, body, and linked issues, but do not fail if no PR exists.
|
||||
|
||||
**If no argument (standalone on current branch):**
|
||||
|
||||
Detect the review base branch and compute the merge-base using the same `references/resolve-base.sh` script as branch mode:
|
||||
|
||||
```
|
||||
RESOLVE_OUT=$(bash references/resolve-base.sh) || { echo "ERROR: resolve-base.sh failed"; exit 1; }
|
||||
if [ -z "$RESOLVE_OUT" ] || echo "$RESOLVE_OUT" | grep -q '^ERROR:'; then echo "${RESOLVE_OUT:-ERROR: resolve-base.sh produced no output}"; exit 1; fi
|
||||
BASE=$(echo "$RESOLVE_OUT" | sed 's/^BASE://')
|
||||
```
|
||||
|
||||
If the script outputs an error, stop instead of falling back to `git diff HEAD`; a standalone review without the base branch would only show uncommitted changes and silently miss all committed work on the branch.
|
||||
|
||||
On success, produce the diff:
|
||||
|
||||
```
|
||||
echo "BASE:$BASE" && echo "FILES:" && git diff --name-only $BASE && echo "DIFF:" && git diff -U10 $BASE && echo "UNTRACKED:" && git ls-files --others --exclude-standard
|
||||
```
|
||||
|
||||
Using `git diff $BASE` (without `..HEAD`) diffs the merge-base against the working tree, which includes committed, staged, and unstaged changes together.
|
||||
|
||||
**Untracked file handling:** Always inspect the `UNTRACKED:` list, even when `FILES:`/`DIFF:` are non-empty. Untracked files are outside review scope until staged. If the list is non-empty, tell the user which files are excluded. If any of them should be reviewed, stop and tell the user to `git add` them first and rerun. Only continue when the user is intentionally reviewing tracked changes only. In `mode:headless` or `mode:autofix`, do not stop to ask — proceed with tracked changes only and note the excluded untracked files in the Coverage section of the output.
|
||||
|
||||
### Stage 2: Intent discovery
|
||||
|
||||
Understand what the change is trying to accomplish. The source of intent depends on which Stage 1 path was taken:
|
||||
|
||||
**PR/URL mode:** Use the PR title, body, and linked issues from `gh pr view` metadata. Supplement with commit messages from the PR if the body is sparse.
|
||||
|
||||
**Branch mode:** Run `git log --oneline ${BASE}..<branch>` using the resolved merge-base from Stage 1.
|
||||
|
||||
**Standalone (current branch):** Run:
|
||||
|
||||
```
|
||||
echo "BRANCH:" && git rev-parse --abbrev-ref HEAD && echo "COMMITS:" && git log --oneline ${BASE}..HEAD
|
||||
```
|
||||
|
||||
Combined with conversation context (plan section summary, PR description), write a 2-3 line intent summary:
|
||||
|
||||
```
|
||||
Intent: Simplify tax calculation by replacing the multi-tier rate lookup
|
||||
with a flat-rate computation. Must not regress edge cases in tax-exempt handling.
|
||||
```
|
||||
|
||||
Pass this to every reviewer in their spawn prompt. Intent shapes *how hard each reviewer looks*, not which reviewers are selected.
|
||||
|
||||
**When intent is ambiguous:**
|
||||
|
||||
- **Interactive mode:** Ask one question using the platform's interactive question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini): "What is the primary goal of these changes?" Do not spawn reviewers until intent is established. **Claude Code only:** if `AskUserQuestion` has not yet been loaded this session (per the Interactive mode rules pre-load), call `ToolSearch` with query `select:AskUserQuestion` first before asking. On Codex (`request_user_input`) and Gemini (`ask_user`) this preload step does not apply — the platform-native question tool is loaded by default.
|
||||
- **Autofix/report-only/headless modes:** Infer intent conservatively from the branch name, diff, PR metadata, and caller context. Note the uncertainty in Coverage or Verdict reasoning instead of blocking.
|
||||
|
||||
### Stage 2b: Plan discovery (requirements verification)
|
||||
|
||||
Locate the plan document so Stage 6 can verify requirements completeness. Check these sources in priority order — stop at the first hit:
|
||||
|
||||
1. **`plan:` argument.** If the caller passed a plan path, use it directly. Read the file to confirm it exists.
|
||||
2. **PR body.** If PR metadata was fetched in Stage 1, scan the body for paths matching `docs/plans/*.md`. If exactly one match is found and the file exists, use it as `plan_source: explicit`. If multiple plan paths appear, treat as ambiguous — demote to `plan_source: inferred` for the most recent match that exists on disk, or skip if none exist or none clearly relate to the PR title/intent. Always verify the selected file exists before using it — stale or copied plan links in PR descriptions are common.
|
||||
3. **Auto-discover.** Extract 2-3 keywords from the branch name (e.g., `feat/onboarding-skill` -> `onboarding`, `skill`). Glob `docs/plans/*` and filter filenames containing those keywords. If exactly one match, use it. If multiple matches or the match looks ambiguous (e.g., generic keywords like `review`, `fix`, `update` that could hit many plans), **skip auto-discovery** — a wrong plan is worse than no plan. If zero matches, skip.
|
||||
|
||||
**Confidence tagging:** Record how the plan was found:
|
||||
- `plan:` argument -> `plan_source: explicit` (high confidence)
|
||||
- Single unambiguous PR body match -> `plan_source: explicit` (high confidence)
|
||||
- Multiple/ambiguous PR body matches -> `plan_source: inferred` (lower confidence)
|
||||
- Auto-discover with single unambiguous match -> `plan_source: inferred` (lower confidence)
|
||||
|
||||
If a plan is found, read its **Requirements Trace** (R1, R2, etc.) and **Implementation Units** (checkbox items). Store the extracted requirements list and `plan_source` for Stage 6. Do not block the review if no plan is found — requirements verification is additive, not required.
|
||||
|
||||
### Stage 3: Select reviewers
|
||||
|
||||
Read the diff and file list from Stage 1. The 4 always-on personas and 2 CE always-on agents are automatic. For each cross-cutting and stack-specific conditional persona in the persona catalog included below, decide whether the diff warrants it. This is agent judgment, not keyword matching.
|
||||
|
||||
**File-type awareness for conditional selection:** Instruction-prose files (Markdown skill definitions, JSON schemas, config files) are product code but do not benefit from runtime-focused reviewers. The adversarial reviewer's techniques (race conditions, cascade failures, abuse cases) target executable code behavior. For diffs that only change instruction-prose files, skip adversarial unless the prose describes auth, payment, or data-mutation behavior. Count only executable code lines toward line-count thresholds.
|
||||
|
||||
**`previous-comments` is PR-only.** Only select this persona when Stage 1 gathered PR metadata (PR number or URL was provided as an argument, or `gh pr view` returned metadata for the current branch). Skip it entirely for standalone branch reviews with no associated PR -- there are no prior comments to check.
|
||||
|
||||
Stack-specific personas are additive. A Rails UI change may warrant `kieran-rails` plus `julik-frontend-races`; a TypeScript API diff may warrant `kieran-typescript` plus `api-contract` and `reliability`.
|
||||
|
||||
For CE conditional agents, check if the diff includes files matching `db/migrate/*.rb`, `db/schema.rb`, or data backfill scripts.
|
||||
|
||||
Announce the team before spawning:
|
||||
|
||||
```
|
||||
Review team:
|
||||
- correctness (always)
|
||||
- testing (always)
|
||||
- maintainability (always)
|
||||
- project-standards (always)
|
||||
- ce-agent-native-reviewer (always)
|
||||
- ce-learnings-researcher (always)
|
||||
- security -- new endpoint in routes.rb accepts user-provided redirect URL
|
||||
- kieran-rails -- controller and Turbo flow changed in app/controllers and app/views
|
||||
- dhh-rails -- diff adds service objects around ordinary Rails CRUD
|
||||
- data-migrations -- adds migration 20260303_add_index_to_orders
|
||||
- ce-schema-drift-detector -- migration files present
|
||||
```
|
||||
|
||||
This is progress reporting, not a blocking confirmation.
|
||||
|
||||
### Stage 3b: Discover project standards paths
|
||||
|
||||
Before spawning sub-agents, find the file paths (not contents) of all relevant standards files for the `project-standards` persona. Use the native file-search/glob tool to locate:
|
||||
|
||||
1. Use the native file-search tool (e.g., Glob in Claude Code) to find all `**/CLAUDE.md` and `**/AGENTS.md` in the repo.
|
||||
2. Filter to those whose directory is an ancestor of at least one changed file. A standards file governs all files below it (e.g., `plugins/compound-engineering/AGENTS.md` applies to everything under `plugins/compound-engineering/`).
|
||||
|
||||
Pass the resulting path list to the `project-standards` persona inside a `<standards-paths>` block in its review context (see Stage 4). The persona reads the files itself, targeting only the sections relevant to the changed file types. This keeps the orchestrator's work cheap (path discovery only) and avoids bloating the subagent prompt with content the reviewer may not fully need.
|
||||
|
||||
### Stage 4: Spawn sub-agents
|
||||
|
||||
#### Model tiering
|
||||
|
||||
Persona sub-agents do focused, scoped work and should use a fast mid-tier model to reduce cost and latency without sacrificing review quality. The orchestrator itself stays on the default (most capable) model.
|
||||
|
||||
Use the platform's mid-tier model for all persona and CE sub-agents. In Claude Code, pass `model: "sonnet"` in the Agent tool call. On other platforms, use the equivalent mid-tier (e.g., `gpt-4o` in Codex). If the platform has no model override mechanism or the available model names are unknown, omit the model parameter and let agents inherit the default -- a working review on the parent model is better than a broken dispatch from an unrecognized model name.
|
||||
|
||||
CE always-on agents (ce-agent-native-reviewer, ce-learnings-researcher) and CE conditional agents (ce-schema-drift-detector, ce-deployment-verification-agent) also use the mid-tier model since they perform scoped, focused work.
|
||||
|
||||
The orchestrator (this skill) stays on the default model because it handles intent discovery, reviewer selection, finding merge/dedup, and synthesis -- tasks that benefit from stronger reasoning.
|
||||
|
||||
#### Run ID
|
||||
|
||||
Generate a unique run identifier before dispatching any agents. This ID scopes all agent artifact files and the post-review run artifact to the same directory.
|
||||
|
||||
```bash
|
||||
RUN_ID=$(date +%Y%m%d-%H%M%S)-$(head -c4 /dev/urandom | od -An -tx1 | tr -d ' ')
|
||||
mkdir -p ".context/compound-engineering/ce-code-review/$RUN_ID"
|
||||
```
|
||||
|
||||
Pass `{run_id}` to every persona sub-agent so they can write their full analysis to `.context/compound-engineering/ce-code-review/{run_id}/{reviewer_name}.json`.
|
||||
|
||||
**Report-only mode:** Skip run-id generation and directory creation. Do not pass `{run_id}` to agents. Agents return compact JSON only with no file write, consistent with report-only's no-write contract.
|
||||
|
||||
#### Spawning
|
||||
|
||||
Omit the `mode` parameter when dispatching sub-agents so the user's configured permission settings apply. Do not pass `mode: "auto"`.
|
||||
|
||||
Spawn each selected persona reviewer as a parallel sub-agent using the subagent template included below. Each persona sub-agent receives:
|
||||
|
||||
1. Their persona file content (identity, failure modes, calibration, suppress conditions)
|
||||
2. Shared diff-scope rules from the diff-scope reference included below
|
||||
3. The JSON output contract from the findings schema included below
|
||||
4. PR metadata: title, body, and URL when reviewing a PR (empty string otherwise). Passed in a `<pr-context>` block so reviewers can verify code against stated intent
|
||||
5. Review context: intent summary, file list, diff
|
||||
6. Run ID and reviewer name for the artifact file path
|
||||
7. **For `project-standards` only:** the standards file path list from Stage 3b, wrapped in a `<standards-paths>` block appended to the review context
|
||||
|
||||
Persona sub-agents are **read-only** with respect to the project: they review and return structured JSON. They do not edit project files or propose refactors. The one permitted write is saving their full analysis to the `.context/` artifact path specified in the output contract.
|
||||
|
||||
Read-only here means **non-mutating**, not "no shell access." Reviewer sub-agents may use non-mutating inspection commands when needed to gather evidence or verify scope, including read-oriented `git` / `gh` usage such as `git diff`, `git show`, `git blame`, `git log`, and `gh pr view`. They must not edit project files, change branches, commit, push, create PRs, or otherwise mutate the checkout or repository state.
|
||||
|
||||
Each persona sub-agent writes full JSON (all schema fields) to `.context/compound-engineering/ce-code-review/{run_id}/{reviewer_name}.json` and returns compact JSON with merge-tier fields only:
|
||||
|
||||
```json
|
||||
{
|
||||
"reviewer": "security",
|
||||
"findings": [
|
||||
{
|
||||
"title": "User-supplied ID in account lookup without ownership check",
|
||||
"severity": "P0",
|
||||
"file": "orders_controller.rb",
|
||||
"line": 42,
|
||||
"confidence": 0.92,
|
||||
"autofix_class": "gated_auto",
|
||||
"owner": "downstream-resolver",
|
||||
"requires_verification": true,
|
||||
"pre_existing": false,
|
||||
"suggested_fix": "Add current_user.owns?(account) guard before lookup"
|
||||
}
|
||||
],
|
||||
"residual_risks": [...],
|
||||
"testing_gaps": [...]
|
||||
}
|
||||
```
|
||||
|
||||
Detail-tier fields (`why_it_matters`, `evidence`) are in the artifact file only. `suggested_fix` is optional in both tiers -- included in compact returns when present so the orchestrator has fix context for auto-apply decisions. If the file write fails, the compact return still provides everything the merge needs.
|
||||
|
||||
**CE always-on agents** (ce-agent-native-reviewer, ce-learnings-researcher) are dispatched as standard Agent calls in parallel with the persona agents. Give them the same review context bundle the personas receive: entry mode, any PR metadata gathered in Stage 1, intent summary, review base branch name when known, `BASE:` marker, file list, diff, and `UNTRACKED:` scope notes. Do not invoke them with a generic "review this" prompt. Their output is unstructured and synthesized separately in Stage 6.
|
||||
|
||||
**CE conditional agents** (ce-schema-drift-detector, ce-deployment-verification-agent) are also dispatched as standard Agent calls when applicable. Pass the same review context bundle plus the applicability reason (for example, which migration files triggered the agent). For ce-schema-drift-detector specifically, pass the resolved review base branch explicitly so it never assumes `main`. Their output is unstructured and must be preserved for Stage 6 synthesis just like the CE always-on agents.
|
||||
|
||||
### Stage 5: Merge findings
|
||||
|
||||
Convert multiple reviewer compact JSON returns into one deduplicated, confidence-gated finding set. The compact returns contain merge-tier fields (title, severity, file, line, confidence, autofix_class, owner, requires_verification, pre_existing) plus the optional suggested_fix. Detail-tier fields (why_it_matters, evidence) are on disk in the per-agent artifact files and are not loaded at this stage.
|
||||
|
||||
1. **Validate.** Check each compact return for required top-level and per-finding fields, plus value constraints. Drop malformed returns or findings. Record the drop count.
|
||||
- **Top-level required:** reviewer (string), findings (array), residual_risks (array), testing_gaps (array). Drop the entire return if any are missing or wrong type.
|
||||
- **Per-finding required:** title, severity, file, line, confidence, autofix_class, owner, requires_verification, pre_existing
|
||||
- **Value constraints:**
|
||||
- severity: P0 | P1 | P2 | P3
|
||||
- autofix_class: safe_auto | gated_auto | manual | advisory
|
||||
- owner: review-fixer | downstream-resolver | human | release
|
||||
- confidence: numeric, 0.0-1.0
|
||||
- line: positive integer
|
||||
- pre_existing, requires_verification: boolean
|
||||
- Do not validate against the full schema here -- the full schema (including why_it_matters and evidence) applies to the artifact files on disk, not the compact returns.
|
||||
2. **Confidence gate.** Suppress findings below 0.60 confidence. Exception: P0 findings at 0.50+ confidence survive the gate -- critical-but-uncertain issues must not be silently dropped. Record the suppressed count. This matches the persona instructions and the schema's confidence thresholds.
|
||||
3. **Deduplicate.** Compute fingerprint: `normalize(file) + line_bucket(line, +/-3) + normalize(title)`. When fingerprints match, merge: keep highest severity, keep highest confidence, note which reviewers flagged it.
|
||||
4. **Cross-reviewer agreement.** When 2+ independent reviewers flag the same issue (same fingerprint), boost the merged confidence by 0.10 (capped at 1.0). Cross-reviewer agreement is strong signal -- independent reviewers converging on the same issue is more reliable than any single reviewer's confidence. Note the agreement in the Reviewer column of the output (e.g., "security, correctness").
|
||||
5. **Separate pre-existing.** Pull out findings with `pre_existing: true` into a separate list.
|
||||
6. **Resolve disagreements.** When reviewers flag the same code region but disagree on severity, autofix_class, or owner, annotate the Reviewer column with the disagreement (e.g., "security (P0), correctness (P1) -- kept P0"). This transparency helps the user understand why a finding was routed the way it was.
|
||||
7. **Normalize routing.** For each merged finding, set the final `autofix_class`, `owner`, and `requires_verification`. If reviewers disagree, keep the most conservative route. Synthesis may narrow a finding from `safe_auto` to `gated_auto` or `manual`, but must not widen it without new evidence.
|
||||
7b. **Tie-break the recommended action.** Interactive mode's walk-through and LFG paths present a per-finding recommended action (Apply / Defer / Skip / Acknowledge) derived from the normalized `autofix_class` and `suggested_fix`. When contributing reviewers implied different actions for the same merged finding, synthesis picks the most conservative using the order `Skip > Defer > Apply > Acknowledge`. This guarantees that identical review artifacts produce the same recommendation deterministically, so LFG results are auditable after the fact and the walk-through's recommendation is stable across re-runs. The user may still override per finding via the walk-through's options; this rule only determines what gets labeled "recommended."
|
||||
8. **Partition the work.** Build three sets:
|
||||
- in-skill fixer queue: only `safe_auto -> review-fixer`
|
||||
- residual actionable queue: unresolved `gated_auto` or `manual` findings whose owner is `downstream-resolver`
|
||||
- report-only queue: `advisory` findings plus anything owned by `human` or `release`
|
||||
9. **Sort.** Order by severity (P0 first) -> confidence (descending) -> file path -> line number.
|
||||
10. **Collect coverage data.** Union residual_risks and testing_gaps across reviewers.
|
||||
11. **Preserve CE agent artifacts.** Keep the learnings, agent-native, schema-drift, and deployment-verification outputs alongside the merged finding set. Do not drop unstructured agent output just because it does not match the persona JSON schema.
|
||||
|
||||
### Stage 6: Synthesize and present
|
||||
|
||||
Assemble the final report using **pipe-delimited markdown tables for findings** from the review output template included below. The table format is mandatory for finding rows in interactive mode — do not render findings as freeform text blocks or horizontal-rule-separated prose. Other report sections (Applied Fixes, Learnings, Coverage, etc.) use bullet lists and the `---` separator before the verdict, as shown in the template.
|
||||
|
||||
1. **Header.** Scope, intent, mode, reviewer team with per-conditional justifications.
|
||||
2. **Findings.** Rendered as pipe-delimited tables grouped by severity (`### P0 -- Critical`, `### P1 -- High`, `### P2 -- Moderate`, `### P3 -- Low`). Each finding row shows `#`, file, issue, reviewer(s), confidence, and synthesized route. Omit empty severity levels. Never render findings as freeform text blocks or numbered lists.
|
||||
3. **Requirements Completeness.** Include only when a plan was found in Stage 2b. For each requirement (R1, R2, etc.) and implementation unit in the plan, report whether corresponding work appears in the diff. Use a simple checklist: met / not addressed / partially addressed. Routing depends on `plan_source`:
|
||||
- **`explicit`** (caller-provided or PR body): Flag unaddressed requirements as P1 findings with `autofix_class: manual`, `owner: downstream-resolver`. These enter the residual actionable queue and can become todos.
|
||||
- **`inferred`** (auto-discovered): Flag unaddressed requirements as P3 findings with `autofix_class: advisory`, `owner: human`. These stay in the report only — no todos, no autonomous follow-up. An inferred plan match is a hint, not a contract.
|
||||
Omit this section entirely when no plan was found — do not mention the absence of a plan.
|
||||
4. **Applied Fixes.** Include only if a fix phase ran in this invocation.
|
||||
5. **Residual Actionable Work.** Include when unresolved actionable findings were handed off or should be handed off.
|
||||
6. **Pre-existing.** Separate section, does not count toward verdict.
|
||||
7. **Learnings & Past Solutions.** Surface ce-learnings-researcher results: if past solutions are relevant, flag them as "Known Pattern" with links to docs/solutions/ files.
|
||||
8. **Agent-Native Gaps.** Surface ce-agent-native-reviewer results. Omit section if no gaps found.
|
||||
9. **Schema Drift Check.** If ce-schema-drift-detector ran, summarize whether drift was found. If drift exists, list the unrelated schema objects and the required cleanup command. If clean, say so briefly.
|
||||
10. **Deployment Notes.** If ce-deployment-verification-agent ran, surface the key Go/No-Go items: blocking pre-deploy checks, the most important verification queries, rollback caveats, and monitoring focus areas. Keep the checklist actionable rather than dropping it into Coverage.
|
||||
11. **Coverage.** Suppressed count, residual risks, testing gaps, failed/timed-out reviewers, and any intent uncertainty carried by non-interactive modes.
|
||||
12. **Verdict.** Ready to merge / Ready with fixes / Not ready. Fix order if applicable. When an `explicit` plan has unaddressed requirements, the verdict must reflect it — a PR that's code-clean but missing planned requirements is "Not ready" unless the omission is intentional. When an `inferred` plan has unaddressed requirements, note it in the verdict reasoning but do not block on it alone.
|
||||
|
||||
Do not include time estimates.
|
||||
|
||||
**Format verification:** Before delivering the report, verify the findings sections use pipe-delimited table rows (`| # | File | Issue | ... |`) not freeform text. If you catch yourself rendering findings as prose blocks separated by horizontal rules or bullet points, stop and reformat into tables.
|
||||
|
||||
### Headless output format
|
||||
|
||||
In `mode:headless`, replace the interactive pipe-delimited table report with a structured text envelope. The envelope follows the same structural pattern as document-review's headless output (completion header, metadata block, findings grouped by autofix_class, trailing sections) while using ce-code-review's own section headings and per-finding fields.
|
||||
|
||||
```
|
||||
Code review complete (headless mode).
|
||||
|
||||
Scope: <scope-line>
|
||||
Intent: <intent-summary>
|
||||
Reviewers: <reviewer-list with conditional justifications>
|
||||
Verdict: <Ready to merge | Ready with fixes | Not ready>
|
||||
Artifact: .context/compound-engineering/ce-code-review/<run-id>/
|
||||
|
||||
Applied N safe_auto fixes.
|
||||
|
||||
Gated-auto findings (concrete fix, changes behavior/contracts):
|
||||
|
||||
[P1][gated_auto -> downstream-resolver][needs-verification] File: <file:line> -- <title> (<reviewer>, confidence <N>)
|
||||
Why: <why_it_matters>
|
||||
Suggested fix: <suggested_fix or "none">
|
||||
Evidence: <evidence[0]>
|
||||
Evidence: <evidence[1]>
|
||||
|
||||
Manual findings (actionable, needs handoff):
|
||||
|
||||
[P1][manual -> downstream-resolver] File: <file:line> -- <title> (<reviewer>, confidence <N>)
|
||||
Why: <why_it_matters>
|
||||
Evidence: <evidence[0]>
|
||||
|
||||
Advisory findings (report-only):
|
||||
|
||||
[P2][advisory -> human] File: <file:line> -- <title> (<reviewer>, confidence <N>)
|
||||
Why: <why_it_matters>
|
||||
|
||||
Pre-existing issues:
|
||||
[P2][gated_auto -> downstream-resolver] File: <file:line> -- <title> (<reviewer>, confidence <N>)
|
||||
Why: <why_it_matters>
|
||||
|
||||
Residual risks:
|
||||
- <risk>
|
||||
|
||||
Learnings & Past Solutions:
|
||||
- <learning>
|
||||
|
||||
Agent-Native Gaps:
|
||||
- <gap description>
|
||||
|
||||
Schema Drift Check:
|
||||
- <drift status>
|
||||
|
||||
Deployment Notes:
|
||||
- <deployment note>
|
||||
|
||||
Testing gaps:
|
||||
- <gap>
|
||||
|
||||
Coverage:
|
||||
- Suppressed: <N> findings below 0.60 confidence (P0 at 0.50+ retained)
|
||||
- Untracked files excluded: <file1>, <file2>
|
||||
- Failed reviewers: <reviewer>
|
||||
|
||||
Review complete
|
||||
```
|
||||
|
||||
**Detail enrichment (headless only):** The headless envelope includes `Why:`, `Evidence:`, and `Suggested fix:` lines. After merge (Stage 5), read the per-agent artifact files from `.context/compound-engineering/ce-code-review/{run_id}/` for only the findings that survived dedup and confidence gating.
|
||||
- **Field tiers:** `Why:` and `Evidence:` are detail-tier -- load from per-agent artifact files. `Suggested fix:` is merge-tier -- use it directly from the compact return without artifact lookup.
|
||||
- **Artifact matching:** For each surviving finding, look up its detail-tier fields in the artifact files of the contributing reviewers. Match on `file + line_bucket(line, +/-3)` (the same tolerance used in Stage 5 dedup) within each contributing reviewer's artifact. When multiple artifact entries fall within the line bucket, apply `normalize(title)` to both the merged finding's title and each candidate entry's title as a tie-breaker.
|
||||
- **Reviewer order:** Try contributing reviewers in the order they appear in the merged finding's reviewer list; use the first match.
|
||||
- **No-match fallback:** If no artifact file contains a match (all writes failed, or the finding was synthesized during merge), omit the `Why:` and `Evidence:` lines for that finding and note the gap in Coverage. The `Suggested fix:` line can still be populated from the compact return since it is merge-tier.
|
||||
|
||||
**Formatting rules:**
|
||||
- The `[needs-verification]` marker appears only on findings where `requires_verification: true`.
|
||||
- The `Artifact:` line gives callers the path to the full run artifact for machine-readable access to the complete findings schema. The text envelope is the primary handoff; the artifact is for debugging and full-fidelity access.
|
||||
- Findings with `owner: release` appear in the Advisory section (they are operational/rollout items, not code fixes).
|
||||
- Findings with `pre_existing: true` appear in the Pre-existing section regardless of autofix_class.
|
||||
- The Verdict appears in the metadata header (deliberately reordered from the interactive format where it appears at the bottom) so programmatic callers get the verdict first.
|
||||
- Omit any section with zero items.
|
||||
- If all reviewers fail or time out, emit `Code review degraded (headless mode). Reason: 0 of N reviewers returned results.` followed by "Review complete".
|
||||
- End with "Review complete" as the terminal signal so callers can detect completion.
|
||||
|
||||
## Quality Gates
|
||||
|
||||
Before delivering the review, verify:
|
||||
|
||||
1. **Every finding is actionable.** Re-read each finding. If it says "consider", "might want to", or "could be improved" without a concrete fix, rewrite it with a specific action. Vague findings waste engineering time.
|
||||
2. **No false positives from skimming.** For each finding, verify the surrounding code was actually read. Check that the "bug" isn't handled elsewhere in the same function, that the "unused import" isn't used in a type annotation, that the "missing null check" isn't guarded by the caller.
|
||||
3. **Severity is calibrated.** A style nit is never P0. A SQL injection is never P3. Re-check every severity assignment.
|
||||
4. **Line numbers are accurate.** Verify each cited line number against the file content. A finding pointing to the wrong line is worse than no finding.
|
||||
5. **Protected artifacts are respected.** Discard any findings that recommend deleting or gitignoring files in `docs/brainstorms/`, `docs/plans/`, or `docs/solutions/`.
|
||||
6. **Findings don't duplicate linter output.** Don't flag things the project's linter/formatter would catch (missing semicolons, wrong indentation). Focus on semantic issues.
|
||||
|
||||
## Language-Aware Conditionals
|
||||
|
||||
This skill uses stack-specific reviewer agents when the diff clearly warrants them. Keep those agents opinionated. They are not generic language checkers; they add a distinct review lens on top of the always-on and cross-cutting personas.
|
||||
|
||||
Do not spawn them mechanically from file extensions alone. The trigger is meaningful changed behavior, architecture, or UI state in that stack.
|
||||
|
||||
## After Review
|
||||
|
||||
### Mode-Driven Post-Review Flow
|
||||
|
||||
After presenting findings and verdict (Stage 6), route the next steps by mode. Review and synthesis stay the same in every mode; only mutation and handoff behavior changes.
|
||||
|
||||
#### Step 1: Build the action sets
|
||||
|
||||
- **Clean review** means zero findings after suppression and pre-existing separation. Skip the fix/handoff phase when the review is clean.
|
||||
- **Fixer queue:** final findings routed to `safe_auto -> review-fixer`.
|
||||
- **Residual actionable queue:** unresolved `gated_auto` or `manual` findings whose final owner is `downstream-resolver`.
|
||||
- **Report-only queue:** `advisory` findings and any outputs owned by `human` or `release`.
|
||||
- **Never convert advisory-only outputs into fix work or todos.** Deployment notes, residual risks, and release-owned items stay in the report.
|
||||
|
||||
#### Step 2: Choose policy by mode
|
||||
|
||||
**Interactive mode**
|
||||
|
||||
- Apply `safe_auto -> review-fixer` findings automatically without asking. These are safe by definition.
|
||||
- **Zero-remaining case:** if no `gated_auto` or `manual` findings remain after the `safe_auto` pass, skip the routing question entirely. Emit a one-line completion summary phrased so advisory and pre-existing findings (which are not handled by this flow) are not implied to be cleared. When no advisory or pre-existing findings remain in the report, `All findings resolved — N safe_auto fixes applied.` is accurate. When advisory and/or pre-existing findings do remain, use the qualified form `All actionable findings resolved — N safe_auto fixes applied. (K advisory, J pre-existing findings remain in the report.)`, omitting any zero-count clause. Follow the summary with the existing end-of-review verdict, then proceed to Step 5 per the gating rule there.
|
||||
- **Tracker pre-detection:** before rendering the routing question, consult `references/tracker-defer.md` for the session's tracker tuple `{ tracker_name, confidence, named_sink_available, any_sink_available }`. The probe runs at most once per session and is cached for the rest of the run. `named_sink_available` drives the option C label (inline tracker name only when the named sink can actually be invoked). `any_sink_available` drives whether option C is offered at all (it can still be offered when the named tracker is unreachable but `gh` or the harness task primitive works).
|
||||
- **Verify question-tool pre-load (checklist, Claude Code only).** Before firing the routing question in Claude Code, confirm `AskUserQuestion` is loaded (per Interactive mode rules at the top of this skill). If not yet loaded this session, call `ToolSearch` with query `select:AskUserQuestion` now. Do not proceed to the routing question without this verification. Rendering the question as narrative text is a bug, not a valid fallback. On Codex (`request_user_input`) and Gemini (`ask_user`) this checklist does not apply — the platform-native question tool is loaded by default and there is no `ToolSearch` preload step to perform.
|
||||
- **Routing question.** Ask using the platform's blocking question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). Stem: `What should the agent do with the remaining N findings?` — use third-person voice referring to "the agent", not first-person "me" / "I". Options:
|
||||
|
||||
```
|
||||
(A) Review each finding one by one — accept the recommendation or choose another action
|
||||
(B) LFG. Apply the agent's best-judgment action per finding
|
||||
(C) File a [TRACKER] ticket per finding without applying fixes
|
||||
(D) Report only — take no further action
|
||||
```
|
||||
|
||||
Render option C per `references/tracker-defer.md`: when `confidence = high` AND `named_sink_available = true`, replace `[TRACKER]` with the concrete name and keep the full label (e.g., `File a Linear ticket per finding without applying fixes`). When `any_sink_available = true` but either `confidence = low` or `named_sink_available = false` (a fallback tier like GitHub Issues or the harness task primitive is working instead), use the generic label `File an issue per finding without applying fixes` — this is a whole-label substitution, not a `[TRACKER]` token swap. When `any_sink_available = false`, **omit option C entirely** and add one line to the stem explaining why (e.g., `Defer unavailable — no tracker or task-tracking primitive detected on this platform.`). The three remaining options (A, B, D) survive.
|
||||
|
||||
The numbered-list text fallback applies only when `ToolSearch` explicitly returns no match for the platform's question tool or the tool call errors. It does not apply when the agent simply hasn't loaded the tool yet — in that case, load it now (see the verification checklist above). On platforms genuinely without a blocking question tool, present the applicable options as a numbered list and wait for the user's reply.
|
||||
|
||||
- **Dispatch on selection.** Route by the option letter (A / B / C / D), not by the rendered label string. The option-C label varies by tracker-detection confidence (`File a [TRACKER] ticket per finding without applying fixes` for a named tracker, `File an issue per finding without applying fixes` as the generic fallback, or omitted entirely when no sink is available — see `references/tracker-defer.md`), and options A / B / D have a single canonical label each. The letter is the stable dispatch signal; the canonical labels below are shown for documentation only. A low-confidence run that rendered option C as the generic label routes to the same branch as a high-confidence run that rendered it with the named tracker.
|
||||
- (A) `Review each finding one by one` — load `references/walkthrough.md` and enter the per-finding walk-through loop. The walk-through accumulates Apply decisions in memory; Defer decisions execute inline via `references/tracker-defer.md`; Skip / Acknowledge decisions are recorded as no-action; `LFG the rest` routes through `references/bulk-preview.md`. At end of the loop, dispatch one fixer subagent for the accumulated Apply set (Step 3). Emit the unified completion report.
|
||||
- (B) `LFG. Apply the agent's best-judgment action per finding` — load `references/bulk-preview.md` scoped to every pending `gated_auto` / `manual` finding. On `Proceed`, execute the plan: Apply set → Step 3 fixer dispatch; Defer set → `references/tracker-defer.md`; Skip / Acknowledge → no-op. On `Cancel`, return to this routing question. Emit the unified completion report after execution.
|
||||
- (C) `File a [TRACKER] ticket per finding without applying fixes` (or the generic `File an issue per finding without applying fixes` when the named-tracker label is not used) — load `references/bulk-preview.md` with every pending finding in the file-tickets bucket (regardless of the agent's natural recommendation). On `Proceed`, route every finding through `references/tracker-defer.md`; no fixes are applied. On `Cancel`, return to this routing question. Emit the unified completion report.
|
||||
- (D) `Report only — take no further action` — do not enter any dispatch phase. Emit the completion report, then proceed to Step 5 per its gating rule (`fixes_applied_count > 0` from earlier `safe_auto` passes). If no fixes were applied this run, stop after the report.
|
||||
|
||||
- The walk-through's completion report, the LFG / File-tickets completion report, and the zero-remaining completion summary all follow the unified completion-report structure documented in `references/walkthrough.md`. Use the same structure across every terminal path.
|
||||
|
||||
**Autofix mode**
|
||||
|
||||
- Ask no questions.
|
||||
- Apply only the `safe_auto -> review-fixer` queue.
|
||||
- Leave `gated_auto`, `manual`, `human`, and `release` items unresolved.
|
||||
- Prepare residual work only for unresolved actionable findings whose final owner is `downstream-resolver`.
|
||||
|
||||
**Report-only mode**
|
||||
|
||||
- Ask no questions.
|
||||
- Do not build a fixer queue.
|
||||
- Do not create residual todos or `.context` artifacts.
|
||||
- Stop after Stage 6. Everything remains in the report.
|
||||
|
||||
**Headless mode**
|
||||
|
||||
- Ask no questions.
|
||||
- Apply only the `safe_auto -> review-fixer` queue in a single pass. Do not enter the bounded re-review loop (Step 3). Spawn one fixer subagent, apply fixes, then proceed directly to Step 4.
|
||||
- Leave `gated_auto`, `manual`, `human`, and `release` items unresolved — they appear in the structured text output.
|
||||
- Output the headless output envelope (see Stage 6) instead of the interactive report.
|
||||
- Write a run artifact (Step 4) but do not create todo files.
|
||||
- Stop after the structured text output and "Review complete" signal. No commit/push/PR.
|
||||
|
||||
#### Step 3: Apply fixes with one fixer and bounded rounds
|
||||
|
||||
- Spawn exactly one fixer subagent for the current fixer queue in the current checkout. That fixer applies all approved changes and runs the relevant targeted tests in one pass against a consistent tree.
|
||||
- Do not fan out multiple fixers against the same checkout. Parallel fixers require isolated worktrees/branches and deliberate mergeback.
|
||||
- Re-review only the changed scope after fixes land.
|
||||
- Bound the loop with `max_rounds: 2`. If issues remain after the second round, stop and hand them off as residual work or report them as unresolved.
|
||||
- If any applied finding has `requires_verification: true`, the round is incomplete until the targeted verification runs.
|
||||
- Do not start a mutating review round concurrently with browser testing on the same checkout. Future orchestrators that want both must either run `mode:report-only` during the parallel phase or isolate the mutating review in its own checkout/worktree.
|
||||
|
||||
#### Step 4: Emit artifacts and downstream handoff
|
||||
|
||||
- In interactive, autofix, and headless modes, write a per-run artifact under `.context/compound-engineering/ce-code-review/<run-id>/` containing:
|
||||
- synthesized findings (merged output from Stage 5)
|
||||
- applied fixes
|
||||
- residual actionable work
|
||||
- advisory-only outputs
|
||||
Per-agent full-detail JSON files (`{reviewer_name}.json`) are already present in this directory from Stage 4 dispatch.
|
||||
- Also write `metadata.json` alongside the findings so downstream skills (e.g., `ce-polish-beta`) can verify the artifact matches the current branch and HEAD. Minimum fields:
|
||||
```json
|
||||
{
|
||||
"run_id": "<run-id>",
|
||||
"branch": "<git branch --show-current at dispatch time>",
|
||||
"head_sha": "<git rev-parse HEAD at dispatch time>",
|
||||
"verdict": "<Ready to merge | Ready with fixes | Not ready>",
|
||||
"completed_at": "<ISO 8601 UTC timestamp>"
|
||||
}
|
||||
```
|
||||
Capture `branch` and `head_sha` at dispatch time (before any autofixes land), and write the file after the verdict is finalized. This file is additive -- pre-existing artifacts that predate this field are still valid, and downstream skills fall back to file mtime when it is missing.
|
||||
- In autofix mode, create durable todo files only for unresolved actionable findings whose final owner is `downstream-resolver`. Load the `ce-todo-create` skill for the canonical directory path, naming convention, YAML frontmatter structure, and template. Each todo should map the finding's severity to the todo priority (`P0`/`P1` -> `p1`, `P2` -> `p2`, `P3` -> `p3`) and set `status: ready` since these findings have already been triaged by synthesis.
|
||||
- Do not create todos for `advisory` findings, `owner: human`, `owner: release`, or protected-artifact cleanup suggestions.
|
||||
- If only advisory outputs remain, create no todos.
|
||||
- Interactive mode may offer to externalize residual actionable work after fixes, but it is not required to finish the review.
|
||||
|
||||
#### Step 5: Final next steps
|
||||
|
||||
**Interactive mode only.** After the fix-review cycle completes (clean verdict or the user chose to stop), offer next steps based on the entry mode. Reuse the resolved review base/default branch from Stage 1 when known; do not hard-code only `main`/`master`.
|
||||
|
||||
**The gate is total fixes applied this run, not routing option.** Track `fixes_applied_count` across the whole Interactive invocation. This counter includes both the `safe_auto` fixes applied automatically before the routing question (see Step 2 Interactive mode) AND any Apply decisions executed by routing option A (walk-through) or option B (LFG). Routing options C (File tickets) and D (Report only) add zero to this counter; neither does a walk-through that ends with only Skip / Defer / Acknowledge, and neither does an LFG whose recommendations were all Defer / Skip / Acknowledge.
|
||||
|
||||
Step 5 runs only when `fixes_applied_count > 0`. If the counter is zero — no `safe_auto` fixes were applied AND the routing path produced no additional Apply — skip Step 5 entirely and exit after the completion report. Asking "push fixes?" when nothing changed in the working tree is incoherent.
|
||||
|
||||
Common outcomes:
|
||||
|
||||
- `safe_auto` produced fixes AND the user picked any routing option → Step 5 runs (counter > 0 from the safe_auto pass alone).
|
||||
- No `safe_auto` fixes AND the user picked option C or D → Step 5 skipped.
|
||||
- No `safe_auto` fixes AND walk-through / LFG finished with zero Applies → Step 5 skipped.
|
||||
- Zero-remaining case (no `gated_auto` / `manual` after `safe_auto`) with at least one `safe_auto` fix → Step 5 runs; the routing question was never asked but the counter is > 0.
|
||||
|
||||
- **PR mode (entered via PR number/URL):**
|
||||
- **Push fixes** -- push commits to the existing PR branch
|
||||
- **Exit** -- done for now
|
||||
- **Branch mode (feature branch with no PR, and not the resolved review base/default branch):**
|
||||
- **Create a PR (Recommended)** -- push and open a pull request
|
||||
- **Continue without PR** -- stay on the branch
|
||||
- **Exit** -- done for now
|
||||
- **On the resolved review base/default branch:**
|
||||
- **Continue** -- proceed with next steps
|
||||
- **Exit** -- done for now
|
||||
|
||||
If "Create a PR": first publish the branch with `git push --set-upstream origin HEAD`, then use `gh pr create` with a title and summary derived from the branch changes.
|
||||
If "Push fixes": push the branch with `git push` to update the existing PR.
|
||||
|
||||
**Autofix, report-only, and headless modes:** stop after the report, artifact emission, and residual-work handoff. Do not commit, push, or create a PR.
|
||||
|
||||
## Fallback
|
||||
|
||||
If the platform doesn't support parallel sub-agents, run reviewers sequentially. Everything else (stages, output format, merge pipeline) stays the same.
|
||||
|
||||
---
|
||||
|
||||
## Included References
|
||||
|
||||
### Persona Catalog
|
||||
|
||||
@./references/persona-catalog.md
|
||||
|
||||
### Subagent Template
|
||||
|
||||
@./references/subagent-template.md
|
||||
|
||||
### Diff Scope Rules
|
||||
|
||||
@./references/diff-scope.md
|
||||
|
||||
### Findings Schema
|
||||
|
||||
@./references/findings-schema.json
|
||||
|
||||
### Review Output Template
|
||||
|
||||
@./references/review-output-template.md
|
||||
@@ -0,0 +1,132 @@
|
||||
# Bulk Action Preview
|
||||
|
||||
This reference defines the compact plan preview that Interactive mode shows before every bulk action — LFG (routing option B), File tickets per finding (routing option C), and the walk-through's `LFG the rest` (option D of the per-finding question). The preview gives the user a single-screen view of what the agent is about to do, with exactly two options to Proceed or Cancel.
|
||||
|
||||
Interactive mode only.
|
||||
|
||||
---
|
||||
|
||||
## When the preview fires
|
||||
|
||||
Three call sites:
|
||||
|
||||
1. **Routing option B (top-level LFG)** — after the user picks `LFG. Apply the agent's best-judgment action per finding` from the routing question, but before any action executes. Scope: every pending `gated_auto` / `manual` finding.
|
||||
2. **Routing option C (top-level File tickets)** — after the user picks `File a [TRACKER] ticket per finding without applying fixes` but before any ticket is filed. Scope: every pending `gated_auto` / `manual` finding. Every finding appears under `Filing [TRACKER] tickets (N):` regardless of the agent's natural recommendation, because option C is batch-defer.
|
||||
3. **Walk-through `LFG the rest`** — after the user picks `LFG the rest — apply the agent's best judgment to this and remaining findings` from a per-finding question, but before the remaining findings are resolved. Scope: the current finding and everything not yet decided. Already-decided findings from the walk-through are not included in the preview.
|
||||
|
||||
In all three cases the user confirms with `Proceed` or backs out with `Cancel`. No per-item decisions inside the preview — per-item decisioning is the walk-through's role.
|
||||
|
||||
---
|
||||
|
||||
## Preview structure
|
||||
|
||||
The preview is grouped by the action the agent intends to take. Bucket headers appear only when their bucket is non-empty.
|
||||
|
||||
```
|
||||
<Path label> — <scope summary>[ (tracker: <name>)]:
|
||||
|
||||
Applying (N):
|
||||
[P0] <file>:<line> — <one-line plain-English summary>
|
||||
[P1] <file>:<line> — <one-line plain-English summary>
|
||||
|
||||
Filing [TRACKER] tickets (N):
|
||||
[P2] <file>:<line> — <one-line plain-English summary>
|
||||
|
||||
Skipping (N):
|
||||
[P2] <file>:<line> — <one-line plain-English summary>
|
||||
|
||||
Acknowledging (N):
|
||||
[P3] <file>:<line> — <one-line plain-English summary>
|
||||
```
|
||||
|
||||
Worked example, for routing option B (top-level LFG):
|
||||
|
||||
```
|
||||
LFG plan — 8 findings (tracker: Linear):
|
||||
|
||||
Applying (4):
|
||||
[P0] orders_controller.rb:42 — Add ownership guard before order lookup
|
||||
[P1] webhook_handler.rb:120 — Raise on unhandled error instead of swallowing
|
||||
[P2] user_serializer.rb:14 — Drop internal_id from serialized response
|
||||
[P3] string_utils.rb:8 — Rename ambiguous helper for clarity
|
||||
|
||||
Filing Linear tickets (2):
|
||||
[P2] billing_service.rb:230 — N+1 on refund batch (no concrete fix)
|
||||
[P2] session_helper.rb:12 — Session reset behavior needs discussion
|
||||
|
||||
Skipping (2):
|
||||
[P2] report_worker.rb:55 — Recommendation is speculative; low confidence
|
||||
[P3] readme.md:14 — Style preference, subjective
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Scope summary wording by path
|
||||
|
||||
- **Routing option B (top-level LFG):** header reads `LFG plan — N findings[ (tracker: <name>)]:`.
|
||||
- **Routing option C (top-level File tickets):** header reads `File plan — N findings as [TRACKER] tickets:`. Every finding lands in the `Filing [TRACKER] tickets (N):` bucket.
|
||||
- **Walk-through LFG the rest:** header reads `LFG plan — N remaining findings (K already decided)[ (tracker: <name>)]:`. Already-decided findings from the walk-through are **not** included in the preview or in the bucket counts. The `K already decided` counter communicates that the walk-through was partially completed.
|
||||
|
||||
When the detected tracker is low-confidence or generic (see `tracker-defer.md`), the `(tracker: <name>)` annotation is omitted from the header and the `Filing [TRACKER] tickets` bucket header uses the generic form (`Filing tickets (N):`).
|
||||
|
||||
---
|
||||
|
||||
## Per-finding line format
|
||||
|
||||
Each line uses the compressed form of the framing-quality bar from the plan (R22-R25 — observable-behavior-first, no function / variable names unless needed to locate). The one-line summary is drawn from the persona-produced `why_it_matters` by taking the first sentence (and, when the first sentence is too long for the preview width, paraphrasing it tightly to fit).
|
||||
|
||||
- **Shape:** `[<severity>] <file>:<line> — <one-line summary>`
|
||||
- **Width target:** keep lines near 80 columns so the preview renders cleanly in narrow terminals. Truncate with ellipsis when necessary.
|
||||
- **No function / variable names inline** unless the reader needs them to locate the issue.
|
||||
- **Advisory bucket phrasing:** the `Acknowledging (N):` bucket describes the advisory content in one line. No "fix" phrase — advisory findings have no concrete fix.
|
||||
|
||||
When no `why_it_matters` is available for a finding (e.g., Unit 2's template upgrade hasn't fully propagated through the persona run, or the artifact file was unreadable), fall back to the finding's title directly. Note the gap in the completion report's Coverage section if it affects more than a few findings in the same run.
|
||||
|
||||
---
|
||||
|
||||
## Question and options
|
||||
|
||||
After the preview body is rendered, ask the user using the platform's blocking question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). In Claude Code, the tool should already be loaded from the Interactive-mode pre-load step — if it isn't, call `ToolSearch` with query `select:AskUserQuestion` now. The text fallback below applies only when that load explicitly fails.
|
||||
|
||||
Stem (adapted to the path):
|
||||
- For routing B: `The agent is about to apply the plan above. Proceed?`
|
||||
- For routing C: `The agent is about to file the tickets above. Proceed?`
|
||||
- For walk-through `LFG the rest`: `The agent is about to resolve the remaining findings above. Proceed?`
|
||||
|
||||
Options (exactly two, in all three cases):
|
||||
- `Proceed` — execute the plan as shown
|
||||
- `Cancel` — do nothing, return to the originating question
|
||||
|
||||
Only when `ToolSearch` explicitly returns no match or the tool call errors — or on a platform with no blocking question tool — fall back to presenting numbered options and waiting for the user's next reply.
|
||||
|
||||
---
|
||||
|
||||
## Cancel semantics
|
||||
|
||||
- **From routing option B Cancel:** return the user to the routing question (the four-option menu). Do not apply any fixes, file any tickets, or record any state. The session's cached tracker-detection tuple is preserved.
|
||||
- **From routing option C Cancel:** same — return to the routing question, no side effects.
|
||||
- **From walk-through `LFG the rest` Cancel:** return the user to the current finding's per-finding question (not to the routing question). The walk-through continues from where it was, with prior decisions intact.
|
||||
|
||||
In every case, `Cancel` changes no on-disk or external state.
|
||||
|
||||
---
|
||||
|
||||
## Proceed semantics
|
||||
|
||||
When the user picks `Proceed`:
|
||||
|
||||
- **Routing option B (top-level LFG):** for each finding in the plan, execute the recommended action. Apply findings go into the Apply set for a single end-of-batch fixer dispatch (see `walkthrough.md` for the Apply batching rules). Defer findings route through `tracker-defer.md`. Skip / Acknowledge findings are recorded as no-action. After all actions complete, emit the unified completion report (see `walkthrough.md`).
|
||||
- **Routing option C (top-level File tickets):** every finding routes through `tracker-defer.md` for ticket creation. No fixes are applied. After all tickets have been filed (or failed), emit the unified completion report.
|
||||
- **Walk-through `LFG the rest`:** same as routing option B, but scoped to the findings the user hadn't decided on. Apply findings join the in-memory Apply set with the ones the user already picked during the walk-through; all dispatch together in the single end-of-walk-through fixer call.
|
||||
|
||||
Failure during `Proceed` (e.g., ticket creation fails for one finding during a batch Defer) follows the failure path defined in `tracker-defer.md` — surface the failure inline with Retry / Fallback / Skip, continue with the rest of the plan, and capture the failure in the completion report's failure section.
|
||||
|
||||
---
|
||||
|
||||
## Edge cases
|
||||
|
||||
- **Zero findings in a bucket:** omit the bucket header. A preview with only Apply and Skip does not show an empty `Filing tickets (0):` or `Acknowledging (0):` line.
|
||||
- **All findings in one bucket:** preview still shows the bucket header; Proceed / Cancel still offered. This is the common case for routing option C (every finding under `Filing tickets`).
|
||||
- **N=1 preview (only one finding in scope):** the preview still uses the grouped format, just with a single-line bucket. `Proceed` / `Cancel` still apply.
|
||||
- **No tracker available:** option C is not offered upstream (see `tracker-defer.md` no-sink handling). LFG (option B) and walk-through `LFG the rest` can still run — they may contain per-finding Defer recommendations from Stage 5. Before rendering any LFG-shaped preview, downgrade every Defer recommendation to Skip when the session's cached `any_sink_available` is false, and surface the downgrade on the preview itself (e.g., a `Skipping — defer sink unavailable (N):` bucket, or a note in the header: `N Defer recommendations downgraded to Skip — no tracker sink`). This is a preview-time runtime step, not Stage 5 tie-breaking — step 7b only orders conflicting reviewer recommendations (`Skip > Defer > Apply > Acknowledge`, as defined in `SKILL.md` Stage 5 step 7b) and has no knowledge of sink availability.
|
||||
- **Walk-through `LFG the rest` with zero remaining findings:** the walk-through's own logic suppresses `LFG the rest` as an option when N=1 and otherwise, so the preview should never be invoked with zero remaining findings. If it is, render `LFG plan — 0 remaining findings` and fall through to Proceed with no-op.
|
||||
@@ -0,0 +1,31 @@
|
||||
# Diff Scope Rules
|
||||
|
||||
These rules apply to every reviewer. They define what is "your code to review" versus pre-existing context.
|
||||
|
||||
## Scope Discovery
|
||||
|
||||
Determine the diff to review using this priority order:
|
||||
|
||||
1. **User-specified scope.** If the caller passed `BASE:`, `FILES:`, or `DIFF:` markers, use that scope exactly.
|
||||
2. **Working copy changes.** If there are unstaged or staged changes (`git diff HEAD` is non-empty), review those.
|
||||
3. **Unpushed commits vs base branch.** If the working copy is clean, review `git diff $(git merge-base HEAD <base>)..HEAD` where `<base>` is the default branch (main or master).
|
||||
|
||||
The scope step in the SKILL.md handles discovery and passes you the resolved diff. You do not need to run git commands yourself.
|
||||
|
||||
## Finding Classification Tiers
|
||||
|
||||
Every finding you report falls into one of three tiers based on its relationship to the diff:
|
||||
|
||||
### Primary (directly changed code)
|
||||
|
||||
Lines added or modified in the diff. This is your main focus. Report findings against these lines at full confidence.
|
||||
|
||||
### Secondary (immediately surrounding code)
|
||||
|
||||
Unchanged code within the same function, method, or block as a changed line. If a change introduces a bug that's only visible by reading the surrounding context, report it -- but note that the issue exists in the interaction between new and existing code.
|
||||
|
||||
### Pre-existing (unrelated to this diff)
|
||||
|
||||
Issues in unchanged code that the diff didn't touch and doesn't interact with. Mark these as `"pre_existing": true` in your output. They're reported separately and don't count toward the review verdict.
|
||||
|
||||
**The rule:** If you'd flag the same issue on an identical diff that didn't include the surrounding file, it's pre-existing. If the diff makes the issue *newly relevant* (e.g., a new caller hits an existing buggy function), it's secondary.
|
||||
@@ -0,0 +1,134 @@
|
||||
{
|
||||
"$schema": "http://json-schema.org/draft-07/schema#",
|
||||
"title": "Code Review Findings",
|
||||
"description": "Structured output schema for code review sub-agents",
|
||||
"type": "object",
|
||||
"required": ["reviewer", "findings", "residual_risks", "testing_gaps"],
|
||||
"properties": {
|
||||
"reviewer": {
|
||||
"type": "string",
|
||||
"description": "Persona name that produced this output (e.g., 'correctness', 'security')"
|
||||
},
|
||||
"findings": {
|
||||
"type": "array",
|
||||
"description": "List of code review findings. Empty array if no issues found.",
|
||||
"items": {
|
||||
"type": "object",
|
||||
"required": [
|
||||
"title",
|
||||
"severity",
|
||||
"file",
|
||||
"line",
|
||||
"why_it_matters",
|
||||
"autofix_class",
|
||||
"owner",
|
||||
"requires_verification",
|
||||
"confidence",
|
||||
"evidence",
|
||||
"pre_existing"
|
||||
],
|
||||
"properties": {
|
||||
"title": {
|
||||
"type": "string",
|
||||
"description": "Short, specific issue title. 10 words or fewer.",
|
||||
"maxLength": 100
|
||||
},
|
||||
"severity": {
|
||||
"type": "string",
|
||||
"enum": ["P0", "P1", "P2", "P3"],
|
||||
"description": "Issue severity level"
|
||||
},
|
||||
"file": {
|
||||
"type": "string",
|
||||
"description": "Relative file path from repository root"
|
||||
},
|
||||
"line": {
|
||||
"type": "integer",
|
||||
"description": "Primary line number of the issue",
|
||||
"minimum": 1
|
||||
},
|
||||
"why_it_matters": {
|
||||
"type": "string",
|
||||
"description": "Impact and failure mode -- not 'what is wrong' but 'what breaks'"
|
||||
},
|
||||
"autofix_class": {
|
||||
"type": "string",
|
||||
"enum": ["safe_auto", "gated_auto", "manual", "advisory"],
|
||||
"description": "Reviewer's conservative recommendation for how this issue should be handled after synthesis"
|
||||
},
|
||||
"owner": {
|
||||
"type": "string",
|
||||
"enum": ["review-fixer", "downstream-resolver", "human", "release"],
|
||||
"description": "Who should own the next action for this finding after synthesis"
|
||||
},
|
||||
"requires_verification": {
|
||||
"type": "boolean",
|
||||
"description": "Whether any fix for this finding must be re-verified with targeted tests or a follow-up review pass"
|
||||
},
|
||||
"suggested_fix": {
|
||||
"type": ["string", "null"],
|
||||
"description": "Concrete minimal fix. Omit or null if no good fix is obvious -- a bad suggestion is worse than none."
|
||||
},
|
||||
"confidence": {
|
||||
"type": "number",
|
||||
"description": "Reviewer confidence in this finding, calibrated per persona",
|
||||
"minimum": 0.0,
|
||||
"maximum": 1.0
|
||||
},
|
||||
"evidence": {
|
||||
"type": "array",
|
||||
"description": "Code-grounded evidence: snippets, line references, or pattern descriptions. At least 1 item.",
|
||||
"items": { "type": "string" },
|
||||
"minItems": 1
|
||||
},
|
||||
"pre_existing": {
|
||||
"type": "boolean",
|
||||
"description": "True if this issue exists in unchanged code unrelated to the current diff"
|
||||
}
|
||||
}
|
||||
}
|
||||
},
|
||||
"residual_risks": {
|
||||
"type": "array",
|
||||
"description": "Risks the reviewer noticed but could not confirm as findings",
|
||||
"items": { "type": "string" }
|
||||
},
|
||||
"testing_gaps": {
|
||||
"type": "array",
|
||||
"description": "Missing test coverage the reviewer identified",
|
||||
"items": { "type": "string" }
|
||||
}
|
||||
},
|
||||
|
||||
"_meta": {
|
||||
"confidence_thresholds": {
|
||||
"suppress": "Below 0.60 -- do not report. Finding is speculative noise. Exception: P0 findings at 0.50+ may be reported.",
|
||||
"flag": "0.60-0.69 -- include only when the issue is clearly actionable with concrete evidence.",
|
||||
"confident": "0.70-0.84 -- real and important. Report with full evidence.",
|
||||
"certain": "0.85-1.00 -- verifiable from the code alone. Report."
|
||||
},
|
||||
"severity_definitions": {
|
||||
"P0": "Critical breakage, exploitable vulnerability, data loss/corruption. Must fix before merge.",
|
||||
"P1": "High-impact defect likely hit in normal usage, breaking contract. Should fix.",
|
||||
"P2": "Moderate issue with meaningful downside (edge case, perf regression, maintainability trap). Fix if straightforward.",
|
||||
"P3": "Low-impact, narrow scope, minor improvement. User's discretion."
|
||||
},
|
||||
"autofix_classes": {
|
||||
"safe_auto": "Local, deterministic code or test fix suitable for the in-skill fixer. Examples: extract duplicated helper, add missing nil check, fix off-by-one, add missing test, remove dead code. Do not default to advisory when a concrete safe fix exists.",
|
||||
"gated_auto": "Concrete fix exists, but it changes behavior, permissions, contracts, or other sensitive areas that deserve explicit approval. Examples: add auth to unprotected endpoint, change API response shape.",
|
||||
"manual": "Actionable issue that requires design decisions or cross-cutting changes. Examples: redesign data model, add pagination strategy, choose between architectural approaches.",
|
||||
"advisory": "Informational or operational item that should be surfaced in the report only. Examples: design asymmetry the PR improves but does not fully resolve, residual risk notes, deployment considerations."
|
||||
},
|
||||
"owners": {
|
||||
"review-fixer": "The in-skill fixer can own this when policy allows.",
|
||||
"downstream-resolver": "Turn this into residual work for later resolution.",
|
||||
"human": "A person must make a judgment call before code changes should continue.",
|
||||
"release": "Operational or rollout follow-up; do not convert into code-fix work automatically."
|
||||
},
|
||||
"return_tiers": {
|
||||
"description": "Finding fields are split into two tiers. The full schema (with all required fields) applies to the artifact file on disk. The compact return to the orchestrator omits detail-tier fields. Both are valid uses of this schema in different contexts.",
|
||||
"merge_tier": "Returned to orchestrator: title, severity, file, line, confidence, autofix_class, owner, requires_verification, pre_existing, suggested_fix (optional). Plus top-level reviewer, residual_risks, testing_gaps.",
|
||||
"detail_tier": "Required in artifact file, omitted from compact return: why_it_matters, evidence. The artifact file must pass full schema validation including all required fields. Headless output depends on why_it_matters and evidence being present in the artifact."
|
||||
}
|
||||
}
|
||||
}
|
||||
@@ -0,0 +1,67 @@
|
||||
# Persona Catalog
|
||||
|
||||
17 reviewer personas organized into always-on, cross-cutting conditional, and stack-specific conditional layers, plus CE-specific agents. The orchestrator uses this catalog to select which reviewers to spawn for each review.
|
||||
|
||||
## Always-on (4 personas + 2 CE agents)
|
||||
|
||||
Spawned on every review regardless of diff content.
|
||||
|
||||
**Persona agents (structured JSON output):**
|
||||
|
||||
| Persona | Agent | Focus |
|
||||
|---------|-------|-------|
|
||||
| `correctness` | `review:ce-correctness-reviewer` | Logic errors, edge cases, state bugs, error propagation, intent compliance |
|
||||
| `testing` | `review:ce-testing-reviewer` | Coverage gaps, weak assertions, brittle tests, missing edge case tests |
|
||||
| `maintainability` | `review:ce-maintainability-reviewer` | Coupling, complexity, naming, dead code, premature abstraction |
|
||||
| `project-standards` | `review:ce-project-standards-reviewer` | CLAUDE.md and AGENTS.md compliance -- frontmatter, references, naming, cross-platform portability, tool selection |
|
||||
|
||||
**CE agents (unstructured output, synthesized separately):**
|
||||
|
||||
| Agent | Focus |
|
||||
|-------|-------|
|
||||
| `review:ce-agent-native-reviewer` | Verify new features are agent-accessible |
|
||||
| `research:ce-learnings-researcher` | Search docs/solutions/ for past issues related to this PR's modules and patterns |
|
||||
|
||||
## Conditional (8 personas)
|
||||
|
||||
Spawned when the orchestrator identifies relevant patterns in the diff. The orchestrator reads the full diff and reasons about selection -- this is agent judgment, not keyword matching.
|
||||
|
||||
| Persona | Agent | Select when diff touches... |
|
||||
|---------|-------|---------------------------|
|
||||
| `security` | `review:ce-security-reviewer` | Auth middleware, public endpoints, user input handling, permission checks, secrets management |
|
||||
| `performance` | `review:ce-performance-reviewer` | Database queries, ORM calls, loop-heavy data transforms, caching layers, async/concurrent code |
|
||||
| `api-contract` | `review:ce-api-contract-reviewer` | Route definitions, serializer/interface changes, event schemas, exported type signatures, API versioning |
|
||||
| `data-migrations` | `review:ce-data-migrations-reviewer` | Migration files, schema changes, backfill scripts, data transformations |
|
||||
| `reliability` | `review:ce-reliability-reviewer` | Error handling, retry logic, circuit breakers, timeouts, background jobs, async handlers, health checks |
|
||||
| `adversarial` | `review:ce-adversarial-reviewer` | Diff has >=50 changed non-test, non-generated, non-lockfile lines, OR touches auth, payments, data mutations, external API integrations, or other high-risk domains |
|
||||
| `cli-readiness` | `review:ce-cli-readiness-reviewer` | CLI command definitions, argument parsing, CLI framework usage, command handler implementations |
|
||||
| `previous-comments` | `review:ce-previous-comments-reviewer` | **PR-only.** Reviewing a PR that has existing review comments or review threads from prior review rounds. Skip entirely when no PR metadata was gathered in Stage 1. |
|
||||
|
||||
## Stack-Specific Conditional (5 personas)
|
||||
|
||||
These reviewers keep their original opinionated lens. They are additive with the cross-cutting personas above, not replacements for them.
|
||||
|
||||
| Persona | Agent | Select when diff touches... |
|
||||
|---------|-------|---------------------------|
|
||||
| `dhh-rails` | `review:ce-dhh-rails-reviewer` | Rails architecture, service objects, authentication/session choices, Hotwire-vs-SPA boundaries, or abstractions that may fight Rails conventions |
|
||||
| `kieran-rails` | `review:ce-kieran-rails-reviewer` | Rails controllers, models, views, jobs, components, routes, or other application-layer Ruby code where clarity and conventions matter |
|
||||
| `kieran-python` | `review:ce-kieran-python-reviewer` | Python modules, endpoints, services, scripts, or typed domain code |
|
||||
| `kieran-typescript` | `review:ce-kieran-typescript-reviewer` | TypeScript components, services, hooks, utilities, or shared types |
|
||||
| `julik-frontend-races` | `review:ce-julik-frontend-races-reviewer` | Stimulus/Turbo controllers, DOM event wiring, timers, async UI flows, animations, or frontend state transitions with race potential |
|
||||
|
||||
## CE Conditional Agents (migration-specific)
|
||||
|
||||
These CE-native agents provide specialized analysis beyond what the persona agents cover. Spawn them when the diff includes database migrations, schema.rb, or data backfills.
|
||||
|
||||
| Agent | Focus |
|
||||
|-------|-------|
|
||||
| `review:ce-schema-drift-detector` | Cross-references schema.rb changes against included migrations to catch unrelated drift |
|
||||
| `review:ce-deployment-verification-agent` | Produces Go/No-Go deployment checklist with SQL verification queries and rollback procedures |
|
||||
|
||||
## Selection rules
|
||||
|
||||
1. **Always spawn all 4 always-on personas** plus the 2 CE always-on agents.
|
||||
2. **For each cross-cutting conditional persona**, the orchestrator reads the diff and decides whether the persona's domain is relevant. This is a judgment call, not a keyword match.
|
||||
3. **For each stack-specific conditional persona**, use file types and changed patterns as a starting point, then decide whether the diff actually introduces meaningful work for that reviewer. Do not spawn language-specific reviewers just because one config or generated file happens to match the extension.
|
||||
4. **For CE conditional agents**, spawn when the diff includes migration files (`db/migrate/*.rb`, `db/schema.rb`) or data backfill scripts.
|
||||
5. **Announce the team** before spawning with a one-line justification per conditional reviewer selected.
|
||||
@@ -0,0 +1,100 @@
|
||||
#!/usr/bin/env bash
|
||||
# Resolve the review base branch and compute the merge-base for ce-code-review.
|
||||
# Handles fork-safe remote resolution, PR metadata, and multi-fallback detection.
|
||||
#
|
||||
# Usage: bash references/resolve-base.sh
|
||||
# Output: BASE:<sha> on success, ERROR:<message> on failure.
|
||||
#
|
||||
# Detects the base branch from (in priority order):
|
||||
# 1. PR metadata (base ref + base repo for fork safety)
|
||||
# 2. origin/HEAD symbolic ref
|
||||
# 3. gh repo view defaultBranchRef
|
||||
# 4. Common branch names: main, master, develop, trunk
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
REVIEW_BASE_BRANCH=""
|
||||
PR_BASE_REPO=""
|
||||
PR_BASE_REMOTE=""
|
||||
BASE_REF=""
|
||||
|
||||
# Step 1: Try PR metadata (handles fork workflows)
|
||||
if command -v gh >/dev/null 2>&1; then
|
||||
PR_META=$(gh pr view --json baseRefName,url 2>/dev/null || true)
|
||||
if [ -n "$PR_META" ]; then
|
||||
REVIEW_BASE_BRANCH=$(echo "$PR_META" | jq -r '.baseRefName // empty' 2>/dev/null || true)
|
||||
PR_BASE_REPO=$(echo "$PR_META" | jq -r '.url // empty' 2>/dev/null | sed -n 's#https://github.com/\([^/]*/[^/]*\)/pull/.*#\1#p' || true)
|
||||
fi
|
||||
fi
|
||||
|
||||
# Step 2: Fall back to origin/HEAD
|
||||
if [ -z "$REVIEW_BASE_BRANCH" ]; then
|
||||
REVIEW_BASE_BRANCH=$(git symbolic-ref --quiet --short refs/remotes/origin/HEAD 2>/dev/null | sed 's#^origin/##' || true)
|
||||
fi
|
||||
|
||||
# Step 3: Fall back to gh repo view
|
||||
if [ -z "$REVIEW_BASE_BRANCH" ] && command -v gh >/dev/null 2>&1; then
|
||||
REVIEW_BASE_BRANCH=$(gh repo view --json defaultBranchRef --jq '.defaultBranchRef.name' 2>/dev/null || true)
|
||||
fi
|
||||
|
||||
# Step 4: Fall back to common branch names
|
||||
if [ -z "$REVIEW_BASE_BRANCH" ]; then
|
||||
for candidate in main master develop trunk; do
|
||||
if git rev-parse --verify "origin/$candidate" >/dev/null 2>&1 || git rev-parse --verify "$candidate" >/dev/null 2>&1; then
|
||||
REVIEW_BASE_BRANCH="$candidate"
|
||||
break
|
||||
fi
|
||||
done
|
||||
fi
|
||||
|
||||
# Resolve the base ref from the correct remote (fork-safe)
|
||||
if [ -n "$REVIEW_BASE_BRANCH" ]; then
|
||||
if [ -n "$PR_BASE_REPO" ]; then
|
||||
PR_BASE_REMOTE=$(git remote -v | awk "index(\$2, \"github.com:$PR_BASE_REPO\") || index(\$2, \"github.com/$PR_BASE_REPO\") {print \$1; exit}")
|
||||
if [ -n "$PR_BASE_REMOTE" ]; then
|
||||
BASE_REF=$(git rev-parse --verify "$PR_BASE_REMOTE/$REVIEW_BASE_BRANCH" 2>/dev/null || true)
|
||||
if [ -z "$BASE_REF" ]; then
|
||||
git fetch --no-tags "$PR_BASE_REMOTE" "$REVIEW_BASE_BRANCH:refs/remotes/$PR_BASE_REMOTE/$REVIEW_BASE_BRANCH" 2>/dev/null || git fetch --no-tags "$PR_BASE_REMOTE" "$REVIEW_BASE_BRANCH" 2>/dev/null || true
|
||||
BASE_REF=$(git rev-parse --verify "$PR_BASE_REMOTE/$REVIEW_BASE_BRANCH" 2>/dev/null || true)
|
||||
fi
|
||||
fi
|
||||
fi
|
||||
if [ -z "$BASE_REF" ]; then
|
||||
# Only try origin if it exists as a remote; otherwise skip to avoid
|
||||
# confusing errors in fork setups where origin points at the user's fork.
|
||||
if git remote get-url origin >/dev/null 2>&1; then
|
||||
BASE_REF=$(git rev-parse --verify "origin/$REVIEW_BASE_BRANCH" 2>/dev/null || true)
|
||||
if [ -z "$BASE_REF" ]; then
|
||||
git fetch --no-tags origin "$REVIEW_BASE_BRANCH:refs/remotes/origin/$REVIEW_BASE_BRANCH" 2>/dev/null || git fetch --no-tags origin "$REVIEW_BASE_BRANCH" 2>/dev/null || true
|
||||
BASE_REF=$(git rev-parse --verify "origin/$REVIEW_BASE_BRANCH" 2>/dev/null || true)
|
||||
fi
|
||||
fi
|
||||
# Fall back to a bare local ref only if remote resolution failed
|
||||
if [ -z "$BASE_REF" ]; then
|
||||
BASE_REF=$(git rev-parse --verify "$REVIEW_BASE_BRANCH" 2>/dev/null || true)
|
||||
fi
|
||||
fi
|
||||
fi
|
||||
|
||||
# Compute merge-base
|
||||
if [ -n "$BASE_REF" ]; then
|
||||
BASE=$(git merge-base HEAD "$BASE_REF" 2>/dev/null) || BASE=""
|
||||
if [ -z "$BASE" ] && [ "$(git rev-parse --is-shallow-repository 2>/dev/null || echo false)" = "true" ]; then
|
||||
if git remote get-url origin >/dev/null 2>&1; then
|
||||
git fetch --no-tags --unshallow origin 2>/dev/null || true
|
||||
BASE=$(git merge-base HEAD "$BASE_REF" 2>/dev/null) || BASE=""
|
||||
fi
|
||||
if [ -z "$BASE" ] && [ -n "$PR_BASE_REMOTE" ] && [ "$PR_BASE_REMOTE" != "origin" ]; then
|
||||
git fetch --no-tags --unshallow "$PR_BASE_REMOTE" 2>/dev/null || true
|
||||
BASE=$(git merge-base HEAD "$BASE_REF" 2>/dev/null) || BASE=""
|
||||
fi
|
||||
fi
|
||||
else
|
||||
BASE=""
|
||||
fi
|
||||
|
||||
if [ -n "$BASE" ]; then
|
||||
echo "BASE:$BASE"
|
||||
else
|
||||
echo "ERROR:Unable to resolve review base branch locally. Fetch the base branch and rerun, or provide a PR number so the review scope can be determined from PR metadata."
|
||||
fi
|
||||
@@ -0,0 +1,148 @@
|
||||
# Code Review Output Template
|
||||
|
||||
Use this **exact format** when presenting synthesized review findings. Findings are grouped by severity, not by reviewer.
|
||||
|
||||
**IMPORTANT:** Use pipe-delimited markdown tables (`| col | col |`). Do NOT use ASCII box-drawing characters.
|
||||
|
||||
## Example
|
||||
|
||||
```markdown
|
||||
## Code Review Results
|
||||
|
||||
**Scope:** merge-base with the review base branch -> working tree (14 files, 342 lines)
|
||||
**Intent:** Add order export endpoint with CSV and JSON format support
|
||||
**Mode:** autofix
|
||||
|
||||
**Reviewers:** correctness, testing, maintainability, security, api-contract
|
||||
- security -- new public endpoint accepts user-provided format parameter
|
||||
- api-contract -- new /api/orders/export route with response schema
|
||||
|
||||
### P0 -- Critical
|
||||
|
||||
| # | File | Issue | Reviewer | Confidence | Route |
|
||||
|---|------|-------|----------|------------|-------|
|
||||
| 1 | `orders_controller.rb:42` | User-supplied ID in account lookup without ownership check | security | 0.92 | `gated_auto -> downstream-resolver` |
|
||||
|
||||
### P1 -- High
|
||||
|
||||
| # | File | Issue | Reviewer | Confidence | Route |
|
||||
|---|------|-------|----------|------------|-------|
|
||||
| 2 | `export_service.rb:87` | Loads all orders into memory -- unbounded for large accounts | performance | 0.85 | `safe_auto -> review-fixer` |
|
||||
| 3 | `export_service.rb:91` | No pagination -- response size grows linearly with order count | api-contract, performance | 0.80 | `manual -> downstream-resolver` |
|
||||
|
||||
### P2 -- Moderate
|
||||
|
||||
| # | File | Issue | Reviewer | Confidence | Route |
|
||||
|---|------|-------|----------|------------|-------|
|
||||
| 4 | `export_service.rb:45` | Missing error handling for CSV serialization failure | correctness | 0.75 | `safe_auto -> review-fixer` |
|
||||
|
||||
### P3 -- Low
|
||||
|
||||
| # | File | Issue | Reviewer | Confidence | Route |
|
||||
|---|------|-------|----------|------------|-------|
|
||||
| 5 | `export_helper.rb:12` | Format detection could use early return instead of nested conditional | maintainability | 0.70 | `advisory -> human` |
|
||||
|
||||
### Applied Fixes
|
||||
|
||||
- `safe_auto`: Added bounded export pagination guard and CSV serialization failure test coverage in this run
|
||||
|
||||
### Residual Actionable Work
|
||||
|
||||
| # | File | Issue | Route | Next Step |
|
||||
|---|------|-------|-------|-----------|
|
||||
| 1 | `orders_controller.rb:42` | Ownership check missing on export lookup | `gated_auto -> downstream-resolver` | Create residual todo and require explicit approval before behavior change |
|
||||
| 2 | `export_service.rb:91` | Pagination contract needs a broader API decision | `manual -> downstream-resolver` | Create residual todo with contract and client impact details |
|
||||
|
||||
### Pre-existing Issues
|
||||
|
||||
| # | File | Issue | Reviewer |
|
||||
|---|------|-------|----------|
|
||||
| 1 | `orders_controller.rb:12` | Broad rescue masking failed permission check | correctness |
|
||||
|
||||
### Learnings & Past Solutions
|
||||
|
||||
- [Known Pattern] `docs/solutions/export-pagination.md` -- previous export pagination fix applies to this endpoint
|
||||
|
||||
### Agent-Native Gaps
|
||||
|
||||
- New export endpoint has no CLI/agent equivalent -- agent users cannot trigger exports
|
||||
|
||||
### Schema Drift Check
|
||||
|
||||
- Clean: schema.rb changes match the migrations in scope
|
||||
|
||||
### Deployment Notes
|
||||
|
||||
- Pre-deploy: capture baseline row counts before enabling the export backfill
|
||||
- Verify: `SELECT COUNT(*) FROM exports WHERE status IS NULL;` should stay at `0`
|
||||
- Rollback: keep the old export path available until the backfill has been validated
|
||||
|
||||
### Coverage
|
||||
|
||||
- Suppressed: 2 findings below 0.60 confidence
|
||||
- Residual risks: No rate limiting on export endpoint
|
||||
- Testing gaps: No test for concurrent export requests
|
||||
|
||||
---
|
||||
|
||||
> **Verdict:** Ready with fixes
|
||||
>
|
||||
> **Reasoning:** 1 critical auth bypass must be fixed. The memory/pagination issues (P1) should be addressed for production safety.
|
||||
>
|
||||
> **Fix order:** P0 auth bypass -> P1 memory/pagination -> P2 error handling if straightforward
|
||||
```
|
||||
|
||||
## Anti-patterns
|
||||
|
||||
Do NOT produce output like this. The following is wrong:
|
||||
|
||||
```markdown
|
||||
Findings
|
||||
|
||||
Sev: P1
|
||||
File: foo.go:42
|
||||
Issue: Some problem description
|
||||
Reviewer(s): adversarial
|
||||
Confidence: 0.85
|
||||
Route: advisory -> human
|
||||
────────────────────────────────────────
|
||||
Sev: P2
|
||||
File: bar.go:99
|
||||
Issue: Another problem
|
||||
```
|
||||
|
||||
This fails because: no pipe-delimited tables, no severity-grouped `###` headers, uses box-drawing horizontal rules, no numbered findings, no `## Code Review Results` title, and the verdict is not in a blockquote. Always use the table format from the example above.
|
||||
|
||||
## Formatting Rules
|
||||
|
||||
- **Pipe-delimited markdown tables** for findings -- never ASCII box-drawing characters or per-finding horizontal-rule separators between entries (the report-level `---` before the verdict is still required)
|
||||
- **Severity-grouped sections** -- `### P0 -- Critical`, `### P1 -- High`, `### P2 -- Moderate`, `### P3 -- Low`. Omit empty severity levels.
|
||||
- **Always include file:line location** for code review issues
|
||||
- **Reviewer column** shows which persona(s) flagged the issue. Multiple reviewers = cross-reviewer agreement.
|
||||
- **Confidence column** shows the finding's confidence score
|
||||
- **Route column** shows the synthesized handling decision as ``<autofix_class> -> <owner>``.
|
||||
- **Header includes** scope, intent, and reviewer team with per-conditional justifications
|
||||
- **Mode line** -- include `interactive`, `autofix`, `report-only`, or `headless`
|
||||
- **Applied Fixes section** -- include only when a fix phase ran in this review invocation
|
||||
- **Residual Actionable Work section** -- include only when unresolved actionable findings were handed off for later work
|
||||
- **Pre-existing section** -- separate table, no confidence column (these are informational)
|
||||
- **Learnings & Past Solutions section** -- results from ce-learnings-researcher, with links to docs/solutions/ files
|
||||
- **Agent-Native Gaps section** -- results from ce-agent-native-reviewer. Omit if no gaps found.
|
||||
- **Schema Drift Check section** -- results from ce-schema-drift-detector. Omit if the agent did not run.
|
||||
- **Deployment Notes section** -- key checklist items from ce-deployment-verification-agent. Omit if the agent did not run.
|
||||
- **Coverage section** -- suppressed count, residual risks, testing gaps, failed reviewers
|
||||
- **Summary uses blockquotes** for verdict, reasoning, and fix order
|
||||
- **Horizontal rule** (`---`) separates findings from verdict
|
||||
- **`###` headers** for each section -- never plain text headers
|
||||
|
||||
## Headless Mode Format
|
||||
|
||||
In `mode:headless`, replace the interactive pipe-delimited table report with a structured text envelope. The headless format is defined in the `### Headless output format` section of SKILL.md. Key differences from the interactive format:
|
||||
|
||||
- **No pipe-delimited tables.** Findings use `[severity][autofix_class -> owner] File: <file:line> -- <title>` line format with indented Why/Evidence/Suggested fix lines.
|
||||
- **Findings grouped by autofix_class** (gated-auto, manual, advisory) instead of severity. Within each group, findings are sorted by severity.
|
||||
- **Verdict in header** (top of output) instead of bottom, so programmatic callers get it first.
|
||||
- **`Artifact:` line** in metadata header gives callers the path to the full run artifact.
|
||||
- **`[needs-verification]` marker** on findings where `requires_verification: true`.
|
||||
- **Evidence lines** included per finding.
|
||||
- **Completion signal:** "Review complete" as the final line.
|
||||
@@ -0,0 +1,130 @@
|
||||
# Sub-agent Prompt Template
|
||||
|
||||
This template is used by the orchestrator to spawn each reviewer sub-agent. Variable substitution slots are filled at spawn time.
|
||||
|
||||
---
|
||||
|
||||
## Template
|
||||
|
||||
```
|
||||
You are a specialist code reviewer.
|
||||
|
||||
<persona>
|
||||
{persona_file}
|
||||
</persona>
|
||||
|
||||
<scope-rules>
|
||||
{diff_scope_rules}
|
||||
</scope-rules>
|
||||
|
||||
<output-contract>
|
||||
You produce up to two outputs depending on whether a run ID was provided:
|
||||
|
||||
1. **Artifact file (when run ID is present).** If a Run ID appears in <review-context> below, WRITE your full analysis (all schema fields, including why_it_matters, evidence, and suggested_fix) as JSON to:
|
||||
.context/compound-engineering/ce-code-review/{run_id}/{reviewer_name}.json
|
||||
This is the ONE write operation you are permitted to make. Use the platform's file-write tool.
|
||||
If the write fails, continue -- the compact return still provides everything the merge needs.
|
||||
If no Run ID is provided (the field is empty or absent), skip this step entirely -- do not attempt any file write.
|
||||
|
||||
2. **Compact return (always).** RETURN compact JSON to the parent with ONLY merge-tier fields per finding:
|
||||
title, severity, file, line, confidence, autofix_class, owner, requires_verification, pre_existing, suggested_fix.
|
||||
Do NOT include why_it_matters or evidence in the returned JSON.
|
||||
Include reviewer, residual_risks, and testing_gaps at the top level.
|
||||
|
||||
The full file preserves detail for downstream consumers (headless output, debugging).
|
||||
The compact return keeps the orchestrator's context lean for merge and synthesis.
|
||||
|
||||
The schema below describes the **full artifact file format** (all fields required). For the compact return, follow the field list above -- omit why_it_matters and evidence even though the schema marks them as required.
|
||||
|
||||
{schema}
|
||||
|
||||
Confidence rubric (0.0-1.0 scale):
|
||||
- 0.00-0.29: Not confident / likely false positive. Do not report.
|
||||
- 0.30-0.49: Somewhat confident. Do not report -- too speculative for actionable review.
|
||||
- 0.50-0.59: Moderately confident. Real but uncertain. Do not report unless P0 severity.
|
||||
- 0.60-0.69: Confident enough to flag. Include only when the issue is clearly actionable.
|
||||
- 0.70-0.84: Highly confident. Real and important. Report with full evidence.
|
||||
- 0.85-1.00: Certain. Verifiable from the code alone. Report.
|
||||
|
||||
Suppress threshold: 0.60. Do not emit findings below 0.60 confidence (except P0 at 0.50+).
|
||||
|
||||
Writing `why_it_matters` (required field, every finding):
|
||||
|
||||
The `why_it_matters` field is how the reader — a developer triaging findings, a ticket-body reader months later, or a downstream automated surface — understands the problem without re-reading the file. Treat it as the most important prose field in your output; every downstream surface (walk-through questions, bulk-action previews, ticket bodies, headless output) depends on it being good.
|
||||
|
||||
- **Lead with observable behavior.** Describe what the bug does from the outside — what a user, attacker, operator, or downstream caller experiences. Do not lead with code structure ("The function X does Y..."). Start with the effect ("Any signed-in user can read another user's orders..."). Function and variable names appear later, only when the reader needs them to locate the issue.
|
||||
- **Explain why the fix resolves the problem.** If you include a `suggested_fix`, the `why_it_matters` should make clear why that specific fix addresses the root cause. When a similar pattern exists elsewhere in the codebase (an existing guard, an established convention, a parallel handler), reference it so the recommendation is grounded in the project's own conventions rather than theoretical best practice.
|
||||
- **Keep it tight.** Approximately 2-4 sentences plus the minimum code quoted inline to ground the point. Longer framings are a regression — downstream surfaces have narrow display budgets, and verbose `why_it_matters` content gets truncated or skimmed.
|
||||
- **Always produce substantive content.** `why_it_matters` is required by the schema. Empty strings, nulls, and single-phrase entries are validation failures. If you found something worth flagging (confidence >= 0.60), you can explain it — the field exists because every finding needs a reason.
|
||||
|
||||
Illustrative pair — same finding, weak vs. strong framing:
|
||||
|
||||
```
|
||||
WEAK (code-citation first; fails the observable-behavior rule):
|
||||
orders_controller.rb:42 has a missing authorization check.
|
||||
Add current_user.owns?(account) guard before the query.
|
||||
|
||||
STRONG (observable behavior first, grounded fix reasoning):
|
||||
Any signed-in user can read another user's orders by pasting the
|
||||
target account ID into the URL. The controller looks up the account
|
||||
and returns its orders without verifying the current user owns it.
|
||||
Adding a one-line ownership guard before the lookup matches the
|
||||
pattern already used in the shipments controller for the same attack.
|
||||
```
|
||||
|
||||
False-positive categories to actively suppress:
|
||||
- Pre-existing issues unrelated to this diff (mark pre_existing: true for unchanged code the diff does not interact with; if the diff makes it newly relevant, it is secondary, not pre-existing)
|
||||
- Pedantic style nitpicks that a linter/formatter would catch
|
||||
- Code that looks wrong but is intentional (check comments, commit messages, PR description for intent)
|
||||
- Issues already handled elsewhere in the codebase (check callers, guards, middleware)
|
||||
- Suggestions that restate what the code already does in different words
|
||||
- Generic "consider adding" advice without a concrete failure mode
|
||||
|
||||
Rules:
|
||||
- You are a leaf reviewer inside an already-running compound-engineering review workflow. Do not invoke compound-engineering skills or agents unless this template explicitly instructs you to. Perform your analysis directly and return findings in the required output format only.
|
||||
- Every finding in the full artifact file MUST include at least one evidence item grounded in the actual code. The compact return omits evidence -- the evidence requirement applies to the disk artifact only.
|
||||
- Set pre_existing to true ONLY for issues in unchanged code that are unrelated to this diff. If the diff makes the issue newly relevant, it is NOT pre-existing.
|
||||
- You are operationally read-only. The one permitted exception is writing your full analysis to the `.context/` artifact path when a run ID is provided. You may also use non-mutating inspection commands, including read-oriented `git` / `gh` commands, to gather evidence. Do not edit project files, change branches, commit, push, create PRs, or otherwise mutate the checkout or repository state.
|
||||
- Set `autofix_class` accurately -- not every finding is `advisory`. Use this decision guide:
|
||||
- `safe_auto`: The fix is local and deterministic — the fixer can apply it mechanically without design judgment. Examples: extracting a duplicated helper, adding a missing nil/null check, fixing an off-by-one, adding a missing test for an untested code path, removing dead code.
|
||||
- `gated_auto`: A concrete fix exists but it changes contracts, permissions, or crosses a module boundary in a way that deserves explicit approval. Examples: adding authentication to an unprotected endpoint, changing a public API response shape, switching from soft-delete to hard-delete.
|
||||
- `manual`: Actionable work that requires design decisions or cross-cutting changes. Examples: redesigning a data model, choosing between two valid architectural approaches, adding pagination to an unbounded query.
|
||||
- `advisory`: Report-only items that should not become code-fix work. Examples: noting a design asymmetry the PR improves but doesn't fully resolve, flagging a residual risk, deployment notes.
|
||||
Do not default to `advisory` when uncertain -- if a concrete fix is obvious, classify it as `safe_auto` or `gated_auto`.
|
||||
- Set `owner` to the default next actor for this finding: `review-fixer`, `downstream-resolver`, `human`, or `release`.
|
||||
- Set `requires_verification` to true whenever the likely fix needs targeted tests, a focused re-review, or operational validation before it should be trusted.
|
||||
- suggested_fix is optional. Only include it when the fix is obvious and correct. A bad suggestion is worse than none.
|
||||
- If you find no issues, return an empty findings array. Still populate residual_risks and testing_gaps if applicable.
|
||||
- **Intent verification:** Compare the code changes against the stated intent (and PR title/body when available). If the code does something the intent does not describe, or fails to do something the intent promises, flag it as a finding. Mismatches between stated intent and actual code are high-value findings.
|
||||
</output-contract>
|
||||
|
||||
<pr-context>
|
||||
{pr_metadata}
|
||||
</pr-context>
|
||||
|
||||
<review-context>
|
||||
Run ID: {run_id}
|
||||
Reviewer name: {reviewer_name}
|
||||
|
||||
Intent: {intent_summary}
|
||||
|
||||
Changed files: {file_list}
|
||||
|
||||
Diff:
|
||||
{diff}
|
||||
</review-context>
|
||||
```
|
||||
|
||||
## Variable Reference
|
||||
|
||||
| Variable | Source | Description |
|
||||
|----------|--------|-------------|
|
||||
| `{persona_file}` | Agent markdown file content | The full persona definition (identity, failure modes, calibration, suppress conditions) |
|
||||
| `{diff_scope_rules}` | `references/diff-scope.md` content | Primary/secondary/pre-existing tier rules |
|
||||
| `{schema}` | `references/findings-schema.json` content | The JSON schema reviewers must conform to |
|
||||
| `{intent_summary}` | Stage 2 output | 2-3 line description of what the change is trying to accomplish |
|
||||
| `{pr_metadata}` | Stage 1 output | PR title, body, and URL when reviewing a PR. Empty string when reviewing a branch or standalone checkout |
|
||||
| `{file_list}` | Stage 1 output | List of changed files from the scope step |
|
||||
| `{diff}` | Stage 1 output | The actual diff content to review |
|
||||
| `{run_id}` | Stage 4 output | Unique review run identifier for the artifact directory |
|
||||
| `{reviewer_name}` | Stage 3 output | Persona or agent name used as the artifact filename stem |
|
||||
@@ -0,0 +1,138 @@
|
||||
# Tracker Detection and Defer Execution
|
||||
|
||||
This reference covers how Interactive mode's Defer actions file tickets in the project's tracker. It is loaded by `SKILL.md` when the routing question needs to decide whether to offer option C (File tickets), when the walk-through's Defer option executes, and when the bulk-preview of option C (File tickets per finding) is shown.
|
||||
|
||||
Interactive mode only. Autofix, Report-only, and Headless modes do not use this reference.
|
||||
|
||||
---
|
||||
|
||||
## Detection
|
||||
|
||||
The agent determines the project's tracker from whatever documentation is obvious. Primary sources: `CLAUDE.md` and `AGENTS.md` at the repo root and in relevant subdirectories. Supplementary signals (when primary documentation is ambiguous): `CONTRIBUTING.md`, `README.md`, PR templates under `.github/`, visible tracker URLs in the repo.
|
||||
|
||||
A tracker can be surfaced via MCP tool (e.g., a Linear MCP server), CLI (e.g., `gh`), or direct API. All are acceptable. The detection output is a tuple with two availability flags — one for the named tracker specifically (drives label confidence) and one for the full fallback chain (drives whether Defer is offered at all):
|
||||
|
||||
```
|
||||
{ tracker_name, confidence, named_sink_available, any_sink_available }
|
||||
```
|
||||
|
||||
Where:
|
||||
- `tracker_name` — human-readable name ("Linear", "GitHub Issues", "Jira"), or `null` when detection cannot identify a specific tracker
|
||||
- `confidence` — `high` when the tracker is named explicitly in documentation (or via a linked URL to a specific project/workspace) and is unambiguously the project's canonical tracker; `low` when the signal is thin, conflicting, or implied only
|
||||
- `named_sink_available` — `true` only when the agent can actually invoke the detected tracker (MCP tool is loaded, CLI is authenticated, or API credentials are in environment); `false` when the tracker is documented but no tool reaches it, or when no tracker is found at all. Drives label confidence: inline tracker naming requires this to be `true`.
|
||||
- `any_sink_available` — `true` when any tier in the fallback chain (named tracker, GitHub Issues via `gh`, or harness task-tracking primitive) can be invoked this session. Drives whether Defer is offered: no-sink behavior fires only when this is `false`.
|
||||
|
||||
Detection is reasoning-based. Do not maintain an enumerated checklist of files to read. Read the obvious sources and form a confident conclusion; when the obvious sources don't resolve, the label falls back to generic wording and the agent confirms with the user before executing.
|
||||
|
||||
---
|
||||
|
||||
## Probe timing and caching
|
||||
|
||||
Availability probes run **at most once per session** and **only when the routing question is about to be asked**. Never speculatively at review start, never per-Defer, never per-walk-through-finding. The cached tuple is reused for every Defer action in the same run.
|
||||
|
||||
Typical probe sequence:
|
||||
|
||||
1. Read `CLAUDE.md` / `AGENTS.md` for tracker references. If nothing found, set `tracker_name = null`, `confidence = low`.
|
||||
2. **Probe the named tracker when one was found.** For GitHub Issues, run `gh auth status` and `gh repo view --json hasIssuesEnabled`. For Linear or other MCP-backed trackers, verify the relevant MCP tool is loaded and responsive. For API-backed trackers, verify credentials in environment. Set `named_sink_available` from the probe result.
|
||||
3. **Probe the fallback tiers to compute `any_sink_available`.** Even when the named tracker was found and probed, the fallback tiers matter for the "no-sink" decision so that a run with no documented tracker but working `gh` still offers Defer. Stop at the first working tier:
|
||||
- If `named_sink_available = true`: `any_sink_available = true` (no further probes needed).
|
||||
- Otherwise, probe GitHub Issues via `gh auth status` + `gh repo view --json hasIssuesEnabled` (skip if already probed in step 2). If it works, `any_sink_available = true`.
|
||||
- Otherwise, check the harness task-tracking primitive. `TaskCreate` / `update_plan` are typically always present when the skill runs inside their harness — treat as available unless the session is in a context that explicitly forbids it (e.g., converted targets without task binding).
|
||||
- If every tier fails, `any_sink_available = false`.
|
||||
|
||||
When the routing question is skipped entirely (R2 zero-findings case), no probes run. When the cached tuple is reused across a session, any `named_sink_available = true` from the session's first probe stays cached — do not re-probe per Defer.
|
||||
|
||||
---
|
||||
|
||||
## Label logic
|
||||
|
||||
- When `confidence = high` AND `named_sink_available = true`: the routing question's option C and the walk-through's per-finding Defer option both include the tracker name verbatim. Example: `File a Linear ticket per finding`, `Defer — file a Linear ticket`.
|
||||
- When `any_sink_available = true` but either `confidence = low` or `named_sink_available = false` (a fallback tier is working instead): the labels read generically — `File an issue per finding`, `Defer — file a ticket`. Before executing the first Defer of the session, the agent confirms the effective tracker choice with the user using the platform's blocking question tool.
|
||||
- When `any_sink_available = false`: option C is omitted from the routing question, option B (Defer) is omitted from the walk-through per-finding options, and the agent tells the user why in the routing question's stem.
|
||||
|
||||
---
|
||||
|
||||
## Fallback chain
|
||||
|
||||
When the named tracker is unavailable or no tracker is named, fall back in this order. Prefer durable external trackers over in-session-only primitives.
|
||||
|
||||
1. **Named tracker** (MCP tool, CLI, or API the agent can invoke directly)
|
||||
2. **GitHub Issues via `gh`** — when `gh auth status` succeeds and the current repo has issues enabled (`gh repo view --json hasIssuesEnabled` returns `true`)
|
||||
3. **Harness task-tracking primitive** — `TaskCreate` in Claude Code, `update_plan` in Codex, or the equivalent on other target platforms — used as a last resort and only after a once-per-session durability confirmation (below)
|
||||
|
||||
Never fall back to `.context/compound-engineering/todos/`. The internal-todos system is on a deprecation path (see plan scope boundaries) and must not be extended by this Defer path.
|
||||
|
||||
---
|
||||
|
||||
## Once-per-session harness-fallback confirmation
|
||||
|
||||
When the fallback to harness task-tracking primitive is in effect, and before the first Defer action of the session executes, the agent asks the user once using the platform's blocking question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). In Claude Code, `AskUserQuestion` is a deferred tool — before the first call this session, load its schema via `ToolSearch` with query `select:AskUserQuestion`.
|
||||
|
||||
> No documented tracker was found and `gh` is not available. Defer actions will create in-session tasks that do not survive past this session. Proceed for this and subsequent Defer actions?
|
||||
|
||||
Options:
|
||||
- `Proceed with in-session tasks` — the agent continues with harness task creation for every Defer in this run
|
||||
- `Cancel — leave findings as residual in the report` — the agent converts all pending Defers to Skip with a note, and surfaces the findings in the completion report's residual-work section
|
||||
|
||||
The confirmation is cached for the session. Subsequent Defer actions do not re-prompt.
|
||||
|
||||
Only when `ToolSearch` explicitly returns no match or the tool call errors — or on a platform with no blocking question tool — fall back to numbered options and waiting for the user's reply.
|
||||
|
||||
---
|
||||
|
||||
## Ticket composition
|
||||
|
||||
Every Defer action creates a ticket with the following content, adapted to the tracker's capabilities:
|
||||
|
||||
- **Title:** the merged finding's `title` (schema-capped at 10 words).
|
||||
- **Body:**
|
||||
- Plain-English problem statement — reads the persona-produced `why_it_matters` from the contributing reviewer's artifact file at `.context/compound-engineering/ce-code-review/<run-id>/{reviewer}.json`, using the same `file + line_bucket(line, +/-3) + normalize(title)` matching headless mode uses (see SKILL.md Stage 6 detail enrichment). Falls back to the merged finding's `title`, `severity`, `file`, and `suggested_fix` (when present) when no artifact match is available — these fields are guaranteed in the merge-tier compact return.
|
||||
- Suggested fix (when present in the finding's `suggested_fix`).
|
||||
- Evidence (direct quotes from the reviewer's artifact).
|
||||
- Metadata block: `Severity: <level>`, `Confidence: <score>`, `Reviewer(s): <list>`, `Finding ID: <fingerprint>`.
|
||||
- **Labels** (when the tracker supports labels): severity tag (`P0`, `P1`, `P2`, `P3`) and, when the tracker convention supports it, a category label sourced from the reviewer name.
|
||||
- **Length cap:** when the composed body would exceed a tracker's body length limit, truncate with `... (continued in ce-code-review run artifact: .context/compound-engineering/ce-code-review/<run-id>/)` and include the finding_id in both the truncated body and the metadata block so the artifact is discoverable.
|
||||
|
||||
The finding_id is a stable fingerprint composed as `normalize(file) + line_bucket(line, +/-3) + normalize(title)` — the same fingerprint used by the merge pipeline.
|
||||
|
||||
---
|
||||
|
||||
## Failure path
|
||||
|
||||
When ticket creation fails at execution (API error, auth expiry mid-session, rate limit, malformed body rejected, 4xx/5xx response), the agent surfaces the failure inline and asks the user using the platform's blocking question tool:
|
||||
|
||||
Stem:
|
||||
> Defer failed: <tracker name> returned <error summary>. How should the agent handle this finding?
|
||||
|
||||
Options:
|
||||
- `Retry on <tracker>` — re-attempt the same tracker once more (useful for transient errors)
|
||||
- `Fall back to next sink` — move this finding's Defer to the next tier in the fallback chain (e.g., from Linear to GitHub Issues, or from GitHub Issues to harness task primitive)
|
||||
- `Convert to Skip — record the failure` — abandon this Defer, note the failure in the completion report's failure section, and continue the walk-through or bulk flow
|
||||
|
||||
When a high-confidence named tracker fails at execution, the cached `named_sink_available` is set to `false` for the rest of the session. Subsequent Defer actions fall straight through to the next tier without retrying a confirmed-broken sink. `any_sink_available` is only downgraded to `false` when every tier has been confirmed broken — a failed Linear call that succeeds via `gh` keeps `any_sink_available = true`.
|
||||
|
||||
Only when `ToolSearch` explicitly returns no match or the tool call errors — or on a platform with no blocking question tool — fall back to numbered options and waiting for the user's reply.
|
||||
|
||||
---
|
||||
|
||||
## Per-tracker behavior
|
||||
|
||||
Concrete behavior per tracker at execution time. The agent may invoke any of these through the appropriate interface (MCP, CLI, or API) — the choice depends on what is available in the current environment.
|
||||
|
||||
| Tracker | Interface | Invocation sketch | Body format | Labels |
|
||||
|---------|-----------|-------------------|-------------|--------|
|
||||
| Linear | MCP (preferred) or API | Create issue in the project/workspace identified by documentation; assign to the reporter if the MCP tool exposes user context | Markdown | Severity priority field if the MCP exposes it; otherwise include severity in body |
|
||||
| GitHub Issues | `gh issue create` | Repo defaults to the current repo. Use `--label` for severity tag when labels exist; omit `--label` if the repo has no label fixture. Fall back to a label-less issue on first failure. | Markdown | `--label P0` / `--label P1` / etc. when labels exist |
|
||||
| Jira | MCP or API | Create issue in the project identified by documentation; Jira's markdown dialect differs from GitHub's — use plain text in the body when MCP does not handle conversion | Plain text when MCP does not handle markdown | Severity priority field |
|
||||
| Harness task primitive (last resort) | `TaskCreate` / `update_plan` / platform equivalent | Create one task per finding with subject = title and description = compact version of the body. No labels. Warn the user that tasks will not survive past the session (see once-per-session confirmation above). | Plain text, compact | None |
|
||||
| No sink available | — | Defer option is omitted; findings remain in the report's residual-work section | — | — |
|
||||
|
||||
When uncertain, prefer "drop with explicit user-facing notice" over "pass through silently and hope." A Defer that produces no durable artifact and no user message is data loss.
|
||||
|
||||
---
|
||||
|
||||
## Cross-platform notes
|
||||
|
||||
The question-tool name varies by platform. Use the platform's blocking question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). In Claude Code the tool should already be loaded from the Interactive-mode pre-load step — if it isn't, call `ToolSearch` with query `select:AskUserQuestion` now. Only when that load explicitly fails, or on a platform with no blocking tool, fall back to numbered options and waiting for the user's next reply before proceeding.
|
||||
|
||||
The fallback chain's final tier (harness task-tracking primitive) does not exist on every target platform. When converted for a platform that has no equivalent of `TaskCreate` / `update_plan`, the agent should treat that platform as "no harness sink" and move directly to the no-sink behavior (omit Defer from menus and tell the user why).
|
||||
@@ -0,0 +1,246 @@
|
||||
# Per-finding Walk-through
|
||||
|
||||
This reference defines Interactive mode's per-finding walk-through — the path the user enters by picking option A (`Review each finding one by one — accept the recommendation or choose another action`) from the routing question. It also covers the unified completion report that every terminal path (walk-through, LFG, File tickets, zero findings) emits.
|
||||
|
||||
Interactive mode only.
|
||||
|
||||
---
|
||||
|
||||
## Entry
|
||||
|
||||
The walk-through receives, from the orchestrator:
|
||||
|
||||
- The merged findings list in severity order (P0 → P1 → P2 → P3), filtered to `gated_auto` and `manual` findings that survived the Stage 5 confidence gate. Advisory findings are included when they were surfaced to this phase (advisory findings normally live in the report-only queue, but when the review flow routes them here for acknowledgment they take the advisory variant below).
|
||||
- The cached tracker-detection tuple from `tracker-defer.md` (`{ tracker_name, confidence, named_sink_available, any_sink_available }`). `any_sink_available` determines whether the Defer option is offered; `named_sink_available` + `confidence` determine whether the label names the tracker inline.
|
||||
- The run id for artifact lookups.
|
||||
|
||||
Each finding's recommended action has already been normalized by Stage 5 (step 7b — tie-break on action). The walk-through surfaces that recommendation to the user but does not recompute it.
|
||||
|
||||
---
|
||||
|
||||
## Per-finding presentation
|
||||
|
||||
Each finding is presented in two parts: a **terminal output block** carrying the explanation, and a **question** via the platform's blocking question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini) carrying the decision. Never merge the two — the terminal block uses markdown; the question uses plain text.
|
||||
|
||||
In Claude Code the tool should already be loaded from the Interactive-mode pre-load step in `SKILL.md` — if it isn't, call `ToolSearch` with query `select:AskUserQuestion` now. Rendering the per-finding question as narrative text is a bug, not a valid fallback.
|
||||
|
||||
### Terminal output block (print before firing the question)
|
||||
|
||||
Render as markdown. Labels on their own line, blank lines between sections:
|
||||
|
||||
```
|
||||
## Finding {N} of {M} — {severity} {plain-English title}
|
||||
|
||||
{file}:{line}
|
||||
|
||||
**What's wrong**
|
||||
|
||||
{plain-English problem statement from why_it_matters}
|
||||
|
||||
**Proposed fix**
|
||||
|
||||
{suggested_fix — rendered per the substitution rules below: prose-first, intent-language}
|
||||
|
||||
**Why it works**
|
||||
|
||||
{short reasoning, grounded in a codebase pattern when available}
|
||||
|
||||
{R15 conflict context line, when applicable}
|
||||
```
|
||||
|
||||
Substitutions:
|
||||
|
||||
- **`{plain-English title}`:** a 3-8 word summary suitable as a heading. Derived from the merged finding's `title` field but rephrased so it reads as observable behavior (e.g., "Path traversal in loadUserFromCache" rather than "Missing userId validation on line 36").
|
||||
- **`why_it_matters`:** read the contributing reviewer's artifact file at `.context/compound-engineering/ce-code-review/{run_id}/{reviewer_name}.json` using the same `file + line_bucket(line, +/-3) + normalize(title)` matching that headless mode uses (see `SKILL.md` Stage 6 detail enrichment). When multiple reviewers flagged the merged finding, try them in the order they appear in the merged finding's reviewer list. Use the first match.
|
||||
- **`suggested_fix`:** from the merged finding's `suggested_fix` field. Render as prose describing **intent**, not as syntax. The fixer subagent owns the exact code — the walk-through just needs enough for the user to trust or reject the action. Rules:
|
||||
- **Default — one sentence describing the effect.** What does the fix achieve, and where does it live? Prefer intent language over quoted code.
|
||||
- ✅ `Throw on non-2xx response before parsing JSON.`
|
||||
- ✅ `` Replace `==` with `===` on line 42. ``
|
||||
- ✅ `` Add a `response.ok` check after the fetch and throw on non-2xx. ``
|
||||
- ✅ `Extract the request-building logic into a helper and call it from both sites.`
|
||||
- ❌ `` Add `if (!response.ok) throw new Error(`HTTP ${response.status}`);` after the `await fetch(...)` call, before `response.json()`. `` — nested backticks, multiple code spans, full statement quoted; renders broken in terminal.
|
||||
- **Code-span budget: at most 2 inline backtick spans per sentence, each a single identifier, operator, or short phrase** (e.g., `` `response.ok` ``, `` `===` ``, `` `fetchUserById` ``). Never embed full statements, template literals, or code requiring nested backticks. If the intent can't be stated within that budget, the prose is too close to syntax — restate at a higher level, or switch to summary + artifact pointer.
|
||||
- **Always leave a space before and after every backtick span.** Without it, the terminal's markdown renderer eats the delimiters and runs the words together.
|
||||
- **Raw code block — only for short (≤5 line) genuinely additive new code** where no before-state exists (new file, new function, new guard at the top of an empty body). Above 5 lines, switch to summary + pointer.
|
||||
- **Summary + artifact pointer** — when prose can't capture the fix: one-sentence transformation + key symbol/location + `Full fix: .context/compound-engineering/ce-code-review/{run_id}/{reviewer_name}.json → findings[].suggested_fix`.
|
||||
- **No diff blocks.** Modifications to existing code render as prose.
|
||||
- **`Why it works`:** grounded reasoning that, where possible, references a similar pattern already used elsewhere in the codebase (e.g., "matches the format-validation pattern already used at src/cli/io.ts:41"). One to three sentences.
|
||||
- **R15 conflict context line (when applicable):** when contributing reviewers implied different actions for this finding and Stage 5 step 7b broke the tie, surface that briefly. Example: `Correctness recommends Apply; Testing recommends Skip (low confidence). Agent's recommendation: Skip.` The orchestrator's recommendation — the post-tie-break value — is what the menu labels "recommended."
|
||||
|
||||
When no artifact match exists for the finding (merge-synthesized finding, or the persona's artifact write failed), the terminal block degrades to the heading + `suggested_fix` only (omit the `What's wrong` and `Why it works` sections) and records the gap for the Coverage section of the completion report.
|
||||
|
||||
### Question stem (short, decision-focused)
|
||||
|
||||
After the terminal block renders, fire the platform's blocking question tool with a compact two-line stem:
|
||||
|
||||
```
|
||||
Finding {N} of {M} — {severity} {short handle}.
|
||||
{Action framing in a phrase}?
|
||||
```
|
||||
|
||||
Where:
|
||||
|
||||
- **Short handle:** matches the `{plain-English title}` from the terminal block heading.
|
||||
- **Action framing:** one phrase describing what the *single recommended action* does, as a yes/no question. Examples: `Apply the format-validation + path.resolve guard?`, `Skip the fix since the fixture is being deleted?`, `Defer and file a rotation ticket?`.
|
||||
|
||||
Never enumerate alternatives in the stem. One recommendation as a yes/no — the option list carries the alternatives. When the recommendation is close, surface the disagreement in the R15 conflict context line, not as a multi-option stem.
|
||||
|
||||
Example (recommendation = Apply):
|
||||
|
||||
```
|
||||
Finding 3 of 8 — P1 path traversal in loadUserFromCache.
|
||||
Apply the format-validation + path.resolve guard?
|
||||
```
|
||||
|
||||
Example (recommendation = Skip because content context overrides default):
|
||||
|
||||
```
|
||||
Finding 1 of 9 — P0 hardcoded admin token.
|
||||
Skip the fix since the fixture is being deleted?
|
||||
(Security recommends Apply; file context recommends Skip. Agent's recommendation: Skip.)
|
||||
```
|
||||
|
||||
Never embed code blocks, diff syntax, or the full fix/reasoning in the stem.
|
||||
|
||||
### Confirmation between findings
|
||||
|
||||
After the user answers and before printing the next finding's terminal block, emit a one-line confirmation of the action taken. Examples: `→ Applied. Fix staged at src/utils/api-client.ts:36-37.`, `→ Deferred. Ticket filed: <url>.`, `→ Skipped.`, `→ Acknowledged.`
|
||||
|
||||
### Options (four, or adapted as noted)
|
||||
|
||||
Fixed order. Never reorder:
|
||||
|
||||
```
|
||||
1. Apply the proposed fix
|
||||
2. Defer — file a [TRACKER] ticket
|
||||
3. Skip — don't apply, don't track
|
||||
4. LFG the rest — apply the agent's best judgment to this and remaining findings
|
||||
```
|
||||
|
||||
Render the `[TRACKER]` label per `tracker-defer.md`: when `confidence = high` AND `named_sink_available = true`, replace `[TRACKER]` with the concrete tracker name (e.g., `Defer — file a Linear ticket`). When `any_sink_available = true` but either `confidence = low` or `named_sink_available = false`, use the generic whole label `Defer — file a ticket` — whole-label substitution, not a `[TRACKER]` token swap.
|
||||
|
||||
**Mark the post-tie-break recommendation with `(recommended)` on its option label.** Required, not optional. Any of the four options can carry it:
|
||||
|
||||
```
|
||||
1. Apply the proposed fix (recommended)
|
||||
2. Defer — file a ticket
|
||||
3. Skip — don't apply, don't track
|
||||
4. LFG the rest
|
||||
```
|
||||
|
||||
```
|
||||
1. Apply the proposed fix
|
||||
2. Defer — file a ticket
|
||||
3. Skip — don't apply, don't track (recommended)
|
||||
4. LFG the rest
|
||||
```
|
||||
|
||||
When reviewers disagreed or content context cuts against the default, still mark one option — whichever Stage 5 step 7b produced — and surface the disagreement in the R15 conflict context line.
|
||||
|
||||
### Adaptations
|
||||
|
||||
- **Advisory-only finding:** when the finding's `autofix_class` is `advisory` (no actionable fix), option A is replaced with `Acknowledge — mark as reviewed`. The other three options remain. The advisory variant is the only case where `Acknowledge` appears in the menu.
|
||||
- **N=1 (exactly one pending finding):** the terminal block's heading omits `Finding N of M` and renders as `## {severity} {plain-English title}`. The stem's first line drops the position counter, becoming `{severity} {short handle}.` Option D (`LFG the rest`) is suppressed because no subsequent findings exist — the menu shows three options: Apply / Defer / Skip (or Acknowledge, for advisory).
|
||||
- **No-sink (Defer option unavailable):** when the tracker-detection tuple reports `any_sink_available: false` (every tier in the fallback chain — named tracker, GitHub Issues, harness primitive — is unreachable), option B (`Defer`) is omitted. The stem appends one line explaining why (e.g., `Defer unavailable on this platform — no tracker or task-tracking primitive detected.`). The menu shows three options: Apply / Skip / LFG the rest (and Acknowledge in place of Apply for advisory-only findings). **Before rendering the options, remap any per-finding `Defer` recommendation produced by Stage 5 step 7b to `Skip`** so the `(recommended)` marker always lands on an option that is actually in the menu. When the remap fires, surface it on the R15 conflict context line (e.g., `Stage 5 recommended Defer; downgraded to Skip — no tracker sink available.`). This is a render-time runtime step mirroring the Defer→Skip downgrade that `bulk-preview.md` performs for LFG previews; Stage 5 step 7b has no knowledge of sink availability and only orders conflicting reviewer recommendations.
|
||||
- **Combined N=1 + no-sink:** the menu shows two options: Apply / Skip (or Acknowledge / Skip).
|
||||
|
||||
Only when `ToolSearch` explicitly returns no match or the tool call errors — or on a platform with no blocking question tool — fall back to presenting the options as a numbered list and waiting for the user's next reply.
|
||||
|
||||
---
|
||||
|
||||
## Per-finding routing
|
||||
|
||||
For each finding's answer:
|
||||
|
||||
- **Apply the proposed fix** — add the finding's id to an in-memory Apply set. Advance to the next finding. Do not dispatch the fixer inline — Apply accumulates for end-of-walk-through batch dispatch.
|
||||
- **Acknowledge — mark as reviewed** (advisory variant) — record Acknowledge in the in-memory decision list. Advance to the next finding. No side effects.
|
||||
- **Defer — file a [TRACKER] ticket** — invoke the tracker-defer flow from `tracker-defer.md`. The walk-through's position indicator stays on the current finding during any failure-path sub-question (Retry / Fall back / Convert to Skip). On success, record the tracker URL / reference in the in-memory decision list and advance. On conversion-to-Skip from the failure path, advance with the failure noted in the completion report.
|
||||
- **Skip — don't apply, don't track** — record Skip in the in-memory decision list. Advance. No side effects.
|
||||
- **LFG the rest — apply the agent's best judgment to this and remaining findings** — exit the walk-through loop. Dispatch the bulk preview from `bulk-preview.md`, scoped to the current finding and everything not yet decided. The preview header reports the count of already-decided findings ("K already decided"). If the user picks `Cancel` from the preview, return to the current finding's per-finding question (not to the routing question). If the user picks `Proceed`, execute the plan per `bulk-preview.md` — Apply findings join the in-memory Apply set with the ones the user already picked, Defer findings route through `tracker-defer.md`, Skip / Acknowledge no-op — then proceed to end-of-walk-through dispatch.
|
||||
|
||||
---
|
||||
|
||||
## Override rule
|
||||
|
||||
"Override" means the user picks a different preset action (Defer or Skip in place of Apply, or Apply in place of the agent's recommendation). No inline freeform custom-fix authoring — the walk-through is a decision loop, not a pair-programming surface. A user who wants a variant of the proposed fix picks Skip and hand-edits outside the flow; if they also want the finding tracked, they file a ticket manually. This trade is explicit in v1's scope boundaries.
|
||||
|
||||
---
|
||||
|
||||
## State
|
||||
|
||||
Walk-through state is **in-memory only**. The orchestrator maintains:
|
||||
|
||||
- An Apply set (finding ids the user picked Apply on)
|
||||
- A decision list (every answered finding with its action and any metadata like `tracker_url` for Deferred or `reason` for Skipped)
|
||||
- The current position in the findings list
|
||||
|
||||
Nothing is written to disk per-decision. An interrupted walk-through (user cancels the prompt, session compacts, network dies) discards all in-memory state. Defer actions that already executed remain in the tracker — those are external side effects and cannot be rolled back. Apply decisions have not been dispatched yet (they batch at end-of-walk-through), so they are cleanly lost with no code changes.
|
||||
|
||||
Formal cross-session resumption is out of scope for v1.
|
||||
|
||||
---
|
||||
|
||||
## End-of-walk-through dispatch
|
||||
|
||||
After the loop terminates — either every finding has been answered, or the user took `LFG the rest → Proceed` — the walk-through hands off to the dispatch phase:
|
||||
|
||||
1. **Apply set:** spawn one fixer subagent for the full accumulated Apply set. The fixer receives the set as its input queue and applies all changes in one pass against the current working tree. This preserves the existing "one fixer, consistent tree" mechanic and gives the fixer the full set at once to handle inter-fix dependencies (two Applies touching overlapping regions). The existing Step 3 fixer prompt needs a small update to acknowledge this queue may be heterogeneous (`gated_auto` and `manual` mix, not just `safe_auto`) — authored alongside this reference.
|
||||
2. **Defer set:** already executed inline during the walk-through. Nothing to dispatch here.
|
||||
3. **Skip / Acknowledge:** no-op.
|
||||
|
||||
After dispatch completes (or after `LFG the rest → Cancel` followed by the user working through remaining findings one at a time, or after the loop runs to completion), emit the unified completion report described below.
|
||||
|
||||
---
|
||||
|
||||
## Unified completion report
|
||||
|
||||
Every terminal path of Interactive mode emits the same completion report structure. This covers:
|
||||
|
||||
- Walk-through completed (all findings answered)
|
||||
- Walk-through bailed via `LFG the rest → Proceed`
|
||||
- Top-level LFG (routing option B) completed
|
||||
- Top-level File tickets (routing option C) completed
|
||||
- Zero findings after `safe_auto` (routing question was skipped — the completion summary is a one-line degenerate case of this structure)
|
||||
|
||||
### Minimum required fields (per R12)
|
||||
|
||||
- **Per-finding entries:** for every finding the flow touched, a line with — at minimum — title, severity, the action taken (Applied / Deferred / Skipped / Acknowledged), the tracker URL or in-session task reference for Deferred entries, and a one-line reason for Skipped entries (grounded in the finding's confidence or the one-line `why_it_matters` snippet).
|
||||
- **Summary counts by action:** totals per bucket (e.g., `4 applied, 2 deferred, 2 skipped`).
|
||||
- **Failures called out explicitly:** any fix application that failed, any ticket creation that failed (with the reason returned by the tracker). Failures are surfaced above the per-finding list so they are not missed.
|
||||
- **End-of-review verdict:** the existing Stage 6 verdict (Ready to merge / Ready with fixes / Not ready), computed from the residual state after all actions complete.
|
||||
|
||||
### Coverage section
|
||||
|
||||
Carry forward the existing Coverage data (suppressed-finding count, residual risks, testing gaps, failed reviewers) and add one new element:
|
||||
|
||||
- **Framing-enrichment gaps:** count of findings where artifact lookup returned no match (merge-synthesized findings, or failed persona artifact writes). Name the personas contributing those gaps so the data feeds any future persona-upgrade decision. A trail of gaps per run tells the team which persona agents still need attention.
|
||||
|
||||
### Report ordering
|
||||
|
||||
The report appears after all execution completes. Ordering inside the report: failures first (above the per-finding list), then per-finding entries grouped by action bucket in the order `Applied / Deferred / Skipped / Acknowledged`, then summary counts, then Coverage, then the verdict.
|
||||
|
||||
### Zero-findings degenerate case
|
||||
|
||||
When the routing question was skipped because no `gated_auto` / `manual` findings remained after `safe_auto`, the completion report collapses to its summary-counts + verdict form with one added line — the count of `safe_auto` fixes applied. The summary wording mirrors `SKILL.md` Step 2 Interactive mode's zero-remaining case: the unqualified `All findings resolved` form is only accurate when no advisory or pre-existing findings remain. When advisory and/or pre-existing findings remain in the report, use the qualified form that names what was cleared and names what still remains. Examples:
|
||||
|
||||
No remaining advisory or pre-existing findings:
|
||||
|
||||
```
|
||||
All findings resolved — 3 safe_auto fixes applied.
|
||||
|
||||
Verdict: Ready with fixes.
|
||||
```
|
||||
|
||||
Advisory and/or pre-existing findings remain in the report:
|
||||
|
||||
```
|
||||
All actionable findings resolved — 3 safe_auto fixes applied. (2 advisory, 1 pre-existing findings remain in the report.)
|
||||
|
||||
Verdict: Ready with fixes.
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Execution posture
|
||||
|
||||
The walk-through is operationally read-only except for two permitted writes: the in-memory Apply set / decision list (managed by the orchestrator) and the tracker-defer dispatch (external ticket creation, described in `tracker-defer.md`). Persona agents remain strictly read-only. The end-of-walk-through fixer dispatch is the single point where file modifications happen — governed by the existing Step 3 fixer contract in `SKILL.md`.
|
||||
Reference in New Issue
Block a user