|
|
|
|
@@ -62,7 +62,7 @@ All tokens are optional. Each one present means one less thing to infer. When ab
|
|
|
|
|
- **Skip all user questions.** Never use the platform question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini) or other interactive prompts. Infer intent conservatively if the diff metadata is thin.
|
|
|
|
|
- **Require a determinable diff scope.** If headless mode cannot determine a diff scope (no branch, PR, or `base:` ref determinable without user interaction), emit `Review failed (headless mode). Reason: no diff scope detected. Re-invoke with a branch name, PR number, or base:<ref>.` and stop without dispatching agents.
|
|
|
|
|
- **Apply only `safe_auto -> review-fixer` findings in a single pass.** No bounded re-review rounds. Leave `gated_auto`, `manual`, `human`, and `release` work unresolved and return them in the structured output.
|
|
|
|
|
- **Return all non-auto findings as structured text output.** Use the headless output envelope format (see Stage 6 below) preserving severity, autofix_class, owner, requires_verification, confidence, evidence[], and pre_existing per finding.
|
|
|
|
|
- **Return all non-auto findings as structured text output.** Use the headless output envelope format (see Stage 6 below) preserving severity, autofix_class, owner, requires_verification, confidence, pre_existing, and suggested_fix per finding. Enrich with detail-tier fields (why_it_matters, evidence[]) from the per-agent artifact files on disk (see Detail enrichment in Stage 6).
|
|
|
|
|
- **Write a run artifact** under `.context/compound-engineering/ce-review/<run-id>/` summarizing findings, applied fixes, and advisory outputs. Include the artifact path in the structured output.
|
|
|
|
|
- **Do not create todo files.** The caller receives structured findings and routes downstream work itself.
|
|
|
|
|
- **Do not switch the shared checkout.** If the caller passes an explicit PR or branch target, `mode:headless` must run in an isolated checkout/worktree or stop instead of running `gh pr checkout` / `git checkout`. When stopping, emit `Review failed (headless mode). Reason: cannot switch shared checkout. Re-invoke with base:<ref> to review the current checkout, or run from an isolated worktree.`
|
|
|
|
|
@@ -339,6 +339,8 @@ If a plan is found, read its **Requirements Trace** (R1, R2, etc.) and **Impleme
|
|
|
|
|
|
|
|
|
|
Read the diff and file list from Stage 1. The 4 always-on personas and 2 CE always-on agents are automatic. For each cross-cutting and stack-specific conditional persona in the persona catalog included below, decide whether the diff warrants it. This is agent judgment, not keyword matching.
|
|
|
|
|
|
|
|
|
|
**File-type awareness for conditional selection:** Instruction-prose files (Markdown skill definitions, JSON schemas, config files) are product code but do not benefit from runtime-focused reviewers. The adversarial reviewer's techniques (race conditions, cascade failures, abuse cases) target executable code behavior. For diffs that only change instruction-prose files, skip adversarial unless the prose describes auth, payment, or data-mutation behavior. Count only executable code lines toward line-count thresholds.
|
|
|
|
|
|
|
|
|
|
**`previous-comments` is PR-only.** Only select this persona when Stage 1 gathered PR metadata (PR number or URL was provided as an argument, or `gh pr view` returned metadata for the current branch). Skip it entirely for standalone branch reviews with no associated PR -- there are no prior comments to check.
|
|
|
|
|
|
|
|
|
|
Stack-specific personas are additive. A Rails UI change may warrant `kieran-rails` plus `julik-frontend-races`; a TypeScript API diff may warrant `kieran-typescript` plus `api-contract` and `reliability`.
|
|
|
|
|
@@ -385,6 +387,19 @@ CE always-on agents (agent-native-reviewer, learnings-researcher) and CE conditi
|
|
|
|
|
|
|
|
|
|
The orchestrator (this skill) stays on the default model because it handles intent discovery, reviewer selection, finding merge/dedup, and synthesis -- tasks that benefit from stronger reasoning.
|
|
|
|
|
|
|
|
|
|
#### Run ID
|
|
|
|
|
|
|
|
|
|
Generate a unique run identifier before dispatching any agents. This ID scopes all agent artifact files and the post-review run artifact to the same directory.
|
|
|
|
|
|
|
|
|
|
```bash
|
|
|
|
|
RUN_ID=$(date +%Y%m%d-%H%M%S)-$(head -c4 /dev/urandom | od -An -tx1 | tr -d ' ')
|
|
|
|
|
mkdir -p ".context/compound-engineering/ce-review/$RUN_ID"
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
Pass `{run_id}` to every persona sub-agent so they can write their full analysis to `.context/compound-engineering/ce-review/{run_id}/{reviewer_name}.json`.
|
|
|
|
|
|
|
|
|
|
**Report-only mode:** Skip run-id generation and directory creation. Do not pass `{run_id}` to agents. Agents return compact JSON only with no file write, consistent with report-only's no-write contract.
|
|
|
|
|
|
|
|
|
|
#### Spawning
|
|
|
|
|
|
|
|
|
|
Omit the `mode` parameter when dispatching sub-agents so the user's configured permission settings apply. Do not pass `mode: "auto"`.
|
|
|
|
|
@@ -396,45 +411,71 @@ Spawn each selected persona reviewer as a parallel sub-agent using the subagent
|
|
|
|
|
3. The JSON output contract from the findings schema included below
|
|
|
|
|
4. PR metadata: title, body, and URL when reviewing a PR (empty string otherwise). Passed in a `<pr-context>` block so reviewers can verify code against stated intent
|
|
|
|
|
5. Review context: intent summary, file list, diff
|
|
|
|
|
6. **For `project-standards` only:** the standards file path list from Stage 3b, wrapped in a `<standards-paths>` block appended to the review context
|
|
|
|
|
6. Run ID and reviewer name for the artifact file path
|
|
|
|
|
7. **For `project-standards` only:** the standards file path list from Stage 3b, wrapped in a `<standards-paths>` block appended to the review context
|
|
|
|
|
|
|
|
|
|
Persona sub-agents are **read-only**: they review and return structured JSON. They do not edit files or propose refactors.
|
|
|
|
|
Persona sub-agents are **read-only** with respect to the project: they review and return structured JSON. They do not edit project files or propose refactors. The one permitted write is saving their full analysis to the `.context/` artifact path specified in the output contract.
|
|
|
|
|
|
|
|
|
|
Read-only here means **non-mutating**, not "no shell access." Reviewer sub-agents may use non-mutating inspection commands when needed to gather evidence or verify scope, including read-oriented `git` / `gh` usage such as `git diff`, `git show`, `git blame`, `git log`, and `gh pr view`. They must not edit files, change branches, commit, push, create PRs, or otherwise mutate the checkout or repository state.
|
|
|
|
|
Read-only here means **non-mutating**, not "no shell access." Reviewer sub-agents may use non-mutating inspection commands when needed to gather evidence or verify scope, including read-oriented `git` / `gh` usage such as `git diff`, `git show`, `git blame`, `git log`, and `gh pr view`. They must not edit project files, change branches, commit, push, create PRs, or otherwise mutate the checkout or repository state.
|
|
|
|
|
|
|
|
|
|
Each persona sub-agent returns JSON matching the findings schema included below:
|
|
|
|
|
Each persona sub-agent writes full JSON (all schema fields) to `.context/compound-engineering/ce-review/{run_id}/{reviewer_name}.json` and returns compact JSON with merge-tier fields only:
|
|
|
|
|
|
|
|
|
|
```json
|
|
|
|
|
{
|
|
|
|
|
"reviewer": "security",
|
|
|
|
|
"findings": [...],
|
|
|
|
|
"findings": [
|
|
|
|
|
{
|
|
|
|
|
"title": "User-supplied ID in account lookup without ownership check",
|
|
|
|
|
"severity": "P0",
|
|
|
|
|
"file": "orders_controller.rb",
|
|
|
|
|
"line": 42,
|
|
|
|
|
"confidence": 0.92,
|
|
|
|
|
"autofix_class": "gated_auto",
|
|
|
|
|
"owner": "downstream-resolver",
|
|
|
|
|
"requires_verification": true,
|
|
|
|
|
"pre_existing": false,
|
|
|
|
|
"suggested_fix": "Add current_user.owns?(account) guard before lookup"
|
|
|
|
|
}
|
|
|
|
|
],
|
|
|
|
|
"residual_risks": [...],
|
|
|
|
|
"testing_gaps": [...]
|
|
|
|
|
}
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
Detail-tier fields (`why_it_matters`, `evidence`) are in the artifact file only. `suggested_fix` is optional in both tiers -- included in compact returns when present so the orchestrator has fix context for auto-apply decisions. If the file write fails, the compact return still provides everything the merge needs.
|
|
|
|
|
|
|
|
|
|
**CE always-on agents** (agent-native-reviewer, learnings-researcher) are dispatched as standard Agent calls in parallel with the persona agents. Give them the same review context bundle the personas receive: entry mode, any PR metadata gathered in Stage 1, intent summary, review base branch name when known, `BASE:` marker, file list, diff, and `UNTRACKED:` scope notes. Do not invoke them with a generic "review this" prompt. Their output is unstructured and synthesized separately in Stage 6.
|
|
|
|
|
|
|
|
|
|
**CE conditional agents** (schema-drift-detector, deployment-verification-agent) are also dispatched as standard Agent calls when applicable. Pass the same review context bundle plus the applicability reason (for example, which migration files triggered the agent). For schema-drift-detector specifically, pass the resolved review base branch explicitly so it never assumes `main`. Their output is unstructured and must be preserved for Stage 6 synthesis just like the CE always-on agents.
|
|
|
|
|
|
|
|
|
|
### Stage 5: Merge findings
|
|
|
|
|
|
|
|
|
|
Convert multiple reviewer JSON payloads into one deduplicated, confidence-gated finding set.
|
|
|
|
|
Convert multiple reviewer compact JSON returns into one deduplicated, confidence-gated finding set. The compact returns contain merge-tier fields (title, severity, file, line, confidence, autofix_class, owner, requires_verification, pre_existing) plus the optional suggested_fix. Detail-tier fields (why_it_matters, evidence) are on disk in the per-agent artifact files and are not loaded at this stage.
|
|
|
|
|
|
|
|
|
|
1. **Validate.** Check each output against the schema. Drop malformed findings (missing required fields). Record the drop count.
|
|
|
|
|
1. **Validate.** Check each compact return for required top-level and per-finding fields, plus value constraints. Drop malformed returns or findings. Record the drop count.
|
|
|
|
|
- **Top-level required:** reviewer (string), findings (array), residual_risks (array), testing_gaps (array). Drop the entire return if any are missing or wrong type.
|
|
|
|
|
- **Per-finding required:** title, severity, file, line, confidence, autofix_class, owner, requires_verification, pre_existing
|
|
|
|
|
- **Value constraints:**
|
|
|
|
|
- severity: P0 | P1 | P2 | P3
|
|
|
|
|
- autofix_class: safe_auto | gated_auto | manual | advisory
|
|
|
|
|
- owner: review-fixer | downstream-resolver | human | release
|
|
|
|
|
- confidence: numeric, 0.0-1.0
|
|
|
|
|
- line: positive integer
|
|
|
|
|
- pre_existing, requires_verification: boolean
|
|
|
|
|
- Do not validate against the full schema here -- the full schema (including why_it_matters and evidence) applies to the artifact files on disk, not the compact returns.
|
|
|
|
|
2. **Confidence gate.** Suppress findings below 0.60 confidence. Exception: P0 findings at 0.50+ confidence survive the gate -- critical-but-uncertain issues must not be silently dropped. Record the suppressed count. This matches the persona instructions and the schema's confidence thresholds.
|
|
|
|
|
3. **Deduplicate.** Compute fingerprint: `normalize(file) + line_bucket(line, +/-3) + normalize(title)`. When fingerprints match, merge: keep highest severity, keep highest confidence with strongest evidence, union evidence, note which reviewers flagged it.
|
|
|
|
|
3. **Deduplicate.** Compute fingerprint: `normalize(file) + line_bucket(line, +/-3) + normalize(title)`. When fingerprints match, merge: keep highest severity, keep highest confidence, note which reviewers flagged it.
|
|
|
|
|
4. **Cross-reviewer agreement.** When 2+ independent reviewers flag the same issue (same fingerprint), boost the merged confidence by 0.10 (capped at 1.0). Cross-reviewer agreement is strong signal -- independent reviewers converging on the same issue is more reliable than any single reviewer's confidence. Note the agreement in the Reviewer column of the output (e.g., "security, correctness").
|
|
|
|
|
5. **Separate pre-existing.** Pull out findings with `pre_existing: true` into a separate list.
|
|
|
|
|
5. **Resolve disagreements.** When reviewers flag the same code region but disagree on severity, autofix_class, or owner, record the disagreement in the finding's evidence (e.g., "security rated P0, correctness rated P1 -- keeping P0"). This transparency helps the user understand why a finding was routed the way it was.
|
|
|
|
|
6. **Normalize routing.** For each merged finding, set the final `autofix_class`, `owner`, and `requires_verification`. If reviewers disagree, keep the most conservative route. Synthesis may narrow a finding from `safe_auto` to `gated_auto` or `manual`, but must not widen it without new evidence.
|
|
|
|
|
7. **Partition the work.** Build three sets:
|
|
|
|
|
6. **Resolve disagreements.** When reviewers flag the same code region but disagree on severity, autofix_class, or owner, annotate the Reviewer column with the disagreement (e.g., "security (P0), correctness (P1) -- kept P0"). This transparency helps the user understand why a finding was routed the way it was.
|
|
|
|
|
7. **Normalize routing.** For each merged finding, set the final `autofix_class`, `owner`, and `requires_verification`. If reviewers disagree, keep the most conservative route. Synthesis may narrow a finding from `safe_auto` to `gated_auto` or `manual`, but must not widen it without new evidence.
|
|
|
|
|
8. **Partition the work.** Build three sets:
|
|
|
|
|
- in-skill fixer queue: only `safe_auto -> review-fixer`
|
|
|
|
|
- residual actionable queue: unresolved `gated_auto` or `manual` findings whose owner is `downstream-resolver`
|
|
|
|
|
- report-only queue: `advisory` findings plus anything owned by `human` or `release`
|
|
|
|
|
8. **Sort.** Order by severity (P0 first) -> confidence (descending) -> file path -> line number.
|
|
|
|
|
9. **Collect coverage data.** Union residual_risks and testing_gaps across reviewers.
|
|
|
|
|
10. **Preserve CE agent artifacts.** Keep the learnings, agent-native, schema-drift, and deployment-verification outputs alongside the merged finding set. Do not drop unstructured agent output just because it does not match the persona JSON schema.
|
|
|
|
|
9. **Sort.** Order by severity (P0 first) -> confidence (descending) -> file path -> line number.
|
|
|
|
|
10. **Collect coverage data.** Union residual_risks and testing_gaps across reviewers.
|
|
|
|
|
11. **Preserve CE agent artifacts.** Keep the learnings, agent-native, schema-drift, and deployment-verification outputs alongside the merged finding set. Do not drop unstructured agent output just because it does not match the persona JSON schema.
|
|
|
|
|
|
|
|
|
|
### Stage 6: Synthesize and present
|
|
|
|
|
|
|
|
|
|
@@ -524,6 +565,12 @@ Coverage:
|
|
|
|
|
Review complete
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
**Detail enrichment (headless only):** The headless envelope includes `Why:`, `Evidence:`, and `Suggested fix:` lines. After merge (Stage 5), read the per-agent artifact files from `.context/compound-engineering/ce-review/{run_id}/` for only the findings that survived dedup and confidence gating.
|
|
|
|
|
- **Field tiers:** `Why:` and `Evidence:` are detail-tier -- load from per-agent artifact files. `Suggested fix:` is merge-tier -- use it directly from the compact return without artifact lookup.
|
|
|
|
|
- **Artifact matching:** For each surviving finding, look up its detail-tier fields in the artifact files of the contributing reviewers. Match on `file + line_bucket(line, +/-3)` (the same tolerance used in Stage 5 dedup) within each contributing reviewer's artifact. When multiple artifact entries fall within the line bucket, apply `normalize(title)` to both the merged finding's title and each candidate entry's title as a tie-breaker.
|
|
|
|
|
- **Reviewer order:** Try contributing reviewers in the order they appear in the merged finding's reviewer list; use the first match.
|
|
|
|
|
- **No-match fallback:** If no artifact file contains a match (all writes failed, or the finding was synthesized during merge), omit the `Why:` and `Evidence:` lines for that finding and note the gap in Coverage. The `Suggested fix:` line can still be populated from the compact return since it is merge-tier.
|
|
|
|
|
|
|
|
|
|
**Formatting rules:**
|
|
|
|
|
- The `[needs-verification]` marker appears only on findings where `requires_verification: true`.
|
|
|
|
|
- The `Artifact:` line gives callers the path to the full run artifact for machine-readable access to the complete findings schema. The text envelope is the primary handoff; the artifact is for debugging and full-fidelity access.
|
|
|
|
|
@@ -626,10 +673,11 @@ After presenting findings and verdict (Stage 6), route the next steps by mode. R
|
|
|
|
|
#### Step 4: Emit artifacts and downstream handoff
|
|
|
|
|
|
|
|
|
|
- In interactive, autofix, and headless modes, write a per-run artifact under `.context/compound-engineering/ce-review/<run-id>/` containing:
|
|
|
|
|
- synthesized findings
|
|
|
|
|
- synthesized findings (merged output from Stage 5)
|
|
|
|
|
- applied fixes
|
|
|
|
|
- residual actionable work
|
|
|
|
|
- advisory-only outputs
|
|
|
|
|
Per-agent full-detail JSON files (`{reviewer_name}.json`) are already present in this directory from Stage 4 dispatch.
|
|
|
|
|
- In autofix mode, create durable todo files only for unresolved actionable findings whose final owner is `downstream-resolver`. Load the `todo-create` skill for the canonical directory path, naming convention, YAML frontmatter structure, and template. Each todo should map the finding's severity to the todo priority (`P0`/`P1` -> `p1`, `P2` -> `p2`, `P3` -> `p3`) and set `status: ready` since these findings have already been triaged by synthesis.
|
|
|
|
|
- Do not create todos for `advisory` findings, `owner: human`, `owner: release`, or protected-artifact cleanup suggestions.
|
|
|
|
|
- If only advisory outputs remain, create no todos.
|
|
|
|
|
|