feat(ce-review): add headless mode for programmatic callers (#430)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
Trevin Chow
2026-03-28 16:09:21 -07:00
committed by GitHub
parent 125463b52a
commit 3706a9764b
6 changed files with 555 additions and 11 deletions

View File

@@ -0,0 +1,58 @@
---
date: 2026-03-28
topic: ce-review-headless-mode
---
# ce:review Headless Mode
## Problem Frame
ce:review currently has three modes (interactive, autofix, report-only), but all assume some level of direct user interaction or have mode-specific behaviors that don't fit programmatic callers. When another skill needs code review results as structured input, there's no way to invoke ce:review without it trying to prompt a user or applying fixes with interactive-session assumptions.
document-review solved this same problem in PR #425 with a `mode:headless` pattern. ce:review needs the same capability so it can be used as a utility skill by other workflows.
## Requirements
**Argument Parsing**
- R1. Add `mode:headless` argument, parsed alongside existing mode flags
**Runtime Behavior**
- R2. In headless mode, apply `safe_auto` fixes silently (matching autofix behavior)
- R4. No `AskUserQuestion` or other interactive prompts in headless mode
- R5. End with a clear completion signal so callers can detect when the review is done
**Output Format**
- R3. Return all non-auto findings (`gated_auto`, `manual`, `advisory`) as structured text output, preserving their original classifications (severity, autofix_class, owner, confidence, evidence[], pre_existing)
- R6. Follow document-review's structural output pattern (same envelope format, same section headings, similar parsing heuristics) while adapting per-finding fields to ce:review's own schema
## Success Criteria
- Another skill can invoke ce:review with `mode:headless`, receive structured findings, and act on them without any user interaction
- Output envelope (section headings, severity grouping, completion signal) is structurally consistent with document-review's headless output so callers can use a similar consumption pattern for both, while per-finding fields reflect ce:review's own schema
## Scope Boundaries
- Not changing the existing three modes (interactive, autofix, report-only)
- Not adding new reviewer personas or changing the review pipeline itself
- Not building a specific caller workflow in this change — just enabling the capability
## Key Decisions
- **Apply safe_auto fixes in headless**: Matches document-review's pattern where auto-fixes are applied silently and everything else is returned for the caller to handle
- **Structural consistency with document-review, not schema compatibility**: Same envelope and section headings, but per-finding fields use ce:review's own schema (which has different autofix_class values, owner, pre_existing, etc.). Callers will need skill-aware parsing for individual findings
## Outstanding Questions
### Deferred to Planning
- [Affects R3][Technical] Exact structured output format — should it mirror document-review's text format verbatim, or adapt to ce:review's richer findings schema (which includes fields like `autofix_class`, `evidence[]`, `pre_existing` that document-review doesn't have)?
- [Affects R1][Technical] How `mode:headless` interacts with the existing mode parsing — is it a fourth mode, or an overlay that modifies report-only/autofix behavior?
- [Affects R5][Technical] What the completion signal looks like — "Review complete (headless mode)" text, or a more structured envelope?
- [Affects R2][Technical] Should headless mode write run artifacts (`.context/compound-engineering/ce-review/<run-id>/`) and create durable todo files like autofix, or suppress them like report-only?
- [Affects R1][Technical] How should headless mode handle checkout/branch switching in Stage 1? Programmatic callers may need the checkout to stay stable (like report-only) even though headless applies fixes (like autofix).
- [Affects R1][Technical] Error behavior when headless receives conflicting mode flags (e.g., `mode:headless` + existing mode flags) or missing diff scope (no changes, no PR).
- [Affects R2][Technical] Should headless mode support bounded re-review rounds (max_rounds: 2) like autofix, or be single-pass?
## Next Steps
-> `/ce:plan` for structured implementation planning

View File

@@ -0,0 +1,330 @@
---
title: "feat(ce-review): Add headless mode for programmatic callers"
type: feat
status: completed
date: 2026-03-28
origin: docs/brainstorms/2026-03-28-ce-review-headless-mode-requirements.md
---
# feat(ce-review): Add headless mode for programmatic callers
## Overview
Add `mode:headless` to ce:review so other skills can invoke it programmatically and receive structured findings without interactive prompts. Follows the pattern established by document-review's headless mode (PR #425).
## Problem Frame
ce:review has three modes (interactive, autofix, report-only), but none is designed for skill-to-skill invocation where the caller wants structured findings returned as parseable output. Autofix applies fixes and writes todos; report-only is read-only and outputs a human-readable report. Neither returns structured output for a calling workflow to consume and route. (see origin: `docs/brainstorms/2026-03-28-ce-review-headless-mode-requirements.md`)
## Requirements Trace
- R1. Add `mode:headless` argument, parsed alongside existing mode flags
- R2. In headless mode, apply `safe_auto` fixes silently (matching autofix behavior)
- R3. Return all non-auto findings as structured text output, preserving severity, autofix_class, owner, requires_verification, confidence, evidence[], pre_existing
- R4. No `AskUserQuestion` or other interactive prompts in headless mode
- R5. End with a clear completion signal so callers can detect when the review is done
- R6. Follow document-review's structural output *pattern* (completion header, metadata block, autofix-class-grouped findings, trailing sections) while using ce:review's own section headings and per-finding fields
## Scope Boundaries
- Not changing existing three modes (interactive, autofix, report-only)
- Not adding new reviewer personas or changing the review pipeline (Stages 3-5)
- Not building a specific caller workflow — just enabling the capability
- Not adding headless invocations to existing orchestrators (lfg, slfg) in this change
## Context & Research
### Relevant Code and Patterns
- `plugins/compound-engineering/skills/ce-review/SKILL.md` — the skill to modify (mode detection at line 32, argument parsing at line 19, post-review flow at line 440)
- `plugins/compound-engineering/skills/ce-review/references/review-output-template.md` — existing output template with pipe-delimited tables and severity-grouped sections
- `plugins/compound-engineering/skills/ce-review/references/findings-schema.json` — ce:review's findings schema with `safe_auto|gated_auto|manual|advisory` autofix_class and `review-fixer|downstream-resolver|human|release` owner
- `plugins/compound-engineering/skills/document-review/SKILL.md` — headless mode pattern to follow (Phase 0 parsing, Phase 4 headless output, Phase 5 immediate return)
- `tests/review-skill-contract.test.ts` — contract test to extend
### Institutional Learnings
- `docs/solutions/skill-design/beta-promotion-orchestration-contract.md` — contract tests must be extended atomically with new mode flags
- `docs/solutions/skill-design/compound-refresh-skill-improvements.md` — explicit opt-in only for autonomous modes (no auto-detection from tool availability); conservative treatment of borderline cases
- `docs/solutions/skill-design/git-workflow-skills-need-explicit-state-machines-2026-03-27.md` — walk all mode x state combinations when adding a new mode branch
- `docs/solutions/agent-friendly-cli-principles.md` — structured parseable output with stable field contracts for programmatic callers
## Key Technical Decisions
- **Headless is a fourth explicit mode, not an overlay**: Each mode is self-contained with its own complete behavior specification. This avoids whack-a-mole regressions from overlay interactions (per state-machine learning). Headless has its own rules section parallel to autofix and report-only.
- **No shared checkout switching, but NOT safe for concurrent use**: Headless follows report-only's checkout guard — if a PR/branch target is passed, headless must run in an isolated worktree or stop. However, unlike report-only, headless mutates files (applies safe_auto fixes). Callers must not run headless concurrently with other mutating operations on the same checkout. The headless rules section should explicitly state this.
- **Single-pass, no re-review rounds**: Headless applies `safe_auto` fixes in one pass and returns. No bounded fixer loop. Rationale: autofix uses max_rounds:2 because it operates autonomously within a larger workflow; headless returns structured output to a caller that can re-invoke if needed. The caller owns the iteration decision, keeping headless simple and predictable. Applied fixes that introduce new issues will be caught on a subsequent invocation if the caller chooses to re-review.
- **Write run artifacts, skip todos**: Run artifacts (`.context/compound-engineering/ce-review/<run-id>/`) provide an audit trail of what headless did. Todo files are skipped because the caller receives structured findings and routes downstream work itself.
- **Reject conflicting mode flags**: `mode:headless` is incompatible with `mode:autofix` and `mode:report-only`. If multiple mode tokens appear, emit an error and stop. This follows the "fail fast with actionable errors" principle.
- **Require diff scope with structured error**: Like document-review requiring a document path in headless mode, ce:review headless requires that a diff scope is determinable (branch, PR, or `base:` ref). If scope cannot be determined, emit a structured error: `Review failed (headless mode). Reason: <no diff scope detected | merge-base unresolved | conflicting mode flags>`. No agents are dispatched. The same structured error format applies to conflicting mode flags.
## Open Questions
### Resolved During Planning
- **Fourth mode vs overlay?** Fourth mode. Self-contained behavior avoids overlay ambiguity. (Grounded in state-machine learning and the fact that all three existing modes have independent rules sections.)
- **Artifacts and todos?** Write artifacts (audit trail), skip todos (caller routes findings). Headless owns mutation but not downstream handoff.
- **Checkout behavior?** No shared checkout switching. Same guard as report-only, since headless callers need stable checkouts.
- **Re-review rounds?** Single-pass. Callers can re-invoke if needed.
### Deferred to Implementation
- **Conflicting flags and missing scope error messages**: Decision made (reject with structured error), but exact wording and error envelope format deferred to implementation
- Whether the run artifact format needs any headless-specific metadata (e.g., marking the run as headless)
## High-Level Technical Design
> *This illustrates the intended approach and is directional guidance for review, not implementation specification. The implementing agent should treat it as context, not code to reproduce.*
### Mode x Behavior Decision Matrix
| Behavior | Interactive | Autofix | Report-only | **Headless** |
|----------|------------|---------|-------------|--------------|
| User questions | Yes | No | No | **No** |
| Checkout switching | Yes | Yes | No (worktree or stop) | **No (worktree or stop)** |
| Intent ambiguity | Ask user | Infer conservatively | Infer conservatively | **Infer conservatively** |
| Apply safe_auto fixes | After policy question | Automatically | Never | **safe_auto only, single pass** |
| Apply gated_auto/manual fixes | After user approval | Never | Never | **Never (returned in output)** |
| Re-review rounds | max_rounds: 2 | max_rounds: 2 | N/A | **Single pass (no re-review)** |
| Write run artifact | Yes | Yes | No | **Yes** |
| Create todo files | No (user decides) | Yes (downstream-resolver) | No | **No (caller routes)** |
| Structured text output | No (interactive report) | No (interactive report) | No (interactive report) | **Yes (headless envelope)** |
| Commit/push/PR | Offered | Never | Never | **Never** |
| Completion signal | N/A | Stops after artifacts | Stops after report | **"Review complete"** |
| Safe for concurrent use | No | No | Yes (read-only) | **No (mutates files)** |
### Headless Output Envelope
Follows document-review's structural pattern adapted for ce:review's schema:
```
Code review complete (headless mode).
Scope: <scope-line>
Intent: <intent-summary>
Reviewers: <reviewer-list with conditional justifications>
Verdict: <Ready to merge | Ready with fixes | Not ready>
Artifact: .context/compound-engineering/ce-review/<run-id>/
Applied N safe_auto fixes.
Gated-auto findings (concrete fix, changes behavior/contracts):
[P1][gated_auto -> downstream-resolver][needs-verification] File: <file:line> -- <title> (<reviewer>, confidence <N>)
Why: <why_it_matters>
Suggested fix: <suggested_fix or "none">
Evidence: <evidence[0]>
Evidence: <evidence[1]>
Manual findings (actionable, needs handoff):
[P1][manual -> downstream-resolver] File: <file:line> -- <title> (<reviewer>, confidence <N>)
Why: <why_it_matters>
Evidence: <evidence[0]>
Advisory findings (report-only):
[P2][advisory -> human] File: <file:line> -- <title> (<reviewer>, confidence <N>)
Why: <why_it_matters>
Pre-existing issues:
- <file:line> -- <title> (<reviewer>)
Residual risks:
- <risk>
Testing gaps:
- <gap>
```
The `[needs-verification]` marker appears only on findings where `requires_verification: true`. The `Artifact:` line gives callers the path to the full run artifact for machine-readable access to the complete findings schema. The text envelope is the primary handoff; the artifact is for debugging and full-fidelity access.
Findings with `owner: release` appear in the Advisory section (they are operational/rollout items, not code fixes). Findings with `pre_existing: true` appear in the Pre-existing section regardless of autofix_class.
Omit any section with zero items. If all reviewers fail or time out, emit a degraded signal: `Code review degraded (headless mode). Reason: 0 of N reviewers returned results.` followed by "Review complete" so the caller can detect the failure and decide how to proceed.
Then output "Review complete" as the terminal signal.
## Implementation Units
- [ ] **Unit 1: Mode Infrastructure**
**Goal:** Add `mode:headless` to argument parsing, mode detection, and error handling for conflicting flags / missing scope.
**Requirements:** R1, R4
**Dependencies:** None
**Files:**
- Modify: `plugins/compound-engineering/skills/ce-review/SKILL.md`
**Approach:**
- Add `mode:headless` row to the Argument Parsing token table (alongside `mode:autofix` and `mode:report-only`)
- Add headless row to the Mode Detection table with behavior summary
- Add a "Headless mode rules" subsection parallel to "Autofix mode rules" and "Report-only mode rules"
- Update the `argument-hint` frontmatter to include `mode:headless`
- Add conflicting-flag guard: if multiple mode tokens appear in arguments, emit an error message listing the conflict and stop
- Add scope-required guard: if headless mode cannot determine diff scope without user interaction, emit an error with re-invocation syntax (matching document-review's nil-path pattern)
**Patterns to follow:**
- Existing mode detection table structure at SKILL.md line 34
- Existing mode rules subsections at SKILL.md lines 40-54
- document-review Phase 0 parsing and nil-path guard at document-review SKILL.md lines 12-37
**Test scenarios:**
- Happy path: `mode:headless` token is parsed and headless mode is activated
- Happy path: `mode:headless` with a branch name or PR number parses both correctly
- Error path: `mode:headless mode:autofix` is rejected with a clear error
- Error path: `mode:headless mode:report-only` is rejected with a clear error
- Edge case: `mode:headless` alone with no branch/PR and no determinable scope emits a scope-required error
**Verification:**
- SKILL.md contains `mode:headless` in argument-hint, token table, mode detection table, and a dedicated rules subsection
- Conflicting-flag and missing-scope guard text is present
---
- [ ] **Unit 2: Pipeline Behavior Adjustments**
**Goal:** Add headless-specific behavior for Stage 1 (checkout guard) and Stage 2 (intent ambiguity).
**Requirements:** R1, R4
**Dependencies:** Unit 1
**Files:**
- Modify: `plugins/compound-engineering/skills/ce-review/SKILL.md`
**Approach:**
- In Stage 1 scope detection, add headless to the checkout guard alongside report-only: `mode:headless` and `mode:report-only` must not run `gh pr checkout` or `git checkout` on the shared checkout. They must run in an isolated worktree or stop. When headless stops due to the checkout guard, emit a structured error with re-invocation syntax (e.g., "Re-invoke with base:\<ref\> to review the current checkout, or run from an isolated worktree.").
- In Stage 1 untracked file handling, add headless behavior: if the UNTRACKED list is non-empty, proceed with tracked changes only and note excluded files in the Coverage section of the structured output. Never stop to ask the user — this matches the "infer conservatively" pattern.
- In Stage 2 intent discovery, add headless to the non-interactive path alongside autofix and report-only: infer intent conservatively, note uncertainty in Coverage/Verdict reasoning instead of blocking.
- All changes are small additions to existing conditional text — add headless to the existing mode lists where report-only and autofix are already distinguished.
**Patterns to follow:**
- Existing report-only checkout guard at SKILL.md line 53 ("mode:report-only cannot switch the shared checkout")
- Existing autofix/report-only intent handling at SKILL.md (~line 298)
**Test scenarios:**
- Happy path: headless mode with a PR target uses a worktree or stops instead of switching the shared checkout
- Happy path: headless mode infers intent conservatively when diff metadata is thin
- Happy path: headless mode with untracked files proceeds with tracked changes only and notes exclusions
- Error path: headless stops due to checkout guard and emits re-invocation syntax
**Verification:**
- SKILL.md mentions headless alongside report-only in checkout guard sections
- SKILL.md mentions headless alongside autofix/report-only in intent discovery sections
- SKILL.md specifies headless behavior for untracked files (proceed, don't prompt)
---
- [ ] **Unit 3: Headless Output Format and Post-Review Flow**
**Goal:** Define the headless structured text output and the headless post-review behavior (apply safe_auto, write artifacts, skip todos, output structured text, return completion signal).
**Requirements:** R2, R3, R4, R5, R6
**Dependencies:** Unit 1, Unit 2
**Files:**
- Modify: `plugins/compound-engineering/skills/ce-review/SKILL.md`
- Modify: `plugins/compound-engineering/skills/ce-review/references/review-output-template.md`
**Approach:**
*Stage 6 output:*
- Add a headless-specific output section to SKILL.md that defines the structured text envelope format
- The envelope follows document-review's structural pattern: completion header, metadata (scope/intent/reviewers/verdict), applied fixes count, findings grouped by autofix_class with severity/route/file/line per finding, trailing sections (pre-existing, residual risks, testing gaps)
- Per-finding format: `[severity][autofix_class -> owner] File: <file:line> -- <title> (<reviewer>, confidence <N>)` with Why and Suggested fix lines
- Omit sections with zero items
- In headless mode, output this structured text instead of the interactive pipe-delimited table report
*Post-review flow (After Review section):*
- Add "Headless mode" to Step 2 (Choose policy by mode) parallel to autofix and report-only
- Headless rules: ask no questions; apply `safe_auto -> review-fixer` queue in a single pass (no re-review rounds); skip Step 3's bounded loop entirely
- Step 4 (Emit artifacts): headless writes run artifacts (like autofix) but does NOT create todo files (caller handles routing from structured output)
- Step 5: headless stops after structured text output and "Review complete" signal. No commit/push/PR.
*Review output template:*
- Add a "Headless mode format" section to `review-output-template.md` with the structured text template and formatting rules
- Update the Mode line documentation to include `headless`
**Patterns to follow:**
- document-review headless output format at document-review SKILL.md lines 219-248
- Existing autofix and report-only post-review steps at SKILL.md lines 471-483
- Existing review-output-template.md formatting rules
**Test scenarios:**
- Happy path: headless mode with safe_auto findings applies fixes and returns structured output listing remaining findings
- Happy path: headless mode with no actionable findings returns "Applied 0 safe_auto fixes" and the completion signal
- Happy path: headless mode with mixed findings (safe_auto + gated_auto + manual + advisory) applies safe_auto, returns all others in structured output grouped by autofix_class
- Edge case: headless mode with only advisory findings returns structured output with no fixes applied
- Edge case: headless mode with only pre-existing findings separates them into the pre-existing section
- Integration: headless output includes Verdict line so callers can make merge decisions
- Integration: run artifact is written under `.context/compound-engineering/ce-review/<run-id>/`
- Error path: clean review (zero findings) returns the completion signal with no findings sections
**Verification:**
- SKILL.md has a headless output format section with the structured text envelope
- review-output-template.md includes headless mode format
- Post-review flow has a headless branch in Steps 2, 4, and 5
- No AskUserQuestion or interactive prompts reachable in headless mode
---
- [ ] **Unit 4: Contract Test Extension**
**Goal:** Extend `tests/review-skill-contract.test.ts` to assert headless mode contract invariants.
**Requirements:** R1, R4, R5
**Dependencies:** Units 1-3
**Files:**
- Modify: `tests/review-skill-contract.test.ts`
- Test: `tests/review-skill-contract.test.ts`
**Approach:**
- Add assertions to the existing "documents explicit modes and orchestration boundaries" test for headless mode presence
- Add a new test case for headless-specific contract invariants: completion signal text, no-checkout-switching guard, artifact behavior, no-todo rule, structured output format presence, conflicting-flags guard
- Assert `mode:headless` appears in argument-hint and mode detection table
- Assert headless rules section exists with key behavioral commitments
**Patterns to follow:**
- Existing contract test structure at `tests/review-skill-contract.test.ts` — string containment assertions against SKILL.md content
**Test scenarios:**
- Happy path: contract test passes with all headless mode assertions
- Edge case: if any headless rule text is accidentally removed from SKILL.md, the contract test fails
**Verification:**
- `bun test tests/review-skill-contract.test.ts` passes
- Test covers: mode detection, checkout guard, artifact/todo behavior, completion signal, conflicting flags guard
## System-Wide Impact
- **Interaction graph:** No new callbacks or middleware. Headless mode is a new branch in existing mode-dispatch logic. Existing callers (lfg, slfg) are not changed — they continue using autofix and report-only.
- **Error propagation:** New error paths (conflicting flags, missing scope) emit text errors and stop. No cascading failure risk.
- **State lifecycle risks:** Headless writes run artifacts but not todos. A caller that expects todos from headless would get none — this is intentional and documented.
- **API surface parity:** Headless mode is a new API surface for skill-to-skill invocation. Future orchestrators may adopt it, but existing ones are unchanged.
- **Unchanged invariants:** Stages 3-5 (reviewer selection, sub-agent dispatch, merge/dedup pipeline) are completely unchanged. The findings schema is unchanged. The confidence threshold (0.60) is unchanged.
## Risks & Dependencies
| Risk | Mitigation |
|------|------------|
| Headless checkout guard text diverges from report-only over time | Both share the same guard language — mention headless alongside report-only in the same sentences so they stay in sync |
| Caller assumes headless creates todos and depends on them | Headless rules section explicitly states no todos; contract test asserts it |
| Structured output format drifts from document-review's envelope | Format is documented in review-output-template.md and tested by contract; changes require deliberate updates |
## Sources & References
- **Origin document:** [docs/brainstorms/2026-03-28-ce-review-headless-mode-requirements.md](docs/brainstorms/2026-03-28-ce-review-headless-mode-requirements.md)
- Related code: `plugins/compound-engineering/skills/ce-review/SKILL.md`, `plugins/compound-engineering/skills/document-review/SKILL.md`
- Related PRs: #425 (document-review headless mode)
- Learnings: `docs/solutions/skill-design/beta-promotion-orchestration-contract.md`, `docs/solutions/skill-design/compound-refresh-skill-improvements.md`, `docs/solutions/skill-design/git-workflow-skills-need-explicit-state-machines-2026-03-27.md`

View File

@@ -1,7 +1,7 @@
--- ---
name: ce:review name: ce:review
description: "Structured code review using tiered persona agents, confidence-gated findings, and a merge/dedup pipeline. Use when reviewing code changes before creating a PR." description: "Structured code review using tiered persona agents, confidence-gated findings, and a merge/dedup pipeline. Use when reviewing code changes before creating a PR."
argument-hint: "[mode:autofix|mode:report-only] [PR number, GitHub URL, or branch name]" argument-hint: "[mode:autofix|mode:report-only|mode:headless] [PR number, GitHub URL, or branch name]"
--- ---
# Code Review # Code Review
@@ -24,11 +24,14 @@ Parse `$ARGUMENTS` for the following optional tokens. Strip each recognized toke
|-------|---------|--------| |-------|---------|--------|
| `mode:autofix` | `mode:autofix` | Select autofix mode (see Mode Detection below) | | `mode:autofix` | `mode:autofix` | Select autofix mode (see Mode Detection below) |
| `mode:report-only` | `mode:report-only` | Select report-only mode | | `mode:report-only` | `mode:report-only` | Select report-only mode |
| `mode:headless` | `mode:headless` | Select headless mode for programmatic callers (see Mode Detection below) |
| `base:<sha-or-ref>` | `base:abc1234` or `base:origin/main` | Skip scope detection — use this as the diff base directly | | `base:<sha-or-ref>` | `base:abc1234` or `base:origin/main` | Skip scope detection — use this as the diff base directly |
| `plan:<path>` | `plan:docs/plans/2026-03-25-001-feat-foo-plan.md` | Load this plan for requirements verification | | `plan:<path>` | `plan:docs/plans/2026-03-25-001-feat-foo-plan.md` | Load this plan for requirements verification |
All tokens are optional. Each one present means one less thing to infer. When absent, fall back to existing behavior for that stage. All tokens are optional. Each one present means one less thing to infer. When absent, fall back to existing behavior for that stage.
**Conflicting mode flags:** If multiple mode tokens appear in arguments, stop and do not dispatch agents. If `mode:headless` is one of the conflicting tokens, emit the headless error envelope: `Review failed (headless mode). Reason: conflicting mode flags — <mode_a> and <mode_b> cannot be combined.` Otherwise emit the generic form: `Review failed. Reason: conflicting mode flags — <mode_a> and <mode_b> cannot be combined.`
## Mode Detection ## Mode Detection
| Mode | When | Behavior | | Mode | When | Behavior |
@@ -36,6 +39,7 @@ All tokens are optional. Each one present means one less thing to infer. When ab
| **Interactive** (default) | No mode token present | Review, present findings, ask for policy decisions when needed, and optionally continue into fix/push/PR next steps | | **Interactive** (default) | No mode token present | Review, present findings, ask for policy decisions when needed, and optionally continue into fix/push/PR next steps |
| **Autofix** | `mode:autofix` in arguments | No user interaction. Review, apply only policy-allowed `safe_auto` fixes, re-review in bounded rounds, write a run artifact, and emit residual downstream work when needed | | **Autofix** | `mode:autofix` in arguments | No user interaction. Review, apply only policy-allowed `safe_auto` fixes, re-review in bounded rounds, write a run artifact, and emit residual downstream work when needed |
| **Report-only** | `mode:report-only` in arguments | Strictly read-only. Review and report only, then stop with no edits, artifacts, todos, commits, pushes, or PR actions | | **Report-only** | `mode:report-only` in arguments | Strictly read-only. Review and report only, then stop with no edits, artifacts, todos, commits, pushes, or PR actions |
| **Headless** | `mode:headless` in arguments | Programmatic mode for skill-to-skill invocation. Apply `safe_auto` fixes silently (single pass), return all other findings as structured text output, write run artifacts, skip todos, and return "Review complete" signal. No interactive prompts. |
### Autofix mode rules ### Autofix mode rules
@@ -53,6 +57,19 @@ All tokens are optional. Each one present means one less thing to infer. When ab
- **Do not switch the shared checkout.** If the caller passes an explicit PR or branch target, `mode:report-only` must run in an isolated checkout/worktree or stop instead of running `gh pr checkout` / `git checkout`. - **Do not switch the shared checkout.** If the caller passes an explicit PR or branch target, `mode:report-only` must run in an isolated checkout/worktree or stop instead of running `gh pr checkout` / `git checkout`.
- **Do not overlap mutating review with browser testing on the same checkout.** If a future orchestrator wants fixes, run the mutating review phase after browser testing or in an isolated checkout/worktree. - **Do not overlap mutating review with browser testing on the same checkout.** If a future orchestrator wants fixes, run the mutating review phase after browser testing or in an isolated checkout/worktree.
### Headless mode rules
- **Skip all user questions.** Never use the platform question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini) or other interactive prompts. Infer intent conservatively if the diff metadata is thin.
- **Require a determinable diff scope.** If headless mode cannot determine a diff scope (no branch, PR, or `base:` ref determinable without user interaction), emit `Review failed (headless mode). Reason: no diff scope detected. Re-invoke with a branch name, PR number, or base:<ref>.` and stop without dispatching agents.
- **Apply only `safe_auto -> review-fixer` findings in a single pass.** No bounded re-review rounds. Leave `gated_auto`, `manual`, `human`, and `release` work unresolved and return them in the structured output.
- **Return all non-auto findings as structured text output.** Use the headless output envelope format (see Stage 6 below) preserving severity, autofix_class, owner, requires_verification, confidence, evidence[], and pre_existing per finding.
- **Write a run artifact** under `.context/compound-engineering/ce-review/<run-id>/` summarizing findings, applied fixes, and advisory outputs. Include the artifact path in the structured output.
- **Do not create todo files.** The caller receives structured findings and routes downstream work itself.
- **Do not switch the shared checkout.** If the caller passes an explicit PR or branch target, `mode:headless` must run in an isolated checkout/worktree or stop instead of running `gh pr checkout` / `git checkout`. When stopping, emit `Review failed (headless mode). Reason: cannot switch shared checkout. Re-invoke with base:<ref> to review the current checkout, or run from an isolated worktree.`
- **Not safe for concurrent use on a shared checkout.** Unlike `mode:report-only`, headless mutates files (applies `safe_auto` fixes). Callers must not run headless concurrently with other mutating operations on the same checkout.
- **Never commit, push, or create a PR** from headless mode. The caller owns those decisions.
- **End with "Review complete" as the terminal signal** so callers can detect completion. If all reviewers fail or time out, emit `Code review degraded (headless mode). Reason: 0 of N reviewers returned results.` followed by "Review complete".
## Severity Scale ## Severity Scale
All reviewers use P0-P3: All reviewers use P0-P3:
@@ -164,7 +181,7 @@ This path works with any ref — a SHA, `origin/main`, a branch name. Automated
**If a PR number or GitHub URL is provided as an argument:** **If a PR number or GitHub URL is provided as an argument:**
If `mode:report-only` is active, do **not** run `gh pr checkout <number-or-url>` on the shared checkout. Tell the caller: "mode:report-only cannot switch the shared checkout to review a PR target. Run it from an isolated worktree/checkout for that PR, or run report-only with no target argument on the already checked out branch." Stop here unless the review is already running in an isolated checkout. If `mode:report-only` or `mode:headless` is active, do **not** run `gh pr checkout <number-or-url>` on the shared checkout. For `mode:report-only`, tell the caller: "mode:report-only cannot switch the shared checkout to review a PR target. Run it from an isolated worktree/checkout for that PR, or run report-only with no target argument on the already checked out branch." For `mode:headless`, emit `Review failed (headless mode). Reason: cannot switch shared checkout. Re-invoke with base:<ref> to review the current checkout, or run from an isolated worktree.` Stop here unless the review is already running in an isolated checkout.
First, verify the worktree is clean before switching branches: First, verify the worktree is clean before switching branches:
@@ -218,7 +235,7 @@ Extract PR title/body, base branch, and PR URL from `gh pr view`, then extract t
Check out the named branch, then diff it against the base branch. Substitute the provided branch name (shown here as `<branch>`). Check out the named branch, then diff it against the base branch. Substitute the provided branch name (shown here as `<branch>`).
If `mode:report-only` is active, do **not** run `git checkout <branch>` on the shared checkout. Tell the caller: "mode:report-only cannot switch the shared checkout to review another branch. Run it from an isolated worktree/checkout for `<branch>`, or run report-only on the current checkout with no target argument." Stop here unless the review is already running in an isolated checkout. If `mode:report-only` or `mode:headless` is active, do **not** run `git checkout <branch>` on the shared checkout. For `mode:report-only`, tell the caller: "mode:report-only cannot switch the shared checkout to review another branch. Run it from an isolated worktree/checkout for `<branch>`, or run report-only on the current checkout with no target argument." For `mode:headless`, emit `Review failed (headless mode). Reason: cannot switch shared checkout. Re-invoke with base:<ref> to review the current checkout, or run from an isolated worktree.` Stop here unless the review is already running in an isolated checkout.
First, verify the worktree is clean before switching branches: First, verify the worktree is clean before switching branches:
@@ -270,7 +287,7 @@ echo "BASE:$BASE" && echo "FILES:" && git diff --name-only $BASE && echo "DIFF:"
Using `git diff $BASE` (without `..HEAD`) diffs the merge-base against the working tree, which includes committed, staged, and unstaged changes together. Using `git diff $BASE` (without `..HEAD`) diffs the merge-base against the working tree, which includes committed, staged, and unstaged changes together.
**Untracked file handling:** Always inspect the `UNTRACKED:` list, even when `FILES:`/`DIFF:` are non-empty. Untracked files are outside review scope until staged. If the list is non-empty, tell the user which files are excluded. If any of them should be reviewed, stop and tell the user to `git add` them first and rerun. Only continue when the user is intentionally reviewing tracked changes only. **Untracked file handling:** Always inspect the `UNTRACKED:` list, even when `FILES:`/`DIFF:` are non-empty. Untracked files are outside review scope until staged. If the list is non-empty, tell the user which files are excluded. If any of them should be reviewed, stop and tell the user to `git add` them first and rerun. Only continue when the user is intentionally reviewing tracked changes only. In `mode:headless` or `mode:autofix`, do not stop to ask — proceed with tracked changes only and note the excluded untracked files in the Coverage section of the output.
### Stage 2: Intent discovery ### Stage 2: Intent discovery
@@ -298,7 +315,7 @@ Pass this to every reviewer in their spawn prompt. Intent shapes *how hard each
**When intent is ambiguous:** **When intent is ambiguous:**
- **Interactive mode:** Ask one question using the platform's interactive question tool (AskUserQuestion in Claude Code, request_user_input in Codex): "What is the primary goal of these changes?" Do not spawn reviewers until intent is established. - **Interactive mode:** Ask one question using the platform's interactive question tool (AskUserQuestion in Claude Code, request_user_input in Codex): "What is the primary goal of these changes?" Do not spawn reviewers until intent is established.
- **Autofix/report-only modes:** Infer intent conservatively from the branch name, diff, PR metadata, and caller context. Note the uncertainty in Coverage or Verdict reasoning instead of blocking. - **Autofix/report-only/headless modes:** Infer intent conservatively from the branch name, diff, PR metadata, and caller context. Note the uncertainty in Coverage or Verdict reasoning instead of blocking.
### Stage 2b: Plan discovery (requirements verification) ### Stage 2b: Plan discovery (requirements verification)
@@ -420,6 +437,80 @@ Assemble the final report using the review output template included below:
Do not include time estimates. Do not include time estimates.
### Headless output format
In `mode:headless`, replace the interactive pipe-delimited table report with a structured text envelope. The envelope follows the same structural pattern as document-review's headless output (completion header, metadata block, findings grouped by autofix_class, trailing sections) while using ce:review's own section headings and per-finding fields.
```
Code review complete (headless mode).
Scope: <scope-line>
Intent: <intent-summary>
Reviewers: <reviewer-list with conditional justifications>
Verdict: <Ready to merge | Ready with fixes | Not ready>
Artifact: .context/compound-engineering/ce-review/<run-id>/
Applied N safe_auto fixes.
Gated-auto findings (concrete fix, changes behavior/contracts):
[P1][gated_auto -> downstream-resolver][needs-verification] File: <file:line> -- <title> (<reviewer>, confidence <N>)
Why: <why_it_matters>
Suggested fix: <suggested_fix or "none">
Evidence: <evidence[0]>
Evidence: <evidence[1]>
Manual findings (actionable, needs handoff):
[P1][manual -> downstream-resolver] File: <file:line> -- <title> (<reviewer>, confidence <N>)
Why: <why_it_matters>
Evidence: <evidence[0]>
Advisory findings (report-only):
[P2][advisory -> human] File: <file:line> -- <title> (<reviewer>, confidence <N>)
Why: <why_it_matters>
Pre-existing issues:
[P2][gated_auto -> downstream-resolver] File: <file:line> -- <title> (<reviewer>, confidence <N>)
Why: <why_it_matters>
Residual risks:
- <risk>
Learnings & Past Solutions:
- <learning>
Agent-Native Gaps:
- <gap description>
Schema Drift Check:
- <drift status>
Deployment Notes:
- <deployment note>
Testing gaps:
- <gap>
Coverage:
- Suppressed: <N> findings below 0.60 confidence
- Untracked files excluded: <file1>, <file2>
- Failed reviewers: <reviewer>
Review complete
```
**Formatting rules:**
- The `[needs-verification]` marker appears only on findings where `requires_verification: true`.
- The `Artifact:` line gives callers the path to the full run artifact for machine-readable access to the complete findings schema. The text envelope is the primary handoff; the artifact is for debugging and full-fidelity access.
- Findings with `owner: release` appear in the Advisory section (they are operational/rollout items, not code fixes).
- Findings with `pre_existing: true` appear in the Pre-existing section regardless of autofix_class.
- The Verdict appears in the metadata header (deliberately reordered from the interactive format where it appears at the bottom) so programmatic callers get the verdict first.
- Omit any section with zero items.
- If all reviewers fail or time out, emit `Code review degraded (headless mode). Reason: 0 of N reviewers returned results.` followed by "Review complete".
- End with "Review complete" as the terminal signal so callers can detect completion.
## Quality Gates ## Quality Gates
Before delivering the review, verify: Before delivering the review, verify:
@@ -482,6 +573,15 @@ After presenting findings and verdict (Stage 6), route the next steps by mode. R
- Do not create residual todos or `.context` artifacts. - Do not create residual todos or `.context` artifacts.
- Stop after Stage 6. Everything remains in the report. - Stop after Stage 6. Everything remains in the report.
**Headless mode**
- Ask no questions.
- Apply only the `safe_auto -> review-fixer` queue in a single pass. Do not enter the bounded re-review loop (Step 3). Spawn one fixer subagent, apply fixes, then proceed directly to Step 4.
- Leave `gated_auto`, `manual`, `human`, and `release` items unresolved — they appear in the structured text output.
- Output the headless output envelope (see Stage 6) instead of the interactive report.
- Write a run artifact (Step 4) but do not create todo files.
- Stop after the structured text output and "Review complete" signal. No commit/push/PR.
#### Step 3: Apply fixes with one fixer and bounded rounds #### Step 3: Apply fixes with one fixer and bounded rounds
- Spawn exactly one fixer subagent for the current fixer queue in the current checkout. That fixer applies all approved changes and runs the relevant targeted tests in one pass against a consistent tree. - Spawn exactly one fixer subagent for the current fixer queue in the current checkout. That fixer applies all approved changes and runs the relevant targeted tests in one pass against a consistent tree.
@@ -493,7 +593,7 @@ After presenting findings and verdict (Stage 6), route the next steps by mode. R
#### Step 4: Emit artifacts and downstream handoff #### Step 4: Emit artifacts and downstream handoff
- In interactive and autofix modes, write a per-run artifact under `.context/compound-engineering/ce-review/<run-id>/` containing: - In interactive, autofix, and headless modes, write a per-run artifact under `.context/compound-engineering/ce-review/<run-id>/` containing:
- synthesized findings - synthesized findings
- applied fixes - applied fixes
- residual actionable work - residual actionable work
@@ -521,7 +621,7 @@ After presenting findings and verdict (Stage 6), route the next steps by mode. R
If "Create a PR": first publish the branch with `git push --set-upstream origin HEAD`, then use `gh pr create` with a title and summary derived from the branch changes. If "Create a PR": first publish the branch with `git push --set-upstream origin HEAD`, then use `gh pr create` with a title and summary derived from the branch changes.
If "Push fixes": push the branch with `git push` to update the existing PR. If "Push fixes": push the branch with `git push` to update the existing PR.
**Autofix and report-only modes:** stop after the report, artifact emission, and residual-work handoff. Do not commit, push, or create a PR. **Autofix, report-only, and headless modes:** stop after the report, artifact emission, and residual-work handoff. Do not commit, push, or create a PR.
## Fallback ## Fallback

View File

@@ -101,7 +101,7 @@ Use this **exact format** when presenting synthesized review findings. Findings
- **Confidence column** shows the finding's confidence score - **Confidence column** shows the finding's confidence score
- **Route column** shows the synthesized handling decision as ``<autofix_class> -> <owner>``. - **Route column** shows the synthesized handling decision as ``<autofix_class> -> <owner>``.
- **Header includes** scope, intent, and reviewer team with per-conditional justifications - **Header includes** scope, intent, and reviewer team with per-conditional justifications
- **Mode line** -- include `interactive`, `autofix`, or `report-only` - **Mode line** -- include `interactive`, `autofix`, `report-only`, or `headless`
- **Applied Fixes section** -- include only when a fix phase ran in this review invocation - **Applied Fixes section** -- include only when a fix phase ran in this review invocation
- **Residual Actionable Work section** -- include only when unresolved actionable findings were handed off for later work - **Residual Actionable Work section** -- include only when unresolved actionable findings were handed off for later work
- **Pre-existing section** -- separate table, no confidence column (these are informational) - **Pre-existing section** -- separate table, no confidence column (these are informational)
@@ -113,3 +113,15 @@ Use this **exact format** when presenting synthesized review findings. Findings
- **Summary uses blockquotes** for verdict, reasoning, and fix order - **Summary uses blockquotes** for verdict, reasoning, and fix order
- **Horizontal rule** (`---`) separates findings from verdict - **Horizontal rule** (`---`) separates findings from verdict
- **`###` headers** for each section -- never plain text headers - **`###` headers** for each section -- never plain text headers
## Headless Mode Format
In `mode:headless`, replace the interactive pipe-delimited table report with a structured text envelope. The headless format is defined in the `### Headless output format` section of SKILL.md. Key differences from the interactive format:
- **No pipe-delimited tables.** Findings use `[severity][autofix_class -> owner] File: <file:line> -- <title>` line format with indented Why/Evidence/Suggested fix lines.
- **Findings grouped by autofix_class** (gated-auto, manual, advisory) instead of severity. Within each group, findings are sorted by severity.
- **Verdict in header** (top of output) instead of bottom, so programmatic callers get it first.
- **`Artifact:` line** in metadata header gives callers the path to the full run artifact.
- **`[needs-verification]` marker** on findings where `requires_verification: true`.
- **Evidence lines** included per finding.
- **Completion signal:** "Review complete" as the final line.

View File

@@ -1,7 +1,7 @@
--- ---
name: document-review name: document-review
description: Review requirements or plan documents using parallel persona agents that surface role-specific issues. Use when a requirements document or plan document exists and the user wants to improve it. description: Review requirements or plan documents using parallel persona agents that surface role-specific issues. Use when a requirements document or plan document exists and the user wants to improve it.
argument-hint: "[path/to/document.md]" argument-hint: "[mode:headless] [path/to/document.md]"
--- ---
# Document Review # Document Review
@@ -23,7 +23,7 @@ The caller receives findings with their original classifications intact and deci
Callers invoke headless mode by including `mode:headless` in the skill arguments, e.g.: Callers invoke headless mode by including `mode:headless` in the skill arguments, e.g.:
``` ```
Skill("compound-engineering:document-review", "docs/plans/my-plan.md mode:headless") Skill("compound-engineering:document-review", "mode:headless docs/plans/my-plan.md")
``` ```
If `mode:headless` is not present, the skill runs in its default interactive mode with no behavior change. If `mode:headless` is not present, the skill runs in its default interactive mode with no behavior change.
@@ -34,7 +34,7 @@ If `mode:headless` is not present, the skill runs in its default interactive mod
**If no document is specified (interactive mode):** Ask which document to review, or find the most recent in `docs/brainstorms/` or `docs/plans/` using a file-search/glob tool (e.g., Glob in Claude Code). **If no document is specified (interactive mode):** Ask which document to review, or find the most recent in `docs/brainstorms/` or `docs/plans/` using a file-search/glob tool (e.g., Glob in Claude Code).
**If no document is specified (headless mode):** Output "Review failed: headless mode requires a document path. Re-invoke with: Skill(\"compound-engineering:document-review\", \"<path> mode:headless\")" without dispatching agents. **If no document is specified (headless mode):** Output "Review failed: headless mode requires a document path. Re-invoke with: Skill(\"compound-engineering:document-review\", \"mode:headless <path>\")" without dispatching agents.
### Classify Document Type ### Classify Document Type

View File

@@ -14,6 +14,7 @@ describe("ce-review contract", () => {
expect(content).toContain("## Mode Detection") expect(content).toContain("## Mode Detection")
expect(content).toContain("mode:autofix") expect(content).toContain("mode:autofix")
expect(content).toContain("mode:report-only") expect(content).toContain("mode:report-only")
expect(content).toContain("mode:headless")
expect(content).toContain(".context/compound-engineering/ce-review/<run-id>/") expect(content).toContain(".context/compound-engineering/ce-review/<run-id>/")
expect(content).toContain("Do not create residual todos or `.context` artifacts.") expect(content).toContain("Do not create residual todos or `.context` artifacts.")
expect(content).toContain( expect(content).toContain(
@@ -25,6 +26,49 @@ describe("ce-review contract", () => {
expect(content).not.toContain("Which severities should I fix?") expect(content).not.toContain("Which severities should I fix?")
}) })
test("documents headless mode contract for programmatic callers", async () => {
const content = await readRepoFile("plugins/compound-engineering/skills/ce-review/SKILL.md")
// Headless mode has its own rules section
expect(content).toContain("### Headless mode rules")
// No interactive prompts (cross-platform)
expect(content).toContain(
"Never use the platform question tool",
)
// Structured output format
expect(content).toContain("### Headless output format")
expect(content).toContain("Code review complete (headless mode).")
expect(content).toContain('"Review complete" as the terminal signal')
// Applies safe_auto fixes but NOT safe for concurrent use
expect(content).toContain(
"Not safe for concurrent use on a shared checkout.",
)
// Writes artifacts but no todos, no commit/push/PR
expect(content).toContain("Do not create todo files.")
expect(content).toContain(
"Never commit, push, or create a PR",
)
// Single-pass fixing, no bounded re-review rounds
expect(content).toContain("No bounded re-review rounds")
// Checkout guard — headless shares report-only's guard
expect(content).toMatch(/mode:headless.*must run in an isolated checkout\/worktree or stop/)
// Conflicting mode flags
expect(content).toContain("**Conflicting mode flags:**")
// Structured error for missing scope
expect(content).toContain("Review failed (headless mode). Reason: no diff scope detected.")
// Degraded signal when all reviewers fail
expect(content).toContain("Code review degraded (headless mode).")
})
test("documents policy-driven routing and residual handoff", async () => { test("documents policy-driven routing and residual handoff", async () => {
const content = await readRepoFile("plugins/compound-engineering/skills/ce-review/SKILL.md") const content = await readRepoFile("plugins/compound-engineering/skills/ce-review/SKILL.md")