From 3706a9764b6e73b7a155771956646ddef73f04a5 Mon Sep 17 00:00:00 2001 From: Trevin Chow Date: Sat, 28 Mar 2026 16:09:21 -0700 Subject: [PATCH] feat(ce-review): add headless mode for programmatic callers (#430) Co-authored-by: Claude Opus 4.6 (1M context) --- ...28-ce-review-headless-mode-requirements.md | 58 +++ ...8-001-feat-ce-review-headless-mode-plan.md | 330 ++++++++++++++++++ .../skills/ce-review/SKILL.md | 114 +++++- .../references/review-output-template.md | 14 +- .../skills/document-review/SKILL.md | 6 +- tests/review-skill-contract.test.ts | 44 +++ 6 files changed, 555 insertions(+), 11 deletions(-) create mode 100644 docs/brainstorms/2026-03-28-ce-review-headless-mode-requirements.md create mode 100644 docs/plans/2026-03-28-001-feat-ce-review-headless-mode-plan.md diff --git a/docs/brainstorms/2026-03-28-ce-review-headless-mode-requirements.md b/docs/brainstorms/2026-03-28-ce-review-headless-mode-requirements.md new file mode 100644 index 0000000..0b56720 --- /dev/null +++ b/docs/brainstorms/2026-03-28-ce-review-headless-mode-requirements.md @@ -0,0 +1,58 @@ +--- +date: 2026-03-28 +topic: ce-review-headless-mode +--- + +# ce:review Headless Mode + +## Problem Frame + +ce:review currently has three modes (interactive, autofix, report-only), but all assume some level of direct user interaction or have mode-specific behaviors that don't fit programmatic callers. When another skill needs code review results as structured input, there's no way to invoke ce:review without it trying to prompt a user or applying fixes with interactive-session assumptions. + +document-review solved this same problem in PR #425 with a `mode:headless` pattern. ce:review needs the same capability so it can be used as a utility skill by other workflows. + +## Requirements + +**Argument Parsing** +- R1. Add `mode:headless` argument, parsed alongside existing mode flags + +**Runtime Behavior** +- R2. In headless mode, apply `safe_auto` fixes silently (matching autofix behavior) +- R4. No `AskUserQuestion` or other interactive prompts in headless mode +- R5. End with a clear completion signal so callers can detect when the review is done + +**Output Format** +- R3. Return all non-auto findings (`gated_auto`, `manual`, `advisory`) as structured text output, preserving their original classifications (severity, autofix_class, owner, confidence, evidence[], pre_existing) +- R6. Follow document-review's structural output pattern (same envelope format, same section headings, similar parsing heuristics) while adapting per-finding fields to ce:review's own schema + +## Success Criteria + +- Another skill can invoke ce:review with `mode:headless`, receive structured findings, and act on them without any user interaction +- Output envelope (section headings, severity grouping, completion signal) is structurally consistent with document-review's headless output so callers can use a similar consumption pattern for both, while per-finding fields reflect ce:review's own schema + +## Scope Boundaries + +- Not changing the existing three modes (interactive, autofix, report-only) +- Not adding new reviewer personas or changing the review pipeline itself +- Not building a specific caller workflow in this change — just enabling the capability + +## Key Decisions + +- **Apply safe_auto fixes in headless**: Matches document-review's pattern where auto-fixes are applied silently and everything else is returned for the caller to handle +- **Structural consistency with document-review, not schema compatibility**: Same envelope and section headings, but per-finding fields use ce:review's own schema (which has different autofix_class values, owner, pre_existing, etc.). Callers will need skill-aware parsing for individual findings + +## Outstanding Questions + +### Deferred to Planning + +- [Affects R3][Technical] Exact structured output format — should it mirror document-review's text format verbatim, or adapt to ce:review's richer findings schema (which includes fields like `autofix_class`, `evidence[]`, `pre_existing` that document-review doesn't have)? +- [Affects R1][Technical] How `mode:headless` interacts with the existing mode parsing — is it a fourth mode, or an overlay that modifies report-only/autofix behavior? +- [Affects R5][Technical] What the completion signal looks like — "Review complete (headless mode)" text, or a more structured envelope? +- [Affects R2][Technical] Should headless mode write run artifacts (`.context/compound-engineering/ce-review//`) and create durable todo files like autofix, or suppress them like report-only? +- [Affects R1][Technical] How should headless mode handle checkout/branch switching in Stage 1? Programmatic callers may need the checkout to stay stable (like report-only) even though headless applies fixes (like autofix). +- [Affects R1][Technical] Error behavior when headless receives conflicting mode flags (e.g., `mode:headless` + existing mode flags) or missing diff scope (no changes, no PR). +- [Affects R2][Technical] Should headless mode support bounded re-review rounds (max_rounds: 2) like autofix, or be single-pass? + +## Next Steps + +-> `/ce:plan` for structured implementation planning diff --git a/docs/plans/2026-03-28-001-feat-ce-review-headless-mode-plan.md b/docs/plans/2026-03-28-001-feat-ce-review-headless-mode-plan.md new file mode 100644 index 0000000..9b9d198 --- /dev/null +++ b/docs/plans/2026-03-28-001-feat-ce-review-headless-mode-plan.md @@ -0,0 +1,330 @@ +--- +title: "feat(ce-review): Add headless mode for programmatic callers" +type: feat +status: completed +date: 2026-03-28 +origin: docs/brainstorms/2026-03-28-ce-review-headless-mode-requirements.md +--- + +# feat(ce-review): Add headless mode for programmatic callers + +## Overview + +Add `mode:headless` to ce:review so other skills can invoke it programmatically and receive structured findings without interactive prompts. Follows the pattern established by document-review's headless mode (PR #425). + +## Problem Frame + +ce:review has three modes (interactive, autofix, report-only), but none is designed for skill-to-skill invocation where the caller wants structured findings returned as parseable output. Autofix applies fixes and writes todos; report-only is read-only and outputs a human-readable report. Neither returns structured output for a calling workflow to consume and route. (see origin: `docs/brainstorms/2026-03-28-ce-review-headless-mode-requirements.md`) + +## Requirements Trace + +- R1. Add `mode:headless` argument, parsed alongside existing mode flags +- R2. In headless mode, apply `safe_auto` fixes silently (matching autofix behavior) +- R3. Return all non-auto findings as structured text output, preserving severity, autofix_class, owner, requires_verification, confidence, evidence[], pre_existing +- R4. No `AskUserQuestion` or other interactive prompts in headless mode +- R5. End with a clear completion signal so callers can detect when the review is done +- R6. Follow document-review's structural output *pattern* (completion header, metadata block, autofix-class-grouped findings, trailing sections) while using ce:review's own section headings and per-finding fields + +## Scope Boundaries + +- Not changing existing three modes (interactive, autofix, report-only) +- Not adding new reviewer personas or changing the review pipeline (Stages 3-5) +- Not building a specific caller workflow — just enabling the capability +- Not adding headless invocations to existing orchestrators (lfg, slfg) in this change + +## Context & Research + +### Relevant Code and Patterns + +- `plugins/compound-engineering/skills/ce-review/SKILL.md` — the skill to modify (mode detection at line 32, argument parsing at line 19, post-review flow at line 440) +- `plugins/compound-engineering/skills/ce-review/references/review-output-template.md` — existing output template with pipe-delimited tables and severity-grouped sections +- `plugins/compound-engineering/skills/ce-review/references/findings-schema.json` — ce:review's findings schema with `safe_auto|gated_auto|manual|advisory` autofix_class and `review-fixer|downstream-resolver|human|release` owner +- `plugins/compound-engineering/skills/document-review/SKILL.md` — headless mode pattern to follow (Phase 0 parsing, Phase 4 headless output, Phase 5 immediate return) +- `tests/review-skill-contract.test.ts` — contract test to extend + +### Institutional Learnings + +- `docs/solutions/skill-design/beta-promotion-orchestration-contract.md` — contract tests must be extended atomically with new mode flags +- `docs/solutions/skill-design/compound-refresh-skill-improvements.md` — explicit opt-in only for autonomous modes (no auto-detection from tool availability); conservative treatment of borderline cases +- `docs/solutions/skill-design/git-workflow-skills-need-explicit-state-machines-2026-03-27.md` — walk all mode x state combinations when adding a new mode branch +- `docs/solutions/agent-friendly-cli-principles.md` — structured parseable output with stable field contracts for programmatic callers + +## Key Technical Decisions + +- **Headless is a fourth explicit mode, not an overlay**: Each mode is self-contained with its own complete behavior specification. This avoids whack-a-mole regressions from overlay interactions (per state-machine learning). Headless has its own rules section parallel to autofix and report-only. + +- **No shared checkout switching, but NOT safe for concurrent use**: Headless follows report-only's checkout guard — if a PR/branch target is passed, headless must run in an isolated worktree or stop. However, unlike report-only, headless mutates files (applies safe_auto fixes). Callers must not run headless concurrently with other mutating operations on the same checkout. The headless rules section should explicitly state this. + +- **Single-pass, no re-review rounds**: Headless applies `safe_auto` fixes in one pass and returns. No bounded fixer loop. Rationale: autofix uses max_rounds:2 because it operates autonomously within a larger workflow; headless returns structured output to a caller that can re-invoke if needed. The caller owns the iteration decision, keeping headless simple and predictable. Applied fixes that introduce new issues will be caught on a subsequent invocation if the caller chooses to re-review. + +- **Write run artifacts, skip todos**: Run artifacts (`.context/compound-engineering/ce-review//`) provide an audit trail of what headless did. Todo files are skipped because the caller receives structured findings and routes downstream work itself. + +- **Reject conflicting mode flags**: `mode:headless` is incompatible with `mode:autofix` and `mode:report-only`. If multiple mode tokens appear, emit an error and stop. This follows the "fail fast with actionable errors" principle. + +- **Require diff scope with structured error**: Like document-review requiring a document path in headless mode, ce:review headless requires that a diff scope is determinable (branch, PR, or `base:` ref). If scope cannot be determined, emit a structured error: `Review failed (headless mode). Reason: `. No agents are dispatched. The same structured error format applies to conflicting mode flags. + +## Open Questions + +### Resolved During Planning + +- **Fourth mode vs overlay?** Fourth mode. Self-contained behavior avoids overlay ambiguity. (Grounded in state-machine learning and the fact that all three existing modes have independent rules sections.) +- **Artifacts and todos?** Write artifacts (audit trail), skip todos (caller routes findings). Headless owns mutation but not downstream handoff. +- **Checkout behavior?** No shared checkout switching. Same guard as report-only, since headless callers need stable checkouts. +- **Re-review rounds?** Single-pass. Callers can re-invoke if needed. + +### Deferred to Implementation + +- **Conflicting flags and missing scope error messages**: Decision made (reject with structured error), but exact wording and error envelope format deferred to implementation +- Whether the run artifact format needs any headless-specific metadata (e.g., marking the run as headless) + +## High-Level Technical Design + +> *This illustrates the intended approach and is directional guidance for review, not implementation specification. The implementing agent should treat it as context, not code to reproduce.* + +### Mode x Behavior Decision Matrix + +| Behavior | Interactive | Autofix | Report-only | **Headless** | +|----------|------------|---------|-------------|--------------| +| User questions | Yes | No | No | **No** | +| Checkout switching | Yes | Yes | No (worktree or stop) | **No (worktree or stop)** | +| Intent ambiguity | Ask user | Infer conservatively | Infer conservatively | **Infer conservatively** | +| Apply safe_auto fixes | After policy question | Automatically | Never | **safe_auto only, single pass** | +| Apply gated_auto/manual fixes | After user approval | Never | Never | **Never (returned in output)** | +| Re-review rounds | max_rounds: 2 | max_rounds: 2 | N/A | **Single pass (no re-review)** | +| Write run artifact | Yes | Yes | No | **Yes** | +| Create todo files | No (user decides) | Yes (downstream-resolver) | No | **No (caller routes)** | +| Structured text output | No (interactive report) | No (interactive report) | No (interactive report) | **Yes (headless envelope)** | +| Commit/push/PR | Offered | Never | Never | **Never** | +| Completion signal | N/A | Stops after artifacts | Stops after report | **"Review complete"** | +| Safe for concurrent use | No | No | Yes (read-only) | **No (mutates files)** | + +### Headless Output Envelope + +Follows document-review's structural pattern adapted for ce:review's schema: + +``` +Code review complete (headless mode). + +Scope: +Intent: +Reviewers: +Verdict: +Artifact: .context/compound-engineering/ce-review// + +Applied N safe_auto fixes. + +Gated-auto findings (concrete fix, changes behavior/contracts): + +[P1][gated_auto -> downstream-resolver][needs-verification] File: -- (<reviewer>, confidence <N>) + Why: <why_it_matters> + Suggested fix: <suggested_fix or "none"> + Evidence: <evidence[0]> + Evidence: <evidence[1]> + +Manual findings (actionable, needs handoff): + +[P1][manual -> downstream-resolver] File: <file:line> -- <title> (<reviewer>, confidence <N>) + Why: <why_it_matters> + Evidence: <evidence[0]> + +Advisory findings (report-only): + +[P2][advisory -> human] File: <file:line> -- <title> (<reviewer>, confidence <N>) + Why: <why_it_matters> + +Pre-existing issues: +- <file:line> -- <title> (<reviewer>) + +Residual risks: +- <risk> + +Testing gaps: +- <gap> +``` + +The `[needs-verification]` marker appears only on findings where `requires_verification: true`. The `Artifact:` line gives callers the path to the full run artifact for machine-readable access to the complete findings schema. The text envelope is the primary handoff; the artifact is for debugging and full-fidelity access. + +Findings with `owner: release` appear in the Advisory section (they are operational/rollout items, not code fixes). Findings with `pre_existing: true` appear in the Pre-existing section regardless of autofix_class. + +Omit any section with zero items. If all reviewers fail or time out, emit a degraded signal: `Code review degraded (headless mode). Reason: 0 of N reviewers returned results.` followed by "Review complete" so the caller can detect the failure and decide how to proceed. + +Then output "Review complete" as the terminal signal. + +## Implementation Units + +- [ ] **Unit 1: Mode Infrastructure** + +**Goal:** Add `mode:headless` to argument parsing, mode detection, and error handling for conflicting flags / missing scope. + +**Requirements:** R1, R4 + +**Dependencies:** None + +**Files:** +- Modify: `plugins/compound-engineering/skills/ce-review/SKILL.md` + +**Approach:** +- Add `mode:headless` row to the Argument Parsing token table (alongside `mode:autofix` and `mode:report-only`) +- Add headless row to the Mode Detection table with behavior summary +- Add a "Headless mode rules" subsection parallel to "Autofix mode rules" and "Report-only mode rules" +- Update the `argument-hint` frontmatter to include `mode:headless` +- Add conflicting-flag guard: if multiple mode tokens appear in arguments, emit an error message listing the conflict and stop +- Add scope-required guard: if headless mode cannot determine diff scope without user interaction, emit an error with re-invocation syntax (matching document-review's nil-path pattern) + +**Patterns to follow:** +- Existing mode detection table structure at SKILL.md line 34 +- Existing mode rules subsections at SKILL.md lines 40-54 +- document-review Phase 0 parsing and nil-path guard at document-review SKILL.md lines 12-37 + +**Test scenarios:** +- Happy path: `mode:headless` token is parsed and headless mode is activated +- Happy path: `mode:headless` with a branch name or PR number parses both correctly +- Error path: `mode:headless mode:autofix` is rejected with a clear error +- Error path: `mode:headless mode:report-only` is rejected with a clear error +- Edge case: `mode:headless` alone with no branch/PR and no determinable scope emits a scope-required error + +**Verification:** +- SKILL.md contains `mode:headless` in argument-hint, token table, mode detection table, and a dedicated rules subsection +- Conflicting-flag and missing-scope guard text is present + +--- + +- [ ] **Unit 2: Pipeline Behavior Adjustments** + +**Goal:** Add headless-specific behavior for Stage 1 (checkout guard) and Stage 2 (intent ambiguity). + +**Requirements:** R1, R4 + +**Dependencies:** Unit 1 + +**Files:** +- Modify: `plugins/compound-engineering/skills/ce-review/SKILL.md` + +**Approach:** +- In Stage 1 scope detection, add headless to the checkout guard alongside report-only: `mode:headless` and `mode:report-only` must not run `gh pr checkout` or `git checkout` on the shared checkout. They must run in an isolated worktree or stop. When headless stops due to the checkout guard, emit a structured error with re-invocation syntax (e.g., "Re-invoke with base:\<ref\> to review the current checkout, or run from an isolated worktree."). +- In Stage 1 untracked file handling, add headless behavior: if the UNTRACKED list is non-empty, proceed with tracked changes only and note excluded files in the Coverage section of the structured output. Never stop to ask the user — this matches the "infer conservatively" pattern. +- In Stage 2 intent discovery, add headless to the non-interactive path alongside autofix and report-only: infer intent conservatively, note uncertainty in Coverage/Verdict reasoning instead of blocking. +- All changes are small additions to existing conditional text — add headless to the existing mode lists where report-only and autofix are already distinguished. + +**Patterns to follow:** +- Existing report-only checkout guard at SKILL.md line 53 ("mode:report-only cannot switch the shared checkout") +- Existing autofix/report-only intent handling at SKILL.md (~line 298) + +**Test scenarios:** +- Happy path: headless mode with a PR target uses a worktree or stops instead of switching the shared checkout +- Happy path: headless mode infers intent conservatively when diff metadata is thin +- Happy path: headless mode with untracked files proceeds with tracked changes only and notes exclusions +- Error path: headless stops due to checkout guard and emits re-invocation syntax + +**Verification:** +- SKILL.md mentions headless alongside report-only in checkout guard sections +- SKILL.md mentions headless alongside autofix/report-only in intent discovery sections +- SKILL.md specifies headless behavior for untracked files (proceed, don't prompt) + +--- + +- [ ] **Unit 3: Headless Output Format and Post-Review Flow** + +**Goal:** Define the headless structured text output and the headless post-review behavior (apply safe_auto, write artifacts, skip todos, output structured text, return completion signal). + +**Requirements:** R2, R3, R4, R5, R6 + +**Dependencies:** Unit 1, Unit 2 + +**Files:** +- Modify: `plugins/compound-engineering/skills/ce-review/SKILL.md` +- Modify: `plugins/compound-engineering/skills/ce-review/references/review-output-template.md` + +**Approach:** + +*Stage 6 output:* +- Add a headless-specific output section to SKILL.md that defines the structured text envelope format +- The envelope follows document-review's structural pattern: completion header, metadata (scope/intent/reviewers/verdict), applied fixes count, findings grouped by autofix_class with severity/route/file/line per finding, trailing sections (pre-existing, residual risks, testing gaps) +- Per-finding format: `[severity][autofix_class -> owner] File: <file:line> -- <title> (<reviewer>, confidence <N>)` with Why and Suggested fix lines +- Omit sections with zero items +- In headless mode, output this structured text instead of the interactive pipe-delimited table report + +*Post-review flow (After Review section):* +- Add "Headless mode" to Step 2 (Choose policy by mode) parallel to autofix and report-only +- Headless rules: ask no questions; apply `safe_auto -> review-fixer` queue in a single pass (no re-review rounds); skip Step 3's bounded loop entirely +- Step 4 (Emit artifacts): headless writes run artifacts (like autofix) but does NOT create todo files (caller handles routing from structured output) +- Step 5: headless stops after structured text output and "Review complete" signal. No commit/push/PR. + +*Review output template:* +- Add a "Headless mode format" section to `review-output-template.md` with the structured text template and formatting rules +- Update the Mode line documentation to include `headless` + +**Patterns to follow:** +- document-review headless output format at document-review SKILL.md lines 219-248 +- Existing autofix and report-only post-review steps at SKILL.md lines 471-483 +- Existing review-output-template.md formatting rules + +**Test scenarios:** +- Happy path: headless mode with safe_auto findings applies fixes and returns structured output listing remaining findings +- Happy path: headless mode with no actionable findings returns "Applied 0 safe_auto fixes" and the completion signal +- Happy path: headless mode with mixed findings (safe_auto + gated_auto + manual + advisory) applies safe_auto, returns all others in structured output grouped by autofix_class +- Edge case: headless mode with only advisory findings returns structured output with no fixes applied +- Edge case: headless mode with only pre-existing findings separates them into the pre-existing section +- Integration: headless output includes Verdict line so callers can make merge decisions +- Integration: run artifact is written under `.context/compound-engineering/ce-review/<run-id>/` +- Error path: clean review (zero findings) returns the completion signal with no findings sections + +**Verification:** +- SKILL.md has a headless output format section with the structured text envelope +- review-output-template.md includes headless mode format +- Post-review flow has a headless branch in Steps 2, 4, and 5 +- No AskUserQuestion or interactive prompts reachable in headless mode + +--- + +- [ ] **Unit 4: Contract Test Extension** + +**Goal:** Extend `tests/review-skill-contract.test.ts` to assert headless mode contract invariants. + +**Requirements:** R1, R4, R5 + +**Dependencies:** Units 1-3 + +**Files:** +- Modify: `tests/review-skill-contract.test.ts` +- Test: `tests/review-skill-contract.test.ts` + +**Approach:** +- Add assertions to the existing "documents explicit modes and orchestration boundaries" test for headless mode presence +- Add a new test case for headless-specific contract invariants: completion signal text, no-checkout-switching guard, artifact behavior, no-todo rule, structured output format presence, conflicting-flags guard +- Assert `mode:headless` appears in argument-hint and mode detection table +- Assert headless rules section exists with key behavioral commitments + +**Patterns to follow:** +- Existing contract test structure at `tests/review-skill-contract.test.ts` — string containment assertions against SKILL.md content + +**Test scenarios:** +- Happy path: contract test passes with all headless mode assertions +- Edge case: if any headless rule text is accidentally removed from SKILL.md, the contract test fails + +**Verification:** +- `bun test tests/review-skill-contract.test.ts` passes +- Test covers: mode detection, checkout guard, artifact/todo behavior, completion signal, conflicting flags guard + +## System-Wide Impact + +- **Interaction graph:** No new callbacks or middleware. Headless mode is a new branch in existing mode-dispatch logic. Existing callers (lfg, slfg) are not changed — they continue using autofix and report-only. +- **Error propagation:** New error paths (conflicting flags, missing scope) emit text errors and stop. No cascading failure risk. +- **State lifecycle risks:** Headless writes run artifacts but not todos. A caller that expects todos from headless would get none — this is intentional and documented. +- **API surface parity:** Headless mode is a new API surface for skill-to-skill invocation. Future orchestrators may adopt it, but existing ones are unchanged. +- **Unchanged invariants:** Stages 3-5 (reviewer selection, sub-agent dispatch, merge/dedup pipeline) are completely unchanged. The findings schema is unchanged. The confidence threshold (0.60) is unchanged. + +## Risks & Dependencies + +| Risk | Mitigation | +|------|------------| +| Headless checkout guard text diverges from report-only over time | Both share the same guard language — mention headless alongside report-only in the same sentences so they stay in sync | +| Caller assumes headless creates todos and depends on them | Headless rules section explicitly states no todos; contract test asserts it | +| Structured output format drifts from document-review's envelope | Format is documented in review-output-template.md and tested by contract; changes require deliberate updates | + +## Sources & References + +- **Origin document:** [docs/brainstorms/2026-03-28-ce-review-headless-mode-requirements.md](docs/brainstorms/2026-03-28-ce-review-headless-mode-requirements.md) +- Related code: `plugins/compound-engineering/skills/ce-review/SKILL.md`, `plugins/compound-engineering/skills/document-review/SKILL.md` +- Related PRs: #425 (document-review headless mode) +- Learnings: `docs/solutions/skill-design/beta-promotion-orchestration-contract.md`, `docs/solutions/skill-design/compound-refresh-skill-improvements.md`, `docs/solutions/skill-design/git-workflow-skills-need-explicit-state-machines-2026-03-27.md` diff --git a/plugins/compound-engineering/skills/ce-review/SKILL.md b/plugins/compound-engineering/skills/ce-review/SKILL.md index 2cbbc89..597f621 100644 --- a/plugins/compound-engineering/skills/ce-review/SKILL.md +++ b/plugins/compound-engineering/skills/ce-review/SKILL.md @@ -1,7 +1,7 @@ --- name: ce:review description: "Structured code review using tiered persona agents, confidence-gated findings, and a merge/dedup pipeline. Use when reviewing code changes before creating a PR." -argument-hint: "[mode:autofix|mode:report-only] [PR number, GitHub URL, or branch name]" +argument-hint: "[mode:autofix|mode:report-only|mode:headless] [PR number, GitHub URL, or branch name]" --- # Code Review @@ -24,11 +24,14 @@ Parse `$ARGUMENTS` for the following optional tokens. Strip each recognized toke |-------|---------|--------| | `mode:autofix` | `mode:autofix` | Select autofix mode (see Mode Detection below) | | `mode:report-only` | `mode:report-only` | Select report-only mode | +| `mode:headless` | `mode:headless` | Select headless mode for programmatic callers (see Mode Detection below) | | `base:<sha-or-ref>` | `base:abc1234` or `base:origin/main` | Skip scope detection — use this as the diff base directly | | `plan:<path>` | `plan:docs/plans/2026-03-25-001-feat-foo-plan.md` | Load this plan for requirements verification | All tokens are optional. Each one present means one less thing to infer. When absent, fall back to existing behavior for that stage. +**Conflicting mode flags:** If multiple mode tokens appear in arguments, stop and do not dispatch agents. If `mode:headless` is one of the conflicting tokens, emit the headless error envelope: `Review failed (headless mode). Reason: conflicting mode flags — <mode_a> and <mode_b> cannot be combined.` Otherwise emit the generic form: `Review failed. Reason: conflicting mode flags — <mode_a> and <mode_b> cannot be combined.` + ## Mode Detection | Mode | When | Behavior | @@ -36,6 +39,7 @@ All tokens are optional. Each one present means one less thing to infer. When ab | **Interactive** (default) | No mode token present | Review, present findings, ask for policy decisions when needed, and optionally continue into fix/push/PR next steps | | **Autofix** | `mode:autofix` in arguments | No user interaction. Review, apply only policy-allowed `safe_auto` fixes, re-review in bounded rounds, write a run artifact, and emit residual downstream work when needed | | **Report-only** | `mode:report-only` in arguments | Strictly read-only. Review and report only, then stop with no edits, artifacts, todos, commits, pushes, or PR actions | +| **Headless** | `mode:headless` in arguments | Programmatic mode for skill-to-skill invocation. Apply `safe_auto` fixes silently (single pass), return all other findings as structured text output, write run artifacts, skip todos, and return "Review complete" signal. No interactive prompts. | ### Autofix mode rules @@ -53,6 +57,19 @@ All tokens are optional. Each one present means one less thing to infer. When ab - **Do not switch the shared checkout.** If the caller passes an explicit PR or branch target, `mode:report-only` must run in an isolated checkout/worktree or stop instead of running `gh pr checkout` / `git checkout`. - **Do not overlap mutating review with browser testing on the same checkout.** If a future orchestrator wants fixes, run the mutating review phase after browser testing or in an isolated checkout/worktree. +### Headless mode rules + +- **Skip all user questions.** Never use the platform question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini) or other interactive prompts. Infer intent conservatively if the diff metadata is thin. +- **Require a determinable diff scope.** If headless mode cannot determine a diff scope (no branch, PR, or `base:` ref determinable without user interaction), emit `Review failed (headless mode). Reason: no diff scope detected. Re-invoke with a branch name, PR number, or base:<ref>.` and stop without dispatching agents. +- **Apply only `safe_auto -> review-fixer` findings in a single pass.** No bounded re-review rounds. Leave `gated_auto`, `manual`, `human`, and `release` work unresolved and return them in the structured output. +- **Return all non-auto findings as structured text output.** Use the headless output envelope format (see Stage 6 below) preserving severity, autofix_class, owner, requires_verification, confidence, evidence[], and pre_existing per finding. +- **Write a run artifact** under `.context/compound-engineering/ce-review/<run-id>/` summarizing findings, applied fixes, and advisory outputs. Include the artifact path in the structured output. +- **Do not create todo files.** The caller receives structured findings and routes downstream work itself. +- **Do not switch the shared checkout.** If the caller passes an explicit PR or branch target, `mode:headless` must run in an isolated checkout/worktree or stop instead of running `gh pr checkout` / `git checkout`. When stopping, emit `Review failed (headless mode). Reason: cannot switch shared checkout. Re-invoke with base:<ref> to review the current checkout, or run from an isolated worktree.` +- **Not safe for concurrent use on a shared checkout.** Unlike `mode:report-only`, headless mutates files (applies `safe_auto` fixes). Callers must not run headless concurrently with other mutating operations on the same checkout. +- **Never commit, push, or create a PR** from headless mode. The caller owns those decisions. +- **End with "Review complete" as the terminal signal** so callers can detect completion. If all reviewers fail or time out, emit `Code review degraded (headless mode). Reason: 0 of N reviewers returned results.` followed by "Review complete". + ## Severity Scale All reviewers use P0-P3: @@ -164,7 +181,7 @@ This path works with any ref — a SHA, `origin/main`, a branch name. Automated **If a PR number or GitHub URL is provided as an argument:** -If `mode:report-only` is active, do **not** run `gh pr checkout <number-or-url>` on the shared checkout. Tell the caller: "mode:report-only cannot switch the shared checkout to review a PR target. Run it from an isolated worktree/checkout for that PR, or run report-only with no target argument on the already checked out branch." Stop here unless the review is already running in an isolated checkout. +If `mode:report-only` or `mode:headless` is active, do **not** run `gh pr checkout <number-or-url>` on the shared checkout. For `mode:report-only`, tell the caller: "mode:report-only cannot switch the shared checkout to review a PR target. Run it from an isolated worktree/checkout for that PR, or run report-only with no target argument on the already checked out branch." For `mode:headless`, emit `Review failed (headless mode). Reason: cannot switch shared checkout. Re-invoke with base:<ref> to review the current checkout, or run from an isolated worktree.` Stop here unless the review is already running in an isolated checkout. First, verify the worktree is clean before switching branches: @@ -218,7 +235,7 @@ Extract PR title/body, base branch, and PR URL from `gh pr view`, then extract t Check out the named branch, then diff it against the base branch. Substitute the provided branch name (shown here as `<branch>`). -If `mode:report-only` is active, do **not** run `git checkout <branch>` on the shared checkout. Tell the caller: "mode:report-only cannot switch the shared checkout to review another branch. Run it from an isolated worktree/checkout for `<branch>`, or run report-only on the current checkout with no target argument." Stop here unless the review is already running in an isolated checkout. +If `mode:report-only` or `mode:headless` is active, do **not** run `git checkout <branch>` on the shared checkout. For `mode:report-only`, tell the caller: "mode:report-only cannot switch the shared checkout to review another branch. Run it from an isolated worktree/checkout for `<branch>`, or run report-only on the current checkout with no target argument." For `mode:headless`, emit `Review failed (headless mode). Reason: cannot switch shared checkout. Re-invoke with base:<ref> to review the current checkout, or run from an isolated worktree.` Stop here unless the review is already running in an isolated checkout. First, verify the worktree is clean before switching branches: @@ -270,7 +287,7 @@ echo "BASE:$BASE" && echo "FILES:" && git diff --name-only $BASE && echo "DIFF:" Using `git diff $BASE` (without `..HEAD`) diffs the merge-base against the working tree, which includes committed, staged, and unstaged changes together. -**Untracked file handling:** Always inspect the `UNTRACKED:` list, even when `FILES:`/`DIFF:` are non-empty. Untracked files are outside review scope until staged. If the list is non-empty, tell the user which files are excluded. If any of them should be reviewed, stop and tell the user to `git add` them first and rerun. Only continue when the user is intentionally reviewing tracked changes only. +**Untracked file handling:** Always inspect the `UNTRACKED:` list, even when `FILES:`/`DIFF:` are non-empty. Untracked files are outside review scope until staged. If the list is non-empty, tell the user which files are excluded. If any of them should be reviewed, stop and tell the user to `git add` them first and rerun. Only continue when the user is intentionally reviewing tracked changes only. In `mode:headless` or `mode:autofix`, do not stop to ask — proceed with tracked changes only and note the excluded untracked files in the Coverage section of the output. ### Stage 2: Intent discovery @@ -298,7 +315,7 @@ Pass this to every reviewer in their spawn prompt. Intent shapes *how hard each **When intent is ambiguous:** - **Interactive mode:** Ask one question using the platform's interactive question tool (AskUserQuestion in Claude Code, request_user_input in Codex): "What is the primary goal of these changes?" Do not spawn reviewers until intent is established. -- **Autofix/report-only modes:** Infer intent conservatively from the branch name, diff, PR metadata, and caller context. Note the uncertainty in Coverage or Verdict reasoning instead of blocking. +- **Autofix/report-only/headless modes:** Infer intent conservatively from the branch name, diff, PR metadata, and caller context. Note the uncertainty in Coverage or Verdict reasoning instead of blocking. ### Stage 2b: Plan discovery (requirements verification) @@ -420,6 +437,80 @@ Assemble the final report using the review output template included below: Do not include time estimates. +### Headless output format + +In `mode:headless`, replace the interactive pipe-delimited table report with a structured text envelope. The envelope follows the same structural pattern as document-review's headless output (completion header, metadata block, findings grouped by autofix_class, trailing sections) while using ce:review's own section headings and per-finding fields. + +``` +Code review complete (headless mode). + +Scope: <scope-line> +Intent: <intent-summary> +Reviewers: <reviewer-list with conditional justifications> +Verdict: <Ready to merge | Ready with fixes | Not ready> +Artifact: .context/compound-engineering/ce-review/<run-id>/ + +Applied N safe_auto fixes. + +Gated-auto findings (concrete fix, changes behavior/contracts): + +[P1][gated_auto -> downstream-resolver][needs-verification] File: <file:line> -- <title> (<reviewer>, confidence <N>) + Why: <why_it_matters> + Suggested fix: <suggested_fix or "none"> + Evidence: <evidence[0]> + Evidence: <evidence[1]> + +Manual findings (actionable, needs handoff): + +[P1][manual -> downstream-resolver] File: <file:line> -- <title> (<reviewer>, confidence <N>) + Why: <why_it_matters> + Evidence: <evidence[0]> + +Advisory findings (report-only): + +[P2][advisory -> human] File: <file:line> -- <title> (<reviewer>, confidence <N>) + Why: <why_it_matters> + +Pre-existing issues: +[P2][gated_auto -> downstream-resolver] File: <file:line> -- <title> (<reviewer>, confidence <N>) + Why: <why_it_matters> + +Residual risks: +- <risk> + +Learnings & Past Solutions: +- <learning> + +Agent-Native Gaps: +- <gap description> + +Schema Drift Check: +- <drift status> + +Deployment Notes: +- <deployment note> + +Testing gaps: +- <gap> + +Coverage: +- Suppressed: <N> findings below 0.60 confidence +- Untracked files excluded: <file1>, <file2> +- Failed reviewers: <reviewer> + +Review complete +``` + +**Formatting rules:** +- The `[needs-verification]` marker appears only on findings where `requires_verification: true`. +- The `Artifact:` line gives callers the path to the full run artifact for machine-readable access to the complete findings schema. The text envelope is the primary handoff; the artifact is for debugging and full-fidelity access. +- Findings with `owner: release` appear in the Advisory section (they are operational/rollout items, not code fixes). +- Findings with `pre_existing: true` appear in the Pre-existing section regardless of autofix_class. +- The Verdict appears in the metadata header (deliberately reordered from the interactive format where it appears at the bottom) so programmatic callers get the verdict first. +- Omit any section with zero items. +- If all reviewers fail or time out, emit `Code review degraded (headless mode). Reason: 0 of N reviewers returned results.` followed by "Review complete". +- End with "Review complete" as the terminal signal so callers can detect completion. + ## Quality Gates Before delivering the review, verify: @@ -482,6 +573,15 @@ After presenting findings and verdict (Stage 6), route the next steps by mode. R - Do not create residual todos or `.context` artifacts. - Stop after Stage 6. Everything remains in the report. +**Headless mode** + +- Ask no questions. +- Apply only the `safe_auto -> review-fixer` queue in a single pass. Do not enter the bounded re-review loop (Step 3). Spawn one fixer subagent, apply fixes, then proceed directly to Step 4. +- Leave `gated_auto`, `manual`, `human`, and `release` items unresolved — they appear in the structured text output. +- Output the headless output envelope (see Stage 6) instead of the interactive report. +- Write a run artifact (Step 4) but do not create todo files. +- Stop after the structured text output and "Review complete" signal. No commit/push/PR. + #### Step 3: Apply fixes with one fixer and bounded rounds - Spawn exactly one fixer subagent for the current fixer queue in the current checkout. That fixer applies all approved changes and runs the relevant targeted tests in one pass against a consistent tree. @@ -493,7 +593,7 @@ After presenting findings and verdict (Stage 6), route the next steps by mode. R #### Step 4: Emit artifacts and downstream handoff -- In interactive and autofix modes, write a per-run artifact under `.context/compound-engineering/ce-review/<run-id>/` containing: +- In interactive, autofix, and headless modes, write a per-run artifact under `.context/compound-engineering/ce-review/<run-id>/` containing: - synthesized findings - applied fixes - residual actionable work @@ -521,7 +621,7 @@ After presenting findings and verdict (Stage 6), route the next steps by mode. R If "Create a PR": first publish the branch with `git push --set-upstream origin HEAD`, then use `gh pr create` with a title and summary derived from the branch changes. If "Push fixes": push the branch with `git push` to update the existing PR. -**Autofix and report-only modes:** stop after the report, artifact emission, and residual-work handoff. Do not commit, push, or create a PR. +**Autofix, report-only, and headless modes:** stop after the report, artifact emission, and residual-work handoff. Do not commit, push, or create a PR. ## Fallback diff --git a/plugins/compound-engineering/skills/ce-review/references/review-output-template.md b/plugins/compound-engineering/skills/ce-review/references/review-output-template.md index a2ca65c..ec1a5d5 100644 --- a/plugins/compound-engineering/skills/ce-review/references/review-output-template.md +++ b/plugins/compound-engineering/skills/ce-review/references/review-output-template.md @@ -101,7 +101,7 @@ Use this **exact format** when presenting synthesized review findings. Findings - **Confidence column** shows the finding's confidence score - **Route column** shows the synthesized handling decision as ``<autofix_class> -> <owner>``. - **Header includes** scope, intent, and reviewer team with per-conditional justifications -- **Mode line** -- include `interactive`, `autofix`, or `report-only` +- **Mode line** -- include `interactive`, `autofix`, `report-only`, or `headless` - **Applied Fixes section** -- include only when a fix phase ran in this review invocation - **Residual Actionable Work section** -- include only when unresolved actionable findings were handed off for later work - **Pre-existing section** -- separate table, no confidence column (these are informational) @@ -113,3 +113,15 @@ Use this **exact format** when presenting synthesized review findings. Findings - **Summary uses blockquotes** for verdict, reasoning, and fix order - **Horizontal rule** (`---`) separates findings from verdict - **`###` headers** for each section -- never plain text headers + +## Headless Mode Format + +In `mode:headless`, replace the interactive pipe-delimited table report with a structured text envelope. The headless format is defined in the `### Headless output format` section of SKILL.md. Key differences from the interactive format: + +- **No pipe-delimited tables.** Findings use `[severity][autofix_class -> owner] File: <file:line> -- <title>` line format with indented Why/Evidence/Suggested fix lines. +- **Findings grouped by autofix_class** (gated-auto, manual, advisory) instead of severity. Within each group, findings are sorted by severity. +- **Verdict in header** (top of output) instead of bottom, so programmatic callers get it first. +- **`Artifact:` line** in metadata header gives callers the path to the full run artifact. +- **`[needs-verification]` marker** on findings where `requires_verification: true`. +- **Evidence lines** included per finding. +- **Completion signal:** "Review complete" as the final line. diff --git a/plugins/compound-engineering/skills/document-review/SKILL.md b/plugins/compound-engineering/skills/document-review/SKILL.md index f11c130..8b39290 100644 --- a/plugins/compound-engineering/skills/document-review/SKILL.md +++ b/plugins/compound-engineering/skills/document-review/SKILL.md @@ -1,7 +1,7 @@ --- name: document-review description: Review requirements or plan documents using parallel persona agents that surface role-specific issues. Use when a requirements document or plan document exists and the user wants to improve it. -argument-hint: "[path/to/document.md]" +argument-hint: "[mode:headless] [path/to/document.md]" --- # Document Review @@ -23,7 +23,7 @@ The caller receives findings with their original classifications intact and deci Callers invoke headless mode by including `mode:headless` in the skill arguments, e.g.: ``` -Skill("compound-engineering:document-review", "docs/plans/my-plan.md mode:headless") +Skill("compound-engineering:document-review", "mode:headless docs/plans/my-plan.md") ``` If `mode:headless` is not present, the skill runs in its default interactive mode with no behavior change. @@ -34,7 +34,7 @@ If `mode:headless` is not present, the skill runs in its default interactive mod **If no document is specified (interactive mode):** Ask which document to review, or find the most recent in `docs/brainstorms/` or `docs/plans/` using a file-search/glob tool (e.g., Glob in Claude Code). -**If no document is specified (headless mode):** Output "Review failed: headless mode requires a document path. Re-invoke with: Skill(\"compound-engineering:document-review\", \"<path> mode:headless\")" without dispatching agents. +**If no document is specified (headless mode):** Output "Review failed: headless mode requires a document path. Re-invoke with: Skill(\"compound-engineering:document-review\", \"mode:headless <path>\")" without dispatching agents. ### Classify Document Type diff --git a/tests/review-skill-contract.test.ts b/tests/review-skill-contract.test.ts index a6d55b8..726d703 100644 --- a/tests/review-skill-contract.test.ts +++ b/tests/review-skill-contract.test.ts @@ -14,6 +14,7 @@ describe("ce-review contract", () => { expect(content).toContain("## Mode Detection") expect(content).toContain("mode:autofix") expect(content).toContain("mode:report-only") + expect(content).toContain("mode:headless") expect(content).toContain(".context/compound-engineering/ce-review/<run-id>/") expect(content).toContain("Do not create residual todos or `.context` artifacts.") expect(content).toContain( @@ -25,6 +26,49 @@ describe("ce-review contract", () => { expect(content).not.toContain("Which severities should I fix?") }) + test("documents headless mode contract for programmatic callers", async () => { + const content = await readRepoFile("plugins/compound-engineering/skills/ce-review/SKILL.md") + + // Headless mode has its own rules section + expect(content).toContain("### Headless mode rules") + + // No interactive prompts (cross-platform) + expect(content).toContain( + "Never use the platform question tool", + ) + + // Structured output format + expect(content).toContain("### Headless output format") + expect(content).toContain("Code review complete (headless mode).") + expect(content).toContain('"Review complete" as the terminal signal') + + // Applies safe_auto fixes but NOT safe for concurrent use + expect(content).toContain( + "Not safe for concurrent use on a shared checkout.", + ) + + // Writes artifacts but no todos, no commit/push/PR + expect(content).toContain("Do not create todo files.") + expect(content).toContain( + "Never commit, push, or create a PR", + ) + + // Single-pass fixing, no bounded re-review rounds + expect(content).toContain("No bounded re-review rounds") + + // Checkout guard — headless shares report-only's guard + expect(content).toMatch(/mode:headless.*must run in an isolated checkout\/worktree or stop/) + + // Conflicting mode flags + expect(content).toContain("**Conflicting mode flags:**") + + // Structured error for missing scope + expect(content).toContain("Review failed (headless mode). Reason: no diff scope detected.") + + // Degraded signal when all reviewers fail + expect(content).toContain("Code review degraded (headless mode).") + }) + test("documents policy-driven routing and residual handoff", async () => { const content = await readRepoFile("plugins/compound-engineering/skills/ce-review/SKILL.md")