feat: add ce:review-beta with structured persona pipeline (#348)
@@ -0,0 +1,316 @@
---
title: "feat: Make ce:review-beta autonomous and pipeline-safe"
type: feat
status: active
date: 2026-03-23
origin: direct user request and planning discussion on ce:review-beta standalone vs. autonomous pipeline behavior
---

# Make ce:review-beta Autonomous and Pipeline-Safe

## Overview

Redesign `ce:review-beta` from a purely interactive standalone review workflow into a policy-driven review engine that supports three explicit modes: `interactive`, `autonomous`, and `report-only`. The redesign should preserve the current standalone UX for manual review, enable hands-off review and safe autofix in automated workflows, and define a clean residual-work handoff for anything that should not be auto-fixed. This plan remains beta-only; promotion to stable `ce:review` and any `lfg` / `slfg` cutover should happen only in a follow-up plan after the beta behavior is validated.

## Problem Frame

`ce:review-beta` currently mixes three responsibilities in one loop:

1. Review and synthesis
2. Human approval on what to fix
3. Local fixing, re-review, and push/PR next steps

That is acceptable for standalone use, but it is the wrong shape for autonomous orchestration:

- `lfg` currently treats review as an upstream producer before downstream resolution and browser testing
- `slfg` currently runs review and browser testing in parallel, which is only safe if review is non-mutating
- `resolve-todo-parallel` expects a durable residual-work contract (`todos/`), while `ce:review-beta` currently tries to resolve accepted findings inline
- The findings schema lacks routing metadata, so severity is doing too much work; urgency and autofix eligibility are distinct concerns

The result is a workflow that is hard to promote safely: it can be interactive, or autonomous, or mutation-owning, but not all three at once without an explicit mode model and clearer ownership boundaries.

## Requirements Trace

- R1. `ce:review-beta` supports explicit execution modes: `interactive` (default), `autonomous`, and `report-only`
- R2. `autonomous` mode never asks the user questions, never waits for approval, and applies only policy-allowed safe fixes
- R3. `report-only` mode is strictly read-only and safe to run in parallel with other read-only verification steps
- R4. Findings are routed by explicit fixability metadata, not by severity alone
- R5. `ce:review-beta` can run one bounded in-skill autofix pass for `safe_auto` findings and then re-review the changed scope
- R6. Residual actionable findings are emitted as durable downstream work artifacts; advisory outputs remain report-only
- R7. CE helper outputs (`learnings`, `agent-native`, `schema-drift`, `deployment-verification`) are preserved but only some become actionable work items
- R8. The beta contract makes future orchestration constraints explicit so a later `lfg` / `slfg` cutover does not run a mutating review concurrently with browser testing on the same checkout
- R9. Repeated regression classes around interaction mode, routing, and orchestration boundaries gain lightweight contract coverage

## Scope Boundaries

- Keep the existing persona ensemble, confidence gate, and synthesis model as the base architecture
- Do not redesign every reviewer persona's prompt beyond the metadata they need to emit
- Do not introduce a new general-purpose orchestration framework; reuse existing skill patterns where possible
- Do not auto-fix deployment checklists, residual risks, or other advisory-only outputs
- Do not attempt broad converter/platform work in this change unless the review skill's frontmatter or references require it
- Beta remains the only implementation target in this plan; stable promotion is intentionally deferred to a follow-up plan after validation

## Context & Research

### Relevant Code and Patterns

- `plugins/compound-engineering/skills/ce-review-beta/SKILL.md`
  - Current staged review pipeline with interactive severity acceptance, inline fixer, re-review offer, and post-fix push/PR actions
- `plugins/compound-engineering/skills/ce-review-beta/references/findings-schema.json`
  - Structured persona finding contract today; currently missing routing metadata for autonomous handling
- `plugins/compound-engineering/skills/ce-review/SKILL.md`
  - Current stable review workflow; creates durable `todos/` artifacts rather than fixing findings inline
- `plugins/compound-engineering/skills/resolve-todo-parallel/SKILL.md`
  - Existing residual-work resolver; parallelizes item handling once work has already been externalized
- `plugins/compound-engineering/skills/file-todos/SKILL.md`
  - Existing review -> triage -> todo -> resolve integration contract
- `plugins/compound-engineering/skills/lfg/SKILL.md`
  - Sequential orchestrator whose future cutover constraints should inform the beta contract, even though this plan does not modify it
- `plugins/compound-engineering/skills/slfg/SKILL.md`
  - Swarm orchestrator whose current review/browser parallelism defines an important future integration constraint, even though this plan does not modify it
- `plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md`
  - Strong repo precedent for explicit `mode:autonomous` argument handling and conservative non-interactive behavior
- `plugins/compound-engineering/skills/ce-plan/SKILL.md`
  - Strong repo precedent for pipeline mode skipping interactive questions

### Institutional Learnings

- `docs/solutions/skill-design/compound-refresh-skill-improvements.md`
  - Explicit autonomous mode beats tool-based auto-detection
  - Ambiguous cases in autonomous mode should be recorded conservatively, not guessed
  - Report structure should distinguish applied actions from recommended follow-up
- `docs/solutions/skill-design/beta-skills-framework.md`
  - Beta skills should remain isolated until validated
  - Promotion is the right time to rewire `lfg` / `slfg`, which is out of scope for this plan

### External Research Decision

Skipped. This is a repo-internal orchestration and skill-design change with strong existing local patterns for autonomous mode, beta promotion, and residual-work handling.

## Key Technical Decisions

- **Use explicit mode arguments instead of auto-detection.** Follow `ce:compound-refresh` and require `mode:autonomous` / `mode:report-only` arguments. Interactive remains the default. This avoids conflating "no question tool" with "headless workflow."
- **Split review from mutation semantically, not by creating two separate skills.** `ce:review-beta` should always perform the same review and synthesis stages. Mutation behavior becomes a mode-controlled phase layered on top.
- **Route by fixability, not severity.** Add explicit per-finding routing fields such as `autofix_class`, `owner`, and `requires_verification`. Severity remains urgency; it no longer implies who acts.
- **Keep one in-skill fixer, but only for `safe_auto` findings.** The current "one fixer subagent" rule is still right for consistent-tree edits. The change is that the fixer is selected by policy and routing metadata, not by an interactive severity prompt.
- **Emit both ephemeral and durable outputs.** Use `.context/compound-engineering/ce-review-beta/<run-id>/` for the per-run machine-readable report and create durable `todos/` items only for unresolved actionable findings that belong downstream.
- **Treat CE helper outputs by artifact class.**
  - `learnings-researcher`: contextual/advisory unless a concrete finding corroborates it
  - `agent-native-reviewer`: often `gated_auto` or `manual`, occasionally `safe_auto` when the fix is purely local and mechanical
  - `schema-drift-detector`: default `manual` or `gated_auto`; never auto-fix blindly by default
  - `deployment-verification-agent`: always advisory / operational, never autofix
- **Design the beta contract so future orchestration cutover is safe.** The beta must make it explicit that mutating review cannot run concurrently with browser testing on the same checkout. That requirement is part of validation and future cutover criteria, not a same-plan rewrite of `slfg`.
- **Move push / PR creation decisions out of autonomous review.** Interactive standalone mode may still offer next-step prompts. Autonomous and report-only modes should stop after producing fixes and/or residual artifacts; any future parent workflow decides commit, push, and PR timing.
- **Add lightweight contract tests.** Repeated regressions have come from instruction-boundary drift. String- and structure-level contract tests are justified here even though the behavior is prompt-driven.

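
The "route by fixability, not severity" decision can be illustrated with a small TypeScript sketch. The field names `autofix_class`, `owner`, and `requires_verification` come from this plan, which leaves the exact schema to implementation; the `Route` type, the `routeFinding` function, and the free-form `severity` string are hypothetical and only show that severity plays no part in routing.

```typescript
// Hypothetical types: the plan defers exact field names to findings-schema.json.
type AutofixClass = "safe_auto" | "gated_auto" | "manual" | "advisory";
type Owner = "review-fixer" | "downstream-resolver" | "human" | "release";

interface Finding {
  title: string;
  severity: string; // urgency only; deliberately unused by routing
  autofix_class: AutofixClass;
  owner: Owner;
  requires_verification: boolean;
}

// Illustrative destinations, not names used by the skill itself.
type Route = "fixer_queue" | "residual_todo" | "report_only";

function routeFinding(finding: Finding): Route {
  // Advisory output never becomes work, whatever its owner says.
  if (finding.autofix_class === "advisory") return "report_only";
  // Only safe_auto findings owned by the in-skill fixer are edited in place.
  if (finding.autofix_class === "safe_auto" && finding.owner === "review-fixer") {
    return "fixer_queue";
  }
  // Anything actionable that belongs downstream becomes a durable todo.
  if (finding.owner === "downstream-resolver") return "residual_todo";
  // Human- or release-owned items stay visible in the report (an assumption).
  return "report_only";
}
```

Under this sketch a high-urgency `gated_auto` finding and a low-urgency one with the same owner take the same route, which is the separation of urgency from ownership that the decision describes.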
## Open Questions

### Resolved During Planning

- **Should `ce:review-beta` keep any embedded fix loop?** Yes, but only for `safe_auto` findings under an explicit mode/policy. Residual work is handed off.
- **Should autonomous mode be inferred from lack of interactivity?** No. Use explicit `mode:autonomous`.
- **Should `slfg` keep review and browser testing in parallel?** No, not once review can mutate the checkout. Run browser testing after the mutating review phase on the stabilized tree.
- **Should residual work be `todos/`, `.context/`, or both?** Both. `.context` holds the run artifact; `todos/` is only for durable unresolved actionable work.

### Deferred to Implementation

- Exact metadata field names in `findings-schema.json`
- Whether `report-only` should imply a different default output template section ordering than `interactive` / `autonomous`
- Whether residual `todos/` should be created directly by `ce:review-beta` or via a small shared helper/reference template used by both review and resolver flows

## High-Level Technical Design

This illustrates the intended approach and is directional guidance for review, not implementation specification. The implementing agent should treat it as context, not code to reproduce.

```text
review stages -> synthesize -> classify outputs by autofix_class/owner
  -> if mode=report-only: emit report + stop
  -> if mode=interactive: acquire policy from user
  -> if mode=autonomous: use policy from arguments/defaults
  -> run single fixer on safe_auto set
  -> verify tests + focused re-review
  -> emit residual todos for unresolved actionable items
  -> emit advisory/report sections for non-actionable outputs
```

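
The same flow can be restated as a tiny TypeScript sketch, mainly to make the mode differences explicit. The phase names, `phasesFor`, and `safeAlongsideBrowserTests` are hypothetical labels for illustration; like the diagram, this is context and not an implementation specification.

```typescript
type ReviewMode = "interactive" | "autonomous" | "report-only";

// Returns the ordered phases a run would go through in each mode.
function phasesFor(mode: ReviewMode): string[] {
  const review = ["review", "synthesize", "classify"];
  // report-only stops before anything that could touch the checkout.
  if (mode === "report-only") return [...review, "emit-report"];
  // The two mutating modes differ only in where the policy comes from.
  const policy = mode === "interactive" ? "policy-from-user" : "policy-from-arguments";
  return [
    ...review,
    policy,
    "fix-safe-auto",
    "verify-and-re-review",
    "emit-residual-todos",
    "emit-report",
  ];
}

// Only a non-mutating run may share a checkout with browser testing (R3, R8).
function safeAlongsideBrowserTests(mode: ReviewMode): boolean {
  return mode === "report-only";
}
```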
## Implementation Units

- [x] **Unit 1: Add explicit mode handling and routing metadata to ce:review-beta**

**Goal:** Give `ce:review-beta` a clear execution contract for standalone, autonomous, and read-only pipeline use.

**Requirements:** R1, R2, R3, R4, R7

**Dependencies:** None

**Files:**
- Modify: `plugins/compound-engineering/skills/ce-review-beta/SKILL.md`
- Modify: `plugins/compound-engineering/skills/ce-review-beta/references/findings-schema.json`
- Modify: `plugins/compound-engineering/skills/ce-review-beta/references/review-output-template.md`
- Modify: `plugins/compound-engineering/skills/ce-review-beta/references/subagent-template.md` (if routing metadata needs to be spelled out in spawn prompts)

**Approach:**
- Add a Mode Detection section near the top of `SKILL.md` using the established `mode:autonomous` argument pattern from `ce:compound-refresh`
- Introduce `mode:report-only` alongside `mode:autonomous`
- Scope all interactive question instructions so they apply only to interactive mode
- Extend `findings-schema.json` with routing-oriented fields such as:
  - `autofix_class`: `safe_auto | gated_auto | manual | advisory`
  - `owner`: `review-fixer | downstream-resolver | human | release`
  - `requires_verification`: boolean
- Update the review output template so the final report can distinguish:
  - applied fixes
  - residual actionable work
  - advisory / operational notes

**Patterns to follow:**
- `plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md` explicit autonomous mode structure
- `plugins/compound-engineering/skills/ce-plan/SKILL.md` pipeline-mode question skipping

**Test scenarios:**
- Interactive mode still presents questions and next-step prompts
- `mode:autonomous` never asks a question and never waits for user input
- `mode:report-only` performs no edits and no commit/push/PR actions
- A helper-agent output can be preserved in the final report without being treated as auto-fixable work

**Verification:**
- `tests/review-skill-contract.test.ts` asserts the three mode markers and interactive scoping rules
- `bun run release:validate` passes

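
Since the skill itself is prompt-driven, mode detection lives in `SKILL.md` prose and not in code. Purely as an illustration of the intended rule (explicit token wins, interactive is the default, nothing is inferred from the environment), a minimal sketch follows. The `parseMode` function and its handling of unknown or repeated tokens are assumptions, not behavior defined by this plan.

```typescript
type ReviewMode = "interactive" | "autonomous" | "report-only";

interface ParsedArgs {
  mode: ReviewMode;
  rest: string; // remaining arguments, e.g. a branch or PR reference
}

function parseMode(args: string): ParsedArgs {
  const tokens = args.split(/\s+/).filter((token) => token.length > 0);
  let mode: ReviewMode = "interactive"; // default when no mode token is present
  const rest: string[] = [];
  for (const token of tokens) {
    if (token === "mode:autonomous") {
      mode = "autonomous";
    } else if (token === "mode:report-only") {
      mode = "report-only";
    } else {
      // Unknown tokens, including unrecognized mode:* values, pass through
      // untouched (an assumption; a stricter variant could reject them).
      rest.push(token);
    }
  }
  // If both mode tokens appear, the last one wins (also an assumption).
  return { mode, rest: rest.join(" ") };
}
```

Note that nothing here inspects whether a question tool is available, which keeps "no question tool" and "headless workflow" from being conflated.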
- [x] **Unit 2: Redesign the fix loop around policy-driven safe autofix and bounded re-review**

**Goal:** Replace the current severity-prompt-centric fix loop with one that works in both interactive and autonomous contexts.

**Requirements:** R2, R4, R5, R7

**Dependencies:** Unit 1

**Files:**
- Modify: `plugins/compound-engineering/skills/ce-review-beta/SKILL.md`
- Add: `plugins/compound-engineering/skills/ce-review-beta/references/fix-policy.md` (if the classification and policy table becomes too large for `SKILL.md`)
- Modify: `plugins/compound-engineering/skills/ce-review-beta/references/review-output-template.md`

**Approach:**
- Replace "Severity Acceptance" as the primary decision point with a classification stage that groups synthesized findings by `autofix_class`
- In interactive mode, ask the user only for policy decisions that remain ambiguous after classification
- In autonomous mode, use conservative defaults:
  - apply `safe_auto`
  - leave `gated_auto`, `manual`, and `advisory` unresolved
- Keep the "exactly one fixer subagent" rule for consistency
- Bound the loop with `max_rounds` (for example 2) and require targeted verification plus focused re-review after any applied fix set
- Restrict commit / push / PR creation steps to interactive mode only; autonomous and report-only modes stop after emitting outputs

**Patterns to follow:**
- `docs/solutions/skill-design/compound-refresh-skill-improvements.md` applied-vs-recommended distinction
- Existing `ce-review-beta` single-fixer rule

**Test scenarios:**
- A `safe_auto` testing finding gets fixed and re-reviewed without user input in autonomous mode
- A `gated_auto` API contract or authz finding is preserved as residual actionable work, not auto-fixed
- A deployment checklist remains advisory and never enters the fixer queue
- Zero findings skip the fix phase entirely
- Re-review is bounded and does not recurse indefinitely

**Verification:**
- `tests/review-skill-contract.test.ts` asserts that autonomous mode has no mandatory user-question step in the fix path
- Manual dry run: read the fix-loop prose end-to-end and verify there is no mutation-owning step outside the policy gate

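
The bounded loop in this unit can be sketched as follows, assuming the autonomous defaults above (apply `safe_auto`, leave everything else). `runFixLoop`, its callback parameters, and the `FixLoopResult` shape are hypothetical; in the skill, the fixer and the re-review are subagent steps, which are modeled here as plain callbacks.

```typescript
type AutofixClass = "safe_auto" | "gated_auto" | "manual" | "advisory";

interface Finding {
  id: string;
  autofix_class: AutofixClass;
}

interface FixLoopResult {
  applied: string[];   // ids handed to the single fixer
  remaining: string[]; // ids still open when the loop stopped
  rounds: number;
}

function runFixLoop(
  initial: Finding[],
  applyFixes: (batch: Finding[]) => void,   // stands in for the one fixer subagent
  reReview: (fixed: Finding[]) => Finding[], // focused re-review of the changed scope
  maxRounds = 2,
): FixLoopResult {
  let findings = initial;
  const applied: string[] = [];
  let rounds = 0;
  while (rounds < maxRounds) {
    const batch = findings.filter((f) => f.autofix_class === "safe_auto");
    if (batch.length === 0) break; // zero safe findings: skip the fix phase
    applyFixes(batch);
    applied.push(...batch.map((f) => f.id));
    const untouched = findings.filter((f) => f.autofix_class !== "safe_auto");
    // Re-review may surface new findings, or re-report a fix that did not hold.
    findings = [...untouched, ...reReview(batch)];
    rounds += 1;
  }
  return { applied, remaining: findings.map((f) => f.id), rounds };
}
```

Because the round counter is checked before each pass, a re-review that keeps producing new `safe_auto` findings still terminates after `maxRounds`, and whatever is left is reported as remaining instead of being retried.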
- [x] **Unit 3: Define residual artifact and downstream handoff behavior**

**Goal:** Make autonomous review compatible with downstream workflows instead of competing with them.

**Requirements:** R5, R6, R7

**Dependencies:** Unit 2

**Files:**
- Modify: `plugins/compound-engineering/skills/ce-review-beta/SKILL.md`
- Modify: `plugins/compound-engineering/skills/resolve-todo-parallel/SKILL.md`
- Modify: `plugins/compound-engineering/skills/file-todos/SKILL.md`
- Add: `plugins/compound-engineering/skills/ce-review-beta/references/residual-work-template.md` (if a dedicated durable-work shape helps keep review prose smaller)

**Approach:**
- Write a per-run review artifact under `.context/compound-engineering/ce-review-beta/<run-id>/` containing:
  - synthesized findings
  - what was auto-fixed
  - what remains unresolved
  - advisory-only outputs
- Create durable `todos/` items only for unresolved actionable findings whose `owner` is downstream resolution
- Update `resolve-todo-parallel` to acknowledge this source explicitly so residual review work can be picked up without pretending everything came from stable `ce:review`
- Update `file-todos` integration guidance to reflect the new flow:
  - review-beta autonomous -> residual todos -> resolve-todo-parallel
  - advisory-only outputs do not become todos

**Patterns to follow:**
- `.context/compound-engineering/<workflow>/<run-id>/` scratch-space convention from `AGENTS.md`
- Existing `file-todos` review/resolution lifecycle

**Test scenarios:**
- Autonomous review with only advisory outputs creates no todos
- Autonomous review with 2 unresolved actionable findings creates exactly 2 residual todos
- Residual work items exclude protected-artifact cleanup suggestions
- The run artifact is sufficient to explain what the in-skill fixer changed vs. what remains

**Verification:**
- `tests/review-skill-contract.test.ts` asserts the documented `.context` and `todos/` handoff rules
- `bun run release:validate` passes after any skill inventory/reference changes

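
A minimal sketch of the handoff split described in this unit: everything goes into the run artifact, while only unresolved actionable findings owned by downstream resolution become todos. The `buildHandoff` function, the `fixed` flag, and the artifact field names are hypothetical; the real artifact layout and todo format are left to implementation.

```typescript
type AutofixClass = "safe_auto" | "gated_auto" | "manual" | "advisory";
type Owner = "review-fixer" | "downstream-resolver" | "human" | "release";

interface Finding {
  title: string;
  autofix_class: AutofixClass;
  owner: Owner;
  fixed: boolean; // set by the in-skill fixer pass (hypothetical flag)
}

interface RunArtifact {
  synthesized: string[];
  autoFixed: string[];
  unresolved: string[];
  advisory: string[];
}

interface Handoff {
  artifact: RunArtifact; // would be written under the per-run .context directory
  todos: string[];       // would become durable todos/ items
}

function buildHandoff(findings: Finding[]): Handoff {
  const titles = (items: Finding[]): string[] => items.map((f) => f.title);
  const advisory = findings.filter((f) => f.autofix_class === "advisory");
  const actionable = findings.filter((f) => f.autofix_class !== "advisory");
  const autoFixed = actionable.filter((f) => f.fixed);
  const unresolved = actionable.filter((f) => !f.fixed);
  // Advisory output and human- or release-owned items never become todos.
  const todos = unresolved.filter((f) => f.owner === "downstream-resolver");
  return {
    artifact: {
      synthesized: titles(findings),
      autoFixed: titles(autoFixed),
      unresolved: titles(unresolved),
      advisory: titles(advisory),
    },
    todos: titles(todos),
  };
}
```

The artifact alone is then enough to tell what the fixer changed from what remains, while the todo list stays limited to work a downstream resolver can actually pick up.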
- [x] **Unit 4: Add contract-focused regression coverage for mode, handoff, and future-integration boundaries**

**Goal:** Catch the specific instruction-boundary regressions that have repeatedly escaped manual review.

**Requirements:** R8, R9

**Dependencies:** Units 1-3

**Files:**
- Add: `tests/review-skill-contract.test.ts`
- Optionally modify: `package.json` only if a new test entry point is required (prefer using the existing Bun test setup without package changes)

**Approach:**
- Add a focused test that reads the relevant skill files and asserts contract-level invariants instead of brittle full-file snapshots
- Cover:
  - `ce-review-beta` mode markers and mode-specific behavior phrases
  - absence of unconditional interactive prompts in autonomous/report-only paths
  - explicit residual-work handoff language
  - explicit documentation that mutating review must not run concurrently with browser testing on the same checkout
- Keep assertions semantic and localized; avoid snapshotting large markdown files

**Patterns to follow:**
- Existing Bun tests that read repository files directly for release/config validation

**Test scenarios:**
- Missing `mode:autonomous` block fails
- Reintroduced unconditional "Ask the user" text in the autonomous path fails
- Missing residual todo handoff text fails
- Missing future integration constraint around mutating review vs. browser testing fails

**Verification:**
- `bun test tests/review-skill-contract.test.ts`
- full `bun test`

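
To show what "semantic and localized" assertions might look like, here is a sketch of the invariant check as a pure function over the skill text. It is not the contents of `tests/review-skill-contract.test.ts`: the `Invariant` shape, `violatedInvariants`, and the specific regular expressions are assumptions, and a real test would read `SKILL.md` from disk and wrap these checks in the existing Bun test setup.

```typescript
interface Invariant {
  name: string;
  holds: (skillText: string) => boolean;
}

// Each invariant checks one boundary, not the full file contents.
const invariants: Invariant[] = [
  {
    name: "declares mode:autonomous",
    holds: (text) => text.includes("mode:autonomous"),
  },
  {
    name: "declares mode:report-only",
    holds: (text) => text.includes("mode:report-only"),
  },
  {
    name: "documents residual todo handoff",
    holds: (text) => /todos\//.test(text),
  },
  {
    // Hypothetical phrasing check; the real wording is up to SKILL.md.
    name: "forbids mutating review alongside browser testing",
    holds: (text) => /concurrent/i.test(text) && /browser test/i.test(text),
  },
];

// Returns the names of all invariants the given skill text fails to satisfy.
function violatedInvariants(skillText: string): string[] {
  return invariants.filter((inv) => !inv.holds(skillText)).map((inv) => inv.name);
}
```

Returning the list of violated names, instead of a single boolean, keeps failure messages specific, which helps when a regression removes just one boundary statement.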
## Risks & Dependencies

- **Over-aggressive autofix classification.**
  - Mitigation: conservative defaults, `gated_auto` bucket, bounded rounds, focused re-review
- **Dual ownership confusion between `ce:review-beta` and `resolve-todo-parallel`.**
  - Mitigation: explicit owner/routing metadata and durable residual-work contract
- **Brittle contract tests.**
  - Mitigation: assert only boundary invariants, not full markdown snapshots
- **Promotion churn.**
  - Mitigation: keep beta isolated until Unit 4 contract coverage and manual verification pass

## Sources & References

- Related skills:
  - `plugins/compound-engineering/skills/ce-review-beta/SKILL.md`
  - `plugins/compound-engineering/skills/ce-review/SKILL.md`
  - `plugins/compound-engineering/skills/resolve-todo-parallel/SKILL.md`
  - `plugins/compound-engineering/skills/file-todos/SKILL.md`
  - `plugins/compound-engineering/skills/lfg/SKILL.md`
  - `plugins/compound-engineering/skills/slfg/SKILL.md`
- Institutional learnings:
  - `docs/solutions/skill-design/compound-refresh-skill-improvements.md`
  - `docs/solutions/skill-design/beta-skills-framework.md`
- Supporting pattern reference:
  - `plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md`
  - `plugins/compound-engineering/skills/ce-plan/SKILL.md`