claude-engineering-plugin/docs/plans/2026-03-23-001-feat-ce-review-beta-pipeline-mode-beta-plan.md

title: feat: Make ce:review-beta autonomous and pipeline-safe
type: feat
status: active
date: 2026-03-23
origin: direct user request and planning discussion on ce:review-beta standalone vs. autonomous pipeline behavior

Make ce:review-beta Autonomous and Pipeline-Safe

Overview

Redesign ce:review-beta from a purely interactive standalone review workflow into a policy-driven review engine that supports three explicit modes: interactive, autonomous, and report-only. The redesign should preserve the current standalone UX for manual review, enable hands-off review and safe autofix in automated workflows, and define a clean residual-work handoff for anything that should not be auto-fixed. This plan remains beta-only; promotion to stable ce:review and any lfg / slfg cutover should happen only in a follow-up plan after the beta behavior is validated.

Problem Frame

ce:review-beta currently mixes three responsibilities in one loop:

  1. Review and synthesis
  2. Human approval on what to fix
  3. Local fixing, re-review, and push/PR next steps

That is acceptable for standalone use, but it is the wrong shape for autonomous orchestration:

  • lfg currently treats review as an upstream producer before downstream resolution and browser testing
  • slfg currently runs review and browser testing in parallel, which is only safe if review is non-mutating
  • resolve-todo-parallel expects a durable residual-work contract (todos/), while ce:review-beta currently tries to resolve accepted findings inline
  • The findings schema lacks routing metadata, so severity is doing too much work; urgency and autofix eligibility are distinct concerns

The result is a workflow that is hard to promote safely: it can be interactive, or autonomous, or mutation-owning, but not all three at once without an explicit mode model and clearer ownership boundaries.

Requirements Trace

  • R1. ce:review-beta supports explicit execution modes: interactive (default), autonomous, and report-only
  • R2. autonomous mode never asks the user questions, never waits for approval, and applies only policy-allowed safe fixes
  • R3. report-only mode is strictly read-only and safe to run in parallel with other read-only verification steps
  • R4. Findings are routed by explicit fixability metadata, not by severity alone
  • R5. ce:review-beta can run one bounded in-skill autofix pass for safe_auto findings and then re-review the changed scope
  • R6. Residual actionable findings are emitted as durable downstream work artifacts; advisory outputs remain report-only
  • R7. CE helper outputs (learnings, agent-native, schema-drift, deployment-verification) are preserved but only some become actionable work items
  • R8. The beta contract makes future orchestration constraints explicit so a later lfg / slfg cutover does not run a mutating review concurrently with browser testing on the same checkout
  • R9. Repeated regression classes around interaction mode, routing, and orchestration boundaries gain lightweight contract coverage

Scope Boundaries

  • Keep the existing persona ensemble, confidence gate, and synthesis model as the base architecture
  • Do not redesign every reviewer persona's prompt beyond the metadata they need to emit
  • Do not introduce a new general-purpose orchestration framework; reuse existing skill patterns where possible
  • Do not auto-fix deployment checklists, residual risks, or other advisory-only outputs
  • Do not attempt broad converter/platform work in this change unless the review skill's frontmatter or references require it
  • Beta remains the only implementation target in this plan; stable promotion is intentionally deferred to a follow-up plan after validation

Context & Research

Relevant Code and Patterns

  • plugins/compound-engineering/skills/ce-review-beta/SKILL.md
    • Current staged review pipeline with interactive severity acceptance, inline fixer, re-review offer, and post-fix push/PR actions
  • plugins/compound-engineering/skills/ce-review-beta/references/findings-schema.json
    • Structured persona finding contract today; currently missing routing metadata for autonomous handling
  • plugins/compound-engineering/skills/ce-review/SKILL.md
    • Current stable review workflow; creates durable todos/ artifacts rather than fixing findings inline
  • plugins/compound-engineering/skills/resolve-todo-parallel/SKILL.md
    • Existing residual-work resolver; parallelizes item handling once work has already been externalized
  • plugins/compound-engineering/skills/file-todos/SKILL.md
    • Existing review -> triage -> todo -> resolve integration contract
  • plugins/compound-engineering/skills/lfg/SKILL.md
    • Sequential orchestrator whose future cutover constraints should inform the beta contract, even though this plan does not modify it
  • plugins/compound-engineering/skills/slfg/SKILL.md
    • Swarm orchestrator whose current review/browser parallelism defines an important future integration constraint, even though this plan does not modify it
  • plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
    • Strong repo precedent for explicit mode:autonomous argument handling and conservative non-interactive behavior
  • plugins/compound-engineering/skills/ce-plan/SKILL.md
    • Strong repo precedent for pipeline mode skipping interactive questions

Institutional Learnings

  • docs/solutions/skill-design/compound-refresh-skill-improvements.md
    • Explicit autonomous mode beats tool-based auto-detection
    • Ambiguous cases in autonomous mode should be recorded conservatively, not guessed
    • Report structure should distinguish applied actions from recommended follow-up
  • docs/solutions/skill-design/beta-skills-framework.md
    • Beta skills should remain isolated until validated
    • Promotion is the right time to rewire lfg / slfg, which is out of scope for this plan

External Research Decision

Skipped. This is a repo-internal orchestration and skill-design change with strong existing local patterns for autonomous mode, beta promotion, and residual-work handling.

Key Technical Decisions

  • Use explicit mode arguments instead of auto-detection. Follow ce:compound-refresh and require mode:autonomous / mode:report-only arguments. Interactive remains the default. This avoids conflating "no question tool" with "headless workflow."
  • Split review from mutation semantically, not by creating two separate skills. ce:review-beta should always perform the same review and synthesis stages. Mutation behavior becomes a mode-controlled phase layered on top.
  • Route by fixability, not severity. Add explicit per-finding routing fields such as autofix_class, owner, and requires_verification. Severity remains urgency; it no longer implies who acts.
  • Keep one in-skill fixer, but only for safe_auto findings. The current "one fixer subagent" rule is still right for consistent-tree edits. The change is that the fixer is selected by policy and routing metadata, not by an interactive severity prompt.
  • Emit both ephemeral and durable outputs. Use .context/compound-engineering/ce-review-beta/<run-id>/ for the per-run machine-readable report and create durable todos/ items only for unresolved actionable findings that belong downstream.
  • Treat CE helper outputs by artifact class.
    • learnings-researcher: contextual/advisory unless a concrete finding corroborates it
    • agent-native-reviewer: often gated_auto or manual, occasionally safe_auto when the fix is purely local and mechanical
    • schema-drift-detector: default manual or gated_auto; never auto-fix blindly by default
    • deployment-verification-agent: always advisory / operational, never autofix
  • Design the beta contract so future orchestration cutover is safe. The beta must make it explicit that mutating review cannot run concurrently with browser testing on the same checkout. That requirement is part of validation and future cutover criteria, not a same-plan rewrite of slfg.
  • Move push / PR creation decisions out of autonomous review. Interactive standalone mode may still offer next-step prompts. Autonomous and report-only modes should stop after producing fixes and/or residual artifacts; any future parent workflow decides commit, push, and PR timing.
  • Add lightweight contract tests. Repeated regressions have come from instruction-boundary drift. String- and structure-level contract tests are justified here even though the behavior is prompt-driven.
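The explicit-mode decision above can be sketched as a small argument parser. This is a minimal illustration, not the skill's actual logic (the skill is prompt-driven): `ReviewMode` and `parseReviewMode` are hypothetical names, and rejecting unknown or conflicting `mode:` tokens is an assumption the plan does not spell out.

```typescript
// Hypothetical sketch: the three modes named in R1.
type ReviewMode = "interactive" | "autonomous" | "report-only";

// Resolve the mode from an explicit `mode:<value>` token only.
// Nothing is inferred from tool availability or lack of interactivity.
function parseReviewMode(args: string): ReviewMode {
  const tokens = args.split(/\s+/).filter((t) => t.startsWith("mode:"));
  if (tokens.length === 0) return "interactive"; // interactive remains the default
  if (tokens.length > 1) {
    // Assumption: conflicting tokens are an error rather than last-wins.
    throw new Error(`conflicting mode tokens: ${tokens.join(", ")}`);
  }
  const value = tokens[0].slice("mode:".length);
  if (value === "autonomous" || value === "report-only") return value;
  // Assumption: an unrecognized mode fails loudly instead of falling back.
  throw new Error(`unknown mode: ${value}`);
}
```

Failing on an unrecognized token keeps a typo such as `mode:autonmous` from silently dropping a pipeline into interactive behavior.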

Open Questions

Resolved During Planning

  • Should ce:review-beta keep any embedded fix loop? Yes, but only for safe_auto findings under an explicit mode/policy. Residual work is handed off.
  • Should autonomous mode be inferred from lack of interactivity? No. Use explicit mode:autonomous.
  • Should slfg keep review and browser testing in parallel? No, not once review can mutate the checkout. A future cutover should run browser testing after the mutating review phase, on the stabilized tree; this plan does not modify slfg.
  • Should residual work be todos/, .context/, or both? Both. .context holds the run artifact; todos/ is only for durable unresolved actionable work.

Deferred to Implementation

  • Exact metadata field names in findings-schema.json
  • Whether report-only should imply a different default output template section ordering than interactive / autonomous
  • Whether residual todos/ should be created directly by ce:review-beta or via a small shared helper/reference template used by both review and resolver flows

High-Level Technical Design

The flow below illustrates the intended approach. It is directional guidance for review, not an implementation specification; the implementing agent should treat it as context, not code to reproduce.

review stages -> synthesize -> classify outputs by autofix_class/owner
               -> if mode=report-only: emit report + stop
               -> if mode=interactive: acquire policy from user
               -> if mode=autonomous: use policy from arguments/defaults
               -> run single fixer on safe_auto set
               -> verify tests + focused re-review
               -> emit residual todos for unresolved actionable items
               -> emit advisory/report sections for non-actionable outputs

Implementation Units

  • Unit 1: Add explicit mode handling and routing metadata to ce:review-beta

Goal: Give ce:review-beta a clear execution contract for standalone, autonomous, and read-only pipeline use.

Requirements: R1, R2, R3, R4, R7

Dependencies: None

Files:

  • Modify: plugins/compound-engineering/skills/ce-review-beta/SKILL.md
  • Modify: plugins/compound-engineering/skills/ce-review-beta/references/findings-schema.json
  • Modify: plugins/compound-engineering/skills/ce-review-beta/references/review-output-template.md
  • Modify: plugins/compound-engineering/skills/ce-review-beta/references/subagent-template.md (if routing metadata needs to be spelled out in spawn prompts)

Approach:

  • Add a Mode Detection section near the top of SKILL.md using the established mode:autonomous argument pattern from ce:compound-refresh
  • Introduce mode:report-only alongside mode:autonomous
  • Scope all interactive question instructions so they apply only to interactive mode
  • Extend findings-schema.json with routing-oriented fields such as:
    • autofix_class: safe_auto | gated_auto | manual | advisory
    • owner: review-fixer | downstream-resolver | human | release
    • requires_verification: boolean
  • Update the review output template so the final report can distinguish:
    • applied fixes
    • residual actionable work
    • advisory / operational notes
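The routing fields and the three report sections above might look like the following in type form. This is only a sketch: the exact field names are deferred to implementation, and `RoutedFinding`, `ReportSection`, `reportSection`, and the `title` / `severity` fields are hypothetical.

```typescript
// Enum values as listed in the Approach; final names are deferred to implementation.
type AutofixClass = "safe_auto" | "gated_auto" | "manual" | "advisory";
type Owner = "review-fixer" | "downstream-resolver" | "human" | "release";

// Hypothetical finding shape: severity stays as urgency, routing is separate.
interface RoutedFinding {
  title: string;
  severity: string; // hypothetical label; deliberately unused for routing
  autofix_class: AutofixClass;
  owner: Owner;
  requires_verification: boolean;
}

type ReportSection = "applied_fixes" | "residual_actionable" | "advisory";

// Pick the report section from routing metadata and fix outcome, never from severity.
function reportSection(finding: RoutedFinding, wasFixed: boolean): ReportSection {
  if (wasFixed) return "applied_fixes";
  if (finding.autofix_class === "advisory") return "advisory";
  return "residual_actionable";
}
```

Under this sketch, a high-severity deployment note with `autofix_class: "advisory"` lands in the advisory section, which is the separation of urgency from ownership that R4 asks for.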

Patterns to follow:

  • plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md explicit autonomous mode structure
  • plugins/compound-engineering/skills/ce-plan/SKILL.md pipeline-mode question skipping

Test scenarios:

  • Interactive mode still presents questions and next-step prompts
  • mode:autonomous never asks a question and never waits for user input
  • mode:report-only performs no edits and no commit/push/PR actions
  • A helper-agent output can be preserved in the final report without being treated as auto-fixable work

Verification:

  • tests/review-skill-contract.test.ts asserts the three mode markers and interactive scoping rules

  • bun run release:validate passes

  • Unit 2: Redesign the fix loop around policy-driven safe autofix and bounded re-review

Goal: Replace the current severity-prompt-centric fix loop with one that works in both interactive and autonomous contexts.

Requirements: R2, R4, R5, R7

Dependencies: Unit 1

Files:

  • Modify: plugins/compound-engineering/skills/ce-review-beta/SKILL.md
  • Add: plugins/compound-engineering/skills/ce-review-beta/references/fix-policy.md (if the classification and policy table becomes too large for SKILL.md)
  • Modify: plugins/compound-engineering/skills/ce-review-beta/references/review-output-template.md

Approach:

  • Replace "Severity Acceptance" as the primary decision point with a classification stage that groups synthesized findings by autofix_class
  • In interactive mode, ask the user only for policy decisions that remain ambiguous after classification
  • In autonomous mode, use conservative defaults:
    • apply safe_auto
    • leave gated_auto, manual, and advisory unresolved
  • Keep the "exactly one fixer subagent" rule for consistency
  • Bound the loop with max_rounds (for example 2) and require targeted verification plus focused re-review after any applied fix set
  • Restrict commit / push / PR creation steps to interactive mode only; autonomous and report-only modes stop after emitting outputs
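The autonomous defaults and the bounded loop can be sketched as below. This is a sketch under stated assumptions, not the skill's implementation: `runFixLoop`, `Finding`, and `LoopResult` are hypothetical names, and the `fixAndReReview` callback is a stand-in for "single fixer applies the batch, targeted verification runs, and a focused re-review returns any new findings on the changed scope".

```typescript
type AutofixClass = "safe_auto" | "gated_auto" | "manual" | "advisory";

interface Finding {
  id: string;
  autofix_class: AutofixClass;
}

interface LoopResult {
  applied: string[]; // ids handed to the single fixer
  unresolved: Finding[]; // everything left for the report or downstream handoff
  rounds: number; // fix rounds actually executed
}

function runFixLoop(
  initial: Finding[],
  fixAndReReview: (batch: Finding[]) => Finding[], // hypothetical stand-in, see lead-in
  maxRounds = 2,
): LoopResult {
  const applied: string[] = [];
  const unresolved: Finding[] = [];
  let pending = initial;
  let rounds = 0;
  while (pending.length > 0) {
    const safe = pending.filter((f) => f.autofix_class === "safe_auto");
    // gated_auto, manual, and advisory findings are never sent to the fixer.
    unresolved.push(...pending.filter((f) => f.autofix_class !== "safe_auto"));
    if (safe.length === 0) break;
    if (rounds >= maxRounds) {
      // Round budget exhausted: leave remaining safe_auto items unresolved.
      unresolved.push(...safe);
      break;
    }
    rounds += 1;
    applied.push(...safe.map((f) => f.id));
    pending = fixAndReReview(safe);
  }
  return { applied, unresolved, rounds };
}
```

With zero findings the loop body never runs, so the fix phase is skipped entirely, and a re-review that keeps producing new `safe_auto` findings still terminates after `maxRounds` rounds.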

Patterns to follow:

  • docs/solutions/skill-design/compound-refresh-skill-improvements.md applied-vs-recommended distinction
  • Existing ce-review-beta single-fixer rule

Test scenarios:

  • A safe_auto testing finding gets fixed and re-reviewed without user input in autonomous mode
  • A gated_auto API contract or authz finding is preserved as residual actionable work, not auto-fixed
  • A deployment checklist remains advisory and never enters the fixer queue
  • Zero findings skip the fix phase entirely
  • Re-review is bounded and does not recurse indefinitely

Verification:

  • tests/review-skill-contract.test.ts asserts that autonomous mode has no mandatory user-question step in the fix path

  • Manual dry run: read the fix-loop prose end-to-end and verify there is no mutation-owning step outside the policy gate

  • Unit 3: Define residual artifact and downstream handoff behavior

Goal: Make autonomous review compatible with downstream workflows instead of competing with them.

Requirements: R5, R6, R7

Dependencies: Unit 2

Files:

  • Modify: plugins/compound-engineering/skills/ce-review-beta/SKILL.md
  • Modify: plugins/compound-engineering/skills/resolve-todo-parallel/SKILL.md
  • Modify: plugins/compound-engineering/skills/file-todos/SKILL.md
  • Add: plugins/compound-engineering/skills/ce-review-beta/references/residual-work-template.md (if a dedicated durable-work shape helps keep review prose smaller)

Approach:

  • Write a per-run review artifact under .context/compound-engineering/ce-review-beta/<run-id>/ containing:
    • synthesized findings
    • what was auto-fixed
    • what remains unresolved
    • advisory-only outputs
  • Create durable todos/ items only for unresolved actionable findings whose owner is downstream resolution
  • Update resolve-todo-parallel to acknowledge this source explicitly so residual review work can be picked up without pretending everything came from stable ce:review
  • Update file-todos integration guidance to reflect the new flow:
    • review-beta autonomous -> residual todos -> resolve-todo-parallel
    • advisory-only outputs do not become todos
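The handoff rule above reduces to a filter over whatever the fix phase left unresolved. The following is a rough sketch; `UnresolvedFinding`, `selectResidualTodos`, and `runArtifactDir` are hypothetical names, and only the `.context/...` path shape comes from the plan.

```typescript
type AutofixClass = "safe_auto" | "gated_auto" | "manual" | "advisory";
type Owner = "review-fixer" | "downstream-resolver" | "human" | "release";

interface UnresolvedFinding {
  id: string;
  autofix_class: AutofixClass;
  owner: Owner;
}

// Per-run scratch directory for the machine-readable run artifact.
function runArtifactDir(runId: string): string {
  return `.context/compound-engineering/ce-review-beta/${runId}/`;
}

// Only unresolved, actionable findings owned by downstream resolution become durable todos.
function selectResidualTodos(unresolved: UnresolvedFinding[]): UnresolvedFinding[] {
  return unresolved.filter(
    (f) => f.autofix_class !== "advisory" && f.owner === "downstream-resolver",
  );
}
```

Everything the filter drops still belongs in the run artifact, so the report can explain what was fixed, what remains, and what is advisory without inflating `todos/`.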

Patterns to follow:

  • .context/compound-engineering/<workflow>/<run-id>/ scratch-space convention from AGENTS.md
  • Existing file-todos review/resolution lifecycle

Test scenarios:

  • Autonomous review with only advisory outputs creates no todos
  • Autonomous review with 2 unresolved actionable findings creates exactly 2 residual todos
  • Residual work items exclude protected-artifact cleanup suggestions
  • The run artifact is sufficient to explain what the in-skill fixer changed vs. what remains

Verification:

  • tests/review-skill-contract.test.ts asserts the documented .context and todos/ handoff rules

  • bun run release:validate passes after any skill inventory/reference changes

  • Unit 4: Add contract-focused regression coverage for mode, handoff, and future-integration boundaries

Goal: Catch the specific instruction-boundary regressions that have repeatedly escaped manual review.

Requirements: R8, R9

Dependencies: Units 1-3

Files:

  • Add: tests/review-skill-contract.test.ts
  • Optionally modify: package.json only if a new test entry point is required (prefer using the existing Bun test setup without package changes)

Approach:

  • Add a focused test that reads the relevant skill files and asserts contract-level invariants instead of brittle full-file snapshots
  • Cover:
    • ce-review-beta mode markers and mode-specific behavior phrases
    • absence of unconditional interactive prompts in autonomous/report-only paths
    • explicit residual-work handoff language
    • explicit documentation that mutating review must not run concurrently with browser testing on the same checkout
  • Keep assertions semantic and localized; avoid snapshotting large markdown files
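One way to keep such assertions semantic and localized is a small rule table evaluated against the skill text. This is a hedged sketch: `ContractRule`, `rules`, and `contractViolations` are hypothetical, and the matched phrases are assumptions about wording that the real SKILL.md may not use.

```typescript
// A contract rule is a named predicate over the skill file's text.
interface ContractRule {
  name: string;
  holds: (skillText: string) => boolean;
}

// Assumed phrases; adjust to whatever wording the skill actually documents.
const rules: ContractRule[] = [
  { name: "declares mode:autonomous", holds: (t) => t.includes("mode:autonomous") },
  { name: "declares mode:report-only", holds: (t) => t.includes("mode:report-only") },
  {
    name: "documents residual todo handoff",
    holds: (t) => /residual/i.test(t) && t.includes("todos/"),
  },
  {
    name: "documents mutating-review vs browser-testing constraint",
    holds: (t) => /browser test/i.test(t) && /concurrent/i.test(t),
  },
];

// Return the names of every rule the text violates; an empty array means the contract holds.
function contractViolations(skillText: string): string[] {
  return rules.filter((r) => !r.holds(skillText)).map((r) => r.name);
}
```

In a Bun test, the skill file could be read from disk and the result compared against an empty array, so a failure names the missing invariant instead of dumping a snapshot diff.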

Patterns to follow:

  • Existing Bun tests that read repository files directly for release/config validation

Test scenarios:

  • Missing mode:autonomous block fails
  • Reintroduced unconditional "Ask the user" text in the autonomous path fails
  • Missing residual todo handoff text fails
  • Missing future integration constraint around mutating review vs. browser testing fails

Verification:

  • bun test tests/review-skill-contract.test.ts
  • full bun test

Risks & Dependencies

  • Over-aggressive autofix classification.
    • Mitigation: conservative defaults, gated_auto bucket, bounded rounds, focused re-review
  • Dual ownership confusion between ce:review-beta and resolve-todo-parallel.
    • Mitigation: explicit owner/routing metadata and durable residual-work contract
  • Brittle contract tests.
    • Mitigation: assert only boundary invariants, not full markdown snapshots
  • Promotion churn.
    • Mitigation: keep beta isolated until Unit 4 contract coverage and manual verification pass

Sources & References

  • Related skills:
    • plugins/compound-engineering/skills/ce-review-beta/SKILL.md
    • plugins/compound-engineering/skills/ce-review/SKILL.md
    • plugins/compound-engineering/skills/resolve-todo-parallel/SKILL.md
    • plugins/compound-engineering/skills/file-todos/SKILL.md
    • plugins/compound-engineering/skills/lfg/SKILL.md
    • plugins/compound-engineering/skills/slfg/SKILL.md
  • Institutional learnings:
    • docs/solutions/skill-design/compound-refresh-skill-improvements.md
    • docs/solutions/skill-design/beta-skills-framework.md
  • Supporting pattern reference:
    • plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md
    • plugins/compound-engineering/skills/ce-plan/SKILL.md