19 KiB
title, type, status, date, origin
| title | type | status | date | origin |
|---|---|---|---|---|
| feat: Make ce:review-beta autonomous and pipeline-safe | feat | active | 2026-03-23 | direct user request and planning discussion on ce:review-beta standalone vs. autonomous pipeline behavior |
Make ce:review-beta Autonomous and Pipeline-Safe
Overview
Redesign ce:review-beta from a purely interactive standalone review workflow into a policy-driven review engine that supports three explicit modes: interactive, autonomous, and report-only. The redesign should preserve the current standalone UX for manual review, enable hands-off review and safe autofix in automated workflows, and define a clean residual-work handoff for anything that should not be auto-fixed. This plan remains beta-only; promotion to stable ce:review and any lfg / slfg cutover should happen only in a follow-up plan after the beta behavior is validated.
Problem Frame
ce:review-beta currently mixes three responsibilities in one loop:
- Review and synthesis
- Human approval on what to fix
- Local fixing, re-review, and push/PR next steps
That is acceptable for standalone use, but it is the wrong shape for autonomous orchestration:
lfgcurrently treats review as an upstream producer before downstream resolution and browser testingslfgcurrently runs review and browser testing in parallel, which is only safe if review is non-mutatingresolve-todo-parallelexpects a durable residual-work contract (todos/), whilece:review-betacurrently tries to resolve accepted findings inline- The findings schema lacks routing metadata, so severity is doing too much work; urgency and autofix eligibility are distinct concerns
The result is a workflow that is hard to promote safely: it can be interactive, or autonomous, or mutation-owning, but not all three at once without an explicit mode model and clearer ownership boundaries.
Requirements Trace
- R1.
ce:review-betasupports explicit execution modes:interactive(default),autonomous, andreport-only - R2.
autonomousmode never asks the user questions, never waits for approval, and applies only policy-allowed safe fixes - R3.
report-onlymode is strictly read-only and safe to run in parallel with other read-only verification steps - R4. Findings are routed by explicit fixability metadata, not by severity alone
- R5.
ce:review-betacan run one bounded in-skill autofix pass forsafe_autofindings and then re-review the changed scope - R6. Residual actionable findings are emitted as durable downstream work artifacts; advisory outputs remain report-only
- R7. CE helper outputs (
learnings,agent-native,schema-drift,deployment-verification) are preserved but only some become actionable work items - R8. The beta contract makes future orchestration constraints explicit so a later
lfg/slfgcutover does not run a mutating review concurrently with browser testing on the same checkout - R9. Repeated regression classes around interaction mode, routing, and orchestration boundaries gain lightweight contract coverage
Scope Boundaries
- Keep the existing persona ensemble, confidence gate, and synthesis model as the base architecture
- Do not redesign every reviewer persona's prompt beyond the metadata they need to emit
- Do not introduce a new general-purpose orchestration framework; reuse existing skill patterns where possible
- Do not auto-fix deployment checklists, residual risks, or other advisory-only outputs
- Do not attempt broad converter/platform work in this change unless the review skill's frontmatter or references require it
- Beta remains the only implementation target in this plan; stable promotion is intentionally deferred to a follow-up plan after validation
Context & Research
Relevant Code and Patterns
plugins/compound-engineering/skills/ce-review-beta/SKILL.md- Current staged review pipeline with interactive severity acceptance, inline fixer, re-review offer, and post-fix push/PR actions
plugins/compound-engineering/skills/ce-review-beta/references/findings-schema.json- Structured persona finding contract today; currently missing routing metadata for autonomous handling
plugins/compound-engineering/skills/ce-review/SKILL.md- Current stable review workflow; creates durable
todos/artifacts rather than fixing findings inline
- Current stable review workflow; creates durable
plugins/compound-engineering/skills/resolve-todo-parallel/SKILL.md- Existing residual-work resolver; parallelizes item handling once work has already been externalized
plugins/compound-engineering/skills/file-todos/SKILL.md- Existing review -> triage -> todo -> resolve integration contract
plugins/compound-engineering/skills/lfg/SKILL.md- Sequential orchestrator whose future cutover constraints should inform the beta contract, even though this plan does not modify it
plugins/compound-engineering/skills/slfg/SKILL.md- Swarm orchestrator whose current review/browser parallelism defines an important future integration constraint, even though this plan does not modify it
plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md- Strong repo precedent for explicit
mode:autonomousargument handling and conservative non-interactive behavior
- Strong repo precedent for explicit
plugins/compound-engineering/skills/ce-plan/SKILL.md- Strong repo precedent for pipeline mode skipping interactive questions
Institutional Learnings
docs/solutions/skill-design/compound-refresh-skill-improvements.md- Explicit autonomous mode beats tool-based auto-detection
- Ambiguous cases in autonomous mode should be recorded conservatively, not guessed
- Report structure should distinguish applied actions from recommended follow-up
docs/solutions/skill-design/beta-skills-framework.md- Beta skills should remain isolated until validated
- Promotion is the right time to rewire
lfg/slfg, which is out of scope for this plan
External Research Decision
Skipped. This is a repo-internal orchestration and skill-design change with strong existing local patterns for autonomous mode, beta promotion, and residual-work handling.
Key Technical Decisions
- Use explicit mode arguments instead of auto-detection. Follow
ce:compound-refreshand requiremode:autonomous/mode:report-onlyarguments. Interactive remains the default. This avoids conflating "no question tool" with "headless workflow." - Split review from mutation semantically, not by creating two separate skills.
ce:review-betashould always perform the same review and synthesis stages. Mutation behavior becomes a mode-controlled phase layered on top. - Route by fixability, not severity. Add explicit per-finding routing fields such as
autofix_class,owner, andrequires_verification. Severity remains urgency; it no longer implies who acts. - Keep one in-skill fixer, but only for
safe_autofindings. The current "one fixer subagent" rule is still right for consistent-tree edits. The change is that the fixer is selected by policy and routing metadata, not by an interactive severity prompt. - Emit both ephemeral and durable outputs. Use
.context/compound-engineering/ce-review-beta/<run-id>/for the per-run machine-readable report and create durabletodos/items only for unresolved actionable findings that belong downstream. - Treat CE helper outputs by artifact class.
learnings-researcher: contextual/advisory unless a concrete finding corroborates itagent-native-reviewer: oftengated_autoormanual, occasionallysafe_autowhen the fix is purely local and mechanicalschema-drift-detector: defaultmanualorgated_auto; never auto-fix blindly by defaultdeployment-verification-agent: always advisory / operational, never autofix
- Design the beta contract so future orchestration cutover is safe. The beta must make it explicit that mutating review cannot run concurrently with browser testing on the same checkout. That requirement is part of validation and future cutover criteria, not a same-plan rewrite of
slfg. - Move push / PR creation decisions out of autonomous review. Interactive standalone mode may still offer next-step prompts. Autonomous and report-only modes should stop after producing fixes and/or residual artifacts; any future parent workflow decides commit, push, and PR timing.
- Add lightweight contract tests. Repeated regressions have come from instruction-boundary drift. String- and structure-level contract tests are justified here even though the behavior is prompt-driven.
Open Questions
Resolved During Planning
- Should
ce:review-betakeep any embedded fix loop? Yes, but only forsafe_autofindings under an explicit mode/policy. Residual work is handed off. - Should autonomous mode be inferred from lack of interactivity? No. Use explicit
mode:autonomous. - Should
slfgkeep review and browser testing in parallel? No, not once review can mutate the checkout. Run browser testing after the mutating review phase on the stabilized tree. - Should residual work be
todos/,.context/, or both? Both..contextholds the run artifact;todos/is only for durable unresolved actionable work.
Deferred to Implementation
- Exact metadata field names in
findings-schema.json - Whether
report-onlyshould imply a different default output template section ordering thaninteractive/autonomous - Whether residual
todos/should be created directly byce:review-betaor via a small shared helper/reference template used by both review and resolver flows
High-Level Technical Design
This illustrates the intended approach and is directional guidance for review, not implementation specification. The implementing agent should treat it as context, not code to reproduce.
review stages -> synthesize -> classify outputs by autofix_class/owner
-> if mode=report-only: emit report + stop
-> if mode=interactive: acquire policy from user
-> if mode=autonomous: use policy from arguments/defaults
-> run single fixer on safe_auto set
-> verify tests + focused re-review
-> emit residual todos for unresolved actionable items
-> emit advisory/report sections for non-actionable outputs
Implementation Units
- Unit 1: Add explicit mode handling and routing metadata to ce:review-beta
Goal: Give ce:review-beta a clear execution contract for standalone, autonomous, and read-only pipeline use.
Requirements: R1, R2, R3, R4, R7
Dependencies: None
Files:
- Modify:
plugins/compound-engineering/skills/ce-review-beta/SKILL.md - Modify:
plugins/compound-engineering/skills/ce-review-beta/references/findings-schema.json - Modify:
plugins/compound-engineering/skills/ce-review-beta/references/review-output-template.md - Modify:
plugins/compound-engineering/skills/ce-review-beta/references/subagent-template.md(if routing metadata needs to be spelled out in spawn prompts)
Approach:
- Add a Mode Detection section near the top of
SKILL.mdusing the establishedmode:autonomousargument pattern fromce:compound-refresh - Introduce
mode:report-onlyalongsidemode:autonomous - Scope all interactive question instructions so they apply only to interactive mode
- Extend
findings-schema.jsonwith routing-oriented fields such as:autofix_class:safe_auto | gated_auto | manual | advisoryowner:review-fixer | downstream-resolver | human | releaserequires_verification: boolean
- Update the review output template so the final report can distinguish:
- applied fixes
- residual actionable work
- advisory / operational notes
Patterns to follow:
plugins/compound-engineering/skills/ce-compound-refresh/SKILL.mdexplicit autonomous mode structureplugins/compound-engineering/skills/ce-plan/SKILL.mdpipeline-mode question skipping
Test scenarios:
- Interactive mode still presents questions and next-step prompts
mode:autonomousnever asks a question and never waits for user inputmode:report-onlyperforms no edits and no commit/push/PR actions- A helper-agent output can be preserved in the final report without being treated as auto-fixable work
Verification:
-
tests/review-skill-contract.test.tsasserts the three mode markers and interactive scoping rules -
bun run release:validatepasses -
Unit 2: Redesign the fix loop around policy-driven safe autofix and bounded re-review
Goal: Replace the current severity-prompt-centric fix loop with one that works in both interactive and autonomous contexts.
Requirements: R2, R4, R5, R7
Dependencies: Unit 1
Files:
- Modify:
plugins/compound-engineering/skills/ce-review-beta/SKILL.md - Add:
plugins/compound-engineering/skills/ce-review-beta/references/fix-policy.md(if the classification and policy table becomes too large forSKILL.md) - Modify:
plugins/compound-engineering/skills/ce-review-beta/references/review-output-template.md
Approach:
- Replace "Severity Acceptance" as the primary decision point with a classification stage that groups synthesized findings by
autofix_class - In interactive mode, ask the user only for policy decisions that remain ambiguous after classification
- In autonomous mode, use conservative defaults:
- apply
safe_auto - leave
gated_auto,manual, andadvisoryunresolved
- apply
- Keep the "exactly one fixer subagent" rule for consistency
- Bound the loop with
max_rounds(for example 2) and require targeted verification plus focused re-review after any applied fix set - Restrict commit / push / PR creation steps to interactive mode only; autonomous and report-only modes stop after emitting outputs
Patterns to follow:
docs/solutions/skill-design/compound-refresh-skill-improvements.mdapplied-vs-recommended distinction- Existing
ce-review-betasingle-fixer rule
Test scenarios:
- A
safe_autotesting finding gets fixed and re-reviewed without user input in autonomous mode - A
gated_autoAPI contract or authz finding is preserved as residual actionable work, not auto-fixed - A deployment checklist remains advisory and never enters the fixer queue
- Zero findings skip the fix phase entirely
- Re-review is bounded and does not recurse indefinitely
Verification:
-
tests/review-skill-contract.test.tsasserts that autonomous mode has no mandatory user-question step in the fix path -
Manual dry run: read the fix-loop prose end-to-end and verify there is no mutation-owning step outside the policy gate
-
Unit 3: Define residual artifact and downstream handoff behavior
Goal: Make autonomous review compatible with downstream workflows instead of competing with them.
Requirements: R5, R6, R7
Dependencies: Unit 2
Files:
- Modify:
plugins/compound-engineering/skills/ce-review-beta/SKILL.md - Modify:
plugins/compound-engineering/skills/resolve-todo-parallel/SKILL.md - Modify:
plugins/compound-engineering/skills/file-todos/SKILL.md - Add:
plugins/compound-engineering/skills/ce-review-beta/references/residual-work-template.md(if a dedicated durable-work shape helps keep review prose smaller)
Approach:
- Write a per-run review artifact under
.context/compound-engineering/ce-review-beta/<run-id>/containing:- synthesized findings
- what was auto-fixed
- what remains unresolved
- advisory-only outputs
- Create durable
todos/items only for unresolved actionable findings whoseowneris downstream resolution - Update
resolve-todo-parallelto acknowledge this source explicitly so residual review work can be picked up without pretending everything came from stablece:review - Update
file-todosintegration guidance to reflect the new flow:- review-beta autonomous -> residual todos -> resolve-todo-parallel
- advisory-only outputs do not become todos
Patterns to follow:
.context/compound-engineering/<workflow>/<run-id>/scratch-space convention fromAGENTS.md- Existing
file-todosreview/resolution lifecycle
Test scenarios:
- Autonomous review with only advisory outputs creates no todos
- Autonomous review with 2 unresolved actionable findings creates exactly 2 residual todos
- Residual work items exclude protected-artifact cleanup suggestions
- The run artifact is sufficient to explain what the in-skill fixer changed vs. what remains
Verification:
-
tests/review-skill-contract.test.tsasserts the documented.contextandtodos/handoff rules -
bun run release:validatepasses after any skill inventory/reference changes -
Unit 4: Add contract-focused regression coverage for mode, handoff, and future-integration boundaries
Goal: Catch the specific instruction-boundary regressions that have repeatedly escaped manual review.
Requirements: R8, R9
Dependencies: Units 1-3
Files:
- Add:
tests/review-skill-contract.test.ts - Optionally modify:
package.jsononly if a new test entry point is required (prefer using the existing Bun test setup without package changes)
Approach:
- Add a focused test that reads the relevant skill files and asserts contract-level invariants instead of brittle full-file snapshots
- Cover:
ce-review-betamode markers and mode-specific behavior phrases- absence of unconditional interactive prompts in autonomous/report-only paths
- explicit residual-work handoff language
- explicit documentation that mutating review must not run concurrently with browser testing on the same checkout
- Keep assertions semantic and localized; avoid snapshotting large markdown files
Patterns to follow:
- Existing Bun tests that read repository files directly for release/config validation
Test scenarios:
- Missing
mode:autonomousblock fails - Reintroduced unconditional "Ask the user" text in the autonomous path fails
- Missing residual todo handoff text fails
- Missing future integration constraint around mutating review vs. browser testing fails
Verification:
bun test tests/review-skill-contract.test.ts- full
bun test
Risks & Dependencies
- Over-aggressive autofix classification.
- Mitigation: conservative defaults,
gated_autobucket, bounded rounds, focused re-review
- Mitigation: conservative defaults,
- Dual ownership confusion between
ce:review-betaandresolve-todo-parallel.- Mitigation: explicit owner/routing metadata and durable residual-work contract
- Brittle contract tests.
- Mitigation: assert only boundary invariants, not full markdown snapshots
- Promotion churn.
- Mitigation: keep beta isolated until Unit 4 contract coverage and manual verification pass
Sources & References
- Related skills:
plugins/compound-engineering/skills/ce-review-beta/SKILL.mdplugins/compound-engineering/skills/ce-review/SKILL.mdplugins/compound-engineering/skills/resolve-todo-parallel/SKILL.mdplugins/compound-engineering/skills/file-todos/SKILL.mdplugins/compound-engineering/skills/lfg/SKILL.mdplugins/compound-engineering/skills/slfg/SKILL.md
- Institutional learnings:
docs/solutions/skill-design/compound-refresh-skill-improvements.mddocs/solutions/skill-design/beta-skills-framework.md
- Supporting pattern reference:
plugins/compound-engineering/skills/ce-compound-refresh/SKILL.mdplugins/compound-engineering/skills/ce-plan/SKILL.md