feat: add ce:review-beta with structured persona pipeline (#348)

Trevin Chow
2026-03-23 21:49:04 -07:00
committed by GitHub
parent 0fdc25a36c
commit e932276866
22 changed files with 1794 additions and 11 deletions

@@ -0,0 +1,316 @@
---
title: "feat: Make ce:review-beta autonomous and pipeline-safe"
type: feat
status: active
date: 2026-03-23
origin: direct user request and planning discussion on ce:review-beta standalone vs. autonomous pipeline behavior
---
# Make ce:review-beta Autonomous and Pipeline-Safe
## Overview
Redesign `ce:review-beta` from a purely interactive standalone review workflow into a policy-driven review engine that supports three explicit modes: `interactive`, `autonomous`, and `report-only`. The redesign should preserve the current standalone UX for manual review, enable hands-off review and safe autofix in automated workflows, and define a clean residual-work handoff for anything that should not be auto-fixed. This plan remains beta-only; promotion to stable `ce:review` and any `lfg` / `slfg` cutover should happen only in a follow-up plan after the beta behavior is validated.
## Problem Frame
`ce:review-beta` currently mixes three responsibilities in one loop:
1. Review and synthesis
2. Human approval on what to fix
3. Local fixing, re-review, and push/PR next steps
That is acceptable for standalone use, but it is the wrong shape for autonomous orchestration:
- `lfg` currently treats review as an upstream producer before downstream resolution and browser testing
- `slfg` currently runs review and browser testing in parallel, which is only safe if review is non-mutating
- `resolve-todo-parallel` expects a durable residual-work contract (`todos/`), while `ce:review-beta` currently tries to resolve accepted findings inline
- The findings schema lacks routing metadata, so severity is doing too much work; urgency and autofix eligibility are distinct concerns
The result is a workflow that is hard to promote safely: it can be interactive, or autonomous, or mutation-owning, but not all three at once without an explicit mode model and clearer ownership boundaries.
## Requirements Trace
- R1. `ce:review-beta` supports explicit execution modes: `interactive` (default), `autonomous`, and `report-only`
- R2. `autonomous` mode never asks the user questions, never waits for approval, and applies only policy-allowed safe fixes
- R3. `report-only` mode is strictly read-only and safe to run in parallel with other read-only verification steps
- R4. Findings are routed by explicit fixability metadata, not by severity alone
- R5. `ce:review-beta` can run one bounded in-skill autofix pass for `safe_auto` findings and then re-review the changed scope
- R6. Residual actionable findings are emitted as durable downstream work artifacts; advisory outputs remain report-only
- R7. CE helper outputs (`learnings`, `agent-native`, `schema-drift`, `deployment-verification`) are preserved but only some become actionable work items
- R8. The beta contract makes future orchestration constraints explicit so a later `lfg` / `slfg` cutover does not run a mutating review concurrently with browser testing on the same checkout
- R9. Repeated regression classes around interaction mode, routing, and orchestration boundaries gain lightweight contract coverage
## Scope Boundaries
- Keep the existing persona ensemble, confidence gate, and synthesis model as the base architecture
- Do not redesign every reviewer persona's prompt beyond the metadata they need to emit
- Do not introduce a new general-purpose orchestration framework; reuse existing skill patterns where possible
- Do not auto-fix deployment checklists, residual risks, or other advisory-only outputs
- Do not attempt broad converter/platform work in this change unless the review skill's frontmatter or references require it
- Beta remains the only implementation target in this plan; stable promotion is intentionally deferred to a follow-up plan after validation
## Context & Research
### Relevant Code and Patterns
- `plugins/compound-engineering/skills/ce-review-beta/SKILL.md`
  - Current staged review pipeline with interactive severity acceptance, inline fixer, re-review offer, and post-fix push/PR actions
- `plugins/compound-engineering/skills/ce-review-beta/references/findings-schema.json`
  - Structured persona finding contract today; currently missing routing metadata for autonomous handling
- `plugins/compound-engineering/skills/ce-review/SKILL.md`
  - Current stable review workflow; creates durable `todos/` artifacts rather than fixing findings inline
- `plugins/compound-engineering/skills/resolve-todo-parallel/SKILL.md`
  - Existing residual-work resolver; parallelizes item handling once work has already been externalized
- `plugins/compound-engineering/skills/file-todos/SKILL.md`
  - Existing review -> triage -> todo -> resolve integration contract
- `plugins/compound-engineering/skills/lfg/SKILL.md`
  - Sequential orchestrator whose future cutover constraints should inform the beta contract, even though this plan does not modify it
- `plugins/compound-engineering/skills/slfg/SKILL.md`
  - Swarm orchestrator whose current review/browser parallelism defines an important future integration constraint, even though this plan does not modify it
- `plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md`
  - Strong repo precedent for explicit `mode:autonomous` argument handling and conservative non-interactive behavior
- `plugins/compound-engineering/skills/ce-plan/SKILL.md`
  - Strong repo precedent for pipeline mode skipping interactive questions
### Institutional Learnings
- `docs/solutions/skill-design/compound-refresh-skill-improvements.md`
  - Explicit autonomous mode beats tool-based auto-detection
  - Ambiguous cases in autonomous mode should be recorded conservatively, not guessed
  - Report structure should distinguish applied actions from recommended follow-up
- `docs/solutions/skill-design/beta-skills-framework.md`
  - Beta skills should remain isolated until validated
  - Promotion is the right time to rewire `lfg` / `slfg`, which is out of scope for this plan
### External Research Decision
Skipped. This is a repo-internal orchestration and skill-design change with strong existing local patterns for autonomous mode, beta promotion, and residual-work handling.
## Key Technical Decisions
- **Use explicit mode arguments instead of auto-detection.** Follow `ce:compound-refresh` and require `mode:autonomous` / `mode:report-only` arguments. Interactive remains the default. This avoids conflating "no question tool" with "headless workflow."
- **Split review from mutation semantically, not by creating two separate skills.** `ce:review-beta` should always perform the same review and synthesis stages. Mutation behavior becomes a mode-controlled phase layered on top.
- **Route by fixability, not severity.** Add explicit per-finding routing fields such as `autofix_class`, `owner`, and `requires_verification`. Severity remains urgency; it no longer implies who acts.
- **Keep one in-skill fixer, but only for `safe_auto` findings.** The current "one fixer subagent" rule is still right for consistent-tree edits. The change is that the fixer is selected by policy and routing metadata, not by an interactive severity prompt.
- **Emit both ephemeral and durable outputs.** Use `.context/compound-engineering/ce-review-beta/<run-id>/` for the per-run machine-readable report and create durable `todos/` items only for unresolved actionable findings that belong downstream.
- **Treat CE helper outputs by artifact class.**
  - `learnings-researcher`: contextual/advisory unless a concrete finding corroborates it
  - `agent-native-reviewer`: often `gated_auto` or `manual`, occasionally `safe_auto` when the fix is purely local and mechanical
  - `schema-drift-detector`: defaults to `manual` or `gated_auto`; never auto-fixed blindly
  - `deployment-verification-agent`: always advisory / operational, never autofix
- **Design the beta contract so future orchestration cutover is safe.** The beta must make it explicit that mutating review cannot run concurrently with browser testing on the same checkout. That requirement is part of validation and future cutover criteria, not a same-plan rewrite of `slfg`.
- **Move push / PR creation decisions out of autonomous review.** Interactive standalone mode may still offer next-step prompts. Autonomous and report-only modes should stop after producing fixes and/or residual artifacts; any future parent workflow decides commit, push, and PR timing.
- **Add lightweight contract tests.** Repeated regressions have come from instruction-boundary drift. String- and structure-level contract tests are justified here even though the behavior is prompt-driven.
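The explicit-mode decision can be pinned down with a minimal sketch, assuming the skill receives its arguments as one whitespace-separated string. `detectMode` is a hypothetical name, and rejecting conflicting or unknown `mode:` tokens is an assumption of this sketch rather than something the plan specifies; the skill itself is prose instructions, so this only illustrates the intended semantics.

```typescript
// Hypothetical sketch of explicit mode detection for ce:review-beta.
// The real skill is prose instructions; this only pins down the semantics.
type ReviewMode = "interactive" | "autonomous" | "report-only";

const KNOWN_MODES: ReadonlySet<string> = new Set(["interactive", "autonomous", "report-only"]);

function detectMode(args: string): ReviewMode {
  // Only exact `mode:<value>` tokens count; everything else is ignored here.
  const requested = args
    .split(/\s+/)
    .filter((token) => token.startsWith("mode:"))
    .map((token) => token.slice("mode:".length));

  // No explicit flag means interactive. Autonomy is never inferred from
  // the absence of a question tool.
  if (requested.length === 0) return "interactive";

  // Assumption: conflicting flags are an error rather than "last one wins".
  const distinct = Array.from(new Set(requested));
  if (distinct.length > 1) {
    throw new Error(`conflicting mode arguments: ${distinct.join(", ")}`);
  }
  if (!KNOWN_MODES.has(distinct[0])) {
    throw new Error(`unknown mode: ${distinct[0]}`);
  }
  return distinct[0] as ReviewMode;
}
```

Failing loudly on a typo such as `mode:autonmous` follows the "record ambiguity conservatively, don't guess" learning: a pipeline caller never silently falls back to interactive prompts.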
## Open Questions
### Resolved During Planning
- **Should `ce:review-beta` keep any embedded fix loop?** Yes, but only for `safe_auto` findings under an explicit mode/policy. Residual work is handed off.
- **Should autonomous mode be inferred from lack of interactivity?** No. Use explicit `mode:autonomous`.
- **Should `slfg` keep review and browser testing in parallel?** Only while review is read-only. Once review can mutate the checkout, browser testing should run after the mutating review phase, on the stabilized tree; a `mode:report-only` review remains safe to run in parallel.
- **Should residual work be `todos/`, `.context/`, or both?** Both. `.context` holds the run artifact; `todos/` is only for durable unresolved actionable work.
### Deferred to Implementation
- Exact metadata field names in `findings-schema.json`
- Whether `report-only` should imply a different default output template section ordering than `interactive` / `autonomous`
- Whether residual `todos/` should be created directly by `ce:review-beta` or via a small shared helper/reference template used by both review and resolver flows
## High-Level Technical Design
This illustrates the intended approach and is directional guidance for review, not an implementation specification. The implementing agent should treat it as context, not code to reproduce.
```text
review stages -> synthesize -> classify outputs by autofix_class/owner
-> if mode=report-only: emit report + stop
-> if mode=interactive: acquire policy from user
-> if mode=autonomous: use policy from arguments/defaults
-> run single fixer on safe_auto set
-> verify tests + focused re-review
-> emit residual todos for unresolved actionable items
-> emit advisory/report sections for non-actionable outputs
```
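The same flow can be read as a function from mode to the phases that run. This is a sketch only: `phasesFor` and the phase names are hypothetical labels for the diagram's steps, and skipping the fixer when there are no `safe_auto` findings mirrors the "zero findings skip the fix phase" scenario in Unit 2.

```typescript
// Hypothetical phase labels for the diagram above; not names used by the skill.
type ReviewMode = "interactive" | "autonomous" | "report-only";
type Phase =
  | "review"
  | "synthesize"
  | "classify"
  | "acquire-policy-from-user"
  | "use-default-policy"
  | "fix-safe-auto"
  | "verify-and-rereview"
  | "emit-residual-todos"
  | "emit-report";

function phasesFor(mode: ReviewMode, safeAutoCount: number): Phase[] {
  // Every mode shares the same review and synthesis stages.
  const phases: Phase[] = ["review", "synthesize", "classify"];

  if (mode === "report-only") {
    // Read-only: no policy step, no fixer, no todos; emit the report and stop.
    return [...phases, "emit-report"];
  }

  // Interactive asks the user; autonomous uses arguments/defaults.
  phases.push(mode === "interactive" ? "acquire-policy-from-user" : "use-default-policy");

  // The single fixer runs only when there is a safe_auto set to work on.
  if (safeAutoCount > 0) {
    phases.push("fix-safe-auto", "verify-and-rereview");
  }

  phases.push("emit-residual-todos", "emit-report");
  return phases;
}
```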
## Implementation Units
- [x] **Unit 1: Add explicit mode handling and routing metadata to ce:review-beta**
**Goal:** Give `ce:review-beta` a clear execution contract for standalone, autonomous, and read-only pipeline use.
**Requirements:** R1, R2, R3, R4, R7
**Dependencies:** None
**Files:**
- Modify: `plugins/compound-engineering/skills/ce-review-beta/SKILL.md`
- Modify: `plugins/compound-engineering/skills/ce-review-beta/references/findings-schema.json`
- Modify: `plugins/compound-engineering/skills/ce-review-beta/references/review-output-template.md`
- Modify: `plugins/compound-engineering/skills/ce-review-beta/references/subagent-template.md` (if routing metadata needs to be spelled out in spawn prompts)
**Approach:**
- Add a Mode Detection section near the top of `SKILL.md` using the established `mode:autonomous` argument pattern from `ce:compound-refresh`
- Introduce `mode:report-only` alongside `mode:autonomous`
- Scope all interactive question instructions so they apply only to interactive mode
- Extend `findings-schema.json` with routing-oriented fields such as:
  - `autofix_class`: `safe_auto | gated_auto | manual | advisory`
  - `owner`: `review-fixer | downstream-resolver | human | release`
  - `requires_verification`: boolean
- Update the review output template so the final report can distinguish:
  - applied fixes
  - residual actionable work
  - advisory / operational notes
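A TypeScript mirror of the proposed routing fields helps show which combinations should be rejected. This is a sketch under assumptions: exact field names are deferred to implementation, `id` and `source` are hypothetical fields, and the per-helper defaults (including the owners chosen for advisory output) are one reading of the artifact-class guidance in Key Technical Decisions, not a fixed table.

```typescript
// Hypothetical mirror of the proposed routing metadata; field names may change.
type AutofixClass = "safe_auto" | "gated_auto" | "manual" | "advisory";
type Owner = "review-fixer" | "downstream-resolver" | "human" | "release";

interface Finding {
  id: string;     // assumed identifier
  source: string; // assumed: persona or helper agent that produced the finding
  autofix_class: AutofixClass;
  owner: Owner;
  requires_verification: boolean;
}

// Assumed starting defaults for CE helper outputs; reviewers may override per finding.
const HELPER_DEFAULTS: Record<string, Pick<Finding, "autofix_class" | "owner">> = {
  "learnings-researcher": { autofix_class: "advisory", owner: "human" },
  "agent-native-reviewer": { autofix_class: "gated_auto", owner: "downstream-resolver" },
  "schema-drift-detector": { autofix_class: "manual", owner: "downstream-resolver" },
  "deployment-verification-agent": { autofix_class: "advisory", owner: "release" },
};

// Returns human-readable problems; an empty array means the routing is consistent.
function routingProblems(f: Finding): string[] {
  const problems: string[] = [];
  if (f.autofix_class === "safe_auto" && f.owner !== "review-fixer") {
    problems.push(`${f.id}: safe_auto findings belong to the in-skill fixer`);
  }
  if (f.autofix_class === "advisory" && f.owner === "review-fixer") {
    problems.push(`${f.id}: advisory output must never enter the fixer queue`);
  }
  if (f.source === "deployment-verification-agent" && f.autofix_class !== "advisory") {
    problems.push(`${f.id}: deployment verification output is always advisory`);
  }
  return problems;
}
```

Severity is deliberately absent from these checks: it stays an urgency signal and never decides who acts.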
**Patterns to follow:**
- `plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md` explicit autonomous mode structure
- `plugins/compound-engineering/skills/ce-plan/SKILL.md` pipeline-mode question skipping
**Test scenarios:**
- Interactive mode still presents questions and next-step prompts
- `mode:autonomous` never asks a question and never waits for user input
- `mode:report-only` performs no edits and no commit/push/PR actions
- A helper-agent output can be preserved in the final report without being treated as auto-fixable work
**Verification:**
- `tests/review-skill-contract.test.ts` asserts the three mode markers and interactive scoping rules
- `bun run release:validate` passes
- [x] **Unit 2: Redesign the fix loop around policy-driven safe autofix and bounded re-review**
**Goal:** Replace the current severity-prompt-centric fix loop with one that works in both interactive and autonomous contexts.
**Requirements:** R2, R4, R5, R7
**Dependencies:** Unit 1
**Files:**
- Modify: `plugins/compound-engineering/skills/ce-review-beta/SKILL.md`
- Add: `plugins/compound-engineering/skills/ce-review-beta/references/fix-policy.md` (if the classification and policy table becomes too large for `SKILL.md`)
- Modify: `plugins/compound-engineering/skills/ce-review-beta/references/review-output-template.md`
**Approach:**
- Replace "Severity Acceptance" as the primary decision point with a classification stage that groups synthesized findings by `autofix_class`
- In interactive mode, ask the user only for policy decisions that remain ambiguous after classification
- In autonomous mode, use conservative defaults:
  - apply `safe_auto`
  - leave `gated_auto`, `manual`, and `advisory` unresolved
- Keep the "exactly one fixer subagent" rule for consistency
- Bound the loop with `max_rounds` (for example, 2) and require targeted verification plus focused re-review after any applied fix set
- Restrict commit / push / PR creation steps to interactive mode only; autonomous and report-only modes stop after emitting outputs
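The autonomous fix loop can be sketched as a bounded iteration. Assumptions: `applyFixes` and `reReview` are hypothetical callbacks standing in for the single fixer subagent (one call per round) and the focused re-review of the changed scope; `maxRounds = 2` uses the plan's example value; targeted test verification is not modeled, and neither is interactive approval of `gated_auto` items.

```typescript
type AutofixClass = "safe_auto" | "gated_auto" | "manual" | "advisory";

interface Finding {
  id: string;
  autofix_class: AutofixClass;
}

interface FixLoopResult {
  applied: string[];   // ids the fixer reported as fixed
  residual: Finding[]; // everything left for downstream handling or the report
  rounds: number;      // fixer rounds actually run
}

// Autonomous defaults: only safe_auto is fixed; everything else stays unresolved.
function runFixLoop(
  initial: Finding[],
  applyFixes: (batch: Finding[]) => string[], // hypothetical: single fixer, one call per round
  reReview: (fixedIds: string[]) => Finding[], // hypothetical: focused re-review of changed scope
  maxRounds = 2,
): FixLoopResult {
  const applied: string[] = [];
  const residual: Finding[] = [];
  let pending = initial;
  let rounds = 0;

  while (rounds < maxRounds) {
    const safe = pending.filter((f) => f.autofix_class === "safe_auto");
    residual.push(...pending.filter((f) => f.autofix_class !== "safe_auto"));
    pending = [];
    if (safe.length === 0) break; // nothing eligible: skip the fix phase

    rounds++;
    const fixed = new Set(applyFixes(safe));
    applied.push(...safe.filter((f) => fixed.has(f.id)).map((f) => f.id));
    // safe_auto findings the fixer could not resolve become residual work
    residual.push(...safe.filter((f) => !fixed.has(f.id)));
    pending = reReview(Array.from(fixed));
  }

  // Findings from the last allowed re-review are handed off, never recursed on.
  residual.push(...pending);
  return { applied, residual, rounds };
}
```

Capping rounds rather than looping until clean is what keeps re-review from recursing indefinitely when each fix surfaces a new finding.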
**Patterns to follow:**
- `docs/solutions/skill-design/compound-refresh-skill-improvements.md` applied-vs-recommended distinction
- Existing `ce-review-beta` single-fixer rule
**Test scenarios:**
- A `safe_auto` testing finding gets fixed and re-reviewed without user input in autonomous mode
- A `gated_auto` API contract or authz finding is preserved as residual actionable work, not auto-fixed
- A deployment checklist remains advisory and never enters the fixer queue
- Zero findings skip the fix phase entirely
- Re-review is bounded and does not recurse indefinitely
**Verification:**
- `tests/review-skill-contract.test.ts` asserts that autonomous mode has no mandatory user-question step in the fix path
- Manual dry run: read the fix-loop prose end-to-end and verify there is no mutation-owning step outside the policy gate
- [x] **Unit 3: Define residual artifact and downstream handoff behavior**
**Goal:** Make autonomous review compatible with downstream workflows instead of competing with them.
**Requirements:** R5, R6, R7
**Dependencies:** Unit 2
**Files:**
- Modify: `plugins/compound-engineering/skills/ce-review-beta/SKILL.md`
- Modify: `plugins/compound-engineering/skills/resolve-todo-parallel/SKILL.md`
- Modify: `plugins/compound-engineering/skills/file-todos/SKILL.md`
- Add: `plugins/compound-engineering/skills/ce-review-beta/references/residual-work-template.md` (if a dedicated durable-work shape helps keep review prose smaller)
**Approach:**
- Write a per-run review artifact under `.context/compound-engineering/ce-review-beta/<run-id>/` containing:
  - synthesized findings
  - what was auto-fixed
  - what remains unresolved
  - advisory-only outputs
- Create durable `todos/` items only for unresolved actionable findings whose `owner` is `downstream-resolver`
- Update `resolve-todo-parallel` to acknowledge this source explicitly so residual review work can be picked up without pretending everything came from stable `ce:review`
- Update `file-todos` integration guidance to reflect the new flow:
  - review-beta autonomous -> residual todos -> resolve-todo-parallel
  - advisory-only outputs do not become todos
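The handoff rule reduces to two filters over the run artifact. A minimal sketch, assuming a hypothetical in-memory `RunArtifact` shape (the plan lists the artifact's contents but not its format) and hypothetical helper names `buildRunArtifact` and `residualTodoCandidates`; writing files to `.context/` or `todos/` is left out.

```typescript
type AutofixClass = "safe_auto" | "gated_auto" | "manual" | "advisory";
type Owner = "review-fixer" | "downstream-resolver" | "human" | "release";

interface Finding {
  id: string;
  title: string;
  autofix_class: AutofixClass;
  owner: Owner;
}

// Hypothetical in-memory shape of the per-run artifact.
interface RunArtifact {
  runDir: string;
  synthesized: Finding[];
  autoFixed: string[];
  unresolved: Finding[];
  advisory: Finding[];
}

function buildRunArtifact(runId: string, synthesized: Finding[], autoFixed: string[]): RunArtifact {
  const fixed = new Set(autoFixed);
  return {
    runDir: `.context/compound-engineering/ce-review-beta/${runId}/`,
    synthesized,
    autoFixed,
    // Actionable work the in-skill fixer did not resolve.
    unresolved: synthesized.filter((f) => f.autofix_class !== "advisory" && !fixed.has(f.id)),
    // Advisory output is reported but never becomes a todo.
    advisory: synthesized.filter((f) => f.autofix_class === "advisory"),
  };
}

// Only unresolved actionable findings owned by downstream resolution become durable todos.
function residualTodoCandidates(artifact: RunArtifact): Finding[] {
  return artifact.unresolved.filter((f) => f.owner === "downstream-resolver");
}
```

Unresolved findings owned by `human` or `release` stay visible in the run artifact without being pushed into `resolve-todo-parallel`.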
**Patterns to follow:**
- `.context/compound-engineering/<workflow>/<run-id>/` scratch-space convention from `AGENTS.md`
- Existing `file-todos` review/resolution lifecycle
**Test scenarios:**
- Autonomous review with only advisory outputs creates no todos
- Autonomous review with 2 unresolved actionable findings creates exactly 2 residual todos
- Residual work items exclude protected-artifact cleanup suggestions
- The run artifact is sufficient to explain what the in-skill fixer changed vs. what remains
**Verification:**
- `tests/review-skill-contract.test.ts` asserts the documented `.context` and `todos/` handoff rules
- `bun run release:validate` passes after any skill inventory/reference changes
- [x] **Unit 4: Add contract-focused regression coverage for mode, handoff, and future-integration boundaries**
**Goal:** Catch the specific instruction-boundary regressions that have repeatedly escaped manual review.
**Requirements:** R8, R9
**Dependencies:** Units 1-3
**Files:**
- Add: `tests/review-skill-contract.test.ts`
- Optionally modify: `package.json` only if a new test entry point is required (prefer using the existing Bun test setup without package changes)
**Approach:**
- Add a focused test that reads the relevant skill files and asserts contract-level invariants instead of brittle full-file snapshots
- Cover:
  - `ce-review-beta` mode markers and mode-specific behavior phrases
  - absence of unconditional interactive prompts in autonomous/report-only paths
  - explicit residual-work handoff language
  - explicit documentation that mutating review must not run concurrently with browser testing on the same checkout
- Keep assertions semantic and localized; avoid snapshotting large markdown files
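The invariants above can be expressed as small text checks rather than snapshots. This is a hypothetical sketch: `contractViolations` and its helpers are illustrative names, the heading-based section split is deliberately naive, and the concurrency regex assumes the constraint is stated on a single line. In the repo the checks would sit inside `test()` blocks from `bun:test` and receive the contents of `ce-review-beta/SKILL.md`.

```typescript
interface Section {
  heading: string;
  lines: string[];
}

// Splits markdown on ATX headings. Naive: a section ends at the next heading
// of any level, and fenced code is not treated specially.
function splitSections(markdown: string): Section[] {
  const sections: Section[] = [{ heading: "", lines: [] }];
  for (const line of markdown.split("\n")) {
    if (/^#{1,6}\s/.test(line)) {
      sections.push({ heading: line, lines: [] });
    } else {
      sections[sections.length - 1].lines.push(line);
    }
  }
  return sections;
}

// True when a line tells the agent to ask the user without a negation.
function isUnconditionalAsk(line: string): boolean {
  return /\bask the user\b/i.test(line) && !/\b(never|do not|don't)\b.*\bask the user\b/i.test(line);
}

function contractViolations(skill: string): string[] {
  const violations: string[] = [];
  for (const marker of ["mode:autonomous", "mode:report-only"]) {
    if (!skill.includes(marker)) violations.push(`missing ${marker}`);
  }
  // Only sections whose heading names a non-interactive mode are scanned.
  for (const section of splitSections(skill)) {
    if (!/autonomous|report-only/i.test(section.heading)) continue;
    if (section.lines.some(isUnconditionalAsk)) {
      violations.push(`unconditional user question under "${section.heading.trim()}"`);
    }
  }
  if (!skill.includes("todos/") || !skill.includes(".context/compound-engineering/ce-review-beta/")) {
    violations.push("missing residual-work handoff paths");
  }
  if (!/concurrent[^\n]*browser testing|browser testing[^\n]*concurrent/i.test(skill)) {
    violations.push("missing mutating-review vs. browser-testing constraint");
  }
  return violations;
}
```

Returning a list of violations, rather than asserting inside the helper, lets one test report every broken invariant at once.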
**Patterns to follow:**
- Existing Bun tests that read repository files directly for release/config validation
**Test scenarios:**
- Missing `mode:autonomous` block fails
- Reintroduced unconditional "Ask the user" text in the autonomous path fails
- Missing residual todo handoff text fails
- Missing future integration constraint around mutating review vs. browser testing fails
**Verification:**
- `bun test tests/review-skill-contract.test.ts`
- full `bun test`
## Risks & Dependencies
- **Over-aggressive autofix classification.**
  - Mitigation: conservative defaults, `gated_auto` bucket, bounded rounds, focused re-review
- **Dual ownership confusion between `ce:review-beta` and `resolve-todo-parallel`.**
  - Mitigation: explicit owner/routing metadata and durable residual-work contract
- **Brittle contract tests.**
  - Mitigation: assert only boundary invariants, not full markdown snapshots
- **Promotion churn.**
  - Mitigation: keep beta isolated until Unit 4 contract coverage and manual verification pass
## Sources & References
- Related skills:
  - `plugins/compound-engineering/skills/ce-review-beta/SKILL.md`
  - `plugins/compound-engineering/skills/ce-review/SKILL.md`
  - `plugins/compound-engineering/skills/resolve-todo-parallel/SKILL.md`
  - `plugins/compound-engineering/skills/file-todos/SKILL.md`
  - `plugins/compound-engineering/skills/lfg/SKILL.md`
  - `plugins/compound-engineering/skills/slfg/SKILL.md`
- Institutional learnings:
  - `docs/solutions/skill-design/compound-refresh-skill-improvements.md`
  - `docs/solutions/skill-design/beta-skills-framework.md`
- Supporting pattern reference:
  - `plugins/compound-engineering/skills/ce-compound-refresh/SKILL.md`
  - `plugins/compound-engineering/skills/ce-plan/SKILL.md`

@@ -13,6 +13,7 @@ severity: medium
description: "Pattern for trialing new skill versions alongside stable ones using a -beta suffix. Covers naming, plan file naming, internal references, and promotion path."
related:
- docs/solutions/skill-design/compound-refresh-skill-improvements.md
- docs/solutions/skill-design/review-skill-promotion-orchestration-contract.md
---
## Problem
@@ -79,6 +80,8 @@ When the beta version is validated:
8. Verify `lfg`/`slfg` work with the promoted skill
9. Verify `ce:work` consumes plans from the promoted skill
If the beta skill changed its invocation contract, promotion must also update all orchestration callers in the same PR instead of relying on the stable default behavior. See [review-skill-promotion-orchestration-contract.md](./review-skill-promotion-orchestration-contract.md) for the concrete review-skill example.
## Validation
After creating a beta skill, search its SKILL.md for references to the stable skill name it replaces. Any occurrence of the stable name without `-beta` is a missed rename — it would cause output collisions or route to the wrong skill.
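That search can be scripted. A minimal sketch with hypothetical helpers `escapeRegExp` and `missedRenames`: it flags the stable name unless it is immediately followed by `-beta` or another word character, so `ce:review-beta` and a longer name such as `ce:reviewer` are not reported. Run it once per form of the stable name, for example `ce:review` and `ce-review`.

```typescript
// Escapes regex metacharacters so a skill name can be matched literally.
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Returns 1-based line numbers where the stable name appears without `-beta`.
function missedRenames(skillText: string, stableName: string): number[] {
  // No "g" flag: RegExp.test stays stateless across lines.
  const pattern = new RegExp(`${escapeRegExp(stableName)}(?!-beta)(?!\\w)`);
  const hits: number[] = [];
  skillText.split("\n").forEach((line, index) => {
    if (pattern.test(line)) hits.push(index + 1);
  });
  return hits;
}
```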

@@ -0,0 +1,80 @@
---
title: "Promoting review-beta to stable must update orchestration callers in the same change"
category: skill-design
date: 2026-03-23
module: plugins/compound-engineering/skills
component: SKILL.md
tags:
- skill-design
- beta-testing
- rollout-safety
- orchestration
- review-workflow
severity: medium
description: "When ce:review-beta is promoted to stable, update lfg/slfg in the same PR so they pass the correct mode instead of inheriting the interactive default."
related:
- docs/solutions/skill-design/beta-skills-framework.md
- docs/plans/2026-03-23-001-feat-ce-review-beta-pipeline-mode-beta-plan.md
---
## Problem
`ce:review-beta` introduces an explicit mode contract:
- default `interactive`
- `mode:autonomous`
- `mode:report-only`
That is correct for direct user invocation, but it creates a promotion hazard. If the beta skill is later promoted over stable `ce:review` without updating its orchestration callers, the surrounding workflows will silently inherit the interactive default.
For the current review workflow family, that would be wrong:
- `lfg` should run review in `mode:autonomous`
- `slfg` should run review in `mode:report-only` during its parallel review/browser phase
Without those caller changes, promotion would keep the skill name stable while changing its contract, which is exactly the kind of boundary drift that tends to escape manual review.
## Solution
Treat promotion as an orchestration contract change, not a file rename.
When promoting `ce:review-beta` to stable:
1. Replace stable `ce:review` with the promoted content
2. Update every workflow that invokes `ce:review` in the same PR
3. Hardcode the intended mode at each callsite instead of relying on the default
4. Add or update contract tests so the orchestration assumptions are executable
For the review workflow family, the expected caller contract is:
- `lfg` -> `ce:review mode:autonomous`
- `slfg` parallel phase -> `ce:review mode:report-only`
- any mutating review step in `slfg` must never overlap with browser testing on the same checkout: run it sequentially, or in an isolated checkout/worktree
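Step 4 above (making the orchestration assumptions executable) could look like the following sketch. `EXPECTED_CALLER_MODES`, `bareReviewCallsites`, and `callerContractViolations` are hypothetical names, and the checks assume callsites are written as `ce:review mode:<value>` on one line. Because they are plain text scans, prose that merely mentions `ce:review` would also be flagged and may need tuning.

```typescript
type CallerMode = "autonomous" | "report-only";

// Expected mode per orchestrator, taken from the caller contract above.
const EXPECTED_CALLER_MODES: Record<string, CallerMode> = {
  lfg: "autonomous",
  slfg: "report-only", // the parallel review/browser phase
};

// Lines that invoke stable `ce:review` without any explicit `mode:` flag.
function bareReviewCallsites(orchestratorText: string): string[] {
  return orchestratorText
    .split("\n")
    .filter((line) => /ce:review(?![\w-])/.test(line) && !/\bmode:[a-z-]+/.test(line));
}

function callerContractViolations(orchestrator: string, text: string): string[] {
  const violations = bareReviewCallsites(text).map((line) => `bare callsite: ${line.trim()}`);
  const expected = EXPECTED_CALLER_MODES[orchestrator];
  if (expected && !text.includes(`ce:review mode:${expected}`)) {
    violations.push(`expected ce:review mode:${expected}`);
  }
  return violations;
}
```

A later mutating step in `slfg` may legitimately use another mode; this sketch only requires that the report-only parallel callsite exists and that no callsite relies on the interactive default.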
## Why This Lives Here
This is not a good `AGENTS.md` note:
- it is specific to one beta-to-stable promotion
- it is easy for a temporary repo-global reminder to become stale
- future planning and review work is more likely to search `docs/solutions/skill-design/` than to rediscover an old ad hoc note in `AGENTS.md`
The durable memory should live with the other skill-design rollout patterns.
## Prevention
- When a beta skill changes invocation semantics, its promotion plan must include caller updates as a first-class implementation unit
- Promotion PRs should be atomic: promote the skill and update orchestrators in the same branch
- Add contract coverage for the promoted callsites so future refactors cannot silently drop required mode flags
- Do not rely on “remembering later” for orchestration mode changes; encode them in docs, plans, and tests
## Lifecycle Note
This note is intentionally tied to the `ce:review-beta` -> `ce:review` promotion window.
Once that promotion is complete and the stable orchestrators/tests already encode the contract:
- update or archive this doc if it no longer adds distinct value
- do not leave it behind as a stale reminder for a promotion that already happened
If the final stable design differs from the current expectation, revise this doc during the promotion PR so the historical note matches what actually shipped.