feat: promote ce:review-beta to stable ce:review (#371)
@@ -0,0 +1,44 @@
---
title: "Beta-to-stable promotions must update orchestration callers atomically"
category: skill-design
date: 2026-03-23
module: plugins/compound-engineering/skills
component: SKILL.md
tags:
- skill-design
- beta-testing
- rollout-safety
- orchestration
severity: medium
description: "When promoting a beta skill to stable, update all orchestration callers in the same PR so they pass correct mode flags instead of inheriting defaults."
related:
- docs/solutions/skill-design/beta-skills-framework.md
---

## Problem

When a beta skill introduces new invocation semantics (e.g., explicit mode flags), promoting it over its stable counterpart without updating orchestration callers causes those callers to silently inherit the wrong default behavior.

## Solution

Treat promotion as an orchestration contract change, not a file rename.

1. Replace the stable skill with the promoted content
2. Update every workflow that invokes the skill in the same PR
3. Hardcode the intended mode at each callsite instead of relying on the default
4. Add or update contract tests so the orchestration assumptions are executable

## Applied: ce:review-beta -> ce:review (2026-03-24)

This pattern was applied when promoting `ce:review-beta` to stable. The caller contract:

- `lfg` -> `/ce:review mode:autofix`
- `slfg` parallel phase -> `/ce:review mode:report-only`
- Contract test in `tests/review-skill-contract.test.ts` enforces these mode flags

## Prevention

- When a beta skill changes invocation semantics, its promotion plan must include caller updates as a first-class implementation unit
- Promotion PRs should be atomic: promote the skill and update orchestrators in the same branch
- Add contract coverage for the promoted callsites so future refactors cannot silently drop required mode flags
- Do not rely on “remembering later” for orchestration mode changes; encode them in docs, plans, and tests
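
The contract-coverage item in the Prevention list could be approximated as follows. This is a hypothetical sketch, not the contents of `tests/review-skill-contract.test.ts`: the `CallerContract` type, the `findContractViolations` helper, and the example skill text are all assumptions made for illustration. Only the two expected invocations come from the caller contract listed in the doc.

```typescript
// Hypothetical sketch: not the repository's actual contract test.
// Idea: each orchestrator's skill text must contain its expected invocation verbatim.

type CallerContract = {
  caller: string; // orchestrator name, e.g. "lfg"
  expectedInvocation: string; // invocation the caller must hardcode
};

// Expected invocations, copied from the caller contract in the doc.
const REVIEW_CONTRACT: CallerContract[] = [
  { caller: "lfg", expectedInvocation: "/ce:review mode:autofix" },
  { caller: "slfg", expectedInvocation: "/ce:review mode:report-only" },
];

// Returns one message per caller whose text lacks its expected invocation.
function findContractViolations(
  skillTexts: { [caller: string]: string | undefined },
  contract: CallerContract[],
): string[] {
  const violations: string[] = [];
  for (const { caller, expectedInvocation } of contract) {
    const text = skillTexts[caller];
    if (text === undefined) {
      violations.push(`${caller}: skill text not found`);
    } else if (!text.includes(expectedInvocation)) {
      violations.push(`${caller}: missing "${expectedInvocation}"`);
    }
  }
  return violations;
}

// Made-up skill text in which slfg has silently dropped its mode flag.
const exampleTexts = {
  lfg: "3. Run /ce:review mode:autofix on the branch",
  slfg: "4. In parallel, run /ce:review and browser tests",
};

console.log(findContractViolations(exampleTexts, REVIEW_CONTRACT));
```

In a real test the skill text would presumably be read from each orchestrator's `SKILL.md`, and the test would fail whenever the returned list is non-empty.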
@@ -13,7 +13,7 @@ severity: medium
 description: "Pattern for trialing new skill versions alongside stable ones using a -beta suffix. Covers naming, plan file naming, internal references, and promotion path."
 related:
 - docs/solutions/skill-design/compound-refresh-skill-improvements.md
-- docs/solutions/skill-design/review-skill-promotion-orchestration-contract.md
+- docs/solutions/skill-design/beta-promotion-orchestration-contract.md
 ---
 
 ## Problem
@@ -80,7 +80,7 @@ When the beta version is validated:
 8. Verify `lfg`/`slfg` work with the promoted skill
 9. Verify `ce:work` consumes plans from the promoted skill
 
-If the beta skill changed its invocation contract, promotion must also update all orchestration callers in the same PR instead of relying on the stable default behavior. See [review-skill-promotion-orchestration-contract.md](./review-skill-promotion-orchestration-contract.md) for the concrete review-skill example.
+If the beta skill changed its invocation contract, promotion must also update all orchestration callers in the same PR instead of relying on the stable default behavior. See [beta-promotion-orchestration-contract.md](./beta-promotion-orchestration-contract.md) for the concrete review-skill example.
 
 ## Validation
 
@@ -1,80 +0,0 @@
---
title: "Promoting review-beta to stable must update orchestration callers in the same change"
category: skill-design
date: 2026-03-23
module: plugins/compound-engineering/skills
component: SKILL.md
tags:
- skill-design
- beta-testing
- rollout-safety
- orchestration
- review-workflow
severity: medium
description: "When ce:review-beta is promoted to stable, update lfg/slfg in the same PR so they pass the correct mode instead of inheriting the interactive default."
related:
- docs/solutions/skill-design/beta-skills-framework.md
- docs/plans/2026-03-23-001-feat-ce-review-beta-pipeline-mode-beta-plan.md
---

## Problem

`ce:review-beta` introduces an explicit mode contract:

- default `interactive`
- `mode:autonomous`
- `mode:report-only`

That is correct for direct user invocation, but it creates a promotion hazard. If the beta skill is later promoted over stable `ce:review` without updating its orchestration callers, the surrounding workflows will silently inherit the interactive default.

For the current review workflow family, that would be wrong:

- `lfg` should run review in `mode:autonomous`
- `slfg` should run review in `mode:report-only` during its parallel review/browser phase

Without those caller changes, promotion would keep the skill name stable while changing its contract, which is exactly the kind of boundary drift that tends to escape manual review.

## Solution

Treat promotion as an orchestration contract change, not a file rename.

When promoting `ce:review-beta` to stable:

1. Replace stable `ce:review` with the promoted content
2. Update every workflow that invokes `ce:review` in the same PR
3. Hardcode the intended mode at each callsite instead of relying on the default
4. Add or update contract tests so the orchestration assumptions are executable

For the review workflow family, the expected caller contract is:

- `lfg` -> `ce:review mode:autonomous`
- `slfg` parallel phase -> `ce:review mode:report-only`
- any mutating review step in `slfg` must happen later, sequentially, or in an isolated checkout/worktree

## Why This Lives Here

This is not a good `AGENTS.md` note:

- it is specific to one beta-to-stable promotion
- it is easy for a temporary repo-global reminder to become stale
- future planning and review work is more likely to search `docs/solutions/skill-design/` than to rediscover an old ad hoc note in `AGENTS.md`

The durable memory should live with the other skill-design rollout patterns.

## Prevention

- When a beta skill changes invocation semantics, its promotion plan must include caller updates as a first-class implementation unit
- Promotion PRs should be atomic: promote the skill and update orchestrators in the same branch
- Add contract coverage for the promoted callsites so future refactors cannot silently drop required mode flags
- Do not rely on “remembering later” for orchestration mode changes; encode them in docs, plans, and tests

## Lifecycle Note

This note is intentionally tied to the `ce:review-beta` -> `ce:review` promotion window.

Once that promotion is complete and the stable orchestrators/tests already encode the contract:

- update or archive this doc if it no longer adds distinct value
- do not leave it behind as a stale reminder for a promotion that already happened

If the final stable design differs from the current expectation, revise this doc during the promotion PR so the historical note matches what actually shipped.
@@ -11,7 +11,6 @@ tags:
 related_components:
 - plugins/compound-engineering/skills/todo-resolve/
 - plugins/compound-engineering/skills/ce-review/
-- plugins/compound-engineering/skills/ce-review-beta/
 - plugins/compound-engineering/skills/todo-triage/
 - plugins/compound-engineering/skills/todo-create/
 problem_type: correctness-gap
@@ -21,12 +20,11 @@ problem_type: correctness-gap
 
 ## Problem
 
-The todo system defines a three-state lifecycle (`pending` -> `ready` -> `complete`) across three skills (`todo-create`, `todo-triage`, `todo-resolve`). Two review skills create todos with different status assumptions:
+The todo system defines a three-state lifecycle (`pending` -> `ready` -> `complete`) across three skills (`todo-create`, `todo-triage`, `todo-resolve`). Different sources create todos with different status assumptions:
 
 | Source | Status created | Reasoning |
 |--------|---------------|-----------|
-| `ce:review` | `pending` | Dumps all findings, expects separate `/todo-triage` |
-| `ce:review-beta` | `ready` | Built-in triage: confidence gating (>0.60), merge/dedup across 8 personas, owner routing. Only creates todos for `downstream-resolver` findings |
+| `ce:review` (autofix mode) | `ready` | Built-in triage: confidence gating (>0.60), merge/dedup across 8 personas, owner routing. Only creates todos for `downstream-resolver` findings |
 | `todo-create` (manual) | `pending` (default) | Template default |
 | `test-browser`, `test-xcode` | via `todo-create` | Inherit default |
 
@@ -46,9 +44,9 @@ Updated `todo-resolve` to partition todos by status in its Analyze step:
 
 This is a single-file change scoped to `todo-resolve/SKILL.md`. No schema changes, no new fields, no changes to `todo-create` or `todo-triage` -- just enforcement of the existing contract at the resolve boundary.
 
-## Key Insight: Review-Beta Promotion Eliminates Automated `pending`
+## Key Insight: No Automated Source Creates `pending` Todos
 
-Once `ce:review-beta` is promoted to stable (replacing `ce:review`), no automated source creates `pending` todos. The `pending` status becomes exclusively a human-authored state for manually created work items that need triage before action.
+No automated source creates `pending` todos. The `pending` status is exclusively a human-authored state for manually created work items that need triage before action.
 
 The safety model becomes:
 - **`ready`** = autofix-eligible. Triage already happened upstream (either built into the review pipeline or via explicit `/todo-triage`).
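
The status partition that this hunk describes for `todo-resolve` might look roughly like the sketch below. `todo-resolve` is a prose skill, so this code is only an illustration; the `Todo` shape, the bucket names, and the choice to set `pending` items aside for triage rather than resolve them are assumptions, since the hunk shows only the `ready` rule.

```typescript
// Hypothetical sketch of partitioning todos by status; the real skill is prose.

type TodoStatus = "pending" | "ready" | "complete";

interface Todo {
  id: string;
  status: TodoStatus;
}

interface TodoPartition {
  resolvable: Todo[]; // `ready`: triage already happened upstream
  needsTriage: Todo[]; // `pending`: assumed to wait for triage, not resolved here
  done: Todo[]; // `complete`: nothing left to do
}

function partitionByStatus(todos: Todo[]): TodoPartition {
  const partition: TodoPartition = { resolvable: [], needsTriage: [], done: [] };
  for (const todo of todos) {
    switch (todo.status) {
      case "ready":
        partition.resolvable.push(todo);
        break;
      case "pending":
        partition.needsTriage.push(todo);
        break;
      case "complete":
        partition.done.push(todo);
        break;
    }
  }
  return partition;
}

// Made-up todos: only the `ready` one is eligible for automated resolution.
const partition = partitionByStatus([
  { id: "001", status: "ready" },
  { id: "002", status: "pending" },
  { id: "003", status: "complete" },
]);
console.log(partition.resolvable.map((todo) => todo.id));
```

Keeping the three buckets separate makes the resolve boundary explicit: nothing reaches the resolver unless its status says triage has already happened.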
@@ -76,6 +74,6 @@ When a skill creates artifacts for downstream consumption, it should state which
 
 ## Cross-References
 
-- [review-skill-promotion-orchestration-contract.md](../skill-design/review-skill-promotion-orchestration-contract.md) -- promotion hazard: if mode flags are dropped during promotion, the wrong artifacts are produced upstream
+- [beta-promotion-orchestration-contract.md](../skill-design/beta-promotion-orchestration-contract.md) -- promotion hazard: if mode flags are dropped during promotion, the wrong artifacts are produced upstream
 - [compound-refresh-skill-improvements.md](../skill-design/compound-refresh-skill-improvements.md) -- "conservative confidence in autonomous mode" principle that motivates status enforcement
 - [claude-permissions-optimizer-classification-fix.md](../skill-design/claude-permissions-optimizer-classification-fix.md) -- "pipeline ordering is an architectural invariant" pattern; the same concept applies to the review -> triage -> resolve pipeline
@@ -19,28 +19,28 @@ Agents are organized into categories for easier discovery.
 | Agent | Description |
 |-------|-------------|
 | `agent-native-reviewer` | Verify features are agent-native (action + context parity) |
-| `api-contract-reviewer` | Detect breaking API contract changes (ce:review-beta persona) |
+| `api-contract-reviewer` | Detect breaking API contract changes |
 | `architecture-strategist` | Analyze architectural decisions and compliance |
 | `code-simplicity-reviewer` | Final pass for simplicity and minimalism |
-| `correctness-reviewer` | Logic errors, edge cases, state bugs (ce:review-beta persona) |
+| `correctness-reviewer` | Logic errors, edge cases, state bugs |
 | `data-integrity-guardian` | Database migrations and data integrity |
 | `data-migration-expert` | Validate ID mappings match production, check for swapped values |
-| `data-migrations-reviewer` | Migration safety with confidence calibration (ce:review-beta persona) |
+| `data-migrations-reviewer` | Migration safety with confidence calibration |
 | `deployment-verification-agent` | Create Go/No-Go deployment checklists for risky data changes |
 | `dhh-rails-reviewer` | Rails review from DHH's perspective |
 | `julik-frontend-races-reviewer` | Review JavaScript/Stimulus code for race conditions |
 | `kieran-rails-reviewer` | Rails code review with strict conventions |
 | `kieran-python-reviewer` | Python code review with strict conventions |
 | `kieran-typescript-reviewer` | TypeScript code review with strict conventions |
-| `maintainability-reviewer` | Coupling, complexity, naming, dead code (ce:review-beta persona) |
+| `maintainability-reviewer` | Coupling, complexity, naming, dead code |
 | `pattern-recognition-specialist` | Analyze code for patterns and anti-patterns |
 | `performance-oracle` | Performance analysis and optimization |
-| `performance-reviewer` | Runtime performance with confidence calibration (ce:review-beta persona) |
-| `reliability-reviewer` | Production reliability and failure modes (ce:review-beta persona) |
+| `performance-reviewer` | Runtime performance with confidence calibration |
+| `reliability-reviewer` | Production reliability and failure modes |
 | `schema-drift-detector` | Detect unrelated schema.rb changes in PRs |
-| `security-reviewer` | Exploitable vulnerabilities with confidence calibration (ce:review-beta persona) |
+| `security-reviewer` | Exploitable vulnerabilities with confidence calibration |
 | `security-sentinel` | Security audits and vulnerability assessments |
-| `testing-reviewer` | Test coverage gaps, weak assertions (ce:review-beta persona) |
+| `testing-reviewer` | Test coverage gaps, weak assertions |
 
 ### Document Review
 
@@ -98,7 +98,7 @@ Core workflow commands use `ce:` prefix to unambiguously identify them as compou
 | `/ce:ideate` | Discover high-impact project improvements through divergent ideation and adversarial filtering |
 | `/ce:brainstorm` | Explore requirements and approaches before planning |
 | `/ce:plan` | Transform features into structured implementation plans grounded in repo patterns |
-| `/ce:review` | Run comprehensive code reviews |
+| `/ce:review` | Structured code review with tiered persona agents, confidence gating, and dedup pipeline |
 | `/ce:work` | Execute work items systematically |
 | `/ce:compound` | Document solved problems to compound team knowledge |
 | `/ce:compound-refresh` | Refresh stale or drifting learnings and decide whether to keep, update, replace, or archive them |
@@ -172,16 +172,6 @@ Core workflow commands use `ce:` prefix to unambiguously identify them as compou
 |-------|-------------|
 | `agent-browser` | CLI-based browser automation using Vercel's agent-browser |
 
-### Beta Skills
-
-Experimental versions of core workflow skills. These are being tested before replacing their stable counterparts. They work standalone but are not yet wired into the automated `lfg`/`slfg` orchestration.
-
-| Skill | Description | Replaces |
-|-------|-------------|----------|
-| `ce:review-beta` | Structured review with tiered persona agents, confidence gating, and dedup pipeline | `ce:review` |
-
-To test: invoke `/ce:review-beta` directly.
-
 ### Image Generation
 
 | Skill | Description |
@@ -1,6 +1,6 @@
 ---
 name: api-contract-reviewer
-description: Conditional code-review persona, selected when the diff touches API routes, request/response types, serialization, versioning, or exported type signatures. Reviews code for breaking contract changes. Spawned by the ce:review-beta skill as part of a reviewer ensemble.
+description: Conditional code-review persona, selected when the diff touches API routes, request/response types, serialization, versioning, or exported type signatures. Reviews code for breaking contract changes.
 model: inherit
 tools: Read, Grep, Glob, Bash
 color: blue
@@ -1,6 +1,6 @@
 ---
 name: correctness-reviewer
-description: Always-on code-review persona. Reviews code for logic errors, edge cases, state management bugs, error propagation failures, and intent-vs-implementation mismatches. Spawned by the ce:review-beta skill as part of a reviewer ensemble.
+description: Always-on code-review persona. Reviews code for logic errors, edge cases, state management bugs, error propagation failures, and intent-vs-implementation mismatches.
 model: inherit
 tools: Read, Grep, Glob, Bash
 color: blue
@@ -1,6 +1,6 @@
 ---
 name: data-migrations-reviewer
-description: Conditional code-review persona, selected when the diff touches migration files, schema changes, data transformations, or backfill scripts. Reviews code for data integrity and migration safety. Spawned by the ce:review-beta skill as part of a reviewer ensemble.
+description: Conditional code-review persona, selected when the diff touches migration files, schema changes, data transformations, or backfill scripts. Reviews code for data integrity and migration safety.
 model: inherit
 tools: Read, Grep, Glob, Bash
 color: blue
@@ -1,6 +1,6 @@
 ---
 name: maintainability-reviewer
-description: Always-on code-review persona. Reviews code for premature abstraction, unnecessary indirection, dead code, coupling between unrelated modules, and naming that obscures intent. Spawned by the ce:review-beta skill as part of a reviewer ensemble.
+description: Always-on code-review persona. Reviews code for premature abstraction, unnecessary indirection, dead code, coupling between unrelated modules, and naming that obscures intent.
 model: inherit
 tools: Read, Grep, Glob, Bash
 color: blue
@@ -1,6 +1,6 @@
 ---
 name: performance-reviewer
-description: Conditional code-review persona, selected when the diff touches database queries, loop-heavy data transforms, caching layers, or I/O-intensive paths. Reviews code for runtime performance and scalability issues. Spawned by the ce:review-beta skill as part of a reviewer ensemble.
+description: Conditional code-review persona, selected when the diff touches database queries, loop-heavy data transforms, caching layers, or I/O-intensive paths. Reviews code for runtime performance and scalability issues.
 model: inherit
 tools: Read, Grep, Glob, Bash
 color: blue
@@ -1,6 +1,6 @@
 ---
 name: reliability-reviewer
-description: Conditional code-review persona, selected when the diff touches error handling, retries, circuit breakers, timeouts, health checks, background jobs, or async handlers. Reviews code for production reliability and failure modes. Spawned by the ce:review-beta skill as part of a reviewer ensemble.
+description: Conditional code-review persona, selected when the diff touches error handling, retries, circuit breakers, timeouts, health checks, background jobs, or async handlers. Reviews code for production reliability and failure modes.
 model: inherit
 tools: Read, Grep, Glob, Bash
 color: blue
@@ -1,6 +1,6 @@
 ---
 name: security-reviewer
-description: Conditional code-review persona, selected when the diff touches auth middleware, public endpoints, user input handling, or permission checks. Reviews code for exploitable vulnerabilities. Spawned by the ce:review-beta skill as part of a reviewer ensemble.
+description: Conditional code-review persona, selected when the diff touches auth middleware, public endpoints, user input handling, or permission checks. Reviews code for exploitable vulnerabilities.
 model: inherit
 tools: Read, Grep, Glob, Bash
 color: blue
@@ -1,6 +1,6 @@
 ---
 name: testing-reviewer
-description: Always-on code-review persona. Reviews code for test coverage gaps, weak assertions, brittle implementation-coupled tests, and missing edge case coverage. Spawned by the ce:review-beta skill as part of a reviewer ensemble.
+description: Always-on code-review persona. Reviews code for test coverage gaps, weak assertions, brittle implementation-coupled tests, and missing edge case coverage.
 model: inherit
 tools: Read, Grep, Glob, Bash
 color: blue
@@ -1,506 +0,0 @@
---
name: ce:review-beta
description: "[BETA] Structured code review using tiered persona agents, confidence-gated findings, and a merge/dedup pipeline. Use when reviewing code changes before creating a PR."
argument-hint: "[mode:autonomous|mode:report-only] [PR number, GitHub URL, or branch name]"
disable-model-invocation: true
---

# Code Review (Beta)

Reviews code changes using dynamically selected reviewer personas. Spawns parallel sub-agents that return structured JSON, then merges and deduplicates findings into a single report.

## When to Use

- Before creating a PR
- After completing a task during iterative implementation
- When feedback is needed on any code changes
- Can be invoked standalone
- Can run as a read-only or autonomous review step inside larger workflows

## Mode Detection

Check `$ARGUMENTS` for `mode:autonomous` or `mode:report-only`. If either token is present, strip it from the remaining arguments before interpreting the rest as the PR number, GitHub URL, or branch name.

| Mode | When | Behavior |
|------|------|----------|
| **Interactive** (default) | No mode token present | Review, present findings, ask for policy decisions when needed, and optionally continue into fix/push/PR next steps |
| **Autonomous** | `mode:autonomous` in arguments | No user interaction. Review, apply only policy-allowed `safe_auto` fixes, re-review in bounded rounds, write a run artifact, and emit residual downstream work when needed |
| **Report-only** | `mode:report-only` in arguments | Strictly read-only. Review and report only, then stop with no edits, artifacts, todos, commits, pushes, or PR actions |

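
The mode-detection rule above can be illustrated with a small parser. The skill itself is a set of prose instructions interpreted by the agent, so this is only a sketch of the rule: the function name `parseReviewArguments` and the `ParsedArguments` shape are hypothetical, and the behavior when both mode tokens appear (the last one wins) is an assumption the source does not specify.

```typescript
// Hypothetical sketch of the mode-detection rule; the real skill is prose.

type ReviewMode = "interactive" | "autonomous" | "report-only";

interface ParsedArguments {
  mode: ReviewMode;
  target: string; // PR number, GitHub URL, or branch name ("" if none given)
}

function parseReviewArguments(rawArguments: string): ParsedArguments {
  let mode: ReviewMode = "interactive"; // default when no mode token is present
  const remaining: string[] = [];
  for (const token of rawArguments.trim().split(/\s+/)) {
    if (token === "") continue; // empty input splits to [""]
    if (token === "mode:autonomous") {
      mode = "autonomous";
    } else if (token === "mode:report-only") {
      mode = "report-only";
    } else {
      remaining.push(token); // everything else is part of the review target
    }
  }
  return { mode, target: remaining.join(" ") };
}

console.log(parseReviewArguments("mode:report-only feature/login"));
console.log(parseReviewArguments("371"));
```

The mode token is removed before the target is interpreted, so `mode:report-only feature/login` yields the target `feature/login`, and a bare `371` falls back to interactive mode.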
### Autonomous mode rules
|
|
||||||
|
|
||||||
- **Skip all user questions.** Never pause for approval or clarification once scope has been established.
|
|
||||||
- **Apply only `safe_auto -> review-fixer` findings.** Leave `gated_auto`, `manual`, `human`, and `release` work unresolved.
|
|
||||||
- **Write a run artifact** under `.context/compound-engineering/ce-review-beta/<run-id>/` summarizing findings, applied fixes, residual actionable work, and advisory outputs.
|
|
||||||
- **Create durable todo files only for unresolved actionable findings** whose final owner is `downstream-resolver`. Load the `todo-create` skill for the canonical directory path and naming convention.
|
|
||||||
- **Never commit, push, or create a PR** from autonomous mode. Parent workflows own those decisions.
|
|
||||||
|
|
||||||
### Report-only mode rules
|
|
||||||
|
|
||||||
- **Skip all user questions.** Infer intent conservatively if the diff metadata is thin.
|
|
||||||
- **Never edit files or externalize work.** Do not write `.context/compound-engineering/ce-review-beta/<run-id>/`, do not create todo files, and do not commit, push, or create a PR.
|
|
||||||
- **Safe for parallel read-only verification.** `mode:report-only` is the only mode that is safe to run concurrently with browser testing on the same checkout.
|
|
||||||
- **Do not switch the shared checkout.** If the caller passes an explicit PR or branch target, `mode:report-only` must run in an isolated checkout/worktree or stop instead of running `gh pr checkout` / `git checkout`.
|
|
||||||
- **Do not overlap mutating review with browser testing on the same checkout.** If a future orchestrator wants fixes, run the mutating review phase after browser testing or in an isolated checkout/worktree.
|
|
||||||
|
|
||||||
## Severity Scale
|
|
||||||
|
|
||||||
All reviewers use P0-P3:
|
|
||||||
|
|
||||||
| Level | Meaning | Action |
|
|
||||||
|-------|---------|--------|
|
|
||||||
| **P0** | Critical breakage, exploitable vulnerability, data loss/corruption | Must fix before merge |
|
|
||||||
| **P1** | High-impact defect likely hit in normal usage, breaking contract | Should fix |
|
|
||||||
| **P2** | Moderate issue with meaningful downside (edge case, perf regression, maintainability trap) | Fix if straightforward |
|
|
||||||
| **P3** | Low-impact, narrow scope, minor improvement | User's discretion |
|
|
||||||
|
|
||||||
## Action Routing
|
|
||||||
|
|
||||||
Severity answers **urgency**. Routing answers **who acts next** and **whether this skill may mutate the checkout**.
|
|
||||||
|
|
||||||
| `autofix_class` | Default owner | Meaning |
|
|
||||||
|-----------------|---------------|---------|
|
|
||||||
| `safe_auto` | `review-fixer` | Local, deterministic fix suitable for the in-skill fixer when the current mode allows mutation |
|
|
||||||
| `gated_auto` | `downstream-resolver` or `human` | Concrete fix exists, but it changes behavior, contracts, permissions, or another sensitive boundary that should not be auto-applied by default |
|
|
||||||
| `manual` | `downstream-resolver` or `human` | Actionable work that should be handed off rather than fixed in-skill |
|
|
||||||
| `advisory` | `human` or `release` | Report-only output such as learnings, rollout notes, or residual risk |
|
|
||||||
|
|
||||||
Routing rules:
|
|
||||||
|
|
||||||
- **Synthesis owns the final route.** Persona-provided routing metadata is input, not the last word.
|
|
||||||
- **Choose the more conservative route on disagreement.** A merged finding may move from `safe_auto` to `gated_auto` or `manual`, but never the other way without stronger evidence.
|
|
||||||
- **Only `safe_auto -> review-fixer` enters the in-skill fixer queue automatically.**
|
|
||||||
- **`requires_verification: true` means a fix is not complete without targeted tests, a focused re-review, or operational validation.**
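The "more conservative route wins" rule can be sketched as a small ordering over the actionable classes. This is a minimal illustration, not the skill's implementation: the names `ActionableClass`, `CONSERVATISM`, `resolveRoute`, and `entersFixerQueue` are hypothetical, and ranking `manual` above `gated_auto` is an assumption made here.

```typescript
// Sketch only: these names are hypothetical and do not come from the skill's
// reference files.
type ActionableClass = "safe_auto" | "gated_auto" | "manual";

// Higher number = more conservative (less automation allowed).
// Placing manual above gated_auto is an assumption of this sketch.
const CONSERVATISM: Record<ActionableClass, number> = {
  safe_auto: 0,
  gated_auto: 1,
  manual: 2,
};

// Pick the most conservative route among those proposed by reviewers.
function resolveRoute(proposed: ActionableClass[]): ActionableClass {
  if (proposed.length === 0) {
    throw new Error("at least one proposed route is required");
  }
  return proposed.reduce((current, next) =>
    CONSERVATISM[next] > CONSERVATISM[current] ? next : current,
  );
}

// Only the safe_auto -> review-fixer pairing is queued for the in-skill fixer.
function entersFixerQueue(autofixClass: string, owner: string): boolean {
  return autofixClass === "safe_auto" && owner === "review-fixer";
}
```

With this ordering, a finding that one reviewer marks `safe_auto` and another marks `gated_auto` resolves to `gated_auto` and stays out of the fixer queue.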
|
|
||||||
|
|
||||||
## Reviewers
|
|
||||||
|
|
||||||
8 personas in two tiers, plus CE-specific agents. See [persona-catalog.md](./references/persona-catalog.md) for the full catalog.
|
|
||||||
|
|
||||||
**Always-on (every review):**
|
|
||||||
|
|
||||||
| Agent | Focus |
|-------|-------|
| `compound-engineering:review:correctness-reviewer` | Logic errors, edge cases, state bugs, error propagation |
| `compound-engineering:review:testing-reviewer` | Coverage gaps, weak assertions, brittle tests |
| `compound-engineering:review:maintainability-reviewer` | Coupling, complexity, naming, dead code, abstraction debt |
| `compound-engineering:review:agent-native-reviewer` | Verify new features are agent-accessible |
| `compound-engineering:research:learnings-researcher` | Search docs/solutions/ for past issues related to this PR |
|
|
||||||
|
|
||||||
**Conditional (selected per diff):**
|
|
||||||
|
|
||||||
| Agent | Select when diff touches... |
|-------|---------------------------|
| `compound-engineering:review:security-reviewer` | Auth, public endpoints, user input, permissions |
| `compound-engineering:review:performance-reviewer` | DB queries, data transforms, caching, async |
| `compound-engineering:review:api-contract-reviewer` | Routes, serializers, type signatures, versioning |
| `compound-engineering:review:data-migrations-reviewer` | Migrations, schema changes, backfills |
| `compound-engineering:review:reliability-reviewer` | Error handling, retries, timeouts, background jobs |
|
|
||||||
|
|
||||||
**CE conditional (migration-specific):**
|
|
||||||
|
|
||||||
| Agent | Role when the diff includes migration files |
|-------|----------------------------------------------|
| `compound-engineering:review:schema-drift-detector` | Cross-references schema.rb against included migrations |
| `compound-engineering:review:deployment-verification-agent` | Produces deployment checklist with SQL verification queries |
|
|
||||||
|
|
||||||
## Review Scope
|
|
||||||
|
|
||||||
Every review spawns all 3 always-on personas plus the 2 CE always-on agents, then adds applicable conditionals. The tier model right-sizes the team to the diff: a small config change triggers 0 conditionals, for 5 reviewers; a large auth feature triggers security and possibly reliability, for up to 7 reviewers.
|
|
||||||
|
|
||||||
## Protected Artifacts
|
|
||||||
|
|
||||||
The following paths are compound-engineering pipeline artifacts and must never be flagged for deletion, removal, or gitignore by any reviewer:
|
|
||||||
|
|
||||||
- `docs/brainstorms/*` -- requirements documents created by ce:brainstorm
|
|
||||||
- `docs/plans/*.md` -- plan files created by ce:plan (living documents with progress checkboxes)
|
|
||||||
- `docs/solutions/*.md` -- solution documents created during the pipeline
|
|
||||||
|
|
||||||
If a reviewer flags any file in these directories for cleanup or removal, discard that finding during synthesis.
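The discard rule can be sketched as a path-prefix filter applied during synthesis. The finding shape below is hypothetical, in particular the `recommendsRemoval` flag; the real findings schema lives in `findings-schema.json` and may express this differently.

```typescript
// Directories named in the Protected Artifacts list above.
const PROTECTED_PREFIXES = ["docs/brainstorms/", "docs/plans/", "docs/solutions/"];

// Hypothetical finding shape for this sketch only.
interface CleanupFinding {
  file: string;
  // True when the finding recommends deleting, removing, or gitignoring the file.
  recommendsRemoval: boolean;
}

// True when the path sits inside one of the protected directories.
function isProtectedPath(file: string): boolean {
  const normalized = file.replace(/^\.\//, "");
  return PROTECTED_PREFIXES.some((prefix) => normalized.startsWith(prefix));
}

// Drop removal-style findings against protected paths; keep everything else.
function dropProtectedCleanup<T extends CleanupFinding>(findings: T[]): T[] {
  return findings.filter((f) => !(f.recommendsRemoval && isProtectedPath(f.file)));
}
```

Findings that touch a protected file for other reasons (for example, a broken link inside a plan) pass through unchanged; only cleanup or removal suggestions are discarded.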
|
|
||||||
|
|
||||||
## How to Run
|
|
||||||
|
|
||||||
### Stage 1: Determine scope
|
|
||||||
|
|
||||||
Compute the diff range, file list, and diff. Minimize permission prompts by combining into as few commands as possible.
|
|
||||||
|
|
||||||
**If a PR number or GitHub URL is provided as an argument:**
|
|
||||||
|
|
||||||
If `mode:report-only` is active, do **not** run `gh pr checkout <number-or-url>` on the shared checkout. Tell the caller: "mode:report-only cannot switch the shared checkout to review a PR target. Run it from an isolated worktree/checkout for that PR, or run report-only with no target argument on the already checked out branch." Stop here unless the review is already running in an isolated checkout.
|
|
||||||
|
|
||||||
First, verify the worktree is clean before switching branches:
|
|
||||||
|
|
||||||
```
git status --porcelain
```
|
|
||||||
|
|
||||||
If the output is non-empty, inform the user: "You have uncommitted changes on the current branch. Stash or commit them before reviewing a PR, or use standalone mode (no argument) to review the current branch as-is." Do not proceed with checkout until the worktree is clean.
|
|
||||||
|
|
||||||
Then check out the PR branch so persona agents read the PR's code rather than whatever the current checkout holds:
|
|
||||||
|
|
||||||
```
gh pr checkout <number-or-url>
```
|
|
||||||
|
|
||||||
Then fetch PR metadata. Capture the base branch name and the PR base repository identity, not just the branch name:
|
|
||||||
|
|
||||||
```
gh pr view <number-or-url> --json title,body,baseRefName,headRefName,url
```
|
|
||||||
|
|
||||||
Use the repository portion of the returned PR URL as `<base-repo>` (for example, `EveryInc/compound-engineering-plugin` from `https://github.com/EveryInc/compound-engineering-plugin/pull/348`).
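As an illustration of that extraction, the following sketch pulls `owner/repo` out of a PR URL. The function name `baseRepoFromPrUrl` is hypothetical, and the pattern assumes a `https://github.com/...` URL like the one in the example above; other hosts are not handled.

```typescript
// Return "owner/repo" from a GitHub PR URL, or null when the URL does not
// look like https://github.com/<owner>/<repo>/pull/<number>.
function baseRepoFromPrUrl(prUrl: string): string | null {
  const match = /^https:\/\/github\.com\/([^/]+\/[^/]+)\/pull\/\d+/.exec(prUrl);
  return match?.[1] ?? null;
}
```

For the URL in the example above, this yields `EveryInc/compound-engineering-plugin`.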
|
|
||||||
|
|
||||||
Then compute a local diff against the PR's base branch so re-reviews also include local fix commits and uncommitted edits. Substitute the PR base branch from metadata (shown here as `<base>`) and the PR base repository identity derived from the PR URL (shown here as `<base-repo>`). Resolve the base ref from the PR's actual base repository, not by assuming `origin` points at that repo:
|
|
||||||
|
|
||||||
```
PR_BASE_REMOTE=$(git remote -v | awk 'index($2, "github.com:<base-repo>") || index($2, "github.com/<base-repo>") {print $1; exit}')
if [ -n "$PR_BASE_REMOTE" ]; then PR_BASE_REMOTE_REF="$PR_BASE_REMOTE/<base>"; else PR_BASE_REMOTE_REF=""; fi
PR_BASE_REF=$(git rev-parse --verify "$PR_BASE_REMOTE_REF" 2>/dev/null || git rev-parse --verify <base> 2>/dev/null || true)
if [ -z "$PR_BASE_REF" ]; then
  if [ -n "$PR_BASE_REMOTE_REF" ]; then
    git fetch --no-tags "$PR_BASE_REMOTE" <base>:refs/remotes/"$PR_BASE_REMOTE"/<base> 2>/dev/null || git fetch --no-tags "$PR_BASE_REMOTE" <base> 2>/dev/null || true
    PR_BASE_REF=$(git rev-parse --verify "$PR_BASE_REMOTE_REF" 2>/dev/null || git rev-parse --verify <base> 2>/dev/null || true)
  else
    if git fetch --no-tags https://github.com/<base-repo>.git <base> 2>/dev/null; then
      PR_BASE_REF=$(git rev-parse --verify FETCH_HEAD 2>/dev/null || true)
    fi
    if [ -z "$PR_BASE_REF" ]; then PR_BASE_REF=$(git rev-parse --verify <base> 2>/dev/null || true); fi
  fi
fi
if [ -n "$PR_BASE_REF" ]; then BASE=$(git merge-base HEAD "$PR_BASE_REF" 2>/dev/null) || BASE=""; else BASE=""; fi
```
|
|
||||||
|
|
||||||
```
if [ -n "$BASE" ]; then echo "BASE:$BASE" && echo "FILES:" && git diff --name-only $BASE && echo "DIFF:" && git diff -U10 $BASE && echo "UNTRACKED:" && git ls-files --others --exclude-standard; else echo "ERROR: Unable to resolve PR base branch <base> locally. Fetch the base branch and rerun so the review scope stays aligned with the PR."; fi
```
|
|
||||||
|
|
||||||
Extract PR title/body, base branch, and PR URL from `gh pr view`, then extract the base marker, file list, diff content, and `UNTRACKED:` list from the local command. Do not use `gh pr diff` as the review scope after checkout -- it only reflects the remote PR state and will miss local fix commits until they are pushed. If the base ref still cannot be resolved from the PR's actual base repository after the fetch attempt, stop instead of falling back to `git diff HEAD`; a PR review without the PR base branch is incomplete.
|
|
||||||
|
|
||||||
**If a branch name is provided as an argument:**
|
|
||||||
|
|
||||||
Check out the named branch, then diff it against the base branch. Substitute the provided branch name (shown here as `<branch>`).
|
|
||||||
|
|
||||||
If `mode:report-only` is active, do **not** run `git checkout <branch>` on the shared checkout. Tell the caller: "mode:report-only cannot switch the shared checkout to review another branch. Run it from an isolated worktree/checkout for `<branch>`, or run report-only on the current checkout with no target argument." Stop here unless the review is already running in an isolated checkout.
|
|
||||||
|
|
||||||
First, verify the worktree is clean before switching branches:
|
|
||||||
|
|
||||||
```
git status --porcelain
```
|
|
||||||
|
|
||||||
If the output is non-empty, inform the user: "You have uncommitted changes on the current branch. Stash or commit them before reviewing another branch, or use standalone mode (no argument) to review the current branch as-is." Do not proceed with checkout until the worktree is clean.
|
|
||||||
|
|
||||||
```
git checkout <branch>
```
|
|
||||||
|
|
||||||
Then detect the review base branch before computing the merge-base. When the branch has an open PR, resolve the base ref from the PR's actual base repository (not just `origin`), mirroring the PR-mode logic for fork safety. Fall back to `origin/HEAD`, GitHub metadata, then common branch names:
|
|
||||||
|
|
||||||
```
REVIEW_BASE_BRANCH=""
PR_BASE_REPO=""
# Reset so a value left over from an earlier run in the same shell is not reused.
BASE_REF=""
if command -v gh >/dev/null 2>&1; then
  PR_META=$(gh pr view --json baseRefName,url 2>/dev/null || true)
  if [ -n "$PR_META" ]; then
    REVIEW_BASE_BRANCH=$(echo "$PR_META" | jq -r '.baseRefName // empty')
    PR_BASE_REPO=$(echo "$PR_META" | jq -r '.url // empty' | sed -n 's#https://github.com/\([^/]*/[^/]*\)/pull/.*#\1#p')
  fi
fi
if [ -z "$REVIEW_BASE_BRANCH" ]; then REVIEW_BASE_BRANCH=$(git symbolic-ref --quiet --short refs/remotes/origin/HEAD 2>/dev/null | sed 's#^origin/##'); fi
if [ -z "$REVIEW_BASE_BRANCH" ] && command -v gh >/dev/null 2>&1; then REVIEW_BASE_BRANCH=$(gh repo view --json defaultBranchRef --jq '.defaultBranchRef.name' 2>/dev/null); fi
if [ -z "$REVIEW_BASE_BRANCH" ]; then
  for candidate in main master develop trunk; do
    if git rev-parse --verify "origin/$candidate" >/dev/null 2>&1 || git rev-parse --verify "$candidate" >/dev/null 2>&1; then
      REVIEW_BASE_BRANCH="$candidate"
      break
    fi
  done
fi
if [ -n "$REVIEW_BASE_BRANCH" ]; then
  if [ -n "$PR_BASE_REPO" ]; then
    PR_BASE_REMOTE=$(git remote -v | awk "index(\$2, \"github.com:$PR_BASE_REPO\") || index(\$2, \"github.com/$PR_BASE_REPO\") {print \$1; exit}")
    if [ -n "$PR_BASE_REMOTE" ]; then
      git rev-parse --verify "$PR_BASE_REMOTE/$REVIEW_BASE_BRANCH" >/dev/null 2>&1 || git fetch --no-tags "$PR_BASE_REMOTE" "$REVIEW_BASE_BRANCH" 2>/dev/null || true
      BASE_REF=$(git rev-parse --verify "$PR_BASE_REMOTE/$REVIEW_BASE_BRANCH" 2>/dev/null || true)
    fi
  fi
  if [ -z "$BASE_REF" ]; then
    git rev-parse --verify "origin/$REVIEW_BASE_BRANCH" >/dev/null 2>&1 || git fetch --no-tags origin "$REVIEW_BASE_BRANCH" 2>/dev/null || true
    BASE_REF=$(git rev-parse --verify "origin/$REVIEW_BASE_BRANCH" 2>/dev/null || git rev-parse --verify "$REVIEW_BASE_BRANCH" 2>/dev/null || true)
  fi
  if [ -n "$BASE_REF" ]; then BASE=$(git merge-base HEAD "$BASE_REF" 2>/dev/null) || BASE=""; else BASE=""; fi
else BASE=""; fi
```
|
|
||||||
|
|
||||||
```
if [ -n "$BASE" ]; then echo "BASE:$BASE" && echo "FILES:" && git diff --name-only $BASE && echo "DIFF:" && git diff -U10 $BASE; elif git rev-parse HEAD >/dev/null 2>&1; then echo "BASE:none" && echo "FILES:" && git diff --name-only HEAD && echo "DIFF:" && git diff -U10 HEAD; else echo "BASE:none" && echo "FILES:" && git diff --cached --name-only && echo "DIFF:" && git diff --cached -U10; fi && echo "UNTRACKED:" && git ls-files --others --exclude-standard
```
|
|
||||||
|
|
||||||
If the branch has an open PR, the detection above uses the PR's base repository to resolve the merge-base, which handles fork workflows correctly. You may still fetch additional PR metadata with `gh pr view` for title, body, and linked issues, but do not fail if no PR exists.
|
|
||||||
|
|
||||||
**If no argument (standalone on current branch):**
|
|
||||||
|
|
||||||
Detect the review base branch before computing the merge-base. When the current branch has an open PR, resolve the base ref from the PR's actual base repository (not just `origin`), mirroring the PR-mode logic for fork safety. Fall back to `origin/HEAD`, GitHub metadata, then common branch names:
|
|
||||||
|
|
||||||
```
REVIEW_BASE_BRANCH=""
PR_BASE_REPO=""
# Reset so a value left over from an earlier run in the same shell is not reused.
BASE_REF=""
if command -v gh >/dev/null 2>&1; then
  PR_META=$(gh pr view --json baseRefName,url 2>/dev/null || true)
  if [ -n "$PR_META" ]; then
    REVIEW_BASE_BRANCH=$(echo "$PR_META" | jq -r '.baseRefName // empty')
    PR_BASE_REPO=$(echo "$PR_META" | jq -r '.url // empty' | sed -n 's#https://github.com/\([^/]*/[^/]*\)/pull/.*#\1#p')
  fi
fi
if [ -z "$REVIEW_BASE_BRANCH" ]; then REVIEW_BASE_BRANCH=$(git symbolic-ref --quiet --short refs/remotes/origin/HEAD 2>/dev/null | sed 's#^origin/##'); fi
if [ -z "$REVIEW_BASE_BRANCH" ] && command -v gh >/dev/null 2>&1; then REVIEW_BASE_BRANCH=$(gh repo view --json defaultBranchRef --jq '.defaultBranchRef.name' 2>/dev/null); fi
if [ -z "$REVIEW_BASE_BRANCH" ]; then
  for candidate in main master develop trunk; do
    if git rev-parse --verify "origin/$candidate" >/dev/null 2>&1 || git rev-parse --verify "$candidate" >/dev/null 2>&1; then
      REVIEW_BASE_BRANCH="$candidate"
      break
    fi
  done
fi
if [ -n "$REVIEW_BASE_BRANCH" ]; then
  if [ -n "$PR_BASE_REPO" ]; then
    PR_BASE_REMOTE=$(git remote -v | awk "index(\$2, \"github.com:$PR_BASE_REPO\") || index(\$2, \"github.com/$PR_BASE_REPO\") {print \$1; exit}")
    if [ -n "$PR_BASE_REMOTE" ]; then
      git rev-parse --verify "$PR_BASE_REMOTE/$REVIEW_BASE_BRANCH" >/dev/null 2>&1 || git fetch --no-tags "$PR_BASE_REMOTE" "$REVIEW_BASE_BRANCH" 2>/dev/null || true
      BASE_REF=$(git rev-parse --verify "$PR_BASE_REMOTE/$REVIEW_BASE_BRANCH" 2>/dev/null || true)
    fi
  fi
  if [ -z "$BASE_REF" ]; then
    git rev-parse --verify "origin/$REVIEW_BASE_BRANCH" >/dev/null 2>&1 || git fetch --no-tags origin "$REVIEW_BASE_BRANCH" 2>/dev/null || true
    BASE_REF=$(git rev-parse --verify "origin/$REVIEW_BASE_BRANCH" 2>/dev/null || git rev-parse --verify "$REVIEW_BASE_BRANCH" 2>/dev/null || true)
  fi
  if [ -n "$BASE_REF" ]; then BASE=$(git merge-base HEAD "$BASE_REF" 2>/dev/null) || BASE=""; else BASE=""; fi
else BASE=""; fi
```
|
|
||||||
|
|
||||||
```
if [ -n "$BASE" ]; then echo "BASE:$BASE" && echo "FILES:" && git diff --name-only $BASE && echo "DIFF:" && git diff -U10 $BASE; elif git rev-parse HEAD >/dev/null 2>&1; then echo "BASE:none" && echo "FILES:" && git diff --name-only HEAD && echo "DIFF:" && git diff -U10 HEAD; else echo "BASE:none" && echo "FILES:" && git diff --cached --name-only && echo "DIFF:" && git diff --cached -U10; fi && echo "UNTRACKED:" && git ls-files --others --exclude-standard
```
|
|
||||||
|
|
||||||
Parse: `BASE:` = merge-base SHA (or `none`), `FILES:` = file list, `DIFF:` = diff, `UNTRACKED:` = files excluded from review scope because they are not staged. Using `git diff $BASE` (without `..HEAD`) diffs the merge-base against the working tree, which includes committed, staged, and unstaged changes together. When BASE is empty and HEAD exists, the fallback uses `git diff HEAD` which shows all uncommitted changes. When HEAD itself does not exist (initial commit in an empty repo), the fallback uses `git diff --cached` for staged changes.
|
|
||||||
|
|
||||||
**Untracked file handling:** Always inspect the `UNTRACKED:` list, even when `FILES:`/`DIFF:` are non-empty. Untracked files are outside review scope until staged. If the list is non-empty, tell the user which files are excluded. If any of them should be reviewed, stop and tell the user to `git add` them first and rerun. Only continue when the user is intentionally reviewing tracked changes only.
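The marker format described above can be parsed with a small line-oriented state machine. This is a sketch under assumptions: `ReviewScope` and `parseScopeOutput` are hypothetical names, and the parser relies on marker lines (`FILES:`, `DIFF:`, `UNTRACKED:`) never appearing bare inside the diff body, which holds for unified diffs because content lines carry a `+`, `-`, or space prefix.

```typescript
interface ReviewScope {
  base: string | null; // merge-base SHA, or null when the marker says "none"
  files: string[];
  diff: string;
  untracked: string[]; // reported to the user, excluded from review scope
}

type Section = "none" | "files" | "diff" | "untracked";

function parseScopeOutput(output: string): ReviewScope {
  const scope: ReviewScope = { base: null, files: [], diff: "", untracked: [] };
  const diffLines: string[] = [];
  let section = "none" as Section;

  for (const line of output.split("\n")) {
    if (section === "none" && line.startsWith("ERROR:")) {
      // PR mode prints ERROR: when the base ref cannot be resolved; stop here.
      throw new Error(line);
    } else if (section === "none" && line.startsWith("BASE:")) {
      const value = line.slice("BASE:".length).trim();
      scope.base = value === "" || value === "none" ? null : value;
    } else if (line === "FILES:") {
      section = "files";
    } else if (line === "DIFF:") {
      section = "diff";
    } else if (line === "UNTRACKED:") {
      section = "untracked";
    } else if (section === "files" && line !== "") {
      scope.files.push(line);
    } else if (section === "diff") {
      diffLines.push(line);
    } else if (section === "untracked" && line !== "") {
      scope.untracked.push(line);
    }
  }

  scope.diff = diffLines.join("\n");
  return scope;
}
```

A non-empty `untracked` array is the signal to tell the user which files are excluded before continuing.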
|
|
||||||
|
|
||||||
### Stage 2: Intent discovery
|
|
||||||
|
|
||||||
Understand what the change is trying to accomplish. The source of intent depends on which Stage 1 path was taken:
|
|
||||||
|
|
||||||
**PR/URL mode:** Use the PR title, body, and linked issues from `gh pr view` metadata. Supplement with commit messages from the PR if the body is sparse.
|
|
||||||
|
|
||||||
**Branch mode:** If `${BASE}` was resolved in Stage 1, run `git log --oneline ${BASE}..<branch>`. If no merge-base was available (Stage 1 fell back to `git diff HEAD` or `git diff --cached`), derive intent from the branch name and the diff content alone.
|
|
||||||
|
|
||||||
**Standalone (current branch):** If `${BASE}` was resolved in Stage 1, run:
|
|
||||||
|
|
||||||
```
echo "BRANCH:" && git rev-parse --abbrev-ref HEAD && echo "COMMITS:" && git log --oneline ${BASE}..HEAD
```
|
|
||||||
|
|
||||||
If no merge-base was available, use the branch name and diff content to infer intent.
|
|
||||||
|
|
||||||
Combined with conversation context (plan section summary, PR description, caller-provided description), write a 2-3 line intent summary:
|
|
||||||
|
|
||||||
```
Intent: Simplify tax calculation by replacing the multi-tier rate lookup
with a flat-rate computation. Must not regress edge cases in tax-exempt handling.
```
|
|
||||||
|
|
||||||
Pass this to every reviewer in their spawn prompt. Intent shapes *how hard each reviewer looks*, not which reviewers are selected.
|
|
||||||
|
|
||||||
**When intent is ambiguous:**
|
|
||||||
|
|
||||||
- **Interactive mode:** Ask one question using the platform's interactive question tool (AskUserQuestion in Claude Code, request_user_input in Codex): "What is the primary goal of these changes?" Do not spawn reviewers until intent is established.
|
|
||||||
- **Autonomous/report-only modes:** Infer intent conservatively from the branch name, diff, PR metadata, and caller context. Note the uncertainty in Coverage or Verdict reasoning instead of blocking.
|
|
||||||
|
|
||||||
### Stage 3: Select reviewers
|
|
||||||
|
|
||||||
Read the diff and file list from Stage 1. The 3 always-on personas and 2 CE always-on agents are automatic. For each conditional persona in [persona-catalog.md](./references/persona-catalog.md), decide whether the diff warrants it. This is agent judgment, not keyword matching.
|
|
||||||
|
|
||||||
For CE conditional agents, check if the diff includes files matching `db/migrate/*.rb`, `db/schema.rb`, or data backfill scripts.
|
|
||||||
|
|
||||||
Announce the team before spawning:
|
|
||||||
|
|
||||||
```
Review team:
- correctness (always)
- testing (always)
- maintainability (always)
- agent-native-reviewer (always)
- learnings-researcher (always)
- security -- new endpoint in routes.rb accepts user-provided redirect URL
- data-migrations -- adds migration 20260303_add_index_to_orders
- schema-drift-detector -- migration files present
```
|
|
||||||
|
|
||||||
This is progress reporting, not a blocking confirmation.
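Persona selection is agent judgment, but the CE conditional trigger is a file-pattern check and can be sketched directly. The function name `touchesMigrations` is hypothetical. The sketch covers only `db/migrate/*.rb` and `db/schema.rb`; data backfill scripts have no fixed path in this document and are left to judgment. Matching `db/` at any depth (for monorepos) is an assumption.

```typescript
// True when the file list includes a Rails-style migration or the schema file.
// The optional leading path segment allows nested apps; that is an assumption.
function touchesMigrations(files: string[]): boolean {
  return files.some(
    (file) =>
      /(^|\/)db\/migrate\/[^/]+\.rb$/.test(file) ||
      /(^|\/)db\/schema\.rb$/.test(file),
  );
}
```

When this returns true, the CE conditional agents join the announced team with the matching files as their justification.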
|
|
||||||
|
|
||||||
### Stage 4: Spawn sub-agents
|
|
||||||
|
|
||||||
Spawn each selected persona reviewer as a parallel sub-agent using the template in [subagent-template.md](./references/subagent-template.md). Each persona sub-agent receives:
|
|
||||||
|
|
||||||
1. Their persona file content (identity, failure modes, calibration, suppress conditions)
|
|
||||||
2. Shared diff-scope rules from [diff-scope.md](./references/diff-scope.md)
|
|
||||||
3. The JSON output contract from [findings-schema.json](./references/findings-schema.json)
|
|
||||||
4. Review context: intent summary, file list, diff
|
|
||||||
|
|
||||||
Persona sub-agents are **read-only**: they review and return structured JSON. They do not edit files or propose refactors.
|
|
||||||
|
|
||||||
Read-only here means **non-mutating**, not "no shell access." Reviewer sub-agents may use non-mutating inspection commands when needed to gather evidence or verify scope, including read-oriented `git` / `gh` usage such as `git diff`, `git show`, `git blame`, `git log`, and `gh pr view`. They must not edit files, change branches, commit, push, create PRs, or otherwise mutate the checkout or repository state.
|
|
||||||
|
|
||||||
Each persona sub-agent returns JSON matching [findings-schema.json](./references/findings-schema.json):
|
|
||||||
|
|
||||||
```json
{
  "reviewer": "security",
  "findings": [...],
  "residual_risks": [...],
  "testing_gaps": [...]
}
```
|
|
||||||
|
|
||||||
**CE always-on agents** (agent-native-reviewer, learnings-researcher) are dispatched as standard Agent calls in parallel with the persona agents. Give them the same review context bundle the personas receive: entry mode, any PR metadata gathered in Stage 1, intent summary, review base branch name when known, `BASE:` marker, file list, diff, and `UNTRACKED:` scope notes. Do not invoke them with a generic "review this" prompt. Their output is unstructured and synthesized separately in Stage 6.
|
|
||||||
|
|
||||||
**CE conditional agents** (schema-drift-detector, deployment-verification-agent) are also dispatched as standard Agent calls when applicable. Pass the same review context bundle plus the applicability reason (for example, which migration files triggered the agent). For schema-drift-detector specifically, pass the resolved review base branch explicitly so it never assumes `main`. Their output is unstructured and must be preserved for Stage 6 synthesis just like the CE always-on agents.
|
|
||||||
|
|
||||||
### Stage 5: Merge findings
|
|
||||||
|
|
||||||
Convert multiple reviewer JSON payloads into one deduplicated, confidence-gated finding set.
|
|
||||||
|
|
||||||
1. **Validate.** Check each output against the schema. Drop malformed findings (missing required fields). Record the drop count.
|
|
||||||
2. **Confidence gate.** Suppress findings below 0.60 confidence. Record the suppressed count. This matches the persona instructions: findings below 0.60 are noise and should not survive synthesis.
|
|
||||||
3. **Deduplicate.** Compute fingerprint: `normalize(file) + line_bucket(line, +/-3) + normalize(title)`. When fingerprints match, merge: keep highest severity, keep highest confidence with strongest evidence, union evidence, note which reviewers flagged it.
|
|
||||||
4. **Separate pre-existing.** Pull out findings with `pre_existing: true` into a separate list.
|
|
||||||
5. **Normalize routing.** For each merged finding, set the final `autofix_class`, `owner`, and `requires_verification`. If reviewers disagree, keep the most conservative route. Synthesis may narrow a finding from `safe_auto` to `gated_auto` or `manual`, but must not widen it without new evidence.
|
|
||||||
6. **Partition the work.** Build three sets:
   - in-skill fixer queue: only `safe_auto -> review-fixer`
   - residual actionable queue: unresolved `gated_auto` or `manual` findings whose owner is `downstream-resolver`
   - report-only queue: `advisory` findings plus anything owned by `human` or `release`
|
|
||||||
7. **Sort.** Order by severity (P0 first) -> confidence (descending) -> file path -> line number.
|
|
||||||
8. **Collect coverage data.** Union residual_risks and testing_gaps across reviewers.
|
|
||||||
9. **Preserve CE agent artifacts.** Keep the learnings, agent-native, schema-drift, and deployment-verification outputs alongside the merged finding set. Do not drop unstructured agent output just because it does not match the persona JSON schema.
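The confidence gate, deduplication, and sort steps can be sketched as follows. This is one possible reading, not the skill's implementation: the `Finding` fields are assumptions (the authoritative shape is `findings-schema.json`), and "line bucket +/-3" is interpreted here as "same normalized file and title, lines within 3 of each other". Validation, pre-existing separation, and routing are left out.

```typescript
type Severity = "P0" | "P1" | "P2" | "P3";

// Hypothetical subset of a finding, for illustration only.
interface Finding {
  file: string;
  line: number;
  title: string;
  severity: Severity;
  confidence: number;
  reviewers: string[];
  evidence: string[];
}

const CONFIDENCE_FLOOR = 0.6;
const LINE_TOLERANCE = 3;

const normalize = (text: string): string =>
  text.trim().toLowerCase().replace(/\s+/g, " ");

// "P0" -> 0 (most severe) ... "P3" -> 3.
const severityRank = (severity: Severity): number => Number(severity.slice(1));

function mergeFindings(input: Finding[]): { merged: Finding[]; suppressed: number } {
  // Confidence gate: drop findings below the floor and count them.
  const confident = input.filter((f) => f.confidence >= CONFIDENCE_FLOOR);
  const merged: Finding[] = [];

  for (const finding of confident) {
    const duplicate = merged.find(
      (kept) =>
        normalize(kept.file) === normalize(finding.file) &&
        normalize(kept.title) === normalize(finding.title) &&
        Math.abs(kept.line - finding.line) <= LINE_TOLERANCE,
    );
    if (!duplicate) {
      merged.push({
        ...finding,
        reviewers: [...finding.reviewers],
        evidence: [...finding.evidence],
      });
      continue;
    }
    // Merge: highest severity, highest confidence, union of reviewers/evidence.
    if (severityRank(finding.severity) < severityRank(duplicate.severity)) {
      duplicate.severity = finding.severity;
    }
    duplicate.confidence = Math.max(duplicate.confidence, finding.confidence);
    duplicate.reviewers = Array.from(new Set([...duplicate.reviewers, ...finding.reviewers]));
    duplicate.evidence = Array.from(new Set([...duplicate.evidence, ...finding.evidence]));
  }

  // Sort: severity, then confidence (descending), then file path, then line.
  merged.sort(
    (a, b) =>
      severityRank(a.severity) - severityRank(b.severity) ||
      b.confidence - a.confidence ||
      a.file.localeCompare(b.file) ||
      a.line - b.line,
  );
  return { merged, suppressed: input.length - confident.length };
}
```

The `suppressed` count feeds the Coverage section of the final report.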
|
|
||||||
|
|
||||||
### Stage 6: Synthesize and present
|
|
||||||
|
|
||||||
Assemble the final report using the template in [review-output-template.md](./references/review-output-template.md):
|
|
||||||
|
|
||||||
1. **Header.** Scope, intent, mode, reviewer team with per-conditional justifications.
|
|
||||||
2. **Findings.** Grouped by severity (P0, P1, P2, P3). Each finding shows file, issue, reviewer(s), confidence, and synthesized route.
|
|
||||||
3. **Applied Fixes.** Include only if a fix phase ran in this invocation.
|
|
||||||
4. **Residual Actionable Work.** Include when unresolved actionable findings were handed off or should be handed off.
|
|
||||||
5. **Pre-existing.** Separate section, does not count toward verdict.
|
|
||||||
6. **Learnings & Past Solutions.** Surface learnings-researcher results: if past solutions are relevant, flag them as "Known Pattern" with links to docs/solutions/ files.
|
|
||||||
7. **Agent-Native Gaps.** Surface agent-native-reviewer results. Omit section if no gaps found.
|
|
||||||
8. **Schema Drift Check.** If schema-drift-detector ran, summarize whether drift was found. If drift exists, list the unrelated schema objects and the required cleanup command. If clean, say so briefly.
|
|
||||||
9. **Deployment Notes.** If deployment-verification-agent ran, surface the key Go/No-Go items: blocking pre-deploy checks, the most important verification queries, rollback caveats, and monitoring focus areas. Keep the checklist actionable rather than dropping it into Coverage.
|
|
||||||
10. **Coverage.** Suppressed count, residual risks, testing gaps, failed/timed-out reviewers, and any intent uncertainty carried by non-interactive modes.
|
|
||||||
11. **Verdict.** Ready to merge / Ready with fixes / Not ready. Fix order if applicable.
|
|
||||||
|
|
||||||
Do not include time estimates.
|
|
||||||
|
|
||||||
## Quality Gates
|
|
||||||
|
|
||||||
Before delivering the review, verify:
|
|
||||||
|
|
||||||
1. **Every finding is actionable.** Re-read each finding. If it says "consider", "might want to", or "could be improved" without a concrete fix, rewrite it with a specific action. Vague findings waste engineering time.
|
|
||||||
2. **No false positives from skimming.** For each finding, verify the surrounding code was actually read. Check that the "bug" isn't handled elsewhere in the same function, that the "unused import" isn't used in a type annotation, that the "missing null check" isn't guarded by the caller.
|
|
||||||
3. **Severity is calibrated.** A style nit is never P0. A SQL injection is never P3. Re-check every severity assignment.
|
|
||||||
4. **Line numbers are accurate.** Verify each cited line number against the file content. A finding pointing to the wrong line is worse than no finding.
|
|
||||||
5. **Protected artifacts are respected.** Discard any findings that recommend deleting or gitignoring files in `docs/brainstorms/`, `docs/plans/`, or `docs/solutions/`.
|
|
||||||
6. **Findings don't duplicate linter output.** Don't flag things the project's linter/formatter would catch (missing semicolons, wrong indentation). Focus on semantic issues.
|
|
||||||
|
|
||||||
## Language-Agnostic
|
|
||||||
|
|
||||||
This skill does NOT use language-specific reviewer agents. Persona reviewers adapt their criteria to the language/framework based on project context (loaded automatically). This keeps the skill simple and avoids maintaining parallel reviewers per language.
|
|
||||||
|
|
||||||
## After Review
|
|
||||||
|
|
||||||
### Mode-Driven Post-Review Flow
|
|
||||||
|
|
||||||
After presenting findings and verdict (Stage 6), route the next steps by mode. Review and synthesis stay the same in every mode; only mutation and handoff behavior changes.
|
|
||||||
|
|
||||||
#### Step 1: Build the action sets
|
|
||||||
|
|
||||||
- **Clean review** means zero findings after suppression and pre-existing separation. Skip the fix/handoff phase when the review is clean.
|
|
||||||
- **Fixer queue:** final findings routed to `safe_auto -> review-fixer`.
|
|
||||||
- **Residual actionable queue:** unresolved `gated_auto` or `manual` findings whose final owner is `downstream-resolver`.
|
|
||||||
- **Report-only queue:** `advisory` findings and any outputs owned by `human` or `release`.
|
|
||||||
- **Never convert advisory-only outputs into fix work or todos.** Deployment notes, residual risks, and release-owned items stay in the report.
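The three action sets can be sketched as a single pass over the routed findings. The `RoutedFinding` shape and the function name `buildActionSets` are hypothetical. Sending any combination not named above (for example `safe_auto` owned by `downstream-resolver`) to the report-only set is an assumption, chosen so that nothing is silently dropped or auto-applied.

```typescript
type AutofixClass = "safe_auto" | "gated_auto" | "manual" | "advisory";
type Owner = "review-fixer" | "downstream-resolver" | "human" | "release";

// Hypothetical minimal shape of a finding after synthesis.
interface RoutedFinding {
  title: string;
  autofix_class: AutofixClass;
  owner: Owner;
}

interface ActionSets {
  fixerQueue: RoutedFinding[];
  residualQueue: RoutedFinding[];
  reportOnly: RoutedFinding[];
}

function buildActionSets(findings: RoutedFinding[]): ActionSets {
  const sets: ActionSets = { fixerQueue: [], residualQueue: [], reportOnly: [] };
  for (const finding of findings) {
    const actionable =
      finding.autofix_class === "gated_auto" || finding.autofix_class === "manual";
    if (finding.autofix_class === "safe_auto" && finding.owner === "review-fixer") {
      sets.fixerQueue.push(finding);
    } else if (actionable && finding.owner === "downstream-resolver") {
      sets.residualQueue.push(finding);
    } else {
      // Advisory findings, human/release-owned items, and anything unlisted.
      sets.reportOnly.push(finding);
    }
  }
  return sets;
}

// A review is clean when no findings remain after suppression.
const isCleanReview = (findings: RoutedFinding[]): boolean => findings.length === 0;
```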
|
|
||||||
|
|
||||||
#### Step 2: Choose policy by mode
|
|
||||||
|
|
||||||
**Interactive mode**
|
|
||||||
|
|
||||||
- Ask a single policy question only when actionable work exists.
|
|
||||||
- Recommended default:
|
|
||||||
|
|
||||||
```
What should I do with the actionable findings?
1. Apply safe_auto fixes and leave the rest as residual work (Recommended)
2. Apply safe_auto fixes only
3. Review report only
```
|
|
||||||
|
|
||||||
- Tailor the prompt to the actual action sets. If the fixer queue is empty, do not offer "Apply safe_auto fixes" options. Ask whether to externalize the residual actionable work or keep the review report-only instead.
|
|
||||||
- Only include `gated_auto` findings in the fixer queue after the user explicitly approves the specific items. Do not widen the queue based on severity alone.
|
|
||||||
|
|
||||||
**Autonomous mode**
|
|
||||||
|
|
||||||
- Ask no questions.
|
|
||||||
- Apply only the `safe_auto -> review-fixer` queue.
|
|
||||||
- Leave `gated_auto`, `manual`, `human`, and `release` items unresolved.
|
|
||||||
- Prepare residual work only for unresolved actionable findings whose final owner is `downstream-resolver`.
|
|
||||||
|
|
||||||
**Report-only mode**
|
|
||||||
|
|
||||||
- Ask no questions.
|
|
||||||
- Do not build a fixer queue.
|
|
||||||
- Do not create residual todos or `.context` artifacts.
|
|
||||||
- Stop after Stage 6. Everything remains in the report.
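The per-mode policy above can be summarized as a lookup table. This is a condensed sketch: the `Mode` values follow the headings in this section, while the field names and their string values are hypothetical labels invented for illustration.

```typescript
type Mode = "interactive" | "autonomous" | "report-only";

// Hypothetical field names summarizing the bullets above.
interface ModePolicy {
  asksPolicyQuestion: boolean; // interactive asks only when actionable work exists
  fixes: "user_choice" | "safe_auto_only" | "none";
  writesRunArtifact: boolean;
  todos: "offered" | "downstream_resolver_only" | "none";
}

const POLICY: Record<Mode, ModePolicy> = {
  interactive: {
    asksPolicyQuestion: true,
    fixes: "user_choice",
    writesRunArtifact: true,
    todos: "offered",
  },
  autonomous: {
    asksPolicyQuestion: false,
    fixes: "safe_auto_only",
    writesRunArtifact: true,
    todos: "downstream_resolver_only",
  },
  "report-only": {
    asksPolicyQuestion: false,
    fixes: "none",
    writesRunArtifact: false,
    todos: "none",
  },
};

// Report-only never mutates the checkout; the other modes may.
function mayMutateCheckout(mode: Mode): boolean {
  return POLICY[mode].fixes !== "none";
}
```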
|
|
||||||
|
|
||||||
#### Step 3: Apply fixes with one fixer and bounded rounds
|
|
||||||
|
|
||||||
- Spawn exactly one fixer subagent for the current fixer queue in the current checkout. That fixer applies all approved changes and runs the relevant targeted tests in one pass against a consistent tree.
|
|
||||||
- Do not fan out multiple fixers against the same checkout. Parallel fixers require isolated worktrees/branches and deliberate mergeback.
|
|
||||||
- Re-review only the changed scope after fixes land.
|
|
||||||
- Bound the loop with `max_rounds: 2`. If issues remain after the second round, stop and hand them off as residual work or report them as unresolved.
|
|
||||||
- If any applied finding has `requires_verification: true`, the round is incomplete until the targeted verification runs.
|
|
||||||
- Do not start a mutating review round concurrently with browser testing on the same checkout. Future orchestrators that want both must either run `mode:report-only` during the parallel phase or isolate the mutating review in its own checkout/worktree.
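The bounded fix loop can be sketched as below. The callback `applyAndReReview` is a hypothetical stand-in for "one fixer pass plus a re-review of the changed scope"; it is assumed to return the findings that remain. Findings are represented as plain strings to keep the sketch short.

```typescript
interface FixLoopResult {
  rounds: number;
  unresolved: string[]; // hand off as residual work or report as unresolved
}

// Run at most maxRounds fix/re-review rounds, stopping early when the queue
// is empty. applyAndReReview is a hypothetical callback for this sketch.
function runFixRounds(
  initialQueue: string[],
  applyAndReReview: (queue: string[]) => string[],
  maxRounds = 2,
): FixLoopResult {
  let queue = initialQueue;
  let rounds = 0;
  while (queue.length > 0 && rounds < maxRounds) {
    queue = applyAndReReview(queue);
    rounds += 1;
  }
  return { rounds, unresolved: queue };
}
```

Because the loop is sequential and takes a single callback, it also reflects the one-fixer rule: each round sees the tree left by the previous one.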
|
|
||||||
|
|
||||||
#### Step 4: Emit artifacts and downstream handoff
|
|
||||||
|
|
||||||
- In interactive and autonomous modes, write a per-run artifact under `.context/compound-engineering/ce-review-beta/<run-id>/` containing:
  - synthesized findings
  - applied fixes
  - residual actionable work
  - advisory-only outputs
|
|
||||||
- In autonomous mode, create durable todo files only for unresolved actionable findings whose final owner is `downstream-resolver`. Load the `todo-create` skill for the canonical directory path, naming convention, YAML frontmatter structure, and template. Each todo should map the finding's severity to the todo priority (`P0`/`P1` -> `p1`, `P2` -> `p2`, `P3` -> `p3`) and set `status: ready` since these findings have already been triaged by synthesis.
|
|
||||||
- Do not create todos for `advisory` findings, `owner: human`, `owner: release`, or protected-artifact cleanup suggestions.
|
|
||||||
- If only advisory outputs remain, create no todos.
|
|
||||||
- Interactive mode may offer to externalize residual actionable work after fixes, but it is not required to finish the review.
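
The todo rules in Step 4 can be expressed as a small filter-and-map. The sketch below is illustrative only: `ResidualFinding`, `TodoDraft`, `toPriority`, and `toTodos` are hypothetical names, and the `advisory` boolean is an assumed way to represent advisory findings. The severity mapping, the `downstream-resolver` owner check, and `status: ready` come from the bullets above. The real file layout and frontmatter are defined by the `todo-create` skill and are not reproduced here.

```typescript
type Severity = "P0" | "P1" | "P2" | "P3"
type Priority = "p1" | "p2" | "p3"

// Hypothetical finding shape for illustration.
interface ResidualFinding {
  title: string
  severity: Severity
  owner: string // e.g. "downstream-resolver", "human", "release"
  advisory: boolean // assumed flag marking advisory-only output
}

interface TodoDraft {
  title: string
  priority: Priority
  status: "ready"
}

// P0 and P1 both collapse to p1; P2 and P3 map one-to-one.
function toPriority(severity: Severity): Priority {
  switch (severity) {
    case "P0":
    case "P1":
      return "p1"
    case "P2":
      return "p2"
    case "P3":
      return "p3"
  }
}

// Only actionable findings owned by downstream-resolver become todos.
function toTodos(findings: ResidualFinding[]): TodoDraft[] {
  return findings
    .filter((f) => !f.advisory && f.owner === "downstream-resolver")
    .map((f) => ({
      title: f.title,
      priority: toPriority(f.severity),
      status: "ready", // already triaged by synthesis
    }))
}
```

If every remaining finding is advisory or owned by `human` or `release`, `toTodos` returns an empty list, which matches the "create no todos" rule.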
|
|
||||||
|
|
||||||
#### Step 5: Final next steps
|
|
||||||
|
|
||||||
**Interactive mode only:** after the fix-review cycle completes (clean verdict or the user chose to stop), offer next steps based on the entry mode. Reuse the resolved review base/default branch from Stage 1 when known; do not hard-code only `main`/`master`.
|
|
||||||
|
|
||||||
- **PR mode (entered via PR number/URL):**
|
|
||||||
- **Push fixes** -- push commits to the existing PR branch
|
|
||||||
- **Exit** -- done for now
|
|
||||||
- **Branch mode (feature branch with no PR, and not the resolved review base/default branch):**
|
|
||||||
- **Create a PR (Recommended)** -- push and open a pull request
|
|
||||||
- **Continue without PR** -- stay on the branch
|
|
||||||
- **Exit** -- done for now
|
|
||||||
- **On the resolved review base/default branch:**
|
|
||||||
- **Continue** -- proceed with next steps
|
|
||||||
- **Exit** -- done for now
|
|
||||||
|
|
||||||
If "Create a PR": first publish the branch with `git push --set-upstream origin HEAD`, then use `gh pr create` with a title and summary derived from the branch changes.
|
|
||||||
If "Push fixes": push the branch with `git push` to update the existing PR.
|
|
||||||
|
|
||||||
**Autonomous and report-only modes:** stop after the report, artifact emission, and residual-work handoff. Do not commit, push, or create a PR.
|
|
||||||
|
|
||||||
## Fallback
|
|
||||||
|
|
||||||
If the platform doesn't support parallel sub-agents, run reviewers sequentially. Everything else (stages, output format, merge pipeline) stays the same.
|
|
||||||
File diff suppressed because it is too large
@@ -11,7 +11,7 @@ Use this **exact format** when presenting synthesized review findings. Findings
|
|||||||
|
|
||||||
**Scope:** merge-base with the review base branch -> working tree (14 files, 342 lines)
|
**Scope:** merge-base with the review base branch -> working tree (14 files, 342 lines)
|
||||||
**Intent:** Add order export endpoint with CSV and JSON format support
|
**Intent:** Add order export endpoint with CSV and JSON format support
|
||||||
**Mode:** autonomous
|
**Mode:** autofix
|
||||||
|
|
||||||
**Reviewers:** correctness, testing, maintainability, security, api-contract
|
**Reviewers:** correctness, testing, maintainability, security, api-contract
|
||||||
- security -- new public endpoint accepts user-provided format parameter
|
- security -- new public endpoint accepts user-provided format parameter
|
||||||
@@ -101,7 +101,7 @@ Use this **exact format** when presenting synthesized review findings. Findings
|
|||||||
- **Confidence column** shows the finding's confidence score
|
- **Confidence column** shows the finding's confidence score
|
||||||
- **Route column** shows the synthesized handling decision as ``<autofix_class> -> <owner>``.
|
- **Route column** shows the synthesized handling decision as ``<autofix_class> -> <owner>``.
|
||||||
- **Header includes** scope, intent, and reviewer team with per-conditional justifications
|
- **Header includes** scope, intent, and reviewer team with per-conditional justifications
|
||||||
- **Mode line** -- include `interactive`, `autonomous`, or `report-only`
|
- **Mode line** -- include `interactive`, `autofix`, or `report-only`
|
||||||
- **Applied Fixes section** -- include only when a fix phase ran in this review invocation
|
- **Applied Fixes section** -- include only when a fix phase ran in this review invocation
|
||||||
- **Residual Actionable Work section** -- include only when unresolved actionable findings were handed off for later work
|
- **Residual Actionable Work section** -- include only when unresolved actionable findings were handed off for later work
|
||||||
- **Pre-existing section** -- separate table, no confidence column (these are informational)
|
- **Pre-existing section** -- separate table, no confidence column (these are informational)
|
||||||
@@ -23,7 +23,7 @@ CRITICAL: You MUST execute every step below IN ORDER. Do NOT skip any required s
|
|||||||
|
|
||||||
GATE: STOP. Verify that implementation work was performed - files were created or modified beyond the plan. Do NOT proceed to step 5 if no code changes were made.
|
GATE: STOP. Verify that implementation work was performed - files were created or modified beyond the plan. Do NOT proceed to step 5 if no code changes were made.
|
||||||
|
|
||||||
5. `/ce:review`
|
5. `/ce:review mode:autofix`
|
||||||
|
|
||||||
6. `/compound-engineering:todo-resolve`
|
6. `/compound-engineering:todo-resolve`
|
||||||
|
|
||||||
|
|||||||
@@ -21,15 +21,19 @@ Swarm-enabled LFG. Run these steps in order, parallelizing where indicated. Do n
|
|||||||
|
|
||||||
After work completes, launch steps 5 and 6 as **parallel swarm agents** (both only need code to be written):
|
After work completes, launch steps 5 and 6 as **parallel swarm agents** (both only need code to be written):
|
||||||
|
|
||||||
5. `/ce:review` — spawn as background Task agent
|
5. `/ce:review mode:report-only` — spawn as background Task agent
|
||||||
6. `/compound-engineering:test-browser` — spawn as background Task agent
|
6. `/compound-engineering:test-browser` — spawn as background Task agent
|
||||||
|
|
||||||
Wait for both to complete before continuing.
|
Wait for both to complete before continuing.
|
||||||
|
|
||||||
|
## Autofix Phase
|
||||||
|
|
||||||
|
7. `/ce:review mode:autofix` — run sequentially after the parallel phase so it can safely mutate the checkout, apply `safe_auto` fixes, and emit residual todos for step 8
|
||||||
|
|
||||||
## Finalize Phase
|
## Finalize Phase
|
||||||
|
|
||||||
7. `/compound-engineering:todo-resolve` — resolve findings, compound on learnings, clean up completed todos
|
8. `/compound-engineering:todo-resolve` — resolve findings, compound on learnings, clean up completed todos
|
||||||
8. `/compound-engineering:feature-video` — record the final walkthrough and add to PR
|
9. `/compound-engineering:feature-video` — record the final walkthrough and add to PR
|
||||||
9. Output `<promise>DONE</promise>` when video is in PR
|
10. Output `<promise>DONE</promise>` when video is in PR
|
||||||
|
|
||||||
Start with step 1 now.
|
Start with step 1 now.
|
||||||
|
|||||||
@@ -94,7 +94,7 @@ To check blockers: search for `{dep_id}-complete-*.md` in both paths. Missing ma
|
|||||||
| Trigger | Flow |
|
| Trigger | Flow |
|
||||||
|---------|------|
|
|---------|------|
|
||||||
| Code review | `/ce:review` -> Findings -> `/todo-triage` -> Todos |
|
| Code review | `/ce:review` -> Findings -> `/todo-triage` -> Todos |
|
||||||
| Autonomous review | `/ce:review-beta mode:autonomous` -> Residual todos -> `/todo-resolve` |
|
| Autonomous review | `/ce:review mode:autofix` -> Residual todos -> `/todo-resolve` |
|
||||||
| Code TODOs | `/todo-resolve` -> Fixes + Complex todos |
|
| Code TODOs | `/todo-resolve` -> Fixes + Complex todos |
|
||||||
| Planning | Brainstorm -> Create todo -> Work -> Complete |
|
| Planning | Brainstorm -> Create todo -> Work -> Complete |
|
||||||
|
|
||||||
|
|||||||
@@ -20,7 +20,7 @@ Scan `.context/compound-engineering/todos/*.md` and legacy `todos/*.md`. Partiti
|
|||||||
|
|
||||||
If a specific todo ID or pattern was passed as an argument, filter to matching todos only (still must be `ready`).
|
If a specific todo ID or pattern was passed as an argument, filter to matching todos only (still must be `ready`).
|
||||||
|
|
||||||
Residual actionable work from `ce:review-beta mode:autonomous` after its `safe_auto` pass will already be `ready`.
|
Residual actionable work from `ce:review mode:autofix` after its `safe_auto` pass will already be `ready`.
|
||||||
|
|
||||||
Skip any todo that recommends deleting, removing, or gitignoring files in `docs/brainstorms/`, `docs/plans/`, or `docs/solutions/` — these are intentional pipeline artifacts.
|
Skip any todo that recommends deleting, removing, or gitignoring files in `docs/brainstorms/`, `docs/plans/`, or `docs/solutions/` — these are intentional pipeline artifacts.
|
||||||
|
|
||||||
|
|||||||
@@ -6,14 +6,14 @@ async function readRepoFile(relativePath: string): Promise<string> {
|
|||||||
return readFile(path.join(process.cwd(), relativePath), "utf8")
|
return readFile(path.join(process.cwd(), relativePath), "utf8")
|
||||||
}
|
}
|
||||||
|
|
||||||
describe("ce-review-beta contract", () => {
|
describe("ce-review contract", () => {
|
||||||
test("documents explicit modes and orchestration boundaries", async () => {
|
test("documents explicit modes and orchestration boundaries", async () => {
|
||||||
const content = await readRepoFile("plugins/compound-engineering/skills/ce-review-beta/SKILL.md")
|
const content = await readRepoFile("plugins/compound-engineering/skills/ce-review/SKILL.md")
|
||||||
|
|
||||||
expect(content).toContain("## Mode Detection")
|
expect(content).toContain("## Mode Detection")
|
||||||
expect(content).toContain("mode:autonomous")
|
expect(content).toContain("mode:autofix")
|
||||||
expect(content).toContain("mode:report-only")
|
expect(content).toContain("mode:report-only")
|
||||||
expect(content).toContain(".context/compound-engineering/ce-review-beta/<run-id>/")
|
expect(content).toContain(".context/compound-engineering/ce-review/<run-id>/")
|
||||||
expect(content).toContain("Do not create residual todos or `.context` artifacts.")
|
expect(content).toContain("Do not create residual todos or `.context` artifacts.")
|
||||||
expect(content).toContain(
|
expect(content).toContain(
|
||||||
"Do not start a mutating review round concurrently with browser testing on the same checkout.",
|
"Do not start a mutating review round concurrently with browser testing on the same checkout.",
|
||||||
@@ -25,7 +25,7 @@ describe("ce-review-beta contract", () => {
|
|||||||
})
|
})
|
||||||
|
|
||||||
test("documents policy-driven routing and residual handoff", async () => {
|
test("documents policy-driven routing and residual handoff", async () => {
|
||||||
const content = await readRepoFile("plugins/compound-engineering/skills/ce-review-beta/SKILL.md")
|
const content = await readRepoFile("plugins/compound-engineering/skills/ce-review/SKILL.md")
|
||||||
|
|
||||||
expect(content).toContain("## Action Routing")
|
expect(content).toContain("## Action Routing")
|
||||||
expect(content).toContain("Only `safe_auto -> review-fixer` enters the in-skill fixer queue automatically.")
|
expect(content).toContain("Only `safe_auto -> review-fixer` enters the in-skill fixer queue automatically.")
|
||||||
@@ -36,7 +36,7 @@ describe("ce-review-beta contract", () => {
|
|||||||
'If the fixer queue is empty, do not offer "Apply safe_auto fixes" options.',
|
'If the fixer queue is empty, do not offer "Apply safe_auto fixes" options.',
|
||||||
)
|
)
|
||||||
expect(content).toContain(
|
expect(content).toContain(
|
||||||
"In autonomous mode, create durable todo files only for unresolved actionable findings whose final owner is `downstream-resolver`.",
|
"In autofix mode, create durable todo files only for unresolved actionable findings whose final owner is `downstream-resolver`.",
|
||||||
)
|
)
|
||||||
expect(content).toContain("If only advisory outputs remain, create no todos.")
|
expect(content).toContain("If only advisory outputs remain, create no todos.")
|
||||||
expect(content).toContain("**On the resolved review base/default branch:**")
|
expect(content).toContain("**On the resolved review base/default branch:**")
|
||||||
@@ -46,7 +46,7 @@ describe("ce-review-beta contract", () => {
|
|||||||
|
|
||||||
test("keeps findings schema and downstream docs aligned", async () => {
|
test("keeps findings schema and downstream docs aligned", async () => {
|
||||||
const rawSchema = await readRepoFile(
|
const rawSchema = await readRepoFile(
|
||||||
"plugins/compound-engineering/skills/ce-review-beta/references/findings-schema.json",
|
"plugins/compound-engineering/skills/ce-review/references/findings-schema.json",
|
||||||
)
|
)
|
||||||
const schema = JSON.parse(rawSchema) as {
|
const schema = JSON.parse(rawSchema) as {
|
||||||
_meta: { confidence_thresholds: { suppress: string } }
|
_meta: { confidence_thresholds: { suppress: string } }
|
||||||
@@ -83,11 +83,36 @@ describe("ce-review-beta contract", () => {
|
|||||||
expect(schema._meta.confidence_thresholds.suppress).toContain("0.60")
|
expect(schema._meta.confidence_thresholds.suppress).toContain("0.60")
|
||||||
|
|
||||||
const fileTodos = await readRepoFile("plugins/compound-engineering/skills/todo-create/SKILL.md")
|
const fileTodos = await readRepoFile("plugins/compound-engineering/skills/todo-create/SKILL.md")
|
||||||
expect(fileTodos).toContain("/ce:review-beta mode:autonomous")
|
expect(fileTodos).toContain("/ce:review mode:autofix")
|
||||||
expect(fileTodos).toContain("/todo-resolve")
|
expect(fileTodos).toContain("/todo-resolve")
|
||||||
|
|
||||||
const resolveTodos = await readRepoFile("plugins/compound-engineering/skills/todo-resolve/SKILL.md")
|
const resolveTodos = await readRepoFile("plugins/compound-engineering/skills/todo-resolve/SKILL.md")
|
||||||
expect(resolveTodos).toContain("ce:review-beta mode:autonomous")
|
expect(resolveTodos).toContain("ce:review mode:autofix")
|
||||||
expect(resolveTodos).toContain("safe_auto")
|
expect(resolveTodos).toContain("safe_auto")
|
||||||
})
|
})
|
||||||
|
|
||||||
|
test("fails closed when merge-base is unresolved instead of falling back to git diff HEAD", async () => {
|
||||||
|
const content = await readRepoFile("plugins/compound-engineering/skills/ce-review/SKILL.md")
|
||||||
|
|
||||||
|
// No scope path should fall back to `git diff HEAD` or `git diff --cached` — those only
|
||||||
|
// show uncommitted changes and silently produce empty diffs on clean feature branches.
|
||||||
|
expect(content).not.toContain("git diff --name-only HEAD")
|
||||||
|
expect(content).not.toContain("git diff -U10 HEAD")
|
||||||
|
expect(content).not.toContain("git diff --cached")
|
||||||
|
|
||||||
|
// All three scope paths must emit ERROR when BASE is unresolved
|
||||||
|
const errorMatches = content.match(/echo "ERROR: Unable to resolve/g)
|
||||||
|
expect(errorMatches?.length).toBe(3) // PR mode, branch mode, standalone mode
|
||||||
|
})
|
||||||
|
|
||||||
|
test("orchestration callers pass explicit mode flags", async () => {
|
||||||
|
const lfg = await readRepoFile("plugins/compound-engineering/skills/lfg/SKILL.md")
|
||||||
|
expect(lfg).toContain("/ce:review mode:autofix")
|
||||||
|
|
||||||
|
const slfg = await readRepoFile("plugins/compound-engineering/skills/slfg/SKILL.md")
|
||||||
|
// slfg uses report-only for the parallel phase (safe with browser testing)
|
||||||
|
// then autofix sequentially after to emit fixes and todos
|
||||||
|
expect(slfg).toContain("/ce:review mode:report-only")
|
||||||
|
expect(slfg).toContain("/ce:review mode:autofix")
|
||||||
|
})
|
||||||
})
|
})
|
||||||
|
|||||||