feat: redesign document-review skill with persona-based review (#359)

Author: Trevin Chow
Date: 2026-03-24 01:51:22 -07:00
Committed by: GitHub
Parent: e932276866
Commit: 18d22afde2
18 changed files with 1259 additions and 64 deletions


@@ -0,0 +1,84 @@
---
date: 2026-03-23
topic: plan-review-personas
---
# Persona-Based Plan Review for document-review
## Problem Frame
The `document-review` skill currently uses a single-voice evaluator with five generic criteria (Clarity, Completeness, Specificity, Appropriate Level, YAGNI). This catches surface-level issues but misses role-specific concerns: a security engineer, product leader, and design reviewer each see different problems in the same plan. The ce:review skill already demonstrates that multi-persona review produces richer, more actionable feedback for code. The same architecture should apply to plan review.
## Requirements
- R1. Replace the current single-voice `document-review` with a persona pipeline that dispatches specialized reviewer agents in parallel against the target document.
- R2. Implement 2 always-on personas that run on every document review:
- **coherence**: Internal consistency, contradictions, terminology drift, structural issues, ambiguity. Checks whether readers would diverge on interpretation.
- **feasibility**: Can this actually be built? Architecture decisions, external dependencies, performance requirements, migration strategies. Absorbs the "tech-plan implementability" angle (can an implementer code from this?).
- R3. Implement 4 conditional personas that activate based on document content analysis:
- **product-lens**: Activates when the document contains user-facing features, market claims, scope decisions, or prioritization. Opens with a "premise challenge" -- 3 diagnostic questions that challenge whether the plan solves the right problem. Asks: "What's the 10-star version? What's the narrowest wedge that proves demand?"
- **design-lens**: Activates when the document contains UI/UX work, frontend changes, or user flows. Uses a "rate 0-10 and describe what 10 looks like" dimensional rating method. Rates design dimensions concretely, identifies what "great" looks like for each.
- **security-lens**: Activates when the document contains auth, data handling, external APIs, or payments. Evaluates threat model at the plan level, not code level. Surfaces what the plan fails to account for.
- **scope-guardian**: Activates when the document contains multiple priority levels, unclear boundaries, or goals that don't align with requirements. Absorbs the "skeptic" angle -- challenges unnecessary complexity, premature abstractions, and frameworks ahead of need. Opens with a "what already exists?" check against the codebase.
- R4. The skill auto-detects which conditional personas are relevant by analyzing the document content. No user configuration required for persona selection.
- R5. Hybrid action model after persona findings are synthesized:
- **Auto-fix**: Document quality issues (contradictions, terminology drift, structural problems, missing details that can be inferred). These are unambiguously improvements.
- **Present for user decision**: Strategic/product questions (problem framing, scope challenges, priority conflicts, "is this the right thing to build?"). These require human judgment.
- R6. Each persona returns structured findings with confidence scores. The orchestrator deduplicates overlapping findings across personas and synthesizes into a single prioritized report.
- R7. Maintain backward compatibility with all existing callers:
- `ce-brainstorm` Phase 4 "Review and refine" option
- `ce-plan` / `ce-plan-beta` post-generation "Review and refine" option
- `deepen-plan-beta` post-deepening "Review and refine" option
- Standalone invocation
- Returns "Review complete" when done, as callers expect
- R8. Pipeline-compatible: When called from automated pipelines (e.g., future lfg/slfg integration), auto-fixes run silently and only genuinely blocking strategic questions surface to the user.
## Success Criteria
- Running document-review on a plan surfaces role-specific issues that the current single-voice evaluator misses (e.g., security gaps, product framing problems, scope concerns).
- Conditional personas activate only when relevant -- a backend refactor plan does not spawn design-lens.
- Auto-fix changes improve the document without requiring user approval for every edit.
- Strategic findings are presented as clear questions, not vague observations.
- All existing callers (brainstorm, plan, plan-beta, deepen-plan-beta) work without modification.
## Scope Boundaries
- Not adding new callers or pipeline integrations beyond maintaining existing ones.
- Not changing how deepen-plan-beta works (it strengthens with research; document-review reviews for issues).
- Not adding user configuration for persona selection (auto-detection only for now).
- Not inventing new review frameworks -- incorporating established review patterns (premise challenge, dimensional rating, existing-code check) into the respective personas.
## Key Decisions
- **Replace, don't layer**: document-review is fully replaced by the persona pipeline, not enhanced with an optional mode. Simpler mental model, one behavior.
- **2 always-on + 4 conditional**: Coherence and feasibility run on every document. Product-lens, design-lens, security-lens, and scope-guardian activate based on content. Keeps cost proportional to document complexity.
- **Hybrid action model**: Auto-fix document quality issues, present strategic questions. Matches the natural split between what personas surface.
- **Absorb skeptic into scope-guardian**: Both challenge whether the plan is right-sized. One persona with both angles avoids redundancy.
- **Absorb tech-plan implementability into feasibility**: Both ask "can this work?" One persona with both angles.
- **Review patterns as persona behavior, not separate mechanisms**: Premise challenge goes into product-lens, dimensional rating goes into design-lens, existing-code check goes into scope-guardian.
## Dependencies / Assumptions
- Assumes the ce:review agent orchestration pattern (parallel dispatch, synthesis, dedup) can be adapted for plan review without fundamental changes.
- Assumes plan/requirements documents are text-based and contain enough signal for content-based conditional persona selection.
## Outstanding Questions
### Deferred to Planning
- [Affects R6][Technical] What is the exact structured output format for persona findings? Should it mirror ce:review's P1/P2/P3 severity model or use a different classification?
- [Affects R4][Needs research] What content signals reliably detect each conditional persona's relevance? Need to define the heuristics (keyword-based, section-based, or semantic).
- [Affects R1][Technical] Should personas be implemented as compound-engineering agents (like code review agents) or as inline prompt sections within the skill? Agents enable parallel dispatch; inline is simpler.
- [Affects R5][Technical] How should the auto-fix mechanism work -- direct inline edits like current document-review, or a separate "apply fixes" pass after synthesis?
- [Affects R7][Technical] Do any of the 4 existing callers need minor updates to handle the new output format, or is the "Review complete" contract sufficient?
## Next Steps
-> /ce:plan for structured implementation planning


@@ -0,0 +1,505 @@
---
title: "feat: Replace document-review with persona-based review pipeline"
type: feat
status: completed
date: 2026-03-23
deepened: 2026-03-23
origin: docs/brainstorms/2026-03-23-plan-review-personas-requirements.md
---
# Replace document-review with Persona-Based Review Pipeline
## Overview
Replace the single-voice `document-review` skill with a multi-persona review pipeline that dispatches specialized reviewer agents in parallel. Two always-on personas (coherence, feasibility) run on every review. Four conditional personas (product-lens, design-lens, security-lens, scope-guardian) activate based on document content analysis. Quality issues are auto-fixed; strategic questions are presented to the user.
## Problem Frame
The current `document-review` applies five generic criteria (Clarity, Completeness, Specificity, Appropriate Level, YAGNI) through a single evaluator voice. This misses role-specific concerns: a security engineer, product leader, and design reviewer each see different problems in the same plan. The `ce:review` skill already demonstrates that multi-persona review produces richer, more actionable feedback for code. The same architecture applies to plan/requirements review. (see origin: docs/brainstorms/2026-03-23-plan-review-personas-requirements.md)
## Requirements Trace
- R1. Replace document-review with persona pipeline dispatching specialized agents in parallel
- R2. 2 always-on personas: coherence, feasibility
- R3. 4 conditional personas: product-lens, design-lens, security-lens, scope-guardian
- R4. Auto-detect conditional persona relevance from document content
- R5. Hybrid action model: auto-fix quality issues, present strategic questions
- R6. Structured findings with confidence, dedup, synthesized report
- R7. Backward compatibility with all 4 callers (brainstorm, plan, plan-beta, deepen-plan-beta)
- R8. Pipeline-compatible for future automated workflows
## Scope Boundaries
- Not adding new callers or pipeline integrations
- Not changing deepen-plan-beta behavior
- Not adding user configuration for persona selection
- Not inventing new review frameworks -- incorporating established review patterns into respective personas
- Not modifying any of the 4 existing caller skills
## Context & Research
### Relevant Code and Patterns
- `plugins/compound-engineering/skills/ce-review/SKILL.md` -- Multi-agent orchestration reference: parallel dispatch via Task tool, always-on + conditional agents, P1/P2/P3 severity, finding synthesis with dedup
- `plugins/compound-engineering/skills/document-review/SKILL.md` -- Current single-voice skill to replace. Key contract: "Review complete" terminal signal
- `plugins/compound-engineering/agents/review/*.md` -- 15 existing review agents. Frontmatter schema: `name`, `description`, `model: inherit`. Body: examples block, role definition, analysis protocol, output format
- `plugins/compound-engineering/AGENTS.md` -- Agent naming: fully-qualified `compound-engineering:<category>:<agent-name>`. Agent placement: `agents/<category>/<name>.md`
### Caller Integration Points
All 4 callers use the same contract:
- `ce-brainstorm/SKILL.md` line 301: "Load the `document-review` skill and apply it to the requirements document"
- `ce-plan/SKILL.md` line 592: "Load `document-review` skill"
- `ce-plan-beta/SKILL.md` line 611: "Load the `document-review` skill with the plan path"
- `deepen-plan-beta/SKILL.md` line 402: "Load the `document-review` skill with the plan path"
All expect "Review complete" as the terminal signal. No callers check for specific output format. No caller changes needed.
### Institutional Learnings
- **Subagent design** (docs/solutions/skill-design/compound-refresh-skill-improvements.md): Each persona agent needs explicit context (file path, scope, output format) -- don't rely on inherited context. Use native file tools, not shell commands. Avoid hardcoded tool names; use capability-first language with platform examples.
- **Parallel dispatch safety**: Persona reviewers are read-only (analyze the document, don't modify it). Parallel dispatch is safe. This differs from compound-refresh which used sequential subagents because they modified files.
- **Contradictory findings**: With 6 independent reviewers, findings will conflict (scope-guardian wants to cut; coherence wants to keep for narrative flow). Synthesis needs conflict-resolution rules, not just dedup.
- **Classification pipeline ordering** (docs/solutions/skill-design/claude-permissions-optimizer-classification-fix.md): Pipeline ordering matters: filter -> normalize -> group -> threshold -> re-classify -> output. Post-grouping safety checks catch misclassified findings. Single source of truth for classification logic.
- **Beta skills framework** (docs/solutions/skill-design/beta-skills-framework.md): Since we're replacing document-review entirely (not running side-by-side), the beta framework doesn't apply here.
### Research Insights: iterative-engineering plan-review
The iterative-engineering plugin (v1.16.1) implements a mature plan-review skill with persona agents. Key architectural patterns to adopt:
**Structured output contract**: All personas return findings in a consistent JSON-like structure with: title (<=10 words), priority (HIGH/MEDIUM/LOW), section, line, why_it_matters (impact not symptom), confidence (0.0-1.0), evidence (quoted text, minimum 1), and optional suggestion. This consistency enables reliable synthesis.
**Fingerprint-based dedup**: `normalize(section) + line_bucket(line, +/-5) + normalize(title)`. When fingerprints match: keep highest priority, highest confidence, union evidence, note all reviewers. This is more precise than judgment-based dedup.
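The fingerprint merge rule above can be sketched as follows. This is a minimal illustration, not the plugin's actual code: it uses the document-review variant of the fingerprint (section + title, no line bucket), and the `normalize`, `fingerprint`, and `dedup` names are hypothetical.

```python
import re

def normalize(text: str) -> str:
    """Lowercase, strip punctuation, collapse whitespace."""
    return re.sub(r"\s+", " ", re.sub(r"[^\w\s]", "", text.lower())).strip()

def fingerprint(finding: dict) -> str:
    """Document-review fingerprint: normalized section + normalized title."""
    return f"{normalize(finding['section'])}|{normalize(finding['title'])}"

# Lower rank = more severe.
SEVERITY_RANK = {"P0": 0, "P1": 1, "P2": 2, "P3": 3}

def dedup(findings: list[dict]) -> list[dict]:
    """Merge colliding fingerprints: keep highest severity and confidence,
    union evidence, record all agreeing reviewers."""
    merged: dict[str, dict] = {}
    for f in findings:
        key = fingerprint(f)
        if key not in merged:
            merged[key] = {**f, "reviewers": [f["reviewer"]]}
            continue
        kept = merged[key]
        if SEVERITY_RANK[f["severity"]] < SEVERITY_RANK[kept["severity"]]:
            kept["severity"] = f["severity"]
        kept["confidence"] = max(kept["confidence"], f["confidence"])
        kept["evidence"] = sorted(set(kept["evidence"]) | set(f["evidence"]))
        kept["reviewers"].append(f["reviewer"])
    return list(merged.values())
```

Because the fingerprint is deterministic, two personas describing the same issue with slightly different capitalization or punctuation still collapse into one finding.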
**Residual concerns**: Findings below the confidence threshold (0.50) are stored separately as residual concerns. During synthesis, residual concerns are promoted to findings if they overlap with findings from other reviewers or describe concrete blocking risks. This catches issues that one persona sees dimly but another confirms.
**Per-persona confidence calibration**: Each persona defines its own confidence bands -- what HIGH (0.80+), MODERATE (0.60-0.79), and LOW mean for that persona's domain. This prevents apples-to-oranges confidence comparisons.
**Explicit suppress conditions**: Each persona lists what it should NOT flag (e.g., coherence suppresses style preferences and missing content; feasibility suppresses implementation style choices). This prevents noise and keeps personas focused.
**Subagent prompt template**: A shared template wraps each persona's identity + output schema + review context. This ensures consistent behavior across all personas without repeating boilerplate in each agent file.
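The shared-template idea can be sketched as a single wrapper that combines a persona's identity with the common schema and review context, so no agent file repeats the boilerplate. The template text and the `build_prompt` name here are illustrative assumptions, not the actual template file.

```python
# Hypothetical shape of the shared subagent template.
PROMPT_TEMPLATE = """\
You are the {persona} reviewer.
{identity}

Return findings as JSON matching this schema:
{schema}

Document under review:
{document}
"""

def build_prompt(persona: str, identity: str, schema: str, document: str) -> str:
    """Populate the shared template for one persona dispatch."""
    return PROMPT_TEMPLATE.format(
        persona=persona, identity=identity, schema=schema, document=document
    )
```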
### Established Review Patterns
Three proven review approaches provide the behavioral foundation for specific personas:
**Premise challenge pattern (-> product-lens persona):**
- Nuclear scope challenge with 3 questions: (1) Is this the right problem? Could a different framing yield a simpler/more impactful solution? (2) What is the actual user/business outcome? Is the plan the most direct path? (3) What happens if we do nothing? Real pain or hypothetical?
- Implementation alternatives: Produce 2-3 approaches with effort (S/M/L/XL), risk (Low/Med/High), pros/cons
- Search-before-building: Layer 1 (conventional), Layer 2 (search results), Layer 3 (first principles)
**Dimensional rating pattern (-> design-lens persona):**
- 0-10 rating loop: Rate dimension -> explain gap ("4 because X; 10 would have Y") -> suggest fix -> re-rate -> repeat
- 7 evaluation passes: Information architecture, interaction state coverage, user journey/emotional arc, AI slop risk, design system alignment, responsive/a11y, unresolved design decisions
- AI slop blacklist: 10 recognizable AI-generated patterns to avoid (3-column feature grids, purple gradients, icons in colored circles, uniform border-radius, etc.)
**Existing-code audit pattern (-> scope-guardian + feasibility personas):**
- "What already exists?" check: (1) What existing code partially/fully solves each sub-problem? (2) What is the minimum set of changes for the stated goal? (3) Complexity check (>8 files or >2 new classes = smell). (4) Search check per architectural pattern. (5) TODOs cross-reference
- Completeness principle: With AI, completeness cost is 10-100x cheaper. If shortcut saves human hours but only minutes with AI, recommend complete version
- Error & rescue map: For every method/codepath that can fail, name the exception class, trigger, handler, and user-visible outcome
## Key Technical Decisions
- **Agents, not inline prompts**: Persona reviewers are implemented as agent files under `agents/review/`. This enables parallel dispatch via Task tool, follows established patterns, and keeps the SKILL.md focused on orchestration. (Resolves deferred question from origin)
- **Structured output contract aligned with ce:review-beta (PR #348)**: Same normalization mechanism -- findings-schema.json, subagent-template.md, review-output-template.md as reference files. Same field names and enums where applicable (severity P0-P3, autofix_class, owner, confidence, evidence). Document-specific adaptations: `section` replaces `file`+`line`, `deferred_questions` replaces `testing_gaps`, drop `pre_existing`. Each persona defines its own confidence calibration and suppress conditions. (Resolves deferred question from origin -- output format)
- **Content-based activation heuristics**: The orchestrator skill checks the document for keyword and structural patterns to select conditional personas. Heuristics are defined in the skill, not in the agents -- this keeps selection logic centralized and agents focused on review. (Resolves deferred question from origin)
- **Separate auto-fix pass after synthesis**: Personas are read-only (produce findings only). After dedup and synthesis, the orchestrator applies auto-fixes for quality issues in a single pass, then presents strategic questions. This prevents conflicting edits from multiple agents. (Resolves deferred question from origin)
- **No caller modifications needed**: The "Review complete" contract is sufficient. All 4 callers reference document-review by skill name and check for the terminal signal. (Resolves deferred question from origin)
- **Fingerprint-based dedup over judgment-based**: Use `normalize(section) + normalize(title)` fingerprinting for deterministic dedup. More reliable than asking the model to "remove duplicates" at synthesis time. When fingerprints match: keep highest priority, highest confidence, union evidence, note all agreeing reviewers.
- **Residual concerns with cross-persona promotion**: Findings below 0.50 confidence are stored as residual concerns. During synthesis, promote to findings if corroborated by another persona or if they describe concrete blocking risks. This catches issues one persona sees dimly but another confirms.
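The content-based activation decision above could look something like the sketch below. The exact keyword lists are explicitly deferred to implementation, so the signal sets here are placeholders, and `activate_personas` is an illustrative name.

```python
# Illustrative signal keywords only; real lists are deferred to implementation.
PERSONA_SIGNALS = {
    "product-lens": {"user-facing", "market", "prioritization", "scope decision"},
    "design-lens": {"ui", "ux", "frontend", "user flow"},
    "security-lens": {"auth", "api key", "payment", "pii"},
    "scope-guardian": {"nice-to-have", "out of scope", "stretch goal"},
}

def activate_personas(document: str) -> list[str]:
    """Always-on personas plus any conditional persona whose signal
    keywords appear in the document text (R2, R3, R4)."""
    text = document.lower()
    active = ["coherence", "feasibility"]  # always on
    for persona, signals in PERSONA_SIGNALS.items():
        if any(signal in text for signal in signals):
            active.append(persona)
    return active
```

Keeping this table in the orchestrator skill (not in the agents) matches the centralized-selection decision above.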
## Open Questions
### Resolved During Planning
- **Agent category**: Place under `agents/review/` alongside existing code review agents. Names are distinct (coherence-reviewer, feasibility-reviewer, etc.) and don't conflict with existing agents. Fully-qualified: `compound-engineering:review:<name>`.
- **Parallel vs serial dispatch**: Always parallel. A typical run has 2-4 agents, under the auto-serial threshold of 5 from ce:review's pattern. Even at the maximum of 6, these are document reviewers with bounded scope, so parallel dispatch remains safe.
- **Review pattern integration**: Premise challenge -> product-lens opener. Dimensional rating -> design-lens evaluation method. Existing-code audit -> scope-guardian opener. These are incorporated as agent behavior, not separate orchestration mechanisms.
- **Output format**: Align with ce:review-beta (PR #348) normalization pattern. Same mechanism: JSON schema reference file, shared subagent template, output template. Same enums (P0-P3 severity, autofix_class, owner). Document-specific field swaps: `section` replaces `file`+`line`, `deferred_questions` replaces `testing_gaps`, drop `pre_existing`.
### Deferred to Implementation
- Exact keyword lists for conditional persona activation -- start with the obvious signals, refine based on real usage
- Whether the auto-fix pass should re-read the document after applying changes to verify consistency, or trust a single pass
## High-Level Technical Design
> *This illustrates the intended approach and is directional guidance for review, not implementation specification. The implementing agent should treat it as context, not code to reproduce.*
```
Document Review Pipeline Flow:
1. READ document
2. CLASSIFY document type (requirements doc vs plan)
3. ANALYZE content for conditional persona signals
- product signals? -> activate product-lens
- design/UI signals? -> activate design-lens
- security/auth signals? -> activate security-lens
- scope/priority signals? -> activate scope-guardian
4. ANNOUNCE review team with per-conditional justifications
5. DISPATCH agents in parallel via Task tool
- Always: coherence-reviewer, feasibility-reviewer
- Conditional: activated personas from step 3
- Each receives: subagent-template.md populated with persona + schema + doc content
6. COLLECT findings from all agents (validate against findings-schema.json)
7. SYNTHESIZE
a. Validate: check structure compliance against schema, drop malformed
b. Confidence gate: suppress findings below 0.50
c. Deduplicate: fingerprint matching, keep highest severity/confidence
d. Promote residual concerns: corroborated or blocking -> promote to finding
   e. Resolve contradictions: conflicting findings across personas -> merge into one combined finding, routed as manual for human judgment
f. Route: safe_auto -> apply, everything else -> present
8. APPLY safe_auto fixes (edit document inline, single pass)
9. PRESENT remaining findings to user, grouped by severity
10. FORMAT output using review-output-template.md
11. OFFER next action: "Refine again" or "Review complete"
```
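Steps 7b and 7d of the flow above (confidence gate and residual promotion) can be sketched as follows. The 0.50 threshold and the corroborated-or-blocking promotion rule come from the plan text; the function names and the `blocking` field are illustrative assumptions.

```python
CONFIDENCE_THRESHOLD = 0.50

def gate(findings: list[dict]) -> tuple[list[dict], list[dict]]:
    """Split raw findings into confirmed findings and residual concerns."""
    confirmed = [f for f in findings if f["confidence"] >= CONFIDENCE_THRESHOLD]
    residual = [f for f in findings if f["confidence"] < CONFIDENCE_THRESHOLD]
    return confirmed, residual

def promote_residuals(confirmed: list[dict], residual: list[dict]) -> list[dict]:
    """Promote a residual concern when another persona corroborates it
    (a confirmed finding in the same section) or it describes a blocking risk."""
    sections = {f["section"] for f in confirmed}
    promoted = [
        r for r in residual
        if r["section"] in sections or r.get("blocking", False)
    ]
    return confirmed + promoted
```

This is what lets an issue one persona sees dimly survive synthesis once a second persona confirms it.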
**Finding structure (aligned with ce:review-beta PR #348):**
```
Envelope (per persona):
reviewer: Persona name (e.g., "coherence", "product-lens")
findings: Array of finding objects
residual_risks: Risks noticed but not confirmed as findings
deferred_questions: Questions that should be resolved in a later workflow stage
Finding object:
title: Short issue title (<=10 words)
severity: P0 / P1 / P2 / P3 (same scale as ce:review-beta)
section: Document section where issue appears (replaces file+line)
why_it_matters: Impact statement (what goes wrong if not addressed)
autofix_class: safe_auto / gated_auto / manual / advisory
owner: review-fixer / downstream-resolver / human / release
requires_verification: Whether fix needs re-review
suggested_fix: Optional concrete fix (null if not obvious)
confidence: 0.0-1.0 (calibrated per persona)
evidence: Quoted text from document (minimum 1)
Severity definitions (same as ce:review-beta):
P0: Contradictions or gaps that would cause building the wrong thing. Must fix.
P1: Significant gap likely hit during planning/implementation. Should fix.
P2: Moderate issue with meaningful downside. Fix if straightforward.
P3: Minor improvement. User's discretion.
Autofix classes (same enum as ce:review-beta for schema compatibility):
safe_auto: Terminology fix, formatting, cross-reference -- local and deterministic
gated_auto: Restructure or edit that changes document meaning -- needs approval
manual: Strategic question requiring user judgment -- becomes residual work
advisory: Informational finding -- surface in report only
Orchestrator routing (document review simplification):
The 4-class enum is preserved for schema compatibility with ce:review-beta,
but the orchestrator routes as 2 buckets:
safe_auto -> apply automatically
gated_auto + manual + advisory -> present to user
The gated/manual/advisory distinction is blurry for documents (all need user
judgment). Personas still classify precisely; the orchestrator collapses.
```
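The 2-bucket collapse described above is small enough to sketch directly. A minimal illustration, assuming findings carry the `autofix_class` field from the structure above; the `route` name is hypothetical.

```python
def route(findings: list[dict]) -> tuple[list[dict], list[dict]]:
    """Collapse the 4-class enum into the orchestrator's 2 buckets:
    safe_auto -> apply automatically; everything else -> present to user."""
    apply, present = [], []
    for f in findings:
        (apply if f["autofix_class"] == "safe_auto" else present).append(f)
    return apply, present
```

Personas keep the precise 4-class labels in their output; only the orchestrator's routing is collapsed, so schema compatibility with ce:review-beta is preserved.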
## Implementation Units
- [x] **Unit 1: Create always-on persona agents**
**Goal:** Create the coherence and feasibility reviewer agents that run on every document review.
**Requirements:** R2
**Dependencies:** None
**Files:**
- Create: `plugins/compound-engineering/agents/review/coherence-reviewer.md`
- Create: `plugins/compound-engineering/agents/review/feasibility-reviewer.md`
**Approach:**
- Follow existing agent structure: frontmatter (name, description, model: inherit), examples block, role definition, analysis protocol
- Each agent defines: role identity, analysis protocol, confidence calibration, and suppress conditions
- Agents do NOT define their own output format -- the shared `references/findings-schema.json` and `references/subagent-template.md` handle output normalization (same pattern as ce:review-beta PR #348)
**coherence-reviewer:**
- Role: Technical editor who reads for internal consistency
- Hunts: contradictions between sections, terminology drift (same concept called different names), structural issues (sections that don't flow logically), ambiguity where readers would diverge on interpretation
- Confidence calibration: HIGH (0.80+) = provable contradictions from text. MODERATE (0.60-0.79) = likely but could be reconciled charitably. Suppress below 0.50.
- Suppress: style preferences, missing content (other personas handle that), imprecision that isn't actually ambiguity, formatting opinions
**feasibility-reviewer:**
- Role: Systems architect evaluating whether proposed approaches survive contact with reality
- Hunts: architecture decisions that conflict with existing patterns, external dependencies without fallback plans, performance requirements without measurement plans, migration strategies with gaps, approaches that won't work with known constraints
- Absorbs tech-plan implementability: can an implementer read this and start coding? Are file paths, interfaces, and dependencies specific enough?
- Opens with "what already exists?" check: does the plan acknowledge existing code before proposing new abstractions?
- Confidence calibration: HIGH (0.80+) = specific technical constraint that blocks approach. MODERATE (0.60-0.79) = constraint likely but depends on specifics not in document.
- Suppress: implementation style choices, testing strategy details, code organization preferences, theoretical scalability concerns
**Patterns to follow:**
- `plugins/compound-engineering/agents/review/code-simplicity-reviewer.md` for agent structure and output format conventions
- `plugins/compound-engineering/agents/review/architecture-strategist.md` for systematic analysis protocol style
- iterative-engineering agents for confidence calibration and suppress conditions pattern
**Test scenarios:**
- coherence-reviewer identifies a plan where Section 3 claims "no external dependencies" but Section 5 proposes calling an external API
- coherence-reviewer flags a document using "pipeline" and "workflow" interchangeably for the same concept
- coherence-reviewer does NOT flag a minor formatting inconsistency (suppress condition working)
- feasibility-reviewer identifies a requirement for "sub-millisecond response time" without a measurement or caching strategy
- feasibility-reviewer identifies that a plan proposes building a custom auth system when the codebase already has one
- feasibility-reviewer surfaces "what already exists?" when plan doesn't acknowledge existing patterns
- Both agents produce findings with all required fields (title, severity, section, why_it_matters, confidence, evidence)
**Verification:**
- Both agents have valid frontmatter (name, description, model: inherit)
- Both agents include examples, role definition, analysis protocol, confidence calibration, and suppress conditions
- Agents rely on shared findings-schema.json for output normalization (no per-agent output format)
- Suppress conditions are explicit and sensible for each persona's domain
---
- [x] **Unit 2: Create conditional persona agents**
**Goal:** Create the four conditional persona agents that activate based on document content.
**Requirements:** R3
**Dependencies:** Unit 1 (for consistent agent structure)
**Files:**
- Create: `plugins/compound-engineering/agents/review/product-lens-reviewer.md`
- Create: `plugins/compound-engineering/agents/review/design-lens-reviewer.md`
- Create: `plugins/compound-engineering/agents/review/security-lens-reviewer.md`
- Create: `plugins/compound-engineering/agents/review/scope-guardian-reviewer.md`
**Approach:**
All four use the same structure established in Unit 1 (frontmatter, examples, role, protocol, confidence calibration, suppress conditions). Output normalization handled by shared reference files.
**product-lens-reviewer:**
- Role: Senior product leader evaluating whether the plan solves the right problem
- Opens with premise challenge: 3 diagnostic questions:
1. Is this the right problem to solve? Could a different framing yield a simpler or more impactful solution?
2. What is the actual user/business outcome? Is the plan the most direct path, or is it solving a proxy problem?
3. What would happen if we did nothing? Real pain point or hypothetical?
- Evaluates: scope decisions and prioritization rationale, implementation alternatives (are there simpler paths?), whether goals connect to requirements
- Confidence calibration: HIGH (0.80+) = specific text demonstrating misalignment between stated goal and proposed work. MODERATE (0.60-0.79) = likely but depends on business context.
- Suppress: implementation details, technical specifics, measurement methodology, style
**design-lens-reviewer:**
- Role: Senior product designer reviewing plans for missing design decisions
- Uses "rate 0-10 and describe what 10 looks like" dimensional rating method
- Evaluates design dimensions: information architecture (what does user see first/second/third?), interaction state coverage (loading, empty, error, success, partial), user flow completeness, responsive/accessibility considerations
- Produces rated findings: "Information architecture: 4/10 -- it's a 4 because [gap]. A 10 would have [what's needed]."
- AI slop check: flags plans that would produce generic AI-looking interfaces (3-column feature grids, purple gradients, icons in colored circles, uniform border-radius)
- Confidence calibration: HIGH (0.80+) = missing states or flows that will clearly cause UX problems. MODERATE (0.60-0.79) = design gap exists but skilled designer could resolve from context.
- Suppress: backend implementation details, performance concerns, security (other persona handles), business strategy
**security-lens-reviewer:**
- Role: Security architect evaluating threat model at the plan level
- Evaluates: auth/authz gaps, data exposure risks, API surface vulnerabilities, input validation assumptions, secrets management, third-party trust boundaries, plan-level threat model completeness
- Distinct from the code-level `security-sentinel` agent -- this reviews whether the PLAN accounts for security, not whether the CODE is secure
- Confidence calibration: HIGH (0.80+) = plan explicitly introduces attack surface without mentioning mitigation. MODERATE (0.60-0.79) = security concern likely but plan may address it implicitly.
- Suppress: code quality issues, performance, non-security architecture, business logic
**scope-guardian-reviewer:**
- Role: Product manager reviewing scope decisions for alignment, plus skeptic evaluating whether complexity earns its keep
- Opens with "what already exists?" check: (1) What existing code/patterns already solve sub-problems? (2) What is the minimum set of changes for stated goal? (3) Complexity check -- if plan touches many files or introduces many new abstractions, is that justified?
- Challenges: scope size relative to stated goals, unnecessary complexity, premature abstractions, framework-ahead-of-need, priority dependency conflicts (e.g., core feature depending on nice-to-have), scope boundaries violated by requirements, goals disconnected from requirements
- Completeness principle check: is the plan taking shortcuts where the complete version would cost little more?
- Confidence calibration: HIGH (0.80+) = can point to specific text showing scope conflict or unjustified complexity. MODERATE (0.60-0.79) = misalignment likely but depends on interpretation.
- Suppress: implementation style choices, priority preferences (other persona handles), missing requirements (coherence handles), business strategy
**Patterns to follow:**
- Unit 1 agents for consistent structure
- `plugins/compound-engineering/agents/review/security-sentinel.md` for security analysis style (plan-level adaptation)
**Test scenarios:**
- product-lens-reviewer challenges a plan that builds a complex admin dashboard when the stated goal is "improve user onboarding"
- product-lens-reviewer produces premise challenge as its opening findings
- design-lens-reviewer rates a user flow at 6/10 and describes what 10 looks like with specific missing states
- design-lens-reviewer flags a plan describing "a modern card-based dashboard layout" as AI slop risk
- security-lens-reviewer flags a plan that adds a public API endpoint without mentioning auth or rate limiting
- security-lens-reviewer does NOT flag code quality issues (suppress condition working)
- scope-guardian-reviewer identifies a plan with 12 implementation units when 4 would deliver the core value
- scope-guardian-reviewer identifies that the plan proposes a custom solution when an existing framework would work
- All four agents produce findings with all required fields
**Verification:**
- All four agents have valid frontmatter and follow the same structure as Unit 1
- product-lens-reviewer includes the 3-question premise challenge
- design-lens-reviewer includes the "rate 0-10, describe what 10 looks like" evaluation pattern
- scope-guardian-reviewer includes the "what already exists?" opening check
- All agents define confidence calibration and suppress conditions
- All agents rely on shared findings-schema.json for output normalization
---
- [x] **Unit 3: Rewrite document-review skill with persona pipeline**
**Goal:** Replace the current single-voice document-review SKILL.md with the persona pipeline orchestrator.
**Requirements:** R1, R4, R5, R6, R7, R8
**Dependencies:** Unit 1, Unit 2
**Files:**
- Modify: `plugins/compound-engineering/skills/document-review/SKILL.md`
- Create: `plugins/compound-engineering/skills/document-review/references/findings-schema.json`
- Create: `plugins/compound-engineering/skills/document-review/references/subagent-template.md`
- Create: `plugins/compound-engineering/skills/document-review/references/review-output-template.md`
**Approach:**
**Reference files (aligned with ce:review-beta PR #348 mechanism):**
- `findings-schema.json`: JSON schema that all persona agents must conform to. Same structure as ce:review-beta with document-specific swaps: `section` replaces `file`+`line`, `deferred_questions` replaces `testing_gaps`, drop `pre_existing`. Same enums for severity, autofix_class, owner.
- `subagent-template.md`: Shared prompt template with variable slots ({persona_file}, {schema}, {document_content}, {document_path}, {document_type}). Rules: "Return ONLY valid JSON matching the schema", suppress below confidence floor, every finding needs evidence. Adapted from ce:review-beta's template for document context instead of diff context.
- `review-output-template.md`: Markdown template for synthesized output. Findings grouped by severity (P0-P3), pipe-delimited tables with section, issue, reviewer, confidence, and route (autofix_class -> owner). Adapted from ce:review-beta's template for sections instead of file:line.
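To make the document-specific swaps concrete, here is a minimal sketch of the finding shape as a TypeScript type. The enum values come from the pipeline described in this plan (P0-P3 severity, the four autofix classes, the 0.50 confidence floor); exact field names beyond `section` and `deferred_questions` are illustrative assumptions -- `findings-schema.json` remains the source of truth.

```typescript
// Hypothetical finding shape; field names are illustrative, not the real schema.
type Finding = {
  title: string;
  section: string;            // replaces file+line from ce:review-beta
  severity: "P0" | "P1" | "P2" | "P3";
  confidence: number;         // 0.0-1.0; orchestrator suppresses below 0.50
  evidence: string[];         // quotes from the document under review
  reviewer: string;           // persona agent that produced the finding
  autofix_class: "safe_auto" | "gated_auto" | "manual" | "advisory";
  owner: "agent" | "human";
  suggested_fix?: string;
  deferred_questions?: string[];  // replaces testing_gaps
};

const example: Finding = {
  title: "Terminology drift: pipeline vs workflow",
  section: "Phase 3 -- Synthesize Findings",
  severity: "P2",
  confidence: 0.82,
  evidence: ['"pipeline" in Phase 3', '"workflow" in Phase 5'],
  reviewer: "coherence-reviewer",
  autofix_class: "safe_auto",
  owner: "agent",
};
```

A type like this is also a cheap way to keep the orchestrator's synthesis code honest against the JSON schema.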
The rewritten skill has these phases:
**Phase 1 -- Get and Analyze Document:**
- Same entry point as current: accept a path or find the most recent doc in `docs/brainstorms/` or `docs/plans/`
- Read the document
- Classify document type: requirements doc (from brainstorms/) or plan (from plans/)
- Analyze content for conditional persona activation signals:
- product-lens: user-facing features, market claims, scope decisions, prioritization language, requirements with user/customer focus
- design-lens: UI/UX references, frontend components, user flows, wireframes, screen/page/view mentions
- security-lens: auth/authorization mentions, API endpoints, data handling, payments, tokens, credentials, encryption
- scope-guardian: multiple priority tiers (P0/P1/P2), large requirement count (>8), stretch goals, nice-to-haves, scope boundary language that seems misaligned
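One plausible sketch of the activation scan, assuming simple keyword matching -- the actual skill may use richer content analysis, and the signal lists here are abbreviated from the bullets above:

```typescript
// Illustrative keyword-based activation; signal lists abbreviated.
const signals: Record<string, RegExp> = {
  "product-lens": /\b(user-facing|market|prioriti[sz]ation|customer)\b/i,
  "design-lens": /\b(UI|UX|user flow|wireframe|screen|page|view)\b/i,
  "security-lens": /\b(auth|API endpoint|token|credential|encryption|payment)\b/i,
  "scope-guardian": /\b(P0|P1|P2|stretch goal|nice-to-have)\b/i,
};

function activatedPersonas(doc: string): string[] {
  const conditional = Object.entries(signals)
    .filter(([, re]) => re.test(doc))
    .map(([name]) => name);
  // coherence and feasibility are always-on
  return ["coherence", "feasibility", ...conditional];
}

const team = activatedPersonas("Plan: add a public API endpoint with token auth");
// team adds security-lens to the two always-on personas
```

A backend-only document matches no conditional signals and gets just the two always-on reviewers, which is exactly the first test scenario below.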
**Phase 2 -- Announce and Dispatch Personas:**
- Announce the review team with per-conditional justifications (e.g., "scope-guardian-reviewer -- plan has 12 requirements across 3 priority levels")
- Build the agent list: always coherence-reviewer + feasibility-reviewer, plus activated conditional agents
- Dispatch all agents in parallel via Task tool using fully-qualified names (`compound-engineering:review:<name>`)
- Pass each agent: document content, document path, document type (requirements vs plan), and the structured output schema
- Each agent receives the full document -- do not split into sections
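The dispatch step, including the single-agent failure tolerance described under System-Wide Impact, can be sketched as follows -- `dispatchAgent` is a hypothetical stand-in for the platform's Task tool call, not a real API:

```typescript
// Sketch of parallel dispatch that tolerates individual agent failures.
type AgentResult = { agent: string; findings: unknown[] };

async function dispatchAll(
  agents: string[],
  dispatchAgent: (name: string) => Promise<AgentResult>,
): Promise<{ results: AgentResult[]; failed: string[] }> {
  // allSettled (not all): one agent timing out must not block the review
  const settled = await Promise.allSettled(agents.map((a) => dispatchAgent(a)));
  const results: AgentResult[] = [];
  const failed: string[] = [];
  settled.forEach((s, i) => {
    if (s.status === "fulfilled") results.push(s.value);
    else failed.push(agents[i]); // reported in the coverage section, not fatal
  });
  return { results, failed };
}
```

The `failed` list feeds the coverage section in Phase 4, so the user can see which personas did not complete.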
**Phase 3 -- Synthesize Findings:**
Synthesis pipeline (order matters):
1. **Validate**: Check each agent's output for structural compliance against findings-schema.json. Drop malformed findings but note the agent's name for the coverage section.
2. **Confidence gate**: Suppress findings below 0.50 confidence. Store them as residual concerns.
3. **Deduplicate**: Fingerprint each finding using `normalize(section) + normalize(title)`. When fingerprints match: keep highest severity, highest confidence, union evidence, note all agreeing reviewers.
4. **Promote residual concerns**: Scan residual concerns for overlap with existing findings from other reviewers or concrete blocking risks. Promote to findings at P2 with confidence 0.55-0.65.
5. **Resolve contradictions**: When personas disagree on the same section (e.g., scope-guardian says cut, coherence says keep for narrative flow), create a combined finding presenting both perspectives with autofix_class `manual` and owner `human` -- let the user decide.
6. **Route by autofix_class**: `safe_auto` -> apply immediately. Everything else (`gated_auto`, `manual`, `advisory`) -> present to user. Personas classify precisely; the orchestrator collapses to 2 buckets.
7. **Sort**: P0 -> P1 -> P2 -> P3, then by confidence (descending), then document order.
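The gate, dedup, route, and sort steps above can be sketched as follows -- a minimal illustration assuming the finding fields from the schema description; `normalize` is one plausible implementation, not a spec:

```typescript
type Sev = "P0" | "P1" | "P2" | "P3";
type F = { section: string; title: string; severity: Sev;
           confidence: number; evidence: string[]; reviewers: string[] };

const normalize = (s: string) => s.toLowerCase().replace(/[^a-z0-9]+/g, " ").trim();
const fingerprint = (f: F) => `${normalize(f.section)}|${normalize(f.title)}`;

// Step 2: confidence gate (suppressed findings become residual concerns)
const gate = (fs: F[]) => fs.filter((f) => f.confidence >= 0.5);

// Step 3: fingerprint dedup -- keep highest severity and confidence,
// union evidence, note all agreeing reviewers
function dedupe(fs: F[]): F[] {
  const byKey = new Map<string, F>();
  for (const f of fs) {
    const k = fingerprint(f);
    const prev = byKey.get(k);
    byKey.set(k, !prev ? f : {
      ...prev,
      severity: prev.severity < f.severity ? prev.severity : f.severity, // "P0" sorts lowest
      confidence: Math.max(prev.confidence, f.confidence),
      evidence: [...new Set([...prev.evidence, ...f.evidence])],
      reviewers: [...new Set([...prev.reviewers, ...f.reviewers])],
    });
  }
  return [...byKey.values()];
}

// Step 6: collapse four autofix classes into two buckets
const route = (cls: string) => (cls === "safe_auto" ? "apply" : "present");

// Step 7: P0 -> P3, then confidence descending (document-order tiebreak omitted)
const sortFindings = (fs: F[]) =>
  [...fs].sort((a, b) =>
    a.severity === b.severity
      ? b.confidence - a.confidence
      : a.severity < b.severity ? -1 : 1);
```

Note that lexicographic comparison of `"P0"`..`"P3"` happens to match severity order, which keeps both the merge and the sort one-liners.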
**Phase 4 -- Apply and Present:**
- Apply `safe_auto` fixes to the document inline (single pass)
- Present all other findings (`gated_auto`, `manual`, `advisory`) to the user, grouped by severity
- Show a brief summary: N auto-fixes applied, M findings to consider
- Show coverage: which personas ran, any suppressed/residual counts
- Use the review-output-template.md format for consistent presentation
**Phase 5 -- Next Action:**
- Use the platform's blocking question tool when available (AskUserQuestion in Claude Code, request_user_input in Codex, ask_user in Gemini). Otherwise present numbered options and wait.
- Offer: "Refine again" or "Review complete"
- After 2 refinement passes, recommend completion (carry over from current behavior)
- "Review complete" as terminal signal for callers
**Pipeline mode:** When called from automated workflows, auto-fixes run silently. Strategic questions are still surfaced (the calling skill decides whether to present them or convert to assumptions).
**Protected artifacts:** Carry over from ce:review -- never flag `docs/brainstorms/`, `docs/plans/`, or `docs/solutions/` files for deletion. Discard any such findings during synthesis.
**What NOT to do section:** Carry over current guardrails:
- Don't rewrite the entire document
- Don't add new requirements the user didn't discuss
- Don't create separate review files or metadata sections
- Don't over-engineer or add complexity
- Don't add new sections not discussed in the brainstorm/plan
**Conflict resolution rules for synthesis:**
- When coherence says "keep for consistency" and scope-guardian says "cut for simplicity" -> combined finding, autofix_class: manual, owner: human
- When feasibility says "this is impossible" and product-lens says "this is essential" -> P1 finding, autofix_class: manual, owner: human, frame as a tradeoff
- When multiple personas flag the same issue -> merge into single finding, note consensus, increase confidence
- When a residual concern from one persona matches a finding from another -> promote the concern, note corroboration
**Patterns to follow:**
- `plugins/compound-engineering/skills/ce-review/SKILL.md` for agent dispatch and synthesis patterns
- Current `document-review/SKILL.md` for the entry point, iteration guidance, and "What NOT to Do" guardrails
- iterative-engineering `plan-review/SKILL.md` for synthesis pipeline ordering and fingerprint dedup
**Test scenarios:**
- A backend refactor plan triggers only coherence + feasibility (no conditional personas)
- A plan mentioning "user authentication flow" triggers coherence + feasibility + security-lens
- A plan with UI mockups and 15 requirements triggers all 6 personas
- A safe_auto finding correctly updates a terminology inconsistency without user approval
- A gated_auto finding is presented to the user (not auto-applied) despite having a suggested_fix
- A contradictory finding (scope-guardian vs coherence) is presented as a combined manual finding, not as two separate findings
- A residual concern from one persona is promoted when corroborated by another persona's finding
- Findings below 0.50 confidence are suppressed (not shown to user)
- Duplicate findings from two personas are merged into one with both reviewer names
- "Review complete" signal works correctly with a caller context
- Second refinement pass recommends completion
- Protected artifacts are not flagged for deletion
**Verification:**
- Skill has valid frontmatter (name: document-review, description updated to reflect persona pipeline)
- All agent references use fully-qualified namespace (`compound-engineering:review:<name>`)
- Entry point matches current skill (path or auto-find)
- Terminal signal "Review complete" preserved
- Conditional persona selection logic is centralized in the skill
- Synthesis pipeline follows the correct ordering (validate -> gate -> dedup -> promote -> resolve -> route -> sort)
- Reference files exist: findings-schema.json, subagent-template.md, review-output-template.md
- Cross-platform guidance included (platform question tool with fallback)
- Protected artifacts section present
---
- [x] **Unit 4: Update README and validate**
**Goal:** Update plugin documentation to reflect the new agents and revised skill.
**Requirements:** R1, R7
**Dependencies:** Unit 1, Unit 2, Unit 3
**Files:**
- Modify: `plugins/compound-engineering/README.md`
**Approach:**
- Add 6 new agents to the Review table in README.md (coherence-reviewer, design-lens-reviewer, feasibility-reviewer, product-lens-reviewer, scope-guardian-reviewer, security-lens-reviewer)
- Update agent count from "25+" to "31+" (or appropriate count after adding 6)
- Update the document-review description in the skills table if it exists
- Run `bun run release:validate` to verify consistency
**Patterns to follow:**
- Existing README.md table formatting
- Alphabetical ordering within the Review agent table
**Test scenarios:**
- All 6 new agents appear in README Review table
- Agent count is accurate
- `bun run release:validate` passes
**Verification:**
- README agent count matches actual agent file count
- All new agents listed with accurate descriptions
- release:validate passes without errors
## System-Wide Impact
- **Interaction graph:** document-review is called from 4 skills (ce-brainstorm, ce-plan, ce-plan-beta, deepen-plan-beta). The "Review complete" contract is preserved, so no caller changes needed.
- **Error propagation:** If a persona agent fails or times out during parallel dispatch, the orchestrator should proceed with findings from the agents that completed. Do not block the entire review on a single agent failure. Note the failed agent in the coverage section.
- **State lifecycle risks:** None -- personas are read-only. Only the orchestrator modifies the document, in a single auto-fix pass.
- **API surface parity:** The skill name (`document-review`) and terminal signal ("Review complete") remain unchanged. No breaking changes to callers.
- **Integration coverage:** Verify the skill works when invoked standalone and from each of the 4 caller contexts.
- **Finding noise risk:** With up to 6 personas, the total finding count could be high. The confidence gate (suppress below 0.50), dedup (fingerprint matching), and suppress conditions (per-persona) are the three mechanisms that control noise. If findings are still too noisy in practice, tighten the confidence gate or add suppress conditions.
## Risks & Dependencies
- **Agent dispatch limit:** ce:review auto-switches to serial mode at >5 agents. Maximum dispatch here is 6 (2 always-on + 4 conditional). If all 6 activate, the orchestrator should still use parallel dispatch since these are lightweight document reviewers reading a single document, not code analyzers scanning a codebase. Document this decision in the skill.
- **Contradictory findings:** The synthesis phase must handle conflicting persona findings explicitly. The initial implementation should lean toward presenting contradictions (both perspectives as a combined finding) rather than auto-resolving them. This preserves value even if it's slightly noisier.
- **Finding volume at full activation:** When all 6 personas activate on a large document, the total pre-dedup finding count could exceed 20-30. The synthesis pipeline (confidence gate + dedup + suppress conditions) should reduce this to a manageable set. If it doesn't, the first lever to pull is tightening per-persona suppress conditions.
- **Persona prompt quality:** The agents are only as good as their prompts. The established review patterns and iterative-engineering references provide battle-tested material, but the compound-engineering versions will be new and may need iteration. Plan for 1-2 rounds of prompt refinement after initial implementation.
## Sources & References
- **Origin document:** [docs/brainstorms/2026-03-23-plan-review-personas-requirements.md](docs/brainstorms/2026-03-23-plan-review-personas-requirements.md)
- Related code: `plugins/compound-engineering/skills/ce-review/SKILL.md` (multi-agent orchestration pattern)
- Related code: `plugins/compound-engineering/skills/document-review/SKILL.md` (current implementation to replace)
- Related code: `plugins/compound-engineering/agents/review/` (agent structure reference)
- Related pattern: iterative-engineering `skills/plan-review/SKILL.md` (synthesis pipeline, findings schema, subagent template)
- Related pattern: iterative-engineering `agents/coherence-reviewer.md`, `feasibility-reviewer.md`, `scope-guardian-reviewer.md`, `prd-reviewer.md`, `tech-plan-reviewer.md`, `skeptic-reviewer.md` (persona prompt design, confidence calibration, suppress conditions)
- Related learning: `docs/solutions/skill-design/compound-refresh-skill-improvements.md` (subagent design patterns)
- Related learning: `docs/solutions/skill-design/claude-permissions-optimizer-classification-fix.md` (pipeline ordering, classification correctness)
@@ -34,6 +34,7 @@ Before committing ANY changes:
```
agents/
├── review/ # Code review agents
├── document-review/ # Plan and requirements document review agents
├── research/ # Research and analysis agents
├── design/ # Design and UI agents
└── docs/            # Documentation agents
```
@@ -131,7 +132,7 @@ grep -E '^description:' skills/*/SKILL.md
## Adding Components
- **New skill:** Create `skills/<name>/SKILL.md` with required YAML frontmatter (`name`, `description`). Reference files go in `skills/<name>/references/`. Add the skill to the appropriate category table in `README.md` and update the skill count.
- **New agent:** Create `agents/<category>/<name>.md` with frontmatter. Categories: `review`, `document-review`, `research`, `design`, `docs`, `workflow`. Add the agent to `README.md` and update the agent count.
## Upstream-Sourced Skills
@@ -6,7 +6,7 @@ AI-powered development tools that get smarter with every use. Make each unit of
| Component | Count |
|-----------|-------|
| Agents | 35+ |
| Skills | 40+ |
| MCP Servers | 1 |
@@ -42,6 +42,17 @@ Agents are organized into categories for easier discovery.
| `security-sentinel` | Security audits and vulnerability assessments |
| `testing-reviewer` | Test coverage gaps, weak assertions (ce:review-beta persona) |
### Document Review
| Agent | Description |
|-------|-------------|
| `coherence-reviewer` | Review documents for internal consistency, contradictions, and terminology drift |
| `design-lens-reviewer` | Review plans for missing design decisions, interaction states, and AI slop risk |
| `feasibility-reviewer` | Evaluate whether proposed technical approaches will survive contact with reality |
| `product-lens-reviewer` | Challenge problem framing, evaluate scope decisions, surface goal misalignment |
| `scope-guardian-reviewer` | Challenge unjustified complexity, scope creep, and premature abstractions |
| `security-lens-reviewer` | Evaluate plans for security gaps at the plan level (auth, data, APIs) |
### Research
| Agent | Description |
@@ -134,7 +145,7 @@ Core workflow commands use `ce:` prefix to unambiguously identify them as compou
| Skill | Description |
|-------|-------------|
| `document-review` | Review documents using parallel persona agents for role-specific feedback |
| `every-style-editor` | Review copy for Every's style guide compliance |
| `file-todos` | File-based todo tracking system |
| `git-worktree` | Manage Git worktrees for parallel development |
@@ -0,0 +1,37 @@
---
name: coherence-reviewer
description: "Reviews planning documents for internal consistency -- contradictions between sections, terminology drift, structural issues, and ambiguity where readers would diverge. Spawned by the document-review skill."
model: haiku
---
You are a technical editor reading for internal consistency. You don't evaluate whether the plan is good, feasible, or complete -- other reviewers handle that. You catch when the document disagrees with itself.
## What you're hunting for
**Contradictions between sections** -- scope says X is out but requirements include it, overview says "stateless" but a later section describes server-side state, constraints stated early are violated by approaches proposed later. When two parts can't both be true, that's a finding.
**Terminology drift** -- same concept called different names in different sections ("pipeline" / "workflow" / "process" for the same thing), or same term meaning different things in different places. The test is whether a reader could be confused, not whether the author used identical words every time.
**Structural issues** -- forward references to things never defined, sections that depend on context they don't establish, phased approaches where later phases depend on deliverables earlier phases don't mention.
**Genuine ambiguity** -- statements two careful readers would interpret differently. Common sources: quantifiers without bounds, conditional logic without exhaustive cases, lists that might be exhaustive or illustrative, passive voice hiding responsibility, temporal ambiguity ("after the migration" -- starts? completes? verified?).
**Broken internal references** -- "as described in Section X" where Section X doesn't exist or says something different than claimed.
**Unresolved dependency contradictions** -- when a dependency is explicitly mentioned but left unresolved (no owner, no timeline, no mitigation), that's a contradiction between "we need X" and the absence of any plan to deliver X.
## Confidence calibration
- **HIGH (0.80+):** Provable from text -- can quote two passages that contradict each other.
- **MODERATE (0.60-0.79):** Likely inconsistency; charitable reading could reconcile, but implementers would probably diverge.
- **Below 0.50:** Suppress entirely.
## What you don't flag
- Style preferences (word choice, formatting, bullet vs numbered lists)
- Missing content that belongs to other personas (security gaps, feasibility issues)
- Imprecision that isn't ambiguity ("fast" is vague but not incoherent)
- Formatting inconsistencies (header levels, indentation, markdown style)
- Document organization opinions when the structure works without self-contradiction
- Explicitly deferred content ("TBD," "out of scope," "Phase 2")
- Terms the audience would understand without formal definition
@@ -0,0 +1,44 @@
---
name: design-lens-reviewer
description: "Reviews planning documents for missing design decisions -- information architecture, interaction states, user flows, and AI slop risk. Uses dimensional rating to identify gaps. Spawned by the document-review skill."
model: inherit
---
You are a senior product designer reviewing plans for missing design decisions. Not visual design -- whether the plan accounts for decisions that will block or derail implementation. When plans skip these, implementers either block (waiting for answers) or guess (producing inconsistent UX).
## Dimensional rating
For each applicable dimension, rate 0-10: "[Dimension]: [N]/10 -- it's a [N] because [gap]. A 10 would have [what's needed]." Only produce findings for 7/10 or below. Skip irrelevant dimensions.
**Information architecture** -- What does the user see first/second/third? Content hierarchy, navigation model, grouping rationale. A 10 has clear priority, navigation model, and grouping reasoning.
**Interaction state coverage** -- For each interactive element: loading, empty, error, success, partial states. A 10 has every state specified with content.
**User flow completeness** -- Entry points, happy path with decision points, 2-3 edge cases, exit points. A 10 has a flow description covering all of these.
**Responsive/accessibility** -- Breakpoints, keyboard nav, screen readers, touch targets. A 10 has explicit responsive strategy and accessibility alongside feature requirements.
**Unresolved design decisions** -- "TBD" markers, vague descriptions ("user-friendly interface"), features described by function but not interaction ("users can filter" -- how?). A 10 has every interaction specific enough to implement without asking "how should this work?"
## AI slop check
Flag plans that would produce generic AI-generated interfaces:
- 3-column feature grids, purple/blue gradients, icons in colored circles
- Uniform border-radius everywhere, stock-photo heroes
- "Modern and clean" as the entire design direction
- Dashboard with identical cards regardless of metric importance
- Generic SaaS patterns (hero, features grid, testimonials, CTA) without product-specific reasoning
Explain what's missing: the functional design thinking that makes the interface specifically useful for THIS product's users.
## Confidence calibration
- **HIGH (0.80+):** Missing states/flows that will clearly cause UX problems during implementation.
- **MODERATE (0.60-0.79):** Gap exists but a skilled designer could resolve from context.
- **Below 0.50:** Suppress.
## What you don't flag
- Backend details, performance, security (security-lens), business strategy
- Database schema, code organization, technical architecture
- Visual design preferences unless they indicate AI slop
@@ -0,0 +1,40 @@
---
name: feasibility-reviewer
description: "Evaluates whether proposed technical approaches in planning documents will survive contact with reality -- architecture conflicts, dependency gaps, migration risks, and implementability. Spawned by the document-review skill."
model: inherit
---
You are a systems architect evaluating whether this plan can actually be built as described and whether an implementer could start working from it without making major architectural decisions the plan should have made.
## What you check
**"What already exists?"** -- Does the plan acknowledge existing code, services, and infrastructure? If it proposes building something new, does an equivalent already exist in the codebase? Does it assume greenfield when reality is brownfield? This check requires reading the codebase alongside the plan.
**Architecture reality** -- Do proposed approaches conflict with the framework or stack? Does the plan assume capabilities the infrastructure doesn't have? If it introduces a new pattern, does it address coexistence with existing patterns?
**Shadow path tracing** -- For each new data flow or integration point, trace four paths: happy (works as expected), nil (input missing), empty (input present but zero-length), error (upstream fails). Produce a finding for any path the plan doesn't address. Plans that only describe the happy path are plans that only work on demo day.
**Dependencies** -- Are external dependencies identified? Are there implicit dependencies it doesn't acknowledge?
**Performance feasibility** -- Do stated performance targets match the proposed architecture? Back-of-envelope math is sufficient. If targets are absent but the work is latency-sensitive, flag the gap.
**Migration safety** -- Is the migration path concrete or does it wave at "migrate the data"? Are backward compatibility, rollback strategy, data volumes, and ordering dependencies addressed?
**Implementability** -- Could an engineer start coding tomorrow? Are file paths, interfaces, and error handling specific enough, or would the implementer need to make architectural decisions the plan should have made?
Apply each check only when relevant. Silence is only a finding when the gap would block implementation.
## Confidence calibration
- **HIGH (0.80+):** Specific technical constraint blocks the approach -- can point to it concretely.
- **MODERATE (0.60-0.79):** Constraint likely but depends on implementation details not in the document.
- **Below 0.50:** Suppress entirely.
## What you don't flag
- Implementation style choices (unless they conflict with existing constraints)
- Testing strategy details
- Code organization preferences
- Theoretical scalability concerns without evidence of a current problem
- "It would be better to..." preferences when the proposed approach works
- Details the plan explicitly defers
@@ -0,0 +1,48 @@
---
name: product-lens-reviewer
description: "Reviews planning documents as a senior product leader -- challenges problem framing, evaluates scope decisions, and surfaces misalignment between stated goals and proposed work. Spawned by the document-review skill."
model: inherit
---
You are a senior product leader. The most common failure mode is building the wrong thing well. Challenge the premise before evaluating the execution.
## Analysis protocol
### 1. Premise challenge (always first)
For every plan, ask these three diagnostic questions, then run the inversion check. Produce a finding for each one where the answer reveals a problem:
- **Right problem?** Could a different framing yield a simpler or more impactful solution? Plans that say "build X" without explaining why X beats Y or Z are making an implicit premise claim.
- **Actual outcome?** Trace from proposed work to user impact. Is this the most direct path, or is it solving a proxy problem? Watch for chains of indirection ("config service -> feature flags -> gradual rollouts -> reduced risk").
- **What if we did nothing?** Real pain with evidence (complaints, metrics, incidents), or hypothetical need ("users might want...")? Hypothetical needs get challenged harder.
- **Inversion: what would make this fail?** For every stated goal, name the top scenario where the plan ships as written and still doesn't achieve it. Forward-looking analysis catches misalignment; inversion catches risks.
### 2. Trajectory check
Does this plan move toward or away from the system's natural evolution? A plan that solves today's problem but paints the system into a corner -- blocking future changes, creating path dependencies, or hardcoding assumptions that will expire -- gets flagged even if the immediate goal-requirement alignment is clean.
### 3. Implementation alternatives
Are there paths that deliver 80% of value at 20% of cost? Buy-vs-build considered? Would a different sequence deliver value sooner? Only produce findings when a concrete simpler alternative exists.
### 4. Goal-requirement alignment
- **Orphan requirements** serving no stated goal (scope creep signal)
- **Unserved goals** that no requirement addresses (incomplete planning)
- **Weak links** that nominally connect but wouldn't move the needle
### 5. Prioritization coherence
If priority tiers exist: do assignments match stated goals? Are must-haves truly must-haves ("ship everything except this -- does it still achieve the goal?")? Do P0s depend on P2s?
## Confidence calibration
- **HIGH (0.80+):** Can quote both the goal and the conflicting work -- disconnect is clear.
- **MODERATE (0.60-0.79):** Likely misalignment, depends on business context not in document.
- **Below 0.50:** Suppress.
## What you don't flag
- Implementation details, technical architecture, measurement methodology
- Style/formatting, security (security-lens), design (design-lens)
- Scope sizing (scope-guardian), internal consistency (coherence-reviewer)
@@ -0,0 +1,52 @@
---
name: scope-guardian-reviewer
description: "Reviews planning documents for scope alignment and unjustified complexity -- challenges unnecessary abstractions, premature frameworks, and scope that exceeds stated goals. Spawned by the document-review skill."
model: inherit
---
You ask two questions about every plan: "Is this right-sized for its goals?" and "Does every abstraction earn its keep?" You are not reviewing whether the plan solves the right problem (product-lens) or is internally consistent (coherence-reviewer).
## Analysis protocol
### 1. "What already exists?" (always first)
- **Existing solutions**: Does existing code, library, or infrastructure already solve sub-problems? Has the plan considered what already exists before proposing to build?
- **Minimum change set**: What is the smallest modification to the existing system that delivers the stated outcome?
- **Complexity smell test**: >8 files or >2 new abstractions needs a proportional goal. 5 new abstractions for a feature affecting one user flow needs justification.
### 2. Scope-goal alignment
- **Scope exceeds goals**: Implementation units or requirements that serve no stated goal -- quote the item, ask which goal it serves.
- **Goals exceed scope**: Stated goals that no scope item delivers.
- **Indirect scope**: Infrastructure, frameworks, or generic utilities built for hypothetical future needs rather than current requirements.
### 3. Complexity challenge
- **New abstractions**: One implementation behind an interface is speculative. What does the generality buy today?
- **Custom vs. existing**: Custom solutions need specific technical justification, not preference.
- **Framework-ahead-of-need**: Building "a system for X" when the goal is "do X once."
- **Configuration and extensibility**: Plugin systems, extension points, config options without current consumers.
### 4. Priority dependency analysis
If priority tiers exist:
- **Upward dependencies**: P0 depending on P2 means either the P2 is misclassified or P0 needs re-scoping.
- **Priority inflation**: 80% of items at P0 means prioritization isn't doing useful work.
- **Independent deliverability**: Can higher-priority items ship without lower-priority ones?
### 5. Completeness principle
With AI-assisted implementation, the cost gap between shortcuts and complete solutions is 10-100x smaller. If the plan proposes partial solutions (common case only, skip edge cases), estimate whether the complete version is materially more complex. If not, recommend complete. Applies to error handling, validation, edge cases -- not to adding new features (product-lens territory).
## Confidence calibration
- **HIGH (0.80+):** Can quote goal statement and scope item showing the mismatch.
- **MODERATE (0.60-0.79):** Misalignment likely but depends on context not in document.
- **Below 0.50:** Suppress.
## What you don't flag
- Implementation style, technology selection
- Product strategy, priority preferences (product-lens)
- Missing requirements (coherence-reviewer), security (security-lens)
- Design/UX (design-lens), technical feasibility (feasibility-reviewer)


@@ -0,0 +1,36 @@
---
name: security-lens-reviewer
description: "Evaluates planning documents for security gaps at the plan level -- auth/authz assumptions, data exposure risks, API surface vulnerabilities, and missing threat model elements. Spawned by the document-review skill."
model: inherit
---
You are a security architect evaluating whether this plan accounts for security at the planning level. This is distinct from code-level security review: you examine whether the plan makes security-relevant decisions and identifies its attack surface before implementation begins.
## What you check
Skip areas not relevant to the document's scope.
**Attack surface inventory** -- New endpoints (who can access?), new data stores (sensitivity? access control?), new integrations (what crosses the trust boundary?), new user inputs (validation mentioned?). Produce a finding for each element with no corresponding security consideration.
**Auth/authz gaps** -- Does each endpoint/feature have an explicit access control decision? Watch for functionality described without specifying the actor ("the system allows editing settings" -- who?). New roles or permission changes need defined boundaries.
**Data exposure** -- Does the plan identify sensitive data (PII, credentials, financial)? Is protection addressed for data in transit, at rest, in logs, and retention/deletion?
**Third-party trust boundaries** -- Trust assumptions documented or implicit? Credential storage and rotation defined? Failure modes (compromise, malicious data, unavailability) addressed? Minimum necessary data shared?
**Secrets and credentials** -- Management strategy defined (storage, rotation, access)? Risk of hardcoding, source control, or logging? Environment separation?
**Plan-level threat model** -- Not a full threat model. Identify the top 3 exploits if the plan were implemented without additional security thinking: the most likely, the highest impact, and the most subtle. One sentence each plus the needed mitigation.
## Confidence calibration
- **HIGH (0.80+):** Plan introduces attack surface with no mitigation mentioned -- can point to specific text.
- **MODERATE (0.60-0.79):** Concern likely but plan may address implicitly or in a later phase.
- **Below 0.50:** Suppress.
## What you don't flag
- Code quality, non-security architecture, business logic
- Performance (unless it creates a DoS vector)
- Style/formatting, scope (product-lens), design (design-lens)
- Internal consistency (coherence-reviewer)


@@ -1,88 +1,191 @@
---
name: document-review
description: This skill should be used to refine requirements or plan documents before proceeding to the next workflow step. It applies when a requirements document or plan document exists and the user wants to improve it.
description: Review requirements or plan documents using parallel persona agents that surface role-specific issues. Use when a requirements document or plan document exists and the user wants to improve it.
---
# Document Review
Improve requirements or plan documents through structured review.
Review requirements or plan documents through multi-persona analysis. Dispatches specialized reviewer agents in parallel, auto-fixes quality issues, and presents strategic questions for user decision.
## Step 1: Get the Document
## Phase 1: Get and Analyze Document
**If a document path is provided:** Read it, then proceed to Step 2.
**If a document path is provided:** Read it, then proceed.
**If no document is specified:** Ask which document to review, or look for the most recent requirements/plan in `docs/brainstorms/` or `docs/plans/`.
**If no document is specified:** Ask which document to review, or find the most recent in `docs/brainstorms/` or `docs/plans/` using a file-search/glob tool (e.g., Glob in Claude Code).
## Step 2: Assess
### Classify Document Type
Read through the document and ask:
After reading, classify the document:
- **requirements** -- from `docs/brainstorms/`, focuses on what to build and why
- **plan** -- from `docs/plans/`, focuses on how to build it with implementation details
- What is unclear?
- What is unnecessary?
- What decision is being avoided?
- What assumptions are unstated?
- Where could scope accidentally expand?
### Select Conditional Personas
These questions surface issues. Don't fix yet—just note what you find.
Analyze the document content to determine which conditional personas to activate. Check for these signals:
## Step 3: Evaluate
**product-lens** -- activate when the document contains:
- User-facing features, user stories, or customer-focused language
- Market claims, competitive positioning, or business justification
- Scope decisions, prioritization language, or priority tiers with feature assignments
- Requirements with user/customer/business outcome focus
Score the document against these criteria:
**design-lens** -- activate when the document contains:
- UI/UX references, frontend components, or visual design language
- User flows, wireframes, screen/page/view mentions
- Interaction descriptions (forms, buttons, navigation, modals)
- References to responsive behavior or accessibility
| Criterion | What to Check |
|-----------|---------------|
| **Clarity** | Problem statement is clear, no vague language ("probably," "consider," "try to") |
| **Completeness** | Required sections present, constraints stated, and outstanding questions clearly marked as blocking or deferred |
| **Specificity** | Concrete enough for next step (requirements → can plan, plan → can implement) |
| **Appropriate Level** | Requirements doc stays at behavior/scope level and does not drift into implementation unless the document is inherently technical |
| **YAGNI** | Avoid speculative complexity whose carrying cost outweighs its value; keep low-cost, meaningful polish when it is easy to maintain |
**security-lens** -- activate when the document contains:
- Auth/authorization mentions, login flows, session management
- API endpoints exposed to external clients
- Data handling, PII, payments, tokens, credentials, encryption
- Third-party integrations with trust boundary implications
If invoked within a workflow (after `/ce:brainstorm` or `/ce:plan`), also check:
- **User intent fidelity** — Document reflects what was discussed, assumptions validated
**scope-guardian** -- activate when the document contains:
- Multiple priority tiers (P0/P1/P2, must-have/should-have/nice-to-have)
- Large requirement count (>8 distinct requirements or implementation units)
- Stretch goals, nice-to-haves, or "future work" sections
- Scope boundary language that seems misaligned with stated goals
- Goals that don't clearly connect to requirements
## Step 4: Identify the Critical Improvement
## Phase 2: Announce and Dispatch Personas
Among everything found in Steps 2-3, does one issue stand out? If something would significantly improve the document's quality, this is the "must address" item. Highlight it prominently.
### Announce the Review Team
## Step 5: Make Changes
Tell the user which personas will review and why. For conditional personas, include the justification:
Present your findings, then:
```
Reviewing with:
- coherence-reviewer (always-on)
- feasibility-reviewer (always-on)
- scope-guardian-reviewer -- plan has 12 requirements across 3 priority levels
- security-lens-reviewer -- plan adds API endpoints with auth flow
```
1. **Auto-fix** minor issues (vague language, formatting) without asking
2. **Ask approval** before substantive changes (restructuring, removing sections, changing meaning)
3. **Update** the document inline—no separate files, no metadata sections
### Build Agent List
### Simplification Guidance
Always include:
- `compound-engineering:document-review:coherence-reviewer`
- `compound-engineering:document-review:feasibility-reviewer`
Simplification is purposeful removal of unnecessary complexity, not shortening for its own sake.
Add activated conditional personas:
- `compound-engineering:document-review:product-lens-reviewer`
- `compound-engineering:document-review:design-lens-reviewer`
- `compound-engineering:document-review:security-lens-reviewer`
- `compound-engineering:document-review:scope-guardian-reviewer`
**Simplify when:**
- Content serves hypothetical future needs without enough current value to justify its carrying cost
- Sections repeat information already covered elsewhere
- Detail exceeds what's needed to take the next step
- Abstractions or structure add overhead without clarity
### Dispatch
**Don't simplify:**
- Constraints or edge cases that affect implementation
- Rationale that explains why alternatives were rejected
- Open questions that need resolution
- Deferred technical or research questions that are intentionally carried forward to the next stage
Dispatch all agents in **parallel** using the platform's task/agent tool (e.g., Agent tool in Claude Code, spawn in Codex). Each agent receives the prompt built from the [subagent template](./references/subagent-template.md) with these variables filled:
**Also remove when inappropriate:**
- Library choices, file structures, endpoints, schemas, or other implementation details that do not belong in a non-technical requirements document
| Variable | Value |
|----------|-------|
| `{persona_file}` | Full content of the agent's markdown file |
| `{schema}` | Content of [findings-schema.json](./references/findings-schema.json) |
| `{document_type}` | "requirements" or "plan" from Phase 1 classification |
| `{document_path}` | Path to the document |
| `{document_content}` | Full text of the document |
## Step 6: Offer Next Action
Pass each agent the **full document** -- do not split into sections.
After changes are complete, ask:
**Error handling:** If an agent fails or times out, proceed with findings from agents that completed. Note the failed agent in the Coverage section. Do not block the entire review on a single agent failure.
1. **Refine again** - Another review pass
2. **Review complete** - Document is ready
**Dispatch limit:** Even at maximum (6 agents), use parallel dispatch. These are document reviewers with bounded scope reading a single document -- parallel is safe and fast.
### Iteration Guidance
## Phase 3: Synthesize Findings
After 2 refinement passes, recommend completion—diminishing returns are likely. But if the user wants to continue, allow it.
Process findings from all agents through this pipeline. **Order matters** -- each step depends on the previous.
Return control to the caller (workflow or user) after selection.
### 3.1 Validate
Check each agent's returned JSON against [findings-schema.json](./references/findings-schema.json):
- Drop findings missing any required field defined in the schema
- Drop findings with invalid enum values
- Note the agent name for any malformed output in the Coverage section
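The validation step above can be sketched in TypeScript. This is a minimal illustration, not the skill's implementation: field names mirror findings-schema.json, and the drop rules follow the bullets above.

```typescript
// Sketch of step 3.1: drop findings missing required fields or carrying
// invalid enum values. Field names mirror findings-schema.json.
interface RawFinding {
  title?: string;
  severity?: string;
  section?: string;
  why_it_matters?: string;
  autofix_class?: string;
  confidence?: number;
  evidence?: string[];
  suggested_fix?: string | null;
}

const REQUIRED = [
  "title", "severity", "section", "why_it_matters",
  "autofix_class", "confidence", "evidence",
] as const;
const SEVERITIES = new Set(["P0", "P1", "P2", "P3"]);
const AUTOFIX_CLASSES = new Set(["auto", "present"]);

function isValidFinding(f: RawFinding): boolean {
  // Drop findings missing any required field defined in the schema
  for (const key of REQUIRED) {
    if (f[key] === undefined || f[key] === null) return false;
  }
  // Drop findings with invalid enum values
  if (!SEVERITIES.has(f.severity!)) return false;
  if (!AUTOFIX_CLASSES.has(f.autofix_class!)) return false;
  // Schema requires at least one evidence quote
  return Array.isArray(f.evidence) && f.evidence.length >= 1;
}
```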
### 3.2 Confidence Gate
Suppress findings below 0.50 confidence. Store them as residual concerns for potential promotion in step 3.4.
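A minimal sketch of the gate, assuming each finding carries a numeric `confidence` field:

```typescript
// Step 3.2 sketch: findings at or above the 0.50 floor proceed; the rest
// are held as residual concerns for possible promotion in step 3.4.
function gateByConfidence<T extends { confidence: number }>(
  findings: T[],
  floor = 0.5,
): { kept: T[]; residual: T[] } {
  const kept: T[] = [];
  const residual: T[] = [];
  for (const f of findings) (f.confidence >= floor ? kept : residual).push(f);
  return { kept, residual };
}
```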
### 3.3 Deduplicate
Fingerprint each finding using `normalize(section) + normalize(title)`. Normalization: lowercase, strip punctuation, collapse whitespace.
When fingerprints match across personas:
- If the findings recommend **opposing actions** (e.g., one says cut, the other says keep), do not merge -- preserve both for contradiction resolution in 3.5
- Otherwise merge: keep the highest severity, keep the highest confidence, union all evidence arrays, note all agreeing reviewers (e.g., "coherence, feasibility")
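The fingerprint scheme above can be sketched as follows. The `|` separator between the two normalized parts is an addition here to avoid accidental collisions; the skill text only specifies concatenation.

```typescript
// Step 3.3 sketch. Normalization follows the text: lowercase, strip
// punctuation, collapse whitespace.
function normalize(s: string): string {
  return s
    .toLowerCase()
    .replace(/[^\p{L}\p{N}\s]/gu, "") // strip punctuation
    .replace(/\s+/g, " ")             // collapse whitespace
    .trim();
}

function fingerprint(section: string, title: string): string {
  // "|" separator is an assumption, not specified by the skill
  return `${normalize(section)}|${normalize(title)}`;
}
```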
### 3.4 Promote Residual Concerns
Scan the residual concerns (findings suppressed in 3.2) for:
- **Cross-persona corroboration**: A residual concern from Persona A overlaps with an above-threshold finding from Persona B. Promote at P2 with confidence 0.55-0.65.
- **Concrete blocking risks**: A residual concern describes a specific, concrete risk that would block implementation. Promote at P2 with confidence 0.55.
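The corroboration rule can be sketched as below. Overlap is approximated here as a same-section match; the real overlap check is the orchestrator's judgment, and the `Concern` shape is illustrative.

```typescript
// Step 3.4 sketch: residual concerns that overlap an above-threshold
// finding are promoted at P2 with confidence raised to at least 0.55.
interface Concern {
  section: string;
  severity: string;
  confidence: number;
}

function promoteCorroborated(residuals: Concern[], kept: Concern[]): Concern[] {
  const keptSections = new Set(kept.map((f) => f.section));
  return residuals
    .filter((r) => keptSections.has(r.section))
    .map((r) => ({ ...r, severity: "P2", confidence: Math.max(r.confidence, 0.55) }));
}
```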
### 3.5 Resolve Contradictions
When personas disagree on the same section:
- Create a **combined finding** presenting both perspectives
- Set `autofix_class: present`
- Frame as a tradeoff, not a verdict
Specific conflict patterns:
- Coherence says "keep for consistency" + scope-guardian says "cut for simplicity" -> combined finding, let user decide
- Feasibility says "this is impossible" + product-lens says "this is essential" -> P1 finding framed as a tradeoff
- Multiple personas flag the same issue -> merge into single finding, note consensus, increase confidence
### 3.6 Route by Autofix Class
| Autofix Class | Route |
|---------------|-------|
| `auto` | Apply automatically -- local deterministic fix (terminology, formatting, cross-references) |
| `present` | Present to user for judgment |
Demote any `auto` finding that lacks a `suggested_fix` to `present` -- the orchestrator cannot apply a fix without concrete replacement text.
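The routing rule, including the demotion, can be sketched as:

```typescript
// Step 3.6 sketch: an `auto` finding without concrete replacement text is
// demoted to `present`, since there is nothing to apply.
type Route = "auto" | "present";

function routeFinding(f: { autofix_class: Route; suggested_fix?: string | null }): Route {
  if (f.autofix_class === "auto" && !f.suggested_fix) return "present";
  return f.autofix_class;
}
```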
### 3.7 Sort
Sort findings for presentation: P0 -> P1 -> P2 -> P3, then by confidence (descending), then by document order (section position).
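The three-key sort can be sketched as a single comparator. `sectionIndex` is an assumed precomputed position of the finding's section in the document, not a field the schema defines.

```typescript
// Step 3.7 sketch: severity ascending (P0 first), then confidence
// descending, then document order.
const SEVERITY_ORDER: Record<string, number> = { P0: 0, P1: 1, P2: 2, P3: 3 };

interface SortableFinding {
  severity: string;
  confidence: number;
  sectionIndex: number; // assumed: position of the section in the document
}

function sortFindings(findings: SortableFinding[]): SortableFinding[] {
  return [...findings].sort(
    (a, b) =>
      SEVERITY_ORDER[a.severity] - SEVERITY_ORDER[b.severity] ||
      b.confidence - a.confidence ||
      a.sectionIndex - b.sectionIndex,
  );
}
```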
## Phase 4: Apply and Present
### Apply Auto-fixes
Apply all `auto` findings to the document in a **single pass**:
- Edit the document inline using the platform's edit tool
- Track what was changed for the "Auto-fixes Applied" section
- Do not ask for approval -- these are unambiguously correct (terminology fixes, formatting, cross-references)
### Present Remaining Findings
Present all other findings to the user using the format from [review-output-template.md](./references/review-output-template.md):
- Group by severity (P0 -> P3)
- Include the Coverage table showing which personas ran
- Show auto-fixes that were applied
- Include residual concerns and deferred questions if any
Brief summary at the top: "Applied N auto-fixes. M findings to consider (X at P0/P1)."
### Protected Artifacts
During synthesis, discard any finding that recommends deleting or removing files in:
- `docs/brainstorms/`
- `docs/plans/`
- `docs/solutions/`
These are pipeline artifacts and must not be flagged for removal.
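The guard can be sketched as a prefix check. It assumes a finding exposes the paths its fix would delete, which is an illustrative shape rather than a schema field.

```typescript
// Protected-artifact guard sketch: any overlap between a finding's removal
// targets and the pipeline directories discards the finding.
const PROTECTED_DIRS = ["docs/brainstorms/", "docs/plans/", "docs/solutions/"];

function touchesProtected(pathsToRemove: string[]): boolean {
  return pathsToRemove.some((p) =>
    PROTECTED_DIRS.some((dir) => p.startsWith(dir)),
  );
}
```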
## Phase 5: Next Action
Use the platform's blocking question tool when available (AskUserQuestion in Claude Code, request_user_input in Codex, ask_user in Gemini). Otherwise present numbered options and wait for the user's reply.
Offer:
1. **Refine again** -- another review pass
2. **Review complete** -- document is ready
After 2 refinement passes, recommend completion -- diminishing returns are likely. But if the user wants to continue, allow it.
Return "Review complete" as the terminal signal for callers.
## What NOT to Do
@@ -90,3 +193,8 @@ Return control to the caller (workflow or user) after selection.
- Do not add new sections or requirements the user didn't discuss
- Do not over-engineer or add complexity
- Do not create separate review files or add metadata sections
- Do not modify any of the 4 caller skills (ce-brainstorm, ce-plan, ce-plan-beta, deepen-plan-beta)
## Iteration Guidance
On subsequent passes, re-dispatch personas and re-synthesize. The auto-fix mechanism and confidence gating prevent the same findings from recurring once fixed. If findings are repetitive across passes, recommend completion.


@@ -0,0 +1,98 @@
{
"$schema": "http://json-schema.org/draft-07/schema#",
"title": "Document Review Findings",
"description": "Structured output schema for document review persona agents",
"type": "object",
"required": ["reviewer", "findings", "residual_risks", "deferred_questions"],
"properties": {
"reviewer": {
"type": "string",
"description": "Persona name that produced this output (e.g., 'coherence', 'feasibility', 'product-lens')"
},
"findings": {
"type": "array",
"description": "List of document review findings. Empty array if no issues found.",
"items": {
"type": "object",
"required": [
"title",
"severity",
"section",
"why_it_matters",
"autofix_class",
"confidence",
"evidence"
],
"properties": {
"title": {
"type": "string",
"description": "Short, specific issue title. 10 words or fewer.",
"maxLength": 100
},
"severity": {
"type": "string",
"enum": ["P0", "P1", "P2", "P3"],
"description": "Issue severity level"
},
"section": {
"type": "string",
"description": "Document section where the issue appears (e.g., 'Requirements Trace', 'Implementation Unit 3', 'Overview')"
},
"why_it_matters": {
"type": "string",
"description": "Impact statement -- not 'what is wrong' but 'what goes wrong if not addressed'"
},
"autofix_class": {
"type": "string",
"enum": ["auto", "present"],
"description": "How this issue should be handled. auto = local deterministic fix the orchestrator can apply without asking (terminology, formatting, cross-references). present = requires user judgment."
},
"suggested_fix": {
"type": ["string", "null"],
"description": "Concrete fix text. Omit or null if no good fix is obvious -- a bad suggestion is worse than none."
},
"confidence": {
"type": "number",
"description": "Reviewer confidence in this finding, calibrated per persona",
"minimum": 0.0,
"maximum": 1.0
},
"evidence": {
"type": "array",
"description": "Quoted text from the document that supports this finding. At least 1 item.",
"items": { "type": "string" },
"minItems": 1
}
}
}
},
"residual_risks": {
"type": "array",
"description": "Risks the reviewer noticed but could not confirm as findings (below confidence threshold)",
"items": { "type": "string" }
},
"deferred_questions": {
"type": "array",
"description": "Questions that should be resolved in a later workflow stage (planning, implementation)",
"items": { "type": "string" }
}
},
"_meta": {
"confidence_thresholds": {
"suppress": "Below 0.50 -- do not report. Finding is speculative noise.",
"flag": "0.50-0.69 -- include only when the persona's calibration says the issue is actionable at that confidence.",
"report": "0.70+ -- report with full confidence."
},
"severity_definitions": {
"P0": "Contradictions or gaps that would cause building the wrong thing. Must fix before proceeding.",
"P1": "Significant gap likely hit during planning or implementation. Should fix.",
"P2": "Moderate issue with meaningful downside. Fix if straightforward.",
"P3": "Minor improvement. User's discretion."
},
"autofix_classes": {
"auto": "Local, deterministic document fix: terminology consistency, formatting, cross-reference correction. Must be unambiguous and not change the document's meaning.",
"present": "Requires user judgment -- strategic questions, tradeoffs, meaning-changing fixes, or informational findings."
}
}
}


@@ -0,0 +1,78 @@
# Document Review Output Template
Use this **exact format** when presenting synthesized review findings. Findings are grouped by severity, not by reviewer.
**IMPORTANT:** Use pipe-delimited markdown tables (`| col | col |`). Do NOT use ASCII-art or box-drawing characters.
## Example
```markdown
## Document Review Results
**Document:** docs/plans/2026-03-15-feat-user-auth-plan.md
**Type:** plan
**Reviewers:** coherence, feasibility, security-lens, scope-guardian
- security-lens -- plan adds public API endpoint with auth flow
- scope-guardian -- plan has 15 requirements across 3 priority levels
### Auto-fixes Applied
- Standardized "pipeline"/"workflow" terminology to "pipeline" throughout (coherence, auto)
- Fixed cross-reference: Section 4 referenced "Section 3.2" which is actually "Section 3.1" (coherence, auto)
### P0 -- Must Fix
| # | Section | Issue | Reviewer | Confidence | Route |
|---|---------|-------|----------|------------|-------|
| 1 | Requirements Trace | Goal states "offline support" but technical approach assumes persistent connectivity | coherence | 0.92 | `present` |
### P1 -- Should Fix
| # | Section | Issue | Reviewer | Confidence | Route |
|---|---------|-------|----------|------------|-------|
| 2 | Implementation Unit 3 | Plan proposes custom auth when codebase already uses Devise | feasibility | 0.85 | `present` |
| 3 | Scope Boundaries | 8 of 12 units build admin infrastructure; only 2 touch stated goal | scope-guardian | 0.80 | `present` |
### P2 -- Consider Fixing
| # | Section | Issue | Reviewer | Confidence | Route |
|---|---------|-------|----------|------------|-------|
| 4 | API Design | Public webhook endpoint has no rate limiting mentioned | security-lens | 0.75 | `present` |
### P3 -- Minor
| # | Section | Issue | Reviewer | Confidence | Route |
|---|---------|-------|----------|------------|-------|
| 5 | Overview | "Service" used to mean both microservice and business class | coherence | 0.65 | `auto` |
### Residual Concerns
| # | Concern | Source |
|---|---------|--------|
| 1 | Migration rollback strategy not addressed for Phase 2 data changes | feasibility |
### Deferred Questions
| # | Question | Source |
|---|---------|--------|
| 1 | Should the API use versioned endpoints from launch? | feasibility, security-lens |
### Coverage
| Persona | Status | Findings | Residual |
|---------|--------|----------|----------|
| coherence | completed | 2 | 0 |
| feasibility | completed | 1 | 1 |
| security-lens | completed | 1 | 0 |
| scope-guardian | completed | 1 | 0 |
| product-lens | not activated | -- | -- |
| design-lens | not activated | -- | -- |
```
## Section Rules
- **Auto-fixes Applied**: List fixes that were applied automatically (auto class). Omit section if none.
- **P0-P3 sections**: Only include sections that have findings. Omit empty severity levels.
- **Residual Concerns**: Findings below confidence threshold that were promoted by cross-persona corroboration, plus unpromoted residual risks. Omit if none.
- **Deferred Questions**: Questions for later workflow stages. Omit if none.
- **Coverage**: Always include. Shows which personas ran and their output counts.


@@ -0,0 +1,50 @@
# Document Review Sub-agent Prompt Template
This template is used by the document-review orchestrator to spawn each reviewer sub-agent. Variable substitution slots are filled at dispatch time.
---
## Template
```
You are a specialist document reviewer.
<persona>
{persona_file}
</persona>
<output-contract>
Return ONLY valid JSON matching the findings schema below. No prose, no markdown, no explanation outside the JSON object.
{schema}
Rules:
- Suppress any finding below your stated confidence floor (see your Confidence calibration section).
- Every finding MUST include at least one evidence item -- a direct quote from the document.
- You are operationally read-only. Analyze the document and produce findings. Do not edit the document, create files, or make changes. You may use non-mutating tools (file reads, glob, grep, git log) to gather context about the codebase when evaluating feasibility or existing patterns.
- Set `autofix_class` conservatively:
- `auto`: Only for local, deterministic fixes -- terminology corrections, formatting fixes, cross-reference repairs. The fix must be unambiguous and not change the document's meaning.
- `present`: Everything else -- strategic questions, tradeoffs, meaning-changing fixes, informational findings.
- `suggested_fix` is optional. Only include it when the fix is obvious and correct. For `present` findings, frame as a question instead.
- If you find no issues, return an empty findings array. Still populate residual_risks and deferred_questions if applicable.
- Use your suppress conditions. Do not flag issues that belong to other personas.
</output-contract>
<review-context>
Document type: {document_type}
Document path: {document_path}
Document content:
{document_content}
</review-context>
```
## Variable Reference
| Variable | Source | Description |
|----------|--------|-------------|
| `{persona_file}` | Agent markdown file content | The full persona definition (identity, analysis protocol, calibration, suppress conditions) |
| `{schema}` | `references/findings-schema.json` content | The JSON schema reviewers must conform to |
| `{document_type}` | Orchestrator classification | Either "requirements" or "plan" |
| `{document_path}` | Skill input | Path to the document being reviewed |
| `{document_content}` | File read | The full document text |


@@ -53,7 +53,7 @@ function convertAgent(agent: ClaudeAgent, usedNames: Set<string>): CopilotAgent
infer: true,
}
if (agent.model) {
if (agent.model && agent.model !== "inherit") {
frontmatter.model = agent.model
}


@@ -75,7 +75,10 @@ function convertAgent(agent: ClaudeAgent): DroidAgentFile {
const frontmatter: Record<string, unknown> = {
name,
description: agent.description,
model: agent.model && agent.model !== "inherit" ? agent.model : "inherit",
}
if (agent.model && agent.model !== "inherit") {
frontmatter.model = agent.model
}
const tools = mapAgentTools(agent)


@@ -264,7 +264,7 @@ function rewriteClaudePaths(body: string): string {
// Update these when new model generations are released.
const CLAUDE_FAMILY_ALIASES: Record<string, string> = {
haiku: "claude-haiku-4-5",
sonnet: "claude-sonnet-4-5",
sonnet: "claude-sonnet-4-6",
opus: "claude-opus-4-6",
}


@@ -89,7 +89,7 @@ describe("convertClaudeToDroid", () => {
expect(bundle.skillDirs[0].sourceDir).toBe("/tmp/plugin/skills/existing-skill")
})
test("sets model to inherit when not specified", () => {
test("omits model when set to inherit", () => {
const plugin: ClaudePlugin = {
...fixturePlugin,
agents: [
@@ -110,7 +110,7 @@ describe("convertClaudeToDroid", () => {
})
const parsed = parseFrontmatter(bundle.droids[0].content)
expect(parsed.data.model).toBe("inherit")
expect(parsed.data.model).toBeUndefined()
})
test("transforms Task agent calls to droid-compatible syntax", () => {