refactor(cli)!: rename all skills and agents to consistent ce- prefix (#503)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-18 15:44:22 -07:00
parent 49249d7317
commit 5c0ec9137a
233 changed files with 3199 additions and 936 deletions
--- a/plugins/compound-engineering/skills/ce-doc-review/references/findings-schema.json
+++ b/plugins/compound-engineering/skills/ce-doc-review/references/findings-schema.json
@@ -0,0 +1,86 @@
+{
+  "$schema": "http://json-schema.org/draft-07/schema#",
+  "title": "Document Review Findings",
+  "description": "Structured output schema for document review persona agents",
+  "type": "object",
+  "required": ["reviewer", "findings", "residual_risks", "deferred_questions"],
+  "properties": {
+    "reviewer": {
+      "type": "string",
+      "description": "Persona name that produced this output (e.g., 'coherence', 'feasibility', 'product-lens')"
+    },
+    "findings": {
+      "type": "array",
+      "description": "List of document review findings. Empty array if no issues found.",
+      "items": {
+        "type": "object",
+        "required": [
+          "title",
+          "severity",
+          "section",
+          "why_it_matters",
+          "finding_type",
+          "autofix_class",
+          "confidence",
+          "evidence"
+        ],
+        "properties": {
+          "title": {
+            "type": "string",
+            "description": "Short, specific issue title. 10 words or fewer (hard cap: 100 characters).",
+            "maxLength": 100
+          },
+          "severity": {
+            "type": "string",
+            "enum": ["P0", "P1", "P2", "P3"],
+            "description": "Issue severity level"
+          },
+          "section": {
+            "type": "string",
+            "description": "Document section where the issue appears (e.g., 'Requirements Trace', 'Implementation Unit 3', 'Overview')"
+          },
+          "why_it_matters": {
+            "type": "string",
+            "description": "Impact statement -- not 'what is wrong' but 'what goes wrong if not addressed'"
+          },
+          "autofix_class": {
+            "type": "string",
+            "enum": ["auto", "present"],
+            "description": "How this issue should be handled. auto = one clear correct fix that can be applied silently (terminology, formatting, cross-references, completeness corrections, additions mechanically implied by other content). present = requires individual user judgment."
+          },
+          "finding_type": {
+            "type": "string",
+            "enum": ["error", "omission"],
+            "description": "Whether the finding is a mistake in what the document says (error) or something the document forgot to say (omission). Errors are design tensions, contradictions, or incorrect statements. Omissions are missing mechanical steps, forgotten list entries, or absent details."
+          },
+          "suggested_fix": {
+            "type": ["string", "null"],
+            "description": "Concrete fix text. Omit or null if no good fix is obvious -- a bad suggestion is worse than none."
+          },
+          "confidence": {
+            "type": "number",
+            "description": "Reviewer confidence in this finding, calibrated per persona",
+            "minimum": 0.0,
+            "maximum": 1.0
+          },
+          "evidence": {
+            "type": "array",
+            "description": "Quoted text from the document that supports this finding. At least 1 item.",
+            "items": { "type": "string" },
+            "minItems": 1
+          }
+        }
+      }
+    },
+    "residual_risks": {
+      "type": "array",
+      "description": "Risks the reviewer noticed but could not confirm as findings (below confidence threshold)",
+      "items": { "type": "string" }
+    },
+    "deferred_questions": {
+      "type": "array",
+      "description": "Questions that should be resolved in a later workflow stage (planning, implementation)",
+      "items": { "type": "string" }
+    }
+  }
+}
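Reviewer note: the constraints in this schema (required fields, enums, the confidence range, and the evidence minimum) can be checked without a full JSON Schema library. The sketch below is a hand-rolled check, not a complete draft-07 validator. The name `finding_problems` and the constant names are hypothetical and do not appear anywhere in this commit.

```python
# Hypothetical helper: checks one finding against the constraints in findings-schema.json.
REQUIRED = ("title", "severity", "section", "why_it_matters",
            "finding_type", "autofix_class", "confidence", "evidence")
ENUMS = {
    "severity": ("P0", "P1", "P2", "P3"),
    "autofix_class": ("auto", "present"),
    "finding_type": ("error", "omission"),
}


def finding_problems(finding: dict) -> list:
    """Return a list of schema violations; an empty list means the finding passes."""
    problems = [f"missing required field: {name}" for name in REQUIRED if name not in finding]
    for name, allowed in ENUMS.items():
        if name in finding and finding[name] not in allowed:
            problems.append(f"invalid {name}: {finding[name]!r}")
    if "confidence" in finding:
        value = finding["confidence"]
        is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
        if not is_number or not 0.0 <= value <= 1.0:
            problems.append("confidence must be a number in [0.0, 1.0]")
    if "evidence" in finding:
        evidence = finding["evidence"]
        if (not isinstance(evidence, list) or not evidence
                or not all(isinstance(item, str) for item in evidence)):
            problems.append("evidence must be a non-empty list of strings")
    title = finding.get("title")
    if isinstance(title, str) and len(title) > 100:
        problems.append("title exceeds 100 characters")
    return problems


valid_finding = {
    "title": "Goal conflicts with connectivity assumption",
    "severity": "P0",
    "section": "Requirements Trace",
    "why_it_matters": "Offline users would be unable to use the feature.",
    "finding_type": "error",
    "autofix_class": "present",
    "confidence": 0.92,
    "evidence": ["offline support"],
}
print(finding_problems(valid_finding))                        # []
print(finding_problems({**valid_finding, "severity": "P4"}))  # ["invalid severity: 'P4'"]
```

A finding with a non-empty problem list corresponds to one that the synthesis step would drop.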
--- a/plugins/compound-engineering/skills/ce-doc-review/references/review-output-template.md
+++ b/plugins/compound-engineering/skills/ce-doc-review/references/review-output-template.md
@@ -0,0 +1,89 @@
+# Document Review Output Template
+
+Use this **exact format** when presenting synthesized review findings. Findings are grouped by severity, not by reviewer.
+
+**IMPORTANT:** Use pipe-delimited markdown tables (`| col | col |`). Do NOT use ASCII box-drawing characters.
+
+## Example
+
+```markdown
+## Document Review Results
+
+**Document:** docs/plans/2026-03-15-feat-user-auth-plan.md
+**Type:** plan
+**Reviewers:** coherence, feasibility, security-lens, scope-guardian
+- security-lens -- plan adds public API endpoint with auth flow
+- scope-guardian -- plan has 15 requirements across 3 priority levels
+
+Applied 5 auto-fixes. 4 findings to consider (2 errors, 2 omissions).
+
+### Auto-fixes Applied
+
+- Standardized "pipeline"/"workflow" terminology to "pipeline" throughout (coherence)
+- Fixed cross-reference: Section 4 referenced "Section 3.2" which is actually "Section 3.1" (coherence)
+- Updated unit count from "6 units" to "7 units" to match listed units (coherence)
+- Added "update API rate-limit config" step to Unit 4 -- implied by Unit 3's rate-limit introduction (feasibility)
+- Added auth token refresh to test scenarios -- required by Unit 2's token expiry handling (security-lens)
+
+### P0 -- Must Fix
+
+#### Errors
+
+| # | Section | Issue | Reviewer | Confidence |
+|---|---------|-------|----------|------------|
+| 1 | Requirements Trace | Goal states "offline support" but technical approach assumes persistent connectivity | coherence | 0.92 |
+
+### P1 -- Should Fix
+
+#### Errors
+
+| # | Section | Issue | Reviewer | Confidence |
+|---|---------|-------|----------|------------|
+| 2 | Scope Boundaries | 8 of 12 units build admin infrastructure; only 2 touch stated goal | scope-guardian | 0.80 |
+
+#### Omissions
+
+| # | Section | Issue | Reviewer | Confidence |
+|---|---------|-------|----------|------------|
+| 3 | Implementation Unit 3 | Plan proposes custom auth but does not mention existing Devise setup or migration path | feasibility | 0.85 |
+
+### P2 -- Consider Fixing
+
+#### Omissions
+
+| # | Section | Issue | Reviewer | Confidence |
+|---|---------|-------|----------|------------|
+| 4 | API Design | Public webhook endpoint has no rate limiting mentioned | security-lens | 0.75 |
+
+### Residual Concerns
+
+| # | Concern | Source |
+|---|---------|--------|
+| 1 | Migration rollback strategy not addressed for Phase 2 data changes | feasibility |
+
+### Deferred Questions
+
+| # | Question | Source |
+|---|---------|--------|
+| 1 | Should the API use versioned endpoints from launch? | feasibility, security-lens |
+
+### Coverage
+
+| Persona | Status | Findings | Auto | Present | Residual |
+|---------|--------|----------|------|---------|----------|
+| coherence | completed | 4 | 3 | 1 | 0 |
+| feasibility | completed | 2 | 1 | 1 | 1 |
+| security-lens | completed | 2 | 1 | 1 | 0 |
+| scope-guardian | completed | 1 | 0 | 1 | 0 |
+| product-lens | not activated | -- | -- | -- | -- |
+| design-lens | not activated | -- | -- | -- | -- |
+```
+
+## Section Rules
+
+- **Summary line**: Always present after the reviewer list. Format: "Applied N auto-fixes. K findings to consider (X errors, Y omissions)." Omit any zero clause.
+- **Auto-fixes Applied**: List all fixes that were applied automatically (auto class). Include enough detail per fix to convey the substance -- especially for fixes that add content or touch document meaning. Omit section if none.
+- **P0-P3 sections**: Only include sections that have findings. Omit empty severity levels. Within each severity, separate into **Errors** and **Omissions** sub-headers. Omit a sub-header if that severity has none of that type.
+- **Residual Concerns**: Findings below confidence threshold that were promoted by cross-persona corroboration, plus unpromoted residual risks. Omit if none.
+- **Deferred Questions**: Questions for later workflow stages. Omit if none.
+- **Coverage**: Always include. All counts are **post-synthesis**. **Findings** must equal Auto + Present exactly -- if deduplication merged a finding across personas, attribute it to the persona with the highest confidence and reduce the other persona's count. **Residual** = count of `residual_risks` from this persona's raw output (not the promoted subset in the Residual Concerns section).
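Reviewer note: the Coverage rule (`Findings = Auto + Present`, counted post-synthesis) can be sketched as below. This is an illustration under assumptions, not the skill's implementation: `coverage_rows` is a hypothetical name, and each post-synthesis finding is assumed to carry a `reviewer` key naming the persona it was attributed to after deduplication.

```python
from collections import Counter


def coverage_rows(findings, residual_counts, personas):
    """Build one Coverage row per persona from post-synthesis findings.

    findings: dicts with 'reviewer' (attributed persona) and 'autofix_class'.
    residual_counts: persona -> number of residual_risks in its raw output.
    """
    auto = Counter(f["reviewer"] for f in findings if f["autofix_class"] == "auto")
    present = Counter(f["reviewer"] for f in findings if f["autofix_class"] == "present")
    rows = []
    for persona in personas:
        a, p = auto[persona], present[persona]
        rows.append({
            "persona": persona,
            "findings": a + p,  # exact by construction: Findings = Auto + Present
            "auto": a,
            "present": p,
            "residual": residual_counts.get(persona, 0),
        })
    return rows


# Counts mirror the example Coverage table in the template.
example_findings = (
    [{"reviewer": "coherence", "autofix_class": "auto"}] * 3
    + [{"reviewer": "coherence", "autofix_class": "present"}]
    + [{"reviewer": "feasibility", "autofix_class": "auto"},
       {"reviewer": "feasibility", "autofix_class": "present"},
       {"reviewer": "security-lens", "autofix_class": "auto"},
       {"reviewer": "security-lens", "autofix_class": "present"},
       {"reviewer": "scope-guardian", "autofix_class": "present"}]
)
rows = coverage_rows(
    example_findings,
    {"feasibility": 1},
    ["coherence", "feasibility", "security-lens", "scope-guardian"],
)
for row in rows:
    print("| {persona} | completed | {findings} | {auto} | {present} | {residual} |".format(**row))
```

Deriving Findings from the two buckets, rather than counting it separately, keeps the invariant from drifting when a merged finding is reattributed.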
--- a/plugins/compound-engineering/skills/ce-doc-review/references/subagent-template.md
+++ b/plugins/compound-engineering/skills/ce-doc-review/references/subagent-template.md
@@ -0,0 +1,52 @@
+# Document Review Sub-agent Prompt Template
+
+The ce-doc-review orchestrator uses this template to spawn each reviewer sub-agent. The `{...}` substitution slots are filled at dispatch time.
+
+---
+
+## Template
+
+```
+You are a specialist document reviewer.
+
+<persona>
+{persona_file}
+</persona>
+
+<output-contract>
+Return ONLY valid JSON matching the findings schema below. No prose, no markdown, no explanation outside the JSON object.
+
+{schema}
+
+Rules:
+- You are a leaf reviewer inside an already-running compound-engineering review workflow. Do not invoke compound-engineering skills or agents unless this template explicitly instructs you to. Perform your analysis directly and return findings in the required output format only.
+- Suppress any finding below your stated confidence floor (see your Confidence calibration section).
+- Every finding MUST include at least one evidence item -- a direct quote from the document.
+- You are operationally read-only. Analyze the document and produce findings. Do not edit the document, create files, or make changes. You may use non-mutating tools (file reads, glob, grep, git log) to gather context about the codebase when evaluating feasibility or existing patterns.
+- Set `finding_type` for every finding:
+  - `error`: Something the document says that is wrong -- contradictions, incorrect statements, design tensions, incoherent tradeoffs.
+  - `omission`: Something the document forgot to say -- missing mechanical steps, absent list entries, undefined thresholds, forgotten cross-references.
+- Set `autofix_class` based on whether there is one clear correct fix, not on severity or importance:
+  - `auto`: One clear correct fix, applied silently. This includes trivial fixes AND substantive ones:
+    - Internal reconciliation -- one document part authoritative over another (summary/detail mismatches, wrong counts, stale cross-references, terminology drift)
+    - Implied additions -- correct content mechanically obvious from the document (missing steps, unstated thresholds, completeness gaps)
+    - Codebase-pattern-resolved -- an established codebase pattern resolves ambiguity (cite the specific file/function in `why_it_matters`)
+    - Incorrect behavior -- the document describes behavior that is factually wrong, and the correct behavior is obvious from context or the codebase
+    - Missing standard security measures -- HTTPS enforcement, checksum verification, input sanitization, private IP rejection, or other controls with known implementations where omission is clearly a bug
+    - Incomplete technical descriptions -- the accurate/complete version is directly derivable from the codebase
+    - Missing requirements that follow mechanically from the document's own explicit, concrete decisions (not high-level goals -- a goal can be satisfied by multiple valid requirements)
+    The test is not "is this fix important?" but "is there more than one reasonable way to fix this?" If a competent implementer would arrive at the same fix independently, it is auto -- even if the fix is substantive. Always include `suggested_fix`. NOT auto if more than one reasonable fix exists or if scope/priority judgment is involved.
+  - `present`: Requires user judgment -- genuinely multiple valid approaches where the right choice depends on priorities, tradeoffs, or context the reviewer does not have. Examples: architectural choices with real tradeoffs, scope decisions, feature prioritization, UX design choices.
+- `suggested_fix` is required for `auto` findings. For `present` findings, include only when the fix is obvious.
+- If you find no issues, return an empty findings array. Still populate residual_risks and deferred_questions if applicable.
+- Use your suppress conditions. Do not flag issues that belong to other personas.
+</output-contract>
+
+<review-context>
+Document type: {document_type}
+Document path: {document_path}
+
+Document content:
+{document_content}
+</review-context>
+```
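Reviewer note: one way the slots could be filled at dispatch time is a single-pass substitution, sketched below. The commit does not show how the orchestrator performs substitution, so `fill_template` and `SLOT_PATTERN` are hypothetical. The single pass is a deliberate choice: chained `str.replace` calls could re-expand a literal `{schema}` that happens to appear inside the document content.

```python
import re

# The five slot names used in the template above.
SLOT_PATTERN = re.compile(
    r"\{(persona_file|schema|document_type|document_path|document_content)\}"
)


def fill_template(template: str, slots: dict) -> str:
    """Replace every known slot in one pass; substituted text is never re-scanned."""
    needed = {match.group(1) for match in SLOT_PATTERN.finditer(template)}
    missing = needed - set(slots)
    if missing:
        raise KeyError(f"unfilled slots: {sorted(missing)}")
    # A function replacement is inserted literally, so backslashes and braces
    # in the substituted values are left untouched.
    return SLOT_PATTERN.sub(lambda match: slots[match.group(1)], template)


template = "<persona>\n{persona_file}\n</persona>\n{schema}\nDocument type: {document_type}\n{document_content}"
prompt = fill_template(template, {
    "persona_file": "You review for coherence.",
    "schema": '{"type": "object"}',
    "document_type": "plan",
    "document_content": "The plan mentions a literal {schema} placeholder.",
})
print(prompt)
```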
--- a/plugins/compound-engineering/skills/ce-doc-review/references/synthesis-and-presentation.md
+++ b/plugins/compound-engineering/skills/ce-doc-review/references/synthesis-and-presentation.md
@@ -0,0 +1,173 @@
+# Phases 3-5: Synthesis, Presentation, and Next Action
+
+## Phase 3: Synthesize Findings
+
+Process findings from all agents through this pipeline. **Order matters** -- each step depends on the previous.
+
+### 3.1 Validate
+
+Check each agent's returned JSON against the findings schema:
+- Drop findings missing any required field defined in the schema
+- Drop findings with invalid enum values
+- In the Coverage section, note the name of any agent that returned malformed output
+
+### 3.2 Confidence Gate
+
+Suppress findings below 0.50 confidence. Store them as residual concerns for potential promotion in step 3.4.
+
+### 3.3 Deduplicate
+
+Fingerprint each finding using `normalize(section) + normalize(title)`. Normalization: lowercase, strip punctuation, collapse whitespace.
+
+When fingerprints match across personas:
+- If the findings recommend **opposing actions** (e.g., one says cut, the other says keep), do not merge -- preserve both for contradiction resolution in 3.5
+- Otherwise merge: keep the highest severity, keep the highest confidence, union all evidence arrays, note all agreeing reviewers (e.g., "coherence, feasibility")
+- **Coverage attribution:** Attribute the merged finding to the persona with the highest confidence. Decrement the losing persona's Findings count *and* the corresponding route bucket (Auto or Present) so `Findings = Auto + Present` stays exact.
+
+### 3.4 Promote Residual Concerns
+
+Scan the residual concerns (findings suppressed in 3.2) for:
+- **Cross-persona corroboration**: A residual concern from Persona A overlaps with an above-threshold finding from Persona B. Promote at P2 with confidence 0.55-0.65. Inherit `finding_type` from the corroborating above-threshold finding.
+- **Concrete blocking risks**: A residual concern describes a specific, concrete risk that would block implementation. Promote at P2 with confidence 0.55. Set `finding_type: omission` (blocking risks surfaced as residual concerns are inherently about something the document failed to address).
+
+### 3.5 Resolve Contradictions
+
+When personas disagree on the same section:
+- Create a **combined finding** presenting both perspectives
+- Set `autofix_class: present`
+- Set `finding_type: error` (contradictions are by definition about conflicting things the document says, not things it omits)
+- Frame as a tradeoff, not a verdict
+
+Specific conflict patterns:
+- Coherence says "keep for consistency" + scope-guardian says "cut for simplicity" -> combined finding, let user decide
+- Feasibility says "this is impossible" + product-lens says "this is essential" -> P1 finding framed as a tradeoff
+- Multiple personas flag the same issue -> merge into single finding, note consensus, increase confidence
+
+### 3.6 Promote Pattern-Resolved Findings
+
+Scan `present` findings for codebase-pattern-resolved auto-eligibility. Promote `present` -> `auto` when **all three** conditions are met:
+
+1. The finding's `why_it_matters` cites a specific existing codebase pattern -- not just "best practice" or "convention," but a concrete pattern with a file, function, or usage reference
+2. The finding includes a concrete `suggested_fix` that follows that cited pattern
+3. There is no genuine tradeoff -- the codebase context resolves any ambiguity about which approach to use
+
+The principle: when a reviewer mentions multiple theoretical approaches but the codebase already has an established pattern that makes one approach clearly correct, the codebase context settles the question. Alternatives mentioned in passing do not create a real tradeoff if the evidence shows the codebase has already chosen.
+
+Additional auto-promotion patterns (promote `present` -> `auto` when):
+- The finding identifies factually incorrect behavior in the document and the suggested fix describes the correct behavior (not a design choice between alternatives)
+- The finding identifies a missing industry-standard security control where the document's own context makes the omission clearly wrong (not a legitimate design choice for the system described), and the suggested fix follows established practice
+- The finding identifies an incomplete technical description and the complete version is directly derivable from the codebase (the reviewer cited specific code showing what the description should say)
+
+Do not promote if the finding involves scope or priority changes where the document author may have weighed tradeoffs invisible to the reviewer.
+
+### 3.7 Route by Autofix Class
+
+**Severity and autofix_class are independent.** A P1 finding can be `auto` if the correct fix is obvious. The test is not "how important?" but "is there one clear correct fix, or does this require judgment?"
+
+| Autofix Class | Route |
+|---------------|-------|
+| `auto` | Apply automatically -- one clear correct fix. Includes internal reconciliation (one part authoritative over another), additions mechanically implied by the document's own content, and codebase-pattern-resolved fixes where codebase evidence makes one approach clearly correct. |
+| `present` | Present individually for user judgment |
+
+Demote any `auto` finding that lacks a `suggested_fix` to `present`.
+
+**Auto-eligible patterns:** summary/detail mismatch (body is authoritative over overview), wrong counts, missing list entries derivable from elsewhere in the document, stale internal cross-references, terminology drift, prose/diagram contradictions where prose is more detailed, missing steps mechanically implied by other content, unstated thresholds implied by surrounding context, completeness gaps where the correct addition is obvious, codebase-pattern-resolved fixes where the reviewer cites a specific existing pattern and the suggested_fix follows it, factually incorrect behavior where the correct behavior is obvious from context or the codebase, missing standard security controls with known implementations, incomplete technical descriptions where the complete version is derivable from the codebase. If the fix requires judgment about *what* to do (not just *what to write*) and the codebase context does not resolve the ambiguity, it belongs in `present`.
+
+### 3.8 Sort
+
+Sort findings for presentation: P0 -> P1 -> P2 -> P3, then by finding type (errors before omissions), then by confidence (descending), then by document order (section position).
+
+## Phase 4: Apply and Present
+
+### Apply Auto-fixes
+
+Apply all `auto` findings to the document in a **single pass**:
+- Edit the document inline using the platform's edit tool
+- Track what was changed for the "Auto-fixes Applied" section
+- Do not ask for approval -- these have one clear correct fix
+
+List every auto-fix in the output summary so the user can see what changed. Use enough detail to convey the substance of each fix (section, what was changed, reviewer attribution). This is especially important for fixes that add content or touch document meaning -- the user should not have to diff the document to understand what the review did.
+
+### Present Remaining Findings
+
+**Headless mode:** Do not use interactive question tools. Output all non-auto findings as a structured text summary the caller can parse and act on:
+
+```
+Document review complete (headless mode).
+
+Applied N auto-fixes:
+- <section>: <what was changed> (<reviewer>)
+- <section>: <what was changed> (<reviewer>)
+
+Findings (requires judgment):
+
+[P0] Section: <section> — <title> (<reviewer>, confidence <N>)
+  Why: <why_it_matters>
+  Suggested fix: <suggested_fix or "none">
+
+[P1] Section: <section> — <title> (<reviewer>, confidence <N>)
+  Why: <why_it_matters>
+  Suggested fix: <suggested_fix or "none">
+
+Residual concerns:
+- <concern> (<source>)
+
+Deferred questions:
+- <question> (<source>)
+```
+
+Omit any section with zero items. Then proceed directly to Phase 5 (which returns immediately in headless mode).
+
+**Interactive mode:**
+
+Present `present` findings using the review output template (read `references/review-output-template.md`). Within each severity level, separate findings by type:
+- **Errors** (design tensions, contradictions, incorrect statements) first -- these need resolution
+- **Omissions** (missing steps, absent details, forgotten entries) second -- these need additions
+
+Brief summary at the top: "Applied N auto-fixes. K findings to consider (X errors, Y omissions)."
+
+Include the Coverage table, auto-fixes applied, residual concerns, and deferred questions.
+
+### Protected Artifacts
+
+During synthesis, discard any finding that recommends deleting or removing files in:
+- `docs/brainstorms/`
+- `docs/plans/`
+- `docs/solutions/`
+
+These are pipeline artifacts and must not be flagged for removal.
+
+## Phase 5: Next Action
+
+**Headless mode:** Return "Review complete" immediately. Do not ask questions. The caller receives the text summary from Phase 4 and handles any remaining findings.
+
+**Interactive mode:**
+
+**Ask using the platform's interactive question tool** -- do not print the question as plain text output:
+- Claude Code: `AskUserQuestion`
+- Codex: `request_user_input`
+- Gemini: `ask_user`
+- Fallback (no question tool available): present numbered options and stop; wait for the user's next message
+
+Offer these two options. Use the document type from Phase 1 to set the "Review complete" description:
+
+1. **Refine again** -- Address the findings above, then re-review
+2. **Review complete** -- description based on document type:
+   - requirements document: "Create technical plan with ce-plan"
+   - plan document: "Implement with ce-work"
+
+After 2 refinement passes, recommend completion -- diminishing returns are likely. But if the user wants to continue, allow it.
+
+Return "Review complete" as the terminal signal for callers.
+
+## What NOT to Do
+
+- Do not rewrite the entire document
+- Do not add new sections or requirements the user didn't discuss
+- Do not over-engineer or add complexity
+- Do not create separate review files or add metadata sections
+- Do not modify caller skills (ce-brainstorm, ce-plan, or external plugin skills that invoke ce-doc-review)
+
+## Iteration Guidance
+
+On subsequent passes, re-dispatch personas and re-synthesize. Applied auto-fixes and the confidence gate should keep already-fixed findings from recurring. If findings still repeat across passes, recommend completion.
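Reviewer note: steps 3.2 (confidence gate), 3.3 (deduplicate), and 3.8 (sort) are mechanical enough to sketch in code. The block below is a minimal sketch under stated assumptions, not the orchestrator's implementation. All function names are hypothetical, the fingerprint is kept as a `(section, title)` tuple instead of a concatenated string, only ASCII punctuation is stripped, and the opposing-actions check from 3.3 is left out because it requires judgment. Fields not named in the merge rule (for example `why_it_matters`) are kept from the first finding seen.

```python
import re
import string

SEVERITY_RANK = {"P0": 0, "P1": 1, "P2": 2, "P3": 3}
TYPE_RANK = {"error": 0, "omission": 1}
_PUNCTUATION = str.maketrans("", "", string.punctuation)


def normalize(text):
    """Lowercase, strip ASCII punctuation, collapse whitespace."""
    return re.sub(r"\s+", " ", text.lower().translate(_PUNCTUATION)).strip()


def fingerprint(finding):
    return (normalize(finding["section"]), normalize(finding["title"]))


def confidence_gate(findings, threshold=0.50):
    """Split findings into (kept, residual) around the 3.2 threshold."""
    kept = [f for f in findings if f["confidence"] >= threshold]
    residual = [f for f in findings if f["confidence"] < threshold]
    return kept, residual


def merge_duplicates(findings):
    """Merge findings that share a fingerprint, following the 3.3 merge rule."""
    merged = {}
    for f in findings:
        key = fingerprint(f)
        if key not in merged:
            merged[key] = {**f, "evidence": list(f["evidence"]), "reviewers": [f["reviewer"]]}
            continue
        m = merged[key]
        if f["confidence"] > m["confidence"]:
            m["confidence"] = f["confidence"]
            m["reviewer"] = f["reviewer"]  # coverage attribution goes to the highest confidence
        m["severity"] = min(m["severity"], f["severity"], key=SEVERITY_RANK.__getitem__)
        m["evidence"] += [e for e in f["evidence"] if e not in m["evidence"]]
        if f["reviewer"] not in m["reviewers"]:
            m["reviewers"].append(f["reviewer"])
    return list(merged.values())


def sort_for_presentation(findings, section_order):
    """Sort by severity, then errors first, then confidence descending, then document order."""
    return sorted(findings, key=lambda f: (
        SEVERITY_RANK[f["severity"]],
        TYPE_RANK[f["finding_type"]],
        -f["confidence"],
        section_order.get(f["section"], len(section_order)),
    ))


a = {"reviewer": "coherence", "section": "API Design", "title": "No rate limiting!",
     "severity": "P2", "finding_type": "omission", "confidence": 0.70, "evidence": ["q1"]}
b = {"reviewer": "security-lens", "section": "api  design", "title": "no rate limiting",
     "severity": "P1", "finding_type": "omission", "confidence": 0.75, "evidence": ["q2", "q1"]}
c = {"reviewer": "feasibility", "section": "Overview", "title": "Unclear rollback",
     "severity": "P0", "finding_type": "error", "confidence": 0.40, "evidence": ["q3"]}

kept, residual = confidence_gate([a, b, c])
merged = merge_duplicates(kept)
print(len(merged), merged[0]["severity"], merged[0]["confidence"], merged[0]["reviewers"])
```

In this example `a` and `b` collapse into one P1 finding attributed to security-lens, while `c` falls below the gate and is held as a residual concern.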