fix(document-review): reduce token cost and latency (#509)
Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -131,172 +131,9 @@ Pass each agent the **full document** -- do not split into sections.
**Dispatch limit:** Even at maximum (7 agents), use parallel dispatch. These are document reviewers with bounded scope reading a single document -- parallel is safe and fast.

## Phase 3: Synthesize Findings
## Phases 3-5: Synthesis, Presentation, and Next Action

Process findings from all agents through this pipeline. **Order matters** -- each step depends on the previous.

### 3.1 Validate

Check each agent's returned JSON against the findings schema included below:
- Drop findings missing any required field defined in the schema
- Drop findings with invalid enum values
- Note the agent name for any malformed output in the Coverage section

### 3.2 Confidence Gate

Suppress findings below 0.50 confidence. Store them as residual concerns for potential promotion in step 3.4.

### 3.3 Deduplicate

Fingerprint each finding using `normalize(section) + normalize(title)`. Normalization: lowercase, strip punctuation, collapse whitespace.

When fingerprints match across personas:
- If the findings recommend **opposing actions** (e.g., one says cut, the other says keep), do not merge -- preserve both for contradiction resolution in 3.5
- Otherwise merge: keep the highest severity, keep the highest confidence, union all evidence arrays, note all agreeing reviewers (e.g., "coherence, feasibility")
- **Coverage attribution:** Attribute the merged finding to the persona with the highest confidence. Decrement the losing persona's Findings count *and* the corresponding route bucket (Auto or Present) so `Findings = Auto + Present` stays exact.

### 3.4 Promote Residual Concerns

Scan the residual concerns (findings suppressed in 3.2) for:
- **Cross-persona corroboration**: A residual concern from Persona A overlaps with an above-threshold finding from Persona B. Promote at P2 with confidence 0.55-0.65. Inherit `finding_type` from the corroborating above-threshold finding.
- **Concrete blocking risks**: A residual concern describes a specific, concrete risk that would block implementation. Promote at P2 with confidence 0.55. Set `finding_type: omission` (blocking risks surfaced as residual concerns are inherently about something the document failed to address).

### 3.5 Resolve Contradictions

When personas disagree on the same section:
- Create a **combined finding** presenting both perspectives
- Set `autofix_class: present`
- Set `finding_type: error` (contradictions are by definition about conflicting things the document says, not things it omits)
- Frame as a tradeoff, not a verdict

Specific conflict patterns:
- Coherence says "keep for consistency" + scope-guardian says "cut for simplicity" -> combined finding, let user decide
- Feasibility says "this is impossible" + product-lens says "this is essential" -> P1 finding framed as a tradeoff
- Multiple personas flag the same issue -> merge into single finding, note consensus, increase confidence

### 3.6 Promote Pattern-Resolved Findings

Scan `present` findings for codebase-pattern-resolved auto-eligibility. Promote `present` -> `auto` when **all three** conditions are met:

1. The finding's `why_it_matters` cites a specific existing codebase pattern -- not just "best practice" or "convention," but a concrete pattern with a file, function, or usage reference
2. The finding includes a concrete `suggested_fix` that follows that cited pattern
3. There is no genuine tradeoff -- the codebase context resolves any ambiguity about which approach to use

The principle: when a reviewer mentions multiple theoretical approaches but the codebase already has an established pattern that makes one approach clearly correct, the codebase context settles the question. Alternatives mentioned in passing do not create a real tradeoff if the evidence shows the codebase has already chosen.

Do not promote if the finding involves scope or priority changes where the document author may have weighed tradeoffs invisible to the reviewer.

### 3.7 Route by Autofix Class

**Severity and autofix_class are independent.** A P1 finding can be `auto` if the correct fix is obvious. The test is not "how important?" but "is there one clear correct fix, or does this require judgment?"

| Autofix Class | Route |
|---------------|-------|
| `auto` | Apply automatically -- one clear correct fix. Includes internal reconciliation (one part authoritative over another), additions mechanically implied by the document's own content, and codebase-pattern-resolved fixes where codebase evidence makes one approach clearly correct. |
| `present` | Present individually for user judgment |

Demote any `auto` finding that lacks a `suggested_fix` to `present`.

**Auto-eligible patterns:** summary/detail mismatch (body is authoritative over overview), wrong counts, missing list entries derivable from elsewhere in the document, stale internal cross-references, terminology drift, prose/diagram contradictions where prose is more detailed, missing steps mechanically implied by other content, unstated thresholds implied by surrounding context, completeness gaps where the correct addition is obvious, codebase-pattern-resolved fixes where the reviewer cites a specific existing pattern and the suggested_fix follows it. If the fix requires judgment about *what* to do (not just *what to write*) and the codebase context does not resolve the ambiguity, it belongs in `present`.

### 3.8 Sort

Sort findings for presentation: P0 -> P1 -> P2 -> P3, then by finding type (errors before omissions), then by confidence (descending), then by document order (section position).

## Phase 4: Apply and Present

### Apply Auto-fixes

Apply all `auto` findings to the document in a **single pass**:
- Edit the document inline using the platform's edit tool
- Track what was changed for the "Auto-fixes Applied" section
- Do not ask for approval -- these have one clear correct fix

List every auto-fix in the output summary so the user can see what changed. Use enough detail to convey the substance of each fix (section, what was changed, reviewer attribution). This is especially important for fixes that add content or touch document meaning -- the user should not have to diff the document to understand what the review did.

### Present Remaining Findings

**Headless mode:** Do not use interactive question tools. Output all non-auto findings as a structured text summary the caller can parse and act on:

```
Document review complete (headless mode).

Applied N auto-fixes:
- <section>: <what was changed> (<reviewer>)
- <section>: <what was changed> (<reviewer>)

Findings (requires judgment):

[P0] Section: <section> — <title> (<reviewer>, confidence <N>)
Why: <why_it_matters>
Suggested fix: <suggested_fix or "none">

[P1] Section: <section> — <title> (<reviewer>, confidence <N>)
Why: <why_it_matters>
Suggested fix: <suggested_fix or "none">

Residual concerns:
- <concern> (<source>)

Deferred questions:
- <question> (<source>)
```

Omit any section with zero items. Then proceed directly to Phase 5 (which returns immediately in headless mode).

**Interactive mode:**

Present `present` findings using the review output template included below. Within each severity level, separate findings by type:
- **Errors** (design tensions, contradictions, incorrect statements) first -- these need resolution
- **Omissions** (missing steps, absent details, forgotten entries) second -- these need additions

Brief summary at the top: "Applied N auto-fixes. K findings to consider (X errors, Y omissions)."

Include the Coverage table, auto-fixes applied, residual concerns, and deferred questions.

### Protected Artifacts

During synthesis, discard any finding that recommends deleting or removing files in:
- `docs/brainstorms/`
- `docs/plans/`
- `docs/solutions/`

These are pipeline artifacts and must not be flagged for removal.

## Phase 5: Next Action

**Headless mode:** Return "Review complete" immediately. Do not ask questions. The caller receives the text summary from Phase 4 and handles any remaining findings.

**Interactive mode:**

**Ask using the platform's interactive question tool** -- do not print the question as plain text output:
- Claude Code: `AskUserQuestion`
- Codex: `request_user_input`
- Gemini: `ask_user`
- Fallback (no question tool available): present numbered options and stop; wait for the user's next message

Offer these two options. Use the document type from Phase 1 to set the "Review complete" description:

1. **Refine again** -- Address the findings above, then re-review
2. **Review complete** -- description based on document type:
   - requirements document: "Create technical plan with ce:plan"
   - plan document: "Implement with ce:work"

After 2 refinement passes, recommend completion -- diminishing returns are likely. But if the user wants to continue, allow it.

Return "Review complete" as the terminal signal for callers.

## What NOT to Do

- Do not rewrite the entire document
- Do not add new sections or requirements the user didn't discuss
- Do not over-engineer or add complexity
- Do not create separate review files or add metadata sections
- Do not modify caller skills (ce-brainstorm, ce-plan, or external plugin skills that invoke document-review)

## Iteration Guidance

On subsequent passes, re-dispatch personas and re-synthesize. The auto-fix mechanism and confidence gating prevent the same findings from recurring once fixed. If findings are repetitive across passes, recommend completion.
After all dispatched agents return, read `references/synthesis-and-presentation.md` for the synthesis pipeline (validate, gate, dedup, promote, resolve contradictions, route by autofix class), auto-fix application, finding presentation, and next-action menu. Do not load this file before agent dispatch completes.

---

@@ -309,7 +146,3 @@ On subsequent passes, re-dispatch personas and re-synthesize. The auto-fix mecha
### Findings Schema

@./references/findings-schema.json

### Review Output Template

@./references/review-output-template.md

@@ -82,28 +82,5 @@
"description": "Questions that should be resolved in a later workflow stage (planning, implementation)",
"items": { "type": "string" }
}
},

"_meta": {
  "confidence_thresholds": {
    "suppress": "Below 0.50 -- do not report. Finding is speculative noise.",
    "flag": "0.50-0.69 -- include only when the persona's calibration says the issue is actionable at that confidence.",
    "report": "0.70+ -- report with full confidence."
  },
  "severity_definitions": {
    "P0": "Contradictions or gaps that would cause building the wrong thing. Must fix before proceeding.",
    "P1": "Significant gap likely hit during planning or implementation. Should fix.",
    "P2": "Moderate issue with meaningful downside. Fix if straightforward.",
    "P3": "Minor improvement. User's discretion."
  },
  "autofix_classes": {
    "_principle": "Autofix class is independent of severity. A P1 finding can be auto if the fix is obvious. The test: is there one clear correct fix, or does resolving this require judgment?",
    "auto": "One clear correct fix -- applied silently. Three categories: internal reconciliation (summary/detail mismatches, wrong counts, stale cross-references, terminology drift), additions mechanically implied by other content (missing steps, unstated thresholds, completeness gaps where the correct content is obvious), and codebase-pattern-resolved fixes (the reviewer cites a specific existing codebase pattern that makes one approach clearly correct, regardless of theoretical alternatives). Must include suggested_fix.",
    "present": "Requires individual user judgment -- strategic questions, design tradeoffs, or findings where reasonable people could disagree on the right action."
  },
  "finding_types": {
    "error": "Something the document says that is wrong -- contradictions, incorrect statements, design tensions, incoherent tradeoffs. These are mistakes in what exists.",
    "omission": "Something the document forgot to say -- missing mechanical steps, absent list entries, undefined thresholds, forgotten cross-references. These are gaps in completeness."
  }
}
}

@@ -25,15 +25,10 @@ Rules:
- Set `finding_type` for every finding:
  - `error`: Something the document says that is wrong -- contradictions, incorrect statements, design tensions, incoherent tradeoffs.
  - `omission`: Something the document forgot to say -- missing mechanical steps, absent list entries, undefined thresholds, forgotten cross-references.
- Set `autofix_class` based on whether there is one clear correct fix, not on severity. A P1 finding can be `auto` if the fix is obvious:
  - `auto`: One clear correct fix. Applied silently without asking. The test: is there only one reasonable way to resolve this? If yes, it is auto. Three categories:
    - Internal reconciliation: one part of the document is authoritative over another -- reconcile toward the authority. Examples: summary/detail mismatches, wrong counts, missing list entries derivable from elsewhere, stale cross-references, terminology drift, prose/diagram contradictions where prose is authoritative.
    - Implied additions: the correct content is mechanically obvious from the document's own context. Examples: adding a missing implementation step implied by other content, defining a threshold implied but never stated, completeness gaps where what to add is clear.
    - Codebase-pattern-resolved: the reviewer investigated the codebase and found an established pattern that resolves any ambiguity about the correct fix. The suggested_fix follows that pattern. The test: does the codebase context make one approach clearly correct, regardless of how many alternatives exist in theory? If yes, it is auto. Examples: adding a nil guard using the same early-return pattern found elsewhere in the codebase, applying the naming convention the codebase already follows, promoting a step from conditional to required when code-path analysis proves it is always needed.
    Always include `suggested_fix` for auto findings. For codebase-pattern-resolved findings, `why_it_matters` must name the specific codebase pattern (file, function, or usage) that makes the fix unambiguous -- `evidence` still quotes the document passage showing the issue.
    NOT auto (the gap is clear but more than one reasonable fix exists): choosing an implementation approach when the document states a need without constraining how (e.g., "support offline mode" could mean service workers, local-first database, or queue-and-sync -- there is no single obvious answer), changing scope or priority where the author may have weighed tradeoffs the reviewer can't see (e.g., promoting a P2 to P1, or cutting a feature the document intentionally keeps at a lower tier). Note: mentioning alternatives in passing does NOT disqualify a finding from auto -- the test is whether codebase evidence or document context makes one approach clearly superior, not whether other approaches were discussed.
  - `present`: Requires judgment -- strategic questions, tradeoffs, design tensions where reasonable people could disagree, findings where the right action is unclear.
- `suggested_fix` is required for `auto` findings. For `present` findings, `suggested_fix` is optional -- include it only when the fix is obvious, and frame as a question when the right action is unclear.
- Set `autofix_class` based on whether there is one clear correct fix, not on severity:
  - `auto`: One clear correct fix, applied silently. Three categories: (1) internal reconciliation -- one document part authoritative over another (summary/detail mismatches, wrong counts, stale cross-references, terminology drift); (2) implied additions -- correct content mechanically obvious from the document (missing steps, unstated thresholds, completeness gaps); (3) codebase-pattern-resolved -- an established codebase pattern resolves ambiguity (cite the specific file/function in `why_it_matters`). Always include `suggested_fix`. NOT auto if more than one reasonable fix exists or if scope/priority judgment is involved.
  - `present`: Requires user judgment -- strategic questions, tradeoffs, design tensions.
- `suggested_fix` is required for `auto` findings. For `present` findings, include only when the fix is obvious.
- If you find no issues, return an empty findings array. Still populate residual_risks and deferred_questions if applicable.
- Use your suppress conditions. Do not flag issues that belong to other personas.
</output-contract>

@@ -46,13 +41,3 @@ Document content:
{document_content}
</review-context>
```

## Variable Reference

| Variable | Source | Description |
|----------|--------|-------------|
| `{persona_file}` | Agent markdown file content | The full persona definition (identity, analysis protocol, calibration, suppress conditions) |
| `{schema}` | `references/findings-schema.json` content | The JSON schema reviewers must conform to |
| `{document_type}` | Orchestrator classification | Either "requirements" or "plan" |
| `{document_path}` | Skill input | Path to the document being reviewed |
| `{document_content}` | File read | The full document text |

@@ -0,0 +1,168 @@
# Phases 3-5: Synthesis, Presentation, and Next Action

## Phase 3: Synthesize Findings

Process findings from all agents through this pipeline. **Order matters** -- each step depends on the previous.

### 3.1 Validate

Check each agent's returned JSON against the findings schema:
- Drop findings missing any required field defined in the schema
- Drop findings with invalid enum values
- Note the agent name for any malformed output in the Coverage section

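A minimal sketch of the validate step, assuming illustrative required fields and enum values; the authoritative lists live in `references/findings-schema.json`:

```python
# Sketch of step 3.1. The field and enum names below are assumptions;
# the real schema is references/findings-schema.json.
REQUIRED = {"section", "title", "severity", "confidence", "finding_type", "autofix_class"}
ENUMS = {
    "severity": {"P0", "P1", "P2", "P3"},
    "finding_type": {"error", "omission"},
    "autofix_class": {"auto", "present"},
}

def validate(agent_name, findings, malformed_agents):
    """Drop findings missing required fields or carrying invalid enum values,
    noting the agent name for the Coverage section."""
    valid = []
    for f in findings:
        if not REQUIRED <= f.keys():
            malformed_agents.add(agent_name)
            continue
        if any(f[field] not in allowed for field, allowed in ENUMS.items()):
            malformed_agents.add(agent_name)
            continue
        valid.append(f)
    return valid
```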
### 3.2 Confidence Gate

Suppress findings below 0.50 confidence. Store them as residual concerns for potential promotion in step 3.4.

### 3.3 Deduplicate

Fingerprint each finding using `normalize(section) + normalize(title)`. Normalization: lowercase, strip punctuation, collapse whitespace.

When fingerprints match across personas:
- If the findings recommend **opposing actions** (e.g., one says cut, the other says keep), do not merge -- preserve both for contradiction resolution in 3.5
- Otherwise merge: keep the highest severity, keep the highest confidence, union all evidence arrays, note all agreeing reviewers (e.g., "coherence, feasibility")
- **Coverage attribution:** Attribute the merged finding to the persona with the highest confidence. Decrement the losing persona's Findings count *and* the corresponding route bucket (Auto or Present) so `Findings = Auto + Present` stays exact.

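The fingerprint and merge rules above can be sketched as follows; the `evidence`, `reviewer`, and `reviewers` field names are assumptions about the finding shape:

```python
import re

def normalize(text):
    """Lowercase, strip punctuation, collapse whitespace (step 3.3)."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def fingerprint(finding):
    return normalize(finding["section"]) + " " + normalize(finding["title"])

SEVERITY_RANK = {"P0": 0, "P1": 1, "P2": 2, "P3": 3}

def merge(a, b):
    """Merge two same-fingerprint findings: highest severity, highest
    confidence, union of evidence, all agreeing reviewers noted."""
    merged = dict(a if a["confidence"] >= b["confidence"] else b)
    merged["severity"] = min(a["severity"], b["severity"], key=SEVERITY_RANK.get)
    merged["confidence"] = max(a["confidence"], b["confidence"])
    merged["evidence"] = sorted(set(a["evidence"]) | set(b["evidence"]))
    merged["reviewers"] = sorted({a["reviewer"], b["reviewer"]})
    return merged
```

Attributing the merged finding to the higher-confidence persona falls out of the `dict(...)` base copy; the loser's Coverage counts are decremented separately.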
### 3.4 Promote Residual Concerns

Scan the residual concerns (findings suppressed in 3.2) for:
- **Cross-persona corroboration**: A residual concern from Persona A overlaps with an above-threshold finding from Persona B. Promote at P2 with confidence 0.55-0.65. Inherit `finding_type` from the corroborating above-threshold finding.
- **Concrete blocking risks**: A residual concern describes a specific, concrete risk that would block implementation. Promote at P2 with confidence 0.55. Set `finding_type: omission` (blocking risks surfaced as residual concerns are inherently about something the document failed to address).

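One possible sketch of residual promotion, treating "overlaps" as fingerprint equality and using a hypothetical `blocking` flag to mark concrete blocking risks:

```python
def promote_residuals(residuals, findings, fingerprint):
    """Step 3.4 sketch: promote suppressed concerns that are either
    corroborated by another persona or describe a concrete blocking risk."""
    promoted = []
    above = {fingerprint(f): f for f in findings}
    for r in residuals:
        match = above.get(fingerprint(r))
        if match is not None and match["reviewer"] != r["reviewer"]:
            # Cross-persona corroboration: P2, confidence clamped to 0.55-0.65,
            # finding_type inherited from the corroborating finding.
            promoted.append(dict(r, severity="P2",
                                 confidence=min(max(r["confidence"], 0.55), 0.65),
                                 finding_type=match["finding_type"]))
        elif r.get("blocking"):
            # Concrete blocking risk: P2 at 0.55, always an omission.
            promoted.append(dict(r, severity="P2", confidence=0.55,
                                 finding_type="omission"))
    return promoted
```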
### 3.5 Resolve Contradictions

When personas disagree on the same section:
- Create a **combined finding** presenting both perspectives
- Set `autofix_class: present`
- Set `finding_type: error` (contradictions are by definition about conflicting things the document says, not things it omits)
- Frame as a tradeoff, not a verdict

Specific conflict patterns:
- Coherence says "keep for consistency" + scope-guardian says "cut for simplicity" -> combined finding, let user decide
- Feasibility says "this is impossible" + product-lens says "this is essential" -> P1 finding framed as a tradeoff
- Multiple personas flag the same issue -> merge into single finding, note consensus, increase confidence

### 3.6 Promote Pattern-Resolved Findings

Scan `present` findings for codebase-pattern-resolved auto-eligibility. Promote `present` -> `auto` when **all three** conditions are met:

1. The finding's `why_it_matters` cites a specific existing codebase pattern -- not just "best practice" or "convention," but a concrete pattern with a file, function, or usage reference
2. The finding includes a concrete `suggested_fix` that follows that cited pattern
3. There is no genuine tradeoff -- the codebase context resolves any ambiguity about which approach to use

The principle: when a reviewer mentions multiple theoretical approaches but the codebase already has an established pattern that makes one approach clearly correct, the codebase context settles the question. Alternatives mentioned in passing do not create a real tradeoff if the evidence shows the codebase has already chosen.

Do not promote if the finding involves scope or priority changes where the document author may have weighed tradeoffs invisible to the reviewer.

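A rough sketch of the three-condition check; the regex stand-in for "cites a file or function" and the `genuine_tradeoff` flag are illustrative assumptions, not the actual judgment the orchestrator applies:

```python
import re

def pattern_resolved(finding):
    """Step 3.6 sketch: all three conditions must hold for promotion.
    The heuristics here approximate what is really a judgment call."""
    # 1. why_it_matters references a concrete file or function, e.g. "utils/fetch.py"
    cites_pattern = bool(re.search(r"\w+\.\w+|\w+\(\)",
                                   finding.get("why_it_matters", "")))
    # 2. a concrete suggested_fix exists
    has_fix = bool(finding.get("suggested_fix"))
    # 3. no genuine tradeoff remains (hypothetical flag)
    no_tradeoff = not finding.get("genuine_tradeoff", False)
    return cites_pattern and has_fix and no_tradeoff

def promote_pattern_resolved(findings):
    for f in findings:
        if f["autofix_class"] == "present" and pattern_resolved(f):
            f["autofix_class"] = "auto"
    return findings
```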
### 3.7 Route by Autofix Class

**Severity and autofix_class are independent.** A P1 finding can be `auto` if the correct fix is obvious. The test is not "how important?" but "is there one clear correct fix, or does this require judgment?"

| Autofix Class | Route |
|---------------|-------|
| `auto` | Apply automatically -- one clear correct fix. Includes internal reconciliation (one part authoritative over another), additions mechanically implied by the document's own content, and codebase-pattern-resolved fixes where codebase evidence makes one approach clearly correct. |
| `present` | Present individually for user judgment |

Demote any `auto` finding that lacks a `suggested_fix` to `present`.

**Auto-eligible patterns:** summary/detail mismatch (body is authoritative over overview), wrong counts, missing list entries derivable from elsewhere in the document, stale internal cross-references, terminology drift, prose/diagram contradictions where prose is more detailed, missing steps mechanically implied by other content, unstated thresholds implied by surrounding context, completeness gaps where the correct addition is obvious, codebase-pattern-resolved fixes where the reviewer cites a specific existing pattern and the suggested_fix follows it. If the fix requires judgment about *what* to do (not just *what to write*) and the codebase context does not resolve the ambiguity, it belongs in `present`.

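The routing and demotion rules reduce to a small partition; a sketch:

```python
def route(findings):
    """Step 3.7 sketch: split by autofix_class, demoting any auto finding
    that lacks a suggested_fix to present."""
    auto, present = [], []
    for f in findings:
        if f["autofix_class"] == "auto" and f.get("suggested_fix"):
            auto.append(f)
        else:
            if f["autofix_class"] == "auto":
                f = dict(f, autofix_class="present")  # demoted: no suggested_fix
            present.append(f)
    return auto, present
```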
### 3.8 Sort

Sort findings for presentation: P0 -> P1 -> P2 -> P3, then by finding type (errors before omissions), then by confidence (descending), then by document order (section position).

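The sort order above maps directly onto a composite sort key; `section_position` is an assumed field holding each finding's position in document order:

```python
SEVERITY_RANK = {"P0": 0, "P1": 1, "P2": 2, "P3": 3}
TYPE_RANK = {"error": 0, "omission": 1}

def sort_findings(findings):
    """Step 3.8 sketch: severity, then finding type, then confidence
    (descending), then document order."""
    return sorted(findings, key=lambda f: (
        SEVERITY_RANK[f["severity"]],
        TYPE_RANK[f["finding_type"]],
        -f["confidence"],
        f["section_position"],
    ))
```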
## Phase 4: Apply and Present

### Apply Auto-fixes

Apply all `auto` findings to the document in a **single pass**:
- Edit the document inline using the platform's edit tool
- Track what was changed for the "Auto-fixes Applied" section
- Do not ask for approval -- these have one clear correct fix

List every auto-fix in the output summary so the user can see what changed. Use enough detail to convey the substance of each fix (section, what was changed, reviewer attribution). This is especially important for fixes that add content or touch document meaning -- the user should not have to diff the document to understand what the review did.

### Present Remaining Findings

**Headless mode:** Do not use interactive question tools. Output all non-auto findings as a structured text summary the caller can parse and act on:

```
Document review complete (headless mode).

Applied N auto-fixes:
- <section>: <what was changed> (<reviewer>)
- <section>: <what was changed> (<reviewer>)

Findings (requires judgment):

[P0] Section: <section> — <title> (<reviewer>, confidence <N>)
Why: <why_it_matters>
Suggested fix: <suggested_fix or "none">

[P1] Section: <section> — <title> (<reviewer>, confidence <N>)
Why: <why_it_matters>
Suggested fix: <suggested_fix or "none">

Residual concerns:
- <concern> (<source>)

Deferred questions:
- <question> (<source>)
```

Omit any section with zero items. Then proceed directly to Phase 5 (which returns immediately in headless mode).

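A sketch of rendering the headless summary with empty sections omitted; the `change` field name for auto-fixes is an assumption:

```python
def headless_summary(autofixes, findings, residuals, deferred):
    """Render the Phase 4 headless text summary, omitting empty sections."""
    lines = ["Document review complete (headless mode).", ""]
    if autofixes:
        lines.append(f"Applied {len(autofixes)} auto-fixes:")
        lines += [f"- {a['section']}: {a['change']} ({a['reviewer']})" for a in autofixes]
        lines.append("")
    if findings:
        lines.append("Findings (requires judgment):")
        for f in findings:
            lines.append(f"[{f['severity']}] Section: {f['section']} — {f['title']} "
                         f"({f['reviewer']}, confidence {f['confidence']})")
            lines.append(f"Why: {f['why_it_matters']}")
            lines.append(f"Suggested fix: {f.get('suggested_fix') or 'none'}")
        lines.append("")
    if residuals:
        lines.append("Residual concerns:")
        lines += [f"- {concern} ({source})" for concern, source in residuals]
        lines.append("")
    if deferred:
        lines.append("Deferred questions:")
        lines += [f"- {question} ({source})" for question, source in deferred]
    return "\n".join(lines).rstrip() + "\n"
```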
**Interactive mode:**

Present `present` findings using the review output template (read `references/review-output-template.md`). Within each severity level, separate findings by type:
- **Errors** (design tensions, contradictions, incorrect statements) first -- these need resolution
- **Omissions** (missing steps, absent details, forgotten entries) second -- these need additions

Brief summary at the top: "Applied N auto-fixes. K findings to consider (X errors, Y omissions)."

Include the Coverage table, auto-fixes applied, residual concerns, and deferred questions.

### Protected Artifacts

During synthesis, discard any finding that recommends deleting or removing files in:
- `docs/brainstorms/`
- `docs/plans/`
- `docs/solutions/`

These are pipeline artifacts and must not be flagged for removal.

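The protected-artifacts rule can be sketched as a path filter; the `action` and `targets` fields are hypothetical, standing in for however a finding encodes a deletion recommendation:

```python
PROTECTED = ("docs/brainstorms/", "docs/plans/", "docs/solutions/")

def drop_protected(findings):
    """Discard findings that recommend deleting pipeline artifacts."""
    return [
        f for f in findings
        if not (f.get("action") == "delete"
                and any(t.startswith(PROTECTED) for t in f.get("targets", [])))
    ]
```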
## Phase 5: Next Action

**Headless mode:** Return "Review complete" immediately. Do not ask questions. The caller receives the text summary from Phase 4 and handles any remaining findings.

**Interactive mode:**

**Ask using the platform's interactive question tool** -- do not print the question as plain text output:
- Claude Code: `AskUserQuestion`
- Codex: `request_user_input`
- Gemini: `ask_user`
- Fallback (no question tool available): present numbered options and stop; wait for the user's next message

Offer these two options. Use the document type from Phase 1 to set the "Review complete" description:

1. **Refine again** -- Address the findings above, then re-review
2. **Review complete** -- description based on document type:
   - requirements document: "Create technical plan with ce:plan"
   - plan document: "Implement with ce:work"

After 2 refinement passes, recommend completion -- diminishing returns are likely. But if the user wants to continue, allow it.

Return "Review complete" as the terminal signal for callers.

## What NOT to Do

- Do not rewrite the entire document
- Do not add new sections or requirements the user didn't discuss
- Do not over-engineer or add complexity
- Do not create separate review files or add metadata sections
- Do not modify caller skills (ce-brainstorm, ce-plan, or external plugin skills that invoke document-review)

## Iteration Guidance

On subsequent passes, re-dispatch personas and re-synthesize. The auto-fix mechanism and confidence gating prevent the same findings from recurring once fixed. If findings are repetitive across passes, recommend completion.