feat(document-review): smarter autofix, batch-confirm, and error/omission classification (#401)

2026-03-26 19:40:13 -07:00
parent b30288c44e
commit 0863cfa4cb
4 changed files with 95 additions and 46 deletions
--- a/plugins/compound-engineering/skills/document-review/SKILL.md
+++ b/plugins/compound-engineering/skills/document-review/SKILL.md
@@ -118,14 +118,15 @@ When fingerprints match across personas:
 ### 3.4 Promote Residual Concerns

 Scan the residual concerns (findings suppressed in 3.2) for:
- **Cross-persona corroboration**: A residual concern from Persona A overlaps with an above-threshold finding from Persona B. Promote at P2 with confidence 0.55-0.65.
- **Concrete blocking risks**: A residual concern describes a specific, concrete risk that would block implementation. Promote at P2 with confidence 0.55.
+- **Cross-persona corroboration**: A residual concern from Persona A overlaps with an above-threshold finding from Persona B. Promote at P2 with confidence 0.55-0.65. Inherit `finding_type` from the corroborating above-threshold finding.
+- **Concrete blocking risks**: A residual concern describes a specific, concrete risk that would block implementation. Promote at P2 with confidence 0.55. Set `finding_type: omission` (blocking risks surfaced as residual concerns are inherently about something the document failed to address).

 ### 3.5 Resolve Contradictions

 When personas disagree on the same section:
 - Create a **combined finding** presenting both perspectives
 - Set `autofix_class: present`
+- Set `finding_type: error` (contradictions are by definition about conflicting things the document says, not things it omits)
 - Frame as a tradeoff, not a verdict

 Specific conflict patterns:
@@ -137,14 +138,17 @@ Specific conflict patterns:

 | Autofix Class | Route |
 |---------------|-------|
-| `auto` | Apply automatically -- local deterministic fix (terminology, formatting, cross-references) |
-| `present` | Present to user for judgment |
+| `auto` | Apply automatically -- local deterministic fix (terminology, formatting, cross-references, completeness corrections where the correct value is verifiable from the document itself) |
+| `batch_confirm` | Group for single batch approval -- obvious fixes that touch meaning but have one clear correct answer |
+| `present` | Present individually for user judgment |

-Demote any `auto` finding that lacks a `suggested_fix` to `present` -- the orchestrator cannot apply a fix without concrete replacement text.
+Demote any `auto` finding that lacks a `suggested_fix` to `batch_confirm`. Demote any `batch_confirm` finding that lacks a `suggested_fix` to `present`.
+
+**Completeness corrections eligible for `auto`:** A finding qualifies when the correct fix is deterministically derivable from other content in the document. Examples: a count says "6 units" but the document lists 7, a summary omits an item that appears in the detailed list, a cross-reference points to a renamed section. If the fix requires judgment about *what* to add (not just *that* something is missing), it belongs in `batch_confirm` or `present`.

 ### 3.7 Sort

-Sort findings for presentation: P0 -> P1 -> P2 -> P3, then by confidence (descending), then by document order (section position).
+Sort findings for presentation: P0 -> P1 -> P2 -> P3, then by finding type (errors before omissions), then by confidence (descending), then by document order (section position).

 ## Phase 4: Apply and Present

@@ -153,17 +157,28 @@ Sort findings for presentation: P0 -> P1 -> P2 -> P3, then by confidence (descen
 Apply all `auto` findings to the document in a **single pass**:
 - Edit the document inline using the platform's edit tool
 - Track what was changed for the "Auto-fixes Applied" section
- Do not ask for approval -- these are unambiguously correct (terminology fixes, formatting, cross-references)
+- Do not ask for approval -- these are unambiguously correct
+
+### Batch Confirm
+
+If any `batch_confirm` findings exist, present them as a group for a single approval:
+- List the proposed fixes in a numbered table
+- Use the platform's blocking question tool (AskUserQuestion in Claude Code, request_user_input in Codex, ask_user in Gemini) to ask: "Apply these N fixes? (yes/no/select)". If no blocking question tool is available, present the table with numbered options and wait for the user's reply before proceeding.
+- If approved, apply all in a single pass
+- If "select", let the user pick which to apply
+- If rejected, demote remaining to the `present` findings list
+
+This turns N obvious-but-meaning-touching fixes into 1 interaction instead of N.

 ### Present Remaining Findings

-Present all other findings to the user using the review output template included below:
- Group by severity (P0 -> P3)
- Include the Coverage table showing which personas ran
- Show auto-fixes that were applied
- Include residual concerns and deferred questions if any
+Present `present` findings using the review output template included below. Within each severity level, separate findings by type:
+- **Errors** (design tensions, contradictions, incorrect statements) first -- these need resolution
+- **Omissions** (missing steps, absent details, forgotten entries) second -- these need additions

-Brief summary at the top: "Applied N auto-fixes. M findings to consider (X at P0/P1)."
+Brief summary at the top: "Applied N auto-fixes. Batched M fixes for approval. K findings to consider (X errors, Y omissions)."
+
+Include the Coverage table, auto-fixes applied, residual concerns, and deferred questions.

 ### Protected Artifacts

--- a/plugins/compound-engineering/skills/document-review/references/findings-schema.json
+++ b/plugins/compound-engineering/skills/document-review/references/findings-schema.json
@@ -19,6 +19,7 @@
          "severity",
          "section",
          "why_it_matters",
+          "finding_type",
          "autofix_class",
          "confidence",
          "evidence"
@@ -44,8 +45,13 @@
          },
          "autofix_class": {
            "type": "string",
-            "enum": ["auto", "present"],
-            "description": "How this issue should be handled. auto = local deterministic fix the orchestrator can apply without asking (terminology, formatting, cross-references). present = requires user judgment."
+            "enum": ["auto", "batch_confirm", "present"],
+            "description": "How this issue should be handled. auto = local deterministic fix applied silently (terminology, formatting, cross-references, completeness corrections). batch_confirm = obvious fix with a clear correct answer, but touches meaning enough to warrant grouped approval. present = requires individual user judgment."
+          },
+          "finding_type": {
+            "type": "string",
+            "enum": ["error", "omission"],
+            "description": "Whether the finding is a mistake in what the document says (error) or something the document forgot to say (omission). Errors are design tensions, contradictions, or incorrect statements. Omissions are missing mechanical steps, forgotten list entries, or absent details."
          },
          "suggested_fix": {
            "type": ["string", "null"],
@@ -91,8 +97,13 @@
      "P3": "Minor improvement. User's discretion."
    },
    "autofix_classes": {
-      "auto": "Local, deterministic document fix: terminology consistency, formatting, cross-reference correction. Must be unambiguous and not change the document's meaning.",
-      "present": "Requires user judgment -- strategic questions, tradeoffs, meaning-changing fixes, or informational findings."
+      "auto": "Local, deterministic document fix: terminology consistency, formatting, cross-reference correction, completeness corrections (wrong counts, missing list entries, undefined values where the correct value is verifiable from elsewhere in the document). Must be unambiguous.",
+      "batch_confirm": "Obvious fix with a clear correct answer, but touches document meaning enough to warrant user awareness. Grouped with other batch_confirm findings for a single approval rather than individual review. Examples: adding a missing implementation step that is mechanically implied, updating a section summary to reflect its own contents.",
+      "present": "Requires individual user judgment -- strategic questions, design tradeoffs, meaning-changing fixes, or findings where reasonable people could disagree on the right action."
+    },
+    "finding_types": {
+      "error": "Something the document says that is wrong -- contradictions, incorrect statements, design tensions, incoherent tradeoffs. These are mistakes in what exists.",
+      "omission": "Something the document forgot to say -- missing mechanical steps, absent list entries, undefined thresholds, forgotten cross-references. These are gaps in completeness."
    }
  }
 }
--- a/plugins/compound-engineering/skills/document-review/references/review-output-template.md
+++ b/plugins/compound-engineering/skills/document-review/references/review-output-template.md
@@ -15,35 +15,52 @@ Use this **exact format** when presenting synthesized review findings. Findings
 - security-lens -- plan adds public API endpoint with auth flow
 - scope-guardian -- plan has 15 requirements across 3 priority levels

+Applied 3 auto-fixes. Batched 2 fixes for approval. 4 findings to consider (2 errors, 2 omissions).
+
 ### Auto-fixes Applied

- Standardized "pipeline"/"workflow" terminology to "pipeline" throughout (coherence, auto)
- Fixed cross-reference: Section 4 referenced "Section 3.2" which is actually "Section 3.1" (coherence, auto)
+- Standardized "pipeline"/"workflow" terminology to "pipeline" throughout (coherence)
+- Fixed cross-reference: Section 4 referenced "Section 3.2" which is actually "Section 3.1" (coherence)
+- Updated unit count from "6 units" to "7 units" to match listed units (coherence)
+
+### Batch Confirm
+
+These fixes have one clear correct answer but touch document meaning. Apply all?
+
+| # | Section | Fix | Reviewer |
+|---|---------|-----|----------|
+| 1 | Unit 4 | Add "update API rate-limit config" step -- implied by Unit 3's rate-limit introduction | feasibility |
+| 2 | Verification | Add auth token refresh to test scenarios -- required by Unit 2's token expiry handling | security-lens |

 ### P0 -- Must Fix

-| # | Section | Issue | Reviewer | Confidence | Route |
-|---|---------|-------|----------|------------|-------|
-| 1 | Requirements Trace | Goal states "offline support" but technical approach assumes persistent connectivity | coherence | 0.92 | `present` |
+#### Errors
+
+| # | Section | Issue | Reviewer | Confidence |
+|---|---------|-------|----------|------------|
+| 1 | Requirements Trace | Goal states "offline support" but technical approach assumes persistent connectivity | coherence | 0.92 |

 ### P1 -- Should Fix

-| # | Section | Issue | Reviewer | Confidence | Route |
-|---|---------|-------|----------|------------|-------|
-| 2 | Implementation Unit 3 | Plan proposes custom auth when codebase already uses Devise | feasibility | 0.85 | `present` |
-| 3 | Scope Boundaries | 8 of 12 units build admin infrastructure; only 2 touch stated goal | scope-guardian | 0.80 | `present` |
+#### Errors
+
+| # | Section | Issue | Reviewer | Confidence |
+|---|---------|-------|----------|------------|
+| 2 | Scope Boundaries | 8 of 12 units build admin infrastructure; only 2 touch stated goal | scope-guardian | 0.80 |
+
+#### Omissions
+
+| # | Section | Issue | Reviewer | Confidence |
+|---|---------|-------|----------|------------|
+| 3 | Implementation Unit 3 | Plan proposes custom auth but does not mention existing Devise setup or migration path | feasibility | 0.85 |

 ### P2 -- Consider Fixing

-| # | Section | Issue | Reviewer | Confidence | Route |
-|---|---------|-------|----------|------------|-------|
-| 4 | API Design | Public webhook endpoint has no rate limiting mentioned | security-lens | 0.75 | `present` |
+#### Omissions

-### P3 -- Minor
-
-| # | Section | Issue | Reviewer | Confidence | Route |
-|---|---------|-------|----------|------------|-------|
-| 5 | Overview | "Service" used to mean both microservice and business class | coherence | 0.65 | `auto` |
+| # | Section | Issue | Reviewer | Confidence |
+|---|---------|-------|----------|------------|
+| 4 | API Design | Public webhook endpoint has no rate limiting mentioned | security-lens | 0.75 |

 ### Residual Concerns

@@ -59,20 +76,22 @@ Use this **exact format** when presenting synthesized review findings. Findings

 ### Coverage

-| Persona | Status | Findings | Residual |
-|---------|--------|----------|----------|
-| coherence | completed | 2 | 0 |
-| feasibility | completed | 1 | 1 |
-| security-lens | completed | 1 | 0 |
-| scope-guardian | completed | 1 | 0 |
-| product-lens | not activated | -- | -- |
-| design-lens | not activated | -- | -- |
+| Persona | Status | Findings | Auto | Batch | Present | Residual |
+|---------|--------|----------|------|-------|---------|----------|
+| coherence | completed | 3 | 2 | 0 | 1 | 0 |
+| feasibility | completed | 2 | 0 | 1 | 1 | 1 |
+| security-lens | completed | 2 | 0 | 1 | 1 | 0 |
+| scope-guardian | completed | 1 | 0 | 0 | 1 | 0 |
+| product-lens | not activated | -- | -- | -- | -- | -- |
+| design-lens | not activated | -- | -- | -- | -- | -- |
 ```

 ## Section Rules

+- **Summary line**: Always present after the reviewer list. Format: "Applied N auto-fixes. Batched M fixes for approval. K findings to consider (X errors, Y omissions)." Omit any zero clause.
 - **Auto-fixes Applied**: List fixes that were applied automatically (auto class). Omit section if none.
- **P0-P3 sections**: Only include sections that have findings. Omit empty severity levels.
+- **Batch Confirm**: Group `batch_confirm` findings for a single yes/no/select approval. Omit section if none.
+- **P0-P3 sections**: Only include sections that have findings. Omit empty severity levels. Within each severity, separate into **Errors** and **Omissions** sub-headers. Omit a sub-header if that severity has none of that type.
 - **Residual Concerns**: Findings below confidence threshold that were promoted by cross-persona corroboration, plus unpromoted residual risks. Omit if none.
 - **Deferred Questions**: Questions for later workflow stages. Omit if none.
- **Coverage**: Always include. Shows which personas ran and their output counts.
+- **Coverage**: Always include. Shows which personas ran and their output counts broken down by route (Auto, Batch, Present).
--- a/plugins/compound-engineering/skills/document-review/references/subagent-template.md
+++ b/plugins/compound-engineering/skills/document-review/references/subagent-template.md
@@ -22,9 +22,13 @@ Rules:
 - Suppress any finding below your stated confidence floor (see your Confidence calibration section).
 - Every finding MUST include at least one evidence item -- a direct quote from the document.
 - You are operationally read-only. Analyze the document and produce findings. Do not edit the document, create files, or make changes. You may use non-mutating tools (file reads, glob, grep, git log) to gather context about the codebase when evaluating feasibility or existing patterns.
+- Set `finding_type` for every finding:
+  - `error`: Something the document says that is wrong -- contradictions, incorrect statements, design tensions, incoherent tradeoffs.
+  - `omission`: Something the document forgot to say -- missing mechanical steps, absent list entries, undefined thresholds, forgotten cross-references.
 - Set `autofix_class` conservatively:
-  - `auto`: Only for local, deterministic fixes -- terminology corrections, formatting fixes, cross-reference repairs. The fix must be unambiguous and not change the document's meaning.
-  - `present`: Everything else -- strategic questions, tradeoffs, meaning-changing fixes, informational findings.
+  - `auto`: Deterministic fixes where the correct value is verifiable from the document itself -- terminology corrections, formatting fixes, cross-reference repairs, wrong counts, missing list entries where the correct entry exists elsewhere in the document. The fix must be unambiguous.
+  - `batch_confirm`: Obvious fix with one clear correct answer, but it touches document meaning. Examples: adding a missing implementation step that is mechanically implied by other content, updating a summary to match its own details. Use when reasonable people would agree on the fix but it goes beyond cosmetic correction.
+  - `present`: Everything else -- strategic questions, tradeoffs, design tensions where reasonable people could disagree, informational findings.
 - `suggested_fix` is optional. Only include it when the fix is obvious and correct. For `present` findings, frame as a question instead.
 - If you find no issues, return an empty findings array. Still populate residual_risks and deferred_questions if applicable.
 - Use your suppress conditions. Do not flag issues that belong to other personas.