fix(document-review): enforce interactive questions and fix autofix classification (#415)

Co-authored-by: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-27 19:37:14 -07:00
parent 7b75a9ac25
commit d44729603d
4 changed files with 41 additions and 19 deletions
--- a/plugins/compound-engineering/skills/document-review/references/findings-schema.json
+++ b/plugins/compound-engineering/skills/document-review/references/findings-schema.json
@@ -97,9 +97,10 @@
      "P3": "Minor improvement. User's discretion."
    },
    "autofix_classes": {
-      "auto": "Local, deterministic document fix: terminology consistency, formatting, cross-reference correction, completeness corrections (wrong counts, missing list entries, undefined values where the correct value is verifiable from elsewhere in the document). Must be unambiguous.",
-      "batch_confirm": "Obvious fix with a clear correct answer, but requires authoring new content where exact wording needs verification. Grouped with other batch_confirm findings for a single approval rather than individual review. Examples: adding a missing implementation step that is mechanically implied, rewording a section to resolve an ambiguity. Note: fixes that are deterministically derivable from existing document content (updating a summary to match its own list, correcting a count, adding group headers to requirements) belong in auto, not here.",
-      "present": "Requires individual user judgment -- strategic questions, design tradeoffs, meaning-changing fixes, or findings where reasonable people could disagree on the right action."
+      "_principle": "Autofix class is independent of severity. A P1 finding can be auto if the fix is deterministic. The test: can the correct fix be derived from the document's own content without judgment?",
+      "auto": "Fix is derivable from the document itself -- one part is clearly authoritative over another, reconcile toward the authority. Includes: summary/detail mismatches, wrong counts, missing list entries, stale cross-references, terminology drift, prose/diagram contradictions where prose is authoritative. Must include suggested_fix.",
+      "batch_confirm": "One clear correct answer, but authors new content where exact wording needs verification. Grouped for single approval. Examples: adding a missing step mechanically implied by other content, defining an implied-but-unstated threshold. Must include suggested_fix.",
+      "present": "Requires individual user judgment -- strategic questions, design tradeoffs, or findings where reasonable people could disagree on the right action."
    },
    "finding_types": {
      "error": "Something the document says that is wrong -- contradictions, incorrect statements, design tensions, incoherent tradeoffs. These are mistakes in what exists.",
--- a/plugins/compound-engineering/skills/document-review/references/review-output-template.md
+++ b/plugins/compound-engineering/skills/document-review/references/review-output-template.md
@@ -94,4 +94,4 @@ These fixes have one clear correct answer but touch document meaning. Apply all?
 - **P0-P3 sections**: Only include sections that have findings. Omit empty severity levels. Within each severity, separate into **Errors** and **Omissions** sub-headers. Omit a sub-header if that severity has none of that type.
 - **Residual Concerns**: Findings below confidence threshold that were promoted by cross-persona corroboration, plus unpromoted residual risks. Omit if none.
 - **Deferred Questions**: Questions for later workflow stages. Omit if none.
- **Coverage**: Always include. Shows which personas ran and their output counts broken down by route (Auto, Batch, Present).
+- **Coverage**: Always include. All counts are **post-synthesis**. **Findings** must equal Auto + Batch + Present exactly -- if deduplication merged a finding across personas, attribute it to the persona with the highest confidence and reduce the other persona's count. **Residual** = count of `residual_risks` from this persona's raw output (not the promoted subset in the Residual Concerns section).
--- a/plugins/compound-engineering/skills/document-review/references/subagent-template.md
+++ b/plugins/compound-engineering/skills/document-review/references/subagent-template.md
@@ -25,11 +25,18 @@ Rules:
 - Set `finding_type` for every finding:
  - `error`: Something the document says that is wrong -- contradictions, incorrect statements, design tensions, incoherent tradeoffs.
  - `omission`: Something the document forgot to say -- missing mechanical steps, absent list entries, undefined thresholds, forgotten cross-references.
- Set `autofix_class` conservatively:
-  - `auto`: Deterministic fixes where the correct value is verifiable from the document itself -- terminology corrections, formatting fixes, cross-reference repairs, wrong counts, missing list entries where the correct entry exists elsewhere in the document. The fix must be unambiguous.
-  - `batch_confirm`: Obvious fix with one clear correct answer, but it touches document meaning. Examples: adding a missing implementation step that is mechanically implied by other content, updating a summary to match its own details. Use when reasonable people would agree on the fix but it goes beyond cosmetic correction.
-  - `present`: Everything else -- strategic questions, tradeoffs, design tensions where reasonable people could disagree, informational findings.
- `suggested_fix` is optional. Only include it when the fix is obvious and correct. For `present` findings, frame as a question instead.
+- Set `autofix_class` based on determinism, not severity. A P1 finding can be `auto` if the correct fix is derivable from the document itself:
+  - `auto`: The correct fix is derivable from the document's own content without judgment about what to write. The test: is one part of the document clearly authoritative over another? If yes, reconcile toward the authority. Examples:
+    - Summary/detail mismatch: overview says "3 phases" but body describes 4 in detail -- update the summary
+    - Wrong count: "the following 3 steps" but 4 are listed -- fix the count
+    - Missing list entry where the correct entry exists elsewhere in the document
+    - Stale internal reference: "as described in Phase 3" but content moved to Phase 4 -- fix the pointer
+    - Terminology drift: document uses both "pipeline" and "workflow" for the same concept -- standardize to the more frequent term
+    - Prose/diagram contradiction where prose is more detailed and authoritative -- update the diagram description to match
+    Always include `suggested_fix` for auto findings.
+  - `batch_confirm`: One clear correct answer, but it authors new content where exact wording needs verification. The test: would reasonable people agree on WHAT to fix but potentially disagree on the exact PHRASING? Examples: adding a missing implementation step that is mechanically implied by other content, defining a threshold that is implied but never stated explicitly. Always include `suggested_fix` for batch_confirm findings.
+  - `present`: Requires judgment -- strategic questions, tradeoffs, design tensions where reasonable people could disagree, findings where the right action is unclear.
+- `suggested_fix` is required for `auto` and `batch_confirm` findings (see above). For `present` findings, `suggested_fix` is optional -- include it only when the fix is obvious, and frame as a question when the right action is unclear.
 - If you find no issues, return an empty findings array. Still populate residual_risks and deferred_questions if applicable.
 - Use your suppress conditions. Do not flag issues that belong to other personas.
 </output-contract>