diff --git a/plugins/compound-engineering/skills/document-review/SKILL.md b/plugins/compound-engineering/skills/document-review/SKILL.md
index 8b39290..31eb1ff 100644
--- a/plugins/compound-engineering/skills/document-review/SKILL.md
+++ b/plugins/compound-engineering/skills/document-review/SKILL.md
@@ -14,12 +14,12 @@ Check the skill arguments for `mode:headless`. Arguments may contain a document

 If `mode:headless` is present, set **headless mode** for the rest of the workflow.

-**Headless mode** changes the interaction model, not the classification boundaries. Document-review still applies the same judgment about what is deterministic vs. what needs verification. The only difference is how non-auto findings are delivered:
+**Headless mode** changes the interaction model, not the classification boundaries. Document-review still applies the same judgment about what has one clear correct fix vs. what needs user judgment. The only difference is how non-auto findings are delivered:
 - `auto` fixes are applied silently (same as interactive)
-- `batch_confirm` and `present` findings are returned as structured text for the caller to handle -- no AskUserQuestion prompts, no interactive approval
+- `present` findings are returned as structured text for the caller to handle -- no AskUserQuestion prompts, no interactive approval
 - Phase 5 returns immediately with "Review complete" (no refine/complete question)

-The caller receives findings with their original classifications intact and decides what to do with each tier.
+The caller receives findings with their original classifications intact and decides what to do with them.

 Callers invoke headless mode by including `mode:headless` in the skill arguments, e.g.:
 ```
@@ -144,7 +144,7 @@ Fingerprint each finding using `normalize(section) + normalize(title)`. Normaliz
 When fingerprints match across personas:
 - If the findings recommend **opposing actions** (e.g., one says cut, the other says keep), do not merge -- preserve both for contradiction resolution in 3.5
 - Otherwise merge: keep the highest severity, keep the highest confidence, union all evidence arrays, note all agreeing reviewers (e.g., "coherence, feasibility")
-- **Coverage attribution:** Attribute the merged finding to the persona with the highest confidence. Decrement the losing persona's Findings count *and* the corresponding route bucket (Auto, Batch, or Present) so `Findings = Auto + Batch + Present` stays exact.
+- **Coverage attribution:** Attribute the merged finding to the persona with the highest confidence. Decrement the losing persona's Findings count *and* the corresponding route bucket (Auto or Present) so `Findings = Auto + Present` stays exact.

 ### 3.4 Promote Residual Concerns

@@ -167,17 +167,16 @@ Specific conflict patterns:

 ### 3.6 Route by Autofix Class

-**Severity and autofix_class are independent.** A P1 finding can be `auto` if the correct fix is deterministic. The test is not "how important?" but "can the fix be derived from the document's own content without judgment?"
+**Severity and autofix_class are independent.** A P1 finding can be `auto` if the correct fix is obvious. The test is not "how important?" but "is there one clear correct fix, or does this require judgment?"

 | Autofix Class | Route |
 |---------------|-------|
-| `auto` | Apply automatically -- fix is derivable from the document itself. One part of the document is clearly authoritative over another; reconcile toward the authority. |
-| `batch_confirm` | Group for single batch approval -- one clear correct answer, but authors new content where exact wording needs verification |
+| `auto` | Apply automatically -- one clear correct fix. Includes both internal reconciliation (one part authoritative over another) and additions mechanically implied by the document's own content. |
 | `present` | Present individually for user judgment |

-Demote any `auto` finding that lacks a `suggested_fix` to `batch_confirm`. Demote any `batch_confirm` finding that lacks a `suggested_fix` to `present`.
+Demote any `auto` finding that lacks a `suggested_fix` to `present`.

-**Auto-eligible patterns:** summary/detail mismatch (body is authoritative over overview), wrong counts, missing list entries derivable from elsewhere in the document, stale internal cross-references, terminology drift, prose/diagram contradictions where prose is more detailed. If the fix requires judgment about *what* to write (not just *that* something needs updating), it belongs in `batch_confirm` or `present`.
+**Auto-eligible patterns:** summary/detail mismatch (body is authoritative over overview), wrong counts, missing list entries derivable from elsewhere in the document, stale internal cross-references, terminology drift, prose/diagram contradictions where prose is more detailed, missing steps mechanically implied by other content, unstated thresholds implied by surrounding context, completeness gaps where the correct addition is obvious. If the fix requires judgment about *what* to do (not just *what to write*), it belongs in `present`.

 ### 3.7 Sort

@@ -190,29 +189,9 @@ Sort findings for presentation: P0 -> P1 -> P2 -> P3, then by finding type (erro
 Apply all `auto` findings to the document in a **single pass**:
 - Edit the document inline using the platform's edit tool
 - Track what was changed for the "Auto-fixes Applied" section
-- Do not ask for approval -- these are unambiguously correct
+- Do not ask for approval -- these have one clear correct fix

-### Batch Confirm
-
-If any `batch_confirm` findings exist:
-
-**Headless mode:** Do not prompt. Include `batch_confirm` findings in the structured text output alongside `present` findings, clearly marked with their classification so the caller can distinguish them. The caller decides whether to apply them.
-
-**Interactive mode:**
-
-1. Present the proposed fixes in a numbered table (see template)
-2. **Ask for approval using the platform's interactive question tool** -- do not print the question as plain text output:
-   - Claude Code: `AskUserQuestion`
-   - Codex: `request_user_input`
-   - Gemini: `ask_user`
-   - Fallback (no question tool available): present numbered options and stop; wait for the user's next message before proceeding
-3. Question text: "Apply these N fixes? (yes/no/select)"
-4. Handle the response:
-   - **yes**: Apply all in a single pass
-   - **select**: Let the user pick which to apply
-   - **no**: Demote remaining to the `present` findings list
-
-This turns N obvious-but-meaning-touching fixes into 1 interaction instead of N.
+List every auto-fix in the output summary so the user can see what changed. Use enough detail to convey the substance of each fix (section, what was changed, reviewer attribution). This is especially important for fixes that add content or touch document meaning -- the user should not have to diff the document to understand what the review did.

 ### Present Remaining Findings

@@ -221,21 +200,17 @@ This turns N obvious-but-meaning-touching fixes into 1 interaction instead of N.
 ```
 Document review complete (headless mode).

-Applied N auto-fixes.
+Applied N auto-fixes:
+- <section>: <what was changed> (<reviewer>)
+- <section>: <what was changed> (<reviewer>)

-Batch-confirm findings (clear fix, wording needs verification):
+Findings (requires judgment):

-[P1][batch_confirm] Section: <section> — <title> (<reviewer>, confidence <N>)
-  Why: <why_it_matters>
-  Suggested fix: <suggested_fix>
-
-Present findings (requires judgment):
-
-[P0][present] Section: <section> — <title> (<reviewer>, confidence <N>)
+[P0] Section: <section> — <title> (<reviewer>, confidence <N>)
   Why: <why_it_matters>
   Suggested fix: <suggested_fix or "none">

-[P1][present] Section: <section> — <title> (<reviewer>, confidence <N>)
+[P1] Section: <section> — <title> (<reviewer>, confidence <N>)
   Why: <why_it_matters>
   Suggested fix: <suggested_fix or "none">

@@ -254,7 +229,7 @@ Present `present` findings using the review output template included below. With
 - **Errors** (design tensions, contradictions, incorrect statements) first -- these need resolution
 - **Omissions** (missing steps, absent details, forgotten entries) second -- these need additions

-Brief summary at the top: "Applied N auto-fixes. Batched M fixes for approval. K findings to consider (X errors, Y omissions)."
+Brief summary at the top: "Applied N auto-fixes. K findings to consider (X errors, Y omissions)."

 Include the Coverage table, auto-fixes applied, residual concerns, and deferred questions.

diff --git a/plugins/compound-engineering/skills/document-review/references/findings-schema.json b/plugins/compound-engineering/skills/document-review/references/findings-schema.json
index 1346e97..9da1a9e 100644
--- a/plugins/compound-engineering/skills/document-review/references/findings-schema.json
+++ b/plugins/compound-engineering/skills/document-review/references/findings-schema.json
@@ -45,8 +45,8 @@
   },
   "autofix_class": {
     "type": "string",
-    "enum": ["auto", "batch_confirm", "present"],
-    "description": "How this issue should be handled. auto = local deterministic fix applied silently (terminology, formatting, cross-references, completeness corrections). batch_confirm = obvious fix with a clear correct answer, but touches meaning enough to warrant grouped approval. present = requires individual user judgment."
+    "enum": ["auto", "present"],
+    "description": "How this issue should be handled. auto = one clear correct fix that can be applied silently (terminology, formatting, cross-references, completeness corrections, additions mechanically implied by other content). present = requires individual user judgment."
   },
   "finding_type": {
     "type": "string",
@@ -97,9 +97,8 @@
     "P3": "Minor improvement. User's discretion."
   },
   "autofix_classes": {
-    "_principle": "Autofix class is independent of severity. A P1 finding can be auto if the fix is deterministic. The test: can the correct fix be derived from the document's own content without judgment?",
-    "auto": "Fix is derivable from the document itself -- one part is clearly authoritative over another, reconcile toward the authority. Includes: summary/detail mismatches, wrong counts, missing list entries, stale cross-references, terminology drift, prose/diagram contradictions where prose is authoritative. Must include suggested_fix.",
-    "batch_confirm": "One clear correct answer, but authors new content where exact wording needs verification. Grouped for single approval. Examples: adding a missing step mechanically implied by other content, defining an implied-but-unstated threshold. Must include suggested_fix.",
+    "_principle": "Autofix class is independent of severity. A P1 finding can be auto if the fix is obvious. The test: is there one clear correct fix, or does resolving this require judgment?",
+    "auto": "One clear correct fix -- applied silently. Includes both internal reconciliation (summary/detail mismatches, wrong counts, stale cross-references, terminology drift) and additions mechanically implied by other content (missing steps, unstated thresholds, completeness gaps where the correct content is obvious). Must include suggested_fix.",
     "present": "Requires individual user judgment -- strategic questions, design tradeoffs, or findings where reasonable people could disagree on the right action."
   },
   "finding_types": {
diff --git a/plugins/compound-engineering/skills/document-review/references/review-output-template.md b/plugins/compound-engineering/skills/document-review/references/review-output-template.md
index 8bf8eb9..7f19a39 100644
--- a/plugins/compound-engineering/skills/document-review/references/review-output-template.md
+++ b/plugins/compound-engineering/skills/document-review/references/review-output-template.md
@@ -15,22 +15,15 @@ Use this **exact format** when presenting synthesized review findings. Findings
 - security-lens -- plan adds public API endpoint with auth flow
 - scope-guardian -- plan has 15 requirements across 3 priority levels

-Applied 3 auto-fixes. Batched 2 fixes for approval. 4 findings to consider (2 errors, 2 omissions).
+Applied 5 auto-fixes. 4 findings to consider (2 errors, 2 omissions).

 ### Auto-fixes Applied

 - Standardized "pipeline"/"workflow" terminology to "pipeline" throughout (coherence)
 - Fixed cross-reference: Section 4 referenced "Section 3.2" which is actually "Section 3.1" (coherence)
 - Updated unit count from "6 units" to "7 units" to match listed units (coherence)
-
-### Batch Confirm
-
-These fixes have one clear correct answer but touch document meaning. Apply all?
-
-| # | Section | Fix | Reviewer |
-|---|---------|-----|----------|
-| 1 | Unit 4 | Add "update API rate-limit config" step -- implied by Unit 3's rate-limit introduction | feasibility |
-| 2 | Verification | Add auth token refresh to test scenarios -- required by Unit 2's token expiry handling | security-lens |
+- Added "update API rate-limit config" step to Unit 4 -- implied by Unit 3's rate-limit introduction (feasibility)
+- Added auth token refresh to test scenarios -- required by Unit 2's token expiry handling (security-lens)

 ### P0 -- Must Fix

@@ -76,22 +69,21 @@ These fixes have one clear correct answer but touch document meaning. Apply all?

 ### Coverage

-| Persona | Status | Findings | Auto | Batch | Present | Residual |
-|---------|--------|----------|------|-------|---------|----------|
-| coherence | completed | 3 | 2 | 0 | 1 | 0 |
-| feasibility | completed | 2 | 0 | 1 | 1 | 1 |
-| security-lens | completed | 2 | 0 | 1 | 1 | 0 |
-| scope-guardian | completed | 1 | 0 | 0 | 1 | 0 |
-| product-lens | not activated | -- | -- | -- | -- | -- |
-| design-lens | not activated | -- | -- | -- | -- | -- |
+| Persona | Status | Findings | Auto | Present | Residual |
+|---------|--------|----------|------|---------|----------|
+| coherence | completed | 4 | 3 | 1 | 0 |
+| feasibility | completed | 2 | 1 | 1 | 1 |
+| security-lens | completed | 2 | 1 | 1 | 0 |
+| scope-guardian | completed | 1 | 0 | 1 | 0 |
+| product-lens | not activated | -- | -- | -- | -- |
+| design-lens | not activated | -- | -- | -- | -- |
 ```

 ## Section Rules

-- **Summary line**: Always present after the reviewer list. Format: "Applied N auto-fixes. Batched M fixes for approval. K findings to consider (X errors, Y omissions)." Omit any zero clause.
-- **Auto-fixes Applied**: List fixes that were applied automatically (auto class). Omit section if none.
-- **Batch Confirm**: Group `batch_confirm` findings for a single yes/no/select approval. Omit section if none.
+- **Summary line**: Always present after the reviewer list. Format: "Applied N auto-fixes. K findings to consider (X errors, Y omissions)." Omit any zero clause.
+- **Auto-fixes Applied**: List all fixes that were applied automatically (auto class). Include enough detail per fix to convey the substance -- especially for fixes that add content or touch document meaning. Omit section if none.
 - **P0-P3 sections**: Only include sections that have findings. Omit empty severity levels. Within each severity, separate into **Errors** and **Omissions** sub-headers. Omit a sub-header if that severity has none of that type.
 - **Residual Concerns**: Findings below confidence threshold that were promoted by cross-persona corroboration, plus unpromoted residual risks. Omit if none.
 - **Deferred Questions**: Questions for later workflow stages. Omit if none.
-- **Coverage**: Always include. All counts are **post-synthesis**. **Findings** must equal Auto + Batch + Present exactly -- if deduplication merged a finding across personas, attribute it to the persona with the highest confidence and reduce the other persona's count. **Residual** = count of `residual_risks` from this persona's raw output (not the promoted subset in the Residual Concerns section).
+- **Coverage**: Always include. All counts are **post-synthesis**. **Findings** must equal Auto + Present exactly -- if deduplication merged a finding across personas, attribute it to the persona with the highest confidence and reduce the other persona's count. **Residual** = count of `residual_risks` from this persona's raw output (not the promoted subset in the Residual Concerns section).
diff --git a/plugins/compound-engineering/skills/document-review/references/subagent-template.md b/plugins/compound-engineering/skills/document-review/references/subagent-template.md
index 997dbd7..94cab8f 100644
--- a/plugins/compound-engineering/skills/document-review/references/subagent-template.md
+++ b/plugins/compound-engineering/skills/document-review/references/subagent-template.md
@@ -25,18 +25,14 @@ Rules:
 - Set `finding_type` for every finding:
   - `error`: Something the document says that is wrong -- contradictions, incorrect statements, design tensions, incoherent tradeoffs.
   - `omission`: Something the document forgot to say -- missing mechanical steps, absent list entries, undefined thresholds, forgotten cross-references.
-- Set `autofix_class` based on determinism, not severity. A P1 finding can be `auto` if the correct fix is derivable from the document itself:
-  - `auto`: The correct fix is derivable from the document's own content without judgment about what to write. The test: is one part of the document clearly authoritative over another? If yes, reconcile toward the authority. Examples:
-    - Summary/detail mismatch: overview says "3 phases" but body describes 4 in detail -- update the summary
-    - Wrong count: "the following 3 steps" but 4 are listed -- fix the count
-    - Missing list entry where the correct entry exists elsewhere in the document
-    - Stale internal reference: "as described in Phase 3" but content moved to Phase 4 -- fix the pointer
-    - Terminology drift: document uses both "pipeline" and "workflow" for the same concept -- standardize to the more frequent term
-    - Prose/diagram contradiction where prose is more detailed and authoritative -- update the diagram description to match
+- Set `autofix_class` based on whether there is one clear correct fix, not on severity. A P1 finding can be `auto` if the fix is obvious:
+  - `auto`: One clear correct fix. Applied silently without asking. The test: is there only one reasonable way to resolve this? If yes, it is auto. Two categories:
+    - Internal reconciliation: one part of the document is authoritative over another -- reconcile toward the authority. Examples: summary/detail mismatches, wrong counts, missing list entries derivable from elsewhere, stale cross-references, terminology drift, prose/diagram contradictions where prose is authoritative.
+    - Implied additions: the correct content is mechanically obvious from the document's own context. Examples: adding a missing implementation step implied by other content, defining a threshold implied but never stated, completeness gaps where what to add is clear.
     Always include `suggested_fix` for auto findings.
-  - `batch_confirm`: One clear correct answer, but it authors new content where exact wording needs verification. The test: would reasonable people agree on WHAT to fix but potentially disagree on the exact PHRASING? Examples: adding a missing implementation step that is mechanically implied by other content, defining a threshold that is implied but never stated explicitly. Always include `suggested_fix` for batch_confirm findings.
+    NOT auto (the gap is clear but more than one reasonable fix exists): choosing an implementation approach when the document states a need without constraining how (e.g., "support offline mode" could mean service workers, local-first database, or queue-and-sync -- there is no single obvious answer), changing scope or priority where the author may have weighed tradeoffs the reviewer can't see (e.g., promoting a P2 to P1, or cutting a feature the document intentionally keeps at a lower tier).
   - `present`: Requires judgment -- strategic questions, tradeoffs, design tensions where reasonable people could disagree, findings where the right action is unclear.
-- `suggested_fix` is required for `auto` and `batch_confirm` findings (see above). For `present` findings, `suggested_fix` is optional -- include it only when the fix is obvious, and frame as a question when the right action is unclear.
+- `suggested_fix` is required for `auto` findings. For `present` findings, `suggested_fix` is optional -- include it only when the fix is obvious, and frame as a question when the right action is unclear.
 - If you find no issues, return an empty findings array. Still populate residual_risks and deferred_questions if applicable.
 - Use your suppress conditions. Do not flag issues that belong to other personas.
 </output-contract>
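
Reviewer note: the two-class routing this patch introduces (demote any `auto` finding without a `suggested_fix` to `present`, and keep `Findings = Auto + Present` exact in the Coverage table) could be sketched roughly as below. This is only an illustration under assumptions, not code from the plugin: the `Finding` dataclass, the `reviewer` field name, and the `route` and `coverage` helpers are hypothetical names chosen for the example.

```python
from collections import Counter
from dataclasses import dataclass
from typing import Optional


@dataclass
class Finding:
    # Hypothetical container; `autofix_class` and `suggested_fix` mirror the
    # schema keys in this diff, `reviewer` is an assumed field name.
    reviewer: str
    autofix_class: str  # "auto" or "present" after this patch
    suggested_fix: Optional[str] = None


def route(finding: Finding) -> str:
    """Return the route bucket, demoting `auto` findings that lack a suggested_fix."""
    if finding.autofix_class == "auto" and not finding.suggested_fix:
        return "present"
    return finding.autofix_class


def coverage(findings: list[Finding]) -> dict[str, dict[str, int]]:
    """Build per-reviewer counts so that findings == auto + present by construction."""
    buckets: dict[str, Counter] = {}
    for f in findings:
        buckets.setdefault(f.reviewer, Counter())[route(f)] += 1
    return {
        reviewer: {
            "findings": c["auto"] + c["present"],
            "auto": c["auto"],
            "present": c["present"],
        }
        for reviewer, c in buckets.items()
    }


if __name__ == "__main__":
    sample = [
        Finding("coherence", "auto", "Update unit count to 7"),
        Finding("coherence", "auto"),  # no suggested_fix, so it is demoted
        Finding("feasibility", "present"),
    ]
    for reviewer, row in coverage(sample).items():
        print(reviewer, row)
```

Deriving the Findings column from the two buckets, rather than counting it separately, is one way to keep the Coverage invariant from drifting after deduplication; the real skill may track these counts differently.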