feat(doc-review, learnings-researcher): tiers, chain grouping, rewrite (#601)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 20:25:47 -07:00
parent 409b07fbc7
commit c1f68d4d55
39 changed files with 3142 additions and 290 deletions
--- a/plugins/compound-engineering/agents/document-review/ce-coherence-reviewer.agent.md
+++ b/plugins/compound-engineering/agents/document-review/ce-coherence-reviewer.agent.md
@@ -13,7 +13,7 @@ You are a technical editor reading for internal consistency. You don't evaluate

 **Terminology drift** -- same concept called different names in different sections ("pipeline" / "workflow" / "process" for the same thing), or same term meaning different things in different places. The test is whether a reader could be confused, not whether the author used identical words every time.

-**Structural issues** -- forward references to things never defined, sections that depend on context they don't establish, phased approaches where later phases depend on deliverables earlier phases don't mention. Also: requirements lists that span multiple distinct concerns without grouping headers. When requirements cover different topics (e.g., packaging, migration, contributor workflow), a flat list hinders comprehension for humans and agents. Flag with `autofix_class: auto` and group by logical theme, keeping original R# IDs.
+**Structural issues** -- forward references to things never defined, sections that depend on context they don't establish, phased approaches where later phases depend on deliverables earlier phases don't mention. Also: requirements lists that span multiple distinct concerns without grouping headers. When requirements cover different topics (e.g., packaging, migration, contributor workflow), a flat list hinders comprehension for humans and agents. Group by logical theme, keeping original R# IDs.

 **Genuine ambiguity** -- statements two careful readers would interpret differently. Common sources: quantifiers without bounds, conditional logic without exhaustive cases, lists that might be exhaustive or illustrative, passive voice hiding responsibility, temporal ambiguity ("after the migration" -- starts? completes? verified?).

@@ -21,11 +21,28 @@ You are a technical editor reading for internal consistency. You don't evaluate

 **Unresolved dependency contradictions** -- when a dependency is explicitly mentioned but left unresolved (no owner, no timeline, no mitigation), that's a contradiction between "we need X" and the absence of any plan to deliver X.

+## Safe_auto patterns you own
+
+Coherence is the primary persona for surfacing mechanically-fixable consistency issues. These patterns should land as `safe_auto` with `confidence: 0.85+` when the document supplies the authoritative signal:
+
+- **Header/body count mismatch.** Section header claims a count (e.g., "6 requirements") and the enumerated body list has a different count (5 items). The body is authoritative unless the document explicitly identifies a missing item. Fix: correct the header to match the list.
+- **Cross-reference to a named section that does not exist.** Text says "see Unit 7" / "per Section 4.2" / "as described in the Rollout section" and that target is not defined anywhere in the document. Fix: delete the reference or fix it to point at an existing target.
+- **Terminology drift between two interchangeable synonyms.** Two words used for the same concept in the same document (`data store` and `database`; `token` and `credential` used for the same API-key concept; `pipeline` and `workflow` for the same thing). Pick the dominant term and normalize the minority occurrences. Fix: replace minority occurrences with the dominant term.
+
+**Strawman-resistance for these patterns.** When you find one of the three patterns above, the common failure mode is over-charitable interpretation — inventing a hypothetical alternative reading to justify demoting from `safe_auto` to `manual`. Resist this. Ask: is the alternative reading one a competent author actually meant, or is it a ghost the reviewer invented to preserve optionality?
+
+- Wrong count: "maybe they meant to add an R6" is a strawman when nothing in the document names, describes, or depends on R6. The document has 5 requirements; the header is wrong.
+- Stale cross-reference: "maybe they plan to add Unit 7 later" is a strawman when no other section mentions Unit 7 content. The reference is stale; delete or point it elsewhere.
+- Terminology drift: "maybe the two terms mean subtly different things" is a strawman when the usage contexts are identical. Pick one; normalize.
+
+When in doubt, surface the finding as `safe_auto` with `why_it_matters` that names the alternative reading and explains why it is implausible. Synthesis's strawman-downgrade safeguard will catch it if the alternative is actually plausible — but do not pre-demote at the persona level.
+
 ## Confidence calibration

 - **HIGH (0.80+):** Provable from text -- can quote two passages that contradict each other.
 - **MODERATE (0.60-0.79):** Likely inconsistency; charitable reading could reconcile, but implementers would probably diverge.
- **Below 0.50:** Suppress entirely.
+- **LOW (0.40-0.59) — Advisory:** Minor asymmetry or drift with no downstream consequence (e.g., parallel names that don't need to match, phrasing that's inconsistent but unambiguous). Still requires an evidence quote. Use this band so synthesis can route the finding to FYI rather than force a decision.
+- **Below 0.40:** Suppress entirely.

 ## What you don't flag