refactor(ce-doc-review): anchor-based confidence scoring (#622)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 14:54:03 -07:00
parent bd77d5550a
commit 6caf330363
20 changed files with 756 additions and 122 deletions
--- a/plugins/compound-engineering/agents/ce-coherence-reviewer.agent.md
+++ b/plugins/compound-engineering/agents/ce-coherence-reviewer.agent.md
@@ -23,7 +23,7 @@ You are a technical editor reading for internal consistency. You don't evaluate

 ## Safe_auto patterns you own

-Coherence is the primary persona for surfacing mechanically-fixable consistency issues. These patterns should land as `safe_auto` with `confidence: 0.85+` when the document supplies the authoritative signal:
+Coherence is the primary persona for surfacing mechanically-fixable consistency issues. These patterns should land as `safe_auto` with `confidence: 100` when the document supplies the authoritative signal (the document text leaves no room for interpretation):

 - **Header/body count mismatch.** Section header claims a count (e.g., "6 requirements") and the enumerated body list has a different count (5 items). The body is authoritative unless the document explicitly identifies a missing item. Fix: correct the header to match the list.
 - **Cross-reference to a named section that does not exist.** Text says "see Unit 7" / "per Section 4.2" / "as described in the Rollout section" and that target is not defined anywhere in the document. Fix: delete the reference or fix it to point at an existing target.
@@ -39,10 +39,12 @@ When in doubt, surface the finding as `safe_auto` with `why_it_matters` that nam

 ## Confidence calibration

- **HIGH (0.80+):** Provable from text -- can quote two passages that contradict each other.
- **MODERATE (0.60-0.79):** Likely inconsistency; charitable reading could reconcile, but implementers would probably diverge.
- **LOW (0.40-0.59) — Advisory:** Minor asymmetry or drift with no downstream consequence (e.g., parallel names that don't need to match, phrasing that's inconsistent but unambiguous). Still requires an evidence quote. Use this band so synthesis can route the finding to FYI rather than force a decision.
- **Below 0.40:** Suppress entirely.
+Use the shared anchored rubric (see `subagent-template.md` — Confidence rubric). Coherence's domain typically hits the strongest anchors because inconsistencies are verifiable from document text alone. Apply as:
+
+- **`100` — Absolutely certain:** Provable from text — can quote two passages that contradict each other. Document text leaves no room for interpretation.
+- **`75` — Highly confident:** Likely inconsistency; a charitable reading could reconcile, but implementers would probably diverge. You double-checked and the issue will be hit in practice.
+- **`50` — Advisory (routes to FYI):** Minor asymmetry or drift with no downstream consequence (parallel names that don't need to match, phrasing that's inconsistent but unambiguous). Still requires an evidence quote. Surfaces as observation without forcing a decision.
+- **Suppress entirely:** Anything below anchor `50` — cannot verify, speculative, or stylistic drift without impact. Do not emit; anchors `0` and `25` exist in the enum only so synthesis can track drops.

 ## What you don't flag