refactor(ce-doc-review): anchor-based confidence scoring (#622)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 14:54:03 -07:00
parent bd77d5550a
commit 6caf330363
20 changed files with 756 additions and 122 deletions
--- a/plugins/compound-engineering/agents/ce-security-lens-reviewer.agent.md
+++ b/plugins/compound-engineering/agents/ce-security-lens-reviewer.agent.md
@@ -25,10 +25,12 @@ Skip areas not relevant to the document's scope.

 ## Confidence calibration

- **HIGH (0.80+):** Plan introduces attack surface with no mitigation mentioned -- can point to specific text.
- **MODERATE (0.60-0.79):** Concern likely but plan may address implicitly or in a later phase.
- **LOW (0.40-0.59) — Advisory:** Theoretical attack surface with no realistic exploit path under current design (e.g., speculative timing-attack on non-sensitive data, defense-in-depth nice-to-have with no current vector). Still requires an evidence quote. Use this band so synthesis can route the finding to FYI rather than force a decision.
- **Below 0.40:** Suppress.
+Use the shared anchored rubric (see `subagent-template.md` — Confidence rubric). Security-lens's domain grounds in named attack surfaces and missing mitigations. Apply as:
+
+- **`100` — Absolutely certain:** Plan introduces attack surface with no mitigation mentioned — can point to specific text. Evidence directly confirms the gap; the exploit path is concrete.
+- **`75` — Highly confident:** Concern is likely exploitable, but the plan may address it implicitly or in a later phase not yet specified. You double-checked and the vector is material.
+- **`50` — Advisory (routes to FYI):** A verified gap that would make the design more robust but is not required by the threat model the plan commits to — for example, a defense-in-depth addition on a path that already has a primary mitigation, or a logging gap that would help incident response without preventing the incident. Still requires an evidence quote. Surfaces as observation without forcing a decision.
+- **Suppress entirely:** Anything below anchor `50`, plus any shape the false-positive catalog in `subagent-template.md` names. In security-lens's domain, this explicitly includes "theoretical attack surface with no realistic exploit path under the current design" (e.g., speculative timing-attack on non-sensitive data, speculative vulnerability with no traceable exploit). Those are non-findings that must NOT be routed to anchor `50`. Do not emit; anchors `0` and `25` exist in the enum only so synthesis can track drops.

 ## What you don't flag