refactor(ce-doc-review): anchor-based confidence scoring (#622)
Some checks failed
CI / pr-title (push) Has been cancelled
CI / test (push) Has been cancelled
Release PR / release-pr (push) Has been cancelled
Release PR / publish-cli (push) Has been cancelled

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Trevin Chow
2026-04-21 14:54:03 -07:00
committed by GitHub
parent bd77d5550a
commit 6caf330363
20 changed files with 756 additions and 122 deletions

View File

@@ -25,10 +25,12 @@ Skip areas not relevant to the document's scope.
## Confidence calibration
- **HIGH (0.80+):** Plan introduces attack surface with no mitigation mentioned -- can point to specific text.
- **MODERATE (0.60-0.79):** Concern likely but plan may address implicitly or in a later phase.
- **LOW (0.40-0.59) — Advisory:** Theoretical attack surface with no realistic exploit path under current design (e.g., speculative timing-attack on non-sensitive data, defense-in-depth nice-to-have with no current vector). Still requires an evidence quote. Use this band so synthesis can route the finding to FYI rather than force a decision.
- **Below 0.40:** Suppress.
Use the shared anchored rubric (see `subagent-template.md` — Confidence rubric). Security-lens's domain grounds in named attack surfaces and missing mitigations. Apply as:
- **`100` — Absolutely certain:** Plan introduces attack surface with no mitigation mentioned — can point to specific text. Evidence directly confirms the gap; the exploit path is concrete.
- **`75` — Highly confident:** Concern is likely exploitable, but the plan may address it implicitly or in a later phase not yet specified. You double-checked and the vector is material.
- **`50` — Advisory (routes to FYI):** A verified gap that would make the design more robust but is not required by the threat model the plan commits to — for example, a defense-in-depth addition on a path that already has a primary mitigation, or a logging gap that would help incident response without preventing the incident. Still requires an evidence quote. Surfaces as observation without forcing a decision.
- **Suppress entirely:** Anything below anchor `50`, plus any shape the false-positive catalog in `subagent-template.md` names. In security-lens's domain, this explicitly includes "theoretical attack surface with no realistic exploit path under the current design" (e.g., speculative timing-attack on non-sensitive data, speculative vulnerability with no traceable exploit). Those are non-findings that must NOT be routed to anchor `50`. Do not emit; anchors `0` and `25` exist in the enum only so synthesis can track drops.
## What you don't flag