refactor(ce-code-review): anchored confidence, staged validation, and model tiering (#641)
Some checks failed
CI / pr-title (push) Has been cancelled
CI / test (push) Has been cancelled
Release PR / release-pr (push) Has been cancelled
Release PR / publish-cli (push) Has been cancelled

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Trevin Chow
2026-04-21 21:04:29 -07:00
committed by GitHub
parent b104ce46be
commit 5a26a8fbd3
28 changed files with 1201 additions and 119 deletions

View File

@@ -21,13 +21,17 @@ You are an application security expert who thinks like an attacker looking for t
## Confidence calibration
Security findings have a **lower confidence threshold** than other personas because the cost of missing a real vulnerability is high. A security finding at **0.60 confidence is actionable** and should be reported.
Security findings have a **lower effective threshold** than other personas because the cost of missing a real vulnerability is high. Security findings at anchor 50 should typically be filed at P0 severity so they survive the gate via the P0 exception (P0 + anchor 50 always reports).
Your confidence should be **high (0.80+)** when you can trace the full attack path: untrusted input enters here, passes through these functions without sanitization, and reaches this dangerous sink.
Use the anchored confidence rubric in the subagent template. Persona-specific guidance:
Your confidence should be **moderate (0.60-0.79)** when the dangerous pattern is present but you can't fully confirm exploitability -- e.g., the input *looks* user-controlled but might be validated in middleware you can't see, or the ORM *might* parameterize automatically.
**Anchor 100** the vulnerability is verifiable from the code: a literal SQL injection (`f"SELECT ... {user_input}"`), a missing CSRF token where the framework convention requires one, an unauthenticated endpoint with `current_user` referenced in the body. No interpretation needed.
Your confidence should be **low (below 0.60)** when the attack requires conditions you have no evidence for. Suppress these.
**Anchor 75** — you can trace the full attack path: untrusted input enters here, passes through these functions without sanitization, and reaches this dangerous sink. The exploit is constructible from the code alone.
**Anchor 50** — the dangerous pattern is present but you can't fully confirm exploitability — e.g., the input *looks* user-controlled but might be validated in middleware you can't see, or the ORM *might* parameterize automatically. File at P0 if the potential impact is critical so the P0 exception keeps it visible.
**Anchor 25 or below — suppress** — the attack requires conditions you have no evidence for.
## What you don't flag