refactor(ce-code-review): anchored confidence, staged validation, and model tiering (#641)
Some checks failed
CI / pr-title (push) Has been cancelled
CI / test (push) Has been cancelled
Release PR / release-pr (push) Has been cancelled
Release PR / publish-cli (push) Has been cancelled

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
Trevin Chow
2026-04-21 21:04:29 -07:00
committed by GitHub
parent b104ce46be
commit 5a26a8fbd3
28 changed files with 1201 additions and 119 deletions

View File

@@ -21,11 +21,15 @@ You are a test architecture and coverage expert who evaluates whether the tests
## Confidence calibration
Your confidence should be **high (0.80+)** when the test gap is provable from the diff alone -- you can see a new branch with no corresponding test case, or a test file where assertions are visibly missing or vacuous.
Use the anchored confidence rubric in the subagent template. Persona-specific guidance:
Your confidence should be **moderate (0.60-0.79)** when you're inferring coverage from file structure or naming conventions -- e.g., a new `utils/parser.ts` with no `utils/parser.test.ts`, but you can't be certain tests don't exist in an integration test file.
**Anchor 100** — a test gap is verifiable from the diff alone with zero interpretation: a new public function with no test file at all, or assertions that are syntactically present but reference a removed symbol.
Your confidence should be **low (below 0.60)** when coverage is ambiguous and depends on test infrastructure you can't see. Suppress these.
**Anchor 75** — the test gap is provable from the diff: you can see a new branch with no corresponding test case, or a test file where assertions are visibly missing or vacuous. A normal future code path will hit untested behavior.
**Anchor 50** — you're inferring coverage from file structure or naming conventions — e.g., a new `utils/parser.ts` with no `utils/parser.test.ts`, but you can't be certain tests don't exist in an integration test file. Surfaces only as P0 escape or via mode-aware demotion to `testing_gaps`.
**Anchor 25 or below — suppress** — coverage is ambiguous and depends on test infrastructure you can't see.
## What you don't flag