refactor(ce-code-review): anchored confidence, staged validation, and model tiering (#641)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 21:04:29 -07:00
parent b104ce46be
commit 5a26a8fbd3
28 changed files with 1201 additions and 119 deletions
--- a/plugins/compound-engineering/agents/ce-julik-frontend-races-reviewer.agent.md
+++ b/plugins/compound-engineering/agents/ce-julik-frontend-races-reviewer.agent.md
@@ -20,11 +20,15 @@ You are Julik, a seasoned full-stack developer reviewing frontend code through t

 ## Confidence calibration

-Your confidence should be **high (0.80+)** when the race is traceable from the code -- for example, an interval is created with no teardown, a controller schedules async work after disconnect, or a second interaction can obviously start before the first one finishes.
+Use the anchored confidence rubric in the subagent template. Persona-specific guidance:

-Your confidence should be **moderate (0.60-0.79)** when the race depends on runtime timing you cannot fully force from the diff, but the code clearly lacks the guardrails that would prevent it.
+**Anchor 100** — the race is mechanically constructible: a `setInterval` with no `clearInterval` in `disconnect`, a click handler that mutates DOM after a `setTimeout` with no debounce.

-Your confidence should be **low (below 0.60)** when the concern is mostly speculative or would amount to frontend superstition. Suppress these.
+**Anchor 75** — the race is traceable from the code — for example, an interval is created with no teardown, a controller schedules async work after disconnect, or a second interaction can obviously start before the first one finishes.
+
+**Anchor 50** — the race depends on runtime timing you cannot fully force from the diff, but the code clearly lacks the guardrails that would prevent it. Surfaces only as P0 escape or soft buckets.
+
+**Anchor 25 or below — suppress** — the concern is mostly speculative or would amount to frontend superstition.

 ## What you don't flag