feat(doc-review, learnings-researcher): tiers, chain grouping, rewrite (#601)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 20:25:47 -07:00
parent 409b07fbc7
commit c1f68d4d55
39 changed files with 3142 additions and 290 deletions
--- a/tests/fixtures/ce-doc-review/seeded-auth-plan.md
+++ b/tests/fixtures/ce-doc-review/seeded-auth-plan.md
@@ -0,0 +1,217 @@
+---
+title: Seeded Test Fixture — Auth Gateway Migration Plan
+type: feat
+status: active
+date: 2026-04-19
+---
+
+<!--
+This is a SEEDED TEST FIXTURE for ce-doc-review pipeline validation.
+Second fixture alongside seeded-plan.md. Different domain (auth migration)
+and different premise shape (security/reliability rather than naming)
+so the pipeline can be measured against a second set of known
+classifications.
+
+Seed map (run this plan through ce-doc-review to verify):
+
+- safe_auto candidates (3):
+    - wrong count (Requirements Trace says 7 requirements, list has 6)
+    - terminology drift (uses "token", "credential", "secret"
+      interchangeably for the same API-key concept)
+    - stale cross-reference (see-Unit-9 but only Units 1-6 exist)
+
+- gated_auto candidates (3):
+    - missing CSRF protection on the new session endpoints — the
+      framework (OAuth2-Proxy) has a built-in option; the plan rolls
+      its own partial check
+    - deployment-ordering guarantee missing between gateway rollout
+      and downstream service updates
+    - framework-native-API substitution (plan describes hand-rolled
+      token-refresh loop; the library ships refresh middleware)
+
+- manual candidates with TWO valid premise roots (intentional — tests
+  multi-root behavior):
+    - ROOT A: "Is migration to managed auth justified?" (top-level
+      premise) with dependents:
+        - Service-mesh integration layer complexity
+        - New secrets-rotation workflow scope
+        - Rollout of token-refresh middleware
+    - ROOT B: "Is the custom policy-enforcement layer warranted?"
+      (narrower premise about a specific sub-component) with
+      dependents:
+        - Policy-DSL parser abstraction
+        - Per-route policy cache design
+
+  Expected synthesis: elevates BOTH roots; dependents assign to
+  whichever root's fix most directly moots them. If the synthesizer
+  picks only one root, the other's dependents strand as top-level
+  manual findings — that's the regression we're watching for.
+
+- manual candidates independent of either root (3, should NOT link):
+    - SLO / error-budget commitment missing — operational gap that
+      exists regardless of which auth path is chosen
+    - Token-refresh concurrency guard missing: burst traffic can
+      trigger duplicate refreshes, an operational concern that exists
+      regardless of which auth path is chosen
+    - PII handling during migration window unstated — compliance
+      gap independent of premise
+
+- FYI candidates (5, confidence 0.40-0.65 at P3):
+    - naming preference ("AuthContext" vs "SessionContext" — both
+      legible in the code)
+    - speculative future-work concern (could reuse this for a
+      hypothetical mobile SDK that isn't on the roadmap)
+    - subjective readability note about the config schema shape
+    - unit-organization preference (could group by route rather
+      than by endpoint class — current split also reads fine)
+    - drift note (legacy JWT module kept as a cutover fallback that
+      "may" be removed later, with no concrete action)
+
+- drop-worthy P3s (4, confidence 0.55-0.74):
+    - vague performance concern without baseline ("could be slow
+      under load")
+    - theoretical multi-region concern not relevant to single-region
+      deployment
+    - nitpick about commit-message style in the rollout plan
+    - vague-phrasing preference (migration window described as "a
+      few days")
+The fixture includes multiple premise challenges at DIFFERENT scopes
+to exercise the multi-root synthesis path. Unlike the rename-shape
+fixture, the root candidates here are genuinely distinct (managed-auth
+migration vs. custom policy layer) — neither subsumes the other.
+-->
+
+# Seeded Auth Gateway Migration Plan
+
+## Problem Frame
+
+Our internal API gateway currently implements authentication via a hand-rolled JWT layer and a custom policy-enforcement module. This plan migrates the gateway to a managed auth service (via service-mesh integration) and introduces a new DSL-based policy layer.
+
+The migration affects 6 downstream services. No user-reported authentication failures motivated this work — the driver is infrastructure consolidation across teams.
+
+## Requirements Trace
+
+7 requirements planned:
+
+- R1. Integrate with the managed auth service via service-mesh adapter
+- R2. Retire the hand-rolled JWT signing / verification layer
+- R3. Implement the new policy DSL parser and per-route policy cache
+- R4. Migrate credential storage from app-local config to managed secrets
+- R5. Add token-refresh middleware for downstream services
+- R6. Coordinate cutover with downstream services' deploy cycles
+
+(Only 6 items listed despite "7 requirements" — seeded wrong-count
+safe_auto candidate.)
+
+## Scope Boundaries
+
+- Not changing the user-facing auth UX (login flows, error messages)
+- Not migrating non-gateway services' internal auth (out of scope for this phase)
+
+## Key Technical Decisions
+
+- Use the managed auth service's service-mesh adapter rather than direct SDK integration
+- Introduce a custom policy-DSL parser with a per-route policy cache layer (see Unit 9 for cache invalidation — seeded stale cross-reference; Unit 9 does not exist in this plan)
+- Store API keys in the managed secrets store; remove app-local config entries
+- Hand-roll the token-refresh loop (check expiry every 30s, renew if within 60s of expiry)
+
+(Uses "API key", "token", "credential", and "secret" interchangeably
+throughout — seeded terminology drift safe_auto candidate.)
+
+## Implementation Units
+
+- [ ] Unit 1: Service-mesh adapter integration
+
+**Goal:** Wire the gateway to the managed auth service via the mesh sidecar.
+
+**Files:** `internal/gateway/auth/mesh_adapter.go`
+
+**Approach:** Implement adapter interface against mesh sidecar. Fall back to legacy JWT layer during cutover window if adapter fails. (Seeded manual dependent of ROOT A: this complexity exists only because the migration is happening; if the migration premise is rejected, the adapter layer is unnecessary.)
+
+- [ ] Unit 2: Policy DSL parser
+
+**Goal:** Parse the new policy DSL and compile to a per-route evaluator.
+
+**Files:** `internal/gateway/policy/parser.go`, `internal/gateway/policy/evaluator.go`
+
+**Approach:** Write a recursive-descent parser. Cache compiled evaluators in a concurrent map keyed by route. (Seeded manual dependent of ROOT B: the parser exists solely to support the custom policy layer; if the custom policy-layer premise is rejected in favor of the managed service's native policy language, the parser is dead code.)
+
+- [ ] Unit 3: Per-route policy cache
+
+**Goal:** Cache compiled policy evaluators with LRU eviction.
+
+**Files:** `internal/gateway/policy/cache.go`
+
+**Approach:** Concurrent LRU keyed by `(route_id, policy_version)`. Invalidate on config reload.
+
+(Seeded manual dependent of ROOT B: cache design only matters if the custom policy layer exists.)
+
+- [ ] Unit 4: CSRF protection on new session endpoints
+
+**Goal:** Add CSRF checks on the three new session endpoints introduced by the migration.
+
+**Files:** `internal/gateway/auth/session.go`
+
+**Approach:** Check the `X-CSRF-Token` header against a session-scoped token stored server-side. Reject requests where the token is missing or mismatched. No double-submit cookie pattern because the gateway is same-origin.
+
+(Seeded gated_auto: OAuth2-Proxy ships a built-in CSRF middleware that handles this uniformly — including rotation and HMAC signing — which the hand-rolled version lacks. The hand-rolled check also omits the Origin header check that OAuth2-Proxy's default includes.)
+
+- [ ] Unit 5: Token-refresh middleware
+
+**Goal:** Refresh short-lived tokens before they expire.
+
+**Files:** `internal/gateway/auth/refresh.go`
+
+**Approach:** Poll token expiry every 30 seconds. If within 60 seconds of expiry, call refresh endpoint and swap the token in-place. Log refresh failures but continue serving with the old token until it expires.
+
+(Seeded gated_auto: the auth-service client library ships a refresh middleware that handles this uniformly — including backoff, concurrency guards against duplicate-refresh stampedes, and fail-closed semantics on refresh failure. The hand-rolled version is missing the concurrency guard and the fail-closed branch.)
+
+- [ ] Unit 6: Coordinate cutover with downstream services
+
+**Goal:** Coordinate the gateway's cutover with the 6 downstream services.
+
+**Files:** `docs/rollout/auth-cutover-plan.md`
+
+**Approach:** Stagger rollout over 3 business days. Gateway deploys first, then downstream services pick up the new auth contract over the following 48 hours.
+
+(Seeded gated_auto: no explicit deployment-ordering guarantee between the gateway's secrets-migration step and the downstream services' config reload — if the secrets migration lands before downstream services reload, they fail auth against the new store; if after, the gateway has no credentials for the window between its deploy and the migration. A dual-read or versioned-secrets pattern would close this.)
+
+## Risks
+
+- The migration's premise is "infrastructure consolidation." We have no user-reported auth failures and no stated reliability or security gap in the current hand-rolled layer. The consolidation benefit is real but speculative — this is a large refactor on a working system. (Seeded manual — ROOT A premise challenge: "Is migration to managed auth justified given no user-facing problem motivates it?")
+
+- The policy DSL is a new abstraction we build specifically for this gateway. The managed auth service ships its own policy language that covers 80% of our current rules natively. Hand-rolling the DSL means owning a parser, cache, and evaluator that the managed service would provide for free. (Seeded manual — ROOT B premise challenge: "Is the custom policy-enforcement layer warranted when the managed service ships one?")
+
+- The hand-rolled token-refresh loop has no concurrency guard; multiple goroutines may attempt refresh simultaneously under burst traffic, producing refresh-endpoint load spikes. (Seeded manual, independent of roots: this is an operational concern that exists regardless of which auth path is chosen.)
+
+## Miscellaneous Notes
+
+The managed secrets store introduces a new rotation workflow we don't currently have. This is net-new operational surface: we'd need runbooks for manual rotation, automatic-rotation settings, and break-glass access. (Seeded manual dependent of ROOT A: this workflow only exists because of the migration; if the migration is rejected, the rotation surface stays as-is.)
+
+Our error budget for the gateway is 0.1% monthly error rate. The plan does not state the expected error-rate impact of cutover, rollback criteria tied to the budget, or how the transition affects SLO burn. (Seeded manual independent of roots: operational obligation regardless of premise.)
+
+We name the session context struct `AuthContext` in the new code but the existing code uses `SessionContext` for the same concept. (Seeded FYI: naming preference — both are legible, no wrong answer.)
+
+The config-schema shape is fairly nested (4 levels deep) for a handful of flags. Could be flattened. (Seeded FYI: subjective readability note about schema shape.)
+
+We could reuse this auth adapter pattern for a hypothetical future mobile SDK. That SDK isn't currently on the roadmap. (Seeded FYI: speculative future-work concern with no current signal.)
+
+The gateway is single-region today. Multi-region is not on the near-term roadmap, but if it becomes relevant, the per-route policy cache would need cross-region invalidation. (Seeded drop: theoretical multi-region concern not relevant to current deployment, P3.)
+
+## PII Handling
+
+Migration touches user-identifier fields during the JWT layer retirement. (Seeded manual independent of roots: PII compliance gap that applies during the migration window regardless of which premise holds; even if both premises are accepted, the migration itself needs explicit PII-handling guidance.)
+
+## Deferred to Implementation
+
+- Exact SLO monitoring dashboards
+- Per-service rollout timing
+
+## Known Drift
+
+- The existing hand-rolled JWT module is retained for one release after cutover as a fallback path (Unit 1). We may remove it later. (Seeded FYI: drift note without concrete action, low-stakes.)
+
+- Unit-organization choice: units are grouped by component (adapter, parser, cache, CSRF, refresh, cutover) rather than by endpoint class. Reads fine either way. (Seeded FYI: unit-organization preference, no wrong answer.)
+
+## Low-Signal Residuals (Seeded Drop-Worthy P3s)
+
+- The new policy layer "could be slow under load" — no baseline or benchmark, speculative. (Seeded drop: vague performance concern without evidence, P3.)
+- Commit-message style in the rollout plan uses short subjects; some may prefer longer. (Seeded drop: nitpick about commit-message convention, P3.)
+- The migration window is described as "a few days" — could be tighter. (Seeded drop: vague-phrasing preference at P3 with no consequence.)
--- a/tests/fixtures/ce-doc-review/seeded-feature-plan.md
+++ b/tests/fixtures/ce-doc-review/seeded-feature-plan.md
@@ -0,0 +1,194 @@
+---
+title: Seeded Test Fixture — Notification Preferences Redesign
+type: feat
+status: active
+date: 2026-04-19
+---
+
+<!--
+This is a SEEDED TEST FIXTURE for ce-doc-review pipeline validation.
+Third fixture alongside seeded-plan.md (rename/infra) and
+seeded-auth-plan.md (auth migration). Designed to exercise three gaps
+the other fixtures do not cover:
+
+1. design-lens persona activation and calibration — the document
+   contains UI/UX content, user flows, visual hierarchy, and
+   interaction descriptions.
+2. Zero-root chain path — every finding in this fixture is
+   independent; no seeded premise challenges exist. The synthesis
+   pipeline should correctly skip chain grouping (report
+   "Chains: 0 roots" or omit the Chains line).
+3. Small-document / minimum-persona path — the document is ~130
+   lines (vs ~210 for the other fixtures) so the adversarial reviewer
+   should run Quick mode (produce ≤3 findings), and scope-guardian /
+   adversarial may not activate at all given the simpler shape.
+
+Deliberate design constraint: NO premise-level challenges. Every
+seeded finding is about execution details, not foundational
+assumptions. There is no "is this feature justified?" or "does this
+serve a real user problem?" shape. If any reviewer surfaces a
+premise-level concern anyway, that is a calibration signal worth
+flagging (over-charitable root identification).
+
+Seed map (run this plan through ce-doc-review to verify):
+
+- safe_auto candidates (3):
+    - wrong count (Requirements Trace says 5 requirements, list has 4)
+    - terminology drift ("preference" / "setting" / "config"
+      used interchangeably for the same concept)
+    - stale cross-reference (Miscellaneous Notes points at
+      `docs/guides/keyboard-nav.md`, which does not exist)
+
+- gated_auto candidates (3):
+    - missing accessibility labels on the toggle components —
+      framework has standard aria-label pattern
+    - missing loading/error state in the Save flow — standard
+      pattern exists in the codebase (cite existing component)
+    - missing confirmation dialog on "unsubscribe from all"
+      destructive action — codebase pattern exists
+
+- manual candidates (4, all INDEPENDENT, no premise roots):
+    - Grouping strategy: by channel (email/push/SMS) vs by topic
+      (comments/mentions/updates) — real tradeoff, both legitimate
+    - Default state for new users: all-on, all-off, or curated subset
+    - Save pattern: explicit Save button vs auto-save on toggle
+    - Admin enforcement: can org admins enforce preferences, and
+      with what override UX
+
+- FYI candidates (3):
+    - naming preference ("Notification Center" vs "Preferences" vs
+      "Settings" — any works)
+    - micro-interaction suggestion (animate toggle state changes,
+      low-stakes)
+    - speculative analytics-event addition (not required by any
+      stated goal)
+
+- drop-worthy P3s (2):
+    - vague style nitpick on the mock layout
+    - theoretical i18n concern when no localization is in scope
+
+Expected pipeline behavior:
+- design-lens activates (UI/UX content triggers it) and produces
+  findings specific to its scope.
+- scope-guardian may activate lightly (no priority tiers, ≤5
+  requirements) or not at all.
+- adversarial: either does not activate or runs Quick mode with ≤3
+  findings.
+- Chains: 0 roots (no premise challenges exist; chain grouping
+  skipped). This is the key new-path test.
+- Engagement burden expected: 3 applied + 3 gated + ~4-5 manual
+  + ~2-3 FYI = roughly 7-10 user decisions, none of which cascade.
+
+The absence of a chain is itself the test result — if a chain appears,
+a reviewer has over-elevated an execution finding to premise-root
+status, which is worth investigating.
+-->
+
+# Notification Preferences Redesign
+
+## Problem Frame
+
+Users currently manage notification preferences through a linear list of 18 toggle switches on a single screen. In-app analytics show a 6% engagement rate with the page and a support-ticket volume averaging 12/month for "I'm getting too many notifications" — both metrics documented in the Growth team's Q1 2026 review. This redesign restructures the page for faster comprehension and reduces support volume by giving users clearer control.
+
+## Requirements Trace
+
+5 requirements planned:
+
+- R1. Group preferences by a meaningful dimension (channel, topic, or both)
+- R2. Provide a bulk-action affordance for common preference sets
+- R3. Add accessibility labels and keyboard navigation to the new controls
+- R4. Preserve existing preference values during the migration
+
+(Only 4 items listed despite "5 requirements" — seeded wrong-count safe_auto candidate.)
+
+## User Flows
+
+**Primary flow — change one setting:**
+1. User opens Notification Preferences from the account menu
+2. User sees the grouped layout with current values
+3. User toggles one control
+4. System persists the change (save pattern is an open question — see Miscellaneous Notes)
+
+**Secondary flow — bulk unsubscribe:**
+1. User clicks "Turn off all notifications" at the top of the page
+2. System applies the change to every preference in the page
+3. User sees a confirmation that changes were applied
+
+(Seeded gated_auto: the destructive bulk-unsubscribe action has no confirmation dialog. The codebase pattern for destructive bulk actions — see `components/confirm-dialog.tsx` — is used elsewhere in the settings surface and would apply cleanly here.)
+
+## Implementation Units
+
+- [ ] Unit 1: Group preferences by a chosen dimension
+
+**Goal:** Restructure the preference list into groups based on the chosen dimension.
+
+**Files:** `src/routes/settings/notifications/page.tsx`, `src/routes/settings/notifications/group.tsx`
+
+**Approach:** Render one `<PreferenceGroup>` component per group. Each group has a header and a body containing the toggles. Groups are expanded by default.
+
+- [ ] Unit 2: Bulk-action affordances
+
+**Goal:** Add a bulk-action row at the top of the page with an "Off" switch that turns off every preference at once.
+
+**Files:** `src/routes/settings/notifications/bulk-actions.tsx`
+
+**Approach:** One toggle at the page root that cascades to every child toggle when activated.
+
+- [ ] Unit 3: Accessibility labels and keyboard navigation
+
+**Goal:** Every new toggle has an aria-label, a visible focus ring, and is reachable via tab order.
+
+**Files:** `src/routes/settings/notifications/group.tsx`, `src/routes/settings/notifications/toggle.tsx`
+
+**Approach:** Pass `aria-label` through the `<Toggle>` prop interface. (Seeded gated_auto: the `<Toggle>` component in `src/components/toggle.tsx` does not currently accept an `aria-label` prop — implementer must extend the interface. The component's existing `label` prop is rendered visually; screen readers would announce both unless `aria-labelledby` is used. The codebase convention — see `src/components/toggle.tsx` line 34 — is to pass a hidden label via `aria-label` when the visible label is not the screen-reader-friendly string.)
+
+- [ ] Unit 4: Persist preferences during migration
+
+**Goal:** The redesign ships as a replacement; existing preference values must be preserved.
+
+**Files:** `src/db/migrations/20260419_notification_preferences_shape.sql`
+
+**Approach:** Data model is unchanged; only the rendering layer is updated. No migration required beyond the UI swap.
+
+## Design Notes
+
+**Visual hierarchy:** Each group has a bold header, a lighter description, and the toggles in a vertical stack. Spacing between groups uses the same token as other settings surfaces (`space-6`).
+
+**Toggle states:** Default (off), On, Saving, Error. The current design mocks show the Default and On states; Saving and Error are not represented. (Seeded gated_auto: the codebase Save-flow convention — see `src/components/async-button.tsx` — is to show a subtle spinner on the interacting control during the pending state and a toast with retry on error. The plan's Save flow needs these states explicit.)
+
+**Grouping dimension — open question.** The design mocks show grouping by channel (Email, Push, SMS). Product has also argued for grouping by topic (Comments, Mentions, Updates, Marketing). Both structures work; the tradeoff is:
+- Channel-grouped: users who want to kill push but keep email scan faster
+- Topic-grouped: users who want to turn off marketing but keep mentions scan faster
+
+(Seeded manual: real tradeoff with no objectively correct answer. This is a product decision, not a design-correctness finding.)
+
+## Scope Boundaries
+
+- Not changing the underlying data model or preference-evaluation logic
+- Not localizing the strings in this phase (all strings English-only)
+- Not touching admin-side controls (org admin enforcement is covered in a separate initiative)
+
+## Miscellaneous Notes
+
+**Save pattern — open question.** The current page uses an explicit "Save" button at the bottom. The redesign mocks show auto-save on toggle. Tradeoff:
+- Explicit save: users can experiment and discard
+- Auto-save: one fewer interaction, matches platform conventions
+
+(Seeded manual: save-pattern choice has real tradeoffs, neither is wrong.)
+
+**Admin enforcement.** Org admins may want to enforce certain notification preferences (e.g., mandatory security-alert emails). This plan assumes admin enforcement is out of scope per Scope Boundaries, but the grouping and default-state decisions in this plan should not foreclose that future. (Seeded manual: the plan must decide whether to preemptively accommodate admin enforcement or defer it entirely.)
+
+**Default state for new users.** All-on produces the current high-support-ticket problem; all-off silences potentially important notifications; curated subset requires us to pick which subset. (Seeded manual: real product decision, no objectively correct answer.)
+
+**Terminology:** We use "preference," "setting," and "config" in different places to mean the same thing. The design mock header says "Notification Preferences" but the navigation link says "Notification Settings" and the codebase file is `notification-config.ts`. (Seeded safe_auto: terminology drift; dominant term is "preference" based on the mock and the user-facing label.)
+
+**Naming the page.** The current nav link says "Notification Settings"; the design mock header says "Notification Preferences"; product marketing uses "Notification Center." Any of these is legible. (Seeded FYI: naming preference, low-stakes.)
+
+**Cross-reference in Unit 3: see existing keyboard navigation guide in `docs/guides/keyboard-nav.md` (Section 4 — Form Controls) for the canonical tab-order pattern.** (Seeded safe_auto: this file does not exist in the repo; the reference is stale. Remove or point at a real target.)
+
+**Animate toggle state changes.** A small state-change animation (150ms ease) would feel more polished. Not required by any stated goal. (Seeded FYI: micro-interaction, low-stakes.)
+
+**Analytics event suggestion.** We could emit a `notification_preference_changed` event with the before/after value. Useful for future Growth analysis but not required by any requirement. (Seeded FYI: speculative analytics addition, not tied to stated goals.)
+
+## Low-Signal Residuals (Seeded Drop-Worthy P3s)
+
+- The mock layout "feels a little tight" — subjective style nitpick without evidence of impact. (Seeded drop: vague style preference at P3.)
+- If we ever localize, the group headers will need translation. Localization is explicitly out of scope. (Seeded drop: theoretical i18n concern with no current relevance, P3.)
--- a/tests/fixtures/ce-doc-review/seeded-plan.md
+++ b/tests/fixtures/ce-doc-review/seeded-plan.md
@@ -0,0 +1,213 @@
+---
+title: Seeded Test Fixture for ce-doc-review Pipeline Validation
+type: feat
+status: active
+date: 2026-04-18
+---
+
+<!--
+This is a SEEDED TEST FIXTURE for ce-doc-review pipeline validation.
+It contains deliberately-planted issues across each tier shape so the
+new synthesis pipeline (safe_auto / gated_auto / manual / FYI / dropped)
+can be measured against known expected classifications.
+
+Seed map (run this plan through ce-doc-review to verify):
+
+- safe_auto candidates (3): wrong count (Requirements Trace says 6, list
+  has 5), terminology drift (data store vs database used interchangeably),
+  stale cross-reference (see-Unit-7 but no Unit 7 exists)
+- gated_auto candidates (3): missing fallback-with-deprecation-warning on
+  rename, deployment-ordering guarantee missing between skill+code commit,
+  framework-native-API substitution (hand-rolled deprecation vs using
+  cobra's Deprecated field)
+- manual candidates (5): scope-guardian tension (Unit 2 could be merged
+  with Unit 3), product-lens premise question (is the refactor the right
+  solution), coherence design tension (two sections disagree on status),
+  scope-guardian complexity challenge (is this abstraction warranted),
+  product-lens trajectory concern (does this paint the system into a
+  corner)
+- FYI candidates (5, confidence 0.40-0.65 at P3): filename-symmetry
+  observation, drift note, stylistic preference without evidence of
+  impact, speculative future-work concern, subjective readability note
+- drop-worthy P3s (3, confidence 0.55-0.74): vague style nitpick, low-
+  signal "consider X" residual, theoretical scalability concern without
+  current evidence
+
+The descriptions intentionally vary in evidence quality so the confidence
+gate is exercised.
+-->
+
+# Seeded Test Fixture Plan
+
+## Problem Frame
+
+This fixture exercises the ce-doc-review pipeline against representative
+issue shapes. The imagined feature is a refactor renaming the `crowd-sniff`
+CLI command to `browser-sniff` across 6 implementation units, with
+alias-compatibility, skill updates, and a schema migration.
+
+## Requirements Trace
+
+6 requirements planned:
+
+- R1. Rename command and add deprecation alias
+- R2. Update skills that invoke the command
+- R3. Rename output files from `crowd-report` to `browser-report`
+- R4. Migrate data store entries that reference the old name
+- R5. Update CLI tests
+
+(Only 5 items listed despite "6 requirements" — seeded wrong-count
+safe_auto candidate.)
+
+## Scope Boundaries
+
+- Not changing the command's runtime behavior
+- Not changing consumer-facing output formats beyond the rename
+
+## Key Technical Decisions
+
+- Keep a hidden alias `crowd-sniff` for backward compatibility (see Unit 7
+  below for alias deprecation plan — seeded stale cross-reference; Unit 7
+  does not exist in this plan)
+- Store deprecation state in the data store
+- Emit deprecation warning when alias is used
+
+(Uses "data store" here and "database" elsewhere — seeded terminology
+drift safe_auto candidate.)
+
+## Implementation Units
+
+- [ ] Unit 1: Rename the CLI command
+
+**Goal:** Rename `crowd-sniff` to `browser-sniff` in the CLI framework.
+
+**Files:** `internal/cli/crowd_sniff.go`
+
+**Approach:** Move the command definition. Keep the old name as an alias.
+Print a one-line deprecation warning to stdout when alias is used. (Seeded
+gated_auto: cobra's native `Deprecated` field handles this uniformly;
+hand-rolling the deprecation warning duplicates framework behavior.)
+
+**Test scenarios:**
+
+- Happy path: `browser-sniff` runs without warning
+- Happy path: `crowd-sniff` runs and prints deprecation warning
+- Edge case: `-h` on either variant shows the same help
+
+- [ ] Unit 2: Update skills to invoke new command
+
+**Goal:** Update every skill that shells out to `crowd-sniff` to call
+`browser-sniff` instead.
+
+**Files:** `plugins/*/skills/*/SKILL.md` (grep for "crowd-sniff")
+
+**Approach:** sed rename across skill files. Keep alias working for
+external consumers that may still invoke `crowd-sniff` directly.
+
+(Seeded manual: this unit could be merged with Unit 3 since both update
+consumer sites that will deploy together — scope-guardian candidate for
+"Units 2 and 3 could be one unit.")
+
+- [ ] Unit 3: Rename output files
+
+**Goal:** Change output filename from `crowd-report.md` to
+`browser-report.md`.
+
+**Files:** `internal/cli/output.go`, `internal/pipeline/writer.go`
+
+**Approach:** Write new name, read new name. No fallback — consumers that
+read `crowd-report.md` will need to update. (Seeded gated_auto: missing
+fallback-with-deprecation-warning on rename; mid-flight consumers and
+published content will silently fail. Industry-standard pattern is read
+new name first, fall back to old with warning for one release.)
+
+**Test scenarios:**
+
+- Happy path: new writes go to `browser-report.md`
+
+(Seeded FYI: test coverage only covers the happy path and misses the
+read-side failure modes entirely, but flagging this is low-signal since
+the unit explicitly chose no-fallback.)
+
+- [ ] Unit 4: Migrate data store entries
+
+**Goal:** Update database entries that reference the old name.
+
+**Files:** `db/migrate/20260418_rename_crowd_sniff.rb`
+
+**Approach:** Single-transaction migration. No deployment-ordering
+guarantee between this migration and the code changes in Units 1-3. If
+the migration runs before Units 1-3 land, the code reads stale data.
+If after, new code temporarily sees old entries until migration runs.
+(Seeded gated_auto: deployment-ordering guarantee missing; concrete fix
+is to require Units 1-4 land in a single commit/PR.)
+
+- [ ] Unit 5: Update CLI tests
+
+**Goal:** Update CLI tests to exercise both names.
+
+**Files:** `internal/cli/cli_test.go`
+
+**Approach:** Add test coverage for the new command name and the alias
+behavior.
+
+**Test scenarios:**
+
+- Happy path: new name test
+- Happy path: alias name test with deprecation warning assertion
+
+## Risks
+
+- The filename rename affects downstream consumers' readers. The chosen
+  approach (no-fallback) is subjective and could go either way — keeping
+  strict "move on" semantics vs. backward-compatible read fallback.
+  (Seeded manual: genuine design tension between "clean break" and
+  "compatibility period"; scope-guardian vs. product-lens judgment call.)
+
+- The alias is compatibility theater if there are no external consumers.
+  We don't have evidence of external consumers. (Seeded manual:
+  product-lens premise challenge — "is the alias justified given no
+  external consumers are documented?")
+
+## Miscellaneous Notes
+
+The filename `browser-report.md` is asymmetric with the command name
+`browser-sniff` — there's no `-sniff-report.md`. This could go either way
+depending on whether command/output parity is valued. (Seeded FYI:
+filename asymmetry observation, no wrong answer, low-stakes.)
+
+Consider renaming the database column `crowd_data` to `browser_data` for
+consistency. (Seeded FYI: stylistic preference without evidence of
+impact.)
+
+The refactor may paint the system into a corner if we later want to
+support both crowd-based and browser-based sniffing. (Seeded manual:
+product-lens trajectory concern about future path dependencies.)
+
+## Deferred to Implementation
+
+- Exact deprecation message wording
+- Release notes phrasing
+
+## Known Drift
+
+`crowd_data` column name remains in the data store schema (legacy). We
+may rename it later. (Seeded FYI: drift note without concrete fix.)
+
+## Abstraction Commentary
+
+The refactor introduces an `AliasedCommand` abstraction to bundle the
+rename + deprecation-warning behavior. This might be overkill for a
+one-command rename. (Seeded manual: scope-guardian complexity challenge
+— is the abstraction warranted for one use case?)
+
+## Low-Signal Residuals (Seeded Drop-Worthy P3s)
+
+- The plan's section ordering could be improved; "Miscellaneous Notes"
+  feels like a catch-all. (Seeded drop: vague style nitpick at P3;
+  confidence should register below the 0.75 gate.)
+- Consider whether the schema migration strategy scales if the codebase
+  grows 10x. (Seeded drop: theoretical scalability concern without
+  current evidence, P3.)
+- Some sentences could be tighter. (Seeded drop: low-signal "consider X"
+  at P3.)
--- a/tests/pipeline-review-contract.test.ts
+++ b/tests/pipeline-review-contract.test.ts
@@ -353,3 +353,278 @@ describe("ce-plan review contract", () => {
    expect(content).not.toContain("**Options for Standard or Lightweight plans:**")
  })
 })
+
+describe("ce-doc-review contract", () => {
+  test("findings-schema autofix_class enum uses ce-code-review-aligned tier names", async () => {
+    const schema = JSON.parse(
+      await readRepoFile("plugins/compound-engineering/skills/ce-doc-review/references/findings-schema.json")
+    )
+    const enumValues = schema.properties.findings.items.properties.autofix_class.enum
+
+    // Three-tier system aligned with ce-code-review's first three tier names
+    expect(enumValues).toEqual(["safe_auto", "gated_auto", "manual"])
+
+    // No advisory tier — advisory-style findings surface as an FYI subsection at presentation layer
+    expect(enumValues).not.toContain("advisory")
+
+    // Old tier names must be gone after the rename
+    expect(enumValues).not.toContain("auto")
+    expect(enumValues).not.toContain("present")
+  })
+
+  test("subagent template carries framing guidance and strawman rule", async () => {
+    const template = await readRepoFile(
+      "plugins/compound-engineering/skills/ce-doc-review/references/subagent-template.md"
+    )
+
+    // Framing guidance block present
+    expect(template).toContain("observable consequence")
+    expect(template).toContain("2-4 sentences")
+
+    // Strawman-aware classification rule
+    expect(template).toContain("Strawman-aware classification rule")
+    expect(template).toContain("is NOT a real alternative")
+
+    // Strawman safeguard on safe_auto
+    expect(template).toContain("Strawman safeguard")
+
+    // Persona exclusion of Open Questions section (prevents round-2 feedback loop)
+    expect(template).toContain("Exclude prior-round deferred entries")
+    expect(template).toContain("Deferred / Open Questions")
+
+    // Decision primer slot and rules
+    expect(template).toContain("{decision_primer}")
+    expect(template).toContain("<decision-primer-rules>")
+  })
+
+  test("synthesis pipeline routes three tiers with per-severity gates and FYI subsection", async () => {
+    const synthesis = await readRepoFile(
+      "plugins/compound-engineering/skills/ce-doc-review/references/synthesis-and-presentation.md"
+    )
+
+    // Per-severity confidence gate with the specific thresholds
+    expect(synthesis).toContain("Per-Severity")
+    expect(synthesis).toMatch(/P0\s*\|\s*0\.50/)
+    expect(synthesis).toMatch(/P1\s*\|\s*0\.60/)
+    expect(synthesis).toMatch(/P2\s*\|\s*0\.65/)
+    expect(synthesis).toMatch(/P3\s*\|\s*0\.75/)
+
+    // FYI floor at 0.40 for low-confidence manual findings
+    expect(synthesis).toContain("0.40")
+    expect(synthesis).toContain("FYI floor")
+
+    // Three-tier routing table present
+    expect(synthesis).toContain("`safe_auto`")
+    expect(synthesis).toContain("`gated_auto`")
+    expect(synthesis).toContain("`manual`")
+
+    // Cross-persona agreement boost (replaces residual-concern promotion)
+    expect(synthesis).toContain("Cross-Persona Agreement Boost")
+    expect(synthesis).toContain("+0.10")
+
+    // R29 and R30 round-2 rules
+    expect(synthesis).toContain("R29 Rejected-Finding Suppression")
+    expect(synthesis).toContain("R30 Fix-Landed Matching Predicate")
+  })
+
+  test("headless envelope surfaces new tiers distinctly", async () => {
+    const synthesis = await readRepoFile(
+      "plugins/compound-engineering/skills/ce-doc-review/references/synthesis-and-presentation.md"
+    )
+
+    // Bucket headers for the new tiers appear in the headless envelope template.
+    // User-facing vocabulary: fixes / Proposed fixes / Decisions / FYI observations
+    // maps to the safe_auto / gated_auto / manual enum values plus the presentation-layer FYI subsection.
+    expect(synthesis).toContain("Applied N fixes")
+    expect(synthesis).toContain("Proposed fixes")
+    expect(synthesis).toContain("Decisions")
+    expect(synthesis).toContain("FYI observations")
+
+    // Terminal signal preserved for programmatic callers
+    expect(synthesis).toContain("Review complete")
+  })
+
+  test("terminal question is three-option by default with label adaptation", async () => {
+    const synthesis = await readRepoFile(
+      "plugins/compound-engineering/skills/ce-doc-review/references/synthesis-and-presentation.md"
+    )
+
+    // Three options when fixes are queued
+    expect(synthesis).toContain("Apply decisions and proceed to <next stage>")
+    expect(synthesis).toContain("Apply decisions and re-review")
+    expect(synthesis).toContain("Exit without further action")
+
+    // Two options in the zero-actionable case with the adapted label
+    expect(synthesis).toContain("fixes_applied_count == 0")
+    expect(synthesis).toContain("zero-actionable case")
+
+    // Next-stage substitution rules documented
+    expect(synthesis).toContain("Requirements document")
+    expect(synthesis).toContain("Plan document")
+    expect(synthesis).toContain("ce-plan")
+    expect(synthesis).toContain("ce-work")
+  })
+
+  test("SKILL.md has Interactive mode rules with AskUserQuestion pre-load", async () => {
+    const content = await readRepoFile(
+      "plugins/compound-engineering/skills/ce-doc-review/SKILL.md"
+    )
+
+    // Interactive mode rules section at top
+    expect(content).toContain("## Interactive mode rules")
+    expect(content).toContain("AskUserQuestion")
+    expect(content).toContain("ToolSearch")
+    expect(content).toContain("numbered-list fallback")
+
+    // Decision primer variable in the dispatch table
+    expect(content).toContain("{decision_primer}")
+    expect(content).toContain("<prior-decisions>")
+
+    // References loaded lazily via backtick paths for walkthrough and bulk-preview
+    expect(content).toContain("`references/walkthrough.md`")
+    expect(content).toContain("`references/bulk-preview.md`")
+  })
+
+  test("walkthrough and bulk-preview reference files exist with required mechanics", async () => {
+    const walkthrough = await readRepoFile(
+      "plugins/compound-engineering/skills/ce-doc-review/references/walkthrough.md"
+    )
+    const bulkPreview = await readRepoFile(
+      "plugins/compound-engineering/skills/ce-doc-review/references/bulk-preview.md"
+    )
+
+    // Routing question distinguishing words present (front-loaded per AGENTS.md Interactive Question Tool Design)
+    expect(walkthrough).toContain("Review each finding one by one")
+    expect(walkthrough).toContain("LFG")
+    expect(walkthrough).toContain("Append findings to the doc's Open Questions section")
+    expect(walkthrough).toContain("Report only")
+
+    // Four per-finding options
+    expect(walkthrough).toContain("Apply the proposed fix")
+    expect(walkthrough).toContain("Defer — append to the doc's Open Questions section")
+    expect(walkthrough).toContain("Skip — don't apply, don't append")
+    expect(walkthrough).toContain("LFG the rest")
+
+    // Recommended marker mandatory
+    expect(walkthrough).toContain("(recommended)")
+
+    // No advisory variant (advisory is a presentation-layer concept, not a walkthrough option)
+    expect(walkthrough).not.toContain("Acknowledge — mark as reviewed")
+
+    // No tracker-detection machinery (ce-doc-review has no external tracker)
+    expect(walkthrough).not.toContain("named_sink_available")
+    expect(walkthrough).not.toContain("any_sink_available")
+    expect(walkthrough).not.toContain("[TRACKER]")
+
+    // Bulk preview has Proceed/Cancel options and the three bucket labels
+    expect(bulkPreview).toContain("Proceed")
+    expect(bulkPreview).toContain("Cancel")
+    expect(bulkPreview).toContain("Applying (N):")
+    expect(bulkPreview).toContain("Appending to Open Questions (N):")
+    expect(bulkPreview).toContain("Skipping (N):")
+
+    // No Acknowledge bucket in bulk preview either
+    expect(bulkPreview).not.toContain("Acknowledging (N):")
+  })
+
+  test("open-questions-defer reference implements append mechanic with failure path", async () => {
+    const defer = await readRepoFile(
+      "plugins/compound-engineering/skills/ce-doc-review/references/open-questions-defer.md"
+    )
+
+    // Append mechanic steps
+    expect(defer).toContain("## Deferred / Open Questions")
+    expect(defer).toContain("### From YYYY-MM-DD review")
+
+    // Entry format includes required fields but excludes suggested_fix and evidence
+    expect(defer).toContain("{title}")
+    expect(defer).toContain("{severity}")
+    expect(defer).toContain("{reviewer}")
+    expect(defer).toContain("{confidence}")
+    expect(defer).toContain("{why_it_matters}")
+
+    // Failure-path sub-question with three options
+    expect(defer).toContain("Retry")
+    expect(defer).toContain("Record the deferral in the completion report only")
+    expect(defer).toContain("Convert this finding to Skip")
+
+    // No tracker-detection logic (this is the in-doc defer path, not tracker-defer)
+    expect(defer).not.toContain("named_sink_available")
+    expect(defer).not.toContain("[TRACKER]")
+  })
+})
+
+describe("ce-compound frontmatter schema expansion contract", () => {
+  test("problem_type enum includes the four new knowledge-track values", async () => {
+    const schema = await readRepoFile(
+      "plugins/compound-engineering/skills/ce-compound/references/schema.yaml"
+    )
+
+    // Four new knowledge-track values present in the enum
+    expect(schema).toContain("architecture_pattern")
+    expect(schema).toContain("design_pattern")
+    expect(schema).toContain("tooling_decision")
+    expect(schema).toContain("convention")
+
+    // best_practice remains valid as fallback
+    expect(schema).toContain("best_practice")
+  })
+
+  test("ce-compound-refresh schema stays in sync with canonical ce-compound schema", async () => {
+    const canonical = await readRepoFile(
+      "plugins/compound-engineering/skills/ce-compound/references/schema.yaml"
+    )
+    const refresh = await readRepoFile(
+      "plugins/compound-engineering/skills/ce-compound-refresh/references/schema.yaml"
+    )
+
+    // Duplicate schemas must be identical (kept in sync intentionally per AGENTS.md)
+    expect(refresh).toEqual(canonical)
+  })
+
+  test("yaml-schema.md documents category mappings for the four new values", async () => {
+    const mapping = await readRepoFile(
+      "plugins/compound-engineering/skills/ce-compound/references/yaml-schema.md"
+    )
+
+    expect(mapping).toContain("architecture_pattern` -> `docs/solutions/architecture-patterns/")
+    expect(mapping).toContain("design_pattern` -> `docs/solutions/design-patterns/")
+    expect(mapping).toContain("tooling_decision` -> `docs/solutions/tooling-decisions/")
+    expect(mapping).toContain("convention` -> `docs/solutions/conventions/")
+  })
+})
+
+describe("ce-learnings-researcher domain-agnostic contract", () => {
+  test("agent prompt frames as domain-agnostic not bug-focused", async () => {
+    const agent = await readRepoFile(
+      "plugins/compound-engineering/agents/research/ce-learnings-researcher.agent.md"
+    )
+
+    // Domain-agnostic identity framing
+    expect(agent).toContain("domain-agnostic institutional knowledge researcher")
+
+    // Multiple learning shapes named as first-class
+    expect(agent).toContain("Architecture patterns")
+    expect(agent).toContain("Design patterns")
+    expect(agent).toContain("Tooling decisions")
+    expect(agent).toContain("Conventions")
+
+    // Structured <work-context> input accepted
+    expect(agent).toContain("<work-context>")
+    expect(agent).toContain("Activity:")
+    expect(agent).toContain("Concepts:")
+    expect(agent).toContain("Decisions:")
+    expect(agent).toContain("Domains:")
+
+    // Dynamic subdirectory probe replaces hardcoded category table
+    expect(agent).toContain("Probe")
+    expect(agent).toContain("discover which subdirectories actually exist")
+
+    // Critical-patterns.md read is conditional, not assumed
+    expect(agent).toMatch(/critical-patterns\.md.*exists/i)
+
+    // Integration Points list no longer includes ce-doc-review (agent is ce-plan-owned)
+    const integration = agent.substring(agent.indexOf("Integration Points"))
+    expect(integration).not.toContain("ce-doc-review")
+  })
+})