john/claude-engineering-plugin

Trevin Chow c1f68d4d55

feat(doc-review, learnings-researcher): tiers, chain grouping, rewrite (#601)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

2026-04-19 20:25:47 -07:00

title	type	status	date
Seeded Test Fixture — Auth Gateway Migration Plan	feat	active	2026-04-19

Seeded Auth Gateway Migration Plan

Problem Frame

Our internal API gateway currently implements authentication via a hand-rolled JWT layer and a custom policy-enforcement module. This plan migrates the gateway to a managed auth service (via service-mesh integration) and introduces a new DSL-based policy layer.

The migration affects 6 downstream services. No user-reported authentication failures motivated this work — the driver is infrastructure consolidation across teams.

Requirements Trace

7 requirements planned:

R1. Integrate with the managed auth service via service-mesh adapter
R2. Retire the hand-rolled JWT signing / verification layer
R3. Implement the new policy DSL parser and per-route policy cache
R4. Migrate credential storage from app-local config to managed secrets
R5. Add token-refresh middleware for downstream services
R6. Coordinate cutover with downstream services' deploy cycles

(Only 6 items listed despite "7 requirements" — seeded wrong-count safe_auto candidate.)

Scope Boundaries

Not changing the user-facing auth UX (login flows, error messages)
Not migrating non-gateway services' internal auth (out of scope for this phase)

Key Technical Decisions

Use the managed auth service's service-mesh adapter rather than direct SDK integration
Introduce a custom policy-DSL parser with a per-route policy cache layer (see Unit 9 for cache invalidation — seeded stale cross-reference; Unit 9 does not exist in this plan)
Store API keys in the managed secrets store; remove app-local config entries
Hand-roll the token-refresh loop (check expiry every 30s, renew if within 60s of expiry)

(Uses "API key", "token", "credential", and "secret" interchangeably throughout — seeded terminology drift safe_auto candidate.)

Implementation Units

Unit 1: Service-mesh adapter integration

Goal: Wire the gateway to the managed auth service via the mesh sidecar.

Files: internal/gateway/auth/mesh_adapter.go

Approach: Implement adapter interface against mesh sidecar. Fall back to legacy JWT layer during cutover window if adapter fails. (Seeded manual dependent of ROOT A: this complexity exists only because the migration is happening; if the migration premise is rejected, the adapter layer is unnecessary.)

Unit 2: Policy DSL parser

Goal: Parse the new policy DSL and compile to a per-route evaluator.

Files: internal/gateway/policy/parser.go, internal/gateway/policy/evaluator.go

Approach: Write a recursive-descent parser. Cache compiled evaluators in a concurrent map keyed by route. (Seeded manual dependent of ROOT B: the parser exists solely to support the custom policy layer; if the custom policy-layer premise is rejected in favor of the managed service's native policy language, the parser is dead code.)

Unit 3: Per-route policy cache

Goal: Cache compiled policy evaluators with LRU eviction.

Files: internal/gateway/policy/cache.go

Approach: Concurrent LRU keyed by (route_id, policy_version). Invalidate on config reload.

(Seeded manual dependent of ROOT B: cache design only matters if the custom policy layer exists.)
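
A minimal sketch of the Unit 3 cache, assuming a mutex-guarded LRU built on the standard library's `container/list`. The type names (`cacheKey`, `Evaluator`, `PolicyCache`) and the choice to drop every entry on reload are illustrative assumptions; the plan only states the key shape and that invalidation happens on config reload.

```go
package main

import (
	"container/list"
	"fmt"
	"sync"
)

// cacheKey mirrors the (route_id, policy_version) key named in the plan.
type cacheKey struct {
	RouteID       string
	PolicyVersion int
}

// Evaluator is a placeholder for a compiled policy; its signature is assumed.
type Evaluator func(request string) bool

type entry struct {
	key  cacheKey
	eval Evaluator
}

// PolicyCache is a fixed-capacity LRU that is safe for concurrent use.
type PolicyCache struct {
	mu       sync.Mutex
	capacity int
	order    *list.List // front = most recently used
	items    map[cacheKey]*list.Element
}

func NewPolicyCache(capacity int) *PolicyCache {
	return &PolicyCache{
		capacity: capacity,
		order:    list.New(),
		items:    make(map[cacheKey]*list.Element),
	}
}

// Get returns the cached evaluator and marks it as most recently used.
func (c *PolicyCache) Get(k cacheKey) (Evaluator, bool) {
	c.mu.Lock()
	defer c.mu.Unlock()
	el, ok := c.items[k]
	if !ok {
		return nil, false
	}
	c.order.MoveToFront(el)
	return el.Value.(*entry).eval, true
}

// Put stores an evaluator and evicts the least recently used entry when full.
func (c *PolicyCache) Put(k cacheKey, e Evaluator) {
	c.mu.Lock()
	defer c.mu.Unlock()
	if el, ok := c.items[k]; ok {
		el.Value.(*entry).eval = e
		c.order.MoveToFront(el)
		return
	}
	c.items[k] = c.order.PushFront(&entry{key: k, eval: e})
	if c.order.Len() > c.capacity {
		oldest := c.order.Back()
		c.order.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

// Invalidate drops every entry; intended to be called on config reload.
func (c *PolicyCache) Invalidate() {
	c.mu.Lock()
	defer c.mu.Unlock()
	c.order.Init()
	c.items = make(map[cacheKey]*list.Element)
}

// Len reports the number of cached evaluators.
func (c *PolicyCache) Len() int {
	c.mu.Lock()
	defer c.mu.Unlock()
	return c.order.Len()
}

func main() {
	cache := NewPolicyCache(2)
	allow := func(string) bool { return true }
	cache.Put(cacheKey{RouteID: "orders", PolicyVersion: 1}, allow)
	cache.Put(cacheKey{RouteID: "users", PolicyVersion: 1}, allow)
	cache.Put(cacheKey{RouteID: "billing", PolicyVersion: 1}, allow) // evicts "orders"
	_, ok := cache.Get(cacheKey{RouteID: "orders", PolicyVersion: 1})
	fmt.Println("orders cached:", ok, "size:", cache.Len())
}
```

Because the policy version is part of the key, a version bump naturally misses the cache; the old entry then ages out through LRU eviction or is cleared on the next reload.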

Unit 4: CSRF protection on new session endpoints

Goal: Add CSRF checks on the three new session endpoints introduced by the migration.

Files: internal/gateway/auth/session.go

Approach: Check the X-CSRF-Token header against a session-scoped token stored server-side. Reject requests where the token is missing or mismatched. No double-submit cookie pattern because the gateway is same-origin.

(Seeded gated_auto: OAuth2-Proxy ships a built-in CSRF middleware that handles this uniformly — including rotation and HMAC signing — which the hand-rolled version lacks. The hand-rolled check also omits the Origin header check that OAuth2-Proxy's default includes.)

Unit 5: Token-refresh middleware

Goal: Refresh short-lived tokens before they expire.

Files: internal/gateway/auth/refresh.go

Approach: Poll token expiry every 30 seconds. If within 60 seconds of expiry, call refresh endpoint and swap the token in-place. Log refresh failures but continue serving with the old token until it expires.

(Seeded gated_auto: the auth-service client library ships a refresh middleware that handles this uniformly — including backoff, concurrency guards against duplicate-refresh stampedes, and fail-closed semantics on refresh failure. The hand-rolled version is missing the concurrency guard and the fail-closed branch.)

Unit 6: Coordinate cutover with downstream services

Goal: Coordinate the gateway's cutover with the 6 downstream services.

Files: docs/rollout/auth-cutover-plan.md

Approach: Stagger rollout over 3 business days. Gateway deploys first, then downstream services pick up the new auth contract over the following 48 hours.

(Seeded gated_auto: no explicit deployment-ordering guarantee between the gateway's secrets-migration step and the downstream services' config reload — if the secrets migration lands before downstream services reload, they fail auth against the new store; if after, the gateway has no credentials for the window between its deploy and the migration. A dual-read or versioned-secrets pattern would close this.)

Risks

The migration's premise is "infrastructure consolidation." We have no user-reported auth failures and no stated reliability or security gap in the current hand-rolled layer. The consolidation benefit is real but speculative — this is a large refactor on a working system. (Seeded manual — ROOT A premise challenge: "Is migration to managed auth justified given no user-facing problem motivates it?")
The policy DSL is a new abstraction we build specifically for this gateway. The managed auth service ships its own policy language that covers 80% of our current rules natively. Hand-rolling the DSL means owning a parser, cache, and evaluator that the managed service would provide for free. (Seeded manual — ROOT B premise challenge: "Is the custom policy-enforcement layer warranted when the managed service ships one?")
The hand-rolled token-refresh loop has no concurrency guard; multiple goroutines may attempt refresh simultaneously under burst traffic, producing refresh-endpoint load spikes. (Seeded manual, independent of roots: this is an operational concern that exists regardless of which auth path is chosen.)

Miscellaneous Notes

The managed secrets store introduces a new rotation workflow we don't currently have. This is net-new operational surface: we'd need runbooks for manual rotation, automatic-rotation settings, and break-glass access. (Seeded manual dependent of ROOT A: this workflow only exists because of the migration; if the migration is rejected, the rotation surface stays as-is.)

Our error budget for the gateway is 0.1% monthly error rate. The plan does not state the expected error-rate impact of cutover, rollback criteria tied to the budget, or how the transition affects SLO burn. (Seeded manual independent of roots: operational obligation regardless of premise.)
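
To make the budget concrete, the arithmetic can be sketched as below. The 0.1% figure comes from the plan; the monthly request volume and the failure count in `main` are invented purely for illustration, and `allowedErrors` and `budgetConsumedPercent` are hypothetical helpers.

```go
package main

import "fmt"

// errorBudgetPerMille encodes the plan's 0.1% monthly error budget
// as 1 failed request per 1,000.
const errorBudgetPerMille = 1

// allowedErrors returns how many failed requests the budget permits
// for a given monthly request volume.
func allowedErrors(monthlyRequests int64) int64 {
	return monthlyRequests * errorBudgetPerMille / 1000
}

// budgetConsumedPercent returns the share of the monthly budget that a
// given number of failures uses up.
func budgetConsumedPercent(failed, monthlyRequests int64) float64 {
	allowed := allowedErrors(monthlyRequests)
	if allowed == 0 {
		return 0
	}
	return 100 * float64(failed) / float64(allowed)
}

func main() {
	// Assumed figures, not taken from the plan.
	const monthlyRequests = 100_000_000
	const cutoverFailures = 25_000

	fmt.Println("allowed errors per month:", allowedErrors(monthlyRequests))
	fmt.Printf("budget consumed by cutover: %.1f%%\n",
		budgetConsumedPercent(cutoverFailures, monthlyRequests))
}
```

A rollback criterion tied to the budget could then be phrased as a threshold on this percentage, though the plan does not propose one.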

We name the session context struct AuthContext in the new code but the existing code uses SessionContext for the same concept. (Seeded FYI: naming preference — both are legible, no wrong answer.)

The config-schema shape is fairly nested (4 levels deep) for a handful of flags. Could be flattened. (Seeded FYI: subjective readability note about schema shape.)

We could reuse this auth adapter pattern for a hypothetical future mobile SDK. That SDK isn't currently on the roadmap. (Seeded FYI: speculative future-work concern with no current signal.)

The gateway is single-region today. Multi-region is not on the near-term roadmap, but if it becomes relevant, the per-route policy cache would need cross-region invalidation. (Seeded drop: theoretical multi-region concern not relevant to current deployment, P3.)

PII Handling

Migration touches user-identifier fields during the JWT layer retirement. (Seeded manual independent of roots: PII compliance gap that applies during the migration window regardless of which premise holds; even if both premises are accepted, the migration itself needs explicit PII-handling guidance.)

Deferred to Implementation

Exact SLO monitoring dashboards
Per-service rollout timing

Known Drift

The existing hand-rolled JWT module is retained for one release after cutover as a fallback path (Unit 1). We may remove it later. (Seeded FYI: drift note without concrete action, low-stakes.)
Unit-organization choice: units are grouped by component (adapter, parser, cache, CSRF, refresh, cutover) rather than by endpoint class. Reads fine either way. (Seeded FYI: unit-organization preference, no wrong answer.)

Low-Signal Residuals (Seeded Drop-Worthy P3s)

The new policy layer "could be slow under load" — no baseline or benchmark, speculative. (Seeded drop: vague performance concern without evidence, P3.)
Commit-message style in the rollout plan uses short subjects; some may prefer longer. (Seeded drop: nitpick about commit-message convention, P3.)
The migration window is described as "a few days" — could be tighter. (Seeded drop: vague-phrasing preference at P3 with no consequence.)
