refactor(todos): remove internal file-based todo system (#635)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-21 18:16:13 -07:00
parent 19bbb60e90
commit accbd2adcf
27 changed files with 760 additions and 606 deletions
--- a/plugins/compound-engineering/README.md
+++ b/plugins/compound-engineering/README.md
@@ -62,8 +62,6 @@ For `/ce-optimize`, see [`skills/ce-optimize/README.md`](./skills/ce-optimize/RE
 | `/ce-setup` | Diagnose environment, install missing tools, and bootstrap project config |
 | `/ce-update` | Check compound-engineering plugin version and fix stale cache (Claude Code only) |
 | `/ce-release-notes` | Summarize recent compound-engineering plugin releases, or answer a question about a past release with a version citation |
-| `/ce-todo-resolve` | Resolve todos in parallel |
-| `/ce-todo-triage` | Triage and prioritize pending todos |

 ### Development Frameworks

@@ -84,7 +82,6 @@ For `/ce-optimize`, see [`skills/ce-optimize/README.md`](./skills/ce-optimize/RE
 | Skill | Description |
 |-------|-------------|
 | `ce-proof` | Create, edit, and share documents via Proof collaborative editor |
-| `ce-todo-create` | File-based todo tracking system |

 ### Automation & Tools

--- a/plugins/compound-engineering/skills/ce-code-review/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-code-review/SKILL.md
@@ -37,22 +37,22 @@ All tokens are optional. Each one present means one less thing to infer. When ab
 | Mode | When | Behavior |
 |------|------|----------|
 | **Interactive** (default) | No mode token present | Review, apply safe_auto fixes automatically, present findings, ask for policy decisions on gated/manual findings, and optionally continue into fix/push/PR next steps |
-| **Autofix** | `mode:autofix` in arguments | No user interaction. Review, apply only policy-allowed `safe_auto` fixes, re-review in bounded rounds, write a run artifact, and emit residual downstream work when needed |
-| **Report-only** | `mode:report-only` in arguments | Strictly read-only. Review and report only, then stop with no edits, artifacts, todos, commits, pushes, or PR actions |
-| **Headless** | `mode:headless` in arguments | Programmatic mode for skill-to-skill invocation. Apply `safe_auto` fixes silently (single pass), return all other findings as structured text output, write run artifacts, skip todos, and return "Review complete" signal. No interactive prompts. |
+| **Autofix** | `mode:autofix` in arguments | No user interaction. Review, apply only policy-allowed `safe_auto` fixes, re-review in bounded rounds, write a run artifact capturing residual downstream work |
+| **Report-only** | `mode:report-only` in arguments | Strictly read-only. Review and report only, then stop with no edits, artifacts, commits, pushes, or PR actions |
+| **Headless** | `mode:headless` in arguments | Programmatic mode for skill-to-skill invocation. Apply `safe_auto` fixes silently (single pass), return all other findings as structured text output, write run artifacts, and return "Review complete" signal. No interactive prompts. |

 ### Autofix mode rules

 - **Skip all user questions.** Never pause for approval or clarification once scope has been established.
 - **Apply only `safe_auto -> review-fixer` findings.** Leave `gated_auto`, `manual`, `human`, and `release` work unresolved.
- **Write a run artifact** under `.context/compound-engineering/ce-code-review/<run-id>/` summarizing findings, applied fixes, residual actionable work, and advisory outputs.
- **Create durable todo files only for unresolved actionable findings** whose final owner is `downstream-resolver`. Load the `ce-todo-create` skill for the canonical directory path and naming convention.
+- **Write a run artifact** under `.context/compound-engineering/ce-code-review/<run-id>/` summarizing findings, applied fixes, residual actionable work, and advisory outputs. Orchestrators read this artifact to route residual `downstream-resolver` findings; the skill itself does not file tickets or prompt the user in autofix.
+- **Emit a compact Residual Actionable Work summary in the autofix return** listing each residual `downstream-resolver` finding with severity, file:line, title, and autofix_class. Include the run-artifact path. Callers read this summary directly without parsing the artifact. When no residuals exist, state `Residual actionable work: none.` explicitly.
 - **Never commit, push, or create a PR** from autofix mode. Parent workflows own those decisions.

 ### Report-only mode rules

 - **Skip all user questions.** Infer intent conservatively if the diff metadata is thin.
- **Never edit files or externalize work.** Do not write `.context/compound-engineering/ce-code-review/<run-id>/`, do not create todo files, and do not commit, push, or create a PR.
+- **Never edit files or externalize work.** Do not write `.context/compound-engineering/ce-code-review/<run-id>/`, do not file tickets, and do not commit, push, or create a PR.
 - **Safe for parallel read-only verification.** `mode:report-only` is the only mode that is safe to run concurrently with browser testing on the same checkout.
 - **Do not switch the shared checkout.** If the caller passes an explicit PR or branch target, `mode:report-only` must run in an isolated checkout/worktree or stop instead of running `gh pr checkout` / `git checkout`.
 - **Do not overlap mutating review with browser testing on the same checkout.** If a future orchestrator wants fixes, run the mutating review phase after browser testing or in an isolated checkout/worktree.
@@ -64,7 +64,7 @@ All tokens are optional. Each one present means one less thing to infer. When ab
 - **Apply only `safe_auto -> review-fixer` findings in a single pass.** No bounded re-review rounds. Leave `gated_auto`, `manual`, `human`, and `release` work unresolved and return them in the structured output.
 - **Return all non-auto findings as structured text output.** Use the headless output envelope format (see Stage 6 below) preserving severity, autofix_class, owner, requires_verification, confidence, pre_existing, and suggested_fix per finding. Enrich with detail-tier fields (why_it_matters, evidence[]) from the per-agent artifact files on disk (see Detail enrichment in Stage 6).
 - **Write a run artifact** under `.context/compound-engineering/ce-code-review/<run-id>/` summarizing findings, applied fixes, and advisory outputs. Include the artifact path in the structured output.
- **Do not create todo files.** The caller receives structured findings and routes downstream work itself.
+- **Do not file tickets or externalize work.** The caller receives structured findings and routes downstream work itself.
 - **Do not switch the shared checkout.** If the caller passes an explicit PR or branch target, `mode:headless` must run in an isolated checkout/worktree or stop instead of running `gh pr checkout` / `git checkout`. When stopping, emit `Review failed (headless mode). Reason: cannot switch shared checkout. Re-invoke with base:<ref> to review the current checkout, or run from an isolated worktree.`
 - **Not safe for concurrent use on a shared checkout.** Unlike `mode:report-only`, headless mutates files (applies `safe_auto` fixes). Callers must not run headless concurrently with other mutating operations on the same checkout.
 - **Never commit, push, or create a PR** from headless mode. The caller owns those decisions.
@@ -491,8 +491,8 @@ Assemble the final report using **pipe-delimited markdown tables for findings**
 1. **Header.** Scope, intent, mode, reviewer team with per-conditional justifications.
 2. **Findings.** Rendered as pipe-delimited tables grouped by severity (`### P0 -- Critical`, `### P1 -- High`, `### P2 -- Moderate`, `### P3 -- Low`). Each finding row shows `#`, file, issue, reviewer(s), confidence, and synthesized route. Omit empty severity levels. Never render findings as freeform text blocks or numbered lists.
 3. **Requirements Completeness.** Include only when a plan was found in Stage 2b. For each requirement (R1, R2, etc.) and implementation unit in the plan, report whether corresponding work appears in the diff. Use a simple checklist: met / not addressed / partially addressed. Routing depends on `plan_source`:
-   - **`explicit`** (caller-provided or PR body): Flag unaddressed requirements as P1 findings with `autofix_class: manual`, `owner: downstream-resolver`. These enter the residual actionable queue and can become todos.
-   - **`inferred`** (auto-discovered): Flag unaddressed requirements as P3 findings with `autofix_class: advisory`, `owner: human`. These stay in the report only — no todos, no autonomous follow-up. An inferred plan match is a hint, not a contract.
+   - **`explicit`** (caller-provided or PR body): Flag unaddressed requirements as P1 findings with `autofix_class: manual`, `owner: downstream-resolver`. These enter the residual actionable queue.
+   - **`inferred`** (auto-discovered): Flag unaddressed requirements as P3 findings with `autofix_class: advisory`, `owner: human`. These stay in the report only — no autonomous follow-up. An inferred plan match is a hint, not a contract.
   Omit this section entirely when no plan was found — do not mention the absence of a plan.
 4. **Applied Fixes.** Include only if a fix phase ran in this invocation.
 5. **Residual Actionable Work.** Include when unresolved actionable findings were handed off or should be handed off.
@@ -617,7 +617,7 @@ After presenting findings and verdict (Stage 6), route the next steps by mode. R
 - **Fixer queue:** final findings routed to `safe_auto -> review-fixer`.
 - **Residual actionable queue:** unresolved `gated_auto` or `manual` findings whose final owner is `downstream-resolver`.
 - **Report-only queue:** `advisory` findings and any outputs owned by `human` or `release`.
- **Never convert advisory-only outputs into fix work or todos.** Deployment notes, residual risks, and release-owned items stay in the report.
+- **Never convert advisory-only outputs into fix work or ticket handoff.** Deployment notes, residual risks, and release-owned items stay in the report.

 #### Step 2: Choose policy by mode

@@ -625,7 +625,7 @@ After presenting findings and verdict (Stage 6), route the next steps by mode. R

 - Apply `safe_auto -> review-fixer` findings automatically without asking. These are safe by definition.
 - **Zero-remaining case:** if no `gated_auto` or `manual` findings remain after the `safe_auto` pass, skip the routing question entirely. Emit a one-line completion summary phrased so advisory and pre-existing findings (which are not handled by this flow) are not implied to be cleared. When no advisory or pre-existing findings remain in the report, `All findings resolved — N safe_auto fixes applied.` is accurate. When advisory and/or pre-existing findings do remain, use the qualified form `All actionable findings resolved — N safe_auto fixes applied. (K advisory, J pre-existing findings remain in the report.)`, omitting any zero-count clause. Follow the summary with the existing end-of-review verdict, then proceed to Step 5 per the gating rule there.
- **Tracker pre-detection:** before rendering the routing question, consult `references/tracker-defer.md` for the session's tracker tuple `{ tracker_name, confidence, named_sink_available, any_sink_available }`. The probe runs at most once per session and is cached for the rest of the run. `named_sink_available` drives the option C label (inline tracker name only when the named sink can actually be invoked). `any_sink_available` drives whether option C is offered at all (it can still be offered when the named tracker is unreachable but `gh` or the harness task primitive works).
+- **Tracker pre-detection:** before rendering the routing question, consult `references/tracker-defer.md` for the session's tracker tuple `{ tracker_name, confidence, named_sink_available, any_sink_available }`. The probe runs at most once per session and is cached for the rest of the run. `named_sink_available` drives the option C label (inline tracker name only when the named sink can actually be invoked). `any_sink_available` drives whether option C is offered at all (it can still be offered when the named tracker is unreachable but GitHub Issues via `gh` works).
 - **Verify question-tool pre-load (checklist, Claude Code only).** Before firing the routing question in Claude Code, confirm `AskUserQuestion` is loaded (per Interactive mode rules at the top of this skill). If not yet loaded this session, call `ToolSearch` with query `select:AskUserQuestion` now. Do not proceed to the routing question without this verification. Rendering the question as narrative text because the schema isn't loaded yet is a bug, not a valid fallback. On Codex and Gemini this checklist does not apply — there is no `ToolSearch` preload step to perform. (If `request_user_input` is unavailable in the current Codex runtime mode, use the numbered-list fallback described below.)
 - **Routing question.** Ask using the platform's blocking question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). Stem: `What should the agent do with the remaining N findings?` — use third-person voice referring to "the agent", not first-person "me" / "I". Options:

@@ -636,7 +636,7 @@ After presenting findings and verdict (Stage 6), route the next steps by mode. R
  (D) Report only — take no further action
  ```

-  Render option C per `references/tracker-defer.md`: when `confidence = high` AND `named_sink_available = true`, replace `[TRACKER]` with the concrete name and keep the full label (e.g., `File a Linear ticket per finding without applying fixes`). When `any_sink_available = true` but either `confidence = low` or `named_sink_available = false` (a fallback tier like GitHub Issues or the harness task primitive is working instead), use the generic label `File an issue per finding without applying fixes` — this is a whole-label substitution, not a `[TRACKER]` token swap. When `any_sink_available = false`, **omit option C entirely** and add one line to the stem explaining why (e.g., `Defer unavailable — no tracker or task-tracking primitive detected on this platform.`). The three remaining options (A, B, D) survive.
+  Render option C per `references/tracker-defer.md`: when `confidence = high` AND `named_sink_available = true`, replace `[TRACKER]` with the concrete name and keep the full label (e.g., `File a Linear ticket per finding without applying fixes`). When `any_sink_available = true` but either `confidence = low` or `named_sink_available = false` (GitHub Issues via `gh` is working as the fallback), use the generic label `File an issue per finding without applying fixes` — this is a whole-label substitution, not a `[TRACKER]` token swap. When `any_sink_available = false`, **omit option C entirely** and add one line to the stem explaining why (e.g., `Defer unavailable — no durable tracker sink detected on this platform.`). The three remaining options (A, B, D) survive.

  The numbered-list text fallback applies when `ToolSearch` explicitly returns no match for the platform's question tool or the tool call errors (including Codex runtime modes where `request_user_input` is unavailable). It does not apply when the agent simply hasn't loaded the tool yet — in that case, load it now (see the verification checklist above). When the fallback applies, present the options as a numbered list and wait for the user's reply — never silently skip the question.

@@ -659,7 +659,7 @@ After presenting findings and verdict (Stage 6), route the next steps by mode. R

 - Ask no questions.
 - Do not build a fixer queue.
- Do not create residual todos or `.context` artifacts.
+- Do not write `.context` artifacts.
 - Stop after Stage 6. Everything remains in the report.

 **Headless mode**
@@ -668,7 +668,7 @@ After presenting findings and verdict (Stage 6), route the next steps by mode. R
 - Apply only the `safe_auto -> review-fixer` queue in a single pass. Do not enter the bounded re-review loop (Step 3). Spawn one fixer subagent, apply fixes, then proceed directly to Step 4.
 - Leave `gated_auto`, `manual`, `human`, and `release` items unresolved — they appear in the structured text output.
 - Output the headless output envelope (see Stage 6) instead of the interactive report.
- Write a run artifact (Step 4) but do not create todo files.
+- Write a run artifact (Step 4). Do not file tickets or externalize work — the caller owns that.
 - Stop after the structured text output and "Review complete" signal. No commit/push/PR.

 #### Step 3: Apply fixes with one fixer and bounded rounds
@@ -699,10 +699,8 @@ After presenting findings and verdict (Stage 6), route the next steps by mode. R
  }
  ```
  Capture `branch` and `head_sha` at dispatch time (before any autofixes land), and write the file after the verdict is finalized. This file is additive -- pre-existing artifacts that predate this field are still valid, and downstream skills fall back to file mtime when it is missing.
- In autofix mode, create durable todo files only for unresolved actionable findings whose final owner is `downstream-resolver`. Load the `ce-todo-create` skill for the canonical directory path, naming convention, YAML frontmatter structure, and template. Each todo should map the finding's severity to the todo priority (`P0`/`P1` -> `p1`, `P2` -> `p2`, `P3` -> `p3`) and set `status: ready` since these findings have already been triaged by synthesis.
- Do not create todos for `advisory` findings, `owner: human`, `owner: release`, or protected-artifact cleanup suggestions.
- If only advisory outputs remain, create no todos.
- Interactive mode may offer to externalize residual actionable work after fixes, but it is not required to finish the review.
+- In autofix mode, the run artifact is the handoff. Orchestrators read the artifact's residual actionable work and route it as appropriate. The skill itself does not file tickets or prompt the user in autofix.
+- Interactive mode may offer to externalize residual actionable work via `references/tracker-defer.md` (named tracker -> GitHub Issues via `gh`), but it is not required to finish the review.

 #### Step 5: Final next steps

--- a/plugins/compound-engineering/skills/ce-code-review/references/bulk-preview.md
+++ b/plugins/compound-engineering/skills/ce-code-review/references/bulk-preview.md
@@ -128,5 +128,5 @@ Failure during `Proceed` (e.g., ticket creation fails for one finding during a b
 - **Zero findings in a bucket:** omit the bucket header. A preview with only Apply and Skip does not show an empty `Filing tickets (0):` or `Acknowledging (0):` line.
 - **All findings in one bucket:** preview still shows the bucket header; Proceed / Cancel still offered. This is the common case for routing option C (every finding under `Filing tickets`).
 - **N=1 preview (only one finding in scope):** the preview still uses the grouped format, just with a single-line bucket. `Proceed` / `Cancel` still apply.
- **No tracker available:** option C is not offered upstream (see `tracker-defer.md` no-sink handling). LFG (option B) and walk-through `LFG the rest` can still run — they may contain per-finding Defer recommendations from Stage 5. Before rendering any LFG-shaped preview, downgrade every Defer recommendation to Skip when the session's cached `any_sink_available` is false, and surface the downgrade on the preview itself (e.g., a `Skipping — defer sink unavailable (N):` bucket, or a note in the header: `N Defer recommendations downgraded to Skip — no tracker sink`). This is a preview-time runtime step, not Stage 5 tie-breaking — step 7b only orders conflicting reviewer recommendations (`Skip > Defer > Apply > Acknowledge`, as defined in `SKILL.md` Stage 5 step 7b) and has no knowledge of sink availability.
+- **No tracker available:** option C is not offered upstream (see `tracker-defer.md` no sink handling). LFG (option B) and walk-through `LFG the rest` can still run — they may contain per-finding Defer recommendations from Stage 5. Before rendering any LFG-shaped preview, downgrade every Defer recommendation to Skip when the session's cached `any_sink_available` is false, and surface the downgrade on the preview itself (e.g., a `Skipping — defer sink unavailable (N):` bucket, or a note in the header: `N Defer recommendations downgraded to Skip — no tracker sink`). This is a preview-time runtime step, not Stage 5 tie-breaking — step 7b only orders conflicting reviewer recommendations (`Skip > Defer > Apply > Acknowledge`, as defined in `SKILL.md` Stage 5 step 7b) and has no knowledge of sink availability.
 - **Walk-through `LFG the rest` with zero remaining findings:** the walk-through's own logic suppresses `LFG the rest` as an option when N=1 and otherwise, so the preview should never be invoked with zero remaining findings. If it is, render `LFG plan — 0 remaining findings` and fall through to Proceed with no-op.
--- a/plugins/compound-engineering/skills/ce-code-review/references/review-output-template.md
+++ b/plugins/compound-engineering/skills/ce-code-review/references/review-output-template.md
@@ -50,8 +50,8 @@ Use this **exact format** when presenting synthesized review findings. Findings

 | # | File | Issue | Route | Next Step |
 |---|------|-------|-------|-----------|
-| 1 | `orders_controller.rb:42` | Ownership check missing on export lookup | `gated_auto -> downstream-resolver` | Create residual todo and require explicit approval before behavior change |
-| 2 | `export_service.rb:91` | Pagination contract needs a broader API decision | `manual -> downstream-resolver` | Create residual todo with contract and client impact details |
+| 1 | `orders_controller.rb:42` | Ownership check missing on export lookup | `gated_auto -> downstream-resolver` | Defer via tracker (requires explicit approval before behavior change) |
+| 2 | `export_service.rb:91` | Pagination contract needs a broader API decision | `manual -> downstream-resolver` | Defer via tracker with contract and client impact details |

 ### Pre-existing Issues

--- a/plugins/compound-engineering/skills/ce-code-review/references/tracker-defer.md
+++ b/plugins/compound-engineering/skills/ce-code-review/references/tracker-defer.md
@@ -1,8 +1,31 @@
 # Tracker Detection and Defer Execution

-This reference covers how Interactive mode's Defer actions file tickets in the project's tracker. It is loaded by `SKILL.md` when the routing question needs to decide whether to offer option C (File tickets), when the walk-through's Defer option executes, and when the bulk-preview of option C (File tickets per finding) is shown.
+This reference covers how Defer actions file tickets in the project's tracker. It is loaded by `SKILL.md` when Interactive mode's routing question needs to decide whether to offer option C (File tickets), when the walk-through's Defer option executes, and when the bulk-preview of option C is shown. It is also loaded by autonomous callers (e.g., `lfg`) that need to file residual actionable findings without user prompts — see Execution Modes below.

-Interactive mode only. Autofix, Report-only, and Headless modes do not use this reference.
+---
+
+## Execution Modes
+
+Tracker-defer has two execution modes. The caller selects one; the detection, fallback chain, and ticket composition are shared.
+
+### Interactive mode (default)
+
+Used by `ce-code-review` Interactive mode's routing question, walk-through Defer actions, and bulk-preview option C. All user-facing prompts fire:
+
+- First Defer of the session with a generic (non-named) label confirms the effective tracker choice.
+- Execution failures prompt with Retry / Fall back to next sink / Convert to Skip.
+- Labels in the routing question reflect `named_sink_available` (name the tracker) vs fallback generics.
+
+### Non-interactive mode
+
+Used by autonomous callers like `lfg` that must not prompt. All blocking questions are skipped; the fallback chain is executed silently in order. Behavior:
+
+- No confirmation on the first generic-label Defer; proceed directly.
+- On execution failure, automatically fall to the next tier without prompting. Record the failure.
+- On total chain exhaustion (every tier failed or no sink available), return findings in the `no_sink` bucket so the caller can route them to another surface (e.g., inline them in a PR description).
+- Return a structured result: `{ filed: [{ finding_id, tracker, url }], failed: [{ finding_id, tracker, reason }], no_sink: [{ finding_id, title, severity, file, line }] }`.
+
+The caller decides how to surface the result to the user. The non-interactive mode treats "no sink available" as a data-producing outcome, not a prompt trigger.

 ---

@@ -10,7 +33,7 @@ Interactive mode only. Autofix, Report-only, and Headless modes do not use this

 The agent determines the project's tracker from whatever documentation is obvious. Primary sources: `CLAUDE.md` and `AGENTS.md` at the repo root and in relevant subdirectories. Supplementary signals (when primary documentation is ambiguous): `CONTRIBUTING.md`, `README.md`, PR templates under `.github/`, visible tracker URLs in the repo.

-A tracker can be surfaced via MCP tool (e.g., a Linear MCP server), CLI (e.g., `gh`), or direct API. All are acceptable. The detection output is a tuple with two availability flags — one for the named tracker specifically (drives label confidence) and one for the full fallback chain (drives whether Defer is offered at all):
+A tracker can be surfaced via MCP tool (e.g., a Linear MCP server), CLI (e.g., `gh`), or direct API. All are acceptable. The detection output is a tuple with two availability flags — one for the named tracker specifically (drives label confidence in Interactive mode) and one for the full fallback chain (drives whether Defer is offered at all):

 ```
 { tracker_name, confidence, named_sink_available, any_sink_available }
@@ -20,63 +43,48 @@ Where:
 - `tracker_name` — human-readable name ("Linear", "GitHub Issues", "Jira"), or `null` when detection cannot identify a specific tracker
 - `confidence` — `high` when the tracker is named explicitly in documentation (or via a linked URL to a specific project/workspace) and is unambiguously the project's canonical tracker; `low` when the signal is thin, conflicting, or implied only
 - `named_sink_available` — `true` only when the agent can actually invoke the detected tracker (MCP tool is loaded, CLI is authenticated, or API credentials are in environment); `false` when the tracker is documented but no tool reaches it, or when no tracker is found at all. Drives label confidence: inline tracker naming requires this to be `true`.
- `any_sink_available` — `true` when any tier in the fallback chain (named tracker, GitHub Issues via `gh`, or harness task-tracking primitive) can be invoked this session. Drives whether Defer is offered: no-sink behavior fires only when this is `false`.
+- `any_sink_available` — `true` when any tier in the fallback chain (named tracker or GitHub Issues via `gh`) can be invoked this session. Drives whether Defer is offered in Interactive mode, and drives the `no_sink` bucket in Non-interactive mode.

-Detection is reasoning-based. Do not maintain an enumerated checklist of files to read. Read the obvious sources and form a confident conclusion; when the obvious sources don't resolve, the label falls back to generic wording and the agent confirms with the user before executing.
+Detection is reasoning-based. Do not maintain an enumerated checklist of files to read. Read the obvious sources and form a confident conclusion; when the obvious sources don't resolve, the label falls back to generic wording and the agent confirms with the user before executing (Interactive mode only).

 ---

 ## Probe timing and caching

-Availability probes run **at most once per session** and **only when the routing question is about to be asked**. Never speculatively at review start, never per-Defer, never per-walk-through-finding. The cached tuple is reused for every Defer action in the same run.
+Availability probes run **at most once per session** and **only when Defer execution is imminent**. Never speculatively at review start, never per-Defer, never per-walk-through-finding. The cached tuple is reused for every Defer action in the same run.

 Typical probe sequence:

 1. Read `CLAUDE.md` / `AGENTS.md` for tracker references. If nothing found, set `tracker_name = null`, `confidence = low`.
 2. **Probe the named tracker when one was found.** For GitHub Issues, run `gh auth status` and `gh repo view --json hasIssuesEnabled`. For Linear or other MCP-backed trackers, verify the relevant MCP tool is loaded and responsive. For API-backed trackers, verify credentials in environment. Set `named_sink_available` from the probe result.
-3. **Probe the fallback tiers to compute `any_sink_available`.** Even when the named tracker was found and probed, the fallback tiers matter for the "no-sink" decision so that a run with no documented tracker but working `gh` still offers Defer. Stop at the first working tier:
+3. **Probe the GitHub Issues fallback to compute `any_sink_available`.** Even when the named tracker was found and probed, `gh` matters for the `no_sink` bucket decision so that a run with no documented tracker but working `gh` still offers Defer.
   - If `named_sink_available = true`: `any_sink_available = true` (no further probes needed).
   - Otherwise, probe GitHub Issues via `gh auth status` + `gh repo view --json hasIssuesEnabled` (skip if already probed in step 2). If it works, `any_sink_available = true`.
-   - Otherwise, check the harness task-tracking primitive. `TaskCreate` / `update_plan` are typically always present when the skill runs inside their harness — treat as available unless the session is in a context that explicitly forbids it (e.g., converted targets without task binding).
-   - If every tier fails, `any_sink_available = false`.
+   - Otherwise, `any_sink_available = false`.

-When the routing question is skipped entirely (R2 zero-findings case), no probes run. When the cached tuple is reused across a session, any `named_sink_available = true` from the session's first probe stays cached — do not re-probe per Defer.
+When Interactive mode's routing question is skipped entirely (R2 zero-findings case), no probes run. When the cached tuple is reused across a session, any `named_sink_available = true` from the session's first probe stays cached — do not re-probe per Defer.

 ---

-## Label logic
+## Label logic (Interactive mode)

 - When `confidence = high` AND `named_sink_available = true`: the routing question's option C and the walk-through's per-finding Defer option both include the tracker name verbatim. Example: `File a Linear ticket per finding`, `Defer — file a Linear ticket`.
 - When `any_sink_available = true` but either `confidence = low` or `named_sink_available = false` (a fallback tier is working instead): the labels read generically — `File an issue per finding`, `Defer — file a ticket`. Before executing the first Defer of the session, the agent confirms the effective tracker choice with the user using the platform's blocking question tool.
 - When `any_sink_available = false`: option C is omitted from the routing question, option B (Defer) is omitted from the walk-through per-finding options, and the agent tells the user why in the routing question's stem.

+Non-interactive mode skips label decisions entirely — it acts silently on the detected sink.
+
 ---

 ## Fallback chain

-When the named tracker is unavailable or no tracker is named, fall back in this order. Prefer durable external trackers over in-session-only primitives.
+When the named tracker is unavailable or no tracker is named, fall back in this order. Prefer the project's detected tracker; use `gh` only when no named tracker was found or the named one is unreachable.

-1. **Named tracker** (MCP tool, CLI, or API the agent can invoke directly)
+1. **Named tracker** (MCP tool, CLI, or API the agent can invoke directly, identified via Detection above)
 2. **GitHub Issues via `gh`** — when `gh auth status` succeeds and the current repo has issues enabled (`gh repo view --json hasIssuesEnabled` returns `true`)
-3. **Harness task-tracking primitive** — `TaskCreate` in Claude Code, `update_plan` in Codex, or the equivalent on other target platforms — used as a last resort and only after a once-per-session durability confirmation (below)
+3. **No sink** — findings remain in the review report's residual-work section (Interactive mode) or are returned in the `no_sink` bucket for the caller to route (Non-interactive mode). The agent does not re-display them through a transient surface.

-Never fall back to `.context/compound-engineering/todos/`. The internal-todos system is on a deprecation path (see plan scope boundaries) and must not be extended by this Defer path.
-
---
-
-## Once-per-session harness-fallback confirmation
-
-When the fallback to harness task-tracking primitive is in effect, and before the first Defer action of the session executes, the agent asks the user once using the platform's blocking question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). In Claude Code, `AskUserQuestion` is a deferred tool — before the first call this session, load its schema via `ToolSearch` with query `select:AskUserQuestion`.
-
-> No documented tracker was found and `gh` is not available. Defer actions will create in-session tasks that do not survive past this session. Proceed for this and subsequent Defer actions?
-
-Options:
- `Proceed with in-session tasks` — the agent continues with harness task creation for every Defer in this run
- `Cancel — leave findings as residual in the report` — the agent converts all pending Defers to Skip with a note, and surfaces the findings in the completion report's residual-work section
-
-The confirmation is cached for the session. Subsequent Defer actions do not re-prompt.
-
-Only when `ToolSearch` explicitly returns no match or the tool call errors — or on a platform with no blocking question tool — fall back to numbered options and waiting for the user's reply.
+Previously this chain included a third in-session fallback tier. That tier was removed because in-session tasks do not survive past the session and therefore do not meet the "durable filing" intent of a Defer action. When no durable tracker exists, the correct behavior is to leave findings in the report (Interactive) or return them to the caller (Non-interactive).

 ---

@@ -99,19 +107,23 @@ The finding_id is a stable fingerprint composed as `normalize(file) + line_bucke

 ## Failure path

-When ticket creation fails at execution (API error, auth expiry mid-session, rate limit, malformed body rejected, 4xx/5xx response), the agent surfaces the failure inline and asks the user using the platform's blocking question tool:
+When ticket creation fails at execution (API error, auth expiry mid-session, rate limit, malformed body rejected, 4xx/5xx response):
+
+**Interactive mode:** surface the failure inline and ask the user using the platform's blocking question tool.

 Stem:
 > Defer failed: <tracker name> returned <error summary>. How should the agent handle this finding?

 Options:
 - `Retry on <tracker>` — re-attempt the same tracker once more (useful for transient errors)
- `Fall back to next sink` — move this finding's Defer to the next tier in the fallback chain (e.g., from Linear to GitHub Issues, or from GitHub Issues to harness task primitive)
+- `Fall back to next sink` — move this finding's Defer to the next tier in the fallback chain (e.g., from Linear to GitHub Issues)
 - `Convert to Skip — record the failure` — abandon this Defer, note the failure in the completion report's failure section, and continue the walk-through or bulk flow

+**Non-interactive mode:** do not prompt. Automatically fall through to the next tier. If every tier fails, record the finding in the `failed` bucket of the structured return and continue. If the chain exhausts with no sink ever available, the finding ends up in the `no_sink` bucket.
+
 When a high-confidence named tracker fails at execution, the cached `named_sink_available` is set to `false` for the rest of the session. Subsequent Defer actions fall straight through to the next tier without retrying a confirmed-broken sink. `any_sink_available` is only downgraded to `false` when every tier has been confirmed broken — a failed Linear call that succeeds via `gh` keeps `any_sink_available = true`.

-Only when `ToolSearch` explicitly returns no match or the tool call errors — or on a platform with no blocking question tool — fall back to numbered options and waiting for the user's reply.
+Only when `ToolSearch` explicitly returns no match or the tool call errors — or on a platform with no blocking question tool — fall back to numbered options and waiting for the user's reply (Interactive mode only).

 ---

@@ -124,8 +136,7 @@ Concrete behavior per tracker at execution time. The agent may invoke any of the
 | Linear | MCP (preferred) or API | Create issue in the project/workspace identified by documentation; assign to the reporter if the MCP tool exposes user context | Markdown | Severity priority field if the MCP exposes it; otherwise include severity in body |
 | GitHub Issues | `gh issue create` | Repo defaults to the current repo. Use `--label` for severity tag when labels exist; omit `--label` if the repo has no label fixture. Fall back to a label-less issue on first failure. | Markdown | `--label P0` / `--label P1` / etc. when labels exist |
 | Jira | MCP or API | Create issue in the project identified by documentation; Jira's markdown dialect differs from GitHub's — use plain text in the body when MCP does not handle conversion | Plain text when MCP does not handle markdown | Severity priority field |
-| Harness task primitive (last resort) | `TaskCreate` / `update_plan` / platform equivalent | Create one task per finding with subject = title and description = compact version of the body. No labels. Warn the user that tasks will not survive past the session (see once-per-session confirmation above). | Plain text, compact | None |
-| No sink available | — | Defer option is omitted; findings remain in the report's residual-work section | — | — |
+| No sink available | — | Interactive: Defer option omitted, findings remain in the report's residual-work section. Non-interactive: findings returned in the `no_sink` bucket for caller routing. | — | — |

 When uncertain, prefer "drop with explicit user-facing notice" over "pass through silently and hope." A Defer that produces no durable artifact and no user message is data loss.

@@ -133,6 +144,6 @@ When uncertain, prefer "drop with explicit user-facing notice" over "pass throug

 ## Cross-platform notes

-The question-tool name varies by platform. Use the platform's blocking question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). In Claude Code the tool should already be loaded from the Interactive-mode pre-load step — if it isn't, call `ToolSearch` with query `select:AskUserQuestion` now. Fall back to numbered options in chat only when the harness genuinely lacks a blocking tool — `ToolSearch` returns no match, the tool call explicitly fails, or the runtime mode does not expose it (e.g., Codex edit modes without `request_user_input`). A pending schema load is not a fallback trigger. Never silently skip the question.
+The question-tool name varies by platform. In Interactive mode, use the platform's blocking question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). In Claude Code the tool should already be loaded from the Interactive-mode pre-load step — if it isn't, call `ToolSearch` with query `select:AskUserQuestion` now. Fall back to numbered options in chat only when the harness genuinely lacks a blocking tool — `ToolSearch` returns no match, the tool call explicitly fails, or the runtime mode does not expose it (e.g., Codex edit modes without `request_user_input`). A pending schema load is not a fallback trigger. Never silently skip the question.

-The fallback chain's final tier (harness task-tracking primitive) does not exist on every target platform. When converted for a platform that has no equivalent of `TaskCreate` / `update_plan`, the agent should treat that platform as "no harness sink" and move directly to the no-sink behavior (omit Defer from menus and tell the user why).
+Non-interactive mode is platform-agnostic: it never prompts, so the platform's question tool is not relevant.
--- a/plugins/compound-engineering/skills/ce-code-review/references/walkthrough.md
+++ b/plugins/compound-engineering/skills/ce-code-review/references/walkthrough.md
@@ -141,8 +141,8 @@ When reviewers disagreed or content context cuts against the default, still mark

 - **Advisory-only finding:** when the finding's `autofix_class` is `advisory` (no actionable fix), option A is replaced with `Acknowledge — mark as reviewed`. The other three options remain. The advisory variant is the only case where `Acknowledge` appears in the menu.
 - **N=1 (exactly one pending finding):** the terminal block's heading omits `Finding N of M` and renders as `## {severity} {plain-English title}`. The stem's first line drops the position counter, becoming `{severity} {short handle}.` Option D (`LFG the rest`) is suppressed because no subsequent findings exist — the menu shows three options: Apply / Defer / Skip (or Acknowledge, for advisory).
- **No-sink (Defer option unavailable):** when the tracker-detection tuple reports `any_sink_available: false` (every tier in the fallback chain — named tracker, GitHub Issues, harness primitive — is unreachable), option B (`Defer`) is omitted. The stem appends one line explaining why (e.g., `Defer unavailable on this platform — no tracker or task-tracking primitive detected.`). The menu shows three options: Apply / Skip / LFG the rest (and Acknowledge in place of Apply for advisory-only findings). **Before rendering the options, remap any per-finding `Defer` recommendation produced by Stage 5 step 7b to `Skip`** so the `(recommended)` marker always lands on an option that is actually in the menu. When the remap fires, surface it on the R15 conflict context line (e.g., `Stage 5 recommended Defer; downgraded to Skip — no tracker sink available.`). This is a render-time runtime step mirroring the Defer→Skip downgrade that `bulk-preview.md` performs for LFG previews; Stage 5 step 7b has no knowledge of sink availability and only orders conflicting reviewer recommendations.
- **Combined N=1 + no-sink:** the menu shows two options: Apply / Skip (or Acknowledge / Skip).
+- **No sink (Defer option unavailable):** when the tracker-detection tuple reports `any_sink_available: false` (every tier in the fallback chain — named tracker and GitHub Issues via `gh` — is unreachable), option B (`Defer`) is omitted. The stem appends one line explaining why (e.g., `Defer unavailable on this platform — no durable tracker sink detected.`). The menu shows three options: Apply / Skip / LFG the rest (and Acknowledge in place of Apply for advisory-only findings). **Before rendering the options, remap any per-finding `Defer` recommendation produced by Stage 5 step 7b to `Skip`** so the `(recommended)` marker always lands on an option that is actually in the menu. When the remap fires, surface it on the R15 conflict context line (e.g., `Stage 5 recommended Defer; downgraded to Skip — no tracker sink available.`). This is a render-time runtime step mirroring the Defer→Skip downgrade that `bulk-preview.md` performs for LFG previews; Stage 5 step 7b has no knowledge of sink availability and only orders conflicting reviewer recommendations.
+- **Combined N=1 + no sink:** the menu shows two options: Apply / Skip (or Acknowledge / Skip).

 Only when `ToolSearch` explicitly returns no match or the tool call errors — or on a platform with no blocking question tool — fall back to presenting the options as a numbered list and waiting for the user's next reply.

--- a/plugins/compound-engineering/skills/ce-test-browser/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-test-browser/SKILL.md
@@ -220,14 +220,12 @@ When a test fails:
   Console errors: [if any]

   How to proceed?
-   1. Fix now - I'll help debug and fix
-   2. Create todo - Add a todo for later (using the todo-create skill)
-   3. Skip - Continue testing other pages
+   1. Fix now - debug and fix the failing test
+   2. Skip - continue testing other pages
   ```

 3. **If "Fix now":** investigate, propose a fix, apply, re-run the failing test
-4. **If "Create todo":** load the `ce-todo-create` skill and create a todo with priority p1 and description `browser-test-{description}`, continue
-5. **If "Skip":** log as skipped, continue
+4. **If "Skip":** log as skipped, continue

 ### 10. Test Summary

@@ -258,9 +256,6 @@ After all tests complete, present a summary:
 ### Failures: [count]
 - `/dashboard` - [issue description]

-### Created Todos: [count]
- `005-pending-p1-browser-test-dashboard-error.md`
-
 ### Result: [PASS / FAIL / PARTIAL]
 ```

--- a/plugins/compound-engineering/skills/ce-test-xcode/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-test-xcode/SKILL.md
@@ -61,7 +61,6 @@ Call `build_ios_sim_app` with the project path and scheme name.

 **On failure:**
 - Capture build errors
- Create a P1 todo for each build error
 - Report to user with specific error details

 **On success:**
@@ -142,14 +141,12 @@ When a test fails:
   Logs: [relevant error messages]

   How to proceed?
-   1. Fix now - I'll help debug and fix
-   2. Create todo - Add a todo for later (using the todo-create skill)
-   3. Skip - Continue testing other screens
+   1. Fix now - debug, propose a fix, rebuild and retest
+   2. Skip - continue testing other screens
   ```

 3. **If "Fix now":** investigate, propose a fix, rebuild and retest
-4. **If "Create todo":** load the `ce-todo-create` skill and create a todo with priority p1 and description `xcode-{description}`, continue
-5. **If "Skip":** log as skipped, continue
+4. **If "Skip":** log as skipped, continue

 ### 8. Test Summary

@@ -183,9 +180,6 @@ After all tests complete, present a summary:
 ### Failures: [count]
 - Settings screen - crash on navigation

-### Created Todos: [count]
- `006-pending-p1-xcode-settings-crash.md`
-
 ### Result: [PASS / FAIL / PARTIAL]
 ```

--- a/plugins/compound-engineering/skills/ce-todo-create/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-todo-create/SKILL.md
@@ -1,109 +0,0 @@
---
-name: ce-todo-create
-description: Use when creating durable work items, managing todo lifecycle, or tracking findings across sessions in the file-based todo system
-disable-model-invocation: true
---
-
-# File-Based Todo Tracking
-
-## Overview
-
-The `.context/compound-engineering/todos/` directory is a file-based tracking system for code review feedback, technical debt, feature requests, and work items. Each todo is a markdown file with YAML frontmatter.
-
-> **Legacy support:** Always check both `.context/compound-engineering/todos/` (canonical) and `todos/` (legacy) when reading. Write new todos only to the canonical path. This directory has a multi-session lifecycle -- do not clean it up as scratch.
-
-## Directory Paths
-
-| Purpose | Path |
-|---------|------|
-| **Canonical (write here)** | `.context/compound-engineering/todos/` |
-| **Legacy (read-only)** | `todos/` |
-
-## File Naming Convention
-
-```
-{issue_id}-{status}-{priority}-{description}.md
-```
-
- **issue_id**: Sequential number (001, 002, ...) -- never reused
- **status**: `pending` | `ready` | `complete`
- **priority**: `p1` (critical) | `p2` (important) | `p3` (nice-to-have)
- **description**: kebab-case, brief
-
-**Example:** `002-ready-p1-fix-n-plus-1.md`
-
-## File Structure
-
-Each todo has YAML frontmatter and structured sections. Use the todo template included below when creating new todos.
-
-```yaml
---
-status: ready
-priority: p1
-issue_id: "002"
-tags: [rails, performance]
-dependencies: ["001"]     # Issue IDs this is blocked by
---
-```
-
-**Required sections:** Problem Statement, Findings, Proposed Solutions, Recommended Action (filled during triage), Acceptance Criteria, Work Log.
-
-**Optional sections:** Technical Details, Resources, Notes.
-
-## Workflows
-
-> **Tool preference:** Use native file-search/glob and content-search tools instead of shell commands for finding and reading todo files. Shell only for operations with no native equivalent (`mv`, `mkdir -p`).
-
-### Creating a New Todo
-
-1. `mkdir -p .context/compound-engineering/todos/`
-2. Search both paths for `[0-9]*-*.md`, find the highest numeric prefix, increment, zero-pad to 3 digits.
-3. Use the todo template included below, write to canonical path as `{NEXT_ID}-pending-{priority}-{description}.md`.
-4. Fill Problem Statement, Findings, Proposed Solutions, Acceptance Criteria, and initial Work Log entry.
-5. Set status: `pending` (needs triage) or `ready` (pre-approved).
-
-**Create a todo when** the work needs more than ~15 minutes, has dependencies, requires planning, or needs prioritization. **Act immediately instead** when the fix is trivial, obvious, and self-contained.
-
-### Triaging Pending Items
-
-1. Glob `*-pending-*.md` in both paths.
-2. Review each todo's Problem Statement, Findings, and Proposed Solutions.
-3. Approve: rename `pending` -> `ready` in filename and frontmatter, fill Recommended Action.
-4. Defer: leave as `pending`.
-
-Load the `ce-todo-triage` skill for an interactive approval workflow.
-
-### Managing Dependencies
-
-```yaml
-dependencies: ["002", "005"]  # Blocked by these issues
-dependencies: []               # No blockers
-```
-
-To check blockers: search for `{dep_id}-complete-*.md` in both paths. Missing matches = incomplete blockers.
-
-### Completing a Todo
-
-1. Verify all acceptance criteria.
-2. Update Work Log with final session.
-3. Rename `ready` -> `complete` in filename and frontmatter.
-4. Check for unblocked work: search for files containing `dependencies:.*"{issue_id}"`.
-
-## Integration with Workflows
-
-| Trigger | Flow |
-|---------|------|
-| Code review | `/ce-code-review` -> Findings -> `/ce-todo-triage` -> Todos |
-| Autonomous review | `/ce-code-review mode:autofix` -> Residual todos -> `/ce-todo-resolve` |
-| Code TODOs | `/ce-todo-resolve` -> Fixes + Complex todos |
-| Planning | Brainstorm -> Create todo -> Work -> Complete |
-
-## Key Distinction
-
-This skill manages **durable, cross-session work items** persisted as markdown files. For temporary in-session step tracking, use platform task tools (`TaskCreate`/`TaskUpdate` in Claude Code, `update_plan` in Codex) instead.
-
---
-
-## Todo Template
-
-@./assets/todo-template.md
--- a/plugins/compound-engineering/skills/ce-todo-create/assets/todo-template.md
+++ b/plugins/compound-engineering/skills/ce-todo-create/assets/todo-template.md
@@ -1,155 +0,0 @@
---
-status: pending
-priority: p2
-issue_id: "XXX"
-tags: []
-dependencies: []
---
-
-# Brief Task Title
-
-Replace with a concise title describing what needs to be done.
-
-## Problem Statement
-
-What is broken, missing, or needs improvement? Provide clear context about why this matters.
-
-**Example:**
- Template system lacks comprehensive test coverage for edge cases discovered during PR review
- Email service is missing proper error handling for rate-limit scenarios
- Documentation doesn't cover the new authentication flow
-
-## Findings
-
-Investigation results, root cause analysis, and key discoveries.
-
- Finding 1 (with specifics: file, line number if applicable)
- Finding 2
- Key discovery with impact assessment
- Related issues or patterns discovered
-
-**Example format:**
- Identified 12 missing test scenarios in `app/models/user_test.rb`
- Current coverage: 60% of code paths
- Missing: empty inputs, special characters, large payloads
- Similar issues exist in `app/models/post_test.rb` (~8 scenarios)
-
-## Proposed Solutions
-
-Present multiple options with pros, cons, effort estimates, and risk assessment.
-
-### Option 1: [Solution Name]
-
-**Approach:** Describe the solution clearly.
-
-**Pros:**
- Benefit 1
- Benefit 2
-
-**Cons:**
- Drawback 1
- Drawback 2
-
-**Effort:** 2-3 hours
-
-**Risk:** Low / Medium / High
-
---
-
-### Option 2: [Solution Name]
-
-**Approach:** Describe the solution clearly.
-
-**Pros:**
- Benefit 1
- Benefit 2
-
-**Cons:**
- Drawback 1
- Drawback 2
-
-**Effort:** 4-6 hours
-
-**Risk:** Low / Medium / High
-
---
-
-### Option 3: [Solution Name]
-
-(Include if you have alternatives)
-
-## Recommended Action
-
-**To be filled during triage.** Clear, actionable plan for resolving this todo.
-
-**Example:**
-"Implement both unit tests (covering each scenario) and integration tests (full pipeline) before merging. Estimated 4 hours total effort. Target coverage > 85% for this module."
-
-## Technical Details
-
-Affected files, related components, database changes, or architectural considerations.
-
-**Affected files:**
- `app/models/user.rb:45` - full_name method
- `app/services/user_service.rb:12` - validation logic
- `test/models/user_test.rb` - existing tests
-
-**Related components:**
- UserMailer (depends on user validation)
- AccountPolicy (authorization checks)
-
-**Database changes (if any):**
- Migration needed? Yes / No
- New columns/tables? Describe here
-
-## Resources
-
-Links to errors, tests, PRs, documentation, similar issues.
-
- **PR:** #1287
- **Related issue:** #456
- **Error log:** [link to AppSignal incident]
- **Documentation:** [relevant docs]
- **Similar patterns:** Issue #200 (completed, ref for approach)
-
-## Acceptance Criteria
-
-Testable checklist items for verifying completion.
-
- [ ] All acceptance criteria checked
- [ ] Tests pass (unit + integration if applicable)
- [ ] Code reviewed and approved
- [ ] (Example) Test coverage > 85%
- [ ] (Example) Performance metrics acceptable
- [ ] (Example) Documentation updated
-
-## Work Log
-
-Chronological record of work sessions, actions taken, and learnings.
-
-### 2025-11-12 - Initial Discovery
-
-**By:** Claude Code
-
-**Actions:**
- Identified 12 missing test scenarios
- Analyzed existing test coverage (file:line references)
- Reviewed similar patterns in codebase
- Drafted 3 solution approaches
-
-**Learnings:**
- Similar issues exist in related modules
- Current test setup supports both unit and integration tests
- Performance testing would be valuable addition
-
---
-
-(Add more entries as work progresses)
-
-## Notes
-
-Additional context, decisions, or reminders.
-
- Decision: Include both unit and integration tests for comprehensive coverage
- Blocker: Depends on completion of issue #001
- Timeline: Priority for sprint due to blocking other work
--- a/plugins/compound-engineering/skills/ce-todo-resolve/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-todo-resolve/SKILL.md
@@ -1,68 +0,0 @@
---
-name: ce-todo-resolve
-description: Use when batch-resolving approved todos, especially after code review or triage sessions
-argument-hint: "[optional: specific todo ID or pattern]"
---
-
-Resolve approved todos using parallel processing, document lessons learned, then clean up.
-
-Only `ready` todos are resolved. `pending` todos are skipped — they haven't been triaged yet. If pending todos exist, list them at the end so the user knows what was left behind.
-
-## Workflow
-
-### 1. Analyze
-
-Scan `.context/compound-engineering/todos/*.md` and legacy `todos/*.md`. Partition by status:
-
- **`ready`** (status field or `-ready-` in filename): resolve these.
- **`pending`**: skip. Report them at the end.
- **`complete`**: ignore, already done.
-
-If a specific todo ID or pattern was passed as an argument, filter to matching todos only (still must be `ready`).
-
-Residual actionable work from `ce-code-review mode:autofix` after its `safe_auto` pass will already be `ready`.
-
-Skip any todo that recommends deleting, removing, or gitignoring files in `docs/brainstorms/`, `docs/plans/`, or `docs/solutions/` — these are intentional pipeline artifacts.
-
-### 2. Plan
-
-Create a task list grouped by type (e.g., `TaskCreate` in Claude Code, `update_plan` in Codex). Analyze dependencies -- items that others depend on run first. Output a mermaid diagram showing execution order and parallelism.
-
-### 3. Implement (PARALLEL)
-
-Spawn a `ce-pr-comment-resolver` agent per item. Prefer parallel; fall back to sequential respecting dependency order.
-
-**Batching:** 1-4 items: direct parallel returns. 5+ items: batches of 4, each returning only a short status summary (todo handled, files changed, tests run/skipped, blockers).
-
-For large sets, use a scratch directory at `.context/compound-engineering/ce-todo-resolve/<run-id>/` for per-resolver artifacts. Return only completion summaries to parent.
-
-### 4. Commit & Resolve
-
-Commit changes, mark todos resolved, push to remote.
-
-GATE: STOP. Verify todos resolved and changes committed before proceeding.
-
-### 5. Compound on Lessons Learned
-
-Load the `ce-compound` skill to document what was learned. Todo resolutions often surface patterns and architectural insights worth capturing.
-
-GATE: STOP. Verify the compound skill produced a solution document in `docs/solutions/`. If none (user declined or no learnings), continue.
-
-### 6. Clean Up
-
-Delete completed/resolved todo files from both paths. If a scratch directory was created at `.context/compound-engineering/ce-todo-resolve/<run-id>/`, delete it (unless user asked to inspect).
-
-```
-Todos resolved: [count]
-Pending (skipped): [count, or "none"]
-Lessons documented: [path to solution doc, or "skipped"]
-Todos cleaned up: [count deleted]
-```
-
-If pending todos were skipped, list them:
-
-```
-Skipped pending todos (run /ce-todo-triage to approve):
-  - 003-pending-p2-missing-index.md
-  - 005-pending-p3-rename-variable.md
-```
--- a/plugins/compound-engineering/skills/ce-todo-triage/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-todo-triage/SKILL.md
@@ -1,70 +0,0 @@
---
-name: ce-todo-triage
-description: Use when reviewing pending todos for approval, prioritizing code review findings, or interactively categorizing work items
-argument-hint: "[findings list or source type]"
-disable-model-invocation: true
---
-
-# Todo Triage
-
-Interactive workflow for reviewing pending todos one by one and deciding whether to approve, skip, or modify each.
-
-**Do not write code during triage.** This is purely for review and prioritization -- implementation happens in `/ce-todo-resolve`.
-
- First set the /model to Haiku
- Read all pending todos from `.context/compound-engineering/todos/` and legacy `todos/` directories
-
-## Workflow
-
-### 1. Present Each Finding
-
-For each pending todo, present it clearly with severity, category, description, location, problem scenario, proposed solution, and effort estimate. Then ask:
-
-```
-Do you want to add this to the todo list?
-1. yes - approve and mark ready
-2. next - skip (deletes the todo file)
-3. custom - modify before approving
-```
-
-Use severity levels: 🔴 P1 (CRITICAL), 🟡 P2 (IMPORTANT), 🔵 P3 (NICE-TO-HAVE).
-
-Include progress tracking in each header: `Progress: 3/10 completed`
-
-### 2. Handle Decision
-
-**yes:** Rename file from `pending` -> `ready` in both filename and frontmatter. Fill the Recommended Action section. If creating a new todo (not updating existing), use the naming convention from the `ce-todo-create` skill.
-
-Priority mapping: 🔴 P1 -> `p1`, 🟡 P2 -> `p2`, 🔵 P3 -> `p3`
-
-Confirm: "✅ Approved: `{filename}` (Issue #{issue_id}) - Status: **ready**"
-
-**next:** Delete the todo file. Log as skipped for the final summary.
-
-**custom:** Ask what to modify, update, re-present, ask again.
-
-### 3. Final Summary
-
-After all items processed:
-
-```markdown
-## Triage Complete
-
-**Total Items:** [X] | **Approved (ready):** [Y] | **Skipped:** [Z]
-
-### Approved Todos (Ready for Work):
- `042-ready-p1-transaction-boundaries.md` - Transaction boundary issue
-
-### Skipped (Deleted):
- Item #5: [reason]
-```
-
-### 4. Next Steps
-
-```markdown
-What would you like to do next?
-
-1. run /ce-todo-resolve to resolve the todos
-2. commit the todos
-3. nothing, go chill
-```
--- a/plugins/compound-engineering/skills/ce-work-beta/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-work-beta/SKILL.md
@@ -11,7 +11,7 @@ Execute work efficiently while maintaining quality and finishing features.

 ## Introduction

-This command takes a work document (plan, specification, or todo file) or a bare prompt describing the work, and executes it systematically. The focus is on **shipping complete features** by understanding requirements quickly, following existing patterns, and maintaining quality throughout.
+This command takes a work document (plan or specification) or a bare prompt describing the work, and executes it systematically. The focus is on **shipping complete features** by understanding requirements quickly, following existing patterns, and maintaining quality throughout.

 **Beta rollout note:** Invoke `ce-work-beta` manually when you want to trial Codex delegation. During the beta period, planning and workflow handoffs remain pointed at stable `ce-work` to avoid dual-path orchestration complexity.

@@ -75,7 +75,7 @@ Store the resolved state for downstream consumption:

 Determine how to proceed based on what was provided in `<input_document>`.

-**Plan document** (input is a file path to an existing plan, specification, or todo file) → skip to Phase 1.
+**Plan document** (input is a file path to an existing plan or specification) → skip to Phase 1.

 **Bare prompt** (input is a description of work, not a file path):

@@ -164,8 +164,8 @@ Determine how to proceed based on what was provided in `<input_document>`.
   - You want to keep the default branch clean while experimenting
   - You plan to switch between branches frequently

-3. **Create Todo List** _(skip if Phase 0 already built one, or if Phase 0 routed as Trivial)_
-   - Use your available task tracking tool (e.g., TodoWrite, task lists) to break the plan into actionable tasks
+3. **Create Task List** _(skip if Phase 0 already built one, or if Phase 0 routed as Trivial)_
+   - Use the platform's task tracking tool (`TaskCreate`/`TaskUpdate`/`TaskList` in Claude Code, `update_plan` in Codex, or the equivalent on other harnesses) to break the plan into actionable tasks
   - Derive tasks from the plan's implementation units, dependencies, files, test targets, and verification criteria
   - When the plan defines U-IDs for Implementation Units, preserve the unit's U-ID as a prefix in the task subject (e.g., "U3: Add parser coverage"). This keeps blocker references, deferred-work notes, and final summaries anchored to the same identifier the plan uses, so progress and traceability remain unambiguous across plan edits
   - Carry each unit's `Execution note` into the task when present
--- a/plugins/compound-engineering/skills/ce-work-beta/references/shipping-workflow.md
+++ b/plugins/compound-engineering/skills/ce-work-beta/references/shipping-workflow.md
@@ -20,7 +20,7 @@ This file contains the shipping workflow (Phase 3-4). Load it only when all Phas

   Every change gets reviewed before shipping. The depth scales with the change's risk profile, but review itself is never skipped.

-   **Tier 2: Full review (default)** -- REQUIRED unless Tier 1 criteria are explicitly met. Invoke the `ce-code-review` skill with `mode:autofix` to run specialized reviewer agents, auto-apply safe fixes, and surface residual work as todos. When the plan file path is known, pass it as `plan:<path>`. This is the mandatory default -- proceed to Tier 1 only after confirming every criterion below.
+   **Tier 2: Full review (default)** -- REQUIRED unless Tier 1 criteria are explicitly met. Invoke the `ce-code-review` skill with `mode:autofix` to run specialized reviewer agents, auto-apply safe fixes, and record residual downstream work in the per-run artifact. When the plan file path is known, pass it as `plan:<path>`. This is the mandatory default -- proceed to Tier 1 only after confirming every criterion below.

   **Tier 1: Inline self-review** -- A lighter alternative permitted only when **all four** criteria are true. Before choosing Tier 1, explicitly state which criteria apply and why. If any criterion is uncertain, use Tier 2.
   - Purely additive (new files only, no existing behavior modified)
@@ -28,7 +28,23 @@ This file contains the shipping workflow (Phase 3-4). Load it only when all Phas
   - Pattern-following (implementation mirrors an existing example with no novel logic)
   - Plan-faithful (no scope growth, no deferred questions resolved with surprising answers)

-3. **Final Validation**
+3. **Residual Work Gate** (REQUIRED when Tier 2 ran)
+
+   After Tier 2 code review completes, inspect the Residual Actionable Work summary it returned (or read the run artifact directly if the summary was not emitted). If one or more residual `downstream-resolver` findings remain, do not proceed to Final Validation until the user decides how to handle them.
+
+   Ask the user using the platform's blocking question tool (`AskUserQuestion` in Claude Code with `ToolSearch select:AskUserQuestion` pre-loaded if needed, `request_user_input` in Codex, `ask_user` in Gemini). Fall back to numbered options in chat only when the harness genuinely lacks a blocking tool. Never silently skip the gate.
+
+   Stem: `Code review found N residual finding(s) the skill did not auto-fix. How should the agent proceed?`
+
+   Options (four or fewer, self-contained labels):
+   - `Apply/fix now` — loop back into review with focused fixes; the agent investigates each finding, applies changes where safe, and re-runs review.
+   - `File tickets via project tracker` — load `references/tracker-defer.md` in Interactive mode; the agent files tickets in the project's detected tracker (or `gh` fallback, or leaves them in the report if no sink exists) and proceeds to Final Validation.
+   - `Accept and proceed` — record the residual findings verbatim in a durable "Known Residuals" sink before shipping. If a PR will be created or updated in Phase 4, include them in the PR description's "Known Residuals" section (the agent owns this when calling `ce-commit-push-pr`). If the user later chooses the no-PR `ce-commit` path, create `docs/residual-review-findings/<branch-or-head-sha>.md`, include the accepted findings and source review-run context, stage it with the implementation commit, and mention the file path in the final summary. The user has acknowledged the risk, but the findings must not live only in the transient session.
+   - `Stop — do not ship` — abort the shipping workflow. The user will handle findings manually before re-invoking.
+
+   Skip this gate entirely when the review reported `Residual actionable work: none.` or when only Tier 1 (inline self-review) was used. Do not proceed past this gate on an `Accept and proceed` decision until the agent has recorded whether the durable sink is `PR Known Residuals` or `docs/residual-review-findings/<branch-or-head-sha>.md`.
+
+4. **Final Validation**
   - All tasks marked completed
   - Testing addressed -- tests pass and new/changed behavior has corresponding test coverage (or an explicit justification for why tests are not needed)
   - Linting passes
@@ -38,7 +54,7 @@ This file contains the shipping workflow (Phase 3-4). Load it only when all Phas
   - If the plan has a `Requirements Trace`, verify each requirement is satisfied by the completed work
   - If any `Deferred to Implementation` questions were noted, confirm they were resolved during execution

-4. **Prepare Operational Validation Plan** (REQUIRED)
+5. **Prepare Operational Validation Plan** (REQUIRED)
   - Add a `## Post-Deploy Monitoring & Validation` section to the PR description for every change.
   - Include concrete:
     - Log queries/search terms
@@ -72,7 +88,8 @@ This file contains the shipping workflow (Phase 3-4). Load it only when all Phas
   - Testing notes (tests added/modified, manual testing performed)
   - Evidence context from step 1, so `ce-commit-push-pr` can decide whether to ask about capturing evidence
   - Figma design link (if applicable)
-   - The Post-Deploy Monitoring & Validation section (see Phase 3 Step 4)
+   - The Post-Deploy Monitoring & Validation section (see Phase 3 Step 5)
+   - Any "Known Residuals" accepted in the Phase 3 Residual Work Gate, rendered as a dedicated section in the PR body with severity, file:line, and title per finding

   If the user prefers to commit without creating a PR, load the `ce-commit` skill instead.

@@ -103,7 +120,7 @@ Before creating PR, verify:

 Every change gets reviewed. The tier determines depth, not whether review happens.

-**Tier 2 (full review)** -- REQUIRED default. Invoke `ce-code-review mode:autofix` with `plan:<path>` when available. Safe fixes are applied automatically; residual work surfaces as todos. Always use this tier unless all four Tier 1 criteria are explicitly confirmed.
+**Tier 2 (full review)** -- REQUIRED default. Invoke `ce-code-review mode:autofix` with `plan:<path>` when available. Safe fixes are applied automatically; residual work is recorded in the run artifact for downstream routing. Always use this tier unless all four Tier 1 criteria are explicitly confirmed.

 **Tier 1 (inline self-review)** -- permitted only when all four are true (state each explicitly before choosing):
 - Purely additive (new files only, no existing behavior modified)
--- a/plugins/compound-engineering/skills/ce-work-beta/references/tracker-defer.md
+++ b/plugins/compound-engineering/skills/ce-work-beta/references/tracker-defer.md
@@ -0,0 +1,149 @@
+# Tracker Detection and Defer Execution
+
+This reference covers how Defer actions file tickets in the project's tracker. It is loaded by `SKILL.md` when Interactive mode's routing question needs to decide whether to offer option C (File tickets), when the walk-through's Defer option executes, and when the bulk-preview of option C is shown. It is also loaded by autonomous callers (e.g., `lfg`) that need to file residual actionable findings without user prompts — see Execution Modes below.
+
+---
+
+## Execution Modes
+
+Tracker-defer has two execution modes. The caller selects one; the detection, fallback chain, and ticket composition are shared.
+
+### Interactive mode (default)
+
+Used by `ce-code-review` Interactive mode's routing question, walk-through Defer actions, and bulk-preview option C. All user-facing prompts fire:
+
+- First Defer of the session with a generic (non-named) label confirms the effective tracker choice.
+- Execution failures prompt with Retry / Fall back to next sink / Convert to Skip.
+- Labels in the routing question reflect `named_sink_available` (name the tracker) vs fallback generics.
+
+### Non-interactive mode
+
+Used by autonomous callers like `lfg` that must not prompt. All blocking questions are skipped; the fallback chain is executed silently in order. Behavior:
+
+- No confirmation on the first generic-label Defer; proceed directly.
+- On execution failure, automatically fall to the next tier without prompting. Record the failure.
+- On total chain exhaustion (every tier failed or no sink available), return findings in the `no_sink` bucket so the caller can route them to another surface (e.g., inline them in a PR description).
+- Return a structured result: `{ filed: [{ finding_id, tracker, url }], failed: [{ finding_id, tracker, reason }], no_sink: [{ finding_id, title, severity, file, line }] }`.
+
+The caller decides how to surface the result to the user. The non-interactive mode treats "no sink available" as a data-producing outcome, not a prompt trigger.
+
+---
+
+## Detection
+
+The agent determines the project's tracker from whatever documentation is obvious. Primary sources: `CLAUDE.md` and `AGENTS.md` at the repo root and in relevant subdirectories. Supplementary signals (when primary documentation is ambiguous): `CONTRIBUTING.md`, `README.md`, PR templates under `.github/`, visible tracker URLs in the repo.
+
+A tracker can be surfaced via MCP tool (e.g., a Linear MCP server), CLI (e.g., `gh`), or direct API. All are acceptable. The detection output is a tuple with two availability flags — one for the named tracker specifically (drives label confidence in Interactive mode) and one for the full fallback chain (drives whether Defer is offered at all):
+
+```
+{ tracker_name, confidence, named_sink_available, any_sink_available }
+```
+
+Where:
+- `tracker_name` — human-readable name ("Linear", "GitHub Issues", "Jira"), or `null` when detection cannot identify a specific tracker
+- `confidence` — `high` when the tracker is named explicitly in documentation (or via a linked URL to a specific project/workspace) and is unambiguously the project's canonical tracker; `low` when the signal is thin, conflicting, or implied only
+- `named_sink_available` — `true` only when the agent can actually invoke the detected tracker (MCP tool is loaded, CLI is authenticated, or API credentials are in environment); `false` when the tracker is documented but no tool reaches it, or when no tracker is found at all. Drives label confidence: inline tracker naming requires this to be `true`.
+- `any_sink_available` — `true` when any tier in the fallback chain (named tracker or GitHub Issues via `gh`) can be invoked this session. Drives whether Defer is offered in Interactive mode, and drives the `no_sink` bucket in Non-interactive mode.
+
+Detection is reasoning-based. Do not maintain an enumerated checklist of files to read. Read the obvious sources and form a confident conclusion; when the obvious sources don't resolve, the label falls back to generic wording and the agent confirms with the user before executing (Interactive mode only).
+
+---
+
+## Probe timing and caching
+
+Availability probes run **at most once per session** and **only when Defer execution is imminent**. Never speculatively at review start, never per-Defer, never per-walk-through-finding. The cached tuple is reused for every Defer action in the same run.
+
+Typical probe sequence:
+
+1. Read `CLAUDE.md` / `AGENTS.md` for tracker references. If nothing found, set `tracker_name = null`, `confidence = low`.
+2. **Probe the named tracker when one was found.** For GitHub Issues, run `gh auth status` and `gh repo view --json hasIssuesEnabled`. For Linear or other MCP-backed trackers, verify the relevant MCP tool is loaded and responsive. For API-backed trackers, verify credentials in environment. Set `named_sink_available` from the probe result.
+3. **Probe the GitHub Issues fallback to compute `any_sink_available`.** Even when the named tracker was found and probed, `gh` matters for the `no_sink` bucket decision so that a run with no documented tracker but working `gh` still offers Defer.
+   - If `named_sink_available = true`: `any_sink_available = true` (no further probes needed).
+   - Otherwise, probe GitHub Issues via `gh auth status` + `gh repo view --json hasIssuesEnabled` (skip if already probed in step 2). If it works, `any_sink_available = true`.
+   - Otherwise, `any_sink_available = false`.
+
+When Interactive mode's routing question is skipped entirely (R2 zero-findings case), no probes run. When the cached tuple is reused across a session, any `named_sink_available = true` from the session's first probe stays cached — do not re-probe per Defer.
+
+---
+
+## Label logic (Interactive mode)
+
+- When `confidence = high` AND `named_sink_available = true`: the routing question's option C and the walk-through's per-finding Defer option both include the tracker name verbatim. Example: `File a Linear ticket per finding`, `Defer — file a Linear ticket`.
+- When `any_sink_available = true` but either `confidence = low` or `named_sink_available = false` (a fallback tier is working instead): the labels read generically — `File an issue per finding`, `Defer — file a ticket`. Before executing the first Defer of the session, the agent confirms the effective tracker choice with the user using the platform's blocking question tool.
+- When `any_sink_available = false`: option C is omitted from the routing question, option B (Defer) is omitted from the walk-through per-finding options, and the agent tells the user why in the routing question's stem.
+
+Non-interactive mode skips label decisions entirely — it acts silently on the detected sink.
+
+---
+
+## Fallback chain
+
+When the named tracker is unavailable or no tracker is named, fall back in this order. Prefer the project's detected tracker; use `gh` only when no named tracker was found or the named one is unreachable.
+
+1. **Named tracker** (MCP tool, CLI, or API the agent can invoke directly, identified via Detection above)
+2. **GitHub Issues via `gh`** — when `gh auth status` succeeds and the current repo has issues enabled (`gh repo view --json hasIssuesEnabled` returns `true`)
+3. **No sink** — findings remain in the review report's residual-work section (Interactive mode) or are returned in the `no_sink` bucket for the caller to route (Non-interactive mode). The agent does not re-display them through a transient surface.
+
+Previously this chain included a third in-session fallback tier. That tier was removed because in-session tasks do not survive past the session and therefore do not meet the "durable filing" intent of a Defer action. When no durable tracker exists, the correct behavior is to leave findings in the report (Interactive) or return them to the caller (Non-interactive).
+
+---
+
+## Ticket composition
+
+Every Defer action creates a ticket with the following content, adapted to the tracker's capabilities:
+
+- **Title:** the merged finding's `title` (schema-capped at 10 words).
+- **Body:**
+  - Plain-English problem statement — reads the persona-produced `why_it_matters` from the contributing reviewer's artifact file at `.context/compound-engineering/ce-code-review/<run-id>/{reviewer}.json`, using the same `file + line_bucket(line, +/-3) + normalize(title)` matching headless mode uses (see SKILL.md Stage 6 detail enrichment). Falls back to the merged finding's `title`, `severity`, `file`, and `suggested_fix` (when present) when no artifact match is available — these fields are guaranteed in the merge-tier compact return.
+  - Suggested fix (when present in the finding's `suggested_fix`).
+  - Evidence (direct quotes from the reviewer's artifact).
+  - Metadata block: `Severity: <level>`, `Confidence: <score>`, `Reviewer(s): <list>`, `Finding ID: <fingerprint>`.
+- **Labels** (when the tracker supports labels): severity tag (`P0`, `P1`, `P2`, `P3`) and, when the tracker convention supports it, a category label sourced from the reviewer name.
+- **Length cap:** when the composed body would exceed a tracker's body length limit, truncate with `... (continued in ce-code-review run artifact: .context/compound-engineering/ce-code-review/<run-id>/)` and include the finding_id in both the truncated body and the metadata block so the artifact is discoverable.
+
+The finding_id is a stable fingerprint composed as `normalize(file) + line_bucket(line, +/-3) + normalize(title)` — the same fingerprint used by the merge pipeline.
+
+---
+
+## Failure path
+
+When ticket creation fails at execution (API error, auth expiry mid-session, rate limit, malformed body rejected, 4xx/5xx response):
+
+**Interactive mode:** surface the failure inline and ask the user using the platform's blocking question tool.
+
+Stem:
+> Defer failed: <tracker name> returned <error summary>. How should the agent handle this finding?
+
+Options:
+- `Retry on <tracker>` — re-attempt the same tracker once more (useful for transient errors)
+- `Fall back to next sink` — move this finding's Defer to the next tier in the fallback chain (e.g., from Linear to GitHub Issues)
+- `Convert to Skip — record the failure` — abandon this Defer, note the failure in the completion report's failure section, and continue the walk-through or bulk flow
+
+**Non-interactive mode:** do not prompt. Automatically fall through to the next tier. If every tier fails, record the finding in the `failed` bucket of the structured return and continue. If the chain exhausts with no sink ever available, the finding ends up in the `no_sink` bucket.
+
+When a high-confidence named tracker fails at execution, the cached `named_sink_available` is set to `false` for the rest of the session. Subsequent Defer actions fall straight through to the next tier without retrying a confirmed-broken sink. `any_sink_available` is only downgraded to `false` when every tier has been confirmed broken — a failed Linear call that succeeds via `gh` keeps `any_sink_available = true`.
+
+Only when `ToolSearch` explicitly returns no match or the tool call errors — or on a platform with no blocking question tool — fall back to numbered options and waiting for the user's reply (Interactive mode only).
+
+---
+
+## Per-tracker behavior
+
+Concrete behavior per tracker at execution time. The agent may invoke any of these through the appropriate interface (MCP, CLI, or API) — the choice depends on what is available in the current environment.
+
+| Tracker | Interface | Invocation sketch | Body format | Labels |
+|---------|-----------|-------------------|-------------|--------|
+| Linear | MCP (preferred) or API | Create issue in the project/workspace identified by documentation; assign to the reporter if the MCP tool exposes user context | Markdown | Severity priority field if the MCP exposes it; otherwise include severity in body |
+| GitHub Issues | `gh issue create` | Repo defaults to the current repo. Use `--label` for severity tag when labels exist; omit `--label` if the repo has no label fixture. Fall back to a label-less issue on first failure. | Markdown | `--label P0` / `--label P1` / etc. when labels exist |
+| Jira | MCP or API | Create issue in the project identified by documentation; Jira's markdown dialect differs from GitHub's — use plain text in the body when MCP does not handle conversion | Plain text when MCP does not handle markdown | Severity priority field |
+| No sink available | — | Interactive: Defer option omitted, findings remain in the report's residual-work section. Non-interactive: findings returned in the `no_sink` bucket for caller routing. | — | — |
+
+When uncertain, prefer "drop with explicit user-facing notice" over "pass through silently and hope." A Defer that produces no durable artifact and no user message is data loss.
+
+---
+
+## Cross-platform notes
+
+The question-tool name varies by platform. In Interactive mode, use the platform's blocking question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). In Claude Code the tool should already be loaded from the Interactive-mode pre-load step — if it isn't, call `ToolSearch` with query `select:AskUserQuestion` now. Fall back to numbered options in chat only when the harness genuinely lacks a blocking tool — `ToolSearch` returns no match, the tool call explicitly fails, or the runtime mode does not expose it (e.g., Codex edit modes without `request_user_input`). A pending schema load is not a fallback trigger. Never silently skip the question.
+
+Non-interactive mode is platform-agnostic: it never prompts, so the platform's question tool is not relevant.
--- a/plugins/compound-engineering/skills/ce-work/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-work/SKILL.md
@@ -10,7 +10,7 @@ Execute work efficiently while maintaining quality and finishing features.

 ## Introduction

-This command takes a work document (plan, specification, or todo file) or a bare prompt describing the work, and executes it systematically. The focus is on **shipping complete features** by understanding requirements quickly, following existing patterns, and maintaining quality throughout.
+This command takes a work document (plan or specification) or a bare prompt describing the work, and executes it systematically. The focus is on **shipping complete features** by understanding requirements quickly, following existing patterns, and maintaining quality throughout.

 ## Input Document

@@ -22,7 +22,7 @@ This command takes a work document (plan, specification, or todo file) or a bare

 Determine how to proceed based on what was provided in `<input_document>`.

-**Plan document** (input is a file path to an existing plan, specification, or todo file) → skip to Phase 1.
+**Plan document** (input is a file path to an existing plan or specification) → skip to Phase 1.

 **Bare prompt** (input is a description of work, not a file path):

@@ -111,8 +111,8 @@ Determine how to proceed based on what was provided in `<input_document>`.
   - You want to keep the default branch clean while experimenting
   - You plan to switch between branches frequently

-3. **Create Todo List** _(skip if Phase 0 already built one, or if Phase 0 routed as Trivial)_
-   - Use your available task tracking tool (e.g., TodoWrite, task lists) to break the plan into actionable tasks
+3. **Create Task List** _(skip if Phase 0 already built one, or if Phase 0 routed as Trivial)_
+   - Use the platform's task tracking tool (`TaskCreate`/`TaskUpdate`/`TaskList` in Claude Code, `update_plan` in Codex, or the equivalent on other harnesses) to break the plan into actionable tasks
   - Derive tasks from the plan's implementation units, dependencies, files, test targets, and verification criteria
   - When the plan defines U-IDs for Implementation Units, preserve the unit's U-ID as a prefix in the task subject (e.g., "U3: Add parser coverage"). This keeps blocker references, deferred-work notes, and final summaries anchored to the same identifier the plan uses, so progress and traceability remain unambiguous across plan edits
   - Carry each unit's `Execution note` into the task when present
--- a/plugins/compound-engineering/skills/ce-work/references/shipping-workflow.md
+++ b/plugins/compound-engineering/skills/ce-work/references/shipping-workflow.md
@@ -20,7 +20,7 @@ This file contains the shipping workflow (Phase 3-4). Load it only when all Phas

   Every change gets reviewed before shipping. The depth scales with the change's risk profile, but review itself is never skipped.

-   **Tier 2: Full review (default)** -- REQUIRED unless Tier 1 criteria are explicitly met. Invoke the `ce-code-review` skill with `mode:autofix` to run specialized reviewer agents, auto-apply safe fixes, and surface residual work as todos. When the plan file path is known, pass it as `plan:<path>`. This is the mandatory default -- proceed to Tier 1 only after confirming every criterion below.
+   **Tier 2: Full review (default)** -- REQUIRED unless Tier 1 criteria are explicitly met. Invoke the `ce-code-review` skill with `mode:autofix` to run specialized reviewer agents, auto-apply safe fixes, and record residual downstream work in the per-run artifact. When the plan file path is known, pass it as `plan:<path>`. This is the mandatory default -- proceed to Tier 1 only after confirming every criterion below.

   **Tier 1: Inline self-review** -- A lighter alternative permitted only when **all four** criteria are true. Before choosing Tier 1, explicitly state which criteria apply and why. If any criterion is uncertain, use Tier 2.
   - Purely additive (new files only, no existing behavior modified)
@@ -28,7 +28,23 @@ This file contains the shipping workflow (Phase 3-4). Load it only when all Phas
   - Pattern-following (implementation mirrors an existing example with no novel logic)
   - Plan-faithful (no scope growth, no deferred questions resolved with surprising answers)

-3. **Final Validation**
+3. **Residual Work Gate** (REQUIRED when Tier 2 ran)
+
+   After Tier 2 code review completes, inspect the Residual Actionable Work summary it returned (or read the run artifact directly if the summary was not emitted). If one or more residual `downstream-resolver` findings remain, do not proceed to Final Validation until the user decides how to handle them.
+
+   Ask the user using the platform's blocking question tool (`AskUserQuestion` in Claude Code with `ToolSearch select:AskUserQuestion` pre-loaded if needed, `request_user_input` in Codex, `ask_user` in Gemini). Fall back to numbered options in chat only when the harness genuinely lacks a blocking tool. Never silently skip the gate.
+
+   Stem: `Code review found N residual finding(s) the skill did not auto-fix. How should the agent proceed?`
+
+   Options (four or fewer, self-contained labels):
+   - `Apply/fix now` — loop back into review with focused fixes; the agent investigates each finding, applies changes where safe, and re-runs review.
+   - `File tickets via project tracker` — load `references/tracker-defer.md` in Interactive mode; the agent files tickets in the project's detected tracker (or `gh` fallback, or leaves them in the report if no sink exists) and proceeds to Final Validation.
+   - `Accept and proceed` — record the residual findings verbatim in a durable "Known Residuals" sink before shipping. If a PR will be created or updated in Phase 4, include them in the PR description's "Known Residuals" section (the agent owns this when calling `ce-commit-push-pr`). If the user later chooses the no-PR `ce-commit` path, create `docs/residual-review-findings/<branch-or-head-sha>.md`, include the accepted findings and source review-run context, stage it with the implementation commit, and mention the file path in the final summary. The user has acknowledged the risk, but the findings must not live only in the transient session.
+   - `Stop — do not ship` — abort the shipping workflow. The user will handle findings manually before re-invoking.
+
+   Skip this gate entirely when the review reported `Residual actionable work: none.` or when only Tier 1 (inline self-review) was used. Do not proceed past this gate on an `Accept and proceed` decision until the agent has recorded whether the durable sink is `PR Known Residuals` or `docs/residual-review-findings/<branch-or-head-sha>.md`.
+
+4. **Final Validation**
   - All tasks marked completed
   - Testing addressed -- tests pass and new/changed behavior has corresponding test coverage (or an explicit justification for why tests are not needed)
   - Linting passes
@@ -38,7 +54,7 @@ This file contains the shipping workflow (Phase 3-4). Load it only when all Phas
   - If the plan has a `Requirements Trace`, verify each requirement is satisfied by the completed work
   - If any `Deferred to Implementation` questions were noted, confirm they were resolved during execution

-4. **Prepare Operational Validation Plan** (REQUIRED)
+5. **Prepare Operational Validation Plan** (REQUIRED)
   - Add a `## Post-Deploy Monitoring & Validation` section to the PR description for every change.
   - Include concrete:
     - Log queries/search terms
@@ -72,7 +88,8 @@ This file contains the shipping workflow (Phase 3-4). Load it only when all Phas
   - Testing notes (tests added/modified, manual testing performed)
   - Evidence context from step 1, so `ce-commit-push-pr` can decide whether to ask about capturing evidence
   - Figma design link (if applicable)
-   - The Post-Deploy Monitoring & Validation section (see Phase 3 Step 4)
+   - The Post-Deploy Monitoring & Validation section (see Phase 3 Step 5)
+   - Any "Known Residuals" accepted in the Phase 3 Residual Work Gate, rendered as a dedicated section in the PR body with severity, file:line, and title per finding

   If the user prefers to commit without creating a PR, load the `ce-commit` skill instead.

@@ -103,7 +120,7 @@ Before creating PR, verify:

 Every change gets reviewed. The tier determines depth, not whether review happens.

-**Tier 2 (full review)** -- REQUIRED default. Invoke `ce-code-review mode:autofix` with `plan:<path>` when available. Safe fixes are applied automatically; residual work surfaces as todos. Always use this tier unless all four Tier 1 criteria are explicitly confirmed.
+**Tier 2 (full review)** -- REQUIRED default. Invoke `ce-code-review mode:autofix` with `plan:<path>` when available. Safe fixes are applied automatically; residual work is recorded in the run artifact for downstream routing. Always use this tier unless all four Tier 1 criteria are explicitly confirmed.

 **Tier 1 (inline self-review)** -- permitted only when all four are true (state each explicitly before choosing):
 - Purely additive (new files only, no existing behavior modified)
--- a/plugins/compound-engineering/skills/ce-work/references/tracker-defer.md
+++ b/plugins/compound-engineering/skills/ce-work/references/tracker-defer.md
@@ -0,0 +1,149 @@
+# Tracker Detection and Defer Execution
+
+This reference covers how Defer actions file tickets in the project's tracker. It is loaded by `SKILL.md` when Interactive mode's routing question needs to decide whether to offer option C (File tickets), when the walk-through's Defer option executes, and when the bulk-preview of option C is shown. It is also loaded by autonomous callers (e.g., `lfg`) that need to file residual actionable findings without user prompts — see Execution Modes below.
+
+---
+
+## Execution Modes
+
+Tracker-defer has two execution modes. The caller selects one; the detection, fallback chain, and ticket composition are shared.
+
+### Interactive mode (default)
+
+Used by `ce-code-review` Interactive mode's routing question, walk-through Defer actions, and bulk-preview option C. All user-facing prompts fire:
+
+- First Defer of the session with a generic (non-named) label confirms the effective tracker choice.
+- Execution failures prompt with Retry / Fall back to next sink / Convert to Skip.
+- Labels in the routing question reflect `named_sink_available` (name the tracker) vs fallback generics.
+
+### Non-interactive mode
+
+Used by autonomous callers like `lfg` that must not prompt. All blocking questions are skipped; the fallback chain is executed silently in order. Behavior:
+
+- No confirmation on the first generic-label Defer; proceed directly.
+- On execution failure, automatically fall to the next tier without prompting. Record the failure.
+- On total chain exhaustion (every tier failed or no sink available), return findings in the `no_sink` bucket so the caller can route them to another surface (e.g., inline them in a PR description).
+- Return a structured result: `{ filed: [{ finding_id, tracker, url }], failed: [{ finding_id, tracker, reason }], no_sink: [{ finding_id, title, severity, file, line }] }`.
+
+The caller decides how to surface the result to the user. The non-interactive mode treats "no sink available" as a data-producing outcome, not a prompt trigger.
+
+---
+
+## Detection
+
+The agent determines the project's tracker from whatever documentation is obvious. Primary sources: `CLAUDE.md` and `AGENTS.md` at the repo root and in relevant subdirectories. Supplementary signals (when primary documentation is ambiguous): `CONTRIBUTING.md`, `README.md`, PR templates under `.github/`, visible tracker URLs in the repo.
+
+A tracker can be surfaced via MCP tool (e.g., a Linear MCP server), CLI (e.g., `gh`), or direct API. All are acceptable. The detection output is a tuple with two availability flags — one for the named tracker specifically (drives label confidence in Interactive mode) and one for the full fallback chain (drives whether Defer is offered at all):
+
+```
+{ tracker_name, confidence, named_sink_available, any_sink_available }
+```
+
+Where:
+- `tracker_name` — human-readable name ("Linear", "GitHub Issues", "Jira"), or `null` when detection cannot identify a specific tracker
+- `confidence` — `high` when the tracker is named explicitly in documentation (or via a linked URL to a specific project/workspace) and is unambiguously the project's canonical tracker; `low` when the signal is thin, conflicting, or implied only
+- `named_sink_available` — `true` only when the agent can actually invoke the detected tracker (MCP tool is loaded, CLI is authenticated, or API credentials are in environment); `false` when the tracker is documented but no tool reaches it, or when no tracker is found at all. Drives label confidence: inline tracker naming requires this to be `true`.
+- `any_sink_available` — `true` when any tier in the fallback chain (named tracker or GitHub Issues via `gh`) can be invoked this session. Drives whether Defer is offered in Interactive mode, and drives the `no_sink` bucket in Non-interactive mode.
+
+Detection is reasoning-based. Do not maintain an enumerated checklist of files to read. Read the obvious sources and form a confident conclusion; when the obvious sources don't resolve, the label falls back to generic wording and the agent confirms with the user before executing (Interactive mode only).
+
+---
+
+## Probe timing and caching
+
+Availability probes run **at most once per session** and **only when Defer execution is imminent**. Never speculatively at review start, never per-Defer, never per-walk-through-finding. The cached tuple is reused for every Defer action in the same run.
+
+Typical probe sequence:
+
+1. Read `CLAUDE.md` / `AGENTS.md` for tracker references. If nothing found, set `tracker_name = null`, `confidence = low`.
+2. **Probe the named tracker when one was found.** For GitHub Issues, run `gh auth status` and `gh repo view --json hasIssuesEnabled`. For Linear or other MCP-backed trackers, verify the relevant MCP tool is loaded and responsive. For API-backed trackers, verify credentials in environment. Set `named_sink_available` from the probe result.
+3. **Probe the GitHub Issues fallback to compute `any_sink_available`.** Even when the named tracker was found and probed, `gh` matters for the `no_sink` bucket decision so that a run with no documented tracker but working `gh` still offers Defer.
+   - If `named_sink_available = true`: `any_sink_available = true` (no further probes needed).
+   - Otherwise, probe GitHub Issues via `gh auth status` + `gh repo view --json hasIssuesEnabled` (skip if already probed in step 2). If it works, `any_sink_available = true`.
+   - Otherwise, `any_sink_available = false`.
+
+When Interactive mode's routing question is skipped entirely (R2 zero-findings case), no probes run. When the cached tuple is reused across a session, any `named_sink_available = true` from the session's first probe stays cached — do not re-probe per Defer.
+
+---
+
+## Label logic (Interactive mode)
+
+- When `confidence = high` AND `named_sink_available = true`: the routing question's option C and the walk-through's per-finding Defer option both include the tracker name verbatim. Example: `File a Linear ticket per finding`, `Defer — file a Linear ticket`.
+- When `any_sink_available = true` but either `confidence = low` or `named_sink_available = false` (a fallback tier is working instead): the labels read generically — `File an issue per finding`, `Defer — file a ticket`. Before executing the first Defer of the session, the agent confirms the effective tracker choice with the user using the platform's blocking question tool.
+- When `any_sink_available = false`: option C is omitted from the routing question, option B (Defer) is omitted from the walk-through per-finding options, and the agent tells the user why in the routing question's stem.
+
+Non-interactive mode skips label decisions entirely — it acts silently on the detected sink.
+
+---
+
+## Fallback chain
+
+When the named tracker is unavailable or no tracker is named, fall back in this order. Prefer the project's detected tracker; use `gh` only when no named tracker was found or the named one is unreachable.
+
+1. **Named tracker** (MCP tool, CLI, or API the agent can invoke directly, identified via Detection above)
+2. **GitHub Issues via `gh`** — when `gh auth status` succeeds and the current repo has issues enabled (`gh repo view --json hasIssuesEnabled` returns `true`)
+3. **No sink** — findings remain in the review report's residual-work section (Interactive mode) or are returned in the `no_sink` bucket for the caller to route (Non-interactive mode). The agent does not re-display them through a transient surface.
+
+Previously this chain included a third in-session fallback tier. That tier was removed because in-session tasks do not survive past the session and therefore do not meet the "durable filing" intent of a Defer action. When no durable tracker exists, the correct behavior is to leave findings in the report (Interactive) or return them to the caller (Non-interactive).
+
+---
+
+## Ticket composition
+
+Every Defer action creates a ticket with the following content, adapted to the tracker's capabilities:
+
+- **Title:** the merged finding's `title` (schema-capped at 10 words).
+- **Body:**
+  - Plain-English problem statement — reads the persona-produced `why_it_matters` from the contributing reviewer's artifact file at `.context/compound-engineering/ce-code-review/<run-id>/{reviewer}.json`, using the same `file + line_bucket(line, +/-3) + normalize(title)` matching headless mode uses (see SKILL.md Stage 6 detail enrichment). Falls back to the merged finding's `title`, `severity`, `file`, and `suggested_fix` (when present) when no artifact match is available — these fields are guaranteed in the merge-tier compact return.
+  - Suggested fix (when present in the finding's `suggested_fix`).
+  - Evidence (direct quotes from the reviewer's artifact).
+  - Metadata block: `Severity: <level>`, `Confidence: <score>`, `Reviewer(s): <list>`, `Finding ID: <fingerprint>`.
+- **Labels** (when the tracker supports labels): severity tag (`P0`, `P1`, `P2`, `P3`) and, when the tracker convention supports it, a category label sourced from the reviewer name.
+- **Length cap:** when the composed body would exceed a tracker's body length limit, truncate with `... (continued in ce-code-review run artifact: .context/compound-engineering/ce-code-review/<run-id>/)` and include the finding_id in both the truncated body and the metadata block so the artifact is discoverable.
+
+The finding_id is a stable fingerprint composed as `normalize(file) + line_bucket(line, +/-3) + normalize(title)` — the same fingerprint used by the merge pipeline.
+
+---
+
+## Failure path
+
+When ticket creation fails at execution (API error, auth expiry mid-session, rate limit, malformed body rejected, 4xx/5xx response):
+
+**Interactive mode:** surface the failure inline and ask the user using the platform's blocking question tool.
+
+Stem:
+> Defer failed: <tracker name> returned <error summary>. How should the agent handle this finding?
+
+Options:
+- `Retry on <tracker>` — re-attempt the same tracker once more (useful for transient errors)
+- `Fall back to next sink` — move this finding's Defer to the next tier in the fallback chain (e.g., from Linear to GitHub Issues)
+- `Convert to Skip — record the failure` — abandon this Defer, note the failure in the completion report's failure section, and continue the walk-through or bulk flow
+
+**Non-interactive mode:** do not prompt. Automatically fall through to the next tier. If every tier fails, record the finding in the `failed` bucket of the structured return and continue. If the chain exhausts with no sink ever available, the finding ends up in the `no_sink` bucket.
+
+When a high-confidence named tracker fails at execution, the cached `named_sink_available` is set to `false` for the rest of the session. Subsequent Defer actions fall straight through to the next tier without retrying a confirmed-broken sink. `any_sink_available` is only downgraded to `false` when every tier has been confirmed broken — a failed Linear call that succeeds via `gh` keeps `any_sink_available = true`.
+
+Only when `ToolSearch` explicitly returns no match or the tool call errors — or on a platform with no blocking question tool — fall back to numbered options and waiting for the user's reply (Interactive mode only).
+
+---
+
+## Per-tracker behavior
+
+Concrete behavior per tracker at execution time. The agent may invoke any of these through the appropriate interface (MCP, CLI, or API) — the choice depends on what is available in the current environment.
+
+| Tracker | Interface | Invocation sketch | Body format | Labels |
+|---------|-----------|-------------------|-------------|--------|
+| Linear | MCP (preferred) or API | Create issue in the project/workspace identified by documentation; assign to the reporter if the MCP tool exposes user context | Markdown | Severity priority field if the MCP exposes it; otherwise include severity in body |
+| GitHub Issues | `gh issue create` | Repo defaults to the current repo. Use `--label` for severity tag when labels exist; omit `--label` if the repo has no label fixture. Fall back to a label-less issue on first failure. | Markdown | `--label P0` / `--label P1` / etc. when labels exist |
+| Jira | MCP or API | Create issue in the project identified by documentation; Jira's markdown dialect differs from GitHub's — use plain text in the body when MCP does not handle conversion | Plain text when MCP does not handle markdown | Severity priority field |
+| No sink available | — | Interactive: Defer option omitted, findings remain in the report's residual-work section. Non-interactive: findings returned in the `no_sink` bucket for caller routing. | — | — |
+
+When uncertain, prefer "drop with explicit user-facing notice" over "pass through silently and hope." A Defer that produces no durable artifact and no user message is data loss.
+
+---
+
+## Cross-platform notes
+
+The question-tool name varies by platform. In Interactive mode, use the platform's blocking question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). In Claude Code the tool should already be loaded from the Interactive-mode pre-load step — if it isn't, call `ToolSearch` with query `select:AskUserQuestion` now. Fall back to numbered options in chat only when the harness genuinely lacks a blocking tool — `ToolSearch` returns no match, the tool call explicitly fails, or the runtime mode does not expose it (e.g., Codex edit modes without `request_user_input`). A pending schema load is not a fallback trigger. Never silently skip the question.
+
+Non-interactive mode is platform-agnostic: it never prompts, so the platform's question tool is not relevant.
--- a/plugins/compound-engineering/skills/lfg/SKILL.md
+++ b/plugins/compound-engineering/skills/lfg/SKILL.md
@@ -19,12 +19,40 @@ CRITICAL: You MUST execute every step below IN ORDER. Do NOT skip any required s

 4. `/ce-code-review mode:autofix plan:<plan-path-from-step-2>`

-   Pass the plan file path from step 2 so ce-code-review can verify requirements completeness.
+   Pass the plan file path from step 2 so ce-code-review can verify requirements completeness. Read the Residual Actionable Work summary the skill emits.

-5. `/ce-todo-resolve`
+5. **Persist review autofixes** (REQUIRED after step 4, before residual handoff)

-6. `/ce-test-browser`
+   Check `git status --short`. If `ce-code-review mode:autofix` changed files, stage only those review-fix files, commit them with `fix(review): apply autofix feedback`, and push the current branch before continuing. If an upstream exists, run `git push`. If no upstream exists, resolve a writable remote dynamically: prefer `origin` when present, otherwise use `git remote` and choose the first configured remote. Then run `git push --set-upstream <remote> HEAD`. Do not proceed to step 6, run browser tests, or output DONE while review autofix edits remain only in the working tree. If no files changed, explicitly note that there were no review autofixes to persist.

-7. Output `<promise>DONE</promise>` when complete
+6. **Autonomous residual handoff** (only when step 4 reported one or more residual `downstream-resolver` findings; skip when it reported `Residual actionable work: none.`)
+
+   Do not prompt the user. This step embraces the autopilot contract: residuals must become durable before DONE, but the agent never stops to ask.
+
+   1. Load `references/tracker-defer.md` in **non-interactive mode**. Pass the residual actionable findings from step 4's summary (or the run artifact when the summary was truncated).
+   2. Collect the structured return: `{ filed: [...], failed: [...], no_sink: [...] }`.
+   3. Compose a `## Residual Review Findings` markdown section from the structured return:
+      - For each item in `filed`: a bullet with severity, file:line, title, and a link to the tracker ticket URL.
+      - For each item in `failed`: a bullet with severity, file:line, title, and the failure reason (e.g., `Defer failed: gh returned 401 — tracker unavailable`).
+      - For each item in `no_sink`: a bullet with severity, file:line, and title inlined verbatim so the PR body or fallback file is the durable record.
+   4. Detect the current branch's open PR without prompting:
+
+      ```bash
+      gh pr view --json number,url,body,state
+      ```
+
+   5. If an open PR exists, update it directly with `gh`; do not load any confirmation-driven PR update skill. Append or replace the `## Residual Review Findings` section in the current PR body, write the new body to an OS temp file, then run:
+
+      ```bash
+      gh pr edit PR_NUMBER --body-file BODY_FILE
+      ```
+
+   6. If no open PR exists, create a tracked fallback file at `docs/residual-review-findings/<branch-or-head-sha>.md` containing the composed section and the source PR-review run context. Stage only that file, commit it with `docs(review): record residual review findings`, and push the current branch. If an upstream exists, run `git push`. If no upstream exists, resolve a writable remote dynamically: prefer `origin` when present, otherwise use `git remote` and choose the first configured remote. Then run `git push --set-upstream <remote> HEAD`. This is the durable no-PR sink. Do not output DONE until either the existing PR body has been updated or this fallback file commit has been pushed. If both paths fail, stop and report the failed commands; do not silently proceed.
+
+   Never block DONE on tracker filing failures once residuals have been durably recorded. A `no_sink` outcome is success only when the findings are present in the PR body or in the pushed fallback file.
+
+7. `/ce-test-browser`
+
+8. Output `<promise>DONE</promise>` when complete

 Start with step 2 now (or step 1 if ralph-loop is available). Remember: plan FIRST, then work. Never skip the plan.
--- a/plugins/compound-engineering/skills/lfg/references/tracker-defer.md
+++ b/plugins/compound-engineering/skills/lfg/references/tracker-defer.md
@@ -0,0 +1,149 @@
+# Tracker Detection and Defer Execution
+
+This reference covers how Defer actions file tickets in the project's tracker. It is loaded by `SKILL.md` when Interactive mode's routing question needs to decide whether to offer option C (File tickets), when the walk-through's Defer option executes, and when the bulk-preview of option C is shown. It is also loaded by autonomous callers (e.g., `lfg`) that need to file residual actionable findings without user prompts — see Execution Modes below.
+
+---
+
+## Execution Modes
+
+Tracker-defer has two execution modes. The caller selects one; the detection, fallback chain, and ticket composition are shared.
+
+### Interactive mode (default)
+
+Used by `ce-code-review` Interactive mode's routing question, walk-through Defer actions, and bulk-preview option C. All user-facing prompts fire:
+
+- First Defer of the session with a generic (non-named) label confirms the effective tracker choice.
+- Execution failures prompt with Retry / Fall back to next sink / Convert to Skip.
+- Labels in the routing question reflect `named_sink_available` (name the tracker) vs fallback generics.
+
+### Non-interactive mode
+
+Used by autonomous callers like `lfg` that must not prompt. All blocking questions are skipped; the fallback chain is executed silently in order. Behavior:
+
+- No confirmation on the first generic-label Defer; proceed directly.
+- On execution failure, automatically fall to the next tier without prompting. Record the failure.
+- On total chain exhaustion (every tier failed or no sink available), return findings in the `no_sink` bucket so the caller can route them to another surface (e.g., inline them in a PR description).
+- Return a structured result: `{ filed: [{ finding_id, tracker, url }], failed: [{ finding_id, tracker, reason }], no_sink: [{ finding_id, title, severity, file, line }] }`.
+
+The caller decides how to surface the result to the user. The non-interactive mode treats "no sink available" as a data-producing outcome, not a prompt trigger.
+
+---
+
+## Detection
+
+The agent determines the project's tracker from whatever documentation is obvious. Primary sources: `CLAUDE.md` and `AGENTS.md` at the repo root and in relevant subdirectories. Supplementary signals (when primary documentation is ambiguous): `CONTRIBUTING.md`, `README.md`, PR templates under `.github/`, visible tracker URLs in the repo.
+
+A tracker can be surfaced via MCP tool (e.g., a Linear MCP server), CLI (e.g., `gh`), or direct API. All are acceptable. The detection output is a tuple with two availability flags — one for the named tracker specifically (drives label confidence in Interactive mode) and one for the full fallback chain (drives whether Defer is offered at all):
+
+```
+{ tracker_name, confidence, named_sink_available, any_sink_available }
+```
+
+Where:
+- `tracker_name` — human-readable name ("Linear", "GitHub Issues", "Jira"), or `null` when detection cannot identify a specific tracker
+- `confidence` — `high` when the tracker is named explicitly in documentation (or via a linked URL to a specific project/workspace) and is unambiguously the project's canonical tracker; `low` when the signal is thin, conflicting, or implied only
+- `named_sink_available` — `true` only when the agent can actually invoke the detected tracker (MCP tool is loaded, CLI is authenticated, or API credentials are in environment); `false` when the tracker is documented but no tool reaches it, or when no tracker is found at all. Drives label confidence: inline tracker naming requires this to be `true`.
+- `any_sink_available` — `true` when any tier in the fallback chain (named tracker or GitHub Issues via `gh`) can be invoked this session. Drives whether Defer is offered in Interactive mode, and drives the `no_sink` bucket in Non-interactive mode.
+
+Detection is reasoning-based. Do not maintain an enumerated checklist of files to read. Read the obvious sources and form a confident conclusion; when the obvious sources don't resolve, the label falls back to generic wording and the agent confirms with the user before executing (Interactive mode only).
+
+---
+
+## Probe timing and caching
+
+Availability probes run **at most once per session** and **only when Defer execution is imminent**. Never speculatively at review start, never per-Defer, never per-walk-through-finding. The cached tuple is reused for every Defer action in the same run.
+
+Typical probe sequence:
+
+1. Read `CLAUDE.md` / `AGENTS.md` for tracker references. If nothing found, set `tracker_name = null`, `confidence = low`.
+2. **Probe the named tracker when one was found.** For GitHub Issues, run `gh auth status` and `gh repo view --json hasIssuesEnabled`. For Linear or other MCP-backed trackers, verify the relevant MCP tool is loaded and responsive. For API-backed trackers, verify credentials in environment. Set `named_sink_available` from the probe result.
+3. **Probe the GitHub Issues fallback to compute `any_sink_available`.** Even when the named tracker was found and probed, `gh` matters for the `no_sink` bucket decision so that a run with no documented tracker but working `gh` still offers Defer.
+   - If `named_sink_available = true`: `any_sink_available = true` (no further probes needed).
+   - Otherwise, probe GitHub Issues via `gh auth status` + `gh repo view --json hasIssuesEnabled` (skip if already probed in step 2). If it works, `any_sink_available = true`.
+   - Otherwise, `any_sink_available = false`.
+
+When Interactive mode's routing question is skipped entirely (R2 zero-findings case), no probes run. When the cached tuple is reused across a session, any `named_sink_available = true` from the session's first probe stays cached — do not re-probe per Defer.
+
+---
+
+## Label logic (Interactive mode)
+
+- When `confidence = high` AND `named_sink_available = true`: the routing question's option C and the walk-through's per-finding Defer option both include the tracker name verbatim. Example: `File a Linear ticket per finding`, `Defer — file a Linear ticket`.
+- When `any_sink_available = true` but either `confidence = low` or `named_sink_available = false` (a fallback tier is working instead): the labels read generically — `File an issue per finding`, `Defer — file a ticket`. Before executing the first Defer of the session, the agent confirms the effective tracker choice with the user using the platform's blocking question tool.
+- When `any_sink_available = false`: option C is omitted from the routing question, option B (Defer) is omitted from the walk-through per-finding options, and the agent tells the user why in the routing question's stem.
+
+Non-interactive mode skips label decisions entirely — it acts silently on the detected sink.
+
+---
+
+## Fallback chain
+
+When the named tracker is unavailable or no tracker is named, fall back in this order. Prefer the project's detected tracker; use `gh` only when no named tracker was found or the named one is unreachable.
+
+1. **Named tracker** (MCP tool, CLI, or API the agent can invoke directly, identified via Detection above)
+2. **GitHub Issues via `gh`** — when `gh auth status` succeeds and the current repo has issues enabled (`gh repo view --json hasIssuesEnabled` returns `true`)
+3. **No sink** — findings remain in the review report's residual-work section (Interactive mode) or are returned in the `no_sink` bucket for the caller to route (Non-interactive mode). The agent does not re-display them through a transient surface.
+
+Previously this chain included a third in-session fallback tier. That tier was removed because in-session tasks do not survive past the session and therefore do not meet the "durable filing" intent of a Defer action. When no durable tracker exists, the correct behavior is to leave findings in the report (Interactive) or return them to the caller (Non-interactive).
+
+---
+
+## Ticket composition
+
+Every Defer action creates a ticket with the following content, adapted to the tracker's capabilities:
+
+- **Title:** the merged finding's `title` (schema-capped at 10 words).
+- **Body:**
+  - Plain-English problem statement — reads the persona-produced `why_it_matters` from the contributing reviewer's artifact file at `.context/compound-engineering/ce-code-review/<run-id>/{reviewer}.json`, using the same `file + line_bucket(line, +/-3) + normalize(title)` matching headless mode uses (see SKILL.md Stage 6 detail enrichment). Falls back to the merged finding's `title`, `severity`, `file`, and `suggested_fix` (when present) when no artifact match is available — these fields are guaranteed in the merge-tier compact return.
+  - Suggested fix (when present in the finding's `suggested_fix`).
+  - Evidence (direct quotes from the reviewer's artifact).
+  - Metadata block: `Severity: <level>`, `Confidence: <score>`, `Reviewer(s): <list>`, `Finding ID: <fingerprint>`.
+- **Labels** (when the tracker supports labels): severity tag (`P0`, `P1`, `P2`, `P3`) and, when the tracker convention supports it, a category label sourced from the reviewer name.
+- **Length cap:** when the composed body would exceed a tracker's body length limit, truncate with `... (continued in ce-code-review run artifact: .context/compound-engineering/ce-code-review/<run-id>/)` and include the finding_id in both the truncated body and the metadata block so the artifact is discoverable.
+
+The finding_id is a stable fingerprint composed as `normalize(file) + line_bucket(line, +/-3) + normalize(title)` — the same fingerprint used by the merge pipeline.
+
+---
+
+## Failure path
+
+When ticket creation fails at execution (API error, auth expiry mid-session, rate limit, malformed body rejected, 4xx/5xx response):
+
+**Interactive mode:** surface the failure inline and ask the user using the platform's blocking question tool.
+
+Stem:
+> Defer failed: <tracker name> returned <error summary>. How should the agent handle this finding?
+
+Options:
+- `Retry on <tracker>` — re-attempt the same tracker once more (useful for transient errors)
+- `Fall back to next sink` — move this finding's Defer to the next tier in the fallback chain (e.g., from Linear to GitHub Issues)
+- `Convert to Skip — record the failure` — abandon this Defer, note the failure in the completion report's failure section, and continue the walk-through or bulk flow
+
+**Non-interactive mode:** do not prompt. Automatically fall through to the next tier. If every tier fails, record the finding in the `failed` bucket of the structured return and continue. If the chain exhausts with no sink ever available, the finding ends up in the `no_sink` bucket.
+
+When a high-confidence named tracker fails at execution, the cached `named_sink_available` is set to `false` for the rest of the session. Subsequent Defer actions fall straight through to the next tier without retrying a confirmed-broken sink. `any_sink_available` is only downgraded to `false` when every tier has been confirmed broken — a failed Linear call that succeeds via `gh` keeps `any_sink_available = true`.
+
+Only when `ToolSearch` explicitly returns no match or the tool call errors — or on a platform with no blocking question tool — fall back to numbered options and waiting for the user's reply (Interactive mode only).
+
+---
+
+## Per-tracker behavior
+
+Concrete behavior per tracker at execution time. The agent may invoke any of these through the appropriate interface (MCP, CLI, or API) — the choice depends on what is available in the current environment.
+
+| Tracker | Interface | Invocation sketch | Body format | Labels |
+|---------|-----------|-------------------|-------------|--------|
+| Linear | MCP (preferred) or API | Create issue in the project/workspace identified by documentation; assign to the reporter if the MCP tool exposes user context | Markdown | Severity priority field if the MCP exposes it; otherwise include severity in body |
+| GitHub Issues | `gh issue create` | Repo defaults to the current repo. Use `--label` for severity tag when labels exist; omit `--label` if the repo has no label fixture. Fall back to a label-less issue on first failure. | Markdown | `--label P0` / `--label P1` / etc. when labels exist |
+| Jira | MCP or API | Create issue in the project identified by documentation; Jira's markdown dialect differs from GitHub's — use plain text in the body when MCP does not handle conversion | Plain text when MCP does not handle markdown | Severity priority field |
+| No sink available | — | Interactive: Defer option omitted, findings remain in the report's residual-work section. Non-interactive: findings returned in the `no_sink` bucket for caller routing. | — | — |
+
+When uncertain, prefer "drop with explicit user-facing notice" over "pass through silently and hope." A Defer that produces no durable artifact and no user message is data loss.
+
+---
+
+## Cross-platform notes
+
+The question-tool name varies by platform. In Interactive mode, use the platform's blocking question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). In Claude Code the tool should already be loaded from the Interactive-mode pre-load step — if it isn't, call `ToolSearch` with query `select:AskUserQuestion` now. Fall back to numbered options in chat only when the harness genuinely lacks a blocking tool — `ToolSearch` returns no match, the tool call explicitly fails, or the runtime mode does not expose it (e.g., Codex edit modes without `request_user_input`). A pending schema load is not a fallback trigger. Never silently skip the question.
+
+Non-interactive mode is platform-agnostic: it never prompts, so the platform's question tool is not relevant.