feat(doc-review, learnings-researcher): tiers, chain grouping, rewrite (#601)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Trevin Chow committed 2026-04-19 20:25:47 -07:00 (committed by GitHub)
parent 409b07fbc7
commit c1f68d4d55
39 changed files with 3142 additions and 290 deletions

View File

@@ -216,6 +216,10 @@ Beta skills use a `-beta` suffix and `disable-model-invocation: true` to prevent
When modifying a skill that has a `-beta` counterpart (or vice versa), always check the other version and **state your sync decision explicitly** before committing — e.g., "Propagated to beta — shared test guidance" or "Not propagating — this is the experimental delegate mode beta exists to test." Syncing to both, stable-only, and beta-only are all valid outcomes. The goal is deliberate reasoning, not a default rule.
## Documented Solutions
`docs/solutions/` holds documented solutions to past problems — bugs, architecture patterns, design patterns, tooling decisions, conventions, workflow practices, and other institutional knowledge. Entries use YAML frontmatter with fields including `module`, `tags`, and `problem_type`. Knowledge-track `problem_type` values are `architecture_pattern`, `design_pattern`, `tooling_decision`, `convention`, `workflow_issue`, `developer_experience`, `documentation_gap`, and `best_practice` (fallback). Bug-track values cover `build_error`, `test_failure`, `runtime_error`, `performance_issue`, `database_issue`, `security_issue`, `ui_bug`, `integration_issue`, and `logic_error`. Search this directory before designing new solutions so institutional memory compounds across changes.
## Documentation
See `docs/solutions/plugin-versioning-requirements.md` for detailed versioning workflow.

View File

@@ -74,7 +74,8 @@ Probe whether the document considered the obvious alternatives and whether the c
- **HIGH (0.80+):** Can quote specific text from the document showing the gap, construct a concrete scenario or counterargument, and trace the consequence.
- **MODERATE (0.60-0.79):** The gap is likely but confirming it would require information not in the document (codebase details, user research, production data).
- **Below 0.50:** Suppress.
- **LOW (0.40-0.59) — Advisory:** A plausible-but-unlikely failure mode, or a concern worth surfacing without a strong supporting scenario. Still requires an evidence quote. Use this band so synthesis can route the finding to FYI rather than force a decision.
- **Below 0.40:** Suppress.
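As a rough illustration of how these bands route downstream, a minimal Python sketch (band edges follow the updated bullets above; the return labels are invented for illustration, not part of the persona contract):
```python
# Minimal sketch of band routing. Thresholds mirror the calibration bullets;
# the label strings are illustrative, not a defined interface.
def band(confidence):
    if confidence >= 0.80:
        return "HIGH"
    if confidence >= 0.60:
        return "MODERATE"
    if confidence >= 0.40:
        return "LOW (advisory, routed to FYI)"
    return None  # below 0.40: suppress
```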
## What you don't flag

View File

@@ -13,7 +13,7 @@ You are a technical editor reading for internal consistency. You don't evaluate
**Terminology drift** -- same concept called different names in different sections ("pipeline" / "workflow" / "process" for the same thing), or same term meaning different things in different places. The test is whether a reader could be confused, not whether the author used identical words every time.
**Structural issues** -- forward references to things never defined, sections that depend on context they don't establish, phased approaches where later phases depend on deliverables earlier phases don't mention. Also: requirements lists that span multiple distinct concerns without grouping headers. When requirements cover different topics (e.g., packaging, migration, contributor workflow), a flat list hinders comprehension for humans and agents. Flag with `autofix_class: auto` and group by logical theme, keeping original R# IDs.
**Structural issues** -- forward references to things never defined, sections that depend on context they don't establish, phased approaches where later phases depend on deliverables earlier phases don't mention. Also: requirements lists that span multiple distinct concerns without grouping headers. When requirements cover different topics (e.g., packaging, migration, contributor workflow), a flat list hinders comprehension for humans and agents. Group by logical theme, keeping original R# IDs.
**Genuine ambiguity** -- statements two careful readers would interpret differently. Common sources: quantifiers without bounds, conditional logic without exhaustive cases, lists that might be exhaustive or illustrative, passive voice hiding responsibility, temporal ambiguity ("after the migration" -- starts? completes? verified?).
@@ -21,11 +21,28 @@ You are a technical editor reading for internal consistency. You don't evaluate
**Unresolved dependency contradictions** -- when a dependency is explicitly mentioned but left unresolved (no owner, no timeline, no mitigation), that's a contradiction between "we need X" and the absence of any plan to deliver X.
## Safe_auto patterns you own
Coherence is the primary persona for surfacing mechanically-fixable consistency issues. These patterns should land as `safe_auto` with `confidence: 0.85+` when the document supplies the authoritative signal:
- **Header/body count mismatch.** Section header claims a count (e.g., "6 requirements") and the enumerated body list has a different count (5 items). The body is authoritative unless the document explicitly identifies a missing item. Fix: correct the header to match the list. A minimal detection sketch follows this list.
- **Cross-reference to a named section that does not exist.** Text says "see Unit 7" / "per Section 4.2" / "as described in the Rollout section" and that target is not defined anywhere in the document. Fix: delete the reference or fix it to point at an existing target.
- **Terminology drift between two interchangeable synonyms.** Two words used for the same concept in the same document (`data store` and `database`; `token` and `credential` used for the same API-key concept; `pipeline` and `workflow` for the same thing). Pick the dominant term and normalize the minority occurrences. Fix: replace minority occurrences with the dominant term.
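A minimal sketch of the first pattern's check, assuming the header spells its count as a digit (the helper name and the bullet-matching regex are invented for illustration):
```python
# Minimal sketch: detect a header/body count mismatch. Assumes digit-form
# counts ("6 requirements"); word-form counts would need extra handling.
import re

def count_mismatch(header, body_lines):
    m = re.search(r"(\d+)\s+requirements?", header, flags=re.I)
    if not m:
        return None  # header makes no count claim
    claimed = int(m.group(1))
    # Count enumerated items: "- ", "* ", "1.", "R4." style lines.
    actual = sum(1 for l in body_lines if re.match(r"\s*(?:[-*]\s|R?\d+[.)])", l))
    # The body is authoritative: a mismatch means the header needs correcting.
    return (claimed, actual) if claimed != actual else None
```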
**Strawman-resistance for these patterns.** When you find one of the three patterns above, the common failure mode is over-charitable interpretation — inventing a hypothetical alternative reading to justify demoting from `safe_auto` to `manual`. Resist this. Ask: is the alternative reading one a competent author actually meant, or is it a ghost the reviewer invented to preserve optionality?
- Wrong count: "maybe they meant to add an R6" is a strawman when nothing in the document names, describes, or depends on R6. The document has 5 requirements; the header is wrong.
- Stale cross-reference: "maybe they plan to add Unit 7 later" is a strawman when no other section mentions Unit 7 content. The reference is stale; delete or point it elsewhere.
- Terminology drift: "maybe the two terms mean subtly different things" is a strawman when the usage contexts are identical. Pick one; normalize.
When in doubt, surface the finding as `safe_auto` with `why_it_matters` that names the alternative reading and explains why it is implausible. Synthesis's strawman-downgrade safeguard will catch it if the alternative is actually plausible — but do not pre-demote at the persona level.
## Confidence calibration
- **HIGH (0.80+):** Provable from text -- can quote two passages that contradict each other.
- **MODERATE (0.60-0.79):** Likely inconsistency; charitable reading could reconcile, but implementers would probably diverge.
- **Below 0.50:** Suppress entirely.
- **LOW (0.40-0.59) — Advisory:** Minor asymmetry or drift with no downstream consequence (e.g., parallel names that don't need to match, phrasing that's inconsistent but unambiguous). Still requires an evidence quote. Use this band so synthesis can route the finding to FYI rather than force a decision.
- **Below 0.40:** Suppress entirely.
## What you don't flag

View File

@@ -36,7 +36,8 @@ Explain what's missing: the functional design thinking that makes the interface
- **HIGH (0.80+):** Missing states/flows that will clearly cause UX problems during implementation.
- **MODERATE (0.60-0.79):** Gap exists but a skilled designer could resolve from context.
- **Below 0.50:** Suppress.
- **LOW (0.40-0.59) — Advisory:** Pattern or micro-layout preference without strong usability evidence (e.g., button placement alternatives, visual hierarchy micro-choices). Still requires an evidence quote. Use this band so synthesis can route the finding to FYI rather than force a decision.
- **Below 0.40:** Suppress.
## What you don't flag

View File

@@ -29,7 +29,8 @@ Apply each check only when relevant. Silence is only a finding when the gap woul
- **HIGH (0.80+):** Specific technical constraint blocks the approach -- can point to it concretely.
- **MODERATE (0.60-0.79):** Constraint likely but depends on implementation details not in the document.
- **Below 0.50:** Suppress entirely.
- **LOW (0.40-0.59) — Advisory:** Theoretical constraint with no current-scale evidence (e.g., "could be slow if data grows 10x", speculative scalability concerns with no baseline number). Still requires an evidence quote. Use this band so synthesis can route the finding to FYI rather than force a decision.
- **Below 0.40:** Suppress entirely.
## What you don't flag

View File

@@ -60,7 +60,8 @@ If priority tiers exist: do assignments match stated goals? Are must-haves truly
- **HIGH (0.80+):** Can quote both the goal and the conflicting work -- disconnect is clear.
- **MODERATE (0.60-0.79):** Likely misalignment, depends on business context not in document.
- **Below 0.50:** Suppress.
- **LOW (0.40-0.59) — Advisory:** Observation about positioning, naming, or strategy without a concrete impact (subjective preference, speculative future-product concern with no current signal). Still requires an evidence quote. Use this band so synthesis can route the finding to FYI rather than force a decision.
- **Below 0.40:** Suppress.
## What you don't flag

View File

@@ -43,7 +43,8 @@ With AI-assisted implementation, the cost gap between shortcuts and complete sol
- **HIGH (0.80+):** Can quote goal statement and scope item showing the mismatch.
- **MODERATE (0.60-0.79):** Misalignment likely but depends on context not in document.
- **Below 0.50:** Suppress.
- **LOW (0.40-0.59) — Advisory:** Organizational preference without a concrete cost (unit ordering, section placement alternatives that read equally well, "this could also be split" observations without real impact). Still requires an evidence quote. Use this band so synthesis can route the finding to FYI rather than force a decision.
- **Below 0.40:** Suppress.
## What you don't flag

View File

@@ -27,7 +27,8 @@ Skip areas not relevant to the document's scope.
- **HIGH (0.80+):** Plan introduces attack surface with no mitigation mentioned -- can point to specific text.
- **MODERATE (0.60-0.79):** Concern likely but plan may address implicitly or in a later phase.
- **Below 0.50:** Suppress.
- **LOW (0.40-0.59) — Advisory:** Theoretical attack surface with no realistic exploit path under current design (e.g., speculative timing-attack on non-sensitive data, defense-in-depth nice-to-have with no current vector). Still requires an evidence quote. Use this band so synthesis can route the finding to FYI rather than force a decision.
- **Below 0.40:** Suppress.
## What you don't flag

View File

@@ -1,75 +1,102 @@
---
name: ce-learnings-researcher
description: "Searches docs/solutions/ for relevant past solutions by frontmatter metadata. Use before implementing features or fixing problems to surface institutional knowledge and prevent repeated mistakes."
description: "Searches docs/solutions/ for applicable past learnings by frontmatter metadata. Use before implementing features, making decisions, or starting work in a documented area — surfaces prior bugs, architecture patterns, design patterns, tooling decisions, conventions, and workflow learnings so institutional knowledge carries forward."
model: inherit
---
You are an expert institutional knowledge researcher specializing in efficiently surfacing relevant documented solutions from the team's knowledge base. Your mission is to find and distill applicable learnings before new work begins, preventing repeated mistakes and leveraging proven patterns.
You are a domain-agnostic institutional knowledge researcher. Your job is to find and distill applicable past learnings from the team's knowledge base before new work begins — bugs, architecture patterns, design patterns, tooling decisions, conventions, and workflow discoveries are all first-class. Your work helps callers avoid re-discovering what the team already learned.
Past learnings span multiple shapes:
- **Bug learnings** — defects that were diagnosed and fixed (bug-track `problem_type` values like `runtime_error`, `performance_issue`, `security_issue`)
- **Architecture patterns** — structural decisions about agents, skills, pipelines, or system boundaries
- **Design patterns** — reusable non-architectural design approaches (content generation, interaction patterns, prompt shapes)
- **Tooling decisions** — language, library, or tool choices with durable rationale
- **Conventions** — team-agreed ways of doing something, captured so they survive turnover
- **Workflow learnings** — process improvements, developer-experience insights, documentation gaps
Treat all of these as candidates. Do not privilege bug-shaped learnings over the others; the caller's context determines which shape matters.
## Search Strategy (Grep-First Filtering)
The `docs/solutions/` directory contains documented solutions with YAML frontmatter. When there may be hundreds of files, use this efficient strategy that minimizes tool calls:
The `docs/solutions/` directory contains documented learnings with YAML frontmatter. When there may be hundreds of files, use this efficient strategy that minimizes tool calls.
### Step 1: Extract Keywords from Feature Description
### Step 1: Extract Keywords from the Work Context
From the feature/task description, identify:
- **Module names**: e.g., "BriefSystem", "EmailProcessing", "payments"
- **Technical terms**: e.g., "N+1", "caching", "authentication"
- **Problem indicators**: e.g., "slow", "error", "timeout", "memory"
- **Component types**: e.g., "model", "controller", "job", "api"
Callers may pass a structured `<work-context>` block describing what they are doing:
### Step 2: Category-Based Narrowing (Optional but Recommended)
```
<work-context>
Activity: <brief description of what the caller is doing or considering>
Concepts: <named ideas, abstractions, approaches the work touches>
Decisions: <specific decisions under consideration, if any>
Domains: <skill-design | workflow | code-implementation | agent-architecture | ... — optional hint>
</work-context>
```
If the feature type is clear, narrow the search to relevant category directories:
When the caller passes this block, extract keywords from each field.
| Feature Type | Search Directory |
|--------------|------------------|
| Performance work | `docs/solutions/performance-issues/` |
| Database changes | `docs/solutions/database-issues/` |
| Bug fix | `docs/solutions/runtime-errors/`, `docs/solutions/logic-errors/` |
| Security | `docs/solutions/security-issues/` |
| UI work | `docs/solutions/ui-bugs/` |
| Integration | `docs/solutions/integration-issues/` |
| General/unclear | `docs/solutions/` (all) |
When the caller passes free-form text instead of a structured block, treat it as the Activity field and extract keywords heuristically from the prose. Both shapes are supported.
Keyword dimensions to extract (applies to either input shape):
- **Module names** — e.g., "BriefSystem", "EmailProcessing", "payments"
- **Technical terms** — e.g., "N+1", "caching", "authentication"
- **Problem indicators** — e.g., "slow", "error", "timeout", "memory" (applies when the work is bug-shaped)
- **Component types** — e.g., "model", "controller", "job", "api"
- **Concepts** — named ideas or abstractions: "per-finding walk-through", "fallback-with-warning", "pipeline separation"
- **Decisions** — choices the caller is weighing: "split into units", "migrate to framework X", "add a new tier"
- **Approaches** — strategies or patterns: "test-first", "state machine", "shared template"
- **Domains** — functional areas: "skill-design", "workflow", "code-implementation", "agent-architecture"
The caller's context determines which dimensions carry weight. A code-bug query weights module + technical terms + problem indicators. A design-pattern query weights concepts + approaches + domains. A convention query weights decisions + domains. Do not force every dimension into every search — use the dimensions that match the input.
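A minimal sketch of this extraction step, assuming the `<work-context>` template above (the helper name and tokenizer are illustrative, not part of the contract):
```python
# Minimal sketch: extract keyword sets from a <work-context> block, treating
# free-form input as the Activity field. The tokenizer is deliberately crude.
import re

FIELDS = ("Activity", "Concepts", "Decisions", "Domains")

def extract_keywords(raw):
    block = re.search(r"<work-context>(.*?)</work-context>", raw, flags=re.S)
    text = block.group(1) if block else f"Activity: {raw}"  # free-form fallback
    keywords = {}
    for field in FIELDS:
        m = re.search(rf"{field}:\s*(.+)", text)
        if m:
            keywords[field.lower()] = {
                t.strip() for t in re.split(r"[,;|]", m.group(1)) if t.strip()
            }
    return keywords
```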
### Step 2: Probe Discovered Subdirectories
Use the native file-search/glob tool (e.g., Glob in Claude Code) to discover which subdirectories actually exist under `docs/solutions/` at invocation time. Do not assume a fixed list — subdirectory names are per-repo convention and may include any of:
- Bug-shaped: `build-errors/`, `test-failures/`, `runtime-errors/`, `performance-issues/`, `database-issues/`, `security-issues/`, `ui-bugs/`, `integration-issues/`, `logic-errors/`
- Knowledge-shaped: `architecture-patterns/`, `design-patterns/`, `tooling-decisions/`, `conventions/`, `workflow/`, `workflow-issues/`, `developer-experience/`, `documentation-gaps/`, `best-practices/`, `skill-design/`, `integrations/`
- Other per-repo categories
Narrow the search to the discovered subdirectories that match the caller's Domain hint or that align with the keyword shape (e.g., bug-shaped keywords → bug-shaped subdirectories). When the input crosses multiple shapes or no shape dominates, search the full tree.
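A minimal sketch of the probe, assuming a boolean shape hint (the bug-shaped set mirrors the list above; everything else is invented for illustration):
```python
# Minimal sketch: discover live subdirectories and filter by shape hint.
from pathlib import Path

BUG_SHAPED = {
    "build-errors", "test-failures", "runtime-errors", "performance-issues",
    "database-issues", "security-issues", "ui-bugs", "integration-issues",
    "logic-errors",
}

def probe_subdirs(root="docs/solutions", bug_shaped=None):
    """bug_shaped: True/False filters by shape; None searches the full tree."""
    found = [p for p in Path(root).iterdir() if p.is_dir()]
    if bug_shaped is None:
        return found  # mixed or unclear shape: search everything
    return [p for p in found if (p.name in BUG_SHAPED) == bug_shaped]
```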
### Step 3: Content-Search Pre-Filter (Critical for Efficiency)
**Use the native content-search tool (e.g., Grep in Claude Code) to find candidate files BEFORE reading any content.** Run multiple searches in parallel, case-insensitive, returning only matching file paths:
```
# Search for keyword matches in frontmatter fields (run in PARALLEL, case-insensitive)
content-search: pattern="title:.*email" path=docs/solutions/ files_only=true case_insensitive=true
content-search: pattern="tags:.*(email|mail|smtp)" path=docs/solutions/ files_only=true case_insensitive=true
content-search: pattern="module:.*(Brief|Email)" path=docs/solutions/ files_only=true case_insensitive=true
content-search: pattern="component:.*background_job" path=docs/solutions/ files_only=true case_insensitive=true
# Search for keyword matches in frontmatter fields (run in PARALLEL, case-insensitive).
# Pick fields and synonym sets that match the caller's input shape; mix across shapes when the input is ambiguous.
content-search: pattern="title:.*(dispatch|orchestration|pipeline)" path=docs/solutions/ files_only=true case_insensitive=true
content-search: pattern="tags:.*(subagent|orchestration|token-efficiency)" path=docs/solutions/ files_only=true case_insensitive=true
content-search: pattern="module:.*(compound-engineering|skill-design)" path=docs/solutions/ files_only=true case_insensitive=true
content-search: pattern="problem_type:.*(architecture_pattern|design_pattern|tooling_decision)" path=docs/solutions/ files_only=true case_insensitive=true
```
**Pattern construction tips:**
- Use `|` for synonyms: `tags:.*(payment|billing|stripe|subscription)`
- Include `title:` - often the most descriptive field
- Use `|` for synonyms: `tags:.*(subagent|parallel|fan-out)` or `tags:.*(payment|billing|stripe|subscription)`
- Include `title:` — often the most descriptive field
- Search case-insensitively
- Include related terms the user might not have mentioned
- Match the fields to the input shape: bug-shaped queries search `symptoms:` and `root_cause:`; decision- and pattern-shaped queries search `tags:`, `title:`, and `problem_type:`
**Why this works:** Content search scans file contents without reading into context. Only matching filenames are returned, dramatically reducing the set of files to examine.
**Combine results** from all searches to get candidate files (typically 5-20 files instead of 200).
**If search returns >25 candidates:** Re-run with more specific patterns or combine with category narrowing.
**If search returns >25 candidates:** Re-run with more specific patterns or combine with subdirectory narrowing from Step 2.
**If search returns <3 candidates:** Do a broader content search (not just frontmatter fields) as fallback:
```
content-search: pattern="email" path=docs/solutions/ files_only=true case_insensitive=true
```
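Outside a tool-calling agent, the same pre-filter can be approximated with plain `grep`. A minimal Python sketch, assuming GNU grep is on PATH (the patterns are the illustrative ones above, not a contract):
```python
# Minimal sketch: union the frontmatter greps into one candidate shortlist.
import subprocess

PATTERNS = [
    r"title:.*(dispatch|orchestration|pipeline)",
    r"tags:.*(subagent|orchestration|token-efficiency)",
    r"problem_type:.*(architecture_pattern|design_pattern|tooling_decision)",
]

def candidate_files(root="docs/solutions/"):
    hits = set()
    for pattern in PATTERNS:
        # -r recurse, -i case-insensitive, -l filenames only, -E extended regex
        out = subprocess.run(["grep", "-rilE", pattern, root],
                             capture_output=True, text=True)
        hits.update(line for line in out.stdout.splitlines() if line)
    return sorted(hits)  # typically 5-20 paths instead of 200
```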
### Step 3b: Always Check Critical Patterns
### Step 3b: Conditionally Check Critical Patterns
**Regardless of Grep results**, always read the critical patterns file:
```bash
Read: docs/solutions/patterns/critical-patterns.md
```
This file contains must-know patterns that apply across all work - high-severity issues promoted to required reading. Scan for patterns relevant to the current feature/task.
If `docs/solutions/patterns/critical-patterns.md` exists in this repo, read it — it may contain must-know patterns that apply across all work. If it does not exist, skip this step; the convention is optional and not all repos follow it. Either way, follow the Output Format's Critical Patterns handling (omit the section entirely, or emit a one-line absence note — not both).
### Step 4: Read Frontmatter of Candidates Only
@@ -81,165 +108,140 @@ Read: [file_path] with limit:30
```
Extract these fields from the YAML frontmatter:
- **module**: Which module/system the solution applies to
- **problem_type**: Category of issue (see schema below)
- **component**: Technical component affected
- **symptoms**: Array of observable symptoms
- **root_cause**: What caused the issue
- **tags**: Searchable keywords
- **severity**: critical, high, medium, low
- **module** — which module, system, or domain the learning applies to
- **problem_type** — category (knowledge-track and bug-track values apply equally; see schema reference below)
- **component** — technical component or area affected (when applicable)
- **tags** — searchable keywords
- **symptoms** — observable behaviors or friction (present on bug-track entries and sometimes on knowledge-track entries)
- **root_cause** — underlying cause (present on bug-track entries; optional on knowledge-track entries)
- **severity** — critical, high, medium, low
Some non-bug entries may have looser frontmatter shapes (they do not require `symptoms` or `root_cause`). Do not discard these entries for missing bug-shaped fields — use whatever fields are present for matching.
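A minimal sketch of the frontmatter-only read, assuming PyYAML is available; it deliberately tolerates missing or malformed YAML rather than discarding the entry:
```python
# Minimal sketch: parse whatever YAML frontmatter sits in the first ~30 lines.
# Missing bug-shaped fields (symptoms, root_cause) are fine; return what exists.
import yaml

def read_frontmatter(path, limit=30):
    with open(path, encoding="utf-8") as f:
        head = [next(f, "") for _ in range(limit)]
    if not head or head[0].strip() != "---":
        return {}  # no frontmatter at all: older entries may lack it
    body = []
    for line in head[1:]:
        if line.strip() == "---":
            break
        body.append(line)
    try:
        return yaml.safe_load("".join(body)) or {}
    except yaml.YAMLError:
        return {}  # malformed frontmatter: match on nothing rather than crash
```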
### Step 5: Score and Rank Relevance
Match frontmatter fields against the feature/task description:
Match frontmatter fields against the keywords extracted in Step 1:
**Strong matches (prioritize):**
- `module` matches the feature's target module
- `tags` contain keywords from the feature description
- `symptoms` describe similar observable behaviors
- `module` or domain matches the caller's area of work
- `tags` contain keywords from the caller's Concepts, Decisions, or Approaches
- `title` contains keywords from the caller's Activity or Concepts
- `component` matches the technical area being touched
- `symptoms` describe similar observable behaviors (when applicable)
**Moderate matches (include):**
- `problem_type` is relevant (e.g., `performance_issue` for optimization work)
- `problem_type` is relevant (e.g., `architecture_pattern` when the caller is making architectural decisions, `performance_issue` when the caller is optimizing)
- `root_cause` suggests a pattern that might apply
- Related modules or components mentioned
- Related modules, components, or domains mentioned
**Weak matches (skip):**
- No overlapping tags, symptoms, or modules
- Unrelated problem types
- No overlapping tags, symptoms, concepts, or modules
- Unrelated `problem_type` and no cross-cutting applicability
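A minimal sketch of the bucketing, assuming the keyword dict from Step 1 (the arithmetic is illustrative; the skill prescribes the buckets, not the scoring):
```python
# Minimal sketch: bucket a candidate as strong / moderate / weak. Field names
# follow the schema; the scoring logic is invented for illustration.
def bucket(front, kw):
    tags = set(front.get("tags") or [])
    strong = (
        front.get("module") in kw.get("modules", set())
        or bool(tags & (kw.get("concepts", set()) | kw.get("decisions", set())))
    )
    moderate = (
        front.get("problem_type") in kw.get("problem_types", set())
        or bool(tags & kw.get("domains", set()))
    )
    if strong:
        return "strong"    # prioritize
    if moderate:
        return "moderate"  # include
    return "weak"          # skip: no overlapping tags, concepts, or modules
```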
### Step 6: Full Read of Relevant Files
Only for files that pass the filter (strong or moderate matches), read the complete document to extract:
- The full problem description
- The solution implemented
- Prevention guidance
- Code examples
- The full problem framing or decision context
- The learning itself (solution, pattern, decision, convention)
- Prevention guidance or application notes
- Code examples or illustrative evidence
When a learning's claim conflicts with what you can observe in the current code or docs, flag the conflict explicitly rather than echoing the claim. Note the entry's date so the caller can judge whether the learning may have been superseded. Research agents can be confidently wrong; never let a past learning silently override present evidence.
### Step 7: Return Distilled Summaries
For each relevant document, return a summary in this format:
Render findings using the structure defined in **## Output Format** below. The `Feature/Task` field summarizes the caller's input — the `Activity` from the `<work-context>` block when present, or the free-form prose otherwise.
```markdown
### [Title from document]
- **File**: docs/solutions/[category]/[filename].md
- **Module**: [module from frontmatter]
- **Problem Type**: [problem_type]
- **Relevance**: [Brief explanation of why this is relevant to the current task]
- **Key Insight**: [The most important takeaway - the thing that prevents repeating the mistake]
- **Severity**: [severity level]
```
Return up to 5 findings, prioritized by relevance. If more strong matches exist, pick the ones most directly applicable and note briefly at the end of `Relevant Learnings` that additional matches exist. Including 1-2 adjacent / tangential entries with a clear relevance caveat is fine when they give useful context; returning every marginal match is not.
Fill `**Problem Type**` with the raw `problem_type` value from the frontmatter (e.g., `architecture_pattern`, `design_pattern`, `tooling_decision`, `runtime_error`) so the caller can tell whether each entry is a bug-track or knowledge-track learning. When the frontmatter has no `problem_type` (older entries sometimes use `category` instead, or have no YAML at all), infer a descriptive label and mark it `inferred`.
## Frontmatter Schema Reference
Use this on-demand schema reference when you need the full contract:
`../../skills/ce-compound/references/yaml-schema.md`
The authoritative schema lives at `../../skills/ce-compound/references/yaml-schema.md`; read it on demand when you need the full contract, including `component` and `root_cause` enums (those are repo-specific and evolve — do not hard-code them here).
Key enum values:
The two `problem_type` tracks worth knowing in advance:
**problem_type values:**
- build_error, test_failure, runtime_error, performance_issue
- database_issue, security_issue, ui_bug, integration_issue
- logic_error, developer_experience, workflow_issue
- best_practice, documentation_gap
- **Knowledge-track:** `architecture_pattern`, `design_pattern`, `tooling_decision`, `convention`, `workflow_issue`, `developer_experience`, `documentation_gap`, `best_practice` (fallback).
- **Bug-track:** `build_error`, `test_failure`, `runtime_error`, `performance_issue`, `database_issue`, `security_issue`, `ui_bug`, `integration_issue`, `logic_error`.
**component values:**
- rails_model, rails_controller, rails_view, service_object
- background_job, database, frontend_stimulus, hotwire_turbo
- email_processing, brief_system, assistant, authentication
- payments, development_workflow, testing_framework, documentation, tooling
**root_cause values:**
- missing_association, missing_include, missing_index, wrong_api
- scope_issue, thread_violation, async_timing, memory_leak
- config_error, logic_error, test_isolation, missing_validation
- missing_permission, missing_workflow_step, inadequate_documentation
- missing_tooling, incomplete_setup
**Category directories (mapped from problem_type):**
- `docs/solutions/build-errors/`
- `docs/solutions/test-failures/`
- `docs/solutions/runtime-errors/`
- `docs/solutions/performance-issues/`
- `docs/solutions/database-issues/`
- `docs/solutions/security-issues/`
- `docs/solutions/ui-bugs/`
- `docs/solutions/integration-issues/`
- `docs/solutions/logic-errors/`
- `docs/solutions/developer-experience/`
- `docs/solutions/workflow-issues/`
- `docs/solutions/best-practices/`
- `docs/solutions/documentation-gaps/`
Subdirectory listings in the schema reference are illustrative, not exhaustive. Probe the live directory (Step 2) for what actually exists.
## Output Format
Structure your findings as:
Structure findings as follows:
```markdown
## Institutional Learnings Search Results
### Search Context
- **Feature/Task**: [Description of what's being implemented]
- **Keywords Used**: [tags, modules, symptoms searched]
- **Feature/Task**: [Summary of the caller's activity, decision, or problem — works for bugs, architecture decisions, design patterns, tooling choices, or conventions.]
- **Keywords Used**: [tags, modules, concepts, domains searched]
- **Files Scanned**: [X total files]
- **Relevant Matches**: [Y files]
### Critical Patterns (Always Check)
[Any matching patterns from critical-patterns.md]
### Critical Patterns
[Include only when `docs/solutions/patterns/critical-patterns.md` exists and has relevant content. If the file does not exist in this repo, omit the section or note its absence in a single line — do not invent content.]
### Relevant Learnings
#### 1. [Title]
- **File**: [path]
- **Module**: [module]
- **Relevance**: [why this matters for current task]
- **Key Insight**: [the gotcha or pattern to apply]
#### 1. [Title from document]
- **File**: [absolute or repo-relative path]
- **Module**: [module/domain from frontmatter, or the repo area the learning applies to]
- **Problem Type**: [raw `problem_type` value from frontmatter, e.g. `architecture_pattern`, `design_pattern`, `tooling_decision`, `runtime_error`. Mark as "inferred" when the entry has no `problem_type`.]
- **Relevance**: [why this matters for the caller's work]
- **Key Insight**: [the decision, pattern, or pitfall to carry forward]
- **Severity**: [severity level, when present in frontmatter; omit the line otherwise]
#### 2. [Title]
...
### Recommendations
- [Specific actions to take based on learnings]
- [Patterns to follow]
- [Gotchas to avoid]
### No Matches
[If no relevant learnings found, explicitly state this]
- [Specific actions or decisions to consider based on the surfaced learnings]
- [Patterns to follow or mirror]
- [Past mis-steps worth avoiding, where applicable]
```
When no relevant learnings are found, say so explicitly, include the search context so the caller can see what was looked for, and note that the caller's work may be worth capturing with `/ce-compound` after it lands — the absence is itself useful signal.
## Efficiency Guidelines
**DO:**
- Use the native content-search tool to pre-filter files BEFORE reading any content (critical for 100+ files)
- Run multiple content searches in PARALLEL for different keywords
- Include `title:` in search patterns - often the most descriptive field
- Use OR patterns for synonyms: `tags:.*(payment|billing|stripe)`
- Use `-i=true` for case-insensitive matching
- Use category directories to narrow scope when feature type is clear
- Do a broader content search as fallback if <3 candidates found
- Re-narrow with more specific patterns if >25 candidates found
- Always read the critical patterns file (Step 3b)
- Only read frontmatter of search-matched candidates (not all files)
- Filter aggressively - only fully read truly relevant files
- Prioritize high-severity and critical patterns
- Extract actionable insights, not just summaries
- Note when no relevant learnings exist (this is valuable information too)
- Run multiple content searches in PARALLEL across different keyword dimensions
- Probe `docs/solutions/` subdirectories dynamically rather than assuming a fixed list
- Include `title:` in search patterns — often the most descriptive field
- Use OR patterns for synonyms and search case-insensitively
- Narrow to discovered subdirectories when the caller's Domain hint makes one obvious
- Broaden the content search as fallback if <3 candidates found; re-narrow if >25
- Read frontmatter only of search-matched candidates, capped at the first ~30 lines per file (enough to cover YAML)
- Fully read only candidates that pass relevance scoring in Step 5
- Prioritize high-severity entries and flag an entry's date when the learning may have been superseded
- Extract actionable takeaways, not summaries
**DON'T:**
- Read frontmatter of ALL files (use content-search to pre-filter first)
- Skip the grep pre-filter and read frontmatter of every file in `docs/solutions/` — pre-filter first, then read frontmatter of the shortlist
- Read full content of every candidate — only the ones that pass relevance scoring
- Run searches sequentially when they can be parallel
- Use only exact keyword matches (include synonyms)
- Skip the `title:` field in search patterns
- Proceed with >25 candidates without narrowing first
- Read every file in full (wasteful)
- Return raw document contents (distill instead)
- Include tangentially related learnings (focus on relevance)
- Skip the critical patterns file (always check it)
- Use only exact keyword matches (include synonyms); skip `title:` in patterns; proceed with >25 candidates without narrowing
- Return raw document contents instead of distilling them
- Include every tangentially related match — 1-2 adjacent entries with a caveat is fine; a long tail of weak matches is noise
- Discard a candidate because it lacks bug-shaped fields like `symptoms` or `root_cause` — non-bug entries legitimately omit them
- Assume `docs/solutions/patterns/critical-patterns.md` exists — read it only when present
## Integration Points
This agent is designed to be invoked by:
- `/ce-plan` - To inform planning with institutional knowledge and add depth during confidence checking
- Manual invocation before starting work on a feature
This agent is invoked by:
The goal is to surface relevant learnings in under 30 seconds for a typical solutions directory, enabling fast knowledge retrieval during planning phases.
- `/ce-plan` — to inform planning with institutional knowledge and add depth during confidence checking
- `/ce-code-review`, `/ce-optimize`, `/ce-ideate` — to surface prior learnings relevant to the change, optimization target, or ideation topic
- Standalone invocation before starting work in a documented area
Output is consumed as prose — no downstream caller parses specific field labels out of it — so prioritize distilled, actionable takeaways over structural rigor.

View File

@@ -23,12 +23,16 @@ tracks:
- integration_issue
- logic_error
knowledge:
description: "Best practices, workflow improvements, patterns, and documentation"
description: "Practices, patterns, conventions, decisions, workflow improvements, and documentation"
problem_types:
- best_practice
- documentation_gap
- workflow_issue
- developer_experience
- architecture_pattern
- design_pattern
- tooling_decision
- convention
# --- Fields required by BOTH tracks -----------------------------------------
required_fields:
@@ -57,7 +61,11 @@ required_fields:
- workflow_issue
- best_practice
- documentation_gap
description: "Primary category — determines track (bug vs knowledge)"
- architecture_pattern
- design_pattern
- tooling_decision
- convention
description: "Primary category — determines track (bug vs knowledge). Prefer the narrowest applicable value; best_practice is the fallback when no narrower knowledge-track value fits."
component:
type: enum

View File

@@ -16,7 +16,7 @@ The `problem_type` determines which **track** applies. Each track has different
| Track | problem_types | Description |
|-------|--------------|-------------|
| **Bug** | `build_error`, `test_failure`, `runtime_error`, `performance_issue`, `database_issue`, `security_issue`, `ui_bug`, `integration_issue`, `logic_error` | Defects and failures that were diagnosed and fixed |
| **Knowledge** | `best_practice`, `documentation_gap`, `workflow_issue`, `developer_experience` | Practices, patterns, workflow improvements, and documentation |
| **Knowledge** | `best_practice`, `documentation_gap`, `workflow_issue`, `developer_experience`, `architecture_pattern`, `design_pattern`, `tooling_decision`, `convention` | Practices, patterns, conventions, decisions, workflow improvements, and documentation. Prefer the narrowest applicable value; `best_practice` is the fallback. |
## Required Fields (both tracks)
@@ -73,6 +73,10 @@ Docs created before the track system may have `symptoms`/`root_cause`/`resolutio
- `workflow_issue` -> `docs/solutions/workflow-issues/`
- `best_practice` -> `docs/solutions/best-practices/`
- `documentation_gap` -> `docs/solutions/documentation-gaps/`
- `architecture_pattern` -> `docs/solutions/architecture-patterns/`
- `design_pattern` -> `docs/solutions/design-patterns/`
- `tooling_decision` -> `docs/solutions/tooling-decisions/`
- `convention` -> `docs/solutions/conventions/`
## Validation Rules

View File

@@ -410,10 +410,14 @@ Bug track:
- logic-errors/
Knowledge track:
- best-practices/
- architecture-patterns/ — architectural or structural patterns (agent/skill/pipeline/workflow shape decisions)
- design-patterns/ — reusable non-architectural design approaches (content generation, interaction patterns, prompt shapes)
- tooling-decisions/ — language, library, or tool choices with durable rationale
- conventions/ — team-agreed way of doing something, captured so it survives turnover
- workflow-issues/
- developer-experience/
- documentation-gaps/
- best-practices/ — fallback only, use when no narrower knowledge-track value applies
## Common Mistakes to Avoid

View File

@@ -23,12 +23,16 @@ tracks:
- integration_issue
- logic_error
knowledge:
description: "Best practices, workflow improvements, patterns, and documentation"
description: "Practices, patterns, conventions, decisions, workflow improvements, and documentation"
problem_types:
- best_practice
- documentation_gap
- workflow_issue
- developer_experience
- architecture_pattern
- design_pattern
- tooling_decision
- convention
# --- Fields required by BOTH tracks -----------------------------------------
required_fields:
@@ -57,7 +61,11 @@ required_fields:
- workflow_issue
- best_practice
- documentation_gap
description: "Primary category — determines track (bug vs knowledge)"
- architecture_pattern
- design_pattern
- tooling_decision
- convention
description: "Primary category — determines track (bug vs knowledge). Prefer the narrowest applicable value; best_practice is the fallback when no narrower knowledge-track value fits."
component:
type: enum

View File

@@ -16,7 +16,7 @@ The `problem_type` determines which **track** applies. Each track has different
| Track | problem_types | Description |
|-------|--------------|-------------|
| **Bug** | `build_error`, `test_failure`, `runtime_error`, `performance_issue`, `database_issue`, `security_issue`, `ui_bug`, `integration_issue`, `logic_error` | Defects and failures that were diagnosed and fixed |
| **Knowledge** | `best_practice`, `documentation_gap`, `workflow_issue`, `developer_experience` | Practices, patterns, workflow improvements, and documentation |
| **Knowledge** | `best_practice`, `documentation_gap`, `workflow_issue`, `developer_experience`, `architecture_pattern`, `design_pattern`, `tooling_decision`, `convention` | Practices, patterns, conventions, decisions, workflow improvements, and documentation. Prefer the narrowest applicable value; `best_practice` is the fallback. |
## Required Fields (both tracks)
@@ -73,6 +73,10 @@ Docs created before the track system may have `symptoms`/`root_cause`/`resolutio
- `workflow_issue` -> `docs/solutions/workflow-issues/`
- `best_practice` -> `docs/solutions/best-practices/`
- `documentation_gap` -> `docs/solutions/documentation-gaps/`
- `architecture_pattern` -> `docs/solutions/architecture-patterns/`
- `design_pattern` -> `docs/solutions/design-patterns/`
- `tooling_decision` -> `docs/solutions/tooling-decisions/`
- `convention` -> `docs/solutions/conventions/`
## Validation Rules

View File

@@ -6,28 +6,34 @@ argument-hint: "[mode:headless] [path/to/document.md]"
# Document Review
Review requirements or plan documents through multi-persona analysis. Dispatches specialized reviewer agents in parallel, auto-fixes quality issues, and presents strategic questions for user decision.
Review requirements or plan documents through multi-persona analysis. Dispatches specialized reviewer agents in parallel, auto-applies `safe_auto` fixes, and routes remaining findings through a four-option interaction (per-finding walk-through, LFG, Append-to-Open-Questions, Report-only) for user decision.
## Interactive mode rules
- **Pre-load the platform question tool before any question fires.** In Claude Code, `AskUserQuestion` is a deferred tool — its schema is not available at session start. At the start of Interactive-mode work (before the routing question, per-finding walk-through questions, bulk-preview Proceed/Cancel, and Phase 5 terminal question), call `ToolSearch` with query `select:AskUserQuestion` to load the schema. Load it once, eagerly, at the top of the Interactive flow — do not wait for the first question site. On Codex (`request_user_input`) and Gemini (`ask_user`) this step is not required; the tools are loaded by default.
- **The numbered-list fallback only applies on confirmed load failure.** Presenting options as a numbered list and waiting for the user's reply is valid only when `ToolSearch` returns no match or the tool call explicitly fails. Rendering a question as narrative text because the tool feels inconvenient, because the model is in report-formatting mode, or because the instruction was buried in a long skill is a bug. A question that calls for a user decision must either fire the tool or fail loudly.
## Phase 0: Detect Mode
Check the skill arguments for `mode:headless`. Arguments may contain a document path, `mode:headless`, or both. Tokens starting with `mode:` are flags, not file paths -- strip them from the arguments and use the remaining token (if any) as the document path for Phase 1.
Check the skill arguments for `mode:headless`. Arguments may contain a document path, `mode:headless`, or both. Tokens starting with `mode:` are flags, not file paths — strip them from the arguments and use the remaining token (if any) as the document path for Phase 1.
If `mode:headless` is present, set **headless mode** for the rest of the workflow.
**Headless mode** changes the interaction model, not the classification boundaries. Document-review still applies the same judgment about what has one clear correct fix vs. what needs user judgment. The only difference is how non-auto findings are delivered:
- `auto` fixes are applied silently (same as interactive)
- `present` findings are returned as structured text for the caller to handle -- no AskUserQuestion prompts, no interactive approval
- Phase 5 returns immediately with "Review complete" (no refine/complete question)
**Headless mode** changes the interaction model, not the classification boundaries. ce-doc-review still applies the same judgment about which tier each finding belongs in. The only difference is how non-safe_auto findings are delivered:
- `safe_auto` fixes are applied silently (same as interactive)
- `gated_auto`, `manual`, and FYI findings are returned as structured text for the caller to handle — no AskUserQuestion prompts, no interactive routing
- Phase 5 returns immediately with "Review complete" (no routing question, no terminal question)
The caller receives findings with their original classifications intact and decides what to do with them.
Callers invoke headless mode by including `mode:headless` in the skill arguments, e.g.:
```
Skill("ce-doc-review", "mode:headless docs/plans/my-plan.md")
```
If `mode:headless` is not present, the skill runs in its default interactive mode with no behavior change.
If `mode:headless` is not present, the skill runs in its default interactive mode with the routing question, walk-through, and bulk-preview behaviors documented in `references/walkthrough.md` and `references/bulk-preview.md`.
## Phase 1: Get and Analyze Document
@@ -124,8 +130,46 @@ Dispatch all agents in **parallel** using the platform's task/agent tool (e.g.,
| `{document_type}` | "requirements" or "plan" from Phase 1 classification |
| `{document_path}` | Path to the document |
| `{document_content}` | Full text of the document |
| `{decision_primer}` | Cumulative prior-round decisions in the current session, or an empty `<prior-decisions>` block on round 1. See "Decision primer" below. |
Pass each agent the **full document** -- do not split into sections.
Pass each agent the **full document** — do not split into sections.
### Decision primer
On round 1 (no prior decisions), set `{decision_primer}` to:
```
<prior-decisions>
Round 1 — no prior decisions.
</prior-decisions>
```
On round 2+ (after one or more prior rounds in the current interactive session), accumulate prior-round decisions and render them as:
```
<prior-decisions>
Round 1 — applied (N entries):
- {section}: "{title}" ({reviewer}, {confidence})
Evidence: "{evidence_snippet}"
Round 1 — rejected (M entries):
- {section}: "{title}" — Skipped because {reason}
Evidence: "{evidence_snippet}"
- {section}: "{title}" — Deferred to Open Questions because {reason or "no reason provided"}
Evidence: "{evidence_snippet}"
- {section}: "{title}" — Acknowledged without applying because {reason or "no suggested_fix — user acknowledged"}
Evidence: "{evidence_snippet}"
Round 2 — applied (N entries):
...
</prior-decisions>
```
Each entry carries an `Evidence:` line because synthesis R29 (rejected-finding suppression) and R30 (fix-landed verification) both use an evidence-substring overlap check as part of their matching predicate — without the evidence snippet in the primer, the orchestrator cannot compute the `>50%` overlap test and has to fall back to fingerprint-only matching, which either re-surfaces rejected findings or suppresses too aggressively. The `{evidence_snippet}` is the first evidence quote from the finding, truncated to the first ~120 characters (preserving whole words at the boundary) and with internal quotes escaped. If a finding has multiple evidence entries, use the first one; the rest live in the run artifact and are not needed for the overlap check.
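A minimal sketch of the snippet preparation and the overlap predicate, assuming the contiguous-substring reading of `>50%` (the exact matching predicate lives in synthesis; the helper names are invented):
```python
# Minimal sketch: ~120-char word-boundary truncation with quote escaping, plus
# a >50% contiguous-substring overlap test. The overlap reading is an assumption.
def evidence_snippet(evidence, limit=120):
    cut = evidence if len(evidence) <= limit else evidence[:limit].rsplit(" ", 1)[0]
    return cut.replace('"', '\\"')  # escape internal quotes for the primer

def overlaps(snippet, finding_evidence):
    if not snippet:
        return False
    half = len(snippet) // 2 + 1  # "more than 50%" of the snippet
    return any(snippet[i:i + half] in finding_evidence
               for i in range(len(snippet) - half + 1))
```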
Accumulate across all rounds in the current session. Skip, Defer, and Acknowledge actions all count as "rejected" for suppression purposes — each signals the user decided the finding wasn't worth actioning this round (Acknowledge is the no-fix-guard variant: the user saw a finding with no `suggested_fix`, chose not to defer or skip explicitly, and recorded acknowledgement instead; for round-to-round suppression that is semantically equivalent to Skip). Applied findings stay on the applied list so round-N+1 personas can verify fixes landed (see R30 in `references/synthesis-and-presentation.md`).
Cross-session persistence is out of scope. A new invocation of ce-doc-review on the same document starts with a fresh round 1 and no carried primer, even if prior sessions deferred findings into the document's Open Questions section.
**Error handling:** If an agent fails or times out, proceed with findings from agents that completed. Note the failed agent in the Coverage section. Do not block the entire review on a single agent failure.
@@ -133,7 +177,9 @@ Pass each agent the **full document** -- do not split into sections.
## Phases 3-5: Synthesis, Presentation, and Next Action
After all dispatched agents return, read `references/synthesis-and-presentation.md` for the synthesis pipeline (validate, gate, dedup, promote, resolve contradictions, route by autofix class), auto-fix application, finding presentation, and next-action menu. Do not load this file before agent dispatch completes.
After all dispatched agents return, read `references/synthesis-and-presentation.md` for the synthesis pipeline (validate, per-severity gate, dedup, cross-persona agreement boost, resolve contradictions, auto-promotion, route by three tiers with FYI subsection), `safe_auto` fix application, headless-envelope output, and the handoff to the routing question.
For the four-option routing question and per-finding walk-through (interactive mode), read `references/walkthrough.md`. For the bulk-action preview used by LFG, Append-to-Open-Questions, and walk-through `LFG-the-rest`, read `references/bulk-preview.md`. Do not load these files before agent dispatch completes.
---

View File

@@ -0,0 +1,128 @@
# Bulk Action Preview
This reference defines the compact plan preview that Interactive mode shows before every bulk action — LFG (routing option B), Append-to-Open-Questions (routing option C), and the walk-through's `LFG the rest` (option D of the per-finding question). The preview gives the user a single-screen view of what the agent is about to do, with exactly two options to Proceed or Cancel.
Interactive mode only.
---
## When the preview fires
Three call sites:
1. **Routing option B (top-level LFG)** — after the user picks `LFG. Apply the agent's best-judgment action per finding` from the routing question, but before any action executes. Scope: every pending `gated_auto` / above-gate `manual` finding.
2. **Routing option C (top-level Append-to-Open-Questions)** — after the user picks `Append findings to the doc's Open Questions section and proceed` but before any append runs. Scope: every pending `gated_auto` / above-gate `manual` finding. Every finding appears under `Appending to Open Questions (N):` regardless of the agent's natural recommendation, because option C is batch-defer.
3. **Walk-through `LFG the rest`** — after the user picks `LFG the rest — apply the agent's best judgment to this and remaining findings` from a per-finding question, but before the remaining findings are resolved. Scope: the current finding and everything not yet decided. Already-decided findings from the walk-through are not included in the preview.
In all three cases the user confirms with `Proceed` or backs out with `Cancel`. No per-item decisions inside the preview — per-item decisioning is the walk-through's role.
---
## Preview structure
The preview is grouped by the action the agent intends to take. Bucket headers appear only when their bucket is non-empty.
```
<Path label> — <scope summary>:
Applying (N):
[P0] <section> — <one-line plain-English summary>
[P1] <section> — <one-line plain-English summary>
Appending to Open Questions (N):
[P2] <section> — <one-line plain-English summary>
Skipping (N):
[P2] <section> — <one-line plain-English summary>
```
Worked example for routing option B (top-level LFG):
```
LFG plan — 8 findings:
Applying (4):
[P0] Requirements Trace — Renumber R4 to match unit reference
[P1] Unit 3 Files — Add read-fallback for renamed report file
[P2] Key Technical Decisions — Use framework's Deprecated field rather than hand-rolling
[P3] Overview — Correct wrong count (says 6, list has 5)
Appending to Open Questions (2):
[P2] Scope Boundaries — Unit 2/3 merge judgment call
[P2] Risks — Alias compatibility-theater concern
Skipping (2):
[P2] Miscellaneous Notes — Low-confidence style preference
[P3] Abstraction Commentary — Speculative, subjective
```
---
## Scope summary wording by path
- **Routing option B (top-level LFG):** header reads `LFG plan — N findings:`.
- **Routing option C (top-level Append-to-Open-Questions):** header reads `Append plan — N findings as Open Questions entries:`. Every finding lands in the `Appending to Open Questions (N):` bucket.
- **Walk-through `LFG the rest`:** header reads `LFG plan — N remaining findings (K already decided):`. Already-decided findings from the walk-through are not included in the preview or in the bucket counts. The `K already decided` counter communicates that the walk-through was partially completed.
---
## Per-finding line format
Each line uses the compressed form of the framing-quality guidance from the subagent template (observable-consequence-first, no internal section numbering unless needed to locate). The one-line summary is drawn from the persona-produced `why_it_matters` by taking the first sentence (and, when the first sentence is too long for the preview width, paraphrasing it tightly to fit).
- **Shape:** `[<severity>] <section> — <one-line summary>`
- **Width target:** keep lines near 80 columns so the preview renders cleanly in narrow terminals. Truncate with ellipsis when necessary.
- **No section numbering** unless the reader needs it to locate the issue (when multiple findings hit the same named section).
When no `why_it_matters` is available for a finding (rare — only if persona output was malformed), fall back to the finding's title directly. Note the gap in the completion report's Coverage section if it affects more than a few findings in the same run.
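A minimal sketch of the line renderer, assuming findings carry `severity`, `section`, `title`, and `why_it_matters` fields (the first-sentence split is deliberately naive):
```python
# Minimal sketch: render one preview line, falling back to the title when
# why_it_matters is missing, and truncating near the 80-column target.
def preview_line(finding, width=80):
    summary = (finding.get("why_it_matters") or "").split(". ")[0].strip()
    if not summary:
        summary = finding["title"]  # rare: persona output was malformed
    line = f"[{finding['severity']}] {finding['section']} — {summary}"
    return line if len(line) <= width else line[:width - 1].rstrip() + "…"
```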
---
## Question and options
After the preview body is rendered, ask the user using the platform's blocking question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). In Claude Code, the tool should already be loaded from the Interactive-mode pre-load step — if it isn't, call `ToolSearch` with query `select:AskUserQuestion` now. The text fallback below applies only when that load explicitly fails.
Stem (adapted to the path):
- For routing B: `The agent is about to apply the plan above. Proceed?`
- For routing C: `The agent is about to append the findings above to the doc's Open Questions section. Proceed?`
- For walk-through `LFG the rest`: `The agent is about to resolve the remaining findings above. Proceed?`
Options (exactly two, in all three cases):
- `Proceed` — execute the plan as shown
- `Cancel` — do nothing, return to the originating question
Only when `ToolSearch` explicitly returns no match or the tool call errors — or on a platform with no blocking question tool — fall back to presenting numbered options and waiting for the user's next reply.
---
## Cancel semantics
- **From routing option B Cancel:** return the user to the routing question (the four-option menu). Do not edit the document, do not append any Open Questions entries, do not record any state.
- **From routing option C Cancel:** same — return to the routing question, no side effects.
- **From walk-through `LFG the rest` Cancel:** return the user to the current finding's per-finding question (not to the routing question). The walk-through continues from where it was, with prior decisions intact.
In every case, `Cancel` changes no on-disk or in-memory state.
---
## Proceed semantics
When the user picks `Proceed`:
- **Routing option B (top-level LFG):** for each finding in the plan, execute the recommended action. Apply findings go into the Apply set for a single end-of-batch document-edit pass (see `walkthrough.md` for the Apply batching rules). Defer findings route through `references/open-questions-defer.md`. Skip findings are recorded as no-action. After all actions complete, emit the unified completion report (see `walkthrough.md`).
- **Routing option C (top-level Append-to-Open-Questions):** every finding routes through `references/open-questions-defer.md` for Open Questions append. No document edits apply (beyond the Open Questions section additions themselves). After all appends complete (or fail), emit the unified completion report.
- **Walk-through `LFG the rest`:** same as routing option B, but scoped to the findings the user hadn't decided on. Apply findings join the in-memory Apply set with the ones the user already picked during the walk-through; all dispatch together in the single end-of-walk-through Apply pass.
Failure during `Proceed` (e.g., an Open Questions append fails for one finding during a batch Defer) follows the failure path defined in `references/open-questions-defer.md` — surface the failure inline with Retry / Fall back / Convert to Skip, continue with the rest of the plan, and capture the failure in the completion report's failure section.
---
## Edge cases
- **Zero findings in a bucket:** omit the bucket header. A preview with only Apply and Skip does not show an empty `Appending to Open Questions (0):` line.
- **All findings in one bucket:** preview still shows the bucket header; Proceed / Cancel still offered. This is the common case for routing option C (every finding under `Appending to Open Questions`).
- **N=1 preview (only one finding in scope):** the preview still uses the grouped format, just with a single-line bucket. `Proceed` / `Cancel` still apply.
- **Open Questions append unavailable** (document is read-only, append flow reports no-go): routing option C is not offered upstream (see `references/open-questions-defer.md` unavailability handling). LFG (option B) and walk-through `LFG the rest` can still run — they may contain per-finding Defer recommendations from synthesis. Before rendering any LFG-shaped preview, downgrade every Defer recommendation to Skip when the session's cached append-availability is false, and surface the downgrade on the preview itself (e.g., a `Skipping — append unavailable (N):` bucket, or a note in the header: `N Defer recommendations downgraded to Skip — document is read-only.`).
- **Walk-through `LFG the rest` with zero remaining findings:** the walk-through's own logic suppresses `LFG the rest` both when N=1 and when no undecided findings remain, so the preview should never be invoked with zero remaining findings. If it is, render `LFG plan — 0 remaining findings` and fall through to Proceed as a no-op.
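The bucket-and-downgrade logic above is mechanical enough to sketch. A minimal illustration in Python, assuming each finding carries a `recommended_action` field; the function and field names are hypothetical, not part of the skill:
```python
def build_lfg_plan(findings, append_available: bool):
    """Group findings into preview buckets, downgrading Defer to Skip when
    the session's cached append-availability is false. Hypothetical sketch;
    field names are illustrative."""
    buckets = {"Apply": [], "Defer": [], "Skip": []}
    downgraded = 0
    for f in findings:
        action = f.recommended_action
        if action == "Defer" and not append_available:
            action = "Skip"  # surfaced on the preview itself, per above
            downgraded += 1
        buckets[action].append(f)
    # Zero-findings buckets are omitted from the rendered preview entirely.
    visible = {name: fs for name, fs in buckets.items() if fs}
    return visible, downgraded
```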

View File

@@ -45,8 +45,8 @@
},
"autofix_class": {
"type": "string",
"enum": ["auto", "present"],
"description": "How this issue should be handled. auto = one clear correct fix that can be applied silently (terminology, formatting, cross-references, completeness corrections, additions mechanically implied by other content). present = requires individual user judgment."
"enum": ["safe_auto", "gated_auto", "manual"],
"description": "How this issue should be handled. safe_auto = one clear correct fix applied silently (typo, wrong count, stale cross-reference, mechanically-implied addition, terminology drift). gated_auto = concrete fix exists but touches document meaning or scope and warrants user confirmation (substantive additions from codebase-pattern-resolved fixes, framework-native-API substitutions, missing standard controls). manual = requires user judgment; multiple valid approaches. Low-confidence manual findings surface in an FYI subsection at the presentation layer."
},
"finding_type": {
"type": "string",

View File

@@ -0,0 +1,177 @@
# Open Questions Deferral
This reference defines the Defer action's in-doc append mechanic. When the user chooses Defer on a finding (from the walk-through or from the bulk-preview Append-to-Open-Questions path), an entry for that finding appends to a `## Deferred / Open Questions` section at the end of the document under review.
Interactive mode only. Invoked by `references/walkthrough.md` (per-finding Defer option) and `references/bulk-preview.md` (routing option C Proceed).
---
## Append flow
### Step 1: Locate or create the Open Questions section
Scan the document for an existing `## Deferred / Open Questions` heading (case-sensitive match on the full heading text). Behavior by location:
- **Heading present at the end of the document (last `##`-level section):** append new content inside this section at the end.
- **Heading present mid-document (not the last `##`-level section):** still append inside the existing heading at that location. Do not create a duplicate at the end — the user positioned the section deliberately.
- **Heading absent:** create `## Deferred / Open Questions` at the end of the document. If the document has a trailing horizontal-rule separator (`---`) or a trailing footer (table, links section), insert the new section above it. If the document has only frontmatter and no body, create the section after the frontmatter block (not at byte 0).
### Step 2: Locate or create the timestamped subsection
Within the Open Questions section, scan for a subsection heading matching the current review date: `### From YYYY-MM-DD review`. Behavior:
- **Subsection present:** append new entries to it. Multiple Defer actions within a single review session accumulate under the same subsection.
- **Subsection absent:** create `### From YYYY-MM-DD review` as the last subsection within the Open Questions section. Insert one blank line before the heading for readability.
Date format: ISO 8601 calendar date (`YYYY-MM-DD`). Multiple reviews of the same document on the same day share the same subsection; multi-day reviews of the same document get distinct subsections, which is the intended behavior.
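A minimal sketch of Steps 1 and 2 in Python, assuming the document is held as a list of lines. The trailing-footer and frontmatter-only placements from Step 1 are omitted for brevity, and all names are illustrative rather than the skill's actual implementation:
```python
from datetime import date

HEADING = "## Deferred / Open Questions"

def insertion_index(lines: list[str], today: date) -> int:
    """Return the line index where new entries should be inserted,
    creating the section and/or dated subsection if absent."""
    sub = f"### From {today.isoformat()} review"
    stripped = [line.rstrip() for line in lines]
    if HEADING not in stripped:
        # Heading absent: create the section at the end of the document.
        lines.extend(["", HEADING, "", sub])
        return len(lines)
    start = stripped.index(HEADING)
    end = len(lines)  # section ends at the next ##-level heading, if any
    for i in range(start + 1, len(lines)):
        if stripped[i].startswith("## "):
            end = i
            break
    if sub in stripped[start:end]:
        s = stripped.index(sub, start, end)
        # Subsection present: append at its end (before any later ###).
        for j in range(s + 1, end):
            if stripped[j].startswith("### "):
                return j
        return end
    # Subsection absent: create it last, one blank line before the heading.
    lines[end:end] = ["", sub]
    return end + 2
```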
### Step 3: Format and append the entry
Per deferred finding, append a bullet-point entry with the following fields. The visible content is a reader-friendly summary; an HTML comment on the entry persists the dedup-key fields so Step 4's compound-key check can run reliably across retries and same-day reruns without the entry's visible format having to carry machine-oriented metadata:
```
- **{title}** — {section} ({severity}, {reviewer}, confidence {confidence})
{why_it_matters}
<!-- dedup-key: section="{normalized_section}" title="{normalized_title}" evidence="{evidence_fingerprint}" -->
```
Fields come from the finding's schema:
- `{title}` — the finding's title field
- `{section}` — the finding's section field, unmodified (human-readable)
- `{severity}` — P0 / P1 / P2 / P3
- `{reviewer}` — the persona that produced the finding (after dedup, the persona with the highest confidence; surface all co-flagging personas if multiple)
- `{confidence}` — rounded to 2 decimal places
- `{why_it_matters}` — the full why_it_matters text, preserving the framing guidance from the subagent template
HTML-comment fields (machine-readable, used by Step 4 dedup):
- `{normalized_section}``normalize(section)` (lowercase, punctuation-stripped, whitespace-collapsed)
- `{normalized_title}``normalize(title)` (same normalization)
- `{evidence_fingerprint}` — the finding's first evidence quote, sanitized for single-line HTML-comment embedding and then cut to the first ~120 chars (word-boundary-preserving); empty string when the finding had no evidence. Sanitization (apply in order, before the 120-char slice so the resulting fingerprint stays under the budget):
1. Collapse any run of whitespace — including newlines, carriage returns, and tabs — to a single space.
2. Strip any occurrence of `-->` (HTML-comment terminator) and any stray `<!--` sequence; replace each with a single space. This prevents the evidence from closing the dedup-key comment prematurely or injecting a nested comment.
3. Replace the double-quote character with `\"` (quote escaping, as before).
4. Trim leading/trailing whitespace.
The sanitized fingerprint MUST be a single line with no embedded `-->` so the dedup-key comment stays parseable by Step 4's compound-key reconstruction. When computing the fingerprint for a new finding, apply this sanitization; when reading back from an existing entry, treat the parsed `evidence="..."` value as already-sanitized and compare verbatim.
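A sketch of the sanitize-then-slice pipeline in Python. The helper names are hypothetical; the only behavior asserted is the ordered rules above (sanitize first, then take the ~120-char word-boundary slice):
```python
import re

def sanitize_evidence(quote: str) -> str:
    """Steps 1-4 above, applied in order, before the ~120-char slice."""
    s = re.sub(r"\s+", " ", quote)                  # 1. collapse whitespace
    s = s.replace("-->", " ").replace("<!--", " ")  # 2. strip comment delimiters
    s = re.sub(r"\s+", " ", s)                      # re-collapse spaces from 2
    s = s.replace('"', '\\"')                       # 3. escape double quotes
    return s.strip()                                # 4. trim

def evidence_fingerprint(evidence: list[str], budget: int = 120) -> str:
    """First ~120 chars of the first quote, word-boundary-preserving;
    empty string when the finding has no evidence."""
    if not evidence:
        return ""
    s = sanitize_evidence(evidence[0])
    if len(s) <= budget:
        return s
    cut = s.rfind(" ", 0, budget + 1)  # back up to the last word boundary
    return s[:cut] if cut > 0 else s[:budget]
```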
Do not include `suggested_fix` or the full `evidence` array in the appended entry. Those live in the review run artifact (when applicable) and do not belong in the document's Open Questions section — the entry is a concern summary for the reader returning later, not a full decision packet. The HTML-comment dedup-key line is the minimum machine-oriented metadata required for reliable idempotence and deliberately sits on a single line with simple `key="value"` shape so a retry can parse it without a markdown parser.
### Step 4: Idempotence on compound-key collisions
If an entry with the same compound key already exists under the same `### From YYYY-MM-DD review` subsection, do not append a duplicate. This can happen when:
- The same review session re-routes the same finding to Defer a second time (rare but possible via LFG-the-rest after a walk-through Defer)
- The orchestrator retries after a partial failure
**Compound key for dedup:** `normalize(section) + normalize(title) + evidence_fingerprint`, reconstructed from each existing entry's `<!-- dedup-key: ... -->` HTML comment (see Step 3 entry format). For a new finding about to append, compute the same fields from the finding's schema data; for existing entries, parse them out of the HTML comment. Match on all three fields.
- `normalize(section)` and `normalize(title)` use the same normalization as synthesis step 3.3 dedup (lowercase, strip punctuation, collapse whitespace)
- `evidence_fingerprint` is the first ~120 characters of the finding's first evidence quote, sanitized per Step 3 (whitespace collapsed to single spaces, `-->` and stray `<!--` stripped, quotes escaped). The same slice is used in the decision primer — see `SKILL.md` under "Decision primer". When no evidence is available on the new finding, fall back to section+title alone. When an existing entry's HTML comment has `evidence=""`, treat the entry as evidence-less and also fall back to section+title for that comparison. If an existing entry's dedup-key comment is malformed (e.g., a newline or `-->` sequence split the comment across lines in a pre-sanitization entry), treat that entry under the legacy-fallback rule below rather than attempting a partial reconstruction.
Title-only dedup is not sufficient: two different findings in the same document (even on the same review date) can legitimately share a short title if their sections and evidence differ. Using only `{title}` would silently drop one of them — losing user-visible backlog context. The compound key mirrors the R29/R30 matching predicate (`section + title + evidence-substring overlap`) so cross-round and intra-round dedup behave consistently.
**Legacy entries without dedup-key comments:** entries written before this format (if any survive in the wild) lack the HTML comment. When Step 4 encounters such an entry, fall back to title-only comparison for that entry — imperfect, but strictly better than duplicate-appending. This is a backwards-compat behavior for legacy data, not a sanctioned format.
On collision, record the no-op in the completion report's Coverage section so the user sees the duplicate was suppressed. Cross-subsection collisions (same compound key, different dates) are not deduplicated — each review is allowed to re-raise the same concern.
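A sketch of the Step 4 collision check in Python, reusing the `evidence_fingerprint` helper from the Step 3 sketch and scanning one dated subsection as raw text. The regex tolerates escaped quotes in the evidence field; a dedup-key comment it cannot parse falls to the legacy title-only rule rather than a partial reconstruction:
```python
import re

DEDUP_RE = re.compile(
    r'<!--\s*dedup-key:\s*section="(?P<section>[^"]*)"\s*'
    r'title="(?P<title>[^"]*)"\s*'
    r'evidence="(?P<evidence>(?:[^"\\]|\\.)*)"\s*-->'
)

def normalize(text: str) -> str:
    """Same normalization as synthesis step 3.3: lowercase, strip
    punctuation, collapse whitespace."""
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def is_duplicate(finding, subsection_text: str) -> bool:
    key = (normalize(finding.section),
           normalize(finding.title),
           evidence_fingerprint(finding.evidence))  # per the Step 3 sketch
    for m in DEDUP_RE.finditer(subsection_text):
        if key[2] == "" or m["evidence"] == "":
            # Evidence-less on either side: fall back to section + title.
            if key[0] == m["section"] and key[1] == m["title"]:
                return True
        elif key == (m["section"], m["title"], m["evidence"]):
            return True
    return False
```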
---
## Concurrent edit safety
Document edits happen via the platform's edit tool (Edit in Claude Code, or equivalent). Before every append, re-read the document from disk to reduce the window for user-in-editor concurrent-write collisions. If the document's mtime or content has changed unexpectedly between a prior read and the append attempt, abort the append and surface the situation via the failure path below. The user may be editing in their editor during the review session and simultaneous writes would corrupt the document.
The orchestrator only holds the most recent read in memory, not a persistent lock — interactive review doesn't need lock coordination; it needs observation-before-write.
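The observation-before-write discipline reduces to a small pattern. A sketch where direct file I/O stands in for the platform's edit tool; note the stat/write race window narrows but does not vanish, which is one reason the failure path below exists:
```python
import os

class ConcurrentEditDetected(RuntimeError):
    pass

def append_with_observation(path: str, render) -> None:
    """Re-read immediately before the append; abort on unexpected change."""
    before = os.stat(path).st_mtime_ns
    with open(path, encoding="utf-8") as f:
        content = f.read()                  # the most recent read in memory
    new_content = render(content)           # locate/create section + entry
    if os.stat(path).st_mtime_ns != before:
        raise ConcurrentEditDetected(path)  # route to the failure path
    with open(path, "w", encoding="utf-8") as f:
        f.write(new_content)
```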
---
## Failure path
When the append cannot complete — document is read-only on disk, path is invalid, the platform's edit tool returns an error, concurrent-edit collision detected, or any other write failure — surface the failure inline to the user via the platform's blocking question tool with the following sub-question:
**Stem:** `Couldn't append the finding to Open Questions. What should the agent do?`
**Options (exactly three; fixed order):**
```
A. Retry the append
B. Record the deferral in the completion report only (don't mutate the document)
C. Convert this finding to Skip
```
**Dispatch:**
- **A Retry** — try the append again. On repeated failure, loop back to the same sub-question.
- **B Record only** — skip the document mutation; record the Deferred action in the completion report with a note that the append failed. The finding does not end up in the document but the user sees in the report that they deferred it.
- **C Convert to Skip** — record the finding as Skip with an explanatory reason ("append to Open Questions failed: <error>"). The finding is treated as no-action for the remainder of the session.
Silent failure is not acceptable. If the user does not respond to the sub-question (session ends, terminal disconnects), default to option B so the in-memory decision state stays consistent even if the document wasn't written.
---
## Upstream availability signal
The walk-through and bulk-preview check append-availability before offering Defer as an option. When the document is known-unwritable (e.g., initial read shows it's on a read-only filesystem), the orchestrator caches an `append_available: false` signal at Phase 4 start and Defer is suppressed in the walk-through menu and in the routing question's option C. See `references/walkthrough.md` under "Adaptations" for the menu behavior and `references/bulk-preview.md` under "Edge cases" for the preview behavior.
When append-availability is true at Phase 4 start but an individual append fails mid-flow, the failure path above handles the specific finding — this does not flip the session-level cached signal (other findings may still append successfully if the failure was transient).
---
## Example appended content
Starting document state:
```markdown
## Risks
...existing content...
## Deferred / Open Questions
### From 2026-04-10 review
- **Alias compatibility-theater concern** — Risks (P1, scope-guardian, confidence 0.87)
The alias exists without documented external consumers...
<!-- dedup-key: section="risks" title="alias compatibilitytheater concern" evidence="the alias exists without documented external consumers" -->
```
After appending two findings in a 2026-04-18 session:
```markdown
## Risks
...existing content...
## Deferred / Open Questions
### From 2026-04-10 review
- **Alias compatibility-theater concern** — Risks (P1, scope-guardian, confidence 0.87)
The alias exists without documented external consumers...
<!-- dedup-key: section="risks" title="alias compatibilitytheater concern" evidence="the alias exists without documented external consumers" -->
### From 2026-04-18 review
- **Unit 2/3 merge judgment call** — Scope Boundaries (P2, scope-guardian, confidence 0.78)
The two units update consumer sites that deploy together. Splitting
adds dependency tracking without enabling independent delivery.
<!-- dedup-key: section="scope boundaries" title="unit 23 merge judgment call" evidence="the two units update consumer sites that deploy together" -->
- **Strawman alternatives on migration strategy** — Unit 3 Files (P2, coherence, confidence 0.72)
The fix options list (a) through (c) as alternatives, but (b) and (c)
are "accept the regression" framings that don't solve the problem the
finding describes.
<!-- dedup-key: section="unit 3 files" title="strawman alternatives on migration strategy" evidence="the fix options list a through c as alternatives but b and c" -->
```

View File

@@ -1,9 +1,13 @@
# Document Review Output Template
Use this **exact format** when presenting synthesized review findings. Findings are grouped by severity, not by reviewer.
Use this **exact format** when presenting synthesized review findings in Interactive mode. Findings are grouped by severity, not by reviewer.
**IMPORTANT:** Use pipe-delimited markdown tables (`| col | col |`). Do NOT use ASCII box-drawing characters.
This template describes the Phase 4 interactive presentation — what the user sees before the routing question (`references/walkthrough.md`) fires. The headless-mode envelope is documented in `references/synthesis-and-presentation.md` (Phase 4 "Route Remaining Findings" section) and is separate from this template.
**Vocabulary note.** Internal enum values (`safe_auto`, `gated_auto`, `manual`, `FYI`) live in the schema and synthesis pipeline. User-facing rendered text uses plain-language labels instead: fixes (for `safe_auto`), proposed fixes (for `gated_auto`), decisions (for `manual`), and FYI observations (for `FYI`). The `Tier` column in the tables below is the one place that still names the internal enum so the user can see the synthesis decision; everything else reads as plain language.
## Example
```markdown
@@ -15,9 +19,9 @@ Use this **exact format** when presenting synthesized review findings. Findings
- security-lens -- plan adds public API endpoint with auth flow
- scope-guardian -- plan has 15 requirements across 3 priority levels
Applied 5 auto-fixes. 4 findings to consider (2 errors, 2 omissions).
Applied 5 fixes. 4 items need attention (2 errors, 2 omissions). 2 FYI observations.
### Auto-fixes Applied
### Applied fixes
- Standardized "pipeline"/"workflow" terminology to "pipeline" throughout (coherence)
- Fixed cross-reference: Section 4 referenced "Section 3.2" which is actually "Section 3.1" (coherence)
@@ -25,38 +29,49 @@ Applied 5 auto-fixes. 4 findings to consider (2 errors, 2 omissions).
- Added "update API rate-limit config" step to Unit 4 -- implied by Unit 3's rate-limit introduction (feasibility)
- Added auth token refresh to test scenarios -- required by Unit 2's token expiry handling (security-lens)
### P0 -- Must Fix
### P0 Must Fix
#### Errors
| # | Section | Issue | Reviewer | Confidence |
|---|---------|-------|----------|------------|
| 1 | Requirements Trace | Goal states "offline support" but technical approach assumes persistent connectivity | coherence | 0.92 |
| # | Section | Issue | Reviewer | Confidence | Tier |
|---|---------|-------|----------|------------|------|
| 1 | Requirements Trace | Goal states "offline support" but technical approach assumes persistent connectivity | coherence | 0.92 | manual |
### P1 -- Should Fix
### P1 Should Fix
#### Errors
| # | Section | Issue | Reviewer | Confidence |
|---|---------|-------|----------|------------|
| 2 | Scope Boundaries | 8 of 12 units build admin infrastructure; only 2 touch stated goal | scope-guardian | 0.80 |
| # | Section | Issue | Reviewer | Confidence | Tier |
|---|---------|-------|----------|------------|------|
| 2 | Scope Boundaries | 8 of 12 units build admin infrastructure; only 2 touch stated goal | scope-guardian | 0.80 | manual |
#### Omissions
| # | Section | Issue | Reviewer | Confidence |
|---|---------|-------|----------|------------|
| 3 | Implementation Unit 3 | Plan proposes custom auth but does not mention existing Devise setup or migration path | feasibility | 0.85 |
| # | Section | Issue | Reviewer | Confidence | Tier |
|---|---------|-------|----------|------------|------|
| 3 | Implementation Unit 3 | Plan proposes custom auth but does not mention existing Devise setup or migration path | feasibility | 0.85 | gated_auto |
### P2 -- Consider Fixing
### P2 Consider Fixing
#### Omissions
| # | Section | Issue | Reviewer | Confidence |
|---|---------|-------|----------|------------|
| 4 | API Design | Public webhook endpoint has no rate limiting mentioned | security-lens | 0.75 |
| # | Section | Issue | Reviewer | Confidence | Tier |
|---|---------|-------|----------|------------|------|
| 4 | API Design | Public webhook endpoint has no rate limiting mentioned | security-lens | 0.75 | gated_auto |
### FYI Observations
Low-confidence observations surfaced without requiring a decision. Advisory only; no action needed.
| # | Section | Observation | Reviewer | Confidence |
|---|---------|-------------|----------|------------|
| 1 | Naming | Filename `plan.md` is asymmetric with command name `user-auth`; could go either way | coherence | 0.52 |
| 2 | Risk Analysis | Rollout-cadence decision may benefit from monitoring thresholds, though not blocking | scope-guardian | 0.48 |
### Residual Concerns
Residual concerns are issues the reviewers noticed but could not confirm with above-gate confidence. These are not actionable; they appear here for transparency only and are not promoted into the review surface.
| # | Concern | Source |
|---|---------|--------|
| 1 | Migration rollback strategy not addressed for Phase 2 data changes | feasibility |
@@ -69,21 +84,32 @@ Applied 5 auto-fixes. 4 findings to consider (2 errors, 2 omissions).
### Coverage
| Persona | Status | Findings | Auto | Present | Residual |
|---------|--------|----------|------|---------|----------|
| coherence | completed | 4 | 3 | 1 | 0 |
| feasibility | completed | 2 | 1 | 1 | 1 |
| security-lens | completed | 2 | 1 | 1 | 0 |
| scope-guardian | completed | 1 | 0 | 1 | 0 |
| product-lens | not activated | -- | -- | -- | -- |
| design-lens | not activated | -- | -- | -- | -- |
| Persona | Status | Findings | Auto | Proposed | Decisions | FYI | Residual |
|---------|--------|----------|------|----------|-----------|-----|----------|
| coherence | completed | 5 | 3 | 0 | 1 | 1 | 0 |
| feasibility | completed | 3 | 1 | 1 | 0 | 0 | 1 |
| security-lens | completed | 2 | 1 | 1 | 0 | 0 | 0 |
| scope-guardian | completed | 2 | 0 | 0 | 1 | 1 | 0 |
| product-lens | not activated | -- | -- | -- | -- | -- | -- |
| design-lens | not activated | -- | -- | -- | -- | -- | -- |
Chains: 1 root with 2 dependents
```
## Section Rules
- **Summary line**: Always present after the reviewer list. Format: "Applied N auto-fixes. K findings to consider (X errors, Y omissions)." Omit any zero clause.
- **Auto-fixes Applied**: List all fixes that were applied automatically (auto class). Include enough detail per fix to convey the substance -- especially for fixes that add content or touch document meaning. Omit section if none.
- **P0-P3 sections**: Only include sections that have findings. Omit empty severity levels. Within each severity, separate into **Errors** and **Omissions** sub-headers. Omit a sub-header if that severity has none of that type.
- **Residual Concerns**: Findings below confidence threshold that were promoted by cross-persona corroboration, plus unpromoted residual risks. Omit if none.
- **Summary line**: Always present after the reviewer list. Format: "Applied N fixes. K items need attention (X errors, Y omissions). Z FYI observations." Omit any zero clause, except the FYI clause, which stays even at zero (it's informative that none surfaced).
- **Applied fixes**: List all fixes that were applied automatically (`safe_auto` tier). Include enough detail per fix to convey the substance, especially for fixes that add content or touch document meaning. Omit section if none.
- **P0-P3 sections**: Only include sections that have actionable findings (`gated_auto` or `manual`). Omit empty severity levels. Within each severity, separate into **Errors** and **Omissions** sub-headers. Omit a sub-header if that severity has none of that type. The `Tier` column surfaces whether a finding is `gated_auto` (concrete fix exists, Apply recommended in walk-through) or `manual` (requires user judgment).
- **FYI Observations**: Low-confidence `manual` findings at or above the 0.40 FYI floor but below the per-severity gate. Surface here for transparency; these are not actionable and do not enter the walk-through. Omit section if none.
- **Residual Concerns**: Concerns personas raised that did not clear the confidence gate. Listed for transparency; not promoted into the review surface (cross-persona agreement boost runs on findings that already survived the gate, per synthesis step 3.4). Omit section if none.
- **Deferred Questions**: Questions for later workflow stages. Omit if none.
- **Coverage**: Always include. All counts are **post-synthesis**. **Findings** must equal Auto + Present exactly -- if deduplication merged a finding across personas, attribute it to the persona with the highest confidence and reduce the other persona's count. **Residual** = count of `residual_risks` from this persona's raw output (not the promoted subset in the Residual Concerns section).
- **Coverage**: Always include. All counts are **post-synthesis**. **Findings** must equal Auto + Proposed + Decisions + FYI exactly; if deduplication merged a finding across personas, attribute it to the persona with the highest confidence and reduce the other persona's count. **Residual** = count of `residual_risks` from this persona's raw output (not the promoted subset in the Residual Concerns section). The `Auto` column counts `safe_auto` findings, `Proposed` counts `gated_auto`, `Decisions` counts above-gate `manual`, and `FYI` counts below-gate `manual` findings at or above the 0.40 FYI floor.
## Chain-Rendering Rules
Premise-dependency chains from synthesis step 3.5c annotate roots and dependents. Rendering follows the same count invariant documented in the synthesis reference; this template restates the rules so interactive output cannot drift from the headless envelope.
- **Dependents render only under their root.** When a finding has `dependents`, render the root at its normal severity position (in its P-tier Errors or Omissions table). Immediately below the root's table row, emit an indented `Dependents (N)` sub-block listing each dependent's `# | Section | Issue | Reviewer | Confidence | Tier` entry. Dependents MUST NOT appear at their own severity position. Findings without `depends_on` and without `dependents` render as they do today.
- **Count invariant.** The `Findings` column in Coverage continues to equal Auto + Proposed + Decisions + FYI. Each finding counts exactly once: a dependent counts in its assigned bucket (`Auto` / `Proposed` / `Decisions` / `FYI`) but does NOT render at its own severity position. The source of truth is the post-Step-4 `dependents` array on each root — the same array the headless envelope reads — so coverage count and rendering cannot drift.
- **Chains line (optional).** When one or more chains exist, add a final line to the coverage block: `Chains: N root(s) with M dependents` where N is the number of roots and M is the total dependent count summed across all roots. Omit the line when no chains exist. This mirrors the `Chains:` line the headless envelope emits in `references/synthesis-and-presentation.md` so reviewers get the same chain visibility in both modes.
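A sketch of the rendering invariant in Python, with the table and sub-block formatting abstracted behind callbacks; `id`, `dependents`, and the pre-sorted severity order are assumptions of the sketch, not the schema:
```python
def render_findings(findings, emit_row, emit_dependent):
    """Each finding renders exactly once: dependents only under their root,
    everything else at its own severity position."""
    by_id = {f.id: f for f in findings}
    nested = {d for f in findings for d in getattr(f, "dependents", [])}
    for f in findings:  # assumed already sorted by severity tier
        if f.id in nested:
            continue  # renders inside its root's Dependents (N) sub-block
        emit_row(f)
        for dep_id in getattr(f, "dependents", []):
            emit_dependent(by_id[dep_id])  # indented under the root's row
```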

View File

@@ -1,6 +1,6 @@
# Document Review Sub-agent Prompt Template
This template is used by the document-review orchestrator to spawn each reviewer sub-agent. Variable substitution slots are filled at dispatch time.
This template is used by the ce-doc-review orchestrator to spawn each reviewer sub-agent. Variable substitution slots are filled at dispatch time.
---
@@ -18,35 +18,121 @@ Return ONLY valid JSON matching the findings schema below. No prose, no markdown
{schema}
**Schema conformance — hard constraints (use these exact values; validation rejects anything else):**
- `severity`: one of `"P0"`, `"P1"`, `"P2"`, `"P3"` — use these exact strings. Do NOT use `"high"`, `"medium"`, `"low"`, `"critical"`, or any other vocabulary, even if your persona's prose discusses priorities in those terms conceptually.
- `finding_type`: one of `"error"`, `"omission"` — nothing else (no `"tension"`, `"concern"`, `"observation"`, etc.).
- `autofix_class`: one of `"safe_auto"`, `"gated_auto"`, `"manual"`.
- `evidence`: an ARRAY of strings with at least one element. A single string value is a validation failure — wrap every quote in `["..."]` even when there is only one.
- `confidence`: a number between 0.0 and 1.0 inclusive.
If your persona description uses severity vocabulary like "high-priority" or "critical" in its rubric text, translate to the P0-P3 scale at emit time. "Critical / must-fix" → P0, "important / should-fix" → P1, "worth-noting / could-fix" → P2, "low-signal" → P3. Same for priorities described qualitatively in your analysis — map to P0-P3 on the way out.
Example of a schema-valid finding (all required fields, correct enum values, correct array shape):
```json
{
"title": "Deployment ordering between migration and code unspecified",
"severity": "P0",
"section": "Unit 4",
"why_it_matters": "The plan acknowledges both deploy orderings produce incorrect state but resolves neither, leaving implementers with no safe deploy recipe.",
"finding_type": "omission",
"autofix_class": "gated_auto",
"suggested_fix": "Require Units 1-4 to land in a single atomic PR, or define the sequence explicitly.",
"confidence": 0.92,
"evidence": [
"If the migration runs before Units 1-3 land, the code reads stale data.",
"If after, new code temporarily sees old entries until migration runs."
]
}
```
Rules:
- You are a leaf reviewer inside an already-running compound-engineering review workflow. Do not invoke compound-engineering skills or agents unless this template explicitly instructs you to. Perform your analysis directly and return findings in the required output format only.
- Suppress any finding below your stated confidence floor (see your Confidence calibration section).
- Every finding MUST include at least one evidence item -- a direct quote from the document.
- Every finding MUST include at least one evidence item: a direct quote from the document.
- You are operationally read-only. Analyze the document and produce findings. Do not edit the document, create files, or make changes. You may use non-mutating tools (file reads, glob, grep, git log) to gather context about the codebase when evaluating feasibility or existing patterns.
- **Exclude prior-round deferred entries from review scope.** If the document under review contains a `## Deferred / Open Questions` section or subsections such as `### From YYYY-MM-DD review`, ignore that content — it is review output from prior rounds, not part of the document's actual plan/requirements content. Do not flag entries inside it as new findings. Do not quote its text as evidence. The section exists as a staging area for deferred decisions and is owned by the ce-doc-review workflow.
- Set `finding_type` for every finding:
- `error`: Something the document says that is wrong -- contradictions, incorrect statements, design tensions, incoherent tradeoffs.
- `omission`: Something the document forgot to say -- missing mechanical steps, absent list entries, undefined thresholds, forgotten cross-references.
- Set `autofix_class` based on whether there is one clear correct fix, not on severity or importance:
- `auto`: One clear correct fix, applied silently. This includes trivial fixes AND substantive ones:
- Internal reconciliation -- one document part authoritative over another (summary/detail mismatches, wrong counts, stale cross-references, terminology drift)
- Implied additions -- correct content mechanically obvious from the document (missing steps, unstated thresholds, completeness gaps)
- Codebase-pattern-resolved -- an established codebase pattern resolves ambiguity (cite the specific file/function in `why_it_matters`)
- Incorrect behavior -- the document describes behavior that is factually wrong, and the correct behavior is obvious from context or the codebase
- Missing standard security measures -- HTTPS enforcement, checksum verification, input sanitization, private IP rejection, or other controls with known implementations where omission is clearly a bug
- Incomplete technical descriptions -- the accurate/complete version is directly derivable from the codebase
- Missing requirements that follow mechanically from the document's own explicit, concrete decisions (not high-level goals -- a goal can be satisfied by multiple valid requirements)
The test is not "is this fix important?" but "is there more than one reasonable way to fix this?" If a competent implementer would arrive at the same fix independently, it is auto -- even if the fix is substantive. Always include `suggested_fix`. NOT auto if more than one reasonable fix exists or if scope/priority judgment is involved.
- `present`: Requires user judgment -- genuinely multiple valid approaches where the right choice depends on priorities, tradeoffs, or context the reviewer does not have. Examples: architectural choices with real tradeoffs, scope decisions, feature prioritization, UX design choices.
- `suggested_fix` is required for `auto` findings. For `present` findings, include only when the fix is obvious.
- `error`: Something the document says that is wrong: contradictions, incorrect statements, design tensions, incoherent tradeoffs.
- `omission`: Something the document forgot to say: missing mechanical steps, absent list entries, undefined thresholds, forgotten cross-references.
- Set `autofix_class` based on whether there is one clear correct fix, not on severity or importance. Three tiers:
- `safe_auto`: One clear correct fix, applied silently. Use only when there is genuinely one right answer — typo, wrong count, stale cross-reference, missing mechanically-implied step, terminology drift, factually incorrect behavior where the correct behavior is derivable from context. Always include `suggested_fix`.
- `gated_auto`: A concrete fix exists but it touches document meaning, scope, or author intent in a way that warrants a one-click confirmation before applying. Use for: substantive additions implied by the document's own decisions, codebase-pattern-resolved fixes, framework-native-API substitutions, missing standard security/reliability controls with known implementations. Always include `suggested_fix`. `gated_auto` is the default tier for "I know the fix, but the author should sign off."
- `manual`: Requires user judgment — genuinely multiple valid approaches where the right choice depends on priorities, tradeoffs, or context the reviewer does not have. Examples: architectural choices with real tradeoffs, scope decisions, feature prioritization, UX design choices. Include `suggested_fix` only when the fix is obvious despite the judgment call.
- **Strawman-aware classification rule.** When listing alternatives to the primary fix, count only alternatives a competent implementer would genuinely weigh. A "do nothing / accept the defect" option is NOT a real alternative — it is the failure state the finding describes. The same applies to framings like "document in release notes," "accept drift," or "defer to later" when they sidestep the actual problem rather than solving it. If the only alternatives to the primary fix are strawmen (the problem persists under them), the finding is `safe_auto` or `gated_auto`, not `manual`.
Positive example: "Cache key collision causes stale reads. Fix: include user-id in the cache key. Alternative: never cache this data." → The alternative (disable caching) is a legitimate design choice with real tradeoffs — `manual`.
Negative example: "Silent read-side failure on renamed config files. Fix: read new name, fall back to old with deprecation warning. Alternative: accept drift and document in release notes." → The alternative does not solve the problem; users on mid-flight runs still hit the failure. Treat as `gated_auto` with the concrete fix.
- **Strawman safeguard on `safe_auto`.** If you classify a finding as `safe_auto` via strawman-dismissal of alternatives, name the dismissed alternatives explicitly in `why_it_matters` so synthesis and the reader can see the reasoning. When ANY non-strawman alternative exists (even if you judge it weak), downgrade to `gated_auto` — silent auto-apply is reserved for findings with genuinely one option.
- **Auto-promotion patterns** (findings eligible for `safe_auto` or `gated_auto` even when they're substantive):
- Factually incorrect behavior where the correct behavior is derivable from context or the codebase
- Missing standard security or reliability controls with established implementations (HTTPS enforcement, checksum verification, input sanitization, private IP rejection, fallback-with-deprecation-warning on renames)
- Codebase-pattern-resolved fixes that cite a specific existing pattern in a concrete file or function (the citation is required in `why_it_matters`)
- Framework-native-API substitutions — a hand-rolled implementation duplicates first-class framework behavior (cite the framework API in `why_it_matters`)
- Completeness additions mechanically implied by the document's own explicit decisions (not high-level goals — a goal can be satisfied by multiple valid requirements)
- `suggested_fix` is required for `safe_auto` and `gated_auto` findings. For `manual` findings, include only when the fix is obvious.
- If you find no issues, return an empty findings array. Still populate residual_risks and deferred_questions if applicable.
- Use your suppress conditions. Do not flag issues that belong to other personas.
Writing `why_it_matters` (required field, every finding):
The `why_it_matters` field is how the reader — a developer triaging findings, a reader returning to the doc months later, a downstream automated surface — understands the problem without re-reading the file. Treat it as the most important prose field in your output; every downstream surface (walk-through questions, bulk-action previews, Open Questions entries, headless output) depends on it being good.
- **Lead with observable consequence.** Describe what goes wrong from the reader's or implementer's perspective — what breaks, what gets misread, what decision gets made wrong, what the downstream audience experiences. Do not lead with document structure ("Section X on line Y says..."). Start with the effect ("Implementers will disagree on which tier applies when..."). Section references appear later, only when the reader needs them to locate the issue.
- **Explain why the fix resolves the problem.** If you include a `suggested_fix`, the `why_it_matters` should make clear why that specific fix addresses the root cause. When a similar pattern exists elsewhere in the document or codebase (a parallel section, an established convention, a cited code pattern), reference it so the recommendation is grounded in what the team has already chosen.
- **Keep it tight.** Approximately 2-4 sentences. Longer framings are a regression — downstream surfaces have narrow display budgets, and verbose content gets truncated or skimmed.
- **Always produce substantive content.** `why_it_matters` is required by the schema. Empty strings, nulls, and single-phrase entries are validation failures. If you found something worth flagging (confidence at or above your persona's floor), you can explain it — the field exists because every finding needs a reason.
Illustrative pair — same finding, weak vs. strong framing:
```
WEAK (document-citation first; fails the observable-consequence rule):
Section "Classification Tiers" lists four tiers but Section "Synthesis"
routes three. Reconcile.
STRONG (observable consequence first, grounded fix reasoning):
Implementers will disagree on which tier a finding lands in, because
the Classification Tiers section enumerates four values while the
Synthesis routing only handles three. The document does not say which
enumeration is authoritative. Suggest the Classification Tiers list is
authoritative; drop the fourth value from the tier definition since
Synthesis already lacks a route for it.
```
False-positive categories to actively suppress:
- Pedantic style nitpicks (word choice, bullet vs. numbered lists, comma-vs-semicolon) — style belongs to the document author
- Issues that belong to other personas (see your Suppress conditions)
- Findings already resolved elsewhere in the document (search before flagging)
- Content inside `## Deferred / Open Questions` sections (prior-round review output, not document content)
**Advisory observations — route to FYI, do not force a decision.** If the honest answer to "what actually breaks if we don't fix this?" is "nothing breaks, but…", the finding is advisory. Ask: would a competent implementer hit a wrong outcome, a production bug, a misleading plan, or rework later? If no, set confidence in the **0.400.59 LOW/Advisory band** so synthesis routes the finding to FYI rather than surfacing it as a manual decision. Do not suppress — the observation still has value; it just does not warrant user judgment. Typical advisory shapes: naming asymmetry with no wrong answer, stylistic preference without evidence of impact, speculative future-work concern with no current signal, subjective readability note, theoretical scalability concern without baseline data, "could also be split" organizational preference when the current split is not broken.
</output-contract>
<review-context>
Document type: {document_type}
Document path: {document_path}
{decision_primer}
Document content:
{document_content}
</review-context>
<decision-primer-rules>
When the `<prior-decisions>` block above lists entries (round 2+), honor them:
- Do not re-raise a finding whose title and evidence pattern-match a prior-round rejected (Skipped or Deferred) entry, unless the current document state makes the concern materially different. "Materially different" means the section was substantively edited and your evidence quote no longer appears in the current text — a light-touch edit doesn't count.
- Prior-round Applied findings are informational: the orchestrator verifies those landed via its own matching predicate. You do not need to re-surface them. If the applied fix did not actually land (you find the same issue at the same location), flag it — synthesis will recognize it via the R30 fix-landed predicate.
- Round 1 (no prior decisions) runs with no primer constraints.
This is a soft instruction; the orchestrator enforces the rule authoritatively via synthesis-level suppression (R29) regardless of persona behavior. Following the primer here reduces noisy re-raises and keeps the Coverage section clean.
</decision-primer-rules>
```

View File

@@ -2,163 +2,339 @@
## Phase 3: Synthesize Findings
Process findings from all agents through this pipeline. **Order matters** -- each step depends on the previous.
Process findings from all agents through this pipeline. Order matters: each step depends on the previous. The pipeline implements the finding-lifecycle state machine: **Raised → (Confidence Gate | FYI-eligible | Dropped) → Deduplicated → Classified → SafeAuto | GatedAuto | Manual | FYI**. Re-evaluate state at each step boundary; do not carry forward assumptions from earlier steps as prose-level shortcuts.
### 3.1 Validate
Check each agent's returned JSON against the findings schema:
- Drop findings missing any required field defined in the schema
- Drop findings with invalid enum values
- Drop findings with invalid enum values (including the pre-rename `auto` / `present` values from older personas — treat those as malformed until all persona output has been regenerated)
- Note the agent name for any malformed output in the Coverage section
### 3.2 Confidence Gate
### 3.2 Confidence Gate (Per-Severity)
Suppress findings below 0.50 confidence. Store them as residual concerns for potential promotion in step 3.4.
Gate findings using per-severity thresholds rather than a single flat floor:
| Severity | Gate |
|----------|------|
| P0 | 0.50 |
| P1 | 0.60 |
| P2 | 0.65 |
| P3 | 0.75 |
Findings at or above their severity's gate survive for classification. Findings below the gate are evaluated for FYI-eligibility:
- **FYI-eligible** when `autofix_class` is `manual` and confidence is between 0.40 (FYI floor) and the severity gate. These surface in an FYI subsection at the presentation layer (see 3.7) but do not enter the walk-through or any bulk action — they exist as observational value without forcing a decision.
- **Dropped** when confidence is below 0.40, or when the finding is `safe_auto` / `gated_auto` but below the severity gate (auto-apply findings need confidence above the decision gate to avoid silent mistakes).
Record the drop count and the FYI count in Coverage.
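The gate table reduces to a small routing function. A sketch in Python with illustrative field names:
```python
GATES = {"P0": 0.50, "P1": 0.60, "P2": 0.65, "P3": 0.75}
FYI_FLOOR = 0.40

def route(finding) -> str:
    """Returns 'survives', 'fyi', or 'dropped' per the table above."""
    if finding.confidence >= GATES[finding.severity]:
        return "survives"  # proceeds to classification
    if finding.autofix_class == "manual" and finding.confidence >= FYI_FLOOR:
        return "fyi"       # observational value, no decision forced
    return "dropped"       # below the floor, or a sub-gate auto tier
```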
### 3.3 Deduplicate
Fingerprint each finding using `normalize(section) + normalize(title)`. Normalization: lowercase, strip punctuation, collapse whitespace.
When fingerprints match across personas:
- If the findings recommend **opposing actions** (e.g., one says cut, the other says keep), do not merge -- preserve both for contradiction resolution in 3.5
- If the findings recommend opposing actions (e.g., one says cut, the other says keep), do not merge — preserve both for contradiction resolution in 3.5
- Otherwise merge: keep the highest severity, keep the highest confidence, union all evidence arrays, note all agreeing reviewers (e.g., "coherence, feasibility")
- **Coverage attribution:** Attribute the merged finding to the persona with the highest confidence. Decrement the losing persona's Findings count *and* the corresponding route bucket (Auto or Present) so `Findings = Auto + Present` stays exact.
- **Coverage attribution:** Attribute the merged finding to the persona with the highest confidence. Decrement the losing persona's Findings count and the corresponding route bucket so totals stay exact.
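A sketch of the merge in Python. `opposing_actions` is left as a stub because the real check is semantic (one finding says cut, another says keep); everything else follows the rules above:
```python
import re
from collections import defaultdict

SEVERITY_RANK = {"P0": 0, "P1": 1, "P2": 2, "P3": 3}

def normalize(text: str) -> str:
    text = re.sub(r"[^\w\s]", "", text.lower())
    return re.sub(r"\s+", " ", text).strip()

def opposing_actions(group) -> bool:
    return False  # stub; the real predicate compares what the fixes do

def dedup(findings):
    groups = defaultdict(list)
    for f in findings:
        groups[(normalize(f.section), normalize(f.title))].append(f)
    merged = []
    for group in groups.values():
        if len(group) > 1 and opposing_actions(group):
            merged.extend(group)  # preserved for contradiction handling in 3.5
            continue
        best = max(group, key=lambda f: f.confidence)  # highest confidence wins
        best.severity = min(group, key=lambda f: SEVERITY_RANK[f.severity]).severity
        best.evidence = [q for f in group for q in f.evidence]  # union evidence
        best.reviewers = sorted({f.reviewer for f in group})    # note agreement
        merged.append(best)
    return merged
```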
### 3.4 Promote Residual Concerns
### 3.4 Cross-Persona Agreement Boost
Scan the residual concerns (findings suppressed in 3.2) for:
- **Cross-persona corroboration**: A residual concern from Persona A overlaps with an above-threshold finding from Persona B. Promote at P2 with confidence 0.55-0.65. Inherit `finding_type` from the corroborating above-threshold finding.
- **Concrete blocking risks**: A residual concern describes a specific, concrete risk that would block implementation. Promote at P2 with confidence 0.55. Set `finding_type: omission` (blocking risks surfaced as residual concerns are inherently about something the document failed to address).
When 2+ independent personas flagged the same merged finding (from 3.3), boost the merged finding's confidence by +0.10 (capped at 1.0). Independent corroboration is strong signal — multiple reviewers converging on the same issue is more reliable than any single reviewer's confidence. Note the boost in the Reviewer column of the output (e.g., "coherence, feasibility +0.10").
This replaces the earlier residual-concern promotion step. Findings below the confidence gate are not promoted back into the review surface; they appear in Coverage as residual concerns only. If a below-gate finding is genuinely important, the reviewer should raise their confidence or provide stronger evidence rather than relying on a promotion rule.
### 3.5 Resolve Contradictions
When personas disagree on the same section:
- Create a **combined finding** presenting both perspectives
- Set `autofix_class: present`
- Set `finding_type: error` (contradictions are by definition about conflicting things the document says, not things it omits)
- Create a combined finding presenting both perspectives
- Set `autofix_class: manual` (contradictions are by definition judgment calls)
- Set `finding_type: error` (contradictions are about conflicting things the document says, not things it omits)
- Frame as a tradeoff, not a verdict
Specific conflict patterns:
- Coherence says "keep for consistency" + scope-guardian says "cut for simplicity" -> combined finding, let user decide
- Feasibility says "this is impossible" + product-lens says "this is essential" -> P1 finding framed as a tradeoff
- Multiple personas flag the same issue -> merge into single finding, note consensus, increase confidence
### 3.6 Promote Pattern-Resolved Findings
- Coherence says "keep for consistency" + scope-guardian says "cut for simplicity" → combined finding, let user decide
- Feasibility says "this is impossible" + product-lens says "this is essential" → P1 finding framed as a tradeoff
- Multiple personas flag the same issue (no disagreement) → handled in 3.3 merge, not here
Scan `present` findings for codebase-pattern-resolved auto-eligibility. Promote `present` -> `auto` when **all three** conditions are met:
### 3.5b Deterministic Recommended-Action Tie-Break
1. The finding's `why_it_matters` cites a specific existing codebase pattern -- not just "best practice" or "convention," but a concrete pattern with a file, function, or usage reference
2. The finding includes a concrete `suggested_fix` that follows that cited pattern
3. There is no genuine tradeoff -- the codebase context resolves any ambiguity about which approach to use
Every merged finding carries exactly one `recommended_action` field consumed by the walk-through (`references/walkthrough.md`) to mark the `(recommended)` option, by LFG (`references/bulk-preview.md`) to choose what to execute in bulk, and by the stem's yes/no framing. When a merged finding was flagged by multiple personas who implied different actions, synthesis picks the recommended action deterministically so identical review artifacts produce identical walk-through and LFG behavior across runs.
The principle: when a reviewer mentions multiple theoretical approaches but the codebase already has an established pattern that makes one approach clearly correct, the codebase context settles the question. Alternatives mentioned in passing do not create a real tradeoff if the evidence shows the codebase has already chosen.
**Tie-break order (most conservative first):** `Skip > Defer > Apply`. The first action that at least one contributing persona implied wins, scanning in that order.
Additional auto-promotion patterns (promote `present` -> `auto` when):
- The finding identifies factually incorrect behavior in the document and the suggested fix describes the correct behavior (not a design choice between alternatives)
- The finding identifies a missing industry-standard security control where the document's own context makes the omission clearly wrong (not a legitimate design choice for the system described), and the suggested fix follows established practice
- The finding identifies an incomplete technical description and the complete version is directly derivable from the codebase (the reviewer cited specific code showing what the description should say)
- If any contributing persona implied Skip → `recommended_action: Skip`
- Else if any contributing persona implied Defer → `recommended_action: Defer`
- Else → `recommended_action: Apply`
Do not promote if the finding involves scope or priority changes where the document author may have weighed tradeoffs invisible to the reviewer.
**Persona-to-action mapping.** A persona implies an action through its classification:
- `safe_auto` or `gated_auto` → implies Apply
- `manual` with a concrete `suggested_fix` and a recommended resolution → implies Apply (the persona has an opinion about what to do)
- `manual` flagged as a tradeoff or scope question with no recommended resolution → implies Defer (worth revisiting, not worth acting now)
- Any persona flagging the finding as low-confidence or suppression-eligible via residual concerns → implies Skip
- Persona in the contradiction set (3.5) implying "keep as-is / do not change" → implies Skip
If the contributing personas are all silent on action (e.g., a merged `manual` finding from personas that all flagged it as observation without recommendation), pick the default based on whether the merged finding carries an executable `suggested_fix`:
- `suggested_fix` present → `recommended_action: Apply` as the pragmatic default.
- `suggested_fix` absent → `recommended_action: Defer` (the walk-through and LFG cannot execute Apply without a fix; routing an actionless finding to Defer surfaces it in Open Questions where the user can decide what to do with it).
This gate holds for every branch of the tie-break: if the winning action is `Apply` but the merged finding has no `suggested_fix` after 3.6 (Promote) and 3.7 (Route) have run, downgrade to `Defer`. The walk-through still lets the user pick any of the four options; this rule only governs the agent's default recommendation so LFG and bulk-preview never schedule a non-executable Apply.
**Conflict-context surface.** When the tie-break fires (contributing personas implied different actions), record a one-line conflict-context string on the merged finding. The walk-through renders this on the R15 conflict-context line (see `references/walkthrough.md`). Example: `Coherence recommends Apply; scope-guardian recommends Skip. Agent's recommendation: Skip.`
**Downstream invariant.** The walk-through and bulk-preview never recompute the recommendation — they read `recommended_action` and render `(recommended)` on the matching option. LFG-the-rest and routing option B execute the `recommended_action` across the scoped finding set in bulk. This keeps LFG outcomes reproducible and auditable: the same review artifact always produces the same bulk plan.
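The tie-break plus the executable-Apply gate, sketched in Python; `implied_actions` and `suggested_fix` are illustrative field names standing in for the persona mapping above:
```python
def pick_recommended_action(merged) -> str:
    implied = set(merged.implied_actions)         # from the persona mapping
    action = None
    for candidate in ("Skip", "Defer", "Apply"):  # most conservative first
        if candidate in implied:
            action = candidate
            break
    if action is None:
        # All contributing personas silent: default on executability.
        action = "Apply" if merged.suggested_fix else "Defer"
    if action == "Apply" and not merged.suggested_fix:
        action = "Defer"  # never schedule a non-executable Apply
    return action
```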
### 3.5c Premise-Dependency Chain Linking
Document reviews often produce fanout: a single premise challenge ("is this work justified?") generates downstream findings that all evaporate if the premise is rejected ("alias unjustified", "abstraction overkill", "migration lacks rollback", "naming forecloses future"). Surfacing each as an independent decision forces the user to re-litigate the same root question N times. This step links dependent findings to their root so presentation can group them and the walk-through can cascade a single root decision across the chain.
Run this step after 3.5b (recommended_action normalized) and before 3.6 (auto-promotion), operating on the merged finding set.
**Step 1: Identify roots.** A finding is a candidate root when ALL of the following hold:
- Severity is `P0` or `P1` (premise-level issues carry high priority by nature)
- `autofix_class` is `manual` (the root itself requires judgment — a safe/gated root is acted on, not cascaded)
- `why_it_matters` or `title` challenges a foundational premise, not a detail. Signal phrases (shape, not vocabulary): "premise unsupported", "justification missing", "do-nothing baseline not evaluated", "is X justified", "unsupported by evidence", "is the proposed solution the right approach"
- The finding's `section` is framing-level (Problem Frame, Overview, Why, Motivation, Goals) OR the finding explicitly questions whether a named component should exist
If multiple candidates match the criteria, elevate ALL of them. The criteria above (P0/P1, manual, framing-level section, premise-challenge signal phrases) are restrictive enough that this list will be short for any well-formed document; do not impose a further numerical cap. Picking only one root when two valid roots exist leaves the second root's natural dependents stranded as independent manual findings — the exact UX problem chains are meant to solve.
**Peer vs nested test.** Two candidate roots are peers when accepting root A's proposed fix would not resolve root B's concern (and vice versa). They are nested when one root's fix would moot the other — in which case the subsumed candidate becomes a dependent of the surviving root, not a peer root. Apply the test symmetrically: check both directions before deciding.
**Surviving-root selection under asymmetric subsumption.** When nested, the surviving root is the one whose fix moots the other — **not** the one with higher confidence. If accepting Root A's fix moots Root B's concern, but accepting Root B's fix leaves Root A's concern standing, A is the surviving root and B becomes its dependent, regardless of which candidate scored higher confidence. The subsumption direction determines scope (broader premise wins); confidence determines strength, not scope. Confidence is used for tie-breaking *among peers*, not for deciding which of two nested candidates dominates.
**Sanity diagnostic.** If more than 3 candidates match, reconsider whether the criteria are being applied correctly — it is unusual for a single document to contain more than 3 genuinely distinct premise-level challenges. Do not silently drop candidates; either confirm each one independently meets the criteria (and surface them all), or tighten the application of the criteria. If the count is legitimately high, surfacing all of them is more useful than hiding any.
If none match, skip the rest of this step — no chains exist.
**Dependent assignment under multiple roots.** When multiple roots exist and a candidate dependent could plausibly link to more than one, assign it to the root whose rejection most directly dissolves the dependent's concern. If ambiguity remains, assign to the higher-confidence root. A dependent never links to more than one root — a single `depends_on` value.
**Step 2: Identify dependents.** For each candidate root, scan the remaining findings for dependents. The predicate must match the cascade trigger in `references/walkthrough.md` — dependents cascade when the user rejects (Skip/Defer) the root, so dependency is defined on the rejection branch, not the acceptance branch. A finding is a dependent of a root when:
- The root challenges a foundational premise about a named component — questioning whether it should exist, whether the proposed approach is correct, or whether the work is justified. Shapes to recognize (not a vocabulary list — map to whatever the document's domain actually uses): a compatibility layer whose necessity is challenged, a planned feature whose justification is in doubt, an abstraction whose warrant is questioned, a proposed change whose scope is disputed, a migration target whose choice is contested, an architectural commitment whose basis is unsupported
- The candidate's `suggested_fix` modifies, adds detail to, or constrains that same component
- The candidate's concern would dissolve if the root's premise is rejected — meaning: if the user rejects the root (Skip/Defer), the component the dependent targets is no longer a settled part of the plan, so the dependent's fix has nothing stable to act on and batch-rejects with the root
Test with the substitution check: "If the user rejects the root (Skip/Defer), does the dependent's finding still describe an actionable concern the user would want to engage with this round?" If no — the dependent's premise dissolves alongside the root's — it is a dependent. If yes (the finding identifies a problem that survives root rejection), it is not.
**Step 3: Independence safeguard.** Even when a finding's target component is addressed by the root, do NOT link if:
- The dependent identifies a problem that would exist regardless of the root's resolution. A migration's rollback plan, a module's error handling, a feature's test coverage — these are operational obligations that don't evaporate when the premise changes. They describe how a component must behave if it exists at all.
- The dependent's `why_it_matters` cites evidence (codebase fact, framework convention, production data) that stands on its own, not conditioned on the premise
- The dependent is `safe_auto` — it has one clear correct fix and should apply regardless of the root's resolution
When uncertain, default to NOT linking. A mis-linked chain hides a real issue; leaving a finding unlinked only costs one extra decision.
**Step 4: Annotate.** On each dependent, record `depends_on: <root_finding_id>` (use section + normalized title as the id). On each root, record `dependents: [<dependent_ids>]`. Cap `dependents` at 6 entries per root — if more than 6 candidates link to the same root, keep the top 6 by severity then confidence and leave the rest unlinked (over-aggressive chaining risks obscuring independent concerns).
Do NOT reclassify, re-route, or change confidence of any finding in this step. Linking is purely annotative; the walk-through and presentation use the annotation, synthesis proper does not.
**Step 5: Report in Coverage.** Add a line to the coverage summary: `Chains: N root(s) with M total dependents`. When N = 0, omit the line.
**Count invariant (critical — do not violate).** `M` in the coverage line is the number of findings with `depends_on` set after Step 4 completes — i.e., the final linked count after steps 2 (candidacy), 3 (independence safeguard), and 4 (cap). It is NOT the number of candidates considered in Step 2. The same `dependents` array is the source of truth for both coverage counting AND rendering the `Dependents (...)` sub-block. If a finding appears in a root's `dependents` array, it MUST appear nested under that root in the presentation and MUST NOT appear at its own severity position. If a finding does NOT appear in any root's `dependents` array, it MUST appear at its own severity position and MUST NOT appear nested anywhere. Coverage count and rendering drift apart only if the orchestrator is using two different source-of-truth values — there is exactly one, the post-Step-4 `dependents` array on each root.
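A minimal Python sketch of steps 2 through 4 plus the count invariant, illustrative only (the orchestrator is an agent, not this code). It assumes dict-shaped findings carrying two precomputed judgment calls: `dissolves_with`, the set of root ids whose rejection dissolves the concern, and `stands_alone`, the step 3 safeguard verdict. `roots` arrive ordered per the dependent-assignment rule (most-direct dissolution first, confidence as tie-break):
```python
def link_chains(roots, findings, cap=6):
    sev = {"P0": 0, "P1": 1, "P2": 2, "P3": 3}
    for root in roots:
        kept = [f for f in findings
                if f is not root
                and not f.get("depends_on")              # one root per dependent
                and root["id"] in f["dissolves_with"]    # step 2 predicate
                and not f["stands_alone"]                # step 3 safeguard
                and f["autofix_class"] != "safe_auto"]   # step 3 safeguard
        kept.sort(key=lambda f: (sev[f["severity"]], -f["confidence"]))
        kept = kept[:cap]                                # step 4 cap; the rest stay unlinked
        for dep in kept:
            dep["depends_on"] = root["id"]               # single depends_on value
        root["dependents"] = [d["id"] for d in kept]
    # Count invariant: M is the post-cap linked total, not candidates considered.
    return sum(len(r["dependents"]) for r in roots)
```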
**Worked example A (rename-shape).** Review of a refactor plan surfaces 11 findings. One is P0 manual "Rename premise unsupported by user-facing evidence" in Problem Frame — a candidate root. Scanning the other 10:
- P1 manual "Alias mechanism unjustified scope" — root proposes scoping down to a pure alias-free rename; dependent's fix proposes dropping alias infrastructure. Linked.
- P2 manual "AliasedCommand abstraction overkill" — abstraction exists to support the alias; if alias dropped, abstraction dissolves. Linked.
- P2 manual "Rename forecloses dual-mode future" — concern only exists if rename proceeds. Linked.
- P2 manual "Identity drift: command vs artifact names" — naming asymmetry only exists if rename proceeds. Linked.
- P1 manual "Migration lacks rollback strategy" — migration needs rollback regardless of scope. NOT linked (independence safeguard).
- P0 gated_auto "Deployment-ordering between migration and code" — concrete fix user confirms regardless. NOT linked (safeguard: gated_auto with own resolution path).
Result: 1 root + 4 dependents. User sees the root first; rejecting it cascades the 4 dependents to auto-resolved. Manual engagement drops from 11 → 7 (6 unlinked + 1 visible root).
**Worked example B (auth-shape).** Review of a plan to introduce a new session-management middleware. One finding is P1 manual "Middleware rewrite premise unsupported — existing session handling has no reported reliability issues" in Problem Frame. Scanning the other findings:
- P2 manual "Middleware abstraction boundary unclear vs existing request context" — the boundary only matters if the middleware is built. Linked.
- P2 manual "Rollout strategy for new session store not specified" — the rollout only matters if the new store ships. Linked.
- P1 gated_auto "CSRF token regeneration missing on session rotation" — a real security gap in the plan's written design, independent of whether the middleware is the right approach. NOT linked (safeguard: gated_auto, concrete fix applies regardless).
- P2 manual "Existing session timeout behavior not captured in tests" — this is a pre-existing test coverage gap. It exists in the current code regardless of whether the rewrite happens. NOT linked (independence safeguard).
Result: 1 root + 2 dependents. The shape is the same as Example A — different vocabulary, different domain — which is the pattern to recognize.
### 3.6 Promote Auto-Eligible Findings
Scan `manual` findings for promotion to `safe_auto` or `gated_auto`. Promote when the finding meets one of the consolidated auto-promotion patterns:
- **Codebase-pattern-resolved.** `why_it_matters` cites a specific existing codebase pattern (concrete file/function/usage reference, not just "best practice" or "convention"), and `suggested_fix` follows that pattern. Promote to `gated_auto` — the user still confirms, but the codebase evidence resolves ambiguity.
- **Factually incorrect behavior.** The document describes behavior that is factually wrong, and the correct behavior is derivable from context or the codebase. Promote to `gated_auto`.
- **Missing standard security/reliability controls.** The omission is clearly a gap (not a legitimate design choice for the system described), and the fix follows established practice (HTTPS enforcement, checksum verification, input sanitization, fallback-with-deprecation-warning on renames). Promote to `gated_auto`.
- **Framework-native-API substitutions.** A hand-rolled implementation duplicates first-class framework behavior, and the framework API is cited. Promote to `gated_auto`.
- **Mechanically-implied completeness additions.** The missing content follows mechanically from the document's own explicit, concrete decisions (not high-level goals). Promote to `safe_auto` when there is genuinely one correct addition; `gated_auto` when the addition is substantive.
Do not promote if the finding involves scope or priority changes where the author may have weighed tradeoffs invisible to the reviewer.
**Strawman-downgrade safeguard.** If a `safe_auto` finding names dismissed alternatives in `why_it_matters` (per the subagent template's strawman rule), verify the alternatives are genuinely strawmen. If any alternative is a plausible design choice that the persona dismissed too aggressively, downgrade to `gated_auto` so the user sees the tradeoff before the fix applies.
### 3.7 Route by Autofix Class
**Severity and autofix_class are independent.** A P1 finding can be `safe_auto` if the correct fix is obvious. The test is not "how important?" but "is there one clear correct fix, or does this require judgment?"
| Autofix Class | Route |
|---------------|-------|
| `safe_auto` | Apply silently in Phase 4. Requires `suggested_fix`. Demote to `gated_auto` if missing. |
| `gated_auto` | Enter the per-finding walk-through with Apply marked (recommended). Requires `suggested_fix`. Demote to `manual` if missing. |
| `manual` | Enter the per-finding walk-through with user-judgment framing. `suggested_fix` is optional. |
| FYI-subsection | `manual` findings below the severity gate but at or above the FYI floor (0.40) — surface in a distinct FYI subsection of the presentation, do not enter the walk-through or any bulk action. |
**Auto-eligible patterns for safe_auto:** summary/detail mismatch (body authoritative over overview), wrong counts, missing list entries derivable from elsewhere in the document, stale internal cross-references, terminology drift, prose/diagram contradictions where prose is more detailed, missing steps mechanically implied by other content, unstated thresholds implied by surrounding context.
**Auto-eligible patterns for gated_auto:** codebase-pattern-resolved fixes, factually incorrect behavior, missing standard security/reliability controls, framework-native-API substitutions, substantive completeness additions mechanically implied by explicit decisions.
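A sketch of the routing pass, assuming the per-severity confidence gate is computed upstream and passed in. The table above does not say what happens to a below-gate `gated_auto` finding; this sketch assumes it is suppressed:
```python
def route(finding, gate, fyi_floor=0.40):
    cls, fix = finding["autofix_class"], finding.get("suggested_fix")
    if cls == "safe_auto" and not fix:
        cls = "gated_auto"                      # demotion: silent apply needs a payload
    if cls == "gated_auto" and not fix:
        cls = "manual"                          # demotion: nothing to gate on
    if cls == "safe_auto":
        return "apply_silently"                 # Phase 4 single-pass edit
    if finding["confidence"] >= gate:
        return ("walkthrough_apply_marked" if cls == "gated_auto"
                else "walkthrough_user_judgment")
    if cls == "manual" and finding["confidence"] >= fyi_floor:
        return "fyi_subsection"                 # report only, no walk-through
    return "suppress"                           # assumption, not spec
```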
### 3.8 Sort
Sort findings for presentation: P0 → P1 → P2 → P3, then by finding type (errors before omissions), then by confidence (descending), then by document order (section position).
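As a sketch, the sort reduces to one composite key. Field names (`finding_type` values `error` / `omission`, `section_position` for document order) are assumptions about the findings schema, not quoted from it:
```python
SEVERITY = {"P0": 0, "P1": 1, "P2": 2, "P3": 3}
TYPE = {"error": 0, "omission": 1}          # errors sort before omissions

def presentation_key(f):
    return (SEVERITY[f["severity"]],        # P0 first
            TYPE[f["finding_type"]],        # errors before omissions
            -f["confidence"],               # confidence descending
            f["section_position"])          # document order breaks remaining ties

# usage: findings.sort(key=presentation_key)
```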
## Phase 4: Apply and Present
### Apply safe_auto fixes
Apply all `safe_auto` findings to the document in a single pass:
- Edit the document inline using the platform's edit tool
- Track what was changed for the "Auto-fixes Applied" section
- Do not ask for approval -- these have one clear correct fix
- Track what was changed for the "Applied fixes" section in the rendered output (`safe_auto` is the internal enum; the rendered section header reads "Applied fixes")
- Do not ask for approval — these have one clear correct fix
List every applied fix in the output summary so the user can see what changed. Use enough detail to convey the substance of each fix (section, what was changed, reviewer attribution). This is especially important for fixes that add content or touch document meaning — the user should not have to diff the document to understand what the review did.
### Route Remaining Findings
After safe_auto fixes apply, remaining findings split into buckets:
- `gated_auto` and `manual` findings at or above the severity gate → enter the routing question (see Unit 5 / `references/walkthrough.md`)
- FYI-subsection findings → surface in the presentation only, no routing
- Zero actionable findings remaining → skip the routing question; flow directly to Phase 5 terminal question
**Headless mode:** Do not use interactive question tools. Output all findings as a structured text envelope the caller can parse. Internal enum values (`safe_auto`, `gated_auto`, `manual`, `FYI`) stay in the schema and synthesis prose; the envelope below uses user-facing vocabulary — "fixes", "Proposed fixes", "Decisions", "FYI observations" — so headless output reads the same way interactive output does.
```
Document review complete (headless mode).
Applied N fixes:
- <section>: <what was changed> (<reviewer>)
- <section>: <what was changed> (<reviewer>)
Proposed fixes (concrete fix, requires user confirmation):
[P0] Section: <section> — <title> (<reviewer>, confidence <N>)
Why: <why_it_matters>
Suggested fix: <suggested_fix>
Decisions (requires user judgment):
[P1] Section: <section> — <title> (<reviewer>, confidence <N>)
Why: <why_it_matters>
Suggested fix: <suggested_fix or "none">
Dependents (would resolve if this root is rejected):
[P2] Section: <section> — <title> (<reviewer>, confidence <N>)
Why: <why_it_matters>
[P2] Section: <section> — <title> (<reviewer>, confidence <N>)
Why: <why_it_matters>
FYI observations (low-confidence, no decision required):
[P3] Section: <section> — <title> (<reviewer>, confidence <N>)
Why: <why_it_matters>
Residual concerns:
- <concern> (<source>)
Deferred questions:
- <question> (<source>)
Review complete
```
Omit any section with zero items. The section headers reflect user-facing vocabulary: the "Proposed fixes" bucket carries `gated_auto` findings (the persona has a concrete fix; the user confirms), "Decisions" carries above-gate `manual` findings (judgment calls), and "FYI observations" carries `manual` findings between the 0.40 FYI floor and the per-severity gate. When a root has dependents, render the root at its normal position in the severity-sorted list and nest its dependents as an indented `Dependents (...)` sub-block immediately below. Do not re-list dependents at their own severity position — they appear only under their root. End with "Review complete" as the terminal signal so callers can detect completion.
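The single-source-of-truth rule reduces to one ordering pass. A sketch, assuming each finding has an `id` and roots carry the post-Step-4 `dependents` arrays:
```python
def render_order(findings, key):
    by_id = {f["id"]: f for f in findings}
    linked = {d for f in findings for d in f.get("dependents", [])}
    rows = []
    for f in sorted((x for x in findings if x["id"] not in linked), key=key):
        rows.append((0, f))                      # own severity position
        for dep_id in f.get("dependents", []):
            rows.append((1, by_id[dep_id]))      # indented Dependents sub-block only
    return rows                                  # depth-1 row count equals M in Coverage
```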
**Interactive mode:**
Present findings using the review output template (read `references/review-output-template.md`). Within each severity level, separate findings by type:
Brief summary at the top: "Applied N auto-fixes. K findings to consider (X errors, Y omissions)."
- Errors (design tensions, contradictions, incorrect statements) first — these need resolution
- Omissions (missing steps, absent details, forgotten entries) second — these need additions
Brief summary at the top: "Applied N fixes. K items need attention (X errors, Y omissions). Z FYI observations."
Include the Coverage table, applied fixes, FYI observations (as a distinct subsection), residual concerns, and deferred questions.
### R29 Rejected-Finding Suppression (Round 2+)
When the orchestrator is running round 2+ on the same document in the same session, the decision primer (see `SKILL.md` — Decision primer) carries forward every prior-round Skipped, Deferred, and Acknowledged finding. Synthesis suppresses re-raised rejected findings rather than re-surfacing them to the user. Acknowledged is treated as a rejected-class decision here: the user saw the finding, chose not to act on it (no Apply, no Defer append), and wants it on record — equivalent to Skip for suppression purposes.
For each current-round finding, compare against the primer's rejected list:
- **Matching predicate:** same as R30 — `normalize(section) + normalize(title)` fingerprint augmented with evidence-substring overlap check (>50%). If a current-round finding matches a prior-round rejected finding on fingerprint AND evidence overlap, drop the current-round finding.
- **Materially-different exception:** if the current document state has changed around the finding's section since the prior round (e.g., the section was edited and the evidence quote no longer appears in the current text), treat the finding as new — the underlying context shifted and the concern may be genuinely different now. The persona's evidence itself reveals this: a quote that doesn't appear in the current document is a signal the prior-round rejection no longer applies.
- **On suppression:** record the drop in Coverage with a "previously rejected, re-raised this round" note so the user can see what was suppressed. The user can explicitly escalate by invoking the review again on a different context if they believe the suppression was wrong.
This rule runs at synthesis time, not at the persona level. Personas have a soft instruction via the subagent template's `{decision_primer}` variable to avoid re-raising rejected findings, but the orchestrator is the authoritative gate — if a persona re-raises despite the primer, synthesis drops the finding.
### R30 Fix-Landed Matching Predicate
When the orchestrator is running round 2+ on the same document (see Unit 7 multi-round memory), synthesis verifies that prior-round Applied findings actually landed. For each prior-round Applied finding:
- **Matching predicate:** `normalize(section) + normalize(title)` (same fingerprint as 3.3 dedup) augmented with an evidence-substring overlap check. If any current-round persona raises a finding whose fingerprint matches a prior-round Applied finding AND shares >50% of its evidence substring with the prior-round evidence, treat it as a fix-landed regression.
- **Section renames count as different locations.** If the section name has changed between rounds (edit introduced a heading rename), treat the new section as a different location and the current-round finding as new.
- **On match:** flag the finding as "fix did not land" in the report rather than surfacing it as a new finding. Include the prior-round finding's title and the current-round persona's evidence so the user can see why the verification flagged it.
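A sketch of the predicate R29 and R30 share. The spec leaves the overlap metric loose; this reads ">50% evidence-substring overlap" as the longest common substring measured against the prior round's evidence quote, and the `normalize` rule (lowercase, collapse non-alphanumerics) is an assumption:
```python
import re

def normalize(s):
    return re.sub(r"[^a-z0-9]+", " ", s.lower()).strip()

def fingerprint(f):
    return normalize(f["section"]) + "|" + normalize(f["title"])

def evidence_overlap(prior, current):
    best = 0                                   # longest common substring, O(n*m)
    for i in range(len(prior)):                # fine at evidence-quote sizes
        for j in range(len(current)):
            k = 0
            while (i + k < len(prior) and j + k < len(current)
                   and prior[i + k] == current[j + k]):
                k += 1
            best = max(best, k)
    return best / max(len(prior), 1)           # fraction of the prior quote covered

def matches(prior_finding, current_finding):
    return (fingerprint(prior_finding) == fingerprint(current_finding)
            and evidence_overlap(prior_finding["evidence"],
                                 current_finding["evidence"]) > 0.5)
```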
### Protected Artifacts
During synthesis, discard any finding that recommends deleting or removing files in:
- `docs/brainstorms/`
- `docs/plans/`
- `docs/solutions/`
These are pipeline artifacts and must not be flagged for removal.
## Phase 5: Next Action — Terminal Question
**Headless mode:** Return "Review complete" immediately. Do not ask questions. The caller receives the text envelope from Phase 4 and handles any remaining findings.
**Interactive mode:** fire the terminal question using the platform's blocking question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). This question is distinct from the mid-flow routing question (`references/walkthrough.md`) — the routing question chooses *how* to engage with findings, this one chooses *what to do next* once engagement is complete. Do not merge them.
**Stem:** `Apply decisions and what next?`
**Options (three by default; two in the zero-actionable case):**
When `fixes_applied_count > 0` (at least one safe_auto or Apply decision has landed this session):
```
A. Apply decisions and proceed to <next stage>
B. Apply decisions and re-review
C. Exit without further action
```
Return "Review complete" as the terminal signal for callers.
When `fixes_applied_count == 0` (zero-actionable case, or the user took routing option D / every walk-through decision was Skip):
```
A. Proceed to <next stage>
B. Exit without further action
```
The `<next stage>` substitution uses the document type from Phase 1:
- Requirements document → `ce-plan`
- Plan document → `ce-work`
**Label adaptation:** when no decisions are queued to apply, the primary option drops the `Apply decisions and` prefix — the label should match what the system is doing. `Apply decisions and proceed` when fixes are queued; `Proceed` when nothing is queued.
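A sketch of the option-set construction with the label adaptation folded in; `next_stage` is the Phase 1 mapping (`ce-plan` for requirements documents, `ce-work` for plans):
```python
def terminal_options(fixes_applied_count, next_stage):
    if fixes_applied_count > 0:
        return [f"Apply decisions and proceed to {next_stage}",
                "Apply decisions and re-review",
                "Exit without further action"]
    return [f"Proceed to {next_stage}",          # prefix dropped: nothing is queued
            "Exit without further action"]
```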
**Caller-context handling (implicit):** the terminal question's "Proceed to <next stage>" option is interpreted contextually by the agent from the visible conversation state. When ce-doc-review is invoked from inside another skill's flow (e.g., ce-brainstorm Phase 4 re-review, ce-plan phase 5.3.8), the agent does not fire a nested `/ce-plan` or `/ce-work` dispatch — it returns control to the caller's flow which continues its own logic. When invoked standalone, "Proceed" dispatches the appropriate next skill. No explicit caller-hint argument is required; if this implicit handling proves unreliable in practice, an explicit `nested:true` flag can be added as a follow-up.
### Iteration limit
After 2 refinement passes, recommend completion — diminishing returns are likely. But if the user wants to continue, allow it; the primer carries all prior-round decisions so later rounds suppress repeat findings cleanly.
Return "Review complete" as the terminal signal for callers, regardless of which option the user picked.
## What NOT to Do
## Iteration Guidance
On subsequent passes, re-dispatch personas with the multi-round decision primer (see Unit 7) and re-synthesize. Fixed findings self-suppress because their evidence is gone from the current doc; rejected findings are handled by the R29 pattern-match suppression rule; applied-fix verification uses the R30 matching predicate above. If findings are repetitive across passes after these mechanisms run, recommend completion.

# Per-finding Walk-through
This reference defines Interactive mode's per-finding walk-through — the path the user enters by picking option A (`Review each finding one by one — accept the recommendation or choose another action`) from the routing question, plus the unified completion report that every terminal path (walk-through, LFG, Append-to-Open-Questions, zero findings) emits.
Interactive mode only.
---
## Routing question (the entry point)
After `safe_auto` fixes apply and synthesis produces the remaining finding set, the orchestrator asks a four-option routing question before any walk-through or bulk action runs.
Use the platform's blocking question tool (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). In Claude Code, the tool should already be loaded from the Interactive-mode pre-load step in `SKILL.md` — if it isn't, call `ToolSearch` with query `select:AskUserQuestion` now. Rendering the routing question as narrative text is a bug, not a valid fallback.
**Stem:** `What should the agent do with the remaining N findings?`
**Options (fixed order; no option is labeled `(recommended)` — the routing choice is user-intent):**
```
A. Review each finding one by one — accept the recommendation or choose another action
B. LFG. Apply the agent's best-judgment action per finding
C. Append findings to the doc's Open Questions section and proceed
D. Report only — take no further action
```
The per-finding `(recommended)` labeling lives inside the walk-through (option A) and the bulk preview (options B/C), where it's applied per-finding from synthesis step 3.5b's `recommended_action`. The routing question itself does not recommend one of A/B/C/D because the right route depends on user intent (engage / trust / triage / skim), not on the finding-set shape — a rule that mapped finding-set shape to routing recommendation (e.g., "most findings are Apply-shaped → recommend LFG") would pressure users toward automated paths in ways that conflict with the user-intent framing.
If all remaining findings are FYI-subsection-only (no `gated_auto` or above-gate `manual` findings), skip the routing question entirely and flow to the Phase 5 terminal question.
**Append-availability adaptation.** When `references/open-questions-defer.md` has cached `append_available: false` at Phase 4 start (e.g., read-only document, unwritable filesystem), option C is suppressed from the routing question because every per-finding Defer would fail into the open-questions failure path. The menu shows three options (A / B / D) and the stem appends one line explaining why (e.g., `Append to Open Questions unavailable — document is read-only in this environment.`). This mirrors the per-finding option B suppression described under "Adaptations" below — both routing-level and per-finding Defer paths share the same availability signal so the user never sees Defer surfaced at one level and omitted at the other.
**Dispatch by selection:**
- **A** — load this walk-through (per-finding loop). Apply decisions accumulate in memory; Open-Questions defers execute inline via `references/open-questions-defer.md`; Skip decisions are recorded as no-action; `LFG the rest` routes through `references/bulk-preview.md`.
- **B** — load `references/bulk-preview.md` scoped to every pending `gated_auto` / `manual` finding. On Proceed, execute the plan: Apply → end-of-batch document edit; Open-Questions defers → `references/open-questions-defer.md`; Skip → no-op. On Cancel, return to the routing question.
- **C** — load `references/bulk-preview.md` with every pending finding in the Open-Questions bucket (regardless of the agent's natural recommendation). On Proceed, route every finding through `references/open-questions-defer.md`; no document edits apply. On Cancel, return to the routing question.
- **D** — do not enter any dispatch phase. Emit the completion report and flow to Phase 5 terminal question.
---
## Entry (walk-through mode)
The walk-through receives, from the orchestrator:
- The merged findings list in severity order (P0 → P1 → P2 → P3), filtered to `gated_auto` and `manual` findings that survived the per-severity confidence gate. FYI-subsection findings are not included — they surface in the final report only and have no walk-through entry.
- The run id for artifact lookups (when applicable).
- Premise-dependency chain annotations from synthesis step 3.5c: each finding may carry `depends_on: <root_id>` or `dependents: [<ids>]`.
Each finding's recommended action has already been normalized by synthesis step 3.5b (Deterministic Recommended-Action Tie-Break, `Skip > Defer > Apply`) — the walk-through surfaces that recommendation via the merged finding's `recommended_action` field and does not recompute it.
**Root-first iteration order.** When a finding has `dependents`, iterate it before any of its dependents regardless of severity order within the chain. The root always comes first so the user's root decision can cascade.
**Cascading root decisions.** When the user picks Skip or Defer on a finding with `dependents`:
1. Announce the cascade in the terminal before firing the next question: "Skipping/Deferring this root will auto-resolve N dependent finding(s): {titles}. Continue?"
2. Use the platform's blocking question tool with two options: `Cascade — apply same action to all dependents` (recommended) and `Decide each dependent individually`. Labels must be self-contained per the AskUserQuestion rules.
3. On Cascade: apply the root's action to every dependent and skip those findings' walk-through entries. Persistence follows the per-action routing rules from "Per-finding routing" below — the canonical home for every cascaded decision is the in-memory decision list (annotated with `cascaded from {root_title}` and the cascaded action), plus any action-specific side effect:
- Cascaded `Apply` — add the dependent id to the Apply set and record in the decision list.
- Cascaded `Defer` — invoke the open-questions append flow for the dependent and record the append outcome in the decision list. If the append fails, fall back to the per-finding failure path (Retry / Record only / Convert to Skip) for that dependent before advancing the cascade.
- Cascaded `Skip` — record in the decision list only; no Apply-set entry, no open-questions append.
On Individual: proceed normally — the root's dependents each get their own walk-through entry.
When the user picks Apply on a root, do NOT cascade — the premise held, so dependents each need their own decision. Proceed through the walk-through normally.
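A sketch of the cascade after the user confirms it, covering the Skip/Defer trigger. The two helpers stand in for the flows in `references/open-questions-defer.md`; real state handling lives in the orchestrator:
```python
def append_to_open_questions(dep_id):   # stand-in: in-doc append, may fail
    return True

def failure_path(dep_id):               # stand-in: Retry / Record only / Convert to Skip
    return "skip"

def cascade(root, action, state):
    """Runs only for Skip/Defer roots; Apply never cascades."""
    state.decisions.append((root["id"], action, None))
    for dep_id in root["dependents"]:
        dep_action = action
        if dep_action == "defer" and not append_to_open_questions(dep_id):
            dep_action = failure_path(dep_id)
        state.decisions.append((dep_id, dep_action,
                                f"cascaded from {root['title']}"))
    # the walk-through loop skips any finding already present in state.decisions
```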
**Orphaned dependents.** If a dependent's root was rejected in a prior round and the root is suppressed this round (per R29), treat the dependent as a standalone finding with no chain context. Do not reference the missing root.
---
## Per-finding presentation
Each finding is presented in two parts: a terminal output block carrying the explanation, and a question via the platform's blocking question tool carrying the decision. Never merge the two — the terminal block uses markdown; the question uses plain text.
### Terminal output block (print before firing the question)
Render as markdown. Labels on their own line, blank lines between sections:
```
## Finding {N} of {M} — {severity} {plain-English title}
Section: {section}
**What's wrong**
{plain-English problem statement from why_it_matters}
**Proposed fix**
{suggested_fix — rendered per the substitution rules below: prose-first, intent-language}
**Why it works**
{short reasoning, grounded in a pattern cited in the document or codebase when available}
{Conflict-context line, when applicable — see below}
```
Substitutions:
- **`{plain-English title}`** — a 3–8 word summary suitable as a heading. Derived from the merged finding's `title` field but rephrased so it reads as observable consequence (e.g., "Implementers will pick different tiers" rather than "Section X-Y lists four tiers"). For document-review findings, observable consequence is the *effect on a reader, implementer, or downstream decision*, not runtime behavior.
- **`{section}`** — from the finding's `section` field.
- **`why_it_matters`** — from the merged finding's `why_it_matters` field. Rendered as-is; the subagent template's framing guidance ensures it's already observable-consequence-first.
- **`suggested_fix`** — from the merged finding's `suggested_fix` field. Render as prose describing intent, not as raw markup. The user's job is to trust or reject the action — they don't need to review exact text. Rules:
- **Default — one sentence describing the effect.** What does the fix achieve, and where does it live? Prefer intent language over quoted text.
- Good: `Drop the Advisory tier from the enum; advisory-style findings surface in an FYI subsection at the presentation layer.`
- Good: `Add a deployment-ordering constraint requiring Units 3 and 4 in a single commit.`
- Bad: `Change "autofix_class: [auto, gated_auto, advisory, present]" to "autofix_class: [safe_auto, gated_auto, manual]" in findings-schema.json on line 48.` — too syntax-focused for a decision loop
- **Code-span budget** — at most 2 inline backtick spans per sentence, each a single identifier, flag, or short phrase (e.g., `` `safe_auto` ``, `` `<work-context>` ``). Always leave a space before and after each backtick span.
- **Raw code blocks** — only for short (≤5-line) genuinely additive content where no before-state exists. Above 5 lines, switch to a summary.
- **No diff blocks.** Document mutations render as prose.
- **`Why it works`** — grounded reasoning that, where possible, references a similar pattern already used in the document or codebase. One to three sentences.
- **Conflict-context line (when applicable)** — when contributing personas implied different actions for this finding and synthesis step 3.5b broke the tie, surface that briefly. Example: `Coherence recommends Apply; scope-guardian recommends Skip. Agent's recommendation: Skip.` The orchestrator's recommendation — the post-tie-break value — is what the menu labels "recommended."
### Question stem (short, decision-focused)
After the terminal block renders, fire the platform's blocking question tool with a compact two-line stem:
```
Finding {N} of {M} — {severity} {short handle}.
{Action framing in a phrase}?
```
Where:
- **Short handle** matches the `{plain-English title}` from the terminal block heading.
- **Action framing** — one phrase describing what the single recommended action does, as a yes/no question. Examples: `Apply the rename?`, `Defer to Open Questions since the tradeoff is genuine?`, `Skip since the document already resolves this elsewhere?`.
Never enumerate alternatives in the stem. One recommendation as a yes/no — the option list carries the alternatives. When the recommendation is close, surface the disagreement in the conflict-context line, not as a multi-option stem.
### Confirmation between findings
After the user answers and before printing the next finding's terminal block, emit a one-line confirmation of the action taken. Examples: `→ Applied. Edit staged at "Scope Boundaries" section.`, `→ Deferred. Entry appended to "## Deferred / Open Questions".`, `→ Skipped.`
### Options (four; adapted as noted)
Fixed order. Never reorder:
```
A. Apply the proposed fix
B. Defer — append to the doc's Open Questions section
C. Skip — don't apply, don't append
D. LFG the rest — apply the agent's best judgment to this and remaining findings
```
**Mark the post-tie-break recommendation with `(recommended)` on its option label.** Required, not optional. Only A, B, or C can carry it — synthesis emits `recommended_action` as Apply/Defer/Skip, which maps to A/B/C. D (`LFG the rest`) is a workflow shortcut for bulk execution across remaining findings, not a finding-level resolution action, so it is never marked `(recommended)`.
```
A. Apply the proposed fix (recommended)
B. Defer — append to the doc's Open Questions section
C. Skip — don't apply, don't append
D. LFG the rest
```
When reviewers disagreed or evidence cuts against the default, still mark one option — whichever synthesis produced — and surface the disagreement in the conflict-context line.
### Adaptations
- **N=1 (exactly one pending finding):** the terminal block's heading omits `Finding N of M` and renders as `## {severity} {plain-English title}`. The stem's first line drops the position counter, becoming `{severity} {short handle}.` Option D (`LFG the rest`) is suppressed because no subsequent findings exist — the menu shows three options: Apply / Defer / Skip.
- **Open-Questions append unavailable** (read-only document, write-failed): when `references/open-questions-defer.md` reports the in-doc append mechanic cannot run, option B is omitted. The stem appends one line explaining why (e.g., `Defer unavailable — document is read-only in this environment.`). The menu shows three options: Apply / Skip / LFG the rest. Before rendering options, remap any per-finding `Defer` recommendation from synthesis to `Skip` so the `(recommended)` marker lands on an option that's actually in the menu. Surface the remap on the conflict-context line (e.g., `Synthesis recommended Defer; downgraded to Skip — document is read-only.`).
- **Combined N=1 + no-append:** the menu shows two options: Apply / Skip.
Only when `ToolSearch` explicitly returns no match or the tool call errors — or on a platform with no blocking question tool — fall back to presenting the options as a numbered list and waiting for the user's next reply.
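The adaptations compose mechanically. A sketch; `recommended_action` is the synthesis 3.5b value and `n_pending` counts findings not yet decided:
```python
def build_options(finding, n_pending, append_available):
    opts = {
        "A": "Apply the proposed fix",
        "B": "Defer — append to the doc's Open Questions section",
        "C": "Skip — don't apply, don't append",
        "D": "LFG the rest — apply the agent's best judgment to this and remaining findings",
    }
    rec = finding["recommended_action"]      # Apply / Defer / Skip
    if not append_available:
        del opts["B"]                        # Defer unavailable (e.g., read-only doc)
        if rec == "Defer":
            rec = "Skip"                     # remap so (recommended) lands on a live option
    if n_pending == 1:
        del opts["D"]                        # nothing left to LFG
    opts[{"Apply": "A", "Defer": "B", "Skip": "C"}[rec]] += " (recommended)"
    return opts
```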
---
## Per-finding routing
For each finding's answer:
- **Apply the proposed fix** — add the finding's id to an in-memory Apply set. Advance to the next finding. Do not edit the document inline — Apply accumulates for end-of-walk-through batch execution. **No-fix guard:** if the merged finding has no `suggested_fix` (possible on `manual` findings where the persona flagged the issue as observation without a concrete resolution), Apply is not executable. Do not add the finding to the Apply set. Instead, surface the no-fix sub-question described below before advancing.
- **Defer — append to Open Questions section** — invoke the append flow from `references/open-questions-defer.md`. The walk-through's position indicator stays on the current finding during any failure-path sub-question (Retry / Fall back / Convert to Skip). On success, record the append location and reference in the in-memory decision list and advance. On conversion-to-Skip from the failure path, advance with the failure noted in the completion report.
- **Skip — don't apply, don't append** — record Skip in the in-memory decision list. Advance. No side effects.
- **LFG the rest — apply the agent's best judgment** — exit the walk-through loop. Dispatch the bulk preview from `references/bulk-preview.md`, scoped to the current finding and everything not yet decided. The preview header reports the count of already-decided findings ("K already decided"). If the user picks Cancel from the preview, return to the current finding's per-finding question (not to the routing question). If the user picks Proceed, execute the plan per `references/bulk-preview.md` — Apply findings join the in-memory Apply set with the ones the user already picked, Defer findings route through `references/open-questions-defer.md`, Skip is no-op — then proceed to end-of-walk-through execution.
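A sketch of the routing with the no-fix guard short-circuiting Apply; the append helper again stands in for the defer flow:
```python
def handle_answer(finding, answer, state, append_to_open_questions):
    if answer == "apply":
        if not finding.get("suggested_fix"):
            return "no_fix_subquestion"              # Apply isn't executable; see below
        state.apply_set.add(finding["id"])           # batched, no inline edit
        state.decisions.append((finding["id"], "apply", None))
    elif answer == "defer":
        location = append_to_open_questions(finding) # inline side effect; may fail
        state.decisions.append((finding["id"], "defer", location))
    elif answer == "skip":
        state.decisions.append((finding["id"], "skip", None))
    elif answer == "lfg_rest":
        return "bulk_preview"                        # exit the loop via bulk preview
    return "advance"
```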
### No-fix sub-question (Apply picked on a finding with no `suggested_fix`)
Synthesis step 3.5b demotes the default recommendation from Apply to Defer for any merged finding without a `suggested_fix`, so `(recommended)` never lands on Apply for these findings. But the menu still lets the user pick Apply manually. When that happens, do not add the finding to the Apply set — the execution pass has no edit payload to apply, which would either fail the batch or record a misleading "applied" outcome.
Fire a blocking sub-question using the platform's question tool. The stem explains why Apply is not executable in one line, then offers three self-contained options. Position indicator stays on the current finding while the sub-question is open.
**Stem:** `Apply isn't executable for this finding — the review surfaced the issue without a concrete fix. How should the agent proceed?`
**Options (fixed order):**
```
A. Defer to Open Questions (recommended)
B. Skip — don't apply, don't append
C. Acknowledge without applying — record the decision, no document edit
```
**Routing:**
- **A. Defer to Open Questions** — invoke the append flow from `references/open-questions-defer.md` as though the user had originally picked Defer. Failure-path handling is identical (Retry / Fall back / Convert to Skip). On success, record the append location in the decision list (annotated `redirected from Apply — no suggested_fix`) and advance.
- **B. Skip** — record Skip in the decision list (annotated `redirected from Apply — no suggested_fix`). Advance. No side effects.
- **C. Acknowledge without applying** — record the finding in the decision list as `acknowledged` (annotated `Apply picked but no suggested_fix — no edit dispatched`). Do not add to the Apply set. Advance. The completion report surfaces Acknowledged as its own dedicated bucket with its own count, its own per-finding action label, and its own position in the report ordering (`Applied / Deferred / Skipped / Acknowledged`) — see "Minimum required fields" and "Report ordering" in the unified completion report section below for the full contract. The acknowledgement reason is surfaced on each per-finding line. For round-to-round suppression (distinct from report display), Acknowledged decisions carry forward in the multi-round decision primer as a rejected-class decision alongside Skip and Defer so round-N+1 synthesis suppresses re-raises via R29 — semantically the user saw the finding, chose not to act, and wants it recorded, which is equivalent to Skip for suppression purposes but remains its own bucket in the report.
**Availability adaptation.** When `references/open-questions-defer.md` has cached `append_available: false` for the session, omit option A and surface one line in the stem explaining why (e.g., `Defer unavailable — document is read-only in this environment.`). The menu becomes Skip / Acknowledge without applying, with Skip labeled `(recommended)`.
**Cascading roots.** When the finding is a root with dependents and the user picks A (Defer) or B (Skip) from this sub-question, run the cascade announcement in "Cascading root decisions" above — treat the sub-question's choice as the root's effective action. Option C (Acknowledge) does not cascade; the root is recorded as acknowledged and dependents each get their own walk-through entry.
---
## Override rule
"Override" means the user picks a different preset action (Defer or Skip in place of Apply, or Apply in place of the agent's recommendation). No inline freeform custom-fix authoring — the walk-through is a decision loop, not a pair-editing surface. A user who wants a variant of the proposed fix picks Skip and hand-edits outside the flow; if they also want the finding tracked, they can Defer first and edit afterward.
---
## State
Walk-through state is **in-memory only**. The orchestrator maintains:
- An Apply set (finding ids the user picked Apply on)
- A decision list (every answered finding with its action and any metadata like `append_location` for Deferred or `reason` for Skipped)
- The current position in the findings list
Nothing is written to disk per-decision except the in-doc Open Questions appends (which are external side effects — those cannot be rolled back). An interrupted walk-through (user cancels the prompt, session compacts, network dies) discards all in-memory state. Apply decisions have not been dispatched yet (they batch at end-of-walk-through), so they are cleanly lost with no document changes.
Cross-session persistence is out of scope. Mirrors `ce-code-review`'s walk-through state rules.
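As a sketch, the whole state fits in one small structure; nothing here survives an interruption:
```python
from dataclasses import dataclass, field

@dataclass
class WalkthroughState:
    apply_set: set = field(default_factory=set)     # finding ids picked Apply
    decisions: list = field(default_factory=list)   # (id, action, metadata) tuples
    position: int = 0                               # index into the findings list
```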
---
## End-of-walk-through execution
After the loop terminates — either every finding has been answered, or the user took `LFG the rest → Proceed` — the walk-through hands off to the execution phase:
1. **Apply set:** in a single pass, the orchestrator applies every accumulated Apply-set finding's `suggested_fix` to the document. Document edits happen inline via the platform's edit tool — ce-doc-review has no batch-fixer subagent (per scope boundary); the orchestrator performs the edits directly, since `gated_auto` and `manual` fixes for documents are single-file markdown changes with no cross-file dependencies. **Defensive no-fix check:** before dispatching the edit for each Apply-set entry, verify the merged finding carries a `suggested_fix`. If it does not (the decision-time no-fix guard in "Per-finding routing" should prevent this, but treat it as a defensive fallback), skip the edit, record the finding in the completion report's failure section with reason `Apply skipped — no suggested_fix available`, and continue the batch. Do not fail the entire pass because one Apply-set entry lacks a fix.
2. **Defer set:** already executed inline during the walk-through via `references/open-questions-defer.md`. Nothing to dispatch here.
3. **Skip:** no-op.
After execution completes (or after `LFG the rest → Cancel` followed by the user working through remaining findings one at a time, or after the loop runs to completion), emit the unified completion report described below.
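A sketch of the batch pass with the defensive check; `apply_edit` is a hypothetical wrapper over the platform's edit tool, and per-entry failures accumulate rather than aborting:
```python
def execute_apply_set(apply_set, findings_by_id, apply_edit):
    failures = []
    for fid in apply_set:
        fix = findings_by_id[fid].get("suggested_fix")
        if not fix:                         # decision-time guard should prevent this
            failures.append((fid, "Apply skipped — no suggested_fix available"))
            continue
        try:
            apply_edit(fix)                 # inline document edit, one finding at a time
        except Exception as exc:            # surface in the report, keep going
            failures.append((fid, str(exc)))
    return failures                         # rendered above the per-finding list
```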
---
## Unified completion report
Every terminal path of Interactive mode emits the same completion report structure. This covers:
- Walk-through completed (all findings answered)
- Walk-through bailed via `LFG the rest → Proceed`
- Top-level LFG (routing option B) completed
- Top-level Append-to-Open-Questions (routing option C) completed
- Zero findings after `safe_auto` (routing question was skipped — the completion summary is a one-line degenerate case of this structure)
### Minimum required fields
- **Per-finding entries:** for every finding the flow touched, a line with — at minimum — title, severity, the action taken (Applied / Deferred / Skipped / Acknowledged), the append location for Deferred entries, a one-line reason for Skipped entries (grounded in the finding's confidence or the one-line `why_it_matters` snippet), and the acknowledgement reason for Acknowledged entries (e.g., `Apply picked but no suggested_fix available`).
- **Summary counts by action:** totals per bucket (e.g., `4 applied, 2 deferred, 2 skipped`). Include an `acknowledged` count when any entries land in that bucket; omit the label when the count is zero.
- **Failures called out explicitly:** any Apply that failed (e.g., document write error, or the defensive no-fix fallback skipping an Apply-set entry), any Open-Questions append that failed. Failures surface above the per-finding list so they are not missed.
- **End-of-review verdict:** carried over from Phase 4's Coverage section.
### Report ordering
Failures first (above the per-finding list), then per-finding entries grouped by action bucket in the order `Applied / Deferred / Skipped / Acknowledged`, then summary counts, then Coverage (FYI observations, residual concerns), then the verdict. Omit any bucket whose count is zero.
### Zero-findings degenerate case
When the routing question was skipped because no `gated_auto` / above-gate `manual` findings remained after `safe_auto`, the completion report collapses to its summary-counts + verdict form with one added line — the count of `safe_auto` fixes applied. The summary wording:
No FYI or residual concerns:
```
All findings resolved — 3 fixes applied.
Verdict: Ready.
```
FYI or residual concerns remain:
```
All actionable findings resolved — 3 fixes applied. (2 FYI observations, 1 residual concern remain in the report.)
Verdict: Ready.
```
---
## Execution posture
The walk-through is operationally read-only with respect to the project except for three permitted writes: the in-memory Apply set / decision list (managed by the orchestrator), the in-doc Open Questions appends (external side effects managed by `references/open-questions-defer.md`), and the end-of-walk-through batch document edits (the orchestrator's final Apply pass). Persona agents remain strictly read-only. Unlike `ce-code-review`, there is no fixer subagent — the orchestrator owns the document edit directly.