feat(doc-review, learnings-researcher): tiers, chain grouping, rewrite (#601)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-19 20:25:47 -07:00
parent 409b07fbc7
commit c1f68d4d55
39 changed files with 3142 additions and 290 deletions
--- a/plugins/compound-engineering/skills/ce-doc-review/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-doc-review/SKILL.md
@@ -6,28 +6,34 @@ argument-hint: "[mode:headless] [path/to/document.md]"

 # Document Review

-Review requirements or plan documents through multi-persona analysis. Dispatches specialized reviewer agents in parallel, auto-fixes quality issues, and presents strategic questions for user decision.
+Review requirements or plan documents through multi-persona analysis. Dispatches specialized reviewer agents in parallel, auto-applies `safe_auto` fixes, and routes remaining findings through a four-option interaction (per-finding walk-through, LFG, Append-to-Open-Questions, Report-only) for user decision.
+
+## Interactive mode rules
+
+- **Pre-load the platform question tool before any question fires.** In Claude Code, `AskUserQuestion` is a deferred tool — its schema is not available at session start. At the start of Interactive-mode work (before the routing question, per-finding walk-through questions, bulk-preview Proceed/Cancel, and Phase 5 terminal question), call `ToolSearch` with query `select:AskUserQuestion` to load the schema. Load it once, eagerly, at the top of the Interactive flow — do not wait for the first question site. On Codex (`request_user_input`) and Gemini (`ask_user`) this step is not required; the tools are loaded by default.
+- **The numbered-list fallback only applies on confirmed load failure.** Presenting options as a numbered list and waiting for the user's reply is valid only when `ToolSearch` returns no match or the tool call explicitly fails. Rendering a question as narrative text because the tool feels inconvenient, because the model is in report-formatting mode, or because the instruction was buried in a long skill is a bug. A question that calls for a user decision must either fire the tool or fail loudly.

 ## Phase 0: Detect Mode

-Check the skill arguments for `mode:headless`. Arguments may contain a document path, `mode:headless`, or both. Tokens starting with `mode:` are flags, not file paths -- strip them from the arguments and use the remaining token (if any) as the document path for Phase 1.
+Check the skill arguments for `mode:headless`. Arguments may contain a document path, `mode:headless`, or both. Tokens starting with `mode:` are flags, not file paths — strip them from the arguments and use the remaining token (if any) as the document path for Phase 1.

 If `mode:headless` is present, set **headless mode** for the rest of the workflow.

-**Headless mode** changes the interaction model, not the classification boundaries. Document-review still applies the same judgment about what has one clear correct fix vs. what needs user judgment. The only difference is how non-auto findings are delivered:
- `auto` fixes are applied silently (same as interactive)
- `present` findings are returned as structured text for the caller to handle -- no AskUserQuestion prompts, no interactive approval
- Phase 5 returns immediately with "Review complete" (no refine/complete question)
+**Headless mode** changes the interaction model, not the classification boundaries. ce-doc-review still applies the same judgment about which tier each finding belongs in. The only difference is how non-safe_auto findings are delivered:
+
+- `safe_auto` fixes are applied silently (same as interactive)
+- `gated_auto`, `manual`, and FYI findings are returned as structured text for the caller to handle — no AskUserQuestion prompts, no interactive routing
+- Phase 5 returns immediately with "Review complete" (no routing question, no terminal question)

 The caller receives findings with their original classifications intact and decides what to do with them.

 Callers invoke headless mode by including `mode:headless` in the skill arguments, e.g.:
+
 ```
 Skill("ce-doc-review", "mode:headless docs/plans/my-plan.md")
 ```

-
-If `mode:headless` is not present, the skill runs in its default interactive mode with no behavior change.
+If `mode:headless` is not present, the skill runs in its default interactive mode with the routing question, walk-through, and bulk-preview behaviors documented in `references/walkthrough.md` and `references/bulk-preview.md`.

 ## Phase 1: Get and Analyze Document

@@ -124,8 +130,46 @@ Dispatch all agents in **parallel** using the platform's task/agent tool (e.g.,
 | `{document_type}` | "requirements" or "plan" from Phase 1 classification |
 | `{document_path}` | Path to the document |
 | `{document_content}` | Full text of the document |
+| `{decision_primer}` | Cumulative prior-round decisions in the current session, or an empty `<prior-decisions>` block on round 1. See "Decision primer" below. |

-Pass each agent the **full document** -- do not split into sections.
+Pass each agent the **full document** — do not split into sections.
+
+### Decision primer
+
+On round 1 (no prior decisions), set `{decision_primer}` to:
+
+```
+<prior-decisions>
+Round 1 — no prior decisions.
+</prior-decisions>
+```
+
+On round 2+ (after one or more prior rounds in the current interactive session), accumulate prior-round decisions and render them as:
+
+```
+<prior-decisions>
+Round 1 — applied (N entries):
+- {section}: "{title}" ({reviewer}, {confidence})
+  Evidence: "{evidence_snippet}"
+
+Round 1 — rejected (M entries):
+- {section}: "{title}" — Skipped because {reason}
+  Evidence: "{evidence_snippet}"
+- {section}: "{title}" — Deferred to Open Questions because {reason or "no reason provided"}
+  Evidence: "{evidence_snippet}"
+- {section}: "{title}" — Acknowledged without applying because {reason or "no suggested_fix — user acknowledged"}
+  Evidence: "{evidence_snippet}"
+
+Round 2 — applied (N entries):
+...
+</prior-decisions>
+```
+
+Each entry carries an `Evidence:` line because synthesis R29 (rejected-finding suppression) and R30 (fix-landed verification) both use an evidence-substring overlap check as part of their matching predicate — without the evidence snippet in the primer, the orchestrator cannot compute the `>50%` overlap test and has to fall back to fingerprint-only matching, which either re-surfaces rejected findings or suppresses too aggressively. The `{evidence_snippet}` is the first evidence quote from the finding, truncated to the first ~120 characters (preserving whole words at the boundary) and with internal quotes escaped. If a finding has multiple evidence entries, use the first one; the rest live in the run artifact and are not needed for the overlap check.
+
+Accumulate across all rounds in the current session. Skip, Defer, and Acknowledge actions all count as "rejected" for suppression purposes — each signals the user decided the finding wasn't worth actioning this round (Acknowledge is the no-fix-guard variant: the user saw a finding with no `suggested_fix`, chose not to defer or skip explicitly, and recorded acknowledgement instead; for round-to-round suppression that is semantically equivalent to Skip). Applied findings stay on the applied list so round-N+1 personas can verify fixes landed (see R30 in `references/synthesis-and-presentation.md`).
+
+Cross-session persistence is out of scope. A new invocation of ce-doc-review on the same document starts with a fresh round 1 and no carried primer, even if prior sessions deferred findings into the document's Open Questions section.

 **Error handling:** If an agent fails or times out, proceed with findings from agents that completed. Note the failed agent in the Coverage section. Do not block the entire review on a single agent failure.

@@ -133,7 +177,9 @@ Pass each agent the **full document** -- do not split into sections.

 ## Phases 3-5: Synthesis, Presentation, and Next Action

-After all dispatched agents return, read `references/synthesis-and-presentation.md` for the synthesis pipeline (validate, gate, dedup, promote, resolve contradictions, route by autofix class), auto-fix application, finding presentation, and next-action menu. Do not load this file before agent dispatch completes.
+After all dispatched agents return, read `references/synthesis-and-presentation.md` for the synthesis pipeline (validate, per-severity gate, dedup, cross-persona agreement boost, resolve contradictions, auto-promotion, route by three tiers with FYI subsection), `safe_auto` fix application, headless-envelope output, and the handoff to the routing question.
+
+For the four-option routing question and per-finding walk-through (interactive mode), read `references/walkthrough.md`. For the bulk-action preview used by LFG, Append-to-Open-Questions, and walk-through `LFG-the-rest`, read `references/bulk-preview.md`. Do not load these files before agent dispatch completes.

 ---