feat(ce-ideate): subject gate, surprise-me, and warrant contract (#671)

Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-24 02:04:21 -07:00
parent 494313e8eb
commit 6514b1fce5
8 changed files with 162 additions and 58 deletions
--- a/plugins/compound-engineering/skills/ce-ideate/references/post-ideation-workflow.md
+++ b/plugins/compound-engineering/skills/ce-ideate/references/post-ideation-workflow.md
@@ -18,8 +18,11 @@ Rejection criteria:
 - too expensive relative to likely value
 - already covered by existing workflows or docs
 - interesting but better handled as a brainstorm variant, not a product improvement
+- **unjustified — no articulated warrant** (sub-agent failed to provide `direct:`, `external:`, or `reasoned:` justification, or the stated warrant does not actually support the claimed move)
+- **below ambition floor** (fails the meeting-test: would not warrant team discussion — except when Phase 0.5 detected tactical focus signals, in which case this criterion is waived)
+- **subject-replacement** (abandons or replaces the subject of ideation rather than operating on it — e.g., "pivot to an unrelated domain," "become a different organization")

-Score survivors using a consistent rubric weighing: groundedness in stated context, expected value, novelty, pragmatism, leverage on future work, implementation burden, and overlap with stronger ideas.
+Score survivors using a consistent rubric weighing: groundedness in stated context, **warrant strength** (`direct:` > `external:` > `reasoned:`; none excluded, but direct-evidence ideas score higher all else equal), expected value, novelty, pragmatism, leverage on future work, implementation burden, and overlap with stronger ideas.

 Target output:
 - keep 5-7 survivors by default
@@ -36,7 +39,8 @@ Present only the surviving ideas in structured form:

 - title
 - description
- rationale
+- **warrant** (tagged `direct:` / `external:` / `reasoned:`, with the quoted evidence, cited source, or written-out argument)
+- rationale (how the warrant connects to the move's significance)
 - downsides
 - confidence score
 - estimated complexity
@@ -89,7 +93,8 @@ mode: <repo-grounded | elsewhere-software | elsewhere-non-software>

 ### 1. <Idea Title>
 **Description:** [Concrete explanation]
-**Rationale:** [Why this idea is strong in the stated context]
+**Warrant:** [`direct:` / `external:` / `reasoned:` — the actual basis, quoted or cited]
+**Rationale:** [How the warrant connects to the move's significance]
 **Downsides:** [Tradeoffs or costs]
 **Confidence:** [0-100%]
 **Complexity:** [Low / Medium / High]
@@ -221,7 +226,10 @@ After the fallback completes (any path), continue back to the Phase 6 menu so th

 Before finishing, check:

- the idea set is grounded in the stated context (codebase in repo mode; user-supplied topic in elsewhere mode)
+- the idea set is grounded in the stated context (codebase in repo mode; user-supplied context in elsewhere mode)
+- **every surviving idea has articulated warrant** (`direct:`, `external:`, or `reasoned:`) that actually supports the claimed move — speculation dressed as ambition was rejected, with reasons
+- **every surviving idea passes the meeting-test** unless Phase 0.5 detected tactical focus signals that waived the floor
+- **no surviving idea replaces the subject** rather than operating on it
 - the candidate list was generated before filtering
 - the original many-ideas -> critique -> survivors mechanism was preserved
 - if sub-agents were used, they improved diversity without replacing the core workflow
--- a/plugins/compound-engineering/skills/ce-ideate/references/universal-ideation.md
+++ b/plugins/compound-engineering/skills/ce-ideate/references/universal-ideation.md
@@ -22,7 +22,7 @@ Match depth to scope:
 - **Standard** — light intake (one or two questions), one round of generation, adversarial critique, present 5-7 survivors.
 - **Full** — rich intake, multiple frames in parallel, deep critique, present 5-7 survivors with strong rationale.

-Apply the discrimination test before asking anything. Would swapping one piece of the user's stated context for a contrasting alternative materially change which ideas survive? If yes, the context is load-bearing — proceed. If no, ask 1-3 narrowly chosen questions, building on what the user already provided rather than starting from a template. After each answer, re-apply the test before asking another. Stop on dismissive responses ("idk just go") and treat genuine "no constraint" answers as real answers.
+Apply the discrimination test before asking anything. Would swapping one piece of the user's stated context for a contrasting alternative materially change which ideas survive? If yes, the context is load-bearing — proceed. If no, ask 1-3 narrowly chosen questions. Follow the questioning principles from SKILL.md Phase 0.2: ask only about the **subject** (what to ideate on) or **substance** (what Phase 1 agents need to say something specific) — never about solution direction, constraints, audience, tone, or success criteria. Those belong to `ce-brainstorm`. Build on what the user already provided rather than starting from a template. After each answer, re-apply the test before asking another. Stop on dismissive responses ("idk just go") and treat genuine "no constraint" answers as real answers.

 **Grounding freshness.** Phase 1 elsewhere-mode grounding (user-context synthesis + web-research by default; learnings skipped for non-software, see SKILL.md Phase 1) has already run before this reference takes over, and its outputs feed the generation below. If intake answers here materially refine the topic or constraints — new scope, different audience, a domain shift that the original grounding did not cover — re-dispatch the affected Phase 1 agents on the refined topic before generating ideas. The guardrail mirrors SKILL.md Phase 0.4's rule that mode and grounding re-evaluate when intake changes the scope to be acted on; ranking against stale grounding risks surfacing ideas fit to the wrong topic.

@@ -39,17 +39,28 @@ Generate the full candidate list before critiquing any idea. Use the same six fr
 - **Cross-domain analogy** — how do completely different fields solve a structurally similar problem? The grounding domain is the user's topic; the analogy domain is anywhere else (other industries, biology, games, infrastructure, history). Push past the obvious analogy to non-obvious ones.
 - **Constraint-flipping** — invert the obvious constraint to its opposite or extreme. What if the budget were 10x or 0? What if there were one constraint instead of ten, or ten instead of one? Use the resulting design as a candidate even if the flip itself is not realistic.

-Aim for 5-8 ideas per frame. After generating, merge and dedupe; scan for cross-cutting combinations (3-5 additions at most).
+Aim for 5-8 ideas per frame. After generating, merge and dedupe; scan for cross-cutting combinations (3-5 additions at most; more in surprise-me mode, where different frames often discover different subjects and combinations are the magic layer).
+
+**Per-idea output contract (mirrors SKILL.md Phase 2):** each idea carries title, summary, **warrant** (required, tagged `direct:` quoted evidence / `external:` named prior art or domain research / `reasoned:` written-out first-principles argument), why-it-matters connecting the warrant to the move's significance, and a one-line meeting-test self-check (waived when tactical focus signals were detected in Phase 0.5). Warrant is required, not optional — unjustified speculation does not surface.
+
+**Generation rules:**
+
+- Every idea carries articulated warrant. The failure mode to prevent is plausible-sounding speculation that lacks any basis the user can verify.
+- Bias toward the warrant type your frame naturally produces — pain/inversion/leverage tend toward `direct:`; analogy and constraint-flipping tend toward `reasoned:` — but don't exclude other types. When a frame produces reasoned warrant, write the argument out, don't gesture at it.
+- Apply the meeting-test as a default floor: would this idea warrant the equivalent of team discussion (or whatever maps to "worth talking through" in this topic's native domain)? If not, it's below the floor and does not surface. The floor is relaxed only when Phase 0.5 detected tactical focus signals.
+- Stay within the subject's identity. Expansions, new surfaces, new directions, retirements are fair game when warrant supports them. Subject-replacement moves (abandoning the subject, pivoting to an unrelated domain) are out regardless of warrant.
+
+**Surprise-me mode in this reference.** When Phase 0.2 routed to surprise-me, there is no user-specified subject. Through each frame's lens, explore the Phase 1 grounding (user-context synthesis + web research) and identify the subject(s) you find most interesting for that lens. Different frames finding different subjects is the feature. Warrant may include identification of the subject itself — why this subject is worth ideating on through this lens, citing what in the Phase 1 material signals it.

 ## How to converge

-Apply adversarial critique. For each candidate, write a one-line reason if rejected. Score survivors using a consistent rubric weighing: groundedness in stated context, expected value, novelty, pragmatism, leverage, implementation burden, and overlap with stronger candidates.
+Apply adversarial critique. For each candidate, write a one-line reason if rejected. **Warrant-integrity check:** reject any idea lacking articulated warrant, any idea whose stated warrant does not actually support the claimed move (speculation dressed as ambition), and any idea that replaces the subject rather than operating on it. Score survivors using a consistent rubric weighing: groundedness in stated context, **warrant strength** (`direct:` > `external:` > `reasoned:`; none excluded, but direct-evidence ideas score higher all else equal), expected value, novelty, pragmatism, leverage, implementation burden, and overlap with stronger candidates.

 Target 5-7 survivors by default. If too many survive, run a second stricter pass. If fewer than five survive, report that honestly rather than lowering the bar.

 ## When to wrap up

-Present survivors before any persistence. For each: title, description, rationale, downsides, confidence, complexity. Then a brief rejection summary so the user can see what was considered and cut.
+Present survivors before any persistence. For each: title, description, **warrant** (tagged `direct:` / `external:` / `reasoned:`, with the quoted evidence, cited source, or written-out argument), rationale (how the warrant connects to the move's significance), downsides, confidence, complexity. Then a brief rejection summary so the user can see what was considered and cut.

 Persistence is opt-in. The terminal review loop is a complete ideation cycle. Refinement happens in conversation with no file or network cost. Persistence triggers only when the user explicitly chooses to save, share, or hand off.

--- a/plugins/compound-engineering/skills/ce-ideate/references/web-research-cache.md
+++ b/plugins/compound-engineering/skills/ce-ideate/references/web-research-cache.md
@@ -18,14 +18,14 @@ Read this when checking the V15 cache before dispatching `web-researcher`, or wh
 ]
 ```

-Files live under `<scratch-dir>/web-research-cache.json`, where `<scratch-dir>` is the absolute OS-temp path resolved once in SKILL.md Phase 1 (`"${TMPDIR:-/tmp}/compound-engineering/ce-ideate/<run-id>"`). Do not pass the unresolved `${TMPDIR:-/tmp}` string to non-shell tools; always use the absolute path captured in Phase 1.
+Files live under `<scratch-dir>/web-research-cache.json`, where `<scratch-dir>` is `/tmp/compound-engineering/ce-ideate/<run-id>`, resolved once in SKILL.md Phase 1.

 ## Reuse check

 Before dispatching `web-researcher`, resolve the scratch root (the parent of `<scratch-dir>`) in bash and list sibling run-id directories — refinement loops within a session may legitimately reuse another run's cache by topic, not run-id:

 ```bash
-SCRATCH_ROOT="${TMPDIR:-/tmp}/compound-engineering/ce-ideate"
+SCRATCH_ROOT="/tmp/compound-engineering/ce-ideate"
 find "$SCRATCH_ROOT" -maxdepth 2 -name 'web-research-cache.json' -type f 2>/dev/null
 ```