feat: promote ce:plan-beta and deepen-plan-beta to stable (#355)

2026-03-24 10:18:14 -07:00
parent 65e5621dbe
commit 169996a75e
8 changed files with 972 additions and 2033 deletions
--- a/README.md
+++ b/README.md
@@ -201,8 +201,6 @@ The `/ce:ideate` skill proactively surfaces strong improvement ideas, and `/ce:b

 Each cycle compounds: brainstorms sharpen plans, plans inform future plans, reviews catch more issues, patterns get documented.

-> **Beta:** Experimental versions of `/ce:plan` and `/deepen-plan` are available as `/ce:plan-beta` and `/deepen-plan-beta`. See the [plugin README](plugins/compound-engineering/README.md#beta-skills) for details.
-
 ## Philosophy

 **Each unit of engineering work should make subsequent units easier—not harder.**
--- a/docs/plans/2026-03-23-001-feat-promote-plan-beta-skills-to-stable-plan.md
+++ b/docs/plans/2026-03-23-001-feat-promote-plan-beta-skills-to-stable-plan.md
@@ -0,0 +1,132 @@
+---
+title: "feat: promote ce:plan-beta and deepen-plan-beta to stable"
+type: feat
+status: completed
+date: 2026-03-23
+---
+
+# Promote ce:plan-beta and deepen-plan-beta to stable
+
+## Overview
+
+Replace the stable `ce:plan` and `deepen-plan` skills with their validated beta counterparts, following the documented 9-step promotion path from `docs/solutions/skill-design/beta-skills-framework.md`.
+
+## Problem Statement
+
+The beta versions of `ce:plan` and `deepen-plan` have been tested and are ready for promotion. They currently sit alongside the stable versions as separate skill directories with `disable-model-invocation: true`, meaning users must invoke them manually. Promotion makes them the default for all workflows including `lfg`/`slfg` orchestration.
+
+## Proposed Solution
+
+Follow the beta-skills-framework promotion checklist exactly, applied to both skill pairs simultaneously.
+
+## Implementation Plan
+
+### Phase 1: Replace stable SKILL.md content with beta content
+
+**Files to modify:**
+
+1. **`skills/ce-plan/SKILL.md`** -- Replace entire content with `skills/ce-plan-beta/SKILL.md`
+2. **`skills/deepen-plan/SKILL.md`** -- Replace entire content with `skills/deepen-plan-beta/SKILL.md`
+
+### Phase 2: Restore stable frontmatter and remove beta markers
+
+**In promoted `skills/ce-plan/SKILL.md`:**
+
+- Change `name: ce:plan-beta` to `name: ce:plan`
+- Remove `[BETA] ` prefix from description
+- Remove `disable-model-invocation: true` line
+
+**In promoted `skills/deepen-plan/SKILL.md`:**
+
+- Change `name: deepen-plan-beta` to `name: deepen-plan`
+- Remove `[BETA] ` prefix from description
+- Remove `disable-model-invocation: true` line
+
+### Phase 3: Update all internal references from beta to stable names
+
+**In promoted `skills/ce-plan/SKILL.md`:**
+
+- All references to `/deepen-plan-beta` become `/deepen-plan`
+- All references to `ce:plan-beta` become `ce:plan` (in headings, prose, etc.)
+- All references to `-beta-plan.md` file suffix become `-plan.md`
+- Example filenames using `-beta-plan.md` become `-plan.md`
+
+**In promoted `skills/deepen-plan/SKILL.md`:**
+
+- All references to `ce:plan-beta` become `ce:plan`
+- All references to `deepen-plan-beta` become `deepen-plan`
+- Scratch directory paths: `deepen-plan-beta` becomes `deepen-plan`
+
+### Phase 4: Clean up ce-work-beta cross-reference
+
+**In `skills/ce-work-beta/SKILL.md` (line 450):**
+
+- Remove `ce:plan-beta or ` from the text so it reads just `ce:plan`
+
+### Phase 5: Delete beta skill directories
+
+- Delete `skills/ce-plan-beta/` directory entirely
+- Delete `skills/deepen-plan-beta/` directory entirely
+
+### Phase 6: Update README.md
+
+**In `plugins/compound-engineering/README.md`:**
+
+1. **Update `ce:plan` description** in the Workflow Commands table (line 81): Change from `Create implementation plans` to `Transform features into structured implementation plans grounded in repo patterns`
+2. **Update `deepen-plan` description** in the Utility Commands table (line 93): Description already says `Stress-test plans and deepen weak sections with targeted research` which matches the beta -- verify and keep
+3. **Remove the entire Beta Skills section** (lines 156-165): The `### Beta Skills` heading, explanatory paragraph, table with `ce:plan-beta` and `deepen-plan-beta` rows, and the "To test" line
+4. **Update skill count**: Currently `40+` in the Components table. Removing 2 beta directories decreases the count. Verify with `bun run release:validate` and update if needed
+
+### Phase 7: Validation
+
+1. **Search for remaining `-beta` references**: Grep all files under `plugins/compound-engineering/` for leftover `plan-beta` strings -- every hit is a bug, except historical entries in `CHANGELOG.md` which are expected and must not be modified
+2. **Run `bun run release:validate`**: Check plugin/marketplace consistency, skill counts
+3. **Run `bun test`**: Ensure converter tests still pass (they use skill names as fixtures)
+4. **Verify `lfg`/`slfg` references**: Confirm they reference stable `/ce:plan` and `/deepen-plan` (they already do -- no change needed)
+5. **Verify `ce:brainstorm` handoff**: Confirm it hands off to stable `/ce:plan` (it already does -- no change needed)
+6. **Verify `ce:work` compatibility**: Plans from promoted skills use `-plan.md` suffix, same as before
+
+## Files Changed
+
+| File | Action | Notes |
+|------|--------|-------|
+| `skills/ce-plan/SKILL.md` | Replace | Beta content with stable frontmatter |
+| `skills/deepen-plan/SKILL.md` | Replace | Beta content with stable frontmatter |
+| `skills/ce-plan-beta/` | Delete | Entire directory |
+| `skills/deepen-plan-beta/` | Delete | Entire directory |
+| `skills/ce-work-beta/SKILL.md` | Edit | Remove `ce:plan-beta or` reference at line 450 |
+| `README.md` | Edit | Remove Beta Skills section, verify counts and descriptions |
+
+## Files NOT Changed (verified safe)
+
+These files reference stable `ce:plan` or `deepen-plan` and require **no changes** because stable names are preserved:
+
+- `skills/lfg/SKILL.md` -- calls `/ce:plan` and `/deepen-plan`
+- `skills/slfg/SKILL.md` -- calls `/ce:plan` and `/deepen-plan`
+- `skills/ce-brainstorm/SKILL.md` -- hands off to `/ce:plan`
+- `skills/ce-ideate/SKILL.md` -- explains pipeline
+- `skills/document-review/SKILL.md` -- references `/ce:plan`
+- `skills/ce-compound/SKILL.md` -- references `/ce:plan`
+- `skills/ce-review/SKILL.md` -- references `/ce:plan`
+- `AGENTS.md` -- lists `ce:plan`
+- `agents/research/learnings-researcher.md` -- references both
+- `agents/research/git-history-analyzer.md` -- references `/ce:plan`
+- `agents/review/code-simplicity-reviewer.md` -- references `/ce:plan`
+- `plugin.json` / `marketplace.json` -- no individual skill listings
+
+## Acceptance Criteria
+
+- [ ] `skills/ce-plan/SKILL.md` contains the beta planning approach (decision-first, phase-structured)
+- [ ] `skills/deepen-plan/SKILL.md` contains the beta deepening approach (selective stress-test, risk-weighted)
+- [ ] No `disable-model-invocation` in either promoted skill
+- [ ] No `[BETA]` prefix in either description
+- [ ] No remaining `plan-beta` references in any file under `plugins/compound-engineering/`, except historical `CHANGELOG.md` entries
+- [ ] `skills/ce-plan-beta/` and `skills/deepen-plan-beta/` directories deleted
+- [ ] README Beta Skills section removed
+- [ ] `bun run release:validate` passes
+- [ ] `bun test` passes
+
+## Sources
+
+- **Promotion checklist:** `docs/solutions/skill-design/beta-skills-framework.md` (steps 1-9)
+- **Versioning rules:** `docs/solutions/plugin-versioning-requirements.md` (no manual version bumps)
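Reviewer note: the leftover-reference search in Phase 7, step 1 of the plan above could be scripted roughly as follows. This is a minimal sketch, not part of the commit. The function name `find_leftover_beta_refs` and its parameters are hypothetical; the search string `plan-beta`, the `plugins/compound-engineering/` root, and the `CHANGELOG.md` exclusion come from the plan text.

```python
from pathlib import Path


def find_leftover_beta_refs(root, needle="plan-beta", skip_names=("CHANGELOG.md",)):
    """Return (path, line_number, line) for every line under root containing needle.

    Hypothetical helper: files named in skip_names are ignored because the plan
    treats historical changelog entries as expected hits.
    """
    hits = []
    for path in sorted(Path(root).rglob("*")):
        if not path.is_file() or path.name in skip_names:
            continue
        try:
            text = path.read_text(encoding="utf-8")
        except (UnicodeDecodeError, OSError):
            continue  # skip binary or unreadable files
        for lineno, line in enumerate(text.splitlines(), start=1):
            if needle in line:
                hits.append((str(path), lineno, line.strip()))
    return hits
```

Calling `find_leftover_beta_refs("plugins/compound-engineering")` from the repository root would list candidate lines to fix. Under the plan's rule, an empty result means the check passes.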
--- a/plugins/compound-engineering/README.md
+++ b/plugins/compound-engineering/README.md
@@ -97,7 +97,7 @@ Core workflow commands use `ce:` prefix to unambiguously identify them as compou
 |---------|-------------|
 | `/ce:ideate` | Discover high-impact project improvements through divergent ideation and adversarial filtering |
 | `/ce:brainstorm` | Explore requirements and approaches before planning |
-| `/ce:plan` | Create implementation plans |
+| `/ce:plan` | Transform features into structured implementation plans grounded in repo patterns |
 | `/ce:review` | Run comprehensive code reviews |
 | `/ce:work` | Execute work items systematically |
 | `/ce:compound` | Document solved problems to compound team knowledge |
@@ -178,11 +178,9 @@ Experimental versions of core workflow skills. These are being tested before rep

 | Skill | Description | Replaces |
 |-------|-------------|----------|
-| `ce:plan-beta` | Decision-first planning focused on boundaries, sequencing, and verification | `ce:plan` |
 | `ce:review-beta` | Structured review with tiered persona agents, confidence gating, and dedup pipeline | `ce:review` |
-| `deepen-plan-beta` | Selective stress-test that targets weak sections with research | `deepen-plan` |

-To test: invoke `/ce:plan-beta`, `/ce:review-beta`, or `/deepen-plan-beta` directly. Plans produced by the beta skills are compatible with `/ce:work`.
+To test: invoke `/ce:review-beta` directly.

 ### Image Generation

--- a/plugins/compound-engineering/skills/ce-plan-beta/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-plan-beta/SKILL.md
@@ -1,654 +0,0 @@
---
-name: ce:plan-beta
-description: "[BETA] Transform feature descriptions or requirements into structured implementation plans grounded in repo patterns and research. Use when the user says 'plan this', 'create a plan', 'write a tech plan', 'plan the implementation', 'how should we build', 'what's the approach for', 'break this down', or when a brainstorm/requirements document is ready for technical planning. Best when requirements are at least roughly defined; for exploratory or ambiguous requests, prefer ce:brainstorm first."
-argument-hint: "[feature description, requirements doc path, or improvement idea]"
-disable-model-invocation: true
---
-
-# Create Technical Plan
-
-**Note: The current year is 2026.** Use this when dating plans and searching for recent documentation.
-
-`ce:brainstorm` defines **WHAT** to build. `ce:plan` defines **HOW** to build it. `ce:work` executes the plan.
-
-This workflow produces a durable implementation plan. It does **not** implement code, run tests, or learn from execution-time results. If the answer depends on changing code and seeing what happens, that belongs in `ce:work`, not here.
-
-## Interaction Method
-
-Use the platform's question tool when available. When asking the user a question, prefer the platform's blocking question tool if one exists (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). Otherwise, present numbered options in chat and wait for the user's reply before proceeding.
-
-Ask one question at a time. Prefer a concise single-select choice when natural options exist.
-
-## Feature Description
-
-<feature_description> #$ARGUMENTS </feature_description>
-
-**If the feature description above is empty, ask the user:** "What would you like to plan? Please describe the feature, bug fix, or improvement you have in mind."
-
-Do not proceed until you have a clear planning input.
-
-## Core Principles
-
-1. **Use requirements as the source of truth** - If `ce:brainstorm` produced a requirements document, planning should build from it rather than re-inventing behavior.
-2. **Decisions, not code** - Capture approach, boundaries, files, dependencies, risks, and test scenarios. Do not pre-write implementation code or shell command choreography. Pseudo-code sketches or DSL grammars that communicate high-level technical design are welcome when they help a reviewer validate direction — but they must be explicitly framed as directional guidance, not implementation specification.
-3. **Research before structuring** - Explore the codebase, institutional learnings, and external guidance when warranted before finalizing the plan.
-4. **Right-size the artifact** - Small work gets a compact plan. Large work gets more structure. The philosophy stays the same at every depth.
-5. **Separate planning from execution discovery** - Resolve planning-time questions here. Explicitly defer execution-time unknowns to implementation.
-6. **Keep the plan portable** - The plan should work as a living document, review artifact, or issue body without embedding tool-specific executor instructions.
-7. **Carry execution posture lightly when it matters** - If the request, origin document, or repo context clearly implies test-first, characterization-first, or another non-default execution posture, reflect that in the plan as a lightweight signal. Do not turn the plan into step-by-step execution choreography.
-
-## Plan Quality Bar
-
-Every plan should contain:
- A clear problem frame and scope boundary
- Concrete requirements traceability back to the request or origin document
- Exact file paths for the work being proposed
- Explicit test file paths for feature-bearing implementation units
- Decisions with rationale, not just tasks
- Existing patterns or code references to follow
- Specific test scenarios and verification outcomes
- Clear dependencies and sequencing
-
-A plan is ready when an implementer can start confidently without needing the plan to write the code for them.
-
-## Workflow
-
-### Phase 0: Resume, Source, and Scope
-
-#### 0.1 Resume Existing Plan Work When Appropriate
-
-If the user references an existing plan file or there is an obvious recent matching plan in `docs/plans/`:
- Read it
- Confirm whether to update it in place or create a new plan
- If updating, preserve completed checkboxes and revise only the still-relevant sections
-
-#### 0.2 Find Upstream Requirements Document
-
-Before asking planning questions, search `docs/brainstorms/` for files matching `*-requirements.md`.
-
-**Relevance criteria:** A requirements document is relevant if:
- The topic semantically matches the feature description
- It was created within the last 30 days (use judgment to override if the document is clearly still relevant or clearly stale)
- It appears to cover the same user problem or scope
-
-If multiple source documents match, ask which one to use using the platform's blocking question tool when available (see Interaction Method). Otherwise, present numbered options in chat and wait for the user's reply before proceeding.
-
-#### 0.3 Use the Source Document as Primary Input
-
-If a relevant requirements document exists:
-1. Read it thoroughly
-2. Announce that it will serve as the origin document for planning
-3. Carry forward all of the following:
-   - Problem frame
-   - Requirements and success criteria
-   - Scope boundaries
-   - Key decisions and rationale
-   - Dependencies or assumptions
-   - Outstanding questions, preserving whether they are blocking or deferred
-4. Use the source document as the primary input to planning and research
-5. Reference important carried-forward decisions in the plan with `(see origin: <source-path>)`
-6. Do not silently omit source content — if the origin document discussed it, the plan must address it even if briefly. Before finalizing, scan each section of the origin document to verify nothing was dropped.
-
-If no relevant requirements document exists, planning may proceed from the user's request directly.
-
-#### 0.4 No-Requirements-Doc Fallback
-
-If no relevant requirements document exists:
- Assess whether the request is already clear enough for direct technical planning
- If the ambiguity is mainly product framing, user behavior, or scope definition, recommend `ce:brainstorm` first
- If the user wants to continue here anyway, run a short planning bootstrap instead of refusing
-
-The planning bootstrap should establish:
- Problem frame
- Intended behavior
- Scope boundaries and obvious non-goals
- Success criteria
- Blocking questions or assumptions
-
-Keep this bootstrap brief. It exists to preserve direct-entry convenience, not to replace a full brainstorm.
-
-If the bootstrap uncovers major unresolved product questions:
- Recommend `ce:brainstorm` again
- If the user still wants to continue, require explicit assumptions before proceeding
-
-#### 0.5 Classify Outstanding Questions Before Planning
-
-If the origin document contains `Resolve Before Planning` or similar blocking questions:
- Review each one before proceeding
- Reclassify it into planning-owned work **only if** it is actually a technical, architectural, or research question
- Keep it as a blocker if it would change product behavior, scope, or success criteria
-
-If true product blockers remain:
- Surface them clearly
- Ask the user, using the platform's blocking question tool when available (see Interaction Method), whether to:
-  1. Resume `ce:brainstorm` to resolve them
-  2. Convert them into explicit assumptions or decisions and continue
- Do not continue planning while true blockers remain unresolved
-
-#### 0.6 Assess Plan Depth
-
-Classify the work into one of these plan depths:
-
- **Lightweight** - small, well-bounded, low ambiguity
- **Standard** - normal feature or bounded refactor with some technical decisions to document
- **Deep** - cross-cutting, strategic, high-risk, or highly ambiguous implementation work
-
-If depth is unclear, ask one targeted question and then continue.
-
-### Phase 1: Gather Context
-
-#### 1.1 Local Research (Always Runs)
-
-Prepare a concise planning context summary (a paragraph or two) to pass as input to the research agents:
- If an origin document exists, summarize the problem frame, requirements, and key decisions from that document
- Otherwise use the feature description directly
-
-Run these agents in parallel:
-
- Task compound-engineering:research:repo-research-analyst(Scope: technology, architecture, patterns. {planning context summary})
- Task compound-engineering:research:learnings-researcher(planning context summary)
-
-Collect:
- Technology stack and versions (used in section 1.2 to make sharper external research decisions)
- Architectural patterns and conventions to follow
- Implementation patterns, relevant files, modules, and tests
- AGENTS.md guidance that materially affects the plan, with CLAUDE.md used only as compatibility fallback when present
- Institutional learnings from `docs/solutions/`
-
-#### 1.1b Detect Execution Posture Signals
-
-Decide whether the plan should carry a lightweight execution posture signal.
-
-Look for signals such as:
- The user explicitly asks for TDD, test-first, or characterization-first work
- The origin document calls for test-first implementation or exploratory hardening of legacy code
- Local research shows the target area is legacy, weakly tested, or historically fragile, suggesting characterization coverage before changing behavior
- The user asks for external delegation, says "use codex", "delegate mode", or mentions token conservation -- add `Execution target: external-delegate` to implementation units that are pure code writing
-
-When the signal is clear, carry it forward silently in the relevant implementation units.
-
-Ask the user only if the posture would materially change sequencing or risk and cannot be responsibly inferred.
-
-#### 1.2 Decide on External Research
-
-Based on the origin document, user signals, and local findings, decide whether external research adds value.
-
-**Read between the lines.** Pay attention to signals from the conversation so far:
- **User familiarity** — Are they pointing to specific files or patterns? They likely know the codebase well.
- **User intent** — Do they want speed or thoroughness? Exploration or execution?
- **Topic risk** — Security, payments, external APIs warrant more caution regardless of user signals.
- **Uncertainty level** — Is the approach clear or still open-ended?
-
-**Leverage repo-research-analyst's technology context:**
-
-The repo-research-analyst output includes a structured Technology & Infrastructure summary. Use it to make sharper external research decisions:
-
- If specific frameworks and versions were detected (e.g., Rails 7.2, Next.js 14, Go 1.22), pass those exact identifiers to framework-docs-researcher so it fetches version-specific documentation
- If the feature touches a technology layer the scan found well-established in the repo (e.g., existing Sidekiq jobs when planning a new background job), lean toward skipping external research -- local patterns are likely sufficient
- If the feature touches a technology layer the scan found absent or thin (e.g., no existing proto files when planning a new gRPC service), lean toward external research -- there are no local patterns to follow
- If the scan detected deployment infrastructure (Docker, K8s, serverless), note it in the planning context passed to downstream agents so they can account for deployment constraints
- If the scan detected a monorepo and scoped to a specific service, pass that service's tech context to downstream research agents -- not the aggregate of all services. If the scan surfaced the workspace map without scoping, use the feature description to identify the relevant service before proceeding with research
-
-**Always lean toward external research when:**
- The topic is high-risk: security, payments, privacy, external APIs, migrations, compliance
- The codebase lacks relevant local patterns
- The user is exploring unfamiliar territory
- The technology scan found the relevant layer absent or thin in the codebase
-
-**Skip external research when:**
- The codebase already shows a strong local pattern
- The user already knows the intended shape
- Additional external context would add little practical value
- The technology scan found the relevant layer well-established with existing examples to follow
-
-Announce the decision briefly before continuing. Examples:
- "Your codebase has solid patterns for this. Proceeding without external research."
- "This involves payment processing, so I'll research current best practices first."
-
-#### 1.3 External Research (Conditional)
-
-If Step 1.2 indicates external research is useful, run these agents in parallel:
-
- Task compound-engineering:research:best-practices-researcher(planning context summary)
- Task compound-engineering:research:framework-docs-researcher(planning context summary)
-
-#### 1.4 Consolidate Research
-
-Summarize:
- Relevant codebase patterns and file paths
- Relevant institutional learnings
- External references and best practices, if gathered
- Related issues, PRs, or prior art
- Any constraints that should materially shape the plan
-
-#### 1.5 Flow and Edge-Case Analysis (Conditional)
-
-For **Standard** or **Deep** plans, or when user flow completeness is still unclear, run:
-
- Task compound-engineering:workflow:spec-flow-analyzer(planning context summary, research findings)
-
-Use the output to:
- Identify missing edge cases, state transitions, or handoff gaps
- Tighten requirements trace or verification strategy
- Add only the flow details that materially improve the plan
-
-### Phase 2: Resolve Planning Questions
-
-Build a planning question list from:
- Deferred questions in the origin document
- Gaps discovered in repo or external research
- Technical decisions required to produce a useful plan
-
-For each question, decide whether it should be:
- **Resolved during planning** - the answer is knowable from repo context, documentation, or user choice
- **Deferred to implementation** - the answer depends on code changes, runtime behavior, or execution-time discovery
-
-Ask the user only when the answer materially affects architecture, scope, sequencing, or risk and cannot be responsibly inferred. Use the platform's blocking question tool when available (see Interaction Method).
-
-**Do not** run tests, build the app, or probe runtime behavior in this phase. The goal is a strong plan, not partial execution.
-
-### Phase 3: Structure the Plan
-
-#### 3.1 Title and File Naming
-
- Draft a clear, searchable title using conventional format such as `feat: Add user authentication` or `fix: Prevent checkout double-submit`
- Determine the plan type: `feat`, `fix`, or `refactor`
- Build the filename following the repository convention: `docs/plans/YYYY-MM-DD-NNN-<type>-<descriptive-name>-beta-plan.md`
-  - Create `docs/plans/` if it does not exist
-  - Check existing files for today's date to determine the next sequence number (zero-padded to 3 digits, starting at 001)
-  - Keep the descriptive name concise (3-5 words) and kebab-cased
-  - Append `-beta` before `-plan` to distinguish from stable-generated plans
-  - Examples: `2026-01-15-001-feat-user-authentication-flow-beta-plan.md`, `2026-02-03-002-fix-checkout-race-condition-beta-plan.md`
-  - Avoid: missing sequence numbers, vague names like "new-feature", invalid characters (colons, spaces)
-
-#### 3.2 Stakeholder and Impact Awareness
-
-For **Standard** or **Deep** plans, briefly consider who is affected by this change — end users, developers, operations, other teams — and how that should shape the plan. For cross-cutting work, note affected parties in the System-Wide Impact section.
-
-#### 3.3 Break Work into Implementation Units
-
-Break the work into logical implementation units. Each unit should represent one meaningful change that an implementer could typically land as an atomic commit.
-
-Good units are:
- Focused on one component, behavior, or integration seam
- Usually touching a small cluster of related files
- Ordered by dependency
- Concrete enough for execution without pre-writing code
- Marked with checkbox syntax for progress tracking
-
-Avoid:
- 2-5 minute micro-steps
- Units that span multiple unrelated concerns
- Units that are so vague an implementer still has to invent the plan
-
-#### 3.4 High-Level Technical Design (Optional)
-
-Before detailing implementation units, decide whether an overview would help a reviewer validate the intended approach. This section communicates the *shape* of the solution — how pieces fit together — without dictating implementation.
-
-**When to include it:**
-
-| Work involves... | Best overview form |
-|---|---|
-| DSL or API surface design | Pseudo-code grammar or contract sketch |
-| Multi-component integration | Mermaid sequence or component diagram |
-| Data pipeline or transformation | Data flow sketch |
-| State-heavy lifecycle | State diagram |
-| Complex branching logic | Flowchart |
-| Single-component with non-obvious shape | Pseudo-code sketch |
-
-**When to skip it:**
- Well-patterned work where prose and file paths tell the whole story
- Straightforward CRUD or convention-following changes
- Lightweight plans where the approach is obvious
-
-Choose the medium that fits the work. Do not default to pseudo-code when a diagram communicates better, and vice versa.
-
-Frame every sketch with: *"This illustrates the intended approach and is directional guidance for review, not implementation specification. The implementing agent should treat it as context, not code to reproduce."*
-
-Keep sketches concise — enough to validate direction, not enough to copy-paste into production.
-
-#### 3.5 Define Each Implementation Unit
-
-For each unit, include:
- **Goal** - what this unit accomplishes
- **Requirements** - which requirements or success criteria it advances
- **Dependencies** - what must exist first
- **Files** - exact file paths to create, modify, or test
- **Approach** - key decisions, data flow, component boundaries, or integration notes
- **Execution note** - optional, only when the unit benefits from a non-default execution posture such as test-first, characterization-first, or external delegation
- **Technical design** - optional pseudo-code or diagram when the unit's approach is non-obvious and prose alone would leave it ambiguous. Frame explicitly as directional guidance, not implementation specification
- **Patterns to follow** - existing code or conventions to mirror
- **Test scenarios** - specific behaviors, edge cases, and failure paths to cover
- **Verification** - how an implementer should know the unit is complete, expressed as outcomes rather than shell command scripts
-
-Every feature-bearing unit should include the test file path in `**Files:**`.
-
-Use `Execution note` sparingly. Good uses include:
- `Execution note: Start with a failing integration test for the request/response contract.`
- `Execution note: Add characterization coverage before modifying this legacy parser.`
- `Execution note: Implement new domain behavior test-first.`
- `Execution note: Execution target: external-delegate`
-
-Do not expand units into literal `RED/GREEN/REFACTOR` substeps.
-
-#### 3.6 Keep Planning-Time and Implementation-Time Unknowns Separate
-
-If something is important but not knowable yet, record it explicitly under deferred implementation notes rather than pretending to resolve it in the plan.
-
-Examples:
- Exact method or helper names
- Final SQL or query details after touching real code
- Runtime behavior that depends on seeing actual test failures
- Refactors that may become unnecessary once implementation starts
-
-### Phase 4: Write the Plan
-
-Use one planning philosophy across all depths. Change the amount of detail, not the boundary between planning and execution.
-
-#### 4.1 Plan Depth Guidance
-
-**Lightweight**
- Keep the plan compact
- Usually 2-4 implementation units
- Omit optional sections that add little value
-
-**Standard**
- Use the full core template, omitting optional sections (including High-Level Technical Design) that add no value for this particular work
- Usually 3-6 implementation units
- Include risks, deferred questions, and system-wide impact when relevant
-
-**Deep**
- Use the full core template plus optional analysis sections where warranted
- Usually 4-8 implementation units
- Group units into phases when that improves clarity
- Include alternatives considered, documentation impacts, and deeper risk treatment when warranted
-
-#### 4.1b Optional Deep Plan Extensions
-
-For sufficiently large, risky, or cross-cutting work, add the sections that genuinely help:
- **Alternative Approaches Considered**
- **Success Metrics**
- **Dependencies / Prerequisites**
- **Risk Analysis & Mitigation**
- **Phased Delivery**
- **Documentation Plan**
- **Operational / Rollout Notes**
- **Future Considerations** only when they materially affect current design
-
-Do not add these as boilerplate. Include them only when they improve execution quality or stakeholder alignment.
-
-#### 4.2 Core Plan Template
-
-Omit clearly inapplicable optional sections, especially for Lightweight plans.
-
-```markdown
---
-title: [Plan Title]
-type: [feat|fix|refactor]
-status: active
-date: YYYY-MM-DD
-origin: docs/brainstorms/YYYY-MM-DD-<topic>-requirements.md  # include when planning from a requirements doc
-deepened: YYYY-MM-DD  # optional, set later by deepen-plan-beta when the plan is substantively strengthened
---
-
-# [Plan Title]
-
-## Overview
-
-[What is changing and why]
-
-## Problem Frame
-
-[Summarize the user/business problem and context. Reference the origin doc when present.]
-
-## Requirements Trace
-
- R1. [Requirement or success criterion this plan must satisfy]
- R2. [Requirement or success criterion this plan must satisfy]
-
-## Scope Boundaries
-
- [Explicit non-goal or exclusion]
-
-## Context & Research
-
-### Relevant Code and Patterns
-
- [Existing file, class, component, or pattern to follow]
-
-### Institutional Learnings
-
- [Relevant `docs/solutions/` insight]
-
-### External References
-
- [Relevant external docs or best-practice source, if used]
-
-## Key Technical Decisions
-
- [Decision]: [Rationale]
-
-## Open Questions
-
-### Resolved During Planning
-
- [Question]: [Resolution]
-
-### Deferred to Implementation
-
- [Question or unknown]: [Why it is intentionally deferred]
-
-<!-- Optional: Include this section only when the work involves DSL design, multi-component
-     integration, complex data flow, state-heavy lifecycle, or other cases where prose alone
-     would leave the approach shape ambiguous. Omit it entirely for well-patterned or
-     straightforward work. -->
-## High-Level Technical Design
-
-> *This illustrates the intended approach and is directional guidance for review, not implementation specification. The implementing agent should treat it as context, not code to reproduce.*
-
-[Pseudo-code grammar, mermaid diagram, data flow sketch, or state diagram — choose the medium that best communicates the solution shape for this work.]
-
-## Implementation Units
-
- [ ] **Unit 1: [Name]**
-
-**Goal:** [What this unit accomplishes]
-
-**Requirements:** [R1, R2]
-
-**Dependencies:** [None / Unit 1 / external prerequisite]
-
-**Files:**
- Create: `path/to/new_file`
- Modify: `path/to/existing_file`
- Test: `path/to/test_file`
-
-**Approach:**
- [Key design or sequencing decision]
-
-**Execution note:** [Optional test-first, characterization-first, external-delegate, or other execution posture signal]
-
-**Technical design:** *(optional -- pseudo-code or diagram when the unit's approach is non-obvious. Directional guidance, not implementation specification.)*
-
-**Patterns to follow:**
- [Existing file, class, or pattern]
-
-**Test scenarios:**
- [Specific scenario with expected behavior]
- [Edge case or failure path]
-
-**Verification:**
- [Outcome that should hold when this unit is complete]
-
-## System-Wide Impact
-
- **Interaction graph:** [What callbacks, middleware, observers, or entry points may be affected]
- **Error propagation:** [How failures should travel across layers]
- **State lifecycle risks:** [Partial-write, cache, duplicate, or cleanup concerns]
- **API surface parity:** [Other interfaces that may require the same change]
- **Integration coverage:** [Cross-layer scenarios unit tests alone will not prove]
-
-## Risks & Dependencies
-
- [Meaningful risk, dependency, or sequencing concern]
-
-## Documentation / Operational Notes
-
- [Docs, rollout, monitoring, or support impacts when relevant]
-
-## Sources & References
-
- **Origin document:** [docs/brainstorms/YYYY-MM-DD-<topic>-requirements.md](path)
- Related code: [path or symbol]
- Related PRs/issues: #[number]
- External docs: [url]
-```
-
-For larger `Deep` plans, extend the core template only when useful with sections such as:
-
-```markdown
-## Alternative Approaches Considered
-
- [Approach]: [Why rejected or not chosen]
-
-## Success Metrics
-
- [How we will know this solved the intended problem]
-
-## Dependencies / Prerequisites
-
- [Technical, organizational, or rollout dependency]
-
-## Risk Analysis & Mitigation
-
- [Risk]: [Mitigation]
-
-## Phased Delivery
-
-### Phase 1
- [What lands first and why]
-
-### Phase 2
- [What follows and why]
-
-## Documentation Plan
-
- [Docs or runbooks to update]
-
-## Operational / Rollout Notes
-
- [Monitoring, migration, feature flag, or rollout considerations]
-```
-
-#### 4.3 Planning Rules
-
- Prefer path plus class/component/pattern references over brittle line numbers
- Keep implementation units checkable with `- [ ]` syntax for progress tracking
- Do not include implementation code — no imports, exact method signatures, or framework-specific syntax
- Pseudo-code sketches and DSL grammars are allowed in the High-Level Technical Design section and per-unit technical design fields when they communicate design direction. Frame them explicitly as directional guidance, not implementation specification
- Mermaid diagrams are encouraged when they clarify relationships or flows that prose alone would make hard to follow — ERDs for data model changes, sequence diagrams for multi-service interactions, state diagrams for lifecycle transitions, flowcharts for complex branching logic
- Do not include git commands, commit messages, or exact test command recipes
- Do not expand implementation units into micro-step `RED/GREEN/REFACTOR` instructions
- Do not pretend an execution-time question is settled just to make the plan look complete
-
-### Phase 5: Final Review, Write File, and Handoff
-
-#### 5.1 Review Before Writing
-
-Before finalizing, check:
- The plan does not invent product behavior that should have been defined in `ce:brainstorm`
- If there was no origin document, the bounded planning bootstrap established enough product clarity to plan responsibly
- Every major decision is grounded in the origin document or research
- Each implementation unit is concrete, dependency-ordered, and implementation-ready
- If test-first or characterization-first posture was explicit or strongly implied, the relevant units carry it forward with a lightweight `Execution note`
- Test scenarios are specific without becoming test code
- Deferred items are explicit and not hidden as fake certainty
- If a High-Level Technical Design section is included, it uses the right medium for the work, carries the non-prescriptive framing, and does not contain implementation code (no imports, exact signatures, or framework-specific syntax)
- Per-unit technical design fields, if present, are concise and directional rather than copy-paste-ready
-
-If the plan originated from a requirements document, re-read that document and verify:
- The chosen approach still matches the product intent
- Scope boundaries and success criteria are preserved
- Blocking questions were either resolved, explicitly assumed, or sent back to `ce:brainstorm`
- Every section of the origin document is addressed in the plan — scan each section to confirm nothing was silently dropped
-
-#### 5.2 Write Plan File
-
-**REQUIRED: Write the plan file to disk before presenting any options.**
-
-Use the Write tool to save the complete plan to:
-
-```text
-docs/plans/YYYY-MM-DD-NNN-<type>-<descriptive-name>-beta-plan.md
-```
-
-Confirm:
-
-```text
-Plan written to docs/plans/[filename]
-```
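As an illustration of the naming pattern above, here is a minimal sketch that assembles the plan path. The helper name `plan_path`, the slug rule (lowercase, non-alphanumerics collapsed to hyphens), and passing the `NNN` sequence number in as a parameter are assumptions made for this example; the skill itself only specifies the resulting filename shape.

```python
import re
from datetime import date


def plan_path(day: date, seq: int, plan_type: str, name: str) -> str:
    """Build docs/plans/YYYY-MM-DD-NNN-<type>-<descriptive-name>-beta-plan.md."""
    # Hypothetical slug rule: lowercase, runs of other characters become one hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", name.lower()).strip("-")
    # NNN is rendered as a zero-padded three-digit sequence number.
    return f"docs/plans/{day.isoformat()}-{seq:03d}-{plan_type}-{slug}-beta-plan.md"
```

The sequence number is supplied by the caller because how it is assigned is not defined in this section.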
-
-**Pipeline mode:** If invoked from an automated workflow such as LFG, SLFG, or any `disable-model-invocation` context, skip interactive questions. Make the needed choices automatically and proceed to writing the plan.
-
-#### 5.3 Post-Generation Options
-
-After writing the plan file, present the options using the platform's blocking question tool when available (see Interaction Method). Otherwise present numbered options in chat and wait for the user's reply before proceeding.
-
-**Question:** "Plan ready at `docs/plans/YYYY-MM-DD-NNN-<type>-<name>-beta-plan.md`. What would you like to do next?"
-
-**Options:**
-1. **Open plan in editor** - Open the plan file for review
-2. **Run `/deepen-plan-beta`** - Stress-test weak sections with targeted research when the plan needs more confidence
-3. **Run `document-review` skill** - Improve the plan through structured document review
-4. **Share to Proof** - Upload the plan for collaborative review and sharing
-5. **Start `/ce:work`** - Begin implementing this plan in the current environment
-6. **Start `/ce:work` in another session** - Begin implementing in a separate agent session when the current platform supports it
-7. **Create Issue** - Create an issue in the configured tracker
-
-Based on selection:
- **Open plan in editor** → Open `docs/plans/<plan_filename>.md` using the current platform's file-open or editor mechanism (e.g., `open` on macOS, `xdg-open` on Linux, or the IDE's file-open API)
- **`/deepen-plan-beta`** → Call `/deepen-plan-beta` with the plan path
- **`document-review` skill** → Load the `document-review` skill with the plan path
- **Share to Proof** → Upload the plan:
-  ```bash
-  CONTENT=$(cat docs/plans/<plan_filename>.md)
-  TITLE="Plan: <plan title from frontmatter>"
-  RESPONSE=$(curl -s -X POST https://www.proofeditor.ai/share/markdown \
-    -H "Content-Type: application/json" \
-    -d "$(jq -n --arg title "$TITLE" --arg markdown "$CONTENT" --arg by "ai:compound" '{title: $title, markdown: $markdown, by: $by}')")
-  PROOF_URL=$(echo "$RESPONSE" | jq -r '.tokenUrl // empty')  # empty when the response carries no tokenUrl
-  ```
-  Display `View & collaborate in Proof: <PROOF_URL>` if successful, then return to the options
- **`/ce:work`** → Call `/ce:work` with the plan path
- **`/ce:work` in another session** → If the current platform supports launching a separate agent session, start `/ce:work` with the plan path there. Otherwise, explain the limitation briefly and offer to run `/ce:work` in the current session instead.
- **Create Issue** → Follow the Issue Creation section below
- **Other** → Accept free text for revisions and loop back to options
-
-If running with ultrathink enabled, or the platform's reasoning/effort level is set to max or extra-high, automatically run `/deepen-plan-beta` only when the plan is `Standard` or `Deep`, high-risk, or still shows meaningful confidence gaps in decisions, sequencing, system-wide impact, risks, or verification.
-
-## Issue Creation
-
-When the user selects "Create Issue", detect their project tracker from `AGENTS.md` or, if needed for compatibility, `CLAUDE.md`:
-
-1. Look for `project_tracker: github` or `project_tracker: linear`
-2. If GitHub:
-
-   ```bash
-   gh issue create --title "<type>: <title>" --body-file <plan_path>
-   ```
-
-3. If Linear:
-
-   ```bash
-   linear issue create --title "<title>" --description "$(cat <plan_path>)"
-   ```
-
-4. If no tracker is configured:
-   - Ask which tracker they use using the platform's blocking question tool when available (see Interaction Method)
-   - Suggest adding the tracker to `AGENTS.md` for future runs
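The detection step can be sketched roughly as follows. This is illustrative only: the function name `detect_tracker` is hypothetical, and it assumes the `project_tracker:` setting sits on its own line in `AGENTS.md` (checked first) or `CLAUDE.md` (compatibility fallback).

```python
import re
from pathlib import Path
from typing import Optional

# Assumed format: a line such as "project_tracker: github" or "project_tracker: linear".
TRACKER_LINE = re.compile(r"^\s*project_tracker:\s*(github|linear)\b", re.MULTILINE)


def detect_tracker(root: str = ".") -> Optional[str]:
    """Return 'github', 'linear', or None when no tracker is configured."""
    for filename in ("AGENTS.md", "CLAUDE.md"):  # AGENTS.md takes precedence
        path = Path(root) / filename
        if not path.is_file():
            continue
        match = TRACKER_LINE.search(path.read_text(encoding="utf-8"))
        if match:
            return match.group(1)
    return None
```

A `None` result corresponds to step 4: ask the user which tracker they use and suggest recording it in `AGENTS.md`.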
-
-After issue creation:
- Display the issue URL
- Ask whether to proceed to `/ce:work`
-
-NEVER CODE! Research, decide, and write the plan.
--- a/plugins/compound-engineering/skills/ce-plan/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-plan/SKILL.md
--- a/plugins/compound-engineering/skills/ce-work-beta/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-work-beta/SKILL.md
@@ -447,7 +447,7 @@ This mode integrates with the existing Phase 1 Step 4 strategy selection as a **

 External delegation activates when any of these conditions are met:
 - The user says "use codex for this work", "delegate to codex", or "delegate mode"
- A plan implementation unit contains `Execution target: external-delegate` in its Execution note (set by ce:plan-beta or ce:plan)
+- A plan implementation unit contains `Execution target: external-delegate` in its Execution note (set by ce:plan)

 The specific delegate tool is resolved at execution time. Currently the only supported delegate is Codex CLI. Future delegates can be added without changing plan files.

--- a/plugins/compound-engineering/skills/deepen-plan-beta/SKILL.md
+++ b/plugins/compound-engineering/skills/deepen-plan-beta/SKILL.md
@@ -1,410 +0,0 @@
---
-name: deepen-plan-beta
-description: "[BETA] Stress-test an existing implementation plan and selectively strengthen weak sections with targeted research. Use when a plan needs more confidence around decisions, sequencing, system-wide impact, risks, or verification. Best for Standard or Deep plans, or high-risk topics such as auth, payments, migrations, external APIs, and security. For structural or clarity improvements, prefer document-review instead."
-argument-hint: "[path to plan file]"
-disable-model-invocation: true
---
-
-# Deepen Plan
-
-## Introduction
-
-**Note: The current year is 2026.** Use this when searching for recent documentation and best practices.
-
-`ce:plan-beta` does the first planning pass. `deepen-plan-beta` is a second-pass confidence check.
-
-Use this skill when the plan already exists and the question is not "Is this document clear?" but rather "Is this plan grounded enough for the complexity and risk involved?"
-
-This skill does **not** turn plans into implementation scripts. It identifies weak sections, runs targeted research only for those sections, and strengthens the plan in place.
-
-`document-review` and `deepen-plan-beta` are different:
- Use the `document-review` skill when the document needs clarity, simplification, completeness, or scope control
- Use `deepen-plan-beta` when the document is structurally sound but still needs stronger rationale, sequencing, risk treatment, or system-wide thinking
-
-## Interaction Method
-
-When asking the user a question, prefer the platform's blocking question tool if one exists (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). Otherwise, present numbered options in chat and wait for the user's reply before proceeding.
-
-Ask one question at a time. Prefer a concise single-select choice when natural options exist.
-
-## Plan File
-
-<plan_path> #$ARGUMENTS </plan_path>
-
-If the plan path above is empty:
-1. Check `docs/plans/` for recent files
-2. Ask the user which plan to deepen using the platform's blocking question tool when available (see Interaction Method). Otherwise, present numbered options in chat and wait for the user's reply before proceeding
-
-Do not proceed until you have a valid plan file path.
-
-## Core Principles
-
-1. **Stress-test, do not inflate** - Deepening should increase justified confidence, not make the plan longer for its own sake.
-2. **Selective depth only** - Focus on the weakest 2-5 sections rather than enriching everything.
-3. **Prefer the simplest execution mode** - Use direct agent synthesis by default. Switch to artifact-backed research only when the selected research scope is large enough that returning all findings inline would create avoidable context pressure.
-4. **Preserve the planning boundary** - No implementation code, no git command choreography, no exact test command recipes.
-5. **Use artifact-contained evidence** - Work from the written plan, its `Context & Research`, `Sources & References`, and its origin document when present.
-6. **Respect product boundaries** - Do not invent new product requirements. If deepening reveals a product-level gap, surface it as an open question or route back to `ce:brainstorm`.
-7. **Prioritize risk and cross-cutting impact** - The more dangerous or interconnected the work, the more valuable another planning pass becomes.
-
-## Workflow
-
-### Phase 0: Load the Plan and Decide Whether Deepening Is Warranted
-
-#### 0.1 Read the Plan and Supporting Inputs
-
-Read the plan file completely.
-
-If the plan frontmatter includes an `origin:` path:
- Read the origin document too
- Use it to check whether the plan still reflects the product intent, scope boundaries, and success criteria
-
-#### 0.2 Classify Plan Depth and Topic Risk
-
-Determine the plan depth from the document:
- **Lightweight** - small, bounded, low ambiguity, usually 2-4 implementation units
- **Standard** - moderate complexity, some technical decisions, usually 3-6 units
- **Deep** - cross-cutting, high-risk, or strategically important work, usually 4-8 units or phased delivery
-
-Also build a risk profile. Treat these as high-risk signals:
- Authentication, authorization, or security-sensitive behavior
- Payments, billing, or financial flows
- Data migrations, backfills, or persistent data changes
- External APIs or third-party integrations
- Privacy, compliance, or user data handling
- Cross-interface parity or multi-surface behavior
- Significant rollout, monitoring, or operational concerns
-
-#### 0.3 Decide Whether to Deepen
-
-Use this default:
- **Lightweight** plans usually do not need deepening unless they are high-risk or the user explicitly requests it
- **Standard** plans often benefit when one or more important sections still look thin
- **Deep** or high-risk plans often benefit from a targeted second pass
-
-If the plan already appears sufficiently grounded:
- Say so briefly
- Recommend moving to `/ce:work` or the `document-review` skill
- If the user explicitly asked to deepen anyway, continue with a light pass and deepen at most 1-2 sections
-
-### Phase 1: Parse the Current `ce:plan-beta` Structure
-
-Map the plan into the current template. Look for these sections, or their nearest equivalents:
- `Overview`
- `Problem Frame`
- `Requirements Trace`
- `Scope Boundaries`
- `Context & Research`
- `Key Technical Decisions`
- `Open Questions`
- `High-Level Technical Design` (optional overview — pseudo-code, DSL grammar, mermaid diagram, or data flow)
- `Implementation Units` (may include per-unit `Technical design` subsections)
- `System-Wide Impact`
- `Risks & Dependencies`
- `Documentation / Operational Notes`
- `Sources & References`
- Optional deep-plan sections such as `Alternative Approaches Considered`, `Success Metrics`, `Phased Delivery`, `Risk Analysis & Mitigation`, and `Operational / Rollout Notes`
-
-If the plan was written manually or uses different headings:
- Map sections by intent rather than exact heading names
- If a section is structurally present but titled differently, treat it as the equivalent section
- If the plan truly lacks a section, decide whether that absence is intentional for the plan depth or a confidence gap worth scoring
-
-Also collect:
- Frontmatter, including existing `deepened:` date if present
- Number of implementation units
- Which files and test files are named
- Which learnings, patterns, or external references are cited
- Which sections appear omitted because they were unnecessary versus omitted because they are missing
-
-### Phase 2: Score Confidence Gaps
-
-Use a checklist-first, risk-weighted scoring pass.
-
-For each section, compute:
- **Trigger count** - number of checklist problems that apply
- **Risk bonus** - add 1 if the topic is high-risk and this section is materially relevant to that risk
- **Critical-section bonus** - add 1 for `Key Technical Decisions`, `Implementation Units`, `System-Wide Impact`, `Risks & Dependencies`, or `Open Questions` in `Standard` or `Deep` plans
-
-Treat a section as a candidate if:
- it hits **2+ total points**, or
- it hits **1+ point** in a high-risk domain and the section is materially important
-
-Choose only the top **2-5** sections by score. If the user explicitly asked to deepen a lightweight plan, cap at **1-2** sections unless the topic is high-risk.
-
-Example:
- A `Key Technical Decisions` section with 1 checklist trigger and the critical-section bonus scores **2 points** and is a candidate
- A `Risks & Dependencies` section with 1 checklist trigger in a high-risk migration plan also becomes a candidate: the risk bonus alone lifts it to **2 points**, before any critical-section bonus
-
-If the plan already has a `deepened:` date:
- Prefer sections that have not yet been substantially strengthened, if their scores are comparable
- Revisit an already-deepened section only when it still scores clearly higher than alternatives or the user explicitly asks for another pass on it
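The scoring pass above can be sketched as a small function. This is one possible reading, not a prescribed implementation: the names `SectionSignal`, `score`, and `select_sections` are hypothetical, `material` stands for the judgment that a section is materially relevant to the plan's risk, and the cap of 2 models the case where a lightweight, non-high-risk plan is deepened at the user's request.

```python
from dataclasses import dataclass

CRITICAL_SECTIONS = {
    "Key Technical Decisions",
    "Implementation Units",
    "System-Wide Impact",
    "Risks & Dependencies",
    "Open Questions",
}


@dataclass
class SectionSignal:
    name: str
    triggers: int           # number of checklist problems that apply
    material: bool = False  # section is materially relevant to the plan's risk


def score(section: SectionSignal, depth: str, high_risk: bool) -> int:
    total = section.triggers
    if high_risk and section.material:
        total += 1  # risk bonus
    if depth in ("Standard", "Deep") and section.name in CRITICAL_SECTIONS:
        total += 1  # critical-section bonus
    return total


def select_sections(sections, depth: str, high_risk: bool):
    scored = [(score(s, depth, high_risk), s) for s in sections]
    # Candidate: 2+ points, or 1+ point on a material section in a high-risk plan.
    candidates = [
        (pts, s) for pts, s in scored
        if pts >= 2 or (pts >= 1 and high_risk and s.material)
    ]
    candidates.sort(key=lambda pair: pair[0], reverse=True)
    cap = 2 if depth == "Lightweight" and not high_risk else 5
    return [s.name for _, s in candidates[:cap]]
```

The sketch ignores the `deepened:` tie-breaking rule; a fuller version would prefer not-yet-strengthened sections when scores are comparable.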
-
-#### 2.1 Section Checklists
-
-Use these triggers.
-
-**Requirements Trace**
- Requirements are vague or disconnected from implementation units
- Success criteria are missing or not reflected downstream
- Units do not clearly advance the traced requirements
- Origin requirements are not clearly carried forward
-
-**Context & Research / Sources & References**
- Relevant repo patterns are named but never used in decisions or implementation units
- Cited learnings or references do not materially shape the plan
- High-risk work lacks appropriate external or internal grounding
- Research is generic instead of tied to this repo or this plan
-
-**Key Technical Decisions**
- A decision is stated without rationale
- Rationale does not explain tradeoffs or rejected alternatives
- The decision does not connect back to scope, requirements, or origin context
- An obvious design fork exists but the plan never addresses why one path won
-
-**Open Questions**
- Product blockers are hidden as assumptions
- Planning-owned questions are incorrectly deferred to implementation
- Resolved questions have no clear basis in repo context, research, or origin decisions
- Deferred items are too vague to be useful later
-
-**High-Level Technical Design (when present)**
- The sketch uses the wrong medium for the work (e.g., pseudo-code where a sequence diagram would communicate better)
- The sketch contains implementation code (imports, exact signatures, framework-specific syntax) rather than pseudo-code
- The non-prescriptive framing is missing or weak
- The sketch does not connect to the key technical decisions or implementation units
-
-**High-Level Technical Design (when absent)** *(Standard or Deep plans only)*
- The work involves DSL design, API surface design, multi-component integration, complex data flow, or state-heavy lifecycle
- Key technical decisions would be easier to validate with a visual or pseudo-code representation
- The approach section of implementation units is thin and a higher-level technical design would provide context
-
-**Implementation Units**
- Dependency order is unclear or likely wrong
- File paths or test file paths are missing where they should be explicit
- Units are too large, too vague, or broken into micro-steps
- Approach notes are thin or do not name the pattern to follow
- Test scenarios or verification outcomes are vague
-
-**System-Wide Impact**
- Affected interfaces, callbacks, middleware, entry points, or parity surfaces are missing
- Failure propagation is underexplored
- State lifecycle, caching, or data integrity risks are absent where relevant
- Integration coverage is weak for cross-layer work
-
-**Risks & Dependencies / Documentation / Operational Notes**
- Risks are listed without mitigation
- Rollout, monitoring, migration, or support implications are missing when warranted
- External dependency assumptions are weak or unstated
- Security, privacy, performance, or data risks are absent where they obviously apply
-
-Use the plan's own `Context & Research` and `Sources & References` as evidence. If those sections cite a pattern, learning, or risk that never affects decisions, implementation units, or verification, treat that as a confidence gap.
-
-### Phase 3: Select Targeted Research Agents
-
-For each selected section, choose the smallest useful agent set. Do **not** run every agent. Use **1-3 agents per section** and usually no more than **8 agents total**.
-
-Use fully-qualified agent names inside Task calls.
-
-#### 3.1 Deterministic Section-to-Agent Mapping
-
-**Requirements Trace / Open Questions classification**
- `compound-engineering:workflow:spec-flow-analyzer` for missing user flows, edge cases, and handoff gaps
- `compound-engineering:research:repo-research-analyst` (Scope: `architecture, patterns`) for repo-grounded patterns, conventions, and implementation reality checks
-
-**Context & Research / Sources & References gaps**
- `compound-engineering:research:learnings-researcher` for institutional knowledge and past solved problems
- `compound-engineering:research:framework-docs-researcher` for official framework or library behavior
- `compound-engineering:research:best-practices-researcher` for current external patterns and industry guidance
- Add `compound-engineering:research:git-history-analyzer` only when historical rationale or prior art is materially missing
-
-**Key Technical Decisions**
- `compound-engineering:review:architecture-strategist` for design integrity, boundaries, and architectural tradeoffs
- Add `compound-engineering:research:framework-docs-researcher` or `compound-engineering:research:best-practices-researcher` when the decision needs external grounding beyond repo evidence
-
-**High-Level Technical Design**
- `compound-engineering:review:architecture-strategist` for validating that the technical design accurately represents the intended approach and identifying gaps
- `compound-engineering:research:repo-research-analyst` (Scope: `architecture, patterns`) for grounding the technical design in existing repo patterns and conventions
- Add `compound-engineering:research:best-practices-researcher` when the technical design involves a DSL, API surface, or pattern that benefits from external validation
-
-**Implementation Units / Verification**
- `compound-engineering:research:repo-research-analyst` (Scope: `patterns`) for concrete file targets, patterns to follow, and repo-specific sequencing clues
- `compound-engineering:review:pattern-recognition-specialist` for consistency, duplication risks, and alignment with existing patterns
- Add `compound-engineering:workflow:spec-flow-analyzer` when sequencing depends on user flow or handoff completeness
-
-**System-Wide Impact**
- `compound-engineering:review:architecture-strategist` for cross-boundary effects, interface surfaces, and architectural knock-on impact
- Add the specific specialist that matches the risk:
-  - `compound-engineering:review:performance-oracle` for scalability, latency, throughput, and resource-risk analysis
-  - `compound-engineering:review:security-sentinel` for auth, validation, exploit surfaces, and security boundary review
-  - `compound-engineering:review:data-integrity-guardian` for migrations, persistent state safety, consistency, and data lifecycle risks
-
-**Risks & Dependencies / Operational Notes**
- Use the specialist that matches the actual risk:
-  - `compound-engineering:review:security-sentinel` for security, auth, privacy, and exploit risk
-  - `compound-engineering:review:data-integrity-guardian` for persistent data safety, constraints, and transaction boundaries
-  - `compound-engineering:review:data-migration-expert` for migration realism, backfills, and production data transformation risk
-  - `compound-engineering:review:deployment-verification-agent` for rollout checklists, rollback planning, and launch verification
-  - `compound-engineering:review:performance-oracle` for capacity, latency, and scaling concerns
-
-#### 3.2 Agent Prompt Shape
-
-For each selected section, pass:
- The scope prefix from section 3.1 (e.g., `Scope: architecture, patterns.`) when the agent supports scoped invocation
- A short plan summary
- The exact section text
- Why the section was selected, including which checklist triggers fired
- The plan depth and risk profile
- A specific question to answer
-
-Instruct the agent to return:
- findings that change planning quality
- stronger rationale, sequencing, verification, risk treatment, or references
- no implementation code
- no shell commands
-
-#### 3.3 Choose Research Execution Mode
-
-Use the lightest mode that will work:
-
- **Direct mode** - Default. Use when the selected section set is small and the parent can safely read the agent outputs inline.
- **Artifact-backed mode** - Use only when the selected research scope is large enough that inline returns would create unnecessary context pressure.
-
-Signals that justify artifact-backed mode:
- More than 5 agents are likely to return meaningful findings
- The selected section excerpts are long enough that repeating them in multiple agent outputs would be wasteful
- The topic is high-risk and likely to attract bulky source-backed analysis
- The platform has a history of parent-context instability on large parallel returns
-
-If artifact-backed mode is not clearly warranted, stay in direct mode.
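To make the default explicit, the mode choice can be sketched as below. The `ResearchScope` fields and the `choose_mode` name are hypothetical, and treating any single signal as sufficient is an assumption; in practice the signals are judgment calls rather than hard flags.

```python
from dataclasses import dataclass


@dataclass
class ResearchScope:
    agent_count: int                 # agents likely to return meaningful findings
    long_excerpts: bool = False      # section excerpts would be repeated wastefully
    bulky_high_risk: bool = False    # high-risk topic likely to attract bulky analysis
    unstable_platform: bool = False  # history of parent-context instability


def choose_mode(scope: ResearchScope) -> str:
    signals = (
        scope.agent_count > 5,
        scope.long_excerpts,
        scope.bulky_high_risk,
        scope.unstable_platform,
    )
    # Direct mode is the default; switch only when a signal clearly applies.
    return "artifact-backed" if any(signals) else "direct"
```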
-
-### Phase 4: Run Targeted Research and Review
-
-Launch the selected agents in parallel using the execution mode chosen in Step 3.3. If the current platform does not support parallel dispatch, run them sequentially instead.
-
-Prefer local repo and institutional evidence first. Use external research only when the gap cannot be closed responsibly from repo context or already-cited sources.
-
-If a selected section can be improved by reading the origin document more carefully, do that before dispatching external agents.
-
-#### 4.1 Direct Mode
-
-Have each selected agent return its findings directly to the parent.
-
-Keep the return payload focused:
- strongest findings only
- the evidence or sources that matter
- the concrete planning improvement implied by the finding
-
-If a direct-mode agent starts producing bulky or repetitive output, stop and switch the remaining research to artifact-backed mode instead of letting the parent context bloat.
-
-#### 4.2 Artifact-Backed Mode
-
-Use a per-run scratch directory under `.context/compound-engineering/deepen-plan-beta/`, for example `.context/compound-engineering/deepen-plan-beta/<run-id>/` or `.context/compound-engineering/deepen-plan-beta/<plan-filename-stem>/`.
-
-Use the scratch directory only for the current deepening pass.
-
-For each selected agent:
- give it the same plan summary, section text, trigger rationale, depth, and risk profile described in Step 3.2
- instruct it to write one compact artifact file for its assigned section or sections
- have it return only a short completion summary to the parent
-
-Prefer a compact markdown artifact unless machine-readable structure is clearly useful. Each artifact should contain:
- target section id and title
- why the section was selected
- 3-7 findings that materially improve planning quality
- source-backed rationale, including whether the evidence came from repo context, origin context, institutional learnings, official docs, or external best practices
- the specific plan change implied by each finding
- any unresolved tradeoff that should remain explicit in the plan
-
-Artifact rules:
- no implementation code
- no shell commands
- no checkpoint logs or self-diagnostics
- no duplicated boilerplate across files
- no judge or merge sub-pipeline
-
-Before synthesis:
- quickly verify that each selected section has at least one usable artifact
- if an artifact is missing or clearly malformed, re-run that agent or fall back to direct-mode reasoning for that section instead of building a validation pipeline
-
-If agent outputs conflict:
- Prefer repo-grounded and origin-grounded evidence over generic advice
- Prefer official framework documentation over secondary best-practice summaries when the conflict is about library behavior
- If a real tradeoff remains, record it explicitly in the plan rather than pretending the conflict does not exist
-
-### Phase 5: Synthesize and Rewrite the Plan
-
-Strengthen only the selected sections. Keep the plan coherent and preserve its overall structure.
-
-If artifact-backed mode was used:
- read the plan, origin document if present, and the selected section artifacts
- also incorporate any findings already returned inline from direct-mode agents before a mid-run switch, so early results are not silently dropped
- synthesize in one pass
- do not create a separate judge, merge, or quality-review phase unless the user explicitly asks for another pass
-
-Allowed changes:
- Clarify or strengthen decision rationale
- Tighten requirements trace or origin fidelity
- Reorder or split implementation units when sequencing is weak
- Add missing pattern references, file/test paths, or verification outcomes
- Expand system-wide impact, risks, or rollout treatment where justified
- Reclassify open questions between `Resolved During Planning` and `Deferred to Implementation` when evidence supports the change
- Strengthen, replace, or add a High-Level Technical Design section when the work warrants it and the current representation is weak, uses the wrong medium, or is absent where it would help. Preserve the non-prescriptive framing
- Strengthen or add per-unit technical design fields where the unit's approach is non-obvious and the current approach notes are thin
- Add an optional deep-plan section only when it materially improves execution quality
- Add or update `deepened: YYYY-MM-DD` in frontmatter when the plan was substantively improved
-
-Do **not**:
- Add implementation code — no imports, exact method signatures, or framework-specific syntax. Pseudo-code sketches and DSL grammars are allowed in both the top-level High-Level Technical Design section and per-unit technical design fields
- Add git commands, commit choreography, or exact test command recipes
- Add generic `Research Insights` subsections everywhere
- Rewrite the entire plan from scratch
- Invent new product requirements, scope changes, or success criteria without surfacing them explicitly
-
-If research reveals a product-level ambiguity that should change behavior or scope:
- Do not silently decide it here
- Record it under `Open Questions`
- Recommend `ce:brainstorm` if the gap is truly product-defining
-
-### Phase 6: Final Checks and Write the File
-
-Before writing:
- Confirm the plan is stronger in specific ways, not merely longer
- Confirm the planning boundary is intact
- Confirm the selected sections were actually the weakest ones
- Confirm origin decisions were preserved when an origin document exists
- Confirm the final plan still feels right-sized for its depth
- If artifact-backed mode was used, confirm the scratch artifacts did not become a second hidden plan format
-
-Update the plan file in place by default.
-
-If the user explicitly requests a separate file, append `-deepened` before `.md`, for example:
- `docs/plans/2026-03-15-001-feat-example-plan-deepened.md`
-
-If artifact-backed mode was used and the user did not ask to inspect the scratch files:
- delete the specific per-run scratch directory (e.g., `.context/compound-engineering/deepen-plan-beta/<run-id>/`) after the plan is safely written. Do not delete any other `.context/` subdirectories.
- if cleanup is not practical on the current platform, say where the artifacts were left and that they are temporary workflow output
-
-## Post-Enhancement Options
-
-If substantive changes were made, present next steps using the platform's blocking question tool when available (see Interaction Method). Otherwise, present numbered options in chat and wait for the user's reply before proceeding.
-
-**Question:** "Plan deepened at `[plan_path]`. What would you like to do next?"
-
-**Options:**
-1. **View diff** - Show what changed
-2. **Run `document-review` skill** - Improve the updated plan through structured document review
-3. **Start `ce:work` skill** - Begin implementing the plan
-4. **Deepen specific sections further** - Run another targeted deepening pass on named sections
-
-Based on selection:
- **View diff** -> Show the important additions and changed sections
- **`document-review` skill** -> Load the `document-review` skill with the plan path
- **Start `ce:work` skill** -> Call the `ce:work` skill with the plan path
- **Deepen specific sections further** -> Ask which sections still feel weak and run another targeted pass only for those sections
-
-If no substantive changes were warranted:
- Say that the plan already appears sufficiently grounded
- Offer the `document-review` skill or `/ce:work` as the next step instead
-
-NEVER CODE! Research, challenge, and strengthen the plan.
--- a/plugins/compound-engineering/skills/deepen-plan/SKILL.md
+++ b/plugins/compound-engineering/skills/deepen-plan/SKILL.md
@@ -1,544 +1,409 @@
 ---
 name: deepen-plan
-description: Enhance a plan with parallel research agents for each section to add depth, best practices, and implementation details
+description: "Stress-test an existing implementation plan and selectively strengthen weak sections with targeted research. Use when a plan needs more confidence around decisions, sequencing, system-wide impact, risks, or verification. Best for Standard or Deep plans, or high-risk topics such as auth, payments, migrations, external APIs, and security. For structural or clarity improvements, prefer document-review instead."
 argument-hint: "[path to plan file]"
 ---

-# Deepen Plan - Power Enhancement Mode
+# Deepen Plan

 ## Introduction

 **Note: The current year is 2026.** Use this when searching for recent documentation and best practices.

-This command takes an existing plan (from `/ce:plan`) and enhances each section with parallel research agents. Each major element gets its own dedicated research sub-agent to find:
- Best practices and industry patterns
- Performance optimizations
- UI/UX improvements (if applicable)
- Quality enhancements and edge cases
- Real-world implementation examples
+`ce:plan` does the first planning pass. `deepen-plan` is a second-pass confidence check.

-The result is a deeply grounded, production-ready plan with concrete implementation details.
+Use this skill when the plan already exists and the question is not "Is this document clear?" but rather "Is this plan grounded enough for the complexity and risk involved?"
+
+This skill does **not** turn plans into implementation scripts. It identifies weak sections, runs targeted research only for those sections, and strengthens the plan in place.
+
+`document-review` and `deepen-plan` are different:
+- Use the `document-review` skill when the document needs clarity, simplification, completeness, or scope control
+- Use `deepen-plan` when the document is structurally sound but still needs stronger rationale, sequencing, risk treatment, or system-wide thinking
+
+## Interaction Method
+
+Use the platform's question tool when available. When asking the user a question, prefer the platform's blocking question tool if one exists (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). Otherwise, present numbered options in chat and wait for the user's reply before proceeding.
+
+Ask one question at a time. Prefer a concise single-select choice when natural options exist.

 ## Plan File

 <plan_path> #$ARGUMENTS </plan_path>

-**If the plan path above is empty:**
-1. Check for recent plans: `ls -la docs/plans/`
-2. Ask the user: "Which plan would you like to deepen? Please provide the path (e.g., `docs/plans/2026-01-15-feat-my-feature-plan.md`)."
+If the plan path above is empty:
+1. Check `docs/plans/` for recent files
+2. Ask the user which plan to deepen using the platform's blocking question tool when available (see Interaction Method). Otherwise, present numbered options in chat and wait for the user's reply before proceeding

 Do not proceed until you have a valid plan file path.

-## Main Tasks
+## Core Principles
+
+1. **Stress-test, do not inflate** - Deepening should increase justified confidence, not make the plan longer for its own sake.
+2. **Selective depth only** - Focus on the weakest 2-5 sections rather than enriching everything.
+3. **Prefer the simplest execution mode** - Use direct agent synthesis by default. Switch to artifact-backed research only when the selected research scope is large enough that returning all findings inline would create avoidable context pressure.
+4. **Preserve the planning boundary** - No implementation code, no git command choreography, no exact test command recipes.
+5. **Use artifact-contained evidence** - Work from the written plan, its `Context & Research`, `Sources & References`, and its origin document when present.
+6. **Respect product boundaries** - Do not invent new product requirements. If deepening reveals a product-level gap, surface it as an open question or route back to `ce:brainstorm`.
+7. **Prioritize risk and cross-cutting impact** - The more dangerous or interconnected the work, the more valuable another planning pass becomes.
+
+## Workflow
+
+### Phase 0: Load the Plan and Decide Whether Deepening Is Warranted
+
+#### 0.1 Read the Plan and Supporting Inputs
+
+Read the plan file completely.
+
+If the plan frontmatter includes an `origin:` path:
+- Read the origin document too
+- Use it to check whether the plan still reflects the product intent, scope boundaries, and success criteria
+
+#### 0.2 Classify Plan Depth and Topic Risk
+
+Determine the plan depth from the document:
+- **Lightweight** - small, bounded, low ambiguity, usually 2-4 implementation units
+- **Standard** - moderate complexity, some technical decisions, usually 3-6 units
+- **Deep** - cross-cutting, high-risk, or strategically important work, usually 4-8 units or phased delivery
+
+Also build a risk profile. Treat these as high-risk signals:
+- Authentication, authorization, or security-sensitive behavior
+- Payments, billing, or financial flows
+- Data migrations, backfills, or persistent data changes
+- External APIs or third-party integrations
+- Privacy, compliance, or user data handling
+- Cross-interface parity or multi-surface behavior
+- Significant rollout, monitoring, or operational concerns
+
+#### 0.3 Decide Whether to Deepen
+
+Use this default:
+- **Lightweight** plans usually do not need deepening unless they are high-risk or the user explicitly requests it
+- **Standard** plans often benefit when one or more important sections still look thin
+- **Deep** or high-risk plans often benefit from a targeted second pass
+
+If the plan already appears sufficiently grounded:
+- Say so briefly
+- Recommend moving to `/ce:work` or the `document-review` skill
+- If the user explicitly asked to deepen anyway, continue with a light pass and deepen at most 1-2 sections
+
+### Phase 1: Parse the Current `ce:plan` Structure
+
+Map the plan into the current template. Look for these sections, or their nearest equivalents:
+- `Overview`
+- `Problem Frame`
+- `Requirements Trace`
+- `Scope Boundaries`
+- `Context & Research`
+- `Key Technical Decisions`
+- `Open Questions`
+- `High-Level Technical Design` (optional overview — pseudo-code, DSL grammar, mermaid diagram, or data flow)
+- `Implementation Units` (may include per-unit `Technical design` subsections)
+- `System-Wide Impact`
+- `Risks & Dependencies`
+- `Documentation / Operational Notes`
+- `Sources & References`
+- Optional deep-plan sections such as `Alternative Approaches Considered`, `Success Metrics`, `Phased Delivery`, `Risk Analysis & Mitigation`, and `Operational / Rollout Notes`
+
+If the plan was written manually or uses different headings:
+- Map sections by intent rather than exact heading names
+- If a section is structurally present but titled differently, treat it as the equivalent section
+- If the plan truly lacks a section, decide whether that absence is intentional for the plan depth or a confidence gap worth scoring
+
+Also collect:
+- Frontmatter, including existing `deepened:` date if present
+- Number of implementation units
+- Which files and test files are named
+- Which learnings, patterns, or external references are cited
+- Which sections appear omitted because they were unnecessary versus omitted because they are missing
+
+### Phase 2: Score Confidence Gaps
+
+Use a checklist-first, risk-weighted scoring pass.
+
+For each section, compute:
+- **Trigger count** - number of checklist problems that apply
+- **Risk bonus** - add 1 if the topic is high-risk and this section is materially relevant to that risk
+- **Critical-section bonus** - add 1 for `Key Technical Decisions`, `Implementation Units`, `System-Wide Impact`, `Risks & Dependencies`, or `Open Questions` in `Standard` or `Deep` plans
+
+Treat a section as a candidate if:
+- it hits **2+ total points**, or
+- it hits **1+ point** in a high-risk domain and the section is materially important
+
+Choose only the top **2-5** sections by score. If the user explicitly asked to deepen a lightweight plan, cap at **1-2** sections unless the topic is high-risk.
+
+Example:
+- A `Key Technical Decisions` section with 1 checklist trigger and the critical-section bonus scores **2 points** and is a candidate
+- A `Risks & Dependencies` section with 1 checklist trigger in a high-risk migration plan also becomes a candidate: the risk bonus brings it to at least **2 points**
+
+If the plan already has a `deepened:` date:
+- Prefer sections that have not yet been substantially strengthened, if their scores are comparable
+- Revisit an already-deepened section only when it still scores clearly higher than alternatives or the user explicitly asks for another pass on it
+
+#### 2.1 Section Checklists
+
+Use these triggers.
+
+**Requirements Trace**
+- Requirements are vague or disconnected from implementation units
+- Success criteria are missing or not reflected downstream
+- Units do not clearly advance the traced requirements
+- Origin requirements are not clearly carried forward
+
+**Context & Research / Sources & References**
+- Relevant repo patterns are named but never used in decisions or implementation units
+- Cited learnings or references do not materially shape the plan
+- High-risk work lacks appropriate external or internal grounding
+- Research is generic instead of tied to this repo or this plan
+
+**Key Technical Decisions**
+- A decision is stated without rationale
+- Rationale does not explain tradeoffs or rejected alternatives
+- The decision does not connect back to scope, requirements, or origin context
+- An obvious design fork exists but the plan never addresses why one path won
+
+**Open Questions**
+- Product blockers are hidden as assumptions
+- Planning-owned questions are incorrectly deferred to implementation
+- Resolved questions have no clear basis in repo context, research, or origin decisions
+- Deferred items are too vague to be useful later
+
+**High-Level Technical Design (when present)**
+- The sketch uses the wrong medium for the work (e.g., pseudo-code where a sequence diagram would communicate better)
+- The sketch contains implementation code (imports, exact signatures, framework-specific syntax) rather than pseudo-code
+- The non-prescriptive framing is missing or weak
+- The sketch does not connect to the key technical decisions or implementation units
+
+**High-Level Technical Design (when absent)** *(Standard or Deep plans only)*
+- The work involves DSL design, API surface design, multi-component integration, complex data flow, or state-heavy lifecycle
+- Key technical decisions would be easier to validate with a visual or pseudo-code representation
+- The approach section of implementation units is thin and a higher-level technical design would provide context
+
+**Implementation Units**
+- Dependency order is unclear or likely wrong
+- File paths or test file paths are missing where they should be explicit
+- Units are too large, too vague, or broken into micro-steps
+- Approach notes are thin or do not name the pattern to follow
+- Test scenarios or verification outcomes are vague
+
+**System-Wide Impact**
+- Affected interfaces, callbacks, middleware, entry points, or parity surfaces are missing
+- Failure propagation is underexplored
+- State lifecycle, caching, or data integrity risks are absent where relevant
+- Integration coverage is weak for cross-layer work
+
+**Risks & Dependencies / Documentation / Operational Notes**
+- Risks are listed without mitigation
+- Rollout, monitoring, migration, or support implications are missing when warranted
+- External dependency assumptions are weak or unstated
+- Security, privacy, performance, or data risks are absent where they obviously apply
+
+Use the plan's own `Context & Research` and `Sources & References` as evidence. If those sections cite a pattern, learning, or risk that never affects decisions, implementation units, or verification, treat that as a confidence gap.

-### 1. Parse and Analyze Plan Structure
+### Phase 3: Select Targeted Research Agents

-<thinking>
-First, read and parse the plan to identify each major section that can be enhanced with research.
-</thinking>
+For each selected section, choose the smallest useful agent set. Do **not** run every agent. Use at most **1-3 agents per section** and usually no more than **8 agents total**.

-**Read the plan file and extract:**
- [ ] Overview/Problem Statement
- [ ] Proposed Solution sections
- [ ] Technical Approach/Architecture
- [ ] Implementation phases/steps
- [ ] Code examples and file references
- [ ] Acceptance criteria
- [ ] Any UI/UX components mentioned
- [ ] Technologies/frameworks mentioned (Rails, React, Python, TypeScript, etc.)
- [ ] Domain areas (data models, APIs, UI, security, performance, etc.)
+Use fully-qualified agent names inside Task calls.

-**Create a section manifest:**
-```
-Section 1: [Title] - [Brief description of what to research]
-Section 2: [Title] - [Brief description of what to research]
-...
-```
+#### 3.1 Deterministic Section-to-Agent Mapping

-### 2. Discover and Apply Available Skills
+**Requirements Trace / Open Questions classification**
+- `compound-engineering:workflow:spec-flow-analyzer` for missing user flows, edge cases, and handoff gaps
+- `compound-engineering:research:repo-research-analyst` (Scope: `architecture, patterns`) for repo-grounded patterns, conventions, and implementation reality checks

-<thinking>
-Dynamically discover all available skills and match them to plan sections. Don't assume what skills exist - discover them at runtime.
-</thinking>
+**Context & Research / Sources & References gaps**
+- `compound-engineering:research:learnings-researcher` for institutional knowledge and past solved problems
+- `compound-engineering:research:framework-docs-researcher` for official framework or library behavior
+- `compound-engineering:research:best-practices-researcher` for current external patterns and industry guidance
+- Add `compound-engineering:research:git-history-analyzer` only when historical rationale or prior art is materially missing

-**Step 1: Discover ALL available skills from ALL sources**
+**Key Technical Decisions**
+- `compound-engineering:review:architecture-strategist` for design integrity, boundaries, and architectural tradeoffs
+- Add `compound-engineering:research:framework-docs-researcher` or `compound-engineering:research:best-practices-researcher` when the decision needs external grounding beyond repo evidence
+
+**High-Level Technical Design**
+- `compound-engineering:review:architecture-strategist` for validating that the technical design accurately represents the intended approach and identifying gaps
+- `compound-engineering:research:repo-research-analyst` (Scope: `architecture, patterns`) for grounding the technical design in existing repo patterns and conventions
+- Add `compound-engineering:research:best-practices-researcher` when the technical design involves a DSL, API surface, or pattern that benefits from external validation

-```bash
-# 1. Project-local skills (highest priority - project-specific)
-ls .claude/skills/
+**Implementation Units / Verification**
+- `compound-engineering:research:repo-research-analyst` (Scope: `patterns`) for concrete file targets, patterns to follow, and repo-specific sequencing clues
+- `compound-engineering:review:pattern-recognition-specialist` for consistency, duplication risks, and alignment with existing patterns
+- Add `compound-engineering:workflow:spec-flow-analyzer` when sequencing depends on user flow or handoff completeness

-# 2. User's global skills (~/.claude/)
-ls ~/.claude/skills/
+**System-Wide Impact**
+- `compound-engineering:review:architecture-strategist` for cross-boundary effects, interface surfaces, and architectural knock-on impact
+- Add the specific specialist that matches the risk:
+  - `compound-engineering:review:performance-oracle` for scalability, latency, throughput, and resource-risk analysis
+  - `compound-engineering:review:security-sentinel` for auth, validation, exploit surfaces, and security boundary review
+  - `compound-engineering:review:data-integrity-guardian` for migrations, persistent state safety, consistency, and data lifecycle risks

-# 3. compound-engineering plugin skills
-ls ~/.claude/plugins/cache/*/compound-engineering/*/skills/
+**Risks & Dependencies / Operational Notes**
+- Use the specialist that matches the actual risk:
+  - `compound-engineering:review:security-sentinel` for security, auth, privacy, and exploit risk
+  - `compound-engineering:review:data-integrity-guardian` for persistent data safety, constraints, and transaction boundaries
+  - `compound-engineering:review:data-migration-expert` for migration realism, backfills, and production data transformation risk
+  - `compound-engineering:review:deployment-verification-agent` for rollout checklists, rollback planning, and launch verification
+  - `compound-engineering:review:performance-oracle` for capacity, latency, and scaling concerns
+
+#### 3.2 Agent Prompt Shape
+
+For each selected section, pass:
+- The scope prefix from Step 3.1 (e.g., `Scope: architecture, patterns.`) when the agent supports scoped invocation
+- A short plan summary
+- The exact section text
+- Why the section was selected, including which checklist triggers fired
+- The plan depth and risk profile
+- A specific question to answer
+
+Instruct the agent to return:
+- findings that change planning quality
+- stronger rationale, sequencing, verification, risk treatment, or references
+- no implementation code
+- no shell commands
+
+#### 3.3 Choose Research Execution Mode
+
+Use the lightest mode that will work:
+
+- **Direct mode** - Default. Use when the selected section set is small and the parent can safely read the agent outputs inline.
+- **Artifact-backed mode** - Use only when the selected research scope is large enough that inline returns would create unnecessary context pressure.
+
+Signals that justify artifact-backed mode:
+- More than 5 agents are likely to return meaningful findings
+- The selected section excerpts are long enough that repeating them in multiple agent outputs would be wasteful
+- The topic is high-risk and likely to attract bulky source-backed analysis
+- The platform has a history of parent-context instability on large parallel returns
+
+If artifact-backed mode is not clearly warranted, stay in direct mode.
+
+### Phase 4: Run Targeted Research and Review
+
+Launch the selected agents in parallel using the execution mode chosen in Step 3.3. If the current platform does not support parallel dispatch, run them sequentially instead.
+
+Prefer local repo and institutional evidence first. Use external research only when the gap cannot be closed responsibly from repo context or already-cited sources.
+
+If a selected section can be improved by reading the origin document more carefully, do that before dispatching external agents.

-# 4. ALL other installed plugins - check every plugin for skills
-find ~/.claude/plugins/cache -type d -name "skills" 2>/dev/null
+#### 4.1 Direct Mode
+
+Have each selected agent return its findings directly to the parent.

-# 5. Also check installed_plugins.json for all plugin locations
-cat ~/.claude/plugins/installed_plugins.json
-```
+Keep the return payload focused:
+- strongest findings only
+- the evidence or sources that matter
+- the concrete planning improvement implied by the finding

-**Important:** Check EVERY source. Don't assume compound-engineering is the only plugin. Use skills from ANY installed plugin that's relevant.
+If a direct-mode agent starts producing bulky or repetitive output, stop and switch the remaining research to artifact-backed mode instead of letting the parent context bloat.

-**Step 2: For each discovered skill, read its SKILL.md to understand what it does**
+#### 4.2 Artifact-Backed Mode

-```bash
-# For each skill directory found, read its documentation
-cat [skill-path]/SKILL.md
-```
+Use a per-run scratch directory under `.context/compound-engineering/deepen-plan/`, for example `.context/compound-engineering/deepen-plan/<run-id>/` or `.context/compound-engineering/deepen-plan/<plan-filename-stem>/`.
+
+Use the scratch directory only for the current deepening pass.
+
+For each selected agent:
+- give it the same plan summary, section text, trigger rationale, depth, and risk profile described in Step 3.2
+- instruct it to write one compact artifact file for its assigned section or sections
+- have it return only a short completion summary to the parent
+
+Prefer a compact markdown artifact unless machine-readable structure is clearly useful. Each artifact should contain:
+- target section id and title
+- why the section was selected
+- 3-7 findings that materially improve planning quality
+- source-backed rationale, including whether the evidence came from repo context, origin context, institutional learnings, official docs, or external best practices
+- the specific plan change implied by each finding
+- any unresolved tradeoff that should remain explicit in the plan
+
+Artifact rules:
+- no implementation code
+- no shell commands
+- no checkpoint logs or self-diagnostics
+- no duplicated boilerplate across files
+- no judge or merge sub-pipeline
+
+Before synthesis:
+- quickly verify that each selected section has at least one usable artifact
+- if an artifact is missing or clearly malformed, re-run that agent or fall back to direct-mode reasoning for that section instead of building a validation pipeline
+
+If agent outputs conflict:
+- Prefer repo-grounded and origin-grounded evidence over generic advice
+- Prefer official framework documentation over secondary best-practice summaries when the conflict is about library behavior
+- If a real tradeoff remains, record it explicitly in the plan rather than pretending the conflict does not exist
+
+### Phase 5: Synthesize and Rewrite the Plan
+
+Strengthen only the selected sections. Keep the plan coherent and preserve its overall structure.
+
+If artifact-backed mode was used:
+- read the plan, origin document if present, and the selected section artifacts
+- also incorporate any findings already returned inline from direct-mode agents before a mid-run switch, so early results are not silently dropped
+- synthesize in one pass
+- do not create a separate judge, merge, or quality-review phase unless the user explicitly asks for another pass

-**Step 3: Match skills to plan content**
+Allowed changes:
+- Clarify or strengthen decision rationale
+- Tighten requirements trace or origin fidelity
+- Reorder or split implementation units when sequencing is weak
+- Add missing pattern references, file/test paths, or verification outcomes
+- Expand system-wide impact, risks, or rollout treatment where justified
+- Reclassify open questions between `Resolved During Planning` and `Deferred to Implementation` when evidence supports the change
+- Strengthen, replace, or add a High-Level Technical Design section when the work warrants it and the current representation is weak, uses the wrong medium, or is absent where it would help. Preserve the non-prescriptive framing
+- Strengthen or add per-unit technical design fields where the unit's approach is non-obvious and the current approach notes are thin
+- Add an optional deep-plan section only when it materially improves execution quality
+- Add or update `deepened: YYYY-MM-DD` in frontmatter when the plan was substantively improved

-For each skill discovered:
- Read its SKILL.md description
- Check if any plan sections match the skill's domain
- If there's a match, spawn a sub-agent to apply that skill's knowledge
+Do **not**:
+- Add implementation code — no imports, exact method signatures, or framework-specific syntax. Pseudo-code sketches and DSL grammars are allowed in both the top-level High-Level Technical Design section and per-unit technical design fields
+- Add git commands, commit choreography, or exact test command recipes
+- Add generic `Research Insights` subsections everywhere
+- Rewrite the entire plan from scratch
+- Invent new product requirements, scope changes, or success criteria without surfacing them explicitly

-**Step 4: Spawn a sub-agent for EVERY matched skill**
+If research reveals a product-level ambiguity that should change behavior or scope:
+- Do not silently decide it here
+- Record it under `Open Questions`
+- Recommend `ce:brainstorm` if the gap is truly product-defining

-**CRITICAL: For EACH skill that matches, spawn a separate sub-agent and instruct it to USE that skill.**
+### Phase 6: Final Checks and Write the File

-For each matched skill:
-```
-Task general-purpose: "You have the [skill-name] skill available at [skill-path].
+Before writing:
+- Confirm the plan is stronger in specific ways, not merely longer
+- Confirm the planning boundary is intact
+- Confirm the selected sections were actually the weakest ones
+- Confirm origin decisions were preserved when an origin document exists
+- Confirm the final plan still feels right-sized for its depth
+- If artifact-backed mode was used, confirm the scratch artifacts did not become a second hidden plan format

-YOUR JOB: Use this skill on the plan.
+Update the plan file in place by default.

-1. Read the skill: cat [skill-path]/SKILL.md
-2. Follow the skill's instructions exactly
-3. Apply the skill to this content:
-
-[relevant plan section or full plan]
-
-4. Return the skill's full output
-
-The skill tells you what to do - follow it. Execute the skill completely."
-```
-
-**Spawn ALL skill sub-agents in PARALLEL:**
- 1 sub-agent per matched skill
- Each sub-agent reads and uses its assigned skill
- All run simultaneously
- 10, 20, 30 skill sub-agents is fine
-
-**Each sub-agent:**
-1. Reads its skill's SKILL.md
-2. Follows the skill's workflow/instructions
-3. Applies the skill to the plan
-4. Returns whatever the skill produces (code, recommendations, patterns, reviews, etc.)
-
-**Example spawns:**
-```
-Task general-purpose: "Use the dhh-rails-style skill at ~/.claude/plugins/.../dhh-rails-style. Read SKILL.md and apply it to: [Rails sections of plan]"
-
-Task general-purpose: "Use the frontend-design skill at ~/.claude/plugins/.../frontend-design. Read SKILL.md and apply it to: [UI sections of plan]"
-
-Task general-purpose: "Use the agent-native-architecture skill at ~/.claude/plugins/.../agent-native-architecture. Read SKILL.md and apply it to: [agent/tool sections of plan]"
-
-Task general-purpose: "Use the security-patterns skill at ~/.claude/skills/security-patterns. Read SKILL.md and apply it to: [full plan]"
-```
-
-**No limit on skill sub-agents. Spawn one for every skill that could possibly be relevant.**
-
-### 3. Discover and Apply Learnings/Solutions
-
-<thinking>
-Check for documented learnings from /ce:compound. These are solved problems stored as markdown files. Spawn a sub-agent for each learning to check if it's relevant.
-</thinking>
-
-**LEARNINGS LOCATION - Check these exact folders:**
-
-```
-docs/solutions/           <-- PRIMARY: Project-level learnings (created by /ce:compound)
-├── performance-issues/
-│   └── *.md
-├── debugging-patterns/
-│   └── *.md
-├── configuration-fixes/
-│   └── *.md
-├── integration-issues/
-│   └── *.md
-├── deployment-issues/
-│   └── *.md
-└── [other-categories]/
-    └── *.md
-```
-
-**Step 1: Find ALL learning markdown files**
-
-Run these commands to get every learning file:
-
-```bash
-# PRIMARY LOCATION - Project learnings
-find docs/solutions -name "*.md" -type f 2>/dev/null
-
-# If docs/solutions doesn't exist, check alternate locations:
-find .claude/docs -name "*.md" -type f 2>/dev/null
-find ~/.claude/docs -name "*.md" -type f 2>/dev/null
-```
-
-**Step 2: Read frontmatter of each learning to filter**
-
-Each learning file has YAML frontmatter with metadata. Read the first ~20 lines of each file to get:
-
-```yaml
---
-title: "N+1 Query Fix for Briefs"
-category: performance-issues
-tags: [activerecord, n-plus-one, includes, eager-loading]
-module: Briefs
-symptom: "Slow page load, multiple queries in logs"
-root_cause: "Missing includes on association"
---
-```
-
-**For each .md file, quickly scan its frontmatter:**
-
-```bash
-# Read first 20 lines of each learning (frontmatter + summary)
-head -20 docs/solutions/**/*.md
-```
-
-**Step 3: Filter - only spawn sub-agents for LIKELY relevant learnings**
-
-Compare each learning's frontmatter against the plan:
- `tags:` - Do any tags match technologies/patterns in the plan?
- `category:` - Is this category relevant? (e.g., skip deployment-issues if plan is UI-only)
- `module:` - Does the plan touch this module?
- `symptom:` / `root_cause:` - Could this problem occur with the plan?
-
-**SKIP learnings that are clearly not applicable:**
- Plan is frontend-only → skip `database-migrations/` learnings
- Plan is Python → skip `rails-specific/` learnings
- Plan has no auth → skip `authentication-issues/` learnings
-
-**SPAWN sub-agents for learnings that MIGHT apply:**
- Any tag overlap with plan technologies
- Same category as plan domain
- Similar patterns or concerns
-
-**Step 4: Spawn sub-agents for filtered learnings**
-
-For each learning that passes the filter:
-
-```
-Task general-purpose: "
-LEARNING FILE: [full path to .md file]
-
-1. Read this learning file completely
-2. This learning documents a previously solved problem
-
-Check if this learning applies to this plan:
-
---
-[full plan content]
---
-
-If relevant:
- Explain specifically how it applies
- Quote the key insight or solution
- Suggest where/how to incorporate it
-
-If NOT relevant after deeper analysis:
- Say 'Not applicable: [reason]'
-"
-```
-
-**Example filtering:**
-```
-# Found 15 learning files, plan is about "Rails API caching"
-
-# SPAWN (likely relevant):
-docs/solutions/performance-issues/n-plus-one-queries.md      # tags: [activerecord] ✓
-docs/solutions/performance-issues/redis-cache-stampede.md    # tags: [caching, redis] ✓
-docs/solutions/configuration-fixes/redis-connection-pool.md  # tags: [redis] ✓
-
-# SKIP (clearly not applicable):
-docs/solutions/deployment-issues/heroku-memory-quota.md      # not about caching
-docs/solutions/frontend-issues/stimulus-race-condition.md    # plan is API, not frontend
-docs/solutions/authentication-issues/jwt-expiry.md           # plan has no auth
-```
-
-**Spawn sub-agents in PARALLEL for all filtered learnings.**
-
-**These learnings are institutional knowledge - applying them prevents repeating past mistakes.**
-
-### 4. Launch Per-Section Research Agents
-
-<thinking>
-For each major section in the plan, spawn dedicated sub-agents to research improvements. Use the Explore agent type for open-ended research.
-</thinking>
-
-**For each identified section, launch parallel research:**
-
-```
-Task Explore: "Research best practices, patterns, and real-world examples for: [section topic].
-Find:
- Industry standards and conventions
- Performance considerations
- Common pitfalls and how to avoid them
- Documentation and tutorials
-Return concrete, actionable recommendations."
-```
-
-**Also use Context7 MCP for framework documentation:**
-
-For any technologies/frameworks mentioned in the plan, query Context7:
-```
-mcp__plugin_compound-engineering_context7__resolve-library-id: Find library ID for [framework]
-mcp__plugin_compound-engineering_context7__query-docs: Query documentation for specific patterns
-```
-
-**Use WebSearch for current best practices:**
-
-Search for recent (2024-2026) articles, blog posts, and documentation on topics in the plan.
-
-### 5. Discover and Run ALL Review Agents
-
-<thinking>
-Dynamically discover every available agent and run them ALL against the plan. Don't filter, don't skip, don't assume relevance. 40+ parallel agents is fine. Use everything available.
-</thinking>
-
-**Step 1: Discover ALL available agents from ALL sources**
-
-```bash
-# 1. Project-local agents (highest priority - project-specific)
-find .claude/agents -name "*.md" 2>/dev/null
-
-# 2. User's global agents (~/.claude/)
-find ~/.claude/agents -name "*.md" 2>/dev/null
-
-# 3. compound-engineering plugin agents (all subdirectories)
-find ~/.claude/plugins/cache/*/compound-engineering/*/agents -name "*.md" 2>/dev/null
-
-# 4. ALL other installed plugins - check every plugin for agents
-find ~/.claude/plugins/cache -path "*/agents/*.md" 2>/dev/null
-
-# 5. Check installed_plugins.json to find all plugin locations
-cat ~/.claude/plugins/installed_plugins.json
-
-# 6. For local plugins (isLocal: true), check their source directories
-# Parse installed_plugins.json and find local plugin paths
-```
-
-**Important:** Check EVERY source. Include agents from:
- Project `.claude/agents/`
- User's `~/.claude/agents/`
- compound-engineering plugin (but SKIP workflow/ agents - only use review/, research/, design/, docs/)
- ALL other installed plugins (agent-sdk-dev, frontend-design, etc.)
- Any local plugins
-
-**For compound-engineering plugin specifically:**
- USE: `agents/review/*` (all reviewers)
- USE: `agents/research/*` (all researchers)
- USE: `agents/design/*` (design agents)
- USE: `agents/docs/*` (documentation agents)
- SKIP: `agents/workflow/*` (these are workflow orchestrators, not reviewers)
-
-**Step 2: For each discovered agent, read its description**
-
-Read the first few lines of each agent file to understand what it reviews/analyzes.
-
-**Step 3: Launch ALL agents in parallel**
-
-For EVERY agent discovered, launch a Task in parallel:
-
-```
-Task [agent-name]: "Review this plan using your expertise. Apply all your checks and patterns. Plan content: [full plan content]"
-```
-
-**CRITICAL RULES:**
- Do NOT filter agents by "relevance" - run them ALL
- Do NOT skip agents because they "might not apply" - let them decide
- Launch ALL agents in a SINGLE message with multiple Task tool calls
- 20, 30, 40 parallel agents is fine - use everything
- Each agent may catch something others miss
- The goal is MAXIMUM coverage, not efficiency
-
-**Step 4: Also discover and run research agents**
-
-Research agents (like `best-practices-researcher`, `framework-docs-researcher`, `git-history-analyzer`, `repo-research-analyst`) should also be run for relevant plan sections.
-
-### 6. Wait for ALL Agents and Synthesize Everything
-
-<thinking>
-Wait for ALL parallel agents to complete - skills, research agents, review agents, everything. Then synthesize all findings into a comprehensive enhancement.
-</thinking>
-
-**Collect outputs from ALL sources:**
-
-1. **Skill-based sub-agents** - Each skill's full output (code examples, patterns, recommendations)
-2. **Learnings/Solutions sub-agents** - Relevant documented learnings from /ce:compound
-3. **Research agents** - Best practices, documentation, real-world examples
-4. **Review agents** - All feedback from every reviewer (architecture, security, performance, simplicity, etc.)
-5. **Context7 queries** - Framework documentation and patterns
-6. **Web searches** - Current best practices and articles
-
-**For each agent's findings, extract:**
- [ ] Concrete recommendations (actionable items)
- [ ] Code patterns and examples (copy-paste ready)
- [ ] Anti-patterns to avoid (warnings)
- [ ] Performance considerations (metrics, benchmarks)
- [ ] Security considerations (vulnerabilities, mitigations)
- [ ] Edge cases discovered (handling strategies)
- [ ] Documentation links (references)
- [ ] Skill-specific patterns (from matched skills)
- [ ] Relevant learnings (past solutions that apply - prevent repeating mistakes)
-
-**Deduplicate and prioritize:**
- Merge similar recommendations from multiple agents
- Prioritize by impact (high-value improvements first)
- Flag conflicting advice for human review
- Group by plan section
-
-### 7. Enhance Plan Sections
-
-<thinking>
-Merge research findings back into the plan, adding depth without changing the original structure.
-</thinking>
-
-**Enhancement format for each section:**
-
-```markdown
-## [Original Section Title]
-
-[Original content preserved]
-
-### Research Insights
-
-**Best Practices:**
- [Concrete recommendation 1]
- [Concrete recommendation 2]
-
-**Performance Considerations:**
- [Optimization opportunity]
- [Benchmark or metric to target]
-
-**Implementation Details:**
-```[language]
-// Concrete code example from research
-```
-
-**Edge Cases:**
- [Edge case 1 and how to handle]
- [Edge case 2 and how to handle]
-
-**References:**
- [Documentation URL 1]
- [Documentation URL 2]
-```
-
-### 8. Add Enhancement Summary
-
-At the top of the plan, add a summary section:
-
-```markdown
-## Enhancement Summary
-
-**Deepened on:** [Date]
-**Sections enhanced:** [Count]
-**Research agents used:** [List]
-
-### Key Improvements
-1. [Major improvement 1]
-2. [Major improvement 2]
-3. [Major improvement 3]
-
-### New Considerations Discovered
- [Important finding 1]
- [Important finding 2]
-```
-
-### 9. Update Plan File
-
-**Write the enhanced plan:**
- Preserve original filename
- Add `-deepened` suffix if user prefers a new file
- Update any timestamps or metadata
-
-## Output Format
-
-Update the plan file in place (or if user requests a separate file, append `-deepened` after `-plan`, e.g., `2026-01-15-feat-auth-plan-deepened.md`).
-
-## Quality Checks
-
-Before finalizing:
- [ ] All original content preserved
- [ ] Research insights clearly marked and attributed
- [ ] Code examples are syntactically correct
- [ ] Links are valid and relevant
- [ ] No contradictions between sections
- [ ] Enhancement summary accurately reflects changes
+If the user explicitly requests a separate file, insert `-deepened` before the `.md` extension, for example:
+- `docs/plans/2026-03-15-001-feat-example-plan-deepened.md`
+
+If artifact-backed mode was used and the user did not ask to inspect the scratch files:
+- clean up the temporary scratch directory after the plan is safely written
+- if cleanup is not practical on the current platform, say where the artifacts were left and that they are temporary workflow output

 ## Post-Enhancement Options

-After writing the enhanced plan, use the **AskUserQuestion tool** to present these options:
+If substantive changes were made, present next steps using the platform's blocking question tool when available (see Interaction Method). Otherwise, present numbered options in chat and wait for the user's reply before proceeding.

 **Question:** "Plan deepened at `[plan_path]`. What would you like to do next?"

 **Options:**
-1. **View diff** - Show what was added/changed
-2. **Start `/ce:work`** - Begin implementing this enhanced plan
-3. **Deepen further** - Run another round of research on specific sections
-4. **Revert** - Restore original plan (if backup exists)
+1. **View diff** - Show what changed
+2. **Run `document-review` skill** - Improve the updated plan through structured document review
+3. **Start `ce:work` skill** - Begin implementing the plan
+4. **Deepen specific sections further** - Run another targeted deepening pass on named sections

 Based on selection:
- **View diff** → Run `git diff [plan_path]` or show before/after
- **`/ce:work`** → Call the /ce:work command with the plan file path
- **Deepen further** → Ask which sections need more research, then re-run those agents
- **Revert** → Restore from git or backup
+- **View diff** -> Show the important additions and changed sections
+- **`document-review` skill** -> Load the `document-review` skill with the plan path
+- **Start `ce:work` skill** -> Call the `ce:work` skill with the plan path
+- **Deepen specific sections further** -> Ask which sections still feel weak and run another targeted pass only for those sections

-## Example Enhancement
+If no substantive changes were warranted:
+- Say that the plan already appears sufficiently grounded
+- Offer the `document-review` skill or the `ce:work` skill as the next step instead

-**Before (from /workflows:plan):**
-```markdown
-## Technical Approach
-
-Use React Query for data fetching with optimistic updates.
-```
-
-**After (from /workflows:deepen-plan):**
-```markdown
-## Technical Approach
-
-Use React Query for data fetching with optimistic updates.
-
-### Research Insights
-
-**Best Practices:**
- Configure `staleTime` and `cacheTime` based on data freshness requirements
- Use `queryKey` factories for consistent cache invalidation
- Implement error boundaries around query-dependent components
-
-**Performance Considerations:**
- Enable `refetchOnWindowFocus: false` for stable data to reduce unnecessary requests
- Use `select` option to transform and memoize data at query level
- Consider `placeholderData` for instant perceived loading
-
-**Implementation Details:**
-```typescript
-// Recommended query configuration
-const queryClient = new QueryClient({
-  defaultOptions: {
-    queries: {
-      staleTime: 5 * 60 * 1000, // 5 minutes
-      retry: 2,
-      refetchOnWindowFocus: false,
-    },
-  },
-});
-```
-
-**Edge Cases:**
- Handle race conditions with `cancelQueries` on component unmount
- Implement retry logic for transient network failures
- Consider offline support with `persistQueryClient`
-
-**References:**
- https://tanstack.com/query/latest/docs/react/guides/optimistic-updates
- https://tkdodo.eu/blog/practical-react-query
-```
-
-NEVER CODE! Just research and enhance the plan.
+NEVER CODE! Research, challenge, and strengthen the plan.