refactor: redefine deepen-plan as targeted stress test

2026-03-15 14:15:00 -07:00
parent 6e060e9f9e
commit 80818617bc
5 changed files with 301 additions and 518 deletions
--- a/plugins/compound-engineering/skills/ce-plan/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-plan/SKILL.md
@@ -322,6 +322,7 @@ type: [feat|fix|refactor]
 status: active
 date: YYYY-MM-DD
 origin: docs/brainstorms/YYYY-MM-DD-<topic>-requirements.md  # include when planning from a requirements doc
+deepened: YYYY-MM-DD  # optional, set later by deepen-plan when the plan is substantively strengthened
 ---

 # [Plan Title]
@@ -512,7 +513,7 @@ After writing the plan file, present the options using the platform's blocking q

 **Options:**
 1. **Open plan in editor** - Open the plan file for review
-2. **Run `deepen-plan` skill** - Enhance sections with parallel research agents
+2. **Run `deepen-plan` skill** - Stress-test weak sections with targeted research when the plan needs more confidence
 3. **Review and refine** - Improve the plan through structured document review
 4. **Share to Proof** - Upload the plan for collaborative review and sharing
 5. **Start `ce:work` skill** - Begin implementing this plan in the current environment
@@ -538,7 +539,7 @@ Based on selection:
 - **Create Issue** → Follow the Issue Creation section below
 - **Other** → Accept free text for revisions and loop back to options

-If running with ultrathink enabled, or the platform's reasoning/effort level is set to max or extra-high, automatically run the `deepen-plan` skill after plan creation for maximum grounding.
+If running with ultrathink enabled, or the platform's reasoning/effort level is set to max or extra-high, automatically run the `deepen-plan` skill only when the plan is `Standard` or `Deep`, high-risk, or still shows meaningful confidence gaps in decisions, sequencing, system-wide impact, risks, or verification.

 ## Issue Creation

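Reading note (not part of the commit): the candidate-selection rule that the `deepen-plan` diff below introduces in Phase 2 can be sketched in a few lines of Python. This is a minimal sketch, not the skill's implementation, since the skill is prose instructions for an agent. The names `SectionScore`, `total_points`, `select_sections`, and the flags `risk_relevant` and `important` are hypothetical. The sketch reads the "1+ point in a high-risk domain" rule literally and treats "materially important" as a separate caller-supplied flag. Only the upper bound of the 2-5 range is enforced, through `cap`.

```python
from dataclasses import dataclass

# Sections that earn the critical-section bonus in Standard or Deep plans
# (list taken from the Phase 2 text in the diff).
CRITICAL_SECTIONS = {
    "Key Technical Decisions",
    "Implementation Units",
    "System-Wide Impact",
    "Risks & Dependencies",
    "Open Questions",
}


@dataclass
class SectionScore:
    name: str
    triggers: int                # checklist problems that apply
    risk_relevant: bool = False  # hypothetical flag: materially relevant to the risk
    important: bool = False      # hypothetical flag: "materially important" section


def total_points(s: SectionScore, depth: str, high_risk: bool) -> int:
    """Trigger count plus the risk bonus and the critical-section bonus."""
    points = s.triggers
    if high_risk and s.risk_relevant:
        points += 1
    if depth in ("Standard", "Deep") and s.name in CRITICAL_SECTIONS:
        points += 1
    return points


def select_sections(sections, depth, high_risk, cap=5):
    """Return candidate section names, highest score first, at most `cap`."""
    scored = []
    for s in sections:
        points = total_points(s, depth, high_risk)
        # Candidate at 2+ points, or at 1+ point when the topic is
        # high-risk and the section is flagged as important.
        if points >= 2 or (high_risk and s.important and points >= 1):
            scored.append((points, s.name))
    # Stable sort keeps the plan's section order among equal scores.
    scored.sort(key=lambda item: -item[0])
    return [name for _, name in scored[:cap]]


example = [
    SectionScore("Key Technical Decisions", triggers=1),
    SectionScore("Overview", triggers=1),
    SectionScore("Risks & Dependencies", triggers=1, risk_relevant=True),
]
print(select_sections(example, depth="Standard", high_risk=True))
# -> ['Risks & Dependencies', 'Key Technical Decisions']
```

For a user-requested pass on a lightweight plan, the same function could be called with `cap=2` to mirror the 1-2 section limit described in the diff.
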
--- a/plugins/compound-engineering/skills/deepen-plan/SKILL.md
+++ b/plugins/compound-engineering/skills/deepen-plan/SKILL.md
@@ -1,544 +1,321 @@
 ---
 name: deepen-plan
-description: Enhance a plan with parallel research agents for each section to add depth, best practices, and implementation details
+description: Stress-test an existing implementation plan and selectively strengthen weak sections with targeted research. Use when a `ce:plan` output exists but needs more confidence around decisions, sequencing, system-wide impact, risks, or verification. Best for Standard or Deep plans, or high-risk topics such as auth, payments, migrations, external APIs, and security.
 argument-hint: "[path to plan file]"
 ---

-# Deepen Plan - Power Enhancement Mode
+# Deepen Plan

 ## Introduction

 **Note: The current year is 2026.** Use this when searching for recent documentation and best practices.

-This command takes an existing plan (from `/ce:plan`) and enhances each section with parallel research agents. Each major element gets its own dedicated research sub-agent to find:
- Best practices and industry patterns
- Performance optimizations
- UI/UX improvements (if applicable)
- Quality enhancements and edge cases
- Real-world implementation examples
+`ce:plan` does the first planning pass. `deepen-plan` is a second-pass confidence check.

-The result is a deeply grounded, production-ready plan with concrete implementation details.
+Use this skill when the plan already exists and the question is not "Is this document clear?" but rather "Is this plan grounded enough for the complexity and risk involved?"
+
+This skill does **not** turn plans into implementation scripts. It identifies weak sections, runs targeted research only for those sections, and strengthens the plan in place.
+
+`document-review` and `deepen-plan` are different:
+- Use `document-review` when the document needs clarity, simplification, completeness, or scope control
+- Use `deepen-plan` when the document is structurally sound but still needs stronger rationale, sequencing, risk treatment, or system-wide thinking
+
+## Interaction Method
+
+Use the platform's question tool when available. When asking the user a question, prefer the platform's blocking question tool if one exists (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). Otherwise, present numbered options in chat and wait for the user's reply before proceeding.
+
+Ask one question at a time. Prefer a concise single-select choice when natural options exist.

 ## Plan File

 <plan_path> #$ARGUMENTS </plan_path>

-**If the plan path above is empty:**
-1. Check for recent plans: `ls -la docs/plans/`
-2. Ask the user: "Which plan would you like to deepen? Please provide the path (e.g., `docs/plans/2026-01-15-feat-my-feature-plan.md`)."
+If the plan path above is empty:
+1. Check `docs/plans/` for recent files
+2. Ask the user which plan to deepen using the platform's blocking question tool when available (see Interaction Method). Otherwise, present numbered options in chat and wait for the user's reply before proceeding

 Do not proceed until you have a valid plan file path.

-## Main Tasks
+## Core Principles
+
+1. **Stress-test, do not inflate** - Deepening should increase justified confidence, not make the plan longer for its own sake.
+2. **Selective depth only** - Focus on the weakest 2-5 sections rather than enriching everything.
+3. **Preserve the planning boundary** - No implementation code, no git command choreography, no exact test command recipes.
+4. **Use artifact-contained evidence** - Work from the written plan, its `Context & Research`, `Sources & References`, and its origin document when present.
+5. **Respect product boundaries** - Do not invent new product requirements. If deepening reveals a product-level gap, surface it as an open question or route back to `ce:brainstorm`.
+6. **Prioritize risk and cross-cutting impact** - The more dangerous or interconnected the work, the more valuable another planning pass becomes.
+
+## Workflow

-### 1. Parse and Analyze Plan Structure
+### Phase 0: Load the Plan and Decide Whether Deepening Is Warranted
+
+#### 0.1 Read the Plan and Supporting Inputs
+
+Read the plan file completely.
+
+If the plan frontmatter includes an `origin:` path:
+- Read the origin document too
+- Use it to check whether the plan still reflects the product intent, scope boundaries, and success criteria
+
+#### 0.2 Classify Plan Depth and Topic Risk
+
+Determine the plan depth from the document:
+- **Lightweight** - small, bounded, low ambiguity, usually 2-4 implementation units
+- **Standard** - moderate complexity, some technical decisions, usually 3-6 units
+- **Deep** - cross-cutting, high-risk, or strategically important work, usually 4-8 units or phased delivery
+
+Also build a risk profile. Treat these as high-risk signals:
+- Authentication, authorization, or security-sensitive behavior
+- Payments, billing, or financial flows
+- Data migrations, backfills, or persistent data changes
+- External APIs or third-party integrations
+- Privacy, compliance, or user data handling
+- Cross-interface parity or multi-surface behavior
+- Significant rollout, monitoring, or operational concerns
+
+#### 0.3 Decide Whether to Deepen
+
+Use this default:
+- **Lightweight** plans usually do not need deepening unless they are high-risk or the user explicitly requests it
+- **Standard** plans often benefit when one or more important sections still look thin
+- **Deep** or high-risk plans often benefit from a targeted second pass
+
+If the plan already appears sufficiently grounded:
+- Say so briefly
+- Recommend moving to `ce:work` or `document-review`
+- If the user explicitly asked to deepen anyway, continue with a light pass and deepen at most 1-2 sections
+
+### Phase 1: Parse the Current `ce:plan` Structure
+
+Map the plan into the current template. Look for these sections, or their nearest equivalents:
+- `Overview`
+- `Problem Frame`
+- `Requirements Trace`
+- `Scope Boundaries`
+- `Context & Research`
+- `Key Technical Decisions`
+- `Open Questions`
+- `Implementation Units`
+- `System-Wide Impact`
+- `Risks & Dependencies`
+- `Documentation / Operational Notes`
+- `Sources & References`
+- Optional deep-plan sections such as `Alternative Approaches Considered`, `Success Metrics`, `Phased Delivery`, `Risk Analysis & Mitigation`, and `Operational / Rollout Notes`
+
+If the plan was written manually or uses different headings:
+- Map sections by intent rather than exact heading names
+- If a section is structurally present but titled differently, treat it as the equivalent section
+- If the plan truly lacks a section, decide whether that absence is intentional for the plan depth or a confidence gap worth scoring
+
+Also collect:
+- Frontmatter, including existing `deepened:` date if present
+- Number of implementation units
+- Which files and test files are named
+- Which learnings, patterns, or external references are cited
+- Which sections appear omitted because they were unnecessary versus omitted because they are missing
+
+### Phase 2: Score Confidence Gaps
+
+Use a checklist-first, risk-weighted scoring pass.
+
+For each section, compute:
+- **Trigger count** - number of checklist problems that apply
+- **Risk bonus** - add 1 if the topic is high-risk and this section is materially relevant to that risk
+- **Critical-section bonus** - add 1 for `Key Technical Decisions`, `Implementation Units`, `System-Wide Impact`, `Risks & Dependencies`, or `Open Questions` in `Standard` or `Deep` plans
+
+Treat a section as a candidate if:
+- it hits **2+ total points**, or
+- it hits **1+ point** in a high-risk domain and the section is materially important
+
+Choose only the top **2-5** sections by score. If the user explicitly asked to deepen a lightweight plan, cap at **1-2** sections unless the topic is high-risk.
+
+Example:
+- A `Key Technical Decisions` section with 1 checklist trigger and the critical-section bonus scores **2 points** and is a candidate
+- A `Risks & Dependencies` section with 1 checklist trigger in a high-risk migration plan also becomes a candidate because the risk bonus applies
+
+If the plan already has a `deepened:` date:
+- Prefer sections that have not yet been substantially strengthened, if their scores are comparable
+- Revisit an already-deepened section only when it still scores clearly higher than alternatives or the user explicitly asks for another pass on it
+
+#### 2.1 Section Checklists
+
+Use these triggers.
+
+**Requirements Trace**
+- Requirements are vague or disconnected from implementation units
+- Success criteria are missing or not reflected downstream
+- Units do not clearly advance the traced requirements
+- Origin requirements are not clearly carried forward
+
+**Context & Research / Sources & References**
+- Relevant repo patterns are named but never used in decisions or implementation units
+- Cited learnings or references do not materially shape the plan
+- High-risk work lacks appropriate external or internal grounding
+- Research is generic instead of tied to this repo or this plan
+
+**Key Technical Decisions**
+- A decision is stated without rationale
+- Rationale does not explain tradeoffs or rejected alternatives
+- The decision does not connect back to scope, requirements, or origin context
+- An obvious design fork exists but the plan never addresses why one path won
+
+**Open Questions**
+- Product blockers are hidden as assumptions
+- Planning-owned questions are incorrectly deferred to implementation
+- Resolved questions have no clear basis in repo context, research, or origin decisions
+- Deferred items are too vague to be useful later
+
+**Implementation Units**
+- Dependency order is unclear or likely wrong
+- File paths or test file paths are missing where they should be explicit
+- Units are too large, too vague, or broken into micro-steps
+- Approach notes are thin or do not name the pattern to follow
+- Test scenarios or verification outcomes are vague
+
+**System-Wide Impact**
+- Affected interfaces, callbacks, middleware, entry points, or parity surfaces are missing
+- Failure propagation is underexplored
+- State lifecycle, caching, or data integrity risks are absent where relevant
+- Integration coverage is weak for cross-layer work
+
+**Risks & Dependencies / Documentation / Operational Notes**
+- Risks are listed without mitigation
+- Rollout, monitoring, migration, or support implications are missing when warranted
+- External dependency assumptions are weak or unstated
+- Security, privacy, performance, or data risks are absent where they obviously apply
+
+Use the plan's own `Context & Research` and `Sources & References` as evidence. If those sections cite a pattern, learning, or risk that never affects decisions, implementation units, or verification, treat that as a confidence gap.
+
+### Phase 3: Select Targeted Research Agents
+
+For each selected section, choose the smallest useful agent set. Do **not** run every agent. Use at most **1-3 agents per section** and usually no more than **8 agents total**.
+
+Use fully-qualified agent names inside Task calls.
+
+#### 3.1 Deterministic Section-to-Agent Mapping
+
+**Requirements Trace / Open Questions classification**
+- `compound-engineering:workflow:spec-flow-analyzer` for missing user flows, edge cases, and handoff gaps
+- `compound-engineering:research:repo-research-analyst` for repo-grounded patterns, conventions, and implementation reality checks
+
+**Context & Research / Sources & References gaps**
+- `compound-engineering:research:learnings-researcher` for institutional knowledge and past solved problems
+- `compound-engineering:research:framework-docs-researcher` for official framework or library behavior
+- `compound-engineering:research:best-practices-researcher` for current external patterns and industry guidance
+- Add `compound-engineering:research:git-history-analyzer` only when historical rationale or prior art is materially missing
+
+**Key Technical Decisions**
+- `compound-engineering:review:architecture-strategist` for design integrity, boundaries, and architectural tradeoffs
+- Add `compound-engineering:research:framework-docs-researcher` or `compound-engineering:research:best-practices-researcher` when the decision needs external grounding beyond repo evidence
+
+**Implementation Units / Verification**
+- `compound-engineering:research:repo-research-analyst` for concrete file targets, patterns to follow, and repo-specific sequencing clues
+- `compound-engineering:review:pattern-recognition-specialist` for consistency, duplication risks, and alignment with existing patterns
+- Add `compound-engineering:workflow:spec-flow-analyzer` when sequencing depends on user flow or handoff completeness
+
+**System-Wide Impact**
+- `compound-engineering:review:architecture-strategist` for cross-boundary effects, interface surfaces, and architectural knock-on impact
+- Add the specific specialist that matches the risk:
+  - `compound-engineering:review:performance-oracle` for scalability, latency, throughput, and resource-risk analysis
+  - `compound-engineering:review:security-sentinel` for auth, validation, exploit surfaces, and security boundary review
+  - `compound-engineering:review:data-integrity-guardian` for migrations, persistent state safety, consistency, and data lifecycle risks
+
+**Risks & Dependencies / Operational Notes**
+- Use the specialist that matches the actual risk:
+  - `compound-engineering:review:security-sentinel` for security, auth, privacy, and exploit risk
+  - `compound-engineering:review:data-integrity-guardian` for persistent data safety, constraints, and transaction boundaries
+  - `compound-engineering:review:data-migration-expert` for migration realism, backfills, and production data transformation risk
+  - `compound-engineering:review:deployment-verification-agent` for rollout checklists, rollback planning, and launch verification
+  - `compound-engineering:review:performance-oracle` for capacity, latency, and scaling concerns
+
+#### 3.2 Agent Prompt Shape
+
+For each selected section, pass:
+- A short plan summary
+- The exact section text
+- Why the section was selected, including which checklist triggers fired
+- The plan depth and risk profile
+- A specific question to answer
+
+Instruct the agent to return:
+- findings that change planning quality
+- stronger rationale, sequencing, verification, risk treatment, or references
+- no implementation code
+- no shell commands
+
+### Phase 4: Run Targeted Research and Review
+
+Launch the selected agents in parallel.
+
+Prefer local repo and institutional evidence first. Use external research only when the gap cannot be closed responsibly from repo context or already-cited sources.
+
+If a selected section can be improved by reading the origin document more carefully, do that before dispatching external agents.
+
+If agent outputs conflict:
+- Prefer repo-grounded and origin-grounded evidence over generic advice
+- Prefer official framework documentation over secondary best-practice summaries when the conflict is about library behavior
+- If a real tradeoff remains, record it explicitly in the plan rather than pretending the conflict does not exist
+
+### Phase 5: Synthesize and Rewrite the Plan
+
+Strengthen only the selected sections. Keep the plan coherent and preserve its overall structure.
+
+Allowed changes:
+- Clarify or strengthen decision rationale
+- Tighten requirements trace or origin fidelity
+- Reorder or split implementation units when sequencing is weak
+- Add missing pattern references, file/test paths, or verification outcomes
+- Expand system-wide impact, risks, or rollout treatment where justified
+- Reclassify open questions between `Resolved During Planning` and `Deferred to Implementation` when evidence supports the change
+- Add an optional deep-plan section only when it materially improves execution quality
+- Add or update `deepened: YYYY-MM-DD` in frontmatter when the plan was substantively improved
+
+Do **not**:
+- Add fenced implementation code blocks unless the plan itself is about code shape as a design artifact
+- Add git commands, commit choreography, or exact test command recipes
+- Add generic `Research Insights` subsections everywhere
+- Rewrite the entire plan from scratch
+- Invent new product requirements, scope changes, or success criteria without surfacing them explicitly

-<thinking>
-First, read and parse the plan to identify each major section that can be enhanced with research.
-</thinking>
+If research reveals a product-level ambiguity that should change behavior or scope:
+- Do not silently decide it here
+- Record it under `Open Questions`
+- Recommend `ce:brainstorm` if the gap is truly product-defining

-**Read the plan file and extract:**
- [ ] Overview/Problem Statement
- [ ] Proposed Solution sections
- [ ] Technical Approach/Architecture
- [ ] Implementation phases/steps
- [ ] Code examples and file references
- [ ] Acceptance criteria
- [ ] Any UI/UX components mentioned
- [ ] Technologies/frameworks mentioned (Rails, React, Python, TypeScript, etc.)
- [ ] Domain areas (data models, APIs, UI, security, performance, etc.)
+### Phase 6: Final Checks and Write the File

-**Create a section manifest:**
-```
-Section 1: [Title] - [Brief description of what to research]
-Section 2: [Title] - [Brief description of what to research]
-...
-```
-
-### 2. Discover and Apply Available Skills
-
-<thinking>
-Dynamically discover all available skills and match them to plan sections. Don't assume what skills exist - discover them at runtime.
-</thinking>
-
-**Step 1: Discover ALL available skills from ALL sources**
-
-```bash
-# 1. Project-local skills (highest priority - project-specific)
-ls .claude/skills/
-
-# 2. User's global skills (~/.claude/)
-ls ~/.claude/skills/
-
-# 3. compound-engineering plugin skills
-ls ~/.claude/plugins/cache/*/compound-engineering/*/skills/
-
-# 4. ALL other installed plugins - check every plugin for skills
-find ~/.claude/plugins/cache -type d -name "skills" 2>/dev/null
-
-# 5. Also check installed_plugins.json for all plugin locations
-cat ~/.claude/plugins/installed_plugins.json
-```
-
-**Important:** Check EVERY source. Don't assume compound-engineering is the only plugin. Use skills from ANY installed plugin that's relevant.
-
-**Step 2: For each discovered skill, read its SKILL.md to understand what it does**
-
-```bash
-# For each skill directory found, read its documentation
-cat [skill-path]/SKILL.md
-```
-
-**Step 3: Match skills to plan content**
-
-For each skill discovered:
- Read its SKILL.md description
- Check if any plan sections match the skill's domain
- If there's a match, spawn a sub-agent to apply that skill's knowledge
-
-**Step 4: Spawn a sub-agent for EVERY matched skill**
-
-**CRITICAL: For EACH skill that matches, spawn a separate sub-agent and instruct it to USE that skill.**
-
-For each matched skill:
-```
-Task general-purpose: "You have the [skill-name] skill available at [skill-path].
-
-YOUR JOB: Use this skill on the plan.
-
-1. Read the skill: cat [skill-path]/SKILL.md
-2. Follow the skill's instructions exactly
-3. Apply the skill to this content:
-
-[relevant plan section or full plan]
-
-4. Return the skill's full output
-
-The skill tells you what to do - follow it. Execute the skill completely."
-```
-
-**Spawn ALL skill sub-agents in PARALLEL:**
- 1 sub-agent per matched skill
- Each sub-agent reads and uses its assigned skill
- All run simultaneously
- 10, 20, 30 skill sub-agents is fine
-
-**Each sub-agent:**
-1. Reads its skill's SKILL.md
-2. Follows the skill's workflow/instructions
-3. Applies the skill to the plan
-4. Returns whatever the skill produces (code, recommendations, patterns, reviews, etc.)
-
-**Example spawns:**
-```
-Task general-purpose: "Use the dhh-rails-style skill at ~/.claude/plugins/.../dhh-rails-style. Read SKILL.md and apply it to: [Rails sections of plan]"
-
-Task general-purpose: "Use the frontend-design skill at ~/.claude/plugins/.../frontend-design. Read SKILL.md and apply it to: [UI sections of plan]"
-
-Task general-purpose: "Use the agent-native-architecture skill at ~/.claude/plugins/.../agent-native-architecture. Read SKILL.md and apply it to: [agent/tool sections of plan]"
-
-Task general-purpose: "Use the security-patterns skill at ~/.claude/skills/security-patterns. Read SKILL.md and apply it to: [full plan]"
-```
-
-**No limit on skill sub-agents. Spawn one for every skill that could possibly be relevant.**
-
-### 3. Discover and Apply Learnings/Solutions
-
-<thinking>
-Check for documented learnings from /ce:compound. These are solved problems stored as markdown files. Spawn a sub-agent for each learning to check if it's relevant.
-</thinking>
-
-**LEARNINGS LOCATION - Check these exact folders:**
-
-```
-docs/solutions/           <-- PRIMARY: Project-level learnings (created by /ce:compound)
-├── performance-issues/
-│   └── *.md
-├── debugging-patterns/
-│   └── *.md
-├── configuration-fixes/
-│   └── *.md
-├── integration-issues/
-│   └── *.md
-├── deployment-issues/
-│   └── *.md
-└── [other-categories]/
-    └── *.md
-```
-
-**Step 1: Find ALL learning markdown files**
-
-Run these commands to get every learning file:
-
-```bash
-# PRIMARY LOCATION - Project learnings
-find docs/solutions -name "*.md" -type f 2>/dev/null
-
-# If docs/solutions doesn't exist, check alternate locations:
-find .claude/docs -name "*.md" -type f 2>/dev/null
-find ~/.claude/docs -name "*.md" -type f 2>/dev/null
-```
-
-**Step 2: Read frontmatter of each learning to filter**
-
-Each learning file has YAML frontmatter with metadata. Read the first ~20 lines of each file to get:
-
-```yaml
---
-title: "N+1 Query Fix for Briefs"
-category: performance-issues
-tags: [activerecord, n-plus-one, includes, eager-loading]
-module: Briefs
-symptom: "Slow page load, multiple queries in logs"
-root_cause: "Missing includes on association"
---
-```
-
-**For each .md file, quickly scan its frontmatter:**
-
-```bash
-# Read first 20 lines of each learning (frontmatter + summary)
-head -20 docs/solutions/**/*.md
-```
-
-**Step 3: Filter - only spawn sub-agents for LIKELY relevant learnings**
-
-Compare each learning's frontmatter against the plan:
- `tags:` - Do any tags match technologies/patterns in the plan?
- `category:` - Is this category relevant? (e.g., skip deployment-issues if plan is UI-only)
- `module:` - Does the plan touch this module?
- `symptom:` / `root_cause:` - Could this problem occur with the plan?
-
-**SKIP learnings that are clearly not applicable:**
- Plan is frontend-only → skip `database-migrations/` learnings
- Plan is Python → skip `rails-specific/` learnings
- Plan has no auth → skip `authentication-issues/` learnings
-
-**SPAWN sub-agents for learnings that MIGHT apply:**
- Any tag overlap with plan technologies
- Same category as plan domain
- Similar patterns or concerns
-
-**Step 4: Spawn sub-agents for filtered learnings**
-
-For each learning that passes the filter:
-
-```
-Task general-purpose: "
-LEARNING FILE: [full path to .md file]
-
-1. Read this learning file completely
-2. This learning documents a previously solved problem
-
-Check if this learning applies to this plan:
-
---
-[full plan content]
---
-
-If relevant:
- Explain specifically how it applies
- Quote the key insight or solution
- Suggest where/how to incorporate it
-
-If NOT relevant after deeper analysis:
- Say 'Not applicable: [reason]'
-"
-```
-
-**Example filtering:**
-```
-# Found 15 learning files, plan is about "Rails API caching"
-
-# SPAWN (likely relevant):
-docs/solutions/performance-issues/n-plus-one-queries.md      # tags: [activerecord] ✓
-docs/solutions/performance-issues/redis-cache-stampede.md    # tags: [caching, redis] ✓
-docs/solutions/configuration-fixes/redis-connection-pool.md  # tags: [redis] ✓
-
-# SKIP (clearly not applicable):
-docs/solutions/deployment-issues/heroku-memory-quota.md      # not about caching
-docs/solutions/frontend-issues/stimulus-race-condition.md    # plan is API, not frontend
-docs/solutions/authentication-issues/jwt-expiry.md           # plan has no auth
-```
-
-**Spawn sub-agents in PARALLEL for all filtered learnings.**
-
-**These learnings are institutional knowledge - applying them prevents repeating past mistakes.**
-
-### 4. Launch Per-Section Research Agents
-
-<thinking>
-For each major section in the plan, spawn dedicated sub-agents to research improvements. Use the Explore agent type for open-ended research.
-</thinking>
-
-**For each identified section, launch parallel research:**
-
-```
-Task Explore: "Research best practices, patterns, and real-world examples for: [section topic].
-Find:
- Industry standards and conventions
- Performance considerations
- Common pitfalls and how to avoid them
- Documentation and tutorials
-Return concrete, actionable recommendations."
-```
-
-**Also use Context7 MCP for framework documentation:**
-
-For any technologies/frameworks mentioned in the plan, query Context7:
-```
-mcp__plugin_compound-engineering_context7__resolve-library-id: Find library ID for [framework]
-mcp__plugin_compound-engineering_context7__query-docs: Query documentation for specific patterns
-```
-
-**Use WebSearch for current best practices:**
-
-Search for recent (2024-2026) articles, blog posts, and documentation on topics in the plan.
-
-### 5. Discover and Run ALL Review Agents
-
-<thinking>
-Dynamically discover every available agent and run them ALL against the plan. Don't filter, don't skip, don't assume relevance. 40+ parallel agents is fine. Use everything available.
-</thinking>
-
-**Step 1: Discover ALL available agents from ALL sources**
-
-```bash
-# 1. Project-local agents (highest priority - project-specific)
-find .claude/agents -name "*.md" 2>/dev/null
-
-# 2. User's global agents (~/.claude/)
-find ~/.claude/agents -name "*.md" 2>/dev/null
-
-# 3. compound-engineering plugin agents (all subdirectories)
-find ~/.claude/plugins/cache/*/compound-engineering/*/agents -name "*.md" 2>/dev/null
-
-# 4. ALL other installed plugins - check every plugin for agents
-find ~/.claude/plugins/cache -path "*/agents/*.md" 2>/dev/null
-
-# 5. Check installed_plugins.json to find all plugin locations
-cat ~/.claude/plugins/installed_plugins.json
-
-# 6. For local plugins (isLocal: true), check their source directories
-# Parse installed_plugins.json and find local plugin paths
-```
-
-**Important:** Check EVERY source. Include agents from:
- Project `.claude/agents/`
- User's `~/.claude/agents/`
- compound-engineering plugin (but SKIP workflow/ agents - only use review/, research/, design/, docs/)
- ALL other installed plugins (agent-sdk-dev, frontend-design, etc.)
- Any local plugins
-
-**For compound-engineering plugin specifically:**
- USE: `agents/review/*` (all reviewers)
- USE: `agents/research/*` (all researchers)
- USE: `agents/design/*` (design agents)
- USE: `agents/docs/*` (documentation agents)
- SKIP: `agents/workflow/*` (these are workflow orchestrators, not reviewers)
-
-**Step 2: For each discovered agent, read its description**
-
-Read the first few lines of each agent file to understand what it reviews/analyzes.
-
-**Step 3: Launch ALL agents in parallel**
-
-For EVERY agent discovered, launch a Task in parallel:
-
-```
-Task [agent-name]: "Review this plan using your expertise. Apply all your checks and patterns. Plan content: [full plan content]"
-```
-
-**CRITICAL RULES:**
- Do NOT filter agents by "relevance" - run them ALL
- Do NOT skip agents because they "might not apply" - let them decide
- Launch ALL agents in a SINGLE message with multiple Task tool calls
- 20, 30, 40 parallel agents is fine - use everything
- Each agent may catch something others miss
- The goal is MAXIMUM coverage, not efficiency
-
-**Step 4: Also discover and run research agents**
-
-Research agents (like `best-practices-researcher`, `framework-docs-researcher`, `git-history-analyzer`, `repo-research-analyst`) should also be run for relevant plan sections.
-
-### 6. Wait for ALL Agents and Synthesize Everything
-
-<thinking>
-Wait for ALL parallel agents to complete - skills, research agents, review agents, everything. Then synthesize all findings into a comprehensive enhancement.
-</thinking>
-
-**Collect outputs from ALL sources:**
-
-1. **Skill-based sub-agents** - Each skill's full output (code examples, patterns, recommendations)
-2. **Learnings/Solutions sub-agents** - Relevant documented learnings from /ce:compound
-3. **Research agents** - Best practices, documentation, real-world examples
-4. **Review agents** - All feedback from every reviewer (architecture, security, performance, simplicity, etc.)
-5. **Context7 queries** - Framework documentation and patterns
-6. **Web searches** - Current best practices and articles
-
-**For each agent's findings, extract:**
- [ ] Concrete recommendations (actionable items)
- [ ] Code patterns and examples (copy-paste ready)
- [ ] Anti-patterns to avoid (warnings)
- [ ] Performance considerations (metrics, benchmarks)
- [ ] Security considerations (vulnerabilities, mitigations)
- [ ] Edge cases discovered (handling strategies)
- [ ] Documentation links (references)
- [ ] Skill-specific patterns (from matched skills)
- [ ] Relevant learnings (past solutions that apply - prevent repeating mistakes)
-
-**Deduplicate and prioritize:**
- Merge similar recommendations from multiple agents
- Prioritize by impact (high-value improvements first)
- Flag conflicting advice for human review
- Group by plan section
-
-### 7. Enhance Plan Sections
-
-<thinking>
-Merge research findings back into the plan, adding depth without changing the original structure.
-</thinking>
-
-**Enhancement format for each section:**
-
-```markdown
-## [Original Section Title]
-
-[Original content preserved]
-
-### Research Insights
-
-**Best Practices:**
- [Concrete recommendation 1]
- [Concrete recommendation 2]
-
-**Performance Considerations:**
- [Optimization opportunity]
- [Benchmark or metric to target]
-
-**Implementation Details:**
-```[language]
-// Concrete code example from research
-```
-
-**Edge Cases:**
- [Edge case 1 and how to handle]
- [Edge case 2 and how to handle]
-
-**References:**
- [Documentation URL 1]
- [Documentation URL 2]
-```
-
-### 8. Add Enhancement Summary
-
-At the top of the plan, add a summary section:
-
-```markdown
-## Enhancement Summary
-
-**Deepened on:** [Date]
-**Sections enhanced:** [Count]
-**Research agents used:** [List]
-
-### Key Improvements
-1. [Major improvement 1]
-2. [Major improvement 2]
-3. [Major improvement 3]
-
-### New Considerations Discovered
- [Important finding 1]
- [Important finding 2]
-```
-
-### 9. Update Plan File
-
-**Write the enhanced plan:**
- Preserve original filename
- Add `-deepened` suffix if user prefers a new file
- Update any timestamps or metadata
-
-## Output Format
-
-Update the plan file in place (or if user requests a separate file, append `-deepened` after `-plan`, e.g., `2026-01-15-feat-auth-plan-deepened.md`).
-
-## Quality Checks
-
-Before finalizing:
- [ ] All original content preserved
- [ ] Research insights clearly marked and attributed
- [ ] Code examples are syntactically correct
- [ ] Links are valid and relevant
- [ ] No contradictions between sections
- [ ] Enhancement summary accurately reflects changes
+Before writing:
+- Confirm the plan is stronger in specific ways, not merely longer
+- Confirm the planning boundary is intact
+- Confirm the selected sections were actually the weakest ones
+- Confirm origin decisions were preserved when an origin document exists
+- Confirm the final plan still feels right-sized for its depth
+
+Update the plan file in place by default.
+
+If the user explicitly requests a separate file, insert `-deepened` before the `.md` extension, for example:
+- `docs/plans/2026-03-15-001-feat-example-plan-deepened.md`

 ## Post-Enhancement Options

-After writing the enhanced plan, use the **AskUserQuestion tool** to present these options:
+If substantive changes were made, present next steps using the platform's blocking question tool when available (see Interaction Method). Otherwise, present numbered options in chat and wait for the user's reply before proceeding.

 **Question:** "Plan deepened at `[plan_path]`. What would you like to do next?"

 **Options:**
-1. **View diff** - Show what was added/changed
-2. **Start `/ce:work`** - Begin implementing this enhanced plan
-3. **Deepen further** - Run another round of research on specific sections
-4. **Revert** - Restore original plan (if backup exists)
+1. **View diff** - Show what changed
+2. **Review and refine** - Run the `document-review` skill on the updated plan
+3. **Start `ce:work` skill** - Begin implementing the plan
+4. **Deepen specific sections further** - Run another targeted deepening pass on named sections

 Based on selection:
- **View diff** → Run `git diff [plan_path]` or show before/after
- **`/ce:work`** → Call the /ce:work command with the plan file path
- **Deepen further** → Ask which sections need more research, then re-run those agents
- **Revert** → Restore from git or backup
+- **View diff** -> Show the important additions and changed sections
+- **Review and refine** -> Load the `document-review` skill with the plan path
+- **Start `ce:work` skill** -> Call the `ce:work` skill with the plan path
+- **Deepen specific sections further** -> Ask which sections still feel weak and run another targeted pass only for those sections

-## Example Enhancement
+If no substantive changes were warranted:
+- Say that the plan already appears sufficiently grounded
+- Offer `document-review` or `ce:work` as the next step instead

-**Before (from /workflows:plan):**
-```markdown
-## Technical Approach
-
-Use React Query for data fetching with optimistic updates.
-```
-
-**After (from /workflows:deepen-plan):**
-```markdown
-## Technical Approach
-
-Use React Query for data fetching with optimistic updates.
-
-### Research Insights
-
-**Best Practices:**
- Configure `staleTime` and `cacheTime` based on data freshness requirements
- Use `queryKey` factories for consistent cache invalidation
- Implement error boundaries around query-dependent components
-
-**Performance Considerations:**
- Enable `refetchOnWindowFocus: false` for stable data to reduce unnecessary requests
- Use `select` option to transform and memoize data at query level
- Consider `placeholderData` for instant perceived loading
-
-**Implementation Details:**
-```typescript
-// Recommended query configuration
-const queryClient = new QueryClient({
-  defaultOptions: {
-    queries: {
-      staleTime: 5 * 60 * 1000, // 5 minutes
-      retry: 2,
-      refetchOnWindowFocus: false,
-    },
-  },
-});
-```
-
-**Edge Cases:**
- Handle race conditions with `cancelQueries` on component unmount
- Implement retry logic for transient network failures
- Consider offline support with `persistQueryClient`
-
-**References:**
- https://tanstack.com/query/latest/docs/react/guides/optimistic-updates
- https://tkdodo.eu/blog/practical-react-query
-```
-
-NEVER CODE! Just research and enhance the plan.
+NEVER CODE! Research, challenge, and strengthen the plan.
--- a/plugins/compound-engineering/skills/lfg/SKILL.md
+++ b/plugins/compound-engineering/skills/lfg/SKILL.md
@@ -5,17 +5,19 @@ argument-hint: "[feature description]"
 disable-model-invocation: true
 ---

-CRITICAL: You MUST execute every step below IN ORDER. Do NOT skip any step. Do NOT jump ahead to coding or implementation. The plan phase (steps 2-3) MUST be completed and verified BEFORE any work begins. Violating this order produces bad output.
+CRITICAL: You MUST execute every step below IN ORDER. Do NOT skip any required step. Do NOT jump ahead to coding or implementation. The plan phase (step 2, and step 3 when warranted) MUST be completed and verified BEFORE any work begins. Violating this order produces bad output.

 1. **Optional:** If the `ralph-wiggum` skill is available, run `/ralph-wiggum:ralph-loop "finish all slash commands" --completion-promise "DONE"`. If not available or it fails, skip and continue to step 2 immediately.

 2. `/ce:plan $ARGUMENTS`

-   GATE: STOP. Verify that `/ce:plan` produced a plan file in `docs/plans/`. If no plan file was created, run `/ce:plan $ARGUMENTS` again. Do NOT proceed to step 3 until a written plan exists.
+   GATE: STOP. Verify that the `ce:plan` workflow produced a plan file in `docs/plans/`. If no plan file was created, run `/ce:plan $ARGUMENTS` again. Do NOT proceed to step 3 until a written plan exists.

-3. `/compound-engineering:deepen-plan`
+3. **Conditionally** run `/compound-engineering:deepen-plan`

-   GATE: STOP. Confirm the plan has been deepened and updated. The plan file in `docs/plans/` should now contain additional detail. Do NOT proceed to step 4 without a deepened plan.
+   Run the `deepen-plan` workflow only if the plan is `Standard` or `Deep`, touches a high-risk area (auth, security, payments, migrations, external APIs, significant rollout concerns), or still has obvious confidence gaps in decisions, sequencing, system-wide impact, risks, or verification.
+
+   GATE: STOP. If you ran the `deepen-plan` workflow, confirm the plan was deepened or explicitly judged sufficiently grounded. If you skipped it, briefly note why and proceed to step 4.

 4. `/ce:work`

--- a/plugins/compound-engineering/skills/slfg/SKILL.md
+++ b/plugins/compound-engineering/skills/slfg/SKILL.md
@@ -11,7 +11,10 @@ Swarm-enabled LFG. Run these steps in order, parallelizing where indicated. Do n

 1. **Optional:** If the `ralph-wiggum` skill is available, run `/ralph-wiggum:ralph-loop "finish all slash commands" --completion-promise "DONE"`. If not available or it fails, skip and continue to step 2 immediately.
 2. `/ce:plan $ARGUMENTS`
-3. `/compound-engineering:deepen-plan`
+3. **Conditionally** run `/compound-engineering:deepen-plan`
+   - Run the `deepen-plan` workflow only if the plan is `Standard` or `Deep`, touches a high-risk area (auth, security, payments, migrations, external APIs, significant rollout concerns), or still has obvious confidence gaps in decisions, sequencing, system-wide impact, risks, or verification
+   - If you run the `deepen-plan` workflow, confirm the plan was deepened or explicitly judged sufficiently grounded before moving on
+   - If you skip it, note why and continue to step 4
 4. `/ce:work` — **Use swarm mode**: Make a Task list and launch an army of agent swarm subagents to build the plan

 ## Parallel Phase