Merge pull request #272 from EveryInc/feat/ce-plan-rewrite-brainstorm

feat: add ce:plan-beta and deepen-plan-beta skills
This commit is contained in:
Trevin Chow
2026-03-17 10:40:10 -07:00
committed by GitHub
11 changed files with 1200 additions and 79 deletions

View File

@@ -11,7 +11,7 @@
"plugins": [
{
"name": "compound-engineering",
"description": "AI-powered development tools that get smarter with every use. Make each unit of engineering work easier than the last. Includes 29 specialized agents and 42 skills.",
"description": "AI-powered development tools that get smarter with every use. Make each unit of engineering work easier than the last. Includes 29 specialized agents and 44 skills.",
"version": "2.41.0",
"author": {
"name": "Kieran Klaassen",

View File

@@ -0,0 +1,85 @@
---
date: 2026-03-14
topic: ce-plan-rewrite
---
# Rewrite `ce:plan` to Separate Planning from Implementation
## Problem Frame
`ce:plan` sits between `ce:brainstorm` and `ce:work`, but the current skill mixes issue authoring, technical planning, and pseudo-implementation. That makes plans brittle and pushes the planning phase to predict details that are often only discoverable during implementation. PR #246 intensifies this by asking plans to include complete code, exact commands, and micro-step TDD and commit choreography. The rewrite should keep planning strong enough for a capable agent or engineer to execute, while moving code-writing, test-running, and execution-time learning back into `ce:work`.
## Requirements
- R1. `ce:plan` must accept either a raw feature description or a requirements document produced by `ce:brainstorm` as primary input.
- R2. `ce:plan` must preserve compound-engineering's planning strengths: repo pattern scan, institutional learnings, conditional external research, and requirements-gap checks when warranted.
- R3. `ce:plan` must produce a durable implementation plan focused on decisions, sequencing, file paths, dependencies, risks, and test scenarios, not implementation code.
- R4. `ce:plan` must not instruct the planner to run tests, generate exact implementation snippets, or learn from execution-time results. Those belong to `ce:work`.
- R5. Plan tasks and subtasks must be right-sized for implementation handoff, but sized as logical units or atomic commits rather than 2-5 minute copy-paste steps.
- R6. Plans must remain shareable and portable as documents or issues without tool-specific executor litter such as TodoWrite instructions, `/ce:work` choreography, or git command recipes in the artifact itself.
- R7. `ce:plan` must carry forward product decisions, scope boundaries, success criteria, and deferred questions from `ce:brainstorm` without re-inventing them.
- R8. `ce:plan` must explicitly distinguish what gets resolved during planning from what is intentionally deferred to implementation-time discovery.
- R9. `ce:plan` must hand off cleanly to `ce:work`, giving enough information for task creation without pre-writing code.
- R10. If detail levels remain, they must change depth of analysis and documentation, not the planning philosophy. A small plan can be terse while still staying decision-first.
- R11. If an upstream requirements document contains unresolved `Resolve Before Planning` items, `ce:plan` must classify whether they are true product blockers or misfiled technical questions before proceeding.
- R12. `ce:plan` must not plan past unresolved product decisions that would change behavior, scope, or success criteria, but it may absorb technical or research questions by reclassifying them into planning-owned investigation.
- R13. When true blockers remain, `ce:plan` must pause helpfully: surface the blockers, allow the user to convert them into explicit assumptions or decisions, or route them back to `ce:brainstorm`.
## Success Criteria
- A fresh implementer can start work from the plan without needing clarifying questions, but the plan does not contain implementation code.
- `ce:work` can derive actionable tasks from the plan without relying on micro-step commands or embedded git/test instructions.
- Plans stay accurate longer as repo context changes because they capture decisions and boundaries rather than speculative code.
- A requirements document from `ce:brainstorm` flows into planning without losing decisions, scope boundaries, or success criteria.
- Plans do not proceed past unresolved product blockers unless the user explicitly converts them into assumptions or decisions.
- For the same feature, the rewritten `ce:plan` produces output that is materially shorter and less brittle than the current skill or PR #246's proposed format while remaining execution-ready.
## Scope Boundaries
- Do not redesign `ce:brainstorm`'s product-definition role.
- Do not remove decomposition, file paths, verification, or risk analysis from `ce:plan`.
- Do not move planning into a vague, under-specified artifact that leaves execution to guess.
- Do not change `ce:work` in this phase beyond possible follow-up clarification of what plan structure it should prefer.
- Do not require heavyweight PRD ceremony for small or straightforward work.
## Key Decisions
- Use a hybrid model: keep compound-engineering's research and handoff strengths, but adopt iterative-engineering's "decisions, not code" boundary.
- Planning stops before execution: no running tests, no fail/pass learning, no exact implementation snippets, and no commit shell commands in the plan.
- Use logical tasks and subtasks sized around atomic changes or commit units rather than 2-5 minute micro-steps.
- Keep explicit verification and test scenarios, but express them as expected coverage and validation outcomes rather than commands with predicted output.
- Preserve `ce:brainstorm` as the preferred upstream input when available, with clear handling for deferred technical questions.
- Treat `Resolve Before Planning` as a classification gate: planning first distinguishes true product blockers from technical questions, then investigates only the latter.
## High-Level Direction
- Phase 0: Resume existing plan work when relevant, detect brainstorm input, and assess scope.
- Phase 1: Gather context through repo research, institutional learnings, and conditional external research.
- Phase 2: Resolve planning-time technical questions and capture implementation-time unknowns separately.
- Phase 3: Structure the plan around components, dependencies, files, test targets, risks, and verification.
- Phase 4: Write a right-sized plan artifact whose depth varies by scope, but whose boundary stays planning-only.
- Phase 5: Review and hand off to refinement, deeper research, issue sharing, or `ce:work`.
## Alternatives Considered
- Keep the current `ce:plan` and only reject PR #246.
Rejected because the underlying issue remains: the current skill already drifts toward issue-template output plus pseudo-implementation.
- Adopt Superpowers `writing-plans` nearly wholesale.
Rejected because it is intentionally execution-script-oriented and collapses planning into detailed code-writing and command choreography.
- Adopt iterative-engineering `tech-planning` wholesale.
Rejected because it would lose useful compound-engineering behaviors such as brainstorm-origin integration, institutional learnings, and richer post-plan handoff options.
## Dependencies / Assumptions
- `ce:work` can continue creating its own actionable task list from a decision-first plan.
- If `ce:work` later benefits from an explicit section such as `## Implementation Units` or `## Work Breakdown`, that should be a separate follow-up designed around execution needs rather than micro-step code generation.
## Resolved During Planning
- [Affects R10][Technical] Replaced `MINIMAL` / `MORE` / `A LOT` with `Lightweight` / `Standard` / `Deep` to align `ce:plan` with `ce:brainstorm`'s scope model.
- [Affects R9][Technical] Updated `ce:work` to explicitly consume decision-first plan sections such as `Implementation Units`, `Requirements Trace`, `Files`, `Test Scenarios`, and `Verification`.
- [Affects R2][Needs research] Kept SpecFlow as a conditional planning aid: use it for `Standard` or `Deep` plans when flow completeness is unclear rather than making it mandatory for every plan.
## Next Steps
-> Review, refine, and commit the `ce:plan` and `ce:work` rewrite

View File

@@ -0,0 +1,96 @@
---
title: "Beta skills framework: parallel skills with -beta suffix for safe rollouts"
category: skill-design
date: 2026-03-17
module: plugins/compound-engineering/skills
component: SKILL.md
tags:
- skill-design
- beta-testing
- skill-versioning
- rollout-safety
severity: medium
description: "Pattern for trialing new skill versions alongside stable ones using a -beta suffix. Covers naming, plan file naming, internal references, and promotion path."
related:
- docs/solutions/skill-design/compound-refresh-skill-improvements.md
---
## Problem
Core workflow skills like `ce:plan` and `deepen-plan` are deeply chained (`ce:brainstorm` → `ce:plan` → `deepen-plan` → `ce:work`) and orchestrated by `lfg` and `slfg`. Rewriting these skills risks breaking the entire workflow for all users simultaneously. There was no mechanism to let users trial new skill versions alongside stable ones.
Alternatives considered and rejected:
- **Beta gate in SKILL.md** with config-driven routing (`beta: true` in `compound-engineering.local.md`): relies on prompt-level conditional routing which risks instruction blending, requires setup integration, and adds complexity to the skill files themselves.
- **Pure router SKILL.md** with both versions in `references/`: adds file-read penalty and refactors stable skills unnecessarily.
- **Separate beta plugin**: heavy infrastructure for a temporary need.
## Solution
### Parallel skills with `-beta` suffix
Create separate skill directories alongside the stable ones. Each beta skill is a fully independent copy with its own frontmatter, instructions, and internal references.
```
skills/
├── ce-plan/SKILL.md # Stable (unchanged)
├── ce-plan-beta/SKILL.md # New version
├── deepen-plan/SKILL.md # Stable (unchanged)
└── deepen-plan-beta/SKILL.md # New version
```
### Naming and frontmatter conventions
- **Directory**: `<skill-name>-beta/`
- **Frontmatter name**: `<skill:name>-beta` (e.g., `ce:plan-beta`)
- **Description**: Write the intended stable description, then prefix with `[BETA]`. This ensures promotion is a simple prefix removal rather than a rewrite.
- **`disable-model-invocation: true`**: Prevents the model from auto-triggering the beta skill. Users invoke it manually with the slash command. Remove this field when promoting to stable.
- **Plan files**: Use `-beta-plan.md` suffix (e.g., `2026-03-17-001-feat-auth-flow-beta-plan.md`) to avoid clobbering stable plan files
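Taken together, these conventions produce a frontmatter header like the following sketch for `ce-plan-beta/SKILL.md` (the description text here is abbreviated for illustration; the real skill's description is longer):

```yaml
---
name: ce:plan-beta
# Intended stable description, prefixed so promotion is a simple prefix removal:
description: "[BETA] Transform feature descriptions or requirements into structured implementation plans."
# Prevents the model from auto-triggering the beta skill; users invoke
# /ce:plan-beta manually. Remove this field when promoting to stable.
disable-model-invocation: true
---
```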
### Internal references
Beta skills must reference each other by their beta names:
- `ce:plan-beta` references `/deepen-plan-beta` (not `/deepen-plan`)
- `deepen-plan-beta` references `ce:plan-beta` (not `ce:plan`)
### What doesn't change
- Stable `ce:plan` and `deepen-plan` are completely untouched
- `lfg`/`slfg` orchestration continues to use stable skills — no modification needed
- `ce:brainstorm` still hands off to stable `ce:plan` — no modification needed
- `ce:work` consumes plan files from either version (reads the file, doesn't care which skill wrote it)
### Tradeoffs
**Simplicity over seamless integration.** Beta skills exist as standalone, manually invoked skills. They won't be auto-triggered by `ce:brainstorm` handoffs or `lfg`/`slfg` orchestration without further surgery to those skills, which isn't worth the complexity for a trial period.
**Intended usage pattern:** A user can run `/ce:plan` for the stable output, then run `/ce:plan-beta` on the same input to compare the two plan documents side by side. The `-beta-plan.md` suffix ensures both outputs coexist in `docs/plans/` without collision.
## Promotion path
When the beta version is validated:
1. Replace stable `SKILL.md` content with beta skill content
2. Restore stable frontmatter: remove `[BETA]` prefix from description, restore stable `name:`
3. Remove `disable-model-invocation: true` so the model can auto-trigger it
4. Update all internal references back to stable names
5. Restore stable plan file naming (remove `-beta` from the convention)
6. Delete the beta skill directory
7. Update README.md: remove from Beta Skills section, verify counts
8. Verify `lfg`/`slfg` work with the promoted skill
9. Verify `ce:work` consumes plans from the promoted skill
## Validation
After creating a beta skill, search its SKILL.md for references to the stable skill name it replaces. Any occurrence of the stable name without `-beta` is a missed rename — it would cause output collisions or route to the wrong skill.
Check for:
- **Output file paths** that use the stable naming convention instead of the `-beta` variant
- **Cross-skill references** that point to stable skill names instead of beta counterparts
- **User-facing text** (questions, confirmations) that mentions stable paths or names
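A minimal sketch of the missed-rename search, using a throwaway `/tmp` fixture in place of the real `skills/ce-plan-beta/SKILL.md`. It assumes GNU grep's `-P` flag, whose negative lookahead matches `ce:plan` only when it is not followed by `-beta`:

```shell
# Fabricated sample file standing in for skills/ce-plan-beta/SKILL.md.
f=/tmp/beta-check/SKILL.md
mkdir -p /tmp/beta-check
cat > "$f" <<'EOF'
name: ce:plan-beta
References /deepen-plan-beta and ce:plan-beta throughout.
Hand off to ce:plan when done.
EOF
# -P: Perl lookahead (GNU grep); each hit is a missed rename to review.
grep -Pn 'ce:plan(?!-beta)' "$f"
```

Here only line 3 is flagged; the beta-suffixed references on lines 1-2 pass the lookahead.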
## Prevention
- When adding a beta skill, always use the `-beta` suffix consistently in directory name, frontmatter name, description, plan file naming, and all internal skill-to-skill references
- After creating a beta skill, run the validation checks above to catch missed renames in file paths, user-facing text, and cross-skill references
- Always test that stable skills are completely unaffected by the beta skill's existence
- Keep beta and stable plan file suffixes distinct so outputs can coexist for comparison

View File

@@ -1,7 +1,7 @@
{
"name": "compound-engineering",
"version": "2.41.0",
"description": "AI-powered development tools. 29 agents, 42 skills, 1 MCP server for code review, research, design, and workflow automation.",
"description": "AI-powered development tools. 29 agents, 44 skills, 1 MCP server for code review, research, design, and workflow automation.",
"author": {
"name": "Kieran Klaassen",
"email": "kieran@every.to",

View File

@@ -116,6 +116,43 @@ grep -E '`(references|assets|scripts)/[^`]+`' skills/*/SKILL.md
grep -E '^description:' skills/*/SKILL.md
```
## Beta Skills
Beta skills are experimental versions of core workflow skills, published as separate skills with a `-beta` suffix (e.g., `ce-plan-beta`, `deepen-plan-beta`). They live alongside the stable versions and are invoked directly.
See `docs/solutions/skill-design/beta-skills-framework.md` for the full pattern.
### Beta Skill Rules
- Beta skills use `-beta` suffix in directory name, skill name, and description prefix (`[BETA]`)
- Beta skills set `disable-model-invocation: true` to prevent accidental auto-triggering — users invoke them manually
- Beta skill descriptions should be the intended stable description prefixed with `[BETA]`, so promotion is a simple prefix removal
- Beta skills must reference other beta skills by their beta names (e.g., `/deepen-plan-beta`, not `/deepen-plan`)
- Beta plan output files use `-beta-plan.md` suffix to avoid clobbering stable plan files
- Beta skills are not wired into `lfg`/`slfg` orchestration — invoke them directly
### Beta Skill Validation
After creating or modifying a beta skill, search its SKILL.md for any reference to the stable skill name it replaces. Occurrences of the stable name without `-beta` are missed renames that would cause output collisions or misrouting. Check for:
- Output file paths using the stable naming convention instead of the `-beta` variant
- Cross-skill references pointing to stable names instead of beta counterparts
- User-facing text (questions, confirmations) mentioning stable paths or names
### Promoting Beta to Stable
When replacing a stable skill with its beta version:
- [ ] Replace stable `SKILL.md` content with beta skill content
- [ ] Restore stable frontmatter: remove `[BETA]` prefix from description, restore stable `name:` (e.g., `ce:plan` not `ce:plan-beta`)
- [ ] Remove `disable-model-invocation: true` so the model can auto-trigger the skill
- [ ] Update all internal references back to stable names (`/deepen-plan` not `/deepen-plan-beta`)
- [ ] Restore stable plan file naming (remove `-beta` from `-beta-plan.md` convention)
- [ ] Delete the beta skill directory
- [ ] Update README.md: remove from Beta Skills section, verify counts
- [ ] Verify `lfg`/`slfg` still work with the updated stable skill
- [ ] Verify `ce:work` consumes plans from the promoted skill correctly
## Documentation
See `docs/solutions/plugin-versioning-requirements.md` for detailed versioning workflow.

View File

@@ -7,7 +7,7 @@ AI-powered development tools that get smarter with every use. Make each unit of
| Component | Count |
|-----------|-------|
| Agents | 29 |
| Skills | 42 |
| Skills | 44 |
| MCP Servers | 1 |
## Agents
@@ -90,7 +90,7 @@ Core workflow commands use `ce:` prefix to unambiguously identify them as compou
|---------|-------------|
| `/lfg` | Full autonomous engineering workflow |
| `/slfg` | Full autonomous workflow with swarm mode for parallel execution |
| `/deepen-plan` | Enhance plans with parallel research agents for each section |
| `/deepen-plan` | Stress-test plans and deepen weak sections with targeted research |
| `/changelog` | Create engaging changelogs for recent merges |
| `/create-agent-skill` | Create or edit Claude Code skills |
| `/generate_command` | Generate new slash commands |
@@ -156,6 +156,17 @@ Core workflow commands use `ce:` prefix to unambiguously identify them as compou
|-------|-------------|
| `agent-browser` | CLI-based browser automation using Vercel's agent-browser |
### Beta Skills
Experimental versions of core workflow skills. These are being tested before replacing their stable counterparts. They work standalone but are not yet wired into the automated `lfg`/`slfg` orchestration.
| Skill | Description | Replaces |
|-------|-------------|----------|
| `ce:plan-beta` | Decision-first planning focused on boundaries, sequencing, and verification | `ce:plan` |
| `deepen-plan-beta` | Selective stress-test that targets weak sections with research | `deepen-plan` |
To test: invoke `/ce:plan-beta` or `/deepen-plan-beta` directly. Plans produced by the beta skills are compatible with `/ce:work`.
### Image Generation
| Skill | Description |

View File

@@ -0,0 +1,571 @@
---
name: ce:plan-beta
description: "[BETA] Transform feature descriptions or requirements into structured implementation plans grounded in repo patterns and research. Use when the user says 'plan this', 'create a plan', 'write a tech plan', 'plan the implementation', 'how should we build', 'what's the approach for', 'break this down', or when a brainstorm/requirements document is ready for technical planning. Best when requirements are at least roughly defined; for exploratory or ambiguous requests, prefer ce:brainstorm first."
argument-hint: "[feature description, requirements doc path, or improvement idea]"
disable-model-invocation: true
---
# Create Technical Plan
**Note: The current year is 2026.** Use this when dating plans and searching for recent documentation.
`ce:brainstorm` defines **WHAT** to build. `ce:plan` defines **HOW** to build it. `ce:work` executes the plan.
This workflow produces a durable implementation plan. It does **not** implement code, run tests, or learn from execution-time results. If the answer depends on changing code and seeing what happens, that belongs in `ce:work`, not here.
## Interaction Method
When asking the user a question, prefer the platform's blocking question tool if one exists (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). Otherwise, present numbered options in chat and wait for the user's reply before proceeding.
Ask one question at a time. Prefer a concise single-select choice when natural options exist.
## Feature Description
<feature_description> #$ARGUMENTS </feature_description>
**If the feature description above is empty, ask the user:** "What would you like to plan? Please describe the feature, bug fix, or improvement you have in mind."
Do not proceed until you have a clear planning input.
## Core Principles
1. **Use requirements as the source of truth** - If `ce:brainstorm` produced a requirements document, planning should build from it rather than re-inventing behavior.
2. **Decisions, not code** - Capture approach, boundaries, files, dependencies, risks, and test scenarios. Do not pre-write implementation code or shell command choreography.
3. **Research before structuring** - Explore the codebase, institutional learnings, and external guidance when warranted before finalizing the plan.
4. **Right-size the artifact** - Small work gets a compact plan. Large work gets more structure. The philosophy stays the same at every depth.
5. **Separate planning from execution discovery** - Resolve planning-time questions here. Explicitly defer execution-time unknowns to implementation.
6. **Keep the plan portable** - The plan should work as a living document, review artifact, or issue body without embedding tool-specific executor instructions.
## Plan Quality Bar
Every plan should contain:
- A clear problem frame and scope boundary
- Concrete requirements traceability back to the request or origin document
- Exact file paths for the work being proposed
- Explicit test file paths for feature-bearing implementation units
- Decisions with rationale, not just tasks
- Existing patterns or code references to follow
- Specific test scenarios and verification outcomes
- Clear dependencies and sequencing
A plan is ready when an implementer can start confidently without needing the plan to write the code for them.
## Workflow
### Phase 0: Resume, Source, and Scope
#### 0.1 Resume Existing Plan Work When Appropriate
If the user references an existing plan file or there is an obvious recent matching plan in `docs/plans/`:
- Read it
- Confirm whether to update it in place or create a new plan
- If updating, preserve completed checkboxes and revise only the still-relevant sections
#### 0.2 Find Upstream Requirements Document
Before asking planning questions, search `docs/brainstorms/` for files matching `*-requirements.md`.
**Relevance criteria:** A requirements document is relevant if:
- The topic semantically matches the feature description
- It was created within the last 30 days (use judgment to override if the document is clearly still relevant or clearly stale)
- It appears to cover the same user problem or scope
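The recency half of this scan can be sketched with `find`, using a `/tmp` fixture in place of the real `docs/brainstorms/` directory (GNU `touch -d` is assumed for backdating the stale file):

```shell
bs=/tmp/brainstorm-demo/docs/brainstorms
mkdir -p "$bs"
touch "$bs/2026-03-14-ce-plan-rewrite-requirements.md"             # fresh
touch -d '60 days ago' "$bs/2025-12-01-old-topic-requirements.md"  # stale
# -mtime -30: modified within the last 30 days
find "$bs" -name '*-requirements.md' -mtime -30
```

Only the fresh document survives the filter; topic relevance still needs the semantic judgment described above.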
If multiple source documents match, ask which one to use via the platform's blocking question tool when available (see Interaction Method). Otherwise, present numbered options in chat and wait for the user's reply before proceeding.
#### 0.3 Use the Source Document as Primary Input
If a relevant requirements document exists:
1. Read it thoroughly
2. Announce that it will serve as the origin document for planning
3. Carry forward all of the following:
- Problem frame
- Requirements and success criteria
- Scope boundaries
- Key decisions and rationale
- Dependencies or assumptions
- Outstanding questions, preserving whether they are blocking or deferred
4. Use the source document as the primary input to planning and research
5. Reference important carried-forward decisions in the plan with `(see origin: <source-path>)`
6. Do not silently omit source content — if the origin document discussed it, the plan must address it even if briefly. Before finalizing, scan each section of the origin document to verify nothing was dropped.
If no relevant requirements document exists, planning may proceed from the user's request directly.
#### 0.4 No-Requirements-Doc Fallback
If no relevant requirements document exists:
- Assess whether the request is already clear enough for direct technical planning
- If the ambiguity is mainly product framing, user behavior, or scope definition, recommend `ce:brainstorm` first
- If the user wants to continue here anyway, run a short planning bootstrap instead of refusing
The planning bootstrap should establish:
- Problem frame
- Intended behavior
- Scope boundaries and obvious non-goals
- Success criteria
- Blocking questions or assumptions
Keep this bootstrap brief. It exists to preserve direct-entry convenience, not to replace a full brainstorm.
If the bootstrap uncovers major unresolved product questions:
- Recommend `ce:brainstorm` again
- If the user still wants to continue, require explicit assumptions before proceeding
#### 0.5 Classify Outstanding Questions Before Planning
If the origin document contains `Resolve Before Planning` or similar blocking questions:
- Review each one before proceeding
- Reclassify it into planning-owned work **only if** it is actually a technical, architectural, or research question
- Keep it as a blocker if it would change product behavior, scope, or success criteria
If true product blockers remain:
- Surface them clearly
- Ask the user, using the platform's blocking question tool when available (see Interaction Method), whether to:
1. Resume `ce:brainstorm` to resolve them
2. Convert them into explicit assumptions or decisions and continue
- Do not continue planning while true blockers remain unresolved
#### 0.6 Assess Plan Depth
Classify the work into one of these plan depths:
- **Lightweight** - small, well-bounded, low ambiguity
- **Standard** - normal feature or bounded refactor with some technical decisions to document
- **Deep** - cross-cutting, strategic, high-risk, or highly ambiguous implementation work
If depth is unclear, ask one targeted question and then continue.
### Phase 1: Gather Context
#### 1.1 Local Research (Always Runs)
Prepare a concise planning context summary (a paragraph or two) to pass as input to the research agents:
- If an origin document exists, summarize the problem frame, requirements, and key decisions from that document
- Otherwise use the feature description directly
Run these agents in parallel:
- Task compound-engineering:research:repo-research-analyst(planning context summary)
- Task compound-engineering:research:learnings-researcher(planning context summary)
Collect:
- Existing patterns and conventions to follow
- Relevant files, modules, and tests
- CLAUDE.md or AGENTS.md guidance that materially affects the plan
- Institutional learnings from `docs/solutions/`
#### 1.2 Decide on External Research
Based on the origin document, user signals, and local findings, decide whether external research adds value.
**Read between the lines.** Pay attention to signals from the conversation so far:
- **User familiarity** — Are they pointing to specific files or patterns? They likely know the codebase well.
- **User intent** — Do they want speed or thoroughness? Exploration or execution?
- **Topic risk** — Security, payments, external APIs warrant more caution regardless of user signals.
- **Uncertainty level** — Is the approach clear or still open-ended?
**Always lean toward external research when:**
- The topic is high-risk: security, payments, privacy, external APIs, migrations, compliance
- The codebase lacks relevant local patterns
- The user is exploring unfamiliar territory
**Skip external research when:**
- The codebase already shows a strong local pattern
- The user already knows the intended shape
- Additional external context would add little practical value
Announce the decision briefly before continuing. Examples:
- "Your codebase has solid patterns for this. Proceeding without external research."
- "This involves payment processing, so I'll research current best practices first."
#### 1.3 External Research (Conditional)
If Step 1.2 indicates external research is useful, run these agents in parallel:
- Task compound-engineering:research:best-practices-researcher(planning context summary)
- Task compound-engineering:research:framework-docs-researcher(planning context summary)
#### 1.4 Consolidate Research
Summarize:
- Relevant codebase patterns and file paths
- Relevant institutional learnings
- External references and best practices, if gathered
- Related issues, PRs, or prior art
- Any constraints that should materially shape the plan
#### 1.5 Flow and Edge-Case Analysis (Conditional)
For **Standard** or **Deep** plans, or when user flow completeness is still unclear, run:
- Task compound-engineering:workflow:spec-flow-analyzer(planning context summary, research findings)
Use the output to:
- Identify missing edge cases, state transitions, or handoff gaps
- Tighten requirements trace or verification strategy
- Add only the flow details that materially improve the plan
### Phase 2: Resolve Planning Questions
Build a planning question list from:
- Deferred questions in the origin document
- Gaps discovered in repo or external research
- Technical decisions required to produce a useful plan
For each question, decide whether it should be:
- **Resolved during planning** - the answer is knowable from repo context, documentation, or user choice
- **Deferred to implementation** - the answer depends on code changes, runtime behavior, or execution-time discovery
Ask the user only when the answer materially affects architecture, scope, sequencing, or risk and cannot be responsibly inferred. Use the platform's blocking question tool when available (see Interaction Method).
**Do not** run tests, build the app, or probe runtime behavior in this phase. The goal is a strong plan, not partial execution.
### Phase 3: Structure the Plan
#### 3.1 Title and File Naming
- Draft a clear, searchable title using conventional format such as `feat: Add user authentication` or `fix: Prevent checkout double-submit`
- Determine the plan type: `feat`, `fix`, or `refactor`
- Build the filename following the repository convention: `docs/plans/YYYY-MM-DD-NNN-<type>-<descriptive-name>-beta-plan.md`
- Create `docs/plans/` if it does not exist
- Check existing files for today's date to determine the next sequence number (zero-padded to 3 digits, starting at 001)
- Keep the descriptive name concise (3-5 words) and kebab-cased
- Append `-beta` before `-plan` to distinguish from stable-generated plans
- Examples: `2026-01-15-001-feat-user-authentication-flow-beta-plan.md`, `2026-02-03-002-fix-checkout-race-condition-beta-plan.md`
- Avoid: missing sequence numbers, vague names like "new-feature", invalid characters (colons, spaces)
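The sequence-number step can be sketched as follows, using a `/tmp` fixture in place of the repository's `docs/plans/` and a hard-coded date (the real workflow uses today's date):

```shell
plans=/tmp/plans-demo/docs/plans
mkdir -p "$plans"
# Pretend one plan already exists for today.
touch "$plans/2026-03-17-001-feat-other-topic-beta-plan.md"
today=2026-03-17
# Count today's plans (zero if the glob matches nothing), then zero-pad
# the next number to three digits.
n=$(ls "$plans/$today"-*.md 2>/dev/null | wc -l)
seq=$(printf '%03d' $((n + 1)))
echo "$plans/$today-$seq-feat-example-topic-beta-plan.md"
```

With one existing plan for the day, the next filename gets sequence `002`.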
#### 3.2 Stakeholder and Impact Awareness
For **Standard** or **Deep** plans, briefly consider who is affected by this change — end users, developers, operations, other teams — and how that should shape the plan. For cross-cutting work, note affected parties in the System-Wide Impact section.
#### 3.3 Break Work into Implementation Units
Break the work into logical implementation units. Each unit should represent one meaningful change that an implementer could typically land as an atomic commit.
Good units are:
- Focused on one component, behavior, or integration seam
- Usually touching a small cluster of related files
- Ordered by dependency
- Concrete enough for execution without pre-writing code
- Marked with checkbox syntax for progress tracking
Avoid:
- 2-5 minute micro-steps
- Units that span multiple unrelated concerns
- Units that are so vague an implementer still has to invent the plan
#### 3.4 Define Each Implementation Unit
For each unit, include:
- **Goal** - what this unit accomplishes
- **Requirements** - which requirements or success criteria it advances
- **Dependencies** - what must exist first
- **Files** - exact file paths to create, modify, or test
- **Approach** - key decisions, data flow, component boundaries, or integration notes
- **Patterns to follow** - existing code or conventions to mirror
- **Test scenarios** - specific behaviors, edge cases, and failure paths to cover
- **Verification** - how an implementer should know the unit is complete, expressed as outcomes rather than shell command scripts
Every feature-bearing unit should include the test file path in `**Files:**`.
#### 3.5 Keep Planning-Time and Implementation-Time Unknowns Separate
If something is important but not knowable yet, record it explicitly under deferred implementation notes rather than pretending to resolve it in the plan.
Examples:
- Exact method or helper names
- Final SQL or query details after touching real code
- Runtime behavior that depends on seeing actual test failures
- Refactors that may become unnecessary once implementation starts
### Phase 4: Write the Plan
Use one planning philosophy across all depths. Change the amount of detail, not the boundary between planning and execution.
#### 4.1 Plan Depth Guidance
**Lightweight**
- Keep the plan compact
- Usually 2-4 implementation units
- Omit optional sections that add little value
**Standard**
- Use the full core template
- Usually 3-6 implementation units
- Include risks, deferred questions, and system-wide impact when relevant
**Deep**
- Use the full core template plus optional analysis sections
- Usually 4-8 implementation units
- Group units into phases when that improves clarity
- Include alternatives considered, documentation impacts, and deeper risk treatment when warranted
#### 4.1b Optional Deep Plan Extensions
For sufficiently large, risky, or cross-cutting work, add the sections that genuinely help:
- **Alternative Approaches Considered**
- **Success Metrics**
- **Dependencies / Prerequisites**
- **Risk Analysis & Mitigation**
- **Phased Delivery**
- **Documentation Plan**
- **Operational / Rollout Notes**
- **Future Considerations** only when they materially affect current design
Do not add these as boilerplate. Include them only when they improve execution quality or stakeholder alignment.
#### 4.2 Core Plan Template
Omit clearly inapplicable optional sections, especially for Lightweight plans.
```markdown
---
title: [Plan Title]
type: [feat|fix|refactor]
status: active
date: YYYY-MM-DD
origin: docs/brainstorms/YYYY-MM-DD-<topic>-requirements.md # include when planning from a requirements doc
deepened: YYYY-MM-DD # optional, set later by deepen-plan-beta when the plan is substantively strengthened
---
# [Plan Title]
## Overview
[What is changing and why]
## Problem Frame
[Summarize the user/business problem and context. Reference the origin doc when present.]
## Requirements Trace
- R1. [Requirement or success criterion this plan must satisfy]
- R2. [Requirement or success criterion this plan must satisfy]
## Scope Boundaries
- [Explicit non-goal or exclusion]
## Context & Research
### Relevant Code and Patterns
- [Existing file, class, component, or pattern to follow]
### Institutional Learnings
- [Relevant `docs/solutions/` insight]
### External References
- [Relevant external docs or best-practice source, if used]
## Key Technical Decisions
- [Decision]: [Rationale]
## Open Questions
### Resolved During Planning
- [Question]: [Resolution]
### Deferred to Implementation
- [Question or unknown]: [Why it is intentionally deferred]
## Implementation Units
- [ ] **Unit 1: [Name]**
**Goal:** [What this unit accomplishes]
**Requirements:** [R1, R2]
**Dependencies:** [None / Unit 1 / external prerequisite]
**Files:**
- Create: `path/to/new_file`
- Modify: `path/to/existing_file`
- Test: `path/to/test_file`
**Approach:**
- [Key design or sequencing decision]
**Patterns to follow:**
- [Existing file, class, or pattern]
**Test scenarios:**
- [Specific scenario with expected behavior]
- [Edge case or failure path]
**Verification:**
- [Outcome that should hold when this unit is complete]
## System-Wide Impact
- **Interaction graph:** [What callbacks, middleware, observers, or entry points may be affected]
- **Error propagation:** [How failures should travel across layers]
- **State lifecycle risks:** [Partial-write, cache, duplicate, or cleanup concerns]
- **API surface parity:** [Other interfaces that may require the same change]
- **Integration coverage:** [Cross-layer scenarios unit tests alone will not prove]
## Risks & Dependencies
- [Meaningful risk, dependency, or sequencing concern]
## Documentation / Operational Notes
- [Docs, rollout, monitoring, or support impacts when relevant]
## Sources & References
- **Origin document:** [docs/brainstorms/YYYY-MM-DD-<topic>-requirements.md](path)
- Related code: [path or symbol]
- Related PRs/issues: #[number]
- External docs: [url]
```
For larger `Deep` plans, extend the core template only when useful with sections such as:
```markdown
## Alternative Approaches Considered
- [Approach]: [Why rejected or not chosen]
## Success Metrics
- [How we will know this solved the intended problem]
## Dependencies / Prerequisites
- [Technical, organizational, or rollout dependency]
## Risk Analysis & Mitigation
- [Risk]: [Mitigation]
## Phased Delivery
### Phase 1
- [What lands first and why]
### Phase 2
- [What follows and why]
## Documentation Plan
- [Docs or runbooks to update]
## Operational / Rollout Notes
- [Monitoring, migration, feature flag, or rollout considerations]
```
#### 4.3 Planning Rules
- Prefer path plus class/component/pattern references over brittle line numbers
- Keep implementation units checkable with `- [ ]` syntax for progress tracking
- Do not include fenced implementation code blocks unless the plan itself is about code shape as a design artifact
- Do not include git commands, commit messages, or exact test command recipes
- Do not pretend an execution-time question is settled just to make the plan look complete
- Include mermaid diagrams when they clarify relationships or flows that prose alone would make hard to follow — ERDs for data model changes, sequence diagrams for multi-service interactions, state diagrams for lifecycle transitions, flowcharts for complex branching logic
### Phase 5: Final Review, Write File, and Handoff
#### 5.1 Review Before Writing
Before finalizing, check:
- The plan does not invent product behavior that should have been defined in `ce:brainstorm`
- If there was no origin document, the bounded planning bootstrap established enough product clarity to plan responsibly
- Every major decision is grounded in the origin document or research
- Each implementation unit is concrete, dependency-ordered, and implementation-ready
- Test scenarios are specific without becoming test code
- Deferred items are explicit and not hidden as fake certainty
If the plan originated from a requirements document, re-read that document and verify:
- The chosen approach still matches the product intent
- Scope boundaries and success criteria are preserved
- Blocking questions were either resolved, explicitly assumed, or sent back to `ce:brainstorm`
- Every section of the origin document is addressed in the plan — scan each section to confirm nothing was silently dropped
#### 5.2 Write Plan File
**REQUIRED: Write the plan file to disk before presenting any options.**
Use the Write tool to save the complete plan to:
```text
docs/plans/YYYY-MM-DD-NNN-<type>-<descriptive-name>-beta-plan.md
```
Confirm:
```text
Plan written to docs/plans/[filename]
```
**Pipeline mode:** If invoked from an automated workflow such as LFG, SLFG, or any `disable-model-invocation` context, skip interactive questions. Make the needed choices automatically and proceed to writing the plan.
#### 5.3 Post-Generation Options
After writing the plan file, present the options using the platform's blocking question tool when available (see Interaction Method). Otherwise, present numbered options in chat and wait for the user's reply before proceeding.
**Question:** "Plan ready at `docs/plans/YYYY-MM-DD-NNN-<type>-<name>-beta-plan.md`. What would you like to do next?"
**Options:**
1. **Open plan in editor** - Open the plan file for review
2. **Run `/deepen-plan-beta`** - Stress-test weak sections with targeted research when the plan needs more confidence
3. **Run `document-review` skill** - Improve the plan through structured document review
4. **Share to Proof** - Upload the plan for collaborative review and sharing
5. **Start `/ce:work`** - Begin implementing this plan in the current environment
6. **Start `/ce:work` in another session** - Begin implementing in a separate agent session when the current platform supports it
7. **Create Issue** - Create an issue in the configured tracker
Based on selection:
- **Open plan in editor** → Open `docs/plans/<plan_filename>.md` using the current platform's file-open or editor mechanism (e.g., `open` on macOS, `xdg-open` on Linux, or the IDE's file-open API)
- **`/deepen-plan-beta`** → Call `/deepen-plan-beta` with the plan path
- **`document-review` skill** → Load the `document-review` skill with the plan path
- **Share to Proof** → Upload the plan:
```bash
CONTENT=$(cat docs/plans/<plan_filename>.md)
TITLE="Plan: <plan title from frontmatter>"
RESPONSE=$(curl -s -X POST https://www.proofeditor.ai/share/markdown \
-H "Content-Type: application/json" \
-d "$(jq -n --arg title "$TITLE" --arg markdown "$CONTENT" --arg by "ai:compound" '{title: $title, markdown: $markdown, by: $by}')")
PROOF_URL=$(echo "$RESPONSE" | jq -r '.tokenUrl')
```
Display `View & collaborate in Proof: <PROOF_URL>` if successful, then return to the options
- **`/ce:work`** → Call `/ce:work` with the plan path
- **`/ce:work` in another session** → If the current platform supports launching a separate agent session, start `/ce:work` with the plan path there. Otherwise, explain the limitation briefly and offer to run `/ce:work` in the current session instead.
- **Create Issue** → Follow the Issue Creation section below
- **Other** → Accept free text for revisions and loop back to options
If ultrathink is enabled, or the platform's reasoning or effort level is set to max or extra-high, automatically run `/deepen-plan-beta`, but only when the plan is `Standard` or `Deep`, high-risk, or still shows meaningful confidence gaps in decisions, sequencing, system-wide impact, risks, or verification.
## Issue Creation
When the user selects "Create Issue", detect their project tracker from `CLAUDE.md` or `AGENTS.md`:
1. Look for `project_tracker: github` or `project_tracker: linear`
2. If GitHub:
```bash
gh issue create --title "<type>: <title>" --body-file <plan_path>
```
3. If Linear:
```bash
linear issue create --title "<title>" --description "$(cat <plan_path>)"
```
4. If no tracker is configured:
- Ask which tracker they use using the platform's blocking question tool when available (see Interaction Method)
- Suggest adding the tracker to `CLAUDE.md` or `AGENTS.md` for future runs
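The tracker lookup in step 1 can be sketched as a tiny helper. This is a hedged sketch, not part of the skill: `detect_tracker` is an invented name, and it assumes the `project_tracker:` key sits on its own line in `CLAUDE.md` or `AGENTS.md`.

```shell
# Sketch only: read the configured tracker, if any.
# `project_tracker:` is this skill's convention; detect_tracker is a hypothetical helper.
detect_tracker() {
  grep -hoE 'project_tracker:[[:space:]]*(github|linear)' CLAUDE.md AGENTS.md 2>/dev/null \
    | head -n 1 \
    | sed -E 's/project_tracker:[[:space:]]*//'
}
```

An empty result means no tracker is configured, which routes to step 4.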
After issue creation:
- Display the issue URL
- Ask whether to proceed to `/ce:work`
NEVER CODE! Research, decide, and write the plan.


@@ -23,6 +23,10 @@ This command takes a work document (plan, specification, or todo file) and execu
1. **Read Plan and Clarify**
- Read the work document completely
- Treat the plan as a decision artifact, not an execution script
- If the plan includes sections such as `Implementation Units`, `Work Breakdown`, `Requirements Trace`, `Files`, `Test Scenarios`, or `Verification`, use those as the primary source material for execution
- Check for a `Deferred to Implementation` or `Implementation-Time Unknowns` section — these are questions the planner intentionally left for you to resolve during execution. Note them before starting so they inform your approach rather than surprising you mid-task
- Check for a `Scope Boundaries` section — these are explicit non-goals. Refer back to them if implementation starts pulling you toward adjacent work
- Review any references or links provided in the plan
- If anything is unclear or ambiguous, ask clarifying questions now
- Get user approval to proceed
@@ -73,12 +77,35 @@ This command takes a work document (plan, specification, or todo file) and execu
- You plan to switch between branches frequently
3. **Create Todo List**
- Use your available task tracking tool (e.g., TodoWrite, task lists) to break the plan into actionable tasks
- Derive tasks from the plan's implementation units, dependencies, files, test targets, and verification criteria
- For each unit, read the `Patterns to follow` field before implementing — these point to specific files or conventions to mirror
- Use each unit's `Verification` field as the primary "done" signal for that task
- Do not expect the plan to contain implementation code, micro-step TDD instructions, or exact shell commands
- Include dependencies between tasks
- Prioritize based on what needs to be done first
- Include testing and quality check tasks
- Keep tasks specific and completable
4. **Choose Execution Strategy**
After creating the task list, decide how to execute based on the plan's size and dependency structure:
| Strategy | When to use |
|----------|-------------|
| **Inline** | 1-2 small tasks, or tasks needing user interaction mid-flight |
| **Serial subagents** | 3+ tasks with dependencies between them. Each subagent gets a fresh context window focused on one unit — prevents context degradation across many tasks |
| **Parallel subagents** | 3+ tasks where some units have no shared dependencies and touch non-overlapping files. Dispatch independent units simultaneously, run dependent units after their prerequisites complete |
**Subagent dispatch** uses your available subagent or task spawning mechanism. For each unit, give the subagent:
- The full plan file path (for overall context)
- The specific unit's Goal, Files, Approach, Patterns, Test scenarios, and Verification
- Any resolved deferred questions relevant to that unit
After each subagent completes, update the plan checkboxes and task list before dispatching the next dependent unit.
For genuinely large plans needing persistent inter-agent communication (agents challenging each other's approaches, shared coordination across 10+ tasks), see Swarm Mode below, which uses Agent Teams.
### Phase 2: Execute
1. **Task Execution Loop**
@@ -87,15 +114,14 @@ This command takes a work document (plan, specification, or todo file) and execu
```
while (tasks remain):
- Mark task as in-progress
- Read any referenced files from the plan
- Look for similar patterns in codebase
- Implement following existing conventions
- Write tests for new functionality
- Run System-Wide Test Check (see below)
- Run tests after changes
- Mark off the corresponding checkbox in the plan file ([ ] → [x])
- Mark task as completed
- Evaluate for incremental commit (see below)
```
@@ -113,7 +139,6 @@ This command takes a work document (plan, specification, or todo file) and execu
**When this matters most:** Any change that touches models with callbacks, error handling with fallback/retry, or functionality exposed through multiple interfaces.
**IMPORTANT**: Always update the original plan document by checking off completed items. Use the Edit tool to change `- [ ]` to `- [x]` for each task you finish. This keeps the plan as a living document showing progress and ensures no checkboxes are left unchecked.
2. **Incremental Commits**
@@ -128,6 +153,8 @@ This command takes a work document (plan, specification, or todo file) and execu
**Heuristic:** "Can I write a commit message that describes a complete, valuable change? If yes, commit. If the message would be 'WIP' or 'partial X', wait."
If the plan has Implementation Units, use them as a starting guide for commit boundaries — but adapt based on what you find during implementation. A unit might need multiple commits if it's larger than expected, or small related units might land together. Use each unit's Goal to inform the commit message.
**Commit workflow:**
```bash
# 1. Verify tests pass (use project's test command)
```
@@ -160,7 +187,15 @@ This command takes a work document (plan, specification, or todo file) and execu
- Add new tests for new functionality
- **Unit tests with mocks prove logic in isolation. Integration tests with real objects prove the layers work together.** If your change touches callbacks, middleware, or error handling — you need both.
5. **Simplify as You Go**
After completing a cluster of related implementation units (or every 2-3 units), review recently changed files for simplification opportunities — consolidate duplicated patterns, extract shared helpers, and improve code reuse and efficiency. This is especially valuable when using subagents, since each agent works with isolated context and can't see patterns emerging across units.
Don't simplify after every single unit — early patterns may look duplicated but diverge intentionally in later units. Wait for a natural phase boundary or when you notice accumulated complexity.
If a `/simplify` skill or equivalent is available, use it. Otherwise, review the changed files yourself for reuse and consolidation opportunities.
6. **Figma Design Sync** (if applicable)
For UI work with Figma designs:
@@ -170,7 +205,7 @@ This command takes a work document (plan, specification, or todo file) and execu
- Repeat until implementation matches design
7. **Track Progress**
- Keep the task list updated as you complete tasks
- Note any blockers or unexpected discoveries
- Create new tasks if scope expands
- Keep user informed of major milestones
@@ -196,12 +231,14 @@ This command takes a work document (plan, specification, or todo file) and execu
Run configured agents in parallel with Task tool. Present findings and address critical issues.
3. **Final Validation**
- All tasks marked completed
- All tests pass
- Linting passes
- Code follows existing patterns
- Figma designs match (if applicable)
- No console errors or warnings
- If the plan has a `Requirements Trace`, verify each requirement is satisfied by the completed work
- If any `Deferred to Implementation` questions were noted, confirm they were resolved during execution
4. **Prepare Operational Validation Plan** (REQUIRED)
- Add a `## Post-Deploy Monitoring & Validation` section to the PR description for every change.
@@ -344,73 +381,30 @@ This command takes a work document (plan, specification, or todo file) and execu
---
## Swarm Mode with Agent Teams (Optional)
For genuinely large plans where agents need to communicate with each other, challenge approaches, or coordinate across 10+ tasks with persistent specialized roles, use agent team capabilities if available (e.g., Agent Teams in Claude Code, multi-agent workflows in Codex).
**Agent teams are typically experimental and require opt-in.** Do not attempt to use agent teams unless the user explicitly requests swarm mode or agent teams, and the platform supports it.
### When to Use Agent Teams vs Subagents
| Agent Teams | Subagents (standard mode) |
|-------------|---------------------------|
| Agents need to discuss and challenge each other's approaches | Each task is independent — only the result matters |
| Persistent specialized roles (e.g., dedicated tester running continuously) | Workers report back and finish |
| 10+ tasks with complex cross-cutting coordination | 3-8 tasks with clear dependency chains |
| User explicitly requests "swarm mode" or "agent teams" | Default for most plans |
Most plans should use subagent dispatch from standard mode. Agent teams add significant token cost and coordination overhead — use them when the inter-agent communication genuinely improves the outcome.
### Agent Teams Workflow
1. **Create team** — use your available team creation mechanism
2. **Create task list** — parse Implementation Units into tasks with dependency relationships
3. **Spawn teammates** — assign specialized roles (implementer, tester, reviewer) based on the plan's needs. Give each teammate the plan file path and their specific task assignments
4. **Coordinate** — the lead monitors task completion, reassigns work if someone gets stuck, and spawns additional workers as phases unblock
5. **Cleanup** — shut down all teammates, then clean up the team resources
---
@@ -452,7 +446,7 @@ See the `orchestrating-swarms` skill for detailed swarm patterns and best practi
Before creating PR, verify:
- [ ] All clarifying questions asked and answered
- [ ] All tasks marked completed
- [ ] Tests pass (run project's test command)
- [ ] Linting passes (use linting-agent)
- [ ] Code follows existing patterns
@@ -481,6 +475,6 @@ For most features: tests + linting + following patterns is sufficient.
- **Skipping clarifying questions** - Ask now, not after building wrong thing
- **Ignoring plan references** - The plan has links for a reason
- **Testing at the end** - Test continuously or suffer later
- **Forgetting to track progress** - Update task status as you go or lose track of what's done
- **80% done syndrome** - Finish the feature, don't move on early
- **Over-reviewing simple changes** - Save reviewer agents for complex work


@@ -0,0 +1,322 @@
---
name: deepen-plan-beta
description: "[BETA] Stress-test an existing implementation plan and selectively strengthen weak sections with targeted research. Use when a plan needs more confidence around decisions, sequencing, system-wide impact, risks, or verification. Best for Standard or Deep plans, or high-risk topics such as auth, payments, migrations, external APIs, and security. For structural or clarity improvements, prefer document-review instead."
argument-hint: "[path to plan file]"
disable-model-invocation: true
---
# Deepen Plan
## Introduction
**Note: The current year is 2026.** Use this when searching for recent documentation and best practices.
`ce:plan-beta` does the first planning pass. `deepen-plan-beta` is a second-pass confidence check.
Use this skill when the plan already exists and the question is not "Is this document clear?" but rather "Is this plan grounded enough for the complexity and risk involved?"
This skill does **not** turn plans into implementation scripts. It identifies weak sections, runs targeted research only for those sections, and strengthens the plan in place.
`document-review` and `deepen-plan-beta` are different:
- Use the `document-review` skill when the document needs clarity, simplification, completeness, or scope control
- Use `deepen-plan-beta` when the document is structurally sound but still needs stronger rationale, sequencing, risk treatment, or system-wide thinking
## Interaction Method
When asking the user a question, prefer the platform's blocking question tool if one exists (`AskUserQuestion` in Claude Code, `request_user_input` in Codex, `ask_user` in Gemini). Otherwise, present numbered options in chat and wait for the user's reply before proceeding.
Ask one question at a time. Prefer a concise single-select choice when natural options exist.
## Plan File
<plan_path> #$ARGUMENTS </plan_path>
If the plan path above is empty:
1. Check `docs/plans/` for recent files
2. Ask the user which plan to deepen using the platform's blocking question tool when available (see Interaction Method). Otherwise, present numbered options in chat and wait for the user's reply before proceeding
Do not proceed until you have a valid plan file path.
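One minimal way to surface recent candidates for that question, assuming plans live under `docs/plans/` and follow this skill's `-plan.md` suffix:

```shell
# Sketch: list the five most recently modified plan files as selection options.
ls -t docs/plans/*-plan.md 2>/dev/null | head -n 5
```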
## Core Principles
1. **Stress-test, do not inflate** - Deepening should increase justified confidence, not make the plan longer for its own sake.
2. **Selective depth only** - Focus on the weakest 2-5 sections rather than enriching everything.
3. **Preserve the planning boundary** - No implementation code, no git command choreography, no exact test command recipes.
4. **Use artifact-contained evidence** - Work from the written plan, its `Context & Research`, `Sources & References`, and its origin document when present.
5. **Respect product boundaries** - Do not invent new product requirements. If deepening reveals a product-level gap, surface it as an open question or route back to `ce:brainstorm`.
6. **Prioritize risk and cross-cutting impact** - The more dangerous or interconnected the work, the more valuable another planning pass becomes.
## Workflow
### Phase 0: Load the Plan and Decide Whether Deepening Is Warranted
#### 0.1 Read the Plan and Supporting Inputs
Read the plan file completely.
If the plan frontmatter includes an `origin:` path:
- Read the origin document too
- Use it to check whether the plan still reflects the product intent, scope boundaries, and success criteria
#### 0.2 Classify Plan Depth and Topic Risk
Determine the plan depth from the document:
- **Lightweight** - small, bounded, low ambiguity, usually 2-4 implementation units
- **Standard** - moderate complexity, some technical decisions, usually 3-6 units
- **Deep** - cross-cutting, high-risk, or strategically important work, usually 4-8 units or phased delivery
Also build a risk profile. Treat these as high-risk signals:
- Authentication, authorization, or security-sensitive behavior
- Payments, billing, or financial flows
- Data migrations, backfills, or persistent data changes
- External APIs or third-party integrations
- Privacy, compliance, or user data handling
- Cross-interface parity or multi-surface behavior
- Significant rollout, monitoring, or operational concerns
#### 0.3 Decide Whether to Deepen
Use this default:
- **Lightweight** plans usually do not need deepening unless they are high-risk or the user explicitly requests it
- **Standard** plans often benefit when one or more important sections still look thin
- **Deep** or high-risk plans often benefit from a targeted second pass
If the plan already appears sufficiently grounded:
- Say so briefly
- Recommend moving to `/ce:work` or the `document-review` skill
- If the user explicitly asked to deepen anyway, continue with a light pass and deepen at most 1-2 sections
### Phase 1: Parse the Current `ce:plan-beta` Structure
Map the plan into the current template. Look for these sections, or their nearest equivalents:
- `Overview`
- `Problem Frame`
- `Requirements Trace`
- `Scope Boundaries`
- `Context & Research`
- `Key Technical Decisions`
- `Open Questions`
- `Implementation Units`
- `System-Wide Impact`
- `Risks & Dependencies`
- `Documentation / Operational Notes`
- `Sources & References`
- Optional deep-plan sections such as `Alternative Approaches Considered`, `Success Metrics`, `Phased Delivery`, `Risk Analysis & Mitigation`, and `Operational / Rollout Notes`
If the plan was written manually or uses different headings:
- Map sections by intent rather than exact heading names
- If a section is structurally present but titled differently, treat it as the equivalent section
- If the plan truly lacks a section, decide whether that absence is intentional for the plan depth or a confidence gap worth scoring
Also collect:
- Frontmatter, including existing `deepened:` date if present
- Number of implementation units
- Which files and test files are named
- Which learnings, patterns, or external references are cited
- Which sections were omitted intentionally for the plan depth, and which are genuinely missing
### Phase 2: Score Confidence Gaps
Use a checklist-first, risk-weighted scoring pass.
For each section, compute:
- **Trigger count** - number of checklist problems that apply
- **Risk bonus** - add 1 if the topic is high-risk and this section is materially relevant to that risk
- **Critical-section bonus** - add 1 for `Key Technical Decisions`, `Implementation Units`, `System-Wide Impact`, `Risks & Dependencies`, or `Open Questions` in `Standard` or `Deep` plans
Treat a section as a candidate if:
- it hits **2+ total points**, or
- it hits **1+ point** in a high-risk domain and the section is materially important
Choose only the top **2-5** sections by score. If the user explicitly asked to deepen a lightweight plan, cap at **1-2** sections unless the topic is high-risk.
Example:
- A `Key Technical Decisions` section with 1 checklist trigger and the critical-section bonus scores **2 points** and is a candidate
- A `Risks & Dependencies` section with 1 checklist trigger in a high-risk migration plan also becomes a candidate because the risk bonus applies
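The candidate rule above can be sketched as a small predicate. This is a hedged illustration: `is_candidate` and its argument names are invented here, not part of the skill.

```shell
# is_candidate TRIGGERS RISK_BONUS CRITICAL_BONUS HIGH_RISK(0/1) IMPORTANT(0/1)
# Returns success when the section qualifies for deepening.
is_candidate() {
  total=$(( $1 + $2 + $3 ))
  if [ "$total" -ge 2 ]; then
    return 0
  fi
  # 1+ total point still qualifies when the domain is high-risk and the section matters
  [ "$4" -eq 1 ] && [ "$5" -eq 1 ] && [ "$total" -ge 1 ]
}
```

Under this sketch, the two worked examples above both qualify: 1 trigger plus the critical-section bonus reaches 2 points, and 1 trigger in a high-risk, materially important section passes the relaxed threshold.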
If the plan already has a `deepened:` date:
- Prefer sections that have not yet been substantially strengthened, if their scores are comparable
- Revisit an already-deepened section only when it still scores clearly higher than alternatives or the user explicitly asks for another pass on it
#### 2.1 Section Checklists
Use these triggers.
**Requirements Trace**
- Requirements are vague or disconnected from implementation units
- Success criteria are missing or not reflected downstream
- Units do not clearly advance the traced requirements
- Origin requirements are not clearly carried forward
**Context & Research / Sources & References**
- Relevant repo patterns are named but never used in decisions or implementation units
- Cited learnings or references do not materially shape the plan
- High-risk work lacks appropriate external or internal grounding
- Research is generic instead of tied to this repo or this plan
**Key Technical Decisions**
- A decision is stated without rationale
- Rationale does not explain tradeoffs or rejected alternatives
- The decision does not connect back to scope, requirements, or origin context
- An obvious design fork exists but the plan never addresses why one path won
**Open Questions**
- Product blockers are hidden as assumptions
- Planning-owned questions are incorrectly deferred to implementation
- Resolved questions have no clear basis in repo context, research, or origin decisions
- Deferred items are too vague to be useful later
**Implementation Units**
- Dependency order is unclear or likely wrong
- File paths or test file paths are missing where they should be explicit
- Units are too large, too vague, or broken into micro-steps
- Approach notes are thin or do not name the pattern to follow
- Test scenarios or verification outcomes are vague
**System-Wide Impact**
- Affected interfaces, callbacks, middleware, entry points, or parity surfaces are missing
- Failure propagation is underexplored
- State lifecycle, caching, or data integrity risks are absent where relevant
- Integration coverage is weak for cross-layer work
**Risks & Dependencies / Documentation / Operational Notes**
- Risks are listed without mitigation
- Rollout, monitoring, migration, or support implications are missing when warranted
- External dependency assumptions are weak or unstated
- Security, privacy, performance, or data risks are absent where they obviously apply
Use the plan's own `Context & Research` and `Sources & References` as evidence. If those sections cite a pattern, learning, or risk that never affects decisions, implementation units, or verification, treat that as a confidence gap.
### Phase 3: Select Targeted Research Agents
For each selected section, choose the smallest useful agent set. Do **not** run every agent. Use at most **1-3 agents per section** and usually no more than **8 agents total**.
Use fully-qualified agent names inside Task calls.
#### 3.1 Deterministic Section-to-Agent Mapping
**Requirements Trace / Open Questions classification**
- `compound-engineering:workflow:spec-flow-analyzer` for missing user flows, edge cases, and handoff gaps
- `compound-engineering:research:repo-research-analyst` for repo-grounded patterns, conventions, and implementation reality checks
**Context & Research / Sources & References gaps**
- `compound-engineering:research:learnings-researcher` for institutional knowledge and past solved problems
- `compound-engineering:research:framework-docs-researcher` for official framework or library behavior
- `compound-engineering:research:best-practices-researcher` for current external patterns and industry guidance
- Add `compound-engineering:research:git-history-analyzer` only when historical rationale or prior art is materially missing
**Key Technical Decisions**
- `compound-engineering:review:architecture-strategist` for design integrity, boundaries, and architectural tradeoffs
- Add `compound-engineering:research:framework-docs-researcher` or `compound-engineering:research:best-practices-researcher` when the decision needs external grounding beyond repo evidence
**Implementation Units / Verification**
- `compound-engineering:research:repo-research-analyst` for concrete file targets, patterns to follow, and repo-specific sequencing clues
- `compound-engineering:review:pattern-recognition-specialist` for consistency, duplication risks, and alignment with existing patterns
- Add `compound-engineering:workflow:spec-flow-analyzer` when sequencing depends on user flow or handoff completeness
**System-Wide Impact**
- `compound-engineering:review:architecture-strategist` for cross-boundary effects, interface surfaces, and architectural knock-on impact
- Add the specific specialist that matches the risk:
- `compound-engineering:review:performance-oracle` for scalability, latency, throughput, and resource-risk analysis
- `compound-engineering:review:security-sentinel` for auth, validation, exploit surfaces, and security boundary review
- `compound-engineering:review:data-integrity-guardian` for migrations, persistent state safety, consistency, and data lifecycle risks
**Risks & Dependencies / Operational Notes**
- Use the specialist that matches the actual risk:
- `compound-engineering:review:security-sentinel` for security, auth, privacy, and exploit risk
- `compound-engineering:review:data-integrity-guardian` for persistent data safety, constraints, and transaction boundaries
- `compound-engineering:review:data-migration-expert` for migration realism, backfills, and production data transformation risk
- `compound-engineering:review:deployment-verification-agent` for rollout checklists, rollback planning, and launch verification
- `compound-engineering:review:performance-oracle` for capacity, latency, and scaling concerns
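The deterministic mapping above is, in effect, a lookup table plus a budget. A minimal sketch follows; the agent names are the real fully-qualified names from this skill, while the dict keys, helper name, and budget enforcement are illustrative.

```python
# Illustrative lookup table for the section-to-agent mapping, with the
# budgets stated in Phase 3 (at most 3 agents per section, ~8 total).

SECTION_AGENTS = {
    "requirements_trace": [
        "compound-engineering:workflow:spec-flow-analyzer",
        "compound-engineering:research:repo-research-analyst",
    ],
    "context_research": [
        "compound-engineering:research:learnings-researcher",
        "compound-engineering:research:framework-docs-researcher",
        "compound-engineering:research:best-practices-researcher",
    ],
    "technical_decisions": [
        "compound-engineering:review:architecture-strategist",
    ],
    "implementation_units": [
        "compound-engineering:research:repo-research-analyst",
        "compound-engineering:review:pattern-recognition-specialist",
    ],
    "system_wide_impact": [
        "compound-engineering:review:architecture-strategist",
    ],
}

MAX_PER_SECTION = 3
MAX_TOTAL = 8

def agents_for(sections):
    """Collect deduplicated agents for the selected sections within budget."""
    chosen = []
    for name in sections:
        for agent in SECTION_AGENTS.get(name, [])[:MAX_PER_SECTION]:
            if agent not in chosen and len(chosen) < MAX_TOTAL:
                chosen.append(agent)
    return chosen
```

Note that deduplication matters: `repo-research-analyst` serves both the requirements trace and implementation units, but should be dispatched once with a combined question.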
#### 3.2 Agent Prompt Shape
For each selected section, pass:
- A short plan summary
- The exact section text
- Why the section was selected, including which checklist triggers fired
- The plan depth and risk profile
- A specific question to answer
Instruct the agent to return:
- findings that change planning quality
- stronger rationale, sequencing, verification, risk treatment, or references
- no implementation code
- no shell commands
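One way to picture the per-section prompt payload described above (the field names and function are hypothetical; the skill specifies the content, not a schema):

```python
# Hypothetical shape of the per-section agent prompt from 3.1/3.2.

def build_agent_prompt(plan_summary, section_text, triggers, depth_and_risk, question):
    """Bundle the inputs this skill says to pass to each research agent."""
    return {
        "plan_summary": plan_summary,
        "section": section_text,
        "why_selected": triggers,          # which checklist triggers fired
        "depth_and_risk": depth_and_risk,  # plan depth and risk profile
        "question": question,              # the specific question to answer
        "constraints": [
            "return findings that change planning quality",
            "no implementation code",
            "no shell commands",
        ],
    }
```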
### Phase 4: Run Targeted Research and Review
Launch the selected agents in parallel.
Prefer local repo and institutional evidence first. Use external research only when the gap cannot be closed responsibly from repo context or already-cited sources.
If a selected section can be improved by reading the origin document more carefully, do that before dispatching external agents.
If agent outputs conflict:
- Prefer repo-grounded and origin-grounded evidence over generic advice
- Prefer official framework documentation over secondary best-practice summaries when the conflict is about library behavior
- If a real tradeoff remains, record it explicitly in the plan rather than pretending the conflict does not exist
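The precedence rules above amount to a small evidence ranking. A hedged sketch, with illustrative source labels and helper name:

```python
# Illustrative ranking of conflicting agent findings.
# Lower rank wins; a tie at the top is a real tradeoff to record.

EVIDENCE_RANK = {
    "repo": 0,           # repo-grounded or origin-grounded evidence wins
    "official_docs": 1,  # official framework docs beat secondary summaries
    "best_practice": 2,  # generic external advice ranks last
}

def resolve(findings):
    """Pick the best-grounded finding, or surface a tie as a tradeoff."""
    ranked = sorted(findings, key=lambda f: EVIDENCE_RANK[f["source"]])
    best_rank = EVIDENCE_RANK[ranked[0]["source"]]
    top = [f for f in ranked if EVIDENCE_RANK[f["source"]] == best_rank]
    if len(top) > 1:
        return {"tradeoff": top}  # record the conflict in the plan explicitly
    return ranked[0]
```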
### Phase 5: Synthesize and Rewrite the Plan
Strengthen only the selected sections. Keep the plan coherent and preserve its overall structure.
Allowed changes:
- Clarify or strengthen decision rationale
- Tighten requirements trace or origin fidelity
- Reorder or split implementation units when sequencing is weak
- Add missing pattern references, file/test paths, or verification outcomes
- Expand system-wide impact, risks, or rollout treatment where justified
- Reclassify open questions between `Resolved During Planning` and `Deferred to Implementation` when evidence supports the change
- Add an optional deep-plan section only when it materially improves execution quality
- Add or update `deepened: YYYY-MM-DD` in frontmatter when the plan was substantively improved
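The `deepened:` frontmatter update can be sketched as a small helper, assuming conventional `---`-delimited YAML frontmatter (the function name is hypothetical):

```python
# Sketch of adding or updating `deepened: YYYY-MM-DD` in plan frontmatter,
# assuming the plan file opens with a `---`-delimited YAML block.
import datetime
import re

def mark_deepened(text, today=None):
    """Add or refresh the `deepened:` date in the frontmatter block."""
    today = today or datetime.date.today().isoformat()
    head, sep, body = text.partition("\n---\n")
    if not sep:
        return text  # no frontmatter found; leave the file unchanged
    if "deepened:" in head:
        head = re.sub(r"deepened: .*", f"deepened: {today}", head)
    else:
        head += f"\ndeepened: {today}"
    return head + sep + body
```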
Do **not**:
- Add fenced implementation code blocks unless the plan itself is about code shape as a design artifact
- Add git commands, commit choreography, or exact test command recipes
- Add generic `Research Insights` subsections everywhere
- Rewrite the entire plan from scratch
- Invent new product requirements, scope changes, or success criteria without surfacing them explicitly
If research reveals a product-level ambiguity that should change behavior or scope:
- Do not silently decide it here
- Record it under `Open Questions`
- Recommend `ce:brainstorm` if the gap is truly product-defining
### Phase 6: Final Checks and Write the File
Before writing:
- Confirm the plan is stronger in specific ways, not merely longer
- Confirm the planning boundary is intact
- Confirm the selected sections were actually the weakest ones
- Confirm origin decisions were preserved when an origin document exists
- Confirm the final plan still feels right-sized for its depth
Update the plan file in place by default.
If the user explicitly requests a separate file, append `-deepened` before `.md`, for example:
- `docs/plans/2026-03-15-001-feat-example-plan-deepened.md`
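The filename rule is mechanical enough to pin down in a short sketch (hypothetical helper name):

```python
# Sketch of the separate-file naming rule: append `-deepened` before `.md`.

def deepened_path(plan_path):
    """Derive the `-deepened` variant of a plan path."""
    assert plan_path.endswith(".md"), "plan files are expected to end in .md"
    return plan_path[:-3] + "-deepened.md"
```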
## Post-Enhancement Options
If substantive changes were made, present next steps using the platform's blocking question tool when available (see Interaction Method). Otherwise, present numbered options in chat and wait for the user's reply before proceeding.
**Question:** "Plan deepened at `[plan_path]`. What would you like to do next?"
**Options:**
1. **View diff** - Show what changed
2. **Run `document-review` skill** - Improve the updated plan through structured document review
3. **Start `ce:work` skill** - Begin implementing the plan
4. **Deepen specific sections further** - Run another targeted deepening pass on named sections
Based on selection:
- **View diff** -> Show the important additions and changed sections
- **`document-review` skill** -> Load the `document-review` skill with the plan path
- **Start `ce:work` skill** -> Call the `ce:work` skill with the plan path
- **Deepen specific sections further** -> Ask which sections still feel weak and run another targeted pass only for those sections
If no substantive changes were warranted:
- Say that the plan already appears sufficiently grounded
- Offer the `document-review` skill or `/ce:work` as the next step instead
NEVER CODE! Research, challenge, and strengthen the plan.

View File

@@ -5,17 +5,19 @@ argument-hint: "[feature description]"
 disable-model-invocation: true
 ---
-CRITICAL: You MUST execute every step below IN ORDER. Do NOT skip any step. Do NOT jump ahead to coding or implementation. The plan phase (steps 2-3) MUST be completed and verified BEFORE any work begins. Violating this order produces bad output.
+CRITICAL: You MUST execute every step below IN ORDER. Do NOT skip any required step. Do NOT jump ahead to coding or implementation. The plan phase (step 2, and step 3 when warranted) MUST be completed and verified BEFORE any work begins. Violating this order produces bad output.
 1. **Optional:** If the `ralph-wiggum` skill is available, run `/ralph-wiggum:ralph-loop "finish all slash commands" --completion-promise "DONE"`. If not available or it fails, skip and continue to step 2 immediately.
 2. `/ce:plan $ARGUMENTS`
-GATE: STOP. Verify that `/ce:plan` produced a plan file in `docs/plans/`. If no plan file was created, run `/ce:plan $ARGUMENTS` again. Do NOT proceed to step 3 until a written plan exists.
+GATE: STOP. Verify that the `ce:plan` workflow produced a plan file in `docs/plans/`. If no plan file was created, run `/ce:plan $ARGUMENTS` again. Do NOT proceed to step 3 until a written plan exists.
-3. `/compound-engineering:deepen-plan`
+3. **Conditionally** run `/compound-engineering:deepen-plan`
-GATE: STOP. Confirm the plan has been deepened and updated. The plan file in `docs/plans/` should now contain additional detail. Do NOT proceed to step 4 without a deepened plan.
+Run the `deepen-plan` workflow only if the plan is `Standard` or `Deep`, touches a high-risk area (auth, security, payments, migrations, external APIs, significant rollout concerns), or still has obvious confidence gaps in decisions, sequencing, system-wide impact, risks, or verification.
+GATE: STOP. If you ran the `deepen-plan` workflow, confirm the plan was deepened or explicitly judged sufficiently grounded. If you skipped it, briefly note why and proceed to step 4.
 4. `/ce:work`

View File

@@ -11,7 +11,10 @@ Swarm-enabled LFG. Run these steps in order, parallelizing where indicated. Do n
 1. **Optional:** If the `ralph-wiggum` skill is available, run `/ralph-wiggum:ralph-loop "finish all slash commands" --completion-promise "DONE"`. If not available or it fails, skip and continue to step 2 immediately.
 2. `/ce:plan $ARGUMENTS`
-3. `/compound-engineering:deepen-plan`
+3. **Conditionally** run `/compound-engineering:deepen-plan`
+   - Run the `deepen-plan` workflow only if the plan is `Standard` or `Deep`, touches a high-risk area (auth, security, payments, migrations, external APIs, significant rollout concerns), or still has obvious confidence gaps in decisions, sequencing, system-wide impact, risks, or verification
+   - If you run the `deepen-plan` workflow, confirm the plan was deepened or explicitly judged sufficiently grounded before moving on
+   - If you skip it, note why and continue to step 4
 4. `/ce:work`
 **Use swarm mode**: Make a Task list and launch a swarm of subagents to build the plan
 ## Parallel Phase