Merge upstream origin/main (v2.60.0) with fork customizations preserved

Incorporates 78 upstream commits while preserving all local fork intent:

- Keep deleted: dhh-rails, kieran-rails, dspy-ruby, andrew-kane-gem-writer (FastAPI pivot)
- Merge both: ce-review (zip-agent + design-conformance wiring), kieran-python-reviewer (pipeline + FastAPI conventions), ce-brainstorm/ce-plan/ce-work (improvements + deploy wiring), todo-create (template refs + assessment block), best-practices-researcher (rename + FastAPI refs)
- Accept remote: 142 remote-only files, plugin.json, README.md
- Keep local: 71 local-only files (custom agents, skills, commands, voice)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 12:28:53 -05:00
parent 8a1b176044 bf1f79aba4
commit 4018db3d9e
153 changed files with 12801 additions and 3761 deletions
--- a/plugins/compound-engineering/skills/ce-work/SKILL.md
+++ b/plugins/compound-engineering/skills/ce-work/SKILL.md
@@ -1,16 +1,16 @@
 ---
 name: ce:work
-description: Execute work plans efficiently while maintaining quality and finishing features
-argument-hint: "[plan file, specification, or todo file path]"
+description: Execute work efficiently while maintaining quality and finishing features
+argument-hint: "[Plan doc path or description of work. Blank to auto use latest plan doc]"
 ---

-# Work Plan Execution Command
+# Work Execution Command

-Execute a work plan efficiently while maintaining quality and finishing features.
+Execute work efficiently while maintaining quality and finishing features.

 ## Introduction

-This command takes a work document (plan, specification, or todo file) and executes it systematically. The focus is on **shipping complete features** by understanding requirements quickly, following existing patterns, and maintaining quality throughout.
+This command takes a work document (plan, specification, or todo file) or a bare prompt describing the work, and executes it systematically. The focus is on **shipping complete features** by understanding requirements quickly, following existing patterns, and maintaining quality throughout.

 ## Input Document

@@ -18,9 +18,33 @@ This command takes a work document (plan, specification, or todo file) and execu

 ## Execution Workflow

+### Phase 0: Input Triage
+
+Determine how to proceed based on what was provided in `<input_document>`.
+
+**Plan document** (input is a file path to an existing plan, specification, or todo file) → skip to Phase 1.
+
+**Bare prompt** (input is a description of work, not a file path):
+
+1. **Scan the work area**
+
+   - Identify files likely to change based on the prompt
+   - Find existing test files for those areas (search for test/spec files that import, reference, or share names with the implementation files)
+   - Note local patterns and conventions in the affected areas
+
+2. **Assess complexity and route**
+
+   | Complexity | Signals | Action |
+   |-----------|---------|--------|
+   | **Trivial** | 1-2 files, no behavioral change (typo, config, rename) | Proceed to Phase 1 step 2 (environment setup), then implement directly — no task list, no execution loop. Apply Test Discovery if the change touches behavior-bearing code |
+   | **Small / Medium** | Clear scope, under ~10 files | Build a task list from discovery. Proceed to Phase 1 step 2 |
+   | **Large** | Cross-cutting, architectural decisions, 10+ files, touches auth/payments/migrations | Inform the user this would benefit from `/ce:brainstorm` or `/ce:plan` to surface edge cases and scope boundaries. Honor their choice. If proceeding, build a task list and continue to Phase 1 step 2 |
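
Reviewer note: the routing table added above can be read as a small decision function. The sketch below is only an illustration of that reading, not part of the skill; `WorkScan`, its fields, and the returned labels are hypothetical names, and "architectural decisions" is folded into the `cross_cutting` flag as a simplifying assumption.

```python
from dataclasses import dataclass

# Areas the table names as automatically high-risk (labels are hypothetical).
SENSITIVE_AREAS = {"auth", "payments", "migrations"}


@dataclass
class WorkScan:
    """Hypothetical summary of the 'Scan the work area' step for a bare prompt."""
    files_likely_to_change: int
    behavioral_change: bool
    cross_cutting: bool = False  # assumption: also covers architectural decisions
    touched_areas: frozenset = frozenset()


def route(scan: WorkScan) -> str:
    """Map a scan onto the Trivial / Small-Medium / Large rows of the table."""
    # Large signals are checked first so a tiny change to auth is never "trivial".
    if (
        scan.cross_cutting
        or scan.files_likely_to_change >= 10
        or scan.touched_areas & SENSITIVE_AREAS
    ):
        return "large"
    if scan.files_likely_to_change <= 2 and not scan.behavioral_change:
        return "trivial"
    return "small_medium"
```

Checking the Large signals before the Trivial ones is a design choice of this sketch; the table itself does not state a precedence.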
+
+---
+
 ### Phase 1: Quick Start

-1. **Read Plan and Clarify**
+1. **Read Plan and Clarify** _(skip if arriving from Phase 0 with a bare prompt)_

   - Read the work document completely
   - Treat the plan as a decision artifact, not an execution script
@@ -49,8 +73,17 @@ This command takes a work document (plan, specification, or todo file) and execu
   ```

   **If already on a feature branch** (not the default branch):
-   - Ask: "Continue working on `[current_branch]`, or create a new branch?"
-   - If continuing, proceed to step 3
+
+   First, check whether the branch name is **meaningful** — a name like `feat/crowd-sniff` or `fix/email-validation` tells future readers what the work is about. Auto-generated worktree names (e.g., `worktree-jolly-beaming-raven`) or other opaque names do not.
+
+   If the branch name is meaningless or auto-generated, suggest renaming it before continuing:
+   ```bash
+   git branch -m <meaningful-name>
+   ```
+   Derive the new name from the plan title or work description (e.g., `feat/crowd-sniff`). Present the rename as a recommended option alongside continuing as-is.
+
+   Then ask: "Continue working on `[current_branch]`, or create a new branch?"
+   - If continuing (with or without rename), proceed to step 3
   - If creating new, follow Option A or B below
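
Reviewer note: the "meaningful branch name" check added in this hunk is a judgment call, but the auto-generated case can be approximated mechanically. A minimal sketch, assuming opaque names follow the `worktree-<word>-<word>-...` shape of the example in the text; the pattern and the function name are hypothetical, and any other opaque name would still need human or agent judgment.

```python
import re

# Assumption: auto-generated worktree names look like "worktree-jolly-beaming-raven".
AUTO_GENERATED = re.compile(r"^worktree-[a-z]+(-[a-z]+)*$")


def should_suggest_rename(branch: str) -> bool:
    """Return True when the branch name matches the auto-generated worktree shape."""
    return bool(AUTO_GENERATED.match(branch))
```

When this returns True, the rename itself is the `git branch -m <meaningful-name>` command shown in the added lines.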

   **If on the default branch**, choose how to proceed:
@@ -78,7 +111,7 @@ This command takes a work document (plan, specification, or todo file) and execu
   - You want to keep the default branch clean while experimenting
   - You plan to switch between branches frequently

-3. **Create Todo List**
+3. **Create Todo List** _(skip if Phase 0 already built one, or if Phase 0 routed as Trivial)_
   - Use your available task tracking tool (e.g., TodoWrite, task lists) to break the plan into actionable tasks
   - Derive tasks from the plan's implementation units, dependencies, files, test targets, and verification criteria
   - Carry each unit's `Execution note` into the task when present
@@ -96,14 +129,15 @@ This command takes a work document (plan, specification, or todo file) and execu

   | Strategy | When to use |
   |----------|-------------|
-   | **Inline** | 1-2 small tasks, or tasks needing user interaction mid-flight |
-   | **Serial subagents** | 3+ tasks with dependencies between them. Each subagent gets a fresh context window focused on one unit — prevents context degradation across many tasks |
-   | **Parallel subagents** | 3+ tasks where some units have no shared dependencies and touch non-overlapping files. Dispatch independent units simultaneously, run dependent units after their prerequisites complete |
+   | **Inline** | 1-2 small tasks, or tasks needing user interaction mid-flight. **Default for bare-prompt work** — bare prompts rarely produce enough structured context to justify subagent dispatch |
+   | **Serial subagents** | 3+ tasks with dependencies between them. Each subagent gets a fresh context window focused on one unit — prevents context degradation across many tasks. Requires plan-unit metadata (Goal, Files, Approach, Test scenarios) |
+   | **Parallel subagents** | 3+ tasks where some units have no shared dependencies and touch non-overlapping files. Dispatch independent units simultaneously, run dependent units after their prerequisites complete. Requires plan-unit metadata |

   **Subagent dispatch** uses your available subagent or task spawning mechanism. For each unit, give the subagent:
   - The full plan file path (for overall context)
   - The specific unit's Goal, Files, Approach, Execution note, Patterns, Test scenarios, and Verification
   - Any resolved deferred questions relevant to that unit
+   - Instruction to check whether the unit's test scenarios cover all applicable categories (happy paths, edge cases, error paths, integration) and supplement gaps before writing tests

   After each subagent completes, update the plan checkboxes and task list before dispatching the next dependent unit.

@@ -118,12 +152,14 @@ This command takes a work document (plan, specification, or todo file) and execu
   ```
   while (tasks remain):
     - Mark task as in-progress
-     - Read any referenced files from the plan
+     - Read any referenced files from the plan or discovered during Phase 0
     - Look for similar patterns in codebase
+     - Find existing test files for implementation files being changed (Test Discovery — see below)
     - Implement following existing conventions
-     - Write tests for new functionality
+     - Add, update, or remove tests to match implementation changes (see Test Discovery below)
     - Run System-Wide Test Check (see below)
     - Run tests after changes
+     - Assess testing coverage: did this task change behavior? If yes, were tests written or updated? If no tests were added, is the justification deliberate (e.g., pure config, no behavioral change)?
     - Mark task as completed
     - Evaluate for incremental commit (see below)
   ```
@@ -136,6 +172,17 @@ This command takes a work document (plan, specification, or todo file) and execu
   - Do not over-implement beyond the current behavior slice when working test-first
   - Skip test-first discipline for trivial renames, pure configuration, and pure styling work

+   **Test Discovery** — Before implementing changes to a file, find its existing test files (search for test/spec files that import, reference, or share naming patterns with the implementation file). When a plan specifies test scenarios or test files, start there, then check for additional test coverage the plan may not have enumerated. Changes to implementation files should be accompanied by corresponding test updates — new tests for new behavior, modified tests for changed behavior, removed or updated tests for deleted behavior.
+
+   **Test Scenario Completeness** — Before writing tests for a feature-bearing unit, check whether the plan's `Test scenarios` cover all categories that apply to this unit. If a category is missing or scenarios are vague (e.g., "validates correctly" without naming inputs and expected outcomes), supplement from the unit's own context before writing tests:
+
+   | Category | When it applies | How to derive if missing |
+   |----------|----------------|------------------------|
+   | **Happy path** | Always for feature-bearing units | Read the unit's Goal and Approach for core input/output pairs |
+   | **Edge cases** | When the unit has meaningful boundaries (inputs, state, concurrency) | Identify boundary values, empty/nil inputs, and concurrent access patterns |
+   | **Error/failure paths** | When the unit has failure modes (validation, external calls, permissions) | Enumerate invalid inputs the unit should reject, permission/auth denials it should enforce, and downstream failures it should handle |
+   | **Integration** | When the unit crosses layers (callbacks, middleware, multi-service) | Identify the cross-layer chain and write a scenario that exercises it without mocks |
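
Reviewer note: the completeness table above amounts to "for each category that applies, confirm the plan covers it". The sketch below illustrates that check under stated assumptions: the trait names and category keys are hypothetical, and it does not attempt the separate judgment of whether an existing scenario is too vague.

```python
# Each category applies when the unit has the paired trait (names are hypothetical;
# one entry per row of the table).
CATEGORY_APPLIES_WHEN = {
    "happy_path": "feature_bearing",
    "edge_cases": "has_boundaries",
    "error_paths": "has_failure_modes",
    "integration": "crosses_layers",
}


def missing_categories(unit_traits: set[str], covered: set[str]) -> list[str]:
    """Return applicable categories that the plan's test scenarios do not cover."""
    return [
        category
        for category, trait in CATEGORY_APPLIES_WHEN.items()
        if trait in unit_traits and category not in covered
    ]
```

Any category returned here is one to supplement from the unit's own context before writing tests.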
+
   **System-Wide Test Check** — Before marking a task done, pause and ask:

   | Question | What to do |
@@ -196,7 +243,7 @@ This command takes a work document (plan, specification, or todo file) and execu
   - Run relevant tests after each significant change
   - Don't wait until the end to test
   - Fix failures immediately
-   - Add new tests for new functionality
+   - Add new tests for new behavior, update tests for changed behavior, remove tests for deleted behavior
   - **Unit tests with mocks prove logic in isolation. Integration tests with real objects prove the layers work together.** If your change touches callbacks, middleware, or error handling — you need both.

 5. **Simplify as You Go**
@@ -236,15 +283,21 @@ This command takes a work document (plan, specification, or todo file) and execu
   # Use linting-agent before pushing to origin
   ```

-2. **Consider Reviewer Agents** (Optional)
+2. **Code Review** (REQUIRED)

-   Use for complex, risky, or large changes. Read agents from `compound-engineering.local.md` frontmatter (`review_agents`). If no settings file, invoke the `setup` skill to create one.
+   Every change gets reviewed before shipping. The depth scales with the change's risk profile, but review itself is never skipped.

-   Run configured agents in parallel with Task tool. Present findings and address critical issues.
+   **Tier 2: Full review (default)** — REQUIRED unless Tier 1 criteria are explicitly met. Invoke the `ce:review` skill with `mode:autofix` to run specialized reviewer agents, auto-apply safe fixes, and surface residual work as todos. When the plan file path is known, pass it as `plan:<path>`. This is the mandatory default — proceed to Tier 1 only after confirming every criterion below.
+
+   **Tier 1: Inline self-review** — A lighter alternative permitted only when **all four** criteria are true. Before choosing Tier 1, explicitly state which criteria apply and why. If any criterion is uncertain, use Tier 2.
+   - Purely additive (new files only, no existing behavior modified)
+   - Single concern (one skill, one component — not cross-cutting)
+   - Pattern-following (implementation mirrors an existing example with no novel logic)
+   - Plan-faithful (no scope growth, no deferred questions resolved with surprising answers)

 3. **Final Validation**
   - All tasks marked completed
-   - All tests pass
+   - Testing addressed -- tests pass and new/changed behavior has corresponding test coverage (or an explicit justification for why tests are not needed)
   - Linting passes
   - Code follows existing patterns
   - Figma designs match (if applicable)
@@ -264,44 +317,9 @@ This command takes a work document (plan, specification, or todo file) and execu

 ### Phase 4: Ship It

-1. **Create Commit**
+1. **Capture and Upload Screenshots for UI Changes** (REQUIRED for any UI work)

-   ```bash
-   git add .
-   git status  # Review what's being committed
-   git diff --staged  # Check the changes
-
-   # Commit with conventional format
-   git commit -m "$(cat <<'EOF'
-   feat(scope): description of what and why
-
-   Brief explanation if needed.
-
-   🤖 Generated with [MODEL] via [HARNESS](HARNESS_URL) + Compound Engineering v[VERSION]
-
-   Co-Authored-By: [MODEL] ([CONTEXT] context, [THINKING]) <noreply@anthropic.com>
-   EOF
-   )"
-   ```
-
-   **Fill in at commit/PR time:**
-
-   | Placeholder | Value | Example |
-   |-------------|-------|---------|
-   | Placeholder | Value | Example |
-   |-------------|-------|---------|
-   | `[MODEL]` | Model name | Claude Opus 4.6, GPT-5.4 |
-   | `[CONTEXT]` | Context window (if known) | 200K, 1M |
-   | `[THINKING]` | Thinking level (if known) | extended thinking |
-   | `[HARNESS]` | Tool running you | Claude Code, Codex, Gemini CLI |
-   | `[HARNESS_URL]` | Link to that tool | `https://claude.com/claude-code` |
-   | `[VERSION]` | `plugin.json` → `version` | 2.40.0 |
-
-   Subagents creating commits/PRs are equally responsible for accurate attribution.
-
-2. **Capture and Upload Screenshots for UI Changes** (REQUIRED for any UI work)
-
-   For **any** design changes, new views, or UI modifications, you MUST capture and upload screenshots:
+   For **any** design changes, new views, or UI modifications, capture and upload screenshots before creating the PR:

   **Step 1: Start dev server** (if not running)
   ```bash
@@ -329,65 +347,29 @@ This command takes a work document (plan, specification, or todo file) and execu
   - **Modified screens**: Before AND after screenshots
   - **Design implementation**: Screenshot showing Figma design match

-   **IMPORTANT**: Always include uploaded image URLs in PR description. This provides visual context for reviewers and documents the change.
+2. **Commit and Create Pull Request**

-3. **Create Pull Request**
+   Load the `git-commit-push-pr` skill to handle committing, pushing, and PR creation. The skill handles convention detection, branch safety, logical commit splitting, adaptive PR descriptions, and attribution badges.

-   ```bash
-   git push -u origin feature-branch-name
+   When providing context for the PR description, include:
+   - The plan's summary and key decisions
+   - Testing notes (tests added/modified, manual testing performed)
+   - Screenshot URLs from step 1 (if applicable)
+   - Figma design link (if applicable)
+   - The Post-Deploy Monitoring & Validation section (see Phase 3 Step 4)

-   gh pr create --title "Feature: [Description]" --body "$(cat <<'EOF'
-   ## Summary
-   - What was built
-   - Why it was needed
-   - Key decisions made
+   If the user prefers to commit without creating a PR, load the `git-commit` skill instead.

-   ## Testing
-   - Tests added/modified
-   - Manual testing performed
-
-   ## Post-Deploy Monitoring & Validation
-   - **What to monitor/search**
-     - Logs:
-     - Metrics/Dashboards:
-   - **Validation checks (queries/commands)**
-     - `command or query here`
-   - **Expected healthy behavior**
-     - Expected signal(s)
-   - **Failure signal(s) / rollback trigger**
-     - Trigger + immediate action
-   - **Validation window & owner**
-     - Window:
-     - Owner:
-   - **If no operational impact**
-     - `No additional operational monitoring required: <reason>`
-
-   ## Before / After Screenshots
-   | Before | After |
-   |--------|-------|
-   | ![before](URL) | ![after](URL) |
-
-   ## Figma Design
-   [Link if applicable]
-
-   ---
-
-   [![Compound Engineering v[VERSION]](https://img.shields.io/badge/Compound_Engineering-v[VERSION]-6366f1)](https://github.com/EveryInc/compound-engineering-plugin)
-   🤖 Generated with [MODEL] ([CONTEXT] context, [THINKING]) via [HARNESS](HARNESS_URL)
-   EOF
-   )"
-   ```
-
-4. **Update Plan Status**
+3. **Update Plan Status**

   If the input document has YAML frontmatter with a `status` field, update it to `completed`:
   ```
   status: active  →  status: completed
   ```

-5. **Notify User**
+4. **Notify User**
   - Summarize what was completed
-   - Link to PR
+   - Link to PR (if one was created)
   - Note any follow-up work needed
   - Suggest next steps if applicable

@@ -445,7 +427,7 @@ Most plans should use subagent dispatch from standard mode. Agent teams add sign
 - Follow existing patterns
 - Write tests for new code
 - Run linting before pushing
-- Use reviewer agents for complex/risky changes only
+- Review every change — inline for simple additive work, full review for everything else

 ### Ship Complete Features

@@ -459,7 +441,7 @@ Before creating PR, verify:

 - [ ] All clarifying questions asked and answered
 - [ ] All tasks marked completed
-- [ ] Tests pass (run project's test command)
+- [ ] Testing addressed -- tests pass AND new/changed behavior has corresponding test coverage (or an explicit justification for why tests are not needed)
 - [ ] Linting passes (use linting-agent)
 - [ ] Code follows existing patterns
 - [ ] Figma designs match implementation (if applicable)
@@ -467,20 +449,22 @@ Before creating PR, verify:
 - [ ] Commit messages follow conventional format
 - [ ] If new env vars added to backend config, deploy values files updated in same PR (not a follow-up)
 - [ ] PR description includes Post-Deploy Monitoring & Validation section (or explicit no-impact rationale)
+- [ ] Code review completed (inline self-review or full `ce:review`)
 - [ ] PR description includes summary, testing notes, and screenshots
-- [ ] PR description includes Compound Engineered badge with accurate model, harness, and version
+- [ ] If new env vars added to backend config, deploy values files updated in same PR (not a follow-up)
+- [ ] PR description includes Compound Engineered badge with accurate model and harness

-## When to Use Reviewer Agents
+## Code Review Tiers

-**Don't use by default.** Use reviewer agents only when:
+Every change gets reviewed. The tier determines depth, not whether review happens.

-- Large refactor affecting many files (10+)
-- Security-sensitive changes (authentication, permissions, data access)
-- Performance-critical code paths
-- Complex algorithms or business logic
-- User explicitly requests thorough review
+**Tier 2 (full review)** — REQUIRED default. Invoke `ce:review mode:autofix` with `plan:<path>` when available. Safe fixes are applied automatically; residual work surfaces as todos. Always use this tier unless all four Tier 1 criteria are explicitly confirmed.

-For most features: tests + linting + following patterns is sufficient.
+**Tier 1 (inline self-review)** — permitted only when all four are true (state each explicitly before choosing):
+- Purely additive (new files only, no existing behavior modified)
+- Single concern (one skill, one component — not cross-cutting)
+- Pattern-following (mirrors an existing example, no novel logic)
+- Plan-faithful (no scope growth, no surprising deferred-question resolutions)
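
Reviewer note: the tier rule above is strict: Tier 1 requires all four criteria to be explicitly confirmed, and anything uncertain falls back to Tier 2. A minimal sketch of that rule, where the criterion keys and the use of `None` for "uncertain" are assumptions made for illustration:

```python
from typing import Optional

# Hypothetical keys, one per Tier 1 criterion listed above.
TIER_1_CRITERIA = (
    "purely_additive",
    "single_concern",
    "pattern_following",
    "plan_faithful",
)


def choose_review_tier(criteria: dict[str, Optional[bool]]) -> int:
    """Return 1 only when every criterion is explicitly True; otherwise 2."""
    # A missing or None entry counts as uncertain, which forces the full review.
    confirmed = all(criteria.get(name) is True for name in TIER_1_CRITERIA)
    return 1 if confirmed else 2
```

Defaulting to 2 on missing keys mirrors the text's framing of full review as the mandatory default.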

 ## Common Pitfalls to Avoid

@@ -490,4 +474,4 @@ For most features: tests + linting + following patterns is sufficient.
 - **Testing at the end** - Test continuously or suffer later
 - **Forgetting to track progress** - Update task status as you go or lose track of what's done
 - **80% done syndrome** - Finish the feature, don't move on early
-- **Over-reviewing simple changes** - Save reviewer agents for complex work
+- **Skipping review** - Every change gets reviewed; only the depth varies