| name | description | argument-hint |
|---|---|---|
| ce:work | Execute work plans efficiently while maintaining quality and finishing features | [plan file, specification, or todo file path] |
# Work Plan Execution Command
Execute a work plan efficiently while maintaining quality and finishing features.
## Introduction
This command takes a work document (plan, specification, or todo file) and executes it systematically. The focus is on shipping complete features by understanding requirements quickly, following existing patterns, and maintaining quality throughout.
## Input Document
<input_document> #$ARGUMENTS </input_document>
## Execution Workflow

### Phase 1: Quick Start
1. **Read Plan and Clarify**
   - Read the work document completely
   - Treat the plan as a decision artifact, not an execution script
   - If the plan includes sections such as `Implementation Units`, `Work Breakdown`, `Requirements Trace`, `Files`, `Test Scenarios`, or `Verification`, use those as the primary source material for execution
   - Check for an `Execution note` on each implementation unit — these carry the plan's execution posture signal for that unit (for example, test-first or characterization-first). Note them when creating tasks
   - Check for a `Deferred to Implementation` or `Implementation-Time Unknowns` section — these are questions the planner intentionally left for you to resolve during execution. Note them before starting so they inform your approach rather than surprising you mid-task
   - Check for a `Scope Boundaries` section — these are explicit non-goals. Refer back to them if implementation starts pulling you toward adjacent work
   - Review any references or links provided in the plan
   - If the user explicitly asks for TDD, test-first, or characterization-first execution in this session, honor that request even if the plan has no `Execution note`
   - If anything is unclear or ambiguous, ask clarifying questions now
   - Get user approval to proceed
   - Do not skip this - better to ask questions now than build the wrong thing
2. **Setup Environment**

   First, check the current branch:

   ```shell
   current_branch=$(git branch --show-current)
   default_branch=$(git symbolic-ref refs/remotes/origin/HEAD 2>/dev/null | sed 's@^refs/remotes/origin/@@')
   # Fallback if remote HEAD isn't set
   if [ -z "$default_branch" ]; then
     default_branch=$(git rev-parse --verify origin/main >/dev/null 2>&1 && echo "main" || echo "master")
   fi
   ```

   If already on a feature branch (not the default branch):
   - Ask: "Continue working on `[current_branch]`, or create a new branch?"
   - If continuing, proceed to step 3
   - If creating new, follow Option A or B below

   If on the default branch, choose how to proceed:

   **Option A: Create a new branch**

   ```shell
   git pull origin [default_branch]
   git checkout -b feature-branch-name
   ```

   Use a meaningful name based on the work (e.g., `feat/user-authentication`, `fix/email-validation`).

   **Option B: Use a worktree (recommended for parallel development)**

   ```shell
   skill: git-worktree
   # The skill will create a new branch from the default branch in an isolated worktree
   ```

   **Option C: Continue on the default branch**
   - Requires explicit user confirmation
   - Only proceed after user explicitly says "yes, commit to [default_branch]"
   - Never commit directly to the default branch without explicit permission

   Recommendation: Use a worktree if:
   - You want to work on multiple features simultaneously
   - You want to keep the default branch clean while experimenting
   - You plan to switch between branches frequently
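If the `git-worktree` skill is unavailable, the same setup works with plain git. A minimal sketch; the repository name, branch name, and paths are hypothetical, and the temporary sandbox exists only to make the example self-contained:

```shell
# Plain-git equivalent of the worktree option (names and paths are examples)
tmpdir=$(mktemp -d) && git init -q "$tmpdir/myrepo" && cd "$tmpdir/myrepo"
git -c user.name=dev -c user.email=dev@example.com commit -q --allow-empty -m "init"
default_branch=$(git branch --show-current)

# Create an isolated checkout on a new branch; the original checkout is untouched
git worktree add ../myrepo-user-auth -b feat/user-authentication "$default_branch"
git worktree list
```

When the branch is merged, `git worktree remove ../myrepo-user-auth` deletes the extra checkout without touching the branch itself.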
3. **Create Todo List**
   - Use your available task tracking tool (e.g., TodoWrite, task lists) to break the plan into actionable tasks
   - Derive tasks from the plan's implementation units, dependencies, files, test targets, and verification criteria
   - Carry each unit's `Execution note` into the task when present
   - For each unit, read the `Patterns to follow` field before implementing — these point to specific files or conventions to mirror
   - Use each unit's `Verification` field as the primary "done" signal for that task
   - Do not expect the plan to contain implementation code, micro-step TDD instructions, or exact shell commands
   - Include dependencies between tasks
   - Prioritize based on what needs to be done first
   - Include testing and quality check tasks
   - Keep tasks specific and completable
4. **Choose Execution Strategy**

   After creating the task list, decide how to execute based on the plan's size and dependency structure:

   | Strategy | When to use |
   |---|---|
   | Inline | 1-2 small tasks, or tasks needing user interaction mid-flight |
   | Serial subagents | 3+ tasks with dependencies between them. Each subagent gets a fresh context window focused on one unit — prevents context degradation across many tasks |
   | Parallel subagents | 3+ tasks where some units have no shared dependencies and touch non-overlapping files. Dispatch independent units simultaneously, run dependent units after their prerequisites complete |

   Subagent dispatch uses your available subagent or task spawning mechanism. For each unit, give the subagent:
   - The full plan file path (for overall context)
   - The specific unit's Goal, Files, Approach, Execution note, Patterns, Test scenarios, and Verification
   - Any resolved deferred questions relevant to that unit

   After each subagent completes, update the plan checkboxes and task list before dispatching the next dependent unit.

   For genuinely large plans needing persistent inter-agent communication (agents challenging each other's approaches, shared coordination across 10+ tasks), see Swarm Mode below, which uses Agent Teams.
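The per-unit briefing described above can be sketched as a heredoc; every path, unit name, and field value here is hypothetical:

```shell
# Assemble a subagent briefing for one unit (all names and paths are examples)
unit_prompt=$(cat <<'EOF'
Plan: docs/plans/user-auth-plan.md (read first for overall context)
Unit 3 - Goal: validate email format at signup
Files: app/models/user.rb, test/models/user_test.rb
Execution note: test-first
Patterns to follow: validation style in app/models/account.rb
Verification: model test suite passes
Resolved deferred question: reuse the existing format-validation helper
EOF
)
printf '%s\n' "$unit_prompt"
```

The same text block works as the prompt body for whatever subagent or task-spawning mechanism is available.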
### Phase 2: Execute
1. **Task Execution Loop**

   For each task in priority order:

   ```
   while (tasks remain):
     - Mark task as in-progress
     - Read any referenced files from the plan
     - Look for similar patterns in codebase
     - Implement following existing conventions
     - Write tests for new functionality
     - Run System-Wide Test Check (see below)
     - Run tests after changes
     - Mark task as completed
     - Evaluate for incremental commit (see below)
   ```

   When a unit carries an `Execution note`, honor it. For test-first units, write the failing test before implementation for that unit. For characterization-first units, capture existing behavior before changing it. For units without an `Execution note`, proceed pragmatically.

   Guardrails for execution posture:
   - Do not write the test and implementation in the same step when working test-first
   - Do not skip verifying that a new test fails before implementing the fix or feature
   - Do not over-implement beyond the current behavior slice when working test-first
   - Skip test-first discipline for trivial renames, pure configuration, and pure styling work

   **System-Wide Test Check** — Before marking a task done, pause and ask:

   | Question | What to do |
   |---|---|
   | What fires when this runs? | Callbacks, middleware, observers, event handlers — trace two levels out from your change. Read the actual code (not docs) for callbacks on models you touch, middleware in the request chain, `after_*` hooks. |
   | Do my tests exercise the real chain? | If every dependency is mocked, the test proves your logic works in isolation — it says nothing about the interaction. Write at least one integration test that uses real objects through the full callback/middleware chain. No mocks for the layers that interact. |
   | Can failure leave orphaned state? | If your code persists state (DB row, cache, file) before calling an external service, what happens when the service fails? Does retry create duplicates? Trace the failure path with real objects. If state is created before the risky call, test that failure cleans up or that retry is idempotent. |
   | What other interfaces expose this? | Mixins, DSLs, alternative entry points (Agent vs Chat vs ChatMethods). Grep for the method/behavior in related classes. If parity is needed, add it now — not as a follow-up. |
   | Do error strategies align across layers? | Retry middleware + application fallback + framework error handling — do they conflict or create double execution? List the specific error classes at each layer. Verify your rescue list matches what the lower layer actually raises. |

   **When to skip:** Leaf-node changes with no callbacks, no state persistence, no parallel interfaces. If the change is purely additive (new helper method, new view partial), the check takes 10 seconds and the answer is "nothing fires, skip."

   **When this matters most:** Any change that touches models with callbacks, error handling with fallback/retry, or functionality exposed through multiple interfaces.
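The test-first guardrails can be rehearsed with a deliberately tiny red-to-green loop. Everything below is a throwaway example (a fake "test" script and a one-line implementation), not part of any real project:

```shell
# Red-to-green sketch (all files and names are hypothetical)
tmpdir=$(mktemp -d) && cd "$tmpdir"

# 1. Write the "test" first: it expects a validate_email function to exist
cat > check.sh <<'EOF'
grep -q "validate_email" app.sh
EOF

# 2. Confirm the test fails BEFORE implementing (the "red" step)
sh check.sh 2>/dev/null && echo "unexpected: already green" || echo "red: fails as expected"

# 3. Implement the smallest slice that satisfies the test
echo 'validate_email() { case "$1" in *@*) return 0;; *) return 1;; esac; }' > app.sh

# 4. Re-run and confirm it now passes (the "green" step)
sh check.sh && echo "green: passes after implementation"
```

The point is the ordering: the failing run in step 2 is what proves the test actually exercises the missing behavior.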
2. **Incremental Commits**

   After completing each task, evaluate whether to create an incremental commit:

   | Commit when... | Don't commit when... |
   |---|---|
   | Logical unit complete (model, service, component) | Small part of a larger unit |
   | Tests pass + meaningful progress | Tests failing |
   | About to switch contexts (backend → frontend) | Purely scaffolding with no behavior |
   | About to attempt risky/uncertain changes | Would need a "WIP" commit message |

   Heuristic: "Can I write a commit message that describes a complete, valuable change? If yes, commit. If the message would be 'WIP' or 'partial X', wait."

   If the plan has Implementation Units, use them as a starting guide for commit boundaries — but adapt based on what you find during implementation. A unit might need multiple commits if it's larger than expected, or small related units might land together. Use each unit's Goal to inform the commit message.

   Commit workflow:

   ```shell
   # 1. Verify tests pass (use project's test command)
   #    Examples: bin/rails test, npm test, pytest, go test, etc.

   # 2. Stage only files related to this logical unit (not `git add .`)
   git add <files related to this logical unit>

   # 3. Commit with conventional message
   git commit -m "feat(scope): description of this unit"
   ```

   **Handling merge conflicts:** If conflicts arise during rebasing or merging, resolve them immediately. Incremental commits make conflict resolution easier since each commit is small and focused.

   Note: Incremental commits use clean conventional messages without attribution footers. The final Phase 4 commit/PR includes the full attribution.
3. **Follow Existing Patterns**
   - The plan should reference similar code - read those files first
   - Match naming conventions exactly
   - Reuse existing components where possible
   - Follow project coding standards (see AGENTS.md; use CLAUDE.md only if the repo still keeps a compatibility shim)
   - When in doubt, grep for similar implementations
4. **Test Continuously**
   - Run relevant tests after each significant change
   - Don't wait until the end to test
   - Fix failures immediately
   - Add new tests for new functionality
   - Unit tests with mocks prove logic in isolation. Integration tests with real objects prove the layers work together. If your change touches callbacks, middleware, or error handling — you need both.
5. **Simplify as You Go**

   After completing a cluster of related implementation units (or every 2-3 units), review recently changed files for simplification opportunities — consolidate duplicated patterns, extract shared helpers, and improve code reuse and efficiency. This is especially valuable when using subagents, since each agent works with isolated context and can't see patterns emerging across units.

   Don't simplify after every single unit — early patterns may look duplicated but diverge intentionally in later units. Wait for a natural phase boundary or when you notice accumulated complexity.

   If a `/simplify` skill or equivalent is available, use it. Otherwise, review the changed files yourself for reuse and consolidation opportunities.
6. **Figma Design Sync (if applicable)**

   For UI work with Figma designs:
   - Implement components following design specs
   - Use the figma-design-sync agent iteratively to compare
   - Fix visual differences identified
   - Repeat until implementation matches design
7. **Track Progress**
   - Keep the task list updated as you complete tasks
   - Note any blockers or unexpected discoveries
   - Create new tasks if scope expands
   - Keep user informed of major milestones
### Phase 3: Quality Check
1. **Run Core Quality Checks**

   Always run before submitting:

   ```shell
   # Run full test suite (use project's test command)
   # Examples: bin/rails test, npm test, pytest, go test, etc.

   # Run linting (per AGENTS.md)
   # Use linting-agent before pushing to origin
   ```
2. **Consider Reviewer Agents (Optional)**

   Use for complex, risky, or large changes. Read agents from `compound-engineering.local.md` frontmatter (`review_agents`). If no settings file exists, invoke the `setup` skill to create one. Run the configured agents in parallel with the Task tool. Present findings and address critical issues.
3. **Final Validation**
   - All tasks marked completed
   - All tests pass
   - Linting passes
   - Code follows existing patterns
   - Figma designs match (if applicable)
   - No console errors or warnings
   - If the plan has a `Requirements Trace`, verify each requirement is satisfied by the completed work
   - If any `Deferred to Implementation` questions were noted, confirm they were resolved during execution
4. **Prepare Operational Validation Plan (REQUIRED)**
   - Add a `## Post-Deploy Monitoring & Validation` section to the PR description for every change.
   - Include concrete:
     - Log queries/search terms
     - Metrics or dashboards to watch
     - Expected healthy signals
     - Failure signals and rollback/mitigation trigger
     - Validation window and owner
   - If there is truly no production/runtime impact, still include the section with `No additional operational monitoring required` and a one-line reason.
### Phase 4: Ship It
1. **Create Commit**

   ```shell
   git add .
   git status         # Review what's being committed
   git diff --staged  # Check the changes

   # Commit with conventional format
   git commit -m "$(cat <<'EOF'
   feat(scope): description of what and why

   Brief explanation if needed.

   🤖 Generated with [MODEL] via [HARNESS](HARNESS_URL) + Compound Engineering v[VERSION]

   Co-Authored-By: [MODEL] ([CONTEXT] context, [THINKING]) <noreply@anthropic.com>
   EOF
   )"
   ```

   Fill in at commit/PR time:

   | Placeholder | Value | Example |
   |---|---|---|
   | `[MODEL]` | Model name | Claude Opus 4.6, GPT-5.4 |
   | `[CONTEXT]` | Context window (if known) | 200K, 1M |
   | `[THINKING]` | Thinking level (if known) | extended thinking |
   | `[HARNESS]` | Tool running you | Claude Code, Codex, Gemini CLI |
   | `[HARNESS_URL]` | Link to that tool | https://claude.com/claude-code |
   | `[VERSION]` | `plugin.json` → `version` | 2.40.0 |

   Subagents creating commits/PRs are equally responsible for accurate attribution.
2. **Capture and Upload Screenshots for UI Changes (REQUIRED for any UI work)**

   For any design changes, new views, or UI modifications, you MUST capture and upload screenshots:

   **Step 1: Start dev server (if not running)**

   ```shell
   bin/dev  # Run in background
   ```

   **Step 2: Capture screenshots with agent-browser CLI**

   ```shell
   agent-browser open http://localhost:3000/[route]
   agent-browser snapshot -i
   agent-browser screenshot output.png
   ```

   See the `agent-browser` skill for detailed usage.

   **Step 3: Upload using imgup skill**

   ```shell
   skill: imgup
   # Then upload each screenshot:
   imgup -h pixhost screenshot.png  # pixhost works without API key
   # Alternative hosts: catbox, imagebin, beeimg
   ```

   What to capture:
   - New screens: Screenshot of the new UI
   - Modified screens: Before AND after screenshots
   - Design implementation: Screenshot showing Figma design match

   IMPORTANT: Always include uploaded image URLs in the PR description. This provides visual context for reviewers and documents the change.
3. **Create Pull Request**

   ```shell
   git push -u origin feature-branch-name

   gh pr create --title "Feature: [Description]" --body "$(cat <<'EOF'
   ## Summary
   - What was built
   - Why it was needed
   - Key decisions made

   ## Testing
   - Tests added/modified
   - Manual testing performed

   ## Post-Deploy Monitoring & Validation
   - **What to monitor/search**
     - Logs:
     - Metrics/Dashboards:
   - **Validation checks (queries/commands)**
     - `command or query here`
   - **Expected healthy behavior**
     - Expected signal(s)
   - **Failure signal(s) / rollback trigger**
     - Trigger + immediate action
   - **Validation window & owner**
     - Window:
     - Owner:
   - **If no operational impact**
     - `No additional operational monitoring required: <reason>`

   ## Before / After Screenshots
   | Before | After |
   |--------|-------|
   |  |  |

   ## Figma Design
   [Link if applicable]

   ---

   [![Compound Engineering v[VERSION]](https://img.shields.io/badge/Compound_Engineering-v[VERSION]-6366f1)](https://github.com/EveryInc/compound-engineering-plugin)

   🤖 Generated with [MODEL] ([CONTEXT] context, [THINKING]) via [HARNESS](HARNESS_URL)
   EOF
   )"
   ```
4. **Update Plan Status**

   If the input document has YAML frontmatter with a `status` field, update it to `completed`:

   ```
   status: active → status: completed
   ```
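That status flip can be scripted as well. A sketch assuming GNU sed and a hypothetical plan path (on BSD/macOS sed, use `sed -i ''` instead of `sed -i`):

```shell
# Flip `status: active` to `status: completed`, touching only the frontmatter
plan="${PLAN_FILE:-docs/plans/user-auth-plan.md}"   # hypothetical path
if [ -f "$plan" ]; then
  # The 1,/^---$/ address range limits the edit to the YAML frontmatter block,
  # so a literal "status: active" in the plan body is left alone.
  sed -i '1,/^---$/ s/^status: active$/status: completed/' "$plan"
fi
```

If the frontmatter layout is uncertain, editing the file directly is safer than scripting the substitution.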
5. **Notify User**
   - Summarize what was completed
   - Link to PR
   - Note any follow-up work needed
   - Suggest next steps if applicable
## Swarm Mode with Agent Teams (Optional)
For genuinely large plans where agents need to communicate with each other, challenge approaches, or coordinate across 10+ tasks with persistent specialized roles, use agent team capabilities if available (e.g., Agent Teams in Claude Code, multi-agent workflows in Codex).
Agent teams are typically experimental and require opt-in. Do not attempt to use agent teams unless the user explicitly requests swarm mode or agent teams, and the platform supports it.
### When to Use Agent Teams vs Subagents
| Agent Teams | Subagents (standard mode) |
|---|---|
| Agents need to discuss and challenge each other's approaches | Each task is independent — only the result matters |
| Persistent specialized roles (e.g., dedicated tester running continuously) | Workers report back and finish |
| 10+ tasks with complex cross-cutting coordination | 3-8 tasks with clear dependency chains |
| User explicitly requests "swarm mode" or "agent teams" | Default for most plans |
Most plans should use subagent dispatch from standard mode. Agent teams add significant token cost and coordination overhead — use them when the inter-agent communication genuinely improves the outcome.
### Agent Teams Workflow
- Create team — use your available team creation mechanism
- Create task list — parse Implementation Units into tasks with dependency relationships
- Spawn teammates — assign specialized roles (implementer, tester, reviewer) based on the plan's needs. Give each teammate the plan file path and their specific task assignments
- Coordinate — the lead monitors task completion, reassigns work if someone gets stuck, and spawns additional workers as phases unblock
- Cleanup — shut down all teammates, then clean up the team resources
## Key Principles

### Start Fast, Execute Faster
- Get clarification once at the start, then execute
- Don't wait for perfect understanding - ask questions and move
- The goal is to finish the feature, not create perfect process
### The Plan is Your Guide
- Work documents should reference similar code and patterns
- Load those references and follow them
- Don't reinvent - match what exists
### Test As You Go
- Run tests after each change, not at the end
- Fix failures immediately
- Continuous testing prevents big surprises
### Quality is Built In
- Follow existing patterns
- Write tests for new code
- Run linting before pushing
- Use reviewer agents for complex/risky changes only
### Ship Complete Features
- Mark all tasks completed before moving on
- Don't leave features 80% done
- A finished feature that ships beats a perfect feature that doesn't
## Quality Checklist
Before creating PR, verify:
- All clarifying questions asked and answered
- All tasks marked completed
- Tests pass (run project's test command)
- Linting passes (use linting-agent)
- Code follows existing patterns
- Figma designs match implementation (if applicable)
- Before/after screenshots captured and uploaded (for UI changes)
- Commit messages follow conventional format
- PR description includes Post-Deploy Monitoring & Validation section (or explicit no-impact rationale)
- PR description includes summary, testing notes, and screenshots
- PR description includes Compound Engineered badge with accurate model, harness, and version
## When to Use Reviewer Agents
Don't use by default. Use reviewer agents only when:
- Large refactor affecting many files (10+)
- Security-sensitive changes (authentication, permissions, data access)
- Performance-critical code paths
- Complex algorithms or business logic
- User explicitly requests thorough review
For most features: tests + linting + following patterns is sufficient.
## Common Pitfalls to Avoid
- **Analysis paralysis** - Don't overthink, read the plan and execute
- **Skipping clarifying questions** - Ask now, not after building the wrong thing
- **Ignoring plan references** - The plan has links for a reason
- **Testing at the end** - Test continuously or suffer later
- **Forgetting to track progress** - Update task status as you go or lose track of what's done
- **80% done syndrome** - Finish the feature, don't move on early
- **Over-reviewing simple changes** - Save reviewer agents for complex work