[2.14.0] Add /playwright-test command for browser testing

- New `/playwright-test` command for end-to-end browser tests on PR-affected pages
- Uses Playwright MCP to navigate, snapshot, check console errors
- Supports human-in-the-loop for OAuth/email/payment flows
- Creates P1 todos for failures and retries until passing
- Added Section 7 to `/workflows:review` - optional Playwright testing as subagent

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
2025-12-18 10:26:43 -08:00
parent d8ea046bd9
commit f619e261c4
5 changed files with 292 additions and 4 deletions
--- a/plugins/compound-engineering/.claude-plugin/plugin.json
+++ b/plugins/compound-engineering/.claude-plugin/plugin.json
@@ -1,7 +1,7 @@
 {
  "name": "compound-engineering",
-  "version": "2.13.0",
+  "version": "2.14.0",
-  "description": "AI-powered development tools. 27 agents, 17 commands, 13 skills, 2 MCP servers for code review, research, design, and workflow automation.",
+  "description": "AI-powered development tools. 27 agents, 18 commands, 13 skills, 2 MCP servers for code review, research, design, and workflow automation.",
  "author": {
    "name": "Kieran Klaassen",
    "email": "kieran@every.to",
--- a/plugins/compound-engineering/CHANGELOG.md
+++ b/plugins/compound-engineering/CHANGELOG.md
@@ -5,6 +5,16 @@ All notable changes to the compound-engineering plugin will be documented in thi
 The format is based on [Keep a Changelog](https://keepachangelog.com/en/1.0.0/),
 and this project adheres to [Semantic Versioning](https://semver.org/spec/v2.0.0.html).
 ## [2.14.0] - 2025-12-18
 ### Added
 - **`/playwright-test` command** - Run end-to-end browser tests on pages affected by a PR or branch. Uses Playwright MCP to navigate pages, capture snapshots, check console errors, test interactions, and pause for human verification on OAuth/email/payment flows. Creates P1 todos for failures and retries until passing.
 ### Changed
 - **`/workflows:review` command** - Added optional Playwright testing phase (Section 7). After review agents complete, offers to spawn `/playwright-test` as a subagent to verify affected pages in a real browser.
 ## [2.13.0] - 2025-12-15
 ### Added
--- a/plugins/compound-engineering/README.md
+++ b/plugins/compound-engineering/README.md
@@ -7,8 +7,8 @@ AI-powered development tools that get smarter with every use. Make each unit of
 | Component | Count |
 |-----------|-------|
 | Agents | 27 |
-| Commands | 17 |
+| Commands | 18 |
-| Skills | 12 |
+| Skills | 13 |
 | MCP Servers | 2 |
 ## Agents
@@ -95,6 +95,7 @@ Core workflow commands use `workflows:` prefix to avoid collisions with built-in
 | `/resolve_pr_parallel` | Resolve PR comments in parallel |
 | `/resolve_todo_parallel` | Resolve todos in parallel |
 | `/triage` | Triage and prioritize issues |
 | `/playwright-test` | Run browser tests on PR-affected pages |
 ## Skills
--- a/plugins/compound-engineering/commands/playwright-test.md
+++ b/plugins/compound-engineering/commands/playwright-test.md
@@ -0,0 +1,248 @@
 ---
 name: playwright-test
 description: Run Playwright browser tests on pages affected by current PR or branch
 argument-hint: "[PR number, branch name, or 'current' for current branch]"
 ---
 # Playwright Test Command
 <command_purpose>Run end-to-end browser tests on pages affected by a PR or branch using Playwright MCP.</command_purpose>
 ## Introduction
 <role>QA Engineer specializing in browser-based end-to-end testing</role>
 This command tests affected pages in a real browser, catching issues that unit tests miss:
 - JavaScript integration bugs
 - CSS/layout regressions
 - User workflow breakages
 - Console errors
 ## Prerequisites
 <requirements>
 - Local development server running (e.g., `bin/dev`, `rails server`)
 - Playwright MCP server connected
 - Git repository with changes to test
 </requirements>
 ## Main Tasks
 ### 1. Determine Test Scope
 <test_target> $ARGUMENTS </test_target>
 <determine_scope>
 **If PR number provided:**
 ```bash
 gh pr view [number] --json files -q '.files[].path'
 ```
 **If 'current' or empty:**
 ```bash
 git diff --name-only main...HEAD
 ```
 **If branch name provided:**
 ```bash
 git diff --name-only main...[branch]
 ```
 </determine_scope>
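The three cases above can be sketched as a small dispatcher. This is a minimal illustration, not part of the command: the function name `scope_command` and the `base` parameter are hypothetical, and it assumes any all-digit argument (optionally prefixed with `#`) is a PR number, so a branch whose name is purely numeric would be misclassified.

```python
import re


def scope_command(argument: str, base: str = "main") -> list[str]:
    """Return the argv that lists changed files for the given test target.

    Hypothetical helper: mirrors the three cases described above.
    """
    target = argument.strip()
    # Empty argument or the literal 'current' means the checked-out branch.
    if target in ("", "current"):
        return ["git", "diff", "--name-only", f"{base}...HEAD"]
    # Assumption: an all-digit argument is a PR number, not a branch name.
    if re.fullmatch(r"#?\d+", target):
        number = target.lstrip("#")
        return ["gh", "pr", "view", number, "--json", "files", "-q", ".files[].path"]
    # Anything else is treated as a branch name.
    return ["git", "diff", "--name-only", f"{base}...{target}"]
```

The returned list can be passed to `subprocess.run(..., capture_output=True, text=True)` to collect the changed paths, one per line.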
 ### 2. Map Files to Routes
 <file_to_route_mapping>
 Map changed files to testable routes:
 | File Pattern | Route(s) |
 |-------------|----------|
 | `app/views/users/*` | `/users`, `/users/:id`, `/users/new` |
 | `app/controllers/settings_controller.rb` | `/settings` |
 | `app/javascript/controllers/*_controller.js` | Pages using that Stimulus controller |
 | `app/components/*_component.rb` | Pages rendering that component |
 | `app/views/layouts/*` | All pages (test homepage at minimum) |
 | `app/assets/stylesheets/*` | Visual regression on key pages |
 | `app/helpers/*_helper.rb` | Pages using that helper |
 Build a list of URLs to test based on the mapping.
 </file_to_route_mapping>
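A minimal sketch of the mapping step, assuming glob-style matching is good enough. The names `ROUTE_RULES` and `routes_for` are hypothetical, only the rows of the table with concrete routes are encoded, and mapping layout changes to `/` is one reading of "test homepage at minimum". Rows such as Stimulus controllers or components need project-specific lookup and are reported as unmapped here.

```python
from fnmatch import fnmatch

# Hypothetical rule table: (glob pattern, routes to test).
# Only the rows with concrete routes are encoded.
ROUTE_RULES = [
    ("app/views/users/*", ["/users", "/users/:id", "/users/new"]),
    ("app/controllers/settings_controller.rb", ["/settings"]),
    ("app/views/layouts/*", ["/"]),  # assumption: homepage as the minimum check
]


def routes_for(changed_files: list[str]) -> tuple[list[str], list[str]]:
    """Return (routes to test, files with no matching rule)."""
    routes: list[str] = []
    unmapped: list[str] = []
    for path in changed_files:
        matched = False
        for pattern, targets in ROUTE_RULES:
            if fnmatch(path, pattern):
                matched = True
                for route in targets:
                    # Keep first-seen order and avoid duplicate routes.
                    if route not in routes:
                        routes.append(route)
        if not matched:
            unmapped.append(path)
    return routes, unmapped
```

Files returned as unmapped are the ones that still need a manual decision about which pages they affect.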
 ### 3. Verify Server is Running
 <check_server>
 Before testing, verify the local server is accessible:
 ```
 mcp__playwright__browser_navigate({ url: "http://localhost:3000" })
 mcp__playwright__browser_snapshot({})
 ```
 If the server is not running, inform the user:
 ```markdown
 **Server not running**
 Please start your development server:
 - Rails: `bin/dev` or `rails server`
 - Node: `npm run dev`
 Then run `/playwright-test` again.
 ```
 </check_server>
 ### 4. Test Each Affected Page
 <test_pages>
 For each affected route:
 **Step 1: Navigate and capture snapshot**
 ```
 mcp__playwright__browser_navigate({ url: "http://localhost:3000/[route]" })
 mcp__playwright__browser_snapshot({})
 ```
 **Step 2: Check for errors**
 ```
 mcp__playwright__browser_console_messages({ level: "error" })
 ```
 **Step 3: Verify key elements**
 - Page title/heading present
 - Primary content rendered
 - No error messages visible
 - Forms have expected fields
 **Step 4: Test critical interactions (if applicable)**
 ```
 mcp__playwright__browser_click({ element: "[description]", ref: "[ref]" })
 mcp__playwright__browser_snapshot({})
 ```
 </test_pages>
 ### 5. Human Verification (When Required)
 <human_verification>
 Pause for human input when testing touches:
 | Flow Type | What to Ask |
 |-----------|-------------|
 | OAuth | "Please sign in with [provider] and confirm it works" |
 | Email | "Check your inbox for the test email and confirm receipt" |
 | Payments | "Complete a test purchase in sandbox mode" |
 | SMS | "Verify you received the SMS code" |
 | External APIs | "Confirm the [service] integration is working" |
 Use AskUserQuestion:
 ```markdown
 **Human Verification Needed**
 This test touches the [flow type]. Please:
 1. [Action to take]
 2. [What to verify]
 Did it work correctly?
 1. Yes - continue testing
 2. No - describe the issue
 ```
 </human_verification>
 ### 6. Handle Failures
 <failure_handling>
 When a test fails:
 1. **Document the failure:**
   - Screenshot the error state
   - Capture console errors
   - Note the exact reproduction steps
 2. **Ask user how to proceed:**
   ```markdown
   **Test Failed: [route]**
   Issue: [description]
   Console errors: [if any]
   How to proceed?
   1. Fix now - I'll help debug and fix
   2. Create todo - Add to todos/ for later
   3. Skip - Continue testing other pages
   ```
 3. **If "Fix now":**
   - Investigate the issue
   - Propose a fix
   - Apply fix
   - Re-run the failing test
 4. **If "Create todo":**
   - Create `{id}-pending-p1-playwright-{description}.md`
   - Continue testing
 5. **If "Skip":**
   - Log as skipped
   - Continue testing
 </failure_handling>
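The todo naming pattern in step 4 can be illustrated with a short helper. This is a sketch under assumptions: the function name `todo_filename` is hypothetical, the three-digit zero padding is inferred from the `005-...` example in the test summary template, and the slug rule (lowercase, non-alphanumerics collapsed to hyphens) is one plausible convention rather than a documented one.

```python
import re


def todo_filename(todo_id: int, description: str) -> str:
    """Build a name of the form {id}-pending-p1-playwright-{description}.md."""
    # Assumption: lowercase the description and collapse every run of
    # non-alphanumeric characters into a single hyphen.
    slug = re.sub(r"[^a-z0-9]+", "-", description.lower()).strip("-")
    # Assumption: ids are zero-padded to three digits.
    return f"{todo_id:03d}-pending-p1-playwright-{slug}.md"
```

For example, `todo_filename(5, "Dashboard error")` yields `005-pending-p1-playwright-dashboard-error.md`.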
 ### 7. Test Summary
 <test_summary>
 After all tests complete, present a summary:
 ```markdown
 ## 🎭 Playwright Test Results
 **Test Scope:** PR #[number] / [branch name]
 **Server:** http://localhost:3000
 ### Pages Tested: [count]
 | Route | Status | Notes |
 |-------|--------|-------|
 | `/users` | ✅ Pass | |
 | `/settings` | ✅ Pass | |
 | `/dashboard` | ❌ Fail | Console error: [msg] |
 | `/checkout` | ⏭️ Skip | Requires payment credentials |
 ### Console Errors: [count]
 - [List any errors found]
 ### Human Verifications: [count]
 - OAuth flow: ✅ Confirmed
 - Email delivery: ✅ Confirmed
 ### Failures: [count]
 - `/dashboard` - [issue description]
 ### Created Todos: [count]
 - `005-pending-p1-playwright-dashboard-error.md`
 ### Result: [PASS / FAIL / PARTIAL]
 ```
 </test_summary>
 ## Quick Usage Examples
 ```bash
 # Test current branch changes
 /playwright-test
 # Test specific PR
 /playwright-test 847
 # Test specific branch
 /playwright-test feature/new-dashboard
 ```
--- a/plugins/compound-engineering/commands/workflows/review.md
+++ b/plugins/compound-engineering/commands/workflows/review.md
@@ -425,6 +425,35 @@ After creating all todo files, present comprehensive summary:
 ```
 ### 7. Playwright Testing (Optional)
 <offer_testing>
 After presenting the Summary Report, ask the user:
 **"Want to run Playwright tests on the affected pages?"**
 1. Yes - run `/playwright-test`
 2. No - skip to next steps
 </offer_testing>
 #### If User Accepts:
 Spawn a subagent to run the tests (preserves main context):
 ```
 Task general-purpose("Run /playwright-test for PR #[number]. Test all affected pages, check for console errors, handle failures by creating todos and fixing.")
 ```
 The subagent will:
 1. Identify pages affected by the PR
 2. Navigate to each page and capture snapshots
 3. Check for console errors
 4. Test critical interactions
 5. Pause for human verification on OAuth/email/payment flows
 6. Create P1 todos for any failures
 7. Fix and retry until all tests pass
 **Alternatively**, the user can run the command standalone: `/playwright-test [PR number]`
 ### Important: P1 Findings Block Merge
 Any **🔴 P1 (CRITICAL)** findings must be addressed before merging the PR. Present these prominently and ensure they're resolved before accepting the PR.