feat: Replace Playwright MCP with agent-browser CLI

- Remove Playwright MCP server from plugin
- Add new agent-browser skill for CLI-based browser automation
- Rename /playwright-test to /test-browser command
- Update all commands and agents to use agent-browser CLI
- Update README and plugin.json

agent-browser is Vercel's headless browser CLI designed for AI agents.
It uses ref-based selection (@e1, @e2) from accessibility snapshots
and provides a simpler CLI interface compared to MCP tools.

Key benefits:
- No MCP server required
- Simpler Bash-based workflow
- Same ref-based element selection
- Better for quick automation tasks

🤖 Generated with [Claude Code](https://claude.com/claude-code)

Co-Authored-By: Claude <noreply@anthropic.com>
Kieran Klaassen
2026-01-14 15:56:59 -08:00
parent 012a638d31
commit 31bd85f60b
11 changed files with 398 additions and 152 deletions

View File

@@ -1,7 +1,7 @@
{ {
"name": "compound-engineering", "name": "compound-engineering",
"version": "2.23.1", "version": "2.24.0",
"description": "AI-powered development tools. 27 agents, 21 commands, 13 skills, 2 MCP servers for code review, research, design, and workflow automation.", "description": "AI-powered development tools. 27 agents, 20 commands, 14 skills, 1 MCP server for code review, research, design, and workflow automation.",
"author": { "author": {
"name": "Kieran Klaassen", "name": "Kieran Klaassen",
"email": "kieran@every.to", "email": "kieran@every.to",
@@ -21,16 +21,10 @@
"typescript", "typescript",
"knowledge-management", "knowledge-management",
"image-generation", "image-generation",
"playwright", "agent-browser",
"browser-automation" "browser-automation"
], ],
"mcpServers": { "mcpServers": {
"pw": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@playwright/mcp@latest"],
"env": {}
},
"context7": { "context7": {
"type": "http", "type": "http",
"url": "https://mcp.context7.com/mcp" "url": "https://mcp.context7.com/mcp"

View File

@@ -8,8 +8,8 @@ AI-powered development tools that get smarter with every use. Make each unit of
|-----------|-------| |-----------|-------|
| Agents | 27 | | Agents | 27 |
| Commands | 20 | | Commands | 20 |
| Skills | 13 | | Skills | 14 |
| MCP Servers | 2 | | MCP Servers | 1 |
## Agents ## Agents
@@ -96,7 +96,7 @@ Core workflow commands use `workflows:` prefix to avoid collisions with built-in
| `/resolve_pr_parallel` | Resolve PR comments in parallel | | `/resolve_pr_parallel` | Resolve PR comments in parallel |
| `/resolve_todo_parallel` | Resolve todos in parallel | | `/resolve_todo_parallel` | Resolve todos in parallel |
| `/triage` | Triage and prioritize issues | | `/triage` | Triage and prioritize issues |
| `/playwright-test` | Run browser tests on PR-affected pages | | `/test-browser` | Run browser tests on PR-affected pages |
| `/xcode-test` | Build and test iOS apps on simulator | | `/xcode-test` | Build and test iOS apps on simulator |
| `/feature-video` | Record video walkthroughs and add to PR description | | `/feature-video` | Record video walkthroughs and add to PR description |
@@ -134,6 +134,12 @@ Core workflow commands use `workflows:` prefix to avoid collisions with built-in
|-------|-------------| |-------|-------------|
| `rclone` | Upload files to S3, Cloudflare R2, Backblaze B2, and cloud storage | | `rclone` | Upload files to S3, Cloudflare R2, Backblaze B2, and cloud storage |
### Browser Automation
| Skill | Description |
|-------|-------------|
| `agent-browser` | CLI-based browser automation using Vercel's agent-browser |
### Image Generation ### Image Generation
| Skill | Description | | Skill | Description |
@@ -154,19 +160,8 @@ Core workflow commands use `workflows:` prefix to avoid collisions with built-in
| Server | Description | | Server | Description |
|--------|-------------| |--------|-------------|
| `playwright` | Browser automation via `@playwright/mcp` |
| `context7` | Framework documentation lookup via Context7 | | `context7` | Framework documentation lookup via Context7 |
### Playwright
**Tools provided:**
- `browser_navigate` - Navigate to URLs
- `browser_take_screenshot` - Take screenshots
- `browser_click` - Click elements
- `browser_fill_form` - Fill form fields
- `browser_snapshot` - Get accessibility snapshot
- `browser_evaluate` - Execute JavaScript
### Context7 ### Context7
**Tools provided:** **Tools provided:**
@@ -177,6 +172,17 @@ Supports 100+ frameworks including Rails, React, Next.js, Vue, Django, Laravel,
MCP servers start automatically when the plugin is enabled. MCP servers start automatically when the plugin is enabled.
## Browser Automation
This plugin uses **agent-browser CLI** for browser automation tasks. Install it globally:
```bash
npm install -g agent-browser
agent-browser install # Downloads Chromium
```
The `agent-browser` skill provides comprehensive documentation on usage.
## Installation ## Installation
```bash ```bash
@@ -187,19 +193,13 @@ claude /plugin install compound-engineering
### MCP Servers Not Auto-Loading ### MCP Servers Not Auto-Loading
**Issue:** The bundled MCP servers (Playwright and Context7) may not load automatically when the plugin is installed. **Issue:** The bundled Context7 MCP server may not load automatically when the plugin is installed.
**Workaround:** Manually add them to your project's `.claude/settings.json`: **Workaround:** Manually add it to your project's `.claude/settings.json`:
```json ```json
{ {
"mcpServers": { "mcpServers": {
"playwright": {
"type": "stdio",
"command": "npx",
"args": ["-y", "@playwright/mcp@latest"],
"env": {}
},
"context7": { "context7": {
"type": "http", "type": "http",
"url": "https://mcp.context7.com/mcp" "url": "https://mcp.context7.com/mcp"
@@ -208,7 +208,7 @@ claude /plugin install compound-engineering
} }
``` ```
Or add them globally in `~/.claude/settings.json` for all projects. Or add it globally in `~/.claude/settings.json` for all projects.
## Version History ## Version History

View File

@@ -11,11 +11,20 @@ Your primary responsibility is to conduct thorough visual comparisons between im
## Your Workflow ## Your Workflow
1. **Capture Implementation State** 1. **Capture Implementation State**
- Use the Playwright MCP to capture screenshots of the implemented UI - Use agent-browser CLI to capture screenshots of the implemented UI
- Test different viewport sizes if the design includes responsive breakpoints - Test different viewport sizes if the design includes responsive breakpoints
- Capture interactive states (hover, focus, active) when relevant - Capture interactive states (hover, focus, active) when relevant
- Document the URL and selectors of the components being reviewed - Document the URL and selectors of the components being reviewed
```bash
agent-browser open [url]
agent-browser snapshot -i
agent-browser screenshot output.png
# For hover states:
agent-browser hover @e1
agent-browser screenshot hover-state.png
```
2. **Retrieve Design Specifications** 2. **Retrieve Design Specifications**
- Use the Figma MCP to access the corresponding design files - Use the Figma MCP to access the corresponding design files
- Extract design tokens (colors, typography, spacing, shadows) - Extract design tokens (colors, typography, spacing, shadows)

View File

@@ -23,50 +23,42 @@ For each iteration cycle, you must:
### Setup: Set Appropriate Window Size ### Setup: Set Appropriate Window Size
Before starting iterations, resize the browser to fit your target area: Before starting iterations, open the browser in headed mode to see and resize as needed:
```bash
agent-browser --headed open [url]
``` ```
browser_resize with width and height appropriate for the component:
Recommended viewport sizes for reference:
- Small component (button, card): 800x600 - Small component (button, card): 800x600
- Medium section (hero, features): 1200x800 - Medium section (hero, features): 1200x800
- Full page section: 1440x900 - Full page section: 1440x900
```
### Taking Element Screenshots ### Taking Element Screenshots
Use `browser_take_screenshot` with element targeting: 1. First, get element references with `agent-browser snapshot -i`
2. Find the ref for your target element (e.g., @e1, @e2)
3. Use `agent-browser scrollintoview @e1` to focus on specific elements
4. Take screenshot: `agent-browser screenshot output.png`
1. First, take a `browser_snapshot` to get element references ### Viewport Screenshots
2. Find the `ref` for your target element (e.g., a section, div, or component)
3. Screenshot that specific element:
``` For focused screenshots:
browser_take_screenshot with: 1. Use `agent-browser scrollintoview @e1` to scroll element into view
- element: "Hero section" (human-readable description) 2. Take viewport screenshot: `agent-browser screenshot output.png`
- ref: "E123" (exact ref from snapshot)
```
### Fallback: Viewport Screenshots
If the element doesn't have a clear ref, ensure the browser viewport shows only your target area:
1. Use `browser_resize` to set viewport to component dimensions
2. Scroll the element into view using `browser_evaluate`
3. Take a viewport screenshot (no element/ref params)
### Example Workflow ### Example Workflow
``` ```bash
1. browser_resize(width: 1200, height: 800) 1. agent-browser open [url]
2. browser_navigate to page 2. agent-browser snapshot -i # Get refs
3. browser_snapshot to see element refs 3. agent-browser screenshot output.png
4. browser_take_screenshot(element: "Features grid", ref: "E45") 4. [analyze and implement changes]
5. [analyze and implement changes] 5. agent-browser screenshot output-v2.png
6. browser_take_screenshot(element: "Features grid", ref: "E45") 6. [repeat...]
7. [repeat...]
``` ```
**Never use `fullPage: true`** - it captures unnecessary content and bloats context. **Keep screenshots focused** - capture only the element/area you're working on to reduce noise.
## Design Principles to Apply ## Design Principles to Apply

View File

@@ -11,7 +11,13 @@ You are an expert design-to-code synchronization specialist with deep expertise
1. **Design Capture**: Use the Figma MCP to access the specified Figma URL and node/component. Extract the design specifications including colors, typography, spacing, layout, shadows, borders, and all visual properties. Also take a screenshot and load it into the agent. 1. **Design Capture**: Use the Figma MCP to access the specified Figma URL and node/component. Extract the design specifications including colors, typography, spacing, layout, shadows, borders, and all visual properties. Also take a screenshot and load it into the agent.
2. **Implementation Capture**: Use the Playwright MCP to navigate to the specified web page/component URL and capture a high-quality screenshot of the current implementation. 2. **Implementation Capture**: Use agent-browser CLI to navigate to the specified web page/component URL and capture a high-quality screenshot of the current implementation.
```bash
agent-browser open [url]
agent-browser snapshot -i
agent-browser screenshot implementation.png
```
3. **Systematic Comparison**: Perform a meticulous visual comparison between the Figma design and the screenshot, analyzing: 3. **Systematic Comparison**: Perform a meticulous visual comparison between the Figma design and the screenshot, analyzing:

View File

@@ -19,7 +19,7 @@ When presented with a bug report, you will:
- Set up the minimal test case needed to reproduce the issue - Set up the minimal test case needed to reproduce the issue
- Execute the reproduction steps methodically, documenting each step - Execute the reproduction steps methodically, documenting each step
- If the bug involves data states, check fixtures or create appropriate test data - If the bug involves data states, check fixtures or create appropriate test data
- For UI bugs, consider using Playwright MCP if available to visually verify - For UI bugs, use agent-browser CLI to visually verify (see `agent-browser` skill)
- For backend bugs, examine logs, database states, and service interactions - For backend bugs, examine logs, database states, and service interactions
3. **Validation Methodology**: 3. **Validation Methodology**:

View File

@@ -13,7 +13,7 @@ argument-hint: "[PR number or 'current'] [optional: base URL, default localhost:
<role>Developer Relations Engineer creating feature demo videos</role> <role>Developer Relations Engineer creating feature demo videos</role>
This command creates professional video walkthroughs of features for PR documentation: This command creates professional video walkthroughs of features for PR documentation:
- Records browser interactions using Playwright video capture - Records browser interactions using agent-browser CLI
- Demonstrates the complete user flow - Demonstrates the complete user flow
- Uploads the video for easy sharing - Uploads the video for easy sharing
- Updates the PR description with an embedded video - Updates the PR description with an embedded video
@@ -22,12 +22,26 @@ This command creates professional video walkthroughs of features for PR document
<requirements> <requirements>
- Local development server running (e.g., `bin/dev`, `rails server`) - Local development server running (e.g., `bin/dev`, `rails server`)
- Playwright MCP server connected - agent-browser CLI installed
- Git repository with a PR to document - Git repository with a PR to document
- `ffmpeg` installed (for video conversion) - `ffmpeg` installed (for video conversion)
- `rclone` configured (optional, for cloud upload - see rclone skill) - `rclone` configured (optional, for cloud upload - see rclone skill)
</requirements> </requirements>
## Setup
**Check installation:**
```bash
command -v agent-browser >/dev/null 2>&1 && echo "Installed" || echo "NOT INSTALLED"
```
**Install if needed:**
```bash
npm install -g agent-browser && agent-browser install
```
See the `agent-browser` skill for detailed usage.
## Main Tasks ## Main Tasks
### 1. Parse Arguments ### 1. Parse Arguments
@@ -118,26 +132,9 @@ Does this look right?
mkdir -p tmp/videos mkdir -p tmp/videos
``` ```
**Start browser with video recording using Playwright MCP:** **Recording approach: Use browser screenshots as frames**
Note: Playwright MCP's browser_navigate will be used, and we'll use browser_run_code to enable video recording: Capture screenshots with agent-browser at key moments, then combine them into a video using ffmpeg:
```javascript
// Enable video recording context
mcp__plugin_compound-engineering_pw__browser_run_code({
code: `async (page) => {
// Video recording is enabled at context level
// The MCP server handles this automatically
return 'Video recording active';
}`
})
```
**Alternative: Use browser screenshots as frames**
If video recording isn't available via MCP, fall back to:
1. Take screenshots at key moments
2. Combine into a GIF using ffmpeg
```bash ```bash
ffmpeg -framerate 2 -pattern_type glob -i 'tmp/screenshots/*.png' -vf "scale=1280:-1" tmp/videos/feature-demo.gif ffmpeg -framerate 2 -pattern_type glob -i 'tmp/screenshots/*.png' -vf "scale=1280:-1" tmp/videos/feature-demo.gif
@@ -152,32 +149,32 @@ ffmpeg -framerate 2 -pattern_type glob -i 'tmp/screenshots/*.png' -vf "scale=128
Execute the planned flow, capturing each step: Execute the planned flow, capturing each step:
**Step 1: Navigate to starting point** **Step 1: Navigate to starting point**
``` ```bash
mcp__plugin_compound-engineering_pw__browser_navigate({ url: "[base-url]/[start-route]" }) agent-browser open "[base-url]/[start-route]"
mcp__plugin_compound-engineering_pw__browser_wait_for({ time: 2 }) agent-browser wait 2000
mcp__plugin_compound-engineering_pw__browser_take_screenshot({ filename: "tmp/screenshots/01-start.png" }) agent-browser screenshot tmp/screenshots/01-start.png
``` ```
**Step 2: Perform navigation/interactions** **Step 2: Perform navigation/interactions**
``` ```bash
mcp__plugin_compound-engineering_pw__browser_click({ element: "[description]", ref: "[ref]" }) agent-browser snapshot -i # Get refs
mcp__plugin_compound-engineering_pw__browser_wait_for({ time: 1 }) agent-browser click @e1 # Click navigation element
mcp__plugin_compound-engineering_pw__browser_take_screenshot({ filename: "tmp/screenshots/02-navigate.png" }) agent-browser wait 1000
agent-browser screenshot tmp/screenshots/02-navigate.png
``` ```
**Step 3: Demonstrate feature** **Step 3: Demonstrate feature**
``` ```bash
mcp__plugin_compound-engineering_pw__browser_snapshot({}) agent-browser snapshot -i # Get refs for feature elements
// Identify interactive elements agent-browser click @e2 # Click feature element
mcp__plugin_compound-engineering_pw__browser_click({ element: "[feature element]", ref: "[ref]" }) agent-browser wait 1000
mcp__plugin_compound-engineering_pw__browser_wait_for({ time: 1 }) agent-browser screenshot tmp/screenshots/03-feature.png
mcp__plugin_compound-engineering_pw__browser_take_screenshot({ filename: "tmp/screenshots/03-feature.png" })
``` ```
**Step 4: Capture result** **Step 4: Capture result**
``` ```bash
mcp__plugin_compound-engineering_pw__browser_wait_for({ time: 2 }) agent-browser wait 2000
mcp__plugin_compound-engineering_pw__browser_take_screenshot({ filename: "tmp/screenshots/04-result.png" }) agent-browser screenshot tmp/screenshots/04-result.png
``` ```
**Create video/GIF from screenshots:** **Create video/GIF from screenshots:**
@@ -189,17 +186,14 @@ mkdir -p tmp/videos tmp/screenshots
# Create MP4 video (RECOMMENDED - better quality, smaller size) # Create MP4 video (RECOMMENDED - better quality, smaller size)
# -framerate 0.5 = 2 seconds per frame (slower playback) # -framerate 0.5 = 2 seconds per frame (slower playback)
# -framerate 1 = 1 second per frame # -framerate 1 = 1 second per frame
ffmpeg -y -framerate 0.5 -pattern_type glob -i '.playwright-mcp/tmp/screenshots/*.png' \ ffmpeg -y -framerate 0.5 -pattern_type glob -i 'tmp/screenshots/*.png' \
-c:v libx264 -pix_fmt yuv420p -vf "scale=1280:-2" \ -c:v libx264 -pix_fmt yuv420p -vf "scale=1280:-2" \
tmp/videos/feature-demo.mp4 tmp/videos/feature-demo.mp4
# Create low-quality GIF for preview (small file, for GitHub embed) # Create low-quality GIF for preview (small file, for GitHub embed)
ffmpeg -y -framerate 0.5 -pattern_type glob -i '.playwright-mcp/tmp/screenshots/*.png' \ ffmpeg -y -framerate 0.5 -pattern_type glob -i 'tmp/screenshots/*.png' \
-vf "scale=640:-1:flags=lanczos,split[s0][s1];[s0]palettegen=max_colors=128[p];[s1][p]paletteuse" \ -vf "scale=640:-1:flags=lanczos,split[s0][s1];[s0]palettegen=max_colors=128[p];[s1][p]paletteuse" \
-loop 0 tmp/videos/feature-demo-preview.gif -loop 0 tmp/videos/feature-demo-preview.gif
# Copy screenshots to project folder for easy access
cp -r .playwright-mcp/tmp/screenshots tmp/
``` ```
**Note:** **Note:**

View File

@@ -1,12 +1,12 @@
--- ---
name: playwright-test name: test-browser
description: Run Playwright browser tests on pages affected by current PR or branch description: Run browser tests on pages affected by current PR or branch
argument-hint: "[PR number, branch name, or 'current' for current branch]" argument-hint: "[PR number, branch name, or 'current' for current branch]"
--- ---
# Playwright Test Command # Browser Test Command
<command_purpose>Run end-to-end browser tests on pages affected by a PR or branch changes using Playwright MCP.</command_purpose> <command_purpose>Run end-to-end browser tests on pages affected by a PR or branch changes using agent-browser CLI.</command_purpose>
## Introduction ## Introduction
@@ -22,10 +22,25 @@ This command tests affected pages in a real browser, catching issues that unit t
<requirements> <requirements>
- Local development server running (e.g., `bin/dev`, `rails server`) - Local development server running (e.g., `bin/dev`, `rails server`)
- Playwright MCP server connected - agent-browser CLI installed
- Git repository with changes to test - Git repository with changes to test
</requirements> </requirements>
## Setup
**Check installation:**
```bash
command -v agent-browser >/dev/null 2>&1 && echo "Installed" || echo "NOT INSTALLED"
```
**Install if needed:**
```bash
npm install -g agent-browser
agent-browser install # Downloads Chromium
```
See the `agent-browser` skill for detailed usage.
## Main Tasks ## Main Tasks
### 1. Determine Test Scope ### 1. Determine Test Scope
@@ -77,9 +92,9 @@ Build a list of URLs to test based on the mapping.
Before testing, verify the local server is accessible: Before testing, verify the local server is accessible:
``` ```bash
mcp__playwright__browser_navigate({ url: "http://localhost:3000" }) agent-browser open http://localhost:3000
mcp__playwright__browser_snapshot({}) agent-browser snapshot -i
``` ```
If server is not running, inform user: If server is not running, inform user:
@@ -90,7 +105,7 @@ Please start your development server:
- Rails: `bin/dev` or `rails server` - Rails: `bin/dev` or `rails server`
- Node: `npm run dev` - Node: `npm run dev`
Then run `/playwright-test` again. Then run `/test-browser` again.
``` ```
</check_server> </check_server>
@@ -102,26 +117,27 @@ Then run `/playwright-test` again.
For each affected route: For each affected route:
**Step 1: Navigate and capture snapshot** **Step 1: Navigate and capture snapshot**
``` ```bash
mcp__playwright__browser_navigate({ url: "http://localhost:3000/[route]" }) agent-browser open "http://localhost:3000/[route]"
mcp__playwright__browser_snapshot({}) agent-browser snapshot -i
``` ```
**Step 2: Check for errors** **Step 2: Check for errors** (use headed mode for console inspection)
``` ```bash
mcp__playwright__browser_console_messages({ level: "error" }) agent-browser --headed open "http://localhost:3000/[route]"
``` ```
**Step 3: Verify key elements** **Step 3: Verify key elements**
- Use `agent-browser snapshot -i` to get interactive elements with refs
- Page title/heading present - Page title/heading present
- Primary content rendered - Primary content rendered
- No error messages visible - No error messages visible
- Forms have expected fields - Forms have expected fields
**Step 4: Test critical interactions (if applicable)** **Step 4: Test critical interactions**
``` ```bash
mcp__playwright__browser_click({ element: "[description]", ref: "[ref]" }) agent-browser click @e1 # Use ref from snapshot
mcp__playwright__browser_snapshot({}) agent-browser snapshot -i
``` ```
</test_pages> </test_pages>
@@ -162,8 +178,7 @@ Did it work correctly?
When a test fails: When a test fails:
1. **Document the failure:** 1. **Document the failure:**
- Screenshot the error state - Screenshot the error state: `agent-browser screenshot error.png`
- Capture console errors
- Note the exact reproduction steps - Note the exact reproduction steps
2. **Ask user how to proceed:** 2. **Ask user how to proceed:**
@@ -186,7 +201,7 @@ When a test fails:
- Re-run the failing test - Re-run the failing test
4. **If "Create todo":** 4. **If "Create todo":**
- Create `{id}-pending-p1-playwright-{description}.md` - Create `{id}-pending-p1-browser-test-{description}.md`
- Continue testing - Continue testing
5. **If "Skip":** 5. **If "Skip":**
@@ -202,7 +217,7 @@ When a test fails:
After all tests complete, present summary: After all tests complete, present summary:
```markdown ```markdown
## 🎭 Playwright Test Results ## Browser Test Results
**Test Scope:** PR #[number] / [branch name] **Test Scope:** PR #[number] / [branch name]
**Server:** http://localhost:3000 **Server:** http://localhost:3000
@@ -211,23 +226,23 @@ After all tests complete, present summary:
| Route | Status | Notes | | Route | Status | Notes |
|-------|--------|-------| |-------|--------|-------|
| `/users` | Pass | | | `/users` | Pass | |
| `/settings` | Pass | | | `/settings` | Pass | |
| `/dashboard` | Fail | Console error: [msg] | | `/dashboard` | Fail | Console error: [msg] |
| `/checkout` | ⏭️ Skip | Requires payment credentials | | `/checkout` | Skip | Requires payment credentials |
### Console Errors: [count] ### Console Errors: [count]
- [List any errors found] - [List any errors found]
### Human Verifications: [count] ### Human Verifications: [count]
- OAuth flow: Confirmed - OAuth flow: Confirmed
- Email delivery: Confirmed - Email delivery: Confirmed
### Failures: [count] ### Failures: [count]
- `/dashboard` - [issue description] - `/dashboard` - [issue description]
### Created Todos: [count] ### Created Todos: [count]
- `005-pending-p1-playwright-dashboard-error.md` - `005-pending-p1-browser-test-dashboard-error.md`
### Result: [PASS / FAIL / PARTIAL] ### Result: [PASS / FAIL / PARTIAL]
``` ```
@@ -238,11 +253,22 @@ After all tests complete, present summary:
```bash ```bash
# Test current branch changes # Test current branch changes
/playwright-test /test-browser
# Test specific PR # Test specific PR
/playwright-test 847 /test-browser 847
# Test specific branch # Test specific branch
/playwright-test feature/new-dashboard /test-browser feature/new-dashboard
```
## Key agent-browser Commands
```bash
agent-browser open <url> # Navigate
agent-browser snapshot -i # Interactive elements with refs
agent-browser click @e1 # Click by ref
agent-browser fill @e1 "text" # Fill input
agent-browser screenshot out.png # Screenshot
agent-browser --headed open <url> # Visible browser
``` ```

View File

@@ -445,8 +445,8 @@ After presenting the Summary Report, offer appropriate testing based on project
**For Web Projects:** **For Web Projects:**
```markdown ```markdown
**"Want to run Playwright browser tests on the affected pages?"** **"Want to run browser tests on the affected pages?"**
1. Yes - run `/playwright-test` 1. Yes - run `/test-browser`
2. No - skip 2. No - skip
``` ```
@@ -460,7 +460,7 @@ After presenting the Summary Report, offer appropriate testing based on project
**For Hybrid Projects (e.g., Rails + Hotwire Native):** **For Hybrid Projects (e.g., Rails + Hotwire Native):**
```markdown ```markdown
**"Want to run end-to-end tests?"** **"Want to run end-to-end tests?"**
1. Web only - run `/playwright-test` 1. Web only - run `/test-browser`
2. iOS only - run `/xcode-test` 2. iOS only - run `/xcode-test`
3. Both - run both commands 3. Both - run both commands
4. No - skip 4. No - skip
@@ -470,22 +470,22 @@ After presenting the Summary Report, offer appropriate testing based on project
#### If User Accepts Web Testing: #### If User Accepts Web Testing:
Spawn a subagent to run Playwright tests (preserves main context): Spawn a subagent to run browser tests (preserves main context):
``` ```
Task general-purpose("Run /playwright-test for PR #[number]. Test all affected pages, check for console errors, handle failures by creating todos and fixing.") Task general-purpose("Run /test-browser for PR #[number]. Test all affected pages, check for console errors, handle failures by creating todos and fixing.")
``` ```
The subagent will: The subagent will:
1. Identify pages affected by the PR 1. Identify pages affected by the PR
2. Navigate to each page and capture snapshots 2. Navigate to each page and capture snapshots (using Playwright MCP or agent-browser CLI)
3. Check for console errors 3. Check for console errors
4. Test critical interactions 4. Test critical interactions
5. Pause for human verification on OAuth/email/payment flows 5. Pause for human verification on OAuth/email/payment flows
6. Create P1 todos for any failures 6. Create P1 todos for any failures
7. Fix and retry until all tests pass 7. Fix and retry until all tests pass
**Standalone:** `/playwright-test [PR number]` **Standalone:** `/test-browser [PR number]`
#### If User Accepts iOS Testing: #### If User Accepts iOS Testing:

View File

@@ -181,11 +181,13 @@ This command takes a work document (plan, specification, or todo file) and execu
bin/dev # Run in background bin/dev # Run in background
``` ```
**Step 2: Capture screenshots with Playwright MCP tools** **Step 2: Capture screenshots with agent-browser CLI**
- `browser_navigate` to go to affected pages ```bash
- `browser_resize` to set viewport (desktop or mobile as needed) agent-browser open http://localhost:3000/[route]
- `browser_snapshot` to verify page state agent-browser snapshot -i
- `browser_take_screenshot` to capture images agent-browser screenshot output.png
```
See the `agent-browser` skill for detailed usage.
**Step 3: Upload using imgup skill** **Step 3: Upload using imgup skill**
```bash ```bash

View File

@@ -0,0 +1,223 @@
---
name: agent-browser
description: Browser automation using Vercel's agent-browser CLI. Use when you need to interact with web pages, fill forms, take screenshots, or scrape data. Alternative to Playwright MCP - uses Bash commands with ref-based element selection. Triggers on "browse website", "fill form", "click button", "take screenshot", "scrape page", "web automation".
---
# agent-browser: CLI Browser Automation
Vercel's headless browser automation CLI designed for AI agents. Uses ref-based selection (@e1, @e2) from accessibility snapshots.
## Setup Check
```bash
# Check installation
command -v agent-browser >/dev/null 2>&1 && echo "Installed" || echo "NOT INSTALLED - run: npm install -g agent-browser && agent-browser install"
```
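The same check can be wrapped in a function so scripts fail early when the CLI is missing. This is a minimal sketch; `require_cmd` is a hypothetical helper name, not something agent-browser ships:

```bash
# require_cmd is a hypothetical helper for this sketch (not part of agent-browser).
# Returns 0 when the named command is on PATH, otherwise prints a message and returns 1.
require_cmd() {
  if ! command -v "$1" >/dev/null 2>&1; then
    echo "NOT INSTALLED: $1" >&2
    return 1
  fi
}
```

For example, `require_cmd agent-browser || exit 1` at the top of a script stops it before any browser command runs.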
### Install if needed
```bash
npm install -g agent-browser
agent-browser install # Downloads Chromium
```
## Core Workflow
**The snapshot + ref pattern suits LLM-driven automation, because each ref is a short handle taken from the accessibility snapshot:**
1. **Navigate** to URL
2. **Snapshot** to get interactive elements with refs
3. **Interact** using refs (@e1, @e2, etc.)
4. **Re-snapshot** after navigation or DOM changes
```bash
# Step 1: Open URL
agent-browser open https://example.com
# Step 2: Get interactive elements with refs
agent-browser snapshot -i --json
# Step 3: Interact using refs
agent-browser click @e1
agent-browser fill @e2 "search query"
# Step 4: Re-snapshot after changes
agent-browser snapshot -i
```
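The navigate-then-snapshot part of this loop can be wrapped in a small helper. The sketch below assumes a hypothetical `ab` wrapper and `AB_DRY_RUN` variable (neither is part of agent-browser) that print each command instead of running it, so a flow can be reviewed without launching a browser:

```bash
# `ab`, `open_and_snapshot`, and AB_DRY_RUN are hypothetical names for this sketch.
# With AB_DRY_RUN=1 the command line is printed instead of executed.
ab() {
  if [ "${AB_DRY_RUN:-0}" = "1" ]; then
    printf 'agent-browser %s\n' "$*"
  else
    agent-browser "$@"
  fi
}

# Navigate, then snapshot so fresh refs are available before any interaction.
open_and_snapshot() {
  ab open "$1" && ab snapshot -i
}
```

Running `AB_DRY_RUN=1 open_and_snapshot https://example.com` prints the two commands in order. Without the variable, the helper runs them against the real CLI.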
## Key Commands
### Navigation
```bash
agent-browser open <url> # Navigate to URL
agent-browser back # Go back
agent-browser forward # Go forward
agent-browser reload # Reload page
agent-browser close # Close browser
```
### Snapshots (Essential for AI)
```bash
agent-browser snapshot # Full accessibility tree
agent-browser snapshot -i # Interactive elements only (recommended)
agent-browser snapshot -i --json # JSON output for parsing
agent-browser snapshot -c # Compact (remove empty elements)
agent-browser snapshot -d 3 # Limit depth
```
### Interactions
```bash
agent-browser click @e1 # Click element
agent-browser dblclick @e1 # Double-click
agent-browser fill @e1 "text" # Clear and fill input
agent-browser type @e1 "text" # Type without clearing
agent-browser press Enter # Press key
agent-browser hover @e1 # Hover element
agent-browser check @e1 # Check checkbox
agent-browser uncheck @e1 # Uncheck checkbox
agent-browser select @e1 "option" # Select dropdown option
agent-browser scroll down 500 # Scroll (up/down/left/right)
agent-browser scrollintoview @e1 # Scroll element into view
```
### Get Information
```bash
agent-browser get text @e1 # Get element text
agent-browser get html @e1 # Get element HTML
agent-browser get value @e1 # Get input value
agent-browser get attr @e1 href # Get attribute (element first, then attribute name)
agent-browser get title # Get page title
agent-browser get url # Get current URL
agent-browser get count "button" # Count matching elements
```
### Screenshots & PDFs
```bash
agent-browser screenshot # Viewport screenshot
agent-browser screenshot --full # Full page
agent-browser screenshot output.png # Save to file
agent-browser screenshot --full output.png # Full page to file
agent-browser pdf output.pdf # Save as PDF
```
### Wait
```bash
agent-browser wait @e1 # Wait for element
agent-browser wait 2000 # Wait milliseconds
agent-browser wait --text "text" # Wait for text to appear
```
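When a single wait is not enough, a generic retry loop can wrap any command. This is a sketch only: `retry_until` is a hypothetical helper, and it assumes the wrapped command exits non-zero on failure, which has not been verified here for every agent-browser subcommand:

```bash
# retry_until is a hypothetical helper; it is not an agent-browser command.
# Usage: retry_until <attempts> <delay-seconds> <command...>
# Runs the command until it succeeds or the attempts are used up.
retry_until() {
  local attempts="$1" delay="$2" i=1
  shift 2
  while [ "$i" -le "$attempts" ]; do
    if "$@"; then
      return 0
    fi
    i=$((i + 1))
    # Only sleep when another attempt will follow.
    if [ "$i" -le "$attempts" ]; then
      sleep "$delay"
    fi
  done
  return 1
}
```

A call such as `retry_until 3 2 agent-browser wait @e1` would try up to three times, two seconds apart.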
## Semantic Locators (Alternative to Refs)
```bash
agent-browser find role button click --name "Submit"
agent-browser find text "Sign up" click
agent-browser find label "Email" fill "user@example.com"
agent-browser find placeholder "Search..." fill "query"
```
## Sessions (Parallel Browsers)
```bash
# Run multiple independent browser sessions
agent-browser --session browser1 open https://site1.com
agent-browser --session browser2 open https://site2.com
# List active sessions
agent-browser session list
```
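Because each session is independent, several pages can be opened concurrently with ordinary shell job control. A minimal sketch, in which `open_in_sessions` and the `AB_BIN` override are hypothetical names introduced for illustration, and the session names simply follow the `browser1`, `browser2` pattern used above:

```bash
# open_in_sessions and AB_BIN are hypothetical names for this sketch.
# Opens each URL in its own named session as a background job, then waits for all of them.
# AB_BIN lets the binary be swapped (e.g. for `echo`) to preview the commands.
open_in_sessions() {
  local i=1 url
  for url in "$@"; do
    "${AB_BIN:-agent-browser}" --session "browser${i}" open "$url" &
    i=$((i + 1))
  done
  wait
}
```

Note that a bare `wait` does not report whether an individual job failed, so follow up with `agent-browser session list` or a snapshot per session to confirm the result.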
## Examples
### Login Flow
```bash
agent-browser open https://app.example.com/login
agent-browser snapshot -i
# Output shows: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Sign in" [ref=e3]
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait 2000
agent-browser snapshot -i # Verify logged in
```
### Search and Extract
```bash
agent-browser open https://news.ycombinator.com
agent-browser snapshot -i --json
# Parse JSON to find story links
agent-browser get text @e12 # Get headline text
agent-browser click @e12 # Click to open story
```
### Form Filling
```bash
agent-browser open https://forms.example.com
agent-browser snapshot -i
agent-browser fill @e1 "John Doe"
agent-browser fill @e2 "john@example.com"
agent-browser select @e3 "United States"
agent-browser check @e4 # Agree to terms
agent-browser click @e5 # Submit button
agent-browser screenshot confirmation.png
```
### Debug Mode
```bash
# Run with visible browser window
agent-browser --headed open https://example.com
agent-browser --headed snapshot -i
agent-browser --headed click @e1
```
## JSON Output
Add `--json` for structured output:
```bash
agent-browser snapshot -i --json
```
Returns:
```json
{
"success": true,
"data": {
"refs": {
"e1": {"name": "Submit", "role": "button"},
"e2": {"name": "Email", "role": "textbox"}
},
"snapshot": "- button \"Submit\" [ref=e1]\n- textbox \"Email\" [ref=e2]"
}
}
```
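Refs can also be pulled straight from the text form of a snapshot. The sketch below assumes snapshot lines follow the `- role "name" [ref=eN]` shape shown in the sample above (real output may differ); `refs_for_role` is a hypothetical helper, not an agent-browser command:

```bash
# refs_for_role is a hypothetical helper for this sketch.
# Reads snapshot text on stdin and prints the @ref of every element with the given role.
# Assumes lines shaped like: - button "Submit" [ref=e1]
refs_for_role() {
  local role="$1"
  sed -n -E "s/^[[:space:]]*-?[[:space:]]*${role} .*\[ref=(e[0-9]+)\].*/@\1/p"
}
```

Assuming the snapshot is written to standard output, `agent-browser snapshot -i | refs_for_role button` would list one ref per button, ready to pass to `click`.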
## vs Playwright MCP
| Feature | agent-browser (CLI) | Playwright MCP |
|---------|---------------------|----------------|
| Interface | Bash commands | MCP tools |
| Selection | Refs (@e1) | Refs (e1) |
| Output | Text/JSON | Tool responses |
| Parallel | Sessions | Tabs |
| Best for | Quick automation | Tool integration |
Use agent-browser when:
- You prefer Bash-based workflows
- You want simpler CLI commands
- You need quick one-off automation
Use Playwright MCP when:
- You need deep MCP tool integration
- You want tool-based responses
- You're building complex automation