feat: sync agent-browser skill with upstream vercel-labs/agent-browser

Update SKILL.md to match the latest upstream skill from vercel-labs/agent-browser, adding substantial new capabilities: - Authentication (auth vault, profiles, session persistence, state files) - Command chaining, annotated screenshots, diffing - Security features (content boundaries, domain allowlist, action policy) - iOS Simulator support, Lightpanda engine, downloads, clipboard - JS eval improvements (--stdin, -b for shell safety) - Timeout guidance, config files, session cleanup Add 7 reference docs (commands, authentication, snapshot-refs, session-management, video-recording, profiling, proxy-support) and 3 ready-to-use shell templates. Kept our YAML frontmatter, setup check section, and Playwright MCP comparison table which are unique to our plugin context.
2026-03-14 20:08:27 -07:00
parent 7c04c3158f
commit 24860ec3f1
11 changed files with 2260 additions and 240 deletions
--- a/plugins/compound-engineering/skills/agent-browser/SKILL.md
+++ b/plugins/compound-engineering/skills/agent-browser/SKILL.md
@@ -3,9 +3,9 @@ name: agent-browser
 description: Browser automation using Vercel's agent-browser CLI. Use when you need to interact with web pages, fill forms, take screenshots, or scrape data. Alternative to Playwright MCP - uses Bash commands with ref-based element selection. Triggers on "browse website", "fill form", "click button", "take screenshot", "scrape page", "web automation".
 ---

-# agent-browser: CLI Browser Automation
+# Browser Automation with agent-browser

-Vercel's headless browser automation CLI designed for AI agents. Uses ref-based selection (@e1, @e2) from accessibility snapshots.
+The CLI uses Chrome/Chromium via CDP directly. Install via `npm i -g agent-browser`, `brew install agent-browser`, or `cargo install agent-browser`. Run `agent-browser install` to download Chrome.

 ## Setup Check

@@ -23,284 +23,625 @@ agent-browser install  # Downloads Chromium

 ## Core Workflow

-**The snapshot + ref pattern is optimal for LLMs:**
+Every browser automation follows this pattern:

-1. **Navigate** to URL
-2. **Snapshot** to get interactive elements with refs
-3. **Interact** using refs (@e1, @e2, etc.)
-4. **Re-snapshot** after navigation or DOM changes
+1. **Navigate**: `agent-browser open <url>`
+2. **Snapshot**: `agent-browser snapshot -i` (get element refs like `@e1`, `@e2`)
+3. **Interact**: Use refs to click, fill, select
+4. **Re-snapshot**: After navigation or DOM changes, get fresh refs

 ```bash
-# Step 1: Open URL
-agent-browser open https://example.com
-
-# Step 2: Get interactive elements with refs
-agent-browser snapshot -i --json
-
-# Step 3: Interact using refs
-agent-browser click @e1
-agent-browser fill @e2 "search query"
-
-# Step 4: Re-snapshot after changes
+agent-browser open https://example.com/form
 agent-browser snapshot -i
-```
+# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"

-## Key Commands
-
-### Navigation
-
-```bash
-agent-browser open <url>       # Navigate to URL
-agent-browser back             # Go back
-agent-browser forward          # Go forward
-agent-browser reload           # Reload page
-agent-browser close            # Close browser
-```
-
-### Snapshots (Essential for AI)
-
-```bash
-agent-browser snapshot              # Full accessibility tree
-agent-browser snapshot -i           # Interactive elements only (recommended)
-agent-browser snapshot -i --json    # JSON output for parsing
-agent-browser snapshot -c           # Compact (remove empty elements)
-agent-browser snapshot -d 3         # Limit depth
-```
-
-### Interactions
-
-```bash
-agent-browser click @e1                    # Click element
-agent-browser dblclick @e1                 # Double-click
-agent-browser fill @e1 "text"              # Clear and fill input
-agent-browser type @e1 "text"              # Type without clearing
-agent-browser press Enter                  # Press key
-agent-browser hover @e1                    # Hover element
-agent-browser check @e1                    # Check checkbox
-agent-browser uncheck @e1                  # Uncheck checkbox
-agent-browser select @e1 "option"          # Select dropdown option
-agent-browser scroll down 500              # Scroll (up/down/left/right)
-agent-browser scrollintoview @e1           # Scroll element into view
-```
-
-### Get Information
-
-```bash
-agent-browser get text @e1          # Get element text
-agent-browser get html @e1          # Get element HTML
-agent-browser get value @e1         # Get input value
-agent-browser get attr href @e1     # Get attribute
-agent-browser get title             # Get page title
-agent-browser get url               # Get current URL
-agent-browser get count "button"    # Count matching elements
-```
-
-### Screenshots & PDFs
-
-```bash
-agent-browser screenshot                      # Viewport screenshot
-agent-browser screenshot --full               # Full page
-agent-browser screenshot output.png           # Save to file
-agent-browser screenshot --full output.png    # Full page to file
-agent-browser pdf output.pdf                  # Save as PDF
-```
-
-### Wait
-
-```bash
-agent-browser wait @e1              # Wait for element
-agent-browser wait 2000             # Wait milliseconds
-agent-browser wait "text"           # Wait for text to appear
-```
-
-## Semantic Locators (Alternative to Refs)
-
-```bash
-agent-browser find role button click --name "Submit"
-agent-browser find text "Sign up" click
-agent-browser find label "Email" fill "user@example.com"
-agent-browser find placeholder "Search..." fill "query"
-```
-
-## Sessions (Parallel Browsers)
-
-```bash
-# Run multiple independent browser sessions
-agent-browser --session browser1 open https://site1.com
-agent-browser --session browser2 open https://site2.com
-
-# List active sessions
-agent-browser session list
-```
-
-## Examples
-
-### Login Flow
-
-```bash
-agent-browser open https://app.example.com/login
-agent-browser snapshot -i
-# Output shows: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Sign in" [ref=e3]
 agent-browser fill @e1 "user@example.com"
 agent-browser fill @e2 "password123"
 agent-browser click @e3
-agent-browser wait 2000
-agent-browser snapshot -i  # Verify logged in
+agent-browser wait --load networkidle
+agent-browser snapshot -i  # Check result
 ```

-### Search and Extract
+## Command Chaining
+
+Commands can be chained with `&&` in a single shell invocation. The browser persists between commands via a background daemon, so chaining is safe and more efficient than separate calls.

 ```bash
-agent-browser open https://news.ycombinator.com
-agent-browser snapshot -i --json
-# Parse JSON to find story links
-agent-browser get text @e12  # Get headline text
-agent-browser click @e12     # Click to open story
+# Chain open + wait + snapshot in one call
+agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
+
+# Chain multiple interactions
+agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "password123" && agent-browser click @e3
+
+# Navigate and capture
+agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser screenshot page.png
 ```

-### Form Filling
+**When to chain:** Use `&&` when you don't need to read the output of an intermediate command before proceeding (e.g., open + wait + screenshot). Run commands separately when you need to parse the output first (e.g., snapshot to discover refs, then interact using those refs).
+
+## Handling Authentication
+
+When automating a site that requires login, choose the approach that fits:
+
+**Option 1: Import auth from the user's browser (fastest for one-off tasks)**

 ```bash
-agent-browser open https://forms.example.com
+# Connect to the user's running Chrome (they're already logged in)
+agent-browser --auto-connect state save ./auth.json
+# Use that auth state
+agent-browser --state ./auth.json open https://app.example.com/dashboard
+```
+
+State files contain session tokens in plaintext -- add to `.gitignore` and delete when no longer needed. Set `AGENT_BROWSER_ENCRYPTION_KEY` for encryption at rest.
+
+**Option 2: Persistent profile (simplest for recurring tasks)**
+
+```bash
+# First run: login manually or via automation
+agent-browser --profile ~/.myapp open https://app.example.com/login
+# ... fill credentials, submit ...
+
+# All future runs: already authenticated
+agent-browser --profile ~/.myapp open https://app.example.com/dashboard
+```
+
+**Option 3: Session name (auto-save/restore cookies + localStorage)**
+
+```bash
+agent-browser --session-name myapp open https://app.example.com/login
+# ... login flow ...
+agent-browser close  # State auto-saved
+
+# Next time: state auto-restored
+agent-browser --session-name myapp open https://app.example.com/dashboard
+```
+
+**Option 4: Auth vault (credentials stored encrypted, login by name)**
+
+```bash
+echo "$PASSWORD" | agent-browser auth save myapp --url https://app.example.com/login --username user --password-stdin
+agent-browser auth login myapp
+```
+
+**Option 5: State file (manual save/load)**
+
+```bash
+# After logging in:
+agent-browser state save ./auth.json
+# In a future session:
+agent-browser state load ./auth.json
+agent-browser open https://app.example.com/dashboard
+```
+
+See [references/authentication.md](references/authentication.md) for OAuth, 2FA, cookie-based auth, and token refresh patterns.
+
+## Essential Commands
+
+```bash
+# Navigation
+agent-browser open <url>              # Navigate (aliases: goto, navigate)
+agent-browser close                   # Close browser
+
+# Snapshot
+agent-browser snapshot -i             # Interactive elements with refs (recommended)
+agent-browser snapshot -i -C          # Include cursor-interactive elements (divs with onclick, cursor:pointer)
+agent-browser snapshot -s "#selector" # Scope to CSS selector
+
+# Interaction (use @refs from snapshot)
+agent-browser click @e1               # Click element
+agent-browser click @e1 --new-tab     # Click and open in new tab
+agent-browser fill @e2 "text"         # Clear and type text
+agent-browser type @e2 "text"         # Type without clearing
+agent-browser select @e1 "option"     # Select dropdown option
+agent-browser check @e1               # Check checkbox
+agent-browser press Enter             # Press key
+agent-browser keyboard type "text"    # Type at current focus (no selector)
+agent-browser keyboard inserttext "text"  # Insert without key events
+agent-browser scroll down 500         # Scroll page
+agent-browser scroll down 500 --selector "div.content"  # Scroll within a specific container
+
+# Get information
+agent-browser get text @e1            # Get element text
+agent-browser get url                 # Get current URL
+agent-browser get title               # Get page title
+agent-browser get cdp-url             # Get CDP WebSocket URL
+
+# Wait
+agent-browser wait @e1                # Wait for element
+agent-browser wait --load networkidle # Wait for network idle
+agent-browser wait --url "**/page"    # Wait for URL pattern
+agent-browser wait 2000               # Wait milliseconds
+agent-browser wait --text "Welcome"    # Wait for text to appear (substring match)
+agent-browser wait --fn "!document.body.innerText.includes('Loading...')"  # Wait for text to disappear
+agent-browser wait "#spinner" --state hidden  # Wait for element to disappear
+
+# Downloads
+agent-browser download @e1 ./file.pdf          # Click element to trigger download
+agent-browser wait --download ./output.zip     # Wait for any download to complete
+agent-browser --download-path ./downloads open <url>  # Set default download directory
+
+# Viewport & Device Emulation
+agent-browser set viewport 1920 1080          # Set viewport size (default: 1280x720)
+agent-browser set viewport 1920 1080 2        # 2x retina (same CSS size, higher res screenshots)
+agent-browser set device "iPhone 14"          # Emulate device (viewport + user agent)
+
+# Capture
+agent-browser screenshot              # Screenshot to temp dir
+agent-browser screenshot --full       # Full page screenshot
+agent-browser screenshot --annotate   # Annotated screenshot with numbered element labels
+agent-browser screenshot --screenshot-dir ./shots  # Save to custom directory
+agent-browser screenshot --screenshot-format jpeg --screenshot-quality 80
+agent-browser pdf output.pdf          # Save as PDF
+
+# Clipboard
+agent-browser clipboard read                      # Read text from clipboard
+agent-browser clipboard write "Hello, World!"     # Write text to clipboard
+agent-browser clipboard copy                      # Copy current selection
+agent-browser clipboard paste                     # Paste from clipboard
+
+# Diff (compare page states)
+agent-browser diff snapshot                          # Compare current vs last snapshot
+agent-browser diff snapshot --baseline before.txt    # Compare current vs saved file
+agent-browser diff screenshot --baseline before.png  # Visual pixel diff
+agent-browser diff url <url1> <url2>                 # Compare two pages
+agent-browser diff url <url1> <url2> --wait-until networkidle  # Custom wait strategy
+agent-browser diff url <url1> <url2> --selector "#main"  # Scope to element
+```
+
+## Common Patterns
+
+### Form Submission
+
+```bash
+agent-browser open https://example.com/signup
 agent-browser snapshot -i
-agent-browser fill @e1 "John Doe"
-agent-browser fill @e2 "john@example.com"
-agent-browser select @e3 "United States"
-agent-browser check @e4  # Agree to terms
-agent-browser click @e5  # Submit button
-agent-browser screenshot confirmation.png
+agent-browser fill @e1 "Jane Doe"
+agent-browser fill @e2 "jane@example.com"
+agent-browser select @e3 "California"
+agent-browser check @e4
+agent-browser click @e5
+agent-browser wait --load networkidle
 ```

-### Debug Mode
+### Authentication with Auth Vault (Recommended)

 ```bash
-# Run with visible browser window
-agent-browser --headed open https://example.com
-agent-browser --headed snapshot -i
-agent-browser --headed click @e1
+# Save credentials once (encrypted with AGENT_BROWSER_ENCRYPTION_KEY)
+# Recommended: pipe password via stdin to avoid shell history exposure
+echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin
+
+# Login using saved profile (LLM never sees password)
+agent-browser auth login github
+
+# List/show/delete profiles
+agent-browser auth list
+agent-browser auth show github
+agent-browser auth delete github
 ```

-## JSON Output
-
-Add `--json` for structured output:
+### Authentication with State Persistence

 ```bash
+# Login once and save state
+agent-browser open https://app.example.com/login
+agent-browser snapshot -i
+agent-browser fill @e1 "$USERNAME"
+agent-browser fill @e2 "$PASSWORD"
+agent-browser click @e3
+agent-browser wait --url "**/dashboard"
+agent-browser state save auth.json
+
+# Reuse in future sessions
+agent-browser state load auth.json
+agent-browser open https://app.example.com/dashboard
+```
+
+### Session Persistence
+
+```bash
+# Auto-save/restore cookies and localStorage across browser restarts
+agent-browser --session-name myapp open https://app.example.com/login
+# ... login flow ...
+agent-browser close  # State auto-saved to ~/.agent-browser/sessions/
+
+# Next time, state is auto-loaded
+agent-browser --session-name myapp open https://app.example.com/dashboard
+
+# Encrypt state at rest
+export AGENT_BROWSER_ENCRYPTION_KEY=$(openssl rand -hex 32)
+agent-browser --session-name secure open https://app.example.com
+
+# Manage saved states
+agent-browser state list
+agent-browser state show myapp-default.json
+agent-browser state clear myapp
+agent-browser state clean --older-than 7
+```
+
+### Data Extraction
+
+```bash
+agent-browser open https://example.com/products
+agent-browser snapshot -i
+agent-browser get text @e5           # Get specific element text
+agent-browser get text body > page.txt  # Get all page text
+
+# JSON output for parsing
 agent-browser snapshot -i --json
+agent-browser get text @e1 --json
 ```

-Returns:
+### Parallel Sessions
+
+```bash
+agent-browser --session site1 open https://site-a.com
+agent-browser --session site2 open https://site-b.com
+
+agent-browser --session site1 snapshot -i
+agent-browser --session site2 snapshot -i
+
+agent-browser session list
+```
+
+### Connect to Existing Chrome
+
+```bash
+# Auto-discover running Chrome with remote debugging enabled
+agent-browser --auto-connect open https://example.com
+agent-browser --auto-connect snapshot
+
+# Or with explicit CDP port
+agent-browser --cdp 9222 snapshot
+```
+
+### Color Scheme (Dark Mode)
+
+```bash
+# Persistent dark mode via flag (applies to all pages and new tabs)
+agent-browser --color-scheme dark open https://example.com
+
+# Or via environment variable
+AGENT_BROWSER_COLOR_SCHEME=dark agent-browser open https://example.com
+
+# Or set during session (persists for subsequent commands)
+agent-browser set media dark
+```
+
+### Viewport & Responsive Testing
+
+```bash
+# Set a custom viewport size (default is 1280x720)
+agent-browser set viewport 1920 1080
+agent-browser screenshot desktop.png
+
+# Test mobile-width layout
+agent-browser set viewport 375 812
+agent-browser screenshot mobile.png
+
+# Retina/HiDPI: same CSS layout at 2x pixel density
+# Screenshots stay at logical viewport size, but content renders at higher DPI
+agent-browser set viewport 1920 1080 2
+agent-browser screenshot retina.png
+
+# Device emulation (sets viewport + user agent in one step)
+agent-browser set device "iPhone 14"
+agent-browser screenshot device.png
+```
+
+The `scale` parameter (3rd argument) sets `window.devicePixelRatio` without changing CSS layout. Use it when testing retina rendering or capturing higher-resolution screenshots.
+
+### Visual Browser (Debugging)
+
+```bash
+agent-browser --headed open https://example.com
+agent-browser highlight @e1          # Highlight element
+agent-browser inspect                # Open Chrome DevTools for the active page
+agent-browser record start demo.webm # Record session
+agent-browser profiler start         # Start Chrome DevTools profiling
+agent-browser profiler stop trace.json # Stop and save profile (path optional)
+```
+
+Use `AGENT_BROWSER_HEADED=1` to enable headed mode via environment variable. Browser extensions work in both headed and headless mode.
+
+### Local Files (PDFs, HTML)
+
+```bash
+# Open local files with file:// URLs
+agent-browser --allow-file-access open file:///path/to/document.pdf
+agent-browser --allow-file-access open file:///path/to/page.html
+agent-browser screenshot output.png
+```
+
+### iOS Simulator (Mobile Safari)
+
+```bash
+# List available iOS simulators
+agent-browser device list
+
+# Launch Safari on a specific device
+agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
+
+# Same workflow as desktop - snapshot, interact, re-snapshot
+agent-browser -p ios snapshot -i
+agent-browser -p ios tap @e1          # Tap (alias for click)
+agent-browser -p ios fill @e2 "text"
+agent-browser -p ios swipe up         # Mobile-specific gesture
+
+# Take screenshot
+agent-browser -p ios screenshot mobile.png
+
+# Close session (shuts down simulator)
+agent-browser -p ios close
+```
+
+**Requirements:** macOS with Xcode, Appium (`npm install -g appium && appium driver install xcuitest`)
+
+**Real devices:** Works with physical iOS devices if pre-configured. Use `--device "<UDID>"` where UDID is from `xcrun xctrace list devices`.
+
+## Security
+
+All security features are opt-in. By default, agent-browser imposes no restrictions on navigation, actions, or output.
+
+### Content Boundaries (Recommended for AI Agents)
+
+Enable `--content-boundaries` to wrap page-sourced output in markers that help LLMs distinguish tool output from untrusted page content:
+
+```bash
+export AGENT_BROWSER_CONTENT_BOUNDARIES=1
+agent-browser snapshot
+# Output:
+# --- AGENT_BROWSER_PAGE_CONTENT nonce=<hex> origin=https://example.com ---
+# [accessibility tree]
+# --- END_AGENT_BROWSER_PAGE_CONTENT nonce=<hex> ---
+```
+
+### Domain Allowlist
+
+Restrict navigation to trusted domains. Wildcards like `*.example.com` also match the bare domain `example.com`. Sub-resource requests, WebSocket, and EventSource connections to non-allowed domains are also blocked. Include CDN domains your target pages depend on:
+
+```bash
+export AGENT_BROWSER_ALLOWED_DOMAINS="example.com,*.example.com"
+agent-browser open https://example.com        # OK
+agent-browser open https://malicious.com       # Blocked
+```
+
+### Action Policy
+
+Use a policy file to gate destructive actions:
+
+```bash
+export AGENT_BROWSER_ACTION_POLICY=./policy.json
+```
+
+Example `policy.json`:
+
+```json
+{ "default": "deny", "allow": ["navigate", "snapshot", "click", "scroll", "wait", "get"] }
+```
+
+Auth vault operations (`auth login`, etc.) bypass action policy but domain allowlist still applies.
+
+### Output Limits
+
+Prevent context flooding from large pages:
+
+```bash
+export AGENT_BROWSER_MAX_OUTPUT=50000
+```
+
+## Diffing (Verifying Changes)
+
+Use `diff snapshot` after performing an action to verify it had the intended effect. This compares the current accessibility tree against the last snapshot taken in the session.
+
+```bash
+# Typical workflow: snapshot -> action -> diff
+agent-browser snapshot -i          # Take baseline snapshot
+agent-browser click @e2            # Perform action
+agent-browser diff snapshot        # See what changed (auto-compares to last snapshot)
+```
+
+For visual regression testing or monitoring:
+
+```bash
+# Save a baseline screenshot, then compare later
+agent-browser screenshot baseline.png
+# ... time passes or changes are made ...
+agent-browser diff screenshot --baseline baseline.png
+
+# Compare staging vs production
+agent-browser diff url https://staging.example.com https://prod.example.com --screenshot
+```
+
+`diff snapshot` output uses `+` for additions and `-` for removals, similar to git diff. `diff screenshot` produces a diff image with changed pixels highlighted in red, plus a mismatch percentage.
+
+## Timeouts and Slow Pages
+
+The default timeout is 25 seconds. This can be overridden with the `AGENT_BROWSER_DEFAULT_TIMEOUT` environment variable (value in milliseconds). For slow websites or large pages, use explicit waits instead of relying on the default timeout:
+
+```bash
+# Wait for network activity to settle (best for slow pages)
+agent-browser wait --load networkidle
+
+# Wait for a specific element to appear
+agent-browser wait "#content"
+agent-browser wait @e1
+
+# Wait for a specific URL pattern (useful after redirects)
+agent-browser wait --url "**/dashboard"
+
+# Wait for a JavaScript condition
+agent-browser wait --fn "document.readyState === 'complete'"
+
+# Wait a fixed duration (milliseconds) as a last resort
+agent-browser wait 5000
+```
+
+When dealing with consistently slow websites, use `wait --load networkidle` after `open` to ensure the page is fully loaded before taking a snapshot. If a specific element is slow to render, wait for it directly with `wait <selector>` or `wait @ref`.
+
+## Session Management and Cleanup
+
+When running multiple agents or automations concurrently, always use named sessions to avoid conflicts:
+
+```bash
+# Each agent gets its own isolated session
+agent-browser --session agent1 open site-a.com
+agent-browser --session agent2 open site-b.com
+
+# Check active sessions
+agent-browser session list
+```
+
+Always close your browser session when done to avoid leaked processes:
+
+```bash
+agent-browser close                    # Close default session
+agent-browser --session agent1 close   # Close specific session
+```
+
+If a previous session was not closed properly, the daemon may still be running. Use `agent-browser close` to clean it up before starting new work.
+
+To auto-shutdown the daemon after a period of inactivity (useful for ephemeral/CI environments):
+
+```bash
+AGENT_BROWSER_IDLE_TIMEOUT_MS=60000 agent-browser open example.com
+```
+
+## Ref Lifecycle (Important)
+
+Refs (`@e1`, `@e2`, etc.) are invalidated when the page changes. Always re-snapshot after:
+
+- Clicking links or buttons that navigate
+- Form submissions
+- Dynamic content loading (dropdowns, modals)
+
+```bash
+agent-browser click @e5              # Navigates to new page
+agent-browser snapshot -i            # MUST re-snapshot
+agent-browser click @e1              # Use new refs
+```
+
+## Annotated Screenshots (Vision Mode)
+
+Use `--annotate` to take a screenshot with numbered labels overlaid on interactive elements. Each label `[N]` maps to ref `@eN`. This also caches refs, so you can interact with elements immediately without a separate snapshot.
+
+```bash
+agent-browser screenshot --annotate
+# Output includes the image path and a legend:
+#   [1] @e1 button "Submit"
+#   [2] @e2 link "Home"
+#   [3] @e3 textbox "Email"
+agent-browser click @e2              # Click using ref from annotated screenshot
+```
+
+Use annotated screenshots when:
+
+- The page has unlabeled icon buttons or visual-only elements
+- You need to verify visual layout or styling
+- Canvas or chart elements are present (invisible to text snapshots)
+- You need spatial reasoning about element positions
+
+## Semantic Locators (Alternative to Refs)
+
+When refs are unavailable or unreliable, use semantic locators:
+
+```bash
+agent-browser find text "Sign In" click
+agent-browser find label "Email" fill "user@test.com"
+agent-browser find role button click --name "Submit"
+agent-browser find placeholder "Search" type "query"
+agent-browser find testid "submit-btn" click
+```
+
+## JavaScript Evaluation (eval)
+
+Use `eval` to run JavaScript in the browser context. **Shell quoting can corrupt complex expressions** -- use `--stdin` or `-b` to avoid issues.
+
+```bash
+# Simple expressions work with regular quoting
+agent-browser eval 'document.title'
+agent-browser eval 'document.querySelectorAll("img").length'
+
+# Complex JS: use --stdin with heredoc (RECOMMENDED)
+agent-browser eval --stdin <<'EVALEOF'
+JSON.stringify(
+  Array.from(document.querySelectorAll("img"))
+    .filter(i => !i.alt)
+    .map(i => ({ src: i.src.split("/").pop(), width: i.width }))
+)
+EVALEOF
+
+# Alternative: base64 encoding (avoids all shell escaping issues)
+agent-browser eval -b "$(echo -n 'Array.from(document.querySelectorAll("a")).map(a => a.href)' | base64)"
+```
+
+**Why this matters:** When the shell processes your command, inner double quotes, `!` characters (history expansion), backticks, and `$()` can all corrupt the JavaScript before it reaches agent-browser. The `--stdin` and `-b` flags bypass shell interpretation entirely.
+
+**Rules of thumb:**
+
+- Single-line, no nested quotes -> regular `eval 'expression'` with single quotes is fine
+- Nested quotes, arrow functions, template literals, or multiline -> use `eval --stdin <<'EVALEOF'`
+- Programmatic/generated scripts -> use `eval -b` with base64
+
+## Configuration File
+
+Create `agent-browser.json` in the project root for persistent settings:
+
 ```json
 {
-  "success": true,
-  "data": {
-    "refs": {
-      "e1": {"name": "Submit", "role": "button"},
-      "e2": {"name": "Email", "role": "textbox"}
-    },
-    "snapshot": "- button \"Submit\" [ref=e1]\n- textbox \"Email\" [ref=e2]"
-  }
+  "headed": true,
+  "proxy": "http://localhost:8080",
+  "profile": "./browser-data"
 }
 ```

-## Inspection & Debugging
+Priority (lowest to highest): `~/.agent-browser/config.json` < `./agent-browser.json` < env vars < CLI flags. Use `--config <path>` or `AGENT_BROWSER_CONFIG` env var for a custom config file (exits with error if missing/invalid). All CLI options map to camelCase keys (e.g., `--executable-path` -> `"executablePath"`). Boolean flags accept `true`/`false` values (e.g., `--headed false` overrides config). Extensions from user and project configs are merged, not replaced.

-### JavaScript Evaluation
+## Browser Engine Selection
+
+Use `--engine` to choose a local browser engine. The default is `chrome`.

 ```bash
-agent-browser eval "document.title"                    # Evaluate JS expression
-agent-browser eval "JSON.stringify(localStorage)"      # Return serialized data
-agent-browser eval "document.querySelectorAll('a').length"  # Count elements
+# Use Lightpanda (fast headless browser, requires separate install)
+agent-browser --engine lightpanda open example.com
+
+# Via environment variable
+export AGENT_BROWSER_ENGINE=lightpanda
+agent-browser open example.com
+
+# With custom binary path
+agent-browser --engine lightpanda --executable-path /path/to/lightpanda open example.com
 ```

-### Console & Errors
+Supported engines:
+- `chrome` (default) -- Chrome/Chromium via CDP
+- `lightpanda` -- Lightpanda headless browser via CDP (10x faster, 10x less memory than Chrome)
+
+Lightpanda does not support `--extension`, `--profile`, `--state`, or `--allow-file-access`. Install Lightpanda from https://lightpanda.io/docs/open-source/installation.
+
+## Deep-Dive Documentation
+
+| Reference                                                            | When to Use                                               |
+| -------------------------------------------------------------------- | --------------------------------------------------------- |
+| [references/commands.md](references/commands.md)                     | Full command reference with all options                   |
+| [references/snapshot-refs.md](references/snapshot-refs.md)           | Ref lifecycle, invalidation rules, troubleshooting        |
+| [references/session-management.md](references/session-management.md) | Parallel sessions, state persistence, concurrent scraping |
+| [references/authentication.md](references/authentication.md)         | Login flows, OAuth, 2FA handling, state reuse             |
+| [references/video-recording.md](references/video-recording.md)       | Recording workflows for debugging and documentation       |
+| [references/profiling.md](references/profiling.md)                   | Chrome DevTools profiling for performance analysis        |
+| [references/proxy-support.md](references/proxy-support.md)           | Proxy configuration, geo-testing, rotating proxies        |
+
+## Ready-to-Use Templates
+
+| Template                                                                 | Description                         |
+| ------------------------------------------------------------------------ | ----------------------------------- |
+| [templates/form-automation.sh](templates/form-automation.sh)             | Form filling with validation        |
+| [templates/authenticated-session.sh](templates/authenticated-session.sh) | Login once, reuse state             |
+| [templates/capture-workflow.sh](templates/capture-workflow.sh)           | Content extraction with screenshots |

 ```bash
-agent-browser console                 # Show browser console output
-agent-browser console --clear         # Show and clear console
-agent-browser errors                  # Show JavaScript errors only
-agent-browser errors --clear          # Show and clear errors
-```
-
-## Network
-
-```bash
-agent-browser network requests                          # List captured requests
-agent-browser network requests --filter "api"           # Filter by URL pattern
-agent-browser route "**/*.png" abort                    # Block matching requests
-agent-browser route "https://api.example.com/*" fulfill --status 200 --body '{"mock":true}'  # Mock response
-agent-browser unroute "**/*.png"                        # Remove route handler
-```
-
-## Storage
-
-### Cookies
-
-```bash
-agent-browser cookies get                               # Get all cookies
-agent-browser cookies get --name "session"              # Get specific cookie
-agent-browser cookies set --name "token" --value "abc"  # Set cookie
-agent-browser cookies clear                             # Clear all cookies
-```
-
-### Local & Session Storage
-
-```bash
-agent-browser storage local                             # Get all localStorage
-agent-browser storage local --key "theme"               # Get specific key
-agent-browser storage session                           # Get all sessionStorage
-agent-browser storage session --key "cart"              # Get specific key
-```
-
-## Device & Settings
-
-```bash
-agent-browser set viewport 1920 1080                    # Set viewport size
-agent-browser set device "iPhone 14"                    # Emulate device
-agent-browser set geo --lat 47.6 --lon -122.3           # Set geolocation
-agent-browser set offline true                          # Enable offline mode
-agent-browser set offline false                         # Disable offline mode
-agent-browser set media "prefers-color-scheme" "dark"   # Set media feature
-agent-browser set headers '{"X-Custom":"value"}'        # Set extra HTTP headers
-agent-browser set credentials "user" "pass"             # Set HTTP auth credentials
-```
-
-## Element Debugging
-
-```bash
-agent-browser highlight @e1                             # Highlight element visually
-agent-browser get box @e1                               # Get bounding box (x, y, width, height)
-agent-browser get styles @e1                            # Get computed styles
-agent-browser is visible @e1                            # Check if element is visible
-agent-browser is enabled @e1                            # Check if element is enabled
-agent-browser is checked @e1                            # Check if checkbox/radio is checked
-```
-
-## Recording & Tracing
-
-```bash
-agent-browser trace start                               # Start recording trace
-agent-browser trace stop trace.zip                      # Stop and save trace file
-agent-browser record start                              # Start recording video
-agent-browser record stop video.webm                    # Stop and save recording
-```
-
-## Tabs & Windows
-
-```bash
-agent-browser tab list                                  # List open tabs
-agent-browser tab new https://example.com               # Open URL in new tab
-agent-browser tab close                                 # Close current tab
-agent-browser tab 2                                     # Switch to tab by index
-```
-
-## Advanced Mouse
-
-```bash
-agent-browser mouse move 100 200                        # Move mouse to coordinates
-agent-browser mouse down                                # Press mouse button
-agent-browser mouse up                                  # Release mouse button
-agent-browser mouse wheel 0 500                         # Scroll (deltaX, deltaY)
-agent-browser drag @e1 @e2                              # Drag from element to element
+./templates/form-automation.sh https://example.com/form
+./templates/authenticated-session.sh https://app.example.com/login
+./templates/capture-workflow.sh https://example.com ./output
 ```

 ## vs Playwright MCP