feat: sync agent-browser skill with upstream vercel-labs/agent-browser
Update SKILL.md to match the latest upstream skill from vercel-labs/agent-browser, adding substantial new capabilities: - Authentication (auth vault, profiles, session persistence, state files) - Command chaining, annotated screenshots, diffing - Security features (content boundaries, domain allowlist, action policy) - iOS Simulator support, Lightpanda engine, downloads, clipboard - JS eval improvements (--stdin, -b for shell safety) - Timeout guidance, config files, session cleanup Add 7 reference docs (commands, authentication, snapshot-refs, session-management, video-recording, profiling, proxy-support) and 3 ready-to-use shell templates. Kept our YAML frontmatter, setup check section, and Playwright MCP comparison table which are unique to our plugin context.
This commit is contained in:
@@ -3,9 +3,9 @@ name: agent-browser
|
||||
description: Browser automation using Vercel's agent-browser CLI. Use when you need to interact with web pages, fill forms, take screenshots, or scrape data. Alternative to Playwright MCP - uses Bash commands with ref-based element selection. Triggers on "browse website", "fill form", "click button", "take screenshot", "scrape page", "web automation".
|
||||
---
|
||||
|
||||
# agent-browser: CLI Browser Automation
|
||||
# Browser Automation with agent-browser
|
||||
|
||||
Vercel's headless browser automation CLI designed for AI agents. Uses ref-based selection (@e1, @e2) from accessibility snapshots.
|
||||
The CLI uses Chrome/Chromium via CDP directly. Install via `npm i -g agent-browser`, `brew install agent-browser`, or `cargo install agent-browser`. Run `agent-browser install` to download Chrome.
|
||||
|
||||
## Setup Check
|
||||
|
||||
@@ -23,284 +23,625 @@ agent-browser install # Downloads Chromium
|
||||
|
||||
## Core Workflow
|
||||
|
||||
**The snapshot + ref pattern is optimal for LLMs:**
|
||||
Every browser automation follows this pattern:
|
||||
|
||||
1. **Navigate** to URL
|
||||
2. **Snapshot** to get interactive elements with refs
|
||||
3. **Interact** using refs (@e1, @e2, etc.)
|
||||
4. **Re-snapshot** after navigation or DOM changes
|
||||
1. **Navigate**: `agent-browser open <url>`
|
||||
2. **Snapshot**: `agent-browser snapshot -i` (get element refs like `@e1`, `@e2`)
|
||||
3. **Interact**: Use refs to click, fill, select
|
||||
4. **Re-snapshot**: After navigation or DOM changes, get fresh refs
|
||||
|
||||
```bash
|
||||
# Step 1: Open URL
|
||||
agent-browser open https://example.com
|
||||
|
||||
# Step 2: Get interactive elements with refs
|
||||
agent-browser snapshot -i --json
|
||||
|
||||
# Step 3: Interact using refs
|
||||
agent-browser click @e1
|
||||
agent-browser fill @e2 "search query"
|
||||
|
||||
# Step 4: Re-snapshot after changes
|
||||
agent-browser open https://example.com/form
|
||||
agent-browser snapshot -i
|
||||
```
|
||||
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Submit"
|
||||
|
||||
## Key Commands
|
||||
|
||||
### Navigation
|
||||
|
||||
```bash
|
||||
agent-browser open <url> # Navigate to URL
|
||||
agent-browser back # Go back
|
||||
agent-browser forward # Go forward
|
||||
agent-browser reload # Reload page
|
||||
agent-browser close # Close browser
|
||||
```
|
||||
|
||||
### Snapshots (Essential for AI)
|
||||
|
||||
```bash
|
||||
agent-browser snapshot # Full accessibility tree
|
||||
agent-browser snapshot -i # Interactive elements only (recommended)
|
||||
agent-browser snapshot -i --json # JSON output for parsing
|
||||
agent-browser snapshot -c # Compact (remove empty elements)
|
||||
agent-browser snapshot -d 3 # Limit depth
|
||||
```
|
||||
|
||||
### Interactions
|
||||
|
||||
```bash
|
||||
agent-browser click @e1 # Click element
|
||||
agent-browser dblclick @e1 # Double-click
|
||||
agent-browser fill @e1 "text" # Clear and fill input
|
||||
agent-browser type @e1 "text" # Type without clearing
|
||||
agent-browser press Enter # Press key
|
||||
agent-browser hover @e1 # Hover element
|
||||
agent-browser check @e1 # Check checkbox
|
||||
agent-browser uncheck @e1 # Uncheck checkbox
|
||||
agent-browser select @e1 "option" # Select dropdown option
|
||||
agent-browser scroll down 500 # Scroll (up/down/left/right)
|
||||
agent-browser scrollintoview @e1 # Scroll element into view
|
||||
```
|
||||
|
||||
### Get Information
|
||||
|
||||
```bash
|
||||
agent-browser get text @e1 # Get element text
|
||||
agent-browser get html @e1 # Get element HTML
|
||||
agent-browser get value @e1 # Get input value
|
||||
agent-browser get attr href @e1 # Get attribute
|
||||
agent-browser get title # Get page title
|
||||
agent-browser get url # Get current URL
|
||||
agent-browser get count "button" # Count matching elements
|
||||
```
|
||||
|
||||
### Screenshots & PDFs
|
||||
|
||||
```bash
|
||||
agent-browser screenshot # Viewport screenshot
|
||||
agent-browser screenshot --full # Full page
|
||||
agent-browser screenshot output.png # Save to file
|
||||
agent-browser screenshot --full output.png # Full page to file
|
||||
agent-browser pdf output.pdf # Save as PDF
|
||||
```
|
||||
|
||||
### Wait
|
||||
|
||||
```bash
|
||||
agent-browser wait @e1 # Wait for element
|
||||
agent-browser wait 2000 # Wait milliseconds
|
||||
agent-browser wait "text" # Wait for text to appear
|
||||
```
|
||||
|
||||
## Semantic Locators (Alternative to Refs)
|
||||
|
||||
```bash
|
||||
agent-browser find role button click --name "Submit"
|
||||
agent-browser find text "Sign up" click
|
||||
agent-browser find label "Email" fill "user@example.com"
|
||||
agent-browser find placeholder "Search..." fill "query"
|
||||
```
|
||||
|
||||
## Sessions (Parallel Browsers)
|
||||
|
||||
```bash
|
||||
# Run multiple independent browser sessions
|
||||
agent-browser --session browser1 open https://site1.com
|
||||
agent-browser --session browser2 open https://site2.com
|
||||
|
||||
# List active sessions
|
||||
agent-browser session list
|
||||
```
|
||||
|
||||
## Examples
|
||||
|
||||
### Login Flow
|
||||
|
||||
```bash
|
||||
agent-browser open https://app.example.com/login
|
||||
agent-browser snapshot -i
|
||||
# Output shows: textbox "Email" [ref=e1], textbox "Password" [ref=e2], button "Sign in" [ref=e3]
|
||||
agent-browser fill @e1 "user@example.com"
|
||||
agent-browser fill @e2 "password123"
|
||||
agent-browser click @e3
|
||||
agent-browser wait 2000
|
||||
agent-browser snapshot -i # Verify logged in
|
||||
agent-browser wait --load networkidle
|
||||
agent-browser snapshot -i # Check result
|
||||
```
|
||||
|
||||
### Search and Extract
|
||||
## Command Chaining
|
||||
|
||||
Commands can be chained with `&&` in a single shell invocation. The browser persists between commands via a background daemon, so chaining is safe and more efficient than separate calls.
|
||||
|
||||
```bash
|
||||
agent-browser open https://news.ycombinator.com
|
||||
agent-browser snapshot -i --json
|
||||
# Parse JSON to find story links
|
||||
agent-browser get text @e12 # Get headline text
|
||||
agent-browser click @e12 # Click to open story
|
||||
# Chain open + wait + snapshot in one call
|
||||
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser snapshot -i
|
||||
|
||||
# Chain multiple interactions
|
||||
agent-browser fill @e1 "user@example.com" && agent-browser fill @e2 "password123" && agent-browser click @e3
|
||||
|
||||
# Navigate and capture
|
||||
agent-browser open https://example.com && agent-browser wait --load networkidle && agent-browser screenshot page.png
|
||||
```
|
||||
|
||||
### Form Filling
|
||||
**When to chain:** Use `&&` when you don't need to read the output of an intermediate command before proceeding (e.g., open + wait + screenshot). Run commands separately when you need to parse the output first (e.g., snapshot to discover refs, then interact using those refs).
|
||||
|
||||
## Handling Authentication
|
||||
|
||||
When automating a site that requires login, choose the approach that fits:
|
||||
|
||||
**Option 1: Import auth from the user's browser (fastest for one-off tasks)**
|
||||
|
||||
```bash
|
||||
agent-browser open https://forms.example.com
|
||||
# Connect to the user's running Chrome (they're already logged in)
|
||||
agent-browser --auto-connect state save ./auth.json
|
||||
# Use that auth state
|
||||
agent-browser --state ./auth.json open https://app.example.com/dashboard
|
||||
```
|
||||
|
||||
State files contain session tokens in plaintext -- add to `.gitignore` and delete when no longer needed. Set `AGENT_BROWSER_ENCRYPTION_KEY` for encryption at rest.
|
||||
|
||||
**Option 2: Persistent profile (simplest for recurring tasks)**
|
||||
|
||||
```bash
|
||||
# First run: login manually or via automation
|
||||
agent-browser --profile ~/.myapp open https://app.example.com/login
|
||||
# ... fill credentials, submit ...
|
||||
|
||||
# All future runs: already authenticated
|
||||
agent-browser --profile ~/.myapp open https://app.example.com/dashboard
|
||||
```
|
||||
|
||||
**Option 3: Session name (auto-save/restore cookies + localStorage)**
|
||||
|
||||
```bash
|
||||
agent-browser --session-name myapp open https://app.example.com/login
|
||||
# ... login flow ...
|
||||
agent-browser close # State auto-saved
|
||||
|
||||
# Next time: state auto-restored
|
||||
agent-browser --session-name myapp open https://app.example.com/dashboard
|
||||
```
|
||||
|
||||
**Option 4: Auth vault (credentials stored encrypted, login by name)**
|
||||
|
||||
```bash
|
||||
echo "$PASSWORD" | agent-browser auth save myapp --url https://app.example.com/login --username user --password-stdin
|
||||
agent-browser auth login myapp
|
||||
```
|
||||
|
||||
**Option 5: State file (manual save/load)**
|
||||
|
||||
```bash
|
||||
# After logging in:
|
||||
agent-browser state save ./auth.json
|
||||
# In a future session:
|
||||
agent-browser state load ./auth.json
|
||||
agent-browser open https://app.example.com/dashboard
|
||||
```
|
||||
|
||||
See [references/authentication.md](references/authentication.md) for OAuth, 2FA, cookie-based auth, and token refresh patterns.
|
||||
|
||||
## Essential Commands
|
||||
|
||||
```bash
|
||||
# Navigation
|
||||
agent-browser open <url> # Navigate (aliases: goto, navigate)
|
||||
agent-browser close # Close browser
|
||||
|
||||
# Snapshot
|
||||
agent-browser snapshot -i # Interactive elements with refs (recommended)
|
||||
agent-browser snapshot -i -C # Include cursor-interactive elements (divs with onclick, cursor:pointer)
|
||||
agent-browser snapshot -s "#selector" # Scope to CSS selector
|
||||
|
||||
# Interaction (use @refs from snapshot)
|
||||
agent-browser click @e1 # Click element
|
||||
agent-browser click @e1 --new-tab # Click and open in new tab
|
||||
agent-browser fill @e2 "text" # Clear and type text
|
||||
agent-browser type @e2 "text" # Type without clearing
|
||||
agent-browser select @e1 "option" # Select dropdown option
|
||||
agent-browser check @e1 # Check checkbox
|
||||
agent-browser press Enter # Press key
|
||||
agent-browser keyboard type "text" # Type at current focus (no selector)
|
||||
agent-browser keyboard inserttext "text" # Insert without key events
|
||||
agent-browser scroll down 500 # Scroll page
|
||||
agent-browser scroll down 500 --selector "div.content" # Scroll within a specific container
|
||||
|
||||
# Get information
|
||||
agent-browser get text @e1 # Get element text
|
||||
agent-browser get url # Get current URL
|
||||
agent-browser get title # Get page title
|
||||
agent-browser get cdp-url # Get CDP WebSocket URL
|
||||
|
||||
# Wait
|
||||
agent-browser wait @e1 # Wait for element
|
||||
agent-browser wait --load networkidle # Wait for network idle
|
||||
agent-browser wait --url "**/page" # Wait for URL pattern
|
||||
agent-browser wait 2000 # Wait milliseconds
|
||||
agent-browser wait --text "Welcome" # Wait for text to appear (substring match)
|
||||
agent-browser wait --fn "!document.body.innerText.includes('Loading...')" # Wait for text to disappear
|
||||
agent-browser wait "#spinner" --state hidden # Wait for element to disappear
|
||||
|
||||
# Downloads
|
||||
agent-browser download @e1 ./file.pdf # Click element to trigger download
|
||||
agent-browser wait --download ./output.zip # Wait for any download to complete
|
||||
agent-browser --download-path ./downloads open <url> # Set default download directory
|
||||
|
||||
# Viewport & Device Emulation
|
||||
agent-browser set viewport 1920 1080 # Set viewport size (default: 1280x720)
|
||||
agent-browser set viewport 1920 1080 2 # 2x retina (same CSS size, higher res screenshots)
|
||||
agent-browser set device "iPhone 14" # Emulate device (viewport + user agent)
|
||||
|
||||
# Capture
|
||||
agent-browser screenshot # Screenshot to temp dir
|
||||
agent-browser screenshot --full # Full page screenshot
|
||||
agent-browser screenshot --annotate # Annotated screenshot with numbered element labels
|
||||
agent-browser screenshot --screenshot-dir ./shots # Save to custom directory
|
||||
agent-browser screenshot --screenshot-format jpeg --screenshot-quality 80
|
||||
agent-browser pdf output.pdf # Save as PDF
|
||||
|
||||
# Clipboard
|
||||
agent-browser clipboard read # Read text from clipboard
|
||||
agent-browser clipboard write "Hello, World!" # Write text to clipboard
|
||||
agent-browser clipboard copy # Copy current selection
|
||||
agent-browser clipboard paste # Paste from clipboard
|
||||
|
||||
# Diff (compare page states)
|
||||
agent-browser diff snapshot # Compare current vs last snapshot
|
||||
agent-browser diff snapshot --baseline before.txt # Compare current vs saved file
|
||||
agent-browser diff screenshot --baseline before.png # Visual pixel diff
|
||||
agent-browser diff url <url1> <url2> # Compare two pages
|
||||
agent-browser diff url <url1> <url2> --wait-until networkidle # Custom wait strategy
|
||||
agent-browser diff url <url1> <url2> --selector "#main" # Scope to element
|
||||
```
|
||||
|
||||
## Common Patterns
|
||||
|
||||
### Form Submission
|
||||
|
||||
```bash
|
||||
agent-browser open https://example.com/signup
|
||||
agent-browser snapshot -i
|
||||
agent-browser fill @e1 "John Doe"
|
||||
agent-browser fill @e2 "john@example.com"
|
||||
agent-browser select @e3 "United States"
|
||||
agent-browser check @e4 # Agree to terms
|
||||
agent-browser click @e5 # Submit button
|
||||
agent-browser screenshot confirmation.png
|
||||
agent-browser fill @e1 "Jane Doe"
|
||||
agent-browser fill @e2 "jane@example.com"
|
||||
agent-browser select @e3 "California"
|
||||
agent-browser check @e4
|
||||
agent-browser click @e5
|
||||
agent-browser wait --load networkidle
|
||||
```
|
||||
|
||||
### Debug Mode
|
||||
### Authentication with Auth Vault (Recommended)
|
||||
|
||||
```bash
|
||||
# Run with visible browser window
|
||||
agent-browser --headed open https://example.com
|
||||
agent-browser --headed snapshot -i
|
||||
agent-browser --headed click @e1
|
||||
# Save credentials once (encrypted with AGENT_BROWSER_ENCRYPTION_KEY)
|
||||
# Recommended: pipe password via stdin to avoid shell history exposure
|
||||
echo "pass" | agent-browser auth save github --url https://github.com/login --username user --password-stdin
|
||||
|
||||
# Login using saved profile (LLM never sees password)
|
||||
agent-browser auth login github
|
||||
|
||||
# List/show/delete profiles
|
||||
agent-browser auth list
|
||||
agent-browser auth show github
|
||||
agent-browser auth delete github
|
||||
```
|
||||
|
||||
## JSON Output
|
||||
|
||||
Add `--json` for structured output:
|
||||
### Authentication with State Persistence
|
||||
|
||||
```bash
|
||||
# Login once and save state
|
||||
agent-browser open https://app.example.com/login
|
||||
agent-browser snapshot -i
|
||||
agent-browser fill @e1 "$USERNAME"
|
||||
agent-browser fill @e2 "$PASSWORD"
|
||||
agent-browser click @e3
|
||||
agent-browser wait --url "**/dashboard"
|
||||
agent-browser state save auth.json
|
||||
|
||||
# Reuse in future sessions
|
||||
agent-browser state load auth.json
|
||||
agent-browser open https://app.example.com/dashboard
|
||||
```
|
||||
|
||||
### Session Persistence
|
||||
|
||||
```bash
|
||||
# Auto-save/restore cookies and localStorage across browser restarts
|
||||
agent-browser --session-name myapp open https://app.example.com/login
|
||||
# ... login flow ...
|
||||
agent-browser close # State auto-saved to ~/.agent-browser/sessions/
|
||||
|
||||
# Next time, state is auto-loaded
|
||||
agent-browser --session-name myapp open https://app.example.com/dashboard
|
||||
|
||||
# Encrypt state at rest
|
||||
export AGENT_BROWSER_ENCRYPTION_KEY=$(openssl rand -hex 32)
|
||||
agent-browser --session-name secure open https://app.example.com
|
||||
|
||||
# Manage saved states
|
||||
agent-browser state list
|
||||
agent-browser state show myapp-default.json
|
||||
agent-browser state clear myapp
|
||||
agent-browser state clean --older-than 7
|
||||
```
|
||||
|
||||
### Data Extraction
|
||||
|
||||
```bash
|
||||
agent-browser open https://example.com/products
|
||||
agent-browser snapshot -i
|
||||
agent-browser get text @e5 # Get specific element text
|
||||
agent-browser get text body > page.txt # Get all page text
|
||||
|
||||
# JSON output for parsing
|
||||
agent-browser snapshot -i --json
|
||||
agent-browser get text @e1 --json
|
||||
```
|
||||
|
||||
Returns:
|
||||
### Parallel Sessions
|
||||
|
||||
```bash
|
||||
agent-browser --session site1 open https://site-a.com
|
||||
agent-browser --session site2 open https://site-b.com
|
||||
|
||||
agent-browser --session site1 snapshot -i
|
||||
agent-browser --session site2 snapshot -i
|
||||
|
||||
agent-browser session list
|
||||
```
|
||||
|
||||
### Connect to Existing Chrome
|
||||
|
||||
```bash
|
||||
# Auto-discover running Chrome with remote debugging enabled
|
||||
agent-browser --auto-connect open https://example.com
|
||||
agent-browser --auto-connect snapshot
|
||||
|
||||
# Or with explicit CDP port
|
||||
agent-browser --cdp 9222 snapshot
|
||||
```
|
||||
|
||||
### Color Scheme (Dark Mode)
|
||||
|
||||
```bash
|
||||
# Persistent dark mode via flag (applies to all pages and new tabs)
|
||||
agent-browser --color-scheme dark open https://example.com
|
||||
|
||||
# Or via environment variable
|
||||
AGENT_BROWSER_COLOR_SCHEME=dark agent-browser open https://example.com
|
||||
|
||||
# Or set during session (persists for subsequent commands)
|
||||
agent-browser set media dark
|
||||
```
|
||||
|
||||
### Viewport & Responsive Testing
|
||||
|
||||
```bash
|
||||
# Set a custom viewport size (default is 1280x720)
|
||||
agent-browser set viewport 1920 1080
|
||||
agent-browser screenshot desktop.png
|
||||
|
||||
# Test mobile-width layout
|
||||
agent-browser set viewport 375 812
|
||||
agent-browser screenshot mobile.png
|
||||
|
||||
# Retina/HiDPI: same CSS layout at 2x pixel density
|
||||
# Screenshots stay at logical viewport size, but content renders at higher DPI
|
||||
agent-browser set viewport 1920 1080 2
|
||||
agent-browser screenshot retina.png
|
||||
|
||||
# Device emulation (sets viewport + user agent in one step)
|
||||
agent-browser set device "iPhone 14"
|
||||
agent-browser screenshot device.png
|
||||
```
|
||||
|
||||
The `scale` parameter (3rd argument) sets `window.devicePixelRatio` without changing CSS layout. Use it when testing retina rendering or capturing higher-resolution screenshots.
|
||||
|
||||
### Visual Browser (Debugging)
|
||||
|
||||
```bash
|
||||
agent-browser --headed open https://example.com
|
||||
agent-browser highlight @e1 # Highlight element
|
||||
agent-browser inspect # Open Chrome DevTools for the active page
|
||||
agent-browser record start demo.webm # Record session
|
||||
agent-browser profiler start # Start Chrome DevTools profiling
|
||||
agent-browser profiler stop trace.json # Stop and save profile (path optional)
|
||||
```
|
||||
|
||||
Use `AGENT_BROWSER_HEADED=1` to enable headed mode via environment variable. Browser extensions work in both headed and headless mode.
|
||||
|
||||
### Local Files (PDFs, HTML)
|
||||
|
||||
```bash
|
||||
# Open local files with file:// URLs
|
||||
agent-browser --allow-file-access open file:///path/to/document.pdf
|
||||
agent-browser --allow-file-access open file:///path/to/page.html
|
||||
agent-browser screenshot output.png
|
||||
```
|
||||
|
||||
### iOS Simulator (Mobile Safari)
|
||||
|
||||
```bash
|
||||
# List available iOS simulators
|
||||
agent-browser device list
|
||||
|
||||
# Launch Safari on a specific device
|
||||
agent-browser -p ios --device "iPhone 16 Pro" open https://example.com
|
||||
|
||||
# Same workflow as desktop - snapshot, interact, re-snapshot
|
||||
agent-browser -p ios snapshot -i
|
||||
agent-browser -p ios tap @e1 # Tap (alias for click)
|
||||
agent-browser -p ios fill @e2 "text"
|
||||
agent-browser -p ios swipe up # Mobile-specific gesture
|
||||
|
||||
# Take screenshot
|
||||
agent-browser -p ios screenshot mobile.png
|
||||
|
||||
# Close session (shuts down simulator)
|
||||
agent-browser -p ios close
|
||||
```
|
||||
|
||||
**Requirements:** macOS with Xcode, Appium (`npm install -g appium && appium driver install xcuitest`)
|
||||
|
||||
**Real devices:** Works with physical iOS devices if pre-configured. Use `--device "<UDID>"` where UDID is from `xcrun xctrace list devices`.
|
||||
|
||||
## Security
|
||||
|
||||
All security features are opt-in. By default, agent-browser imposes no restrictions on navigation, actions, or output.
|
||||
|
||||
### Content Boundaries (Recommended for AI Agents)
|
||||
|
||||
Enable `--content-boundaries` to wrap page-sourced output in markers that help LLMs distinguish tool output from untrusted page content:
|
||||
|
||||
```bash
|
||||
export AGENT_BROWSER_CONTENT_BOUNDARIES=1
|
||||
agent-browser snapshot
|
||||
# Output:
|
||||
# --- AGENT_BROWSER_PAGE_CONTENT nonce=<hex> origin=https://example.com ---
|
||||
# [accessibility tree]
|
||||
# --- END_AGENT_BROWSER_PAGE_CONTENT nonce=<hex> ---
|
||||
```
|
||||
|
||||
### Domain Allowlist
|
||||
|
||||
Restrict navigation to trusted domains. Wildcards like `*.example.com` also match the bare domain `example.com`. Sub-resource requests, WebSocket, and EventSource connections to non-allowed domains are also blocked. Include CDN domains your target pages depend on:
|
||||
|
||||
```bash
|
||||
export AGENT_BROWSER_ALLOWED_DOMAINS="example.com,*.example.com"
|
||||
agent-browser open https://example.com # OK
|
||||
agent-browser open https://malicious.com # Blocked
|
||||
```
|
||||
|
||||
### Action Policy
|
||||
|
||||
Use a policy file to gate destructive actions:
|
||||
|
||||
```bash
|
||||
export AGENT_BROWSER_ACTION_POLICY=./policy.json
|
||||
```
|
||||
|
||||
Example `policy.json`:
|
||||
|
||||
```json
|
||||
{ "default": "deny", "allow": ["navigate", "snapshot", "click", "scroll", "wait", "get"] }
|
||||
```
|
||||
|
||||
Auth vault operations (`auth login`, etc.) bypass action policy but domain allowlist still applies.
|
||||
|
||||
### Output Limits
|
||||
|
||||
Prevent context flooding from large pages:
|
||||
|
||||
```bash
|
||||
export AGENT_BROWSER_MAX_OUTPUT=50000
|
||||
```
|
||||
|
||||
## Diffing (Verifying Changes)
|
||||
|
||||
Use `diff snapshot` after performing an action to verify it had the intended effect. This compares the current accessibility tree against the last snapshot taken in the session.
|
||||
|
||||
```bash
|
||||
# Typical workflow: snapshot -> action -> diff
|
||||
agent-browser snapshot -i # Take baseline snapshot
|
||||
agent-browser click @e2 # Perform action
|
||||
agent-browser diff snapshot # See what changed (auto-compares to last snapshot)
|
||||
```
|
||||
|
||||
For visual regression testing or monitoring:
|
||||
|
||||
```bash
|
||||
# Save a baseline screenshot, then compare later
|
||||
agent-browser screenshot baseline.png
|
||||
# ... time passes or changes are made ...
|
||||
agent-browser diff screenshot --baseline baseline.png
|
||||
|
||||
# Compare staging vs production
|
||||
agent-browser diff url https://staging.example.com https://prod.example.com --screenshot
|
||||
```
|
||||
|
||||
`diff snapshot` output uses `+` for additions and `-` for removals, similar to git diff. `diff screenshot` produces a diff image with changed pixels highlighted in red, plus a mismatch percentage.
|
||||
|
||||
## Timeouts and Slow Pages
|
||||
|
||||
The default timeout is 25 seconds. This can be overridden with the `AGENT_BROWSER_DEFAULT_TIMEOUT` environment variable (value in milliseconds). For slow websites or large pages, use explicit waits instead of relying on the default timeout:
|
||||
|
||||
```bash
|
||||
# Wait for network activity to settle (best for slow pages)
|
||||
agent-browser wait --load networkidle
|
||||
|
||||
# Wait for a specific element to appear
|
||||
agent-browser wait "#content"
|
||||
agent-browser wait @e1
|
||||
|
||||
# Wait for a specific URL pattern (useful after redirects)
|
||||
agent-browser wait --url "**/dashboard"
|
||||
|
||||
# Wait for a JavaScript condition
|
||||
agent-browser wait --fn "document.readyState === 'complete'"
|
||||
|
||||
# Wait a fixed duration (milliseconds) as a last resort
|
||||
agent-browser wait 5000
|
||||
```
|
||||
|
||||
When dealing with consistently slow websites, use `wait --load networkidle` after `open` to ensure the page is fully loaded before taking a snapshot. If a specific element is slow to render, wait for it directly with `wait <selector>` or `wait @ref`.
|
||||
|
||||
## Session Management and Cleanup
|
||||
|
||||
When running multiple agents or automations concurrently, always use named sessions to avoid conflicts:
|
||||
|
||||
```bash
|
||||
# Each agent gets its own isolated session
|
||||
agent-browser --session agent1 open site-a.com
|
||||
agent-browser --session agent2 open site-b.com
|
||||
|
||||
# Check active sessions
|
||||
agent-browser session list
|
||||
```
|
||||
|
||||
Always close your browser session when done to avoid leaked processes:
|
||||
|
||||
```bash
|
||||
agent-browser close # Close default session
|
||||
agent-browser --session agent1 close # Close specific session
|
||||
```
|
||||
|
||||
If a previous session was not closed properly, the daemon may still be running. Use `agent-browser close` to clean it up before starting new work.
|
||||
|
||||
To auto-shutdown the daemon after a period of inactivity (useful for ephemeral/CI environments):
|
||||
|
||||
```bash
|
||||
AGENT_BROWSER_IDLE_TIMEOUT_MS=60000 agent-browser open example.com
|
||||
```
|
||||
|
||||
## Ref Lifecycle (Important)
|
||||
|
||||
Refs (`@e1`, `@e2`, etc.) are invalidated when the page changes. Always re-snapshot after:
|
||||
|
||||
- Clicking links or buttons that navigate
|
||||
- Form submissions
|
||||
- Dynamic content loading (dropdowns, modals)
|
||||
|
||||
```bash
|
||||
agent-browser click @e5 # Navigates to new page
|
||||
agent-browser snapshot -i # MUST re-snapshot
|
||||
agent-browser click @e1 # Use new refs
|
||||
```
|
||||
|
||||
## Annotated Screenshots (Vision Mode)
|
||||
|
||||
Use `--annotate` to take a screenshot with numbered labels overlaid on interactive elements. Each label `[N]` maps to ref `@eN`. This also caches refs, so you can interact with elements immediately without a separate snapshot.
|
||||
|
||||
```bash
|
||||
agent-browser screenshot --annotate
|
||||
# Output includes the image path and a legend:
|
||||
# [1] @e1 button "Submit"
|
||||
# [2] @e2 link "Home"
|
||||
# [3] @e3 textbox "Email"
|
||||
agent-browser click @e2 # Click using ref from annotated screenshot
|
||||
```
|
||||
|
||||
Use annotated screenshots when:
|
||||
|
||||
- The page has unlabeled icon buttons or visual-only elements
|
||||
- You need to verify visual layout or styling
|
||||
- Canvas or chart elements are present (invisible to text snapshots)
|
||||
- You need spatial reasoning about element positions
|
||||
|
||||
## Semantic Locators (Alternative to Refs)
|
||||
|
||||
When refs are unavailable or unreliable, use semantic locators:
|
||||
|
||||
```bash
|
||||
agent-browser find text "Sign In" click
|
||||
agent-browser find label "Email" fill "user@test.com"
|
||||
agent-browser find role button click --name "Submit"
|
||||
agent-browser find placeholder "Search" type "query"
|
||||
agent-browser find testid "submit-btn" click
|
||||
```
|
||||
|
||||
## JavaScript Evaluation (eval)
|
||||
|
||||
Use `eval` to run JavaScript in the browser context. **Shell quoting can corrupt complex expressions** -- use `--stdin` or `-b` to avoid issues.
|
||||
|
||||
```bash
|
||||
# Simple expressions work with regular quoting
|
||||
agent-browser eval 'document.title'
|
||||
agent-browser eval 'document.querySelectorAll("img").length'
|
||||
|
||||
# Complex JS: use --stdin with heredoc (RECOMMENDED)
|
||||
agent-browser eval --stdin <<'EVALEOF'
|
||||
JSON.stringify(
|
||||
Array.from(document.querySelectorAll("img"))
|
||||
.filter(i => !i.alt)
|
||||
.map(i => ({ src: i.src.split("/").pop(), width: i.width }))
|
||||
)
|
||||
EVALEOF
|
||||
|
||||
# Alternative: base64 encoding (avoids all shell escaping issues)
|
||||
agent-browser eval -b "$(echo -n 'Array.from(document.querySelectorAll("a")).map(a => a.href)' | base64)"
|
||||
```
|
||||
|
||||
**Why this matters:** When the shell processes your command, inner double quotes, `!` characters (history expansion), backticks, and `$()` can all corrupt the JavaScript before it reaches agent-browser. The `--stdin` and `-b` flags bypass shell interpretation entirely.
|
||||
|
||||
**Rules of thumb:**
|
||||
|
||||
- Single-line, no nested quotes -> regular `eval 'expression'` with single quotes is fine
|
||||
- Nested quotes, arrow functions, template literals, or multiline -> use `eval --stdin <<'EVALEOF'`
|
||||
- Programmatic/generated scripts -> use `eval -b` with base64
|
||||
|
||||
## Configuration File
|
||||
|
||||
Create `agent-browser.json` in the project root for persistent settings:
|
||||
|
||||
```json
|
||||
{
|
||||
"success": true,
|
||||
"data": {
|
||||
"refs": {
|
||||
"e1": {"name": "Submit", "role": "button"},
|
||||
"e2": {"name": "Email", "role": "textbox"}
|
||||
},
|
||||
"snapshot": "- button \"Submit\" [ref=e1]\n- textbox \"Email\" [ref=e2]"
|
||||
}
|
||||
"headed": true,
|
||||
"proxy": "http://localhost:8080",
|
||||
"profile": "./browser-data"
|
||||
}
|
||||
```
|
||||
|
||||
## Inspection & Debugging
|
||||
Priority (lowest to highest): `~/.agent-browser/config.json` < `./agent-browser.json` < env vars < CLI flags. Use `--config <path>` or `AGENT_BROWSER_CONFIG` env var for a custom config file (exits with error if missing/invalid). All CLI options map to camelCase keys (e.g., `--executable-path` -> `"executablePath"`). Boolean flags accept `true`/`false` values (e.g., `--headed false` overrides config). Extensions from user and project configs are merged, not replaced.
|
||||
|
||||
### JavaScript Evaluation
|
||||
## Browser Engine Selection
|
||||
|
||||
Use `--engine` to choose a local browser engine. The default is `chrome`.
|
||||
|
||||
```bash
|
||||
agent-browser eval "document.title" # Evaluate JS expression
|
||||
agent-browser eval "JSON.stringify(localStorage)" # Return serialized data
|
||||
agent-browser eval "document.querySelectorAll('a').length" # Count elements
|
||||
# Use Lightpanda (fast headless browser, requires separate install)
|
||||
agent-browser --engine lightpanda open example.com
|
||||
|
||||
# Via environment variable
|
||||
export AGENT_BROWSER_ENGINE=lightpanda
|
||||
agent-browser open example.com
|
||||
|
||||
# With custom binary path
|
||||
agent-browser --engine lightpanda --executable-path /path/to/lightpanda open example.com
|
||||
```
|
||||
|
||||
### Console & Errors
|
||||
Supported engines:
|
||||
- `chrome` (default) -- Chrome/Chromium via CDP
|
||||
- `lightpanda` -- Lightpanda headless browser via CDP (10x faster, 10x less memory than Chrome)
|
||||
|
||||
Lightpanda does not support `--extension`, `--profile`, `--state`, or `--allow-file-access`. Install Lightpanda from https://lightpanda.io/docs/open-source/installation.
|
||||
|
||||
## Deep-Dive Documentation
|
||||
|
||||
| Reference | When to Use |
|
||||
| -------------------------------------------------------------------- | --------------------------------------------------------- |
|
||||
| [references/commands.md](references/commands.md) | Full command reference with all options |
|
||||
| [references/snapshot-refs.md](references/snapshot-refs.md) | Ref lifecycle, invalidation rules, troubleshooting |
|
||||
| [references/session-management.md](references/session-management.md) | Parallel sessions, state persistence, concurrent scraping |
|
||||
| [references/authentication.md](references/authentication.md) | Login flows, OAuth, 2FA handling, state reuse |
|
||||
| [references/video-recording.md](references/video-recording.md) | Recording workflows for debugging and documentation |
|
||||
| [references/profiling.md](references/profiling.md) | Chrome DevTools profiling for performance analysis |
|
||||
| [references/proxy-support.md](references/proxy-support.md) | Proxy configuration, geo-testing, rotating proxies |
|
||||
|
||||
## Ready-to-Use Templates
|
||||
|
||||
| Template | Description |
|
||||
| ------------------------------------------------------------------------ | ----------------------------------- |
|
||||
| [templates/form-automation.sh](templates/form-automation.sh) | Form filling with validation |
|
||||
| [templates/authenticated-session.sh](templates/authenticated-session.sh) | Login once, reuse state |
|
||||
| [templates/capture-workflow.sh](templates/capture-workflow.sh) | Content extraction with screenshots |
|
||||
|
||||
```bash
|
||||
agent-browser console # Show browser console output
|
||||
agent-browser console --clear # Show and clear console
|
||||
agent-browser errors # Show JavaScript errors only
|
||||
agent-browser errors --clear # Show and clear errors
|
||||
```
|
||||
|
||||
## Network
|
||||
|
||||
```bash
|
||||
agent-browser network requests # List captured requests
|
||||
agent-browser network requests --filter "api" # Filter by URL pattern
|
||||
agent-browser route "**/*.png" abort # Block matching requests
|
||||
agent-browser route "https://api.example.com/*" fulfill --status 200 --body '{"mock":true}' # Mock response
|
||||
agent-browser unroute "**/*.png" # Remove route handler
|
||||
```
|
||||
|
||||
## Storage
|
||||
|
||||
### Cookies
|
||||
|
||||
```bash
|
||||
agent-browser cookies get # Get all cookies
|
||||
agent-browser cookies get --name "session" # Get specific cookie
|
||||
agent-browser cookies set --name "token" --value "abc" # Set cookie
|
||||
agent-browser cookies clear # Clear all cookies
|
||||
```
|
||||
|
||||
### Local & Session Storage
|
||||
|
||||
```bash
|
||||
agent-browser storage local # Get all localStorage
|
||||
agent-browser storage local --key "theme" # Get specific key
|
||||
agent-browser storage session # Get all sessionStorage
|
||||
agent-browser storage session --key "cart" # Get specific key
|
||||
```
|
||||
|
||||
## Device & Settings
|
||||
|
||||
```bash
|
||||
agent-browser set viewport 1920 1080 # Set viewport size
|
||||
agent-browser set device "iPhone 14" # Emulate device
|
||||
agent-browser set geo --lat 47.6 --lon -122.3 # Set geolocation
|
||||
agent-browser set offline true # Enable offline mode
|
||||
agent-browser set offline false # Disable offline mode
|
||||
agent-browser set media "prefers-color-scheme" "dark" # Set media feature
|
||||
agent-browser set headers '{"X-Custom":"value"}' # Set extra HTTP headers
|
||||
agent-browser set credentials "user" "pass" # Set HTTP auth credentials
|
||||
```
|
||||
|
||||
## Element Debugging
|
||||
|
||||
```bash
|
||||
agent-browser highlight @e1 # Highlight element visually
|
||||
agent-browser get box @e1 # Get bounding box (x, y, width, height)
|
||||
agent-browser get styles @e1 # Get computed styles
|
||||
agent-browser is visible @e1 # Check if element is visible
|
||||
agent-browser is enabled @e1 # Check if element is enabled
|
||||
agent-browser is checked @e1 # Check if checkbox/radio is checked
|
||||
```
|
||||
|
||||
## Recording & Tracing
|
||||
|
||||
```bash
|
||||
agent-browser trace start # Start recording trace
|
||||
agent-browser trace stop trace.zip # Stop and save trace file
|
||||
agent-browser record start # Start recording video
|
||||
agent-browser record stop video.webm # Stop and save recording
|
||||
```
|
||||
|
||||
## Tabs & Windows
|
||||
|
||||
```bash
|
||||
agent-browser tab list # List open tabs
|
||||
agent-browser tab new https://example.com # Open URL in new tab
|
||||
agent-browser tab close # Close current tab
|
||||
agent-browser tab 2 # Switch to tab by index
|
||||
```
|
||||
|
||||
## Advanced Mouse
|
||||
|
||||
```bash
|
||||
agent-browser mouse move 100 200 # Move mouse to coordinates
|
||||
agent-browser mouse down # Press mouse button
|
||||
agent-browser mouse up # Release mouse button
|
||||
agent-browser mouse wheel 0 500 # Scroll (deltaX, deltaY)
|
||||
agent-browser drag @e1 @e2 # Drag from element to element
|
||||
./templates/form-automation.sh https://example.com/form
|
||||
./templates/authenticated-session.sh https://app.example.com/login
|
||||
./templates/capture-workflow.sh https://example.com ./output
|
||||
```
|
||||
|
||||
## vs Playwright MCP
|
||||
|
||||
Reference in New Issue
Block a user