feat: sync agent-browser skill with upstream vercel-labs/agent-browser

Update SKILL.md to match the latest upstream skill from
vercel-labs/agent-browser, adding substantial new capabilities:

- Authentication (auth vault, profiles, session persistence, state files)
- Command chaining, annotated screenshots, diffing
- Security features (content boundaries, domain allowlist, action policy)
- iOS Simulator support, Lightpanda engine, downloads, clipboard
- JS eval improvements (--stdin, -b for shell safety)
- Timeout guidance, config files, session cleanup

Add 7 reference docs (commands, authentication, snapshot-refs,
session-management, video-recording, profiling, proxy-support) and
3 ready-to-use shell templates.

Kept our YAML frontmatter, setup check section, and Playwright MCP
comparison table which are unique to our plugin context.
This commit is contained in:
Trevin Chow
2026-03-14 20:08:27 -07:00
parent 7c04c3158f
commit 24860ec3f1
11 changed files with 2260 additions and 240 deletions

View File

@@ -0,0 +1,303 @@
# Authentication Patterns
Login flows, session persistence, OAuth, 2FA, and authenticated browsing.
**Related**: [commands.md](commands.md) for full command reference, [SKILL.md](../SKILL.md) for quick start.
## Contents
- [Import Auth from Your Browser](#import-auth-from-your-browser)
- [Persistent Profiles](#persistent-profiles)
- [Session Persistence](#session-persistence)
- [Basic Login Flow](#basic-login-flow)
- [Saving Authentication State](#saving-authentication-state)
- [Restoring Authentication](#restoring-authentication)
- [OAuth / SSO Flows](#oauth--sso-flows)
- [Two-Factor Authentication](#two-factor-authentication)
- [HTTP Basic Auth](#http-basic-auth)
- [Cookie-Based Auth](#cookie-based-auth)
- [Token Refresh Handling](#token-refresh-handling)
- [Security Best Practices](#security-best-practices)
## Import Auth from Your Browser
The fastest way to authenticate is to reuse cookies from a Chrome session you are already logged into.
**Step 1: Start Chrome with remote debugging**
```bash
# macOS
"/Applications/Google Chrome.app/Contents/MacOS/Google Chrome" --remote-debugging-port=9222
# Linux
google-chrome --remote-debugging-port=9222
# Windows
"C:\Program Files\Google\Chrome\Application\chrome.exe" --remote-debugging-port=9222
```
Log in to your target site(s) in this Chrome window as you normally would.
> **Security note:** `--remote-debugging-port` exposes full browser control on localhost. Any local process can connect and read cookies, execute JS, etc. Only use on trusted machines and close Chrome when done.
**Step 2: Grab the auth state**
```bash
# Auto-discover the running Chrome and save its cookies + localStorage
agent-browser --auto-connect state save ./my-auth.json
```
**Step 3: Reuse in automation**
```bash
# Load auth at launch
agent-browser --state ./my-auth.json open https://app.example.com/dashboard
# Or load into an existing session
agent-browser state load ./my-auth.json
agent-browser open https://app.example.com/dashboard
```
This works for any site, including those with complex OAuth flows, SSO, or 2FA -- as long as Chrome already has valid session cookies.
> **Security note:** State files contain session tokens in plaintext. Add them to `.gitignore`, delete when no longer needed, and set `AGENT_BROWSER_ENCRYPTION_KEY` for encryption at rest. See [Security Best Practices](#security-best-practices).
**Tip:** Combine with `--session-name` so the imported auth auto-persists across restarts:
```bash
agent-browser --session-name myapp state load ./my-auth.json
# From now on, state is auto-saved/restored for "myapp"
```
## Persistent Profiles
Use `--profile` to point agent-browser at a Chrome user data directory. This persists everything (cookies, IndexedDB, service workers, cache) across browser restarts without explicit save/load:
```bash
# First run: login once
agent-browser --profile ~/.myapp-profile open https://app.example.com/login
# ... complete login flow ...
# All subsequent runs: already authenticated
agent-browser --profile ~/.myapp-profile open https://app.example.com/dashboard
```
Use different paths for different projects or test users:
```bash
agent-browser --profile ~/.profiles/admin open https://app.example.com
agent-browser --profile ~/.profiles/viewer open https://app.example.com
```
Or set via environment variable:
```bash
export AGENT_BROWSER_PROFILE=~/.myapp-profile
agent-browser open https://app.example.com/dashboard
```
## Session Persistence
Use `--session-name` to auto-save and restore cookies + localStorage by name, without managing files:
```bash
# Auto-saves state on close, auto-restores on next launch
agent-browser --session-name twitter open https://twitter.com
# ... login flow ...
agent-browser close # state saved to ~/.agent-browser/sessions/
# Next time: state is automatically restored
agent-browser --session-name twitter open https://twitter.com
```
Encrypt state at rest:
```bash
export AGENT_BROWSER_ENCRYPTION_KEY=$(openssl rand -hex 32)
agent-browser --session-name secure open https://app.example.com
```
## Basic Login Flow
```bash
# Navigate to login page
agent-browser open https://app.example.com/login
agent-browser wait --load networkidle
# Get form elements
agent-browser snapshot -i
# Output: @e1 [input type="email"], @e2 [input type="password"], @e3 [button] "Sign In"
# Fill credentials
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
# Submit
agent-browser click @e3
agent-browser wait --load networkidle
# Verify login succeeded
agent-browser get url # Should be dashboard, not login
```
## Saving Authentication State
After logging in, save state for reuse:
```bash
# Login first (see above)
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
# Save authenticated state
agent-browser state save ./auth-state.json
```
## Restoring Authentication
Skip login by loading saved state:
```bash
# Load saved auth state
agent-browser state load ./auth-state.json
# Navigate directly to protected page
agent-browser open https://app.example.com/dashboard
# Verify authenticated
agent-browser snapshot -i
```
## OAuth / SSO Flows
For OAuth redirects:
```bash
# Start OAuth flow
agent-browser open https://app.example.com/auth/google
# Handle redirects automatically
agent-browser wait --url "**/accounts.google.com**"
agent-browser snapshot -i
# Fill Google credentials
agent-browser fill @e1 "user@gmail.com"
agent-browser click @e2 # Next button
agent-browser wait 2000
agent-browser snapshot -i
agent-browser fill @e3 "password"
agent-browser click @e4 # Sign in
# Wait for redirect back
agent-browser wait --url "**/app.example.com**"
agent-browser state save ./oauth-state.json
```
## Two-Factor Authentication
Handle 2FA with manual intervention:
```bash
# Login with credentials
agent-browser open https://app.example.com/login --headed # Show browser
agent-browser snapshot -i
agent-browser fill @e1 "user@example.com"
agent-browser fill @e2 "password123"
agent-browser click @e3
# Wait for user to complete 2FA manually
echo "Complete 2FA in the browser window..."
agent-browser wait --url "**/dashboard" --timeout 120000
# Save state after 2FA
agent-browser state save ./2fa-state.json
```
## HTTP Basic Auth
For sites using HTTP Basic Authentication:
```bash
# Set credentials before navigation
agent-browser set credentials username password
# Navigate to protected resource
agent-browser open https://protected.example.com/api
```
## Cookie-Based Auth
Manually set authentication cookies:
```bash
# Set auth cookie
agent-browser cookies set session_token "abc123xyz"
# Navigate to protected page
agent-browser open https://app.example.com/dashboard
```
## Token Refresh Handling
For sessions with expiring tokens:
```bash
#!/bin/bash
# Wrapper that handles token refresh
STATE_FILE="./auth-state.json"
# Try loading existing state
if [[ -f "$STATE_FILE" ]]; then
agent-browser state load "$STATE_FILE"
agent-browser open https://app.example.com/dashboard
# Check if session is still valid
URL=$(agent-browser get url)
if [[ "$URL" == *"/login"* ]]; then
echo "Session expired, re-authenticating..."
# Perform fresh login
agent-browser snapshot -i
agent-browser fill @e1 "$USERNAME"
agent-browser fill @e2 "$PASSWORD"
agent-browser click @e3
agent-browser wait --url "**/dashboard"
agent-browser state save "$STATE_FILE"
fi
else
# First-time login
agent-browser open https://app.example.com/login
# ... login flow ...
fi
```
## Security Best Practices
1. **Never commit state files** - They contain session tokens
```bash
echo "*.auth-state.json" >> .gitignore
```
2. **Use environment variables for credentials**
```bash
agent-browser fill @e1 "$APP_USERNAME"
agent-browser fill @e2 "$APP_PASSWORD"
```
3. **Clean up after automation**
```bash
agent-browser cookies clear
rm -f ./auth-state.json
```
4. **Use short-lived sessions for CI/CD**
```bash
# Don't persist state in CI
agent-browser open https://app.example.com/login
# ... login and perform actions ...
agent-browser close # Session ends, nothing persisted
```

View File

@@ -0,0 +1,266 @@
# Command Reference
Complete reference for all agent-browser commands. For quick start and common patterns, see SKILL.md.
## Navigation
```bash
agent-browser open <url> # Navigate to URL (aliases: goto, navigate)
# Supports: https://, http://, file://, about:, data://
# Auto-prepends https:// if no protocol given
agent-browser back # Go back
agent-browser forward # Go forward
agent-browser reload # Reload page
agent-browser close # Close browser (aliases: quit, exit)
agent-browser connect 9222 # Connect to browser via CDP port
```
## Snapshot (page analysis)
```bash
agent-browser snapshot # Full accessibility tree
agent-browser snapshot -i # Interactive elements only (recommended)
agent-browser snapshot -c # Compact output
agent-browser snapshot -d 3 # Limit depth to 3
agent-browser snapshot -s "#main" # Scope to CSS selector
```
## Interactions (use @refs from snapshot)
```bash
agent-browser click @e1 # Click
agent-browser click @e1 --new-tab # Click and open in new tab
agent-browser dblclick @e1 # Double-click
agent-browser focus @e1 # Focus element
agent-browser fill @e2 "text" # Clear and type
agent-browser type @e2 "text" # Type without clearing
agent-browser press Enter # Press key (alias: key)
agent-browser press Control+a # Key combination
agent-browser keydown Shift # Hold key down
agent-browser keyup Shift # Release key
agent-browser hover @e1 # Hover
agent-browser check @e1 # Check checkbox
agent-browser uncheck @e1 # Uncheck checkbox
agent-browser select @e1 "value" # Select dropdown option
agent-browser select @e1 "a" "b" # Select multiple options
agent-browser scroll down 500 # Scroll page (default: down 300px)
agent-browser scrollintoview @e1 # Scroll element into view (alias: scrollinto)
agent-browser drag @e1 @e2 # Drag and drop
agent-browser upload @e1 file.pdf # Upload files
```
## Get Information
```bash
agent-browser get text @e1 # Get element text
agent-browser get html @e1 # Get innerHTML
agent-browser get value @e1 # Get input value
agent-browser get attr @e1 href # Get attribute
agent-browser get title # Get page title
agent-browser get url # Get current URL
agent-browser get cdp-url # Get CDP WebSocket URL
agent-browser get count ".item" # Count matching elements
agent-browser get box @e1 # Get bounding box
agent-browser get styles @e1 # Get computed styles (font, color, bg, etc.)
```
## Check State
```bash
agent-browser is visible @e1 # Check if visible
agent-browser is enabled @e1 # Check if enabled
agent-browser is checked @e1 # Check if checked
```
## Screenshots and PDF
```bash
agent-browser screenshot # Save to temporary directory
agent-browser screenshot path.png # Save to specific path
agent-browser screenshot --full # Full page
agent-browser pdf output.pdf # Save as PDF
```
## Video Recording
```bash
agent-browser record start ./demo.webm # Start recording
agent-browser click @e1 # Perform actions
agent-browser record stop # Stop and save video
agent-browser record restart ./take2.webm # Stop current + start new
```
## Wait
```bash
agent-browser wait @e1 # Wait for element
agent-browser wait 2000 # Wait milliseconds
agent-browser wait --text "Success" # Wait for text (or -t)
agent-browser wait --url "**/dashboard" # Wait for URL pattern (or -u)
agent-browser wait --load networkidle # Wait for network idle (or -l)
agent-browser wait --fn "window.ready" # Wait for JS condition (or -f)
```
## Mouse Control
```bash
agent-browser mouse move 100 200 # Move mouse
agent-browser mouse down left # Press button
agent-browser mouse up left # Release button
agent-browser mouse wheel 100 # Scroll wheel
```
## Semantic Locators (alternative to refs)
```bash
agent-browser find role button click --name "Submit"
agent-browser find text "Sign In" click
agent-browser find text "Sign In" click --exact # Exact match only
agent-browser find label "Email" fill "user@test.com"
agent-browser find placeholder "Search" type "query"
agent-browser find alt "Logo" click
agent-browser find title "Close" click
agent-browser find testid "submit-btn" click
agent-browser find first ".item" click
agent-browser find last ".item" click
agent-browser find nth 2 "a" hover
```
## Browser Settings
```bash
agent-browser set viewport 1920 1080 # Set viewport size
agent-browser set viewport 1920 1080 2 # 2x retina (same CSS size, higher res screenshots)
agent-browser set device "iPhone 14" # Emulate device
agent-browser set geo 37.7749 -122.4194 # Set geolocation (alias: geolocation)
agent-browser set offline on # Toggle offline mode
agent-browser set headers '{"X-Key":"v"}' # Extra HTTP headers
agent-browser set credentials user pass # HTTP basic auth (alias: auth)
agent-browser set media dark # Emulate color scheme
agent-browser set media light reduced-motion # Light mode + reduced motion
```
## Cookies and Storage
```bash
agent-browser cookies # Get all cookies
agent-browser cookies set name value # Set cookie
agent-browser cookies clear # Clear cookies
agent-browser storage local # Get all localStorage
agent-browser storage local key # Get specific key
agent-browser storage local set k v # Set value
agent-browser storage local clear # Clear all
```
## Network
```bash
agent-browser network route <url> # Intercept requests
agent-browser network route <url> --abort # Block requests
agent-browser network route <url> --body '{}' # Mock response
agent-browser network unroute [url] # Remove routes
agent-browser network requests # View tracked requests
agent-browser network requests --filter api # Filter requests
```
## Tabs and Windows
```bash
agent-browser tab # List tabs
agent-browser tab new [url] # New tab
agent-browser tab 2 # Switch to tab by index
agent-browser tab close # Close current tab
agent-browser tab close 2 # Close tab by index
agent-browser window new # New window
```
## Frames
```bash
agent-browser frame "#iframe" # Switch to iframe
agent-browser frame main # Back to main frame
```
## Dialogs
```bash
agent-browser dialog accept [text] # Accept dialog
agent-browser dialog dismiss # Dismiss dialog
```
## JavaScript
```bash
agent-browser eval "document.title" # Simple expressions only
agent-browser eval -b "<base64>" # Any JavaScript (base64 encoded)
agent-browser eval --stdin # Read script from stdin
```
Use `-b`/`--base64` or `--stdin` for reliable execution. Shell escaping with nested quotes and special characters is error-prone.
```bash
# Base64 encode your script, then:
agent-browser eval -b "ZG9jdW1lbnQucXVlcnlTZWxlY3RvcignW3NyYyo9Il9uZXh0Il0nKQ=="
# Or use stdin with heredoc for multiline scripts:
cat <<'EOF' | agent-browser eval --stdin
const links = document.querySelectorAll('a');
Array.from(links).map(a => a.href);
EOF
```
## State Management
```bash
agent-browser state save auth.json # Save cookies, storage, auth state
agent-browser state load auth.json # Restore saved state
```
## Global Options
```bash
agent-browser --session <name> ... # Isolated browser session
agent-browser --json ... # JSON output for parsing
agent-browser --headed ... # Show browser window (not headless)
agent-browser --full ... # Full page screenshot (-f)
agent-browser --cdp <port> ... # Connect via Chrome DevTools Protocol
agent-browser -p <provider> ... # Cloud browser provider (--provider)
agent-browser --proxy <url> ... # Use proxy server
agent-browser --proxy-bypass <hosts> # Hosts to bypass proxy
agent-browser --headers <json> ... # HTTP headers scoped to URL's origin
agent-browser --executable-path <p> # Custom browser executable
agent-browser --extension <path> ... # Load browser extension (repeatable)
agent-browser --ignore-https-errors # Ignore SSL certificate errors
agent-browser --help # Show help (-h)
agent-browser --version # Show version (-V)
agent-browser <command> --help # Show detailed help for a command
```
## Debugging
```bash
agent-browser --headed open example.com # Show browser window
agent-browser --cdp 9222 snapshot # Connect via CDP port
agent-browser connect 9222 # Alternative: connect command
agent-browser console # View console messages
agent-browser console --clear # Clear console
agent-browser errors # View page errors
agent-browser errors --clear # Clear errors
agent-browser highlight @e1 # Highlight element
agent-browser inspect # Open Chrome DevTools for this session
agent-browser trace start # Start recording trace
agent-browser trace stop trace.zip # Stop and save trace
agent-browser profiler start # Start Chrome DevTools profiling
agent-browser profiler stop trace.json # Stop and save profile
```
## Environment Variables
```bash
AGENT_BROWSER_SESSION="mysession" # Default session name
AGENT_BROWSER_EXECUTABLE_PATH="/path/chrome" # Custom browser path
AGENT_BROWSER_EXTENSIONS="/ext1,/ext2" # Comma-separated extension paths
AGENT_BROWSER_PROVIDER="browserbase" # Cloud browser provider
AGENT_BROWSER_STREAM_PORT="9223" # WebSocket streaming port
AGENT_BROWSER_HOME="/path/to/agent-browser" # Custom install location
```

View File

@@ -0,0 +1,120 @@
# Profiling
Capture Chrome DevTools performance profiles during browser automation for performance analysis.
**Related**: [commands.md](commands.md) for full command reference, [SKILL.md](../SKILL.md) for quick start.
## Contents
- [Basic Profiling](#basic-profiling)
- [Profiler Commands](#profiler-commands)
- [Categories](#categories)
- [Use Cases](#use-cases)
- [Output Format](#output-format)
- [Viewing Profiles](#viewing-profiles)
- [Limitations](#limitations)
## Basic Profiling
```bash
# Start profiling
agent-browser profiler start
# Perform actions
agent-browser navigate https://example.com
agent-browser click "#button"
agent-browser wait 1000
# Stop and save
agent-browser profiler stop ./trace.json
```
## Profiler Commands
```bash
# Start profiling with default categories
agent-browser profiler start
# Start with custom trace categories
agent-browser profiler start --categories "devtools.timeline,v8.execute,blink.user_timing"
# Stop profiling and save to file
agent-browser profiler stop ./trace.json
```
## Categories
The `--categories` flag accepts a comma-separated list of Chrome trace categories. Default categories include:
- `devtools.timeline` -- standard DevTools performance traces
- `v8.execute` -- time spent running JavaScript
- `blink` -- renderer events
- `blink.user_timing` -- `performance.mark()` / `performance.measure()` calls
- `latencyInfo` -- input-to-latency tracking
- `renderer.scheduler` -- task scheduling and execution
- `toplevel` -- broad-spectrum basic events
Several `disabled-by-default-*` categories are also included for detailed timeline, call stack, and V8 CPU profiling data.
## Use Cases
### Diagnosing Slow Page Loads
```bash
agent-browser profiler start
agent-browser navigate https://app.example.com
agent-browser wait --load networkidle
agent-browser profiler stop ./page-load-profile.json
```
### Profiling User Interactions
```bash
agent-browser navigate https://app.example.com
agent-browser profiler start
agent-browser click "#submit"
agent-browser wait 2000
agent-browser profiler stop ./interaction-profile.json
```
### CI Performance Regression Checks
```bash
#!/bin/bash
agent-browser profiler start
agent-browser navigate https://app.example.com
agent-browser wait --load networkidle
agent-browser profiler stop "./profiles/build-${BUILD_ID}.json"
```
## Output Format
The output is a JSON file in Chrome Trace Event format:
```json
{
"traceEvents": [
{ "cat": "devtools.timeline", "name": "RunTask", "ph": "X", "ts": 12345, "dur": 100 },
...
],
"metadata": {
"clock-domain": "LINUX_CLOCK_MONOTONIC"
}
}
```
The `metadata.clock-domain` field is set based on the host platform (Linux or macOS). On Windows it is omitted.
## Viewing Profiles
Load the output JSON file in any of these tools:
- **Chrome DevTools**: Performance panel > Load profile (Ctrl+Shift+I > Performance)
- **Perfetto UI**: https://ui.perfetto.dev/ -- drag and drop the JSON file
- **Trace Viewer**: `chrome://tracing` in any Chromium browser
## Limitations
- Only works with Chromium-based browsers (Chrome, Edge). Not supported on Firefox or WebKit.
- Trace data accumulates in memory while profiling is active (capped at 5 million events). Stop profiling promptly after the area of interest.
- Data collection on stop has a 30-second timeout. If the browser is unresponsive, the stop command may fail.

View File

@@ -0,0 +1,194 @@
# Proxy Support
Proxy configuration for geo-testing, rate limiting avoidance, and corporate environments.
**Related**: [commands.md](commands.md) for global options, [SKILL.md](../SKILL.md) for quick start.
## Contents
- [Basic Proxy Configuration](#basic-proxy-configuration)
- [Authenticated Proxy](#authenticated-proxy)
- [SOCKS Proxy](#socks-proxy)
- [Proxy Bypass](#proxy-bypass)
- [Common Use Cases](#common-use-cases)
- [Verifying Proxy Connection](#verifying-proxy-connection)
- [Troubleshooting](#troubleshooting)
- [Best Practices](#best-practices)
## Basic Proxy Configuration
Use the `--proxy` flag or set proxy via environment variable:
```bash
# Via CLI flag
agent-browser --proxy "http://proxy.example.com:8080" open https://example.com
# Via environment variable
export HTTP_PROXY="http://proxy.example.com:8080"
agent-browser open https://example.com
# HTTPS proxy
export HTTPS_PROXY="https://proxy.example.com:8080"
agent-browser open https://example.com
# Both
export HTTP_PROXY="http://proxy.example.com:8080"
export HTTPS_PROXY="http://proxy.example.com:8080"
agent-browser open https://example.com
```
## Authenticated Proxy
For proxies requiring authentication:
```bash
# Include credentials in URL
export HTTP_PROXY="http://username:password@proxy.example.com:8080"
agent-browser open https://example.com
```
## SOCKS Proxy
```bash
# SOCKS5 proxy
export ALL_PROXY="socks5://proxy.example.com:1080"
agent-browser open https://example.com
# SOCKS5 with auth
export ALL_PROXY="socks5://user:pass@proxy.example.com:1080"
agent-browser open https://example.com
```
## Proxy Bypass
Skip proxy for specific domains using `--proxy-bypass` or `NO_PROXY`:
```bash
# Via CLI flag
agent-browser --proxy "http://proxy.example.com:8080" --proxy-bypass "localhost,*.internal.com" open https://example.com
# Via environment variable
export NO_PROXY="localhost,127.0.0.1,.internal.company.com"
agent-browser open https://internal.company.com # Direct connection
agent-browser open https://external.com # Via proxy
```
## Common Use Cases
### Geo-Location Testing
```bash
#!/bin/bash
# Test site from different regions using geo-located proxies
PROXIES=(
"http://us-proxy.example.com:8080"
"http://eu-proxy.example.com:8080"
"http://asia-proxy.example.com:8080"
)
for proxy in "${PROXIES[@]}"; do
export HTTP_PROXY="$proxy"
export HTTPS_PROXY="$proxy"
region=$(echo "$proxy" | grep -oP '^\w+-\w+')
echo "Testing from: $region"
agent-browser --session "$region" open https://example.com
agent-browser --session "$region" screenshot "./screenshots/$region.png"
agent-browser --session "$region" close
done
```
### Rotating Proxies for Scraping
```bash
#!/bin/bash
# Rotate through proxy list to avoid rate limiting
PROXY_LIST=(
"http://proxy1.example.com:8080"
"http://proxy2.example.com:8080"
"http://proxy3.example.com:8080"
)
URLS=(
"https://site.com/page1"
"https://site.com/page2"
"https://site.com/page3"
)
for i in "${!URLS[@]}"; do
proxy_index=$((i % ${#PROXY_LIST[@]}))
export HTTP_PROXY="${PROXY_LIST[$proxy_index]}"
export HTTPS_PROXY="${PROXY_LIST[$proxy_index]}"
agent-browser open "${URLS[$i]}"
agent-browser get text body > "output-$i.txt"
agent-browser close
sleep 1 # Polite delay
done
```
### Corporate Network Access
```bash
#!/bin/bash
# Access internal sites via corporate proxy
export HTTP_PROXY="http://corpproxy.company.com:8080"
export HTTPS_PROXY="http://corpproxy.company.com:8080"
export NO_PROXY="localhost,127.0.0.1,.company.com"
# External sites go through proxy
agent-browser open https://external-vendor.com
# Internal sites bypass proxy
agent-browser open https://intranet.company.com
```
## Verifying Proxy Connection
```bash
# Check your apparent IP
agent-browser open https://httpbin.org/ip
agent-browser get text body
# Should show proxy's IP, not your real IP
```
## Troubleshooting
### Proxy Connection Failed
```bash
# Test proxy connectivity first
curl -x http://proxy.example.com:8080 https://httpbin.org/ip
# Check if proxy requires auth
export HTTP_PROXY="http://user:pass@proxy.example.com:8080"
```
### SSL/TLS Errors Through Proxy
Some proxies perform SSL inspection. If you encounter certificate errors:
```bash
# For testing only - not recommended for production
agent-browser open https://example.com --ignore-https-errors
```
### Slow Performance
```bash
# Use proxy only when necessary
export NO_PROXY="*.cdn.com,*.static.com" # Direct CDN access
```
## Best Practices
1. **Use environment variables** - Don't hardcode proxy credentials
2. **Set NO_PROXY appropriately** - Avoid routing local traffic through proxy
3. **Test proxy before automation** - Verify connectivity with simple requests
4. **Handle proxy failures gracefully** - Implement retry logic for unstable proxies
5. **Rotate proxies for large scraping jobs** - Distribute load and avoid bans

View File

@@ -0,0 +1,193 @@
# Session Management
Multiple isolated browser sessions with state persistence and concurrent browsing.
**Related**: [authentication.md](authentication.md) for login patterns, [SKILL.md](../SKILL.md) for quick start.
## Contents
- [Named Sessions](#named-sessions)
- [Session Isolation Properties](#session-isolation-properties)
- [Session State Persistence](#session-state-persistence)
- [Common Patterns](#common-patterns)
- [Default Session](#default-session)
- [Session Cleanup](#session-cleanup)
- [Best Practices](#best-practices)
## Named Sessions
Use `--session` flag to isolate browser contexts:
```bash
# Session 1: Authentication flow
agent-browser --session auth open https://app.example.com/login
# Session 2: Public browsing (separate cookies, storage)
agent-browser --session public open https://example.com
# Commands are isolated by session
agent-browser --session auth fill @e1 "user@example.com"
agent-browser --session public get text body
```
## Session Isolation Properties
Each session has independent:
- Cookies
- LocalStorage / SessionStorage
- IndexedDB
- Cache
- Browsing history
- Open tabs
## Session State Persistence
### Save Session State
```bash
# Save cookies, storage, and auth state
agent-browser state save /path/to/auth-state.json
```
### Load Session State
```bash
# Restore saved state
agent-browser state load /path/to/auth-state.json
# Continue with authenticated session
agent-browser open https://app.example.com/dashboard
```
### State File Contents
```json
{
"cookies": [...],
"localStorage": {...},
"sessionStorage": {...},
"origins": [...]
}
```
## Common Patterns
### Authenticated Session Reuse
```bash
#!/bin/bash
# Save login state once, reuse many times
STATE_FILE="/tmp/auth-state.json"
# Check if we have saved state
if [[ -f "$STATE_FILE" ]]; then
agent-browser state load "$STATE_FILE"
agent-browser open https://app.example.com/dashboard
else
# Perform login
agent-browser open https://app.example.com/login
agent-browser snapshot -i
agent-browser fill @e1 "$USERNAME"
agent-browser fill @e2 "$PASSWORD"
agent-browser click @e3
agent-browser wait --load networkidle
# Save for future use
agent-browser state save "$STATE_FILE"
fi
```
### Concurrent Scraping
```bash
#!/bin/bash
# Scrape multiple sites concurrently
# Start all sessions
agent-browser --session site1 open https://site1.com &
agent-browser --session site2 open https://site2.com &
agent-browser --session site3 open https://site3.com &
wait
# Extract from each
agent-browser --session site1 get text body > site1.txt
agent-browser --session site2 get text body > site2.txt
agent-browser --session site3 get text body > site3.txt
# Cleanup
agent-browser --session site1 close
agent-browser --session site2 close
agent-browser --session site3 close
```
### A/B Testing Sessions
```bash
# Test different user experiences
agent-browser --session variant-a open "https://app.com?variant=a"
agent-browser --session variant-b open "https://app.com?variant=b"
# Compare
agent-browser --session variant-a screenshot /tmp/variant-a.png
agent-browser --session variant-b screenshot /tmp/variant-b.png
```
## Default Session
When `--session` is omitted, commands use the default session:
```bash
# These use the same default session
agent-browser open https://example.com
agent-browser snapshot -i
agent-browser close # Closes default session
```
## Session Cleanup
```bash
# Close specific session
agent-browser --session auth close
# List active sessions
agent-browser session list
```
## Best Practices
### 1. Name Sessions Semantically
```bash
# GOOD: Clear purpose
agent-browser --session github-auth open https://github.com
agent-browser --session docs-scrape open https://docs.example.com
# AVOID: Generic names
agent-browser --session s1 open https://github.com
```
### 2. Always Clean Up
```bash
# Close sessions when done
agent-browser --session auth close
agent-browser --session scrape close
```
### 3. Handle State Files Securely
```bash
# Don't commit state files (contain auth tokens!)
echo "*.auth-state.json" >> .gitignore
# Delete after use
rm /tmp/auth-state.json
```
### 4. Timeout Long Sessions
```bash
# Set timeout for automated scripts
timeout 60 agent-browser --session long-task get text body
```

View File

@@ -0,0 +1,194 @@
# Snapshot and Refs
Compact element references that reduce context usage dramatically for AI agents.
**Related**: [commands.md](commands.md) for full command reference, [SKILL.md](../SKILL.md) for quick start.
## Contents
- [How Refs Work](#how-refs-work)
- [Snapshot Command](#the-snapshot-command)
- [Using Refs](#using-refs)
- [Ref Lifecycle](#ref-lifecycle)
- [Best Practices](#best-practices)
- [Ref Notation Details](#ref-notation-details)
- [Troubleshooting](#troubleshooting)
## How Refs Work
Traditional approach:
```
Full DOM/HTML -> AI parses -> CSS selector -> Action (~3000-5000 tokens)
```
agent-browser approach:
```
Compact snapshot -> @refs assigned -> Direct interaction (~200-400 tokens)
```
## The Snapshot Command
```bash
# Basic snapshot (shows page structure)
agent-browser snapshot
# Interactive snapshot (-i flag) - RECOMMENDED
agent-browser snapshot -i
```
### Snapshot Output Format
```
Page: Example Site - Home
URL: https://example.com
@e1 [header]
@e2 [nav]
@e3 [a] "Home"
@e4 [a] "Products"
@e5 [a] "About"
@e6 [button] "Sign In"
@e7 [main]
@e8 [h1] "Welcome"
@e9 [form]
@e10 [input type="email"] placeholder="Email"
@e11 [input type="password"] placeholder="Password"
@e12 [button type="submit"] "Log In"
@e13 [footer]
@e14 [a] "Privacy Policy"
```
## Using Refs
Once you have refs, interact directly:
```bash
# Click the "Sign In" button
agent-browser click @e6
# Fill email input
agent-browser fill @e10 "user@example.com"
# Fill password
agent-browser fill @e11 "password123"
# Submit the form
agent-browser click @e12
```
## Ref Lifecycle
**IMPORTANT**: Refs are invalidated when the page changes!
```bash
# Get initial snapshot
agent-browser snapshot -i
# @e1 [button] "Next"
# Click triggers page change
agent-browser click @e1
# MUST re-snapshot to get new refs!
agent-browser snapshot -i
# @e1 [h1] "Page 2" <- Different element now!
```
## Best Practices
### 1. Always Snapshot Before Interacting
```bash
# CORRECT
agent-browser open https://example.com
agent-browser snapshot -i # Get refs first
agent-browser click @e1 # Use ref
# WRONG
agent-browser open https://example.com
agent-browser click @e1 # Ref doesn't exist yet!
```
### 2. Re-Snapshot After Navigation
```bash
agent-browser click @e5 # Navigates to new page
agent-browser snapshot -i # Get new refs
agent-browser click @e1 # Use new refs
```
### 3. Re-Snapshot After Dynamic Changes
```bash
agent-browser click @e1 # Opens dropdown
agent-browser snapshot -i # See dropdown items
agent-browser click @e7 # Select item
```
### 4. Snapshot Specific Regions
For complex pages, snapshot specific areas:
```bash
# Snapshot just the form
agent-browser snapshot @e9
```
## Ref Notation Details
```
@e1 [tag type="value"] "text content" placeholder="hint"
| | | | |
| | | | +- Additional attributes
| | | +- Visible text
| | +- Key attributes shown
| +- HTML tag name
+- Unique ref ID
```
### Common Patterns
```
@e1 [button] "Submit" # Button with text
@e2 [input type="email"] # Email input
@e3 [input type="password"] # Password input
@e4 [a href="/page"] "Link Text" # Anchor link
@e5 [select] # Dropdown
@e6 [textarea] placeholder="Message" # Text area
@e7 [div class="modal"] # Container (when relevant)
@e8 [img alt="Logo"] # Image
@e9 [checkbox] checked # Checked checkbox
@e10 [radio] selected # Selected radio
```
## Troubleshooting
### "Ref not found" Error
```bash
# Ref may have changed - re-snapshot
agent-browser snapshot -i
```
### Element Not Visible in Snapshot
```bash
# Scroll down to reveal element
agent-browser scroll down 1000
agent-browser snapshot -i
# Or wait for dynamic content
agent-browser wait 1000
agent-browser snapshot -i
```
### Too Many Elements
```bash
# Snapshot specific container
agent-browser snapshot @e5
# Or use get text for content-only extraction
agent-browser get text @e5
```

View File

@@ -0,0 +1,173 @@
# Video Recording
Capture browser automation as video for debugging, documentation, or verification.
**Related**: [commands.md](commands.md) for full command reference, [SKILL.md](../SKILL.md) for quick start.
## Contents
- [Basic Recording](#basic-recording)
- [Recording Commands](#recording-commands)
- [Use Cases](#use-cases)
- [Best Practices](#best-practices)
- [Output Format](#output-format)
- [Limitations](#limitations)
## Basic Recording
```bash
# Start recording
agent-browser record start ./demo.webm
# Perform actions
agent-browser open https://example.com
agent-browser snapshot -i
agent-browser click @e1
agent-browser fill @e2 "test input"
# Stop and save
agent-browser record stop
```
## Recording Commands
```bash
# Start recording to file
agent-browser record start ./output.webm
# Stop current recording
agent-browser record stop
# Restart with new file (stops current + starts new)
agent-browser record restart ./take2.webm
```
## Use Cases
### Debugging Failed Automation
```bash
#!/bin/bash
# Record automation for debugging
agent-browser record start ./debug-$(date +%Y%m%d-%H%M%S).webm
# Run your automation
agent-browser open https://app.example.com
agent-browser snapshot -i
agent-browser click @e1 || {
echo "Click failed - check recording"
agent-browser record stop
exit 1
}
agent-browser record stop
```
### Documentation Generation
```bash
#!/bin/bash
# Record workflow for documentation
agent-browser record start ./docs/how-to-login.webm
agent-browser open https://app.example.com/login
agent-browser wait 1000 # Pause for visibility
agent-browser snapshot -i
agent-browser fill @e1 "demo@example.com"
agent-browser wait 500
agent-browser fill @e2 "password"
agent-browser wait 500
agent-browser click @e3
agent-browser wait --load networkidle
agent-browser wait 1000 # Show result
agent-browser record stop
```
### CI/CD Test Evidence
```bash
#!/bin/bash
# Record E2E test runs for CI artifacts
TEST_NAME="${1:-e2e-test}"
RECORDING_DIR="./test-recordings"
mkdir -p "$RECORDING_DIR"
agent-browser record start "$RECORDING_DIR/$TEST_NAME-$(date +%s).webm"
# Run test
if run_e2e_test; then
echo "Test passed"
else
echo "Test failed - recording saved"
fi
agent-browser record stop
```
## Best Practices
### 1. Add Pauses for Clarity
```bash
# Slow down for human viewing
agent-browser click @e1
agent-browser wait 500 # Let viewer see result
```
### 2. Use Descriptive Filenames
```bash
# Include context in filename
agent-browser record start ./recordings/login-flow-2024-01-15.webm
agent-browser record start ./recordings/checkout-test-run-42.webm
```
### 3. Handle Recording in Error Cases
```bash
#!/bin/bash
set -e
cleanup() {
agent-browser record stop 2>/dev/null || true
agent-browser close 2>/dev/null || true
}
trap cleanup EXIT
agent-browser record start ./automation.webm
# ... automation steps ...
```
### 4. Combine with Screenshots
```bash
# Record video AND capture key frames
agent-browser record start ./flow.webm
agent-browser open https://example.com
agent-browser screenshot ./screenshots/step1-homepage.png
agent-browser click @e1
agent-browser screenshot ./screenshots/step2-after-click.png
agent-browser record stop
```
## Output Format
- Default format: WebM (VP8/VP9 codec)
- Compatible with all modern browsers and video players
- Compressed but high quality
## Limitations
- Recording adds slight overhead to automation
- Large recordings can consume significant disk space
- Some headless environments may have codec limitations