# Snapshot and Refs

Compact element references that dramatically reduce context usage for AI agents.

**Related**: [commands.md](commands.md) for full command reference, [SKILL.md](../SKILL.md) for quick start.

## Contents

- [How Refs Work](#how-refs-work)
- [Snapshot Command](#the-snapshot-command)
- [Using Refs](#using-refs)
- [Ref Lifecycle](#ref-lifecycle)
- [Best Practices](#best-practices)
- [Ref Notation Details](#ref-notation-details)
- [Troubleshooting](#troubleshooting)

## How Refs Work

Traditional approach:
```
Full DOM/HTML -> AI parses -> CSS selector -> Action (~3000-5000 tokens)
```

agent-browser approach:
```
Compact snapshot -> @refs assigned -> Direct interaction (~200-400 tokens)
```

## The Snapshot Command

```bash
# Basic snapshot (shows page structure)
agent-browser snapshot

# Interactive snapshot (-i flag) - RECOMMENDED
agent-browser snapshot -i
```

### Snapshot Output Format

```
Page: Example Site - Home
URL: https://example.com

@e1 [header]
  @e2 [nav]
    @e3 [a] "Home"
    @e4 [a] "Products"
    @e5 [a] "About"
  @e6 [button] "Sign In"

@e7 [main]
  @e8 [h1] "Welcome"
  @e9 [form]
    @e10 [input type="email"] placeholder="Email"
    @e11 [input type="password"] placeholder="Password"
    @e12 [button type="submit"] "Log In"

@e13 [footer]
  @e14 [a] "Privacy Policy"
```
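
Because every snapshot line starts with its ref, a script can recover a ref from an element's visible text instead of hard-coding it. The sketch below assumes snapshot lines look like the sample above (`@eN [tag ...] "text"`, optionally indented). `ref_for_text` is a hypothetical helper, not part of agent-browser, and real snapshot output may be laid out differently.

```bash
#!/usr/bin/env bash
# ref_for_text TEXT: read snapshot output on stdin and print the ref of the
# first element whose quoted visible text equals TEXT.
# Hypothetical helper; assumes the '@eN [tag ...] "text"' layout shown above.
ref_for_text() {
  awk -v want="\"$1\"" '
    {
      line = $0
      sub(/^[ \t]+/, "", line)               # tolerate indented (nested) lines
      if (line !~ /^@e[0-9]+ /) next         # skip "Page:", "URL:" and blank lines
      if (index(line, "] " want) == 0) next  # visible text follows the closing bracket
      split(line, parts, " ")
      print parts[1]                         # the ref, e.g. @e6
      exit
    }'
}

# Demo on two lines copied from the sample output.
printf '%s\n' '@e5 [a] "About"' '@e6 [button] "Sign In"' | ref_for_text "Sign In"   # prints @e6
```

In practice the input would be piped from `agent-browser snapshot -i`, provided its output matches the layout assumed here.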

## Using Refs

Once you have refs, interact directly:

```bash
# Click the "Sign In" button
agent-browser click @e6

# Fill email input
agent-browser fill @e10 "user@example.com"

# Fill password
agent-browser fill @e11 "password123"

# Submit the form
agent-browser click @e12
```
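
When the fill and click steps above run from a script, it can help to stop at the first failure so that later steps never act on a half-filled form. The sketch below chains the same `fill` and `click` commands with `&&`. `login_with_refs` and the `AB_CMD` override are hypothetical names introduced here so the flow can be dry-run without a browser.

```bash
#!/usr/bin/env bash
# AB_CMD is an assumed override (not an agent-browser setting) that lets the
# flow be dry-run with `echo` instead of driving a real browser.
AB_CMD="${AB_CMD:-agent-browser}"

# login_with_refs EMAIL_REF PASSWORD_REF SUBMIT_REF EMAIL PASSWORD
# Runs fill, fill, click in order and stops at the first command that fails.
login_with_refs() {
  local email_ref="$1" password_ref="$2" submit_ref="$3"
  local email="$4" password="$5"
  "$AB_CMD" fill "$email_ref" "$email" &&
    "$AB_CMD" fill "$password_ref" "$password" &&
    "$AB_CMD" click "$submit_ref"
}

# Dry run: print the three commands instead of executing them.
AB_CMD=echo login_with_refs @e10 @e11 @e12 "user@example.com" "password123"
```

The refs passed in must come from a snapshot of the page as it currently stands.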

## Ref Lifecycle

**IMPORTANT**: Refs are invalidated when the page changes, so re-snapshot after any navigation or DOM update.

```bash
# Get initial snapshot
agent-browser snapshot -i
# @e1 [button] "Next"

# Click triggers page change
agent-browser click @e1

# MUST re-snapshot to get new refs!
agent-browser snapshot -i
# @e1 [h1] "Page 2"  <- Different element now!
```
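
One way to make the re-snapshot hard to forget is to bundle it with the click. The sketch below does this with a hypothetical `click_then_snapshot` wrapper. The `AB_CMD` override is likewise an assumption added for dry runs, not an agent-browser feature.

```bash
#!/usr/bin/env bash
# AB_CMD is an assumed override so the wrapper can be dry-run with `echo`.
AB_CMD="${AB_CMD:-agent-browser}"

# click_then_snapshot REF: click REF and, if the click succeeded, immediately
# print a fresh interactive snapshot so the caller only ever sees current refs.
click_then_snapshot() {
  local ref="$1"
  "$AB_CMD" click "$ref" && "$AB_CMD" snapshot -i
}

# Dry run: shows the two commands that would be issued.
AB_CMD=echo click_then_snapshot @e1
```

Any ref read before the call should be treated as stale afterwards, even if the same ID reappears in the new snapshot.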

## Best Practices

### 1. Always Snapshot Before Interacting

```bash
# CORRECT
agent-browser open https://example.com
agent-browser snapshot -i          # Get refs first
agent-browser click @e1            # Use ref

# WRONG
agent-browser open https://example.com
agent-browser click @e1            # Ref doesn't exist yet!
```

### 2. Re-Snapshot After Navigation

```bash
agent-browser click @e5            # Navigates to new page
agent-browser snapshot -i          # Get new refs
agent-browser click @e1            # Use new refs
```

### 3. Re-Snapshot After Dynamic Changes

```bash
agent-browser click @e1            # Opens dropdown
agent-browser snapshot -i          # See dropdown items
agent-browser click @e7            # Select item
```

### 4. Snapshot Specific Regions

For complex pages, snapshot specific areas:

```bash
# Snapshot just the form
agent-browser snapshot @e9
```

## Ref Notation Details

```
@e1 [tag type="value"] "text content" placeholder="hint"
|    |   |             |              |
|    |   |             |              +- Additional attributes
|    |   |             +- Visible text
|    |   +- Key attributes shown
|    +- HTML tag name
+- Unique ref ID
```

### Common Patterns

```
@e1 [button] "Submit"                 # Button with text
@e2 [input type="email"]              # Email input
@e3 [input type="password"]           # Password input
@e4 [a href="/page"] "Link Text"      # Anchor link
@e5 [select]                          # Dropdown
@e6 [textarea] placeholder="Message"  # Text area
@e7 [div class="modal"]               # Container (when relevant)
@e8 [img alt="Logo"]                  # Image
@e9 [checkbox] checked                # Checked checkbox
@e10 [radio] selected                 # Selected radio
```

## Troubleshooting

### "Ref not found" Error

```bash
# Ref may have changed - re-snapshot
agent-browser snapshot -i
```

### Element Not Visible in Snapshot

```bash
# Scroll down to reveal element
agent-browser scroll down 1000
agent-browser snapshot -i

# Or wait for dynamic content
agent-browser wait 1000
agent-browser snapshot -i
```
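
If a single scroll is not enough, the scroll-and-snapshot pair above can be repeated a bounded number of times. The sketch below is one possible loop. `scroll_until_text`, the `AB_CMD` override, and the default of five tries are all assumptions made for illustration, and the match is a plain substring search on the quoted text in the snapshot.

```bash
#!/usr/bin/env bash
# AB_CMD is an assumed override so the loop can be exercised without a browser.
AB_CMD="${AB_CMD:-agent-browser}"

# scroll_until_text TEXT [MAX_TRIES]: snapshot, and while "TEXT" is missing,
# scroll down and snapshot again. Prints the first snapshot that contains the
# text; returns 1 if it never shows up within MAX_TRIES snapshots.
scroll_until_text() {
  local text="$1" max_tries="${2:-5}" try=1 snapshot
  while [ "$try" -le "$max_tries" ]; do
    snapshot=$("$AB_CMD" snapshot -i)
    if printf '%s\n' "$snapshot" | grep -qF -- "\"$text\""; then
      printf '%s\n' "$snapshot"
      return 0
    fi
    "$AB_CMD" scroll down 1000 >/dev/null
    try=$((try + 1))
  done
  echo "\"$text\" not found after $max_tries snapshots" >&2
  return 1
}
```

A call such as `scroll_until_text "Load more" 3` would give up after three snapshots. Refs should be read from the snapshot it prints, not from earlier ones.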

### Too Many Elements

```bash
# Snapshot specific container
agent-browser snapshot @e5

# Or use get text for content-only extraction
agent-browser get text @e5
```
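
To judge whether a snapshot is large enough to be worth narrowing, a script can count the ref lines first. The sketch below assumes each element line starts with `@eN` (optionally indented), as in the sample output earlier in this document. `count_refs` is a hypothetical helper.

```bash
#!/usr/bin/env bash
# count_refs: read snapshot output on stdin and print how many lines carry a ref.
# Assumes element lines begin with "@eN " after optional indentation.
count_refs() {
  grep -cE '^[[:space:]]*@e[0-9]+ '
}

# Demo: the "Page:" line is ignored, the two element lines are counted.
printf '%s\n' 'Page: Example Site - Home' '@e1 [header]' '  @e2 [nav]' | count_refs   # prints 2
```

A caller could compare the count against a threshold of its own choosing and fall back to a container snapshot when the count is exceeded.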