Update SKILL.md to match the latest upstream skill from vercel-labs/agent-browser, adding substantial new capabilities: - Authentication (auth vault, profiles, session persistence, state files) - Command chaining, annotated screenshots, diffing - Security features (content boundaries, domain allowlist, action policy) - iOS Simulator support, Lightpanda engine, downloads, clipboard - JS eval improvements (--stdin, -b for shell safety) - Timeout guidance, config files, session cleanup Add 7 reference docs (commands, authentication, snapshot-refs, session-management, video-recording, profiling, proxy-support) and 3 ready-to-use shell templates. Kept our YAML frontmatter, setup check section, and Playwright MCP comparison table which are unique to our plugin context.
4.1 KiB
4.1 KiB
Snapshot and Refs
Compact element references that reduce context usage dramatically for AI agents.
Related: commands.md for full command reference, SKILL.md for quick start.
Contents
- How Refs Work
- Snapshot Command
- Using Refs
- Ref Lifecycle
- Best Practices
- Ref Notation Details
- Troubleshooting
How Refs Work
Traditional approach:
Full DOM/HTML -> AI parses -> CSS selector -> Action (~3000-5000 tokens)
agent-browser approach:
Compact snapshot -> @refs assigned -> Direct interaction (~200-400 tokens)
The Snapshot Command
# Basic snapshot (shows page structure)
agent-browser snapshot
# Interactive snapshot (-i flag) - RECOMMENDED
agent-browser snapshot -i
Snapshot Output Format
Page: Example Site - Home
URL: https://example.com
@e1 [header]
@e2 [nav]
@e3 [a] "Home"
@e4 [a] "Products"
@e5 [a] "About"
@e6 [button] "Sign In"
@e7 [main]
@e8 [h1] "Welcome"
@e9 [form]
@e10 [input type="email"] placeholder="Email"
@e11 [input type="password"] placeholder="Password"
@e12 [button type="submit"] "Log In"
@e13 [footer]
@e14 [a] "Privacy Policy"
Using Refs
Once you have refs, interact directly:
# Click the "Sign In" button
agent-browser click @e6
# Fill email input
agent-browser fill @e10 "user@example.com"
# Fill password
agent-browser fill @e11 "password123"
# Submit the form
agent-browser click @e12
Ref Lifecycle
IMPORTANT: Refs are invalidated when the page changes!
# Get initial snapshot
agent-browser snapshot -i
# @e1 [button] "Next"
# Click triggers page change
agent-browser click @e1
# MUST re-snapshot to get new refs!
agent-browser snapshot -i
# @e1 [h1] "Page 2" <- Different element now!
Best Practices
1. Always Snapshot Before Interacting
# CORRECT
agent-browser open https://example.com
agent-browser snapshot -i # Get refs first
agent-browser click @e1 # Use ref
# WRONG
agent-browser open https://example.com
agent-browser click @e1 # Ref doesn't exist yet!
2. Re-Snapshot After Navigation
agent-browser click @e5 # Navigates to new page
agent-browser snapshot -i # Get new refs
agent-browser click @e1 # Use new refs
3. Re-Snapshot After Dynamic Changes
agent-browser click @e1 # Opens dropdown
agent-browser snapshot -i # See dropdown items
agent-browser click @e7 # Select item
4. Snapshot Specific Regions
For complex pages, snapshot specific areas:
# Snapshot just the form
agent-browser snapshot @e9
Ref Notation Details
@e1 [tag type="value"] "text content" placeholder="hint"
| | | | |
| | | | +- Additional attributes
| | | +- Visible text
| | +- Key attributes shown
| +- HTML tag name
+- Unique ref ID
Common Patterns
@e1 [button] "Submit" # Button with text
@e2 [input type="email"] # Email input
@e3 [input type="password"] # Password input
@e4 [a href="/page"] "Link Text" # Anchor link
@e5 [select] # Dropdown
@e6 [textarea] placeholder="Message" # Text area
@e7 [div class="modal"] # Container (when relevant)
@e8 [img alt="Logo"] # Image
@e9 [checkbox] checked # Checked checkbox
@e10 [radio] selected # Selected radio
Troubleshooting
"Ref not found" Error
# Ref may have changed - re-snapshot
agent-browser snapshot -i
Element Not Visible in Snapshot
# Scroll down to reveal element
agent-browser scroll down 1000
agent-browser snapshot -i
# Or wait for dynamic content
agent-browser wait 1000
agent-browser snapshot -i
Too Many Elements
# Snapshot specific container
agent-browser snapshot @e5
# Or use get text for content-only extraction
agent-browser get text @e5