The Ultimate Browser Automation for AI Agents
Built on agent-browser by Vercel Labs, enhanced for Claude Code
Quick Start • Features • Examples • Documentation
agent-browser-skill is a professional Claude Code integration that brings powerful browser automation to your AI assistant. Think of it as giving Claude the ability to see and interact with websites just like you do - but faster, more reliable, and with memory.
Ever wanted Claude to:
- Take screenshots of websites for you?
- Remember your login sessions across conversations?
- Upload files to websites automatically?
- Fill out forms and click buttons?
- Automate repetitive web tasks?
Now it can. With a single command.
We took Vercel Labs' excellent agent-browser and made it Claude Code native. Here's what we added:
| Feature | Original agent-browser | Our Enhancement |
|---|---|---|
| Installation | Manual npm install + setup | ✨ One-command installer (/agent-browser-installer) |
| Claude Integration | CLI tool only | ✨ Native Claude Code skill with intelligent task understanding |
| Session Management | Basic profile support | ✨ Guided profile setup with examples and best practices |
| Documentation | Technical docs | ✨ Beginner-friendly guides + advanced tutorials |
| Error Handling | Basic error messages | ✨ Intelligent troubleshooting with auto-recovery |
| User Experience | Command-line focused | ✨ Natural language interface through Claude |
For Beginners: You don't need to understand npm, Playwright, or browser automation. Just tell Claude what you want, and it happens.
For Experts: You get all the power of agent-browser + Playwright, with the convenience of Claude Code's AI-driven workflow.
/agent-browser-installer

That's it. No configuration files, no environment variables, no headaches.
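Log in once and stay logged in. Your browser sessions are saved to a profile and restored automatically, for as long as the site keeps the session valid.
# Log in once (--headed shows the browser window so you can sign in manually)
agent-browser --profile ~/.agent-browser/github --headed open https://github.com/login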
# Next time: already logged in!
agent-browser --profile ~/.agent-browser/github open https://github.com/settings

Upload files programmatically - no clicking through file dialogs.
agent-browser upload @e68 "/path/to/file.pdf"

Connect to your existing Chrome browser with all your logins and extensions.
chrome --remote-debugging-port=9222
agent-browser connect http://127.0.0.1:9222

Uses ref-based element selection instead of CSS selectors, reducing context consumption by 93%.
- ✅ Claude Code installed
- ✅ Node.js 18+
- ✅ 5 minutes of your time
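The Node.js requirement above can be checked from a terminal before running the installer. This is a minimal sketch, assuming `node --version` prints a string of the form `vMAJOR.MINOR.PATCH`; the helper names `node_major` and `check_node` are hypothetical and not part of agent-browser.

```shell
# Extract the major version from a string such as "v18.17.0".
node_major() {
  version=${1#v}          # drop the leading "v" if present
  echo "${version%%.*}"   # keep everything before the first dot
}

# Return 0 if Node.js is on PATH and its major version is at least 18.
check_node() {
  command -v node >/dev/null 2>&1 || { echo "Node.js not found"; return 1; }
  major=$(node_major "$(node --version)")
  if [ "$major" -ge 18 ]; then
    echo "Node.js $major OK"
  else
    echo "Node.js $major is too old (need 18+)"
    return 1
  fi
}
```

Running `check_node` prints a one-line verdict and returns a non-zero status when the requirement is not met, so it can gate a setup script.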
# In Claude Code, run:
/agent-browser-installer

The installer will:
- ✅ Check your system
- ✅ Install agent-browser
- ✅ Download Chromium
- ✅ Set up the skill
- ✅ Verify everything works
Just ask Claude naturally:
"Open Google and search for 'AI news'"
"Take a screenshot of example.com"
"Login to GitHub and save my session"
Claude will automatically use agent-browser to complete these tasks!
agent-browser open https://example.com
agent-browser screenshot ~/Desktop/example.png
agent-browser close

What this does: Opens a website, captures what you see, saves it to your desktop.
# First time: login with visible browser
agent-browser --profile ~/.agent-browser/github --headed open https://github.com/login
# (Login manually in the browser window)
# Next time: automatically logged in!
agent-browser --profile ~/.agent-browser/github open https://github.com/settings
agent-browser screenshot ~/Desktop/github-settings.png
agent-browser close

What this does: Saves your login session in the profile so you don't have to log in again on later runs.
agent-browser open https://example.com/contact
# Get interactive elements
agent-browser snapshot -i
# Output: Shows all buttons, inputs, links with refs like @e1, @e2, etc.
# Fill the form
agent-browser fill @e1 "John Doe"
agent-browser fill @e2 "john@example.com"
agent-browser fill @e3 "Hello, this is my message"
# Submit
agent-browser click @e4
agent-browser close

What this does: Finds all interactive elements on a page, fills them out, and submits the form automatically.
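On slow pages the first snapshot can come back empty, so a script may want to retry before it starts filling fields. The sketch below uses only the `snapshot -i` and `wait` commands shown in this README. It assumes an "empty" snapshot means no output at all; the function name `snapshot_with_retry` and its default counts are illustrative.

```shell
# Retry `snapshot -i` until it prints something, waiting between attempts.
# Usage: snapshot_with_retry [max_attempts] [wait_ms]
snapshot_with_retry() {
  max_attempts=${1:-5}
  wait_ms=${2:-1000}
  attempt=1
  while [ "$attempt" -le "$max_attempts" ]; do
    output=$(agent-browser snapshot -i)
    if [ -n "$output" ]; then
      printf '%s\n' "$output"
      return 0
    fi
    agent-browser wait "$wait_ms" >/dev/null   # give the page time to load
    attempt=$((attempt + 1))
  done
  echo "snapshot still empty after $max_attempts attempts" >&2
  return 1
}
```

A script would call `snapshot_with_retry` right after `agent-browser open <url>` and read the refs from its output before running `fill` or `click`.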
# Open with saved session
agent-browser --profile ~/.agent-browser/github open https://github.com/settings/profile
# Find the file input
agent-browser snapshot -i
# Look for: input "Upload new picture" [ref=@e68]
# Upload your avatar
agent-browser upload @e68 "/home/user/Pictures/avatar.jpg"
# Save changes
agent-browser click @e65
# Confirm
agent-browser screenshot ~/Desktop/profile-updated.png
agent-browser close

What this does: Uploads a file to GitHub without clicking through file dialogs. The same approach should work on other sites that use a standard file input.
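Because `upload` takes a file path, a small wrapper can fail early when the file is missing and pass an absolute path along. This is a sketch only: `upload_file` is a hypothetical helper, and the ref is whatever `snapshot -i` reported for the file input on your page.

```shell
# Upload a file only if it exists, passing an absolute path to agent-browser.
# Usage: upload_file @e68 ./avatar.jpg
upload_file() {
  ref=$1
  file=$2
  if [ ! -f "$file" ]; then
    echo "upload_file: no such file: $file" >&2
    return 1
  fi
  # Resolve the containing directory to an absolute path without realpath.
  abs_dir=$(cd "$(dirname "$file")" && pwd)
  agent-browser upload "$ref" "$abs_dir/$(basename "$file")"
}
```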
┌─────────────────────────────────────────────────────────────┐
│ You: "Take a screenshot of example.com"                     │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│ Claude Code: Understands intent, calls agent-browser skill  │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│ agent-browser: Executes browser automation                  │
│   - Opens Chromium browser                                  │
│   - Navigates to example.com                                │
│   - Takes screenshot                                        │
│   - Saves to disk                                           │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│ Result: Screenshot saved to ~/Desktop/example.png           │
└─────────────────────────────────────────────────────────────┘
Under the Hood:
- agent-browser (by Vercel Labs): Core automation engine
- Playwright: Browser control framework
- Chromium: The actual browser
- Our Skills: Claude Code integration layer
agent-browser open <url> # Navigate to URL
agent-browser back # Go back one page
agent-browser forward # Go forward one page
agent-browser reload # Reload current page

agent-browser snapshot -i # Get interactive elements with refs
agent-browser click @ref # Click element
agent-browser fill @ref <text> # Fill input field
agent-browser type @ref <text> # Type into element
agent-browser hover @ref # Hover over element

agent-browser get text <selector> # Get text content
agent-browser get title # Get page title
agent-browser get url # Get current URL
agent-browser screenshot [path] # Take screenshot

agent-browser upload @ref <filepath> # Upload file

agent-browser --profile <path> open <url> # Use persistent profile
agent-browser --headed open <url> # Show browser window
agent-browser close # Close browser

agent-browser connect [endpoint] # Connect to Chrome debugging port
agent-browser batch "cmd1" "cmd2" "cmd3" # Execute multiple commands
agent-browser wait <ms> # Wait for specified time

Create separate profiles for different accounts:
# Work GitHub account
agent-browser --profile ~/.agent-browser/github-work open https://github.com
# Personal GitHub account
agent-browser --profile ~/.agent-browser/github-personal open https://github.com
# Twitter account
agent-browser --profile ~/.agent-browser/twitter open https://twitter.com

Why this matters: Each profile has its own cookies, localStorage, and login sessions. No more logging in and out!
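Since each profile is just a directory, a small wrapper can keep them under one root and name them consistently. This is a sketch; `ab_profile` and the `AGENT_BROWSER_PROFILES` variable are hypothetical conveniences, and only the `--profile` flag comes from agent-browser itself.

```shell
# Root directory for all profiles; defaults to the location used in this README.
: "${AGENT_BROWSER_PROFILES:=$HOME/.agent-browser}"

# Run agent-browser with a named profile.
# Usage: ab_profile github-work open https://github.com
ab_profile() {
  name=$1
  shift
  dir="$AGENT_BROWSER_PROFILES/$name"
  mkdir -p "$dir"
  chmod 700 "$dir"   # profiles hold cookies and tokens, so keep them private
  agent-browser --profile "$dir" "$@"
}
```

With this in place, `ab_profile github-personal open https://github.com` is equivalent to passing `--profile ~/.agent-browser/github-personal` by hand.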
Execute multiple commands in one go:
agent-browser batch "open https://example.com" "snapshot -i" "screenshot /tmp/page.png" "close"

Why this matters: Faster execution, less overhead, perfect for automation scripts.
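For longer sequences, the commands can live in a text file and be passed to `batch` as separate arguments. The sketch below assumes one command per line; `batch_file` and the file format (blank lines and `#` comments skipped) are illustrative choices, not an agent-browser feature.

```shell
# Run the commands listed in a file (one per line) as a single batch.
# Blank lines and lines starting with "#" are skipped.
# Usage: batch_file steps.txt
batch_file() {
  file=$1
  set --   # clear the positional parameters and reuse them as the command list
  while IFS= read -r line || [ -n "$line" ]; do
    case $line in
      ''|'#'*) continue ;;
    esac
    set -- "$@" "$line"
  done < "$file"
  [ "$#" -gt 0 ] || { echo "batch_file: no commands in $file" >&2; return 1; }
  agent-browser batch "$@"
}
```

Each line reaches `batch` as one quoted argument, which matches the quoting used in the one-liner above.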
Use your existing Chrome browser with all your logins:
# Step 1: Start Chrome with debugging
chrome --remote-debugging-port=9222
# Step 2: Connect agent-browser
agent-browser connect http://127.0.0.1:9222
# Step 3: Use as normal (already logged in everywhere!)
agent-browser open https://gmail.com
agent-browser get title

Why this matters: No need to log in again. Use your existing browser state.
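Connecting fails if Chrome was not started with the debugging port, so it can help to probe the port first. This sketch relies on the `/json/version` HTTP endpoint that Chrome serves when remote debugging is enabled and assumes `curl` is installed; `connect_if_ready` is a hypothetical helper name.

```shell
# Connect only if Chrome's debugging endpoint answers.
# Usage: connect_if_ready [endpoint]
connect_if_ready() {
  endpoint=${1:-http://127.0.0.1:9222}
  # Chrome serves /json/version when remote debugging is enabled.
  if curl -fsS --max-time 2 "$endpoint/json/version" >/dev/null 2>&1; then
    agent-browser connect "$endpoint"
  else
    echo "No debugger at $endpoint; start Chrome with --remote-debugging-port=9222" >&2
    return 1
  fi
}
```

Keep the port bound to 127.0.0.1 as in the steps above, since anything that can reach it can control the browser and its logged-in sessions.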
- ✅ 100% Local: All automation runs on your machine
- ✅ No Telemetry: Zero data collection or tracking
- ✅ Profile Isolation: Each profile is completely isolated
- ✅ Open Source: Audit the code yourself
- ⚠️ Never commit profiles: They contain cookies and login tokens
- ⚠️ Use separate profiles: Different accounts = different profiles
- ⚠️ Back up important profiles: They're just directories, so they are easy to back up
- ⚠️ Review permissions: Check what websites you're automating
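Because a profile is an ordinary directory, backing one up can be a single `tar` call. This is a sketch; `backup_profile` and the timestamped file name are illustrative, and it is probably safest to run `agent-browser close` first so no profile files are being written during the copy.

```shell
# Archive one profile directory into a timestamped tarball and print its path.
# Usage: backup_profile ~/.agent-browser/github ~/backups
backup_profile() {
  profile=${1%/}   # drop a trailing slash so basename/dirname behave
  dest=$2
  [ -d "$profile" ] || { echo "backup_profile: not a directory: $profile" >&2; return 1; }
  mkdir -p "$dest"
  archive="$dest/$(basename "$profile")-$(date +%Y%m%d-%H%M%S).tar.gz"
  # -C keeps paths inside the archive relative to the profile's parent directory.
  tar -czf "$archive" -C "$(dirname "$profile")" "$(basename "$profile")"
  chmod 600 "$archive"   # the archive contains cookies and login tokens
  echo "$archive"
}
```

The archive is as sensitive as the profile itself, so store it somewhere private and keep it out of version control.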
Problem: agent-browser: command not found
Solution:
# Check if installed
npm list -g agent-browser
# If not, reinstall (run this inside Claude Code)
/agent-browser-installer

Problem: snapshot -i returns empty
Solution: Wait for page to load
agent-browser open https://example.com
agent-browser wait 3000 # Wait 3 seconds
agent-browser snapshot -i

Problem: Login session not saved
Solution: Use absolute paths
# ❌ Wrong (relative path)
agent-browser --profile ./profile open https://example.com
# ✅ Correct (absolute path)
agent-browser --profile ~/.agent-browser/profile open https://example.com

- Full Troubleshooting Guide
- Ask in Discussions
- Report a Bug
- Quick Start Guide - Get started in 5 minutes
- Project Structure - Understand the codebase
- Contributing Guide - Help improve the project
- Release Notes - What's new
We welcome contributions of every kind:
- Reporting bugs
- Suggesting features
- Improving documentation
- Submitting code
See CONTRIBUTING.md for guidelines.
This project is licensed under the Apache-2.0 License - see LICENSE for details.
What this means: You can use, modify, and distribute this project freely, even commercially. Just include the license and copyright notice.
This project stands on the shoulders of giants:
- agent-browser by Vercel Labs - The core automation engine that makes everything possible
- Playwright by Microsoft - The browser automation framework
- Claude Code by Anthropic - The AI development environment
- Open Source Community - For inspiration, feedback, and contributions
- Found a bug? Report it
- Have a question? Ask in Discussions
- Need docs? Check the Wiki
- Like the project? Give us a star!
If you find this project useful, please consider giving it a star! It helps others discover the project.
Made with ❤️ for the Claude Code community
Get Started • View on GitHub • Report Issue