Agent Browser Skill for Claude Code

🚀 The Ultimate Browser Automation for AI Agents

Built on agent-browser by Vercel Labs, enhanced for Claude Code

Quick Start • Features • Examples • Documentation

🎯 What is This?

agent-browser-skill is a professional Claude Code integration that brings powerful browser automation to your AI assistant. Think of it as giving Claude the ability to see and interact with websites just like you do - but faster, more reliable, and with memory.

The Problem We Solve

Ever wanted Claude to:

📸 Take screenshots of websites for you?
🔐 Remember your login sessions across conversations?
📤 Upload files to websites automatically?
🤖 Fill out forms and click buttons?
🔄 Automate repetitive web tasks?

Now it can. With a single command.

🌟 What Makes This Special?

We took Vercel Labs' excellent agent-browser and made it Claude Code native. Here's what we added:

🎁 Our Enhancements

Feature	Original agent-browser	Our Enhancement
Installation	Manual npm install + setup	✨ One-command installer (`/agent-browser-installer`)
Claude Integration	CLI tool only	✨ Native Claude Code skill with intelligent task understanding
Session Management	Basic profile support	✨ Guided profile setup with examples and best practices
Documentation	Technical docs	✨ Beginner-friendly guides + advanced tutorials
Error Handling	Basic error messages	✨ Intelligent troubleshooting with auto-recovery
User Experience	Command-line focused	✨ Natural language interface through Claude

💡 Why This Matters

For Beginners: You don't need to understand npm, Playwright, or browser automation. Just tell Claude what you want, and it happens.

For Experts: You get all the power of agent-browser + Playwright, with the convenience of Claude Code's AI-driven workflow.

✨ Core Features

🎯 One-Command Installation

/agent-browser-installer

That's it. No configuration files, no environment variables, no headaches.

🔄 Session Persistence

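Log in once and stay logged in until the site itself expires the session. Your browser sessions are saved in a profile directory and restored automatically on the next run.

# Log in once (--headed shows the browser window so you can enter credentials)
agent-browser --profile ~/.agent-browser/github --headed open https://github.com/login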

# Next time: already logged in!
agent-browser --profile ~/.agent-browser/github open https://github.com/settings

📤 File Upload Without Dialogs

Upload files programmatically - no clicking through file dialogs.

agent-browser upload @e68 "/path/to/file.pdf"

🤝 Human-in-the-Loop

Connect to your existing Chrome browser with all your logins and extensions.

# The Chrome binary name varies by platform (for example, google-chrome on Linux)
chrome --remote-debugging-port=9222
agent-browser connect http://127.0.0.1:9222

⚡ AI-Optimized Design

Uses ref-based element selection instead of CSS selectors, reducing context consumption by 93%.

🚀 Quick Start

Prerequisites

✅ Claude Code installed
✅ Node.js 18+
✅ 5 minutes of your time
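
If you want to confirm the Node.js requirement before running the installer, a check along these lines may help. It is a minimal sketch: `node_major_ok` is a helper name invented for this example and is not part of agent-browser or the installer.

```shell
# node_major_ok VERSION: succeed if VERSION (e.g. "v18.17.0") has major >= 18.
node_major_ok() {
  major=${1#v}        # drop the leading "v" that `node --version` prints
  major=${major%%.*}  # keep only the part before the first dot
  case "$major" in
    ''|*[!0-9]*) return 1 ;;  # empty or not a plain number
  esac
  [ "$major" -ge 18 ]
}

# Typical use (requires node on PATH):
#   node_major_ok "$(node --version)" && echo "Node.js is new enough"
```

The function takes the version string as an argument rather than calling `node` itself, so it can be tried out even on a machine where Node.js is not installed yet.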

Installation (Literally One Command)

# In Claude Code, run:
/agent-browser-installer

The installer will:

✅ Check your system
✅ Install agent-browser
✅ Download Chromium
✅ Set up the skill
✅ Verify everything works

Your First Automation

Just ask Claude naturally:

"Open Google and search for 'AI news'"
"Take a screenshot of example.com"
"Log in to GitHub and save my session"

Claude will automatically use agent-browser to complete these tasks!

💡 Usage Examples

🌱 Beginner: Take a Screenshot

agent-browser open https://example.com
agent-browser screenshot ~/Desktop/example.png
agent-browser close

What this does: Opens a website, captures what you see, saves it to your desktop.

🌿 Intermediate: Save Login Session

# First time: log in with a visible browser
agent-browser --profile ~/.agent-browser/github --headed open https://github.com/login
# (Log in manually in the browser window)

# Next time: automatically logged in!
agent-browser --profile ~/.agent-browser/github open https://github.com/settings
agent-browser screenshot ~/Desktop/github-settings.png
agent-browser close

What this does: Saves your login session in the profile directory, so you don't have to log in again until the site expires the session.

🌳 Advanced: Automate Form Filling

agent-browser open https://example.com/contact

# Get interactive elements
agent-browser snapshot -i
# Output: Shows all buttons, inputs, links with refs like @e1, @e2, etc.

# Fill the form
agent-browser fill @e1 "John Doe"
agent-browser fill @e2 "john@example.com"
agent-browser fill @e3 "Hello, this is my message"

# Submit
agent-browser click @e4

agent-browser close

What this does: Lists the interactive elements on the page, fills three fields by ref, and submits the form. The refs shown here are illustrative; use the ones your own snapshot prints.
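
When a form has many fields, the fill commands above can be driven from a list of ref/value pairs. This is only a sketch: `fill_all` and the `AB` variable are names made up for this example, and the refs must come from your own snapshot output.

```shell
# Command to run; AB is a hypothetical override for dry runs (e.g. AB=echo).
AB=${AB:-agent-browser}

# fill_all REF=VALUE...: call "fill" once per pair, splitting at the first "=".
fill_all() {
  for pair in "$@"; do
    $AB fill "${pair%%=*}" "${pair#*=}" || return 1
  done
}

# Usage:
#   fill_all '@e1=John Doe' '@e2=john@example.com'
```

Setting `AB=echo` prints the commands instead of running them, which is a cheap way to check the pairs before touching a real page.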

🚀 Expert: Upload File to GitHub Profile

# Open with saved session
agent-browser --profile ~/.agent-browser/github open https://github.com/settings/profile

# Find the file input
agent-browser snapshot -i
# Look for: input "Upload new picture" [ref=@e68]

# Upload your avatar
agent-browser upload @e68 "/home/user/Pictures/avatar.jpg"

# Save changes
agent-browser click @e65

# Confirm
agent-browser screenshot ~/Desktop/profile-updated.png
agent-browser close

What this does: Uploads a file to GitHub without clicking through file dialogs. The same approach should work on other sites that use a standard file input.
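
Refs such as @e68 come from one particular snapshot and are likely to change when the page changes, so a script may prefer to look them up by label. The sketch below assumes snapshot lines look like the one in the comment above (`input "Upload new picture" [ref=@e68]`); the real output format may differ, and `find_ref` is a helper invented for this example.

```shell
# find_ref LABEL: read snapshot text on stdin and print the ref (as @eN) of
# the first line that contains LABEL. Accepts [ref=@e68] and [ref=e68].
find_ref() {
  grep -F -- "$1" \
    | sed -n 's/.*\[ref=@\{0,1\}\(e[0-9][0-9]*\)\].*/@\1/p' \
    | head -n 1
}

# Usage (assumes the snapshot format described above):
#   ref=$(agent-browser snapshot -i | find_ref "Upload new picture")
#   agent-browser upload "$ref" "/home/user/Pictures/avatar.jpg"
```

If no line matches, `find_ref` prints nothing, so check that the result is non-empty before passing it to `upload` or `click`.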

🏗️ How It Works

┌─────────────────────────────────────────────────────────────┐
│  You: "Take a screenshot of example.com"                    │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│  Claude Code: Understands intent, calls agent-browser skill │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│  agent-browser: Executes browser automation                 │
│  - Opens Chromium browser                                   │
│  - Navigates to example.com                                 │
│  - Takes screenshot                                          │
│  - Saves to disk                                            │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│  Result: Screenshot saved to ~/Desktop/example.png          │
└─────────────────────────────────────────────────────────────┘

Under the Hood:

agent-browser (by Vercel Labs): Core automation engine
Playwright: Browser control framework
Chromium: The actual browser
Our Skills: Claude Code integration layer

📚 Command Reference

Navigation

agent-browser open <url>          # Navigate to URL
agent-browser back                # Go back one page
agent-browser forward             # Go forward one page
agent-browser reload              # Reload current page

Element Interaction

agent-browser snapshot -i         # Get interactive elements with refs
agent-browser click @ref          # Click element
agent-browser fill @ref <text>    # Fill input field
agent-browser type @ref <text>    # Type into element
agent-browser hover @ref          # Hover over element

Information Retrieval

agent-browser get text <selector> # Get text content
agent-browser get title           # Get page title
agent-browser get url             # Get current URL
agent-browser screenshot [path]   # Take screenshot

File Operations

agent-browser upload @ref <filepath>  # Upload file

Session Management

agent-browser --profile <path> open <url>  # Use persistent profile
agent-browser --headed open <url>          # Show browser window
agent-browser close                        # Close browser

Advanced

agent-browser connect [endpoint]           # Connect to Chrome debugging port
agent-browser batch "cmd1" "cmd2" "cmd3"   # Execute multiple commands
agent-browser wait <ms>                    # Wait for specified time

🔧 Advanced Features

Profile Management

Create separate profiles for different accounts:

# Work GitHub account
agent-browser --profile ~/.agent-browser/github-work open https://github.com

# Personal GitHub account
agent-browser --profile ~/.agent-browser/github-personal open https://github.com

# Twitter account
agent-browser --profile ~/.agent-browser/twitter open https://twitter.com

Why this matters: Each profile has its own cookies, localStorage, and login sessions. No more logging in and out!
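
Because a profile is just a directory, a small helper can keep the naming consistent and tighten permissions on it. This is a sketch under assumptions: `profile_dir` is a hypothetical helper, and the `~/.agent-browser` base simply follows the paths used in the examples above.

```shell
# profile_dir NAME [BASE]: print the profile path for NAME under BASE
# (default: ~/.agent-browser), creating it with owner-only permissions.
profile_dir() {
  dir="${2:-$HOME/.agent-browser}/$1"
  mkdir -p "$dir" && chmod 700 "$dir" && printf '%s\n' "$dir"
}

# Usage:
#   agent-browser --profile "$(profile_dir github-work)" open https://github.com
```

The `chmod 700` step is a precaution for shared machines, since the directory holds cookies and login tokens.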

Batch Operations

Execute multiple commands in one go:

agent-browser batch "open https://example.com" "snapshot -i" "screenshot /tmp/page.png" "close"

Why this matters: Faster execution, less overhead, perfect for automation scripts.
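
In a script, the batch form can be wrapped in a function so the same steps run for any URL. A minimal sketch, assuming `batch` accepts the quoted sub-commands exactly as in the example above; `shoot` and the `AB` variable are names invented here.

```shell
# Command to run; AB is a hypothetical override for dry runs (e.g. AB=echo).
AB=${AB:-agent-browser}

# shoot URL OUTFILE: open URL, save a screenshot to OUTFILE, close the browser.
# Paths containing spaces are not handled in this sketch.
shoot() {
  $AB batch "open $1" "screenshot $2" "close"
}

# Usage:
#   shoot https://example.com /tmp/page.png
```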

Connect to Existing Chrome

Use your existing Chrome browser with all your logins:

# Step 1: Start Chrome with debugging (the binary name varies by platform,
# for example google-chrome on Linux)
chrome --remote-debugging-port=9222

# Step 2: Connect agent-browser
agent-browser connect http://127.0.0.1:9222

# Step 3: Use as normal (already logged in everywhere!)
agent-browser open https://gmail.com
agent-browser get title

Why this matters: No need to log in again. agent-browser uses your existing browser state.

🔒 Security & Privacy

What We Do

✅ 100% Local: All automation runs on your machine
✅ No Telemetry: Zero data collection or tracking
✅ Profile Isolation: Each profile is completely isolated
✅ Open Source: Audit the code yourself

What You Should Do

⚠️ Never commit profiles: They contain cookies and login tokens
⚠️ Use separate profiles: Different accounts = different profiles
⚠️ Backup important profiles: They're just directories, easy to backup
⚠️ Review permissions: Check what websites you're automating
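
If a profile directory ever ends up inside a Git repository, an ignore rule is one way to keep it from being committed. The helper below is a sketch: `ignore_profiles` is a made-up name, and the `.agent-browser/` pattern assumes the profile directory is called `.agent-browser` as in the examples in this README.

```shell
# ignore_profiles REPO_DIR: add ".agent-browser/" to REPO_DIR/.gitignore,
# skipping the write if that exact line is already present.
ignore_profiles() {
  file="$1/.gitignore"
  touch "$file"
  grep -qxF '.agent-browser/' "$file" || printf '%s\n' '.agent-browser/' >> "$file"
}

# Usage:
#   ignore_profiles .
```

The trailing slash makes the pattern match directories only, at any depth in the repository.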

🛠️ Troubleshooting

"Command not found"

Problem: agent-browser: command not found

Solution:

# Check if installed
npm list -g agent-browser

# If not, reinstall
/agent-browser-installer
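
When npm lists the package but the shell still cannot find it, the usual cause is that npm's global bin directory is not on PATH. A small check like this can tell the two cases apart; `have_cmd` is a helper name made up for this sketch.

```shell
# have_cmd NAME: succeed if NAME resolves to something runnable in this shell.
have_cmd() {
  command -v "$1" >/dev/null 2>&1
}

# Usage:
#   have_cmd agent-browser || echo "agent-browser is not on PATH"
#   npm prefix -g   # on Linux and macOS, global binaries live in <prefix>/bin
```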

"No interactive elements found"

Problem: snapshot -i returns empty

Solution: Wait for the page to finish loading before taking the snapshot

agent-browser open https://example.com
agent-browser wait 3000  # Wait 3 seconds
agent-browser snapshot -i
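
A fixed three-second wait can be too short for slow pages and wasteful for fast ones. One alternative is to retry until the command prints something. This sketch assumes an empty snapshot produces no output at all, which may not hold for every version; `retry_until_output` is a generic helper invented here, not an agent-browser feature.

```shell
# retry_until_output TRIES DELAY CMD...: run CMD until it prints something,
# sleeping DELAY seconds between attempts; fail after TRIES empty results.
retry_until_output() {
  tries=$1
  delay=$2
  shift 2
  while [ "$tries" -gt 0 ]; do
    out=$("$@" 2>/dev/null)
    if [ -n "$out" ]; then
      printf '%s\n' "$out"
      return 0
    fi
    tries=$((tries - 1))
    if [ "$tries" -gt 0 ]; then
      sleep "$delay"
    fi
  done
  return 1
}

# Usage:
#   retry_until_output 5 1 agent-browser snapshot -i
```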

"Profile not persisting"

Problem: Login session not saved

Solution: Use absolute paths

# ❌ Wrong (relative path)
agent-browser --profile ./profile open https://example.com

# ✅ Correct (absolute path; the shell expands ~ to your home directory)
agent-browser --profile ~/.agent-browser/profile open https://example.com
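
In scripts, a relative path can be converted before it reaches `--profile`. Note also that `~` is only expanded by the shell when it is unquoted, so `"~/.agent-browser/profile"` in double quotes stays a literal `~`. The helper below is a minimal sketch with an invented name, `abs_path`; it does not resolve `..` or symlinks.

```shell
# abs_path PATH: print PATH unchanged if it is already absolute,
# otherwise prefix it with the current working directory.
abs_path() {
  case "$1" in
    /*) printf '%s\n' "$1" ;;
    *)  printf '%s/%s\n' "$PWD" "${1#./}" ;;  # drop a leading "./" if present
  esac
}

# Usage:
#   agent-browser --profile "$(abs_path ./profile)" open https://example.com
```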

More Help

📖 Documentation

📘 Quick Start Guide - Get started in 5 minutes
📗 Project Structure - Understand the codebase
📕 Contributing Guide - Help improve the project
📙 Release Notes - What's new

🤝 Contributing

We welcome contributions of all kinds:

🐛 Reporting bugs
💡 Suggesting features
📝 Improving documentation
🔧 Submitting code

See CONTRIBUTING.md for guidelines.

📄 License

This project is licensed under the Apache-2.0 License - see LICENSE for details.

What this means: You can use, modify, and distribute this project freely, even commercially. Include a copy of the license, keep the existing copyright and attribution notices, and mark any files you changed.

🙏 Acknowledgments

This project stands on the shoulders of giants:

agent-browser by Vercel Labs - The core automation engine that makes everything possible
Playwright by Microsoft - The browser automation framework
Claude Code by Anthropic - The AI development environment
Open Source Community - For inspiration, feedback, and contributions

📞 Support & Community

🐛 Found a bug? Report it
💬 Have a question? Ask in Discussions
📖 Need docs? Check the Wiki
⭐ Like the project? Give us a star!

🌟 Star History

If you find this project useful, please consider giving it a star! It helps others discover the project.

Made with ❤️ for the Claude Code community

Get Started • View on GitHub • Report Issue
