
00OO666/agent-browser-skill


Agent Browser Skill for Claude Code

🚀 The Ultimate Browser Automation for AI Agents

Built on agent-browser by Vercel Labs, enhanced for Claude Code


Quick Start • Features • Examples • Documentation


🎯 What is This?

agent-browser-skill is a professional Claude Code integration that brings powerful browser automation to your AI assistant. Think of it as giving Claude the ability to see and interact with websites just like you do - but faster, more reliable, and with memory.

The Problem We Solve

Ever wanted Claude to:

  • 📸 Take screenshots of websites for you?
  • 🔐 Remember your login sessions across conversations?
  • 📤 Upload files to websites automatically?
  • 🤖 Fill out forms and click buttons?
  • 🔄 Automate repetitive web tasks?

Now it can. With a single command.


🌟 What Makes This Special?

We took Vercel Labs' excellent agent-browser and made it Claude Code native. Here's what we added:

🎁 Our Enhancements

| Feature | Original agent-browser | Our Enhancement |
|---|---|---|
| Installation | Manual npm install + setup | ✨ One-command installer (/agent-browser-installer) |
| Claude Integration | CLI tool only | ✨ Native Claude Code skill with intelligent task understanding |
| Session Management | Basic profile support | ✨ Guided profile setup with examples and best practices |
| Documentation | Technical docs | ✨ Beginner-friendly guides + advanced tutorials |
| Error Handling | Basic error messages | ✨ Intelligent troubleshooting with auto-recovery |
| User Experience | Command-line focused | ✨ Natural language interface through Claude |

💡 Why This Matters

For Beginners: You don't need to understand npm, Playwright, or browser automation. Just tell Claude what you want, and it happens.

For Experts: You get all the power of agent-browser + Playwright, with the convenience of Claude Code's AI-driven workflow.


✨ Core Features

🎯 One-Command Installation

/agent-browser-installer

That's it. No configuration files, no environment variables, no headaches.

🔄 Session Persistence

Log in once and stay logged in across runs. Browser sessions are saved to a profile directory and restored the next time you use that profile, for as long as the site keeps the session valid.

# Login once
agent-browser --profile ~/.agent-browser/github open https://github.com/login

# Next time: already logged in!
agent-browser --profile ~/.agent-browser/github open https://github.com/settings

📤 File Upload Without Dialogs

Upload files programmatically - no clicking through file dialogs.

agent-browser upload @e68 "/path/to/file.pdf"

🤝 Human-in-the-Loop

Connect to your existing Chrome browser with all your logins and extensions.

chrome --remote-debugging-port=9222
agent-browser connect http://127.0.0.1:9222

⚑ AI-Optimized Design

Uses ref-based element selection instead of CSS selectors, reducing context consumption by 93%.


🚀 Quick Start

Prerequisites

Installation (Literally One Command)

# In Claude Code, run:
/agent-browser-installer

The installer will:

  1. ✅ Check your system
  2. ✅ Install agent-browser
  3. ✅ Download Chromium
  4. ✅ Set up the skill
  5. ✅ Verify everything works

Your First Automation

Just ask Claude naturally:

"Open Google and search for 'AI news'"
"Take a screenshot of example.com"
"Login to GitHub and save my session"

Claude will automatically use agent-browser to complete these tasks!


💡 Usage Examples

🌱 Beginner: Take a Screenshot

agent-browser open https://example.com
agent-browser screenshot ~/Desktop/example.png
agent-browser close

What this does: Opens a website, captures what you see, saves it to your desktop.
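
For repeated captures, the three commands above can be wrapped in a small shell function. This is a minimal sketch rather than part of the skill: `ab_shot` and the `AB_BIN` variable are names made up for this example, and it assumes `agent-browser` is on your PATH.

```shell
# Hypothetical helper (not shipped with agent-browser or this skill).
# AB_BIN lets you point at a different binary; it defaults to agent-browser.
AB_BIN="${AB_BIN:-agent-browser}"

ab_shot() {
    # $1 = URL to open, $2 = output image path
    "$AB_BIN" open "$1" || return 1
    "$AB_BIN" screenshot "$2"
    status=$?
    # Always close the browser, even if the screenshot failed.
    "$AB_BIN" close
    return "$status"
}
```

Usage would look like `ab_shot https://example.com ~/Desktop/example.png`. Closing the browser on both paths avoids leaving a stray session open when a capture fails.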

🌿 Intermediate: Save Login Session

# First time: login with visible browser
agent-browser --profile ~/.agent-browser/github --headed open https://github.com/login
# (Login manually in the browser window)

# Next time: automatically logged in!
agent-browser --profile ~/.agent-browser/github open https://github.com/settings
agent-browser screenshot ~/Desktop/github-settings.png
agent-browser close

What this does: Saves your login session in the profile so you don't have to log in again on later runs, as long as the site keeps the session valid.
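
One way to keep profile paths consistent is a small wrapper that maps a short name to a directory under `~/.agent-browser`. This is a minimal sketch: `ab_open_profile`, `AB_BIN`, and `AB_PROFILE_ROOT` are hypothetical names, and it relies only on the `--profile` and `open` usage shown above.

```shell
# Hypothetical helper (not shipped with agent-browser or this skill).
AB_BIN="${AB_BIN:-agent-browser}"
AB_PROFILE_ROOT="${AB_PROFILE_ROOT:-$HOME/.agent-browser}"

ab_open_profile() {
    # $1 = short profile name (e.g. github), $2 = URL to open
    dir="$AB_PROFILE_ROOT/$1"
    mkdir -p "$dir" || return 1
    # Keep the directory private: it will hold cookies and login tokens.
    chmod 700 "$dir" || return 1
    "$AB_BIN" --profile "$dir" open "$2"
}
```

With this in place, `ab_open_profile github https://github.com/settings` would expand to the same command as the second example above.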

🌳 Advanced: Automate Form Filling

agent-browser open https://example.com/contact

# Get interactive elements
agent-browser snapshot -i
# Output: Shows all buttons, inputs, links with refs like @e1, @e2, etc.

# Fill the form
agent-browser fill @e1 "John Doe"
agent-browser fill @e2 "john@example.com"
agent-browser fill @e3 "Hello, this is my message"

# Submit
agent-browser click @e4

agent-browser close

What this does: Finds all interactive elements on a page, fills them out, and submits the form automatically.
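
When a form has many fields, the repeated `fill` calls can be driven from a list of ref/value pairs. The function below is a sketch under stated assumptions: `ab_fill_form` and `AB_BIN` are invented names, and the refs must come from a fresh `snapshot -i` of the page you are on, since they will differ from the `@e1`, `@e2` values used here.

```shell
# Hypothetical helper (not shipped with agent-browser or this skill).
AB_BIN="${AB_BIN:-agent-browser}"

ab_fill_form() {
    # Each argument has the form "@ref=value"; split on the first "=".
    for pair in "$@"; do
        ref="${pair%%=*}"
        value="${pair#*=}"
        # Stop at the first field that fails so the form is not half-submitted.
        "$AB_BIN" fill "$ref" "$value" || return 1
    done
}
```

For the contact form above this might be called as `ab_fill_form "@e1=John Doe" "@e2=john@example.com"`, followed by the usual `click` on the submit ref.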

🚀 Expert: Upload File to GitHub Profile

# Open with saved session
agent-browser --profile ~/.agent-browser/github open https://github.com/settings/profile

# Find the file input
agent-browser snapshot -i
# Look for: input "Upload new picture" [ref=@e68]

# Upload your avatar
agent-browser upload @e68 "/home/user/Pictures/avatar.jpg"

# Save changes
agent-browser click @e65

# Confirm
agent-browser screenshot ~/Desktop/profile-updated.png
agent-browser close

What this does: Uploads a file to GitHub without clicking through file dialogs. The same approach applies to other sites that expose a file input.
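
Refs such as `@e68` change from page to page, so hard-coding them is fragile. The sketch below looks a ref up by its label instead. It assumes snapshot lines look like the example above (`input "Upload new picture" [ref=@e68]`); `ab_find_ref` and `AB_BIN` are hypothetical names, and the pattern may need adjusting if your agent-browser version prints refs differently.

```shell
# Hypothetical helper (not shipped with agent-browser or this skill).
AB_BIN="${AB_BIN:-agent-browser}"

ab_find_ref() {
    # $1 = literal text to look for in a snapshot line
    # Assumes each element line ends with [ref=@eN] or [ref=eN].
    ref=$("$AB_BIN" snapshot -i |
        grep -F -- "$1" |
        sed -n 's/.*\[ref=@\{0,1\}\(e[0-9][0-9]*\)\].*/@\1/p' |
        head -n 1)
    # Fail (non-zero) when nothing matched, so callers can stop early.
    [ -n "$ref" ] && printf '%s\n' "$ref"
}
```

A possible use: `ref=$(ab_find_ref "Upload new picture") && agent-browser upload "$ref" "/home/user/Pictures/avatar.jpg"`.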


🏗️ How It Works

┌─────────────────────────────────────────────────────────────┐
│  You: "Take a screenshot of example.com"                    │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│  Claude Code: Understands intent, calls agent-browser skill │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│  agent-browser: Executes browser automation                 │
│  - Opens Chromium browser                                   │
│  - Navigates to example.com                                 │
│  - Takes screenshot                                         │
│  - Saves to disk                                            │
└────────────────────────┬────────────────────────────────────┘
                         │
                         ▼
┌─────────────────────────────────────────────────────────────┐
│  Result: Screenshot saved to ~/Desktop/example.png          │
└─────────────────────────────────────────────────────────────┘

Under the Hood:

  • agent-browser (by Vercel Labs): Core automation engine
  • Playwright: Browser control framework
  • Chromium: The actual browser
  • Our Skills: Claude Code integration layer

📚 Command Reference

Navigation

agent-browser open <url>          # Navigate to URL
agent-browser back                # Go back one page
agent-browser forward             # Go forward one page
agent-browser reload              # Reload current page

Element Interaction

agent-browser snapshot -i         # Get interactive elements with refs
agent-browser click @ref          # Click element
agent-browser fill @ref <text>    # Fill input field
agent-browser type @ref <text>    # Type into element
agent-browser hover @ref          # Hover over element

Information Retrieval

agent-browser get text <selector> # Get text content
agent-browser get title           # Get page title
agent-browser get url             # Get current URL
agent-browser screenshot [path]   # Take screenshot

File Operations

agent-browser upload @ref <filepath>  # Upload file

Session Management

agent-browser --profile <path> open <url>  # Use persistent profile
agent-browser --headed open <url>          # Show browser window
agent-browser close                        # Close browser

Advanced

agent-browser connect [endpoint]           # Connect to Chrome debugging port
agent-browser batch "cmd1" "cmd2" "cmd3"   # Execute multiple commands
agent-browser wait <ms>                    # Wait for specified time

🔧 Advanced Features

Profile Management

Create separate profiles for different accounts:

# Work GitHub account
agent-browser --profile ~/.agent-browser/github-work open https://github.com

# Personal GitHub account
agent-browser --profile ~/.agent-browser/github-personal open https://github.com

# Twitter account
agent-browser --profile ~/.agent-browser/twitter open https://twitter.com

Why this matters: Each profile has its own cookies, localStorage, and login sessions. No more logging in and out!

Batch Operations

Execute multiple commands in one go:

agent-browser batch "open https://example.com" "snapshot -i" "screenshot /tmp/page.png" "close"

Why this matters: Faster execution, less overhead, perfect for automation scripts.

Connect to Existing Chrome

Use your existing Chrome browser with all your logins:

# Step 1: Start Chrome with debugging
chrome --remote-debugging-port=9222

# Step 2: Connect agent-browser
agent-browser connect http://127.0.0.1:9222

# Step 3: Use as normal (already logged in everywhere!)
agent-browser open https://gmail.com
agent-browser get title

Why this matters: No need to login again. Use your existing browser state.
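
Before connecting, it can help to confirm that Chrome is actually listening on the debugging port. Chrome serves a small HTTP endpoint at `/json/version` when started with `--remote-debugging-port`. The check below is a sketch: `chrome_debug_ready` is a name invented here, and it assumes `curl` is installed.

```shell
# Hypothetical helper (not shipped with agent-browser or this skill).
chrome_debug_ready() {
    # $1 = host (default 127.0.0.1), $2 = port (default 9222)
    host="${1:-127.0.0.1}"
    port="${2:-9222}"
    # -f: fail on HTTP errors, -s: quiet, --max-time: give up after 2 seconds.
    curl -fs --max-time 2 -o /dev/null "http://$host:$port/json/version"
}
```

It could be combined with the connect step as `chrome_debug_ready && agent-browser connect http://127.0.0.1:9222`, so a missing or misconfigured Chrome is reported before agent-browser is involved.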


🔒 Security & Privacy

What We Do

  • ✅ 100% Local: All automation runs on your machine
  • ✅ No Telemetry: Zero data collection or tracking
  • ✅ Profile Isolation: Each profile is completely isolated
  • ✅ Open Source: Audit the code yourself

What You Should Do

  • ⚠️ Never commit profiles: They contain cookies and login tokens
  • ⚠️ Use separate profiles: Different accounts = different profiles
  • ⚠️ Backup important profiles: They're just directories, easy to backup
  • ⚠️ Review permissions: Check what websites you're automating

🛠️ Troubleshooting

"Command not found"

Problem: agent-browser: command not found

Solution:

# Check if installed
npm list -g agent-browser

# If not, reinstall
/agent-browser-installer

"No interactive elements found"

Problem: snapshot -i returns empty

Solution: Wait for page to load

agent-browser open https://example.com
agent-browser wait 3000  # Wait 3 seconds
agent-browser snapshot -i
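
A fixed three-second wait is either too long or too short on many pages. An alternative is to poll `snapshot -i` until it reports at least one ref. This is a sketch, not a built-in feature: `ab_wait_for_elements` and `AB_BIN` are invented names, and it assumes element lines contain `ref=` as in the snapshot output shown elsewhere in this README.

```shell
# Hypothetical helper (not shipped with agent-browser or this skill).
AB_BIN="${AB_BIN:-agent-browser}"

ab_wait_for_elements() {
    # $1 = maximum attempts (default 10), $2 = delay between attempts in ms (default 500)
    attempts="${1:-10}"
    delay_ms="${2:-500}"
    i=0
    while [ "$i" -lt "$attempts" ]; do
        out=$("$AB_BIN" snapshot -i 2>/dev/null)
        case "$out" in
            # Assumption: any "ref=" in the output means elements were found.
            *ref=*) printf '%s\n' "$out"; return 0 ;;
        esac
        "$AB_BIN" wait "$delay_ms"
        i=$((i + 1))
    done
    return 1
}
```

Called as `ab_wait_for_elements 10 500`, it prints the first non-empty snapshot or returns non-zero after roughly five seconds.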

"Profile not persisting"

Problem: Login session not saved

Solution: Use absolute paths

# ❌ Wrong (relative path)
agent-browser --profile ./profile open https://example.com

# βœ… Correct (absolute path)
agent-browser --profile ~/.agent-browser/profile open https://example.com
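
A related pitfall: the shell only expands `~` when it is unquoted, so `--profile "~/.agent-browser/profile"` passes a literal `~` and is not an absolute path. The helper below is a sketch for normalizing a path before passing it on; `ab_abs_profile` is a hypothetical name and it does not resolve `..` or symlinks.

```shell
# Hypothetical helper (not shipped with agent-browser or this skill).
ab_abs_profile() {
    # $1 = profile path as typed by the user
    case "$1" in
        # Already absolute: pass through unchanged.
        /*) printf '%s\n' "$1" ;;
        # Literal "~/" (e.g. from a quoted string): replace with $HOME.
        "~"/*) printf '%s/%s\n' "$HOME" "${1#??}" ;;
        # Anything else is relative: anchor it to the current directory.
        *) printf '%s/%s\n' "$PWD" "${1#./}" ;;
    esac
}
```

It could be used as `agent-browser --profile "$(ab_abs_profile ./profile)" open https://example.com`.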

More Help


📖 Documentation


🤝 Contributing

We welcome contributions! Whether you're:

  • 🐛 Reporting bugs
  • 💡 Suggesting features
  • 📝 Improving documentation
  • 🔧 Submitting code

See CONTRIBUTING.md for guidelines.


📄 License

This project is licensed under the Apache-2.0 License - see LICENSE for details.

What this means: You can use, modify, and distribute this project freely, even commercially. Include the license and copyright notice, and mark any files you modify.


🙏 Acknowledgments

This project stands on the shoulders of giants:

  • agent-browser by Vercel Labs - The core automation engine that makes everything possible
  • Playwright by Microsoft - The browser automation framework
  • Claude Code by Anthropic - The AI development environment
  • Open Source Community - For inspiration, feedback, and contributions

📞 Support & Community


🌟 Star History

If you find this project useful, please consider giving it a star! It helps others discover the project.



Made with ❤️ for the Claude Code community

Get Started • View on GitHub • Report Issue
