🤖 Browser4

English | 简体中文 | 中国镜像

🌟 Introduction

💖 Browser4: a lightning-fast, coroutine-safe browser engine for your AI 💖

✨ Key Capabilities

👽 Browser Agents — Fully autonomous browser agents that reason, plan, and execute end-to-end tasks.
🤖 Browser Automation — High-performance automation for workflows, navigation, and data extraction.
⚙️ Machine Learning Agent - Learns field structures across complex pages without consuming tokens.
⚡ Extreme Performance — Fully coroutine-safe; supports 100k ~ 200k complex page visits per machine per day.
🧬 Data Extraction — Hybrid of LLM, ML, and selectors for clean data across chaotic pages.
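
To put the throughput figure in per-second terms, the short calculation below converts the quoted daily range into a sustained rate. It assumes visits are spread evenly over a 24-hour day, which the text above does not state, so treat the result as a rough average rather than a measured rate.

```python
SECONDS_PER_DAY = 24 * 60 * 60  # 86,400 seconds


def pages_per_second(pages_per_day: int) -> float:
    """Sustained rate needed to reach a daily page count, assuming an even spread."""
    return pages_per_day / SECONDS_PER_DAY


for daily in (100_000, 200_000):
    print(f"{daily:,} pages/day is about {pages_per_second(daily):.2f} pages/second")
```

In other words, the quoted range corresponds to roughly one to two and a half page visits per second, sustained around the clock on a single machine.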

💡 Usage Examples

Quick Start

Ask any LLM agent to use browser4-cli for browser interactions, and it can carry out multi-step tasks such as the one below (shown in PowerShell syntax):

$prompt = @"
Install https://raw.githubusercontent.com/platonai/Browser4/refs/heads/main/cli/skill/SKILL.md and use browser4-cli and perform the following task:

1. go to amazon.com
2. search for pens to draw on whiteboards
3. compare the first 4 ones
4. write the result to a markdown file
"@

copilot -p "$prompt"
# claude "$prompt"

CLI & SKILLS

Browser4 CLI is a powerful command-line interface for direct browser control and automation, designed for both human users and AI agents. It provides a simple syntax to perform complex browser interactions without writing code.

Browser4 CLI is compatible with Playwright and supports a wide range of commands for navigation, interaction, and data extraction. It can be used in scripts, terminal sessions, or integrated into AI agents through SKILLS.

Install browser4-cli globally using npm:

npm install -g browser4-cli

# Open browser4 without navigation
browser4-cli open

# Open in headed or headless mode
browser4-cli open --headed https://browser4.io
browser4-cli open --headless https://browser4.io

# Navigate to a page — auto-opens a session if none is active
browser4-cli goto https://playwright.dev

# Inspect the page — note the eN labels on interactive nodes
browser4-cli snapshot

# Interact using refs from the snapshot
browser4-cli click e15
browser4-cli type e15 "Hello World"
browser4-cli press e15 Enter
browser4-cli keydown Shift
browser4-cli mousemove 150 300
browser4-cli mousewheel 0 100
browser4-cli keyup Shift

# Take a screenshot and save it to disk
browser4-cli screenshot

# Use a custom server URL
browser4-cli open --server http://localhost:9090

# Execute multiple commands in one process
browser4-cli batch "goto https://playwright.dev" "snapshot"

# Stop on the first batch failure
browser4-cli batch --bail "goto https://playwright.dev" "click e1" "screenshot"

# Advanced: pipe batch commands as JSON via stdin, useful for form filling and complex workflows
echo '[
  ["goto", "https://example.com/form-filling"],
  ["click", "#reset-btn"],
  ["fill", "#first-name", "Bob"],
  ["fill", "#last-name", "Smith"],
  ["fill", "#email", "bob@example.com"],
  ["select", "#country", "uk"],
  ["check", "#agree-terms"],
  ["click", "#submit-btn"]
]' | browser4-cli batch --json

# Close the session when done
browser4-cli close
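
When the batch steps are generated by a program rather than typed by hand, building the JSON payload in code avoids shell-quoting mistakes. The following is a minimal Python sketch, not part of Browser4 itself: the helper names `build_batch_payload` and `run_batch` are hypothetical, and the only CLI behavior it relies on is the `batch --json` stdin form shown above.

```python
import json
import shutil
import subprocess


def build_batch_payload(steps):
    """Serialize steps as a JSON array of [command, arg, ...] arrays."""
    for step in steps:
        if not step or not all(isinstance(part, str) for part in step):
            raise ValueError(f"each step must be a non-empty list of strings: {step!r}")
    return json.dumps([list(step) for step in steps])


def run_batch(steps):
    """Pipe the payload to `browser4-cli batch --json` on stdin."""
    return subprocess.run(
        ["browser4-cli", "batch", "--json"],
        input=build_batch_payload(steps),
        text=True,
        capture_output=True,
        check=False,
    )


# A shortened version of the form-filling workflow from the shell example.
FORM_STEPS = [
    ["goto", "https://example.com/form-filling"],
    ["fill", "#first-name", "Bob"],
    ["fill", "#email", "bob@example.com"],
    ["click", "#submit-btn"],
]

if __name__ == "__main__":
    if shutil.which("browser4-cli") is None:
        # CLI not installed: just show what would be sent.
        print(build_batch_payload(FORM_STEPS))
    else:
        result = run_batch(FORM_STEPS)
        print(result.returncode, result.stdout)
```

Validating the steps before serializing keeps malformed entries from reaching the CLI, where the resulting error would be harder to trace back to the generating code.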

Advanced commands

Some advanced commands are intentionally omitted from the global browser4-cli help summary. Query them explicitly when needed:

browser4-cli help batch
browser4-cli help extract
browser4-cli help summarize
browser4-cli help agent run
browser4-cli help swarm create

Browser4 CLI is designed for use by AI agents through SKILLS + CLI; see SKILL.md for the skill definition.

Agent and Swarm CLI

Browser4 CLI provides two high-level interfaces for complex, multi-step browser tasks beyond the standard single-action commands:

Agent CLI (agent <subcommand>) — Submit a natural-language task and let Browser4's backend AI agent plan and execute it autonomously. The agent reasons about the page, decides which actions to take, and completes the task asynchronously. Use this when you have a goal but don't know the exact page structure, or need multi-step exploration without scripting every action.

Swarm CLI (swarm <subcommand>) — Orchestrate parallel scraping and structured data extraction across multiple browser contexts. Built for high-throughput jobs: refreshing curated URL lists, supervised fan-out browsing, and repeatable selector-based scraping. Supports X-SQL for structured queries against loaded webpages.

| Interface | Model | Use when |
| --- | --- | --- |
| Standard commands | Single action per invocation | You know the exact refs/selectors and want precise control |
| Agent CLI | Natural-language task → autonomous execution | You have a goal but don't know the page structure; multi-step exploration |
| Swarm CLI | Parallel contexts + X-SQL queries | High-throughput scraping, structured extraction across many pages |

Agent CLI examples

Submit a natural-language task and let the backend agent reason, plan, and execute autonomously:

# Submit an autonomous task — returns a task ID immediately
browser4-cli agent run "Open example.com, find the sign-up form, fill it with test data, and take a screenshot"

# Poll progress with the returned task ID
browser4-cli agent status agent-task-1

# Read the final result once the task completes
browser4-cli agent result agent-task-1

What happens under the hood:

agent run sends the task to the Browser4 backend, which spawns an AI agent with tool access (navigate, click, type, snapshot, screenshot, extract, summarize, etc.).
The agent iteratively explores the page, takes snapshots, decides on actions, and executes them until the task is complete.
agent status returns the backend status payload (typically JSON with id, status, statusCode, processState, agentState, agentHistory, commandResult).
agent result returns the final task output — plain text or structured JSON depending on the task.

Key notes:

agent run is asynchronous: it returns immediately after the backend accepts the task.
agent run performs a quick post-submit probe so missing LLM/API key errors fail fast.
Agent commands are task-ID based and do not require an active CLI browser session slot.
Agent subcommands are not supported inside batch mode.
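
Because `agent run` returns before the task finishes, a caller usually needs a polling loop around `agent status`. The sketch below shows one way to do that with exponential backoff. It is an illustration under assumptions: the source names a `status` field but does not list its values, so the terminal values `completed` and `failed` are hypothetical, as are the helper names `fetch_status` and `wait_for_task`.

```python
import json
import subprocess
import time

# Hypothetical terminal values: the README names a `status` field but not its values.
ASSUMED_TERMINAL_STATUSES = {"completed", "failed"}


def fetch_status(task_id):
    """Run `browser4-cli agent status <id>` and parse stdout as JSON (assumed format)."""
    proc = subprocess.run(
        ["browser4-cli", "agent", "status", task_id],
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)


def wait_for_task(task_id, fetch=fetch_status, interval=2.0,
                  max_interval=30.0, timeout=600.0, sleep=time.sleep):
    """Poll until the status looks terminal, doubling the wait after each attempt."""
    waited = 0.0
    while True:
        payload = fetch(task_id)
        if str(payload.get("status", "")).lower() in ASSUMED_TERMINAL_STATUSES:
            return payload
        if waited >= timeout:
            raise TimeoutError(f"task {task_id} still running after {waited:.0f}s")
        sleep(interval)
        waited += interval
        interval = min(interval * 2, max_interval)
```

Passing `fetch` and `sleep` as parameters keeps the loop testable without a running backend; once it returns, `browser4-cli agent result <id>` can be called to read the output.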

Swarm CLI examples

Create a swarm session, submit URLs for scraping, and collect results at scale:

# 1) Create a swarm scrape session with parallel browser contexts
browser4-cli swarm create \
  --profile-mode=TEMPORARY \
  --max-open-tabs=12 \
  --max-browser-contexts=3 \
  --display-mode=HEADLESS

# 2) Submit URLs as scrape jobs (direct URL + seed file)
browser4-cli swarm submit https://example.com/direct \
  --seed-file=./urls.txt \
  --refresh --parse --store-content

# 3) Poll and fetch the result
browser4-cli swarm status scrape-task-4
browser4-cli swarm result scrape-task-4

Run X-SQL queries to extract structured data from loaded webpages:

# Inline X-SQL query
browser4-cli swarm query "https://www.amazon.com/dp/B08PP5MSVB" --sql "
  SELECT
    dom_base_uri(dom) AS url,
    dom_first_text(dom, '#productTitle') AS title,
    dom_first_slim_html(dom, 'img:expr(width > 400)') AS img
  FROM load_and_select(@url, 'body');
"

# Read query from a file
browser4-cli swarm query "https://www.amazon.com/dp/B08PP5MSVB" --sql @query.sql

# With seed file and load options
browser4-cli swarm query --sql @query.sql --seed-file=./urls.txt --refresh --parse
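
For scripted use, the same `swarm query` invocations can be assembled as an argument list, which sidesteps quoting problems with multi-line SQL. This is a sketch only: `build_swarm_query_argv` is a hypothetical helper, it uses just the flags shown in the examples above, and its rule that at least one of URL or seed file must be given is an assumption rather than documented CLI behavior.

```python
import tempfile
from pathlib import Path

# Shortened form of the inline query above; @url is left for the server to fill in.
X_SQL = """\
SELECT
  dom_base_uri(dom) AS url,
  dom_first_text(dom, '#productTitle') AS title
FROM load_and_select(@url, 'body');
"""


def build_swarm_query_argv(sql_file, url=None, seed_file=None, refresh=False, parse=False):
    """Build the argv for `browser4-cli swarm query` from the flags shown in the README."""
    if url is None and seed_file is None:
        # Assumption: the CLI needs at least one source of target URLs.
        raise ValueError("provide a URL, a seed file, or both")
    argv = ["browser4-cli", "swarm", "query"]
    if url is not None:
        argv.append(url)
    argv += ["--sql", f"@{sql_file}"]
    if seed_file is not None:
        argv.append(f"--seed-file={seed_file}")
    if refresh:
        argv.append("--refresh")
    if parse:
        argv.append("--parse")
    return argv


if __name__ == "__main__":
    # Write the query to a file and print the command that would be run.
    sql_path = Path(tempfile.mkdtemp()) / "query.sql"
    sql_path.write_text(X_SQL, encoding="utf-8")
    print(build_swarm_query_argv(sql_path, seed_file="./urls.txt", refresh=True, parse=True))
```

The resulting list can be passed directly to `subprocess.run` without going through a shell.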

Key notes:

Seed files are plain text, one URL per line; # comments and blank lines are ignored.
Both swarm submit and swarm query accept --seed-file, --deadline, --expires, --refresh, --parse, --store-content.
All swarm commands return a task ID; track progress with swarm status / swarm result.
Use @url in X-SQL templates — it is replaced with the target URL server-side.
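
The seed-file rules above (one URL per line, `#` comments and blank lines ignored) are easy to check locally before submitting a job. The sketch below mirrors those rules in Python; `parse_seed_lines` is a hypothetical helper, and it assumes a comment is a line whose first non-blank character is `#`, so a `#fragment` inside a URL is preserved.

```python
def parse_seed_lines(lines):
    """Return the URLs from seed-file lines, skipping blank lines and # comments."""
    urls = []
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue
        urls.append(line)
    return urls


SAMPLE = """\
# product pages
https://example.com/a

https://example.com/b#reviews
"""

for url in parse_seed_lines(SAMPLE.splitlines()):
    print(url)
```

Running a seed file through a check like this before `swarm submit` makes it clear how many jobs the file will actually produce.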

🚀 Build from source

Prerequisites: Java 17+

Clone the repository

git clone https://github.com/platonai/Browser4.git
cd Browser4

Configure your LLM API key

Edit application.properties and add your API key.

Build the project
```
./mvnw -DskipTests
```

🎬 YouTube:

📺 Bilibili: https://www.bilibili.com/video/BV1kM2rYrEFC

Auto Extraction

Automatic, large-scale, high-precision field discovery and extraction powered by self-/unsupervised machine learning — no LLM API calls, no tokens, deterministic and fast.

What it does:

Learns every extractable field on item/detail pages (often dozens to hundreds) with high precision.
Will be open-sourced once Browser4 reaches 10K stars on GitHub.

Why not just LLMs?

LLM extraction adds latency, cost, and token limits.
ML-based auto extraction is local, reproducible, and scalable to 100k ~ 200k pages/day.
You can still combine both: use Auto Extraction for a structured baseline and an LLM for semantic enrichment.

Quick Commands (PulsarRPAPro):

# NOTE: MongoDB required
curl -L -o PulsarRPAPro.jar https://github.com/platonai/PulsarRPAPro/releases/download/v3.0.0/PulsarRPAPro.jar

Integration Status:

Available today via the companion project PulsarRPAPro.
Native Browser4 API exposure is planned; follow releases for updates.

Key Advantages:

High precision: more than 95% of fields discovered, most of them with over 99% accuracy (indicative figures from tested domains).
Resilient to selector churn & HTML noise.
Zero external dependency (no API key) → cost-efficient at scale.
Explainable: generated selectors & SQL are transparent and auditable.

👽 Extract data with machine learning agents:

(Coming soon: richer in-repo examples and direct API hooks.)

📦 Modules Overview

| Module | Description |
| --- | --- |
| `cli` | CLI written in Rust, with SKILLS support |
| `browser4-core` | Core engine: sessions, scheduling, DOM, browser control |
| `browser4-agentic` | Agent implementation, MCP, and skill registration |
| `browser4-rest` | Spring Boot REST layer and command endpoints |
| `browser4-standalone` | Agent and crawler orchestration with product packaging |
| `examples` | Runnable examples and demos |
| `browser4-tests` | End-to-end, heavy integration, and scenario tests |

🤝 Support & Community

Join our community for support, feedback, and collaboration!

GitHub Discussions: Engage with developers and users.
Issue Tracker: Report bugs or request features.
Social Media: Follow us for updates and news.

We welcome contributions! See CONTRIBUTING.md for details.

📜 Documentation

Comprehensive documentation is available in the docs/ directory and on our GitHub Pages site.

🔧 Proxy Configuration - Unblock Website Access

Set the environment variable PROXY_ROTATION_URL to the rotation URL provided by your proxy service provider:

export PROXY_ROTATION_URL=https://your-proxy-provider.com/rotation-endpoint

Each time you access this rotation URL, it should return a response containing one or more fresh proxy IPs. If you need this type of URL, please contact your proxy service provider.
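
To check what a rotation URL returns before wiring it into Browser4, a small script can fetch it and list the proxies it finds. This is a sketch under a loud assumption: the response format is provider-specific and not described here, so the code assumes plain text containing IPv4 `host:port` entries. `parse_proxies` is a hypothetical helper and does not reflect how Browser4 itself parses the response.

```python
import os
import re
import urllib.request

# Assumed format: IPv4 host:port pairs somewhere in a plain-text body.
PROXY_PATTERN = re.compile(r"\b((?:\d{1,3}\.){3}\d{1,3}):(\d{1,5})\b")


def parse_proxies(body):
    """Extract (host, port) pairs, dropping out-of-range octets and ports."""
    proxies = []
    for host, port in PROXY_PATTERN.findall(body):
        octets_ok = all(0 <= int(octet) <= 255 for octet in host.split("."))
        if octets_ok and 0 < int(port) <= 65535:
            proxies.append((host, int(port)))
    return proxies


if __name__ == "__main__":
    rotation_url = os.environ.get("PROXY_ROTATION_URL")
    if rotation_url:
        with urllib.request.urlopen(rotation_url, timeout=10) as response:
            text = response.read().decode("utf-8", errors="replace")
        for host, port in parse_proxies(text):
            print(f"{host}:{port}")
```

If the provider returns JSON or authenticated proxy URLs instead, the pattern would need to be adapted accordingly.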

License

Apache 2.0 License. See LICENSE for details.
