💖 Browser4: a lightning-fast, coroutine-safe browser engine for your AI 💖
- 👽 Browser Agents — Fully autonomous browser agents that reason, plan, and execute end-to-end tasks.
- 🤖 Browser Automation — High-performance automation for workflows, navigation, and data extraction.
- ⚙️ Machine Learning Agent - Learns field structures across complex pages without consuming tokens.
- ⚡ Extreme Performance — Fully coroutine-safe; supports 100k ~ 200k complex page visits per machine per day.
- 🧬 Data Extraction — Hybrid of LLM, ML, and selectors for clean data across chaotic pages.
Ask an LLM agent to use browser4-cli for browser interactions, and it can carry out complex, multi-step tasks like this one:
$prompt = @"
Install https://raw.githubusercontent.com/platonai/Browser4/refs/heads/main/cli/skill/SKILL.md and use browser4-cli and perform the following task:
1. go to amazon.com
2. search for pens to draw on whiteboards
3. compare the first 4 ones
4. write the result to a markdown file
"@
copilot -p "$prompt"
# claude "$prompt"
Browser4 CLI is a powerful command-line interface for direct browser control and automation, designed for both human users and AI agents. It provides a simple syntax for complex browser interactions without writing code.
Browser4 CLI is compatible with Playwright and supports a wide range of commands for navigation, interaction, and data extraction. It can be used in scripts, terminal sessions, or integrated into AI agents through SKILLS.
Install browser4-cli globally with npm:
npm install -g browser4-cli
# Open browser4 without navigation
browser4-cli open
# Open in headed or headless mode
browser4-cli open --headed https://browser4.io
browser4-cli open --headless https://browser4.io
# Navigate to a page — auto-opens a session if none is active
browser4-cli goto https://playwright.dev
# Inspect the page — note the eN labels on interactive nodes
browser4-cli snapshot
# Interact using refs from the snapshot
browser4-cli click e15
browser4-cli type e15 "Hello World"
browser4-cli press e15 Enter
browser4-cli keydown Shift
browser4-cli mousemove 150 300
browser4-cli mousewheel 0 100
browser4-cli keyup Shift
# Take a screenshot and save it to disk
browser4-cli screenshot
# Use a custom server URL
browser4-cli open --server http://localhost:9090
# Execute multiple commands in one process
browser4-cli batch "goto https://playwright.dev" "snapshot"
# Stop on the first batch failure
browser4-cli batch --bail "goto https://playwright.dev" "click e1" "screenshot"
# Advanced: pipe batch commands as JSON via stdin, useful for form filling and complex workflows
echo '[
["goto", "https://example.com/form-filling"],
["click", "#reset-btn"],
["fill", "#first-name", "Bob"],
["fill", "#last-name", "Smith"],
["fill", "#email", "bob@example.com"],
["select", "#country", "uk"],
["check", "#agree-terms"],
["click", "#submit-btn"]
]' | browser4-cli batch --json
# Close the session when done
browser4-cli close
Some advanced commands are intentionally omitted from the global browser4-cli help summary.
Query them explicitly when needed:
browser4-cli help batch
browser4-cli help extract
browser4-cli help summarize
browser4-cli help agent run
browser4-cli help swarm create
Browser4 CLI is designed for use by AI agents through SKILLS + CLI.
Browser4 CLI provides two high-level interfaces for complex, multi-step browser tasks beyond the standard single-action commands:
Agent CLI (agent <subcommand>) — Submit a natural-language task and let Browser4's backend AI agent plan and execute it autonomously. The agent reasons about the page, decides which actions to take, and completes the task asynchronously. Use this when you have a goal but don't know the exact page structure, or need multi-step exploration without scripting every action.
Swarm CLI (swarm <subcommand>) — Orchestrate parallel scraping and structured data extraction across multiple browser contexts. Built for high-throughput jobs: refreshing curated URL lists, supervised fan-out browsing, and repeatable selector-based scraping. Supports X-SQL for structured queries against loaded webpages.
| Interface | Model | Use when |
|---|---|---|
| Standard commands | Single action per invocation | You know the exact refs/selectors and want precise control |
| Agent CLI | Natural-language task → autonomous execution | You have a goal but don't know the page structure; multi-step exploration |
| Swarm CLI | Parallel contexts + X-SQL queries | High-throughput scraping, structured extraction across many pages |
Submit a natural-language task and let the backend agent reason, plan, and execute autonomously:
# Submit an autonomous task — returns a task ID immediately
browser4-cli agent run "Open example.com, find the sign-up form, fill it with test data, and take a screenshot"
# Poll progress with the returned task ID
browser4-cli agent status agent-task-1
# Read the final result once the task completes
browser4-cli agent result agent-task-1
What happens under the hood:
- `agent run` sends the task to the Browser4 backend, which spawns an AI agent with tool access (navigate, click, type, snapshot, screenshot, extract, summarize, etc.).
- The agent iteratively explores the page, takes snapshots, decides on actions, and executes them until the task is complete.
- `agent status` returns the backend status payload (typically JSON with `id`, `status`, `statusCode`, `processState`, `agentState`, `agentHistory`, `commandResult`).
- `agent result` returns the final task output: plain text or structured JSON, depending on the task.
Key notes:
- `agent run` is asynchronous: it returns immediately after the backend accepts the task.
- `agent run` performs a quick post-submit probe so that missing LLM/API key errors fail fast.
- Agent commands are task-ID based and do not require an active CLI browser session slot.
- Agent subcommands are not supported inside `batch` mode.
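Because `agent run` is asynchronous, a script typically polls `agent status` until the task finishes and then calls `agent result`. The shell sketch below shows one way to do that. It is not part of browser4-cli: the function name `wait_for_agent_task`, the `B4_CLI` override, and the idea of matching a caller-supplied regular expression against the status payload are all hypothetical, since the exact status values are not listed here.

```shell
# Hypothetical helper: poll an agent task until the status output matches
# a caller-supplied pattern, then print the final result.
# Args: TASK_ID DONE_PATTERN [MAX_ATTEMPTS=60] [INTERVAL_SECONDS=5]
# B4_CLI (assumed, optional) points at the CLI executable; defaults to browser4-cli.
wait_for_agent_task() {
  local task_id="$1"
  local done_pattern="$2"
  local max_attempts="${3:-60}"
  local interval="${4:-5}"
  local cli="${B4_CLI:-browser4-cli}"
  local attempt=1
  local status_output

  while [ "$attempt" -le "$max_attempts" ]; do
    # Fetch the current status payload; give up (exit code 2) if the CLI call fails.
    status_output="$("$cli" agent status "$task_id")" || return 2

    # Assumed completion check: the caller decides what "finished" looks like.
    if printf '%s\n' "$status_output" | grep -Eq -- "$done_pattern"; then
      "$cli" agent result "$task_id"
      return $?
    fi

    # Wait before the next poll, but not after the final attempt.
    if [ "$attempt" -lt "$max_attempts" ]; then
      sleep "$interval"
    fi
    attempt=$((attempt + 1))
  done

  echo "Timed out waiting for agent task $task_id" >&2
  return 1
}
```

For example, `wait_for_agent_task agent-task-1 'YOUR_DONE_PATTERN' 60 5` polls up to 60 times at 5-second intervals; replace the placeholder pattern with one that matches the terminal status your backend actually reports.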
Create a swarm session, submit URLs for scraping, and collect results at scale:
# 1) Create a swarm scrape session with parallel browser contexts
browser4-cli swarm create \
--profile-mode=TEMPORARY \
--max-open-tabs=12 \
--max-browser-contexts=3 \
--display-mode=HEADLESS
# 2) Submit URLs as scrape jobs (direct URL + seed file)
browser4-cli swarm submit https://example.com/direct \
--seed-file=./urls.txt \
--refresh --parse --store-content
# 3) Poll and fetch the result
browser4-cli swarm status scrape-task-4
browser4-cli swarm result scrape-task-4
Run X-SQL queries to extract structured data from loaded webpages:
# Inline X-SQL query
browser4-cli swarm query "https://www.amazon.com/dp/B08PP5MSVB" --sql "
SELECT
dom_base_uri(dom) AS url,
dom_first_text(dom, '#productTitle') AS title,
dom_first_slim_html(dom, 'img:expr(width > 400)') AS img
FROM load_and_select(@url, 'body');
"
# Read query from a file
browser4-cli swarm query "https://www.amazon.com/dp/B08PP5MSVB" --sql @query.sql
# With seed file and load options
browser4-cli swarm query --sql @query.sql --seed-file=./urls.txt --refresh --parse
Key notes:
- Seed files are plain text, one URL per line; `#` comments and blank lines are ignored.
- Both `swarm submit` and `swarm query` accept `--seed-file`, `--deadline`, `--expires`, `--refresh`, `--parse`, and `--store-content`.
- All swarm commands return a task ID; track progress with `swarm status` / `swarm result`.
- Use `@url` in X-SQL templates; it is replaced with the target URL server-side.
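The `--sql @query.sql` and `--seed-file=./urls.txt` options read their input from files. The shell sketch below writes both files for the query example above, reusing the URLs and X-SQL from this section. The file contents are illustrative, and the final `grep` only approximates the documented seed-file rules locally; it is not how Browser4 itself parses the file.

```shell
# Illustrative input files for:
#   browser4-cli swarm query --sql @query.sql --seed-file=./urls.txt --refresh --parse

# Seed file: one URL per line; '#' comments and blank lines are ignored.
cat > urls.txt <<'EOF'
# Product pages to query (example URLs)
https://www.amazon.com/dp/B08PP5MSVB

https://example.com/direct
EOF

# X-SQL query file; @url is replaced with each target URL server-side.
cat > query.sql <<'EOF'
SELECT
  dom_base_uri(dom) AS url,
  dom_first_text(dom, '#productTitle') AS title,
  dom_first_slim_html(dom, 'img:expr(width > 400)') AS img
FROM load_and_select(@url, 'body');
EOF

# Local sanity check (an approximation, not the CLI's parser):
# count the lines that are neither blank nor comments.
grep -Ev '^[[:space:]]*(#|$)' urls.txt | wc -l
```

Quoting the heredoc delimiter (`<<'EOF'`) keeps the shell from expanding anything inside the X-SQL text.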
Prerequisites: Java 17+
- Clone the repository:
git clone https://github.com/platonai/Browser4.git
cd Browser4
- Configure your LLM API key: edit application.properties and add your API key.
- Build the project:
./mvnw -DskipTests
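Since the build requires Java 17+, a quick pre-flight check can save a failed Maven run. This is a minimal sketch, assuming `java -version` prints a quoted version string such as `"17.0.8"` or the legacy `"1.8.0_392"` form; the helper names `java_major_version` and `require_java17` are hypothetical and not part of the repository.

```shell
# Extract the major version from a Java version string.
# "1.8.0_392" -> 8 (legacy scheme), "17.0.8" -> 17, "21" -> 21
java_major_version() {
  local version="$1"
  case "$version" in
    1.*) version="${version#1.}" ;;  # legacy 1.x scheme: drop the "1." prefix
  esac
  echo "${version%%[.+_-]*}"         # keep everything before the first separator
}

# Fail early if the java on PATH is missing or older than 17.
require_java17() {
  local raw major
  if ! command -v java >/dev/null 2>&1; then
    echo "java not found on PATH" >&2
    return 1
  fi
  # java -version writes to stderr; the first line holds the quoted version.
  raw="$(java -version 2>&1 | head -n 1 | sed -E 's/.*version "([^"]+)".*/\1/')"
  major="$(java_major_version "$raw")"
  if [ "$major" -ge 17 ] 2>/dev/null; then
    echo "Java $raw detected: OK"
  else
    echo "Java 17+ is required (found: $raw)" >&2
    return 1
  fi
}
```

From the repository root, `require_java17 && ./mvnw -DskipTests` runs the build only when the check passes.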
📺 Bilibili: https://www.bilibili.com/video/BV1kM2rYrEFC
Automatic, large-scale, high-precision field discovery and extraction powered by self-/unsupervised machine learning — no LLM API calls, no tokens, deterministic and fast.
What it does:
- Learns every extractable field on item/detail pages (often dozens to hundreds) with high precision.
- To be open-sourced once Browser4 reaches 10K stars on GitHub.
Why not just LLMs?
- LLM extraction adds latency, cost, and token limits.
- ML-based auto extraction is local, reproducible, and scales to 100k ~ 200k pages per day.
- You can still combine both: use Auto Extraction for structured baseline + LLM for semantic enrichment.
Quick Commands (PulsarRPAPro):
# NOTE: MongoDB required
curl -L -o PulsarRPAPro.jar https://github.com/platonai/PulsarRPAPro/releases/download/v3.0.0/PulsarRPAPro.jar
Integration Status:
- Available today via the companion project PulsarRPAPro.
- Native Browser4 API exposure is planned; follow releases for updates.
Key Advantages:
- High precision: >95% of fields discovered, most with >99% accuracy (indicative, on tested domains).
- Resilient to selector churn & HTML noise.
- Zero external dependency (no API key) → cost-efficient at scale.
- Explainable: generated selectors & SQL are transparent and auditable.
👽 Extract data with machine learning agents:
(Coming soon: richer in-repo examples and direct API hooks.)
| Module | Description |
|---|---|
| cli | CLI in Rust that supports SKILLS |
| browser4-core | Core engine: sessions, scheduling, DOM, browser control |
| browser4-agentic | Agent implementation, MCP, and skill registration |
| browser4-rest | Spring Boot REST layer and command endpoints |
| browser4-standalone | Agent and crawler orchestration with product packaging |
| examples | Runnable examples and demos |
| browser4-tests | E2E, heavy integration, and scenario tests |
Join our community for support, feedback, and collaboration!
- GitHub Discussions: Engage with developers and users.
- Issue Tracker: Report bugs or request features.
- Social Media: Follow us for updates and news.
We welcome contributions! See CONTRIBUTING.md for details.
Comprehensive documentation is available in the docs/ directory and on our GitHub Pages site.
Set the environment variable PROXY_ROTATION_URL to the rotation URL provided by your proxy service provider:
export PROXY_ROTATION_URL=https://your-proxy-provider.com/rotation-endpoint
Each time you access this rotation URL, it should return a response containing one or more fresh proxy IPs. If you need this type of URL, please contact your proxy service provider.
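Before starting a crawl, it can help to confirm that the endpoint really returns fresh proxies on each request. A minimal sketch: the function name `check_proxy_rotation` is hypothetical, and it only prints the raw responses because the response format is provider-specific.

```shell
# Hypothetical helper: fetch the rotation endpoint twice so you can compare
# the returned proxies by eye. Exit codes: 1 = variable unset, 2 = request failed.
check_proxy_rotation() {
  if [ -z "${PROXY_ROTATION_URL:-}" ]; then
    echo "PROXY_ROTATION_URL is not set" >&2
    return 1
  fi
  local i
  for i in 1 2; do
    echo "--- response $i ---"
    # -f: fail on HTTP errors, -sS: quiet but still report errors, -m 10: 10-second timeout
    curl -fsS -m 10 "$PROXY_ROTATION_URL" || return 2
    echo
  done
}
```

If both responses list the same IPs, ask your provider whether the endpoint rotates per request or on a timer.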
Apache 2.0 License. See LICENSE for details.

