Merged
15 changes: 10 additions & 5 deletions README.md
@@ -10,7 +10,7 @@ Shadow DOM + iframes), multi-step plans, structured extraction, visual diff, and
guardrails** for payments and bookings. It drives real Chromium, so it reads **Next.js / SPA**
pages after hydration — not just static HTML.

> 37 MCP tools · stealth + rotating proxies · HTTP fast-path (single, batch & crawl) · full-site content + screenshot snapshots · virtualized-list scraping · HAR record/replay · pixel visual-diff · human handoff + live view.
> 44 MCP tools · stealth + rotating proxies · HTTP fast-path (single, batch & crawl) · full-site content + screenshot snapshots · structured per-card product extraction · virtualized-list scraping + autoscroll · tabs / dialogs / downloads · console + network logs · MCP screenshot resources · `FUSE_CAPS` tool-group filtering · named auth profiles · `blockResources` · HAR record/replay · pixel visual-diff · human handoff + live view.

## Install

@@ -37,13 +37,15 @@ Prefer a terminal? Install the CLI: `npm i -g @fusengine/browser-mcp`
```bash
fuse-browser probe https://example.com --extract-prices
fuse-browser fetch https://books.toscrape.com/ --extract-prices # no browser, ~10× faster
fuse-browser products "https://www.digitec.ch/en/search?q=macbook" --limit 20 # structured cards → sort to find the cheapest
```

## How it works

An LLM runs a **perceive → decide → act** loop through the tools: `browser_open` →
`browser_navigate` → `browser_snapshot` (indexed `ref`s + form state) → `browser_act`
(click/fill/select/pick, returns a page diff) → `browser_wait_for` → `browser_extract` /
(click/fill/select/pick, returns a page diff) → `browser_wait_for` → `browser_autoscroll`
(drain lazy lists) → `browser_products` / `browser_collect` / `browser_extract` /
`browser_screenshot`. Sensitive actions (pay / book / checkout) are **blocked** unless the
agent passes `humanApproved`.
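
As a rough illustration of the tail of this loop, the sketch below builds the JSON-RPC `tools/call` requests an MCP client might send for `browser_autoscroll` followed by `browser_products`. The envelope shape follows the MCP specification. The `toolCall` helper, the request ids, and the `s_abc123` session id are illustrative assumptions and are not part of this project.

```typescript
// Hypothetical helper type: an MCP `tools/call` request envelope.
interface ToolCallRequest {
  jsonrpc: "2.0";
  id: number;
  method: "tools/call";
  params: { name: string; arguments: Record<string, unknown> };
}

// Hypothetical helper: wraps a tool name and its arguments in the envelope.
function toolCall(id: number, name: string, args: Record<string, unknown>): ToolCallRequest {
  return { jsonrpc: "2.0", id, method: "tools/call", params: { name, arguments: args } };
}

// Placeholder session id, standing in for the value returned by `browser_open`.
const sessionId = "s_abc123";

// Drain the lazy list first, then extract the per-card rows.
const requests: ToolCallRequest[] = [
  toolCall(1, "browser_autoscroll", { sessionId, untilSelector: ".result-card", minCount: 100 }),
  toolCall(2, "browser_products", { sessionId, limit: 20 }),
];

console.log(JSON.stringify(requests, null, 2));
```

In practice an MCP client library would normally build these envelopes for you; the point here is only the order of the calls and the shared `sessionId`.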

@@ -52,18 +54,21 @@ agent passes `humanApproved`.
- **Stealth** — Patchright neutralizes the real automation signals; per-country identity + rotating proxy pool.
- **Agentic targeting** — accessibility-style snapshot with stable refs, self-healing click/fill, multi-step plans.
- **Vision (Set-of-Marks)** — `annotate:true` on `browser_snapshot`/`browser_act`/`browser_screenshot` draws numbered badges (= each `ref`) on the page, so vision models *see* it and target by ref.
- **Sees everything** — open Shadow DOM, same/cross-origin iframes, and **virtualized/infinite lists** (`browser_collect`).
- **Sees everything** — open Shadow DOM, same/cross-origin iframes, and **virtualized/infinite lists** (`browser_collect`, `browser_autoscroll` to drain lazy-loaded results first).
- **Structured extraction** — `browser_products` pulls **per-card** rows (`{title, price, currency, url?}`, each price tied to its own title) by detecting repeated card containers — works on Digitec, Booking, Amazon… Sort by price to answer "which is the cheapest?". **Layout-agnostic prices**: prefix/suffix currency, thousands/decimal markup, CH/EU formats. Also exposed as the CLI `products` command.
- **Full session control** — multi-tab (`browser_tabs`: list/new/select/close popups & OAuth windows), native dialog policy (`browser_dialog`), captured `browser_downloads`, plus `browser_console` / `browser_network` logs to debug why a page misbehaves.
- **Fast-path** — `browser_fetch` impersonates a real Chrome TLS fingerprint for server-rendered HTML, no browser launch — returns clean **markdown** and optional **contacts** (`extractContacts`) at ~HTTP speed. **JSON APIs / plain text** come back verbatim (no HTML mangling). Opt-in **`browserFallback`** auto-renders client-side (SPA/CSR) pages in a real browser when the HTTP response is an empty shell (`escalated: true`). **`browser_fetch_batch`** fetches many URLs in parallel (bounded concurrency, errors isolated per URL). **`browser_crawl`** walks a whole site (bounded same-origin BFS, robots-honored) → clean markdown per page. **`browser_shots_batch`** captures responsive full-page screenshots of many URLs in parallel (see the design of a whole set of pages at once). **`browser_collect_batch`** exhausts the infinite-scroll list of many listing URLs at once (crawl finds the pages, collect drains them). **`browser_site_shots`** snapshots a whole site in one call — crawl + screenshot each page, returning content **and** responsive PNGs per page.
- **Data out** — multi-currency prices, typed CSS extraction, **contact extraction** (emails/phones E.164, `fastPathFirst` cascade), a clean→validate→dedupe→emit pipeline, CSV export, Google SERP rank tracking.
- **Ops** — persistent sessions, **auto crash recovery** (a crashed page is recreated in the same context and restored to its last URL between calls), opt-in **per-host circuit breaker** + **bounded probe queue/budget** + **`browser_metrics`** for mass scraping, **live view** (watch any session — even headless — in your browser), `storageState` auto-save, HAR record/replay, pixel `visual_diff`, human handoff for login/2FA.
- **Ops** — persistent sessions, **auto crash recovery** (a crashed page is recreated in the same context and restored to its last URL between calls), opt-in **per-host circuit breaker** + **bounded probe queue/budget** + **`browser_metrics`** for mass scraping, **live view** (watch any session — even headless — in your browser), **`screenshot://{sessionId}/last` MCP resource** (read a session's current page as a JPEG on demand), `storageState` auto-save, **named auth profiles** (`profile`), **`blockResources`** to skip images/fonts/etc. on batch runs, HAR record/replay, pixel `visual_diff`, human handoff for login/2FA.
- **Context control** — **`FUSE_CAPS`** registers only the tool groups you need (`core`/`batch`/`extract`/`debug`/`live`) for a lighter LLM context, and the batch tools emit MCP **progress notifications** when the client sends a `progressToken`.

## Documentation

Full reference in **[`docs/`](./docs/README.md)**:

[Installation](./docs/installation.md) ·
[CLI](./docs/cli.md) ·
[MCP tools (37)](./docs/mcp-tools.md) ·
[MCP tools (44)](./docs/mcp-tools.md) ·
[Configuration](./docs/configuration.md) ·
[Sessions](./docs/sessions.md) ·
[Extraction](./docs/extraction.md) ·
4 changes: 2 additions & 2 deletions docs/README.md
@@ -6,8 +6,8 @@ New here? Start with the root [README](../README.md), then dive in:
| Doc | What's inside |
| --- | --- |
| [Installation](./installation.md) | Requirements, install, Chromium, MCP registration, the three ways to get a browser |
| [CLI](./cli.md) | `probe` / `fetch` / `fetch-batch` / `crawl` / `collect-batch` / `shots` / `shots-batch` / `site-shots` / `serp-batch` + every flag |
| [MCP tools](./mcp-tools.md) | All 37 tools with parameters and examples |
| [CLI](./cli.md) | `probe` / `fetch` / `fetch-batch` / `crawl` / `collect-batch` / `shots` / `shots-batch` / `site-shots` / `serp-batch` + one-shot page commands (`run` / `products` / `extract` / `snapshot` / `screenshot` / `inspect`) + every flag |
| [MCP tools](./mcp-tools.md) | All 44 tools with parameters and examples |
| [Configuration](./configuration.md) | `AgentOptions`, `FUSE_*` env vars, identity, retry, output location |
| [Sessions](./sessions.md) | Session lifecycle, auto crash recovery, `storageState` auto-save, HAR record/replay, CDP attach |
| [Extraction](./extraction.md) | `browser_extract` / `extract_schema` / `collect` + the clean→validate→dedupe→emit pipeline |
2 changes: 1 addition & 1 deletion docs/cli.md
@@ -1,6 +1,6 @@
# CLI

`fuse-browser` is a command-line front-end for the browser agent. It exposes nine one-shot subcommands (`probe`, `fetch`, `fetch-batch`, `crawl`, `collect-batch`, `serp-batch`, `shots`, `shots-batch`, `site-shots`) that all share a single flag parser (`node:util` `parseArgs`, strict mode), so any flag is accepted globally but only consumed by the subcommands documented below. Session-based interaction (open/navigate/click/products/autoscroll/…) is exposed through the MCP server (`browser-mcp`), not the CLI.
`fuse-browser` is a command-line front-end for the browser agent. It exposes 15 one-shot subcommands: nine batch/fast-path commands (`probe`, `fetch`, `fetch-batch`, `crawl`, `collect-batch`, `serp-batch`, `shots`, `shots-batch`, `site-shots`) plus six [page commands](#page-commands-one-shot) (`run`, `products`, `extract`, `snapshot`, `screenshot`, `inspect`). They all share a single flag parser (`node:util` `parseArgs`, strict mode), so any flag is accepted globally but only consumed by the subcommands documented below. Stateful, multi-turn session interaction (open → navigate → click → snapshot → …) is exposed through the MCP server (`browser-mcp`), not the CLI.

```
fuse-browser probe <url> [flags]
2 changes: 2 additions & 0 deletions docs/configuration.md
@@ -78,6 +78,8 @@ Read by `envAgentDefaults` (`src/server/env-defaults.ts`) and the proxy loader (
| `FUSE_STORAGE_STATE` | `storageStatePath` | Path to a storage-state JSON. |
| `FUSE_OUTPUT_DIR` | `outputDir` | Override the artifact output directory. |
| `FUSE_PROXIES` | proxy pool | Comma- or newline-separated proxy URLs; deduped, blanks dropped. Merged with `proxiesPath`. Treat as a secret. |
| `FUSE_CAPS` | tool-group filter | Comma-separated [capability groups](./mcp-tools.md#capability-groups-fuse_caps) to register (`core`/`batch`/`extract`/`debug`/`live`). Case-insensitive, whitespace-tolerant; unknown names are ignored. Blank/unset (or only-unknown) = all 44 tools. Server-only (no per-call/library equivalent). |
| `FUSE_NETLOG_MAX` | network/console log cap | Max entries kept per session in `browser_console` / `browser_network` (oldest dropped). Positive integer; default `250`. |
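
The `FUSE_NETLOG_MAX` row describes a per-session cap where the oldest entries are dropped. A minimal sketch of that behaviour follows, assuming an invalid or missing value falls back to the documented default of `250`. The names `parseNetlogMax` and `BoundedLog` are hypothetical and are not taken from the project's source.

```typescript
// Documented default for FUSE_NETLOG_MAX.
const DEFAULT_NETLOG_MAX = 250;

// Hypothetical parser: accepts positive integers only.
// Assumption: anything else (unset, blank, zero, negative, non-numeric) uses the default.
function parseNetlogMax(raw: string | undefined): number {
  const n = Number(raw);
  return Number.isInteger(n) && n > 0 ? n : DEFAULT_NETLOG_MAX;
}

// Hypothetical bounded buffer: keeps at most `max` entries, dropping the oldest first.
class BoundedLog<T> {
  private entries: T[] = [];
  private readonly max: number;

  constructor(max: number) {
    this.max = max;
  }

  push(entry: T): void {
    this.entries.push(entry);
    if (this.entries.length > this.max) {
      this.entries.shift(); // drop the oldest entry
    }
  }

  snapshot(): T[] {
    return [...this.entries];
  }
}

const log = new BoundedLog<string>(parseNetlogMax("3"));
for (const line of ["a", "b", "c", "d", "e"]) log.push(line);
console.log(log.snapshot());
```

With a cap of 3 and five pushes, only the three most recent entries remain.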

### MCP config example

44 changes: 40 additions & 4 deletions docs/mcp-tools.md
@@ -1,10 +1,11 @@
# MCP tools

Complete reference for the 37 `browser_*` tools exposed by the fuse-browser MCP server.
Complete reference for the 44 `browser_*` tools exposed by the fuse-browser MCP server.

Tools fall into three families:

- **One-shot / fast-path** (`browser_probe`, `browser_probe_html`, `browser_fetch`, `browser_fetch_batch`, `browser_crawl`, `browser_collect_batch`, `browser_shots_batch`, `browser_site_shots`, `browser_serp_batch`) open a fresh browser (or do a pure HTTP fetch) per call and return a report. No session id needed.
- **Structured extraction** (`browser_products`, `browser_collect`, `browser_extract`, `browser_extract_schema`) and `browser_autoscroll` (drain lazy lists) run against a live session.
- **Session tools** require a `sessionId` obtained from `browser_open` (or `browser_connect`). They drive one persistent, stateful page.

Every field is optional unless **Required** says `yes`. Defaults shown below come from the tool itself; many can also be set globally via `FUSE_*` environment variables — see [configuration](./configuration.md). Per-call arguments always override env defaults.
@@ -13,13 +14,13 @@ The shared identity/profile options (the `agentOptionShape`) are listed once und

## Capability groups (`FUSE_CAPS`)

By default all 37 tools are registered. Set the `FUSE_CAPS` env var (comma-separated group names) to expose fewer tools — a lighter context for the LLM client:
By default all 44 tools are registered. Set the `FUSE_CAPS` env var (comma-separated group names) to expose fewer tools — a lighter context for the LLM client:

| Group | Tools |
| --- | --- |
| `core` | Session lifecycle (`browser_open`/`browser_status`/`browser_close`/`browser_connect`), navigation (`browser_navigate`/`browser_back`/`browser_forward`), actions (`browser_click`/`browser_fill`/`browser_login`/`browser_scroll`/`browser_press`/`browser_select`), `browser_tabs`, `browser_dialog`/`browser_downloads`, `browser_snapshot`/`browser_act`, `browser_wait`/`browser_wait_for`, `browser_screenshot`. |
| `core` | Session lifecycle (`browser_open`/`browser_status`/`browser_close`/`browser_connect`), navigation (`browser_navigate`/`browser_back`/`browser_forward`), actions (`browser_click`/`browser_fill`/`browser_login`/`browser_scroll`/`browser_press`/`browser_select`), `browser_tabs`, `browser_dialog`/`browser_downloads`, `browser_snapshot`/`browser_act`, `browser_wait`/`browser_wait_for`, `browser_screenshot`, `browser_autoscroll`. |
| `batch` | `browser_probe`, `browser_probe_html`, `browser_fetch`, `browser_fetch_batch`, `browser_crawl`, `browser_collect_batch`, `browser_shots_batch`, `browser_site_shots`, `browser_serp_batch`. |
| `extract` | `browser_collect`, `browser_run`, `browser_extract`, `browser_extract_schema`. |
| `extract` | `browser_collect`, `browser_run`, `browser_extract`, `browser_extract_schema`, `browser_products`. |
| `debug` | `browser_inspect`, `browser_console`, `browser_network`, `browser_visual_diff`, `browser_metrics`. |
| `live` | `browser_handoff`, `browser_live_view`, `browser_live_view_stop`. |
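
The configuration reference describes `FUSE_CAPS` as comma-separated, case-insensitive, and whitespace-tolerant, with unknown names ignored and a blank or only-unknown value meaning all tools. The sketch below is one way such parsing could look. The function name `parseCaps` and its return type are assumptions for illustration, not the server's actual code.

```typescript
// The five documented capability groups.
const KNOWN_GROUPS = ["core", "batch", "extract", "debug", "live"] as const;
type Group = (typeof KNOWN_GROUPS)[number];

// Hypothetical parser for the FUSE_CAPS value.
function parseCaps(raw: string | undefined): Set<Group> {
  const requested = (raw ?? "")
    .split(",")
    .map((name) => name.trim().toLowerCase())
    // Unknown names (including empty strings) are silently ignored.
    .filter((name): name is Group => (KNOWN_GROUPS as readonly string[]).includes(name));

  // Blank, unset, or only-unknown input falls back to every group.
  return new Set<Group>(requested.length > 0 ? requested : KNOWN_GROUPS);
}

console.log([...parseCaps(" Core , EXTRACT, bogus ")]);
console.log([...parseCaps(undefined)]);
```

Returning a `Set` keeps duplicate group names harmless, so `core,core` behaves the same as `core`.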

@@ -588,6 +589,25 @@ The optional `pipeline` runs a declarative clean→validate→dedupe→emit pass
{ "sessionId": "s_abc123", "item": ".result-card", "extractPrices": true, "maxSteps": 20 }
```

### browser_autoscroll

Repeatedly scroll a long / infinite list to the bottom to trigger lazy-load until it stabilises — run it **before** `browser_extract` / `browser_collect` / `browser_products` on lazy-loaded result pages so every item is in the DOM. Stops after `idleRounds` rounds without growth, at `maxScrolls`, or once `untilSelector` reaches `minCount` elements.

| Param | Type | Required | Description |
| --- | --- | --- | --- |
| `sessionId` | string | yes | Target session. |
| `maxScrolls` | integer | no | Hard cap on scroll rounds. |
| `idleRounds` | integer | no | Stop after this many rounds with no height growth. |
| `untilSelector` | string | no | Stop once this selector reaches `minCount` matches. |
| `minCount` | integer | no | Element count target for `untilSelector`. |
| `delayMs` | integer | no | Pause between scroll rounds. |

Returns `{ rounds, height, url }`.

```json
{ "sessionId": "s_abc123", "untilSelector": ".result-card", "minCount": 100 }
```
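
The three stop conditions can be modelled without a browser. The sketch below is a simplified simulation, not the tool's implementation: `scrollOnce` is a hypothetical callback standing in for one scroll round, and the order in which the conditions are checked is an assumption.

```typescript
// Hypothetical observation after one scroll round.
interface RoundObservation {
  height: number;  // page height after the scroll
  matches: number; // elements currently matching `untilSelector`
}

interface StopOptions {
  maxScrolls: number;
  idleRounds: number;
  minCount?: number;
}

type StopReason = "idleRounds" | "maxScrolls" | "minCount";

function simulateAutoscroll(
  scrollOnce: () => RoundObservation,
  opts: StopOptions,
): { rounds: number; height: number; reason: StopReason } {
  let height = 0;
  let idle = 0;
  for (let round = 1; round <= opts.maxScrolls; round++) {
    const obs = scrollOnce();
    // A round is "idle" when the page did not grow.
    idle = obs.height > height ? 0 : idle + 1;
    height = Math.max(height, obs.height);
    if (opts.minCount !== undefined && obs.matches >= opts.minCount) {
      return { rounds: round, height, reason: "minCount" };
    }
    if (idle >= opts.idleRounds) {
      return { rounds: round, height, reason: "idleRounds" };
    }
  }
  return { rounds: opts.maxScrolls, height, reason: "maxScrolls" };
}

// Fake page: grows for two rounds, then stops loading new items.
const heights = [1000, 2000, 2000, 2000];
let i = 0;
const fakeScroll = (): RoundObservation => {
  const h = heights[Math.min(i++, heights.length - 1)];
  return { height: h, matches: h / 50 };
};

console.log(simulateAutoscroll(fakeScroll, { maxScrolls: 10, idleRounds: 2 }));
```

With the fake page above, the run ends after the fourth round because two consecutive rounds saw no height growth.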

---

## Extract
@@ -626,6 +646,22 @@ Extract typed data from the live page via a field map. Deterministic; reads the
}
```

### browser_products

Extract structured **per-card** product rows from an e-commerce / search-results page: one `{title, price, currency, url?}` per card, each price tied to its own title (unlike flat price scraping). Generic — detects repeated card containers by structure, so it works on Digitec, Booking, Amazon… Prices are parsed **layout-agnostically** (prefix/suffix currency, thousands/decimal markup, CH/EU formats). Sort the rows by price to answer "which is the cheapest?". Also exposed as the CLI `products` command.

| Param | Type | Required | Description |
| --- | --- | --- | --- |
| `sessionId` | string | yes | Target session. |
| `limit` | integer | no | Cap the number of returned rows. |
| `containerSelector` | string | no | Pin the card-container selector (auto-detected otherwise). |

Returns `{ url, count, products: [{ title, price, currency, url? }] }`.

```json
{ "sessionId": "s_abc123", "limit": 20 }
```
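
To answer "which is the cheapest?", a client can sort the returned rows itself. The sketch below assumes `price` arrives as a number and compares rows within a single currency only. The `cheapest` helper and the sample rows are made up for illustration and are not real tool output.

```typescript
// Shapes mirror the documented return value; `price` as a number is an assumption.
interface Product {
  title: string;
  price: number;
  currency: string;
  url?: string;
}

interface ProductsResult {
  url: string;
  count: number;
  products: Product[];
}

// Hypothetical helper: cheapest row in one currency, or undefined if none match.
function cheapest(result: ProductsResult, currency: string): Product | undefined {
  return result.products
    .filter((p) => p.currency === currency && Number.isFinite(p.price))
    .sort((a, b) => a.price - b.price)[0]; // filter() returned a copy, so the input is not mutated
}

// Made-up sample data for illustration only.
const sample: ProductsResult = {
  url: "https://shop.example/search?q=laptop",
  count: 3,
  products: [
    { title: "Laptop A", price: 1499, currency: "CHF" },
    { title: "Laptop B", price: 1299.9, currency: "CHF" },
    { title: "Laptop C", price: 999, currency: "EUR" },
  ],
};

console.log(cheapest(sample, "CHF")?.title);
```

Filtering by currency first avoids comparing, say, CHF and EUR amounts as if they were the same unit.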

---

## SERP
Expand Down