From 0d26a3fd43d3b2147ddf97afe8b7d00a4c1d9d80 Mon Sep 17 00:00:00 2001 From: Bruno Azoulay Date: Thu, 11 Jun 2026 13:13:05 +0200 Subject: [PATCH] docs: sync README + docs to the real surface (44 MCP tools, 15 CLI commands) 37 to 44 MCP tools (matches registry.ts + mcp.test.ts); add browser_products + browser_autoscroll to mcp-tools.md; document FUSE_CAPS + FUSE_NETLOG_MAX; cli.md 9 to 15 commands. --- README.md | 15 ++++++++++----- docs/README.md | 4 ++-- docs/cli.md | 2 +- docs/configuration.md | 2 ++ docs/mcp-tools.md | 44 +++++++++++++++++++++++++++++++++++++++---- 5 files changed, 55 insertions(+), 12 deletions(-) diff --git a/README.md b/README.md index ca676fb..aeb5d35 100644 --- a/README.md +++ b/README.md @@ -10,7 +10,7 @@ Shadow DOM + iframes), multi-step plans, structured extraction, visual diff, and guardrails** for payments and bookings. It drives real Chromium, so it reads **Next.js / SPA** pages after hydration — not just static HTML. -> 37 MCP tools · stealth + rotating proxies · HTTP fast-path (single, batch & crawl) · full-site content + screenshot snapshots · virtualized-list scraping · HAR record/replay · pixel visual-diff · human handoff + live view. +> 44 MCP tools · stealth + rotating proxies · HTTP fast-path (single, batch & crawl) · full-site content + screenshot snapshots · structured per-card product extraction · virtualized-list scraping + autoscroll · tabs / dialogs / downloads · console + network logs · MCP screenshot resources · `FUSE_CAPS` tool-group filtering · named auth profiles · `blockResources` · HAR record/replay · pixel visual-diff · human handoff + live view. ## Install @@ -37,13 +37,15 @@ Prefer a terminal? Install the CLI: `npm i -g @fusengine/browser-mcp` ```bash fuse-browser probe https://example.com --extract-prices fuse-browser fetch https://books.toscrape.com/ --extract-prices # no browser, ~10× faster +fuse-browser products "https://www.digitec.ch/en/search?q=macbook" --limit 20 # structured cards → sort to find the cheapest ``` ## How it works An LLM runs a **perceive → decide → act** loop through the tools: `browser_open` → `browser_navigate` → `browser_snapshot` (indexed `ref`s + form state) → `browser_act` -(click/fill/select/pick, returns a page diff) → `browser_wait_for` → `browser_extract` / +(click/fill/select/pick, returns a page diff) → `browser_wait_for` → `browser_autoscroll` +(drain lazy lists) → `browser_products` / `browser_collect` / `browser_extract` / `browser_screenshot`. Sensitive actions (pay / book / checkout) are **blocked** unless the agent passes `humanApproved`. @@ -52,10 +54,13 @@ agent passes `humanApproved`. - **Stealth** — Patchright neutralizes the real automation signals; per-country identity + rotating proxy pool. - **Agentic targeting** — accessibility-style snapshot with stable refs, self-healing click/fill, multi-step plans. - **Vision (Set-of-Marks)** — `annotate:true` on `browser_snapshot`/`browser_act`/`browser_screenshot` draws numbered badges (= each `ref`) on the page, so vision models *see* it and target by ref. -- **Sees everything** — open Shadow DOM, same/cross-origin iframes, and **virtualized/infinite lists** (`browser_collect`). +- **Sees everything** — open Shadow DOM, same/cross-origin iframes, and **virtualized/infinite lists** (`browser_collect`, `browser_autoscroll` to drain lazy-loaded results first). +- **Structured extraction** — `browser_products` pulls **per-card** rows (`{title, price, currency, url?}`, each price tied to its own title) by detecting repeated card containers — works on Digitec, Booking, Amazon… Sort by price to answer "which is the cheapest?". **Layout-agnostic prices**: prefix/suffix currency, thousands/decimal markup, CH/EU formats. Also exposed as the CLI `products` command. +- **Full session control** — multi-tab (`browser_tabs`: list/new/select/close popups & OAuth windows), native dialog policy (`browser_dialog`), captured `browser_downloads`, plus `browser_console` / `browser_network` logs to debug why a page misbehaves. - **Fast-path** — `browser_fetch` impersonates a real Chrome TLS fingerprint for server-rendered HTML, no browser launch — returns clean **markdown** and optional **contacts** (`extractContacts`) at ~HTTP speed. **JSON APIs / plain text** come back verbatim (no HTML mangling). Opt-in **`browserFallback`** auto-renders client-side (SPA/CSR) pages in a real browser when the HTTP response is an empty shell (`escalated: true`). **`browser_fetch_batch`** fetches many URLs in parallel (bounded concurrency, errors isolated per URL). **`browser_crawl`** walks a whole site (bounded same-origin BFS, robots-honored) → clean markdown per page. **`browser_shots_batch`** captures responsive full-page screenshots of many URLs in parallel (see the design of a whole set of pages at once). **`browser_collect_batch`** exhausts the infinite-scroll list of many listing URLs at once (crawl finds the pages, collect drains them). **`browser_site_shots`** snapshots a whole site in one call — crawl + screenshot each page, returning content **and** responsive PNGs per page. - **Data out** — multi-currency prices, typed CSS extraction, **contact extraction** (emails/phones E.164, `fastPathFirst` cascade), a clean→validate→dedupe→emit pipeline, CSV export, Google SERP rank tracking. -- **Ops** — persistent sessions, **auto crash recovery** (a crashed page is recreated in the same context and restored to its last URL between calls), opt-in **per-host circuit breaker** + **bounded probe queue/budget** + **`browser_metrics`** for mass scraping, **live view** (watch any session — even headless — in your browser), `storageState` auto-save, HAR record/replay, pixel `visual_diff`, human handoff for login/2FA. +- **Ops** — persistent sessions, **auto crash recovery** (a crashed page is recreated in the same context and restored to its last URL between calls), opt-in **per-host circuit breaker** + **bounded probe queue/budget** + **`browser_metrics`** for mass scraping, **live view** (watch any session — even headless — in your browser), **`screenshot://{sessionId}/last` MCP resource** (read a session's current page as a JPEG on demand), `storageState` auto-save, **named auth profiles** (`profile`), **`blockResources`** to skip images/fonts/etc. on batch runs, HAR record/replay, pixel `visual_diff`, human handoff for login/2FA. +- **Context control** — **`FUSE_CAPS`** registers only the tool groups you need (`core`/`batch`/`extract`/`debug`/`live`) for a lighter LLM context, and the batch tools emit MCP **progress notifications** when the client sends a `progressToken`. ## Documentation @@ -63,7 +68,7 @@ Full reference in **[`docs/`](./docs/README.md)**: [Installation](./docs/installation.md) · [CLI](./docs/cli.md) · -[MCP tools (37)](./docs/mcp-tools.md) · +[MCP tools (44)](./docs/mcp-tools.md) · [Configuration](./docs/configuration.md) · [Sessions](./docs/sessions.md) · [Extraction](./docs/extraction.md) · diff --git a/docs/README.md b/docs/README.md index db41914..0f476f4 100644 --- a/docs/README.md +++ b/docs/README.md @@ -6,8 +6,8 @@ New here? Start with the root [README](../README.md), then dive in: | Doc | What's inside | | --- | --- | | [Installation](./installation.md) | Requirements, install, Chromium, MCP registration, the three ways to get a browser | -| [CLI](./cli.md) | `probe` / `fetch` / `fetch-batch` / `crawl` / `collect-batch` / `shots` / `shots-batch` / `site-shots` / `serp-batch` + every flag | -| [MCP tools](./mcp-tools.md) | All 37 tools with parameters and examples | +| [CLI](./cli.md) | `probe` / `fetch` / `fetch-batch` / `crawl` / `collect-batch` / `shots` / `shots-batch` / `site-shots` / `serp-batch` + one-shot page commands (`run` / `products` / `extract` / `snapshot` / `screenshot` / `inspect`) + every flag | +| [MCP tools](./mcp-tools.md) | All 44 tools with parameters and examples | | [Configuration](./configuration.md) | `AgentOptions`, `FUSE_*` env vars, identity, retry, output location | | [Sessions](./sessions.md) | Session lifecycle, auto crash recovery, `storageState` auto-save, HAR record/replay, CDP attach | | [Extraction](./extraction.md) | `browser_extract` / `extract_schema` / `collect` + the clean→validate→dedupe→emit pipeline | diff --git a/docs/cli.md b/docs/cli.md index 9e9a88e..53366dc 100644 --- a/docs/cli.md +++ b/docs/cli.md @@ -1,6 +1,6 @@ # CLI -`fuse-browser` is a command-line front-end for the browser agent. It exposes nine one-shot subcommands (`probe`, `fetch`, `fetch-batch`, `crawl`, `collect-batch`, `serp-batch`, `shots`, `shots-batch`, `site-shots`) that all share a single flag parser (`node:util` `parseArgs`, strict mode), so any flag is accepted globally but only consumed by the subcommands documented below. Session-based interaction (open/navigate/click/products/autoscroll/…) is exposed through the MCP server (`browser-mcp`), not the CLI. +`fuse-browser` is a command-line front-end for the browser agent. It exposes 15 one-shot subcommands — nine batch/fast-path commands (`probe`, `fetch`, `fetch-batch`, `crawl`, `collect-batch`, `serp-batch`, `shots`, `shots-batch`, `site-shots`) plus six [page commands](#page-commands-one-shot) (`run`, `products`, `extract`, `snapshot`, `screenshot`, `inspect`) — that all share a single flag parser (`node:util` `parseArgs`, strict mode), so any flag is accepted globally but only consumed by the subcommands documented below. Stateful, multi-turn session interaction (open → navigate → click → snapshot → …) is exposed through the MCP server (`browser-mcp`), not the CLI. ``` fuse-browser probe [flags] diff --git a/docs/configuration.md b/docs/configuration.md index f20cc13..7dd3463 100644 --- a/docs/configuration.md +++ b/docs/configuration.md @@ -78,6 +78,8 @@ Read by `envAgentDefaults` (`src/server/env-defaults.ts`) and the proxy loader ( | `FUSE_STORAGE_STATE` | `storageStatePath` | Path to a storage-state JSON. | | `FUSE_OUTPUT_DIR` | `outputDir` | Override the artifact output directory. | | `FUSE_PROXIES` | proxy pool | Comma- or newline-separated proxy URLs; deduped, blanks dropped. Merged with `proxiesPath`. Treat as a secret. | +| `FUSE_CAPS` | tool-group filter | Comma-separated [capability groups](./mcp-tools.md#capability-groups-fuse_caps) to register (`core`/`batch`/`extract`/`debug`/`live`). Case-insensitive, whitespace-tolerant; unknown names are ignored. Blank/unset (or only-unknown) = all 44 tools. Server-only (no per-call/library equivalent). | +| `FUSE_NETLOG_MAX` | network/console log cap | Max entries kept per session in `browser_console` / `browser_network` (oldest dropped). Positive integer; default `250`. | ### MCP config example diff --git a/docs/mcp-tools.md b/docs/mcp-tools.md index 7f332d4..322ee07 100644 --- a/docs/mcp-tools.md +++ b/docs/mcp-tools.md @@ -1,10 +1,11 @@ # MCP tools -Complete reference for the 37 `browser_*` tools exposed by the fuse-browser MCP server. +Complete reference for the 44 `browser_*` tools exposed by the fuse-browser MCP server. Tools fall into two families: - **One-shot / fast-path** (`browser_probe`, `browser_probe_html`, `browser_fetch`, `browser_fetch_batch`, `browser_crawl`, `browser_collect_batch`, `browser_shots_batch`, `browser_site_shots`, `browser_serp_batch`) open a fresh browser (or do a pure HTTP fetch) per call and return a report. No session id needed. +- **Structured extraction** (`browser_products`, `browser_collect`, `browser_extract`, `browser_extract_schema`) and `browser_autoscroll` (drain lazy lists) run against a live session. - **Session tools** require a `sessionId` obtained from `browser_open` (or `browser_connect`). They drive one persistent, stateful page. Every field is optional unless **Required** says `yes`. Defaults shown below come from the tool itself; many can also be set globally via `FUSE_*` environment variables — see [configuration](./configuration.md). Per-call arguments always override env defaults. @@ -13,13 +14,13 @@ The shared identity/profile options (the `agentOptionShape`) are listed once und ## Capability groups (`FUSE_CAPS`) -By default all 37 tools are registered. Set the `FUSE_CAPS` env var (comma-separated group names) to expose fewer tools — a lighter context for the LLM client: +By default all 44 tools are registered. Set the `FUSE_CAPS` env var (comma-separated group names) to expose fewer tools — a lighter context for the LLM client: | Group | Tools | | --- | --- | -| `core` | Session lifecycle (`browser_open`/`browser_status`/`browser_close`/`browser_connect`), navigation (`browser_navigate`/`browser_back`/`browser_forward`), actions (`browser_click`/`browser_fill`/`browser_login`/`browser_scroll`/`browser_press`/`browser_select`), `browser_tabs`, `browser_dialog`/`browser_downloads`, `browser_snapshot`/`browser_act`, `browser_wait`/`browser_wait_for`, `browser_screenshot`. | +| `core` | Session lifecycle (`browser_open`/`browser_status`/`browser_close`/`browser_connect`), navigation (`browser_navigate`/`browser_back`/`browser_forward`), actions (`browser_click`/`browser_fill`/`browser_login`/`browser_scroll`/`browser_press`/`browser_select`), `browser_tabs`, `browser_dialog`/`browser_downloads`, `browser_snapshot`/`browser_act`, `browser_wait`/`browser_wait_for`, `browser_screenshot`, `browser_autoscroll`. | | `batch` | `browser_probe`, `browser_probe_html`, `browser_fetch`, `browser_fetch_batch`, `browser_crawl`, `browser_collect_batch`, `browser_shots_batch`, `browser_site_shots`, `browser_serp_batch`. | -| `extract` | `browser_collect`, `browser_run`, `browser_extract`, `browser_extract_schema`. | +| `extract` | `browser_collect`, `browser_run`, `browser_extract`, `browser_extract_schema`, `browser_products`. | | `debug` | `browser_inspect`, `browser_console`, `browser_network`, `browser_visual_diff`, `browser_metrics`. | | `live` | `browser_handoff`, `browser_live_view`, `browser_live_view_stop`. | @@ -588,6 +589,25 @@ The optional `pipeline` runs a declarative clean→validate→dedupe→emit pass { "sessionId": "s_abc123", "item": ".result-card", "extractPrices": true, "maxSteps": 20 } ``` +### browser_autoscroll + +Repeatedly scroll a long / infinite list to the bottom to trigger lazy-load until it stabilises — run it **before** `browser_extract` / `browser_collect` / `browser_products` on lazy-loaded result pages so every item is in the DOM. Stops after `idleRounds` rounds without growth, at `maxScrolls`, or once `untilSelector` reaches `minCount` elements. + +| Param | Type | Required | Description | +| --- | --- | --- | --- | +| `sessionId` | string | yes | Target session. | +| `maxScrolls` | integer | no | Hard cap on scroll rounds. | +| `idleRounds` | integer | no | Stop after this many rounds with no height growth. | +| `untilSelector` | string | no | Stop once this selector reaches `minCount` matches. | +| `minCount` | integer | no | Element count target for `untilSelector`. | +| `delayMs` | integer | no | Pause between scroll rounds. | + +Returns `{ rounds, height, url }`. + +```json +{ "sessionId": "s_abc123", "untilSelector": ".result-card", "minCount": 100 } +``` + --- ## Extract @@ -626,6 +646,22 @@ Extract typed data from the live page via a field map. Deterministic; reads the } ``` +### browser_products + +Extract structured **per-card** product rows from an e-commerce / search-results page: one `{title, price, currency, url?}` per card, each price tied to its own title (unlike flat price scraping). Generic — detects repeated card containers by structure, so it works on Digitec, Booking, Amazon… Prices are parsed **layout-agnostically** (prefix/suffix currency, thousands/decimal markup, CH/EU formats). Sort the rows by price to answer "which is the cheapest?". Also exposed as the CLI `products` command. + +| Param | Type | Required | Description | +| --- | --- | --- | --- | +| `sessionId` | string | yes | Target session. | +| `limit` | integer | no | Cap the number of returned rows. | +| `containerSelector` | string | no | Pin the card-container selector (auto-detected otherwise). | + +Returns `{ url, count, products: [{ title, price, currency, url? }] }`. + +```json +{ "sessionId": "s_abc123", "limit": 20 } +``` + --- ## SERP