Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
3 changes: 3 additions & 0 deletions .github/workflows/ci.yml
Original file line number Diff line number Diff line change
Expand Up @@ -33,6 +33,9 @@ jobs:
- name: Typecheck
run: bun run typecheck

- name: Lint
run: bun run lint

- name: Unit tests
run: bun test tests/unit

Expand Down
38 changes: 38 additions & 0 deletions .github/workflows/stealth-bench.yml
Original file line number Diff line number Diff line change
@@ -0,0 +1,38 @@
name: Stealth Bench

# Separate, non-blocking benchmark: validates anti-bot stealth
# non-regression against public detection pages. Independent of CI —
# never gates merges. Manual dispatch + weekly schedule.
on:
workflow_dispatch:
schedule:
# Mondays 06:00 UTC
- cron: "0 6 * * 1"

jobs:
stealth-bench:
runs-on: ubuntu-latest
timeout-minutes: 15
# Network-dependent on third-party sites; do not fail the suite on flakes.
continue-on-error: true
steps:
- uses: actions/checkout@v6

- name: Setup Bun
uses: oven-sh/setup-bun@v2
with:
bun-version: latest

- name: Setup Node
uses: actions/setup-node@v6
with:
node-version: 22

- name: Install dependencies
run: bun install --frozen-lockfile

- name: Install Chromium (Patchright + system deps)
run: bunx patchright install --with-deps chromium

- name: Run stealth benchmark
run: node --import tsx tests/live/stealth-bench.ts
22 changes: 22 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,27 @@
# Changelog

## [0.1.55] - 11-06-2026

### Added

- **Layout-agnostic price extraction** — captures a currency whether it sits before or after the amount (`CHF 6.90`, `6.90 CHF`, `10 €`, `350 kr`), even when the symbol and number land on separate DOM lines (`CHF\n6.90`, the digitec case that previously yielded zero prices). Handles non-breaking/narrow spaces and CH (`1'234.56`) / EU (`1.234,56`) decimal formats. Each price now carries a short `context` label of its surrounding line.
- **Structured per-card extraction** — `browser_products` MCP tool + `fuse-browser products <url>` CLI return `{title, price, currency, url}` grouped by repeated product card, plus an `extract_schema` container mode that iterates card-by-card.
- **New MCP tools** — `browser_tabs` (OAuth popups/multi-tab), `browser_dialog` + `browser_downloads`, `browser_console` + `browser_network`, `browser_autoscroll` (infinite-scroll until idle). Screenshots exposed as MCP resources (`screenshot://{sessionId}/last`).
- **CLI parity** — six one-shot page commands: `run` (multi-step plans via `--steps`/`--steps-file`/stdin), `products`, `extract`, `snapshot`, `screenshot`, `inspect`. `--help` now lists all 15 commands.
- **Config** — `FUSE_CAPS` tool-group filtering, named auth `profile`, `blockResources`, MCP progress notifications on batch tools, configurable network-log buffer (`FUSE_NETLOG_MAX`) that pins the main document.
- **Stealth** — self-healing selectors (role/text/re-snapshot fallback) and a weekly anti-bot benchmark workflow.

### Fixed

- **Booking currency** — `prepareBookingCurrency` no longer does an intermediate homepage navigation that landed on the consent wall and blanked the target page; cookies + the in-page picker apply the currency without it.
- **Probe robustness** — resilient load-state settle + a single re-extraction when the first text/title come back empty, fixing blank reports on heavy/consent-gated pages.
- **Tabs network capture** — a tab opened with a URL now wires its network log before navigating, so its document and subresources are captured.
- **mainText** — strips nav/aside/search/filter sub-trees so filter sidebars (e.g. Booking's budget slider) no longer leak into extracted prices, while product grids still yield every card.

### Tooling

- Biome linter wired into CI; full suite green (292 unit, 20 integration on real Chromium).

## [0.1.54] - 08-06-2026

### Fixed
Expand Down
43 changes: 43 additions & 0 deletions biome.json
Original file line number Diff line number Diff line change
@@ -0,0 +1,43 @@
{
"$schema": "https://biomejs.dev/schemas/2.4.16/schema.json",
"files": {
"includes": ["src/**", "tests/**", "!dist/**", "!node_modules/**"]
},
"formatter": {
"enabled": false
},
"assist": {
"enabled": false
},
"linter": {
"enabled": true,
"rules": {
"recommended": true,
"style": {
"useImportType": "off",
"useTemplate": "off"
},
"complexity": {
"useOptionalChain": "off"
},
"suspicious": {
"noAssignInExpressions": "off"
}
}
},
"overrides": [
{
"includes": ["tests/**"],
"linter": {
"rules": {
"style": {
"noNonNullAssertion": "off"
},
"suspicious": {
"useIterableCallbackReturn": "off"
}
}
}
}
]
}
19 changes: 19 additions & 0 deletions bun.lock

Some generated files are not rendered by default. Learn more about how customized files appear on GitHub.

57 changes: 56 additions & 1 deletion docs/cli.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# CLI

`fuse-browser` is a command-line front-end for the browser agent. It exposes four subcommands that all share a single flag parser (`node:util` `parseArgs`, strict mode), so any flag is accepted globally but only consumed by the subcommands documented below.
`fuse-browser` is a command-line front-end for the browser agent. It exposes nine one-shot subcommands (`probe`, `fetch`, `fetch-batch`, `crawl`, `collect-batch`, `serp-batch`, `shots`, `shots-batch`, `site-shots`) that all share a single flag parser (`node:util` `parseArgs`, strict mode), so any flag is accepted globally but only consumed by the subcommands documented below. Session-based interaction (open/navigate/click/products/autoscroll/…) is exposed through the MCP server (`browser-mcp`), not the CLI.

```
fuse-browser probe <url> [flags]
Expand Down Expand Up @@ -109,6 +109,61 @@ Applicable flags: `--engine`, `--country`, `--headed`, `--output-dir`, `--viewpo

---

## Page commands (one-shot)

These commands open a page, run one operation, print JSON on stdout (errors on stderr), and tear the browser down. Exit codes: `0` success, `1` functional/step failure, `2` bad usage or malformed JSON. They all share the page flags `--engine`, `--country`, `--currency`, `--headed`, `--human-mode`, `--proxy`, `--proxy-map`, `--output-dir`, `--storage-state`, `--no-robots`, `--wait-ms`, `--block-resources`.

### `run <url>`

Executes a multi-step plan in one session. Steps come from `--steps '<json>'` (inline array) or `--steps-file <path>` (`-` reads stdin). Each step is `{type, …}`: `navigate`, `click`, `fill`, `scroll`, `press`, `wait`, `select`, `extract`. Prints `{ok, url, steps}`; on a failed step prints `{ok:false, error:{kind:"step_failed", step, message}}` and exits `1`. Malformed/non-array JSON exits `2`.

```bash
fuse-browser run https://example.com \
--steps '[{"type":"wait","ms":500},{"type":"extract","kind":"text"}]'
```

### `products <url>`

Extracts repeated product cards from the rendered DOM. `--limit <n>` caps the result; `--container <selector>` forces the card container. Prints `{url, count, products}`.

```bash
fuse-browser products "https://www.digitec.ch/en/search?q=macbook" --limit 20
```

### `extract <url>`

Pulls page content. `--kind text` (default) returns main text, `prices` returns parsed prices, `markdown` returns LLM-ready markdown + metadata. Prints `{url, kind, …}`.

```bash
fuse-browser extract https://example.com --kind markdown
```

### `snapshot <url>`

Captures the indexed interactive-element snapshot (each element gets a stable `ref`). Add `--selectors` for per-element CSS selectors. Prints `{url, count, elements}`.

```bash
fuse-browser snapshot https://example.com --selectors
```

### `screenshot <url>`

Captures a PNG. Add `--full-page` for the full scroll height. With `--output <file>` it writes the file and prints `{url, path, bytes}`; otherwise it prints `{url, base64}`.

```bash
fuse-browser screenshot https://example.com --full-page --output shot.png
```

### `inspect <url>`

Snapshots the page (tagging elements with `data-fuse-ref`) then reports computed style, box, and WCAG contrast for the element named by `--ref <ref>` (a `ref` from `snapshot`). `--ref` is required (else exits `2`); an unknown ref exits `1`. Prints `{url, ref, style}`.

```bash
fuse-browser inspect https://example.com --ref 0
```

---

## Approval guardrail

Sensitive actions (pay / book / checkout / confirm) are blocked unless `--approved` is passed. When blocked, `probe` prints `BLOCKED: <reason>` to stderr and exits `2`. `--approved` sets the `humanApproved` flag on the probe.
Expand Down
2 changes: 2 additions & 0 deletions docs/configuration.md
Original file line number Diff line number Diff line change
Expand Up @@ -25,6 +25,8 @@ Every field is optional. Defaults are applied by `resolveConfig` (`src/agent/con
| `cdpCloseOnDone` | `boolean` | `true` for `ws/wss`, else `false` | Close the CDP session on teardown. Remote endpoints (Browserless) get a fresh context that is closed when done; a local attach never closes the user's browser (only the link is dropped). |
| `cdpTimeoutMs` | `number` | `20000` | Timeout (ms) for the CDP connect. |
| `storageStatePath` | `string` | `null` | Path to a Playwright storage-state JSON (cookies + localStorage) to load/persist a logged-in session. See [./sessions.md](./sessions.md). |
| `profile` | `string` | `null` | Named persistent auth profile — shorthand for `storageStatePath` at `~/.fuse-browser/profiles/<name>.json` (`FUSE_BROWSER_HOME` overrides the home dir; ignored when `storageStatePath` is set). Name: letters/digits then `-`/`_`, max 41 chars. |
| `blockResources` | `string[]` | `null` (off) | Resource types aborted at the network layer to speed up batch runs: `image`, `media`, `font`, `stylesheet`, `script`, `xhr`, `fetch`, `websocket`, `manifest`, `other`. Unknown types are ignored. |
| `harPath` | `string` | `null` | Record network traffic to this HAR file. See [./sessions.md](./sessions.md). |
| `harMode` | `"minimal" \| "full"` | `"minimal"` | HAR recording detail. `minimal` records metadata only; `full` records response bodies. |
| `harReplay` | `string` | `null` | Replay network traffic from this HAR file instead of hitting the live network. |
Expand Down
102 changes: 101 additions & 1 deletion docs/mcp-tools.md
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
# MCP tools

Complete reference for the 32 `browser_*` tools exposed by the fuse-browser MCP server.
Complete reference for the 37 `browser_*` tools exposed by the fuse-browser MCP server.

Tools fall into two families:

Expand All @@ -11,6 +11,26 @@ Every field is optional unless **Required** says `yes`. Defaults shown below com

The shared identity/profile options (the `agentOptionShape`) are listed once under [`browser_open`](#browser_open); tools that accept them say so and link back.

## Capability groups (`FUSE_CAPS`)

By default all 37 tools are registered. Set the `FUSE_CAPS` env var (comma-separated group names) to expose fewer tools — a lighter context for the LLM client:

| Group | Tools |
| --- | --- |
| `core` | Session lifecycle (`browser_open`/`browser_status`/`browser_close`/`browser_connect`), navigation (`browser_navigate`/`browser_back`/`browser_forward`), actions (`browser_click`/`browser_fill`/`browser_login`/`browser_scroll`/`browser_press`/`browser_select`), `browser_tabs`, `browser_dialog`/`browser_downloads`, `browser_snapshot`/`browser_act`, `browser_wait`/`browser_wait_for`, `browser_screenshot`. |
| `batch` | `browser_probe`, `browser_probe_html`, `browser_fetch`, `browser_fetch_batch`, `browser_crawl`, `browser_collect_batch`, `browser_shots_batch`, `browser_site_shots`, `browser_serp_batch`. |
| `extract` | `browser_collect`, `browser_run`, `browser_extract`, `browser_extract_schema`. |
| `debug` | `browser_inspect`, `browser_console`, `browser_network`, `browser_visual_diff`, `browser_metrics`. |
| `live` | `browser_handoff`, `browser_live_view`, `browser_live_view_stop`. |

```sh
FUSE_CAPS=core,extract browser-mcp # only the core + extract groups
```

Parsing is forgiving: names are case-insensitive and whitespace-tolerant (`" CORE , Extract "` works), unknown names are reported on stderr and ignored, and a blank/unset value — or one containing **only** unknown names — falls back to all groups (never an empty server).

**Progress notifications.** The batch tools (`browser_fetch_batch`, `browser_crawl`, `browser_collect_batch`, `browser_shots_batch`, `browser_site_shots`, `browser_serp_batch`) emit MCP `notifications/progress` (`progress`/`total` per finished item, with the item URL/query as `message`) when the client sends a `progressToken` with the request. Clients that don't request progress see no change.

---

## One-shot
Expand Down Expand Up @@ -285,6 +305,53 @@ Launch an installed browser with remote debugging, attach to it, and return a se
{ "browser": "chrome", "port": 9222 }
```

### browser_tabs

Manage the tabs of a live session: list them, open a new tab (optional `url`), select one as the active target of every other `browser_*` tool, or close one. Use it when a click spawned a popup (OAuth login, `target=_blank` link): `list` to find it, `select` to drive it, `close` then `select` to come back.

| Param | Type | Required | Description |
| --- | --- | --- | --- |
| `sessionId` | string | yes | Target session. |
| `action` | enum `list` \| `new` \| `select` \| `close` | yes | Tab action to perform. |
| `index` | number | no* | Tab index (**required** for `select`/`close`; `missing_index` error otherwise). |
| `url` | string | no | URL to open in the new tab (for `new`). |

Always returns the tab list plus the active index after the action: `{ tabs, active }`. Closing the last tab is refused (`cannot_close_last_tab`) — use `browser_close` to end the session. An out-of-range `index` returns `invalid_tab_index`.

```json
{ "sessionId": "s_abc123", "action": "select", "index": 1 }
```

### browser_dialog

Set how native dialogs (`alert`/`confirm`/`prompt`/`beforeunload`) are handled on this session: accept or dismiss, with optional text for prompts. Applies to **upcoming** dialogs; the default policy (before any call) is **dismiss**. Dialog handling is auto-attached when the session opens, so dialogs never block a run.

| Param | Type | Required | Description |
| --- | --- | --- | --- |
| `sessionId` | string | yes | Target session. |
| `action` | enum `accept` \| `dismiss` | yes | Policy applied to upcoming dialogs. |
| `promptText` | string | no | Text typed into `prompt` dialogs when accepting. |

Returns `{ policy, recent }` — `recent` is the last observed dialogs (max 20, oldest first), each `{ type, message, at, handled }`.

```json
{ "sessionId": "s_abc123", "action": "accept", "promptText": "yes" }
```

### browser_downloads

List the files downloaded by this session. Download capture is auto-attached when the session opens: every download is saved under `<outputDir>/downloads/<suggestedFilename>` (suffixing `-1`, `-2` on name collisions) and recorded.

| Param | Type | Required | Description |
| --- | --- | --- | --- |
| `sessionId` | string | yes | Target session. |

Returns `{ count, downloads }` — each download is `{ url, suggestedFilename, path, at, error? }` (`path` is empty while saving or when `error` is set).

```json
{ "sessionId": "s_abc123" }
```

---

## Navigate
Expand Down Expand Up @@ -678,6 +745,39 @@ Returned fields: `uptimeMs`, `probesOk`, `probesFailed`, `avgDurationMs`, `minDu
{ "reset": false }
```

### browser_console

Console messages captured in the session since open (last 80 kept). Use to debug JS errors, CSP violations, or why a page misbehaves.

| Param | Type | Required | Description |
| --- | --- | --- | --- |
| `sessionId` | string | yes | Target session. |
| `level` | enum `error` \| `warning` \| `info` \| `log` \| `debug` | no | Keep only this console message type. |
| `limit` | number | no | Last N entries returned (default `50`). |

Returns `{ count, entries }`.

```json
{ "sessionId": "s_abc123", "level": "error", "limit": 20 }
```

### browser_network

Network requests captured in the session since open (last 80 kept), merged into one row per URL: `method`, `url`, `status`, `resourceType`. Use to debug why a page does not load: failed requests (`status: 404/500`), blocked APIs (`urlContains`). Entries without `status` got no response (pending/failed).

| Param | Type | Required | Description |
| --- | --- | --- | --- |
| `sessionId` | string | yes | Target session. |
| `status` | number | no | Keep only rows with this exact HTTP status. |
| `urlContains` | string | no | Keep only rows whose URL contains this substring. |
| `limit` | number | no | Last N rows returned (default `50`). |

Returns `{ count, requests }`.

```json
{ "sessionId": "s_abc123", "status": 404 }
```

---

## Live view
Expand Down
Loading