Skip to content
Merged
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension


Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
6 changes: 6 additions & 0 deletions CHANGELOG.md
Original file line number Diff line number Diff line change
@@ -1,5 +1,11 @@
# Changelog

## [0.1.57] - 11-06-2026

### Added

- **File upload** — new `upload` action (Playwright `setInputFiles`) for `browser_act` (`kind:"upload"`) and `run` step plans (`{"type":"upload","target":"input[type=file]","files":"/path/cv.pdf"}`). `files` accepts one path, a comma-separated string, or an array. Targets by `ref` or selector. Enables forms that need an attachment (job applications, avatars, CSV imports). Verified live (the-internet upload form). No new MCP tool — tool count stays 44.

## [0.1.56] - 11-06-2026

### Docs
Expand Down
4 changes: 2 additions & 2 deletions README.md
Original file line number Diff line number Diff line change
Expand Up @@ -10,7 +10,7 @@ Shadow DOM + iframes), multi-step plans, structured extraction, visual diff, and
guardrails** for payments and bookings. It drives real Chromium, so it reads **Next.js / SPA**
pages after hydration — not just static HTML.

> 44 MCP tools · stealth + rotating proxies · HTTP fast-path (single, batch & crawl) · full-site content + screenshot snapshots · structured per-card product extraction · virtualized-list scraping + autoscroll · tabs / dialogs / downloads · console + network logs · MCP screenshot resources · `FUSE_CAPS` tool-group filtering · named auth profiles · `blockResources` · HAR record/replay · pixel visual-diff · human handoff + live view.
> 44 MCP tools · stealth + rotating proxies · HTTP fast-path (single, batch & crawl) · full-site content + screenshot snapshots · structured per-card product extraction · form fill + file upload · virtualized-list scraping + autoscroll · tabs / dialogs / downloads · console + network logs · MCP screenshot resources · `FUSE_CAPS` tool-group filtering · named auth profiles · `blockResources` · HAR record/replay · pixel visual-diff · human handoff + live view.

## Install

Expand Down Expand Up @@ -44,7 +44,7 @@ fuse-browser products "https://www.digitec.ch/en/search?q=macbook" --limit 20

An LLM runs a **perceive → decide → act** loop through the tools: `browser_open` →
`browser_navigate` → `browser_snapshot` (indexed `ref`s + form state) → `browser_act`
(click/fill/select/pick, returns a page diff) → `browser_wait_for` → `browser_autoscroll`
(click/fill/select/pick/upload, returns a page diff) → `browser_wait_for` → `browser_autoscroll`
(drain lazy lists) → `browser_products` / `browser_collect` / `browser_extract` /
`browser_screenshot`. Sensitive actions (pay / book / checkout) are **blocked** unless the
agent passes `humanApproved`.
Expand Down
7 changes: 6 additions & 1 deletion docs/cli.md
Original file line number Diff line number Diff line change
Expand Up @@ -115,13 +115,18 @@ These commands open a page, run one operation, print JSON on stdout (errors on s

### `run <url>`

Executes a multi-step plan in one session. Steps come from `--steps '<json>'` (inline array) or `--steps-file <path>` (`-` reads stdin). Each step is `{type, …}`: `navigate`, `click`, `fill`, `scroll`, `press`, `wait`, `select`, `extract`. Prints `{ok, url, steps}`; on a failed step prints `{ok:false, error:{kind:"step_failed", step, message}}` and exits `1`. Malformed/non-array JSON exits `2`.
Executes a multi-step plan in one session. Steps come from `--steps '<json>'` (inline array) or `--steps-file <path>` (`-` reads stdin). Each step is `{type, …}`: `navigate`, `click`, `fill`, `scroll`, `press`, `wait`, `select`, `upload`, `extract`. An `upload` step is `{"type":"upload","target":"<selector>","files":"<path>"}` — `files` accepts one path, a comma-separated string, or an array, and is set on the matching `<input type=file>`. Prints `{ok, url, steps}`; on a failed step prints `{ok:false, error:{kind:"step_failed", step, message}}` and exits `1`. Malformed/non-array JSON exits `2`.

```bash
fuse-browser run https://example.com \
--steps '[{"type":"wait","ms":500},{"type":"extract","kind":"text"}]'
```

```bash
fuse-browser run https://example.com/apply \
--steps '[{"type":"upload","target":"input[type=file]","files":"/path/cv.pdf"}]'
```

### `products <url>`

Extracts repeated product cards from the rendered DOM. `--limit <n>` caps the result; `--container <selector>` forces the card container. Prints `{url, count, products}`.
Expand Down
13 changes: 10 additions & 3 deletions docs/mcp-tools.md
Original file line number Diff line number Diff line change
Expand Up @@ -536,18 +536,21 @@ Return the indexed interactive elements of the live page, each with a `ref` to u

### browser_act

Execute click/fill/select/pick on an element by `ref` (from `browser_snapshot`) or by `target` text. Returns a diff of what changed on the page (added/removed/text/url).
Execute click/fill/select/pick/upload on an element by `ref` (from `browser_snapshot`) or by `target` text. Returns a diff of what changed on the page (added/removed/text/url).

`pick` = type `value` into a combobox, then click the matching suggestion (`option` text, defaults to `value`) — for airport/city autocompletes.

`upload` = set local file path(s) on an `<input type=file>` via `files` (single path, a comma-separated string, or an array). Resolves the same `ref`/`target` as the other kinds, then calls Playwright's `setInputFiles`.

| Param | Type | Required | Description |
| --- | --- | --- | --- |
| `sessionId` | string | yes | Target session. |
| `kind` | enum `click` \| `fill` \| `select` \| `pick` | yes | Action to perform. |
| `kind` | enum `click` \| `fill` \| `select` \| `pick` \| `upload` | yes | Action to perform. |
| `ref` | integer \| string | no | Element ref from `browser_snapshot` (e.g. `12` or `"3:4"`). |
| `target` | string | no | Text/selector fallback when no `ref`. |
| `value` | string | no | Value to type/select (for `fill`/`select`/`pick`). |
| `option` | string | no | Suggestion text to click for `pick` (defaults to `value`). |
| `files` | string \| string[] | no | File path(s) for `upload` — one path, a comma-separated string (split into many), or an array. |
| `annotate` | boolean | no | Also return a Set-of-Marks JPEG of the NEW state (re-marked, anti-drift). |

Provide either `ref` or `target`.
Expand All @@ -556,9 +559,13 @@ Provide either `ref` or `target`.
{ "sessionId": "s_abc123", "kind": "fill", "ref": 12, "value": "Paris" }
```

```json
{ "sessionId": "s_abc123", "kind": "upload", "target": "input[type=file]", "files": "/path/cv.pdf" }
```

### browser_run

Execute an ordered multi-step plan (navigate, click, fill, scroll, press, select, wait_for, extract) in one call. Stops at the first failed step. Sensitive actions require `humanApproved`.
Execute an ordered multi-step plan (navigate, click, fill, scroll, press, select, upload, wait_for, extract) in one call. Stops at the first failed step. Sensitive actions require `humanApproved`. An `upload` step takes `{type:"upload", target, files}` where `files` is a path, a comma-separated string, or an array.

| Param | Type | Required | Description |
| --- | --- | --- | --- |
Expand Down
2 changes: 1 addition & 1 deletion package.json
Original file line number Diff line number Diff line change
@@ -1,6 +1,6 @@
{
"name": "@fusengine/browser-mcp",
"version": "0.1.56",
"version": "0.1.57",
"description": "MCP server + CLI giving AI agents a real, stealth browser (Patchright/Playwright) — per-country identity, self-healing actions, snapshots, multi-step plans, structured extraction, CDP attach.",
"license": "MIT",
"author": "Fusengine",
Expand Down
17 changes: 15 additions & 2 deletions src/actions/act-by-ref.ts
Original file line number Diff line number Diff line change
Expand Up @@ -8,24 +8,37 @@ import type { Page } from "playwright";
import type { ActionResult } from "../interfaces/types.js";
import { pickAutocomplete } from "./autocomplete.js";
import { refLocator } from "./ref-locator.js";
import { type FilesInput, setFiles } from "./upload.js";

/** Action kinds that can target a snapshot ref. */
export type RefActionKind = "click" | "fill" | "select" | "pick";
export type RefActionKind = "click" | "fill" | "select" | "pick" | "upload";

/** Run `kind` on the element carrying the frame-scoped `ref`. */
/**
* Run `kind` on the element carrying the frame-scoped `ref`.
*
* @param page - Active Playwright page.
* @param ref - Snapshot ref (bare or frame-scoped).
* @param kind - Action to perform.
* @param value - Text for fill/select/pick (ignored for click/upload).
* @param option - Suggestion text for `pick`.
* @param files - Paths for `upload` (string, CSV string, or array).
* @returns Action result tagged with the `ref`.
*/
export async function actByRef(
page: Page,
ref: string | number,
kind: RefActionKind,
value = "",
option = "",
files: FilesInput = "",
): Promise<ActionResult> {
const locator = refLocator(page, ref);
try {
if (!locator || (await locator.count()) === 0) {
return { type: kind, ok: false, ref, error: "ref_not_found" };
}
if (kind === "pick") return { ...(await pickAutocomplete(page, locator, value, option)), ref };
if (kind === "upload") return { ...(await setFiles(locator, files)), ref };
if (kind === "click") {
await locator.click({ timeout: 5_000 });
} else if (kind === "fill") {
Expand Down
3 changes: 3 additions & 0 deletions src/actions/perform.ts
Original file line number Diff line number Diff line change
Expand Up @@ -9,6 +9,7 @@ import { login, type LoginAction } from "./login.js";
import { navigateHistory, pressKey, scroll, selectOption } from "./navigation.js";
import { smartClick } from "./smart-click.js";
import { smartFill } from "./smart-fill.js";
import { type FilesInput, uploadFiles } from "./upload.js";

/** Loose runtime action (may carry `preferredStrategy` injected by site memory). */
export type ActionInput = Record<string, unknown> & { type: string };
Expand Down Expand Up @@ -37,6 +38,8 @@ export async function performAction(
return pressKey(page, String(action.key ?? ""));
case "select":
return selectOption(page, target, String(action.value ?? ""));
case "upload":
return uploadFiles(page, target, (action.files ?? action.value ?? "") as FilesInput);
case "pick":
return pickAutocomplete(page, page.locator(target).first(), String(action.value ?? ""), String(action.option ?? ""));
case "back":
Expand Down
54 changes: 54 additions & 0 deletions src/actions/upload.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,54 @@
/**
* File upload action: set local file paths on an `<input type="file">` via
* Playwright's `setInputFiles`. Shared by `performAction` (run steps) and the
* snapshot `browser_act` ref/target path.
* @module actions/upload
*/
import type { Locator, Page } from "playwright";
import type { ActionResult } from "../interfaces/types.js";

/** A `files` field: one path, a CSV of paths, or an explicit array of paths. */
export type FilesInput = string | string[];

/**
* Normalize a `files` field into a clean path list. A comma-containing string is
* split into several paths; whitespace is trimmed and empty entries dropped.
*
* @param files - One path, a comma-separated string, or an array of paths.
* @returns The de-blanked list of file paths.
*/
export function normalizeFiles(files: FilesInput): string[] {
const raw = Array.isArray(files) ? files : String(files).split(",");
return raw.map((p) => p.trim()).filter((p) => p.length > 0);
}

/**
* Set `files` on an already-resolved `<input type="file">` locator.
*
* @param locator - Locator pointing at the file input.
* @param files - Paths to upload (string, CSV string, or array).
* @returns Action result; `ok:false` with `error` on failure or no files.
*/
export async function setFiles(locator: Locator, files: FilesInput): Promise<ActionResult> {
const paths = normalizeFiles(files);
if (paths.length === 0) return { type: "upload", ok: false, error: "no_files" };
try {
await locator.setInputFiles(paths, { timeout: 10_000 });
return { type: "upload", ok: true, files: paths };
} catch (err) {
return { type: "upload", ok: false, files: paths, error: String(err).split("\n")[0] ?? "error" };
}
}

/**
* Resolve `target` to its first matching input and upload `files` onto it.
*
* @param page - Active Playwright page.
* @param target - Selector for the file input.
* @param files - Paths to upload (string, CSV string, or array).
* @returns Action result tagged with the `target`.
*/
export async function uploadFiles(page: Page, target: string, files: FilesInput): Promise<ActionResult> {
if (!target) return { type: "upload", ok: false, error: "no_target" };
return { ...(await setFiles(page.locator(target).first(), files)), target };
}
1 change: 1 addition & 0 deletions src/interfaces/types.ts
Original file line number Diff line number Diff line change
Expand Up @@ -30,6 +30,7 @@ export type BrowserAction =
username?: string;
password?: string;
}
| { type: "upload"; target: string; files: string | string[] }
| { type: "wait"; ms?: number };

/** Normalized result of an action. */
Expand Down
47 changes: 47 additions & 0 deletions src/server/tools/run-act.ts
Original file line number Diff line number Diff line change
@@ -0,0 +1,47 @@
/**
* `browser_act` dispatch: run the chosen action (by `ref` or text `target`),
* with site-memory assist. Extracted from snapshot.ts to keep both < 90 lines.
* @module server/tools/run-act
*/
import type { Page } from "playwright";
import { z } from "zod";
import { actByRef, type RefActionKind } from "../../actions/act-by-ref.js";
import { pickAutocomplete } from "../../actions/autocomplete.js";
import { smartClick } from "../../actions/smart-click.js";
import { smartFill } from "../../actions/smart-fill.js";
import { type FilesInput, uploadFiles } from "../../actions/upload.js";
import type { ActionResult } from "../../interfaces/types.js";
import { runWithMemory } from "../../state/action-memory.js";

/** Allowed `browser_act` kinds. */
export const KIND = z.enum(["click", "fill", "select", "pick", "upload"]);

/**
* Run the chosen action (by `ref` or text fallback), with site-memory assist.
*
* @param page - Active Playwright page.
* @param a - Raw tool args (`kind`, `ref`/`target`, `value`, `option`, `files`).
* @param human - Whether human-like cursor motion is enabled.
* @param dir - Site-memory directory for strategy recall.
* @returns Action result, or `null` when neither `ref` nor `target` is given.
*/
export async function runAct(
page: Page,
a: Record<string, unknown>,
human: boolean,
dir: string,
): Promise<ActionResult | null> {
const kind = a.kind as RefActionKind;
const value = a.value ? String(a.value) : "";
const option = a.option ? String(a.option) : "";
const files = (a.files ?? a.value ?? "") as FilesInput;
if (typeof a.ref === "number" || typeof a.ref === "string") return actByRef(page, a.ref, kind, value, option, files);
if (typeof a.target !== "string") return null;
const target = a.target;
if (kind === "pick") return pickAutocomplete(page, page.locator(target).first(), value, option);
if (kind === "upload") return uploadFiles(page, target, files);
return runWithMemory(dir, page, { type: kind, target }, (act) => {
const pref = String(act.preferredStrategy ?? "");
return kind === "fill" ? smartFill(page, target, value, pref, human) : smartClick(page, target, pref, human);
});
}
2 changes: 1 addition & 1 deletion src/server/tools/run.ts
Original file line number Diff line number Diff line change
Expand Up @@ -20,7 +20,7 @@ export function registerRunTool(server: McpServer, sessions: SessionManager): vo
{
title: "Run multi-step plan",
description:
"Execute an ordered list of steps (navigate, click, fill, scroll, press, select, wait_for, extract) in one call. Stops at the first failed step. Sensitive actions need humanApproved.",
"Execute an ordered list of steps (navigate, click, fill, scroll, press, select, upload, wait_for, extract) in one call. `upload` sets local file path(s) on an `<input type=file>` via `files` (single path, comma-separated string, or array). Stops at the first failed step. Sensitive actions need humanApproved.",
inputSchema: {
sessionId: z.string(),
steps: z.array(stepSchema),
Expand Down
33 changes: 3 additions & 30 deletions src/server/tools/snapshot.ts
Original file line number Diff line number Diff line change
Expand Up @@ -6,43 +6,15 @@
* @module server/tools/snapshot
*/
import type { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import type { Page } from "playwright";
import { z } from "zod";
import { actByRef, type RefActionKind } from "../../actions/act-by-ref.js";
import { pickAutocomplete } from "../../actions/autocomplete.js";
import { smartClick } from "../../actions/smart-click.js";
import { smartFill } from "../../actions/smart-fill.js";
import { annotatedScreenshot } from "../../extraction/annotate.js";
import { captureSnapshot } from "../../extraction/snapshot.js";
import { diffSnapshots } from "../../extraction/snapshot-diff.js";
import type { ActionResult } from "../../interfaces/types.js";
import type { SessionManager } from "../../session/manager.js";
import { runWithMemory } from "../../state/action-memory.js";
import { errorResult, imageJsonResult, jsonResult } from "../result.js";
import { KIND, runAct } from "./run-act.js";
import { withSession } from "./with-session.js";

const KIND = z.enum(["click", "fill", "select", "pick"]);

/** Run the chosen action (by ref or text fallback), with site-memory assist. */
async function runAct(
page: Page,
a: Record<string, unknown>,
human: boolean,
dir: string,
): Promise<ActionResult | null> {
const kind = a.kind as RefActionKind;
const value = a.value ? String(a.value) : "";
const option = a.option ? String(a.option) : "";
if (typeof a.ref === "number" || typeof a.ref === "string") return actByRef(page, a.ref, kind, value, option);
if (typeof a.target !== "string") return null;
const target = a.target;
if (kind === "pick") return pickAutocomplete(page, page.locator(target).first(), value, option);
return runWithMemory(dir, page, { type: kind, target }, (act) => {
const pref = String(act.preferredStrategy ?? "");
return kind === "fill" ? smartFill(page, target, value, pref, human) : smartClick(page, target, pref, human);
});
}

/** Register `browser_snapshot` and `browser_act`. */
export function registerSnapshotTools(server: McpServer, sessions: SessionManager): void {
server.registerTool(
Expand Down Expand Up @@ -70,14 +42,15 @@ export function registerSnapshotTools(server: McpServer, sessions: SessionManage
{
title: "Act on element",
description:
"Execute click/fill/select/pick on an element by `ref` (from browser_snapshot) or by `target` text. `pick` = type `value` into a combobox then click the matching suggestion (`option` text, defaults to `value`) — for airport/city autocompletes. Returns a diff of what changed. Pass `annotate:true` to also get a Set-of-Marks screenshot of the NEW state (re-marked, anti-drift) for vision models.",
"Execute click/fill/select/pick/upload on an element by `ref` (from browser_snapshot) or by `target` text. `pick` = type `value` into a combobox then click the matching suggestion (`option` text, defaults to `value`) — for airport/city autocompletes. `upload` = set local file path(s) on an `<input type=file>` via `files` (a single path, a comma-separated string, or an array). Returns a diff of what changed. Pass `annotate:true` to also get a Set-of-Marks screenshot of the NEW state (re-marked, anti-drift) for vision models.",
inputSchema: {
sessionId: z.string(),
kind: KIND,
ref: z.union([z.number().int(), z.string()]).optional(),
target: z.string().optional(),
value: z.string().optional(),
option: z.string().optional(),
files: z.union([z.string(), z.array(z.string())]).optional(),
annotate: z.boolean().optional(),
},
},
Expand Down
Loading