
feat: layout-agnostic prices, structured product cards, CLI parity + new MCP tools #66

Merged

fusengine merged 1 commit into main from feat/structured-extraction-stealth-cli-parity on Jun 11, 2026


@fusengine

Owner

Summary

Makes extraction work on real-world e-commerce and online travel agency (OTA) layouts, adds structured per-card extraction, brings the CLI to capability parity with the MCP server, and adds several MCP tools. Validated end-to-end on Digitec and Booking.

Changes

Extraction

  • Layout-agnostic price parsing: handles the currency before or after the amount, prices split across DOM lines (CHF\n6.90), non-breaking and narrow no-break spaces, and both Swiss (1'234.56) and EU (1.234,56) number formats. Each price carries a context label.
  • browser_products (MCP) and products (CLI) return structured {title, price, currency, url} records per product card; extract_schema gains a container mode.
  • mainText strips nav/aside/search/filter sub-trees (no more filter-slider prices) while keeping every product-grid card.
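A minimal sketch of how layout-agnostic price parsing can work, assuming the card text has already been pulled out of the DOM. The currency list, the `parsePrices`/`parseAmount` names and the "last three words" context label are hypothetical illustrations, not the project's actual code. Whitespace (newlines, nbsp, narrow nbsp) is collapsed first, then one regex accepts either a currency prefix or a suffix.

```typescript
// Hypothetical currency list: the PR does not say which currencies are recognised.
const CURRENCIES = ["CHF", "EUR", "USD", "GBP", "€", "$", "£"];

export interface Price {
  amount: number;
  currency: string;
  context: string; // up to three words preceding the price (illustrative label)
}

const escapeRe = (s: string): string => s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
const CUR = CURRENCIES.map(escapeRe).join("|");
// Thousands groups split by ' ’ space . or , (1'234 / 1 234 / 1.234), or a plain
// integer; either may end with a 1-2 digit decimal part (.56 or ,56).
const AMOUNT = String.raw`\d{1,3}(?:['’ .,]\d{3})+(?:[.,]\d{1,2})?|\d+(?:[.,]\d{1,2})?`;
// Lookarounds stop a match from starting or ending inside a longer number.
const NUM = String.raw`(?<![\d'’.,])(?:${AMOUNT})(?!\d|[.,'’]\d)`;
// Currency prefix ("CHF 6.90") or suffix ("6.90 CHF"), at most one space apart.
const PRICE_RE = new RegExp(`(${CUR}) ?(${NUM})|(${NUM}) ?(${CUR})`, "g");

function parseAmount(raw: string): number | null {
  // Apostrophes and spaces only ever group thousands, so drop them outright.
  let s = raw.replace(/['’ ]/g, "");
  if (!/\d/.test(s)) return null;
  const sep = Math.max(s.lastIndexOf("."), s.lastIndexOf(","));
  const fracLen = sep === -1 ? 0 : s.length - sep - 1;
  if (fracLen === 1 || fracLen === 2) {
    // The last separator before 1-2 trailing digits is the decimal mark.
    s = `${s.slice(0, sep).replace(/[.,]/g, "")}.${s.slice(sep + 1)}`;
  } else {
    // Otherwise every separator groups thousands ("1.234" -> 1234).
    s = s.replace(/[.,]/g, "");
  }
  const n = Number(s);
  return Number.isFinite(n) ? n : null;
}

export function parsePrices(text: string): Price[] {
  // JS \s covers newlines, U+00A0 (nbsp) and U+202F (narrow nbsp), so a price
  // split across DOM lines ("CHF\n6.90") collapses to "CHF 6.90".
  const flat = text.replace(/\s+/g, " ");
  const prices: Price[] = [];
  for (const m of flat.matchAll(PRICE_RE)) {
    const amount = parseAmount(m[2] ?? m[3] ?? "");
    if (amount === null) continue;
    const before = flat.slice(0, m.index ?? 0).trim();
    prices.push({
      amount,
      currency: m[1] ?? m[4] ?? "",
      context: before.split(" ").slice(-3).join(" "),
    });
  }
  return prices;
}
```

For example, `parsePrices("Total CHF\n1'234.56")` yields one CHF price of 1234.56 labelled "Total". The sketch has the same limit as any format-sniffing heuristic. A lone three-digit group such as "1.234" is read as thousands, so truly ambiguous inputs would need a locale hint.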

New MCP tools & config

  • tabs, dialog, downloads, console, network, autoscroll; screenshots as resources (screenshot://{sessionId}/last).
  • FUSE_CAPS group filtering, named auth profile, blockResources, progress notifications, configurable network buffer (FUSE_NETLOG_MAX), self-healing selectors, weekly anti-bot benchmark.
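The PR names FUSE_CAPS and FUSE_NETLOG_MAX but does not document their formats. The sketch below assumes a comma-separated list of group names and a positive integer. It shows one plausible way to filter tools by group and cap the network log. `ToolDef`, `filterTools`, `netlogMax`, `BoundedLog` and the default of 200 are all hypothetical.

```typescript
// Hypothetical shapes: the PR names FUSE_CAPS and FUSE_NETLOG_MAX but not their formats.
export interface ToolDef {
  name: string;
  group: string; // tool group, e.g. "tabs" or "network"
}

/** Keep tools whose group appears in a comma-separated FUSE_CAPS value; unset keeps all. */
export function filterTools(tools: ToolDef[], caps: string | undefined): ToolDef[] {
  const wanted = new Set(
    (caps ?? "")
      .split(",")
      .map((g) => g.trim().toLowerCase())
      .filter((g) => g.length > 0),
  );
  if (wanted.size === 0) return tools;
  return tools.filter((t) => wanted.has(t.group.toLowerCase()));
}

const DEFAULT_NETLOG_MAX = 200; // assumed default, not taken from the PR

/** Parse FUSE_NETLOG_MAX, falling back to the default when missing or invalid. */
export function netlogMax(raw: string | undefined): number {
  const n = Number(raw);
  return Number.isInteger(n) && n > 0 ? n : DEFAULT_NETLOG_MAX;
}

/** Capacity-bounded log: once full, each push evicts the oldest entry. */
export class BoundedLog<T> {
  private readonly items: T[] = [];
  private readonly max: number;

  constructor(max: number) {
    this.max = max;
  }

  push(item: T): void {
    this.items.push(item);
    if (this.items.length > this.max) this.items.shift();
  }

  entries(): readonly T[] {
    return this.items;
  }
}
```

In a server, the two values would come from the environment, for instance `netlogMax(process.env.FUSE_NETLOG_MAX)`. Using `shift()` keeps the sketch short, but a true ring buffer would avoid its O(n) cost when the cap is large.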

CLI parity (9 → 15 commands)

  • New commands: run (multi-step via --steps, --steps-file, or stdin), products, extract, snapshot, screenshot, inspect. --help lists all 15.
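`run` accepts steps from three sources. A minimal sketch of resolving the source and validating the payload follows. It assumes the steps are a JSON array of objects with a string `action`, and that `--steps` takes precedence over `--steps-file`, which takes precedence over stdin. The schema, the precedence and the `stepsSource`/`parseSteps` names are assumptions, not the CLI's documented behaviour.

```typescript
// Hypothetical step shape; the PR does not document the steps schema.
export interface Step {
  action: string;
  [key: string]: unknown;
}

export type StepsSource =
  | { kind: "inline"; json: string }
  | { kind: "file"; path: string }
  | { kind: "stdin" };

/** Pick where steps come from; the precedence --steps > --steps-file > stdin is assumed. */
export function stepsSource(argv: string[]): StepsSource {
  // Accept both "--flag value" and "--flag=value".
  const valueOf = (flag: string): string | undefined => {
    const i = argv.indexOf(flag);
    if (i !== -1 && i + 1 < argv.length) return argv[i + 1];
    const eq = argv.find((a) => a.startsWith(`${flag}=`));
    return eq?.slice(flag.length + 1);
  };
  const inline = valueOf("--steps");
  if (inline !== undefined) return { kind: "inline", json: inline };
  const path = valueOf("--steps-file");
  if (path !== undefined) return { kind: "file", path };
  return { kind: "stdin" };
}

/** Parse and validate a JSON array of steps, reporting the first bad index. */
export function parseSteps(json: string): Step[] {
  const data: unknown = JSON.parse(json);
  if (!Array.isArray(data)) throw new Error("steps must be a JSON array");
  return data.map((s, i) => {
    if (typeof s !== "object" || s === null || typeof (s as Step).action !== "string") {
      throw new Error(`step ${i} is missing a string "action"`);
    }
    return s as Step;
  });
}
```

Reading the file or stdin is left to the caller, so both functions stay pure and easy to test. Whatever the source, the raw text passes through the same `parseSteps` validation.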

Fixes

  • Booking: the intermediate navigation triggered by setting the currency no longer blanks the page.
  • Probe robustness: resilient settle + re-extraction on empty result.
  • Tab network capture wired before navigation.
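The probe fix pairs a resilient settle with re-extraction on an empty result. One way to express that pattern, assuming injected `extract` and `settle` callbacks and a hypothetical default of three attempts: a settle failure (for example, a network-idle timeout) is swallowed, and extraction is retried only while it returns nothing.

```typescript
/**
 * Run `extract` until it yields a non-empty result or attempts run out.
 * `settle` (e.g. waiting for network idle) is hypothetical and may throw.
 */
export async function extractWithRetry<T>(
  extract: () => Promise<T[]>,
  settle: () => Promise<void>,
  attempts = 3, // assumed default, not taken from the PR
): Promise<T[]> {
  let result: T[] = [];
  for (let i = 0; i < attempts; i++) {
    try {
      await settle();
    } catch {
      // A settle timeout is not fatal: extract whatever has rendered so far.
    }
    result = await extract();
    if (result.length > 0) return result;
  }
  return result; // still empty after every attempt
}
```

Treating "empty" rather than "error" as the retry trigger matches the case the fix describes. The page loaded, but the product grid had not rendered yet.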

Test plan

  • bun test tests/unit — 292 pass
  • bun run test:integration — 20 pass (real Chromium)
  • Live: Digitec (products, cheapest MacBook), Booking (Milan rooms, EUR), CLI commands 8/8
  • tsc --noEmit + Biome lint clean
  • CI passes

Breaking changes

None. All additions are backward-compatible; the registerResources signature change is internal.

…LI parity + new MCP tools

Added:
- Layout-agnostic price extraction (prefix/suffix currency, split DOM lines,
  nbsp, CH/EU decimals) with per-price context labels
- Structured per-card extraction: browser_products tool + `products` CLI,
  extract_schema container mode
- New MCP tools: tabs, dialog, downloads, console, network, autoscroll;
  screenshots as MCP resources (screenshot://{sessionId}/last)
- CLI parity: run/products/extract/snapshot/screenshot/inspect (15 commands)
- FUSE_CAPS tool-group filtering, named auth profile, blockResources,
  progress notifications, configurable network buffer, self-healing selectors,
  weekly anti-bot benchmark

Fixed:
- Booking currency intermediate navigation blanking the target page
- Probe robustness: resilient settle + re-extraction on empty result
- Tab network capture wired before navigation
- mainText strips filter/nav sidebars without dropping product grids

Tooling: Biome linter in CI. Suite green: 292 unit, 20 integration (real Chromium).
fusengine merged commit 4fe4ddd into main on Jun 11, 2026
2 checks passed
fusengine deleted the feat/structured-extraction-stealth-cli-parity branch on June 11, 2026, 11:05