Add zero-LLM page exploration tools (search_page, find_elements, structured extract)

## Current state

Pilo's only on-page information-extraction tool is `extract` (`packages/core/src/tools/webActionTools.ts:310-358`), which is LLM-powered:

```ts
extract: tool({
  description: "Extract specific data from the current page for later reference",
  inputSchema: z.object({
    description: z.string(),
  }),
  execute: async ({ description }) => {
    const markdown = await context.browser.getMarkdown();
    const prompt = buildExtractionPrompt(description, markdown);
    const extractResponse = await generateTextWithRetry({...}, { maxAttempts: 3 });
    // ... returns extractedData as markdown string ...
  },
})
```

Every `extract` call:

- Converts the whole page to markdown (via Turndown, `playwrightBrowser.ts:668-696`)
- Sends ~5000 tokens to an LLM
- Retries up to 3 times on failure
- Returns markdown text the agent then has to parse/interpret

The agent has no cheaper alternative for simpler questions ("is the word 'logout' on this page?" / "how many product cards are there?" / "what's the URL of the link with text 'Privacy Policy'?"). Every such question costs an `extract` LLM round trip.

## The gap

Three related capability gaps:

1. **No zero-LLM page text search** — for "does the page contain X?" the agent must call `extract` with a descriptive query and pay LLM cost + latency.
2. **No zero-LLM element query** — for "how many `<article>` elements are there?" or "what are the `href`s of links in `<nav>`?" — same story.
3. **`extract` returns markdown only** — when the agent wants structured data (a list of 10 items each with `{ name, price, url }`), it has to parse the markdown back out, which is fragile. The Vercel AI SDK supports `generateObject` for structured output; Pilo's `extract` doesn't use it.

## Proposed scope

### A. Add `search_page` tool

```ts
search_page: tool({
  description:
    "Search the current page content for text matching a pattern. " +
    "Returns matches with surrounding context. Free and fast — prefer this over " +
    "extract() when you know what text to look for.",
  inputSchema: z.object({
    pattern: z.string(),
    regex: z.boolean().default(false),
    caseSensitive: z.boolean().default(false),
    contextChars: z.number().min(0).max(500).default(80),
    maxResults: z.number().min(1).max(50).default(10),
  }),
  execute: async ({ pattern, regex, caseSensitive, contextChars, maxResults }) => {
    return performActionWithValidation(
      PageAction.SearchPage,
      context,
      undefined,
      JSON.stringify({ pattern, regex, caseSensitive, contextChars, maxResults }),
    );
  },
}),
```

Implementation in `playwrightBrowser.ts` via `page.evaluate`:

```ts
const matches = await this.page!.evaluate(({ pattern, regex, caseSensitive, contextChars, maxResults }) => {
  // Literal mode: escape regex metacharacters so the pattern matches verbatim
  const source = regex ? pattern : pattern.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
  // Throws SyntaxError on an invalid user-supplied regex; surface it as a recoverable error
  const re = new RegExp(source, caseSensitive ? "g" : "gi");
  // Walk text nodes via TreeWalker, accumulating offsets
  // Return array of { match, contextBefore, contextAfter, element selector hint }
}, { ... });
```
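The offset bookkeeping in the TreeWalker comment reduces to searching one string: if the in-page code concatenates the text of the visited nodes (remembering where each node starts), the matching and context slicing can be a plain function. A minimal sketch under that assumption; `searchText`, `SearchMatch`, and the result field names are hypothetical, and the selector hint is left out:

```typescript
// Hypothetical result shape for one search_page hit (selector hint omitted)
interface SearchMatch {
  match: string;
  contextBefore: string;
  contextAfter: string;
}

interface SearchOptions {
  pattern: string;
  regex: boolean;
  caseSensitive: boolean;
  contextChars: number;
  maxResults: number;
}

// Escape regex metacharacters so a literal pattern matches verbatim
function escapeRegExp(s: string): string {
  return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Search one flat string, e.g. the concatenated text of the page's text nodes.
// Throws SyntaxError when regex is true and pattern is not a valid RegExp.
function searchText(text: string, opts: SearchOptions): SearchMatch[] {
  const source = opts.regex ? opts.pattern : escapeRegExp(opts.pattern);
  const re = new RegExp(source, opts.caseSensitive ? "g" : "gi");
  const results: SearchMatch[] = [];
  let m: RegExpExecArray | null;
  while (results.length < opts.maxResults && (m = re.exec(text)) !== null) {
    if (m[0].length === 0) {
      re.lastIndex++; // step past empty matches (e.g. /x*/) to avoid an infinite loop
      continue;
    }
    const start = m.index;
    const end = start + m[0].length;
    results.push({
      match: m[0],
      contextBefore: text.slice(Math.max(0, start - opts.contextChars), start),
      contextAfter: text.slice(end, end + opts.contextChars),
    });
  }
  return results;
}
```

Because the loop steps `lastIndex` past empty matches, a model-supplied regex such as `x*` cannot hang the page.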

### B. Add `find_elements` tool

```ts
find_elements: tool({
  description:
    "Find elements on the page by CSS selector. Returns matching elements with their " +
    "text and attributes. Free and fast — useful for inventory queries like " +
    "'how many product cards are there?' before deciding to extract().",
  inputSchema: z.object({
    selector: z.string(),
    attributes: z.array(z.string()).optional()
      .describe("Specific attributes to include (e.g., ['href', 'data-id'])"),
    maxResults: z.number().min(1).max(100).default(20),
    includeText: z.boolean().default(true),
  }),
  execute: async ({ selector, attributes, maxResults, includeText }) => {
    return performActionWithValidation(
      PageAction.FindElements,
      context,
      undefined,
      JSON.stringify({ selector, attributes, maxResults, includeText }),
    );
  },
}),
```

Implementation runs `document.querySelectorAll` in-page, returns `{ tag, text, attributes }` per match. Resolve `src`/`href` to absolute URLs.
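One way to keep the in-page part small is to collect raw `{ tag, text, attributes }` records with `querySelectorAll` and do filtering and URL resolution in a separate function. A sketch with hypothetical names (`shapeElements`, `RawElement`) and two choices the issue leaves open: every attribute is returned when `attributes` is omitted, and the total match count is reported even when `maxResults` truncates the list, so "how many product cards?" stays answerable.

```typescript
// Hypothetical raw record collected in-page, e.g. via
//   Array.from(document.querySelectorAll(selector), (el) => ({
//     tag: el.tagName,
//     text: el.textContent ?? "", // innerText would skip hidden content instead
//     attributes: Object.fromEntries(Array.from(el.attributes, (a) => [a.name, a.value])),
//   }))
interface RawElement {
  tag: string;
  text: string;
  attributes: Record<string, string>;
}

interface FoundElement {
  tag: string;
  text?: string;
  attributes: Record<string, string>;
}

interface ShapeOptions {
  attributes?: string[]; // omitted: keep every attribute (an assumption)
  maxResults: number;
  includeText: boolean;
  baseUrl: string; // e.g. document.baseURI
}

const URL_ATTRIBUTES = new Set(["href", "src"]);

// Resolve relative URLs against the page; leave unparseable values untouched
function toAbsolute(value: string, baseUrl: string): string {
  try {
    return new URL(value, baseUrl).href;
  } catch {
    return value;
  }
}

function shapeElements(raw: RawElement[], opts: ShapeOptions): { total: number; elements: FoundElement[] } {
  const elements = raw.slice(0, opts.maxResults).map((el) => {
    const names = opts.attributes ?? Object.keys(el.attributes);
    const attributes: Record<string, string> = {};
    for (const name of names) {
      const value = el.attributes[name];
      if (value === undefined) continue; // requested attribute not present on this element
      attributes[name] = URL_ATTRIBUTES.has(name) ? toAbsolute(value, opts.baseUrl) : value;
    }
    const shaped: FoundElement = { tag: el.tag.toLowerCase(), attributes };
    // Collapse whitespace so the agent sees compact text
    if (opts.includeText) shaped.text = el.text.replace(/\s+/g, " ").trim();
    return shaped;
  });
  // total counts every match, not just the ones returned
  return { total: raw.length, elements };
}
```

`new URL(value, baseUrl)` handles relative paths, root-relative paths, and already-absolute URLs alike.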

### C. Add optional `outputSchema` to `extract`

Extend the existing tool:

```ts
extract: tool({
  description:
    "Extract data from the current page. If outputSchema is provided, returns structured " +
    "data matching the schema. Else returns markdown text.",
  inputSchema: z.object({
    description: z.string(),
    outputSchema: z.record(z.string(), z.unknown()).optional()
      .describe("JSON Schema describing the desired output structure"),
  }),
  execute: async ({ description, outputSchema }) => {
    const markdown = await context.browser.getMarkdown();
    if (outputSchema) {
      const zodSchema = jsonSchemaToZod(outputSchema);
      const { object } = await generateObjectWithRetry({
        ...providerConfig,
        prompt: buildExtractionPrompt(description, markdown),
        schema: zodSchema,
      }, { maxAttempts: 3 });
      return { success: true, action: "extract", description, data: object };
    } else {
      // existing markdown path
    }
  },
}),
```

`generateObjectWithRetry` is a thin wrapper around `generateObject` from the AI SDK following the same retry pattern as `generateTextWithRetry`.
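The retry loop can be factored into a generic helper so the text and object variants share it. A sketch with a hypothetical `withRetry` name and optional exponential backoff; how `generateTextWithRetry` actually waits between attempts isn't described here, so match that rather than this:

```typescript
interface RetryOptions {
  maxAttempts: number;
  baseDelayMs?: number; // 0 or omitted: retry immediately
}

// Call fn up to maxAttempts times; rethrow the last error if every attempt fails
async function withRetry<T>(
  fn: (attempt: number) => Promise<T>,
  { maxAttempts, baseDelayMs = 0 }: RetryOptions,
): Promise<T> {
  let lastError: unknown;
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    try {
      return await fn(attempt);
    } catch (err) {
      lastError = err;
      if (attempt < maxAttempts && baseDelayMs > 0) {
        // Exponential backoff: baseDelayMs, 2x, 4x, ...
        await new Promise((resolve) => setTimeout(resolve, baseDelayMs * 2 ** (attempt - 1)));
      }
    }
  }
  throw lastError;
}

// Hypothetical wiring, assuming generateObject from the `ai` package:
// const generateObjectWithRetry = (args, opts) => withRetry(() => generateObject(args), opts);
```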

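The agent supplies `outputSchema` as JSON in its tool call, so it arrives as a plain JSON Schema object; it can't be a Zod schema. Two ways to use it: convert it with a small JSON Schema → Zod converter (the `jsonSchemaToZod` call above), or, more simply, keep the flexible `outputSchema: z.record(z.string(), z.unknown())` input, call `generateObject({ output: 'no-schema' })`, and validate the returned JSON against `outputSchema` afterwards. The AI SDK's `jsonSchema()` helper, which wraps a JSON Schema for use as the `schema` option, may remove the need for a converter; worth checking against the SDK version Pilo uses.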
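On the `no-schema` route, "validate after" means checking the model's JSON against the agent-supplied schema before returning it. A full validator such as Ajv is the obvious choice; the dependency-free sketch below handles only a small subset of JSON Schema (`type`, `properties`, `required`, `items`) and returns error paths the agent can act on. `validate` and `MiniSchema` are hypothetical names:

```typescript
// The JSON Schema subset this sketch understands (an assumption, not the full spec)
interface MiniSchema {
  type?: "object" | "array" | "string" | "number" | "integer" | "boolean" | "null";
  properties?: Record<string, MiniSchema>;
  required?: string[];
  items?: MiniSchema;
}

// Returns human-readable error paths; an empty array means the value conforms
function validate(value: unknown, schema: MiniSchema, path = "$"): string[] {
  switch (schema.type) {
    case "object": {
      if (typeof value !== "object" || value === null || Array.isArray(value)) {
        return [`${path}: expected object`];
      }
      const obj = value as Record<string, unknown>;
      const errors: string[] = [];
      for (const key of schema.required ?? []) {
        if (!(key in obj)) errors.push(`${path}.${key}: missing required property`);
      }
      for (const [key, sub] of Object.entries(schema.properties ?? {})) {
        if (key in obj) errors.push(...validate(obj[key], sub, `${path}.${key}`));
      }
      return errors;
    }
    case "array": {
      if (!Array.isArray(value)) return [`${path}: expected array`];
      const items = schema.items;
      if (!items) return [];
      const errors: string[] = [];
      value.forEach((item, i) => errors.push(...validate(item, items, `${path}[${i}]`)));
      return errors;
    }
    case "integer":
      return Number.isInteger(value) ? [] : [`${path}: expected integer`];
    case "null":
      return value === null ? [] : [`${path}: expected null`];
    case "string":
    case "number":
    case "boolean":
      return typeof value === schema.type ? [] : [`${path}: expected ${schema.type}`];
    default:
      return []; // no "type" keyword: accept any value
  }
}
```

Returning the error list rather than a boolean lets a failed validation feed back into a retry prompt or into a `{ success: false, isRecoverable: true }` result.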

### D. Update prompts

In `prompts.ts:163-210` (`buildToolExamples`):

```
- search_page({"pattern": "logout"}) - Search page text. Free, fast.
- find_elements({"selector": "a.nav-link"}) - Query elements by CSS selector. Free, fast.
- extract({"description": "...", "outputSchema": {...}}) - Extract data. Use outputSchema
  for structured output (lists of items, key-value pairs, etc.).
```

Add to best practices:

> For inventory questions ("how many X are there?", "is Y on the page?"), prefer
> `find_elements` or `search_page` — they are free and instant. Reserve `extract` for
> cases where you need synthesized or structured data the page doesn't expose directly.

## Implementation notes

- These tools run via `performActionWithValidation` for consistency in error handling and event emission, even though they aren't "actions" in the traditional sense (no DOM mutation). The naming is a bit off but consistent with the existing pattern.
- `search_page` regex compilation can throw `SyntaxError` on bad patterns — return `{ success: false, error: "...", isRecoverable: true }` rather than crashing.
- `find_elements` selector can throw `DOMException` on bad selectors — same treatment.
- Both tools should be safe and idempotent — no `pageChanged: true`.
- The result shapes are not the standard `ActionResult`; consider extending the type or adding a discriminated union. Worth a small refactor.
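A sketch of the discriminated union and the error mapping from the notes above, with hypothetical names (`InspectionResult`, `runInspection`). It assumes the try/catch runs inside the `page.evaluate` callback, since an in-page exception may reach Node only as a generic error, losing its original type. An invalid regex throws `SyntaxError` and `querySelectorAll` throws a `DOMException` whose `name` is `"SyntaxError"`, so the check is on `name`:

```typescript
type InspectionAction = "search_page" | "find_elements";

// Hypothetical result union; only success/error/isRecoverable come from the issue
type InspectionResult<T> =
  | { success: true; action: InspectionAction; data: T }
  | { success: false; action: InspectionAction; error: string; isRecoverable: true };

// Matches both SyntaxError (bad regex) and DOMException named "SyntaxError" (bad selector)
function isSyntaxError(err: unknown): err is { name: string; message: string } {
  return typeof err === "object" && err !== null && (err as { name?: unknown }).name === "SyntaxError";
}

function runInspection<T>(action: InspectionAction, fn: () => T): InspectionResult<T> {
  try {
    return { success: true, action, data: fn() };
  } catch (err) {
    if (isSyntaxError(err)) {
      // Bad user input: report it so the agent can fix the pattern or selector and retry
      return { success: false, action, error: `Invalid pattern or selector: ${err.message}`, isRecoverable: true };
    }
    throw err; // unexpected failures keep the normal error path
  }
}
```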

## Acceptance criteria

- `search_page` and `find_elements` are available in `webActionTools`, with the right tool descriptions and prompt examples.
- `extract` accepts an optional `outputSchema` and returns structured data when provided.
- Tests in `packages/core/test/` cover: text search (literal and regex), CSS query for various selectors, bad-pattern error handling, structured extract with a schema.
- A manual eval on a small task set (e.g., "find the number of pricing tiers on this page" / "extract the top 5 product names and prices") shows the new tools reduce LLM calls per task.

## Effort estimate

2-4 days. The two zero-LLM tools are quick (1 day each). The `outputSchema` work depends on how clean the JSON Schema → Zod path is.

## Related issues

Pairs with the action-vocabulary-additions issue (both expand tool capabilities). Related to the modal/viewport-context issue (those tools also benefit from a clearer page model).

## Files likely affected

- `packages/core/src/tools/webActionTools.ts` (or new `tools/inspectionTools.ts`)
- `packages/core/src/browser/ariaBrowser.ts` (PageAction enum)
- `packages/core/src/browser/playwrightBrowser.ts` (handlers)
- `packages/core/src/prompts.ts` (tool examples + best practices)
- `packages/core/src/utils/retry.ts` (add `generateObjectWithRetry`)
- `packages/core/test/`
