Deep-CodeAI · Skobeltsyn · Jun 15, 2026 · Jun 15, 2026
diff --git a/CHANGELOG.md b/CHANGELOG.md
@@ -4,6 +4,21 @@ All notable changes to Agents.KT are documented here. The format follows [Keep a
 
 ## [Unreleased]
 
+### Added — `nlwebSearch` tool: query an NLWeb endpoint (#4541, PRD §12.9)
+
+`tools { +nlwebSearchTool(baseUrl = "https://example.com") }` lets an agent reasoning on its own model query an
+[NLWeb](https://github.com/nlweb-ai/NLWeb) endpoint — a website's natural-language interface over its
+**schema.org**-structured content — and fold the ranked, typed results into context. Mirrors
+`perplexitySearch`: marked `untrustedOutput = true` (fetched web content is wrapped in the
+`{trusted:false}` envelope and the model is warned to treat it as data, #642), with pure
+`buildNlWebAskBody` / `parseNlWebResponse` wire helpers and an injectable `NlWebSearchBackend` seam.
+Posts to `<baseUrl>/ask` (no API key — NLWeb endpoints are public); `NlWebSearchOptions(site, mode =
+LIST/SUMMARIZE/GENERATE)` selects the namespace and query mode; results render as a numbered list of
+schema.org matches (name, `@type`, description, url) plus any summarize/generate answer. The first slice
+of the agent↔web-content layer (epic #4539). 8 tests. (Every NLWeb endpoint is also an MCP server, so an
+NLWeb `/mcp` URL is equally consumable through the existing MCP client; this tool is the zero-wiring
+`/ask`-over-HTTP path.)
+
 ## [0.8.0] — 2026-06-14
 
 **Interoperable, multimodal agents — with capability grants.** The largest minor since 0.5.0:

diff --git a/README.md b/README.md
@@ -207,6 +207,7 @@ These APIs work in `main`, are unit-tested, and are exercised by integration tes
 - **Vision input to models** — `LlmMessage(role = "user", content = "...", images = listOf(ImagePart(base64, ImagePart.WireMime.Png)))` (#2470 slice a) reaches all four built-in adapters: Ollama emits `images: [<b64>...]`, Claude emits `{type:"image", source:{type:"base64",...}}` content blocks, OpenAI emits `{type:"image_url", image_url:{url:"data:..."}}` content blocks, DeepSeek inherits OpenAI (silently ignored on non-vision models). Closed `ImagePart.WireMime { Png, Jpeg, Gif, Webp }` — no `String` mime. Programmatic `VisionFixtures.threeSquaresPng()` / `housePng()` (256×256, `BufferedImage`-rendered, ~5KB) + per-provider live tests (qwen3-vl:8b / Haiku 4.5 / gpt-4o-mini) with cost discipline. See [docs/multimodal.md](docs/multimodal.md#vision-input--talking-to-the-model-2470-slice-a).
 - **Typed `Content.Image` at the agent surface** — `agent.invokeWithAttachments("describe", attachments = listOf(Content.Image(ref, ImageMime.Png)))` (#2470 slice b). Inject a `BlobStore` via `blobStore(store)` in the agent DSL; the runtime dereferences each `Content.Image` against the store, base64-encodes once, and attaches `ImagePart` to the first user message. Closed `ImageMime → ImagePart.WireMime` mapping covers all four variants. Misconfiguration errors fast (no `blobStore` configured, missing blob for a ref). Composes with snapshot/resume — refs travel in the snapshot; the same store dereferences on resume. Suspending sibling `invokeSuspendWithAttachments`. Live tests across all three vision providers via the agent surface. See [docs/multimodal.md](docs/multimodal.md#agent-attachments--typed-contentimage-at-the-invoke-surface-2470-slice-b).
 - **Web-grounded search tool (`perplexitySearch`)** — `tools { +perplexitySearchTool(perplexityKey) }` lets an agent reasoning on its *own* model (Claude/OpenAI/Ollama/…) fetch live, cited facts from Perplexity's Sonar API. The tool is `untrustedOutput = true`, so results are auto-wrapped in the `{"trusted":false}` envelope and the model is warned to treat them as data, not instructions (#642) — web search is the canonical prompt-injection vector. The result renders the answer plus a numbered source list parsed from `search_results[]` (citations land in both the model context and the JSONL audit row). Controls via `perplexitySearchOptions { mode = SearchMode.ACADEMIC; recency = SearchRecency.WEEK; allowDomains("arxiv.org"); contextSize = SearchContextSize.HIGH; structuredOutput(MyType::class) }` map to `search_mode` / `search_recency_filter` / `search_domain_filter` / `web_search_options` / `response_format` json_schema (#3674). Key from `.secrets/perplexity-key`. See [docs/providers.md](docs/providers.md#web-grounded-search-tool-perplexitysearch-3676--3677).
+- **NLWeb endpoint tool (`nlwebSearch`)** — `tools { +nlwebSearchTool(baseUrl = "https://example.com") }` lets an agent query an [NLWeb](https://github.com/nlweb-ai/NLWeb) endpoint — a website's natural-language interface over its **schema.org**-structured content — and fold the ranked, typed results into context (#4541, PRD §12.9). Like `perplexitySearch` it is `untrustedOutput = true` (fetched web content is treated as data, not instructions). Query options are passed as `NlWebSearchOptions(site = "podcasts", mode = NlWebMode.GENERATE)` through the tool's `options` parameter. NLWeb endpoints need no API key. (Every NLWeb endpoint is also an MCP server, so an NLWeb `/mcp` URL is equally consumable through the existing MCP client — this tool is the zero-wiring `/ask`-over-HTTP path.)
 - **Prompt caching across providers** — `agent { caching { enabled = true; cacheSystemPrompt = true; cacheToolDefs = true; cacheConversation = Rolling; ttl = 1.hours; cacheable("doc-id") { ... } } }`. Vendor-neutral DSL drives Anthropic's explicit `cache_control` breakpoints (#2658), OpenAI / DeepSeek automatic prefix caching with a stable `prompt_cache_key` routing hint (#2659 / #2661), Ollama / vLLM / SGLang engine-level KV-cache reuse (no-op hints, #2662), and surfaces cache reads + writes + hit-rate on `TokenUsage` (#2663). A prefix-stability guard (#2657) detects silent cache-busters — timestamps, UUIDs, non-deterministic ordering inside cacheable segments — and warns before you pay for a single non-cached run. Off by default; non-breaking. See [docs/caching.md](docs/caching.md).
 - **JSONL audit exporter** — `:agents-kt-observability` writes append-only, one-line-per-event audit rows with `requestId`, `sessionId`, `manifestHash`, agent/skill/tool ids, event type, provider, and model; raw arguments/results are omitted by default (#1914). See [docs/observability.md](docs/observability.md).
 - **ObservabilityBridge adapters** — `.observe(OtelBridge(tracer))` maps runtime events to OTel spans (#1908), `.observe(LangSmithBridge(apiKey, project))` maps the same events to LangSmith run trees (#1909), and `.observe(LangfuseBridge(publicKey, secretKey))` maps them to Langfuse traces, generations, spans, and events (#1910), while keeping core vendor-free. See [docs/observability.md](docs/observability.md).
@@ -259,7 +260,7 @@ What the framework does **not** enforce — your responsibility:
 
 ### Known Limitations
 
-- **Seven LLM providers shipped** — Ollama, Anthropic, OpenAI, DeepSeek, Kimi (Moonshot AI, #2697), OpenRouter (#2701), and Perplexity (Sonar, #3675) — the last with a `perplexitySearch` web-grounded search tool (#3676 / #3677). Google (Gemini) is the main adapter still on the roadmap (Phase 2); the injectable `ModelClient` covers test stubs and your own adapters in the meantime.
+- **Eight LLM providers shipped** — Ollama, Anthropic, OpenAI, DeepSeek, Kimi (Moonshot AI, #2697), OpenRouter (#2701), Perplexity (Sonar, #3675, which also ships the `perplexitySearch` web-grounded search tool, #3676 / #3677), and Google Gemini (#1917, a full from-scratch adapter with native SSE, function calling, and `responseJsonSchema` decoding). The injectable `ModelClient` covers test stubs and your own adapters.
 - **Synchronous agentic loop** — `runBlocking` inside the loop until the suspend refactor lands (#638). Calling agents from existing coroutine scopes works but doesn't propagate cancellation cleanly.
 - **No built-in MCP rate limiter** — use `McpServer` auth/policy plus a gateway for throttling. Agent/runtime audit events have a first-party JSONL exporter in `:agents-kt-observability`.
 - **Streaming runtime** *(shipped — v0.5.0)*. `agent.session(input): AgentSession<OUT>` exposes `events: Flow<AgentEvent<OUT>>` — bracket events (`SkillStarted` / `SkillCompleted` / `Completed<OUT>` / `Failed`) plus mid-loop `Token` / `Reasoning` / `ToolCallStarted` / `ToolCallArgumentsDelta` / `ToolCallFinished` events as the agentic loop runs. All events carry `requestId`, `sessionId`, and `manifestHash` for audit correlation (#1913). All eight providers stream at the wire — Ollama (NDJSON), Anthropic, OpenAI, and Gemini (native SSE), with DeepSeek / Kimi / OpenRouter / Perplexity inheriting the OpenAI-compatible SSE path; live integration tests measure 19 / 2 / 19 chunks for the original three native adapters. `SkillCompleted.tokensUsed` and `Completed.tokensUsed` carry cumulative `TokenUsage` across all turns. The underlying `LlmChunk` sealed type + `ModelClient.chatStream(messages): Flow<LlmChunk>` foundation (#1722) is what custom adapters plug into. See [docs/streaming.md](docs/streaming.md) for the full API + the [v0.5.0 streaming premortem](docs/premortem-0.5.0-streaming.md) for design rationale.

diff --git a/docs/prd.md b/docs/prd.md
@@ -2949,7 +2949,7 @@ Tracking: epic `[interop] x402 agent payments`, deferred — seller-side experim
 
 **Query shape** (the `/ask` and `/mcp` endpoints, same args): `query` (required), `site`, `prev` (conversation history — server is stateless), `mode` (`list` = ranked results, `summarize` = list + LLM summary, `generate` = full RAG answer), `streaming`. Response: `{query_id, results[]}` where each result is `{url, name, site, score, description, schema_object}` (`schema_object` = the schema.org JSON). Build tolerant of two divergent schemas — the implemented `schema_object` shape and the newer nlweb.ai v0.55 `query/context/prefer/meta` envelope.
 
-**Client-side (consume NLWeb as knowledge) — do opportunistically, ~free.** A thin helper over the MCP client: point it at an NLWeb `/mcp` URL, `tools/call` the `ask` tool, surface each `schema_object` into a `KnowledgeProvider`/retrieval source. Mode mapping: `list`→retrieval source, `generate`→delegate-the-answer. The honest, shippable claim is *"agents.kt MCP clients can consume NLWeb endpoints today."*
+**Client-side (consume NLWeb as knowledge) — SHIPPED (#4541).** `tools { +nlwebSearchTool(baseUrl) }` — a tool (mirroring `perplexitySearch`) that posts to an NLWeb `/ask` endpoint and folds the ranked schema.org results into the agent's context, `untrustedOutput = true`. `NlWebSearchOptions(site, mode = LIST/SUMMARIZE/GENERATE)` selects namespace + mode. No API key (NLWeb endpoints are public). This is the zero-wiring `/ask`-over-HTTP path; because every NLWeb endpoint is also an MCP server, an NLWeb `/mcp` URL is *equally* consumable through the existing MCP client (`tools/call` the `ask` tool) — so *"agents.kt agents can consume NLWeb endpoints today"* holds via both transports.
 
 **Server-side (expose agent data as an NLWeb endpoint) — deferred, niche.** That means standing up schema.org-shaped data + a vector store + an LLM-in-the-loop retrieval pipeline behind `/ask` + `/mcp` — effectively building/operating a RAG service. An independent benchmark (Univ. Mannheim, [arXiv 2511.23281](https://arxiv.org/abs/2511.23281)) finds NLWeb *ties* RAG/MCP on effectiveness but plain RAG is more cost-effective — so NLWeb's value is standardization, not performance. This is an **application** concern, not a runtime primitive; defer unless a concrete consumer needs to discover our content over the open web.
 

diff --git a/src/main/kotlin/agents_engine/model/HttpNlWebSearchBackend.kt b/src/main/kotlin/agents_engine/model/HttpNlWebSearchBackend.kt
@@ -0,0 +1,48 @@
+package agents_engine.model
+
+import java.net.URI
+import java.net.http.HttpClient
+import java.net.http.HttpRequest
+import java.net.http.HttpResponse
+import kotlin.time.Duration
+import kotlin.time.toJavaDuration
+
+/**
+ * Default [NlWebSearchBackend] (#4541) — POSTs to `<baseUrl>/ask` and parses the
+ * schema.org result list. NLWeb endpoints are public, so there is no auth header.
+ * Reuses the same JDK HttpClient shape as [HttpPerplexitySearchBackend].
+ */
+class HttpNlWebSearchBackend(
+    private val baseUrl: String,
+    private val requestTimeout: Duration = OpenAiClient.DEFAULT_REQUEST_TIMEOUT,
+    connectTimeout: Duration = OpenAiClient.DEFAULT_CONNECT_TIMEOUT,
+    httpClient: HttpClient? = null,
+) : NlWebSearchBackend {
+
+    private val http: HttpClient = httpClient ?: HttpClient.newBuilder()
+        .connectTimeout(connectTimeout.toJavaDuration())
+        .build()
+
+    override fun search(query: String, options: NlWebSearchOptions): NlWebSearchResult {
+        val body = buildNlWebAskBody(query, options)
+        val request = HttpRequest.newBuilder()
+            .uri(URI.create("${baseUrl.trimEnd('/')}/ask"))
+            .timeout(requestTimeout.toJavaDuration())
+            .header("content-type", "application/json")
+            .POST(HttpRequest.BodyPublishers.ofString(body))
+            .build()
+        val response = http.send(request, HttpResponse.BodyHandlers.ofString())
+        if (response.statusCode() >= HTTP_BAD_REQUEST) {
+            // Parse failures (endpoint `error`, non-JSON body) are rethrown; otherwise report the status line.
+            val parsed = runCatching { parseNlWebResponse(response.body()) }
+            parsed.exceptionOrNull()?.let { throw it }
+            throw NlWebSearchException("NLWeb HTTP ${response.statusCode()}: ${response.body().take(ERROR_BODY_CAP)}")
+        }
+        return parseNlWebResponse(response.body())
+    }
+
+    private companion object {
+        const val HTTP_BAD_REQUEST = 400
+        const val ERROR_BODY_CAP = 500
+    }
+}
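Reviewer sketch, not part of the patch: in the error branch above, an exception raised while parsing the body is rethrown as-is, so its message does not include the HTTP status. A possible variation that keeps the status in every failure message is sketched below. `EndpointException`, `failureFor`, and the `parse` lambda are hypothetical stand-ins (for `NlWebSearchException` and `parseNlWebResponse`), introduced only so the snippet compiles on its own.

```kotlin
// Hypothetical stand-in for NlWebSearchException.
class EndpointException(message: String) : RuntimeException(message)

const val ERROR_BODY_CAP = 500

// `parse` stands in for parseNlWebResponse: it throws when the body carries an
// `error` field or is not a JSON object, and returns normally otherwise.
fun failureFor(status: Int, body: String, parse: (String) -> Unit): EndpointException {
    // Prefer the parser's message; fall back to a capped copy of the raw body.
    val detail = runCatching { parse(body) }.exceptionOrNull()?.message
        ?: body.take(ERROR_BODY_CAP)
    // The status code is always part of the message, whichever detail was chosen.
    return EndpointException("NLWeb HTTP $status: $detail")
}

fun main() {
    val rejecting: (String) -> Unit = { throw EndpointException("NLWeb error: index unavailable") }
    val accepting: (String) -> Unit = { }
    println(failureFor(503, """{"error":"index unavailable"}""", rejecting).message)
    println(failureFor(502, "<html>Bad Gateway</html>", accepting).message)
}
```

Whether the changed message format is acceptable depends on what the existing tests assert, which this excerpt does not show.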
diff --git a/src/main/kotlin/agents_engine/model/NlWebMode.kt b/src/main/kotlin/agents_engine/model/NlWebMode.kt
@@ -0,0 +1,9 @@
+package agents_engine.model
+
+/**
+ * NLWeb `/ask` query mode (#4541). `LIST` returns the ranked schema.org matches;
+ * `SUMMARIZE` adds an LLM summary of the list; `GENERATE` is full RAG — the
+ * endpoint composes a direct answer from the retrieved items. Sent lowercase on
+ * the wire. Defaults to `LIST`.
+ */
+enum class NlWebMode { LIST, SUMMARIZE, GENERATE }
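Reviewer sketch, not part of the patch: the request body that the mode ends up in can be illustrated with a self-contained approximation of `buildNlWebAskBody`. `Mode`, `jsonString`, and `askBody` are hypothetical local names; `jsonString` is a minimal escaper standing in for the internal `toJsonString()`, whose exact behavior is not shown in this diff.

```kotlin
// Local stand-in for the enum in this diff, so the sketch compiles on its own.
enum class Mode { LIST, SUMMARIZE, GENERATE }

// Minimal JSON string escaper (assumption: covers quotes, backslashes, control chars).
fun jsonString(s: String): String = buildString {
    append('"')
    for (c in s) when (c) {
        '"' -> append("\\\"")
        '\\' -> append("\\\\")
        '\n' -> append("\\n")
        '\r' -> append("\\r")
        '\t' -> append("\\t")
        else -> if (c < ' ') append("\\u%04x".format(c.code)) else append(c)
    }
    append('"')
}

// Builds the /ask body: `site` is omitted when null, the mode is lowercased,
// and streaming is switched off so the reply is a single JSON document.
fun askBody(query: String, site: String?, mode: Mode): String {
    val fields = buildList {
        add("\"query\":${jsonString(query)}")
        if (site != null) add("\"site\":${jsonString(site)}")
        add("\"mode\":${jsonString(mode.name.lowercase())}")
        add("\"streaming\":false")
    }
    return fields.joinToString(",", prefix = "{", postfix = "}")
}

fun main() {
    // prints {"query":"vegan lasagna","mode":"list","streaming":false}
    println(askBody("vegan lasagna", site = null, mode = Mode.LIST))
    // prints {"query":"top episodes","site":"podcasts","mode":"generate","streaming":false}
    println(askBody("top episodes", site = "podcasts", mode = Mode.GENERATE))
}
```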
diff --git a/src/main/kotlin/agents_engine/model/NlWebResult.kt b/src/main/kotlin/agents_engine/model/NlWebResult.kt
@@ -0,0 +1,15 @@
+package agents_engine.model
+
+/**
+ * One result from an NLWeb `/ask` response (#4541): a ranked match backed by the
+ * site's schema.org-structured content. [schemaType] is the `@type` lifted from
+ * the result's `schema_object` (e.g. `Recipe`, `PodcastEpisode`) when present.
+ */
+data class NlWebResult(
+    val url: String,
+    val name: String? = null,
+    val site: String? = null,
+    val score: Double? = null,
+    val description: String? = null,
+    val schemaType: String? = null,
+)
diff --git a/src/main/kotlin/agents_engine/model/NlWebSearch.kt b/src/main/kotlin/agents_engine/model/NlWebSearch.kt
@@ -0,0 +1,115 @@
+package agents_engine.model
+
+import agents_engine.generation.LenientJsonParser
+import agents_engine.internal.toJsonString
+
+/**
+ * `agents_engine/model/NlWebSearch.kt` — #4541 (PRD §12.9), the `nlwebSearch`
+ * tool factory plus its pure request/response wire helpers. Supporting types
+ * live one-per-file alongside (`NlWebSearchArgs`, `NlWebMode`,
+ * `NlWebSearchOptions`, `NlWebResult`, `NlWebSearchResult`, `NlWebSearchBackend`
+ * + `HttpNlWebSearchBackend`, `NlWebSearchException`).
+ *
+ * [NLWeb](https://github.com/nlweb-ai/NLWeb) gives a website a natural-language
+ * interface over its **schema.org-structured content**. This tool lets an agent
+ * on its OWN model ask an NLWeb endpoint and fold the ranked, schema.org-typed
+ * results into its context — the inbound, external-knowledge counterpart to
+ * MCP-tools. It is marked [ToolDef.untrustedOutput] so the agentic loop wraps the
+ * result in the `{trusted:false}` envelope and warns the model to treat fetched
+ * web content as data, not instructions (#642).
+ *
+ * (Every NLWeb endpoint is also an MCP server, so an NLWeb `/mcp` URL is equally
+ * consumable through the existing MCP client; this tool is the zero-wiring
+ * `/ask`-over-HTTP path for an agent on any model.)
+ *
+ * Register on an agent via the `tools { }` DSL:
+ * ```
+ * tools { +nlwebSearchTool(baseUrl = "https://example.com") }
+ * ```
+ */
+
+/**
+ * Build the NLWeb `/ask` request body. Pure + internal so it is unit-testable
+ * without a live call. Streaming is disabled so the response is a single JSON
+ * blob; [NlWebSearchOptions.site] is omitted when null.
+ */
+internal fun buildNlWebAskBody(query: String, options: NlWebSearchOptions): String {
+    val fields = buildList {
+        add(""""query":${query.toJsonString()}""")
+        options.site?.let { add(""""site":${it.toJsonString()}""") }
+        add(""""mode":${options.mode.name.lowercase().toJsonString()}""")
+        add(""""streaming":false""")
+    }
+    return "{${fields.joinToString(",")}}"
+}
+
+/**
+ * Parse an NLWeb `/ask` response body into an [NlWebSearchResult]. Pure +
+ * internal so it is unit-testable without a live call.
+ *
+ * - `results[]` ← each `{url, name, site, score, description, schema_object}`;
+ *   `schemaType` is `schema_object.@type` when present.
+ * - `answer` ← a top-level `summary` / `answer` (present in `SUMMARIZE` /
+ *   `GENERATE` mode), else null.
+ * - a top-level `error` raises [NlWebSearchException].
+ */
+internal fun parseNlWebResponse(rawJson: String): NlWebSearchResult {
+    val root = LenientJsonParser.parse(rawJson) as? Map<*, *>
+        ?: throw NlWebSearchException("NLWeb response was not a JSON object")
+
+    root["error"]?.let { err ->
+        val message = (err as? Map<*, *>)?.get("message") ?: err
+        throw NlWebSearchException("NLWeb error: $message")
+    }
+
+    val queryId = root["query_id"] as? String
+    val answer = (root["summary"] as? String) ?: (root["answer"] as? String)
+    val results = (root["results"] as? List<*>).orEmpty().mapNotNull { parseNlWebResult(it) }
+    return NlWebSearchResult(results = results, answer = answer, queryId = queryId)
+}
+
+private fun parseNlWebResult(item: Any?): NlWebResult? {
+    val obj = item as? Map<*, *> ?: return null
+    val url = obj["url"] as? String ?: return null
+    val schemaType = (obj["schema_object"] as? Map<*, *>)?.get("@type") as? String
+    return NlWebResult(
+        url = url,
+        name = obj["name"] as? String,
+        site = obj["site"] as? String,
+        score = (obj["score"] as? Number)?.toDouble(),
+        description = obj["description"] as? String,
+        schemaType = schemaType,
+    )
+}
+
+/**
+ * Build the `nlweb_search` tool. Register via `tools { +nlwebSearchTool(baseUrl) }`.
+ *
+ * - `untrustedOutput = true` — results are auto-wrapped in the `{trusted:false}`
+ *   envelope and the model is warned to treat them as data (#642).
+ * - On a blank query or a backend failure, returns an `"ERROR: …"` string
+ *   (the agentic loop's standard tool-error convention) rather than throwing.
+ *
+ * @param baseUrl the NLWeb endpoint base URL (e.g. `http://localhost:8000`); `/ask` is appended.
+ * @param options default query options (`site` namespace + list/summarize/generate `mode`).
+ * @param backend override the network backend — injected in tests.
+ */
+fun nlwebSearchTool(
+    baseUrl: String,
+    options: NlWebSearchOptions = NlWebSearchOptions(),
+    backend: NlWebSearchBackend = HttpNlWebSearchBackend(baseUrl),
+): ToolDef = ToolDef(
+    name = "nlweb_search",
+    description = "Query an NLWeb endpoint — a website's natural-language interface — for schema.org-" +
+        "structured answers from its content (its catalog, articles, recipes, etc.). Arguments: {query: string}.",
+    argsType = NlWebSearchArgs::class,
+    untrustedOutput = true,
+) { args ->
+    val query = args["query"]?.toString().orEmpty()
+    if (query.isBlank()) {
+        "ERROR: missing 'query'"
+    } else {
+        runCatching { backend.search(query, options) }
+            .getOrElse { e -> "ERROR: nlweb_search failed: ${e.message}" }
+    }
+}
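Reviewer sketch, not part of the patch: the injectable backend seam and the `"ERROR: …"` convention can be exercised without a network call. The snippet below uses trimmed local mirrors of the types in this diff (`Result`, `SearchResult`, `Backend`) plus a hypothetical `render` function. How the real tool turns an `NlWebSearchResult` into the numbered list described in the CHANGELOG is not shown in this excerpt, so the output format here is an assumption.

```kotlin
// Local mirrors of the types in this diff, trimmed to what the sketch needs.
data class Result(
    val url: String,
    val name: String? = null,
    val schemaType: String? = null,
    val description: String? = null,
)
data class SearchResult(val results: List<Result>, val answer: String? = null)

// Same idea as the backend seam: one blocking call from query to result.
fun interface Backend { fun search(query: String): SearchResult }

// Hypothetical renderer: optional answer first, then a numbered list of matches.
fun render(r: SearchResult): String = buildString {
    r.answer?.let { appendLine(it).appendLine() }
    r.results.forEachIndexed { i, m ->
        append("${i + 1}. ${m.name ?: m.url}")
        m.schemaType?.let { append(" [$it]") }
        appendLine()
        m.description?.let { appendLine("   $it") }
        appendLine("   ${m.url}")
    }
}.trimEnd()

// Follows the tool body's convention: a blank query or a backend failure
// becomes an "ERROR: ..." string instead of an exception.
fun runTool(query: String, backend: Backend): String =
    if (query.isBlank()) "ERROR: missing 'query'"
    else runCatching { render(backend.search(query)) }
        .getOrElse { e -> "ERROR: nlweb_search failed: ${e.message}" }

fun main() {
    val stub = Backend {
        SearchResult(listOf(Result("https://example.com/r/1", "Vegan lasagna", "Recipe", "A dairy-free bake.")))
    }
    println(runTool("vegan lasagna", stub))
    println(runTool(" ", stub))
    println(runTool("x", Backend { error("connection refused") }))
}
```

A stub like this is the kind of thing the `backend` parameter allows in tests: the tool logic runs unchanged while the HTTP call is replaced.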
diff --git a/src/main/kotlin/agents_engine/model/NlWebSearchArgs.kt b/src/main/kotlin/agents_engine/model/NlWebSearchArgs.kt
@@ -0,0 +1,15 @@
+package agents_engine.model
+
+import agents_engine.generation.Generable
+import agents_engine.generation.Guide
+
+/**
+ * The single `@Generable` argument of the `nlwebSearch` tool (#4541): the
+ * natural-language query to ask an [NLWeb](https://github.com/nlweb-ai/NLWeb)
+ * endpoint, which answers from a website's schema.org-structured content.
+ */
+@Generable("Arguments for a natural-language query against an NLWeb endpoint")
+data class NlWebSearchArgs(
+    @Guide("The natural-language query to ask the NLWeb site")
+    val query: String,
+)