Problem
MCP Apps relies on resources/read returning the full app content as a single HTML string or base64 blob. This approach ignores the inherent streamability of both HTTP and HTML, and introduces a performance bottleneck for larger applications.
Web performance best practices rely heavily on streaming HTML, and browsers are designed to parse HTML chunks as they arrive over the network, build the DOM progressively, and initiate requests for subresources (like CSS and JS) as soon as their tags are parsed (possibly by a preload parser).
But by forcing the HTML to be fully buffered into a JSON string property:
- Time to first paint is delayed: The user sees a blank screen until the entire JSON payload is downloaded and parsed.
- Memory overhead: Large strings are kept in memory for parsing.
- Loss of parallelism: The browser cannot begin fetching styles or scripts referenced at the top of the HTML until the full document is received.
Workarounds
Since normal HTTP is streamable, the host could stream the JSON response of a UI resource and extract the HTML string contents on the fly, all using fetch() + ReadableStream + some hacks or the coming streamHTMLUnsafe() API. But this gets messy:
- We'd need a chunk-aware parser to ignore everything until we hit the "text": " key. From there, we'd stream characters directly until we hit the matching unescaped closing quote.
- Escaping: HTML content in JSON must have quotes and special characters escaped, meaning we couldn't just dump the raw character stream; we would have to unescape characters on the fly (e.g., turning \" back into ").
- Base64: If the content is a base64 blob, we'd have to handle it specially and apply the right decoding strategy. Base64 is actually easy to decode as a stream, but all of this adds complexity to the implementation.
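To make the messiness concrete, here is a rough sketch of the chunk-aware extraction described above. The class name `JsonTextExtractor` is hypothetical, and the key search is deliberately naive: it takes the first `"text": "` it sees and would be fooled by the same sequence appearing inside an earlier string value. Base64 blobs are not handled.

```typescript
// Sketch only: extracts and unescapes the first "text" string value from a
// JSON document that arrives in arbitrary chunks. Not a full JSON parser.
const SIMPLE_ESCAPES: Record<string, string> = {
  '"': '"', "\\": "\\", "/": "/",
  b: "\b", f: "\f", n: "\n", r: "\r", t: "\t",
};

class JsonTextExtractor {
  private state: "seeking" | "inString" | "done" = "seeking";
  // Holds the unmatched preamble while seeking, or an escape sequence that
  // was cut in half by a chunk boundary while inside the string.
  private pending = "";

  /** Feed one chunk of JSON text; returns the HTML decoded from it. */
  push(chunk: string): string {
    if (this.state === "done") return "";
    let input = this.pending + chunk;
    this.pending = "";

    if (this.state === "seeking") {
      const match = /"text"\s*:\s*"/.exec(input);
      if (!match) {
        this.pending = input; // key not complete yet, wait for more data
        return "";
      }
      input = input.slice(match.index + match[0].length);
      this.state = "inString";
    }

    let out = "";
    let i = 0;
    while (i < input.length) {
      const ch = input[i];
      if (ch === '"') {
        this.state = "done"; // unescaped quote closes the string
        return out;
      }
      if (ch !== "\\") {
        out += ch;
        i += 1;
        continue;
      }
      const kind = input[i + 1];
      const needed = kind === "u" ? 6 : 2; // \uXXXX vs. two-character escapes
      if (kind === undefined || i + needed > input.length) {
        this.pending = input.slice(i); // escape split across chunks
        return out;
      }
      out += kind === "u"
        ? String.fromCharCode(parseInt(input.slice(i + 2, i + 6), 16))
        : SIMPLE_ESCAPES[kind] ?? kind;
      i += needed;
    }
    return out;
  }
}
```

A host would call `push()` for each decoded network chunk and forward whatever comes back to the HTML parser. Even this simplified version needs carry-over state for split escapes, which is the kind of nuisance the URL-based proposal avoids.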
Proposal: Stream HTML content directly from URL
To unlock the performance of streaming while avoiding changes to the MCP Resource interface where "text" is defined, I propose introducing a way for the server to give the host a streamable URL that serves the actual HTML body.
You can imagine a new key like htmlUri sitting alongside resourceUri in the McpUiToolMeta that points to the HTML resource.
{
  "name": "my-tool",
  "description": "...",
  "_meta": {
    "ui": {
      "resourceUri": "ui://my-app-resource",
      "htmlUri": "http://localhost:3001/streamable-app.html"
    }
  }
}
With this tool response, the host would trigger a resources/read to resourceUri, and in parallel, a fetch() to htmlUri. Then once it receives the UI metadata, it could pipe the response body into the frame and progressively render the app, letting the HTML parser incrementally discover and fetch any external resources.
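On the host side, the piping step could look roughly like the sketch below. `HtmlSink`, `pipeHtml`, and `renderStreamedApp` are hypothetical names, and `readUiMetadata` stands in for whatever the host already does to issue `resources/read`; none of this is part of MCP Apps today.

```typescript
// Hypothetical sink: anything that accepts HTML text in arrival order, for
// example a thin wrapper around an iframe document's write() and close().
interface HtmlSink {
  write(html: string): void;
  close(): void;
}

/** Decode a byte stream as UTF-8 and forward each chunk as soon as it arrives. */
async function pipeHtml(
  body: ReadableStream<Uint8Array>,
  sink: HtmlSink,
): Promise<void> {
  const decoder = new TextDecoder("utf-8");
  const reader = body.getReader();
  try {
    for (;;) {
      const { done, value } = await reader.read();
      if (done) break;
      // stream: true keeps multi-byte characters that straddle two chunks
      // buffered inside the decoder until they are complete.
      const text = decoder.decode(value, { stream: true });
      if (text) sink.write(text);
    }
    const tail = decoder.decode(); // flush anything still buffered
    if (tail) sink.write(tail);
  } finally {
    sink.close();
    reader.releaseLock();
  }
}

/** Start both requests in parallel, then stream the HTML into the sink. */
async function renderStreamedApp(
  htmlUri: string,
  readUiMetadata: () => Promise<unknown>, // assumed wrapper for resources/read
  sink: HtmlSink,
): Promise<void> {
  const [response] = await Promise.all([fetch(htmlUri), readUiMetadata()]);
  if (!response.ok || !response.body) {
    throw new Error(`Could not stream ${htmlUri}: HTTP ${response.status}`);
  }
  await pipeHtml(response.body, sink);
}
```

In a browser, one known way to implement the sink is to call `open()` on the frame's document and then map `write` and `close` to `document.write()` and `document.close()`; a native streaming API would replace that hack if and when it ships.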
Backwards compatibility
Of course, we'd need a way for the host to communicate this capability to the server, so that the server knows the host understands the htmlUri field and will fetch the body from it instead of reading "text" from the resource body. I presume this could be done with something in https://modelcontextprotocol.io/specification/draft/basic/lifecycle#capability-negotiation or some other purpose-built mechanism.
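As a purely illustrative sketch of the server side of that negotiation: the `streamableHtml` flag and the `buildUiMeta` helper below are invented for this example, and no such capability exists in the spec. The idea is only that the server emits htmlUri when the host has opted in, and otherwise falls back to today's behavior.

```typescript
// Hypothetical capability shape the host might advertise during initialization.
interface HostUiCapabilities {
  streamableHtml?: boolean; // invented flag, not part of any spec
}

// Mirrors the "ui" object from the tool example above.
interface UiToolMeta {
  resourceUri: string;
  htmlUri?: string;
}

/** Include htmlUri only for hosts that said they can consume it. */
function buildUiMeta(
  host: HostUiCapabilities | undefined,
  resourceUri: string,
  htmlUri: string,
): UiToolMeta {
  return host?.streamableHtml === true
    ? { resourceUri, htmlUri }
    : { resourceUri }; // older hosts keep reading "text" via resources/read
}
```

Hosts that never advertise the flag would see exactly the tool metadata they see today, so existing integrations would keep working unchanged.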
But for now, is there any interest in this? /cc @AbhiGemTest @bricedp.
P.S. Prefetching mitigation
I've heard that some implementers maybe prefetch all View UI content before it is needed (do I remember this correctly from discussions last week, @ochafik?). Perhaps for those hosts, this isn't really an issue since the UI content is available ✨ instantly 🪄. But as long as this isn't mandated, and for hosts that implement a different strategy because they detect that the user is on slow internet or is otherwise data-constrained, it seems like some kind of native streaming support would best take advantage of the platform primitives that power most of the web!