Add LP.setSubframeLoading + --disable-subframes opt-out for iframe loading #2401
Adds a Lightpanda-specific CDP method that lets drivers opt out of
subframe processing entirely:
await client.send('LP.setSubframeLoading', { enabled: false });
When disabled, the HTML parser silently bypasses every <iframe> it
encounters: no child Frame is created, no document fetch is issued,
and no Page.frameAttached / Page.frameNavigated /
Runtime.executionContextCreated events are emitted. The driver only
sees the main frame's lifecycle.
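For drivers that write to the CDP WebSocket directly instead of going through a client library, the call above is an ordinary CDP command frame. A minimal sketch, assuming the generic CDP JSON envelope (id, method, params, optional sessionId). Only the method name and the enabled parameter come from this change; the helper name buildSetSubframeLoading is hypothetical, and whether a sessionId is required depends on how the driver attached:

```javascript
// Hypothetical helper: serializes the CDP command frame for LP.setSubframeLoading.
// The method name and `enabled` parameter are from this PR; everything else is
// the generic CDP message envelope.
function buildSetSubframeLoading(id, enabled, sessionId) {
  if (!Number.isInteger(id) || id < 0) {
    throw new TypeError('id must be a non-negative integer');
  }
  if (typeof enabled !== 'boolean') {
    throw new TypeError('enabled must be a boolean');
  }
  const message = {
    id,
    method: 'LP.setSubframeLoading',
    params: { enabled },
  };
  // Attach sessionId only when the command targets a specific attached session.
  if (sessionId !== undefined) {
    message.sessionId = sessionId;
  }
  return JSON.stringify(message);
}
```

A driver would pass the returned string to its WebSocket send call and wait for the reply with the matching id.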
Motivation: pages that load large numbers of analytics / pixel
iframes (Shopify storefronts, ad-heavy news sites) trigger lightpanda-io#2400
-- each subframe navigation re-registers the main frame's V8 context
under the child's frameId and invalidates the executionContextId the
driver had pinned for the main frame. Subsequent Runtime.evaluate
fails with 'Cannot find context with specified id' (Playwright
surfaces this as 'Execution context was destroyed', Puppeteer hangs
in IsolatedWorld.evaluate waiting for a 'context' event). The proper
fix is per-frame V8 inspector context groups (or per-frame
IsolatedWorld), discussed in lightpanda-io#2400; this method gives drivers a
clean opt-in workaround in the meantime.
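On the driver side, the failure described above can be recognized from the error text. A rough sketch, assuming the messages quoted in this commit appear verbatim in the thrown error; the helper name isStaleContextError is hypothetical, and it cannot detect the Puppeteer case, which hangs rather than throws:

```javascript
// Message fragments quoted in this PR for the stale executionContextId failure.
const STALE_CONTEXT_PATTERNS = [
  'Cannot find context with specified id',
  'Execution context was destroyed',
];

// Hypothetical helper: returns true when an error looks like the
// executionContextId invalidation described in lightpanda-io#2400.
function isStaleContextError(err) {
  const message = err instanceof Error ? err.message : String(err);
  return STALE_CONTEXT_PATTERNS.some((pattern) => message.includes(pattern));
}
```

A script could use this to tell the context-churn failure apart from unrelated evaluation errors before deciding to disable subframe loading.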
Mechanism: new bool field Session.subframe_loading_enabled (default
true). Frame.iframeAddedCallback short-circuits when false, marking
the iframe as _executed so the parser doesn't re-deliver it.
Verified against the puppeteer-core repro on
https://www.allbirds.com/products/mens-wool-runners (which
instantiates ~11 web-pixel iframes during initial render):
baseline (subframe loading ON):
  page.title() works (lucky timing) but server segfaults on
  disconnect from the worker re-entrancy bug; iframes
  do load and trigger the executionContextId churn
with LP.setSubframeLoading(false):
  [opt-out] LP.setSubframeLoading reply: {}
  [ok] goto status=200 elapsed=6166ms
  [stats] frame_attached_events_seen=0
  [ok] page.title() = "Allbirds Wool Runners, Men's | ..."
  [ok] evaluate(1+1) = 2
  [ok] evaluate(document.title) = "Allbirds Wool Runners, Men's | ..."
  [ok] body.innerHTML.length = 923161
521/521 unit tests still pass.
Complementary to LP.setSubframeLoading (preceding commit): exposes
the same iframe-skip behavior as a CLI option that applies to all
sessions in the process. Useful for:
* the 'fetch' subcommand (no CDP driver to call LP.setSubframeLoading)
* 'serve' deployments where the operator wants iframes off by
  default for every connecting client (the LP method can still
  re-enable per-session if needed)
* Playwright's chromium.connectOverCDP, which can't reliably issue
  custom CDP methods on Lightpanda today: BrowserContext.newCDPSession
  and Browser.newBrowserCDPSession both attach a new CRSession that
  collides with the STARTUP-session reuse from lightpanda-io#2399, triggering a
  Playwright internal assertion. With --disable-subframes set on the
  server, Playwright doesn't need to issue any custom CDP -- every
  session inherits subframes-off and the executionContextId churn
  from lightpanda-io#2400 never trips.
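The interaction between the process-wide default and the per-session override described in the list above can be modeled roughly as follows. This is an illustrative JavaScript model only, not the Zig implementation: the names subframe_loading_enabled and disableSubframes() mirror identifiers mentioned in this PR, while the classes and the setSubframeLoading / shouldLoadIframe methods are hypothetical stand-ins.

```javascript
// Illustrative model only; the real logic lives in Zig.
class Config {
  constructor({ disableSubframes = false } = {}) {
    this._disableSubframes = disableSubframes; // models --disable-subframes
  }
  disableSubframes() {
    return this._disableSubframes;
  }
}

class Session {
  constructor(config) {
    // The CLI flag only supplies the initial value; loading is on by default.
    this.subframe_loading_enabled = !config.disableSubframes();
  }
  // Models LP.setSubframeLoading { enabled }: affects this session only.
  setSubframeLoading(enabled) {
    this.subframe_loading_enabled = Boolean(enabled);
  }
  // Models the early-return check made when an <iframe> is added.
  shouldLoadIframe() {
    return this.subframe_loading_enabled;
  }
}

// Server started with --disable-subframes; one client re-enables iframes.
const config = new Config({ disableSubframes: true });
const inherited = new Session(config);
const overridden = new Session(config);
overridden.setSubframeLoading(true);

console.log(inherited.shouldLoadIframe());  // false
console.log(overridden.shouldLoadIframe()); // true
```

Keeping the flag on the session rather than on the process is what lets one client re-enable iframes without affecting the others.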
Verified:
serve --disable-subframes + plain puppeteer-core goto
  [ok] goto status=200 elapsed=6354ms frameAttached=0
fetch --disable-subframes --dump html https://www.allbirds.com/...
  exit=0
  html bytes: 1021562
  title: <title>Allbirds Wool Runners, Men's | ...</title>
  iframe count in dumped html: 2 (still in DOM, just not loaded)
521/521 unit tests pass.
I like the idea of giving the client control over disabling subframe loading 👍 I'm wondering if we should use a more generic CDP command/CLI option to prepare for the future. But I don't have a strong opinion; the risk is ending up with something too generic. We need an e2e test in demo to reproduce the issue.
// parser doesn't keep handing it back to us, but skip the child
// frame creation / navigation / notification entirely: no child
// Frame, no Page.frameAttached, no Runtime.executionContextCreated.
iframe._executed = true;
I think it would be nice to add a debug log in this case (and to move the block after src creation):
log.debug(.frame, "skip iframe loading", .{.src = src});
I agree: for CDP, a single command is nice. It lets you disable multiple things in a single call. For the CLI, it feels less important.
Summary
Adds two ways to opt out of subframe loading entirely -- useful as a workaround for #2400 (child-iframe navigation invalidates the main frame's executionContextId):
* LP.setSubframeLoading { enabled: bool } -- per-session opt-in toggleable at runtime by the driver.
* --disable-subframes -- process-wide default, applies to every session and to the fetch subcommand. Operators can flip it on without any driver changes.
When subframe loading is off, the HTML parser registers <iframe> elements in the DOM (they're still in the tree if the driver inspects via DOM.getDocument or LP.getMarkdown) but skips child Frame creation, document fetch, and the corresponding Page.frameAttached / Page.frameNavigated / Runtime.executionContextCreated events. The driver only sees the main frame's lifecycle.
Motivation
#2400 is the underlying issue: every child-iframe navigation in Lightpanda re-emits Runtime.executionContextCreated on the main frame's V8 contexts (because IsolatedWorld is shared per-BrowserContext and CONTEXT_GROUP_ID is a constant). V8's inspector treats that as a re-registration with a fresh executionContextId, invalidating the id the driver had pinned for the main frame's main world / utility world. Subsequent Runtime.evaluate fails with -32000 "Cannot find context with specified id", which Playwright surfaces as "Execution context was destroyed, most likely because of a navigation." and Puppeteer surfaces as IsolatedWorld.evaluate hangs.
The proper fix needs per-frame V8 inspector context groups (or per-frame IsolatedWorld), discussed in #2400. That's a meaningful refactor. This PR is the workaround so users hitting page.title() / page.evaluate(...) failures on iframe-heavy pages (Shopify storefronts, ad-heavy news sites, anywhere with web-pixel sandboxes) have a clean opt-in escape today.
Implementation
Session.subframe_loading_enabled: bool = true -- default matches existing behavior. Frame.iframeAddedCallback short-circuits when the flag is false, marking the iframe _executed = true so the parser doesn't re-deliver it.
Two ways to flip the flag:
* LP.setSubframeLoading { enabled } (src/cdp/domains/lp.zig) -- CDP method on the existing LP domain. Sets bc.session.subframe_loading_enabled.
* --disable-subframes CLI flag (src/Config.zig) -- added to CommonOptions (so it applies to serve, fetch, mcp). New Config.disableSubframes() getter; Session.init reads it as the initial value. The CDP method can override per-session at runtime regardless of the CLI default.
Total diff: +75 / 0 across 4 files (src/Config.zig, src/browser/Session.zig, src/browser/Frame.zig, src/cdp/domains/lp.zig).
Verification
Reproducer: puppeteer-core 24.42.0 against https://www.allbirds.com/products/mens-wool-runners (page instantiates ~11 web-pixel iframes during initial render).
Baseline (no fix) -- page loads, but the worker re-entrancy bug from #2398 also bites and the server segfaults on disconnect; iframe executionContextCreated churn happens in the trace.
With LP.setSubframeLoading({ enabled: false }):
With --disable-subframes CLI flag (no driver-side opt-in):
521/521 unit tests pass.
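The frameAttached counts reported in the commit-message logs come down to tallying CDP event names. A rough sketch of such a tally, assuming the driver has recorded incoming messages as objects with a method field; the recording format and the tallyEvents / countFrameAttached names are illustrative assumptions, not the reproducer script itself:

```javascript
// Tally CDP events by method name from a recorded trace.
// Assumed trace shape: [{ method: 'Page.frameAttached', params: {...} }, ...]
function tallyEvents(trace) {
  const counts = new Map();
  for (const message of trace) {
    // Command replies carry an id and no method; skip them.
    if (typeof message.method !== 'string') continue;
    counts.set(message.method, (counts.get(message.method) ?? 0) + 1);
  }
  return counts;
}

// Number of Page.frameAttached events seen; expected to be 0 when
// subframe loading is disabled.
function countFrameAttached(trace) {
  return tallyEvents(trace).get('Page.frameAttached') ?? 0;
}
```

Only Page.frameAttached is counted here because Page.frameNavigated and Runtime.executionContextCreated are also emitted for the main frame, so their presence alone says nothing about subframes.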
Notes
For playwright-core chromium.connectOverCDP, the CDP method path is awkward: both BrowserContext.newCDPSession(page) and Browser.newBrowserCDPSession() open a new CRSession that collides with the STARTUP-session reuse from #2399 (Promote synthetic STARTUP session for Playwright connectOverCDP) and triggers Playwright's internal assert(!object.id) in crConnection.js. The --disable-subframes CLI flag is the recommended path for Playwright users for now.
This intentionally doesn't prevent iframes from existing in the DOM -- document.querySelectorAll('iframe') still returns them, LP.getMarkdown and LP.getSemanticTree still see them -- it just stops their content from being fetched and processed. That preserves any selector / scraping logic that relies on inspecting the iframe tags themselves.
Once #2400's underlying architectural fix lands (per-frame V8 inspector context groups or per-frame IsolatedWorld), this method becomes a niche performance / sandboxing tool rather than a correctness workaround. Worth keeping anyway: blocking analytics / pixel iframes is a reasonable thing to want to do.
Related
connectOverCDP STARTUP session promotion; referenced for the Playwright-side CDP-method limitation noted above.