Fix/2462 fetch proxy deadlock by krichprollsch · Pull Request #2489 · lightpanda-io/browser

krichprollsch · 2026-05-18T09:33:13Z

rebased version of #2468

WsConnection.send used to switch the socket to blocking mode on EWOULDBLOCK, with a comment claiming this 'should virtually never happen'. Under CDP Fetch.enable + --http-proxy it is routine: the CONNECT+TLS proxy round-trip means many subresources are in flight simultaneously, lightpanda emits a flood of Fetch.requestPaused events, the kernel send buffer fills, lightpanda blocks on write, and puppeteer's matching Fetch.continueRequest replies pile up in lightpanda's TCP recv buffer (which lightpanda can't drain because it is blocked on the write). Both peers wedge until the client times out. Other contributing problems all collapse paused-intercept sites where an outer loop polls without ever draining the CDP socket, OR where disconnect-time cleanup re-enters JS through paths the runtime can no longer satisfy: * HttpClient.perform skipped the CDP socket poll entirely whenever processMessages() returned non-empty, so a steady stream of HTTP completions could starve CDP reads. * ScriptManagerBase.waitForImport spun on `client.tick(200)` and discarded the .cdp_socket return, so a script `import()` whose request was paused at the InterceptionLayer hung forever. * BrowserContext.deinit aborted pending intercepts via `transfer.abort`, which fired XHR/script error_callback chains into a half-torn-down V8 context (the inspector had already been stopped two lines above). * Headers.deinit was non-idempotent, so a value-copied Headers (the hazard RobotsLayer documents) double-freed its curl_slist on the second deinit; the symptom was an "incorrect alignment" panic inside ZigToCurlAllocator.free. * Transfer.deinit was non-idempotent, so a cascade out of error_callback (e.g. Script.errorCallback -> manager.evaluate() -> JS execution -> Frame.deinit -> abortOwner -> Transfer.kill -> Transfer.deinit) reached `arena_pool.release` twice on the same arena. Coordinated changes: * src/network/WsConnection.zig: On EWOULDBLOCK, instead of switching to a blocking write, poll for both POLLOUT and POLLIN. While waiting for write space, drain any incoming bytes into the reader buffer (without dispatching - that would re-enter send and recurse). Adds tryRead/bufferedBytes accessors. * src/browser/HttpClient.zig: - Add has_buffered_input to CDPClient. In perform(), return .cdp_socket when buffered input exists, and always do at least a non-blocking poll on the CDP socket so HTTP completions can no longer starve CDP reads. - Make Transfer.deinit idempotent by claiming ownership through `client.transfers.remove(self.id)`. Second deinits (cascades out of error_callback) early-return. - Make `Transfer.kill` public (was `fn`) so BrowserContext.deinit can use it. - Tighten RequestParams.deinit / Request.deinit to take `*` instead of `*const` so they can call into `Headers.deinit` (now mutating). * src/network/http.zig: Headers.deinit now nulls out `self.headers` after `curl_slist_free_all`, so a second deinit is a no-op. Without this guard a value-copied Headers double-frees the curl_slist (the hazard RobotsLayer's call site already documents). * src/browser/ScriptManagerBase.zig: waitForImport now drains pending CDP messages on every iteration (matching the syncRequest pattern) and re-fetches the imported_modules entry per iteration. The cached entry was a use-after-free risk because the CDP-drain step above re-enters JS, and a transitively-imported module's preloadImport() -> getOrPut() can rehash the map and invalidate the prior entry pointer. * src/cdp/CDP.zig: - Wire hasBufferedInput. - Replace read() with tryRead() in readSocket and tolerate the no_new_data case so we still process messages drained during a backpressured send. - BrowserContext.deinit aborts pending intercepts via `transfer.kill` instead of `transfer.abort`. `kill` fires shutdown_callback (or no-op for transfers without one), avoiding error_callback's re-entry into JS through XHR/script error handlers - those crash because the V8 context and inspector this BC owns have either been torn down already or are about to be. Verified end-to-end against a puppeteer + http-proxy reproducer: | URL | before | after | |----------------------------|----------------------|------------------| | example.com | OK 124ms | OK 117ms | | github.com | HANG 20s (12/82) | OK 1410ms (82/82)| | shopify.com | HANG 20s (1/4) | OK 1973ms (66/66)| | allbirds.com | HANG 20s (12/53) | OK 3944ms (372) | | allbirds wool-runners PDP | HANG 20s (12/53) | OK 6286ms (459) | (nike.com still doesn't reach `load` event but all 68 continueRequests process cleanly - the remaining stall is third-party widgets keeping the page in `loading` state, not the CDP/HTTP deadlock this PR fixes.) Lightpanda survives 18 back-to-back navigation runs across the matrix above (3 per URL) without crashing. Fixes #2462.

krichprollsch · 2026-05-18T09:38:25Z

Integration test: 176

krichprollsch · 2026-05-20T08:22:15Z

closed in favor of #2495

staylor and others added 2 commits May 18, 2026 11:30

handle cdp read's error during ScriptManagerBase

6666a8c

krichprollsch self-assigned this May 18, 2026

krichprollsch mentioned this pull request May 18, 2026

Fix Fetch.enable + http-proxy CDP send/recv deadlock (#2462) #2468

Closed

krichprollsch mentioned this pull request May 18, 2026

add allbirds integration tests lightpanda-io/demo#176

Merged

krichprollsch closed this May 20, 2026

github-actions Bot locked and limited conversation to collaborators May 20, 2026

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix/2462 fetch proxy deadlock#2489

Fix/2462 fetch proxy deadlock#2489
krichprollsch wants to merge 2 commits into
mainfrom
fix/2462-fetch-proxy-deadlock

krichprollsch commented May 18, 2026

Uh oh!

krichprollsch commented May 18, 2026

Uh oh!

krichprollsch commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

krichprollsch commented May 18, 2026

Uh oh!

krichprollsch commented May 18, 2026

Uh oh!

krichprollsch commented May 20, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants