Skip to content

Skip redundant protocol parsing during raw streaming#1025

Closed
drudolf wants to merge 2 commits into
electric-sql:mainfrom
drudolf:perf/raw-stream-skip-parse-main
Closed

Skip redundant protocol parsing during raw streaming#1025
drudolf wants to merge 2 commits into
electric-sql:mainfrom
drudolf:perf/raw-stream-skip-parse-main

Conversation

@drudolf

@drudolf drudolf commented Jun 11, 2026

Copy link
Copy Markdown

Problem

The WASM write callback unconditionally runs the internal pg-protocol parser over every outbound chunk and pushes the parsed messages into #currentResults — including during execProtocolRawStream(), where the caller consumes raw bytes via onRawData and the parsed results are never read.

For raw-stream consumers this means:

  • Every response is effectively parsed twice: once internally (and discarded), once by the consumer on the other side of the wire protocol.
  • The parse runs synchronously inside the write callback, directly on the query's critical path. In our profiling it accounted for roughly half the end-to-end latency of a raw-stream query (eager string-decoding of every DataRow field).
  • #currentResults grows unbounded until an unrelated execProtocol*() call happens to reset it, forcing raw-stream consumers to issue periodic no-op protocol calls just to release the memory.

Change

Two commits, each with a changeset:

  1. Don't accumulate parsed results during raw streaming — nothing reads #currentResults on that path.
  2. Skip the internal parse entirely during raw streaming when no notification listeners are registered. On the raw path the parse is load-bearing only for LISTEN/NOTIFY dispatch — errors and notices are not surfaced there, consistent with the method's existing docstring ("bypasses PGlite's protocol wrappers… don't intend to use the above features"). With listeners registered, the parse still runs and notifications dispatch as before. The gate cannot change mid-query because execProtocolRawSync is synchronous, so the stateful parser always sees a query's chunks all-or-nothing.

Consumers of the parsed APIs (query(), exec(), execProtocol*) are unaffected — verified as a benchmark control.

Numbers

Measured downstream through prisma-pglite-bridge (Prisma → pg → wire protocol → execProtocolRawStream), findMany over 100 rows, n=1000 iterations × 5 repeats, same engine version A/B:

p50 p95 p99
before 2.31 ms 2.96 ms 5.91 ms
after 1.05 ms 1.42 ms 2.13 ms

Commit 1 alone is memory hygiene (~no latency change); commit 2 is the latency win. Re-validated on this branch (PG 18.3) with the same profile. pglite-socket should benefit identically, since it serves the raw protocol.

Validation

  • packages/pglite test suite passes, including exec-protocol.test.ts and test:node (the only local failures were caused by my locally assembled release/ artifacts lacking pg_stat_statements.tar.gz, which the published npm package excludes — unrelated to this change).
  • typecheck, eslint, prettier clean.
  • LISTEN/NOTIFY verified end-to-end: notifications still dispatch when listeners are registered while raw streaming is in use.

Relation to existing work

Independent of and compatible with #903 — that work optimizes the parsed-results pipeline; this PR fixes the raw path's cost model, which #903 doesn't cover.

execProtocolRawStream() consumers receive raw bytes via onRawData and
never read #currentResults, but the write callback pushed every parsed
message into it anyway. The array grew unbounded until the next
execProtocol*() call happened to reset it, forcing raw-stream consumers
(e.g. wire-protocol bridges) to issue periodic no-op protocol calls just
to release the memory.
@tdrz

tdrz commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

@drudolf This is good, thank you! Unsure if we'll merge it like this because the code in this hot path just keeps getting more complex.

The WASM write callback ran the internal pg-protocol parser over every
outbound chunk even in execProtocolRawStream() mode, eagerly decoding
every DataRow field into strings that no one consumes — roughly half
the latency of a raw-stream query. On the raw path the parse is
load-bearing only for LISTEN/NOTIFY dispatch (errors and notices are
not surfaced there), so skip it when no notification listeners are
registered. The skip decision cannot change mid-query because
execProtocolRawSync is synchronous, so the stateful parser always sees
a query's chunks all-or-nothing.
@drudolf drudolf force-pushed the perf/raw-stream-skip-parse-main branch from d98f1d7 to ae485fa Compare June 16, 2026 12:13
@drudolf

drudolf commented Jun 16, 2026

Copy link
Copy Markdown
Author

@tdrz Good point... that write callback is the hottest function in the library and it shouldn't keep accreting branches. Reworked it so the parse/accumulate/skip decision lives in a single named method, and the hot path is back to one line:

// write callback
this.#handleProtocolWrite(bytes)
#handleProtocolWrite(bytes: Uint8Array) {
  if (this.#rawStreamMode) {
    if (
      this.#notifyListeners.size !== 0 ||
      this.#globalNotifyListeners.size !== 0
    ) {
      this.#protocolParser.parse(bytes, (msg) => this.#parse(msg))
    }
  } else {
    this.#protocolParser.parse(bytes, (msg) => {
      const parsedMsg = this.#parse(msg)
      if (parsedMsg) {
        this.#currentResults.push(parsedMsg)
      }
    })
  }
}

Raw mode parses only when a notify listener needs it (LISTEN/NOTIFY dispatch). Otherwise the normal path is untouched, same parse-and-accumulate as before. The only state left in the hot path is the #rawStreamMode flag, which the shared callback needs to tell which mode it's in.

typecheck / eslint / prettier clean, exec-protocol and notify suites green (notify confirms raw-stream and listeners still dispatches).

@tdrz

tdrz commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

Thank you for pointing out this issue. Implementation wise I had something more like this in mind: #1030

@drudolf

drudolf commented Jun 16, 2026

Copy link
Copy Markdown
Author

Closing in favor of #1030 — your strategy-function approach is the cleaner implementation. Selecting the handler once per exec mode reads much better than the per-chunk branching here, and it lands the same parse-skip win.

One heads-up from the analysis behind this PR: #1030's raw-stream path now skips the protocol parse entirely, so JS-side pg.listen() notifications no longer dispatch during execProtocolRawStream (the previous code parsed unconditionally on that path). No impact for raw-protocol consumers like pglite-socket, but it may matter for anyone combining pg.listen() with raw streaming.

Thanks for picking this up!

@drudolf drudolf closed this Jun 16, 2026
@tdrz

tdrz commented Jun 16, 2026

Copy link
Copy Markdown
Collaborator

One heads-up from the analysis behind this PR: #1030's raw-stream path now skips the protocol parse entirely, so JS-side pg.listen() notifications no longer dispatch during execProtocolRawStream (the previous code parsed unconditionally on that path). No impact for raw-protocol consumers like pglite-socket, but it may matter for anyone combining pg.listen() with raw streaming.

Yep, probably should address this as well. The real issue imo is that the current API is trying to achieve too many things with too few entrypoints.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants