Skip redundant protocol parsing during raw streaming#1025
Conversation
execProtocolRawStream() consumers receive raw bytes via onRawData and never read #currentResults, but the write callback pushed every parsed message into it anyway. The array grew unbounded until the next execProtocol*() call happened to reset it, forcing raw-stream consumers (e.g. wire-protocol bridges) to issue periodic no-op protocol calls just to release the memory.
|
@drudolf This is good, thank you! Unsure if we'll merge it like this because the code in this hot path just keeps getting more complex. |
The WASM write callback ran the internal pg-protocol parser over every outbound chunk even in execProtocolRawStream() mode, eagerly decoding every DataRow field into strings that no one consumes — roughly half the latency of a raw-stream query. On the raw path the parse is load-bearing only for LISTEN/NOTIFY dispatch (errors and notices are not surfaced there), so skip it when no notification listeners are registered. The skip decision cannot change mid-query because execProtocolRawSync is synchronous, so the stateful parser always sees a query's chunks all-or-nothing.
d98f1d7 to
ae485fa
Compare
|
@tdrz Good point... that write callback is the hottest function in the library and it shouldn't keep accreting branches. Reworked it so the parse/accumulate/skip decision lives in a single named method, and the hot path is back to one line: // write callback
this.#handleProtocolWrite(bytes)#handleProtocolWrite(bytes: Uint8Array) {
if (this.#rawStreamMode) {
if (
this.#notifyListeners.size !== 0 ||
this.#globalNotifyListeners.size !== 0
) {
this.#protocolParser.parse(bytes, (msg) => this.#parse(msg))
}
} else {
this.#protocolParser.parse(bytes, (msg) => {
const parsedMsg = this.#parse(msg)
if (parsedMsg) {
this.#currentResults.push(parsedMsg)
}
})
}
}Raw mode parses only when a notify listener needs it (LISTEN/NOTIFY dispatch). Otherwise the normal path is untouched, same parse-and-accumulate as before. The only state left in the hot path is the typecheck / eslint / prettier clean, |
|
Thank you for pointing out this issue. Implementation wise I had something more like this in mind: #1030 |
|
Closing in favor of #1030 — your strategy-function approach is the cleaner implementation. Selecting the handler once per exec mode reads much better than the per-chunk branching here, and it lands the same parse-skip win. One heads-up from the analysis behind this PR: #1030's raw-stream path now skips the protocol parse entirely, so JS-side Thanks for picking this up! |
Yep, probably should address this as well. The real issue imo is that the current API is trying to achieve too many things with too few entrypoints. |
Problem
The WASM write callback unconditionally runs the internal pg-protocol parser over every outbound chunk and pushes the parsed messages into
#currentResults— including duringexecProtocolRawStream(), where the caller consumes raw bytes viaonRawDataand the parsed results are never read.For raw-stream consumers this means:
#currentResultsgrows unbounded until an unrelatedexecProtocol*()call happens to reset it, forcing raw-stream consumers to issue periodic no-op protocol calls just to release the memory.Change
Two commits, each with a changeset:
#currentResultson that path.execProtocolRawSyncis synchronous, so the stateful parser always sees a query's chunks all-or-nothing.Consumers of the parsed APIs (
query(),exec(),execProtocol*) are unaffected — verified as a benchmark control.Numbers
Measured downstream through prisma-pglite-bridge (Prisma → pg → wire protocol →
execProtocolRawStream),findManyover 100 rows, n=1000 iterations × 5 repeats, same engine version A/B:Commit 1 alone is memory hygiene (~no latency change); commit 2 is the latency win. Re-validated on this branch (PG 18.3) with the same profile.
pglite-socketshould benefit identically, since it serves the raw protocol.Validation
packages/pglitetest suite passes, includingexec-protocol.test.tsandtest:node(the only local failures were caused by my locally assembledrelease/artifacts lackingpg_stat_statements.tar.gz, which the published npm package excludes — unrelated to this change).typecheck,eslint,prettierclean.Relation to existing work
Independent of and compatible with #903 — that work optimizes the parsed-results pipeline; this PR fixes the raw path's cost model, which #903 doesn't cover.