fix(tesseract): memory leaks in the native planner#11120
Conversation
Every buildSqlAndParams built a per-query object graph hanging off
Rc<QueryTools>, and QueryTools owned the join-tree cache. The cached
JoinTree's BaseCube held a strong Rc<QueryTools> back, so
QueryTools -> join_tree_cache -> JoinTree -> BaseCube -> QueryTools
formed a cycle: QueryTools never dropped and the whole graph (Compiler,
symbols, ~350 compiled minijinja SQL templates) leaked on every query.
In the refresh worker, which generates huge volumes of pre-aggregation
SQL, this grew unbounded and OOMed (Tesseract only; the JS planner was
unaffected).
Split the per-query state: introduce `State { query_tools, compiler,
join_tree_cache }` (Deref<Target = QueryTools>) that owns the mutable /
caching pieces, and make QueryTools a cache-free leaf. Transient planners
hold Rc<State>; cached/long-lived values (BaseCube, TypedFilter, symbols)
and the render layer keep Rc<QueryTools>, so no cached value can pin the
hub. The Compiler moved into State too (its symbol cache was a latent
host for the same class of cycle).
Filter building needs the Compiler only to (re)compile a to_date
rolling-window granularity, so BaseFilter/TypedFilterBuilder take an
optional &mut Compiler; a member-only rewrite carries the precomputed
FilterOp instead, avoiding the dependency. These compiler-threading
points are marked FIXME for a future early-compilation pass.
Adds a pure-Rust regression test (Weak-after-drop on QueryTools) that
fails on the cycle and passes once it is broken.
Note: member_mask_filters still lives in QueryTools and keeps a strong
back-reference via TypedFilter, so the cycle remains for queries with
masked members; relocating that field into State (a follow-up) closes it.
Member `sql` functions were compiled by building recording proxies in
Rust via neon and invoking the JS function through them. Every property
access created neon `JsFunction`s (the get-trap and to-string closures);
their backing napi function objects were never reclaimed even after the
JS side was GC'd, so the refresh worker leaked unboundedly while
generating pre-aggregation SQL (Tesseract only).
Move the recording to a stateless JS module (MemberSqlTemplateCompiler):
it builds the `{CUBE}`, FILTER_PARAMS, FILTER_GROUP, SECURITY_CONTEXT and
SQL_UTILS proxies in JS — where V8 garbage-collects them normally — runs
the member function, and returns the produced template plus the recorded
dependencies as plain data. BaseQuery exposes it via `compileMemberSql`,
reached from Rust through a new `BaseTools::compile_member_sql`; the
planner consumes the result exactly as before. The proxy machinery,
`ProxyState`, and `MemberSql::compile_template_sql` are removed from the
native side.
The only JS reference that still crosses back into Rust is a
`FILTER_PARAMS.x.filter(fn)` column callback, which is deferred (invoked
at render time with the resolved column). It is an existing JS function
held by a `Root` — not a Rust-created `JsFunction` — and is released with
the query graph, so it is bounded per query and does not reintroduce the
leak.
The recording logic is now unit-testable in isolation
(member-sql-template-compiler.test.ts), which was impractical while it
lived behind the Rust↔JS bridge.
|
Claude finished @waralexrom's task in 4m 38s —— View job Code review: memory-leak fixes in the native planner — verdict: ship after one follow-up1 medium, 4 low. Both fixes look correct and well-documented. The DetailsTasks
What the fix does — and what it leaves behindThe split is sound: The JS-side recorder ( Findings[medium] Documented The comment is honest:
Recommend either (a) moving [low] Compiled
[low]
[low] The recorder treats [low] Dead-but-kept PR description says it's "intentionally kept for future compile-model-once reuse." That's a defensible choice, but a Nits
Praise
|
|
Claude finished @waralexrom's task in 1m 4s —— View job Code review: memory-leak fixes in the native planner — verdict: ship after one follow-up1 medium, 4 low. Re-checked after Full reviewTasks
What the fix does — and what it leaves behindThe split is sound: The JS-side recorder ( Findings[medium] Documented The comment is honest:
Recommend either (a) moving [low] Compiled
[low]
[low] The recorder treats [low] Dead-but-kept PR description says it's "intentionally kept for future compile-model-once reuse." That's a defensible choice, but a Nits
Praise
· Branch: |
Codecov Report✅ All modified and coverable lines are covered by tests. Additional details and impacted files@@ Coverage Diff @@
## master #11120 +/- ##
==========================================
+ Coverage 79.12% 83.63% +4.50%
==========================================
Files 472 256 -216
Lines 96294 79019 -17275
Branches 3524 0 -3524
==========================================
- Hits 76193 66085 -10108
+ Misses 19586 12934 -6652
+ Partials 515 0 -515
Flags with carried forward coverage won't be shown. Click here to find out more. ☔ View full report in Codecov by Harness. 🚀 New features to boost your workflow:
|
|
Claude finished @waralexrom's task in 1m 8s —— View job Code review: memory-leak fixes in the native planner — verdict: ship after one follow-up1 medium remains. Re-reviewed after Full reviewTasks
What changed since the last review
Prior findings — current status
The fixes still look correct overall. The only thing I'd actively push on before merge is the open medium — everything else is doc/perf polish that can land separately. · Branch: |
|
Claude finished @waralexrom's task in 35s —— View job Code review: memory-leak fixes in the native planner — verdict: ship1 medium remains. Re-checked after Full reviewTasks
What changed since the last review
Prior findings — current status
The PR already has · Branch: |
…y TypedFilter change
|
Claude finished @waralexrom's task in 1m 5s —— View job Code review: memory-leak fixes in the native planner — verdict: ship0 blocking. Re-checked after Full reviewTasks
What changed since the last review
VerificationConfirmed against current source:
The doc update is accurate; the medium finding from the prior review is closed. Prior findings — current status
Nothing remaining is blocking. PR has · Branch: |
Summary
Fixes two memory leaks in the Tesseract native SQL planner that OOM'd the refresh worker when
CUBEJS_TESSERACT_SQL_PLANNERis enabled. Per-query state was never reclaimed: one leak was a RustRcreference cycle, the other was native (neon) proxy objects created per member-sqlevaluation that N-API never freed.Changes
Rccycle (82154224fa). Split a newState { query_tools, compiler, join_tree_cache }out ofQueryTools(withDeref<Target = QueryTools>), severing the cycleQueryTools → join_tree_cache → JoinTree → BaseCube → QueryTools. Transient planners now holdRc<State>. Adds ano_query_tools_leakregression test (Weak-after-drop).sqlcompilation to JS (e18d53a7bf). New stateless recorderMemberSqlTemplateCompiler.jsruns the membersqlfunction under JS-side proxies ({CUBE},FILTER_PARAMS,FILTER_GROUP,SECURITY_CONTEXT,SQL_UTILS) and returns the template plus recorded dependencies. Rust now callscompile_member_sqlover the bridge instead of constructing neon proxy objects, removing ~580 lines of Rust proxy code that leaked N-API structures.clone_to_contextis intentionally kept for future compile-model-once reuse.Testing
member-sql-template-compiler(Jest): 17 passingCUBEJS_TESSERACT_SQL_PLANNER=true CUBEJS_TESSERACT_PRE_AGGREGATIONS=true): 35 suites, 451 passing, 0 failing