fix(cubestore): guard replay handle seq pointer / location length mismatch by waralexrom · Pull Request #11093 · cube-js/cube

waralexrom · 2026-06-16T14:33:35Z

Summary

A stale ReplayHandle whose seq_pointers_by_location length no longer matches the table's locations poisoned both the streaming import path and the replay-handle merge pass — stream imports for the table failed permanently and replay handles accumulated unbounded across the cluster. This adds write-time and read-time guards plus per-table isolation in the merge so a single corrupt handle can't take everything down.

Root cause

The seq-pointer vector length is frozen at handle-creation time from table.locations().len(), and table locations are immutable. So a handle of length N against a table with M≠N locations can only arise when a table_id is reused for a table with a different partition count (e.g. metastore restore from a snapshot older than the state where the handles were written, bypassing drop_table's handle cleanup). The mismatch then makes:

seq_pointer_for_location (read path, via stream import initial_seq_for) fail with an internal error every cycle, and
union_seq_pointer_by_location in merge_replay_handles abort the whole cleanup pass, so handles for all tables stop being merged and pile up.

Changes

Refuse to persist replay handles whose seq-pointer vector length doesn't match the table's locations — validate_seq_pointers_by_location, applied in both create_replay_handle_from_seq_pointers and replace_replay_handles.
Report the read-path mismatch as corrupt_data instead of internal, so the scheduler's existing CorruptData handling deactivates the table (and Cube rebuilds it) rather than looping the error forever.
Isolate per-table merging in merge_replay_handles: a corrupt handle for one table is logged and skipped, no longer aborting cleanup for every other table.

Testing

cargo test -p cubestore --lib replay_handle — 3 unit tests for the length guard (match → ok, mismatch → err, None → ok) and 1 integration test asserting the write path rejects a mismatching vector and persists nothing, while a matching one writes exactly one handle. All green.
Full crate compiles on top of current master.

…match A stale ReplayHandle whose seq_pointers_by_location length no longer matches the table's locations (e.g. after metastore restore where a table id is reused with a different partition count) poisoned both the stream import path and the replay handle merge pass, causing imports to fail forever and handles to accumulate unbounded. - Refuse to persist replay handles whose seq pointer vector length does not match the table's locations (validate_seq_pointers_by_location in both create_replay_handle_from_seq_pointers and replace_replay_handles). - Report the read-path length mismatch as corrupt_data so the scheduler deactivates the table instead of looping the error. - Isolate per-table merging in merge_replay_handles so one corrupt handle no longer aborts cleanup for every other table. - Add unit tests for the length guard and an integration test for the write-path rejection.

claude · 2026-06-16T14:34:03Z

Claude finished @waralexrom's task in 1m 55s —— View job

Review

Verdict: LGTM — the three-layer defense (write-time guard, read-time error classification, per-table merge isolation) is well-targeted at the failure mode in the PR description. 0 high, 0 medium, 3 low.

Details

Correctness

replay_handle.rs:206 Switching to CubeError::corrupt_data on the read path is correct: the JobStatus error message will contain "CorruptData" (see scheduler/mod.rs:860), so the scheduler will deactivate + rebuild the table instead of looping the error forever. Matches the existing pattern.
replay_handle.rs:216 validate_seq_pointers_by_location treats locations = None as length 0, so a Some([…]) pointer vector against a table without locations will (correctly) fail validation. Symmetric with seq_pointer_for_location.
metastore/mod.rs:4567 and :4677 Write-time guards are placed inside the rocks write op, so they roll back cleanly. Good.
scheduler/mod.rs:484 Per-table isolation via the IIFE is a reasonable inline scope — union_seq_pointer_by_location is the only thing that can return Err here in steady state, so a corrupt handle for one table_id no longer aborts the whole pass. After-loop replace_replay_handles errors are also caught + logged, which protects against the new write-time guard tripping inside the rocks tx.

Low-priority notes

Possible regression for dropped tables in replace_replay_handles (metastore/mod.rs:4678). The new code does tables_table.get_row_or_not_found(table_id)? before writing the merged handle. If drop_table failed to clean up handles previously (e.g. crash mid-drop), the scheduler's merge pass used to still consolidate them; now it will error out and skip. With the scheduler-side log + continue this is benign (handles just stay un-merged), but stale handles for missing tables will now accumulate indefinitely instead of being merged down. Consider treating TableNotFound as "skip validation, accept None pointers" or scheduling a separate orphan cleanup pass. Worth at minimum a // note: comment.
Error classification asymmetry (replay_handle.rs:220). validate_seq_pointers_by_location returns CubeError::internal while the read path now returns CubeError::corrupt_data for the same condition. Intentional (write-path errors shouldn't trigger table deactivation), but a one-line comment would help future readers understand the asymmetry.
Test isolation (metastore/mod.rs:7672). The test uses hardcoded directory names rh-guard-local / rh-guard-remote relative to current_dir. Other tests in this file appear to use the same pattern, but if tests can run concurrently this is racy. Consider a unique name per test or tempdir. Not blocking — matches surrounding style.

Testing

Unit + integration coverage for the new guard look adequate (match / mismatch / None / end-to-end write rejection). One gap worth considering: a test covering the scheduler-side isolation — i.e. one corrupt handle does not block merging of a sibling table. The mismatch can be constructed by writing the handle directly via the rocks table (bypassing the new guard).

· branch [`cubestore-apm-fix`](https://github.com/cube-js/cube/tree/cubestore-apm-fix)

waralexrom requested a review from a team as a code owner June 16, 2026 14:33

github-actions Bot added cube store Issues relating to Cube Store rust Pull requests that update Rust code labels Jun 16, 2026

vercel Bot deployed to Preview June 16, 2026 14:35 View deployment

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

fix(cubestore): guard replay handle seq pointer / location length mismatch#11093

fix(cubestore): guard replay handle seq pointer / location length mismatch#11093
waralexrom wants to merge 1 commit into
masterfrom
cubestore-apm-fix

waralexrom commented Jun 16, 2026

Uh oh!

claude Bot commented Jun 16, 2026 •

edited

Loading

Correctness

Low-priority notes

Testing

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

waralexrom commented Jun 16, 2026

Summary

Root cause

Changes

Testing

Uh oh!

claude Bot commented Jun 16, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Review

Correctness

Low-priority notes

Testing

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

claude Bot commented Jun 16, 2026 •

edited

Loading