Skip to content

fix(cubestore): guard replay handle seq pointer / location length mismatch#11093

Open
waralexrom wants to merge 1 commit into
masterfrom
cubestore-apm-fix
Open

fix(cubestore): guard replay handle seq pointer / location length mismatch#11093
waralexrom wants to merge 1 commit into
masterfrom
cubestore-apm-fix

Conversation

@waralexrom

Copy link
Copy Markdown
Member

Summary

A stale ReplayHandle whose seq_pointers_by_location length no longer matches the table's locations poisoned both the streaming import path and the replay-handle merge pass — stream imports for the table failed permanently and replay handles accumulated unbounded across the cluster. This adds write-time and read-time guards plus per-table isolation in the merge so a single corrupt handle can't take everything down.

Root cause

The seq-pointer vector length is frozen at handle-creation time from table.locations().len(), and table locations are immutable. So a handle of length N against a table with M≠N locations can only arise when a table_id is reused for a table with a different partition count (e.g. metastore restore from a snapshot older than the state where the handles were written, bypassing drop_table's handle cleanup). The mismatch then makes:

  • seq_pointer_for_location (read path, via stream import initial_seq_for) fail with an internal error every cycle, and
  • union_seq_pointer_by_location in merge_replay_handles abort the whole cleanup pass, so handles for all tables stop being merged and pile up.

Changes

  • Refuse to persist replay handles whose seq-pointer vector length doesn't match the table's locations — validate_seq_pointers_by_location, applied in both create_replay_handle_from_seq_pointers and replace_replay_handles.
  • Report the read-path mismatch as corrupt_data instead of internal, so the scheduler's existing CorruptData handling deactivates the table (and Cube rebuilds it) rather than looping the error forever.
  • Isolate per-table merging in merge_replay_handles: a corrupt handle for one table is logged and skipped, no longer aborting cleanup for every other table.

Testing

  • cargo test -p cubestore --lib replay_handle — 3 unit tests for the length guard (match → ok, mismatch → err, None → ok) and 1 integration test asserting the write path rejects a mismatching vector and persists nothing, while a matching one writes exactly one handle. All green.
  • Full crate compiles on top of current master.

…match

A stale ReplayHandle whose seq_pointers_by_location length no longer
matches the table's locations (e.g. after metastore restore where a
table id is reused with a different partition count) poisoned both the
stream import path and the replay handle merge pass, causing imports to
fail forever and handles to accumulate unbounded.

- Refuse to persist replay handles whose seq pointer vector length does
  not match the table's locations (validate_seq_pointers_by_location in
  both create_replay_handle_from_seq_pointers and replace_replay_handles).
- Report the read-path length mismatch as corrupt_data so the scheduler
  deactivates the table instead of looping the error.
- Isolate per-table merging in merge_replay_handles so one corrupt handle
  no longer aborts cleanup for every other table.
- Add unit tests for the length guard and an integration test for the
  write-path rejection.
@waralexrom waralexrom requested a review from a team as a code owner June 16, 2026 14:33
@github-actions github-actions Bot added cube store Issues relating to Cube Store rust Pull requests that update Rust code labels Jun 16, 2026
@claude

claude Bot commented Jun 16, 2026

Copy link
Copy Markdown
Contributor

Claude finished @waralexrom's task in 1m 55s —— View job


Review

Verdict: LGTM — the three-layer defense (write-time guard, read-time error classification, per-table merge isolation) is well-targeted at the failure mode in the PR description. 0 high, 0 medium, 3 low.

Details

Correctness

  • replay_handle.rs:206 Switching to CubeError::corrupt_data on the read path is correct: the JobStatus error message will contain "CorruptData" (see scheduler/mod.rs:860), so the scheduler will deactivate + rebuild the table instead of looping the error forever. Matches the existing pattern.
  • replay_handle.rs:216 validate_seq_pointers_by_location treats locations = None as length 0, so a Some([…]) pointer vector against a table without locations will (correctly) fail validation. Symmetric with seq_pointer_for_location.
  • metastore/mod.rs:4567 and :4677 Write-time guards are placed inside the rocks write op, so they roll back cleanly. Good.
  • scheduler/mod.rs:484 Per-table isolation via the IIFE is a reasonable inline scope — union_seq_pointer_by_location is the only thing that can return Err here in steady state, so a corrupt handle for one table_id no longer aborts the whole pass. After-loop replace_replay_handles errors are also caught + logged, which protects against the new write-time guard tripping inside the rocks tx.

Low-priority notes

  1. Possible regression for dropped tables in replace_replay_handles (metastore/mod.rs:4678). The new code does tables_table.get_row_or_not_found(table_id)? before writing the merged handle. If drop_table failed to clean up handles previously (e.g. crash mid-drop), the scheduler's merge pass used to still consolidate them; now it will error out and skip. With the scheduler-side log + continue this is benign (handles just stay un-merged), but stale handles for missing tables will now accumulate indefinitely instead of being merged down. Consider treating TableNotFound as "skip validation, accept None pointers" or scheduling a separate orphan cleanup pass. Worth at minimum a // note: comment.

  2. Error classification asymmetry (replay_handle.rs:220). validate_seq_pointers_by_location returns CubeError::internal while the read path now returns CubeError::corrupt_data for the same condition. Intentional (write-path errors shouldn't trigger table deactivation), but a one-line comment would help future readers understand the asymmetry.

  3. Test isolation (metastore/mod.rs:7672). The test uses hardcoded directory names rh-guard-local / rh-guard-remote relative to current_dir. Other tests in this file appear to use the same pattern, but if tests can run concurrently this is racy. Consider a unique name per test or tempdir. Not blocking — matches surrounding style.

Testing

Unit + integration coverage for the new guard look adequate (match / mismatch / None / end-to-end write rejection). One gap worth considering: a test covering the scheduler-side isolation — i.e. one corrupt handle does not block merging of a sibling table. The mismatch can be constructed by writing the handle directly via the rocks table (bypassing the new guard).

· branch [`cubestore-apm-fix`](https://github.com/cube-js/cube/tree/cubestore-apm-fix)

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

cube store Issues relating to Cube Store rust Pull requests that update Rust code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant