[BFTree] Add RangeIndex cluster migration#1731
Conversation
61a467b to
3a6e52f
Compare
3a6e52f to
dc78c02
Compare
74564ec to
74a1e44
Compare
4851503 to
5754d5d
Compare
Migration plumbing: - Chunked serializer (pure state machine, no I/O) + async MigrationReader wrapper - Chunked deserializer with FileStream I/O for receiving migration data - TransmitRangeIndexAsync source-side driver with configurable chunk size - RangeIndexMigrationReceiveState receiver with state machine - Remove redundant content-length prefix in migration wire format - SnapshotRangeIndexAndCreateReader factory on RangeIndexManager Bug fixes: - Remove RI+cluster startup guard (GarnetServer.cs) - Fix round-trip migration: Publish deletes existing data file - Fix Publish registration: accept InPlaceUpdated/CopyUpdated - Fix serializer empty-buffer bug in FileData phase - Add missing buffer-too-small guard for KeyHeader phase - TransmitRangeIndexAsync catches all exceptions (never throws) Sketch protection (SLOTS path): - All RI keys added to sketch in one batch - TRANSMITTING epoch barrier blocks writes during snapshot+transmit - DELETING epoch barrier blocks reads+writes during delete - try/finally ensures sketch cleanup on failure - TODO: Vector Sets have the same unprotected pattern Sketch protection (KEYS path): - RI keys already in sketch from user enumeration - Mark transmitted keys for DeleteKeysAsync() Tests: - 26 unit tests for serializer/deserializer - 11 cluster migration integration tests (SingleBySlot, ByKeys, ManyBySlot, WhileModifying, MigrateBack, LargeTree, ChunkSize, StressAsync) Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
5754d5d to
ac06150
Compare
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…of out int consumed Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…atedIndex The deserializer now only exposes parsed data (Key, Stub, TempPath). The store operations (file move, BfTree recovery, RMW, registration) live in RangeIndexManager where they belong. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…operty Deserializer now takes tempPath and optional ILogger directly instead of the full RangeIndexManager. This completes the separation: the deserializer is a pure parser with no knowledge of the manager's internals. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Close the FileStream as soon as fileBytesRemaining hits zero (ensuring data is flushed to disk) rather than waiting for trailer data to arrive in the same chunk. The trailer is now parsed in a separate state. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Test cases crafted from raw bytes (no real BfTree needed): - FileDataExactlyFillsChunk_TrailerInNextChunk: file flush verified - FileDataOneBytePerChunk: single byte file data chunks - EmptyChunkDuringWaitingForTrailer: empty chunk accepted - KeySplitAcrossTwoChunks: partial key in first chunk - ZeroFileDataGoesDirectlyToTrailer: no file bytes - CorruptedFileDataFailsChecksumInTrailer: checksum mismatch Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- ClusterMigrateEmptyRangeIndex: creates RI key with no data, migrates, verifies RI.SET works on target after migration - Log snapshot file size in SnapshotForMigration for debugging Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…content Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Missing RangeIndex feature-enabled guardsThis PR doesn't consistently check whether the RangeIndex feature is enabled (i.e., whether 1. Source side —
|
| } | ||
|
|
||
| public async Task<bool> TransmitKeysAsync(Dictionary<byte[], byte[]> vectorSetKeysToIgnore) | ||
| public async Task<bool> TransmitKeysAsync(Dictionary<byte[], byte[]> vectorSetKeysToIgnore, Dictionary<byte[], byte[]> rangeIndexKeysToIgnore) |
There was a problem hiding this comment.
Is it possible to combine these two instead of having to maintain two dictionaries?
|
|
||
| var delRes = localServerSession.BasicGarnetApi.DELETE(key); | ||
|
|
||
| session.logger?.LogDebug("Deleting RangeIndex {key} after migration: {delRes}", System.Text.Encoding.UTF8.GetString(key), delRes); |
There was a problem hiding this comment.
nit: logging message is confusing. Should be 'Deleted..' otherwise move it before actual delete.
vazois
left a comment
There was a problem hiding this comment.
libs/cluster/Server/Migration/MigrateScanFunctions.cs:53 — maybe not for this PR, but can't we just get these values from an enum?
|
|
||
| return true; | ||
| } | ||
| finally |
There was a problem hiding this comment.
maybe catch and log a possible exception here. ensure to re-throw.
| await WaitForConfigPropagationAsync().ConfigureAwait(false); | ||
|
|
||
| // Discover Vector Sets linked namespaces | ||
| var allKeys = migrateTask.sketch.Keys.Select(t => t.Item1.ToArray()); |
There was a problem hiding this comment.
can we avoid the copy here and just iterate over the container?
vazois
left a comment
There was a problem hiding this comment.
libs/cluster/Server/Migration/MigrateSessionKeys.cs:43 — seems really wasteful to maintain a separate dictionary just for skipping keys. We can potentially store the info for the key type inline and skip the key once we read it from the sketch list
| } | ||
|
|
||
| /// <summary>Reset state for the next key stream.</summary> | ||
| private void Reset() |
There was a problem hiding this comment.
Ensure that the reset does not race with back to back Migration calls or parallel sessions on different slots
| /// from one migration arrive on the same <see cref="ClusterSession"/>, guaranteeing | ||
| /// in-order delivery. | ||
| /// </remarks> | ||
| internal sealed class RangeIndexMigrationReceiveState : IDisposable |
There was a problem hiding this comment.
Since this implementation is specific to RangeIndex, can you rename the file to indicate that? The current filename MigrationReceiveSession.cs is too generic.
Summary
Adds end-to-end cluster migration for RangeIndex keys, supporting both
MIGRATE SLOTSandMIGRATE KEYSpaths. RangeIndex keys are backed by a native BfTree whose on-disk state (data.bftree) lives outside Tsavorite — shipping just the 51-byte stub is insufficient. This change streams the entire BfTree snapshot file alongside the stub over the existing migration transport.Architecture
RangeIndexChunkedSerializer): Pure state machine overSpan<byte>— no I/O. Takes file data as input viaMoveNext(dest, fileData, out consumed).RangeIndexMigrationReader): Async wrapper that reads the snapshot file and feeds bytes to the serializer.RangeIndexChunkedDeserializer): Sync state machine that writes received file data to a temp file, validates xxHash64 checksum, recovers the native BfTree, and publishes the stub to the store.RangeIndexManager.SnapshotRangeIndexAndCreateReader): Snapshots the BfTree under an exclusive lock to a temp file, then creates the reader.Wire format
Single
MigrationRecordSpanType.SerializedRangeIndexStream(tag 4). Stream format across chunks:[4B keyLen][key bytes][8B fileSize][file bytes][8B xxHash64][4B stubLen][stub]Key and file bytes may span chunks; all other elements must fit within a single chunk.
SLOTS path
RecordType == 2, captures toMigrateOperation.RangeIndexesMigrateRangeIndexKeysAsyncruns a sketch-protected batch cycle:TransmitRangeIndexAsyncfinally: clear sketch (unblocks clients)KEYS path
GetRangeIndexKeysForMigrationdiscovers RI keys via RIGETTransmitKeysAsyncskips RI keys (inrangeIndexKeysToIgnore)TransmitRangeIndexAsync, then marked in sketch forDeleteKeysAsyncBug fixes
GarnetServer.cs)Publishdeletes existing data file before movePublishregistration: acceptInPlaceUpdated/CopyUpdatedstatusTransmitRangeIndexAsynccatches all exceptions (never throws)Tests
TODO