Refactor disk-based replication checkpoint shipping and safe hlog segment truncation#1773
Draft
Consolidate file segment and metadata transmission into a single command:

CLUSTER SNAPSHOT_DATA <token> <type> <startAddress> <data>

A startAddress of -1 signals a single-message payload (e.g., metadata) committed directly. Any other startAddress indicates a streamed file segment where empty data signals end-of-stream.

The previous CLUSTER SEND_CKPT_FILE_SEGMENT and SEND_CKPT_METADATA commands are not removed but are deprecated in favor of SNAPSHOT_DATA.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
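The dispatch rule in this commit message can be sketched as follows. This is a minimal Python illustration, not Garnet's implementation: the class `SnapshotReceiver`, its method name, the return strings, and the assumption that chunks arrive contiguously and in order are all hypothetical.

```python
from dataclasses import dataclass, field

SINGLE_MESSAGE = -1  # startAddress sentinel described in the commit message


@dataclass
class SnapshotReceiver:
    """Hypothetical receiver; names are illustrative, not Garnet's API."""
    committed: dict = field(default_factory=dict)  # (token, type) -> bytes
    in_flight: dict = field(default_factory=dict)  # (token, type) -> stream state

    def on_snapshot_data(self, token, file_type, start_address, data):
        key = (token, file_type)

        # startAddress == -1: the whole payload is in this message.
        if start_address == SINGLE_MESSAGE:
            self.committed[key] = bytes(data)
            return "committed"

        # Empty data on a streamed segment marks end-of-stream.
        if len(data) == 0:
            stream = self.in_flight.pop(key, None)
            self.committed[key] = bytes(stream["buf"]) if stream else b""
            return "end-of-stream"

        # Assumption: chunks are contiguous and arrive in order.
        stream = self.in_flight.setdefault(
            key, {"base": start_address, "buf": bytearray()}
        )
        expected = stream["base"] + len(stream["buf"])
        if start_address != expected:
            raise ValueError(f"expected address {expected}, got {start_address}")
        stream["buf"] += data
        return "buffered"
```

Folding both cases into one handler means the receiver needs a single entry point, with the sentinel address selecting the commit path.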
Ship BfTree snapshot files (snapshot.{token}.bftree) during checkpoint
synchronization to replicas. This enables replicas to lazily restore
RangeIndex trees from checkpoint snapshots after recovery.
New types following the existing ISnapshotDataSource/ISnapshotTransmitSource/
ISnapshotReader pattern:
- RangeIndexFileDataSource: reads .bftree files via FileStream
- RangeIndexFileTransmitSource: sends chunks with per-file key hash header
- RangeIndexCheckpointReader: enumerates snapshot files for a checkpoint token
- RangeIndexFileSink: writes received .bftree data to disk on replicas
Wire protocol: for each RINDEX_SNAPSHOT file, a header message (startAddress=-1)
carries the 32-char hex key hash directory name, followed by file data chunks,
followed by an empty EOT packet. Receiver validates key hash format.
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
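The framing described above (header, data chunks, empty EOT) and the key hash check might look roughly like this. It is a sketch under stated assumptions: the function names are hypothetical, accepting both upper- and lower-case hex is a guess, and the chunk addresses are taken to be byte offsets into the file.

```python
import re

# 32 hex characters; accepting either case is an assumption.
KEY_HASH_RE = re.compile(r"[0-9a-fA-F]{32}")


def is_valid_key_hash(name: str) -> bool:
    """Return True only for a 32-character hex string."""
    return KEY_HASH_RE.fullmatch(name) is not None


def frame_snapshot_file(key_hash: str, payload: bytes, chunk_size: int):
    """Yield (start_address, data) frames: header, chunks, then empty EOT."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not is_valid_key_hash(key_hash):
        raise ValueError("key hash must be 32 hex characters")

    # Header message: startAddress = -1 carries the key hash directory name.
    yield (-1, key_hash.encode("ascii"))

    offset = 0
    while offset < len(payload):
        chunk = payload[offset:offset + chunk_size]
        yield (offset, chunk)
        offset += len(chunk)

    # Empty packet marks end of transmission for this file.
    yield (offset, b"")
```

Because the key hash is used as a directory name, a strict format check on the receiver also rejects values such as `../etc` before they reach the file system.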
… receiving a cluster snapshot data request
…cation

Add cluster-aware purging to PurgeOldCheckpointSnapshots so that in cluster mode, snapshot deletion is deferred to CheckpointStore which verifies no active readers hold the checkpoint entry. CheckpointStore now calls PurgeOldCheckpointSnapshots with enforceClusterSafety:true after confirming reader safety, both in DeleteOutdatedCheckpoints and PurgeAllCheckpointsExceptEntry.

- Add clusterEnabled field to RangeIndexManager constructor
- Add enforceClusterSafety parameter to PurgeOldCheckpointSnapshots
- Wire CheckpointStore to purge BfTree snapshots alongside HLOG/index
- Pass clusterEnabled from GarnetServer to RangeIndexManager

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
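One way to read the deferral described in this commit is sketched below. The classes `SnapshotPurger` and `CheckpointStoreSketch`, their fields, and the reader-count bookkeeping are hypothetical stand-ins; only the `enforceClusterSafety` idea comes from the commit message.

```python
class SnapshotPurger:
    """Hypothetical stand-in for the purge logic on RangeIndexManager."""

    def __init__(self, cluster_enabled, snapshots):
        self.cluster_enabled = cluster_enabled
        self.snapshots = set(snapshots)  # tokens with snapshot files on disk

    def purge_old(self, keep_token, enforce_cluster_safety=False):
        # In cluster mode, an unverified call does nothing: deletion is
        # deferred to the caller that has checked for active readers.
        if self.cluster_enabled and not enforce_cluster_safety:
            return set()
        removed = self.snapshots - {keep_token}
        self.snapshots &= {keep_token}
        return removed


class CheckpointStoreSketch:
    """Hypothetical stand-in for CheckpointStore's reader-safety check."""

    def __init__(self, purger):
        self.purger = purger
        self.readers = {}  # token -> number of active readers

    def delete_outdated(self, keep_token):
        # Skip purging while any older checkpoint still has a reader.
        if any(n > 0 for t, n in self.readers.items() if t != keep_token):
            return set()
        return self.purger.purge_old(keep_token, enforce_cluster_safety=True)
```

The flag keeps a single purge routine for both modes while making the cluster path opt-in by the one caller that knows about readers.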
…t truncation

Introduce PerformInternalCleanup property on ICheckpointManager to control whether Tsavorite performs internal cleanup of checkpoint snapshot files and hybrid log segments during the checkpoint state machine. When false, the external layer (cluster mode) manages cleanup with reader-safety checks.

- Add PerformInternalCleanup to ICheckpointManager interface
- Add virtual property to DeviceLogCommitCheckpointManager (default: true)
- Override as false in GarnetClusterCheckpointManager
- Guard CleanupLogCheckpoint/CleanupIndexCheckpoint in Checkpoint.cs
- Activate safe ShiftBeginAddress in CheckpointStore.DeleteOutdatedCheckpoints using the oldest active checkpoint's begin address as the truncation boundary
- Add ClusterReplicationHlogSegmentCleanupTest to validate hlog segment truncation does not interfere with concurrent replica sync

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
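The truncation boundary rule in this commit can be illustrated with a small sketch. Everything here is an assumption made for illustration: the `CheckpointEntry` shape, treating an entry as active when it has readers or is the latest one, and fixed-size segments numbered from address 0.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class CheckpointEntry:
    token: str
    begin_address: int  # hybrid log begin address recorded for this checkpoint
    readers: int        # replicas currently reading this checkpoint


def truncation_boundary(entries: List[CheckpointEntry]) -> Optional[int]:
    """Begin address of the oldest active entry (entries ordered oldest first)."""
    for index, entry in enumerate(entries):
        is_latest = index == len(entries) - 1
        if entry.readers > 0 or is_latest:
            return entry.begin_address
    return None  # no checkpoints, so nothing to truncate against


def removable_segments(boundary: int, segment_size: int) -> range:
    """Segments that lie entirely below the boundary address."""
    return range(boundary // segment_size)
```

For example, with 1024-byte segments and a boundary of 4500, segments 0 through 3 can go, while segment 4 (addresses 4096 to 5119) stays because it still contains the boundary.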
Remove the ISnapshotReader and ISnapshotTransmitSource implementations for RangeIndex BfTree checkpoints, along with the receive-side sink and helper methods for enumerating/purging BfTree snapshot files during replication.

Deleted:
- RangeIndexCheckpointReader.cs
- RangeIndexFileTransmitSource.cs
- RangeIndexFileSink.cs

Cleaned up:
- CheckpointFileType: removed RINDEX_SNAPSHOT enum value
- CheckpointStore: removed PurgeOldCheckpointSnapshots calls
- ReceiveCheckpointHandler: removed RINDEX_SNAPSHOT handling
- ReplicaSyncSession: removed RangeIndex reader registration
- RangeIndexManager.cs: moved replication methods to partial file
- StoreWrapper: removed public RangeIndexManager property
- GarnetServer: reverted clusterEnabled parameter

The RangeIndexManager.Replication.cs partial file is retained as the separation of AOF replication methods remains useful.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Refactors the checkpoint shipping pipeline for disk-based replication, introducing clean abstractions for reading and transmitting Tsavorite checkpoint data. Also adds safe hybrid log segment truncation to prevent deletion of segments actively being read by syncing replicas.
Key changes
Checkpoint shipping abstractions (send side):
Checkpoint shipping abstractions (receive side):
Safe hlog segment truncation (PerformInternalCleanup):
RangeIndexManager refactoring:
Testing