[WIP] feat(manifest): support snapshot manifest cache #386
Draft
gripleaf wants to merge 1 commit into
Purpose
In our production scenario, one Paimon table has about 60,000 buckets. A new batch of data is imported every 15 minutes, and this interval may become longer later. During query planning, the scan path may decode about 4.89 million manifest entries. With 16 threads, manifest entry decoding takes around 30 seconds, while only a small subset of entries is finally kept after pruning.
This PR introduces a snapshot-level manifest entry result cache to reduce repeated manifest decoding cost for successive full scans.
The cache stores decoded and merged live manifest entries by snapshot. When the scan advances to a newer snapshot, it tries to build the target snapshot incrementally from the latest cached snapshot by reading only intermediate delta manifests. Request-specific filters, such as partition, bucket, level, and predicate filters, are not cached and are still evaluated for each scan.
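The incremental build described above can be illustrated with a minimal, self-contained sketch. It is not the code in this PR: `ManifestEntry`, `FileKind`, `SnapshotEntryCache`, and the `read_delta` callback are hypothetical, simplified stand-ins. The sketch also assumes snapshot ids start at 1 and that a cold start can replay deltas from the first snapshot, whereas a real scan would more likely start from the snapshot's base manifests.

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified types for illustration only.
enum class FileKind { ADD, DELETE };

struct ManifestEntry {
    FileKind kind;
    int32_t bucket;
    std::string file_name;
};

// Merged live entries of one snapshot, keyed by (bucket, file name).
using LiveEntries = std::map<std::pair<int32_t, std::string>, ManifestEntry>;

class SnapshotEntryCache {
public:
    using DeltaReader = std::function<std::vector<ManifestEntry>(int64_t)>;
    using Filter = std::function<bool(const ManifestEntry&)>;

    // read_delta(id) stands in for decoding the delta manifests of snapshot id.
    explicit SnapshotEntryCache(DeltaReader read_delta)
        : read_delta_(std::move(read_delta)) {}

    // Returns the live entries at `target`, starting from the latest cached
    // snapshot at or before it and applying only the intermediate deltas.
    const LiveEntries& Get(int64_t target) {
        assert(target >= 1);  // assumption: snapshot ids start at 1
        LiveEntries live;
        int64_t from = 0;
        auto it = cache_.upper_bound(target);
        if (it != cache_.begin()) {
            --it;
            if (it->first == target) return it->second;  // exact cache hit
            live = it->second;  // copy the nearest older cached result
            from = it->first;
        }
        for (int64_t id = from + 1; id <= target; ++id) {
            for (const ManifestEntry& e : read_delta_(id)) {
                auto key = std::make_pair(e.bucket, e.file_name);
                if (e.kind == FileKind::ADD) {
                    live[key] = e;
                } else {
                    live.erase(key);
                }
            }
            ++deltas_read_;
        }
        return cache_[target] = std::move(live);
    }

    // Request-specific filters are applied per scan and never cached.
    std::vector<ManifestEntry> Scan(int64_t target, const Filter& keep) {
        std::vector<ManifestEntry> out;
        for (const auto& kv : Get(target)) {
            if (keep(kv.second)) out.push_back(kv.second);
        }
        return out;
    }

    int64_t deltas_read() const { return deltas_read_; }

private:
    DeltaReader read_delta_;
    std::map<int64_t, LiveEntries> cache_;
    int64_t deltas_read_ = 0;
};
```

In this sketch, a scan at snapshot 3 that follows a scan at snapshot 2 decodes only the delta of snapshot 3, and a second scan at snapshot 3 with a different bucket filter decodes nothing. Eviction and memory limits are left out.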
The feature is disabled by default and can be enabled by:
Tests
API and Format
This change adds new public scan options in include/paimon/defs.h:
This change does not affect storage format or protocol.
Documentation
Yes.
Added user guide documentation:
Updated the user guide to include the new page.
Generative AI tooling
Generated-by: OpenAI Codex GPT-5