1 change: 1 addition & 0 deletions docs/source/user_guide.rst
@@ -25,6 +25,7 @@ User Guide
user_guide/snapshot
user_guide/manifest
user_guide/manifest_cache
user_guide/manifest_entry_cache
user_guide/parquet_metadata_cache
user_guide/data_types
user_guide/primary_key_table
84 changes: 84 additions & 0 deletions docs/source/user_guide/manifest_entry_cache.rst
@@ -0,0 +1,84 @@
.. Copyright 2026-present Alibaba Inc.

.. Licensed under the Apache License, Version 2.0 (the "License");
.. you may not use this file except in compliance with the License.
.. You may obtain a copy of the License at

.. http://www.apache.org/licenses/LICENSE-2.0

.. Unless required by applicable law or agreed to in writing, software
.. distributed under the License is distributed on an "AS IS" BASIS,
.. WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
.. See the License for the specific language governing permissions and
.. limitations under the License.

Manifest Entry Cache
====================

Overview
--------

Large tables may contain many manifest entries, while a scan may need only a
small subset after snapshot, partition, bucket, and statistics pruning. The
snapshot-level manifest entry cache reduces the cost of repeatedly decoding
manifests across successive full scans.

The cache stores decoded and merged live manifest entries by snapshot for
``ScanMode::ALL``. When a newer snapshot is scanned, paimon-cpp tries to build
the target snapshot incrementally from the latest cached snapshot by reading
only intermediate delta manifests.
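
The incremental path described above can be pictured with a small sketch. It is
only an illustration under simplifying assumptions: ``EntryKind``,
``DeltaEntry``, ``LiveFiles``, and ``ApplyDeltas`` are hypothetical names that
do not come from paimon-cpp, and a live entry is reduced to a file name.

```cpp
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified model: a delta entry either adds or deletes a file.
enum class EntryKind { Add, Delete };

struct DeltaEntry {
    EntryKind kind;
    std::string file_name;
};

// Live files of one snapshot, identified by file name (assumption of this sketch).
using LiveFiles = std::set<std::string>;

// Builds the live files of a newer snapshot from a cached older snapshot by
// applying the delta entries of each intermediate snapshot in order.
// The cached set is taken by value, so the cached copy itself is not modified.
LiveFiles ApplyDeltas(LiveFiles live, const std::vector<std::vector<DeltaEntry>>& deltas) {
    for (const auto& snapshot_delta : deltas) {
        for (const auto& entry : snapshot_delta) {
            if (entry.kind == EntryKind::Add) {
                live.insert(entry.file_name);
            } else {
                live.erase(entry.file_name);
            }
        }
    }
    return live;
}
```

In this model, only the delta entries between the cached snapshot and the
target snapshot are read; the base set comes from the cache.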

Request-specific filters are not stored in the cache. Partition, bucket, level,
and predicate filters are still evaluated for each scan, so cached entries can
be reused safely across different scan predicates.
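
To illustrate why keeping filters out of the cache allows reuse, the following
sketch applies two different per-scan filters to the same cached entries. The
``Entry`` struct and ``FilterForScan`` function are hypothetical and far
simpler than the real manifest entry type.

```cpp
#include <functional>
#include <string>
#include <vector>

// Hypothetical, simplified entry with only the fields used in this sketch.
struct Entry {
    std::string partition;
    int bucket;
};

// Returns a filtered copy for one scan. The cached vector is read-only here,
// so scans with different predicates can share it.
std::vector<Entry> FilterForScan(const std::vector<Entry>& cached,
                                 const std::function<bool(const Entry&)>& filter) {
    std::vector<Entry> result;
    for (const auto& entry : cached) {
        if (filter(entry)) {
            result.push_back(entry);
        }
    }
    return result;
}
```

Each scan passes its own filter, and the cached entries stay unfiltered for the
next scan.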

Configuration
-------------

Manifest entry caching reuses the cache instance provided by
``ScanContextBuilder::WithCache()`` and stores the snapshot bundle under
``CacheKind::SNAPSHOT_LIVE_MANIFEST``:

.. code-block:: cpp

    auto cache = std::make_shared<LruCache>(128 * 1024 * 1024);
    ScanContextBuilder context_builder(table_path);
    // Retain manifest entry results for up to 3 snapshots per table and branch.
    PAIMON_ASSIGN_OR_RAISE(
        std::unique_ptr<ScanContext> scan_context,
        context_builder
            .WithCache(cache)
            .AddOption(Options::SCAN_MANIFEST_ENTRY_CACHE_MAX_SNAPSHOTS, "3")
            .Finish());

Cache entries are scoped by table path and branch, so they can be reused across
newly created ``TableScan`` and ``FileStoreScan`` instances as long as they
share the same cache object.

``Options::SCAN_MANIFEST_ENTRY_CACHE_MAX_SNAPSHOTS`` controls how many snapshot
results are retained per table and branch. Older snapshot entries are evicted
first. The default value is ``0``, which disables the cache path. Set it to a
positive value to enable the cache when ``ScanContextBuilder::WithCache()`` is
also configured.
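
The retention rule can be sketched as follows. This is a minimal model, not the
paimon-cpp implementation: ``SnapshotBundle`` and its methods are hypothetical,
and the sketch assumes that "older" means a smaller snapshot id.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical per-table, per-branch bundle that keeps at most
// `max_snapshots` snapshot results.
class SnapshotBundle {
 public:
    explicit SnapshotBundle(int32_t max_snapshots) : max_snapshots_(max_snapshots) {}

    void Put(int64_t snapshot_id, std::vector<std::string> entries) {
        if (max_snapshots_ <= 0) {
            return;  // 0 disables the cache path
        }
        snapshots_[snapshot_id] = std::move(entries);
        // std::map is ordered by key, so begin() is the smallest (oldest) id.
        while (snapshots_.size() > static_cast<std::size_t>(max_snapshots_)) {
            snapshots_.erase(snapshots_.begin());
        }
    }

    bool Contains(int64_t snapshot_id) const { return snapshots_.count(snapshot_id) > 0; }
    std::size_t Size() const { return snapshots_.size(); }

 private:
    int32_t max_snapshots_;
    std::map<int64_t, std::vector<std::string>> snapshots_;
};
```

With a limit of 2, storing snapshots 1, 2, and 3 in that order leaves only
snapshots 2 and 3 in this model.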

If no cache is provided through ``ScanContextBuilder::WithCache()``, this
optimization is skipped. The snapshot manifest entry cache shares the
``Cache`` interface with the raw manifest and data-file footer caches, but it
uses a dedicated ``CacheKind`` and a table/branch key instead of file byte
ranges.
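
As a rough picture of the table/branch key, the sketch below follows the key
construction visible in this change (``table_path + "#" + branch``, with
position and length set to ``-1``). ``SimpleKey`` is a hypothetical stand-in
for the real ``CacheKey`` class, and the free function here only models
``CacheKey::ForSnapshotLiveManifestEntries``.

```cpp
#include <cstdint>
#include <string>

enum class CacheKind { DEFAULT, MANIFEST, DATA_FILE_FOOTER, SNAPSHOT_LIVE_MANIFEST };

// Hypothetical value-type key used only for this illustration.
struct SimpleKey {
    std::string path;
    int64_t position;
    int32_t length;
    CacheKind kind;

    bool operator==(const SimpleKey& other) const {
        return path == other.path && position == other.position &&
               length == other.length && kind == other.kind;
    }
};

// Models the table/branch key: no byte range, so position and length are -1.
SimpleKey ForSnapshotLiveManifestEntries(const std::string& table_path,
                                         const std::string& branch) {
    return SimpleKey{table_path + "#" + branch, -1, -1, CacheKind::SNAPSHOT_LIVE_MANIFEST};
}
```

Two keys compare equal only when both the table path and the branch match,
which is what scopes cached snapshot results to one table and branch.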

Limitations
-----------

The cache is currently used only for ``ScanMode::ALL``. It is skipped for
row-range scans because row-range pruning is applied at the manifest-meta level.

Metrics
-------

The scan metrics expose the following values for the last scan:

- ``lastManifestEntryCacheHit``: whether the target snapshot was served
  directly from the cache.
- ``lastManifestEntryCacheIncrementalSnapshots``: how many intermediate
  snapshots were applied during incremental construction.
- ``lastManifestEntryCacheLoadedManifests``: how many manifest files were
  loaded for the cache path.
3 changes: 3 additions & 0 deletions include/paimon/cache/cache.h
@@ -33,6 +33,7 @@ enum class CacheKind {
DEFAULT,
MANIFEST,
DATA_FILE_FOOTER,
SNAPSHOT_LIVE_MANIFEST,
};

class PAIMON_EXPORT CacheKey {
@@ -41,6 +42,8 @@ class PAIMON_EXPORT CacheKey {
int32_t length, bool is_index);
static std::shared_ptr<CacheKey> ForKind(const std::string& file_path, int64_t position,
int32_t length, CacheKind kind);
static std::shared_ptr<CacheKey> ForSnapshotLiveManifestEntries(const std::string& table_path,
const std::string& branch);

public:
virtual ~CacheKey() = default;
5 changes: 5 additions & 0 deletions include/paimon/defs.h
@@ -166,6 +166,11 @@ struct PAIMON_EXPORT Options {
/// "latest-full", "latest", "from-snapshot", "from-snapshot-full". Default value is "default".
static const char SCAN_MODE[];

/// "scan.manifest-entry-cache.max-snapshots" - Maximum number of snapshot manifest entry
/// results retained per table and branch. Setting it to 0 disables manifest entry cache.
/// Default value is 0.
static const char SCAN_MANIFEST_ENTRY_CACHE_MAX_SNAPSHOTS[];

/// "read.batch-size" - Read batch size for any file format if it supports.
/// The default value is 1024.
static const char READ_BATCH_SIZE[];
1 change: 1 addition & 0 deletions src/paimon/CMakeLists.txt
@@ -304,6 +304,7 @@ set(PAIMON_CORE_SRCS
core/operation/raw_file_split_read.cpp
core/operation/read_context.cpp
core/operation/scan_context.cpp
core/manifest/snapshot_live_manifest_entries.cpp
core/operation/write_context.cpp
core/operation/write_restore.cpp
core/postpone/postpone_bucket_writer.cpp
2 changes: 2 additions & 0 deletions src/paimon/common/defs.cpp
@@ -47,6 +47,8 @@ const char Options::SOURCE_SPLIT_TARGET_SIZE[] = "source.split.target-size";
const char Options::SOURCE_SPLIT_OPEN_FILE_COST[] = "source.split.open-file-cost";
const char Options::SCAN_SNAPSHOT_ID[] = "scan.snapshot-id";
const char Options::SCAN_MODE[] = "scan.mode";
const char Options::SCAN_MANIFEST_ENTRY_CACHE_MAX_SNAPSHOTS[] =
"scan.manifest-entry-cache.max-snapshots";
const char Options::READ_BATCH_SIZE[] = "read.batch-size";
const char Options::WRITE_BATCH_SIZE[] = "write.batch-size";
const char Options::WRITE_BUFFER_SIZE[] = "write-buffer-size";
14 changes: 14 additions & 0 deletions src/paimon/common/io/cache/cache_key.cpp
@@ -17,6 +17,14 @@
#include "paimon/common/io/cache/cache_key.h"

namespace paimon {
namespace {

std::string SnapshotLiveManifestEntriesCacheKey(const std::string& table_path,
const std::string& branch) {
return table_path + "#" + branch;
}

} // namespace

std::shared_ptr<CacheKey> CacheKey::ForPosition(const std::string& file_path, int64_t position,
int32_t length, bool is_index) {
@@ -31,6 +39,12 @@ std::shared_ptr<CacheKey> CacheKey::ForKind(const std::string& file_path, int64_
return key;
}

std::shared_ptr<CacheKey> CacheKey::ForSnapshotLiveManifestEntries(const std::string& table_path,
const std::string& branch) {
return ForKind(SnapshotLiveManifestEntriesCacheKey(table_path, branch), /*position=*/-1,
/*length=*/-1, CacheKind::SNAPSHOT_LIVE_MANIFEST);
}

bool PositionCacheKey::IsIndex() const {
return is_index_;
}
12 changes: 12 additions & 0 deletions src/paimon/common/io/cache/lru_cache_test.cpp
@@ -381,6 +381,18 @@ TEST_F(LruCacheTest, TestForKindSetsKeyKind) {
ASSERT_EQ(CacheKind::MANIFEST, put_key->GetKind());
}

TEST_F(LruCacheTest, TestForSnapshotLiveManifestEntries) {
auto main_key = CacheKey::ForSnapshotLiveManifestEntries("table_path", "main");
auto same_key = CacheKey::ForSnapshotLiveManifestEntries("table_path", "main");
auto branch_key = CacheKey::ForSnapshotLiveManifestEntries("table_path", "dev");
auto table_key = CacheKey::ForSnapshotLiveManifestEntries("other_table_path", "main");

ASSERT_EQ(CacheKind::SNAPSHOT_LIVE_MANIFEST, main_key->GetKind());
ASSERT_TRUE(CacheKeyEqual()(main_key, same_key));
ASSERT_FALSE(CacheKeyEqual()(main_key, branch_key));
ASSERT_FALSE(CacheKeyEqual()(main_key, table_key));
}

/// Verifies that multiple evictions happen when a single large entry is inserted.
TEST_F(LruCacheTest, TestMultipleEvictions) {
LruCache cache(300);
13 changes: 13 additions & 0 deletions src/paimon/core/core_options.cpp
@@ -403,6 +403,7 @@ struct CoreOptions::Impl {
int32_t bucket = -1;

int32_t manifest_merge_min_count = 30;
int32_t scan_manifest_entry_cache_max_snapshots = 0;
int32_t read_batch_size = 1024;
int32_t write_batch_size = 1024;
int32_t local_sort_max_num_file_handles = 128;
@@ -717,6 +718,13 @@ struct CoreOptions::Impl {
}
// Parse scan.mode - scanning behavior of the source, default "default"
PAIMON_RETURN_NOT_OK(parser.ParseStartupMode(&startup_mode));
// Parse scan.manifest-entry-cache.max-snapshots - cache size by snapshot count
PAIMON_RETURN_NOT_OK(parser.Parse(Options::SCAN_MANIFEST_ENTRY_CACHE_MAX_SNAPSHOTS,
&scan_manifest_entry_cache_max_snapshots));
if (scan_manifest_entry_cache_max_snapshots < 0) {
return Status::Invalid(fmt::format("{} must be non-negative",
Options::SCAN_MANIFEST_ENTRY_CACHE_MAX_SNAPSHOTS));
}
// Parse scan.fallback-branch - fallback branch when partition not found
PAIMON_RETURN_NOT_OK(parser.Parse(Options::SCAN_FALLBACK_BRANCH, &scan_fallback_branch));
// Parse branch - branch name, default "main"
@@ -968,6 +976,11 @@ std::optional<int64_t> CoreOptions::GetScanSnapshotId() const {
std::optional<int64_t> CoreOptions::GetScanTimestampMillis() const {
return impl_->scan_timestamp_millis;
}

int32_t CoreOptions::GetScanManifestEntryCacheMaxSnapshots() const {
return impl_->scan_manifest_entry_cache_max_snapshots;
}

int64_t CoreOptions::GetManifestTargetFileSize() const {
return impl_->manifest_target_file_size;
}
1 change: 1 addition & 0 deletions src/paimon/core/core_options.h
@@ -78,6 +78,7 @@ class PAIMON_EXPORT CoreOptions {
int64_t GetSourceSplitOpenFileCost() const;
std::optional<int64_t> GetScanSnapshotId() const;
std::optional<int64_t> GetScanTimestampMillis() const;
int32_t GetScanManifestEntryCacheMaxSnapshots() const;

int64_t GetManifestTargetFileSize() const;
std::shared_ptr<Cache> GetCache() const;
6 changes: 6 additions & 0 deletions src/paimon/core/core_options_test.cpp
@@ -55,6 +55,7 @@ TEST(CoreOptionsTest, TestDefaultValue) {
ASSERT_EQ(8 * 1024 * 1024L, core_options.GetManifestTargetFileSize());
ASSERT_EQ(16 * 1024 * 1024L, core_options.GetManifestFullCompactionThresholdSize());
ASSERT_EQ(30, core_options.GetManifestMergeMinCount());
ASSERT_EQ(0, core_options.GetScanManifestEntryCacheMaxSnapshots());
ASSERT_EQ(nullptr, core_options.GetCache());
ASSERT_EQ(128 * 1024 * 1024L, core_options.GetSourceSplitTargetSize());
ASSERT_EQ(4 * 1024 * 1024L, core_options.GetSourceSplitOpenFileCost());
@@ -184,6 +185,7 @@ TEST(CoreOptionsTest, TestFromMap) {
{Options::COMMIT_MAX_RETRIES, "20"},
{Options::SCAN_SNAPSHOT_ID, "5"},
{Options::SCAN_MODE, "from-snapshot-full"},
{Options::SCAN_MANIFEST_ENTRY_CACHE_MAX_SNAPSHOTS, "7"},
{Options::SNAPSHOT_NUM_RETAINED_MIN, "15"},
{Options::SNAPSHOT_NUM_RETAINED_MAX, "30"},
{Options::SNAPSHOT_EXPIRE_LIMIT, "20"},
@@ -305,6 +307,7 @@ TEST(CoreOptionsTest, TestFromMap) {
ASSERT_EQ(120 * 1000, core_options.GetCommitTimeout());
ASSERT_EQ(20, core_options.GetCommitMaxRetries());
ASSERT_EQ(5, core_options.GetScanSnapshotId().value_or(-1));
ASSERT_EQ(7, core_options.GetScanManifestEntryCacheMaxSnapshots());
ExpireConfig expire_config = core_options.GetExpireConfig();
ASSERT_EQ(15, expire_config.GetSnapshotRetainMin());
ASSERT_EQ(30, expire_config.GetSnapshotRetainMax());
@@ -432,6 +435,9 @@ TEST(CoreOptionsTest, TestInvalidCase) {
"invalid lookup mode: invalid");
ASSERT_NOK_WITH_MSG(CoreOptions::FromMap({{Options::LOOKUP_COMPACT_MAX_INTERVAL, "invalid"}}),
"Invalid Config [lookup-compact.max-interval: invalid]");
ASSERT_NOK_WITH_MSG(
CoreOptions::FromMap({{Options::SCAN_MANIFEST_ENTRY_CACHE_MAX_SNAPSHOTS, "-1"}}),
"scan.manifest-entry-cache.max-snapshots must be non-negative");
ASSERT_NOK_WITH_MSG(
CoreOptions::FromMap({{Options::LOOKUP_CACHE_HIGH_PRIO_POOL_RATIO, "1.1"}}),
"The high priority pool ratio should in the range [0, 1), while input is 1.1");