[WIP] feat(manifest): support snapshot manifest cache #386

Draft
gripleaf wants to merge 1 commit into alibaba:main from gripleaf:feat/snapshot-manifest-cache

Conversation

@gripleaf

Contributor

Purpose

In our production scenario, one Paimon table has about 60,000 buckets. A new batch of data is imported every 15 minutes, and this interval may grow later. During query planning, the scan path may decode about 4.89 million manifest entries. With 16 threads, decoding these entries takes around 30 seconds, yet only a small subset of them is kept after pruning.

This PR introduces a snapshot-level manifest entry result cache to reduce repeated manifest decoding cost for successive full scans.

The cache stores the decoded and merged live manifest entries for each snapshot. When a scan advances to a newer snapshot, the cache tries to build the target snapshot incrementally from the latest cached snapshot by reading only the intermediate delta manifests. Request-specific filters, such as partition, bucket, level, and predicate filters, are not cached and are still evaluated for each scan.
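
The incremental build described above can be sketched roughly as follows. This is a simplified illustration, not the code in this PR: `ManifestEntry`, `LiveEntries`, `SnapshotEntryCache`, `ApplyDelta`, and `FilterByBucket` are hypothetical names, entries are keyed by bucket and file name only, and eviction simply drops the oldest cached snapshot. Delta manifests are modelled as an in-memory map from snapshot id to the entries that snapshot committed.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <string>
#include <utility>
#include <vector>

// Hypothetical, simplified stand-ins for the real manifest types.
enum class FileKind { kAdd, kDelete };

struct ManifestEntry {
    FileKind kind;
    int32_t bucket;
    std::string file_name;
};

// Merged live entries of one snapshot, keyed by (bucket, file name).
using LiveEntries = std::map<std::pair<int32_t, std::string>, ManifestEntry>;
// Delta entries committed by each snapshot id (assumed in-memory here).
using Deltas = std::map<int64_t, std::vector<ManifestEntry>>;

// Merge one delta into the live set: ADD inserts, DELETE removes.
inline void ApplyDelta(const std::vector<ManifestEntry>& delta, LiveEntries* live) {
    for (const auto& entry : delta) {
        auto key = std::make_pair(entry.bucket, entry.file_name);
        if (entry.kind == FileKind::kAdd) {
            live->insert_or_assign(key, entry);
        } else {
            live->erase(key);
        }
    }
}

class SnapshotEntryCache {
public:
    explicit SnapshotEntryCache(std::size_t max_snapshots) : max_snapshots_(max_snapshots) {}

    // Returns the live entries of `target`, starting from the newest cached
    // snapshot that is not newer than `target` and applying only later deltas.
    LiveEntries Get(int64_t target, const Deltas& deltas) {
        auto hit = cache_.find(target);
        if (hit != cache_.end()) {
            return hit->second;
        }
        LiveEntries live;
        int64_t from = 0;  // exclusive lower bound; snapshot ids start at 1 here
        auto base = cache_.upper_bound(target);
        if (base != cache_.begin()) {
            --base;
            live = base->second;
            from = base->first;
        }
        for (auto it = deltas.upper_bound(from); it != deltas.end() && it->first <= target; ++it) {
            ApplyDelta(it->second, &live);
            ++decoded_deltas_;
        }
        cache_[target] = live;
        while (cache_.size() > max_snapshots_) {
            cache_.erase(cache_.begin());  // evict the oldest cached snapshot
        }
        return live;
    }

    // Number of delta manifests decoded so far (for illustration only).
    std::size_t decoded_deltas() const { return decoded_deltas_; }

private:
    std::size_t max_snapshots_;
    std::size_t decoded_deltas_ = 0;
    std::map<int64_t, LiveEntries> cache_;
};

// Request-specific filtering stays outside the cache and runs on every scan.
inline std::vector<ManifestEntry> FilterByBucket(const LiveEntries& live, int32_t bucket) {
    std::vector<ManifestEntry> result;
    for (const auto& kv : live) {
        if (kv.first.first == bucket) {
            result.push_back(kv.second);
        }
    }
    return result;
}
```

In this sketch, a scan of snapshot 3 that follows a scan of snapshot 1 decodes only the deltas of snapshots 2 and 3, and a repeated scan of snapshot 3 decodes nothing. Filters such as `FilterByBucket` are applied to the cached result each time, so one cached snapshot can serve scans with different filters.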

The feature is disabled by default and is controlled by the following scan options:

  • scan.manifest-entry-cache.enabled
  • scan.manifest-entry-cache.max-snapshots
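
As a rough illustration, the two option keys could be collected into a string map before being handed to the table options. The helper name `MakeScanCacheOptions` is hypothetical, and the value formats (`"true"` for the switch, a decimal integer for the snapshot limit) are assumptions; the added user guide page is the authority on accepted values.

```cpp
#include <map>
#include <string>

// Hypothetical helper: builds an options map with the two keys listed above.
// The value encodings are assumed, not taken from the PR.
inline std::map<std::string, std::string> MakeScanCacheOptions(int max_snapshots) {
    return {
        {"scan.manifest-entry-cache.enabled", "true"},
        {"scan.manifest-entry-cache.max-snapshots", std::to_string(max_snapshots)},
    };
}
```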

Tests

  • cmake --build build --target paimon-core-test -j2
  • ./build/release/paimon-core-test --gtest_filter='CoreOptionsTest.TestDefaultValue:CoreOptionsTest.TestFromMap:CoreOptionsTest.TestInvalidCase:FileStoreScanTest.TestSnapshotManifestEntryCache'
  • ./build/release/paimon-core-test --gtest_filter='FileStoreScanTest.*'

API and Format

This change adds new public scan options in include/paimon/defs.h:

  • Options::SCAN_MANIFEST_ENTRY_CACHE_ENABLED
  • Options::SCAN_MANIFEST_ENTRY_CACHE_MAX_SNAPSHOTS

This change does not affect storage format or protocol.

Documentation

Yes.

Added user guide documentation:

  • docs/source/user_guide/manifest_entry_cache.rst

Updated the user guide to include the new page.

Generative AI tooling

Generated-by: OpenAI Codex GPT-5

@gripleaf gripleaf force-pushed the feat/snapshot-manifest-cache branch 4 times, most recently from f8bc721 to 30b686e Compare June 29, 2026 10:08
@gripleaf gripleaf force-pushed the feat/snapshot-manifest-cache branch from 30b686e to 3058b21 Compare June 29, 2026 12:44