Skip to content

feat(cloud-storage): add pluggable cloud storage support for distributed HugeGraph persistence #3061

Draft
vaijosh wants to merge 11 commits into apache:master from vaijosh:RockDBCloud

vaijosh (Contributor) commented Jun 19, 2026

Why

This PR introduces a unified cloud storage capability for HugeGraph’s distributed stack, so store data can be synchronized to cloud object storage for stronger durability and recovery options.

The focus is not any single provider's implementation details but a cloud-capable architecture: a default S3-compatible provider plus extension points for additional providers.

What this PR delivers

1) Cloud storage support in distributed runtime

  • Enables cloud sync for store-side RocksDB in distributed deployments (backend=hstore flow).
  • Supports both:
    • cloud-first mode: commit waits for cloud sync (stronger durability)
    • async mode: local/Raft commit first, periodic cloud reconciliation
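A minimal Java sketch of how the two modes might differ in the commit path. All names here (`DurabilityModes`, `CommitPipeline`, `CloudUploader`, and the sequence-number model) are hypothetical illustrations, not the classes this PR adds. In cloud-first mode the acknowledgement blocks on the upload; in async mode a scheduled task reconciles whatever has been committed locally but not yet uploaded.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicLong;

// Hypothetical sketch: names and the sequence-number model are illustrative,
// not the API introduced by this PR.
public class DurabilityModes {

    enum DurabilityMode { CLOUD_FIRST, ASYNC }

    /** Hypothetical uploader: makes all data up to `sequence` durable in the bucket. */
    interface CloudUploader {
        void uploadUpTo(long sequence) throws IOException;
    }

    static final class CommitPipeline implements AutoCloseable {
        private final DurabilityMode mode;
        private final CloudUploader uploader;
        private final AtomicLong lastCommitted = new AtomicLong(); // applied locally / via Raft
        private final AtomicLong lastUploaded = new AtomicLong();  // known durable in the cloud
        private final ScheduledExecutorService scheduler =
                Executors.newSingleThreadScheduledExecutor();

        CommitPipeline(DurabilityMode mode, CloudUploader uploader, long intervalMillis) {
            this.mode = mode;
            this.uploader = uploader;
            if (mode == DurabilityMode.ASYNC) {
                // Async mode: reconcile committed-but-not-uploaded data on a fixed delay.
                scheduler.scheduleWithFixedDelay(this::reconcile,
                        intervalMillis, intervalMillis, TimeUnit.MILLISECONDS);
            }
        }

        /** Called after the local/Raft commit of `sequence`; returning means the caller may be acked. */
        void commit(long sequence) {
            lastCommitted.accumulateAndGet(sequence, Math::max);
            if (mode == DurabilityMode.CLOUD_FIRST) {
                try {
                    // Cloud-first: block the acknowledgement until the upload succeeds.
                    uploader.uploadUpTo(sequence);
                } catch (IOException e) {
                    // Surface the failure so the write is not acknowledged.
                    throw new UncheckedIOException(e);
                }
                lastUploaded.accumulateAndGet(sequence, Math::max);
            }
        }

        /** Upload everything committed so far; on failure, a later run retries. */
        void reconcile() {
            long target = lastCommitted.get();
            if (target <= lastUploaded.get()) {
                return; // nothing new to upload
            }
            try {
                uploader.uploadUpTo(target);
                lastUploaded.accumulateAndGet(target, Math::max);
            } catch (IOException e) {
                // Leave lastUploaded unchanged so the next scheduled run retries.
            }
        }

        long lastUploaded() { return lastUploaded.get(); }

        @Override
        public void close() { scheduler.shutdownNow(); }
    }
}
```

The trade-off shows up in `commit`: cloud-first pays an upload round-trip on every acknowledged write, while async mode batches uploads, so (assuming uploads succeed) the cloud copy lags local commits by at most roughly one interval plus upload time.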

2) Pluggable provider architecture

  • Introduces provider/client abstractions for cloud backends.
  • Uses ServiceLoader discovery so new cloud providers can be added via plugin JARs.
  • Keeps the built-in s3 provider as the default (S3-compatible API model), while allowing future providers to be added without rewriting core code.
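A sketch of what ServiceLoader-based discovery with an `s3` default could look like. `CloudStorageProvider`, `CloudStorageClient`, `select`, and `discover` are hypothetical names; only `java.util.ServiceLoader` is a real API here. Selection is kept separate from discovery so it can be exercised without a plugin JAR on the classpath.

```java
import java.util.Map;
import java.util.ServiceLoader;

// Hypothetical SPI sketch: interface names are illustrative, not this PR's classes.
public class CloudProviders {

    /** Minimal client operations a store-side sync might need (assumed). */
    public interface CloudStorageClient {
        void put(String key, byte[] data);
        byte[] get(String key);
    }

    /**
     * Provider SPI. A plugin JAR registers an implementation by listing its class name
     * in a META-INF/services/ file named after this interface's binary name.
     */
    public interface CloudStorageProvider {
        String name(); // e.g. "s3"
        CloudStorageClient createClient(Map<String, String> config);
    }

    static final String DEFAULT_PROVIDER = "s3";

    /** Pick the provider whose name matches `requested`, defaulting to s3 when unset. */
    static CloudStorageProvider select(Iterable<CloudStorageProvider> candidates, String requested) {
        String wanted = (requested == null || requested.isBlank()) ? DEFAULT_PROVIDER : requested;
        for (CloudStorageProvider provider : candidates) {
            if (provider.name().equalsIgnoreCase(wanted)) {
                return provider;
            }
        }
        throw new IllegalArgumentException("no cloud storage provider named '" + wanted + "' found");
    }

    /** Runtime path: discover every provider registered on the classpath via ServiceLoader. */
    static CloudStorageProvider discover(String requested) {
        return select(ServiceLoader.load(CloudStorageProvider.class), requested);
    }
}
```

Under this design, a new provider plugin ships an implementing class plus the matching `META-INF/services/` entry, and core code only ever sees the interface.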

3) Cloud-neutral configuration model

  • Standardizes configuration naming around cloud-neutral keys (cloud_*).
  • Aligns server-side propagation and store-side consumption of cloud settings.
  • Cleans up older S3-specific naming to keep the config surface consistent and provider-agnostic.
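To illustrate why a cloud-neutral `cloud_*` prefix simplifies propagation, here is a minimal parser that gathers every `cloud_*` entry into one options map. Because nothing in it is S3-specific, a server could forward the same entries to the store unchanged, and each provider could read the options it understands. The specific keys (`cloud_provider`, `cloud_bucket`, `cloud_endpoint`), the `s3` fallback, and the `CloudConfig` class are assumptions for this sketch, not necessarily the PR's exact key set.

```java
import java.util.Map;
import java.util.TreeMap;

// Hypothetical sketch: key names such as cloud_bucket and cloud_endpoint are assumed
// for illustration and may differ from the keys this PR defines.
public class CloudConfig {

    static final String PREFIX = "cloud_";

    final String provider;
    final String bucket;
    final Map<String, String> options; // every cloud_* key with the prefix stripped

    private CloudConfig(String provider, String bucket, Map<String, String> options) {
        this.provider = provider;
        this.bucket = bucket;
        this.options = options;
    }

    /** Parse provider-neutral settings; the same routine could serve server and store. */
    static CloudConfig from(Map<String, String> settings) {
        Map<String, String> opts = new TreeMap<>();
        for (Map.Entry<String, String> e : settings.entrySet()) {
            if (e.getKey().startsWith(PREFIX)) {
                opts.put(e.getKey().substring(PREFIX.length()), e.getValue());
            }
        }
        String provider = opts.getOrDefault("provider", "s3"); // assumed default provider
        String bucket = opts.get("bucket");
        if (bucket == null || bucket.isBlank()) {
            throw new IllegalArgumentException("cloud_bucket must be set");
        }
        return new CloudConfig(provider, bucket, Map.copyOf(opts));
    }
}
```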

4) Operational docs and examples

  • Updates architecture and usage docs to describe cloud storage behavior consistently.
  • Adds/updates sample plugin guidance and SPI wiring for provider developers.
  • Aligns docker/dev scripts and templates with cloud storage configuration.

User Impact

  • Resilience against Ephemeral Infrastructure: Treating cloud object storage as the decoupled, durable source of truth lets HugeGraph run safely in cloud-native environments such as Kubernetes. If an instance or pod is unexpectedly terminated or rescheduled, or loses its local disk or EBS volume, the system avoids catastrophic data loss: a new instance can rehydrate its state directly from the latest cloud checkpoint.
  • Flexible Durability SLAs: Operators running distributed deployments (backend=hstore) can choose the trade-off between throughput and durability that fits their workload:
    • Cloud-First Mode (sync): Guarantees zero loss of acknowledged writes by completing the cloud storage flush inline with the local Raft commit, before success is returned to the caller.
    • Asynchronous Mode (async): Minimizes write latency by reconciling to the cloud storage bucket in the background, so the cloud copy lags local commits by a bounded interval.
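The rehydration path described above can be sketched in two steps: detect that local state is gone, then choose the newest checkpoint in the bucket to restore from. The key layout `checkpoints/<partition>/<sequence>/MANIFEST` and both method names are hypothetical; the PR's actual checkpoint format is not described here, and the download/restore step itself is omitted.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collection;
import java.util.OptionalLong;
import java.util.stream.Stream;

// Hypothetical sketch: the bucket key layout below is an assumption for illustration.
public class Rehydration {

    private static final String MANIFEST_SUFFIX = "/MANIFEST";

    /** True when the local data directory is missing or empty, e.g. a rescheduled pod. */
    static boolean needsRehydration(Path dataDir) throws IOException {
        if (!Files.isDirectory(dataDir)) {
            return true;
        }
        try (Stream<Path> entries = Files.list(dataDir)) {
            return entries.findAny().isEmpty();
        }
    }

    /**
     * Newest checkpoint sequence for a partition, given a bucket listing.
     * Assumed layout: checkpoints/<partition>/<sequence>/MANIFEST
     */
    static OptionalLong latestCheckpoint(Collection<String> keys, int partition) {
        String prefix = "checkpoints/" + partition + "/";
        return keys.stream()
                .filter(k -> k.startsWith(prefix) && k.endsWith(MANIFEST_SUFFIX)
                        && k.length() > prefix.length() + MANIFEST_SUFFIX.length())
                .map(k -> k.substring(prefix.length(), k.length() - MANIFEST_SUFFIX.length()))
                .filter(seq -> seq.matches("\\d{1,18}")) // skip malformed or overflowing keys
                .mapToLong(Long::parseLong)            // compare numerically, not as strings
                .max();
    }
}
```

Sequences are parsed and compared numerically because a plain string sort would rank `9` above `10`.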

Validation

  • Maven compile/build verification on affected modules.
  • Manual validation and smoke tests; see RocksDB-cloud.md for details.

Reviewer focus areas

  • Cloud config key consistency across server/store/docs
  • Provider abstraction and ServiceLoader integration
  • Cloud-first vs async durability semantics
  • Recovery/rehydration behavior and operational clarity

codecov Bot commented Jun 23, 2026

Codecov Report

❌ Patch coverage is 0% with 662 lines in your changes missing coverage. Please review.
✅ Project coverage is 29.84%. Comparing base (39dfb2d) to head (4aed5da).
⚠️ Report is 5 commits behind head on master.

Files with missing lines                               Patch %   Lines
...java/org/apache/hugegraph/pd/PartitionService.java  0.00%     176 Missing ⚠️
...gegraph/store/partition/PartitionLeaseManager.java  0.00%     104 Missing ⚠️
...hugegraph/store/partition/LeaseEpochValidator.java  0.00%     80 Missing ⚠️
...egraph/store/raft/PartitionLeaseStateListener.java  0.00%     52 Missing ⚠️
...va/org/apache/hugegraph/store/PartitionEngine.java  0.00%     46 Missing ⚠️
...g/apache/hugegraph/store/pd/DefaultPdProvider.java  0.00%     35 Missing ⚠️
...pache/hugegraph/pd/meta/PartitionBucketRecord.java  0.00%     33 Missing ⚠️
.../java/org/apache/hugegraph/pd/client/PDClient.java  0.00%     31 Missing ⚠️
...va/org/apache/hugegraph/pd/meta/StoreInfoMeta.java  0.00%     26 Missing ⚠️
...rg/apache/hugegraph/pd/meta/MetadataKeyHelper.java  0.00%     22 Missing ⚠️
... and 8 more

❗ There is a different number of reports uploaded between BASE (39dfb2d) and HEAD (4aed5da).

HEAD has one upload fewer than BASE.
Flag    BASE (39dfb2d)    HEAD (4aed5da)
        2                 1
Additional details and impacted files
@@             Coverage Diff              @@
##             master    #3061      +/-   ##
============================================
- Coverage     34.89%   29.84%   -5.06%     
- Complexity      338      375      +37     
============================================
  Files           803      808       +5     
  Lines         68241    69051     +810     
  Branches       8965     9057      +92     
============================================
- Hits          23815    20606    -3209     
- Misses        41826    46069    +4243     
+ Partials       2600     2376     -224     
