<fix>[storage]: support ceph multi image cache pools #4449

Open
ZStack-Robot wants to merge 3 commits into 4.8.38-Elitery from sync/shan.wu/cherry-pick-multi-imagecache-pool@@3


Conversation

@ZStack-Robot

Collaborator

PROBLEM:
Ceph primary storage previously assumed a single image cache pool per primary storage. When a VM root volume is created in a different pool, it may be cloned from the image cache snapshot in the default pool. Some customer Ceph clusters do not support cross-pool RBD clone, so VM creation, reimage, and change image can fail when the selected volume pool differs from the image cache pool.

SOLUTION:
Add a Ceph image cache pool strategy. The default strategy keeps the old behavior and uses the default image cache pool. The volume-pool strategy selects the target root volume pool when that physical pool also has the ImageCache role. Otherwise it falls back to the default image cache pool. The existing-cache strategy reuses an existing cache first, preferring the default pool cache and then another valid cache record.
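
The selection rules above can be sketched roughly as follows. This is only an illustration, not the code in this PR: the names `CachePoolStrategy` and `CachePoolSelector`, the use of plain pool names, and the fallback to the default pool when the existing-cache strategy finds no cache at all are all assumptions made for the example.

```java
import java.util.List;
import java.util.Set;

// Hypothetical names; the real strategy type in the PR may look different.
enum CachePoolStrategy { DEFAULT, VOLUME_POOL, EXISTING_CACHE }

final class CachePoolSelector {
    /**
     * Picks the pool that should hold the image cache.
     *
     * @param imageCachePools     pools that carry the ImageCache role
     * @param poolsWithValidCache pools that already hold a valid cache of the image
     */
    static String select(CachePoolStrategy strategy,
                         String defaultCachePool,
                         String targetVolumePool,
                         Set<String> imageCachePools,
                         List<String> poolsWithValidCache) {
        switch (strategy) {
            case VOLUME_POOL:
                // Use the root volume's pool only if it can also act as a cache pool.
                return imageCachePools.contains(targetVolumePool)
                        ? targetVolumePool
                        : defaultCachePool;
            case EXISTING_CACHE:
                // Prefer an existing cache in the default pool, then any other valid cache.
                if (poolsWithValidCache.contains(defaultCachePool)) {
                    return defaultCachePool;
                }
                if (!poolsWithValidCache.isEmpty()) {
                    return poolsWithValidCache.get(0);
                }
                // Assumption: with no cache anywhere, behave like the default strategy.
                return defaultCachePool;
            default:
                // Old behavior: always the default image cache pool.
                return defaultCachePool;
        }
    }
}
```

Under these assumptions, the volume-pool strategy with a target pool that lacks the ImageCache role returns the default pool, which matches the fallback described above.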

Image cache preparation now selects the target cache pool before capacity allocation. If the selected pool already has a valid cache snapshot, the existing cache is reused. If the selected cache record is stale, only that cache record and its refs are removed and the flow retries. If another pool has a valid cache for the same image, the snapshot is copied into the selected pool with RBD cp/deep cp. The normal create-snapshot and protect-snapshot flow then creates a new cache in the selected pool. If no usable cache exists, the image is downloaded through the existing backup storage flow. Same-Ceph backup storage is copied into the selected pool when needed to avoid later cross-pool clone.
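
The order of checks in the preparation flow can be summarized with a small decision function. This is a hedged sketch rather than the actual flow in `CephPrimaryStorageBase`: `CacheState`, `PrepareAction`, and `CachePreparationPlanner` are hypothetical names, and the real flow also handles capacity allocation, snapshot creation, and protection, which are omitted here.

```java
import java.util.Map;

// Hypothetical types used only for this illustration.
enum CacheState { VALID, STALE }

enum PrepareAction {
    REUSE_EXISTING,
    REMOVE_STALE_AND_RETRY,
    COPY_FROM_OTHER_POOL,
    DOWNLOAD_FROM_BACKUP_STORAGE
}

final class CachePreparationPlanner {
    /**
     * Decides the next step for preparing the image cache in the selected pool.
     *
     * @param cacheByPool state of the cache record for this image in each pool;
     *                    pools without a record are absent from the map
     */
    static PrepareAction plan(String selectedPool, Map<String, CacheState> cacheByPool) {
        CacheState selected = cacheByPool.get(selectedPool);
        if (selected == CacheState.VALID) {
            return PrepareAction.REUSE_EXISTING;
        }
        if (selected == CacheState.STALE) {
            // Only the stale record in the selected pool (and its refs) is cleaned up.
            return PrepareAction.REMOVE_STALE_AND_RETRY;
        }
        for (Map.Entry<String, CacheState> entry : cacheByPool.entrySet()) {
            boolean otherPool = !entry.getKey().equals(selectedPool);
            if (otherPool && entry.getValue() == CacheState.VALID) {
                // A valid cache elsewhere can be copied into the selected pool.
                return PrepareAction.COPY_FROM_OTHER_POOL;
            }
        }
        return PrepareAction.DOWNLOAD_FROM_BACKUP_STORAGE;
    }
}
```

After `REMOVE_STALE_AND_RETRY`, a caller would drop the stale entry and call `plan` again, at which point the result is either a copy from another pool or a download from backup storage.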

The final root volume operation still clones from the prepared image cache snapshot. The data-copy step is only used to materialize the cache in the selected pool. Snapshot reuse images keep their original volumeSnapshotReuse cache record and do not fall back to ordinary backup-storage download. Reinit can still select existing cache records by image UUID when the source ImageVO has already been deleted.

TEST:
Verified on root@172.20.13.237 under /root/zstack-workspace/zstack. Ran the changed Ceph integration case through TestCaseStabilityTest: CephPrimaryStorageVolumePoolsCase. Result: BUILD SUCCESS. Tests run: 1, Failures: 0, Errors: 0.

The case covers default strategy compatibility, volume-pool strategy, existing-cache strategy, same-Ceph backup storage copy to selected pool, stale selected cache cleanup, copying cache from another pool, image cache cleanup with refs, reimage, and multi-ImageCache-pool setup in the common Ceph primary storage spec.

Resolves: ZSTAC-62608

Change-Id: Ie498c8af87c1a0248429cc81980b135725894778

sync from gitlab !10411

@coderabbitai

coderabbitai Bot commented Jul 5, 2026


Warning

Review limit reached

You’ve reached a temporary PR review limit under our Fair Usage Limits Policy.

Your recent review volume is higher than typical usage, so adaptive limits are currently applied.

Next review available in: 44 minutes


Review details
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: ebd4e3a7-8bf4-4499-af51-5fe6f383df62

📥 Commits

Reviewing files that changed from the base of the PR and between 2f71ae4 and f131bc6.

⛔ Files ignored due to path filters (1)
  • conf/globalConfig/ceph.xml is excluded by !**/*.xml
📒 Files selected for processing (14)
  • conf/db/upgrade/V4.8.38-Elitery__schema.sql
  • header/src/main/java/org/zstack/header/storage/primary/DownloadVolumeTemplateToPrimaryStorageMsg.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/CephGlobalConfig.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/APIAddCephPrimaryStoragePoolMsg.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephImageCachePoolStrategy.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
  • sdk/src/main/java/org/zstack/sdk/AddCephPrimaryStoragePoolAction.java
  • test/src/test/groovy/org/zstack/test/integration/storage/ceph/CephOperationCase.groovy
  • test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/CephPrimaryStorageVolumePoolsCase.groovy
  • test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/capacity/CephOpenSourcePoolCapacityCase.groovy
  • test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/sandstone/capacity/CephSandStonePoolCapacityCase.groovy
  • test/src/test/groovy/org/zstack/test/integration/storage/primary/ceph/xsky/capacity/CephXskyPoolCapacityCase.groovy
  • testlib/src/main/java/org/zstack/testlib/CephPrimaryStoragePoolSpec.groovy
  • testlib/src/main/java/org/zstack/testlib/CephPrimaryStorageSpec.groovy

@MatheMatrix MatheMatrix force-pushed the sync/shan.wu/cherry-pick-multi-imagecache-pool@@3 branch from e9e78c6 to c43dce2 Compare July 5, 2026 15:07
shan.wu added 2 commits July 5, 2026 23:10
…t image exists.

skip copying image cache from one pool to another if dest image exists.

Resolves: ZSTAC-62608

Change-Id: I57322174441cb4dd5cd7014b7d41b5751749398c
@MatheMatrix MatheMatrix force-pushed the sync/shan.wu/cherry-pick-multi-imagecache-pool@@3 branch from c43dce2 to e2b39e2 Compare July 5, 2026 15:11
remove invalid Elitery migration file.

Resolves: ZSTAC-62608

Change-Id: I7279646b6d746a6e6f70726d6766727362747967
@MatheMatrix MatheMatrix force-pushed the sync/shan.wu/cherry-pick-multi-imagecache-pool@@3 branch from e2b39e2 to f131bc6 Compare July 5, 2026 15:22