Skip to content

<fix>[storage]: ZSV-12579 avoid rbd cli for ceph encryption#4448

Open
MatheMatrix wants to merge 1 commit into
feature-zsv-5.1.0-encryptionfrom
sync/zstackio/codex/ceph-kvm-no-rbd-cli
Open

<fix>[storage]: ZSV-12579 avoid rbd cli for ceph encryption#4448
MatheMatrix wants to merge 1 commit into
feature-zsv-5.1.0-encryptionfrom
sync/zstackio/codex/ceph-kvm-no-rbd-cli

Conversation

@MatheMatrix

Copy link
Copy Markdown
Owner

Summary

Move Ceph volume-encryption replacement away from KVM-side rbd mv/rm. KVM now only performs qemu-img IO and writes encrypted bits to the target install path; Java switches installPath after success when needed.

Changes

  • Add target/returned installPath support to EncryptVolumeBits messages.
  • Let Ceph encrypt-in-place return a new target path and clean it on failure.
  • Keep conversion writing directly to the prepared target path.

Testing

  • git diff --check
  • Verified KVM Ceph plugin diff has no new raw rbd CLI mv/rm/du/info usage
  • CI pipeline

Resolves: ZSV-12579

sync from gitlab !10410

@coderabbitai

coderabbitai Bot commented Jul 3, 2026

Copy link
Copy Markdown

Review Change Stack

Warning

Review limit reached

You’ve reached a temporary PR review limit under our Fair Usage Limits Policy.

Your recent review volume is higher than typical usage, so adaptive limits are currently applied.

Next review available in: 59 minutes

Enable usage-based reviews in Billing to review now. Otherwise, wait until the next included review is available.
You're only billed for reviews past your plan's rate limits ($0.25/file).

How can I continue?

After more reviews become available, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

To avoid repeated limits, reduce automatic review volume by pausing incremental auto-reviews earlier, using label-based review opt-in, excluding WIP or generated PR titles, or requesting reviews manually when the PR is ready. If your team needs uninterrupted high-volume reviews, an organization admin can enable usage-based reviews.

How do review limits work?

CodeRabbit enforces per-developer PR review limits for each organization. Most developers receive the normal plan review availability.

For paid Pro and Pro+ PR reviews, CodeRabbit uses adaptive limits for sustained high-volume activity. When a developer's recent PR review activity reaches the 95th percentile or higher among CodeRabbit users, additional reviews become available more gradually as earlier reviews age out of the rolling window.

Please refer docs for additional details.

Review details
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: 7712d03d-947a-486b-a163-8592104b03bb

📥 Commits

Reviewing files that changed from the base of the PR and between 579a3f2 and 72b9588.

📒 Files selected for processing (1)
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java

Walkthrough

本次改动为卷原地加密流程新增“目标安装路径”支持:EncryptVolumeBitsOnPrimaryStorageMsg/Reply 新增路径字段;Ceph 主存储实现据此下发加密命令,失败时清理已生成的 RBD 数据;卷侧根据返回路径更新 VolumeVO.installPath 并清理旧 bits。

Changes

卷原地加密路径变更

Layer / File(s) Summary
加密消息与回复新增路径字段
header/src/main/java/org/zstack/header/storage/primary/EncryptVolumeBitsOnPrimaryStorageMsg.java, header/src/main/java/org/zstack/header/storage/primary/EncryptVolumeBitsOnPrimaryStorageReply.java
消息新增 targetInstallPath 字段及 getter/setter,回复新增 installPath 字段及 getter/setter,用于携带目标/最终安装路径。
Ceph 主存储加密与失败清理实现
plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
KVMHostEncryptInPlaceCmd/KVMHostLuksRsp 新增路径字段;handle 方法根据 targetInstallPath 下发命令,调用失败时执行 deleteRbdBitsBestEffort 清理,调用成功时用返回路径回填 reply。
卷侧安装路径更新与旧 bits 清理
storage/src/main/java/org/zstack/storage/volume/VolumeInPlaceEncryptor.java
在加密成功回调中根据新旧 installPath 是否变化更新 VolumeVO,并新增 deleteOldVolumeBitsBestEffort 方法清理旧路径下的卷 bits。

Estimated code review effort: 3 (Moderate) | ~25 minutes

Sequence Diagram(s)

sequenceDiagram
  participant VolumeInPlaceEncryptor
  participant CephPrimaryStorageBase
  participant KVMHost

  VolumeInPlaceEncryptor->>CephPrimaryStorageBase: EncryptVolumeBitsOnPrimaryStorageMsg(targetInstallPath)
  CephPrimaryStorageBase->>KVMHost: httpCallToKvmHost(targetInstallPath)
  alt 调用失败
    KVMHost-->>CephPrimaryStorageBase: 错误响应
    CephPrimaryStorageBase->>CephPrimaryStorageBase: deleteRbdBitsBestEffort(targetInstallPath)
    CephPrimaryStorageBase-->>VolumeInPlaceEncryptor: 返回原错误
  else 调用成功
    KVMHost-->>CephPrimaryStorageBase: KVMHostLuksRsp(installPath)
    CephPrimaryStorageBase-->>VolumeInPlaceEncryptor: reply.installPath
    VolumeInPlaceEncryptor->>VolumeInPlaceEncryptor: 更新 VolumeVO.installPath / encrypted
    VolumeInPlaceEncryptor->>CephPrimaryStorageBase: deleteOldVolumeBitsBestEffort(旧路径)
  end
Loading

Poem

兔子搬家挖新洞,
旧洞若空悄悄拆,
加密路径记心间,
失败也要收拾干净,
蹦跳一声,代码上线啦 🐇✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Title check ✅ Passed 标题准确概括了本次 Ceph 加密流程移除 rbd CLI 依赖的核心变更。
Description check ✅ Passed 描述与变更内容一致,说明了目标安装路径、失败清理和安装路径更新等主要改动。
Docstring Coverage ✅ Passed No functions found in the changed files to evaluate docstring coverage. Skipping docstring coverage check.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch sync/zstackio/codex/ceph-kvm-no-rbd-cli

Comment @coderabbitai help to get the list of available commands.

@MatheMatrix MatheMatrix force-pushed the sync/zstackio/codex/ceph-kvm-no-rbd-cli branch from 579a3f2 to f5dbe01 Compare July 3, 2026 13:50

@coderabbitai coderabbitai Bot left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@storage/src/main/java/org/zstack/storage/volume/VolumeInPlaceEncryptor.java`:
- Around line 289-298: The delete-old-bits flow in deleteOldVolumeBitsBestEffort
is missing the volume size and any failure handling. Update the
DeleteVolumeBitsOnPrimaryStorageMsg construction to include the current volume
size from VolumeVO via setSize(volume.getSize()), and send it with a failure
callback or equivalent logging so bus.send(dmsg) reports cleanup errors instead
of failing silently.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Repository UI

Review profile: CHILL

Plan: Pro

Run ID: eff7b002-628e-414a-b60d-a68f37743daf

📥 Commits

Reviewing files that changed from the base of the PR and between 45e799e and 579a3f2.

📒 Files selected for processing (4)
  • header/src/main/java/org/zstack/header/storage/primary/EncryptVolumeBitsOnPrimaryStorageMsg.java
  • header/src/main/java/org/zstack/header/storage/primary/EncryptVolumeBitsOnPrimaryStorageReply.java
  • plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java
  • storage/src/main/java/org/zstack/storage/volume/VolumeInPlaceEncryptor.java

Comment on lines +289 to +298
private void deleteOldVolumeBitsBestEffort(String psUuid, VolumeVO volume, String oldInstallPath) {
DeleteVolumeBitsOnPrimaryStorageMsg dmsg = new DeleteVolumeBitsOnPrimaryStorageMsg();
dmsg.setPrimaryStorageUuid(psUuid);
dmsg.setInstallPath(oldInstallPath);
dmsg.setBitsUuid(volume.getUuid());
dmsg.setBitsType(VolumeVO.class.getSimpleName());
dmsg.setHypervisorType(VolumeFormat.getMasterHypervisorTypeByVolumeFormat(volume.getFormat()).toString());
bus.makeTargetServiceIdByResourceUuid(dmsg, PrimaryStorageConstant.SERVICE_ID, psUuid);
bus.send(dmsg);
}

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🗄️ Data Integrity & Integration | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
# 确认 DeleteVolumeBitsOnPrimaryStorageMsg.size 字段类型/默认值
rg -n -A3 -B3 'class DeleteVolumeBitsOnPrimaryStorageMsg' --type=java
rg -n 'setSize|private.*size' --type=java -g '*DeleteVolumeBitsOnPrimaryStorageMsg.java'

Repository: MatheMatrix/zstack

Length of output: 156


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 1) 定位相关类与调用点
git ls-files | rg 'VolumeInPlaceEncryptor\.java|DeleteVolumeBitsOnPrimaryStorageMsg\.java|CephPrimaryStorageBase\.java|cleanTrash|cleanUpTrash|deleteRbdBitsBestEffort'

# 2) 读取相关文件的结构/行号
for f in \
  storage/src/main/java/org/zstack/storage/volume/VolumeInPlaceEncryptor.java \
  plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java \
  $(git ls-files | rg 'DeleteVolumeBitsOnPrimaryStorageMsg\.java$'); do
  [ -f "$f" ] && echo "===== $f =====" && wc -l "$f" && ast-grep outline "$f" --view expanded || true
done

Repository: MatheMatrix/zstack

Length of output: 34751


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 直接读取相关片段,确认 size 字段、默认值、发送模式和失败处理
for f in \
  $(git ls-files | rg 'DeleteVolumeBitsOnPrimaryStorageMsg\.java$') \
  storage/src/main/java/org/zstack/storage/volume/VolumeInPlaceEncryptor.java \
  plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java; do
  if [ -f "$f" ]; then
    echo "===== $f ====="
    nl -ba "$f" | sed -n '1,240p'
  fi
done

Repository: MatheMatrix/zstack

Length of output: 305


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 读取消息定义与调用/处理路径的关键片段
echo "===== DeleteVolumeBitsOnPrimaryStorageMsg ====="
sed -n '1,130p' header/src/main/java/org/zstack/header/storage/primary/DeleteVolumeBitsOnPrimaryStorageMsg.java

echo "===== VolumeInPlaceEncryptor.deleteOldVolumeBitsBestEffort ====="
sed -n '280,315p' storage/src/main/java/org/zstack/storage/volume/VolumeInPlaceEncryptor.java

echo "===== CephPrimaryStorageBase.deleteVolumeBitsOnPrimaryStorage ====="
sed -n '3478,3518p' plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java

echo "===== CephPrimaryStorageBase.deleteRbdBitsBestEffort ====="
sed -n '3788,3832p' plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java

Repository: MatheMatrix/zstack

Length of output: 8439


🏁 Script executed:

#!/bin/bash
set -euo pipefail

# 查找同类 best-effort 删除是否已有回调/日志模式
rg -n -A6 -B6 'bus\.send\(.*new CloudBusCallBack|logger\.(warn|info|debug)\(' \
  storage/src/main/java/org/zstack/storage/volume/VolumeInPlaceEncryptor.java \
  plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java

Repository: MatheMatrix/zstack

Length of output: 50374


补上删除消息的 size,并加失败日志
storage/src/main/java/org/zstack/storage/volume/VolumeInPlaceEncryptor.java:289 这里构造 DeleteVolumeBitsOnPrimaryStorageMsg 时漏了 setSize(volume.getSize());下游会用这个字段做容量回收,默认 0 会导致旧 bits 删除后容量不回收。bus.send(dmsg) 也没有失败回调,清理失败会静默留下旧明文 bits。

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@storage/src/main/java/org/zstack/storage/volume/VolumeInPlaceEncryptor.java`
around lines 289 - 298, The delete-old-bits flow in
deleteOldVolumeBitsBestEffort is missing the volume size and any failure
handling. Update the DeleteVolumeBitsOnPrimaryStorageMsg construction to include
the current volume size from VolumeVO via setSize(volume.getSize()), and send it
with a failure callback or equivalent logging so bus.send(dmsg) reports cleanup
errors instead of failing silently.

@MatheMatrix MatheMatrix force-pushed the sync/zstackio/codex/ceph-kvm-no-rbd-cli branch 5 times, most recently from a23c592 to f6cf8f6 Compare July 4, 2026 07:31
@ZStack-Robot

Copy link
Copy Markdown
Collaborator

Comment from yaohua.wu:

Review: MR !10410 — ZSV-12579

Background (preserved across rounds)

  • Jira: ZSV-12579 — 【后端】【磁盘加密】ZBS适配
  • Bug summary: 本组 MR 将 Ceph LUKS 加密流程从 KVM 侧直接执行 RBD 元数据替换,调整为 KVM 只用 qemu-img 写入临时/目标 RBD,Java 与 ceph primary-storage agent 负责最终 RBD rename、swap 与 actual size 查询。目标是避免 KVM 侧依赖 raw rbd mv/rm/du/info 语义。
  • Intent & scope: zstack 侧准备临时/目标 installPath、调用新增 cephagent endpoint,并把 actual size 查询迁回 primary-storage agent;zstack-utility 侧实现 KVM convert/encrypt 与 cephagent replace/swap。两个 MR 必须一起合并。
  • Round 1 initial findings: 1 × Critical, 1 × Warning,集中在 Ceph LUKS swap 失败语义和 conversion 临时 RBD 清理路径。
  • Suggested fix direction: 已完成 RBD replacement 后的旧 bits 清理应降为 best-effort,不应让上层回滚加密元数据;conversion 的 *-converting-* 临时 RBD 需要在 KVM convert 或 cephagent replace 任一失败路径清理。

关联 MR

MR 关系
zstack-utility !7427 KVM agent 写入 targetInstallPath,cephagent 实现 /luksreplace/luksswapinplace;本 MR 依赖这些 endpoint 的成功/失败语义。

🔴 Critical

# File:Line Issue Fix Reviewer(s) Conf Route
1 plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java:3582 Cross-repo: Java 当前把 /luksswapinplace 的任何失败都当作“加密未完成”返回给 VolumeInPlaceEncryptor。但 utility MR 的 cephagent.py:1174rbd mv tmp_path src_path 已成功后,仍会因 rbd rm old_path 失败抛错。结果是原路径上的 RBD 已经变成 LUKS,Java 却返回失败并回滚 key-provider/metadata,DB 仍可能保持 encrypted=false,后续按 raw 盘使用会读坏数据或在重试时二次加密。 zstack-utility !7427 中把 swap committed 之后的 old_path 删除改成 best-effort:rbd mv tmp_path src_path 成功后即保证 endpoint 返回 success,rbd rm old_path 失败只记录 warning/垃圾项;或者为 cephagent response 增加 committed 状态,让 Java 在 committed=true 时仍持久化 encrypted=true correctness, adversarial 0.92 gated_auto → downstream-resolver

🟡 Warning

# File:Line Issue Fix Reviewer(s) Conf Route
1 plugin/ceph/src/main/java/org/zstack/storage/ceph/primary/CephPrimaryStorageBase.java:3640 ConvertVolumeEncryptionOnPrimaryStorage 现在让 KVM 写入 makeTemporaryLuksRbdInstallPath(..., "converting"),但 KVM convert 失败时只回 reply.setError,没有清理该临时 RBD;replace endpoint 失败时 :3664 也没有清理。VolumeBase rollback 只删除最终 item.getTargetInstallPath(),覆盖不到 *-converting-*,会留下孤儿 RBD 和容量占用。 在 KVM convert 的 fail 分支和 LUKS_REPLACE_VOLUME_PATH 的 fail 分支都调用 deleteRbdBitsBestEffort(kcmd.targetInstallPath) 后再回复;保持 encrypt-in-place 路径 :3564 的同一清理模式。 robustness, correctness 0.86 gated_auto → downstream-resolver

Coverage

  • Local authoritative diff: zstack 1 file kept / 1 total, generated-file filter skipped 0.
  • MCP diff cross-check: file list matches local diff.
  • Upstream freshness: target feature-zsv-5.1.0-encryption has 0 commits ahead of source at review time; MR reports merge_status=can_be_merged.
  • Cross-repo checked: Java targetInstallPath / cephagent temporaryInstallPath contract, KVM response fields, replace/swap failure semantics.
  • Testing gaps: CI pipeline 未完成;缺少覆盖 rbd mv tmp->src 成功但 rbd rm old 失败、以及 KVM convert 成功后 replace 失败时临时 RBD 清理的集成/故障注入测试。

Verdict: REVISION_REQUIRED

需要先修复 1 个会造成 DB 加密状态与真实 RBD 格式不一致的 Critical;Warning 建议同轮补上,避免失败重试产生孤儿 RBD。


🤖 Robot Reviewer

@MatheMatrix MatheMatrix force-pushed the sync/zstackio/codex/ceph-kvm-no-rbd-cli branch from f6cf8f6 to f680ae0 Compare July 4, 2026 08:09
Resolves: ZSV-12469

Change-Id: I5ed4f9f00e700e4818a5c440b6dc5b04edee9c79
@MatheMatrix MatheMatrix force-pushed the sync/zstackio/codex/ceph-kvm-no-rbd-cli branch from f680ae0 to 72b9588 Compare July 4, 2026 09:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants