Skip to content

<fix>[kvm]: limit hot-plug volume to 20 to prevent PCI slot exhaustion (ZSTAC-76384)#3625

Open
zstack-robot-2 wants to merge 1 commit into5.5.12from
sync/ye.zou/fix/ZSTAC-76384
Open

<fix>[kvm]: limit hot-plug volume to 20 to prevent PCI slot exhaustion (ZSTAC-76384)#3625
zstack-robot-2 wants to merge 1 commit into5.5.12from
sync/ye.zou/fix/ZSTAC-76384

Conversation

@zstack-robot-2
Copy link
Copy Markdown
Collaborator

Problem

When continuously attaching data volumes to a running VM backed by SharedBlock storage, the 20th attachment fails with a PCI slot exhaustion error from libvirt ("No more available PCI slots").

Root cause: i440fx VMs have 32 PCI slots. With ~12 slots consumed by system devices (VGA, USB, NIC, balloon, virtio-serial, root disk, etc.) and 1 slot reserved by libvirt for PCI bridge during hot-plug, only ~19 slots remain for data volumes. SharedBlock volumes use virtio-blk (one PCI slot per volume), so the 20th hot-plug attempt fails at the hypervisor level.

The software limit (MAX_DATA_VOLUME_NUM) is 24 for all cases, which exceeds what libvirt can actually support during hot-plug on SharedBlock storage.

Fix

  • Add getMaxDataVolumeNumberForRunningVm() to MaxDataVolumeNumberExtensionPoint interface (default returns same as getMaxDataVolumeNumber() for backward compatibility)
  • Override in KVMHostFactory to cap hot-plug at KVM_MAX_DATA_VOLUME_NUMBER_HOT_PLUG = 20
  • In VolumeBase.vmAttachVolumeValidator, apply the stricter limit for Running/Paused VMs (hot-plug) while keeping the full limit (24) for stopped VMs (cold-plug)

This prevents the confusing hypervisor-level failure and instead returns a clear "maximum data volumes reached" error before attempting the libvirt call.

sync from gitlab !9483

Resolves: ZSTAC-76384

Change-Id: I5ea30c85171f43d2739bb9e2ef636c42063dacce
@coderabbitai
Copy link
Copy Markdown

coderabbitai bot commented Mar 27, 2026

概览

该变更引入了针对运行中虚拟机的数据卷最大数量限制的差异化管理机制,通过新增默认接口方法、常量定义和实现方法,在虚拟机热插拔数据卷时应用更严格的限制(20个PCI插槽约束)。

变更说明

组织 / 文件 摘要
扩展点接口
header/src/main/java/org/zstack/header/volume/MaxDataVolumeNumberExtensionPoint.java
新增默认方法 getMaxDataVolumeNumberForRunningVm(),默认委托给 getMaxDataVolumeNumber(),保持向后兼容性。
常量定义
plugin/kvm/src/main/java/org/zstack/kvm/KVMConstant.java
新增公开常量 KVM_MAX_DATA_VOLUME_NUMBER_HOT_PLUG = 20,表示运行中虚拟机热插拔支持的最大数据卷数量。
KVM 实现
plugin/kvm/src/main/java/org/zstack/kvm/KVMHostFactory.java
新增方法 getMaxDataVolumeNumberForRunningVm(),返回配置上限与热插拔上限的较小值。
数据卷验证器更新
storage/src/main/java/org/zstack/storage/volume/VolumeBase.java
修改 vmAttachVolumeValidator 方法,根据虚拟机运行状态(运行/暂停或其他)选择不同的数据卷上限规则。

序列图

sequenceDiagram
    participant Client as 调用方
    participant Validator as VolumeBase<br/>vmAttachVolumeValidator
    participant VMState as 虚拟机<br/>状态检查
    participant ExtPoint as MaxDataVolumeNumber<br/>ExtensionPoint
    participant KVMImpl as KVMHostFactory<br/>实现类

    Client->>Validator: 请求挂载数据卷
    Validator->>VMState: 获取虚拟机状态
    alt VM运行中或暂停
        VMState-->>Validator: Running/Paused
        Validator->>ExtPoint: 调用 getMaxDataVolumeNumberForRunningVm()
        ExtPoint->>KVMImpl: 委托给KVM实现
        KVMImpl-->>ExtPoint: 返回 min(配置上限, 20)
        ExtPoint-->>Validator: 返回热插拔上限
    else 其他状态
        VMState-->>Validator: 其他状态
        Validator->>ExtPoint: 调用 getMaxDataVolumeNumber()
        ExtPoint->>KVMImpl: 委托给KVM实现
        KVMImpl-->>ExtPoint: 返回配置上限
        ExtPoint-->>Validator: 返回配置上限
    end
    Validator->>Validator: 验证数据卷数量
    Validator-->>Client: 验证结果
Loading

预估代码审查工作量

🎯 3 (中等) | ⏱️ ~20 分钟

诗歌

🐰 数据卷的梦想在远方,
热插拔受限二十张,
状态分岔细心量,
接口优雅有方向,
兼容性保护周全长!


Important

Pre-merge checks failed

Please resolve all errors before merging. Addressing warnings is optional.

❌ Failed checks (1 error, 1 warning)

Check name Status Explanation Resolution
Title check ❌ Error Pull request title exceeds the 72 character limit at 84 characters, violating the format requirement. Shorten the title to 72 characters or less while maintaining the required format. Consider: '[kvm]: limit hot-plug volume to 20 to prevent PCI slot exhaustion'
Docstring Coverage ⚠️ Warning Docstring coverage is 0.00% which is insufficient. The required threshold is 80.00%. Write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (1 passed)
Check name Status Explanation
Description check ✅ Passed The pull request description is directly related to the changeset, providing detailed problem statement, root cause analysis, and clear explanation of the implemented fix.
✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch sync/ye.zou/fix/ZSTAC-76384

Comment @coderabbitai help to get the list of available commands and usage tips.

Copy link
Copy Markdown

@coderabbitai coderabbitai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

🧹 Nitpick comments (1)
header/src/main/java/org/zstack/header/volume/MaxDataVolumeNumberExtensionPoint.java (1)

10-12: 为新扩展点方法补充 Javadoc 说明

该默认方法是外部实现方会直接继承的接口契约,建议补充 Javadoc,明确“运行态/暂停态”语义与回退行为,避免误用。

✍️ 建议补充
+    /**
+     * Maximum data volume number for running/paused VM (hot-plug path).
+     * Default behavior keeps backward compatibility by reusing the general limit.
+     */
     default int getMaxDataVolumeNumberForRunningVm() {
         return getMaxDataVolumeNumber();
     }

As per coding guidelines: “接口方法不应有多余的修饰符(例如 public),且必须配有有效的 Javadoc 注释。”

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In
`@header/src/main/java/org/zstack/header/volume/MaxDataVolumeNumberExtensionPoint.java`
around lines 10 - 12, Add a Javadoc comment to the default method
getMaxDataVolumeNumberForRunningVm() in the MaxDataVolumeNumberExtensionPoint
interface that clearly states it applies to VMs in running or paused state,
describes that implementations may override this to provide a different limit
for running/paused VMs, and documents the fallback behavior that the default
implementation delegates to getMaxDataVolumeNumber(); ensure the Javadoc is
placed immediately above getMaxDataVolumeNumberForRunningVm() and uses the
interface method names (getMaxDataVolumeNumberForRunningVm and
getMaxDataVolumeNumber) to make the contract explicit for implementers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMConstant.java`:
- Around line 163-167: The constant KVM_MAX_DATA_VOLUME_NUMBER_HOT_PLUG is off
by one versus the validator logic in VolumeBase.vmAttachVolumeValidator (which
uses the check vmDataVolumeUsage + 1 > maxDataVolumeNum), so adjust the limit to
prevent allowing the 20th hot-plug: change KVM_MAX_DATA_VOLUME_NUMBER_HOT_PLUG
from 20 to 19 (or alternatively change the validator's comparison semantics),
ensure references to KVM_MAX_DATA_VOLUME_NUMBER_HOT_PLUG remain consistent, and
run relevant unit/integration tests around vmAttachVolumeValidator and hot-plug
flows to confirm the 20th attach is rejected.

---

Nitpick comments:
In
`@header/src/main/java/org/zstack/header/volume/MaxDataVolumeNumberExtensionPoint.java`:
- Around line 10-12: Add a Javadoc comment to the default method
getMaxDataVolumeNumberForRunningVm() in the MaxDataVolumeNumberExtensionPoint
interface that clearly states it applies to VMs in running or paused state,
describes that implementations may override this to provide a different limit
for running/paused VMs, and documents the fallback behavior that the default
implementation delegates to getMaxDataVolumeNumber(); ensure the Javadoc is
placed immediately above getMaxDataVolumeNumberForRunningVm() and uses the
interface method names (getMaxDataVolumeNumberForRunningVm and
getMaxDataVolumeNumber) to make the contract explicit for implementers.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: eb0fe53e-52a6-4c68-b161-0a940cd8021b

📥 Commits

Reviewing files that changed from the base of the PR and between c4c23a6 and d847623.

📒 Files selected for processing (4)
  • header/src/main/java/org/zstack/header/volume/MaxDataVolumeNumberExtensionPoint.java
  • plugin/kvm/src/main/java/org/zstack/kvm/KVMConstant.java
  • plugin/kvm/src/main/java/org/zstack/kvm/KVMHostFactory.java
  • storage/src/main/java/org/zstack/storage/volume/VolumeBase.java

Comment on lines +163 to +167
// Maximum data volumes for hot-plug (running VM) due to PCI bus slot constraints.
// i440fx provides 32 PCI slots; ~12 are reserved by system devices and libvirt PCI bridge,
// leaving ~20 slots for hot-plugged virtio-blk data volumes (SharedBlock storage scenario).
int KVM_MAX_DATA_VOLUME_NUMBER_HOT_PLUG = 20;

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major

KVM_MAX_DATA_VOLUME_NUMBER_HOT_PLUG 可能有一位偏差,无法拦截第 20 块热插场景

当前上限设为 20,而 VolumeBase.vmAttachVolumeValidator 的判断是 vmDataVolumeUsage + 1 > maxDataVolumeNum。这会放行第 20 次挂载请求,和“第 20 次触发 libvirt PCI 槽位不足”的问题描述不一致,可能导致修复不生效。建议将上限调整为 19(或同步调整判断语义)。

🛠 建议修复
-    // leaving ~20 slots for hot-plugged virtio-blk data volumes (SharedBlock storage scenario).
-    int KVM_MAX_DATA_VOLUME_NUMBER_HOT_PLUG = 20;
+    // leaving ~19 slots for hot-plugged virtio-blk data volumes (SharedBlock storage scenario).
+    int KVM_MAX_DATA_VOLUME_NUMBER_HOT_PLUG = 19;
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
// Maximum data volumes for hot-plug (running VM) due to PCI bus slot constraints.
// i440fx provides 32 PCI slots; ~12 are reserved by system devices and libvirt PCI bridge,
// leaving ~20 slots for hot-plugged virtio-blk data volumes (SharedBlock storage scenario).
int KVM_MAX_DATA_VOLUME_NUMBER_HOT_PLUG = 20;
// Maximum data volumes for hot-plug (running VM) due to PCI bus slot constraints.
// i440fx provides 32 PCI slots; ~12 are reserved by system devices and libvirt PCI bridge,
// leaving ~19 slots for hot-plugged virtio-blk data volumes (SharedBlock storage scenario).
int KVM_MAX_DATA_VOLUME_NUMBER_HOT_PLUG = 19;
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMConstant.java` around lines 163 -
167, The constant KVM_MAX_DATA_VOLUME_NUMBER_HOT_PLUG is off by one versus the
validator logic in VolumeBase.vmAttachVolumeValidator (which uses the check
vmDataVolumeUsage + 1 > maxDataVolumeNum), so adjust the limit to prevent
allowing the 20th hot-plug: change KVM_MAX_DATA_VOLUME_NUMBER_HOT_PLUG from 20
to 19 (or alternatively change the validator's comparison semantics), ensure
references to KVM_MAX_DATA_VOLUME_NUMBER_HOT_PLUG remain consistent, and run
relevant unit/integration tests around vmAttachVolumeValidator and hot-plug
flows to confirm the 20th attach is rejected.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants