<fix>[kvm]: limit hot-plug volume to 20 to prevent PCI slot exhaustion (ZSTAC-76384)#3625
<fix>[kvm]: limit hot-plug volume to 20 to prevent PCI slot exhaustion (ZSTAC-76384)#3625zstack-robot-2 wants to merge 1 commit into5.5.12from
Conversation
Resolves: ZSTAC-76384 Change-Id: I5ea30c85171f43d2739bb9e2ef636c42063dacce
概览该变更引入了针对运行中虚拟机的数据卷最大数量限制的差异化管理机制,通过新增默认接口方法、常量定义和实现方法,在虚拟机热插拔数据卷时应用更严格的限制(20个PCI插槽约束)。 变更说明
序列图sequenceDiagram
participant Client as 调用方
participant Validator as VolumeBase<br/>vmAttachVolumeValidator
participant VMState as 虚拟机<br/>状态检查
participant ExtPoint as MaxDataVolumeNumber<br/>ExtensionPoint
participant KVMImpl as KVMHostFactory<br/>实现类
Client->>Validator: 请求挂载数据卷
Validator->>VMState: 获取虚拟机状态
alt VM运行中或暂停
VMState-->>Validator: Running/Paused
Validator->>ExtPoint: 调用 getMaxDataVolumeNumberForRunningVm()
ExtPoint->>KVMImpl: 委托给KVM实现
KVMImpl-->>ExtPoint: 返回 min(配置上限, 20)
ExtPoint-->>Validator: 返回热插拔上限
else 其他状态
VMState-->>Validator: 其他状态
Validator->>ExtPoint: 调用 getMaxDataVolumeNumber()
ExtPoint->>KVMImpl: 委托给KVM实现
KVMImpl-->>ExtPoint: 返回配置上限
ExtPoint-->>Validator: 返回配置上限
end
Validator->>Validator: 验证数据卷数量
Validator-->>Client: 验证结果
预估代码审查工作量🎯 3 (中等) | ⏱️ ~20 分钟 诗歌
Important Pre-merge checks failedPlease resolve all errors before merging. Addressing warnings is optional. ❌ Failed checks (1 error, 1 warning)
✅ Passed checks (1 passed)
✨ Finishing Touches📝 Generate docstrings
🧪 Generate unit tests (beta)
Comment |
There was a problem hiding this comment.
Actionable comments posted: 1
🧹 Nitpick comments (1)
header/src/main/java/org/zstack/header/volume/MaxDataVolumeNumberExtensionPoint.java (1)
10-12: 为新扩展点方法补充 Javadoc 说明该默认方法是外部实现方会直接继承的接口契约,建议补充 Javadoc,明确“运行态/暂停态”语义与回退行为,避免误用。
✍️ 建议补充
+ /** + * Maximum data volume number for running/paused VM (hot-plug path). + * Default behavior keeps backward compatibility by reusing the general limit. + */ default int getMaxDataVolumeNumberForRunningVm() { return getMaxDataVolumeNumber(); }As per coding guidelines: “接口方法不应有多余的修饰符(例如
public),且必须配有有效的 Javadoc 注释。”🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@header/src/main/java/org/zstack/header/volume/MaxDataVolumeNumberExtensionPoint.java` around lines 10 - 12, Add a Javadoc comment to the default method getMaxDataVolumeNumberForRunningVm() in the MaxDataVolumeNumberExtensionPoint interface that clearly states it applies to VMs in running or paused state, describes that implementations may override this to provide a different limit for running/paused VMs, and documents the fallback behavior that the default implementation delegates to getMaxDataVolumeNumber(); ensure the Javadoc is placed immediately above getMaxDataVolumeNumberForRunningVm() and uses the interface method names (getMaxDataVolumeNumberForRunningVm and getMaxDataVolumeNumber) to make the contract explicit for implementers.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMConstant.java`:
- Around line 163-167: The constant KVM_MAX_DATA_VOLUME_NUMBER_HOT_PLUG is off
by one versus the validator logic in VolumeBase.vmAttachVolumeValidator (which
uses the check vmDataVolumeUsage + 1 > maxDataVolumeNum), so adjust the limit to
prevent allowing the 20th hot-plug: change KVM_MAX_DATA_VOLUME_NUMBER_HOT_PLUG
from 20 to 19 (or alternatively change the validator's comparison semantics),
ensure references to KVM_MAX_DATA_VOLUME_NUMBER_HOT_PLUG remain consistent, and
run relevant unit/integration tests around vmAttachVolumeValidator and hot-plug
flows to confirm the 20th attach is rejected.
---
Nitpick comments:
In
`@header/src/main/java/org/zstack/header/volume/MaxDataVolumeNumberExtensionPoint.java`:
- Around line 10-12: Add a Javadoc comment to the default method
getMaxDataVolumeNumberForRunningVm() in the MaxDataVolumeNumberExtensionPoint
interface that clearly states it applies to VMs in running or paused state,
describes that implementations may override this to provide a different limit
for running/paused VMs, and documents the fallback behavior that the default
implementation delegates to getMaxDataVolumeNumber(); ensure the Javadoc is
placed immediately above getMaxDataVolumeNumberForRunningVm() and uses the
interface method names (getMaxDataVolumeNumberForRunningVm and
getMaxDataVolumeNumber) to make the contract explicit for implementers.
🪄 Autofix (Beta)
Fix all unresolved CodeRabbit comments on this PR:
- Push a commit to this branch (recommended)
- Create a new PR with the fixes
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)
Review profile: CHILL
Plan: Pro
Run ID: eb0fe53e-52a6-4c68-b161-0a940cd8021b
📒 Files selected for processing (4)
header/src/main/java/org/zstack/header/volume/MaxDataVolumeNumberExtensionPoint.javaplugin/kvm/src/main/java/org/zstack/kvm/KVMConstant.javaplugin/kvm/src/main/java/org/zstack/kvm/KVMHostFactory.javastorage/src/main/java/org/zstack/storage/volume/VolumeBase.java
| // Maximum data volumes for hot-plug (running VM) due to PCI bus slot constraints. | ||
| // i440fx provides 32 PCI slots; ~12 are reserved by system devices and libvirt PCI bridge, | ||
| // leaving ~20 slots for hot-plugged virtio-blk data volumes (SharedBlock storage scenario). | ||
| int KVM_MAX_DATA_VOLUME_NUMBER_HOT_PLUG = 20; | ||
|
|
There was a problem hiding this comment.
KVM_MAX_DATA_VOLUME_NUMBER_HOT_PLUG 可能有一位偏差,无法拦截第 20 块热插场景
当前上限设为 20,而 VolumeBase.vmAttachVolumeValidator 的判断是 vmDataVolumeUsage + 1 > maxDataVolumeNum。这会放行第 20 次挂载请求,和“第 20 次触发 libvirt PCI 槽位不足”的问题描述不一致,可能导致修复不生效。建议将上限调整为 19(或同步调整判断语义)。
🛠 建议修复
- // leaving ~20 slots for hot-plugged virtio-blk data volumes (SharedBlock storage scenario).
- int KVM_MAX_DATA_VOLUME_NUMBER_HOT_PLUG = 20;
+ // leaving ~19 slots for hot-plugged virtio-blk data volumes (SharedBlock storage scenario).
+ int KVM_MAX_DATA_VOLUME_NUMBER_HOT_PLUG = 19;📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // Maximum data volumes for hot-plug (running VM) due to PCI bus slot constraints. | |
| // i440fx provides 32 PCI slots; ~12 are reserved by system devices and libvirt PCI bridge, | |
| // leaving ~20 slots for hot-plugged virtio-blk data volumes (SharedBlock storage scenario). | |
| int KVM_MAX_DATA_VOLUME_NUMBER_HOT_PLUG = 20; | |
| // Maximum data volumes for hot-plug (running VM) due to PCI bus slot constraints. | |
| // i440fx provides 32 PCI slots; ~12 are reserved by system devices and libvirt PCI bridge, | |
| // leaving ~19 slots for hot-plugged virtio-blk data volumes (SharedBlock storage scenario). | |
| int KVM_MAX_DATA_VOLUME_NUMBER_HOT_PLUG = 19; |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@plugin/kvm/src/main/java/org/zstack/kvm/KVMConstant.java` around lines 163 -
167, The constant KVM_MAX_DATA_VOLUME_NUMBER_HOT_PLUG is off by one versus the
validator logic in VolumeBase.vmAttachVolumeValidator (which uses the check
vmDataVolumeUsage + 1 > maxDataVolumeNum), so adjust the limit to prevent
allowing the 20th hot-plug: change KVM_MAX_DATA_VOLUME_NUMBER_HOT_PLUG from 20
to 19 (or alternatively change the validator's comparison semantics), ensure
references to KVM_MAX_DATA_VOLUME_NUMBER_HOT_PLUG remain consistent, and run
relevant unit/integration tests around vmAttachVolumeValidator and hot-plug
flows to confirm the 20th attach is rejected.
Problem
When continuously attaching data volumes to a running VM backed by SharedBlock storage, the 20th attachment fails with a PCI slot exhaustion error from libvirt ("No more available PCI slots").
Root cause: i440fx VMs have 32 PCI slots. With ~12 slots consumed by system devices (VGA, USB, NIC, balloon, virtio-serial, root disk, etc.) and 1 slot reserved by libvirt for PCI bridge during hot-plug, only ~19 slots remain for data volumes. SharedBlock volumes use virtio-blk (one PCI slot per volume), so the 20th hot-plug attempt fails at the hypervisor level.
The software limit (
MAX_DATA_VOLUME_NUM) is 24 for all cases, which exceeds what libvirt can actually support during hot-plug on SharedBlock storage.Fix
getMaxDataVolumeNumberForRunningVm()toMaxDataVolumeNumberExtensionPointinterface (default returns same asgetMaxDataVolumeNumber()for backward compatibility)KVMHostFactoryto cap hot-plug atKVM_MAX_DATA_VOLUME_NUMBER_HOT_PLUG = 20VolumeBase.vmAttachVolumeValidator, apply the stricter limit for Running/Paused VMs (hot-plug) while keeping the full limit (24) for stopped VMs (cold-plug)This prevents the confusing hypervisor-level failure and instead returns a clear "maximum data volumes reached" error before attempting the libvirt call.
sync from gitlab !9483