
<fix>[vm]: verify dest state in migrate flow#4009

Open
MatheMatrix wants to merge 1 commit into 5.5.22 from sync/shan.wu/fix/5.5.22/ZSTAC-83894

Conversation

@MatheMatrix
Owner

Root Cause

When the hypervisor migration call reports failure, the VM may already be running on the destination host. Treating that response as a hard migration failure can roll back management state even though the data plane migration has completed.

Solution

  • In VmMigrateOnHypervisorFlow, check the VM state on the destination host when the hypervisor migration call fails.
  • If the destination host reports Running, continue the normal migration flow with chain.next().
  • If the destination host reports any other state or the state check fails, keep the original failure path with chain.fail(originalError).
  • Keep migration success cleanup on the existing normal flow path.
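
The decision rule in the list above can be sketched in a few lines of Java. This is a simplified, hypothetical model rather than the code in VmMigrateOnHypervisorFlow: the class `MigrateFailureDecision`, the `Outcome` enum, and the use of `Optional<String>` to represent "state check failed or returned nothing" are all assumptions made for illustration, and the two outcomes stand in for `chain.next()` and `chain.fail(originalError)`.

```java
import java.util.Optional;

/** Hypothetical, simplified model of the decision; not the actual ZStack flow code. */
final class MigrateFailureDecision {

    /** CONTINUE stands in for chain.next(); FAIL_WITH_ORIGINAL_ERROR for chain.fail(originalError). */
    enum Outcome { CONTINUE, FAIL_WITH_ORIGINAL_ERROR }

    /**
     * @param destState state reported by the destination host, or empty when the
     *                  state check itself failed or returned nothing for this VM
     */
    static Outcome decide(Optional<String> destState) {
        // Only a confirmed "Running" on the destination overrides the reported failure.
        return destState.filter("Running"::equals).isPresent()
                ? Outcome.CONTINUE
                : Outcome.FAIL_WITH_ORIGINAL_ERROR;
    }
}
```

Modelling a failed state check as an empty value keeps the rule conservative: anything short of a positive Running report falls through to the original failure path.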

Test

  • git diff --check
  • Remote integration case on root@172.20.13.237:/root/zstack-workspace/zstack:
    mvn -q -Dtest=org.zstack.test.integration.kvm.KvmTest -DsubCaseCollectionStrategy=Designated -DcaseFilePath=/tmp/zstac83894-main-cases.txt -DsurefireArgLine="-noverify" test
  • Result: Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

sync from gitlab !9902

@coderabbitai

coderabbitai Bot commented May 18, 2026

No actionable comments were generated in the recent review. 🎉

ℹ️ Recent review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: 0cbe8afa-0b7d-4c49-99bb-ea08c7c6e3a5

📥 Commits

Reviewing files that changed from the base of the PR and between c394379 and 6736c47.

📒 Files selected for processing (4)
  • compute/src/main/java/org/zstack/compute/vm/VmMigrateOnHypervisorFlow.java
  • test/src/test/groovy/org/zstack/test/integration/kvm/vm/VmLastHostUuidCase.groovy
  • test/src/test/groovy/org/zstack/test/integration/kvm/vm/migrate/MigrateVmFailureCheckTargetHostCase.groovy
  • test/src/test/groovy/org/zstack/test/integration/storage/primary/local_nfs/MaintainHostMultiTypePsCase.groovy
🚧 Files skipped from review as they are similar to previous changes (2)
  • compute/src/main/java/org/zstack/compute/vm/VmMigrateOnHypervisorFlow.java
  • test/src/test/groovy/org/zstack/test/integration/kvm/vm/migrate/MigrateVmFailureCheckTargetHostCase.groovy

Walkthrough

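When the migration request to the destination host fails, new logic sends a CheckVmStateOnHypervisorMsg to the destination host to query the VM state. The migration flow continues only when the destination host reports the VM as Running; otherwise it ends with the original migration error. New integration tests cover the rollback and success scenarios, along with the related mocks.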

Changes

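VM migration failure recovery and state verification

| Layer / File(s) | Summary |
|---|---|
| Migration failure state verification: compute/src/main/java/org/zstack/compute/vm/VmMigrateOnHypervisorFlow.java | Adds imports for CheckVmStateOnHypervisorMsg/Reply and a static import of CollectionDSL.list. The callback for the migration request to the destination first checks for a cancellation error; if the error code is a migration failure, it calls checkVmStateOnDestinationHost(). That method queries the destination host for the VM state and calls chain.next() only when Running is returned; otherwise it fails with the original migration error. |
| Integration test framework and mock setup: test/src/test/groovy/org/zstack/test/integration/kvm/vm/migrate/MigrateVmFailureCheckTargetHostCase.groovy (1-130, 178-213), test/src/test/groovy/org/zstack/test/integration/storage/primary/local_nfs/MaintainHostMultiTypePsCase.groovy | Adds the MigrateVmFailureCheckTargetHostCase test class, which builds an EnvSpec and implements mockMigrateVmFailure() (simulates the KVM migration API returning an error) and mockVmState() (intercepts CheckVmStateOnHypervisorMsg, returns per-host VM states according to its arguments, and collects checkedHosts). In the other test, adds a KVM_VM_CHECK_STATE simulator that returns Shutdown/Running for specific VMs. |
| Test scenarios, rollback and successful migration: test/src/test/groovy/org/zstack/test/integration/kvm/vm/migrate/MigrateVmFailureCheckTargetHostCase.groovy (131-176) | Two cases: when the destination host reports the VM as not Running, asserts that the migration rolls back and the VM stays on the source host; when the destination host reports the VM as Running, asserts that the migration succeeds and hostUuid is updated, and verifies the contents of checkedHosts. |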
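
The mocking idea summarized for the test case (return a per-host VM state and record which hosts were queried) can be illustrated with a small stand-alone test double. This is only a sketch in plain Java under stated assumptions: the actual case is written in Groovy and intercepts CheckVmStateOnHypervisorMsg, whereas the class `FakeVmStateChecker` and its methods `setState`, `check`, and `checkedHosts` below are hypothetical names invented for this example.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical test double; the real case intercepts CheckVmStateOnHypervisorMsg in Groovy. */
final class FakeVmStateChecker {
    private final Map<String, String> stateByHost = new HashMap<>();
    private final List<String> checkedHosts = new ArrayList<>();

    /** Configures the state a given host will report for any VM it is asked about. */
    void setState(String hostUuid, String state) {
        stateByHost.put(hostUuid, state);
    }

    /** Returns a vmUuid -> state map as the given host would report it, and records the query. */
    Map<String, String> check(String hostUuid, String vmUuid) {
        checkedHosts.add(hostUuid);
        Map<String, String> states = new HashMap<>();
        String state = stateByHost.get(hostUuid);
        if (state != null) {
            states.put(vmUuid, state);
        }
        return states;
    }

    /** Returns a copy of the queried hosts, in query order, for assertions. */
    List<String> checkedHosts() {
        return new ArrayList<>(checkedHosts);
    }
}
```

Recording the queried hosts lets a test assert not only the final VM placement but also that the state check was sent to the destination host rather than the source.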

Sequence Diagram

sequenceDiagram
  participant Flow as VmMigrateOnHypervisorFlow
  participant CloudBus as CloudBus
  participant DestHost as Destination host

  Flow->>DestHost: MigrateVmOnHypervisorMsg (migration request)
  DestHost-->>Flow: Error response (migration failed)
  Flow->>CloudBus: CheckVmStateOnHypervisorMsg (requests vmInstanceUuids)
  CloudBus->>DestHost: Forward the state query
  DestHost-->>CloudBus: Return VM state map
  CloudBus-->>Flow: CheckVmStateOnHypervisorReply (states)
  alt states[vmUuid] == Running
    Flow->>Flow: chain.next() (continue the migration flow)
  else
    Flow->>Flow: chain.fail(migrateError) (end with the original error)
  end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

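Poem

The migration hit a bump on the road, but the rabbit does not rush,
it first asks whether the far side is well; if the VM is already running, it lets it through,
if the other side is unsettled, it heads back home without panic,
mocks and tests come along together, the checked list reviewed one by one,
the code runs through and the tests pass, and the rabbit claps with joy. 🐰✨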

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

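| Check name | Status | Explanation | Resolution |
|---|---|---|---|
| Docstring Coverage | ⚠️ Warning | Docstring coverage is 0.00%, which is insufficient. The required threshold is 80.00%. | Write docstrings for the functions missing them to satisfy the coverage threshold. |

✅ Passed checks (4 passed)

| Check name | Status | Explanation |
|---|---|---|
| Title check | ✅ Passed | The title is fully related to the main change (checking the VM state on the destination host to handle migration failure), follows the [scope]: format, and does not exceed 72 characters. |
| Description check | ✅ Passed | The description is relevant to the changeset, clearly states the root cause, solution, test method, and results, and is consistent with the code changes. |
| Linked Issues check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |
| Out of Scope Changes check | ✅ Passed | Check skipped because no linked issues were found for this pull request. |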

✏️ Tip: You can configure your own custom pre-merge checks in the settings.


Comment @coderabbitai help to get the list of available commands and usage tips.


@coderabbitai coderabbitai Bot left a comment

Actionable comments posted: 1

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@compute/src/main/java/org/zstack/compute/vm/VmMigrateOnHypervisorFlow.java`:
- Around line 87-89: The code reads the VM state without null checks which can
NPE when states is absent; update the block in VmMigrateOnHypervisorFlow to
null-check CheckVmStateOnHypervisorReply#getStates() and the looked-up state
(r.getStates() and r.getStates().get(spec.getVmInventory().getUuid())) before
comparing to VmInstanceState.Running.toString(), and if either is null treat it
as a failure path and invoke chain.fail(migrateError) (keep the existing
migrateError handling and logging).
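
A null-safe lookup of the kind this comment asks for might look like the sketch below. It is written under assumptions: the reply's states are modelled as a plain `Map<String, String>` from VM UUID to state name, the literal "Running" stands in for `VmInstanceState.Running.toString()`, and the class `VmStateLookup` with its method `isRunningOnDestination` is a hypothetical name. A `false` result corresponds to taking the `chain.fail(migrateError)` path.

```java
import java.util.Map;

/** Hypothetical helper illustrating the null checks; not the actual ZStack code. */
final class VmStateLookup {

    /**
     * Returns true only when the map is present, contains the VM, and reports it as Running.
     * A null map, an empty map, a missing entry, or a null state all yield false.
     */
    static boolean isRunningOnDestination(Map<String, String> states, String vmUuid) {
        if (states == null || states.isEmpty()) {
            return false;
        }
        String state = states.get(vmUuid);
        // Constant-first equals is null-safe, so a missing entry cannot throw.
        return "Running".equals(state);
    }
}
```

Putting the constant on the left of `equals` is what removes the NullPointerException risk for a missing entry; the explicit map check covers the case where no states were returned at all.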

ℹ️ Review info
⚙️ Run configuration

Configuration used: Path: http://open.zstack.ai:20001/code-reviews/zstack-cloud.yaml (via .coderabbit.yaml)

Review profile: CHILL

Plan: Pro

Run ID: 2bc0dd2d-1b7e-4ffb-b185-e5bba819b173

📥 Commits

Reviewing files that changed from the base of the PR and between f83c595 and 1e49d9d.

📒 Files selected for processing (2)
  • compute/src/main/java/org/zstack/compute/vm/VmMigrateOnHypervisorFlow.java
  • test/src/test/groovy/org/zstack/test/integration/kvm/vm/migrate/MigrateVmFailureCheckTargetHostCase.groovy

@zstack-robot-2
Collaborator

Comment on compute/src/main/java/org/zstack/compute/vm/VmMigrateOnHypervisorFlow.java:

Comment from shan.wu:

Fixed in c39437913e.

CheckVmStateOnHypervisorReply#getStates() is now checked for null/empty before reading the VM state. Missing state data falls back to chain.fail(migrateError), preserving the original migration error path.

Validation:

  • git diff --check
  • Remote integration case: Tests run: 1, Failures: 0, Errors: 0, Skipped: 0

@MatheMatrix MatheMatrix force-pushed the sync/shan.wu/fix/5.5.22/ZSTAC-83894 branch 3 times, most recently from 7d78c9f to 08f52b3 (May 18, 2026 08:59)
Check the destination host when the hypervisor migration call fails.

If the VM is Running there, continue the normal migration flow.
That lets DB sync and post hooks run through the standard path.
Otherwise fail the flow and keep the rollback behavior.

Resolves: ZSTAC-83894

Change-Id: I8b4774a405fc3b1c05d21b6742facd26bc8d03e6
@MatheMatrix MatheMatrix force-pushed the sync/shan.wu/fix/5.5.22/ZSTAC-83894 branch from 08f52b3 to 6736c47 (May 18, 2026 11:32)
2 participants