test/storm: shared WaitForLogin SSH fallback for reboot waits #695
Draft
bfjelds wants to merge 1 commit into
Introduce stormvm.WaitForLoginWithSshFallback, a single helper for waiting on a VM to return after a reboot. On QEMU it waits for the serial "login:" prompt and, if that times out, falls back to confirming the reboot over SSH by comparing "uptime --since" before and after. This tolerates the known serial-getty udev race (systemd#10850, ~2% of boots) where dev-ttyS0.device is skipped so serial-getty never starts even though the VM is healthy. On genuine failure it captures a screenshot and scans the serial log for dracut/initramfs symptoms (CheckSerialLogForDracutIssues, bug 15086). It also proactively (re)starts serial-getty@ttyS0 so later boots are detected via the serial log, and handles the Azure platform via SSH liveness polling.

Consolidate both test suites onto the shared helper:
- rollback tests (helper.go): the update, rollback, and split-rollback reboot paths now call WaitForLoginWithSshFallback instead of QemuConfig.WaitForLogin.
- servicing tests (update.go): the finalize-reboot path now calls the shared helper, removing ~90 lines of inline SSH-fallback logic and the duplicated local checkSerialLogForDracutIssues.

Also remove the stale local serial.log accumulator before each WaitForLogin (qemu.go) so every saved NNN-serial.log contains only that boot rather than accumulating output from all prior iterations.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Summary
Introduces a single shared helper, `stormvm.WaitForLoginWithSshFallback`, for waiting on a VM to return after a reboot, and consolidates both the rollback and servicing test suites onto it.

On QEMU it waits for the serial `login:` prompt and, if that times out, falls back to confirming the reboot over SSH by comparing `uptime --since` before and after. This tolerates the known serial-getty udev race (systemd#10850, ~2% of boots) where `dev-ttyS0.device` is skipped so serial-getty never starts even though the VM is healthy. On genuine failure it captures a screenshot and scans the serial log for dracut/initramfs symptoms (`CheckSerialLogForDracutIssues`, bug 15086). It also proactively (re)starts `serial-getty@ttyS0` so later boots are detected via the serial log, and handles the Azure platform via SSH liveness polling.

Changes
- `tools/storm/utils/vm/wait_login.go` (new): shared `WaitForLoginWithSshFallback` and `CheckSerialLogForDracutIssues`.
- `tools/storm/rollback/tests/helper.go`: the update, rollback, and split-rollback reboot paths call the shared helper instead of `QemuConfig.WaitForLogin`.
- `tools/storm/servicing/tests/update.go`: the finalize-reboot path calls the shared helper, removing ~90 lines of inline SSH-fallback logic and the duplicated local `checkSerialLogForDracutIssues`.
- `tools/storm/utils/vm/qemu/qemu.go`: remove the stale local `serial.log` accumulator before each `WaitForLogin` so each saved `NNN-serial.log` contains only that boot.

Notes
- The `i%10` periodic-reboot path in `update.go` still uses `RebootQemuVm` with a lightweight SSH liveness check. It cannot call the shared helper without an import cycle (`RebootQemuVm` is in package `qemu`, which package `vm` imports), and it bundles libvirt reboot+wait rather than duplicating the fallback.

Validation
- `go build ./...` and `go vet` pass for the affected packages (one pre-existing `%w`-in-`logrus.Errorf` vet warning is unrelated to this change).