Summary:
When starting a microVM with a higher vCPU count (e.g., 16 vCPUs) on a dual-socket NUMA host, the guest occasionally hangs during SMP initialization.
During the hang, some Firecracker vCPU threads remain blocked in futex_wait_queue, and the guest kernel does not complete bringing up all secondary CPUs.
The hang is intermittent: it affects only some VM startups, and has been observed when running Firecracker via the jailer.
Firecracker Version:
Firecracker v1.13.1
Environment:
Host kernel version: 6.1.23
Guest kernel version: 5.4.116 (the issue also persists with 5.10.245)
Architecture: x86_64
CPU op-mode(s): 32-bit, 64-bit
Address sizes: 46 bits physical, 48 bits virtual
Byte Order: Little Endian
CPU(s): 32
On-line CPU(s) list: 0-31
Vendor ID: GenuineIntel
Model name: Intel(R) Xeon(R) Silver 4110 CPU @ 2.10GHz
CPU family: 6
Model: 85
Thread(s) per core: 2
Core(s) per socket: 8
Socket(s): 2
Stepping: 4
BogoMIPS: 4200.00
Virtualization features:
Virtualization: VT-x
Caches (sum of all):
L1d: 512 KiB (16 instances)
L1i: 512 KiB (16 instances)
L2: 16 MiB (16 instances)
L3: 22 MiB (2 instances)
NUMA:
NUMA node(s): 2
NUMA node0 CPU(s): 0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
NUMA node1 CPU(s): 1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31
Vulnerabilities:
Gather data sampling: Mitigation; Microcode
Itlb multihit: KVM: Mitigation: Split huge pages
L1tf: Mitigation; PTE Inversion; VMX conditional cache flushes, SMT vulnerable
Mds: Mitigation; Clear CPU buffers; SMT vulnerable
Meltdown: Mitigation; PTI
Mmio stale data: Mitigation; Clear CPU buffers; SMT vulnerable
Reg file data sampling: Not affected
Retbleed: Mitigation; IBRS
Spec rstack overflow: Not affected
Spec store bypass: Mitigation; Speculative Store Bypass disabled via prctl
Spectre v1: Mitigation; usercopy/swapgs barriers and __user pointer sanitization
Spectre v2: Mitigation; IBRS; IBPB conditional; STIBP conditional; RSB filling; PBRSB-eIBRS Not affected; BHI Not affected
Srbds: Not affected
Tsx async abort: Mitigation; Clear CPU buffers; SMT vulnerable
VM Configuration:
"machine-config": {
"vcpu_count": 16,
"mem_size_mib": 72817,
"smt": false
}
Steps to Reproduce:
Run Firecracker using jailer on a dual-socket NUMA host.
Configure the microVM with 16 vCPUs.
Boot a Linux guest kernel.
Observe that VM startup occasionally hangs during SMP initialization.
The issue does not occur consistently, but appears randomly when starting the VM.
Guest Kernel Logs:
During the failure the guest kernel stops while bringing up secondary CPUs:
[ 1.745873] smp: Bringing up secondary CPUs ...
[ 1.746812] x86: Booting SMP configuration:
The boot process does not proceed further.
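To tell a hung boot from a healthy one automatically, the guest console log can be checked for the start of secondary CPU bring-up without the matching completion line. The sketch below is only an illustration: the function name is hypothetical, and it assumes the guest kernel prints the usual "smp: Brought up N node(s), M CPU(s)" line once bring-up finishes. The caller still needs to wait for a reasonable timeout before treating a missing completion line as a hang.

```python
import re

# Line printed when the guest starts bringing up secondary CPUs (seen in the log above).
START_MARKER = "smp: Bringing up secondary CPUs"

# Assumed completion line, e.g. "smp: Brought up 1 node, 16 CPUs".
DONE_PATTERN = re.compile(r"smp: Brought up \d+ nodes?, \d+ CPUs?")


def smp_bringup_stalled(console_log: str) -> bool:
    """Return True if SMP bring-up started but its completion line is absent."""
    started = START_MARKER in console_log
    finished = DONE_PATTERN.search(console_log) is not None
    return started and not finished
```

Calling `smp_bringup_stalled(log_text)` on the captured serial output after a timeout gives a simple pass/fail signal for repeated boot attempts.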
Firecracker Thread State:
During the hang, all 16 vCPU threads are present in the Firecracker process:
PID SPID TTY TIME CMD
2358731 2358731 pts/4 00:00:01 firecracker
2358731 2358743 pts/4 00:00:01 fc_vcpu 0
2358731 2358744 pts/4 00:00:00 fc_vcpu 1
2358731 2358745 pts/4 00:00:00 fc_vcpu 2
2358731 2358746 pts/4 00:00:00 fc_vcpu 3
2358731 2358747 pts/4 00:00:00 fc_vcpu 4
2358731 2358748 pts/4 00:00:00 fc_vcpu 5
2358731 2358749 pts/4 00:00:00 fc_vcpu 6
2358731 2358750 pts/4 00:00:00 fc_vcpu 7
2358731 2358751 pts/4 00:00:00 fc_vcpu 8
2358731 2358752 pts/4 00:00:00 fc_vcpu 9
2358731 2358753 pts/4 00:00:00 fc_vcpu 10
2358731 2358754 pts/4 00:00:00 fc_vcpu 11
2358731 2358755 pts/4 00:00:00 fc_vcpu 12
2358731 2358756 pts/4 00:00:00 fc_vcpu 13
2358731 2358757 pts/4 00:00:00 fc_vcpu 14
2358731 2358758 pts/4 00:00:00 fc_vcpu 15
Stuck vCPU Thread Stack:
Inspecting a stuck vCPU thread shows it blocked in a futex wait:
[<0>] futex_wait_queue+0x60/0x90
[<0>] futex_wait+0x185/0x270
[<0>] do_futex+0x106/0x1b0
[<0>] __x64_sys_futex+0x8e/0x1d0
[<0>] do_syscall_64+0x55/0xb0
[<0>] entry_SYSCALL_64_after_hwframe+0x6e/0xd8
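To see how many vCPU threads are in this state at once, the per-thread `comm` and `wchan` files under `/proc/<pid>/task/` can be read for every thread of the Firecracker process. The following is a minimal sketch, not the method used to collect the stack above. The helper names are hypothetical, and it assumes the host exposes `wchan` symbol names to the caller (this usually needs sufficient privileges; otherwise the file may just contain `0`).

```python
from pathlib import Path


def _read(path: Path) -> str:
    """Read a small procfs file, returning '?' if it is unreadable."""
    try:
        return path.read_text().strip()
    except OSError:
        return "?"


def thread_states(pid: int, proc_root: str = "/proc") -> list[tuple[int, str, str]]:
    """Return (tid, comm, wchan) for every thread of the given process."""
    task_dir = Path(proc_root) / str(pid) / "task"
    tids = sorted(int(p.name) for p in task_dir.iterdir() if p.name.isdigit())
    return [
        (tid, _read(task_dir / str(tid) / "comm"), _read(task_dir / str(tid) / "wchan"))
        for tid in tids
    ]


def blocked_vcpus(rows, comm_prefix="fc_vcpu", wchan_prefix="futex_wait"):
    """Keep only vCPU threads whose wait channel looks like a futex wait."""
    return [
        row for row in rows
        if row[1].startswith(comm_prefix) and row[2].startswith(wchan_prefix)
    ]
```

For example, `blocked_vcpus(thread_states(2358731))` would list the `fc_vcpu` threads of the process shown above that are parked in a futex wait, which helps separate vCPUs that never started running from those sleeping inside KVM.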
Expected Behavior:
The microVM should boot normally and the guest OS should complete SMP initialization and start executing the init process.