Environment
- OS: Ubuntu 26.04, kernel 7.0.0-22-generic
- Driver: nvidia-open 580.159.03
- GPU: 2× RTX 2080 Ti 22GB + NVLink
- CPU: i5-12400, 48GB RAM
Behavior
The system hard-crashes (kernel panic → kexec/kdump → reboot) when the GPUs transition from heavy load to idle. Reproduced 3 times in 24 hours, always at the same point: right after the vLLM inference server finishes processing and GPU utilization drops from 100% to 0%.
No Xid errors, no coredump, and no kernel log entries before the crash. /var/crash/ contains only kexec_cmd and kdump_lock (0 bytes, dump not generated).
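Since nothing reaches the kernel log before the crash, one possible way to capture the last GPU state is to poll nvidia-smi and fsync every sample to disk. The sketch below is only an illustration and was not used to produce this report; the helper names (`parse_sample`, `is_load_to_idle`, `log_samples`), the chosen query fields, the 90%/5% thresholds, and the polling interval are all assumptions.

```python
import csv
import io
import os
import subprocess
import time

# Fields requested from `nvidia-smi --query-gpu`; this selection is an assumption.
FIELDS = ["timestamp", "index", "utilization.gpu", "pstate", "power.draw", "clocks.sm"]


def parse_sample(line):
    """Parse one CSV line of nvidia-smi output into a dict keyed by FIELDS."""
    values = [v.strip() for v in next(csv.reader(io.StringIO(line)))]
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} fields, got {len(values)}")
    return dict(zip(FIELDS, values))


def is_load_to_idle(prev_util, cur_util, busy=90, idle=5):
    """True when utilization falls from busy to idle between two samples.

    The thresholds are arbitrary illustrative values.
    """
    return prev_util >= busy and cur_util <= idle


def query_gpus():
    """Run nvidia-smi once and return one parsed sample per GPU."""
    out = subprocess.run(
        ["nvidia-smi", "--query-gpu=" + ",".join(FIELDS),
         "--format=csv,noheader,nounits"],
        capture_output=True, text=True, check=True,
    ).stdout
    return [parse_sample(line) for line in out.splitlines() if line.strip()]


def log_samples(path, interval=0.2, max_samples=None):
    """Append samples to `path`, forcing each batch to disk before sleeping."""
    last_util = {}
    count = 0
    with open(path, "a") as fh:
        while max_samples is None or count < max_samples:
            for sample in query_gpus():
                raw = sample["utilization.gpu"]
                util = int(raw) if raw.isdigit() else 0  # e.g. "[N/A]" becomes 0
                idx = sample["index"]
                marker = ",LOAD->IDLE" if is_load_to_idle(last_util.get(idx, 0), util) else ""
                fh.write(",".join(sample[f] for f in FIELDS) + marker + "\n")
                last_util[idx] = util
            # fsync so the most recent samples survive a hard crash.
            fh.flush()
            os.fsync(fh.fileno())
            count += 1
            time.sleep(interval)
```

Calling `log_samples("/var/log/gpu-trace.csv")` (the path is a placeholder) in a separate process while the vLLM workload runs would leave the final pre-crash samples on disk, which could show whether the crash coincides with a particular pstate or clock change.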
Related
Similar report with same driver version on RTX 5070: pop-os/cosmic-comp#2341
Workaround
Switching to the closed-source nvidia-driver-580-server resolves the issue, which suggests the bug is in the open kernel module rather than in the rest of the driver stack.