Fix/ssh keepalive and firewall build net#3
…ng the build
cf SSH invocations had no keepalive, so a long quiet remote step (e.g. a large artifact upload in CF_UPLOAD_CMD) could leave the packer streaming SSH session waiting forever on a half-open connection, hanging the build with nothing running on the node. Add ServerAliveInterval=15 / ServerAliveCountMax=6 to every ssh call so idle sessions stay alive and a dead peer is detected within ~90s (6 unanswered probes at 15s each).
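The keepalive arithmetic above can be sketched as a small wrapper. This is a minimal illustration, not the code from this PR: the function names `ssh_with_keepalive` and `dead_peer_timeout` are hypothetical, and only the two `-o` options and their values come from the commit message.

```shell
# Values taken from the commit message; the variable names are illustrative.
KEEPALIVE_INTERVAL=15   # seconds of silence before ssh sends a keepalive probe
KEEPALIVE_COUNT_MAX=6   # unanswered probes tolerated before ssh disconnects

# Approximate worst-case time (seconds) before a dead peer is noticed.
dead_peer_timeout() {
  echo $(( KEEPALIVE_INTERVAL * KEEPALIVE_COUNT_MAX ))
}

# Hypothetical wrapper: run ssh with client-side keepalives enabled.
# ServerAliveInterval and ServerAliveCountMax are standard OpenSSH client options.
ssh_with_keepalive() {
  ssh -o "ServerAliveInterval=${KEEPALIVE_INTERVAL}" \
      -o "ServerAliveCountMax=${KEEPALIVE_COUNT_MAX}" \
      "$@"
}
```

A call such as `ssh_with_keepalive root@node 'some-long-quiet-command'` (host and command are placeholders) would then fail with a connection error instead of blocking indefinitely once the peer stops answering.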
…when enabled
On nodes with the datacenter or host firewall enabled, PVEFW-INPUT drops the build VM's connection to the Packer HTTP server (preseed/kickstart) on vmbr1, so installer builds hang at Waiting for SSH. Add a bootstrap step that detects an enabled pve-firewall and adds a host rule allowing the build subnet in. A pve-firewall host rule is used instead of a post-up iptables rule because it survives reboots and firewall reloads. No-op when the firewall is disabled.
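A rough sketch of what such a bootstrap step could look like follows. It is not the implementation in this PR. The function names are hypothetical, the `10.0.0.0/24` subnet in the usage note is an assumption inferred from the 10.0.0.x addresses mentioned in this thread, and the sketch assumes that `pve-firewall status` reports a line beginning `Status: enabled` when the firewall is on and that host rules live under a `[RULES]` section in the node's `host.fw`.

```shell
# Return 0 if the given `pve-firewall status` output reports an enabled firewall.
# Assumption: the status line starts with "Status: enabled" when it is on.
is_firewall_enabled() {
  case "$1" in
    *"Status: enabled"*) return 0 ;;
    *) return 1 ;;
  esac
}

# Render a host firewall rule accepting inbound traffic from a subnet on a bridge.
# $1 = bridge interface, $2 = source subnet in CIDR form.
host_rule_line() {
  printf 'IN ACCEPT -i %s -source %s\n' "$1" "$2"
}

# Idempotently add a rule line to a host.fw-style file.
# $1 = file path, $2 = rule line. Creates the [RULES] section if it is missing.
ensure_host_rule() {
  file=$1
  rule=$2
  [ -f "$file" ] || : > "$file"
  # Already present: nothing to do.
  grep -qxF -- "$rule" "$file" && return 0
  if grep -qxF -- '[RULES]' "$file"; then
    # Insert the rule directly under the existing [RULES] header.
    awk -v r="$rule" '{ print } $0 == "[RULES]" { print r }' "$file" > "$file.tmp" &&
      cat "$file.tmp" > "$file" && rm -f "$file.tmp"
  else
    printf '[RULES]\n%s\n' "$rule" >> "$file"
  fi
}
```

On a node this might be wired together as `is_firewall_enabled "$(pve-firewall status)" && ensure_host_rule "/etc/pve/nodes/$(hostname)/host.fw" "$(host_rule_line vmbr1 10.0.0.0/24)"`, which does nothing when the firewall is disabled. The path and subnet here are illustrative and should be checked against the actual node before use.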
I read some of the code, but can you elaborate on what circumstances necessitate this change? Specifically, how do Proxmox firewall rules prevent the original code from working?
Every installer recipe (Debian/Alma/Rocky/Ubuntu/Windows) boots the VM from an ISO, and the OS installer pulls its answer file (preseed.cfg / kickstart ks.cfg / autounattend.xml) from Packer's built-in HTTP server. Packer serves that on the build host, and the VM reaches it over the cofoundry NAT bridge vmbr1: VM 10.0.0.x → host 10.0.0.1:<packer_http_port>. That fetch is the very first thing every installer build needs.

On a default Proxmox install the firewall is off, so the host's iptables INPUT policy accepts that connection and it just works, which is why this never shows up for most people.

When the datacenter firewall is enabled (Datacenter → Firewall → Options → Enable, i.e. /etc/pve/firewall/cluster.fw enable: 1), Proxmox inserts its PVEFW-INPUT chain into INPUT. Inbound host traffic is now governed by the firewall ruleset, which does not permit arbitrary traffic from vmbr1 by default, so PVEFW-INPUT silently drops the VM's SYN to the Packer HTTP port. The installer can never fetch its preseed/kickstart, the install never starts, and the build hangs forever at Waiting for SSH to become available... (Packer waiting for an OS that never finishes installing).

So it's specifically the VM → host preseed/kickstart fetch on the build bridge that gets dropped, not Packer's API/SSH traffic. It only bites on nodes with the firewall on (production/hosted clusters); mine is a 3-node cluster with cluster.fw enable:
No description provided.