
Fix/ssh keepalive and firewall build net #3

Open
gabbelitoV2 wants to merge 2 commits into ConvoyPanel:main from gabbelitoV2:fix/ssh-keepalive-and-firewall-build-net


@gabbelitoV2

Contributor

No description provided.

…ng the build

The cf SSH invocations had no keepalive, so a long quiet remote step (e.g. a large artifact upload in CF_UPLOAD_CMD) could leave the packer streaming SSH session waiting forever on a half-open connection, hanging the build with nothing running on the node. Add ServerAliveInterval=15 / ServerAliveCountMax=6 to every ssh call so idle sessions stay alive and a dead peer is detected within ~90s (6 unanswered probes × 15s).
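A minimal sketch of what the keepalive change amounts to, written in Python purely for illustration. The helper names `ssh_argv` and `dead_peer_detection_s`, and the `root@node1` target, are hypothetical and not taken from the PR; only the two OpenSSH client options and their values come from the commit message.

```python
from __future__ import annotations

# Values from the commit message.
KEEPALIVE_INTERVAL_S = 15   # ServerAliveInterval: seconds between keepalive probes
KEEPALIVE_COUNT_MAX = 6     # ServerAliveCountMax: unanswered probes before giving up


def ssh_argv(target: str, remote_cmd: str) -> list[str]:
    """Build an ssh command line that carries both keepalive options.

    `target` and `remote_cmd` are placeholders chosen by the caller.
    """
    return [
        "ssh",
        "-o", f"ServerAliveInterval={KEEPALIVE_INTERVAL_S}",
        "-o", f"ServerAliveCountMax={KEEPALIVE_COUNT_MAX}",
        target,
        remote_cmd,
    ]


def dead_peer_detection_s(interval: int = KEEPALIVE_INTERVAL_S,
                          count: int = KEEPALIVE_COUNT_MAX) -> int:
    """Approximate worst-case time before ssh drops a dead connection."""
    return interval * count
```

With these values, `dead_peer_detection_s()` gives 90, which matches the ~90s figure in the commit message. The resulting list can be passed to `subprocess.run` unchanged.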
…when enabled

On nodes with the datacenter or host firewall enabled, PVEFW-INPUT drops the build VM's connection to the Packer HTTP server (preseed/kickstart) on vmbr1, so installer builds hang at "Waiting for SSH". Add a bootstrap step that detects an enabled pve-firewall and adds a host rule allowing inbound traffic from the build subnet. A pve-firewall host rule (rather than a post-up iptables rule) survives reboots and firewall reloads. The step is a no-op when the firewall is disabled.
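The bootstrap step itself is not shown in this thread, so the following is only a sketch of one way the "is the firewall enabled?" check could look. It assumes the datacenter-level flag is stored as `enable: 1` in the `[OPTIONS]` section of `/etc/pve/firewall/cluster.fw`; the function name `datacenter_firewall_enabled` is hypothetical, and the actual PR may well query `pve-firewall` directly instead.

```python
def datacenter_firewall_enabled(cluster_fw_text: str) -> bool:
    """Return True if the [OPTIONS] section contains `enable: 1`.

    Takes the file contents as a string so the check is easy to test
    without touching /etc/pve.
    """
    section = None
    for raw in cluster_fw_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        if line.startswith("[") and line.endswith("]"):
            section = line[1:-1].strip().upper()  # e.g. "OPTIONS", "RULES"
            continue
        if section == "OPTIONS" and ":" in line:
            key, _, value = line.partition(":")
            if key.strip() == "enable":
                return value.strip() == "1"
    # No enable key in [OPTIONS] is treated as disabled.
    return False
```

A caller would read the file if it exists, skip the rule entirely when this returns False (the no-op case), and only then add the host rule for the build subnet.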
@ericwang401

Contributor

I read some of the code, but can you elaborate on what circumstances necessitate this change? Specifically, how do Proxmox firewall rules prevent the original code from working?

@gabbelitoV2

Contributor Author

> I read some of the code, but can you elaborate on what circumstances necessitate this change? Specifically, how do Proxmox firewall rules prevent the original code from working?

Every installer recipe (Debian/Alma/Rocky/Ubuntu/Windows) boots the VM from an ISO, and the OS installer pulls its answer file — preseed.cfg / kickstart ks.cfg / autounattend.xml — from Packer's built-in HTTP server. Packer serves that on the build host, and the VM reaches it over the cofoundry NAT bridge vmbr1: VM 10.0.0.x → host 10.0.0.1:<packer_http_port>. That fetch is the very first thing every installer build needs.

On a default Proxmox install the firewall is off, so the host's iptables INPUT policy accepts that connection and it just works — which is why this never shows up for most people.

When the datacenter firewall is enabled (Datacenter → Firewall → Options → Enable, i.e. /etc/pve/firewall/cluster.fw enable: 1), Proxmox inserts its PVEFW-INPUT chain into INPUT. Inbound host traffic is now governed by the firewall ruleset, which does not permit arbitrary traffic from vmbr1 by default — so PVEFW-INPUT silently drops the VM's SYN to the Packer HTTP port. The installer can never fetch its preseed/kickstart, the install never starts, and the build hangs forever at Waiting for SSH to become available... (Packer waiting for an OS that never finishes installing).
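To tell a silently dropped SYN apart from "nothing is listening", a small TCP probe run from inside the build network can help. This is a hedged diagnostic sketch, not part of the PR: the function name `probe` is hypothetical, and the port is whatever Packer picked for its HTTP server. A dropped SYN typically surfaces as a timeout, while a closed port normally returns an immediate refusal.

```python
import socket


def probe(host: str, port: int, timeout: float = 3.0) -> str:
    """Classify a TCP connect attempt as open, refused, timeout, or error."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return "open"      # handshake completed, something is listening
    except ConnectionRefusedError:
        return "refused"       # RST received: host reachable, port closed
    except (socket.timeout, TimeoutError):
        return "timeout"       # no reply at all, consistent with a DROP rule
    except OSError as exc:
        return f"error: {exc}"  # e.g. no route to host
```

For example, calling `probe("10.0.0.1", packer_http_port)` from a VM on vmbr1 (with `packer_http_port` set to the port Packer reports) and getting `"timeout"` while the firewall is enabled, but `"open"` once the host rule is in place, would be consistent with the behaviour described above.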

So it's specifically the VM → host preseed/kickstart fetch on the build bridge that gets dropped — not Packer's API/SSH traffic. It only bites on nodes with the firewall on (production/hosted clusters); mine is a 3-node cluster with cluster.fw enable:
