fix(windows): register k8s-restart-job in NodePrep to avoid PIS bootstrap race#8535
fix(windows): register k8s-restart-job in NodePrep to avoid PIS bootstrap race#8535r2k1 wants to merge 2 commits into
Conversation
There was a problem hiding this comment.
Pull request overview
This PR moves the registration of the k8s-restart-job scheduled task from the BasePrep phase to the NodePrep phase in the Windows CSE script. The task uses an -AtStartup trigger, and registering it during BasePrep (which runs at VHD bake time in the Prepared Image Specification / two-phase flow) caused it to fire on the first boot before NodePrep could remove the temporary kubeconfig embedding the shared nodeclient certificate. As a result, kubelet would silently skip the proper TLS bootstrap-token CSR exchange and never register with the API server. By registering the task only after the temporary kubeconfig is removed, the race is eliminated for PIS-baked Windows VHDs while preserving existing behavior for non-PIS pools (which run both phases in a single CSE invocation).
Changes:
- Remove
Register-NodeResetScriptTaskfromBasePrepand replace it with an explanatory comment. - Register
Register-NodeResetScriptTaskinNodePrepimmediately after the temporaryc:\k\configcleanup block.
…trap race When Windows VHDs are baked or staged via Prepared Image Specification (PIS), CSE runs in two phases: BasePrep at bake time and NodePrep at node provisioning time. Register-NodeResetScriptTask was registered in BasePrep with an -AtStartup trigger, so on the first boot from a baked VHD it fired before NodePrep got a chance to remove the temporary kubeconfig at `c:\k\config`. That kubeconfig embeds the cluster-shared "nodeclient" client certificate. As a result, kubelet started up, read the embedded certificate, persisted it under `c:\k\pki`, and skipped the standard token-based TLS bootstrap. With `CN=nodeclient` (not `CN=system:node:<hostname>`) the API server's NodeAuthorizer rejected every kubelet request — "User \"nodeclient\" cannot create certificatesigningrequests" — and the node never registered. The agentpool ARM operation succeeded, so the failure was silent. Repro / verification (NZ Sandbox sub, Windows2022, K8s 1.34.7): - Pool WITH PIS: cert subject CN=nodeclient,O=system:nodes → never joined - Pool WITHOUT PIS: cert subject CN=system:node:<host> → Ready in ~30s Move Register-NodeResetScriptTask out of BasePrep and into NodePrep, after the temporary kubeconfig removal, so the scheduled task does not exist on a baked VHD and only fires on subsequent boots after the node has been properly provisioned. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
69eac1e to
0a5b677
Compare
| # Register AFTER temp kubeconfig removal: the -AtStartup trigger would | ||
| # otherwise race PIS-baked VHD first boot and bring kubelet up with the | ||
| # embedded "nodeclient" cert instead of doing TLS bootstrap. | ||
| Register-NodeResetScriptTask |
There was a problem hiding this comment.
Can we have a test for this - that the registration happens in nodeprop not baseprep?
There was a problem hiding this comment.
I added a validator. But I don't really like it.
I don't like testing "X doesn't supposed to happen". They are infinite amount of things that doesn't suppose to happen ...
There was a problem hiding this comment.
Replaced it with a proper e2e. BootstrapTLS enabled vs disabled
Existing Test_Windows2022_VHDCaching runs with secure TLS bootstrap enabled, which masks bugs in the legacy bootstrap-token path on Windows PIS / two-stage CSE provisioning. This scenario forces kubelet onto the legacy path so stage 2's WaitUntilNodeReady catches regressions there. Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
439330a to
0dd4090
Compare
| // to use the legacy bootstrap-token path. Catches regressions in the two-stage | ||
| // CSE flow that only surface when no secure-tls-bootstrap client is around to | ||
| // overwrite the temporary kubeconfig. | ||
| func Test_Windows2022_VHDCaching_LegacyTLSBootstrap(t *testing.T) { |
There was a problem hiding this comment.
we already have the bootstrap token fallback scenarios which are designed to test a similar path - could we just add another one for the VHDCaching case instead?
What this PR does / why we need it:
Move
Register-NodeResetScriptTaskfromBasePreptoNodePrep(after the temporary kubeconfig removal) on Windows. This fixes a silent kubelet TLS bootstrap failure that hits Windows agentpools attached to a Prepared Image Specification (PIS) / pre-baked VHD path.Root cause
Windows CSE runs in two phases for PIS / baked VHDs:
BasePrep (bake time,
PreProvisionOnly=true):Write-KubeConfigwrites a temporaryc:\k\configwithclient-certificate-data: <clusterClientCertificate>embedded — the cluster-shared "nodeclient" cert produced by the AKS RP'scertificates_goal_resolver.go(Subject: CN=nodeclient, O=system:nodes).Register-NodeResetScriptTaskregistersk8s-restart-jobwith an-AtStartuptrigger.NodePrep (provision time):
c:\k\config(line ~579) if TLS bootstrap is enabled.Start-ScheduledTask k8s-restart-job.Because
k8s-restart-jobis-AtStartup, on the FIRST boot from the baked VHD the task fires beforeNodePrepremoves the temporary kubeconfig.windowsnodereset.ps1callsStart-Service kubelet. Kubelet readsc:\k\config, extracts the embeddednodeclientcertificate intoc:\k\pki\kubelet-client-<ts>.pem, never performs the proper bootstrap-token CSR exchange. Subsequent API server calls fail becauseCN=nodeclientdoesn't satisfy the NodeAuthorizer:The pool reports
Succeeded(CSE / extension level), so the failure is silent — the node never registers with the cluster.Reproduction (verified)
On the AKS New Zealand Sandbox subscription, cluster
aks-pis-testinaustraliaeast, K8s 1.34.7, Windows Server 2022:pislinuxCN=system:node:aks-pislinux-...piswinCN=nodeclientwinctlCN=system:node:akswinctl000000Timeline on the broken pool (from
C:\AzureData\CustomDataSetupScript.log+C:\k\windowsnodereset.log+C:\k\kubelet.err.log):Fix
Move the task registration out of
BasePrepand intoNodePrep, after the temporary kubeconfig has been removed. The scheduled task does not exist on a baked VHD, so the first boot does not race withNodePrepand kubelet only starts via the explicitStart-ScheduledTaskcall at the end ofNodePrep— at which point the temporary kubeconfig with the embedded cluster client cert is gone and kubelet correctly falls back to the bootstrap-token path inc:\k\bootstrap-config.Risk / blast radius
-AtStartup,windowsnodereset.ps1restarts kubelet/kubeproxy as before.Which issue(s) this PR fixes:
Fixes #