KEP-5943: Topology For Volume Snapshots#5944
KEP-5943: Topology For Volume Snapshots#5944mdzraf wants to merge 3 commits intokubernetes:masterfrom
Conversation
mdzraf
commented
Mar 3, 2026
- One-line PR description: This PR adds a new KEP for Adding Topology Support for Snapshots
- Issue link: Topology For Volume Snapshots #5943
- Other comments:
a89a2eb to
cc5ad00
Compare
| - Implementing cross region/AZ snapshot cloning functionality. | ||
| - Add ability to modify any existing volume snapshot fields. | ||
| - Modifying the in-tree kube-scheduler or adding snapshot CRD dependencies to core Kubernetes components. | ||
| - Topology enforcement for `Immediate` volume binding mode. |
There was a problem hiding this comment.
What does volume binding mode mean for a snapshot?
There was a problem hiding this comment.
This more refers to the StorageClass's volume binding mode on the PVC being restored, not a property of the snapshot itself. Made the description more clear in the new rev.
Topology enforcement when restoring a snapshot to a PVC whose StorageClass uses `Immediate` volume binding mode (topology enforcement only applies to `WaitForFirstConsumer`).
There was a problem hiding this comment.
Even with Immediate binding mode, couldn't we still look at the dataSource's topology?
There was a problem hiding this comment.
Even with Immediate binding mode, couldn't we still look at the dataSource's topology?
I thought about this but decided against it for consistency with how Immediate binding mode works today, where PersistentVolumes are bound or provisioned without knowledge of the Pod's scheduling requirements. By choosing Immediate, the administrator is already acknowledging this can result in unschedulable pods, and the documented recommendation to avoid that is to use WaitForFirstConsumer. Treating snapshot-sourced PVCs differently would be inconsistent with that existing contract. Happy to revisit this if you think there is a reason to figure out a mechanism to enforce topology on immediate binding mode.
|
|
||
| The scheduler's built-in `volumebinding` plugin has no awareness of snapshot topology today. It only considers the StorageClass's `allowedTopologies` and the node's own constraints. This means it can select a node in `foo-bar-1` even though the snapshot source is only accessible from `foo-bar-2`. | ||
|
|
||
| Attempting to solve this at the provisioner level (e.g., in the external-provisioner) would only allow us to fail after the scheduler has already made its decision. The provisioner could reject the request and trigger a reschedule, but this creates a retry loop: the scheduler picks a node, provisioning fails, the scheduler picks another node, and so on with no guarantee of convergence. |
There was a problem hiding this comment.
Other than a new scheduler plugin, do we still need to make changes in the external-provisioner?
There was a problem hiding this comment.
Since by the point that we get past the scheduler plugin we have already selected a node and the provisioner already derives the topology from the compatible selected node, there is no need to changes to the external-provisioner as the existing behavior will continue to work.
5de6d69 to
7c0348e
Compare
|
/remove-sig apps |
|
/assign @msau42 |
| // This field is immutable. | ||
| // +optional | ||
| // +kubebuilder:validation:XValidation:rule="self == oldSelf",message="accessibleTopology is immutable" | ||
| AccessibleTopology []TopologySegment `json:"accessibleTopology,omitempty" protobuf:"bytes,7,rep,name=accessibleTopology"` |
There was a problem hiding this comment.
Should we consider using NodeAffinity to align with PV? It also makes it more clear that we use this in the scheduler by comparing with Node topology.
There was a problem hiding this comment.
I think the name AccessibleTopology is best since snapshots aren't node-mounted like PVs, but I changed it to []v1.TopologySelectorTerm (mirroring StorageClass.AllowedTopologies) so the scheduler can use the same label-expression matching against
node topology. Is that okay ? It's kind of a middle ground
7c0348e to
703c8f0
Compare
|
[APPROVALNOTIFIER] This PR is NOT APPROVED This pull-request has been approved by: mdzraf The full list of commands accepted by this bot can be found here. DetailsNeeds approval from an approver in each of these files:Approvers can indicate their approval by writing |