Background
In 2021, Knative changed its ValidatingWebhookConfiguration and MutatingWebhookConfiguration resources to be "owned" by the Namespace that Knative is installed into (knative-serving by default).
This was intended to ensure that users did not leave the webhooks behind when uninstalling by deleting the Namespace, which would break Knative on re-install: the webhooks are cluster-scoped resources, so they survive the Namespace deletion, but their backend no longer exists, so they fail every request they are meant to validate.
Kubernetes uses ownerReferences to record relationships between resources. When a "parent" (owner) resource is deleted, the garbage collector cleans up the "child" (dependent) resources that reference it: before the parent is removed in a foreground delete, afterwards in a background delete. If an ownerReference sets blockOwnerDeletion, a foreground delete of the parent waits until that child is gone.
For example, a Pod owned by a ReplicaSet might have the following ownerReferences:
apiVersion: v1
kind: Pod
metadata:
  name: xxxxxx
  namespace: xxxxxx
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: xxxxxx-759d8cb89
    uid: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx
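By analogy, a webhook configuration owned by the install Namespace would presumably look something like the sketch below. This is illustrative only and was not captured from a cluster: the resource name is an assumption, the uid is a placeholder, and controller: true is inferred from the proposal further down rather than confirmed against the code.

```yaml
# Illustrative sketch, not real cluster output.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: validation.webhook.serving.knative.dev   # assumed name
  ownerReferences:
  - apiVersion: v1
    blockOwnerDeletion: true
    controller: true          # inferred: the flag this issue proposes to drop
    kind: Namespace
    name: knative-serving     # the default install Namespace
    uid: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx    # placeholder
```

Note that the owner here is a Namespace and the dependent is a cluster-scoped resource, so deleting the Namespace is what triggers cleanup of the webhook configuration.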
What's the problem?
This breaks ArgoCD, a very widely used GitOps system for Kubernetes.
Specifically, ArgoCD will never remove a resource that has ownerReferences set, so the webhooks are always left behind. The issue we were trying to prevent therefore happens 100% of the time when deploying Knative with ArgoCD.
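Here are some related upstream issues:
- OwnerReference not deleted when removed from Helm Chart argoproj/argo-cd#4764
- controller flag in owner references argoproj/argo-cd#12210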
What's the solution?
I propose we make two changes:
- Add a config option that disables setting the ownerReferences:
  - For example, an environment variable like KNATIVE_DISABLE_WEBHOOK_OWNER, which can be set on all controller pods.
- When we do set ownerReferences, we should NOT set controller=true:
  - This is much less likely to break downstream projects (the Namespace is obviously not the controlling resource). It also allows Knative to work with the upstream patch for ArgoCD that ignores non-controller ownerReferences.
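As a rough sketch of what the two changes could look like from a user's point of view, assuming they were implemented as proposed: KNATIVE_DISABLE_WEBHOOK_OWNER does not exist today, and its "true" value format, the Deployment and container names, and the webhook configuration name are all assumptions made for illustration. The first document is a partial Deployment fragment, not a complete manifest.

```yaml
# Change 1 (hypothetical): opt out of ownerReferences via an environment variable.
# Partial fragment; a real Deployment also needs selector, image, and so on.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: webhook                 # assumed Deployment name
  namespace: knative-serving
spec:
  template:
    spec:
      containers:
      - name: webhook           # assumed container name
        env:
        - name: KNATIVE_DISABLE_WEBHOOK_OWNER   # proposed here, not an existing setting
          value: "true"                         # assumed value format
---
# Change 2: keep the Namespace ownerReference, but without controller: true.
apiVersion: admissionregistration.k8s.io/v1
kind: ValidatingWebhookConfiguration
metadata:
  name: validation.webhook.serving.knative.dev   # assumed name
  ownerReferences:
  - apiVersion: v1
    blockOwnerDeletion: true    # assumed unchanged; the proposal only drops controller
    kind: Namespace
    name: knative-serving
    uid: xxxxxxxx-xxxx-xxxx-xxxx-xxxxxxxxxxxx    # placeholder
```

With the second form, garbage collection on Namespace deletion would still apply, since it depends on the ownerReference itself rather than on the controller flag.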
Where is the relevant code?
The code which sets the ownerReferences lives in the knative-pkg libraries:
- webhook/resourcesemantics/validation/reconcile_config.go
- webhook/configmaps/configmaps.go
- webhook/resourcesemantics/defaulting/defaulting.go