-
Notifications
You must be signed in to change notification settings - Fork 69
Backup registry auth secret must not be owned by any workspace #1631
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Changes from all commits
d39fda0
925f3bb
14e54eb
7189cc3
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,97 @@ | ||
| # Backup Auth Secret Must Not Be Owned by Any Workspace | ||
|
|
||
| **Status**: Accepted | ||
| **Date**: 2026-05-11 | ||
| **Deciders**: DevWorkspace Operator maintainers | ||
| **Related Issue**: [CRW-10760](https://redhat.atlassian.net/browse/CRW-10760) | ||
|
|
||
| ## Context | ||
|
|
||
| The backup system copies the registry authentication secret (e.g., quay.io credentials) from the operator namespace into each workspace namespace as `devworkspace-backup-registry-auth`. The original implementation set a Kubernetes controller ownerReference from this secret to the DevWorkspace that triggered the copy: | ||
|
|
||
| ```go | ||
| controllerutil.SetControllerReference(workspace, desiredSecret, scheme) | ||
| ``` | ||
|
|
||
| This was likely intended to clean up the secret when no workspaces need it anymore — standard Kubernetes garbage collection pattern. | ||
|
|
||
| However, two properties of this secret make ownerReference-based lifecycle management incorrect: | ||
|
|
||
| 1. **The secret is a namespace singleton**: All workspaces in a namespace share the same `devworkspace-backup-registry-auth` secret, but a Kubernetes object can have only one controller owner. Whichever workspace's backup job ran last "wins" ownership. | ||
|
|
||
| 2. **The secret is needed after all workspaces are deleted**: The primary restore use case is creating a new workspace from a backup of a deleted one. If the auth secret is garbage-collected when the last workspace is deleted, the user cannot authenticate to the private registry to pull the backup image. | ||
|
|
||
| **Bug observed (CRW-10760)**: When using quay.io (private registry) for backups, deleting a workspace caused backup entries to disappear from the Dashboard backup list for ALL workspaces in the namespace. The auth secret was garbage-collected, and the Dashboard could no longer query the registry. | ||
|
|
||
| **Validated on CRC cluster** (DWO 0.40.1, quay.io/okurinny): | ||
| - The `devworkspace-backup-registry-auth` secret was confirmed to have ownerReference to a single workspace (`nodejs`) | ||
| - Deleting that workspace triggered K8s GC, removing the secret | ||
| - The secret was not re-created by subsequent backup cycles (remaining workspaces already had recent backups) | ||
| - Backup listing in the Dashboard failed for all workspaces | ||
|
|
||
| ## Decision | ||
|
|
||
| **Remove the ownerReference from the backup registry auth secret.** The secret becomes a namespace-scoped resource with no ownership tie to any workspace. | ||
|
|
||
| ### What Changes | ||
|
|
||
| - `pkg/secrets/backup.go`: Remove the `controllerutil.SetControllerReference()` call in `CopySecret()` | ||
| - The secret is still created via `c.Create()` with `AlreadyExists` handling, just without an ownerReference | ||
| - The restore path now resolves the operator namespace via `infrastructure.GetNamespace()` to copy the secret on demand when it is missing | ||
|
|
||
| ### What Doesn't Change | ||
|
|
||
| - Per-workspace resources (job runner ServiceAccount, image-builder RoleBinding) retain their ownerReferences — their GC on workspace deletion is correct and expected | ||
| - The backup Job itself retains its ownerReference (short-lived, TTL-cleaned) | ||
| - The `CopySecret` function signature stays the same | ||
|
|
||
| ## Considered Alternatives | ||
|
|
||
| ### Alternative 1: Multi-owner references (non-controller) | ||
|
|
||
| Add each workspace as a non-controller owner of the secret. K8s GC would only delete the secret when ALL owning workspaces are gone. | ||
|
|
||
| **Rejected because**: | ||
| - The secret must survive deletion of ALL workspaces (for restore) | ||
| - Adds complexity to track and merge owner lists | ||
| - Non-controller ownerReferences have subtle GC semantics | ||
|
|
||
| ### Alternative 2: Finalizer-based cleanup | ||
|
|
||
| Remove ownerReference but add a cleanup mechanism (e.g., a controller that deletes the secret when the namespace has zero DevWorkspaces). | ||
|
|
||
| **Rejected because**: | ||
| - Adds complexity for a marginal benefit (one small secret in an otherwise-empty namespace) | ||
| - Could race with restore operations (user creates a workspace from backup right after the last one is deleted) | ||
| - Namespace deletion already cleans up all resources | ||
|
|
||
| ### Alternative 3: Per-workspace auth secrets | ||
|
|
||
| Create a unique auth secret per workspace (e.g., `devworkspace-backup-registry-auth-{workspace-id}`). | ||
|
|
||
| **Rejected because**: | ||
| - Multiplies secrets unnecessarily (all contain the same credentials) | ||
| - The restore path expects the predefined name `devworkspace-backup-registry-auth` | ||
| - Still wouldn't survive workspace deletion for restore use case | ||
|
|
||
| ## Consequences | ||
|
|
||
| ### Positive | ||
|
|
||
| 1. **Backups survive workspace deletion**: Users can delete all workspaces and still restore from backups | ||
| 2. **No cross-workspace interference**: Deleting one workspace no longer affects other workspaces' backup capabilities | ||
| 3. **Simpler lifecycle**: No ownership tracking needed for a namespace-scoped singleton | ||
|
|
||
| ### Negative | ||
|
|
||
| 1. **Secret persists in empty namespaces**: If all workspaces are deleted and the user never restores, the auth secret remains until the namespace is deleted. This is a minor leak — one small secret per namespace. | ||
|
|
||
| ### Neutral | ||
|
|
||
| 1. **Existing secrets on upgraded clusters**: Secrets created by older DWO versions will retain their stale ownerReference until the next `CopySecret` call overwrites them (via `SyncObjectWithCluster`). In the worst case, one more GC event occurs before the fix takes effect. | ||
|
|
||
| ## References | ||
|
|
||
| - `pkg/secrets/backup.go` — `CopySecret()` function | ||
| - `controllers/backupcronjob/rbac.go` — Per-workspace SA/RoleBinding (unchanged) | ||
| - `pkg/constants/metadata.go:204` — `DevWorkspaceBackupAuthSecretName` |
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -22,13 +22,13 @@ import ( | |
| dw "github.com/devfile/api/v2/pkg/apis/workspaces/v1alpha2" | ||
| controllerv1alpha1 "github.com/devfile/devworkspace-operator/apis/controller/v1alpha1" | ||
| "github.com/devfile/devworkspace-operator/pkg/constants" | ||
| "github.com/devfile/devworkspace-operator/pkg/infrastructure" | ||
| "github.com/go-logr/logr" | ||
| corev1 "k8s.io/api/core/v1" | ||
| k8sErrors "k8s.io/apimachinery/pkg/api/errors" | ||
| metav1 "k8s.io/apimachinery/pkg/apis/meta/v1" | ||
| "k8s.io/apimachinery/pkg/runtime" | ||
| "sigs.k8s.io/controller-runtime/pkg/client" | ||
| "sigs.k8s.io/controller-runtime/pkg/controller/controllerutil" | ||
| ) | ||
|
|
||
| // GetRegistryAuthSecret retrieves the registry authentication secret for accessing backup images | ||
|
|
@@ -60,9 +60,12 @@ func HandleRegistryAuthSecret(ctx context.Context, c client.Client, workspace *d | |
| if client.IgnoreNotFound(err) != nil { | ||
| return nil, err | ||
| } | ||
| // If we don't provide an operator namespace, don't attempt to look there. | ||
| if operatorConfigNamespace == "" { | ||
| return nil, nil | ||
| resolvedNS, nsErr := infrastructure.GetNamespace() | ||
| if nsErr != nil { | ||
| return nil, fmt.Errorf("cannot resolve operator namespace to copy registry auth secret: %w", nsErr) | ||
| } | ||
| operatorConfigNamespace = resolvedNS | ||
| } | ||
|
Comment on lines
63
to
69
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Avoid resolving operator namespace before confirming auth is required. If Also applies to: 72-79 🤖 Prompt for AI Agents |
||
|
|
||
| // Check if AuthSecret is configured in operator config | ||
|
|
@@ -111,10 +114,6 @@ func CopySecret(ctx context.Context, c client.Client, workspace *dw.DevWorkspace | |
| Type: sourceSecret.Type, | ||
| } | ||
|
|
||
| if err := controllerutil.SetControllerReference(workspace, desiredSecret, scheme); err != nil { | ||
| return nil, err | ||
| } | ||
|
|
||
| err = c.Create(ctx, desiredSecret) | ||
| if err != nil { | ||
| if k8sErrors.IsAlreadyExists(err) { | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -18,6 +18,7 @@ package secrets_test | |
| import ( | ||
| "context" | ||
| "errors" | ||
| "os" | ||
| "testing" | ||
|
|
||
| . "github.com/onsi/ginkgo/v2" | ||
|
|
@@ -34,6 +35,7 @@ import ( | |
| "sigs.k8s.io/controller-runtime/pkg/log/zap" | ||
|
|
||
| "github.com/devfile/devworkspace-operator/pkg/constants" | ||
| "github.com/devfile/devworkspace-operator/pkg/infrastructure" | ||
| "github.com/devfile/devworkspace-operator/pkg/secrets" | ||
| ) | ||
|
|
||
|
|
@@ -115,14 +117,15 @@ var _ = Describe("HandleRegistryAuthSecret (restore path: operatorConfigNamespac | |
| Expect(result.Name).To(Equal(constants.DevWorkspaceBackupAuthSecretName)) | ||
| }) | ||
|
|
||
| It("returns nil when the predefined secret does not exist in the workspace namespace", func() { | ||
| By("using a fake client with no secrets") | ||
| It("returns error when secret is missing and operator namespace cannot be resolved", func() { | ||
| By("using a fake client with no secrets and no WATCH_NAMESPACE set") | ||
| fakeClient := fake.NewClientBuilder().WithScheme(scheme).Build() | ||
| workspace := makeWorkspace(workspaceNS) | ||
| config := makeConfig("quay-backup-auth") | ||
|
|
||
| result, err := secrets.HandleRegistryAuthSecret(ctx, fakeClient, workspace, config, "", scheme, log) | ||
| Expect(err).NotTo(HaveOccurred()) | ||
| Expect(err).To(HaveOccurred()) | ||
| Expect(err.Error()).To(ContainSubstring("cannot resolve operator namespace")) | ||
| Expect(result).To(BeNil()) | ||
|
Comment on lines
+120
to
129
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Make this failure-path test independent of ambient This spec assumes the env var is unset but does not enforce it locally, so it can become environment-dependent and flaky. Explicitly control env state for this case. As per coding guidelines, "In test code, use 'infrastructure.InitializeForTesting()' to mock infrastructure type instead of relying on actual infrastructure detection". 🤖 Prompt for AI Agents |
||
| }) | ||
|
|
||
|
|
@@ -188,7 +191,7 @@ var _ = Describe("HandleRegistryAuthSecret (backup path: operatorConfigNamespace | |
| Expect(result).To(BeNil()) | ||
| }) | ||
|
|
||
| It("copies secret from operator namespace when AuthSecret is configured and secret not found in workspace namespace", func() { | ||
| It("copies secret from operator namespace without ownerReferences", func() { | ||
| By("creating a secret in the operator namespace") | ||
| operatorSecret := makeSecret(constants.DevWorkspaceBackupAuthSecretName, operatorNS) | ||
| operatorSecret.Data = map[string][]byte{"auth": []byte("operator-credentials")} | ||
|
|
@@ -216,12 +219,8 @@ var _ = Describe("HandleRegistryAuthSecret (backup path: operatorConfigNamespace | |
| By("verifying the copied secret has the watch-secret label") | ||
| Expect(copiedSecret.Labels).To(HaveKeyWithValue(constants.DevWorkspaceWatchSecretLabel, "true")) | ||
|
|
||
| By("verifying the copied secret has an owner reference to the workspace") | ||
| Expect(copiedSecret.OwnerReferences).To(HaveLen(1)) | ||
| Expect(copiedSecret.OwnerReferences[0].Name).To(Equal(workspace.Name)) | ||
| Expect(copiedSecret.OwnerReferences[0].Kind).To(Equal("DevWorkspace")) | ||
| Expect(copiedSecret.OwnerReferences[0].Controller).NotTo(BeNil()) | ||
| Expect(*copiedSecret.OwnerReferences[0].Controller).To(BeTrue()) | ||
| By("verifying the copied secret has no ownerReferences") | ||
| Expect(copiedSecret.OwnerReferences).To(BeEmpty()) | ||
| }) | ||
|
|
||
| It("NEVER overwrites user-provided secret even if operator has different credentials", func() { | ||
|
|
@@ -266,6 +265,59 @@ var _ = Describe("HandleRegistryAuthSecret (backup path: operatorConfigNamespace | |
| }) | ||
| }) | ||
|
|
||
| var _ = Describe("HandleRegistryAuthSecret (restore path: fallback to operator namespace)", func() { | ||
| const ( | ||
| workspaceNS = "user-namespace" | ||
| operatorNS = "operator-namespace" | ||
| ) | ||
|
|
||
| var ( | ||
| ctx context.Context | ||
| scheme *runtime.Scheme | ||
| log = zap.New(zap.UseDevMode(true)).WithName("SecretsTest") | ||
| origWatchNS string | ||
| hadWatchNS bool | ||
| ) | ||
|
|
||
| BeforeEach(func() { | ||
| ctx = context.Background() | ||
| scheme = buildScheme() | ||
| origWatchNS, hadWatchNS = os.LookupEnv(infrastructure.WatchNamespaceEnvVar) | ||
| os.Setenv(infrastructure.WatchNamespaceEnvVar, operatorNS) | ||
| }) | ||
|
|
||
| AfterEach(func() { | ||
| if hadWatchNS { | ||
| os.Setenv(infrastructure.WatchNamespaceEnvVar, origWatchNS) | ||
| } else { | ||
| os.Unsetenv(infrastructure.WatchNamespaceEnvVar) | ||
| } | ||
| }) | ||
|
Comment on lines
+282
to
+295
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. 🧩 Analysis chain🏁 Script executed: #!/bin/bash
rg -nP 'os\.(Setenv|Unsetenv)\(' pkg/secrets/backup_test.go
rg -nP 'WatchNamespaceEnvVar|BeforeEach|AfterEach' pkg/secrets/backup_test.goRepository: devfile/devworkspace-operator Length of output: 437 🏁 Script executed: cd pkg/secrets && sed -n '279,290p' backup_test.goRepository: devfile/devworkspace-operator Length of output: 448 Handle errors from Lines 282 and 286 ignore errors returned by 🧰 Tools🪛 golangci-lint (2.12.2)[error] 282-282: Error return value of (errcheck) [error] 286-286: Error return value of (errcheck) 🤖 Prompt for AI Agents |
||
|
|
||
| It("copies the secret from operator namespace when missing in workspace namespace", func() { | ||
| By("creating the auth secret only in the operator namespace") | ||
| operatorSecret := makeSecret("quay-backup-auth", operatorNS) | ||
|
|
||
| fakeClient := fake.NewClientBuilder().WithScheme(scheme).WithObjects(operatorSecret).Build() | ||
| workspace := makeWorkspace(workspaceNS) | ||
| config := makeConfig("quay-backup-auth") | ||
|
|
||
| result, err := secrets.HandleRegistryAuthSecret(ctx, fakeClient, workspace, config, "", scheme, log) | ||
| Expect(err).NotTo(HaveOccurred()) | ||
| Expect(result).NotTo(BeNil()) | ||
| Expect(result.Name).To(Equal(constants.DevWorkspaceBackupAuthSecretName)) | ||
| Expect(result.Namespace).To(Equal(workspaceNS)) | ||
|
|
||
| By("verifying the secret was copied to the workspace namespace") | ||
| copied := &corev1.Secret{} | ||
| err = fakeClient.Get(ctx, client.ObjectKey{ | ||
| Name: constants.DevWorkspaceBackupAuthSecretName, | ||
| Namespace: workspaceNS, | ||
| }, copied) | ||
| Expect(err).NotTo(HaveOccurred()) | ||
| }) | ||
| }) | ||
|
|
||
| // errorOnNameClient is a thin client wrapper that injects an error for a specific secret name. | ||
| type errorOnNameClient struct { | ||
| client.Client | ||
|
|
@@ -285,3 +337,62 @@ func (e *errorOnNameClient) Get(ctx context.Context, key client.ObjectKey, obj c | |
|
|
||
| // Ensure errorOnNameClient satisfies client.Client at compile time. | ||
| var _ client.Client = &errorOnNameClient{} | ||
|
|
||
| var _ = Describe("CopySecret", func() { | ||
| const ( | ||
| workspaceNS = "user-namespace" | ||
| operatorNS = "operator-namespace" | ||
| ) | ||
|
|
||
| var ( | ||
| ctx context.Context | ||
| scheme *runtime.Scheme | ||
| log = zap.New(zap.UseDevMode(true)).WithName("SecretsTest") | ||
| ) | ||
|
|
||
| BeforeEach(func() { | ||
| ctx = context.Background() | ||
| scheme = buildScheme() | ||
| }) | ||
|
|
||
| It("creates the secret without ownerReferences", func() { | ||
| By("copying a source secret into the workspace namespace") | ||
| sourceSecret := makeSecret("quay-push-secret", operatorNS) | ||
| workspace := makeWorkspace(workspaceNS) | ||
|
|
||
| fakeClient := fake.NewClientBuilder().WithScheme(scheme).Build() | ||
|
|
||
| result, err := secrets.CopySecret(ctx, fakeClient, workspace, sourceSecret, scheme, log) | ||
| Expect(err).NotTo(HaveOccurred()) | ||
| Expect(result).NotTo(BeNil()) | ||
| Expect(result.Name).To(Equal(constants.DevWorkspaceBackupAuthSecretName)) | ||
| Expect(result.Namespace).To(Equal(workspaceNS)) | ||
|
|
||
| By("verifying the created secret has no ownerReferences") | ||
| created := &corev1.Secret{} | ||
| err = fakeClient.Get(ctx, client.ObjectKey{ | ||
| Name: constants.DevWorkspaceBackupAuthSecretName, | ||
| Namespace: workspaceNS, | ||
| }, created) | ||
| Expect(err).NotTo(HaveOccurred()) | ||
| Expect(created.OwnerReferences).To(BeEmpty()) | ||
| }) | ||
|
|
||
| It("preserves the secret data and type from the source", func() { | ||
| sourceSecret := &corev1.Secret{ | ||
| ObjectMeta: metav1.ObjectMeta{ | ||
| Name: "quay-push-secret", | ||
| Namespace: operatorNS, | ||
| }, | ||
| Data: map[string][]byte{".dockerconfigjson": []byte(`{"auths":{}}`)}, | ||
| Type: corev1.SecretTypeDockerConfigJson, | ||
| } | ||
| workspace := makeWorkspace(workspaceNS) | ||
| fakeClient := fake.NewClientBuilder().WithScheme(scheme).Build() | ||
|
|
||
| result, err := secrets.CopySecret(ctx, fakeClient, workspace, sourceSecret, scheme, log) | ||
| Expect(err).NotTo(HaveOccurred()) | ||
| Expect(result.Data).To(HaveKey(".dockerconfigjson")) | ||
| Expect(result.Type).To(Equal(corev1.SecretTypeDockerConfigJson)) | ||
| }) | ||
| }) | ||
Uh oh!
There was an error while loading. Please reload this page.