Skip to content
Open
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
97 changes: 97 additions & 0 deletions docs/adr-backup-auth-secret-lifecycle.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,97 @@
# Backup Auth Secret Must Not Be Owned by Any Workspace

**Status**: Accepted
**Date**: 2026-05-11
**Deciders**: DevWorkspace Operator maintainers
**Related Issue**: [CRW-10760](https://redhat.atlassian.net/browse/CRW-10760)

## Context

The backup system copies the registry authentication secret (e.g., quay.io credentials) from the operator namespace into each workspace namespace as `devworkspace-backup-registry-auth`. The original implementation set a Kubernetes controller ownerReference from this secret to the DevWorkspace that triggered the copy:

```go
controllerutil.SetControllerReference(workspace, desiredSecret, scheme)
```

This was likely intended to clean up the secret when no workspaces need it anymore — standard Kubernetes garbage collection pattern.

However, two properties of this secret make ownerReference-based lifecycle management incorrect:

1. **The secret is a namespace singleton**: All workspaces in a namespace share the same `devworkspace-backup-registry-auth` secret, but a Kubernetes object can have only one controller owner. Whichever workspace's backup job ran last "wins" ownership.

2. **The secret is needed after all workspaces are deleted**: The primary restore use case is creating a new workspace from a backup of a deleted one. If the auth secret is garbage-collected when the last workspace is deleted, the user cannot authenticate to the private registry to pull the backup image.

**Bug observed (CRW-10760)**: When using quay.io (private registry) for backups, deleting a workspace caused backup entries to disappear from the Dashboard backup list for ALL workspaces in the namespace. The auth secret was garbage-collected, and the Dashboard could no longer query the registry.

**Validated on CRC cluster** (DWO 0.40.1, quay.io/okurinny):
- The `devworkspace-backup-registry-auth` secret was confirmed to have ownerReference to a single workspace (`nodejs`)
- Deleting that workspace triggered K8s GC, removing the secret
- The secret was not re-created by subsequent backup cycles (remaining workspaces already had recent backups)
- Backup listing in the Dashboard failed for all workspaces

## Decision

**Remove the ownerReference from the backup registry auth secret.** The secret becomes a namespace-scoped resource with no ownership tie to any workspace.

### What Changes

- `pkg/secrets/backup.go`: Remove the `controllerutil.SetControllerReference()` call in `CopySecret()`
- The secret is still created via `c.Create()` with `AlreadyExists` handling, just without an ownerReference
- The restore path now resolves the operator namespace via `infrastructure.GetNamespace()` to copy the secret on demand when it is missing

### What Doesn't Change

- Per-workspace resources (job runner ServiceAccount, image-builder RoleBinding) retain their ownerReferences — their GC on workspace deletion is correct and expected
- The backup Job itself retains its ownerReference (short-lived, TTL-cleaned)
- The `CopySecret` function signature stays the same

## Considered Alternatives

### Alternative 1: Multi-owner references (non-controller)

Add each workspace as a non-controller owner of the secret. K8s GC would only delete the secret when ALL owning workspaces are gone.

**Rejected because**:
- The secret must survive deletion of ALL workspaces (for restore)
- Adds complexity to track and merge owner lists
- Non-controller ownerReferences have subtle GC semantics

### Alternative 2: Finalizer-based cleanup

Remove ownerReference but add a cleanup mechanism (e.g., a controller that deletes the secret when the namespace has zero DevWorkspaces).

**Rejected because**:
- Adds complexity for a marginal benefit (one small secret in an otherwise-empty namespace)
- Could race with restore operations (user creates a workspace from backup right after the last one is deleted)
- Namespace deletion already cleans up all resources

### Alternative 3: Per-workspace auth secrets

Create a unique auth secret per workspace (e.g., `devworkspace-backup-registry-auth-{workspace-id}`).

**Rejected because**:
- Multiplies secrets unnecessarily (all contain the same credentials)
- The restore path expects the predefined name `devworkspace-backup-registry-auth`
- Still wouldn't survive workspace deletion for restore use case

## Consequences

### Positive

1. **Backups survive workspace deletion**: Users can delete all workspaces and still restore from backups
2. **No cross-workspace interference**: Deleting one workspace no longer affects other workspaces' backup capabilities
3. **Simpler lifecycle**: No ownership tracking needed for a namespace-scoped singleton

### Negative

1. **Secret persists in empty namespaces**: If all workspaces are deleted and the user never restores, the auth secret remains until the namespace is deleted. This is a minor leak — one small secret per namespace.

### Neutral

1. **Existing secrets on upgraded clusters**: Secrets created by older DWO versions will retain their stale ownerReference until the next `CopySecret` call overwrites them (via `SyncObjectWithCluster`). In the worst case, one more GC event occurs before the fix takes effect.

## References

- `pkg/secrets/backup.go` — `CopySecret()` function
- `controllers/backupcronjob/rbac.go` — Per-workspace SA/RoleBinding (unchanged)
- `pkg/constants/metadata.go:204` — `DevWorkspaceBackupAuthSecretName`
13 changes: 6 additions & 7 deletions pkg/secrets/backup.go
Original file line number Diff line number Diff line change
Expand Up @@ -22,13 +22,13 @@ import (
dw "github.com/devfile/api/v2/pkg/apis/workspaces/v1alpha2"
controllerv1alpha1 "github.com/devfile/devworkspace-operator/apis/controller/v1alpha1"
"github.com/devfile/devworkspace-operator/pkg/constants"
"github.com/devfile/devworkspace-operator/pkg/infrastructure"
"github.com/go-logr/logr"
corev1 "k8s.io/api/core/v1"
k8sErrors "k8s.io/apimachinery/pkg/api/errors"
metav1 "k8s.io/apimachinery/pkg/apis/meta/v1"
"k8s.io/apimachinery/pkg/runtime"
"sigs.k8s.io/controller-runtime/pkg/client"
"sigs.k8s.io/controller-runtime/pkg/controller/controllerutil"
)

// GetRegistryAuthSecret retrieves the registry authentication secret for accessing backup images
Expand Down Expand Up @@ -60,9 +60,12 @@ func HandleRegistryAuthSecret(ctx context.Context, c client.Client, workspace *d
if client.IgnoreNotFound(err) != nil {
return nil, err
}
// If we don't provide an operator namespace, don't attempt to look there.
if operatorConfigNamespace == "" {
return nil, nil
resolvedNS, nsErr := infrastructure.GetNamespace()
if nsErr != nil {
return nil, fmt.Errorf("cannot resolve operator namespace to copy registry auth secret: %w", nsErr)
}
operatorConfigNamespace = resolvedNS
Comment thread
coderabbitai[bot] marked this conversation as resolved.
}
Comment on lines 63 to 69
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid resolving operator namespace before confirming auth is required.

If AuthSecret is empty, the function should proceed anonymously, but it currently tries to resolve operator namespace first and can fail early with an unrelated error. Move namespace resolution to only run when AuthSecret is non-empty.

Also applies to: 72-79

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/secrets/backup.go` around lines 63 - 69, The code resolves
operatorConfigNamespace unconditionally which causes failures even when no auth
is needed; change the logic so infrastructure.GetNamespace() is only called when
AuthSecret is non-empty: wrap the operatorConfigNamespace resolution inside the
branch that checks cfg.AuthSecret (or AuthSecret variable) and only attempt to
resolve/set operatorConfigNamespace when AuthSecret != ""; apply the same change
for the later block that currently resolves namespace (the code around
operatorConfigNamespace and infrastructure.GetNamespace) so anonymous (no-auth)
flows skip namespace resolution entirely.


// Check if AuthSecret is configured in operator config
Expand Down Expand Up @@ -111,10 +114,6 @@ func CopySecret(ctx context.Context, c client.Client, workspace *dw.DevWorkspace
Type: sourceSecret.Type,
}

if err := controllerutil.SetControllerReference(workspace, desiredSecret, scheme); err != nil {
return nil, err
}

err = c.Create(ctx, desiredSecret)
if err != nil {
if k8sErrors.IsAlreadyExists(err) {
Expand Down
131 changes: 121 additions & 10 deletions pkg/secrets/backup_test.go
Original file line number Diff line number Diff line change
Expand Up @@ -18,6 +18,7 @@ package secrets_test
import (
"context"
"errors"
"os"
"testing"

. "github.com/onsi/ginkgo/v2"
Expand All @@ -34,6 +35,7 @@ import (
"sigs.k8s.io/controller-runtime/pkg/log/zap"

"github.com/devfile/devworkspace-operator/pkg/constants"
"github.com/devfile/devworkspace-operator/pkg/infrastructure"
"github.com/devfile/devworkspace-operator/pkg/secrets"
)

Expand Down Expand Up @@ -115,14 +117,15 @@ var _ = Describe("HandleRegistryAuthSecret (restore path: operatorConfigNamespac
Expect(result.Name).To(Equal(constants.DevWorkspaceBackupAuthSecretName))
})

It("returns nil when the predefined secret does not exist in the workspace namespace", func() {
By("using a fake client with no secrets")
It("returns error when secret is missing and operator namespace cannot be resolved", func() {
By("using a fake client with no secrets and no WATCH_NAMESPACE set")
fakeClient := fake.NewClientBuilder().WithScheme(scheme).Build()
workspace := makeWorkspace(workspaceNS)
config := makeConfig("quay-backup-auth")

result, err := secrets.HandleRegistryAuthSecret(ctx, fakeClient, workspace, config, "", scheme, log)
Expect(err).NotTo(HaveOccurred())
Expect(err).To(HaveOccurred())
Expect(err.Error()).To(ContainSubstring("cannot resolve operator namespace"))
Expect(result).To(BeNil())
Comment on lines +120 to 129
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make this failure-path test independent of ambient WATCH_NAMESPACE.

This spec assumes the env var is unset but does not enforce it locally, so it can become environment-dependent and flaky. Explicitly control env state for this case.

As per coding guidelines, "In test code, use 'infrastructure.InitializeForTesting()' to mock infrastructure type instead of relying on actual infrastructure detection".

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/secrets/backup_test.go` around lines 120 - 129, The test is flaky because
it assumes WATCH_NAMESPACE is unset and real infrastructure detection; make it
deterministic by explicitly setting the env var to an empty string (or saving
and restoring it) within the test and by initializing test infrastructure via
infrastructure.InitializeForTesting() so the test does not consult real
environment/infrastructure; update the spec around the call to
secrets.HandleRegistryAuthSecret (and helper calls makeWorkspace/makeConfig if
needed) to call infrastructure.InitializeForTesting() at start and ensure
WATCH_NAMESPACE is explicitly cleared/controlled for the duration of the test,
restoring prior state afterwards.

})

Expand Down Expand Up @@ -188,7 +191,7 @@ var _ = Describe("HandleRegistryAuthSecret (backup path: operatorConfigNamespace
Expect(result).To(BeNil())
})

It("copies secret from operator namespace when AuthSecret is configured and secret not found in workspace namespace", func() {
It("copies secret from operator namespace without ownerReferences", func() {
By("creating a secret in the operator namespace")
operatorSecret := makeSecret(constants.DevWorkspaceBackupAuthSecretName, operatorNS)
operatorSecret.Data = map[string][]byte{"auth": []byte("operator-credentials")}
Expand Down Expand Up @@ -216,12 +219,8 @@ var _ = Describe("HandleRegistryAuthSecret (backup path: operatorConfigNamespace
By("verifying the copied secret has the watch-secret label")
Expect(copiedSecret.Labels).To(HaveKeyWithValue(constants.DevWorkspaceWatchSecretLabel, "true"))

By("verifying the copied secret has an owner reference to the workspace")
Expect(copiedSecret.OwnerReferences).To(HaveLen(1))
Expect(copiedSecret.OwnerReferences[0].Name).To(Equal(workspace.Name))
Expect(copiedSecret.OwnerReferences[0].Kind).To(Equal("DevWorkspace"))
Expect(copiedSecret.OwnerReferences[0].Controller).NotTo(BeNil())
Expect(*copiedSecret.OwnerReferences[0].Controller).To(BeTrue())
By("verifying the copied secret has no ownerReferences")
Expect(copiedSecret.OwnerReferences).To(BeEmpty())
})

It("NEVER overwrites user-provided secret even if operator has different credentials", func() {
Expand Down Expand Up @@ -266,6 +265,59 @@ var _ = Describe("HandleRegistryAuthSecret (backup path: operatorConfigNamespace
})
})

var _ = Describe("HandleRegistryAuthSecret (restore path: fallback to operator namespace)", func() {
const (
workspaceNS = "user-namespace"
operatorNS = "operator-namespace"
)

var (
ctx context.Context
scheme *runtime.Scheme
log = zap.New(zap.UseDevMode(true)).WithName("SecretsTest")
origWatchNS string
hadWatchNS bool
)

BeforeEach(func() {
ctx = context.Background()
scheme = buildScheme()
origWatchNS, hadWatchNS = os.LookupEnv(infrastructure.WatchNamespaceEnvVar)
os.Setenv(infrastructure.WatchNamespaceEnvVar, operatorNS)
})

AfterEach(func() {
if hadWatchNS {
os.Setenv(infrastructure.WatchNamespaceEnvVar, origWatchNS)
} else {
os.Unsetenv(infrastructure.WatchNamespaceEnvVar)
}
})
Comment on lines +282 to +295
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -nP 'os\.(Setenv|Unsetenv)\(' pkg/secrets/backup_test.go
rg -nP 'WatchNamespaceEnvVar|BeforeEach|AfterEach' pkg/secrets/backup_test.go

Repository: devfile/devworkspace-operator

Length of output: 437


🏁 Script executed:

cd pkg/secrets && sed -n '279,290p' backup_test.go

Repository: devfile/devworkspace-operator

Length of output: 448


Handle errors from os.Setenv and os.Unsetenv, and restore prior environment state

Lines 282 and 286 ignore errors returned by os.Setenv() and os.Unsetenv(). Additionally, the AfterEach unconditionally unsets the environment variable instead of restoring its original value before the test, which violates the error-handling requirement and can cause test isolation issues. Store the original value before the test and restore it in AfterEach, or explicitly check and handle any errors.

🧰 Tools
🪛 golangci-lint (2.12.2)

[error] 282-282: Error return value of os.Setenv is not checked

(errcheck)


[error] 286-286: Error return value of os.Unsetenv is not checked

(errcheck)

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/secrets/backup_test.go` around lines 279 - 287, BeforeEach currently
calls os.Setenv(infrastructure.WatchNamespaceEnvVar, operatorNS) without
checking the error and AfterEach unconditionally calls os.Unsetenv; instead, in
BeforeEach capture the prior value with os.LookupEnv, set the env using
os.Setenv and handle any error (fail the test via the test framework), and in
AfterEach restore the original state: if the prior value existed, call
os.Setenv(originalKey, originalVal) and check the error, otherwise call
os.Unsetenv and check the error; reference the BeforeEach/AfterEach blocks and
the use of infrastructure.WatchNamespaceEnvVar and operatorNS to locate where to
add the lookup, error checks, and restoration logic.


It("copies the secret from operator namespace when missing in workspace namespace", func() {
By("creating the auth secret only in the operator namespace")
operatorSecret := makeSecret("quay-backup-auth", operatorNS)

fakeClient := fake.NewClientBuilder().WithScheme(scheme).WithObjects(operatorSecret).Build()
workspace := makeWorkspace(workspaceNS)
config := makeConfig("quay-backup-auth")

result, err := secrets.HandleRegistryAuthSecret(ctx, fakeClient, workspace, config, "", scheme, log)
Expect(err).NotTo(HaveOccurred())
Expect(result).NotTo(BeNil())
Expect(result.Name).To(Equal(constants.DevWorkspaceBackupAuthSecretName))
Expect(result.Namespace).To(Equal(workspaceNS))

By("verifying the secret was copied to the workspace namespace")
copied := &corev1.Secret{}
err = fakeClient.Get(ctx, client.ObjectKey{
Name: constants.DevWorkspaceBackupAuthSecretName,
Namespace: workspaceNS,
}, copied)
Expect(err).NotTo(HaveOccurred())
})
})

// errorOnNameClient is a thin client wrapper that injects an error for a specific secret name.
type errorOnNameClient struct {
client.Client
Expand All @@ -285,3 +337,62 @@ func (e *errorOnNameClient) Get(ctx context.Context, key client.ObjectKey, obj c

// Ensure errorOnNameClient satisfies client.Client at compile time.
var _ client.Client = &errorOnNameClient{}

var _ = Describe("CopySecret", func() {
const (
workspaceNS = "user-namespace"
operatorNS = "operator-namespace"
)

var (
ctx context.Context
scheme *runtime.Scheme
log = zap.New(zap.UseDevMode(true)).WithName("SecretsTest")
)

BeforeEach(func() {
ctx = context.Background()
scheme = buildScheme()
})

It("creates the secret without ownerReferences", func() {
By("copying a source secret into the workspace namespace")
sourceSecret := makeSecret("quay-push-secret", operatorNS)
workspace := makeWorkspace(workspaceNS)

fakeClient := fake.NewClientBuilder().WithScheme(scheme).Build()

result, err := secrets.CopySecret(ctx, fakeClient, workspace, sourceSecret, scheme, log)
Expect(err).NotTo(HaveOccurred())
Expect(result).NotTo(BeNil())
Expect(result.Name).To(Equal(constants.DevWorkspaceBackupAuthSecretName))
Expect(result.Namespace).To(Equal(workspaceNS))

By("verifying the created secret has no ownerReferences")
created := &corev1.Secret{}
err = fakeClient.Get(ctx, client.ObjectKey{
Name: constants.DevWorkspaceBackupAuthSecretName,
Namespace: workspaceNS,
}, created)
Expect(err).NotTo(HaveOccurred())
Expect(created.OwnerReferences).To(BeEmpty())
})

It("preserves the secret data and type from the source", func() {
sourceSecret := &corev1.Secret{
ObjectMeta: metav1.ObjectMeta{
Name: "quay-push-secret",
Namespace: operatorNS,
},
Data: map[string][]byte{".dockerconfigjson": []byte(`{"auths":{}}`)},
Type: corev1.SecretTypeDockerConfigJson,
}
workspace := makeWorkspace(workspaceNS)
fakeClient := fake.NewClientBuilder().WithScheme(scheme).Build()

result, err := secrets.CopySecret(ctx, fakeClient, workspace, sourceSecret, scheme, log)
Expect(err).NotTo(HaveOccurred())
Expect(result.Data).To(HaveKey(".dockerconfigjson"))
Expect(result.Type).To(Equal(corev1.SecretTypeDockerConfigJson))
})
})
Loading