
Backup registry auth secret must not be owned by any workspace#1631

Open
akurinnoy wants to merge 4 commits into
devfile:mainfrom
akurinnoy:fix/CRW-10760

Conversation

@akurinnoy
Collaborator

@akurinnoy akurinnoy commented May 13, 2026

What does this PR do?

This PR removes the controller ownerReference from the backup registry auth secret so that the secret is not garbage-collected when a workspace is deleted. It also makes the restore path fall back to copying the secret from the operator namespace when it is missing from the workspace namespace.
The PR includes an ADR documenting why the auth secret must not be owned by any workspace.
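The garbage-collection problem can be illustrated without a cluster. The following is a toy, standard-library-only model, not Kubernetes code: the Object and Namespace types, DeleteOwner, and secretSurvivesWorkspaceDeletion are hypothetical, and ownership is simplified to a list of owner UIDs. It follows the Kubernetes rule that a dependent is collected once none of its owners remain, so a secret owned by one workspace disappears with that workspace, while an unowned secret survives.

```go
package main

// Object is a toy stand-in for a Kubernetes object: a name plus the UIDs
// listed in its ownerReferences. Hypothetical, for illustration only.
type Object struct {
	Name   string
	Owners []string
}

// Namespace is a toy namespace holding objects keyed by name.
type Namespace struct {
	Objects map[string]Object
}

// DeleteOwner simulates the garbage collector after the owner with the given
// UID is deleted: the reference is dropped from every dependent, and a
// dependent with no remaining owners is deleted. Objects that never had an
// owner are left alone.
func (n *Namespace) DeleteOwner(uid string) {
	for name, obj := range n.Objects {
		if len(obj.Owners) == 0 {
			continue // unowned objects are never collected
		}
		var remaining []string
		for _, o := range obj.Owners {
			if o != uid {
				remaining = append(remaining, o)
			}
		}
		if len(remaining) == 0 {
			delete(n.Objects, name) // deleting during range is safe in Go
			continue
		}
		obj.Owners = remaining
		n.Objects[name] = obj
	}
}

const authSecretName = "devworkspace-backup-registry-auth"

// secretSurvivesWorkspaceDeletion reports whether the shared auth secret is
// still present after workspace "ws-a" is deleted, depending on whether the
// secret carried an ownerReference to that workspace.
func secretSurvivesWorkspaceDeletion(ownedByWorkspace bool) bool {
	secret := Object{Name: authSecretName}
	if ownedByWorkspace {
		secret.Owners = []string{"ws-a"} // old behavior: controller ref to one workspace
	}
	ns := &Namespace{Objects: map[string]Object{authSecretName: secret}}
	ns.DeleteOwner("ws-a")
	_, ok := ns.Objects[authSecretName]
	return ok
}
```

With the controller reference in place, secretSurvivesWorkspaceDeletion(true) returns false, matching the reported symptom; without it, the secret lives until the namespace itself is deleted.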

What issues does this PR fix or reference?

Fixes https://redhat.atlassian.net/browse/CRW-10760

Is it tested? How?

New unit tests added. Validated manually on a CRC cluster (DWO 0.40.1, quay.io private registry):

  • Backup job creates auth secret without ownerReferences
  • Deleting a workspace does not garbage-collect the auth secret
  • Restore path copies the secret from operator namespace when missing

PR Checklist

  • E2E tests pass (when PR is ready, comment /test v8-devworkspace-operator-e2e, v8-che-happy-path to trigger)
    • v8-devworkspace-operator-e2e: DevWorkspace e2e test
    • v8-che-happy-path: Happy path for verification integration with Che

Summary by CodeRabbit

  • Bug Fixes

    • Fixed the backup list disappearing for the remaining workspaces in a namespace after an individual workspace was deleted, when using an external registry.
  • Behavior Change

    • Backup auth secret is no longer tied to a specific workspace; if missing, the operator will locate and copy it from the operator namespace on demand.
  • Documentation

    • Added an ADR documenting the backup auth secret lifecycle and garbage-collection behavior.


akurinnoy and others added 2 commits May 13, 2026 14:32
The backup registry auth secret (devworkspace-backup-registry-auth) is
a namespace singleton shared by all workspaces. Setting a controller
ownerReference to a single workspace caused Kubernetes garbage
collection to delete the secret when that workspace was deleted,
breaking backup/restore for all remaining workspaces in the namespace.

Remove the SetControllerReference call so the secret persists
independently of any workspace lifecycle. The secret is cleaned up
naturally when the namespace is deleted.

Assisted-by: Claude Code

Signed-off-by: Oleksii Kurinnyi <okurynny@redhat.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Oleksii Kurinnyi <okurinny@redhat.com>
When the backup registry auth secret is missing from the workspace
namespace (e.g. after GC on upgrade), the restore path now resolves
the operator namespace via infrastructure.GetNamespace() and copies
the secret from there, matching the backup path behavior.

Previously the restore path returned nil when the secret was missing,
causing restore init containers to fail on private registries.

Assisted-by: Claude Code

Signed-off-by: Oleksii Kurinnyi <okurynny@redhat.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Oleksii Kurinnyi <okurinny@redhat.com>
@akurinnoy akurinnoy self-assigned this May 13, 2026
@openshift-ci

openshift-ci Bot commented May 13, 2026

[APPROVALNOTIFIER] This PR is NOT APPROVED

This pull-request has been approved by: akurinnoy
Once this PR has been reviewed and has the lgtm label, please assign dkwon17 for approval. For more information see the Code Review Process.

The full list of commands accepted by this bot can be found here.

Needs approval from an approver in each of these files:

Approvers can indicate their approval by writing /approve in a comment
Approvers can cancel approval by writing /approve cancel in a comment

@coderabbitai

coderabbitai Bot commented May 13, 2026

📝 Walkthrough

Removes workspace controller ownerReference from the namespace-scoped backup registry auth secret, adds operator-namespace fallback via infrastructure.GetNamespace() for restore when operatorConfigNamespace is empty, and updates/extends tests to validate both behaviors and data preservation.

Changes

Backup Auth Secret Lifecycle

  • Architectural Decision & Problem Statement (docs/adr-backup-auth-secret-lifecycle.md): The ADR documents the GC issue caused by the ownerReference, which tied a namespace-singleton secret to one workspace's lifecycle, and the decision to stop setting controller ownerReferences in CopySecret() while preserving sync semantics. It also describes restore-on-demand via operator namespace resolution.
  • Secret Lifecycle Implementation Fix (pkg/secrets/backup.go): Imports are updated. HandleRegistryAuthSecret resolves the operator namespace via infrastructure.GetNamespace() when operatorConfigNamespace is empty and returns an error if resolution fails. CopySecret no longer calls controllerutil.SetControllerReference and retains its create + AlreadyExists handling.
  • Test Coverage for Lifecycle Changes (pkg/secrets/backup_test.go): Tests now import os and infrastructure. The existing copy test is updated to expect no ownerReferences; new suites validate the restore-path fallback to the operator namespace when the workspace secret is missing, and that CopySecret creates the workspace secret without ownerReferences while preserving data keys and Type.
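The CopySecret behavior summarized above (create the copy, tolerate AlreadyExists, set no owner reference) can be sketched with an in-memory store. Everything here is hypothetical: Secret, Store, ErrAlreadyExists, and copySecretSketch stand in for the core/v1 Secret type, the controller-runtime client, and the apimachinery AlreadyExists error. This sketch simply treats AlreadyExists as success; the real function in pkg/secrets/backup.go may reconcile the existing copy instead.

```go
package main

import (
	"errors"
	"fmt"
)

// ErrAlreadyExists is a hypothetical stand-in for the API server's AlreadyExists error.
var ErrAlreadyExists = errors.New("already exists")

// Secret is a toy secret: identity, type, data, and owner UIDs.
type Secret struct {
	Namespace, Name, Type string
	Data                  map[string][]byte
	OwnerRefs             []string
}

// Store is a toy API server keyed by "namespace/name".
type Store map[string]Secret

// Create stores sec, failing with a wrapped ErrAlreadyExists if the key is taken.
func (s Store) Create(sec Secret) error {
	key := sec.Namespace + "/" + sec.Name
	if _, ok := s[key]; ok {
		return fmt.Errorf("secret %q: %w", key, ErrAlreadyExists)
	}
	s[key] = sec
	return nil
}

// copySecretSketch copies src into dstNS, preserving Type and data keys but
// deliberately not copying or setting any owner reference. An existing copy
// is treated as success.
func copySecretSketch(s Store, src Secret, dstNS string) error {
	data := make(map[string][]byte, len(src.Data))
	for k, v := range src.Data {
		data[k] = append([]byte(nil), v...) // deep-copy the bytes
	}
	dst := Secret{Namespace: dstNS, Name: src.Name, Type: src.Type, Data: data}
	if err := s.Create(dst); err != nil && !errors.Is(err, ErrAlreadyExists) {
		return err
	}
	return nil
}
```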

Sequence Diagram

sequenceDiagram
    participant HandleAuth as HandleRegistryAuthSecret
    participant Infrastructure as infrastructure.GetNamespace()
    participant CopySecret as CopySecret
    participant Client as c.Create()
    participant WorkspaceNS as Workspace Namespace

    HandleAuth->>HandleAuth: operatorConfigNamespace empty?
    alt operatorConfigNamespace is empty
        HandleAuth->>Infrastructure: GetNamespace() resolve operator NS
        Infrastructure-->>HandleAuth: operator namespace
        HandleAuth->>CopySecret: source: operator NS\ndest: workspace NS
    end
    CopySecret->>Client: Create secret (no SetControllerReference)
    Client->>WorkspaceNS: secret created without ownerReferences
    WorkspaceNS-->>CopySecret: result

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~20 minutes

Possibly related PRs

Suggested labels

lgtm, approved

Suggested reviewers

  • rohanKanojia
  • dkwon17
  • ibuziuk

Poem

🐰 A secret unbound, no owner in sight,
From operator's stash it springs into light,
Copied with care, its data held true,
No garbage to sweep when a workspace bids adieu,
Hops of relief—fresh tests say, "Woo-hoo!"

🚥 Pre-merge checks | ✅ 4 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage: ⚠️ Warning. Docstring coverage is 50.00%, which is below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them to satisfy the coverage threshold.
✅ Passed checks (4 passed)
  • Description Check: ✅ Passed. Check skipped because CodeRabbit’s high-level summary is enabled.
  • Title check: ✅ Passed. The title clearly and specifically summarizes the main change: removing the workspace controller ownership from the backup registry auth secret to prevent garbage collection.
  • Linked Issues check: ✅ Passed. Check skipped because no linked issues were found for this pull request.
  • Out of Scope Changes check: ✅ Passed. Check skipped because no linked issues were found for this pull request.



@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/secrets/backup_test.go`:
- Around line 279-287: BeforeEach currently calls
os.Setenv(infrastructure.WatchNamespaceEnvVar, operatorNS) without checking the
error and AfterEach unconditionally calls os.Unsetenv; instead, in BeforeEach
capture the prior value with os.LookupEnv, set the env using os.Setenv and
handle any error (fail the test via the test framework), and in AfterEach
restore the original state: if the prior value existed, call
os.Setenv(originalKey, originalVal) and check the error, otherwise call
os.Unsetenv and check the error; reference the BeforeEach/AfterEach blocks and
the use of infrastructure.WatchNamespaceEnvVar and operatorNS to locate where to
add the lookup, error checks, and restoration logic.

In `@pkg/secrets/backup.go`:
- Around line 64-69: The code currently swallows namespace resolution failures
by returning nil, nil when infrastructure.GetNamespace() returns an error;
update the error path in pkg/secrets/backup.go so that when nsErr != nil you
return the error (or a wrapped error) instead of nil, nil, and ensure
operatorConfigNamespace is only set after a successful GetNamespace() call;
reference GetNamespace(), nsErr, and operatorConfigNamespace to locate and fix
the failing branch so restore fails fast with a clear cause.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: a4ce1a03-5397-4980-a568-430bf14ef1e2

📥 Commits

Reviewing files that changed from the base of the PR and between 1e949fc and 925f3bb.

📒 Files selected for processing (3)
  • docs/adr-backup-auth-secret-lifecycle.md
  • pkg/secrets/backup.go
  • pkg/secrets/backup_test.go

Comment on lines +279 to +287
BeforeEach(func() {
	ctx = context.Background()
	scheme = buildScheme()
	os.Setenv(infrastructure.WatchNamespaceEnvVar, operatorNS)
})

AfterEach(func() {
	os.Unsetenv(infrastructure.WatchNamespaceEnvVar)
})

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
rg -nP 'os\.(Setenv|Unsetenv)\(' pkg/secrets/backup_test.go
rg -nP 'WatchNamespaceEnvVar|BeforeEach|AfterEach' pkg/secrets/backup_test.go

Repository: devfile/devworkspace-operator

Length of output: 437


🏁 Script executed:

cd pkg/secrets && sed -n '279,290p' backup_test.go

Repository: devfile/devworkspace-operator

Length of output: 448


Handle errors from os.Setenv and os.Unsetenv, and restore prior environment state

Lines 282 and 286 ignore the errors returned by os.Setenv() and os.Unsetenv(), which violates the error-handling requirement. In addition, AfterEach unconditionally unsets the environment variable instead of restoring the value it had before the test, which can break test isolation. Save the original value before the test and restore it in AfterEach, checking and handling the errors from each call.

🧰 Tools
🪛 golangci-lint (2.12.2)

[error] 282-282: Error return value of os.Setenv is not checked

(errcheck)


[error] 286-286: Error return value of os.Unsetenv is not checked

(errcheck)

Comment thread pkg/secrets/backup.go
akurinnoy and others added 2 commits May 13, 2026 17:18
Return an error instead of silently returning nil when
infrastructure.GetNamespace() fails on the restore path. This makes
auth failures visible immediately rather than causing a confusing
image pull error later.

Also properly save and restore the WATCH_NAMESPACE env var in tests.

Assisted-by: Claude Code

Signed-off-by: Oleksii Kurinnyi <okurynny@redhat.com>
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Signed-off-by: Oleksii Kurinnyi <okurinny@redhat.com>
Signed-off-by: Oleksii Kurinnyi <okurinny@redhat.com>

@coderabbitai coderabbitai Bot left a comment


Actionable comments posted: 2

♻️ Duplicate comments (1)
pkg/secrets/backup_test.go (1)

286-287: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Handle and assert errors from env mutation calls.

os.Setenv / os.Unsetenv errors are still ignored in setup/teardown, which breaks errcheck and weakens test isolation guarantees.

#!/bin/bash
# Verify unchecked env mutation calls in this test file
rg -n -C2 'os\.(Setenv|Unsetenv)\(' pkg/secrets/backup_test.go

As per coding guidelines, "Don't ignore errors. Always handle or propagate errors explicitly."

Also applies to: 290-294

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@pkg/secrets/backup_test.go` around lines 286 - 287, The test currently
ignores errors from os.Setenv/os.Unsetenv (e.g., the call setting
infrastructure.WatchNamespaceEnvVar to operatorNS), which fails errcheck; update
the test to either use t.Setenv(...) (preferred) or check the returned error and
call t.Fatalf/require.NoError to fail the test on failure, and do the same for
the corresponding Unsetenv calls (and other occurrences around the same block at
the 290-294 region). Ensure you reference the environment variable symbol
infrastructure.WatchNamespaceEnvVar and the operatorNS value when updating the
setup/teardown so errors are handled/asserted.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@pkg/secrets/backup_test.go`:
- Around line 120-129: The test is flaky because it assumes WATCH_NAMESPACE is
unset and real infrastructure detection; make it deterministic by explicitly
setting the env var to an empty string (or saving and restoring it) within the
test and by initializing test infrastructure via
infrastructure.InitializeForTesting() so the test does not consult real
environment/infrastructure; update the spec around the call to
secrets.HandleRegistryAuthSecret (and helper calls makeWorkspace/makeConfig if
needed) to call infrastructure.InitializeForTesting() at start and ensure
WATCH_NAMESPACE is explicitly cleared/controlled for the duration of the test,
restoring prior state afterwards.

In `@pkg/secrets/backup.go`:
- Around line 63-69: The code resolves operatorConfigNamespace unconditionally
which causes failures even when no auth is needed; change the logic so
infrastructure.GetNamespace() is only called when AuthSecret is non-empty: wrap
the operatorConfigNamespace resolution inside the branch that checks
cfg.AuthSecret (or AuthSecret variable) and only attempt to resolve/set
operatorConfigNamespace when AuthSecret != ""; apply the same change for the
later block that currently resolves namespace (the code around
operatorConfigNamespace and infrastructure.GetNamespace) so anonymous (no-auth)
flows skip namespace resolution entirely.

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: d993c0c2-89ec-4642-997c-d9ed96e27aa5

📥 Commits

Reviewing files that changed from the base of the PR and between 925f3bb and 7189cc3.

📒 Files selected for processing (3)
  • docs/adr-backup-auth-secret-lifecycle.md
  • pkg/secrets/backup.go
  • pkg/secrets/backup_test.go
✅ Files skipped from review due to trivial changes (1)
  • docs/adr-backup-auth-secret-lifecycle.md

Comment on lines +120 to 129
+It("returns error when secret is missing and operator namespace cannot be resolved", func() {
 	By("using a fake client with no secrets and no WATCH_NAMESPACE set")
 	fakeClient := fake.NewClientBuilder().WithScheme(scheme).Build()
 	workspace := makeWorkspace(workspaceNS)
 	config := makeConfig("quay-backup-auth")
 
 	result, err := secrets.HandleRegistryAuthSecret(ctx, fakeClient, workspace, config, "", scheme, log)
-	Expect(err).NotTo(HaveOccurred())
+	Expect(err).To(HaveOccurred())
+	Expect(err.Error()).To(ContainSubstring("cannot resolve operator namespace"))
 	Expect(result).To(BeNil())

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Make this failure-path test independent of ambient WATCH_NAMESPACE.

This spec assumes the env var is unset but does not enforce it locally, so it can become environment-dependent and flaky. Explicitly control env state for this case.

As per coding guidelines, "In test code, use 'infrastructure.InitializeForTesting()' to mock infrastructure type instead of relying on actual infrastructure detection".

Comment thread pkg/secrets/backup.go
Comment on lines 63 to 69
 if operatorConfigNamespace == "" {
-	return nil, nil
+	resolvedNS, nsErr := infrastructure.GetNamespace()
+	if nsErr != nil {
+		return nil, fmt.Errorf("cannot resolve operator namespace to copy registry auth secret: %w", nsErr)
+	}
+	operatorConfigNamespace = resolvedNS
 }

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Avoid resolving operator namespace before confirming auth is required.

If AuthSecret is empty, the function should proceed anonymously, but it currently tries to resolve operator namespace first and can fail early with an unrelated error. Move namespace resolution to only run when AuthSecret is non-empty.

Also applies to: 72-79
