bug: Kubernetes user namespace E2E uses incompatible kubectl setup #1597

@alangou

Agent Diagnostic

Loaded skill: create-github-issue.

Investigation findings:

  • The user_namespaces Rust E2E target is declared in e2e/rust/Cargo.toml with required-features = ["e2e-kubernetes"].
  • On main, mise run e2e:kubernetes did not normally execute this target because e2e/rust/e2e-kubernetes.sh defaulted to e2e,e2e-host-gateway, without e2e-kubernetes.
  • When the target is selected, its setup is incompatible with the repo-controlled Kubernetes E2E flow. e2e/rust/tests/user_namespaces.rs invokes kubectl as docker exec openshell-cluster-openshell kubectl. By contrast, e2e/with-kube-gateway.sh, the CI kind workflow, and the local k3d flow all use host kubectl --context ... and never create a Docker container named openshell-cluster-openshell.
  • Even after replacing that hardcoded docker exec, the test mutates the gateway StatefulSet with kubectl set env, which triggers a gateway rollout while with-kube-gateway.sh is holding kubectl port-forward sessions to the gateway service/statefulset. That can break the CLI endpoint used by openshell sandbox create and surface as a generic sandbox creation timeout.
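
For illustration, a context-aware kubectl helper could look roughly like the sketch below. It runs kubectl on the host and adds --context only when a context is configured. The environment variable name OPENSHELL_E2E_KUBE_CONTEXT and both function names are hypothetical; the real wrapper may pass the context to tests differently.

```rust
use std::process::Command;

/// Hypothetical variable name; the real wrapper may expose the context differently.
const KUBE_CONTEXT_ENV: &str = "OPENSHELL_E2E_KUBE_CONTEXT";

/// Build the kubectl argument list, prepending `--context <ctx>` when a
/// non-empty context is given.
fn kubectl_args(context: Option<&str>, args: &[&str]) -> Vec<String> {
    let mut full = Vec::new();
    if let Some(ctx) = context.filter(|c| !c.is_empty()) {
        full.push("--context".to_string());
        full.push(ctx.to_string());
    }
    full.extend(args.iter().map(|a| a.to_string()));
    full
}

/// Run kubectl on the host (no docker exec) and return its stdout,
/// or an error message that includes stderr.
fn kubectl(args: &[&str]) -> Result<String, String> {
    let context = std::env::var(KUBE_CONTEXT_ENV).ok();
    let output = Command::new("kubectl")
        .args(kubectl_args(context.as_deref(), args))
        .output()
        .map_err(|e| format!("failed to spawn kubectl: {e}"))?;
    if output.status.success() {
        Ok(String::from_utf8_lossy(&output.stdout).into_owned())
    } else {
        Err(format!(
            "kubectl {:?} failed: {}",
            args,
            String::from_utf8_lossy(&output.stderr).trim()
        ))
    }
}
```

Keeping the argument construction in a pure function makes it checkable without a cluster; only kubectl itself touches the environment and spawns a process.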

Relevant files:

  • e2e/rust/tests/user_namespaces.rs
  • e2e/rust/e2e-kubernetes.sh
  • e2e/with-kube-gateway.sh
  • .github/workflows/e2e-kubernetes-test.yml

Description

Actual behavior: When the Kubernetes user namespace E2E test is selected, it is likely to fail before exercising user namespace behavior because it shells out through a hardcoded Docker container name that the standard CI/local Kubernetes setup does not create. If that is fixed directly, the test can still be flaky or fail because it restarts the gateway in the middle of the run and disrupts active port-forwards.

Expected behavior: The user namespace E2E test should run against the same Kubernetes setup as the rest of mise run e2e:kubernetes, using the configured kube context and installing the gateway with user namespaces enabled before the test command starts. It should not perform a mid-test gateway rollout.
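
As a rough sketch of the "enable before the test starts" decision: the wrapper itself is a shell script, so the Rust below only illustrates the logic, not the actual implementation. It assumes an opt-in switch such as the OPENSHELL_E2E_KUBE_ENABLE_USER_NAMESPACES=1 variable proposed under Suggested Fix; the function name is hypothetical.

```rust
/// Return the extra Helm arguments to append at install time.
/// `switch` is the value of the (proposed, not yet existing) opt-in
/// environment variable; only the exact value "1" enables the feature.
fn user_namespace_helm_args(switch: Option<&str>) -> Vec<&'static str> {
    match switch {
        Some("1") => vec!["--set", "server.enableUserNamespaces=true"],
        _ => Vec::new(),
    }
}
```

Because the value is applied at install time, the gateway starts with user namespaces already enabled and no rollout is needed once port-forwards are active.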

Reproduction Steps

  1. Ensure the user_namespaces target is selected by running Kubernetes E2E with the e2e-kubernetes feature, or by targeting the test directly.
  2. Run the Kubernetes E2E path against the repo-managed kind/k3d setup.
  3. Observe that the test's kubectl helper attempts docker exec openshell-cluster-openshell kubectl ... instead of using the configured kube context.
  4. If the kubectl helper is patched ad hoc, observe that kubectl set env statefulset/openshell performs a gateway rollout during the test and can break the wrapper's port-forwarded gateway endpoint.

Environment

  • OS: Linux CI / local Kubernetes E2E hosts
  • Kubernetes E2E setup: e2e/with-kube-gateway.sh using kind in CI or k3d/existing context locally
  • OpenShell: current main and branches that select e2e-kubernetes by default

Logs

Expected failure mode when selected under the standard setup:

kubectl [..] failed: Error response from daemon: No such container: openshell-cluster-openshell

Potential follow-on failure mode after replacing the hardcoded Docker exec:

sandbox <name> did not appear within 60s

Suggested Fix

  • Replace the test-local docker exec openshell-cluster-openshell kubectl helper with a shared E2E helper that uses the same kube context as with-kube-gateway.sh.
  • Add a wrapper-level way to install the Helm chart with server.enableUserNamespaces=true before tests start, for example an OPENSHELL_E2E_KUBE_ENABLE_USER_NAMESPACES=1 switch that appends the appropriate Helm --set.
  • Remove the mid-test kubectl set env / gateway rollout from user_namespaces.rs.
  • Keep the test focused on creating a sandbox, waiting for the Sandbox CR/pod, inspecting spec.hostUsers=false and the expected capabilities, and cleaning up.
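
The narrowed test could be built on two small pieces, sketched below under assumptions: a generic poll-until-ready loop for waiting on the Sandbox CR/pod, and a check on the pod's spec.hostUsers value as printed by a jsonpath query. Both function names are hypothetical, and the capability checks are left out because the expected set is not listed in this issue.

```rust
use std::thread::sleep;
use std::time::{Duration, Instant};

/// Poll `check` until it returns true or `timeout` elapses.
/// `check` always runs at least once, even with a zero timeout.
fn wait_until(timeout: Duration, interval: Duration, mut check: impl FnMut() -> bool) -> bool {
    let deadline = Instant::now() + timeout;
    loop {
        if check() {
            return true;
        }
        if Instant::now() >= deadline {
            return false;
        }
        sleep(interval);
    }
}

/// Interpret the output of a query such as
/// `kubectl get pod <name> -o jsonpath={.spec.hostUsers}`.
/// An unset field prints as empty output, which means the pod uses the
/// host user namespace (the Kubernetes default), so only an explicit
/// "false" counts as user namespaces being enabled.
fn host_users_disabled(jsonpath_output: &str) -> bool {
    jsonpath_output.trim() == "false"
}
```

Treating empty output as "not enabled" avoids a false pass when the gateway was installed without the user namespace setting.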

Metadata

Labels: state:triage-needed (Opened without agent diagnostics and needs triage)
