Skip to content

fix(helm): create sandbox JWT secret when cert-manager is enabled#1700

Merged
TaylorMutch merged 2 commits into
mainfrom
1691-cert-manager-jwt-secret/tmutch
Jun 3, 2026
Merged

fix(helm): create sandbox JWT secret when cert-manager is enabled#1700
TaylorMutch merged 2 commits into
mainfrom
1691-cert-manager-jwt-secret/tmutch

Conversation

@TaylorMutch
Copy link
Copy Markdown
Collaborator

@TaylorMutch TaylorMutch commented Jun 2, 2026

Summary

The cert-manager install path left the gateway StatefulSet unable to start because nothing created the openshell-jwt-keys Secret. cert-manager owns TLS Secrets but does not mint the sandbox JWT signing key, and the certgen hook only rendered when pkiInitJob.enabled was true. This PR separates sandbox JWT signing-key provisioning from TLS PKI provisioning so the JWT Secret always exists.

Related Issue

Closes #1691

Changes

  • certgen: add a --jwt-only mode that creates only the Opaque JWT signing Secret, for use when another controller (cert-manager) owns the TLS Secrets.
  • certgen.yaml: render the hook when pkiInitJob.enabled or certManager.enabled is true. cert-manager takes precedence and runs the hook with --jwt-only even if pkiInitJob.enabled remains true. Removes the old mutual-exclusion failure between the two values.
  • _helpers.tpl: add openshell.sandboxJwtSecretName, shared by the hook and the StatefulSet mount.
  • Docs/values/README: document the new precedence; the cert-manager install no longer needs --set pkiInitJob.enabled=false. Updated managing-certificates.mdx, openshift.mdx, architecture/gateway.md, and the debug-openshell-cluster skill.
  • Tests: new tests/certgen_test.yaml Helm unit suite (combined, JWT-only, custom secret name, both-enabled precedence) and a CLI parse test for --jwt-only.

Testing

  • mise run pre-commit passes
  • Unit tests added/updated (762 cargo unit tests; 36 Helm unit tests across 4 suites)
  • E2E tests added/updated (if applicable)

Checklist

  • Follows Conventional Commits
  • Commits are signed off (DCO)

@TaylorMutch TaylorMutch added the area:docs Documentation and examples label Jun 2, 2026
@TaylorMutch TaylorMutch added the area:cluster Related to running OpenShell on k3s/docker label Jun 2, 2026
@TaylorMutch TaylorMutch requested a review from a team as a code owner June 2, 2026 23:57
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 2, 2026

@TaylorMutch TaylorMutch added the test:e2e Requires end-to-end coverage label Jun 2, 2026
@github-actions
Copy link
Copy Markdown

github-actions Bot commented Jun 2, 2026

Label test:e2e applied for 11485e8. Open the existing run and click Re-run all jobs to execute with the label set. The run will execute the standard E2E suite after building the required gateway and supervisor images once. The matching required CI gate status on this PR will flip green automatically once the run finishes.

@TaylorMutch TaylorMutch changed the title fix(helm): create sandbox JWT secret under cert-manager fix(helm): create sandbox JWT secret when cert-manager is enabled Jun 2, 2026
@johntmyers johntmyers added the gator:in-review Gator is reviewing or awaiting PR review feedback label Jun 3, 2026
@johntmyers
Copy link
Copy Markdown
Collaborator

johntmyers commented Jun 3, 2026

gator-agent

PR Review Status

Validation: This PR is project-valid because it directly addresses #1691 by separating sandbox JWT Secret provisioning from TLS PKI provisioning in the Helm cert-manager path.
Head SHA: 8b5b09755ae32269a367f8b6cd2142c655b8a4c6

Review findings:

  • No blocking findings remain. The client-CA cert-manager precedence finding was addressed by gating built-in PKI client-CA selection with not .Values.certManager.enabled and adding statefulset_client_ca_test.yaml coverage for the regression case.

Next state: gator:watch-pipeline

Comment thread deploy/helm/openshell/templates/statefulset.yaml
@johntmyers johntmyers added gator:watch-pipeline Gator is monitoring PR CI/CD status and removed gator:in-review Gator is reviewing or awaiting PR review feedback labels Jun 3, 2026
@TaylorMutch TaylorMutch requested a review from johntmyers June 3, 2026 16:25
The cert-manager install path (certManager.enabled=true,
pkiInitJob.enabled=false) left the gateway StatefulSet unable to start
because nothing created the openshell-jwt-keys Secret: cert-manager owns
TLS Secrets but does not mint the sandbox JWT signing key, and the
certgen hook only rendered when pkiInitJob.enabled was true.

Separate JWT signing-key provisioning from TLS PKI provisioning:

- certgen: add a --jwt-only mode that creates only the Opaque JWT
  signing Secret, for use when another controller owns TLS Secrets.
- certgen.yaml: render the hook when pkiInitJob.enabled OR
  certManager.enabled is true. cert-manager takes precedence and runs
  the hook with --jwt-only even if pkiInitJob.enabled remains true.
  Remove the mutual-exclusion failure between the two values.
- _helpers.tpl: add openshell.sandboxJwtSecretName, shared by the hook
  and the StatefulSet mount.
- Update values, README, docs, architecture, and the
  debug-openshell-cluster skill to reflect the new precedence; the
  documented cert-manager install no longer needs pkiInitJob.enabled=false.

Closes #1691
The client CA volume logic treated pkiInitJob.enabled as proof that
built-in PKI owns the client CA. With cert-manager precedence now
allowing certManager.enabled=true alongside the default
pkiInitJob.enabled=true, that assumption mounts the server TLS cert
secret as the client CA and ignores
certManager.clientCaFromServerTlsSecret=false, which can break mTLS or
trust the wrong CA.

Gate the pkiInitJob.enabled term with (not certManager.enabled) in all
three client CA conditions (volume mount, volume definition, and secret
selection) so cert-manager owns TLS when enabled. Add a Helm test suite
covering built-in PKI, cert-manager shared CA, the regression config
(cert-manager + clientCaFromServerTlsSecret=false + default pkiInitJob),
and the no-client-CA case.
@TaylorMutch TaylorMutch force-pushed the 1691-cert-manager-jwt-secret/tmutch branch from 8225d9f to 8b5b097 Compare June 3, 2026 16:37
@johntmyers johntmyers added gator:blocked Gator is blocked by process or repository gates and removed gator:watch-pipeline Gator is monitoring PR CI/CD status labels Jun 3, 2026
@johntmyers
Copy link
Copy Markdown
Collaborator

johntmyers commented Jun 3, 2026

gator-agent

Blocked

Gator is blocked by an infrastructure failure in the required Helm Lint check, not by a PR code issue.

Current CI summary for head 8b5b09755ae32269a367f8b6cd2142c655b8a4c6:

  • Branch Checks: passed
  • E2E: passed
  • GPU E2E: passed / not applied
  • Helm Lint: failed

Failure: mise run helm:lint could not download oci://registry-1.docker.io/bitnamicharts/postgresql:18.6.7 because Docker Hub returned HTTP 429 toomanyrequests for the unauthenticated pull rate limit.

Next action: a maintainer should rerun the Helm Lint workflow once the registry rate limit clears, or update the CI dependency-fetch path to use authenticated/cached access. Gator will keep watching this PR.

@TaylorMutch
Copy link
Copy Markdown
Collaborator Author

gator-agent

Blocked

Gator is blocked by an infrastructure failure in the required Helm Lint check, not by a PR code issue.

Current CI summary for head 8b5b09755ae32269a367f8b6cd2142c655b8a4c6:

  • Branch Checks: passed
  • E2E: passed
  • GPU E2E: passed / not applied
  • Helm Lint: failed

Failure: mise run helm:lint could not download oci://registry-1.docker.io/bitnamicharts/postgresql:18.6.7 because Docker Hub returned HTTP 429 toomanyrequests for the unauthenticated pull rate limit.

Next action: a maintainer should rerun the Helm Lint workflow once the registry rate limit clears, or update the CI dependency-fetch path to use authenticated/cached access. Gator will keep watching this PR.

Reran, this is passing now

@johntmyers johntmyers added gator:approval-needed Gator completed review; maintainer approval needed and removed gator:blocked Gator is blocked by process or repository gates labels Jun 3, 2026
@johntmyers
Copy link
Copy Markdown
Collaborator

gator-agent

Maintainer Approval Needed

Gator validation and PR monitoring are complete.

Validation: PR #1700 is project-valid because it fixes #1691, a reproducible Helm cert-manager install failure where the gateway cannot mount the sandbox JWT signing Secret.
Review: No blocking findings remain. The client-CA cert-manager precedence issue found during gator review was addressed in head 8b5b09755ae32269a367f8b6cd2142c655b8a4c6 with template changes and Helm unit coverage.
Checks: Branch Checks, Helm Lint, and E2E are green on head 8b5b09755ae32269a367f8b6cd2142c655b8a4c6.
E2E: test:e2e is applied and Core E2E passed; GPU E2E is green/not applied.

Human maintainer approval or merge decision is now required.

@TaylorMutch TaylorMutch merged commit 5f58cb0 into main Jun 3, 2026
47 of 50 checks passed
@TaylorMutch TaylorMutch deleted the 1691-cert-manager-jwt-secret/tmutch branch June 3, 2026 19:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

area:cluster Related to running OpenShell on k3s/docker area:docs Documentation and examples gator:approval-needed Gator completed review; maintainer approval needed test:e2e Requires end-to-end coverage

Projects

None yet

Development

Successfully merging this pull request may close these issues.

bug: cert-manager Helm install does not create sandbox JWT secret

2 participants