bug: gateway service fails to start on Ubuntu 22.04 and TLS dir mismatch #1593

@zongqichen

Agent Diagnostic

  • Loaded debug-openshell-cluster skill
  • Read deploy/deb/openshell-gateway.service and crates/openshell-server/src/defaults.rs on main
  • Ran systemctl --user show openshell-gateway. ExecStartPre resolved %S/openshell/tls to ~/.config/openshell/tls
  • Ran ls ~/.local/state/openshell/tls, which is empty. That's where default_local_tls_dir() looks via openshell_state_dir()
  • Checked man 5 systemd.unit on Ubuntu 22.04. %S for user managers is $XDG_CONFIG_HOME on systemd <250, changed to $XDG_STATE_HOME in systemd 250
  • Tried a drop-in with explicit --tls-cert, --tls-key, --tls-client-ca pointing at ~/.config/openshell/tls. Gateway started cleanly, so the certs are fine, only the lookup path is wrong
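The directory comparison in the steps above can be sketched as a small shell check. This is only an illustration: `count_files` is a hypothetical helper, and the fallbacks `~/.config` and `~/.local/state` are the usual XDG defaults, assumed here for when the variables are unset.

```shell
# Hypothetical helper: count regular files directly inside a directory.
# Prints 0 when the directory does not exist.
count_files() {
    dir=$1
    if [ -d "$dir" ]; then
        find "$dir" -maxdepth 1 -type f | wc -l | tr -d ' '
    else
        echo 0
    fi
}

# Assumed XDG fallbacks when the variables are unset.
config_tls="${XDG_CONFIG_HOME:-$HOME/.config}/openshell/tls"
state_tls="${XDG_STATE_HOME:-$HOME/.local/state}/openshell/tls"

echo "config dir ($config_tls): $(count_files "$config_tls") file(s)"
echo "state dir  ($state_tls): $(count_files "$state_tls") file(s)"
```

On an affected machine, the config directory would be expected to hold the generated files while the state directory reports 0.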

Description

Fresh install on Ubuntu 22.04 fails to start: Error: × --tls-cert is required when TLS is enabled (use --disable-tls to skip)

I expected the gateway to come up cleanly, since ExecStartPre runs generate-certs and default_local_tls_dir() is supposed to auto-detect them.

The unit's ExecStartPre writes certs to %S/openshell/tls, while the binary looks under $XDG_STATE_HOME/openshell/tls. On systemd 249 (Ubuntu 22.04), %S in a user unit resolves to $XDG_CONFIG_HOME, so the two end up in different places (~/.config/openshell/tls vs ~/.local/state/openshell/tls). systemd 250 changed %S for user managers to $XDG_STATE_HOME, which is probably why this wasn't caught: anyone on Ubuntu 24.04 or a newer Fedora won't see it.
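A minimal sketch of the version-dependent resolution described above, to make the mismatch easy to check on a given machine. The function names are hypothetical and not part of OpenShell, and the `~/.config` and `~/.local/state` fallbacks are assumed XDG defaults.

```shell
# Hypothetical helper: directory that %S expands to in a *user* unit,
# given the systemd major version (behavior as described in this issue).
user_state_root() {
    if [ "$1" -lt 250 ]; then
        printf '%s\n' "${XDG_CONFIG_HOME:-$HOME/.config}"
    else
        printf '%s\n' "${XDG_STATE_HOME:-$HOME/.local/state}"
    fi
}

# Where ExecStartPre writes the certs: %S/openshell/tls
unit_tls_dir() {
    printf '%s/openshell/tls\n' "$(user_state_root "$1")"
}

# Where the binary looks: $XDG_STATE_HOME/openshell/tls
binary_tls_dir() {
    printf '%s/openshell/tls\n' "${XDG_STATE_HOME:-$HOME/.local/state}"
}

# Usage (needs systemctl on PATH):
#   major=$(systemctl --version | awk 'NR==1 {print $2}')
#   [ "$(unit_tls_dir "$major")" = "$(binary_tls_dir)" ] || echo "TLS dir mismatch"
```

With the default XDG variables, the two directories differ for any major version below 250 and match from 250 onward.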

Reproduction Steps

curl -LsSf https://raw.githubusercontent.com/NVIDIA/OpenShell/main/install.sh | sh

Environment

  • WSL2 (Ubuntu 22.04.5 LTS)
  • systemd: 249.11-0ubuntu3.19
  • Docker: 27.3.1
  • OpenShell: 0.0.49 (installed via dpkg -i openshell_0.0.49-1_amd64.deb from the GitHub release)

Logs

May 27 16:03:28 systemd[969]: Starting OpenShell Gateway...
May 27 16:03:28 openshell-gateway[570674]: 2026-05-27T14:03:28.627272Z  INFO openshell_server::certgen: PKI files created. dir=***/.config/openshell/tls
May 27 16:03:28 systemd[969]: Started OpenShell Gateway.
May 27 16:03:28 openshell-gateway[570695]: Error:   × --tls-cert is required when TLS is enabled (use --disable-tls to skip)
May 27 16:03:28 systemd[969]: openshell-gateway.service: Main process exited, code=exited, status=1/FAILURE
May 27 16:03:28 systemd[969]: openshell-gateway.service: Failed with result 'exit-code'.
May 27 16:03:33 systemd[969]: openshell-gateway.service: Scheduled restart job, restart counter is at 1.
May 27 16:03:33 systemd[969]: Stopped OpenShell Gateway.

Agent-First Checklist

  • I pointed my agent at the repo and had it investigate this issue
  • I loaded relevant skills (e.g., debug-openshell-cluster, debug-inference, openshell-cli)
  • My agent could not resolve this — the diagnostic above explains why

Metadata

Labels

  • area:gateway (Gateway server and control-plane work)
  • state:agent-ready (Approved for agent implementation)
  • state:pr-opened (PR has been opened for this issue)
  • topic:compatibility (Compatibility-related work)
