fix(cli): restart stale clients after updates #35455

Open
kitlangton wants to merge 1 commit into v2 from service-version-guard

Closes #35448

Summary

  • prevent older clients from starting, stopping, or reloading over a healthy newer daemon
  • compare generated preview build and retry suffixes numerically, so that next-9999 sorts before next-15000
  • gracefully tear down a stale TUI and ask the Node launcher to run the currently installed binary with the same arguments
  • bound launcher-driven restart to one attempt so a failed update cannot create another loop
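
The numeric ordering in the second bullet can be illustrated with a minimal sketch. It is not the code in this PR: `parseParts` and `compareVersions` are hypothetical names, only the `0.0.0-next-N` shape quoted in this description is covered, and full semver precedence rules are not implemented.

```typescript
// Hypothetical helpers for illustration only; not taken from the PR diff.
type Part = number | string

// Split "0.0.0-next-14901" into [0, 0, 0, "next", 14901].
function parseParts(version: string): Part[] {
  return version
    .split(/[.\-+]/)
    .filter((p) => p.length > 0)
    .map((p) => (/^\d+$/.test(p) ? Number(p) : p))
}

// Returns a negative number if a < b, positive if a > b, and 0 if equal.
// All-digit parts are compared as numbers, everything else as strings.
function compareVersions(a: string, b: string): number {
  const pa = parseParts(a)
  const pb = parseParts(b)
  const len = Math.max(pa.length, pb.length)
  for (let i = 0; i < len; i++) {
    // Simplification: a version that runs out of parts first sorts lower.
    if (i >= pa.length) return -1
    if (i >= pb.length) return 1
    const x = pa[i]
    const y = pb[i]
    if (typeof x === "number" && typeof y === "number") {
      if (x !== y) return x < y ? -1 : 1
    } else {
      const sx = String(x)
      const sy = String(y)
      if (sx !== sy) return sx < sy ? -1 : 1
    }
  }
  return 0
}
```

A plain string comparison would place `next-9999` after `next-15000` because `"9"` sorts after `"1"`; comparing the suffix as a number avoids that.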

Incident

A 0.0.0-next-14901 TUI remained open after the installed package advanced to 0.0.0-next-14928. When its SSE connection dropped, strict discovery rejected the newer daemon. Service.start() then treated the mismatch as replaceable, terminated the daemon, and spawned process.execPath.

That path now resolved to the 14928 binary, so the stale process could never produce the 14901 daemon it was waiting for. Other TUIs raced to replace each terminated daemon, and registration losers self-evicted on the 10-second ownership check.

Over 23 minutes this produced 366 daemon starts, 377 interrupted event streams, 1,633 watcher-stop records, three durable "Step interrupted" failures in the reported session, and additional aborted provider work. Most of the warning volume was repeated startup/shutdown noise; the material failure was the loss of process-local session execution.

Behavior

| Client and daemon | Result |
| --- | --- |
| Same version | Reuse the daemon |
| Newer client | Existing replacement behavior |
| Older client | Preserve the daemon and restart the client |
| No healthy daemon | Start normally |
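
The table above can be read as a single decision function. The sketch below is an assumption about shape, not the PR's implementation: `Action`, `decideAction`, and `byTrailingNumber` are hypothetical names, and an `undefined` daemon version stands in for "no healthy daemon".

```typescript
// Hypothetical names for illustration only; not taken from the PR diff.
type Action = "reuse-daemon" | "replace-daemon" | "restart-client" | "start-daemon"

function decideAction(
  clientVersion: string,
  daemonVersion: string | undefined, // undefined: no healthy daemon was found
  compare: (a: string, b: string) => number,
): Action {
  if (daemonVersion === undefined) return "start-daemon"
  const order = compare(clientVersion, daemonVersion)
  if (order === 0) return "reuse-daemon"
  // Newer client: the existing replacement behavior is kept.
  if (order > 0) return "replace-daemon"
  // Older client: leave the daemon alone and restart the client instead.
  return "restart-client"
}

// Toy comparator for the example: orders versions by their trailing integer.
function byTrailingNumber(a: string, b: string): number {
  const n = (v: string) => Number(v.slice(v.lastIndexOf("-") + 1))
  return n(a) - n(b)
}
```

With the versions from the incident, `decideAction("0.0.0-next-14901", "0.0.0-next-14928", byTrailingNumber)` yields `"restart-client"`, which is the row that previously fell through to replacement.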

The TUI restart uses exit code 75 after normal renderer/plugin teardown. The existing Node launcher handles that code by launching its resolved binary once more with the original arguments and inherited terminal streams. A second 75 is propagated instead of looping.
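
The one-attempt bound can be sketched as follows. This is a simplified model under stated assumptions: `runWithSingleRestart` is a hypothetical name, and `run` stands in for whatever the real launcher does to start the resolved binary with the original arguments and inherited terminal streams.

```typescript
// Hypothetical sketch; the real launcher's structure is not shown in this PR description.
const RESTART_EXIT_CODE = 75

// `run` launches the resolved binary once and returns its exit code.
function runWithSingleRestart(run: () => number): number {
  const first = run()
  if (first !== RESTART_EXIT_CODE) return first
  // Exactly one relaunch. Whatever the second run returns, including
  // another 75, is propagated to the caller instead of looping.
  return run()
}
```

Because the relaunch is a straight second call rather than a loop, a client that still considers itself stale after the restart exits with 75 and stops there.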

Verification

  • repository push hook: bun turbo typecheck (31 packages)
  • bun typecheck in packages/client, packages/cli, and packages/tui
  • client service, Effect client, import-boundary, and Promise client tests: 17 passed
  • CLI service, launcher, and standalone lifecycle tests: 5 passed
  • TUI reconnect and app lifecycle tests: 8 passed
  • file-scoped Prettier, oxlint, and git diff --check
  • two read-only reviews, including a follow-up review after fixing numeric prerelease ordering

Follow-ups

This intentionally does not add a cross-process startup lock, drain active sessions before a newer client replaces an older daemon, or provide durable provider-work recovery after process death. Those are separate lifecycle changes rather than requirements for stopping this mixed-version loop.
