fix: non-blocking server initialization for faster gateway startup #472
jchangx wants to merge 2 commits into
Move reloadConfiguration to a background goroutine so the transport server (stdio/sse/streaming) starts immediately instead of waiting for all MCP servers to respond. Unreachable servers (e.g. VPN-only endpoints like grafana-remote and sigma-remote) previously caused ~60s startup delays due to 30s transport timeouts.

Also add a 15s per-server timeout in listCapabilities to prevent a single slow/unreachable server from blocking all other servers in the concurrent capability listing.

The go-sdk's Server.AddTool is thread-safe and automatically sends tools/list_changed notifications to connected clients, so tools appear progressively as each server responds.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: bcd0e0b060
    go func() {
        if err := g.reloadConfiguration(ctx, configuration, nil, nil); err != nil {
Keep initial reload synchronous in dry-run mode
Launching reloadConfiguration in a goroutine here means gateway run --dry-run can return at line 436 before any capability discovery executes, because dry-run exits immediately and the process can terminate before the background goroutine runs. This regresses dry-run from a real configuration/capability validation pass into a best-effort no-op, so broken/unreachable server configs may now appear successful and expected dry-run output (discovered tools / initialization summary) can be missing.
Move the dry-run early-return before the background goroutine so that dry-run mode runs reloadConfiguration synchronously. This ensures server configs are fully validated and discovered tools are reported before the process exits, instead of racing with a background goroutine that may never complete.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Summary
- Move reloadConfiguration to a background goroutine so the transport server starts immediately instead of blocking on all MCP servers responding
- Add a 15s per-server timeout in listCapabilities to prevent one slow/unreachable server from blocking the rest
Problem
When VPN-only servers (e.g. grafana-remote, sigma-remote at *.s.us-east-1.aws.dckr.io) are unreachable, the gateway takes ~60s to start because reloadConfiguration blocks on listCapabilities, which waits for each server's 30s transport timeout. This causes MCP clients (like Claude Code's /mcp reconnect) to time out before the gateway becomes ready.
Approach
- Non-blocking init: The transport server (stdio/sse/streaming) now starts immediately. reloadConfiguration runs in a background goroutine. The go-sdk's Server.AddTool is thread-safe and automatically sends notifications/tools/list_changed to connected clients, so tools appear progressively as each server responds.
- Per-server timeout: Each server in listCapabilities now gets a 15s context timeout. This prevents one unreachable server from consuming the full transport timeout (30s) and ensures healthy servers aren't delayed waiting for the errgroup to complete.
Risk analysis
startStdioServer; Server.AddTool thread safety; reloadConfiguration; initialize, giving background load time; tools/list response; tools/list_changed notification triggers re-fetch
Test plan
- go build ./pkg/gateway/... compiles cleanly
- go test ./pkg/gateway/... -short passes
- /mcp reconnect works on first try without VPN
- tools/list_changed notification reaches Claude Code and triggers tool re-fetch
🤖 Generated with Claude Code