fix: abort in-flight request handlers on connection close #1735
felixweinberger wants to merge 1 commit into main
Conversation
Previously, request handlers would continue running after the transport disconnected, wasting resources and preventing cleanup of long-running operations. `Protocol._onclose()` now aborts all active request handler AbortControllers with a `ConnectionClosed` error. Also fixes `InMemoryTransport.close()` firing `onclose` twice on the initiating side due to peer recursion.

Fixes #611

Co-authored-by: Aljosa Asanovic <aljosa.a@gmail.com>
@modelcontextprotocol/client
@modelcontextprotocol/server
@modelcontextprotocol/express
@modelcontextprotocol/hono
@modelcontextprotocol/node
LGTM — small, well-tested fix that connects existing abort controller infrastructure to the connection close path.
Overview
This PR makes two small, focused changes: (1) in Protocol._onclose(), it aborts all in-flight request handler abort controllers when the connection closes, and (2) it makes InMemoryTransport.close() idempotent with a _closed guard. Both changes are accompanied by new tests.
Security risks
No security concerns. The changes are purely about resource cleanup on disconnect — no auth, crypto, or permission logic is involved.
Level of scrutiny
The protocol.ts change adds 6 lines that follow the exact same save-replace-iterate pattern already used for _responseHandlers cleanup 3 lines above. The InMemoryTransport change is a standard idempotency guard (3 lines). Both are mechanical and low-risk. The two bug reports found are pre-existing issues (timeout handle leak in _onclose) and a very low-impact edge case in a test utility (InMemoryTransport.close missing try-finally), neither of which are regressions from this PR.
Other factors
The abort controller infrastructure (_requestHandlerAbortControllers map, creation in _onrequest, cleanup in .finally(), cancellation via notifications/cancelled) already existed — this PR simply connects the same mechanism to the close event. Test coverage is good: the protocol test verifies the abort signal fires with ConnectionClosed, and three InMemoryTransport tests cover single-fire, double-close, and concurrent close scenarios.
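A minimal sketch of that save-replace-iterate pattern; `ProtocolSketch`, `trackRequest`, and the plain `Error` below are illustrative stand-ins for the SDK internals, not its actual types:

```typescript
// Sketch of the save-replace-iterate cleanup pattern described above.
class ProtocolSketch {
    private _requestHandlerAbortControllers = new Map<number, AbortController>();

    // Register an in-flight request handler and hand back its abort signal.
    trackRequest(id: number): AbortSignal {
        const controller = new AbortController();
        this._requestHandlerAbortControllers.set(id, controller);
        return controller.signal;
    }

    // On connection close: save the map, replace it with a fresh one so
    // abort callbacks cannot mutate it mid-iteration, then abort everything.
    _onclose(): void {
        const controllers = this._requestHandlerAbortControllers;
        this._requestHandlerAbortControllers = new Map();
        const error = new Error('Connection closed');
        for (const controller of controllers.values()) {
            controller.abort(error);
        }
    }
}
```

Replacing the map before iterating is the same defensive move the existing `_responseHandlers` cleanup makes: any handler that registers or cancels during the abort callbacks touches the fresh map, not the one being drained.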
```ts
async close(): Promise<void> {
    if (this._closed) return;
    this._closed = true;

    const other = this._otherTransport;
    this._otherTransport = undefined;
    await other?.close();
```
🟡 If await other?.close() throws (e.g., the peer's onclose callback throws), this.onclose?.() on line 48 is never reached. Since _closed is already set to true on line 43, subsequent calls to close() return immediately on line 42, permanently preventing Protocol._onclose() from firing. Wrapping in try { await other?.close(); } finally { this.onclose?.(); } would ensure onclose always fires.
Bug analysis
The new close() implementation sets this._closed = true on line 43 before calling await other?.close() on line 47. If the peer transport's close() throws — which happens when the peer's onclose callback (i.e., Protocol._onclose()) throws — the exception propagates out of close() before reaching this.onclose?.() on line 48.
Code path
When Protocol.connect() wires up the transport, it sets transport.onclose to call Protocol._onclose(). So this.onclose?.() in InMemoryTransport.close() is what triggers Protocol's cleanup of response handlers, abort controllers, and state. If this line is skipped, that cleanup never happens.
Step-by-step proof
1. Transport A calls `close()`. `_closed` is set to `true`, `_otherTransport` is saved and cleared.
2. `await other?.close()` calls Transport B's `close()`. Transport B sets its own `_closed = true`.
3. Transport B calls `this.onclose?.()`, which invokes `Protocol._onclose()` on B's side.
4. Suppose B's `Protocol._onclose()` throws (e.g., a response handler callback or abort controller listener throws).
5. The exception propagates back through `await other?.close()` in Transport A's `close()`.
6. Transport A's `this.onclose?.()` on line 48 is never reached.
7. Calling `A.close()` again hits the guard `if (this._closed) return;` on line 42 and returns immediately.
8. Transport A's `Protocol._onclose()` is never called — response handlers, abort controllers, and other state are never cleaned up.
Regression from this PR
Before this PR, there was no _closed guard. If the first close() threw, calling close() again would work: _otherTransport was already undefined (cleared in step 1), so await undefined?.close() would be a no-op, and this.onclose?.() would fire successfully on retry.
Impact
Low. InMemoryTransport is a test utility, and Protocol._onclose() is not expected to throw in practice. The old code had its own bug (double-firing onclose) which this PR correctly fixes — the missing try-finally is a minor gap in the fix.
Fix
Wrap the peer close call in try-finally:

```ts
try {
    await other?.close();
} finally {
    this.onclose?.();
}
```

This ensures `onclose` fires regardless of whether the peer's close throws.
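The guarded-close-plus-try-finally shape can be exercised end to end. The class below is a self-contained stand-in mirroring the `InMemoryTransport` shape discussed above (a hypothetical sketch, not the SDK's implementation):

```typescript
// Sketch of an idempotent close() that still fires onclose when the peer throws.
class LinkedTransportSketch {
    onclose?: () => void;
    private _closed = false;
    private _other?: LinkedTransportSketch;

    static createLinkedPair(): [LinkedTransportSketch, LinkedTransportSketch] {
        const a = new LinkedTransportSketch();
        const b = new LinkedTransportSketch();
        a._other = b;
        b._other = a;
        return [a, b];
    }

    async close(): Promise<void> {
        if (this._closed) return; // idempotency guard: second close is a no-op
        this._closed = true;
        const other = this._other;
        this._other = undefined; // break the link before recursing into the peer
        try {
            await other?.close();
        } finally {
            this.onclose?.(); // always fires, even if the peer's close() throws
        }
    }
}
```

Closing either side closes both, each side's `onclose` fires exactly once, and a throwing peer callback no longer suppresses the initiator's `onclose`.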
```ts
const requestHandlerAbortControllers = this._requestHandlerAbortControllers;
this._requestHandlerAbortControllers = new Map();

const error = new SdkError(SdkErrorCode.ConnectionClosed, 'Connection closed');

this._transport = undefined;
```
🟣 Pre-existing: _onclose() does not clear _timeoutInfo (leaking active setTimeout handles that keep the event loop alive for up to 60s after disconnect) or _requestResolvers (minor memory leak of stale resolver entries). Since this PR is already extending the cleanup pattern in _onclose() with abort controllers, consider also iterating _timeoutInfo to clearTimeout() each entry and clearing both maps for completeness.
What the bug is
The _onclose() method (line 726) cleans up several internal maps when the connection closes: _responseHandlers, _progressHandlers, _taskProgressTokens, _pendingDebouncedNotifications, and now (with this PR) _requestHandlerAbortControllers. However, two maps are conspicuously missing from this cleanup: _timeoutInfo and _requestResolvers.
The _timeoutInfo leak (the substantive issue)
When an outgoing request is made via _requestWithSchema(), a timeout timer is set up via _setupTimeout() (around line 1306). This stores a setTimeout handle in _timeoutInfo. Normally, when a response arrives, _onresponse() calls _cleanupTimeout(messageId) which calls clearTimeout() and removes the entry.
However, in _onclose(), the response handlers are called directly with the ConnectionClosed error — they call reject(error) but do NOT call _cleanupTimeout(). The _timeoutInfo map is never iterated or cleared.
Step-by-step proof
1. Client sends a request with the default 60s timeout → `_setupTimeout()` stores a handle in `_timeoutInfo`.
2. Connection closes before the response arrives → `_onclose()` fires.
3. `_onclose()` saves and clears `_responseHandlers`, then iterates them calling `handler(error)`, which calls `reject()`.
4. The `setTimeout` handle in `_timeoutInfo` is NOT cleared.
5. Up to 60 seconds later, the timer fires and calls `cancel()`, which: deletes from the already-empty `_responseHandlers` (no-op), tries `this._transport?.send()` (no-op since the transport is undefined), and calls `reject()` on an already-settled promise (no-op).
6. During those 60 seconds, the active timer handle keeps the Node.js event loop alive, preventing clean process exit.
The _requestResolvers leak (minor)
_requestResolvers stores resolvers for task-related requests queued via the task message system. When _onclose() fires, the promises are correctly rejected through the responseHandlers loop, but the stale resolver entries remain in the map. This is a minor memory leak with no functional impact since message IDs increment monotonically, making collisions impossible.
Impact
The _timeoutInfo leak is the more impactful issue: in serverless functions, test runners, or CLI tools, the leaked timer handles prevent process.exit() from completing naturally for up to 60 seconds after a connection close. This can cause test suite hangs or delayed Lambda invocation cleanup.
Suggested fix
Add to `_onclose()`, alongside the existing cleanup:

```ts
for (const info of this._timeoutInfo.values()) {
    clearTimeout(info.timeoutId);
}
this._timeoutInfo.clear();
this._requestResolvers.clear();
```

This is pre-existing — not introduced by this PR — but the PR is directly modifying `_onclose()` and extending its cleanup pattern, making this a natural place to address it.
Aborts active request handlers when the transport connection closes, and makes `InMemoryTransport.close()` idempotent.

Salvaged from #833 by @alasano, ported to the v2 package structure with the Protocol tests that were requested in review.
Motivation and Context
When a client disconnects mid-request (network failure, timeout, crash), the server's `Protocol._onclose()` cleans up response handlers but leaves in-flight request handlers running. Long-running operations (file uploads, external API calls, elicitation prompts) continue indefinitely, wasting resources and causing hangs.

Separately, `InMemoryTransport.close()` recurses through the peer and fires `onclose` twice on the initiating side.

Fixes #611. Supersedes #833.
How Has This Been Tested?
- `AbortSignal` fires with `ConnectionClosed` on transport close

Breaking Changes
None. Request handlers that ignore `ctx.mcpReq.signal` are unaffected; handlers that respect it will now abort cleanly instead of running to completion after disconnect.

Types of changes
Checklist
Additional context
The abort controller infrastructure already existed for explicit `notifications/cancelled` handling; this connects the same mechanism to connection close events.
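For illustration, a handler that honors its abort signal bails out as soon as the signal fires. The loop and names here are a hypothetical sketch, not the SDK's handler API:

```typescript
// A long-running handler that checks its AbortSignal between units of work.
// signal.throwIfAborted() throws the abort reason (e.g., a ConnectionClosed error).
async function longRunningHandler(signal: AbortSignal): Promise<string> {
    for (let step = 0; step < 10; step++) {
        signal.throwIfAborted(); // stop promptly once the connection closes
        await new Promise(resolve => setTimeout(resolve, 5)); // stand-in for real work
    }
    return 'done';
}
```

A handler written this way aborts cleanly when the protocol layer fires its controller on disconnect, instead of running to completion against a dead connection.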