fix: abort in-flight request handlers on connection close#1735

Open
felixweinberger wants to merge 1 commit into main from fweinberger/abort-handlers-on-close

@felixweinberger
Contributor

Aborts active request handlers when the transport connection closes, and makes InMemoryTransport.close() idempotent.

Salvaged from #833 by @alasano, ported to the v2 package structure with the Protocol tests that were requested in review.

Motivation and Context

When a client disconnects mid-request (network failure, timeout, crash), the server's Protocol._onclose() cleans up response handlers but leaves in-flight request handlers running. Long-running operations (file uploads, external API calls, elicitation prompts) continue indefinitely, wasting resources and causing hangs.

Separately, InMemoryTransport.close() recurses through the peer and fires onclose twice on the initiating side.

Fixes #611. Supersedes #833.

How Has This Been Tested?

  • New Protocol test verifying request handler AbortSignal fires with ConnectionClosed on transport close
  • New InMemoryTransport tests for single-fire, double-close idempotency, and concurrent close from both sides
  • All 444 core tests pass

Breaking Changes

None. Request handlers that ignore ctx.mcpReq.signal are unaffected; handlers that respect it will now abort cleanly instead of running to completion after disconnect.
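As a rough illustration of what "respecting the signal" can look like, here is a minimal sketch. The function processChunks and its parameters are hypothetical and not part of the SDK; signal stands in for ctx.mcpReq.signal, and the placeholder await stands in for real work such as an upload or an external API call.

```typescript
// Hypothetical long-running handler body; `signal` stands in for ctx.mcpReq.signal.
async function processChunks(chunks: string[], signal: AbortSignal): Promise<number> {
    let processed = 0;
    for (const chunk of chunks) {
        // Check between units of work so the handler stops soon after disconnect.
        if (signal.aborted) {
            // Re-throw whatever reason the protocol layer supplied on abort.
            throw signal.reason;
        }
        await Promise.resolve(chunk); // placeholder for real async work
        processed++;
    }
    return processed;
}
```

A handler written this way runs to completion on a healthy connection and rejects with the abort reason once the signal fires. A handler that never looks at the signal behaves exactly as before.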

Types of changes

  • Bug fix (non-breaking change which fixes an issue)
  • New feature (non-breaking change which adds functionality)
  • Breaking change (fix or feature that would cause existing functionality to change)
  • Documentation update

Checklist

  • I have read the MCP Documentation
  • My code follows the repository's style guidelines
  • New and existing tests pass locally
  • I have added appropriate error handling
  • I have added or updated documentation as needed

Additional context

The abort controllers infrastructure already existed for explicit notifications/cancelled handling; this connects the same mechanism to connection close events.

Previously, request handlers would continue running after the transport
disconnected, wasting resources and preventing cleanup of long-running
operations. Protocol._onclose() now aborts all active request handler
AbortControllers with a ConnectionClosed error.

Also fixes InMemoryTransport.close() firing onclose twice on the
initiating side due to peer recursion.

Fixes #611

Co-authored-by: Aljosa Asanovic <aljosa.a@gmail.com>
@pkg-pr-new
pkg-pr-new bot commented Mar 23, 2026

@modelcontextprotocol/client

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/client@1735

@modelcontextprotocol/server

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/server@1735

@modelcontextprotocol/express

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/express@1735

@modelcontextprotocol/hono

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/hono@1735

@modelcontextprotocol/node

npm i https://pkg.pr.new/modelcontextprotocol/typescript-sdk/@modelcontextprotocol/node@1735

commit: 01003ff

@claude claude bot left a comment

LGTM — small, well-tested fix that connects existing abort controller infrastructure to the connection close path.

Extended reasoning...

Overview

This PR makes two small, focused changes: (1) in Protocol._onclose(), it aborts all in-flight request handler abort controllers when the connection closes, and (2) it makes InMemoryTransport.close() idempotent with a _closed guard. Both changes are accompanied by new tests.

Security risks

No security concerns. The changes are purely about resource cleanup on disconnect — no auth, crypto, or permission logic is involved.

Level of scrutiny

The protocol.ts change adds 6 lines that follow the exact same save-replace-iterate pattern already used for _responseHandlers cleanup 3 lines above. The InMemoryTransport change is a standard idempotency guard (3 lines). Both are mechanical and low-risk. Of the two bug reports filed, one is a pre-existing issue (timeout handle leak in _onclose) and the other is a very low-impact edge case in a test utility (InMemoryTransport.close missing try-finally); neither should block this PR.

Other factors

The abort controller infrastructure (_requestHandlerAbortControllers map, creation in _onrequest, cleanup in .finally(), cancellation via notifications/cancelled) already existed — this PR simply connects the same mechanism to the close event. Test coverage is good: the protocol test verifies the abort signal fires with ConnectionClosed, and three InMemoryTransport tests cover single-fire, double-close, and concurrent close scenarios.
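The save-replace-iterate pattern mentioned above can be sketched in isolation. MiniProtocol, track, untrack, and activeCount are hypothetical names for this illustration; only the map name and the 'Connection closed' message come from the PR, and a plain Error stands in for the SdkError the real code constructs.

```typescript
// Simplified stand-in for Protocol; illustrative only, not the SDK implementation.
class MiniProtocol {
    private _requestHandlerAbortControllers = new Map<number, AbortController>();

    // Called when a request arrives; returns the signal handed to the handler.
    track(requestId: number): AbortSignal {
        const controller = new AbortController();
        this._requestHandlerAbortControllers.set(requestId, controller);
        return controller.signal;
    }

    // Called when a handler settles, so finished requests are no longer tracked.
    untrack(requestId: number): void {
        this._requestHandlerAbortControllers.delete(requestId);
    }

    onclose(): void {
        // Save the current map and replace it before iterating, so abort
        // listeners that re-enter this object see a clean, empty state.
        const controllers = this._requestHandlerAbortControllers;
        this._requestHandlerAbortControllers = new Map();

        // Assumption: plain Error in place of SdkError(SdkErrorCode.ConnectionClosed, ...).
        const error = new Error('Connection closed');
        for (const controller of controllers.values()) {
            controller.abort(error);
        }
    }

    get activeCount(): number {
        return this._requestHandlerAbortControllers.size;
    }
}
```

Swapping the map out before iterating means a listener that throws or re-enters cannot leave the object holding half-cleaned state.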

Comment on lines 42 to 48
async close(): Promise<void> {
    if (this._closed) return;
    this._closed = true;

    const other = this._otherTransport;
    this._otherTransport = undefined;
    await other?.close();

🟡 If await other?.close() throws (e.g., the peer's onclose callback throws), this.onclose?.() on line 48 is never reached. Since _closed is already set to true on line 43, subsequent calls to close() return immediately on line 42, permanently preventing Protocol._onclose() from firing. Wrapping in try { await other?.close(); } finally { this.onclose?.(); } would ensure onclose always fires.

Extended reasoning...

Bug analysis

The new close() implementation sets this._closed = true on line 43 before calling await other?.close() on line 47. If the peer transport's close() throws — which happens when the peer's onclose callback (i.e., Protocol._onclose()) throws — the exception propagates out of close() before reaching this.onclose?.() on line 48.

Code path

When Protocol.connect() wires up the transport, it sets transport.onclose to call Protocol._onclose(). So this.onclose?.() in InMemoryTransport.close() is what triggers Protocol's cleanup of response handlers, abort controllers, and state. If this line is skipped, that cleanup never happens.

Step-by-step proof

  1. Transport A calls close(). _closed is set to true, _otherTransport is saved and cleared.
  2. await other?.close() calls Transport B's close(). Transport B sets its own _closed = true.
  3. Transport B calls this.onclose?.(), which invokes Protocol._onclose() on B's side.
  4. Suppose B's Protocol._onclose() throws (e.g., a response handler callback or abort controller listener throws).
  5. The exception propagates back through await other?.close() in Transport A's close().
  6. Transport A's this.onclose?.() on line 48 is never reached.
  7. Calling A.close() again hits the guard if (this._closed) return; on line 42 and returns immediately.
  8. Transport A's Protocol._onclose() is never called — response handlers, abort controllers, and other state are never cleaned up.

Regression from this PR

Before this PR, there was no _closed guard. If the first close() threw, calling close() again would work: _otherTransport was already undefined (cleared in step 1), so await undefined?.close() would be a no-op, and this.onclose?.() would fire successfully on retry.

Impact

Low. InMemoryTransport is a test utility, and Protocol._onclose() is not expected to throw in practice. The old code had its own bug (double-firing onclose) which this PR correctly fixes — the missing try-finally is a minor gap in the fix.

Fix

Wrap the peer close call in try-finally:

try { await other?.close(); } finally { this.onclose?.(); }

This ensures onclose fires regardless of whether the peer's close throws.
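To make the scenario concrete, here is a self-contained sketch of a paired transport with the guard and the try-finally combined. PairedTransport and createPair are hypothetical names for this illustration and do not reproduce InMemoryTransport; only the shape of close() follows the snippet under review.

```typescript
// Illustrative paired transport; not the SDK's InMemoryTransport.
class PairedTransport {
    onclose?: () => void;
    private _closed = false;
    private _other?: PairedTransport;

    // Hypothetical factory that links two transports to each other.
    static createPair(): [PairedTransport, PairedTransport] {
        const a = new PairedTransport();
        const b = new PairedTransport();
        a._other = b;
        b._other = a;
        return [a, b];
    }

    async close(): Promise<void> {
        // Idempotency guard: a second call is a no-op.
        if (this._closed) return;
        this._closed = true;

        const other = this._other;
        this._other = undefined;
        try {
            await other?.close();
        } finally {
            // Runs even if the peer's close() rejects, so local cleanup is never skipped.
            this.onclose?.();
        }
    }
}
```

With this shape, a peer whose onclose throws still causes close() to reject on the initiating side, but the initiating side's own onclose has already fired exactly once by then.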

Comment on lines 732 to 738

const requestHandlerAbortControllers = this._requestHandlerAbortControllers;
this._requestHandlerAbortControllers = new Map();

const error = new SdkError(SdkErrorCode.ConnectionClosed, 'Connection closed');

this._transport = undefined;

🟣 Pre-existing: _onclose() does not clear _timeoutInfo (leaking active setTimeout handles that keep the event loop alive for up to 60s after disconnect) or _requestResolvers (minor memory leak of stale resolver entries). Since this PR is already extending the cleanup pattern in _onclose() with abort controllers, consider also iterating _timeoutInfo to clearTimeout() each entry and clearing both maps for completeness.

Extended reasoning...

What the bug is

The _onclose() method (line 726) cleans up several internal maps when the connection closes: _responseHandlers, _progressHandlers, _taskProgressTokens, _pendingDebouncedNotifications, and now (with this PR) _requestHandlerAbortControllers. However, two maps are conspicuously missing from this cleanup: _timeoutInfo and _requestResolvers.

The _timeoutInfo leak (the substantive issue)

When an outgoing request is made via _requestWithSchema(), a timeout timer is set up via _setupTimeout() (around line 1306). This stores a setTimeout handle in _timeoutInfo. Normally, when a response arrives, _onresponse() calls _cleanupTimeout(messageId) which calls clearTimeout() and removes the entry.

However, in _onclose(), the response handlers are called directly with the ConnectionClosed error — they call reject(error) but do NOT call _cleanupTimeout(). The _timeoutInfo map is never iterated or cleared.

Step-by-step proof

  1. Client sends a request with the default 60s timeout → _setupTimeout() stores a handle in _timeoutInfo
  2. Connection closes before the response arrives → _onclose() fires
  3. _onclose() saves and clears _responseHandlers, then iterates them calling handler(error) which calls reject()
  4. The setTimeout handle in _timeoutInfo is NOT cleared
  5. Up to 60 seconds later, the timer fires and calls cancel(), which: deletes from already-empty _responseHandlers (no-op), tries this._transport?.send() (no-op since transport is undefined), and calls reject() on an already-settled promise (no-op)
  6. During those 60 seconds, the active timer handle keeps the Node.js event loop alive, preventing clean process exit

The _requestResolvers leak (minor)

_requestResolvers stores resolvers for task-related requests queued via the task message system. When _onclose() fires, the promises are correctly rejected through the responseHandlers loop, but the stale resolver entries remain in the map. This is a minor memory leak with no functional impact since message IDs increment monotonically, making collisions impossible.

Impact

The _timeoutInfo leak is the more impactful issue: in serverless functions, test runners, or CLI tools, the leaked timer handles keep the event loop alive, so the process cannot exit on its own for up to 60 seconds after a connection close unless it calls process.exit() explicitly. This can cause test suite hangs or delayed Lambda invocation cleanup.

Suggested fix

Add to _onclose(), alongside the existing cleanup:

for (const info of this._timeoutInfo.values()) {
    clearTimeout(info.timeoutId);
}
this._timeoutInfo.clear();
this._requestResolvers.clear();

This is pre-existing — not introduced by this PR — but the PR is directly modifying _onclose() and extending its cleanup pattern, making this a natural place to address it.
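The effect of that cleanup can be seen in a small standalone model of the timeout bookkeeping. TimeoutRegistry and its method names are hypothetical; only the idea of a map holding setTimeout handles that must be cleared on close is taken from the comment above.

```typescript
// Illustrative model of per-request timeout tracking; not the SDK's internals.
type TimeoutInfo = { timeoutId: ReturnType<typeof setTimeout> };

class TimeoutRegistry {
    private _timeoutInfo = new Map<number, TimeoutInfo>();

    // Start a timer for an outgoing request.
    setup(messageId: number, ms: number, onTimeout: () => void): void {
        const timeoutId = setTimeout(() => {
            this._timeoutInfo.delete(messageId);
            onTimeout();
        }, ms);
        this._timeoutInfo.set(messageId, { timeoutId });
    }

    // Normal path: a response arrived, so cancel that request's timer.
    cleanup(messageId: number): void {
        const info = this._timeoutInfo.get(messageId);
        if (info) {
            clearTimeout(info.timeoutId);
            this._timeoutInfo.delete(messageId);
        }
    }

    // Close path: cancel every outstanding timer so none keeps the event loop alive.
    closeAll(): void {
        for (const info of this._timeoutInfo.values()) {
            clearTimeout(info.timeoutId);
        }
        this._timeoutInfo.clear();
    }

    get pending(): number {
        return this._timeoutInfo.size;
    }
}
```

If closeAll() skipped the clearTimeout() calls, a script using this class with 60-second timers would stay alive until the timers fired, even though nothing useful remained to do.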

Development

Successfully merging this pull request may close these issues.

Request handlers not cancelled when transport connection closes unexpectedly