## Problem
In stateless mode (`sessionIdGenerator: undefined`), the recommended pattern — including the SDK's own `simpleStatelessStreamableHttp.ts` example — creates a full `McpServer` + `Protocol` + `StreamableHTTPServerTransport` on every HTTP request:
```typescript
app.post('/mcp', async (req, res) => {
  const server = getServer(); // new McpServer per request
  const transport = new StreamableHTTPServerTransport({ sessionIdGenerator: undefined });
  await server.connect(transport);
  await transport.handleRequest(req, res, req.body);
  res.on('close', () => { transport.close(); server.close(); });
});
```
Each request allocates:

- `McpServer` → `Server` → `Protocol`: 9 Maps/Sets (`_requestHandlers`, `_responseHandlers`, `_progressHandlers`, `_notificationHandlers`, `_requestHandlerAbortControllers`, `_timeoutInfo`, `_pendingDebouncedNotifications`, `_taskProgressTokens`, `_requestResolvers`), plus a `_loggingLevels` Map
- `Server`: a new `AjvJsonSchemaValidator` (compiles JSON schemas)
- `StreamableHTTPServerTransport` → `WebStandardStreamableHTTPServerTransport`: 3 Maps (`_streamMapping`, `_requestToStreamMapping`, `_requestResponseMap`), plus `getRequestListener` from `@hono/node-server`
This works fine for low-traffic dev/demo scenarios. But for production HTTP servers handling sustained concurrent traffic, V8's GC can't reclaim these objects fast enough, and memory grows steadily until the process is OOMKilled.
## Real-world impact
We run an MCP server (`platform-mcp-gateway`) in production on Kubernetes with a 1200Mi memory limit. Using this pattern, memory grew ~1-2% per hour until it hit the limit, triggering repeated OOMKill alerts. The service has been running for months — this is a slow leak, not a burst.
## Benchmark
We benchmarked the per-request `McpServer` approach vs. a lightweight JSON-RPC dispatcher that reuses the same handler functions (2,000 requests, `--expose-gc`):
| Metric | McpServer per request | Lightweight dispatcher | Delta |
| --- | --- | --- | --- |
| Throughput | 2,797 req/s | 6,536 req/s | 2.3x faster |
| Heap growth | +3.78 MB | +1.41 MB | 2.7x less |
| Per-request retained | ~1,984 bytes | ~738 bytes | -63% |
## Why you can't just reuse a `McpServer`
The obvious fix — share one `McpServer` across concurrent requests — doesn't work because `Protocol.connect(transport)` replaces `this._transport`. If requests A and B overlap:
1. `connect(transportA)` → sets `this._transport = transportA`
2. `connect(transportB)` → sets `this._transport = transportB`
3. Request A's `onmessage` fires → `_onrequest` captures `this._transport` (now `transportB`) → response goes to the wrong client
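The clobbering can be reproduced with a stripped-down model of the shared-field pattern (simplified for illustration; this is not the SDK's actual internals):

```typescript
// Simplified model of the race: the server keeps one "current" transport
// on an instance field, so a second connect() replaces the first before
// request A's handler has produced its response.
interface FakeTransport { name: string }

class SharedFieldServer {
  private _transport?: FakeTransport;

  connect(transport: FakeTransport): void {
    this._transport = transport; // clobbers any previously connected transport
  }

  // The handler reads this._transport at response time, not receive time.
  transportForResponse(): string {
    return this._transport!.name;
  }
}

const server = new SharedFieldServer();
server.connect({ name: 'transportA' }); // request A arrives
server.connect({ name: 'transportB' }); // request B overlaps
console.log(server.transportForResponse()); // prints "transportB" — A's reply is misrouted
```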
## Suggestions
- **Lightweight stateless mode**: for stateless servers, the full Protocol/Transport stack is overkill — there's no session state, no SSE streaming needed, no server-initiated notifications. A `StatelessMcpServer` (or a flag on `McpServer`) could skip all the per-request infrastructure and just dispatch JSON-RPC directly.
- **Fix the `connect()` transport race**: if `_onrequest` captured the transport from the `onmessage` callback's closure (the transport that received the message) instead of from `this._transport`, a single `McpServer` could safely handle concurrent stateless requests.
- **At minimum, document the trade-off**: the stateless example should note that creating a server per request has significant overhead at scale and suggest alternatives for production deployments.
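The second suggestion amounts to binding the transport in the `onmessage` closure rather than reading an instance field at response time. A hedged sketch of the shape (types and names here are hypothetical, not the SDK's):

```typescript
// Sketch of the closure-capture fix: each connect() wires the incoming
// transport directly into the message callback, so a reply always goes
// out on the transport that delivered the request, even if another
// connect() happens in between.
interface Transport {
  send(msg: string): void;
  onmessage?: (msg: string) => void;
}

class ClosureSafeServer {
  connect(transport: Transport): void {
    // `transport` is captured per connection; no shared this._transport.
    transport.onmessage = (msg) => {
      transport.send(this.handle(msg));
    };
  }

  private handle(msg: string): string {
    return `echo:${msg}`;
  }
}
```

With this shape, connecting a second transport no longer redirects the first transport's replies.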
## Environment
- `@modelcontextprotocol/sdk`: 1.29.0
- Node.js: 24.x
- Runtime: Kubernetes pods (1200Mi limit, `--max-old-space-size=900`)