fix(mcp-bridge): terminate backend process on client disconnect and bound concurrent sessions #13482

Open
shreemaan-abhishek wants to merge 1 commit into apache:master from shreemaan-abhishek:fix/mcp-bridge-session-cleanup

@shreemaan-abhishek

Contributor

Description

The mcp-bridge plugin spawns a backend process per SSE connection and relies on the SSE session loop returning to tear that process down. Today the loop ends only when a keepalive write fails, which can take up to two 30s ping cycles and, if the connection keeps absorbing writes, may never happen. As a result, a backend process started for a client that has already gone away can stay alive until the worker reloads. Separately, there is no ceiling on how many sessions a worker keeps open, so a route can spawn a backend process for every connection that is opened.

This PR makes session teardown deterministic and bounds concurrency:

  • Register an ngx.on_abort handler so a client disconnect stops the session promptly instead of waiting for the next keepalive write to fail.
  • Make the ping loop wake early once the session has been asked to stop, so the backend process is released without waiting out the keepalive interval.
  • Always run teardown (backend process + broker state) and free the session slot through a guarded path, regardless of how the loop ended.
  • Add a per-worker concurrent-session ceiling via a new max_sessions config field (default 100); connections beyond the ceiling get 429.
  • Enable lua_check_client_abort in the main http block so on_abort is usable.
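
The control flow in the first three bullets can be sketched roughly as follows. This is an illustrative sketch, not the code in this PR: `run_session` and every field of `env` are hypothetical names, injected so the loop can be exercised outside OpenResty. In a real handler, `env.on_abort` would be `ngx.on_abort` (which requires `lua_check_client_abort on`), and `env.wait` / `env.wake` could, for example, map to a timed wait and a post on an `ngx.semaphore`.

```lua
-- Hypothetical sketch, not the PR's implementation. `env` bundles what the
-- loop needs:
--   env.on_abort(cb)  register a disconnect callback (ngx.on_abort in OpenResty)
--   env.wait(secs)    block for up to `secs`, returning early when woken
--   env.wake()        wake a pending env.wait
--   env.send_ping()   write one keepalive; returns false on write failure
--   env.teardown()    stop the backend process and clear broker state
--   env.release()     free the session slot
local PING_INTERVAL = 30  -- seconds, the keepalive interval from the description

local function run_session(env)
    local stopping = false

    -- A client disconnect flips the flag and wakes the loop immediately,
    -- instead of waiting for a later keepalive write to fail.
    env.on_abort(function()
        stopping = true
        env.wake()
    end)

    -- Run the loop under pcall so a runtime error cannot skip cleanup.
    local ok, err = pcall(function()
        while not stopping do
            env.wait(PING_INTERVAL)
            if stopping then
                break
            end
            if not env.send_ping() then
                break  -- keepalive write failed: the pre-existing exit path
            end
        end
    end)

    -- Guarded teardown: each step gets its own pcall so a failure in one
    -- does not prevent the other from running.
    pcall(env.teardown)
    pcall(env.release)

    return ok, err
end
```

Whichever way the loop exits (abort, failed write, or error), both cleanup steps run exactly once, which is the property the third bullet describes.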

The concurrent-session bookkeeping is factored into a small apisix/plugins/mcp/session_limit.lua module so it can be unit-tested.
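
As a rough illustration of what such bookkeeping could look like, here is a minimal pure-Lua counter. The names (`SessionLimit`, `new`, `acquire`, `release`, `active`) are assumptions made for this sketch and are not taken from the actual session_limit.lua. Because each nginx worker runs its own Lua VM, a plain in-memory counter like this is naturally per-worker.

```lua
-- Hypothetical sketch of per-worker session bookkeeping; the API shown
-- here is illustrative, not the module added by this PR.
local SessionLimit = {}
SessionLimit.__index = SessionLimit

-- max_sessions falls back to 100, the default named in the description.
function SessionLimit.new(max_sessions)
    return setmetatable({ max = max_sessions or 100, count = 0 }, SessionLimit)
end

-- Try to take a slot. Returns true on success and false when the worker
-- is already at its ceiling (the caller would then answer with 429).
function SessionLimit:acquire()
    if self.count >= self.max then
        return false
    end
    self.count = self.count + 1
    return true
end

-- Give a slot back. Clamped at zero so an accidental double release
-- cannot drive the count negative and silently raise the ceiling.
function SessionLimit:release()
    if self.count > 0 then
        self.count = self.count - 1
    end
end

-- Number of sessions currently holding a slot.
function SessionLimit:active()
    return self.count
end

-- Usage: with a ceiling of 2, the third connection is refused until
-- an earlier session releases its slot.
local limit = SessionLimit.new(2)
assert(limit:acquire())
assert(limit:acquire())
assert(not limit:acquire())
limit:release()
assert(limit:acquire())
```

Keeping the counter free of `ngx` calls is what makes it straightforward to unit-test in isolation.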

Behaviour changes

  • lua_check_client_abort is now enabled globally in the main http block. This means long-running Lua handler phases are terminated when the client disconnects. Proxied requests were already aborted by nginx in this case, so the practical effect is limited to Lua streaming handlers, which now stop work when the client goes away.
  • New optional config field max_sessions (integer, default 100). Existing configs keep working unchanged; the default applies when the field is omitted.
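
To make the default semantics concrete, the new field might be declared along these lines. This is a sketch under assumptions: the `minimum = 1` bound and the helper `effective_max_sessions` are hypothetical and not taken from the PR; only the field name, its integer type, and the default of 100 come from the description above.

```lua
-- Hypothetical schema fragment for the new field. `minimum = 1` is an
-- assumption; the description only states "integer, default 100".
local DEFAULT_MAX_SESSIONS = 100

local schema_fragment = {
    type = "object",
    properties = {
        max_sessions = {
            type = "integer",
            minimum = 1,
            default = DEFAULT_MAX_SESSIONS,
        },
    },
}

-- Resolve the effective ceiling for a plugin config table. Schema
-- validation would normally be expected to fill in the default; this
-- helper only spells out the intended behaviour.
local function effective_max_sessions(conf)
    if conf.max_sessions == nil then
        return DEFAULT_MAX_SESSIONS
    end
    return conf.max_sessions
end
```

An existing config that omits `max_sessions` therefore behaves as if it had been written with `max_sessions = 100`.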

Which issue(s) this PR fixes:

Fixes #

Checklist

  • I have explained the need for this PR and the problem it solves
  • I have explained the changes or the new features added to this PR
  • I have added tests corresponding to this change
  • I have updated the documentation to reflect this change
  • I have verified that this change is backward compatible (If not, please discuss on the APISIX mailing list first)

…ound concurrent sessions

The SSE session loop only ended when a keepalive write failed, which can
take two 30s ping cycles or never happen, so a backend process spawned for
a disconnected client could stay alive until worker reload. There was also
no ceiling on concurrent sessions, so a route could keep spawning backend
processes for as many connections as were opened.

- register an ngx.on_abort handler so a client disconnect stops the session
  promptly instead of waiting for the next keepalive write to fail
- make the ping loop wake early once the session is asked to stop
- always run teardown (process + broker) and free the session slot via a
  guarded path, regardless of how the loop ended
- add a per-worker concurrent-session ceiling (max_sessions, default 100);
  excess SSE connections get 429
- enable lua_check_client_abort in the main http block so on_abort works

Behaviour change: lua_check_client_abort is now on globally, so long-running
Lua handler phases are terminated when the client disconnects (proxied
requests were already aborted by nginx). New config field max_sessions.

dosubot (bot) added the size:L label (This PR changes 100-499 lines, ignoring generated files.) on Jun 8, 2026