Harden ConnectionManager#close against non-StandardError during @sock.close #69
Merged
….close A second Async::Stop fired into a fiber that's already unwinding through Base#request's ensure cleanup can escape @sock.close. The existing rescue only catches StandardError, so the non-StandardError skips @sock = nil / @pid = nil / abort_request! and the client is returned to the pool with a half-closed socket and @request_in_progress == true. Moves the state reset into an ensure so it runs regardless of what @sock.close raises.
gmalette
approved these changes
Apr 23, 2026
nherson
approved these changes
Apr 23, 2026
Summary
Follow-up to #68. Closes a second state-leak pathway on the same fiber-cancellation theme: when a non-StandardError (e.g. Async::Stop) is fired into a fiber that's already unwinding through Dalli's cleanup, it can escape @sock.close and leave the client in a dirty state that gets returned to the connection pool.

The bug
With #68 merged, Base#request now calls close from an ensure when a non-StandardError aborts a request. That fix is correct, but ConnectionManager#close is itself not fully hardened against non-StandardError escapes.

If the scheduler fires a second Async::Stop into the fiber while it's inside this cleanup (e.g. a timeout that triggers cancellation of already-cancelling work), @sock.close can raise a non-StandardError. The rescue StandardError doesn't catch it, the three reset lines after the rescue (@sock = nil, @pid = nil, abort_request!) never run, and the exception propagates. The client goes back to the ConnectionPool with @request_in_progress == true and @sock still pointing at a half-closed FD.

The next fiber that checks out the client hits confirm_ready! → close if request_in_progress?, which attempts the same close. If that is also interrupted, the dirty-client reuse loops until the scheduler stops re-cancelling. During that window, cross-response byte pollution on the socket is possible, matching the production symptom of "reads occasionally returning unexpected data."

The fix
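As a rough illustration of the change, here is a minimal sketch that contrasts resetting state after the rescue with resetting it in an ensure. Everything in it is a hypothetical stand-in (SketchConnection, CancellingSocket, FiberCancellation, and both close_* method names are invented for the example); it is not Dalli's actual code, and the "pre-fix" shape is an assumption based on the description in this PR.

```ruby
# Hypothetical stand-ins for illustration; none of this is Dalli's actual code.
class FiberCancellation < Exception; end # non-StandardError, like Async::Stop

class CancellingSocket
  # Simulates a cancellation landing while the socket is being closed.
  def close
    raise FiberCancellation, 'cancelled during close'
  end
end

class SketchConnection
  attr_reader :sock, :pid

  def initialize(sock)
    @sock = sock
    @pid = Process.pid
    @request_in_progress = true
  end

  def request_in_progress?
    @request_in_progress
  end

  def abort_request!
    @request_in_progress = false
  end

  # Assumed pre-fix shape: the reset follows the rescue, so a
  # non-StandardError from @sock.close skips it entirely.
  def close_with_trailing_reset
    begin
      @sock.close
    rescue StandardError
      nil
    end
    @sock = nil
    @pid = nil
    abort_request!
  end

  # Post-fix shape: the reset lives in an ensure and always runs; the
  # rescue is not widened, so the cancellation still propagates.
  def close_with_ensure
    @sock.close
  rescue StandardError
    nil
  ensure
    @sock = nil
    @pid = nil
    abort_request!
  end
end

# Runs the block and reports whether a FiberCancellation escaped it.
def cancellation_escaped?
  yield
  false
rescue FiberCancellation
  true
end

leaky = SketchConnection.new(CancellingSocket.new)
leaky_escaped = cancellation_escaped? { leaky.close_with_trailing_reset }

fixed = SketchConnection.new(CancellingSocket.new)
fixed_escaped = cancellation_escaped? { fixed.close_with_ensure }
```

In both cases the cancellation escapes to the caller; only the ensure variant leaves the stand-in connection with its socket and pid cleared and no request marked in progress.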
Move the state reset into an ensure, so @sock = nil / @pid = nil / abort_request! always run regardless of what @sock.close raises. The non-StandardError continues to propagate (we don't widen the rescue), so callers still see the cancellation.

Test
test/integration/test_fiber_concurrency.rb grows a second case that:

- calls dc.set
- stubs @sock.close to raise FiberCancellation (Exception subclass, stand-in for Async::Stop)
- stubs read_line to raise FiberCancellation (first cancellation, triggers the ensure path from #68, "Clean up connection state on non-StandardError aborts")
- asserts the client is not request_in_progress? and not connected? after the double-exception unwinds

Verified the test fails on main (pre-fix) with the exact state leak this PR targets, and passes with the ensure applied.

Test plan
- bundle exec rake test passes
- bundle exec rubocop clean
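For readers who want to see the double-cancellation unwind in isolation, the sketch below models it without Dalli or Async. DoubleCancelClient and FiberCancellation here are hypothetical; the request method only imitates the cleanup-on-abort behaviour described for #68, and the simulated close always raises, which a real socket would not.

```ruby
# Hypothetical model of the double-cancellation unwind; not Dalli's real classes.
class FiberCancellation < Exception; end # stand-in for Async::Stop

class DoubleCancelClient
  attr_reader :close_attempts

  def initialize
    @sock = :fake_socket
    @request_in_progress = false
    @close_attempts = 0
  end

  def connected?
    !@sock.nil?
  end

  def request_in_progress?
    @request_in_progress
  end

  # Models a request whose read is cancelled; the ensure imitates the
  # cleanup-on-abort behaviour described for #68.
  def request
    @request_in_progress = true
    read_line
    @request_in_progress = false
  ensure
    close if request_in_progress?
  end

  # First cancellation: lands while the request is reading.
  def read_line
    raise FiberCancellation, 'first cancellation (during read)'
  end

  # Second cancellation: lands during the simulated socket close.
  # The ensure still resets state before the exception propagates.
  def close
    @close_attempts += 1
    raise FiberCancellation, 'second cancellation (during close)'
  ensure
    @sock = nil
    @request_in_progress = false
  end
end

client = DoubleCancelClient.new
escaped = begin
  client.request
  nil
rescue FiberCancellation => e
  e
end
```

After the unwind, the caller still receives a FiberCancellation (the one raised during close supersedes the first), while the model client reports neither request_in_progress? nor connected?.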