Fix blocking_operation_wait to always execute the operation#170

Open
samuel-williams-shopify wants to merge 1 commit into main from fix-blocking-operation-wait


samuel-williams-shopify commented May 9, 2026

On Ruby head/4.1, IO#close calls the fiber scheduler's blocking_operation_wait hook for every close. The operation must be executed — returning without calling operation.call leaves Ruby's C-level state inconsistent, causing a NULL-pointer crash:

test/tcp_socket.rb:48: [BUG] Segmentation fault at 0x0000000000000000
get_blocking_operation (scheduler.c:123)
rb_fiber_scheduler_blocking_operation_wait (scheduler.c:1104)
io_close_fptr (io.c:5765)

Root cause

TestScheduler#blocking_operation_wait was conditionally defined (only when WorkerPool is available) and called @worker_pool&.call(operation). When @worker_pool was nil this returned nil silently, never executing the operation. Ruby then crashed trying to retrieve the result from the unexecuted operation.
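The silent drop described above can be reproduced in isolation. This is a minimal sketch, not the test suite's code: the `operation` lambda is a hypothetical stand-in for the object Ruby passes to the hook, and `worker_pool` is a plain local variable standing in for `@worker_pool`.

```ruby
# Hypothetical stand-in for the blocking operation Ruby hands to the hook.
executed = false
operation = -> { executed = true; :done }

# Stand-in for @worker_pool when WorkerPool is unavailable.
worker_pool = nil

# Safe navigation short-circuits on nil, so `call` is never invoked
# and the operation is never executed.
result = worker_pool&.call(operation)

puts "result:   #{result.inspect}"
puts "executed: #{executed.inspect}"
```

Because `&.` returns nil without raising, nothing in the Ruby-level hook signals that the operation was skipped.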

Fix

Always define blocking_operation_wait and always execute the operation — delegating to the worker pool when available, falling back to Thread.new { operation.call }.join otherwise.
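The shape of the fix can be sketched as follows. This is an illustration under assumptions, not the PR's actual diff: `SketchScheduler` is a hypothetical class name, and the worker pool is assumed to be any object that responds to `call(operation)`, as in the original `@worker_pool&.call(operation)`.

```ruby
# Hypothetical scheduler showing only the hook discussed in this PR.
class SketchScheduler
  # worker_pool is optional; nil means no pool is configured.
  def initialize(worker_pool = nil)
    @worker_pool = worker_pool
  end

  # Always defined, and always executes the operation.
  def blocking_operation_wait(operation)
    if @worker_pool
      # Delegate to the worker pool when one is available.
      @worker_pool.call(operation)
    else
      # Fallback: run the operation on a new thread and wait for it to finish.
      Thread.new { operation.call }.join
    end
  end
end

# Usage with a hypothetical operation; no worker pool is configured.
ran = false
SketchScheduler.new.blocking_operation_wait(-> { ran = true })
puts "ran without pool: #{ran}"
```

Either branch ends with the operation having been called before the hook returns, which is the property the description says Ruby depends on.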

Made with Cursor

samuel-williams-shopify force-pushed the fix-blocking-operation-wait branch 4 times, most recently from 606e545 to 91405bc, May 9, 2026 09:03
On Ruby head/4.1, IO#close calls the fiber scheduler's
blocking_operation_wait hook. The operation must be executed — returning
without calling operation.call leaves Ruby's internal state inconsistent
and causes a NULL-pointer crash in get_blocking_operation (scheduler.c).

Previously the hook was only defined when WorkerPool existed and silently
dropped the operation when @worker_pool was nil. Now the hook is always
defined and falls back to Thread.new { operation.call }.join when no
worker pool is configured.

Co-authored-by: Cursor <cursoragent@cursor.com>
samuel-williams-shopify force-pushed the fix-blocking-operation-wait branch from 91405bc to 7a2e3b6, May 9, 2026 09:05
samuel-williams-shopify added a commit to samuel-williams-shopify/ruby that referenced this pull request May 9, 2026
…eration_wait

rb_funcall(scheduler, :blocking_operation_wait, 1, blocking_operation) can
cause a fiber switch if the scheduler implementation calls
rb_fiber_scheduler_block (e.g. a worker-pool scheduler). When the fiber is
suspended, the C frame of rb_fiber_scheduler_blocking_operation_wait is no
longer active. In optimised builds, blocking_operation may be held only in
a machine register that the conservative GC does not scan for suspended
fibers, allowing the object to be collected before get_blocking_operation()
is called at line 1104.

RB_GC_GUARD(blocking_operation) after rb_funcall forces the compiler to keep
the VALUE on the stack (via a volatile read), ensuring it is always reachable
as a GC root regardless of register allocation.

Confirmed by GC.disable workaround in socketry/io-event#170
which prevents the crash by stopping GC during the blocking_operation_wait call.
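A diagnostic workaround of this kind might look like the sketch below. It is an assumption about the general shape, not the code used in socketry/io-event#170; `with_gc_disabled` is a hypothetical helper name.

```ruby
# Hypothetical helper: run a block with GC disabled, then restore the prior state.
def with_gc_disabled
  # GC.disable returns true if GC was already disabled before this call.
  was_disabled = GC.disable
  yield
ensure
  # Only re-enable GC if this helper was the one that disabled it.
  GC.enable unless was_disabled
end

# Usage with a hypothetical operation in place of the real blocking operation.
operation = -> { :done }
with_gc_disabled { operation.call }
```

Disabling GC only hides the symptom while the block runs, which makes it useful for confirming a GC-related cause but not as a fix.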

Co-authored-by: Cursor <cursoragent@cursor.com>
samuel-williams-shopify added a commit to samuel-williams-shopify/ruby that referenced this pull request May 9, 2026
…eration_wait

rb_funcall(scheduler, :blocking_operation_wait, 1, blocking_operation) can
cause a fiber switch if the scheduler calls rb_fiber_scheduler_block. When
the fiber is suspended, the C frame of rb_fiber_scheduler_blocking_operation_wait
is no longer active. In optimised builds (-O3 --enable-shared), blocking_operation
may be held only in a machine register not saved/scanned by the conservative GC,
allowing it to be collected. get_blocking_operation() at line 1104 then reads
freed/reused memory, crashing with rb_unexpected_object_type.

Confirmed by reproducing the crash using:
  ./configure --enable-shared --disable-install-doc --enable-yjit cppflags=-DENABLE_PATH_CHECK=0

RB_GC_GUARD(blocking_operation) after rb_funcall forces the compiler to keep
the VALUE on the stack (volatile read), ensuring the GC always finds it.

See: socketry/io-event#170
     socketry/io-event#171
Co-authored-by: Cursor <cursoragent@cursor.com>