Skip to content

Fix GC compaction safety and write barriers in worker_pool.c#172

Open
samuel-williams-shopify wants to merge 1 commit intomainfrom
fix-worker-pool-write-barriers
Open

Fix GC compaction safety and write barriers in worker_pool.c#172
samuel-williams-shopify wants to merge 1 commit intomainfrom
fix-worker-pool-write-barriers

Conversation

@samuel-williams-shopify
Copy link
Copy Markdown
Contributor

@samuel-williams-shopify samuel-williams-shopify commented May 10, 2026

Main bug: stale raw C pointer after GC compaction

worker_pool_call extracted a raw C pointer (rb_fiber_scheduler_blocking_operation_t *) from the blocking_operation TypedData object before yielding the fiber. The compacting GC can move that TypedData object while the calling fiber is suspended, leaving a stale pointer that the worker thread then dereferences — crashing.

Fix: store the Ruby VALUE alongside the raw pointer and register all four Work struct VALUEs (blocking_operation_value, scheduler, blocker, fiber) as precise GC roots via rb_gc_register_address. The compacting GC updates these in-place when objects move. The worker thread re-extracts the raw pointer from the (now-updated) VALUE right before use.

Additional: RUBY_TYPED_WB_PROTECTED + compact function

Without WB_PROTECTED, RB_OBJ_WRITE (already present at worker creation) installed no write barrier. Adding the flag makes it functional. A new worker_pool_compact function updates worker->thread via rb_gc_location so thread objects can be moved by the compacting GC.

Additional: rb_gc_mark_movable for thread objects

Changed from rb_gc_mark (which pins objects, preventing compaction) to rb_gc_mark_movable so thread objects are eligible for compaction.

@samuel-williams-shopify samuel-williams-shopify force-pushed the fix-worker-pool-write-barriers branch 6 times, most recently from 8e19e94 to 40ceaca Compare May 10, 2026 03:21
@samuel-williams-shopify samuel-williams-shopify changed the title Fix write barriers in worker_pool.c Fix GC compaction safety and write barriers in worker_pool.c May 10, 2026
@samuel-williams-shopify samuel-williams-shopify force-pushed the fix-worker-pool-write-barriers branch from 40ceaca to e620883 Compare May 10, 2026 03:24
1. RB_GC_GUARD(_blocking_operation) in worker_pool_call
   The _blocking_operation VALUE argument is on the calling fiber's C stack.
   The GC finds it via conservative scan, marks the underlying TypedData
   object, and prevents collection — keeping the raw C pointer extracted
   before the fiber switch valid throughout the worker's execution.
   RB_GC_GUARD ensures the compiler doesn't treat the VALUE as dead before
   the end of the function.

2. RUBY_TYPED_WB_PROTECTED + compact function
   Without WB_PROTECTED, RB_OBJ_WRITE (used at worker creation) installed
   no write barrier. A new compact function updates worker->thread via
   rb_gc_location for proper compacting GC support.

3. rb_gc_mark_movable for thread objects
   Changed from rb_gc_mark (which pins) to rb_gc_mark_movable so threads
   can be relocated by the compacting GC.

Co-authored-by: Cursor <cursoragent@cursor.com>
@samuel-williams-shopify samuel-williams-shopify force-pushed the fix-worker-pool-write-barriers branch from e620883 to 2e301ec Compare May 10, 2026 04:27
samuel-williams-shopify added a commit to samuel-williams-shopify/ruby that referenced this pull request May 10, 2026
Extract the raw operation pointer before rb_funcall so it is obtained
while the GVL is held and no fiber switch has occurred yet.

Place RB_GC_GUARD(blocking_operation) after the last implicit use of the
VALUE — all accesses via the derived `operation` pointer — so the
compiler cannot treat blocking_operation as dead before this point,
keeping it reachable as a GC root through rb_funcall and through all
subsequent uses of the raw pointer.

See: socketry/io-event#172
Co-authored-by: Cursor <cursoragent@cursor.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant