
Use eventfd for wakeup and enable IORING_SETUP_SINGLE_ISSUER#166

Open
samuel-williams-shopify wants to merge 1 commit into main from uring-eventfd-wakeup

Conversation

@samuel-williams-shopify
Contributor

Motivation

The current wakeup() implementation submits a NOP SQE from the waking thread directly into the ring's submission queue. This cross-thread SQ access prevents enabling IORING_SETUP_SINGLE_ISSUER, a kernel 6.0+ flag that tells the kernel only one thread will ever submit SQEs, allowing it to skip internal SQ locking on every submit call.

Changes

eventfd-based wakeup

Instead of posting a NOP from the waking thread, the selector now:

  1. Creates an eventfd(EFD_CLOEXEC | EFD_NONBLOCK) at initialization.
  2. Before each blocking io_uring_wait_cqe_timeout, the owner thread registers a one-shot poll_add on the eventfd (user_data = NULL, silently discarded on completion).
  3. wakeup() — called from any thread — just writes an 8-byte value of 1 to the eventfd (`write(wakeup_fd, &one, sizeof(one))`). No ring access from the waking thread.
  4. The kernel sees the eventfd readable, completes the poll_add, and io_uring_wait_cqe_timeout returns.
  5. After unblocking, the owner thread drains the eventfd so the next poll_add doesn't fire immediately.

IORING_SETUP_SINGLE_ISSUER

Because the SQ is now only ever touched by the owner thread, io_uring_queue_init is called with IORING_SETUP_SINGLE_ISSUER when the flag is available (guarded by #ifdef, defined in liburing ≥ 2.3 / kernel ≥ 6.0). The supported_p probe uses the same flags.

Benchmark workflow fix

benchmark.yaml was missing liburing-dev, so benchmarks silently fell back to the epoll backend. Added the apt install step so benchmarks now actually exercise URing.

Testing

The existing wakeup tests in test/io/event/selector.rb cover the correctness of cross-thread wakeup. CI will run them on Ubuntu with liburing-dev as usual.

The benchmark workflow (benchmark.yaml) will now also run against the URing backend, providing throughput numbers to compare against the main branch.

Made with Cursor

@samuel-williams-shopify samuel-williams-shopify force-pushed the uring-eventfd-wakeup branch 4 times, most recently from 07f1949 to dc2593b Compare May 9, 2026 07:07
@samuel-williams-shopify
Contributor Author

Benchmark results (Linux 6.19, Ruby 3.4.8, dedicated machine)

Methodology: two git worktrees built independently — main (NOP wakeup) vs this PR (IO_Event_Interrupt async read + SINGLE_ISSUER + DEFER_TASKRUN). Each HTTP measurement is a 2-second wrk run; 7 runs are averaged.

Wakeup microbenchmark (benchmark/selector_wakeup.rb)

Cross-thread roundtrip: time from wakeup() call on another thread to select() returning on the owner thread.

| Backend | main | PR | Δ |
| ------- | ---- | -- | - |
| URing | 9.32 µs | 10.21 µs | +9% |
| EPoll | 11.19 µs | 10.84 µs | noise |
| Select | 21.06 µs | 18.36 µs | noise |

Idle wakeup cost (calling wakeup() when selector is not blocking):

| Backend | main | PR |
| ------- | ---- | -- |
| URing | 1.54 µs | 1.53 µs |

The ~1 µs roundtrip gap vs NOP is the irreducible cost of the eventfd write going through the kernel and completing a pending async read, rather than a NOP arriving at io_uring_wait_cqe_timeout directly. DEFER_TASKRUN closes most of this gap (was ~2.6 µs without it).

HTTP benchmark (benchmark/server/event.rb, fiber-per-connection)

| | main (NOP) | PR | Δ |
| --- | --- | --- | --- |
| Run 1 | 14,854 | 14,821 | |
| Run 2 | 14,085 | 14,592 | |
| Run 3 | 14,171 | 14,240 | |
| Run 4 | 13,774 | 14,768 | |
| Run 5 | 14,057 | 14,544 | |
| Run 6 | 14,717 | 14,627 | |
| Run 7 | 14,965 | 14,566 | |
| Average | 14,375 req/s | 14,594 req/s | +1.5% |

The DEFER_TASKRUN flag benefits the entire completion path, not just wakeup — which is why the HTTP throughput improves even though the pure wakeup latency is similar.

Replace the NOP-SQE cross-thread wakeup with an async read on an
IO_Event_Interrupt (eventfd on Linux, pipe elsewhere):

- wakeup() calls IO_Event_Interrupt_signal() — a plain write() that never
  touches the ring's SQ, making IORING_SETUP_SINGLE_ISSUER safe to use.
- Before each blocking wait the owner thread submits an async read on the
  interrupt descriptor; the read completes when wakeup() fires, consuming
  the counter atomically with no separate drain step.
- IORING_SETUP_SINGLE_ISSUER (kernel 6.0+): only the owner thread submits
  SQEs, allowing the kernel to skip internal SQ locking.
- IORING_SETUP_DEFER_TASKRUN (kernel 6.1+, requires SINGLE_ISSUER): defers
  io_uring task work to the application thread, reducing cross-CPU signalling
  overhead across the entire completion path (+~1-2% on HTTP benchmarks).

Both flags are guarded by #ifdef and degrade gracefully on older kernels.

Co-authored-by: Cursor <cursoragent@cursor.com>