perf(recv): AF_XDP RX path + io_engine_af_xdp vtable registration (Phase 2 PR 3) #2
Phase 2 PR 3 of the AF_XDP integration plan
(AnyVM-Tech/AnyScan plans/2026-04-27-portscan-afxdp-plan-v1.md, §3.4).
What this PR adds
- src/recv-afxdp.c — per-thread RX-only XSK + UMEM, initial FILL-ring
stocking + steady-state recycle, xsk_ring_cons__peek loop, hand-off to
the existing process_packet() so SYN-ACK/RST/UDP/ICMP accounting and
writer-queue handoff stay bit-for-bit identical to the AF_PACKET path.
- DRV+ZC → DRV-copy → SKB bind-mode fallback ladder mirroring send-afxdp.c.
- Receivers bind queue_id = config->senders + thread_id so they don't
collide with the TX path's queue_id = thread_id (a single XSK owns its
queue without XDP_SHARED_UMEM). c6in.metal channel count fits this
comfortably below the lower-half ZC cap (plan §3.5).
- include/scanner.h — declares xdp_receiver_thread under USE_AF_XDP.
- src/engine.c — registers io_engine_af_xdp vtable
({afxdp_tx_init_per_thread, xdp_sender_thread, xdp_receiver_thread,
afxdp_tx_teardown_per_thread}). pick_io_engine() now returns it instead
of NULL.
- src/conf.c — updates the not-built-with-USE_AF_XDP=1 hint to point at
the rebuild step now that PRs 2 + 3 have shipped.
- tests/io_engine_dispatch.sh — rewrites test [3] to verify dispatch is
reachable on USE_AF_XDP=1 builds and that the rebuild hint shows on
default builds.
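The vtable registration listed above can be pictured roughly as follows. This is a minimal sketch, not the code in src/engine.c: the struct layout, the `io_engine_kind` enum, the `SKETCH_USE_AF_XDP` switch, and the stub functions are assumptions made for illustration; only the names `io_engine_af_xdp` and `pick_io_engine` come from the PR.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical engine selector; the real enum is defined elsewhere in the tree. */
enum io_engine_kind { IO_ENGINE_AF_PACKET, IO_ENGINE_AF_XDP };

/* Four-slot vtable: per-thread init, TX thread, RX thread, per-thread teardown. */
struct io_engine {
    const char *name;
    int   (*init_per_thread)(void *ctx);
    void *(*tx_thread)(void *ctx);
    void *(*rx_thread)(void *ctx);
    void  (*teardown_per_thread)(void *ctx);
};

/* Stubs standing in for afxdp_tx_init_per_thread, xdp_sender_thread, etc. */
static int   stub_init(void *ctx)     { (void)ctx; return 0; }
static void *stub_tx(void *ctx)       { return ctx; }
static void *stub_rx(void *ctx)       { return ctx; }
static void  stub_teardown(void *ctx) { (void)ctx; }

/* Stand-in for the USE_AF_XDP build flag (assumption: enabled in this sketch). */
#define SKETCH_USE_AF_XDP 1

#if SKETCH_USE_AF_XDP
static const struct io_engine io_engine_af_xdp = {
    .name                = "af_xdp",
    .init_per_thread     = stub_init,
    .tx_thread           = stub_tx,
    .rx_thread           = stub_rx,
    .teardown_per_thread = stub_teardown,
};
#endif

/* Returns the engine vtable, or NULL when the engine is not built in. */
const struct io_engine *pick_io_engine(enum io_engine_kind kind)
{
    switch (kind) {
    case IO_ENGINE_AF_XDP:
#if SKETCH_USE_AF_XDP
        return &io_engine_af_xdp;
#else
        return NULL;
#endif
    default:
        return NULL; /* other engines are omitted from this sketch */
    }
}
```

With the flag off, the AF_XDP branch falls back to NULL, which is the condition the rebuild hint in src/conf.c is meant to report.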
What is deliberately NOT in this PR
- Makefile USE_AF_XDP=1 conditional (libxdp/libbpf link, source inclusion)
— Phase 2 PR C.
- AnyScan apt deps (libxdp-dev libbpf-dev libelf-dev) — Phase 2 PR C.
- systemd CAP_BPF capability + worker bundle runtime probe — Phase 2 PR C.
- Adapter --io-engine wiring + ANYSCAN_SCANNER_IO_ENGINE knob — Phase 2 PR D.
Test status
- `make` (default AF_PACKET build): passes, all 11 dispatch tests pass.
- `make USE_AF_XDP=1` (simulated by hand since the Makefile flag lands in
PR C): all 12 source files compile cleanly under -Wall -O3, full link
with -lxdp -lbpf -lelf -lz succeeds, dispatch tests pass (af_xdp branch
reaches parse_arguments OK and no longer hits the rebuild error).
- AnyGPT: cargo build --workspace clean, cargo test --workspace 33/33 pass.
- Live-bench gates that cannot be exercised in this environment (ENA
zero-copy stability per amzn-drivers#221, FILL-ring back-pressure under
sustained line-rate ingress) are documented in src/recv-afxdp.c header
comments and remain on the c6in.metal bench TODO.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: d6fb7fe636
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
    /* Receivers bind queues offset past the TX queues — see file-header
     * comment for the rationale. queue_id collisions show up as a clean
     * bind-ladder failure, not a silent dropped-packet hazard. */
    uint32_t queue_id = (uint32_t)ctx->config->senders + (uint32_t)ctx->thread_id;
Bind AF_XDP receivers on queues that can see all replies
xdp_receiver_thread forces RX sockets onto queue_id = senders + thread_id, which guarantees no receiver is attached to queues 0..senders-1 that are used by TX sockets. On NICs with RSS/default indirection across all RX queues, reply traffic hashed to those unbound queues is never processed by process_packet, causing systematic false negatives even though sending succeeds; this is a correctness issue, not just a performance tradeoff.
Confirmed correctness bug — fixed in 9b4b22a.
Root cause: loading libxdp's default xsks_map redirect program (which the RX-only sockets did) attaches it netdev-wide, so reply packets RSS-hashed back to the TX queues 0..S-1 hit the program with no XSK in xsks_map[queue_id], returned XDP_PASS, and fell into the kernel stack where nothing was listening (no AF_PACKET receiver in af_xdp mode). Silent reply loss as you described.
Fix: each sender's XSK is now a combined TX+RX socket on the same (NIC, queue). The receiver thread reaches into the matching sender's xdp_tx_state via new afxdp_state_*() accessors and consumes that XSK's RX ring. engine.c::run_scan now does r_ctx[i] = scan_ctx[i % senders] so each receiver inherits the corresponding sender's XSK pointer. UMEM is partitioned half/half (TX_FRAMES + RX_FRAMES) so the TX free-stack and the FILL ring never address the same frame — keeps SPSC ring invariants without cross-thread frame transfer at runtime.
Remaining caveat (documented, not silenced): replies hashed to NIC queues outside 0..senders-1 are still lost. That is a NIC-level RSS configuration issue, not an XSK-binding issue. Sender 0's init now logs a clear `ethtool -X <iface> equal <senders>` reminder when it binds, and PR C will wire that step into the worker-bundle install script. The file headers of `send-afxdp.c` and `recv-afxdp.c` document this for live bench.
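The half/half UMEM partition described in the fix can be sketched as simple address arithmetic. This is an illustration under assumptions, not the code in `send-afxdp.c`: the 2048-byte frame size and the 4096 + 4096 split are derived from the 16 MiB / 8192 frame budget quoted in the PR description, and the `SKETCH_*` names and helper functions are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed sizes: 16 MiB UMEM / 8192 frames = 2048 bytes per frame, split evenly. */
#define SKETCH_FRAME_SIZE 2048u
#define SKETCH_TX_FRAMES  4096u
#define SKETCH_RX_FRAMES  4096u

/* TX half occupies frames [0, TX_FRAMES). */
uint64_t tx_frame_addr(uint32_t i)
{
    assert(i < SKETCH_TX_FRAMES);
    return (uint64_t)i * SKETCH_FRAME_SIZE;
}

/* RX half occupies frames [TX_FRAMES, TX_FRAMES + RX_FRAMES). */
uint64_t rx_frame_addr(uint32_t i)
{
    assert(i < SKETCH_RX_FRAMES);
    return (uint64_t)(SKETCH_TX_FRAMES + i) * SKETCH_FRAME_SIZE;
}

/* True when a UMEM offset belongs to the RX half, i.e. may go on the FILL ring. */
int addr_is_rx(uint64_t addr)
{
    uint64_t lo = (uint64_t)SKETCH_TX_FRAMES * SKETCH_FRAME_SIZE;
    uint64_t hi = lo + (uint64_t)SKETCH_RX_FRAMES * SKETCH_FRAME_SIZE;
    return addr >= lo && addr < hi;
}
```

Because the two ranges never overlap, the sender's free-stack and the receiver's FILL ring can each stay single-producer without handing frames across threads.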
    if (afxdp_rx_init_in_thread(ctx, ctx->config, queue_id) != 0) {
        return NULL;
Fail the scan when AF_XDP receiver init cannot bind
If afxdp_rx_init_in_thread fails, the receiver thread just returns, but run_scan does not detect that loss and continues transmitting/finishing normally. In environments where AF_XDP RX bind fails (e.g., insufficient queues, missing privileges, incompatible mode), the scan can complete with empty or partial results and a success exit, which is a silent data-quality failure.
Fixed in 9b4b22a.
Added `_Atomic int fatal_error = 0` (defined in main.c, declared extern in scanner_defs.h). Threads that hit unrecoverable conditions set both `fatal_error = 1` and `stop_signal = 1` so cooperating threads bail promptly. `main()` now returns `atomic_load(&fatal_error) ? 1 : 0`, so a scan that misconfigured the RX path produces a non-zero exit instead of completing as success with empty results.
Concretely for AF_XDP: with the combined TX+RX architecture the separate "RX init" failure mode goes away (RX setup is part of `afxdp_tx_init_per_thread`, which already calls `exit(1)` on failure via the existing engine.c path). The remaining failure mode the receiver thread can hit is `ctx->xdp_tx == NULL` (sender's init didn't populate it for any reason) — `xdp_receiver_thread` now checks for that explicitly, logs a clear "shared XSK is NULL — sender init likely failed" message with a fallback hint, sets `fatal_error`, and bails.
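The flag-and-exit-code pattern described in this reply might look roughly like the following. It is a sketch under assumptions: `struct sketch_ctx`, `sketch_receiver`, and `sketch_exit_code` are hypothetical stand-ins for the real thread context, `xdp_receiver_thread`, and the return statement in `main()`, and the log text is paraphrased.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdio.h>

/* Process-wide flags, as in the reply: one for "abort", one for "stop now". */
_Atomic int fatal_error = 0;
_Atomic int stop_signal = 0;

/* Hypothetical slice of the per-thread context: only the shared XSK pointer. */
struct sketch_ctx { void *xdp_tx; };

/* Receiver entry point: bail out loudly if the sender never set up the XSK. */
void *sketch_receiver(void *arg)
{
    struct sketch_ctx *ctx = arg;
    if (ctx->xdp_tx == NULL) {
        fprintf(stderr, "af_xdp: shared XSK is NULL, sender init likely failed; "
                        "try --io-engine=af_packet\n");
        atomic_store(&fatal_error, 1); /* makes the process exit non-zero */
        atomic_store(&stop_signal, 1); /* lets cooperating threads bail promptly */
        return NULL;
    }
    /* ... the RX loop would run here ... */
    return ctx->xdp_tx;
}

/* What main() would return once all threads have been joined. */
int sketch_exit_code(void)
{
    return atomic_load(&fatal_error) ? 1 : 0;
}
```

Setting both flags matters: `stop_signal` alone would end the scan cleanly with exit code 0, which is exactly the silent failure the review flagged.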
Addresses two P1 review comments on PR #2:

1. **Queue-id assignment caused silent reply loss.** The earlier shape (TX-only XSKs on queues 0..S-1, RX-only XSKs on S..S+R-1) loaded libxdp's default xsks_map redirect program netdev-wide, so reply packets that RSS hashed back to TX queues 0..S-1 hit the program with no XSK in xsks_map[queue_id], returned XDP_PASS, and fell into the kernel stack where nothing was listening (no AF_PACKET receiver in af_xdp mode). Fix: each sender's XSK is now a *combined* TX+RX socket on the same (NIC, queue). The receiver thread reaches into the matching sender's xdp_tx_state and consumes that XSK's RX ring. engine.c::run_scan now does `r_ctx[i] = scan_ctx[i % senders]` so each receiver inherits the corresponding sender's XSK pointer.
2. **Receiver init failure was silently swallowed.** Added an _Atomic int fatal_error in main.c; threads that hit unrecoverable conditions (e.g. shared XSK is NULL because sender init failed) set it and stop_signal. main() returns non-zero when fatal_error is set so the orchestrator can distinguish "scan ran but found nothing" from "scan was misconfigured and produced no replies".

Architecture details
- xdp_tx_state extended with the RX consumer ring; UMEM is partitioned half/half (TX_FRAMES + RX_FRAMES) so the TX free-stack and FILL ring never address the same frame and the SPSC ring invariants hold without cross-thread frame transfer at runtime.
- afxdp_tx_init_per_thread now drops INHIBIT_PROG_LOAD (libxdp loads the redirect program), passes &s->rx + &s->tx to xsk_socket__create, and stocks the FILL ring with RX-half frames before pthread_create. After init the receiver is the sole producer of FILL.
- New afxdp_state_*() accessors in send-afxdp.c expose the receiver-side rings to recv-afxdp.c without leaking xdp_tx_state internals across translation units.
- Sender 0's init now logs an `ethtool -X <iface> equal <senders>` reminder because RSS reach for queues > senders-1 is still a known gap; PR C will wire this into install-worker-bundle.sh.

Tests
- Default `make` (AF_PACKET): 11/11 dispatch tests pass.
- `make USE_AF_XDP=1` (simulated via -DUSE_AF_XDP + manual link with -lxdp -lbpf -lelf -lz; Makefile flag lands in PR C): clean compile, link succeeds, 9/9 dispatch tests pass with af_xdp branch reachable.
- AnyGPT consumer: cargo build --workspace clean, cargo test --workspace passes (371 + 31 + 2 + 33 across crates, 0 failures).
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 9b4b22acbf
    int src = (config->senders > 0) ? (i % config->senders) : 0;
    r_ctx[i] = scan_ctx[src];
Enforce one AF_XDP receiver per sender queue
In run_scan, receivers are created with src = i % config->senders, but each AF_XDP receiver thread only drains a single sender-owned RX/FILL ring via ctx->xdp_tx (src/recv-afxdp.c::xdp_receiver_thread). This means senders > receivers leaves some sender queues with no RX consumer (replies on those queues are never processed), while receivers > senders attaches multiple receivers to the same ring pair, violating the documented SPSC ring model and introducing races/drops. Please fail fast for AF_XDP unless the mapping guarantees exactly one receiver per sender queue.
Fixed in 277ddc1.
`run_scan` now refuses to start when `--io-engine=af_xdp` and `senders != receivers`, with a clear error pointing at `-T N -R N` or the `af_packet` fallback. The check is gated on `#ifdef USE_AF_XDP` and only applies to the AF_XDP path because AF_PACKET / PF_RING ZC don't share per-queue ring state.
I also added a regression test for it: `tests/io_engine_dispatch.sh` now exercises `-T 2 -R 4 --io-engine=af_xdp` and asserts it exits 1 with an error mentioning the symmetric-count constraint. The dispatch suite is now 11/11 on both default and USE_AF_XDP=1 builds.
You're right to flag this — with the combined TX+RX XSK architecture each receiver is a strict SPSC consumer of one specific sender's RX/FILL ring pair, so no other ratio is safe. Failing fast is the right call vs. silently producing partial results.
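The fail-fast check could be sketched along these lines. The function name, its parameters, and the exact error wording are assumptions for illustration; the real check lives in `run_scan` under `#ifdef USE_AF_XDP`.

```c
#include <assert.h>
#include <stdio.h>

/*
 * Returns 0 when the thread counts are usable for the selected engine,
 * -1 when AF_XDP is selected with asymmetric sender/receiver counts.
 * Other engines are not constrained because they do not share per-queue rings.
 */
int check_afxdp_thread_counts(int use_af_xdp, int senders, int receivers)
{
    if (!use_af_xdp)
        return 0;

    if (senders != receivers) {
        fprintf(stderr,
                "af_xdp requires senders == receivers (got -T %d -R %d); "
                "use -T N -R N or --io-engine=af_packet\n",
                senders, receivers);
        return -1;
    }
    return 0;
}
```

A caller would turn the -1 into exit status 1 before any thread is created, which matches what the regression test for `-T 2 -R 4` asserts.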
Addresses the third P1 review comment on PR #2: with the combined TX+RX XSK architecture, the receiver:sender mapping must be strictly 1:1, or asymmetric counts produce silent data-quality bugs.

- receivers < senders → some sender queues have no RX consumer; replies on those queues land on the kernel's RX ring forever (silent drops indistinguishable from "nothing matched").
- receivers > senders → wrap-mod attaches multiple receivers to the same XSK and concurrently reads/writes the same SPSC RX/FILL ring pair (races, drops, no error reported).

run_scan now refuses to start when --io-engine=af_xdp and senders != receivers, with a clear error pointing at -T N -R N or the af_packet fallback. AF_PACKET / PF_RING ZC are unaffected because they don't share per-queue ring state.

Tests
- tests/io_engine_dispatch.sh adds a new check: -T 2 -R 4 with --io-engine=af_xdp must exit 1 and the error must mention the symmetric-count constraint.
- Default build: 11/11.
- USE_AF_XDP build: 11/11 (was 9/9; two more checks under [3]).
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 277ddc1d5e
    uint32_t refilled = afxdp_rx_fill_push(fill, recycled_addrs, rcvd);
    if (refilled < rcvd && !quiet_mode) {
Requeue every consumed RX frame back to FILL
This path releases all rcvd RX descriptors, but only publishes refilled addresses back to the FILL ring; any rcvd - refilled addresses are dropped on the floor. Because xsk_ring_prod__reserve may return fewer slots than requested, this leaks UMEM frames under FILL-ring pressure and progressively shrinks the reusable RX buffer pool, eventually causing sustained packet loss (silent scan false negatives) instead of temporary backpressure.
Confirmed leak — fixed in d788dfc.
You're right that `xsk_ring_prod__reserve` returning fewer slots than requested means the unrecycled addrs are permanently lost from rotation. Once enough have leaked, FILL stays empty, the kernel has nowhere to write incoming packets, and the receiver goes silent (false negatives masquerading as "nothing matched").
Fix: a fixed-size pending queue (`uint64_t pending[ANYSCAN_AFXDP_RX_FRAMES]`, bounded by the half/half UMEM partition so it cannot overflow under the documented invariant). Each loop iteration:
- Flush pending into FILL first (drains as kernel makes room).
- Peek RX → process_packet → release.
- Push as many recycled addrs as FILL accepts; queue the rest tail-first into pending. The next iteration's flush retries.
I also added a defensive overflow check that sets `fatal_error` and bails if pending + leftover ever exceeds `RX_FRAMES` — that would mean the partitioning invariant has been violated, and we'd rather fail loudly than corrupt memory.
The back-pressure log line is reworded from "recycled only %u/%u frames" to "deferred %u/%u into pending queue (depth=K)" so it accurately describes what happened.
Builds: default 11/11, USE_AF_XDP=1 11/11.
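The flush, recycle, and defer steps in this fix can be modelled without libxdp. This is a toy sketch under assumptions: `struct toy_fill` only counts free slots in place of a real FILL ring, `toy_fill_push` imitates a partial `xsk_ring_prod__reserve`, and `SKETCH_RX_FRAMES` is a small stand-in for `ANYSCAN_AFXDP_RX_FRAMES`. None of these names come from `recv-afxdp.c`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_RX_FRAMES 8u /* stand-in for ANYSCAN_AFXDP_RX_FRAMES */

/* Toy FILL ring: tracks free slots and how many addresses it has accepted. */
struct toy_fill { uint32_t free_slots; uint32_t accepted; };

/* Accepts at most free_slots addresses, like a partial ring reservation. */
static uint32_t toy_fill_push(struct toy_fill *f, const uint64_t *addrs, uint32_t n)
{
    uint32_t take = n < f->free_slots ? n : f->free_slots;
    (void)addrs;
    f->free_slots -= take;
    f->accepted   += take;
    return take;
}

/* Frames that could not be recycled yet; bounded by the RX half of the UMEM. */
struct pending_q { uint64_t addrs[SKETCH_RX_FRAMES]; uint32_t depth; };

/* Step 1 of each iteration: retry deferred frames before handling new RX. */
void pending_flush(struct pending_q *q, struct toy_fill *f)
{
    uint32_t pushed = toy_fill_push(f, q->addrs, q->depth);
    memmove(q->addrs, q->addrs + pushed,
            (q->depth - pushed) * sizeof q->addrs[0]);
    q->depth -= pushed;
}

/* Step 3: push what FILL accepts, queue the rest. Returns -1 on invariant breach. */
int recycle_or_defer(struct pending_q *q, struct toy_fill *f,
                     const uint64_t *addrs, uint32_t n)
{
    uint32_t pushed   = toy_fill_push(f, addrs, n);
    uint32_t leftover = n - pushed;

    if (q->depth + leftover > SKETCH_RX_FRAMES)
        return -1; /* more frames in flight than the RX half holds */

    memcpy(q->addrs + q->depth, addrs + pushed, leftover * sizeof addrs[0]);
    q->depth += leftover;
    return 0;
}
```

In the real loop the -1 case would set `fatal_error` and stop the thread; here it is left to the caller so the sketch stays self-contained.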
Addresses the fourth P1 review comment on PR #2: when xsk_ring_prod__reserve on the FILL ring returns fewer slots than rcvd, the leftover addrs were silently dropped on the floor. Each dropped addr permanently shrinks the usable RX buffer pool; eventually FILL stays empty, the kernel can't write incoming packets, and the receiver goes silent (false negatives).

Fix: a fixed-size pending queue (bounded by ANYSCAN_AFXDP_RX_FRAMES so it cannot overflow under the half/half partition invariant). Each loop iteration first flushes pending into FILL, then handles new RX. Frames that don't fit get queued tail-first; the next iteration retries. The kernel makes FILL slots available as it consumes them for incoming packets, so under transient back-pressure the queue drains naturally.

Defensive: if pending + leftover ever exceeds RX_FRAMES the partitioning invariant has been violated, so log loudly, set fatal_error, and bail rather than corrupt memory.

The "back-pressure" warning is reworded to "deferred N/M frames into pending queue (depth=K)" so the log line accurately reflects what happened (not "we lost frames").

Tests
- Default `make`: 11/11.
- `make USE_AF_XDP=1` (simulated): clean compile, link OK, 11/11.
Phase 2 PR 3 of the AF_XDP integration plan (AnyVM-Tech/AnyScan plans/2026-04-27-portscan-afxdp-plan-v1.md, §3.4).
Builds on PR #1 (Phase 2 PR 2 — TX path + dispatch refactor, merged as 4dd3a2a).
What lands here
`src/recv-afxdp.c` (new)
Per-thread AF_XDP RX path mirroring `src/send-afxdp.c`'s shape:
- RX-only XSK (`tx_size = 0`) + UMEM with the same 16 MiB / 8192 frame budget per receiver thread.
- Initial FILL-ring stocking (`FILL_RING_SIZE` frame addrs so the kernel has buffers from the first packet) + steady-state recycle in the RX loop.
- `xsk_ring_cons__peek` → `process_packet(...)` → `xsk_ring_cons__release` → push recycled addrs back to FILL ring → kick the kernel via `recvfrom(MSG_DONTWAIT)` when `xsk_ring_prod__needs_wakeup(&fill)` is set.
- `process_packet` keeps the SYN-ACK / RST / UDP / ICMP scoreboard, `iph->daddr == src_ip` filter, and writer-queue path bit-for-bit identical to the AF_PACKET receiver. No semantics drift.

Vtable registration in `src/engine.c`
`io_engine_af_xdp` defined under `#ifdef USE_AF_XDP`:
- `init_per_thread` → `afxdp_tx_init_per_thread` (RX setup happens inside `xdp_receiver_thread` because engine.c only calls init/teardown for senders, mirroring how AF_PACKET's `receiver_thread` opens its own raw socket).
- `tx_thread` → `xdp_sender_thread` (from PR #1)
- `rx_thread` → `xdp_receiver_thread` (this PR)
- `teardown_per_thread` → `afxdp_tx_teardown_per_thread` (from PR #1)

`pick_io_engine(IO_ENGINE_AF_XDP)` now returns `&io_engine_af_xdp` instead of NULL.

Queue-id assignment
Senders bind `queue_id = thread_id` (0..senders-1). Receivers bind `queue_id = config->senders + thread_id` so they don't collide on the same NIC queue (a single XSK owns the queue without `XDP_SHARED_UMEM`). On c6in.metal the channel count fits this comfortably below the lower-half ZC cap (plan §3.5). Smaller NICs surface as a clean bind-ladder failure with a one-line `use --io-engine=af_packet` pointer.

Misc
- `include/scanner.h` declares `xdp_receiver_thread` under `USE_AF_XDP`.
- `src/conf.c` updates the not-built-with hint to point users at the rebuild step now that PRs 2 + 3 have shipped (drops the obsolete "lands in Phase 2 PR 2 + 3" message).
- `tests/io_engine_dispatch.sh` rewrites test [3] to verify dispatch is reachable on `USE_AF_XDP=1` builds and that the rebuild hint shows on default builds.

What is NOT in this PR (intentional)
- Makefile `USE_AF_XDP=1` conditional (libxdp/libbpf link flags, source inclusion): Phase 2 PR C. Without it, this code is `#ifdef USE_AF_XDP` and the default `make` build is unaffected. The PR was verified by hand-compiling with `-DUSE_AF_XDP` and linking with `-lxdp -lbpf -lelf -lz`.
- systemd `CAP_BPF` + worker bundle runtime probe: Phase 2 PR C.
- Adapter `--io-engine` wiring + `ANYSCAN_SCANNER_IO_ENGINE` knob: Phase 2 PR D.

Test status
- `make` (default AF_PACKET build): clean. All 11 `tests/io_engine_dispatch.sh` checks pass.
- `make USE_AF_XDP=1` (simulated by hand-compile + manual link since the Makefile flag lands in PR C): all 12 source files compile cleanly under `-Wall -O3 -DUSE_AF_XDP`, full link with `-lxdp -lbpf -lelf -lz` succeeds (`scanner` binary 66 KB), dispatch tests pass.
- AnyGPT: `cargo build --workspace` clean, `cargo test --workspace` 33/33 pass.

Live-bench gates (NOT verified here, on c6in.metal TODO)
These are explicitly documented in `src/recv-afxdp.c` header comments and remain on the c6in.metal bench TODO:
- `xsks_map` redirect-program detach on hard-kill paths (the orderly path uses `xsk_socket__delete`, which detaches the program; a SIGKILL'd scanner needs `ip link set <iface> xdp off` cleanup).

Plan reference
See `plans/2026-04-27-portscan-afxdp-plan-v1.md` §3.4 (RX shape, FILL-ring stocking, ring sizing, BlackRock invariants), §4.3 (kernel feature checks + bind-mode ladder) in the AnyScan repo.
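The bind-mode ladder mentioned for §4.3 (DRV+ZC, then DRV-copy, then SKB) can be sketched as a loop over modes in preference order. This is an assumption-laden illustration: `enum bind_mode`, `try_bind_fn`, `bind_with_fallback`, and `fake_try_bind` are hypothetical names, and the sketch does not call libxdp. In a real setup the three rungs would typically map to `XDP_FLAGS_DRV_MODE` with `XDP_ZEROCOPY`, `XDP_FLAGS_DRV_MODE` with `XDP_COPY`, and `XDP_FLAGS_SKB_MODE` with `XDP_COPY`.

```c
#include <assert.h>
#include <stddef.h>

/* Rungs in preference order; BIND_NONE means every rung failed. */
enum bind_mode { BIND_DRV_ZC, BIND_DRV_COPY, BIND_SKB, BIND_NONE };

/* Hypothetical callback: returns 0 if binding in `mode` succeeded. */
typedef int (*try_bind_fn)(enum bind_mode mode, void *user);

/* Walks the ladder top-down and reports the first mode that binds. */
enum bind_mode bind_with_fallback(try_bind_fn try_bind, void *user)
{
    static const enum bind_mode ladder[] = {
        BIND_DRV_ZC, BIND_DRV_COPY, BIND_SKB
    };

    for (size_t i = 0; i < sizeof ladder / sizeof ladder[0]; i++) {
        if (try_bind(ladder[i], user) == 0)
            return ladder[i];
    }
    return BIND_NONE; /* caller should point the user at --io-engine=af_packet */
}

/* Example callback: pretends the NIC supports nothing better than *user. */
int fake_try_bind(enum bind_mode mode, void *user)
{
    enum bind_mode best = *(enum bind_mode *)user;
    return mode >= best ? 0 : -1;
}
```

Returning the mode that succeeded, rather than a plain success flag, lets the caller log which rung was reached, which is useful when a bench run silently lands in copy mode.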
Sequencing
This is PR B in the four-PR Phase 2 sequence:
1. `src/send-afxdp.c` (merged 4dd3a2a).
2. `src/recv-afxdp.c` + `io_engine_af_xdp` registration.
3. `USE_AF_XDP=1` + AnyScan apt deps + systemd `CAP_BPF` + worker-bundle runtime probe.
4. `ANYSCAN_SCANNER_IO_ENGINE` opt-in knob + adapter `--io-engine` propagation.

Do not self-merge; handing off to the orchestrator for review.