perf(send): AF_XDP TX path — Phase 2 PR 2 of AF_XDP plan#1
The scanner historically hardcoded `pthread_create(..., sender_thread, ...)`
in `src/engine.c`, leaving the PF_RING ZC code paths in `src/send-pfring.c`
and `src/recv-pfring.c` compiled-but-unreachable (per the §2.3 wart noted in
the AF_XDP integration plan, AnyVM-Tech/AnyScan PR #65).
This commit replaces the hardcoded dispatch with a small `io_engine_vtable_t`
indirection and adds a `--io-engine={af_packet,pfring_zc,af_xdp}` CLI flag.
Behaviour:
- `af_packet` (default): identical to the prior path — opens a PF_PACKET
raw socket, binds to the configured ifindex, runs `sender_thread` and
`receiver_thread`. Same socket + bind sequence, just relocated into
`af_packet_init_per_thread`.
- `pfring_zc`: when built with `USE_PFRING_ZC=1`, dispatch now reaches
`pfring_zc_sender_thread` / `pfring_zc_receiver_thread` for the first
time. Without that build flag, parse_arguments rejects the value with
a clear error.
- `af_xdp`: stub slot only. The actual `afxdp_sender_thread` and
`afxdp_receiver_thread` land in Phase 2 PRs 2 + 3 of the integration
plan; until then `--io-engine=af_xdp` errors at startup pointing the
operator at `af_packet`.
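The dispatch described above can be sketched roughly as follows. This is a minimal illustration, not the code in `src/engine.c`: the vtable fields, the enum values, and the stub functions are assumptions made for the example; only the names `io_engine_vtable_t`, `io_engine_from_string`, and `pick_io_engine` appear in this PR series.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical engine ids; the real enum lives in the scanner headers. */
typedef enum {
    IO_ENGINE_AF_PACKET,
    IO_ENGINE_PFRING_ZC,
    IO_ENGINE_AF_XDP,
    IO_ENGINE_UNKNOWN
} io_engine_t;

/* Assumed vtable shape: one init hook plus the two thread entry points. */
typedef struct {
    const char *name;
    int   (*init_per_thread)(void *ctx);
    void *(*sender)(void *ctx);
    void *(*receiver)(void *ctx);
} io_engine_vtable_t;

/* Stubs standing in for the real AF_PACKET init and thread functions. */
static int   af_packet_init_stub(void *ctx) { (void)ctx; return 0; }
static void *sender_stub(void *ctx)   { return ctx; }
static void *receiver_stub(void *ctx) { return ctx; }

static const io_engine_vtable_t io_engine_af_packet = {
    "af_packet", af_packet_init_stub, sender_stub, receiver_stub
};

/* Map the --io-engine value to an id; unknown strings are rejected. */
static io_engine_t io_engine_from_string(const char *s)
{
    assert(s != NULL);
    if (strcmp(s, "af_packet") == 0) return IO_ENGINE_AF_PACKET;
    if (strcmp(s, "pfring_zc") == 0) return IO_ENGINE_PFRING_ZC;
    if (strcmp(s, "af_xdp") == 0)    return IO_ENGINE_AF_XDP;
    return IO_ENGINE_UNKNOWN;
}

/* Return the vtable for an engine, or NULL when this build cannot run it. */
static const io_engine_vtable_t *pick_io_engine(io_engine_t e)
{
    switch (e) {
    case IO_ENGINE_AF_PACKET: return &io_engine_af_packet;
    case IO_ENGINE_PFRING_ZC: /* would need USE_PFRING_ZC=1 */
    case IO_ENGINE_AF_XDP:    /* stub slot until the later PRs land */
    default:                  return NULL;
    }
}
```

With this shape the caller only checks for a NULL vtable and reports an error; no engine-specific branches are needed at the `pthread_create` call sites.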
Notes on backwards compatibility:
- Default behaviour is unchanged: with no `--io-engine` flag, the binary
runs the same AF_PACKET path it always did.
- The inline `socket(PF_PACKET, SOCK_RAW, ...)` + `bind(...)` from the
old `run_scan` is moved into `af_packet_init_per_thread` so the dispatch
can stay symmetric across engines. `bind()`'s return value is now
checked (it was previously ignored).
- The ICMP prescan helpers continue to use their own raw PF_PACKET
sockets independently of the chosen TX engine — `alive_sender_thread`
predates the io_engine concept.
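As an illustration of the checked `bind()` mentioned above, here is a minimal Linux-only sketch. The function name `open_bound_packet_socket` and the `ETH_P_ALL` protocol choice are assumptions for the example; the actual `af_packet_init_per_thread` may differ.

```c
#include <arpa/inet.h>
#include <assert.h>
#include <linux/if_ether.h>
#include <linux/if_packet.h>
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Open a PF_PACKET raw socket bound to the given ifindex.
 * Returns the fd, or -1 on failure (the socket is closed on a failed bind). */
static int open_bound_packet_socket(int ifindex)
{
    assert(ifindex >= 0);

    /* Assumption: ETH_P_ALL; the scanner may pass a different protocol. */
    int fd = socket(PF_PACKET, SOCK_RAW, htons(ETH_P_ALL));
    if (fd < 0) {
        perror("socket(PF_PACKET)");
        return -1;
    }

    struct sockaddr_ll sll;
    memset(&sll, 0, sizeof(sll));
    sll.sll_family   = AF_PACKET;
    sll.sll_protocol = htons(ETH_P_ALL);
    sll.sll_ifindex  = ifindex;

    /* Previously the return value was ignored; a failed bind now aborts init. */
    if (bind(fd, (struct sockaddr *)&sll, sizeof(sll)) < 0) {
        perror("bind(PF_PACKET)");
        close(fd);
        return -1;
    }
    return fd;
}
```

Without `CAP_NET_RAW` the `socket()` call itself fails, so the function returns -1 before reaching `bind()`.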
Tests:
- `tests/io_engine_dispatch.sh` (also wired into `make test`) covers:
`--help`, unknown engines, `--io-engine=af_xdp` clean error, the
pfring_zc compile-flag gate, and that `--io-engine=af_packet` is
accepted at parse time. The harness deliberately stops at parse_arguments
so the tests can run without `CAP_NET_RAW`.
- Verified `make` (default, AF_PACKET) and `gcc -fsyntax-only -DUSE_PFRING_ZC`
compile cleanly.
Out of scope for this commit:
- The actual AF_XDP send/receive paths (Phase 2 PR 2 + 3).
- Makefile / build-system integration for `USE_AF_XDP` (Phase 2 PR 4).
- PF_RING ZC cluster/pool/queue initialization. The dispatch now reaches
the ZC threads, but the cluster init wart is a separate follow-on; the
new `pfring_zc_init_per_thread` errors out cleanly if the cluster has
not been wired up by the surrounding code.
Refs: AnyVM-Tech/AnyScan PR #65 (plan), plan §3.1 + §3.3.
… drain
Phase 2 PR 2 of the AF_XDP integration plan (AnyVM-Tech/AnyScan PR #65).
This commit adds the AF_XDP transmit path so scanner threads can push
SYN/UDP/ICMP probes through libxdp's XSK API instead of TPACKET_V2,
matching the existing AF_PACKET sender bit-for-bit on packet construction
and rate-limiting while bypassing the kernel netdev TX path.
What lands here (~280 LOC of pure code, plus comments):
- include/xdp-defs.h: UMEM/ring sizing constants (8192 frames × 2048 B,
2048-desc TX/comp/fill rings — matches Suricata defaults per plan §3.4),
bind-mode fallback enum (DRV+ZC → DRV+COPY → SKB), and opaque
forward decls for xdp_tx_state / xdp_rx_state.
- include/scanner_defs.h: thread_context_t gains struct xdp_tx_state*
xdp_tx and struct xdp_rx_state* xdp_rx pointers under
`#ifdef USE_AF_XDP`. Forward-declared so scanner_defs.h does not pull
in <xdp/xsk.h>, keeping build-time deps off the AF_PACKET-only path.
- include/scanner.h: declares afxdp_tx_init_per_thread,
afxdp_tx_teardown_per_thread, and xdp_sender_thread under USE_AF_XDP.
- src/send-afxdp.c: the substantive change.
* afxdp_alloc_umem: posix_memalign'd UMEM region + xsk_umem__create
with 2048-desc fill/comp; per-thread free-frame stack (8192 entries,
pre-populated with all frame ids).
* afxdp_try_bind: bind ladder (plan §4.3). Tries
XDP_FLAGS_DRV_MODE | XDP_ZEROCOPY first; on failure ladders to
DRV+XDP_COPY then XDP_FLAGS_SKB_MODE. Logs the mode that actually
bound. INHIBIT_PROG_LOAD set since rx == NULL means libxdp does not
need the default xsks_map redirect program.
* afxdp_tx_init_per_thread: alloc UMEM, run bind ladder against
(config->interface, queue_id = thread_id). Mirrors the resulting
xsk fd into ctx->socket_fd for status-thread compatibility.
* afxdp_tx_teardown_per_thread: xsk_socket__delete + xsk_umem__delete
+ free(umem_area, free_stack, state).
* xdp_sender_thread: the TX loop.
- CPU-affinity matches sender.c::sender_thread (thread_id % NPROC).
- Builds the same SYN/UDP/ICMP template via
create_syn_packet/create_udp_packet/create_icmp_packet from net.c.
- Per-batch: drain completion ring → recycle frames; build up to
BATCH_SIZE (10) packets directly into UMEM frames pulled from the
free stack (filtered indices simply skip without consuming a
frame); xsk_ring_prod__reserve(built_count) with a bounded retry
ladder if the ring is full (drain comp, kick via sendto if
needs_wakeup, retry up to 32 times); on success fill descs with
(addr,len) and xsk_ring_prod__submit(built_count); kick again if
needs_wakeup.
- atomic_fetch_add stats and call rate_limit_batch unchanged from
the AF_PACKET path so the AIMD adapter sees consistent pps.
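The per-thread free-frame stack mentioned above could look roughly like this. It is a sketch under stated assumptions: `frame_stack_t` and its functions are hypothetical names, and frames are tracked by byte offset into the UMEM area, which is how XSK descriptors address them. The completion-ring drain would call `frame_stack_free` once per completed descriptor.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define FRAME_SIZE 2048u   /* bytes per UMEM frame, per the sizing above */

/* Hypothetical LIFO of free UMEM frame offsets, owned by a single thread. */
typedef struct {
    uint64_t *addrs;
    uint32_t  count;
    uint32_t  capacity;
} frame_stack_t;

/* Allocate the stack and pre-populate it with every frame offset. */
static int frame_stack_init(frame_stack_t *s, uint32_t num_frames)
{
    s->addrs = malloc((size_t)num_frames * sizeof(*s->addrs));
    if (s->addrs == NULL)
        return -1;
    s->capacity = num_frames;
    s->count    = num_frames;
    for (uint32_t i = 0; i < num_frames; i++)
        s->addrs[i] = (uint64_t)i * FRAME_SIZE;
    return 0;
}

/* Pop a free frame. Returns 0 and stores the offset, or -1 when every
 * frame is still in flight (owned by the kernel). */
static int frame_stack_alloc(frame_stack_t *s, uint64_t *addr)
{
    if (s->count == 0)
        return -1;
    *addr = s->addrs[--s->count];
    return 0;
}

/* Push a frame back, e.g. after it shows up on the completion ring. */
static void frame_stack_free(frame_stack_t *s, uint64_t addr)
{
    assert(s->count < s->capacity);
    s->addrs[s->count++] = addr;
}

static void frame_stack_destroy(frame_stack_t *s)
{
    free(s->addrs);
    s->addrs = NULL;
    s->count = s->capacity = 0;
}
```

Because each sender thread owns its own stack, no locking is needed; this mirrors the per-thread ownership described in the commit message.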
What is deliberately NOT in this commit:
- The io_engine_af_xdp vtable struct in src/engine.c. Phase 2 PR 3
defines it once the matching xdp_receiver_thread lands; until then
pick_io_engine() returns NULL for IO_ENGINE_AF_XDP and prints a
clear "lands in Phase 2 PR 2 + 3" error. Default builds are
unchanged — the file is gated on `#ifdef USE_AF_XDP` so it is not
even compiled.
- The Makefile USE_AF_XDP=1 conditional (libxdp/libbpf link, source
list, dep install). Phase 2 PR 4. Without that block this whole
translation unit is `#ifdef`'d out.
Build dependency note: as a stack of additive PRs, USE_AF_XDP=1 builds
will fail to link until Phase 2 PR 3 lands the io_engine_af_xdp vtable
(engine.c references it as `extern` under USE_AF_XDP). Default
(AF_PACKET) builds — the only configuration shipped to ops today — are
unchanged. The dispatch fork also continues to print
"USE_AF_XDP=1 not set" for `--io-engine=af_xdp` until the vtable + the
Makefile flag both ship.
What was verified locally:
- `make` (default, AF_PACKET-only) — clean. Pre-existing strncpy
warning in receiver.c is unchanged.
- `make test` — 11/11 io_engine_dispatch.sh smoke tests pass.
- `gcc -Wall -O3 -Iinclude -DUSE_AF_XDP -c` on every translation
unit (main, conf, engine, net, utils, sender, receiver, parsing,
crypto-blackrock*, send-afxdp) — clean. Confirms USE_AF_XDP
compiles cleanly across the codebase even though it cannot link
yet (see "Build dependency note" above).
What is *NOT* verified here (needs c6in.metal live bench, plan §5.3):
- Actual XSK bind on an ENA NIC and the DRV+ZC vs DRV+COPY vs SKB
ladder behaviour against the production AMI.
- amzn-drivers#221 ZC driver-reset interaction.
- ENA "lower-half-channels" zero-copy constraint — current code
attempts the bind with queue_id = thread_id and falls back on
failure, but does not proactively cap thread count via ethtool -l.
Phase 2 PR 3 / PR 4 will add the channel-count probe.
- veth loopback test from plan §5.1 — requires a separate netns
harness. Lands with the recv path in PR 3 since validating the TX
side end-to-end without an RX counterpart is not interesting.
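The bind ladder whose on-hardware behaviour is still unverified has a simple control flow, sketched below. To keep the example runnable without libxdp, the actual bind attempt is abstracted behind a callback; `bind_mode_t`, `try_bind_fn`, `bind_with_fallback`, and `fake_try_bind` are hypothetical names, and in the real code the callback's role is played by the XSK socket creation call with the matching flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical mode ids, ordered fastest first. */
typedef enum {
    BIND_DRV_ZEROCOPY = 0,
    BIND_DRV_COPY,
    BIND_SKB,
    BIND_MODE_COUNT
} bind_mode_t;

/* Attempts one bind; returns 0 on success, negative on failure. */
typedef int (*try_bind_fn)(bind_mode_t mode, void *arg);

/* Walk the ladder from fastest to slowest. On success, store the mode that
 * actually bound and return 0; otherwise return the last error seen. */
static int bind_with_fallback(try_bind_fn try_bind, void *arg, bind_mode_t *bound)
{
    int err = -1;

    assert(try_bind != NULL && bound != NULL);
    for (int m = BIND_DRV_ZEROCOPY; m < BIND_MODE_COUNT; m++) {
        err = try_bind((bind_mode_t)m, arg);
        if (err == 0) {
            *bound = (bind_mode_t)m;
            return 0;
        }
    }
    return err;
}

/* Example callback: pretends the NIC rejects every mode faster than *arg. */
static int fake_try_bind(bind_mode_t mode, void *arg)
{
    bind_mode_t best = *(bind_mode_t *)arg;
    return mode >= best ? 0 : -1;
}
```

Returning the bound mode to the caller is what allows the init path to log which rung of the ladder succeeded.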
Refs:
- Plan: AnyVM-Tech/AnyScan PR #65, plan/2026-04-27-portscan-afxdp-plan-v1.md
§3.4 (TX loop pattern), §3.5 (ENA specifics), §4.3 (bind ladder).
- Phase 2 PR 1 (dispatch refactor + scanner fork): AnyScan PR #67 →
this fork's perf/portscan-afxdp-phase2-pr1.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 5844d61a26
    uint64_t port_total_idx = index / ctx->work.total_ips;
    uint32_t current_ip_nbo = get_ip_from_index(ip_idx, ctx->work.all_ip_ranges, ctx->work.total_ip_ranges);
    uint32_t current_ip_hbo = ntohl(current_ip_nbo);
    ctx->work.current_global_idx++;
Retry unsent targets after TX ring reservation failure
ctx->work.current_global_idx is advanced before the batch is guaranteed a slot in the AF_XDP TX ring. If xsk_ring_prod__reserve later fails under ring backpressure, the failure path only returns UMEM frames and continues, so those already-consumed targets are silently dropped instead of retried. This can create deterministic scan coverage gaps (missed IP/port probes) whenever TX stalls.
Good catch — you're right that this was a deterministic coverage gap, not just a perf wart. Fixed in b2b0514.
Two related rollback paths added:
- Outer rollback on TX-ring reservation failure. Snapshot `current_global_idx` at the start of every batch (`idx_at_batch_start`). If `xsk_ring_prod__reserve` never lands within the 32-attempt retry ladder (each attempt kicking the kernel and draining the completion ring), restore the snapshot and free all built frames so the next outer iteration re-processes the same targets. BlackRock is deterministic in `current_global_idx`, so re-processing rebuilds identical packets; only the per-thread `xorshift32` state for IP id / TCP seq mutates, which is fine (those fields are random by design). A brief `usleep(10)` gives the kernel time to drain so we don't busy-spin the same indices.
- Inner rollback on frame-allocation failure. Found while auditing the rollback logic: the same shape exists when `afxdp_alloc_frame` returns < 0 (all 8192 frames in flight). The current iteration had already done `current_global_idx++` before the alloc was attempted; without the matching decrement that target would be silently dropped on the first sustained burst of in-flight frames. The fix decrements `current_global_idx` by 1 before breaking out of the inner loop, mirroring AF_PACKET's "`tp_status` busy → bail without advancing" semantics at the top of `sender.c`'s inner loop.
Net behaviour now: under sustained TX backpressure or a full free-frame pool, the loop re-processes the same targets next iteration instead of dropping them. The retry budget + usleep keeps the loop from livelocking; the rollback keeps scan coverage deterministic regardless of TX pace.
Verified:
- `make` (default, AF_PACKET) clean.
- `make test`: 11/11 dispatch smoke tests pass.
- `gcc -Wall -O3 -DUSE_AF_XDP -c src/send-afxdp.c` clean.
Address codex bot review on perf/portscan-afxdp-phase2-pr2#1: the AF_XDP TX path was speculatively advancing ctx->work.current_global_idx as it built each packet, then dropping packets-and-indices together if xsk_ring_prod__reserve never landed within the 32-attempt retry ladder. That created a deterministic scan-coverage gap under sustained TX backpressure: every batch that exhausted the retry budget silently dropped up to BATCH_SIZE targets.

The AF_PACKET path doesn't have this shape because it checks tp_status at the top of its inner loop and bails *before* advancing the iteration; the AF_XDP path can't do that naturally because frame allocation and slot reservation are separate ring operations that batch up across the loop.

Two fixes:

1. Outer rollback on reservation failure. Snapshot current_global_idx at the start of every batch (idx_at_batch_start). On reservation failure, restore that snapshot and free all built frames so the next outer iteration re-processes the same targets. BlackRock is deterministic in current_global_idx so re-processing rebuilds identical packets; only the per-thread xorshift state for IP id / TCP seq mutates, which is fine (those fields are random by design). A brief usleep(10) gives the kernel time to drain so we don't busy-spin the same indices.

2. Inner rollback on alloc-frame failure. The frame stack is empty when afxdp_alloc_frame returns < 0 (all 8192 frames in flight). The current iteration had already done current_global_idx++; without the matching decrement that target would be silently dropped on the first sustained burst of in-flight frames. Mirroring AF_PACKET's "tp_status busy → bail without advancing" semantics, we now decrement current_global_idx by 1 before breaking.

Net: under sustained TX backpressure or full free-frame pool, the loop re-processes the same targets next iteration instead of dropping them. The retry budget + usleep keeps the loop from livelocking; the rollback keeps scan coverage deterministic regardless of TX pace.

Verified:
- `make` (default, AF_PACKET): clean.
- `make test`: 11/11 io_engine_dispatch.sh smoke tests pass.
- `gcc -Wall -O3 -DUSE_AF_XDP -c src/send-afxdp.c`: clean.

Refs: PR #1 review comment from chatgpt-codex-connector[bot] on line 311 of src/send-afxdp.c.
#4)

Adds a fourth I/O engine, --io-engine=dpdk, alongside af_packet, pfring_zc and af_xdp. Mirrors the AF_XDP wire-up shape introduced by PRs #1-#3 but swaps the kernel XSK socket path for a DPDK PMD running directly in userspace (rte_eth_tx_burst / rte_eth_rx_burst against vfio-pci-bound NICs).

Why: AWS ENA on kernel ≤6.12.74 forces AF_XDP into drv+copy mode, capping c6in.metal at ~22M pps aggregate (memory: anyscan_afxdp_ena_constraint). DPDK bypasses the kernel ENA driver entirely via vfio-pci and removes both the syscall kick and the lower-half-channels ZC constraint, opening up the 50-100M pps ceiling identified in plans/2026-04-28-portscan-dpdk-impl-v1.md.

What lands here:
- include/dpdk-defs.h: opaque struct dpdk_state forward-decl + sizing constants (mbuf pool 8192/sender, 1024-deep TX/RX rings, 256-mbuf per-lcore cache).
- src/dpdk-eal.c: process-wide EAL bring-up + teardown: rte_eal_init, port probe, mempool create, port_configure with RSS-on-IP/TCP/UDP, TX and RX queue setup, dev_start, promiscuous-on. Runs once on the main thread before run_scan; teardown drains ports and rte_eal_cleanup.
- src/send-dpdk.c: TX-burst loop. Same BlackRock walk + blacklist / alive-queue filter as send-afxdp.c; only the I/O layer changes (rte_pktmbuf_alloc_bulk → packet patch in mbuf data area → rte_eth_tx_burst). Same partial-send rollback so sustained backpressure doesn't silently drop scan targets.
- src/recv-dpdk.c: RX-burst loop. process_packet is reused unchanged (Ethernet→IP→TCP/UDP/ICMP filter + scoreboard). One receiver per RX queue; rejects mismatched receivers/queue counts loudly (rte_eth_rx_burst is MT-unsafe per queue).
- src/engine.c: io_engine_dpdk vtable + IO_ENGINE_DPDK case in pick_io_engine.
- include/scanner.h, include/scanner_defs.h: forward decls + new config / per-thread fields under #ifdef USE_DPDK.
- src/conf.c: --io-engine=dpdk recognized; new --dpdk-port, --dpdk-num-{tx,rx}q, --dpdk-eal-args flags. Mandatory --gateway-mac when io_engine=dpdk (no kernel ARP in DPDK).
- src/main.c: argv pre-scan splits scanner argv vs raw EAL argv on the conventional `--` separator; calls dpdk_eal_bringup before setup_scan and dpdk_eal_teardown after run_scan.
- Makefile: USE_DPDK=1 conditional that adds -DUSE_DPDK, pkg-config libdpdk cflags+libs, and the new TUs. Default build is bit-for-bit identical to upstream when USE_DPDK is unset.
- tests/dpdk_dispatch.sh: smoke test for --io-engine=dpdk parse-time behaviour. Mirrors tests/io_engine_dispatch.sh (CLI parse + build-flag rejection / acceptance assertions, no root or NIC required).

Build verification (all in this branch on a Debian bookworm host with libdpdk-dev 24.11 + libxdp 1.5.4 / libbpf 1.5.0 installed):
- `make` → 11/11 io_engine_dispatch + 11/11 dpdk_dispatch pass.
- `make USE_AF_XDP=1` → 11/11 + 11/11 pass.
- `make USE_DPDK=1` → 11/11 + 8/8 pass (the DPDK-only assertions activate on this build).
- `make USE_AF_XDP=1 USE_DPDK=1` → 11/11 + 8/8 pass.

No new compiler warnings (the strncpy ones in receiver.c pre-date this PR).

Note on parsing.c reference in the task brief: io_engine_from_string and io_engine_name actually live in src/conf.c (this PR extends them there); src/parsing.c is the IP/port range parser and is untouched.

Out of scope (Phase 2 next steps in the AnyScan repo, not here):
- install-external-deps.sh / package-worker-bundle.sh / deploy.sh ANYSCAN_USE_DPDK build-flag plumbing (mirrors PR #71's AF_XDP wire-up).
- tools/setup-dpdk.sh hugepages + vfio-pci bind/unbind script.
- install-worker-bundle.sh probe_dpdk_runtime_available.
- vulnscanner-zmap-adapter.py SUPPORTED_IO_ENGINES + EAL argv emit.
- Live c6in.metal bench (separate worker per the plan's §5.3).

Refs: AnyVM-Tech/AnyScan plans/2026-04-28-portscan-dpdk-impl-v1.md anygpt-50

Co-authored-by: skullcmd <skullcmd@anyvm.tech>
Co-authored-by: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
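The argv pre-scan on the `--` separator in src/main.c might be structured along these lines. This is a sketch only: `argv_split_t` and `split_argv` are hypothetical names, and the example deliberately does not assume which side of the separator carries the scanner arguments and which the EAL arguments.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical result of the pre-scan: two views into the original argv. */
typedef struct {
    int    before_argc;   /* argv[0] .. the element before "--" */
    char **before_argv;
    int    after_argc;    /* elements following "--" */
    char **after_argv;    /* NULL when no separator is present */
} argv_split_t;

/* Split on the first standalone "--". No strings are copied; both halves
 * point into the caller's argv array. */
static argv_split_t split_argv(int argc, char **argv)
{
    argv_split_t out = { argc, argv, 0, NULL };

    assert(argc >= 0);
    for (int i = 1; i < argc; i++) {
        if (strcmp(argv[i], "--") == 0) {
            out.before_argc = i;
            out.after_argc  = argc - i - 1;
            out.after_argv  = argv + i + 1;
            break;
        }
    }
    return out;
}
```

A caller would pass one half to the scanner's own argument parser and the other to the EAL bring-up; the split happens before either parser runs, so neither sees the other's flags.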
Summary
Phase 2 PR 2 of 4 of the AF_XDP integration plan (AnyVM-Tech/AnyScan PR #65). This PR ships the AF_XDP transmit path so scanner threads can push SYN/UDP/ICMP probes through libxdp's XSK API instead of TPACKET_V2, matching the existing AF_PACKET sender bit-for-bit on packet construction and rate-limiting while bypassing the kernel netdev TX path.
This is a stacked PR. It depends on the dispatch refactor + `--io-engine` flag from Phase 2 PR 1 (branch `perf/portscan-afxdp-phase2-pr1`, commit `998c66b`). PR 1 must land in `main` first, or this branch should be rebased onto whatever commit lands the dispatch refactor.

PR 3 (the recv counterpart + `io_engine_af_xdp` vtable registration) and PR 4 (the Makefile / install / systemd plumbing) follow.

Files
- `include/xdp-defs.h`: `xdp_tx_state` / `xdp_rx_state` forward decls. Gated on `#ifdef USE_AF_XDP`.
- `include/scanner.h`: `afxdp_tx_init_per_thread`, `afxdp_tx_teardown_per_thread`, `xdp_sender_thread` under `#ifdef USE_AF_XDP`.
- `include/scanner_defs.h`: `thread_context_t` gains `struct xdp_tx_state *xdp_tx` + `struct xdp_rx_state *xdp_rx` pointers under `#ifdef USE_AF_XDP`. Forward-declared so this header doesn't pull `<xdp/xsk.h>`.
- `src/send-afxdp.c`

Diff stat: 4 files changed, 496 insertions(+), 0 deletions.

TX loop shape (plan §3.4)
The key difference vs. `sender.c`: AF_PACKET's TPACKET_V2 ring auto-recycles frames via `tp_status`; AF_XDP requires the userspace TX loop to track which UMEM frames the kernel has returned via the completion ring. That bookkeeping (the free-frame stack + `afxdp_drain_completion_ring`) is the only substantive new code; everything above the I/O socket boundary (BlackRock cipher, blacklist, alive-queue, rate limit, stats) is reused unchanged.

Bind-mode fallback ladder (plan §4.3)
`INHIBIT_PROG_LOAD` is set on every attempt because we are TX-only (`rx == NULL`); libxdp does not need to load the default `xsks_map` redirect program. `XDP_USE_NEED_WAKEUP` is always set so we can gate `sendto` kicks behind `xsk_ring_prod__needs_wakeup`.

What is deliberately NOT in this PR
- The `io_engine_af_xdp` vtable struct in `src/engine.c`. Phase 2 PR 3 defines it once the matching `xdp_receiver_thread` lands.
- The Makefile `USE_AF_XDP=1` conditional. Phase 2 PR 4. Without that block this whole translation unit is `#ifdef`'d out, so the default (AF_PACKET) build is unchanged.

Build dependency note: as a stack of additive PRs, `make USE_AF_XDP=1` builds will fail to link until PR 3 lands the `io_engine_af_xdp` vtable (`engine.c` references it as `extern` under `USE_AF_XDP`). Default (AF_PACKET) builds, the only configuration shipped to ops today, are unchanged. The dispatch path also continues to print "USE_AF_XDP=1 not set" for `--io-engine=af_xdp` until both the vtable and the Makefile flag ship.

Test plan
Verified locally:
- `make` (default, AF_PACKET-only): clean. Pre-existing `strncpy` warning in `receiver.c` is unchanged.
- `make test`: 11/11 `io_engine_dispatch.sh` smoke tests pass.
- `gcc -Wall -O3 -Iinclude -DUSE_AF_XDP -c` on every translation unit (`main`, `conf`, `engine`, `net`, `utils`, `sender`, `receiver`, `parsing`, `crypto-blackrock*`, `send-afxdp`): clean. Confirms `USE_AF_XDP` compiles cleanly across the codebase even though it cannot link yet (see "Build dependency note" above).
- `wc -l` totals 496, within range of the plan estimate (~280 LOC for `send-afxdp.c` proper plus comments / header).

Not verified here; needs c6in.metal live bench (plan §5.3):
- The bind is attempted with `queue_id = thread_id` and falls back on failure, but the code does not proactively cap thread count via `ethtool -l`. PR 3 / PR 4 add the channel-count probe.

Refs
- `plans/2026-04-27-portscan-afxdp-plan-v1.md` §3.4 (TX loop), §3.5 (ENA specifics), §4.3 (bind ladder).
- `perf/portscan-afxdp-phase2-pr1` branch.

🤖 Generated with Claude Code