fix(ffi): segfault when threadsafe JSCallback invoked from multiple native threads by robobun · Pull Request #28115 · oven-sh/bun

robobun · 2026-03-14T18:31:26Z

Problem

FFI_Callback_threadsafe_call is the trampoline for new JSCallback(fn, { threadsafe: true }) and is invoked from arbitrary native threads — that's its entire purpose. It was capturing FFICallbackFunctionWrapper by value in the postTaskTo lambda:

WebCore::ScriptExecutionContext::postTaskTo(..., [argsVec = WTF::move(argsVec), wrapper](...) { ... });
//                                                                               ^^^^^^^ copy

That copy invokes JSC::Strong<>'s copy constructor on the calling native thread, which calls HandleSet::allocate() and writeBarrier(). HandleSet, which the VM owns, is an unlocked singly-linked free list plus a sentinel list. Mutating it from a non-JS thread races with the JS thread (which churns the same lists on every Strong<> create/destroy and during GC marking) and corrupts the handle lists.

It also called wrapper.globalObject.get() on the foreign thread to fish out the script execution context, reading a HandleSlot concurrently with GC.
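To make the by-value capture concrete, here is a minimal sketch in standard C++. It does not use Bun or WebKit code: FakeStrong, FakeWrapper and both functions are hypothetical stand-ins, and std::shared_ptr is swapped in for a ref-counted handle. It only shows that a by-value lambda capture runs the member copy constructors on whichever thread builds the lambda, while capturing a ref-counted pointer does not.

```cpp
#include <atomic>
#include <cassert>
#include <memory>
#include <thread>

// Counts copy-constructor calls made on FakeStrong (hypothetical stand-in
// for a handle type whose copy constructor touches shared VM state).
std::atomic<int> g_strongCopies{0};

struct FakeStrong {
    FakeStrong() = default;
    FakeStrong(const FakeStrong&) { g_strongCopies.fetch_add(1); }
};

// Hypothetical wrapper holding two handles, loosely shaped like the
// function/globalObject pair described above.
struct FakeWrapper {
    FakeStrong function;
    FakeStrong globalObject;
};

// Builds the task on a "native" thread with a by-value capture.
// Both handle members are copied on that thread.
int copiesWhenCapturedByValue() {
    g_strongCopies = 0;
    FakeWrapper wrapper;
    std::thread native([&wrapper] {
        auto task = [wrapper] { (void)wrapper; }; // copy happens here
        task();
    });
    native.join();
    return g_strongCopies.load();
}

// Builds the task with a ref-counted pointer capture instead.
// Only the reference count changes; no handle is copied.
int copiesWhenCapturedBySharedPtr() {
    g_strongCopies = 0;
    auto wrapper = std::make_shared<FakeWrapper>();
    std::thread native([&wrapper] {
        auto task = [ref = wrapper] { (void)ref; }; // atomic refcount bump only
        task();
    });
    native.join();
    assert(wrapper.use_count() == 1); // the task's reference is gone again
    return g_strongCopies.load();
}
```

In this sketch the by-value version copies both handle members on the native thread, and the pointer version copies none.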

Repro

Strong.h:147:46: runtime error: member call on null pointer of type 'JSC::HandleSet'
SentinelLinkedList.h:212:11: runtime error: member call on null pointer of type 'WTF::BasicRawSentinelNode<JSC::HandleNode>'

These errors come from test/js/bun/ffi/ffi-threadsafe-callback.test.ts, which spawns 4 pthreads, each firing a threadsafe JSCallback 5000×, while the JS thread creates and closes throwaway JSCallbacks to contend on the same HandleSet. Under debug+ASAN the unfixed build fails 5/5 runs within ~1s.

Fix

Cache ScriptExecutionContextIdentifier (a plain uint32_t) in the wrapper at construction time (on the JS thread).
Make FFICallbackFunctionWrapper ThreadSafeRefCounted and capture a Ref<> in the lambda instead of copying it. Creating a Ref is just an atomic increment; the Strong<> members are never copied.
FFICallbackFunctionWrapper_destroy becomes deref(), so the wrapper survives a close() that races with already-queued tasks.

The posted task still runs on the JS thread and dereferences wrapperRef->m_function there, which is safe.
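The lifetime argument can be sketched as follows. This is a simplified model, not the code in JSFFIFunction.cpp: CallbackWrapper, its members and the single-threaded queue are hypothetical, and a hand-rolled std::atomic counter stands in for ThreadSafeRefCounted. It illustrates why a close() that only drops one reference cannot free the wrapper while a queued task still holds another.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>
#include <functional>
#include <queue>

// Hypothetical intrusive ref-counted wrapper with a cached context id.
class CallbackWrapper {
public:
    CallbackWrapper(std::uint32_t contextId, bool* destroyedFlag)
        : contextIdentifier(contextId), m_destroyed(destroyedFlag) {}
    ~CallbackWrapper() { *m_destroyed = true; }

    void ref() { m_refCount.fetch_add(1, std::memory_order_relaxed); }
    void deref() {
        // The last owner to drop its reference deletes the object.
        if (m_refCount.fetch_sub(1, std::memory_order_acq_rel) == 1)
            delete this;
    }

    const std::uint32_t contextIdentifier; // plain integer, safe to read anywhere
    int callCount = 0;

private:
    std::atomic<int> m_refCount{1}; // creator holds the first reference
    bool* m_destroyed;
};

// Simulates: native thread posts a task, close() runs, then the task runs.
bool wrapperSurvivesCloseWithQueuedTask() {
    bool destroyed = false;
    bool aliveWhenTaskRan = false;
    std::queue<std::function<void()>> jsThreadQueue; // stand-in for the JS task queue
    auto* wrapper = new CallbackWrapper(7, &destroyed);

    // "Native thread": take a reference, then queue the work.
    wrapper->ref();
    jsThreadQueue.push([wrapper, &destroyed, &aliveWhenTaskRan] {
        aliveWhenTaskRan = !destroyed;
        ++wrapper->callCount;
        wrapper->deref(); // the task releases its reference when done
    });

    // close(): the owner drops its reference while the task is still queued.
    wrapper->deref();
    assert(!destroyed);

    // "JS thread": drain the queue.
    while (!jsThreadQueue.empty()) {
        jsThreadQueue.front()();
        jsThreadQueue.pop();
    }
    return aliveWhenTaskRan && destroyed;
}
```

The function returns true when the wrapper was alive while the task ran and was destroyed only after the task released the last reference.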

Verification

	`bun bd test test/js/bun/ffi/ffi-threadsafe-callback.test.ts`
before	5/5 fail — UBSan null `HandleSet` / `SentinelLinkedList`
after	5/5 pass (~1.7s), all 20000 callbacks delivered

All existing test/js/bun/ffi/* tests pass.

Closes #28113

robobun · 2026-03-14T18:31:37Z

Updated 2:45 PM PT - Apr 29th, 2026

❌ @autofix-ci[bot], your commit 9c80f24 has 2 failures in Build #49191 (All Failures):

test/js/bun/s3/s3-storage-class.test.ts - code 1 on 🍎 14 aarch64
test/js/node/worker_threads/worker_threads.test.ts - pid 7573 segmentation fault on 🐧 3.23 aarch64

🧪 To try this PR locally:

bunx bun-pr 28115

That installs a local version of the PR into your bun-28115 executable, so you can run:

bun-28115 --bun

coderabbitai · 2026-03-14T18:33:33Z

Walkthrough

FFICallbackFunctionWrapper is made thread-safe: it now derives from ThreadSafeRefCounted, caches the ScriptExecutionContextIdentifier, uses Ref/leakRef for creation and deref for destruction, and the threadsafe callback path captures a Ref and cached context id before posting work. A regression test exercising multi-threaded callbacks was added.

Changes

Cohort / File(s)	Summary
FFI Callback Thread-Safety `src/bun.js/bindings/JSFFIFunction.cpp`	FFICallbackFunctionWrapper now derives from `ThreadSafeRefCounted<...>` and adds public `WebCore::ScriptExecutionContextIdentifier m_contextIdentifier` initialized from `globalObject->scriptExecutionContext()->identifier()`. Creation uses `Ref<...>` with `leakRef()`; `FFICallbackFunctionWrapper_destroy` calls `deref()`. `FFI_Callback_threadsafe_call` now captures a `Ref<FFICallbackFunctionWrapper>` (and caches `contextId`) when posting the task and accesses the function via the captured ref; added thread-safety/lifetime comments.
Regression Test `test/regression/issue/28113.test.ts`	New regression test that builds a native repro (pthreads-based), loads it via Bun FFI, registers a `{ threadsafe: true }` JSCallback, and exercises multiple native threads (4 × 1000 callbacks) verifying the callback counter reaches expected value; test is skipped on Windows.

🚥 Pre-merge checks | ✅ 4

✅ Passed checks (4 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately and concisely describes the main fix: preventing a segfault in the threadsafe FFI callback path when invoked from multiple native threads.
Linked Issues check	✅ Passed	The pull request code changes directly address the segfault issue `#28113` by making FFICallbackFunctionWrapper thread-safe and avoiding JSC object access from non-JS threads.
Out of Scope Changes check	✅ Passed	All changes (JSFFIFunction.cpp and the regression test) are directly scoped to fixing the threadsafe callback segfault and validating the fix.
Description check	✅ Passed	The pull request provides a comprehensive description with detailed problem statement, root cause analysis, fix explanation, and verification results.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

Comment @coderabbitai help to get the list of available commands and usage tips.

coderabbitai

Actionable comments posted: 3

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@test/regression/issue/28113.test.ts`:
- Around line 48-52: The test currently uses a fixed 2s sleep to wait for the
JSCallback to run (counter and callback / JSCallback setup); replace that sleep
with an awaited, deterministic loop that polls the counter until it equals the
expected value (or throws after a reasonable timeout) so the test awaits the
completion condition instead of waiting a fixed time; apply the same change to
the other occurrence referenced (around the second sleep at lines 56-57) and
ensure the wait has a clear timeout guard to fail fast if the callback never
runs.
- Around line 94-95: Remove the brittle negative crash-string assertions by
deleting the two lines that assert on stderr:
expect(stderr).not.toContain("Segmentation fault"); and
expect(stderr).not.toContain("Bus error"); in the test (where stderr is
asserted). Keep the existing exit-code and behavioral assertions (do not add
replacement checks) so the test relies on exit/status semantics instead of
searching output for panic strings.
- Line 4: The skip condition only excludes Windows ARM64 but the test uses
pthreads which are unavailable under MSVC on any Windows host; update the flag
used to skip the test (isFFIUnavailable) to treat all Windows builds as
unavailable (e.g., change the definition of isFFIUnavailable from "isWindows &&
isArm64" to simply "isWindows") so the pthread-based test is skipped on Windows;
modify the declaration of isFFIUnavailable referenced by the test to reflect
this change.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: 2554b051-4dd0-4ee0-873f-51662ce4259b

📥 Commits

Reviewing files that changed from the base of the PR and between 10bdb48 and a69e8cd.

📒 Files selected for processing (2)

src/bun.js/bindings/JSFFIFunction.cpp
test/regression/issue/28113.test.ts

alii · 2026-03-14T19:07:30Z

@robobun adopt and verify

robobun · 2026-03-14T19:08:43Z

✅ Pushed two fixes for review feedback in 0663aea:

StandaloneModuleGraph: only switch to .utf8 when output contains non-ASCII, preserving the zero-copy .latin1 path for pure-ASCII bundles.
OutputFile.toBunString: added defer buf.allocator.free(buf.bytes) before the cloneUTF8 return to avoid leaking the buffer.

hsjoberg · 2026-03-14T20:01:27Z

Epic.

The fix appears to work! ⭐
I tested in my repro https://github.com/hsjoberg/bun-ffi-crash (detailed in #28113).
Even tested a more insane stress test hammering the callback via 128 worker threads.

Thank you Claude Code bot.

…Callback

FFI_Callback_threadsafe_call is the trampoline for JSCallback({ threadsafe: true }) and runs on arbitrary native threads. It was capturing the FFICallbackFunctionWrapper by value in the postTaskTo lambda, which invokes JSC::Strong<>'s copy constructor. That constructor calls HandleSet::allocate() and writeBarrier() on the VM's HandleSet (a singly-linked free list with no locking), racing with the JS thread and corrupting the strong-handle list.

Cache the ScriptExecutionContextIdentifier (a plain uint32_t) in the wrapper at construction time (on the JS thread). Make the wrapper ThreadSafeRefCounted and capture a Ref<> in the lambda instead of copying it, so no Strong<> is touched off-thread and the wrapper survives a concurrent close() while tasks are still queued.

The new test spawns 4 pthreads that each fire the callback 5000 times while the JS thread churns HandleSet allocations via JSCallback create/close. Under debug+ASAN the unfixed build reliably hits 'member call on null pointer of type JSC::HandleSet' in Strong.h from the corrupted free list; the fixed build receives all 20000 callbacks.

Closes #28113

robobun · 2026-04-29T17:51:01Z

Rebased onto current main (was 6 weeks stale with conflicts) and replaced the regression test.

Same core fix — cached ScriptExecutionContextIdentifier, ThreadSafeRefCounted wrapper, Ref<> capture.

New test (test/js/bun/ffi/ffi-threadsafe-callback.test.ts + threadsafe-callback.c) — the previous test had the JS thread blocked inside pthread_join while the worker threads fired, so there was no contention on HandleSet from the JS side and the race wasn't reliably triggered. The new test keeps the JS thread actively churning JSCallback create/close (each a pair of Strong<> alloc/free) while 4 worker threads fire 20k callbacks total. Unfixed debug+ASAN: 5/5 UBSan member call on null pointer of type 'JSC::HandleSet'. Fixed: 5/5 pass in ~1.7s.

claude · 2026-04-29T18:15:09Z

+// TinyCC (and all of bun:ffi) is disabled on Windows ARM64.
+// On Windows x64 there is no system `cc`, so skip there too — the bug being
+// covered (JSC::Strong<> copied on a non-JS thread) is platform-independent.
+const canRun = !isWindows && !(isWindows && isArm64);


🟡 Nit: !isWindows && !(isWindows && isArm64) is logically equivalent to just !isWindows — the second clause can never affect the result (if isWindows is false the first clause already passes; if true it already fails). Consider simplifying to const canRun = !isWindows; and dropping the now-unused isArm64 import from harness.

Extended reasoning...

What

Line 8 of test/js/bun/ffi/ffi-threadsafe-callback.test.ts reads:

const canRun = !isWindows && !(isWindows && isArm64);

This expression is logically equivalent to !isWindows. The second conjunct !(isWindows && isArm64) never changes the result, so isArm64 (imported on line 3) is effectively unused.

Step-by-step proof

Enumerate the two cases for isWindows:

isWindows = false → first clause !isWindows is true. Second clause: isWindows && isArm64 is false && X = false, so !(false) = true. Result: true && true = true. Same as !isWindows.

isWindows = true → first clause !isWindows is false. && short-circuits; the second clause is never evaluated. Result: false. Same as !isWindows.

In both cases the result equals !isWindows regardless of isArm64, so isArm64 contributes nothing and the import on line 3 is unused.
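The same case analysis can be checked mechanically. The sketch below is a C++ rendering of the TypeScript expression, written only for illustration: canRunOriginal, canRunSimplified and checkAllCases are hypothetical names that do not appear in the test file.

```cpp
#include <cassert>

// The two-clause form quoted from the test file, as a pure function.
constexpr bool canRunOriginal(bool isWindows, bool isArm64) {
    return !isWindows && !(isWindows && isArm64);
}

// The proposed simplification.
constexpr bool canRunSimplified(bool isWindows) {
    return !isWindows;
}

// Compile-time check of all four input combinations.
static_assert(canRunOriginal(false, false) == canRunSimplified(false), "not Windows, not ARM64");
static_assert(canRunOriginal(false, true) == canRunSimplified(false), "not Windows, ARM64");
static_assert(canRunOriginal(true, false) == canRunSimplified(true), "Windows, not ARM64");
static_assert(canRunOriginal(true, true) == canRunSimplified(true), "Windows, ARM64");

// Runtime variant of the same check; returns the number of cases compared.
int checkAllCases() {
    const bool values[] = {false, true};
    int checked = 0;
    for (bool isWindows : values) {
        for (bool isArm64 : values) {
            assert(canRunOriginal(isWindows, isArm64) == canRunSimplified(isWindows));
            ++checked;
        }
    }
    return checked;
}
```

If the two forms ever disagreed, the static_assert lines would fail to compile.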

Why existing code doesn't prevent it

There's no lint rule catching tautological boolean sub-expressions here, and TypeScript's noUnusedLocals doesn't flag isArm64 because it is syntactically referenced — just in dead code.

Addressing the "documentary purpose" objection

One could argue the two-clause form mirrors the two-line comment above it (Windows ARM64 lacks TinyCC; Windows x64 lacks cc). But that argument doesn't hold up: the comment already fully documents both reasons, and the second boolean clause doesn't add independent information — it's a strict subset of the first (isWindows && isArm64 ⊆ isWindows). If anything, leaving it in is mildly misleading: a reader skimming the expression might assume there's some Windows-non-ARM64 case that can run, when there isn't. The comment is the right place for the rationale; the code should just say what it does.

Impact

Zero behavioral impact — the test skips on exactly the same platforms either way. This is purely a readability/cleanliness nit: a redundant clause and an unused import in a brand-new test file.

Fix

import { bunEnv, bunExe, isMacOS, isWindows, tempDir } from "harness";

// TinyCC (and all of bun:ffi) is disabled on Windows ARM64.
// On Windows x64 there is no system `cc`, so skip there too — the bug being
// covered (JSC::Strong<> copied on a non-JS thread) is platform-independent.
const canRun = !isWindows;

robobun · 2026-05-01T19:54:33Z

Independently hit this and pushed a minimal variant to farm/c5575d59/ffi-threadsafe-handleset before finding this PR — capture &wrapper by reference + WTF_MAKE_NONCOPYABLE(FFICallbackFunctionWrapper). The ThreadSafeRefCounted + cached m_contextIdentifier approach here is more thorough (survives close() racing queued tasks, and avoids reading Strong<>::get() off-thread entirely), so deferring to this one.

The test on my branch may be useful as an alternative/addition: it dlopen's pthread_create/pthread_join directly (no system cc required) and runs 256 batches of 8 concurrent pthreads through the callback. Under bun bd it fails 20/20 without the fix (HandleSet::writeBarrier / SentinelLinkedList assertions) and passes 20/20 in ~2s with it; release bun segfaults ~40% of runs without the fix.
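For comparison, the minimal variant described above (capture by reference plus a non-copyable wrapper) can be sketched in standard C++. This is an illustration under assumptions, not the code on that branch: NonCopyableWrapper and invokeThroughReference are hypothetical, and the deleted copy operations stand in for what WTF_MAKE_NONCOPYABLE is used for here.

```cpp
#include <cassert>
#include <type_traits>

// Hypothetical wrapper whose copy operations are deleted, so a by-value
// lambda capture becomes a compile error instead of a silent copy.
struct NonCopyableWrapper {
    NonCopyableWrapper() = default;
    NonCopyableWrapper(const NonCopyableWrapper&) = delete;
    NonCopyableWrapper& operator=(const NonCopyableWrapper&) = delete;
    int calls = 0;
};

static_assert(!std::is_copy_constructible<NonCopyableWrapper>::value,
              "by-value capture of the wrapper must not compile");

// Captures the wrapper by reference and invokes the task once.
// The wrapper must outlive the task; nothing here extends its lifetime.
int invokeThroughReference() {
    NonCopyableWrapper wrapper;
    assert(wrapper.calls == 0);
    auto task = [&wrapper] { ++wrapper.calls; };
    // auto bad = [wrapper] {}; // would fail to compile: copy constructor is deleted
    task();
    return wrapper.calls;
}
```

A reference capture avoids the copy but does not keep the wrapper alive, which is the gap the ref-counted approach closes when close() races with queued tasks.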

github-actions Bot added the claude label Mar 14, 2026

coderabbitai Bot reviewed Mar 14, 2026

claude Bot reviewed Mar 14, 2026

robobun mentioned this pull request Mar 14, 2026

Fix bun install hang with security scanner and many packages #28116

Closed

hsjoberg reviewed Mar 15, 2026

robobun force-pushed the claude/fix-ffi-threadsafe-callback-segfault branch from f8a231e to 436fcd4 Compare April 29, 2026 17:49

[autofix.ci] apply automated fixes

9c80f24

claude Bot reviewed Apr 29, 2026

robobun mentioned this pull request May 3, 2026

fix(ffi): defer JSBigInt allocation for threadsafe JSCallback i64/u64 args to the JS thread #30165

Open

robobun wants to merge 2 commits into main from claude/fix-ffi-threadsafe-callback-segfault
