Fix deadlock, UAF, and lock inversion in PathWatcher by robobun · Pull Request #28104 · oven-sh/bun

robobun · 2026-03-14T14:56:54Z

Summary

Targeted fix for three concurrency bugs in src/bun.js/node/path_watcher.zig that cause fs.watch() to deadlock or crash:

unregisterWatcher self-deadlock: The defer calling deinit() fired while this.mutex was still held (Zig defers are LIFO). Hoisted to a should_deinit flag checked by a defer registered before the lock, so LIFO ordering yields unlock → deinit.
unrefPendingTask UAF: Same pattern — deinit() called under this.mutex, then destroy(this) means the deferred mutex.unlock() writes to freed memory. Same fix pattern.
processWatcher AB/BA lock inversion: Worker thread held watcher.mutex → acquired manager.mutex via _decrementPathRef. Main thread held manager.mutex → wanted watcher.mutex. Fix: capture append result, unlock watcher.mutex before calling _decrementPathRef.

Also applies the same should_deinit pattern to unrefPendingDirectory to avoid self-deadlock when deinit() re-locks via setClosed().

This is the same approach as #27469, rebased onto current main.

Test plan

Debug build compiles successfully
fs.watch.test.ts passes (permission tests fail due to running as root — pre-existing, same on main)
CI green

https://claude.ai/code/session_01PxdzvsDCznAw3KZ5zSPF5c

Three targeted fixes in path_watcher.zig: 1. unregisterWatcher self-deadlock: The defer that called deinit() fired while this.mutex was still held (Zig defers are LIFO). Hoist to a should_deinit flag checked by a defer registered before the lock, so LIFO ordering yields unlock then deinit. 2. unrefPendingTask UAF: Same pattern — deinit() called under this.mutex, then destroy(this) means the deferred unlock is a UAF. Same fix: should_deinit flag with early defer. 3. processWatcher AB/BA lock inversion: Worker thread held watcher.mutex then acquired manager.mutex via _decrementPathRef. Main thread held manager.mutex then wanted watcher.mutex. Fix: capture append result, unlock watcher.mutex before calling _decrementPathRef. Also fixes unrefPendingDirectory with the same should_deinit pattern to avoid self-deadlock when deinit() re-locks this.mutex via setClosed(). https://claude.ai/code/session_01PxdzvsDCznAw3KZ5zSPF5c

robobun · 2026-03-14T14:57:06Z

^{Updated 8:39 AM PT - Mar 14th, 2026}

❌ @claude, your commit 693d6c0 has 10 failures in Build #39620 (All Failures):

test/regression/issue/3657.test.ts - pid 7428 error: addresssanitizer on 🐧 13 x64-asan
test/js/bun/util/v8-heap-snapshot.test.ts - SIGKILL on 🐧 3.23 x64
test/js/node/test/parallel/test-fs-watch-recursive-add-file-to-existing-subfolder.js - pid 8180 error: addresssanitizer on 🐧 13 x64-asan
test/js/node/test/parallel/test-fs-watch-recursive-update-file.js - pid 7560 error: addresssanitizer on 🐧 13 x64-asan
test/js/node/test/parallel/test-fs-watch-recursive-add-file.js - pid 4689 error: addresssanitizer on 🐧 13 x64-asan
test/js/node/test/parallel/test-fs-watch-recursive-add-file-with-url.js - pid 8150 error: addresssanitizer on 🐧 13 x64-asan
test/js/node/test/parallel/test-fs-watch-recursive-sync-write.js - pid 9301 error: addresssanitizer on 🐧 13 x64-asan
test/js/node/test/parallel/test-fs-watch-recursive-assert-leaks.js - pid 5457 error: addresssanitizer on 🐧 13 x64-asan
test/js/node/test/parallel/test-fs-watchfile.js - pid 3534 error: addresssanitizer on 🐧 13 x64-asan
test/js/node/test/parallel/test-fs-watch-recursive-add-folder.js - pid 3416 error: addresssanitizer on 🐧 13 x64-asan

🧪 To try this PR locally:

bunx bun-pr 28104

That installs a local version of the PR into your bun-28104 executable, so you can run:

bun-28104 --bun

github-actions · 2026-03-14T15:02:29Z

Found 8 issues this PR may fix:

Deadlock: all threads permanently blocked due to lock ordering violation between allocator and thread pool (macOS arm64, Bun 1.3.5 SEA) #26762 - Deadlock with lock ordering violation between allocator and thread pool - directly addresses the mutex deadlock prevention
fs.watch() throws unhandled EACCES when a file inside watched directory is not readable #28038 - fs.watch() throws unhandled EACCES - prevents crashes during error conditions with deferred cleanup
fs.watch does not work after previous .close() #18919 - fs.watch does not work after previous .close() - fixes cleanup issues that prevent proper reuse after close operations
fs.watch stops emitting change events for a file after it is deleted and recreated (Linux: WSL2/Ubuntu) #24875 - fs.watch stops emitting change events after file deletion/recreation - prevents state corruption during file lifecycle events
Bun watch #19004 - Bun watch crash on Windows - prevents use-after-free crashes during watcher cleanup
PANIC on hot-reload when modifying imported modules (release ,debug) and HUGE memory leak (debug). #8513 - PANIC on hot-reload with memory leak - addresses improper cleanup causing panics and memory leaks
Possible race condition with --filter #10357 - Possible race condition with --filter - fixes threading issues with improved mutex ordering
fs.watch goes to infinite loop with big files #6955 - fs.watch infinite loop with big files - prevents deadlocks that cause infinite loops during cleanup

If this is helpful, consider adding Fixes number to the PR description to auto-close the issue on merge.

Generated with Claude Code

coderabbitai · 2026-03-14T15:07:11Z

Walkthrough

Refactors deinitialization logic in path watcher to defer cleanup operations until after mutex locks are released, avoiding potential deadlocks and use-after-free scenarios during task and directory watcher teardown.

Changes

Cohort / File(s)	Summary
Path Watcher Deinitialization Safety `src/bun.js/node/path_watcher.zig`	Replaces direct deinit() calls with deferred cleanup patterns using should_deinit flags, ensuring deinitialization occurs after releasing locks. Adjusts reference counting and control flow around watcher teardown to prevent deadlocks and resource leaks during error scenarios.

🚥 Pre-merge checks | ✅ 2

✅ Passed checks (2 passed)

Check name	Status	Explanation
Title check	✅ Passed	The title accurately summarizes the main changes: fixing three concurrency bugs (deadlock, UAF, lock inversion) in PathWatcher, which is the core objective of the changeset.
Description check	✅ Passed	The description covers both required template sections: detailed explanation of what the PR does with specific bug fixes, and a test plan showing local verification steps completed.

_{✏️ Tip: You can configure your own custom pre-merge checks in the settings.}

📝 Coding Plan

Generate coding plan for human review comments

_{Comment @coderabbitai help to get the list of available commands and usage tips.}

coderabbitai

Actionable comments posted: 2

🤖 Prompt for all review comments with AI agents

Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@src/bun.js/node/path_watcher.zig`:
- Around line 42-55: The race is that deinit_on_last_task can be set after the
last task cleared has_pending_tasks but before deinit() checks
hasPendingTasks(), leaving teardown untriggered; fix by making
PathWatcherManager.deinit() synchronize with the same mutex or atomically
coordinate with the pending-task flag: either grab the same this.mutex (the
mutex used by unrefPendingTask()/the block that does has_pending_tasks.store)
around the hasPendingTasks() check and the setting of deinit_on_last_task, or
perform an atomic read/compare (e.g. load has_pending_tasks with appropriate
ordering and if it is zero call the manager teardown immediately) and then store
deinit_on_last_task — ensure the code path that sets deinit_on_last_task cannot
race past a concurrent pending_tasks==0 store without triggering deinit
(reference symbols: PathWatcherManager.deinit(),
hasPendingTasks()/has_pending_tasks, deinit_on_last_task,
this.mutex/unrefPendingTask).
- Around line 847-863: The worker path can race into double-destroy by deferring
this.deinit() when it observes this.isClosed() while the closer concurrently
tears down the watcher; to fix, do not schedule deinit from the worker side:
inside the block that runs when pending_directories becomes 0 (the code using
this.pending_directories, this.has_pending_directories.store, and the isClosed()
check), remove the should_deinit/defer deinit logic and instead perform only
safe teardown work under the mutex (e.g. call unregisterWatcher() or set a
"closed-and-drained" state) and let the close path perform the actual
this.deinit() once it holds the proper ownership; ensure you reference and use
this.pending_directories, this.has_pending_directories, this.isClosed(),
unregisterWatcher(), and deinit() to implement this change so no deferred deinit
can run on freed memory.

ℹ️ Review info

⚙️ Run configuration

Configuration used: Path: .coderabbit.yaml

Review profile: ASSERTIVE

Plan: Pro

Run ID: c8035eef-7d0c-4045-802f-fb725d7d3456

📥 Commits

Reviewing files that changed from the base of the PR and between 1f134a1 and 693d6c0.

📒 Files selected for processing (1)

src/bun.js/node/path_watcher.zig

coderabbitai · 2026-03-14T15:07:13Z

+        // deinit() may destroy(this). Defer it until after unlock so we don't
+        // unlock() a freed mutex.
+        var should_deinit = false;
+        defer if (should_deinit) this.deinit();
+
        this.mutex.lock();
        defer this.mutex.unlock();
        this.pending_tasks -= 1;
-        if (this.deinit_on_last_task and this.pending_tasks == 0) {
+        if (this.pending_tasks == 0) {
+            // Clear unconditionally: if tasks drain to zero before deinit() runs,
+            // gating this on deinit_on_last_task leaves the flag stale-true and
+            // deinit() keeps deferring on a count that is already zero.
            this.has_pending_tasks.store(false, .release);
-            this.deinit();
+            if (this.deinit_on_last_task) should_deinit = true;


⚠️ Potential issue | 🟠 Major

deinit_on_last_task can still be armed after the final task already exited.

This fixes the stale-has_pending_tasks case, but PathWatcherManager.deinit() still does hasPendingTasks() before taking this.mutex. If the last worker reaches Line 50 after that load but before deinit() stores deinit_on_last_task = true, Line 55 stays false, the worker returns, and no later unrefPendingTask() exists to finish teardown. That leaves the manager/main watcher leaked after the last watcher closes on the error path.

Suggested fix

fn deinit(this: *PathWatcherManager) void { @@ - if (this.hasPendingTasks()) { - this.mutex.lock(); - defer this.mutex.unlock(); - // deinit when all tasks are done - this.deinit_on_last_task = true; - return; - } + { + this.mutex.lock(); + defer this.mutex.unlock(); + if (this.pending_tasks > 0) { + // deinit when all tasks are done + this.deinit_on_last_task = true; + return; + } + }

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/bun.js/node/path_watcher.zig` around lines 42 - 55, The race is that deinit_on_last_task can be set after the last task cleared has_pending_tasks but before deinit() checks hasPendingTasks(), leaving teardown untriggered; fix by making PathWatcherManager.deinit() synchronize with the same mutex or atomically coordinate with the pending-task flag: either grab the same this.mutex (the mutex used by unrefPendingTask()/the block that does has_pending_tasks.store) around the hasPendingTasks() check and the setting of deinit_on_last_task, or perform an atomic read/compare (e.g. load has_pending_tasks with appropriate ordering and if it is zero call the manager teardown immediately) and then store deinit_on_last_task — ensure the code path that sets deinit_on_last_task cannot race past a concurrent pending_tasks==0 store without triggering deinit (reference symbols: PathWatcherManager.deinit(), hasPendingTasks()/has_pending_tasks, deinit_on_last_task, this.mutex/unrefPendingTask).

coderabbitai · 2026-03-14T15:07:14Z

+        // deinit() calls setClosed() which re-locks this.mutex, and may then
+        // proceed to destroy(this). Defer it until after unlock so we don't
+        // self-deadlock or unlock() a freed mutex.
+        var should_deinit = false;
+        defer if (should_deinit) this.deinit();
+
        this.mutex.lock();
        defer this.mutex.unlock();
        this.pending_directories -= 1;
-        if (this.isClosed() and this.pending_directories == 0) {
+        if (this.pending_directories == 0) {
+            // Clear unconditionally: if the scan drains to zero before close()
+            // runs (the common case — scan is fast, close happens later),
+            // gating this on isClosed() leaves the flag stale-true. deinit()
+            // then early-returns on hasPendingDirectories() forever,
+            // unregisterWatcher never runs, and every fd the scan opened leaks.
            this.has_pending_directories.store(false, .release);
-            this.deinit();
+            if (this.isClosed()) should_deinit = true;


⚠️ Potential issue | 🔴 Critical

PathWatcher.deinit() can race the last unrefPendingDirectory() into a double destroy.

deinit() still calls setClosed() and then separately loads hasPendingDirectories(). If the close path releases this.mutex in setClosed() and the worker reaches Lines 856-863 before that load, the worker queues a deferred this.deinit() because closed == true while the closer simultaneously sees hasPendingDirectories() == false and tears the watcher down immediately. The deferred call then runs on freed memory.

Suggested fix

pub fn deinit(this: *PathWatcher) void { - this.setClosed(); - if (this.hasPendingDirectories()) { - // will be freed on last directory - return; - } + { + this.mutex.lock(); + defer this.mutex.unlock(); + this.closed.store(true, .release); + if (this.pending_directories > 0) { + // will be freed on last directory + return; + } + }

📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

// deinit() calls setClosed() which re-locks this.mutex, and may then

// proceed to destroy(this). Defer it until after unlock so we don't

// self-deadlock or unlock() a freed mutex.

var should_deinit = false;

defer if (should_deinit) this.deinit();

this.mutex.lock();

defer this.mutex.unlock();

this.pending_directories -= 1;

if (this.isClosed() and this.pending_directories == 0) {

if (this.pending_directories == 0) {

// Clear unconditionally: if the scan drains to zero before close()

// runs (the common case — scan is fast, close happens later),

// gating this on isClosed() leaves the flag stale-true. deinit()

// then early-returns on hasPendingDirectories() forever,

// unregisterWatcher never runs, and every fd the scan opened leaks.

this.has_pending_directories.store(false, .release);

this.deinit();

if (this.isClosed()) should_deinit = true;

pub fn deinit(this: *PathWatcher) void {

{

this.mutex.lock();

defer this.mutex.unlock();

this.closed.store(true, .release);

if (this.pending_directories > 0) {

// will be freed on last directory

return;

}

}

🤖 Prompt for AI Agents

Verify each finding against the current code and only fix it if needed. In `@src/bun.js/node/path_watcher.zig` around lines 847 - 863, The worker path can race into double-destroy by deferring this.deinit() when it observes this.isClosed() while the closer concurrently tears down the watcher; to fix, do not schedule deinit from the worker side: inside the block that runs when pending_directories becomes 0 (the code using this.pending_directories, this.has_pending_directories.store, and the isClosed() check), remove the should_deinit/defer deinit logic and instead perform only safe teardown work under the mutex (e.g. call unregisterWatcher() or set a "closed-and-drained" state) and let the close path perform the actual this.deinit() once it holds the proper ownership; ensure you reference and use this.pending_directories, this.has_pending_directories, this.isClosed(), unregisterWatcher(), and deinit() to implement this change so no deferred deinit can run on freed memory.

claude · 2026-03-14T15:34:17Z

    pub fn unrefPendingDirectory(this: *PathWatcher) void {
+        // deinit() calls setClosed() which re-locks this.mutex, and may then
+        // proceed to destroy(this). Defer it until after unlock so we don't
+        // self-deadlock or unlock() a freed mutex.
+        var should_deinit = false;
+        defer if (should_deinit) this.deinit();
+
        this.mutex.lock();
        defer this.mutex.unlock();
        this.pending_directories -= 1;
-        if (this.isClosed() and this.pending_directories == 0) {
+        if (this.pending_directories == 0) {
+            // Clear unconditionally: if the scan drains to zero before close()
+            // runs (the common case — scan is fast, close happens later),
+            // gating this on isClosed() leaves the flag stale-true. deinit()
+            // then early-returns on hasPendingDirectories() forever,
+            // unregisterWatcher never runs, and every fd the scan opened leaks.
            this.has_pending_directories.store(false, .release);
-            this.deinit();
+            if (this.isClosed()) should_deinit = true;
        }
    }


🔴 Race condition in unrefPendingDirectory: both the main thread (via detach()->deinit()) and a worker thread (via the deferred should_deinit path) can execute PathWatcher.deinit() concurrently, causing a double-free/UAF. After the worker clears has_pending_directories and releases the mutex, the main thread's deinit() sees hasPendingDirectories()==false at line 909 and proceeds to destroy(this) at line 936, while the worker's deferred this.deinit() at line 851 then fires on freed memory. Adding an atomic deinit_started flag with a compare-and-swap guard in deinit() would ensure only one thread proceeds with cleanup.

Extended reasoning...

The Bug

The deferred should_deinit pattern introduced in unrefPendingDirectory (lines 846-865) creates a race window between the worker thread and the main thread that can lead to double-free or use-after-free.

Race Scenario Step-by-Step

Consider a user calling watcher.close() (which triggers detach()->deinit() on the main thread) while a DirectoryRegisterTask worker is finishing its last directory scan (which calls unrefPendingDirectory on the worker thread):

Thread A (main): detach() -> deinit() -> setClosed() acquires this.mutex, stores closed=true, releases this.mutex (lines 840-843)

Thread B (worker): unrefPendingDirectory() acquires this.mutex (line 853), decrements pending_directories to 0 (line 855), stores has_pending_directories=false with .release ordering (line 862), reads isClosed() -> true (set by Thread A in step 1), sets should_deinit=true (line 863), then the defer this.mutex.unlock() fires (line 854)

Thread A: hasPendingDirectories() (line 909) performs an atomic .acquire load of has_pending_directories -> sees false (Thread B cleared it in step 2; visible via the happens-before chain: A mutex.unlock -> B mutex.lock -> B .release store -> B mutex.unlock -> A .acquire load)

Thread A: Proceeds past the guard at line 909, calls manager.unregisterWatcher(this) (line 925), then bun.default_allocator.destroy(this) (line 936) — PathWatcher is now freed

Thread B: The outer defer fires (line 851): should_deinit == true -> this.deinit() -> this.setClosed() -> this.mutex.lock() — UAF: dereferences freed this

Even if step 5 doesn't crash on the mutex access, the second deinit() proceeds to call unregisterWatcher and destroy again, causing a double-free.

Why This Wasn't Possible Before

Before this PR, unrefPendingDirectory called this.deinit() while still holding this.mutex. Since deinit()->setClosed() tries to re-acquire this.mutex (a non-recursive lock), this self-deadlocked — which is the bug this PR intends to fix. The deadlock, while itself a bug, prevented the worker thread from ever completing deinit(), so only the main thread could proceed with cleanup. The PR correctly fixes the self-deadlock by deferring deinit() until after the mutex is released, but this creates the window between mutex.unlock (DEFER-2 in LIFO order) and deinit (DEFER-1) where the main thread can observe the cleared has_pending_directories flag and race ahead to destroy the watcher.

Impact

This is a real concurrency bug that can cause crashes (segfault on UAF or double-free) in production when a user closes a recursive fs.watch() watcher while directory scanning is still in progress. The race window is narrow but plausible since it only requires the main-thread close and the last worker directory-unref to overlap.

Suggested Fix

Add an atomic deinit_started flag (e.g., std.atomic.Value(bool)) to PathWatcher and guard deinit() with an atomic compare-and-swap:

pub fn deinit(this: *PathWatcher) void { // Ensure only one thread proceeds with cleanup if (this.deinit_started.cmpxchgStrong(false, true, .acq_rel, .acquire) != null) return; this.setClosed(); // ... rest of deinit }

This ensures that whichever thread reaches deinit() first performs the cleanup, and the other thread's call is a no-op.

claude · 2026-03-14T15:34:17Z

+        var should_deinit = false;
+        defer if (should_deinit) this.deinit();
+
        this.mutex.lock();
        defer this.mutex.unlock();

        var watchers = this.watchers.slice();


🟣 Pre-existing AB/BA lock inversion between onError() and deinit(): onError() acquires this.mutex then default_manager_mutex (lines 311, 326), while deinit() acquires them in the opposite order (lines 671, 685). This PR makes the deadlock path more reachable by deferring deinit() outside the lock in unregisterWatcher, but the root cause (inconsistent lock ordering in onError) is pre-existing. Consider fixing onError() to release this.mutex before acquiring default_manager_mutex, matching the order used by deinit().

Extended reasoning...

What the bug is

onError() (line 306) and deinit() (line 669) acquire this.mutex and default_manager_mutex in opposite orders, creating a classic AB/BA lock inversion:

onError() (watcher thread): acquires this.mutex (line 311), then default_manager_mutex (line 326)

deinit(): acquires default_manager_mutex (line 671), then conditionally acquires this.mutex (line 685, when hasPendingTasks() is true)

How the PR makes this more reachable

Before this PR, deinit() was called from unregisterWatcher and unrefPendingTask while this.mutex was already held. This caused a self-deadlock at line 685 (re-locking a non-recursive mutex), which was itself a bug — but it also meant the cross-thread AB/BA path could never be reached from those call sites.

After this PR, deinit() is deferred until after this.mutex is released (the should_deinit pattern). This correctly fixes the self-deadlock and UAF, but now deinit() enters with no locks held and acquires default_manager_mutex first, then this.mutex — the opposite order from onError().

Concrete deadlock scenario

Thread A (main thread): unregisterWatcher sets should_deinit = true, releases this.mutex, deferred deinit() fires, acquires default_manager_mutex (line 671), calls hasPendingTasks() which returns true (directory scans in flight), tries this.mutex.lock() at line 685 — blocks.

Thread B (watcher thread): onError() fires, acquires this.mutex (line 311), tries default_manager_mutex.lock() at line 326 — blocks.

Result: classic deadlock — each thread holds the lock the other needs.

Why this is pre-existing

The inconsistent lock ordering in onError() is not introduced by this PR — onError() is unchanged. Moreover, onError() itself calls this.deinit() at line 332 after releasing both locks (via the block scope ending at line 329). This means a concurrent deinit() from another path (e.g., a second error, or a watcher closing) could already race against onError() acquiring default_manager_mutex while holding this.mutex. The PR increases the surface area for hitting this inversion but does not introduce the fundamental ordering inconsistency.

Impact and fix

The deadlock requires a watcher error occurring simultaneously with the last watcher being unregistered while directory scan tasks are pending — a narrow but real window. The fix would be to make onError() release this.mutex before acquiring default_manager_mutex, similar to how processWatcher was fixed in this same PR. For example, the watchers block (lines 310-328) could be split: iterate and notify watchers under this.mutex, release it, then acquire default_manager_mutex to clear default_manager.

robobun · 2026-04-17T03:57:58Z

Superseded by #29391 — same approach, rebased onto current main with no behavior changes beyond the defer-ordering fix (the unconditional has_pending_* clear in this PR caused ASAN regressions).

@chrislloyd

## What does this PR do? Targeted fix for four concurrency bugs in `src/bun.js/node/path_watcher.zig` that cause `fs.watch()` to deadlock or crash. This is the minimal port of #27469 (by @chrislloyd) onto current main — previous attempts (#27957, #28088, #28104) added scope and introduced ASAN regressions. This PR applies only the defer-ordering fix. ## Root causes ### `unregisterWatcher` self-deadlock (primary) The deinit-on-last-watcher `defer` was registered *after* the `mutex.lock()`/`defer mutex.unlock()` pair: ```zig this.mutex.lock(); defer this.mutex.unlock(); // fires last var watchers = this.watchers.slice(); defer { // fires BEFORE unlock if (this.deinit_on_last_watcher and this.watcher_count == 0) { this.deinit(); // called holding this.mutex } } ``` Zig defers fire LIFO, so `deinit()` ran while still holding `this.mutex`. `deinit()` re-acquires `this.mutex` when `hasPendingTasks()` is true. The mutex is non-recursive → self-deadlock (observed as `__ulock_wait2` hang on macOS, debug-build `panic: Deadlock detected` on Linux). Also UAF: if `deinit()` completes it calls `destroy(this)`, then the deferred `mutex.unlock()` writes to freed memory. ### `unrefPendingTask` UAF Same shape: `deinit()` called while holding `this.mutex`. If it completes, `destroy(this)` means the deferred `mutex.unlock()` is a UAF. ### `unrefPendingDirectory` self-deadlock `deinit()` calls `setClosed()` which re-locks `this.mutex`. Called while holding → self-deadlock. ### `processWatcher` AB/BA lock inversion Worker thread holds `watcher.mutex` → calls `manager._decrementPathRef()` on the OOM error path, which acquires `manager.mutex`. Main thread `unregisterWatcher` holds `manager.mutex` → wants `watcher.mutex`. Classic AB/BA deadlock. ## Fix All four use the same pattern: set a `should_deinit` flag under the lock, checked by a `defer` registered *before* the lock/unlock pair. LIFO ordering then yields unlock → deinit. For `processWatcher`, capture the append result and release `watcher.mutex` before calling `_decrementPathRef`. No semantic changes to when the `has_pending_*` atomics are cleared — the conditions remain exactly as before, only `deinit()` is moved outside the lock. ## How did you verify your code works? - `bun bd test test/js/node/watch/fs.watch.deadlock.test.ts` passes - Without the fix (src/ stashed), the same test fails with `panic: Deadlock detected` from the debug-build mutex checker - All Node.js parallel `test-fs-watch-recursive-*.js` tests pass under ASAN (these regressed in #28104) - `test/regression/issue/3657.test.ts` passes under ASAN - `test/js/node/watch/fs.watch.test.ts` — 30 pass, 2 pre-existing root-permission failures (same on main) Supersedes #27469, #27957, #28088, #28104. --- **Re: duplicate-bot flagging #26385** — that PR fixes a different lock inversion (`PathWatcherManager.mutex` ↔ `Watcher.mutex`, around `main_watcher.remove()`). This PR fixes `PathWatcherManager.mutex` self-deadlock and `PathWatcher.mutex` ↔ `PathWatcherManager.mutex` inversion. They're complementary, not duplicates. **Re: find-issues-bot flagging #18919** — tested the repro; it hits a separate pre-existing use-after-poison in the File Watcher thread that also reproduces on main without this change. Not claiming that fix here.

@chrislloyd

…9391) ## What does this PR do? Targeted fix for four concurrency bugs in `src/bun.js/node/path_watcher.zig` that cause `fs.watch()` to deadlock or crash. This is the minimal port of oven-sh#27469 (by @chrislloyd) onto current main — previous attempts (oven-sh#27957, oven-sh#28088, oven-sh#28104) added scope and introduced ASAN regressions. This PR applies only the defer-ordering fix. ## Root causes ### `unregisterWatcher` self-deadlock (primary) The deinit-on-last-watcher `defer` was registered *after* the `mutex.lock()`/`defer mutex.unlock()` pair: ```zig this.mutex.lock(); defer this.mutex.unlock(); // fires last var watchers = this.watchers.slice(); defer { // fires BEFORE unlock if (this.deinit_on_last_watcher and this.watcher_count == 0) { this.deinit(); // called holding this.mutex } } ``` Zig defers fire LIFO, so `deinit()` ran while still holding `this.mutex`. `deinit()` re-acquires `this.mutex` when `hasPendingTasks()` is true. The mutex is non-recursive → self-deadlock (observed as `__ulock_wait2` hang on macOS, debug-build `panic: Deadlock detected` on Linux). Also UAF: if `deinit()` completes it calls `destroy(this)`, then the deferred `mutex.unlock()` writes to freed memory. ### `unrefPendingTask` UAF Same shape: `deinit()` called while holding `this.mutex`. If it completes, `destroy(this)` means the deferred `mutex.unlock()` is a UAF. ### `unrefPendingDirectory` self-deadlock `deinit()` calls `setClosed()` which re-locks `this.mutex`. Called while holding → self-deadlock. ### `processWatcher` AB/BA lock inversion Worker thread holds `watcher.mutex` → calls `manager._decrementPathRef()` on the OOM error path, which acquires `manager.mutex`. Main thread `unregisterWatcher` holds `manager.mutex` → wants `watcher.mutex`. Classic AB/BA deadlock. ## Fix All four use the same pattern: set a `should_deinit` flag under the lock, checked by a `defer` registered *before* the lock/unlock pair. LIFO ordering then yields unlock → deinit. For `processWatcher`, capture the append result and release `watcher.mutex` before calling `_decrementPathRef`. No semantic changes to when the `has_pending_*` atomics are cleared — the conditions remain exactly as before, only `deinit()` is moved outside the lock. ## How did you verify your code works? - `bun bd test test/js/node/watch/fs.watch.deadlock.test.ts` passes - Without the fix (src/ stashed), the same test fails with `panic: Deadlock detected` from the debug-build mutex checker - All Node.js parallel `test-fs-watch-recursive-*.js` tests pass under ASAN (these regressed in oven-sh#28104) - `test/regression/issue/3657.test.ts` passes under ASAN - `test/js/node/watch/fs.watch.test.ts` — 30 pass, 2 pre-existing root-permission failures (same on main) Supersedes oven-sh#27469, oven-sh#27957, oven-sh#28088, oven-sh#28104. --- **Re: duplicate-bot flagging oven-sh#26385** — that PR fixes a different lock inversion (`PathWatcherManager.mutex` ↔ `Watcher.mutex`, around `main_watcher.remove()`). This PR fixes `PathWatcherManager.mutex` self-deadlock and `PathWatcher.mutex` ↔ `PathWatcherManager.mutex` inversion. They're complementary, not duplicates. **Re: find-issues-bot flagging oven-sh#18919** — tested the repro; it hits a separate pre-existing use-after-poison in the File Watcher thread that also reproduces on main without this change. Not claiming that fix here.

robobun mentioned this pull request Mar 14, 2026

Fix PathWatcherManager deadlock and UAF in deferred deinit #28088

Closed

github-actions Bot added the claude label Mar 14, 2026

coderabbitai Bot reviewed Mar 14, 2026

View reviewed changes

claude Bot reviewed Mar 14, 2026

View reviewed changes

robobun mentioned this pull request Apr 17, 2026

Fix PathWatcherManager deadlock and UAF in deferred deinit #29391

Merged

robobun closed this Apr 17, 2026

claude Bot mentioned this pull request Apr 29, 2026

fs.watch: fix PathWatcher double-free race between close() and DirectoryRegisterTask #29936

Closed

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Fix deadlock, UAF, and lock inversion in PathWatcher#28104

Fix deadlock, UAF, and lock inversion in PathWatcher#28104
robobun wants to merge 1 commit intomainfrom
claude/path-watcher-deadlock-fix-v2

robobun commented Mar 14, 2026

Uh oh!

robobun commented Mar 14, 2026 •

edited

Loading

Uh oh!

github-actions Bot commented Mar 14, 2026

Uh oh!

coderabbitai Bot commented Mar 14, 2026

Uh oh!

coderabbitai Bot left a comment

Uh oh!

coderabbitai Bot Mar 14, 2026

Uh oh!

coderabbitai Bot Mar 14, 2026

Uh oh!

claude Bot Mar 14, 2026

Uh oh!

claude Bot Mar 14, 2026

Uh oh!

robobun commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

Conversation

robobun commented Mar 14, 2026

Summary

Test plan

Uh oh!

robobun commented Mar 14, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Uh oh!

github-actions Bot commented Mar 14, 2026

Uh oh!

coderabbitai Bot commented Mar 14, 2026

Walkthrough

Changes

Uh oh!

coderabbitai Bot left a comment

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 14, 2026

Choose a reason for hiding this comment

Uh oh!

coderabbitai Bot Mar 14, 2026

Choose a reason for hiding this comment

Uh oh!

claude Bot Mar 14, 2026

Choose a reason for hiding this comment

The Bug

Race Scenario Step-by-Step

Why This Wasn't Possible Before

Impact

Suggested Fix

Uh oh!

claude Bot Mar 14, 2026

Choose a reason for hiding this comment

What the bug is

How the PR makes this more reachable

Concrete deadlock scenario

Why this is pre-existing

Impact and fix

Uh oh!

robobun commented Apr 17, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

2 participants

robobun commented Mar 14, 2026 •

edited

Loading