Fix deadlock, UAF, and lock inversion in PathWatcher#28104
Fix deadlock, UAF, and lock inversion in PathWatcher#28104
Conversation
Three targeted fixes in path_watcher.zig: 1. unregisterWatcher self-deadlock: The defer that called deinit() fired while this.mutex was still held (Zig defers are LIFO). Hoist to a should_deinit flag checked by a defer registered before the lock, so LIFO ordering yields unlock then deinit. 2. unrefPendingTask UAF: Same pattern — deinit() called under this.mutex, then destroy(this) means the deferred unlock is a UAF. Same fix: should_deinit flag with early defer. 3. processWatcher AB/BA lock inversion: Worker thread held watcher.mutex then acquired manager.mutex via _decrementPathRef. Main thread held manager.mutex then wanted watcher.mutex. Fix: capture append result, unlock watcher.mutex before calling _decrementPathRef. Also fixes unrefPendingDirectory with the same should_deinit pattern to avoid self-deadlock when deinit() re-locks this.mutex via setClosed(). https://claude.ai/code/session_01PxdzvsDCznAw3KZ5zSPF5c
|
Found 8 issues this PR may fix:
If this is helpful, consider adding Fixes number to the PR description to auto-close the issue on merge. Generated with Claude Code |
WalkthroughRefactors deinitialization logic in path watcher to defer cleanup operations until after mutex locks are released, avoiding potential deadlocks and use-after-free scenarios during task and directory watcher teardown. Changes
🚥 Pre-merge checks | ✅ 2✅ Passed checks (2 passed)
✏️ Tip: You can configure your own custom pre-merge checks in the settings. 📝 Coding Plan
Comment |
There was a problem hiding this comment.
Actionable comments posted: 2
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.
Inline comments:
In `@src/bun.js/node/path_watcher.zig`:
- Around line 42-55: The race is that deinit_on_last_task can be set after the
last task cleared has_pending_tasks but before deinit() checks
hasPendingTasks(), leaving teardown untriggered; fix by making
PathWatcherManager.deinit() synchronize with the same mutex or atomically
coordinate with the pending-task flag: either grab the same this.mutex (the
mutex used by unrefPendingTask()/the block that does has_pending_tasks.store)
around the hasPendingTasks() check and the setting of deinit_on_last_task, or
perform an atomic read/compare (e.g. load has_pending_tasks with appropriate
ordering and if it is zero call the manager teardown immediately) and then store
deinit_on_last_task — ensure the code path that sets deinit_on_last_task cannot
race past a concurrent pending_tasks==0 store without triggering deinit
(reference symbols: PathWatcherManager.deinit(),
hasPendingTasks()/has_pending_tasks, deinit_on_last_task,
this.mutex/unrefPendingTask).
- Around line 847-863: The worker path can race into double-destroy by deferring
this.deinit() when it observes this.isClosed() while the closer concurrently
tears down the watcher; to fix, do not schedule deinit from the worker side:
inside the block that runs when pending_directories becomes 0 (the code using
this.pending_directories, this.has_pending_directories.store, and the isClosed()
check), remove the should_deinit/defer deinit logic and instead perform only
safe teardown work under the mutex (e.g. call unregisterWatcher() or set a
"closed-and-drained" state) and let the close path perform the actual
this.deinit() once it holds the proper ownership; ensure you reference and use
this.pending_directories, this.has_pending_directories, this.isClosed(),
unregisterWatcher(), and deinit() to implement this change so no deferred deinit
can run on freed memory.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Path: .coderabbit.yaml
Review profile: ASSERTIVE
Plan: Pro
Run ID: c8035eef-7d0c-4045-802f-fb725d7d3456
📒 Files selected for processing (1)
src/bun.js/node/path_watcher.zig
| // deinit() may destroy(this). Defer it until after unlock so we don't | ||
| // unlock() a freed mutex. | ||
| var should_deinit = false; | ||
| defer if (should_deinit) this.deinit(); | ||
|
|
||
| this.mutex.lock(); | ||
| defer this.mutex.unlock(); | ||
| this.pending_tasks -= 1; | ||
| if (this.deinit_on_last_task and this.pending_tasks == 0) { | ||
| if (this.pending_tasks == 0) { | ||
| // Clear unconditionally: if tasks drain to zero before deinit() runs, | ||
| // gating this on deinit_on_last_task leaves the flag stale-true and | ||
| // deinit() keeps deferring on a count that is already zero. | ||
| this.has_pending_tasks.store(false, .release); | ||
| this.deinit(); | ||
| if (this.deinit_on_last_task) should_deinit = true; |
There was a problem hiding this comment.
deinit_on_last_task can still be armed after the final task already exited.
This fixes the stale-has_pending_tasks case, but PathWatcherManager.deinit() still does hasPendingTasks() before taking this.mutex. If the last worker reaches Line 50 after that load but before deinit() stores deinit_on_last_task = true, Line 55 stays false, the worker returns, and no later unrefPendingTask() exists to finish teardown. That leaves the manager/main watcher leaked after the last watcher closes on the error path.
Suggested fix
fn deinit(this: *PathWatcherManager) void {
@@
- if (this.hasPendingTasks()) {
- this.mutex.lock();
- defer this.mutex.unlock();
- // deinit when all tasks are done
- this.deinit_on_last_task = true;
- return;
- }
+ {
+ this.mutex.lock();
+ defer this.mutex.unlock();
+ if (this.pending_tasks > 0) {
+ // deinit when all tasks are done
+ this.deinit_on_last_task = true;
+ return;
+ }
+ }🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/bun.js/node/path_watcher.zig` around lines 42 - 55, The race is that
deinit_on_last_task can be set after the last task cleared has_pending_tasks but
before deinit() checks hasPendingTasks(), leaving teardown untriggered; fix by
making PathWatcherManager.deinit() synchronize with the same mutex or atomically
coordinate with the pending-task flag: either grab the same this.mutex (the
mutex used by unrefPendingTask()/the block that does has_pending_tasks.store)
around the hasPendingTasks() check and the setting of deinit_on_last_task, or
perform an atomic read/compare (e.g. load has_pending_tasks with appropriate
ordering and if it is zero call the manager teardown immediately) and then store
deinit_on_last_task — ensure the code path that sets deinit_on_last_task cannot
race past a concurrent pending_tasks==0 store without triggering deinit
(reference symbols: PathWatcherManager.deinit(),
hasPendingTasks()/has_pending_tasks, deinit_on_last_task,
this.mutex/unrefPendingTask).
| // deinit() calls setClosed() which re-locks this.mutex, and may then | ||
| // proceed to destroy(this). Defer it until after unlock so we don't | ||
| // self-deadlock or unlock() a freed mutex. | ||
| var should_deinit = false; | ||
| defer if (should_deinit) this.deinit(); | ||
|
|
||
| this.mutex.lock(); | ||
| defer this.mutex.unlock(); | ||
| this.pending_directories -= 1; | ||
| if (this.isClosed() and this.pending_directories == 0) { | ||
| if (this.pending_directories == 0) { | ||
| // Clear unconditionally: if the scan drains to zero before close() | ||
| // runs (the common case — scan is fast, close happens later), | ||
| // gating this on isClosed() leaves the flag stale-true. deinit() | ||
| // then early-returns on hasPendingDirectories() forever, | ||
| // unregisterWatcher never runs, and every fd the scan opened leaks. | ||
| this.has_pending_directories.store(false, .release); | ||
| this.deinit(); | ||
| if (this.isClosed()) should_deinit = true; |
There was a problem hiding this comment.
PathWatcher.deinit() can race the last unrefPendingDirectory() into a double destroy.
deinit() still calls setClosed() and then separately loads hasPendingDirectories(). If the close path releases this.mutex in setClosed() and the worker reaches Lines 856-863 before that load, the worker queues a deferred this.deinit() because closed == true while the closer simultaneously sees hasPendingDirectories() == false and tears the watcher down immediately. The deferred call then runs on freed memory.
Suggested fix
pub fn deinit(this: *PathWatcher) void {
- this.setClosed();
- if (this.hasPendingDirectories()) {
- // will be freed on last directory
- return;
- }
+ {
+ this.mutex.lock();
+ defer this.mutex.unlock();
+ this.closed.store(true, .release);
+ if (this.pending_directories > 0) {
+ // will be freed on last directory
+ return;
+ }
+ }📝 Committable suggestion
‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.
| // deinit() calls setClosed() which re-locks this.mutex, and may then | |
| // proceed to destroy(this). Defer it until after unlock so we don't | |
| // self-deadlock or unlock() a freed mutex. | |
| var should_deinit = false; | |
| defer if (should_deinit) this.deinit(); | |
| this.mutex.lock(); | |
| defer this.mutex.unlock(); | |
| this.pending_directories -= 1; | |
| if (this.isClosed() and this.pending_directories == 0) { | |
| if (this.pending_directories == 0) { | |
| // Clear unconditionally: if the scan drains to zero before close() | |
| // runs (the common case — scan is fast, close happens later), | |
| // gating this on isClosed() leaves the flag stale-true. deinit() | |
| // then early-returns on hasPendingDirectories() forever, | |
| // unregisterWatcher never runs, and every fd the scan opened leaks. | |
| this.has_pending_directories.store(false, .release); | |
| this.deinit(); | |
| if (this.isClosed()) should_deinit = true; | |
| pub fn deinit(this: *PathWatcher) void { | |
| { | |
| this.mutex.lock(); | |
| defer this.mutex.unlock(); | |
| this.closed.store(true, .release); | |
| if (this.pending_directories > 0) { | |
| // will be freed on last directory | |
| return; | |
| } | |
| } |
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@src/bun.js/node/path_watcher.zig` around lines 847 - 863, The worker path can
race into double-destroy by deferring this.deinit() when it observes
this.isClosed() while the closer concurrently tears down the watcher; to fix, do
not schedule deinit from the worker side: inside the block that runs when
pending_directories becomes 0 (the code using this.pending_directories,
this.has_pending_directories.store, and the isClosed() check), remove the
should_deinit/defer deinit logic and instead perform only safe teardown work
under the mutex (e.g. call unregisterWatcher() or set a "closed-and-drained"
state) and let the close path perform the actual this.deinit() once it holds the
proper ownership; ensure you reference and use this.pending_directories,
this.has_pending_directories, this.isClosed(), unregisterWatcher(), and deinit()
to implement this change so no deferred deinit can run on freed memory.
| pub fn unrefPendingDirectory(this: *PathWatcher) void { | ||
| // deinit() calls setClosed() which re-locks this.mutex, and may then | ||
| // proceed to destroy(this). Defer it until after unlock so we don't | ||
| // self-deadlock or unlock() a freed mutex. | ||
| var should_deinit = false; | ||
| defer if (should_deinit) this.deinit(); | ||
|
|
||
| this.mutex.lock(); | ||
| defer this.mutex.unlock(); | ||
| this.pending_directories -= 1; | ||
| if (this.isClosed() and this.pending_directories == 0) { | ||
| if (this.pending_directories == 0) { | ||
| // Clear unconditionally: if the scan drains to zero before close() | ||
| // runs (the common case — scan is fast, close happens later), | ||
| // gating this on isClosed() leaves the flag stale-true. deinit() | ||
| // then early-returns on hasPendingDirectories() forever, | ||
| // unregisterWatcher never runs, and every fd the scan opened leaks. | ||
| this.has_pending_directories.store(false, .release); | ||
| this.deinit(); | ||
| if (this.isClosed()) should_deinit = true; | ||
| } | ||
| } |
There was a problem hiding this comment.
🔴 Race condition in unrefPendingDirectory: both the main thread (via detach()->deinit()) and a worker thread (via the deferred should_deinit path) can execute PathWatcher.deinit() concurrently, causing a double-free/UAF. After the worker clears has_pending_directories and releases the mutex, the main thread's deinit() sees hasPendingDirectories()==false at line 909 and proceeds to destroy(this) at line 936, while the worker's deferred this.deinit() at line 851 then fires on freed memory. Adding an atomic deinit_started flag with a compare-and-swap guard in deinit() would ensure only one thread proceeds with cleanup.
Extended reasoning...
The Bug
The deferred should_deinit pattern introduced in unrefPendingDirectory (lines 846-865) creates a race window between the worker thread and the main thread that can lead to double-free or use-after-free.
Race Scenario Step-by-Step
Consider a user calling watcher.close() (which triggers detach()->deinit() on the main thread) while a DirectoryRegisterTask worker is finishing its last directory scan (which calls unrefPendingDirectory on the worker thread):
- Thread A (main):
detach()->deinit()->setClosed()acquiresthis.mutex, storesclosed=true, releasesthis.mutex(lines 840-843) - Thread B (worker):
unrefPendingDirectory()acquiresthis.mutex(line 853), decrementspending_directoriesto 0 (line 855), storeshas_pending_directories=falsewith.releaseordering (line 862), readsisClosed()->true(set by Thread A in step 1), setsshould_deinit=true(line 863), then thedefer this.mutex.unlock()fires (line 854) - Thread A:
hasPendingDirectories()(line 909) performs an atomic.acquireload ofhas_pending_directories-> seesfalse(Thread B cleared it in step 2; visible via the happens-before chain: A mutex.unlock -> B mutex.lock -> B .release store -> B mutex.unlock -> A .acquire load) - Thread A: Proceeds past the guard at line 909, calls
manager.unregisterWatcher(this)(line 925), thenbun.default_allocator.destroy(this)(line 936) — PathWatcher is now freed - Thread B: The outer
deferfires (line 851):should_deinit == true->this.deinit()->this.setClosed()->this.mutex.lock()— UAF: dereferences freedthis
Even if step 5 doesn't crash on the mutex access, the second deinit() proceeds to call unregisterWatcher and destroy again, causing a double-free.
Why This Wasn't Possible Before
Before this PR, unrefPendingDirectory called this.deinit() while still holding this.mutex. Since deinit()->setClosed() tries to re-acquire this.mutex (a non-recursive lock), this self-deadlocked — which is the bug this PR intends to fix. The deadlock, while itself a bug, prevented the worker thread from ever completing deinit(), so only the main thread could proceed with cleanup. The PR correctly fixes the self-deadlock by deferring deinit() until after the mutex is released, but this creates the window between mutex.unlock (DEFER-2 in LIFO order) and deinit (DEFER-1) where the main thread can observe the cleared has_pending_directories flag and race ahead to destroy the watcher.
Impact
This is a real concurrency bug that can cause crashes (segfault on UAF or double-free) in production when a user closes a recursive fs.watch() watcher while directory scanning is still in progress. The race window is narrow but plausible since it only requires the main-thread close and the last worker directory-unref to overlap.
Suggested Fix
Add an atomic deinit_started flag (e.g., std.atomic.Value(bool)) to PathWatcher and guard deinit() with an atomic compare-and-swap:
pub fn deinit(this: *PathWatcher) void {
// Ensure only one thread proceeds with cleanup
if (this.deinit_started.cmpxchgStrong(false, true, .acq_rel, .acquire) != null) return;
this.setClosed();
// ... rest of deinit
}This ensures that whichever thread reaches deinit() first performs the cleanup, and the other thread's call is a no-op.
| var should_deinit = false; | ||
| defer if (should_deinit) this.deinit(); | ||
|
|
||
| this.mutex.lock(); | ||
| defer this.mutex.unlock(); | ||
|
|
||
| var watchers = this.watchers.slice(); |
There was a problem hiding this comment.
🟣 Pre-existing AB/BA lock inversion between onError() and deinit(): onError() acquires this.mutex then default_manager_mutex (lines 311, 326), while deinit() acquires them in the opposite order (lines 671, 685). This PR makes the deadlock path more reachable by deferring deinit() outside the lock in unregisterWatcher, but the root cause (inconsistent lock ordering in onError) is pre-existing. Consider fixing onError() to release this.mutex before acquiring default_manager_mutex, matching the order used by deinit().
Extended reasoning...
What the bug is
onError() (line 306) and deinit() (line 669) acquire this.mutex and default_manager_mutex in opposite orders, creating a classic AB/BA lock inversion:
onError()(watcher thread): acquiresthis.mutex(line 311), thendefault_manager_mutex(line 326)deinit(): acquiresdefault_manager_mutex(line 671), then conditionally acquiresthis.mutex(line 685, whenhasPendingTasks()is true)
How the PR makes this more reachable
Before this PR, deinit() was called from unregisterWatcher and unrefPendingTask while this.mutex was already held. This caused a self-deadlock at line 685 (re-locking a non-recursive mutex), which was itself a bug — but it also meant the cross-thread AB/BA path could never be reached from those call sites.
After this PR, deinit() is deferred until after this.mutex is released (the should_deinit pattern). This correctly fixes the self-deadlock and UAF, but now deinit() enters with no locks held and acquires default_manager_mutex first, then this.mutex — the opposite order from onError().
Concrete deadlock scenario
- Thread A (main thread):
unregisterWatchersetsshould_deinit = true, releasesthis.mutex, deferreddeinit()fires, acquiresdefault_manager_mutex(line 671), callshasPendingTasks()which returns true (directory scans in flight), triesthis.mutex.lock()at line 685 — blocks. - Thread B (watcher thread):
onError()fires, acquiresthis.mutex(line 311), triesdefault_manager_mutex.lock()at line 326 — blocks. - Result: classic deadlock — each thread holds the lock the other needs.
Why this is pre-existing
The inconsistent lock ordering in onError() is not introduced by this PR — onError() is unchanged. Moreover, onError() itself calls this.deinit() at line 332 after releasing both locks (via the block scope ending at line 329). This means a concurrent deinit() from another path (e.g., a second error, or a watcher closing) could already race against onError() acquiring default_manager_mutex while holding this.mutex. The PR increases the surface area for hitting this inversion but does not introduce the fundamental ordering inconsistency.
Impact and fix
The deadlock requires a watcher error occurring simultaneously with the last watcher being unregistered while directory scan tasks are pending — a narrow but real window. The fix would be to make onError() release this.mutex before acquiring default_manager_mutex, similar to how processWatcher was fixed in this same PR. For example, the watchers block (lines 310-328) could be split: iterate and notify watchers under this.mutex, release it, then acquire default_manager_mutex to clear default_manager.
|
Superseded by #29391 — same approach, rebased onto current main with no behavior changes beyond the defer-ordering fix (the unconditional |
## What does this PR do? Targeted fix for four concurrency bugs in `src/bun.js/node/path_watcher.zig` that cause `fs.watch()` to deadlock or crash. This is the minimal port of #27469 (by @chrislloyd) onto current main — previous attempts (#27957, #28088, #28104) added scope and introduced ASAN regressions. This PR applies only the defer-ordering fix. ## Root causes ### `unregisterWatcher` self-deadlock (primary) The deinit-on-last-watcher `defer` was registered *after* the `mutex.lock()`/`defer mutex.unlock()` pair: ```zig this.mutex.lock(); defer this.mutex.unlock(); // fires last var watchers = this.watchers.slice(); defer { // fires BEFORE unlock if (this.deinit_on_last_watcher and this.watcher_count == 0) { this.deinit(); // called holding this.mutex } } ``` Zig defers fire LIFO, so `deinit()` ran while still holding `this.mutex`. `deinit()` re-acquires `this.mutex` when `hasPendingTasks()` is true. The mutex is non-recursive → self-deadlock (observed as `__ulock_wait2` hang on macOS, debug-build `panic: Deadlock detected` on Linux). Also UAF: if `deinit()` completes it calls `destroy(this)`, then the deferred `mutex.unlock()` writes to freed memory. ### `unrefPendingTask` UAF Same shape: `deinit()` called while holding `this.mutex`. If it completes, `destroy(this)` means the deferred `mutex.unlock()` is a UAF. ### `unrefPendingDirectory` self-deadlock `deinit()` calls `setClosed()` which re-locks `this.mutex`. Called while holding → self-deadlock. ### `processWatcher` AB/BA lock inversion Worker thread holds `watcher.mutex` → calls `manager._decrementPathRef()` on the OOM error path, which acquires `manager.mutex`. Main thread `unregisterWatcher` holds `manager.mutex` → wants `watcher.mutex`. Classic AB/BA deadlock. ## Fix All four use the same pattern: set a `should_deinit` flag under the lock, checked by a `defer` registered *before* the lock/unlock pair. LIFO ordering then yields unlock → deinit. For `processWatcher`, capture the append result and release `watcher.mutex` before calling `_decrementPathRef`. No semantic changes to when the `has_pending_*` atomics are cleared — the conditions remain exactly as before, only `deinit()` is moved outside the lock. ## How did you verify your code works? - `bun bd test test/js/node/watch/fs.watch.deadlock.test.ts` passes - Without the fix (src/ stashed), the same test fails with `panic: Deadlock detected` from the debug-build mutex checker - All Node.js parallel `test-fs-watch-recursive-*.js` tests pass under ASAN (these regressed in #28104) - `test/regression/issue/3657.test.ts` passes under ASAN - `test/js/node/watch/fs.watch.test.ts` — 30 pass, 2 pre-existing root-permission failures (same on main) Supersedes #27469, #27957, #28088, #28104. --- **Re: duplicate-bot flagging #26385** — that PR fixes a different lock inversion (`PathWatcherManager.mutex` ↔ `Watcher.mutex`, around `main_watcher.remove()`). This PR fixes `PathWatcherManager.mutex` self-deadlock and `PathWatcher.mutex` ↔ `PathWatcherManager.mutex` inversion. They're complementary, not duplicates. **Re: find-issues-bot flagging #18919** — tested the repro; it hits a separate pre-existing use-after-poison in the File Watcher thread that also reproduces on main without this change. Not claiming that fix here.
…9391) ## What does this PR do? Targeted fix for four concurrency bugs in `src/bun.js/node/path_watcher.zig` that cause `fs.watch()` to deadlock or crash. This is the minimal port of oven-sh#27469 (by @chrislloyd) onto current main — previous attempts (oven-sh#27957, oven-sh#28088, oven-sh#28104) added scope and introduced ASAN regressions. This PR applies only the defer-ordering fix. ## Root causes ### `unregisterWatcher` self-deadlock (primary) The deinit-on-last-watcher `defer` was registered *after* the `mutex.lock()`/`defer mutex.unlock()` pair: ```zig this.mutex.lock(); defer this.mutex.unlock(); // fires last var watchers = this.watchers.slice(); defer { // fires BEFORE unlock if (this.deinit_on_last_watcher and this.watcher_count == 0) { this.deinit(); // called holding this.mutex } } ``` Zig defers fire LIFO, so `deinit()` ran while still holding `this.mutex`. `deinit()` re-acquires `this.mutex` when `hasPendingTasks()` is true. The mutex is non-recursive → self-deadlock (observed as `__ulock_wait2` hang on macOS, debug-build `panic: Deadlock detected` on Linux). Also UAF: if `deinit()` completes it calls `destroy(this)`, then the deferred `mutex.unlock()` writes to freed memory. ### `unrefPendingTask` UAF Same shape: `deinit()` called while holding `this.mutex`. If it completes, `destroy(this)` means the deferred `mutex.unlock()` is a UAF. ### `unrefPendingDirectory` self-deadlock `deinit()` calls `setClosed()` which re-locks `this.mutex`. Called while holding → self-deadlock. ### `processWatcher` AB/BA lock inversion Worker thread holds `watcher.mutex` → calls `manager._decrementPathRef()` on the OOM error path, which acquires `manager.mutex`. Main thread `unregisterWatcher` holds `manager.mutex` → wants `watcher.mutex`. Classic AB/BA deadlock. ## Fix All four use the same pattern: set a `should_deinit` flag under the lock, checked by a `defer` registered *before* the lock/unlock pair. LIFO ordering then yields unlock → deinit. For `processWatcher`, capture the append result and release `watcher.mutex` before calling `_decrementPathRef`. No semantic changes to when the `has_pending_*` atomics are cleared — the conditions remain exactly as before, only `deinit()` is moved outside the lock. ## How did you verify your code works? - `bun bd test test/js/node/watch/fs.watch.deadlock.test.ts` passes - Without the fix (src/ stashed), the same test fails with `panic: Deadlock detected` from the debug-build mutex checker - All Node.js parallel `test-fs-watch-recursive-*.js` tests pass under ASAN (these regressed in oven-sh#28104) - `test/regression/issue/3657.test.ts` passes under ASAN - `test/js/node/watch/fs.watch.test.ts` — 30 pass, 2 pre-existing root-permission failures (same on main) Supersedes oven-sh#27469, oven-sh#27957, oven-sh#28088, oven-sh#28104. --- **Re: duplicate-bot flagging oven-sh#26385** — that PR fixes a different lock inversion (`PathWatcherManager.mutex` ↔ `Watcher.mutex`, around `main_watcher.remove()`). This PR fixes `PathWatcherManager.mutex` self-deadlock and `PathWatcher.mutex` ↔ `PathWatcherManager.mutex` inversion. They're complementary, not duplicates. **Re: find-issues-bot flagging oven-sh#18919** — tested the repro; it hits a separate pre-existing use-after-poison in the File Watcher thread that also reproduces on main without this change. Not claiming that fix here.
Summary
Targeted fix for three concurrency bugs in
src/bun.js/node/path_watcher.zigthat causefs.watch()to deadlock or crash:unregisterWatcherself-deadlock: Thedefercallingdeinit()fired whilethis.mutexwas still held (Zig defers are LIFO). Hoisted to ashould_deinitflag checked by a defer registered before the lock, so LIFO ordering yields unlock → deinit.unrefPendingTaskUAF: Same pattern —deinit()called underthis.mutex, thendestroy(this)means the deferredmutex.unlock()writes to freed memory. Same fix pattern.processWatcherAB/BA lock inversion: Worker thread heldwatcher.mutex→ acquiredmanager.mutexvia_decrementPathRef. Main thread heldmanager.mutex→ wantedwatcher.mutex. Fix: capture append result, unlockwatcher.mutexbefore calling_decrementPathRef.Also applies the same
should_deinitpattern tounrefPendingDirectoryto avoid self-deadlock whendeinit()re-locks viasetClosed().This is the same approach as #27469, rebased onto current main.
Test plan
fs.watch.test.tspasses (permission tests fail due to running as root — pre-existing, same on main)https://claude.ai/code/session_01PxdzvsDCznAw3KZ5zSPF5c