Fix pthreads normal-mutex debug implementation #26622
Open
slowriot wants to merge 9 commits intoemscripten-core:mainfrom
Open
Fix pthreads normal-mutex debug implementation #26622slowriot wants to merge 9 commits intoemscripten-core:mainfrom
slowriot wants to merge 9 commits intoemscripten-core:mainfrom
Conversation
There was a problem hiding this comment.
Pull request overview
This PR fixes a debug-only regression in Emscripten’s musl pthread normal-mutex path under -pthread (notably affecting dlmalloc with -sWASM_WORKERS) by restoring historical _m_lock semantics while keeping same-thread relock deadlock trapping via debug-only thread-local ownership tracking. It also adds browser regression tests covering the allocator abort and the subsequent stale-owner deadlock false-positive.
Changes:
- Restore
PTHREAD_MUTEX_NORMALdebug behavior to keep_m_lockin the historical0/EBUSYencoding while maintaining deadlock trapping. - Add debug-only thread-local tracking of held normal mutexes using
_m_prev/_m_next, and use it for deadlock assertions. - Add two browser regression tests for the allocator regression and the reporting/teardown false-positive.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 3 comments.
Show a summary per file
| File | Description |
|---|---|
system/lib/libc/musl/src/thread/pthread_mutex_trylock.c |
Implements debug normal-mutex lock-word preservation and hooks normal-mutex ownership tracking. |
system/lib/libc/musl/src/thread/pthread_mutex_unlock.c |
Hooks normal-mutex unlock to remove from debug thread-local ownership tracking. |
system/lib/libc/musl/src/thread/pthread_mutex_timedlock.c |
Updates debug deadlock assertions to consult thread-local normal-mutex ownership list. |
system/lib/libc/musl/src/internal/pthread_impl.h |
Adds debug-only thread-local list + helpers for normal-mutex ownership tracking. |
test/test_browser.py |
Registers two new browser regression tests under the wasm worker suite. |
test/wasm_worker/pthread_mutex_debug_allocator_regression.cpp |
New browser repro focused on allocator churn under -pthread + wasm workers. |
test/wasm_worker/pthread_mutex_debug_reporting_teardown.cpp |
New browser repro focused on the follow-on stale-owner deadlock assert. |
AUTHORS |
Adds contributor entry. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
…/emscripten into pthread-mutex-debug-fix
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
This PR fixes a regression in Emscripten's debug-only
PTHREAD_MUTEX_NORMALpath that showed up under-pthread, and adds regression coverage for the two failure modes we reduced from the original application.See #26619 for details on the regression.
Problem
In
4.0.11, Emscripten changed debug builds soPTHREAD_MUTEX_NORMALno longer always takes the old fasta_cas(..., 0, EBUSY)path. The motivation was valid: in debug builds, a same-thread relock of a normal mutex should trap instead of hanging forever.The problem is that this changed the observable semantics of the mutex lock word for normal mutexes in debug builds. In particular, normal mutexes started carrying owner-style state through the slow path instead of preserving the historical
0/EBUSYencoding.That broke
dlmallocunder-pthreaddebug system-library builds.dlmallocuses normal pthread mutexes for its internal locks, and with the new debug path we were able to trigger allocator aborts with a minimal browser repro consisting only of:-pthread-sWASM_WORKERSNo application logic, rendering, or complex containers were required.
The first part of the fix was therefore to restore historical
_m_locksemantics for normal mutexes in debug Emscripten builds while keeping the new debug deadlock trapping behavior.However, that exposed a second issue.
To preserve the debug deadlock check without using
_m_lock, the intermediate fix stored normal-mutex ownership in_m_count. That was enough to fix the allocator regression, but it introduced a second false-positive deadlock under contention:_m_countis shared mutable state, so contending threads can observe stale owner information and incorrectly conclude that the current thread already owns the mutex.This second issue showed up in the harness-facing repro as a deadlock assert during steady-state execution/reporting, even though the mutex was not actually being recursively acquired by the same thread.
Root Cause
There are really two related problems here:
Debug normal mutexes were no longer preserving the historical
0/EBUSY_m_lockencoding.That broke internal users such as
dlmalloc._m_countis not a safe place to track debug-only ownership for normal mutexes under contention.It is shared state, so reading it from another thread is inherently racy and can produce stale-owner false positives.
So the correct design is:
_m_locksemantics unchanged for normal mutexesFix
This PR implements that design.
Changes:
system/lib/libc/musl/src/thread/pthread_mutex_trylock.csystem/lib/libc/musl/src/thread/pthread_mutex_unlock.csystem/lib/libc/musl/src/thread/pthread_mutex_timedlock.csystem/lib/libc/musl/src/internal/pthread_impl.hWhat the fix does:
PTHREAD_MUTEX_NORMALstill keeps_m_lockin the historical0/EBUSYform.struct pthread._m_countand not_m_lock.Why this design:
4.0.11debug change: same-thread relock of a normal mutex still traps instead of silently hanging._m_locksemantics thatdlmallocdepended on.A subtle but important implementation detail is that debug normal-mutex ownership is tracked in a per-thread list anchored on
struct pthread, rather than in a separate_Thread_localvariable or in shared mutex fields such as_m_lock/_m_count. This preserves the historical mutex lock-word semantics for normal mutexes, avoids racy shared ownership state, and also avoids introducing extra TLS state that brokeother.test_threadprofilerduring CI.Why not simpler alternatives
I intentionally did not take these approaches:
Reverting the
4.0.11debug change entirely.That would discard the intended "trap instead of deadlock" behavior that motivated the change.
Keeping
_m_countas the owner field.That fixes the first failure but causes the second one because
_m_countis shared and racy.Disabling assertions / forcing NDEBUG.
That works around the issue but does not fix the debug runtime.
The patch here is meant to be a real upstream fix, not a workaround.
Tests
This PR adds two browser regression tests in
test/test_browser.py:test/wasm_worker/pthread_mutex_debug_allocator_regression.cppThis captures the original allocator failure mode: debug
-pthread + -sWASM_WORKERSplus trivial main/worker allocation traffic must stay clean.test/wasm_worker/pthread_mutex_debug_reporting_teardown.cppThis captures the stale-owner false-positive that appeared after the first fix was applied.
I also verified that the existing deadlock behavior is still intact via:
other.test_pthread_mutex_deadlockother.test_threadprofilerSo the final state is:
--threadprofilerstartup behavior preservedValidation
The fix is validated with:
browser.test_wasm_worker_pthread_mutex_debug_allocator_regressionbrowser.test_wasm_worker_pthread_mutex_debug_reporting_teardownother.test_pthread_mutex_deadlockother.test_threadprofiler