Resolve EXC_BAD_ACCESS in GULNetworkURLSession via O(1) passive memory lifecycle cleanup#233
Resolve EXC_BAD_ACCESS in GULNetworkURLSession via O(1) passive memory lifecycle cleanup#233agustindeleon7 wants to merge 12 commits into
Conversation
This commit replaces the manual O(N) cleanup loop in `setSessionInFetcherMap:forSessionID:` with an O(1) passive self-cleanup mechanism using Associated Objects. It introduces `GULSessionDeallocTracker`, which is tied to the lifecycle of the `NSURLSession`. To prevent `objc_loadWeakRetained` crashes when the session is being destroyed, the tracker holds a strong reference to the `GULNetworkURLSessionWeakHolder` wrapper and uses direct pointer comparison during `dealloc` instead of reading the weak reference. It also ensures old trackers are deallocated outside of the `NSLock` to prevent deadlocks.
…variable warnings
|
Thanks for your pull request! It looks like this may be your first contribution to a Google open source project. Before we can look at your pull request, you'll need to sign a Contributor License Agreement (CLA). View this failed invocation of the CLA check for more information. For the most up to date status, view the checks section at the bottom of the pull request. |
|
Please address CI failures |
Hi @paulb777! Thanks for the reply. I will fix the CI failures. |
|
@paulb777 I've fixed the static analysis issues in this PR. Apologies for the oversight! Quick question: This failing job couldn't find a file because it was recently removed in firebase/firebase-ios-sdk#16195. Is it safe to ignore this failure? |
Good catch! That's safe to ignore that one. |
Thanks @ncooke3! I'm seeing another issue regarding the Xcode version in this job run. Should the Xcode version in the workflow be updated? |
|
#234 for the test coverage CI issue Looks like there are also CI issues for finding Xcode 16.2 on main |
|
Gemini review: Here is a review of the pull request "Resolve EXC_BAD_ACCESS in GULNetworkURLSession via O(1) passive memory lifecycle cleanup" (#233) based on the provided diff. Overall, this is a very elegant and well-thought-out Objective-C solution. It correctly replaces an inefficient O(N) active polling mechanism with an O(1) passive event-driven approach. Here is a detailed breakdown of the changes, the good parts, and a few suggestions for the author. 🌟 The Good
💡 Suggestions & Feedback for the PRWhile the logic is sound, the intent behind some of the concurrency handling is subtle and could easily be accidentally removed by a future maintainer who doesn't understand it. I would recommend adding a comment to the PR asking the author to clarify the oldTrackerToReleaseOutsideLock logic. 1. Add an inline comment for deadlock prevention The variable oldTrackerToReleaseOutsideLock and the (void) cast at the end of the function are clever but cryptic. A future developer might see (void)oldTrackerToReleaseOutsideLock;, assume it's useless dead code, delete it, and inadvertently reintroduce a deadlock. Suggested comment for the PR: "Great catch on preventing the deadlock with oldTrackerToReleaseOutsideLock. Because this is a very subtle concurrency detail, could we add a brief comment explaining why this is necessary so it doesn't get accidentally refactored out in the future? Something like: // Retain the old tracker until the lock is released. If it deallocates inside the lock, its -dealloc method will attempt to re-acquire the lock, resulting in a deadlock." 2. Nullability of sessionID In GULSessionDeallocTracker's -initWithSessionID:holder:, the sessionID property is blindly copied: _sessionID = [sessionID copy];. While setSessionInFetcherMap:forSessionID: checks if sessionID is nil early on (if (!sessionID) { return; }), it might be worth adding an NSParameterAssert(sessionID) in the tracker's init method just to be absolutely safe and document the non-null expectation. VerdictApprove, with minor nitpicks. The core architectural shift is solid, performant, and correctly handles concurrency edge cases. It just needs a tiny bit of documentation to protect its cleverness from future regressions. |
Thanks, but no worries. We can ignore those failures for this PR. I'll re-trigger CI. |
…ive ARC cleanup in GULNetworkURLSession
|
@morganchen12 Would you review this PR, the associated issue firebase/firebase-ios-sdk#16299, and the #230 which may have led to this potential regression. |
Context & Problem
The previous implementation for cleaning up released sessions in
sessionIDToFetcherMaprelied on a manual O(N) loop. In a highly concurrent environment, this approach introduced severe race conditions. Specifically, iterating and reading the weak property (holder.session) while the session object was in the middle of its destruction phase led to anEXC_BAD_ACCESScrash withinobjc_loadWeakRetained.Solution
This PR completely removes the active O(N) iteration loop and replaces it with an O(1) passive, lifecycle-driven cleanup strategy using Objective-C Associated Objects.
Key Architectural Changes:
GULSessionDeallocTracker: We attach a lightweight tracker to the session usingobjc_setAssociatedObject. The cleanup logic is now intrinsically tied to the session's natural ARC lifecycle.objc_loadWeakRetainedCrashes: Reading a weak reference of an object currently undergoing deallocation is unsafe. To mathematically guarantee this crash won't happen, the tracker holds a strong reference to itsGULNetworkURLSessionWeakHolderwrapper. During dealloc, it uses direct pointer comparison (currentDictionaryHolder == _holder) to safely verify if the entry should be removed, bypassing weak reference evaluation completely.NSLockscope. This prevents the old tracker's-deallocfrom attempting to re-acquire the global lock, which would cause an unrecoverable deadlock.Latest Refinements & CI Stability:
(void)oldTrackerToReleaseOutsideLock;after the lock is released.NSParameterAssert(sessionID)for clear nullability contracts.testSessionMapStressfromGULMutableDictionaryTest. This test was improperly architected (it stressedGULNetworkURLSessionrather than the dictionary itself), which caused heavy artificial lock contention and consistent CI timeouts on simulated runner environments.testSessionPassiveRemovalOnDeallocationtoGULNetworkTest. This focused unit test leverages an@autoreleasepoolblock to explicitly and deterministically validate the ARC-based cleanup mechanism without introducing flaky thread starvation.Impact
EXC_BAD_ACCESScrash.os_unfair_lock/NSLockconcurrency limits under test environments, and natively validate the new memory lifecycle.