Fix check-then-act race condition in parallel subscriber#6956
Conversation
| inFlightRequests.put(0, response); | ||
| inFlightRequestsNum.incrementAndGet(); | ||
| inFlightPermits.tryAcquire(); |
There was a problem hiding this comment.
I'm probably missing something here - but is there a reason that tryAcquire is before doing inFlightRequests.put and the presignedUrlExtension().getObject?
Additionally - is tryAcquire what we want to use here? It can return false which we're not handling, should we use acquire here instead? I know this is the first request so it should in theory always return true, but probably better to either handle the false or use acquire?
There was a problem hiding this comment.
Moved tryAcquire before getObject and inFlightRequests.put.
Added a defensive check, throws IllegalStateException if it tryAcquire fails.
|
|
||
| response.whenComplete((res, error) -> { | ||
| if (error != null || isCompletedExceptionally.get()) { | ||
| inFlightPermits.release(); |
There was a problem hiding this comment.
Is there a reason we don't also do inFlightRequests.remove(0); in the error case?
There was a problem hiding this comment.
Didn't do inFlightRequests.remove(0) because handlePartError cancels all in-flight requests.
But's its more cleaner to remove explicitly - added inFlightRequests.remove(0); and moved remove and release to top of whenComplete
|
|
||
| response.whenComplete((res, error) -> { | ||
| if (error != null || isCompletedExceptionally.get()) { | ||
| inFlightPermits.release(); |
There was a problem hiding this comment.
We're doing inFlightPermits.release(); in all of the paths through whenComplete right? If so, I think it would make more sense to move up top and do only once. (ie, put it at line 220 before the first if)
7e1c3a0
into
feature/master/pre-signed-url-getobject
|
This pull request has been closed and the conversation has been locked. Comments on closed PRs are hard for our team to see. If you need more assistance, please open a new issue that references this one. |
Motivation and Context
The
ParallelPresignedUrlMultipartDownloaderSubscriberhas a check-then-act race condition when limiting concurrent in-flight requests. Two threads can simultaneously readinFlightRequestsNumas below the limit, both pass the check, and both send a request — exceedingmaxInFlightParts.The race occurs between:
Modifications
Replaced AtomicInteger
inFlightRequestsNumwithjava.util.concurrent.Semaphore.Semaphore.tryAcquire()atomically checks availability and reserves a permit in a single operation — eliminating the gap between check and act. No two threads can both "pass the check" because the check IS the reservation.tryAcquire() — atomic check+reserve, enqueue if unavailablerelease() on completion, error, or validation failuretryAcquire() in the drain loop,release() if poll returns nulltryAcquire()/release() for the initial requestParallelPresignedUrlMultipartDownloaderSubscriber: ReplacedAtomicInteger inFlightRequestsNumwithSemaphore inFlightPermits.All increment/decrement/check operations replaced withtryAcquire()/release(). Added permit release on all error paths to prevent permit leaks.Testing
ParallelPresignedUrlMultipartDownloaderSubscriberTest: Added multiPartDownload_manyParts_shouldCompleteSuccessfully — downloads a 13-part object (exceeding maxInFlightParts=10) to verify the Semaphore correctly batches requests without deadlock or permit leaks.Screenshots (if appropriate)
Types of changes
Checklist
mvn installsucceedsscripts/new-changescript and following the instructions. Commit the new file created by the script in.changes/next-releasewith your changes.License