Skip to content

chore: fixing fdv2 race condition#343

Merged
tanderson-ld merged 4 commits intomainfrom
ta/SDK-2072/fix-fdv2-race-condition
Apr 10, 2026
Merged

chore: fixing fdv2 race condition#343
tanderson-ld merged 4 commits intomainfrom
ta/SDK-2072/fix-fdv2-race-condition

Conversation

@tanderson-ld
Copy link
Copy Markdown
Contributor

@tanderson-ld tanderson-ld commented Apr 10, 2026

The root of the bug is that OFF is reported in stop(), but the worker thread may still be running and making its own calls.

The fix is to update stop() to behave exactly like start, which is that the request is made only once and the callback is queued into a pending list. The worker thread is now the only thread that makes data and status callbacks, which eliminates the race risk.


Note

Medium Risk
Touches FDv2 data source lifecycle/concurrency and changes when OFF is reported; bugs here could affect connection shutdown behavior and callback ordering under contention.

Overview
Fixes a race where stop() could report DataSourceState.OFF while the FDv2 worker thread was still emitting later status updates.

FDv2DataSource.stop() is reworked to be idempotent and callback-queued (like start()), with a shared shutdownCause used to coordinate shutdown so only the worker thread emits the final OFF status (and includes an error only for unintentional shutdowns). Tests update mock synchronizers to unblock pending next() calls on close() and add coverage ensuring no status events occur after OFF.

Reviewed by Cursor Bugbot for commit 0228cef. Bugbot is set up for automated code reviews on this repo. Configure here.

@tanderson-ld tanderson-ld requested a review from a team as a code owner April 10, 2026 15:05
}

@Test
public void noStatusEventsEmittedAfterOff() throws Exception {
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reviewers: This test fails on main, but passes on this fix branch.

LDAwaitFuture<FDv2SourceResult> f = new LDAwaitFuture<>();
f.set(FDv2SourceResult.status(FDv2SourceResult.Status.shutdown()));
return f;
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reviewers: This mock was missing some behavior that is now exercised by the fix.

LDAwaitFuture<FDv2SourceResult> f = pendingFuture;
if (f != null) {
f.set(FDv2SourceResult.status(FDv2SourceResult.Status.shutdown()));
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reviewers: This mock was missing some behavior that is now exercised by the fix.

Copy link
Copy Markdown
Contributor Author

@tanderson-ld tanderson-ld Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reviewers: This check is was not explicitly necessary.

sourceManager.close();
// Caller owns sharedExecutor; we do not shut it down.
dataSourceUpdateSink.setStatus(DataSourceState.OFF, null);
completionCallback.onSuccess(null);
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reviewers: This check is was not explicitly necessary.


// stopping is still in progress; queue the callback to be fired by tryCompleteStop.
pendingStopCallbacks.add(completionCallback);
}
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reviewers: stop() is now symmetric with start() in that it influences the worker thread idempotently and queues for callback when complete.

}

runSynchronizers(context, dataSourceUpdateSink);
LDFailure failure = maybeReportUnexpectedExhaustion("All data source acquisition methods have been exhausted.");
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reviewers: This is partly related to the race, and is now eliminated by all OFF statuses happening after shutdownCause is set.

Copy link
Copy Markdown

@cursor cursor bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Cursor Bugbot has reviewed your changes and found 1 potential issue.

Fix All in Cursor

❌ Bugbot Autofix is OFF. To automatically fix reported issues with cloud agents, have a team admin enable autofix in the Cursor dashboard.

Reviewed by Cursor Bugbot for commit c74b3f3. Configure here.

dataSourceUpdateSink.setStatus(DataSourceState.VALID, null);
tryCompleteStart(true, null);
return;
return; // this will go to the finally block and block until stop sets shutdownCause
Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reviewers: Before in this case, the worker thread would terminate and the stop() call would be responsible for reporting OFF.

Now, this code will go to the finally block and wait for a shutdown cause. By using a blocking future mechanism, the code is much simpler. This configuration is exceedingly rare, and so the tradeoff seems worth the simpler code.

LDFailure failure = maybeReportUnexpectedExhaustion("All initializers exhausted and there are no available synchronizers.");
tryCompleteStart(false, failure);
// try to claim this is the cause of the shutdown, but it might have already been set by an intentional stop().
shutdownCause.set(new LDFailure("All initializers exhausted and there are no available synchronizers.", LDFailure.FailureType.UNKNOWN_ERROR));
Copy link
Copy Markdown
Contributor Author

@tanderson-ld tanderson-ld Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reviewers: The only tryCompleteStart(false, failure) call happens after a shutdown cause now in the finally block. So that's a nice result!

Throwable cause;
try {
cause = shutdownCause.get();
} catch (Exception e) {
Copy link
Copy Markdown
Contributor Author

@tanderson-ld tanderson-ld Apr 10, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

For reviewers: This exception is if the executor is shutting down.

@tanderson-ld tanderson-ld merged commit 376b17e into main Apr 10, 2026
8 checks passed
@tanderson-ld tanderson-ld deleted the ta/SDK-2072/fix-fdv2-race-condition branch April 10, 2026 19:26
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants