redis_store: hold subscribed_keys write lock across receiver drop #2353

Open
amankrx wants to merge 2 commits into TraceMachina:main from amankrx:fix/redis-subscription-drop-race

amankrx (Collaborator) commented May 20, 2026

RedisSubscription::Drop previously dropped the watch::Receiver before taking the subscribed_keys write lock, then decided whether to remove the publisher entry based on receiver_count() == 0. Two concurrent drops on subscriptions sharing a publisher (e.g. multiple WaitExecution clients on the same operation_id) could both decrement their counts before either took the lock, then race for it: the loser saw the entry already removed and emitted a spurious "Key … was not found in subscribed keys" error. Worse, if a fresh subscribe(same_key) interleaved between the two drops, the second drop could remove the freshly-inserted publisher and silently strand its subscribers.

Acquire the write lock first, evaluate "count == 1 with my receiver still alive", remove the entry under the lock if so, then drop the receiver. The lock now serialises both the count read and the map mutation, closing both race windows. Demote the absence log from error! to warn!: with the fix, that path now indicates a genuine unexpected mutation outside the lock, not the race noise.
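
For reviewers who want to see the new ordering in isolation, here is a minimal, std-only sketch of it. It is not the code in this PR: `Publisher`, `Subscription`, `SubscribedKeys` and `subscribe` are hypothetical stand-ins, the map is a plain `std::sync::RwLock<HashMap<..>>`, and `Arc::strong_count` is used to imitate the `receiver_count()` of a watch channel so that the example needs no external crates.

```rust
use std::collections::HashMap;
use std::sync::{Arc, RwLock};

// Hypothetical stand-in for the publisher side of a watch channel.
// Each receiver holds a clone of `token`, so `strong_count - 1`
// imitates `receiver_count()`.
struct Publisher {
    token: Arc<()>,
}

impl Publisher {
    fn receiver_count(&self) -> usize {
        Arc::strong_count(&self.token) - 1
    }
}

type SubscribedKeys = Arc<RwLock<HashMap<String, Publisher>>>;

// Hypothetical stand-in for RedisSubscription.
struct Subscription {
    key: String,
    receiver: Option<Arc<()>>,
    subscribed_keys: SubscribedKeys,
}

// Reuses the publisher for `key` if one exists, otherwise inserts a new one.
fn subscribe(keys: &SubscribedKeys, key: &str) -> Subscription {
    let mut map = keys.write().unwrap();
    let publisher = map
        .entry(key.to_string())
        .or_insert_with(|| Publisher { token: Arc::new(()) });
    Subscription {
        key: key.to_string(),
        receiver: Some(Arc::clone(&publisher.token)),
        subscribed_keys: Arc::clone(keys),
    }
}

impl Drop for Subscription {
    fn drop(&mut self) {
        // 1. Take the write lock before touching the receiver.
        let mut map = self.subscribed_keys.write().unwrap();
        // 2. Our receiver is still alive, so a count of 1 means we are the last one.
        let is_last = map.get(&self.key).map(|p| p.receiver_count() == 1);
        match is_last {
            // 3. Remove the entry while still holding the lock.
            Some(true) => {
                map.remove(&self.key);
            }
            Some(false) => {}
            // Unexpected: the entry was mutated outside the lock.
            None => eprintln!("warn: key {} was not found in subscribed keys", self.key),
        }
        // 4. Only now drop the receiver; the lock guard is released afterwards.
        drop(self.receiver.take());
    }
}
```

Because the count is read and the entry removed under the same write lock, two concurrent drops in this sketch cannot both observe a stale count, and a `subscribe` for the same key cannot slip in between the check and the removal.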

Description

Please include a summary of the changes and the related issue. Please also
include relevant motivation and context.

Fixes # (issue)

Type of change

Please delete options that aren't relevant.

  • Bug fix (non-breaking change which fixes an issue)

How Has This Been Tested?

Adds 4 regression tests covering single-drop silence, drop-one-of-two preserving the publisher, 200-iteration concurrent-drop race, and resubscribe-after-drop creating a fresh publisher.

Checklist

  • Updated documentation if needed
  • Tests added/amended
  • bazel test //... passes locally
  • PR is contained in a single commit, using git amend (see some docs)

MarcusSorealheis (Collaborator) left a comment

LGTM
