Fix race in WithWaitForLeave that flaked TestPartialStateJoin/Outgoing_device_list_updates#878
Open
erikjohnston wants to merge 2 commits into
Open
Fix race in WithWaitForLeave that flaked TestPartialStateJoin/Outgoing_device_list_updates#878erikjohnston wants to merge 2 commits into
WithWaitForLeave that flaked TestPartialStateJoin/Outgoing_device_list_updates#878erikjohnston wants to merge 2 commits into
Conversation
27a2bb1 to
59d4118
Compare
…ave-handler race `WithWaitForLeave` registered a PDU handler (whose callback feeds `leaveChannel`) but then decided whether to wait by peeking at `room.CurrentState`. Because an incoming PDU is added to the room (`HandleTransactionRequests` calls `room.AddEvent`, updating `CurrentState`) *before* the PDU callback runs, the test goroutine could observe the updated state, take the "already seen" shortcut and return, firing the deferred `removePDUHandler()` in the window before the callback ran. The now-handler-less callback then flagged the (expected) leave as an unexpected PDU, failing the test. Wait on the channel fed by the handler instead of polling `room.CurrentState`, so the handler stays registered until its own callback has run. Every caller passes a joined user and an action that performs the leave, so the leave always arrives fresh via that callback. Make this contract explicit: snapshot membership before the action and fail if the user has already left (rather than silently treating it as "already seen"), and fail on timeout, since neither should happen in correct usage. The action still runs in the already-left case so any cleanup it performs is preserved. The `HandleTransactionRequests` ordering (callback after `room.AddEvent`) is left unchanged, as other tests rely on it (e.g. `federation_redaction_test.go` reads `serverRoom.Timeline` immediately after a callback-driven waiter).
59d4118 to
b744d2d
Compare
| t.Errorf("%s timed out waiting for %s to leave test room %s.", s.ServerName(), userID, room.RoomID) | ||
| } | ||
| } | ||
|
|
Collaborator
There was a problem hiding this comment.
Tests are failing:
❌ TestPartialStateJoin (2m34.33s)
❌ TestPartialStateJoin/Leave_during_resync (10.99s)
❌ TestPartialStateJoin/Leave_during_resync/does_not_wait_for_resync (2.28s)
❌ TestPartialStateJoin/Leave_during_resync/is_seen_after_the_resync (2.29s)
❌ TestPartialStateJoin/Outgoing_device_list_updates (24.94s)
❌ TestPartialStateJoin/Outgoing_device_list_updates/Device_list_updates_reach_all_servers_in_partial_state_rooms (1.68s)
❌ TestPartialStateJoin/Outgoing_device_list_updates/Device_list_updates_reach_incorrectly_absent_servers_once_partial_state_join_completes (3.81s)
❌ TestPartialStateJoin/Outgoing_device_list_updates/Device_list_updates_reach_incorrectly_kicked_servers_once_partial_state_join_completes (5.67s)
❌ TestPartialStateJoin/Outgoing_device_list_updates/Device_list_updates_reach_newly_joined_servers_in_partial_state_rooms (4.67s)
Specifics for couple test cases:
❌ TestPartialStateJoin/Leave_during_resync/does_not_wait_for_resync (2.28s)
client.go:860: [CSAPI] POST hs1/_matrix/client/v3/register => 200 OK (111.307859ms)
client.go:860: [CSAPI] GET hs1/_matrix/client/v3/capabilities => 200 OK (4.095333ms)
server.go:210: Creating room !0-kuhoS2HynKMFoBaAVt:host.docker.internal:45171 with version 10
federation_room_join_partial_state_test.go:3684: Alice begins a partial join to a room
federation_room_join_partial_state_test.go:4510: Registered state_ids handler for event $bktTP-VUFCYYjBE92q9rLxKnEnkERA6CL8evt3fpU5g
federation_room_join_partial_state_test.go:4551: Registered /state handler for event $bktTP-VUFCYYjBE92q9rLxKnEnkERA6CL8evt3fpU5g
2026/06/09 08:48:20 Received send-join of event $sWMXmm7djuzQ7_nFd8y6dW01nIvFTjgvCog7R-viTVo
client.go:860: [CSAPI] POST hs1/_matrix/client/v3/join/!0-kuhoS2HynKMFoBaAVt:host.docker.internal:45171 => 200 OK (75.501743ms)
federation_room_join_partial_state_test.go:4404: /join request completed
federation_room_join_partial_state_test.go:3688: Alice waits to see her join
federation_room_join_partial_state_test.go:4489: Incoming state_ids request for event [$bktTP-VUFCYYjBE92q9rLxKnEnkERA6CL8evt3fpU5g] in room !0-kuhoS2HynKMFoBaAVt:host.docker.internal:45171
client.go:860: [CSAPI] GET hs1/_matrix/client/v3/sync => 200 OK (22.618627ms)
federation_room_join_partial_state_test.go:3695: Alice leaves and waits for confirmation
client.go:860: [CSAPI] POST hs1/_matrix/client/v3/rooms/!0-kuhoS2HynKMFoBaAVt:host.docker.internal:45171/leave => 200 OK (23.643394ms)
client.go:860: [CSAPI] GET hs1/_matrix/client/v3/sync => 200 OK (19.05725ms)
federation_room_join_partial_state_test.go:3723: Alice's leave is received by the resident server
federation_room_join_partial_state_test.go:4427: Cleaning up after test...
federation_room_join_partial_state_test.go:4496: Replying to /state_ids request for event [$bktTP-VUFCYYjBE92q9rLxKnEnkERA6CL8evt3fpU5g]
federation_room_join_partial_state_test.go:4529: Incoming state request for event [$bktTP-VUFCYYjBE92q9rLxKnEnkERA6CL8evt3fpU5g] in room !0-kuhoS2HynKMFoBaAVt:host.docker.internal:45171
federation_room_join_partial_state_test.go:4537: Replying to /state request for event [$bktTP-VUFCYYjBE92q9rLxKnEnkERA6CL8evt3fpU5g]
client.go:860: [CSAPI] GET hs1/_matrix/client/v3/rooms/!0-kuhoS2HynKMFoBaAVt:host.docker.internal:45171/members => 200 OK (32.05937ms)
federation_room_join_partial_state_test.go:4429: @user-51-t43alice:hs1's partial state join to !0-kuhoS2HynKMFoBaAVt:host.docker.internal:45171 completed.
federation_room_join_partial_state_test.go:156: host.docker.internal:45171: @user-51-t43alice:hs1 had already left test room !0-kuhoS2HynKMFoBaAVt:host.docker.internal:45171 before WithWaitForLeave ran.
client.go:860: [CSAPI] POST hs1/_matrix/client/v3/rooms/!0-kuhoS2HynKMFoBaAVt:host.docker.internal:45171/leave => 200 OK (15.093236ms)
federation_room_join_partial_state_test.go:171: host.docker.internal:45171 timed out waiting for @user-51-t43alice:hs1 to leave test room !0-kuhoS2HynKMFoBaAVt:host.docker.internal:45171.
❌ TestPartialStateJoin/Outgoing_device_list_updates/Device_list_updates_reach_newly_joined_servers_in_partial_state_rooms (4.67s)
client.go:860: [CSAPI] POST hs1/_matrix/client/v3/register => 200 OK (14.497077ms)
client.go:860: [CSAPI] GET hs1/_matrix/client/v3/capabilities => 200 OK (6.319961ms)
server.go:210: Creating room !0-C7eWPIF3FWzfgDTjDj:host.docker.internal:40483 with version 10
federation_room_join_partial_state_test.go:4510: Registered state_ids handler for event $kLwIKis2rqHT0iYKhrlPl9ANQbbQZCNJIh0QcHG1-Wc
federation_room_join_partial_state_test.go:4551: Registered /state handler for event $kLwIKis2rqHT0iYKhrlPl9ANQbbQZCNJIh0QcHG1-Wc
2026/06/09 08:47:43 Received send-join of event $eCDCY848myJfso3EUaOp0tSD-oYrAaRoEM0uKmV8QQ4
client.go:860: [CSAPI] POST hs1/_matrix/client/v3/join/!0-C7eWPIF3FWzfgDTjDj:host.docker.internal:40483 => 200 OK (118.335705ms)
federation_room_join_partial_state_test.go:4404: /join request completed
federation_room_join_partial_state_test.go:4489: Incoming state_ids request for event [$kLwIKis2rqHT0iYKhrlPl9ANQbbQZCNJIh0QcHG1-Wc] in room !0-C7eWPIF3FWzfgDTjDj:host.docker.internal:40483
client.go:860: [CSAPI] PUT hs1/_matrix/client/v3/devices/HJZGHFCAOA => 200 OK (86.732122ms)
federation_room_join_partial_state_test.go:2294: @user-33-t24alice:hs1 sent device list update.
federation_room_join_partial_state_test.go:2167: Complement server received m.device_list_update: {"device_display_name":"A new device name 1","device_id":"HJZGHFCAOA","prev_id":[],"stream_id":41,"user_id":"@user-33-t24alice:hs1"}
federation_room_join_partial_state_test.go:2297: @charlie and @derek received device list update.
2026/06/09 08:47:44 Received send-join of event $aIVc_eE8-6DBmfRIR6IfFdu7nBYl8YBfHeGcbcy_Hbk
federation_room_join_partial_state_test.go:2300: Server.MustJoinRoom joined room ID !0-C7eWPIF3FWzfgDTjDj:host.docker.internal:40483
client.go:860: [CSAPI] GET hs1/_matrix/client/v3/sync => 200 OK (68.678334ms)
client.go:860: [CSAPI] GET hs1/_matrix/client/v3/sync => 200 OK (10.852562ms)
federation_room_join_partial_state_test.go:2313: Alice successfully received event $aIVc_eE8-6DBmfRIR6IfFdu7nBYl8YBfHeGcbcy_Hbk via /sync
client.go:860: [CSAPI] PUT hs1/_matrix/client/v3/devices/HJZGHFCAOA => 200 OK (7.263855ms)
federation_room_join_partial_state_test.go:2316: @user-33-t24alice:hs1 sent device list update.
federation_room_join_partial_state_test.go:2167: Complement server received m.device_list_update: {"device_display_name":"A new device name 2","device_id":"HJZGHFCAOA","prev_id":[],"stream_id":43,"user_id":"@user-33-t24alice:hs1"}
federation_room_join_partial_state_test.go:2167: Complement server received m.device_list_update: {"device_display_name":"A new device name 2","device_id":"HJZGHFCAOA","prev_id":[41],"stream_id":43,"user_id":"@user-33-t24alice:hs1"}
federation_room_join_partial_state_test.go:2319: @charlie, @derek and @elsie received device list update.
federation_room_join_partial_state_test.go:4496: Replying to /state_ids request for event [$kLwIKis2rqHT0iYKhrlPl9ANQbbQZCNJIh0QcHG1-Wc]
federation_room_join_partial_state_test.go:4529: Incoming state request for event [$kLwIKis2rqHT0iYKhrlPl9ANQbbQZCNJIh0QcHG1-Wc] in room !0-C7eWPIF3FWzfgDTjDj:host.docker.internal:40483
federation_room_join_partial_state_test.go:4537: Replying to /state request for event [$kLwIKis2rqHT0iYKhrlPl9ANQbbQZCNJIh0QcHG1-Wc]
client.go:860: [CSAPI] GET hs1/_matrix/client/v3/rooms/!0-C7eWPIF3FWzfgDTjDj:host.docker.internal:40483/members => 200 OK (76.347531ms)
federation_room_join_partial_state_test.go:2323: @user-33-t24alice:hs1's partial state join to !0-C7eWPIF3FWzfgDTjDj:host.docker.internal:40483 completed.
client.go:860: [CSAPI] PUT hs1/_matrix/client/v3/devices/HJZGHFCAOA => 200 OK (6.628917ms)
federation_room_join_partial_state_test.go:2326: @user-33-t24alice:hs1 sent device list update.
federation_room_join_partial_state_test.go:2167: Complement server received m.device_list_update: {"device_display_name":"A new device name 3","device_id":"HJZGHFCAOA","prev_id":[43],"stream_id":46,"user_id":"@user-33-t24alice:hs1"}
federation_room_join_partial_state_test.go:2167: Complement server received m.device_list_update: {"device_display_name":"A new device name 2","device_id":"HJZGHFCAOA","prev_id":[43],"stream_id":44,"user_id":"@user-33-t24alice:hs1"}
federation_room_join_partial_state_test.go:2329: @charlie, @derek and @elsie received device list update.
federation_room_join_partial_state_test.go:156: host.docker.internal:39381: @user-33-t24alice:hs1 had already left test room !0-C7eWPIF3FWzfgDTjDj:host.docker.internal:40483 before WithWaitForLeave ran.
federation_room_join_partial_state_test.go:4427: Cleaning up after test...
client.go:860: [CSAPI] GET hs1/_matrix/client/v3/rooms/!0-C7eWPIF3FWzfgDTjDj:host.docker.internal:40483/members => 200 OK (7.654741ms)
federation_room_join_partial_state_test.go:4429: @user-33-t24alice:hs1's partial state join to !0-C7eWPIF3FWzfgDTjDj:host.docker.internal:40483 completed.
federation_room_join_partial_state_test.go:2167: Complement server received m.device_list_update: {"device_display_name":"A new device name 3","device_id":"HJZGHFCAOA","prev_id":[44],"stream_id":46,"user_id":"@user-33-t24alice:hs1"}
client.go:860: [CSAPI] POST hs1/_matrix/client/v3/rooms/!0-C7eWPIF3FWzfgDTjDj:host.docker.internal:40483/leave => 200 OK (31.443572ms)
federation_room_join_partial_state_test.go:169: host.docker.internal:40483 saw @user-33-t24alice:hs1 leave test room !0-C7eWPIF3FWzfgDTjDj:host.docker.internal:40483.
federation_room_join_partial_state_test.go:169: host.docker.internal:39381 saw @user-33-t24alice:hs1 leave test room !0-C7eWPIF3FWzfgDTjDj:host.docker.internal:40483.
federation_room_join_partial_state_test.go:4427: Cleaning up after test...
client.go:860: [CSAPI] GET hs1/_matrix/client/v3/rooms/!0-C7eWPIF3FWzfgDTjDj:host.docker.internal:40483/members => 200 OK (5.144066ms)
federation_room_join_partial_state_test.go:4429: @user-33-t24alice:hs1's partial state join to !0-C7eWPIF3FWzfgDTjDj:host.docker.internal:40483 completed.
federation_room_join_partial_state_test.go:156: host.docker.internal:40483: @user-33-t24alice:hs1 had already left test room !0-C7eWPIF3FWzfgDTjDj:host.docker.internal:40483 before WithWaitForLeave ran.
client.go:860: [CSAPI] POST hs1/_matrix/client/v3/rooms/!0-C7eWPIF3FWzfgDTjDj:host.docker.internal:40483/leave => 200 OK (8.149377ms)
federation_room_join_partial_state_test.go:171: host.docker.internal:40483 timed out waiting for @user-33-t24alice:hs1 to leave test room !0-C7eWPIF3FWzfgDTjDj:host.docker.internal:40483.
Member
Author
There was a problem hiding this comment.
Oh, hmm, sorry, could have sworn I saw the tests pass.
| delete(s.eduHandlers, eduHandlerKey) | ||
| } | ||
| } | ||
|
|
Collaborator
There was a problem hiding this comment.
For reference, can you paste the full test run flake?
Member
Author
There was a problem hiding this comment.
https://github.com/element-hq/synapse/actions/runs/27136501582/job/80090514774?pr=19832
📦 github.com/matrix-org/complement/tests/msc3902
2026/06/08 12:22:24 config: &{BaseImageURI:localhost/complement-synapse DebugLoggingEnabled:false AlwaysPrintServerLogs:false EnvVarsPropagatePrefix:PASS_ SpawnHSTimeout:30s ContainerCPUCores:0 ContainerMemoryBytes:0 KeepBlueprints:[] HostMounts:[] BaseImageURIs:map[] PackageNamespace:msc3902 CACertificate:0xc000964c08 CAPrivateKey:0xc000804380 BestEffort:false HostnameRunningComplement:host.docker.internal EnableDirtyRuns:true HSPortBindingIP:127.0.0.1 PostTestScript:}
❌ TestPartialStateJoin (1m56.99s)
❌ TestPartialStateJoin/Outgoing_device_list_updates (19.95s)
❌ TestPartialStateJoin/Outgoing_device_list_updates/Device_list_updates_reach_incorrectly_kicked_servers_once_partial_state_join_completes (4.2s)
client.go:860: [CSAPI] POST hs1/_matrix/client/v3/register => 200 OK (15.620036ms)
client.go:860: [CSAPI] GET hs1/_matrix/client/v3/capabilities => 200 OK (3.801544ms)
server.go:210: Creating room !0-hvmfClz7gk1ghOQTi5:host.docker.internal:45563 with version 10
federation_room_join_partial_state_test.go:4499: Registered state_ids handler for event $mN1voF4gDGPdTSp3EUwOmCDlOFqlzgXW_KUR-AUXZ8U
federation_room_join_partial_state_test.go:4540: Registered /state handler for event $mN1voF4gDGPdTSp3EUwOmCDlOFqlzgXW_KUR-AUXZ8U
2026/06/08 12:23:47 Received send-join of event $kEyw4F8XinpYV9_VahFs5JkXlLv0JIgGb0MkavOKyjE
client.go:860: [CSAPI] POST hs1/_matrix/client/v3/join/!0-hvmfClz7gk1ghOQTi5:host.docker.internal:45563 => 200 OK (80.730602ms)
federation_room_join_partial_state_test.go:4393: /join request completed
federation_room_join_partial_state_test.go:4478: Incoming state_ids request for event [$mN1voF4gDGPdTSp3EUwOmCDlOFqlzgXW_KUR-AUXZ8U] in room !0-hvmfClz7gk1ghOQTi5:host.docker.internal:45563
2026/06/08 12:23:48 Received send-join of event $awFsIhzHfpfMVd_m_ei2qJfZt-YtGhCjhTYrYJzq9U0
federation_room_join_partial_state_test.go:2405: Server.MustJoinRoom joined room ID !0-hvmfClz7gk1ghOQTi5:host.docker.internal:45563
client.go:860: [CSAPI] GET hs1/_matrix/client/v3/sync => 200 OK (49.114711ms)
client.go:860: [CSAPI] GET hs1/_matrix/client/v3/sync => 200 OK (9.370922ms)
federation_room_join_partial_state_test.go:2415: Alice successfully received event $awFsIhzHfpfMVd_m_ei2qJfZt-YtGhCjhTYrYJzq9U0 via /sync
client.go:860: [CSAPI] PUT hs1/_matrix/client/v3/devices/GPHZZFZGEP => 200 OK (4.413737ms)
federation_room_join_partial_state_test.go:2418: @user-35-t26alice:hs1 sent device list update.
federation_room_join_partial_state_test.go:2156: Complement server received m.device_list_update: {"device_display_name":"A new device name 1","device_id":"GPHZZFZGEP","prev_id":[],"stream_id":54,"user_id":"@user-35-t26alice:hs1"}
federation_room_join_partial_state_test.go:2156: Complement server received m.device_list_update: {"device_display_name":"A new device name 1","device_id":"GPHZZFZGEP","prev_id":[],"stream_id":54,"user_id":"@user-35-t26alice:hs1"}
federation_room_join_partial_state_test.go:2421: @charlie, @derek and @elsie received device list update.
client.go:860: [CSAPI] GET hs1/_matrix/client/v3/sync => 200 OK (6.652265ms)
client.go:860: [CSAPI] GET hs1/_matrix/client/v3/sync => 200 OK (48.310036ms)
federation_room_join_partial_state_test.go:2439: Alice successfully received event $SxL9gU1hmed77R46CGYKccn6aDKuZBFGFcKN2eBkKRc via /sync
client.go:860: [CSAPI] PUT hs1/_matrix/client/v3/devices/GPHZZFZGEP => 200 OK (5.036587ms)
federation_room_join_partial_state_test.go:2497: @user-35-t26alice:hs1 sent device list update.
federation_room_join_partial_state_test.go:2156: Complement server received m.device_list_update: {"device_display_name":"A new device name 2","device_id":"GPHZZFZGEP","prev_id":[54],"stream_id":56,"user_id":"@user-35-t26alice:hs1"}
federation_room_join_partial_state_test.go:2500: @charlie and @derek received device list update.
federation_room_join_partial_state_test.go:4485: Replying to /state_ids request for event [$mN1voF4gDGPdTSp3EUwOmCDlOFqlzgXW_KUR-AUXZ8U]
client.go:860: [CSAPI] GET hs1/_matrix/client/v3/rooms/!0-hvmfClz7gk1ghOQTi5:host.docker.internal:45563/members => 200 OK (69.344577ms)
federation_room_join_partial_state_test.go:2513: @user-35-t26alice:hs1's partial state join to !0-hvmfClz7gk1ghOQTi5:host.docker.internal:45563 completed.
federation_room_join_partial_state_test.go:2156: Complement server received m.device_list_update: {"device_display_name":"A new device name 2","device_id":"GPHZZFZGEP","prev_id":[54],"stream_id":57,"user_id":"@user-35-t26alice:hs1"}
federation_room_join_partial_state_test.go:2517: @elsie received missed device list update.
client.go:860: [CSAPI] PUT hs1/_matrix/client/v3/devices/GPHZZFZGEP => 200 OK (5.751307ms)
federation_room_join_partial_state_test.go:2520: @user-35-t26alice:hs1 sent device list update.
federation_room_join_partial_state_test.go:2156: Complement server received m.device_list_update: {"device_display_name":"A new device name 3","device_id":"GPHZZFZGEP","prev_id":[56],"stream_id":59,"user_id":"@user-35-t26alice:hs1"}
federation_room_join_partial_state_test.go:2156: Complement server received m.device_list_update: {"device_display_name":"A new device name 3","device_id":"GPHZZFZGEP","prev_id":[57],"stream_id":59,"user_id":"@user-35-t26alice:hs1"}
federation_room_join_partial_state_test.go:2523: @charlie, @derek and @elsie received device list update.
federation_room_join_partial_state_test.go:4416: Cleaning up after test...
client.go:860: [CSAPI] GET hs1/_matrix/client/v3/rooms/!0-hvmfClz7gk1ghOQTi5:host.docker.internal:45563/members => 200 OK (3.937128ms)
federation_room_join_partial_state_test.go:4418: @user-35-t26alice:hs1's partial state join to !0-hvmfClz7gk1ghOQTi5:host.docker.internal:45563 completed.
client.go:860: [CSAPI] POST hs1/_matrix/client/v3/rooms/!0-hvmfClz7gk1ghOQTi5:host.docker.internal:45563/leave => 200 OK (25.974991ms)
federation_room_join_partial_state_test.go:156: host.docker.internal:45563 saw @user-35-t26alice:hs1 leave test room !0-hvmfClz7gk1ghOQTi5:host.docker.internal:45563.
federation_room_join_partial_state_test.go:152: host.docker.internal:33565 has already seen @user-35-t26alice:hs1 leave test room !0-hvmfClz7gk1ghOQTi5:host.docker.internal:45563.
federation_room_join_partial_state_test.go:77: Received unexpected PDU: {"auth_events":["$613ehdee8-7NYiEo9wLU_E89o8bDbIvOvNdMfB8c_aw","$kEyw4F8XinpYV9_VahFs5JkXlLv0JIgGb0MkavOKyjE","$jnQmT8p0NUxGgUChevtHR2Qu027MlckWRQLfTsnaRXs"],"content":{"membership":"leave"},"depth":11,"hashes":{"sha256":"oaWwqeddMuOgqMSQs/IDHmDxHcxwNdjTTVJrhcHk0JM"},"origin_server_ts":1780921429379,"prev_events":["$SxL9gU1hmed77R46CGYKccn6aDKuZBFGFcKN2eBkKRc"],"room_id":"!0-hvmfClz7gk1ghOQTi5:host.docker.internal:45563","sender":"@user-35-t26alice:hs1","signatures":{"hs1":{"ed25519:a_zXSM":"y5zkhzV66kRDjAI1A/BUHRHnz7Ww1dki3S3o3nza3tBOnNOBA2E/C7Lf7WdOfiPTR9VLstNNufapfqaDYWhECA"}},"state_key":"@user-35-t26alice:hs1","type":"m.room.member"}
Co-authored-by: Eric Eastwood <erice@element.io>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
🤖 Generated with Claude Code
Background
TestPartialStateJoin/Outgoing_device_list_updates(intests/msc3902) flakes in downstream CI (seen in Synapse), failing during test cleanup with:The "unexpected" PDU is the joining user's own
leave, sent legitimately as the test tears down — so this is a race in the test helper, not a homeserver bug.Root cause
WithWaitForLeaveregisters a PDU handler whose callback pushes matching leaves ontoleaveChannel, then decides whether to block by peeking atroom.CurrentState:When a transaction arrives,
HandleTransactionRequestscallsroom.AddEvent(which updatesCurrentState) before invoking the PDU callback. So this interleaving is possible:room.AddEvent(leave)→CurrentState(user) == "leave", callback not yet run"leave"→ returns →defer removePDUHandler()deletes the handlert.Errorf("Received unexpected PDU")The helper used two different signals for the same event — the callback channel and
CurrentState— and tore its handler down based on the one that fires first. It's flaky because it only fails when the scheduler switches betweenAddEventand the callback (more likely under CI load).Fix
Wait on the channel fed by the handler instead of polling
room.CurrentState, so the handler stays registered until its own callback has run.Every caller passes a joined user and an action that performs the leave, so the leave always arrives fresh via that callback — there is no legitimate "already left" case. Make that contract explicit instead of papering over it:
t.Errorf) if the user has already left, rather than silently treating it as "already seen". The action still runs in that case, so any cleanup it performs (e.g.partialStateJoinResult.Destroy) is preserved.leaveChannel.CurrentStatefallback is gone, since neither an already-left user nor a timeout should happen in correct usage.This is contained to the
msc3902helper. TheHandleTransactionRequestsordering (callback afterroom.AddEvent) is left intact, as other tests rely on it — e.g.federation_redaction_test.goreadsserverRoom.Timelineimmediately after a callback-driven waiter.