perf: room-broadcast NEW_CHANGES fan-out for steady-state clients behind feature flag (#7780) #7853
Review Summary by Qodo
Room-broadcast NEW_CHANGES fan-out for steady-state clients
Walkthroughs
Description
• Implement room-broadcast fan-out for NEW_CHANGES messages to steady-state clients
• Split message delivery: broadcast to synced clients, per-socket catch-up for stragglers
• Add roomBroadcastNewChanges feature flag, disabled by default for safe A/B testing
• Reduce per-recipient packet construction overhead in steady-state path
Diagram
flowchart LR
A["updatePadClients"] --> B{"roomBroadcastNewChanges enabled?"}
B -->|Yes| C["Partition sockets"]
C --> D["Synced at head-1"]
C --> E["Stragglers"]
D --> F["Room broadcast once"]
E --> G["Per-socket catch-up"]
F --> H["Update sessioninfo"]
G --> H
B -->|No| I["Legacy per-socket loop"]
I --> H
File Changes
1. src/node/handler/PadMessageHandler.ts
Code Review by Qodo
1. Broadcast timeDelta can crash
    const exemplarSession = sessioninfos[syncedSocketIds[0]];
    const msg = {
      type: 'COLLABROOM',
      data: {
        type: 'NEW_CHANGES',
        newRev: headRev,
        changeset: forWire.translated,
        apool: forWire.pool,
        author,
        currentTime,
        timeDelta: currentTime - exemplarSession.time,
      },
1. Broadcast timeDelta can crash (Bug, Correctness)
In updatePadClients() with roomBroadcastNewChanges enabled, timeDelta is computed from exemplarSession.time after awaiting revision fetch, so the exemplar session can disappear (disconnect) or have undefined time, causing a throw or broadcasting NaN timeDelta to all steady-state clients. This breaks timeslider time tracking because the client adds timeDelta to currentTime.
Agent Prompt
### Issue description
When `settings.roomBroadcastNewChanges` is enabled, `updatePadClients()` builds a single `NEW_CHANGES` message for all steady-state sockets and sets `timeDelta` using `currentTime - exemplarSession.time`. Because this happens after an `await`, the exemplar session can be removed from `sessioninfos` (disconnect) or have a non-numeric/undefined `time`, causing either a runtime exception or a `NaN` `timeDelta` being broadcast to many clients.
### Issue Context
- `sessioninfos` entries are deleted on disconnect, so a socket can disappear between the initial scan and the later exemplar lookup.
- Some sessions may have missing/undefined `time` (for example, reconnect path sets `rev` but does not initialize `time`), and there is an existing comment warning that missing `time` produces `timeDelta=NaN`.
- Timeslider client code applies `timeDelta` to `padContents.currentTime`, so `NaN` corrupts time tracking.
### Fix Focus Areas
- src/node/handler/PadMessageHandler.ts[1033-1090]
- src/node/handler/PadMessageHandler.ts[246-250]
- src/node/handler/PadMessageHandler.ts[1307-1314]
- src/node/handler/PadMessageHandler.ts[1502-1511]
- src/static/js/broadcast.ts[206-268]
### Implementation notes
- Avoid depending on any single session object for `timeDelta` in the broadcast message. Prefer computing `timeDelta` from revision timestamps, e.g.:
  - `currentTime = revision.meta.timestamp`
  - `prevTime = headRev > 0 ? (await getRevision(headRev - 1)).meta.timestamp : currentTime`
  - `timeDelta = currentTime - prevTime`
- If you keep an exemplar-based fast path, guard it: ensure `exemplarSession` exists and `typeof exemplarSession.time === 'number'`, otherwise fall back to revision timestamp delta.
- (Optional hardening) Initialize `sessionInfo.time` on the reconnect path similarly to the normal connect path to prevent future `NaN` deltas in per-socket catch-up.
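The guarded fallback described in these notes can be sketched as a small pure function. This is a minimal illustration, not the PR's code: `TimedSession`, `RevisionMeta`, and `broadcastTimeDelta` are hypothetical names, and the caller is assumed to have already fetched the current and previous revision metadata (passing `undefined` for the previous one when `headRev` is 0).

```typescript
// Hypothetical shapes for illustration; Etherpad's real session and
// revision objects carry more fields than shown here.
interface RevisionMeta { timestamp: number }
interface TimedSession { time?: unknown }

// Choose a timeDelta for a broadcast NEW_CHANGES message.
// Uses the exemplar session's time only when it is a finite number;
// otherwise falls back to the delta between revision timestamps.
function broadcastTimeDelta(
  current: RevisionMeta,
  previous: RevisionMeta | undefined,
  exemplar: TimedSession | undefined,
): number {
  const currentTime = current.timestamp;
  if (exemplar != null && typeof exemplar.time === 'number' && isFinite(exemplar.time)) {
    return currentTime - exemplar.time;
  }
  // No previous revision (headRev === 0): the delta is zero.
  const prevTime = previous != null ? previous.timestamp : currentTime;
  return currentTime - prevTime;
}
```

With this shape, a session that disconnected during the `await` (exemplar is `undefined`) or one whose `time` was never initialized both take the revision-timestamp path, so the result is always a finite number.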
If you look through the load testing you should see we also tested for max performance. I would say this is not a suitable first issue for someone new to Etherpad dev.
@JohnMcLear Please review this PR
Summary
Closes #7780.
This PR implements Shape A for room fan-out in src/node/handler/PadMessageHandler.ts:
Goal: reduce per-recipient packet construction overhead in the steady-state path without changing client protocol behavior.
Why each change is necessary
src/node/handler/PadMessageHandler.ts
Introduces split-path fan-out:
This is necessary because a naive latest-only broadcast would be dropped by lagging clients that enforce rev + 1 semantics.
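A rough sketch of the partition step implied by the split-path design follows. It assumes a session is "synced" when its `rev` equals `headRev - 1` (matching the "Synced at head-1" node in the review diagram) and a straggler when it is further behind. `RevSession`, `SocketPartition`, and `partitionSockets` are hypothetical names, not identifiers from the PR.

```typescript
// Hypothetical minimal session shape; real sessioninfos entries hold more.
interface RevSession { rev: number }

interface SocketPartition {
  synced: string[];     // exactly one revision behind: eligible for room broadcast
  stragglers: string[]; // further behind: need per-socket catch-up
}

function partitionSockets(
  socketIds: string[],
  sessions: Record<string, RevSession | undefined>,
  headRev: number,
): SocketPartition {
  const synced: string[] = [];
  const stragglers: string[] = [];
  for (let i = 0; i < socketIds.length; i++) {
    const id = socketIds[i];
    const session = sessions[id];
    if (session == null) continue; // socket disconnected since the scan
    if (session.rev === headRev - 1) {
      synced.push(id);
    } else if (session.rev < headRev - 1) {
      stragglers.push(id);
    }
    // session.rev >= headRev: already at head, nothing to send.
  }
  return {synced, stragglers};
}
```

Sockets in `synced` can all receive the same head-revision message, while each socket in `stragglers` still needs its own sequence of revisions so that a client enforcing rev + 1 ordering never sees a gap.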
src/node/utils/Settings.ts
Adds roomBroadcastNewChanges to settings type and default.
This is required so the optimization can be A/B tested safely and remain off by default.
settings.json.template
Documents the new flag and intended usage.
This is required so operators can enable the experiment intentionally and understand scope.
settings.json.docker
Adds ROOM_BROADCAST_NEW_CHANGES env mapping.
This is required for containerized benchmark and rollout workflows.
src/tests/backend-new/specs/roomBroadcastNewChanges-defaults.test.ts
Verifies default remains false.
This guards the acceptance requirement that the change is feature-flagged and opt-in.
Compatibility and risk
Validation
N=3 measurement
Run setup used for this branch measurement:
Results:
Averages:
Notes: