Skip to content

perf: room-broadcast NEW_CHANGES fan-out for steady-state clients behind feature flag (#7780)#7853

Open
kishansomaiya wants to merge 2 commits into
ether:developfrom
kishansomaiya:perf/7780-room-broadcast-new-changes
Open

perf: room-broadcast NEW_CHANGES fan-out for steady-state clients behind feature flag (#7780)#7853
kishansomaiya wants to merge 2 commits into
ether:developfrom
kishansomaiya:perf/7780-room-broadcast-new-changes

Conversation

@kishansomaiya
Copy link
Copy Markdown

Summary

Closes #7780.

This PR implements Shape A for room fan-out in src/node/handler/PadMessageHandler.ts:

  • broadcast NEW_CHANGES once to steady-state recipients at head - 1
  • keep per-socket rev + 1 catch-up for stragglers
  • gate the behavior behind settings.roomBroadcastNewChanges, default false

Goal: reduce per-recipient packet construction overhead in the steady-state path without changing client protocol behavior.

Why each change is necessary

  • src/node/handler/PadMessageHandler.ts
    Introduces split-path fan-out:

    • steady-state sockets use room broadcast for the latest revision
    • stragglers keep per-socket incremental catch-up
      This is necessary because a naive latest-only broadcast would be dropped by lagging clients that enforce rev + 1 semantics.
  • src/node/utils/Settings.ts
    Adds roomBroadcastNewChanges to settings type and default.
    This is required so the optimization can be A/B tested safely and remain off by default.

  • settings.json.template
    Documents the new flag and intended usage.
    This is required so operators can enable the experiment intentionally and understand scope.

  • settings.json.docker
    Adds ROOM_BROADCAST_NEW_CHANGES env mapping.
    This is required for containerized benchmark and rollout workflows.

  • src/tests/backend-new/specs/roomBroadcastNewChanges-defaults.test.ts
    Verifies default remains false.
    This guards the acceptance requirement that the change is feature-flagged and opt-in.

Compatibility and risk

  • Default behavior is unchanged when the flag is false.
  • No client-side protocol changes.
  • Straggler path preserves current correctness guarantees.

Validation

  • TypeScript check passed
  • Backend-new Vitest suite passed
  • New defaults test passed

N=3 measurement

Run setup used for this branch measurement:

  • Duration: 10 seconds
  • Authors: 5
  • N=3 per mode, same host, fresh run per sample

Results:

mode run cpu_delta_sec NEW_CHANGES_emits_delta
off 1 0.9375 396
off 2 1.0625 404
off 3 1.0312 400
on 1 1.1875 102
on 2 1.0156 100
on 3 0.9531 99

Averages:

  • avg off cpu_delta_sec: 1.0104
  • avg on cpu_delta_sec: 1.0521
  • cpu change: +4.12%
  • avg off NEW_CHANGES emits delta: 400.0000
  • avg on NEW_CHANGES emits delta: 100.3333
  • emits change: -74.92%

Notes:

  • The expected emit reduction is clearly visible.
  • CPU improvement is not visible at this small workload; likely below noise floor.
  • If preferred, I can add a follow-up measurement block with a heavier profile matching CI-style load settings.

@qodo-code-review
Copy link
Copy Markdown

Qodo reviews are paused for this user.

Troubleshooting steps vary by plan Learn more →

On a Teams plan?
Reviews resume once this user has a paid seat and their Git account is linked in Qodo.
Link Git account →

Using GitHub Enterprise Server, GitLab Self-Managed, or Bitbucket Data Center?
These require an Enterprise plan - Contact us
Contact us →

@qodo-free-for-open-source-projects
Copy link
Copy Markdown

Review Summary by Qodo

Room-broadcast NEW_CHANGES fan-out for steady-state clients

✨ Enhancement

Grey Divider

Walkthroughs

Description
• Implement room-broadcast fan-out for NEW_CHANGES messages to steady-state clients
• Split message delivery: broadcast to synced clients, per-socket catch-up for stragglers
• Add roomBroadcastNewChanges feature flag, disabled by default for safe A/B testing
• Reduce per-recipient packet construction overhead in steady-state path
Diagram
flowchart LR
  A["updatePadClients"] --> B{"roomBroadcastNewChanges enabled?"}
  B -->|Yes| C["Partition sockets"]
  C --> D["Synced at head-1"]
  C --> E["Stragglers"]
  D --> F["Room broadcast once"]
  E --> G["Per-socket catch-up"]
  F --> H["Update sessioninfo"]
  G --> H
  B -->|No| I["Legacy per-socket loop"]
  I --> H

Loading

File Changes

1. src/node/handler/PadMessageHandler.ts ✨ Enhancement +118/-8

Split fan-out logic for steady-state vs straggler clients

• Add forWireCache to cache prepared changesets for wire transmission
• Introduce getRevision() and getForWire() helper functions for caching
• Implement split-path fan-out when roomBroadcastNewChanges is enabled
• Broadcast NEW_CHANGES once to synced clients at head-1 using room.except()
• Preserve per-socket incremental catch-up for straggler clients
• Refactor legacy path to use new helper functions

src/node/handler/PadMessageHandler.ts


2. src/node/utils/Settings.ts ⚙️ Configuration changes +6/-0

Add roomBroadcastNewChanges feature flag setting

• Add roomBroadcastNewChanges: boolean to SettingsType interface
• Set default value to false for safe opt-in behavior
• Document flag purpose: A/B testing room-broadcast optimization

src/node/utils/Settings.ts


3. settings.json.template 📝 Documentation +9/-0

Document roomBroadcastNewChanges configuration option

• Add roomBroadcastNewChanges configuration option with default false
• Document the performance experiment (#7780) and its behavior
• Clarify that flag should remain disabled unless explicitly benchmarking
• Explain split behavior: broadcast for steady-state, per-socket for stragglers

settings.json.template


View more (2)
4. settings.json.docker ⚙️ Configuration changes +6/-0

Add Docker environment variable mapping for feature flag

• Add roomBroadcastNewChanges configuration with environment variable mapping
• Support ROOM_BROADCAST_NEW_CHANGES env var for containerized deployments
• Document the performance experiment reference (#7780)

settings.json.docker


5. src/tests/backend-new/specs/roomBroadcastNewChanges-defaults.test.ts 🧪 Tests +8/-0

Test default value of roomBroadcastNewChanges flag

• New test file verifying roomBroadcastNewChanges defaults to false
• Ensures feature flag remains opt-in and disabled by default
• Guards acceptance requirement for safe A/B testing

src/tests/backend-new/specs/roomBroadcastNewChanges-defaults.test.ts


Grey Divider

Qodo Logo

@qodo-free-for-open-source-projects
Copy link
Copy Markdown

qodo-free-for-open-source-projects Bot commented May 26, 2026

Code Review by Qodo

🐞 Bugs (1) 📘 Rule violations (0) 📎 Requirement gaps (0)

Grey Divider


Action required

1. Broadcast timeDelta can crash 🐞 Bug ≡ Correctness
Description
In updatePadClients() with roomBroadcastNewChanges enabled, timeDelta is computed from
exemplarSession.time after awaiting revision fetch, so the exemplar session can disappear
(disconnect) or have undefined time, causing a throw or broadcasting NaN timeDelta to all
steady-state clients. This breaks timeslider time tracking because the client adds timeDelta to
currentTime.
Code

src/node/handler/PadMessageHandler.ts[R1055-1066]

Evidence
The broadcast path reads exemplarSession.time without null/number checks after awaiting
getRevision(headRev), but disconnects delete sessioninfos[socket.id], so exemplarSession can
become undefined; additionally the reconnect path does not initialize sessionInfo.time, and the
codebase explicitly documents that missing time leads to timeDelta=NaN. On the client,
timeDelta directly mutates timeslider time state, so NaN is harmful.

src/node/handler/PadMessageHandler.ts[1033-1090]
src/node/handler/PadMessageHandler.ts[246-250]
src/node/handler/PadMessageHandler.ts[1307-1314]
src/node/handler/PadMessageHandler.ts[1502-1511]
src/static/js/broadcast.ts[206-268]

Agent prompt
The issue below was found during a code review. Follow the provided context and guidance below and implement a solution

## Issue description
When `settings.roomBroadcastNewChanges` is enabled, `updatePadClients()` builds a single `NEW_CHANGES` message for all steady-state sockets and sets `timeDelta` using `currentTime - exemplarSession.time`. Because this happens after an `await`, the exemplar session can be removed from `sessioninfos` (disconnect) or have a non-numeric/undefined `time`, causing either a runtime exception or `NaN` `timeDelta` broadcasted to many clients.
### Issue Context
- `sessioninfos` entries are deleted on disconnect, so a socket can disappear between the initial scan and the later exemplar lookup.
- Some sessions may have missing/undefined `time` (for example, reconnect path sets `rev` but does not initialize `time`), and there is an existing comment warning that missing `time` produces `timeDelta=NaN`.
- Timeslider client code applies `timeDelta` to `padContents.currentTime`, so `NaN` corrupts time tracking.
### Fix Focus Areas
- src/node/handler/PadMessageHandler.ts[1033-1090]
- src/node/handler/PadMessageHandler.ts[246-250]
- src/node/handler/PadMessageHandler.ts[1307-1314]
- src/node/handler/PadMessageHandler.ts[1502-1511]
- src/static/js/broadcast.ts[206-268]
### Implementation notes
- Avoid depending on any single session object for `timeDelta` in the broadcast message. Prefer computing `timeDelta` from revision timestamps, e.g.:
- `currentTime = revision.meta.timestamp`
- `prevTime = headRev > 0 ? (await getRevision(headRev - 1)).meta.timestamp : currentTime`
- `timeDelta = currentTime - prevTime`
- If you keep an exemplar-based fast path, guard it: ensure `exemplarSession` exists and `typeof exemplarSession.time === 'number'`, otherwise fall back to revision timestamp delta.
- (Optional hardening) Initialize `sessionInfo.time` on the reconnect path similarly to the normal connect path to prevent future `NaN` deltas in per-socket catch-up.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools


Grey Divider

Qodo Logo

Comment on lines +1055 to +1066
const exemplarSession = sessioninfos[syncedSocketIds[0]];
const msg = {
type: 'COLLABROOM',
data: {
type: 'NEW_CHANGES',
newRev: headRev,
changeset: forWire.translated,
apool: forWire.pool,
author,
currentTime,
timeDelta: currentTime - exemplarSession.time,
},
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

1. Broadcast timedelta can crash 🐞 Bug ≡ Correctness

In updatePadClients() with roomBroadcastNewChanges enabled, timeDelta is computed from
exemplarSession.time after awaiting revision fetch, so the exemplar session can disappear
(disconnect) or have undefined time, causing a throw or broadcasting NaN timeDelta to all
steady-state clients. This breaks timeslider time tracking because the client adds timeDelta to
currentTime.
Agent Prompt
### Issue description
When `settings.roomBroadcastNewChanges` is enabled, `updatePadClients()` builds a single `NEW_CHANGES` message for all steady-state sockets and sets `timeDelta` using `currentTime - exemplarSession.time`. Because this happens after an `await`, the exemplar session can be removed from `sessioninfos` (disconnect) or have a non-numeric/undefined `time`, causing either a runtime exception or `NaN` `timeDelta` broadcasted to many clients.

### Issue Context
- `sessioninfos` entries are deleted on disconnect, so a socket can disappear between the initial scan and the later exemplar lookup.
- Some sessions may have missing/undefined `time` (for example, reconnect path sets `rev` but does not initialize `time`), and there is an existing comment warning that missing `time` produces `timeDelta=NaN`.
- Timeslider client code applies `timeDelta` to `padContents.currentTime`, so `NaN` corrupts time tracking.

### Fix Focus Areas
- src/node/handler/PadMessageHandler.ts[1033-1090]
- src/node/handler/PadMessageHandler.ts[246-250]
- src/node/handler/PadMessageHandler.ts[1307-1314]
- src/node/handler/PadMessageHandler.ts[1502-1511]
- src/static/js/broadcast.ts[206-268]

### Implementation notes
- Avoid depending on any single session object for `timeDelta` in the broadcast message. Prefer computing `timeDelta` from revision timestamps, e.g.:
  - `currentTime = revision.meta.timestamp`
  - `prevTime = headRev > 0 ? (await getRevision(headRev - 1)).meta.timestamp : currentTime`
  - `timeDelta = currentTime - prevTime`
- If you keep an exemplar-based fast path, guard it: ensure `exemplarSession` exists and `typeof exemplarSession.time === 'number'`, otherwise fall back to revision timestamp delta.
- (Optional hardening) Initialize `sessionInfo.time` on the reconnect path similarly to the normal connect path to prevent future `NaN` deltas in per-socket catch-up.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

@JohnMcLear
Copy link
Copy Markdown
Member

If you look through the load testing you should see we also tested for max performance. I would say this is not a suitable first issue for someone new to Etherpad dev.

@kishansomaiya
Copy link
Copy Markdown
Author

@JohnMcLear Please review this PR

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Performance: [#7756 follow-up] Room-broadcast NEW_CHANGES fan-out to drop ~5-7% per-recipient packet construction

2 participants