
Move widget state from broadcast stream to Automerge CRDT document #761

@rgbkrk

Summary

Widget state currently lives in two ephemeral in-memory replicas — CommState (daemon, Rust) and WidgetStore (frontend, JS) — connected by a tokio::broadcast channel. This architecture has several problems:

  • Silent data loss: When a receiver lags behind the broadcast channel's capacity, the oldest messages are silently dropped for that receiver
  • No persistence: Widget state dies with the kernel. A daemon restart loses everything.
  • CommSync replay is fragile: Late-joining clients get a snapshot sorted by insertion order, but if a widget references another that was created later (unlikely but possible with layout manipulation), the replay breaks.
  • No echo suppression: Frontend updates round-trip through the kernel and come back as duplicate store mutations, causing unnecessary React re-renders
  • Output widget capture is a message-forwarding workaround: The daemon intercepts cell outputs and re-routes them as comm_msg(custom) messages to Output widgets, because there's no shared document for the widget to read from.

The fix: put widget state in the Automerge CRDT document, the same way cell outputs and metadata already live there.

What moves to the CRDT

A comms/ map in the notebook doc (not a separate doc — see rationale below):

ROOT/
  schema_version: u64              ← bump to 3
  ...existing cells/, metadata/...
  comms/                           ← Map keyed by comm_id (NEW)
    {comm_id}/
      target_name: Str             ← "jupyter.widget"
      model_module: Str            ← "@jupyter-widgets/controls" | "anywidget" | ...
      model_name: Str              ← "IntSliderModel" | "OutputModel" | ...
      state: Str                   ← JSON-encoded widget state (blob refs as {"$blob": "<hash>"})
      outputs/                     ← List<Str> (OutputModel only: manifest hashes, same format as cell outputs)
      seq: u64                     ← Insertion order for dependency-correct replay
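
A rough sketch of what one comms/ entry carries, written as a plain Python dataclass rather than real Automerge types. The field names follow the schema above; the `CommEntry` class itself and the sample values are illustrative assumptions, not part of the codebase.

```python
import json
from dataclasses import dataclass, field


@dataclass
class CommEntry:
    """Illustrative stand-in for one doc.comms[comm_id] entry (not Automerge)."""
    target_name: str
    model_module: str
    model_name: str
    state: str   # JSON-encoded widget state, as in the schema
    seq: int     # insertion order
    outputs: list[str] = field(default_factory=list)  # OutputModel only


slider = CommEntry(
    target_name="jupyter.widget",
    model_module="@jupyter-widgets/controls",
    model_name="IntSliderModel",
    state=json.dumps({"value": 7, "min": 0, "max": 10}),
    seq=0,
)

# Readers decode the JSON string to get at individual widget properties.
decoded = json.loads(slider.state)
```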

Why same doc, not separate doc

The original version of this issue proposed a separate Automerge doc scoped to the kernel session. After the notebook-sync DocHandle refactor (#786) and the native metadata migration (#791), the same-doc approach is better:

  1. One sync mechanism, not two. The metadata migration proved that adding structured data to the notebook doc works. Adding comms/ is the same pattern — put_json_at_key() already exists.
  2. No second sync connection per client. Each client already syncs the notebook doc. A second doc means a second DocHandle, second sync task, second set of connection plumbing.
  3. DocHandle.with_doc() is atomic across cells and comms. The daemon can clear outputs, write to doc.comms[widget_id].outputs, and sync — all in one lock acquisition. With two docs, cross-doc atomicity requires coordination.
  4. Output widget simplification. If cell outputs and widget outputs are in the same doc, the daemon writes to one of two locations using the same blob manifest pipeline. No custom message protocol needed.
  5. New clients get everything from one sync. No CommSync, no Phase 1.5 handshake.

The concerns about the same-doc approach are mitigable:

  • Lifetime (kernel session vs notebook file): The daemon clears doc.comms on kernel shutdown. Same effect as destroying a separate doc.
  • High-frequency updates (slider drag): The daemon coalesces rapid updates (16ms window). Automerge overhead for a single scalar write is roughly 50-100 bytes per sync message, which is trivial for a Unix socket.
  • History growth: Compaction on kernel restart (snapshot the doc, discard history) handles this.
  • .ipynb persistence: The save-to-disk path already selectively reads cells and metadata — it simply ignores comms/.
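
For the high-frequency case, coalescing could look roughly like the sketch below: updates for the same comm are merged, and one batch is written once the 16ms window has elapsed. `UpdateCoalescer` and its methods are hypothetical names, and timestamps are passed in explicitly so the behavior is deterministic; the real daemon would presumably drive this from a timer.

```python
class UpdateCoalescer:
    """Hypothetical helper: merges rapid per-comm updates inside a time window."""

    def __init__(self, window_s=0.016):
        self.window_s = window_s
        self.pending = {}          # comm_id -> merged state delta
        self.window_start = None   # time of the first update in the window

    def push(self, comm_id, delta, now):
        if self.window_start is None:
            self.window_start = now
        # Later values for the same key overwrite earlier ones.
        self.pending.setdefault(comm_id, {}).update(delta)

    def flush_if_due(self, now):
        """Return the merged batch once the window has elapsed, else {}."""
        if self.window_start is None or now - self.window_start < self.window_s:
            return {}
        batch, self.pending = self.pending, {}
        self.window_start = None
        return batch


coalescer = UpdateCoalescer()
coalescer.push("slider-1", {"value": 1}, now=0.000)
coalescer.push("slider-1", {"value": 2}, now=0.005)
coalescer.push("slider-1", {"value": 3}, now=0.010)
early = coalescer.flush_if_due(now=0.012)   # window still open
batch = coalescer.flush_if_due(now=0.016)   # one doc write instead of three
```

Three slider events become a single write, which is what bounds the CRDT history growth.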

What moves

| Data | Current location | CRDT location |
| --- | --- | --- |
| Widget existence (open/close) | CommState HashMap + broadcast | doc.comms map |
| Widget state (slider value, button style, etc.) | CommState + WidgetStore | doc.comms[id].state |
| Output widget captured outputs | Custom message dance (daemon → broadcast → frontend → sendUpdate back) | doc.comms[id].outputs: daemon writes directly, same manifest pipeline as cell outputs |
| Binary buffers (images, numpy arrays) | In-memory on CommSnapshot.buffers, base64 | Blob store, hash refs in state as {"$blob": "<hash>"} sentinels |
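
As an illustration of the binary-buffer row, the sketch below moves buffers into a blob store and leaves {"$blob": "<hash>"} sentinels in the state JSON. It assumes buffers arrive with the widget protocol's buffer_paths alongside them; `externalize_buffers`, the dict-backed blob store, and the choice of SHA-256 are assumptions made for the example only.

```python
import hashlib
import json


def externalize_buffers(state, buffer_paths, buffers, blob_store):
    """Store each buffer in blob_store and put a {"$blob": hash} sentinel at its path.

    Mutates `state` in place and returns it JSON-encoded.
    """
    for path, buf in zip(buffer_paths, buffers):
        digest = hashlib.sha256(buf).hexdigest()  # hash choice is an assumption
        blob_store[digest] = buf
        target = state
        for key in path[:-1]:      # walk down to the parent container
            target = target[key]
        target[path[-1]] = {"$blob": digest}
    return json.dumps(state)


blob_store = {}
png_bytes = b"\x89PNG fake image bytes"
encoded = externalize_buffers(
    {"format": "png"}, [["value"]], [png_bytes], blob_store
)
decoded = json.loads(encoded)
```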

What stays on events/broadcasts

| Data | Why |
| --- | --- |
| Custom messages (method: "custom", model.send()) | Non-idempotent ordered events (button clicks, ipycanvas draw commands). CRDTs model state, not event streams. |
| ExecutionStarted, ExecutionDone, QueueChanged | UI animation hints. The doc is authoritative; events are just fast-path signals. |
| KernelError, EnvProgress, EnvSyncState, FileChanged | Genuinely ephemeral |

What this eliminates

  1. CommSync handshake: New clients just sync the doc. The entire Phase 1.5 disappears.
  2. CommState struct (most of it): The daemon writes to the doc instead of maintaining a parallel HashMap. Output widget capture routing stays (it's routing logic, not state).
  3. 5 broadcast variants: CommSync, Comm (for open/update/close), Output, OutputsCleared, DisplayUpdate — all replaced by doc sync. Broadcast surface: 13 → 8 variants.
  4. Echo suppression problem: CRDT merge of identical state is a no-op.
  5. Output widget custom message protocol: No more {method: "output", output: ...} / {method: "clear_output", wait: bool}. The daemon writes captured outputs to doc.comms[widget_id].outputs using the same blob manifest pipeline as cell outputs. The frontend renders them the same way.
  6. closedModels set in WidgetStore: If it's not in the doc, it doesn't exist.
  7. sendUpdate feedback loop for Output widgets.

Precedent from recent refactors

| PR | What it proved |
| --- | --- |
| #786, #789 (notebook-sync DocHandle) | with_doc(\|doc\| ...) gives synchronous access. SyncCommand enum shrank from ~15 variants to 4. Perfect for comm writes. |
| #791, #800 (Native metadata) | put_json_at_key() recursively stores JSON as native Automerge types. Dual-write → remove legacy path. Same playbook applies. |
| #797 (Sync-before-ExecutionDone) | Validates "events are hints, not state." The fix enforces doc sync before broadcast; the same principle applies to CommSync elimination. |
| #755, #789 (Python reads from doc) | Python already ignores Output broadcasts and reads from the doc via confirm_sync() + get_cells(). Same pattern extends to get_comms(). |

Implementation plan

Phase A: Schema + dual-write — #808

  • Add comms map to NotebookDoc::new(), migrate_v2_to_v3(), bump schema_version to 3
  • Add put_comm, update_comm_state, remove_comm, get_comms, clear_comms methods
  • For OutputModel: append_comm_output, clear_comm_outputs
  • Daemon dual-writes to doc.comms AND CommState on comm_open/comm_msg(update)/comm_close
  • Keep CommSync as fallback — no behavior changes
  • Size: Medium
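
A minimal sketch of the Phase A surface, using a plain dict in place of the Automerge doc. The method names come from the list above; their signatures, the `FakeNotebookDoc` class, and the `on_comm_open` handler are assumptions made for illustration.

```python
import json


class FakeNotebookDoc:
    """Plain-dict stand-in for the notebook doc's comms/ map (not Automerge)."""

    def __init__(self):
        self.comms = {}
        self._next_seq = 0

    def put_comm(self, comm_id, target_name, model_module, model_name, state):
        self.comms[comm_id] = {
            "target_name": target_name,
            "model_module": model_module,
            "model_name": model_name,
            "state": json.dumps(state),
            "outputs": [],
            "seq": self._next_seq,
        }
        self._next_seq += 1

    def update_comm_state(self, comm_id, delta):
        entry = self.comms[comm_id]
        entry["state"] = json.dumps({**json.loads(entry["state"]), **delta})

    def remove_comm(self, comm_id):
        self.comms.pop(comm_id, None)

    def clear_comms(self):
        self.comms.clear()

    def get_comms(self):
        # Creation order, via the seq field.
        return sorted(self.comms.items(), key=lambda item: item[1]["seq"])


doc = FakeNotebookDoc()
legacy_comm_state = {}  # stands in for the existing CommState HashMap


def on_comm_open(comm_id, model_module, model_name, state):
    """Dual-write: the doc and the legacy map both receive the new comm."""
    doc.put_comm(comm_id, "jupyter.widget", model_module, model_name, state)
    legacy_comm_state[comm_id] = dict(state)


on_comm_open("layout-1", "@jupyter-widgets/base", "LayoutModel", {"width": "50%"})
on_comm_open("slider-1", "@jupyter-widgets/controls", "IntSliderModel", {"value": 0})
doc.update_comm_state("slider-1", {"value": 4})
order = [comm_id for comm_id, _ in doc.get_comms()]
```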

Phase B: Frontend + Python read from doc — #809

  • Add get_comms() to WASM NotebookHandle and notebook-sync DocHandle
  • Frontend watches doc.comms after sync_applied, drives WidgetStore from doc state
  • Python session.get_widgets() reads from doc after confirm_sync()
  • CommSync still sent as backup during transition
  • Size: Medium-Large
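
Driving the store from doc state boils down to a diff after each sync. The sketch below is in Python for consistency with the other examples (the real frontend is JS); `diff_comms` and the dict shapes are hypothetical.

```python
def diff_comms(store, doc_comms):
    """Compare the local widget store against doc.comms after a sync.

    Both arguments map comm_id -> decoded state dict.
    Returns (opened, updated, closed) lists of comm ids.
    """
    opened = [cid for cid in doc_comms if cid not in store]
    closed = [cid for cid in store if cid not in doc_comms]
    updated = [
        cid for cid in doc_comms
        if cid in store and store[cid] != doc_comms[cid]
    ]
    return opened, updated, closed


store = {"slider-1": {"value": 1}, "button-1": {"description": "Go"}}
doc_comms = {"slider-1": {"value": 2}, "text-1": {"value": "hi"}}
opened, updated, closed = diff_comms(store, doc_comms)
```

Identical state produces no entry in `updated`, so an echoed value triggers no store mutation.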

Phase C: Eliminate parallel paths — #810

  • Remove CommSync, Comm (open/update/close), Output, OutputsCleared, DisplayUpdate broadcast variants
  • Output widget captured outputs → doc.comms[widget_id].outputs (same blob pipeline as cell outputs)
  • update_display_data scans doc.comms[*].outputs in addition to cells
  • Reduce CommState to OutputCaptureRouter (capture routing logic only)
  • Apply sync-before-event pattern for execution_started (as in #797, which syncs the doc to the peer before forwarding ExecutionDone)
  • Size: Large — this is where the real simplification lands

Phase D: Binary unification + update_comm (#811)

  • Widget buffers through blob store: {"$blob": "<hash>"} sentinels in state JSON
  • New UpdateComm { comm_id, state_delta } request type (clean path for frontend→kernel state changes)
  • Daemon coalesces rapid update_comm requests (16ms window) to bound CRDT history growth
  • Frontend optimistic updates reconcile with doc sync
  • SendComm retained only for method: "custom" messages
  • Size: Medium-Large
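
One possible reconciliation policy for optimistic updates, sketched under the assumption that the frontend keeps a per-widget map of pending local writes. `reconcile` is a hypothetical helper; the policy shown (local value wins until the doc reports the same value) is just one option.

```python
def reconcile(pending, doc_state):
    """Overlay pending local writes on doc state; drop the ones the doc has caught up to.

    Returns (view, still_pending).
    """
    still_pending = {
        key: value for key, value in pending.items()
        if doc_state.get(key) != value
    }
    view = {**doc_state, **still_pending}
    return view, still_pending


pending = {"value": 5}  # user dragged the slider to 5
# The doc still holds the old value: the UI keeps showing the optimistic 5.
view_before, pending_before = reconcile(pending, {"value": 3, "max": 10})
# The doc has caught up: nothing is pending any more.
view_after, pending_after = reconcile(pending_before, {"value": 5, "max": 10})
```

A real implementation would also need to expire pending entries, for example when the kernel clamps or rejects a value and the doc never reaches the optimistic one.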

Edge cases

  • clear_output(wait=True): The daemon defers the clear until the next output arrives, then clears and appends atomically. The logic stays daemon-side but writes to the CRDT instead of sending a custom message.
  • High-frequency updates (slider drag, play widget): Coalesced by daemon (16ms window). Frontend optimistic local state for immediate feedback.
  • Kernel session scoping: doc.clear_comms() on kernel shutdown. Compaction opportunity.
  • ipywidgets 8 echo_update: CRDT approach makes it unnecessary — merge of identical state is a no-op.
  • anywidget AFM interface: model.get(key) → read from doc. model.set(key, value) → UpdateComm request. model.on("change:key", cb) → watch doc changes. model.send() → SendComm (irreducible stream).
  • Container widget ordering: seq field ensures get_comms() returns widgets in creation order for dependency-correct replay (layouts must be instantiated after their children).
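
The clear_output(wait=True) case can be sketched as a small state machine over an Output widget's outputs list. `OutputCapture` is a hypothetical name and a plain list stands in for doc.comms[widget_id].outputs; in the daemon the clear and append would presumably happen in a single doc write.

```python
class OutputCapture:
    """Hypothetical daemon-side handling of clear_output for one Output widget."""

    def __init__(self):
        self.outputs = []          # stands in for doc.comms[widget_id].outputs
        self.pending_clear = False

    def clear_output(self, wait=False):
        if wait:
            # Defer: keep showing the old outputs until a replacement arrives.
            self.pending_clear = True
        else:
            self.outputs.clear()
            self.pending_clear = False

    def append_output(self, manifest_hash):
        if self.pending_clear:
            # Clear and append in one step, so readers never see an empty list.
            self.outputs = [manifest_hash]
            self.pending_clear = False
        else:
            self.outputs.append(manifest_hash)


capture = OutputCapture()
capture.append_output("hash-a")
capture.clear_output(wait=True)
during_wait = list(capture.outputs)   # old output is still visible
capture.append_output("hash-b")
after_replace = list(capture.outputs)
capture.append_output("hash-c")
after_append = list(capture.outputs)
```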


Labels: architecture, enhancement, ipywidgets, sync
