
Runtime agents: decompose kernel ownership into per-runtime Automerge peers #832

@rgbkrk


Summary

Today, runtimed is monolithic: it manages kernels, receives their Jupyter messages, and proxies kernel output (cell outputs, execution counts, kernel status) into the Automerge document. All of these writes are attributed to a single "runtimed" actor ID, so there is no way to tell "runtimed set up the document structure" apart from "the Python runtime produced this output."

The proposal is to introduce runtime agents — lightweight peers that each own a single kernel and participate as first-class Automerge peers on the notebook document.
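
To make the attribution problem concrete, here is a toy Rust model in which every change records the actor that made it. It is not Automerge's actual change format; `Change` and `changes_by_actor` are hypothetical names. With a single "runtimed" actor, grouping changes by actor yields one bucket; once each runtime writes under its own actor ID, outputs separate out per runtime.

```rust
use std::collections::BTreeMap;

/// Toy stand-in for a CRDT change: which actor made it and what it did.
/// (Illustrative only; this is not Automerge's change format.)
#[derive(Debug, Clone)]
pub struct Change {
    pub actor: String,
    pub description: String,
}

/// Groups change descriptions by the actor that authored them.
/// BTreeMap keeps the result sorted by actor ID.
pub fn changes_by_actor(changes: &[Change]) -> BTreeMap<String, Vec<String>> {
    let mut grouped: BTreeMap<String, Vec<String>> = BTreeMap::new();
    for change in changes {
        grouped
            .entry(change.actor.clone())
            .or_default()
            .push(change.description.clone());
    }
    grouped
}
```

Fed today's history, every change lands under "runtimed"; fed the proposed history, the Python and Deno outputs each get their own key.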

Architecture

Current

┌─────────────────────────────────┐
│           runtimed              │
│  actor: "runtimed"              │
│                                 │
│  ┌───────────┐ ┌───────────┐    │
│  │ kernel A  │ │ kernel B  │    │
│  └───────────┘ └───────────┘    │
│        │              │         │
│  Jupyter msgs → CRDT writes     │
│  (all attributed to "runtimed") │
└─────────────────────────────────┘

Proposed

┌──────────────────────┐
│      runtimed        │
│  actor: "runtimed"   │
│  (coordinator/sync)  │
└──────┬───────┬───────┘
       │       │
┌──────┴──┐ ┌──┴──────┐
│ runtime │ │ runtime │
│ agent A │ │ agent B │
│ actor:  │ │ actor:  │
│ "runtime│ │ "runtime│
│ :python │ │ :deno   │
│ :<env>" │ │ :<env>" │
│         │ │         │
│ owns    │ │ owns    │
│ kernel  │ │ kernel  │
└─────────┘ └─────────┘

Each runtime agent:

  • Is an Automerge peer with its own actor ID (e.g., runtime:python:<env-id>)
  • Owns a kernel lifecycle (launch, restart, shutdown)
  • Handles Jupyter protocol messages from its kernel
  • Writes outputs, execution counts, and kernel status directly to the CRDT
  • Syncs with runtimed via the existing Automerge sync protocol
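
A minimal sketch of the actor ID scheme, assuming the `runtime:<runtime>:<env-id>` shape from the example above is the whole contract. `RuntimeActor`, `parse`, and `actor_bytes` are hypothetical. Automerge actor IDs are opaque byte strings, so using the UTF-8 bytes of the formatted string is one possible encoding, not a decided one.

```rust
use std::fmt;

/// Hypothetical identity of a runtime agent, rendered as `runtime:<runtime>:<env-id>`.
#[derive(Debug, Clone, PartialEq, Eq)]
pub struct RuntimeActor {
    pub runtime: String, // e.g. "python" or "deno"
    pub env_id: String,  // identifies the kernel environment
}

impl fmt::Display for RuntimeActor {
    fn fmt(&self, f: &mut fmt::Formatter<'_>) -> fmt::Result {
        write!(f, "runtime:{}:{}", self.runtime, self.env_id)
    }
}

impl RuntimeActor {
    /// Parses `runtime:<runtime>:<env-id>`. Everything after the second colon
    /// is treated as the env id, so env ids may themselves contain colons.
    /// Returns None for other actors, such as the coordinator's "runtimed".
    pub fn parse(s: &str) -> Option<Self> {
        let mut parts = s.splitn(3, ':');
        if parts.next()? != "runtime" {
            return None;
        }
        let runtime = parts.next()?;
        let env_id = parts.next()?;
        if runtime.is_empty() || env_id.is_empty() {
            return None;
        }
        Some(RuntimeActor { runtime: runtime.to_string(), env_id: env_id.to_string() })
    }

    /// Bytes that could serve as this agent's Automerge actor ID (an assumption).
    pub fn actor_bytes(&self) -> Vec<u8> {
        self.to_string().into_bytes()
    }
}
```

Parsing back from the string is what would let tooling map any change in the document's history to the runtime that produced it.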

runtimed becomes a coordinator:

  • Manages rooms and peer sync
  • Spawns/supervises runtime agents
  • Handles notebook-level operations (persistence, file watching, trust)
  • No longer directly handles kernel Jupyter messages
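
One way the coordinator role could look, sketched with std threads and `std::sync::mpsc` channels. That choice is an assumption: whether agents are threads, tasks, or processes is still an open question below. The coordinator keeps one command channel per agent and routes an execute request by actor ID; each agent tags what it produces with its own actor ID. `Coordinator`, `AgentCommand`, and `TaggedWrite` are hypothetical, and the agent echoes the cell source instead of talking to a real kernel or writing to an Automerge document.

```rust
use std::collections::HashMap;
use std::sync::mpsc::{self, Receiver, Sender};
use std::thread::{self, JoinHandle};

/// Hypothetical commands the coordinator forwards to a runtime agent.
pub enum AgentCommand {
    ExecuteCell { cell_id: String, source: String },
    Shutdown,
}

/// A document write produced by an agent, tagged with that agent's actor ID.
#[derive(Debug, Clone, PartialEq)]
pub struct TaggedWrite {
    pub actor: String,
    pub cell_id: String,
    pub output: String,
}

/// Agent loop: owns its actor ID and handles commands until told to stop.
/// A real agent would talk to its kernel here; this one just echoes the source.
fn run_agent(actor: String, commands: Receiver<AgentCommand>, writes: Sender<TaggedWrite>) {
    for command in commands {
        match command {
            AgentCommand::ExecuteCell { cell_id, source } => {
                let write = TaggedWrite { actor: actor.clone(), cell_id, output: format!("ran: {source}") };
                if writes.send(write).is_err() {
                    break; // the coordinator has dropped its receiver
                }
            }
            AgentCommand::Shutdown => break,
        }
    }
}

/// Coordinator: one command channel per agent, one shared channel for writes.
pub struct Coordinator {
    agents: HashMap<String, (Sender<AgentCommand>, JoinHandle<()>)>,
    writes_tx: Sender<TaggedWrite>,
    pub writes_rx: Receiver<TaggedWrite>,
}

impl Coordinator {
    pub fn new() -> Self {
        let (writes_tx, writes_rx) = mpsc::channel();
        Coordinator { agents: HashMap::new(), writes_tx, writes_rx }
    }

    /// Spawns an agent thread that acts under `actor`.
    pub fn spawn_agent(&mut self, actor: &str) {
        let (commands_tx, commands_rx) = mpsc::channel();
        let writes = self.writes_tx.clone();
        let actor_id = actor.to_string();
        let handle = thread::spawn(move || run_agent(actor_id, commands_rx, writes));
        self.agents.insert(actor.to_string(), (commands_tx, handle));
    }

    /// Routes an execute request to the agent that owns the kernel.
    /// Returns false if no such agent exists or it has stopped.
    pub fn execute(&self, actor: &str, cell_id: &str, source: &str) -> bool {
        match self.agents.get(actor) {
            Some((commands_tx, _)) => commands_tx
                .send(AgentCommand::ExecuteCell { cell_id: cell_id.to_string(), source: source.to_string() })
                .is_ok(),
            None => false,
        }
    }

    /// Asks every agent to stop and waits for its thread to exit.
    pub fn shutdown(self) {
        for (_, (commands_tx, handle)) in self.agents {
            let _ = commands_tx.send(AgentCommand::Shutdown);
            let _ = handle.join();
        }
    }
}
```

Note that the coordinator never inspects the result beyond routing it; the actor tag comes from the agent, which mirrors how provenance would come from each agent's own Automerge actor ID.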

Benefits

  • Per-runtime provenance for free: each agent's writes are naturally tagged with its own actor ID, so runtimed never has to switch actor IDs to attribute a write
  • Cleaner separation of concerns: runtimed doesn't need to know about Jupyter protocol details
  • Foundation for heterogeneous runtimes: different runtime agents could implement different protocols (Jupyter, LSP, REPL, etc.)
  • Independent scaling: runtime agents could run in separate processes or containers
  • Fault isolation: a crashing runtime agent doesn't take down the coordinator
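
Fault isolation can be illustrated under the same thread-based assumption: a panic in a spawned thread unwinds only that thread, and `JoinHandle::join` reports it as an `Err`, so a supervisor can restart the agent without going down itself. This relies on Rust's default `panic = "unwind"` strategy (the panic message is still printed to stderr), and `supervise` with its restart policy is hypothetical. Separate processes would also contain faults a thread cannot, such as an abort.

```rust
use std::thread;

/// Runs an agent body on its own thread and restarts it after a panic,
/// giving up after `max_restarts` restarts. On success, returns how many
/// restarts were needed. The attempt number is passed in so a demo can
/// fail deliberately on early attempts.
pub fn supervise<F>(agent_body: F, max_restarts: u32) -> Result<u32, String>
where
    F: Fn(u32) + Clone + Send + 'static,
{
    for attempt in 0..=max_restarts {
        let body = agent_body.clone();
        // A panic unwinds only the spawned thread; join() surfaces it as Err,
        // and the supervisor itself keeps running.
        if thread::spawn(move || body(attempt)).join().is_ok() {
            return Ok(attempt);
        }
    }
    Err(format!("agent panicked {} times; giving up", max_restarts + 1))
}
```

An agent that crashes on its first two attempts and then runs cleanly yields `Ok(2)`; one that always crashes exhausts its restarts and returns `Err`, with the supervisor still alive in both cases.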

Prerequisites

  • Meaningful Automerge actor IDs (edit provenance): landed in the current provenance work
  • Stable sync protocol between peers (the existing notebook-sync crate handles this)

Open questions

  • Should runtime agents be threads, tasks, or separate processes?
  • How does the agent learn which cells to execute? (Currently runtimed receives ExecuteCell requests from the protocol — the agent would need to subscribe to these.)
  • How does the agent's sync state get bootstrapped? (Likely the same empty_with_actor → sync pattern the frontend uses.)
  • Should the agent own the kernel environment setup (uv/conda) or just the running kernel?

Metadata

Assignees: No one assigned
Labels: architecture (Architecture proposals and structural changes)
Type: No type
Projects: No projects
Relationships: None yet
Development: No branches or pull requests