Skip to content

feat: Node.js plugin isolation via workerd sandbox#426

Merged
ascorbic merged 29 commits into
emdash-cms:mainfrom
BenjaminPrice:feat/node-plugin-isolation
May 22, 2026
Merged

feat: Node.js plugin isolation via workerd sandbox#426
ascorbic merged 29 commits into
emdash-cms:mainfrom
BenjaminPrice:feat/node-plugin-isolation

Conversation

@BenjaminPrice
Copy link
Copy Markdown
Contributor

@BenjaminPrice BenjaminPrice commented Apr 10, 2026

What does this PR do?

Adds workerd-based plugin sandboxing for Node.js deployments, closing EmDash's most significant architectural gap: plugins that run sandboxed on Cloudflare Workers now run sandboxed on Node.js too, with matching capability enforcement and isolation guarantees.

Discussion: #425

How it works

A new @emdash-cms/workerd package implements the SandboxRunner interface (same contract as @emdash-cms/cloudflare). In production it spawns workerd as a child process with a generated capnp config. In development it uses miniflare for faster startup. Plugin workers run in V8 isolates and communicate with Node via an authenticated HTTP backing service.

Architecture:

  • WorkerdSandboxRunner (production): spawns workerd, generates capnp config per plugin, manages lifecycle with exponential backoff restart, SIGTERM forwarding, intentional-stop guard, and epoch-based stale handle detection
  • MiniflareDevRunner (dev): uses miniflare's outboundService to route bridge calls to Node handler functions; non-bridge fetches are blocked to prevent capability bypass
  • bridge-handler.ts: shared bridge dispatch logic used by both runners, with capability enforcement matching the Cloudflare PluginBridge
  • Backing service: HTTP server with per-plugin HMAC tokens, server-side capability validation, and constant-time auth comparison

Security model matches Cloudflare:

  • Per-plugin auth tokens encoding capabilities, validated server-side
  • Per-startup invoke token authenticates inbound hook/route calls (prevents same-host attackers from invoking the per-plugin TCP listener directly)
  • Constant-time comparison everywhere (timingSafeEqual in Node, hand-rolled XOR in workerd where Web Crypto doesn't expose it)
  • Network fetch uses core's shared createHttpAccess (redirect revalidation on each hop, credential stripping on cross-origin redirects, SSRF blocking, max 5 redirects)
  • Strict capability enforcement: write:content does NOT imply read:content (matches Cloudflare bridge)
  • Cross-plugin data isolation (KV and storage scoped per plugin ID)
  • Direct fetch() from plugin code is blocked in dev (miniflare outboundService returns 403) and routed through the backing service in prod (capnp globalOutbound), forcing all network access through ctx.http.fetch and capability/host checks

Honest about resource limits: cpuMs, memoryMb, and subrequests are Cloudflare platform features, not standalone workerd. Only wallTimeMs is enforced (via Promise.race). Operators get a startup warning if they configure unenforced limits, and the docs include a caution box.

Sandbox bypass mode (sandbox: false): Debugging escape hatch that loads sandboxed plugin entries in-process via adaptSandboxEntry + data URL import(). Build-time and marketplace plugins both work under bypass; marketplace install/update routes skip the SANDBOX_NOT_AVAILABLE gate when bypassed and rebuild the hook pipeline so changes take effect immediately.

Type of change

Checklist

  • I have read CONTRIBUTING.md
  • pnpm typecheck passes
  • pnpm --silent lint:json | jq '.diagnostics | length' returns 0 errors
  • pnpm test passes (or targeted tests for my change)
  • pnpm format has been run
  • I have added/updated tests for my changes (if applicable)
  • I have added a changeset (if this PR changes a published package)
  • New features link to an approved Discussion: Node plugin isolation via workerd sidecar #425

AI-generated code disclosure

  • This PR includes AI-generated code

Tests

36 new tests in @emdash-cms/workerd across 3 files, plus 2163 existing core tests passing:

  • miniflare-isolation.test.ts (6 tests): V8 isolate isolation, service bindings, KV namespaces, dynamic code loading, worker reconfiguration
  • bridge-handler.test.ts (19 tests): KV CRUD + isolation, strict capability enforcement (read/write content do not imply each other; network:fetch:any satisfies network:fetch), storage collection validation, error message parity with Cloudflare bridge
  • plugin-integration.test.ts (11 tests): Real plugin operations modeled after the sandboxed-test plugin (KV round-trip, storage round-trip, content lifecycle with ULID/versioning/soft-delete, cross-plugin isolation, write-only plugins cannot read)

All tests use real in-memory SQLite with production schema, no mocking.

 ✓ test/bridge-handler.test.ts (19 tests)
 ✓ test/plugin-integration.test.ts (11 tests)
 ✓ test/miniflare-isolation.test.ts (6 tests)

 Test Files  3 passed (3)
      Tests  36 passed (36)

Review iterations

This PR went through ~20 rounds of automated and human review. Notable hardening from review feedback:

  • Security: invoke-token authentication on hook/route calls, constant-time comparisons (timingSafeEqual + hand-rolled for workerd), miniflare outboundService blocks non-bridge fetches, strict capability enforcement (write:contentread:content)
  • Lifecycle: intentional-stop flag suppresses crash recovery on planned reloads, deferred startup serialization via startupPromise, isHealthy() returns false when restart pending, stopWorkerd() fast-paths on already-exited and uses local exited flag for SIGKILL fallback
  • Binary handling: ctx.http.fetch returns base64-encoded bytes preserving binary content, RequestInit marshaling preserves multi-value headers ([[name, value]] pairs), Blob/File/FormData/URLSearchParams bodies, ArrayBuffer with byteOffset/byteLength window
  • Storage parity (also fixed in Cloudflare bridge): delegated storageQuery/storageCount to PluginStorageRepository so where/orderBy/cursor/count work correctly. Fixes infinite-loop pagination on shipped plugins like forms-submissions.
  • Media: upload writes to configured Storage adapter, sets status='ready', rolls back storage object on DB failure (best-effort cleanup); delete removes the storage object too
  • Workerd binary: resolved from package bin/workerd (no npx runtime download), execFileSync for paths with spaces, stdout/stderr drained to prevent pipe deadlock
  • Marketplace bypass: loadMarketplacePluginsBypassed runs before pipeline creation on cold start; syncMarketplacePluginsBypassed handles runtime install/update/uninstall and rebuilds the hook pipeline

Changes

New: packages/workerd/

File Purpose
src/sandbox/runner.ts WorkerdSandboxRunner implementing SandboxRunner, child process lifecycle, auth token generation, intentional-stop guard, epoch-based stale handle detection
src/sandbox/dev-runner.ts MiniflareDevRunner for development, auto-selected when NODE_ENV === "development"
src/sandbox/bridge-handler.ts Shared bridge dispatch with capability enforcement, matching Cloudflare PluginBridge parity, delegating storage to PluginStorageRepository
src/sandbox/backing-service.ts HTTP server wrapper adding auth token validation around bridge handler
src/sandbox/capnp.ts workerd capnp config generator from plugin manifests, globalOutbound routes through backing service
src/sandbox/wrapper.ts Plugin wrapper code generator (runs inside workerd), constant-time invoke-token check, RequestInit marshaling, base64 binary preservation

Modified: packages/core/

File Change
plugins/sandbox/types.ts Added isHealthy() to SandboxRunner interface, SandboxUnavailableError class, mediaStorage in SandboxOptions
plugins/sandbox/noop.ts Implements isHealthy(), error message mentions both CF and workerd
astro/integration/runtime.ts Added sandbox?: boolean config option (escape hatch)
astro/integration/virtual-modules.ts Generates sandboxBypassed flag when sandbox: false
astro/middleware.ts Reads sandboxBypassed via namespace import (handles missing export)
emdash-runtime.ts loadBypassedPlugins + loadMarketplacePluginsBypassed, runtime sync, isSandboxBypassed(), threads mediaStorage into sandbox runners
api/handlers/marketplace.ts handleMarketplaceInstall/handleMarketplaceUpdate accept sandboxBypassed, skip SANDBOX_NOT_AVAILABLE gate when set
index.ts Exports SandboxUnavailableError, createHttpAccess, createUnrestrictedHttpAccess, repositories used by platform adapters

Modified: packages/cloudflare/

File Change
sandbox/runner.ts Implements isHealthy() (delegates to isAvailable()), passes storageConfig to bridge bindings
sandbox/bridge.ts Storage operations delegate to PluginStorageRepository (parity fix)

Docs

File Change
plugins/sandbox.mdx Added Node.js + workerd setup guide, 3-column security comparison, escape hatch docs, honest resource-limit caution box
plugins/creating-plugins.mdx Added "Testing in the Sandbox" section with capability behavior table (methods throw, not "missing from ctx")

Changeset

.changeset/bumpy-crabs-nail.mdemdash minor, @emdash-cms/cloudflare patch, @emdash-cms/workerd minor.

Copilot AI review requested due to automatic review settings April 10, 2026 04:15
@changeset-bot
Copy link
Copy Markdown

changeset-bot Bot commented Apr 10, 2026

🦋 Changeset detected

Latest commit: 4f15fc1

The changes in this PR will be included in the next version bump.

This PR includes changesets to release 14 packages
Name Type
emdash Minor
@emdash-cms/cloudflare Minor
@emdash-cms/sandbox-workerd Minor
@emdash-cms/fixture-perf-site Patch
@emdash-cms/perf-demo-site Patch
@emdash-cms/cache-demo-site Patch
@emdash-cms/admin Minor
@emdash-cms/auth Minor
@emdash-cms/blocks Minor
@emdash-cms/gutenberg-to-portable-text Minor
@emdash-cms/x402 Minor
create-emdash Minor
@emdash-cms/auth-atproto Patch
@emdash-cms/plugin-embeds Patch

Not sure what this means? Click here to learn what changesets are.

Click here if you're a maintainer who wants to add another changeset to this PR

@github-actions
Copy link
Copy Markdown
Contributor

Scope check

This PR changes 3,719 lines across 29 files. Large PRs are harder to review and more likely to be closed without review.

If this scope is intentional, no action needed. A maintainer will review it. If not, please consider splitting this into smaller PRs.

See CONTRIBUTING.md for contribution guidelines.

Copy link
Copy Markdown
Contributor

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a new Node.js sandbox runner based on workerd (plus a dev runner using Miniflare) to bring Cloudflare-like plugin isolation/capability enforcement to non-Workers deployments, along with core interface updates (SandboxRunner.isHealthy(), SandboxUnavailableError) and accompanying docs/tests.

Changes:

  • Introduces new @emdash-cms/workerd package implementing the SandboxRunner contract (workerd sidecar + backing service; Miniflare dev runner).
  • Extends core sandbox APIs (new isHealthy() + SandboxUnavailableError) and wires an opt-out sandbox?: boolean integration flag.
  • Updates docs and adds a new conformance/integration test suite for bridge behavior and isolation.

Reviewed changes

Copilot reviewed 28 out of 29 changed files in this pull request and generated 16 comments.

Show a summary per file
File Description
pnpm-workspace.yaml Adds @types/better-sqlite3 to the shared catalog.
pnpm-lock.yaml Locks new workspace package/deps and catalog specifier updates.
packages/workerd/tsconfig.json TypeScript config for the new workerd package build output.
packages/workerd/test/plugin-integration.test.ts Integration tests for real plugin-like operations against the bridge handler.
packages/workerd/test/miniflare-isolation.test.ts Miniflare isolation behavior tests (service bindings, KV, dynamic code, etc.).
packages/workerd/test/bridge-handler.test.ts Bridge handler conformance tests (capabilities, KV/storage isolation, errors).
packages/workerd/src/sandbox/wrapper.ts Generates the plugin wrapper code that proxies ctx.* via HTTP bridge calls.
packages/workerd/src/sandbox/runner.ts Production workerd sidecar runner (process lifecycle, tokens, backing service).
packages/workerd/src/sandbox/index.ts Public exports for the workerd sandbox surface.
packages/workerd/src/sandbox/dev-runner.ts Development runner intended to use Miniflare for faster iteration.
packages/workerd/src/sandbox/capnp.ts Generates workerd capnp config for per-plugin nanoservices.
packages/workerd/src/sandbox/bridge-handler.ts Shared bridge dispatcher implementing capability enforcement and DB access.
packages/workerd/src/sandbox/backing-service.ts Node HTTP backing service wrapper that authenticates and dispatches bridge calls.
packages/workerd/src/index.ts Package entrypoint export wiring.
packages/workerd/package.json Defines the new @emdash-cms/workerd package metadata and deps.
packages/marketplace/package.json Switches @types/better-sqlite3 to catalog version.
packages/core/src/plugins/sandbox/types.ts Adds isHealthy() to SandboxRunner and introduces SandboxUnavailableError.
packages/core/src/plugins/sandbox/noop.ts Implements isHealthy() and updates the “sandbox not available” error text.
packages/core/src/plugins/sandbox/index.ts Exports SandboxUnavailableError.
packages/core/src/plugins/index.ts Re-exports the new sandbox error and HTTP access helpers.
packages/core/src/index.ts Exposes SandboxUnavailableError and HTTP access creation helpers publicly.
packages/core/src/emdash-runtime.ts Improves startup warning when sandbox runner is configured but unavailable.
packages/core/src/astro/integration/vite-config.ts Adds sandbox: false escape hatch behavior to virtual module generation.
packages/core/src/astro/integration/runtime.ts Adds sandbox?: boolean config option documentation/type.
packages/core/package.json Switches @types/better-sqlite3 to catalog version.
packages/cloudflare/src/sandbox/runner.ts Implements isHealthy() for the Cloudflare runner.
docs/src/content/docs/plugins/sandbox.mdx Documents Node+workerd sandboxing and adds the escape hatch + comparison table.
docs/src/content/docs/plugins/creating-plugins.mdx Adds “Testing in the Sandbox” guidance and behavior differences table.
.changeset/bumpy-crabs-nail.md Changeset describing new workerd sandboxing feature and related exports.
Files not reviewed (1)
  • pnpm-lock.yaml: Language not supported

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread packages/workerd/src/sandbox/dev-runner.ts Outdated
Comment thread packages/workerd/src/sandbox/runner.ts
Comment thread packages/workerd/src/sandbox/runner.ts Outdated
Comment thread packages/workerd/src/sandbox/runner.ts
Comment thread packages/workerd/src/sandbox/runner.ts
Comment thread docs/src/content/docs/plugins/sandbox.mdx Outdated
Comment thread packages/workerd/src/sandbox/runner.ts
Comment thread packages/workerd/src/sandbox/runner.ts Outdated
Comment thread packages/workerd/src/sandbox/runner.ts
Comment thread packages/workerd/src/sandbox/runner.ts Outdated
@github-actions
Copy link
Copy Markdown
Contributor

Overlapping PRs

This PR modifies files that are also changed by other open PRs:

This may cause merge conflicts or duplicated work. A maintainer will coordinate.

@BenjaminPrice BenjaminPrice marked this pull request as draft April 10, 2026 10:30
@pkg-pr-new
Copy link
Copy Markdown

pkg-pr-new Bot commented Apr 10, 2026

Open in StackBlitz

@emdash-cms/admin

npm i https://pkg.pr.new/@emdash-cms/admin@426

@emdash-cms/auth

npm i https://pkg.pr.new/@emdash-cms/auth@426

@emdash-cms/blocks

npm i https://pkg.pr.new/@emdash-cms/blocks@426

@emdash-cms/cloudflare

npm i https://pkg.pr.new/@emdash-cms/cloudflare@426

emdash

npm i https://pkg.pr.new/emdash@426

create-emdash

npm i https://pkg.pr.new/create-emdash@426

@emdash-cms/gutenberg-to-portable-text

npm i https://pkg.pr.new/@emdash-cms/gutenberg-to-portable-text@426

@emdash-cms/x402

npm i https://pkg.pr.new/@emdash-cms/x402@426

@emdash-cms/plugin-ai-moderation

npm i https://pkg.pr.new/@emdash-cms/plugin-ai-moderation@426

@emdash-cms/plugin-atproto

npm i https://pkg.pr.new/@emdash-cms/plugin-atproto@426

@emdash-cms/plugin-audit-log

npm i https://pkg.pr.new/@emdash-cms/plugin-audit-log@426

@emdash-cms/plugin-color

npm i https://pkg.pr.new/@emdash-cms/plugin-color@426

@emdash-cms/plugin-embeds

npm i https://pkg.pr.new/@emdash-cms/plugin-embeds@426

@emdash-cms/plugin-forms

npm i https://pkg.pr.new/@emdash-cms/plugin-forms@426

@emdash-cms/plugin-webhook-notifier

npm i https://pkg.pr.new/@emdash-cms/plugin-webhook-notifier@426

commit: 4f15fc1

@BenjaminPrice BenjaminPrice marked this pull request as ready for review April 10, 2026 12:47
Copy link
Copy Markdown
Collaborator

@ascorbic ascorbic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work on this! I've spent the morning going through this with Claude and it's neary there but I just have a few comments.

I think the fact there's now a bit more latency on the bridge means we should add content.createMany/updateMany/deleteMany to the API to avoid so many round-trips.

For media uploads I think it's best if we move the files out of the JSON bridge entirely, as it's not very efficient. Better to use somehting like the way it's done for public storage, where it gets a signed URL and then is uploaded directly. This could be handled inside the wrapper with a binding so the API appears the same to the plugin itself.

One thing to consider too is defaulting to using Unix sockets instead of TCP (with fallback in Windows dev).

Comment thread packages/workerd/src/sandbox/bridge-handler.ts Outdated
Comment thread packages/workerd/src/sandbox/wrapper.ts Outdated
Comment thread packages/workerd/src/sandbox/wrapper.ts Outdated
Comment thread packages/workerd/src/sandbox/runner.ts
Comment thread packages/core/src/emdash-runtime.ts
Comment thread packages/workerd/src/sandbox/bridge-handler.ts
Comment thread packages/workerd/src/sandbox/dev-runner.ts
Comment thread packages/workerd/src/sandbox/runner.ts Outdated
Comment thread packages/workerd/src/sandbox/runner.ts Outdated
Comment thread packages/workerd/src/sandbox/runner.ts
@ascorbic
Copy link
Copy Markdown
Collaborator

Also, we definitely need some real workerd tests for this. Finally: it should be called sandbox-workerd not just workerd

@ascorbic
Copy link
Copy Markdown
Collaborator

@BenjaminPrice it would be great to get this in! Will you be able to look at the comments, or do you want me to take it on instead?

@BenjaminPrice
Copy link
Copy Markdown
Contributor Author

BenjaminPrice commented Apr 30, 2026 via email

@BenjaminPrice BenjaminPrice force-pushed the feat/node-plugin-isolation branch from a846039 to 3812e5a Compare May 3, 2026 10:28
BenjaminPrice added a commit to BenjaminPrice/emdash that referenced this pull request May 3, 2026
Addresses all inline feedback from ascorbic's April 18 review on PR emdash-cms#426.

Fixes:
- Version increment uses idiomatic Kysely object form (bridge-handler)
- Blob body headers bug: array-based lookup instead of object access (wrapper)
- Request body size limit (10MB) on backing service
- storageQuery limit enforcement (cap at 100, matching content/media/users)
- SIGTERM handler deduplicated to one per process (module-level singleton)
- Windows workerd binary resolution (.exe suffix)
- Explicit /__ready endpoint replaces fragile /__health polling
- Dev runner enforces wall-time limits (withWallTimeLimit wrapper)
- Explicit EMDASH_SANDBOX_DEV env var for dev runner selection
- Eager workerd restart on plugin load/unload (50ms debounced)
- Handler cache cleanup on plugin unload (keyed by pluginId)
- Media upload uses base64 encoding instead of JSON number arrays
- Batch content APIs: createMany/updateMany/deleteMany (max 100 items)
- Unix socket plumbing for backing service (disabled pending capnp validation)
- plugin:deactivate hook fires on marketplace plugin removal
- Sandbox availability warning fires regardless of plugin count
- Cloudflare Workers detection for sandbox:false (clear error message)
- Package renamed from @emdash-cms/workerd to @emdash-cms/sandbox-workerd
- Workerd integration test suite (skips if workerd unavailable)
…isolation

Proves that miniflare (wrapping workerd) supports all capabilities needed
for sandboxed plugin execution on the Node deployment path:

- Plugin code loads from strings (no filesystem, bundles from DB/R2 work)
- Service bindings between workers provide capability scoping
- External service bindings route plugin calls to Node handler functions
- KV namespace bindings provide per-plugin isolated storage
- Plugins without bindings cannot access unavailable capabilities
- Dispose/recreate cycle supports plugin install/uninstall

Key finding: miniflare's serviceBindings with async Node handlers
eliminates the need for a separate HTTP backing service server. The
bridge calls route directly from workerd isolates to Node functions.
…p config

Implements the SandboxRunner interface for Node.js deployments using
workerd as a sidecar process:

- WorkerdSandboxRunner: spawns workerd via child_process, manages
  lifecycle with epoch-based stale handle detection and health checks
- Backing service: authenticated HTTP server in Node handling plugin
  bridge calls (content, media, KV, storage, email, users, network)
- Auth: per-startup HMAC secret, per-plugin tokens encoding capabilities.
  Server-side capability validation on every request.
- capnp config generator: creates workerd config from plugin manifests,
  each plugin as a nanoservice with its own port
- Plugin wrapper: generates JS that runs inside workerd isolate,
  proxying ctx.* calls via HTTP fetch to the backing service
- Wall-time enforcement via Promise.race (matching Cloudflare pattern)
Extends the SandboxRunner interface with isHealthy() for sidecar-based
runners where the sandbox process can crash independently of the host.

- SandboxRunner.isHealthy(): returns false when sidecar is down
- SandboxUnavailableError: typed error for stale handles and unavailable sandbox
- NoopSandboxRunner: implements isHealthy() (always false)
- CloudflareSandboxRunner: implements isHealthy() (delegates to isAvailable)
- WorkerdSandboxRunner: exponential backoff restart on crash (1s, 2s, 4s,
  cap 30s, give up after 5 failures in 60s), SIGTERM forwarding to child
- SandboxNotAvailableError message updated to mention both Cloudflare and
  workerd sandbox runners (no longer Cloudflare-specific)
…F protection

Replaces the naive hostname-only check in the workerd backing service
with core's createHttpAccess/createUnrestrictedHttpAccess. This gives
the workerd sandbox runner identical behavior to in-process plugins:

- Redirect targets revalidated against allowedHosts on each hop
- Credential headers stripped on cross-origin redirects
- SSRF protection blocks private IPs, cloud metadata endpoints
- Max 5 redirects enforced

Exports createHttpAccess and createUnrestrictedHttpAccess from the
emdash package so platform adapters can reuse the shared policy layer.
…y warnings

Adds debugging escape hatch and clearer messaging for sandbox availability:

- sandbox: false config option explicitly disables plugin sandboxing even
  when a sandboxRunner is configured, for isolating whether bugs are in
  plugin code or in the sandbox runtime
- Upgrades sandbox-unavailable log from console.debug to console.warn with
  actionable message mentioning workerd installation
- SandboxNotAvailableError message now references both @emdash-cms/cloudflare/sandbox
  and @emdash-cms/workerd/sandbox as options
@ascorbic
Copy link
Copy Markdown
Collaborator

@BenjaminPrice this is really close! It would be great if we could address the last few points and get it out

@BenjaminPrice
Copy link
Copy Markdown
Contributor Author

@BenjaminPrice this is really close! It would be great if we could address the last few points and get it out

Sorry, work has been quite busy. I'll try to get to this during this week.

… fetch

The URLSearchParams branch of marshalRequestInit was using object-bracket
header assignment on out.headers, but out.headers is an array of [name,
value] pairs everywhere else in the function. The bracket assignment put
a non-numeric property on the array — invisible to the bridge handler's
unmarshalRequestInit (which iterates pairs) and dropped by JSON.stringify
during transport. Servers received URLSearchParams POSTs without a
Content-Type and rejected them as non-form-encoded.

Switch to the same array-shape pattern used by the Blob branch above.
Math.min(Number(opts.limit) || 50, 100) accepted negative numbers: a
plugin sending { limit: -5 } produced query.limit(-4), which SQLite
silently treats as zero rows and PostgreSQL rejects. Apply the same
Math.max(1, ...) floor used in userList to contentList, mediaList, and
storageQuery, preserving storageQuery's undefined-passthrough so callers
can still omit the limit field.
waitForReady() only probed plugins.values().next().value. With N plugins
workerd opens N sockets and any one of them can fail to bind (port
collision, import-time error). If plugin 1 bound but plugin 2 didn't,
the runner reported ready and the next invokeHook on plugin 2 hung to
the per-call timeout or hit ECONNREFUSED.

Extract a probeAllReady() helper that fetches /__ready on every plugin
in parallel and returns true only when every probe is ok. waitForReady()
calls it each tick. Helper is module-exported for tests but not added
to the package barrel.
contentCreateMany / contentUpdateMany / contentDeleteMany looped with
await across per-item helpers. If item 50 of 100 threw, the first 49
were already committed, the plugin saw a generic error, and there was
no way to know which prefix had landed. A plugin author calling a verb
named "createMany" expects atomicity.

Wrap each many-op body in db.transaction().execute(trx => ...) so a
mid-batch failure rolls the whole batch back. MAX_BATCH_SIZE guard
stays outside the transaction.
The exit handler registered inside restart() captured this and would
unconditionally mutate this.workerdProcess and this.healthy whenever it
fired. If a late exit from a prior process landed after a newer
workerd had already been assigned to this.workerdProcess, the stale
handler nulled out the live process's handle and marked it unhealthy.

Capture proc into a local immediately after spawn and route the exit
listener through makeWorkerdExitHandler, which checks
host.workerdProcess !== proc and bails before touching shared state.
Helper is module-exported for unit testing but not added to the package
barrel.
scheduleEagerStart used `void this.ensureRunning()`, which swallowed
startup rejections (ENOENT on the workerd binary, capnp parse errors,
waitForReady timeouts). On Node 22+ those rejections become
unhandledRejection, and they bypassed crashCount/scheduleRestart so the
runner stayed silently down with no automatic retry.

Attach a .catch that logs the error and calls scheduleRestart, which
already handles backoff and the 5-failure-window cap.
stopWorkerd's SIGTERM-then-SIGKILL fallback timer was never cleared on
graceful exit. The `if (!exited)` guard prevented an actual signal
from being sent, but the pending timer itself kept the Node event
loop alive for up to 5s past termination -- so terminateAll() and
shutdown stalled.

Extract the kill-with-escalation dance into waitForProcessExit() and
clear the timer inside the exit listener. Switch the escalation check
to proc.exitCode === null, which matches actual process state more
precisely than the exited flag.
nextPluginPort climbed monotonically and unloadPlugin never returned
the port to the pool. Long-running sites with frequent marketplace
install/uninstall (or dev watcher reloads) eventually walked toward
the top of the ephemeral range, raising the odds of collision with
other listeners on the host.

Add a freePorts stack; load() pops from it before incrementing
nextPluginPort, unloadPlugin() pushes the freed port back. This does
not handle pre-existing host-port collisions (no EADDRINUSE probe yet),
but it prevents the leak the bot flagged.
The map was written on bridge-handler insertion and iterated in
removePlugin, but no read path ever consulted it -- the cache was
keyed by claims.pluginId, not by token. Pure dead code that grew
unbounded under plugin churn.
The runner spawned workerd with env: { ...process.env }, handing every
parent env var to a child that runs untrusted plugin code. Empirical
testing showed workerd 1.x's nodejs_compat polyfill returns an empty
process.env to plugins regardless of the child's actual environment,
so today this isn't an active leak -- but that behavior is an
undocumented implementation detail that a future workerd release could
change without warning.

Introduce minimalWorkerdEnv() that allowlists only what workerd needs
to start (PATH, HOME, TMPDIR/TMP/TEMP, LANG, LC_ALL). Operators who
need extra vars can list them in EMDASH_WORKERD_PASSTHROUGH_ENV. The
integration test suite still spawns a real workerd and confirms it
boots under the new minimal env.
Brings 138 commits from upstream main onto the branch so it merges
cleanly into the PR base. Conflicts resolved:

- packages/core/src/emdash-runtime.ts: upstream refactored
  syncMarketplacePlugins into a generic syncSandboxedSourcePlugins
  (also adding syncRegistryPlugins). Kept that structure while
  preserving the sandbox-bypass branch we added in this PR.
  Also propagated the SandboxedPlugin -> SandboxedPluginInstance
  rename through loadSandboxedPlugins' signature.
- pnpm-lock.yaml: regenerated via pnpm install on top of upstream's
  lockfile so workerd's deps (workerd, ulidx, miniflare,
  better-sqlite3, kysely, etc.) are present.
- packages/workerd/src/sandbox/{runner,dev-runner}.ts: imported the
  renamed SandboxedPluginInstance type and updated return types /
  implements clauses to match.

All 73 workerd tests pass after the merge.
Copy link
Copy Markdown
Collaborator

@ascorbic ascorbic left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Amazing work! Great to get this in

@neuhaus
Copy link
Copy Markdown

neuhaus commented May 22, 2026

Great work guys! Someone still has to press the merge button I guess?

@ascorbic ascorbic merged commit 02ed8ba into emdash-cms:main May 22, 2026
33 checks passed
@emdashbot emdashbot Bot mentioned this pull request May 22, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants