fix: daemonize core-agent and remove worker-owned process lifecycle by jrothrock · Pull Request #325 · scoutapp/scout_apm_node

jrothrock · 2026-05-29T16:39:36Z

Problem

Two related bugs in cluster deployments, both rooted in the same design flaw — workers treating the core-agent as a child process they own:

Always-spawns bug: start() called peerRunning() to check whether a socket existed, but then called startProcess() regardless of the result. In a cluster, every worker that initialised Scout would spawn a new core-agent binary even if one was already running. The second agent would fail to bind the port and exit immediately, leaving that worker holding a dead detachedProcess reference.
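To make the difference concrete, here is a toy TypeScript model of the check-then-spawn logic described above. It is not Scout's code. `AgentHost` and `simulateCluster()` are hypothetical, and `peerRunning()` and `startProcess()` are simplified stand-ins named after the real methods. A boolean plays the role of the existing socket, and a counter records how many core-agent binaries would be launched. Workers start one after another, so the model ignores the case where two workers check at the same instant.

```typescript
// Hypothetical stand-in for the machine the workers share: `socketExists`
// plays the role of the core-agent socket, `spawnCount` counts launches.
interface AgentHost {
  socketExists: boolean;
  spawnCount: number;
}

// Simplified peerRunning(): is a daemon already listening?
function peerRunning(host: AgentHost): boolean {
  return host.socketExists;
}

// Simplified startProcess(): launch another core-agent binary.
function startProcess(host: AgentHost): void {
  host.spawnCount += 1;
  host.socketExists = true;
}

// Old flow: the check runs, but its result is ignored.
function startBuggy(host: AgentHost): void {
  peerRunning(host);
  startProcess(host);
}

// New flow: return early and reuse the daemon if one is already running.
function startFixed(host: AgentHost): void {
  if (peerRunning(host)) {
    return;
  }
  startProcess(host);
}

// Start `workers` workers one after another against one shared host.
function simulateCluster(workers: number, start: (host: AgentHost) => void): number {
  const host: AgentHost = { socketExists: false, spawnCount: 0 };
  for (let i = 0; i < workers; i++) {
    start(host);
  }
  return host.spawnCount;
}

console.log(simulateCluster(4, startBuggy)); // 4
console.log(simulateCluster(4, startFixed)); // 1
```

With four workers, the old flow launches one binary per worker. Per the description above, every launch after the first fails to bind and exits.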

Cluster-shutdown bug (issue #117): stopProcess() sent SIGKILL to the core-agent process group and removed the socket. When any worker called scout.shutdown() (including on crash/unhandled rejection with allowShutdown: true), it killed the shared core-agent for every other worker in the cluster. Other workers continued serving HTTP requests normally (Scout's send path is async and fire-and-forget after onFinished), but all APM data was silently dropped until a replacement core-agent spawned — exactly the window when visibility is most needed.

Fix

Mirrors the approach used by the Python agent (scout_apm_python/src/scout_apm/core/agent/manager.py):

--daemonize true added to the binary args. The binary forks itself into a true background daemon; the spawned process exits immediately. No PID is retained by the worker.
start() returns early when peerRunning() is true — the worker connects to the existing daemon rather than spawning another.
stopProcess() is now a no-op. Workers have no PID to kill and should not attempt to manage the daemon's lifecycle.
allowShutdown branch removed from Scout.shutdown() — disconnect (draining the socket pool) is the full extent of what a worker does on shutdown.
Removed getProcess() and detachedProcess field entirely.
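The following is a minimal sketch of the worker-side lifecycle after this change. It assumes the peer check, the launch, and the socket-pool drain can be passed in as async callbacks. `CoreAgentLifecycle` is a hypothetical class, not the actual Scout agent. `launchDaemon()` shows one way to start a binary with `--daemonize true` through Node's `child_process.spawn`. Its `binPath`/`extraArgs` parameters are illustrative placeholders, as is the assumption that a successful daemonizing launch exits with code 0. The `pollMs`/`maxAttempts` loop is likewise an assumed way for the first `start()` to wait until the daemon accepts connections.

```typescript
import { spawn } from "child_process";

// Launch the core-agent with `--daemonize true` and resolve once the launcher
// process exits; the daemon it forked keeps running and no PID is stored.
// `binPath` and `extraArgs` are hypothetical placeholders.
function launchDaemon(binPath: string, extraArgs: string[]): Promise<void> {
  return new Promise((resolve, reject) => {
    const launcher = spawn(binPath, [...extraArgs, "--daemonize", "true"], {
      stdio: "ignore",
    });
    launcher.once("error", reject); // e.g. the binary could not be executed
    launcher.once("exit", (code) => {
      // Assumption: a successful daemonizing launch exits with code 0.
      if (code === 0) {
        resolve();
      } else {
        reject(new Error(`core-agent launcher exited with code ${code}`));
      }
    });
  });
}

const sleep = (ms: number) => new Promise<void>((resolve) => setTimeout(resolve, ms));

// Hypothetical worker-side lifecycle; dependencies are injected so the logic
// can run without a real binary or socket.
class CoreAgentLifecycle {
  constructor(
    private readonly peerRunning: () => Promise<boolean>,
    private readonly launch: () => Promise<void>,
    private readonly disconnect: () => Promise<void>,
    private readonly pollMs = 100,
    private readonly maxAttempts = 50,
  ) {}

  // Reuse an existing daemon; spawn one only when nothing is listening,
  // then wait until the new daemon accepts connections.
  async start(): Promise<void> {
    if (await this.peerRunning()) {
      return;
    }
    await this.launch();
    for (let attempt = 0; attempt < this.maxAttempts; attempt++) {
      if (await this.peerRunning()) {
        return;
      }
      await sleep(this.pollMs);
    }
    throw new Error("core-agent daemon did not start accepting connections");
  }

  // No-op: the worker holds no PID and does not own the daemon.
  async stopProcess(): Promise<void> {}

  // Shutdown only drains this worker's connections; the daemon keeps running.
  async shutdown(): Promise<void> {
    await this.disconnect();
  }
}
```

The class holds no PID, so `shutdown()` can only drop the worker's own connections. That matches the behaviour the fix describes for `Scout.shutdown()`.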

Test changes

test/util.ts cleanup(): calls agent.disconnect() instead of process.kill()
Updated two tests that relied on getProcess() to reflect the new behaviour
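The effect of the `cleanup()` change can be sketched with fake objects. The hypothetical `FakeDaemon` and `FakeAgent` below stand in for the shared core-agent and a test's agent handle, and `disconnect()` closes only that handle's sockets. This illustrates the intent; it does not reproduce `test/util.ts`.

```typescript
// Hypothetical stand-in for the shared core-agent daemon.
class FakeDaemon {
  alive = true;
}

// Hypothetical stand-in for a test's agent handle and its socket pool.
class FakeAgent {
  openSockets = 0;

  constructor(private readonly daemon: FakeDaemon) {}

  async connect(): Promise<void> {
    if (!this.daemon.alive) {
      throw new Error("core-agent is not running");
    }
    this.openSockets += 1;
  }

  // Close only this handle's sockets; the daemon is not touched.
  async disconnect(): Promise<void> {
    this.openSockets = 0;
  }
}

// Teardown in the spirit of the updated cleanup(): disconnect, never kill.
async function cleanup(agent: FakeAgent): Promise<void> {
  await agent.disconnect();
}
```

A teardown based on `process.kill()` would correspond to setting `daemon.alive = false`, after which any other handle's `connect()` would fail. That is the cluster-shutdown bug in miniature.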

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

The core-agent is now launched with --daemonize true so the binary forks itself into a true daemon. No PID is retained by the worker.

Fixes two related cluster bugs:

1. Always-spawns bug: start() now returns early when peerRunning() is true instead of calling startProcess() unconditionally. Each worker that starts after the first simply connects to the already-running daemon.
2. Cluster-shutdown bug (issue #117): stopProcess() is now a no-op. Workers can no longer kill the shared core-agent when shutting down or crashing, which previously took down Scout for all other workers in the cluster until a replacement was spawned.

This mirrors how the Python agent handles the core-agent lifecycle: launch it, forget the PID, let the daemon manage itself.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>


…rker-owned stopProcess() is now a no-op, so no worker can kill the shared daemon. allowShutdown had no documented behavior and no remaining effect, so it has been removed from the ScoutConfiguration type and from all test call sites.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

jrothrock and others added 3 commits May 29, 2026 16:39

test: verify first start() waits for daemon, subsequent calls resolve…

3b1fc47

… instantly Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>

jrothrock changed the base branch from master to docs/update-readme May 29, 2026 22:28

jrothrock force-pushed the fix/daemonize-core-agent branch 2 times, most recently from 6663542 to 25564af

jrothrock changed the base branch from docs/update-readme to master May 29, 2026 22:58

jrothrock wants to merge 3 commits into master from fix/daemonize-core-agent
