Note: The details in this issue were gathered during an investigative Copilot session. Line numbers and code references are based on the current main branch and may have shifted. The findings and repro steps have been validated manually.
Summary
MSBuild's out-of-process node reuse is non-functional on macOS. Three independent bugs combine so that:
- Node reuse always fails: every idle node times out during the handshake
- Cross-terminal reuse is impossible: handshake mismatch due to SessionId / getsid() semantics on Unix
- Orphaned nodes accumulate indefinitely: failed reuse probes keep nodes alive far past their configured idle timeout
In a typical multi-build workflow (e.g., multiple VS Code windows or git worktrees), hundreds of MSBuild worker nodes accumulate and never exit, consuming memory with no possibility of reuse.
Environment: macOS 15 (ARM64), .NET SDK 10.0.103
Bug 1: 50ms Handshake Timeout (Primary — Reuse Always Fails)
When the scheduler tries to reuse an existing idle node, it calls TryConnectToProcess with timeout 0:
// NodeProviderOutOfProcBase.cs:321
nodeToReuse.TryConnectToProcess(nodeToReuse.Id, 0 /* poll */, ...);
On Unix, this is clamped to 50ms:
// CommunicationsUtilities.cs:883
timeout = Math.Max(timeout, 50);
50ms is insufficient for a sleeping node to wake, read the handshake from the pipe, validate it, and respond. Reuse always times out.
Trace Evidence (MSBUILDDEBUGCOMM=1)
Writing handshake part 1 (options) to pipe 27254... (Timeout 50ms)
Writing handshake part 2 (salt) to pipe 27254...
Writing handshake part 3-6 (version) to pipe 27254...
Writing handshake part 7 (sessionId) to pipe 27254...
Did not receive return handshake in 50ms. Probably not an MSBuild process.
Handshake is written successfully (pipe connects to a valid idle node), but the response read times out at 50ms. This happens for every single idle node.
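To check how widespread the timeout is on a given machine, the two trace messages quoted above can be tallied across the trace files. This is a minimal sketch: the helper names `count_matches` and `handshake_summary` are hypothetical, and the trace directory is passed as an argument because where the files land depends on your setup.

```shell
# Prints the number of lines under directory $2 that contain fixed string $1.
count_matches() {
    # -r: recurse, -h: omit file names, -F: treat the pattern as a literal
    grep -rhF -- "$1" "$2" 2>/dev/null | wc -l | tr -d ' '
}

# Prints "attempts=N timeouts=M" for the trace files under directory $1.
# An attempt is a line starting the handshake write; a timeout is the
# "Did not receive return handshake" line shown in the trace above.
handshake_summary() {
    attempts=$(count_matches 'Writing handshake part 1' "$1")
    timeouts=$(count_matches 'Did not receive return handshake' "$1")
    echo "attempts=$attempts timeouts=$timeouts"
}

# Usage (the path is a placeholder):
#   handshake_summary /path/to/trace/dir
```

If the two counts are equal, every probe that reached a node timed out, which matches the behavior described here.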
Suggested Fix
Add a dedicated reuse timeout (~1000ms). Busy nodes fail instantly (pipe occupied), so this won't slow builds:
private const int TimeoutForNodeReuse = 1000;
nodeToReuse.TryConnectToProcess(nodeToReuse.Id, TimeoutForNodeReuse, ...);
Bug 2: SessionId / getsid() Breaks Cross-Terminal Reuse
The handshake includes a SessionId component. On Unix, Process.SessionId calls getsid(), which returns the session leader PID:
// CommunicationsUtilities.cs:290-294
sessionId = currentProcess.SessionId;
On macOS, each terminal window and VS Code integrated terminal has a different session leader PID. Nodes spawned from one terminal can never be reused by builds in another — the handshake always mismatches.
# Terminal 1: SESS=1937
# Terminal 2: SESS=2504
# VS Code: SESS=3102
Suggested Fix
On non-Windows, set sessionId = 0. Unix doesn't need the RDP session isolation that motivated this field:
if (!NativeMethodsShared.IsWindows)
    sessionId = 0;
else
    sessionId = currentProcess.SessionId;
Bug 3: ClientConnectTimeout Keeps Nodes Alive Indefinitely
When a reuse probe (Bug 1) connects to an idle node's pipe, writes the handshake, then disconnects after 50ms, the node side is stuck reading the handshake:
// NodeEndpointOutOfProcBase.cs:48
private const int ClientConnectTimeout = 60000; // 60 seconds!
The node's PacketPumpProc calls TryReadIntForHandshake(..., ClientConnectTimeout), which blocks for up to 60 seconds per field even though the client has already disconnected. When multiple concurrent builds each probe the idle nodes, a node is probed continuously: each failed probe blocks it for up to 60s, so the idle connection timeout (MSBUILDNODECONNECTIONTIMEOUT) is never checked.
This is why MSBUILDNODECONNECTIONTIMEOUT=60000 has no effect in practice.
Suggested Fix
Reduce ClientConnectTimeout to match the reuse probe timeout, or detect broken pipes early and abort the handshake read.
How the Bugs Interact
Build starts
→ Scheduler finds 50+ idle nodes from previous builds
→ Tries to connect to each with 0ms timeout (clamped to 50ms) [Bug 1]
→ Handshakes also fail due to SessionId mismatch [Bug 2]
→ All reuse attempts fail → spawns fresh nodes
→ Failed probes keep idle nodes stuck in 60s handshake reads [Bug 3]
→ Idle nodes never reach their timeout check → never exit
→ Next build repeats with even MORE idle nodes
→ System accumulates hundreds of orphaned MSBuild processes
Workaround Assessment
| Workaround | Effective? | Downside |
|---|---|---|
| MSBUILDDISABLENODEREUSE=1 | Yes: nodes exit after build | 10+ concurrent builds spawn too many total processes, machine freezes |
| MSBUILDNODECONNECTIONTIMEOUT=60000 | No: Bug 3 prevents timeout from firing | - |
| External watchdog script (kill idle nodes) | Yes | Requires extra tooling |
| -m:N to limit nodes per build | Partially | Reduces peak count but doesn't fix the accumulation |
No good client-side workaround exists. These bugs need to be fixed in MSBuild.
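For reference, the external watchdog workaround could look roughly like the sketch below. It rests on assumptions that are not part of the findings above: the function names are hypothetical, and process age (from `ps -o etime=`) is used as a crude stand-in for idleness, so a node serving a long-running build would match as well. It only lists PIDs and leaves the kill to the caller.

```shell
# Convert a ps etime value ([[dd-]hh:]mm:ss) to seconds.
# Prints nothing if the input does not have at least mm:ss.
etime_to_seconds() {
    echo "$1" | awk -F'[-:]' 'NF >= 2 {
        n = NF
        s = $n + 60 * $(n - 1)             # seconds + minutes
        if (n >= 3) s += 3600 * $(n - 2)   # hours
        if (n >= 4) s += 86400 * $(n - 3)  # days
        print s
    }'
}

# Print PIDs of MSBuild nodes that have been running for at least $1 seconds.
# Dry run by design: nothing is killed here.
list_old_nodes() {
    max_age=$1
    for pid in $(pgrep -f 'MSBuild.dll.*nodemode'); do
        age=$(etime_to_seconds "$(ps -o etime= -p "$pid" 2>/dev/null)")
        # age is empty if the process exited between pgrep and ps
        if [ -n "$age" ] && [ "$age" -ge "$max_age" ]; then
            echo "$pid"
        fi
    done
    return 0
}

# Usage: terminate nodes older than 15 minutes.
#   for pid in $(list_old_nodes 900); do kill "$pid"; done
```

Keeping the listing separate from the kill makes it possible to inspect what would be terminated first.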
Repro Steps
- macOS with .NET SDK installed
- Clone a non-trivial .NET solution (e.g., 20+ projects)
- Open 3+ terminal windows
- Run dotnet build in each terminal
- After builds complete, check:
pgrep -f 'MSBuild.dll.*nodemode' | wc -l
- Observe: nodes from all builds are still running, none were reused
- Wait 15+ minutes — nodes still running (Bug 3 keeps them alive if any builds ran concurrently)
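The last two steps can be made less manual by snapshotting node PIDs before and after the wait and comparing the two lists. A minimal sketch, with hypothetical function names and placeholder file paths:

```shell
# Write the PIDs of current MSBuild nodes, sorted, to stdout.
snapshot_nodes() {
    pgrep -f 'MSBuild.dll.*nodemode' | sort
}

# Print PIDs present in both snapshot files $1 and $2, i.e. nodes that were
# alive at the first snapshot and are still alive at the second.
# Both files must be sorted the same way (snapshot_nodes does this).
surviving_nodes() {
    comm -12 "$1" "$2"
}

# Usage:
#   snapshot_nodes > /tmp/nodes.before
#   sleep 900                      # wait 15 minutes
#   snapshot_nodes > /tmp/nodes.after
#   surviving_nodes /tmp/nodes.before /tmp/nodes.after | wc -l
```

A non-zero count after the wait means those nodes outlived the idle timeout that should have shut them down.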
Source References
| Bug | File | Line | Code |
|---|---|---|---|
| 1 | CommunicationsUtilities.cs | 883 | Math.Max(timeout, 50) |
| 1 | NodeProviderOutOfProcBase.cs | 321 | TryConnectToProcess(..., 0 /* poll */) |
| 2 | CommunicationsUtilities.cs | 290 | sessionId = currentProcess.SessionId |
| 3 | NodeEndpointOutOfProcBase.cs | 48 | ClientConnectTimeout = 60000 |
| 3 | NodeEndpointOutOfProcBase.cs | 377-410 | PacketPumpProc wait loop |