Skip to content

fix(dashboard): fix Windows named pipe race on daemon shutdown#40642

Merged
Skn0tt merged 2 commits intomicrosoft:mainfrom
Skn0tt:fix-dashboard-kill-pipe-drain
May 6, 2026
Merged

fix(dashboard): fix Windows named pipe race on daemon shutdown#40642
Skn0tt merged 2 commits intomicrosoft:mainfrom
Skn0tt:fix-dashboard-kill-pipe-drain

Conversation

@Skn0tt
Copy link
Copy Markdown
Member

@Skn0tt Skn0tt commented May 5, 2026

Summary

Fixes a Windows CI flake where dashboard tests sporadically failed with Cannot read properties of undefined (reading 'endpoint').

The root cause: after --kill, Windows keeps the named pipe connectable during gracefullyCloseAll. The next test could connect to the dying daemon, get rejected mid-handshake, and see server.endpoint as undefined — cascading failures across many tests.

Daemon side (dashboardApp.ts): move the kill handler before statePromise.then() so it fires immediately. Send the daemon PID in socket.end(data, callback) and only call gracefullyProcessExitDoNotHang inside the flush callback, ensuring the client receives the PID before teardown begins.

Client side (runKillClient): after receiving the daemon PID, poll process.kill(pid, 0) until ESRCH (up to 35 s). This guarantees the named pipe is fully released before the next test tries to acquire the singleton.

Test fixture (cli-fixtures.ts): give connectToDashboard a 60 s fixture timeout.

Also reverts f536e37 (added EBUSY check + server.close in singleton handler) which was a previous incomplete attempt at fixing this.

Fixes #40626

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@Skn0tt Skn0tt changed the title fix(dashboard): wait for pipe to be released after --kill fix(dashboard): fix Windows named pipe race on daemon shutdown May 6, 2026
@Skn0tt Skn0tt force-pushed the fix-dashboard-kill-pipe-drain branch 2 times, most recently from 8b12ec2 to cec169c Compare May 6, 2026 11:05
The previous test left the Windows named pipe connectable while the daemon
was still shutting down (gracefullyCloseAll). The next test could connect
to the dying daemon, get rejected, and leave server.endpoint undefined -
cascading failures across many tests.

Two changes fix this:

1. Daemon side: send PID to the client and only call
   gracefullyProcessExitDoNotHang in the socket.end() flush callback,
   ensuring the client receives the PID before teardown begins. Also move
   the handler before statePromise.then() so it fires immediately.

2. Client side: after receiving the daemon PID, poll process.kill(pid, 0)
   until ESRCH (up to 35s). This guarantees the named pipe is released
   before the next test tries to acquire the singleton.

Also reverts f536e37 (EBUSY check + server.close in singleton handler)
which was a previous incomplete attempt, and gives the connectToDashboard
fixture a 60s timeout to accommodate graceful browser teardown.
@Skn0tt Skn0tt force-pushed the fix-dashboard-kill-pipe-drain branch from cec169c to 5fe3cb5 Compare May 6, 2026 11:08
@Skn0tt Skn0tt marked this pull request as ready for review May 6, 2026 11:08
@Skn0tt Skn0tt requested a review from dgozman May 6, 2026 11:08
else
resolve(pid);
});
client.once('error', () => reject(new Error('no dashboard running')));
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'd instead resolve with undefined - checking error message always feels brittle.

}
// Poll until the daemon process exits — at that point the OS has released all
// its handles, including the named pipe, so the next acquisition won't see a stale pipe.
const deadline = Date.now() + 35000;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Let's use monotonicTime().
  • Why do we process.kill() in a loop? Shouldn't once be enough?

Copy link
Copy Markdown
Member Author

@Skn0tt Skn0tt May 6, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

process.kill(, 0) is for checking liveness, not killing. So we have to loop it.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I see, that was not clear to me 😄 I guess there is still a small chance of a race due to pid reuse? Perhaps we can first gracefullyCloseAll() and then reply with the pid? This way exiting the process should be instant, so we reduce the probability of a race.

@Skn0tt Skn0tt requested a review from dgozman May 6, 2026 11:23
@github-actions

This comment has been minimized.

@github-actions

This comment has been minimized.

@Skn0tt
Copy link
Copy Markdown
Member Author

Skn0tt commented May 6, 2026

The failures above are unrelated, they also occur on main: https://github.com/microsoft/playwright/actions/runs/25431767679/job/74599532404

Gonna look at them next.

This diff has the risk of PID reuse, see above - i'll come back later, for now this should already help our CI.

@Skn0tt Skn0tt merged commit dd468b4 into microsoft:main May 6, 2026
25 of 30 checks passed
@github-actions
Copy link
Copy Markdown
Contributor

github-actions Bot commented May 6, 2026

Test results for "MCP"

9 failed
❌ [msedge] › mcp/dashboard.spec.ts:185 › should start dashboard and annotate when no dashboard is running @mcp-windows-latest-msedge
❌ [msedge] › mcp/dashboard.spec.ts:207 › should enter annotate mode on fresh dashboard.tsx mount with -s --annotate @mcp-windows-latest-msedge
❌ [msedge] › mcp/dashboard.spec.ts:231 › should annotate via direct browser_annotate MCP call @mcp-windows-latest-msedge
❌ [msedge] › mcp/dashboard.spec.ts:264 › should annotate when context has no fixed viewport @mcp-windows-latest-msedge
❌ [msedge] › mcp/dashboard.spec.ts:332 › should cancel browser_annotate when the MCP client disconnects @mcp-windows-latest-msedge
❌ [msedge] › mcp/dashboard.spec.ts:361 › should switch screencast to -s session on show --annotate @mcp-windows-latest-msedge
❌ [webkit] › mcp/config.ini.spec.ts:57 › ini config sets browser launch options @mcp-windows-latest-webkit
❌ [webkit] › mcp/http.spec.ts:103 › http transport browser lifecycle (isolated) @mcp-windows-latest-webkit
❌ [webkit] › mcp/http.spec.ts:142 › http transport browser sigint @mcp-windows-latest-webkit

6934 passed, 1052 skipped


Merge workflow run.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants