Skip to content
Closed
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
25 changes: 25 additions & 0 deletions src/tests/backend/common.ts
Original file line number Diff line number Diff line change
Expand Up @@ -16,8 +16,33 @@ import TestAgent from "supertest/lib/agent";
import {Http2Server} from "node:http2";
import {SignJWT} from "jose";
import {privateKeyExported} from "../../node/security/OAuth2Provider";
import * as http from 'node:http';
const webaccess = require('../../node/hooks/express/webaccess');

// Enable HTTP keep-alive on the global agent for the test process. Without
// this, every supertest request opens a fresh TCP connection and the server
// closes it on response — the server side then enters TIME_WAIT for the
// default Windows TcpTimedWaitDelay (~120 s) before the ephemeral port is
// freed.
//
// The Windows backend-test job's OS-level netstat sidecar (PR #7846)
// captured the smoking gun for the silent-ELIFECYCLE flake in run
// 26419467467: localhost server-side TIME_WAIT counts on the Etherpad
// listening port climbed linearly at ~14/s, reaching 228 active TIME_WAIT
// entries on `[::1]:50398` 37 ms before the kill — all server-active-close
// half-dead sockets, all from rapid sequential supertest requests with no
// connection reuse. The kill cluster on Windows + Node 24 + plugins
// correlates tightly with this TIME_WAIT accumulation: it gives libuv a
// large pool of half-dead handles to walk on every IOCP completion.
//
// Setting keepAlive=true on http.globalAgent makes supertest's underlying
// http.request reuse a single TCP connection for sequential requests to
// the same origin, collapsing TIME_WAIT churn from ~14/s to nearly zero.
// Linux is unaffected; the flake was Windows-only because Linux's
// TIME_WAIT recycling is much faster and the kernel can sustain higher
// half-dead-socket counts without symptom.
http.globalAgent.keepAlive = true;

Comment on lines +22 to +45
Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Action required

1. Keep-alive breaks shutdown timeout 🐞 Bug ☼ Reliability

Setting http.globalAgent.keepAlive=true can leave an idle HTTP connection open when backend test
teardown calls server.exit(), and server.stop() will fail if shutdown hooks exceed its 3s timeout.
express.closeServer() explicitly allows up to 5s before it forcibly destroys remaining sockets, so
this change can cause teardown to error with "Timed out waiting for shutdown tasks" and fail the
test run.
Agent Prompt
## Issue description
`src/tests/backend/common.ts` enables keep-alive on `http.globalAgent` at module load, but the test teardown does not close/destroy the agent’s sockets before calling `server.exit()`. Etherpad shutdown (`server.stop()`) enforces a 3s timeout for shutdown hooks, while the HTTP server shutdown path (`closeServer()`) may legitimately take up to 5s before it force-terminates remaining sockets.

This creates a new failure mode: an idle keep-alive connection can keep the HTTP server from fully closing within 3 seconds, causing `server.stop()` to throw `Timed out waiting for shutdown tasks` and the test process to exit non-zero.

## Issue Context
- Backend tests start a server and create a long-lived supertest client.
- With keep-alive enabled globally, it is normal for at least one connection to remain open briefly after the last request.
- The shutdown path already anticipates lingering connections (5s force-destroy), but the higher-level shutdown timeout is only 3s.

## Fix Focus Areas
- src/tests/backend/common.ts[156-161]
- src/tests/backend/common.ts[22-45]

### Suggested implementation direction
In the existing `after(async function () { ... })` teardown in `common.ts`, explicitly close keep-alive sockets before `await server.exit()`:
- Call `http.globalAgent.destroy()` (and optionally restore `http.globalAgent.keepAlive` to its previous value).

This keeps the keep-alive behavior during tests, but ensures shutdown hooks complete quickly and deterministically.

ⓘ Copy this prompt and use it to remediate the issue with your preferred AI generation tools

const backups:MapArrayType<any> = {};
let agentPromise:Promise<any>|null = null;

Expand Down
Loading