Add data directory locking to NodeFS to prevent multi-process corruption #892
reisepass wants to merge 1 commit into
This makes a lot of sense. Accidental multi-process access during local dev is very real, especially when running pnpm dev in multiple terminals. The lock file approach feels like a clean and pragmatic safeguard, since Postgres assumes exclusive control over the data directory. Curious: is there any plan to surface a clearer error message when the second process is rejected, so it's obvious what happened?
This is a really practical improvement. Locking the data dir should prevent a whole class of annoying dev-time corruption issues, and the partial initdb handling makes the setup much more robust. Great work!
@reisepass Thank you for this!
I would prefer if you address only the lock file issue for the moment. I just skimmed through the changes (sorry, really busy with other things), and it seems like a lot of other files (tests?) were added. For the sake of simplicity, if the only thing addressed is the locking of folder access, please consolidate the tests into a single test file, or add to existing test files the minimum needed to test the new functionality.
Sounds reasonable.
@tdrz What do you think now? It's far fewer files, as requested.
Thank you! Looks pretty good; I'll take a closer look this week. Thank you for your work and patience!
Hurray for CI: it seems instantiation.test.ts, drop-database.test.ts and pgvector.test.ts do not call close().
Some do not call it intentionally: this test in particular could be adapted to delete the lock before creating
Probably, but it needs careful consideration.
Also, this suggests that same-process sequential reuse should probably be allowed. PGlite is single-threaded, and since Node.js is single-threaded too, I guess different PGlite instances in the same Node.js process must take turns, so multiple connections to the same files could be allowed within one process.
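To make the same-process trade-off concrete, here is a minimal sketch of a process-wide registry that fails loudly when a second instance claims a dataDir that is still held, allows sequential reuse once the holder closes, and supports the opt-in takeover the commit message describes for HMR-style reloads. All names (Holder, heldDirs, claimDataDir, releaseDataDir) are hypothetical, not PGlite's API, and the lock-file-deletion override from the commit message is left out.

```typescript
import { resolve } from "node:path";

// Hypothetical names throughout; this is not PGlite's actual API.
interface Holder {
  close: () => Promise<void>;
}

// Process-wide registry: normalized dataDir path -> instance currently holding it.
const heldDirs = new Map<string, Holder>();

async function claimDataDir(
  dataDir: string,
  holder: Holder,
  opts: { takeover?: boolean } = {},
): Promise<void> {
  const key = resolve(dataDir); // "./data" and "data/" map to the same key
  const previous = heldDirs.get(key);
  if (previous !== undefined && previous !== holder) {
    if (!opts.takeover) {
      // Default: fail loudly at the creation site.
      throw new Error(`${key} is already held by another instance in this process; close it first`);
    }
    // Opt-in takeover (e.g. after an HMR reload): close the abandoned instance cleanly first.
    await previous.close();
  }
  heldDirs.set(key, holder);
}

// Sequential reuse: once the holder releases (on close), the next claim succeeds.
function releaseDataDir(dataDir: string, holder: Holder): void {
  const key = resolve(dataDir);
  // Only the current holder may release, so a superseded instance cannot free a newer claim.
  if (heldDirs.get(key) === holder) heldDirs.delete(key);
}
```

In this sketch an instance's close() would call releaseDataDir, which is what makes "open, close, open again" work within one process while overlapping instances are still rejected.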
Postgres assumes exclusive control of its data directory; two PGlite
instances writing the same dataDir silently corrupt it. NodeFS now
takes a sibling lock file (dataDir.lock) holding the owner PID:
- Acquisition is atomic: exclusive create ('wx'), so two racing
processes can never both succeed. A stale lock (holder PID is dead)
is claimed by atomic rename before retrying, so a losing racer can
never remove a winner's freshly created lock.
- Another live process holds the lock: throw with the PID and guidance.
Reclaiming only happens when the holder is dead, so PID reuse can at
worst cause a conservative refusal, never steal a live lock.
- Release verifies ownership: the lock file is only unlinked if it
still holds this instance's token, and only after FS teardown.
- Another instance in this same process holds the dataDir: throw at the
creation site. Deleting the lock file is the explicit override.
- Opt-in takeover for HMR-style dev servers, where module reloads
create a fresh instance and the abandoned one can never be closed:
new PGlite({ fs: new NodeFS(dataDir, { takeover: true }) }) closes
the previous instance cleanly before claiming the directory.
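The cross-process rules above (exclusive create, refusal when the holder is alive, stale reclaim by rename, token-checked release) can be sketched with Node's synchronous fs API. This is a minimal sketch, not NodeFS's implementation: acquireLock, releaseLock, readLock, isAlive, the two-line "pid, token" file format and the ".stale" side file are all assumptions. Liveness uses process.kill(pid, 0), which delivers no signal; EPERM means the process exists under another user, so it counts as alive. A lock whose content cannot be parsed (for example one a racer has created but not yet written) is refused rather than reclaimed.

```typescript
import { closeSync, linkSync, openSync, readFileSync, renameSync, unlinkSync, writeSync } from "node:fs";
import { randomUUID } from "node:crypto";

// Hypothetical helpers sketching the lock scheme; not NodeFS's actual code.
type LockInfo = { pid: number; token: string };
type HeldLock = LockInfo & { path: string };

const errCode = (err: unknown) => (err as NodeJS.ErrnoException).code;

// Parse "<pid>\n<token>\n". Returns null if the file is gone; refuses garbled content.
function readLock(path: string): LockInfo | null {
  let text: string;
  try {
    text = readFileSync(path, "utf8");
  } catch (err) {
    if (errCode(err) === "ENOENT") return null;
    throw err;
  }
  const [pidText, token] = text.split("\n");
  const pid = Number(pidText);
  if (!Number.isInteger(pid) || pid <= 0 || !token) {
    // Could be a lock another process created but has not written yet: do not guess.
    throw new Error(`unreadable lock file ${path}; delete it only if no process uses the data dir`);
  }
  return { pid, token };
}

function isAlive(pid: number): boolean {
  try {
    process.kill(pid, 0); // signal 0: existence check only, nothing is delivered
    return true;
  } catch (err) {
    return errCode(err) === "EPERM"; // exists but owned by another user
  }
}

function acquireLock(dataDir: string): HeldLock {
  const path = `${dataDir}.lock`;
  const mine: LockInfo = { pid: process.pid, token: randomUUID() };
  for (let attempt = 0; attempt < 5; attempt++) {
    try {
      const fd = openSync(path, "wx"); // O_CREAT | O_EXCL: only one racer can succeed
      try {
        writeSync(fd, `${mine.pid}\n${mine.token}\n`);
      } finally {
        closeSync(fd);
      }
      return { ...mine, path };
    } catch (err) {
      if (errCode(err) !== "EEXIST") throw err;
    }
    const holder = readLock(path);
    if (holder === null) continue; // released between our create attempt and the read
    if (isAlive(holder.pid)) {
      throw new Error(`${dataDir} is in use by process ${holder.pid}; stop it before opening this data dir`);
    }
    // Holder is dead: move the lock aside atomically, then check what we actually moved.
    const aside = `${path}.${mine.token}.stale`;
    try {
      renameSync(path, aside);
    } catch (err) {
      if (errCode(err) === "ENOENT") continue; // another racer already moved it
      throw err;
    }
    if (readFileSync(aside, "utf8").split("\n")[1] === holder.token) {
      unlinkSync(aside); // it was the stale lock: discard it and retry the create
      continue;
    }
    // A racer re-created the lock before our rename: put it back without overwriting.
    try {
      linkSync(aside, path);
    } catch (err) {
      if (errCode(err) !== "EEXIST") throw err;
    } finally {
      unlinkSync(aside);
    }
    throw new Error(`${dataDir} was just locked by another process`);
  }
  throw new Error(`could not acquire ${path}`);
}

// Release only if the file still carries our token.
function releaseLock(lock: HeldLock): void {
  const current = readLock(lock.path);
  if (current?.token === lock.token) unlinkSync(lock.path);
}
```

The rename-then-verify step means a losing racer never unlinks a winner's fresh lock: if the file it moved aside is not the stale one it inspected, it restores it with linkSync, which refuses to overwrite. The winner's lock is briefly absent during that restore, so read this as an illustration of the idea rather than a complete solution.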
Tests live in a single file (tests/nodefs-lock.test.js). Existing tests
that hold a dataDir now close their instances; the unclean-shutdown
test deletes the stale lock before reopening. test:clean also removes
the sibling .lock file.
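The test-side changes (test:clean removing the sibling lock, and the unclean-shutdown test dropping the stale lock before reopening) come down to a couple of rmSync calls. A small sketch with hypothetical helper names (lockPathFor, cleanDataDir, dropStaleLock), assuming the lock sits at "<dataDir>.lock" as the commit message says. It also normalizes the path so that a trailing slash on dataDir does not turn the sibling lock into a file inside the directory; whether NodeFS itself normalizes this way is an assumption.

```typescript
import { rmSync } from "node:fs";
import { resolve } from "node:path";

// Hypothetical helpers, assuming the lock lives at `<dataDir>.lock`.
// resolve() drops a trailing slash, so "data/" maps to "data.lock", not "data/.lock".
function lockPathFor(dataDir: string): string {
  return `${resolve(dataDir)}.lock`;
}

// test:clean-style cleanup: remove the data dir and its sibling lock file.
function cleanDataDir(dataDir: string): void {
  rmSync(dataDir, { recursive: true, force: true });
  rmSync(lockPathFor(dataDir), { force: true });
}

// Unclean-shutdown tests: the "crashed" instance never released its lock,
// so drop the stale lock before reopening the same dataDir.
function dropStaleLock(dataDir: string): void {
  rmSync(lockPathFor(dataDir), { force: true });
}
```

With force: true both calls are no-ops when the paths do not exist, so the helpers are safe to run before and after every test.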
It's been a while, but it looks like upstream did not change enough to invalidate this. As per your comment, the unclean-shutdown test now deletes the stale lock before creating the new instance. On same-process sequential reuse, I gave it the careful consideration you asked for: the default stays a loud error at creation time, and deleting the lock file (your test adaptation) is the explicit override. Details on the rest of the changes are in the next comment.
Beyond that, since the last review:
I've been running into issues with PGlite getting corrupted in persist-to-disk mode. The most common human scenario is that you
have one pnpm dev running in one window and then start another for a quick test, not remembering that you already had one open.
In my dev work this honestly happens daily, so it was hard to use PGlite practically as an SQLite replacement.
But the fix is simple: just add a lock file.
This is nothing fancy: since Postgres assumes it has full control, we just reject the second process trying to use the data dir.