fix(sync-service): make AsyncDeleter boot and self-heal on a full disk #4599
Draft
erik-the-implementer wants to merge 5 commits into
❌ 1 Tests Failed:
✅ Deploy Preview for electric-next ready!
Summary
`Electric.AsyncDeleter` could not start when its trash directory's filesystem was full: `init/1` ignored `File.mkdir_p/1`'s result, and the immediately following `File.ls!` raised a misleading `File.Error{reason: :enoent}` (masking the real `:enospc`). Since `AsyncDeleter` is a stack-supervisor child, this propagated as `:failed_to_start_child` and crash-looped the whole stack. The result was a self-reinforcing deadlock, because `AsyncDeleter` is the very process that reclaims trash. Fixes #4595.

This PR makes the deleter boot resiliently, surface the real error, and own end-to-end recovery so it can start reclaiming space precisely when the disk is full.
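The boot-time change can be illustrated with a minimal sketch. This is not the code from the PR: `ResilientInitSketch`, `prepare_trash_dir/1`, and the `{:degraded, reason}` return shape are hypothetical names chosen for illustration. It only shows the general idea of matching on `File.mkdir_p/1` and using the non-raising `File.ls/1` so that the real reason is logged instead of a secondary `:enoent`.

```elixir
defmodule ResilientInitSketch do
  # Hypothetical module for illustration; not Electric.AsyncDeleter itself.
  require Logger

  # Returns {:ok, entries} when the trash dir is usable, or
  # {:degraded, reason} carrying the first real error otherwise.
  # It never raises, so a caller's init/1 can boot either way.
  def prepare_trash_dir(trash_dir) do
    with :ok <- File.mkdir_p(trash_dir),
         {:ok, entries} <- File.ls(trash_dir) do
      {:ok, entries}
    else
      {:error, reason} ->
        # Log the reason from the call that actually failed.
        Logger.error("trash dir #{trash_dir} unavailable: #{inspect(reason)}")
        {:degraded, reason}
    end
  end
end
```

A GenServer's `init/1` could call such a helper and return `{:ok, state}` in both branches, recording the degraded status in its state rather than failing to start.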
What changed
- `init/1`: match on `File.mkdir_p`'s result and use `File.ls/1` (non-raising) instead of `File.ls!`. On failure it logs the real reason via `Logger.error` (e.g. `:enospc`, `:enotdir`) and boots anyway, so it never crashes the stack.
- `delete/1` disambiguation: `:prim_file.rename` returns `{:error, :enoent}` both when the source is already gone and when the trash dir is missing. The old code blindly read both as "already gone → `:ok`", silently reporting success while reclaiming nothing. It now probes `File.exists?/1` to tell them apart.
- Sources that cannot be moved to the trash while degraded are kept in a `pending_sources` inventory and retried (mkdir + rename) on a self-heal timer that is armed only while degraded and goes silent once healthy. A handed-off source's bytes are captured into the trash and reaped as soon as space frees up.
Behavior note
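The `delete/1` disambiguation and the retry pass described in the list above can be sketched roughly as follows. This is an illustration under assumptions, not the PR's implementation: `DeleteSketch`, `move_to_trash/2`, `heal/2`, and the tagged return values are hypothetical, and the sketch uses `File.rename/2` where the PR uses `:prim_file.rename`.

```elixir
defmodule DeleteSketch do
  # Hypothetical helper; returns tagged results so the three cases are visible.
  def move_to_trash(source, trash_dir) do
    target = Path.join(trash_dir, Path.basename(source))

    case File.rename(source, target) do
      :ok ->
        :moved

      {:error, :enoent} ->
        # :enoent is ambiguous: either the source is gone or the trash dir
        # is missing. Probing the source tells the two cases apart.
        if File.exists?(source), do: {:pending, source}, else: :already_gone

      {:error, reason} ->
        {:error, reason}
    end
  end

  # One heal pass: try to recreate the trash dir, then retry each source.
  # Returns the sources that still could not be moved; a caller would
  # re-arm its timer only while this list is non-empty.
  def heal(pending, trash_dir) do
    _ = File.mkdir_p(trash_dir)

    Enum.filter(pending, fn source ->
      move_to_trash(source, trash_dir) not in [:moved, :already_gone]
    end)
  end
end
```

In this sketch a `{:pending, source}` result corresponds to the handed-off case: the source stays on disk and is retried by a later `heal/2` pass.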
On the degraded (full-disk) path, `delete/1` now returns `:ok` rather than `{:error, …}`. This is strictly safer: the old error tuple would have hit `shape_cleaner.ex`'s `:ok = Storage.cleanup!(...)` hard-match and crashed the cleanup task. The shape is dropped from the index while its bytes are reclaimed asynchronously by the heal loop. This is consistent with the existing async-delete philosophy, with a slightly wider index/disk consistency window during a full disk.
Testing
New `resilient boot` test group (trash dir obstructed by a regular file at `.electric_trash` → `mkdir_p` fails with `:enotdir`, uid-independent):
- `delete` hands off a live source (returns `:ok`, source preserved) when the trash dir is missing;
- […] `:ok`;

`mix test test/electric/async_deleter_test.exs` → 9 tests, 0 failures (5 original + 4 new). `mix compile --warnings-as-errors` clean.
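For readers unfamiliar with the obstruction trick used by the test group, here is a minimal sketch of such a fixture. The PR's actual test code is not shown in this description, so `ObstructedTrashFixture` and `with_obstructed_trash/1` are hypothetical names. The idea is that a regular file sitting at the trash path makes directory creation fail without relying on permissions, which is why the approach does not depend on the uid the tests run as.

```elixir
defmodule ObstructedTrashFixture do
  # Hypothetical test helper for illustration only.
  def with_obstructed_trash(fun) do
    base = Path.join(System.tmp_dir!(), "obstructed_#{System.unique_integer([:positive])}")
    File.mkdir_p!(base)
    trash = Path.join(base, ".electric_trash")

    # A regular file where the directory should be: creating the directory
    # now fails regardless of file permissions.
    File.write!(trash, "")

    try do
      fun.(trash)
    after
      # Always clean up the temporary tree, even if fun raises.
      File.rm_rf!(base)
    end
  end
end
```

Calling `ObstructedTrashFixture.with_obstructed_trash(&File.mkdir_p/1)` yields an error tuple; the PR reports the reason as `:enotdir`.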