Minor change to how docker cleans up resources#184
Open
Conversation
There was a problem hiding this comment.
Pull request overview
Adjusts the test node Docker teardown behavior to reduce cleanup failures during --init, addressing containers restarting during docker compose down.
Changes:
- Update
docker compose downinvocation to remove orphans and use a shorter, explicit shutdown timeout. - Force-remove leftover containers and volumes during init cleanup.
- Change
restartpolicy forblockscoutandpostgresfromalwaystounless-stoppedto prevent restart during teardown.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 1 comment.
| File | Description |
|---|---|
test-node.bash |
Makes init cleanup more aggressive (remove-orphans/timeout + force removal) to avoid teardown races. |
docker-compose.yaml |
Adjusts restart policies for key services to reduce restart-induced teardown failures. |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
- Add try/finally around all WebSocket provider lifecycles to prevent connection leaks on errors (ethcommands.ts) - Replace fragile `tail | tr` output parsing with capture_output and capture_last_field helpers that fail on empty results (test-node.bash) - Replace `sleep 45` with deterministic wait_for_chain_progress that confirms block production before proceeding (test-node.bash) - Add timeouts to previously-infinite polling loops in bridgeFunds, bridgeNativeToken, and waitForSync (ethcommands.ts) - Add Docker healthchecks for geth, sequencer, redis, postgres and use service_healthy conditions in depends_on for proper startup ordering (docker-compose.yaml) - Track background PIDs and clean up on exit via trap (test-node.bash) - Add Redis connection cleanup via finally blocks (redis.ts) - Separate l3node-data volume from validator-data (docker-compose.yaml) - Replace sed-based JSON rewriting with jq for timeboost config - Move S3 credential warning to point of use instead of module level - Add .gitignore, enable set -euo pipefail, fix --dev-contracts shift bug, fix yargs l3 boolean type, validate flag dependencies post-parse Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Member
|
Added a big update, go ahead and test and see what you think |
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
I had claude do some analysis on our docker usage and here is what it found:
Now I have a clear picture. Here's my analysis and fix:
Analysis
The error occurs during --init cleanup at lines 462-467 of test-node.bash. The root cause is a race condition:
The fix has two parts:
Use docker compose down --remove-orphans --timeout 10 to give containers time and clean up orphans.
Change the leftover container cleanup to force-remove (docker rm -f), since any container still alive at this point is one that down couldn't stop (due to restart policy) and needs to
be killed.
⏺ Update(test-node.bash)
⎿ Added 3 lines, removed 3 lines
464 - docker compose down
464 + docker compose down --remove-orphans --timeout 10 2>/dev/null || true
⏺ Now change restart: always to restart: unless-stopped for blockscout and postgres. This prevents them from restarting during a docker compose down while still auto-restarting on crashes
during normal operation.
⏺ Update(docker-compose.yaml)
⎿ Added 1 line, removed 1 line
6 image: blockscout-testnode
7 - restart: always
7 + restart: unless-stopped
⏺ Update(docker-compose.yaml)
⎿ Added 1 line, removed 1 line
30
31 postgres:
32 image: postgres:13.6
33 - restart: always
33 + restart: unless-stopped
⏺ Summary of changes
Root cause: restart: always on blockscout and postgres causes Docker to restart them immediately after docker compose down sends SIGTERM, so the containers are running again when the
script tries to remove them and the network.
Changes made:
allows docker compose down / docker stop to stop them permanently without restart.
- Added --remove-orphans --timeout 10 to docker compose down — cleans up orphaned containers and gives services 10 seconds for graceful shutdown
- Added 2>/dev/null || true so a partial failure in down doesn't abort the script (the subsequent force-remove handles stragglers)
- Changed docker rm to docker rm -f — force-kills any leftover containers that survived down (e.g., from a previous interrupted run)
- Changed docker volume rm to docker volume rm -f — same defensive approach for volumes still referenced by dying containers