fix(perf): TCP connection pool sized for global chunk-worker count #66
Merged
Conversation
The puller can run up to fileParallelism × chunksPerFile chunk workers concurrently: each file has its own chunk semaphore, but all files share the BlobPuller's connection pool. Pre-v0.4.1 the pool was sized to just chunksPerFile (default 8), so with the default fileParallelism=4 we had 8 × 4 = 32 chunk workers competing for 8 sockets.

Symptom: the Performance modal's "Pool acquire wait" stat sat at p50 ~280 ms / p95 1.6 s, meaning a quarter of every chunk's wall clock was spent waiting for a free socket from a starved pool, not doing useful work. Reported by VirusAlex on a v0.4.0 live run.

Fix: poolSize = chunksPerFile × fileParallelism. This gives a 1:1 socket-per-worker ratio, so acquire() never blocks under steady-state load. The TCP server's MAX_CONCURRENT_CONNECTIONS=1024 cap is well above any sensible product (default 32; even pathological configs like 16 × 16 = 256 stay under it).

The HTTP path is unaffected: java.net.http.HttpClient manages its own connection pool internally on virtual threads, so it never had this contention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
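The sizing arithmetic above can be sketched as a small helper. This is an illustrative sketch, not the project's actual code: the method and parameter names (`poolSize`, `chunksPerFile`, `fileParallelism`) are taken from the description, but the class itself is hypothetical.

```java
// Illustrative sketch of the pool-sizing rule described in this PR.
// Names mirror the PR text; the class itself is not part of the project.
final class ConnectionPoolSizing {
    // The server-side cap mentioned in the PR description.
    static final int MAX_CONCURRENT_CONNECTIONS = 1024;

    // One socket per chunk worker: workers = chunksPerFile × fileParallelism.
    static int poolSize(int chunksPerFile, int fileParallelism) {
        return chunksPerFile * fileParallelism;
    }

    public static void main(String[] args) {
        int defaults = poolSize(8, 4);    // default config -> 32
        int worstCase = poolSize(16, 16); // pathological config -> 256
        System.out.println(defaults);     // prints 32
        System.out.println(worstCase);    // prints 256, still under the 1024 cap
    }
}
```

With the defaults this yields 32 sockets, matching the 32 chunk workers, so no worker ever waits on a starved pool.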
Test plan

- mvn compile: clean.
- mvn test -Dtest=ArchitectureTest: 8/8 pass.
- Pool acquire wait was p50 281 ms on v0.4.0; expect that stat to drop to <10 ms after this change.

🤖 Generated with Claude Code
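For context on what the "Pool acquire wait" stat measures, here is a minimal sketch of how such a wait could be sampled around a permit-based pool. The real puller's pool API is not shown in this PR, so `java.util.concurrent.Semaphore` stands in for it and the class name is hypothetical.

```java
import java.util.concurrent.Semaphore;

// Hypothetical sketch of sampling an "acquire wait" metric like the one
// in the Performance modal; Semaphore is a stand-in for the real pool.
final class PoolAcquireTimer {
    private final Semaphore permits;

    PoolAcquireTimer(int poolSize) {
        this.permits = new Semaphore(poolSize);
    }

    // Blocks until a socket permit is free; returns the wait in nanoseconds.
    long acquireAndMeasureNanos() {
        long start = System.nanoTime();
        permits.acquireUninterruptibly();
        return System.nanoTime() - start;
    }

    void release() {
        permits.release();
    }
}
```

When poolSize equals the worker count, steady-state acquires find a free permit immediately, which is what the <10 ms expectation in the test plan corresponds to.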