fix(perf): TCP connection pool sized for global chunk-worker count #66

Merged
VirusAlex merged 1 commit into main from fix/tcp-pool-size
May 1, 2026

Conversation

@VirusAlex
Owner

Bug

The puller can run up to fileParallelism × chunksPerFile chunk workers concurrently — each file has its own chunk-semaphore but all files share the same BlobPuller's connection pool. Pre-v0.4.1 the pool was sized at just chunksPerFile (default 8), so with the default fileParallelism=4 we had 8 × 4 = 32 chunk workers competing for 8 sockets.

Reported by VirusAlex on a v0.4.0 live transfer:

Pool acquire wait p50: 281 ms · p95: 1.6 s · max: 2.3 s

A quarter of every chunk's wall clock was spent blocked on pool.acquire() instead of doing useful work.
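
To make the contention concrete, here is a minimal, self-contained sketch of the pre-fix shape. It is illustrative only: a plain Semaphore stands in for the real connection pool and a sleep stands in for pulling a chunk; only the worker/permit arithmetic mirrors the actual defaults.

import java.util.concurrent.Semaphore;

// Toy model of the pre-v0.4.1 shape: 4 files × 8 chunk workers each,
// all funneled through one 8-permit "connection pool".
public class PoolContentionSketch {
    static final int FILE_PARALLELISM = 4;  // default
    static final int CHUNKS_PER_FILE = 8;   // default
    static final Semaphore sharedPool = new Semaphore(CHUNKS_PER_FILE); // pre-fix size

    public static void main(String[] args) throws InterruptedException {
        int totalWorkers = FILE_PARALLELISM * CHUNKS_PER_FILE; // 32 workers, 8 permits
        Thread[] workers = new Thread[totalWorkers];
        for (int i = 0; i < totalWorkers; i++) {
            workers[i] = Thread.startVirtualThread(() -> {
                try {
                    long t0 = System.nanoTime();
                    sharedPool.acquire();            // where real workers queue up
                    long waitedMs = (System.nanoTime() - t0) / 1_000_000;
                    try {
                        Thread.sleep(50);            // stand-in for pulling one chunk
                    } finally {
                        sharedPool.release();
                    }
                    System.out.println("waited " + waitedMs + " ms for a permit");
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
            });
        }
        for (Thread w : workers) w.join();
    }
}

With 32 workers and 8 permits, the later arrivals report waits stacking up in multiples of the 50 ms "chunk" — the same starvation pattern the stats above show at real-transfer timescales.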

Fix

- yield new TcpBlobPuller(host, job.peerTcpPort(), peerToken,
-     Math.max(1, job.chunksPerFile()), bytesObserver);
+ int poolSize = Math.max(1, job.chunksPerFile())
+              * Math.max(1, job.fileParallelism());
+ yield new TcpBlobPuller(host, job.peerTcpPort(), peerToken,
+     poolSize, bytesObserver);

Gives a 1:1 socket-per-worker ratio so acquire() never blocks under steady-state load. The TCP server's MAX_CONCURRENT_CONNECTIONS=1024 cap is well above any sensible product (default 32; even pathological configs like 16×16=256 stay under).
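
A quick sanity check of that arithmetic, written here as a plain main(); in the repo it would more naturally sit next to ArchitectureTest as a JUnit case, and the constant is copied from the description above rather than from the server source.

// Verifies the fixed sizing formula and that it stays under the server cap.
public class PoolSizeBoundsCheck {
    static final int MAX_CONCURRENT_CONNECTIONS = 1024; // TCP server cap quoted above

    // Mirrors the fix: Math.max(1, chunksPerFile) * Math.max(1, fileParallelism).
    static int poolSize(int chunksPerFile, int fileParallelism) {
        return Math.max(1, chunksPerFile) * Math.max(1, fileParallelism);
    }

    public static void main(String[] args) {
        check(poolSize(8, 4) == 32, "default config");           // 8 chunks × 4 files
        check(poolSize(16, 16) == 256, "pathological config");
        check(poolSize(16, 16) < MAX_CONCURRENT_CONNECTIONS, "under server cap");
        check(poolSize(0, 0) == 1, "floors at 1");                // Math.max(1, …) guards
        System.out.println("pool size bounds hold");
    }

    static void check(boolean ok, String what) {
        if (!ok) throw new AssertionError(what);
    }
}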

HTTP path is unaffected — java.net.http.HttpClient manages its own connection pool internally on virtual threads, so it never had this contention.
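
For reference, a sketch of what an HTTP-side puller can look like when HttpClient runs request handling on virtual threads. This is assumed plumbing with a placeholder URL, not the repo's actual HTTP puller; the point is only that connection pooling stays entirely inside HttpClient.

import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;
import java.util.concurrent.Executors;

public class HttpPathSketch {
    public static void main(String[] args) throws Exception {
        // HttpClient keeps and reuses connections internally; callers never size a pool.
        HttpClient client = HttpClient.newBuilder()
                .executor(Executors.newVirtualThreadPerTaskExecutor())
                .connectTimeout(Duration.ofSeconds(10))
                .build();

        HttpRequest request = HttpRequest.newBuilder(
                        URI.create("http://peer.example:8080/chunk/0")) // placeholder endpoint
                .GET()
                .build();

        HttpResponse<byte[]> response =
                client.send(request, HttpResponse.BodyHandlers.ofByteArray());
        System.out.println("status=" + response.statusCode()
                + " bytes=" + response.body().length);
    }
}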

Test plan

  • Local mvn clean compile passes.
  • Local mvn test -Dtest=ArchitectureTest: 8/8 pass.
  • CI green.
  • Manual: re-run the same transfer that showed Pool acquire wait p50 281 ms on v0.4.0; expect that stat to drop to <10 ms.

🤖 Generated with Claude Code

The puller can run up to fileParallelism × chunksPerFile chunk workers
concurrently — each file has its own chunk semaphore but all files share
the BlobPuller's connection pool. Pre-v0.4.1 the pool was sized to just
chunksPerFile (default 8), so with the default fileParallelism=4 we had
8×4=32 chunk workers competing for 8 sockets.

Symptom: the Performance modal's "Pool acquire wait" stat sat at
p50 ~280 ms / p95 1.6 s — a quarter of every chunk's wall clock spent
waiting for a free socket from a starved pool, not doing useful work.
Reported by VirusAlex on a v0.4.0 live run.

Fix: poolSize = chunksPerFile × fileParallelism. Gives a 1:1 socket-per-
worker ratio so acquire() never blocks under steady-state load. The TCP
server's MAX_CONCURRENT_CONNECTIONS=1024 cap is well above any sensible
product (default 32; even pathological configs like 16×16=256 stay under).

HTTP path is unaffected — java.net.http.HttpClient manages its own
connection pool internally on virtual threads, so it never had this
contention.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
VirusAlex merged commit 3956ceb into main May 1, 2026
1 check passed
VirusAlex deleted the fix/tcp-pool-size branch May 1, 2026 08:38