[Cosmos] Replace per-client schedulers with shared CosmosSchedulers to fix thread scaling#49062
Conversation
Force-pushed from db98eb9 to c42cfe8
@sdkReviewAgent
Pull request overview
This PR addresses a thread-scaling issue in the Cosmos Java SDK where per-client Schedulers.newSingle() usage caused background thread counts to grow linearly with the number of client instances. It introduces shared schedulers in CosmosSchedulers and updates background work to run on those shared schedulers instead of allocating dedicated per-client threads.
Changes:
- Added shared bounded-elastic schedulers to `CosmosSchedulers` for Global Endpoint Manager refresh and per-partition availability checks (a sketch follows this list).
- Updated `GlobalEndpointManager` background refresh to use the shared scheduler and removed per-instance scheduler disposal.
- Updated `GlobalPartitionEndpointManagerForPerPartitionCircuitBreaker` to use the shared scheduler and removed per-instance scheduler disposal.
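For illustration, here is a minimal sketch of the shared-scheduler pattern this PR introduces. The constant names and the `global-endpoint-manager-bounded-elastic` thread prefix appear elsewhere in this PR; the sizing values, the holder class, and the second pool's thread prefix are assumptions, and the actual declarations in `CosmosSchedulers` may differ.

```java
import reactor.core.scheduler.Scheduler;
import reactor.core.scheduler.Schedulers;

// Hypothetical holder mirroring the shared-scheduler pattern described above.
public final class SharedSchedulersSketch {

    private static final int TTL_IN_SECONDS = 60; // idle threads reclaimed after 60s

    // One process-wide pool shared by every client instance, instead of one
    // Schedulers.newSingle() thread per GlobalEndpointManager instance.
    public static final Scheduler GLOBAL_ENDPOINT_MANAGER_BOUNDED_ELASTIC =
        Schedulers.newBoundedElastic(
            Schedulers.DEFAULT_BOUNDED_ELASTIC_SIZE,       // thread cap shared across all clients
            Schedulers.DEFAULT_BOUNDED_ELASTIC_QUEUESIZE,  // queued task cap
            "global-endpoint-manager-bounded-elastic",
            TTL_IN_SECONDS,
            true);                                         // daemon threads

    // Shared pool for per-partition availability (staleness) checks.
    public static final Scheduler PARTITION_AVAILABILITY_CHECK_BOUNDED_ELASTIC =
        Schedulers.newBoundedElastic(
            Schedulers.DEFAULT_BOUNDED_ELASTIC_SIZE,
            Schedulers.DEFAULT_BOUNDED_ELASTIC_QUEUESIZE,
            "partition-availability-check-bounded-elastic",
            TTL_IN_SECONDS,
            true);

    private SharedSchedulersSketch() {
    }
}
```

Because a bounded-elastic scheduler caps its worker count and reclaims idle threads after the TTL, background thread count stops tracking the number of client instances.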
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/CosmosSchedulers.java | Adds shared bounded-elastic schedulers for endpoint refresh and partition availability checks. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/GlobalEndpointManager.java | Switches background location refresh work from a per-instance single scheduler to the shared bounded-elastic scheduler. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/perPartitionCircuitBreaker/GlobalPartitionEndpointManagerForPerPartitionCircuitBreaker.java | Switches staleness check work from a per-instance single scheduler to the shared bounded-elastic scheduler. |
✅ Review complete (47:35) No new comments — existing review coverage is sufficient. Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage
…ead scaling

Thread count for 'global-ep-mgr' and 'partition-availability-staleness-check' threads was scaling linearly with tenant/client count because both GlobalEndpointManager and GlobalPartitionEndpointManagerForPerPartitionCircuitBreaker created per-instance Schedulers.newSingle() schedulers.

Changes:
- Add GLOBAL_ENDPOINT_MANAGER_BOUNDED_ELASTIC and PARTITION_AVAILABILITY_CHECK_BOUNDED_ELASTIC shared schedulers to CosmosSchedulers
- GlobalEndpointManager: Replace per-instance scheduler with shared scheduler, track background refresh Disposable via AtomicReference for immediate cleanup on close(). Use getAndSet() to atomically dispose old subscriptions on reschedule.
- GlobalPartitionEndpointManagerForPerPartitionCircuitBreaker: Replace per-instance scheduler with shared scheduler, track recovery Disposable via AtomicReference for immediate cleanup on close(). Use compareAndSet on isPartitionRecoveryTaskRunning to prevent duplicate background tasks under concurrent init() calls.

Shared BoundedElastic schedulers reuse threads with a 60s TTL, preventing thread count from growing with client count while still supporting concurrent background tasks from multiple clients.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Force-pushed from c42cfe8 to 02b230e
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
/azp run java - cosmos - tests

Azure Pipelines successfully started running 1 pipeline(s).
kushagraThapar left a comment:
LGTM, thanks @xinlian12
Problem
PR testing revealed that `global-ep-mgr` and `partition-availability-staleness-check` thread counts increase linearly with tenant/client count because both `GlobalEndpointManager` and `GlobalPartitionEndpointManagerForPerPartitionCircuitBreaker` create per-instance `Schedulers.newSingle()` schedulers. With N clients -> N dedicated threads for each component -> 2N extra threads just for background location refresh and circuit breaker staleness checks.
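To make the scaling concrete, here is a minimal sketch of the per-instance pattern being removed, assuming only the thread names quoted above; the class and field names are illustrative, not the SDK's actual code.

```java
import reactor.core.scheduler.Scheduler;
import reactor.core.scheduler.Schedulers;

// Every client instance builds its own single-threaded schedulers,
// so N clients cost roughly 2N extra background threads.
final class PerClientSchedulersBefore {

    // one 'global-ep-mgr' thread per GlobalEndpointManager instance
    private final Scheduler locationRefreshScheduler =
        Schedulers.newSingle("global-ep-mgr", true);

    // one staleness-check thread per circuit-breaker manager instance
    private final Scheduler stalenessCheckScheduler =
        Schedulers.newSingle("partition-availability-staleness-check", true);
}
```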
Solution
Replace per-instance `Schedulers.newSingle()` with shared static `BoundedElastic` schedulers in `CosmosSchedulers`, following the existing pattern used for `COSMOS_PARALLEL`, `TRANSPORT_RESPONSE_BOUNDED_ELASTIC`, etc.

Changes

- `CosmosSchedulers.java`
  - `GLOBAL_ENDPOINT_MANAGER_BOUNDED_ELASTIC` shared scheduler
  - `PARTITION_AVAILABILITY_CHECK_BOUNDED_ELASTIC` shared scheduler
- `GlobalEndpointManager.java` (see the sketch after this list)
  - Replace `Schedulers.newSingle(CosmosDaemonThreadFactory)` with `CosmosSchedulers.GLOBAL_ENDPOINT_MANAGER_BOUNDED_ELASTIC`
  - Track the background refresh `Disposable` via `AtomicReference` with `getAndSet()` to atomically clean up old subscriptions on concurrent calls
  - `close()` cancels the tracked subscription instead of disposing the scheduler
- `GlobalPartitionEndpointManagerForPerPartitionCircuitBreaker.java`
  - Replace `Schedulers.newSingle("partition-availability-staleness-check")` with `CosmosSchedulers.PARTITION_AVAILABILITY_CHECK_BOUNDED_ELASTIC`
  - Track the recovery `Disposable` via `AtomicReference` for consistent cleanup on `close()`
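A rough sketch of the subscription-tracking change on the `GlobalEndpointManager` side, under stated assumptions: `refreshSubscription`, `refreshLocationsAsync()`, and the 5-minute delay are hypothetical placeholders, while the shared scheduler constant, `startRefreshLocationTimerAsync()`, the `getAndSet()` swap, and the `close()` behavior follow the description above.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicReference;

import com.azure.cosmos.implementation.CosmosSchedulers; // constant added by this PR
import reactor.core.Disposable;
import reactor.core.Disposables;
import reactor.core.publisher.Mono;

// Illustrative sketch only; field and helper names are placeholders, not the SDK's members.
final class EndpointRefreshSketch {

    private final AtomicReference<Disposable> refreshSubscription =
        new AtomicReference<>(Disposables.disposed());
    private volatile boolean isClosed = false;

    void startRefreshLocationTimerAsync() {
        if (isClosed) {
            return; // post-close guard: never reschedule after close()
        }

        Disposable newSubscription = Mono
            .delay(Duration.ofMinutes(5), CosmosSchedulers.GLOBAL_ENDPOINT_MANAGER_BOUNDED_ELASTIC)
            .flatMap(tick -> refreshLocationsAsync())
            .doFinally(signal -> startRefreshLocationTimerAsync()) // schedule the next refresh
            .subscribe();

        // getAndSet() atomically swaps in the new subscription and disposes the previous one,
        // so concurrent callers cannot leak background work.
        refreshSubscription.getAndSet(newSubscription).dispose();
    }

    void close() {
        isClosed = true;
        // Cancel only this instance's subscription; the shared scheduler stays alive
        // because other client instances still use it.
        refreshSubscription.get().dispose();
    }

    private Mono<Void> refreshLocationsAsync() {
        return Mono.empty(); // placeholder for the real location refresh
    }
}
```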
Design Decisions

- `close()` cancels the tracked subscription rather than disposing the shared scheduler, which other clients still use
- `getAndSet()` disposes the previous subscription when `startRefreshLocationTimerAsync()` is called concurrently
- `isClosed` guards in both classes provide additional protection against post-close work (the circuit-breaker side is sketched after this list)
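Similarly, a rough sketch of the duplicate-task guard on the circuit-breaker side: `recoverySubscription`, `checkPartitionAvailabilityStaleness()`, and the 60-second interval are hypothetical placeholders, while `isPartitionRecoveryTaskRunning` with `compareAndSet`, the `isClosed` guard, the shared scheduler constant, and cancellation in `close()` follow the PR description and commit message.

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.concurrent.atomic.AtomicReference;

import com.azure.cosmos.implementation.CosmosSchedulers; // constant added by this PR
import reactor.core.Disposable;
import reactor.core.Disposables;
import reactor.core.publisher.Flux;

// Illustrative sketch only; field and helper names are placeholders, not the SDK's members.
final class PartitionRecoverySketch {

    private final AtomicBoolean isPartitionRecoveryTaskRunning = new AtomicBoolean(false);
    private final AtomicReference<Disposable> recoverySubscription =
        new AtomicReference<>(Disposables.disposed());
    private volatile boolean isClosed = false;

    void init() {
        // compareAndSet ensures only the first of any concurrent init() calls schedules
        // the recurring staleness check; later callers observe 'true' and return.
        if (!isPartitionRecoveryTaskRunning.compareAndSet(false, true)) {
            return;
        }

        Disposable subscription = Flux
            .interval(Duration.ofSeconds(60), CosmosSchedulers.PARTITION_AVAILABILITY_CHECK_BOUNDED_ELASTIC)
            .takeWhile(tick -> !isClosed)                      // post-close guard
            .concatMap(tick -> checkPartitionAvailabilityStaleness())
            .subscribe();

        recoverySubscription.set(subscription);
    }

    void close() {
        isClosed = true;
        // Cancel only this instance's background work; the shared scheduler is untouched.
        recoverySubscription.get().dispose();
    }

    private Flux<Void> checkPartitionAvailabilityStaleness() {
        return Flux.empty(); // placeholder for the real staleness check
    }
}
```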
Benchmark Results: Thread Scaling Fix Validation

Config: Gateway mode, ReadThroughput, concurrency=64, 10min per run, accounts lx1-lx28 (cycling modulo 28), host-pinned
Branches: `upstream/main` vs `xinlian12/fix/shared-schedulers-thread-scaling`

1. Throughput: main vs fix (H1 ReadThroughput, steady-state)
H2:
2. Thread Scaling (PEAK, H1 ReadThroughput)
3. Thread Pool Breakdown (PEAK, H1 ReadThroughput)
4. Key Findings
- `global-ep-mgr` eliminated: 0 per-client threads across all tenant counts (was 1:1 on main)
- `partition-avail` capped at ~160: shared BoundedElastic pool reuses threads with 60s TTL (was 1:1 on main)
- `global-endpoint-manager-bounded-elastic` capped at ~160: replacement shared pool