
[Cosmos] Replace per-client schedulers with shared CosmosSchedulers to fix thread scaling #49062

Merged
xinlian12 merged 2 commits into Azure:main from xinlian12:fix/shared-schedulers-thread-scaling
May 7, 2026

Conversation


@xinlian12 xinlian12 commented May 5, 2026

Problem

PR testing revealed that global-ep-mgr and partition-availability-staleness-check thread counts increase linearly with tenant/client count because both GlobalEndpointManager and GlobalPartitionEndpointManagerForPerPartitionCircuitBreaker create per-instance Schedulers.newSingle() schedulers.

With N clients -> N dedicated threads for each component -> 2N extra threads just for background location refresh and circuit breaker staleness checks.
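For illustration, a minimal sketch of the per-instance pattern in question (simplified, not the SDK's actual code): each manager instance creates its own single-threaded scheduler, so every client pins a dedicated thread.

```java
import reactor.core.scheduler.Scheduler;
import reactor.core.scheduler.Schedulers;

// Simplified illustration of the old per-instance pattern: every client builds its own
// manager instance, and every instance spawns its own dedicated scheduler thread.
class PerInstanceSchedulerSketch {
    private final Scheduler refreshScheduler =
        Schedulers.newSingle("global-ep-mgr", true); // one daemon thread per instance
}
```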

Solution

Replace per-instance Schedulers.newSingle() with shared static BoundedElastic schedulers in CosmosSchedulers, following the existing pattern used for COSMOS_PARALLEL, TRANSPORT_RESPONSE_BOUNDED_ELASTIC, etc.
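A minimal sketch of what the shared schedulers could look like (the scheduler names follow this PR, but the thread-cap, queue-size, and TTL values are illustrative assumptions rather than the exact implementation):

```java
import reactor.core.scheduler.Scheduler;
import reactor.core.scheduler.Schedulers;

public final class CosmosSchedulersSketch {
    private static final int TTL_SECONDS = 60; // idle threads are reclaimed after 60s

    // Shared by all GlobalEndpointManager instances in the process.
    public static final Scheduler GLOBAL_ENDPOINT_MANAGER_BOUNDED_ELASTIC =
        Schedulers.newBoundedElastic(
            Schedulers.DEFAULT_BOUNDED_ELASTIC_SIZE,      // thread cap shared across clients
            Schedulers.DEFAULT_BOUNDED_ELASTIC_QUEUESIZE, // queued-task cap
            "cosmos-global-endpoint-manager-bounded-elastic",
            TTL_SECONDS,
            true);                                        // daemon threads

    // Shared by all per-partition circuit breaker staleness checks.
    public static final Scheduler PARTITION_AVAILABILITY_CHECK_BOUNDED_ELASTIC =
        Schedulers.newBoundedElastic(
            Schedulers.DEFAULT_BOUNDED_ELASTIC_SIZE,
            Schedulers.DEFAULT_BOUNDED_ELASTIC_QUEUESIZE,
            "cosmos-partition-availability-check-bounded-elastic",
            TTL_SECONDS,
            true);

    private CosmosSchedulersSketch() { }
}
```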

Changes

CosmosSchedulers.java

  • Added GLOBAL_ENDPOINT_MANAGER_BOUNDED_ELASTIC shared scheduler
  • Added PARTITION_AVAILABILITY_CHECK_BOUNDED_ELASTIC shared scheduler

GlobalEndpointManager.java

  • Replaced per-instance Schedulers.newSingle(CosmosDaemonThreadFactory) with CosmosSchedulers.GLOBAL_ENDPOINT_MANAGER_BOUNDED_ELASTIC
  • Track background refresh Disposable via AtomicReference with getAndSet() to atomically clean up old subscriptions on concurrent calls
  • close() cancels the tracked subscription instead of disposing the scheduler

GlobalPartitionEndpointManagerForPerPartitionCircuitBreaker.java

  • Replaced per-instance Schedulers.newSingle("partition-availability-staleness-check") with CosmosSchedulers.PARTITION_AVAILABILITY_CHECK_BOUNDED_ELASTIC
  • Track recovery Disposable via AtomicReference for consistent cleanup on close()

Design Decisions

  • BoundedElastic over Single -- supports concurrent background tasks from multiple clients; threads auto-reclaim with 60s TTL
  • Disposable tracking -- shared schedulers cannot be disposed per-client, so each client's background subscription is tracked and cancelled individually in close() (see the sketch after this list)
  • AtomicReference.getAndSet() -- prevents Disposable leaks when startRefreshLocationTimerAsync() is called concurrently
  • Existing isClosed guards in both classes provide additional protection against post-close work
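A minimal sketch of the Disposable-tracking approach described above, assuming the shared scheduler from the earlier sketch; the field and method bodies here are illustrative, not the SDK's exact code:

```java
import java.time.Duration;
import java.util.concurrent.atomic.AtomicReference;

import reactor.core.Disposable;
import reactor.core.Disposables;
import reactor.core.publisher.Flux;
import reactor.core.publisher.Mono;

class BackgroundRefreshSketch {
    // Currently active background subscription for this client instance.
    private final AtomicReference<Disposable> refreshDisposable =
        new AtomicReference<>(Disposables.disposed());
    private volatile boolean isClosed = false;

    void startRefreshLocationTimerAsync() {
        Disposable newSubscription = Flux
            .interval(Duration.ofMinutes(5),
                CosmosSchedulersSketch.GLOBAL_ENDPOINT_MANAGER_BOUNDED_ELASTIC)
            .takeWhile(tick -> !isClosed)              // isClosed guard stops post-close work
            .concatMap(tick -> refreshLocationAsync()) // placeholder refresh step
            .subscribe();

        // getAndSet() atomically swaps in the new subscription and hands back the old one,
        // so concurrent callers cannot leak a previously scheduled refresh.
        Disposable previous = refreshDisposable.getAndSet(newSubscription);
        if (previous != null && !previous.isDisposed()) {
            previous.dispose();
        }
    }

    void close() {
        isClosed = true;
        // Cancel only this client's subscription; the shared scheduler itself is never disposed.
        refreshDisposable.getAndSet(Disposables.disposed()).dispose();
    }

    private Mono<Void> refreshLocationAsync() {
        return Mono.empty(); // stand-in for the actual location refresh call
    }
}
```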

Benchmark Results: Thread Scaling Fix Validation

Config: Gateway mode, ReadThroughput, concurrency=64, 10min per run, accounts lx1-lx28 (cycling modulo 28), host-pinned
Branches: upstream/main vs xinlian12/fix/shared-schedulers-thread-scaling

1. Throughput: main vs fix (H1 ReadThroughput, steady-state)

| Tenants | main (ops/s) | fix (ops/s) | Delta |
| ------- | ------------ | ----------- | ----- |
| 4 | 33,895 | 33,762 | -0% |
| 16 | 30,779 | 30,398 | -1% |
| 64 | 29,212 | 29,279 | +0% |
| 256 | 28,427 | 27,798 | -2% |
| 896 | FAIL | 27,009 | fix succeeds |

H2:

| Tenants | main (ops/s) | fix (ops/s) | Delta |
| ------- | ------------ | ----------- | ----- |
| 4 | 31,199 | 30,959 | -1% |
| 16 | 28,323 | 28,305 | -0% |
| 64 | 27,175 | 26,757 | -2% |
| 256 | 25,261 | 25,791 | +2% |
| 896 | 18,855 | 20,766 | +10% |

2. Thread Scaling (PEAK, H1 ReadThroughput)

| Tenants | main (threads) | fix (threads) | Reduction |
| ------- | -------------- | ------------- | --------- |
| 4 | 218 | 223 | -2% |
| 16 | 255 | 267 | -5% |
| 64 | 354 | 364 | -3% |
| 256 | 748 | 552 | 26% |
| 896 | 1,748 | 544 | 69% |

3. Thread Pool Breakdown (PEAK, H1 ReadThroughput)

| Branch | Tenants | part-avail | global-ep | glob-ep-bounded | bench-disp | transport | bulk-exec | reactor-ep | TOTAL |
| ------ | ------- | ---------- | --------- | --------------- | ---------- | --------- | --------- | ---------- | ----- |
| main | 4 | 4 | 4 | 0 | 68 | 48 | 38 | 16 | 232 |
| fix | 4 | 4 | 0 | 5 | 67 | 47 | 41 | 16 | 236 |
| main | 16 | 16 | 16 | 0 | 67 | 43 | 39 | 16 | 255 |
| fix | 16 | 16 | 0 | 30 | 67 | 52 | 39 | 16 | 267 |
| main | 64 | 64 | 64 | 0 | 67 | 45 | 40 | 16 | 354 |
| fix | 64 | 64 | 0 | 65 | 67 | 49 | 43 | 16 | 364 |
| main | 256 | 256 | 256 | 0 | 67 | 54 | 42 | 16 | 749 |
| fix | 256 | 160 | 0 | 160 | 67 | 48 | 42 | 16 | 553 |
| main | 896 | 814 | 815 | 0 | 0 | 15 | 42 | 16 | 1,760 |
| fix | 896 | 160 | 0 | 160 | 67 | 43 | 42 | 16 | 548 |

4. Key Findings

  • global-ep-mgr eliminated: 0 per-client threads across all tenant counts (was 1:1 on main)
  • partition-avail capped at ~160: shared BoundedElastic pool reuses threads with 60s TTL (was 1:1 on main)
  • global-endpoint-manager-bounded-elastic capped at ~160: replacement shared pool
  • Thread count flat at 256-896t: fix holds at ~548 threads regardless of tenant count (main grows from 748 to 1,760)
  • No throughput regression at 4-256t: -2% to +2% (within noise)
  • 896t-H1 now works: main failed with a client creation timeout, fix succeeds at 27K ops/s
  • 896t-H2 throughput improves +10% (18,855 to 20,766 ops/s) thanks to reduced thread overhead

@github-actions github-actions Bot added the Cosmos label May 5, 2026
@xinlian12 xinlian12 force-pushed the fix/shared-schedulers-thread-scaling branch from db98eb9 to c42cfe8 on May 5, 2026 22:36
@xinlian12 xinlian12 changed the title from "[Cosmos] Replace per-client schedulers with shared CosmosSchedulers to fix thread scaling" to "[Cosmos][NO Review] Replace per-client schedulers with shared CosmosSchedulers to fix thread scaling" on May 5, 2026
@xinlian12 xinlian12 marked this pull request as ready for review May 5, 2026 22:38
Copilot AI review requested due to automatic review settings May 5, 2026 22:38
@xinlian12 xinlian12 requested review from a team and kirankumarkolli as code owners May 5, 2026 22:38
@xinlian12
Member Author

@sdkReviewAgent

Copilot AI left a comment

Pull request overview

This PR addresses a thread-scaling issue in the Cosmos Java SDK where per-client Schedulers.newSingle() usage caused background thread counts to grow linearly with the number of client instances. It introduces shared schedulers in CosmosSchedulers and updates background work to run on those shared schedulers instead of allocating dedicated per-client threads.

Changes:

  • Added shared bounded-elastic schedulers to CosmosSchedulers for Global Endpoint Manager refresh and per-partition availability checks.
  • Updated GlobalEndpointManager background refresh to use the shared scheduler and removed per-instance scheduler disposal.
  • Updated GlobalPartitionEndpointManagerForPerPartitionCircuitBreaker to use the shared scheduler and removed per-instance scheduler disposal.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

| File | Description |
| ---- | ----------- |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/CosmosSchedulers.java | Adds shared bounded-elastic schedulers for endpoint refresh and partition availability checks. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/GlobalEndpointManager.java | Switches background location refresh work from per-instance single scheduler to shared bounded-elastic scheduler. |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/perPartitionCircuitBreaker/GlobalPartitionEndpointManagerForPerPartitionCircuitBreaker.java | Switches staleness check work from per-instance single scheduler to shared bounded-elastic scheduler. |

@xinlian12
Member Author

Review complete (47:35)

No new comments — existing review coverage is sufficient.

Steps: ✓ context, correctness, cross-sdk, design, history, past-prs, synthesis, test-coverage

…ead scaling

Thread count for 'global-ep-mgr' and 'partition-availability-staleness-check'
threads was scaling linearly with tenant/client count because both
GlobalEndpointManager and GlobalPartitionEndpointManagerForPerPartitionCircuitBreaker
created per-instance Schedulers.newSingle() schedulers.

Changes:
- Add GLOBAL_ENDPOINT_MANAGER_BOUNDED_ELASTIC and
  PARTITION_AVAILABILITY_CHECK_BOUNDED_ELASTIC shared schedulers to CosmosSchedulers
- GlobalEndpointManager: Replace per-instance scheduler with shared scheduler,
  track background refresh Disposable via AtomicReference for immediate cleanup
  on close(). Use getAndSet() to atomically dispose old subscriptions on reschedule.
- GlobalPartitionEndpointManagerForPerPartitionCircuitBreaker: Replace per-instance
  scheduler with shared scheduler, track recovery Disposable via AtomicReference
  for immediate cleanup on close(). Use compareAndSet on isPartitionRecoveryTaskRunning
  to prevent duplicate background tasks under concurrent init() calls.

Shared BoundedElastic schedulers reuse threads with 60s TTL, preventing thread
count from growing with client count while still supporting concurrent background
tasks from multiple clients.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@xinlian12 xinlian12 force-pushed the fix/shared-schedulers-thread-scaling branch from c42cfe8 to 02b230e on May 6, 2026 04:19
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@xinlian12
Member Author

/azp run java - cosmos - tests

@azure-pipelines

Azure Pipelines successfully started running 1 pipeline(s).

@xinlian12 xinlian12 changed the title from "[Cosmos][NO Review] Replace per-client schedulers with shared CosmosSchedulers to fix thread scaling" to "[Cosmos] Replace per-client schedulers with shared CosmosSchedulers to fix thread scaling" on May 6, 2026
@FabianMeiswinkel FabianMeiswinkel (Member) left a comment

LGTM

@kushagraThapar kushagraThapar (Member) left a comment

LGTM, thanks @xinlian12

@xinlian12 xinlian12 merged commit 9ca9e54 into Azure:main May 7, 2026
104 checks passed