
[NO REVIEW] fix: optimize ChangeFeed partition planning from O(P*T log T) to O(T log T + P log T) #49084

Closed
xinlian12 wants to merge 9 commits into Azure:main from xinlian12:fix/issue-49023-changefeed-planning-perf

Conversation

@xinlian12
Member

@xinlian12 xinlian12 commented May 7, 2026

Closes #49023

Summary

Eliminate the quadratic O(P * T log T) complexity in ChangeFeedState.extractForEffectiveRange, where P is the number of planned partitions and T is the number of continuation tokens. A new batch API, extractForEffectiveRanges(List<Range>), sorts the continuation tokens once and uses binary search for overlap detection, reducing the total complexity to O(T log T + P log T).

Problem

The Cosmos Spark change feed connector spent excessive driver time planning microbatches when reading containers with many feed ranges / continuation tokens (>30k). The hot path ChangeFeedMicroBatchStream.planInputPartitions repeatedly copied and sorted the full continuation-token list for each planned partition, resulting in O(P * T log T) behavior that could stall planning for more than 15 minutes.
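
To make the gap between the two bounds concrete, here is a rough worked estimate. It assumes P = T = 30,000 (the description only says more than 30k), counts comparisons only, and ignores constant factors and copy costs, so the numbers are orders of magnitude rather than measurements.

$$
\begin{aligned}
\log_2 T &\approx 14.9 \\
P \cdot T \log_2 T &\approx 30{,}000 \times 30{,}000 \times 14.9 \approx 1.3 \times 10^{10} \\
T \log_2 T + P \log_2 T &\approx 2 \times 30{,}000 \times 14.9 \approx 8.9 \times 10^{5} \\
\frac{P \cdot T \log_2 T}{(T + P)\,\log_2 T} &= \frac{P\,T}{T + P} = 15{,}000
\end{aligned}
$$

Under these assumptions the batch approach does roughly 15,000 times fewer comparisons.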

Solution

  • Batch API: New extractForEffectiveRanges(List<Range<String>>) method sorts tokens once (O(T log T)), then uses binary search per range (O(log T)) following the established InMemoryCollectionRoutingMap.getOverlappingRanges pattern.
  • No internal mutable state: Removed the volatile cachedSortedTokensSnapshot field and all cache invalidation machinery. The sorted token list is local to each batch call, so there is no cache, no invalidation risk, and no race condition.
  • Single-range delegates to batch: extractForEffectiveRange(range) now delegates to extractForEffectiveRanges(singletonList(range)), maintaining backward compatibility with a single code path.
  • Spark hot loop optimized: ChangeFeedBatch.planInputPartitions now calls the batch API once instead of calling extractForEffectiveRange per partition.
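
The "sort once, binary search per range" idea in the bullets above can be sketched as follows. This is a simplified illustration, not the code from this PR: `Token`, the `String[] {min, max}` range shape, and the `bound` helper are hypothetical stand-ins for the SDK's types, and the sketch assumes tokens cover contiguous, non-overlapping half-open ranges so that sorting by `min` also orders `max`.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Illustrative sketch only; Token and the String[] range shape are hypothetical
// stand-ins, not the SDK's CompositeContinuationToken or Range<String>.
final class BatchOverlapSketch {
    /** A continuation token covering the half-open key range [min, max). */
    public static final class Token {
        public final String min;
        public final String max;
        public final String continuation;

        public Token(String min, String max, String continuation) {
            this.min = min;
            this.max = max;
            this.continuation = continuation;
        }
    }

    /** Index of the first key >= target (strict=false) or > target (strict=true). */
    private static int bound(List<String> keys, String target, boolean strict) {
        int lo = 0;
        int hi = keys.size();
        while (lo < hi) {
            int mid = (lo + hi) >>> 1;
            int cmp = keys.get(mid).compareTo(target);
            if (cmp < 0 || (strict && cmp == 0)) {
                lo = mid + 1;
            } else {
                hi = mid;
            }
        }
        return lo;
    }

    /** Batch lookup: one sort, then two binary searches per requested range. */
    public static List<List<Token>> extractForRanges(List<Token> tokens, List<String[]> ranges) {
        List<Token> sorted = new ArrayList<>(tokens);
        sorted.sort(Comparator.comparing((Token t) -> t.min)); // O(T log T), done once
        List<String> mins = new ArrayList<>();
        List<String> maxes = new ArrayList<>();
        for (Token t : sorted) {
            mins.add(t.min);
            maxes.add(t.max);
        }
        List<List<Token>> result = new ArrayList<>();
        for (String[] range : ranges) {
            int first = bound(maxes, range[0], true); // first token ending after the range start
            int end = bound(mins, range[1], false);   // first token starting at or after the range end
            result.add(new ArrayList<>(sorted.subList(first, Math.max(first, end))));
        }
        return result;
    }

    /** Single-range lookup delegates to the batch method, so there is one code path. */
    public static List<Token> extractForRange(List<Token> tokens, String[] range) {
        return extractForRanges(tokens, Collections.singletonList(range)).get(0);
    }

    public static void main(String[] args) {
        List<Token> tokens = Arrays.asList(
            new Token("7F", "BF", "c"), new Token("", "3F", "a"),
            new Token("BF", "FF", "d"), new Token("3F", "7F", "b"));
        for (Token t : extractForRange(tokens, new String[] {"50", "90"})) {
            System.out.println("[" + t.min + ", " + t.max + ") -> " + t.continuation);
        }
    }
}
```

Because the single-range method is a thin wrapper over the batch method, a parity test between the two only needs to compare their outputs for the same range.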

Files Changed

  • ChangeFeedState.java: Added extractForEffectiveRanges, removed cache machinery (SortedTokensSnapshot, MinMaxAccumulator, collectOverlapping, findFirstPotentialOverlapIndex)
  • SparkBridgeImplementationInternal.scala: Added extractChangeFeedStateForRanges batch bridge method
  • ChangeFeedBatch.scala: Hot loop uses batch API
  • ChangeFeedStateTest.java: Updated tests for batch API, added single/batch parity test
  • CHANGELOG.md: Updated entry

Generated by coding-agent-harness

Annie Liang and others added 4 commits May 6, 2026 16:30
… O(T log T + P log T)

Cache sorted continuation tokens once per ChangeFeedState instance and use
binary search to find overlapping ranges, eliminating redundant copy+sort
per partition in Spark's planInputPartitions hot path.

Changes:
- Add SortedTokensSnapshot cache with identity-based invalidation
- Replace linear scan with binary search for first overlapping token
- Add fallback to full scan for non-contiguous/legacy ranges
- Add comprehensive tests: correctness, caching, large-scale (10k tokens),
  edge cases (null continuation, single token, boundary crossover, full range)

For 10,000 tokens and partitions, total extraction time drops from minutes
to ~400ms.

Implements Azure#49023

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…x partial-miss fallback, add test coverage

- F1: Fallback now scans indices before startIndex even when primary found results,
  catching partial misses from overlapping/non-contiguous ranges
- F2: Extracted shared collectOverlapping() method, eliminating DRY violation
- F3: Added fallback tests (complete miss + partial miss) with overlapping token ranges
- F4: Renamed misleading _noOverlap test to _lastTokenExactMatch, added real no-overlap test
- F5: Increased perf test threshold to 30s with CI-fragility comment
- F6: Wrapped cached sorted list with Collections.unmodifiableList()
- F7: Added comment explaining intentional reference equality for cache invalidation
- F8: Replaced size() > 0 with !isEmpty() via the new helper method
- F10: Added test with unsorted input to exercise the sort path

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
… thread-safety and early-break, mark cache transient

- F1: Add test verifying setContinuation() invalidates cached snapshot
- F2: Document benign race and thread-safety intent on volatile field
- F3: Document early-break contiguity assumption in collectOverlapping
- F4: Mark cachedSortedTokensSnapshot field transient
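
For readers unfamiliar with the caching approach these early commits describe (and which later commits in this PR removed), the sketch below shows one way an identity-invalidated, transient, volatile snapshot could look. It is a hypothetical illustration: the class, field, and method names are invented here, tokens are plain strings, and `sortCount` exists only so the behavior can be observed.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical sketch of a cache invalidated by reference identity;
// none of these names come from the SDK.
final class IdentityCacheSketch {
    /** Immutable pairing of the source list reference with its sorted copy. */
    private static final class Snapshot {
        final List<String> source;
        final List<String> sorted;

        Snapshot(List<String> source, List<String> sorted) {
            this.source = source;
            this.sorted = sorted;
        }
    }

    private volatile List<String> tokens;

    // Derived state: transient so it is never serialized, volatile so a
    // fully built snapshot is safely published to other threads.
    private transient volatile Snapshot cached;

    // Demo-only counter (not thread-safe) showing how often a sort happened.
    public int sortCount = 0;

    public IdentityCacheSketch(List<String> tokens) {
        this.tokens = tokens;
    }

    /** Replacing the list reference is what invalidates the cache. */
    public void setTokens(List<String> tokens) {
        this.tokens = tokens;
    }

    public List<String> sortedTokens() {
        List<String> current = this.tokens;
        Snapshot snap = this.cached;
        // Intentional reference comparison (!=), not equals(): a new list
        // instance means "rebuild", even if its contents happen to match.
        if (snap == null || snap.source != current) {
            List<String> copy = new ArrayList<>(current);
            Collections.sort(copy);
            snap = new Snapshot(current, Collections.unmodifiableList(copy));
            this.cached = snap; // concurrent callers may both rebuild; results are equal
            sortCount++;
        }
        return snap.sorted;
    }

    public static void main(String[] args) {
        IdentityCacheSketch state = new IdentityCacheSketch(new ArrayList<>(List.of("b", "a")));
        state.sortedTokens();
        state.sortedTokens();
        System.out.println("sorts after two reads: " + state.sortCount);
    }
}
```

The trade-off of reference equality is that mutating the source list in place goes undetected, which is one reason a design with no cached state at all is easier to reason about.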

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
…inuation throw test, document setContinuation contract, tighten fallback comment, replace String[] with MinMaxAccumulator, include elapsed time in assertion

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Copilot AI review requested due to automatic review settings May 7, 2026 01:01
@xinlian12 xinlian12 requested review from a team and kirankumarkolli as code owners May 7, 2026 01:01
@github-actions github-actions Bot added the Cosmos label May 7, 2026
Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Contributor

Copilot AI left a comment


Pull request overview

This PR optimizes Cosmos change feed partition planning by avoiding repeated sorting of continuation tokens during ChangeFeedState.extractForEffectiveRange, reducing planning complexity for large numbers of feed ranges/tokens.

Changes:

  • Added a lazily-initialized, cached sorted snapshot of continuation tokens and a binary-search-based overlap scan in ChangeFeedState.
  • Added/extended unit tests to validate correctness across repeated calls, edge cases, and fallback behavior.
  • Documented the performance fix in the Cosmos SDK changelog.

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 4 comments.

| File | Description |
| --- | --- |
| sdk/cosmos/azure-cosmos/src/main/java/com/azure/cosmos/implementation/changefeed/common/ChangeFeedState.java | Introduces cached sorted-token snapshot, binary search start index, and overlap-collection logic with fallback. |
| sdk/cosmos/azure-cosmos/CHANGELOG.md | Adds a bug-fix entry describing the planning performance improvement. |
| sdk/cosmos/azure-cosmos-tests/src/test/java/com/azure/cosmos/implementation/ChangeFeedStateTest.java | Adds extensive tests for correctness, caching behavior, and fallback scenarios (includes a large-scale timing assertion). |

Comment thread sdk/cosmos/azure-cosmos/CHANGELOG.md Outdated
…tability comment, reduce perf test scale

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@xinlian12 xinlian12 changed the title fix: optimize ChangeFeed partition planning from O(P*T log T) to O(T log T + P log T) [NO REVIEW]fix: optimize ChangeFeed partition planning from O(P*T log T) to O(T log T + P log T) May 7, 2026
Annie Liang and others added 2 commits May 6, 2026 20:24
…ectiveRanges

- Add extractForEffectiveRanges(List<Range>) that sorts tokens once and
  uses binary search per range, following InMemoryCollectionRoutingMap pattern
- Single-range extractForEffectiveRange delegates to batch method
- Remove volatile cachedSortedTokensSnapshot, SortedTokensSnapshot,
  MinMaxAccumulator, collectOverlapping, findFirstPotentialOverlapIndex
- Update Spark ChangeFeedBatch hot loop to use batch API
- Add batch bridge method extractChangeFeedStateForRanges in SparkBridgeImplementationInternal
- Add parity test between single and batch extraction
- Complexity: O(T log T + P log T) with zero internal mutable state

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Simulates realistic Spark planning with 30K feed ranges. Compares:
- Batch API (extractForEffectiveRanges): sort once + binary search per range
- Single-call loop (extractForEffectiveRange per range): sort per call
Logs timing for manual inspection and asserts batch completes < 30s.
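
A comparison of this shape can be sketched in isolation. The harness below is a scaled-down, hypothetical stand-in for the test described above: it uses plain string keys and exact-match lookups instead of feed ranges and overlap detection, defaults to 2,000 keys rather than 30K, and only prints timings rather than asserting on them.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Random;

// Hypothetical micro-harness; it mimics "sort per call" vs "sort once",
// not the SDK's actual extraction logic.
final class PlanningBenchmarkSketch {
    /** Builds n distinct zero-padded keys in shuffled order (fixed seed for repeatability). */
    public static List<String> shuffledKeys(int n) {
        List<String> keys = new ArrayList<>(n);
        for (int i = 0; i < n; i++) {
            keys.add(String.format("%08d", i));
        }
        Collections.shuffle(keys, new Random(42));
        return keys;
    }

    /** Single-call loop: copy and sort the full key list for every query. */
    public static int perCallLoop(List<String> keys, List<String> queries) {
        int hits = 0;
        for (String query : queries) {
            List<String> sorted = new ArrayList<>(keys);
            Collections.sort(sorted);
            if (Collections.binarySearch(sorted, query) >= 0) {
                hits++;
            }
        }
        return hits;
    }

    /** Batch: sort once, then one binary search per query. */
    public static int batch(List<String> keys, List<String> queries) {
        List<String> sorted = new ArrayList<>(keys);
        Collections.sort(sorted);
        int hits = 0;
        for (String query : queries) {
            if (Collections.binarySearch(sorted, query) >= 0) {
                hits++;
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        int n = args.length > 0 ? Integer.parseInt(args[0]) : 2_000;
        List<String> keys = shuffledKeys(n);
        long t0 = System.nanoTime();
        int batchHits = batch(keys, keys);
        long t1 = System.nanoTime();
        int loopHits = perCallLoop(keys, keys);
        long t2 = System.nanoTime();
        System.out.printf("n=%d batch=%d ms perCall=%d ms hits=%d/%d%n",
            n, (t1 - t0) / 1_000_000, (t2 - t1) / 1_000_000, batchHits, loopHits);
    }
}
```

Wall-clock numbers from a harness like this depend heavily on JIT warm-up and machine load, which is why a generous threshold and logged timings are a reasonable choice for CI.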

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@xinlian12 xinlian12 force-pushed the fix/issue-49023-changefeed-planning-perf branch from 6ead43b to 36468c3 Compare May 7, 2026 04:05
…tedTokensAndRanges

- Replace two binary searches (minIndex + maxIndex) with single binary search
  for start position + forward scan with early break on non-overlap
- Remove SortedTokensAndRanges inner class; use List<CompositeContinuationToken> directly
- Use CompositeContinuationToken(null, range) as binary search key (null token is valid)
- Equally efficient for non-overlapping contiguous ranges (Cosmos DB contract)
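
The "single binary search plus forward scan" variant described in this commit might look roughly like the sketch below. It is an assumption-laden illustration rather than the committed code: `Token` is a hypothetical stand-in for CompositeContinuationToken, the probe is a token with a null continuation used purely as a search key, and the input list is assumed to be already sorted by `min` with contiguous, non-overlapping ranges.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Collections;
import java.util.Comparator;
import java.util.List;

// Hypothetical sketch; Token is not the SDK's CompositeContinuationToken.
final class ForwardScanSketch {
    /** A continuation token covering the half-open key range [min, max). */
    public static final class Token {
        public final String min;
        public final String max;
        public final String continuation; // may be null, e.g. for a search probe

        public Token(String min, String max, String continuation) {
            this.min = min;
            this.max = max;
            this.continuation = continuation;
        }
    }

    private static final Comparator<Token> BY_MIN = Comparator.comparing((Token t) -> t.min);

    /** Precondition: sortedTokens is ordered by min, contiguous, and non-overlapping. */
    public static List<Token> overlapping(List<Token> sortedTokens, String lo, String hi) {
        // A probe with a null continuation acts only as a binary search key.
        Token probe = new Token(lo, hi, null);
        int idx = Collections.binarySearch(sortedTokens, probe, BY_MIN);
        // Exact hit: start there. Miss: idx == -(insertionPoint) - 1, and the
        // token just before the insertion point may still contain lo.
        int start = idx >= 0 ? idx : Math.max(0, -idx - 2);

        List<Token> result = new ArrayList<>();
        for (int i = start; i < sortedTokens.size(); i++) {
            Token t = sortedTokens.get(i);
            if (t.min.compareTo(hi) >= 0) {
                break; // early break: every later token starts even further right
            }
            if (t.max.compareTo(lo) > 0) {
                result.add(t);
            }
        }
        return result;
    }

    public static void main(String[] args) {
        List<Token> sorted = Arrays.asList(
            new Token("", "3F", "a"), new Token("3F", "7F", "b"),
            new Token("7F", "BF", "c"), new Token("BF", "FF", "d"));
        for (Token t : overlapping(sorted, "50", "90")) {
            System.out.println("[" + t.min + ", " + t.max + ") -> " + t.continuation);
        }
    }
}
```

The early break is only safe under the contiguity precondition; with overlapping or unsorted ranges, a token further right could still intersect the query.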

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
@xinlian12
Member Author

Closing to recreate with a cleaner approach: the batch API with a simplified implementation.

@xinlian12 xinlian12 closed this May 7, 2026
@xinlian12 xinlian12 deleted the fix/issue-49023-changefeed-planning-perf branch May 7, 2026 04:38


Development

Successfully merging this pull request may close these issues.

[BUG]: Spark Cosmos connector has quadratic behaviour on planning based on partitions/continuation tokens

2 participants