fix(cubestore): repartition merge commits with unchecked swap (fixes RepartitionRange row-count regression) by paveltiunov · Pull Request #11091 · cube-js/cube

paveltiunov · 2026-06-16T04:03:09Z

Summary

PR #11088 introduced a merge-based repartition path (per_partition and range strategies). In production, with the range strategy enabled, this caused a flood of failing RepartitionRange jobs:

Error("Internal: Deactivated row count (484102) doesn't match activated row count (360063) during swap of (5907991, 5908005) to (5971515, 5971547, 5971587) chunks")

Root cause

The new merge path (ChunkStore::merge_chunk_group_into_children) k-way merges a group of source chunks through merge_chunks, then commits the new child chunks. merge_chunks aggregates (for aggregate indexes — i.e. pre-aggregation rollups) and last-row-dedups (for unique-key tables) the group, so it legitimately emits fewer rows than it consumed.

The commit used the checked swap_chunks, which enforces activated_row_count == deactivated_row_count. For any aggregate/unique-key table the merge collapses rows, so the check rejected the swap and the repartition job failed permanently — the inactive parent never drained.

The legacy per-chunk path (partition_rows) never aggregated (it only routes rows by partition key), so the row count was always conserved there and the check passed — which is why this only surfaced on the new merge strategies.
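To make the row-count collapse concrete, here is a minimal sketch of the two merge behaviors described above. It is not CubeStore's merge_chunks: the `Row` tuple, `merge_aggregate`, `merge_last_row`, and `consumed_vs_emitted` are hypothetical names made up for illustration, and the real merge works on sorted columnar batches rather than in-memory maps.

```rust
use std::collections::BTreeMap;

// Hypothetical row shape: (dimension key, measure). Not CubeStore's real row type.
type Row = (u32, i64);

/// Aggregating merge: rows sharing a key collapse into one row with a summed measure.
fn merge_aggregate(chunks: &[Vec<Row>]) -> Vec<Row> {
    let mut acc: BTreeMap<u32, i64> = BTreeMap::new();
    for &(key, m) in chunks.iter().flatten() {
        *acc.entry(key).or_insert(0) += m;
    }
    acc.into_iter().collect()
}

/// Last-row dedup: for a unique key, the row seen last replaces earlier ones.
fn merge_last_row(chunks: &[Vec<Row>]) -> Vec<Row> {
    let mut acc: BTreeMap<u32, i64> = BTreeMap::new();
    for &(key, m) in chunks.iter().flatten() {
        acc.insert(key, m);
    }
    acc.into_iter().collect()
}

/// Returns (rows consumed, rows emitted) for two chunks that share every key.
fn consumed_vs_emitted() -> (usize, usize) {
    let a: Vec<Row> = (0..10).map(|k| (k, 1)).collect();
    let b: Vec<Row> = (0..10).map(|k| (k, 2)).collect();
    let consumed = a.len() + b.len();
    let emitted = merge_aggregate(&[a, b]).len();
    (consumed, emitted)
}
```

With two 10-row chunks that share every key, the sketch consumes 20 rows and emits 10, the same shape as the mismatch in the unit test. A pure routing step such as partition_rows would emit exactly as many rows as it consumed.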

Fix

Commit the merge with swap_chunks_without_check, exactly as compaction already does for its own dedup'd merges (compact_chunks_to_in_memory / compact_chunks_to_persistent). The row-count equality invariant does not hold for an aggregating/deduping merge, so enforcing it is incorrect for this path.

        self.meta_store
            .swap_chunks_without_check(old_chunk_ids, new_chunk_ids, replay_handle_id)
            .await?;
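The difference between the two swap variants can be sketched with a toy in-memory model. Everything here is an assumption for illustration: `ToyMetaStore`, `Chunk`, `swap`, and `demo_store` are hypothetical, and the real metastore calls are async, take a replay handle id, and persist their state.

```rust
use std::collections::HashMap;

// Toy model only: CubeStore's real metastore API and chunk records differ.
#[derive(Debug, Clone, Copy)]
struct Chunk {
    row_count: u64,
    active: bool,
}

#[derive(Default)]
struct ToyMetaStore {
    chunks: HashMap<u64, Chunk>,
}

impl ToyMetaStore {
    fn add(&mut self, id: u64, row_count: u64, active: bool) {
        self.chunks.insert(id, Chunk { row_count, active });
    }

    // Sums row counts of the given chunk ids; panics on an unknown id (toy behavior).
    fn rows(&self, ids: &[u64]) -> u64 {
        ids.iter().map(|id| self.chunks[id].row_count).sum()
    }

    /// Deactivates `old` and activates `new`.
    /// With `check` set, refuses the swap when the row counts differ.
    fn swap(&mut self, old: &[u64], new: &[u64], check: bool) -> Result<(), String> {
        let (deactivated, activated) = (self.rows(old), self.rows(new));
        if check && deactivated != activated {
            return Err(format!(
                "Deactivated row count ({}) doesn't match activated row count ({})",
                deactivated, activated
            ));
        }
        for id in old {
            self.chunks.get_mut(id).unwrap().active = false;
        }
        for id in new {
            self.chunks.get_mut(id).unwrap().active = true;
        }
        Ok(())
    }
}

/// Two active 10-row parents (1, 2) and two inactive 5-row merged children (3, 4).
fn demo_store() -> ToyMetaStore {
    let mut store = ToyMetaStore::default();
    store.add(1, 10, true);
    store.add(2, 10, true);
    store.add(3, 5, false);
    store.add(4, 5, false);
    store
}
```

In this model, `swap(&[1, 2], &[3, 4], true)` is rejected and leaves the parents active, while the same call with `check` set to `false` commits the children. That is the behavior the merge path needs once the merge is allowed to collapse rows.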

Tests

Following the request to reproduce the error first, then fix it:

Unit test store::tests::repartition_merge_aggregate_index_collapses_rows: builds an aggregate index whose two source chunks share every key, drives repartition_chunk_range, and asserts the children conserve the aggregated row count. Before the fix this panics with the exact production error: Deactivated row count (20) doesn't match activated row count (10) during swap of (1, 2) to (3, 4) chunks.
End-to-end SQL tests repartition_range_jobs_aggregate_index_keeps_data_consistent and repartition_merge_aggregate_index_keeps_data_consistent: aggregate-index table, dimension keys repeated across inserts, forced split, full scheduler → range-slicing → job-runner path. Before the fix the inactive parents never drain (jobs keep failing the swap); after it sum(m) and every per-key sum are conserved.

All 15 repartition* unit/SQL tests pass with the fix; the three new tests fail without it.
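Since row counts are no longer conserved, the end-to-end tests assert on measure sums instead. A rough sketch of that kind of consistency check follows. The helper names (`per_key_sums`, `data_consistent`, `total_rows`) and the tuple row shape are hypothetical; the actual tests run SQL against the table rather than comparing in-memory chunks.

```rust
use std::collections::BTreeMap;

// Hypothetical row shape: (dimension key, measure m).
type Row = (u32, i64);

/// Sums the measure per dimension key across all chunks.
fn per_key_sums(chunks: &[Vec<Row>]) -> BTreeMap<u32, i64> {
    let mut sums = BTreeMap::new();
    for &(key, m) in chunks.iter().flatten() {
        *sums.entry(key).or_insert(0) += m;
    }
    sums
}

/// Total number of rows across all chunks.
fn total_rows(chunks: &[Vec<Row>]) -> usize {
    chunks.iter().map(|c| c.len()).sum()
}

/// True when every per-key sum (and therefore the overall sum) is conserved,
/// regardless of how many rows each side holds.
fn data_consistent(parents: &[Vec<Row>], children: &[Vec<Row>]) -> bool {
    per_key_sums(parents) == per_key_sums(children)
}
```

For example, parents `[(1, 5), (2, 7)]` and `[(1, 9), (2, 1)]` hold four rows, while range-split children `[(1, 14)]` and `[(2, 8)]` hold two, yet `data_consistent` accepts them because each key's sum is unchanged.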

Risk

Mirrors the long-standing compaction behavior; default strategy remains per_chunk (unchanged). No metastore/schema changes.

The merge-based repartition path (per_partition / range strategies) commits its new chunks with swap_chunks, which enforces that the activated row count equals the deactivated row count. merge_chunks aggregates (aggregate indexes) and last-row-dedups (unique-key tables) the source group, so it legitimately emits fewer rows than it consumed. The checked swap then rejected the commit with "Deactivated row count (..) doesn't match activated row count (..) during swap", failing RepartitionRange / per-partition jobs.

Commit with swap_chunks_without_check instead, matching how compaction commits its dedup'd merges. Adds a unit test on an aggregate index that reproduces the exact failure before the fix.

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

…dedup

Adds range- and per-partition-strategy SQL tests on an aggregate-index table whose chunks share dimension keys across inserts. The repartition merge groups those rows by the sort key, so the swap activates fewer rows than it deactivates: the production RepartitionRange row-count-mismatch scenario. Without the unchecked-swap fix the repartition jobs never drain the inactive parents; with it the data stays consistent (sum(m) and per-key sums conserved).

Co-authored-by: Pavel Tiunov <pavel.tiunov@gmail.com>

codecov · 2026-06-16T04:28:08Z

Codecov Report

✅ All modified and coverable lines are covered by tests.
⚠️ Please upload report for BASE (cubestore-chunk-repartition-speed-up@1ed20eb). Learn more about missing BASE report.

Additional details and impacted files

@@                           Coverage Diff                           @@
##             cubestore-chunk-repartition-speed-up   #11091   +/-   ##
=======================================================================
  Coverage                                        ?   58.50%           
=======================================================================
  Files                                           ?      216           
  Lines                                           ?    17270           
  Branches                                        ?     3524           
=======================================================================
  Hits                                            ?    10103           
  Misses                                          ?     6652           
  Partials                                        ?      515

Flag	Coverage Δ
cube-backend	`58.50% <ø> (?)`


cursoragent and others added 2 commits June 16, 2026 03:55

github-actions Bot added the "cube store" (Issues relating to Cube Store) and "rust" (Pull requests that update Rust code) labels Jun 16, 2026

paveltiunov wants to merge 2 commits into cubestore-chunk-repartition-speed-up from cursor/fix-repartition-row-count-regression-b143
