Skip to content

CLI to merge mature splits#29

Open
rdettai-sk wants to merge 10 commits intosekoiafrom
mature-merger
Open

CLI to merge mature splits#29
rdettai-sk wants to merge 10 commits intosekoiafrom
mature-merger

Conversation

@rdettai-sk
Copy link
Copy Markdown
Collaborator

Description

A cli tool that merges mature splits. It can be made into a service into the future.

How was this PR tested?

Tested on test and has some pretty good unit tests.

@rdettai-sk rdettai-sk changed the base branch from main to sekoia April 12, 2026 15:26
Copilot AI review requested due to automatic review settings April 12, 2026 20:14
Copy link
Copy Markdown

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

Adds a one-shot “mature merge” capability to Quickwit to merge fully-mature published splits across nodes, exposed via a new quickwit tool merge-mature CLI subcommand.

Changes:

  • Introduces a mature-merge planner (mature_merge_plan) that groups eligible mature splits (by day/partition/doc-mapping) into merge operations.
  • Implements an execution runner (mature_merge) that reuses the existing merge actor pipeline to run planned operations across indexes with configurable concurrency and dry-run output.
  • Extends CLI tooling and a few actor utilities/counters to support the new workflow (publisher counters, temp dir handling, merge permits).

Reviewed changes

Copilot reviewed 7 out of 7 changed files in this pull request and generated 10 comments.

Show a summary per file
File Description
quickwit/quickwit-indexing/src/mature_merge.rs New executor that scans indexes, plans merges, and runs the merge actor pipeline (plus dry-run logging and summary).
quickwit/quickwit-indexing/src/mature_merge_plan.rs New pure planning logic and unit tests for selecting/grouping eligible mature splits.
quickwit/quickwit-indexing/src/lib.rs Exposes the new mature-merge modules from quickwit-indexing.
quickwit/quickwit-indexing/src/actors/publisher.rs Adds counter for number of replaced splits and updates tests accordingly.
quickwit/quickwit-indexing/src/actors/merge_split_downloader.rs Uses TempDirectory::named_temp_child for scratch subdirectories.
quickwit/quickwit-indexing/src/actors/merge_scheduler_service.rs Adds MergePermit::new helper for externally managed concurrency.
quickwit/quickwit-cli/src/tool.rs Adds merge-mature subcommand, argument parsing, and optional metrics serving.

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

Comment thread quickwit/quickwit-indexing/src/mature_merge_plan.rs Outdated
Comment thread quickwit/quickwit-indexing/src/mature_merge_plan.rs Outdated
Comment thread quickwit/quickwit-indexing/src/mature_merge_plan.rs Outdated
Comment thread quickwit/quickwit-cli/src/tool.rs
Comment thread quickwit/quickwit-cli/src/tool.rs
Comment thread quickwit/quickwit-cli/src/tool.rs Outdated
Comment thread quickwit/quickwit-indexing/src/mature_merge.rs Outdated
}

Ok(IndexMergeOutcome {
num_published_merges: publisher_counters.num_replace_operations,
Copy link

Copilot AI Apr 12, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

num_published_merges is derived from publisher_counters.num_replace_operations, which may undercount merges that legitimately publish no new split (e.g. a merge that results in an empty output but still replaces input splits). Consider using operations.len() on success, or incorporate publisher_counters.num_empty_splits (or another explicit counter) so the reported merge count matches executed operations.

Suggested change
num_published_merges: publisher_counters.num_replace_operations,
num_published_merges: publisher_counters.num_replace_operations
+ publisher_counters.num_empty_splits,

Copilot uses AI. Check for mistakes.
Comment thread quickwit/quickwit-indexing/src/mature_merge_plan.rs Outdated
Comment thread quickwit/quickwit-indexing/src/mature_merge.rs
Comment thread quickwit/quickwit-cli/src/tool.rs
Comment on lines +117 to +120
let list_splits_query =
ListSplitsQuery::for_index(index_uid).with_split_state(SplitState::Published);
let list_splits_request = ListSplitsRequest::try_from_list_splits_query(&list_splits_query)?;
let splits_stream = metastore.list_splits(list_splits_request).await?;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Wouldn't it be more efficient to filter as much as possible directly in the metastore ?

Copy link
Copy Markdown
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I have added a filter on mature splits, but for the rest (e.g split sizes) I would need to add a method to the metastore. I think we can do that as an optimization later.

Comment thread quickwit/quickwit-indexing/src/mature_merge.rs Outdated
Comment thread quickwit/quickwit-indexing/src/mature_merge_plan.rs Outdated
Comment thread quickwit/quickwit-indexing/src/mature_merge_plan.rs Outdated
Comment thread quickwit/quickwit-indexing/src/mature_merge_plan.rs Outdated
Comment thread quickwit/quickwit-indexing/src/mature_merge_plan.rs Outdated
Comment thread quickwit/quickwit-indexing/src/mature_merge_plan.rs Outdated
continue;
}

let day_bucket = start_day * SECS_PER_DAY;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Why is it needed ?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants