Skip to content

feat(drive): expand count-index group-by carrier shapes (G1a/G1b/G8a-c)#3652

Open
QuantumExplorer wants to merge 1 commit into
v3.1-devfrom
feat/count-index-group-by-carrier-shapes
Open

feat(drive): expand count-index group-by carrier shapes (G1a/G1b/G8a-c)#3652
QuantumExplorer wants to merge 1 commit into
v3.1-devfrom
feat/count-index-group-by-carrier-shapes

Conversation

@QuantumExplorer
Copy link
Copy Markdown
Member

@QuantumExplorer QuantumExplorer commented May 15, 2026

Issue being fixed or feature implemented

Extends the count-index group-by surface introduced alongside #3633 with new fixture coverage for edge cases of the carrier-aggregate-on-range (ACOR) and In-fanout paths, plus the rs-drive infrastructure those fixtures exercise.

What was done?

New bench fixtures + matching rs-drive support + book chapter:

  • G1aIn on byBrand with one absent value. Proof grows by ~255 B for the absence subproof; verifier omits the absent branch from Entries.
  • G1b — high-fanout In on byBrand (|IN| = B = 100). Same shape as G1, scaled up; reveals every byBrand entry.
  • G8a — bounded carrier + bounded ACOR (range on both axes) with descending walk. Same carrier shape as G8, different op variants on both range commitments.
  • G8b / G8c — rejection shapes for the two-range carrier with group_by = [brand, color] and group_by = []. Carrier is opened only for GroupByRange + single-field group_by.

Touched code:

  • packages/rs-drive/src/query/drive_document_count_query/drive_dispatcher.rs — mode-detection from clauses + per-mode dispatch for the new variants.
  • packages/rs-drive/src/query/drive_document_count_query/{execute_range_count,path_query,tests}.rs and executors/range_aggregate_carrier_proof.rs — executor side.
  • packages/rs-drive/src/query/mod.rs — query plumbing.
  • packages/rs-drive/src/verify/document_count/verify_carrier_aggregate_count_proof/{mod,v0/mod}.rs — verifier side.
  • packages/rs-drive/src/drive/contract/insert/insert_contract/v0/mod.rs — count-tree wiring needed by the new fixtures.
  • packages/rs-drive/benches/document_count_worst_case.rs — fixture bodies.
  • book/src/drive/count-index-group-by-examples.md — chapter expanded with the new G* sections (proof sizes, verified shapes, tree-walk descriptions).

How Has This Been Tested?

  • cargo test -p drive (executor + verifier paths for the new variants)
  • Bench fixture (document_count_worst_case) regenerated; proof sizes match the documented G* numbers in the book chapter.

Breaking Changes

None. Drive-internal additions on top of the v1 count surface; no wire-protocol or consensus changes.

Checklist:

  • I have performed a self-review of my own code
  • I have commented my code, particularly in hard-to-understand areas
  • I have added or updated relevant unit/integration/functional/e2e tests
  • I have added "!" to the title and described breaking changes in the corresponding section if my code contains any
  • I have made corresponding changes to the documentation if needed

Summary by CodeRabbit

  • New Features

    • Added support for directional ordering (ascending and descending) in document count queries.
  • Improvements

    • Enhanced request handling for document counting with improved type safety.
    • Expanded test coverage for document count edge cases and worst-case scenarios.

Review Change Stack

New count-query fixtures and the rs-drive infrastructure that proves
them:

- G1a — In on byBrand with one absent value (proof grows by ~255 B
  absence subproof; verifier omits absent branches from Entries).
- G1b — high-fanout In on byBrand (|IN| = B = 100); same shape as G1
  scaled up.
- G8a — bounded carrier + bounded ACOR (range on both axes) with
  descending walk; same carrier shape as G8 with different op
  variants on both range commitments.
- G8b/G8c — rejection shapes for the two-range carrier with
  group_by = [brand, color] and group_by = [] respectively (carrier
  is opened only for GroupByRange + single-field group_by).

Touches the dispatcher, range-aggregate carrier proof executor,
path-query builder, verify side, and contract-insert wiring for the
new count-tree variants. Book chapter updated to match.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions github-actions Bot added this to the v3.1.0 milestone May 15, 2026
@github-actions
Copy link
Copy Markdown
Contributor

📖 Book Preview built successfully.

Download the preview from the workflow artifacts.
To view locally: download the artifact, unzip, and open index.html.

Updated at 2026-05-15T13:37:43.834Z

@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented May 15, 2026

📝 Walkthrough

Walkthrough

This PR migrates the document count request API from CBOR-encoded raw values to structured typed clauses, adds directional traversal support for descending order-by scenarios, and expands benchmark coverage for group-by proof cases. The dispatcher validates and canonicalizes where clauses, merging same-field range pairs; proof generation now respects iteration direction via a left_to_right parameter threaded through the carrier aggregate pipeline.

Changes

Document Count Request API Modernization

Layer / File(s) Summary
DocumentCountRequest schema and dispatcher wiring
packages/rs-drive/src/query/drive_document_count_query/drive_dispatcher.rs
DocumentCountRequest struct changed from raw_where_value/raw_order_by_value to where_clauses/order_clauses fields; execute_document_count_request validates and uses typed clauses directly; helper parsing functions where_clauses_from_value and order_clauses_from_value made public.
Range clause canonicalization
packages/rs-drive/src/query/drive_document_count_query/drive_dispatcher.rs
validate_and_canonicalize_where_clauses expanded to merge same-field range pairs into single between*-style clauses via new merge_same_field_range_pairs logic while preserving validation for edge cases and non-range clauses.
Directional traversal for carrier aggregate proofs
packages/rs-drive/src/query/drive_document_count_query/execute_range_count.rs, packages/rs-drive/src/query/drive_document_count_query/executors/range_aggregate_carrier_proof.rs, packages/rs-drive/src/query/drive_document_count_query/path_query.rs, packages/rs-drive/src/verify/document_count/verify_carrier_aggregate_count_proof/mod.rs, packages/rs-drive/src/verify/document_count/verify_carrier_aggregate_count_proof/v0/mod.rs
Carrier aggregate proof generation and verification now accept and thread left_to_right: bool parameter; carrier_aggregate_count_path_query constructs outer Query with Query::new_with_direction(left_to_right) instead of default nondirectional constructor; dispatcher derives direction from order_by_ascending.
DriveDocumentQuery constructor refactoring
packages/rs-drive/src/query/mod.rs
from_decomposed_values refactored to parse orderBy into Vec<OrderClause> and delegate to new from_typed_clauses helper; new public constructor centralizes limit validation, InternalClauses extraction from typed clauses, and conversion to IndexMap required by DriveDocumentQuery.
Test updates to typed clause API
packages/rs-drive/src/query/drive_document_count_query/tests.rs, packages/rs-drive/src/drive/contract/insert/insert_contract/v0/mod.rs
Count tests updated to construct DocumentCountRequest with structured where_clauses: Vec<WhereClause> instead of raw_where_value; affected tests include aggregate fan-out limit, compound range+In, duplicate equality clauses, and range distinct proof scenarios; test assertions and proof flows unchanged.
Benchmark expansion for group-by coverage
packages/rs-drive/benches/document_count_worst_case.rs
Benchmark added G8a/G8b/G8c group-by matrix cases with descending order-by; case tuples extended to include raw_order_by; count_request builder updated to parse where/order_by Values into typed clauses via dispatcher parsing helpers; proof display loop updated to pass per-case raw_order_by for new case shapes.

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~25 minutes

Possibly related issues

Suggested reviewers

  • thepastaclaw
  • shumkov

Poem

🐰 Typed clauses dance through the dispatcher,
Left-to-right proofs twirl in descending grace,
Range pairs merge to a single, perfect state—
Where validity blooms and benchmarks shine bright! 📊✨

🚥 Pre-merge checks | ✅ 5
✅ Passed checks (5 passed)
Check name Status Explanation
Description Check ✅ Passed Check skipped - CodeRabbit’s high-level summary is enabled.
Title check ✅ Passed The title accurately reflects the main change: expanding count-index group-by carrier shapes with specific new shape designations (G1a/G1b/G8a-c).
Docstring Coverage ✅ Passed Docstring coverage is 100.00% which is sufficient. The required threshold is 80.00%.
Linked Issues check ✅ Passed Check skipped because no linked issues were found for this pull request.
Out of Scope Changes check ✅ Passed Check skipped because no linked issues were found for this pull request.

✏️ Tip: You can configure your own custom pre-merge checks in the settings.

✨ Finishing Touches
📝 Generate docstrings
  • Create stacked PR
  • Commit on current branch
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch feat/count-index-group-by-carrier-shapes

Thanks for using CodeRabbit! It's free for OSS, and your support helps us grow. If you like it, consider giving us a shout-out.

❤️ Share

Comment @coderabbitai help to get the list of available commands and usage tips.

@thepastaclaw
Copy link
Copy Markdown
Collaborator

thepastaclaw commented May 15, 2026

Review Gate

Commit: 54a06621

  • Debounce: 1766m ago (need 30m)

  • CI checks: build failure: Rust workspace tests / Tests (macOS)

  • CodeRabbit review: comment found

  • Off-peak hours: off-peak (12:03 PM PT Saturday)

  • Run review now (check to override)

Copy link
Copy Markdown
Contributor

@coderabbitai coderabbitai Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Actionable comments posted: 1

Caution

Some comments are outside the diff and can’t be posted inline due to platform limitations.

⚠️ Outside diff range comments (1)
packages/rs-drive/src/query/mod.rs (1)

784-801: ⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Stop silently dropping malformed orderBy clauses.

Line [789] currently uses filter_map with .ok(), so invalid orderBy entries are ignored instead of rejected. This can change query semantics silently instead of returning a syntax error.

💡 Suggested fix
-        let order_by_clauses: Vec<OrderClause> = order_by
-            .map(|id_cbor| {
-                if let Value::Array(clauses) = id_cbor {
-                    clauses
-                        .iter()
-                        .filter_map(|order_clause| {
-                            if let Value::Array(clauses_components) = order_clause {
-                                OrderClause::from_components(clauses_components).ok()
-                            } else {
-                                None
-                            }
-                        })
-                        .collect()
-                } else {
-                    Vec::new()
-                }
-            })
-            .unwrap_or_default();
+        let order_by_clauses: Vec<OrderClause> = match order_by {
+            None => Vec::new(),
+            Some(Value::Array(clauses)) => clauses
+                .iter()
+                .map(|order_clause| match order_clause {
+                    Value::Array(clauses_components) => {
+                        OrderClause::from_components(clauses_components).map_err(Error::from)
+                    }
+                    _ => Err(Error::Query(QuerySyntaxError::InvalidOrderByProperties(
+                        "order clauses must be [field, \"asc\"|\"desc\"] arrays",
+                    ))),
+                })
+                .collect::<Result<Vec<_>, Error>>()?,
+            Some(_) => {
+                return Err(Error::Query(QuerySyntaxError::InvalidOrderByProperties(
+                    "order clauses must be an array",
+                )));
+            }
+        };
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/rs-drive/src/query/mod.rs` around lines 784 - 801, The code
currently builds order_by_clauses by using filter_map(...).ok(), which silently
drops malformed orderBy entries; instead, change the logic around
OrderClause::from_components to validate every clause and return an error when
any clause fails parsing. Replace the filter_map + collect with mapping to
Result<OrderClause, _> (using OrderClause::from_components) and collect into a
Result<Vec<OrderClause>, _>, then propagate or map the error so order_by_clauses
is only produced on success; reference OrderClause::from_components,
order_by_clauses, and the matching on Value::Array to locate the change.
🧹 Nitpick comments (2)
packages/rs-drive/src/query/drive_document_count_query/execute_range_count.rs (1)

450-460: ⚡ Quick win

Document left_to_right as a proof-shaping argument.

This new parameter affects the serialized PathQuery used by prover/verifier parity, but the method docs don’t describe its behavior yet. Please add a short arg note and ordering semantics to keep this API self-describing.

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@packages/rs-drive/src/query/drive_document_count_query/execute_range_count.rs`
around lines 450 - 460, Add a short doc comment to
execute_carrier_aggregate_count_with_proof explaining that the left_to_right
boolean is a proof-shaping argument that controls the ordering of entries in the
serialized PathQuery (affecting prover/verifier parity), specify which ordering
corresponds to true vs false (e.g., left-to-right vs right-to-left), and note
that this must match callers of carrier_aggregate_count_path_query to ensure
consistent proof serialization and verification across prover and verifier.
packages/rs-drive/src/verify/document_count/verify_carrier_aggregate_count_proof/mod.rs (1)

10-13: ⚡ Quick win

Update verifier docs for directional traversal.

The API now takes left_to_right, but the doc text still describes lex-ascending output unconditionally. Please align the contract text with direction-dependent ordering.

Also applies to: 32-51

🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In
`@packages/rs-drive/src/verify/document_count/verify_carrier_aggregate_count_proof/mod.rs`
around lines 10 - 13, The doc comment for the verifier function that returns
(root_hash, per_key_counts) is out of sync with the updated API: it still claims
lex-ascending output unconditionally but the function now accepts a
left_to_right parameter. Update the function/module docs (the comment above the
carrier AggregateCountOnRange verifier in mod.rs, and the similar block
referenced at lines 32–51) to state that per_key_counts ordering depends on
left_to_right (e.g., left_to_right == true => serialized lexicographic ascending
order, left_to_right == false => serialized lexicographic descending order), and
mention the left_to_right parameter name in the contract so callers know the
ordering behavior.
🤖 Prompt for all review comments with AI agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

Inline comments:
In `@packages/rs-drive/benches/document_count_worst_case.rs`:
- Around line 1056-1077: The matrix row for G8a isn't actually exercising
descending walks because raw_order_by isn't threaded through the matrix runner:
update report_group_by_matrix to accept and propagate raw_order_by alongside
raw_where, ensure the MatrixCase struct/instances (the new G8a entry) include
the appropriate raw_order_by value (representing
descending/left_to_right=false), and modify drive_count_outcome (the code that
builds the request) to use the passed raw_order_by instead of Value::Null when
constructing the request so the order-sensitive path (left_to_right=false) is
truly tested.

---

Outside diff comments:
In `@packages/rs-drive/src/query/mod.rs`:
- Around line 784-801: The code currently builds order_by_clauses by using
filter_map(...).ok(), which silently drops malformed orderBy entries; instead,
change the logic around OrderClause::from_components to validate every clause
and return an error when any clause fails parsing. Replace the filter_map +
collect with mapping to Result<OrderClause, _> (using
OrderClause::from_components) and collect into a Result<Vec<OrderClause>, _>,
then propagate or map the error so order_by_clauses is only produced on success;
reference OrderClause::from_components, order_by_clauses, and the matching on
Value::Array to locate the change.

---

Nitpick comments:
In
`@packages/rs-drive/src/query/drive_document_count_query/execute_range_count.rs`:
- Around line 450-460: Add a short doc comment to
execute_carrier_aggregate_count_with_proof explaining that the left_to_right
boolean is a proof-shaping argument that controls the ordering of entries in the
serialized PathQuery (affecting prover/verifier parity), specify which ordering
corresponds to true vs false (e.g., left-to-right vs right-to-left), and note
that this must match callers of carrier_aggregate_count_path_query to ensure
consistent proof serialization and verification across prover and verifier.

In
`@packages/rs-drive/src/verify/document_count/verify_carrier_aggregate_count_proof/mod.rs`:
- Around line 10-13: The doc comment for the verifier function that returns
(root_hash, per_key_counts) is out of sync with the updated API: it still claims
lex-ascending output unconditionally but the function now accepts a
left_to_right parameter. Update the function/module docs (the comment above the
carrier AggregateCountOnRange verifier in mod.rs, and the similar block
referenced at lines 32–51) to state that per_key_counts ordering depends on
left_to_right (e.g., left_to_right == true => serialized lexicographic ascending
order, left_to_right == false => serialized lexicographic descending order), and
mention the left_to_right parameter name in the contract so callers know the
ordering behavior.
🪄 Autofix (Beta)

Fix all unresolved CodeRabbit comments on this PR:

  • Push a commit to this branch (recommended)
  • Create a new PR with the fixes

ℹ️ Review info
⚙️ Run configuration

Configuration used: defaults

Review profile: CHILL

Plan: Pro

Run ID: b77c2ed8-5e9e-4952-95ee-0d33ab0e2f6d

📥 Commits

Reviewing files that changed from the base of the PR and between dfb6d84 and 54a0662.

📒 Files selected for processing (11)
  • book/src/drive/count-index-group-by-examples.md
  • packages/rs-drive/benches/document_count_worst_case.rs
  • packages/rs-drive/src/drive/contract/insert/insert_contract/v0/mod.rs
  • packages/rs-drive/src/query/drive_document_count_query/drive_dispatcher.rs
  • packages/rs-drive/src/query/drive_document_count_query/execute_range_count.rs
  • packages/rs-drive/src/query/drive_document_count_query/executors/range_aggregate_carrier_proof.rs
  • packages/rs-drive/src/query/drive_document_count_query/path_query.rs
  • packages/rs-drive/src/query/drive_document_count_query/tests.rs
  • packages/rs-drive/src/query/mod.rs
  • packages/rs-drive/src/verify/document_count/verify_carrier_aggregate_count_proof/mod.rs
  • packages/rs-drive/src/verify/document_count/verify_carrier_aggregate_count_proof/v0/mod.rs

Comment on lines +1056 to +1077
// G8a: bounded carrier + bounded ACOR with descending walk.
// Same `RangeAggregateCarrierProof` mode as G8 but the
// dispatcher merges two-sided ranges into `between*` clauses
// via `merge_same_field_range_pairs` and threads
// `left_to_right = false` through the carrier path query.
MatrixCase {
label: "[brand] / where=brand BETWEEN AND color BETWEEN (left_to_right=false)",
platform_allowed:
"yes (RangeAggregateCarrierProof — bounded-range carrier with descending walk)",
raw_where: Value::Array(vec![
clause("brand", ">", Value::Text(brand_label(BRAND_COUNT / 2))),
clause(
"brand",
"<",
Value::Text(brand_label(BRAND_COUNT * 65 / 100)),
),
clause("color", ">", Value::Text(color_label(200))),
clause("color", "<", Value::Text(color_label(400))),
]),
mode: CountMode::GroupByRange,
limit: None,
},
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

⚠️ Potential issue | 🟠 Major | ⚡ Quick win

Thread raw_order_by through the matrix runner.

This new G8a matrix row is labeled as the descending carrier walk, but report_group_by_matrix still only carries raw_where, and drive_count_outcome() builds the request with Value::Null for order_by. So this case is actually exercising the default direction, not left_to_right=false, and the matrix output is misleading for the new order-sensitive path.

Suggested wiring
 struct MatrixCase {
     label: &'static str,
     platform_allowed: &'static str,
     raw_where: Value,
+    raw_order_by: Value,
     mode: CountMode,
     limit: Option<u32>,
 }

 fn drive_count_outcome(
     fixture: &CountBenchFixture,
     raw_where: Value,
+    raw_order_by: Value,
     mode: CountMode,
     limit: Option<u32>,
     prove: bool,
     platform_version: &PlatformVersion,
 ) -> String {
-    let request = count_request(fixture, raw_where, Value::Null, mode, limit, prove);
+    let request = count_request(fixture, raw_where, raw_order_by, mode, limit, prove);
     // ...
 }

 MatrixCase {
     label: "[brand] / where=brand BETWEEN AND color BETWEEN (left_to_right=false)",
     platform_allowed:
         "yes (RangeAggregateCarrierProof — bounded-range carrier with descending walk)",
     raw_where: Value::Array(vec![/* ... */]),
+    raw_order_by: order_by_brand_desc.clone(),
     mode: CountMode::GroupByRange,
     limit: None,
 },

 let noproof_result = drive_count_outcome(
     fixture,
     case.raw_where.clone(),
+    case.raw_order_by.clone(),
     case.mode,
     case.limit,
     false,
     platform_version,
 );
 let prove_result = drive_count_outcome(
     fixture,
     case.raw_where.clone(),
+    case.raw_order_by.clone(),
     case.mode,
     case.limit,
     true,
     platform_version,
 );
🤖 Prompt for AI Agents
Verify each finding against current code. Fix only still-valid issues, skip the
rest with a brief reason, keep changes minimal, and validate.

In `@packages/rs-drive/benches/document_count_worst_case.rs` around lines 1056 -
1077, The matrix row for G8a isn't actually exercising descending walks because
raw_order_by isn't threaded through the matrix runner: update
report_group_by_matrix to accept and propagate raw_order_by alongside raw_where,
ensure the MatrixCase struct/instances (the new G8a entry) include the
appropriate raw_order_by value (representing descending/left_to_right=false),
and modify drive_count_outcome (the code that builds the request) to use the
passed raw_order_by instead of Value::Null when constructing the request so the
order-sensitive path (left_to_right=false) is truly tested.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants