
GH-50338: [C++] Add ComputeLogicalNullCount to Datum#50347

Open
goel-skd wants to merge 1 commit into
apache:main from
goel-skd:gh-50338-datum-compute-logical-null-count


@goel-skd goel-skd commented Jul 2, 2026

Contributor

Rationale for this change

Datum exposes null_count(), which works on arrays, chunked arrays and scalars, but it does not account for types whose logical nulls are not captured by the top-level validity bitmap (union, run-end encoded and dictionary types). ArrayData, ArraySpan, Array and ChunkedArray all expose ComputeLogicalNullCount() for this purpose; Datum was the missing piece, so callers had to unwrap the datum and dispatch on its kind themselves. This PR closes that gap.

What changes are included in this PR?

Add Datum::ComputeLogicalNullCount(), mirroring the structure of Datum::null_count():

  • for arrays, delegates to ArrayData::ComputeLogicalNullCount(),
  • for chunked arrays, delegates to ChunkedArray::ComputeLogicalNullCount() (added in #50260, "[C++] Add ComputeLogicalNullCount method to ChunkedArray"),
  • for scalars, returns the same value as null_count(); Scalar::is_valid reflects logical validity for union and run-end encoded scalars, while a DictionaryScalar counts as non-null whenever its index is valid, even if the index points to a null dictionary value (this caveat is documented on the new method).

As with the other Datum accessors, the method is only valid for scalar and array-like data. As with the Array-level method, the value is recomputed on each call for the affected types rather than cached.
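The dispatch described above can be sketched with toy stand-ins. This is a minimal illustration, not the code in this PR: the `toy_datum` namespace, its `Scalar`, `Array`, `ChunkedArray` and `None` structs, and the free function `ComputeLogicalNullCount` are all hypothetical and only model the three cases (scalar, array, chunked array) plus the "not array-like" case, which the real method guards with a DCHECK rather than an exception.

```cpp
#include <cstdint>
#include <stdexcept>
#include <type_traits>
#include <variant>
#include <vector>

// Toy model only: these are NOT Arrow's classes.
namespace toy_datum {

struct Scalar { bool is_valid; };                   // logical validity of one value
struct Array { std::int64_t logical_nulls; };       // stand-in for a per-array computation
struct ChunkedArray { std::vector<Array> chunks; };
struct None {};                                     // a datum holding no data

using Datum = std::variant<None, Scalar, Array, ChunkedArray>;

// Mirrors the shape of the dispatch: one branch per datum kind.
inline std::int64_t ComputeLogicalNullCount(const Datum& datum) {
  return std::visit(
      [](const auto& value) -> std::int64_t {
        using T = std::decay_t<decltype(value)>;
        if constexpr (std::is_same_v<T, Scalar>) {
          // A scalar contributes one null if it is invalid, otherwise none.
          return value.is_valid ? 0 : 1;
        } else if constexpr (std::is_same_v<T, Array>) {
          return value.logical_nulls;
        } else if constexpr (std::is_same_v<T, ChunkedArray>) {
          // Chunked data: sum the per-chunk counts.
          std::int64_t total = 0;
          for (const Array& chunk : value.chunks) total += chunk.logical_nulls;
          return total;
        } else {
          // The toy throws here; it is an assumption made for testability.
          throw std::invalid_argument("only valid for scalar or array-like values");
        }
      },
      datum);
}

}  // namespace toy_datum
```

The point of the sketch is only that the caller no longer needs to unwrap the datum: a single call covers every kind that carries values.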

The DCHECK failure message shared with null_count() was also corrected from "only valid for array-like values" to "only valid for scalar or array-like values", matching the documented contract of both methods.

Are these changes tested?

Yes. A new Datum.ComputeLogicalNullCount test covers:

  • valid and null scalars,
  • an array with a validity bitmap (result matches null_count()),
  • a sparse union array, where null_count() is 0 but the logical null count is not,
  • a chunked array of union arrays (the logical null count is summed over the chunks),
  • a dictionary array, where a valid index referencing a null dictionary value counts as a logical null, and
  • a dictionary scalar, documenting that its is_valid reflects only index validity, so it does not count a referenced null dictionary value.
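To make the union and dictionary cases above concrete, here is a small self-contained model of how a logical null count can differ from a bitmap-based one. Everything in the `toy_nulls` namespace is hypothetical and simplified (for example, type ids index the children directly, and validity is a `std::vector<bool>`); it does not use or reproduce Arrow's data layout or the test added in this PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Toy model only: these are NOT Arrow's classes or memory layouts.
namespace toy_nulls {

struct Child { std::vector<bool> valid; };  // true = slot is valid

// Sparse union: each child spans the full length and type_ids picks the
// child per slot. There is no top-level validity bitmap.
struct SparseUnion {
  std::vector<int> type_ids;
  std::vector<Child> children;
};

// With no top-level bitmap, a bitmap-based count sees no nulls.
inline std::int64_t NullCount(const SparseUnion&) { return 0; }

// A slot is logically null when the selected child is null at that slot.
inline std::int64_t LogicalNullCount(const SparseUnion& u) {
  std::int64_t nulls = 0;
  for (std::size_t i = 0; i < u.type_ids.size(); ++i) {
    const Child& child = u.children[static_cast<std::size_t>(u.type_ids[i])];
    if (!child.valid[i]) ++nulls;
  }
  return nulls;
}

struct DictionaryArray {
  std::vector<bool> index_valid;  // validity of each index slot
  std::vector<int> indices;       // positions into the dictionary
  std::vector<bool> dict_valid;   // validity of each dictionary value
};

// Array case: a valid index that references a null dictionary value
// still counts as a logical null.
inline std::int64_t LogicalNullCount(const DictionaryArray& d) {
  std::int64_t nulls = 0;
  for (std::size_t i = 0; i < d.indices.size(); ++i) {
    const bool value_valid =
        d.index_valid[i] && d.dict_valid[static_cast<std::size_t>(d.indices[i])];
    if (!value_valid) ++nulls;
  }
  return nulls;
}

struct DictionaryScalar {
  bool index_valid;
  int index;
  std::vector<bool> dict_valid;
};

// Scalar case as described in the PR: only index validity is consulted,
// so a referenced null dictionary value is not counted.
inline std::int64_t LogicalNullCount(const DictionaryScalar& s) {
  return s.index_valid ? 0 : 1;
}

}  // namespace toy_nulls
```

In this model, a dictionary array and a dictionary scalar that reference the same null dictionary value give different answers, which is the asymmetry the last two test cases document.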

Are there any user-facing changes?

Yes, this adds a new public method, Datum::ComputeLogicalNullCount(). The change is purely additive; no existing APIs are modified.

@goel-skd goel-skd requested a review from pitrou as a code owner July 2, 2026 22:09

github-actions Bot commented Jul 2, 2026


⚠️ GitHub issue #50338 has been automatically assigned in GitHub to PR creator.

Datum::null_count() does not account for types that carry logical
nulls without a validity bitmap (union, dictionary and run-end
encoded types). Add Datum::ComputeLogicalNullCount(), delegating to
ArrayData::ComputeLogicalNullCount() and
ChunkedArray::ComputeLogicalNullCount() for array-like data; for
scalars, is_valid already reflects logical validity.
@goel-skd goel-skd force-pushed the gh-50338-datum-compute-logical-null-count branch from 2e82f92 to 30f92df Compare July 2, 2026 22:35