GH-50338: [C++] Add ComputeLogicalNullCount to Datum#50347
Open
goel-skd wants to merge 1 commit into
Open
Conversation
|
|
Datum::null_count() does not account for types that carry logical nulls without a validity bitmap (union, dictionary and run-end encoded types). Add Datum::ComputeLogicalNullCount(), delegating to ArrayData::ComputeLogicalNullCount() and ChunkedArray::ComputeLogicalNullCount() for array-like data; for scalars, is_valid already reflects logical validity.
2e82f92 to
30f92df
Compare
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Rationale for this change
Datumexposesnull_count(), which works on arrays, chunked arrays and scalars, but it does not account for types whose logical nulls are not captured by the top-level validity bitmap (union, run-end encoded and dictionary types).ArrayData,ArraySpan,ArrayandChunkedArrayall exposeComputeLogicalNullCount()for this purpose;Datumwas the missing piece, so callers had to unwrap the datum and dispatch on its kind themselves. This PR closes that gap.What changes are included in this PR?
Add
Datum::ComputeLogicalNullCount(), mirroring the structure ofDatum::null_count():ArrayData::ComputeLogicalNullCount(),ChunkedArray::ComputeLogicalNullCount()(added in [C++] AddComputeLogicalNullCountmethod toChunkedArray#50260),null_count();Scalar::is_validreflects logical validity for union and run-end encoded scalars, while aDictionaryScalarcounts as non-null whenever its index is valid, even if the index points to a null dictionary value (this caveat is documented on the new method).As with the other
Datumaccessors, the method is only valid for scalar and array-like data. As with theArray-level method, the value is recomputed on each call for the affected types rather than cached.The
DCHECKfailure message shared withnull_count()was also corrected from "only valid for array-like values" to "only valid for scalar or array-like values", matching the documented contract of both methods.Are these changes tested?
Yes. A new
Datum.ComputeLogicalNullCounttest covers:null_count()),null_count()is 0 but the logical null count is not,is_validreflects only index validity, so it does not count a referenced null dictionary value.Are there any user-facing changes?
Yes, this adds a new public method,
Datum::ComputeLogicalNullCount(). The change is purely additive; no existing APIs are modified.