Add any_value aggregate function #23043
Open
Kevin-Li-2025 wants to merge 2 commits into
Jefffrey reviewed on Jun 19, 2026

Comment on lines +27 to +39

query I
SELECT any_value(column2) FROM any_value_test;
----
10

query IIT rowsort
SELECT column1, any_value(column2), any_value(column3)
FROM any_value_test
GROUP BY column1;
----
1 10 first
2 NULL NULL
3 30 third
Contributor
are these two tests technically deterministic?
# under the License.

statement ok
CREATE TABLE any_value_test AS VALUES
Contributor
maybe let's add a test for an all-NULL column
Contributor
Thanks @Kevin-Li-2025 @Jefffrey, LGTM. One follow-up idea is to impl a native
Author
Thanks, agreed that a native GroupsAccumulator is the right follow-up for grouped any_value. I’ll keep this PR scoped to adding the function and handle the grouped fast path separately, since it needs type-specific state handling plus targeted grouped-aggregation benchmarks to demonstrate the improvement without delaying this PR.
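
For readers following the grouped fast path idea, here is a rough, self-contained sketch of what per-group any_value state could look like. It is not DataFusion's GroupsAccumulator trait and not the code in this PR: the `GroupedAnyValueSketch` type, its method names, and the `group_indices` / `total_num_groups` parameters are all hypothetical, and a real implementation would likely use type-specific Arrow builders rather than a generic `Vec<Option<T>>`.

```rust
/// Hypothetical grouped state for any_value: one slot per group index.
/// This is an illustration only, not DataFusion's GroupsAccumulator API.
struct GroupedAnyValueSketch<T> {
    /// slots[g] holds the value chosen for group g, or None if only NULLs were seen.
    slots: Vec<Option<T>>,
}

impl<T: Clone> GroupedAnyValueSketch<T> {
    fn new() -> Self {
        Self { slots: Vec::new() }
    }

    /// Process one batch. `values[i]` belongs to group `group_indices[i]`;
    /// NULL inputs are modeled as `None`. Every group index must be
    /// smaller than `total_num_groups`.
    fn update_batch(
        &mut self,
        values: &[Option<T>],
        group_indices: &[usize],
        total_num_groups: usize,
    ) {
        assert_eq!(values.len(), group_indices.len());
        // Grow the state so that newly seen groups start out as NULL.
        if self.slots.len() < total_num_groups {
            self.slots.resize(total_num_groups, None);
        }
        for (value, &group) in values.iter().zip(group_indices) {
            // Keep the first non-null value per group and ignore the rest.
            if self.slots[group].is_none() {
                if let Some(v) = value {
                    self.slots[group] = Some(v.clone());
                }
            }
        }
    }

    /// One result per group, in group-index order.
    fn evaluate(&self) -> Vec<Option<T>> {
        self.slots.clone()
    }
}
```

Once a group's slot is filled, later rows for that group are skipped without any per-row dispatch, which is the kind of saving the grouped-aggregation benchmarks mentioned above would have to confirm.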
Which issue does this PR close?
any_value aggregate function #22799.

Rationale for this change
any_value is a common aggregate in SQL engines for queries that need one representative non-null value from each group without imposing an ordering requirement. DataFusion currently has first_value, but that aggregate is order-sensitive, so exposing any_value gives users the intended arbitrary-value semantics directly.

What changes are included in this PR?
any_value(expression) aggregate UDF and registers it with the default aggregate functions.

Are these changes tested?
Yes. I ran:

Are there any user-facing changes?
Yes. This adds a new SQL aggregate function, any_value.

I used AI assistance to help inspect the codebase and run validation, and I reviewed the resulting implementation and tests.
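
To make the semantics in the rationale concrete, the following is a minimal sketch of an ungrouped any_value accumulator, assuming NULL inputs are modeled as `None`. The `AnyValueSketch` type and its methods are hypothetical names used for illustration and do not reflect the implementation in this PR. The sketch happens to keep the first non-null value it encounters, which is one valid choice among many: callers should not rely on which value is returned, only that it is non-null when the input contains any non-null value.

```rust
/// Hypothetical illustration of any_value semantics; not the PR's accumulator.
#[derive(Debug)]
struct AnyValueSketch<T> {
    /// The chosen representative, or None while only NULLs have been seen.
    value: Option<T>,
}

impl<T: Clone> AnyValueSketch<T> {
    fn new() -> Self {
        Self { value: None }
    }

    /// Feed one batch of input rows. Once a non-null value has been
    /// picked, later batches are ignored.
    fn update_batch(&mut self, batch: &[Option<T>]) {
        if self.value.is_some() {
            return;
        }
        // `flatten` skips the None (NULL) entries; `next` takes the first remaining one.
        self.value = batch.iter().flatten().next().cloned();
    }

    /// Final result: None (NULL) if every input row was NULL.
    fn evaluate(&self) -> Option<T> {
        self.value.clone()
    }
}
```

An all-NULL input therefore evaluates to NULL, which is the case the review asked to cover with a test. The result is fully order-independent only when the input holds at most one distinct non-null value, which is relevant to the determinism question raised in review.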