Improve performance of binary and string concatenation operator #23057

Draft
pepijnve wants to merge 2 commits into apache:main from pepijnve:concat_opt

@pepijnve

Contributor

Which issue does this PR close?

  • None yet

Rationale for this change

During recent profiling work, string concatenation proved to be a hotspot. Investigating the current kernel implementation for string views showed that there was still room for improvement.

  • Preallocating output buffers of exactly the required size avoids reallocations.
  • Copying data directly into the final data buffer avoids an extra memcpy from a temporary buffer.

Together, these changes can yield a ~30% improvement in the string concat benchmark.
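A minimal std-only Rust sketch of the two ideas above, under simplifying assumptions: `concat_views`, `make_inline_view` and `make_ref_view` are hypothetical helpers operating on plain byte slices rather than Arrow arrays; there is a single output data buffer (buffer index 0), no null handling, and less than 4 GiB of output. The 16-byte view packing (length in the low 32 bits; results of up to 12 bytes stored inline; otherwise a 4-byte prefix, buffer index and offset) follows the Arrow byte view layout. This is an illustration of the technique, not the kernel in this PR or in apache/arrow-rs#10161.

```rust
/// Results of at most this many bytes are stored inline in their 16-byte view.
const MAX_INLINE_LEN: usize = 12;

/// Hypothetical helper: packs up to 12 bytes (split across `parts`) into an
/// inline view: length in bytes 0..4, string data in bytes 4..16 (little-endian).
fn make_inline_view(parts: &[&[u8]]) -> u128 {
    let mut raw = [0u8; 16];
    let mut pos = 4;
    for part in parts {
        raw[pos..pos + part.len()].copy_from_slice(part);
        pos += part.len();
    }
    let len = (pos - 4) as u32;
    raw[0..4].copy_from_slice(&len.to_le_bytes());
    u128::from_le_bytes(raw)
}

/// Hypothetical helper: builds an out-of-line view from length, 4-byte prefix,
/// buffer index and offset into that buffer.
fn make_ref_view(len: u32, prefix: [u8; 4], buffer_index: u32, offset: u32) -> u128 {
    (len as u128)
        | ((u32::from_le_bytes(prefix) as u128) << 32)
        | ((buffer_index as u128) << 64)
        | ((offset as u128) << 96)
}

/// Hypothetical kernel: concatenates `lhs[i]` and `rhs[i]` for every row and
/// returns the views plus one data buffer (buffer index 0).
/// Assumes equal row counts, no nulls, and less than 4 GiB of output data.
pub fn concat_views(lhs: &[&[u8]], rhs: &[&[u8]]) -> (Vec<u128>, Vec<u8>) {
    assert_eq!(lhs.len(), rhs.len(), "inputs must have the same number of rows");

    // Pass 1: compute the exact data buffer size. Results of 12 bytes or
    // fewer live inline in their view, so they need no buffer space.
    let data_len: usize = lhs
        .iter()
        .zip(rhs)
        .map(|(l, r)| l.len() + r.len())
        .filter(|&len| len > MAX_INLINE_LEN)
        .sum();

    // Preallocate both outputs so neither Vec reallocates while being filled.
    let mut views = Vec::with_capacity(lhs.len());
    let mut data = Vec::with_capacity(data_len);

    // Pass 2: write each result straight into its final location.
    for (&l, &r) in lhs.iter().zip(rhs) {
        let len = l.len() + r.len();
        if len <= MAX_INLINE_LEN {
            views.push(make_inline_view(&[l, r]));
        } else {
            let offset = data.len();
            // Copy both operands directly into the output buffer: no
            // per-row temporary buffer and no second memcpy out of it.
            data.extend_from_slice(l);
            data.extend_from_slice(r);
            let mut prefix = [0u8; 4];
            prefix.copy_from_slice(&data[offset..offset + 4]);
            views.push(make_ref_view(len as u32, prefix, 0, offset as u32));
        }
    }

    debug_assert_eq!(data.len(), data_len);
    (views, data)
}
```

The first pass only counts results longer than 12 bytes, because shorter results are stored entirely inside their view. As a result, the data buffer is sized exactly once, and each long result is written with two `extend_from_slice` calls directly at its final offset.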

Note that this work is a port of apache/arrow-rs#10161. Ideally, DataFusion would use the Arrow implementation once that PR is merged and released. Since DataFusion currently uses its own custom kernel, it seemed reasonable to port the proposed Arrow change here in the meantime.

What changes are included in this PR?

  • Rewrite the byte view concatenation kernels

Are these changes tested?

Covered by existing tests

Are there any user-facing changes?

No

github-actions bot added the physical-expr label (Changes to the physical-expr crates) on Jun 20, 2026
pepijnve marked this pull request as ready for review on June 20, 2026 16:32

comphead left a comment

Contributor

Thanks @pepijnve.
Let's wait for apache/arrow-rs#10161 to be merged; that PR might still change, and we can port those changes here.

It would be useful to add a TODO to switch to the arrow-rs code once it is released.

@pepijnve

Contributor Author

I'll mark this one as a draft for now and add the TODO right away.

pepijnve marked this pull request as draft on June 20, 2026 18:58
@github-actions


Thank you for opening this pull request!

Reviewer note: cargo-semver-checks reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).

error: `cargo metadata` exited with an error:
    Updating crates.io index
error: failed to get `whoami` as a dependency of package `tokio-postgres v0.7.18`
    ... which satisfies dependency `tokio-postgres = "^0.7.18"` (locked to 0.7.18) of package `datafusion-sqllogictest v54.0.0 (/home/runner/work/datafusion/datafusion/datafusion/sqllogictest)`

Caused by:
  failed to load source for dependency `whoami`

Caused by:
  unable to update registry `crates-io`

Caused by:
  download of wh/oa/whoami failed

Caused by:
  curl failed

Caused by:
  [56] Failure when receiving data from the peer (Recv failure: Connection reset by peer)

github-actions bot added the auto detected api change label (Auto detected API change) on Jun 20, 2026