Drop trailing empty token from splitByWholeSeparator #1710

Open
alhudz wants to merge 2 commits into apache:master from alhudz:split-whole-separator-trailing-empty

@alhudz commented Jun 16, 2026

Contributor

Repro: StringUtils.splitByWholeSeparator("a:b:", ":") returns ["a", "b", ""]; expected ["a", "b"]. The sibling StringUtils.split("a:b:", ":") already returns ["a", "b"].

Cause: in splitByWholeSeparatorWorker, the no-more-separator branch adds str.substring(beg) unconditionally. When the input ends with a separator, beg == len, so an empty trailing token is appended. Leading and adjacent separators are already collapsed in non-preserve mode, so only the trailing case leaks an empty token, which contradicts the documented "adjacent separators are treated as one separator".

Fix: in non-preserve mode only add the trailing token when it is non-empty (beg < len). splitByWholeSeparatorPreserveAllTokens keeps the trailing empty token as before.
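The cause and fix above can be illustrated with a minimal, dependency-free sketch. This is not the Commons Lang source: the class name `WholeSeparatorSketch` and the method `split` are hypothetical, and the sketch assumes non-null inputs and a non-empty separator and has no max-token limit. It only shows where the `beg < len` guard sits relative to the no-more-separator branch.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical, simplified stand-in for the worker described in this PR.
// Assumptions: str and separator are non-null, separator is non-empty, no max limit.
class WholeSeparatorSketch {

    static List<String> split(String str, String separator, boolean preserveAllTokens) {
        List<String> tokens = new ArrayList<>();
        if (str.isEmpty()) {
            return tokens; // empty input yields no tokens in either mode
        }
        int len = str.length();
        int beg = 0;
        while (true) {
            int end = str.indexOf(separator, beg);
            if (end < 0) {
                // No more separators: the remainder is the last token.
                // Guard from the PR: in non-preserve mode, skip it when it is empty (beg == len).
                if (preserveAllTokens || beg < len) {
                    tokens.add(str.substring(beg));
                }
                return tokens;
            }
            // Leading and adjacent separators produce empty tokens, kept only in preserve mode.
            if (end > beg || preserveAllTokens) {
                tokens.add(str.substring(beg, end));
            }
            beg = end + separator.length();
        }
    }
}
```

Under these assumptions, `split("a:b:", ":", false)` yields the two tokens `a` and `b`, while the same call with `preserveAllTokens` set to `true` yields three tokens, the last one empty. Removing the `beg < len` condition reproduces the trailing empty token in non-preserve mode.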

  • Read the contribution guidelines for this project.
  • Read the ASF Generative Tooling Guidance if you use Artificial Intelligence (AI).
  • I used AI to create any part of, or all of, this pull request. Which AI tool was used to create this pull request, and to what extent did it contribute?
  • Run a successful build using the default Maven goal with mvn; that's mvn on the command line by itself.
  • Write unit tests that match behavioral changes, where the tests fail if the changes to the runtime are not applied. This may not always be possible, but it is a best practice.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Each commit in the pull request should have a meaningful subject line and body. Note that a maintainer may squash commits during the merge process.

@garydgregory changed the title from "drop trailing empty token from splitByWholeSeparator" to "Drop trailing empty token from splitByWholeSeparator" on Jun 16, 2026

@garydgregory left a comment

Member


@alhudz
Please see my comment.

assertEquals(splitWithMultipleSeparatorExpectedResults[i], splitWithMultipleSeparator[i]);
}

// a trailing separator must not leak an empty token (it is dropped, like leading and adjacent ones)

Member


Please add a new @ParameterizedTest method for these new assertions.

Contributor Author


Done. Moved them into a new @ParameterizedTest, testSplitByWholeStringDropsTrailingEmpty, driven by @CsvSource. Each row also asserts the preserve-all-tokens variant still keeps the trailing empty token. Confirmed it fails on every row without the worker guard.
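The row-driven idea behind that test can be sketched without JUnit or Commons Lang on the classpath. This is not the test from the PR: the class `TrailingEmptyRows`, the rows, and the `failingRows` helper are all hypothetical. The rows only mimic the shape of `@CsvSource` lines, and the two split variants are passed in as functions so the sketch stays independent of any particular implementation.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.function.BiFunction;

// Hypothetical harness: checks each row against both split variants.
class TrailingEmptyRows {

    // Rows in "input, separator, expected tokens joined by |" form (illustrative data).
    // No row has a leading or adjacent separator, so the preserve-all-tokens result
    // is the expected tokens plus exactly one trailing empty token.
    static final String[] ROWS = {
        "a:b:, :, a|b",
        "ab-!-cd-!-, -!-, ab|cd",
        "x::, ::, x",
    };

    // Returns the rows for which either variant disagrees with the expectation.
    static List<String> failingRows(BiFunction<String, String, String[]> split,
                                    BiFunction<String, String, String[]> splitPreserveAll) {
        List<String> failures = new ArrayList<>();
        for (String row : ROWS) {
            String[] cols = row.split(",\\s*");
            String input = cols[0];
            String separator = cols[1];
            String[] expected = cols[2].split("\\|");
            // Preserve mode must still keep the trailing empty token.
            String[] expectedPreserve = Arrays.copyOf(expected, expected.length + 1);
            expectedPreserve[expected.length] = "";
            boolean ok = Arrays.equals(expected, split.apply(input, separator))
                    && Arrays.equals(expectedPreserve, splitPreserveAll.apply(input, separator));
            if (!ok) {
                failures.add(row);
            }
        }
        return failures;
    }
}
```

With Commons Lang available, the two arguments could be `StringUtils::splitByWholeSeparator` and `StringUtils::splitByWholeSeparatorPreserveAllTokens`. Against a build without the worker guard, every row would be expected to appear in the returned list, which mirrors the "fails on every row" check described above.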

2 participants