fix indexOfDifference splitting a surrogate pair #1713

Open
alhudz wants to merge 1 commit into apache:master from alhudz:indexofdifference-surrogate-pair

alhudz (Contributor) commented Jun 17, 2026

Repro: getCommonPrefix("𐐀", "𐐁") returns the lone high surrogate \uD801, and difference("𐐀", "𐐁") returns the lone low surrogate \uDC01. Both are malformed UTF-16.
Cause: both indexOfDifference overloads walk the inputs one char at a time, so when two strings share the high surrogate of a supplementary code point but differ in the low half, the reported index lands inside the pair (index 1 rather than 0). getCommonPrefix then slices the pair in half and difference starts its result on the orphaned low half.
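
The cause can be illustrated with a minimal sketch of a char-by-char comparison. This is not the library's source: the class and method names below are hypothetical, and the loop only assumes the "one char at a time" behaviour described above.

```java
final class NaiveIndexOfDifference {

    // Hypothetical stand-in for a char-wise walk; returns -1 when the inputs are equal.
    static int naiveIndexOfDifference(CharSequence a, CharSequence b) {
        int limit = Math.min(a.length(), b.length());
        int i = 0;
        // Advance while the UTF-16 code units match, ignoring code point boundaries.
        while (i < limit && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        return (i == limit && a.length() == b.length()) ? -1 : i;
    }

    public static void main(String[] args) {
        String a = "\uD801\uDC00"; // U+10400 as a surrogate pair
        String b = "\uD801\uDC01"; // U+10401 as a surrogate pair
        int idx = naiveIndexOfDifference(a, b);
        System.out.println(idx); // prints 1: the index points inside the pair

        // Slicing at that index leaves an unpaired high surrogate (U+D801).
        String prefix = a.substring(0, idx);
        System.out.println(Character.isHighSurrogate(prefix.charAt(0))); // prints true
    }
}
```

Both strings hold one code point each, yet the walk reports index 1 because the shared high surrogate counts as a matching char.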
Fix: when the difference falls on a low surrogate whose preceding (common) char is a high surrogate, report the start of the pair instead. getCommonPrefix and difference need no changes of their own, since they pick up the corrected index. BMP-only inputs never hit the new branch, so existing behaviour is unchanged.
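
A rough sketch of the adjustment described in the fix, under stated assumptions: the names `SurrogateAwareIndex`, `indexOfDifference` and `adjust` are illustrative only, and checking for a low surrogate in either input is a guess at the condition rather than the actual patch.

```java
final class SurrogateAwareIndex {

    // Char-wise walk followed by a surrogate-boundary adjustment; -1 means equal.
    static int indexOfDifference(CharSequence a, CharSequence b) {
        int limit = Math.min(a.length(), b.length());
        int i = 0;
        while (i < limit && a.charAt(i) == b.charAt(i)) {
            i++;
        }
        if (i == limit && a.length() == b.length()) {
            return -1;
        }
        return adjust(a, b, i);
    }

    // Assumption: step back one char when the common char before the difference
    // is a high surrogate and the differing char in either input is a low surrogate.
    static int adjust(CharSequence a, CharSequence b, int i) {
        if (i == 0 || !Character.isHighSurrogate(a.charAt(i - 1))) {
            return i; // BMP-only inputs always take this path
        }
        boolean lowInA = i < a.length() && Character.isLowSurrogate(a.charAt(i));
        boolean lowInB = i < b.length() && Character.isLowSurrogate(b.charAt(i));
        return (lowInA || lowInB) ? i - 1 : i;
    }

    public static void main(String[] args) {
        String a = "\uD801\uDC00"; // U+10400
        String b = "\uD801\uDC01"; // U+10401
        int idx = indexOfDifference(a, b);
        System.out.println(idx);                   // prints 0: start of the pair
        System.out.println(a.substring(0, idx).isEmpty()); // prints true: no lone surrogate in the prefix
    }
}
```

With the index moved back to the start of the pair, a prefix cut at that index is empty and a suffix cut at it begins with a whole code point.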

  • Read the contribution guidelines for this project.
  • Read the ASF Generative Tooling Guidance if you use Artificial Intelligence (AI).
  • I used AI to create any part of, or all of, this pull request. Which AI tool was used to create this pull request, and to what extent did it contribute?
  • Run a successful build using the default Maven goal with mvn; that's mvn on the command line by itself.
  • Write unit tests that match behavioral changes, where the tests fail if the changes to the runtime are not applied. This may not always be possible, but it is a best practice.
  • Write a pull request description that is detailed enough to understand what the pull request does, how, and why.
  • Each commit in the pull request should have a meaningful subject line and body.
