Skip to content

Implement stringByFoldingWithOptions#600

Open
HendrikHuebner wants to merge 8 commits into
gnustep:masterfrom
HendrikHuebner:stringByFoldingWithOptions
Open

Implement stringByFoldingWithOptions#600
HendrikHuebner wants to merge 8 commits into
gnustep:masterfrom
HendrikHuebner:stringByFoldingWithOptions

Conversation

@HendrikHuebner
Copy link
Copy Markdown
Contributor

This PR implements [stringByFoldingWithOptions using ICU transliterators.

The behavior does not match Apple's implementation exactly, but I tried getting as close as possible:

  • ICU transliterators support fullwidth character conversion
  • Diacritics are removed using the sequence NFD; [:Nonspacing Mark:] Remove; NFC, as explained in their documentation.
  • lower case conversion is done using u_strToLower, which is locale-aware.

Additionally I added a caching optimization, which works similarly to how collators are handled.

@HendrikHuebner HendrikHuebner requested a review from rfm as a code owner April 22, 2026 16:47
@rfm
Copy link
Copy Markdown
Contributor

rfm commented May 1, 2026

I'm waiting for this to get to a state where it compiles before going into detail, but from a quick look it seems basically fine.
I do notice some inappropriate use of NSZoneMalloc() and NSZoneFree() which, while not actually broken, would be better done using malloc() and free(). That's anywhere we are just using heap temporarily rather than adding memory to an existing object (in which case we should use the zone of that object). Possibly, at some future point we may remove zones (as Apple has done), but for now we want to use them corrcectly:
All memory used by an object should be in the same zone
All heap memory not used by an object is most efficiently done using malloc/free

@HendrikHuebner
Copy link
Copy Markdown
Contributor Author

I do notice some inappropriate use of NSZoneMalloc() and NSZoneFree() which, while not actually broken, would be better done using malloc() and free().

I've replaced the uses in my patch with libc malloc/free. Note that the zoned versions are used extensively in NSString.m, however.

@HendrikHuebner
Copy link
Copy Markdown
Contributor Author

@rfm CI is still failing on Clang+Windows because I did not know how to implemement locale sensitive lowercase folding without u_strToLower, so I just fell back to non locale sensitive lower case conversion. That obviously makes the test fail. How should I handle this?

@rfm
Copy link
Copy Markdown
Contributor

rfm commented May 5, 2026

I don't understand why you are not using u_strToLower on windows. As far as I know the ICU support ought to be the same on windows as unix platforms.

I did find a block suggesting a native alternative (LCMap­String­Ex) at https://devblogs.microsoft.com/oldnewthing/20241007-00/?p=110345 but that block also seem to suggest you can use u_strToLower

@HendrikHuebner HendrikHuebner force-pushed the stringByFoldingWithOptions branch from 04a7a22 to 07257cc Compare May 20, 2026 13:10
@HendrikHuebner
Copy link
Copy Markdown
Contributor Author

@rfm The PR builds now. Originally, I thought u_strToLower wasn't available on Windows. Seems like it is avaiable, but transliterators are not...?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Development

Successfully merging this pull request may close these issues.

2 participants