
Handle surrogate pairs in unstable encoding #542

Open
shreejaykurhade wants to merge 1 commit into openai:main from shreejaykurhade:fix-surrogate-encoding


@shreejaykurhade

Fixes #541

`encode` and `encode_ordinary` already retry with UTF-16 surrogate repair when the Rust core raises `UnicodeEncodeError`.

This applies the same behavior to `encode_with_unstable`, so surrogate pairs and lone surrogates are handled consistently across the Python encoding APIs.

Added a regression test for `encode_with_unstable`.
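For readers unfamiliar with the repair step, here is a minimal sketch of the retry pattern described above, using only the standard library. The helper names `repair_surrogates`, `encode_with_repair`, and `strict_utf8_bytes` are hypothetical and exist only for illustration; the stand-in encoder is an assumption that mimics a core rejecting strings that are not valid UTF-8, and it is not the library's tokenizer.

```python
from typing import Callable, List


def repair_surrogates(text: str) -> str:
    """Merge valid UTF-16 surrogate pairs; replace lone surrogates with U+FFFD."""
    # "surrogatepass" lets surrogate code points through the UTF-16 encoder.
    # Decoding then joins valid pairs, and "replace" substitutes U+FFFD
    # for anything that is still malformed.
    return text.encode("utf-16", "surrogatepass").decode("utf-16", "replace")


def encode_with_repair(encode_fn: Callable[[str], List[int]], text: str) -> List[int]:
    """Hypothetical wrapper: call encode_fn, retrying once on repaired text."""
    try:
        return encode_fn(text)
    except UnicodeEncodeError:
        return encode_fn(repair_surrogates(text))


def strict_utf8_bytes(text: str) -> List[int]:
    """Stand-in encoder (assumption): raises UnicodeEncodeError on surrogates."""
    return list(text.encode("utf-8"))


# A surrogate pair is merged into the code point it represents.
paired = encode_with_repair(strict_utf8_bytes, "\ud83d\udc4d")
# A lone surrogate cannot be merged, so it becomes U+FFFD.
lone = encode_with_repair(strict_utf8_bytes, "\ud83d")
```

Note that the repair is lossy for lone surrogates: the original string does not round-trip, which is the trade-off accepted in exchange for not raising on such input.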

Tested with:

python -m pytest tests/test_encoding.py -q

shreejaykurhade force-pushed the fix-surrogate-encoding branch from 5fe46bb to 88ce50e on May 13, 2026 19:11
shreejaykurhade force-pushed the fix-surrogate-encoding branch from 88ce50e to d3baea9 on May 13, 2026 19:12

Development

Successfully merging this pull request may close these issues.

encode_with_unstable does not handle surrogate pairs like encode

1 participant