fix: write and read figure JSON as UTF-8 #5633
Draft
LukeTheoJohnson wants to merge 1 commit into
write_json wrote figure JSON with Path.write_text(json_str) and read_json read it back with Path.read_text(), both omitting the encoding. On platforms whose default text encoding is not UTF-8 (e.g. cp1252 on Windows), writing a figure containing non-ASCII text raised UnicodeEncodeError and reading produced mojibake. write_html already passes "utf-8" explicitly; apply the same to the JSON I/O path so figures round-trip everywhere. Update the existing pathlib mock tests to assert the UTF-8 encoding.
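The failure mode described above can be reproduced without plotly. The following is a minimal sketch, not the library's code: `roundtrip` is a hypothetical helper, the JSON string is a stand-in for a serialized figure, and cp1252 is passed explicitly to simulate a Windows default encoding on any platform.

```python
import json
import tempfile
from pathlib import Path

# Stand-in for serialized figure JSON (assumption: the real JSON string
# contains raw non-ASCII characters rather than \uXXXX escapes).
json_str = json.dumps({"title": "Température (°C) → café, µ"}, ensure_ascii=False)


def roundtrip(text, write_encoding, read_encoding):
    """Write text to a temporary file and read it back (hypothetical helper)."""
    with tempfile.TemporaryDirectory() as tmp:
        path = Path(tmp) / "fig.json"
        path.write_text(text, encoding=write_encoding)
        return path.read_text(encoding=read_encoding)


# 1. Writing with cp1252: "→" has no cp1252 encoding, so the write raises.
try:
    roundtrip(json_str, "cp1252", "cp1252")
    write_failed = False
except UnicodeEncodeError:
    write_failed = True

# 2. Reading a UTF-8 file as cp1252: no exception, but the text is mojibake.
mojibake = roundtrip(json_str, "utf-8", "cp1252")

# 3. Explicit UTF-8 on both sides: the text round-trips unchanged.
fixed = roundtrip(json_str, "utf-8", "utf-8")

print(write_failed, mojibake != json_str, fixed == json_str)
```

Passing the encoding explicitly on both sides removes the dependence on the platform default, which is the behaviour this change aims for.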
Problem
pio.write_json(fig, path) raises UnicodeEncodeError for any figure containing non-ASCII text (titles, axis labels, hover text, categorical values such as °C, café, µ) on platforms whose default text encoding is not UTF-8. On Windows the default is cp1252.
read_json has the same bug. It reads the file without specifying an encoding, so a UTF-8 JSON file is decoded as cp1252 and the text comes back as mojibake. This works on macOS/Linux only because their default encoding happens to be UTF-8.

Root cause
plotly/io/_json.py opened the file without an encoding on both sides:
write_json: path.write_text(json_str)
read_json: path.read_text()
write_html already does this correctly (path.write_text(html_str, "utf-8")), and the same class of bug was previously fixed for HTML output (#3898). I believe the JSON I/O path was simply missed.

Fix
Pass "utf-8" explicitly in both write_json and read_json, matching write_html. JSON is UTF-8 by default per RFC 8259.

Tests
Updated test_write_json_pathlib and test_read_json_from_pathlib to assert the UTF-8 encoding is used. Verified red to green: both fail on the unpatched source and pass with the fix.
A write_json to read_json round trip of a non-ASCII figure returns the original text on Windows (cp1252).
Running tests/test_io/test_to_from_json.py shows only the pre-existing FigureWidget ImportErrors (anywidget not installed).

Scope
Small source change plus test assertions. No behaviour change on platforms that already default to UTF-8.
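
The mock-based assertion style mentioned under Tests can be sketched roughly as follows. This is an illustration only: `write_json_sketch` and `read_json_sketch` are hypothetical stand-ins for the patched I/O calls, and passing "utf-8" positionally to read_text is an assumption made to mirror the write_html call quoted above. The actual test bodies in the repository may differ.

```python
from pathlib import Path
from unittest.mock import Mock


def write_json_sketch(json_str, path):
    # Hypothetical stand-in for the patched write path.
    path.write_text(json_str, "utf-8")


def read_json_sketch(path):
    # Hypothetical stand-in for the patched read path.
    return path.read_text("utf-8")


# A Mock constrained to the pathlib.Path interface records every call,
# so the test can check the encoding argument without touching the disk.
write_path = Mock(spec=Path)
write_json_sketch('{"title": "°C"}', write_path)
write_path.write_text.assert_called_once_with('{"title": "°C"}', "utf-8")

read_path = Mock(spec=Path)
read_path.read_text.return_value = '{"title": "°C"}'
result = read_json_sketch(read_path)
read_path.read_text.assert_called_once_with("utf-8")
```

Against code that omits the encoding, assert_called_once_with raises AssertionError because the recorded call lacks the "utf-8" argument, which matches the red-to-green check described above.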