Use UTF-8 for filename encoding instead of ASCII #168
Merged
littlefs stores names as opaque byte strings, so the API-level encoding is a free choice. The previous default of ASCII rejected any non-ASCII filename with UnicodeEncodeError (and decoded directory entries as ASCII as well).
Switch the default FILENAME_ENCODING to UTF-8: non-ASCII names now round-trip through open/stat/listdir/mkdir/rename/remove, and because ASCII is a strict subset of UTF-8 all existing filenames are unaffected.
Adds regression tests covering Latin-1, CJK and emoji filenames.
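The difference between the two defaults can be illustrated with Python's standard codecs alone. This is a minimal sketch that does not touch littlefs; the sample names and the `round_trips` helper are illustrative assumptions, not part of the PR.

```python
# Illustrative names: plain ASCII, Latin-1 range, CJK, and emoji.
names = ["readme.txt", "café.txt", "日本語.txt", "📁.bin"]


def round_trips(name, encoding):
    """Return True if name survives encode followed by decode with this codec."""
    try:
        return name.encode(encoding).decode(encoding) == name
    except UnicodeError:
        # str.encode raises UnicodeEncodeError for characters the codec lacks.
        return False


ascii_ok = [n for n in names if round_trips(n, "ascii")]
utf8_ok = [n for n in names if round_trips(n, "utf-8")]

# ASCII-only names produce identical bytes under both codecs,
# which is why existing filenames are unaffected by the switch.
same_bytes = "readme.txt".encode("ascii") == "readme.txt".encode("utf-8")
```

Only the ASCII-only name survives the `ascii` codec, while every name survives `utf-8`.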
Pull request overview
This PR changes the default filename encoding used by the Python bindings from ASCII to UTF-8 so that non-ASCII filenames can be created and round-tripped through the LittleFS API (while keeping ASCII behavior unchanged).
Changes:
- Switch `FILENAME_ENCODING` default from `'ascii'` to `'utf-8'`.
- Add regression tests for Unicode filenames (Latin-1, CJK, emoji) covering open/stat/listdir/mkdir/rename/remove.
- Document the rationale for UTF-8 as the default filename encoding.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.
| File | Description |
|---|---|
| test/test_unicode_filenames.py | Adds regression coverage ensuring non-ASCII filenames round-trip across core filesystem operations. |
| src/littlefs/lfs.pyx | Changes the default filename encoding constant to UTF-8 and documents the reasoning. |
Build on the UTF-8 default by letting callers choose the filename encoding per filesystem instead of relying on the process-wide lfs.FILENAME_ENCODING global.
- Low-level lfs.* functions take an optional filename_encoding that falls back to the module global, keeping existing callers unaffected.
- LittleFS accepts filename_encoding= and threads it through every path-handling call (open, stat, listdir/scandir, mkdir, remove, rename, *attr).
- Useful for images whose names were written with a non-UTF-8 encoding (e.g. latin-1, shift-jis), which would otherwise mis-decode or raise.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
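The fallback described in this commit can be sketched as follows. This is a stand-in written against plain Python, not the actual lfs.pyx code: `encode_path` and `SketchFS` are hypothetical names, and the module-level `FILENAME_ENCODING` here only imitates lfs.FILENAME_ENCODING.

```python
# Stand-in for the module-level lfs.FILENAME_ENCODING global (assumption).
FILENAME_ENCODING = "utf-8"


def encode_path(path, filename_encoding=None):
    """Encode a path, falling back to the module global when no codec is given.

    The global is read at call time, so callers that never pass
    filename_encoding keep their existing behavior.
    """
    if filename_encoding is None:
        filename_encoding = FILENAME_ENCODING
    return path.encode(filename_encoding)


class SketchFS:
    """Hypothetical filesystem wrapper that remembers its own codec."""

    def __init__(self, filename_encoding=None):
        self.filename_encoding = filename_encoding

    def encoded(self, path):
        # Every path-handling method would pass the instance codec through.
        return encode_path(path, self.filename_encoding)


default_bytes = SketchFS().encoded("é")
latin1_bytes = SketchFS(filename_encoding="latin-1").encoded("é")
```

Two instances in the same process can then use different codecs without touching the global.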
Add a --filename-encoding option to the shared CLI parser so create, extract, list, and repl can encode/decode image filenames with a non-UTF-8 codec (e.g. latin-1, shift-jis). Defaults to None so the default lives solely in lfs.FILENAME_ENCODING (utf-8). Unlike name_max/attr_max/file_max, this is a host-side encode/decode choice and is never stored in the image, so it may differ freely between create and extract.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
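A rough sketch of an option that defaults to None, using only the standard argparse module. The parser below is a flat stand-in with a hypothetical program name; the real CLI's subcommands and its other options are not reproduced here.

```python
import argparse


def build_parser():
    """Build a stand-in parser carrying only the --filename-encoding option."""
    parser = argparse.ArgumentParser(prog="littlefs-sketch")  # hypothetical name
    parser.add_argument(
        "--filename-encoding",
        default=None,  # None defers to the library-level default
        help="Codec used to encode/decode filenames (e.g. latin-1, shift-jis).",
    )
    return parser


explicit = build_parser().parse_args(["--filename-encoding", "shift-jis"])
implicit = build_parser().parse_args([])
```

Leaving the default as None means the CLI never needs its own copy of the "utf-8" literal.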
Confirmed against the upstream C source: lfs_path_namelen() (strcspn) measures the name in bytes and lfs.c checks `nlen > name_max`, so a multi-byte UTF-8 character counts as 2-4 bytes against the limit.
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
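The byte-versus-character distinction can be checked from Python before a name ever reaches the filesystem. This is a small sketch; `fits_name_max` is a hypothetical helper, and the limit of 255 used in the example is an assumed value for illustration, not one read from an image.

```python
def encoded_length(name, encoding="utf-8"):
    """Number of bytes the name occupies once encoded."""
    return len(name.encode(encoding))


def fits_name_max(name, name_max, encoding="utf-8"):
    """Mirror the `nlen > name_max` check: the name fits if nlen <= name_max."""
    return encoded_length(name, encoding) <= name_max


lengths = {ch: encoded_length(ch) for ch in ["a", "é", "日", "📁"]}

# 100 CJK characters are 300 bytes in UTF-8, so they exceed an assumed
# limit of 255 even though the character count is well below it.
long_name = "日" * 100
long_name_fits = fits_name_max(long_name, name_max=255)
```

Here the single-character lengths come out to 1, 2, 3 and 4 bytes respectively.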
- lfs.pyi: dir_read returns Optional[LFSStat] (None at end-of-directory)
- __init__.py: scandir wraps iteration in try/finally so dir_close always runs even if dir_read raises (e.g. UnicodeDecodeError)
- __main__.py: clarify that extract must use the same --filename-encoding as create; it cannot differ freely
- lfs.pyx: file_sync now returns the error code, matching its -> int stub and every other file_* function (was the lone outlier returning None)
Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
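The scandir change above follows a common generator pattern, sketched here with injected stand-in functions rather than the real lfs.dir_open/dir_read/dir_close. The fakes, the handle value, and the function signature are all assumptions made for the demonstration.

```python
def scandir(dir_open, dir_read, dir_close, path):
    """Yield directory entries, closing the handle even if reading fails."""
    handle = dir_open(path)
    try:
        while True:
            entry = dir_read(handle)
            if entry is None:  # None marks end-of-directory
                break
            yield entry
    finally:
        dir_close(handle)


closed = []


def fake_open(path):
    return "handle"


def fake_read(handle):
    # Simulate a name that cannot be decoded with the chosen codec.
    return b"\xe9".decode("utf-8")


def fake_close(handle):
    closed.append(handle)


try:
    list(scandir(fake_open, fake_read, fake_close, "/"))
    raised = False
except UnicodeDecodeError:
    raised = True
```

The decode error still propagates to the caller, but the handle is recorded as closed first.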