Use UTF-8 for filename encoding instead of ASCII #168

Merged
BrianPugh merged 5 commits into jrast:master from slash-proc:utf8-filename-encoding on Jun 5, 2026
@slash-proc

Contributor

littlefs stores names as opaque byte strings, so the API-level encoding is a free choice. The previous default of ASCII rejected any non-ASCII filename with UnicodeEncodeError (and decoded directory entries as ASCII as well).

Switch the default FILENAME_ENCODING to UTF-8: non-ASCII names now round-trip through open/stat/listdir/mkdir/rename/remove, and because ASCII is a strict subset of UTF-8 all existing filenames are unaffected.

Adds regression tests covering Latin-1, CJK and emoji filenames.
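
The behavior change described above can be reproduced with plain `str`/`bytes` conversions, independent of littlefs. The following is a minimal sketch; `round_trips` is a hypothetical helper written for illustration and is not part of the library.

```python
def round_trips(name: str, encoding: str) -> bool:
    """Return True if `name` survives encode -> decode with `encoding`."""
    try:
        return name.encode(encoding).decode(encoding) == name
    except UnicodeError:
        # UnicodeEncodeError is what the old ASCII default raised
        # for any non-ASCII filename.
        return False


names = ["readme.txt", "caf\u00e9.txt", "日本語.txt", "📁.txt"]

for name in names:
    print(
        repr(name),
        "ascii:", round_trips(name, "ascii"),
        "utf-8:", round_trips(name, "utf-8"),
    )

# Pure-ASCII names produce identical bytes under both codecs, which is
# why existing filenames are unaffected by the new default.
assert "readme.txt".encode("ascii") == "readme.txt".encode("utf-8")
```

Only the first name round-trips under ASCII, while all four round-trip under UTF-8.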

Copilot AI left a comment

Pull request overview

This PR changes the default filename encoding used by the Python bindings from ASCII to UTF-8 so that non-ASCII filenames can be created and round-tripped through the LittleFS API (while keeping ASCII behavior unchanged).

Changes:

  • Switch FILENAME_ENCODING default from 'ascii' to 'utf-8'.
  • Add regression tests for Unicode filenames (Latin-1, CJK, emoji) covering open/stat/listdir/mkdir/rename/remove.
  • Document the rationale for UTF-8 as the default filename encoding.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated no comments.

| File | Description |
| --- | --- |
| test/test_unicode_filenames.py | Adds regression coverage ensuring non-ASCII filenames round-trip across core filesystem operations. |
| src/littlefs/lfs.pyx | Changes the default filename encoding constant to UTF-8 and documents the reasoning. |

BrianPugh and others added 3 commits June 4, 2026 10:56
Build on the UTF-8 default by letting callers choose the filename
encoding per filesystem instead of relying on the process-wide
lfs.FILENAME_ENCODING global.

- Low-level lfs.* functions take an optional filename_encoding that
  falls back to the module global, keeping existing callers unaffected.
- LittleFS accepts filename_encoding= and threads it through every
  path-handling call (open, stat, listdir/scandir, mkdir, remove,
  rename, *attr).
- Useful for images whose names were written with a non-UTF-8 encoding
  (e.g. latin-1, shift-jis), which would otherwise mis-decode or raise.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
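
The fallback pattern in the commit message above (an optional per-call encoding that defers to a module-wide default) might look roughly like the sketch below. It uses only the standard library; `FILENAME_ENCODING` here is a local stand-in for `lfs.FILENAME_ENCODING`, and `decode_name` is a hypothetical helper, not a function from the bindings.

```python
from typing import Optional

# Local stand-in for the module-level default (lfs.FILENAME_ENCODING).
FILENAME_ENCODING = "utf-8"


def decode_name(raw: bytes, filename_encoding: Optional[str] = None) -> str:
    """Decode a stored name, falling back to the module-wide default."""
    return raw.decode(filename_encoding or FILENAME_ENCODING)


# Name bytes as a tool using Latin-1 would have written them.
stored = "caf\u00e9.txt".encode("latin-1")  # b'caf\xe9.txt'

try:
    # With the UTF-8 default, 0xE9 opens a multi-byte sequence that the
    # following '.' cannot continue, so decoding raises.
    decode_name(stored)
except UnicodeDecodeError as exc:
    print("default decode failed:", exc.reason)

# Passing the codec the image was written with recovers the name.
print(decode_name(stored, filename_encoding="latin-1"))
```

Not every mismatch raises: Latin-1 accepts any byte sequence, so decoding Shift-JIS bytes as Latin-1 silently yields the wrong text. This is the mis-decode case the commit message mentions.
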
Add a --filename-encoding option to the shared CLI parser so create,
extract, list, and repl can encode/decode image filenames with a
non-UTF-8 codec (e.g. latin-1, shift-jis). Defaults to None so the
default lives solely in lfs.FILENAME_ENCODING (utf-8).

Unlike name_max/attr_max/file_max, this is a host-side encode/decode
choice and is never stored in the image, so it may differ freely
between create and extract.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
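
A shared option that defaults to `None`, so that the real default lives in one place, can be sketched with `argparse` as follows. The parser layout, the `prog` name, and `effective_encoding` are assumptions made for illustration; the actual `__main__.py` may be organized differently.

```python
import argparse

# Stand-in for lfs.FILENAME_ENCODING, the single place the default lives.
DEFAULT_FILENAME_ENCODING = "utf-8"


def build_parser() -> argparse.ArgumentParser:
    # Options shared by every subcommand go on a parent parser.
    common = argparse.ArgumentParser(add_help=False)
    common.add_argument(
        "--filename-encoding",
        default=None,  # None means "defer to the library default"
        help="codec used to encode/decode filenames in the image",
    )

    parser = argparse.ArgumentParser(prog="littlefs-sketch")
    sub = parser.add_subparsers(dest="command", required=True)
    for name in ("create", "extract", "list", "repl"):
        sub.add_parser(name, parents=[common])
    return parser


def effective_encoding(args: argparse.Namespace) -> str:
    """Resolve the CLI value against the library-level default."""
    return args.filename_encoding or DEFAULT_FILENAME_ENCODING


parser = build_parser()
print(effective_encoding(parser.parse_args(["list"])))
print(effective_encoding(
    parser.parse_args(["extract", "--filename-encoding", "latin-1"])
))
```

Because the codec is not recorded in the image, a reader has to be told which one the writer used. A later commit in this PR tightens the CLI help text to say that `extract` must use the same encoding as `create`.
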
Confirmed against the upstream C source: lfs_path_namelen() (strcspn)
measures the name in bytes and lfs.c checks `nlen > name_max`, so a
multi-byte UTF-8 character counts as 2-4 bytes against the limit.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
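
Since the limit is checked in bytes, a name's character count can understate how much of `name_max` it uses. The sketch below illustrates the arithmetic. `NAME_MAX = 255` is an assumed value for illustration (the real limit comes from the filesystem configuration), and `name_fits` is a hypothetical helper.

```python
NAME_MAX = 255  # assumed limit for illustration only


def name_fits(name: str, name_max: int = NAME_MAX, encoding: str = "utf-8") -> bool:
    """Compare the encoded byte length, not the character count, to name_max."""
    # The C check rejects when nlen > name_max, so equality is allowed.
    return len(name.encode(encoding)) <= name_max


# One character each, but 1, 2, 3 and 4 bytes respectively in UTF-8.
for ch in ("a", "\u00e9", "日", "📁"):
    print(repr(ch), len(ch.encode("utf-8")), "byte(s)")

long_name = "日" * 100  # 100 characters, 300 bytes in UTF-8
print(len(long_name), "chars,", len(long_name.encode("utf-8")), "bytes,",
      "fits" if name_fits(long_name) else "too long")
```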

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated 4 comments.

Comment thread src/littlefs/lfs.pyi Outdated
Comment thread src/littlefs/__main__.py Outdated
Comment thread src/littlefs/__init__.py Outdated
Comment thread test/test_unicode_filenames.py
- lfs.pyi: dir_read returns Optional[LFSStat] (None at end-of-directory)
- __init__.py: scandir wraps iteration in try/finally so dir_close always
  runs even if dir_read raises (e.g. UnicodeDecodeError)
- __main__.py: clarify that extract must use the same --filename-encoding
  as create; it cannot differ freely
- lfs.pyx: file_sync now returns the error code, matching its -> int stub
  and every other file_* function (was the lone outlier returning None)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
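
The `scandir` fix listed above follows a common generator pattern: put the read loop in `try` and the close in `finally`. The sketch below shows the idea with a fake directory handle. `FakeDir` and this `scandir` are hypothetical stand-ins, not the code in `__init__.py`.

```python
from typing import Iterator, List, Optional


class FakeDir:
    """Hypothetical stand-in for a directory handle; not the littlefs API."""

    def __init__(self, raw_names: List[bytes]) -> None:
        self._raw = list(raw_names)
        self.closed = False

    def read(self, encoding: str) -> Optional[str]:
        # Mirrors the "None at end-of-directory" convention.
        if not self._raw:
            return None
        return self._raw.pop(0).decode(encoding)

    def close(self) -> None:
        self.closed = True


def scandir(handle: FakeDir, encoding: str = "utf-8") -> Iterator[str]:
    try:
        while True:
            name = handle.read(encoding)
            if name is None:
                break
            yield name
    finally:
        # Runs on normal exhaustion, when read() raises, and when the
        # generator is closed before it is exhausted.
        handle.close()


# The second name is Latin-1, so decoding it as UTF-8 raises mid-iteration.
handle = FakeDir([b"ok.txt", b"caf\xe9.txt"])
try:
    list(scandir(handle))
except UnicodeDecodeError:
    pass
print(handle.closed)  # True: the handle was still closed
```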

Copilot AI left a comment

Pull request overview

Copilot reviewed 6 out of 6 changed files in this pull request and generated no new comments.

@BrianPugh BrianPugh merged commit fcd02ba into jrast:master Jun 5, 2026
21 checks passed