
osfs: Add experimental WithMmap for read-only Opens #213

Merged
hiddeco merged 2 commits into go-git:main from hiddeco:feat/readerat-fs on May 19, 2026

@hiddeco
Member

@hiddeco hiddeco commented May 15, 2026

Pins the io.ReaderAt concurrent-safety contract on billy.File.ReadAt so consumers performing concurrent random-access reads (pack scanners, idx readers, etc.) can rely on a shared handle. Adds -race tests in memfs and embedfs asserting compliance; osfs already inherits the property from *os.File.ReadAt.

Adds an opt-in osfs.WithMmap() constructor option. On darwin and linux, opening a file without write flags on a WithMmap-configured filesystem returns a memory-mapped billy.File; on other platforms the option is accepted but has no effect. The mmap-backed file:

  • tracks a real cursor over the mapped bytes for Read/Seek;
  • serves ReadAt concurrently via an RWMutex that serialises Close against in-flight reads (otherwise munmap would invalidate the mapping under a racing read);
  • rejects Write/WriteAt/Truncate with os.ErrPermission, since it is read-only by construction;
  • registers a finalizer via runtime.SetFinalizer on construction and clears it on Close, so a forgotten Close doesn't leak the mapping;
  • falls through to the regular fd-backed *file wrapper when mmap is unavailable for a specific file (size 0, larger than math.MaxInt on a 32-bit platform, kernel rejection for pipes/devices/FS quirks).

Motivation

Consumers that repeatedly read large hot files (go-git's pack/idx/rev scanners are the motivating example) benefit from an mmap-backed ReadAt path: the per-call syscall cost goes away, and parallel reads from multiple goroutines proceed without serialisation. The existing *os.File.ReadAt already serves correct concurrent random-access reads on darwin/linux; the new path replaces the pread(2) syscall with a slice copy.

go-git/go-git#2132 (PackHandle refcounted FD lifecycle) consumes this path via fs.Open(path).ReadAt(...) on a WithMmap-configured osfs.

Experimental

WithMmap is marked experimental in its godoc. Today every read-only Open on a WithMmap-configured filesystem becomes mmap-backed, including small files (config, refs, hooks) where mmap can be neutral or net-negative. If real-world feedback shows the per-FS opt-in is too coarse, the option's effect may later narrow to a per-call choice.

mmap also changes failure semantics:

  • Truncating the underlying file while it is mapped makes reads past the new end of file raise SIGBUS instead of returning an error.
  • Replacing the file via rename leaves the mapping pointing at the old inode.
  • The mmap-backed file does not satisfy billy.Syncer or the Locker interface even though the surrounding filesystem advertises both capabilities.

Callers that may see the file mutate underneath them should leave the option off.

Comment thread osfs/readerat_mmap.go Outdated
Pins the concurrency contract on `billy.File.ReadAt`: parallel
calls from multiple goroutines on the same handle must be safe,
matching the `io.ReaderAt` contract documented in package `io`.
The contract has always been inherited via the embedded interface,
but only stdlib convention enforced it — backings could ship a
`billy.File` whose `ReadAt` serialised reads and still satisfy
the interface. Making the contract explicit in the godoc lets
consumers performing concurrent random-access reads (e.g. pack
readers, idx scanners) rely on a single shared handle without
type-asserting against a separate capability interface.

Adds `-race` tests in `memfs` and `embedfs` that fan eight workers
out across a shared handle, hitting randomised offsets against a
known pattern. Both backings already honoured the contract —
`memfs/content.ReadAt` reads under a `sync.RWMutex`, and `embedfs`
delegates to `bytes.Reader.ReadAt` which is concurrent-safe by
design — but the tests pin the behaviour so future wrappers
cannot regress it. `osfs` inherits the same property from
`*os.File.ReadAt` (`pread` on POSIX, `ReadFile`-with-offset on
Windows), exercised by the existing concurrent osfs tests.

The `embedfs` fixture set picks up a deterministic 4 KiB
`testdata/concurrent.bin` for the test; the existing `TestReadDir`
assertion that enumerates the `testdata` directory is updated
accordingly.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
@hiddeco hiddeco force-pushed the feat/readerat-fs branch from 0506f9e to 6598282 May 18, 2026 19:21
Adds an opt-in `osfs.WithMmap()` constructor option. On darwin and
linux, an `osfs` filesystem built with the option returns a memory-
mapped `billy.File` from `Open` (and any `OpenFile` without write
flags); on other platforms the option is accepted but has no
effect — the fd-backed `*file` wrapper continues to satisfy
`billy.File` with the same `*os.File.ReadAt` semantics as before.

The mmap-backed file is a new internal `*mmapFile` type. It
implements `billy.File` honestly: `Read` and `Seek` track a real
cursor over the mapped bytes, `ReadAt` is concurrent-safe via an
`RWMutex` that serialises Close against in-flight reads (otherwise
munmap would invalidate the mapping under a racing read and crash
the process with a memory fault), and `Write`/`WriteAt`/`Truncate`
return `os.ErrPermission`. A `runtime.SetFinalizer` is set on
construction and cleared on Close so a forgotten Close cannot leak
the mapping — the pattern `golang.org/x/exp/mmap` uses, transposed
to our handle.

Files where mmap is unavailable for benign reasons (size 0, size
larger than `math.MaxInt` on a 32-bit platform, kernel rejection
for pipes / devices / FS quirks) fall through to the regular
`*file` wrapper without surfacing an error. Real failures (stat
failing, etc.) are propagated and the underlying fd is closed.

The option is marked **experimental** in its godoc. Today every
read-only `Open` on a `WithMmap`-configured filesystem becomes
mmap-backed; that is intentional but may not be the right
granularity in the long run. If real-world feedback shows the
per-FS opt-in is too coarse — for example, callers reading both
large hot files (where mmap helps) and small config/refs files
(where it is neutral or net-negative) under the same filesystem
— the option's effect may narrow to per-call.

This supersedes earlier iterations of the feature (an extension
`billy.ReaderAtFS` interface returning a narrow `ReaderAtCloser`
type, with `OpenReaderAt` methods on every backing and forwarder
plumbing through `helper/chroot` and `helper/polyfill`). The
narrower design did not earn its API surface for what is, in the
end, "let `osfs` opt into mmap" — `billy.File` already names the
shape of a readable, closable, seekable file, and the
concurrency-safe `ReadAt` contract on it (pinned in the previous
commit) is sufficient for the use cases the extension interface
was meant to serve.

Tests parametrise across both backings (`fd`, `mmap`) where the
platform offers them. `TestOpenBackingSelection` asserts that
`WithMmap` actually flips the concrete type on darwin/linux and
that the write-mode case bypasses mmap regardless. Concurrent
ReadAt tests run under `-race` to exercise the RWMutex.

Assisted-by: Claude Opus 4.7
Signed-off-by: Hidde Beydals <hidde@hhh.computer>
@hiddeco hiddeco force-pushed the feat/readerat-fs branch from 6598282 to 0dea49c May 18, 2026 19:45
@hiddeco hiddeco changed the title from "Add ReaderAtFS extension" to "osfs: Add experimental WithMmap for read-only Opens" May 18, 2026
@hiddeco hiddeco marked this pull request as ready for review May 18, 2026 19:51
Member

@pjbgf pjbgf left a comment


@hiddeco Thanks for working on this. 🙇

@hiddeco hiddeco merged commit 0095b06 into go-git:main May 19, 2026
16 checks passed
@hiddeco hiddeco deleted the feat/readerat-fs branch May 19, 2026 11:22
2 participants