Open
61 commits
086a3e1
Scaffold elfuse oci subcommand and image reference parser
Max042004 May 15, 2026
43a3d38
Add OCI content-addressable blob store and SHA-256 digester
Max042004 May 15, 2026
9bf7141
Add OCI manifest, image-index, and image-config parsers
Max042004 May 15, 2026
cc97d97
Add OCI registry HTTPS client (anonymous + bearer token challenge)
Max042004 May 15, 2026
c8e1e97
Add OCI registry private-registry options (basic auth, custom CA, ins…
Max042004 May 15, 2026
08a2f4e
Add OCI local store and elfuse oci pull pipeline
Max042004 May 15, 2026
0ec6b84
Add OCI offline manifest tree renderer for elfuse oci inspect
Max042004 May 15, 2026
209a338
Vendor zstd v1.5.6 decode-only for OCI layer unpack
Max042004 May 20, 2026
0004508
Add OCI tar reader for ustar and GNU long-name entries
Max042004 May 20, 2026
81078f4
Add OCI decompression dispatch for gzip and zstd layer blobs
Max042004 May 20, 2026
ffae40a
Add OCI sidecar metadata table for unpacked layers
Max042004 May 20, 2026
9545c2e
Add OCI layer applier with whiteout, symlink-escape, hardlink semantics
Max042004 May 20, 2026
c59640f
Add OCI sysroot volume provisioning over sparse case-sensitive APFS
Max042004 May 20, 2026
f317c81
Add clonefile-based per-run rootfs for OCI image clones
Max042004 May 20, 2026
ed7bb61
Wire OCI unpack pipeline and add oci unpack / clone subcommands
Max042004 May 20, 2026
56f327b
Silence hdiutil stdout in sysroot detach and create
Max042004 May 20, 2026
1d7bb30
Add image-config runtime block to elfuse oci inspect
Max042004 May 20, 2026
0ad590d
Add OCI runspec resolver for image runtime + CLI override merge
Max042004 May 20, 2026
5ad5e20
Add OCI guest PATH resolver with sysroot containment
Max042004 May 20, 2026
9c2779b
Extract elfuse_launch from main for Phase 3 oci run reuse
Max042004 May 20, 2026
5938c82
Add elfuse oci run subcommand and orchestrator
Max042004 May 20, 2026
1c0ccf7
Add OCI compat shell smoke and fixture-builder for Phase 3 closeout
Max042004 May 20, 2026
9fc3adb
Add OCI image-layout 1.0.0 marker to store root
Max042004 May 21, 2026
c954a83
Move OCI store pins from refs/ flat-file to index.json
Max042004 May 21, 2026
5cd7a82
Auto-migrate OCI store refs/ flat-files to index.json on open
Max042004 May 21, 2026
f2e0494
Add OCI origin sidecar to unpacked image trees
Max042004 May 21, 2026
3b99337
Add OCI root-set walker for store garbage collection
Max042004 May 21, 2026
5ad07e5
Add OCI image prune mark-and-sweep
Max042004 May 21, 2026
e6cb907
Add OCI prune filters: --older-than and --keep-bytes
Max042004 May 21, 2026
9259014
Lift OCI unpack per-layer step into public oci_unpack_layer helper
Max042004 May 21, 2026
34ec1fe
Add OCI per-layer unpack snapshot cache via APFS clonefile
Max042004 May 21, 2026
bb42fd6
Add OCI raw-tar layer apply mode for Plan 3 C3.3 cache populate
Max042004 May 21, 2026
8e254c8
Add OCI layers schema marker and v1 cache auto-migration
Max042004 May 21, 2026
d521aa2
Add OCI ChainID helper and stack cache APIs for Plan 3 C3.3c
Max042004 May 21, 2026
946caaf
Rewrite OCI unpack orchestrator on raw + ChainID stack cache
Max042004 May 21, 2026
345448b
Add OCI cross-image dedup metrics for oci inspect
Max042004 May 21, 2026
4df17b1
Add OCI rebuild-cache for back-filling stack snapshots
Max042004 May 21, 2026
920d61c
Add OCI layer and stack prune sweep
Max042004 May 21, 2026
be475a1
Add OCI store-wide status command
Max042004 May 22, 2026
28ce75b
Add OCI pull --refresh manifest revalidation
Max042004 May 22, 2026
d01b703
Add OCI policy.json schema and loader
Max042004 May 22, 2026
4da7e9c
Plumb OCI policy.json into fetch and pull CLI
Max042004 May 22, 2026
537e855
Add OCI policy registries.d overlay
Max042004 May 22, 2026
10f2c57
Add OCI parallel blob fetch via curl_multi
Max042004 May 22, 2026
e30adfc
Add OCI HTTP Range resume for partial blob fetches
Max042004 May 22, 2026
7a5d1e1
Add OCI per-blob progress callback and TTY/non-TTY renderer
Max042004 May 22, 2026
b72e866
Add OCI clone-rootfs writable-fs DoD coverage
Max042004 May 22, 2026
095db72
Add OCI runtime files injection for resolv.conf / hosts / hostname
Max042004 May 22, 2026
5bf4639
Add OCI runtime /dev/full and /dev/console emulation
Max042004 May 22, 2026
0b13164
Add OCI runtime /proc surface for cgroup hostname comm statm
Max042004 May 22, 2026
64a223d
Add OCI image-config User symbolic resolution via /etc/passwd
Max042004 May 22, 2026
ce066b9
Document OCI Phase 4 runtime surface and libc-adjacent envelope
Max042004 May 22, 2026
76303c2
Add OCI image index walk to oci run subcommand
Max042004 May 22, 2026
575707c
Add OCI_FETCH_ONLINE=1 alpine:3 end-to-end smoke
Max042004 May 22, 2026
45380d9
Fix OCI layer-apply rejecting root tar entry
Max042004 May 23, 2026
60a3c5d
Use OCI unpack copyfile fallback for cross-volume stage
Max042004 May 23, 2026
17babd0
Add OCI tar PAX path linkpath support
Max042004 May 23, 2026
6cd07c2
Add OCI compat heavy mode sparsebundle + alpine-shaped fixture
Max042004 May 23, 2026
b2f64b8
Add OCI compat heavy mode busybox-shaped fixture
Max042004 May 23, 2026
c4a1110
Add OCI compat heavy mode two-layer-whiteout fixture
Max042004 May 23, 2026
700ac9d
Add ELFUSE_OCI_PROGRESS=plain to opt out of pull in-place redraw
Max042004 May 23, 2026
8 changes: 7 additions & 1 deletion .gitignore
@@ -1,6 +1,12 @@
build/
archive/
externals/
# externals/ holds downloaded fixtures (kernel, rootfs, packages) that are
# fetched on demand; tracking them in git would balloon the repo. The
# vendored cJSON and zstd trees are exceptions: they ship with the source
# so the OCI parser and layer unpacker build out of the box.
externals/*
!externals/cjson/
!externals/zstd/
lib/modules/
*.o
*.bin
279 changes: 277 additions & 2 deletions Makefile

Large diffs are not rendered by default.

173 changes: 173 additions & 0 deletions docs/usage.md
@@ -99,6 +99,179 @@ and memory access, and per-thread inspection. Implementation details, including
the snapshot protocol used to keep Hypervisor.framework register access on the
owning thread, are documented in [internals.md](internals.md).

## Running OCI Images (`elfuse oci run`)

Phase 3 adds a direct-execution path for pulled OCI images:

```sh
elfuse oci run [OPTIONS] IMAGE [ARG...]
```

The subcommand reads the image's runtime block (Entrypoint, Cmd, Env,
WorkingDir, User) and folds in any CLI overrides. It then unpacks the
image into the local APFS sysroot volume, clones a per-run rootfs with
APFS `clonefile(2)`, resolves `argv[0]` against `PATH` inside that
rootfs, and hands off to the same VM bring-up used by the legacy
positional-ELF `elfuse` entry point.
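As a rough illustration of the `argv[0]` step, the C sketch below walks the `PATH` entries and probes each candidate under the rootfs directory. It is not elfuse's resolver: `resolve_guest_argv0`, `candidate_ok`, and `has_dotdot_segment` are hypothetical names. Only the final path component is probed with `lstat`, so a guest symlink such as `/bin/sh -> /bin/busybox` is accepted rather than followed on the host (assuming guest-side path translation resolves it inside the rootfs); intermediate directories are not checked, and relative names containing `/` are simply rejected.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* True if `s` contains a ".." path segment ("..", "a/../b", "/x/.."). */
static bool has_dotdot_segment(const char *s) {
    for (const char *p = s;;) {
        const char *slash = strchr(p, '/');
        size_t len = slash ? (size_t)(slash - p) : strlen(p);
        if (len == 2 && p[0] == '.' && p[1] == '.')
            return true;
        if (!slash)
            return false;
        p = slash + 1;
    }
}

/*
 * Probe <rootfs><guest_path> with lstat so the final component is never
 * followed on the host. Symlinks are accepted as-is (assumption: the guest
 * side resolves them inside the rootfs). Regular files need an execute bit.
 */
static bool candidate_ok(const char *rootfs, const char *guest_path) {
    char host[2048];
    struct stat st;
    int n = snprintf(host, sizeof host, "%s%s", rootfs, guest_path);
    if (n < 0 || (size_t)n >= sizeof host || lstat(host, &st) != 0)
        return false;
    if (S_ISLNK(st.st_mode))
        return true;
    return S_ISREG(st.st_mode) && (st.st_mode & 0111) != 0;
}

/* Resolve argv[0]; on success writes the guest-visible path into `out`. */
static bool resolve_guest_argv0(const char *rootfs, const char *path_env,
                                const char *name, char *out, size_t out_len) {
    char guest[1024];
    if (name[0] == '\0' || has_dotdot_segment(name))
        return false;
    if (name[0] == '/') /* absolute: no PATH search */
        return candidate_ok(rootfs, name) &&
               (size_t)snprintf(out, out_len, "%s", name) < out_len;
    if (strchr(name, '/')) /* relative path with '/': not handled here */
        return false;
    for (const char *p = path_env;;) {
        const char *colon = strchr(p, ':');
        size_t dlen = colon ? (size_t)(colon - p) : strlen(p);
        /* Only absolute PATH entries; empty or relative ones mean the cwd. */
        if (dlen > 0 && p[0] == '/') {
            int n = snprintf(guest, sizeof guest, "%.*s/%s", (int)dlen, p, name);
            if (n > 0 && (size_t)n < sizeof guest &&
                !has_dotdot_segment(guest) && candidate_ok(rootfs, guest))
                return (size_t)snprintf(out, out_len, "%s", guest) < out_len;
        }
        if (!colon)
            return false;
        p = colon + 1;
    }
}
```

With `PATH=/usr/bin:/bin` and an executable at `<rootfs>/bin/sh`, resolving `sh` yields the guest path `/bin/sh`, which is what the guest would later `execve`.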

The image must already be pulled. `oci run` does not auto-pull on miss.
The usual workflow is:

```sh
elfuse oci pull alpine:3
elfuse oci run alpine:3 /bin/sh -c 'echo hello from inside'
```

### Options

| Option | Meaning |
|--------|---------|
| `--store DIR` | Override the local store root |
| `--volume DIR` | Override the APFS sysroot volume mount point |
| `--entrypoint PROG` | Replace the image Entrypoint with `PROG` |
| `-e KEY=VAL`, `--env KEY=VAL` | Set or replace one env var (repeatable) |
| `-e KEY`, `--env KEY` | Import `KEY` from the host environ (repeatable) |
| `-w DIR`, `--workdir DIR` | Override image WorkingDir |
| `-u UID[:GID]`, `--user UID[:GID]` | Override image User (numeric only) |
| `--keep` | Keep the per-run cloned rootfs after exit |
| `--name NAME` | Reserved: deterministic clone-dir suffix (ignored today) |

Review comment (P2), docs/usage.md line 135: Contradictory documentation for `--user`. The options table describes it as "numeric only", but the User and WorkingDir section below describes symbolic-name resolution (accepting a symbolic `name` or `name:group` and reading `/etc/passwd` and `/etc/group`). These cannot both be correct.

Suggested change, replacing:

| `-u UID[:GID]`, `--user UID[:GID]` | Override image User (numeric only) |

with:

| `-u UID[:GID]`, `--user UID[:GID]` | Override image User (supports numeric UID[:GID] or symbolic name[:group]) |

### Argv override matrix

| Image Entrypoint | Image Cmd | CLI ARGV | `--entrypoint` | Result argv |
|--|--|--|--|--|
| set | set | none | none | Entrypoint ++ Cmd |
| set | set | provided | none | Entrypoint ++ CLI ARGV (Cmd dropped) |
| set | none | provided | none | Entrypoint ++ CLI ARGV |
| none | set | none | none | Cmd |
| none | set | provided | none | CLI ARGV (Cmd dropped) |
| set | set | optional | provided | [`--entrypoint`] ++ CLI ARGV |
| none | none | provided | none | CLI ARGV |
| none | none | none | none | `EINVAL` "image has no entrypoint or cmd; pass one on the CLI" |
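The matrix reduces to two rules: `--entrypoint` replaces the image Entrypoint and drops Cmd, while CLI ARGV replaces only Cmd. The minimal C sketch below folds the inputs that way; the `strv` type and the `resolve_argv` / `strv_append` functions are hypothetical, and returning negative errno values is an assumed convention rather than elfuse's actual API.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* A borrowed, length-counted string vector (hypothetical helper type). */
typedef struct {
    const char *const *v;
    size_t n;
} strv;

/* Append every element of `src` to `out`; false if `cap` would overflow. */
static bool strv_append(const char **out, size_t *len, size_t cap, strv src) {
    if (src.n > cap - *len)
        return false;
    for (size_t i = 0; i < src.n; i++)
        out[(*len)++] = src.v[i];
    return true;
}

/*
 * Fold image Entrypoint/Cmd with CLI overrides. `cli_entrypoint` is NULL
 * when --entrypoint was not given. Returns the argv length written to
 * `out`, -EINVAL when nothing is runnable, or -E2BIG when `cap` is too small.
 */
static long resolve_argv(strv img_ep, strv img_cmd, strv cli_argv,
                         const char *cli_entrypoint,
                         const char **out, size_t cap) {
    size_t len = 0;
    if (cli_entrypoint) {
        /* --entrypoint replaces Entrypoint; the image Cmd is dropped. */
        strv ep = { &cli_entrypoint, 1 };
        if (!strv_append(out, &len, cap, ep) ||
            !strv_append(out, &len, cap, cli_argv))
            return -E2BIG;
    } else {
        /* CLI ARGV, when present, replaces Cmd but keeps Entrypoint. */
        strv tail = cli_argv.n ? cli_argv : img_cmd;
        if (!strv_append(out, &len, cap, img_ep) ||
            !strv_append(out, &len, cap, tail))
            return -E2BIG;
    }
    if (len == 0)
        return -EINVAL; /* "image has no entrypoint or cmd; pass one on the CLI" */
    return (long)len;
}
```

For an image with Entrypoint `["/docker-entrypoint.sh"]` and Cmd `["nginx", "-g", "daemon off;"]`, running with a CLI ARGV of `sh` produces `["/docker-entrypoint.sh", "sh"]`, matching the second row of the matrix.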

### Env merge policy

The merged guest env is built in this order:

1. Image `Env` (verbatim, in spec order)
2. Each CLI `-e KEY=VAL` set-or-replaces by key
3. Each CLI `-e KEY` (no `=`) imports the host's value when present, otherwise drops silently
4. `TERM` auto-imported from the host iff the merged env has no `TERM`
5. `PATH=/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin` injected iff the merged env has no `PATH`
6. `container=elfuse` injected unconditionally so systemd-style sandbox detection works

CLI `-e DYLD_*=...` overrides are hard-rejected with `EINVAL`: `DYLD_*` is a
macOS-only loader contract with no meaning inside an aarch64-linux guest.
Image-provided `DYLD_*` entries pass through (the guest ignores them).

### User and WorkingDir

`User` accepts seven shapes: the empty string (no override), a numeric
`UID`, `UID:GID`, a symbolic `name`, `name:group`, `UID:group`, or
`name:GID`. Symbolic forms read `/etc/passwd` and `/etc/group` from
the cloned rootfs. A token made entirely of ASCII digits is always
parsed numerically, even when a same-named account ships in the image
(this matches runc semantics, so an image that happens to carry a
`1234` account does not capture `--user 1234`). When the symbolic
form names an account the unpacked layers do not actually carry,
lookup fails closed; `elfuse` never silently falls back to root.
`--user UID` alone defaults GID to the same value.
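A sketch of these resolution rules, written against plain `/etc/passwd` and `/etc/group` files under a rootfs path. `resolve_user` and `lookup_ids` are hypothetical names. Giving a symbolic `name` without a group its passwd primary GID is an assumption (the text above only pins down the numeric `--user UID` case), and numeric overflow is not checked.

```c
#include <assert.h>
#include <errno.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>

/* A token made entirely of ASCII digits is always numeric (runc rule). */
static bool all_digits(const char *s) {
    if (*s == '\0')
        return false;
    for (; *s; s++)
        if (*s < '0' || *s > '9')
            return false;
    return true;
}

/*
 * Look `name` up in a passwd- or group-format file. Stores field 3 (uid or
 * gid) in *id and, when `primary_gid` is non-NULL, passwd field 4.
 * Returns -ENOENT when the file or the entry is missing (fail closed).
 */
static int lookup_ids(const char *path, const char *name,
                      unsigned long *id, unsigned long *primary_gid) {
    char line[1024], *p, *end;
    size_t nlen = strlen(name);
    int rc = -ENOENT;
    FILE *f = fopen(path, "r");
    if (!f)
        return -ENOENT;
    while (fgets(line, sizeof line, f)) {
        if (strncmp(line, name, nlen) != 0 || line[nlen] != ':')
            continue;
        p = strchr(line + nlen + 1, ':');           /* skip password field */
        if (!p)
            break;
        *id = strtoul(p + 1, &end, 10);
        if (end == p + 1 || *end != ':')
            break;
        if (primary_gid) {
            p = end + 1;
            *primary_gid = strtoul(p, &end, 10);
            if (end == p || *end != ':')
                break;
        }
        rc = 0;
        break;
    }
    fclose(f);
    return rc;
}

/*
 * Resolve a User string against <rootfs>/etc/{passwd,group}.
 * Returns 1 for "" (no override), 0 on success, or a negative errno.
 */
static int resolve_user(const char *rootfs, const char *spec,
                        unsigned long *uid, unsigned long *gid) {
    char user[256], path[1024];
    const char *colon = strchr(spec, ':');
    const char *group = colon ? colon + 1 : NULL;
    size_t ulen = colon ? (size_t)(colon - spec) : strlen(spec);
    unsigned long primary = 0;
    bool have_primary = false;

    if (spec[0] == '\0')
        return 1;
    if (ulen == 0 || ulen >= sizeof user || (group && *group == '\0'))
        return -EINVAL;
    memcpy(user, spec, ulen);
    user[ulen] = '\0';

    if (all_digits(user)) {
        *uid = strtoul(user, NULL, 10); /* never consults /etc/passwd */
    } else {
        snprintf(path, sizeof path, "%s/etc/passwd", rootfs);
        if (lookup_ids(path, user, uid, &primary) != 0)
            return -ENOENT; /* fail closed: never fall back to root */
        have_primary = true;
    }

    if (!group) {
        /* UID alone: GID = UID. Symbolic name alone: passwd GID (assumption). */
        *gid = have_primary ? primary : *uid;
    } else if (all_digits(group)) {
        *gid = strtoul(group, NULL, 10);
    } else {
        snprintf(path, sizeof path, "%s/etc/group", rootfs);
        if (lookup_ids(path, group, gid, NULL) != 0)
            return -ENOENT;
    }
    return 0;
}
```

If the image ships an account literally named `1234` with UID 500, `resolve_user(root, "1234", ...)` still yields UID 1234, which is the runc-compatible behavior described above.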

`WorkingDir` must be absolute and free of `..` segments. If neither the
image nor the CLI sets it, the guest starts in `/`. The directory is
materialized under the cloned rootfs (`mkdir -p`, mode 0755, with a
best-effort chown to the resolved uid:gid when `--user` or the image
User selects credentials).

### Scope guardrails

- Auto-pull on `run` miss -> never; `elfuse oci pull` must run first
- Network policy, `docker run -p`-style port mapping -> later phases
- Live `docker exec`-style attach -> never

### Runtime host-truth surface

`elfuse oci run` runs the guest against a freshly cloned per-run
rootfs and a small set of synthesized host-truth files. The rootfs
is produced by APFS `clonefile(2)` against the unpacked image
layers, so the first guest write to any path triggers copy-on-write
in APFS without touching the original image. The clone is removed at
guest exit unless `--keep` is set; nothing is ever pushed back to
the on-disk image, and concurrent `oci run` invocations against the
same image are isolated.

Three `/etc` files are overwritten in the clone before the guest
starts. Any pre-existing symlink (the common case is
`/etc/resolv.conf -> /run/systemd/resolve/stub-resolv.conf`) is
unlinked first so it does not dangle inside the guest:

| File | Source |
|--|--|
| `/etc/resolv.conf` | `nameserver` lines harvested from `scutil --dns`; falls back to `8.8.8.8` and `1.1.1.1` on any scutil failure |
| `/etc/hosts` | fixed 5-line block: `localhost`, the ip6-loopback aliases, ip6 link-local multicast, and `127.0.0.1 host.elfuse.internal` |
| `/etc/hostname` | literal string `elfuse` |
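The `/etc/resolv.conf` row can be illustrated with a small parser over captured `scutil --dns` text. This sketch assumes `scutil` prints lines of the form `nameserver[0] : 192.168.1.1`. Deduplicating repeated servers and also falling back when no nameserver line is found are assumptions beyond the table, and `build_resolv_conf` is a hypothetical name; passing `NULL` stands in for a failed `scutil` run.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

#define MAX_NS 8 /* arbitrary cap for the sketch */

/*
 * Build /etc/resolv.conf text from captured `scutil --dns` output.
 * `scutil_out == NULL` means the scutil run failed. Returns the number of
 * bytes written (excluding the NUL), or -1 if `out` is too small.
 */
static int build_resolv_conf(const char *scutil_out, char *out, size_t out_len) {
    char ns[MAX_NS][64];
    size_t n_ns = 0, used = 0;

    for (const char *line = scutil_out; line && *line;) {
        const char *eol = strchr(line, '\n');
        size_t len = eol ? (size_t)(eol - line) : strlen(line);
        char buf[256], addr[64];
        bool dup = false;
        snprintf(buf, sizeof buf, "%.*s", (int)len, line);
        /* Assumed shape: "  nameserver[<i>] : <address>" */
        if (sscanf(buf, " nameserver[%*u] : %63s", addr) == 1 && n_ns < MAX_NS) {
            for (size_t i = 0; i < n_ns; i++)
                dup = dup || strcmp(ns[i], addr) == 0;
            if (!dup)
                strcpy(ns[n_ns++], addr); /* resolvers often repeat servers */
        }
        line = eol ? eol + 1 : NULL;
    }
    if (n_ns == 0) { /* scutil failed or listed nothing: public fallback */
        strcpy(ns[0], "8.8.8.8");
        strcpy(ns[1], "1.1.1.1");
        n_ns = 2;
    }
    for (size_t i = 0; i < n_ns; i++) {
        int w = snprintf(out + used, out_len - used, "nameserver %s\n", ns[i]);
        if (w < 0 || (size_t)w >= out_len - used)
            return -1;
        used += (size_t)w;
    }
    return (int)used;
}
```

Copying each line into a bounded buffer before `sscanf` keeps a malformed `nameserver` line from consuming text on the following line.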

The following pseudo-filesystem paths are synthesized by the host-side
openat interceptor and do not need to exist inside the rootfs:

| Path | Behavior |
|--|--|
| `/dev/null`, `/dev/zero`, `/dev/random`, `/dev/urandom`, `/dev/tty` | redirected to the host device of the same name |
| `/dev/full` | reads return zero-filled data; any write of non-zero length fails with `ENOSPC` |
| `/dev/console` | mirrored from the controlling tty when present (macOS reserves the real `/dev/console` for the kernel) |
| other `/dev/*` | `ENOENT` |
| `/proc/cpuinfo`, `/proc/meminfo`, `/proc/version` | derived from host sysctl |
| `/proc/self/{maps,exe,status,stat,comm,statm,cgroup}` | synthesized; `cgroup` reports the canonical `0::/` (elfuse runs outside any cgroup hierarchy) |
| `/proc/sys/kernel/{ostype,osrelease,hostname}` | tracks the cached `uname` fields (`Linux`, `6.17.0-20-generic`, `elfuse`) |
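Two of the simplest synthesized surfaces, `/dev/full` and `/proc/self/cgroup`, are easy to state as code. The handlers below are hypothetical and use a negative-errno return convention by assumption; they only restate the table's behavior and are not elfuse's openat interceptor.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>
#include <sys/types.h>

/* /dev/full reads behave like /dev/zero: the buffer is zero-filled. */
static ssize_t devfull_read(void *buf, size_t len) {
    memset(buf, 0, len);
    return (ssize_t)len;
}

/* Any non-empty write to /dev/full fails as if the device were full. */
static ssize_t devfull_write(const void *buf, size_t len) {
    (void)buf;
    return len == 0 ? 0 : -ENOSPC;
}

/* /proc/self/cgroup: the canonical cgroup v2 root entry. */
static const char *proc_self_cgroup(void) {
    return "0::/\n";
}
```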

### Libc-adjacent compatibility

`elfuse` does not patch libc-adjacent payload (NSS modules, time-zone
data, locale data, character-set converters, dynamic-linker cache)
inside the guest. Each item below names the contract `elfuse` honors
and the failure mode an image hits when it does not ship the
matching files.

- **`/etc/nsswitch.conf`** is read by the guest's libc, not by
`elfuse`. Only the `files` and `dns` backends actually function:
`files` resolves through `/etc/{passwd,group,hosts}` in the cloned
rootfs, and `dns` queries the nameservers listed in the synthesized
`/etc/resolv.conf`. Backends such as `systemd`, `sss`, or `ldap` need
their NSS shared object plus a matching daemon, neither of which
`elfuse` provides.
- **NSS shared objects** (`libnss_systemd.so`, `libnss_sss.so`,
`libnss_ldap.so`, ...) are `dlopen`'d by guest libc against its own
loader. `elfuse` never injects NSS modules: they are aarch64-linux
ELF objects linked against the guest libc, so the macOS host has no way to
load them, and the guest can only `dlopen` the modules its image
already carries.
- **tzdata** (`/usr/share/zoneinfo`, `/etc/localtime`, `/etc/timezone`)
must ship with the image. `elfuse` does not transcode macOS
`/var/db/timezone/zoneinfo` into the tzdata format; if the image is
missing the needed zone, glibc / musl fall back to UTC. The `TZ`
environment variable is honored as-is and is not rewritten by the
Env merge policy.
- **`/usr/lib/locale/locale-archive`** is not regenerated. glibc
images without a built archive (or the matching `<lang>.UTF-8/`
directory) fall back to the `C` locale, so locale-aware `sort` and
`strcoll` collate in plain byte (ASCII) order and `printf` uses
C-locale number formatting. musl images do not use the archive and
are unaffected.
- **`/usr/lib/<triple>/gconv/`** modules and the `gconv-modules`
index must ship with the image. A missing module usually surfaces as
`iconv_open` failing with `EINVAL` (conversion not supported); this
most often shows up when an image ships a stripped glibc layer.
- **`ld.so.cache`** is not rebuilt. The guest dynamic linker reads
whatever cache the image carries; missing entries fall through to
the linker's library-path search, which is the normal slow path.

Common workloads and the symptom-to-workaround mapping:

| Symptom | Trigger | Workaround |
|--|--|--|
| `getaddrinfo` returns `EAI_AGAIN` or an empty result | `/etc/nsswitch.conf` lists a backend (`systemd`, `sss`, ...) that needs a daemon | use a distro whose `nsswitch.conf` is `files dns` (alpine ships this by default; debian needs the file edited) |
| `date`, `strftime` show UTC instead of the expected zone | the image does not contain `/usr/share/zoneinfo/<Zone>` | install tzdata in the image (`apk add tzdata` / `apt install tzdata`), or pass `-e TZ=UTC` to acknowledge UTC |
| `sort` and `strcoll` collate in byte (ASCII) order; `printf` ignores locale number formatting | the image is missing `/usr/lib/locale/locale-archive` or the matching `<lang>.UTF-8/` directory | accept the C-locale fallback, run `locale-gen` during the image build, or use a musl-based image (alpine), which does not depend on the archive |
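For the first row, a pre-flight check on an image's `/etc/nsswitch.conf` can tell whether its `hosts:` database stays within the two backends that work. `hosts_uses_only_files_dns` is a hypothetical helper, not part of elfuse. It ignores bracketed `[STATUS=action]` items and trailing `#` comments, and reports `false` when no `hosts:` line exists so the caller can inspect the file manually.

```c
#include <assert.h>
#include <ctype.h>
#include <stdbool.h>
#include <string.h>

/* True when the "hosts:" line lists only the `files` and `dns` sources. */
static bool hosts_uses_only_files_dns(const char *conf) {
    for (const char *line = conf; line && *line;) {
        const char *eol = strchr(line, '\n');
        const char *end = eol ? eol : line + strlen(line);
        const char *p = line;
        while (p < end && isspace((unsigned char)*p))
            p++;
        if ((size_t)(end - p) >= 6 && strncmp(p, "hosts:", 6) == 0) {
            const char *hash = memchr(p, '#', (size_t)(end - p));
            bool any = false;
            if (hash)
                end = hash;                 /* drop a trailing comment */
            for (p += 6; p < end;) {
                if (isspace((unsigned char)*p)) {
                    p++;
                    continue;
                }
                if (*p == '[') {            /* skip [STATUS=action] */
                    const char *close = memchr(p, ']', (size_t)(end - p));
                    p = close ? close + 1 : end;
                    continue;
                }
                const char *tok = p;
                while (p < end && !isspace((unsigned char)*p) && *p != '[')
                    p++;
                size_t tlen = (size_t)(p - tok);
                if (!((tlen == 5 && strncmp(tok, "files", 5) == 0) ||
                      (tlen == 3 && strncmp(tok, "dns", 3) == 0)))
                    return false;           /* e.g. mdns4_minimal, resolve */
                any = true;
            }
            return any;
        }
        line = eol ? eol + 1 : NULL;
    }
    return false; /* no hosts line: unknown, reported conservatively */
}
```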

## Guest Compatibility Model

`elfuse` is designed for Linux user-space workloads, not for booting a Linux
20 changes: 20 additions & 0 deletions externals/cjson/LICENSE
@@ -0,0 +1,20 @@
Copyright (c) 2009-2017 Dave Gamble and cJSON contributors

Permission is hereby granted, free of charge, to any person obtaining a copy
of this software and associated documentation files (the "Software"), to deal
in the Software without restriction, including without limitation the rights
to use, copy, modify, merge, publish, distribute, sublicense, and/or sell
copies of the Software, and to permit persons to whom the Software is
furnished to do so, subject to the following conditions:

The above copyright notice and this permission notice shall be included in
all copies or substantial portions of the Software.

THE SOFTWARE IS PROVIDED "AS IS", WITHOUT WARRANTY OF ANY KIND, EXPRESS OR
IMPLIED, INCLUDING BUT NOT LIMITED TO THE WARRANTIES OF MERCHANTABILITY,
FITNESS FOR A PARTICULAR PURPOSE AND NONINFRINGEMENT. IN NO EVENT SHALL THE
AUTHORS OR COPYRIGHT HOLDERS BE LIABLE FOR ANY CLAIM, DAMAGES OR OTHER
LIABILITY, WHETHER IN AN ACTION OF CONTRACT, TORT OR OTHERWISE, ARISING FROM,
OUT OF OR IN CONNECTION WITH THE SOFTWARE OR THE USE OR OTHER DEALINGS IN
THE SOFTWARE.

35 changes: 35 additions & 0 deletions externals/cjson/VENDORING.md
@@ -0,0 +1,35 @@
# Vendored cJSON

This directory contains a vendored copy of [cJSON](https://github.com/DaveGamble/cJSON),
the ultralightweight JSON parser written in ANSI C. cJSON ships as a single
`.c` / `.h` pair and is released under the MIT license (see `LICENSE`).

## Why vendored

`oci-roadmap.md` Q9 commits Phase 1 to hand-rolled C alongside the existing
elfuse codebase: no Go, no Rust, no `cargo` / `go` in the build matrix. cJSON
is the smallest credible JSON dependency that fits that contract; it is
self-contained, has no external dependencies, and compiles cleanly with
`clang` and `gcc` on macOS and Linux.

## Version

Pinned to upstream tag `v1.7.18` (2024-07-30). Fetched with:

```sh
curl -fsSL -o cJSON.h https://raw.githubusercontent.com/DaveGamble/cJSON/v1.7.18/cJSON.h
curl -fsSL -o cJSON.c https://raw.githubusercontent.com/DaveGamble/cJSON/v1.7.18/cJSON.c
curl -fsSL -o LICENSE https://raw.githubusercontent.com/DaveGamble/cJSON/v1.7.18/LICENSE
```

## Local modifications

None. The files are byte-identical to the upstream tag so future security
updates can be applied by re-running the curl commands above.

## Build integration

The Makefile compiles `cJSON.c` with project warning flags relaxed: cJSON is
third-party code and its style does not match elfuse's `-Wpedantic
-Wmissing-prototypes -Wshadow` posture. Only `src/oci/` translation units
include `externals/cjson/cJSON.h`; the rest of the codebase never sees it.