Add OCI image support to elfuse so that OCI-packaged Linux userspace can be fetched, unpacked, and executed on macOS without a Linux kernel or VM in the loop.
This issue proposes a phased roadmap and surfaces the compatibility cliffs that need to be answered before code lands. Initial scope is intentionally limited to linux/arm64 images consumed as Linux root filesystems - not to OCI Runtime Spec compatibility.
Background
OCI images have become the standard packaging format for Linux userspace, well beyond their original "container" framing. On macOS, common ways to run those images today are VM-backed:
All execute Linux workloads against a Linux kernel inside a virtualized environment. nerdctl is a CLI typically driven through a Linux VM environment such as Lima; it is not itself a macOS runtime.
They differ in UX, but share one execution model:
OCI image -> Linux VM -> Linux kernel -> runtime -> application
Even OrbStack, which optimizes this aggressively, still executes Linux applications inside a Linux guest kernel.
What elfuse Does Differently
elfuse translates Linux syscalls into macOS primitives via a runtime layer atop Hypervisor.framework:
Linux ELF -> elfuse syscall/runtime -> macOS kernel
The goal is a Linux ABI runtime, not a complete Linux emulation: fast Linux userspace execution, low-overhead process startup, host filesystem access without a virtualized bridge, and direct visibility into the Linux ABI for education and research.
Today, running anything dynamically-linked requires hand-built sysroots fed via --sysroot. OCI integration replaces that with the existing Linux software distribution ecosystem.
Relationship to Apple Container (prior art only)
Apple's container and Containerization projects are useful prior art for image distribution: registry access, image management, and root filesystem construction. They are not an execution-model match. Apple's current design runs each Linux container inside a lightweight VM on Virtualization.framework.
elfuse would only reuse image distribution and unpacking ideas, not the guest-kernel, vsock, or VM-orchestration layers.
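Comparison with OrbStack
OrbStack shows how far the VM-backed model can be pushed. elfuse trades compatibility for a different execution model.
OrbStack runs nearly any Linux container because it runs a real kernel. elfuse targets a narrower set: CLI tooling, compiler and scripting environments, lightweight userspace, and ABI research.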
Non-goals (initial phases)
To set expectations correctly up front, the following are explicitly out of scope and will not be accepted as bug reports against early phases:
Linux namespace, cgroup, seccomp, or capability parity
General container networking semantics (docker run -p, CNI, per-container loopback)
Arbitrary image support regardless of filesystem and userspace assumptions (e.g. images that require overlayfs semantics, an init system, or device management)
Proposed Roadmap
Phase 1 - Pull and verified local store
Pull linux/arm64 images by reference or digest. Select the correct manifest from an image index, verify digests, store blobs and image config locally. No stored user credentials in Phase 1; anonymous token-challenge flows (Docker Hub style) may still be required for public registries.
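The manifest-selection step could look roughly like the sketch below. It assumes the image index has already been fetched and parsed into a dictionary; the function name select_manifest and the example index (including its placeholder digests) are hypothetical and not part of elfuse. The platform variant field is ignored here for brevity.

```python
def select_manifest(index, os_name="linux", arch="arm64"):
    """Return the first descriptor whose platform matches os/architecture.

    Selection relies only on image metadata (the `platform` object),
    never on properties of the host.
    """
    for desc in index.get("manifests", []):
        platform = desc.get("platform") or {}
        if platform.get("os") == os_name and platform.get("architecture") == arch:
            return desc
    raise LookupError(f"no {os_name}/{arch} manifest in image index")


# Hypothetical index with placeholder digests, for illustration only.
EXAMPLE_INDEX = {
    "schemaVersion": 2,
    "mediaType": "application/vnd.oci.image.index.v1+json",
    "manifests": [
        {
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "digest": "sha256:" + "a" * 64,
            "size": 528,
            "platform": {"os": "linux", "architecture": "amd64"},
        },
        {
            "mediaType": "application/vnd.oci.image.manifest.v1+json",
            "digest": "sha256:" + "b" * 64,
            "size": 528,
            "platform": {"os": "linux", "architecture": "arm64"},
        },
    ],
}

selected = select_manifest(EXAMPLE_INDEX)
```

Failing with an explicit error when no linux/arm64 entry exists keeps the behavior deterministic instead of silently falling back to another architecture.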
Phase 2 - Correct layer unpack into an elfuse sysroot
Apply OCI layers into a case-safe local sysroot with correct whiteouts and opaque-directory semantics, symlinks, hardlinks, modes, ownership metadata, and supported xattrs. Document and reject layer features that are not yet supported. The unpacked tree is consumable via the existing --sysroot path.
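As a rough illustration of the whiteout and opaque-directory semantics, the sketch below models a root filesystem as a dictionary from relative paths to file contents. This in-memory model and the name apply_layer are assumptions made for the example; a real unpacker would operate on tar entries and the on-disk sysroot.

```python
import posixpath

WHITEOUT_PREFIX = ".wh."
OPAQUE_MARKER = ".wh..wh..opq"


def apply_layer(tree, layer):
    """Apply one layer (path -> content) on top of a lower tree.

    Whiteouts only hide entries from lower layers; files added by the
    same layer are kept.
    """
    result = dict(tree)
    # Pass 1: process whiteouts against the lower tree only.
    for path in layer:
        parent, name = posixpath.split(path)
        if name == OPAQUE_MARKER:
            # Opaque directory: drop every lower-layer entry beneath `parent`.
            prefix = parent + "/" if parent else ""
            for existing in list(result):
                if existing.startswith(prefix):
                    del result[existing]
        elif name.startswith(WHITEOUT_PREFIX):
            # Plain whiteout: drop the named sibling and anything below it.
            target = posixpath.join(parent, name[len(WHITEOUT_PREFIX):])
            for existing in list(result):
                if existing == target or existing.startswith(target + "/"):
                    del result[existing]
    # Pass 2: add this layer's regular entries; marker files are not materialized.
    for path, content in layer.items():
        if not posixpath.basename(path).startswith(WHITEOUT_PREFIX):
            result[path] = content
    return result


base = {"etc/a": "1", "etc/b": "2", "bin/sh": "x"}
upper = {"etc/.wh.a": "", "bin/.wh..wh..opq": "", "bin/busybox": "y"}
merged = apply_layer(base, upper)
```

Processing all whiteouts before adding the layer's own files is what keeps a file that is deleted and re-added in the same layer from being lost.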
Prerequisite (not optional): commit to an APFS case-sensitivity strategy before this phase begins. macOS APFS is case-insensitive by default; Linux images legally contain case-colliding paths. Acceptable strategies: require a case-sensitive APFS volume, ship a sparse case-sensitive disk image automatically, or reject colliding images with a deterministic error. "Best effort on default APFS" is not acceptable - it produces silent corruption.
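For the "reject colliding images" strategy, collision detection could be sketched as below. The function name find_case_collisions is hypothetical, and str.casefold() is only an approximation of how APFS compares names (APFS is also normalization-insensitive), so treat this as a starting point rather than an exact model.

```python
from collections import defaultdict


def find_case_collisions(paths):
    """Return groups of paths (or directory prefixes) that differ only by case.

    Every prefix of every path is checked, so `Foo/a` and `foo/b` are
    reported as a collision between the directories `Foo` and `foo`.
    """
    groups = defaultdict(set)
    for path in paths:
        parts = path.strip("/").split("/")
        for i in range(1, len(parts) + 1):
            prefix = "/".join(parts[:i])
            groups[prefix.casefold()].add(prefix)
    return sorted(sorted(group) for group in groups.values() if len(group) > 1)


collisions = find_case_collisions(
    ["usr/share/Makefile", "usr/share/makefile", "etc/hosts"]
)
```

Returning a sorted list makes the resulting error message stable across runs, which matches the requirement that rejection be deterministic.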
Phase 3 - Direct execution from image config (narrow class)
Honor the image-config fields needed for direct launch for a narrow compatibility class only: Entrypoint, Cmd, Env, WorkingDir, and numeric User (uid[:gid]; symbolic users are rejected at this phase). Provide:
elfuse run IMAGE [ARG...]
This phase targets non-networked, no-init, no-copy-up-required workloads such as sh -c 'echo ...', BusyBox applets, and simple compiler invocations. Real-world images depending on DNS, /etc/hosts, /dev nodes, or writable rootfs paths require Phase 4.
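One possible shape for the Entrypoint and Cmd merge is sketched below. It assumes the widely used Docker-style convention, where arguments given on the command line replace Cmd and are appended to Entrypoint; the issue leaves the actual override matrix to be documented, and build_argv is a hypothetical name.

```python
def build_argv(config, cli_args=None):
    """Compute the argv to launch from an image config and optional CLI args.

    Assumed convention: argv = Entrypoint + (CLI args if given, else Cmd).
    """
    entrypoint = config.get("Entrypoint") or []
    cmd = list(cli_args) if cli_args else (config.get("Cmd") or [])
    argv = list(entrypoint) + list(cmd)
    if not argv:
        raise ValueError("image config defines neither Entrypoint nor Cmd")
    return argv


# The four cases named in the Phase 3 test matrix.
only_cmd = build_argv({"Cmd": ["/bin/sh"]})
only_entrypoint = build_argv({"Entrypoint": ["/usr/bin/tool"]})
both = build_argv({"Entrypoint": ["/usr/bin/tool"], "Cmd": ["--help"]})
overridden = build_argv({"Entrypoint": ["/usr/bin/tool"], "Cmd": ["--help"]}, ["-v"])
```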
Phase 4 - Runtime environment synthesis
Add the minimum Linux userspace environment real workloads expect: /etc/resolv.conf, /etc/hosts, /etc/hostname, essential /dev nodes, basic /proc, pty / signal / reaping behavior, and a defined writable-filesystem model. This is the work that actually decides whether elfuse run alpine:latest sh is a demo or a tool.
Libc-adjacent compatibility (nsswitch.conf / NSS modules, tzdata, locale-archive, gconv) is acknowledged here as a known compatibility surface. The exact policy is left to implementation, but the issue cannot pretend these do not exist.
OCI Runtime Spec compatibility and containerd / nerdctl backend integration are deliberately deferred past Phase 4; they may never make sense given the non-goals above.
Open Questions
These need answers before Phase 1 begins.
Filesystem and storage
1. Which APFS case-sensitivity strategy will Phase 2 commit to: require a user-provided case-sensitive volume, auto-create a sparse case-sensitive disk image, or strict collision rejection?
2. macOS has no overlayfs. What is the writable-rootfs model: read-only sysroot with writable host bind points, full copy-up tree on the host filesystem, or a userspace union (FUSE)?
3. What exact subset of OCI Image Spec is in scope for Phase 2: whiteouts and opaque directories, hardlinks, xattrs, device nodes, foreign / nondistributable layers, gzip and zstd compression?
Runtime contract
4. User in the image config: numeric UID only is the Phase 3 answer, but what is the long-term plan? Username lookup via the image's own /etc/passwd, host-UID remap, or "best effort" with documented incompatibilities? macOS has no user namespaces, so this is a real design call.
5. Which Linux runtime-environment files does elfuse synthesize versus expect from the image? Candidates: /etc/resolv.conf, /etc/hosts, /etc/hostname, /tmp, /var/tmp, /dev/*, /proc/*, locale-archive, ld.so.cache.
6. Networking model: inherited host networking only, restricted bind behavior, or explicit user-space port-forward / proxy? There is no guest network namespace by design.
Distribution and implementation
7. Registry scope for Phase 1: anonymous + token challenge only, or also auth / credential helpers / custom CAs / digest pinning?
8. Are ORAS artifacts, referrers, and sigstore verification explicitly out of scope for the first implementation?
9. Implementation choice: vendor an existing OCI client (Go or a Rust crate) and add a toolchain dependency, shell out to skopeo / umoci, or implement pull and unpack in C alongside the existing codebase?
Engineering Breakdown
The following sub-tasks turn the roadmap into reviewable units. Each references the OCI spec section it touches and a macOS-specific risk to address. Acceptance criteria are concrete enough to drive tests.
Phase 1 - Pull and verified local store
Parse and normalize OCI image references REGISTRY/REPO[:TAG][@DIGEST] (distribution-spec). DoD: alpine:latest, ghcr.io/owner/img:tag, and repo@sha256:... all resolve to canonical internal keys; malformed refs are rejected deterministically.
Manifest and image index fetch with media-type validation (distribution-spec; image-spec/manifest.md; image-spec/image-index.md). DoD: pull succeeds against Docker Hub and GHCR for both an OCI manifest and an image index; media type is recorded and validated.
linux/arm64 manifest selection from image index (image-spec/image-index.md platform object). Risk: Apple Silicon host may tempt host-based assumptions; selection must use image metadata, not uname alone. DoD: selected digest matches skopeo inspect --raw for linux/arm64.
Anonymous token-challenge support OR explicit rejection of registries requiring it. Risk: conflating "no credentials" with "no auth path". DoD: Docker Hub anonymous pull works end-to-end, or scope is documented and code errors appropriately.
Blob fetch with digest and size verification before persistence (distribution-spec blob pull; image-spec/descriptor.md). Risk: partial downloads on APFS leaving corrupt blobs. DoD: every blob verified against digest and declared size; corrupt blobs ejected, never committed.
Layer media-type support: uncompressed tar, gzip tar; zstd implemented or rejected with a clear "unsupported layer media type" error (image-spec/layer.md). DoD: pull records media type; unpack consumes gzip layers; zstd handled per documented policy.
Local content-addressable blob/config store with atomic writes (image-spec/descriptor.md; optionally image-spec/image-layout.md). DoD: repeated pulls dedupe; interrupted pulls do not leave visible-complete blobs; store survives restart.
Manifest-to-config/layer graph persistence (image-spec/manifest.md; image-spec/config.md). DoD: elfuse oci inspect alpine:latest shows resolved manifest, config, and layer digests plus platform, offline.
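The verify-then-commit behavior for blobs in the Phase 1 list above might be sketched as follows. The function name commit_blob is hypothetical, the blobs/sha256/<hex> directory layout is borrowed from the OCI image layout as an assumption, and the blob is held in memory for brevity where a real implementation would stream and hash incrementally.

```python
import hashlib
import os
import tempfile


def commit_blob(store_dir, data, expected_digest, expected_size):
    """Verify size and sha256 digest, then atomically place the blob in the store."""
    algo, _, hexdigest = expected_digest.partition(":")
    if algo != "sha256":
        raise ValueError(f"unsupported digest algorithm: {algo!r}")
    if len(data) != expected_size:
        raise ValueError("blob size does not match descriptor")
    if hashlib.sha256(data).hexdigest() != hexdigest:
        raise ValueError("blob digest does not match descriptor")

    blob_dir = os.path.join(store_dir, "blobs", "sha256")
    os.makedirs(blob_dir, exist_ok=True)
    final_path = os.path.join(blob_dir, hexdigest)

    # Write to a temporary file in the same directory, then rename into place,
    # so an interrupted write never appears under the final name.
    fd, tmp_path = tempfile.mkstemp(dir=blob_dir, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as f:
            f.write(data)
            f.flush()
            os.fsync(f.fileno())
        os.replace(tmp_path, final_path)
    except BaseException:
        try:
            os.unlink(tmp_path)
        except FileNotFoundError:
            pass
        raise
    return final_path
```

Because the final name is the digest itself, committing the same blob twice simply replaces identical content, which gives deduplication for free.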
Phase 2 - Correct layer unpack
Commit to APFS case-sensitivity strategy (volume / sparse APFSX disk image / collision rejection) and enforce it at unpack time. Risk: default APFS is case-insensitive. DoD: unpack target is provably case-sensitive, or unpack of a colliding image fails deterministically.
Apply layers strictly in manifest order (image-spec/manifest.md). DoD: unpacked tree hash matches umoci unpack for the supported file-type subset.
Whiteout handling: .wh.<name> deletes from lower layers (image-spec/layer.md). DoD: crafted add/delete/add fixture matches umoci final tree on case-sensitive storage.
Opaque whiteout handling: .wh..wh..opq clears lower-layer directory contents (image-spec/layer.md). DoD: lower-dir + upper opaque dir fixture matches umoci.
Symlink unpack with traversal-escape rejection (image-spec/layer.md tar changeset semantics). Risk: absolute or .. symlinks escaping the sysroot. DoD: in-root symlinks unpack; malicious tar entries rejected without touching host paths.
Hardlink preservation (image-spec/layer.md). DoD: hardlinked fixture has matching inode link counts after unpack.
File mode, mtime, uid, gid policy with documented host mapping (image-spec/layer.md). Risk: unprivileged chown limits on macOS. DoD: modes and mtimes round-trip; ownership behavior is documented and test-covered (preserved / best-effort / sidecar).
Reject unsupported special files: block / char devices, sockets (image-spec/layer.md). DoD: unpack aborts with a precise unsupported-entry error naming path and type; no silent skip.
xattr policy: preserve supported subset OR ignore all, one documented behavior. Risk: Linux vs macOS xattr namespace incompatibility. DoD: tests prove deterministic behavior under the chosen policy.
Differential unpack validation against a reference tool. DoD: for alpine, busybox, and nginx:alpine, unpacked tree matches umoci / skopeo+umoci for regular files, dirs, symlinks, hardlinks, modes, and whiteout results.
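For the symlink item in the Phase 2 list above, a purely lexical escape check could look like the sketch below. It assumes absolute targets are interpreted relative to the sysroot root (as they would be at run time), and the name symlink_stays_in_root is hypothetical. A lexical check alone is not sufficient: a real unpacker would also have to avoid writing through symlinks that earlier entries already created.

```python
import posixpath


def symlink_stays_in_root(link_path, target):
    """Return True if `target`, resolved lexically, never climbs above the sysroot.

    `link_path` is the symlink's location relative to the sysroot root;
    `target` is the link target string from the tar entry.
    """
    if target.startswith("/"):
        # Assumption: absolute targets are rooted at the sysroot, not the host.
        candidate = target
    else:
        candidate = posixpath.join("/", posixpath.dirname(link_path), target)

    depth = 0
    for part in candidate.split("/"):
        if part in ("", "."):
            continue
        if part == "..":
            depth -= 1
            if depth < 0:
                return False  # would step above the sysroot root
        else:
            depth += 1
    return True


ok = symlink_stays_in_root("usr/bin/sh", "busybox")
escaped = symlink_stays_in_root("etc/evil", "../../etc/passwd")
```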
Phase 3 - Direct execution from image config (narrow class)
Parse image config blob and expose Entrypoint, Cmd, Env, WorkingDir, User via elfuse oci inspect (image-spec/config.md).
Entrypoint + Cmd merge semantics with documented override matrix (image-spec/config.md). DoD: tested for image-only-Cmd, image-only-Entrypoint, both, and both-with-CLI-override.
Env injection with deterministic merge against host env and CLI overrides; host-var filter policy (image-spec/config.md Env). Risk: DYLD_*, locale, HOME, PATH leakage from macOS host. DoD: guest env shows exactly the documented set, zero unintended host bleed.
WorkingDir normalization under sysroot only; missing-dir policy (image-spec/config.md WorkingDir). DoD: absolute and relative paths normalize within sysroot; nonexistent dirs create-or-fail per documented rule.
Numeric User (uid[:gid]) only at this phase; reject username/groupname forms with an explicit "NSS resolution not yet implemented" error (image-spec/config.md User). Risk: no Linux user namespaces on macOS.
PATH resolution inside the sysroot only; host shell never used. DoD: elfuse run alpine:latest sh -c 'echo ok' resolves /bin/sh from the sysroot.
Tag-to-digest pinning at pull time. DoD: run IMAGE by tag uses the manifest digest recorded at pull; by-digest run is stable.
Phase 3 compatibility matrix: automated tests for alpine, busybox, and one multi-layer image, limited to non-networked commands.
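The numeric-only User rule from the Phase 3 list above could be enforced with a small parser like this sketch. The name parse_numeric_user is hypothetical, and treating an empty User field as uid 0 / gid 0 is an assumption made here, not something the issue specifies.

```python
import re

_NUMERIC_USER = re.compile(r"([0-9]+)(?::([0-9]+))?")


def parse_numeric_user(user):
    """Parse a `uid[:gid]` string; return (uid, gid), with gid None if omitted."""
    if user == "":
        return (0, 0)  # assumption: an empty User field means root
    match = _NUMERIC_USER.fullmatch(user)
    if match is None:
        # Symbolic user or group names would require reading /etc/passwd
        # or NSS inside the image, which Phase 3 does not do.
        raise ValueError(f"unsupported User {user!r}: NSS resolution not yet implemented")
    uid = int(match.group(1))
    gid = int(match.group(2)) if match.group(2) is not None else None
    return (uid, gid)


both_ids = parse_numeric_user("1000:1000")
uid_only = parse_numeric_user("1000")
```

Returning None for a missing gid leaves the choice of default group to a later, documented policy instead of guessing it here.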
Phase 4 - Runtime environment synthesis
Writable-filesystem model: one documented choice of read-only + binds / copy-up tree / userspace union. DoD: touch /hello succeeds inside run; immutable base sysroot remains pristine.
/etc/resolv.conf synthesis from macOS resolver state with documented refresh policy. DoD: getaddrinfo() resolves public names in a supported image.
/etc/hosts and /etc/hostname synthesis with documented precedence vs image-provided files.
Essential /dev nodes via syscall translation: /dev/null, /dev/zero, /dev/random, /dev/urandom, /dev/tty. DoD: dd if=/dev/urandom of=/dev/null count=1 succeeds.
Minimal /proc surface for targeted workloads with documented omissions. DoD: compatibility-matrix commands pass; unsupported files fail predictably.
Libc-adjacent compatibility audit: nsswitch.conf, NSS modules, tzdata, locale-archive, gconv. Document which break which workloads; implementation decisions follow.
Positioning
elfuse with OCI should be described as:
A Linux ABI runtime for macOS that consumes OCI-packaged Linux userspace and launches it directly, without a Linux guest kernel.
It is an OCI image consumer, not an OCI Runtime Spec implementation, and not a Docker replacement. The compatibility envelope is materially narrower than VM-backed runtimes - the non-goals above are the price of removing the Linux kernel from the execution path.
References
OCI specifications referenced throughout this issue: