docs: Add apple/container integration analysis

cgwalters · cgwalters · commit c753572b3c01 · 2026-02-25T16:28:47.000Z
Describe how bcvk could reuse ext4 filesystem images that apple/container
already creates, rather than reimplementing ext4 synthesis. The key insight
is that apple/container's snapshot store contains plain ext4 files at
predictable paths, and bcvk can read them directly.

Cover the kernel extraction problem: bcvk always boots the image's own
kernel, so it needs to read the kernel out of the ext4 image before
booting. The ext4-view crate (pure Rust, read-only ext4 access) solves
this without requiring mount privileges.

Also document apple/container's storage APIs (content store layout,
snapshot store structure) based on source analysis of the Containerization
Swift package.

Assisted-by: OpenCode (claude-opus-4-6)
diff --git a/docs/src/SUMMARY.md b/docs/src/SUMMARY.md
@@ -5,6 +5,9 @@
 - [Installation](./installation.md)
 - [Quick Start](./quick-start.md)
 - [Workflow Comparison](./workflow-comparison.md)
+  - [vs bootc](./vs-bootc.md)
+  - [vs podman-bootc](./vs-podman-bootc.md)
+  - [Apple container integration](./vs-apple-container.md)
 
 # Reference
 
diff --git a/docs/src/vs-apple-container.md b/docs/src/vs-apple-container.md
@@ -0,0 +1,289 @@
+# bcvk and Apple container integration
+
+Apple's [`container`](https://github.com/apple/container) is a Swift-based
+tool that runs Linux containers as lightweight virtual machines on Apple Silicon
+Macs using the macOS Virtualization framework. It is macOS-only (requires macOS
+26+ and Apple Silicon) and targets standard OCI container images.
+
+bcvk runs *bootable* container images as VMs using QEMU/libvirt on Linux.
+
+Both tools use VMs, but in different ways. Apple's `container` runs
+container processes inside lightweight VMs; the VM is an isolation mechanism
+wrapping what is conceptually still a container. bcvk boots a complete OS
+from a container image; the VM *is* the end product, not an implementation
+detail.
+
+The interesting integration opportunity is that Apple's tool creates ext4
+filesystem images from OCI layers and caches them on disk. bcvk could read
+those ext4 images directly, extract the kernel, and boot them as full VMs —
+avoiding the `bootc install to-disk` step on macOS.
+
+## Background
+
+Apple's `container` converts OCI images into ext4 filesystem images using
+`EXT4Unpacker` from the
+[Containerization](https://github.com/apple/containerization) Swift package,
+then attaches them to VMs as virtio-blk devices. A `SnapshotStore` caches the
+ext4 images keyed by manifest digest. The VM boots a separate minimal kernel
+and `vminitd` guest agent — the kernel is *not* from the container image.
+
+bcvk's current flows (ephemeral run via VirtioFS, to-disk via `bootc install`)
+require bootc images that contain a kernel, initramfs, and systemd. The kernel
+is always extracted from the container image itself.
+
+## How Apple's ext4 pipeline works
+
+Apple's `EXT4Unpacker` (in the
+[Containerization](https://github.com/apple/containerization) package) does
+roughly the following:
+
+1. Creates a sparse ext4 filesystem image file via `EXT4.Formatter(path,
+   minDiskSize: N)`. The default minimum size is 512 GiB for regular
+   containers (sparse, so actual disk usage is much smaller).
+
+2. Iterates through the OCI image manifest's layers in order. For each layer,
+   it calls `filesystem.unpack(source: layer.path, ...)`, which reads the
+   layer tarball (gzip, zstd, or uncompressed) and writes its contents
+   directly into the ext4 image. OCI whiteout entries are handled inline:
+   a `.wh.<name>` entry deletes `<name>` inherited from earlier layers, and
+   an opaque marker (`.wh..wh..opq`) hides everything earlier layers placed
+   in that directory.
+
+3. The result is a flat ext4 image containing the fully merged container
+   rootfs. No union filesystem or overlay is needed at runtime.
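+
+A minimal, std-only Rust sketch of these whiteout rules (from the OCI
+image-spec) follows. It is not Apple's Swift code and models the rootfs as a
+set of path strings rather than a real ext4 image; `LayerEntry`, `classify`,
+and `apply_layer` are hypothetical names. Whiteouts in a layer are applied to
+the lower layers before that layer's own additions, so files added next to an
+opaque marker survive.
+
+```rust
+use std::collections::BTreeSet;
+
+/// How a single tar entry in an OCI layer affects the merged rootfs.
+#[derive(Debug, PartialEq, Eq)]
+pub enum LayerEntry<'a> {
+    /// Regular entry: add (or replace) this path.
+    Add(&'a str),
+    /// `.wh.<name>`: delete `<dir>/<name>` inherited from lower layers.
+    Whiteout(String),
+    /// `.wh..wh..opq`: hide everything lower layers put in this directory.
+    OpaqueDir(&'a str),
+}
+
+/// Classify a normalized, relative tar path such as "etc/.wh.motd".
+pub fn classify(entry_path: &str) -> LayerEntry<'_> {
+    let (dir, name) = entry_path.rsplit_once('/').unwrap_or(("", entry_path));
+    if name == ".wh..wh..opq" {
+        LayerEntry::OpaqueDir(dir)
+    } else if let Some(target) = name.strip_prefix(".wh.") {
+        if dir.is_empty() {
+            LayerEntry::Whiteout(target.to_string())
+        } else {
+            LayerEntry::Whiteout(format!("{dir}/{target}"))
+        }
+    } else {
+        LayerEntry::Add(entry_path)
+    }
+}
+
+/// Apply one layer's entries to the set of paths produced by lower layers.
+pub fn apply_layer(rootfs: &mut BTreeSet<String>, entries: &[&str]) {
+    // Tar directory entries may carry a trailing slash; drop it.
+    let entries: Vec<&str> = entries.iter().map(|e| e.trim_end_matches('/')).collect();
+    // Pass 1: whiteouts only affect lower layers, so apply them first.
+    for entry in &entries {
+        match classify(entry) {
+            LayerEntry::Whiteout(path) => {
+                // Remove the path itself and anything beneath it.
+                let prefix = format!("{path}/");
+                rootfs.retain(|p| *p != path && !p.starts_with(&prefix));
+            }
+            LayerEntry::OpaqueDir(dir) => {
+                let prefix = format!("{dir}/");
+                rootfs.retain(|p| !p.starts_with(&prefix));
+            }
+            LayerEntry::Add(_) => {}
+        }
+    }
+    // Pass 2: add this layer's own files.
+    for entry in &entries {
+        if let LayerEntry::Add(path) = classify(entry) {
+            rootfs.insert(path.to_string());
+        }
+    }
+}
+```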
+
+This ext4 image is then attached to a lightweight VM as a virtio-blk device.
+Inside the VM, a minimal guest agent (`vminitd`) mounts it and runs container
+processes within Linux cgroups and namespaces. Critically, the kernel and
+vminitd are *not* from the container image — they're provided separately by the
+`container` tool's own "init image."
+
+## How bcvk's current flows work
+
+bcvk has two main paths for getting from container image to running VM:
+
+**Ephemeral run** (`bcvk ephemeral run`): The container image is pulled via
+podman and mounted directly as the VM's root filesystem using VirtioFS (via
+virtiofsd). The kernel and initramfs are extracted from *within* the container
+image (from `/usr/lib/modules/<version>/` or `/boot/EFI/Linux/*.efi`). The VM
+boots with `rootfstype=virtiofs root=rootfs` and systemd takes over as init.
+This requires a *bootc* image — one that contains a kernel, initramfs, and
+systemd.
+
+**To-disk** (`bcvk to-disk`): An ephemeral VM is launched using the approach
+above, and within it, `bootc install to-disk` runs to install the OS to an
+attached virtio-blk disk. The output is a full disk image (with partition
+table, bootloader, etc.) suitable for libvirt or QEMU.
+
+Both flows fundamentally require bootc images. Standard OCI containers (e.g.
+`docker.io/library/nginx`) lack a kernel, initramfs, and systemd, so bcvk
+can't boot them.
+
+## Reusing Apple's ext4 images directly
+
+Since Apple's `container` tool already synthesizes ext4 rootfs images and
+caches them on disk (see the "Apple's storage APIs" section below), bcvk
+doesn't need to reimplement ext4 synthesis. On macOS, if the user has already
+pulled an image with Apple's `container` tool, the ext4 is sitting at a
+well-known path:
+`~/Library/Application Support/com.apple.containerization/snapshots/<manifest-digest>/snapshot`.
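+
+As a rough illustration of that lookup, here is a hypothetical Rust helper
+(`snapshot_path` and `default_snapshot_base` are not existing bcvk
+functions). It assumes the default base path and that the caller has already
+resolved the platform-specific manifest digest. It also rejects anything that
+is not a 64-character hex SHA-256, so a malformed digest cannot escape the
+snapshots directory.
+
+```rust
+use std::path::{Path, PathBuf};
+
+/// Default store location; assumes `HOME` is set and the user has not
+/// relocated Apple's storage.
+pub fn default_snapshot_base() -> Option<PathBuf> {
+    let home = std::env::var_os("HOME")?;
+    Some(PathBuf::from(home).join("Library/Application Support/com.apple.containerization"))
+}
+
+/// Path of the cached ext4 for a manifest digest, with or without `sha256:`.
+pub fn snapshot_path(base: &Path, manifest_digest: &str) -> Option<PathBuf> {
+    let hex = manifest_digest
+        .strip_prefix("sha256:")
+        .unwrap_or(manifest_digest);
+    let is_sha256_hex =
+        hex.len() == 64 && hex.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
+    if !is_sha256_hex {
+        return None;
+    }
+    Some(base.join("snapshots").join(hex).join("snapshot"))
+}
+```
+
+A caller would then check `path.is_file()`; a missing file presumably means
+`container` has not unpacked that image for this platform yet.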
+
+To boot that ext4 as a VM, bcvk needs to:
+
+1. **Locate the ext4 snapshot** — resolve the image reference to a
+   platform-specific manifest digest, strip the `sha256:` prefix, and look
+   for the file at the snapshot store path.
+
+2. **Extract the kernel** — read the kernel and initramfs out of the ext4
+   image without mounting it (see below).
+
+3. **Boot via QEMU** — direct kernel boot (`-kernel`/`-initrd`) with the
+   ext4 image attached as a virtio-blk device and the right kernel command
+   line to mount it as root.
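+
+The QEMU invocation in step 3 could look roughly like the sketch below. The
+exact flags are an assumption, not bcvk's current code: a raw
+`-drive ... if=virtio` for the ext4 file, `snapshot=on` so guest writes go to
+a temporary overlay instead of Apple's cached image, and a `root=/dev/vda`
+command line because the snapshot is a bare filesystem with no partition
+table. Machine type, accelerator (e.g. `-accel hvf` on macOS), memory, and
+console settings are left to the caller; `qemu_boot_args` is a hypothetical
+name.
+
+```rust
+use std::path::Path;
+
+/// Hypothetical: QEMU arguments for direct kernel boot of a raw ext4 rootfs.
+pub fn qemu_boot_args(kernel: &Path, initrd: Option<&Path>, rootfs: &Path) -> Vec<String> {
+    let mut args = vec!["-kernel".to_string(), kernel.display().to_string()];
+    if let Some(initrd) = initrd {
+        args.push("-initrd".to_string());
+        args.push(initrd.display().to_string());
+    }
+    // QEMU's -drive option syntax requires literal commas to be doubled.
+    let file = rootfs.display().to_string().replace(',', ",,");
+    args.push("-drive".to_string());
+    args.push(format!("file={file},format=raw,if=virtio,snapshot=on"));
+    // The whole ext4 image is the first virtio disk, so root is /dev/vda.
+    args.push("-append".to_string());
+    args.push("root=/dev/vda rootfstype=ext4 rw".to_string());
+    args
+}
+```
+
+The resulting vector would be handed to
+`std::process::Command::new("qemu-system-aarch64").args(...)`.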
+
+This avoids reimplementing Apple's `EXT4Unpacker` entirely. bcvk becomes a
+consumer of Apple's snapshot store rather than a competing image pipeline.
+
+## Kernel extraction from the ext4 image
+
+Apple's `container` ships its own pre-built kernel separately from the
+container image. bcvk takes a different approach: the kernel always comes from
+the container image itself. This is a core design principle — bcvk boots the
+image's own kernel so the VM matches what would run in production. There is no
+"ship a separate kernel" option.
+
+For bootc images accessed via VirtioFS, bcvk already extracts the kernel from
+the mounted filesystem using `find_kernel()` in `crates/kit/src/kernel.rs`.
+That function searches for UKIs in `/boot/EFI/Linux/*.efi` and
+`/usr/lib/modules/<version>/*.efi`, and for traditional `vmlinuz` +
+`initramfs.img` pairs in `/usr/lib/modules/<version>/`. It operates on a
+`cap_std::fs::Dir`, which requires the filesystem to be mounted or otherwise
+accessible as a directory tree.
+
+When working with Apple's ext4 snapshots, the rootfs is an ext4 image file
+rather than a mounted directory. The kernel must be extracted from that image
+*before* the VM boots, because QEMU's `-kernel` flag needs the kernel as a
+host file. The obvious way to read a filesystem image is to mount it, but
+that requires root or FUSE on the host, and macOS has no built-in ext4
+support.
+
+The solution is to use a userspace ext4 reader. The
+[`ext4-view`](https://github.com/nicholasbishop/ext4-view-rs) crate provides
+read-only access to ext4 filesystems from a file or byte buffer, without
+mounting. It is written in pure Rust with no unsafe code, supports `no_std`,
+and its API follows `std::fs` conventions (`read()`, `read_dir()`,
+`metadata()`, `exists()`).
+
+The implementation would work roughly as follows:
+
+1. After locating the ext4 snapshot from Apple's store, open it with
+   `ext4_view::Ext4::load_from_path()`.
+
+2. Run the same kernel search logic that `find_kernel()` uses, but against
+   the `Ext4` filesystem API instead of `cap_std::fs::Dir`. The search paths
+   are identical: `/boot/EFI/Linux/*.efi`, `/usr/lib/modules/<version>/*.efi`,
+   `/usr/lib/modules/<version>/vmlinuz` + `initramfs.img`.
+
+3. Extract the kernel (and initramfs if present) to a temporary file on the
+   host using `Ext4::read()`, which returns the file contents as `Vec<u8>`.
+
+4. Pass the extracted kernel to QEMU via `-kernel` (and `-initrd` if
+   applicable), with the ext4 image as a virtio-blk device.
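+
+A sketch of the extraction step, assuming the bytes have already been read
+out of the image (for example with `Ext4::read()`). `stage_boot_files` and
+the `vmlinuz`/`initramfs.img` output names are hypothetical; the ext4-view
+calls appear only in a comment, so this block compiles with the standard
+library alone.
+
+```rust
+use std::fs;
+use std::io;
+use std::path::{Path, PathBuf};
+
+/// Write the kernel (and optional initramfs) into `dir` so QEMU can use them
+/// with -kernel/-initrd. Returns the paths that were written.
+// With ext4-view, the inputs would come from something like:
+//   let fs = ext4_view::Ext4::load_from_path(snapshot)?;
+//   let kernel = fs.read("/usr/lib/modules/<version>/vmlinuz")?;
+pub fn stage_boot_files(
+    dir: &Path,
+    kernel: &[u8],
+    initramfs: Option<&[u8]>,
+) -> io::Result<(PathBuf, Option<PathBuf>)> {
+    fs::create_dir_all(dir)?;
+    let kernel_path = dir.join("vmlinuz");
+    fs::write(&kernel_path, kernel)?;
+    let initrd_path = match initramfs {
+        Some(bytes) => {
+            let path = dir.join("initramfs.img");
+            fs::write(&path, bytes)?;
+            Some(path)
+        }
+        // UKIs embed the initramfs, so there may be nothing else to write.
+        None => None,
+    };
+    Ok((kernel_path, initrd_path))
+}
+```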
+
+This approach is attractive because `ext4-view`'s API maps closely to
+`cap_std::fs::Dir`. The kernel search logic could be refactored to be generic
+over a filesystem trait — something like a `ReadDir + Read + Metadata`
+abstraction — that both `Dir` and `Ext4` implement. Alternatively, a simpler
+approach: a second `find_kernel_in_ext4()` function that duplicates the search
+logic against the `Ext4` type. Given that the search logic is ~90 lines, a
+small amount of duplication may be acceptable for a first pass, with
+deduplication via a trait coming later.
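+
+To make the trait option concrete, here is a hypothetical sketch: a minimal
+`RootFs` trait, a search over the paths listed above, and an in-memory
+implementation for tests. None of these names exist in bcvk today, and the
+search order (UKIs in `/boot/EFI/Linux` first, then each kernel version
+directory in sorted order) is an assumption that the real `find_kernel()` may
+not share. Adapters for `cap_std::fs::Dir` and `ext4_view::Ext4` would
+implement the two methods on top of their existence checks and directory
+listings.
+
+```rust
+/// Hypothetical read-only view of a root filesystem.
+pub trait RootFs {
+    fn exists(&self, path: &str) -> bool;
+    /// Names of the entries directly inside `path` (empty if absent).
+    fn list_dir(&self, path: &str) -> Vec<String>;
+}
+
+#[derive(Debug, PartialEq, Eq)]
+pub enum Kernel {
+    Uki(String),
+    Traditional { vmlinuz: String, initramfs: String },
+}
+
+/// First `*.efi` entry in `dir`, in sorted order.
+fn first_efi<F: RootFs>(fs: &F, dir: &str) -> Option<String> {
+    let mut names: Vec<String> =
+        fs.list_dir(dir).into_iter().filter(|n| n.ends_with(".efi")).collect();
+    names.sort();
+    names.into_iter().next()
+}
+
+pub fn find_kernel<F: RootFs>(fs: &F) -> Option<Kernel> {
+    if let Some(name) = first_efi(fs, "/boot/EFI/Linux") {
+        return Some(Kernel::Uki(format!("/boot/EFI/Linux/{name}")));
+    }
+    let mut versions = fs.list_dir("/usr/lib/modules");
+    versions.sort();
+    for version in versions {
+        let dir = format!("/usr/lib/modules/{version}");
+        if let Some(name) = first_efi(fs, &dir) {
+            return Some(Kernel::Uki(format!("{dir}/{name}")));
+        }
+        let vmlinuz = format!("{dir}/vmlinuz");
+        let initramfs = format!("{dir}/initramfs.img");
+        if fs.exists(&vmlinuz) && fs.exists(&initramfs) {
+            return Some(Kernel::Traditional { vmlinuz, initramfs });
+        }
+    }
+    None
+}
+
+/// In-memory stand-in for tests: a list of absolute file paths.
+pub struct MemFs(pub Vec<&'static str>);
+
+impl RootFs for MemFs {
+    fn exists(&self, path: &str) -> bool {
+        self.0.iter().any(|p| *p == path)
+    }
+    fn list_dir(&self, path: &str) -> Vec<String> {
+        let prefix = format!("{}/", path.trim_end_matches('/'));
+        let mut names: Vec<String> = self
+            .0
+            .iter()
+            .filter_map(|p| p.strip_prefix(prefix.as_str()))
+            .filter_map(|rest| rest.split('/').next())
+            .map(String::from)
+            .collect();
+        names.sort();
+        names.dedup();
+        names
+    }
+}
+```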
+
+The `ext4-view` crate is Apache-2.0/MIT dual-licensed (compatible with bcvk's
+licensing), has no unsafe code, and is actively maintained. It handles the ext4
+format details (block groups, extent trees, directory entries) that would be
+tedious to implement from scratch.
+
+## What would be different from default `apple/container`
+
+Even though bcvk would read `apple/container`'s ext4 images, the boot model is
+fundamentally different. `apple/container`'s `vminitd` is a purpose-built gRPC
+agent that manages container processes using Linux cgroups and namespaces
+*within* the VM — essentially a container runtime inside a VM. bcvk boots
+systemd and runs the full OS using the image's own kernel. The container
+image *is* the OS.
+
+This means the images bcvk can boot from `apple/container`'s snapshot store are
+limited to bootc-style images that contain their own kernel. Images without a
+kernel (e.g. `docker.io/library/nginx`) are outside bcvk's use case, and bcvk
+would not attempt to boot them.
+
+## Practical assessment
+
+The implementation path for booting `apple/container`'s ext4 snapshots:
+
+1. Locate the ext4 snapshot on disk. Resolve the image reference to a
+   manifest digest (via the OCI index in `apple/container`'s content store
+   or by querying `container` CLI) and find the file at
+   `~/Library/Application Support/com.apple.containerization/snapshots/<digest>/snapshot`.
+
+2. Use `ext4-view` to read the ext4 image and extract the kernel and
+   initramfs to temporary host files, using the same search logic as the
+   existing `find_kernel()`.
+
+3. Boot via QEMU with `-kernel`/`-initrd` pointing to the extracted files
+   and the ext4 image as a virtio-blk root device.
+
+4. Wire this into `bcvk ephemeral run` as a new path on macOS.
+
+The hardest part is not reading the ext4 or extracting the kernel — both are
+straightforward with `ext4-view`. The more interesting design question is
+digest resolution: mapping an image reference to the right snapshot directory.
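+
+The core of that mapping is choosing one entry from a multi-platform image
+index. The sketch below assumes the index JSON has already been parsed into a
+hypothetical `Descriptor` struct holding the OCI descriptor's `digest` and
+`platform.os`/`platform.architecture` fields; it ignores platform variants
+and images whose reference resolves directly to a single manifest.
+
+```rust
+/// Hypothetical, pre-parsed subset of one OCI index `manifests[]` entry.
+pub struct Descriptor<'a> {
+    pub digest: &'a str,
+    pub os: &'a str,
+    pub architecture: &'a str,
+}
+
+/// Pick the manifest digest for a platform such as ("linux", "arm64").
+/// Entries like attestation manifests (platform "unknown/unknown") are
+/// skipped because they never match a real platform.
+pub fn select_manifest<'a>(manifests: &[Descriptor<'a>], os: &str, arch: &str) -> Option<&'a str> {
+    manifests
+        .iter()
+        .find(|d| d.os == os && d.architecture == arch)
+        .map(|d| d.digest)
+}
+```
+
+The returned manifest digest, not the digest of the index itself, is what
+keys the snapshot directory.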
+
+## Apple's storage APIs and what they expose
+
+The `Containerization` Swift package and the `container` tool's services expose
+a layered set of APIs for accessing stored container images and their
+synthesized ext4 filesystems. Understanding these APIs is useful for evaluating
+whether bcvk (or any external tool) could reuse Apple's image storage directly.
+
+### The content store: OCI blobs as files
+
+The lowest layer is `LocalContentStore` (in `ContainerizationOCI`), which
+implements a standard OCI content-addressable storage layout. Blobs are stored
+as flat files at `<basePath>/blobs/sha256/<digest>`, where the default base
+path is `~/Library/Application Support/com.apple.containerization/content/`.
+
+The `ContentStore` protocol provides `get(digest:) -> Content?`, which returns
+a `Content` object for any blob. The `Content` protocol exposes:
+
+- `path: URL` — the filesystem path to the blob file
+- `data() -> Data` — read the entire blob into memory
+- `data(offset:length:) -> Data?` — read a range of the blob
+- `size() -> UInt64` — file size
+- `digest() -> SHA256.Digest` — content hash
+
+`Image.getContent(digest:)` wraps this: given a digest that the image
+references, it returns the `Content` object, from which you can get the `.path`
+to the raw layer tarball on disk. The layers are stored as compressed tarballs
+(gzip or zstd), exactly as pulled from the registry.
+
+Any external tool that knows the digest of a layer can read it directly from
+the filesystem without going through the Swift API — the layout is just files
+in a well-known directory.
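+
+For example, a hypothetical Rust helper could build a blob's path and guess
+its compression from the magic bytes at the start of the file (gzip starts
+with `1f 8b`, zstd frames with `28 b5 2f fd`). The layer's `mediaType` in the
+manifest is the authoritative signal; sniffing is only a fallback. The names
+`blob_path` and `sniff_compression` are illustrative, and the base path is
+assumed to be the default `content/` directory.
+
+```rust
+use std::path::{Path, PathBuf};
+
+/// `<content_base>/blobs/sha256/<hex>`, accepting digests with or without `sha256:`.
+pub fn blob_path(content_base: &Path, digest: &str) -> PathBuf {
+    let hex = digest.strip_prefix("sha256:").unwrap_or(digest);
+    content_base.join("blobs").join("sha256").join(hex)
+}
+
+#[derive(Debug, PartialEq, Eq)]
+pub enum Compression {
+    Gzip,
+    Zstd,
+    Uncompressed,
+}
+
+/// Guess a layer's compression from its first few bytes.
+pub fn sniff_compression(header: &[u8]) -> Compression {
+    if header.starts_with(&[0x1f, 0x8b]) {
+        Compression::Gzip
+    } else if header.starts_with(&[0x28, 0xb5, 0x2f, 0xfd]) {
+        Compression::Zstd
+    } else {
+        // Assume a plain tar; a stricter check would look for "ustar" at offset 257.
+        Compression::Uncompressed
+    }
+}
+```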
+
+### The snapshot store: cached ext4 images
+
+Above the content store sits `SnapshotStore` (in the `container` tool's
+`ContainerImagesService`). This is where synthesized ext4 images are cached.
+The layout on disk is `<basePath>/snapshots/<manifest-digest>/snapshot`, where
+each `snapshot` file is a regular (sparse) ext4 filesystem image.
+
+`SnapshotStore.get(for:platform:)` returns a `Filesystem` object describing
+the cached ext4. The `Filesystem` type has a `source: String` field containing
+the absolute path to the ext4 file, along with `type` (block, virtiofs, etc.),
+`destination` (mount point), and `options` (mount options). For snapshots, the
+type is `.block(format: "ext4", ...)` and the source points to the `snapshot`
+file.
+
+`SnapshotStore.unpack(image:platform:)` creates the ext4 if it doesn't already
+exist: it delegates to `EXT4Unpacker.unpack()`, which iterates the image's
+layers in order, unpacking each compressed tarball directly into an ext4 image
+via `EXT4.Formatter`. The result is moved atomically into the snapshot
+directory. Alongside the `snapshot` file, a `snapshot-info` JSON file stores
+the serialized `Filesystem` metadata.
+
+### Can an external tool read these files?
+
+Yes, straightforwardly. Both the layer tarballs in the content store and the
+ext4 images in the snapshot store are regular files. There is no database, no
+proprietary container format, no locking mechanism that would prevent another
+process from reading them. If Apple's `container` tool has already pulled an
+image and unpacked it, bcvk could read the ext4 file directly from
+`~/Library/Application Support/com.apple.containerization/snapshots/<digest>/snapshot`.
+
+There are caveats. The snapshot store is keyed by the platform-specific
+manifest digest (not the image reference or the index digest), so the image
+reference must first be resolved to the correct manifest digest to find the
+right snapshot directory. Both stores name entries by the bare hex digest
+(`trimmingDigestPrefix` strips the `sha256:` prefix), the same convention as
+the OCI image layout's `blobs/sha256/<hex>`. Both stores can also be relocated
+if the user changes the base path, so the paths above are only the defaults.
+
+### Could bcvk use the Swift APIs directly?
+
+Not practically. The `Containerization` package is Swift-only and the ext4
+writing code (`ContainerizationEXT4`) is gated behind `#if os(macOS)` — it
+won't compile on Linux. The APIs are designed to be consumed from Swift
+processes running on macOS.
+
+However, bcvk doesn't *need* the APIs. Since the on-disk layout is simple and
+well-defined, bcvk can read the ext4 snapshot files directly using their
+filesystem paths. The `ext4-view` crate handles reading the ext4 contents
+(for kernel extraction) without any dependency on Apple's Swift packages.
+
+### Summary of the storage surface
+
+The content store and snapshot store together form a clean two-tier cache:
+compressed layer tarballs keyed by content digest, and materialized ext4 images
+keyed by manifest digest. Both tiers are plain files on disk with predictable
+paths. An external tool running on the same macOS system can read them without
+any API dependency on Apple's Swift packages. All that's needed is the
+relevant digest (the platform-specific manifest digest for snapshots, or a
+layer digest for blobs) and knowledge of the directory layout.