Commit c753572

docs: Add apple/container integration analysis
Describe how bcvk could reuse ext4 filesystem images that apple/container already creates, rather than reimplementing ext4 synthesis. The key insight is that apple/container's snapshot store contains plain ext4 files at predictable paths, and bcvk can read them directly.

Cover the kernel extraction problem: bcvk always boots the image's own kernel, so it needs to read the kernel out of the ext4 image before booting. The ext4-view crate (pure Rust, read-only ext4 access) solves this without requiring mount privileges.

Also document apple/container's storage APIs (content store layout, snapshot store structure) based on source analysis of the Containerization Swift package.

Assisted-by: OpenCode (claude-opus-4-6)
1 parent 5f71805 commit c753572

2 files changed

Lines changed: 292 additions & 0 deletions

docs/src/SUMMARY.md

Lines changed: 3 additions & 0 deletions
@@ -5,6 +5,9 @@
55
- [Installation](./installation.md)
66
- [Quick Start](./quick-start.md)
77
- [Workflow Comparison](./workflow-comparison.md)
8+
- [vs bootc](./vs-bootc.md)
9+
- [vs podman-bootc](./vs-podman-bootc.md)
10+
- [Apple container integration](./vs-apple-container.md)
811

912
# Reference
1013

docs/src/vs-apple-container.md

Lines changed: 289 additions & 0 deletions
@@ -0,0 +1,289 @@
1+
# bcvk and Apple container integration
2+
3+
Apple's [`container`](https://github.com/apple/container) is a Swift-based
4+
tool that runs Linux containers as lightweight virtual machines on Apple Silicon
5+
Macs using the macOS Virtualization framework. It is macOS-only (requires macOS
6+
26+ and Apple Silicon) and targets standard OCI container images.
7+
8+
bcvk runs *bootable* container images as VMs using QEMU/libvirt on Linux.
9+
10+
Despite both tools using VMs, they use them differently. Apple's `container`
11+
runs container processes inside lightweight VMs — the VM is an isolation
12+
mechanism wrapping what is conceptually still a container. bcvk boots a
13+
complete OS from a container image — the VM *is* the end product, not an
14+
implementation detail.
15+
16+
The interesting integration opportunity is that Apple's tool creates ext4
17+
filesystem images from OCI layers and caches them on disk. bcvk could read
18+
those ext4 images directly, extract the kernel, and boot them as full VMs —
19+
avoiding the `bootc install to-disk` step on macOS.
20+
21+
## Background
22+
23+
Apple's `container` converts OCI images into ext4 filesystem images using
24+
`EXT4Unpacker` from the
25+
[Containerization](https://github.com/apple/containerization) Swift package,
26+
then attaches them to VMs as virtio-blk devices. A `SnapshotStore` caches the
27+
ext4 images keyed by manifest digest. The VM boots a separate minimal kernel
28+
and `vminitd` guest agent — the kernel is *not* from the container image.
29+
30+
bcvk's current flows (ephemeral run via VirtioFS, to-disk via `bootc install`)
31+
require bootc images that contain a kernel, initramfs, and systemd. The kernel
32+
is always extracted from the container image itself.
33+
34+
## How Apple's ext4 pipeline works
35+
36+
Apple's `EXT4Unpacker` (in the
37+
[Containerization](https://github.com/apple/containerization) package) does
38+
roughly the following:
39+
40+
1. Creates a sparse ext4 filesystem image file via `EXT4.Formatter(path,
41+
minDiskSize: N)`. The default minimum size is 512 GiB for regular
42+
containers (sparse, so actual disk usage is much smaller).
43+
44+
2. Iterates through the OCI image manifest's layers in order. For each layer,
45+
it calls `filesystem.unpack(source: layer.path, ...)`, which reads the
46+
layer tarball (gzip, zstd, or uncompressed) and writes its contents
47+
directly into the ext4 image. OCI whiteout files (`.wh.*` and
48+
`.wh..wh..opq`) are handled inline — whiteout entries delete files from
49+
previous layers.
50+
51+
3. The result is a flat ext4 image containing the fully merged container
52+
rootfs. No union filesystem or overlay is needed at runtime.
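
The layer-merging behavior described above can be illustrated with a small in-memory model. This is a sketch of the OCI whiteout rules only, not Apple's `EXT4Unpacker` code: the `apply_layer` and `merge_layers` names, the map-based rootfs, and the two-pass ordering are all assumptions made for illustration.

```rust
use std::collections::BTreeMap;

/// Remove `path` and everything beneath it from the merged tree.
fn remove_tree(rootfs: &mut BTreeMap<String, String>, path: &str) {
    let prefix = format!("{path}/");
    rootfs.retain(|p, _| p.as_str() != path && !p.starts_with(&prefix));
}

/// Apply one layer (tar-style relative paths) on top of the tree built so far.
fn apply_layer(rootfs: &mut BTreeMap<String, String>, layer: &[(&str, &str)]) {
    // Pass 1: whiteouts only affect entries that came from earlier layers.
    for &(path, _) in layer {
        let (dir, name) = path.rsplit_once('/').unwrap_or(("", path));
        if name == ".wh..wh..opq" {
            // Opaque marker: hide everything earlier layers put under `dir`.
            let prefix = if dir.is_empty() { String::new() } else { format!("{dir}/") };
            rootfs.retain(|p, _| !p.starts_with(&prefix));
        } else if let Some(target) = name.strip_prefix(".wh.") {
            // Plain whiteout: delete the named sibling from earlier layers.
            let victim = if dir.is_empty() { target.to_string() } else { format!("{dir}/{target}") };
            remove_tree(rootfs, &victim);
        }
    }
    // Pass 2: add this layer's regular entries; marker files are never stored.
    for &(path, contents) in layer {
        let name = path.rsplit('/').next().unwrap_or(path);
        if !name.starts_with(".wh.") {
            rootfs.insert(path.to_string(), contents.to_string());
        }
    }
}

/// Merge layers in manifest order into one flat tree.
fn merge_layers(layers: &[&[(&str, &str)]]) -> BTreeMap<String, String> {
    let mut rootfs = BTreeMap::new();
    for layer in layers {
        apply_layer(&mut rootfs, layer);
    }
    rootfs
}
```

Because the whiteouts are resolved while unpacking, the merged tree contains no marker files, which is why no overlay is needed at runtime.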
53+
54+
This ext4 image is then attached to a lightweight VM as a virtio-blk device.
55+
Inside the VM, a minimal guest agent (`vminitd`) mounts it and runs container
56+
processes within Linux cgroups and namespaces. Critically, the kernel and
57+
vminitd are *not* from the container image — they're provided separately by the
58+
`container` tool's own "init image."
59+
60+
## How bcvk's current flows work
61+
62+
bcvk has two main paths for getting from container image to running VM:
63+
64+
**Ephemeral run** (`bcvk ephemeral run`): The container image is pulled via
65+
podman and mounted directly as the VM's root filesystem using VirtioFS (via
66+
virtiofsd). The kernel and initramfs are extracted from *within* the container
67+
image (from `/usr/lib/modules/<version>/` or `/boot/EFI/Linux/*.efi`). The VM
68+
boots with `rootfstype=virtiofs root=rootfs` and systemd takes over as init.
69+
This requires a *bootc* image — one that contains a kernel, initramfs, and
70+
systemd.
71+
72+
**To-disk** (`bcvk to-disk`): An ephemeral VM is launched using the approach
73+
above, and within it, `bootc install to-disk` runs to install the OS to an
74+
attached virtio-blk disk. The output is a full disk image (with partition
75+
table, bootloader, etc.) suitable for libvirt or QEMU.
76+
77+
Both flows fundamentally require bootc images. Standard OCI containers (e.g.
78+
`docker.io/library/nginx`) lack a kernel, initramfs, and systemd, so bcvk
79+
can't boot them.
80+
81+
## Reusing Apple's ext4 images directly
82+
83+
Since Apple's `container` tool already synthesizes ext4 rootfs images and
84+
caches them on disk (see the "Apple's storage APIs" section below), bcvk
85+
doesn't need to reimplement ext4 synthesis. On macOS, if the user has already
86+
pulled an image with Apple's `container` tool, the ext4 is sitting at a
87+
well-known path:
88+
`~/Library/Application Support/com.apple.containerization/snapshots/<manifest-digest>/snapshot`.
89+
90+
To boot that ext4 as a VM, bcvk needs to:
91+
92+
1. **Locate the ext4 snapshot** — resolve the image reference to a
93+
platform-specific manifest digest, strip the `sha256:` prefix, and look
94+
for the file at the snapshot store path.
95+
96+
2. **Extract the kernel** — read the kernel and initramfs out of the ext4
97+
image without mounting it (see below).
98+
99+
3. **Boot via QEMU** — direct kernel boot (`-kernel`/`-initrd`) with the
100+
ext4 image attached as a virtio-blk device and the right kernel command
101+
line to mount it as root.
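
As a rough sketch of step 1, the lookup reduces to string and path handling once the manifest digest is known. The `snapshot_path` helper below is hypothetical; the directory layout is the one described in this document, and the strict hex check is an added assumption so that a malformed digest cannot escape the snapshot directory.

```rust
use std::path::{Path, PathBuf};

/// Build the expected snapshot path for a digest of the form
/// "sha256:<64 lowercase hex chars>". Returns None for anything else.
fn snapshot_path(base: &Path, manifest_digest: &str) -> Option<PathBuf> {
    // The on-disk directory name is the digest without its "sha256:" prefix.
    let hex = manifest_digest.strip_prefix("sha256:")?;
    let valid = hex.len() == 64
        && hex.bytes().all(|b| matches!(b, b'0'..=b'9' | b'a'..=b'f'));
    if !valid {
        return None;
    }
    Some(base.join("snapshots").join(hex).join("snapshot"))
}
```

Here `base` would be `~/Library/Application Support/com.apple.containerization` by default; a caller would still need to check that the returned file actually exists.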
102+
103+
This avoids reimplementing Apple's `EXT4Unpacker` entirely. bcvk becomes a
104+
consumer of Apple's snapshot store rather than a competing image pipeline.
105+
106+
## Kernel extraction from the ext4 image
107+
108+
Apple's `container` ships its own pre-built kernel separately from the
109+
container image. bcvk takes a different approach: the kernel always comes from
110+
the container image itself. This is a core design principle — bcvk boots the
111+
image's own kernel so the VM matches what would run in production. There is no
112+
"ship a separate kernel" option.
113+
114+
For bootc images accessed via VirtioFS, bcvk already extracts the kernel from
115+
the mounted filesystem using `find_kernel()` in `crates/kit/src/kernel.rs`.
116+
That function searches for UKIs in `/boot/EFI/Linux/*.efi` and
117+
`/usr/lib/modules/<version>/*.efi`, and for traditional `vmlinuz` +
118+
`initramfs.img` pairs in `/usr/lib/modules/<version>/`. It operates on a
119+
`cap_std::fs::Dir`, which requires the filesystem to be mounted or otherwise
120+
accessible as a directory tree.
121+
122+
When working with Apple's ext4 snapshots, the rootfs is an ext4 image file
123+
rather than a mounted directory. The kernel needs to be extracted from that
124+
ext4 image *before* the VM boots (since QEMU's `-kernel` flag needs the kernel
125+
as a host file). This creates a chicken-and-egg problem: we need to read the
126+
ext4 to get the kernel, but we don't want to mount the ext4 (that would
127+
require root privileges or FUSE).
128+
129+
The solution is to use a userspace ext4 reader. The
130+
[`ext4-view`](https://github.com/nicholasbishop/ext4-view-rs) crate provides
131+
read-only access to ext4 filesystems from a file or byte buffer, without
132+
mounting. It's pure Rust, no unsafe, `no_std` compatible, and its API follows
133+
`std::fs` conventions (`read()`, `read_dir()`, `metadata()`, `exists()`).
134+
135+
The implementation would work roughly as follows:
136+
137+
1. After locating the ext4 snapshot from Apple's store, open it with
138+
`ext4_view::Ext4::load_from_path()`.
139+
140+
2. Run the same kernel search logic that `find_kernel()` uses, but against
141+
the `Ext4` filesystem API instead of `cap_std::fs::Dir`. The search paths
142+
are identical: `/boot/EFI/Linux/*.efi`, `/usr/lib/modules/<version>/*.efi`,
143+
`/usr/lib/modules/<version>/vmlinuz` + `initramfs.img`.
144+
145+
3. Extract the kernel (and initramfs if present) to a temporary file on the
146+
host using `Ext4::read()`, which returns the file contents as `Vec<u8>`.
147+
148+
4. Pass the extracted kernel to QEMU via `-kernel` (and `-initrd` if
149+
applicable), with the ext4 image as a virtio-blk device.
150+
151+
This approach is attractive because `ext4-view`'s API maps closely to
152+
`cap_std::fs::Dir`. The kernel search logic could be refactored to be generic
153+
over a filesystem trait — something like a `ReadDir + Read + Metadata`
154+
abstraction — that both `Dir` and `Ext4` implement. Alternatively, a simpler
155+
approach: a second `find_kernel_in_ext4()` function that duplicates the search
156+
logic against the `Ext4` type. Given that the search logic is ~90 lines, a
157+
small amount of duplication may be acceptable for a first pass, with
158+
deduplication via a trait coming later.
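
A minimal sketch of the trait-based variant follows. Everything here is hypothetical: the `KernelFs` trait, the `Kernel` enum, and the search precedence (UKIs before `vmlinuz` pairs) are assumptions for illustration and do not mirror the real `find_kernel()` in `crates/kit/src/kernel.rs`. `PathList` is an in-memory stand-in; real implementations would wrap `cap_std::fs::Dir` and `ext4_view::Ext4`.

```rust
/// The read-only operations the kernel search needs from a filesystem.
trait KernelFs {
    /// Names of the direct children of `dir` (empty if `dir` is missing).
    fn list(&self, dir: &str) -> Vec<String>;
    /// Whether a file exists at `path`.
    fn exists(&self, path: &str) -> bool;
}

#[derive(Debug, PartialEq)]
enum Kernel {
    Uki(String),
    Traditional { vmlinuz: String, initramfs: String },
}

/// First `*.efi` file in `dir`, in sorted order.
fn first_efi<F: KernelFs>(fs: &F, dir: &str) -> Option<String> {
    let mut names = fs.list(dir);
    names.sort();
    names.into_iter().find(|n| n.ends_with(".efi")).map(|n| format!("{dir}/{n}"))
}

fn find_kernel<F: KernelFs>(fs: &F) -> Option<Kernel> {
    if let Some(uki) = first_efi(fs, "/boot/EFI/Linux") {
        return Some(Kernel::Uki(uki));
    }
    let mut versions = fs.list("/usr/lib/modules");
    versions.sort();
    for version in &versions {
        let dir = format!("/usr/lib/modules/{version}");
        if let Some(uki) = first_efi(fs, &dir) {
            return Some(Kernel::Uki(uki));
        }
        let (vmlinuz, initramfs) = (format!("{dir}/vmlinuz"), format!("{dir}/initramfs.img"));
        if fs.exists(&vmlinuz) && fs.exists(&initramfs) {
            return Some(Kernel::Traditional { vmlinuz, initramfs });
        }
    }
    None
}

/// Illustration-only backend: a flat list of absolute file paths.
struct PathList(Vec<&'static str>);

impl KernelFs for PathList {
    fn list(&self, dir: &str) -> Vec<String> {
        let prefix = format!("{dir}/");
        let mut names: Vec<String> = self.0.iter()
            .filter_map(|p| p.strip_prefix(prefix.as_str()))
            .map(|rest| rest.split('/').next().unwrap_or(rest).to_string())
            .collect();
        names.sort();
        names.dedup();
        names
    }
    fn exists(&self, path: &str) -> bool {
        self.0.iter().any(|p| *p == path)
    }
}
```

With this shape, the search logic is written once and each backend only supplies directory listing and existence checks.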
159+
160+
The `ext4-view` crate is Apache-2.0/MIT dual-licensed (compatible with bcvk's
161+
licensing), has no unsafe code, and is actively maintained. It handles the ext4
162+
format details (block groups, extent trees, directory entries) that would be
163+
tedious to implement from scratch.
164+
165+
## What would be different from default `apple/container`
166+
167+
Even though bcvk would read `apple/container`'s ext4 images, the boot model is
168+
fundamentally different. `apple/container`'s `vminitd` is a purpose-built gRPC
169+
agent that manages container processes using Linux cgroups and namespaces
170+
*within* the VM — essentially a container runtime inside a VM. bcvk boots
171+
systemd and runs the full OS using the image's own kernel. The container
172+
image *is* the OS.
173+
174+
This means the images bcvk can boot from `apple/container`'s snapshot store are
175+
limited to those that contain a kernel — bootc-style images. For images that
176+
lack a kernel entirely (e.g. `docker.io/library/nginx`), bcvk would not
177+
attempt to boot them. That's not bcvk's use case.
178+
179+
## Practical assessment
180+
181+
The implementation path for booting `apple/container`'s ext4 snapshots:
182+
183+
1. Locate the ext4 snapshot on disk. Resolve the image reference to a
184+
manifest digest (via the OCI index in `apple/container`'s content store
185+
or by querying `container` CLI) and find the file at
186+
`~/Library/Application Support/com.apple.containerization/snapshots/<digest>/snapshot`.
187+
188+
2. Use `ext4-view` to read the ext4 image and extract the kernel and
189+
initramfs to temporary host files, using the same search logic as the
190+
existing `find_kernel()`.
191+
192+
3. Boot via QEMU with `-kernel`/`-initrd` pointing to the extracted files
193+
and the ext4 image as a virtio-blk root device.
194+
195+
4. Wire this into `bcvk ephemeral run` as a new path on macOS.
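
Step 3 could look roughly like the argument builder below. The `-kernel` and `-initrd` flags come from the text above; the `-append` value (`root=/dev/vda rw`), the `-drive` options, and the use of `snapshot=on` to keep QEMU from writing into Apple's cached image are assumptions, and a real invocation would also need machine and accelerator flags. `qemu_args` is a hypothetical helper name.

```rust
use std::path::Path;

/// Build QEMU arguments for direct kernel boot with the ext4 image as root disk.
fn qemu_args(kernel: &Path, initrd: Option<&Path>, rootfs: &Path) -> Vec<String> {
    // QEMU treats ',' as an option separator, so commas in the path are doubled.
    let disk = rootfs.display().to_string().replace(',', ",,");
    let mut args = vec![
        "-kernel".to_string(),
        kernel.display().to_string(),
        // Assumption: the first virtio-blk disk appears as /dev/vda in the guest.
        "-append".to_string(),
        "root=/dev/vda rw".to_string(),
        // Assumption: snapshot=on redirects guest writes to a temporary overlay.
        "-drive".to_string(),
        format!("file={disk},format=raw,if=virtio,snapshot=on"),
    ];
    if let Some(initrd) = initrd {
        args.push("-initrd".to_string());
        args.push(initrd.display().to_string());
    }
    args
}
```

Returning a `Vec<String>` rather than a shell string means the space in `Application Support` needs no quoting when the list is handed to `std::process::Command`.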
196+
197+
The hardest part is not reading the ext4 or extracting the kernel — both are
198+
straightforward with `ext4-view`. The more interesting design question is
199+
digest resolution: mapping an image reference to the right snapshot directory.
200+
201+
## Apple's storage APIs and what they expose
202+
203+
The `Containerization` Swift package and the `container` tool's services expose
204+
a layered set of APIs for accessing stored container images and their
205+
synthesized ext4 filesystems. Understanding these APIs is useful for evaluating
206+
whether bcvk (or any external tool) could reuse Apple's image storage directly.
207+
208+
### The content store: OCI blobs as files
209+
210+
The lowest layer is `LocalContentStore` (in `ContainerizationOCI`), which
211+
implements a standard OCI content-addressable storage layout. Blobs are stored
212+
as flat files at `<basePath>/blobs/sha256/<digest>`, where the default base
213+
path is `~/Library/Application Support/com.apple.containerization/content/`.
214+
215+
The `ContentStore` protocol provides `get(digest:) -> Content?`, which returns
216+
a `Content` object for any blob. The `Content` protocol exposes:
217+
218+
- `path: URL` — the filesystem path to the blob file
219+
- `data() -> Data` — read the entire blob into memory
220+
- `data(offset:length:) -> Data?` — read a range of the blob
221+
- `size() -> UInt64` — file size
222+
- `digest() -> SHA256.Digest` — content hash
223+
224+
`Image.getContent(digest:)` wraps this: given a digest that the image
225+
references, it returns the `Content` object, from which you can get the `.path`
226+
to the raw layer tarball on disk. The layers are stored as compressed tarballs
227+
(gzip or zstd), exactly as pulled from the registry.
228+
229+
Any external tool that knows the digest of a layer can read it directly from
230+
the filesystem without going through the Swift API — the layout is just files
231+
in a well-known directory.
232+
233+
### The snapshot store: cached ext4 images
234+
235+
Above the content store sits `SnapshotStore` (in the `container` tool's
236+
`ContainerImagesService`). This is where synthesized ext4 images are cached.
237+
The layout on disk is `<basePath>/snapshots/<manifest-digest>/snapshot`, where
238+
each `snapshot` file is a regular (sparse) ext4 filesystem image.
239+
240+
`SnapshotStore.get(for:platform:)` returns a `Filesystem` object describing
241+
the cached ext4. The `Filesystem` type has a `source: String` field containing
242+
the absolute path to the ext4 file, along with `type` (block, virtiofs, etc.),
243+
`destination` (mount point), and `options` (mount options). For snapshots, the
244+
type is `.block(format: "ext4", ...)` and the source points to the `snapshot`
245+
file.
246+
247+
`SnapshotStore.unpack(image:platform:)` creates the ext4 if it doesn't already
248+
exist: it delegates to `EXT4Unpacker.unpack()`, which iterates the image's
249+
layers in order, unpacking each compressed tarball directly into an ext4 image
250+
via `EXT4.Formatter`. The result is moved atomically into the snapshot
251+
directory. Alongside the `snapshot` file, a `snapshot-info` JSON file stores
252+
the serialized `Filesystem` metadata.
253+
254+
### Can an external tool read these files?
255+
256+
Yes, straightforwardly. Both the layer tarballs in the content store and the
257+
ext4 images in the snapshot store are regular files. There is no database, no
258+
proprietary container format, no locking mechanism that would prevent another
259+
process from reading them. If Apple's `container` tool has already pulled an
260+
image and unpacked it, bcvk could read the ext4 file directly from
261+
`~/Library/Application Support/com.apple.containerization/snapshots/<digest>/snapshot`.
262+
263+
There are caveats. The snapshot store is keyed by the platform-specific
264+
manifest digest (not the image reference or index digest), so you'd need to
265+
resolve the image reference to the correct manifest digest to find the right
266+
snapshot directory. On-disk names drop the `sha256:` prefix from the digest
267+
(via `trimmingDigestPrefix`), so lookups use the bare hex string. Both stores
268+
could be relocated if the user changes the base path.
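
The digest-resolution caveat can be sketched as a selection over the entries of an OCI image index. The `ManifestEntry` struct and `snapshot_key` function are hypothetical; they assume the index JSON has already been parsed and model only the standard `digest`, `platform.os`, and `platform.architecture` fields. Platform variants are ignored here.

```rust
/// The fields of one OCI image-index entry that matter for this lookup.
struct ManifestEntry {
    digest: String,
    os: String,
    architecture: String,
}

/// Pick the manifest for the requested platform and return its digest
/// without the "sha256:" prefix, i.e. the snapshot directory name.
fn snapshot_key<'a>(index: &'a [ManifestEntry], os: &str, arch: &str) -> Option<&'a str> {
    index
        .iter()
        .find(|m| m.os == os && m.architecture == arch)?
        .digest
        .strip_prefix("sha256:")
}
```

On Apple Silicon the caller would typically ask for `linux`/`arm64`; using the index digest instead of the per-platform manifest digest would point at a directory that does not exist.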
269+
270+
### Could bcvk use the Swift APIs directly?
271+
272+
Not practically. The `Containerization` package is Swift-only and the ext4
273+
writing code (`ContainerizationEXT4`) is gated behind `#if os(macOS)` — it
274+
won't compile on Linux. The APIs are designed to be consumed from Swift
275+
processes running on macOS.
276+
277+
However, bcvk doesn't *need* the APIs. Since the on-disk layout is simple and
278+
well-defined, bcvk can read the ext4 snapshot files directly using their
279+
filesystem paths. The `ext4-view` crate handles reading the ext4 contents
280+
(for kernel extraction) without any dependency on Apple's Swift packages.
281+
282+
### Summary of the storage surface
283+
284+
The content store and snapshot store together form a clean two-tier cache:
285+
compressed layer tarballs keyed by content digest, and materialized ext4 images
286+
keyed by manifest digest. Both tiers are plain files on disk with predictable
287+
paths. An external tool running on the same macOS system can read them without
288+
any API dependency on Apple's Swift packages — all you need is the image digest
289+
and knowledge of the directory layout.
