Add OCI image support: pull, unpack, run, prune, status, policy by Max042004 · Pull Request #34 · sysprog21/elfuse

Max042004 · 2026-05-15T15:18:20Z

This PR lands the full elfuse OCI image support. It supersedes the
original Phase 1 scope of this PR (CLI scaffold + pull/inspect) and
now covers Phases 1-4 plus the post-Phase-3 improvements plan: image
layout alignment, GC/prune, layer + stack snapshot caches, store
status, parallel pull, registry policy.json, and a heavy-mode compat
matrix.

Scope

- Pull / inspect: content-addressable blob store, HTTPS + bearer
  token, OCI index walk to the linux/arm64 leaf manifest,
  partial-store-aware inspect renderer.
- Unpack: tar reader (ustar + PAX x/g records), gzip + decode-only
  vendored zstd, whiteout-aware layer apply (typeflag '1'/'2'/'5'
  plus .wh.* markers), per-image sysroot on a case-sensitive APFS
  sparsebundle.
- Run: elfuse oci run clones the unpacked tree via clonefile(2),
  honors Entrypoint / Cmd / Env / WorkingDir / User, and reuses the
  existing elfuse launch path so a dynamically-linked guest binary
  runs through the same shim + syscall surface as the non-OCI mode.
- Lifecycle: oci prune with --older-than / --keep-bytes;
  layer + stack prune sweep; oci status (text + --json);
  oci rebuild-cache for pre-snapshot stores.
- Performance: parallel blob fetch with HTTP Range resume;
  per-layer raw snapshot cache; ChainID stack snapshot cache; APFS
  COW clone-rootfs reuse between runs.
- Policy: podman / skopeo-style policy.json + registries.d
  overlay (per-registry insecure / ca_bundle / auth_file). CLI flags
  override; loopback-only --insecure.
- Test coverage: 25 OCI unit suites (test-oci-*), compat-shell
  smoke (tests/test-oci-compat.sh), and an opt-in heavy mode
  (OCI_COMPAT_TEST=1) that drives three layered fixtures
  (alpine-shaped, busybox-shaped hardlink dispatch, two-layer
  whiteout) end-to-end through a freshly-provisioned scratch
  sparsebundle.
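
To illustrate what "whiteout-aware layer apply" involves, here is a minimal Python sketch over an in-memory path map. It assumes the standard OCI whiteout conventions (a `.wh.<name>` entry hides `<name>` from lower layers, and `.wh..wh..opq` hides all lower-layer children of its directory). The `apply_layer` helper and the dict-based tree are hypothetical and are not elfuse's unpack code, which works on tar entries and a real filesystem.

```python
import posixpath

WH_PREFIX = ".wh."
WH_OPAQUE = ".wh..wh..opq"


def apply_layer(tree, layer):
    """Apply one layer onto tree in place; both map relative POSIX path -> content.

    Hypothetical model for illustration only.
    """
    # Pass 1: whiteouts only affect entries inherited from lower layers.
    for path in layer:
        parent, name = posixpath.split(path)
        if name == WH_OPAQUE:
            # Opaque marker: drop every lower-layer child of the directory.
            prefix = parent + "/" if parent else ""
            for victim in [p for p in tree if p.startswith(prefix) and p != parent]:
                del tree[victim]
        elif name.startswith(WH_PREFIX):
            # Plain whiteout: drop the named entry and anything beneath it.
            target = posixpath.join(parent, name[len(WH_PREFIX):])
            for victim in [p for p in tree if p == target or p.startswith(target + "/")]:
                del tree[victim]
    # Pass 2: add the layer's real entries; markers themselves are never stored.
    for path, content in layer.items():
        if not posixpath.basename(path).startswith(WH_PREFIX):
            tree[path] = content
    return tree


base = {"etc/hosts": "a", "etc/motd": "b", "var/cache/x": "c", "var/cache/y": "d"}
upper = {"etc/.wh.motd": "", "var/cache/.wh..wh..opq": "", "var/cache/z": "e"}
print(sorted(apply_layer(dict(base), upper)))
```

Processing markers before regular entries keeps an opaque directory from deleting files that the same layer adds.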

Manual smoke test (docker.io/library/python:3.12)

A real end-to-end pull-and-run against a mainstream multi-layer glibc
image. The image's default Entrypoint is docker-entrypoint.sh (a
shell script, which elfuse does not execute), so the commands below
override --entrypoint to the python3 binary directly.

make elfuse
SCRATCH=$(mktemp -d)
echo "store: $SCRATCH"

# 1. Pull (~400 MB across 7 layers, ~3 minutes on a fast link).
#    If your terminal mishandles CSI cursor-up and the progress
#    output stacks duplicate rows, prepend ELFUSE_OCI_PROGRESS=plain
#    to fall back to one summary line per blob.
./build/elfuse oci pull --store "$SCRATCH" python:3.12

# 2. Offline inspect: image index -> linux/arm64 manifest -> config
#    runtime block (Entrypoint / Cmd / Env / WorkingDir / User).
./build/elfuse oci inspect --store "$SCRATCH" python:3.12

# 3. Cold run. First invocation triggers layer unpack onto the
#    sysroot APFS sparsebundle, then clone-rootfs, then launch. The
#    unpack step dominates the ~50 s wall on a fresh store.
./build/elfuse oci run --store "$SCRATCH" \
    --entrypoint /usr/local/bin/python3 python:3.12 \
    -c 'print("hello from elfuse", 1+2)'
# expected stdout:  hello from elfuse 3

# 4. Warm run. clone-rootfs reuses the unpacked image tree, so wall
#    drops to ~2 s and is dominated by VM bring-up + dynamic-linker
#    bring-up + Python interp init.
./build/elfuse oci run --store "$SCRATCH" \
    --entrypoint /usr/local/bin/python3 python:3.12 \
    -c 'import sys, platform; print(sys.version); print(platform.platform()); print(platform.machine())'
# expected stdout:  Python 3.12.x ... / Linux-<kernel>-aarch64-with-glibc2.41 / aarch64

# 5. stdlib smoke. Confirms json + math + f-string formatting all
#    flow through the emulated syscall surface.
./build/elfuse oci run --store "$SCRATCH" \
    --entrypoint /usr/local/bin/python3 python:3.12 \
    -c 'import json, math; print(json.dumps({"pi": round(math.pi, 5), "ok": True}))'
# expected stdout:  {"pi": 3.14159, "ok": true}

Performance characterization (vs OrbStack)

Measured on Apple M4 / macOS 15.4.1 (Darwin 24.4.0). OrbStack 2.1.3
acts as the ground-truth aarch64-linux runtime: it executes the same
docker.io/library/python:3.12 image inside a Virtualization.framework-
backed Linux VM with a real Linux kernel, so the comparison isolates
the cost of elfuse's user-mode ABI emulation against a native syscall
surface.

Pure CPU (factorial big-int multiply, no syscall)

import sys, math, time
sys.set_int_max_str_digits(0)   # Python 3.12 default cap is 4300 digits
N = 200000
t = time.perf_counter()
f = math.factorial(N)
s = sum(int(d) for d in str(f))
print("fact(%d) digit_sum=%d digits=%d compute=%.3fs" %
      (N, s, len(str(f)), time.perf_counter() - t))

Each engine ran twice; the second is warm. compute is the
time.perf_counter() delta inside Python (pure interpreter +
big-int multiply work); real is the outer wall (includes engine
startup); startup ≈ real - compute.

Engine	run	compute (s)	real (s)	startup (s)
elfuse	1	0.791	3.72	2.93
elfuse	2 warm	0.804	3.35	2.55
orbstack	1	0.792	1.10	0.31
orbstack	2 warm	0.796	0.97	0.17

Both engines emit digit_sum=4154076 digits=973351, so correctness
parity is confirmed. Pure compute ratio: 1.01× (within measurement
noise). HVF runs guest aarch64 instructions directly, so big-int
multiply + Python bytecode dispatch pay zero translation overhead.
Startup ratio: 15.0× (constant ~2.5 s for elfuse vs ~0.17 s for
orbstack), independent of N. This was verified separately at N=50000,
where compute drops to ~0.14 s on both engines but elfuse startup
stays at 2.53 s.

Syscall density (Python loop hammering syscalls)

import os, time
N_BASE = 1_000_000
N_READ = 100_000

def time_loop(label, fn, n):
    fn(min(n // 100, 10_000))   # warm-up
    t = time.perf_counter()
    fn(n)
    return label, time.perf_counter() - t, n

def baseline(n):
    for _ in range(n): pass

def getppid(n):
    g = os.getppid
    for _ in range(n): g()

def clock_ns(n):
    g = time.monotonic_ns
    for _ in range(n): g()

def urandom_read(n):
    fd = os.open("/dev/urandom", os.O_RDONLY)
    try:
        rd = os.read
        for _ in range(n): rd(fd, 1)
    finally:
        os.close(fd)

results = [
    time_loop("baseline (pass)",              baseline,     N_BASE),
    time_loop("getppid",                      getppid,      N_BASE),
    time_loop("clock_gettime (monotonic_ns)", clock_ns,     N_BASE),
    time_loop("/dev/urandom 1B read",         urandom_read, N_READ),
]
base_per = results[0][1] / results[0][2]
for label, secs, n in results:
    per = secs / n
    overhead = (per - base_per) * 1e6 if label != "baseline (pass)" else 0.0
    print("%-38s total=%.3fs n=%d per=%.3fus  syscall_overhead=%.3fus" %
          (label, secs, n, per * 1e6, overhead))

syscall_overhead strips the Python loop interpreter cost (measured
from the baseline band) so the residual is the pure trap+return
cost of a single syscall.

Band	elfuse (μs/call)	orbstack (μs/call)	ratio
baseline (pass)	0.007	0.007	1.0×
getppid	0.960	0.091	10.5×
clock_gettime (monotonic_ns)	1.006	0.018	55.9×
/dev/urandom 1B read	1.704	0.210	8.1×

getppid is the cleanest measurement: almost no kernel work, just trap +
return. elfuse pays roughly 1 μs per syscall versus ~0.1 μs native.
Rough HVF round-trip breakdown: vCPU state sync ~200 ns, Linux→macOS
semantics ~100 ns, the macOS syscall itself ~100 ns, errno + sync
back ~100 ns, HVF re-entry + ERET ~500 ns. These sum to ~1 μs, which
is the structural floor for any elfuse syscall path: no trapping
syscall can complete faster.

vDSO observation — time.monotonic_ns should hit the synthetic
vDSO under src/core/vdso.{c,h} and skip the trap (orbstack does, at
0.018 μs), but the measured 1.006 μs matches the trapping baseline.
elfuse's vDSO entry is not being picked up by glibc 2.41 in this
image. This is an existing optimization opportunity unrelated to the
scope of this PR; left untouched here so the patch series stays
focused on image-distribution and runtime correctness.

Wall-clock model

For a pure-CPU workload of compute time W:

elfuse_total   ≈ 2.5 s + W
orbstack_total ≈ 0.17 s + W

W	elfuse	orbstack	ratio	scenario
0.1 s	2.6 s	0.27 s	9.6×	CLI one-shot
1 s	3.5 s	1.17 s	3.0×	short script
10 s	12.5 s	10.17 s	1.23×	medium task
60 s	62.5 s	60.17 s	1.04×	batch job

elfuse is competitive for long-running workloads (where the constant
startup amortizes out) and a known tradeoff for short CLI one-shots
where startup dominates total wall.
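
The table rows follow directly from the two-constant model. A minimal Python sketch that reproduces them and solves for the compute time at which elfuse comes within a chosen ratio of OrbStack; the function names are illustrative, and the two startup constants are the measured values quoted above:

```python
ELFUSE_STARTUP_S = 2.5     # measured constant startup, elfuse
ORBSTACK_STARTUP_S = 0.17  # measured constant startup, orbstack


def wall_clock(w):
    """Return (elfuse_total, orbstack_total, ratio) for compute time w in seconds."""
    elfuse = ELFUSE_STARTUP_S + w
    orbstack = ORBSTACK_STARTUP_S + w
    return elfuse, orbstack, elfuse / orbstack


def break_even_w(max_ratio):
    """Smallest w with elfuse/orbstack <= max_ratio (requires max_ratio > 1).

    Derived from (2.5 + w) / (0.17 + w) <= r  =>  w >= (2.5 - 0.17 r) / (r - 1).
    """
    return max(0.0, (ELFUSE_STARTUP_S - max_ratio * ORBSTACK_STARTUP_S) / (max_ratio - 1))


for w in (0.1, 1, 10, 60):
    e, o, r = wall_clock(w)
    print(f"W={w:>5} s  elfuse={e:.2f} s  orbstack={o:.2f} s  ratio={r:.2f}x")
print(f"within 10% of orbstack once W >= {break_even_w(1.1):.1f} s")
```

Under this model the startup gap costs less than 10% of total wall time once the workload computes for roughly 23 s or longer.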

Known limitations

- fork() followed by execve() of a dynamically-linked ELF crashes
  in the child during dynamic-linker bring-up. This blocks Python's
  subprocess.run([...other_dynamic_binary...]), shell pipelines that
  spawn external binaries, and timeout(1). Single-process Python
  workloads, stdlib computation, and file I/O are unaffected.
- Multi-arch image selection is hardcoded to linux/arm64. There is
  no --platform flag; cross-arch image support is out of scope for
  this PR.
- pull progress uses CSI cursor-up + clear-line for in-place
  redraw. Terminal panes that ignore those escapes show stacking
  rows; set ELFUSE_OCI_PROGRESS=plain to disable the redraw and
  emit one summary line per blob instead.

Summary by cubic

Expands elfuse oci from pull/inspect into a full image lifecycle: run, unpack/clone on a case‑sensitive APFS volume, with parallel/resumable pulls, caching, GC, and status. Adds policy‑driven auth/TLS, richer inspect, and runtime wiring to execute images directly.

New Features
- New CLI: oci run|unpack|clone|prune|rebuild-cache|status; pull gains --refresh and progress; inspect shows image runtime and layer‑reuse stats.
- Unpack pipeline: tar reader + gzip/zstd decode, whiteout‑aware layer apply, per‑image sysroot on a sparse APFS volume; per‑run rootfs via clonefile(2).
- Caches: raw per‑layer snapshots and ChainID stack snapshots; parallel blob fetch with HTTP Range resume; schema marker under layers/ (v2).
- Runtime: inject /etc/{resolv.conf,hosts,hostname}; emulate /dev/{full,console}; add /proc cgroup/hostname/comm/statm; PATH resolver; runspec merge; shared VM launcher.
- Auth/TLS policy: podman/skopeo‑style policy.json + registries.d overlay merged with CLI; Basic auth, custom CA, loopback‑gated --insecure.
- Store ops: prune (blobs/layers/stacks) with --older-than/--keep-bytes; status summary; dedup metrics in inspect; writes oci-layout.
Migration
- Pins moved to OCI index.json; store auto‑migrates from refs/ on open.
- Layer cache marked schema v2; first open wipes legacy v1 entries (blobs/images untouched).
- Vendored decode‑only zstd and cJSON included; relies on system zlib and libcurl.

Written for commit 700ac9d. Summary will update on new commits.

Lays the first slice of Phase 1 from issue sysprog21#31: the elfuse oci subcommand surface and a self-contained OCI image reference parser. No registry, store, or unpack code lands here; this is the routing and parsing scaffold that every later piece depends on.

src/main.c routes argv[1] == "oci" to oci_cli_main before the Hypervisor.framework setup runs, so image distribution never has to satisfy the host DC ZVA assertion or the HVF entitlement check. The existing arg parser, --help, --version, --fork-child, and guest execution paths are otherwise untouched.

src/oci/cli.c implements pull, inspect, prune, and list dispatch. inspect parses a reference and prints the canonical form along with the registry, repository, tag, and digest fields, which proves the end-to-end wiring. The remaining subcommands return rc=2 with an explicit "not implemented yet" message rather than crashing or silently succeeding so users get a stable surface to script against.

src/oci/ref.c implements the de-facto containerd/docker reference grammar:

    reference := name [":" tag] ["@" digest]
    name      := [domain "/"] path
    domain    := first slash component containing "." or ":" or equal to "localhost"
    path      := component ("/" component)*
    component := [a-z0-9]+ ((["._-"] | "__") [a-z0-9]+)*
    tag       := [A-Za-z0-9_] [A-Za-z0-9_.-]{0,127}
    digest    := ("sha256" | "sha512") ":" lowercase-hex

Defaults match Docker conventions: missing registry becomes docker.io, single-segment paths under docker.io pick up the library/ prefix, and missing tag/digest defaults the tag to latest. A digest-only reference leaves tag NULL so the canonical form does not fabricate a tag the user never wrote. Digest hex is required to be lowercase because the local content-addressable store will key off the canonical digest string and uppercase encodings would otherwise cause silent dedup misses.

memrchr is GNU-only and Darwin libc does not ship it, so a small memrchr_local helper handles the rightmost-slash search the tag detector needs.
The looks_like_domain helper compares localhost as a 9-byte literal (the earlier draft had a length bug here that the unit tests caught).

tests/test-oci-ref.c is a native macOS test program (not cross-compiled, no Hypervisor.framework, no codesign) that links directly against src/oci/ref.c. It runs 14 happy-path cases covering Docker defaults, registry detection, port handling, sha256 and sha512 digests, tag+digest pinning, and every separator variant in the component grammar, plus 20 error cases covering empty input, NULL input, uppercase, malformed digests, double @, empty tag/digest suffixes, length limits, and structural validation. All 34 cases pass.

mk/config.mk adds tests/test-oci-ref.c to NATIVE_TESTS so the cross-compile pattern rule does not pick it up. Makefile adds the link rule for build/test-oci-ref (no codesign because there is no HVF dependency). mk/tests.mk exposes test-oci-ref as a phony target and runs it as the last stage of make check, alongside the existing proctitle, busybox, sysroot, and timeout-disable validations.
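
The reference grammar and Docker defaults described in this commit can be sketched in Python as follows. This is an illustration under stated assumptions, not a port of src/oci/ref.c: `Ref`, `parse_ref`, and the regex constants are hypothetical names, length limits other than the tag's are omitted, and the digest hex lengths (64 for sha256, 128 for sha512) are assumed to be enforced.

```python
import re
from typing import NamedTuple, Optional

COMPONENT = r"[a-z0-9]+(?:(?:[._-]|__)[a-z0-9]+)*"
PATH_RE = re.compile(rf"{COMPONENT}(?:/{COMPONENT})*")
TAG_RE = re.compile(r"[A-Za-z0-9_][A-Za-z0-9_.-]{0,127}")
# Assumption: full-length lowercase hex is required for each algorithm.
DIGEST_RE = re.compile(r"sha256:[0-9a-f]{64}|sha512:[0-9a-f]{128}")


class Ref(NamedTuple):
    registry: str
    repository: str
    tag: Optional[str]
    digest: Optional[str]


def looks_like_domain(component):
    return "." in component or ":" in component or component == "localhost"


def parse_ref(text):
    if not text:
        raise ValueError("empty reference")
    name, at, digest = text.partition("@")
    if at and not DIGEST_RE.fullmatch(digest):
        raise ValueError("malformed digest")
    # A ':' after the rightmost '/' starts the tag; an earlier ':' is a port.
    tag = None
    colon = name.rfind(":")
    if colon > name.rfind("/"):
        name, tag = name[:colon], name[colon + 1:]
        if not TAG_RE.fullmatch(tag):
            raise ValueError("malformed tag")
    first, slash, rest = name.partition("/")
    if slash and looks_like_domain(first):
        registry, path = first, rest
    else:
        registry, path = "docker.io", name
    if not PATH_RE.fullmatch(path):
        raise ValueError("malformed repository path")
    if registry == "docker.io" and "/" not in path:
        path = "library/" + path      # Docker Hub official-image prefix
    if tag is None and not at:
        tag = "latest"                # never fabricated for digest-only refs
    return Ref(registry, path, tag, digest if at else None)


print(parse_ref("python:3.12"))
print(parse_ref("localhost:5000/team/app"))
```

Splitting on "@" first and then looking for a colon only after the rightmost slash is what keeps a registry port (localhost:5000) from being mistaken for a tag.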

Second slice of Phase 1 from issue sysprog21#31. Lands the on-disk storage substrate that the upcoming registry client will spill manifests, configs, and layers into. No HTTP, no unpack, no CLI surface yet; this slice is intentionally a pure library plus offline unit tests so the storage semantics can be audited without standing up a network. src/oci/digest.{c,h} wraps CommonCrypto SHA-256 and SHA-512 in a streaming digester so multi-gigabyte layers can be hashed without buffering. Calls into CommonCrypto are clamped to 1 GiB chunks because CC_LONG is 32-bit and OCI layers can legitimately exceed that. Hex output is lowercase to match the reference parser (src/oci/ref.c); the OCI image reference grammar already rejects uppercase digest hex, so the entire pipeline -- parser, manifest fetcher, local store -- shares one canonical encoding and cannot silently miss a dedup match. A separate one-shot helper, hex validator, and "<algo>:<hex>" parser sit on top of the same streaming primitive. src/oci/blob-store.{c,h} is the content-addressable store. Layout matches the OCI image-layout convention: <root>/blobs/<algo>/<hex> for committed blobs plus <root>/tmp/blob-<pid>-<seq>-XXXXXX for the in-flight staging file. mkstemp supplies global uniqueness; an in-process counter is added to the template so failures of the rand pool cannot defeat in-process disambiguation. The commit path hashes streamed bytes, fsyncs the staging file, and uses link(2) rather than rename(2) to publish the final inode. link returning EEXIST is the dedup hit signal: two writers racing on the same digest both unlink their staging files and report success, because the content is by definition identical when the digest matched. Digest mismatch returns -1 with errno EINVAL and unlinks the staging file, so an interrupted or hostile pull never leaves a visible-complete blob behind. The abort path takes the same cleanup. 
STORE_PATH_MAX is set comfortably above PATH_MAX so snprintf truncation cannot silently corrupt a path; callers passing smaller buffers still detect overflow via the return value. Per oci-roadmap.md Q1, the store will eventually sit on a case-sensitive APFS sparse volume managed by elfuse, but the volume bootstrap is its own later slice. For now the store API takes a plain directory path; the same API survives the volume migration unchanged.

tests/test-oci-digest.c exercises 25 cases: NIST FIPS-180-4 vectors (empty, "abc", 56-byte, one-million-'a') for both SHA-256 and SHA-512, the same one-million-'a' streamed in 4 KiB and 17-byte chunks to lock down the chunking loop, hex validator boundary cases, and every "<algo>:<hex>" parse rejection (missing colon, unknown algorithm, short hex, uppercase hex, NULL input). NULL and zero-length updates must be safe and must not perturb the running state.

tests/test-oci-blob-store.c drives 14 cases inside an mkdtemp scratch directory: layout creation, idempotent reopen, path() formatting, one-shot put + has() round-trip, dedup commit leaves the same inode, digest mismatch is rejected with EINVAL and tmp/ stays empty, streaming writer over multiple chunks, abort leaves no leftover, and close + reopen still sees the committed blob (issue sysprog21#31 DoD: "store survives restart"). dir_is_empty / path_is_dir / path_is_file helpers keep the assertions terse.

Makefile adds oci/digest.c and oci/blob-store.c to SRCS, plus the two new native-test link rules. mk/config.mk extends NATIVE_TESTS so the cross-compile pattern rule does not pick the new tests up. mk/tests.mk exposes test-oci-digest and test-oci-blob-store as phony targets and runs them as the final two stages of make check, beside the existing test-oci-ref stage. All 39 (25 + 14) new assertions pass; the rest of make check stays green (unit suite 81 passed / 0 failed, busybox, proctitle, procfs-exec, timeout-disable, OCI-ref 34/34).
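
The commit discipline described for the blob store (stage under tmp/, hash while streaming, fsync, publish with link(2), treat EEXIST as a dedup hit, reject a digest mismatch with EINVAL) can be sketched in Python. This is a hedged illustration, not the C implementation: `put_blob` is a hypothetical helper, only SHA-256 is handled, and the staging file name does not reproduce the `<pid>-<seq>` template.

```python
import errno
import hashlib
import os
import tempfile


def put_blob(root, expected_hex, chunks):
    """Stream chunks into <root>/tmp, verify, publish at <root>/blobs/sha256/<hex>."""
    blob_dir = os.path.join(root, "blobs", "sha256")
    tmp_dir = os.path.join(root, "tmp")
    os.makedirs(blob_dir, exist_ok=True)
    os.makedirs(tmp_dir, exist_ok=True)
    fd, staging = tempfile.mkstemp(prefix="blob-", dir=tmp_dir)
    try:
        digest = hashlib.sha256()
        with os.fdopen(fd, "wb") as f:
            for chunk in chunks:
                digest.update(chunk)   # hash incrementally, never buffer the blob
                f.write(chunk)
            f.flush()
            os.fsync(f.fileno())       # bytes are durable before they become visible
        if digest.hexdigest() != expected_hex:
            raise OSError(errno.EINVAL, "digest mismatch")
        final = os.path.join(blob_dir, expected_hex)
        try:
            os.link(staging, final)    # publish without overwriting an existing blob
        except FileExistsError:
            pass                       # dedup hit: identical content already committed
        return final
    finally:
        os.unlink(staging)             # staging name is removed on every path


root = tempfile.mkdtemp()
payload = b"hello layer"
hex_digest = hashlib.sha256(payload).hexdigest()
print(put_blob(root, hex_digest, [payload[:5], payload[5:]]))
```

Because link fails rather than replaces when the target exists, a second writer racing on the same digest cannot disturb the inode that readers may already have open.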

Third slice of Phase 1 from issue sysprog21#31. Lands the JSON deserialization substrate the upcoming registry client will run every fetched manifest, index, and config blob through. No HTTP, no unpack, no CLI surface yet; this slice is intentionally a pure offline library plus a 76-case unit test driven by inline JSON fixtures so the parse contract is auditable without standing up a network. externals/cjson/ vendors cJSON v1.7.18 verbatim (MIT-licensed, single .c/.h pair) per oci-roadmap.md Q9. No local modifications; future security updates re-fetch via the three curl commands in externals/cjson/VENDORING.md. .gitignore switches from ignoring all of externals/ to ignoring externals/* with an explicit !externals/cjson/ exception so the vendored tree stays tracked while the downloaded test fixtures stay out of git. The Makefile compiles cJSON with the same project CFLAGS the rest of the codebase uses; cJSON happens to be clean under -Wall -Wextra -Wpedantic on this version, so no per-file warning override is required. src/oci/media-type.{c,h} is the canonical enum + table for every OCI and Docker media type the manifest/index/config/layer code branches on. Foreign (nondistributable) layers are recognized and distinguishable so the parser can name the actual offending layer type instead of collapsing them to a generic "unknown", but the supported-layer predicate excludes them per oci-roadmap.md Q3 (elfuse cannot fetch the out-of-band payload they reference). The parser strips charset/boundary parameters and surrounding whitespace before lookup so the registry's Content-Type header value canonicalizes the same way the manifest's mediaType JSON field does. src/oci/manifest.{c,h} parses image manifests, image indexes, and image configs against schemaVersion 2. 
Every descriptor digest is validated through oci_digest_parse so a parsed oci_descriptor_t carries both the original "<algo>:<hex>" string and a populated (algo, hex[]) pair the blob store from slice 2 can consume directly. Size fields go through a fractional-part / negative / round-trip-precision check because cJSON returns numbers in a double; the parser rejects sizes beyond 2**53 - 1 where IEEE 754 precision starts dropping integers and rejects fractional sizes that would otherwise truncate silently to a near-but-wrong integer. Manifest config descriptors are required to carry a config media type, layer descriptors must carry a layer media type, and foreign layers are rejected with a precise error. Image configs require rootfs.type == "layers" (the only value the OCI image-spec defines) and validate every rootfs.diff_ids entry as a lowercase digest. Platform fields default empty variant / os.version strings to "" rather than NULL so the selector can use unconditional strcmp. oci_index_pick_linux_arm64 prefers variant "v8", then empty variant, then any other arm64 variant. It also skips entries whose manifest media type is not recognized -- even when the platform matches, the registry-fetch path cannot consume the resulting manifest, so picking such an entry would only defer a failure. 
tests/test-oci-manifest.c exercises 76 cases inline: every recognized media type lookup, charset/whitespace stripping, NULL and bogus strings, every predicate, both compression results; OCI and Docker happy-path manifest parses with two-layer gzip + zstd mix; the eight manifest rejection paths (malformed JSON, schemaVersion != 2, missing config, uppercase digest, negative size, fractional size, foreign layer, non-config media type on the config descriptor); the five index paths (multi-arch v8 wins; no-v8 picks empty variant over v7; no linux/arm64 returns NULL; Docker manifest list; unknown manifest mediaType is recorded but the selector skips it); and the four image config paths (happy with User/Env/Entrypoint/Cmd/WorkingDir/diff_ids; missing rootfs; non-layers rootfs.type; malformed diff_id).

Makefile / mk/config.mk / mk/tests.mk wire the new translation units into elfuse's link line, add oci/media-type.o + oci/manifest.o + the vendored cJSON object, register tests/test-oci-manifest.c in NATIVE_TESTS so the cross-compile pattern rule does not pick it up, and run the new test as the final stage of make check beside the existing test-oci-ref / test-oci-digest / test-oci-blob-store stages. All 76 new assertions pass; the rest of make check stays green (unit suite 81 passed / 0 failed / 3 skipped, busybox, proctitle, procfs-exec, timeout-disable, OCI-ref 34/34, OCI-digest 25/25, OCI-blob-store 14/14).

elfuse oci pull / prune / list still return rc=2; wiring the parser into the CLI is gated on slice 4 (HTTPS + token challenge + blob fetch). The parsers exist now so that work can land without also adding deserialization.
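
Two rules from this commit lend themselves to a short sketch: the size check for numbers that arrive as doubles, and the linux/arm64 selector preference (v8, then empty variant, then any other variant, skipping unrecognized manifest media types). The Python below is an assumption-laden illustration; `validate_size`, `pick_linux_arm64`, and `KNOWN_MANIFEST_TYPES` are hypothetical stand-ins for the C code, and the dict keys follow the OCI index JSON field names.

```python
MAX_SAFE_SIZE = 2**53 - 1  # largest size the parser accepts, per the text above

KNOWN_MANIFEST_TYPES = {
    "application/vnd.oci.image.manifest.v1+json",
    "application/vnd.docker.distribution.manifest.v2+json",
}


def validate_size(value):
    """Validate a size that a double-based JSON parser returned as a float."""
    if value != value or value < 0 or value > MAX_SAFE_SIZE:  # NaN, negative, too large
        raise ValueError("size out of range")
    if value != int(value):
        raise ValueError("fractional size")
    return int(value)


def pick_linux_arm64(entries):
    """Return the preferred linux/arm64 index entry, or None when there is none."""
    def rank(entry):
        variant = entry["platform"].get("variant", "")
        return 0 if variant == "v8" else 1 if variant == "" else 2

    candidates = [
        e for e in entries
        if e["platform"].get("os") == "linux"
        and e["platform"].get("architecture") == "arm64"
        and e.get("mediaType") in KNOWN_MANIFEST_TYPES  # skip unconsumable manifests
    ]
    return min(candidates, key=rank) if candidates else None


OCI = "application/vnd.oci.image.manifest.v1+json"
index = [
    {"mediaType": OCI, "digest": "amd64", "platform": {"os": "linux", "architecture": "amd64"}},
    {"mediaType": OCI, "digest": "v7", "platform": {"os": "linux", "architecture": "arm64", "variant": "v7"}},
    {"mediaType": OCI, "digest": "plain", "platform": {"os": "linux", "architecture": "arm64"}},
]
print(pick_linux_arm64(index)["digest"], validate_size(1024.0))
```

Filtering on media type before ranking mirrors the stated rationale: a platform match that cannot be parsed downstream would only postpone the failure.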

Fourth slice of Phase 1 from issue sysprog21#31, split into 4a here. Lands the HTTP fetch substrate that connects the slice-3 manifest parsers to a real registry and streams blob bodies into the slice-2 content-addressed store, all behind a single fetcher handle. No CLI wiring yet (elfuse oci pull still returns rc=2); slice 5 connects the pull command to this layer, persists the manifest graph, and pins the resolved tag-to-digest. Slice 4 was cut into 4a / 4b per oci-roadmap.md Q7 so each slice stays under the ~800 LOC review budget. 4a covers the anonymous Docker Hub / GHCR public-pull subset: anonymous GET, 401 + Www-Authenticate Bearer challenge, token fetch, retry, blob streaming with declared-size cap and on-commit digest verification. 4b will add basic auth, --insecure-ca custom CA, and --insecure loopback-gated TLS verify off. src/oci/fetch.{c,h} wraps libcurl. A fetcher owns one CURL easy handle, one cached bearer token, and the most recent Www-Authenticate challenge. The first request is anonymous. If the registry replies 401, the header parser captures realm / service / scope, fetch_token GETs the realm with those parameters, the JSON response is parsed with cJSON, and the original request is retried once with Authorization: Bearer <token>. The cached token is reused for subsequent calls on the same fetcher so a manifest plus N layer pulls cost one token round trip rather than N+1. docker.io is rewritten to registry-1.docker.io because the reference parser stores the canonical name while the actual API host differs. The blob path is content-addressed end to end. oci_fetch_blob short circuits when the descriptor is already present in the store; otherwise it opens an oci_blob_writer keyed by the descriptor digest, streams response body chunks through the writer, and tracks a running byte count capped at the descriptor's declared size so a hostile server cannot stream forever. 
The writer's own digest check at commit time rejects any payload that hashes to anything other than the descriptor hex. Size mismatch, digest mismatch, transport error, and non-2xx all unwind via oci_blob_writer_abort so an interrupted pull never leaves a visible-complete blob behind. CURLOPT_FOLLOWLOCATION is enabled so the common case where a registry 307s blob fetches to S3 / Cloudfront with a pre-signed URL works transparently; libcurl strips the Authorization header on cross-host redirects, which is exactly what the storage backend expects. The header parser keys on Content-Type, Docker-Content-Digest, and Www-Authenticate. Content-Type is stripped of charset/parameters before the manifest parser sees it so the canonicalization matches the mediaType field inside the JSON body. Docker-Content-Digest is captured verbatim so the upcoming tag-to-digest pinning in slice 5 can record the registry's resolved digest without recomputing. Response body accumulation has a 16 MiB ceiling (FETCH_BODY_MAX) so an unbounded reply cannot fill memory; real manifests, indexes, and image configs are orders of magnitude below this. Blob responses bypass the buffer entirely and stream straight through the writer. tests/test-oci-fetch.c spawns an in-process HTTP/1.1 mock server bound to 127.0.0.1 on an ephemeral port and drives the fetcher against scripted handlers. 
Nine offline cases exercise anonymous manifest GET (body, Content-Type stripping, Docker-Content-Digest capture); manifest 404 surfaces with the right status; bearer challenge runs the full 401 then token then retry sequence and inspects the request log to verify the second hop hits /token and the third carries the Bearer header; cached token reuse on a second fetch confirms no re-challenge round trip; blob success commits a known-good payload to the store; already-cached blob short-circuits with zero server requests; oversize response is rejected and leaves no visible blob; digest mismatch on a correctly-sized payload is rejected at commit; blob 404 fails cleanly. An opt-in tenth case behind OCI_FETCH_ONLINE=1 pulls alpine:3.20 from Docker Hub through the real bearer flow as a smoke test; it is wired as make test-oci-fetch-online and is not part of make check. Makefile adds src/oci/fetch.c to SRCS and -lcurl to HVF_LDFLAGS so the production elfuse binary links libcurl from the macOS SDK (no vendoring per oci-roadmap.md Q7 and Q9). build/test-oci-fetch links libcurl plus pthread for the mock server. mk/config.mk registers the test source in NATIVE_TESTS so the cross-compile pattern rule does not try to aarch64-compile it. mk/tests.mk adds test-oci-fetch as the final stage of make check and exposes test-oci-fetch-online as a separate target. make check stays green: 78 unit tests, busybox 81/0/3, proctitle, procfs-exec, timeout-disable, OCI-ref 34/34, OCI-digest 25/25, OCI-blob-store 14/14, OCI-manifest 76/76, OCI-fetch 9/9.
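
The 401 challenge handling described for fetch.c can be sketched as two small Python helpers: one that extracts realm / service / scope from a Www-Authenticate value, and one that builds the token request URL. These are illustrative only; `parse_bearer_challenge`, `token_url`, and `api_host` are hypothetical names, the regex handles only quoted parameters, and the header shown is an example value rather than captured registry output.

```python
import re
from urllib.parse import urlencode

_PARAM_RE = re.compile(r'(\w+)="([^"]*)"')


def parse_bearer_challenge(header_value):
    """Return the quoted parameters of a Bearer challenge, or None for other schemes."""
    scheme, _, rest = header_value.strip().partition(" ")
    if scheme.lower() != "bearer":
        return None
    return dict(_PARAM_RE.findall(rest))


def token_url(challenge):
    """Build the token GET URL from the realm plus service/scope parameters."""
    query = {k: challenge[k] for k in ("service", "scope") if k in challenge}
    return challenge["realm"] + ("?" + urlencode(query) if query else "")


def api_host(registry):
    """Map the canonical docker.io name to its API host, as the text describes."""
    return "registry-1.docker.io" if registry == "docker.io" else registry


header = ('Bearer realm="https://auth.example.test/token",'
          'service="registry.example.test",scope="repository:library/alpine:pull"')
challenge = parse_bearer_challenge(header)
print(token_url(challenge))
print(api_host("docker.io"))
```

Caching the token returned from that URL on the fetcher is what lets a manifest plus N layer fetches share one token round trip.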

Fourth slice of Phase 1 from issue sysprog21#31, 4b half. Closes out the oci-roadmap.md Q7 ship list by extending the slice-4a fetcher with HTTP Basic authentication, custom CA bundle, and a loopback-gated TLS verify-off path. fetch_manifest / fetch_blob signatures are unchanged; everything new lives in oci_fetcher_options_t and a new per-easy-handle helper.

src/oci/fetch.h grows four fields on oci_fetcher_options_t: username, password, ca_file, allow_insecure. oci_fetcher_new now stashes username/password as a pre-joined "user:pass" string (CURLOPT_USERPWD takes the joined form), strdup's ca_file, and records allow_insecure verbatim. apply_security_opts() is called from every GET callsite (perform_manifest_get, perform_blob_get, fetch_token) right after curl_easy_reset, which attaches CURLOPT_USERPWD plus CURLAUTH_BASIC, CURLOPT_CAINFO, and CURLOPT_SSL_VERIFY{PEER,HOST}=0 when each is set. This shape gives the token endpoint the basic credentials too: a registry that bridges Basic for the token exchange and Bearer for the data API sees both. libcurl drops the USERPWD-derived Authorization header in favor of the manually appended Authorization: Bearer on the retry, so basic gives way to bearer once a token is in hand.

The loopback policy gate runs at the entry of oci_fetch_manifest and oci_fetch_blob, not in oci_fetcher_new: ref is not available at construction time, and policy is about which host the fetcher is actually about to talk to. extract_host_from_registry strips the optional :port (and the [] of bracketed IPv6 literals) from ref->registry, is_loopback_host case-insensitively matches against 127.0.0.1 / localhost / ::1, and check_insecure_policy combines them so a non-loopback target with allow_insecure=true returns -1 with errno=EPERM before a single byte is sent.
The policy reads ref->registry rather than the test-only base_url_override so unit tests can drive a non-loopback ref while still pointing the mock URL at 127.0.0.1, and the production surface (no override) gets the same answer it would in deployment. tests/test-oci-fetch.c upgrades the in-process mock from plain HTTP to TLS. The mock generates an ephemeral RSA-2048 keypair and a self-signed certificate at startup via OpenSSL EVP, signed for CN=127.0.0.1 with SAN IP:127.0.0.1 + DNS:localhost, valid for one day. The certificate PEM is written into the scratch directory and the fetcher receives the path through opts.ca_file. accept loop wraps each connection in SSL_accept; read/write go through a small io_t abstraction so handler signatures change only in the IO parameter type. mock_send_full keeps the same response shape but writes through SSL_write. libcurl's SSL backend is forced to OpenSSL (LibreSSL on macOS) via curl_global_sslset() called before any other libcurl entry. macOS system libcurl is a multi-SSL build that defaults to Secure Transport, and Secure Transport ignores CURLOPT_CAINFO. Without this pin the ca_file negative cases would pass for the wrong reason: the handshake would succeed against the keychain, not the supplied PEM. LibreSSL on macOS still finds the system trust roots for the OCI_FETCH_ONLINE=1 case, so the online docker.io smoke test continues to work. mk/toolchain.mk auto-detects OPENSSL_PREFIX from /opt/homebrew/opt/openssl@3 (Apple Silicon) or /usr/local/opt/openssl@3 (Intel) and exposes OPENSSL_CFLAGS / OPENSSL_LDFLAGS. The Makefile attaches them only to build/test-oci-fetch (target-specific CFLAGS plus link flags), so the production elfuse binary still has no OpenSSL dependency: the new TLS plumbing is testing scaffolding, not runtime code. Test count grows from 9 to 15 cases. 
New cases: basic auth success (verifies the server saw "Basic YWxpY2U6c2VjcmV0" exactly once); basic auth carried into the token endpoint (verifies the token GET saw the same basic credentials and the manifest retry switched to Bearer); insecure on a loopback registry is allowed (HTTPS request goes through despite no ca_file); insecure on a non-loopback registry is rejected with errno=EPERM and zero bytes leak to the mock server (request log stays empty); ca_file unset against the self-signed mock fails the handshake with http_status=0; ca_file pointing at an unrelated self-signed certificate also fails the handshake. The 9 existing cases continue to pass over TLS by supplying the mock's CA PEM as ca_file. make check stays green: 78 unit tests, busybox 81/0/3, proctitle, procfs-exec, timeout-disable, OCI-ref 34/34, OCI-digest 25/25, OCI-blob-store 14/14, OCI-manifest 76/76, OCI-fetch 15/15. make test-oci-fetch-online (opt-in) also passes.
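
The loopback gate and the Basic header check from this commit can be modelled in a few lines of Python. The sketch assumes the behaviour described above (strip an optional port or IPv6 brackets, match case-insensitively against 127.0.0.1 / localhost / ::1, fail with EPERM otherwise); the function names echo the C helpers but this is not their implementation. The credentials `alice` / `secret` are simply what the base64 string quoted in the test description decodes to.

```python
import base64
import errno

LOOPBACK_HOSTS = {"127.0.0.1", "localhost", "::1"}


def extract_host(registry):
    """Strip an optional :port, and the [] around a bracketed IPv6 literal."""
    if registry.startswith("["):
        end = registry.find("]")
        return registry[1:end] if end != -1 else registry
    host, _, _port = registry.partition(":")
    return host


def is_loopback_host(host):
    return host.lower() in LOOPBACK_HOSTS


def check_insecure_policy(registry, allow_insecure):
    """Refuse TLS-verify-off for any registry that is not loopback."""
    if allow_insecure and not is_loopback_host(extract_host(registry)):
        raise PermissionError(errno.EPERM, "--insecure is only allowed for loopback registries")


def basic_auth_header(user, password):
    """Value of the Authorization header for HTTP Basic credentials."""
    token = base64.b64encode(f"{user}:{password}".encode()).decode("ascii")
    return "Basic " + token


check_insecure_policy("localhost:5000", allow_insecure=True)   # allowed
print(basic_auth_header("alice", "secret"))
```

Running the gate on the registry named in the reference, rather than on the URL actually dialed, is what lets the tests point a non-loopback ref at a local mock and still observe the rejection.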

Slice 5a of Phase 1 from issue sysprog21#31. Wires the slice 4a/4b fetcher and the slice 3 manifest parser into the elfuse oci pull command and persists the resolved blob graph on disk. inspect still renders only the canonical reference; the offline manifest-tree renderer ships in slice 5b.

src/oci/store.{c,h} wraps the slice-2 content-addressable blob store with a tag-to-digest pin table. On-disk layout under <root>:

    blobs/<algo>/<hex>                      (immutable, from slice 2)
    tmp/blob-<pid>-<seq>-XXXXXX             (in-flight staging)
    refs/<registry>/<repository>/<tag>      (pin file, one line: <algo>:<hex>)

oci_store_open creates the refs/ subtree, then opens a blob store rooted at the same path so the two layers share one directory. oci_store_put_ref refuses digest-only refs (their digest is the pin, no file needed), validates the supplied digest string with oci_digest_parse, mkdir -p's the registry/repository prefix on demand, writes <digest>\n into a tmp file alongside the final path, fsyncs, and renames into place. Rename rather than link because tag pins are mutable: pulling alpine:3.20 today may resolve to a different digest than yesterday and overwriting the pin is the correct semantic. The blob layer keeps its link(2) discipline because content-addressed blobs stay immutable.

oci_store_get_ref reads the pin file, strips the trailing newline, validates the digest via oci_digest_parse, and returns a heap-allocated copy. Miss reports errno=ENOENT so callers can distinguish "never pulled" from "io error reading pin".

oci_store_default_root returns the platform default: $XDG_DATA_HOME/elfuse/store when set, otherwise $HOME/Library/Application Support/elfuse/store. Phase 2 will mount a sparse case-sensitive APFS volume at the same path (oci-roadmap.md Q1); the API does not change.

src/oci/pull.{c,h} implements the pipeline. oci_pull runs six phases linearly:

1. Fetch the top-level manifest by ref->digest or ref->tag, advertising Accept for both OCI and Docker index + manifest types.
2. Hash the body with SHA-256 and cross-check against the Docker-Content-Digest header when the registry sent one. Body / header mismatch is a hostile-registry signal and aborts before anything else writes to the store. When the user pulled by digest, also cross-check the body digest against ref->digest.
3. Persist the manifest body into blob store at sha256:<computed-hex>.
4. If the top-level was an image index, parse it, run oci_index_pick_linux_arm64, fetch the sub-manifest by its descriptor digest with expected-digest verification, persist it, and switch to the sub-manifest body for the next phase. The pin digest stays at the top-level (index) digest so that the next inspect / pull by tag re-walks index then manifest.
5. Parse the manifest, fetch the config blob, fetch each layer blob in manifest order via oci_fetch_blob. Each blob fetch short-circuits when oci_blob_store_has reports a hit, so a re-pull issues zero layer downloads (only the two manifest bodies are re-fetched in the index case; manifest caching is its own future slice).
6. Write the tag-to-manifest-digest pin via oci_store_put_ref. Skip for digest-only refs (no tag to pin).

Schema v1 manifests and foreign / nondistributable layers are rejected by oci_manifest_parse from slice 3; oci_pull surfaces those diagnostics and aborts before any partial layer hits the store. The errno is preserved across the cleanup goto so callers can key tests off EPROTO / ENOENT / EINVAL without seeing free()'s leftover stomp.

Progress output is one line per descriptor with a truncated digest, size, state (downloaded vs cached), and media-type name. -q / --quiet silences it. The full hex still goes into the pin file and the blob store for verification.

src/oci/cli.c grows pull argument parsing: --store DIR, -u | --user USER[:PASS], --insecure-ca PEM, --insecure, -q | --quiet, plus the positional reference. Defaults come from oci_store_default_root.
split_userpass handles "user", "user:", and "user:pass" forms with one dynamically-allocated buffer the cleanup path frees. inspect, prune, list keep their slice-1 behaviour for now. tests/lib/oci-mock.{c,h} extracts the TLS-terminated HTTP/1.1 mock server from test-oci-fetch.c. The accept loop, ephemeral self-signed RSA-2048 + SAN cert generator, header parser, request log, and mock_send_full response helper all move out so both the fetch and the pull suites share one ~400 LOC implementation. Public symbols gain an oci_mock_ prefix to make the helper boundary explicit. Three small helpers (wipe_dir, scratch_root, base_url) tag along because both suites need them. test-oci-fetch.c shrinks by 380 lines, switches to the new header, and keeps its 15/15 passing. tests/test-oci-store.c covers 9 cases: layout creation, put + get round trip, miss returns ENOENT with out_digest=NULL, digest-only ref is rejected with EINVAL (its digest is the pin), malformed digest string is rejected with EINVAL, deep repository slashes get mkdir -p, pin overwrite replaces the file, blob and pin share the same root, and default_root respects XDG_DATA_HOME / falls back to HOME. tests/test-oci-pull.c covers 6 end-to-end cases against the mock. The test builds a synthetic image at runtime: three layer byte strings, one image config JSON referencing the layer digests, one manifest JSON referencing the config + layer digests, one index JSON referencing the manifest digest. All five digests are real SHA-256 of the actual bytes the mock serves, so the cross-check inside oci_pull exercises a real verification path. 
The cases are: tag resolves to index resolves to arm64 sub-manifest with config + 3 layers stored and pin written; tag resolves directly to manifest (no index) with pin written; digest-only ref pulls but no pin is written (and get_ref returns EINVAL); re-pull short-circuits layer + config downloads (second pull issues exactly 2 requests: index + sub-manifest); body / Docker-Content-Digest mismatch aborts with EPROTO and no pin written; index without linux/arm64 entry aborts with ENOENT.

Makefile / mk/config.mk / mk/tests.mk wire the new translation units: oci/store.o and oci/pull.o join SRCS; test-oci-store.c and test-oci-pull.c land in NATIVE_TESTS so the cross-compile rule skips them; new link rules build test-oci-store and test-oci-pull; tests/lib/oci-mock.o is a separate object linked into both test-oci-fetch and test-oci-pull with OPENSSL_CFLAGS applied; make check gains two new stages running test-oci-store and test-oci-pull after the existing OCI suites.

make check stays fully green: 78 unit tests; busybox 81/0/3; proctitle low-stack; procfs-exec; timeout-disable; OCI-ref 34/34; OCI-digest 25/25; OCI-blob-store 14/14; OCI-manifest 76/76; OCI-fetch 15/15; OCI-store 9/9; OCI-pull 6/6. make test-oci-fetch-online (opt-in) still passes.
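The three credential forms that split_userpass is described as handling can be illustrated with a single heap copy. This is a minimal sketch under assumptions: the function name, its signature, and the choice to report a missing password as NULL (versus an empty string for the "user:" form) are all hypothetical and may not match src/oci/cli.c.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical sketch. Splits "user", "user:", or "user:pass" using one
 * heap copy of arg. *user and *pass point into *buf, which the caller
 * frees once. Assumption: *pass is NULL when no ':' is present and ""
 * for the "user:" form. Returns 0 on success, -1 on allocation failure. */
static int split_userpass_sketch(const char *arg, char **buf,
                                 const char **user, const char **pass)
{
    assert(arg && buf && user && pass);
    size_t len = strlen(arg);
    char *copy = malloc(len + 1);
    if (!copy)
        return -1;
    memcpy(copy, arg, len + 1);

    /* Split at the first ':' so a password may itself contain colons. */
    char *colon = strchr(copy, ':');
    if (colon) {
        *colon = '\0';
        *pass = colon + 1;
    } else {
        *pass = NULL;
    }
    *user = copy;
    *buf = copy;
    return 0;
}
```

Keeping both output pointers inside one allocation means the cleanup path has exactly one `free(buf)` to perform regardless of which form was parsed.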

Slice 5b of Phase 1 from issue sysprog21#31. Closes out Phase 1 by giving elfuse oci inspect an actual function beyond the slice-1 canonical-ref print: it reads the local store the slice 5a pull pipeline populated and renders the manifest graph without touching the network. Phase 2 follows: sparse APFS volume bootstrap, layer unpack with whiteouts, clonefile copy-up. src/oci/inspect.{c,h} owns the offline renderer. oci_inspect resolves the manifest digest in three steps: 1. ref->digest when set (digest-pinned reference) 2. pin file <root>/refs/<registry>/<repository>/<tag> when ref->tag is set 3. Neither: print "(no local manifest; run 'elfuse oci pull' first)" on stdout and return 0. This preserves the slice-1 inspect smoke output shape for refs that were never pulled. The pinned digest goes through oci_digest_parse to reject corrupt pin files, then read_blob_file slurps <root>/blobs/<algo>/<hex> into a heap buffer. read_blob_file caps the read at 64 MiB (real manifests are well under 1 MiB; the cap prevents a corrupted store from forcing a pathological malloc) and reports errno=ENOENT when the blob file is absent. Classification between index and manifest is structural: the slice-3 parsers reject disjoint shapes (oci_index_parse requires a manifests array; oci_manifest_parse requires config + layers), so trying index first and falling back to manifest is unambiguous. Image config blobs never reach this path because pins point at manifest-shaped blobs. Index rendering prints a platforms table. Default mode shows only the picked linux/arm64 entry (tagged "[arm64]") and drills into the sub-manifest blob to print its config descriptor + layer table. The --all-platforms flag lists every platform entry and skips the drill; the flag answers "what does this image cover", not "what is inside the arm64 variant". Both decisions are documented inline at the oci_inspect_options_t definition. 
Failure mode for a partial store: index loads fine but the linux/arm64 sub-manifest blob is missing. The platform table still goes to stdout (the user sees what is available), a warning lands on stderr, and the call returns -1 with errno=ENOENT and err_msg = "indexed manifest blob missing from local store". Scripts key on the exit code; humans read the table. The errno is preserved across the cleanup goto in the same shape slice-5a oci_pull adopted.

Digest formatting follows the slice-5a progress lines for visual consistency: full digests appear in the pinned: line and in index entry tagging (so users can copy / grep the exact value), and a 22-column short form ("sha256:" + 12 hex + "...") appears in the layer tables. short_digest takes a caller-supplied buffer so two short digests in one printf do not clobber a shared static.

src/oci/cli.c grows parse_inspect_args + a cmd_inspect rewrite. The new flag set is --store DIR (override the platform default) and --all-platforms (the flag described above); the canonical-ref header print stays in cli.c so the slice-1 smoke output continues working when the store has no record. After the header, cmd_inspect opens the store and calls oci_inspect. rc 0 means success or pin miss; rc 1 means a real failure (malformed blob, blob missing, IO).

tests/test-oci-inspect.c drives 6 cases against a pre-populated scratch store. The store is built directly with oci_blob_store_put_bytes + oci_store_put_ref, not through oci_pull, so the test stays independent of the slice-4 fetcher and the slice-5a pipeline. open_memstream captures stdout into a heap buffer and the assertions grep for distinctive substrings (digest hex prefixes, "[arm64]", section headers) so format tweaks do not cause spurious failures.
The 6 cases are: a direct image manifest (config + 2 layers, asserts no [2] index appears so off-by-one shows up); an image index where default mode drills the arm64 sub-manifest and amd64 / s390x stay hidden; the same index with --all-platforms (all three platforms listed, drill section absent); a pin miss for an unknown tag (rc=0, informational line); a digest reference whose blob is absent (rc=-1, errno=ENOENT, "error: manifest blob ... not found"); and the index-ok sub-manifest-missing case (stdout still has the platform table, rc=-1, errno=ENOENT, err_msg identifies the missing inner blob). The last case dup2's stderr to /dev/null around the run so the warning line does not pollute the test driver output. Makefile adds oci/inspect.c to SRCS. mk/config.mk registers tests/test-oci-inspect.c in NATIVE_TESTS so the cross-compile pattern rule skips it. The new link rule pulls in inspect.o, store.o, blob-store.o, digest.o, manifest.o, media-type.o, ref.o, and cJSON; no libcurl, no openssl. mk/tests.mk gains a test-oci-inspect target and runs it as a make-check stage after OCI-pull. make check stays fully green: 78 unit tests; busybox 81/0/3; proctitle low-stack; procfs-exec; timeout-disable; OCI-ref 34/34; OCI-digest 25/25; OCI-blob-store 14/14; OCI-manifest 76/76; OCI-fetch 15/15; OCI-store 9/9; OCI-pull 6/6; OCI-inspect 6/6. make test-oci-fetch-online (opt-in) still passes. elfuse oci inspect now has a real second pane: the slice-1 canonical header followed by either the rendered manifest tree or a clear "never pulled" notice. prune and list still return rc=2.
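The 22-column short digest form and the caller-supplied buffer convention can be sketched as follows. The name `short_digest_sketch` and its signature are assumptions; only the output shape ("sha256:" + 12 hex + "...") comes from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

#define SHORT_DIGEST_LEN 22 /* "sha256:" (7) + 12 hex + "..." (3) */

/* Hypothetical sketch; the real short_digest signature may differ.
 * Writes "<algo>:" + first 12 hex chars + "..." into buf and returns buf.
 * Digests that are already short (or have no ':') are copied unchanged. */
static const char *short_digest_sketch(const char *digest, char *buf,
                                       size_t bufsz)
{
    assert(digest && buf && bufsz > 0);
    const char *colon = strchr(digest, ':');
    if (!colon || strlen(colon + 1) <= 12) {
        snprintf(buf, bufsz, "%s", digest);
        return buf;
    }
    int algo_len = (int) (colon - digest) + 1; /* include the ':' */
    snprintf(buf, bufsz, "%.*s%.12s...", algo_len, digest, colon + 1);
    return buf;
}
```

Because each call writes into its own buffer, two results can appear in one `printf` argument list without the second call overwriting the first, which is the hazard a shared static buffer would introduce.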

cubic-dev-ai

11 issues found across 40 files

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/oci/pull.c">

<violation number="1" location="src/oci/pull.c:253">
P2: Error-path leak: `sub_resp` may be allocated but not freed when sub-manifest fetch fails before `have_sub` is set.</violation>
</file>

<file name="src/oci/media-type.c">

<violation number="1" location="src/oci/media-type.c:100">
P2: Media type parsing is case-sensitive, but media type type/subtype tokens are case-insensitive; valid values with different casing will be misclassified as unknown.</violation>
</file>

<file name="src/oci/ref.c">

<violation number="1" location="src/oci/ref.c:83">
P2: Repository-path validation incorrectly rejects valid names with repeated dashes (for example `my--repo`).</violation>

<violation number="2" location="src/oci/ref.c:356">
P2: `docker.io` default-namespace detection is case-sensitive, so mixed-case hostnames can skip the required `library/` prefix.</violation>
</file>

<file name="src/oci/fetch.c">

<violation number="1" location="src/oci/fetch.c:782">
P2: Manifest fetch skips bearer-challenge parsing when a token is already cached, so 401 responses from expired/stale tokens are not retried with a refreshed token.</violation>

<violation number="2" location="src/oci/fetch.c:945">
P2: Blob fetch also disables challenge parsing when a token is cached, preventing 401-triggered token refresh and causing avoidable pull failures.</violation>
</file>

<file name="src/oci/blob-store.c">

<violation number="1" location="src/oci/blob-store.c:354">
P2: The commit path is not crash-durable because it never fsyncs the destination directory after linking the blob into place.</violation>
</file>

<file name="src/oci/store.c">

<violation number="1" location="src/oci/store.c:285">
P2: Fsync the pin directory after `rename` to make tag->digest updates crash-safe; file fsync alone does not persist the directory entry change.</violation>
</file>

<file name="src/oci/manifest.c">

<violation number="1" location="src/oci/manifest.c:295">
P2: `schemaVersion` parsing can accept fractional JSON numbers because `valueint` is used without an integer round-trip check.</violation>

<violation number="2" location="src/oci/manifest.c:385">
P2: Layer descriptor memory is leaked on post-parse validation failures because `nlayers` is incremented too late.</violation>

<violation number="3" location="src/oci/manifest.c:481">
P2: Index descriptor memory leaks when platform parsing fails because `nentries` is incremented after the fallible parse.</violation>
</file>


cubic-dev-ai · 2026-05-15T15:37:11Z

+            fflush(progress);
+        }
+
+        if (fetch_and_persist_manifest(fetcher, store, ref,


P2: Error-path leak: sub_resp may be allocated but not freed when sub-manifest fetch fails before have_sub is set.


cubic-dev-ai · 2026-05-15T15:37:11Z

+        return OCI_MT_UNKNOWN;
+
+    for (size_t i = 0; i < MEDIA_TYPE_COUNT; i++) {
+        if (!strcmp(MEDIA_TYPES[i].name, buf))


P2: Media type parsing is case-sensitive, but media type type/subtype tokens are case-insensitive; valid values with different casing will be misclassified as unknown.


cubic-dev-ai · 2026-05-15T15:37:11Z

+        } else {
+            return false;
+        }
+        if (i >= len || !is_lower_alnum(s[i]))


P2: Repository-path validation incorrectly rejects valid names with repeated dashes (for example my--repo).


cubic-dev-ai · 2026-05-15T15:37:11Z

+        goto oom;
+
+    bool needs_library_prefix =
+        strcmp(out->registry, DEFAULT_REGISTRY) == 0 &&


P2: docker.io default-namespace detection is case-sensitive, so mixed-case hostnames can skip the required library/ prefix.


cubic-dev-ai · 2026-05-15T15:37:11Z

+        return -1;
+    }
+
+    if (link(w->tmp_path, final_path) < 0) {


P2: The commit path is not crash-durable because it never fsyncs the destination directory after linking the blob into place.


cubic-dev-ai · 2026-05-15T15:37:11Z

+            *err_msg = "close on pin tmp file failed";
+        return -1;
+    }
+    if (rename(tmp, path) < 0) {


P2: Fsync the pin directory after rename to make tag->digest updates crash-safe; file fsync alone does not persist the directory entry change.
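One way to address this comment is to fsync the parent directory once the rename has succeeded. The sketch below is not the code in src/oci/store.c: `fsync_parent_dir` and `write_pin_durable` are hypothetical names, and the fixed ".tmp" suffix is a simplification of whatever temporary naming the store actually uses.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <unistd.h>

/* fsync the directory containing path so a completed rename (or link)
 * is persisted along with the file contents. */
static int fsync_parent_dir(const char *path)
{
    char dir[4096];
    assert(path);
    const char *slash = strrchr(path, '/');
    if (!slash) {
        snprintf(dir, sizeof(dir), ".");
    } else if (slash == path) {
        snprintf(dir, sizeof(dir), "/");
    } else {
        size_t n = (size_t) (slash - path);
        if (n >= sizeof(dir)) {
            errno = ENAMETOOLONG;
            return -1;
        }
        memcpy(dir, path, n);
        dir[n] = '\0';
    }
    int fd = open(dir, O_RDONLY);
    if (fd < 0)
        return -1;
    int rc = fsync(fd);
    int saved = errno;
    close(fd);
    errno = saved;
    return rc;
}

/* Hypothetical pin update: tmp file + fsync + rename + directory fsync. */
static int write_pin_durable(const char *path, const char *digest)
{
    char tmp[4096];
    assert(path && digest);
    int n = snprintf(tmp, sizeof(tmp), "%s.tmp", path);
    if (n < 0 || (size_t) n >= sizeof(tmp)) {
        errno = ENAMETOOLONG;
        return -1;
    }
    int fd = open(tmp, O_WRONLY | O_CREAT | O_TRUNC, 0644);
    if (fd < 0)
        return -1;
    size_t len = strlen(digest);
    int ok = write(fd, digest, len) == (ssize_t) len &&
             write(fd, "\n", 1) == 1 && fsync(fd) == 0;
    int saved = errno;
    if (close(fd) < 0 && ok) {
        ok = 0;
        saved = errno;
    }
    if (ok && rename(tmp, path) < 0) {
        ok = 0;
        saved = errno;
    }
    if (!ok) {
        unlink(tmp);
        errno = saved; /* report the first failure, not unlink's result */
        return -1;
    }
    return fsync_parent_dir(path);
}
```

The same `fsync_parent_dir` call would apply after the `link(2)` commit in the blob store, since both operations only modify a directory entry.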


cubic-dev-ai · 2026-05-15T15:37:11Z

+            goto fail;
+        const cJSON *plat =
+            cJSON_GetObjectItemCaseSensitive(entry, "platform");
+        if (parse_platform(plat, &slot->platform, err_msg) < 0)


P2: Index descriptor memory leaks when platform parsing fails because nentries is incremented after the fallible parse.


Suggested change

-        if (parse_platform(plat, &slot->platform, err_msg) < 0)
+        out->nentries++;
+        if (parse_platform(plat, &slot->platform, err_msg) < 0)
+            goto fail;

cubic-dev-ai · 2026-05-15T15:37:12Z

+        if (parse_descriptor(desc, &out->layers[out->nlayers], err_msg) < 0)
+            goto fail;
+        oci_media_type_t lmt = out->layers[out->nlayers].media_type;
+        if (!oci_media_type_is_layer(lmt)) {
+            set_parse_err(err_msg,
+                          "manifest layer has non-layer media type");
+            goto fail;
+        }
+        if (oci_media_type_is_foreign(lmt)) {
+            set_parse_err(err_msg,
+                          "manifest references foreign (nondistributable) "
+                          "layer; not supported");
+            goto fail;
+        }
+        if (!oci_media_type_is_layer_supported(lmt)) {
+            set_parse_err(err_msg,
+                          "manifest layer media type is not supported "
+                          "(only tar / tar+gzip / tar+zstd)");
+            goto fail;
+        }
+        out->nlayers++;


P2: Layer descriptor memory is leaked on post-parse validation failures because nlayers is incremented too late.


Suggested change

-        if (parse_descriptor(desc, &out->layers[out->nlayers], err_msg) < 0)
-            goto fail;
-        oci_media_type_t lmt = out->layers[out->nlayers].media_type;
-        if (!oci_media_type_is_layer(lmt)) {
-            set_parse_err(err_msg,
-                          "manifest layer has non-layer media type");
-            goto fail;
-        }
-        if (oci_media_type_is_foreign(lmt)) {
-            set_parse_err(err_msg,
-                          "manifest references foreign (nondistributable) "
-                          "layer; not supported");
-            goto fail;
-        }
-        if (!oci_media_type_is_layer_supported(lmt)) {
-            set_parse_err(err_msg,
-                          "manifest layer media type is not supported "
-                          "(only tar / tar+gzip / tar+zstd)");
-            goto fail;
-        }
-        out->nlayers++;
+        oci_descriptor_t *slot = &out->layers[out->nlayers];
+        if (parse_descriptor(desc, slot, err_msg) < 0)
+            goto fail;
+        out->nlayers++;
+        oci_media_type_t lmt = slot->media_type;
+        if (!oci_media_type_is_layer(lmt)) {
+            set_parse_err(err_msg,
+                          "manifest layer has non-layer media type");
+            goto fail;
+        }
+        if (oci_media_type_is_foreign(lmt)) {
+            set_parse_err(err_msg,
+                          "manifest references foreign (nondistributable) "
+                          "layer; not supported");
+            goto fail;
+        }
+        if (!oci_media_type_is_layer_supported(lmt)) {
+            set_parse_err(err_msg,
+                          "manifest layer media type is not supported "
+                          "(only tar / tar+gzip / tar+zstd)");
+            goto fail;
+        }

cubic-dev-ai · 2026-05-15T15:37:12Z

+            *err_msg = type_msg;
+        return -1;
+    }
+    *out = item->valueint;


P2: schemaVersion parsing can accept fractional JSON numbers because valueint is used without an integer round-trip check.
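The round-trip check the comment asks for can be sketched without any JSON library. cJSON number items carry both `valueint` and `valuedouble`, so a parser could pass `item->valuedouble` to a helper like the one below; `number_to_int_exact` is a hypothetical name, not a function in src/oci/manifest.c.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical helper: accept v only if it is exactly representable as an
 * int. The negated range test also rejects NaN, because every comparison
 * against NaN is false. Returns 0 and stores the value on success. */
static int number_to_int_exact(double v, int *out)
{
    assert(out);
    if (!(v >= (double) INT_MIN && v <= (double) INT_MAX))
        return -1;
    int i = (int) v; /* in range, so the truncating conversion is defined */
    if ((double) i != v)
        return -1;   /* fractional part was dropped: not an integer */
    *out = i;
    return 0;
}
```

With this shape, a `schemaVersion` of `2` or `2.0` is accepted while `2.5` or `1e20` is rejected instead of being silently truncated or saturated.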


Phase 2 of issue sysprog21#31 needs zstd to decompress OCI image layers that carry application/vnd.oci.image.layer.v1.tar+zstd (or the Docker equivalent). zstd has wide registry support beyond gzip and is the only other compression in the OCI spec that real-world images use.

Per oci-roadmap.md Q9, the OCI work stays hand-rolled C with no Go or Rust toolchain dependency. zstd cleanly separates decode-only from the full encoder, so the vendored subset is intentionally minimal:

    lib/zstd.h, lib/zstd_errors.h
    lib/common/*.{c,h} (allocator, FSE/Huff decoders, xxhash, portability shims, threading stubs)
    lib/decompress/*.{c,h} (streaming decode state machine)

Compression, dictBuilder, deprecated, and legacy v01-v06 paths are excluded. lib/decompress/huf_decompress_amd64.S is also dropped: the build sets -DZSTD_DISABLE_ASM=1 so huf_decompress.c skips the AMD64 asm symbols, and the elfuse host is Apple Silicon in any case.

Build wiring mirrors externals/cjson/: a per-file rule under build/externals/zstd/, project warning posture relaxed via -Wno-* because zstd is third-party code, configuration macros -DZSTD_DISABLE_ASM=1 -DZSTD_LEGACY_SUPPORT=0 -DZSTD_MULTITHREAD=0, and the objects statically embedded into elfuse so no -lzstd link line. -lz is appended to HVF_LDFLAGS for gzip-compressed layers (zlib is a macOS system library, no vendoring needed).

externals/zstd/VENDORING.md records the upstream tag, the exact curl and cp commands, and the rule that only src/oci/decompress.c may include externals/zstd/lib/zstd.h.

Phase 2 layer unpack needs to walk tar entries out of decompressed layer streams. This commit adds the streaming reader on its own so the applier in a later commit consumes a stable typed-entry API instead of parsing tar headers inline. src/oci/tar.{c,h} parses POSIX 1003.1-1990 ustar headers, the ustar prefix+name join (allowing names up to 255 chars without the GNU extension), and the GNU '././@LongLink' typeflag-'L'/'K' records used when an OCI registry hands the reader a path or symlink target longer than 100 bytes. The reader collapses block, char, fifo, and socket typeflags into a single OCI_TAR_UNSUPPORTED variant so the applier can emit one precise refusal message per unpack-time rejection without re-decoding the typeflag. PAX extended headers are rejected outright with EPROTONOSUPPORT, per oci-roadmap.md Q3 asymmetric subset. If a real-world image is found to depend on a PAX-only field, expand the accept list with targeted parsing (mtime / size / path) rather than enabling generic PAX extension support. The on-roadmap risk register records this caveat. Header chksums are verified against both unsigned and signed-byte sums so historic and modern tar implementations interoperate. The base-256 GNU encoding for sizes that overflow 8 GiB octal is accepted; layer blobs that large are not realistic but the parser stays honest. The reader exposes a callback-driven byte source so the future oci/decompress.c can hand it zlib, libzstd, or passthrough streams without the tar parser caring which side feeds it. Short reads and sub-block chunking are handled internally via a 512-byte block realignment loop, validated by the unit test feeding 1, 5, 256, and 512-byte chunks of the same fixture. 
tests/test-oci-tar.c builds tar payloads in memory (no external tar) and exercises 19 cases covering: empty archive EOF, regular files at four chunk sizes, directories with trailing-slash normalization, symlinks, hardlinks, GNU long-name >100-char paths, PAX rejection, char/block/fifo collapsing to UNSUPPORTED, unknown typeflag rejection, chksum mismatch, .wh.<name> and .wh..wh..opq whiteout flagging, and implicit payload drain on next-iter calls. Makefile, mk/config.mk, and mk/tests.mk register the new translation unit plus the test-oci-tar build and run rules; the test is wired into make check after test-oci-inspect.
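The dual checksum comparison mentioned above can be sketched as below. The field offset (148) and width (8) are the standard ustar header layout; the function names are hypothetical and this is not the code in src/oci/tar.c.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TAR_BLOCK 512
#define CHKSUM_OFF 148 /* chksum field offset in a ustar header */
#define CHKSUM_LEN 8

/* Parse an octal field, skipping leading spaces and stopping at the first
 * non-octal byte (typically the NUL or space terminator). */
static long parse_octal(const unsigned char *p, size_t n)
{
    long v = 0;
    size_t i = 0;
    while (i < n && p[i] == ' ')
        i++;
    for (; i < n && p[i] >= '0' && p[i] <= '7'; i++)
        v = v * 8 + (p[i] - '0');
    return v;
}

/* Sum all 512 header bytes with the chksum field counted as spaces, once
 * treating bytes as unsigned and once as signed, and accept either sum. */
static bool tar_chksum_ok(const unsigned char *hdr)
{
    assert(hdr);
    long usum = 0, ssum = 0;
    for (size_t i = 0; i < TAR_BLOCK; i++) {
        bool in_field = i >= CHKSUM_OFF && i < CHKSUM_OFF + CHKSUM_LEN;
        unsigned char b = in_field ? ' ' : hdr[i];
        usum += b;
        ssum += (signed char) b;
    }
    long want = parse_octal(hdr + CHKSUM_OFF, CHKSUM_LEN);
    return want == usum || want == ssum;
}
```

The two sums only differ when the header contains bytes at or above 0x80, which is why a header produced by a signed-char implementation would fail a reader that checks the unsigned sum alone.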

Phase 2 layer unpack needs to consume OCI layers in application/vnd.oci.image.layer.v1.tar+gzip, application/vnd.oci.image.layer.v1.tar+zstd, and the uncompressed application/vnd.oci.image.layer.v1.tar shapes. This commit puts gzip, zstd, and passthrough behind one oci_stream_t so the tar reader stays compression-agnostic. src/oci/decompress.{c,h} provides oci_decompress_open(fd, alg) plus streaming oci_stream_read / oci_stream_close. gzip routes through zlib's inflate with windowBits = 15 + 32 so the decoder auto-detects the gzip wrapper (raw deflate without a header is intentionally rejected because real OCI layers always carry the gzip wrapper). zstd routes through libzstd's streaming ZSTD_DCtx with ZSTD_d_windowLogMax = 27 (128 MiB), so a pathologically large window parameter is rejected with EINVAL before any output is produced. decompress.c is the only translation unit in elfuse that includes externals/zstd/lib/zstd.h. The build rule attaches -I$(ZSTD_DIR)/lib as a target-specific CFLAG so the rest of the codebase never sees zstd headers, keeping the public include surface to oci/decompress.h. A passthrough mode lets the tar reader consume OCI_COMPRESSION_NONE layers through the same API. The implementation does not buffer beyond the initial input buffer in passthrough mode; once exhausted it hands the caller's buf directly to read(2) so large uncompressed payloads stream without an extra copy. tests/test-oci-decompress.c covers five cases: passthrough, gzip roundtrip with a zlib-generated fixture, gzip truncated frame rejection, zstd roundtrip with an embedded byte-array fixture (produced once via the system zstd CLI because the vendored libzstd is decode-only), and the 28-bit window cap rejection regression. Makefile, mk/config.mk, and mk/tests.mk register the new translation unit, the target-specific zstd include path, and the test build / run rules. 
The test binary links the vendored zstd objects plus system zlib (-lz, already in HVF_LDFLAGS from the zstd vendoring commit).
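To make the window cap arithmetic concrete, the sketch below computes the window size a zstd frame declares through its Window_Descriptor byte, using the formula from the zstd frame format, and compares it with a 2^windowLogMax limit. It illustrates the arithmetic only: the helper names are hypothetical, the vendored library performs its own check internally, and single-segment frames (which carry no Window_Descriptor) are out of scope here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Window size encoded by a zstd Window_Descriptor byte:
 * exponent in bits 7..3, mantissa in bits 2..0. */
static uint64_t zstd_window_size(uint8_t window_descriptor)
{
    unsigned exponent = window_descriptor >> 3;
    unsigned mantissa = window_descriptor & 7u;
    uint64_t base = (uint64_t) 1 << (10 + exponent);
    return base + (base / 8) * mantissa;
}

/* Hypothetical helper: true when the declared window fits within
 * 2^window_log_max bytes. */
static bool window_within_cap(uint8_t window_descriptor,
                              unsigned window_log_max)
{
    assert(window_log_max < 64);
    return zstd_window_size(window_descriptor) <=
           ((uint64_t) 1 << window_log_max);
}
```

With a limit of 27, a descriptor whose exponent is 17 and mantissa is 0 declares exactly 128 MiB and fits, while an exponent of 18 declares 256 MiB and does not, which corresponds to the 28-bit rejection case the test suite covers.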

elfuse unpacks layers as the invoking macOS user; chown to arbitrary uids/gids fails, and the host inode mode cannot always carry the Linux setuid/setgid/sticky bits a tar entry requests. Phase 3 still needs the original Linux view at runtime, so Phase 2 records the authoritative uid/gid/mode per guest path in a sidecar JSON file that lives alongside the unpacked tree.

src/oci/layer-meta.{c,h} provides oci_meta_table_t with record / lookup / remove / count plus write and read helpers that serialize to <root_dir>/.elfuse-meta.json. The on-disk schema is

    { "version": 1, "entries": [ { "p": "/path", "u": NNN, "g": NNN, "m": NNN } ] }

Mode bits are stored decimal because cJSON has no native octal; the bottom 12 bits encode rwx + setuid + setgid + sticky verbatim. The unit test confirms setuid 0104755 and sticky 0101777 round-trip faithfully.

oci_meta_remove keeps the persisted table tight: whiteouts and tar overwrites drop the prior entry so a redundant sidecar tuple never shadows a path that no longer exists in the unpacked tree.

Storage is a linear-scan dynamic array because OCI layers typically hold a few hundred to a few thousand entries; if profiling later shows the scan is hot, the same struct can sit behind an open-addressing FNV-1a hash without touching callers.

Writes go through a tmp + fsync + atomic rename so an interrupted write never publishes a partially-flushed sidecar. Reads cap the file at 64 MiB so a hostile or corrupt sidecar cannot drag the host into swap; reject on malformed JSON or version mismatch with EINVAL, on missing file with ENOENT (the latter is the cold-cache signal that an old unpack predated the sidecar feature).

xattr storage is intentionally absent: oci-roadmap.md Q3 commits Phase 2 to ignore-with-warning on xattr entries rather than fabricate a half-supported mapping between Linux user/security/system xattr namespaces and the macOS extended-attribute domain.
tests/test-oci-meta.c covers six cases: record / lookup / count / miss-ENOENT, idempotent overwrite, remove with no-op on missing path, write+read roundtrip preserving setuid and sticky, missing sidecar reports ENOENT, and malformed JSON rejected with EINVAL.
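The linear-scan table behaviour described above (record with overwrite, lookup with an ENOENT miss, remove as a no-op on absent paths) can be sketched in a few functions. All type and function names here are hypothetical stand-ins for oci_meta_table_t and its helpers, and serialization is left out.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

typedef struct { char *path; uint32_t uid, gid, mode; } meta_entry_t;
typedef struct { meta_entry_t *v; size_t n, cap; } meta_table_t;

/* Linear scan; returns NULL when the path has no entry. */
static meta_entry_t *meta_find(meta_table_t *t, const char *path)
{
    for (size_t i = 0; i < t->n; i++)
        if (strcmp(t->v[i].path, path) == 0)
            return &t->v[i];
    return NULL;
}

/* Insert or overwrite the tuple for path. Returns 0, or -1 on OOM. */
static int meta_record(meta_table_t *t, const char *path, uint32_t uid,
                       uint32_t gid, uint32_t mode)
{
    assert(t && path);
    meta_entry_t *e = meta_find(t, path);
    if (!e) {
        if (t->n == t->cap) {
            size_t cap = t->cap ? t->cap * 2 : 16;
            meta_entry_t *v = realloc(t->v, cap * sizeof(*v));
            if (!v)
                return -1;
            t->v = v;
            t->cap = cap;
        }
        size_t len = strlen(path) + 1;
        char *copy = malloc(len);
        if (!copy)
            return -1;
        memcpy(copy, path, len);
        e = &t->v[t->n++];
        e->path = copy;
    }
    e->uid = uid;
    e->gid = gid;
    e->mode = mode;
    return 0;
}

/* Returns the entry, or NULL with errno=ENOENT on a miss. */
static const meta_entry_t *meta_lookup(meta_table_t *t, const char *path)
{
    meta_entry_t *e = meta_find(t, path);
    if (!e)
        errno = ENOENT;
    return e;
}

/* Drop the entry for path; a missing path is a no-op. */
static void meta_remove(meta_table_t *t, const char *path)
{
    meta_entry_t *e = meta_find(t, path);
    if (!e)
        return;
    free(e->path);
    *e = t->v[--t->n]; /* swap-remove; entry order is not significant */
}
```

Masking a stored mode with `07777` recovers the twelve permission and setuid/setgid/sticky bits, so `0104755` yields `04755` whatever the host inode was able to carry.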

Phase 2 layer unpack drives every entry of every layer's decompressed tar stream into the unpack root in strict manifest order. The applier ties tar reader, decompression dispatch, and sidecar metadata together; the next commit (sparse APFS volume bootstrap) and the one after (clonefile per-run rootfs) build on this surface.

src/oci/layer-apply.{c,h} exposes oci_layer_apply(reader, root, stats, meta, err) plus oci_path_join_safe and oci_symlink_target_check. The latter two are exported for unit tests because the path containment rules they enforce are the security boundary unpack relies on:

oci_path_join_safe mirrors src/syscall/path.h::path_translate_at from PR sysprog21#33. Reject leading '/' (absolute), reject any segment equal to `..`, reject empty paths. Real OCI layers ship paths relative to the layer root; anything else is hostile.

oci_symlink_target_check parses the symlink target as if a follower started at link_dir under sysroot=root. Absolute targets get treated as sysroot-relative (which matches how the guest will follow them at runtime through src/syscall/proc-state.c::sysroot_path_is_contained). The check tracks running depth and rejects any drop below zero with ELOOP, so `escape -> ../../../etc/passwd` from inside the unpack root is refused before symlink(2) ever fires.

Whiteout handling follows the OCI image-spec layer change-set rules:

.wh.<name>
    the upper-layer entry <name> is removed recursively from the unpack root; the sidecar drops any prior tuple for the same guest path so the persisted .elfuse-meta.json never references a path that no longer exists on disk.

.wh..wh..opq
    the containing directory's lower-layer contents are cleared; subsequent entries in this same layer (e.g. dir/kept) survive because they land after the marker.

Hardlinks resolve the target as an intra-archive guest path through oci_path_join_safe and require lstat(target_host) to succeed; a hardlink to a missing target is rejected with ENOLINK, since the OCI spec mandates apply order and forward references would mean a malformed archive. Mode bits propagate via fchmod to whatever the host inode can carry; setuid/setgid/sticky and the rwx triplets are also recorded in the running oci_meta_table_t so Phase 3 has the authoritative Linux view regardless of what the host kernel let elfuse apply as a non-root user. Block, char, fifo, and socket entries are refused with ENOTSUP at this layer because the tar reader already collapses them into OCI_TAR_UNSUPPORTED; the applier surfaces the precise error per oci-roadmap.md Q3. A local la_strchrnul shim sidesteps the macOS 15.4 deployment-target gate on strchrnul, so the applier builds against older SDKs without an __builtin_available block. tests/test-oci-layer-apply.c covers nine cases: basic mixed-entry apply (regular + dir + symlink + hardlink with inode parity check), symlink escape rejected with ELOOP, hardlink missing target rejected with ENOLINK, whiteout removal, opaque whiteout clears prior dir contents while later entries survive, char-device entry rejected with ENOTSUP, path-join `..` rejected with EINVAL, path-join absolute rejected with EINVAL, and the legal-target acceptance path.
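The two containment rules above can be sketched with plain string walking. The names `path_is_safe` and `symlink_target_ok` are hypothetical, and the sketch only tracks lexical depth: it does not consider symlinks that already exist in intermediate components, which the real check may or may not handle.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Walk path segment by segment from a starting depth. ".." decrements,
 * "." and empty segments are ignored, anything else increments.
 * Returns the final depth, or -1 if depth ever drops below zero. */
static int walk_depth(const char *path, int depth)
{
    const char *p = path;
    while (*p) {
        while (*p == '/')
            p++;
        const char *seg = p;
        while (*p && *p != '/')
            p++;
        size_t n = (size_t) (p - seg);
        if (n == 0 || (n == 1 && seg[0] == '.'))
            continue;
        if (n == 2 && seg[0] == '.' && seg[1] == '.') {
            if (--depth < 0)
                return -1;
        } else {
            depth++;
        }
    }
    return depth;
}

/* link_dir is the guest directory holding the symlink, relative to the
 * unpack root ("" for the root itself). Absolute targets restart at
 * depth 0, i.e. they are treated as sysroot-relative. */
static int symlink_target_ok(const char *link_dir, const char *target)
{
    assert(link_dir && target);
    int depth = 0;
    if (target[0] != '/')
        depth = walk_depth(link_dir, 0);
    if (depth < 0 || walk_depth(target, depth) < 0) {
        errno = ELOOP;
        return -1;
    }
    return 0;
}

/* Entry-path rule: reject empty, absolute, or any ".." segment. */
static int path_is_safe(const char *rel)
{
    assert(rel);
    if (rel[0] == '\0' || rel[0] == '/') {
        errno = EINVAL;
        return -1;
    }
    for (const char *p = rel; *p;) {
        const char *seg = p;
        while (*p && *p != '/')
            p++;
        if (p - seg == 2 && seg[0] == '.' && seg[1] == '.') {
            errno = EINVAL;
            return -1;
        }
        while (*p == '/')
            p++;
    }
    return 0;
}
```

Under these rules a link in `usr/lib` pointing at `../share/x` stays inside the root, while a link at the root pointing at `../../../etc/passwd` drops below depth zero on its first segment and is refused.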

Phase 2 unpack requires a case-sensitive filesystem (oci-roadmap.md Q1) so Linux layers that ship colliding names (Foo and foo in the same directory, common in man pages and many distros) survive without silent merging. macOS data volumes default to case-insensitive APFS, so elfuse provisions its own sparsebundle.

src/oci/volume.{c,h} resolves the sysroot volume root and provisions on first use. The default path is

    $HOME/Library/Application Support/elfuse/sysroots/

with a sparsebundle backing image at

    $HOME/Library/Application Support/elfuse/sysroots.sparsebundle

Bootstrap delegates to src/core/sysroot.h::sysroot_create_mount, which already wraps the hdiutil create + attach sequence with a case-sensitive APFS format. No duplicated hdiutil orchestration. A pthread-mutex-protected cache keeps the mount handle alive across multiple oci subcommand invocations within one elfuse process, so running `elfuse oci pull` followed by `elfuse oci unpack` does not re-attach the sparsebundle on every command.

`--volume DIR` overrides go through sysroot_probe_case_sensitivity from PR sysprog21#33. Non-case-sensitive directories are refused with EINVAL rather than silently engaging the case-fold sidecar in src/syscall/sidecar.c; the sidecar is a runtime fallback for guests, not a Phase 2 unpack policy (see the design note in oci-roadmap.md Q1: a single sparse APFS volume beats both "require user-provided case-sensitive volume" and "strict collision rejection").

oci_volume_subdir creates intermediate components for the images/, runs/, and images/.staging/ subtrees the next two commits (clonefile copy-up + unpack orchestrator) will write to. Existing directories are tolerated; assembly failures surface the underlying errno.

tests/test-oci-volume.c covers the override rejection path and the subdir creation. The default-sparsebundle bootstrap is gated behind OCI_VOLUME_TEST=1 because hdiutil orchestration costs ~150 ms and ~16 MiB of disk on every first invocation; make check runs the ungated subset (case-insensitive rejection, ENOENT on missing path, subdir creation).

Phase 2 commits oci-roadmap.md Q2 to APFS clonefile-based copy-up: each `elfuse oci clone` invocation gets a fresh directory tree cloned from the immutable image sysroot. APFS file-level CoW makes the clone nearly O(1) at start and only allocates new blocks for files the guest actually modifies, so the rootfs model is cheap on both wall time and disk.

This is structurally a place elfuse beats VM-backed runtimes: Docker Desktop and OrbStack run Linux overlayfs inside a guest kernel; elfuse has no guest kernel, and the closest macOS-native primitive (clonefile) gives roughly the same "cheap per-container view of a shared base" property without the in-guest daemon. fuse-overlayfs via the PR sysprog21#35 guest FUSE transport would also work in theory, but oci-roadmap.md Q2 deliberately rejects it: it adds an in-guest daemon, costs IPC on every syscall, and offers nothing clonefile cannot already do for the unpack tree.

src/oci/clone-rootfs.{c,h} exposes:

- oci_clone_rootfs(src_image_dir, volume_root, **out_run_dir, **err)
  Allocates a fresh <volume>/runs/<random>/ slot, calls clonefile(src, dst, CLONE_NOFOLLOW), returns the absolute path. CLONE_NOFOLLOW prevents a symlink at the src root from pulling the clone off the immutable image; the layer applier from the previous commit already rejected escape-symlinks inside the tree.
- oci_clone_rootfs_remove(run_dir, **err)
  Recursive cleanup. Tolerates ENOENT so a CLI flow can call remove unconditionally on the success path without surfacing "file not found" when there is nothing to remove.
- oci_clone_rootfs_gc(volume_root, older_than, **err)
  Phase 2 stub. Phase 3 will walk volume_root/runs/ and unlink entries older than older_than for `elfuse oci prune`.

Apple's clonefile(2) is recursive across directories since macOS 10.12.
Hardlinks INSIDE the source tree survive the clone metadata pass; cross-tree hardlinks back to the immutable image are NOT created, so the layer applier's intra-archive hardlink handling in the previous commit was load-bearing. Run-id generation uses getentropy for 12 hex chars (48 bits), which is ample for elfuse process lifetimes and avoids the predictability of a time-or-counter-based scheme. tests/test-oci-clone.c covers three cases: CoW preservation (mutate the clone, assert source unchanged), no-op remove on a missing path, and the gc stub. The CoW test skips with a clear message if clonefile returns ENOTSUP, so a future non-APFS scratch directory does not turn the suite red.
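The run-id scheme described above (getentropy, 12 hex chars, 48 bits) can be sketched as follows. The function names make_run_id and run_id_is_valid and the RUN_ID_HEX_LEN constant are hypothetical, not taken from src/oci/clone-rootfs.c; only getentropy itself is a real call.

```c
#define _DEFAULT_SOURCE /* exposes getentropy in glibc's <unistd.h> */
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>
#include <unistd.h>
#if defined(__APPLE__)
#include <sys/random.h> /* getentropy lives here on macOS */
#endif

#define RUN_ID_HEX_LEN 12 /* 6 random bytes = 48 bits */

/* Hypothetical helper: fill out with 12 lowercase hex characters drawn
 * from the kernel entropy source. Returns 0, or -1 with errno set. */
static int make_run_id(char out[RUN_ID_HEX_LEN + 1])
{
    static const char hex[] = "0123456789abcdef";
    unsigned char raw[RUN_ID_HEX_LEN / 2];

    if (getentropy(raw, sizeof(raw)) != 0)
        return -1;
    for (size_t i = 0; i < sizeof(raw); i++) {
        out[2 * i] = hex[raw[i] >> 4];
        out[2 * i + 1] = hex[raw[i] & 0x0f];
    }
    out[RUN_ID_HEX_LEN] = '\0';
    return 0;
}

/* Hypothetical validator for names found under <volume>/runs/. */
static bool run_id_is_valid(const char *s)
{
    return strlen(s) == RUN_ID_HEX_LEN &&
           strspn(s, "0123456789abcdef") == RUN_ID_HEX_LEN;
}
```

A caller would append the id to <volume>/runs/ and retry with a fresh id if the destination already exists; that collision handling is an assumption and is not shown here.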

This commit ties tar reader, decompression dispatch, layer applier, sidecar metadata, sparse APFS volume bootstrap, and clonefile-based copy-up together into one orchestrator and exposes the user-facing surface via two new subcommands.

src/oci/unpack.{c,h} provides oci_unpack(store, ref, opts, **out_image_dir, **err). Pipeline:

1. oci_volume_ensure resolves and provisions the sysroot volume (default sparsebundle or --volume override); non-case-sensitive overrides are rejected with EINVAL before any disk write.
2. oci_volume_subdir provisions images/ and images/.staging/.
3. resolve_manifest_digest prefers ref->digest when set, otherwise reads the tag pin via oci_store_get_ref; pin miss surfaces ENOENT so the CLI can print a "run oci pull first" hint.
4. read_blob loads the manifest body via oci_blob_store_path. If the body is an image index, oci_index_pick_linux_arm64 selects the arm64 sub-manifest and re-reads it.
5. For each layer in manifest order: foreign / nondistributable layers refuse with ENOTSUP, reverify_layer_digest re-runs SHA-256 over the on-disk blob bytes (defensive, even though Phase 1 already verified at write time), then oci_decompress_open + oci_tar_reader_new + oci_layer_apply drive the entry stream into a staging tree under <volume>/images/.staging/<random>/.
6. oci_meta_write commits .elfuse-meta.json into the staging tree.
7. rename(2) atomically moves the staging tree into the final images/sha256-<hex>/ slot.

Re-running with the same ref short-circuits when the final slot exists; --force removes the prior commit (via rm -rf) before staging.

src/oci/cli.c gains cmd_unpack and cmd_clone:

    elfuse oci unpack [--store DIR] [--volume DIR] [--force] [-q] <ref>
    elfuse oci clone [--store DIR] [--volume DIR] [--name N] [--keep] <ref>

Both subcommands print exactly one line on stdout: the absolute path of the unpacked or cloned tree. Trailing slash on unpack lets $(elfuse oci unpack alpine)/bin compose cleanly. Diagnostic noise and per-layer apply progress flow to stderr so the stdout contract stays scriptable.

clone implies unpack: it calls oci_unpack first and then oci_clone_rootfs to materialize a fresh <volume>/runs/<random>/ under the same sparsebundle. --keep is forward-looking (Phase 2 does not auto-clean either way), --name is reserved for Phase 3.

tests/test-oci-unpack.c is the integration smoke. Every constituent module already has dedicated unit coverage in test-oci-{tar,decompress,layer-apply,meta,volume,clone}, so this file confirms the link-time dependency edges plus a useful invariant: oci_unpack must NOT spin up hdiutil or hit the network without a valid volume context. The full end-to-end fixture is reserved for Phase 3 where the e2e suite gains a shared tests/lib/oci-fixture alongside tests/lib/oci-mock.

End-to-end smoke (on the author's Apple Silicon Mac, with network):

    ./build/elfuse oci pull alpine:latest
    IMG=$(./build/elfuse oci unpack alpine:latest)
    test -f "${IMG}lib/ld-musl-aarch64.so.1"   # interpreter present
    ROOT=$(./build/elfuse oci clone alpine:latest)
    ./build/elfuse --sysroot "$ROOT" /bin/sh -c 'echo ok'

Phase 2 is intentionally a no-op for `elfuse run IMAGE`; Phase 3 wires that and the Entrypoint / Cmd / Env / User merge.

hdiutil prints messages like '"diskN" ejected.' to stdout even on success, and 'created: <path>' on hdiutil create. Phase 2's oci unpack and oci clone subcommands promise a single-line stdout contract (the unpacked or cloned tree path), so a downstream ROOT=$(elfuse oci clone alpine:latest) would otherwise capture an hdiutil progress line into $ROOT and break every subsequent flow that treats $ROOT as a path. Phase 1 never had a caller that strictly cared about stdout, so this surfaced only with Phase 2 in place.

Add spawn_simple_silent that posix_spawn_file_actions_addopen's /dev/null over the child's fd 1, and route the two stdout-printing hdiutil callers (detach via sysroot_detach_mountpoint_force, create via the sparsebundle bootstrap path inside sysroot_create_mount) through it. Stderr is intentionally left alone so genuine hdiutil error output still surfaces for diagnostics. The hdiutil attach path already used spawn_capture_stdout to parse the plist, so it was never a leak source.

End-to-end smoke after the fix:

    ROOT=$(./build/elfuse oci clone alpine:latest)
    ./build/elfuse --sysroot "$ROOT" "$ROOT/bin/busybox" \
        sh -c 'echo hello from inside oci alpine'

now prints exactly the canonical greeting on stdout with no hdiutil noise mixed in.
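A minimal sketch of the silencing technique, assuming a helper shaped like the one the commit describes. The name spawn_silent and its return convention are hypothetical stand-ins for spawn_simple_silent, whose real signature is not shown in this PR text; the posix_spawn calls themselves are standard POSIX.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <spawn.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

extern char **environ;

/* Hypothetical stand-in for spawn_simple_silent: run argv[0] (looked up
 * on PATH) with fd 1 pointed at /dev/null and stderr left untouched.
 * Returns the child's exit status, or -1 on spawn/wait failure. */
static int spawn_silent(char *const argv[])
{
    posix_spawn_file_actions_t fa;
    pid_t pid;
    int status;

    if (posix_spawn_file_actions_init(&fa) != 0)
        return -1;
    /* In the child, open /dev/null onto fd 1 before exec. */
    if (posix_spawn_file_actions_addopen(&fa, STDOUT_FILENO, "/dev/null",
                                         O_WRONLY, 0) != 0) {
        posix_spawn_file_actions_destroy(&fa);
        return -1;
    }
    int rc = posix_spawnp(&pid, argv[0], &fa, NULL, argv, environ);
    posix_spawn_file_actions_destroy(&fa);
    if (rc != 0)
        return -1;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

Because the redirection is applied only to the child, the parent's own stdout stays free for the single path line the subcommand promises.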

Extends the inspect renderer with a runtime: section below the layer table that surfaces the launch contract elfuse oci run will honor in a later Phase 3 commit. The block lists User, WorkingDir, Entrypoint, Cmd, and Env from the image-config blob referenced by the manifest's config descriptor, giving an operator a single view of how a pulled image expects to be invoked. Reads the config blob with the existing read_blob_file helper and the oci_image_config_parse parser, both already present in Phase 1. The read is best-effort: a missing or malformed config blob leaves the block out silently instead of failing the whole inspect, since the manifest tree is the primary signal and the config digest is already named in the layer table. Both the direct-manifest path and the index drill path now pass blobs into render_manifest so they share the same rendering. Absent fields skip their bullet entirely so an image that only sets Cmd does not advertise five empty rows; explicit empty arrays render as [] so an operator can tell "field present but empty" apart from "field absent". Entrypoint and Cmd render as JSON-style arrays with backslash and double-quote escaping; Env folds onto continuation lines indented to the value column to keep grep-friendly VAR=value shape across multiple variables. Extends tests/test-oci-inspect.c with full image-config coverage in the direct-manifest case (all five runtime fields populated, with substring assertions for each rendered line) and adds a new empty-Env case that verifies the explicit-empty bullet and the absent-field omissions for User, WorkingDir, and Entrypoint.
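The array rendering rule (JSON-style arrays with backslash and double-quote escaping, explicit empty arrays as []) can be sketched like this. The function name render_json_array is hypothetical, and the ", " separator between elements is an assumption; the commit text does not state the exact spacing.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical renderer: writes items (a NULL-terminated vector, or
 * NULL for empty) into out as ["a", "b"]. Only '"' and '\\' are
 * escaped. Returns 0, or -1 when out is too small. */
static int render_json_array(const char *const *items, char *out, size_t cap)
{
    size_t k = 0;
/* Append one byte, always leaving room for the trailing NUL. */
#define PUT(c)              \
    do {                    \
        if (k + 1 >= cap)   \
            return -1;      \
        out[k++] = (c);     \
    } while (0)

    PUT('[');
    for (size_t i = 0; items != NULL && items[i] != NULL; i++) {
        if (i > 0) {
            PUT(',');
            PUT(' ');
        }
        PUT('"');
        for (const char *p = items[i]; *p != '\0'; p++) {
            if (*p == '"' || *p == '\\')
                PUT('\\');
            PUT(*p);
        }
        PUT('"');
    }
    PUT(']');
#undef PUT
    out[k] = '\0';
    return 0;
}
```

An absent field would simply not call the renderer at all, which is how "field absent" stays distinguishable from the explicit [] output.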

Introduces src/oci/runspec.{c,h}, a pure-data module that folds the image-config runtime block (User, WorkingDir, Entrypoint, Cmd, Env) together with elfuse oci run CLI overrides into a concrete launch bundle: guest cwd, argv, envp, and optional uid/gid credentials. No filesystem touches, no PATH search, no syscalls -- those concerns belong to follow-up Phase 3 commits. The split keeps the override matrix and the Env policy verifiable by a unit test that builds oci_image_runtime_t literals in C. Argv assembly walks the override matrix documented in the Phase 3 plan. --entrypoint clobbers both image Entrypoint and image Cmd ([override] ++ CLI args). Image Entrypoint plus image Cmd ride together when no CLI args were given; once any CLI positional appears, the image Cmd is dropped and the CLI args take its slot. The one hard-fail case is the all-empty path (image has neither Entrypoint nor Cmd and the CLI supplied no argv), which returns EINVAL with "image has no entrypoint or cmd; pass one on the CLI". Env merge starts from the image Env array, applies CLI -e overrides in order (KEY=VAL set-or-replace; bare KEY imports the matching host environ value when present, otherwise drops silently), auto-imports TERM from the host when the merged Env has no TERM, injects the Linux PAM-default PATH when no PATH key has landed, and forces container=elfuse so systemd-style sandbox detection works regardless of what the image declared. CLI overrides whose KEY starts with DYLD_ hard-fail with EINVAL because DYLD_* is a macOS-only loader contract with no guest meaning; image-provided DYLD_* entries pass through (aarch64 Linux ignores them, so the runtime cost of stripping exceeds the safety win). WorkingDir defaults to "/" when neither the image nor the CLI sets it; relative paths and any path containing a ".." segment hard-fail with EINVAL. Sysroot containment is enforced later by the path-resolve module and the syscall layer. User accepts numeric "UID" or "UID:GID". 
Symbolic users such as nginx fail with the deterministic Phase 4 pointer message ("NSS resolution not yet implemented"). UID-only inputs default GID to the same value to match the proc_set_ids triple-set call shape the Phase 3 plan describes. CLI --user takes precedence over image User; both routes share the same numeric parser but emit distinct diagnostics so the user can tell whether the bad value came from their flag or from the pulled image. Error reporting uses a thread-local 512-byte buffer for dynamic messages and static string literals for the fixed ones. The header documents the shared "*err valid until the next call from this thread" lifetime contract so the caller does not have to branch on which path failed. The new tests/test-oci-runspec.c covers every row of the argv override matrix (8), every step of the Env policy including TERM and PATH gates (11), every User parse outcome (6), and every WorkingDir validation case (5) -- 30 cases total, all green. Wires the binary into Makefile / mk/config.mk / mk/tests.mk under 'make check'.
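The argv override matrix can be written down as a small pure function. This is a sketch under stated assumptions, not the runspec module: build_argv and vec_len are hypothetical names, vectors are NULL-terminated, and the result borrows the caller's strings instead of copying them.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Length of a NULL-terminated vector; a NULL vector counts as empty. */
static size_t vec_len(const char *const *v)
{
    size_t n = 0;
    while (v != NULL && v[n] != NULL)
        n++;
    return n;
}

/* Hypothetical sketch of the override matrix:
 *   --entrypoint given -> [override] ++ CLI args
 *   CLI args given     -> image Entrypoint ++ CLI args
 *   otherwise          -> image Entrypoint ++ image Cmd
 * Returns a malloc'd NULL-terminated vector, or NULL with errno set
 * (EINVAL for the all-empty case). */
static const char **build_argv(const char *entrypoint_override,
                               const char *const *image_entrypoint,
                               const char *const *image_cmd,
                               const char *const *cli_args)
{
    size_t n_cli = vec_len(cli_args);
    const char *const *tail =
        (entrypoint_override != NULL || n_cli > 0) ? cli_args : image_cmd;
    size_t n_tail = vec_len(tail);
    size_t n_ep = entrypoint_override ? 0 : vec_len(image_entrypoint);
    size_t total = (entrypoint_override ? 1 : 0) + n_ep + n_tail;

    if (total == 0) {
        errno = EINVAL; /* image has no entrypoint or cmd */
        return NULL;
    }
    const char **out = malloc((total + 1) * sizeof(*out));
    if (out == NULL)
        return NULL;

    size_t k = 0;
    if (entrypoint_override != NULL)
        out[k++] = entrypoint_override;
    for (size_t i = 0; i < n_ep; i++)
        out[k++] = image_entrypoint[i];
    for (size_t i = 0; i < n_tail; i++)
        out[k++] = tail[i];
    out[k] = NULL;
    return out;
}
```

Keeping the function free of filesystem access mirrors the pure-data split described above: each row of the matrix becomes one literal-driven test case.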

Introduces src/oci/path-resolve.{c,h}, the pre-launch helper that takes a guest argv[0] (POSIX execvp semantics), the merged PATH from the runspec env, and the guest cwd, and returns both the host filesystem path elfuse should open() to load the binary and the guest-absolute path the guest itself thinks it is running. The split is necessary because the host opens a file inside the cloned rootfs while the guest reads /proc/self/exe and argv[0] expecting its own absolute view.

Containment policy: every candidate is fed to realpath(3) and the resolved path must land inside the sysroot. Escape symlinks (a layer mistake or a malicious image dropping /usr/bin/foo -> ../../../etc/passwd) are silently skipped so the PATH search continues past them to the next entry. This matches runc's escape-symlink handling and keeps the launch deterministic regardless of layer order. The containment uses realpath internally but the returned host_path stays as the symlink-as-found so the guest sees argv[0] under the name it was invoked with (the kernel handles symlink resolution at open time).

POSIX execvp semantics: argv0 containing '/' bypasses PATH (absolute argv0 mapped to <sysroot><argv0>; relative argv0 anchored to cwd_guest). Otherwise PATH is split on ':' and each entry is treated as a guest-absolute directory; empty entries fall back to cwd_guest per POSIX. Executability is decided by host stat(2) (which follows symlinks) against st_mode & 0111. PATH search records the first found-but-not-executable candidate and surfaces EACCES if no later entry succeeds, mirroring execvp's "first noexec wins" behaviour.

Diagnostics carry the guest argv[0] quoted. PATH search misses also quote a colon-separated list of directories that were actually probed (empty searched-dirs annotation when PATH was empty, no annotation at all for direct-mode argv0 with '/'). Escape symlinks and broken chains do NOT show up in the searched list: they are directories that contributed no host candidate, so an operator reading the error sees the dirs that were genuinely walked. The module owns a thread-local 1 KiB err buffer for dynamic messages.

The module deliberately does NOT reuse src/syscall/path.c's path_translate_at because that resolver is tied to the running guest's live sysroot/cwd plumbing while this resolver runs before the vCPU starts. Containment via realpath is the same idea but the input/output contracts differ.

The new tests/test-oci-path-resolve.c covers PATH-search hits and misses, internal symlink follow (host_path keeps the symlink-as-found), escape-symlink filter (skipped from search and from searched-dirs list), EACCES on noexec (both PATH and direct modes), ENOENT diagnostics with and without the searched-dirs suffix, relative argv0 anchored to cwd_guest, and empty-PATH handling -- 11 cases, all green. macOS expands /tmp -> /private/tmp via realpath, so the test scratch root is realpath'd at construction time to keep the assert-equal comparisons honest.

Wires the binary into Makefile / mk/config.mk / mk/tests.mk under 'make check'.
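The containment step after realpath(3) reduces to a prefix comparison that respects path-component boundaries. The sketch below shows one way to write it; path_is_contained is a hypothetical name, and both arguments are assumed to be canonical absolute paths already produced by realpath.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical check: true when resolved equals sysroot or lies beneath
 * it. Both inputs are assumed to be realpath(3) output, so no "..",
 * "." or symlink components remain. */
static bool path_is_contained(const char *resolved, const char *sysroot)
{
    size_t n = strlen(sysroot);

    /* Ignore trailing slashes on the sysroot, but keep a bare "/". */
    while (n > 1 && sysroot[n - 1] == '/')
        n--;
    if (n == 1 && sysroot[0] == '/')
        return resolved[0] == '/';
    if (strncmp(resolved, sysroot, n) != 0)
        return false;
    /* The match must end on a component boundary so that
     * /vol/runs/abc does not pass for sysroot /vol/runs/ab. */
    return resolved[n] == '\0' || resolved[n] == '/';
}
```

A PATH search built on this would skip any candidate for which the check fails and move on to the next directory, as the containment policy describes.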

Splits the post-CLI VM bring-up out of src/main.c into a new src/core/launch.{c,h} module so the Phase 3 oci run orchestrator (commit 5) can share one launch path with the legacy positional-ELF main. No functional change: the same guest_bootstrap_prepare -> sysroot casefold probe -> guest_bootstrap_create_vcpu -> GDB stub -> vcpu_run_loop -> teardown sequence runs in the same order against the same inputs.

launch_args_t carries everything elfuse_launch needs in one struct so the call shape is stable across future callers: elf_path, sysroot, guest_argv (NULL-terminated heap copy), envp (NULL -> host environ), gdb port and stop-on-entry, timeout, verbose, plus three forward-looking fields that Phase 3 commit 5 will start populating (has_creds + uid/gid for OCI User spoofing, cwd_guest for image WorkingDir, fork_child_fd / vfork_notify_fd for any future routing of the fork-child entry through one launch struct).

The proctitle rewriting call stays in main(). Its old position was between guest_bootstrap_prepare and guest_bootstrap_create_vcpu, neither of which read the original argv block, so moving it to right before the elfuse_launch call is a behavior-preserving move. The reason it must stay in the caller at all is the same as before: runtime_set_process_title needs the live argv pointer the kernel handed in, not the strdup'd shadow that guest_argv carries.

shim_blob.h follows the bring-up into launch.c. main.c no longer references shim_bin / shim_bin_len, and a single definition site keeps the linker honest. The HVF headers (Hypervisor/hv_vcpu) drop out of main.c with the rest of the VM types.

cleanup_main_resources shrinks: the guest_t and guest_initialized arguments are gone (elfuse_launch owns those), so main()'s remaining cleanup is just the host cwd restore, the --create-sysroot detach, the heap argv free, and the elf_path / sysroot_path free. Every pre-launch error path in main() now calls cleanup with the new (smaller) signature.

Regression gate (mandatory per Phase 3 plan):

- 'make check' captured before the refactor (Phase 3 commit 3 HEAD).
- Refactor applied, 'make check' re-run. Both summary blocks are byte-identical: 78 internal aarch64 tests + 81/84 busybox applets + every OCI suite (ref/digest/blob-store/manifest/fetch/store/pull/inspect/tar/decompress/meta/layer-apply/volume/clone/unpack/runspec/path-resolve) at the same pass counts.
- 'build/elfuse build/test-hello' smoke prints "hello" as expected.
- tests/test-matrix.sh elfuse-aarch64 was not run; the local worktree has no externals/test-fixtures checkout. The 'make check' proctitle low-stack regression + busybox applet suite cover the same proctitle / signal / dynamic-linking surface that the matrix would have exercised.

Closes the Phase 3 launch loop. The new src/oci/run.{c,h} module walks the orchestration the plan calls for:

1. oci_unpack into the APFS sysroot volume (idempotent; no-op if layers already extracted, hard fails if the image was never pulled)
2. resolve the volume root via oci_volume_ensure so clone-rootfs lands in the same sparsebundle as unpack
3. oci_clone_rootfs into <volume>/runs/<id>/ via clonefile(2)
4. read + parse the manifest, then the image config blob, off the local blob store
5. fold the image runtime block and the CLI overrides into one launch bundle via oci_runspec_build (Phase 3 C2)
6. mkdir -p the resolved WorkingDir under the cloned rootfs, best-effort chown to spec.uid:spec.gid (macOS rejects fchown for non-root callers spoofing arbitrary uids; sidecar metadata will record the intended owner once Phase 4 lands)
7. resolve argv[0] inside the cloned rootfs via oci_path_resolve (Phase 3 C3) so PATH search and sysroot containment happen before bring-up
8. swap argv[0] for the guest-absolute path so the guest's /proc/self/exe matches the name it was invoked under
9. save host cwd and chdir into <run_dir><spec.cwd> so the guest inherits its OCI WorkingDir
10. assemble launch_args_t and dispatch through elfuse_launch (Phase 3 C4); a process-global launch override hook lets the unit test substitute a capture-and-return-0 stub instead of spinning up a real HVF VM
11. restore host cwd, free intermediate state, remove the clone dir unless --keep is set; the cleanup runs on launch failure too so a failed run does not leave stale clones on the volume

oci_cli_run handles the user-facing CLI surface: --store / --volume / --entrypoint / -e KEY[=VAL] (repeatable) / -w / -u / --keep / --name (reserved; clone-rootfs has no deterministic-name slot today) / IMAGE / ARG-tail. Parsing follows the same shape as cmd_pull / cmd_clone; flag-walk until the first non-flag, IMAGE next, everything after is positional argv. The dispatcher in src/oci/cli.c gains the "run" case between "clone" and "prune".

Test coverage in tests/test-oci-run.c (6 cases, all green):

- cli: -h prints the run usage block, rc=0
- cli: missing IMAGE returns rc=2
- cli: unknown option returns rc=2
- cli: -e without a value returns rc=2
- run: --volume=/tmp (case-insensitive on default macOS APFS) fails fast inside oci_unpack -> oci_volume_ensure; the launch override never fires
- run: ref with no local pin reports an ENOENT-class failure; the launch override never fires

The test ships a process-local elfuse_launch stub that abort()s if called. Every case installs a hook via oci_run_set_launch_for_testing before invoking the orchestrator, so the stub is purely a linker satisfier and lets the test binary skip core/launch.o and the entire VM/syscall transitive chain. End-to-end launch coverage (actually running a guest from a hand-built fixture store) is the job of the Phase 3 commit 6 compat shell harness, which has the fixture builder and a real sparsebundle path.

The orchestrator owns a thread-local 2 KiB err buffer so dynamic diagnostics (quoted argv[0] + searched PATH list propagated up from path-resolve) can flow through *err to the CLI driver.
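The launch override hook in step 10 is an ordinary function-pointer seam. The sketch below shows the pattern only: launch_args_sketch_t, real_launch, run_set_launch_for_testing, and run_dispatch are hypothetical stand-ins, and the placeholder real_launch simply returns -1 where the real module would start the VM.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for launch_args_t. */
typedef struct {
    const char *elf_path;
    char *const *guest_argv;
} launch_args_sketch_t;

typedef int (*launch_fn)(const launch_args_sketch_t *);

/* Placeholder for the real VM bring-up; returns -1 in this sketch. */
static int real_launch(const launch_args_sketch_t *args)
{
    (void) args;
    return -1;
}

/* Process-global override; NULL means "use the real launch path". */
static launch_fn g_launch_override;

static void run_set_launch_for_testing(launch_fn fn)
{
    g_launch_override = fn;
}

/* The orchestrator's final step dispatches through the seam. */
static int run_dispatch(const launch_args_sketch_t *args)
{
    launch_fn fn = g_launch_override ? g_launch_override : real_launch;
    return fn(args);
}

/* Capture-and-return-0 stub of the kind a unit test would install. */
static const launch_args_sketch_t *g_captured;

static int capture_launch(const launch_args_sketch_t *args)
{
    g_captured = args;
    return 0;
}
```

A test installs capture_launch before calling the orchestrator and then inspects g_captured, so no hypervisor code needs to be linked into the test binary.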

Lands the Phase 3 commit-6 surface: a standalone OCI fixture builder tool, a shell harness that drives the new oci run subcommand end-to-end against a hand-built store, and the user-facing docs/usage.md section that documents the override matrix, env policy, and scope guardrails.

tests/lib/oci-fixture-builder.c is a self-contained CLI that takes a store root, a ref, image-config flags (--entrypoint, --cmd, --env, --workdir, --user), and one or more uncompressed-tar layer files, then hashes + writes the layer blobs, synthesizes the image-config JSON (with rootfs.diff_ids tying back to the uncompressed-layer digests per OCI spec), writes the config blob, builds + writes the manifest blob, and pins the ref. The tool is offline and reusable for any "shape an image from local files" workflow, not just the compat suite. cJSON does the JSON assembly so the output matches what the Phase 1 parser expects byte for byte.

tests/test-oci-compat.sh runs in three layers:

1. Default mode (always under make check):
   - CLI surface smokes: --help renders the run usage block (rc=0), missing IMAGE / unknown option / -e without value all return rc=2.
   - Fixture-builder integration: assemble a tiny one-layer fixture under a scratch tmpdir (the layer is a tar containing the project's existing test-hello aarch64 assembly stub), assert exit 0, assert the store now has 3 blobs (layer + config + manifest) and a ref pin file, then drive `elfuse oci inspect` against the fixture and assert the runtime block renders with the entrypoint / env / user lines we supplied.
2. OCI_COMPAT_TEST=1 (gated):
   - Reserves a slot for the alpine-shaped / busybox-shaped / two-layer-whiteout end-to-end fixtures from the Phase 3 plan, which need an hdiutil-backed sparsebundle (case-sensitive APFS) to exercise the actual elfuse oci run launch. The heavy harness lands in a follow-up patch alongside the Phase 4 work so this commit stays scoped to what default `make check` can verify.
3. OCI_FETCH_ONLINE=1 (gated):
   - Sibling slot for the docker.io/library/alpine:3 pull+run check. Skipped by default; the heavy compat matrix ships the harness.

mk/tests.mk gets two new targets: oci-fixture-builder (build the tool on its own) and test-oci-compat (run the shell harness, wired into make check). The harness depends on the ELFUSE_BIN, the builder, and the existing TEST_HELLO_DEP so make picks the right order.

docs/usage.md gains a "Running OCI Images" section before the compatibility model: a tabular options list, the full argv override matrix, the env merge policy (image base -> -e KEY=VAL -> -e KEY host import -> TERM auto-import -> Linux PAM default PATH -> container=elfuse forced injection, with the DYLD_* CLI-reject rule explicit), the User / WorkingDir guardrails (numeric only, no `..` segments), and the Phase 3 scope notes spelling out what is Phase 4 work and what is permanently out of scope.

Counts: tests/test-oci-compat.sh reports 10/10 passes in default mode (4 CLI smokes + 6 fixture-builder integration checks) plus 2 SKIP lines for the gated harnesses. Full make check still green across every OCI suite (no Phase 1 / Phase 2 / earlier Phase 3 regression).

The store root now carries a spec-compliant <root>/oci-layout marker so external tools (skopeo, umoci, crane) can consume the directory as oci:<root>. The write is atomic (tmp + link, EEXIST = happy path) and idempotent: a pre-existing marker is never rewritten, preserving any third-party version bump.
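The tmp + link publish can be sketched as below. The helper name write_layout_marker and the temporary-file naming are hypothetical; the marker body uses the imageLayoutVersion field defined by the OCI image-layout specification. Unlike rename(2), link(2) refuses to replace an existing file, which is what keeps a pre-existing marker untouched.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <sys/stat.h>
#include <unistd.h>

static const char LAYOUT_BODY[] = "{\"imageLayoutVersion\":\"1.0.0\"}\n";

/* Hypothetical helper: create <root>/oci-layout if it does not exist.
 * Returns 0 when the marker is present afterwards, -1 with errno set. */
static int write_layout_marker(const char *root)
{
    char final_path[4096], tmp[4096];
    int a = snprintf(final_path, sizeof final_path, "%s/oci-layout", root);
    int b = snprintf(tmp, sizeof tmp, "%s/.oci-layout.XXXXXX", root);
    if (a < 0 || b < 0 || (size_t) a >= sizeof final_path ||
        (size_t) b >= sizeof tmp) {
        errno = ENAMETOOLONG;
        return -1;
    }

    int fd = mkstemp(tmp); /* unique temp file inside the store root */
    if (fd < 0)
        return -1;

    size_t len = sizeof LAYOUT_BODY - 1;
    int rc = -1;
    if (write(fd, LAYOUT_BODY, len) == (ssize_t) len &&
        fchmod(fd, 0644) == 0 && fsync(fd) == 0) {
        /* link(2) fails with EEXIST instead of overwriting, so an
         * existing marker (possibly written by another tool) wins. */
        if (link(tmp, final_path) == 0 || errno == EEXIST)
            rc = 0;
    }
    int saved = errno;
    close(fd);
    unlink(tmp); /* the temp name is never left behind */
    errno = saved;
    return rc;
}
```

Calling the helper on every store open is safe under this scheme, since the second and later calls take the EEXIST branch and change nothing on disk.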

Adopt the OCI image-layout v1.0.0 index.json schema as the single source of truth for tag-to-digest pins, replacing the per-tag flat files under refs/<registry>/<repository>/<tag>. Each pin is one manifests[] descriptor keyed by org.opencontainers.image.ref.name; mediaType, digest, and size mirror the manifest blob on disk. Writers serialize the read-modify-write of index.json via flock(<root>/index.json.lock, LOCK_EX) and publish atomically through tmp + rename, so concurrent pulls of distinct tags both land. Readers parse the rename-atomic snapshot lock-free. Stop writing refs/ entirely. A pre-existing refs/ directory is left untouched so a downgrade still finds the legacy data; C2.3 will migrate older stores on open. Also expose oci_store_list_refs for downstream callers (Plan 1 root-set, Plan 4 oci status). test-oci-store gains schema validation, enumeration, and concurrent-writer coverage.

C2.2 stopped writing the legacy refs/ pin tree but left readers unable to see pins that pre-dated the index.json schema. C2.3 detects such a store on oci_store_open and rebuilds index.json from refs/ contents in place, keeping refs/ on disk so a downgrade to the pre-C2.2 binary still finds its data. Migration acquires flock(index.json.lock, LOCK_EX) before the read-modify-write so it cannot race a concurrent first put_ref; a re-stat under the lock bails out when another opener completed the migration first. Pins whose manifest blob is missing from blobs/ are skipped with a stderr warning rather than aborting the open, since a single dangling pin should not block recovery of the rest of the store. The walker handles arbitrarily deep repository paths (ghcr.io/owner/group/sub/img) and rejects malformed leaves (too shallow, dotfiles) with explicit log lines. Migration can be suppressed by setting ELFUSE_OCI_NO_MIGRATE in the environment, which leaves refs/ visible only to a downgraded binary and makes oci_store_get_ref return ENOENT until the env var is cleared on a later open. This is the documented escape hatch for downgrade tests and recovery workflows. Three new test cases cover the path: a two-pin fixture migrates and survives a reopen without re-running; ELFUSE_OCI_NO_MIGRATE keeps index.json absent and the legacy pin invisible; and a coexisting refs/ alongside an existing index.json leaves the index.json byte-for-byte untouched on reopen. All 18 store unit tests plus the wider OCI suites (blob-store, pull, inspect, run, compat) remain green.

The oci_unpack pipeline now writes .elfuse-origin.json into the staging directory before the final rename. The file records manifest_digest, config_digest, and the rootfs.diff_ids array parsed from the image config blob. This is the on-disk attribution Plan 1's root-set walker needs to map an unpacked sysroot back to every blob it depends on, so a future prune sweep does not delete layer blobs still in use. A new oci_origin_write helper in src/oci/origin-meta.c implements the atomic tmp+fsync+rename pattern, mirroring src/oci/layer-meta.c. The helper is failure-fatal at the unpack call site: a missing origin file would silently break GC, which is unrecoverable, so the staging directory is torn down and oci_unpack returns -1 on any write error. New unit suite tests/test-oci-origin.c covers single-diff, multi-diff order, empty diff arrays, rewrite-overwrites-atomically, NULL/empty guards, and NULL diff_ids fallback. Build wiring adds origin-meta.o to test-oci-unpack and test-oci-run link lists and registers test-oci-origin in make check.
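A minimal sketch of the tmp+fsync+rename pattern the helper is said to follow. The name atomic_write_file and the temporary-file suffix are hypothetical, and a short write is treated as a plain failure to keep the sketch small.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical helper: replace path with data so that a reader sees
 * either the previous file or the complete new one, never a partial
 * write. Returns 0, or -1 with errno set. */
static int atomic_write_file(const char *path, const char *data, size_t len)
{
    char tmp[4096];
    int n = snprintf(tmp, sizeof tmp, "%s.tmp.XXXXXX", path);
    if (n < 0 || (size_t) n >= sizeof tmp) {
        errno = ENAMETOOLONG;
        return -1;
    }

    int fd = mkstemp(tmp); /* same directory, so rename stays atomic */
    if (fd < 0)
        return -1;

    /* Flush the bytes to stable storage before publishing the name. */
    int ok = write(fd, data, len) == (ssize_t) len && fsync(fd) == 0;
    int saved = errno;
    close(fd);

    if (ok && rename(tmp, path) == 0)
        return 0;
    if (ok)
        saved = errno; /* rename itself failed */
    unlink(tmp);
    errno = saved;
    return -1;
}
```

A caller that treats the write as failure-fatal, as the unpack call site does, would tear down its staging directory whenever this returns -1.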

oci_store_collect_roots accumulates every blob digest still reachable from on-disk state into a sorted digest set: pins in index.json drive the manifest/config/layer walk, and unpacked image trees under <volume>/images/sha256-<hex>/ contribute their origin sidecar's manifest digest as another walk root. This is the mark phase that Plan 1's prune sweep needs in C1.3; it is a pure read with no mutation.

For each manifest digest the walker reads the blob and parses it. An image-manifest contributes its config descriptor plus every layer descriptor. An image-index contributes every sub-manifest descriptor and recurses into the ones whose blob is on disk so config + layers join the keep set; sub-manifests for un-fetched platforms still get their descriptor digest recorded so a sweep cannot delete the platform that did materialise.

Failure model is fail-fast on anything that would let prune later delete a reachable blob: a missing manifest blob, an unparseable manifest, a missing or malformed .elfuse-origin.json, or a missing image-config blob all return -1 with err populated. A missing <volume>/images/ tree is the fresh-store case and treated as zero contribution rather than an error. The opposite policy (soft skip on corrupt origin) would let one broken tree leak a sweep that deletes blobs it actually needs, which is unrecoverable; fail-fast is the safer side of the data-loss vs. ergonomics trade.

New module src/oci/digest-set.c/h: sorted strdup array with bsearch contains and lower-bound add. Working set is in the low hundreds, so O(n) insertion stays cheap and the API leaves room for a hash-backed implementation later if profiling proves the sweep hot.

src/oci/volume-list.c is split off from volume.c on purpose. The mount/provisioning path pulls in core/sysroot.o and the hdiutil chain; the read-only enumerator only needs opendir + lstat. Keeping them in the same translation unit would force every store-linking test to link the sysroot stack just to walk a directory.
The new oci_volume_list_unpacked stays in the same oci/volume.h namespace. src/oci/origin-meta.c gains oci_origin_read / oci_origin_free alongside the C1.1 writer; the layer-meta module is the precedent for read + write in the same translation unit. Reader validates that manifest_digest, config_digest, and layer_diffids are present and correctly typed; mistyped or missing fields surface as EINVAL so the garbage collector treats a malformed sidecar as a fatal root-set hole. Seven new cases in test-oci-store.c cover the empty store, a single pin (manifest + config + layer), two pins with one shared layer (dedup), an unpacked tree without any pin, the pin + unpacked combination, a corrupt origin sidecar (fail-fast), and a pin whose manifest blob has been unlinked from blobs/ (fail-fast). 25/25 store tests green; full OCI matrix (origin, unpack, pull, inspect, run, blob-store, compat) all still pass.

C1.3 wires the previously-stub elfuse oci prune command to a real mark-and-sweep collector. The mark phase reuses oci_store_collect_roots (C1.2) over pins in index.json and unpacked sysroots under --volume. The sweep walks blobs/sha256/ and blobs/sha512/, unlinking any blob whose digest is not in the keep set. Locking: prune runs under flock(index.json.lock, LOCK_EX) for the duration of mark + sweep so a concurrent put_ref cannot publish a new pin between collect_roots and sweep. Pull-side blob commit remains lock-free; a pull interrupted mid-blob is treated as a transient that the caller re-fetches on retry. CLI: --commit gates the unlink (default is dry-run with a "(dry-run; pass --commit to delete)" footer). --volume mirrors unpack/clone so the same volume root contributes its unpacked sysroots to the keep set. Output is two lines (reclaimable + kept) to stdout, ready for shell composition. Subdirectories under blobs/<algo>/ and files whose names are not valid lowercase hex of the right length are skipped without surfacing as errors, leaving foreign state (external tool metadata, hand-created directories) untouched. Tests: 8 new cases in test-oci-store.c covering dry-run vs commit, no-pins-no-volume, unpacked-tree as sole root, mark failure abort, idempotent re-run, decoy subdir + non-hex filename ignored, and NULL-arg EINVAL. compat shell gains 5 prune-smoke lines exercising the full CLI dispatch on the fixture-builder store.
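
As an illustration of the "valid lowercase hex of the right length" rule the sweep applies before it considers a file a blob, a check along these lines would do. The function name blob_name_is_digest is hypothetical; the 64/128 lengths are simply the hex widths of SHA-256 and SHA-512.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hex length expected under blobs/<algo>/; 0 for an unknown algorithm. */
static size_t digest_hex_len(const char *algo)
{
    if (strcmp(algo, "sha256") == 0)
        return 64;
    if (strcmp(algo, "sha512") == 0)
        return 128;
    return 0;
}

/* Returns 1 when name looks like a blob file for algo, 0 otherwise.
 * Anything that fails this test is foreign state and is left alone. */
int blob_name_is_digest(const char *algo, const char *name)
{
    assert(algo && name);
    size_t want = digest_hex_len(algo);
    if (want == 0 || strlen(name) != want)
        return 0;
    for (size_t i = 0; i < want; i++) {
        char c = name[i];
        int is_digit = (c >= '0' && c <= '9');
        int is_lower_hex = (c >= 'a' && c <= 'f');
        if (!is_digit && !is_lower_hex)
            return 0; /* uppercase hex is rejected on purpose */
    }
    return 1;
}
```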

oci_store_prune now classifies dangling blobs into a candidate list before unlinking and runs two optional filter passes that flip candidates from PRUNE to SKIP:

- --older-than DUR vetoes per-blob: a dangling blob whose mtime is younger than (now - DUR) survives. The grace window protects the blob committed by a half-completed pull whose put_ref has not landed yet.
- --keep-bytes SIZE enforces a global LRU budget over candidates that survived the older-than veto. Survivors are sorted by mtime ascending and walked newest-first; the newest blobs whose cumulative size fits SIZE are reclassified as SKIP, and the walk terminates at the first blob that does not fit so older candidates are always evicted ahead of newer ones even when an older blob could fit alone.

Order matters: filter older-than first so a transient blob in the grace window never enters the LRU computation. Both flags default to 0, which the store API documents as "no filter" so the C1.3 behaviour (every dangling blob is pruned) is preserved when the caller does not opt in.

SKIP candidates contribute to a new skipped_blobs / skipped_bytes pair on oci_store_prune_options_t so callers can render a three-way kept / pruned / skipped split.

CLI parsing accepts s/m/h/d/w suffixes for --older-than and K/M/G (optional B trailer) for --keep-bytes; the size grammar is KiB-based to match du and df. Negative inputs, ERANGE, and unrecognised suffixes are rejected with EINVAL and a stderr message that names the failing argument. The prune output prints a "skipped: N blobs (M bytes)" line only when at least one candidate was spared so unfiltered prunes stay quiet.

Tests:
- 6 new cases in test-oci-store.c covering the older-than veto, zero-disables, the keep-bytes LRU eviction order, zero-budget equivalence, both filters composed, and a dry-run smoke that asserts stats track without touching disk. New helpers set_blob_mtime (utimes wrapper) and stage_dated_dangling drive blob mtime deterministically.
- 4 new compat shell smokes covering --older-than 1d on a touch -t backdated blob, --keep-bytes 0 as the disable form, and the two parser-rejection paths (invalid duration, invalid byte size).
- Full OCI suite green: store 39/39, origin 6/6, unpack 1/1, pull 6/6, inspect 7/7, run 6/6, blob-store 14/14, compat 20/20.
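
The two filter passes can be sketched as below. This is an illustration under assumptions, not the store's code: candidate_t, verdict_t, and apply_filters_sketch are names invented for the example, and the real candidate record presumably also carries the blob path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>
#include <time.h>

typedef enum { VERDICT_PRUNE, VERDICT_SKIP } verdict_t;

/* Hypothetical candidate record for one dangling blob. */
typedef struct {
    time_t mtime;
    unsigned long long size;
    verdict_t verdict;
} candidate_t;

static int by_mtime_asc(const void *a, const void *b)
{
    const candidate_t *x = a, *y = b;
    return (x->mtime > y->mtime) - (x->mtime < y->mtime);
}

/* older_than_secs == 0 and keep_bytes == 0 each mean "no filter".
 * Note: the array is re-sorted by mtime when keep_bytes is active. */
void apply_filters_sketch(candidate_t *c, size_t n, time_t now,
                          time_t older_than_secs,
                          unsigned long long keep_bytes)
{
    assert(c || n == 0);

    /* Pass 1: grace window. Blobs younger than (now - DUR) survive. */
    if (older_than_secs > 0) {
        for (size_t i = 0; i < n; i++)
            if (c[i].mtime > now - older_than_secs)
                c[i].verdict = VERDICT_SKIP;
    }

    /* Pass 2: LRU budget over what is still marked PRUNE. */
    if (keep_bytes > 0 && n > 0) {
        qsort(c, n, sizeof *c, by_mtime_asc);
        unsigned long long used = 0;
        for (size_t i = n; i-- > 0;) { /* newest first */
            if (c[i].verdict != VERDICT_PRUNE)
                continue; /* already spared by the grace window */
            if (used + c[i].size > keep_bytes)
                break; /* first misfit ends the walk */
            used += c[i].size;
            c[i].verdict = VERDICT_SKIP;
        }
    }
}
```

The early break is what guarantees that an older candidate is never kept ahead of a newer one, even when the older blob would fit in the remaining budget on its own.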

Replaces the C3.2 cumulative-by-diff_id snapshot scheme with the Plan 3 C3.3c-ii two-tier cache. The per-layer cache at <store>/layers/sha256/<diff_id>/ now holds the raw tar payload of one layer (whiteout markers preserved as 0-byte files via oci_layer_apply_raw_tar) plus a per-layer .elfuse-meta.layer.json sidecar. A new <store>/layers/stacks/sha256/<chain_hex>/ cache holds the assembled cumulative stage_dir state through some prefix of an image's layer list, keyed by the OCI image-spec ChainID, plus the cumulative .elfuse-meta.json sidecar.

oci_unpack now:

1. Computes ChainID(L0..Lk) up front for every layer.
2. Searches the stack cache from the longest prefix down. On hit, rm stage_dir + clonefile(stack_dir, stage_dir) and re-load cum_meta from the snapshot's .elfuse-meta.json so trailing layers accumulate on top.
3. For each layer the prefix did not cover, raw cache hit skips populate; raw cache miss stages a raw_dir, drives oci_unpack_layer_raw, writes the per-layer sidecar, then oci_store_layer_commit publishes it.
4. oci_unpack_assemble_layer runs a two-pass overlay walker against the raw cache entry. Pass 1 honours .wh.<name> (rm-r the named entry) and .wh..wh..opq (clear the parent dir contents) against stage_dir; pass 2 clonefiles every non-whiteout entry on top. Both passes skip the .elfuse-meta.layer.json sidecar.
5. After each layer, the orchestrator writes the cumulative .elfuse-meta.json into stage_dir and snapshots it into layers/stacks/sha256/<chain>/ via clonefile + oci_store_stack_commit so any future unpack sharing this prefix short-circuits.

oci_unpack_layer reverts to the C3.1 seven-arg shape and is now a pure stage_dir overlay-extract primitive. The cache plumbing that previously lived inside the helper moves entirely to the orchestrator. A sibling oci_unpack_layer_raw drives the raw-tar applier so the orchestrator does not duplicate the reverify + decompress + applier scaffold for the populate path.
oci_unpack_assemble_layer is exposed in unpack.h so multi-layer test fixtures can drive the assembly without a hdiutil-backed volume.

layer-meta.{c,h} gains oci_meta_read_named and _write_named so callers can pick the on-disk filename. The existing oci_meta_read / _write become thin wrappers passing ".elfuse-meta.json" so cumulative call sites are unchanged. Raw cache writers pass ".elfuse-meta.layer.json"; cumulative stack writers keep the default. Filenames must be relative basenames (no embedded '/').

Hardlink relationships from the tar are not reconstructed across the assembly step. Each clonefile produces an independent inode; APFS copy-on-write keeps disk usage flat regardless. This is a known limitation documented in oci_unpack_assemble_layer's doc comment.

Stack snapshot commit failure is fatal: silently degrading the cache would defeat the dedup path C3.3c-ii exists to enable. EXDEV during clonefile (cache and stage on different APFS volumes) is also a hard failure, matching the C3.2 policy.

test-oci-unpack: deletes the three C3.2 cumulative cache cases (cache_miss_populates_layers_subtree, cache_hit_skips_re_extract, cache_hit_merges_meta) whose semantics no longer exist. The three C3.1 helper cases stay, with their signatures updated to the seven-arg oci_unpack_layer shape. Nine new cases cover the C3.3c-ii surface:

- unpack_layer_raw_single_file_populates_raw_dir
- unpack_layer_raw_preserves_whiteout_as_file
- two_layer_overlay_assembly_no_whiteout
- two_layer_whiteout_removes_lower
- two_layer_opaque_clears_dir
- cross_image_raw_cache_dedup
- cross_image_stack_prefix_dedup
- same_image_full_stack_short_circuits
- meta_sidecar_split_round_trip

Tests drive building-block APIs directly because oci_unpack's hdiutil-backed volume gate is still wired to OCI_VOLUME_TEST=1; the multi-layer flows compose oci_unpack_layer_raw + oci_unpack_assemble_layer + oci_store_{layer,stack}_commit to verify cache state across image scenarios.
Test surface: test-oci-unpack 13/13, test-oci-store 53/53, test-oci-origin 6/6, test-oci-pull 6/6, test-oci-inspect 7/7, test-oci-run 6/6, test-oci-blob-store 14/14, test-oci-layer-apply 12/12, test-oci-tar 19/19, test-oci-decompress 5/5, test-oci-digest 33/33, test-oci-compat.sh 20/20. make elfuse links + signs.
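
For readers unfamiliar with the whiteout naming the two-pass walker keys on, a per-entry classifier might look like the sketch below. The enum and the function name classify_raw_entry are hypothetical; only the marker names (.wh.<name>, .wh..wh..opq) and the sidecar filename come from the description above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef enum {
    ENTRY_REGULAR,  /* pass 2: clone on top of stage_dir */
    ENTRY_WHITEOUT, /* pass 1: remove *target from stage_dir */
    ENTRY_OPAQUE,   /* pass 1: clear the parent directory contents */
    ENTRY_SIDECAR   /* skipped by both passes */
} entry_kind_t;

/* Classifies one basename from a raw cache entry. For ENTRY_WHITEOUT,
 * *target points at the name being deleted; otherwise it is NULL. */
entry_kind_t classify_raw_entry(const char *basename, const char **target)
{
    assert(basename && target);
    *target = NULL;
    if (strcmp(basename, ".elfuse-meta.layer.json") == 0)
        return ENTRY_SIDECAR;
    /* Test the opaque marker first: it also starts with ".wh.". */
    if (strcmp(basename, ".wh..wh..opq") == 0)
        return ENTRY_OPAQUE;
    if (strncmp(basename, ".wh.", 4) == 0 && basename[4] != '\0') {
        *target = basename + 4;
        return ENTRY_WHITEOUT;
    }
    return ENTRY_REGULAR;
}
```

Running all deletions in pass 1 before any clone in pass 2 keeps a layer's own new files from being removed by its own opaque marker.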

Plan 3 C3.4 surfaces a "layer reuse:" section in oci inspect output and exposes the helper Plan 4 oci status will reuse for store-wide aggregate stats. The new oci_dedup_metrics_compute walker in src/oci/dedup-metrics.c takes a target manifest digest plus an optional volume_root and reports five fields: total_layers (target's rootfs.diff_ids count), shared_layers (|target ^ others|), shared_bytes (raw cache st_size sum over the intersection), compared_images (other images deduped by manifest digest), and deepest_shared_prefix (longest k where ChainID(target[0..k-1]) is also reached by some other image's chain). Two dedup axes (per-diff_id and per-ChainID prefix) match how the C3.3c-ii orchestrator dedupes: raw cache shares any same-diff_id payload regardless of position, while the stack cache only short-circuits a leading prefix. Walker sources: - Pins from index.json. Image-index pins resolve to their linux/arm64 sub-manifest before contributing diff_ids; pins that fail to resolve or whose image-config is missing/unparseable are skipped silently (dedup is informational, not a GC keep-set). - Unpacked sysroots under volume_root/images/sha256-<hex>/. The origin sidecar already carries manifest_digest + diff_ids so no blob read is needed. Unpacked trees whose manifest digest matches a pin already counted are dropped via the compared_manifests set so the same image never inflates compared_images. Target failures are surfaced as -1 with errno + *err; the inspect render catches that and prints "layer reuse: (image-config unavailable)" so the surrounding manifest tree still renders. compared_images == 0 (only the target itself in the store) prints "(no other images to compare)". oci_inspect_options_t gains volume_root and suppress_layer_reuse; the new section renders after the layer table in both the direct-manifest and the indexed-drill paths, but is skipped under --all-platforms (no manifest is picked). 
The CLI plumbs --volume DIR into inspect; help text documents the new flag. Failure model: - Target manifest/config missing or unparseable -> degrade sentinel, rc unchanged. - Other-image manifest/config missing or unparseable -> silently skipped; compared_images counts only fully usable images. - Bytes formatting: >= 1 MiB renders as "~X.Y MiB on cache"; smaller non-zero values render in bytes; zero bytes are omitted so the line never implies a populated cache that does not exist. Reusable bits for Plan 4 (oci status): oci_dedup_metrics_compute is the per-image entry point. A store-wide aggregate is one iteration over oci_store_list_refs away. Tests: test-oci-dedup-metrics is new (8 cases covering single image, shared layers with byte accounting, disjoint diff_ids, same-manifest self-exclusion, reordered shared layer with no prefix, image-index pin arm64 resolution, missing other-config skip, and unpacked sysroot contribution via volume_root). test-oci-inspect grows from 7 to 13 (+6 integration cases verifying section render shape, zero-shared output, single-image sentinel, target config degrade sentinel, --all-platforms suppression, and --volume unpacked-tree counting).
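
The two dedup axes can be illustrated with a pair of small helpers. This sketch compares the target against a single other image and works on diff_id lists directly; the real walker aggregates over every pin and unpacked tree and compares ChainIDs. Because a ChainID is derived from the ordered diff_id prefix, equal leading diff_ids imply equal ChainIDs, which is what makes the simplification reasonable for an example. Both function names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Raw-cache axis: how many of target's diff_ids appear anywhere in
 * other, regardless of position. */
size_t shared_layer_count(const char *const *target, size_t nt,
                          const char *const *other, size_t no)
{
    assert((target || nt == 0) && (other || no == 0));
    size_t shared = 0;
    for (size_t i = 0; i < nt; i++) {
        for (size_t j = 0; j < no; j++) {
            if (strcmp(target[i], other[j]) == 0) {
                shared++;
                break;
            }
        }
    }
    return shared;
}

/* Stack-cache axis: longest k such that both lists agree on their
 * first k diff_ids, so ChainID(target[0..k-1]) is shared. */
size_t shared_prefix_len(const char *const *target, size_t nt,
                         const char *const *other, size_t no)
{
    assert((target || nt == 0) && (other || no == 0));
    size_t k = 0;
    while (k < nt && k < no && strcmp(target[k], other[k]) == 0)
        k++;
    return k;
}
```

A reordered image shares every layer on the first axis but nothing on the second, which is the "reordered shared layer with no prefix" case in the test list.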

Plan 3 C3.5 introduces elfuse oci rebuild-cache. The new subcommand walks every <volume>/images/sha256-<hex>/ unpacked sysroot, reads its .elfuse-origin.json sidecar to recover the original layer diff_id ordering, recomputes the terminating ChainID via oci_chainid_compute, and (when --commit is set) clonefiles the tree into a fresh stack cache entry at <store>/layers/stacks/sha256/<chain>/. Subsequent unpacks of any image sharing the same ordered layer list short-circuit through the C3.3c-ii orchestrator's stack-cache fast path instead of paying the full extract cost. This closes the migration gap left by C3.3c-ii. The stack cache only grows as a side effect of oci_unpack, so trees unpacked before C3.3c landed (or unpacked into a store while the C3.3b schema marker had just wiped v1 entries) leave no stack snapshot on disk even though their assembled stage_dir state is sitting under images/. Design decisions: - Only the terminating ChainID is back-filled per tree. Intermediate prefix entries cannot be reconstructed because an unpacked tree only captures the final overlay state; the per-layer raw cache at layers/sha256/<diff_id>/ similarly remains empty until a re-pull plus re-unpack of the source image repopulates it. - Detection: every tree with a non-empty diff_ids list is eligible; the stack_has probe filters out already-cached chains. Repeated invocations are idempotent. - CLI shape matches the flat dispatch the other subcommands use: elfuse oci rebuild-cache --store DIR --volume DIR [--commit]. Dry-run is default (the prune convention); --commit writes. - The unpacked tree's .elfuse-origin.json is stripped from the staged snapshot before stack_commit so the rebuilt entry matches the byte shape a fresh oci_unpack produces (origin_write runs AFTER the stack snapshot in the fresh-unpack path). 
- Per-tree failure is non-fatal: origin read errors split into no_origin (ENOENT) vs bad_origin counters; chainid_compute, clonefile, or stack_commit failure increments trees_failed and logs to stderr. The walk never aborts so a single corrupt tree cannot block back-filling the rest. Listing-level failure (such as failing to traverse images/) returns -1.
- No interaction with raw cache entries, blob storage, pin metadata, or the C3.3b schema marker; rebuild-cache only manipulates layers/stacks/. EEXIST during stack_commit is treated as benign (matches the C3.3c-i contract) so concurrent rebuild + unpack do not race destructively.

API surface lives in the new src/oci/rebuild-cache.{c,h}; the cli.c addition is just argument parsing + the human report. The internal rm_recursive helper is duplicated from src/oci/unpack.c (rather than lifting both copies to a shared util) so the small back-fill module does not have to pull in the full unpack / layer-apply / decompress graph for one staging-tree cleanup helper. Lift to a shared util when a third call site appears.

Tests: test-oci-rebuild-cache is new (9 cases: empty volume, single-tree commit with origin sidecar strip verification, dry-run reports without touching disk, already-cached skip via pre-seeded stack dir, missing origin sidecar skip, malformed origin JSON skip, empty diff_ids array skip, two-tree two-round idempotency, three-layer ChainID matches the on-disk path computed via an independent oci_chainid_compute walk). test-oci-compat.sh grows from 20 to 21 with a rebuild-cache --dry-run smoke against the existing fixture. make elfuse links + signs.

Plan 3 C3.3d extends oci_store_prune to garbage-collect entries in the two Plan 3 cache families alongside the Plan 1 blob sweep. The existing blob-only mark plus sweep stays byte-identical when layers/ and layers/stacks/ are empty; CLI output preserves the "reclaimable: N blobs" / "kept: M blobs" / "dry-run" lines verbatim so operator scripts and the existing compat smoke patterns keep matching. Mark walker: A new public oci_store_collect_layer_roots in src/oci/store.{h,c} runs parallel to oci_store_collect_roots and produces two sorted digest sets: one of diff_ids (keys for <root>/layers/<algo>/<hex>/) and one of every ChainID prefix (keys for <root>/layers/stacks/<algo>/<hex>/). The walker shares the pin and unpacked-sysroot sources with blob mark. For each pinned image-manifest it drills into the image-config blob and reads rootfs.diff_ids; for image-index pins it picks the linux/arm64 sub-manifest and recurses, treating "no linux/arm64 entry" and "sub-manifest blob not on disk" as soft no-contribution to match expand_manifest_digest's policy for the same shapes. Unpacked sysroots come from the origin sidecar's layer_diffids field so no blob read is needed for that source. Missing or unparseable manifest / image-config blobs are fatal mark failures so prune cannot delete a reachable cache entry on the false belief that nothing references it. ChainID expansion records every prefix chain, not just the terminating chain, because oci_unpack writes one stack snapshot per prefix during the apply loop (src/oci/unpack.c around line 1063); a walker that only tracked terminal chains would let prune delete a prefix entry an unpack actually committed. Three small statics in store.c (resolve_image_diff_ids, add_diff_ids_and_chains, dir_tree_size_sum) are pattern-duplicates of sibling helpers in src/oci/dedup-metrics.c. 
They were copied rather than lifted to follow the rebuild-cache.c::rm_recursive precedent (commit 4df17b1) of deferring a shared util module until a third caller appears. Sweep: The blob-specific classify_algo_dir + apply_verdicts machinery from Plan 1 stays in place but moves from "writes stats->kept_blobs / stats->pruned_blobs / ..." to "writes through caller-supplied output pointers". The change is local and additive at every call site. A new classify_tree_cache_dir sweeps both <root>/layers/<algo>/ and <root>/layers/stacks/<algo>/ via the same base_subpath parameter: it recognises sha256/sha512 directory entries whose name is a valid lowercase hex digest, looks the canonical "<algo>:<hex>" up in the family's keep set, and either bumps the kept counter or appends a candidate with the entry's recursive st_size sum and its st_mtime (set by rename(2) at commit time, so newer entries sort newer for the LRU budget). apply_verdicts gains a prune_family_t parameter selecting the removal primitive: BLOB calls unlink(2); TREE calls the existing layer_stage_rm recursive rm helper so a populated cache directory goes down in one call. ENOENT mid-removal stays a benign skip in both branches; other errno values are fatal. oci_store_prune now runs three back-to-back family pipelines (blobs, layers, stacks). Each family classifies, filters, and applies its own candidate list against its own keep set. The mark phase still runs once: collect_roots produces the blob set first, then collect_layer_roots produces the diff_id + chain_id sets, both under the same flock(index.json.lock, LOCK_EX) window so all three keep sets are derived from one snapshot of pins + unpacked sysroots. The keep-bytes budget applies per family (each family runs its own apply_filters pass) so a fat blob cannot crowd a layer eviction off a shared global budget. Lock model is unchanged: index.json.lock LOCK_EX covers the whole operation. 
Layer / stack cache writers (oci_unpack, oci_rebuild_cache) do not take this lock so they may publish new entries while prune is running; those entries are reachable from their image's pin or unpacked sysroot, both of which the mark phase captured, so their diff_id / chain_id is in the keep set even if the directory did not exist at sweep time. The narrow remaining window (layer extracted before put_ref lands) matches the C1.3 blob mid-pull semantic: the operator retries the pull. oci_store_prune_options_t gains ten new output fields (kept/pruned/skipped counts + pruned/skipped byte sums for layers and stacks). The struct grows additively so designated-init callers do not break. C3.3d does not bump the layers/.schema marker (still v2): the on-disk layout did not change, only the prune behaviour. CLI: src/oci/cli.c::cmd_prune renders new "layers:" and "stacks:" lines right after the existing blob "reclaimable: / reclaimed:" line, and new "kept: N layers" / "kept: N stacks" lines after "kept: N blobs". Both groups render only when their counter is non-zero so an empty-cache store still produces the legacy two-line output, and the compat smoke patterns that grep for "reclaimable: N blobs", "kept: N blobs", and "dry-run" all keep matching. No new CLI flags: layer + stack sweep is the default behaviour because an operator running prune wants the store cleaned in full. Tests: tests/test-oci-store.c gains 14 new cases (53 -> 67) covering: - oci_store_collect_layer_roots: empty store, single-layer pin (L0 ChainID identity), three-layer pin (every prefix chain present), unpacked-tree contribution, fatal failure on missing image-config blob. - layer sweep: dangling entry unlinked, entry kept via pin, entry kept via unpacked tree, dry-run does not touch disk, recursive st_size sum drives pruned_layer_bytes. - stack sweep: dangling entry unlinked, every prefix chain kept when its image is pinned. 
- filter integration: older-than veto skips fresh layer entry, keep-bytes budget evicts oldest layer first. A new stage_image_v2 helper writes a parseable image-config blob with caller-supplied rootfs.diff_ids so the mark walker can drill into rootfs.diff_ids. The existing stage_image helper upgrades from an opaque config payload to a minimal valid image-config (empty rootfs.diff_ids, opaque payload folded into an "author" annotation to keep per-test digest distinctness); every Plan 1 / C1.3 / C1.4 test stays green because layer mark walks an empty diff_id list and contributes nothing to the keep set. tests/test-oci-compat.sh gains a C3.3d smoke block (21 -> 23) that drops a dangling layer dir and a dangling stack dir into the store after the prune-smoke fixtures, runs prune --commit, and asserts both the new "layers: reclaimed" / "stacks: reclaimed" lines and the on-disk teardown. Test surface verified locally on oci-plan3-layer-snapshot: test-oci-store 67/67, test-oci-origin 6/6, test-oci-inspect 13/13, test-oci-unpack 13/13, test-oci-pull 6/6, test-oci-run 6/6, test-oci-blob-store 14/14, test-oci-layer-apply 12/12, test-oci-tar 19/19, test-oci-decompress 5/5, test-oci-digest 33/33, test-oci-dedup-metrics 8/8, test-oci-rebuild-cache 9/9, test-oci-compat 23/23. make elfuse links and signs.

New public API oci_status_compute(store, opts, out, err) plus oci_status_free in src/oci/status.{c,h}. The walker iterates pins via oci_store_list_refs, optionally walks unpacked sysroots under volume_root/images/, runs three disk sweeps (blobs/, layers/, layers/stacks/), and computes raw + stack cache populate ratios over the reachable diff_id / ChainID union sets. Per-pin / per-tree failures land in a status enum (missing-manifest, corrupt-manifest, corrupt-config, missing-origin, ...) so a single bad row does not hide the rest of the snapshot; fatal exits are reserved for store-open and sweep IO failures. CLI gains elfuse oci status --store DIR --volume DIR --json --no-disk-usage. Human render emits PINS / UNPACKED SYSROOTS / STORE TOTALS sections; --json emits a schemaVersion 1 document keyed for jq consumers. --no-disk-usage zeroes every byte total while keeping the entry counts and populate ratios intact, for operators running status on stores too large to walk recursively. Implementation duplicates slurp_blob / sum_tree_size / resolve_config_digest / load_diff_ids / accumulate_chain from dedup-metrics.c. This is the third caller of the diff_id walker pattern after dedup-metrics.c and store.c; the rebuild-cache rm recursive precedent is the model for deferring a shared src/oci/image-walk module until a fourth caller appears. Tests: new test-oci-status binary with 10 cases (empty store, single pin with size and mtime asserts, missing manifest sentinel, corrupt manifest sentinel with sibling-still-OK assertion, image-index pin drills to linux/arm64 sub-manifest, unpacked sysroot row with bytes, unpacked missing origin sentinel, skip_disk_usage zeroes byte fields, populate ratio with 5 reachable diff_ids and 3 cached entries, two images sharing one layer dedupe to 3 in the reachable union). 
tests/test-oci-compat.sh gains 3 status smoke assertions (human PINS + STORE TOTALS shape, --json schemaVersion 1 with pins / totals substrings, --no-disk-usage zeroes blob_bytes while disk_usage_skipped is true). make elfuse links + signs; full OCI unit suite (store, origin, inspect, unpack, pull, run, blob-store, layer-apply, tar, decompress, digest, dedup-metrics, rebuild-cache) stays green.

The default oci pull behaviour stays unchanged: every step re-runs and the manifest body is always re-fetched. The new --refresh flag wires an opt-in conditional GET path so a repeat pull of a tag whose pinned digest is still on disk emits If-None-Match: "<pinned-digest>" on the top-level manifest request. On 304 the cached manifest body is loaded from the blob store, the config and layer loops short-circuit via the existing oci_blob_store_has cache check, and the pin write is skipped. On 200 with a new digest the pipeline runs in full; the previous manifest blob stays on disk until prune sweeps it. Plumbing changes: - oci_fetch_manifest gains an if_none_match parameter and an etag field on the response. 304 is now treated as success (body NULL, body_len 0, http_status 304); other non-2xx statuses still raise EPROTO. - oci_pull_options_t gains a refresh bool. The new prologue in oci_pull reads the pin, stats the manifest blob, and forwards the quoted digest only when both are available; missing pin or missing blob falls through to a normal pull. - pull.c gains a small load_manifest_blob helper for the 304 path (lift to image-walk pending a fourth caller; the diff_id walker duplication watchlist still applies). - CLI gains a --refresh flag with a usage line; cli/cmd_pull forwards args.refresh into oci_pull_options_t. - The TLS mock server captures inbound If-None-Match into the request struct, and oci_mock_send_full grows an etag argument so handlers can emit ETag headers and respond 304 when the inbound digest matches. Existing callers receive NULL. Test surface: - test-oci-pull adds four cases: --refresh produces a single network request on an unchanged tag (no blob re-fetch); --refresh against a tag whose registry digest has flipped re-pulls in full and keeps the old manifest blob on disk for prune; --refresh against an empty store falls through with no If-None-Match sent; --refresh on a digest-only ref is a noop (no conditional header, no pin written). 
- test-oci-compat asserts oci pull --help advertises the new flag. - The full OCI regression matrix (fetch / pull / store / origin / inspect / unpack / run / blob-store / layer-apply / tar / decompress / digest / dedup-metrics / rebuild-cache / status / compat) stays green, and make check (aarch64 unit + busybox) reports 81 passed. Scope notes: the sub-manifest fetch after an index drill does not yet participate in conditional revalidation (no semantic anchor for a by-digest GET), so a tag whose top-level is an image index still issues one extra network round trip on 304. Layer / config blob short-circuits cover the heavy bytes either way. Future work can add a sub-manifest cache to elide that hop.
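
The conditional-GET contract can be sketched with two small helpers: one that formats the quoted-digest header, one that maps the HTTP status to an outcome. Both names are hypothetical, and rejecting a digest without an "algo:" prefix is an assumption of this sketch rather than something the fetcher is known to do.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Writes: If-None-Match: "<pinned-digest>" into out.
 * Returns 0 on success, -1 with errno set on bad input or overflow. */
int format_if_none_match(const char *pinned_digest, char *out, size_t cap)
{
    assert(pinned_digest && out);
    if (strchr(pinned_digest, ':') == NULL) { /* expect "algo:hex" */
        errno = EINVAL;
        return -1;
    }
    int n = snprintf(out, cap, "If-None-Match: \"%s\"", pinned_digest);
    if (n < 0 || (size_t)n >= cap) {
        errno = ENAMETOOLONG;
        return -1;
    }
    return 0;
}

/* Returns 1 for 304 (reuse the cached manifest blob), 0 for 2xx
 * (fresh body follows), -1 with errno = EPROTO for anything else. */
int classify_manifest_status(long http_status)
{
    if (http_status == 304)
        return 1;
    if (http_status >= 200 && http_status < 300)
        return 0;
    errno = EPROTO;
    return -1;
}
```

The entity tag is sent inside double quotes because that is the ETag syntax HTTP defines; the mock server has to compare against the quoted form for the same reason.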

Plan 6 C6.1 lands a podman/skopeo-style policy.json reader at src/oci/policy.{c,h} that fetch.c (C6.2) will consult before applying CLI overrides. The loader walks the candidate path chain ELFUSE_POLICY_FILE > $XDG_CONFIG_HOME/elfuse/policy.json (fallback $HOME/.config/elfuse/policy.json) > $HOME/Library/Application Support/elfuse/policy.json > built-in default.

The supported schema subset is {default{insecure, ca_bundle}, registries{<host>{insecure, ca_bundle, auth_file, sigstore{publicKey}}}}. The sigstore.publicKey field parses into oci_policy_effective_t.sigstore_public_key but is otherwise unused; it reserves the slot for a future Phase 4+ verify hook so operators can author the field today without churning the schema later.

Path expansion handles a leading "~/" or pure "~" by joining against $HOME; "~user/" forms pass through verbatim so the loader does not pull in getpwnam. ca_bundle existence is checked at load time so a fetcher consulting the policy never races a missing trust bundle mid-pull; auth_file existence and 0600 mode-checking land with the fetcher's credential reader in C6.2 because they share failure-mode ergonomics. Unknown JSON keys at every level are accepted and recorded so the C6.3 registries.d overlay and future schema extensions can roll out without coordinated reader changes.

Default oci pull behaviour stays byte-identical: no caller consumes oci_policy_t yet (C6.2 plumbs it through cli.c + fetch.c). The translation unit is added to SRCS to keep the warnings posture in sync with the rest of OCI.
Tests: new tests/test-oci-policy.c with 16 sub-tests: - path chain (empty, override wins, override miss is hard error, override empty string falls through, xdg fallback, home/.config fallback, library fallback) - full schema round trip (ghcr.io + 127.0.0.1:5000 + quay.io sigstore + unknown-host falls back to default) - "~/" path expansion - unknown top-level + entry keys tolerated - four invalid shapes hard-error with diagnostic - ca_bundle missing hard-error - ca_bundle:null inherits default, missing entry inherits default make elfuse links + signs; the loader is unused but compiles clean.
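
The tilde rule (expand "~" and "~/...", pass "~user/..." through verbatim) is small enough to sketch. The function name expand_tilde_sketch and the choice to fail with ENOENT when HOME is unset are assumptions of this example, not documented loader behaviour.

```c
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Expands a leading "~" or "~/" against $HOME into out.
 * "~user/..." and every other path are copied through unchanged.
 * Returns 0 on success, -1 with errno set on failure. */
int expand_tilde_sketch(const char *path, char *out, size_t cap)
{
    assert(path && out);
    const char *prefix = "";
    const char *rest = path;

    if (path[0] == '~' && (path[1] == '\0' || path[1] == '/')) {
        prefix = getenv("HOME");
        if (!prefix || prefix[0] == '\0') {
            errno = ENOENT; /* assumption: no HOME means no expansion */
            return -1;
        }
        rest = path + 1; /* keep the "/..." tail, drop the "~" */
    }
    if (strlen(prefix) + strlen(rest) + 1 > cap) {
        errno = ENAMETOOLONG;
        return -1;
    }
    snprintf(out, cap, "%s%s", prefix, rest);
    return 0;
}
```

Passing "~user/" through untouched keeps the helper free of getpwnam, at the cost of such a path later failing the existence check like any other missing file.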

C6.2 wires the C6.1 policy loader into the registry fetcher and into cmd_pull. Pull now reads a policy.json from the documented config-path chain (ELFUSE_POLICY_FILE / XDG / HOME / Library) and merges the per-registry insecure / ca_bundle / auth_file settings with the CLI flags. CLI wins: an explicit -u, --insecure, or --insecure-ca shadows the matching policy field for the same registry. The fetcher gains a const oci_policy_t* (caller-owned lifetime) and a new file-local effective_opts_t. Each manifest / blob / token request calls resolve_effective(f, ref, &eff) which performs an oci_policy_lookup on ref->registry, merges with the CLI defaults, and loads any policy auth_file via a new oci_policy_load_auth helper. The auth file must be {"username","password"} JSON with mode (st_mode & 077) == 0; group- or other-readable files fail with EPERM. The loopback gate around allow_insecure now reads the effective bit, so a policy insecure=true on a non-loopback host fails the same way a CLI --insecure on a non-loopback host does. cmd_pull loads the policy before constructing the fetcher and prints one stderr warning per CLI flag that overrides a non-default policy value, gated on a non-empty policy source path and silenced by --quiet. The Pull usage block documents the lookup chain. Default oci pull behaviour without a policy file stays byte-identical: oci_policy_load with no candidate file returns the built-in zero policy, resolve_effective treats every policy field as NULL/false, and the fetcher behaves exactly as it did before. Tests: - test-oci-policy +4 (oci_policy_load_auth: happy path, mode 0644 rejected, missing username rejected, malformed JSON rejected); 16 -> 20 sub-tests green. - test-oci-pull +3 (policy insecure=true for loopback host pulls without CLI --insecure; policy auth_file with mode 0644 aborts the pull with a mode diagnostic; CLI --insecure overrides policy insecure=false); 10 -> 13 tests green. 
- test-oci-compat.sh +1 (oci pull --help mentions the Policy lookup block). - test-oci-fetch 15/15 green after the apply_security_opts / check_insecure_policy refactor. - test-oci-store / origin / inspect / unpack / run / blob-store / layer-apply / tar / decompress / digest / dedup-metrics / rebuild-cache / status all green; full make check 0 FAIL. Not covered by automated tests: cmd_pull's warn output itself (stderr-from-binary integration would need a mock + elfuse launcher harness that does not exist today). The fetcher-side override behaviour is exercised by the three new test-oci-pull cases; the warn print remains a manual-verification surface for now.

Plan 6 C6.3. Each per-host JSON snippet under registries.d/ next to the base policy file field-merges into the matching entry (or grafts a new one if absent). The overlay path reuses the policy_entry_t shape minus the registries-wrapper; the filename minus its .json suffix is the target host. Files are processed in lexicographic order for determinism.

Implementation:

- policy_entry_t gains has_ca_bundle / has_auth_file / has_sigstore_public_key alongside the existing has_insecure so the field-level merge can distinguish "field declared" from "field omitted" for every overlayable slot. Lookup keeps reading the NULL pointer the way C6.1 wrote it; the flags are merge-only state.
- parse_entry_block is split: parse_entry_fields is the shared per-field walker, taking a src_path that is NULL for base-policy entries (host-scoped diagnostics) and the overlay file path for registries.d entries (file-scoped diagnostics). field_err picks the right format. parse_sigstore_block becomes parse_sigstore_fields along the same lines.
- load_overlay_dir scans <base-policy-parent>/registries.d/ for *.json. ENOENT on the directory is silent (overlay is optional); any other opendir errno (ENOTDIR, EACCES, ...) is a hard error so an operator pointing at an unreadable tree gets told. Non-regular candidates are skipped defensively; non-*.json filenames are ignored (README, .DS_Store, ...).
- parse_overlay_file slurps the file, requires an object root, walks the same fields as a base entry into a scratch policy_entry_t, then merge_overlay_into_entry transfers the declared slots into the target. ca_bundle stat-check runs against the overlay-declared path with a file-scoped diagnostic. unknown_keys append (no dedup).
- sigstore.publicKey now parses identically in base and overlay, and surfaces through oci_policy_lookup. fetch.c still does not consume it; the slot stays reserved for the Phase 4+ sigstore verify hook.

No public API change: oci_policy_load / _free / _lookup / _source / _load_auth signatures and oci_policy_effective_t shape are byte-identical to C6.2. fetch.c, cli.c, and pull.c are untouched.

Tests (tests/test-oci-policy.c, 20 -> 28 green):

- overlay_field_level_merge: base ca_bundle + overlay auth_file -> both present after lookup
- overlay_adds_new_host: overlay introduces a host the base policy never declared
- overlay_overrides_base_field: overlay insecure=true beats base insecure=false
- overlay_dir_missing_silent: no registries.d/ next to base policy is a successful load
- overlay_malformed_json_hard_error: bad overlay JSON propagates an error with overlay path + "JSON"
- overlay_ignores_non_json_files: README.md sibling does not derail the scan
- overlay_sigstore_public_key_surfaced: overlay sigstore.publicKey is readable via oci_policy_effective_t
- overlay_multiple_hosts: two overlay files for two hosts each surface independently

Default oci pull behaviour with no policy file and no registries.d/ is byte-identical to C6.2. test-oci-fetch (15/15), test-oci-pull (13/13), and test-oci-compat (28/28) stay green; make elfuse LD + SIGN clean.
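The field-level merge described for registries.d overlays can be illustrated with a minimal sketch. The struct and function names below (entry_sketch_t, merge_overlay_sketch) are hypothetical stand-ins, not the real policy_entry_t or merge_overlay_into_entry; only the has_* flag idea is taken from the commit message.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, trimmed-down stand-in for policy_entry_t: every
 * overlayable slot carries a has_* flag so "field declared" and
 * "field omitted" stay distinguishable during the merge. */
typedef struct {
    bool has_insecure;
    bool insecure;
    bool has_ca_bundle;
    const char *ca_bundle;
    bool has_auth_file;
    const char *auth_file;
} entry_sketch_t;

/* Transfer only the slots the overlay declared; anything the overlay
 * omitted keeps its base-policy value. Strings are borrowed here,
 * whereas a real implementation would likely own copies. */
static void merge_overlay_sketch(entry_sketch_t *dst, const entry_sketch_t *ov)
{
    if (ov->has_insecure) {
        dst->has_insecure = true;
        dst->insecure = ov->insecure;
    }
    if (ov->has_ca_bundle) {
        dst->has_ca_bundle = true;
        dst->ca_bundle = ov->ca_bundle;
    }
    if (ov->has_auth_file) {
        dst->has_auth_file = true;
        dst->auth_file = ov->auth_file;
    }
}
```

With a base entry that declares only ca_bundle and an overlay that declares only auth_file, both fields are present after the merge, which is the shape the overlay_field_level_merge test describes.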

Implements Plan 5 C5.1. oci_fetch_blob_batch dispatches a descriptor array through libcurl's multi interface so a pull's config and layer blobs flow over the network in parallel instead of one after another. The concurrency cap reads OCI_FETCH_MAX_CONCURRENT (default 4, clamped to [1, 16]); a single effective_opts_t resolves the CLI + policy merge once at batch entry and every easy handle borrows it; first-round 401 with a Bearer challenge triggers one serial token refresh and a retry round restarted with the refreshed token; any single-blob failure aborts every in-flight writer atomically before any commit lands so a partial pull never leaves a visible blob behind. oci_fetch_blob now forwards onto oci_fetch_blob_batch as a one-element wrapper. pull.c collapses its serial config + layers loop into a single batch call and still prints per-blob cached vs downloaded lines because the store-has lookup is captured before the batch hides the transfer. blob-store gains oci_blob_writer_begin_named, a writer entry that stages partials at tmp/blob-<hex prefix 16>-XXXXXX so the C5.2 resume sweep can find them by digest. The previous tmp/blob-<pid>-<seq> naming stays available via the unchanged oci_blob_writer_begin so existing callers keep working. The test mock now spawns one worker thread per accepted connection (detached) and exposes a per-connection response delay and an in-flight watermark. Without the worker change, parallel batch tests would see only one transfer at a time and the wall-clock speedup assertion would be vacuous. Five new fetch cases cover the new path: parallel wall-time beats serial by at least 1.5x (8 blobs, 150 ms mock delay), any blob failure aborts the whole batch with no tmp leak, duplicate digests fetch once, a single token refresh covers every first-round 401, and OCI_FETCH_MAX_CONCURRENT=2 caps the in-flight count.
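As an illustration of the concurrency cap, here is a minimal sketch of reading OCI_FETCH_MAX_CONCURRENT with a default of 4 and a clamp to [1, 16]. The function name is hypothetical, and the fallback to the default on unparsable input is an assumption; the commit message only states the default and the clamp range.

```c
#include <errno.h>
#include <stdlib.h>

enum { FETCH_DEFAULT = 4, FETCH_MIN = 1, FETCH_MAX = 16 };

/* Hypothetical helper: resolve the in-flight transfer cap from the
 * environment. Assumption: empty or non-numeric values fall back to
 * the default rather than failing the pull. */
static int fetch_max_concurrent_sketch(void)
{
    const char *s = getenv("OCI_FETCH_MAX_CONCURRENT");
    if (s == NULL || *s == '\0')
        return FETCH_DEFAULT;

    char *end = NULL;
    errno = 0;
    long v = strtol(s, &end, 10);
    if (errno != 0 || end == s || *end != '\0')
        return FETCH_DEFAULT;

    if (v < FETCH_MIN)
        return FETCH_MIN;
    if (v > FETCH_MAX)
        return FETCH_MAX;
    return (int)v;
}
```

Resolving the value once at batch entry, rather than per handle, matches the way the commit describes effective_opts_t being computed once and borrowed by every easy handle.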

Each oci_fetch_blob_batch entry sweeps tmp/ for partials older than seven days, then per-blob calls oci_blob_writer_resume_named, which scandirs tmp/ for blob-<hex16>-* matches, picks the largest survivor, reopens it O_RDWR, replays its bytes through the digester, and seeks to end-of-file. The caller sets CURLOPT_RANGE = bytes=<offset>- on the easy handle and seeds bctx.bytes_seen with the partial size so the streaming overflow gate measures total-blob progress. Servers that ignore the Range and reply 200, or reply 416 Range Not Satisfiable, trip a per-handle BH_NEEDS_RESTART state. The body callback peeks CURLINFO_RESPONSE_CODE on its first invocation while the request carried a Range header and aborts early when the status is not 206; a 416 with no body falls through to the score path's status-only restart trigger. The outer multi loop processes restarts with a new batch_reset_handle_fresh helper (the renamed retry path, now shared between token-refresh and Range-restart). After the reset resume_offset is zero, so a second 200/416 cannot pick the restart branch again and self-caps at one attempt per handle. resume_named pre-rejects partials whose size is >= the descriptor's declared size: a partial at or past expected size would otherwise tip the streaming overflow gate into a non-recoverable failure instead of a clean restart. Reopen / re-hash failures unlink the partial and fall back to oci_blob_writer_begin_named so default behaviour stays byte-identical to C5.1 when no partial is present. oci_blob_store_sweep_partials is a new public entry that unlinks blob-* files in tmp/ older than ttl_secs. The batch invokes it once per call with seven days. The wide blob-* prefix is safe because the blob store owns tmp/ exclusively. oci_mock_send_full gains a content_range parameter so handlers can issue 206 with the correct Content-Range header. 
The mock Range parser that landed in C5.1 is now consumed by a batch_range_mode_t flag the h_batch handler reads (honour / ignore / 416). Tests: 4 new sub-cases in test-oci-fetch (resume 206 happy path, server-ignores-Range restart, 416 restart, 7-day stale sweep) and 3 in test-oci-blob-store (resume reopens partial and commits, resume falls back to fresh writer when no partial, sweep TTL unlinks aged files only). make check 0 FAIL: 639 OK across all suites.
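The resume rules above reduce to three small decisions, sketched here as pure functions. All names are hypothetical; the real logic lives in oci_blob_writer_resume_named and the batch body callback. Treating a zero-byte partial as not worth resuming is an assumption.

```c
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

/* A partial is only worth resuming when it is strictly smaller than
 * the descriptor's declared size; at or past that size the streaming
 * overflow gate would fail hard instead of restarting cleanly.
 * Assumption: an empty partial is treated like no partial. */
static bool resume_usable_sketch(uint64_t partial_size, uint64_t declared_size)
{
    return partial_size > 0 && partial_size < declared_size;
}

/* Format the value handed to CURLOPT_RANGE: "<offset>-" prefixed with
 * "bytes=" as described in the commit message. Returns 0 on success. */
static int range_value_sketch(char *buf, size_t cap, uint64_t offset)
{
    int n = snprintf(buf, cap, "bytes=%llu-", (unsigned long long)offset);
    return (n > 0 && (size_t)n < cap) ? 0 : -1;
}

/* A restart is only possible while a Range was sent (resume_offset
 * non-zero). After the reset the offset is zero, so a second 200/416
 * cannot select this branch again. */
static bool needs_restart_sketch(uint64_t resume_offset, long http_status)
{
    if (resume_offset == 0)
        return false;
    return http_status == 200 || http_status == 416;
}
```

The self-capping behaviour falls out of the first check in needs_restart_sketch: the restart path zeroes the offset, so no explicit attempt counter is needed.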

batch_handle_t gains progress_cb + progress_user borrowed from the batch entry's parameters. batch_configure_easy wires CURLOPT_XFERINFOFUNCTION (CURLOPT_NOPROGRESS=0) only when the caller supplies a callback, so the C5.1 fast path (no progress) keeps zero xferinfo overhead. batch_xferinfo_cb forwards into the user callback with bytes_dl adjusted to (dlnow + resume_offset) so a resumed transfer's progress pairs correctly with desc->size as the total. dltotal is ignored because libcurl reports the remaining-bytes count when a Range header is in flight, while desc->size is the authoritative whole-blob total. batch_score_done fires one explicit final invocation at the BH_DONE_OK boundary so the renderer always sees a bytes_dl == bytes_total event regardless of libcurl's xferinfo pacing. pull.c grows a file-local renderer (pull_progress_t) that splits descriptors into cached (printed immediately, byte-identical to the C5.1 / C5.2 wording) and to-be-downloaded (rendered through the callback). TTY mode prints n placeholder lines and uses CSI nF + CSI 2K to redraw the zone in place on every xferinfo tick. Non-TTY mode defers per-blob output until the final bytes_dl == bytes_total event, preserving the line-per-completion log shape that scripted consumers grep against. isatty(fileno(progress)) is the detection gate; --quiet keeps fp == NULL and short-circuits all formatter output. The cached-vs-downloaded annotation stays byte-identical to the pre-C5.3 output. Tests: test-oci-fetch grows one case verifying the cb contract -- every committed blob produces at least one event with bytes_dl == bytes_total == desc->size, and every event's bytes_total matches the descriptor size. test-oci-pull grows one case running oci_pull with opts.progress = tmpfile() so the buffer is captured non-TTY; the assertion tallies one downloaded line per blob (config + N layers), two manifest lines, zero cached lines, and asserts the buffer is free of any CSI escape sequence. 
make check 0 FAIL: 641 OK across all suites.

Phase 4 F4.1 acceptance asks for two distinct properties from the APFS clonefile-based per-run rootfs: a write inside the clone must succeed without backing into the source, and a delete inside the clone must not unlink the corresponding source entry. The existing test_clone_cow covers only the mutate-existing path; this commit adds:

- test_clone_new_file_isolated: touch a brand-new file in the clone; source dir stays without that path (the literal "touch /hello" wording from issue sysprog21#31 Phase 4 acceptance 1)
- test_clone_unlink_preserves_src: delete an existing file in the clone; the same path in the source still resolves

Both reuse the existing mkdtemp / write_file / file_has scaffolding and inherit the ENOTSUP skip so non-APFS scratch volumes report SKIP rather than fail. No src/ change; F4.1 production code (the APFS clonefile call site in src/oci/clone-rootfs.c) was committed in f317c81 during Phase 2.

Phase 4 F4.2 (/etc/resolv.conf) and F4.3 (/etc/hosts + /etc/hostname) ask elfuse to synthesise host-truth files into the per-run rootfs so guest libc lookups (getaddrinfo, gethostname, /etc/hosts walks) see values matching the macOS host rather than the image's containerd defaults.

New src/oci/runtime-files.{c,h} exposes a single oci_runtime_files_inject(run_dir, err) entry point that creates <run_dir>/etc/ at mode 0755 if missing and writes three files, unlinking any pre-existing symlink first (image distros often ship /etc/resolv.conf as a symlink to /run/systemd/resolve/stub-resolv.conf that would otherwise dangle inside the guest):

- /etc/resolv.conf: "nameserver <ip>" lines extracted from scutil --dns stdout via a posix_spawn + pipe reader; falls back to 8.8.8.8 / 1.1.1.1 when scutil fails or reports zero configured resolvers
- /etc/hosts: fixed five-line block: 127.0.0.1 localhost, ::1 with the ip6-loopback aliases, the two link-local multicast names, and 127.0.0.1 host.elfuse.internal as the documented host-loopback hook. The image's own /etc/hosts is overwritten unconditionally; no merge.
- /etc/hostname: the literal string "elfuse\n" matching the container=elfuse env injection runspec already sets

src/oci/run.c gains a step-3.5 call to oci_runtime_files_inject between oci_clone_rootfs and the manifest parse; failures abort the run before launch with the inject diagnostic surfaced through *err and the clone-rootfs torn down by the existing cleanup epilogue.

Six unit tests in tests/test-oci-runtime-files.c cover the policy: fresh /etc creation, symlink overwrite, regular-file overwrite, literal hostname content, the required /etc/hosts entries, and /etc/resolv.conf containing a nameserver line regardless of whether scutil succeeded or the fallback fired.
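The resolver extraction can be sketched as a per-line filter. This assumes scutil --dns prints resolver addresses on lines shaped like "  nameserver[0] : 192.168.1.1"; the function name is hypothetical and the real reader in runtime-files.c may parse differently.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical per-line filter: if the line looks like
 * "  nameserver[N] : <addr>", copy <addr> into out and return 1.
 * Returns 0 for every other line (search domains, flags, ...). */
static int resolver_from_line_sketch(const char *line, char *out, size_t cap)
{
    const char *p = line;
    while (*p == ' ' || *p == '\t')
        p++;
    if (strncmp(p, "nameserver[", 11) != 0)
        return 0;

    /* The first ':' is the key/value separator; any later colons
     * belong to an IPv6 address and stay part of the value. */
    p = strchr(p, ':');
    if (p == NULL)
        return 0;
    p++;
    while (*p == ' ' || *p == '\t')
        p++;

    size_t n = strcspn(p, " \t\r\n");
    if (n == 0 || n >= cap)
        return 0;
    memcpy(out, p, n);
    out[n] = '\0';
    return 1;
}
```

A caller would emit one "nameserver <addr>" line per hit and write the 8.8.8.8 / 1.1.1.1 fallback when no line matched.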

Linux /dev/full reads return a NUL stream and writes always fail with ENOSPC. Container runtimes synthesise /dev/console from the controlling tty because the host /dev/console is reserved for kernel use. Neither node can come from an OCI layer (layer-apply rejects char device tar entries with ENOTSUP), so both are added to the procemu runtime intercept path. /dev/full opens host /dev/zero so reads naturally return zeros and lseek works, then tags the FD via proc_path so proc_intercept_write returns ENOSPC for any non-zero write while preserving the POSIX zero-length write succeeds rule. /dev/console maps to host /dev/tty, matching the runc/containerd controlling-tty redirect. Extend tests/test-proc.c with /dev/full read/write/writev/lseek cases and a best-effort /dev/console open case that tolerates non-tty CI environments.
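The /dev/full write rule is small enough to state as code. This is a sketch with a hypothetical name and a POSIX-style errno return; the real proc_intercept_write may report the error differently (for example as a negative Linux errno).

```c
#include <errno.h>
#include <stddef.h>
#include <sys/types.h>

/* Hypothetical model of a write to an FD tagged as /dev/full:
 * a zero-length write succeeds and returns 0; any non-zero write
 * fails with ENOSPC. */
static ssize_t dev_full_write_sketch(size_t count)
{
    if (count == 0)
        return 0;
    errno = ENOSPC;
    return -1;
}
```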

Adds six new synthetic /proc files for container-style detection and sysinfo introspection:

- /proc/self/cgroup: cgroup v2 "0::/" (not containerized)
- /proc/self/comm: basename of the loaded ELF + LF
- /proc/self/statm: seven page-count fields, source same as /proc/self/stat
- /proc/sys/kernel/ostype: literal "Linux"
- /proc/sys/kernel/osrelease: mirrors cached uname release
- /proc/sys/kernel/hostname: mirrors cached uname nodename

systemd-detect-virt, runc-internal, and podman read /proc/self/cgroup to decide whether they are running inside a container; the canonical v2 "0::/" form tells them elfuse is a plain host environment. The sysctl files keep procfs and uname(2) agreed on so init scripts that cross-check do not abort. Adds sys_uname_cached() so procemu can read the static uname struct without duplicating literal strings. Eight new procfs cases in tests/test-procfs.c cover the new files.

Follow-up not in scope: /proc/cpuinfo currently reports host _SC_NPROCESSORS_ONLN. Wiring it to a guest vCPU count would need a new guest_t accessor; defer until that API has a second caller.

issue sysprog21#31 Phase 4 F4.7. The image-config User field accepts seven shapes per OCI image-spec: empty, uid, uid:gid, name, name:group, uid:group, name:gid. The runspec resolver previously parsed only the two numeric shapes and rejected symbolic forms with a Phase 4 pointer; container detection tooling and most base images use nobody / www-data / postgres style strings, so a guest run falling through to host uid was the practical outcome.

oci_user_lookup() parses passwd-shaped and group-shaped tokens against the per-run clone-rootfs (rootfs/etc/passwd, rootfs/etc/group), preferring numeric interpretation when the token is all-digit (matching runc). A symbolic User with a rootfs missing /etc/passwd fails closed with EINVAL rather than silently degrading to root, so a misconfigured image surfaces at launch instead of at first guest decision.

The lookup helper lives in its own translation unit so the runspec module stays pure-data: oci_runspec_build only touches the filesystem when the caller passes a rootfs through flags->rootfs_for_nss. CLI --user is extended to accept symbolic forms through the same path.

Coverage: tests/test-oci-user.c (12 cases) drives the parser and filesystem path against scratch rootfses; tests/test-oci-runspec.c adds three runspec-seam cases, rewrites the symbolic-rejected case to assert the no-rootfs branch, and converts the legacy non-numeric --user case into a no-rootfs diagnostic check.
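To make the User shapes concrete, here is a minimal sketch of the token split and the all-digit rule. It only classifies tokens; the /etc/passwd and /etc/group lookup that oci_user_lookup() performs is out of scope. The type and function names are hypothetical, and rejecting an empty user or group token is an assumption.

```c
#include <ctype.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical parse result: which halves are present, and whether
 * each half is numeric (all digits) or symbolic. */
typedef struct {
    bool has_user, user_is_numeric;
    bool has_group, group_is_numeric;
    unsigned long uid, gid;   /* valid only when the half is numeric */
    char user[64], group[64]; /* raw tokens, NUL-terminated */
} user_spec_sketch_t;

static bool all_digits_sketch(const char *s)
{
    if (*s == '\0')
        return false;
    for (; *s != '\0'; s++)
        if (!isdigit((unsigned char)*s))
            return false;
    return true;
}

/* Split "user[:group]" and prefer the numeric reading for all-digit
 * tokens. Returns 0 on success (including the empty shape), -1 on a
 * malformed or oversized token. */
static int user_spec_parse_sketch(const char *in, user_spec_sketch_t *out)
{
    memset(out, 0, sizeof *out);
    if (in == NULL || *in == '\0')
        return 0; /* empty shape: keep the default user */

    const char *colon = strchr(in, ':');
    size_t ulen = colon ? (size_t)(colon - in) : strlen(in);
    if (ulen == 0 || ulen >= sizeof out->user)
        return -1;
    memcpy(out->user, in, ulen); /* buffer was zeroed, so terminated */
    out->has_user = true;
    if (all_digits_sketch(out->user)) {
        out->user_is_numeric = true;
        out->uid = strtoul(out->user, NULL, 10);
    }

    if (colon != NULL) {
        size_t glen = strlen(colon + 1);
        if (glen == 0 || glen >= sizeof out->group)
            return -1;
        memcpy(out->group, colon + 1, glen);
        out->has_group = true;
        if (all_digits_sketch(out->group)) {
            out->group_is_numeric = true;
            out->gid = strtoul(out->group, NULL, 10);
        }
    }
    return 0;
}
```

Symbolic halves would then be resolved against the rootfs passwd and group files, failing closed when those files are missing.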

Replace Phase 4 -> later pointers in docs/usage.md (Scope guardrails and User and WorkingDir) with the surface that actually landed in C4.1..C4.5: per-run writable rootfs via APFS clonefile, /etc/{resolv.conf,hosts,hostname} injection, /dev/{full,console} plus the existing null/zero/random/urandom/tty set, /proc/self/{cgroup,comm,statm} and /proc/sys/kernel/{ostype,osrelease,hostname}, and the seven-shape User resolver against rootfs /etc/passwd + /etc/group.

Add a Libc-adjacent compatibility section that fixes elfuse's position on the six host-fs-adjacent payloads the spec leaves to the image: nsswitch.conf (only files and dns backends work), NSS shared objects (no host dlopen of guest .so), tzdata (image carries; no format conversion), locale-archive (image carries; C fallback when absent), gconv-modules (image carries; iconv yields EILSEQ when absent), and ld.so.cache (dynamic linker handles its own). Includes a three-row symptom matrix covering getaddrinfo, date / TZ-dependent output, and locale-aware sort / printf.

Docker.io multi-arch tags such as alpine:3 pin the ref at the image index digest, not at the leaf manifest digest, because the index is the natural refresh anchor (a new arm64 manifest only changes the index entry, not the index digest expectation in client tooling). oci_pull already preserves this shape: pin -> index blob. oci_run previously fed the pinned blob straight into oci_manifest_parse and failed with "manifest parse failed: manifest config descriptor missing" because an index has "manifests"[] instead of "config" plus "layers".

oci_inspect already does the classify-then-walk pattern; oci_run now mirrors it through a new resolve_image_manifest() static helper. The helper:

1. parses the pinned digest string
2. loads the blob
3. tries oci_index_parse first
4. on index, picks linux/arm64 via oci_index_pick_linux_arm64, loads the sub-manifest blob, swaps the body
5. parses the final body with oci_manifest_parse

Step 4 in oci_run shrinks to a single call into the helper plus the caller-side cleanup that already existed. A test-only hook oci_run_resolve_image_manifest_for_testing exposes the helper so tests/test-oci-run.c can drive multi-arch fixtures without needing a case-sensitive APFS sysroot volume.

Three new cases cover the shapes:

- leaf-pinned: ref pinned at the manifest digest (fixture-builder path, tests/test-oci-compat.sh path); parses as a leaf without index drilling
- index-walked: ref pinned at a three-platform index whose arm64 leaf the helper must drill into; the helper returns the leaf-manifest body, not the index body
- index without arm64: helper rejects with ENOENT and an error message that mentions linux/arm64

End-to-end sanity: build/elfuse oci run alpine:3 /bin/busybox echo "hi from alpine" now prints "hi from alpine"; previously it failed at manifest parse before reaching unpack.

The placeholder skip block in tests/test-oci-compat.sh always said the alpine:3 online harness "lands in a follow-up patch". Land it now, on the back of the index-walk fix (76303c2): when OCI_FETCH_ONLINE is set, the suite pulls docker.io/library/alpine:3 into a scratch store under SCRATCH, then runs alpine:3 against /bin/busybox echo with a fixed sentinel string. Two assertions:

- oci pull alpine:3 succeeds (cycles the registry HTTPS client and the index-aware pin storage that 5b10f432 already records)
- oci run alpine:3 returns 0 and stdout matches the sentinel line verbatim; if oci_run regressed back to the pre-fix behavior the log substring "manifest config descriptor missing" surfaces a targeted bad message instead of the generic rc check

This is the regression anchor for the multi-arch index-walk path that 76303c2 introduced: anything that breaks the docker.io image-index unwrap surface trips this case the moment a developer flips OCI_FETCH_ONLINE=1. The scratch store keeps the test isolated from the user's default store; the default sparsebundle volume is reused for unpack since the on-volume image content is content-addressed and idempotent.

OCI_FETCH_ONLINE remains gated, so make check stays offline-only. Local verification: OCI_FETCH_ONLINE=1 bash tests/test-oci-compat.sh reports 30/30, including both new online cases.

Every container layer tar carries a root-directory entry encoded as "./". The DIR-type trailing-slash strip in src/oci/tar.c collapses it to ".", which oci_path_join_safe then explicitly rejected as "empty path" (EINVAL). Cold unpack of any real-world image - including busybox:latest as the smallest reproducer - died on the first tar entry before producing a single file on disk. Skip the entry in layer_apply_impl after the leading-slash strip: the unpack root is created by the assembler before this loop runs, so the root entry has no work to drive. Empty paths are skipped the same way for archives that record a zero-length root name. Wall 1 of the cold-unpack repair sweep. Walls 2 (EXDEV) and 3 (PAX) follow in the next two commits and become visible only after this patch clears the path-join error.
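The root-entry skip can be sketched as a predicate over the tar entry name. The function name is hypothetical; the real check sits inside layer_apply_impl after its own slash handling.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical predicate: true when a tar entry name denotes the
 * unpack root itself ("./", ".", "/", or an empty name) and so has
 * nothing to create, because the assembler already made the root. */
static bool is_root_entry_sketch(const char *name)
{
    while (*name == '/') /* leading-slash strip */
        name++;

    size_t len = strlen(name);
    while (len > 0 && name[len - 1] == '/') /* DIR trailing-slash strip */
        len--;

    if (len == 0)
        return true;
    return len == 1 && name[0] == '.';
}
```

Skipping these names before the path join keeps the "empty path" rejection in place for genuinely malformed entries while letting every real-world layer past its first record.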

The default elfuse layout puts the store on the root APFS volume and the stage on a hdiutil-mounted sparsebundle, so the three clonefile(2) call sites in oci/unpack.c returned EXDEV on every fresh unpack and had no fallback path. Cold unpack of busybox (and any other image) failed with "assemble: clonefile EXDEV (raw cache and stage must share an APFS volume)" before any layer file landed on disk. Switch all three sites (per-file raw assembly, stack restore, stack snapshot) to copyfile with COPYFILE_CLONE. The clone flag keeps the APFS COW path on same-volume copies and falls back to a real byte copy across volumes, so the default layout works without changing where the store or sparsebundle live. The dir-tree sites also pass COPYFILE_RECURSIVE and COPYFILE_NOFOLLOW so symlinks are preserved. Wall 2 of the cold-unpack repair sweep. Walls 1 (root tar entry) and 3 (PAX) cover the other two failure modes the cold path hits.

Real-world container layers (anything glibc-shaped with long pathnames or filenames over 100 bytes) emit POSIX.1-2001 PAX extended headers with typeflag 'x' to carry path / linkpath / size / mtime keys. The tar parser previously refused the typeflag outright with EPROTONOSUPPORT, so python:alpine and every other image that carries even one PAX-encoded path failed cold unpack with "tar PAX extensions not supported". Add consume_pax_record alongside the GNU 'L' / 'K' long-name path. Per-file 'x' records have their payload parsed for "<len> key=val\n" tuples; `path` and `linkpath` keys promote into the same pending_long_name and pending_long_link buffers the GNU path populates, so downstream code stays unaware of which long-name format produced the override. Other keys (size, mtime, atime, uid, gid, xattrs) are silently ignored - the unpack pipeline does not track them. Global 'g' records establish defaults for all subsequent entries. Container builders use 'g' for mtime / uid defaults that this project does not consume, so the implementation discards the payload bytes-correctly without parsing. tests/test-oci-tar.c retires test_pax_rejected (the old contract was rejection with EPROTONOSUPPORT) and gains two replacements: test_pax_extended_path verifies that a per-file 'x' record's path key latches onto the next entry, and test_pax_global_skipped verifies that a 'g' record is consumed silently without disturbing the entry that follows. Wall 3 of the cold-unpack repair sweep. With Walls 1 and 2 already landed, this completes the path from registry pull to a running guest binary for production-shape images.
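The "<len> key=val\n" record shape can be walked with a short loop. The sketch below is not consume_pax_record; it is a hypothetical standalone scanner that extracts one key from a PAX payload, relying on the POSIX rule that the length counts the whole record including its own digits and the trailing newline.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical scanner over a PAX extended-header payload.
 * Returns 1 and fills out when key is found, 0 when it is absent,
 * -1 when a record is malformed or the value does not fit. */
static int pax_find_sketch(const char *buf, size_t len, const char *key,
                           char *out, size_t cap)
{
    size_t pos = 0, klen = strlen(key);
    int found = 0;

    while (pos < len) {
        /* Decimal record length, terminated by a single space. */
        size_t reclen = 0, i = pos;
        while (i < len && buf[i] >= '0' && buf[i] <= '9') {
            reclen = reclen * 10 + (size_t)(buf[i] - '0');
            if (reclen > len)
                return -1;
            i++;
        }
        if (i == pos || i >= len || buf[i] != ' ')
            return -1;
        /* Minimum record: digits, space, '=', newline. */
        if (reclen < (i - pos) + 3 || reclen > len - pos)
            return -1;
        if (buf[pos + reclen - 1] != '\n')
            return -1;

        const char *kv = buf + i + 1;
        const char *end = buf + pos + reclen - 1; /* the newline */
        const char *eq = memchr(kv, '=', (size_t)(end - kv));
        if (eq == NULL)
            return -1;

        if ((size_t)(eq - kv) == klen && memcmp(kv, key, klen) == 0) {
            size_t vlen = (size_t)(end - eq - 1);
            if (vlen >= cap)
                return -1;
            memcpy(out, eq + 1, vlen);
            out[vlen] = '\0';
            found = 1; /* a later record for the same key wins */
        }
        pos += reclen;
    }
    return found;
}
```

Keys the caller never asks for, such as mtime or uid, are stepped over by length without being interpreted, which mirrors the "silently ignored" behaviour described above.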

OCI_COMPAT_TEST=1 was a SKIP slot from Phase 3 since "sparsebundle volume provisioning and the three Phase 3 plan fixtures land in a follow-up compat-matrix patch". This commit lands the first leg: a scratch case-sensitive APFS sparsebundle that the heavy block creates on demand and detaches in the EXIT trap, plus the first of the three fixtures - alpine-shaped, a single-layer image with /bin/busybox + /etc/os-release.

The scratch volume keeps the heavy E2E from polluting $HOME/Library/Application Support/elfuse/sysroots.sparsebundle on a developer laptop, which is the practical reason the gate existed in the first place. hdiutil create + attach goes through the same case-sensitive APFS path that oci_volume_ensure validates, so --volume points at the fresh mountpoint with no extra plumbing.

Fixture A drives busybox as both the entrypoint and the applet dispatcher: oci run ... echo "elfuse-alpine-shaped-ok" produces the canonical stdout line via the echo applet. busybox is the static aarch64-linux-musl binary at externals/test-fixtures/aarch64-musl/staticbin/bin/busybox; the fixture skips with a fetch-fixtures.sh pointer when missing so clean clones still pass make check.

Default mode (no OCI_COMPAT_TEST=1) keeps the original SKIP behavior, so $HOME never sees an hdiutil mount.

Validation:

- OCI_COMPAT_TEST=1 bash tests/test-oci-compat.sh: 31/31 (28 default + 3 heavy/A)
- default bash tests/test-oci-compat.sh: 28/28
- 25 OCI unit suites (test-oci-*): all green

Fixture B drives the apply_hardlink path that none of the mainstream registry images exercise at any meaningful scale (debian:bookworm-slim ships 2 hardlinks, python:3.12 ships 1, ruby:alpine ships 0 in its core layer). The layer tar carries /bin/busybox plus /bin/echo and /bin/cat as on-disk hardlinks; BSD tar detects the shared inode and emits two typeflag '1' records that layer-apply must turn back into real hardlinks on the unpacked tree.

A pre-flight check rejects the case where the build host's tar silently turns hardlinks into duplicates, so the fixture cannot quietly degrade into a busybox-only smoke if a later host swap changes that behavior. BSD tar tags hardlink rows with a leading 'h' in the mode column ("hrwxr-xr-x ... link to ..."), distinct from regular '-' and symlink 'l' rows.

The entrypoint is /bin/echo, which is the hardlink itself, so busybox's argv[0] applet dispatch picks the echo applet and the CLI tail becomes its argument verbatim. The expected stdout is the canonical "elfuse-busybox-shaped-ok" line. A regression where unpack drops the hardlink (or links to the wrong target) shows up as either a launch failure or as the busybox usage banner instead of the echoed line.

Validation:

- OCI_COMPAT_TEST=1 bash tests/test-oci-compat.sh: 34/34 (31 prior heavy/A baseline + 3 heavy/B)
- default bash tests/test-oci-compat.sh: 28/28 unchanged

Fixture C closes the third leg of the heavy compat matrix. Layer 1 stages /bin/busybox plus /bin/ls (hardlink) and a /data dir with keep.txt + remove.txt; layer 2 carries a single empty file at /data/.wh.remove.txt. After layer apply the unpacked rootfs must contain /data/keep.txt and nothing else under /data.

The OCI image-spec is explicit that the ".wh.<name>" marker must never appear in the final filesystem, so the test asserts on two surfaces: the runtime stdout shape (`/bin/ls /data` emits "keep.txt" and not "remove.txt") and the on-disk unpacked tree under HEAVY_MOUNT/images/sha256-<hex>/data (must have keep.txt, must not have remove.txt, must not have .wh.remove.txt). A regression that forwards the marker as a real file would slip past the runtime check on layered tooling but fails the disk-state check immediately.

This completes the Phase 3 follow-up the original SKIP comment named: sparsebundle volume provisioning + all three plan fixtures (alpine-shaped, busybox-shaped, two-layer-whiteout) now run end-to-end under OCI_COMPAT_TEST=1.

Validation:

- OCI_COMPAT_TEST=1 bash tests/test-oci-compat.sh: 37/37 (28 default + 9 heavy across the three fixtures)
- default bash tests/test-oci-compat.sh: 28/28 unchanged
- 25 OCI unit suites (test-oci-*): all green
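The whiteout rule Fixture C exercises can be sketched as a name check on each layer entry. The function name is hypothetical and this is not the project's layer-apply code; the ".wh." prefix and the ".wh..wh..opq" opaque marker come from the OCI image-spec.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: if the entry's basename is a plain whiteout
 * marker (".wh.<name>"), return a pointer to <name>, the sibling
 * that must be removed from the lower layers. Returns NULL for
 * ordinary entries and for the opaque-directory marker, which needs
 * separate handling. */
static const char *whiteout_target_sketch(const char *path)
{
    const char *base = strrchr(path, '/');
    base = base ? base + 1 : path;

    if (strncmp(base, ".wh.", 4) != 0 || base[4] == '\0')
        return NULL;
    if (strcmp(base, ".wh..wh..opq") == 0)
        return NULL;
    return base + 4;
}
```

For the fixture's layer 2, the entry data/.wh.remove.txt yields "remove.txt": the applier deletes data/remove.txt and never writes the marker itself, which is what the on-disk assertion checks.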

pull_progress_tty_redraw uses CSI cursor-up ("\033[NF") plus CSI clear-line ("\033[2K") to redraw N blob rows in place each time the curl xferinfo callback fires. Some terminal panes emulate a pty (so isatty reports true) but silently ignore the cursor-up sequence; the result is that every redraw cycle prints the same N rows below the previous ones, stacking hundreds of duplicate lines across a single pull, and the ignored clear-line lets shorter media-type strings bleed into the suffix of the prior longer one ("config.v1+jsontar+gzip" instead of "config.v1+json").

The fix is a one-line escape hatch: ELFUSE_OCI_PROGRESS=plain (also accepted: lines, off) forces is_tty=false even on a real TTY, sending the renderer down the line-per-completion path that already exists for non-TTY callers (test-oci-pull's test_pull_progress_non_tty covers it). Operators on a misbehaving terminal pane can export the env once and never see the stacking again; the default behavior on cooperative terminals is unchanged.

Validation:

- 25 OCI unit suites: all green (test-oci-pull 14/14 unchanged)
- bash tests/test-oci-compat.sh: 28/28
- OCI_COMPAT_TEST=1 bash tests/test-oci-compat.sh: 37/37
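A minimal sketch of the escape hatch, with hypothetical function names. It assumes the three accepted values are matched exactly and case-sensitively, which the commit message does not specify.

```c
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* True when ELFUSE_OCI_PROGRESS asks for the plain renderer.
 * Assumption: exact, case-sensitive match on the three values. */
static bool progress_forced_plain_sketch(void)
{
    const char *v = getenv("ELFUSE_OCI_PROGRESS");
    if (v == NULL)
        return false;
    return strcmp(v, "plain") == 0 || strcmp(v, "lines") == 0 ||
           strcmp(v, "off") == 0;
}

/* Combine the isatty() result with the override: in-place redraw is
 * used only on a TTY that has not been forced to plain mode. */
static bool progress_use_tty_sketch(bool stream_is_tty)
{
    return stream_is_tty && !progress_forced_plain_sketch();
}
```

A caller would pass isatty(fileno(progress)) != 0 as stream_is_tty, so non-TTY streams keep the line-per-completion output regardless of the variable.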

cubic-dev-ai

3 issues found across 131 files (changes from recent commits).

Prompt for AI agents (unresolved issues)


Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/oci/pull.c">

<violation number="1" location="src/oci/pull.c:253">
P2: Error-path leak: `sub_resp` may be allocated but not freed when sub-manifest fetch fails before `have_sub` is set.</violation>
</file>

<file name="src/oci/media-type.c">

<violation number="1" location="src/oci/media-type.c:100">
P2: Media type parsing is case-sensitive, but media type type/subtype tokens are case-insensitive; valid values with different casing will be misclassified as unknown.</violation>
</file>

<file name="src/oci/ref.c">

<violation number="1" location="src/oci/ref.c:83">
P2: Repository-path validation incorrectly rejects valid names with repeated dashes (for example `my--repo`).</violation>

<violation number="2" location="src/oci/ref.c:356">
P2: `docker.io` default-namespace detection is case-sensitive, so mixed-case hostnames can skip the required `library/` prefix.</violation>
</file>

<file name="src/oci/fetch.c">

<violation number="1" location="src/oci/fetch.c:782">
P2: Manifest fetch skips bearer-challenge parsing when a token is already cached, so 401 responses from expired/stale tokens are not retried with a refreshed token.</violation>

<violation number="2" location="src/oci/fetch.c:945">
P2: Blob fetch also disables challenge parsing when a token is cached, preventing 401-triggered token refresh and causing avoidable pull failures.</violation>
</file>

<file name="src/oci/blob-store.c">

<violation number="1" location="src/oci/blob-store.c:354">
P2: The commit path is not crash-durable because it never fsyncs the destination directory after linking the blob into place.</violation>
</file>

<file name="src/oci/store.c">

<violation number="1" location="src/oci/store.c:285">
P2: Fsync the pin directory after `rename` to make tag->digest updates crash-safe; file fsync alone does not persist the directory entry change.</violation>
</file>

<file name="src/oci/manifest.c">

<violation number="1" location="src/oci/manifest.c:295">
P2: `schemaVersion` parsing can accept fractional JSON numbers because `valueint` is used without an integer round-trip check.</violation>

<violation number="2" location="src/oci/manifest.c:385">
P2: Layer descriptor memory is leaked on post-parse validation failures because `nlayers` is incremented too late.</violation>

<violation number="3" location="src/oci/manifest.c:481">
P2: Index descriptor memory leaks when platform parsing fails because `nentries` is incremented after the fallible parse.</violation>
</file>

<file name="docs/usage.md">

<violation number="1" location="docs/usage.md:135">
P2: Contradictory documentation for `--user`. The options table describes it as 'numeric only', but the User and WorkingDir section immediately below describes detailed symbolic-name resolution (accepting symbolic `name`, `name:group`, reading /etc/passwd and /etc/group). These cannot both be correct.</violation>
</file>

<file name="src/oci/inspect.h">

<violation number="1" location="src/oci/inspect.h:57">
P3: The `suppress_layer_reuse` comment is inverted and documents the opposite runtime behavior, which can cause callers to pass the wrong value.</violation>
</file>

<file name="externals/zstd/VENDORING.md">

<violation number="1" location="externals/zstd/VENDORING.md:12">
P3: The file references 'oci-roadmap.md', which does not exist in the codebase. Remove the broken reference or update it to point to the actual document containing the policy commitment.</violation>
</file>

Note: This PR contains a large number of files. cubic only reviews up to 100 files per PR, so some files may not have been reviewed. cubic prioritizes the most important files to review.

cubic-dev-ai · 2026-05-23T04:37:26Z

+| `-e KEY=VAL`, `--env KEY=VAL` | Set or replace one env var (repeatable) |
+| `-e KEY`, `--env KEY` | Import `KEY` from the host environ (repeatable) |
+| `-w DIR`, `--workdir DIR` | Override image WorkingDir |
+| `-u UID[:GID]`, `--user UID[:GID]` | Override image User (numeric only) |


P2: Contradictory documentation for --user. The options table describes it as 'numeric only', but the User and WorkingDir section immediately below describes detailed symbolic-name resolution (accepting symbolic name, name:group, reading /etc/passwd and /etc/group). These cannot both be correct.


Suggested change

-| `-u UID[:GID]`, `--user UID[:GID]` | Override image User (numeric only) |
+| `-u UID[:GID]`, `--user UID[:GID]` | Override image User (supports numeric UID[:GID] or symbolic name[:group]) |
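The two readings of `--user` that this comment contrasts can be sketched in C. This is an illustration only, not elfuse's parser: `user_spec`, `parse_id`, and `parse_user_spec` are hypothetical names, and the sketch only classifies each half of the argument as numeric or symbolic. Resolving a symbolic name against the image's /etc/passwd and /etc/group is deliberately left out.

```c
#include <errno.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical result type for illustration; not taken from elfuse. */
typedef struct {
    bool uid_numeric;  /* user part is a plain decimal number */
    bool has_group;    /* a ":group" part was supplied */
    bool gid_numeric;  /* group part is a plain decimal number */
    unsigned long uid; /* valid only when uid_numeric */
    unsigned long gid; /* valid only when has_group && gid_numeric */
    char user[64];     /* raw user part (symbolic name or digits) */
    char group[64];    /* raw group part, empty when !has_group */
} user_spec;

/* Accept only strings made entirely of decimal digits. */
static bool parse_id(const char *s, unsigned long *out)
{
    if (*s < '0' || *s > '9')
        return false;
    char *end;
    errno = 0;
    unsigned long v = strtoul(s, &end, 10);
    if (errno != 0 || *end != '\0')
        return false;
    *out = v;
    return true;
}

/* Split "user[:group]" and classify each half.
 * Returns 0 on success, -1 when a half is empty or too long.
 */
static int parse_user_spec(const char *arg, user_spec *out)
{
    memset(out, 0, sizeof(*out));

    const char *colon = strchr(arg, ':');
    size_t ulen = colon ? (size_t) (colon - arg) : strlen(arg);
    if (ulen == 0 || ulen >= sizeof(out->user))
        return -1;
    memcpy(out->user, arg, ulen);
    out->user[ulen] = '\0';
    out->uid_numeric = parse_id(out->user, &out->uid);

    if (colon) {
        size_t glen = strlen(colon + 1);
        if (glen == 0 || glen >= sizeof(out->group))
            return -1;
        memcpy(out->group, colon + 1, glen + 1);
        out->has_group = true;
        out->gid_numeric = parse_id(out->group, &out->gid);
    }
    return 0;
}
```

Under this sketch, `1000:1000` yields two numeric halves, while `www-data` yields a symbolic user with no group. A table entry that says "numeric only" would correspond to rejecting any input where `uid_numeric` is false.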

cubic-dev-ai · 2026-05-23T04:37:27Z

+    /* When true (default), render a "layer reuse:" section after the
+     * manifest layer table. Setting this to false suppresses the section
+     * entirely (useful for tests that only want to verify the renderer
+     * baseline without dedup compute side-effects). The CLI never sets
+     * this to false.
+     */


P3: The suppress_layer_reuse comment is inverted and documents the opposite runtime behavior, which can cause callers to pass the wrong value.

Location: `src/oci/inspect.h`, line 57 (hunk `@@ -45,9 +46,21 @@`).

Suggested change

-    /* When true (default), render a "layer reuse:" section after the
-     * manifest layer table. Setting this to false suppresses the section
-     * entirely (useful for tests that only want to verify the renderer
-     * baseline without dedup compute side-effects). The CLI never sets
-     * this to false.
-     */
+    /* When false (default), render a "layer reuse:" section after the
+     * manifest layer table. Setting this to true suppresses the section
+     * entirely (useful for tests that only want to verify the renderer
+     * baseline without dedup compute side-effects). The CLI never sets
+     * this to true.
+     */
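The polarity the corrected comment describes can be checked with a small C sketch. It assumes the flag is a `bool` member of a zero-initialized options struct. `inspect_opts_sketch` and `should_render_layer_reuse` are hypothetical names, and only `volume_root` and `suppress_layer_reuse` appear in the reviewed header.

```c
#include <stdbool.h>

/* Hypothetical stand-in for the options struct in src/oci/inspect.h. */
typedef struct {
    const char *volume_root;
    /* false (the zero-initialized default): render "layer reuse:".
     * true: suppress the section entirely.
     */
    bool suppress_layer_reuse;
} inspect_opts_sketch;

/* The section is rendered exactly when the suppress flag is unset. */
static bool should_render_layer_reuse(const inspect_opts_sketch *opts)
{
    return !opts->suppress_layer_reuse;
}
```

With a "suppress"-style name, a caller that zero-initializes the struct gets the section by default, which matches the statement that the CLI never sets the flag to true.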

cubic-dev-ai · 2026-05-23T04:37:27Z

+
+## Why vendored, decode-only
+
+`oci-roadmap.md` Q9 commits the OCI work to hand-rolled C: no Go, no Rust,


P3: The file references 'oci-roadmap.md', which does not exist in the codebase. Remove the broken reference or update it to point to the actual document containing the policy commitment.

Location: `externals/zstd/VENDORING.md`, line 12 (hunk `@@ -0,0 +1,72 @@`).

jserv

Rebase onto the latest main branch and squash/rework the commits into fewer, cleaner ones.

Max042004 added 7 commits May 15, 2026 23:00

cubic-dev-ai Bot reviewed May 15, 2026


Max042004 added 22 commits May 20, 2026 22:02

Max042004 added 27 commits May 21, 2026 22:30

Max042004 changed the title ~~Add elfuse oci subcommand for pulling and inspecting images~~ Add OCI image support: pull, unpack, run, prune, status, policy May 23, 2026

cubic-dev-ai Bot reviewed May 23, 2026


jserv requested changes May 23, 2026


