feat(native): native macOS runner mode for trusted repos#91
Open
ephpm-claude[bot] wants to merge 7 commits into
…in, static concurrency
…filesystem write isolation
Run GHA jobs directly on the macOS host instead of per-job VMs, enabling 4+ concurrent jobs (vs Apple's 2-VM cap) with zero boot overhead. Configured per-repo under [runner.macos] with "org/repo" keys, "org/*" wildcards, and a separate nativeMacSem concurrency gate. The VM path is untouched.

Jobs never run as root: a hidden _ephemerd service user is created lazily (per-job ephemeral users were abandoned because macOS user deletion requires Full Disk Access and wedges opendirectoryd). Each job gets its own HOME/TMPDIR/work dir, keychain, Homebrew prefix, and a sandbox-exec profile denying localhost outbound and port binding.

Also fixes uncovered along the way:

- runner extraction is OS-suffixed (runners/<ver>-<goos>) so the macOS host and Linux VM no longer corrupt each other's runner on the shared data dir (Linux dispatch exit 127)
- isOfficialRunnerImage prefixes had a trailing dash that never matched the runner-ci-linux tag, breaking custom-image dispatch
- DEVELOPER_DIR resolved via xcode-select -p instead of hardcoded Xcode.app path (broke git on CLT-only hosts)
- macOS VM runner monitor logs pgrep results at debug level

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
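The per-repo opt-in described above ("org/repo" keys plus "org/*" wildcards) could be resolved roughly as follows. This is a minimal sketch, not the PR's code: nativeEnabled and the map[string]bool shape are hypothetical stand-ins for the parsed [runner.macos] table, and the rule that an exact key takes precedence over a wildcard is an assumption.

```go
package main

import (
	"fmt"
	"strings"
)

// nativeEnabled reports whether repo ("org/name") is opted in to native mode.
// keys is a hypothetical stand-in for the parsed [runner.macos] table.
func nativeEnabled(keys map[string]bool, repo string) bool {
	// Assumption: an exact "org/repo" key wins over an "org/*" wildcard.
	if on, ok := keys[repo]; ok {
		return on
	}
	org, _, found := strings.Cut(repo, "/")
	if !found {
		return false // not an "org/repo" name; never run natively
	}
	return keys[org+"/*"]
}

func main() {
	keys := map[string]bool{
		"acme/*":              true,
		"acme/untrusted-fork": false,
		"other/tool":          true,
	}
	for _, r := range []string{"acme/api", "acme/untrusted-fork", "other/tool", "other/misc"} {
		fmt.Printf("%s native=%v\n", r, nativeEnabled(keys, r))
	}
}
```

Defaulting to false for anything unlisted keeps unknown repos on the VM path, which matches the intent that native mode stays limited to trusted repos.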
Security follow-ups from review of the native runner. Native jobs run directly on the host with no VM boundary, so the sandbox profile and unix permissions are the entire isolation story: two concrete holes closed here, plus one documented as needing live-macOS work.

1. Sibling-job + daemon-state isolation. Every native job runs as the same _ephemerd uid and all workspaces live under <dataDir>/native/, so a job could read a concurrent job's checkout token or source. The profile now denies read AND write of the whole <dataDir>/native subtree and re-allows only the job's own dir (sandbox-exec applies the last matching rule). config.toml, ephemerd.sock, and the vm dir gain write denies to match their existing read denies.

2. .ssh write hole. .ssh was read-denied but writable, leaving an authorized_keys append vector on any host where the runner uid can reach the target home. Now denied for write too.

3. Dedicated primary group instead of staff (gid 20). staff is the default group for every normal macOS account, so the runner process inherited group access to the many staff-group-owned files on a typical Mac. The service user now gets a dedicated _ephemerd group. Provisioning is best-effort: any failure falls back to staff (the previously-tested behavior), so a group hiccup never blocks jobs.

Not done here (documented in a code comment as a follow-up): flipping the profile from allow-by-default to deny-by-default. That is the stronger posture for native execution but requires enumerating every path the GHA runner + toolchains touch and live-testing on macOS so jobs don't break; it can't be verified blind from a non-macOS host.

The LAN-egress gap (sandbox-exec has no CIDR support; pf rules still a follow-up) is unchanged and remains the reason native mode should stay restricted to trusted first-party repos.
The hardened sandbox blocked the GHA runner from starting. Three distinct macOS sandbox-exec behaviors, each found via local repro:

1. deny file-read* on the native subtree blocked file-read-metadata, which realpath() needs to traverse through native/ to the job dir. The .NET host died with "Failed to resolve full path of the current executable" (exit 133). Fixed: deny only file-read-data.

2. getcwd() and bash walk UP from the job's runner dir and must readdir(native/) to learn the job-id component name; the read-data deny on the native subtree blocked that, giving "getcwd: cannot access parent directories" and "run.sh: Operation not permitted" (exit 126). Fixed: allow file-read-data on the native dir node (literal), which leaks only the non-secret list of concurrent job ids.

3. macOS sandbox resolves a specific-operation deny (file-read-data) over a later wildcard allow (file-read*), so the per-job re-allow must name file-read-data explicitly to win. Added an explicit file-read-data re-allow on the job subtree alongside file-read*.

Job-to-job isolation is preserved: a sibling job's directory listing and file contents stay denied (verified). Smoke-test jobs now run end-to-end as _ephemerd with all steps green.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
Summary
- Configured per-repo under [runner.macos] with org/repo keys and org/* wildcards. Separate nativeMacSem gate; VM path untouched.
- Jobs run as the hidden _ephemerd service user (created lazily, like _www). Per-job ephemeral users were attempted and abandoned: macOS user deletion requires Full Disk Access and wedges opendirectoryd.
- user = "..." config overrides.
- sandbox-exec profile (deny localhost outbound + all port binding; CIDR rules are unsupported by sandbox-exec, so a pf firewall is a follow-up).

Bug fixes found along the way
- Runner extraction is OS-suffixed (runners/<ver>-<goos>): macOS host and Linux VM were corrupting each other's runner on the shared data dir, causing Linux dispatch exit 127.
- isOfficialRunnerImage prefixes had a trailing dash that never matched the runner-ci-linux tag, so custom-image Linux dispatch always exited 127.
- DEVELOPER_DIR resolved via xcode-select -p (hardcoded Xcode.app path broke git on CLT-only hosts).

Test plan
- Smoke-test jobs run end-to-end as _ephemerd: all steps green incl. checkout

🤖 Generated with Claude Code