fix(ci): repair rustup proxy AFTER cache restore on macOS release builds #254

Merged: githubrobbi merged 3 commits into main from fix/ci-rustup-proxy-post-cache-restore on May 15, 2026

@githubrobbi

What

Release pipeline #97 (v0.5.98) failed on the aarch64-apple-darwin matrix leg with:

```
error: unexpected argument 'build' found
Usage: rustup-init[EXE] [OPTIONS]
Stack backtrace: rustup_init::run_rustup_inner ...
```

The Linux leg passed for the first time since v0.5.96 — confirming PR #251 fixed the "Show binary sizes" regression and we are now hitting a different failure mode on macOS specifically.

Diagnosis

PR #245 added a proxy-repair line (`rustup default "$(rustup show active-toolchain | awk '{print $1}')"`) inside the Install pinned nightly step. That line works — the in-step `cargo --version` smoke check passes — but the very next step, `Swatinem/rust-cache`, restores `~/.cargo/bin/` from the cache. On `macos-latest` that cache holds a poisoned `cargo` symlink pointing at `rustup-init` (the installer) rather than the toolchain proxy. Restoring it overwrites the freshly-repaired proxy, and the next `cargo build` picks up the bad one and exits with the rustup-init backtrace before reaching any cargo subcommand.

Empirical proof from the failing job (`76219153458`):

| Time | Event |
| --- | --- |
| 16:36:34Z | `cargo --version` → `cargo 1.97.0-nightly` (proxy healthy) |
| 16:36:35Z | `Swatinem/rust-cache`: restoring `~/.cargo/bin/` … |
| 16:36:54Z | `cargo build` → `rustup_init::run_rustup_inner` backtrace (proxy poisoned) |

So PR #245's repair is too early. It happens before the cache restore that re-poisons the proxy.
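
The healthy/poisoned distinction in the timeline can be probed mechanically. The following is a minimal sketch, not part of this PR: the function name `proxy_is_healthy` is hypothetical, and it assumes that a working proxy answers `--version` with a line starting with `cargo ` (as in the 16:36:34Z entry), while a binary that resolves to the installer would report something else or fail.

```shell
# Sketch: classify a cargo binary as a healthy toolchain proxy or not.
# Assumption: a healthy proxy prints a first line beginning with "cargo ".
proxy_is_healthy() {
  local cargo_bin version_line
  # $1: path to the cargo binary to probe (defaults to whatever is on PATH)
  cargo_bin="${1:-cargo}"
  # Capture only the first line; a missing or crashing binary yields an empty string.
  version_line="$("$cargo_bin" --version 2>/dev/null | head -n 1)"
  case "$version_line" in
    "cargo "*) return 0 ;;  # looks like real cargo
    *)         return 1 ;;  # anything else is treated as poisoned
  esac
}
```

Calling `proxy_is_healthy "$HOME/.cargo/bin/cargo"` once before and once after the cache-restore step would be expected to reproduce the flip from healthy to poisoned shown in the table.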

Fix

Add a second proxy-repair step after the cache-restore step and before the build step:

```yaml
- name: Repair rustup proxy after cache restore (macOS guard)
  if: runner.os == 'macOS'
  shell: bash
  run: |
    rustup default "$(rustup show active-toolchain | awk '{print $1}')"
    cargo --version
```

Re-running `rustup default` rewrites the proxy binaries in `~/.cargo/bin` again, so the subsequent `cargo build` resolves to the active toolchain's real cargo even when the restored cache was poisoned. On the next successful run, `Swatinem/rust-cache` will save a clean cache and the poison clears itself.
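
For illustration, the same two commands can be wrapped with an explicit guard so that an empty toolchain name fails loudly instead of calling `rustup default ""`. This is a sketch under assumptions: the function name `repair_rustup_proxy` is hypothetical, and the guard is an addition that the merged workflow step does not contain.

```shell
# Sketch: the repair step's commands plus a guard for an empty toolchain name.
repair_rustup_proxy() {
  local toolchain
  # Keep the first whitespace-separated field of rustup's output (the toolchain name).
  toolchain="$(rustup show active-toolchain | awk '{print $1}')"
  if [ -z "$toolchain" ]; then
    echo "rustup reported no active toolchain; cannot repair proxies" >&2
    return 1
  fi
  # Re-select the active toolchain; per this PR, that rewrites the proxies in ~/.cargo/bin.
  rustup default "$toolchain" || return 1
  # Smoke check: fails the step if cargo still does not respond.
  cargo --version
}
```

The inline two-liner in the workflow is shorter and was what shipped; the guarded form only trades brevity for a clearer error message.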

Both `release.yml::build-release-binaries` and `release-cache-warm.yml::warm` get the same step (with matching commentary). Without the fix in the warm workflow, every macOS warm run would re-save the poisoned cache and perpetuate the regression.

Gated on `runner.os == 'macOS'` so the Linux and Windows legs — which do not exhibit this symptom — are unaffected.

Why not just disable `cache-bin` in Swatinem/rust-cache?

`Swatinem/rust-cache` has a `cache-bin: false` switch that would skip `~/.cargo/bin/` entirely and prevent the poisoning at its source. Trade-off: every cache miss would re-install `cargo-cyclonedx`, `cargo-deny`, `cargo-machete`, etc. The post-cache repair preserves that caching benefit and is surgical to the macOS rustup-proxy symptom.
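
To make the trade-off concrete: with `~/.cargo/bin/` excluded from the cache, a workflow would need something like the sketch below for each tool. It is hypothetical throughout: the function name `ensure_cargo_tool` is invented, and it assumes the tools are installed with `cargo install`, which this PR does not state.

```shell
# Sketch: install a cargo subcommand only when it is not already on PATH.
ensure_cargo_tool() {
  # $1: binary name, e.g. cargo-deny
  if command -v "$1" >/dev/null 2>&1; then
    echo "$1: already installed"
    return 0
  fi
  echo "$1: missing, installing"
  # Builds the tool from source, which is the cost the cached bin directory avoids.
  cargo install --locked "$1"
}
```

Each such install compiles the tool from source, which is the repeated cost that keeping `~/.cargo/bin/` in the cache avoids.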

Why not `gh cache delete` the macOS cache key right now?

The poisoned cache will be re-saved on the next macOS warm run regardless — without the workflow fix in this PR, deleting the cache is a one-shot bandage. This PR is needed in both cases. Once it merges, you can optionally evict the bad cache (`gh cache delete `) to skip one cycle of self-healing, but the workflow will heal on its own after one successful macOS run on either workflow.

Verification

  • Local pre-push gate (`lint-pre-push`) green: file-size, drift gates, vet, machete, clippy CI/prod/tests, rustdoc, doc-tests, tests, smoke, deny, lint-ci-windows — 169s.
  • PR CI fast-lane will exercise the same workflow file via actionlint/yaml-lint.
  • The fix is verifiable end-to-end by re-running the v0.5.98 release pipeline after this PR merges; the macOS leg should make it past "Build optimized release binaries".
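
Linting confirms the workflow files parse, but not that the repair step sits after the cache step. A text-based ordering check could cover that gap. The sketch below is an assumption-laden heuristic, not something this PR adds: the function name `repair_follows_cache` is hypothetical, the `.github/workflows/` paths in the usage comment are assumed, and matching on strings can be fooled by comments that mention either marker.

```shell
# Sketch: succeed only if the repair step appears after the cache step in a workflow file.
repair_follows_cache() {
  local file cache_line repair_line
  file="$1"
  # Line number of the first match for each marker; empty when the marker is absent.
  cache_line="$(grep -n 'Swatinem/rust-cache' "$file" | head -n 1 | cut -d: -f1)"
  repair_line="$(grep -n 'Repair rustup proxy after cache restore' "$file" | head -n 1 | cut -d: -f1)"
  [ -n "$cache_line" ] && [ -n "$repair_line" ] && [ "$repair_line" -gt "$cache_line" ]
}

# Possible usage (paths are assumed, adjust to the repository layout):
#   for wf in .github/workflows/release.yml .github/workflows/release-cache-warm.yml; do
#     repair_follows_cache "$wf" || { echo "ordering check failed: $wf" >&2; exit 1; }
#   done
```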

Sequencing for the v0.5.98 ship

The currently-running pipeline #97 will finish failing the macOS leg (and skip the publish job). After this PR merges, re-dispatch the same v0.5.98 release pipeline manually with the same tag and the same artifacts; the macOS leg should now build cleanly. No version bump is needed because v0.5.98 was never published.

Release pipeline #97 (v0.5.98) failed on the aarch64-apple-darwin build with:

    error: unexpected argument 'build' found
    Usage: rustup-init[EXE] [OPTIONS]
    Stack backtrace: rustup_init::run_rustup_inner ...

PR #245 already added a proxy-repair line ('rustup default ...') to the 'Install pinned nightly' step, and the in-step 'cargo --version' smoke check still passes — but the very next step, 'Swatinem/rust-cache', restores '~/.cargo/bin/' from a prior run's cache.  On macos-latest that cache holds a poisoned 'cargo' symlink pointing at 'rustup-init' (the installer) rather than the toolchain proxy.  The restore overwrites the freshly-repaired '~/.cargo/bin/cargo', and the subsequent 'cargo build' picks up the poisoned proxy and exits with the rustup-init backtrace before reaching any cargo subcommand.

Empirical proof from the failing job (76219153458):

    16:36:34Z  cargo --version  ->  cargo 1.97.0-nightly  (proxy healthy)
    16:36:35Z  Swatinem/rust-cache: restoring ~/.cargo/bin/ ...
    16:36:54Z  cargo build      ->  rustup-init backtrace  (proxy poisoned)

Fix: add a second proxy-repair step *after* the cache-restore step and *before* the build step.  Re-running 'rustup default "$(rustup show active-toolchain | awk '{print $1}')"' rewrites the proxy binaries in '~/.cargo/bin' so the build resolves to the active toolchain's cargo even when the restored cache was poisoned.  On the next successful run, 'Swatinem/rust-cache' saves a clean cache and the poison clears itself.

Both 'release.yml::build-release-binaries' and 'release-cache-warm.yml::warm' get the same step (with matching commentary).  Without the fix in the warm workflow, every macOS warm run would re-save the poisoned cache and perpetuate the regression.  The step is gated on the condition runner.os == 'macOS', so the Linux and Windows legs (which do not exhibit this symptom) are unaffected.

Pre-cache repair line is intentionally kept so the in-step 'cargo --version' smoke check still fails fast on a broken runner-image proxy.
@githubrobbi githubrobbi enabled auto-merge (squash) May 15, 2026 17:04
@githubrobbi githubrobbi merged commit c070543 into main May 15, 2026
24 checks passed
@githubrobbi githubrobbi deleted the fix/ci-rustup-proxy-post-cache-restore branch May 15, 2026 17:18