ci(release): cap build_and_package at 60 min to kill queue-starved legs #18

Open
Cliftonz wants to merge 2 commits into main from ci/release-job-timeout

@Cliftonz
Contributor

Summary

Adds timeout-minutes: 60 to the build_and_package matrix job in release.yml so a queue-starved leg dies in 60 min instead of GitHub's hard 24 h job-cap.
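
For orientation, the change amounts to one key on the job. A minimal sketch, assuming a matrix over the three runner labels listed in the test plan; the checkout step and everything else around `timeout-minutes` is illustrative, not copied from release.yml. It also takes as given, as this PR does, that the cap covers a leg still waiting for a runner.

```yaml
# Sketch of the relevant block in release.yml.
# Only the job name and `timeout-minutes: 60` come from this PR;
# the matrix shape and steps are assumptions for illustration.
jobs:
  build_and_package:
    # Per-job cap in minutes, applied to each matrix leg separately.
    timeout-minutes: 60
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - run: echo "cargo build and cargo packager steps omitted"
```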

Smoking gun

Both v0.1.0-rc.1 (run 25340850348) and v0.1.0-rc.2 (run 25342271485) had their macos-13 matrix leg cancelled at exactly 24h 00m. The last line of the runner-bootstrap log on BOTH jobs is:

Waiting for a runner to pick up this job...

No actions/checkout, no cargo build, no cargo packager ever ran. The job sat in queue 24 h until GitHub's hard cap fired. Cause: macos-13 (Intel) runner-pool exhaustion — Apple Silicon migration drained that label.

Why this is still worth landing

The macos-13 matrix entry itself was already removed in 9bbed66 (build: drop macOS Intel (macos-13) from release matrix), so the original bug is gone. This PR is belt-and-braces: the next time ANY runner label goes scarce (someone re-adds macos-13, or macos-latest queues blow up during an Apple release event), the job dies in 60 min, not 24 h.

Cost note

GitHub does NOT bill for queue time, only for runner execution. So the actual compute charge from the two runaway runs was ~$0. The real damage was a 24 h delay on every dependent job (sign_artifacts, generate_sbom, and upload_to_release all declare needs: [build_and_package]), pushing release wall-clock time from ~10 min to a full day.
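
The blocking behaviour follows from the dependency graph. A rough sketch of that graph, using the job names from this PR; the runner labels and steps are placeholders, not the real release.yml contents.

```yaml
# Sketch only: job names come from this PR; everything else is assumed.
jobs:
  build_and_package:
    timeout-minutes: 60
    runs-on: ubuntu-latest
    steps:
      - run: echo "build and packaging steps omitted"

  sign_artifacts:
    # Waits for every build_and_package matrix leg to reach a final state.
    needs: [build_and_package]
    runs-on: ubuntu-latest
    steps:
      - run: echo "signing steps omitted"

  generate_sbom:
    needs: [build_and_package]
    runs-on: ubuntu-latest
    steps:
      - run: echo "SBOM steps omitted"

  upload_to_release:
    needs: [build_and_package]
    runs-on: ubuntu-latest
    steps:
      - run: echo "upload steps omitted"
```

Because a job listed in `needs` must finish first, one stuck leg holds all three dependents. When a needed job fails or is cancelled, dependents are skipped by default, so a capped leg makes the release fail fast instead of stalling.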

Test plan

  • release.yml syntax unchanged outside the one block
  • 60 min cap > the longest healthy leg observed in gh run list history (ubuntu ~8 min, windows ~10 min, macos-latest arm ~15 min)
  • Validated implicitly by next release cut — no opt-in needed

🤖 Generated with Claude Code

Cliftonz and others added 2 commits May 27, 2026 11:56
Phase A of automating all eight distribution channels: kill the
single-point-of-failure dependency on crates.io, fix winget's wrong
installer regex, and add secret-presence gates so missing credentials
surface as warnings instead of release-breaking job errors.

Before: every channel had `needs: [publish-crates-io]`. A crates.io
verify failure (as happened at v0.1.0 because jarvy-templates was
marked `publish = false`) cascaded to skip AUR, winget, Chocolatey,
Homebrew. None of those channels actually consume crates.io — they
all pull from the GitHub release artifacts, so the dependency was
spurious.

After: each channel runs independently the moment the GitHub release
publishes. A crates.io failure no longer blocks `.msi` reaching
winget or Chocolatey.
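
A rough sketch of the before/after shape, shown for one channel. The
`release: published` trigger, runner labels, and step bodies are
assumptions for illustration; only the job names and the removed
`needs` entry come from this commit.

```yaml
# publish-packages workflow (sketch; trigger and steps are assumed)
on:
  release:
    types: [published]

jobs:
  publish-crates-io:
    runs-on: ubuntu-latest
    steps:
      - run: echo "cargo publish steps omitted"

  update-chocolatey:
    # Before: needs: [publish-crates-io]
    # After: no `needs`, so a crates.io failure cannot skip this job.
    runs-on: windows-latest
    steps:
      - run: echo "chocolatey publish steps omitted"
```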

Per-channel fixes:

* update-homebrew — drop the crates.io needs gate. (Tarball gap to
  produce `.tar.gz` artifacts is a separate Phase B blocker; this
  job still skips when HOMEBREW_TAP_DEPLOY_KEY is unset.)

* update-aur — drop the crates.io needs gate, add secret-presence
  gate for AUR_SSH_PRIVATE_KEY + AUR_USERNAME + AUR_EMAIL, compute
  real sha256sums from SHA256SUMS.txt instead of leaving
  `SHA256_PLACEHOLDER_X86/_ARM` tokens in the pushed PKGBUILD (the
  prior code would have written a broken PKGBUILD that failed user
  `makepkg` integrity check). AUR-bin push still depends on the
  upstream tarball gap closing; the sha256 lookup step fails loud
  with a clear `release.yml does not yet produce *.tar.gz` error in
  that case rather than pushing garbage.

* update-winget — drop the crates.io needs gate, add WINGET_TOKEN
  presence gate. Fix `installers-regex`: the prior value
  `jarvy-.*-x86_64-pc-windows-.*\.zip$` was a leftover from a never-
  wired cargo-dist plan; the release matrix has only ever produced
  `jarvy_<v>_x64_en-US.msi`, so the regex matched zero assets and
  the action silently no-op'd. New regex
  `^jarvy_\d+\.\d+\.\d+_x64_en-US\.msi$` matches real assets.

* update-chocolatey — drop the crates.io needs gate, add
  CHOCOLATEY_API_KEY presence gate. Substitution + sha256 lookup
  steps unchanged (those were already correct for .msi shape).

* `continue-on-error: true` removed from update-aur, update-winget,
  update-chocolatey. The secret-presence gate replaces the masking:
  missing setup surfaces as a workflow warning + job-skipped status;
  a real publish failure (after secret is set) now fails the job
  visibly so it can be triaged.
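
One way a secret-presence gate can be wired, sketched for winget. The
`winget-gate` job, the `check` step id, and the `ready` output are
hypothetical names, and the real workflow may structure this
differently; the pattern relies only on standard workflow commands.

```yaml
jobs:
  winget-gate:
    runs-on: ubuntu-latest
    outputs:
      ready: ${{ steps.check.outputs.ready }}
    steps:
      - id: check
        env:
          # An unset secret expands to an empty string.
          WINGET_TOKEN: ${{ secrets.WINGET_TOKEN }}
        run: |
          if [ -n "$WINGET_TOKEN" ]; then
            echo "ready=true" >> "$GITHUB_OUTPUT"
          else
            # Surfaces as a warning annotation on the workflow run.
            echo "::warning::WINGET_TOKEN is not set; skipping winget publish"
            echo "ready=false" >> "$GITHUB_OUTPUT"
          fi

  update-winget:
    needs: [winget-gate]
    # Skipped (not failed) when the secret is missing.
    if: needs.winget-gate.outputs.ready == 'true'
    runs-on: ubuntu-latest
    steps:
      # No continue-on-error here: a real publish failure fails the job.
      - run: echo "winget publish steps omitted"
```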

Phase B follow-up: add `.tar.gz` artifact production to release.yml
(macOS arm64 + Linux x86_64-gnu / aarch64-gnu / x86_64-musl). When
that lands, AUR-bin + Homebrew + the universal install scripts
(install.sh / install.ps1) all start working without further
publish-packages changes.
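
Until those artifacts exist, the update-aur sha256 lookup is meant to
fail loudly. A minimal sketch of such a step, assuming SHA256SUMS.txt
and PKGBUILD are already in the working directory; the asset-name
patterns (`x86_64`, `aarch64`) are guesses, since the Phase B file
names are not fixed yet.

```yaml
steps:
  - name: Fill PKGBUILD checksums from SHA256SUMS.txt
    run: |
      set -euo pipefail
      # First column of a sha256sum-style line is the digest.
      # Asset-name patterns below are hypothetical.
      x86=$(awk '/x86_64.*\.tar\.gz$/ {print $1; exit}' SHA256SUMS.txt)
      arm=$(awk '/aarch64.*\.tar\.gz$/ {print $1; exit}' SHA256SUMS.txt)
      if [ -z "$x86" ] || [ -z "$arm" ]; then
        # Error annotation plus non-zero exit, so nothing gets pushed.
        echo "::error::release.yml does not yet produce *.tar.gz"
        exit 1
      fi
      sed -i "s/SHA256_PLACEHOLDER_X86/$x86/; s/SHA256_PLACEHOLDER_ARM/$arm/" PKGBUILD
```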

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
v0.1.0-rc.1 and v0.1.0-rc.2 each burned a full GitHub-imposed 24 h job
timeout on the (now-removed) macos-13 matrix leg. The smoking-gun line
from both jobs (runs 25340850348 + 25342271485) was the standard runner
wait message — the job never got picked up by a runner; it sat in queue
the entire 24 h:

  Waiting for a runner to pick up this job...

(This was queue starvation, not a build hang. GitHub does not bill for
queue time, but the leg still blocked downstream sign/SBOM/upload jobs
until the 24 h kill fired, delaying every release by a day.)

The macos-13 leg itself was already dropped in 9bbed66. This is the
belt-and-braces defense so the next runner-pool exhaustion (whether
from re-adding macos-13 or any other label going scarce) dies in 60 min
instead of 24 h.

Healthy matrix legs all finish well under the cap:
  - ubuntu-latest    ~ 5-8 min
  - windows-latest   ~ 8-10 min
  - macos-latest arm ~ 6-15 min

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>