ci(release): cap build_and_package at 60 min to kill queue-starved legs #18
Open
Cliftonz wants to merge 2 commits into
Phase A of automating all eight distribution channels: kill the single-point-of-failure dependency on crates.io, fix winget's wrong installer regex, and add secret-presence gates so missing credentials surface as warnings instead of release-breaking job errors.

Before: every channel had `needs: [publish-crates-io]`. A crates.io verify failure (as happened at v0.1.0 because jarvy-templates was marked `publish = false`) cascaded to skip AUR, winget, Chocolatey, Homebrew. None of those channels actually consume crates.io; they all pull from the GitHub release artifacts, so the dependency was spurious.

After: each channel runs independently the moment the GitHub release publishes. A crates.io failure no longer blocks `.msi` reaching winget or Chocolatey.

Per-channel fixes:

* update-homebrew: drop the crates.io needs gate. (Tarball gap to produce `.tar.gz` artifacts is a separate Phase B blocker; this job still skips when HOMEBREW_TAP_DEPLOY_KEY is unset.)
* update-aur: drop the crates.io needs gate, add secret-presence gate for AUR_SSH_PRIVATE_KEY + AUR_USERNAME + AUR_EMAIL, compute real sha256sums from SHA256SUMS.txt instead of leaving `SHA256_PLACEHOLDER_X86/_ARM` tokens in the pushed PKGBUILD (the prior code would have written a broken PKGBUILD that failed the user's `makepkg` integrity check). AUR-bin push still depends on the upstream tarball gap closing; the sha256 lookup step fails loud with a clear `release.yml does not yet produce *.tar.gz` error in that case rather than pushing garbage.
* update-winget: drop the crates.io needs gate, add WINGET_TOKEN presence gate. Fix `installers-regex`: the prior value `jarvy-.*-x86_64-pc-windows-.*\.zip$` was a leftover from a never-wired cargo-dist plan; the release matrix has only ever produced `jarvy_<v>_x64_en-US.msi`, so the regex matched zero assets and the action silently no-op'd. New regex `^jarvy_\d+\.\d+\.\d+_x64_en-US\.msi$` matches real assets.
* update-chocolatey: drop the crates.io needs gate, add CHOCOLATEY_API_KEY presence gate. Substitution + sha256 lookup steps unchanged (those were already correct for .msi shape).
* `continue-on-error: true` removed from update-aur, update-winget, update-chocolatey. The secret-presence gate replaces the masking: missing setup surfaces as a workflow warning + job-skipped status; a real publish failure (after secret is set) now fails the job visibly so it can be triaged.

Phase B follow-up: add `.tar.gz` artifact production to release.yml (macOS arm64 + Linux x86_64-gnu / aarch64-gnu / x86_64-musl). When that lands, AUR-bin + Homebrew + the universal install scripts (install.sh / install.ps1) all start working without further publish-packages changes.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
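The secret-presence gate described above could look roughly like the following. This is a minimal sketch, not the PR's actual diff: the job name `check_secrets`, the output name `winget`, the `release: published` trigger, and the placeholder publish step are all assumptions made for illustration. It relies on the common pattern of reading the secret into an environment variable in a small gate job, because secrets are not available directly in a job-level `if:`.

```yaml
# Hypothetical sketch of a secret-presence gate (names are illustrative).
on:
  release:
    types: [published]

jobs:
  # Gate job: reports whether the credential exists, never prints its value.
  check_secrets:
    runs-on: ubuntu-latest
    outputs:
      winget: ${{ steps.gate.outputs.winget }}
    steps:
      - id: gate
        env:
          WINGET_TOKEN: ${{ secrets.WINGET_TOKEN }}
        run: |
          if [ -n "$WINGET_TOKEN" ]; then
            echo "winget=true" >> "$GITHUB_OUTPUT"
          else
            # Surfaces as a workflow warning instead of a failed job.
            echo "::warning title=winget skipped::WINGET_TOKEN is not set"
            echo "winget=false" >> "$GITHUB_OUTPUT"
          fi

  # Channel job: no crates.io dependency, skipped when the secret is missing.
  update-winget:
    needs: [check_secrets]
    if: needs.check_secrets.outputs.winget == 'true'
    runs-on: ubuntu-latest
    steps:
      - run: echo "winget publish step goes here"  # placeholder
```

With no `continue-on-error: true` on the channel job, a missing secret shows up as a skipped job plus a warning, while a failure after the secret is configured fails the job visibly.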
v0.1.0-rc.1 and v0.1.0-rc.2 each burned a full GitHub-imposed 24 h job timeout on the (now-removed) macos-13 matrix leg. The smoking-gun line from both jobs (runs 25340850348 + 25342271485) was the standard runner wait message; the job never got picked up by a runner and sat in queue the entire 24 h:

    Waiting for a runner to pick up this job...

(This was queue starvation, not a build hang. GitHub does not bill for queue time, but the leg still blocked downstream sign/SBOM/upload jobs until the 24 h kill fired, delaying every release by a day.)

The macos-13 leg itself was already dropped in 9bbed66. This is the belt-and-braces defense so the next runner-pool exhaustion (whether from re-adding macos-13 or any other label going scarce) dies in 60 min instead of 24 h.

Healthy matrix legs all finish well under the cap:

- ubuntu-latest ~ 5-8 min
- windows-latest ~ 8-10 min
- macos-latest arm ~ 6-15 min

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Summary

Adds `timeout-minutes: 60` to the `build_and_package` matrix job in `release.yml` so a queue-starved leg dies in 60 min instead of GitHub's hard 24 h job-cap.

Smoking gun

Both `v0.1.0-rc.1` (run 25340850348) and `v0.1.0-rc.2` (run 25342271485) had their `macos-13` matrix leg cancelled at exactly 24h 00m. The last line of the runner-bootstrap log on BOTH jobs is:

    Waiting for a runner to pick up this job...

No `actions/checkout`, no `cargo build`, no `cargo packager` ever ran. The job sat in queue 24 h until GitHub's hard cap fired. Cause: `macos-13` (Intel) runner-pool exhaustion; the Apple Silicon migration drained that label.

Why this is still worth landing

The macos-13 matrix entry itself was already removed in 9bbed66 (`build: drop macOS Intel (macos-13) from release matrix`), so the bug as it existed is gone. This PR is belt-and-braces: the next time ANY runner label goes scarce (someone re-adds macos-13, or macos-latest queues blow up during an Apple release event), the job dies in 60 min, not 24 h.

Cost note

GitHub does NOT bill for queue time, only for runner execution. So the actual compute charge from the two runaway runs was ~$0. The real damage was a 24 h delay on every dependent job (`sign_artifacts`, `generate_sbom`, `upload_to_release` all `needs: [build_and_package]`), pushing release wallclock from ~10 min to a full day.

Test plan

- `release.yml` syntax unchanged outside the one block
- `gh run list` history (ubuntu ~8 min, windows ~10 min, macos-latest arm ~15 min)

🤖 Generated with Claude Code
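For orientation, the one-block change the Summary describes might look like this in `release.yml`. It is a hedged sketch rather than the actual file: the matrix layout, the step contents, and the body of `sign_artifacts` are assumptions, and only the job names, the runner labels, and `timeout-minutes: 60` come from the description above. Whether the cap also covers time spent waiting for a runner is the PR's premise, not something this fragment demonstrates.

```yaml
# Hypothetical excerpt of release.yml (structure is illustrative).
jobs:
  build_and_package:
    # The one-line change: cancel any matrix leg after 60 minutes.
    timeout-minutes: 60
    strategy:
      matrix:
        os: [ubuntu-latest, windows-latest, macos-latest]
    runs-on: ${{ matrix.os }}
    steps:
      - uses: actions/checkout@v4
      - run: cargo build --release  # placeholder for the real build/package steps

  # Downstream jobs wait on every matrix leg, so one stuck leg stalls them all.
  sign_artifacts:
    needs: [build_and_package]
    runs-on: ubuntu-latest
    steps:
      - run: echo "signing step goes here"  # placeholder
```

The 60-minute value leaves a wide margin over the slowest healthy leg listed in the test plan (~15 min), so the cap should not cut off a normal build.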