Skip to content

fix: auto-retry via fresh dispatch when the Pages origin wedges#43

Merged
coretl merged 2 commits into
mainfrom
fix/pages-verify-auto-retry
Jul 3, 2026
Merged

fix: auto-retry via fresh dispatch when the Pages origin wedges#43
coretl merged 2 commits into
mainfrom
fix/pages-verify-auto-retry

Conversation

@coretl

@coretl coretl commented Jul 3, 2026

Copy link
Copy Markdown
Contributor

Summary

  • Today's v0.21.0 release deploy reported success but the live origin stayed frozen serving the pre-tag switcher.json (v0.20.0 preferred, no v0.21.0) for 83+ minutes — confirmed a genuine wedge (unchanged etag/last-modified across 13 polls), not slow propagation.
  • Manually deleting the run's github-pages artifact and dispatching a fresh workflow_dispatch fixed it in under 90s. This matches external evidence in actions/deploy-pages#383: waiting longer never recovers a wedged origin, only a new deploy attempt does.
  • publish.yml's deploy job now automates that fix: on verify failure it re-dispatches itself fresh (via the existing dispatch-workflow shim mechanism, same pattern the tag re-dispatch job already uses).
  • No artifact-deletion step needed — a fresh gh workflow run is a brand-new run ID with its own artifact namespace, unlike a job rerun (which stacks a second github-pages artifact under the same run and breaks deploy-pages' artifact lookup — also hit today, separately, from manual reruns).

Design: retry-until deadline, not a retry counter

The retry budget is a wall-clock deadline (retry-until, epoch seconds), not an attempt count:

  • Computed once, in the re-dispatch job, as release.publishedAt + 15min (via gh release view --json publishedAt) — anchored on the release's publish time, not the underlying commit's timestamp. This repo's own release flow allows tagging a commit long after it was merged ("tag the merged commit on origin/main"), so a commit-age check would wrongly refuse to retry in exactly the same-SHA dedup scenario this exists to handle.
  • Carried unchanged through any retries (no re-computation, no increment) — self-terminating by construction, no plumbing-fragility risk from a counter silently resetting.
  • Naturally scoped to tag/release deploys only: retry-until is empty for push/PR/manual dispatches, so they get the plain red-check with no auto-retry — correct, since those are always first-of-SHA deploys and can't hit this dedup wedge class per deploy-pages#383's mechanism.

Changes

  • publish.yml: new retry-until input (empty by default = no retry budget); deploy job's actions permission bumped readwrite; Verify step gains id: verify; new Retry via a fresh dispatch if the origin is wedged step, gated on the verify step specifically failing AND a non-empty retry-until. re-dispatch job now fetches publishedAt and computes/passes the deadline.
  • publish-dispatch.yml: threads retry-until through its workflow_dispatch input into the uses: publish.yml call.

Test plan

  • npm test — all existing checks pass (no workflow-specific tests; this repo tests assemble.mjs, not the YAML)
  • YAML validated with js-yaml
  • Exercise on the next real deploy/wedge — the retry path can't be unit tested (depends on GitHub Pages' actual backend behavior), so watch the next tag release or intentionally trigger a wedge to confirm the auto-retry fires and clears it

External evidence (actions/deploy-pages#383) and today's incident (origin
frozen on the pre-tag switcher.json for 83+ minutes despite deploy-pages
reporting success) both show that waiting longer does not recover a wedged
Pages origin — only a genuinely new deploy attempt does. Widening the
verify timeout further isn't a realistic fix, so on verify failure the
deploy job now re-dispatches itself fresh (capped at 2 retries) instead of
just red-checking. Must be a fresh workflow_dispatch, not a job rerun —
rerunning stacks a second github-pages artifact under the same run and
breaks deploy-pages' artifact lookup, which we also hit today.
Replace the attempt-count cap with a wall-clock deadline anchored on the
release's publish time (gh release view --json publishedAt), not the
underlying commit's timestamp. This repo's own release flow allows tagging
a commit long after it was merged, so a commit-age check would wrongly
refuse to retry in exactly the same-SHA dedup scenario (deploy-pages#383)
this exists to handle. The deadline is computed once by the re-dispatch
job and carried unchanged through any retries, and is naturally scoped to
tag/release deploys only (retry-until is empty for push/PR/manual
dispatches, which can't hit this wedge class — they're always
first-of-SHA).
@coretl coretl temporarily deployed to github-pages July 3, 2026 13:53 — with GitHub Actions Inactive
@coretl coretl merged commit 10650a7 into main Jul 3, 2026
7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant