fix: auto-retry via fresh dispatch when the Pages origin wedges #43
Merged
External evidence (actions/deploy-pages#383) and today's incident (origin frozen on the pre-tag switcher.json for 83+ minutes despite deploy-pages reporting success) both show that waiting longer does not recover a wedged Pages origin — only a genuinely new deploy attempt does. Widening the verify timeout further isn't a realistic fix, so on verify failure the deploy job now re-dispatches itself fresh (capped at 2 retries) instead of just red-checking. Must be a fresh workflow_dispatch, not a job rerun — rerunning stacks a second github-pages artifact under the same run and breaks deploy-pages' artifact lookup, which we also hit today.
Replace the attempt-count cap with a wall-clock deadline anchored on the release's publish time (gh release view --json publishedAt), not the underlying commit's timestamp. This repo's own release flow allows tagging a commit long after it was merged, so a commit-age check would wrongly refuse to retry in exactly the same-SHA dedup scenario (deploy-pages#383) this exists to handle. The deadline is computed once by the re-dispatch job and carried unchanged through any retries, and is naturally scoped to tag/release deploys only (retry-until is empty for push/PR/manual dispatches, which can't hit this wedge class — they're always first-of-SHA).
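The deadline computation described above can be sketched roughly as follows. This is a minimal illustration, not the workflow's actual step: the function name `compute_retry_until`, the `RETRY_WINDOW_SECONDS` constant, and the `$TAG` variable are assumptions, the 15-minute window is taken from the PR description, and GNU `date` is assumed (as on Ubuntu runners).

```shell
#!/usr/bin/env bash
# Sketch only: turn a release's publishedAt (ISO 8601, UTC) into a
# retry-until deadline in epoch seconds. Names here are hypothetical.
set -euo pipefail

RETRY_WINDOW_SECONDS=$((15 * 60))  # publishedAt + 15min, per the PR description

compute_retry_until() {
  local published_at="$1"
  local published_epoch
  # GNU date parses ISO 8601 timestamps such as 2024-01-01T00:00:00Z.
  published_epoch=$(date -u -d "$published_at" +%s)
  echo $((published_epoch + RETRY_WINDOW_SECONDS))
}

# In the re-dispatch job the timestamp would presumably come from:
#   gh release view "$TAG" --json publishedAt --jq .publishedAt
# ($TAG is an assumed variable holding the release tag.)
```

Anchoring on `publishedAt` means a tag created long after the commit was merged still gets the full window, which a commit-timestamp check would not provide.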
Summary
- The origin stayed frozen on the pre-tag `switcher.json` (`v0.20.0` preferred, no `v0.21.0`) for 83+ minutes. This was confirmed as a genuine wedge (unchanged etag/last-modified across 13 polls), not slow propagation.
- [...] `github-pages` artifact and dispatching a fresh `workflow_dispatch` fixed it in under 90s. This matches external evidence in actions/deploy-pages#383: waiting longer never recovers a wedged origin, only a new deploy attempt does.
- `publish.yml`'s `deploy` job now automates that fix: on verify failure it re-dispatches itself fresh (via the existing `dispatch-workflow` shim mechanism, the same pattern the tag `re-dispatch` job already uses).
- A fresh `gh workflow run` dispatch is a brand-new run ID with its own artifact namespace, unlike a job rerun (which stacks a second `github-pages` artifact under the same run and breaks `deploy-pages`' artifact lookup; this was also hit today, separately, from manual reruns).

Design: retry-until deadline, not a retry counter
The retry budget is a wall-clock deadline (`retry-until`, epoch seconds), not an attempt count:
- Computed once by the `re-dispatch` job, as `release.publishedAt + 15min` (via `gh release view --json publishedAt`), and carried unchanged through any retries. It is anchored on the release's publish time, not the underlying commit's timestamp. This repo's own release flow allows tagging a commit long after it was merged ("tag the merged commit on `origin/main`"), so a commit-age check would wrongly refuse to retry in exactly the same-SHA dedup scenario this exists to handle.
- `retry-until` is empty for push/PR/manual dispatches, so they get the plain red-check with no auto-retry. That is correct, since those are always first-of-SHA deploys and can't hit this dedup wedge class per deploy-pages#383's mechanism.

Changes
- `publish.yml`: new `retry-until` input (empty by default = no retry budget); `deploy` job's `actions` permission bumped `read` → `write`; `Verify` step gains `id: verify`; new `Retry via a fresh dispatch if the origin is wedged` step, gated on the verify step specifically failing AND a non-empty `retry-until`. The `re-dispatch` job now fetches `publishedAt` and computes/passes the deadline.
- `publish-dispatch.yml`: threads `retry-until` through its `workflow_dispatch` input into the `uses: publish.yml` call.

Test plan
- `npm test`: all existing checks pass (no workflow-specific tests; this repo tests `assemble.mjs`, not the YAML)
- [...] `js-yaml`
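Since the workflow YAML has no dedicated tests, the retry gate described under Changes can at least be reasoned about as a small shell predicate. The sketch below is illustrative only: `should_retry` and its arguments are hypothetical names, and the explicit "now is before the deadline" comparison is an assumption about how the step consumes `retry-until`; the PR description states only that the step is gated on the verify step failing and a non-empty `retry-until`.

```shell
#!/usr/bin/env bash
# Sketch only: decide whether a failed verify should trigger a fresh dispatch.
# Usage: should_retry VERIFY_OUTCOME RETRY_UNTIL NOW
#   VERIFY_OUTCOME  e.g. the value of steps.verify.outcome ("failure", "success", ...)
#   RETRY_UNTIL     deadline in epoch seconds, or empty for push/PR/manual runs
#   NOW             current time in epoch seconds
should_retry() {
  local verify_outcome="$1" retry_until="$2" now="$3"
  # Only the verify step specifically failing qualifies.
  [[ "$verify_outcome" == "failure" ]] || return 1
  # Empty (or malformed) retry-until means no retry budget.
  [[ "$retry_until" =~ ^[0-9]+$ ]] || return 1
  # Assumed check: retry only while the wall-clock deadline has not passed.
  (( now < retry_until ))
}

# A retry step might then do something like (assumed invocation):
#   if should_retry "$VERIFY_OUTCOME" "$RETRY_UNTIL" "$(date +%s)"; then
#     gh workflow run publish-dispatch.yml -f retry-until="$RETRY_UNTIL"
#   fi
```

Passing the same `retry-until` value into the new dispatch is what keeps the deadline unchanged across retries, so the chain ends on its own once the window closes.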