Description
PullRequestHandler and FetchProcessor both enqueue PR_FILES refreshes with a deterministic BullMQ job id per (repo, PR, head SHA, base SHA) tuple. BullMQ treats custom job ids as unique while an old job with that id still exists: adding a job with the same id is ignored until the old job is removed. Failed jobs are retained by removeOnFail: 50, so a fetch-pr-files job that exhausts its retries (e.g. after a transient GitHub Files API failure) stays in the failed set and blocks later refreshes for the same SHA tuple.
This is the same defect class fixed for PR metadata jobs in #118 ("failed metadata jobs must not squat on the stable per-PR jobId"). The PR_FILES jobId is equally stable per SHA tuple, so the same reasoning applies.
The most concrete trigger is the base-retarget refresh introduced by #115 (closing #62): when pull_request.edited arrives with changes.base != null, the handler reuses the same headSha/baseSha tuple to re-enqueue PR_FILES. If a prior fetch on that tuple failed, the retarget refresh is silently dropped, and pr_files / scoring_data_stored stay pinned to the old base — defeating the #115 fix.
Steps to Reproduce
Trigger a pull_request.synchronize event that enqueues fetch-pr-files for a tracked PR.
Make that files job fail all retry attempts, for example through a transient GitHub Files API failure.
Observe that the failed job remains retained because the queue options use removeOnFail: 50.
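Send a pull_request.edited webhook with changes.base (the base-retarget path from #115, "fix(webhook): refresh PR files on pull_request.edited base retarget (#62)") for the same PR, where the head SHA is unchanged.
Observe that the handler tries to enqueue the same job id, prFilesJobId(repoFullName, prNumber, headSha, baseSha).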
The fresh files refresh can be skipped because the retained failed job still owns that custom job id.
The same scenario reproduces from FetchProcessor.enqueuePrFilesJob (used by the worker-side re-enqueue path).
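The steps above can be replayed against a small in-memory model of the queue. This is a simplified sketch and not BullMQ's implementation. ModelQueue, modelPrFilesJobId and its id format are hypothetical, because the real prFilesJobId format is not shown in this issue. The model captures only the two behaviours the issue relies on:
- a duplicate custom job id is ignored while the old job exists;
- removeOnFail decides whether a failed job is dropped immediately (true) or kept among the newest N failures (a number).

```typescript
// Simplified model of BullMQ-style custom job ids (NOT BullMQ itself).
type RemoveOnFail = boolean | number;
type JobState = "waiting" | "failed";

interface ModelJob {
  state: JobState;
  removeOnFail: RemoveOnFail; // per-job option, as passed at enqueue time
}

class ModelQueue {
  private jobs = new Map<string, ModelJob>();
  private failedOrder: string[] = []; // retained failed job ids, oldest first

  // Returns true if a job was created, false if the custom id is still taken.
  add(jobId: string, removeOnFail: RemoveOnFail): boolean {
    if (this.jobs.has(jobId)) {
      return false; // duplicate id: the add is silently ignored
    }
    this.jobs.set(jobId, { state: "waiting", removeOnFail });
    return true;
  }

  // Marks a waiting job as having exhausted all retries.
  fail(jobId: string): void {
    const job = this.jobs.get(jobId);
    if (job === undefined || job.state !== "waiting") {
      return;
    }
    if (job.removeOnFail === true) {
      this.jobs.delete(jobId); // evicted immediately; the id is free again
      return;
    }
    job.state = "failed";
    this.failedOrder.push(jobId);
    const limit = job.removeOnFail;
    if (typeof limit === "number") {
      // Keep only the newest `limit` failed jobs.
      while (this.failedOrder.length > limit) {
        const evicted = this.failedOrder.shift()!;
        this.jobs.delete(evicted);
      }
    }
  }
}

// Hypothetical stand-in for prFilesJobId; the real format may differ.
function modelPrFilesJobId(repo: string, pr: number, headSha: string, baseSha: string): string {
  return `pr-files-${repo}-${pr}-${headSha}-${baseSha}`;
}

// Replays the reproduction and reports whether the retarget refresh was enqueued.
function retargetRefreshEnqueued(removeOnFail: RemoveOnFail): boolean {
  const queue = new ModelQueue();
  const id = modelPrFilesJobId("acme/widgets", 42, "abc123", "def456");
  queue.add(id, removeOnFail); // pull_request.synchronize enqueues fetch-pr-files
  queue.fail(id);              // all retry attempts fail
  return queue.add(id, removeOnFail); // pull_request.edited re-enqueues the same id
}

// With removeOnFail: 50, counts how many newer failures it takes to free the id.
function freedAfterOtherFailures(otherFailures: number): boolean {
  const queue = new ModelQueue();
  const id = modelPrFilesJobId("acme/widgets", 42, "abc123", "def456");
  queue.add(id, 50);
  queue.fail(id);
  for (let i = 0; i < otherFailures; i++) {
    const other = `other-${i}`;
    queue.add(other, 50);
    queue.fail(other);
  }
  return queue.add(id, 50);
}

console.log(retargetRefreshEnqueued(50));   // false: the retained failed job squats on the id
console.log(retargetRefreshEnqueued(true)); // true: the failed job was evicted
```

In this model, removeOnFail: 50 frees the id only after 50 newer jobs have failed. freedAfterOtherFailures(49) is false and freedAfterOtherFailures(50) is true. This matches the retention-window behaviour described under Actual Behavior.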
Expected Behavior
A later pull_request.edited (base retarget) or any other same-SHA re-enqueue should be able to enqueue a fresh files refresh after a previous files job failed.
The mirror should eventually update:
pr_files
pr_file_contents
pull_requests.scoring_data_stored / pull_requests.base_sha consistency after a base retarget
Actual Behavior
A retained failed files job can cause later refresh attempts for the same (repo, PR, head SHA, base SHA) tuple to be ignored by BullMQ. The mirror keeps stale file scoring data until the failed job is removed manually or is pushed out of the removeOnFail: 50 retention window by newer failed jobs.
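Additional Context
Affected code:
packages/das/src/webhook/handlers/pull-request.handler.ts:121
packages/das/src/queue/fetch.processor.ts:213 (inside enqueuePrFilesJob)
Suggested narrow fix: mirror #118's diff by replacing removeOnFail: 50 with removeOnFail: true at the two enqueue sites above. Failed jobs are then evicted immediately and free the deterministic jobId.
This is distinct from existing issues / PRs:
#118 fixed this defect class for PR_METADATA jobs only.
#115 is about whether a base retarget triggers a PR_FILES re-enqueue at all; this issue covers whether that re-enqueue can actually run after a prior failure.
The admin replay endpoints in api/admin.controller.ts also enqueue PR_FILES. They are out of scope here and can be a follow-up if operator-visible failure history is intentional there.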
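A regression test at each enqueue site could pin the fix by inspecting the options passed to the queue. This is a sketch under stated assumptions. It relies on a thin QueueLike seam whose call shape matches BullMQ's Queue.add(name, data, opts). QueueLike, PrFilesJobData, prFilesJobIdSketch, enqueuePrFilesJobSketch and RecordingQueue are hypothetical names, not the DAS code, and the id format is invented for illustration.

```typescript
// Hypothetical seam around the queue; BullMQ's Queue.add(name, data, opts) fits this shape.
interface EnqueueOptions {
  jobId: string;
  removeOnFail: boolean | number;
}

interface QueueLike {
  add(name: string, data: unknown, opts: EnqueueOptions): Promise<unknown>;
}

interface PrFilesJobData {
  repoFullName: string;
  prNumber: number;
  headSha: string;
  baseSha: string;
}

// Job name as used in this issue; the real code refers to it via PR_FILES.
const PR_FILES_JOB_NAME = "fetch-pr-files";

// Hypothetical deterministic id per (repo, PR, head SHA, base SHA) tuple.
function prFilesJobIdSketch(d: PrFilesJobData): string {
  return `pr-files-${d.repoFullName}-${d.prNumber}-${d.headSha}-${d.baseSha}`;
}

// Enqueue site after the suggested fix: failed jobs are evicted immediately,
// so the next refresh for the same tuple is not ignored.
async function enqueuePrFilesJobSketch(queue: QueueLike, data: PrFilesJobData): Promise<void> {
  await queue.add(PR_FILES_JOB_NAME, data, {
    jobId: prFilesJobIdSketch(data),
    removeOnFail: true,
  });
}

// Test double that records calls instead of talking to Redis.
class RecordingQueue implements QueueLike {
  calls: { name: string; opts: EnqueueOptions }[] = [];

  async add(name: string, _data: unknown, opts: EnqueueOptions): Promise<unknown> {
    this.calls.push({ name, opts });
    return undefined;
  }
}

const queue = new RecordingQueue();
enqueuePrFilesJobSketch(queue, {
  repoFullName: "acme/widgets",
  prNumber: 42,
  headSha: "abc123",
  baseSha: "def456",
}).then(() => console.log(queue.calls[0].opts.removeOnFail)); // true
```

A test that asserts removeOnFail: true makes the trade-off explicit: failed PR_FILES jobs no longer remain in the failed set for inspection. That is why the admin replay endpoints are treated separately.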