fix(jobs): pre-flight workspace check before gapfill/FBA; humanize errors #216

Closed
VibhavSetlur wants to merge 2 commits into ModelSEED:staging from VibhavSetlur:fix/preflight-model-existence-before-job

@VibhavSetlur
Collaborator

Why

Live Flower (poplar:5555) currently shows 543 failed modelseed.gapfill + modelseed.fba jobs all dying with WorkspaceError('_ERROR_Object not found!_ERROR_') — the same handful of broken model refs being retried 10–20 times each by frustrated users.

| Task | Failed | Failure family |
| --- | --- | --- |
| modelseed.gapfill | 392 | Object not found |
| modelseed.fba | 151 | Object not found |

Every payload shape is identical: model = '/<user>/modelseed/<name>/model' where the workspace object truly does not exist. The frontend builds that ref purely from URL params (app/model/[...path]/page.tsx:1457-1474) and submits the job without verifying the object exists.

This complements José's backend change (exception text now reads No model found at '<path>'. Check that your reconstruct…) by:

  1. Preventing the doomed celery enqueue in the first place.
  2. Surfacing the same actionable wording for any pre-existing failed-job rows still showing the legacy _ERROR_ string in My Jobs.
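A minimal sketch of the pre-flight idea, under stated assumptions: `WorkspaceGet`, `SubmitJob`, and `submitWithPreflight` are hypothetical names and shapes introduced here for illustration, and the real `workspaceGet` and job-submission helpers in the frontend may have different signatures. The sketch also assumes a missing object either rejects with a message containing "Object not found" or resolves to an empty list.

```typescript
// Hypothetical shapes; the real frontend helpers may differ.
type WorkspaceGet = (refs: string[]) => Promise<unknown[]>;
type SubmitJob = (app: string, args: { model: string }) => Promise<string>;

// Illustrative wording only, not the backend's exact message.
function missingModelError(modelRef: string): Error {
  return new Error(`No model found at '${modelRef}'.`);
}

async function submitWithPreflight(
  app: "modelseed.gapfill" | "modelseed.fba",
  modelRef: string,
  workspaceGet: WorkspaceGet,
  submitJob: SubmitJob,
): Promise<string> {
  let found: unknown[];
  try {
    // Look the object up before anything is enqueued.
    found = await workspaceGet([modelRef]);
  } catch (err) {
    const msg = err instanceof Error ? err.message : String(err);
    // Assumption: a missing object surfaces as "Object not found".
    if (/Object not found/i.test(msg)) throw missingModelError(modelRef);
    throw err; // unrelated failures (network, auth) propagate unchanged
  }
  if (found.length === 0) throw missingModelError(modelRef);

  // Only reached when the model exists, so no doomed job is enqueued.
  return submitJob(app, { model: modelRef });
}
```

Passing the two helpers in as parameters keeps the sketch testable without a live workspace; in the page component they would presumably be imported directly.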

What

  • lib/utils/jobErrors.ts — new formatJobError(raw, modelRef?) helper. Translates legacy _ERROR_Object not found!_ERROR_ (and bare Object not found) into actionable wording; passes the new backend message through unchanged; returns undefined for empty input.
  • app/model/[...path]/page.tsx: submitModelJob now calls workspaceGet([modelRef]) before submitting; throws the friendly error if the object is missing; routes both pre-flight and backend errors through formatJobError. The doomed celery job is never enqueued.
  • app/(user-data)/my-jobs/page.tsx — the displayed errorMsg is run through formatJobError(job.error, args.model) so older failed-job rows show the friendly wording too, with the missing path substituted.
  • tests/unit/utils/jobErrors.test.ts — 7 new unit tests covering empty/null inputs, legacy form with/without ref, bare "Object not found", new backend message passthrough, unrelated errors untouched, Error-instance coercion.
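The behaviour described for formatJobError can be sketched roughly as follows. This is an illustration of the listed cases, not the PR's actual code: the matching rules and the friendly wording are assumptions, and the sentence after the path is a placeholder rather than the backend's real text.

```typescript
// Assumed markers; the real helper may match differently.
const NOT_FOUND_PATTERN = /Object not found/i;
const NEW_BACKEND_PREFIX = "No model found at";

function formatJobError(raw: unknown, modelRef?: string): string | undefined {
  // Coerce Error instances and other values to plain text.
  const text =
    raw instanceof Error ? raw.message : raw == null ? "" : String(raw);
  if (text.trim() === "") return undefined;

  // The new backend message is already actionable: pass it through unchanged.
  if (text.startsWith(NEW_BACKEND_PREFIX)) return text;

  // Legacy `_ERROR_Object not found!_ERROR_` and bare "Object not found".
  if (NOT_FOUND_PATTERN.test(text)) {
    const where = modelRef ? `at '${modelRef}'` : "at the requested path";
    // Placeholder wording, chosen for this sketch only.
    return `No model found ${where}. The reconstruct job for this model may not have completed.`;
  }

  // Unrelated errors are left untouched.
  return text;
}
```

On the My Jobs page this would be called as `formatJobError(job.error, args.model)`, so a stored legacy string is rewritten at display time while the job record itself stays as it was.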

What is NOT changed

Test plan

  • npm run lint — 0 errors (pre-existing warnings only, none in touched files)
  • npx tsc --noEmit — clean
  • npm run test:run (98/98 pass, 7 new for formatJobError)
  • npm run build (/model/[...path], /team, and /my-jobs all build successfully)
  • npm audit --omit=dev --audit-level=high — 0 vulns
  • Dev-server smoke: GET /model/missingowner/modelseed/no-such-model returns 200; client pre-flight runs on Run-FBA / Run-Gapfill click
  • Sam: validate live behaviour against a real missing-model ref in staging before merging to master

🤖 Generated with Claude Code

VibhavSetlur and others added 2 commits June 3, 2026 13:21
…rors

When users navigate to a model ref whose backing workspace object is
missing (a reconstruct that never completed, or a stale/bookmarked link)
and click Run FBA / Run GapFilling, the frontend was enqueuing a celery
job that the backend could only fail with the cryptic PATRIC workspace
text `_ERROR_Object not found!_ERROR_`. Live Flower shows 543 of these
failed jobs across modelseed.gapfill (392) and modelseed.fba (151),
with many users retrying the same broken ref 10-20 times. This change
complements the new, clearer backend message José added.

This adds a single fix in three parts:

- New `lib/utils/jobErrors.ts#formatJobError` translates both the legacy
  `_ERROR_Object not found!_ERROR_` and the explicit "Object not found"
  substrings into actionable wording that names the missing path and
  points users at their reconstruct job. The new backend message is
  recognised and passed through unchanged.
- `app/model/[...path]/page.tsx#submitModelJob` now calls
  `workspaceGet([modelRef])` before submitting, throws the friendly
  error if the object is missing, and routes both the pre-flight and any
  backend rejection through `formatJobError`. The doomed celery job is
  never enqueued.
- `app/(user-data)/my-jobs/page.tsx` runs the displayed `errorMsg`
  through `formatJobError` so older failed-job rows show the new
  actionable wording too, substituting the job's own model ref.

Unit-tested with 7 new cases in `tests/unit/utils/jobErrors.test.ts`
(empty/null inputs, legacy form with/without ref, bare "Object not
found", new backend message passthrough, unrelated errors untouched,
Error-instance coercion). Local lint/typecheck/build clean, 98/98 unit
tests pass.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The Biochem Integration Tests suite intentionally degrades to "skipped"
when staging.modelseed.org is unreachable, but the live probe in beforeAll
had no internal timeout — on slow CI networks the vitest hookTimeout
(10s) fired before the catch block could mark isApiAvailable = false,
turning the whole suite (and CI) red. Race the probe against an explicit
7s timer so the catch path always runs first; observed master CI flake
matched this same signature on 2026-06-03.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
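The timeout race from the second commit can be sketched as below. `withTimeout` and `probeApi` are hypothetical names introduced for this illustration, and the probe itself is passed in as a function because the suite's actual request code is not shown here. The 7 s default mirrors the commit message and sits below the 10 s vitest hookTimeout it mentions.

```typescript
// Reject if `work` has not settled within `ms` milliseconds.
function withTimeout<T>(work: Promise<T>, ms: number): Promise<T> {
  let timer: ReturnType<typeof setTimeout> | undefined;
  const timeout = new Promise<never>((_resolve, reject) => {
    timer = setTimeout(
      () => reject(new Error(`probe timed out after ${ms} ms`)),
      ms,
    );
  });
  // Whichever settles first wins; the timer is cleared either way.
  return Promise.race([work, timeout]).then(
    (value) => {
      clearTimeout(timer);
      return value;
    },
    (err) => {
      clearTimeout(timer);
      throw err;
    },
  );
}

// Resolves to true when the probe succeeds in time, false otherwise.
async function probeApi(
  probe: () => Promise<unknown>,
  ms = 7000,
): Promise<boolean> {
  try {
    await withTimeout(probe(), ms);
    return true;
  } catch {
    // Unreachable or too slow: the caller can mark the suite as skipped.
    return false;
  }
}
```

In a `beforeAll` hook the result would be assigned to the availability flag (for example `isApiAvailable = await probeApi(...)`), so the catch path settles before the hook's own timeout can fire.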
@VibhavSetlur
Collaborator Author

Closing to re-route through the correct flow: this branch should go into my fork's staging first, then a separate PR from my fork's staging → upstream staging. Replacing this PR with the two-stage version now (same commits, same green CI).

@VibhavSetlur VibhavSetlur deleted the fix/preflight-model-existence-before-job branch June 4, 2026 13:45