Skip to content

feat: in-place torch upgrade with rollback (detection + pill + backgr…#56

Merged
cryptopoly merged 2 commits into
stagingfrom
claude/wizardly-shaw-53da59
May 17, 2026
Merged

feat: in-place torch upgrade with rollback (detection + pill + backgr…#56
cryptopoly merged 2 commits into
stagingfrom
claude/wizardly-shaw-53da59

Conversation

@cryptopoly
Copy link
Copy Markdown
Owner

…ound job)

Adds a self-contained path for users with a working CUDA torch install to upgrade to a newer wheel without re-running the full 2.5 GB GPU bundle. Surfaces as a compact pill in the Image / Video Studio runtime banners when realGenerationAvailable AND the matching cu{N} pip index serves a newer wheel than the one on disk; silent otherwise.

Backend

  • _install_helpers.py: _extract_cuda_tag, _index_url_for_cuda_tag, _parse_version_triple, _classify_torch_upgrade, _query_latest_torch_version (pip index versions parser, both output shapes), _abi_dependents_present, _move_torch_to_rollback + _restore_torch_from_rollback + _cleanup_old_torch_rollbacks, _TORCH_ABI_DEPENDENT_PACKAGES constant.
  • routes/setup/torch_upgrade.py: GET /api/setup/torch-upgrade-available (synchronous detection, returns {available, current, latest, upgradeType, rebuildPackages, indexUrl} or {available: false, reason}) and POST /api/setup/upgrade-torch (background job mirroring install-gpu-bundle pattern). Worker moves existing torch to .torch-rollback-<version>/ instead of purging, installs target from the same cu{N} index, re-pins constraint, force-reinstalls ABI deps on minor/major bumps (bitsandbytes/torchao/nunchaku/sageattention), verifies CUDA in a subprocess, restores rollback on verify failure, keeps the most recent rollback as a safety net.

Frontend

  • src/api/setup.ts: checkTorchUpgradeAvailable, startTorchUpgrade, getTorchUpgradeStatus + 4 exported types. Re-exported via src/api/index.ts.
  • src/components/TorchUpgradePill.tsx: one-shot probe on mount, hides when available: false, three display states (available / in-progress / done-or-error), polls status at 1.5 Hz with cleanup keyed by job.done, inline collapsible install log with phase-named markers. Restart Backend hook plumbed through.
  • src/styles.css: color-coded badges per upgrade type (patch=green / minor=amber / major=red).
  • Wired into ImageStudioRuntimeBanner + VideoStudioRuntimeBanner; renders only when realGenerationAvailable so users with broken torch are not second-guessed.

Tests

  • 24 new tests in tests/test_setup_routes.py covering every helper (version parsing edge cases including 2.6.0rc1 that caught a real bug in the first cut where digits across non-digit boundaries leaked into the parsed triple), both pip output shapes, rollback move/restore round-trip with simulated half-install in extras, cleanup mtime ordering, and all 8 detection-response shapes plus the apple-silicon rejection and running-job POST cases.

Drive-by: package-lock.json version field was lagging at 0.8.0 after the 0.9.0 bump; synced.

Verified: 24/24 new tests pass, 80/80 test_setup_routes.py pass, 217/217 setup + backend + services + inference pass, 371/371 vitest pass, tsc --noEmit clean. Pre-existing test_cache_strategies / test_sdcpp_* / test_preview_thumbnails failures verified to exist on baseline (Windows env + optional diffusers deps), unrelated.

…ound job)

Adds a self-contained path for users with a working CUDA torch install to
upgrade to a newer wheel without re-running the full 2.5 GB GPU bundle.
Surfaces as a compact pill in the Image / Video Studio runtime banners
when ``realGenerationAvailable`` AND the matching cu{N} pip index serves
a newer wheel than the one on disk; silent otherwise.

Backend
- ``_install_helpers.py``: ``_extract_cuda_tag``, ``_index_url_for_cuda_tag``,
  ``_parse_version_triple``, ``_classify_torch_upgrade``,
  ``_query_latest_torch_version`` (pip index versions parser, both output
  shapes), ``_abi_dependents_present``, ``_move_torch_to_rollback`` +
  ``_restore_torch_from_rollback`` + ``_cleanup_old_torch_rollbacks``,
  ``_TORCH_ABI_DEPENDENT_PACKAGES`` constant.
- ``routes/setup/torch_upgrade.py``: GET /api/setup/torch-upgrade-available
  (synchronous detection, returns ``{available, current, latest,
  upgradeType, rebuildPackages, indexUrl}`` or ``{available: false,
  reason}``) and POST /api/setup/upgrade-torch (background job mirroring
  install-gpu-bundle pattern). Worker moves existing torch to
  ``.torch-rollback-<version>/`` instead of purging, installs target from
  the same cu{N} index, re-pins constraint, force-reinstalls ABI deps on
  minor/major bumps (bitsandbytes/torchao/nunchaku/sageattention),
  verifies CUDA in a subprocess, restores rollback on verify failure,
  keeps the most recent rollback as a safety net.

Frontend
- ``src/api/setup.ts``: ``checkTorchUpgradeAvailable``, ``startTorchUpgrade``,
  ``getTorchUpgradeStatus`` + 4 exported types. Re-exported via
  ``src/api/index.ts``.
- ``src/components/TorchUpgradePill.tsx``: one-shot probe on mount,
  hides when ``available: false``, three display states (available /
  in-progress / done-or-error), polls status at 1.5 Hz with cleanup
  keyed by ``job.done``, inline collapsible install log with
  phase-named markers. Restart Backend hook plumbed through.
- ``src/styles.css``: color-coded badges per upgrade type
  (patch=green / minor=amber / major=red).
- Wired into ``ImageStudioRuntimeBanner`` + ``VideoStudioRuntimeBanner``;
  renders only when ``realGenerationAvailable`` so users with broken
  torch are not second-guessed.

Tests
- 24 new tests in ``tests/test_setup_routes.py`` covering every helper
  (version parsing edge cases including ``2.6.0rc1`` that caught a real
  bug in the first cut where digits across non-digit boundaries leaked
  into the parsed triple), both pip output shapes, rollback move/restore
  round-trip with simulated half-install in extras, cleanup mtime
  ordering, and all 8 detection-response shapes plus the apple-silicon
  rejection and running-job POST cases.

Drive-by: package-lock.json version field was lagging at 0.8.0 after
the 0.9.0 bump; synced.

Verified: 24/24 new tests pass, 80/80 ``test_setup_routes.py`` pass,
217/217 setup + backend + services + inference pass, 371/371 vitest
pass, ``tsc --noEmit`` clean. Pre-existing ``test_cache_strategies`` /
``test_sdcpp_*`` / ``test_preview_thumbnails`` failures verified to
exist on baseline (Windows env + optional diffusers deps), unrelated.
Both branches added a new setup submodule + API client surface at the
same insertion points:

  staging:  routes/setup/mtplx.py + Mtplx{Attempt,JobState,Status}
  this PR:  routes/setup/torch_upgrade.py + TorchUpgrade{Availability,
            Attempt,JobState,Type,UnavailableReason}

Resolution keeps both — order in routes/setup/__init__.py is now
alphabetical (longlive → mtplx → torch_upgrade → turbo → wan_install),
both routers register. In src/api/setup.ts the MTPLX block goes first
(it landed on staging first) and the Torch upgrade block follows;
src/api/index.ts re-exports both, alphabetised.

Verified post-merge: 80/80 test_setup_routes pass, 371/371 vitest
pass, ``tsc --noEmit`` clean, and both routers register the expected
paths (``/api/setup/install-mtplx`` + ``/api/setup/upgrade-torch``).
@cryptopoly cryptopoly merged commit a3de63e into staging May 17, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant