feat: in-place torch upgrade with rollback (detection + pill + backgr…#56
Merged
Conversation
…ound job)
Adds a self-contained path for users with a working CUDA torch install to
upgrade to a newer wheel without re-running the full 2.5 GB GPU bundle.
Surfaces as a compact pill in the Image / Video Studio runtime banners
when ``realGenerationAvailable`` AND the matching cu{N} pip index serves
a newer wheel than the one on disk; silent otherwise.
Backend
- ``_install_helpers.py``: ``_extract_cuda_tag``, ``_index_url_for_cuda_tag``,
``_parse_version_triple``, ``_classify_torch_upgrade``,
``_query_latest_torch_version`` (pip index versions parser, both output
shapes), ``_abi_dependents_present``, ``_move_torch_to_rollback`` +
``_restore_torch_from_rollback`` + ``_cleanup_old_torch_rollbacks``,
``_TORCH_ABI_DEPENDENT_PACKAGES`` constant.
- ``routes/setup/torch_upgrade.py``: GET /api/setup/torch-upgrade-available
(synchronous detection, returns ``{available, current, latest,
upgradeType, rebuildPackages, indexUrl}`` or ``{available: false,
reason}``) and POST /api/setup/upgrade-torch (background job mirroring
install-gpu-bundle pattern). Worker moves existing torch to
``.torch-rollback-<version>/`` instead of purging, installs target from
the same cu{N} index, re-pins constraint, force-reinstalls ABI deps on
minor/major bumps (bitsandbytes/torchao/nunchaku/sageattention),
verifies CUDA in a subprocess, restores rollback on verify failure,
keeps the most recent rollback as a safety net.
Frontend
- ``src/api/setup.ts``: ``checkTorchUpgradeAvailable``, ``startTorchUpgrade``,
``getTorchUpgradeStatus`` + 4 exported types. Re-exported via
``src/api/index.ts``.
- ``src/components/TorchUpgradePill.tsx``: one-shot probe on mount,
hides when ``available: false``, three display states (available /
in-progress / done-or-error), polls status at 1.5 Hz with cleanup
keyed by ``job.done``, inline collapsible install log with
phase-named markers. Restart Backend hook plumbed through.
- ``src/styles.css``: color-coded badges per upgrade type
(patch=green / minor=amber / major=red).
- Wired into ``ImageStudioRuntimeBanner`` + ``VideoStudioRuntimeBanner``;
renders only when ``realGenerationAvailable`` so users with broken
torch are not second-guessed.
Tests
- 24 new tests in ``tests/test_setup_routes.py`` covering every helper
(version parsing edge cases including ``2.6.0rc1`` that caught a real
bug in the first cut where digits across non-digit boundaries leaked
into the parsed triple), both pip output shapes, rollback move/restore
round-trip with simulated half-install in extras, cleanup mtime
ordering, and all 8 detection-response shapes plus the apple-silicon
rejection and running-job POST cases.
Drive-by: package-lock.json version field was lagging at 0.8.0 after
the 0.9.0 bump; synced.
Verified: 24/24 new tests pass, 80/80 ``test_setup_routes.py`` pass,
217/217 setup + backend + services + inference pass, 371/371 vitest
pass, ``tsc --noEmit`` clean. Pre-existing ``test_cache_strategies`` /
``test_sdcpp_*`` / ``test_preview_thumbnails`` failures verified to
exist on baseline (Windows env + optional diffusers deps), unrelated.
Both branches added a new setup submodule + API client surface at the
same insertion points:
staging: routes/setup/mtplx.py + Mtplx{Attempt,JobState,Status}
this PR: routes/setup/torch_upgrade.py + TorchUpgrade{Availability,
Attempt,JobState,Type,UnavailableReason}
Resolution keeps both — order in routes/setup/__init__.py is now
alphabetical (longlive → mtplx → torch_upgrade → turbo → wan_install),
both routers register. In src/api/setup.ts the MTPLX block goes first
(it landed on staging first) and the Torch upgrade block follows;
src/api/index.ts re-exports both, alphabetised.
Verified post-merge: 80/80 test_setup_routes pass, 371/371 vitest
pass, ``tsc --noEmit`` clean, and both routers register the expected
paths (``/api/setup/install-mtplx`` + ``/api/setup/upgrade-torch``).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
…ound job)
Adds a self-contained path for users with a working CUDA torch install to upgrade to a newer wheel without re-running the full 2.5 GB GPU bundle. Surfaces as a compact pill in the Image / Video Studio runtime banners when
realGenerationAvailableAND the matching cu{N} pip index serves a newer wheel than the one on disk; silent otherwise.Backend
_install_helpers.py:_extract_cuda_tag,_index_url_for_cuda_tag,_parse_version_triple,_classify_torch_upgrade,_query_latest_torch_version(pip index versions parser, both output shapes),_abi_dependents_present,_move_torch_to_rollback+_restore_torch_from_rollback+_cleanup_old_torch_rollbacks,_TORCH_ABI_DEPENDENT_PACKAGESconstant.routes/setup/torch_upgrade.py: GET /api/setup/torch-upgrade-available (synchronous detection, returns{available, current, latest, upgradeType, rebuildPackages, indexUrl}or{available: false, reason}) and POST /api/setup/upgrade-torch (background job mirroring install-gpu-bundle pattern). Worker moves existing torch to.torch-rollback-<version>/instead of purging, installs target from the same cu{N} index, re-pins constraint, force-reinstalls ABI deps on minor/major bumps (bitsandbytes/torchao/nunchaku/sageattention), verifies CUDA in a subprocess, restores rollback on verify failure, keeps the most recent rollback as a safety net.Frontend
src/api/setup.ts:checkTorchUpgradeAvailable,startTorchUpgrade,getTorchUpgradeStatus+ 4 exported types. Re-exported viasrc/api/index.ts.src/components/TorchUpgradePill.tsx: one-shot probe on mount, hides whenavailable: false, three display states (available / in-progress / done-or-error), polls status at 1.5 Hz with cleanup keyed byjob.done, inline collapsible install log with phase-named markers. Restart Backend hook plumbed through.src/styles.css: color-coded badges per upgrade type (patch=green / minor=amber / major=red).ImageStudioRuntimeBanner+VideoStudioRuntimeBanner; renders only whenrealGenerationAvailableso users with broken torch are not second-guessed.Tests
tests/test_setup_routes.pycovering every helper (version parsing edge cases including2.6.0rc1that caught a real bug in the first cut where digits across non-digit boundaries leaked into the parsed triple), both pip output shapes, rollback move/restore round-trip with simulated half-install in extras, cleanup mtime ordering, and all 8 detection-response shapes plus the apple-silicon rejection and running-job POST cases.Drive-by: package-lock.json version field was lagging at 0.8.0 after the 0.9.0 bump; synced.
Verified: 24/24 new tests pass, 80/80
test_setup_routes.pypass, 217/217 setup + backend + services + inference pass, 371/371 vitest pass,tsc --noEmitclean. Pre-existingtest_cache_strategies/test_sdcpp_*/test_preview_thumbnailsfailures verified to exist on baseline (Windows env + optional diffusers deps), unrelated.