Fix ROCm GPU arch detection: prefer torch device properties, robust hipInfo.exe lookup on Windows by danielhanchen · Pull Request #1969 · bitsandbytes-foundation/bitsandbytes

danielhanchen · 2026-06-10T03:37:01Z

Problem

On Windows ROCm setups, get_rocm_gpu_arch() probes hipinfo.exe with subprocess.run(["hipinfo.exe"], ...), which resolves through PATH only. In practice hipInfo.exe is rarely reachable that way:

Hosts without the HIP SDK don't have it on PATH at all.
AMD's PyTorch wheels for Windows ship hipInfo.exe into the environment's Scripts directory (next to python.exe), which is only on PATH while the venv is activated — not when the interpreter is invoked directly, via uv run, or embedded.
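The PR treats the bare-name lookup as PATH-only. Under that model, the second case depends on whether the directory holding python.exe is on PATH, and in an unactivated venv it usually is not. The sketch below checks exactly that. `interpreter_dir_on_path` is a hypothetical helper, not part of bitsandbytes.

```python
import os
import sys
from pathlib import Path


def interpreter_dir_on_path(path_env=None, interpreter=None):
    """Return True if the directory containing the Python executable is listed on PATH.

    In a Windows venv that directory is Scripts, where AMD's wheels put hipInfo.exe.
    """
    if path_env is None:
        path_env = os.environ.get("PATH", "")
    if interpreter is None:
        interpreter = sys.executable
    # Normalize case and make paths absolute so "C:\\Venv\\Scripts" matches "c:\\venv\\scripts".
    target = os.path.normcase(os.path.abspath(Path(interpreter).parent))
    for entry in path_env.split(os.pathsep):
        if entry and os.path.normcase(os.path.abspath(entry)) == target:
            return True
    return False
```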

The probe then raises FileNotFoundError, and every import of bitsandbytes logs:

ERROR:bitsandbytes.cuda_specs:Could not detect ROCm GPU architecture: [WinError 2] The system cannot find the file specified
WARNING:bitsandbytes.cuda_specs:
ROCm GPU architecture detection failed despite ROCm being available.

while ROCM_GPU_ARCH silently degrades to "unknown" — even though the GPU works fine.

Fix

Prefer torch.cuda.get_device_properties(0).gcnArchName — torch already knows the architecture on both Linux and Windows, with no subprocess at all. Feature-flag suffixes (e.g. gfx90a:sramecc+:xnack-) are stripped to keep the existing "gfx..." format. This introduces no new device initialization: importing bitsandbytes already initializes the device context in cextension.py via get_cuda_specs() → torch.cuda.get_device_capability().
Keep the rocminfo / hipInfo.exe parsing as a fallback, and on Windows additionally try hipInfo.exe next to python.exe (where AMD's wheels place it) before giving up.
The ERROR/WARNING logging is preserved for genuine failures; behavior on non-ROCm builds is unchanged.
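Here is a condensed sketch of that detection order. It assumes a ROCm build of torch when torch is installed and is not the code in this PR. The function names are hypothetical, the Linux rocminfo fallback is omitted, and the `gcnArchName:` line parsed from hipInfo.exe output is an assumed format.

```python
import logging
import os
import re
import shutil
import subprocess
import sys
from pathlib import Path

logger = logging.getLogger(__name__)


def normalize_arch(name):
    """Drop feature-flag suffixes, e.g. "gfx90a:sramecc+:xnack-" -> "gfx90a"."""
    base = name.split(":", 1)[0].strip()
    return base if base.startswith("gfx") else None


def arch_from_torch():
    """Preferred path: read the arch from torch's device properties, no subprocess."""
    try:
        import torch
    except ImportError:
        return None
    # torch.version.hip is None on non-ROCm builds.
    if getattr(torch.version, "hip", None) is None or not torch.cuda.is_available():
        return None
    name = getattr(torch.cuda.get_device_properties(0), "gcnArchName", "") or ""
    return normalize_arch(name)


def find_hipinfo():
    """Try PATH first, then hipInfo.exe next to python.exe (where AMD's Windows wheels put it)."""
    found = shutil.which("hipinfo.exe")
    if found:
        return found
    candidate = Path(sys.executable).parent / "hipInfo.exe"
    return str(candidate) if candidate.is_file() else None


def arch_from_hipinfo():
    """Fallback path: run hipInfo.exe by absolute path and parse its output."""
    exe = find_hipinfo()
    if exe is None:
        return None
    result = subprocess.run([exe], capture_output=True, text=True, timeout=30, check=True)
    # Assumed output format: a line such as "gcnArchName:   gfx1151".
    match = re.search(r"gcnArchName:\s*(\S+)", result.stdout)
    return normalize_arch(match.group(1)) if match else None


def get_gpu_arch():
    """Torch first, hipInfo.exe fallback on Windows; log only genuine failures."""
    arch = arch_from_torch()
    if arch:
        return arch
    if os.name != "nt":
        return "unknown"  # the Linux rocminfo fallback is omitted from this sketch
    try:
        arch = arch_from_hipinfo()
    except (OSError, subprocess.SubprocessError) as exc:
        logger.error("Could not detect ROCm GPU architecture: %s", exc)
        return "unknown"
    if arch is None:
        logger.warning("ROCm GPU architecture detection failed.")
        return "unknown"
    return arch
```

Querying torch first means the common case never spawns a process. The subprocess path runs only when torch cannot report the architecture. The sketch is meant to be called only once a ROCm build has been detected, matching the PR's note that non-ROCm behavior is unchanged.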

Validation

On a Strix Halo (gfx1151) Windows 11 machine with AMD's wheels (torch 2.11.0+rocm7.13.0) and the venv not activated, so Scripts is not on PATH and shutil.which("hipinfo.exe") returns None:

Before: ROCM_GPU_ARCH == "unknown" plus the ERROR/WARNING above on import.
After: the torch-properties path returns gfx1151 with no log output. Forcing the subprocess fallback (by mocking torch.cuda.is_available to return False) also returns gfx1151, via the hipInfo.exe in Scripts.

Added two mocked unit tests that run on any backend. On the ROCm box:

tests/test_cuda_setup_evaluator.py: 6 passed, 4 skipped (CUDA-only)
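Tests like these can run on any backend if torch is passed in as a parameter and replaced with a unittest.mock stand-in. The following is a sketch of that pattern, not the two tests added in this PR. `arch_from_torch_module` and `make_fake_torch` are hypothetical names.

```python
from types import SimpleNamespace
from unittest import mock


def arch_from_torch_module(torch_mod):
    """Hypothetical helper that takes torch as a parameter so tests can inject a fake."""
    if getattr(torch_mod.version, "hip", None) is None:
        return None  # not a ROCm build
    if not torch_mod.cuda.is_available():
        return None
    name = getattr(torch_mod.cuda.get_device_properties(0), "gcnArchName", "") or ""
    base = name.split(":", 1)[0]
    return base if base.startswith("gfx") else None


def make_fake_torch(arch_name, available=True, hip="placeholder-hip-version"):
    """Build a MagicMock standing in for torch; any non-None hip string marks a ROCm build."""
    fake = mock.MagicMock()
    fake.version.hip = hip
    fake.cuda.is_available.return_value = available
    fake.cuda.get_device_properties.return_value = SimpleNamespace(gcnArchName=arch_name)
    return fake


def test_torch_path_strips_feature_flags():
    fake = make_fake_torch("gfx90a:sramecc+:xnack-")
    assert arch_from_torch_module(fake) == "gfx90a"
    fake.cuda.get_device_properties.assert_called_once_with(0)


def test_unavailable_device_falls_through():
    # Mirrors forcing the subprocess fallback by making is_available() return False.
    fake = make_fake_torch("gfx1151", available=False)
    assert arch_from_torch_module(fake) is None
```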

All pre-commit hooks pass on the changed files.

Context

#1846 ("Adding support for building for AMD on Windows") introduced the Windows hipinfo.exe probe, superseding #1843 ("fix ROCm GPU architecture detection failed on windows") and #1833 ("Update cuda_specs.py"), which reported this same failure mode. #1915 ("[ROCm] Windows workflow for creating wheels with ROCm 7.2.1 support") added the Windows ROCm wheels.
Downstream, Unsloth currently works around this by prepending the venv Scripts directory to PATH before importing bitsandbytes (unslothai/unsloth#5940, "Windows/WSL installer: fix winget msstore cert failure, amd-smi DiskPart prompt, and enable AMD GPU (Strix Halo gfx1151)", commit f48fc9a). This PR fixes it at the source so downstream consumers no longer need that workaround.
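A PATH-prepend workaround of that kind looks roughly like the sketch below. It is only an illustration, not the code from unslothai/unsloth#5940. `prepend_interpreter_dir_to_path` is a hypothetical name, and the sketch assumes python.exe sits in the venv's Scripts directory.

```python
import os
import sys
from pathlib import Path


def prepend_interpreter_dir_to_path(environ=None, interpreter=None):
    """Hypothetical helper: put the interpreter's directory (venv Scripts) first on PATH.

    Idempotent: does nothing if the directory is already listed.
    """
    if environ is None:
        environ = os.environ
    scripts = str(Path(interpreter or sys.executable).parent)
    current = environ.get("PATH", "")
    entries = current.split(os.pathsep) if current else []
    if scripts not in entries:
        environ["PATH"] = os.pathsep.join([scripts, *entries])
    return environ["PATH"]
```

This has to run before bitsandbytes is first imported, because detection happens at import time. Requiring every consumer to get that ordering right is the reason to fix the lookup in bitsandbytes itself.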

🤖 Generated with Claude Code

On Windows, get_rocm_gpu_arch() probed hipinfo.exe via PATH only. In practice hipInfo.exe is rarely on PATH: hosts without the HIP SDK do not have it there, and AMD's PyTorch wheels ship hipInfo.exe into the environment's Scripts directory, which is only on PATH when the venv is activated. The probe then raises FileNotFoundError, every import of bitsandbytes logs an ERROR + WARNING, and ROCM_GPU_ARCH silently degrades to unknown. Read torch.cuda.get_device_properties(0).gcnArchName first (works on Linux and Windows, no subprocess); keep the rocminfo / hipInfo.exe parsing as a fallback, additionally trying hipInfo.exe next to python.exe on Windows before giving up. Verified on gfx1151 (Strix Halo, Windows 11, torch 2.11.0+rocm7.13.0): previously unknown + ERROR; now gfx1151 via both the torch path and the forced subprocess fallback. Co-Authored-By: Claude Fable 5 <noreply@anthropic.com>

danielhanchen · 2026-06-10T03:52:40Z

Closing this for now.

matthewdouglas · 2026-06-10T15:39:05Z

@danielhanchen This seems like a reasonable change - what's the reason for closing?

* Fix bitsandbytes ROCm GPU arch and warp size detection on Windows

bitsandbytes resolves the ROCm GPU architecture (and warp size on 0.49.x) by shelling out to rocminfo / hipinfo.exe via PATH at import time. On Windows neither tool is normally on PATH (AMD torch wheels ship hipInfo.exe into the venv Scripts dir, only on PATH while activated), so every `import bitsandbytes` logs an ERROR and WARNING, ROCM_GPU_ARCH degrades to unknown, and the 0.49.x warp size defaults to 64, which is wrong on RDNA (wave 32) and silently disables pre-quantized 4-bit models via ALLOW_PREQUANTIZED_MODELS.

Install a one-shot MetaPathFinder before unsloth_zoo is imported (the first bitsandbytes import on ROCm) that swaps get_rocm_gpu_arch and get_rocm_warpsize for torch-device-properties-first implementations right after bitsandbytes.cuda_specs executes, before cextension reads them. Falls back to running hipInfo.exe by absolute path (venv Scripts, conda Scripts, HIP SDK / AMD installer dirs). Repairs the constants in place when bitsandbytes was imported first. Strict no-op on non-Windows, non-ROCm builds, missing bitsandbytes, and versions that fix this upstream. Opt out with UNSLOTH_DISABLE_BNB_ROCM_FIX=1.

Proposed upstream in bitsandbytes-foundation/bitsandbytes#1969; shipped here so all bitsandbytes versions are covered. Verified on gfx1151 Strix Halo, Windows 11, torch 2.11.0+rocm7.13.0 against bitsandbytes main, 0.49.2, and a torch-props-fixed variant.

* [pre-commit.ci] auto fixes from pre-commit.com hooks

for more information, see https://pre-commit.ci

* Tighten comments in the bitsandbytes ROCm detection fix

Comment and docstring pass only. AST comparison with docstrings stripped confirms every definition is identical to the version the 12 scenario suite ran against, and the suite plus the drift test pass unchanged on the edited files.

* Keep the bitsandbytes cuda_specs finder installed for reload support

Simulation testing caught a regression in the one-shot design: importlib.reload(bitsandbytes.cuda_specs) re-resolves the spec through sys.meta_path, so with the finder already removed the reload reinstalled the unpatched upstream detector and the Windows ROCm noise returned. Keep the finder on sys.meta_path permanently, matching the lifecycle of the existing causal_conv1d and vllm import blockers. The finder matches a single module name and patching stays idempotent via the sentinel flags, so repeat hits are no-ops.

Validated on gfx1151 Windows 11: 22 simulation scenarios (conda and embedded layouts, Program Files scan ordering, paths with spaces and unicode, hanging probe timeout, lru-wrapped and C-function helper shapes, reload, failed-import retry, threads, spawn, dormant finder, Studio PATH coexistence, early fix-block ordering, bnb 0.45.5 / 0.47.0 / 0.49.2 / main / upstream-fixed) plus the original 12 scenario suite, CPU-torch and stale-HIP_PATH sandboxes, Python 3.10 to 3.13 gates, and a WSL Linux leg proving byte-identical Linux behavior with and without the fix, with and without rocminfo on PATH.

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>

danielhanchen closed this Jun 10, 2026

danielhanchen mentioned this pull request Jun 10, 2026 in unslothai/unsloth#6127, "Fix bitsandbytes ROCm GPU arch and warp size detection on Windows" (Merged)

danielhanchen wants to merge 1 commit into bitsandbytes-foundation:main from danielhanchen:fix/windows-rocm-arch-detection