Enable mgpu in FrameView #5514
Greptile Summary
This PR removes the
Confidence Score: 4/5
Safe to merge after the The core Fabric device-allowlist removal is straightforward and the cuda:0 path is unaffected. The blocking concern is that
Important Files Changed
Sequence Diagram
sequenceDiagram
participant Caller
participant FabricFrameView
participant USDRTSelectPrims
participant WarpKernel
participant UsdFrameView
Caller->>FabricFrameView: "__init__(device=cuda:N)"
Note over FabricFrameView: No device allowlist check (removed)
Caller->>FabricFrameView: set_world_poses(positions)
alt Fabric enabled
FabricFrameView->>USDRTSelectPrims: "SelectPrims(device=cuda:N)"
FabricFrameView->>WarpKernel: launch(compose_fabric_transformation)
FabricFrameView->>FabricFrameView: _prepare_for_reuse()
else Fabric disabled
FabricFrameView->>UsdFrameView: set_world_poses(...)
end
Caller->>FabricFrameView: get_scales()
alt Fabric enabled
FabricFrameView->>WarpKernel: launch(decompose_fabric_transformation)
FabricFrameView-->>Caller: wp.array (raw — no ProxyArray wrap)
else Fabric disabled
FabricFrameView->>UsdFrameView: get_scales()
FabricFrameView-->>Caller: result
end
Caller->>FabricFrameView: get_world_poses()
alt Fabric enabled
FabricFrameView->>WarpKernel: launch(decompose_fabric_transformation)
FabricFrameView-->>Caller: ProxyArray(positions), ProxyArray(orientations)
end
Reviews (6): Last reviewed commit: "Split FabricFrameView multi-GPU tests in..."
- Allow FabricFrameView to run on cuda:N for any N; USDRT SelectPrims no longer needs cuda:0.
- Refactor the Fabric write path into a single _compose_fabric_transform helper shared by set_world_poses, set_scales, and the initial USD->Fabric sync, collapsing the sync to one kernel launch with one PrepareForReuse.
- Replace the topology-invariant assert with RuntimeError so it survives python -O.
- Add multi_gpu pytest marker plus cuda:1 unit-test coverage for both Fabric write paths, and run them in the existing test-multi-gpu CI job (one extra step, no new job).
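The reasoning behind the assert-to-RuntimeError change can be sketched as follows. This is a minimal illustration, not the code in fabric_frame_view.py: the function name and the prim-count comparison are hypothetical. The only point is that `assert` statements are compiled out under `python -O`, while an explicit `raise` is not.

```python
def check_topology_unchanged(expected_count: int, actual_count: int) -> None:
    """Hypothetical invariant check; names are illustrative only."""
    # `assert expected_count == actual_count` would be removed when Python
    # runs with -O, which silently disables the check. An explicit raise
    # stays active in every mode.
    if expected_count != actual_count:
        raise RuntimeError(
            f"Fabric topology changed: expected {expected_count} prims, "
            f"got {actual_count}."
        )
```

Calling `check_topology_unchanged(3, 4)` raises regardless of the interpreter's optimization flags.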
The standard pytest invocation in CI runs the fabric test file without filtering on the ``multi_gpu`` marker, so the ``cuda:1`` tests get scheduled on every runner including the single-GPU ones. Previously ``_skip_if_unavailable`` hard-failed via ``pytest.fail`` whenever ``GITHUB_ACTIONS=true`` and the requested device was missing, on the theory that this would catch a misconfigured multi-GPU runner. In practice it just broke the standard CI: the dedicated ``test-fabric-multi-gpu`` workflow already pre-flights ``torch.cuda.device_count() >= 2`` before invoking pytest, so a genuinely misconfigured multi-GPU runner is already caught there. Always skip rather than fail when the requested ``cuda:N`` index isn't available. Drop the now-unused ``import os``.
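A rough sketch of the "always skip" decision described above, assuming the helper only needs the requested device string and the number of visible CUDA devices. The function name `unavailable_reason` and its signature are hypothetical; the real `_skip_if_unavailable` body is not shown in this conversation.

```python
from typing import Optional


def unavailable_reason(device: str, cuda_device_count: int) -> Optional[str]:
    """Return a skip reason if `device` cannot be used, else None."""
    if not device.startswith("cuda"):
        # Non-CUDA devices (e.g. cpu) are treated as always available here.
        return None
    # A bare "cuda" with no index is treated as cuda:0 in this sketch.
    index = int(device.split(":", 1)[1]) if ":" in device else 0
    if index >= cuda_device_count:
        return (
            f"{device} requested but only {cuda_device_count} "
            "CUDA device(s) visible"
        )
    return None
```

In a test, the caller would presumably pass `torch.cuda.device_count()` and call `pytest.skip(reason)` when a reason is returned, rather than `pytest.fail`, so single-GPU runners report a skip instead of a failure.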
Kit's CLI parser reads sys.argv directly at startup and segfaults on
pytest flags that collide with its own short options. Running
pytest -m multi_gpu source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py
crashes during collection because Kit sees ``-m multi_gpu`` and exits
with ``Ill formed parameter: -m`` followed by SIGSEGV (exit code 245)
inside ``simulation_app._start_app``.
Strip sys.argv to argv[0] before instantiating AppLauncher. The test
file takes no CLI arguments of its own, mirroring the broader pattern
used by ``test_tiled_camera_env.py`` which assigns
``sys.argv[1:] = args_cli.unittest_args`` after argparse.
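The argv strip described above could look roughly like this. The helper name `strip_argv_for_kit` is hypothetical and AppLauncher itself is deliberately not shown; the sketch only demonstrates reducing `sys.argv` to `argv[0]` before any code that reads it at startup.

```python
import sys
from typing import List


def strip_argv_for_kit() -> List[str]:
    """Keep only argv[0]; return the removed arguments (hypothetical helper)."""
    removed = sys.argv[1:]
    # Slice-assign so the existing list object is mutated in place and any
    # other reference to sys.argv observes the stripped form.
    sys.argv[1:] = []
    return removed


# The launcher would be instantiated only after this call (not shown), so
# pytest flags such as "-m multi_gpu" never reach Kit's CLI parser.
```

Returning the removed arguments is a design choice of this sketch; it lets a caller restore or inspect them later if needed.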
wp.to_torch on a ProxyArray is deprecated in favor of the .torch accessor. Switch the three call sites that consume the ProxyArray returned by get_world_poses; leave get_scales call sites alone since that method still returns a raw wp.array (no .torch accessor).
- Add a GPU-count pre-flight step to the test-fabric-multi-gpu CI job so a runner regression to a single GPU fails the workflow instead of silently skipping every cuda:1 test. This is what the comment in _skip_if_unavailable already promised existed.
- Note that the sys.argv strip in test_views_xform_prim_fabric.py must stay between the AppLauncher import and its instantiation; any CLI parser or reordering re-exposes Kit to pytest argv and segfaults at startup.
- Document the _fabric_usd_sync_done side effect on _compose_fabric_transform so callers can see why subsequent getters stop pulling from USD.
🤖 Isaac Lab Review Bot — Updated Review (4f262aa)
Commit: 4f262aa6710b19679b5ab94015f0dde9a4fed38b
Previous review: 556b74b (workflow separation in progress)
📋 What Changed Since Last Review
Commit 4f262aa finalizes the workflow separation with a clean split:
| Change | Description |
|---|---|
| test-fabric-multi-gpu.yaml | ✅ New dedicated workflow (60 lines): self-contained CI for Fabric tests |
| test-multi-gpu.yaml | ✅ Restored to upstream/develop (removed Fabric test job) |
| fabric_frame_view.py | Minor: relocated TODO comments |
| changelog.d/*.rst | Simplified wording |
| test_views_xform_prim_fabric.py | Style cleanup only |
Key improvement: Complete workflow separation. FabricFrameView changes now trigger only test-fabric-multi-gpu.yaml (via path filter), while test-multi-gpu.yaml returns to its upstream state for distributed-training validation. The two workflows are completely decoupled.
✅ Full PR Summary
This PR removes the cuda:0-only restriction from FabricFrameView, enabling Fabric GPU acceleration on any CUDA device. This unblocks distributed training where each rank is pinned to a non-primary GPU (e.g., cuda:1).
🔍 Code Review
Architecture:
- ✅ Clean removal of _fabric_supported_devices allowlist and associated guards
- ✅ Minimal, surgical change; core write paths unchanged
- ✅ Well-scoped TODO comments reference follow-up PRs (#5673, #5674)
- ✅ Docstrings updated to reflect multi-GPU support
Error Handling:
- ✅ RuntimeError replaces assert for topology-change invariant (survives python -O)
- ✅ _skip_if_unavailable() gracefully skips tests on single-GPU runners
Test Coverage:
- ✅ Three cuda:1-parameterized tests: roundtrip poses, no-writeback, scales roundtrip
- ✅ New multi_gpu pytest marker registered in pyproject.toml
- ✅ Kit argv stripping prevents segfault from pytest flags
- ✅ Uses .torch accessor instead of deprecated wp.to_torch()
CI Design:
- ✅ test-fabric-multi-gpu.yaml: dedicated 60-line workflow with GPU pre-flight
- ✅ Path-filtered triggers: only runs on changes to FabricFrameView or its tests
- ✅ Runner pre-flight (torch.cuda.device_count() >= 2) fails loudly if misconfigured
- ✅ test-multi-gpu.yaml byte-identical to upstream/develop; zero risk of regression
🚦 CI Status
| Check | Status |
|---|---|
| Build Wheel | ✅ Pass |
| changelog fragments | ✅ Pass |
| Broken Links | ✅ Pass |
| labeler | ✅ Pass |
| pre-commit | ⏳ Pending |
| FabricFrameView multi-GPU tests | ⏳ Pending (self-hosted runner) |
| license-check | ⏳ Pending |
| Installation Tests | ⏳ Pending |
| Docs | ⏳ Pending |
📝 Verdict
LGTM — ready for human review. The workflow separation is now complete and clean. Core functionality is well-tested with appropriate CI coverage. The change is low-risk: cuda:0 path is unchanged, and the new cuda:1 tests provide explicit coverage. Once the pending CI checks pass (particularly the multi-GPU tests on the self-hosted runner), this PR is ready to merge.
Reviewed at: 2026-05-20T16:25 UTC
🤖 Isaac Lab Review Bot — Updated Review (a039313)
Commit: a03931317a2be2486357370d03326e9c398f86ef
Previous review: c42ec86 → Now fully finalized
📋 What Changed Since Last Review
Two commits since c42ec86:
| Commit | Change |
|---|---|
| 60d99b86 | Split FabricFrameView multi-GPU tests into dedicated workflow |
| a0393131 | Disable multi-GPU workflow (no self-hosted runner available) |
Key changes:
- Workflow separation complete: test-fabric-multi-gpu.yaml is now a standalone workflow file, completely decoupled from test-multi-gpu.yaml
- Workflow disabled: multi-GPU workflow set to workflow_dispatch only, with a clear comment explaining why (no multi-GPU runner provisioned yet)
- Pytest marker removed: multi_gpu marker dropped from pyproject.toml; tests now use an ISAACLAB_TEST_MULTI_GPU=1 env var gate via @pytest.mark.skipif
- Changelog simplified: removed extraneous "Changed" section, now focuses purely on the core fix
- Code cleanup: removed redundant comments in fabric_frame_view.py, uses torch.as_tensor() consistently in tests
✅ Full PR Summary
This PR removes the cuda:0-only restriction from FabricFrameView, enabling Fabric GPU acceleration on any CUDA device (cuda:0, cuda:1, etc.). This unblocks distributed training where each rank is pinned to a non-primary GPU.
🔍 Code Review
Architecture:
- ✅ Clean removal of _fabric_supported_devices allowlist
- ✅ Minimal, surgical change; core Warp kernel paths unchanged
- ✅ TODO comments reference follow-up PRs (#5673, #5674)
- ✅ Docstrings updated for multi-GPU support
Test Coverage:
- ✅ Three cuda:1-parameterized tests gated by ISAACLAB_TEST_MULTI_GPU=1
- ✅ _skip_if_unavailable() gracefully skips on single-GPU runners
- ✅ Uses torch.as_tensor() for Warp→Torch conversion (consistent API)
CI Design:
- ✅ test-fabric-multi-gpu.yaml: dedicated workflow (disabled until runner available)
- ✅ Path-filtered triggers ready for when runner is provisioned
- ✅ test-multi-gpu.yaml returned to upstream state (zero diff risk)
- ✅ Env var gating avoids pytest marker complexity
🚦 CI Status
| Check | Status |
|---|---|
| pre-commit | ✅ Pass |
| Check changelog fragments | ✅ Pass |
| Build Wheel | ✅ Pass |
| Check for Broken Links | ✅ Pass |
| Detect Changes | ✅ Pass |
| labeler | ✅ Pass |
| Load Config | ✅ Pass |
| Installation Tests | ⏳ Pending |
| Build Latest Docs | ⏳ Pending |
| license-check | ⏳ Pending |
📝 Verdict
LGTM — ready for human review. The workflow separation is clean and complete. The PR is low-risk:
- cuda:0 behavior unchanged
- New cuda:1 tests provide explicit coverage (will run when multi-GPU runner is provisioned)
- Multi-GPU workflow correctly disabled to avoid runs queueing indefinitely
Once CI passes, this is ready to merge.
Reviewed at: 2026-05-20T19:34 UTC
Move the test-fabric-multi-gpu job out of test-multi-gpu.yaml and into a dedicated test-fabric-multi-gpu.yaml. The two workflows share the same runner label, install step, and GPU pre-flight, but trigger on disjoint path sets so changes to FabricFrameView no longer gate the distributed-training validation and vice versa. test-multi-gpu.yaml is now byte-identical to upstream/develop.
No self-hosted runner with the 'multi-gpu' label is registered. All runs queue indefinitely. Kept as workflow_dispatch only so it can be manually triggered once a runner is provisioned. See also .github/workflows/test-multi-gpu.yaml (same issue).
The three cuda:1-parameterised tests in test_views_xform_prim_fabric.py were added by PR isaac-sim#5514 to validate FabricFrameView's SelectPrims path on non-zero CUDA devices. They currently hang indefinitely on real multi-GPU hardware (reproduced locally on 3x RTX 6000 Pro Blackwell and on the [self-hosted, ..., multi-gpu] runner pool). Flipping ISAACLAB_TEST_MULTI_GPU=1 in this workflow runs them as intended. The 25-min workflow timeout will cancel the job, surfacing the hang in CI so the FabricFrameView maintainers can iterate on a fix. Land this PR once the hang is resolved.
Re-enables the pull_request trigger in test-fabric-multi-gpu.yaml and wires it to run the FabricFrameView contract tests (including the three cuda:1-parameterised variants added in isaac-sim#5514) inside the pre-built Isaac Lab Docker image on the [self-hosted, ..., multi-gpu] runner pool.
Setup:
- Image: nvcr.io/nvidian/isaac-lab:latest-develop (published by publish-images.yaml on every develop push, bundles Isaac Sim + Isaac Lab). Pulled with --platform linux/amd64 to sidestep a multi-arch manifest issue.
- ISAACLAB_TEST_MULTI_GPU=1 enables the cuda:1 tests.
- Workspace mounted + reinstalled --no-deps editable so PR source overrides the baked-in copy.
Status: this PR is expected to fail with the 25-min workflow timeout. It surfaces the FabricFrameView SelectPrims hang on non-zero CUDA device indices (reproduced locally on 3x RTX 6000 Pro Blackwell and on the multi-GPU runner pool). Land this PR once the underlying hang in fabric_frame_view.py is fixed.
Re-enables the pull_request trigger on test-fabric-multi-gpu.yaml and wires it to run the FabricFrameView contract tests with ISAACLAB_TEST_MULTI_GPU=1, which activates the three cuda:1-parameterised tests added in isaac-sim#5514.
The cuda:1 tests target FabricFrameView's SelectPrims path on non-zero CUDA device indices. They currently hang indefinitely on real multi-GPU hardware (reproduced locally on 3x RTX 6000 Pro Blackwell and on the multi-GPU runner pool); the 60-min workflow timeout will cancel the job and surface the regression in CI for the FabricFrameView maintainers.
Install pipeline matches isaac-sim#5738's proven-working layout:
- Pin Python 3.12 via SHA-pinned actions/setup-python.
- Pre-install cmake via pip to skip install.py's sudo apt-get branch.
- ./isaaclab.sh --install none (core only, avoids egl_probe libEGL).
- pip install isaacsim[all,extscache]==${vars.ISAACSIM_BASE_VERSION || '6.0.0'} --extra-index-url https://pypi.nvidia.com.
- Bypass Kit's interactive EULA via OMNI_KIT_ACCEPT_EULA / ACCEPT_EULA / ISAAC_SIM_HEADLESS.
Status: this PR is expected to fail with the 60-min workflow timeout. Land once the underlying hang in fabric_frame_view.py is fixed.
Adds a single helper, cuda_test_devices(), that converts a 3-position
device mask (env-var ISAACLAB_TEST_DEVICES, default '110') into the
list of device strings tests parametrize over. Single-GPU CI sees no
change (default mask '110' resolves to [cpu, cuda:0], identical to the
hardcoded lists tests carry today). The new multi-GPU-pytest workflow
sets ISAACLAB_TEST_DEVICES=001 so migrated tests run on cuda:1 only.
Mask grammar: each position is 0 or 1, optional trailing X expands to
all remaining positions. Position 0 -> cpu; position k>=1 -> cuda:{k-1}.
Strict mode raises on missing devices; non-strict returns empty for
opt-in tests that should skip on hosts that can't satisfy them.
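The mask grammar above can be sketched as follows. This is an illustrative reading, not the actual cuda_test_devices() implementation: the function name `devices_from_mask`, the explicit `cuda_count` parameter, and the exception types are hypothetical. It also assumes that a trailing X enables every remaining position up to the number of visible CUDA devices, and that non-strict mode returns an empty list as soon as any requested device is missing.

```python
from typing import List


def devices_from_mask(mask: str, cuda_count: int, strict: bool = True) -> List[str]:
    """Resolve a device mask such as '110', '001' or '1X' to device strings.

    Position 0 maps to cpu; position k >= 1 maps to cuda:{k-1}.
    """
    expand = mask.endswith("X")
    bits = mask[:-1] if expand else mask
    if not mask or any(c not in "01" for c in bits):
        raise ValueError(f"invalid device mask: {mask!r}")

    devices = []
    for pos, bit in enumerate(bits):
        if bit == "1":
            devices.append("cpu" if pos == 0 else f"cuda:{pos - 1}")

    if expand:
        # ASSUMPTION: 'X' turns on all remaining positions that exist on this host.
        for pos in range(len(bits), cuda_count + 1):
            devices.append("cpu" if pos == 0 else f"cuda:{pos - 1}")

    missing = [
        d for d in devices
        if d != "cpu" and int(d.split(":")[1]) >= cuda_count
    ]
    if missing:
        if strict:
            raise RuntimeError(f"requested devices not available: {missing}")
        return []  # non-strict: let opt-in tests skip
    return devices
```

With one visible GPU, the default mask '110' resolves to cpu and cuda:0, matching the hardcoded lists mentioned above; '001' on a two-GPU host resolves to cuda:1 only. A caller would typically read the mask with `os.environ.get("ISAACLAB_TEST_DEVICES", "110")`.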
P0 migration (pure-Python utility tests, no Kit):
* source/isaaclab/test/utils/test_math.py: 45 parametrize sites +
2 inline for-loops migrated.
* source/isaaclab/test/utils/test_wrench_composer.py: 37 sites.
* source/isaaclab/test/utils/test_episode_data.py: 5 sites.
Each migrated site replaces a hardcoded [cpu, cuda:0] (or the reversed
or tuple form) with cuda_test_devices(). Migration is additive - one
import line per file plus the inline edits. No test logic changes.
Workflow: .github/workflows/test-multi-gpu-pytest.yaml runs on the
[self-hosted, ..., multi-gpu] pool with ISAACLAB_TEST_DEVICES=001.
Triggered on changes to the helper, the P0 test files, or the
workflow itself.
Excluded scope (to follow up after CI validates this MVP):
* P1 light-Kit tests (test_simulation_context, test_views_xform_prim,
test_newton_model_utils, test_views_xform_prim_newton).
* P2 asset tests (test_articulation / test_rigid_object on physx and
newton backends).
* FabricFrameView cuda:1 tests (PR isaac-sim#5514) - separate path, the
SelectPrims deadlock there is tracked independently.
Reverts the fabric-specific .github/workflows/test-fabric-multi-gpu.yaml
edits that were carried on this branch from the earlier PR scope; that
demo is independent of this framework work.
Description
Removes the cuda:0-only restriction in FabricFrameView. USDRT SelectPrims now accepts any CUDA device index, so Fabric acceleration runs on the simulation device (e.g., cuda:1) instead of silently falling back to the slower USD path. This unblocks distributed training where each process is pinned to a specific GPU.
Changes:
- Removed _fabric_supported_devices, the device guard in __init__, and the corresponding assertion in _initialize_fabric. Any CUDA device (or CPU) now works.
- Added cuda:1-parameterized tests gated by the ISAACLAB_TEST_MULTI_GPU=1 env var, plus a dedicated CI workflow on the multi-GPU runner that sets it.
- Replaced deprecated wp.to_torch() calls with the .torch accessor on ProxyArray (avoids DeprecationWarning).
Type of change
- cuda:0 continues to work exactly as before; cuda:1+ now also works instead of silently falling back to USD. No public API surface changed.
Checklist
- […] pre-commit checks with ./isaaclab.sh --format
- […] config/extension.toml file
- […] CONTRIBUTORS.md or my name already exists there
Test plan
Three new tests gated by ISAACLAB_TEST_MULTI_GPU=1 and parameterized with ["cuda:1"]:
- test_fabric_cuda1_world_pose_roundtrip: set_world_poses → get_world_poses returns the same values on a non-primary CUDA device.
- test_fabric_cuda1_no_usd_writeback: Fabric writes on cuda:1 do not write back to USD.
- test_fabric_cuda1_scales_roundtrip: covers the set_scales write path on cuda:1.
A dedicated CI workflow (test-fabric-multi-gpu.yaml) runs on the [self-hosted, linux, x64, gpu, multi-gpu] runner with ISAACLAB_TEST_MULTI_GPU=1 set. It pre-flights with nvidia-smi and torch.cuda.device_count() and fails loudly if the runner has < 2 GPUs.
To verify locally on a multi-GPU machine:
    ISAACLAB_TEST_MULTI_GPU=1 ./isaaclab.sh -p -m pytest \
        source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py -v
To verify the cuda:0 path is unchanged (multi-GPU tests auto-skip):
    ./isaaclab.sh -p -m pytest \
        source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py -v
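The env-var gate that makes the multi-GPU tests auto-skip might be sketched like this. The helper name `multi_gpu_tests_enabled` and its optional `env` parameter are hypothetical, and treating only the exact value "1" as enabled is an assumption; the actual skipif condition in the test file is not shown here.

```python
import os
from typing import Mapping, Optional


def multi_gpu_tests_enabled(env: Optional[Mapping[str, str]] = None) -> bool:
    """True only when ISAACLAB_TEST_MULTI_GPU is set to '1' (assumed convention)."""
    env = os.environ if env is None else env
    return env.get("ISAACLAB_TEST_MULTI_GPU", "0") == "1"
```

A test module could then decorate the cuda:1 tests with `@pytest.mark.skipif(not multi_gpu_tests_enabled(), reason="set ISAACLAB_TEST_MULTI_GPU=1 to run")`, so a plain pytest invocation skips them without needing a registered marker.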