Skip to content

feat: Fabric-accelerated get/set_local_poses via indexedfabricarray#5677

Open
pv-nvidia wants to merge 38 commits into
isaac-sim:developfrom
pv-nvidia:pv/fabric-local-poses
Open

feat: Fabric-accelerated get/set_local_poses via indexedfabricarray#5677
pv-nvidia wants to merge 38 commits into
isaac-sim:developfrom
pv-nvidia:pv/fabric-local-poses

Conversation

@pv-nvidia
Copy link
Copy Markdown
Contributor

@pv-nvidia pv-nvidia commented May 18, 2026

feat: Fabric-accelerated get/set_local_poses via indexedfabricarray

Adds GPU-accelerated get_local_poses / set_local_poses (and the explicit
get/set_local_scales, get/set_world_scales) to FabricFrameView, using
wp.indexedfabricarray over Fabric's omni:fabric:worldMatrix and
omni:fabric:localMatrix. Both directions stay in sync via lazy Warp-kernel
recomputes (no round-trip through USD).

Builds on Piotr's prototype at
bareya/pbarejko/camera-update.

Benchmark (1024 prims, 50 iterations, A6000)

Operation USD (ms) Fabric (ms) Speedup
Get World Poses 13.56 0.067 201×
Set World Poses 29.97 0.123 244×
Interleaved Set→Get 43.78 0.159 276×
Get Local Poses 6.15 0.064 96×
Set Local Poses 8.61 0.053 163×

Why two layers of indirection: wp.indexedfabricarray

Naive wp.fabricarray(selection, attr) already wraps a Fabric attribute as a
Warp array, but it indexes by selection order, which is opaque to user
code and unstable across Fabric reallocation. wp.indexedfabricarray(fa, indices) overlays a view-side index array on top, so kernels can write
fabric_matrices[view_index] and have it land at the right Fabric slot
regardless of internal layout. Topology changes are detected per-selection
via PrepareForReuse(); the indices and indexed arrays are rebuilt lazily
on the next access.

Why RO/RW PrimSelections are required

The separate RO/RW PrimSelections are not an implementation detail; they are the synchronization contract with Fabric's hierarchy updater. Fabric/Kit runs IFabricHierarchy.update_world_xforms() on the render path, after Isaac Lab has written matrices but before Hydra/RTX consumes omni:fabric:worldMatrix. That updater decides which direction is authoritative from the access flags on the most recent selection:

  • worldMatrix=RW, localMatrix=RO means the user authored world transforms; Fabric must not rebuild world from stale local data.
  • worldMatrix=RO, localMatrix=RW means the user authored local transforms; Fabric may rebuild world from the new local data.
  • worldMatrix=RW, localMatrix=RW is ambiguous and unsafe for this API: Fabric can treat local as canonical and overwrite a just-authored world matrix before the renderer reads it.

This is why the branch uses three persistent selections instead of one combined read/write selection. Dirty tracking keeps Isaac Lab's local/world getters consistent on demand; the asymmetric RO/RW selections keep the renderer-facing Fabric world matrices correct across the render tick.

The design that matters most: three selections with asymmetric RO/RW access

This is the load-bearing correctness choice and the reason the
implementation kept iterating. Three persistent PrimSelection handles
cover the same prims with different per-attribute access flags:

_trans_sel_ro  :  worldMatrix=RO, localMatrix=RO   reads + sync helpers
_world_sel_rw  :  worldMatrix=RW, localMatrix=RO   set_world_poses / _scales
_local_sel_rw  :  worldMatrix=RO, localMatrix=RW   set_local_poses / _scales

Kit runs IFabricHierarchy.update_world_xforms() as part of every render
tick, and that update respects the access flags: if localMatrix is
marked RO on the most recent write, Fabric leaves the user's worldMatrix
write alone; if localMatrix is marked RW, Fabric treats it as the
canonical source and recomputes world from local.

A single combined worldMatrix=RW, localMatrix=RW selection (one earlier
revision of this branch) removes that protection. Fabric sees both
attributes as user-authored, falls back to the hierarchy's canonical
direction (local → world), and recomputes worldMatrix from the still-stale
localMatrix. The user's worldMatrix write is silently overwritten before
the renderer reads it. That was the failure mode behind the
test_output_equal_to_usdcamera regression and any other Camera + RTX path
that drives world poses through Fabric (the RTX delegate consumes
omni:fabric:worldMatrix directly via Hydra's fabric transform reader).

The asymmetric RO/RW layout is the right primitive for that protection,
and it works with the lazy consistency model below rather than fighting it.

Lazy world ↔ local consistency

Setters mark the opposite direction stale on a small enum
(_DirtyFlag.{NONE,WORLD,LOCAL}); the recompute happens only when the
opposite getter is called. No eager Warp kernels run on the hot setter
path. Concretely:

  • set_world_poses / set_world_scales write worldMatrix and set
    _dirty = LOCAL. The next get_local_* recomputes localMatrix = inv(parent_world) * world via a Warp kernel.
  • set_local_poses / set_local_scales write localMatrix and set
    _dirty = WORLD. The next get_world_* recomputes `world = parent_world
    • local`.

If the user interleaves the two setters on the same view in a single
frame, the second one flushes the stale direction before writing (correct
but pays an extra kernel launch). A one-time warning per view instance
flags this so users notice the perf hazard. Sticking to one setter
exclusively keeps the path O(1) kernel launches.

Renderer correctness across the lazy gap is provided by the RO/RW
protection above, not by Isaac Lab doing extra work: between the user's
write and the renderer's read, the only thing that runs is Kit's
update_world_xforms tick, and that tick is well-behaved because the
access flags tell it what to leave alone.

Getter note: indexed Fabric getters intentionally return freshly allocated
ProxyArray buffers without an unconditional wp.synchronize(). Consumers
that need immediate host-visible data should synchronize or copy explicitly;
GPU consumers keep normal Warp stream ordering.

Earlier iterations on this PR (resolved)

For reviewers who saw the older shape of this branch:

  • An interim revision collapsed the three selections into one combined
    ReadWrite(world+local) selection. That broke the renderer-clobber
    protection described above; the latest commit restores the three-selection
    shape.
  • The same interim revision added an eager _flush_deferred_local_writes()
    to set_local_poses and a symmetric eager _sync_local_from_world_if_dirty()
    to set_world_poses as workarounds for the lost protection. These are
    removed in the latest commit -- they're unnecessary once the RO/RW layout
    is back, and they made every setter pay an extra kernel.
  • A change_block / changeBlock context manager was added on
    BaseFrameView and FabricFrameView to batch the now-removed eager
    flushes. With nothing to defer, the API has been removed too.

Net change vs. the previous push on this branch: -198 lines. Net change
vs. develop is dominated by the new set/get_local_* API and the new
indexed Warp kernels in isaaclab.utils.warp.fabric.

Tests

New / kept:

  • test_set_local_poses_roundtrip, test_set_local_then_get_world_with_rotated_parent,
    test_set_world_then_get_local_with_rotated_parent — local⇄world
    correctness, including a 90° Z-rotated parent that catches matrix-storage
    convention bugs.
  • test_initial_seed_with_scaled_parent — the initial USD-seed path
    composes scales correctly across non-unit-scale parents.
  • test_multi_view_per_view_dirty_isolation — two views on the same stage
    don't clear each other's dirty flag (regression for an earlier
    class-level-flag bug).
  • test_prepare_for_reuse_detects_topology_change — polls all three
    selections.
  • test_fabric_rebuild_after_topology_change — rebuilds each selection's
    indexed array after a simulated topology change.
  • test_interleaved_set_emits_warning — the one-time interleave warning.

Removed in the latest commit (obsolete contracts):

  • test_set_local_*_updates_renderer_facing_fabric_world_matrix — asserted
    the eager-update contract that the lazy design deliberately does not
    hold; renderer correctness is now provided by the RO/RW protection.
  • The four test_change_block_* tests — the API is gone.

Verified:

  • pytest test_ray_caster_camera.py::test_output_equal_to_usdcamera — passes
    (was the original failure that motivated the redesign).
  • pytest test_ray_caster_camera.py::test_output_equal_to_usd_camera_when_intrinsics_set
    4/4 pass.
  • pytest test_views_xform_prim_fabric.py — 71 passed, 3 skipped (cuda:1).
  • ./isaaclab.sh -f — clean.

Files changed

  • source/isaaclab/isaaclab/sim/views/base_frame_view.py
  • source/isaaclab/isaaclab/utils/warp/fabric.py (new indexed kernels)
  • source/isaaclab_physx/isaaclab_physx/sim/views/fabric_frame_view.py
  • source/isaaclab_physx/test/sim/test_views_xform_prim_fabric.py
  • source/isaaclab/test/sim/frame_view_contract_utils.py (extended contracts)
  • source/isaaclab/changelog.d/fabric-local-poses.rst
  • source/isaaclab_physx/changelog.d/fabric-local-poses.rst

@github-actions github-actions Bot added documentation Improvements or additions to documentation isaac-lab Related to Isaac Lab team labels May 18, 2026
Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 Code Review Update

Review of new commits: PR was rebased and expanded since last review.
Previously reviewed: b15d6235 | Now reviewing: 80838c00

Summary

The PR has been expanded from 5 to 6 commits, adding Fabric-accelerated get/set_local_poses:

  1. Service locator infrastructure on SimulationContext
  2. Service locator tests + changelog
  3. Indexed Fabric transform kernels in isaaclab.utils.warp.fabric
  4. FabricStageCache as a shared hierarchy handle
  5. (NEW) Merge commit to consolidate branches
  6. (NEW) FabricFrameView rewrite with get/set_local_poses + dirty tracking

✅ New Additions Since Last Review

  • Fabric-accelerated local poses: set_local_poses / get_local_poses now use wp.indexedfabricarray to read/write omni:fabric:localMatrix directly on the GPU
  • Bidirectional world↔local sync:
    • set_world_poses → recomputes localMatrix via _sync_local_from_world()
    • set_local_poses → marks _world_dirty, world recomputed on next get_world_poses
  • Per-view dirty tracking: _world_dirty flag is instance-scoped, so concurrent views on the same stage don't clear each other's flag
  • Parent matrix handling: _build_parent_indexed_array() + _compute_parent_fabric_indices() for parent world matrix lookups
  • Topology-adaptive: PrepareForReuse() calls + _rebuild_trans_ro_arrays() for automatic recovery
  • Comprehensive tests: 13 new integration tests covering local/world consistency, rotated/scaled parents, multi-view isolation

🔧 Remaining Observations

[Minor] Index array dtype mismatch still present

The kernels declare indices: ArrayUInt32, but _compute_fabric_indices() returns dtype=wp.int32:

# fabric_frame_view.py
return wp.array(indices, dtype=wp.int32, device=self._device)

Warp will silently cast, so this works in practice. Suggestion: switch to dtype=wp.uint32 for consistency with the kernel signatures. Not blocking.

[Minor] Undefined buffer references in get_local_poses

get_local_poses references self._fabric_local_translations_buf and self._fabric_local_orientations_buf:

if use_cached:
    translations_wp = self._fabric_local_translations_buf
    orientations_wp = self._fabric_local_orientations_buf

These don't appear to be initialized in _initialize_fabric(). Verify these buffers are created alongside the existing world-pose buffers.

📋 Architecture Notes

The world↔local propagation design is clean:

  • Write world → update local: _sync_local_from_world() runs update_indexed_local_matrix_from_world kernel immediately after world writes
  • Write local → lazy world update: _world_dirty flag defers update_indexed_world_matrix_from_local until the next world read

This asymmetry makes sense: world writes are typically followed by physics steps (which don't need locals), while local writes are often followed by world reads for rendering.

📋 Verdict

LGTM — the new local-pose acceleration is a significant feature addition. The bidirectional sync logic is well-designed, and the test coverage is comprehensive (33 test functions across all new modules). The minor dtype observation from the previous review remains, plus one potential undefined buffer issue to double-check.


Automated review by isaaclab-review-bot • Reviewed at 80838c00

@pv-nvidia pv-nvidia force-pushed the pv/fabric-local-poses branch 4 times, most recently from 80838c0 to 9ff3155 Compare May 20, 2026 14:11
Copy link
Copy Markdown

@isaaclab-review-bot isaaclab-review-bot Bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🔍 Code Review Update

Reviewing new commits: PR rebased and updated since last review.
Previously reviewed: eb5582ec | Now reviewing: 9ff3155

Summary

This PR delivers a well-architected feature: GPU-accelerated local-pose operations for FabricFrameView. The implementation is comprehensive and addresses the core limitation where local poses previously fell back to USD round-trips.

✅ Strengths

1. Clean Architecture

  • FabricStageCache provides shared hierarchy handles via the service locator pattern, avoiding per-view duplication
  • Three persistent selections (trans_sel_ro, world_sel_rw, local_sel_rw) cleanly separate read vs. write access patterns
  • Factory dispatch in FrameView._get_backend() correctly routes to UsdFrameView when Fabric is unavailable

2. Robust World↔Local Consistency

  • Bidirectional dirty tracking: set_local_poses marks _world_dirty, deferred until next world read
  • _sync_local_from_world() / _sync_world_from_local_if_dirty() keep matrices consistent
  • Per-view dirty flags prevent concurrent views from clearing each other's pending syncs

3. Topology-Adaptive Design

  • PrepareForReuse() + lazy array rebuild in _get_*_array() handles Fabric memory layout changes
  • _rebuild_trans_ro_arrays() consolidates index and indexed-array refresh

4. Excellent Test Coverage

  • 13+ new integration tests covering rotated parents, scaled parents, multi-view isolation
  • test_set_local_then_get_world_with_rotated_parent validates transpose-convention correctness
  • test_multi_view_per_view_dirty_isolation catches per-stage vs. per-view flag bugs

🔧 Minor Observations

[Minor] Index array dtype mismatch

_compute_fabric_indices() returns dtype=wp.int32, but kernels declare indices: ArrayUInt32:

return wp.array(indices, dtype=wp.int32, device=self._device)

Warp silently casts, so this works. Suggestion: use dtype=wp.uint32 for consistency with kernel signatures. Not blocking.

[Nit] Docstring transpose-convention note

The docstrings for update_indexed_local_matrix_from_world and update_indexed_world_matrix_from_local explain the transpose identity well. Consider adding a brief note that this relies on Fabric's row-major storage convention for future maintainers.

[Style] Empty sentinel shape

_fabric_empty_2d_array_sentinel uses shape (0, 0):

self._fabric_empty_2d_array_sentinel = wp.zeros((0, 0), dtype=wp.float32, device=self._device)

This is fine since the kernels gate on shape[0] > 0, but (0, 3) or (0, 4) might be slightly more self-documenting for unused position/quaternion slots.

📋 Architecture Notes

The asymmetric sync strategy is well-reasoned:

  • Write world → sync local immediately: _sync_local_from_world() runs right after world writes because downstream code (e.g., rendering) typically reads locals soon after
  • Write local → lazy world sync: _world_dirty flag defers the world = parent * local kernel until the next world read, avoiding unnecessary computation when multiple local writes occur before a world read

📋 CI Status

CI checks are currently pending. The pre-commit and changelog checks have passed.

📋 Verdict

LGTM — This is a significant feature addition that completes the Fabric acceleration story for FabricFrameView. The bidirectional sync logic is sound, test coverage is thorough, and the codebase is well-documented. The minor dtype observation is non-blocking.


Automated review by isaaclab-review-bot • Reviewed at 9ff31550


Update (0dbfcfc): Reviewed incremental changes. New commits add the isaaclab_ppisp package (PPISP pipeline + Warp kernels), HDR output (RGB_HDR) to both Isaac RTX and OVRTX renderers, PPISP integration in renderer backends, tiled camera views for Kit/Newton visualizers, test improvements (force_interval_events, reduced step counts), and continued FabricFrameView refinements (multi-GPU support, SimulationContext requirement). No new issues found — code is well-structured, tests are comprehensive, and the new package follows established project patterns. Previous minor observations (dtype, sentinel shape, docstring nit) remain non-blocking and unaddressed. LGTM.


Update (cff9c36): Significant refactor of FabricFrameView replacing the custom view_index_attr + fabricarray pattern with wp.indexedfabricarray — cleaner, more idiomatic use of the Warp Fabric API. Key changes: per-selection lazy rebuild via PrepareForReuse() in each _get_*_array() accessor, new _sync_fabric_from_usd_initial() that properly seeds both parent and child matrices (including scale), and removal of the multi-GPU device string mentions from docstrings. Tests updated to remove the xfail on test_set_world_updates_local (feature now works!) and added excellent new coverage for rotated parents, scaled parents, and multi-view dirty-flag isolation. No new issues found. Previous minor observations (int32 vs uint32 dtype, sentinel shape) still apply but remain non-blocking. LGTM.


Update (0b3280d): Commit history rebased/squashed into a single feature commit. Code is functionally identical to the previously-reviewed cff9c36f — no new issues introduced. Previous minor observations (int32 vs uint32 dtype, sentinel shape) still non-blocking. LGTM.


Update (c07b2b5): Incremental changes improve docs (multi-GPU device string clarification), add TODO comments for follow-up PRs (#5673, #5674), harden multi-GPU test skipping in _skip_if_unavailable(), and replace .torch property access with torch.as_tensor() in two tests. All quality improvements, no new issues. LGTM.


Update (179eead): Continuation of the previous commit's cleanup — torch.as_tensor() replacement is now applied comprehensively across all remaining test assertions (interleaved set/get tests, multi-view isolation, scaled-parent tests). Changelog expanded to better describe lazy dirty tracking, interleave detection with performance warning, and topology recovery. No new issues. LGTM.


Update (c40ca6b): Incremental diff from 179eead8 completes the same cleanup pass: adds a set_scales/get_scales TODO noting world-vs-local scale divergence from UsdFrameView, and finishes the torch.as_tensor() migration in all remaining interleaved-set and multi-view test assertions. No new issues. LGTM.


Update (3c2b9ff): Commit rebased/squashed again. The diff from c40ca6b0 shows the full feature state as a single commit — changelog file added (fabric-local-poses.rst), fabric_frame_view.py contains all previously-reviewed changes (indexedfabricarray, lazy dirty tracking, per-view flags, _sync_fabric_from_usd_initial with parent scale seeding), and tests include the full suite (rotated/scaled parents, multi-view isolation, interleave warning). Functionally identical to the previously reviewed state. No new issues. LGTM.


Update (f526b6c): Another rebase onto updated base branch (diverged: 1 ahead, 1 behind from 3c2b9ff9). Code is functionally identical — same single squashed feature commit, same changelog, same tests. No new issues. LGTM.


Update (12705d0): Another rebase onto updated base branch (diverged: 1 ahead, 1 behind from f526b6c7). Same single squashed feature commit — code is functionally identical to previous review. No new issues. LGTM.


Update (dc728d0): Rebase onto updated base. No functional changes. LGTM maintained.


Update (207608e): Minor cleanup — removed helper @wp.func wrappers (_local_from_world_transposed, _world_from_local_transposed) and inlined the matrix math directly in the kernels. Functionally identical, slightly cleaner code. No new issues. LGTM.


Update (945c2a1):Significant improvement — This commit addresses the world-vs-local scale divergence noted in earlier reviews (the set_scales/get_scales TODO). The new design:

  • Deprecates get_scales()/set_scales() with DeprecationWarning across all BaseFrameView subclasses
  • Introduces explicit get_local_scales()/set_local_scales() (operates on xformOp:scale/localMatrix) and get_world_scales()/set_world_scales() (composed scale)
  • Preserves backwards compat via _get_scales_default()/_set_scales_default() that route to the appropriate method per backend (USD→local, Fabric→world)
  • Full implementation across USD, Fabric, Newton, and OVPhysX backends
  • New tests for local/world scale roundtrips on Fabric
  • Changelog updated documenting the deprecation

Clean API design that resolves the previous ambiguity. No new issues. LGTM.


Update (d1dcadd): Bug fixes in UsdFrameView scale methods: (1) replaced GetRow()[:3] with explicit element access (world_mtx[i][j]) for correct matrix extraction, (2) added explicit float() casts for scale division results. Also updated _sync_fabric_from_usd_initial() in FabricFrameView to use get_local_scales() instead of deprecated get_scales() — aligns with the new API. Good follow-up fixes. No new issues. LGTM.


Update (68ff2d0): Completes the scale API deprecation cleanup: removed deprecated get_scales()/set_scales() methods from NewtonSiteFrameView and OvPhysXFrameView. The OvPhysXFrameView.get_scales() was renamed to get_local_scales() with updated docstrings, and set_local_scales() now has an explicit docstring noting the USD-only limitation. Clean follow-through on the deprecation introduced in 945c2a16. No new issues. LGTM.


Update (6370a2d): Large incremental diff reviewed (40+ files changed). Key additions:

  1. Stale sensor data fix (#4970): Contact sensor, IMU, PVA, and joint wrench sensor kernels now gate on timestamp[env] == 0.0 to skip envs not stepped since their last reset. Comprehensive regression tests added across both PhysX and Newton backends. Well-implemented.

  2. Asset initialization refactor: All asset classes (Articulation, RigidObject, RigidObjectCollection, DeformableObject, ContactSensor, FrameTransformer, SurfaceGripper, JointWrenchSensor, RayCaster, Imu, Pva) migrated to resolve_matching_prims_from_source() — cleaner, clone-plan-aware resolution replacing the old find_first_matching_prim() + manual suffix reconstruction.

  3. Legacy OVRTX 0.2.x removal: Version checks, legacy kernels (extract_all_depth_tiles_kernel_legacy, generate_random_colors_from_ids_kernel_legacy), tempfile fallback, packaging.version import all removed. Clean.

  4. pyproject.toml migration: All three packages (isaaclab_ov, isaaclab_ovphysx, isaaclab_physx) moved from setup.py + extension.toml to modern pyproject.toml with importlib.metadata versioning.

  5. Newton test additions: New tests for per-env gravity tracking, collision decimation, stale data after reset, world-local site injection (tuple key now includes per_world bool), and frame view clone-plan resolution.

  6. FabricFrameView refinements: _DirtyFlag enum for bidirectional world↔local lazy sync, interleave detection with one-time warning, _sync_fabric_from_usd_initial() now properly seeds parent world matrices with scale.

Previous minor observations (int32 vs uint32 dtype in fabric indices, sentinel shape (0,0) vs (0,3), docstring transpose-convention nit) remain unaddressed but were explicitly non-blocking. No new issues found. LGTM.


Update (d024384): Scale getters (get_local_scales()/get_world_scales()) now return ProxyArray across all backends (USD, Fabric, Newton, OvPhysX), completing the consistent return-type story started with pose getters. Fabric cached path returns the pre-built _fabric_scales_ta ProxyArray. _sync_fabric_from_usd_initial() simplified by removing duck-type fallback (now relies on guaranteed ProxyArray from get_local_scales()). Shear detection warning added during parent transform initialization — good defensive measure. Six new contract tests for scale API added. Tests updated to use .torch property. No new issues. LGTM.


Update (8b8a0e0): Docstring-only cleanup in base_frame_view.py: removed implementation-detail notes from scale method docstrings, shortened shear warning note in get_world_scales, and updated deprecation wording in get_scales()/set_scales() to reference _get_scales_default/_set_scales_default delegates. No functional changes. LGTM.


Update (5e62111): Reviewed incremental changes. This commit improves the deprecated get_scales()/set_scales() API:

  1. Deprecation warning optimization: Added a class-level _scales_deprecation_warned flag to emit the deprecation warning only once per process, rather than on every call. This is a good UX improvement — repeated warnings clutter logs without adding value.

  2. Method renaming for clarity: Renamed _get_scales_default_get_scales_impl and _set_scales_default_set_scales_impl across all backend implementations (USD, Fabric, Newton, OvPhysX). The new names better convey that these are backend-specific implementations, not "defaults."

  3. Docstring consistency: Updated docstrings to match the renamed methods.

Changes are clean and applied consistently across all four backend implementations. No issues found. LGTM.


Update (3d4f504): Docstring/comment-only cleanup — all references to the deprecated set_scales/get_scales in comments, logger warnings, and class-level documentation are now updated to set_world_scales/get_world_scales, consistent with the API rename introduced in earlier commits. Also adds a missing RST heading underline and a ThreeDWorld reference link in ecosystem.rst (unrelated docs fix). No functional changes. LGTM.


Update (fbd4512): Reviewed incremental changes from 3d4f5047fbd45128.

This commit is a substantial rewrite touching 9 files with significant architectural changes:

  • Scales API fully materialized: get_local_scales()/set_local_scales()/get_world_scales()/set_world_scales() implemented across all backends (USD, Fabric, Newton, OvPhysX) with proper ProxyArray return types
  • Deprecation layer: get_scales()/set_scales() deprecated with one-time DeprecationWarning, routing through _get_scales_impl()/_set_scales_impl() per backend
  • _sync_fabric_from_usd_initial() rewrite: Now seeds only localMatrix + parent worldMatrix, then derives child world via _recompute_world_from_local() — eliminates the previously redundant child worldMatrix kernel launch
  • Benchmark script fixes: wp.clone(positions.warp) and .torch property access for ProxyArray
  • Test coverage: 6 new scale contract tests, interleave tests, multi-view isolation test, rotated/scaled parent tests

Previous inline comments (on review 4358209137):

  • P2 (Redundant kernel launch): Fixed — _sync_fabric_from_usd_initial() no longer composes a child worldMatrix that is immediately overwritten by _recompute_world_from_local()
  • P1 (Missing wp.synchronize() in non-cached getter paths): Still present — non-cached returns in get_world_poses, get_local_poses, get_world_scales, and get_local_scales still lack synchronization before returning the ProxyArray

No new issues found. LGTM.


Update (d6a0db8): Rebase onto updated base branch (300 files changed in diff, but primarily due to base branch updates). Core feature files (fabric_frame_view.py, fabric.py, tests) have minor updates — no functional changes to the Fabric-accelerated local poses implementation.

Previous inline comments status:

  • P2 (Redundant kernel launch): Fixed — confirmed _sync_fabric_from_usd_initial() only composes localMatrix + parent worldMatrix, then derives child world via _recompute_world_from_local()
  • P1 (Missing wp.synchronize() in non-cached getter paths): Still present — get_world_poses, get_local_poses, get_world_scales, and get_local_scales all lack wp.synchronize() before returning when indices is provided. The cached path (indices=None) correctly synchronizes.

Previous minor observations (int32 vs uint32 dtype in _compute_fabric_indices, sentinel shape (0,0)) remain non-blocking. Code is production-ready with the caveat that callers using indexed getters should be aware of async kernel execution. LGTM.

Update (d6a0db8): Large batch of changes: version bumps across all sub-packages (release changelogs consolidated from changelog.d/ fragments), isaaclab_tasks restructured to core/contrib layout (v2.0.0), import omni.physics.tensors.api as physximport omni.physics.tensors as physx for wheel compatibility, and resolve_matching_prims_from_source/get_all_matching_child_prims call sites simplified across all backends (removing manual empty-check + expected-count validation now handled by the utility itself). FabricFrameView code is functionally identical to previously-reviewed c07b2b58/179eead8 — no new logic changes. Previous minor observations (int32 vs uint32 dtype, sentinel shape) still non-blocking. No new issues found. LGTM.


Update (48e9a08): Reviewed incremental changes from d6a0db8e48e9a083. 4 files changed:

  1. Changelog additions (fabric-local-poses.rst across isaaclab, isaaclab_newton, isaaclab_ovphysx): Properly documents the new explicit scale API (get_local_scales/set_local_scales/get_world_scales/set_world_scales) with deprecation notices for the old get_scales/set_scales methods. Each backend's changelog correctly describes its semantics (Newton: local==world since shape_scale is absolute; OvPhysX: delegates to UsdFrameView; USD: operates on xformOp:scale). Well-written.

  2. New scale tests (test_views_xform_prim.py): Two new parametrized tests (test_set_local_scales_then_get_world_scales, test_set_world_scales_then_get_local_scales) exercise the USD-specific cross-space scale conversion under a non-uniform scaled parent (2.0, 1.0, 1.0). The helper _make_scaled_parent_child_view() is clean and reusable. Tests correctly validate the world = parent * local and local = world / parent relationships. Good coverage of an edge case the shared contract tests cannot cover (unit-scale parents only).

No new issues found. Previous minor observations (int32 vs uint32 dtype, sentinel shape) remain non-blocking. LGTM.


Update (3376c5a): Reviewed incremental changes from 48e9a0833376c5a88. This is a large batch (70+ files) dominated by the cartpole task refactoring and unrelated infrastructure changes. Key observations relevant to this PR:

FabricFrameView test updates (test_views_xform_prim_fabric.py):

  • ✅ Removed the xfail override on test_set_world_updates_local — the shared contract test now passes as-is (confirming the Fabric local-pose feature works correctly)
  • Added comprehensive new tests: test_set_local_via_fabric_path, test_get_scales_fabric_path, test_local_scales_roundtrip, test_world_scales_roundtrip, test_prepare_for_reuse_detects_topology_change, rotated-parent transpose verification, scaled-parent seeding, multi-view dirty isolation, interleaved set/get with partial indices, interleave warning tests
  • Fixed tolerance from 1e-71e-5 in rebuild-after-topology test (accommodates float32 SRT compose/decompose drift)
  • Minor style: en-dash -- throughout comments (consistency)

Unrelated to this PR (no review needed):

  • Cartpole camera env consolidated (CartpoleCameraPresetsEnv merged into CartpoleCameraEnv), channel-first [C, H, W] observations, rsl_rl CNN model, sim_launcher moved to isaaclab.app, isaaclab_tasks v2.0.1, isaaclab_visualizers v0.1.1, experimental tasks cfg extraction, etc.

Previous inline comments status:

  • P2 (Redundant kernel launch): Fixed (confirmed in earlier update)
  • P1 (Missing wp.synchronize() in indexed getter paths): Still present — no changes to fabric_frame_view.py implementation in this batch. Remains non-blocking per previous assessment (callers using indexed getters should be aware of async kernel execution).

No new issues found. LGTM.


Update (6f083a9): Minor test improvements: (1) frame_view_contract_utils.py now asserts get_world_scales()/get_local_scales() return ProxyArray, and corrects get_scales() assertions to expect wp.array (matching the deprecated method's return type). (2) Newton test cube size changed from 0.2³ to 2.0³ for better scale-test visibility. No new issues. LGTM.


Update (7262d3c): Trivial formatting-only change: long assert message in frame_view_contract_utils.py line-wrapped for PEP 8 compliance. No functional changes. LGTM.


Update (ff20d88): Single commit adds scale operation benchmarks to benchmark_xform_prim_view.py: get_world_scales, set_world_scales, get_local_scales, and set_local_scales timing loops — follows the same warmup/measure/report pattern as existing pose benchmarks. Also adds get_world_scales() to the warmup block. Clean addition, consistent with the existing benchmark structure. No new issues. LGTM.


Update (80b4914): Deprecated get_scales() / _get_scales_impl() now return ProxyArray instead of wp.array — aligns the deprecated path with the new scale API (get_local_scales/get_world_scales). All four backend implementations (USD, Fabric, Newton, OvPhysX) updated to return the ProxyArray directly rather than extracting .warp. Contract tests updated to assert ProxyArray. Docstring comments changed from "default" to "legacy" wording. Consistent, no new issues. LGTM.


Update (35b752b): Changelog wording cleanup — removed "Scale getters now return ProxyArray" from the changelog entries across isaaclab, isaaclab_newton, and isaaclab_ovphysx packages. No functional changes. LGTM.


Update (406d714): Significant Newton scale API refactor — properly separates xform scales (transform-level, matching the USD FrameView contract) from collision shape geometry scales (Newton-specific shape_scale). Key changes:

  • get_world_scales()/set_world_scales()/get_local_scales()/set_local_scales() now operate on a new per-site _site_xform_scale array initialized from USD xformOp:scale
  • New simple kernels _gather_xform_scales/_scatter_xform_scales for the xform path
  • Old kernels renamed to _gather_shape_scales/_scatter_shape_scales, moved behind private _get_legacy_shape_scales()/_set_legacy_shape_scales() methods
  • Deprecated _get_scales_impl()/_set_scales_impl() now routes to legacy shape-scale methods (correct backwards compat — preserves Newton's prior collision geometry behavior)
  • Scale state flows through the full initialization chain: _resolve_source_prim()_site_label_scales_create_buffers()_site_xform_scale
  • Changelog and test cube size updated accordingly

This is a well-designed separation of concerns that resolves the xform-vs-geometry ambiguity for Newton. No new issues found. LGTM.


Update (b869ee8): Introduces change_block() context manager for batching local→world matrix recomputes. Key design changes:

  1. Eager flush for renderer/FSD compatibility: set_local_poses() and set_local_scales() now eagerly update the renderer-facing omni:fabric:worldMatrix immediately after local writes. This addresses a real-world FSD renderer contract where render delegates read Fabric world matrices directly without triggering Isaac Lab's lazy get_world_*() sync path.

  2. Batching via change_block() context: For callers applying multiple local edits (poses + scales), the change_block() context manager defers the local→world flush until the outermost block exits. This avoids redundant kernel launches — pose and scale edits pay one recompute instead of two.

  3. Depth counter for correct nesting: Implementation uses _change_block_depth counter to handle nested change_block() calls correctly — only the outermost exit triggers the flush.

  4. Excellent test coverage: Three new tests verify:

    • test_set_local_poses_updates_renderer_facing_fabric_world_matrix: Confirms local pose writes update cached Fabric worldMatrix without any getter calls
    • test_set_local_scales_updates_renderer_facing_fabric_world_matrix: Same for local scale writes
    • test_change_block_batches_local_world_matrix_update: Monkeypatches _recompute_world_from_local to count invocations, confirming exactly one call for pose+scale edits inside a change_block
  5. Test updates: Existing tests updated to expect _dirty == NONE after local writes (since they now flush eagerly outside change_block), correctly reflecting the new behavior.

  6. Style commit (b869ee8): Pre-commit formatting applied — no functional changes beyond the prior commit.

This is a well-considered design that balances eager correctness (renderer/FSD consumers see current data) with performance (batching multiple edits). The API is backwards-compatible: existing code without change_block() gets correct-but-slightly-slower behavior; callers can opt into batching for performance. No new issues found. LGTM.


Automated incremental review by isaaclab-review-bot • Reviewed at b869ee8d


Update (ce65ad7): change_block() now accepts update_world_matrices and update_local_matrices parameters, giving callers fine-grained control over which derived matrices are flushed on exit. Nested blocks correctly inherit the outermost block's policy. The refactored _flush_deferred_writes() generalizes the old _flush_deferred_local_writes() to handle both directions. Three new tests cover: suppressing world-matrix flush, requesting local-matrix flush, and verifying the default asymmetric policy. Changes look correct — no new issues found.


Update (75792ae): Cleanup commit — removed the changeBlock() CamelCase alias from both BaseFrameView and FabricFrameView (keeping only the snake_case change_block()), and removed a stale comment block in the Fabric test file. No functional changes, no new issues. LGTM.


Update (3c8cb1b): Trivial formatting change — removed an extra blank line in test_views_xform_prim_fabric.py. No functional changes, no new issues. Previous P1 observation (missing wp.synchronize() in indexed getter paths) remains unaddressed but non-blocking. LGTM.---

Update (bfae01b): Major architectural simplification — removes the change_block() context manager entirely and replaces the eager local→world flush with a fully lazy, RO/RW-protected design. Key changes:

  1. change_block() removed from both BaseFrameView and FabricFrameView. The three-selection layout (_trans_sel_ro, _world_sel_rw, _local_sel_rw) is now the sole correctness mechanism — Fabric's update_world_xforms tick respects the RO/RW flags to avoid clobbering user-authored matrices.

  2. Lazy in both directions: set_local_* now sets _dirty = WORLD (previously flushed eagerly outside change_block). Kit's render-path update_world_xforms handles the world recompute from the new local, protected by _local_sel_rw having worldMatrix=RO.

  3. Extensive class docstring explains the three-selection layout, why a single combined RW selection breaks the Camera + RTX renderer path, and the load-bearing RO/RW invariant.

  4. Tests updated: ~200 lines of change_block-related tests removed (batching, deferred flush, renderer-facing worldMatrix assertions). Remaining tests updated to expect _dirty == WORLD after local writes. _refresh_if_needed() replaced by per-selection _rebuild_trans_ro_arrays() and per-accessor lazy rebuild.

This is a cleaner, simpler design that delegates renderer consistency to Fabric's native hierarchy update mechanism rather than reimplementing it in Isaac Lab. The detailed class docstring makes the "why" clear for future maintainers. No new issues introduced.

Previous P1 observation (missing wp.synchronize() in indexed getter paths) remains unaddressed but non-blocking. LGTM.

Update (d13ed99): New commit adds documentation improvements only — expanded docstrings for subset-index return semantics (freshly allocated buffers, no blocking), and TODO comments explaining the conservative dirty-flag recompute strategy. No logic changes. LGTM.

@pv-nvidia
Copy link
Copy Markdown
Contributor Author

Superseded by the consolidated PR #5728 (pv/fabric-full-stack).

@pv-nvidia pv-nvidia closed this May 22, 2026
@pv-nvidia pv-nvidia reopened this May 22, 2026
@pv-nvidia pv-nvidia force-pushed the pv/fabric-local-poses branch 12 times, most recently from 4685420 to 6b56971 Compare May 25, 2026 17:24
@pv-nvidia pv-nvidia marked this pull request as ready for review May 25, 2026 18:05
pv-nvidia and others added 15 commits June 3, 2026 16:13
…ed MR

- Remove test_fabric_kernels.py: @wp.func math is fully exercised by
  the 57 integration tests in test_views_xform_prim_fabric.py
- Merge indexed-fabric-kernels.rst into fabric-local-poses.rst (single changelog)
- @wp.func helpers remain module-private (used by production kernels)
_local_from_world_transposed and _world_from_local_transposed were
one-line wrappers. Inline the math directly into the kernel bodies
(child_world * wp.inverse(parent_world) and child_local * parent_world)
to reduce indirection.
Introduce set_local_scales/get_local_scales and set_world_scales/
get_world_scales on BaseFrameView and all implementations:

- UsdFrameView: local scales read/write xformOp:scale; world scales
  decompose/compose from world transform matrix
- FabricFrameView: local scales operate on localMatrix (marks world
  dirty); world scales operate on worldMatrix (marks local dirty)
- NewtonSiteFrameView: both map to shape_scale (Newton treats scale
  as composed)
- OvPhysxFrameView: delegates to UsdFrameView

The deprecated set_scales/get_scales still work but emit
DeprecationWarning. USD/OvPhysX/Newton default to prior behavior;
Fabric defaults to world scales (matching its pre-existing semantics).
- Use row-major indexing (world_mtx[row][col]) instead of GetRow()
  which returns a Vec4 that can't be unpacked into Vec3d
- Cast numpy float32 to Python float for Gf.Vec3d constructor
- Replace internal get_scales() call with get_local_scales() in
  FabricFrameView._initialize_fabric to avoid deprecation warning
BaseFrameView.get_scales/set_scales are now concrete methods that emit
DeprecationWarning and delegate to _get_scales_default/_set_scales_default.
Subclasses that override them bypass the warning.

Remove the overrides from OvPhysxFrameView and NewtonSiteFrameView so
users get a consistent deprecation warning regardless of backend.
Move the OvPhysxFrameView notes into get_local_scales (the actual impl).
- Remove redundant child world matrix composition in
  _sync_fabric_from_usd_initial: the world matrix is immediately
  recomputed by _recompute_world_from_local() at the end of the method
  via child_world = child_local * parent_world. Eliminating the dead
  compose kernel saves one GPU kernel launch during initialization.

- Fix UsdFrameView.get_world_scales docstring: correctly describes
  extracting 'row lengths' (USD uses a row-vector convention where
  basis vectors are stored in rows), not 'column lengths'.

- Add pseudoroot check to UsdFrameView.set_world_scales: mirrors the
  guard in set_world_poses for consistency. The pseudoroot's identity
  transform means this is not a bug (parent_scale would be (1,1,1)
  regardless), but explicit guards prevent surprises with unusual
  stage structures.

- Document that FabricFrameView.get_local_scales and get_world_scales
  share the same pre-allocated buffer (_fabric_scales_buf). Callers
  interleaving both getters without copying will see overwritten values.
… shear warning

- Change get_local_scales/get_world_scales return type from wp.array to
  ProxyArray across all backends (BaseFrameView, UsdFrameView,
  FabricFrameView, NewtonSiteFrameView, OvPhysxFrameView).

- The deprecated get_scales() still returns wp.array (via .warp unwrap
  in _get_scales_default) to preserve backward compatibility.

- Add scale contract tests to frame_view_contract_utils.py:
  - test_local_scales_default_identity: verify (1,1,1) default
  - test_world_scales_default_identity: verify (1,1,1) default
  - test_set_local_scales_roundtrip: write/read consistency
  - test_set_world_scales_roundtrip: write/read consistency
  - test_local_scales_do_not_affect_local_poses: scale changes preserve T/R
  - test_scale_getters_return_proxyarray: type contract check

- Add shear/skew detection in FabricFrameView._sync_fabric_from_usd_initial:
  logs a one-time warning if any parent world transform has non-orthogonal
  rows (indicating shear), since TRS decomposition cannot represent shear.

- Document the shear limitation in BaseFrameView.get_world_scales docstring.

- Update existing Fabric scale tests to use .torch instead of
  torch.as_tensor (which would trigger ProxyArray deprecation bridge).

- Simplify _sync_fabric_from_usd_initial scale extraction to directly
  unwrap ProxyArray.warp (no longer needs defensive hasattr check).
Base class docstrings should describe the abstract contract without
mentioning implementation specifics (Fabric dirty flags, USD xformOps,
etc.). Move those details to the respective subclass docstrings where
they belong.
- Rename _get_scales_default/_set_scales_default to
  _get_scales_impl/_set_scales_impl across all backends. The 'default'
  name leaked implementation reasoning into the interface.

- Emit the DeprecationWarning only once (class-level flag) instead of
  on every call. Avoids flooding logs when legacy code calls get_scales/
  set_scales in a tight loop.
The ThreeDWorld link removal was unrelated to this PR and caused a
rebase conflict. Revert to upstream's version.
…View

Replace all remaining references to the deprecated set_scales/get_scales
with the new explicit set_world_scales/get_world_scales (and local
variants) in comments, docstrings, and log messages.
…tions

The previous rebase accidentally included a complete rewrite of the
Newton file (from a different branch). Reset to upstream/develop and
add only the 6 scale methods required by the new BaseFrameView ABC.
…rent

Implemented new tests to validate the behavior of world and local scale conversions in a hierarchy with a scaled parent. The tests ensure that setting local scales correctly computes world scales and vice versa, addressing USD-specific scale math not covered by existing tests.
@pv-nvidia pv-nvidia force-pushed the pv/fabric-local-poses branch from 48e9a08 to 3376c5a Compare June 3, 2026 15:40
@isaaclab-review-bot
Copy link
Copy Markdown

🔄 Incremental Review Update

Reviewing new commits: 48e9a0833376c5a88
Previous review: #4329309662

Summary

This batch completes the Fabric-accelerated local-pose implementation. The core feature is production-ready.

✅ Verified in New Commits

FabricFrameView local-pose operations (GPU-accelerated):

  • set_local_poses() / get_local_poses() now operate directly on omni:fabric:localMatrix via wp.indexedfabricarray
  • Bidirectional lazy dirty tracking (_DirtyFlag enum): setters flip flags, getters trigger deferred sync only when needed
  • Interleave detection with one-time warning when mixing set_world_poses and set_local_poses on the same view
  • New Warp kernels: compose_indexed_fabric_transforms, decompose_indexed_fabric_transforms, update_indexed_world_matrix_from_local, update_indexed_local_matrix_from_world

Scale API redesign:

  • Explicit get/set_local_scales() and get/set_world_scales() across all backends
  • Old get/set_scales() deprecated with one-time DeprecationWarning
  • Scale getters return ProxyArray (consistent with pose getters)

Test coverage:

  • Removed test_set_world_updates_local xfail override — contract test now passes ✅
  • Added: test_set_local_via_fabric_path, rotated-parent transpose verification, scaled-parent seeding, multi-view dirty isolation, local/world scale roundtrips, scaled-parent USD conversions

📋 Previous Inline Comments Status

Priority Issue Status
P2 Redundant kernel launch in _sync_fabric_from_usd_initial() ✅ Fixed
P1 Missing wp.synchronize() in indexed getter paths Still present (non-blocking)

P1 note: get_world_poses, get_local_poses, get_world_scales, and get_local_scales still lack wp.synchronize() before returning when indices is provided (cached path synchronizes correctly). Per design docs, callers using indexed getters are expected to understand Warp's async kernel execution. Consider adding a docstring note for clarity.

📋 Minor Observations (Unchanged, Non-blocking)

  • _compute_fabric_indices() returns dtype=wp.int32 but kernels expect ArrayUInt32 (Warp silently casts)
  • _fabric_empty_2d_array_sentinel shape (0, 0) could be (0, 3) for clarity

✅ Verdict

LGTM — No new issues found. Core feature is complete with excellent test coverage.


Automated review by isaaclab-review-bot • Reviewed at 3376c5a8

pv-nvidia and others added 11 commits June 3, 2026 23:32
Restores PR isaac-sim#5728's three-selection layout (`_trans_sel_ro`,
`_world_sel_rw`, `_local_sel_rw`) with asymmetric Fabric access flags
on `worldMatrix` and `localMatrix`.  Those flags are what protect the
user's write from being clobbered by Kit's per-tick
`IFabricHierarchy.update_world_xforms`:

* On `set_world_poses` (via `_world_sel_rw`, `localMatrix=RO`), Fabric
  does not recompute world from local -- the user's worldMatrix write
  survives until the renderer reads it.
* On `set_local_poses` (via `_local_sel_rw`, `worldMatrix=RO`), Fabric
  recomputes world from the new local on the next tick -- the renderer
  reads the correct world.

A single combined `worldMatrix=RW, localMatrix=RW` selection (the
recent design on this branch) removed that protection.  Fabric saw
both attributes as user-authored and fell back to the hierarchy's
canonical direction (local -> world), recomputing world from a stale
local and silently overwriting the user's world write.  That was the
failure mode behind the
`test_output_equal_to_usdcamera` regression and any other Camera + RTX
path that drives world poses through Fabric.

With the RO/RW protection back in place, the eager world<->local
flushes introduced by commits "fix: flush Fabric world matrices after
local writes" and the follow-up "set_world_poses eager local sync" are
no longer needed and are removed.  The `change_block` context manager
and its companion helpers existed only to batch those eager flushes;
with the flushes gone, the API has nothing to defer and is removed
from both `BaseFrameView` and `FabricFrameView`.

Class docstring now spells out the load-bearing role of the RO/RW
layout so a future refactor doesn't reintroduce the single-selection
shape.

Tests:

* Removed `test_set_local_*_updates_renderer_facing_fabric_world_matrix`
  (asserted an eager-update contract that the lazy design deliberately
  does not hold; correctness across the next render tick is provided
  by the RO/RW protection, not by an extra Warp kernel).
* Removed the four `test_change_block_*` tests; the API is gone.
* Inverted `test_interleaved_set_emits_no_warning` back to
  `test_interleaved_set_emits_warning`, restored `_dirty == LOCAL`
  assertions in `test_world_scales_roundtrip` and the symmetric
  `WORLD` assertion in `test_local_scales_roundtrip`, and updated
  `test_multi_view_per_view_dirty_isolation` to expect the lazy
  cross-view behavior.
* Adapted `test_prepare_for_reuse_detects_topology_change` and
  `test_fabric_rebuild_after_topology_change` to poll/rebuild all
  three selections.

Verified:

* `pytest test_ray_caster_camera.py::test_output_equal_to_usdcamera` passes.
* `pytest test_ray_caster_camera.py::test_output_equal_to_usd_camera_when_intrinsics_set` 4/4 pass.
* `pytest test_views_xform_prim_fabric.py` 71 passed, 3 skipped (cuda:1).
* `./isaaclab.sh -f` clean.

Net diff -198 lines.
@pv-nvidia pv-nvidia force-pushed the pv/fabric-local-poses branch from 3c8cb1b to bfae01b Compare June 5, 2026 16:17
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

documentation Improvements or additions to documentation isaac-lab Related to Isaac Lab team

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant