Skip to content

[hipDNN] ALMIOPEN-1943 Add pytest suite and CI workflow for Python bindings#7600

Draft
tvy-amd wants to merge 89 commits into
developfrom
users/tvy/python_binding_tests
Draft

[hipDNN] ALMIOPEN-1943 Add pytest suite and CI workflow for Python bindings#7600
tvy-amd wants to merge 89 commits into
developfrom
users/tvy/python_binding_tests

Conversation

@tvy-amd
Copy link
Copy Markdown
Contributor

@tvy-amd tvy-amd commented May 19, 2026

Summary

Add a pytest test suite (42 tests) for the hipDNN Python bindings and a GitHub Actions workflow to run them on GPU hardware with miopen-provider. Default HIPDNN_BUILD_PYTHON_BINDINGS to OFF so existing builds are unaffected.

Risk Assessment

Low risk. Adds new test infrastructure and a CI workflow without modifying production library code. The only product-code change is flipping a CMake option default from ON to OFF, which is additive — existing consumers who set the flag explicitly are unaffected.

Testing Summary

  • Full pytest suite run locally with miopen-provider on GPU hardware (42/42 passed).
  • Non-GPU subset validated without provider (6/6 passed).

Testing Checklist

  • Python binding tests (local) - pytest projects/hipdnn/python/hipdnn_frontend/test/ -v - ASICs: gfx942 - Status: Passed (42/42)
  • PR CI - GitHub PR checks - Status: Pending

Technical Changes

  • Add projects/hipdnn/python/hipdnn_frontend/test/ with tests covering tensor API, graph API, convolution (fprop/dgrad/wgrad), matmul, and device buffer operations.
  • Extract shared test helpers (build_conv_fprop_graph, execute_graph) into helpers.py since pytest does not support importing from conftest.py as a regular module.
  • Fix tests for updated Python binding API: Tensor.create() no longer auto-assigns UIDs, Graph.tensor(attrs) removed.
  • Register custom pytest markers (gpu, integration) in pyproject.toml.
  • Add .github/workflows/hipdnn-python-tests.yml that builds hipDNN + miopen-provider on azure-linux-scale-rocm and runs the full suite.
  • Default HIPDNN_BUILD_PYTHON_BINDINGS to OFF in CMakeLists.txt; CI workflow explicitly enables it.

tvy-amd and others added 30 commits May 12, 2026 11:50
…arking changes

Rewrite python/CMakeLists.txt to use scikit-build-core's SKBUILD variable
for wheel-specific install rules (following the stinkytofu pattern) instead
of unconditional dist-packages install. Regular cmake --install stages the
wheel via pip wheel in an else() branch, avoiding recursion.

Revert all dnn-benchmarking packaging changes to keep this branch focused
on hipdnn-frontend Python bindings only.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Let scikit-build-core handle Python files via wheel.packages instead of
cmake install rules. Simplifies SKBUILD block to only install the .so.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Installs the built hipdnn-frontend wheel into a fresh venv and runs
17 smoke tests covering imports, enums, Tensor/Graph construction,
method chaining, serialization helpers, and engine ID management.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
When built via TheRock superbuild, nanobind is provided as a third-party
dependency. For standalone builds, FetchContent fetches it from GitHub.
Also bumps FetchContent tag from v1.8.0 to v2.4.0.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
…ndings

Replace inline FetchContent for nanobind with the project's
hipdnn_add_dependency pattern so find-or-fetch behavior is consistent
with all other dependencies. Add tsl-robin-map as a separate dependency
since nanobind is configured with NB_USE_SUBMODULE_DEPS OFF.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Exclude Python, nanobind, and _deps headers from clang-tidy's header
filter to prevent false positives on third-party code. Rename binding
functions from snake_case to camelBack to match project naming
conventions, and add braces around bare if/else bodies.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Add shared bindings.hpp header with forward declarations to resolve
misc-use-internal-linkage. Suppress unavoidable performance-no-int-to-ptr
for Python FFI pointer casts. Use const references for nb::object and
nb::bytes parameters. Add const to status/error variables and use auto
with reinterpret_cast.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Only the module development component is needed for nanobind extension
modules, not the full Development (which also requires Development.Embed
and libpython). This matches the pattern used by other Python bindings in
the repo (stinkytofu, origami, rocisa).

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
…ding package locations

When pip wheel re-invokes CMake on python/CMakeLists.txt standalone,
hipdnn_add_dependency is not available. Include Dependencies.cmake as
a fallback. Forward nanobind_DIR, tsl-robin-map_DIR, and version
variables so find_package() picks up already-built packages from
TheRock without re-downloading.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
The version variables are unnecessary — Dependencies.cmake has matching
hardcoded defaults, and when packages are found via nanobind_DIR /
tsl-robin-map_DIR the version is not used.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Add integration tests for conv_fprop, conv_dgrad, conv_wgrad, and matmul
that mirror the existing C++ samples. Add unit-style API tests for Tensor,
Graph, and DeviceBuffer. Convert the existing engine_id_overloads test to
pytest format. Configure pytest in pyproject.toml with gpu and integration
markers.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
…hardcoded compiler paths

Forward FETCHCONTENT_SOURCE_DIR for nanobind and tsl-robin-map so the
standalone wheel build reuses already-downloaded sources when the parent
used FetchContent (where _DIR variables are empty).

Remove hardcoded /opt/rocm/llvm/bin/clang compiler paths from
pyproject.toml — let the environment or -C flags control the compiler.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
The pip wheel subprocess could not find the HIP package because
hip_DIR was not forwarded from the parent CMake configuration.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
The pip wheel subprocess finds HIP via hip_DIR but hip-config-amd.cmake
calls find_dependency for AMDDeviceLibs, amd_comgr, and hsa-runtime64.
Without forwarding these _DIR variables the isolated build environment
cannot locate them, causing CMake configuration failure in TheRock CI.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
…nctions

Replace _determine_git_tag indirection with direct use of the VERSION
parameter for the git tag. The version is already passed from the caller
via hipdnn_add_dependency so the macro lookup is unnecessary.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Replace the scikit-build-core + pip subprocess approach with a standalone
wheel assembly script, eliminating the need to forward CMake dependency
variables (hip_DIR, AMDDeviceLibs_DIR, etc.) through pip config settings.

The superbuild path now stages the compiled .so and calls
assemble_wheel.py (stdlib only) to produce the wheel directly. The
developer path uses hatchling with a custom build hook that either
compiles via CMake or packages a pre-built .so via HIPDNN_PREBUILT_SO.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
tvy-amd and others added 7 commits May 20, 2026 16:48
Reuse the ENV{ROCM_PATH} → CACHE fallback pattern used across
the repo's toolchain files instead of a custom check.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
- Add subdirectory build section documenting HIPDNN_BUILD_PYTHON_BINDINGS
- Add bindings.hpp and test/ to project structure
- Collapse samples into single line, add matmul sample
- Move hipDNN install prerequisite to standalone build section
- Remove verbose per-sample descriptions

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
…ocm-libraries into users/tvy/python_binding_tests

# Conflicts:
#	projects/hipdnn/python/pyproject.toml
When installed via ROCm wheels, libhipdnn_backend.so lives in a separate
package directory outside LD_LIBRARY_PATH. Use rocm_sdk.preload_libraries
to load it with RTLD_GLOBAL before importing the extension module.

Co-Authored-By: Claude Opus 4 <noreply@anthropic.com>
Base automatically changed from users/tvy/python_bindings to develop May 26, 2026 06:06
tvy-amd and others added 2 commits May 26, 2026 15:59
… users/tvy/python_binding_tests

# Conflicts:
#	projects/hipdnn/python/hipdnn_frontend/__init__.py
#	projects/hipdnn/python/pyproject.toml
Adds a CI System section describing TheRock as hipDNN's CI build system,
the TheRock hash pin location, the CI build flow, the 3rd-party dependency
constraints (no FetchContent from GitHub), the CI workflow files, and the
hipDNN CMake flags set by therock_matrix.py.

Co-Authored-By: Claude Opus 4.7 <noreply@anthropic.com>
tvy-amd added 10 commits May 26, 2026 16:53
- EnginePluginResourceManager dtor: mark cleanup lambda noexcept and wrap
  log calls / iteration in try/catch so the destructor cannot throw
  (bugprone-exception-escape).
- PlatformUtils.linux.hpp: make dlsym() result pointer const
  (misc-const-correctness).
- GraphDescriptor::toString: use char literal '}' instead of string
  literal "}" (performance-faster-string-find).
Remove the noexcept marker from the inner cleanup lambda and the nested
try/catch around each log call. Per-iteration try/catch around
safeDestroyHandle keeps cleanup of remaining handles going if a log call
throws, and keeps the destructor from escaping an exception
(bugprone-exception-escape).
clang-tidy bugprone-empty-catch -> warning treated as error in CI. The
catch is intentionally empty: re-logging from here would recurse into
the same throwing fmt/sink path the outer catch is guarding against.
- PlatformUtils.linux.hpp: add pointee const to dlsym result
  (misc-const-correctness)
- EngineDescriptor.cpp: use char literal '}' for string append
  (performance-faster-string-find)
- MiopenBatchnormFwdTrainingPlan.cpp: hoist prev/next running mean/variance
  pointers to void* const via ternary (misc-const-correctness)
- MiopenBatchnormBwdPlan.cpp: hoist biasPtr to void* const via ternary
  (misc-const-correctness)
tvy-amd added 8 commits May 26, 2026 14:35
…ptor codegen

clang-tidy performance-faster-string-find treats `str += "}"` and `str += "]"`
as warnings (treated as errors in CI). Replace with `str += '}';` and
`str += ']';` in all generated descriptor `.cpp` files and the
`descriptor.cpp.j2` template that produces them.
The python-tests job builds the full repo (hipdnn + miopen-provider) with
ENABLE_CLANG_TIDY defaulting to ON on Linux, so clang-tidy ran across
miopen-provider sources. The dedicated `clang-tidy (hipdnn)` job intentionally
scopes to `projects/hipdnn` only, and miopen-provider clang-tidy is disabled
upstream (issue #5067). Disable clang-tidy here so the python-tests job
verifies the Python bindings build/test rather than re-running C++ linting
that is already gated (or intentionally not gated) elsewhere.
Tests require a ROCm-capable GPU; azure-linux-scale-rocm has none,
causing pytest to fail with 'no ROCm-capable device is detected'.
Switch to linux-gfx942-1gpu-ossci-rocm (canonical 1-GPU gfx94X label
per TheRock amdgpu_family_matrix.py).
Switch runs-on back to azure-linux-scale-rocm and re-enable clang-tidy
(remove -DENABLE_CLANG_TIDY=OFF) per user direction.
azure-linux-scale-rocm has no GPU; pytest fails with 'no ROCm-capable
device is detected'. Keep build-only verification until a GPU runner
strategy is sorted out.
Re-add pytest step but pass -m 'not gpu' so it runs only the
non-GPU tests (tensor/graph API, etc.) on the CPU-only
azure-linux-scale-rocm runner. Register the 'gpu' marker in
pyproject.toml so pytest doesn't warn on the filter expression.
test_conv_{dgrad,fprop,wgrad}.py use @pytest.mark.integration; register
it in pyproject.toml so pytest doesn't emit PytestUnknownMarkWarning.
Dedicated 'clang-tidy (hipdnn)' job covers clang-tidy checks. Running
it again in python-tests with the CI runner's older clang-tidy version
produces false positives that the version-pinned dedicated job does
not flag.
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant