[hipBLASLt] Expose SM-count-target hint and dynamic persistent tile (ext) toggle #7698
Open
jaopaulolc wants to merge 6 commits into
…ext) toggle
Add two matmul-descriptor attributes plus a C++ ext-API toggle that let
callers convey an estimate of how many compute units (CUs / SMs) hipBLASLt
should target, and request the work-stealing StreamK code path:
- HIPBLASLT_MATMUL_DESC_SM_COUNT_TARGET and
HIPBLASLT_MATMUL_PREF_SM_COUNT_TARGET (int32_t; 0 = "use all CUs";
negative rejected with HIPBLAS_STATUS_INVALID_VALUE).
- HIPBLASLT_MATMUL_DESC_DYN_PERSISTENT_TILE_EXT (int32_t bool) plus
hipblaslt_ext::GemmPreference::{set,get}DynPersistentTileEnabled() —
opts the matmul into the hipBLASLt dynamic persistent tile scheduler
(work-stealing StreamK). Lives in the ext API.
- hipblaslt_ext::GemmPreference::{set,get}SmCountTarget() C++ helpers.
- hipblaslt-bench: --sm_count_target and --dyn_persistent_tile CLI
options that forward the values into the matmul descriptor.
Defaults preserve current behavior. The PREF_MAX enum is bumped from 2
to 3; no existing enum values are renumbered, so ABI is preserved.
Unit tests:
- aux_ext_test gtest cases: gemm_preference_sm_count_target_default_is_zero,
gemm_preference_sm_count_target_round_trip,
gemm_preference_dyn_persistent_tile_round_trip (host-only).
- YAML-driven aux_test cases: aux_matmul_sm_count_target,
aux_matmul_dyn_persistent_tile_ext, aux_matmul_pref_sm_count_target.
Co-authored-by: Cursor <cursoragent@cursor.com>
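The validation rules this commit describes for the SM-count-target attribute (an int32_t value, 0 meaning "use all CUs", negatives rejected) can be sketched with a small host-only model. This is not the rocblaslt handler: `DescModel`, `AttrStatus`, and both function names are hypothetical, the status returned for an undersized buffer is an assumption, and the real getter's size-written output parameter is left out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical stand-ins for illustration; not the real hipBLASLt types.
enum class AttrStatus { Success, InvalidValue };

struct DescModel
{
    int32_t sm_count_target = 0; // 0 = "use all CUs"
};

// Models a set-attribute handler for an int32_t attribute.
inline AttrStatus set_sm_count_target_attr(DescModel& desc, const void* buf, std::size_t size_in_bytes)
{
    // Assumption: a null or undersized buffer maps to an invalid-value status.
    if(buf == nullptr || size_in_bytes < sizeof(int32_t))
        return AttrStatus::InvalidValue;
    int32_t value = 0;
    std::memcpy(&value, buf, sizeof(value));
    if(value < 0)
        return AttrStatus::InvalidValue; // negative targets are rejected
    desc.sm_count_target = value;
    return AttrStatus::Success;
}

// Models the matching get-attribute handler (size-written output omitted).
inline AttrStatus get_sm_count_target_attr(const DescModel& desc, void* buf, std::size_t size_in_bytes)
{
    if(buf == nullptr || size_in_bytes < sizeof(int32_t))
        return AttrStatus::InvalidValue;
    std::memcpy(buf, &desc.sm_count_target, sizeof(int32_t));
    return AttrStatus::Success;
}

// Usage example: default, round trip, and the two rejection paths.
inline void self_check_attr()
{
    DescModel desc;
    int32_t   out = -1;
    AttrStatus s  = get_sm_count_target_attr(desc, &out, sizeof(out));
    assert(s == AttrStatus::Success && out == 0);

    int32_t target = 32;
    s = set_sm_count_target_attr(desc, &target, sizeof(target));
    assert(s == AttrStatus::Success && desc.sm_count_target == 32);

    int32_t negative = -1;
    s = set_sm_count_target_attr(desc, &negative, sizeof(negative));
    assert(s == AttrStatus::InvalidValue && desc.sm_count_target == 32);

    s = set_sm_count_target_attr(desc, &target, 1); // undersized buffer
    assert(s == AttrStatus::InvalidValue);
    (void)s;
}
```

In this model a rejected write leaves the stored value unchanged. The source states that guarantee for the handle-level setter; applying it to the attribute handlers is an assumption of the sketch.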
Codecov Report
❌ Your project status has failed because the head coverage (77.83%) is below the target coverage (80.00%). You can increase the head coverage or adjust the target coverage.
Additional details and impacted files
@@ Coverage Diff @@
## develop #7698 +/- ##
===========================================
- Coverage 61.88% 61.87% -0.01%
===========================================
Files 2086 2086
Lines 357161 357195 +34
Branches 53836 53827 -9
===========================================
- Hits 221013 220984 -29
- Misses 117350 117414 +64
+ Partials 18798 18797 -1
…t duplicates
Move the SM-count-target hint out of the ext API and into the non-ext
hipBLASLt C API, mirroring cuBLAS's cublasSetSmCountTarget /
cublasGetSmCountTarget directly:
- New public functions in hipblaslt.h:
hipblasStatus_t hipblasLtSetSmCountTarget(hipblasLtHandle_t, int32_t);
hipblasStatus_t hipblasLtGetSmCountTarget(hipblasLtHandle_t, int32_t*);
Stored on the handle; 0 (default) = "no override"; negative input
returns HIPBLAS_STATUS_INVALID_VALUE; null pointer to the getter
returns HIPBLAS_STATUS_INVALID_VALUE.
- Internal rocblaslt_set_sm_count_target / rocblaslt_get_sm_count_target
helpers wired through rocblaslt-auxiliary.h with matching status codes
and log_api / log_error tracing, consistent with the existing
attribute-handler conventions.
- _rocblaslt_handle gains an sm_count_target field. The per-matmul
descriptor and per-preference attributes
(HIPBLASLT_MATMUL_DESC_SM_COUNT_TARGET /
HIPBLASLT_MATMUL_PREF_SM_COUNT_TARGET) are unchanged and, when set,
take precedence over the handle-level value (matches cuBLAS layering).
- Remove hipblaslt_ext::GemmPreference::{set,get}SmCountTarget — they
duplicated a cuBLAS-mirrored knob and no longer fit the "ext = no
cuBLAS analogue" rule. setDynPersistentTileEnabled /
getDynPersistentTileEnabled stay in ext (no cuBLAS analogue).
- Tests: replace the two aux_ext_test gemm_preference_sm_count_target_*
cases with four aux_handle_test cases that exercise the new public
C API (default == 0, round-trip including 0 sentinel, negative
rejection preserves prior value, null pointer rejection on getter).
The YAML-driven aux_matmul_sm_count_target / _pref_sm_count_target /
_dyn_persistent_tile_ext cases are unchanged and still cover the
per-desc and per-pref attributes plus the ext toggle.
Co-authored-by: Cursor <cursoragent@cursor.com>
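The handle-level semantics and the layering rule from this commit can be modelled in a few lines. This is a sketch only: `HandleModel`, `HandleStatus`, and the three functions are hypothetical names, not the rocblaslt implementation, and treating a per-matmul value of 0 as "not set" is an assumption. The invalid-handle branch follows the rocblaslt_status_invalid_handle path mentioned in the follow-up commit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins for illustration; not the real hipBLASLt types.
enum class HandleStatus { Success, InvalidValue, InvalidHandle };

struct HandleModel
{
    int32_t sm_count_target = 0; // 0 (default) = "no override"
};

inline HandleStatus model_set_sm_count_target(HandleModel* handle, int32_t target)
{
    if(handle == nullptr)
        return HandleStatus::InvalidHandle;
    if(target < 0)
        return HandleStatus::InvalidValue; // prior value is left untouched
    handle->sm_count_target = target;
    return HandleStatus::Success;
}

inline HandleStatus model_get_sm_count_target(const HandleModel* handle, int32_t* target)
{
    if(handle == nullptr)
        return HandleStatus::InvalidHandle;
    if(target == nullptr)
        return HandleStatus::InvalidValue; // null output pointer is rejected
    *target = handle->sm_count_target;
    return HandleStatus::Success;
}

// Layering: a per-matmul (descriptor or preference) value wins over the handle.
// Assumption: a per-matmul value of 0 means "not set".
inline int32_t resolve_sm_count_target(int32_t per_matmul, int32_t handle_level)
{
    assert(per_matmul >= 0 && handle_level >= 0); // setters never store negatives
    return per_matmul != 0 ? per_matmul : handle_level;
}

// Usage example mirroring the four aux_handle_test cases listed in the commit.
inline void self_check_handle()
{
    HandleModel handle;
    int32_t     out = -1;
    HandleStatus s  = model_get_sm_count_target(&handle, &out);
    assert(s == HandleStatus::Success && out == 0); // default is zero

    s = model_set_sm_count_target(&handle, 64);
    assert(s == HandleStatus::Success && handle.sm_count_target == 64);

    s = model_set_sm_count_target(&handle, -8);
    assert(s == HandleStatus::InvalidValue && handle.sm_count_target == 64);

    s = model_get_sm_count_target(&handle, nullptr);
    assert(s == HandleStatus::InvalidValue);
    (void)s;
}
```

With the handle set to 64, `resolve_sm_count_target(16, 64)` yields the per-matmul value 16, while `resolve_sm_count_target(0, 64)` falls back to the handle-level 64. How the library orders the descriptor and preference values relative to each other is not stated in the source, so the model takes a single per-matmul value.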
…ranches
Codecov flagged the size-validation branches in
rocblaslt_auxiliary.cpp's HIPBLASLT_MATMUL_DESC_SM_COUNT_TARGET,
HIPBLASLT_MATMUL_DESC_DYN_PERSISTENT_TILE_EXT and
HIPBLASLT_MATMUL_PREF_SM_COUNT_TARGET set/get handlers, plus the
GemmPreference dyn-persistent-tile setter/getter, as unexecuted in the
host-only coverage build (the YAML-driven aux_test cases that already
exercise these are gated behind data availability and are not always
run for coverage).
Add six standalone gtest cases under `aux_attr_test` that drive each
undersized-buffer error path through the public hipBLASLt API, plus two
`aux_handle_test` cases that pin the rocblaslt_status_invalid_handle
branch for hipblasLtSet/GetSmCountTarget, and one `aux_ext_test` case
that asserts the default-disabled state of
GemmPreference::getDynPersistentTileEnabled.
All 14 affected tests pass on gfx950.
Co-authored-by: Cursor <cursoragent@cursor.com>
Motivation
hipBLASLt is gaining an internal dynamic persistent tile scheduler (work-stealing StreamK) that benefits from knowing how many compute units (CUs / SMs) the matmul should target — useful when another kernel (e.g. RCCL) is co-running on the device or when a persistent grid should be sized for a known CU budget. Today hipBLASLt has no public API to convey that hint or to opt the matmul into the dynamic persistent tile path.
cuBLAS already exposes the SM-count hint via two complementary surfaces:
- cublasSetSmCountTarget(handle, int) / cublasGetSmCountTarget(handle, int*) (cuBLAS 13.0 §2.4.24–2.4.25).
- CUBLASLT_MATMUL_DESC_SM_COUNT_TARGET and CUBLASLT_MATMUL_PREF_SM_COUNT_TARGET descriptor / preference attributes.
This PR mirrors both into the non-ext hipBLASLt C API. For the dynamic persistent tile toggle, cuBLAS has no public analogue (it exposes no toggle for its persistent / Cluster Launch Control code path), so that knob lives in the _EXT attribute namespace and the hipblaslt_ext C++ API.
Technical Details
Public API additions
Handle-level (non-ext, mirrors cuBLAS):
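hipblasStatus_t hipblasLtSetSmCountTarget(hipblasLtHandle_t, int32_t);
hipblasStatus_t hipblasLtGetSmCountTarget(hipblasLtHandle_t, int32_t*);
- 0 (default) = "no override; use all CUs the device exposes".
- Negative input returns HIPBLAS_STATUS_INVALID_VALUE (matches cuBLAS).
- A null pointer passed to the getter returns HIPBLAS_STATUS_INVALID_VALUE (matches cuBLAS).
Per-matmul attributes (non-ext, mirrors cuBLASLt):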
- HIPBLASLT_MATMUL_DESC_SM_COUNT_TARGET = 33: int32_t matmul-descriptor attribute.
- HIPBLASLT_MATMUL_PREF_SM_COUNT_TARGET = 2: int32_t preference attribute.
- HIPBLASLT_MATMUL_PREF_MAX bumped 2 → 3; no existing enum values renumbered; ABI preserved.
Dynamic persistent tile (ext-only, no cuBLAS analogue):
- HIPBLASLT_MATMUL_DESC_DYN_PERSISTENT_TILE_EXT = 104: int32_t ext attribute; non-zero opts the matmul into the dynamic persistent tile (work-stealing StreamK) scheduler.
- hipblaslt_ext::GemmPreference::setDynPersistentTileEnabled(bool) / getDynPersistentTileEnabled() C++ ext methods.
Internal wiring
- _rocblaslt_handle gains an sm_count_target field; new rocblaslt_set_sm_count_target / rocblaslt_get_sm_count_target helpers in rocblaslt-auxiliary.h are wired through hipblaslt.cpp with the existing log_api / log_error tracing and RocBlasLtStatusToHIPStatus translation, consistent with how hipblasLtCreate / hipblasLtDestroy are structured.
- _rocblaslt_matmul_desc / _rocblaslt_matmul_preference carry their own sm_count_target fields (set/get via hipblasLtMatmulDescSetAttribute and hipblasLtMatmulPreferenceSetAttribute), with the same negative-value rejection and size validation as the other attributes.
- utility.cpp attribute-to-string helper updated for logging.
hipblaslt-bench CLI
- --sm_count_target <int32_t> and --dyn_persistent_tile <bool>. Both default to "off" so behavior is unchanged unless explicitly requested.
- --sm_count_target is validated to be non-negative at parse time (exits with code 1 on failure).
- clients/common/{include,src}/hipblaslt_bench_options.{hpp,cpp} (new) were added so testing_matmul.hpp can forward the values into both hipblasLtMatmulDescCreate call-sites without going through the YAML-driven Arguments struct.
Removed (ext duplicates of the cuBLAS-mirrored knob)
hipblaslt_ext::GemmPreference::setSmCountTarget(int32_t) and getSmCountTarget() were removed in the second commit on this branch (8ddf47b5c2). They duplicated a knob that has a direct cuBLAS analogue and therefore did not belong in the _EXT namespace. C++ users wanting the same effect should call hipblasLtSetSmCountTarget on the handle, or hipblasLtMatmulPreferenceSetAttribute(pref, HIPBLASLT_MATMUL_PREF_SM_COUNT_TARGET, ...) / the analogous descriptor setter on the per-matmul object.
Tests added
- gtest cases for the handle-level C API (clients/tests/src/hipblaslt_test.cpp, new aux_handle_test suite):
  - set_sm_count_target_default_is_zero: default value on a fresh handle is 0.
  - set_sm_count_target_round_trip: including the 0 sentinel round-trip.
  - set_sm_count_target_rejects_negative: negative input returns HIPBLAS_STATUS_INVALID_VALUE and leaves the prior value untouched.
  - get_sm_count_target_rejects_null_pointer: null pointer to the getter returns HIPBLAS_STATUS_INVALID_VALUE.
- gtest case for the ext toggle (hipblaslt_test.cpp): aux_ext_test.gemm_preference_dyn_persistent_tile_round_trip for the surviving GemmPreference boolean.
- YAML-driven aux_test cases for the descriptor and preference attributes (round-trip, default, negative rejection, undersized buffers):
  - aux_matmul_sm_count_target
  - aux_matmul_dyn_persistent_tile_ext
  - aux_matmul_pref_sm_count_target
Test Plan
Built with cmake --build build -- -j32 against ROCm 7.1.1 on an MI355X (gfx950). libhipblaslt.so, hipblaslt-bench, and hipblaslt-test all built clean. Tests ran from build/projects/hipblaslt/clients/.
Test Result
All new tests pass (8/8) and the pre-existing pref-attribute tests still pass (regression check on the HIPBLASLT_MATMUL_PREF_MAX bump from 2 → 3).
hipblaslt-bench CLI validation works.
Notes for reviewers
- … rocblas_gemm_flags_use_cu_efficiency flag). The C symbol name keeps SM_COUNT_TARGET to preserve the cuBLAS-mirror convention so CUDA porters can locate the attribute by grepping for the cuBLAS name.
- e04723e908: initial API additions (handle-level setter was missing; SM helpers were in ext).
- 8ddf47b5c2: adds hipblasLtSetSmCountTarget / hipblasLtGetSmCountTarget to the non-ext API and removes the now-redundant GemmPreference::{set,get}SmCountTarget ext methods. Happy to squash on merge if maintainers prefer a single commit.
Submission Checklist
- … users/<gh-username>/<topic>.
- … .pre-commit-config.yaml; clang-format / cmake-lint / trailing-whitespace / large-file checks all clean).
- HIPBLASLT_MATMUL_PREF_MAX updated to reflect the new entry.
- … 0/false on every new attribute and on the new handle-level setter.
- … 1.4.0 section.
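For reviewers checking the hipblaslt-bench behavior described under "hipblaslt-bench CLI", the parse-time rule for --sm_count_target (non-negative, exit code 1 on failure) can be sketched as below. This is not the code in hipblaslt_bench_options.cpp: `parse_sm_count_target` and `exit_code_for` are hypothetical helpers, and rejecting trailing characters or out-of-range input is an assumption beyond what the PR states.

```cpp
#include <cassert>
#include <charconv>
#include <cstdint>
#include <optional>
#include <string_view>
#include <system_error>

// Hypothetical helper: parse the option text into a non-negative int32_t.
// Returns std::nullopt for malformed, out-of-range, or negative input.
inline std::optional<int32_t> parse_sm_count_target(std::string_view text)
{
    int32_t     value = 0;
    const char* first = text.data();
    const char* last  = text.data() + text.size();
    auto [ptr, ec]    = std::from_chars(first, last, value);
    if(ec != std::errc{} || ptr != last)
        return std::nullopt; // not a complete int32_t literal (assumed to be an error)
    if(value < 0)
        return std::nullopt; // negative targets are rejected at parse time
    return value;
}

// Hypothetical helper: the process exit code the CLI would use for this input.
inline int exit_code_for(std::string_view text)
{
    return parse_sm_count_target(text).has_value() ? 0 : 1;
}

// Usage example covering an accepted value, the 0 default, and a rejection.
inline void self_check_cli()
{
    std::optional<int32_t> parsed = parse_sm_count_target("128");
    assert(parsed.has_value() && *parsed == 128);

    parsed = parse_sm_count_target("0");
    assert(parsed.has_value() && *parsed == 0);

    int code = exit_code_for("-4");
    assert(code == 1);
    (void)code;
}
```

Validating at parse time keeps a bad value from reaching the descriptor setter, where it would only show up later as HIPBLAS_STATUS_INVALID_VALUE.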