[#4884] Fold Reduce_*(axes) op when all reduced axes have length 1 #4885
itikhono wants to merge 1 commit into
Pull request overview
This PR addresses a GPU compilation failure where Reduce* ops that reduce only over unit-length axes (i.e., mathematically no-ops) can reach GPU lowering as fused_reduce<{N,1}> and crash due to missing valid HIP kernel configurations (reported in #4884, observed in YOLO26-pose exports). The fix introduces a simplify_algebra matcher that removes these no-op reductions when the reduced axes are statically known to have length 1.
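Why these reductions are no-ops: reducing over an axis of length 1 combines exactly one element with the reduction's identity, so `max`, `sum` and `prod` each return that element, and `mean` divides it by 1. The same holds for `any`/`all` on booleans. A minimal sketch, using hypothetical helpers that are not MIGraphX code and assuming a row-major float tensor viewed as `{outer, axis_len}` with the reduced axis innermost:

```cpp
#include <algorithm>
#include <cstddef>
#include <functional>
#include <limits>
#include <vector>

// Hypothetical helper (not MIGraphX code): reduce a row-major tensor viewed as
// {outer, axis_len} over its innermost axis, producing one value per outer index.
std::vector<float> reduce_last_axis(const std::vector<float>& data,
                                    std::size_t outer,
                                    std::size_t axis_len,
                                    float init,
                                    const std::function<float(float, float)>& combine)
{
    std::vector<float> out(outer, init);
    for(std::size_t i = 0; i < outer; ++i)
        for(std::size_t k = 0; k < axis_len; ++k)
            out[i] = combine(out[i], data[i * axis_len + k]);
    return out;
}

// Each reduction starts from its identity element.
std::vector<float> reduce_max_last(const std::vector<float>& x, std::size_t outer, std::size_t axis_len)
{
    return reduce_last_axis(x, outer, axis_len, -std::numeric_limits<float>::infinity(),
                            [](float a, float b) { return std::max(a, b); });
}

std::vector<float> reduce_sum_last(const std::vector<float>& x, std::size_t outer, std::size_t axis_len)
{
    return reduce_last_axis(x, outer, axis_len, 0.0f, std::plus<float>{});
}

std::vector<float> reduce_prod_last(const std::vector<float>& x, std::size_t outer, std::size_t axis_len)
{
    return reduce_last_axis(x, outer, axis_len, 1.0f, std::multiplies<float>{});
}

// Mean = sum / (length of the reduced axis); with axis_len == 1 this divides by 1.
std::vector<float> reduce_mean_last(const std::vector<float>& x, std::size_t outer, std::size_t axis_len)
{
    std::vector<float> out = reduce_sum_last(x, outer, axis_len);
    for(float& v : out)
        v /= static_cast<float>(axis_len);
    return out;
}
```

For the `{1, 8400, 1}` shape from #4884 reduced over the last axis, `outer` would be 8400 and `axis_len` 1, so every helper returns its input unchanged.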
Changes:
- Add `find_reduce_no_op` to `simplify_algebra` to fold `reduce_*` ops into their input when all reduced axes are singleton and the input shape is static (skipping dynamic shapes and the runtime-axes form).
- Add unit tests covering common folding/keeping/skipping scenarios and a YOLO pose-shape regression case.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 3 comments.
| File | Description |
|---|---|
| `src/simplify_algebra.cpp` | Adds a new matcher (`find_reduce_no_op`) and wires it into `simplify_algebra`'s match pipeline to eliminate singleton-axis reductions before GPU lowering. |
| `test/simplify_algebra_test.cpp` | Adds targeted tests asserting when singleton-axis reductions should fold vs. be preserved/skipped. |
```cpp
auto matcher() const
{
    return match::name("reduce_max",
                       "reduce_min",
                       "reduce_sum",
                       "reduce_prod",
                       "reduce_mean",
                       "reduce_any",
                       "reduce_all");
}
```
```cpp
// `ins` (the reduce), `in` (its input) and `sh` (the input's shape) are defined
// earlier in the function, outside this excerpt.
auto axes = ins->get_operator().to_value()["axes"].to_vector<std::int64_t>();
if(axes.empty())
    return;

const auto& lens = sh.lens();
const auto rank  = static_cast<std::int64_t>(lens.size());
// Normalize negative axes, then require every reduced axis to be in range with length 1.
const bool all_singleton = std::all_of(axes.begin(), axes.end(), [&](std::int64_t a) {
    if(a < 0)
        a += rank;
    return a >= 0 and a < rank and lens[a] == 1;
});
if(all_singleton)
    m.replace_instruction(ins, in);
```
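The check above reads only `sh.lens()`, never strides. A sketch of why that matters for the `Slice → ReduceMax` pattern from #4884, assuming (purely for illustration) a pre-`Slice` shape of `{1, 8400, 5}` and modeling `Slice` as a view that keeps its input's strides. `shape_info` and the helper functions are hypothetical stand-ins, not MIGraphX's `shape` class:

```cpp
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a shape: dimension lengths plus element strides.
struct shape_info
{
    std::vector<std::size_t> lens;
    std::vector<std::size_t> strides;
};

// Row-major ("standard") strides: each stride is the product of the lengths to its right.
std::vector<std::size_t> standard_strides(const std::vector<std::size_t>& lens)
{
    std::vector<std::size_t> strides(lens.size(), 1);
    for(std::size_t i = lens.size(); i > 1; --i)
        strides[i - 2] = strides[i - 1] * lens[i - 1];
    return strides;
}

shape_info make_standard(const std::vector<std::size_t>& lens)
{
    return {lens, standard_strides(lens)};
}

// Slicing the last axis down to `new_len`, modeled as a view:
// the length shrinks but the strides are inherited from the input.
shape_info slice_last_axis(const shape_info& s, std::size_t new_len)
{
    shape_info out = s;
    out.lens.back() = new_len;
    return out;
}

// Full comparison (lengths and strides), like the check described for find_nop_reshapes.
bool same_full_shape(const shape_info& a, const shape_info& b)
{
    return a.lens == b.lens and a.strides == b.strides;
}

// Lengths-only comparison, like the one find_reduce_no_op relies on.
bool same_lens(const shape_info& a, const shape_info& b) { return a.lens == b.lens; }
```

Under these assumptions the sliced reduce input has strides `{42000, 5, 1}` while the standard `{1, 8400, 1}` reduce output has `{8400, 1, 1}`: a full-shape comparison rejects the pair, while a lens-only comparison accepts it.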
```cpp
// check_reduce_folds is a test helper defined outside this excerpt.
TEST_CASE(simplify_reduce_no_op_singleton_axis)
{
    check_reduce_folds("reduce_max", {1, 21, 1}, {2});
}

TEST_CASE(simplify_reduce_no_op_negative_axis)
{
    check_reduce_folds("reduce_sum", {1, 21, 1}, {-1});
}

TEST_CASE(simplify_reduce_no_op_multi_axes)
{
    check_reduce_folds("reduce_mean", {1, 1, 21, 1}, {0, 1, 3});
}

// yolo*-pose regression: without the fold, GPU lowering JIT-fails on this shape.
TEST_CASE(simplify_reduce_no_op_yolo_pose_shape)
{
    check_reduce_folds("reduce_max", {1, 8400, 1}, {-1});
}
```
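The fold/keep scenarios in these tests can also be exercised without building MIGraphX. A minimal sketch, assuming a standalone copy of the lens-only rule from the matcher snippet; `reduces_only_unit_axes` is a hypothetical name, and the dynamic-shape and runtime-axes skips are outside its scope because they depend on the instruction rather than on `lens` and `axes`:

```cpp
#include <algorithm>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical standalone copy of the lens-only rule: true when `axes` is
// non-empty and every axis (negative values count from the back) is in range
// and names a dimension of length 1.
bool reduces_only_unit_axes(const std::vector<std::size_t>& lens,
                            const std::vector<std::int64_t>& axes)
{
    if(axes.empty())
        return false; // mirrors the early return: empty axes are left alone
    const auto rank = static_cast<std::int64_t>(lens.size());
    return std::all_of(axes.begin(), axes.end(), [&](std::int64_t a) {
        if(a < 0)
            a += rank;
        return a >= 0 and a < rank and lens[static_cast<std::size_t>(a)] == 1;
    });
}
```

The first four cases below mirror `_singleton_axis`, `_negative_axis`, `_multi_axes` and `_yolo_pose_shape`; the rest are shapes that must not fold.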
#4841 already fixes this issue by extending
Codecov Report
❌ Patch coverage is
Additional details and impacted files
```diff
@@           Coverage Diff            @@
##           develop    #4885   +/-   ##
========================================
  Coverage    92.86%   92.86%
========================================
  Files          585      585
  Lines        30152    30213   +61
========================================
+ Hits         27998    28056   +58
- Misses        2154     2157    +3
```
Ok, I will double check today.
Motivation
`migraphx-driver compile --gpu` crashes on graphs where a `Reduce*` reduces over axes of length 1: the op reaches GPU lowering as a `fused_reduce<{N, 1}>` for which there is no valid HIP kernel. This hits any YOLO26-pose export from Ultralytics, whose post-processing emits `Slice → ReduceMax(axes=[-1])` over `[1, 8400, 1]`.
Repro (real model)
See issue #4884 for more details.
Fix
New `find_reduce_no_op` matcher in `simplify_algebra` for `reduce_max / min / sum / prod / mean / any / all`. It replaces the op with its input when every entry of `axes` resolves to a unit dimension on a static shape, and it skips dynamic shapes and the 2-input (runtime-axes) form.
`simplify_reshapes::find_nop_reshapes` already lists reduce ops but compares full shapes, including strides. After `Slice` the input has non-canonical strides while the reduce output is canonical, so that matcher skips the reduce. We can't relax that check because other ops in the same matcher rely on strides; instead, `find_reduce_no_op` does a reduce-only, lens-only check.
Tests
`test/simplify_algebra_test.cpp`:
- `_singleton_axis`, `_negative_axis`, `_multi_axes`, `_yolo_pose_shape` (`{1, 8400, 1}`)
- `_keeps_real_reduce`, `_keeps_partial_singleton`, `_skips_dynamic_input`, `_skips_variable_axes`
End-to-end
`compile --gpu` on `yolo26s-pose.onnx` and on the minimal repro now completes.
Changelog Category